
For instance, if you have a set of data, you have to understand many things. First, you have to know the biological relevance. How was it produced, what does the data mean, and what is the significance? Lab biologists need to know this as well, but a bioinformatician must also know, in addition, how to store the data. What is the best method of representing it, given that the data needs to pulled out and manipulated within computer code. Next the bioinformatician needs to know what’s statistically feasible given what’s available. It’s going to be tricky to get answers from only three replicates of a noisy microarray experiment. Can you use a SVD to filter some of this noise? What about a microarray experiment with 10 different drug treatments. Where do you begin, can you use dimensionality reduction? How about using clustering? How many clusters? Did someone say probabilistic?
Read more »