Being a bioinformatician is hard · Posted: Aug 06, 2007

Given a set of data, you have to understand many things. First, you have to know the biological relevance. How was it produced, what does the data mean, and what is its significance? Lab biologists need to know this as well, but a bioinformatician must also know how to store the data. What is the best way of representing it, given that the data needs to be pulled out and manipulated within computer code?

A bioinformatician needs to know what’s statistically feasible given the available data. It’s going to be tricky to get answers from only three replicates of a noisy microarray experiment. Can you use an SVD to filter some of this noise? What about a microarray experiment with 10 different drug treatments? Where do you begin? Can you use dimensionality reduction? How about clustering? How many clusters? Did someone say probabilistic?

Admittedly, producing results in bioinformatics is easy compared with the wet lab, where gels have to be run and overnight experiments have to be started at 6am. We do have it easy in this respect; it’s entirely possible to produce a new figure, or another set of data, every day. But this blessing is also a curse: with the ability to produce so many figures and answers so quickly, it’s also easy to get swamped, or to lose sight of the overall goal. As a bioinformatician you need to be disciplined, not because the web and email are a constant distraction, but because you always need to be thinking about the larger picture.

How does what you’re doing right now relate to the larger research question? There’s always a danger of being pulled down a side track, doing work just because it would be interesting to see what happens. But unless it leads towards a paper, you’re wasting your time. It’s tough, but it’s true. We’re judged on publications, and we always need to stay focused on the big question.

In addition to discipline, you need to believe in yourself. You’ll never have something you can hold in your hand and say “I made this”. No gel pictures or tubes of purified protein. We produce p-values, programs, and figures. When things are getting hard, you need a measure of self-belief, and good friends to talk to, because tomorrow you’ll have to sit behind the same desk, at the same screen, and keep plugging away at the same problems, problems only you really understand.

Unfortunately, you’ll also need a measure of self-confidence because, at the moment, bioinformatics is not taken seriously by the majority of biologists. Our peers in the lab think we sit around all day drinking tea and pushing buttons. What we do is either easy, or not “proper science” because we don’t work in a laboratory. The reason for this, which is partly our fault, is that most lab biologists don’t understand what we do day-to-day. Modern biology uses the computer more and more, and this means having to ask a bioinformatician for help, something unthinkable for an “old school” biologist.

But it’s worth it, because bioinformaticians are at the forefront of biological science. The fruits of our work were unimaginable twenty years ago. A tool where you can compare a nucleic acid sequence against every sequence ever found? Analysing the transcript profiles of 23,000 human genes and picking out the handful that are drug targets? Simulating the entire reaction network of a cell and predicting which genes are essential? Other biologists have difficulty understanding what we do because it’s hard. We’ve put in the time to learn how to program, to understand what a support vector machine is, and to know when to use a t-test or an ANOVA. For me, being a bioinformatician is gruelling, personally as well as professionally. But the satisfaction comes from the zen-like moments, when everything comes together: when a problem is broken down mentally, and then solved computationally with a fast and elegant program; when the black arts of statistics, multivariate analysis, and machine learning are your everyday toolbox.
