Posts about programming

Three libraries and a tool to enhance your bioinformatics coding

March 30th, 2007

Coding is a fact of life in bioinformatics. If you work in bioinformatics you probably enjoy coding to some extent. It’s our equivalent of PCR, western blots and sequencing. So whether your weapon of choice is Java, Perl, Python or C++, here are three libraries and a tool worth a look.


Twelve reasons to favour simplicity over complexity

March 20th, 2007

I think simple is better. Statistics says so too. Statistics says that you’ll probably read the first two paragraphs of this post, look at the pictures then go elsewhere. So I’d simply better get to my point. In terms of attention spans, computer code and (statistical) explanations, and possibly everything in general, I think it’s always better to favour simplicity over complexity.


Bioinformatics: which programming language to use?

March 14th, 2007

Two recent posts discuss the use of programming languages in bioinformatics, one at biowhat and the other at Omics! Omics!. Both consider what type of language to use: heavyweight languages such as C++ and Java versus lighter scripting languages such as Perl, Ruby and Python.

I think this depends on what your research goals are. If your aim is to build a tool for biologists, then you probably need an application-building language such as C or Java. On the other hand, if you want to find an answer to a biological question, then it’s a lot easier to write a short Perl script that manipulates the data to produce the desired result.

Heavyweight
My background is in biology rather than computing science, but I find that languages like Java encourage a better coding style, which is what you want if you’re working on a large project. Object-oriented features such as polymorphism and encapsulation help to prevent bugs. The syntax of these languages is often a lot stricter: object types are declared, and generics can be used to further enforce that the right types are used in the right places. Development environments such as Eclipse and NetBeans can also make development relatively quick. On the other hand, using a language like this to strip a set of protein names from a file can be rather cumbersome and something of an overkill.

Lightweight
Perl was originally intended as a language for manipulating text, with regular expressions built in. That is still very useful in biology, given the vast array of non-standard formats that biological data is distributed in. If you want to quickly strip data from a file, then Perl is by far the best choice. This is probably what has made Perl the most popular choice of language in bioinformatics, and led to the incredibly successful BioPerl project: a very useful set of libraries for performing common bioinformatics tasks, created and maintained by the community.
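To give a flavour of how short this kind of script can be, here is a minimal sketch of stripping protein names from a file. It is written in Python rather than Perl, purely as an illustration of the scripting style. The header layout, the truncated sample records and the name protein_names are assumptions made for the example; a real file may well be laid out differently.

```python
import re

# Illustrative FASTA-style input. The header layout (db|accession|entry name)
# is an assumption, and the sequences are truncated for brevity.
SAMPLE = """>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSV
>sp|P38398|BRCA1_HUMAN Breast cancer type 1 susceptibility protein
MDLSALRVEE
"""

# Capture the third pipe-delimited field of every header line.
HEADER = re.compile(r"^>\w+\|\w+\|(\S+)", re.MULTILINE)


def protein_names(text):
    """Return the entry names found in FASTA-style header lines."""
    return HEADER.findall(text)


print(protein_names(SAMPLE))  # ['P53_HUMAN', 'BRCA1_HUMAN']
```

One regular expression and one function call do the whole job, which is the appeal of a scripting language for this sort of task.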

Specialised
If you want to create a non-linear mixed effects model, or solve a series of stochastic differential equations, then you’ll need a language designed with a specific set of functions in mind. Examples are the impenetrably named “R” for statistics, and the more descriptive Matlab/Mathematica for, unsurprisingly, mathematics. Numerical languages such as these also help with the sometimes tricky binary imprecision problem, where storing a base 10 number in base 2 format can lead to inaccuracies.
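The binary imprecision problem is easy to demonstrate. The sketch below uses Python only as a convenient illustration; the same behaviour appears in any language that stores numbers as binary floating point. The variable names are made up for the example.

```python
import math
from decimal import Decimal

# 0.1 and 0.2 have no exact base 2 representation, so the sum is slightly off.
total = 0.1 + 0.2
exact_match = total == 0.3
print(exact_match)  # False

# Comparing within a tolerance is the usual workaround.
close_match = math.isclose(total, 0.3)
print(close_match)  # True

# Decimal arithmetic stores base 10 digits directly, so the sum is exact.
decimal_match = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
print(decimal_match)  # True
```

Comparing with a tolerance, or switching to decimal arithmetic, are two common ways around the problem.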

Of course no programming language is a golden hammer that can solve all of your problems. Each has its own place. In my work I use a combination of Java, Ant and Hibernate to maintain a large omic database. I then use R to pull the data and run my statistical analyses. Using a database also decouples stripping the data out of the files from running the statistical analysis. Have I mentioned before that databases are great?
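The decoupling idea can be sketched in a few lines. This is not my actual Java, Hibernate and R setup; it swaps in Python’s built-in sqlite3 module and an in-memory database for illustration. The table name, the tab-separated input format and both function names are assumptions made for the example.

```python
import sqlite3


def load_measurements(conn, lines):
    """Parsing step: turn 'protein<TAB>value' lines into database rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement (protein TEXT, value REAL)"
    )
    rows = []
    for line in lines:
        protein, value = line.rstrip("\n").split("\t")
        rows.append((protein, float(value)))
    conn.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    conn.commit()


def mean_per_protein(conn):
    """Analysis step: queries the table and never sees the file format."""
    cursor = conn.execute(
        "SELECT protein, AVG(value) FROM measurement "
        "GROUP BY protein ORDER BY protein"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
load_measurements(conn, ["A\t1.0\n", "A\t3.0\n", "B\t5.0\n"])
print(mean_per_protein(conn))  # {'A': 2.0, 'B': 5.0}
```

If the file format changes, only the loading function needs to change; the analysis carries on querying the same table.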

Reinventing the wheel, badly

February 19th, 2007

I spent several hours today implementing a sequence analysis method taken from a paper I had read earlier. I created a database, downloaded yeast coding sequences, then coded the whole method up in Java. Shortly after doing this, a Google search showed that not only had a tool been published to perform this analysis, but that the original method I implemented was flawed.

I think the lesson here is to always check what the goal of the research is before touching the keyboard. Do you sometimes find that it’s easy to get bogged down in the individual details of implementation, such as coding, rather than the higher scientific question? Coding is enjoyable and is one of the reasons why I like bioinformatics. But what I have to keep telling myself is that in the end it’s about the science rather than the details of implementation. If someone else has already created a tool, or even better produced the results I’m after, then I can skip this step and start the intended analysis. Plus my implementation is probably a lot worse than that of someone who has given the problem a lot more consideration.