March 2007 Archive

Three libraries and a tool to enhance your bioinformatics coding

March 30th, 2007

Coding is a fact of life in bioinformatics. If you work in bioinformatics you probably enjoy coding to some extent. It’s our equivalent of PCR, western blots and sequencing. So whether your weapon of choice is Java, Perl, Python or C++, here are three packages and a tool worth a look.

The Cambridge LaTeX thesis template

March 21st, 2007

If you’ve ever used LaTeX you’ll probably agree with me when I say that it makes writing documents much easier. A lot of the problems encountered using WYSIWYG editors, such as images jumping around or having to number figures manually, just don’t exist.

Twelve reasons to favour simplicity over complexity

March 20th, 2007

I think simple is better. Statistics says so too. Statistics says that you’ll probably read the first two paragraphs of this post, look at the pictures then go elsewhere. So I’d simply better get to my point. In terms of attention spans, computer code and (statistical) explanations, and possibly everything in general, I think it’s always better to favour simplicity over complexity.

Bioinformatics : which programming language to use?

March 14th, 2007

There have been two recent posts on using programming languages in bioinformatics, one at biowhat and the other at Omics! Omics!. Both discuss what type of language to use: heavy weight languages such as C++ and Java versus lighter scripting languages such as Perl, Ruby and Python.

I think this depends on what your research goals are. If your aim is to build a tool for biologists, then you probably need an application building language such as C or Java. On the other hand, if you want to find an answer to a biological question then it’s a lot easier to create a short Perl script that manipulates the data to produce the desired result.

Heavy weight
My background is biology rather than computing science, but I find languages like Java encourage a better coding style, which is what you want if you’re working on a large project. Object-oriented features such as polymorphism and encapsulation help to prevent bugs. The syntax of these languages is often a lot stricter; object types are declared, and generics can be used to further enforce that the correct types are used. Development environments such as Eclipse and NetBeans can also make development relatively quick. On the other hand, using a language like this to strip a set of protein names from a file can be rather cumbersome and somewhat overkill.

Light weight
Perl was originally intended as a regular expression language for manipulating text, something that is still very useful in biology given the vast array of non-standard formats that biological data is distributed in. If you want to quickly strip data from a file, then Perl is by far the best choice. This is probably what has made Perl the most popular choice of language in bioinformatics, and led to the incredibly successful BioPerl project, a very useful set of libraries for performing common bioinformatics tasks, created and maintained by the community.
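
To make the "strip data from a file" idea concrete, here is a minimal sketch of the kind of short script meant above. It is written in Python rather than Perl purely for illustration, and the header layout (`>db|accession|name description`), the pattern and the records are all made up for the example rather than taken from any real database.

```python
import re

# Assumed (hypothetical) header layout: ">db|ACCESSION|NAME free text description"
HEADER = re.compile(r"^>\w+\|(\w+)\|(\w+)")


def protein_names(lines):
    """Return (accession, name) pairs from header lines, ignoring sequence lines."""
    names = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            names.append((match.group(1), match.group(2)))
    return names


# Toy records invented for this example; these are not real database entries.
example = [
    ">db|X0001|PROTA_TOY first made-up protein",
    "MKTAYIAKQR",
    ">db|X0002|PROTB_TOY second made-up protein",
    "MSLNVEEKAG",
]

print(protein_names(example))
```

In practice the list of lines would come from an open file handle, and the pattern would need adjusting to whatever format the data actually arrives in.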

Specialised
If you want to create a non-linear mixed effects model, or solve a series of stochastic differential equations, then you’ll need a language designed with a specific set of functions in mind. Examples are the impenetrably named “R” for statistics, and the more descriptive Matlab/Mathematica for, unsurprisingly, mathematics. Numerical languages such as these also provide tools for dealing with the sometimes tricky binary imprecision problem, where storing a base 10 number in base 2 format can lead to inaccuracies.
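
The binary imprecision problem is easy to see in a few lines. The sketch below uses Python only as a convenient illustration (it is not one of the numerical languages named above) and shows two common workarounds: comparing with a tolerance, and using decimal arithmetic.

```python
import math
from decimal import Decimal

# 0.1 and 0.2 have no exact base 2 representation, so the sum is slightly off.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)

# Workaround 1: compare floating point values with a tolerance.
close_enough = math.isclose(total, 0.3)

# Workaround 2: do the arithmetic in base 10, starting from string literals.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(exactly_equal, close_enough, decimal_equal)  # False True True
```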

Of course no programming language is a golden hammer that can solve all of your problems. Each has its own place. During my work I use a combination of Java, Ant and Hibernate to maintain a large omic database. I then use R to pull the data and run my statistical analyses. Using a database also decouples stripping the data out of the files from running the statistical analysis. Have I mentioned before that databases are great?
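
As a rough sketch of that decoupling, the example below keeps loading and analysis as two separate steps that only share a database. It is not the Java, Hibernate and R setup described above: it uses Python's built-in sqlite3 module with an in-memory database, and the `measurement` table, its columns and the values are hypothetical.

```python
import sqlite3
import statistics


def load(conn, rows):
    """Loading step: store already-parsed (protein, value) rows in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement (protein TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    conn.commit()


def mean_per_protein(conn):
    """Analysis step: reads only from the database, never from the raw files."""
    grouped = {}
    for protein, value in conn.execute("SELECT protein, value FROM measurement"):
        grouped.setdefault(protein, []).append(value)
    return {protein: statistics.mean(values) for protein, values in grouped.items()}


# Made-up values for illustration only.
conn = sqlite3.connect(":memory:")
load(conn, [("PROT_A", 1.0), ("PROT_A", 3.0), ("PROT_B", 5.0)])
result = mean_per_protein(conn)
print(result)
```

Because the analysis step only sees the table, the file parsing can change, or be rewritten in another language, without touching the statistics.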

Using graphs in presentations and keeping your message simple

March 12th, 2007

A post at Presentation Zen discusses keeping the signal-to-noise ratio in presentations as high as possible. It’s definitely worth a look. The point is to keep your slides free of clutter (noise) so that the audience can focus on your message (signal).

As an example of this, I recently gave a presentation to illustrate hierarchical regulation. I gave the talk to a non-bioinformatics audience, so I was trying to present in a simple and straightforward manner. The slides included a couple of graphs, and since I’ve mentioned graphs in presentations previously I thought I’d include a few slides here. The presentation might appear minimal, but I was also speaking at the same time.

[Presentation slides: pages 1 to 9 and page 11]