February 2007 Archive

The right graph, at the right time

February 28th, 2007

I think everyone would agree that the most important thing in science is results. The best scientists produce the most relevant and important results. Of course, the best results won’t matter if no one knows about them, which is why we publish and give presentations.

Sometimes I see results in papers and presentations illustrated poorly, with graphs that don’t demonstrate the point to the reader or audience in the best possible way. Here I give examples of how data can be presented in different contexts, based on two of my favorite resources. The first is the R language for statistics; the second is Garr Reynolds’ Presentation Zen ideology.

A bad example
Here’s an extreme case, but one that is not uncommon in presentations. It plots two continuous variables: the oxidation of ammonia to nitric acid, and air flow. The chart was produced in NeoOffice using the default options.
Office example
My first complaint is the inappropriate x axis: the first half of the plot isn’t being used. The axis should begin around 40, where the data starts.
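The axis point can be made concrete with a little code. This is a minimal sketch in Python, not the method used for either chart in this post: the function name `axis_limits`, the `step` parameter, and the readings are all invented for illustration. It picks axis limits from the data itself instead of always starting at zero.

```python
import math

def axis_limits(values, step=10):
    """Return (low, high) axis limits that hug the data.

    The limits are the nearest multiples of `step` that enclose
    every value, so the axis starts where the data starts.
    """
    if not values:
        raise ValueError("need at least one value")
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high

# Hypothetical air-flow readings, invented for this example
air_flow = [42, 50, 56, 58, 62, 70, 75, 80]
print(axis_limits(air_flow))  # (40, 80)
```

With these made-up readings the axis runs from 40 to 80, so the whole width of the plot is used by the data.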

Next come the unattractive grey background and horizontal black lines. I personally find this style unpleasant, and would recommend always removing both.

Finally, the trend-line: the magenta color is not particularly nice, and why is it so thick? The wide line makes the chart look clunky and inelegant. If you’re making a chart, you want people to look at it and appreciate the data. You’ve spent months slaving away to produce a set of results, so why not put the extra effort into presenting them well?

Producing a graph for a paper
Here is the same data plotted using the default plot function in R.
R example

What strikes me about R plots is how clean they appear. You could argue that the result looks rather spartan, but the chart shows the data and nothing else. There are no frills, but then you want to illustrate your results efficiently. If the results aren’t that good, then no amount of fluffing will make them better. On the other hand, if the results are good, extra decoration distracts from the main point.

Producing a graph for a presentation
Controversial, but I say don’t. If you can use a simpler way to show the result, do it. When looking at a chart in a paper, the reader has time to read the legend and think about what point it illustrates. I look at all the figures in a paper at least twice.

On the other hand, when presenting, you’ve usually got a limited time to get your point across. When you show a chart in a presentation, the audience has to look at many things: the axes, the points, the trend-lines. This could distract from you and your message.

What do you want to do in the time you have? You want to show your work as exciting and interesting to as many people as possible. How many times have you been in a presentation where there has been slide after slide of graphs? You can imagine that audience attention drops dramatically with each new plot. Here’s an example slide to illustrate how I would show the above data.

Presentation Example

This shows the point succinctly, with no distractions. Remember that you’ll be talking at the same time. If the audience wants more information, they can find you afterwards. You can direct them to the great figures that you included in your paper!

Of course, you’ll need to include a plot to demonstrate controversial and important results. The fewer plots you have prior to these, the more impact they, and therefore your point, will have. Garr Reynolds has some tips (point 6) on producing graphs for presentations.

Finally
I’d like to end this post by quoting the R help page on the subject of pie charts:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.
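To see what “position along a common scale” means in practice, here is a minimal sketch in Python that draws a dot chart as plain text. It is not R’s own dot chart function: the name `dot_chart`, the `width` parameter, and the counts are invented for illustration.

```python
def dot_chart(values, width=40):
    """Render a {label: value} mapping as a text dot chart.

    Every row shares one horizontal scale, so values are compared
    by position rather than by angle or area.
    """
    top = max(values.values())
    if top <= 0:
        raise ValueError("values must include a positive number")
    label_width = max(len(label) for label in values)
    # Largest value first, so the rows read as a ranking
    rows = sorted(values.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for label, value in rows:
        position = round(value / top * (width - 1))
        lines.append(f"{label:<{label_width}} |{'.' * position}o")
    return lines

# Hypothetical counts, invented for this example
counts = {"kinase": 30, "transporter": 27, "unknown": 24}
print("\n".join(dot_chart(counts)))
```

Three values this close together would be nearly indistinguishable as pie slices, but their order is obvious when each dot sits on the same scale.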

Bioinformatics : use a database for data

February 26th, 2007

Previously, I wrote about organising your file system to make the relationships between the files that produce data and the files that contain data more descriptive. One of the best tips I’ve been given is to store all my data in a database, regardless of what the data is or how “mission critical” it is. Here are some reasons to use a database, rather than files, to store your data.
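As a rough sketch of the idea, here is how a few results might be kept in SQLite through Python’s built-in `sqlite3` module. The table, the column names, the helper `samples_above`, and the values are all invented for illustration and are not taken from a real project.

```python
import sqlite3

# In-memory database for the sketch; a real project would pass a file path
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    " id INTEGER PRIMARY KEY,"
    " sample TEXT NOT NULL,"
    " air_flow REAL NOT NULL,"
    " oxidation REAL NOT NULL)"
)

# Hypothetical rows, invented for this example
rows = [("run1", 80.0, 42.0), ("run2", 62.0, 18.0), ("run3", 50.0, 8.0)]
conn.executemany(
    "INSERT INTO measurement (sample, air_flow, oxidation) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

def samples_above(conn, min_air_flow):
    """Return sample names whose air flow is at least `min_air_flow`."""
    cursor = conn.execute(
        "SELECT sample FROM measurement WHERE air_flow >= ? ORDER BY sample",
        (min_air_flow,),
    )
    return [name for (name,) in cursor]

print(samples_above(conn, 60))  # ['run1', 'run2']
```

Once the data is in a table, a question such as “which runs had an air flow of at least 60?” becomes a one-line query rather than a new parsing script.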

Read more »

Reinventing the wheel, badly

February 19th, 2007

I spent several hours today implementing a sequence analysis method taken from a paper I had read earlier. I created a database, downloaded yeast coding sequences, then coded the whole method up in Java. Shortly after doing this, a Google search showed that not only had a tool been published to perform this analysis, but that the original method I implemented was flawed.

I think the lesson here is to always check what the goal of the research is before touching the keyboard. Do you sometimes find that it’s easy to get bogged down in the individual details of implementation, such as coding, rather than the higher scientific question? Coding is enjoyable and is one of the reasons why I like bioinformatics. But what I have to keep telling myself is that in the end it’s about the science rather than the details of implementation. If someone else has already created a tool, or even better produced the results I’m after, then I can skip this step and start the intended analysis. Plus, my implementation is probably a lot worse than that of someone who has given the problem a lot more consideration.

Getting your (scientific) point across

February 17th, 2007

A great post at Creating Passionate Users, which I think is really applicable to science too.

I know this is horribly overgeneralized, but as a high-level rule, we believe:

If you’re using formal language in a lecture, learning book [...], you’re worrying about how people perceive YOU. If you’re thinking only about the USERS, on the other hand, you’re probably using more conversational language.

Read more »

Organising yourself as a dry lab scientist

February 16th, 2007

Browsing wikiomics, I found this small section on keeping organised as a practising bioinformatician. In particular, these lines contain gems of information.

  • Use text files/plain e-mail whenever possible
  • Give meaningful names to your files
  • Create separate folders/directories for each project with meaningful names

I find keeping my work organised one of the most frustrating but necessary tasks of being a bioinformatician. This subject also seems to receive little attention in the bioinformatics community.

Wet lab scientists are expected to keep laboratory books, and not doing so is considered very bad practice. I am jealous when I see these books filled with pictures of gels and printed tables of results. I’ve tried using a lab book, but I didn’t find it applicable for the many different types of scripts and results I was producing.

Here are some tips I find useful for organising myself.

Read more »