Organising yourself as a dry lab scientist

February 16th, 2007

Browsing wikiomics, I found this small section on keeping organised as a practising bioinformatician. In particular, these lines contain gems of information.

  • Use text files/plain e-mail whenever possible
  • Give meaningful names to your files
  • Create separate folders/directories for each project with meaningful names

I find keeping my work organised one of the most frustrating but necessary tasks of being a bioinformatician. This subject also seems to receive little attention in the bioinformatics community.

Wet scientists are expected to keep laboratory books, and not doing so is considered very bad practice. I am jealous when I see these books filled with pictures of gels and printed tables of results. I’ve tried using a lab book, but I didn’t find it applicable to the many different types of scripts and results I was producing.

Here are some tips I find useful for organising myself.

I couldn’t agree more with the above tips. Give directories and files names that are as verbose as possible. This helps when trying to find a specific file, because sets of files are often all related to a similar subject. Take the following example.

  • ancova_sequence_hydrophobicity.R
  • ancova_sequence_hydrophobicity_interaction_term.R
  • ancova_sequence_hydrophobicity_residuals.R

All three files contain a script fitting an ANCOVA model, but each differs slightly by focusing on a different part of the model. Finding the one you need is simple for you now, but perhaps not so in a few months’ time when you return to the results to write a paper.

Consider this example

  • ancova_sequence_hydrophobicity.R
  • ancova_sequence_hydrophobicity.csv
  • ancova_sequence_hydrophobicity.tiff
  • ancova_sequence_hydrophobicity_interaction_term.R
  • ancova_sequence_hydrophobicity_interaction_term.csv
  • ancova_sequence_hydrophobicity_interaction_term.tiff
  • ancova_sequence_hydrophobicity_residuals.R
  • ancova_sequence_hydrophobicity_residuals.csv
  • ancova_sequence_hydrophobicity_residuals.tiff

Here, there are files for the results of each model (csv) and a plot of the results (tiff). This illustrates how quickly things can expand, making it more difficult to understand what each file refers to.

Here’s one way that this could be organised

  • 1.ancova_sequence_hydrophobicity
    • scripts
      • model.R
      • model_interaction_term.R
      • model_residuals.R
    • results
      • model.csv
      • model_interaction_term.csv
      • model_residuals.csv
    • pictures
      • model.tiff
      • model_interaction_term.tiff
      • model_residuals.tiff

Each sub directory name describes its contents, which keeps things verbose. Furthermore, the directory path contributes to describing each file, e.g. 1.ancova_sequence_hydrophobicity/results/model_residuals.csv. This is helpful if you are referencing the file elsewhere and want to know what the file contains.
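As a rough illustration, the layout above could be created with a few lines of Python. This is only a sketch: the function name create_project and the SUBDIRECTORIES constant are made up for this example, and the three sub directory names are simply the ones used in the listing above.

```python
from pathlib import Path

# Sub directory names taken from the example layout; adjust to taste.
SUBDIRECTORIES = ("scripts", "results", "pictures")


def create_project(root, number, name):
    """Create <root>/<number>.<name>/ with one sub directory per file type.

    Hypothetical helper, not part of any library.
    """
    project = Path(root) / f"{number}.{name}"
    for sub in SUBDIRECTORIES:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

Calling create_project(".", 1, "ancova_sequence_hydrophobicity") would produce the empty skeleton shown in the listing, ready for scripts and their outputs.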

Since the files are related, they each have an identically named counterpart in the other directories. This is useful for determining which script produced which result.
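Because related files share a name, the counterparts of any script can be found mechanically. The sketch below assumes the layout described above (files sitting exactly one directory below the project directory); the function name counterparts is hypothetical.

```python
from pathlib import Path


def counterparts(project, stem):
    """Return all files one level below `project` whose name, minus extension, is `stem`.

    Hypothetical helper that relies on the shared-name convention.
    """
    return sorted(
        path
        for path in Path(project).glob("*/*")
        if path.is_file() and path.stem == stem
    )
```

For the example project, counterparts("1.ancova_sequence_hydrophobicity", "model_residuals") would list the script, the csv of results and the tiff plot that belong together.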

Finally the top level directory has a number. Often projects and experiments are carried out linearly, one being done after another. Keeping the directories numbered can help to trace the thought process at a later date.
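If you want the numbering to take care of itself, something like the following could pick the next number. It is a minimal sketch that assumes every project directory is named "<number>.<name>", as in the example; next_project_name is a made-up helper.

```python
import re
from pathlib import Path


def next_project_name(root, name):
    """Return "<n>.<name>", where n is one more than the highest existing number.

    Hypothetical helper; directories without a numeric prefix are ignored.
    """
    numbers = []
    for entry in Path(root).iterdir():
        match = re.match(r"(\d+)\.", entry.name)
        if entry.is_dir() and match:
            numbers.append(int(match.group(1)))
    # With no numbered directories yet, numbering starts at 1.
    return f"{max(numbers, default=0) + 1}.{name}"
```

Taking the maximum rather than counting directories keeps the sequence increasing even if an old project has been deleted or archived.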

There’s an interesting post at LifeHacker about organising file structure. The comments also have a lot of useful ideas.

There are an infinite number of ways to organise. Probably the best approach is to use the system that suits you best. Experiment; you’re a scientist.

10 responses

  1. Bioinformatics Zen » Blog Archive » Bioinformatics : use a database for data pings back:

    [...] Previously, I wrote about organising your file system to make the relationships between files that produce data, and files containing data more descriptive. One of the best tips I’ve been given, is to store all my data in a database. Regardless of what the data is, or how “mission critical”. Here are some reasons to use a database, rather than files, to store your data. [...]

  2. Greg Tyrelle comments:

    I posted this over at nodalpoint, it is one of our favorite and never ending topics of discussion.

    Nice to see new bioinformatics blogs. Keep up the good work.

  3. Mike comments:

    Thanks for the linkback Greg.
    I’m a regular reader of nodalpoint, and was very pleased to get a mention.

  4. Luca Beltrame comments:

    Nice and informative post. It prompted me to reconsider and reorganize my messy home directory.

  5. Mike comments:

    I’m happy to read that it was helpful, Luca.

  6. alex comments:

    hi nice site.

  7. Bioinformatics Zen » Blog Archive » Use a hyperlinked document as a bioinformatics lab book pings back:

    [...] wrote previously about using the file system to organise your scripts and data. I use this method and it does help [...]

  8. robert comments:

    hi all.

  9. Nenad Bartonicek comments:

    Great set of blogs and articles! Just added your RSS to my iGoogle page.

  10. Organised bioinformatics experiments | Bioinformatics Zen pings back:

    [...] previous post I wrote tried to address some of these problems by being strict about directory and file naming, however [...]
