Organising yourself as a dry lab scientist
February 16th, 2007
Browsing wikiomics, I found this small section on keeping organised as a practising bioinformatician. In particular these lines contain gems of information.
- Use text files/plain e-mail whenever possible
- Give meaningful names to your files
- Create separate folders/directories for each project with meaningful names
I find keeping my work organised one of the most frustrating but necessary tasks of being a bioinformatician. This subject also seems to receive little attention in the bioinformatics community.
Wet scientists are expected to keep laboratory books, and not doing so is considered very bad practice. I am jealous when I see these books filled with pictures of gels and printed tables of results. I’ve tried using a lab book, but I didn’t find it applicable to the many different types of scripts and results I was producing.
Here are some tips I find useful for organising myself.
I couldn’t agree more with the above tips. Give directories and files the most verbose names possible. This helps when trying to find a specific file. Verbose names are especially useful because sets of files are often all related to a similar subject. Take the following example.
- ancova_sequence_hydrophobicity.R
- ancova_sequence_hydrophobicity_interaction_term.R
- ancova_sequence_hydrophobicity_residuals.R
All three files contain a script fitting an ANCOVA model, but each differs slightly by focusing on a different part of the model. Finding the one you need is still simple for you now, but perhaps not in a few months’ time when you return to the results to write a paper.
Consider this example:
- ancova_sequence_hydrophobicity.R
- ancova_sequence_hydrophobicity.csv
- ancova_sequence_hydrophobicity.tiff
- ancova_sequence_hydrophobicity_interaction_term.R
- ancova_sequence_hydrophobicity_interaction_term.csv
- ancova_sequence_hydrophobicity_interaction_term.tiff
- ancova_sequence_hydrophobicity_residuals.R
- ancova_sequence_hydrophobicity_residuals.csv
- ancova_sequence_hydrophobicity_residuals.tiff
Here, each model also has a file of results (csv) and a plot of those results (tiff). This illustrates how quickly things can expand, making it more difficult to understand what each file refers to.
Here’s one way that this could be organised:
- 1.ancova_sequence_hydrophobicity
  - scripts
    - model.R
    - model_interaction_term.R
    - model_residuals.R
  - results
    - model.csv
    - model_interaction_term.csv
    - model_residuals.csv
  - pictures
    - model.tiff
    - model_interaction_term.tiff
    - model_residuals.tiff
Each subdirectory name describes its contents, which keeps things verbose. Furthermore, the directory path contributes to describing each file, e.g. 1.ancova_sequence_hydrophobicity/results/model_residuals.csv. This is helpful if you are referencing the file elsewhere and want to know what it contains.
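As a rough sketch, the skeleton for a new project could be created with a few lines of Python. The function name create_project and the SUBDIRS constant are my own inventions for illustration; only the three subdirectory names come from the layout described here.

```python
from pathlib import Path

# Subdirectory names taken from the example layout; adjust to taste.
SUBDIRS = ("scripts", "results", "pictures")


def create_project(root, name):
    """Create root/name with one subdirectory per file type and return its path."""
    project = Path(root) / name
    for sub in SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

Calling create_project(".", "1.ancova_sequence_hydrophobicity") would give the empty structure, ready for the scripts to be added.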
Since the files are related, they each have an identically named counterpart in the other directories. This is useful for determining which script produced which result.
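Because the names match, the counterpart of any file can be worked out from its name alone. Here is a small Python sketch of that idea. The helper names and the extension mapping are assumptions based on the example files above, not part of any existing tool.

```python
from pathlib import Path

# Assumed mapping from subdirectory to file extension, following the example.
EXTENSIONS = {"scripts": ".R", "results": ".csv", "pictures": ".tiff"}


def counterparts(project, stem):
    """Return the expected path in each subdirectory for a shared file stem."""
    project = Path(project)
    return {sub: project / sub / (stem + ext) for sub, ext in EXTENSIONS.items()}


def missing_counterparts(project, stem):
    """List the subdirectories where the file for this stem does not exist yet."""
    return [sub for sub, path in counterparts(project, stem).items()
            if not path.exists()]
```

For instance, missing_counterparts(project, "model_residuals") would show whether a script has been written but its results or plot have not been produced yet.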
Finally the top level directory has a number. Often projects and experiments are carried out linearly, one being done after another. Keeping the directories numbered can help to trace the thought process at a later date.
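If you would rather not keep track of the numbers by hand, the next one can be derived from the directories that already exist. This is only a sketch: next_project_name is a hypothetical helper, and it assumes every project directory is named as a number, a dot, then a label.

```python
import re
from pathlib import Path

# Matches names such as "1.ancova_sequence_hydrophobicity".
NUMBERED = re.compile(r"^(\d+)\.(.+)$")


def next_project_name(root, label):
    """Return 'N.label', where N is one more than the highest number in root."""
    highest = 0
    for path in Path(root).iterdir():
        match = NUMBERED.match(path.name)
        # Ignore plain files and directories that do not follow the scheme.
        if path.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    return f"{highest + 1}.{label}"
```

The number is parsed as an integer rather than compared as text, so a directory numbered 10 counts as later than one numbered 9.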
There’s an interesting post at LifeHacker about organising file structure. The comments have a lot of useful ideas too.
There are countless ways to organise your work, and the best is probably whichever system suits you. Experiment; you’re a scientist.
February 26th, 2007 at 3:21 pm
[...] Previously, I wrote about organising your file system to make the relationships between files that produce data, and files containing data more descriptive. One of the best tips I’ve been given, is to store all my data in a database. Regardless of what the data is, or how “mission critical”. Here are some reasons to use a database, rather than files, to store your data. [...]
February 28th, 2007 at 8:17 am
I posted this over at nodalpoint, it is one of our favorite and never ending topics of discussion.
Nice to see new bioinformatics blogs. Keep up the good work.
February 28th, 2007 at 8:59 pm
Thanks for the linkback Greg.
I’m a regular reader of nodalpoint, and was very pleased to get a mention.
March 3rd, 2007 at 11:48 am
Nice and informative post. It prompted me to reconsider and reorganize my messy home directory.
March 7th, 2007 at 12:53 am
I’m happy to read that it was helpful, Luca.
April 13th, 2007 at 3:34 pm
[...] wrote previously about using the file system to organise your scripts and data. I use this method and it does help [...]
January 30th, 2008 at 1:13 pm
Great set of blogs and articles! Just added your RSS to my iGoogle page.
May 24th, 2008 at 6:43 pm
[...] previous post I wrote tried to address some of these problems by being strict about directory and file naming, however [...]