Bioinformatics: use a database for data
Previously, I wrote about organising your file system to make the relationships between files that produce data and files that contain data more descriptive. One of the best tips I've been given is to store all my data in a database, regardless of what the data is or how "mission critical" it is. Here are some reasons to use a database, rather than files, to store your data.
Location independent
Say you write a Perl script that analyses file A. You later move file A, so you have to update the script with the new location. What if the script analyses files A, B, C and so on? Or what if you moved the file several months ago and can't remember which one you need? If instead you have everything as tables in a database, you can pull the data regardless of location. The database doesn't even need to be on your computer.
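As a rough sketch of that idea, here is an analysis step that refers to its data by table name only. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so the example runs anywhere; the table name `sequences`, its columns, and the function `mean_length` are made up for illustration. With a database server, only the line that opens the connection would need to know where the data lives.

```python
import sqlite3


def mean_length(conn):
    """Analysis step: names a table, never a file path."""
    row = conn.execute("SELECT AVG(length) FROM sequences").fetchone()
    return row[0]


# Stand-in database for this sketch (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [("seq_a", 120), ("seq_b", 80), ("seq_c", 100)],
)

print(mean_length(conn))
```

If the data moves, the analysis function stays the same; only the connection details change.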
Databases are clean
Unless they are XML, data files are messy: missing commas, too many commas, blank lines at the end of the file, bizarre header lines. Binary data files are even worse, since you'll need a library to parse them. Databases, on the other hand, are consistent: data is always stored the same way, in named columns in a named table. You'll always use the same methods to pull the data, and the same program to view it.
Easier to back up
Obviously you back up regularly. If you use files to store your data, every time you create a new file you'll have to tell your backup application to include it. A database, on the other hand, can be dumped into a single text file. Whether you have 5, 10, or 20 tables in your database, everything can still be backed up into one file.
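To illustrate the single-file backup, here is a minimal sketch using Python's built-in sqlite3 module, whose `iterdump()` method yields the SQL statements needed to rebuild every table. SQLite is only a stand-in here; for MySQL the usual tool for the same job is `mysqldump`. The table names, the sample rows, and the file name `backup.sql` are assumptions for the example.

```python
import os
import sqlite3
import tempfile

# Stand-in database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisms (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sequences (id INTEGER PRIMARY KEY, residues TEXT)")
conn.execute("INSERT INTO organisms (name) VALUES ('Escherichia coli')")
conn.execute("INSERT INTO sequences (residues) VALUES ('ATGGCC')")
conn.commit()

# Dump every table into one plain-text SQL file.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.sql")
with open(backup_path, "w") as fh:
    for statement in conn.iterdump():
        fh.write(statement + "\n")

# Restore the whole database from that single file.
restored = sqlite3.connect(":memory:")
with open(backup_path) as fh:
    restored.executescript(fh.read())

tables = [
    row[0]
    for row in restored.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

However many tables you add later, the backup step itself does not change.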
Relational meaning
Relational data management is a huge topic and I'm not going into detail here. A simple illustration is a table for organisms and a table for sequences. Each sequence can be linked to its originating organism using SQL, and vice versa. That operation would be more difficult if the two data sets were in separate files.
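The organism and sequence illustration might look something like this. It is a sketch only, again using Python's built-in sqlite3 module as a stand-in; the column names (`organism_id`, `residues`) and the sample rows are invented for the example, and a real schema would likely carry more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisms (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sequences (
    id          INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organisms(id),
    residues    TEXT NOT NULL
);
INSERT INTO organisms (id, name) VALUES
    (1, 'Escherichia coli'),
    (2, 'Homo sapiens');
INSERT INTO sequences (organism_id, residues) VALUES
    (1, 'ATGGCC'),
    (1, 'ATGAAA'),
    (2, 'ATGTTT');
""")

# From each sequence to its originating organism.
rows = conn.execute("""
    SELECT s.residues, o.name
    FROM sequences AS s
    JOIN organisms AS o ON o.id = s.organism_id
    ORDER BY s.id
""").fetchall()

# And vice versa: from one organism to all of its sequences.
ecoli = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.residues
        FROM organisms AS o
        JOIN sequences AS s ON s.organism_id = o.id
        WHERE o.name = ?
        ORDER BY s.id
        """,
        ("Escherichia coli",),
    )
]

print(rows)
print(ecoli)
```

Both directions use the same `organism_id` link, so nothing has to be matched up by hand between two files.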
Where to get started
I personally use MySQL for my databases, not for any particular technical reason, but because it is what I was taught with. I know that PostgreSQL is popular, and HSQLDB too. As for tutorials, this page has a good explanation of the different database types.