Posts about use this

Shortcuts for generating HTML

June 21st, 2008

Usually at some point you’ll need to produce some web pages. This could be for a group website, or a page to embed a tool in. Writing HTML is a dull way to spend time, so it’s worth knowing some tools that can make HTML generation easier.

Markdown

I use markdown to write blog posts, as it’s just the same as writing plain text but with a few extra markups.

# A level one heading

Some normal paragraph text

### Three hashes makes a level three heading


    text = 'this is indented'
    text << 'and so will be marked up as code'
    puts text

Converting markdown to HTML will produce the following output.

<h1>A level one heading</h1>

<p>Some normal paragraph text</p>

<h3>Three hashes makes a level three heading</h3>

<pre><code>
text = 'this is indented'
text << 'and so will be marked up as code'
puts text
</code></pre>

Writing in markdown means you don’t have to edit HTML. Instead, edit the markdown and proofread the resulting HTML in a browser. A useful analogy is that markdown is a more readable syntax that can be ‘compiled’ into HTML. The Wikipedia entry lists interpreters for all major languages, and there is a full listing of the markdown syntax. If you’re interested, you can look at the markdown I used to write this blog post, which I run through my Ruby command line markdown parser.
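To make the ‘compile’ analogy concrete, here is a toy sketch that handles only the three constructs shown above: hash headings, indented code, and plain paragraphs. The method name `toy_markdown` and the sample text are made up for illustration, and this is not how a real interpreter works internally. For actual writing, use one of the existing markdown libraries rather than anything like this.

```ruby
require 'erb'

# Hypothetical sample input, mirroring the constructs shown in the post.
SAMPLE = <<~MARKDOWN
  # A level one heading

  Some normal paragraph text

      puts 1 < 2
MARKDOWN

# Toy converter for a tiny subset of markdown. Blocks are separated by
# blank lines, and each block becomes a heading, a code block or a paragraph.
def toy_markdown(source)
  blocks = source.split(/\n{2,}/).map(&:rstrip).reject(&:empty?)
  html = blocks.map do |block|
    if (m = block.match(/\A([#]{1,6}) +(.+)\z/))
      # One to six hashes give the heading level.
      level = m[1].length
      "<h#{level}>#{ERB::Util.html_escape(m[2])}</h#{level}>"
    elsif block.lines.all? { |line| line.start_with?('    ') }
      # Every line indented by four spaces: strip the indent, mark up as code.
      code = block.lines.map { |line| line[4..-1] }.join
      "<pre><code>#{ERB::Util.html_escape(code)}</code></pre>"
    else
      "<p>#{ERB::Util.html_escape(block)}</p>"
    end
  end
  html.join("\n\n")
end

puts toy_markdown(SAMPLE)
```

Note that the `<` in the code line is escaped to `&lt;` on the way out, which is one of the fiddly details you no longer have to think about when you write in markdown.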

Textile

Markdown is great for creating simple HTML content, but sometimes you need more fully featured content, such as elements with id or class attributes that a CSS stylesheet can target. Textile’s syntax is very similar to that of markdown, but also allows a few extra HTML features.

h2(#with-id). Level Two Heading with attribute

p(custom-class). Paragraph text with some extra markup

|_. Table |_. Heading |
| Table cell | Table cell |

Produces this HTML.

<h2 id="with-id">Level Two Heading with attribute</h2>

<p class="custom-class">Paragraph text with some extra markup</p>

<table>
    <tr>
        <th>Table </th>
        <th>Heading </th>
    </tr>
    <tr>
        <td> Table cell </td>
        <td> Table cell </td>
    </tr>
</table>

As with Markdown, the Wikipedia entry lists interpreters for the major programming languages. There is also a full listing of the Textile syntax.

Haml

If Textile isn’t enough, Haml gives you complete control of HTML generation. The downside is that Haml is less readable than Textile or Markdown, though it is still much easier to edit and maintain than HTML.

#div-with-id
  %custom-tag with enclosed text
  Some normal text here
  %a{'href' => 'example.com'} A specific tag with attributes

Produces this HTML

<div id='div-with-id'>
  <custom-tag>with enclosed text</custom-tag>
  Some normal text here
  <a href='example.com'>A specific tag with attributes</a>
</div>

Haml uses indentation to know when to close tags (see the closing div added on the last line), which means it is sensitive to how you use whitespace to indent lines. In addition, there are only a few interpreters at the moment. The full syntax is described on the Haml website.

A note on templating

Related to HTML generation is templating, which allows code to be embedded and evaluated in text. Here’s an example using the Ruby templating library, ERB.

There are <%= Gene.all.length %> protein coding genes in this data set. The data set was downloaded in fasta format from the [SGD ftp server][sgd]. The dataset was last modified on <%= File.stat("data/yeastproteingenes.fasta.gz").mtime %>

The section of text <%= Gene.all.length %> is evaluated as Ruby code and replaced with the result. The Ruby code looks at the ‘gene’ table in my database, finds all the entries, then counts the number of records. Using the DataMapper ORM, which I discussed previously, this Ruby code is concise and readable, so it is easy to see what it will return, and therefore easier to maintain than raw SQL. The second block of code records some provenance by identifying when the data file was last modified. Running this through ERB then markdown will produce this text.

There are 5883 protein coding genes in this data set. The data set was downloaded in fasta format from the SGD ftp server. The dataset was last modified on Fri May 09 16:25:27 +0100 2008
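If you want to try the templating step without a database, a minimal sketch might look like the following. The `Gene` class here is a made-up stand-in for the DataMapper model, with three invented records, and `render_template` is just a helper name chosen for this example. Only `ERB.new` and `result` come from the ERB library itself.

```ruby
require 'erb'

# Hypothetical stand-in for the DataMapper model used in the post.
# The records are invented; the real model would query the database.
class Gene
  RECORDS = %w[YAL001C YAL002W YAL003W].freeze

  def self.all
    RECORDS
  end
end

TEMPLATE = 'There are <%= Gene.all.length %> protein coding genes in this data set.'

# Evaluate every <%= ... %> tag in the template and substitute the result.
def render_template(template)
  ERB.new(template).result(binding)
end

puts render_template(TEMPLATE)
# prints: There are 3 protein coding genes in this data set.
```

The output of this step is still markdown, so it can be passed straight on to a markdown interpreter to get the final HTML.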

Three stories about science and the web : The movie

January 21st, 2008

In a previous post I wrote about how great new web tools are making it easier for scientists to collaborate, find information, and share information. This light-hearted introduction was rather popular, so here’s a tongue-in-cheek video version.

How to avoid errors when processing CSV files

November 1st, 2007

A lot of bioinformatics involves reading data from files to manipulate them for our analysis. For example, I spend a lot of time importing data from CSV files into my database. Doing this involves creating a script to iterate over each line of the file, then referencing each token in the row by its column number.

However, this is bad for two reasons. The first is that it introduces a dependency on the column number, which may well change. You can fix this by changing the script though, so this is not too bad.

The second reason is much worse, because it could introduce a silent error. If the column number was wrong, then the wrong entry would be referenced. If the correct and wrong entries were both of the same type, e.g. floats, then there is a chance you would miss the mistake, which is very bad.

One approach to fix this is to treat each row as a hash or map. I’ve laid out two examples in Ruby using the gem FasterCSV. They’re quite simple, so you should get the idea whatever language you use, and hopefully there are equivalent libraries too.

Bad example

require 'fastercsv'

FasterCSV.foreach(file_path) do |row|
  # In this instance the row is an array
  # and has to be accessed by the column number.
  # Bad, because this introduces a dependency
  # on the position of the column and doesn't
  # throw an error if you are using the wrong column
  row[column_number] # Do something here
end

Good example

require 'fastercsv'

# Set the header processing option...
FasterCSV.foreach(data_path, :headers => true) do |row|
  # ...each row can now be used like a hash, and
  # the data can be accessed using a key
  row['column_name']

  # This is dependent on the column
  # name, but not its position.
  # A missing column returns nil rather than
  # a value from some other column, so you
  # will always reference the column you expect
end

Importantly, by using a third party library you also follow another programming best practice: don’t reinvent the wheel.
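To see the silent error in action, here is a small self-contained sketch. It uses Ruby’s standard csv library, which is what FasterCSV became from Ruby 1.9 onwards, so the class is `CSV` rather than `FasterCSV`. The two data sets, the column names and the method names are all invented for illustration. Converting the row to a hash and calling `fetch` is one way to get an error, rather than nil, when a column name is misspelt.

```ruby
require 'csv'

# Two hypothetical exports of the same data. The second swaps two columns.
ORIGINAL  = "gene,length,score\nYAL001C,1160,0.25\nYAL002W,1274,0.75\n"
REORDERED = "gene,score,length\nYAL001C,0.25,1160\nYAL002W,0.75,1274\n"

# Positional access: assumes 'score' is always the third column.
def scores_by_position(text)
  CSV.parse(text).drop(1).map { |row| row[2].to_f }
end

# Header access: looks the column up by name, wherever it sits.
# Hash#fetch raises KeyError if the name is missing or misspelt.
def values_by_name(text, column = 'score')
  CSV.parse(text, headers: true).map { |row| row.to_hash.fetch(column).to_f }
end

p scores_by_position(ORIGINAL)   # [0.25, 0.75]
p scores_by_position(REORDERED)  # [1160.0, 1274.0], wrong but no error
p values_by_name(REORDERED)      # [0.25, 0.75]
```

Both columns hold numbers, so the positional version happily returns gene lengths as if they were scores. The named version gives the same answer for both files.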

Three stories about science and the web

October 19th, 2007

Picture of many different web logos

Collaborating on the same document

Tom, Dick, and Harry are collaborating on a paper. Tom, being the PhD student, does all the work and then writes the paper. Tom then sends a copy to Dick and Harry who edit it with their opinions. Unfortunately Dick completely removes the second paragraph of the discussion, while Harry expands it. Both then send their edited copies back to Tom.


Six alternatives to PubMed for searching scientific content

June 24th, 2007

In my opinion, great coding skills, a thorough knowledge of statistics, and Shakespearian writing ability do not make a great bioinformatician. They help, but the most important things are a relevant scientific question and a good understanding of the literature. If you’re like me, the path to scientific enlightenment begins with typing keywords into PubMed until you get the results you were after, the same way you use Google. However, there are other options besides PubMed. Here are six you might not have heard of that are worth a look.
