git, github, and bioinformatics software development

April 15th, 2008

Github, a hosting site for source code management (SCM) repositories based on git, has exited beta and is open for people to sign up. Git and github offer interesting opportunities for bioinformatics software development, and I think it’s worth taking a few minutes to explore them. There’s a free option too, so it doesn’t cost anything to sign up and play around.

Source code management

Github is based on git. If you’re familiar with a source code management tool like subversion, you’ll find that git uses a similar command syntax, and it only takes about 20 minutes to familiarise yourself with it. Git improves on SCM in several ways, and one of the first things I noticed is how much faster it is than subversion. Also, if you’ve ever used subversion, you’ll know that it creates a .svn directory in every subdirectory of the project, which makes the project awkward to share and maintain. Git, on the other hand, creates a single .git directory in the root of the project, so if you want to share the project without git revision control, you can just delete this directory.

Git also simplifies things when more than one developer is working on a project. When every developer edits the same code, their versions inevitably diverge and conflict. Git’s approach is to let each developer copy the main project and work on their own version. This copy can be modified and committed to, while nothing is sent back to the master copy. Only when you decide to push your changes to the master are they sent back to the original, at which point the maintainer decides which changes to merge into the original version. Subversion does offer branches, but I find that git’s interface is much simpler and gives the developer more freedom to take risks and try out new code.
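To make the clone, commit and push workflow described above concrete, here is a minimal sketch that drives the git command-line tool from Python. Python is only a choice for illustration, not something the post prescribes. It assumes git is installed and on the PATH, and the repository names, author identity and the file lib/sequence_cost.rb are all hypothetical placeholders. A bare repository stands in for the shared "master" copy. The script checks two things: a developer's local commits reach the shared copy only after an explicit push, and git keeps a single .git directory at the project root.

```python
import subprocess
import tempfile
from pathlib import Path

# Identity and signing settings are passed per command (assumption: the sketch
# should not depend on the user's global git configuration).
GIT_OPTS = ["-c", "user.name=Example Dev", "-c", "user.email=dev@example.org",
            "-c", "commit.gpgsign=false"]


def git(*args, cwd):
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(["git", *GIT_OPTS, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def commit_count(repo):
    """Number of commits reachable from HEAD in `repo` (bare or not)."""
    return int(git("rev-list", "--count", "HEAD", cwd=repo))


def demo_clone_commit_push():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)

        # The maintainer starts a project with a single commit.
        seed = tmp / "seed"
        seed.mkdir()
        git("init", cwd=seed)
        (seed / "README").write_text("shared project\n")
        git("add", "README", cwd=seed)
        git("commit", "-m", "Initial commit", cwd=seed)

        # A bare copy plays the role of the shared "master" repository.
        main = tmp / "main.git"
        git("clone", "--bare", str(seed), str(main), cwd=tmp)

        # A developer clones it and commits locally; nothing is sent back yet.
        dev = tmp / "dev"
        git("clone", str(main), str(dev), cwd=tmp)
        (dev / "lib").mkdir()
        (dev / "lib" / "sequence_cost.rb").write_text("# hypothetical helper\n")
        git("add", "lib", cwd=dev)
        git("commit", "-m", "Add sequence cost helper", cwd=dev)
        (dev / "README").write_text("shared project\nnow with lib/\n")
        git("commit", "-am", "Document lib/", cwd=dev)
        before_push = commit_count(main)

        # Only an explicit push sends the local commits to the shared copy.
        git("push", "origin", "HEAD", cwd=dev)
        after_push = commit_count(main)

        # Git keeps its metadata in one .git directory at the project root,
        # even though the checkout now has a lib/ subdirectory.
        git_dirs = [str(p.relative_to(dev)) for p in dev.rglob(".git")]

        return {"before_push": before_push, "after_push": after_push,
                "git_dirs": git_dirs}


if __name__ == "__main__":
    print(demo_clone_commit_push())
```

Running demo_clone_commit_push() should report one commit in the shared copy before the push and three after it, with ".git" as the only git directory in the developer's checkout. The shared copy is bare because git, by default, refuses a push into the checked-out branch of a non-bare repository.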

Social Software

Github builds on git and takes the easy branching feature a step further to create a social software site. I know everyone and their dog is creating a social [insert verb]ing application/site, but you might find that github’s approach can change how you develop software. Github makes it possible to see who is creating branches of your project, visualised as a network, where branch and merge points are shown in a timeline.

Image of github network feature

As a use case, I’m working on a manuscript and I have a set of ruby classes which I’ve been using in my analysis. I think these might be useful to other bioinformaticians, and I’d like to contribute them to the BioRuby library. To do this, I have to contact the BioRuby mailing list with my suggestion, get CVS access, and then commit my changes to the trunk. If BioRuby were a git repository, I could fork it at the beginning of my project, edit BioRuby as I do my research, and then, when my manuscript is done, prune and tidy my changes and push them back as a patch. Even better, with github’s network feature, anyone interested in BioRuby can see that I’ve forked it, follow the link to my changes and see what I’m doing, even before I’ve committed my changes back to the main project. The BioRuby developers spend a lot of time maintaining the code and so are entitled to decide what happens to my ideas; I’m writing this only as a suggestion for how BioRuby could grow further and encourage contributions.

I think it would be great if bioinformatics researchers, on publication of a manuscript, included a link to a github repository. After all, how often is bioinformatics code reinvented? Or when someone emails another researcher for their code, wouldn’t it be great to already know what they’re up to? In particular, when you see some code mentioned in a paper, you want to be able to get access quickly and start playing around. Whether people would want to share code in this way is one issue, but if they choose to, the features that git and github offer can make it much easier.

More on git and github

Repository Formats Matter

Moving from subversion to git

Video tutorial on using git

Comments in github

Project forking using github

Ruby on Rails moves to github

14 responses

  1. Charles Comstock comments:

    I have been giving git a lot of thought recently in relation to last-minute changes that could break existing functionality, made in order to get the code working for a specific project. If I had been using git, I could have made a topic branch and committed each group of changes separately; then I could have cherry-picked which changes should actually be pushed to HEAD, and left the rest as modifications that need further work before moving them back to HEAD.

    I wonder if we could contact the bioruby people, and see if they would be amenable to a fork of BioRuby into github with the understanding that the release versions still go through CVS. It seems like it would open up the playing field for more patches, and then at a later date they could determine if they wanted to include them, but it would not preclude others from using those patches in the meantime.

  2. jan. comments:

    Hi Mike and Charles.

    During the last hackathon in Japan, we decided to move bioruby from CVS to SVN over time, together with the move to open-bio, once Toshiaki had some time to do that. Purely by coincidence, I mailed him not more than an hour ago about having a look at git instead of SVN, for exactly the reasons you two pinpoint here. It’d be helpful if you (both?) could start a discussion on the bioruby mailing list about that. A while back I posted on my own blog about the state of bioruby, arguing that it would gain immensely from a more active community. I believe git can make that work. Toshiaki could still maintain the “main” repository that everyone trusts and can pull from, but people can also trust other developers and pull directly from them.

  3. Sebastian comments:

    You should also have a look at Mercurial (http://www.selenic.com/mercurial/wiki/). There are also a lot of comparisons between git and Mercurial out there. I found it straightforward and easy to use, even though I just use it for simple stuff.

  4. Mike comments:

    Ruby on Rails has moved onto github recently, and when they did so they froze the existing repository. I think this is important for two reasons: first, two different repositories are difficult to maintain, and second, a developer wouldn’t know which repository to update. There are ways to make git work on top of SVN, but I think that keeping it simple is always better.

    I will email the BioRuby mailing list to make the suggestion.

    @ Charles
    Thanks for the discussion of forking. I’ll be the first to admit I don’t completely understand git’s capabilities, but there are some interesting features that look worth exploring in more detail. For example, git rebase --interactive looks interesting.

    @Jan
    I saw your post on BioRuby participation and agree that widening participation can only benefit the project. In my case I want to contribute more; it’s just difficult for me to find the time. I think the simpler it is to contribute, the more likely people are to contribute.

    @Sebastian
    Yes, Mercurial is also worth looking at. I wrote about git after seeing that Rails and many other Ruby based projects were moving over. That’s not to say that Mercurial wouldn’t be worth a try too, the more options the better.

  5. Craig comments:

    Immensely agree. I can’t tell you how much this would help BioRuby.

    Mercurial is an excellent piece of software, no doubt, but in this case BioRuby really needs github since there are so many different directions people want to take it. Git and github could really inject some life into it.

    I’m “meh” on Lighthouse, but it would certainly be better than the present situation. svn isn’t even worth considering at this point - too little too late.

    Video introductions to git:
    http://www.youtube.com/watch?v=4XpnKHJAok8
    http://video.google.com/videoplay?docid=-3999952944619245780

  6. Around the web - April 19, 2008 : business|bytes|genes|molecules pings back:

    [...] Michael Barton riffs on Git and Github (and yes I have an account) [...]

  7. Software Development comments:

    Sign up for git: git and github now offer wonderful opportunities for bioinformatics software development. Hey! There is a free option too.

  8. Using Github, Lighthouse, and Twitter in my research | michael barton pings back:

    [...] think git is great, and I now use git instead of subversion to version my research. Github is the natural place [...]

  9. jan. comments:

    Just to update you guys: bioruby now *is* available on github: http://github.com/bioruby/bioruby. Toshiaki and Naohisa are looking into it at the moment, and I’m expecting CVS to be phased out in the next couple of weeks. When that happens we’ll send out an official notice to the mailing list as well.
    So go ahead and fork/clone!

    @Mike: can you tell me what additions to bioruby you worked on during your own analysis? (See the “use case” in your post)

  10. gioby comments:

    There is also gitorious:
    - http://gitorious.org/

    It seems to be free, but you can’t create private projects.

  11. Mike comments:

    @Jan
    Thanks for all your hard work; I think BioRuby will really benefit from the move to GitHub. As for my use case, I’m currently writing a manuscript which looks at the cost of a protein sequence, and I’ve got a load of Ruby libraries to do this. I was planning to package all of them into the appropriate format and add them to the BioRuby library. However, we’re still focusing on tying up the manuscript first, so it might be a few weeks before this is done.

  12. gioby comments:

    I have also found this website:
    - http://www.assembla.com/

    It seems to offer a svn/git hosting, a bug tracking system, wiki, chat, all for free in the basic version.

  13. gioby comments:

    Have a look at trac, too:
    http://trac.edgewall.org/

    It is free software, it integrates a wiki, a subversion/CVS repository, and a lot of things more.

    Also, are you going to write an article on extreme programming?
    cheers

  14. Mike comments:

    @ Gioby
    Trac is cool, but my personal opinion is that Git+Github = Disco!

    As for extreme programming, I’d like to, but I think most (academic) bioinformaticians work solo, and extreme programming is most useful for teams of programmers. However, Matt Wood has an interesting article on Scrum.

    http://www.greenisgood.co.uk/pages/show/introduction_to_scrum

Leave a comment