How Git Encourages Open Source Contribution

Almost exactly 10 years ago I released my first piece of open source software. The project was called YaBB, and it was the first open source bulletin board/forum software written in Perl. Really, Perl? Yeah, that was the lingua franca of the web at that time. CGI, baby!

A few months after YaBB’s initial release, the team of contributors had grown to about a dozen people. Our main collaboration tools were the YaBB software itself and e-mail. We did not use a version control system (who knew how to use CVS anyway?). The main “repository” of source code was my hard drive, or even the FTP server where the YaBB website was hosted. Some people had “commit” access, i.e. they had the FTP username and password, and they contributed by uploading their changed files straight to the FTP server, so any changes were live immediately. Once in a while we would make a release (a few times a day in the beginning, less frequently later on).

Outsiders could contribute by submitting patches to us, or by sending us modified files via e-mail. But frankly, that did not happen often. The barrier to contributing was pretty high. People essentially had to be invited to join the team and get FTP access before they could contribute. Looking back, I think this greatly reduced the number of outside contributions. And in case you were wondering, the project is still around (not very popular anymore, it seems), but its successor SMF seems to be doing pretty well.

Later on, as a consumer of open source software, I encountered this issue from the other end. Even if a project did use a version control system, like CVS or Subversion, it was mostly useful for staying up to date with the work of the project team, the people with commit access. If I made any changes myself, there was no lightweight way to contribute them back. Sure, I could make a patch and send it by e-mail, but that’s a lot of hassle. Who does that? In effect, my changes only lived on my hard drive and eventually disappeared when I cleaned things up, once I no longer needed them.

About a year ago I started to use Git, a distributed version control system. That means that, rather than having a centralized source repository, every developer keeps his or her own repository locally. Locally, but maybe also remotely someplace, so that others can access it. For open source, a popular place to host such remote repositories is GitHub. I host a number of repositories on GitHub myself.

So, how does Git “encourage open source contribution” then? Well, in Git, forking a project is a very common thing to do. As soon as you clone a repository (a “checkout” in Subversion-speak), you keep your own complete copy locally. You can commit to that copy as much as you want, create branches, tags, etc. Effectively you have now created a fork of the project, which you can push (upload) to your own project repository, on GitHub for instance. As an example, let’s consider persistence.js, my JavaScript ORM library. According to GitHub, there are currently 7 forks of this project. The main one lives at http://github.com/zefhemel/persistencejs, but there’s another one at http://github.com/fgrehm/persistencejs, and yet another one at http://github.com/eegg/orm.js. That is, there are 7 people who made changes locally and pushed them to their own GitHub project repository.

So why would you fork a project? Is that like a hostile take-over? Not really. People usually do it if they found some kind of bug, or want to add some kind of feature and publish those changes (or they want to port the whole thing to CoffeeScript, lord knows why). They can do that by simply forking and pushing it to GitHub. If they want, they can send me a pull request, which is a request for me to pull their changes into my repository. If I accept, their changes are merged into my main repository. I say main repository, but there is really no such thing.
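To make the clone, commit and pull cycle concrete, here is a minimal sketch in Python that drives the real git command line in a temporary directory. It assumes git is installed and on your PATH. The directory names, the commit messages, the "Example" identity and the helper names git and demo_fork_and_pull are all made up for illustration, and a plain "git pull" from a local path stands in for accepting a pull request, which on GitHub happens through the website.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in cwd and return its trimmed stdout."""
    # Hypothetical identity, passed per command so no global config is needed.
    identity = ["-c", "user.name=Example", "-c", "user.email=example@example.com"]
    result = subprocess.run(
        ["git", *identity, *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_fork_and_pull():
    """Fork a repository by cloning it, commit a fix, and pull it back."""
    with tempfile.TemporaryDirectory() as tmp:
        upstream = Path(tmp) / "upstream"  # the original author's repository
        fork = Path(tmp) / "fork"          # the contributor's own copy

        # The author creates a repository with a single commit.
        upstream.mkdir()
        git("init", cwd=upstream)
        (upstream / "README").write_text("hello\n")
        git("add", "README", cwd=upstream)
        git("commit", "-m", "Initial commit", cwd=upstream)

        # Cloning gives the contributor a full, independent copy: a fork.
        git("clone", str(upstream), str(fork), cwd=tmp)
        (fork / "README").write_text("hello, fixed\n")
        git("commit", "-am", "Fix README", cwd=fork)

        # The author pulls the contributor's branch into the original
        # repository; --ff-only refuses anything but a clean fast-forward.
        branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=fork)
        git("pull", "--ff-only", str(fork), branch, cwd=upstream)

        # Commit subjects in the original repository, newest first.
        return git("log", "--format=%s", cwd=upstream).splitlines()
```

Calling demo_fork_and_pull() returns the commit subjects of the original repository after the pull. Note that the contributor never needed write access to it: the author decides what to pull in.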

So, does that actually work? Yes, it turns out that it does. In the case of persistence.js, people have contributed bug fixes, and the persistence.migrations.js plug-in (to handle changes in your data model) was contributed by a “stranger”. By stranger I mean somebody who did not request to be “part of the team” (there’s not really a team), or to get commit access to anything. Just somebody who happens to use persistence.js and needed this plug-in. I have contributed fixes to a couple of projects started by others as well, such as nodejs-mysql-native, TouchScroll and congomongo. Doing so was a breeze.

If your open source project is still running on Subversion (or, lord forbid, CVS) and you feel you do not get a lot of outside contributions, Git (or Mercurial for that matter) is definitely something to consider. GitHub is a great place to host your code, but if you need more space (you only get about 300MB for all your repos together), Gitorious is excellent as well (you get virtually unlimited space there). Or use both!