For a research paper I am working on, we wanted to analyze the top 30 "most collaborative" projects on Github. Defining a quantitative metric of collaboration and ranking projects by it is not easy: collaboration is often implicit and unrecorded, and not all collaborative actions are equal. As a proxy, we chose to measure the number of people who perform actions that mutate the state of a repository. On Github, we identified the following:

  • A: Create a commit to a repository
  • B: Perform a code review on an individual commit
  • C: Create/Update/Merge/Close a pull request
  • D: Perform a code review on a pull request
  • E: Comment on a pull request
  • F: Create/Close an issue
  • G: Comment on an issue
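The metric boils down to counting distinct people per repository across all seven action types, so that someone who both commits and comments on issues is counted once. The sketch below illustrates that counting step in Python over an in-memory list of events. It assumes each event already carries the repository, the user, an action code A-G and a fork flag; the `Event` record, the action codes and `rank_by_contributors` are hypothetical names chosen for illustration, not the actual script used for the paper. Real data would come from a database query rather than a hand-written list.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

# Hypothetical action codes mirroring the list above (A-G).
ACTIONS = {
    "A": "commit",
    "B": "commit_review",
    "C": "pull_request_change",
    "D": "pull_request_review",
    "E": "pull_request_comment",
    "F": "issue_open_close",
    "G": "issue_comment",
}


class Event(NamedTuple):
    """One state-mutating action: who did what on which repository (hypothetical record)."""
    repo: str
    user: str
    action: str   # one of the keys in ACTIONS
    forked: bool  # True if the repository is a fork


def rank_by_contributors(events: Iterable[Event], top: int = 30) -> list[tuple[str, int]]:
    """Count distinct people per non-forked repository and return the `top` repositories."""
    people: dict[str, set[str]] = defaultdict(set)
    for ev in events:
        if ev.forked or ev.action not in ACTIONS:
            continue  # skip forks and actions outside the A-G list
        # A set stores each person once, however many actions they performed.
        people[ev.repo].add(ev.user)
    counts = [(repo, len(users)) for repo, users in people.items()]
    # Most contributors first; ties broken alphabetically for a stable order.
    counts.sort(key=lambda rc: (-rc[1], rc[0]))
    return counts[:top]


# Usage with a few made-up events.
events = [
    Event("homebrew", "alice", "A", False),
    Event("homebrew", "bob", "G", False),
    Event("homebrew", "alice", "E", False),  # alice is already counted
    Event("rails", "carol", "C", False),
    Event("rails-fork", "dave", "A", True),  # forks are ignored
]
print(rank_by_contributors(events))  # [('homebrew', 2), ('rails', 1)]
```

Using a set per repository keeps the deduplication trivial; at the scale of a full Github dump, the same idea would more likely be expressed as a `COUNT(DISTINCT user)` grouped by repository in SQL.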

Using GHTorrent as a data source, I wrote a script that counts the distinct people who performed any of the actions above in each non-forked repository, and then ranked the repositories by that total number of contributors. The results can be seen in the table below:


The numbers are staggering. Homebrew, a project that is just 5 years old, has attracted 20.5k people (20,500, the size of a small city!) to contribute to it. Ruby on Rails has been developed collaboratively by a community of 15k people and still works! Comparing these numbers with other software engineering projects is futile: most projects, even ones with a very long lifetime, are tiny in comparison. A fairer comparison is perhaps with other online collaborative initiatives: the English Wikipedia is maintained by 130,800 people, while the effort to decode the human genome was carried out by thousands of people.

If nothing else, the above is an example of the power of the commons, and certainly of the usefulness of Github as a collaboration platform.


27 March 2014


