Paper discussion: Hilton et al. [1]

We discuss the paper “Usage, costs, and benefits of continuous integration in open-source projects” [1] by M. Hilton et al.


  • Appeared in 2016
  • Published at ASE – competitive conference
  • Cited 66 times
  • Citations still picking up!

Citations over time


  • Michael Hilton from Oregon State
  • Two students (?) from Illinois, whom I do not know
  • Darko Marinov, professor at Illinois
  • Danny Dig, Michael’s supervisor at Oregon State


Why do this research?

  • CI is an important topic because it is adopted in industry, and can save money
  • CI is an understudied topic in SE research
  • We need to understand more about it to make informed decisions (otherwise, the lack of knowledge leads to “poor decision making and missed opportunities”)

Research method

What does the paper do?

The paper uses a mixed-method research design: a quantitative analysis and a survey.

  • Quantitative analysis: the research objects are split into a breadth corpus of 34,544 OSS projects and a depth corpus of 620 projects that use Travis CI.
  • Survey: Why do developers (not) use CI? Among 442 developers.


What does the paper find?

  • CI is widely used in OSS, and the more popular a project, the more likely it is to use CI.
  • Travis CI is the de facto standard (somewhat similar to GitHub)
  • The main reason hindering adoption is a lack of familiarity with CI.
  • CI makes developers less worried about breaking the build.
  • CI makes integrating PRs faster. PRs are accepted 1.6 hours faster.


Why are the results important?

  • CI needs to be studied more.
  • CI might have a handful of quantifiable benefits.


Technical questions

  • What do we think about HP’s claim to have reduced development costs by 78%?
  • What do we think about the paper’s way to contact developers (mentioned in section 3.2)? Sending 4,508 emails mined from their GitHub profiles … is this ethical? See GHTorrent issue 32.
  • How can we interpret the correlation between popularity of a project and its usage of CI?
  • What might be a problem with the conclusion “CI makes integrating PRs faster. PRs are accepted 1.6 hours faster”?

Meta questions

  • The paper title says Continuous Integration at large, but a large part of the paper focuses on Travis CI. Is this justified?
  • The paper contains the string “we collect[ed] all publicly available information” twice. Is this likely?
  • Do we find the motivation in the paper strong?

Discussion summary

by Guatam Miglani

  • Paper took about an hour to read for most people. For some it took over two hours since the paper is quite long.
  • People like the questions the paper discusses and the structure.
  • The paper contains 14 research questions which is quite impressive and discusses them in 12 pages. So it is a very dense paper.
  • Paper does not mention anything about social media (it only references the approximately 4508 emails). Since there was no response rate included, the paper was rejected by other conferences.
  • The authors mined emails from GitHub profiles (or GHTorrent) and spammed potential participants. In a way, this can be seen as “academic spam”. GHTorrent contains about 60 million data entries. However, distributing this data requires consent from the people whose data is being shared.
  • CI makes pull requests faster (accepted 1.6 hours faster). It is not clear what this average is compared to. Is it compared to a popular project?
  • The paper only studies whether a project uses “Travis” or not and the popularity of the project. All the other factors mentioned above are confounding factors, and the paper does not discuss them.
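
The concern about confounding factors can be made concrete with a small sketch. The records below are entirely synthetic and are not taken from the paper. They are constructed so that popularity alone determines how quickly a PR is accepted, while CI projects happen to be mostly popular ones. The names `PRS`, `mean_hours`, and `ci_effect` are hypothetical helpers for this illustration.

```python
from statistics import mean

# Entirely synthetic PR records: (uses_ci, popular, hours_to_accept).
# By construction, only popularity influences the acceptance time.
PRS = [
    (True, True, 10.0), (True, True, 10.0), (True, True, 10.0), (True, False, 20.0),
    (False, True, 10.0), (False, False, 20.0), (False, False, 20.0), (False, False, 20.0),
]


def mean_hours(records, uses_ci, popular=None):
    """Mean acceptance time for one CI group, optionally within one popularity stratum."""
    hours = [
        h for ci, pop, h in records
        if ci == uses_ci and (popular is None or pop == popular)
    ]
    return mean(hours)


def ci_effect(records, popular=None):
    """Mean hours with CI minus mean hours without CI (negative means faster with CI)."""
    return mean_hours(records, True, popular) - mean_hours(records, False, popular)


naive = ci_effect(PRS)  # compares all CI PRs with all non-CI PRs
by_stratum = {pop: ci_effect(PRS, pop) for pop in (True, False)}  # compares like with like
print(naive, by_stratum)
```

In this made-up data the naive comparison suggests that CI speeds up acceptance, yet the difference vanishes once popular and unpopular projects are compared separately. Whether something similar happens in the paper's data cannot be told from these notes. The sketch only shows why the baseline of the 1.6 hours matters.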

Paper discussion: Rausch et al. [2]

The paper


  • Thomas Rausch: University assistant and PhD student at TU Vienna
  • Waldemar Hummer: Cloud & AI Researcher at IBM, Senior Researcher TU Vienna
  • Philipp Leitner: Former Senior Research Associate at UZH, Assistant Professor at University of Gothenburg
  • Stefan Schulte: Assistant Professor at TU Vienna


  • Continuous Integration (CI) widely used
  • Errors during CI build → development inefficiency
  • CI with version control systems (VCS) + build automation platforms used heavily in open-source software (OSS)
  • Unknowns in the variety and frequency of errors that cause the build to fail
  • Pull-based development model → new undocumented aspect

Research questions

  1. Which types of errors occur during CI builds of Java-based open-source systems?
  2. Which development practices can be associated with build failures in such systems?
  • 14 OSS: CI with GitHub and Travis-CI
  • Java-based projects with Maven/Gradle
  • Data from CI servers and VCS repositories


  • Continuous Repository Mining: loss of historical data (PR-based dev model)
  • Custom Data Crawler for detailed analysis of build errors
  • Labeling build log data
  • Define metrics for individual changes (VCS) and overall CI process
  • Extract these metrics: combine graph-structured VCS data with semi-structured CI server log files

Types of build errors

  • LogCat: one-to-many mapping between labels and message patterns
  • 14 error categories
    • Test failures, compile errors and VCS interaction errors common to all projects
    • Test failures, code quality errors and compile errors are the most frequent errors
    • Test failures are the most common reason for build failures
  • Timing of build errors within the build execution time
    • git errors & compile errors in the first half
    • 30% of errors occur in the first half
    • Second half dominated by test failures
    • Test failures have the higher outlier rate
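
The idea of a one-to-many mapping between labels and message patterns can be sketched as follows. This is a minimal illustration, not LogCat itself. The patterns in `PATTERNS`, the sample log, and the function names are assumptions made for this example, and the relative line position is only a crude stand-in for the build execution time the paper measures.

```python
import re

# Hypothetical mapping: one error label, many message patterns.
PATTERNS = {
    "test failure": [r"Tests run: \d+, Failures: [1-9]", r"There are test failures"],
    "compile error": [r"COMPILATION ERROR", r"error: cannot find symbol"],
    "git error": [r"fatal: unable to access", r"fatal: reference is not a tree"],
}

COMPILED = {
    label: [re.compile(p) for p in patterns] for label, patterns in PATTERNS.items()
}


def classify_line(line):
    """Return the first label with a matching pattern, or None."""
    for label, regexes in COMPILED.items():
        if any(regex.search(line) for regex in regexes):
            return label
    return None


def classify_log(log):
    """Return (label, relative position in [0, 1]) for every matching log line."""
    lines = log.splitlines()
    last_index = max(len(lines) - 1, 1)
    found = []
    for index, line in enumerate(lines):
        label = classify_line(line)
        if label is not None:
            found.append((label, index / last_index))
    return found


# Made-up build log for illustration only.
SAMPLE_LOG = "\n".join([
    "$ git clone --depth=50 https://example.invalid/repo.git",
    "$ mvn test",
    "[INFO] Compiling 12 source files",
    "Tests run: 40, Failures: 2, Errors: 0, Skipped: 1",
    "[ERROR] There are test failures.",
])

print(classify_log(SAMPLE_LOG))
```

A relative position below 0.5 would count as "first half" in this simplified view. A real analysis would use timestamps from the CI server rather than line numbers.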

Impact of development practices

  • Change types: introduce failures (break), perpetuate failures (broken), fix failures (fix), and unrelated to failures (passing)
  • Process metrics
    • Complexity of changes: high change complexity → increase in software defects
    • Date and time and file types: little correlation with build results
    • Author: authors who commit less frequently cause fewer build failures
  • CI Metrics
    • Build types: PRs cause failures more often than changes pushed directly
    • Build history: failures tend to occur consecutively → historical data to be used as a predictor of build results
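
The four change types and the build-history observation can be sketched in a few lines. This is a minimal illustration under two assumptions that are not from the paper: the history is linear, so each build is compared with the chronologically preceding one rather than with its parent in the VCS graph, and the "predictor" is the simplest possible baseline that repeats the previous outcome. `HISTORY` is made-up data.

```python
def change_type(previous_passed, current_passed):
    """Label a change by the build outcome before and after it."""
    if current_passed:
        return "passing" if previous_passed else "fix"
    return "break" if previous_passed else "broken"


def label_history(results):
    """Label every build after the first in a chronological list of outcomes."""
    return [change_type(prev, cur) for prev, cur in zip(results, results[1:])]


def last_build_baseline_accuracy(results):
    """Share of builds whose outcome equals that of the preceding build."""
    pairs = list(zip(results, results[1:]))
    return sum(prev == cur for prev, cur in pairs) / len(pairs)


# Made-up build history: True = passed, False = failed.
HISTORY = [True, True, False, False, False, True, True, False]
labels = label_history(HISTORY)
accuracy = last_build_baseline_accuracy(HISTORY)
print(labels, round(accuracy, 2))
```

If failures tend to occur consecutively, "broken" labels cluster together and the baseline accuracy rises, which is the intuition behind using build history as a predictor.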


  • Analyzed build log data: 14 error categories, with 30% of errors in the first half of build execution time; test failures are a major threat to CI
  • Analyzed impact factors: build stability, change complexity and author experience most influential factors
  • Future work: prediction of build results

Discussion summary

by Guatam Miglani

  • Liked the first part and surprised by the number of builds failed. Did not like the second part.
  • The only way to know your build is going to fail is by executing the build. A message saying that the build is going to fail is not exactly helpful since we can find out by just executing the build.
  • What metrics could also be considered that were not mentioned in the paper? Most were happy with the metrics mentioned in the paper.
  • Eight months is long enough to gather data and draw conclusions from it. The period is not so long that the results are distorted by changes in the structure of the development, and not so short that too little data is gathered. However, it may be interesting to see the results for a period longer than eight months, including changes to the structure of the development.
  • Look into the project timeline and see which projects have reached more milestones for builds. Alternatively, increase the test suite to increase the build time of the project.
  • How would you replicate this paper? And if so, what would you change if anything? If you change the data, you would need to change the tools. Some
  • How would you make sure that tests run early when using JavaScript? For Java, it will be completely different. All tests will be in the latter part of the build since we will run more tests, so there will be less compiling. In Java, you can fix errors earlier in the build as you get feedback. Use a script to get build errors earlier and tackle them beforehand.
  • We’ve identified five things that would be different with JavaScript. The paper can be generalised to programming languages such as C#.
  • Continuous mining is done in depth and paper is detailed and not generalised. The paper is very dense and filled with information which is impressive but also makes the paper longer to read.
  • The paper talks about failures of CI, whereas the first paper takes a more positive approach to CI.


[1] M. Hilton, T. Tunnell, K. Huang, D. Marinov, and D. Dig, “Usage, costs, and benefits of continuous integration in open-source projects,” in Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016, pp. 426–437.
[2] T. Rausch, W. Hummer, P. Leitner, and S. Schulte, “An empirical analysis of build failures in the continuous integration workflows of Java-based open-source software,” in Proceedings of the 14th International Conference on Mining Software Repositories, 2017, pp. 345–355.
[3] C. Bird and T. Zimmermann, “Predicting software build errors,” Feb. 2014.
[4] M. Beller, G. Gousios, and A. Zaidman, “Oops, my tests broke the build: An explorative analysis of Travis CI with GitHub,” in 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), 2017, pp. 356–367.
[5] M. Beller, G. Gousios, and A. Zaidman, “TravisTorrent: Synthesizing Travis CI and GitHub for full-stack research on continuous integration,” in Proceedings of the 14th International Conference on Mining Software Repositories, 2017, pp. 447–450.
[6] G. Pinto and F. C. R. B. M. Rebouças, “Work practices and challenges in continuous integration: A survey with Travis CI users,” 2018.
[7] Y. Zhao, A. Serebrenik, Y. Zhou, V. Filkov, and B. Vasilescu, “The impact of continuous integration on other software development practices: A large-scale empirical study,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017, pp. 60–71.
[8] D. G. Widder, M. Hilton, C. Kästner, and B. Vasilescu, “I’m leaving you, Travis: A continuous integration breakup story,” 2018.