How to Analyze Git Repositories with Command Line Tools: We’re Not in Kansas Anymore

by Spinellis, Diomidis and Gousios, Georgios

You can get a pre-print version from here.
You can view the publisher's page here.
See the paper's associated code repository: gousiosg/icse-tb

Abstract

Git repositories are an important source of empirical software engineering product and process data. Running the Git command-line tool and processing its output with other Unix tools allows the incremental construction of sophisticated data processing pipelines. Git data analytics on the command-line can be systematically presented through a pattern that involves fetching, selection, processing, summarization, and reporting. For each part of the processing pipeline, we examine the tools and techniques that can be most effectively used to perform the task at hand. The presented techniques can be easily applied, first to get a feeling of version control repository data at hand and then also for extracting empirical results.
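As a rough illustration of the pattern named in the abstract, the sketch below chains Git and standard Unix tools into one pipeline. It is not taken from the paper: the function names `count_lines` and `top_authors`, the one-year window, and the default cutoff of ten are all assumptions made for this example, and the mapping of each command to a stage is one possible reading.

```shell
#!/bin/sh
# Sketch of a fetch / select / process / summarize / report pipeline.
# Assumptions (not from the paper): the function names, the one-year
# window, and the default cutoff of 10 are illustrative choices.

# Summarize: count identical input lines, most frequent first.
count_lines() {
  sort | uniq -c | sort -rn
}

# Must be called inside a Git working tree.
# fetch:     git log reads the commit history
# select:    --since keeps only commits from the last year
# process:   --format='%an' reduces each commit to its author name
# summarize: count_lines tallies commits per author
# report:    head prints the top N authors (default 10)
top_authors() {
  git log --since='1 year ago' --format='%an' |
    count_lines |
    head -n "${1:-10}"
}
```

Calling `top_authors 5` from inside a cloned repository would list the five authors with the most commits over the past year. Because each stage is a separate command, any one of them can be swapped out or inspected on its own while the pipeline is being built up.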

Bibtex record

@inproceedings{SG18,
  author = {Spinellis, Diomidis and Gousios, Georgios},
  title = {How to Analyze Git Repositories with Command Line Tools: We're Not in Kansas Anymore},
  booktitle = {Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings},
  series = {ICSE '18},
  year = {2018},
  isbn = {978-1-4503-5663-3},
  location = {Gothenburg, Sweden},
  pages = {540--541},
  numpages = {2},
  doi = {10.1145/3183440.3183469},
  acmid = {3183469},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {command-line tools, data analytics, empirical software engineering, git, pipes and filters},
  github = {gousiosg/icse-tb},
  url = {/pub/git-repos-cmdline-tools.pdf}
}

The paper