Software analytics

Software analytics is a modern term for the use of empirical (mostly quantitative) research methods on software data.

In this lecture, we will:

Review basic empirical research methods
Present important developments in quantitative software engineering research
Discuss what software analytics are

Why quantitative software engineering?

Quantitative software engineering is a subset of empirical software engineering, a discipline that

Transforms data traces into working knowledge
Identifies and optimises patterns and trends
Quantifies and improves processes and products

D: Can you identify potential applications of quantitative software engineering?

(Some) Applications of software analytics

Pattern mining and recommendation
- “You changed this file, perhaps you might also want to change that one as well”
- “This bug report is a duplicate of…”
Autocompletion in Eclipse
Process improvement
- test case runtime optimisation
- work prioritization
Evolution analysis
Software quality analysis

Empirical Research methods: A quick primer

Empirical research

Empiricism is a philosophical theory stating that true knowledge can only arise from systematically observing the world.

Types of empirical research:

Experiment: Measure the effect of an intervention
Case study: (Usually post-mortem) analysis of the properties of specific cases
Field study: Observation of behaviours in their context
Grounded Theory: Iteratively refine theories based on new data
Action research: Embed into context, modify it, measure the effects

Data collection methods

Empirical research requires the collection of data to answer research questions (RQs).

Qualitative research methods collect non-numerical data
- Surveys
- Interviews
- Focus groups
Quantitative methods use mathematical, statistical or numerical techniques to process numerical data:
- Modelling
- Machine learning
- Simulations

Hypotheses

A hypothesis proposes an explanation for a phenomenon

Hypotheses are defined in pairs:

\(H_1\): Alternative hypothesis
\(H_0\): Null hypothesis, the default position (e.g. no effect or no difference)

A good hypothesis is readily falsifiable.

Hypotheses | p-values

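Most statistical tests return a \(p\)-value: the probability of observing data at least as extreme as the data at hand, assuming \(H_0\) is true. It is not the probability that \(H_0\) is true.
To interpret a test, we set a threshold (usually 0.05) for \(p\) before running it
If \(p <\) threshold, the null hypothesis is rejected in favour of the alternative hypothesis
We need to know beforehand what a statistical test does and which assumptions it makes about the data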
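The sketch below makes the definition of \(p\) concrete using one particular test: an exact two-sided permutation test on a difference in means. It was chosen for illustration and is not the only test, or always the right one. The test enumerates every way of relabelling the pooled data and reports the share of relabellings that are at least as extreme as the observed split. The data and the function names `permutation_p_value` and `reject_null` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Under H0 both samples come from the same distribution, so the group
    labels are interchangeable. The p-value is the share of all possible
    relabellings whose absolute mean difference is at least as large as
    the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        # The small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total


def reject_null(p, threshold=0.05):
    """Reject H0 only if p falls below the threshold chosen in advance."""
    return p < threshold


# Hypothetical data: bugs per file in two groups of four files.
p = permutation_p_value([1, 2, 3, 4], [10, 11, 12, 13])
print(round(p, 4), reject_null(p))  # 0.0286 True
```

With only three files per group, even perfectly separated samples give \(p = 2/20 = 0.1\). This test can therefore never reject \(H_0\) at 0.05, which is one concrete reason to know beforehand what a test can and cannot show.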

Theories

A theory is a proposed explanation for an observed phenomenon. It (usually) specifies entities and describes their interactions. Using a theoretical model, we can explain observations and predict new ones.

Q: How can we build or dismantle a theory?

Theories are built by generalizing over consecutive research results.

A single contradicting data point is enough to reject a theory.

Measurement

Measurement extracts samples of data from a running process. Data types:

Ratio: Both differences and ratios between measurements are meaningful, as there is a true zero (e.g. lines of code)
Interval: Differences between measurements are meaningful, but ratios are not (e.g. calendar dates)
Categorical: One of a predefined set (sorting does not make sense, e.g. programming language)
Ordinal: One of a predefined set (sorting makes sense, e.g. bug severity)
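The data type limits which summaries are meaningful. A common rule of thumb (Stevens' levels of measurement, not taken from this lecture) is to use the mode for categorical data, the median for ordinal data, and the mean for interval and ratio data. Here is a minimal sketch with hypothetical data and a hypothetical helper, `central_tendency`:

```python
from statistics import mean, median_low, mode


def central_tendency(values, scale):
    """Hypothetical helper: pick a summary that is meaningful on the scale."""
    if scale == "categorical":
        return mode(values)        # only equality is meaningful
    if scale == "ordinal":
        return median_low(values)  # order is meaningful, distances are not
    if scale in ("interval", "ratio"):
        return mean(values)        # differences are meaningful
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical measurements
print(central_tendency(["Java", "Python", "Java", "Go"], "categorical"))  # Java
print(central_tendency([1, 3, 3, 4, 5, 2], "ordinal"))  # severity 1-5 -> 3
print(central_tendency([120, 80, 400], "ratio"))  # lines of code -> 200
```

The code uses `median_low` for ordinal data so that the result is always one of the observed levels. A plain median could instead average two levels into a value that does not exist on the scale.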

A brief history of Quantitative Software Engineering

late 70s – Metrics: McCabe Complexity, Halstead software science, LoCs
mid 80s – Measuring projects: Function Points, COCOMO
mid 90s – Modeling: Reliability, Cost, Maintainability. Goal-Question-Metric.
mid 00s – OSS repositories: Mining VCS, BTS, MLs
mid 10s – Software analytics and big data

70s: Basic metrics

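McCabe’s complexity [1]: Attempt to quantify complexity at the function level by counting the linearly independent paths through its control flow (in practice, the number of decision points plus one).
Halstead software science [2]: Attempt to establish laws of software, such as program volume and effort, from counts of operators and operands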
Curtis et al. [3] found that: “All three metrics (Halstead volume, McCabe complexity, LoCs) correlated with both the accuracy of the modification and the time to completion.”

They just work!
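The following Python code is a rough sketch of how such a metric is computed. It approximates McCabe's complexity as the number of decision points plus one, using the standard `ast` module. Which constructs count as decision points is an assumption, since tools differ. For simplicity, nested functions are folded into their enclosing function. The sample function is hypothetical.

```python
import ast

# Constructs counted as decision points. This set is an assumption:
# real tools differ in exactly what they count.
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity (decision points + 1) per top-level function."""
    results = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        decisions = 0
        # ast.walk also visits nested functions (a simplification).
        for child in ast.walk(node):
            if isinstance(child, DECISION_NODES):
                decisions += 1
            elif isinstance(child, ast.BoolOp):
                # `a and b and c` short-circuits at two extra points
                decisions += len(child.values) - 1
        results[node.name] = decisions + 1
    return results


# Hypothetical function to measure
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "prime or one"
'''

print(cyclomatic_complexity(SAMPLE))  # {'classify': 6}
```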

80s: Fighting the “software crisis”

Boehm [4] defined the COCOMO model, an effort to quantify and predict software cost:
- Effort (person-months): \(E = a_{b}(KLOC)^{b_b}\)
- Development time (months): \(D = c_{b}(E)^{d_b}\)
- People required: \(P = E/D\)

\(a_b, b_b, c_b\) and \(d_b\) were derived from case studies of completed projects.
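Here is a minimal sketch of the basic model. The coefficients are written `a, b, c, d` in the code. Their values are the commonly tabulated basic COCOMO values for the three project types (organic, semi-detached, embedded). They are quoted from general knowledge rather than from this lecture, so check them against [4] before relying on them. The project size is hypothetical.

```python
# Commonly tabulated basic COCOMO coefficients (a, b, c, d) per project type.
# Quoted from general knowledge; verify against Boehm [4] before use.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, project_type="organic"):
    """Return (effort in person-months, duration in months, average staff)."""
    a, b, c, d = COEFFICIENTS[project_type]
    effort = a * kloc ** b
    duration = c * effort ** d
    people = effort / duration
    return effort, duration, people


# A hypothetical 32 KLOC organic project
effort, duration, people = basic_cocomo(32)
print(f"{effort:.1f} PM, {duration:.1f} months, {people:.1f} people")
```

Because the exponent on KLOC is above 1, the model predicts diseconomies of scale: effort grows faster than size.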

Function points were developed to express the amount of business functionality.
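As an illustration, unadjusted function points are a weighted count of a system's inputs, outputs, inquiries and files. The sketch below assumes the commonly used IFPUG weights per complexity level and omits the value adjustment step. The inventory is hypothetical.

```python
# Commonly used IFPUG weights (low, average, high) per component type
# (an assumption; verify against the counting manual in use).
WEIGHTS = {
    "external_input": (3, 4, 6),
    "external_output": (4, 5, 7),
    "external_inquiry": (3, 4, 6),
    "internal_logical_file": (7, 10, 15),
    "external_interface_file": (5, 7, 10),
}
LEVELS = {"low": 0, "average": 1, "high": 2}


def unadjusted_function_points(components):
    """Sum weighted counts; components holds (type, complexity, count) tuples."""
    return sum(WEIGHTS[kind][LEVELS[level]] * count
               for kind, level, count in components)


# Hypothetical inventory of a small order-management system
inventory = [
    ("external_input", "average", 5),
    ("external_output", "average", 3),
    ("external_inquiry", "low", 2),
    ("internal_logical_file", "average", 2),
    ("external_interface_file", "low", 1),
]
print(unadjusted_function_points(inventory))  # 66
```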

Both COCOMO and function points are widely used today for cost estimation.

80s: Lehman’s laws of software evolution

Manny Lehman [5] defined a set of laws that characterise how software evolves (and ultimately predict its demise), for example:

software must be continually adapted or it becomes progressively less satisfactory
as software evolves, its complexity increases unless work is done to maintain or reduce it

80s: Metrics galore!

Using metrics to define product and process quality

Defect density
Customer satisfaction
Backlog management
Bug removal effectiveness
Availability
6\(\sigma\)
… (more metrics)
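Several of these metrics are simple ratios. The sketch below uses common textbook definitions, though organisations define them differently in detail. The function names and numbers are hypothetical.

```python
def defect_density(defects, loc):
    """Defects per thousand lines of code (KLOC)."""
    return defects / (loc / 1000)


def defect_removal_effectiveness(found_before_release, found_after_release):
    """Share of all known defects that were removed before release."""
    return found_before_release / (found_before_release + found_after_release)


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical release data
print(defect_density(30, 15_000))            # 2.0 defects/KLOC
print(defect_removal_effectiveness(90, 10))  # 0.9
print(availability(999, 1))                  # 0.999
```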

90s: The ISO 9126 standard

[Figure: The ISO 9126 standard]

90s: Empirical software engineering

Basili [7]: The Goal-Question-Metric approach:

Goal: What do we need to study and why?
Question: Models to characterize the object of study.
Metric: One or more metrics to quantify the question models.

A goal is stated as follows:

Element	Description
Object of study	A tool or a practice
Purpose	Characterize, improve, predict, etc.
Focus	Perspective from which to study the problem
Stakeholder	Who is concerned with the result?
Context	Confounding factors (e.g. company, environment)

The GQM approach is another way of describing the scientific method.
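A GQM goal can be written down as a small data structure. The sketch below renders a goal using the template commonly seen in the GQM literature ("Analyze … for the purpose of … with respect to … from the viewpoint of … in the context of …"). Each question maps to the metrics that answer it. The class and the example goal are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Hypothetical representation of a GQM goal."""
    object_of_study: str
    purpose: str
    focus: str
    stakeholder: str
    context: str
    # Each question maps to the metrics that answer it.
    questions: dict = field(default_factory=dict)

    def statement(self):
        return (f"Analyze {self.object_of_study} for the purpose of "
                f"{self.purpose} with respect to {self.focus} from the "
                f"viewpoint of the {self.stakeholder} in the context of "
                f"{self.context}.")

    def metrics(self):
        """All distinct metrics needed to answer the questions."""
        return sorted({m for ms in self.questions.values() for m in ms})


# Hypothetical goal
goal = Goal(
    object_of_study="code review",
    purpose="improvement",
    focus="review turnaround time",
    stakeholder="team lead",
    context="a single product team",
    questions={
        "How long do reviews take?": ["hours to first comment", "hours to merge"],
        "Do larger changes wait longer?": ["lines changed", "hours to merge"],
    },
)
print(goal.statement())
print(goal.metrics())
```

One metric ("hours to merge") serves two questions. This is typical of GQM, where metrics are selected top-down from the questions rather than collected for their own sake.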

90s: New technical developments

Object-oriented programming: Existing metrics did not account for encapsulation or nesting
- The Chidamber and Kemerer [8] metrics capture OO characteristics, e.g. Depth of Inheritance Tree (DIT). They were found to predict defects in OO projects already at the design phase.
- Used to guide refactoring efforts
Open Source Software:
- Exciting, participatory, distributed development method
- Abundance of product and process data: Version Control Systems, Bug Tracking Databases, Mailing Lists, IRC chats
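Inheritance-based CK metrics can be computed by walking a class hierarchy. The sketch below uses Python class reflection (`__bases__`, `__subclasses__()`) to compute DIT and NOC (number of children, another CK metric). Treating classes whose only base is `object` as roots is an assumption. The shape hierarchy is hypothetical.

```python
def depth_of_inheritance(cls):
    """DIT: length of the longest path from cls up to a root class.

    Assumption: classes whose only base is `object` count as roots (DIT 0).
    """
    bases = [b for b in cls.__bases__ if b is not object]
    if not bases:
        return 0
    return 1 + max(depth_of_inheritance(b) for b in bases)


def number_of_children(cls):
    """NOC: number of immediate subclasses."""
    return len(cls.__subclasses__())


# Hypothetical class hierarchy
class Shape: ...
class Polygon(Shape): ...
class Circle(Shape): ...
class Triangle(Polygon): ...
class Square(Polygon): ...


print(depth_of_inheritance(Square), number_of_children(Shape))  # 2 2
```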

00s: The primordial OSS study

Mockus et al.: “Two case studies of open source software development: Apache and Mozilla” [9]

Not the first to use OSS data, but:

Pioneered the case study as a research instrument in QSE
Used robust statistical methods
Combined traces from multiple OSS data sources (VCS/BTS)

00s: How does OSS work?

von Krogh et al.: “Community, joining, and specialization in open source software innovation: a case study” [10]

Defined the now-obvious vocabulary of OSS research:

joining script
contribution barrier
coordination and awareness

Herbsleb and Mockus: “An empirical study of speed and communication in globally distributed software development” [11]

“… distributed work items take about 2.5x as long to complete as items where all the work is colocated”
Established the basis of the socio-technical congruence team model

00s: Software engineering ❤️ Data Science

Zimmermann et al.: “Mining Version Histories to Guide Software Changes” [12]

Very important work because:

Used unsupervised learning (association rules) to answer the question: “Programmers who changed these functions also changed …”
Kickstarted a research area known as “Mining Software Repositories”
The MSR conference, the premier venue for software analytics research, has skyrocketed to become the 8th most important venue in software systems research
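The idea behind "programmers who changed these functions also changed …" can be sketched as association-rule mining over commits. Each commit is treated as a transaction of changed files. This minimal version only considers pairs of files, and its thresholds and history are hypothetical. It is not the tool or algorithm of [12].

```python
from collections import Counter
from itertools import combinations


def co_change_rules(commits, min_support=2, min_confidence=0.5):
    """Mine rules "changed X => also changed Y" from commit transactions.

    commits: one set of changed files per commit.
    support(X => Y): number of commits that changed both X and Y.
    confidence(X => Y): support divided by the number of commits that changed X.
    The default thresholds are hypothetical.
    """
    file_count = Counter()
    pair_count = Counter()
    for files in commits:
        file_count.update(files)
        for pair in combinations(sorted(files), 2):
            pair_count[pair] += 1

    rules = []
    for (x, y), support in pair_count.items():
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = support / file_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical commit history
COMMITS = [
    {"parser.c", "parser.h"},
    {"parser.c", "parser.h", "lexer.c"},
    {"parser.c", "parser.h"},
    {"lexer.c", "README"},
    {"parser.c", "tests.c"},
]
for lhs, rhs, support, confidence in co_change_rules(COMMITS):
    print(f"changed {lhs} => also change {rhs} "
          f"(support {support}, confidence {confidence:.2f})")
```

Here, a developer editing `parser.h` would be told that `parser.c` changed alongside it in every past commit (confidence 1.00). The reverse rule is weaker (0.75), because `parser.c` was also changed on its own.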

00s: Making money with metrics

Nagappan et al.: “Mining Metrics to Predict Component Failures” [13]

“Using principal component analysis on the code metrics, we built regression models that accurately predict the likelihood of post-release defects for new entities.”

Heitlager et al.: “A Practical Model for Measuring Maintainability” [14]

“we identify a number of requirements to be fulfilled by a maintainability model to be usable in practice. We sketch a new maintainability model that alleviates most of these problems, and we discuss our experiences with using such as system for IT management consultancy activities.”

00s: Finding bugs – The holy grail

Noteworthy findings (at the file level):

Churn correlates with bug-proneness
Previous bugs are good predictors of future bugs
The more recent bugs a file has, the more future bugs it tends to have
Number of developers does not correlate with number of bugs
Complexity does not correlate with number of bugs
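Findings such as the correlation between churn and bugs are typically backed by a correlation coefficient. Below is a minimal sketch of Spearman's rank correlation, a common choice for skewed count data such as churn and bug counts. The per-file data is hypothetical, and the studies behind these findings may have used other statistics.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-file data: lines churned and post-release bugs
churn = [120, 15, 300, 45, 80]
bugs = [3, 0, 9, 1, 4]
print(round(spearman(churn, bugs), 2))  # 0.9
```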

Late 00s: MSR goes mainstream

Predicting component failures: Hassan [15] found a connection between process metrics and bugs
Distributed software development: Bird et al. [16] found that software quality in Windows Vista was not noticeably affected by distributed development
No model to rule them all: Zimmermann et al. [17] established that software projects are different and therefore models need to be localised and specialised.
Naturalness: Hindle et al. [18] found that “code is very repetitive, and in fact even more so than natural languages”
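Hassan [15] measures the complexity of code changes with Shannon entropy. Changes scattered over many files in a period count as more complex than changes concentrated in one file. The simplified sketch below normalises entropy by the number of distinct files touched. This is an assumption: the paper's exact normalisation and period definition may differ. The file names are hypothetical.

```python
from collections import Counter
from math import log2


def change_entropy(changed_files):
    """Normalised Shannon entropy of how modifications spread over files.

    changed_files: one entry per file modification in a time period.
    Assumption: normalised by the number of distinct files touched, so the
    result is 0.0 when every change hits one file and 1.0 when changes are
    spread evenly over all files touched.
    """
    counts = Counter(changed_files)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * log2(c / total) for c in counts.values())
    return h / log2(len(counts))  # divide by the maximum possible entropy


# Hypothetical periods
print(change_entropy(["ui.py", "ui.py", "ui.py", "ui.py"]))    # 0.0
print(change_entropy(["ui.py", "db.py", "net.py", "auth.py"]))  # 1.0
```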

10s: OSS v2, the AppStore and DevOps

In the early 10s, the velocity of software production increased at a breakneck rate

GitHub revolutionized OSS by centralizing it. Anyone can contribute (and contribute they do!).
AppStores made discoverability and distribution to the end client trivial.
The cloud transformed hardware into software.

Software analytics was coined as a term for using data to help teams improve their performance

10s: Learning from Big Data

Big Software: GHTorrent (Gousios [19]) made TBs of GitHub data available to researchers. Inspired TravisTorrent [20] and SOTorrent [21]
Big testing: Herzig et al. [22] developed “a cost model, which dynamically skips tests when the expected cost of running the test exceeds the expected cost of removing it.”
Big security: Gorla et al. [23] “after clustering Android apps by their description topics, (we) identified outliers in each cluster with respect to their API usage.”
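The rule quoted for Herzig et al. compares two expected costs. Below is a deliberately simplified, hypothetical version of such a comparison. The cost terms, the `CostProfile` class and the numbers are assumptions for illustration, not the model of [22].

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Hypothetical per-test estimates, e.g. derived from past runs."""
    run_cost: float         # cost of executing the test once
    p_false_alarm: float    # chance a run fails without a real defect
    inspection_cost: float  # cost of triaging such a failure
    p_catch: float          # chance the run reveals a real defect
    escape_cost: float      # cost if that defect is found later instead


def should_skip(t):
    """Skip when running is expected to cost more than not running
    (a simplified, hypothetical decision rule)."""
    expected_run = t.run_cost + t.p_false_alarm * t.inspection_cost
    expected_skip = t.p_catch * t.escape_cost
    return expected_run > expected_skip


flaky = CostProfile(run_cost=5, p_false_alarm=0.3, inspection_cost=50,
                    p_catch=0.001, escape_cost=1000)
valuable = CostProfile(run_cost=1, p_false_alarm=0.01, inspection_cost=50,
                       p_catch=0.05, escape_cost=1000)
print(should_skip(flaky), should_skip(valuable))  # True False
```

The flaky, rarely useful test is skipped. The cheap test that often catches defects keeps running.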

10s: Code as Data

Code summarization: Allamanis et al. [24] use CNNs to automatically give names to methods based on their contents
Code search: Gu et al. [25] search for code snippets using natural language queries
PR Duplicates: Nijessen [26] used deep learning to find duplicate PRs

An overview can be seen in this taxonomy.

This course

In this course, we will focus on state-of-the-art research in the areas of:

Testing
Building and Continuous integration
Release engineering and DevOps
Ecosystems
Runtime
App stores

What are software analytics?

Modern software engineering

Modern software development

Various definitions

Ref	Author	Definition
[27]	Hassan	[Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes.
[28]	Buse	The idea of analytics is to leverage potentially large amounts of data into real and actionable insights.
[29]	Zhang	Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.
[31]	Menzies	Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.

D: So what are software analytics?

Software analytics: The main goal

The broader goal of software analytics is to extract value from data traces residing in software repositories, in order to help developers write better software.

The software analytics feedback loop

Content Credits

ISO-9126 figure, from WikiCommons
Feedback loop figure, from ACG Research

Bibliography

[1]

T. J. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, vol. SE–2, no. 4, pp. 308–320, Dec. 1976.

[2]

M. H. Halstead, Elements of software science (operating and programming systems series). Elsevier Science Inc., New York, NY, 1977.

[3]

B. Curtis, S. B. Sheppard, P. Milliman, M. A. Borst, and T. Love, “Measuring the psychological complexity of software maintenance tasks with the Halstead and McCabe metrics,” IEEE Transactions on Software Engineering, vol. SE–5, no. 2, pp. 96–104, 1979.

[4]

B. W. Boehm, Software engineering economics. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[5]

M. M. Lehman, “Programs, life cycles, and laws of software evolution,” Proceedings of the IEEE, vol. 68, no. 9, pp. 1060–1076, 1980.

[6]

V. R. Basili, R. W. Selby, and D. H. Hutchens, “Experimentation in software engineering,” IEEE Transactions on Software Engineering, no. 7, pp. 733–743, 1986.

[7]

V. R. Basili, “Software modeling and measurement: The goal/question/metric paradigm,” 1992.

[8]

S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476–493, 1994.

[9]

A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and Mozilla,” ACM Trans. Softw. Eng. Methodol., vol. 11, no. 3, pp. 309–346, Jul. 2002.

[10]

G. von Krogh, S. Spaeth, and K. R. Lakhani, “Community, joining, and specialization in open source software innovation: A case study,” Research Policy, vol. 32, no. 7, pp. 1217–1241, 2003.

[11]

J. D. Herbsleb and A. Mockus, “An empirical study of speed and communication in globally distributed software development,” IEEE Transactions on Software Engineering, vol. 29, no. 6, pp. 481–494, 2003.

[12]

T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” in Proceedings of the 26th international conference on software engineering, 2004, pp. 563–572.

[13]

N. Nagappan, T. Ball, and A. Zeller, “Mining metrics to predict component failures,” in Proceedings of the 28th international conference on software engineering, 2006, pp. 452–461.

[14]

I. Heitlager, T. Kuipers, and J. Visser, “A practical model for measuring maintainability,” in 6th international conference on the quality of information and communications technology (QUATIC 2007), 2007, pp. 30–39.

[15]

A. E. Hassan, “Predicting faults using the complexity of code changes,” in Proceedings of the 31st international conference on software engineering, 2009, pp. 78–88.

[16]

C. Bird, N. Nagappan, P. Devanbu, H. Gall, and B. Murphy, “Does distributed development affect software quality? An empirical case study of Windows Vista,” in Proceedings of the 31st international conference on software engineering, 2009, pp. 518–528.

[17]

T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, 2009, pp. 91–100.

[18]

A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, “On the naturalness of software,” in Proceedings of the 34th international conference on software engineering (ICSE), 2012, pp. 837–847.

[19]

G. Gousios, “The GHTorrent dataset and tool suite,” in Proceedings of the 10th working conference on mining software repositories, 2013, pp. 233–236.

[20]

M. Beller, G. Gousios, and A. Zaidman, “TravisTorrent: Synthesizing Travis CI and GitHub for full-stack research on continuous integration,” in Proceedings of the 14th working conference on mining software repositories, 2017.

[21]

S. Baltes, L. Dumani, C. Treude, and S. Diehl, “SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts,” in Proceedings of the 15th international conference on mining software repositories, 2018, pp. 319–330.

[22]

K. Herzig, M. Greiler, J. Czerwonka, and B. Murphy, “The art of testing less without sacrificing quality,” in Proceedings of the 37th international conference on software engineering - volume 1, 2015, pp. 483–493.

[23]

A. Gorla, I. Tavecchia, F. Gross, and A. Zeller, “Checking app behavior against app descriptions,” in Proceedings of the 36th international conference on software engineering, 2014, pp. 1025–1035.

[24]

M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in International conference on machine learning, 2016, pp. 2091–2100.

[25]

X. Gu, H. Zhang, and S. Kim, “Deep code search,” in Proceedings of the 40th international conference on software engineering, 2018, pp. 933–944.

[26]

R. Nijessen, “A case for deep learning in mining software repositories.” TU Delft, Delft, NL, Nov. 2017.

[27]

A. E. Hassan and T. Xie, “Software intelligence: The future of mining software engineering data,” in Proceedings of the FSE/SDP workshop on future of software engineering research, 2010, pp. 161–166.

[28]

R. P. L. Buse and T. Zimmermann, “Analytics for software development,” in Proceedings of the FSE/SDP workshop on future of software engineering research, 2010, pp. 77–80.

[29]

D. Zhang, Y. Dang, J.-G. Lou, S. Han, H. Zhang, and T. Xie, “Software analytics as a learning case in practice: Approaches and experiences,” in Proceedings of the international workshop on machine learning technologies in software engineering, 2011, pp. 55–58.

[31]

T. Menzies and T. Zimmermann, “The many faces of software analytics,” IEEE Software, vol. 30, no. 5, pp. 28–29, 2013.