## Software analytics

Software analytics is a modern term for the use of empirical (mostly quantitative) research methods on software data.

In this lecture, we will:

• Review basic empirical research methods
• Present important developments in quantitative software engineering research
• Discuss what software analytics are

## Why quantitative software engineering?

Quantitative software engineering is a subset of empirical software engineering, a discipline that aims to:

• Transform data traces into working knowledge
• Identify and optimise patterns and trends
• Quantify and improve processes and products

D: Can you identify potential applications of quantitative software engineering?

## (Some) Applications of software analytics

• Pattern mining and recommendation
  • “You changed this file; perhaps you want to change that one as well”
  • “This bug report is a duplicate of…”
  • Autocompletion in Eclipse
• Process improvement
  • Test case runtime optimisation
  • Work prioritization
• Evolution analysis
• Software quality analysis

# Empirical Research methods: A quick primer

## Empirical research

Empiricism is a philosophical theory that states that true knowledge can only arise from systematically observing the world.

Types of empirical research:

• Experiment: Measure the effect of an intervention
• Case study: (usually post-mortem) analysis of the properties of specific cases
• Field study: Observation of behaviours in their context
• Grounded Theory: Iteratively refine theories based on new data
• Action research: Embed into context, modify it, measure the effects

## Data collection methods

Empirical research requires the collection of data to answer research questions (RQs).

• Qualitative research methods collect non-numerical data:
  • Surveys
  • Interviews
  • Focus groups
• Quantitative research methods use mathematical, statistical or numerical techniques to process numerical data:
  • Modelling
  • Machine learning
  • Simulations

## Hypotheses

A hypothesis proposes an explanation for a phenomenon.

Hypotheses are defined in pairs:

• $$H_0$$: Null hypothesis (the effect under study does not exist)
• $$H_1$$: Alternative hypothesis (the proposed effect exists)

A good hypothesis is readily falsifiable.

## Hypotheses | p-values

• Most statistical tests return a $$p$$-value: the probability of observing a result at least as extreme as the one measured, assuming that $$H_0$$ is true.

• To interpret a test, we set a significance threshold (usually 0.05) for $$p$$

• If $$p <$$ threshold, the null hypothesis is rejected and the alternative one is accepted

• We need to know beforehand what each statistical test assumes and measures
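The threshold logic above can be made concrete with a small simulation. Below is a minimal sketch of a two-sided permutation test in pure Python; the review-time numbers and the helper name `permutation_test` are hypothetical, used only for illustration.

```python
import random

def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=42):
    """Two-sided permutation test: estimate the p-value for H0
    'both samples come from the same distribution', using the
    absolute difference in means as the test statistic."""
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the observations
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical code-review turnaround times (hours) for two teams
p = permutation_test([4, 5, 6, 5, 7], [8, 9, 7, 10, 9])
print("reject H0" if p < 0.05 else "fail to reject H0")
```

Permutation tests are convenient in software analytics because they make no assumption about the underlying distribution of the data, which is rarely normal in software repositories.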

## Theories

A theory is a proposed explanation for an observed phenomenon. It (usually) specifies entities and prescribes their interactions. Using a theoretical model, we can both explain and predict observations.

Q: How can we build or dismantle a theory?

Theories are built by generalizing over consecutive research results.

A single contradicting data point is enough to reject a theory.

## Measurement

Extract samples of data from a running process. Data types:

• Continuous or Ratio: Differences and ratios between measurements are meaningful; there is a true zero
• Interval: Differences between measurements are meaningful, but there is no true zero, so ratios are not
• Categorical: One of a predefined set (sorting does not make sense)
• Ordinal: One of a predefined set (sorting makes sense)

# A brief history of Quantitative Software Engineering

• late 70s – Metrics: McCabe Complexity, Halstead software science, LoCs
• mid 80s – Measuring projects: Function Points, COCOMO
• mid 90s – Modeling: Reliability, Cost, Maintainability. Goal-Question-Metric.
• mid 00s – OSS repositories: Mining VCS, BTS, MLs
• mid 10s – Software analytics and big data

## 70s: Basic metrics

• McCabe’s complexity [1]: An attempt to quantify complexity at the function level by counting the number of branches.

• Halstead’s software science [2]: An attempt to derive general laws of software growth

• Curtis et al. [3] found that: “All three metrics (Halstead volume, McCabe complexity, LoCs) correlated with both the accuracy of the modification and the time to completion.”

They just work!
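To illustrate how such a metric is computed, here is a rough sketch of a McCabe-style complexity counter for Python code, using the standard `ast` module. The set of node types treated as branch points is a simplification of McCabe's original definition, chosen for illustration.

```python
import ast

# Node types counted as decision points -- a simplification of
# McCabe's definition; boolean operators and ternaries count too.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def mccabe_complexity(source: str) -> int:
    """Cyclomatic complexity = number of branch points + 1."""
    tree = ast.parse(source)
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    return branches + 1

snippet = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
# if + elif are two decision points, so complexity is 3
print(mccabe_complexity(snippet))
```

Production tools such as lint plugins use the same idea, but handle `match` statements, comprehensions and per-function scoping more carefully.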

## 80s: Fighting the “software crisis”

• Boehm [4] defined the COCOMO model, an effort to quantify and predict software cost:

• Effort: $$E = a_{b}(KLOC)^b$$
• Development time: $$D = c_{b}(E)^{d_b}$$
• People required: $$P = E/D$$

The coefficients $$a_b$$, $$b$$, $$c_b$$ and $$d_b$$ were estimated from case study data.

• Function points were developed to express the amount of business functionality.

Both COCOMO and function points are widely used today for cost estimation.
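The basic COCOMO formulas above can be evaluated directly. The sketch below assumes Boehm's published basic-model coefficients for an "organic" project (small team, familiar problem domain); other project classes use different coefficient values.

```python
# Basic COCOMO coefficients for an "organic" project (Boehm, 1981)
A_B, B = 2.4, 1.05    # effort equation: E = a_b * KLOC^b
C_B, D_B = 2.5, 0.38  # schedule equation: D = c_b * E^d_b

def cocomo(kloc: float):
    """Return (effort in person-months, development time in months,
    average number of people required) for a project of `kloc` KLOC."""
    effort = A_B * kloc ** B
    dev_time = C_B * effort ** D_B
    people = effort / dev_time
    return effort, dev_time, people

e, d, p = cocomo(32)  # a hypothetical 32 KLOC project
print(f"{e:.1f} person-months, {d:.1f} months, {p:.1f} people")
```

Note how the exponent $$b > 1$$ encodes diseconomies of scale: doubling the code size more than doubles the estimated effort.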

## 80s: Lehman’s laws of software evolution

Manny Lehman [5] defined a set of laws that characterise how software evolves (and ultimately predict its demise)

• software must be continually adapted or it becomes progressively less satisfactory
• as software evolves, its complexity increases unless work is done to maintain or reduce it

## 80s: Metrics galore!

Using metrics to define product and process quality

• Defect density
• Customer satisfaction
• Backlog management
• Bug removal effectiveness
• Availability
• 6$$\sigma$$
• … (more metrics)

## 90s: Empirical software engineering

Basili [7]: The Goal-Question-Metric approach:

• Goal: What do we need to study and why?
• Question: Questions that characterise how the goal will be assessed
• Metric: One or more metrics that quantify the answers to each question

A goal is stated as follows:

| What            | Example                                         |
|-----------------|-------------------------------------------------|
| Object of study | A tool or a practice                            |
| Purpose         | Characterize, improve, predict etc.             |
| Focus           | Perspective to study the problem from           |
| Stakeholder     | Who is concerned with the result?               |
| Context         | Confounding factors (e.g. company, environment) |

The GQM approach is another way of describing the scientific method.

## 90s: New technical developments

• Object-oriented programming: Existing metrics did not consider encapsulation or nesting
  • The Chidamber and Kemerer metrics [8] capture OO characteristics, e.g. Depth of Inheritance Tree, and were found to predict defects in OO projects already at the design phase.
  • Used to guide refactoring efforts
• Open Source Software:
  • An exciting, participatory, distributed development method
  • Abundance of product and process data: version control systems, bug tracking databases, mailing lists, IRC chats

## 00s: The primordial OSS study

Mockus et al.: “Two case studies of open source software development: Apache and Mozilla” [9]

Not the first to use OSS data, but:

• Pioneered the case study as a research instrument in QSE
• Used robust statistical methods
• Combined traces from multiple OSS data sources (VCS/BTS)

## 00s: How does OSS work?

von Krogh et al.: “Community, joining, and specialization in open source software innovation: a case study” [10]

Defined the now-obvious vocabulary of OSS research:

• joining script
• contribution barrier
• coordination and awareness

Herbsleb and Mockus: “An empirical study of speed and communication in globally distributed software development” [11]

• “… distributed work items take about 2.5x as long to complete as items where all the work is colocated”
• Established the basis of the socio-technical congruence team model

## 00s: Software engineering ❤️ Data Science

Zimmermann et al.: “Mining Version Histories to Guide Software Changes” [12]

Very important work because:

• Used unsupervised learning (association rules) to answer the question: “Programmers who changed these functions also changed …”
• Kickstarted a research area known as “Mining Software Repositories”
• The MSR conference, the premier conference for software analytics research, has skyrocketed to become the 8th most important venue in software systems research
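The co-change idea behind this line of work can be sketched with a deliberately simplified association-rule miner: from sets of files changed together in commits, derive rules of the form "developers who changed a also changed b", scored by support and confidence. The commit history, file names and thresholds below are hypothetical.

```python
from itertools import permutations
from collections import Counter

def cochange_rules(commits, min_support=2, min_confidence=0.6):
    """Mine pairwise rules 'changed a -> also change b' from commit
    file sets (a toy version of association-rule mining)."""
    file_count = Counter()  # how often each file was changed
    pair_count = Counter()  # how often each ordered pair co-changed
    for files in commits:
        file_count.update(files)
        pair_count.update(permutations(files, 2))
    rules = []
    for (a, b), support in pair_count.items():
        confidence = support / file_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, round(confidence, 2)))
    return rules

# Hypothetical commit history: each set is one commit's changed files
history = [
    {"parser.c", "parser.h"},
    {"parser.c", "parser.h", "lexer.c"},
    {"parser.c", "parser.h"},
    {"lexer.c"},
]
for a, b, sup, conf in cochange_rules(history):
    print(f"changed {a} -> also change {b} "
          f"(support={sup}, confidence={conf})")
```

Real miners work at finer granularity (functions, not files) and prune rules far more aggressively, but the support/confidence scoring is the same.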

## 00s: Making money with metrics

Nagappan et al.: “Mining Metrics to Predict Component Failures” [13]

• “Using principal component analysis on the code metrics, we built regression models that accurately predict the likelihood of post-release defects for new entities.”

Heitlager et al.: “A Practical Model for Measuring Maintainability” [14]

• “We identify a number of requirements to be fulfilled by a maintainability model to be usable in practice. We sketch a new maintainability model that alleviates most of these problems, and we discuss our experiences with using such a system for IT management consultancy activities.”

## 00s: Finding bugs – The holy grail

Noteworthy findings (at the file level):

• Churn correlates with bug proneness
• Previous bugs are good predictors of future bugs
• The more recent the bugs, the more future bugs
• The number of developers does not correlate with the number of bugs
• Complexity does not correlate with the number of bugs
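Findings like "churn correlates with bug proneness" rest on simple correlation analyses over per-file measurements. A minimal sketch, with hypothetical data and Pearson's correlation coefficient computed from scratch:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-file measurements: lines churned vs. bugs reported
churn = [120, 45, 300, 10, 200, 6]
bugs = [3, 1, 9, 0, 6, 0]
r = pearson(churn, bugs)
print(f"churn vs. bugs: r = {r:.2f}")
```

In practice, researchers prefer rank correlations (e.g. Spearman's) for repository data, since metric distributions are heavily skewed and Pearson's coefficient is sensitive to outliers.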

## Late 00s: MSR goes mainstream

• Predicting component failures: Hassan [15] found a connection between process metrics and bugs

• Distributed software development: Bird et al. [16] found that software quality is not affected by distance

• No model to rule them all: Zimmermann et al. [17] established that software projects are different and therefore models need to be localised and specialised.

• Naturalness: Hindle et al. [18] found that “code is very repetitive, and in fact even more so than natural languages”

## 10s: OSS v2, the AppStore and DevOps

In the early 10s, the velocity of software production increased at a breakneck rate.

• GitHub revolutionized OSS by centralizing it. Anyone can contribute (and contribute they do!).

• AppStores made discoverability and distribution to the end client trivial.

• The cloud transformed hardware into software.

“Software analytics” was coined as a term for helping teams improve their performance.

## 10s: Learning from Big Data

• Big Software: GHTorrent (Gousios [19]) made TBs of GitHub data available to researchers. Inspired TravisTorrent [20] and SOTorrent [21]

• Big testing: Herzig et al. [22] developed “a cost model, which dynamically skips tests when the expected cost of running the test exceeds the expected cost of removing it.”

• Big security: Gorla et al. [23]: “after clustering Android apps by their description topics, (we) identified outliers in each cluster with respect to their API usage.”

## 10s: Code as Data

• Code summarization: Allamanis et al. [24] use CNNs to automatically give names to methods based on their contents

• Code search: Gu et al. [25] search for code snippets using natural language queries

• PR duplicates: Nijessen [26] used deep learning to find duplicate PRs

An overview can be seen in this taxonomy.

## This course

In this course, we will focus on state of the art research in the areas of:

• Testing
• Building and Continuous integration
• Release engineering and DevOps
• Ecosystems
• Runtime
• App stores

# What are software analytics?

## Various definitions

| Ref  | Who?    | Definition |
|------|---------|------------|
| [27] | Hassan  | [Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. |
| [28] | Buse    | The idea of analytics is to leverage potentially large amounts of data into real and actionable insights. |
| [29] | Zhang   | Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. |
| [30] | Menzies | Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions. |

D: So what are software analytics?

## Software analytics: The main goal

The broader goal of software analytics is to extract value from the data traces residing in software repositories, in order to assist developers in writing better software.

## Bibliography

[1] T. J. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308–320, Dec. 1976.

[2] M. H. Halstead and others, Elements of software science (operating and programming systems series). Elsevier Science Inc., New York, NY, 1977.

[3] B. Curtis, S. B. Sheppard, P. Milliman, M. A. Borst, and T. Love, “Measuring the psychological complexity of software maintenance tasks with the Halstead and McCabe metrics,” IEEE Transactions on Software Engineering, vol. SE-5, no. 2, pp. 96–104, March 1979.

[4] B. W. Boehm and others, Software engineering economics, vol. 197. Prentice-hall Englewood Cliffs (NJ), 1981.

[5] M. M. Lehman, “Programs, life cycles, and laws of software evolution,” Proceedings of the IEEE, vol. 68, no. 9, pp. 1060–1076, Sept 1980.

[6] V. R. Basili, R. W. Selby, and D. H. Hutchens, “Experimentation in software engineering,” IEEE Transactions on software engineering, no. 7, pp. 733–743, 1986.

[7] V. R. Basili, “Software modeling and measurement: The goal/question/metric paradigm,” 1992.

[8] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on software engineering, vol. 20, no. 6, pp. 476–493, 1994.

[9] A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and mozilla,” ACM Trans. Softw. Eng. Methodol., vol. 11, no. 3, pp. 309–346, Jul. 2002.

[10] G. von Krogh, S. Spaeth, and K. R. Lakhani, “Community, joining, and specialization in open source software innovation: A case study,” Research Policy, vol. 32, no. 7, pp. 1217–1241, 2003.

[11] J. D. Herbsleb and A. Mockus, “An empirical study of speed and communication in globally distributed software development,” IEEE Transactions on Software Engineering, vol. 29, no. 6, pp. 481–494, June 2003.

[12] T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” in Proceedings of the 26th international conference on software engineering, 2004, pp. 563–572.

[13] N. Nagappan, T. Ball, and A. Zeller, “Mining metrics to predict component failures,” in Proceedings of the 28th international conference on software engineering, 2006, pp. 452–461.

[14] I. Heitlager, T. Kuipers, and J. Visser, “A practical model for measuring maintainability,” in 6th international conference on the quality of information and communications technology (quatic 2007), 2007, pp. 30–39.

[15] A. E. Hassan, “Predicting faults using the complexity of code changes,” in Proceedings of the 31st international conference on software engineering, 2009, pp. 78–88.

[16] C. Bird, N. Nagappan, P. Devanbu, H. Gall, and B. Murphy, “Does distributed development affect software quality? An empirical case study of windows vista,” in Proceedings of the 31st international conference on software engineering, 2009, pp. 518–528.

[17] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the the 7th joint meeting of the european software engineering conference and the acm sigsoft symposium on the foundations of software engineering, 2009, pp. 91–100.

[18] A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, “On the naturalness of software,” in Proceedings of the 34th international conference on software engineering, 2012, pp. 837–847.

[19] G. Gousios, “The GHTorrent dataset and tool suite,” in Proceedings of the 10th working conference on mining software repositories, 2013, pp. 233–236.

[20] M. Beller, G. Gousios, and A. Zaidman, “TravisTorrent: Synthesizing travis ci and github for full-stack research on continuous integration,” in Proceedings of the 14th working conference on mining software repositories, 2017.

[21] S. Baltes, L. Dumani, C. Treude, and S. Diehl, “SOTorrent: Reconstructing and analyzing the evolution of stack overflow posts,” in Proceedings of the 15th international conference on mining software repositories, 2018, pp. 319–330.

[22] K. Herzig, M. Greiler, J. Czerwonka, and B. Murphy, “The art of testing less without sacrificing quality,” in Proceedings of the 37th international conference on software engineering - volume 1, 2015, pp. 483–493.

[23] A. Gorla, I. Tavecchia, F. Gross, and A. Zeller, “Checking app behavior against app descriptions,” in Proceedings of the 36th international conference on software engineering, 2014, pp. 1025–1035.

[24] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in International conference on machine learning, 2016, pp. 2091–2100.

[25] X. Gu, H. Zhang, and S. Kim, “Deep code search,” in Proceedings of the 40th international conference on software engineering, 2018, pp. 933–944.

[26] R. Nijessen, “A case for deep learning in mining software repositories.” TU Delft, Delft, NL, Nov-2017.

[27] A. E. Hassan and T. Xie, “Software intelligence: The future of mining software engineering data,” in Proceedings of the fse/sdp workshop on future of software engineering research, 2010, pp. 161–166.

[28] R. P. Buse and T. Zimmermann, “Analytics for software development,” in Proceedings of the fse/sdp workshop on future of software engineering research, 2010, pp. 77–80.

[29] D. Zhang, Y. Dang, J.-G. Lou, S. Han, H. Zhang, and T. Xie, “Software analytics as a learning case in practice: Approaches and experiences,” in Proceedings of the international workshop on machine learning technologies in software engineering, 2011, pp. 55–58.

[30] T. Menzies and T. Zimmermann, “Software analytics: So what?” IEEE Software, vol. 30, no. 4, pp. 31–37, July 2013.