Software repositories archive valuable software engineering data, such as source code, execution traces, historical code changes, mailing lists, and bug reports. This data contains a wealth of information about a project’s status and history. By doing data science on software repositories, researchers can gain an empirically based understanding of software development practices, and practitioners can better manage, maintain, and evolve complex software projects.
IN4334 is a seminar course that aims to give students a deep understanding of, and a hands-on approach to, software analytics.
This course will enable students to:
5 ECTS: This means that you need to devote at least 140 hours of study to this course, per person. Given that the course runs over a period of 7 weeks, the workload is around 20 hours a week.
Lectures: The course consists of 14 two-hour lectures. You are not required, but you are strongly encouraged, to attend. We will be discussing 2-3 papers (presentations given either by the lecturer or by teams) in terms of techniques, insights and impact.
Homework: Before each lecture, you must read and prepare questions about the papers that will be discussed during the lecture. You can find the list of the papers to read at the beginning of each week’s lecture.
Lecturers: The course is supervised by Georgios Gousios, who is responsible for the content, assignments and exams. Several people will provide extra lectures on topics within their expertise.
Course work: To finish the course you will need to:
Groups: You will work in groups of 3-4 people. You are free to choose your group partners.
Labs: Unsupervised, optional. 4 hours per week, designed to give you a place and time to work together.
During the course, the students will engage in two collaborative projects:
Every year, tens of papers are published in the area of software analytics. This leads to a high noise-to-signal ratio: lots of papers containing marginal insights. For outsiders, it is really difficult to obtain an overview of what software analytics has to offer to software projects.
To make things easier for newcomers, we will collaboratively work on a high-quality summary of the area, outlining the current state of the art and future challenges. To make this work, the course instructor will provide an outline of the area, pointers to important papers and a paper skeleton; students will have to summarize a sub-area of software analytics.
Task duration: 5 weeks
Replication is a topic much touted but seldom practiced in the mining software repositories and software analytics communities. It is, however, a core aspect of science, especially empirical science.
The purpose of this task is to attempt a replication of a recent paper by downloading readily available data sets published together with the paper, by requesting the data from the original authors, or by applying the same techniques to a different sample. You will select a paper from the list that you studied for your literature survey.
Task duration: 5 weeks
The following material is a must-read in the study of software analytics.
Date | Week | Lecture | Topic | Lecturer |
---|---|---|---|---|
3/9 | 1 | 1 | Course Introduction, Quantitative methods in Software Engineering | MB |
5/9 | 1 | 2 | Discussion Groups | GG |
10/9 | 2 | 1 | Process Analytics | AR / GG |
12/9 | 2 | 2 | Testing Analytics | students, MB |
17/9 | 3 | 1 | Build Analytics | students, MB |
19/9 | 3 | 2 | Bug Prediction | students, MB |
24/9 | 4 | 1 | Software Ecosystem Analytics | students, JH |
26/9 | 4 | 2 | Release Engineering Analytics | students, AR |
1/10 | 5 | 1 | Results: Survey on Software Analytics | students |
3/10 | 5 | 2 | Code Review | students, GG |
8/10 | 6 | 1 | Runtime and Performance Analytics, Cross-review of surveys | MK |
10/10 | 6 | 2 | App Store Analytics | students, MK |
15/10 | 7 | 1 | Analytics at Work: ING | Hennie Huijgens |
17/10 | 7 | 2 | Results: Replication project results | students |
Lecturers
The final course grade will be calculated as:
All deliverables will be peer-reviewed by 2 other teams. The peer-review grade is 50% of the final grade per grade item. The results add up to 110%.
[1] C. Bird, T. Menzies, and T. Zimmermann, The art and science of analyzing software data, 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2015.
The course contents are copyrighted (c) 2018 - onwards by TU Delft and their respective authors and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.