How to run our seminars

Purpose of seminars

The purpose of a seminar is to read and discuss papers critically:

  • Distinguish main contributions
  • Comment on the appropriateness of methods
  • Discuss the validity and generalizability of the results
  • Generate questions and discuss potential answers
  • Draw inferences and think of future work
  • Understand (retrospectively!) the paper’s importance

Discussing a paper: Moderator’s duties

  1. Have the audience read the paper and come up with a list of questions
  2. Prepare a short presentation of the paper
  3. Prepare a list of questions about the paper
  4. While the discussion is running:
    • Ensure that everyone participates
    • Ensure that everyone’s opinion is being heard
    • Keep track of time per participant
    • Take notes of opinions and discussion points
  5. After the discussion: summarize the main points discussed

Discussing a paper: Participant’s duties

  1. Read the paper before the discussion and come up with a set of questions or discussion points
  2. Annotate parts of the paper (e.g., underline important sentences), especially those that are surprising to you
  3. Write a short review / commentary about the paper
  4. Ask the moderator clarification questions
  5. Actively think about, take notes on, and participate in the discussion

What is a good discussion question?

A good question:

  • Does not have a binary (i.e., yes/no) or quantitative (e.g., 15) answer
  • Encourages participants to explain “how”
  • Motivates participants to make connections to things they know already

Types of good questions:

  • Comparative: Contrast various approaches / aspects / opinions
  • Connective: Link examples, facts, theories, etc.
  • Critical: Break down an argument, provide counterarguments

Example questions

Characterize the following questions:

  • What aspects of software complexity does the paper capture?
  • Using what techniques could the author have analyzed the results better?
  • How many projects did the author analyze?
  • Is statistical test X suitable for this paper?
  • How would you replicate this paper?

Role playing

In some cases, actually assigning roles may help the discussion:

  • Problem or Dilemma poser: The moderator
  • Reflective analyst: Keeps track of the discussion and periodically sums it up
  • Devil’s advocate: When consensus emerges, formulates an alternative view
  • Detective: Listens carefully for unacknowledged biases that seem to be emerging

Another form of role playing is debating. Two groups are assigned opposing views and defend them until an agreement is reached.

Tips for being a moderator

  • Encourage everyone to participate:
    • Make arrangements so that everyone is heard
    • Ask questions to specific people
    • Never criticise any opinion. Welcome inadequate answers.
  • Help participants summarize and articulate what they’ve learned

  • Be honest, open and inviting.

  • Most importantly: Keep notes!

Our reading sessions

Before each session:

  • The moderator announces the papers to be discussed
  • The teacher randomly selects a group of 10 participants with whom the papers are shared
  • The participants read the papers and prepare a list of questions

At the end of each session:

  • All materials are returned to the teacher
  • The moderators grade the participants, and vice versa

A seminar example

The paper

We will discuss the paper by Zimmermann et al., “Cross-project defect prediction” [1].

Motivation

Why do this research?

  • Defect prediction saves time and money if done right
  • Can we train a defect predictor to work across projects?
  • Let’s evaluate this empirically

Research method

What did the authors do?

The authors took several versions of 12 projects, extracted features and trained a logistic regression model to predict whether a component is defect-prone.

Then, they trained a model on project version A to predict defect proneness on project version B. They failed to do so with good precision and recall.

Finally, they quantified the effect of project similarity on prediction accuracy, and also attempted to rank the features by how much they contribute to cross-project prediction.
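The train-on-A, predict-on-B workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' setup: the two projects are synthetic, the two features and all helper names (make_project, train_logistic_regression, predict, precision_recall) are hypothetical, and the logistic regression is a plain gradient-descent fit written from scratch so that the example has no dependencies.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit weights and a bias with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Label a component defect-prone (1) if its probability is >= 0.5."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_project(seed, n, weights):
    """Hypothetical project: two made-up features in [0, 1] and a defect
    label drawn from a project-specific logistic relationship."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.random(), rng.random()]  # e.g. scaled churn, complexity
        p = sigmoid(sum(w * x for w, x in zip(weights, xi)) - 1.0)
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


# Assumption: projects A and B relate their features to defects differently.
X_a, y_a = make_project(seed=1, n=200, weights=[4.0, 0.0])
X_b, y_b = make_project(seed=2, n=200, weights=[0.0, 4.0])

model_a = train_logistic_regression(X_a, y_a)
within = precision_recall(y_a, predict(model_a, X_a))
cross = precision_recall(y_b, predict(model_a, X_b))
print("within-project precision/recall:", within)
print("cross-project precision/recall:", cross)
```

Comparing the two printed pairs shows the kind of gap the paper investigates: the same model is scored once on the project it was trained on and once on a different project.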

Results

What did the authors find?

The authors argue (and present evidence) that cross-project defect prediction does not work.

Through feature ranking, they find that the number of samples, use of specific technologies and average churn contribute the most to cross-project predictive power.

Discussion / Implications

Why are the results important?

A consequence for research is that rather than increasing the precision and recall of models by some small percentage, it should focus on how to make defect prediction work across projects and relevant for a wide audience. We believe that this will be an important trend for software engineering in general. Learn from one project, to improve another.

A set of questions to discuss

  • Let’s say that you have to describe this paper to a colleague. How would you summarize it?

Defect prediction as a technique

  • Why is defect prediction a thing?
  • In which situations would you use defect prediction?
  • What motivates the authors to compare Firefox and IE?

Work method

  • Is the definition of defect proneness satisfactory? What could a more fine-grained version of it look like?
  • If you were to extend the prediction model, what features would you use?
  • Why do the authors need to check similarity of features?
  • Are there any alternative ways to check feature similarity?

Implications

  • Can the results be generalized?
  • 3 key things to take away

Seminar discussion

by Thomas Kluiters

The moderator opened the discussion by asking how one would summarize the paper.

Multiple students gave their summaries; the moderator noted that a summary should be told like a story.

After the paper had been summarized, the moderator asked the students whether they had any technical questions about it.

General questions

General questions asked:

  • “I wonder what technical defects are, what kind of technical defects are out there?”
  • “What kind of defects do the authors explore?”
  • “What types of defects can we actually predict?”

The moderator put the first question to the audience, and the group agreed that a technical defect is “anything that’s a bug, something that doesn’t conform to the specification.”

The second question was answered briefly, as it was factual and the answer could be found in the paper: the authors explored post-release defects.

A remark was made that these days we can apply patches to systems that contain defects and fix bugs much more quickly.

On modeling / predicting

A student asked another technical question: “Why did the authors choose to use logistic regression?”

The moderator agreed that this was debatable and first asked the students, “What is logistic regression?”

The students then defined logistic regression and agreed on its definition.

The follow-up question was “What other methods could the authors have used? Why would they use logistic regression?”

A student responded that logistic regression is very simple to apply and easy to explain, so the authors used it in the interest of simplicity.

The moderator noted the different ways we can classify data these days, going over SVMs, random forests and decision trees.

A student gave an opinion on the choice of logistic regression: “It’s an arguable decision to use logistic regression, as logistic regression does not seem fit.”

Another student disagreed: “It makes sense to use logistic regression as it’s interpretable by humans, in other words, ‘explainable’ to humans.”

A third student disagreed with this and argued that neural networks can be explained and would outperform logistic regression considerably.

The moderator raised a controversial opinion: “Perhaps the authors did not know any better?”

The instructor added to this opinion: “The paper was published in 2009; is it possible for the authors to have known about neural networks then?”

The moderator continued: “The paper was published in 2009; is it possible that neural networks were not as popular then?”

A student answered: “They should have known, though; it would yield the best result.”

The instructor also brought up model fit: “What about the model fit? Did they report the model fit?” (Model fit describes how well a model fits the data.)

The students responded that the paper does not mention the model fit.
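To make “model fit” concrete: for a logistic regression, fit is commonly summarized by the log-likelihood of the observed labels and by McFadden's pseudo-R², which compares the model against a null model that always predicts the base rate. The sketch below is an illustration only; the labels and probabilities are made up, the function names are hypothetical, and nothing here is taken from the paper.

```python
import math


def log_likelihood(y_true, probs):
    """Log-likelihood of binary labels under predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    return sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for y, p in zip(y_true, probs)
    )


def mcfadden_r2(y_true, probs):
    """McFadden's pseudo-R^2: 1 - LL(model) / LL(null model).

    The null model predicts the overall defect rate for every component.
    Assumes both classes occur in y_true.
    """
    base_rate = sum(y_true) / len(y_true)
    if base_rate in (0.0, 1.0):
        raise ValueError("y_true must contain both classes")
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    ll_model = log_likelihood(y_true, probs)
    return 1.0 - ll_model / ll_null


# Made-up example: four components, two of them defective.
labels = [1, 0, 1, 0]
uninformative = [0.5, 0.5, 0.5, 0.5]  # same as the null model
informative = [0.9, 0.1, 0.8, 0.2]    # leans the right way each time

print(mcfadden_r2(labels, uninformative))
print(mcfadden_r2(labels, informative))
```

A value near 0 means the model explains the labels no better than the base rate; values closer to 1 indicate a better fit.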

On the features used for modeling

The moderator then raised a new question: “Now I have a question about the modeling used by the authors. They have used their own set of 40 features… Do you think there is any feature missing?”

Some students replied:

  • “I feel like the (project) age is missing”
  • “Introducing more features will only make the logistic regression more vague”
  • “The curse of dimensionality” (too many features will weaken the logistic regression)

The moderator went on to ask the students about the features: “Do they have any process features?”

Multiple students answered and agreed that the authors could have chosen more process features.

A short discussion followed on the difference between a product feature and a process feature. The moderator commented that the authors were constrained by the data they were given, rather than gathering their own. Furthermore, the moderator noted that researchers choose to use either quantitative or qualitative data.

The students agreed that the authors should have formed a hypothesis about which features were worth keeping, instead of using all of them.

On generalizability

The moderator then raised the question: “In your opinion, can the results of this paper be generalized?”

  • “I think it’s very hard to generalize this over multiple projects and domains. Software is specific and targets specific use-cases, and I think these metrics do not allow for broader application.”
  • “I think they just made a model and tried to generalize it, and it didn’t work. Software engineering isn’t a manufacturing process, so this won’t work.”
  • “The paper is honest about the results and admits that more research should be done.” (The moderator noted that this answer could apply to any empirical research.)

Core message

Lastly, the moderator asked the students: “At a higher level of abstraction, what’s the message of this paper?”

After a short discussion, the moderator and students agreed on the following: “The value of this paper is the message it conveys: doing cross-project research does not really work.”

References

[1] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 91–100.

[2] S. A. Stuff, “Leading an interactive discussion: Elective seminar,” Case Western University, 2016.