Paper discussion: Prazi

We discuss a paper under submission, entitled: “Prazi: From Package-based to Precise Call-based Dependency Network Analyses”, by anonymous authors.

Paper

  • Conference paper (10 pages)
  • Anonymized - no author names appear in the paper or replication package

People

We don’t know the names of the authors.

Q: Can we guess who the authors are from the work? Should we also do this as reviewers?

For those interested in knowing more, see [1].

Motivation

Why do this research?

  • Package repositories are fragile to changes (also an understudied topic in SE research)
  • Dependency analyses are imprecise with a high degree of false positives
  • Metadata-based approaches cannot localize problems
  • Current metadata-based approaches only scratch the surface of what ecosystem analyses can do

Research method

What does the paper do?

The paper proposes a technique to derive dependency networks at function-call granularity and provides a reference implementation for Rust’s ecosystem.

How is it done? The technique combines package dependency resolution with call graph extraction to construct a dependency network representing how packages call each other.

Given a seed set of packages with dependency descriptors, the technique:

  1. resolves the dependencies of the packages in the seed set recursively (transitive closure)
  2. downloads the source code of each package in the transitive closure
  3. infers a call graph for each package and annotates function nodes with dependency information
  4. merges the annotated call graphs into a single dependency network
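The four steps can be sketched on toy data. This is a minimal illustration, not the paper's implementation: the registry, the per-package call graphs, and every function name below are hypothetical, version resolution is left out entirely, and downloading and call graph inference are replaced by a dictionary lookup.

```python
from collections import deque

# Hypothetical toy registry: package -> direct dependencies (no versions).
REGISTRY = {
    "app": ["http", "log"],
    "http": ["url"],
    "log": [],
    "url": [],
}

# Hypothetical per-package call graphs as (caller, callee) pairs.
# Functions are written as "package::function".
CALL_GRAPHS = {
    "app": [("app::main", "http::get"), ("app::main", "log::info")],
    "http": [("http::get", "url::parse")],
    "log": [],
    "url": [("url::parse", "url::decode")],
}


def transitive_closure(seed):
    """Step 1: resolve dependencies recursively (breadth-first)."""
    seen, queue = set(seed), deque(seed)
    while queue:
        pkg = queue.popleft()
        for dep in REGISTRY.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def annotated_call_graph(pkg):
    """Steps 2-3: stand-in for fetching sources and inferring a call graph.

    Each edge is annotated with the package owning the caller and the callee.
    """
    return [
        (caller, callee, caller.split("::")[0], callee.split("::")[0])
        for caller, callee in CALL_GRAPHS.get(pkg, [])
    ]


def build_network(seed):
    """Step 4: merge all annotated call graphs into one network."""
    functions, edges = {}, set()
    for pkg in sorted(transitive_closure(seed)):
        for caller, callee, caller_pkg, callee_pkg in annotated_call_graph(pkg):
            functions[caller] = caller_pkg
            functions[callee] = callee_pkg
            edges.add((caller, callee))
    return functions, edges


functions, edges = build_network(["app"])
# Edges that cross a package boundary are the call-based dependencies.
cross_package = sorted((a, b) for a, b in edges if functions[a] != functions[b])
print(cross_package)
```

The cross-package edges are what distinguish this network from a package-level one: they show which functions of a dependency are actually called, not just that the dependency is declared.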

The reference implementation is applied to two use cases, security and deprecation.

Results

What does the paper find?

  • It is feasible to construct a dependency network based on call data; it was possible to build nearly 70% of the Rust ecosystem
  • A case study on security vulnerability propagation shows that the call-based dependency network (CDN) exhibits higher precision and accuracy than the package-based dependency network (PDN), but also significantly lower recall
  • Half of the dependents in the “transitive closure” of 6 package releases with deprecated functions could face problems if those functions were removed
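To make the precision/recall trade-off concrete, here is a small sketch with made-up sets. The package names and numbers are purely illustrative and are not taken from the paper: a package-level analysis flags every transitive dependent, while a call-level analysis flags only dependents with a known call path to the vulnerable function.

```python
def precision_recall(flagged, affected):
    """Return (precision, recall) of a flagged set against the ground truth."""
    true_positives = len(flagged & affected)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(affected) if affected else 0.0
    return precision, recall


# Hypothetical ground truth: packages that really reach the vulnerable code.
affected = {"a", "b", "c", "d"}

# Package-level analysis: flags all dependents, so nothing is missed,
# but half of the warnings are false positives.
pdn_flagged = {"a", "b", "c", "d", "e", "f", "g", "h"}

# Call-level analysis: flags only dependents with a call path it could find,
# so every warning is correct, but an unsound call graph misses some.
cdn_flagged = {"a", "b"}

print("PDN:", precision_recall(pdn_flagged, affected))
print("CDN:", precision_recall(cdn_flagged, affected))
```

In this toy setting the call-level analysis never raises a false alarm but misses half of the affected packages, which is the kind of trade-off the case study reports.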

Discussion / Implications

Why are the results important?

  • Previous research does not consider trade-off aspects such as precision and unsoundness
  • The fast-paced nature of package releases in an ecosystem strongly suggests that snapshot analyses are not enough
  • Both use cases demonstrate that dependency analysis on a function level provides precision benefits and reduces false positives
  • “The quality of the ___ is directly dependent on the quality of the call graph generator.”
  • The paper shares the weaknesses and difficulties of the reference implementation, which is important for researchers who wish to embark on something similar
  • Deprecation impact analysis is useful to library maintainers for identifying dependencies on deprecated functions that would lead to impaired functionality if the deprecated functions were to be removed

Questions

  • The paper states: “we also show that there is no principal objection to make ___ fully sound and precise.” What is the scientific basis on which this statement stands? Speculative: why would the paper contain such a statement?
  • The approach claims to be generic in several passages. Do we see impediments to accepting this claim?
  • The paper uses numerous arguments to justify implementing it for Rust. Do we believe the arguments are justified given the problems with the call graph generator?
  • To evaluate the completeness of the CDN, the paper reverse engineers it to a PDN. Is this justified? What are the problems of doing it this way?

  • Does the title capture the work well, or is it “click-bait”?
  • The paper mentions “novel” a couple of times. Do we find the technique novel?
  • Do the results of Peril 3 add a significant contribution to the paper?
  • The paper includes perils in the approach section. Do we believe it is necessary to directly specify limitations early on in the paper?
  • Do we find the motivation in the paper strong?
  • If reviewers asked us for advice, would we favor acceptance or rejection of this paper?

Discussion summary

by Max van Deursen

Q: What do you think about investigating who the author behind a paper is, when reviewing in a double-blind setting?

A: This might introduce a bias towards the paper and defeat the whole point of double-blind review.

Q: Do you think action should be taken against such investigating?

A: Such an investigation is usually not explicitly stated, which makes it hard to act upon. If it is completely clear that it took place, action could be taken.

Q: What were the key results of the paper? Did they show that this approach was feasible?

A: They did the research on the Rust Package Archive; however, this is a relatively small number of packages. With a bigger network, the approach might have been infeasible.

Q: What were the problems they mentioned and were these reasonably addressed in the paper?

A: In the threats to validity, they mentioned that the system sacrifices soundness, which is inevitable. Moreover, the paper stated that a static tool sacrifices a lot, missing the dynamic security aspects of analysis.

Q: How would a lower recall in this analysis play into different applications?

A: When an application prefers false positives over false negatives, for example where security is concerned, a lower recall does play a big role.

Q: Considering the results, do you think these are strong with respect to the motivation?

A: They do not seem very strong; at some point it is reasoned “When it doesn’t work, there is no reason why it should”. These results are partially a result of the CDN, so it might be necessary to decouple them. However, the practical implications also require the CDN techniques to be taken into account.

Q: Is the approach stated in the paper novel enough?

A: I haven’t seen any call graph over dependencies; in that sense, the approach is novel.

Q: When referring to Table II, the paper states that using a snapshot is not viable. However, the paper itself does use a snapshot. What do you think about this?

A: The problem might be that there is no better alternative to snapshots. In that case, snapshots would be viable. However, snapshots taken at different times lead to different conclusions, so from a practical perspective it is definitely not viable to use snapshots. For research, it might be.

Q: What has to be done for an incremental call graph update, to get rid of the snapshot technique?

A: Whenever a package is updated, update the call graph for that package. However, although this problem is addressed in the paper as well, many packages use wildcards in their dependency versions. This means that, in the worst case, the whole call graph has to be updated for each updated package, which might not be feasible, especially within large networks.
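The worst case mentioned in this answer can be sketched as a reverse reachability problem. The following is a rough illustration under simplifying assumptions that are ours, not the paper's: constraints are either a wildcard ("*") or an exact pin, a pinned dependent keeps the old release, and any package whose resolved dependency tree changes must have its call graph regenerated. All package names are hypothetical.

```python
from collections import deque

# Hypothetical constraints: dependent -> {dependency: version constraint}.
# "*" accepts any new release; anything else is treated as an exact pin.
CONSTRAINTS = {
    "app": {"http": "*", "log": "1.0.0"},
    "http": {"url": "*"},
    "cli": {"log": "*"},
    "url": {},
    "log": {},
}


def packages_to_reanalyze(updated):
    """Return every package whose call graph is stale after `updated` releases."""
    stale = {updated}
    queue = deque([updated])
    while queue:
        pkg = queue.popleft()
        for dependent, deps in CONSTRAINTS.items():
            if dependent in stale or pkg not in deps:
                continue
            # A direct dependent with a pinned constraint keeps the old release.
            if pkg == updated and deps[pkg] != "*":
                continue
            # Further up the chain the dependent's own version is unchanged,
            # so its dependents still resolve to it and become stale as well.
            stale.add(dependent)
            queue.append(dependent)
    return stale


print(sorted(packages_to_reanalyze("url")))
print(sorted(packages_to_reanalyze("log")))
```

A release of a widely used package with mostly wildcard dependents marks a large part of the network as stale, which is why incremental updates may approach the cost of a full rebuild.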

Q: What is the problem with using other languages using the same techniques? For example, why would this analysis be harder for dynamic languages?

A: Dynamic analysis is required to cover edge cases, since almost anything can be changed at runtime. The call graph generator cannot be accurate in that case.

Q: What can the problem be if Java was analyzed?

A: Java has both generic functions and some dynamic properties, such as reflection and dynamic dispatch.

Q: Multiple arguments are given for using Rust, but there are still a lot of problems using Rust. Do you think these problems are justifiable?

A: In the paper, it is mentioned that translating the CDN to a PDN still has better results. Therefore, it can be justified.

Q: Shouldn’t the paper have used a language which has more research in the field of generating a call graph?

A: Rust is indeed relatively new, and older languages with more call graph research behind them might have been a better choice.

Paper discussion: Valiev et al. [2]

We discuss the paper “Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem”, by Valiev et al.

Aim of the paper

What factors influence PyPI project survival?

  • The number of upstream dependencies is related to a lower probability of project survival.
  • The number of downstream dependencies is related to a greater probability of project survival.
  • Structural properties indicating more indirect connectivity through transitive dependencies are related to a greater probability of survival.
  • Backporting is related to a higher probability of project survival.
  • Projects supported by large organizations have a higher probability of survival.

Methodology

  • Combination of qualitative and quantitative research
  • Quantitative: Survival analysis of 130,000 PyPI packages
  • Qualitative: 10 semi-structured interviews
  • Used logistic regression for projects becoming dormant in the early stage
  • Used a Cox proportional hazards model to estimate regression coefficients for the later stage
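For readers unfamiliar with survival analysis, the sketch below computes a Kaplan-Meier survival curve in plain Python. Note that this is a simpler, swapped-in technique, not the Cox proportional hazards model used in the paper: it estimates the share of projects still active over time without any covariates. The durations and dormancy flags are invented for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each observed dormancy time.

    durations: months each project was followed.
    observed:  1 if the project became dormant at that time,
               0 if it was still active when observation ended (censored).
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        time = events[i][0]
        dormant = 0
        removed = 0
        # Group all projects that share the same duration.
        while i < len(events) and events[i][0] == time:
            dormant += events[i][1]
            removed += 1
            i += 1
        if dormant:
            survival *= 1 - dormant / at_risk
            curve.append((time, survival))
        # Dormant and censored projects both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical data: five projects, two of them still active (censored).
curve = kaplan_meier([3, 5, 5, 8, 12], [1, 1, 0, 1, 0])
for time, probability in curve:
    print(f"month {time}: {probability:.2f} still active")
```

A Cox model goes one step further by relating such survival times to covariates, for example the number of upstream dependencies.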

Results

  • The number of upstream dependencies is related to a lower probability of project survival. (inconclusive)
  • The number of downstream dependencies is related to a greater probability of project survival. (probably)
  • Structural properties indicating more indirect connectivity through transitive dependencies are related to a greater probability of survival. (cannot confirm)
  • Backporting is related to a higher probability of project survival. (cannot confirm)
  • Projects supported by large organizations have a higher probability of survival. (probably)

Discussion

by Max van Deursen

Q: What did we think of this paper? It was fairly new, without being formally published.

A: It is an interesting paper, which might have been groundbreaking if it had found a metric for dormant projects.

Q: Why would you focus on the dormant projects?

A: When deciding on a dependency, it is useful to know whether it is actively being developed.

Q: The paper uses a definition for dormancy, where a project is marked dormant if it had fewer than one commit on average per month for the last half year. What do you think about this definition?

A: Although the definition is not perfect, it might be that there is no better definition. Moreover, the data required for this definition is easy to obtain. Other indicators, such as issues versus commits, responses on new issues or inactive issues and pull requests might be indicators for dormancy as well.
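The definition as phrased in the question is indeed easy to compute from commit timestamps alone. A minimal sketch follows, assuming a half year is approximated as six months of 30.4 days; the function name and parameters are hypothetical, and the paper's exact operationalization may differ.

```python
from datetime import date, timedelta


def is_dormant(commit_dates, today, months=6, days_per_month=30.4):
    """True if the project averaged fewer than one commit per month recently."""
    window_start = today - timedelta(days=round(months * days_per_month))
    recent = sum(1 for day in commit_dates if window_start < day <= today)
    return recent / months < 1


today = date(2019, 7, 1)

# One commit in the middle of each of the last six months: exactly one per month.
monthly = [date(2019, month, 15) for month in range(1, 7)]

print(is_dormant(monthly, today))
print(is_dormant(monthly[:-1], today))
```

The check needs nothing but commit dates, which supports the point that the required data is easy to obtain; indicators such as issue response times would need additional data sources.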

Q: The paper has conducted qualitative research, consisting of only 10 interviewees. What do you think about this?

A: The number of interviewees is quite low.

Q: When would you have enough interviewees?

A: When the response of a single interviewee has a negligible impact on the results. Alternatively, if more respondents start giving the same responses, this might be an indication to stop. However, it has to be ensured that all roles in a development team are covered by the interviews.

Q: Ultimately, do we trust these interviews?

A: Firstly, the paper states that theoretical saturation was achieved after six interviews. However, the authors did not publish the exact questions asked, only the topics that were discussed. Although few interviews can lead to inaccurate results, these interviews are only used to explain the results, so the small number of interviews shouldn’t matter too much. However, using the interviews only to explain the results might mean that the paper is not really a mixed-methods paper anymore.

Q: The paper has only used projects which were on GitHub. Do you think this could influence the quantitative results?

A: GitHub has such a big share of projects that it should not influence the results much, even though not everything is represented. On this note, however, it could be interesting to see whether different repository platforms influence the survival rate of a project. For example, using GitHub versus using SourceForge.

Q: The paper used PyPI to retrieve the GitHub URLs of the projects. If you were to replicate this study, would you use only GitHub, only PyPI, or both?

A: If only GitHub was used, retrieving all Python packages would result in a lot of client applications being retrieved, not actual packages. However, PyPI lacks in-depth information such as individual commits and only includes releases. Therefore, it might be best to use both GitHub and PyPI, as they complement each other.

Q: The paper first conducted its quantitative research, before conducting the qualitative part of the research. What do you think of this sequence of events?

A: It might be that the quantitative research rendered unexpected results and that this was why a qualitative part was introduced to explain these findings. However, it might be best to do both parts alongside each other, since doing one after the other might result in bias for the latter part.

Q: If you were to do the qualitative part before the quantitative part, what questions would you ask in the interview? (Assuming that no hypothesis is set up)

A: “What do you think influences projects in becoming dormant after some time?” However, if we ask the class this question, we get answers such as: the core developers are bored or lack time to work on the project, the project is feature complete, or competitors are killing the project. As we notice, this does not match the definitions; however, many of these factors cannot be retrieved from the provided data.

Q: The paper formulates five hypotheses, three of which are inconclusive and two are probable. The paper does not provide hard conclusions. Does this tell us anything about the research strategy?

A: In any case, it is good that none of these conclusions are hard conclusions, because this makes a lot more sense than to have a hard conclusion from only one point of view.

Q: Do you think this paper will receive a lot of citations in the future?

A: This paper might be too specific to attract many citations. However, since it only tests the PyPI ecosystem, it might be cited by follow-up work that continues this research in a different ecosystem.

Q: In the end, can we say that this paper is important?

A: This paper might give a nice way to look at the health of a package. Moreover, there is a lot to gain in future work building on this paper. The driving factor will probably be businesses, which would want to look at the health of a certain dependency.