Pull Request Decisions Explained: An Empirical Overview

by Xunhui Zhang, Yue Yu, Georgios Gousios, and Ayushi Rastogi

A pre-print version is available at https://arxiv.org/pdf/2105.13970.pdf.
The publisher's page is available at https://doi.org/10.1109/TSE.2022.3165056.

Abstract

Context: The pull-based development model is widely used in open source and leads the trends in distributed software development. One aspect that has garnered significant attention is the study of pull request decisions, that is, identifying the factors that explain them. Objective: This study builds on a decade of research on pull request decisions to explain them. We empirically investigate how factors influence pull request decisions and the scenarios that change the influence of these factors. Method: We identify factors influencing pull request decisions on GitHub through a systematic literature review and infer them by mining archival data. We collect a total of 3,347,937 pull requests with 95 features from 11,230 diverse projects on GitHub. Using this data, we explore how the factors relate to each other and build mixed-effect logistic regression models to empirically explain pull request decisions. Results: Our study shows that a small number of factors explain pull request decisions, the most important being whether the integrator is the same person as the submitter. We also note that some factors are important only in special cases; e.g., the percentage of failed builds is important for pull request decisions when continuous integration is used.

BibTeX record

@article{ZTGR22,
  author = {Zhang, Xunhui and Yu, Yue and Gousios, Georgios and Rastogi, Ayushi},
  journal = {IEEE Transactions on Software Engineering},
  title = {Pull Request Decisions Explained: An Empirical Overview},
  year = {2023},
  volume = {49},
  number = {2},
  pages = {849--871},
  doi = {10.1109/TSE.2022.3165056},
  url = {https://arxiv.org/pdf/2105.13970.pdf}
}