Short Introduction to This Paper
This paper studies the 395 patches of the Defects4J dataset in depth. Quantitative properties (patch size and spreading) were extracted automatically, whereas qualitative ones (repair actions and patterns) were extracted manually using a thematic analysis-based approach. It found that:
1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only addition of lines;
2) 92% of the patches change only one file, and 38% have no spreading at all;
3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches;
4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks.
These results help researchers perform more advanced analyses of their techniques’ results on Defects4J. Moreover, this set of properties can be used to characterize and compare different bug datasets.
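To make the quantitative properties more concrete, here is a minimal sketch of how patch size and a coarse notion of spreading could be read off a unified diff. It is not the paper's tooling: the names `PatchStats` and `diff_stats` are hypothetical, size is approximated as added plus removed lines, and spreading is approximated by the number of files and chunks (hunks) only, so the paper's exact definitions may differ.

```python
import re
from dataclasses import dataclass

# Matches a unified-diff hunk header such as "@@ -10,3 +10,4 @@".
# A missing count (e.g. "@@ -5 +5 @@") means a count of 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass
class PatchStats:
    """Hypothetical container; field names are assumptions, not the paper's."""
    added: int    # lines added by the patch
    removed: int  # lines removed by the patch
    files: int    # number of files touched
    chunks: int   # number of hunks across all files

    @property
    def size(self) -> int:
        # Simplified patch size: added + removed lines.
        return self.added + self.removed

    @property
    def only_additions(self) -> bool:
        return self.added > 0 and self.removed == 0


def diff_stats(diff_text: str) -> PatchStats:
    added = removed = files = chunks = 0
    old_left = new_left = 0  # lines still expected in the current hunk body
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify by the first character.
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            chunks += 1
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_left = int(match.group(2)) if match.group(2) is not None else 1
        elif line.startswith("+++ "):
            files += 1  # one "+++" header per file in the diff
    return PatchStats(added, removed, files, chunks)


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/Foo.java",
        "+++ b/Foo.java",
        "@@ -10,3 +10,4 @@",
        " int x = a;",
        "+if (x < 0) return;",
        " int y = b;",
        " int z = c;",
    ])
    print(diff_stats(sample))
```

Tracking the line counts announced in each hunk header, rather than only checking line prefixes, keeps a removed line whose content begins with `--` from being mistaken for a file header.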
Highlights of This Paper
- The anatomy of the patches in Defects4J, covering an extensive set of patch properties, consolidated into a JSON file and augmented with a web user interface to facilitate exploration
- A bug dataset dissection methodology to extract valuable quantitative and qualitative properties regarding patches from bug datasets. The methodology is based on diff and advanced patch analysis and combines automated and manual thematic analysis
- A taxonomy of repair actions and patterns, resulting from manual analysis of patches according to our methodology
Key Information
Relevant Future Works
- Possibly a supplement to the Defects4J documentation
- Characterization and comparison between different bug datasets
URL
Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J