[Paper Review] Dissection of a Bug Dataset - Anatomy of 395 Patches from Defects4J
Short Introduction to This Paper
This paper studies in depth the 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were extracted automatically, whereas qualitative ones (repair actions and patterns) were extracted manually using a thematic analysis-based approach. It found that:
1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only addition of lines;
2) 92% of the patches change only one file, and 38% have no spreading at all;
3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches;
4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks.
These results help researchers perform deeper analysis of their techniques' results on Defects4J. Moreover, this set of properties can be used to characterize and compare different bug datasets.
Highlights of This Paper
- The anatomy of the patches in Defects4J containing an extensive set of patch properties, consolidated into a JSON file and augmented with a web user-interface to facilitate exploration
- A bug dataset dissection methodology to extract valuable quantitative and qualitative properties regarding patches from bug datasets. The methodology is based on diff and advanced patch analysis and combines automated and manual thematic analysis
- A taxonomy of repair actions and patterns, resulting from manual analysis of patches according to the paper's methodology
Key Information
Research questions:
- What is the size distribution of Defects4J patches?
- To what extent are Defects4J patches spread in source code?
- What is the composition of Defects4J patches in terms of repair actions (additions, removals and modifications) over code elements (e.g. conditions and method calls)?
- What repair patterns can be found in Defects4J using a manual thematic analysis?
- Data collection: for each bug, the authors first produced diff views between the buggy program version and its associated fixed version. These views served as the source for data extraction and analysis
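To make the quantitative side more concrete, here is a minimal Python sketch of how patch size and a coarse notion of spreading could be read off a unified diff. It is an illustration only, not the paper's tooling: the names `PatchStats` and `patch_stats` are hypothetical, size is simply added plus removed lines, and spreading is approximated by the number of files and hunks. The paper's exact counting rules (for example, how a modified line is counted) are not reproduced here.

```python
import re
from dataclasses import dataclass

# Unified-diff hunk header, e.g. "@@ -10,3 +10,6 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass
class PatchStats:  # hypothetical container, not from the paper
    added: int
    removed: int
    files: int
    hunks: int

    @property
    def size(self) -> int:
        # Assumption: size = added + removed lines (no pairing into "modified").
        return self.added + self.removed


def patch_stats(diff_text: str) -> PatchStats:
    added = removed = hunks = 0
    files = set()
    old_left = new_left = 0  # lines still expected inside the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line belongs to both versions
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            hunks += 1
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("+++ "):
            files.add(line[4:].split("\t")[0])
    return PatchStats(added, removed, len(files), hunks)


SAMPLE = """\
--- a/src/Foo.java
+++ b/src/Foo.java
@@ -10,3 +10,6 @@
 int f(String s) {
+    if (s == null) {
+        return 0;
+    }
-    return s.length();
+    return s.trim().length();
 }
"""

stats = patch_stats(SAMPLE)
```

For the made-up `SAMPLE` diff, `stats.size` is 5 (four added lines and one removed line), in one file and one hunk. Running the same function over all 395 diffs and taking `statistics.median` of the sizes would give a rough counterpart to the paper's size distribution, subject to the counting assumptions above.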
Repair Patterns in the Defects4J Patches:
- Conditional Block
- Expression Fix
- Wraps-with / Unwraps-from
- Single Line
- Wrong Reference
- Missing Null-Check
- Copy / Paste
- Constant Change
- Code Moving
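The paper identified these patterns by manual thematic analysis. Purely as an illustration of what two of the labels mean, the Python sketch below flags "Missing Null-Check" and "Single Line" from the added and removed lines of a patch. The function `guess_patterns` and its rules are hypothetical simplifications based on a plain reading of the pattern names, not the authors' definitions or procedure.

```python
import re

# Matches Java comparisons such as "x == null", "x != null", "null == x".
NULL_CHECK_RE = re.compile(r"[!=]=\s*null\b|\bnull\s*[!=]=")


def guess_patterns(added: list[str], removed: list[str]) -> set[str]:
    """Return rough pattern labels for a patch (hypothetical heuristic)."""
    labels = set()

    # Assumption: a null comparison appears in added code but not in removed code.
    adds_null_check = any(NULL_CHECK_RE.search(line) for line in added)
    had_null_check = any(NULL_CHECK_RE.search(line) for line in removed)
    if adds_null_check and not had_null_check:
        labels.add("Missing Null-Check")

    # Assumption: one line added, one removed, or one replaced by another.
    if max(len(added), len(removed)) == 1:
        labels.add("Single Line")

    return labels


null_fix = guess_patterns(["if (s == null) return 0;"], [])
operator_fix = guess_patterns(["return a + b;"], ["return a - b;"])
two_line_fix = guess_patterns(["x = 1;", "y = 2;"], [])
```

A textual heuristic like this will miss many cases and mislabel others, which is consistent with the paper's choice to extract the qualitative properties manually.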
Relevant Future Works
- Possibly supplementing the Defects4J documentation with these patch properties
- Characterization and comparison between different bug datasets
URL
Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J