[Paper Review] How to Design a Program Repair Bot - Insights from the Repairnator Project

Short Introduction to This Paper

This paper investigates the concept of a "program repair bot" and presents Repairnator. The Repairnator bot is an autonomous agent that constantly monitors test failures, reproduces bugs, and runs program repair tools against each reproduced bug. If a patch is found, the Repairnator bot reports it to the developers. At the time of writing, Repairnator uses three different program repair systems and has been operating since February 2017. In total, it has studied 11 317 test failures over 1 609 open-source software projects hosted on GitHub, and has generated patches for 17 different bugs.

Highlights of This Paper

A blueprint design of a program repair bot for continuous integration (CI) test failures
A set of unique empirical facts about program repair and bug reproduction collected over 11 317 test failures across 1 609 software projects
7 recommendations to help future authors of program repair bots

Key Information

Figure 1: The overview workflow of the Repairnator program repair bot

Repairnator Workflow:
- The primary inputs of Repairnator are continuous-integration builds, triggered by commits made by developers (top part of the figure, arrows (a) and (b)). The outputs of Repairnator are two-fold: (1) it automatically produces patches, if any, for repairing failing builds (g); (2) it collects valuable data on program repair in the field (h), for future research in this area (k).
- The Repairnator bot itself works as follows. It continuously monitors all CI activity of the projects in a specific configuration list (c). The CI builds are given as input to a pipeline that contains three stages: (1) a first stage, called CI Build Analysis, that collects and analyzes CI builds (d) from GitHub projects (a and b); (2) a second stage, called Bug Reproduction, that aims at reproducing the build failures that have happened on CI; (3) a third stage, called Patch Synthesis, that uses the failure reproduction information to search for patches.
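
The three stages can be illustrated with a small Python sketch. This is not Repairnator's code (Repairnator itself is a separate system with its own architecture); the `Build` and `Patch` records, the `reproduce` callback, and the repair-tool callables are all hypothetical names introduced here only to show how the stages chain together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Build:
    """Minimal CI build record (hypothetical fields, not Repairnator's schema)."""
    project: str
    build_id: int
    status: str         # e.g. "passed" or "failed", taken from CI metadata
    failing_tests: int  # number of failing tests reported for the build


@dataclass(frozen=True)
class Patch:
    """A candidate patch produced by one repair tool for one build."""
    build_id: int
    tool: str
    diff: str


# A repair tool receives a reproduced build and returns a diff, or None if no patch is found.
RepairTool = Callable[[Build], Optional[str]]


def analyze_builds(builds: Iterable[Build], watched: Set[str]) -> List[Build]:
    """Stage 1 (CI Build Analysis): keep failed builds with test failures from watched projects."""
    return [
        b for b in builds
        if b.project in watched and b.status == "failed" and b.failing_tests > 0
    ]


def run_pipeline(
    builds: Iterable[Build],
    watched: Set[str],
    reproduce: Callable[[Build], bool],
    tools: Dict[str, RepairTool],
) -> List[Patch]:
    """Chain the three stages and return every patch found."""
    patches: List[Patch] = []
    for build in analyze_builds(builds, watched):
        # Stage 2 (Bug Reproduction): skip builds whose failure cannot be reproduced.
        if not reproduce(build):
            continue
        # Stage 3 (Patch Synthesis): try each repair tool on the reproduced failure.
        for name, tool in tools.items():
            diff = tool(build)
            if diff is not None:
                patches.append(Patch(build.build_id, name, diff))
    return patches
```

In this sketch, `reproduce` stands in for the whole reproduction step (checkout, compilation, test execution), and `tools` maps a tool name to a callable, so several repair systems can be tried on the same reproduced failure.
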
Recommendations:
- List of Considered GitHub Projects: Query the APIs of the CI and code hosting services directly to build an appropriate, up-to-date list of projects
- Analyzing Build Information: To the maximum extent, stick to the metadata provided by the considered CI service, and think twice before parsing logs, which is very tedious and error-prone
- The problem of merge commits: Take great care to get exactly the same code state as CI. When a build comes from a pull request, reproduce the merge commit yourself
- Managing the dependencies: Run the bug reproduction (compilation and test execution) in a well-isolated environment. Local caches and containerization help a lot to achieve good isolation
- The problem of spurious bugs: Consider replicating the TravisCI environment and running the TravisCI build scripts for repair attempts: the additional effort may be offset by the number and quality of reproduced bugs
- About multi-module projects: Existing repair tools do not handle multi-module Maven projects. This is a major barrier to wide applicability in the field. If you were to design a new repair tool, take care of multi-module projects right from the beginning
- About response time: Consider implementing CI hooks for program repair bots; this is a good way to minimize the repair bot's response time
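
As an illustration of the merge-commit recommendation, here is a minimal Python sketch that builds the git commands needed to recreate a pull-request merge locally. It is an assumption about how one might do this, not the procedure used by Repairnator: the function names, the `remote` default, and the choice of `--no-ff` are hypothetical. It relies on GitHub exposing pull-request heads under `refs/pull/<number>/head`.

```python
import subprocess
from typing import List, Optional


def merge_reproduction_commands(
    base_sha: str,
    head_sha: str,
    pr_number: Optional[int] = None,
    remote: str = "origin",
) -> List[List[str]]:
    """Return git commands that merge the PR head into the base commit.

    base_sha is the target-branch commit that CI built against,
    head_sha is the tip of the pull-request branch.
    """
    commands = [["git", "fetch", remote]]
    if pr_number is not None:
        # Fetch the PR head explicitly; it may live in a fork and not be
        # reachable from the regular branches of the remote.
        commands.append(["git", "fetch", remote, f"pull/{pr_number}/head"])
    # Detach at the exact base commit rather than the current branch tip,
    # which may have moved since the CI build ran.
    commands.append(["git", "checkout", "--detach", base_sha])
    # Recreate the merge commit locally without opening an editor.
    commands.append(["git", "merge", "--no-ff", "--no-edit", head_sha])
    return commands


def run_commands(commands: List[List[str]], repo_dir: str) -> None:
    """Execute the commands in repo_dir, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Keeping command construction separate from execution makes the sequence easy to log and inspect before anything touches the working copy. A merge conflict makes `git merge` exit with a non-zero status, so `run_commands` raises `subprocess.CalledProcessError` and the build can be recorded as not reproducible.
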

Relevant Future Works

Targeting specific kinds of failures can be an interesting strategy for designing effective future repair tools
Taking advantage of the valuable data collected by Repairnator to optimize program repair techniques

URL

How to Design a Program Repair Bot - Insights from the Repairnator Project
