A systematic literature review on benchmarks for evaluating debugging approaches

Thomas Hirsch; Birgit Gertraud Hofer

doi:10.1016/j.jss.2022.111423

Thomas Hirsch, Birgit Gertraud Hofer^*

^*Corresponding author for this work

Institute of Software Technology (7160)

Research output: Contribution to journal › Article › peer-review

Abstract

Bug benchmarks are used in the development and evaluation of debugging approaches, e.g., fault localization and automated repair. Quantitative performance comparison of different debugging approaches is only possible when they have been evaluated on the same dataset or benchmark. However, benchmarks are often specialized towards certain debugging approaches in their contained data, metrics, and artifacts. Such benchmarks cannot easily be used with debugging approaches outside their scope, as such approaches may rely on specific data, such as bug reports or code metrics, that are not included in the dataset. Furthermore, benchmarks vary in size w.r.t. the number of subject programs and the size of the individual subject programs. For these reasons, we have performed a systematic literature review in which we identified 73 benchmarks that can be used to evaluate debugging approaches. We compare the different benchmarks w.r.t. their size and the provided information, such as bug reports, contained test cases, and other code metrics. This comparison is intended to help researchers quickly identify all suitable benchmarks for evaluating their specific debugging approaches. Furthermore, we discuss recurring issues and challenges in the selection, acquisition, and usage of such bug benchmarks, i.e., data availability, data quality, duplicated content, data formats, reproducibility, and extensibility. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

Original language	English
Article number	111423
Number of pages	17
Journal	Journal of Systems and Software
Volume	192
DOIs	https://doi.org/10.1016/j.jss.2022.111423
Publication status	Published - Oct 2022

Keywords

Debugging
Benchmark
Fault localization
Automated repair
Automatic repair

ASJC Scopus subject areas

Software
Information Systems
Hardware and Architecture

Fields of Expertise

Information, Communication & Computing

Treatment code (detailed classification)

Basic - Fundamental (basic research)

Access to Document

10.1016/j.jss.2022.111423

Licence: CC BY 4.0

Cite this

@article{98703337716f4302ae0ac8e741b69faf,

title = "A systematic literature review on benchmarks for evaluating debugging approaches",

abstract = "Bug benchmarks are used in development and evaluation of debugging approaches, e.g. fault localization and automated repair. Quantitative performance comparison of different debugging approaches is only possible when they have been evaluated on the same dataset or benchmark. However, benchmarks are often specialized towards usage for certain debugging approaches in their contained data, metrics, and artifacts. Such benchmarks cannot be easily used on debugging approaches outside their scope as such approach may rely on specific data such as bug reports or code metrics that are not included in the dataset. Furthermore, benchmarks vary in their size w.r.t. the number of subject programs and the size of the individual subject programs. For these reasons, we have performed a systematic literature review where we have identified 73 benchmarks that can be used to evaluate debugging approaches. We compare the different benchmarks w.r.t. their size and the provided information such as bug reports, contained test cases, and other code metrics. This comparison is intended to help researchers to quickly identify all suitable benchmarks for evaluating their specific debugging approaches. Furthermore, we discuss reoccurring issues and challenges in selection, acquisition, and usage of such bug benchmarks, i.e., data availability, data quality, duplicated content, data formats, reproducibility, and extensibility. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.",

keywords = "Debugging, Benchmark, Fault localization, Automated repair, Automatic repair",

author = "Hirsch, Thomas and Hofer, {Birgit Gertraud}",

year = "2022",

month = oct,

doi = "10.1016/j.jss.2022.111423",

language = "English",

volume = "192",

journal = "Journal of Systems and Software",

issn = "0164-1212",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - A systematic literature review on benchmarks for evaluating debugging approaches

AU - Hirsch, Thomas

AU - Hofer, Birgit Gertraud

PY - 2022/10

Y1 - 2022/10

N2 - Bug benchmarks are used in development and evaluation of debugging approaches, e.g. fault localization and automated repair. Quantitative performance comparison of different debugging approaches is only possible when they have been evaluated on the same dataset or benchmark. However, benchmarks are often specialized towards usage for certain debugging approaches in their contained data, metrics, and artifacts. Such benchmarks cannot be easily used on debugging approaches outside their scope as such approach may rely on specific data such as bug reports or code metrics that are not included in the dataset. Furthermore, benchmarks vary in their size w.r.t. the number of subject programs and the size of the individual subject programs. For these reasons, we have performed a systematic literature review where we have identified 73 benchmarks that can be used to evaluate debugging approaches. We compare the different benchmarks w.r.t. their size and the provided information such as bug reports, contained test cases, and other code metrics. This comparison is intended to help researchers to quickly identify all suitable benchmarks for evaluating their specific debugging approaches. Furthermore, we discuss reoccurring issues and challenges in selection, acquisition, and usage of such bug benchmarks, i.e., data availability, data quality, duplicated content, data formats, reproducibility, and extensibility. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

AB - Bug benchmarks are used in development and evaluation of debugging approaches, e.g. fault localization and automated repair. Quantitative performance comparison of different debugging approaches is only possible when they have been evaluated on the same dataset or benchmark. However, benchmarks are often specialized towards usage for certain debugging approaches in their contained data, metrics, and artifacts. Such benchmarks cannot be easily used on debugging approaches outside their scope as such approach may rely on specific data such as bug reports or code metrics that are not included in the dataset. Furthermore, benchmarks vary in their size w.r.t. the number of subject programs and the size of the individual subject programs. For these reasons, we have performed a systematic literature review where we have identified 73 benchmarks that can be used to evaluate debugging approaches. We compare the different benchmarks w.r.t. their size and the provided information such as bug reports, contained test cases, and other code metrics. This comparison is intended to help researchers to quickly identify all suitable benchmarks for evaluating their specific debugging approaches. Furthermore, we discuss reoccurring issues and challenges in selection, acquisition, and usage of such bug benchmarks, i.e., data availability, data quality, duplicated content, data formats, reproducibility, and extensibility. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

KW - Debugging

KW - Benchmark

KW - Fault localization

KW - Automated repair

KW - Automatic repair

UR - http://www.scopus.com/inward/record.url?scp=85134429445&partnerID=8YFLogxK

U2 - 10.1016/j.jss.2022.111423

DO - 10.1016/j.jss.2022.111423

M3 - Article

SN - 0164-1212

VL - 192

JO - Journal of Systems and Software

JF - Journal of Systems and Software

M1 - 111423

ER -

Projects

FWF - AMADEUS - Automated Debugging in Use
