TY - GEN
T1 - Investigating Reproducibility in Deep Learning-Based Software Fault Prediction
AU - Mukhtar, Adil
AU - Jannach, Dietmar
AU - Wotawa, Franz
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/9/26
Y1 - 2024/9/26
N2 - Over the past few years, increasingly complex machine learning methods have been applied to various Software Engineering (SE) tasks, in particular to the important task of automated fault prediction and localization. However, it is becoming much more difficult for scholars to reproduce the results reported in the literature, especially when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. Given some recent and very worrying findings regarding reproducibility and progress in other areas of applied machine learning, this study analyzes to what extent software engineering research, in particular in the area of software fault prediction, is plagued by similar problems. We therefore conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences. Our analysis revealed that scholars are apparently largely aware of the reproducibility problem, and about two-thirds of the papers provide code for their proposed deep learning models. However, in the vast majority of cases, crucial elements for reproducibility are missing, such as the code of the compared baselines, code for data pre-processing, or code for hyperparameter tuning. In these cases, it remains challenging to reproduce the results in the current research literature exactly. Overall, our meta-analysis calls for improved research practices to ensure the reproducibility of machine-learning-based research.
AB - Over the past few years, increasingly complex machine learning methods have been applied to various Software Engineering (SE) tasks, in particular to the important task of automated fault prediction and localization. However, it is becoming much more difficult for scholars to reproduce the results reported in the literature, especially when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. Given some recent and very worrying findings regarding reproducibility and progress in other areas of applied machine learning, this study analyzes to what extent software engineering research, in particular in the area of software fault prediction, is plagued by similar problems. We therefore conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences. Our analysis revealed that scholars are apparently largely aware of the reproducibility problem, and about two-thirds of the papers provide code for their proposed deep learning models. However, in the vast majority of cases, crucial elements for reproducibility are missing, such as the code of the compared baselines, code for data pre-processing, or code for hyperparameter tuning. In these cases, it remains challenging to reproduce the results in the current research literature exactly. Overall, our meta-analysis calls for improved research practices to ensure the reproducibility of machine-learning-based research.
KW - Bug Prediction
KW - Deep Learning
KW - Defect Prediction
KW - Fault Localization
KW - Reproducibility
KW - Software Debugging
UR - http://www.scopus.com/inward/record.url?scp=85206359742&partnerID=8YFLogxK
U2 - 10.1109/QRS62785.2024.00038
DO - 10.1109/QRS62785.2024.00038
M3 - Conference paper
AN - SCOPUS:85206359742
T3 - IEEE International Conference on Software Quality, Reliability and Security, QRS
SP - 306
EP - 317
BT - Proceedings - 2024 IEEE 24th International Conference on Software Quality, Reliability and Security, QRS 2024
PB - IEEE
T2 - 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024
Y2 - 1 July 2024 through 5 July 2024
ER -