Comparison of data selection methods for modeling chemical processes with artificial neural networks

Fabian Zapf, Thomas Wallek*

*Corresponding author for this work

Publication: Contribution to journal › Article › Peer-reviewed


Instance selection aims to select model training data such that the performance of the trained models is maximized. In the context of modeling chemical processes with artificial neural networks, it can serve as an essential preprocessing step, since measurement data from such processes are commonly highly clustered and thus far from an ideal normal distribution. In this paper, four filter methods from the literature and a newly proposed data selection method are tested both on their own and in combination with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks on the resulting split datasets. The final mean model deviation is used to quantify each algorithm's performance, and its standard deviation to provide information about reproducibility. Two convex-hull-extended algorithms are found to perform best for real-world chemical engineering data: self-organizing-map-based stratified sampling with a proportional allocation rule, and the newly proposed self-information-based subset selection.
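To illustrate the general idea of extending a stratified sampler with a convex hull step, the following is a minimal sketch and not the authors' implementation. It assumes 2-D data, takes cluster labels as given (standing in for strata that a self-organizing map would produce), and uses hypothetical helper names (`convex_hull_indices`, `proportional_allocation`, `select_training`). Hull vertices are always kept for training, and the remaining budget is shared among strata in proportion to their size.

```python
import random
from collections import defaultdict


def convex_hull_indices(points):
    """Indices of the 2-D convex hull vertices (Andrew's monotone chain)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return set(order)

    def cross(o, a, b):
        (ox, oy), (ax, ay), (bx, by) = points[o], points[a], points[b]
        return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox)

    def half(seq):
        chain = []
        for i in seq:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    return set(half(order)) | set(half(reversed(order)))


def proportional_allocation(strata_sizes, n_samples):
    """Share n_samples among strata in proportion to size (largest remainder)."""
    total = sum(strata_sizes.values())
    if total == 0:
        return {k: 0 for k in strata_sizes}
    quotas = {k: n_samples * size / total for k, size in strata_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_samples - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def select_training(points, labels, n_train, seed=0):
    """Keep all hull vertices, then fill up by proportional stratified sampling."""
    rng = random.Random(seed)
    hull = convex_hull_indices(points)
    strata = defaultdict(list)
    for i, label in enumerate(labels):
        if i not in hull:
            strata[label].append(i)
    available = sum(len(members) for members in strata.values())
    n_rest = max(0, min(n_train - len(hull), available))
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n_rest)
    chosen = set(hull)
    for k, members in strata.items():
        chosen.update(rng.sample(members, alloc[k]))
    return sorted(chosen)


# Toy data: four corner points enclose six interior points in two clusters.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (2, 1), (1, 2), (3, 3), (2, 2), (3, 2)]
labels = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
train_idx = select_training(points, labels, n_train=6)
```

Keeping the hull vertices means the training set spans the full range of the measured data, so the model is less likely to be asked to extrapolate on the held-out points. For higher-dimensional data, a library routine such as `scipy.spatial.ConvexHull` would be the more practical choice than the hand-written 2-D hull used here.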

Journal: Applied Soft Computing
Publication status: Published - Dec 2021

ASJC Scopus subject areas

  • Software

