Comparison of data selection methods for modeling chemical processes with artificial neural networks

Fabian Zapf, Thomas Wallek*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Instance selection aims to choose model training data such that the performance of the trained models is maximized. In the context of modeling chemical processes with artificial neural networks, it can serve as an essential preprocessing step, since measurement data from such processes are typically highly clustered and thus far from the ideal of normally distributed data. In this paper, four filter methods from the literature and one newly proposed method for data selection are tested, both on their own and combined with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks on the resulting data splits. The final mean model deviation is used to quantify each algorithm's performance, and its standard deviation provides information about reproducibility. The convex-hull-extended variants of self-organizing-map-based stratified sampling with a proportional allocation rule and of the newly proposed self-information-based subset selection are found to perform best on real-world chemical engineering data.
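To make the idea of a convex-hull-extended selection more concrete, here is a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: inputs are two-dimensional, all convex hull vertices are placed in the training set (so the network is not asked to extrapolate beyond the training data), and each point already carries a stratum label, for example the best-matching node of a self-organizing map. The remaining training points are drawn by stratified sampling with proportional allocation. All function names are hypothetical.

```python
import random
from collections import defaultdict


def convex_hull_indices(points):
    """Indices of the 2-D convex hull vertices (Andrew's monotone chain).

    Points lying on a hull edge (collinear) are not counted as vertices.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(indices):
        chain = []
        for i in indices:
            # Drop the last point while it does not make a left turn
            while len(chain) >= 2 and cross(
                    points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]


def proportional_stratified_sample(labels, candidates, n_select, rng):
    """Draw about n_select candidates, giving each stratum a share
    proportional to its size (rounding makes the total approximate)."""
    if not candidates or n_select <= 0:
        return []
    strata = defaultdict(list)
    for i in candidates:
        strata[labels[i]].append(i)
    chosen = []
    for members in strata.values():
        k = round(n_select * len(members) / len(candidates))
        chosen.extend(rng.sample(members, min(k, len(members))))
    return chosen


def hull_extended_split(points, labels, train_fraction, seed=0):
    """Put every hull vertex into the training set, then fill it up by
    proportional stratified sampling of the interior points."""
    rng = random.Random(seed)
    hull = set(convex_hull_indices(points))
    interior = [i for i in range(len(points)) if i not in hull]
    n_train = round(train_fraction * len(points))
    extra = proportional_stratified_sample(
        labels, interior, max(0, n_train - len(hull)), rng)
    train = sorted(hull.union(extra))
    train_set = set(train)
    test = [i for i in range(len(points)) if i not in train_set]
    return train, test


# Usage: four corner points form the hull, eight points lie inside it
points = [(0, 0), (4, 0), (4, 4), (0, 4),
          (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
# Stand-in strata; in SOM-based stratified sampling these would be SOM nodes
labels = ["left" if x < 2 else "right" for x, _ in points]
train, test = hull_extended_split(points, labels, train_fraction=0.5, seed=1)
print("train:", train, "test:", test)
```

Repeating such a split with different seeds and recording the trained model's deviation on each split would give the kind of mean and standard deviation the abstract uses to compare methods. For inputs with more than two dimensions, the hand-written hull routine could be replaced by a general one such as `scipy.spatial.ConvexHull`, whose `vertices` attribute lists the hull point indices.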

Original language: English
Article number: 107938
Journal: Applied Soft Computing
Volume: 113
Issue number: B
Publication status: Published, Dec 2021

Keywords

  • Artificial neural networks
  • Chemical processes
  • Data selection
  • Instance selection
  • Regression
  • Subset selection
  • Wolfram Mathematica

ASJC Scopus subject areas

  • Software
