Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery

Michael Netzer*, Christian Baumgartner, Daniel Baumgarten

*Corresponding author for this work

Publication: Contribution to a journal, Article, peer-reviewed

Abstract

High-throughput technologies in genomics make it possible to analyze small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. A large variety of classification models currently exists, and a common approach is to compare their performance and select the best one for a given classification problem. Because the association between data set characteristics and the performance of a particular classification method is still not fully understood, the major contribution of this work is a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose a three-step computational workflow that comprises an analysis of the data set characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray data sets. Using this method, we demonstrate that predictability strongly depends on the discriminatory ability of the features (e.g., sets of genes) in two-class or multi-class data sets. If a data set exhibits a certain discriminatory ability, the method allows the classification performance to be predicted before a learning model is applied. Our results thus contribute to a better understanding of the relationship between data set characteristics and the corresponding performance of a machine learning method, and they suggest the optimal classification method for a given data set based on its discriminatory ability.
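The three steps named in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the discriminatory-ability measure (mean absolute standardized mean difference), the classifier (nearest centroid with leave-one-out validation), the linear model used to predict the error, and all function names.

```python
import random
import statistics


def make_dataset(n_per_class, n_features, shift, rng):
    """Synthetic two-class data: class 1 is class 0 shifted by `shift` in every feature."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def discriminatory_ability(X, y):
    """Step 1 (assumed measure): mean absolute standardized mean difference over features."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd)
    return statistics.mean(scores)


def loo_error(X, y):
    """Step 2 (assumed classifier): leave-one-out error of a nearest-centroid rule."""
    mistakes = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [r for k, (r, lab) in enumerate(zip(X, y)) if lab == label and k != i]
            centroids[label] = [statistics.mean(col) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda lab: sum((u - v) ** 2 for u, v in zip(X[i], centroids[lab])),
        )
        mistakes += pred != y[i]
    return mistakes / len(X)


def fit_line(xs, ys):
    """Step 3 (assumed model): least-squares line, error ~ intercept + slope * ability."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


rng = random.Random(0)
abilities, errors = [], []
for shift in (0.0, 0.25, 0.5, 0.75, 1.0, 1.5):
    X, y = make_dataset(n_per_class=30, n_features=5, shift=shift, rng=rng)
    abilities.append(discriminatory_ability(X, y))
    errors.append(loo_error(X, y))

slope, intercept = fit_line(abilities, errors)


def predict_error(ability):
    """Predict the classification error of a new data set from its characteristic alone."""
    return min(1.0, max(0.0, intercept + slope * ability))
```

In this toy setting, `predict_error(discriminatory_ability(X_new, y_new))` would give an error estimate for a new data set before any classifier is trained on it, which is the idea the abstract describes. A real study would compare several classifiers and richer data set characteristics.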
Original language: English
Article number: e0276607
Journal: PLoS ONE
Volume: 17
Issue number: 11
Publication status: Published - 9 Nov 2022

ASJC Scopus subject areas

  • General

Fields of Expertise

  • Human- & Biotechnology
