Effects of pooling samples on the performance of classifiers: A comparative study

K. Kusonmano; M. Netzer; Christian Baumgartner; M. Dehmer; K.R. Liedl; A. Graber

doi:10.1100/2012/278352

Corresponding author: A. Graber

Research output: Contribution to journal › Article › peer-review

Abstract

A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers, support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms other investigated algorithms, while accuracy levels are comparable among all the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that meets best experimental objectives and budgetary conditions, including time constraints.
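The idea of virtual pooling mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a pool is formed by averaging the feature values of `pool_size` randomly chosen samples from the same class; the record above does not state the exact procedure used in the paper, so this averaging rule, the function name `virtual_pool`, and the mock data are illustrative assumptions only.

```python
import random
from statistics import fmean


def virtual_pool(samples, labels, pool_size, seed=None):
    """Average disjoint groups of `pool_size` same-class samples.

    Assumption: a pooled sample is the per-feature mean of its members.
    Samples left over after forming full pools are dropped.
    """
    rng = random.Random(seed)
    pooled_samples, pooled_labels = [], []
    for label in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == label]
        rng.shuffle(members)  # random assignment of samples to pools
        n_pools = len(members) // pool_size
        for p in range(n_pools):
            group = members[p * pool_size:(p + 1) * pool_size]
            # zip(*group) iterates over features (columns) of the group
            pooled_samples.append([fmean(col) for col in zip(*group)])
            pooled_labels.append(label)
    return pooled_samples, pooled_labels


# Hypothetical mock dataset: 12 samples per class, 5 features,
# with the first feature shifted in class 1.
data_rng = random.Random(0)
labels = [0] * 12 + [1] * 12
samples = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in labels]
for sample, label in zip(samples, labels):
    if label == 1:
        sample[0] += 1.0

pooled, pooled_labels = virtual_pool(samples, labels, pool_size=3, seed=1)
print(len(pooled), pooled_labels)
```

With 12 samples per class and a pool size of 3, this yields four pooled samples per class. Any classifier could then be trained on `pooled` instead of `samples` to compare misclassification rates across pool sizes.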

Original language	English
Article number	278352
Number of pages	10
Journal	The Scientific World Journal
Volume	2012
DOIs	https://doi.org/10.1100/2012/278352
Publication status	Published - 2012
Externally published	Yes

Fields of Expertise

Human- & Biotechnology

Treatment code (detailed classification)

Basic - Fundamental (basic research)

Access to Document

10.1100/2012/278352
Licence: CC BY 4.0

Cite this

@article{db52165aae4a4ad18351be2783a2d9b6,

title = "Effects of pooling samples on the performance of classifiers: A comparative study",

abstract = "A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers, support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms other investigated algorithms, while accuracy levels are comparable among all the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that meets best experimental objectives and budgetary conditions, including time constraints.",

author = "K. Kusonmano and M. Netzer and Christian Baumgartner and M. Dehmer and K.R. Liedl and A. Graber",

year = "2012",

doi = "10.1100/2012/278352",

language = "English",

volume = "2012",

journal = "The Scientific World Journal",

issn = "2356-6140",

publisher = "Hindawi Publ. Co.",

}

TY - JOUR

T1 - Effects of pooling samples on the performance of classifiers: A comparative study

AU - Kusonmano, K.

AU - Netzer, M.

AU - Baumgartner, Christian

AU - Dehmer, M.

AU - Liedl, K.R.

AU - Graber, A.

PY - 2012

Y1 - 2012

N2 - A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers, support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms other investigated algorithms, while accuracy levels are comparable among all the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that meets best experimental objectives and budgetary conditions, including time constraints.

UR - http://www.ncbi.nlm.nih.gov/pubmed/22654582

U2 - 10.1100/2012/278352

DO - 10.1100/2012/278352

M3 - Article

SN - 2356-6140

VL - 2012

JO - The Scientific World Journal

JF - The Scientific World Journal

M1 - 278352

ER -
