Acoustic Scene Classification Using A Convolutional Neural Network Ensemble and Nearest Neighbor Filters

Thi Kim Truc Nguyen; Franz Pernkopf

Institut für Signalverarbeitung und Sprachkommunikation (4420)

Publication: Conference contribution › Poster › Peer-reviewed

Abstract

This paper proposes Convolutional Neural Network (CNN) ensembles for acoustic scene classification in tasks 1A and 1B of the DCASE 2018 challenge. We introduce a nearest neighbor filter applied to spectrograms, which emphasizes and smooths similar patterns of sound events in a scene. We also propose a variety of CNN models for single-input (SI) and multi-input (MI) channels and three different methods for building a network ensemble. The experimental results show that, for task 1A, the combination of MI-CNN structures using both log-mel features and their nearest-neighbor-filtered versions is slightly more effective than the single-input channel CNN models using log-mel features only. The opposite holds for task 1B. In addition, the ensemble methods improve the accuracy of the system significantly. The best ensemble method is ensemble selection, which achieves 69.3% for task 1A and 63.6% for task 1B, improving on the baseline system by 8.9% and 14.4% for tasks 1A and 1B, respectively.
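The abstract does not specify how the nearest neighbor filter is computed, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes the spectrogram is a NumPy array of shape (frequency bins, frames), that frames are compared by cosine similarity, and that each frame is replaced by the mean of its k most similar frames. The function name `nn_filter` and the parameter `k` are hypothetical.

```python
import numpy as np


def nn_filter(spec, k=5):
    """Smooth a spectrogram by averaging each frame with its nearest neighbors.

    spec: array of shape (n_bins, n_frames), e.g. log-mel features.
    k:    number of most similar frames to average (the frame itself counts).
    Assumption: similarity is cosine similarity between frames.
    """
    spec = np.asarray(spec, dtype=float)
    n_frames = spec.shape[1]
    k = min(k, n_frames)

    # Normalize every frame (column) to unit length; guard against zero norm.
    norms = np.linalg.norm(spec, axis=0, keepdims=True)
    unit = spec / np.maximum(norms, 1e-12)

    # Pairwise cosine similarity between frames: shape (n_frames, n_frames).
    sim = unit.T @ unit

    # For each frame, the indices of its k most similar frames.
    neighbors = np.argsort(-sim, axis=1)[:, :k]

    # spec[:, neighbors] has shape (n_bins, n_frames, k); average over neighbors.
    return spec[:, neighbors].mean(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.random((40, 100))  # stand-in for real log-mel features
    filtered = nn_filter(log_mel, k=5)
    print(filtered.shape)
```

In a multi-input setup such as the one the abstract describes, the unfiltered and filtered spectrograms would be fed to the network as separate input channels. How the authors combine them is not stated here.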

Original language	English
Publication status	Published - 20 Nov 2018

ASJC Scopus subject areas

Engineering (all)

Access to document

poster_dcase2018

Cite this

@conference{1911737d0c0f45e88d878023b080f63f,

title = "Acoustic Scene Classification Using A Convolutional Neural Network Ensemble and Nearest Neighbor Filters",

abstract = "This paper proposes Convolutional Neural Network (CNN) ensembles for acoustic scene classification in tasks 1A and 1B of the DCASE 2018 challenge. We introduce a nearest neighbor filter applied to spectrograms, which emphasizes and smooths similar patterns of sound events in a scene. We also propose a variety of CNN models for single-input (SI) and multi-input (MI) channels and three different methods for building a network ensemble. The experimental results show that, for task 1A, the combination of MI-CNN structures using both log-mel features and their nearest-neighbor-filtered versions is slightly more effective than the single-input channel CNN models using log-mel features only. The opposite holds for task 1B. In addition, the ensemble methods improve the accuracy of the system significantly. The best ensemble method is ensemble selection, which achieves 69.3\% for task 1A and 63.6\% for task 1B, improving on the baseline system by 8.9\% and 14.4\% for tasks 1A and 1B, respectively.",

author = "Nguyen, {Thi Kim Truc} and Franz Pernkopf",

year = "2018",

month = nov,

day = "20",

language = "English",

}

TY - CONF

T1 - Acoustic Scene Classification Using A Convolutional Neural Network Ensemble and Nearest Neighbor Filters

AU - Nguyen, Thi Kim Truc

AU - Pernkopf, Franz

PY - 2018/11/20

Y1 - 2018/11/20

N2 - This paper proposes Convolutional Neural Network (CNN) ensembles for acoustic scene classification in tasks 1A and 1B of the DCASE 2018 challenge. We introduce a nearest neighbor filter applied to spectrograms, which emphasizes and smooths similar patterns of sound events in a scene. We also propose a variety of CNN models for single-input (SI) and multi-input (MI) channels and three different methods for building a network ensemble. The experimental results show that, for task 1A, the combination of MI-CNN structures using both log-mel features and their nearest-neighbor-filtered versions is slightly more effective than the single-input channel CNN models using log-mel features only. The opposite holds for task 1B. In addition, the ensemble methods improve the accuracy of the system significantly. The best ensemble method is ensemble selection, which achieves 69.3% for task 1A and 63.6% for task 1B, improving on the baseline system by 8.9% and 14.4% for tasks 1A and 1B, respectively.

M3 - Poster

ER -
