Representation Learning for Single-Channel Source Separation and Bandwidth Extension

Matthias Zöhrer; Franz Pernkopf; Robert Peharz

doi:10.1109/TASLP.2015.2470560

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed and require source-specific prior knowledge. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for a speaker-dependent, a speaker-independent, a matched noise condition and an unmatched noise condition task. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR. They significantly outperform SPNs embedded in hidden Markov models and the other representation models.

Original language	English
Pages (from-to)	2398-2409
Journal	IEEE Transactions on Audio Speech and Language Processing
Volume	23
Issue number	12
DOIs	https://doi.org/10.1109/TASLP.2015.2470560
Publication status	Published - 2015

Fields of Expertise

Information, Communication & Computing

Cite this

@article{5893a28b83d84ff5b55ed90d09dc0d4b,
  title = "Representation Learning for Single-Channel Source Separation and Bandwidth Extension",
  abstract = "In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed and require source-specific prior knowledge. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for a speaker-dependent, a speaker-independent, a matched noise condition and an unmatched noise condition task. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR. They significantly outperform SPNs embedded in hidden Markov models and the other representation models.",
  author = "Matthias Z{\"o}hrer and Franz Pernkopf and Robert Peharz",
  year = "2015",
  doi = "10.1109/TASLP.2015.2470560",
  language = "English",
  volume = "23",
  pages = "2398--2409",
  journal = "IEEE Transactions on Audio Speech and Language Processing",
  issn = "1558-7924",
  publisher = "Institute of Electrical and Electronics Engineers",
  number = "12",
}

TY  - JOUR
T1  - Representation Learning for Single-Channel Source Separation and Bandwidth Extension
AU  - Zöhrer, Matthias
AU  - Pernkopf, Franz
AU  - Peharz, Robert
PY  - 2015
Y1  - 2015
N2  - In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed and require source-specific prior knowledge. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for a speaker-dependent, a speaker-independent, a matched noise condition and an unmatched noise condition task. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR. They significantly outperform SPNs embedded in hidden Markov models and the other representation models.
AB  - In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed and require source-specific prior knowledge. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for a speaker-dependent, a speaker-independent, a matched noise condition and an unmatched noise condition task. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR. They significantly outperform SPNs embedded in hidden Markov models and the other representation models.
U2  - 10.1109/TASLP.2015.2470560
DO  - 10.1109/TASLP.2015.2470560
M3  - Article
SN  - 1558-7924
VL  - 23
SP  - 2398
EP  - 2409
JO  - IEEE Transactions on Audio Speech and Language Processing
JF  - IEEE Transactions on Audio Speech and Language Processing
IS  - 12
ER  -
