Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME Challenge results

Lukas Pfeifenberger, Tobias Schrank, Matthias Zöhrer, Martin Hagmüller, Franz Pernkopf

Publication: Contribution to book/report/conference proceedings, conference paper, peer-reviewed

Abstract

Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as streets, buses, cafés and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized sidelobe canceller (GSC) beamformers, namely GSC with a sparse blocking matrix (BM), GSC with an adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) beamforming and an ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower post-filters and deep neural post-filters (DPFs). DPFs significantly outperformed our baseline systems in terms of the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved average relative improvements of 17.54% in OPS and 18.28% in PESQ compared to the CHiME 3 baseline. DPFs also achieved the best WER when combined with an ASR engine on simulated development and evaluation data, namely 8.98% and 10.82% WER, respectively. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, namely 14.23% and 22.12%, respectively.
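The MVDR beamformer mentioned in the abstract chooses, for each frequency bin, the weights w that minimize the output noise power w^H R w subject to the distortionless constraint w^H d = 1, which gives the closed form w = R^{-1} d / (d^H R^{-1} d). The following is a minimal NumPy sketch of these textbook weights for a single STFT bin; it is not the authors' implementation, and the noise covariance R, the steering vector d and the six-channel setup are hypothetical placeholders (in practice both quantities would be estimated from the recordings).

```python
import numpy as np


def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """Textbook MVDR weights for one frequency bin: w = R^{-1} d / (d^H R^{-1} d)."""
    # Solve R x = d rather than forming R^{-1} explicitly (numerically safer).
    r_inv_d = np.linalg.solve(noise_cov, steering)
    # d^H R^{-1} d is real and positive for a Hermitian positive-definite R.
    return r_inv_d / (steering.conj() @ r_inv_d)


def output_noise_power(weights: np.ndarray, noise_cov: np.ndarray) -> float:
    """Noise power at the beamformer output, w^H R w."""
    return float((weights.conj() @ noise_cov @ weights).real)


# Hypothetical 6-channel example for a single STFT bin (random placeholder data).
rng = np.random.default_rng(0)
M = 6
d = np.exp(-1j * np.pi * rng.uniform(size=M))        # assumed steering vector
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + 1e-3 * np.eye(M)                # Hermitian PD noise covariance

w_mvdr = mvdr_weights(R, d)
w_ds = d / (d.conj() @ d)                            # delay-and-sum, same constraint

print("distortionless response w^H d:", w_mvdr.conj() @ d)
print("MVDR noise power:", output_noise_power(w_mvdr, R))
print("delay-and-sum noise power:", output_noise_power(w_ds, R))
```

Because the delay-and-sum weights d / (d^H d) satisfy the same constraint, the MVDR output noise power can never exceed theirs; the last two printed values illustrate this for the placeholder data.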
Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Pages: 452-459
Publication status: Accepted/In press - 2015
Event: 2015 IEEE Workshop on Automatic Speech Recognition & Understanding: ASRU 2015 - Scottsdale, Arizona, United States
Duration: 13 Dec 2015 to 17 Dec 2015

Conference

Conference: 2015 IEEE Workshop on Automatic Speech Recognition & Understanding
Short title: ASRU 2015
Country/Territory: United States
City: Scottsdale, Arizona
Period: 13/12/15 to 17/12/15

Fields of Expertise

  • Information, Communication & Computing

Treatment code (detailed classification)

  • Application
  • Experimental
