Abstract
Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café, and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized side-lobe canceller (GSC) beamformers, i.e., GSC with sparse blocking matrix (BM), GSC with adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) and ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower postfilters and deep neural postfilters (DPFs). DPFs significantly outperformed our baseline systems in terms of the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved an average relative improvement of 17.54% in OPS and 18.28% in PESQ compared to the CHiME 3 baseline. DPFs also achieved the best word error rate (WER) when combined with an ASR engine on simulated development and evaluation data, i.e., 8.98% and 10.82% WER, respectively. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, i.e., 14.23% and 22.12%, respectively.
Original language | English |
---|---|
Title | 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings |
Pages | 452 - 459 |
DOIs | |
Publication status | Accepted/In press - 2015 |
Event | 2015 IEEE Workshop on Automatic Speech Recognition & Understanding: ASRU 2015 - Scottsdale, Arizona, United States. Duration: 13 Dec 2015 → 17 Dec 2015 |
Conference
Conference | 2015 IEEE Workshop on Automatic Speech Recognition & Understanding |
---|---|
Abbreviated title | ASRU 2015 |
Country/Territory | United States |
Location | Scottsdale, Arizona |
Period | 13 Dec 2015 → 17 Dec 2015 |
Fields of Expertise
- Information, Communication & Computing
Treatment code (more detailed classification)
- Application
- Experimental
Fingerprint
Explore the research topics of "Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME Challenge results". Together they form a unique fingerprint.
Projects
- 1 Finished
- ASD-COMET - Acoustic Sensing & Design
Pessentheiner, H. (Co-Investigator), Hagmueller, M. (Co-Investigator) & Kubin, G. (Principal Investigator)
1 Apr 2013 → 31 Mar 2017
Project: Research project