Abstract
This paper addresses distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and missing data techniques. The performance of the system is evaluated on a German database that has been contaminated with noise from an apartment of the DIRHA (Distant Speech Interaction for Robust Home Applications) project. The proposed system is composed of three blocks. First, a beamformer yields an enhanced single-channel signal by filtering the multi-channel signals and then summing them. To optimize the filter weights, we apply convex (CVX) optimization over three spatial dimensions, given the spatiotemporal position of the target speaker as prior knowledge. Second, the beamformer output is used to extract the pitch and to estimate the stationary part of the background noise. Third, the system produces a final noise estimate by combining the stationary noise part with the harmonic noise estimate obtained from the pitch. Finally, the filter-bank representation of the enhanced signal and the corresponding missing data mask obtained from this final noise estimate are sent to the speech recognition back-end. The purpose of this paper is to analyze the impact of employing a beamformer followed by a missing data technique.
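The processing chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the element-wise maximum used to combine the two noise estimates, and the 0 dB SNR threshold for the mask are all assumptions made for illustration, and the CVX-based optimization of the filter weights is not shown (the filters are simply passed in).

```python
import numpy as np


def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR filter, then sum.

    channels: array of shape (n_mics, n_samples)
    filters:  array of shape (n_mics, n_taps), assumed to be given
              (the abstract obtains them by convex optimization).
    """
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # Full convolution, truncated to the input length.
        out += np.convolve(channels[m], filters[m])[:n_samples]
    return out


def combine_noise(stationary, harmonic):
    """Combine two noise estimates (assumption: element-wise maximum)."""
    return np.maximum(stationary, harmonic)


def missing_data_mask(noisy_energy, noise_energy, snr_threshold_db=0.0):
    """Binary mask: 1 where the estimated local SNR exceeds the threshold.

    Both inputs are non-negative filter-bank energies of shape
    (n_bands, n_frames). The threshold value is an assumption.
    """
    eps = 1e-12
    # Estimate speech energy by subtracting the noise, floored at zero.
    speech_energy = np.maximum(noisy_energy - noise_energy, 0.0)
    snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (snr_db > snr_threshold_db).astype(np.uint8)
```

In such a setup, the mask marks each filter-bank cell as reliable or unreliable, and the recognizer back-end would receive both the filter-bank features and the mask.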
Original language | English |
---|---|
Title | AIA-DAGA 2013 : proceedings of the International Conference on Acoustics |
Pages | 2049-2052 |
ISBN (electronic) | 9783939296058 |
Publication status | Published - 2013 |
Event | 39. Jahrestagung für Akustik: AIA-DAGA 2013 - Meran, Italy. Duration: 18 March 2013 → 21 March 2013 |
Conference
Conference | 39. Jahrestagung für Akustik |
---|---|
Short title | DAGA 2013 |
Country/Territory | Italy |
City | Meran |
Period | 18/03/13 → 21/03/13 |
Fields of Expertise
- Information, Communication & Computing
Treatment code (detailed classification)
- Basic - Fundamental (basic research)
- Application