TY - JOUR
T1 - Eigenvector-based Speech Mask Estimation for Multi-Channel Speech Enhancement
AU - Pfeifenberger, Lukas
AU - Zöhrer, Matthias
AU - Pernkopf, Franz
PY - 2019
Y1 - 2019
N2 - We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both magnitude and phase distortions caused by the GEV beamformer. Further, we demonstrate the properties of our eigenvector features and compare their performance with three state-of-the-art reference systems. We report performance in terms of SNR improvement and Word Error Rate (WER) using the Google and Kaldi Speech-to-Text APIs. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.
AB - We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both magnitude and phase distortions caused by the GEV beamformer. Further, we demonstrate the properties of our eigenvector features and compare their performance with three state-of-the-art reference systems. We report performance in terms of SNR improvement and Word Error Rate (WER) using the Google and Kaldi Speech-to-Text APIs. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.
U2 - 10.1109/TASLP.2019.2941592
DO - 10.1109/TASLP.2019.2941592
M3 - Article
SN - 2329-9290
VL - 27
SP - 2162
EP - 2172
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 12
ER -