Abstract
In this work, we present a universal codebook-based speech enhancement framework that relies on a single codebook to encode both speech and noise components. The atomic speech presence probability (ASPP) is defined as the probability that a given codebook atom encodes speech at a given point in time. We develop ASPP estimators based on binaural cues, including the interaural phase and level differences (IPD and ILD) and the interaural coherence magnitude (ICM), as well as a combined version leveraging the full interaural transfer function (ITF). We evaluate the performance of the resulting ASPP-based speech enhancement algorithms on binaural mixtures of reverberant speech and real-world noise. The proposed approach improves both objective speech quality and intelligibility over a wide range of input SNRs, as measured with the PESQ and binaural STOI metrics, and outperforms two binaural speech enhancement benchmark methods. We show that the proposed ITF-based ASPP approach achieves a good trade-off between binaural noise reduction and binaural cue preservation.
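The binaural cues named in the abstract are commonly estimated from short-time Fourier transform (STFT) coefficients of the left and right channels. The following is a minimal sketch under textbook definitions, not the ASPP estimators of the paper. The function name `binaural_cues`, the plain averaging over frames, the choice of the right channel as reference, and the regularizer `eps` are all assumptions made for illustration.

```python
import cmath
import math


def binaural_cues(left, right, eps=1e-12):
    """Estimate binaural cues for ONE frequency bin.

    `left` and `right` are equally long sequences of complex STFT
    coefficients (one per time frame). Hypothetical helper: the paper's
    own estimators may differ.
    """
    if len(left) != len(right) or len(left) == 0:
        raise ValueError("left and right must be non-empty and equally long")
    n = len(left)

    # Frame-averaged auto- and cross-power spectral densities.
    p_ll = sum(abs(x) ** 2 for x in left) / n
    p_rr = sum(abs(y) ** 2 for y in right) / n
    p_lr = sum(x * y.conjugate() for x, y in zip(left, right)) / n

    return {
        # Interaural level difference in dB (left relative to right).
        "ild_db": 10.0 * math.log10((p_ll + eps) / (p_rr + eps)),
        # Interaural phase difference in radians, in (-pi, pi].
        "ipd_rad": cmath.phase(p_lr),
        # Interaural coherence magnitude, between 0 and 1.
        "icm": abs(p_lr) / math.sqrt(p_ll * p_rr + eps),
        # Interaural transfer function estimate (right channel as reference);
        # its magnitude and phase carry the level and phase cues jointly.
        "itf": p_lr / (p_rr + eps),
    }


# Usage: a fully coherent source whose left channel is the right channel
# scaled by 2 and phase-shifted by 0.5 rad.
right = [1 + 0j, 0.5j, -1 + 0.3j]
left = [2 * cmath.exp(0.5j) * y for y in right]
cues = binaural_cues(left, right)
```

In this sketch the ITF is a single complex ratio, so the level and phase cues can both be read from it, while the coherence magnitude additionally indicates how directional (close to 1) or diffuse (close to 0) the sound field is in that bin.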
Original language | English |
---|---|
Article number | 8811601 |
Pages (from-to) | 2150-2161 |
Number of pages | 12 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 27 |
Issue number | 12 |
Publication status | Published - 1 Dec 2019 |
Keywords
- atomic speech presence probability
- binaural speech enhancement
- interaural transfer function
- nonnegative matrix factorization
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering