Impact of phase estimation on single-channel speech separation based on time-frequency masking

Florian Mayer*, Donald S. Williamson, Pejman Mowlaee, Deliang Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Time-frequency masking is a common solution to the single-channel source separation (SCSS) problem, where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. The estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase, which is combined with the magnitude spectrum estimated by a conventional model-based approach. As the proposed phase estimator requires an estimate of the fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. The experiments also demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratios and noise scenarios.
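
The reconstruction step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' method: the toy signals, the oracle ratio mask, and the helper names `stft`, `reconstruct`, and `spectral_error` are assumptions made for illustration only. It compares the conventional choice (masked magnitude combined with the mixture phase) against the upper-bound case in which the clean phase is known, rather than an estimated phase.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def reconstruct(magnitude, phase):
    """Combine a magnitude spectrum with a phase spectrum."""
    return magnitude * np.exp(1j * phase)

def spectral_error(estimate, reference):
    """Relative complex-spectrum error against the clean target."""
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)

# Toy data (assumption): a harmonic "target" plus white-noise interference.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
interferer = 0.8 * rng.standard_normal(len(t))
mixture = target + interferer

S, N, Y = stft(target), stft(interferer), stft(mixture)

# Oracle ratio mask (assumption): stands in for whatever mask estimator is used.
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-12)
masked_magnitude = mask * np.abs(Y)

# Conventional reconstruction: masked magnitude + mixture phase.
est_mixture_phase = reconstruct(masked_magnitude, np.angle(Y))
# Upper bound: same magnitude, but with the (normally unknown) clean phase.
est_clean_phase = reconstruct(masked_magnitude, np.angle(S))

err_mix = spectral_error(est_mixture_phase, S)
err_clean = spectral_error(est_clean_phase, S)
print(f"mixture phase error: {err_mix:.3f}, clean phase error: {err_clean:.3f}")
```

For a fixed magnitude, the per-bin error against the clean spectrum is smallest when the phase equals the clean phase, so the second reconstruction can never do worse than the first in this metric. A practical phase estimator, such as the pitch-driven one the abstract mentions, would fall somewhere between the two.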

Original language: English
Pages (from-to): 4668-4679
Number of pages: 12
Journal: The Journal of the Acoustical Society of America
Volume: 141
Issue number: 6
Publication status: Published - 1 Jun 2017

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics
