Impact of phase estimation on single-channel speech separation based on time-frequency masking

Florian Mayer; Donald S. Williamson; Pejman Mowlaee; Deliang Wang

doi:10.1121/1.4986647

Florian Mayer^*, Donald S. Williamson, Pejman Mowlaee, Deliang Wang

^*Corresponding author for this work

Institut für Signalverarbeitung und Sprachkommunikation (4420)

Publication: Contribution to journal › Article › Peer-reviewed

Abstract

Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios.
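The reconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `reconstruct`, the single-bin toy signal, and the oracle magnitude-ratio mask are assumptions chosen for illustration, and the paper's mask estimator, phase estimator, and pitch estimator are not reproduced. The sketch only shows that the same masked magnitude can be paired either with the mixture phase (the conventional choice) or with a different phase, here the clean phase as an upper-bound case.

```python
import cmath


def reconstruct(mixture_bins, mask, phases=None):
    """Scale each mixture bin's magnitude by the mask and attach a phase.

    mixture_bins: complex time-frequency coefficients of the mixture.
    mask: real-valued gains, one per bin.
    phases: optional phases in radians; if None, the mixture phase is reused.
    (Hypothetical helper for illustration only.)
    """
    estimate = []
    for i, (y, gain) in enumerate(zip(mixture_bins, mask)):
        magnitude = gain * abs(y)
        angle = cmath.phase(y) if phases is None else phases[i]
        estimate.append(cmath.rect(magnitude, angle))
    return estimate


# Toy example with a single time-frequency bin (values are arbitrary).
target = cmath.rect(1.0, 0.3)      # clean source coefficient
interferer = cmath.rect(0.5, 2.0)  # interfering source coefficient
mixture = target + interferer

# Oracle magnitude-ratio mask, assumed here so the magnitude is exact.
mask = [abs(target) / abs(mixture)]

with_mixture_phase = reconstruct([mixture], mask)[0]
with_clean_phase = reconstruct([mixture], mask, phases=[cmath.phase(target)])[0]

print("error with mixture phase:", abs(with_mixture_phase - target))
print("error with clean phase:  ", abs(with_clean_phase - target))
```

Because the mask in this toy case recovers the target magnitude exactly, any remaining error with the mixture phase comes from the phase alone, which is the effect the upper-bound clean phase results in the abstract refer to.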

Original language	English
Pages (from-to)	4668-4679
Number of pages	12
Journal	The Journal of the Acoustical Society of America
Volume	141
Issue number	6
DOIs	https://doi.org/10.1121/1.4986647
Publication status	Published - 1 June 2017

ASJC Scopus subject areas

Arts and Humanities (miscellaneous)
Acoustics and Ultrasonics

Access to document

10.1121/1.4986647

Other files and links

http://www.scopus.com/inward/record.url?scp=85021156150&partnerID=8YFLogxK

Cite this

@article{c365c8933d1e4ad8b98a2a492ceafaf6,
  title = "Impact of phase estimation on single-channel speech separation based on time-frequency masking",
  abstract = "Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios.",
  author = "Florian Mayer and Williamson, {Donald S.} and Pejman Mowlaee and Deliang Wang",
  year = "2017",
  month = jun,
  day = "1",
  doi = "10.1121/1.4986647",
  language = "English",
  volume = "141",
  pages = "4668--4679",
  journal = "The Journal of the Acoustical Society of America",
  issn = "0001-4966",
  publisher = "American Institute of Physics Publishing LLC",
  number = "6",
}

TY  - JOUR
T1  - Impact of phase estimation on single-channel speech separation based on time-frequency masking
AU  - Mayer, Florian
AU  - Williamson, Donald S.
AU  - Mowlaee, Pejman
AU  - Wang, Deliang
PY  - 2017/6/1
Y1  - 2017/6/1
N2  - Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios.
AB  - Time-frequency masking is a common solution for the single-channel source separation (SCSS) problem where the goal is to find a time-frequency mask that separates the underlying sources from an observed mixture. An estimated mask is then applied to the mixed signal to extract the desired signal. During signal reconstruction, the time-frequency-masked spectral amplitude is combined with the mixture phase. This article considers the impact of replacing the mixture spectral phase with an estimated clean spectral phase combined with the estimated magnitude spectrum using a conventional model-based approach. As the proposed phase estimator requires estimated fundamental frequency of the underlying signal from the mixture, a robust pitch estimator is proposed. The upper-bound clean phase results show the potential of phase-aware processing in single-channel source separation. Also, the experiments demonstrate that replacing the mixture phase with the estimated clean spectral phase consistently improves perceptual speech quality, predicted speech intelligibility, and source separation performance across all signal-to-noise ratio and noise scenarios.
UR  - http://www.scopus.com/inward/record.url?scp=85021156150&partnerID=8YFLogxK
U2  - 10.1121/1.4986647
DO  - 10.1121/1.4986647
M3  - Article
AN  - SCOPUS:85021156150
SN  - 0001-4966
VL  - 141
SP  - 4668
EP  - 4679
JO  - The Journal of the Acoustical Society of America
JF  - The Journal of the Acoustical Society of America
IS  - 6
ER  - 
