Single Channel Source Separation in the Wild – Conversational Speech in Realistic Environments

Publication: Contribution to book/report/conference proceedings, Conference contribution, Peer-reviewed

Abstract

Recent progress in Single Channel Source Separation (SCSS) using deep neural networks has led to impressive performance gains, but it has also increased model sizes, which in turn require tremendous data resources. This demand is met with artificially composed speech and noise mixtures that do not capture the real-life characteristics of conversations taking place in noisy environments. This paper introduces a new dataset of task-oriented dialogues spoken in a realistic environment and presents experimental results for two SCSS architectures: the Conv-TasNet and the transformer-based MossFormer. Overall, we observe a severe drop in performance of up to 4.3 dB in SI-SDR improvement (SI-SDRi) for the 8 kHz variant of the Conv-TasNet. For same-sex speaker pairs, the drop is even larger, reaching up to 6 dB. Only the model using a 16 kHz sample rate performs at a comparable level for mixed-sex speaker pairs. Our findings illustrate the need to use realistic data for both training and evaluation.
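The results above are reported as SI-SDR improvement: the scale-invariant signal-to-distortion ratio (SI-SDR) of the separated estimate minus the SI-SDR of the unprocessed mixture, both measured against the clean reference. The following is a minimal sketch of that commonly used definition, assuming zero-mean signals of equal length given as plain Python lists. The function names `si_sdr` and `si_sdr_improvement` and the `eps` guard are illustrative choices, not the authors' evaluation code.

```python
import math
from typing import List, Sequence


def _zero_mean(x: Sequence[float]) -> List[float]:
    # Remove the DC offset; SI-SDR is usually computed on zero-mean signals.
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB (illustrative sketch, not the paper's code)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    ref_energy = sum(r * r for r in ref) + eps
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    # Everything in the estimate not explained by the scaled reference is distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


def si_sdr_improvement(estimate: Sequence[float], mixture: Sequence[float],
                       reference: Sequence[float]) -> float:
    """SI-SDRi: gain of the separated estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


if __name__ == "__main__":
    # Toy signals (hypothetical): a source and an orthogonal interferer of equal energy.
    source = [1.0, -1.0, 1.0, -1.0]
    interferer = [1.0, 1.0, -1.0, -1.0]
    mixture = [s + i for s, i in zip(source, interferer)]       # about 0 dB SI-SDR
    estimate = [s + 0.1 * i for s, i in zip(source, interferer)]  # about 20 dB SI-SDR
    print(si_sdr_improvement(estimate, mixture, source))  # about 20 dB
```

Because the reference is rescaled by the optimal `alpha`, multiplying an estimate by any nonzero gain leaves its SI-SDR unchanged. This is why the metric compares separation quality independently of output loudness.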

Original language: English
Title: Speech Communication - 15th ITG Conference
Publisher: VDE Verlag GmbH
Pages: 96-100
Number of pages: 5
ISBN (electronic): 9783800761654
Publication status: Published - 2023
Event: 15th ITG Conference on Speech Communication - RWTH, Aachen, Germany
Duration: 20 Sept. 2023 - 22 Sept. 2023

Publication series

Name: Speech Communication - 15th ITG Conference

Conference

Conference: 15th ITG Conference on Speech Communication
Country/Territory: Germany
City: Aachen
Period: 20/09/23 - 22/09/23

ASJC Scopus subject areas

  • Computer Science Applications
  • Signal Processing
  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics
  • Speech and Hearing
