Conversational Speech Recognition Needs Data? Experiments with Austrian German

Publication: Conference contribution › Paper › Peer-reviewed

Abstract

Conversational speech is among the most complex automatic speech recognition (ASR) tasks owing to the high
inter-speaker variation in both pronunciation and conversational dynamics. This complexity is particularly acute in
low-resource (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large
amounts of otherwise unrelated data. In this study, we characterise an LR Austrian German conversational task. We begin
with a non-pre-trained baseline and show that fine-tuning a model pre-trained using self-supervision leads to improvements
consistent with those reported in the literature; this extends to cases where a lexicon and language model are included. We also
show that the advantage of pre-training arises from the larger training database rather than from the self-supervision itself.
Further, using a leave-one-conversation-out technique, we demonstrate that robustness problems remain with respect to
inter-speaker and inter-conversation variation. This serves to guide where future research might best be focused in light of
the current state of the art.
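
As an illustration of the evaluation protocol described above, the sketch below constructs a leave-one-conversation-out split in Python. All file names, transcripts and conversation IDs are hypothetical, and scikit-learn's LeaveOneGroupOut is used as a convenient stand-in for the fold logic; the paper's own pipelines are built on Kaldi and wav2vec 2.0, not on this exact code.

    # A minimal sketch of a leave-one-conversation-out split, assuming each
    # utterance is tagged with a conversation ID. All file names, transcripts
    # and IDs below are hypothetical placeholders.
    from sklearn.model_selection import LeaveOneGroupOut

    utterances = [
        ("conv01_utt001.wav", "hypothetical transcript a", "conv01"),
        ("conv01_utt002.wav", "hypothetical transcript b", "conv01"),
        ("conv02_utt001.wav", "hypothetical transcript c", "conv02"),
        ("conv03_utt001.wav", "hypothetical transcript d", "conv03"),
    ]

    paths  = [u[0] for u in utterances]
    texts  = [u[1] for u in utterances]
    groups = [u[2] for u in utterances]   # grouping key: one fold per conversation

    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(paths, texts, groups=groups):
        held_out = {groups[i] for i in test_idx}
        # Fine-tune / evaluate on this fold: train (or adapt) the acoustic
        # model on train_idx only, then score WER on the held-out conversation.
        print(f"held out {held_out}: "
              f"{len(train_idx)} train utts, {len(test_idx)} test utts")

Evaluating each conversation as its own held-out fold is what exposes the inter-speaker and inter-conversation variation discussed in the abstract: a model that performs well on average can still degrade sharply on individual conversations.
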
Original language: English
Pages: 4684–4691
Number of pages: 8
Publication status: Published - 2022

Keywords

  • Speech Recognition
  • Conversational Speech
  • Austrian German
  • Low-Resource
  • Wav2vec 2.0
  • Kaldi
