Abstract
Conversational speech represents one of the most complex of automatic speech recognition (ASR) tasks owing to the high
inter-speaker variation in both pronunciation and conversational dynamics. Such complexity is particularly sensitive to
low-resourced (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large
amounts of otherwise unrelated data. In this study, we characterise an (LR) Austrian German conversational task. We begin
with a non-pre-trained baseline and show that fine-tuning of a model pre-trained using self-supervision leads to improvements
consistent with those in the literature; this extends to cases where a lexicon and language model are included. We also show
that the advantage of pre-training indeed arises from the larger database rather than the self-supervision. Further, by use
of a leave-one-conversation out technique, we demonstrate that robustness problems remain with respect to inter-speaker
and inter-conversation variation. This serves to guide where future research might best be focused in light of the current
state-of-the-art.
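The leave-one-conversation-out evaluation mentioned in the abstract can be illustrated with a minimal sketch: each split holds out one whole conversation for testing and trains on the rest, so results reflect inter-conversation variation rather than utterance-level chance. The function name and the toy conversation IDs below are hypothetical, not taken from the paper.

```python
from collections import defaultdict

def leave_one_conversation_out(utterances):
    """Yield (held_out_id, train, test) splits in which the test set
    contains exactly one conversation and the training set contains
    all utterances from the remaining conversations.

    `utterances` is a list of (conversation_id, utterance) pairs.
    """
    by_conv = defaultdict(list)
    for conv_id, utt in utterances:
        by_conv[conv_id].append(utt)

    for held_out in by_conv:
        test = by_conv[held_out]
        train = [u for c, utts in by_conv.items() if c != held_out
                 for u in utts]
        yield held_out, train, test

# Toy data: three conversations with two utterances each (hypothetical).
data = [("conv1", "a"), ("conv1", "b"),
        ("conv2", "c"), ("conv2", "d"),
        ("conv3", "e"), ("conv3", "f")]

splits = list(leave_one_conversation_out(data))
```

With three conversations this yields three splits, each training on four utterances and testing on the two from the held-out conversation, so no speaker pair leaks between train and test within a conversation.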
Original language | English |
---|---|
Pages | 4684–4691 |
Number of pages | 8 |
Publication status | Published - 2022 |
Keywords
- Speech Recognition
- Conversational Speech
- Austrian German
- Low-Resource
- Wav2vec2.0
- Kaldi
Fingerprint
Explore the research topics of "Conversational Speech Recognition Needs Data? Experiments with Austrian German". Together they form a unique fingerprint.
Projects
- 1 Finished
- FWF - CLCS_2 - Cross-layer prosody models for spontaneous speech
  1/10/18 → 30/11/21
  Project: Research project