Automatic speech recognition for conversational speech, or: What we can learn from human talk in interaction

Activity: Talk or presentation › Invited talk › Science to science


In the last decade, conversational speech has received considerable attention among speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational dialogue systems, which are becoming more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they offer insights into how speech processing works beyond what controlled experiments can provide. Investigating conversational speech, however, requires not only applying existing methods to new data, but also developing new categories and new modeling techniques, and including new knowledge sources. Whereas traditional models are trained on either text or acoustic information, I propose language models that incorporate information on the phonetic variation of words (i.e., pronunciation variation and prosody) and relate this information to the semantic context of the conversation and to its communicative functions. This approach to language modeling is in line with the theoretical model proposed by Hawkins and Smith (2001), in which the perceptual system accesses meaning from speech by using the most salient sensory information from any combination of levels/layers of formal linguistic analysis. The overall aim of my research is to create cross-layer models for conversational speech. In this talk, I will illustrate the general challenges that conversational speech poses for ASR, present results from my recent and ongoing projects on pronunciation and prosody modeling, and discuss directions for future research.
Period: 31 Oct 2019
Held at: Brno University of Technology, Czech Republic
Degree of Recognition: Regional