Detection of Extra Pulses in Synthesized Glottal Area Waveforms of Dysphonic Voices

Philipp Aichinger*, Franz Pernkopf, Jean Schoentgen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Background and objectives

The description of the production kinematics of dysphonic voices plays an important role in the clinical care of voice disorders. However, high-speed videolaryngoscopy is not routinely used in clinical practice, partly because there is a lack of diagnostic markers that can be obtained from high-speed videos automatically. The aim of the study is to propose and test a procedure that automatically detects extra pulses, which may occur in voiced source signals of pathological voices in addition to cyclic pulses.

Material and methods

Glottal area waveforms (GAWs) are synthesized and used to test a detector for extra pulses. Regarding synthesis, for each GAW a cyclic pulse train is mixed with an extra pulse train and additive noise. The cyclic pulse trains are varied across GAWs in terms of fundamental frequency, pulse shape, and modulation noise, i.e., jitter and shimmer. The extra pulse trains are varied across GAWs in terms of the height of the extra pulses and their rate of occurrence. The energy level of the additive noise is also varied. Regarding detection, first, the fundamental frequency is estimated jointly with the cyclic pulse train waveform; second, the modulation noise is estimated; and finally, the extra pulse train waveform is estimated. Two versions of the detector are compared, i.e., one that parameterizes the shapes of the cyclic pulses, and one that uses unparameterized pulse shape estimates. Two corpora are used for testing, i.e., one with 100 GAWs containing random extra pulses, and one with 25 GAWs containing extra pulses in the closed phase of each glottal cycle, representing subharmonic voices.
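
The synthesis step described above can be illustrated with a minimal sketch. It is not the authors' synthesizer: the raised-cosine pulse shape, the Gaussian jitter and shimmer models, the placement of extra pulses in the closed phase with a fixed probability, and all names and default values (`synthesize_gaw`, `open_quotient`, `extra_rate`, `noise_db`, and so on) are assumptions made for illustration only.

```python
import math
import random


def _add_pulse(signal, fs, onset, length, height):
    """Add a raised-cosine pulse (an assumed shape) starting at `onset` seconds."""
    start = int(round(onset * fs))
    width = int(round(length * fs))
    for k in range(width):
        i = start + k
        if 0 <= i < len(signal):
            signal[i] += height * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / width))


def synthesize_gaw(fs=4000, duration=0.5, f0=150.0, jitter=0.01, shimmer=0.05,
                   open_quotient=0.6, extra_rate=0.2, extra_height=0.5,
                   noise_db=-30.0, seed=0):
    """Return (gaw, cyclic, extra): the mixture and two of its components."""
    rng = random.Random(seed)
    n = int(fs * duration)
    cyclic = [0.0] * n
    extra = [0.0] * n
    t0 = 0.0  # onset time of the current cycle in seconds
    while t0 < duration:
        # Jitter perturbs the cycle length, shimmer the pulse height.
        period = (1.0 / f0) * (1.0 + jitter * rng.gauss(0.0, 1.0))
        period = max(period, 0.25 / f0)  # guard against degenerate cycles
        height = max(0.0, 1.0 + shimmer * rng.gauss(0.0, 1.0))
        open_len = open_quotient * period
        _add_pulse(cyclic, fs, t0, open_len, height)
        # Assumption: an extra pulse fills the closed phase with probability extra_rate.
        if rng.random() < extra_rate:
            _add_pulse(extra, fs, t0 + open_len, period - open_len, extra_height)
        t0 += period
    # Scale the noise so its expected energy is noise_db relative to the cyclic train.
    mean_power = sum(x * x for x in cyclic) / n
    sigma = math.sqrt(mean_power * 10.0 ** (noise_db / 10.0))
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    gaw = [c + e + w for c, e, w in zip(cyclic, extra, noise)]
    return gaw, cyclic, extra


gaw, cyclic, extra = synthesize_gaw()
```

Setting `extra_rate=1.0` in this sketch places an extra pulse in every closed phase, which loosely mimics the subharmonic case; drawing extra pulse positions at random instead would be closer to the first corpus.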

Results and discussion

With pulse shape parameterization (PSP), a maximum mean accuracy of 88.3% is achieved when detecting random extra pulses. Without PSP, the maximum mean accuracy drops to 82.9%. Detection performance decreases if the energy level of the additive noise is higher than −25 dB with respect to the energy of the cyclic pulse train, and if the irregularity strength exceeds 0.1. For bicyclic, i.e., subharmonic voices, the approach fails without PSP, whereas with PSP a mean sensitivity of 87.4% is achieved.
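
For readers unfamiliar with the reported figures, the following sketch shows the standard definitions of accuracy and sensitivity for binary detection decisions. The abstract does not state the unit over which decisions are scored (for example per cycle or per frame), so the per-item labels and the function name `accuracy_and_sensitivity` are hypothetical.

```python
def accuracy_and_sensitivity(truth, predicted):
    """Compute accuracy and sensitivity from two equal-length binary label sequences."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # extra pulse present and detected
    tn = sum(1 for t, p in pairs if not t and not p)  # absent and not detected
    fn = sum(1 for t, p in pairs if t and not p)      # present but missed
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity is undefined when no extra pulse is present in the ground truth.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical labels: 1 marks an item that contains an extra pulse.
acc, sens = accuracy_and_sensitivity([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

A mean accuracy or mean sensitivity would then be the average of these values over the signals of a corpus.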

A synthesizer for GAWs containing extra pulses and a detector for extra pulses are proposed. With PSP, favorable detector performance is observed at moderate levels of additive noise and irregularity strength. In signals with high noise levels, the detector without PSP outperforms the detector with PSP. Detection of extra pulses fails if the irregularity strength is large. For subharmonic voices, PSP must be used.

Original language: English
Pages (from-to): 158-167
Journal: Biomedical Signal Processing and Control
Publication status: Published - 2018

