Joint Time-Frequency Segmentation Algorithm for Transient Speech Decomposition and Speech Enhancement

Charturong Tantibundhit; Franz Pernkopf; Gernot Kubin

doi:10.1109/TASL.2009.2035037

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

We develop an algorithm, the joint time-frequency segmentation algorithm, in which the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. Further, our algorithm enables the decomposition of the speech signal into transient and non-transient components. Any block of wavelet packet coefficients whose tiling height is larger than or equal to its tiling width belongs to the transient component; any block whose tiling height is smaller than its tiling width belongs to the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate modified speech whose energy is adjusted to equal that of the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rate results show that the modified speech significantly improves speech intelligibility in background noise, with absolute gains ranging from 10% at 0 dB to 27% at -30 dB.
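The two rules stated in the abstract, classifying coefficient blocks by tiling height versus width and recombining an amplified transient component at equal energy, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tile` container, the helper names (`is_transient`, `split_tiles`, `enhance`), the assumption that height and width are measured in shared tiling units, and the use of one scalar gain for the "selective" amplification are all hypothetical. The wavelet packet analysis, the segmentation itself, and the resynthesis of the transient waveform are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Tile:
    # Hypothetical container for one block of wavelet packet coefficients.
    # height: extent along the frequency axis; width: extent along the time
    # axis, both assumed to be measured in the same tiling units.
    height: float
    width: float


def is_transient(tile: Tile) -> bool:
    # Rule from the abstract: tiling height >= tiling width -> transient.
    return tile.height >= tile.width


def split_tiles(tiles: Sequence[Tile]) -> Tuple[List[Tile], List[Tile]]:
    # Partition the tiling into transient and non-transient blocks.
    transient = [t for t in tiles if is_transient(t)]
    non_transient = [t for t in tiles if not is_transient(t)]
    return transient, non_transient


def energy(x: Sequence[float]) -> float:
    # Signal energy as the sum of squared samples.
    return sum(v * v for v in x)


def enhance(original: Sequence[float], transient: Sequence[float], gain: float) -> List[float]:
    # Amplify the transient component by one scalar gain (an assumption),
    # add it to the original, then rescale so the result has the same
    # energy as the original.
    if len(original) != len(transient):
        raise ValueError("original and transient must have the same length")
    mixed = [x + gain * t for x, t in zip(original, transient)]
    e_mixed = energy(mixed)
    if e_mixed == 0.0:
        # Degenerate case: an all-zero mix cannot be rescaled.
        return mixed
    scale = math.sqrt(energy(original) / e_mixed)
    return [scale * y for y in mixed]


if __name__ == "__main__":
    tiles = [Tile(4, 1), Tile(1, 4), Tile(2, 2)]
    transient_tiles, non_transient_tiles = split_tiles(tiles)
    print(len(transient_tiles), len(non_transient_tiles))  # prints "2 1"

    original = [1.0, 0.0, -1.0, 0.0]
    transient = [0.0, 0.5, 0.0, -0.5]
    modified = enhance(original, transient, gain=2.0)
    print(round(energy(original), 6), round(energy(modified), 6))  # prints "2.0 2.0"
```

Because the output is rescaled to the original energy, the sketch changes only how energy is distributed between the transient and non-transient parts, not the overall level of the signal.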

Original language	English
Pages (from-to)	1417-1428
Journal	IEEE Transactions on Audio, Speech, and Language Processing
Volume	18
Issue number	6
DOIs	https://doi.org/10.1109/TASL.2009.2035037
Publication status	Published - 2010

Fields of Expertise

Information, Communication & Computing

Cite this

@article{05816041c42d4596813d9be85151a18b,
  title     = "Joint Time-Frequency Segmentation Algorithm for Transient Speech Decomposition and Speech Enhancement",
  abstract  = "We develop an algorithm, the joint time-frequency segmentation algorithm, where the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. Further, our algorithm enables the decomposition of the speech signal into transient and non-transient components, respectively. Any block of wavelet packet coefficients, whose tiling height is larger than or equal to the tiling width belongs to the transient component and vice versa for the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate the modified speech with energy adjusted to be equal to the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rate results show that the modified speech significantly improves speech intelligibility in background noise, i.e., by 10\% absolute at 0 dB to 27\% absolute at -30 dB.",
  author    = "Charturong Tantibundhit and Franz Pernkopf and Gernot Kubin",
  year      = "2010",
  doi       = "10.1109/TASL.2009.2035037",
  language  = "English",
  volume    = "18",
  pages     = "1417--1428",
  journal   = "IEEE Transactions on Audio, Speech, and Language Processing",
  issn      = "1558-7924",
  publisher = "Institute of Electrical and Electronics Engineers",
  number    = "6",
}

TY  - JOUR
T1  - Joint Time-Frequency Segmentation Algorithm for Transient Speech Decomposition and Speech Enhancement
AU  - Tantibundhit, Charturong
AU  - Pernkopf, Franz
AU  - Kubin, Gernot
PY  - 2010
Y1  - 2010
N2  - We develop an algorithm, the joint time-frequency segmentation algorithm, where the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. Further, our algorithm enables the decomposition of the speech signal into transient and non-transient components, respectively. Any block of wavelet packet coefficients, whose tiling height is larger than or equal to the tiling width belongs to the transient component and vice versa for the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate the modified speech with energy adjusted to be equal to the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rate results show that the modified speech significantly improves speech intelligibility in background noise, i.e., by 10% absolute at 0 dB to 27% absolute at -30 dB.
AB  - We develop an algorithm, the joint time-frequency segmentation algorithm, where the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. Further, our algorithm enables the decomposition of the speech signal into transient and non-transient components, respectively. Any block of wavelet packet coefficients, whose tiling height is larger than or equal to the tiling width belongs to the transient component and vice versa for the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate the modified speech with energy adjusted to be equal to the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rate results show that the modified speech significantly improves speech intelligibility in background noise, i.e., by 10% absolute at 0 dB to 27% absolute at -30 dB.
U2  - 10.1109/TASL.2009.2035037
DO  - 10.1109/TASL.2009.2035037
M3  - Article
SN  - 1558-7924
VL  - 18
SP  - 1417
EP  - 1428
JO  - IEEE Transactions on Audio, Speech, and Language Processing
JF  - IEEE Transactions on Audio, Speech, and Language Processing
IS  - 6
ER  - 
