Dataset of directional room impulse responses for realistic speech data

Stefan Fragner*, Lukas Pfeifenberger, Martin Hagmüller, Franz Pernkopf

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Obtaining real-world multi-channel speech recordings is expensive and time-consuming. Therefore, multi-channel recordings are often artificially generated by convolving existing monaural speech recordings with simulated Room Impulse Responses (RIRs) from a so-called shoebox room [1] for stationary (non-moving) speakers. However, far-field speech processing for home automation or smart assistants has to cope with moving speakers in reverberant environments. With this dataset, we aim to support the generation of realistic speech data by providing multiple directional RIRs along a fine grid of locations in a real room. We provide directional RIR recordings for a classroom and a large corridor. These RIRs can be used to simulate moving speakers by generating random trajectories on the grid and quantizing them to the grid points. For each grid point on the trajectory, the corresponding segment of the monaural speech recording can be convolved with the RIR at that grid point. The spatialized recording can then be compiled from the per-grid-point segments using the overlap-add method [2]. An example is provided with the data.
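The simulation procedure described in the abstract (quantize a trajectory to grid points, convolve each speech segment with the matching RIR, and recombine via overlap-add) can be sketched as follows. This is a minimal illustration, not the example shipped with the dataset: the grid-point keys, the `rirs` dictionary layout, and the hop size are assumptions made for this sketch, and the RIRs here would be single channels of the directional recordings.

```python
import numpy as np

def spatialize_moving_speaker(speech, rirs, trajectory, hop):
    """Simulate a moving speaker by per-segment convolution and overlap-add.

    speech     : 1-D monaural speech signal
    rirs       : dict mapping a grid point to its RIR (hypothetical layout)
    trajectory : quantized trajectory, one grid point per hop-sized frame
    hop        : hop size in samples (frames are 2*hop long, 50% overlap)
    """
    win = np.hanning(2 * hop)  # tapered window for smooth cross-fades
    max_rir = max(len(r) for r in rirs.values())
    out = np.zeros(len(speech) + max_rir - 1)
    for i, grid_point in enumerate(trajectory):
        start = i * hop
        frame = speech[start:start + 2 * hop]
        if len(frame) == 0:
            break
        # Window the segment, convolve with the RIR at this grid point,
        # and add the result back at the segment's position (overlap-add).
        wet = np.convolve(frame * win[:len(frame)], rirs[grid_point])
        out[start:start + len(wet)] += wet
    return out
```

A trajectory longer than the speech signal is simply truncated; in practice the trajectory would be sampled from a random path over the measured grid and quantized to the nearest grid points before this step.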

Original language: English
Article number: 110229
Journal: Data in Brief
Volume: 53
Publication status: Published - Apr 2024

Keywords

  • Artificial intelligence
  • Deep learning
  • Reverberant speech data
  • Room impulse response
  • Speech processing

ASJC Scopus subject areas

  • General

