The ICSI Meeting Recorder Digits Corpus provides a collection of connected digit speech data recorded in a real meeting room. It is intended to support the development and comparison of reverberation and noise reduction algorithms in real-world environments.
The package available here contains unsegmented recordings of read connected digits made simultaneously with four table-top PZM microphones. (This audio data, along with recordings from personal microphones and table-top electret microphones, is also available from the Linguistic Data Consortium as part of the ICSI Meeting Corpus.) Segmentation and utterance extraction scripts, transcription files, and additional documentation are also included. Segmentation yields 2790 utterances.
The Meeting Recorder Digits Corpus (table-top PZM microphones) is available via anonymous FTP here.
This corpus was used in Ferras' master's thesis. The Aurora 5 benchmark, available from ELRA, also makes use of data from this set. Before the full corpus was ready, a subset of it was used in ASRU 2001 and ICSLP 2002 papers by Gelbart and Morgan.