This page is a placeholder for the sound examples specifically referenced in my submission to the Speech Communication special issue on computational auditory scene analysis (CASA), arising from the 2nd CASA Workshop held at IJCAI-97 in Nagoya.
All sound examples are in 16-bit mono AIFF format. The construction site examples are sampled at 22 kHz, and the voice/clap examples are sampled at 8 kHz. If your browser supports client-side image maps, you should be able to hear the examples by clicking on particular elements in the images.
This example applies my prediction-driven analysis system to a complex nonspeech ambient scene. It is taken from my Ph.D. dissertation.
original · saw noise · voice wefts · wood click · metal clicks · wood drop click · clink1 click · clink2 click · weft group 1 · weft group 2 · background noise · resynthesis with all elements
This is an example of using the prediction-driven system with an integrated speech recognizer to attempt to separate a mixture of speech and nonspeech. It is described further in my ICASSP'98 submission.
clap alone (resynthesized) · speech plus clap (resynthesized) · label-based reconstruction · low-frequency spectrum reconstruction · full reconstruction · clap nonspeech element · other nonspeech elements
The "(resynthesized)" versions are reconstructed directly from the spectral-envelope information of the original sounds, which is the only information the analysis uses. They therefore serve as a kind of upper bound on the resynthesis quality the system can achieve.
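The page does not specify how a spectral envelope is turned back into sound. Purely as a hedged illustration, the sketch below shows one common way to resynthesize from magnitude information alone: each frame gets a spectrum whose magnitudes follow the envelope and whose phases are random, and the frames are inverse-transformed and overlap-added with a Hann window. The function name `resynthesize_from_envelope`, the frame parameters, and the representation (one list of DFT-bin magnitudes per frame) are all assumptions for illustration; they are not the filterbank representation used by the prediction-driven system.

```python
import cmath
import math
import random


def resynthesize_from_envelope(envelope, frame_len=64, hop=32, seed=0):
    """Hypothetical noise-excited overlap-add resynthesis.

    envelope: list of frames; each frame is a list of frame_len // 2 + 1
    magnitudes for DFT bins 0..Nyquist (an assumed representation).
    Phase is discarded, so only the envelope information is reproduced.
    """
    if frame_len % 2 != 0:
        raise ValueError("frame_len must be even")
    rng = random.Random(seed)
    n_bins = frame_len // 2 + 1
    # Periodic Hann window: with hop = frame_len / 2 the windows sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out = [0.0] * (hop * (len(envelope) - 1) + frame_len)
    for f, mags in enumerate(envelope):
        if len(mags) != n_bins:
            raise ValueError("each frame needs frame_len // 2 + 1 magnitudes")
        # Hermitian-symmetric spectrum with random phases -> real-valued frame.
        spec = [0j] * frame_len
        for k in range(n_bins):
            if k == 0 or k == frame_len // 2:
                # DC and Nyquist bins must be real: phase 0 or pi.
                phase = 0.0 if rng.random() < 0.5 else math.pi
            else:
                phase = rng.uniform(0.0, 2 * math.pi)
            spec[k] = mags[k] * cmath.exp(1j * phase)
            if 0 < k < frame_len // 2:
                spec[frame_len - k] = spec[k].conjugate()
        # Naive O(N^2) inverse DFT, then windowed overlap-add.
        start = f * hop
        for n in range(frame_len):
            s = sum(spec[k] * cmath.exp(2j * math.pi * k * n / frame_len)
                    for k in range(frame_len))
            out[start + n] += window[n] * s.real / frame_len
    return out
```

Because the phases are random, two resyntheses of the same envelope differ sample by sample but share the same short-time spectral shape, which is why envelope-only reconstruction is a natural reference point for judging resynthesis quality.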