Companion page to
Dan Ellis's submission to
Speech Communication's special issue on
Computational Auditory Scene Analysis

This page is a placeholder for the sound examples specifically referenced in my submission to the Speech Communication special issue on CASA, which arose from the 2nd CASA Workshop held at IJCAI 1997 in Nagoya.

All sound examples are in 16-bit mono AIFF format. The construction site examples are sampled at 22 kHz, and the voice/clap examples are sampled at 8 kHz. If your browser supports client-side image maps, you should be able to hear the examples by clicking on particular elements in the images.

Construction site ambience example

This example applies my prediction-driven analysis system to a complex nonspeech ambient scene. It is taken from my Ph.D. dissertation.

[Image map of the construction site analysis; regions: original mixture, noise2, wefts8,10, click1, clicks2,3, click4, clicks7,8, wefts1-5, wefts7,9, noise1]

original · saw noise · voice wefts · wood click · metal clicks · wood drop click ·
clink1 click · clink2 click · weft group 1 · weft group 2 · background noise ·
resynthesis with all elements

Speech+Clap example

This is an example of using the prediction-driven system with an integrated speech recognizer to attempt to separate a mixture of speech and nonspeech. It is described further in my ICASSP'98 submission.

[Image map of the speech+clap analysis; regions: original clap, original speech+clap, label-based reconstruction, low-mod-freq portion of input, complete speech reconstruction, nonspeech click element, spurious nonspeech elements]

clap alone (resynthesized) · speech plus clap (resynthesized) · label-based reconstruction ·
low-frequency spectrum reconstruction · full reconstruction · clap nonspeech element ·
other nonspeech elements

The "(resynthesized)" versions are rebuilt from the spectral-envelope information of the original sounds, which is the only information the analysis uses. They therefore act as a rough upper bound on the quality that any resynthesis from the analysis could achieve.
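The page does not say exactly how the "(resynthesized)" versions are produced. As a rough illustration only, the sketch below assumes a channel-vocoder-style scheme: it keeps just a coarse per-band magnitude envelope of each short frame and then rebuilds a sound by overlap-adding random-phase noise shaped by that envelope, so anything not captured by the envelope (fine spectral detail, phase) is lost. The frame size, hop, band count and all function names (`spectral_envelope`, `resynthesize`, `band_of`) are hypothetical choices for this sketch, not the front end of the prediction-driven system.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) DFT; adequate for the tiny frames used in this sketch."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT keeping only the real part (spec is built Hermitian below)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def band_of(k, n, n_bands):
    """Map a positive-frequency bin k (0..n//2) to one of n_bands equal-width bands."""
    return min(k * n_bands // (n // 2 + 1), n_bands - 1)


def spectral_envelope(x, n=64, hop=32, n_bands=8):
    """Hypothetical coarse envelope: RMS DFT magnitude per band, per frame."""
    win = hann(n)
    env = []
    for start in range(0, len(x) - n + 1, hop):
        mags = [abs(c) for c in dft([x[start + t] * win[t] for t in range(n)])]
        sums, counts = [0.0] * n_bands, [0] * n_bands
        for k in range(n // 2 + 1):
            b = band_of(k, n, n_bands)
            sums[b] += mags[k] ** 2
            counts[b] += 1
        env.append([math.sqrt(s / c) for s, c in zip(sums, counts)])
    return env


def resynthesize(env, n=64, hop=32, n_bands=8, seed=0):
    """Overlap-add random-phase noise whose band magnitudes follow env.

    Output amplitude is not calibrated against the input; only the
    time-frequency distribution of energy is meant to be preserved.
    """
    rng = random.Random(seed)
    win = hann(n)
    out = [0.0] * (hop * (len(env) - 1) + n)
    for f, bands in enumerate(env):
        spec = [0j] * n
        for k in range(n // 2 + 1):
            mag = bands[band_of(k, n, n_bands)]
            # DC and Nyquist bins must be real for a real-valued frame.
            phase = 0.0 if k in (0, n // 2) else rng.uniform(0, 2 * math.pi)
            spec[k] = mag * cmath.exp(1j * phase)
            if 0 < k < n // 2:
                spec[n - k] = spec[k].conjugate()  # Hermitian symmetry
        frame = idft_real(spec)
        for t in range(n):
            out[f * hop + t] += frame[t] * win[t]
    return out
```

For example, `resynthesize(spectral_envelope(x))` on a pure tone gives band-limited noise centred on the tone's band: the pitch detail disappears, which is the kind of information loss the resynthesized examples are meant to expose.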

