Analyzing sound mixtures
Real sound = many sources (speech, ...)
How to analyze / decompose?
The CASA approach
- bottom-up cues
- assemble into larger structures
- train on mixed signals (average static noise)
- parallel models (decomposition)
- calculating joint probabilities
- relative levels (cepstral domain)
- suitability of state model for nonspeech?
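One way to read the "parallel models" and "joint probabilities" bullets is as a decomposition in which each source has its own state model and every frame is scored against pairs of states. The sketch below is only an illustration under stated assumptions, not the method from the talk: each state is a diagonal Gaussian over a few log-spectral channels, the mixture is approximated as the per-channel maximum of the two sources, and all names (`joint_loglik`, `best_state_pair`, the toy state tables) are hypothetical. Transitions and decoding over time are left out, and it works in the log-spectral rather than the cepstral domain to keep the arithmetic simple.

```python
import math

def gauss_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mean, std):
    """Cumulative distribution of a 1-D Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def joint_loglik(obs, state_a, state_b):
    """Log-likelihood of one frame given a pair of states, one per source.

    Assumes the observed channel value is max(a, b) of two independent
    Gaussian sources, so its density is p_a(x) * C_b(x) + p_b(x) * C_a(x).
    """
    total = 0.0
    for x, (ma, sa), (mb, sb) in zip(obs, state_a, state_b):
        a_dominates = gauss_pdf(x, ma, sa) * gauss_cdf(x, mb, sb)
        b_dominates = gauss_pdf(x, mb, sb) * gauss_cdf(x, ma, sa)
        # Tiny floor avoids log(0) when both terms underflow.
        total += math.log(a_dominates + b_dominates + 1e-300)
    return total

def best_state_pair(obs, states_a, states_b):
    """Score every (state_a, state_b) combination and return the best pair."""
    scores = {
        (i, j): joint_loglik(obs, sa, sb)
        for i, sa in enumerate(states_a)
        for j, sb in enumerate(states_b)
    }
    return max(scores, key=scores.get), scores

# Toy models: each state is a list of (mean, std) per channel, in dB-like units.
speech_states = [
    [(10.0, 1.0), (2.0, 1.0), (2.0, 1.0)],
    [(2.0, 1.0), (10.0, 1.0), (2.0, 1.0)],
]
noise_states = [
    [(2.0, 1.0), (2.0, 1.0), (8.0, 1.0)],
    [(5.0, 1.0), (5.0, 1.0), (5.0, 1.0)],
]

frame = [10.0, 2.0, 8.0]  # energy in channel 0 (speech-like) and channel 2 (noise-like)
best, scores = best_state_pair(frame, speech_states, noise_states)
print(best)
```

The number of state pairs grows as the product of the two model sizes, which is one reason the pruning of hypotheses matters in this kind of scheme.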
What do people do?
- ?: hypotheses pruned by general-purpose bottom-up cues
- i.e. a combination...
Exploiting ASR in CASA
- Dan Ellis