ICSI Speech FAQ:
5.8 How do different features compare performance-wise?
Answer by: dpwe - 2000-08-05
Different features perform differently in different circumstances, and
no complete pattern or explanation is apparent; that's why this is
still research. However, on this page I have collected some
comparative results, where different features have been used on
the same task, to give some idea of performance variation.
- For BBC Broadcast News (a rather uniform database
of broadcast recordings in British English, from the THISL
project), with a net of 2000 hidden units (2000HU), some
different feature results are shown below:
Feature                  | WER
plp12O                   | 32.3%
msg1-8kHz                | 36.3%
posterior combo of above | 30.7%
The "O" suffix means online normalization was used (the best
condition), and -8kHz means the msg features were based on
downsampled waveforms.
See the full results in my
status report of 2000-07-31.
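This page does not say how the two posterior streams were combined. As a rough sketch only, one common scheme in hybrid systems is to average the per-frame log posteriors of the two nets and renormalize. The function name, the equal stream weighting and the toy numbers below are all assumptions for illustration, not taken from the status report.

```python
import math

def combine_posteriors(post_a, post_b, weight_a=0.5):
    """Combine two per-frame posterior vectors by weighted log-domain averaging.

    post_a, post_b: class probabilities for one frame (same length).
    weight_a: hypothetical stream weight for post_a; post_b gets 1 - weight_a.
    """
    if len(post_a) != len(post_b):
        raise ValueError("posterior vectors must have the same length")
    floor = 1e-10  # avoid log(0)
    log_mix = [
        weight_a * math.log(max(a, floor))
        + (1.0 - weight_a) * math.log(max(b, floor))
        for a, b in zip(post_a, post_b)
    ]
    # Renormalize so the combined values sum to 1.
    peak = max(log_mix)
    unnorm = [math.exp(v - peak) for v in log_mix]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy example (made-up numbers): two streams that disagree on the top class.
plp_frame = [0.6, 0.3, 0.1]
msg_frame = [0.2, 0.7, 0.1]
combined = combine_posteriors(plp_frame, msg_frame)
```

With equal weights this is a normalized geometric mean, so a class has to score reasonably well in both streams to win the frame.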
- For hybrid recognition on Aurora noisy digits, using the 1999
matched-condition test set, here are some typical results
(N suffix means per-utterance normalization; +d means with
deltas; +dd means with deltas and double-deltas):
Feature   | Clean WER | SNR5 WER
Rasta8+d  | 2.5%      | 15.2%
MSG3N     | 2.1%      | 11.6%
PLP12N    | 2.8%      | 13.0%
PLP12+dN  | 2.6%      | 10.6%
PLP12+ddN | 2.4%      | 10.6%
PLP12+dd  | 1.7%      | 8.7%
mfcc13+dd | 1.6%      | 8.7%
For the excruciating detail, see my
Aurora 1999 results page.
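To make the N, +d and +dd suffixes concrete, here is a minimal sketch of per-utterance normalization and delta features. It assumes normalization means zero mean and unit variance per feature dimension, and it uses a simple two-point difference for the deltas. The actual front-end may well use a longer regression window, so treat the function names and the window as illustrative assumptions.

```python
import math

def normalize_utterance(frames):
    """Per-utterance mean/variance normalization of each feature dimension."""
    n = len(frames)
    dims = len(frames[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var) or 1.0  # constant dimensions stay at zero
        for t in range(n):
            out[t][d] = (frames[t][d] - mean) / std
    return out

def deltas(frames):
    """Two-point deltas: (x[t+1] - x[t-1]) / 2, repeating the edge frames."""
    n = len(frames)
    result = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        result.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return result

# Toy utterance (made-up numbers): 3 frames of 2 features each.
utt = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
normed = normalize_utterance(utt)   # the "N" condition
d = deltas(utt)                     # "+d" appends these
dd = deltas(d)                      # "+dd" appends these as well
with_dd = [x + dx + ddx for x, dx, ddx in zip(utt, d, dd)]
```

Each frame in `with_dd` is three times the base feature width, which is why adding deltas and double-deltas also enlarges the net's input layer.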
- I ran a set of comparative tests with and without
Rasta filtering, PLP smoothing and cepstral orthogonalization
on the NUMBERS95 test set. Although my baseline of around
7% WER is quite poor by today's standards, the outcome is
interesting: the different variants (most notably cepstral
versus spectral domain) make almost no difference. The details
are in my status report of 1997-10-03; the complete final
results are:
Features          | Net size   | WER (sub/del/ins)
rasta-plp-cepstra | 243:500:56 | 7.0% (4.0/1.3/1.7)
rasta-cepstra     | 243:500:56 | 6.9% (3.8/1.4/1.7)
plp-cepstra       | 243:500:56 | 6.9% (3.7/1.8/1.4)
cepstra           | 243:500:56 | 6.8% (3.7/1.6/1.5)
rasta-plp-logspec | 405:500:56 | 6.6% (3.7/1.2/1.7)
rasta-logspec     | 405:500:56 | 6.8% (3.9/1.3/1.6)
plp-logspec       | 405:500:56 | 7.1% (3.8/1.7/1.6)
logspec           | 405:500:56 | 7.0% (3.8/1.6/1.6)
According to signif, a difference of at least 0.8% WER is needed
for significance on this test, so all the variations are
statistically equivalent!
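As a quick sanity check on the NUMBERS95 table, the snippet below confirms that each WER is the sum of its substitution, deletion and insertion rates, and that the spread between the best and worst variant stays under the 0.8% threshold quoted above. The numbers are copied from the table. The variable names are mine, and the threshold is taken to be an absolute WER difference, since the conclusion only follows on that reading.

```python
# (features, WER %, sub %, del %, ins %) copied from the NUMBERS95 table.
results = [
    ("rasta-plp-cepstra", 7.0, 4.0, 1.3, 1.7),
    ("rasta-cepstra",     6.9, 3.8, 1.4, 1.7),
    ("plp-cepstra",       6.9, 3.7, 1.8, 1.4),
    ("cepstra",           6.8, 3.7, 1.6, 1.5),
    ("rasta-plp-logspec", 6.6, 3.7, 1.2, 1.7),
    ("rasta-logspec",     6.8, 3.9, 1.3, 1.6),
    ("plp-logspec",       7.1, 3.8, 1.7, 1.6),
    ("logspec",           7.0, 3.8, 1.6, 1.6),
]
MIN_SIGNIFICANT_DIFF = 0.8  # % WER, assumed absolute, as quoted from signif

# Each WER should equal substitutions + deletions + insertions.
for name, wer, sub, dele, ins in results:
    assert abs(wer - (sub + dele + ins)) < 0.05, name

# If even the best and worst systems are closer than the threshold,
# no pair of systems can differ significantly.
wers = [row[1] for row in results]
spread = max(wers) - min(wers)
all_equivalent = spread < MIN_SIGNIFICANT_DIFF
print(f"WER spread: {spread:.1f}%; all equivalent: {all_equivalent}")
```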
Generated by build-faq-index on Tue Mar 24 16:18:16 PDT 2009