ICSI Speech FAQ:
5.8 How do different features compare performancewise?

Answer by: dpwe - 2000-08-05

Different features perform differently in different circumstances, and no complete pattern or explanation is apparent; that's why this is still research. However, on this page I have collected some comparative results, where different features have been used on the same task, to give some idea of performance variation.

For BBC Broadcast News (a rather uniform database of broadcast recordings in British English, from the THISL project), results for several features with a 2000-hidden-unit (2000HU) net are shown below:

Feature	WER
plp12O	32.3%
msg1-8kHz	36.3%
posterior combo of above	30.7%

The "O" suffix means online normalization was used (the best condition), and -8kHz means the msg features were based on downsampled waveforms. See the full results in my status report of 2000-07-31.
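As a rough illustration of what a "posterior combo" of two feature streams can look like, here is a minimal sketch. It assumes the two nets' per-frame posteriors are merged by averaging their log posteriors (a geometric mean) and renormalizing. The FAQ does not state which combination rule produced the 30.7% figure, so the rule, the function name and the example numbers are all assumptions.

```python
import math


def combine_posteriors(p_a, p_b, floor=1e-10):
    """Combine two posterior vectors for one frame.

    Assumed rule (not confirmed by the FAQ): average the log posteriors
    of the two streams, then renormalize so the result sums to 1.
    """
    if len(p_a) != len(p_b):
        raise ValueError("posterior vectors must have the same length")
    # Floor the inputs so that log(0) cannot occur.
    logs = [0.5 * (math.log(max(a, floor)) + math.log(max(b, floor)))
            for a, b in zip(p_a, p_b)]
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(logs)
    unnorm = [math.exp(v - peak) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical posteriors over three classes for a single frame.
plp_frame = [0.7, 0.2, 0.1]
msg_frame = [0.5, 0.4, 0.1]
combined = combine_posteriors(plp_frame, msg_frame)
```

The combined vector would then be passed to the decoder in place of either single-stream posterior.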

For hybrid recognition on Aurora noisy digits, using the 1999 matched-condition test set, here are some typical results (N suffix means per-utterance normalization; +d means with deltas; +dd means with deltas and double-deltas):

Feature	Clean WER	SNR5 WER
Rasta8+d	2.5%	15.2%
MSG3N	2.1%	11.6%
PLP12N	2.8%	13.0%
PLP12+dN	2.6%	10.6%
PLP12+ddN	2.4%	10.6%
PLP12+dd	1.7%	8.7%
mfcc13+dd	1.6%	8.7%

For the excruciating detail, see my Aurora 1999 results page.
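The +d, +dd and N suffixes in the Aurora table can be illustrated with a minimal sketch. The FAQ does not say which delta formula or which kind of normalization the experiments used. The regression-style deltas over a window of 2 frames either side and the per-utterance mean and variance normalization below are therefore assumptions, and all function names are hypothetical.

```python
import math


def deltas(frames, width=2):
    """Regression-style deltas; edge frames are repeated at the boundaries."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, double=False):
    """Append deltas (+d) and optionally double-deltas (+dd) to each frame."""
    d = deltas(frames)
    dd = deltas(d) if double else None
    out = []
    for t in range(len(frames)):
        row = list(frames[t]) + d[t]
        if dd is not None:
            row += dd[t]
        out.append(row)
    return out


def normalize_utterance(frames, eps=1e-8):
    """Per-utterance normalization (N): zero mean, unit variance per dimension."""
    n_frames = len(frames)
    dims = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(dims)]
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n_frames)
            for k in range(dims)]
    return [[(f[k] - means[k]) / (stds[k] + eps) for k in range(dims)]
            for f in frames]


# A one-dimensional ramp stands in for real cepstral frames.
ramp = [[float(t)] for t in range(6)]
ramp_dd = add_deltas(ramp, double=True)
# One possible ordering: add deltas first, then normalize the whole vector.
ramp_ddN = normalize_utterance(ramp_dd)
```

On a linear ramp the interior deltas equal the slope, which is a quick sanity check for the regression formula.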

I made a set of comparative tests with and without Rasta filtering, PLP smoothing and cepstral orthogonalization for the NUMBERS95 test set. Although my baseline results of around 7% WER are poor by today's standards, the outcome is interesting: the different variants (most notably cepstral versus log-spectral domain) make almost no difference. See my status report of 1997-10-03 for details; the complete final results are:

Features	Net size	WER (sub/del/ins)
rasta-plp-cepstra	243:500:56	7.0% (4.0/1.3/1.7)
rasta-cepstra	243:500:56	6.9% (3.8/1.4/1.7)
plp-cepstra	243:500:56	6.9% (3.7/1.8/1.4)
cepstra	243:500:56	6.8% (3.7/1.6/1.5)
rasta-plp-logspec	405:500:56	6.6% (3.7/1.2/1.7)
rasta-logspec	405:500:56	6.8% (3.9/1.3/1.6)
plp-logspec	405:500:56	7.1% (3.8/1.7/1.6)
logspec	405:500:56	7.0% (3.8/1.6/1.6)

signif requires a minimum difference of 0.8% for significance in this test, so all the variations are statistically equivalent!
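The arithmetic behind that conclusion can be checked with a short sketch. It takes the sub/del/ins figures from the table, sums them into a WER for each feature, and compares the largest gap between any two systems against the 0.8% minimum quoted above. This does not reproduce signif itself; it only uses the threshold that signif reported, and the variable names are hypothetical.

```python
# (substitutions, deletions, insertions) in percent, copied from the table.
results = {
    "rasta-plp-cepstra": (4.0, 1.3, 1.7),
    "rasta-cepstra": (3.8, 1.4, 1.7),
    "plp-cepstra": (3.7, 1.8, 1.4),
    "cepstra": (3.7, 1.6, 1.5),
    "rasta-plp-logspec": (3.7, 1.2, 1.7),
    "rasta-logspec": (3.9, 1.3, 1.6),
    "plp-logspec": (3.8, 1.7, 1.6),
    "logspec": (3.8, 1.6, 1.6),
}

# Minimum WER difference needed for significance, as quoted in the text.
SIGNIFICANCE_THRESHOLD = 0.8


def wer(sub, dele, ins):
    """WER is the sum of the substitution, deletion and insertion rates."""
    return round(sub + dele + ins, 1)


wers = {name: wer(*parts) for name, parts in results.items()}

# The largest pairwise difference is simply max minus min.
spread = round(max(wers.values()) - min(wers.values()), 1)
all_equivalent = spread < SIGNIFICANCE_THRESHOLD

for name, value in wers.items():
    print(f"{name}\t{value:.1f}%")
print(f"largest gap: {spread:.1f}% (threshold {SIGNIFICANCE_THRESHOLD}%)")
```

Because the best and worst systems differ by less than the threshold, no pair of systems in the table can differ significantly.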


Generated by build-faq-index on Tue Mar 24 16:18:16 PDT 2009