This page is a central location to gather and organize
the results of the various forays we are making into the Hub 4 - Broadcast
News task here at ICSI. As our experiments continue, it will hopefully
form a kind of narrative of our approach to the problem.
1998aug03: Progress of the current trainings
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
49.7 | 45.8 | 44.6 | 43.2 | 42.4 | 39.4 | 37.7 | 36.8 | 36.6 | 35.3 | 35.1 | 35.2 | 35.0 |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
51.9 | 50.5 | 48.5 | 44.5 | 42.2 | 39.9 | 39.2 | 39.4 | 38.8 | 38.8 | 38.6 | 38.5 | 38.5 |
| No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized |
Context Window 9 | 39.8 / 36.7 | 40.0 | 37.7 / 35.0 |
Context Window 5 | 44.5 | 40.5 | 39.3 / 35.7 |
| No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized |
Context Window 9 | 30.8 / 31.1 | 31.2 | 30.5 |
Context Window 5 | 32.2 | 30.8 | 30.9 / 30.2 |
| No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized |
Context Window 9 | 33.9 | ||
Context Window 5 | 34.7 | 34.0 / 34.7 |
| No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized |
Context Window 9 | 29.6 | ||
Context Window 5 | 29.8 | 29.4 / 29.9 |
System/Spoke | all | F0 | F1 | F2 | F3 | F4 | F5 | Fx |
---|---|---|---|---|---|---|---|---|
% test set | 100.0 | 37.8 | 19.6 | 14.9 | 5.1 | 12.6 | 2.2 | 7.7 |
Features | # Hidden Units | # epochs | %err alone | %err w/RNN |
PLP12N, 7hyp | 2000 | 8 | 36.7 | 31.1 |
" | 2000 | 13 | 35.0 | 30.5 |
" | 4000 | 8 | 36.0 | 31.0 |
" | 4000 | 10 | 35.0 | 30.4 |
" | 4000 | 13 | 34.4 | 30.1 |
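One way to read the table above is to express the difference between the "%err alone" and "%err w/RNN" columns as a relative error reduction. The following is a minimal Python sketch of that arithmetic, using only the rows from the table; the function name `relative_reduction` and the `ROWS` list are illustrative assumptions, not part of any ICSI tool.

```python
def relative_reduction(err_alone, err_with_rnn):
    """Relative error reduction, in percent, between the two columns."""
    return 100.0 * (err_alone - err_with_rnn) / err_alone


# (hidden units, epochs, %err alone, %err w/RNN), copied from the table above.
ROWS = [
    (2000, 8, 36.7, 31.1),
    (2000, 13, 35.0, 30.5),
    (4000, 8, 36.0, 31.0),
    (4000, 10, 35.0, 30.4),
    (4000, 13, 34.4, 30.1),
]

for hidden, epochs, alone, with_rnn in ROWS:
    rel = relative_reduction(alone, with_rnn)
    print(f"{hidden} HUs, {epochs} epochs: "
          f"{alone} -> {with_rnn} ({rel:.1f}% relative)")
```

For these rows the relative reduction falls between roughly 12% and 15%, and it shrinks somewhat as the stand-alone error improves.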
msg0+2N-8k-half epoch 8 | Hidden Bias (mean -6.5684) | Output Bias (mean -3.758) |
plp12N-8k-half merged with msg0+2N-8k-half | Hidden Bias (mean -8.1153) | Output Bias (mean -3.5811) |
plp12+d5_cw5 | Hidden Bias (mean -6.7564) | Output Bias (mean -3.6158) |
plp12N | Hidden Bias (mean -6.6209) | Output Bias (mean -3.6358) |
The subbands are labelled "a", "b", "c", and "d", and consist of the
following features:
Band | Top Half Features | Bottom Half Features |
A | 0,1,2,3,4 | 14,15,16,17,18 |
B | 5,6,7,8 | 19,20,21,22 |
C | 9,10,11 | 23,24,25 |
D | 11,12,13 | 25,26,27 |
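The band layout in the table above can be written down as a small lookup structure. This is a minimal Python sketch, assuming a 28-element feature frame indexed 0 to 27 as in the table; the names `BANDS`, `band_indices`, and `band_features` are hypothetical and do not come from the original tools.

```python
# Feature indices per subband, copied from the table above.
# Each band has a "top half" and a "bottom half" group of indices.
BANDS = {
    "a": {"top": [0, 1, 2, 3, 4], "bottom": [14, 15, 16, 17, 18]},
    "b": {"top": [5, 6, 7, 8], "bottom": [19, 20, 21, 22]},
    "c": {"top": [9, 10, 11], "bottom": [23, 24, 25]},
    "d": {"top": [11, 12, 13], "bottom": [25, 26, 27]},
}


def band_indices(name):
    """Return all feature indices for one band, top half first."""
    band = BANDS[name]
    return band["top"] + band["bottom"]


def band_features(frame, names):
    """Select the features of the given bands from one feature frame.

    `frame` is assumed to be a sequence of 28 values; `names` is a
    string such as "abd" naming the bands to keep (hypothetical helper).
    Indices shared by several bands are kept only once.
    """
    indices = sorted({i for n in names for i in band_indices(n)})
    return [frame[i] for i in indices]


if __name__ == "__main__":
    frame = list(range(28))  # dummy frame: each value equals its index
    print(band_features(frame, "a"))
    print(len(band_features(frame, "abcd")))
```

As listed in the table, bands C and D share indices 11 and 25, which is why the sketch deduplicates indices when several bands are combined.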
Bands | WER |
abcd | 38.7 |
abcd-rnn* | 30.4 |
a-b-c-d | 64.9 |
a-b-c-d-abcdx4 | 40.5 |
a-b-c-d-abcdx4-rnnx8 | 31.8 |
a+b+c+d | 48.6 |
a+b+c+d-abcdx4 | 38.5 |
a+b+c+d-abcdx4-rnnx8 | 30.7 |
abc-abcd | 38.7 |
abc-abd-acd-bcd | 37.9 |
abc-abd-acd-bcd-abcdx4 | 37.7 |
abc-abd-acd-bcd-abcdx4-rnnx8 | 30.2 |
cep a-b-c-d | Out of memory on teq |
klt a-b-c-d | 62.7 |
cep a-b-c-d-abcdx4 | 40.2 |
klt a-b-c-d-abcdx4 | 40.2 |
cep a-b-c-d-abcdx4-rnnx8 | 31.6 |
klt a-b-c-d-abcdx4-rnnx8 | 31.4 |
klt a+b+c+d | 47.6 |
klt a+b+c+d no norm | 54.7 |
klt a+b+c+d-abcd | 38.7 |
klt a+b+c+d-abcdx4 | 38.2 |
klt a+b+c+d-abcdx4 no norm | 37.7 |
klt a+b+c+d-abcdx4-rnnx8 | 30.3 |
klt a+b+c+d-abcdx4-rnnx8 no norm | 30.5 |
From these results, the "single" multiband approach hurts overall recognition, even when the bands are combined using an MLP. The "drop one" multiband approach doesn't help enough to be worth the extra complexity (although I haven't tried combining the "drop one" multiband using an MLP).
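The comparison behind these conclusions can be sketched as a difference from the full-band `abcd` result. The Python example below uses WER values copied from the table above and covers only the systems without RNN combination; the dictionary and function names are illustrative assumptions.

```python
BASELINE = "abcd"

# WER values copied from the results table (systems without RNN combination).
WER = {
    "abcd": 38.7,
    "a-b-c-d": 64.9,
    "a-b-c-d-abcdx4": 40.5,
    "a+b+c+d": 48.6,
    "a+b+c+d-abcdx4": 38.5,
    "abc-abd-acd-bcd": 37.9,
    "abc-abd-acd-bcd-abcdx4": 37.7,
}


def delta_vs_baseline(system):
    """Absolute WER difference from the baseline (positive means worse)."""
    return round(WER[system] - WER[BASELINE], 1)


for system in WER:
    if system != BASELINE:
        print(f"{system:26s} {WER[system]:5.1f}  {delta_vs_baseline(system):+.1f}")
```

A positive difference means the system is worse than `abcd` alone; a negative one means it is better.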
See the bnspokes database for a breakdown of the focus conditions.