ICSI Broadcast News Results Page


This page is a central location to gather and organize the results of the various forays we are making into the Hub 4 - Broadcast News task here at ICSI. As our experiments continue, it will hopefully form a kind of narrative of our approach to the problem.
 

1998aug03: Progress of the current trainings


Be sure to check out the BN machine usage page.

Single Net Test Results

These results are now accessed through the bnsingle database. (If the server isn't working, you can see the frozen bnsingle results version.)
 
 
Word-Error Rate for the 13 Epochs of plp12N-8k-half
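Epoch | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
WER | 49.7 | 45.8 | 44.6 | 43.2 | 42.4 | 39.4 | 37.7 | 36.8 | 36.6 | 35.3 | 35.1 | 35.2 | 35.0
I don't know why epoch 8 is 36.8 here rather than the 36.7 reported in the other tables on this page.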

 
Word-Error Rate for the 13 Epochs of msg0+2N-8k-half
Epoch | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
WER | 51.9 | 50.5 | 48.5 | 44.5 | 42.2 | 39.9 | 39.2 | 39.4 | 38.8 | 38.8 | 38.6 | 38.5 | 38.5
 



 

PLP Context Window Experiment Results

 
Summary of plp12-8k / plp12N-8k Context Window Experiments
Setting | No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized
Context Window 9 | 39.8 / 36.7 | 40.0 | 37.7 / 35.0
Context Window 5 | 44.5 | 40.5 | 39.3 / 35.7
 
 
Summary of plp12-8k / plp12N-8k + RNN Context Window Experiments
Setting | No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized
Context Window 9 | 30.8 / 31.1 | 31.2 | 30.5
Context Window 5 | 32.2 | 30.8 | 30.9 / 30.2
 
 
Summary of plp12-8k / plp12N-8k + msg Context Window Experiments
Setting | No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized
Context Window 9 33.9
Context Window 5 34.7 34.0 / 34.7
 
 
Summary of plp12-8k / plp12N-8k + msg + RNN Context Window Experiments
Setting | No Deltas / No Deltas Normalized | Delta Window 9 / Delta Window 9 Normalized | Delta Window 5 / Delta Window 5 Normalized
Context Window 9 29.6
Context Window 5 29.8 29.4 / 29.9

Combinations Results:

 
These results are now accessed through the bncombo database. (If the server isn't working, you can see the frozen bncombo results version.)



 
WER% breakdown by spoke (acoustic condition):
Not every trial has been recorded here. Contact janin@cs.berkeley.edu if there's one you're interested in.
 
These results are now accessed through the bnspokes database. (If the server isn't working, you can see the frozen bnspokes results version.)
 
System/Spoke | all | F0 | F1 | F2 | F3 | F4 | F5 | Fx
% test set | 100.0 | 37.8 | 19.6 | 14.9 | 5.1 | 12.6 | 2.2 | 7.7
 

Comparing HU size,  # epochs
Features | # Hidden Units | # epochs | %err alone | %err w/RNN
PLP12N, 7hyp | 2000 | 8 | 36.7 | 31.1
" | 2000 | 13 | 35.0 | 30.5
" | 4000 | 8 | 36.0 | 31.0
" | 4000 | 10 | 35.0 | 30.4
" | 4000 | 13 | 34.4 | 30.1
 
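One way to read the hidden-unit comparison is in terms of relative error reduction from adding the RNN. The sketch below simply recomputes that from the numbers in the table; the names ROWS and relative_reduction are made up for illustration and are not part of any BN tooling.

```python
# Rows copied from the hidden-unit comparison table:
# (hidden units, epochs, %err alone, %err w/RNN).
ROWS = [
    (2000, 8, 36.7, 31.1),
    (2000, 13, 35.0, 30.5),
    (4000, 8, 36.0, 31.0),
    (4000, 10, 35.0, 30.4),
    (4000, 13, 34.4, 30.1),
]


def relative_reduction(before, after):
    """Relative error reduction in percent: (before - after) / before * 100."""
    return 100.0 * (before - after) / before


for hidden_units, epochs, alone, with_rnn in ROWS:
    gain = relative_reduction(alone, with_rnn)
    print(f"{hidden_units} HU, {epochs:2d} epochs: "
          f"{alone} -> {with_rnn} ({gain:.1f}% relative)")
```

The same helper can be applied to any pair of WER figures on this page, for example the first and last epochs of a training run.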
 
 

 

Notes:
Histograms of the biases for selected nets:
msg0+2N-8k-half epoch 8 Hidden Bias (mean -6.5684) Output Bias (mean -3.758)
plp12N-8k-half merged with msg0+2N-8k-half Hidden Bias (mean -8.1153)  Output Bias (mean -3.5811)
plp12+d5_cw5 Hidden Bias (mean -6.7564) Output Bias (mean -3.6158)
plp12N Hidden Bias (mean -6.6209) Output Bias (mean -3.6358)
 


 

MSG Multiband Results

The following results were obtained from the align2 alignment, normalized modspec features (known as msg0+2N and as msg1N), context window 9, and sampled at 8k. The net consisted of 2000 hidden units. Decodes were done with the 7hyp pruning parameters. Decodes and training can be found in /u/janin/bn/train13-multiband-msg.

The subbands are labelled "a", "b", "c", and "d", and consist of the following features:
 
Band | Top Half Features | Bottom Half Features
A | 0,1,2,3,4 | 14,15,16,17,18
B | 5,6,7,8 | 19,20,21,22
C | 9,10,11 | 23,24,25
D | 11,12,13 | 25,26,27
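As a rough illustration of the band table, the sketch below picks the features belonging to a band (or a combination such as "abc") out of a single frame. It assumes a frame is a 28-element vector indexed 0 to 27 as in the table, and that a combination like "abc" means the union of the listed features; BANDS, band_indices and select_features are hypothetical names, not existing tools.

```python
# Feature indices per subband, copied from the band table
# (top-half features followed by bottom-half features).
BANDS = {
    "a": [0, 1, 2, 3, 4, 14, 15, 16, 17, 18],
    "b": [5, 6, 7, 8, 19, 20, 21, 22],
    "c": [9, 10, 11, 23, 24, 25],
    "d": [11, 12, 13, 25, 26, 27],
}


def band_indices(name):
    """Sorted, de-duplicated feature indices for a combination such as 'abc'."""
    indices = set()
    for letter in name:
        indices.update(BANDS[letter])
    return sorted(indices)


def select_features(frame, name):
    """Pick the features of one frame that belong to the named combination."""
    return [frame[i] for i in band_indices(name)]


frame = list(range(28))  # stand-in frame: feature i simply has value i
print(select_features(frame, "c"))
print(len(band_indices("abcd")))
```

Note that bands c and d share features 11 and 25 in the table, which is why the union is de-duplicated.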
 
Notation: a-b means subband a and subband b combined at the feature level by adding the log probabilities (using mergeLna). a-b-c-d-abcdx4 means each of the subbands combined with 4x the fullband. a+b means subband a and subband b combined using a 500 HU MLP. So a+b+c+d-abcdx4 means the 4 subbands are combined with an MLP, and the results are merged with 4x the fullband using mergeLna. "No norm" indicates that the probabilities fed into the merging net were not normalized (i.e. the .norm file contained 0.0 for the means and 1.0 for the variances).
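The "-" and "x4" notation can be pictured as a weighted sum of log probabilities per class. The sketch below is only an illustration of that idea, not the mergeLna implementation: the renormalization step, the function name merge_log_probs, and the three-class example posteriors are all assumptions, and the inputs are assumed to be strictly positive.

```python
import math


def merge_log_probs(streams, weights=None):
    """Combine per-class probabilities from several streams for one frame.

    Each stream is a list of class probabilities (all > 0). Streams are
    combined by a weighted sum of log probabilities, then renormalized to
    sum to 1 (the renormalization is an assumption of this sketch).
    """
    if weights is None:
        weights = [1.0] * len(streams)
    n_classes = len(streams[0])
    log_sums = [
        sum(w * math.log(stream[k]) for stream, w in zip(streams, weights))
        for k in range(n_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_sums)
    unnormalized = [math.exp(v - peak) for v in log_sums]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical three-class posteriors for two subbands and the fullband net.
band_a = [0.5, 0.3, 0.2]
band_b = [0.4, 0.4, 0.2]
fullband = [0.7, 0.2, 0.1]

# Analogue of "a-b": equal weights.
print(merge_log_probs([band_a, band_b]))
# Analogue of "a-b-abcdx4": the fullband stream is weighted 4x.
print(merge_log_probs([band_a, band_b, fullband], weights=[1, 1, 4]))
```

Weighting a stream by 4 in the log domain is the same as raising its probabilities to the fourth power, so the fullband net dominates the merged result.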
 
 
Bands | WER
abcd | 38.7
abcd-rnn* | 30.4
a-b-c-d | 64.9
a-b-c-d-abcdx4 | 40.5
a-b-c-d-abcdx4-rnnx8 | 31.8
a+b+c+d | 48.6
a+b+c+d-abcdx4 | 38.5
a+b+c+d-abcdx4-rnnx8 | 30.7
abc-abcd | 38.7
abc-abd-acd-bcd | 37.9
abc-abd-acd-bcd-abcdx4 | 37.7
abc-abd-acd-bcd-abcdx4-rnnx8 | 30.2
cep a-b-c-d | Out of memory on teq
klt a-b-c-d | 62.7
cep a-b-c-d-abcdx4 | 40.2
klt a-b-c-d-abcdx4 | 40.2
cep a-b-c-d-abcdx4-rnnx8 | 31.6
klt a-b-c-d-abcdx4-rnnx8 | 31.4
klt a+b+c+d | 47.6
klt a+b+c+d no norm | 54.7
klt a+b+c+d-abcd | 38.7
klt a+b+c+d-abcdx4 | 38.2
klt a+b+c+d-abcdx4 no norm | 37.7
klt a+b+c+d-abcdx4-rnnx8 | 30.3
klt a+b+c+d-abcdx4-rnnx8 no norm | 30.5
* - These are from Dan's decodes in /u/drspeech/data/bn/experiments/dpwe/train17-msg1+align2.

From these results, single-band multiband hurts overall recognition, even when the subbands are combined using an MLP (a+b+c+d scores 48.6 against 38.7 for the fullband abcd). The "drop one" multiband (37.9) doesn't help enough to be worth the extra complexity (although I haven't tried combining the "drop one" multiband using an MLP).

See the bnspokes database for a breakdown of the focus conditions.


Dan Ellis <dpwe@icsi.berkeley.edu>
Adam Janin <janin@cs.berkeley.edu>
Eric Fosler <fosler@icsi.berkeley.edu>
et al.