Broadcast News Results

This page is a central location to gather and organize the results of the various forays we are making into the Hub 4 - Broadcast News task here at ICSI. As our experiments continue, it will hopefully form a kind of narrative of our approach to the problem.

Be sure to check out the BN machine usage page.

Single Net Test Results

Word-Error Rate for the 13 Epochs of plp12N-8k-half
1	2	3	4	5	6	7	8	9	10	11	12	13
49.7	45.8	44.6	43.2	42.4	39.4	37.7	36.8	36.6	35.3	35.1	35.2	35.0

I don't know why epoch 8 comes out as 36.8 here rather than the 36.7 listed for PLP12N at 8 epochs in the other tables on this page.

Word-Error Rate for the 13 Epochs of msg0+2N-8k-half
1	2	3	4	5	6	7	8	9	10	11	12	13
51.9	50.5	48.5	44.5	42.2	39.9	39.2	39.4	38.8	38.8	38.6	38.5	38.5

PLP Context Window Experiment Results

Summary of plp12-8k / plp12N-8k Context Window Experiments
	No Deltas / No Deltas Normalized	Delta Window 9 / Delta Window 9 Normalized	Delta Window 5 / Delta Window 5 Normalized
Context Window 9	39.8 / 36.7	40.0	37.7 / 35.0
Context Window 5	44.5	40.5	39.3 / 35.7

Summary of plp12-8k / plp12N-8k + RNN Context Window Experiments
	No Deltas / No Deltas Normalized	Delta Window 9 / Delta Window 9 Normalized	Delta Window 5 / Delta Window 5 Normalized
Context Window 9	30.8 / 31.1	31.2	30.5
Context Window 5	32.2	30.8	30.9 / 30.2

Summary of plp12-8k / plp12N-8k + msg Context Window Experiments
	No Deltas / No Deltas Normalized	Delta Window 9 / Delta Window 9 Normalized	Delta Window 5 / Delta Window 5 Normalized
Context Window 9			33.9
Context Window 5		34.7	34.0 / 34.7

Summary of plp12-8k / plp12N-8k + msg + RNN Context Window Experiments
	No Deltas / No Deltas Normalized	Delta Window 9 / Delta Window 9 Normalized	Delta Window 5 / Delta Window 5 Normalized
Context Window 9			29.6
Context Window 5		29.8	29.4 / 29.9

Combination Results

WER% breakdown by spoke (acoustic condition):
Not every trial has been recorded here. Contact janin@cs.berkeley.edu if there's one you're interested in.

System/Spoke	all	F0	F1	F2	F3	F4	F5	Fx
% test set	100.0	37.8	19.6	14.9	5.1	12.6	2.2	7.7

MSG Multiband Results

The subbands are labelled "a", "b", "c", and "d", and consist of the following features:


Band	Top Half Features	Bottom Half Features
A	0,1,2,3,4	14,15,16,17,18
B	5,6,7,8	19,20,21,22
C	9,10,11	23,24,25
D	11,12,13	25,26,27
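
As a rough illustration of the band layout in the table above, the sketch below maps each band label to its feature indices and picks those features out of a single frame. It assumes a 28-feature frame (indices 0 to 27, as the table implies) held in a plain Python list. The names BANDS and select_bands are hypothetical and are not part of any ICSI tool. Note that the table, as written, lists features 11 and 25 in both C and D.

```python
# Feature indices per band, copied from the table (top half + bottom half).
BANDS = {
    "a": list(range(0, 5)) + list(range(14, 19)),
    "b": list(range(5, 9)) + list(range(19, 23)),
    "c": list(range(9, 12)) + list(range(23, 26)),
    "d": list(range(11, 14)) + list(range(25, 28)),
}


def select_bands(frame, bands):
    """Return the features of `frame` that belong to any of the named bands.

    `bands` is a string such as "a" or "abc". Indices are taken in
    ascending order, and an index shared by two bands is used only once.
    """
    indices = sorted({i for band in bands for i in BANDS[band]})
    return [frame[i] for i in indices]


# Hypothetical frame whose values equal their own indices, for easy checking.
frame = list(range(28))
band_a = select_bands(frame, "a")      # 10 features
drop_d = select_bands(frame, "abc")    # 24 features (everything in a, b, or c)
```

Whether the combined-band nets (for example "abc") were actually fed the union of the band features in this order is an assumption of the sketch.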

Notation: a-b means subband a and subband b combined at the feature level by adding the log probabilities (using mergeLna). a-b-c-d-abcdx4 means the subbands a, b, c, and d merged together with 4x the fullband (abcd). a+b means subband a and subband b combined using a 500-hidden-unit (HU) MLP. So a+b+c+d-abcdx4 means the four subbands are combined with an MLP, and the result is merged with 4x the fullband using mergeLna. "No norm" indicates that the probabilities fed into the merging net were not normalized (i.e. the .norm file contained 0.0 for the means and 1.0 for the variances).
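
The log-probability merge described above can be sketched as follows. This is only an illustration of weighted log-domain combination, not the mergeLna implementation. The function name merge_log_probs, the renormalization step, and the reading of "x4" as a weight of 4 on that stream are all assumptions.

```python
import math


def merge_log_probs(streams, weights=None):
    """Combine per-class log probabilities from several streams for one frame.

    Each stream is a list of log probabilities (one per class). The streams
    are summed with the given weights, then renormalized so the merged
    probabilities sum to 1. Renormalizing is an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(streams)
    n_classes = len(streams[0])
    merged = [
        sum(w * stream[k] for w, stream in zip(weights, streams))
        for k in range(n_classes)
    ]
    # Log-sum-exp, shifted by the maximum for numerical stability.
    peak = max(merged)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in merged))
    return [v - log_z for v in merged]


# Hypothetical three-class posteriors for one frame (not real BN data).
subband = [math.log(p) for p in (0.5, 0.3, 0.2)]
fullband = [math.log(p) for p in (0.2, 0.6, 0.2)]

# "subband-fullbandx4" under the weighting assumption stated above.
merged = merge_log_probs([subband, fullband], weights=[1.0, 4.0])
```

With a weight of 4 the fullband stream dominates the merged result, which is consistent with the x4 combinations in the table below landing close to the fullband-only number.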

Bands	WER
abcd	38.7
abcd-rnn*	30.4
a-b-c-d	64.9
a-b-c-d-abcdx4	40.5
a-b-c-d-abcdx4-rnnx8	31.8
a+b+c+d	48.6
a+b+c+d-abcdx4	38.5
a+b+c+d-abcdx4-rnnx8	30.7
abc-abcd	38.7
abc-abd-acd-bcd	37.9
abc-abd-acd-bcd-abcdx4	37.7
abc-abd-acd-bcd-abcdx4-rnnx8	30.2
cep a-b-c-d	Out of memory on teq
klt a-b-c-d	62.7
cep a-b-c-d-abcdx4	40.2
klt a-b-c-d-abcdx4	40.2
cep a-b-c-d-abcdx4-rnnx8	31.6
klt a-b-c-d-abcdx4-rnnx8	31.4
klt a+b+c+d	47.6
klt a+b+c+d no norm	54.7
klt a+b+c+d-abcd	38.7
klt a+b+c+d-abcdx4	38.2
klt a+b+c+d-abcdx4 no norm	37.7
klt a+b+c+d-abcdx4-rnnx8	30.3
klt a+b+c+d-abcdx4-rnnx8 no norm	30.5

* - These are from Dan's decodes in /u/drspeech/data/bn/experiments/dpwe/train17-msg1+align2.

From these results, single multiband hurts overall recognition, even when combined using an MLP. The "drop one" multiband doesn't help enough to be worth the extra complexity (although I haven't tried combining the "drop one" multiband using an MLP).

Features	# Hidden Units	# epochs	%err alone	%err w/RNN
PLP12N, 7hyp	2000	8	36.7	31.1
"	2000	13	35.0	30.5
"	4000	8	36.0	31.0
"	4000	10	35.0	30.4
"	4000	13	34.4	30.1

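To make the gain from adding the RNN easier to compare across rows of the hidden-unit table above, the relative error reduction can be computed directly from the two error columns. The sketch below only does arithmetic on the numbers already in the table. The names ROWS and relative_reduction are hypothetical.

```python
# (hidden units, epochs, %err alone, %err w/RNN), copied from the table.
ROWS = [
    (2000, 8, 36.7, 31.1),
    (2000, 13, 35.0, 30.5),
    (4000, 8, 36.0, 31.0),
    (4000, 10, 35.0, 30.4),
    (4000, 13, 34.4, 30.1),
]


def relative_reduction(before, after):
    """Relative error reduction in percent when going from `before` to `after`."""
    return 100.0 * (before - after) / before


for hidden, epochs, alone, with_rnn in ROWS:
    gain = relative_reduction(alone, with_rnn)
    print(f"{hidden} HU, {epochs} epochs: {alone} -> {with_rnn} ({gain:.1f}% relative)")
```

The relative gain from the RNN shrinks somewhat as the net alone gets better: roughly 15% for the 2000 HU, 8 epoch net against 12.5% for the 4000 HU, 13 epoch net.
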
msg0+2N-8k-half epoch 8	Hidden Bias (mean -6.5684)	Output Bias (mean -3.758)
plp12N-8k-half merged with msg0+2N-8k-half	Hidden Bias (mean -8.1153)	Output Bias (mean -3.5811)
plp12+d5_cw5	Hidden Bias (mean -6.7564)	Output Bias (mean -3.6158)
plp12N	Hidden Bias (mean -6.6209)	Output Bias (mean -3.6358)
