ICSI Broadcast News Project

August Trainings Progress Page

Several critical trainings are in progress this month (August 1998). This page will be kept up to date with the latest results.


Align 3 'Kitchen sink' (train24)

The latest set of alignments, align3, was installed at ICSI on Aug 21. These alignments are based on the output of a combined RNN/ModSpecGram acoustic model, and so should be much better suited to ModSpec training. We immediately set about training a 4000 HU net for comparison:

 Epoch  Date         | Align 3 - msg1N-4k         | Comparison (align2, train18) | Comments
                     | LrnRt     CV FA%  WER%     | LrnRt     CV FA%  WER%       |
 1      aug22 09:21  | 0.008     60.60            | 0.008     59.33   49.9       |
 2      aug23 03:11  | 0.008     62.13            | 0.008     60.89   47.7       |
 3      aug23 20:56  | 0.008     63.28            | 0.008     61.62   47.0       |
 4      aug24 14:42  | 0.008     63.45            | 0.008     62.02   45.5       |
 5      aug25 08:38  | 0.004     66.23   40.7     | 0.004     64.83   41.3       | BN98I models
 6      aug26 02:38  | 0.002     67.93            | 0.002     66.43   38.5       |
 7      aug26 20:43  | 0.001     69.03            | 0.001     67.41   37.1       |
 8      aug27 14:43  | 0.0005    69.72            | 0.0005    68.12   36.4       |
 9      aug28 08:40  | 0.00025   70.08   35.6     | 0.00025   68.52   35.3       | BN98I models
 10                  | (training complete)        |                              |
        combo w/RNN1 |                   29.5     |                   29.3       | BN98I models
        .. + 27hyp   |                   27.6     |                   27.2       |
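
The 'combo w/RNN1' rows report word error after merging the net's phone-posterior stream with the RNN model's stream before decoding. This page doesn't spell out the combination rule; one common hybrid-system choice is a frame-level log-domain average of the posterior streams, sketched below (the function name, weight, and floor constant are mine, not details from this page):

    import numpy as np

    def combine_posteriors(mlp_post, rnn_post, weight=0.5, floor=1e-10):
        """Log-domain average of two (frames, phones) posterior streams;
        `floor` guards against log(0). Renormalizes each frame to sum to 1."""
        log_combo = (weight * np.log(mlp_post + floor)
                     + (1.0 - weight) * np.log(rnn_post + floor))
        combo = np.exp(log_combo)
        return combo / combo.sum(axis=1, keepdims=True)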


'Kitchen sink' training (train18)

This is the large (4000HU) net we are training to give to Cambridge to use in their next round of alignments. It is based on the modulation spectrogram (msg1) features, with a 28x9 input window. This is similar to our best-performing msg net so far, except that it is trained to the new 'align2' labels.
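
For concreteness, here is a minimal sketch of the net's shape in Python/NumPy. The 28x9 input window and 4000 hidden units are from the text; the 54-unit output layer is borrowed from the (13x9):4000:54 plp12N net described under train20, and the sigmoid/softmax nonlinearities are assumptions about the usual hybrid-MLP setup, not details given on this page:

    import numpy as np

    # 28 msg1 features x 9-frame context window in, 4000 hidden units,
    # 54 phone posteriors out (output size assumed from the plp12N net).
    N_IN, N_HID, N_OUT = 28 * 9, 4000, 54

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.01, size=(N_IN, N_HID)); b1 = np.zeros(N_HID)
    W2 = rng.normal(scale=0.01, size=(N_HID, N_OUT)); b2 = np.zeros(N_OUT)

    def forward(x):
        """One flattened 9-frame feature window in, phone posteriors out."""
        h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # sigmoid hidden layer
        z = h @ W2 + b2
        e = np.exp(z - z.max())                   # numerically stable softmax
        return e / e.sum()

    posteriors = forward(rng.normal(size=N_IN))   # sums to 1 over the phones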

 Epoch  Date         | 'Kitchen Sink'             | Comparison (4000HU msg1N)    | Comments
                     | LrnRt     CV FA%  WER%     | LrnRt     CV FA%  WER%       |
 1      aug01 12:53  | 0.008     59.33   49.9     | 0.008     56.58              |
 2      aug02 06:45  | 0.008     60.89   47.7     | 0.008     57.83              |
 3      aug03 00:37  | 0.008     61.62   47.0     | 0.008     58.47              |
 4      aug03 18:35  | 0.008     62.02   45.5     | 0.008     59.39              |
 5      aug04 12:31  | 0.004     64.83   41.3     | 0.008     58.97              |
 6      aug05 06:26  | 0.002     66.43   38.5     | 0.004     61.87              |
 7      aug06 00:24  | 0.001     67.41   37.1     | 0.002     63.43              |
 8      aug06 18:31  | 0.0005    68.12   36.4     | 0.001     64.62              |
 9      aug07 12:31  | 0.00025   68.52   35.3     | 0.0005    65.34              |
 10                  | (training complete)        | 0.00025   65.74   37.8       |
        combo w/RNN1 |                   29.3     |                   29.9       |


BAT for plp (train20)

Since we're throwing everything we've got at the problem this week, we are also training a large plp12N net on all the data. We had already trained a (13x9):4000:54 plp12N net to the align1 labels on half the data, so we used the net from the 4th iteration of that training (bn/experiments/fosler/train5) as the starting point for this net, and fixed the learning rate to ramp down as if starting with epoch 6 of the previous training (since epoch 5 gave a negligible CV improvement); the ramp is sketched below. Comparison is against the equivalent iterations of the align1-half training.
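
The fixed ramp is just repeated halving of the learning rate once the flat phase ends; a minimal sketch (the function and parameter names are mine):

    def lrn_rt(epoch, base=0.008, last_flat_epoch=5):
        """Hold `base` through `last_flat_epoch`, then halve once per epoch."""
        return base / 2 ** max(0, epoch - last_flat_epoch)

    # Picking up at epoch 6 as if epoch 5 had ended the flat phase gives
    # 0.004, 0.002, ..., down to 0.000063 (rounded) at epoch 12, matching
    # the plp12N-4kHU-all-a2 column below.
    rates = {epoch: lrn_rt(epoch) for epoch in range(6, 13)}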

 Epoch  Date         | plp12N-4kHU-all-a2         | plp12N-4kHU-half-a1          | Comments
                     | LrnRt     CV FA%  WER%     | LrnRt     CV FA%  WER%       |
 6      aug04 20:19  | 0.004     63.49   39.9     | 0.004     62.55              |
 7      aug05 17:13  | 0.002     65.29   37.8     | 0.002     63.68              |
 8      aug06 14:30  | 0.001     66.20   36.5     | 0.001     64.52   36.0       |
 9      aug07 11:30  | 0.0005    66.88   35.1     | 0.0005    65.06              |
 10     aug08 08:25  | 0.00025   67.47   34.3     | 0.00025   65.34   35.0       |
 11     aug09 05:14  | 0.000125  67.87   33.7     | 0.000125  65.48              |
 12                  | 0.000063                   | 0.000063  65.55   34.4       |
        combo w/RNN1 |                   29.7     |                   30.1       |


Align2 comparison training (train17)

We have only recently started working with the second set of alignments, generated by Cambridge using improved pronunciation modelling and a combination of additional acoustic models. This training duplicates our 'standard' msg1N 2000HU training, but is based on the align2 alignments. The corresponding figures for the align1 training are shown for comparison.

 Epoch  Date         | align2                     | align1                       | Comments
                     | LrnRt     CV FA%  WER%     | LrnRt     CV FA%  WER%       |
 1      jul31        | 0.008     57.35   53.3     | 0.008     56.39   51.9       |
 2                   | 0.008     58.32   51.3     | 0.008     57.60   50.5       |
 3                   | 0.008     59.17   48.7     | 0.008     58.05   48.5       |
 4      aug01        | 0.008     59.86   48.6     | 0.004     61.05   44.5       |
 5                   | 0.008     59.88   48.2     | 0.002     62.59   42.2       |
 6      aug02        | 0.004     62.87   42.8     | 0.001     63.73   39.9       |
 7                   | 0.002     64.26   40.3     | 0.0005    64.39   39.2       |
 8                   | 0.001     65.32   39.3     | 0.00025   64.75   39.4       |
 9      aug03        | 0.0005    65.99   38.6     | 0.000125  64.95   38.8       |
 10                  | 0.00025   66.29   38.6     | 0.000063  65.04   38.8       |
 11                  | (training finished)        | 0.000031  65.08   38.6       |
 12                  |                            | 0.000016  65.09   38.5       |
 13                  |                            | 0.000008  65.10   38.5       |
        combo w/RNN1 |                   30.4     |                   30.1       |
        +RNN, 27hyp  |                   28.5     |                   27.9       |


Cepstra-for-modspec training (train19)

Per-utterance normalization gave a big win on plp features, but had a negligible effect on msg features. Further analysis reveals that msg features did particularly well on the very shortest utterances until they were normalized. One hypothesis is that, because the msg features correspond almost directly to energy in particular spectral bands, they may be strongly bimodal, reflecting the two modes of voiced speech and silence (particularly in longer utterances that contain pauses). Simple normalization could therefore be quite harmful, since the normalizing statistics depend critically on the proportion of silence in an utterance.
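
A toy illustration of the worry (NumPy; the band-energy numbers are invented): with a bimodal feature, the per-utterance mean and variance are set largely by the silence proportion, so identical speech frames land at quite different normalized values in different utterances.

    import numpy as np

    def normalize(feat):
        """Per-utterance mean/variance normalization of one feature dimension."""
        return (feat - feat.mean()) / feat.std()

    rng = np.random.default_rng(0)
    speech = rng.normal(1.0, 0.1, 100)    # high-energy mode (voiced speech)
    silence = rng.normal(0.0, 0.1, 100)   # low-energy mode (pauses)

    # The same 100 speech frames, embedded with different amounts of silence:
    mostly_speech = np.concatenate([speech, silence[:10]])
    half_silence = np.concatenate([speech, silence])

    print(normalize(mostly_speech)[:100].mean())  # ~0.3: speech near the mean
    print(normalize(half_silence)[:100].mean())   # ~1.0: speech well above it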

Using the cepstral transform on the modspec features has proven to make little or no difference in Brian's experiments with NUMBERS, but it might at least alleviate the bimodality of feature dimensions (if it exists). Since Brian had already written code to calculate cepstral features, it was easy to start a training on them (train19), whose progress is charted below.

This is a 2000HU net trained on 28-element feature vectors: full-order cepstral transforms of both msg1 spectra, followed by per-utterance normalization (msg1cepN). Training labels are from align2. The comparison net is the plain msg1N-align2-2kHU from above.
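
A sketch of that feature pipeline in NumPy; the even 14/14 split of the 28 msg1 dimensions into two spectra and the DCT-II form of the cepstral transform are assumptions here, not details from this page:

    import numpy as np

    def dct2(x):
        """Full-order DCT-II: as many cepstral coefficients as input channels."""
        n = len(x)
        k = np.arange(n)[:, None]
        return np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n)) @ x

    def msg1cep_n(utt):
        """utt: (frames, 28) msg1 features, assumed to be two 14-channel
        spectra side by side. Cepstral-transform each spectrum per frame,
        then mean/variance-normalize each dimension over the utterance."""
        cep = np.array([np.concatenate([dct2(f[:14]), dct2(f[14:])])
                        for f in utt])
        return (cep - cep.mean(axis=0)) / cep.std(axis=0)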

 Epoch  Date         | msg1cepN                   | msg1N                        | Comments
                     | LrnRt     CV FA%  WER%     | LrnRt     CV FA%  WER%       |
 1      aug01 01:10  | 0.008     56.66            | 0.008     57.35   53.3       |
 2      aug01 13:42  | 0.008     58.22            | 0.008     58.32   51.3       |
 3      aug02 02:15  | 0.008     58.82            | 0.008     59.17   48.7       |
 4      aug02 14:47  | 0.008     59.65   47.7     | 0.008     59.86   48.6       |
 5      aug03 03:21  | 0.008     59.79   47.3     | 0.008     59.88   48.2       |
 6      aug03 16:00  | 0.004     62.71   42.4     | 0.004     62.87   42.8       |
 7      aug04 05:06  | 0.002     64.10   39.9     | 0.002     64.26   40.3       |
 8      aug04 17:58  | 0.001     65.02   38.7     | 0.001     65.32   39.3       |
 9      aug05 06:31  | 0.0005    65.60   38.2     | 0.0005    65.99   38.6       |
 10                  | (training halted)          | 0.00025   66.29   38.6       |
 11                  |                            | (training finished)          |
        combo w/RNN1 |                   30.2     |                   30.4       |




Dan Ellis <dpwe@icsi.berkeley.edu>