ICSI Speech FAQ:
6.9 How does neural net randomization affect performance?

Answer by: dpwe - 2000-08-11


Our neural net trainings involve several sources of randomization, the most obvious being the pseudo-random initial values assigned to the network weights.

Note that although we call these values random, they are all deterministically produced, so that two trainings run with the same parameters on the same architecture and the same data should give identical results. However, due to the very large number of calculations involved, small arithmetic differences (e.g. SPARC versus Intel, and certainly floating point versus fixed point such as the SPERT) can result in measurable differences in performance.
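To make the "deterministic randomization" point concrete, here is a minimal sketch (in Python/NumPy, purely illustrative and not our actual training code) of seeded weight initialization: the same seed always reproduces the same "random" starting weights, while a different seed gives a different but equally valid starting point. The layer dimensions and scale below are arbitrary placeholders.

    import numpy as np

    def init_weights(shape, seed, scale=0.1):
        # Draw initial weights from a seeded pseudo-random generator.
        # Because the generator is seeded, the "random" values are fully
        # deterministic: the same seed reproduces the same weights exactly.
        rng = np.random.default_rng(seed)
        return rng.uniform(-scale, scale, size=shape)

    # Two trainings started with the same seed begin from identical weights...
    w_run1 = init_weights((153, 4000), seed=42)
    w_run2 = init_weights((153, 4000), seed=42)
    assert np.array_equal(w_run1, w_run2)

    # ...while a different seed gives a different starting point.
    w_run3 = init_weights((153, 4000), seed=43)
    assert not np.array_equal(w_run1, w_run3)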

To investigate the possible influence of these randomization values, Eric and I conducted a number of trainings of essentially the same net, varying only the randomization in each case. The results are summarized in my status reports of 2000-04-28 and 2000-05-12.

Essentially, we found that different initial randomizations can indeed affect the resulting net's performance, but only by at most 3-4% relative. We also experimented with posterior combination of these nets (i.e. combining two nets trained on exactly the same data), which, as you might expect, doesn't help very much - maybe 2% relative if you're lucky.
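For reference, "posterior combination" here means merging the frame-level phone posteriors of the two nets before decoding; the sketch below (an assumption for illustration, not our exact recipe - see the next FAQ entry) averages the two posterior streams in the log domain and renormalizes each frame.

    import numpy as np

    def combine_posteriors(post_a, post_b, eps=1e-10):
        # post_a, post_b: (num_frames, num_phones) posterior arrays from
        # two nets scoring the same utterance. Average in the log domain
        # (geometric mean), then renormalize so each frame sums to 1.
        log_avg = 0.5 * (np.log(post_a + eps) + np.log(post_b + eps))
        combined = np.exp(log_avg)
        return combined / combined.sum(axis=1, keepdims=True)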


Previous: 6.8 How does neural net size affect performance? - Next: 6.10 What is posterior combination? What other kinds of combination are possible?
Back to ICSI Speech FAQ index
