ICSI Speech FAQ:
6.4 How long and how many epochs does it take to train a net?

Answer by: dpwe - 2000-08-07


Net training time is close to a linear function of the training set size and the number of connections in the network (hidden*(inp+out), ignoring the biases). The trainings we run vary in duration from under an hour, for a quick test of a simple net on a small task, to almost a month for our best large-vocabulary nets (a training that might take a year if we didn't have the multispert hardware).
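For a rough sense of scale, here is a back-of-the-envelope sketch of that linear model in Python. MCUPS (million connection updates per second) is the throughput figure reported in the training logs below; the 100 patterns/sec frame rate and the particular net geometry are illustrative assumptions, not measurements:

def epoch_seconds(num_patterns, inp, hid, out, mcups):
    # ignoring the biases, as above
    connections = hid * (inp + out)
    return num_patterns * connections / (mcups * 1e6)

# E.g. 70 hours of speech at an assumed 100 patterns/sec, through a
# 252:8000:54 net at the 370 MCUPS quoted later on this page:
patterns = 70 * 3600 * 100       # ~25.2M training patterns
print(epoch_seconds(patterns, 252, 8000, 54, 370) / 3600)
# -> ~46 hours per epoch, so 9 epochs lands in the multi-week range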

As explained on the neural net training FAQ page, the training procedure makes multiple passes through the entire set of training patterns, eventually reducing the learning rate (the step size), until the net's performance on the cross-validation set stabilizes. Each pass is known as an epoch.

Under the "newbob" learning schedule, where the the learning rate is initially constant, then ramps down exponentially after the net stabilizes, training usually takes between 7 and 10 epochs. There are usually 3 to 5 epochs at the initial learning rate of 0.008, then a further 4 or 5 epochs with the reducing learning rate, which rarely gets below 0.00025. A typical training history is shown below: such reports are generated by the command qn_log qnstrn.log, where qnstrn.log is the log_file written by qnstrn:

***qn_log results for:  swbt-plp12O-70h-a4+2000h.log ***
Host: ravioli.
Training run start: Sat Jul 29 13:45:16 2000.

Epc Learn       ----Training--- --Cross Val---  Date
    Rate        % Corr  MCUPS   % Corr  MCPS    Finished
pre                             2.60%   306.28  Sat Jul 29 13:53:12 2000
1   0.008000    64.19%  152.65  61.68%  306.48  Sat Jul 29 23:20:28 2000
2   0.008000    67.28%  152.55  63.06%  304.96  Sun Jul 30 08:48:10 2000
3   0.008000    67.98%  152.46  63.60%  305.91  Sun Jul 30 18:16:09 2000
4   0.008000    68.31%  152.33  64.07%  305.71  Mon Jul 31 03:44:37 2000
5   0.004000    68.94%  152.36  65.90%  305.42  Mon Jul 31 13:13:00 2000
6   0.002000    69.29%  152.52  67.04%  306.27  Mon Jul 31 22:40:47 2000
7   0.001000    69.44%  152.31  67.80%  305.98  Tue Aug 1 08:09:20 2000
8   0.000500    69.50%  152.45  68.32%  306.01  Tue Aug 1 17:37:20 2000
9   0.000250    69.52%  152.32  68.60%  305.59  Wed Aug 2 03:05:50 2000

Training run stop: Wed Aug  2 03:05:50 2000.
Training time: 307234.30 secs (85 hours, 20 mins, 34 secs).

Note that, in keeping with the "newbob" schedule, the learning rate started reducing after the Cross Val % Corr increased by only 0.47% (to 64.07%) in epoch 4, and training stopped when the cross-validation accuracy increased by only 0.28% (to 68.60%) in epoch 9.
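The stopping logic can be summarized in a few lines. This is a minimal sketch of the schedule as described above, not the qnstrn implementation; train_one_epoch and eval_cv are hypothetical callbacks, and the 0.5% threshold is inferred from the log (ramping began after a 0.47% gain, and training stopped after a 0.28% gain):

def newbob(train_one_epoch, eval_cv, init_rate=0.008, threshold=0.5):
    rate = init_rate
    ramping = False
    prev_acc = eval_cv()        # the "pre" cross-validation pass
    epochs = 0
    while True:
        epochs += 1
        train_one_epoch(rate)
        acc = eval_cv()
        gain = acc - prev_acc   # CV improvement, in % absolute
        prev_acc = acc
        if ramping:
            rate /= 2.0         # exponential ramp-down
            if gain < threshold:
                break           # gains exhausted: stop training
        elif gain < threshold:
            ramping = True      # small gain: start halving the rate
            rate /= 2.0
    return epochs

Run against the history above, this reproduces the log: four epochs at 0.008, rates halving from epoch 5 onward, and a stop after epoch 9 at a rate of 0.00025.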

Note also the speed of the training, as indicated in the MCUPS column (the forward-pass speed in the MCPS column does not contribute significantly to the training time). This training ran on a duo SPERT (two SPERT boards controlled by a single host). Other arrangements give the following typical training speeds (from my status reports of 1998may27 and 2000-06-30):

Host                       Arithmetic    Bunch size  MCUPS
TEQUILA (Ultra 30/300MHz)  Floating pt    1           25.6
TEQUILA (Ultra 30/300MHz)  Floating pt   16           33.9
YAM (SPERT)                Fixed pt       1           47.4
YAM (SPERT)                Fixed pt      16           89.6
GUINNESS (TetraSPERT)      Fixed pt      16          170.4

These results are for a small (162:2000:54) net designed to fit in the 2MB L2 cache of the Ultra-30. Larger nets are worse for the Ultra (because they exceed the cache size) but amortize the overhead better for the SPERTs; on our largest trainings (252:8000:54), we get 370 MCUPS on the TetraSPERT.
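The arithmetic behind the cache claim is easy to check; this assumes 4-byte floating-point weights on the Ultra and, as elsewhere on this page, ignores the bias terms:

def weight_megabytes(inp, hid, out, bytes_per_weight=4):
    return (inp * hid + hid * out) * bytes_per_weight / 2**20

print(weight_megabytes(162, 2000, 54))   # ~1.6 MB: fits in the 2MB L2
print(weight_megabytes(252, 8000, 54))   # ~9.3 MB: spills out of cache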


