ICSI Speech FAQ:
3.5 What are the neural net data formats?

Answer by: dpwe - 2000-07-28


Our preferred form of acoustic model / phone classifier is the neural network, specifically a fully-connected, feed-forward multi-layer perceptron with a single hidden layer. Such a network is specified by its layer sizes: typically we have 100..400 input units (for a 9..17 frame temporal context of feature vectors with 9..40 elements), an output layer of 24..56 units (each corresponding to one of the monophone posterior probability targets the net is trained to produce), and, in between, a hidden layer of 100..8000 units.

A full specification of a neural net also includes the nonlinearity type.
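To make the structure concrete, here is a minimal pure-Python sketch of a forward pass through a single-hidden-layer net. It is an illustration only, not QuickNet code: the sigmoid hidden nonlinearity, the softmax output, the function names and the nested-list weight layout are all assumptions made for the example.

```python
import math

def sigmoid(x):
    # Logistic nonlinearity (assumed here for the hidden layer).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Normalized exponentials (assumed here for the output layer);
    # subtracting the max keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(inputs, w_ih, b_h, w_ho, b_o):
    # w_ih[h][i] is the weight from input i to hidden unit h;
    # w_ho[o][h] is the weight from hidden unit h to output unit o.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_ih, b_h)]
    out_act = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(w_ho, b_o)]
    return softmax(out_act)

# Toy net: 3 inputs, 2 hidden units, 2 outputs (made-up weights).
posteriors = mlp_forward(
    [0.5, -1.0, 0.25],
    w_ih=[[0.1, 0.2, -0.3], [0.0, -0.1, 0.4]],
    b_h=[0.0, 0.1],
    w_ho=[[0.5, -0.5], [-0.25, 0.75]],
    b_o=[0.0, 0.0],
)
```

With a softmax output the values in posteriors are positive and sum to one, which is what lets them be read as posterior probabilities over the output classes.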

The network weights then consist of the input-to-hidden layer of #I x #H weights, the hidden-to-output layer of #H x #O weights, and bias weights for all the hidden and output units. Thus the total number of parameters is #I x #H + #H x #O + #H + #O or #H x (#I + #O + 1) + #O.
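As a quick check of that arithmetic, the sketch below computes the parameter count both ways. The function names are made up for the example, and the layer sizes (9 frames of 26 features, 8000 hidden units, 56 outputs) are just one hypothetical configuration within the ranges quoted above.

```python
def mlp_param_count(n_in, n_hid, n_out):
    # Two weight matrices plus one bias per hidden and output unit.
    weights = n_in * n_hid + n_hid * n_out
    biases = n_hid + n_out
    return weights + biases

def mlp_param_count_factored(n_in, n_hid, n_out):
    # The same total, written as #H x (#I + #O + 1) + #O.
    return n_hid * (n_in + n_out + 1) + n_out

# Hypothetical sizes: 9-frame context of 26-element feature vectors.
n_in, n_hid, n_out = 9 * 26, 8000, 56
total = mlp_param_count(n_in, n_hid, n_out)
assert total == mlp_param_count_factored(n_in, n_hid, n_out)
print(total)  # 2328056
```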

Weights files define these parameters, although they do not define the other net parameters such as the layer sizes and the nonlinearity type - these have to be additionally specified by hand. Weights files are written by the neural net training program qnstrn and read by forward pass programs qnsfwd and ffwd. There are also Matlab routines to read and write these files - readmlpwts and writemlpwts in /u/drspeech/share/lib/matlab/icsi.

Surprisingly, the one weights file format that we use - *.wts - is actually an ASCII format. Basically, there is a small amount of formatting, then the weights are listed, one per line, as ASCII floating point values. The full format is described in the weights(5) man page.

This ends up using 9 or 10 bytes per value, even though the weights from fixed-point SPERT trainings are only valid to 16 bits (i.e. they lie in the range [-32768..32767]/8192). On the plus side, the ASCII format compresses well with gzip (and qnsfwd can read gzipped weights files directly). However, it is rather slow to read and write compared to what a binary format might offer.
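The 16-bit range above can be illustrated with a small sketch of the fixed-point mapping: an integer in [-32768..32767] divided by 8192. The helper names are invented for the example, and the round-and-clip behaviour is an assumption, not a description of what the SPERT training code actually does.

```python
SCALE = 8192                         # 2**13, the divisor quoted above
INT16_MIN, INT16_MAX = -32768, 32767

def to_fixed(w):
    # Round to the nearest representable step, clipping to 16 bits
    # (clipping is an assumption made for this sketch).
    q = int(round(w * SCALE))
    return max(INT16_MIN, min(INT16_MAX, q))

def from_fixed(q):
    # Map a 16-bit integer back to a floating point weight.
    return q / SCALE

lowest = from_fixed(INT16_MIN)    # -4.0
highest = from_fixed(INT16_MAX)   # 3.9998779296875
step = 1 / SCALE                  # 0.0001220703125
```

So such weights span roughly -4 to +4 in steps of about 0.00012, and any digits in the ASCII file beyond that resolution carry no extra information.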

One of the things I never got around to was writing a new QN_MLPWeightFile subclass to do binary weights files. It wouldn't be hard, and it would be quite easy to integrate into QuickNet. Then our 22MB weights files for 8000-hidden-unit nets would be under 5MB.
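To give a feel for the saving, here is a rough sketch that packs weights as raw 16-bit integers and compares the size with one-value-per-line text. This is a purely hypothetical layout, not an existing QuickNet format and not the weights(5) format: the function names, the little-endian byte order, the /8192 scaling and the "%.6f" text formatting are all assumptions for illustration.

```python
import struct

def pack_weights_int16(weights, scale=8192):
    # Hypothetical binary layout: little-endian int16, no header.
    q = [max(-32768, min(32767, int(round(w * scale)))) for w in weights]
    return struct.pack("<%dh" % len(q), *q)

def unpack_weights_int16(blob, scale=8192):
    # Inverse of pack_weights_int16: two bytes per value.
    n = len(blob) // 2
    return [v / scale for v in struct.unpack("<%dh" % n, blob)]

def ascii_size(weights):
    # Bytes needed for one value per line; "%.6f" is an assumed
    # formatting, chosen only to illustrate the per-value cost.
    return sum(len("%.6f\n" % w) for w in weights)

weights = [0.5, -1.25, 3.0, -0.000122]
blob = pack_weights_int16(weights)
print(len(blob), ascii_size(weights))  # 8 38
```

At 2 bytes per value instead of 9 or 10, the file shrinks by a factor of about 4.5 to 5, which is in line with the 22MB to under 5MB estimate.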



Generated by build-faq-index on Tue Mar 24 16:18:14 PDT 2009