ICSI Speech FAQ:
3.6 What are the posterior probability data formats?

Answer by: dpwe - 2000-07-29


Our neural net acoustic models convert features into posterior probability estimates for a mutually exclusive, collectively exhaustive set of phone classes, which implies, among other things, that the probabilities for each frame sum to 1. (This is guaranteed by the 'softmax' final nonlinearity we generally use.) An utterance can therefore be represented by a sequence of these probability vectors, one per time frame. This data resembles the input feature data in several ways: it consists of fixed-length vectors of real numbers, one for each regularly spaced time frame, and it is often about the same size, with 30-60 probabilities per frame computed from a feature vector of 15-40 elements. However, posterior files are stored less often than feature files, because in simple cases we can pipe the posteriors straight into the decoder to get the recognition result we're really interested in.
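To illustrate why the outputs sum to 1, here is a minimal sketch of a per-frame softmax. It assumes the net's final-layer values are available as a list of frames, each a list of floats; the function names and the sample numbers are hypothetical and are not taken from qnsfwd.

```python
import math
from typing import List

def softmax(frame: List[float]) -> List[float]:
    """Map one frame of output-layer values to posteriors that sum to 1."""
    m = max(frame)  # subtract the max for numerical stability; it cancels out
    exps = [math.exp(x - m) for x in frame]
    total = sum(exps)
    return [e / total for e in exps]

def posteriors(frames: List[List[float]]) -> List[List[float]]:
    """Apply the softmax independently to each time frame."""
    return [softmax(f) for f in frames]

# Hypothetical pre-softmax values: 3 frames x 4 phone classes.
net_out = [[2.0, 1.0, 0.1, -1.0],
           [0.5, 0.5, 0.5, 0.5],
           [-3.0, 4.0, 0.0, 1.0]]
for vec in posteriors(net_out):
    print(round(sum(vec), 6))  # 1.0 for every frame
```

Because each frame is normalized independently, the vectors form a per-frame distribution over the phone classes, which is what the decoder consumes.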

Posterior probability files (also called activation files, because they hold the activation levels of the neural net classifier's output nodes) can be stored in any of the file formats used for features, but are usually encountered as LNA or rapbin. Both are written directly by qnsfwd, the program that implements the neural-net classifier. When running on a SPERT card, however, LNA output is a bad idea; it is better to pipe the rapbin output through rapact2lna -b running on the host. Rapbin files encode activations as 32-bit binary floats, whereas LNA files use a single byte per value, quantizing the probability in logarithmically spaced steps according to prob = exp(-(val+.5)/24), i.e. between prob = 0.0000238 (val = 0xFF) and prob = 0.979 (val = 0x00). This quantization appears to be perfectly adequate for representing the activation information the decoder needs.
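The LNA quantization formula can be sketched as a pair of conversion functions. The decoder follows the formula above exactly; the encoder is an assumption (floor of -24 ln p, clipped to the byte range), chosen so that decoding returns the log-domain midpoint of each step. qnsfwd's actual rounding may differ, and the function names are hypothetical.

```python
import math

LNA_SCALE = 24.0  # steps per natural-log unit, from prob = exp(-(val+.5)/24)

def lna_decode(val: int) -> float:
    """Byte value 0..255 -> probability, using the formula in this FAQ."""
    if not 0 <= val <= 255:
        raise ValueError("LNA values are single bytes")
    return math.exp(-(val + 0.5) / LNA_SCALE)

def lna_encode(prob: float) -> int:
    """Probability -> byte value.

    ASSUMPTION: val = floor(-24 * ln(prob)), clipped to 0..255. Probabilities
    too small to represent (including 0) saturate at 0xFF.
    """
    if prob <= 0.0:
        return 255
    val = math.floor(-LNA_SCALE * math.log(prob))
    return max(0, min(255, val))

print(f"{lna_decode(0x00):.3f}")  # 0.979
print(f"{lna_decode(0xFF):.7f}")  # 0.0000238
```

Under this encoder, any probability inside the representable range comes back within a factor of exp(0.5/24), roughly 2%, of its original value, which is consistent with the observation that the coarse quantization costs the decoder little.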

Because they use one byte per value instead of four, LNA files are about one-quarter the size of rapbin files, and are therefore preferred. Some of the things we might want to do with posteriors are implemented in programs that operate directly on LNA files, such as the log-domain averaging performed by the SoftSound program lnaMerge. LNA files can be read into Matlab with the local script readlna.
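Log-domain averaging of several posterior streams can be sketched as a per-frame geometric mean. This is not lnaMerge's code: equal stream weights and renormalizing each merged frame to sum to 1 are assumptions made here for illustration, and the function name is hypothetical.

```python
import math
from typing import List

Frame = List[float]

def log_average(streams: List[List[Frame]]) -> List[Frame]:
    """Average several posterior streams frame by frame in the log domain.

    ASSUMPTIONS (not taken from lnaMerge): every stream gets equal weight,
    and each merged frame is renormalized to sum to 1. Inputs must be
    strictly positive; LNA-decoded values always are.
    """
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    merged = []
    for t in range(n_frames):
        frames = [s[t] for s in streams]
        # Mean of the log-probabilities is the log of the geometric mean.
        avg_logs = [sum(math.log(f[k]) for f in frames) / len(frames)
                    for k in range(len(frames[0]))]
        geo = [math.exp(a) for a in avg_logs]
        total = sum(geo)
        merged.append([g / total for g in geo])
    return merged

a = [[0.7, 0.2, 0.1]]  # hypothetical one-frame stream from net A
b = [[0.4, 0.4, 0.2]]  # hypothetical one-frame stream from net B
print([round(p, 3) for p in log_average([a, b])[0]])  # [0.555, 0.297, 0.148]
```

Averaging in the log domain favors classes on which all the nets agree, since a single low probability pulls the geometric mean down more than it would pull down an arithmetic mean.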



Generated by build-faq-index on Tue Mar 24 16:18:14 PDT 2009