ICSI Speech FAQ:
6.1 What is the function of the neural net?

Answer by: dpwe - 2000-08-06


In the Hybrid Connectionist-HMM approach to speech recognition, a neural net is used as the statistical classifier that estimates the probability that the acoustic observations at a given instant correspond to a particular speech sound.

Our standard structure at ICSI uses 9 (or occasionally 17) successive feature vectors as the network input layer. This feeds a hidden layer of 100-8000 units, which in turn feeds an output layer with one unit for each of the 40-60 context-independent phones from which we build word pronunciations. The output layer nonlinearity is 'softmax' (see the comp.ai.neural-nets FAQ entry "What is a softmax activation function?"), so the output layer activations are directly interpretable as posterior probabilities of a set of mutually-exclusive phone classes.
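The forward pass described above can be sketched in plain Python. This is a minimal, untrained illustration, not ICSI's code. Only the 9-frame context window, the single hidden layer, the hidden and output sizes (taken from the low ends of the quoted ranges) and the softmax output come from the text. The per-frame feature dimension (`N_FEATS = 13`), the sigmoid hidden nonlinearity, the edge-frame padding at utterance boundaries, the random weights and all function names are assumptions made for the example.

```python
import math
import random

CONTEXT = 9      # successive feature frames per input (from the FAQ; 17 is the alternative)
N_FEATS = 13     # hypothetical per-frame feature dimension
N_HIDDEN = 100   # low end of the 100-8000 hidden-unit range
N_PHONES = 40    # low end of the 40-60 context-independent phone set


def stack_context(frames, t, context=CONTEXT):
    """Concatenate the `context` frames centred on frame t into one input vector.

    Frames before the start or past the end are replaced by the nearest edge
    frame (an assumed padding scheme).
    """
    half = context // 2
    window = []
    for k in range(t - half, t + half + 1):
        k = min(max(k, 0), len(frames) - 1)
        window.extend(frames[k])
    return window


def sigmoid(x):
    # Numerically stable logistic function (assumed hidden-layer nonlinearity).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(z):
    # Subtract the max before exponentiating to avoid overflow; outputs sum to 1.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, biases, x):
    # One fully connected layer: one weight row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def random_layer(n_out, n_in, rng):
    # Untrained random weights, for illustration only.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def phone_posteriors(frames, t, params):
    """Return one posterior probability per phone class for frame t."""
    (w1, b1), (w2, b2) = params
    x = stack_context(frames, t)
    hidden = [sigmoid(a) for a in affine(w1, b1, x)]
    return softmax(affine(w2, b2, hidden))


rng = random.Random(0)
params = (random_layer(N_HIDDEN, CONTEXT * N_FEATS, rng),
          random_layer(N_PHONES, N_HIDDEN, rng))
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATS)] for _ in range(50)]
post = phone_posteriors(frames, 25, params)
print(len(post), round(sum(post), 6))  # → 40 1.0
```

Because every softmax output is positive and the outputs sum to one, the vector can be read as a distribution over the mutually-exclusive phone classes. Changing `CONTEXT` to 17 or enlarging `N_HIDDEN` changes only the layer shapes.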

The role of the neural net within the whole speech recognition chain is presented in this visualization of the hybrid recognizer. Posterior probabilities from the net form the space through which the decoder finds the least-cost path consistent with the word and language models, returning the most likely corresponding word sequence.
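The idea of a least-cost path through the posteriors can be illustrated with a small Viterbi-style alignment of frames to a single word's phone sequence, using the negative log posterior as each frame's cost. This is a hedged sketch, not the ICSI decoder. It assumes one hypothetical pronunciation given as phone indices, a strict left-to-right model in which each phone lasts at least one frame, and no transition costs or language model. Real hybrid decoders search over many words under a language model and commonly divide the posteriors by phone prior probabilities to obtain scaled likelihoods first; this sketch does neither. The function name and the posterior values are made up for the example.

```python
import math


def least_cost_alignment(posteriors, phone_seq):
    """Align frames to a left-to-right phone sequence at minimum total cost.

    posteriors[t][p] is the network's posterior for phone p at frame t.
    phone_seq lists phone indices in pronunciation order (a hypothetical word model).
    Each phone must occupy at least one frame; a frame's cost is -log posterior.
    Returns (total_cost, phone label chosen for each frame).
    """
    T, S = len(posteriors), len(phone_seq)
    if T < S:
        raise ValueError("need at least one frame per phone")
    INF = float("inf")

    def frame_cost(t, s):
        # Floor the probability so log(0) never occurs.
        return -math.log(max(posteriors[t][phone_seq[s]], 1e-30))

    cost = [[INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    cost[0][0] = frame_cost(0, 0)  # the path must start in the first phone
    for t in range(1, T):
        for s in range(S):
            stay = cost[t - 1][s]                           # remain in the same phone
            advance = cost[t - 1][s - 1] if s > 0 else INF  # move on to the next phone
            if stay <= advance:
                cost[t][s], back[t][s] = stay, s
            else:
                cost[t][s], back[t][s] = advance, s - 1
            cost[t][s] += frame_cost(t, s)

    # The path must end in the last phone; follow back-pointers to recover it.
    s, labels = S - 1, []
    for t in range(T - 1, -1, -1):
        labels.append(phone_seq[s])
        s = back[t][s]
    labels.reverse()
    return cost[T - 1][S - 1], labels


# Five frames of made-up posteriors over three phones (indices 0, 1, 2).
post = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
total, labels = least_cost_alignment(post, [0, 1, 2])
print(labels)  # → [0, 0, 1, 2, 2]
```

Summing negative log posteriors is equivalent to multiplying the posteriors along the path, so the least-cost path is also the most probable segmentation under these assumptions.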



Generated by build-faq-index on Tue Mar 24 16:18:16 PDT 2009