ICSI Speech FAQ:
2.3 Why do we use connectionist rather than GMM?

Answer by: dpwe - 2000-07-22


(a/k/a "why is this night different from all other nights?")

Neural nets have been a well-established technique for probabilistic classification ever since their invention, and certainly since the development of the back-propagation algorithm, which provides a way to 'learn' the weights of a multi-layer perceptron (MLP) so that it reproduces the outputs represented in a body of training examples. At ICSI (specifically by Morgan and Bourlard in the early 1990s) a particular approach to using neural nets as the classifiers, or "acoustic models", in speech recognizers was developed: the so-called hybrid connectionist-HMM model. In this model, a temporal window of 9 or so successive feature vectors is presented to the input layer of a network (typically an MLP with a single hidden layer). The network's outputs estimate, for each of a set of mutually exclusive speech classes, the posterior probability that the current frame belongs to that class.
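As a rough illustration of the acoustic model just described, here is a minimal sketch in plain Python of a context window feeding a single-hidden-layer MLP. The layer sizes, the random (untrained) weights, the edge-padding rule and all function names are assumptions made for this example; they are not taken from any ICSI tool.

```python
import math
import random


def stack_context(frames, center, width=9):
    """Concatenate `width` successive frames centered on `center`.

    Frames beyond the utterance edges are replaced by the nearest edge frame
    (an assumed padding rule for this sketch).
    """
    half = width // 2
    window = []
    for t in range(center - half, center + half + 1):
        t = min(max(t, 0), len(frames) - 1)
        window.extend(frames[t])
    return window


def affine(weights, bias, x):
    """Compute W.x + b with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Normalize activations into probabilities that sum to one."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def mlp_posteriors(x, w1, b1, w2, b2):
    """Single hidden layer (sigmoid units) followed by a softmax output layer."""
    hidden = [1.0 / (1.0 + math.exp(-a)) for a in affine(w1, b1, x)]
    return softmax(affine(w2, b2, hidden))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 13 features per frame, 9-frame window, 20 hidden units,
# 5 speech classes. The weights are random, so the outputs are meaningless
# until the network has been trained with back-propagation.
N_FEATS, WIDTH, N_HIDDEN, N_CLASSES = 13, 9, 20, 5
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATS)] for _ in range(30)]
w1 = random_matrix(N_HIDDEN, N_FEATS * WIDTH, rng)
b1 = [0.0] * N_HIDDEN
w2 = random_matrix(N_CLASSES, N_HIDDEN, rng)
b2 = [0.0] * N_CLASSES

x = stack_context(frames, center=0)
posteriors = mlp_posteriors(x, w1, b1, w2, b2)
```

The softmax output layer is what lets the outputs be read as a probability distribution over mutually exclusive classes: they are all positive and sum to one for every input window.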

In the years since, the majority of speech recognition systems have used parametric distribution models to estimate the likelihoods of particular feature vectors given a particular speech class. King of the distribution models is the Gaussian mixture model (GMM).
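For contrast with the network's posteriors, the likelihood that a GMM assigns to one feature vector given one speech class can be sketched as below. Diagonal covariances are assumed here for simplicity, and the function names and toy parameters are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | class) for a mixture of diagonal Gaussians.

    Uses the log-sum-exp trick so that very small component densities
    do not underflow.
    """
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


# Toy two-component mixture over 2-dimensional features (made-up parameters).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [2.0, 2.0]]
print(gmm_log_likelihood([0.5, -0.2], weights, means, variances))
```

A recognizer of this kind would keep one such mixture per speech class and evaluate each of them on the current feature vector, whereas the network in the hybrid model produces its estimates for all classes in one forward pass.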

At ICSI, and at the labs to which we have close links (Cambridge, Sheffield, FPMons, IDIAP and a few others), we have held on to neural net models despite the flood of techniques and tools (such as HTK) specific to GMM systems. Much of the reason may be historical, i.e. we have continued along the path we were already on, but there are a number of reasons why this is a reasonable thing to do:

For more discussion of this matter, see the following article:



Generated by build-faq-index on Tue Mar 24 16:18:13 PDT 2009