ICSI Speech FAQ:
9.2 What are the decoders we use at ICSI?

Answer by: dpwe - 2000-08-14


As explained on the decoding page, HMM decoding is probably the trickiest and most resource-intensive part of the speech recognition chain. This makes it difficult to standardize on a single decoder; different decoders are used for historical reasons, because they offer different features, and because they perform better at different tasks.

The three main decoders you will come across at ICSI are:

noway
noway is our favorite decoder. It was written by Steve Renals (formerly of ICSI, now at Sheffield University), who describes it as a 'start-synchronous stack decoder'. It uses a host of tricks described in his IEEE Transactions on Speech and Audio Processing paper with Mike Hochberg, 'Start-synchronous search for large vocabulary continuous speech recognition'. We like noway because it is flexible (it accepts a wide range of data formats), it is rather mature (few unknown bugs), and it is pretty fast (clever pruning, and an easily controlled speed-accuracy tradeoff).
chronos
chronos was written by Tony Robinson (Cambridge/SoftSound) as a substitute for noway. Tony had some ideas about how to speed up the search by using stretches of time spent in a particular state as the atomic units, rather than each time step individually, as described in his 1998 ICASSP paper (with James Christie), 'Time-first search for large-vocabulary speech recognition'. chronos is significantly faster, more accurate, and less memory-consuming than noway, but it has fewer options, fewer sanity checks and some weird bugs. One major attraction, however, is that it seems to need fewer options: informally, performance varies rather little in response to changes in the pruning parameters.
y0
y0 is the original ICSI decoder. Its use is deprecated, but still happens because its special data formats mean that converting to noway (or chronos) is not trivial. Also, y0 supports a cruder form of HMM model specification that, while less convenient to use, does permit a wider range of model structures. However, y0 has rather poor pruning, and tends to be very inefficient.
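
Pruning comes up for all three decoders above. As a rough illustration of what a pruning parameter does, here is a minimal sketch of beam pruning in a generic time-synchronous Viterbi search. It is not the algorithm used by noway (which is a stack decoder), chronos or y0; the function name viterbi_beam, its arguments and the toy numbers are all made up for this example.

```python
import math


def viterbi_beam(log_init, log_trans, log_obs, beam):
    """Generic time-synchronous Viterbi search with beam pruning.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of the transition p -> s
    log_obs[t][s]   : log likelihood of frame t in state s
    beam            : at each frame, drop states scoring more than
                      `beam` below the best active state
    Returns (best_path, best_log_score, expansions), where expansions
    counts the transitions evaluated (a crude proxy for run time).
    """
    n_states = len(log_init)
    # active maps state -> (score of best path ending there, that path)
    active = {s: (log_init[s] + log_obs[0][s], [s])
              for s in range(n_states) if log_init[s] > -math.inf}
    expansions = 0
    for t in range(1, len(log_obs)):
        # Pruning step: keep only states within `beam` of the best one.
        best = max(score for score, _ in active.values())
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        nxt = {}
        for p, (score, path) in active.items():
            for s in range(n_states):
                if log_trans[p][s] == -math.inf:
                    continue
                expansions += 1
                cand = score + log_trans[p][s] + log_obs[t][s]
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = nxt
    final = max(active, key=lambda s: active[s][0])
    score, path = active[final]
    return path, score, expansions


def _logs(rows):
    """Convert a table of probabilities to log probabilities."""
    return [[math.log(p) if p > 0 else -math.inf for p in row] for row in rows]


# Toy 3-state model with 4 frames (numbers invented for illustration).
LOG_INIT = _logs([[0.6, 0.3, 0.1]])[0]
LOG_TRANS = _logs([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
LOG_OBS = _logs([[0.5, 0.4, 0.1], [0.1, 0.6, 0.3],
                 [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]])

if __name__ == "__main__":
    for beam in (math.inf, 2.0, 0.5):
        path, score, work = viterbi_beam(LOG_INIT, LOG_TRANS, LOG_OBS, beam)
        print(beam, path, round(score, 3), work)
```

A wide beam does more work and cannot miss the best path; a narrow beam does less work but may return a lower-scoring path. That is the speed-accuracy tradeoff in miniature.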

There are two other decoders that should be mentioned. First is HVite, the HTK Viterbi decoder. HTK is installed and operational at ICSI, but despite its being the most popular speech toolkit in the world, there is rather little expertise in using it at ICSI. We do use it (based on standard scripts) for the Aurora tandem modeling work.

Finally, there is the issue of multistream decoding. We have done a lot of work at ICSI with multistream recognition, where speech information is presented on several streams in parallel. One way of building a recognizer in this situation is to track parallel state evolutions based on separate streams, with some constraints between the decodes, but not necessarily requiring the same states in each stream. This doesn't fit terribly comfortably into our existing decode infrastructure (although it can be done by defining an 'outer product' state space), so there was interest in developing a new decoder specifically for this process. Eric Fosler-Lussier started work on such a beast, called jose (you figure it out), but I don't know the current status of this project.
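
To make the 'outer product' state space concrete, here is a minimal sketch for two streams. It is only an illustration of the general idea, not jose or any existing ICSI tool: the function names, the `allowed` coupling constraint, the renormalization over permitted successors, and the assumption that the streams are independent given the joint state are all choices made for this example.

```python
from itertools import product


def product_hmm(trans_a, trans_b, allowed=lambda i, j: True):
    """Build transitions over the joint state space of two streams.

    trans_a[i][k], trans_b[j][l] : per-stream transition probabilities
    allowed(i, j)                : hypothetical coupling constraint; joint
                                   states for which it is False are removed
    Returns (states, joint), where states is a list of (i, j) pairs and
    joint[(i, j)][(k, l)] is trans_a[i][k] * trans_b[j][l], renormalized
    over the successors that survive the constraint.
    """
    states = [(i, j)
              for i, j in product(range(len(trans_a)), range(len(trans_b)))
              if allowed(i, j)]
    joint = {}
    for i, j in states:
        row = {(k, l): trans_a[i][k] * trans_b[j][l] for k, l in states}
        total = sum(row.values())
        joint[(i, j)] = ({s: p / total for s, p in row.items()}
                         if total > 0 else row)
    return states, joint


def joint_log_obs(log_obs_a, log_obs_b, states):
    """Per-frame log likelihood of each joint state.

    Assumes the two streams are conditionally independent given the
    joint state, so their log likelihoods simply add.
    """
    return [{(i, j): frame_a[i] + frame_b[j] for i, j in states}
            for frame_a, frame_b in zip(log_obs_a, log_obs_b)]


# Two 3-state left-to-right streams (numbers invented for illustration).
TRANS = [[0.6, 0.4, 0.0],
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 1.0]]

# Constraint: the streams may drift apart by at most one state.
STATES, JOINT = product_hmm(TRANS, TRANS, allowed=lambda i, j: abs(i - j) <= 1)

if __name__ == "__main__":
    print(len(STATES), "joint states out of", len(TRANS) ** 2)
    print(JOINT[(0, 1)])
```

The joint model is an ordinary HMM, so a standard single-stream decoder can search it. The cost is that the number of states grows as the product of the per-stream state counts, which is part of why a dedicated multistream decoder is attractive.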



Generated by build-faq-index on Tue Mar 24 16:18:17 PDT 2009