ICSI Speech FAQ:
9.2 What are the decoders we use at ICSI?
Answer by: dpwe - 2000-08-14
As explained on the decoding page, HMM
decoding is probably the trickiest and most resource-intensive part
of the speech recognition chain. This makes it difficult to
standardize on a single decoder; different decoders are used for
historical reasons, because they offer different features, and
because they perform better at different tasks.
The three main decoders you will come across at ICSI are:
- noway
- noway is our favorite decoder. It was written by
Steve Renals (formerly of ICSI, now at Sheffield University),
and he describes it as a 'start-synchronous stack decoder'.
It uses a host of tricks described in his IEEE Tr.SAP paper
with Mike Hochberg,
'Start-synchronous search for large vocabulary continuous speech
recognition'. We like noway because it is flexible
(accepting a wide range of data formats), it is rather mature
(few unknown bugs), and it is pretty fast (clever pruning,
and easily-controlled speed-accuracy tradeoff).
- chronos
- chronos was written by Tony Robinson (Cambridge/SoftSound)
as a substitute for noway. Tony had some ideas about how
to speed up the search by using stretches of time spent in a particular
state as the atomic units, rather than each time step individually,
as described in his 1998 ICASSP paper (with James Christie),
'Time-first search for large-vocabulary speech recognition'.
chronos is significantly faster, more accurate, and
less memory-consuming than noway, but it has fewer
options, fewer sanity checks, and some weird bugs. One major
attraction, however, is that it seems to need less
tuning: informally, performance varies rather little in response
to pruning parameter changes.
- y0
- y0 is the original ICSI decoder. Its use is deprecated,
but still happens because its special data formats mean that
converting to noway (or chronos) is not
trivial. Also, y0 supports a cruder form of HMM
model specification that, while less convenient to use, does
permit a wider range of model structures. However, y0
has rather poor pruning, and tends to be very inefficient.
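Pruning comes up for all three decoders above, so here is a minimal
sketch of what a speed-accuracy tradeoff controlled by a pruning
parameter looks like. It is a generic time-synchronous Viterbi search
with a beam, not the start-synchronous stack search of noway or the
time-first search of chronos, and none of the names below
(viterbi_beam, beam, lg) come from those programs; they are made up
for illustration.

```python
import math

def viterbi_beam(log_init, log_trans, log_obs, beam=math.inf):
    """Time-synchronous Viterbi with beam pruning (illustrative only).

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t under state s
    beam            : at each frame, hypotheses scoring more than `beam`
                      below the best one are dropped
    Returns (best_log_score, best_state_path, transitions_expanded).
    """
    n_states = len(log_init)
    # active maps state -> (score, path) for every surviving hypothesis
    active = {s: (log_init[s] + log_obs[0][s], [s])
              for s in range(n_states) if log_init[s] > -math.inf}
    expanded = 0
    for t in range(1, len(log_obs)):
        best = max(score for score, _ in active.values())
        # Pruning step: keep only hypotheses within `beam` of the best
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        new_active = {}
        for p, (score, path) in active.items():
            for s in range(n_states):
                if log_trans[p][s] == -math.inf:
                    continue
                expanded += 1
                cand = score + log_trans[p][s] + log_obs[t][s]
                if s not in new_active or cand > new_active[s][0]:
                    new_active[s] = (cand, path + [s])
        active = new_active
    best_state = max(active, key=lambda s: active[s][0])
    score, path = active[best_state]
    return score, path, expanded


def lg(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else -math.inf

# Toy 3-state model and 4 frames of made-up likelihoods
init = [lg(p) for p in (0.6, 0.3, 0.1)]
trans = [[lg(p) for p in row] for row in ((0.7, 0.2, 0.1),
                                          (0.1, 0.8, 0.1),
                                          (0.1, 0.2, 0.7))]
obs = [[lg(p) for p in row] for row in ((0.90, 0.05, 0.05),
                                        (0.80, 0.10, 0.10),
                                        (0.10, 0.80, 0.10),
                                        (0.05, 0.05, 0.90))]

# Exhaustive search (infinite beam) versus a narrow beam
score_full, path_full, n_full = viterbi_beam(init, trans, obs)
score_narrow, path_narrow, n_narrow = viterbi_beam(init, trans, obs,
                                                   beam=math.log(5))
print(path_full, n_full)
print(path_narrow, n_narrow)
```

On this toy data the narrow beam expands fewer transitions and still
finds the same path. Tighten the beam far enough on harder data and the
correct hypothesis will eventually be pruned away, which is the
accuracy side of the tradeoff.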
There are two other decoders that should be mentioned. First is
HVite, the HTK
Viterbi decoder. HTK is installed and operational at ICSI, but despite
being the most popular speech toolkit in the world, there is rather
little expertise in using it at ICSI. We do use it (based on standard
scripts) for the Aurora tandem modeling work.
Finally, there is the issue of multistream decoding. We have
done a lot of work at ICSI with multistream recognition, where speech
information is presented on several streams in parallel. One way of
building a recognizer in this situation is to track parallel state
evolutions based on separate streams, with some constraints between
the decodes, but not necessarily requiring the same states in each
stream. This doesn't fit terribly comfortably into our existing
decode infrastructure (although it can be done by defining an
'outer product' state space), so there was interest in developing
a new decoder specifically for this process. Eric Fosler-Lussier
started work on such a beast, called jose (you figure it out),
but I don't know the current status of this project.
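To make the 'outer product' state space idea concrete, here is a
minimal sketch under some loud assumptions: two streams, each with its
own small HMM, combined into a single HMM whose states are (a, b)
pairs, so that an ordinary single-stream Viterbi search can decode both
at once. The function names (product_model, product_obs, viterbi), the
allowed() constraint, and the toy numbers are all hypothetical; this is
not how jose or any of the ICSI decoders is implemented.

```python
import math
from itertools import product

def lg(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else -math.inf

def product_model(init_a, trans_a, init_b, trans_b, allowed):
    """Combine two per-stream HMMs into one HMM over (a, b) state pairs.

    allowed(a, b) is a hypothetical coupling constraint; pairs for which
    it returns False are left out of the combined state space. Log scores
    are simply added, which assumes the streams evolve independently
    apart from that constraint (the result is not renormalized).
    """
    pairs = [(a, b)
             for a, b in product(range(len(init_a)), range(len(init_b)))
             if allowed(a, b)]
    init = [init_a[a] + init_b[b] for a, b in pairs]
    trans = [[trans_a[a][a2] + trans_b[b][b2] for a2, b2 in pairs]
             for a, b in pairs]
    return pairs, init, trans

def product_obs(obs_a, obs_b, pairs):
    """Per-frame log likelihood of each pair: sum of the stream scores."""
    return [[fa[a] + fb[b] for a, b in pairs] for fa, fb in zip(obs_a, obs_b)]

def viterbi(init, trans, obs):
    """Plain single-stream Viterbi; returns the best state index path."""
    n = len(init)
    score = [init[s] + obs[0][s] for s in range(n)]
    back = []
    for frame in obs[1:]:
        prev = score
        ptr = [max(range(n), key=lambda p: prev[p] + trans[p][s])
               for s in range(n)]
        score = [prev[ptr[s]] + trans[ptr[s]][s] + frame[s] for s in range(n)]
        back.append(ptr)
    s = max(range(n), key=lambda k: score[k])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]

# Both streams use the same toy 3-state left-to-right model
init = [lg(p) for p in (1.0, 0.0, 0.0)]
trans = [[lg(p) for p in row] for row in ((0.5, 0.5, 0.0),
                                          (0.0, 0.5, 0.5),
                                          (0.0, 0.0, 1.0))]

def frames(favored):
    """Made-up likelihoods: 0.8 for the favored state, 0.1 otherwise."""
    return [[lg(0.8 if s == f else 0.1) for s in range(3)] for f in favored]

obs_a = frames([0, 1, 1, 2, 2])   # stream A changes state early
obs_b = frames([0, 0, 1, 1, 2])   # stream B lags by one frame

# Constraint: the two streams may be at most one state apart
pairs, p_init, p_trans = product_model(init, trans, init, trans,
                                       lambda a, b: abs(a - b) <= 1)
p_obs = product_obs(obs_a, obs_b, pairs)
decoded = [pairs[i] for i in viterbi(p_init, p_trans, p_obs)]
print(len(pairs), decoded)
```

The combined state space grows as the product of the per-stream state
counts (7 of the 9 possible pairs survive the constraint here), which
hints at why this approach sits uncomfortably in a decoder built for a
single stream.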
Generated by build-faq-index on Tue Mar 24 16:18:17 PDT 2009