ICSI Speech FAQ:
8.3 How do I build an n-gram grammar for noway or chronos?

Answer by: gildea - 2000-11-12


To build a language model from text, use Andreas Stolcke's SRI Language Modeling Toolkit, specifically the ngram-count program.
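
As a rough illustration, the ngram-count step could be driven from a short Python script. This is only a sketch: it assumes SRILM's ngram-count is on your PATH, and the file names and the helper names ngram_count_command and build_arpa_lm are made up for the example. The -order, -text and -lm options are the usual ngram-count options for the n-gram order, the training text and the output ARPA file.

```python
import subprocess


def ngram_count_command(text_path, lm_path, order=3):
    """Build the argument list for an SRILM ngram-count run.

    text_path and lm_path are hypothetical file names chosen by the caller.
    """
    return [
        "ngram-count",
        "-order", str(order),   # maximum n-gram order to estimate
        "-text", text_path,     # training text, one sentence per line
        "-lm", lm_path,         # output language model in ARPA format
    ]


def build_arpa_lm(text_path, lm_path, order=3):
    """Run ngram-count; raises CalledProcessError if the tool fails."""
    subprocess.run(ngram_count_command(text_path, lm_path, order), check=True)
```

Calling build_arpa_lm("corpus.txt", "corpus.arpa", order=2) would then attempt to write a bigram model, assuming those files and the tool are available.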

An alternative is the CMU-Cambridge Statistical Language Modeling Toolkit.

Both packages generate language models in the standard human-readable ARPA format. (The CMU package can also write its own binary language model format.) Noway reads either ARPA format or its own binary n-gram format. To convert from ARPA to noway's binary format, run noway with the -write_lm command-line switch.

When feeding an ARPA-format language model to noway, you may have to manually add a backoff weight of zero to any line in the file that lacks one. The backoff weight is the number at the end of the line, after the words of the n-gram.
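
The manual edit described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a tool that ships with noway: the function name add_zero_backoffs is hypothetical, the padding value "0.0" is written as a log-domain weight, and only n-gram orders below the highest one in the file are padded, since standard ARPA files carry no backoff weights on the highest order. Whether noway also needs weights there is not something this sketch knows. A line is treated as missing its weight when it has exactly one probability plus N words in an \N-grams: section.

```python
import re

# Matches ARPA section headers such as "\1-grams:" or "\3-grams:".
SECTION_HEADER = re.compile(r"^\\(\d+)-grams:\s*$")


def add_zero_backoffs(lines):
    """Return ARPA lines with a zero backoff weight appended where one is missing.

    Only orders below the highest order in the file are padded (an assumption;
    highest-order entries normally have no backoff weight in ARPA format).
    """
    orders = [int(m.group(1)) for m in map(SECTION_HEADER.match, lines) if m]
    max_order = max(orders) if orders else 0

    out = []
    order = 0  # 0 means "not inside an n-gram section"
    for line in lines:
        text = line.rstrip("\n")
        m = SECTION_HEADER.match(text)
        if m:
            order = int(m.group(1))
        elif text.startswith("\\"):
            order = 0  # \data\ or \end\ marker
        elif 0 < order < max_order:
            fields = text.split()
            # logprob + N words and nothing else: the backoff weight is absent.
            if len(fields) == order + 1:
                text = text + "\t0.0"
        out.append(text)
    return out
```

To use it on a file, read the lines, pass them through add_zero_backoffs, and write the result joined with newlines to a new file, keeping the original in case noway accepts it unmodified.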

Earlier versions of noway had separate bigram and trigram binary file formats, specified with -bigram and -trigram. It is better to use the newer generic format, specified with -ngram.



Generated by build-faq-index on Tue Mar 24 16:18:17 PDT 2009