ICSI Speech FAQ:
3.14 What are lattices?

Answer by: gelbart - 2000-10-10

Lattices

Rather than outputting the single best hypothesis for an utterance, or a list of the best hypotheses, some decoders can output a graph, known as a lattice, which compactly represents many different possible hypotheses for the utterance.

The noway decoder produces a lattice in which nodes are labeled with times and links (edges) are labeled with words and information about the probabilities of those words. Other types of lattices are possible, such as lattices in which nodes, rather than links, are labeled with words.
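To make the idea of a compact representation concrete, here is a minimal sketch of a lattice of the first kind (times on nodes, words on links). The node times, words, and the helper name enumerate_hypotheses are invented for illustration and do not come from noway; the sketch also assumes the lattice has no cycles.

```python
from collections import defaultdict

# Hypothetical toy lattice: node id -> time in seconds (made-up values).
node_times = {0: 0.00, 1: 0.25, 2: 0.50, 3: 0.90}

# Links as (start node, end node, word). Words are made up for illustration.
links = [
    (0, 1, "one"),
    (0, 1, "won"),
    (1, 2, "two"),
    (1, 2, "too"),
    (2, 3, "three"),
    (1, 3, "tree"),
]


def enumerate_hypotheses(links, start, end):
    """Return every word sequence along a path from start to end.

    Assumes the lattice is acyclic, so the recursion terminates.
    """
    outgoing = defaultdict(list)
    for s, e, word in links:
        outgoing[s].append((e, word))

    results = []

    def walk(node, words):
        if node == end:
            results.append(" ".join(words))
            return
        for nxt, word in outgoing[node]:
            walk(nxt, words + [word])

    walk(start, [])
    return results


hypotheses = enumerate_hypotheses(links, 0, 3)
for hyp in hypotheses:
    print(hyp)
```

In this toy example, six links encode six distinct hypotheses; in a realistic lattice the number of hypotheses grows much faster than the number of links, which is why the graph form is more compact than a list.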

Lattices and Decoders Used at ICSI

Of y0, noway, and chronos, only noway appears able to provide lattice output. There is a lattice output option (-lat) documented for chronos, but as of this writing it does not appear to be implemented.

Lattice Output from noway

noway can be instructed to provide lattice output via the -lattice or -pipe_lattice options. The -lattice option directs noway to write the lattice for each decoded utterance to a separate file. The -pipe_lattice option directs noway to write the lattices for all utterances to stdout, one after another. The lattice output produced with the -lattice option starts with additional information, including noway parameters and node definitions, that is not provided with the -pipe_lattice option.

noway's lattice output roughly follows the "Standard Lattice Format" documented in chapter 16 of The HTK Book.

Nodes are defined using one line for each node in the graph. The following noway output gives the first few node definitions of a particular lattice:

    I=0 t=0.000
    I=1 t=0.016
    I=2 t=0.032
    I=3 t=0.048
    I=4 t=0.064
    I=5 t=0.144

The I field gives the identifying number of the node, and the t field gives the time associated with the node, in seconds.
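As a rough illustration, node definitions in this form could be read with a few lines of Python. This is only a sketch: the function name parse_node_line and the regular expression are assumptions based on the sample output above, not on any noway documentation, and real lattice files may contain other line types that this code simply skips.

```python
import re

# Matches a node definition such as "I=3 t=0.048" (format assumed from the sample).
NODE_RE = re.compile(r"^I=(\d+)\s+t=([0-9.]+)$")


def parse_node_line(line):
    """Return (node_id, time_in_seconds), or None if the line is not a node definition."""
    match = NODE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


sample = """\
I=0 t=0.000
I=1 t=0.016
I=2 t=0.032
I=3 t=0.048
I=4 t=0.064
I=5 t=0.144
"""

# Build a mapping from node id to time, ignoring lines that do not parse.
nodes = {}
for line in sample.splitlines():
    parsed = parse_node_line(line)
    if parsed is not None:
        node_id, time_s = parsed
        nodes[node_id] = time_s

print(nodes)
```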

Links are defined using one line for each link (edge) in the graph. The following noway output gives the first few link definitions of a particular lattice:

    J=0 S=0 E=1 W=[uh] v=0 a=-4.43322 r=-0.00427245
    J=1 S=1 E=2 W=<SIL> v=0 a=-262142 r=0
    J=2 S=1 E=3 W=<SIL> v=0 a=1.74243 r=0
    J=3 S=1 E=5 W=one v=0 a=-16.163 r=-0.00427245
    J=4 S=2 E=4 W=[uh] v=0 a=-9.90146 r=-0.00427245

The fields' meanings are given below. The definitions of the J, S, E, and W fields are taken directly from the Standard Lattice Format description in The HTK Book, while the definitions of the v and r fields have been at least partly guessed at.

J field: the identifying number of the link described on the current line
S field: the node that the link starts at
E field: the node that the link ends at
W field: the word associated with the link
v field: specifies which of the word's pronunciation variants was chosen (with v=0 being the first pronunciation variant in the dictionary, v=1 being the second variant, etc.)
a field: a word-level acoustic score
r field: log10(probability of word pronunciation variant); i.e., the probability of that pronunciation being used for the word rather than one of the other pronunciations
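The field list above can be turned into a small parser. The following is a sketch under stated assumptions: it assumes fields are separated by whitespace and that word labels contain no spaces (true of the sample output, but not verified for noway in general), and the names parse_link_line and pron_probability are made up for this example. The conversion of r back to a probability relies on the log10 interpretation given above, which is itself partly a guess.

```python
def parse_link_line(line):
    """Parse one link definition such as
    'J=3 S=1 E=5 W=one v=0 a=-16.163 r=-0.00427245' into a dict.
    """
    # Split "KEY=value" tokens at the first "=" only.
    fields = dict(token.split("=", 1) for token in line.split())
    return {
        "J": int(fields["J"]),    # link id
        "S": int(fields["S"]),    # start node
        "E": int(fields["E"]),    # end node
        "W": fields["W"],         # word label, e.g. "one", "<SIL>", "[uh]"
        "v": int(fields["v"]),    # pronunciation variant index
        "a": float(fields["a"]),  # word-level acoustic score
        "r": float(fields["r"]),  # log10 pronunciation probability (assumed)
    }


def pron_probability(link):
    """Convert the r field back to a probability, assuming r is a base-10 log."""
    return 10.0 ** link["r"]


link = parse_link_line("J=3 S=1 E=5 W=one v=0 a=-16.163 r=-0.00427245")
prob = pron_probability(link)
print(link["W"], link["S"], link["E"], round(prob, 4))
```

Under these assumptions, r=-0.00427245 corresponds to a pronunciation probability of roughly 0.99, and r=0 corresponds to a probability of 1.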

Other Relevant Software

Adam Janin and Eric Fosler have written a tool, nbestlat, to prune less likely hypotheses out of a lattice, resulting in a simpler lattice. The tool was created because the lattices produced by noway were too complex for the needs of the VerbMobil and SmartKom projects. Right now, the latest version of nbestlat is stored in our SmartKom source tree. As of this writing it has not been fully tested and debugged.

According to the Abbot documentation, the Abbot package includes a tool named lattice2nbest, which extracts the N best (most likely) hypotheses from a Standard Lattice Format lattice. There is also an SRI tool, nbest-lattice, which (I think) creates a lattice from N-best hypotheses.
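To illustrate what extracting the N best hypotheses from a lattice means, here is a brute-force sketch. It is not how lattice2nbest or nbestlat work (their algorithms are not described here). It assumes an acyclic lattice, a single score per link where higher is better, and that a path's score is the sum of its link scores; the scores, words, and the function name n_best are all invented for the example.

```python
import heapq
from collections import defaultdict


def n_best(links, start, end, n):
    """Return the n highest-scoring (score, words) paths from start to end.

    Brute force: enumerates every complete path, so it is only practical
    for small, acyclic lattices.
    """
    outgoing = defaultdict(list)
    for s, e, word, score in links:
        outgoing[s].append((e, word, score))

    complete = []
    stack = [(start, [], 0.0)]
    while stack:
        node, words, total = stack.pop()
        if node == end:
            complete.append((total, words))
            continue
        for nxt, word, score in outgoing[node]:
            stack.append((nxt, words + [word], total + score))

    return heapq.nlargest(n, complete, key=lambda item: item[0])


# Hypothetical links: (start node, end node, word, log-domain score).
toy_links = [
    (0, 1, "one", -1.0),
    (0, 1, "won", -2.5),
    (1, 2, "two", -1.0),
    (1, 2, "too", -1.5),
    (2, 3, "three", -0.5),
    (1, 3, "tree", -3.0),
]

best = n_best(toy_links, 0, 3, 2)
for score, words in best:
    print(score, " ".join(words))
```

A practical tool would avoid enumerating every path, for example by using a best-first search, but the result it aims for is the same ranked list.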
