...Stolcke
Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025. E-mail: stolcke@speech.sri.com.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Jelinek:91.
Their paper phrases these problem in terms of context-free probabilistic grammars, but they generalize in obvious ways to other classes of models.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...derivations.
Earley states are also known as items in LR parsing, see [section 5.2]Aho:72 and Section 6.2.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...far.
This index is implicit in Earley:70. We include it here for clarity.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...state.
Note the difference between ``complete'' and ``completed'' states: Complete states (those with the dot to the right of the entire RHS) are the result of a completion or scanning step, but completion also produces states which are not yet complete.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...generalization.
The same technical complication was noticed by Wright:90 in the computation of probabilistic LR parser tables. The relation to LR parsing will be discussed in Section 6.2. Incidentally, a similar interpretation of forward ``probabilities'' is required for HMMs with non-emitting states.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...x.
This notation suggests a simple implementation, being obviously borrowed from the programming language C.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...prediction.
In different parsing scenarios the scanning step may well modify probabilities. For example, if the input symbols themselves have attached likelihoods these can be integrated by multiplying them onto 2#2 and 3#3 when a symbol is scanned. That way it is possible to perform efficient Earley parsing with integrated joint probability computation directly on weighted lattices describing ambiguous inputs.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...productions
Unit productions are also called ``chain productions'' or ``single productions'' in the literature.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...instance.
This does not preclude using a shared chart at the implementation level, of course, as long as the above protocol is adhered to.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...frameworks.
Like Earley parsers, LR parsers can be built using various amounts of lookahead to make the operation of the parser (more) deterministic, and hence more efficient. Only the case of zero-lookahead, LR(0), is considered here; the correspondence between LR(k) parsers and k-lookahead Earley parsers is discussed in the literature [Earley1970, Aho and Ullman1972].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...parser.
Once again, it is helpful to compare this to a closely related finite-state concept: the states of the LR parser correspond to sets of Earley states, similar to the way the states of a deterministic FSA correspond to sets of states of an equivalent non-deterministic FSA under the standard subset construction.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...prefix.
The identity of this expression with the item probabilities of Wright:90 can be proved by induction on the steps performed to compute the p's, as shown in Stolcke:earley-tr.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...sees.
It is not clear what the numerical properties of this approximation are, e.g., how the errors will accumulate over longer parses.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...matrices.
This connection to the GHR algorithm was pointed out by Fernando Pereira. Exploration of this link then lead to the extension of our algorithm to handle 1#1-productions, as described in Section 4.7.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...CNF.
Their method uses the transitive (but not reflexive) closure over the left-corner relation 103#103, for which they chose the symbol 196#196. We chose the symbol 5#5 in this article to point to this difference.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...definition.
Of course a CYK-style parser can operate left-to-right, right-to-left, or otherwise by reordering the computation of chart entries.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...seconds.
These figures are not very meaningful for their absolute values. All measurements were obtained on a Sun SPARCstation 2 with a CommonLisp/CLOS implementation of generic sparse matrices that was not particularly optimized for this task.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996