- ...Stolcke
-
Speech Technology and Research Laboratory, SRI International,
333 Ravenswood Ave., Menlo Park, CA 94025.
E-mail: stolcke@speech.sri.com.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...Jelinek:91.
- Their paper phrases these problem in terms of context-free probabilistic
grammars, but they generalize in obvious ways to other classes of models.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...derivations.
- Earley states are also known as items in LR parsing,
see [section 5.2]Aho:72 and Section 6.2.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...far.
- This index is implicit in Earley:70.
We include it here for clarity.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...state.
- Note the difference between ``complete'' and
``completed'' states: Complete states (those with the dot to the
right of the entire RHS) are the result of a completion or scanning
step,
but completion also produces states which are not yet complete.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...generalization.
- The same technical complication was noticed by
Wright:90 in the computation of probabilistic
LR parser tables. The relation to LR parsing will be
discussed in Section 6.2.
Incidentally, a similar interpretation of forward ``probabilities''
is required for HMMs with non-emitting states.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...x.
- This notation suggests a simple implementation,
being obviously borrowed from the programming language C.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...prediction.
-
In different parsing scenarios the scanning step may well modify
probabilities.
For example, if the input symbols themselves have attached likelihoods
these can be integrated by multiplying them onto 2#2 and 3#3
when a symbol is scanned.
That way it is possible to perform efficient Earley parsing with
integrated joint probability computation directly on weighted
lattices describing ambiguous inputs.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...productions
- Unit productions are also called ``chain productions''
or ``single productions'' in the literature.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...instance.
- This does not preclude using a shared chart at the implementation
level, of course, as long as the above protocol is adhered to.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...frameworks.
-
Like Earley parsers, LR parsers can be built using various amounts
of lookahead to make the operation of the parser (more) deterministic,
and hence more efficient. Only the case of zero-lookahead, LR(0),
is considered here; the correspondence between LR(k) parsers and
k-lookahead Earley parsers is discussed in the literature
[Earley1970, Aho and Ullman1972].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...parser.
-
Once again, it is helpful to compare this to a closely related finite-state
concept:
the states of the LR parser correspond to sets of Earley states, similar
to the way the states of a deterministic FSA correspond to sets of
states of an equivalent non-deterministic FSA under the standard
subset construction.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...prefix.
- The identity of this expression with the item probabilities
of Wright:90 can be proved by induction on the steps
performed to compute the p's, as shown in
Stolcke:earley-tr.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...sees.
-
It is not clear what the numerical properties of this approximation
are, e.g., how the errors will accumulate over longer parses.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...matrices.
- This connection to the GHR algorithm was pointed out by Fernando
Pereira. Exploration of this link then lead to the extension of our algorithm
to handle 1#1-productions, as described in Section 4.7.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...CNF.
- Their method uses the transitive (but not reflexive) closure over
the left-corner relation 103#103, for which they chose the symbol 196#196.
We chose the symbol 5#5 in this article to point to this difference.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...definition.
-
Of course a CYK-style parser can operate left-to-right, right-to-left,
or otherwise by reordering the computation of chart entries.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
- ...seconds.
- These figures are not very meaningful for their
absolute values. All measurements were obtained on a Sun SPARCstation 2 with
a CommonLisp/CLOS implementation of generic sparse matrices that was
not particularly optimized for this task.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.