
Other related work

The literature on Earley-based probabilistic parsers is sparse, presumably because of the precedent set by the Inside/Outside algorithm, which is more naturally formulated as a bottom-up computation.

Both Nakagawa (1987) and Paeseler (1988) use a non-probabilistic Earley parser augmented with "word match" scoring. Though not truly probabilistic, these algorithms are similar to the Viterbi version described here, in that they find a parse that optimizes the accumulated matching scores (without regard to rule probabilities). Prediction and completion loops do not come into play since no precise inner or forward probabilities are computed.

Magerman and Marcus (1991) are interested primarily in scoring functions to guide a parser efficiently to the most promising parses. Earley-style top-down prediction is used only to suggest worthwhile parses, not to compute precise probabilities, which they argue would be an inappropriate metric for natural language parsing.

Casacuberta (1988) describes an Earley parser that processes weighted (not necessarily probabilistic) CFGs and performs a computation that is isomorphic to that of the inside probabilities shown here. Schabes (1991) adds both inner and outer probabilities to Earley's algorithm, with the purpose of obtaining a generalized estimation algorithm for SCFGs. Both of these approaches are restricted to grammars without unbounded ambiguities, which can arise from unit or null productions.

Dan Jurafsky (personal communication) wrote an Earley parser for the Berkeley Restaurant Project (BeRP) speech understanding system that originally computed forward probabilities for restricted grammars (without left-corner or unit production recursion). The parser now uses the method described here to provide exact SCFG prefix and next-word probabilities to a tightly-coupled speech decoder [Jurafsky et al. 1995].

An essential idea in the probabilistic formulation of Earley's algorithm is the collapsing of recursive predictions and unit-production completion chains, replacing both with lookups in precomputed matrices. This idea arises in our formulation out of the need to compute probability sums given as infinite series. Graham et al. (1980) use a non-probabilistic version of the same technique to create a highly optimized Earley-like parser for general CFGs that implements prediction and completion by operations on Boolean matrices.
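
To indicate where the infinite series comes from, consider chains of left-corner predictions. Writing P_L for the probabilistic left-corner relation between nonterminals and R_L for its reflexive transitive closure (the notation here is a sketch meant to be suggestive, not a restatement of the exact definitions given earlier in the paper), the total weight contributed by arbitrarily deep prediction chains is

    R_L = \sum_{k=0}^{\infty} P_L^k = (I - P_L)^{-1} ,

which converges for well-behaved (e.g., consistent) SCFGs. The unbounded prediction loop thus reduces to a single lookup in the precomputed matrix R_L; the unit-production relation arising in the completion step is handled analogously by a matrix R_U = (I - P_U)^{-1}.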

The matrix inversion method for dealing with left-recursive prediction is borrowed from the LRI algorithm of Jelinek and Lafferty (1991) for computing prefix probabilities for SCFGs in CNF. We then use the same idea a second time to deal with the analogous recursion arising from unit productions in the completion step. We suspect, but have not proved, that the Earley computation of forward probabilities, when applied to a CNF grammar, performs a computation that is isomorphic to that of the LRI algorithm. In any case, we believe that the parser-oriented view afforded by the Earley framework makes for a very intuitive solution to the prefix probability problem, with the added advantage that it is not restricted to CNF grammars.
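
As a small, toy illustration of the inversion step, the sketch below computes the closure of the left-corner relation for a one-nonterminal, left-recursive grammar and checks it against a truncated version of the series. The grammar, probabilities, and variable names are invented for the example and are not taken from the parser's actual implementation.

    # Minimal sketch: left-corner closure by matrix inversion.
    # Illustrative grammar:  S -> S a  [0.4]   |   a  [0.6]
    import numpy as np

    P_L = np.array([[0.4]])   # P_L[X, Y]: probability that X expands with Y as its left corner

    # Closed form of the infinite series:  R_L = sum_{k>=0} P_L^k = (I - P_L)^{-1}
    R_L = np.linalg.inv(np.eye(P_L.shape[0]) - P_L)

    # Sanity check against a truncated series
    series = sum(np.linalg.matrix_power(P_L, k) for k in range(50))
    assert np.allclose(R_L, series)

    print(R_L)                # [[1.6667]]: total prediction weight S contributes to itself

The same routine, applied to the unit-production relation instead of the left-corner relation, yields the closure matrix used to collapse unit completion chains.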

Algorithms for probabilistic CFGs can be broadly characterized along several dimensions. One such dimension is whether the quantities entered into the parser chart are defined in a bottom-up (CYK) fashion, or whether left-to-right constraints are an inherent part of their definition. The probabilistic Earley parser shares the inherent left-to-right character of the LRI algorithm, and contrasts with the bottom-up I/O algorithm.

Probabilistic parsing algorithms may also be classified as to whether they are formulated for fully parameterized CNF grammars or for arbitrary context-free rules (typically taking advantage of grammar sparseness). In this respect the Earley approach contrasts with both the CNF-oriented I/O and LRI algorithms. Another way to avoid the CNF constraint is a formulation based on probabilistic Recursive Transition Networks (RTNs) [Kupiec 1992]. The similarity goes further: both Kupiec's approach and ours are based on state transitions, and dotted productions (Earley states) turn out to be equivalent to RTN states if the RTN is constructed from a CFG.

