
Relation to probabilistic LR parsing

 

One of the major alternative context-free parsing paradigms besides Earley's algorithm is LR parsing [Aho and Ullman1972]. A comparison of the two approaches, both in their probabilistic and non-probabilistic aspects, is interesting and provides useful insights. The following remarks assume familiarity with both approaches. We sketch the fundamental relations, as well as the important tradeoffs between the two frameworks.

Like an Earley parser, LR parsing uses dotted productions, called items, to keep track of the progress of derivations as the input is processed. The start indices are not part of LR items: we may therefore use the term ``item'' to refer to both LR items and Earley states without start indices. An Earley parser constructs sets of possible items on the fly, by following all possible partial derivations. An LR parser, on the other hand, has access to a complete list of sets of possible items computed beforehand, and at runtime simply follows transitions between these sets. The item sets are known as the ``states'' of the LR parser. A grammar is suitable for LR parsing if these transitions can be performed deterministically by considering only the next input and the contents of a shift-reduce stack. Generalized LR parsing is an extension that allows parallel tracking of multiple state transitions and stack actions by using a graph-structured stack [Tomita1986].
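
To make the contrast concrete, the following sketch (in Python, with a hypothetical toy grammar; all names are illustrative and not taken from the text) shows how the LR item sets and the transitions between them can be precomputed with the standard closure and goto operations. An Earley parser performs the analogous prediction step on the fly instead of tabulating it beforehand.

\begin{verbatim}
# Hypothetical toy grammar; items are triples (lhs, rhs, dot).
GRAMMAR = {
    "S'": [["S"]],                     # augmented start production
    "S":  [["S", "+", "T"], ["T"]],
    "T":  [["a"]],
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """Expand an item set: for every dot before a nonterminal X,
    add all items X -> . gamma (the analogue of Earley prediction)."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    """Transition: advance the dot over `symbol`, then close the result."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

# Enumerate all reachable item sets: these are the precomputed LR "states".
start = closure({("S'", ("S",), 0)})
states, stack = {start}, [start]
while stack:
    s = stack.pop()
    for sym in {rhs[dot] for _, rhs, dot in s if dot < len(rhs)}:
        t = goto(s, sym)
        if t and t not in states:
            states.add(t)
            stack.append(t)
print(len(states), "precomputed LR states")
\end{verbatim}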

Probabilistic LR parsing [Wright1990] is based on LR items augmented with certain conditional probabilities. Specifically, the probability p associated with an LR item $X \rightarrow \lambda . \mu$ is, in our terminology, a normalized forward probability:

\[ p = \frac{\sum_k \alpha_i(k: X \rightarrow \lambda . \mu)}{P(x_0 \ldots x_{i-1})} \]

where the denominator is the probability of the current prefix. LR item probabilities are thus conditioned forward probabilities, and can be used to compute conditional probabilities of next words: $P(x_i = a \mid x_0 \ldots x_{i-1})$ is the sum of the p's of all items having the terminal $a$ to the right of the dot (extra work is required if the item corresponds to a ``reduce'' state, i.e., if the dot is in final position).
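
As an illustration of this relation, the sketch below normalizes forward probabilities by the prefix probability to obtain item probabilities p, and sums the p's of items with a given word to the right of the dot to obtain a next-word probability. All state entries and numbers are invented for the example, and the ``reduce'' complication mentioned above is ignored.

\begin{verbatim}
from collections import defaultdict

# Hypothetical Earley states at position i:
# (start index k, lhs, rhs, dot) -> forward probability alpha_i
alpha = {
    (0, "S",  ("NP", "VP"),   1): 0.06,   # dot before a nonterminal
    (2, "VP", ("saw", "NP"),  0): 0.04,   # dot before terminal "saw"
    (2, "VP", ("ran",),       0): 0.02,   # dot before terminal "ran"
}

# Probability of the current prefix (a placeholder value for this toy
# example, chosen so the p's below sum to one).
prefix_prob = 0.12

# LR item probability p: forward probability summed over start
# indices k, normalized by the prefix probability.
p = defaultdict(float)
for (k, lhs, rhs, dot), a in alpha.items():
    p[(lhs, rhs, dot)] += a / prefix_prob

def next_word_prob(word):
    """Sum of p over items with `word` immediately right of the dot."""
    return sum(prob for (lhs, rhs, dot), prob in p.items()
               if dot < len(rhs) and rhs[dot] == word)

print(next_word_prob("saw"))   # 0.04 / 0.12 = 1/3
\end{verbatim}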

Notice that the definition of p is independent of i as well as the start index of the corresponding Earley state. Therefore, to ensure that item probabilities are correct independent of input position, item sets would have to be constructed so that their probabilities are unique within each set. However, this may be impossible given that the probabilities can take on infinitely many values and in general depend on the history of the parse. The solution used by Wright [Wright1990] is to collapse items whose probabilities are within a small tolerance $\epsilon$ and are otherwise identical. The same threshold is used to simplify a number of other technical problems; e.g., left-corner probabilities are computed by iterated prediction, until the resulting changes in probabilities are smaller than $\epsilon$. Subject to these approximations, then, a probabilistic LR parser can compute prefix probabilities by multiplying successive conditional probabilities for the words it sees.
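
The iterated-prediction scheme can be sketched as a simple fixed-point computation that stops once no entry changes by more than $\epsilon$; the grammar and probabilities below are made up for illustration (the Earley framework, by contrast, allows this relation to be computed exactly).

\begin{verbatim}
def left_corner_probs(P_L, eps=1e-6):
    """P_L[X][Y] is the total probability of rules X -> Y ... ;
    the result R[X][Y] approximates the probability of Y occurring
    as a (transitive) left corner of X.  Iterates R <- P_L + P_L R
    until no entry changes by more than eps."""
    R = {x: dict(P_L[x]) for x in P_L}
    while True:
        delta = 0.0
        for x in P_L:
            new_row = dict(P_L[x])            # one prediction step
            for z, p in P_L[x].items():       # plus a step followed by R
                for y, r in R.get(z, {}).items():
                    new_row[y] = new_row.get(y, 0.0) + p * r
            for y, v in new_row.items():
                delta = max(delta, abs(v - R[x].get(y, 0.0)))
            R[x] = new_row
        if delta < eps:
            return R

P_L = {                               # hypothetical left-corner rules
    "S":  {"NP": 0.9, "S": 0.1},      # includes left recursion S -> S ...
    "NP": {"Det": 0.7, "NP": 0.3},
    "Det": {},
}
print(round(left_corner_probs(P_L)["S"]["Det"], 3))   # -> 1.0
\end{verbatim}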

As an alternative to the computation of LR transition probabilities from a given SCFG, one might instead estimate such probabilities directly from traces of parses on a training corpus. Due to the imprecise relationship between LR probabilities and SCFG probabilities, it is not clear whether the model thus estimated corresponds to any particular SCFG in the usual sense.
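
For concreteness, direct estimation of this kind amounts to relative-frequency counting over (state, action) pairs extracted from parse traces; the trace data in the sketch below are invented.

\begin{verbatim}
from collections import Counter, defaultdict

# Hypothetical (LR state id, action) pairs from supervised parse traces.
traces = [
    (0, "shift"), (1, "reduce:T->a"), (2, "shift"),
    (0, "shift"), (1, "reduce:T->a"), (2, "accept"),
]

counts = defaultdict(Counter)
for state, action in traces:
    counts[state][action] += 1

# Maximum-likelihood estimate: P(action | state).
probs = {s: {a: n / sum(c.values()) for a, n in c.items()}
         for s, c in counts.items()}
print(probs[2])   # {'shift': 0.5, 'accept': 0.5}
\end{verbatim}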

Briscoe and Carroll [Briscoe and Carroll1993] turn this incongruity into an advantage by using the LR parser as a probabilistic model in its own right, and show how LR probabilities can be extended to capture non-context-free contingencies. The problem of capturing more complex distributional constraints in natural language is clearly important, but well beyond the scope of this article. We simply remark that it should be possible to define ``interesting'' non-standard probabilities in terms of Earley parser actions so as to better model non-context-free phenomena.

Apart from such considerations, the choice between LR methods and Earley parsing is a typical space-time tradeoff. Even though an Earley parser runs with the same linear time and space complexity as an LR parser on grammars of the appropriate LR class, the constant factors strongly favor the LR parser, since almost all the work has already been compiled into its transition and action tables. However, the size of the LR parser tables can be exponential in the size of the grammar (due to the number of potential item subsets). Furthermore, if the generalized LR method is used to deal with non-deterministic grammars [Tomita1986], the runtime on arbitrary inputs may also grow exponentially. The bottom line is that each application's needs have to be evaluated against the pros and cons of both approaches to find the best solution. From a theoretical point of view, the Earley approach has the inherent appeal of being the more general (and exact) solution to the computation of the various SCFG probabilities.

