Complexity issues

Next: Extensions Up: Probabilistic Earley Parsing Previous: Eliminating null productions

Complexity issues

The probabilistic extension of Earley's parser preserves the original control structure in most aspects, the major exception being the collapsing of cyclic predictions and unit completions, which can only make these steps more efficient.

Therefore the complexity analysis from Earley:70 applies, and we only summarize the most important results here.

The worst-case complexity for Earley's parser is dominated by the completion step, which takes for each input position, l being the length of the current prefix. The total time is therefore for an input of length l, which is also the complexity of the standard Inside/Outside [Baker1979] and LRI [Jelinek and Lafferty1991] algorithms. For grammars of bounded ambiguity, the incremental per-word cost reduces to O(l), total. For deterministic CFGs the incremental cost is constant, O(l) total. Due to the possible start indices each state set can contain O(l) Earley states, giving worst-case space complexity overall.

Apart from input length, complexity is also determined by grammar size. We will not try to give a precise characterization in the case of sparse grammars (Appendix B.3 gives some hints on how to implement the algorithm efficiently for such grammars). However, for fully parameterized grammars in CNF we can verify the scaling of the algorithm in terms of the number of nonterminals n, and verify that it has the same time and space requirements as the Inside/Outside (I/O) and LRI algorithms.

The completion step again dominates the computation, which has to compute probabilities for at most states. By organizing summations (1) and (2) so that are first summed by LHS nonterminals, the entire completion operation can be accomplished in . The one-time cost for the matrix inversions to compute the left-corner and unit-production relation matrices is also .

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996