next up previous
Next: Extensions Up: Probabilistic Earley Parsing Previous: Eliminating null productions

Complexity issues

 

The probabilistic extension of Earley's parser preserves the original control structure in most aspects, the major exception being the collapsing of cyclic predictions and unit completions, which can only make these steps more efficient.

Therefore the complexity analysis from Earley:70 applies, and we only summarize the most important results here.

The worst-case complexity for Earley's parser is dominated by the completion step, which takes tex2html_wrap_inline8605 for each input position, l being the length of the current prefix. The total time is therefore tex2html_wrap_inline8609 for an input of length l, which is also the complexity of the standard Inside/Outside [Baker1979] and LRI [Jelinek and Lafferty1991] algorithms. For grammars of bounded ambiguity, the incremental per-word cost reduces to O(l), tex2html_wrap_inline8605 total. For deterministic CFGs the incremental cost is constant, O(l) total. Due to the possible start indices each state set can contain O(l) Earley states, giving tex2html_wrap_inline8605 worst-case space complexity overall.

Apart from input length, complexity is also determined by grammar size. We will not try to give a precise characterization in the case of sparse grammars (Appendix B.3 gives some hints on how to implement the algorithm efficiently for such grammars). However, for fully parameterized grammars in CNF we can verify the scaling of the algorithm in terms of the number of nonterminals n, and verify that it has the same tex2html_wrap_inline8625 time and space requirements as the Inside/Outside (I/O) and LRI algorithms.

The completion step again dominates the computation, which has to compute probabilities for at most tex2html_wrap_inline8625 states. By organizing summations (1) and (2) so that tex2html_wrap_inline8187 are first summed by LHS nonterminals, the entire completion operation can be accomplished in tex2html_wrap_inline8625 . The one-time cost for the matrix inversions to compute the left-corner and unit-production relation matrices is also tex2html_wrap_inline8625 .



Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996