next up previous
Next: Prediction with null productions Up: Null productions Previous: Null productions

Computing tex2html_wrap_inline7691 -expansion probabilities

The main problem with null productions is that they allow multiple prediction-completion cycles inbetween scanning steps (since null productions do not have to be matched against one or more input symbols). Our strategy will be to collapse all predictions and completions due to chains of null productions into the regular prediction and completion steps, not unlike the way recursive predictions/completions were handled in Section 4.5.

A prerequisite for this approach is to precompute, for all nonterminals X, the probability that X expands to the empty string. Note that this is another recursive problem, since X itself may not have a null production, but expand to some nonterminal Y that does.

Computation of tex2html_wrap_inline8467 for all X can be cast as a system of non-linear equations, as follows. For each X, let tex2html_wrap_inline8473 be an abbreviation for tex2html_wrap_inline8467 . For example, let X have productions

eqnarray6171

The semantics of context-free rules imply that X can only expand to tex2html_wrap_inline7691 if all the RHS nonterminals in one of X's productions expand to tex2html_wrap_inline7691 . Translating to probabilities, we obtain the equation

displaymath8455

In other words, each production contributes a term in which the rule probability is multiplied by the product of the e variables corresponding to the RHS nonterminals, unless the RHS contains a terminal (in which case the production contributes nothing to tex2html_wrap_inline8473 because it cannot possibly lead to tex2html_wrap_inline7691 ).

The resulting non-linear system can be solved by iterative approximation. Each variable tex2html_wrap_inline8473 is initialized to tex2html_wrap_inline8495 , and then repeatedly updated by substituting in the equation right-hand sides, until the desired level of accuracy is attained. Convergence is guaranteed since the tex2html_wrap_inline8473 values are monotonically increasing and bounded above by the true values tex2html_wrap_inline8499 . For grammars without cyclic dependencies among tex2html_wrap_inline7691 -producing nonterminals this procedure degenerates to simple backward substitution. Obviously the system has to be solved only once for each grammar.

The probability tex2html_wrap_inline8473 can be seen as the precomputed inner probability of an expansion of X to the empty string, i.e., it sums the probabilities of all Earley paths that derive tex2html_wrap_inline7691 from X. This is the justification for the way these probabilities can be used in modified prediction and completion steps, described next.


next up previous
Next: Prediction with null productions Up: Null productions Previous: Null productions

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996