
Forward and inner probabilities

Since string and prefix probabilities are the result of summing derivation probabilities, the goal is to compute these sums efficiently by taking advantage of the Earley control structure. This can be accomplished by attaching two probabilistic quantities to each Earley state, as follows. The terminology is derived from analogous quantities commonly used in the literature on Hidden Markov Models (HMMs) [Rabiner and Juang1986] and in [Baker1979].

Definition. The following definitions are relative to an implied input string $x$.
(a) The forward probability $\alpha_i({}^kX \rightarrow \lambda.\mu)$ is the sum of the probabilities of all constrained paths of length $i$ that end in state ${}^kX \rightarrow \lambda.\mu$.
(b) The inner probability $\gamma_i({}^kX \rightarrow \lambda.\mu)$ is the sum of the probabilities of all paths of length $i-k$ that start in state $k: {}^kX \rightarrow .\lambda\mu$, end in state $i: {}^kX \rightarrow \lambda.\mu$, and generate the input symbols $x_k \ldots x_{i-1}$.
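As an illustration of how $\alpha$ and $\gamma$ might be attached to Earley states in an implementation, here is a minimal sketch; the class and table names are assumptions for illustration, not part of the original algorithm:

```python
from dataclasses import dataclass

# Hypothetical representation of an Earley state  ^kX -> lambda . mu
@dataclass(frozen=True)
class EarleyState:
    lhs: str      # nonterminal X
    rhs: tuple    # symbols of the production's right-hand side
    dot: int      # dot position within rhs
    start: int    # k: input position where the expansion of X began

# alpha and gamma are accumulated per (position, state): the same state
# may be reached along several paths, and path probabilities are summed.
alpha = {}   # (i, state) -> forward probability (expected occurrence count)
gamma = {}   # (i, state) -> inner probability of deriving x[k..i-1] from X

def accumulate(table, i, state, p):
    """Add one path's probability contribution to a table entry."""
    table[(i, state)] = table.get((i, state), 0.0) + p
```

Keeping the two tables keyed by state (rather than storing the values inside the state object) makes the summation over multiple paths to the same state explicit.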

It helps to interpret these quantities in terms of an unconstrained Earley parser that operates as a generator emitting--rather than recognizing--strings. Instead of tracking all possible derivations, the generator traces along a single Earley path randomly determined by always choosing among prediction steps according to the associated rule probabilities. Notice that the scanning and completion steps are deterministic once the rules have been chosen.
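The generator view can be sketched as follows; the toy grammar and its rule probabilities are illustrative assumptions:

```python
import random

# Toy PCFG: each nonterminal maps to (probability, right-hand side) pairs.
grammar = {
    "S": [(0.6, ("a",)),        # S -> a,   probability 0.6
          (0.4, ("a", "S"))],   # S -> a S, probability 0.4
}

def generate(symbol="S", rng=None):
    """Emit a terminal string by expanding the left-most nonterminal,
    choosing among its productions according to the rule probabilities."""
    if rng is None:
        rng = random.Random(0)
    if symbol not in grammar:        # terminal: "scanning" is deterministic
        return [symbol]
    probs, rules = zip(*grammar[symbol])
    (rhs,) = rng.choices(rules, weights=probs, k=1)
    out = []
    for sym in rhs:                  # left-to-right expansion of the rule
        out.extend(generate(sym, rng))
    return out
```

Only the prediction step involves a random choice; once a rule is chosen, the symbols it emits and the completions it triggers are fixed, matching the observation above.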

Intuitively, the forward probability $\alpha_i({}^kX \rightarrow \lambda.\mu)$ is the probability of an Earley generator producing the prefix of the input up to position $i-1$ while passing through state ${}^kX \rightarrow \lambda.\mu$ at position $i$. However, due to left-recursion in productions the same state may appear several times on a path, and each occurrence is counted towards the total $\alpha_i$. Thus, $\alpha_i$ is really the expected number of occurrences of the given state in state set $i$. Having said that, we will refer to $\alpha$ simply as a probability, both for the sake of brevity and to keep the analogy to the HMM terminology, of which this is a generalization. Note that for scanned states, $\alpha$ is always a probability, since by definition a scanned state can occur only once along a path.

The inner probabilities, on the other hand, represent the probability of generating a substring of the input from a given nonterminal, using a particular production. Inner probabilities are thus conditional on the presence of a given nonterminal $X$ with expansion starting at position $k$, unlike the forward probabilities, which include the generation history starting with the initial state. The inner probabilities as defined here correspond closely to the quantities of the same name in [Baker1979]. The sum of $\gamma_i({}^kX \rightarrow \lambda.)$ over all states with a given LHS $X$ is exactly Baker's inner probability for $X$.
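In symbols, this correspondence can be written as follows (a sketch, assuming the $\gamma$ notation used here; the name $\mathrm{inner}$ is introduced only for this statement):

```latex
% Summing the inner probabilities of all complete states with LHS X
% spanning positions k..i-1 recovers Baker's inner probability for X:
\mathrm{inner}(X, k, i)
  \;=\; P(X \Rightarrow x_{k \ldots i-1})
  \;=\; \sum_{\lambda :\; X \rightarrow \lambda}
        \gamma_i\bigl({}^kX \rightarrow \lambda\,.\bigr)
```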

The following is essentially a restatement of Lemma 2 in terms of forward and inner probabilities. It shows how to obtain the string and prefix probabilities we are interested in, provided that forward and inner probabilities can be computed effectively.

Lemma. For an input string $x$ of length $l$:
(a) Provided that $S \Rightarrow^*_L x_{0 \ldots k-1} X \nu$ is a possible left-most derivation of the grammar (for some $\nu$), it is the case that $\gamma_i({}^kX \rightarrow \lambda.) = P(X \Rightarrow x_{k \ldots i-1})$.
(b) In particular, the string probability can be computed as $P(S \Rightarrow x) = \gamma_l({}^0 \rightarrow S.) = \alpha_l({}^0 \rightarrow S.)$.
(c) The prefix probability is obtained by summing the forward probabilities of all scanned states: $P(S \Rightarrow^*_L x_{0 \ldots i-1} \nu) = \sum_{X,\lambda,\mu,k} \alpha_i({}^kX \rightarrow \lambda x_{i-1}.\mu)$.

The restriction in (a) that X be preceded by a possible prefix is necessary since the Earley parser at position i will only pursue derivations that are consistent with the input up to position i. This constitutes the main distinguishing feature of Earley parsing compared to the strict bottom-up computation used in the standard inside probability computation [Baker1979]. There, inside probabilities for all positions and nonterminals are computed, regardless of possible prefixes.
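For contrast, the strict bottom-up inside computation can be sketched as follows, assuming a grammar in Chomsky normal form; the toy rules and names are assumptions for illustration:

```python
from collections import defaultdict

lexical = {("A", "a"): 1.0}              # A -> a
binary = {("S", "A", "A"): 0.7,          # S -> A A
          ("S", "A", "S"): 0.3}          # S -> A S

def inside(x):
    """chart[(k, i, X)] = P(X => x[k..i-1]), filled for *every* span and
    nonterminal, with no reference to possible prefixes."""
    n = len(x)
    chart = defaultdict(float)
    for k in range(n):                   # length-1 spans from lexical rules
        for (X, w), p in lexical.items():
            if w == x[k]:
                chart[(k, k + 1, X)] += p
    for length in range(2, n + 1):       # longer spans, bottom-up
        for k in range(n - length + 1):
            i = k + length
            for (X, Y, Z), p in binary.items():
                for j in range(k + 1, i):
                    chart[(k, i, X)] += p * chart[(k, j, Y)] * chart[(j, i, Z)]
    return chart
```

Note that every cell of the chart is filled, even for nonterminals that could never follow the observed prefix; the Earley computation avoids exactly this wasted work.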



Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996