next up previous
Next: Earley paths and their Up: Probabilistic Earley Parsing Previous: Probabilistic Earley Parsing

Stochastic context-free grammars

 

A stochastic context-free grammar (SCFG) extends the standard context-free formalism by adding probabilities to each production:

displaymath7903

where the rule probability p is usually written as tex2html_wrap_inline7909 . This notation to some extent hides the fact that p is a conditional probability, of production tex2html_wrap_inline7913 being chosen, given that X is up for expansion. The probabilities of all rules with the same nonterminal X on the LHS must therefore sum to unity. Context-freeness in a probabilistic setting translates into conditional independence of rule choices. As a result, complete derivations have joint probabilities that are simply the products of the rule probabilities involved.

The probabilities of interest mentioned in Section 1 can now be defined formally.

  defn5099

In the following, we assume that the probabilities in a SCFG are proper and consistent as defined in Booth:73, and that the grammar contains no useless nonterminals (ones that can never appear in a derivation). These restrictions ensure that all nonterminals define probability measures over strings, i.e., tex2html_wrap_inline7987 is a proper distribution over x for all X. Formal definitions of these conditions are given in Appendix A.



Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996