The rule probabilities in a SCFG can be iteratively estimated using the EM (Expectation-Maximization) algorithm [Dempster et al.1977]. Given a sample corpus D, the estimation procedure finds a set of parameters that represent a local maximum of the grammar likelihood function P(D | G), which is given by the product of the string probabilities
i.e., the samples are assumed to be distributed identically and independently.
The two steps of this algorithm can be briefly characterized as follows.
EM is a generalization of the well-known Baum-Welch algorithm for HMM estimation [Baum et al.1970]; the original formulation for the case of SCFGs is due to Baker:79. For SCFGs, the E-step involves computing the expected number of times each production is applied in generating the training corpus. After that, the M-step consists of a simple normalization of these counts to yield the new production probabilities.
In this section we examine the computation of production count expectations required for the E-step. The crucial notion introduced by Baker:79 for this purpose is the ``outer probability'' of a nonterminal, or the joint probability that the nonterminal is generated with a given prefix and suffix of terminals. Essentially the same method can be used in the Earley framework, after extending the definition of outer probabilities to apply to arbitrary Earley states.
Outer probabilities complement inner probabilities in that they refer precisely to those parts of complete paths generating x not covered by the corresponding inner probability . Therefore, the choice of the production is not part of the outer probability associated with a state . In fact, the definition makes no reference to the first part of the RHS: all states sharing the same k, X and will have identical outer probabilities.
Intuitively, is the probability that an Earley parser operating as a string generator yields the prefix and the suffix , while passing through state at position i (which is independent of ). As was the case for forward probabilities, is actually an expectation of the number of such states in the path, as unit production cycles can result in multiple occurrences for a single state. Again, we gloss over this technicality in our terminology. The name is motivated by the fact that reduces to the ``outer probability'' of X as defined in Baker:79 if the dot is in final position.