next up previous
Next: Assembling partial parses Up: Robust parsing Previous: Robust parsing

Seeding the chart

In standard Earley parsing the parser expects to find exactly one instance of an S nonterminal generating the entire input. This expectation is reflected by the fact that the chart is initialized with the dummy start state

displaymath8897

For robust parsing, we want to identify all nonterminals that can possibly generate any substring of the input. This can be accomplished by also placing dummy states

displaymath8898

for all positions k and nonterminals X, in the Earley chart prior to the start of normal operation. (In practice, dummy states need to be added only for those nonterminals X whose expansion can start with the current input symbol. This technique is discussed in Appendix B.3.2.)

The immediate effect of these extra states is that more predictions will be generated, from which more completions follow, etc. After finishing the processing of the jth state set, the chart will contain states

displaymath8899

indicating that nonterminal X generates the substring tex2html_wrap_inline8915 .

 
Table:   Robust parsing using the simple grammar from Table 1. (a) State sets generated from parsing the ungrammatical string a circle touches above a square. Dummy states (those with empty LHS) represent partial parses. States representing ``maximal'' partial parses are marked with MAX. Predictions that don't lead to completions have been omitted to save space. (b) Trace of wildcard state completions resulting in a list of complete partial parses for this input.

Table gif(a) illustrates the robust parsing process using the example grammar from Table 1 (p. gif).

Probabilities in the extra states are handled as follows. The initial dummy states are initialized with a forward probability of zero. This will ensure that the forward probabilities of all extra states remain at zero and don't interfere with the computation of prefix probabilities from the regular Earley states.

Inner probabilities on dummy states are initialized to unity just as for the S start state, and processed in the usual way. The inner probabilities for the each substring/nonterminal pair can then be read off of the complete dummy states.

Viterbi probabilities and Viterbi back-pointers can also be processed unchanged. Applying the Viterbi-parse procedure from Section 5.1 to the complete dummy states yields Viterbi parses for all substring/nonterminal pairs.


next up previous
Next: Assembling partial parses Up: Robust parsing Previous: Robust parsing

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996