next up previous
Next: Complexity Up: Extensions Previous: Reverse completion (transitive)

Parsing bracketed inputs

 

The estimation procedure described above (and EM-based estimators in general) are only guaranteed to find locally optimal parameter estimates. Unfortunately, it seems that in the case of unconstrained SCFG estimation local maxima present a very real problem, and make success dependent on chance and initial conditions [Lari and Young1990]. Pereira:92 showed that partially bracketed input samples can alleviate the problem in certain cases. The bracketing information constrains the parse of the inputs, and therefore the parameter estimates, steering it clear from some of the suboptimal solutions that could otherwise be found.

An Earley parser can be minimally modified to take advantage of bracketed strings by invoking itself recursively when a left parenthesis is encountered. The recursive instance of the parser is passed any predicted states at that position, processes the input up to the matching right parenthesis, and hands complete states back to the invoking instance. This technique is efficient as it never explicitly rejects parses not consistent with the bracketing. It is also convenient as it leaves the basic parser operations, including the left-to-right processing and the probabilistic computations unchanged. For example, prefix probabilities conditioned on partial bracketings could be computed easily this way.

Parsing bracketed inputs is described in more detail in Stolcke:earley-tr, where it is also shown that bracketing gives the expected improved efficiency. For example, the modified Earley parser processes fully bracketed inputs in linear time. A second advantage of bracketed inputs is that they potentially allow more efficient processing, since the space of potential derivations (or equivalently, Earley paths) is reduced. It is therefore interesting to see how any given parser can incorporate partial bracketing information. This is typically not a big problem, but in the case of Earley's algorithm there is a particularly simple and elegant solution.

Consider again the grammar

eqnarray6158

A partially bracketed input for this grammar would be a (a a) a. The parentheses indicate phrase boundaries that any candidate parse has to be consistent with, e.g., there cannot be a parse that has a constituent spanning the first and second a, or the third and fourth. The supplied bracketing can be nested, of course, and need not be complete, i.e., within a bracketing there are still potentially several ways of parsing a substring.

The Earley parser can deal efficiently with partial bracketing information as follows. A partially bracketed input is processed as usual, left-to-right. When a bracketed portion is encountered, the parser invokes itself recursively on the substring delimited by the pair of parentheses encountered. More precisely:

This recursion scheme is efficient in that it never explicitly rejects a parse that would be inconsistent with the bracketing. Instead it only considers those parses that are consistent with the bracketing, while continuing to make use of top-down information like a standard Earley parser.

Processing bracketed strings requires no modification to the computation of probabilities. Probabilities are passed between parent and child as part of states, and processed as before. The recursive control structure simply constrains the set of Earley paths considered by the parser, thereby affecting the probabilities indirectly. For example, ambiguous strings may end up with lower inner probabilities because some derivations are inconsistent with the bracketing.

Only the forward pass is directly affected by the bracketing. Both the Viterbi procedure (Section 5.1) and the reverse completion pass (Section 5.2) only examine the states already in the chart. They are therefore automatically constrained by the bracketing.




next up previous
Next: Complexity Up: Extensions Previous: Reverse completion (transitive)

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996