Parsing bracketed inputs

Next: Complexity Up: Extensions Previous: Reverse completion (transitive)

Parsing bracketed inputs

The estimation procedure described above (and EM-based estimators in general) are only guaranteed to find locally optimal parameter estimates. Unfortunately, it seems that in the case of unconstrained SCFG estimation local maxima present a very real problem, and make success dependent on chance and initial conditions [Lari and Young1990]. Pereira:92 showed that partially bracketed input samples can alleviate the problem in certain cases. The bracketing information constrains the parse of the inputs, and therefore the parameter estimates, steering it clear from some of the suboptimal solutions that could otherwise be found.

An Earley parser can be minimally modified to take advantage of bracketed strings by invoking itself recursively when a left parenthesis is encountered. The recursive instance of the parser is passed any predicted states at that position, processes the input up to the matching right parenthesis, and hands complete states back to the invoking instance. This technique is efficient as it never explicitly rejects parses not consistent with the bracketing. It is also convenient as it leaves the basic parser operations, including the left-to-right processing and the probabilistic computations unchanged. For example, prefix probabilities conditioned on partial bracketings could be computed easily this way.

Parsing bracketed inputs is described in more detail in Stolcke:earley-tr, where it is also shown that bracketing gives the expected improved efficiency. For example, the modified Earley parser processes fully bracketed inputs in linear time. A second advantage of bracketed inputs is that they potentially allow more efficient processing, since the space of potential derivations (or equivalently, Earley paths) is reduced. It is therefore interesting to see how any given parser can incorporate partial bracketing information. This is typically not a big problem, but in the case of Earley's algorithm there is a particularly simple and elegant solution.

Consider again the grammar

A partially bracketed input for this grammar would be a (a a) a. The parentheses indicate phrase boundaries that any candidate parse has to be consistent with, e.g., there cannot be a parse that has a constituent spanning the first and second a, or the third and fourth. The supplied bracketing can be nested, of course, and need not be complete, i.e., within a bracketing there are still potentially several ways of parsing a substring.

The Earley parser can deal efficiently with partial bracketing information as follows. A partially bracketed input is processed as usual, left-to-right. When a bracketed portion is encountered, the parser invokes itself recursively on the substring delimited by the pair of parentheses encountered. More precisely:

The recursive parser instance gets to see only the substring as input.
Its chart is disjoint from the one used by the parent instance. It cannot use states from the parent chart, except those explicitly passed to it (see below). Conversely, when finished, the parent has access only to those states returned explicitly by the child instance. (The first restriction prevents parsing of constituents that cross the left phrase boundary, while the second restriction prevents a violation of the right phrase boundary.)
The chart of the child is initialized with all incomplete states from the parent's state set at the start position of the substring.
The child returns to the parent all (and only) the complete states from its last state set. The parent adds the returned states to the state set at the position immediately following the end of the substring, using it as the input for its own completion procedure.
Thus the recursive parser invocation and the following completion step replaces the usual prediction-scanning-completion cycle for the entire bracketed substring. After the child returns, the parent continues processing regular input symbols, or other bracketed substrings.
Needless to say, the child parser instance may itself call on recursive instances to deal with nested bracketings.

This recursion scheme is efficient in that it never explicitly rejects a parse that would be inconsistent with the bracketing. Instead it only considers those parses that are consistent with the bracketing, while continuing to make use of top-down information like a standard Earley parser.

Processing bracketed strings requires no modification to the computation of probabilities. Probabilities are passed between parent and child as part of states, and processed as before. The recursive control structure simply constrains the set of Earley paths considered by the parser, thereby affecting the probabilities indirectly. For example, ambiguous strings may end up with lower inner probabilities because some derivations are inconsistent with the bracketing.

Only the forward pass is directly affected by the bracketing. Both the Viterbi procedure (Section 5.1) and the reverse completion pass (Section 5.2) only examine the states already in the chart. They are therefore automatically constrained by the bracketing.

Complexity

Next: Complexity Up: Extensions Previous: Reverse completion (transitive)

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996