Linking and bottom-up filtering

 

As discussed in Section 4.8, the worst-case run-time on fully parameterized CNF grammars is dominated by the completion step. However, this is not necessarily true of sparse grammars. Our experiments showed that the computation is dominated by the generation of Earley states during the prediction steps.

It is therefore worthwhile to minimize the total number of predicted states generated by the parser. Since predicted states affect the derivation only if they lead to subsequent scanning, we can use the next input symbol to constrain the relevant predictions. To this end, we compute the extended left-corner relation R_LT, indicating which terminals can appear as left corners of which nonterminals. R_LT is a Boolean matrix with rows indexed by nonterminals and columns indexed by terminals. It can be computed as the product

R_LT = R_L P_LT

where P_LT has a non-zero entry at (X,a) iff there is a production for nonterminal X that starts with terminal a, and R_L is the old left-corner relation.
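The computation of the terminal left corners of each nonterminal can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grammar, the function names, and the dict-of-sets representation of the Boolean matrices are all assumptions made for the example.

```python
# Sketch: compute which terminals can appear as left corners of which
# nonterminals, via the left-corner closure composed with the relation
# "production for X starts with terminal a". All names are illustrative.

def left_corner_closure(productions, nonterminals):
    # Direct left-corner relation: X has Y as a left corner if some
    # production X -> Y ... exists; close it reflexively and transitively
    # by naive iteration to a fixed point.
    r = {x: {x} for x in nonterminals}          # reflexive part
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if rhs and rhs[0] in nonterminals:
                before = len(r[lhs])
                r[lhs] |= r[rhs[0]]
                if len(r[lhs]) != before:
                    changed = True
    return r

def extended_left_corners(productions, nonterminals):
    r_l = left_corner_closure(productions, nonterminals)
    # Entry (X, a) iff some production for X starts with terminal a.
    p_lt = {x: set() for x in nonterminals}
    for lhs, rhs in productions:
        if rhs and rhs[0] not in nonterminals:
            p_lt[lhs].add(rhs[0])
    # Boolean matrix product: terminals reachable through any
    # left-corner nonterminal of X.
    return {x: set().union(*(p_lt[y] for y in r_l[x]))
            for x in nonterminals}

# Toy grammar: S -> NP VP, NP -> 'det' 'n', VP -> 'v' NP
prods = [("S", ("NP", "VP")), ("NP", ("det", "n")), ("VP", ("v", "NP"))]
nts = {"S", "NP", "VP"}
print(extended_left_corners(prods, nts)["S"])   # -> {'det'}
```

A production implementation would use bit vectors and compute the closure by Warshall's algorithm rather than naive iteration, but the fixed-point result is the same.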

During the prediction step we can ignore incoming states whose RHS nonterminal following the dot cannot have the current input as a left corner, and then eliminate from the remaining predictions all those whose LHS cannot produce the current input as a left corner. These filtering steps are very fast, as they involve only table lookups.
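The two filtering steps can be sketched as below. This is a hypothetical illustration, not the paper's code: the relation tables are written out by hand for a toy grammar, and the state representation (LHS, RHS, dot position) is an assumption made for the example.

```python
# Sketch of bottom-up filtered Earley prediction. R_L maps each
# nonterminal to its (reflexive) left-corner nonterminals; R_LT maps
# each nonterminal to the terminals it can have as left corners.
# Both would be precomputed from the grammar; here they are hand-built
# for the toy grammar S -> NP VP, NP -> 'det' 'n', VP -> 'v' NP.

PRODS = [("S", ("NP", "VP")), ("NP", ("det", "n")), ("VP", ("v", "NP"))]
R_L = {"S": {"S", "NP"}, "NP": {"NP"}, "VP": {"VP"}}
R_LT = {"S": {"det"}, "NP": {"det"}, "VP": {"v"}}

def filtered_predict(state, next_symbol):
    """Predict from one Earley state (lhs, rhs, dot), filtered by R_LT."""
    lhs, rhs, dot = state
    nt = rhs[dot]
    # Filter 1: ignore the state if the nonterminal after its dot
    # cannot have the next input symbol as a left corner.
    if next_symbol not in R_LT.get(nt, set()):
        return set()
    predicted = set()
    for y in R_L[nt]:
        # Filter 2: keep only predictions whose LHS can itself produce
        # the next input as a left corner (again just a table lookup).
        if next_symbol not in R_LT.get(y, set()):
            continue
        for p_lhs, p_rhs in PRODS:
            if p_lhs == y:
                predicted.add((y, p_rhs, 0))
    return predicted

print(filtered_predict(("S", ("NP", "VP"), 0), "det"))
# -> {('NP', ('det', 'n'), 0)}
print(filtered_predict(("S", ("NP", "VP"), 0), "v"))   # -> set()
```

With the next input "det", the NP production is predicted; with "v", the entire prediction is suppressed, since no left corner of NP can produce "v".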

This technique for speeding up Earley prediction is the exact converse of the ``linking'' method described by Pereira and Shieber (1987, chapter 6) for improving the efficiency of bottom-up parsers. There, the extended left-corner relation is used for top-down filtering of the bottom-up application of grammar rules. In our case, we use linking to provide bottom-up filtering for the top-down application of productions.

On a test corpus this technique cut the number of generated predictions to almost 1/4 and sped up parsing by a factor of 3.3. The corpus consisted of 1143 sentences with an average length of 4.65 words. Top-down prediction alone generated 991781 states and parsed at a rate of 590 milliseconds per sentence. With bottom-up filtered prediction only 262287 states were generated, resulting in 180 milliseconds per sentence.



Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996