next up previous
Next: Probabilistic Earley Parsing Up: Earley Parsing Previous: Scanning

Completion

For each complete state

displaymath7815

and each state in set j, tex2html_wrap_inline7827 , that has Y to the right of the dot,

displaymath7816

add the state

displaymath7817

(move the dot over the current nonterminal). A state produced by completion is called a completed state.gif Each completion corresponds to the end of a nonterminal expansion started by a matching prediction step.

For each input symbol and corresponding state set, an Earley parser performs all three operations exhaustively, i.e., until no new states are generated. One crucial insight into the working of the algorithm is that, although both prediction and completion feed themselves, there are only a finite number of states that can possibly be produced. Therefore recursive prediction and completion at each position have to terminate eventually, and the parser can proceed to the next input via scanning.

To complete the description we need only specify the initial and final states. The parser starts out with

displaymath7818

where S is the sentence nonterminal (note the empty left-hand side). After processing the last symbol, the parser verifies that

displaymath7819

has been produced (among possibly others), where l is the length of the input x. If at any intermediate stage a state set remains empty (because no states from the previous stage permit scanning) the parse can be aborted because an impossible prefix has been detected.

States with empty LHS such as those above are useful in other contexts, as will be shown in Section 5.4. We will collectively refer to them as dummy states. Dummy states enter the chart only as a result of initialization, as opposed to being derived from grammar productions.

It is easy to see that Earley parser operations are correct, in the sense that each chain of transitions (predictions, scanning steps, completions) corresponds to a possible (partial) derivation. Intuitively, it is also true that a parser that performs these transitions exhaustively is complete, i.e., it finds all possible derivations. Formal proofs of these properties are given in the literature, e.g., Aho:72. The relationship between Earley transitions and derivations will be stated more formally in the next section.

The parse trees for sentences can be reconstructed from the chart contents. We will illustrate this in Section 5 when discussing Viterbi parses.

Table 1 shows a simple grammar and a trace of Earley parser operation on a sample sentence.

Earley's parser can deal with any type of context-free rule format, even with null or tex2html_wrap_inline7691 -productions, i.e., those that replace a nonterminal with the empty string. Such productions do however require special attention, and make the algorithm and its description more complicated than otherwise necessary. In the following sections we assume that no null productions have to be dealt with, and then summarize the necessary changes in Section 4.7. One might chose to simply preprocess the grammar to eliminate null productions, a process which is also described.

  table4888
Table:   (a) Example grammar for a tiny fragment of English. (b) Earley parser processing the sentence a circle touches a triangle.


next up previous
Next: Probabilistic Earley Parsing Up: Earley Parsing Previous: Scanning

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996