Top-Down Parsing

From Wiki**3

Revision as of 16:18, 4 April 2008 by Root (talk | contribs)

LL(1)

LL(1) is a top-down, also called predictive, parsing algorithm that works by reading input Left-right and building the Leftmost-derivations using 1-symbol lookahead.

Parsing Algorithm

The parsing algorithm works as follows: at the beginning the stack contains the initial symbol and the input contains the full input string. Then, for each non-terminal symbol at the top of the stack, select the rule whose FIRST set contains a terminal symbol that matches the first input symbol. If one cannot be found, the parser rejects the input and processing stops. If a match can be found, however, then the non-terminal symbol at the top of the stack is replaced with the body of the rule selected in the previous step (the symbols in the rule's body are pushed from right to left, allowing the leftmost symbol to be processed first). If at any stage the top of the stack contains a terminal symbol, the input must match it (otherwise, input is rejected) and both are removed (from the stack and from the input). The parsing process continues until both the stack and the input are empty.

Problems

Since the parser relies on the FIRST set to decide which rule to select at each stage, the FIRST sets of each production for a non-terminal must be disjoint (otherwise, there would be ambiguity). This type of ambiguity may arise either because various productions start with the same string or because of left corners (elements in their FIRST sets may coincide with elements in other productions).

The following rule shows a case of direct ambiguity in 'a'. Non-terminal C is a left corner of A and must also be dealt with:

A -> a M N d | a X Y z | C D e C -> a | x M

Left-recursion in rules is also a problem, since the parser replaces items on the stack with their bodies: if a rule's body starts with the symbol that has just left the stack, then the parser is in no better situation to parse the input (in fact, infinite recursion ensues). In addition to "plain" left-recursion, mutual left-recursion (in which two or more rules depend on each other via left corners) is also a problem.

The following rule shows direct left-recursion (in A) and indirect recursion (between the third production of A and the first production of B):

A -> a M N d | B X Y z | A D e B -> A M d | i N A