Top-Down Parsing/Example 1

From Wiki**3

< Top-Down Parsing

Problem

Consider the following grammar with the following set of terminal symbols {+,-,*,/,(,),id}:

 E -> E + T | E - T | T
 T -> T * F | T / F | F
 F -> ( E ) | id
  1. Rewrite the grammar so that it can be analyzed by an LL(1) predictive parser.
  2. Compute the FIRST and FOLLOW sets for the non-terminal symbols.
  3. Build the parse table for the predictive parser.
  4. Process the input phrase a*(b+c) detailing the contents of the stack, the input and each action performed by the parser.

Solution

The grammar can be parsed by an LL(1) parser if it does not have left recursion and no ambiguity is present (i.e., the LOOKAHEADs for all productions of each non-terminal are disjoint).

A simple inspection of the grammar shows that left recursion is present in both E and T rules. Also, there are left corners that may hide ambiguity.

The first step, then, is to rewrite the grammar so that left recursion is eliminated:

 E  -> T E'
 E' -> + T E' | - T E' | ε
 T  -> F T'
 T' -> * F T' | / F T' | ε
 F  -> ( E ) | id

The new grammar still has left corners, but a cursory inspection shows that they are not problematic. Nevertheless, they should be eliminated:

 E  -> ( E ) T' E' | id T' E'
 E' -> + T E' | - T E' | ε
 T  -> ( E ) T' | id T'
 T' -> * F T' | / F T' | ε
 F  -> ( E ) | id
 FIRST(E)  = FIRST(( E ) T' E') ∪ FIRST(id T' E') = {(, id}
 FIRST(E') = FIRST(+ T E') ∪ FIRST(- T E') ∪ {ε} = {+, -, ε}
 FIRST(T)  = FIRST(( E ) T') ∪ FIRST(id T') = {(, id}
 FIRST(T') = FIRST(* F T') ∪ FIRST(/ F T') ∪ {ε} = {*, /, ε}
 FIRST(F)  = FIRST(( E )) ∪ FIRST(id) = {(, id}
 FOLLOW(E)  = {$} ∪ {)} = {), $}
 FOLLOW(E') = FOLLOW(E) = {), $}
 FOLLOW(T)  = FIRST(E')\{ε} ∪ FOLLOW(E') = {+, -, ), $}
 FOLLOW(T') = FOLLOW(T) = {+, -, ), $}
 FOLLOW(F)  = FIRST(T')\{ε} ∪ FOLLOW(T') = {*, /, +, -, ), $}

For building the parse table, we will consider the FIRST and FOLLOW sets above.

+ - * / id ( ) $
E E -> id T' E' E -> ( E ) T' E'
E' E' -> + T E' E' -> - T E' E' -> ε E' -> ε
T T -> id T' T -> ( E ) T'
T' T' -> ε T' -> ε T' -> * F T' T' -> / F T' T' -> ε T' -> ε
F F -> id F -> ( E )

The following table show the parsing of the a*(b+c) input sequence.

STACK INPUT ACTION
E$ a*(b+c)$ E -> id T' E'
idT'E'$ a*(b+c)$ -> id ≡ a
T'E'$
*(b+c)$ T' -> * F T'
*FT'E'$
*(b+c)$ -> *
FT'E'$ (b+c)$ F -> ( E )
(E)T'E'$
(b+c)$ -> (
E)T'E'$ b+c)$ E -> id T' E'
idT'E')T'E'$ b+c)$ -> id ≡ b
T'E')T'E'$ +c)$ T' -> ε
E')T'E'$ +c)$ E' -> + T E'
+TE')T'E'$ +c)$ -> +
TE')T'E'$ c)$ T -> id T'
idT'E')T'E'$ c)$ -> id ≡ c
T'E')T'E'$ )$ T' -> ε
E')T'E'$ )$ E' -> ε
)T'E'$ )$ -> )
T'E'$ $ T' -> ε
E'$ $ E' -> ε
$ $ ACCEPT