Bottom-Up Parsing: Difference between revisions
From Wiki**3
Line 79: | Line 79: | ||
Example: | Example: | ||
Considering the grammar above, the start configuration is: | |||
DFA = {I0} = {ϵ-closure({[E'->•E$]})} | |||
Or: | |||
I0 = E'->•E$ | |||
E ->•E+T | |||
E ->•T | |||
T ->•T*F | |||
T ->•F | |||
F ->•(E) | |||
F ->•id | |||
Then, applying the goto function as described above (essentially, we will look only to the symbols immediately in from of the dot), we get: | |||
I0 = E'->•E$ I1 = E'->E•$ I4 = | |||
E ->•E+T E ->E•+T | |||
E ->•T | |||
T ->•T*F I2 = E ->T• | |||
T ->•F T ->T•*F | |||
F ->•(E) | |||
F ->•id I3 = T ->F• |
Revision as of 13:30, 28 April 2008
Bottom-up parsers start from the input tokens and try to build increasingly complex structures until the initial symbol is reached.
The bottom-up parsers covered in this section are the ones belonging to the LR family (left-right rightmost derivation). These parsers are also known as shift/reduce parsers, after the actions they perform (shift = input consumption; reduce = node building). The parsers covered are LR(0) (a simple parser without any kind of lookahead), SLR (a parser that builds only when the input matches the elements in the follow set of a rule being reduced), and LALR(1) (a parser that restricts the follow symbols that can be matched in each state).
The sections below describe, first, the LR(0) and SLR(1) algorithms, and then, the LALR(1) algorithm. All three parsers share the same "shift" and "goto" actions, but differ in the restrictions put on "reduce" actions.
Building SLR Parse Tables
For building SLR tables we will define the notion of LR(0) item.
An LR(0) item is, informally, a production with a dot (•). The dot represents the portion of the rule that was seen by the parser so far. When the dot crosses over a symbol (advancing its position within the rule), the parser has performed a shift action (if the symbol is a terminal) or a reduce (if the symbol is a non-terminal).
For instance:
E -> • A B C E -> A • B C
Internally, the parser may use a pair of integers for representing an item (one for the rule, another for the dot's position).
For each production in a grammar, there is a set of LR(0) items, each corresponding to a possible position of the dot.
The main idea is to build an automaton that recognizes viable prefixes from the input. This automaton may be thought of as an NFA in which every single LR(0) item and corresponding transition is considered. This NFA could then be converted into a DFA and minimized. This process is, however, too long and, since a direct approach exists, unnecessary.
The following steps describe the direct construction of the DFA.
The first step is to build the augmented grammar. This is a grammar like the original, but with an additional rule S' (S is the start symbol):
S' -> S $
This rule indicates that when S (the start symbol) is reduced, then the input should be accepted, i.e., we would go from item S' -> • S $ to item S' -> S • $.
Like for lexical analyzers, we will define two primitive functions: ϵ-closure and goto (similar to "move"), that will allow us to describe an algorithm for building the DFA.
ϵ-closure
Consider I a set of LR(0) items. The following conditions apply to rules in ϵ-closure(I):
- Initially, all elements of I are in ϵ-closure(I)
- If A->α•Bβ is in ϵ-closure(I) and B->γ is a production, the add B->•γ to ϵ-closure(I)
(repeat until the ϵ-closure(I) is unchanged)
Example:
Consider the following grammar (rule 0 corresponds to the augmented grammar):
0) E' -> E $ 1) E -> E + T 2) E -> T 3) T -> T * F 4) T -> F 5) F -> ( E ) 6) F -> id
Then:
I = {[E'->•E$]} ϵ-closure(I) = {[E'->•E$], [E->•E+T], [E->•T], [T->•T*F], [T->•F], [F->•(E)], [F->•id]}
Goto
The goto function is formally defined as goto(I,X) (I is a set of LR(0) items, and X is a grammar symbol).
goto(I,X) = ϵ-closure({[A->αX•β]: ∀[A->α•Xβ]∈I})
Informally, if γ was a viable prefix for the elements of I, the γX is a viable prefix for the elements of goto(I,X).
Example:
I = {[E'->E•$], [E->E•+T]} goto(I,+) = ϵ-closure({[E->E+•T]}) = {[E->E+•T], [T->•T*F], [T->•F], [F->•(E)], [F->•id]}
Building the DFA
The DFA (a set of sets) starts by containing a single element, consisting of the set obtained by computing the ϵ-closure of the singular set {[S'->•S$]} (S is the start symbol):
DFA = { ϵ-closure({[S'->•S$]}) }
Then, repeat until it is not possible to add any further elements, for each element (set) I in the DFA and for each grammar symbol X, add goto(I,X) (if not empty) to the DFA.
Example:
Considering the grammar above, the start configuration is:
DFA = {I0} = {ϵ-closure({[E'->•E$]})}
Or:
I0 = E'->•E$ E ->•E+T E ->•T T ->•T*F T ->•F F ->•(E) F ->•id
Then, applying the goto function as described above (essentially, we will look only to the symbols immediately in from of the dot), we get:
I0 = E'->•E$ I1 = E'->E•$ I4 = E ->•E+T E ->E•+T E ->•T T ->•T*F I2 = E ->T• T ->•F T ->T•*F F ->•(E) F ->•id I3 = T ->F•