|
|
(14 intermediate revisions by the same user not shown) |
Line 1: |
Line 1: |
| {{TOCright}}
| | #REDIRECT [[ist:Introduction to Syntax]] |
| | |
| = What is a Grammar? =
| |
| | |
| An unrestricted grammar is a quadruple <amsmath>G=(V,\Sigma,R,S)</amsmath>, where <amsmath>V</amsmath> is an alphabet; <amsmath>\Sigma</amsmath> is the set of terminal symbols (<amsmath>\Sigma\subseteq{}V</amsmath>); <amsmath>(V-\Sigma)</amsmath> is the set of non-terminal symbols; <amsmath>S</amsmath> is the initial symbol; and <amsmath>R</amsmath> is a set of rules (a finite subset of <amsmath>(V^*(V-\Sigma)V^*)\times{}V^*</amsmath>).
| |
| | |
| The following are defined:
| |
| | |
| * Direct derivation: <amsmath>u\underset{\text{\tiny G}}{\Rightarrow}v\;\text{iff}\;\exists_{w_1,w_2\in{}V^*}: \exists_{(u',v')\in{}R}: u=w_1u'w_2 \wedge v=w_1v'w_2</amsmath>
| |
| * Derivation: <amsmath>w_0\underset{\text{\tiny G}}{\Rightarrow}w_1\underset{\text{\tiny G}}{\Rightarrow}\cdots\underset{\text{\tiny G}}{\Rightarrow}w_n\;\Leftrightarrow{}w_0\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w_n</amsmath>
| |
| * Generated language: <amsmath>L(G) = \{ w\: |\: w \in \Sigma^* \wedge{}S\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w \}</amsmath>
| |
| | |
| == Context-free Grammars ==
| |
| | |
| The above grammar is unrestricted in its derivations capabilities and directly supports context-dependent rules.
| |
| | |
| In our coverage of programming language processing, though, only context-free grammars will be considered. This means that the <amsmath>R</amsmath> set is defined on <amsmath>(V-\Sigma)\times{}V^*</amsmath>), i.e., only non-terminals are allowed on the left side of a rule.
| |
| | |
| = The FIRST and FOLLOW sets =
| |
| | |
| == Computing the FIRST Set ==
| |
| | |
| The FIRST set for a given string or symbol can be computed as follows:
| |
| | |
| # If '''a''' is a terminal symbol, then FIRST('''a''') = {'''a'''}
| |
| # If X is a non-terminal symbol and X -> ε is a production then add ε to FIRST(X)
| |
| # If X is a non-terminal symbol and X -> Y<sub>1</sub>...Y<sub>n</sub> is a production, then | |
| #:a ∈ FIRST(X) if a ∈ FIRST(Y<sub>i</sub>) and ε ∈ FIRST(Y<sub>j</sub>), i>j (i.e., Y<sub>j</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε)
| |
| | |
| As an example, consider production X -> Y<sub>1</sub>...Y<sub>n</sub>
| |
| * If Y<sub>1</sub> <amsmath>\overset{*}{\nRightarrow}</amsmath> ε then FIRST(X) = FIRST(Y<sub>1</sub>)
| |
| * If Y<sub>1</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε and Y<sub>2</sub> <amsmath>\overset{*}{\nRightarrow}</amsmath> ε then FIRST(X) = FIRST(Y<sub>1</sub>) \ {ε} ∪ FIRST(Y<sub>2</sub>)
| |
| * If Y<sub>i</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε (∀i) then FIRST(X) = ∪<sub>i</sub>(FIRST(Y<sub>i</sub>)\{ε}) ∪ {ε}
| |
| | |
| The FIRST set can also be computed for a string Y<sub>1</sub>...Y<sub>n</sub> much in the same way as in case 3 above.
| |
| | |
| == Computing the FOLLOW Set ==
| |
| | |
| The FOLLOW set is computed for non-terminals and indicates the set of terminal symbols that are possible after a given non-terminal. The special symbol $ is used to represent the end of phrase (end of input).
| |
| | |
| # If X is the grammar's initial symbol then {$} ⊆ FOLLOW(X)
| |
| # If A -> αXβ is a production, then FIRST(β)\{ε} ⊆ FOLLOW(X)
| |
| # If A -> αX or A -> αXβ (β <amsmath>\overset{*}{\Rightarrow}</amsmath> ε), then FOLLOW(A) ⊆ FOLLOW(X)
| |
| | |
| The algorithm should be repeated until the FOLLOW set remains unchanged.
| |
| | |
| = Exercises =
| |
| | |
| * [[Introduction to Syntax/Exercise 1|Exercise 1]] - simple ambiguous grammar.
| |
| | |
| [[category:Teaching]]
| |
| [[category:Compilers]]
| |