|
|
(94 intermediate revisions by the same user not shown) |
Line 1: |
Line 1: |
| {{TOCright}}
| | #REDIRECT [[ist:Theoretical Aspects of Lexical Analysis]] |
| Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
| |
| | |
| In this case, we will be using regular expressions for recognizing portions of the input text.
| |
| | |
| == Regular Expressions ==
| |
| | |
| Regular expressions are defined considering a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε''':
| |
| | |
| The languages (sets of strings) for each of these entities are:
| |
| * '''{ε}''', for '''ε'''
| |
| * '''{a}''', for an entry '''a''' in '''Σ'''
| |
| | |
| The following primitive constructors are defined:
| |
| * concatenation
| |
| * alternative
| |
| * Kleene-star (*)
| |
| | |
| Extensions (derived from the above):
| |
| * Transitive closure (+) - a+ ("one or more 'a'")
| |
| * Optionality (?) - a? ("zero or one 'a'")
| |
| * Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)
| |
| | |
| == Recognizing/Matching Regular Expressions ==
| |
| | |
| == Building the NFA: Thompson's Algorithm ==
| |
| | |
| <dot>
| |
| rankdir=LR;
| |
| node [shape = doublecircle]; LR_0 LR_3 LR_8;
| |
| node [shape = circle];
| |
| LR_0 -> LR_2 [ label = "SS(B)" ];
| |
| LR_0 -> LR_1 [ label = "SS(S)" ];
| |
| LR_1 -> LR_3 [ label = "S($end)" ];
| |
| LR_2 -> LR_6 [ label = "SS(b)" ];
| |
| LR_2 -> LR_5 [ label = "SS(a)" ];
| |
| LR_2 -> LR_4 [ label = "S(A)" ];
| |
| LR_4 -> LR_8 [ label = "S(D)" ];
| |
| LR_5 -> LR_7 [ label = "S(a)" ];
| |
| LR_5 -> LR_5 [ label = "S(b)" ];
| |
| LR_6 -> LR_6 [ label = "S(b)" ];
| |
| LR_6 -> LR_5 [ label = "S(a)" ];
| |
| LR_7 -> LR_8 [ label = "S(b)" ];
| |
| LR_7 -> LR_5 [ label = "S(a)" ];
| |
| LR_8 -> LR_6 [ label = "S(b)" ];
| |
| LR_8 -> LR_5 [ label = "S(a)" ];
| |
| </dot>
| |
| | |
| == Building DFAs from NFAs ==
| |
| | |
| == DFA Minimization ==
| |
| | |
| == Input Processing ==
| |
| | |
| == Recognizing Multiple Expressions ==
| |
| | |
| == Example 1: Ambiguous Expressions ==
| |
| | |
| == Example 2: Backtracking ==
| |
| | |
| [[category:Teaching]]
| |
| [[category:Compilers]]
| |