|   |     | 
| (94 intermediate revisions by the same user not shown) | 
| Line 1: | Line 1: | 
|  | {{TOCright}}
 |  | #REDIRECT [[ist:Theoretical Aspects of Lexical Analysis]] | 
|  | Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
 |  | 
|  |   |  | 
|  | In this case, we will be using regular expressions for recognizing portions of the input text.
 |  | 
|  |   |  | 
|  | == Regular Expressions ==
 |  | 
|  |   |  | 
|  | Regular expressions are defined considering a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε''':
 |  | 
|  |   |  | 
|  | The languages (sets of strings) for each of these entities are:
 |  | 
|  | * '''{ε}''', for '''ε'''
 |  | 
|  | * '''{a}''', for an entry '''a''' in '''Σ'''
 |  | 
|  |   |  | 
|  | The following primitive constructors are defined:
 |  | 
|  | * concatenation
 |  | 
|  | * alternative
 |  | 
|  | * Kleene-star (*)
 |  | 
|  |   |  | 
|  | Extensions (derived from the above):
 |  | 
|  | * Transitive closure (+) - a+ ("one or more 'a'")
 |  | 
|  | * Optionality (?) - a? ("zero or one 'a'")
 |  | 
|  | * Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)
 |  | 
|  |   |  | 
|  | == Recognizing/Matching Regular Expressions ==
 |  | 
|  |   |  | 
|  | == Building the NFA: Thompson's Algorithm ==
 |  | 
|  |   |  | 
|  | <dot> 
 |  | 
|  | 	rankdir=LR;    
 |  | 
|  | 	node [shape = doublecircle]; LR_0 LR_3 LR_8;
 |  | 
|  | 	node [shape = circle];
 |  | 
|  | 	LR_0 -> LR_2 [ label = "SS(B)" ];
 |  | 
|  | 	LR_0 -> LR_1 [ label = "SS(S)" ];
 |  | 
|  | 	LR_1 -> LR_3 [ label = "S($end)" ];
 |  | 
|  | 	LR_2 -> LR_6 [ label = "SS(b)" ];
 |  | 
|  | 	LR_2 -> LR_5 [ label = "SS(a)" ];
 |  | 
|  | 	LR_2 -> LR_4 [ label = "S(A)" ];
 |  | 
|  |         LR_4 -> LR_8 [ label = "S(D)" ];  
 |  | 
|  | 	LR_5 -> LR_7 [ label = "S(a)" ];
 |  | 
|  | 	LR_5 -> LR_5 [ label = "S(b)" ];
 |  | 
|  | 	LR_6 -> LR_6 [ label = "S(b)" ];
 |  | 
|  | 	LR_6 -> LR_5 [ label = "S(a)" ];
 |  | 
|  | 	LR_7 -> LR_8 [ label = "S(b)" ];
 |  | 
|  | 	LR_7 -> LR_5 [ label = "S(a)" ];
 |  | 
|  | 	LR_8 -> LR_6 [ label = "S(b)" ];
 |  | 
|  | 	LR_8 -> LR_5 [ label = "S(a)" ];
 |  | 
|  | </dot>
 |  | 
|  |   |  | 
|  | == Building DFAs from NFAs ==
 |  | 
|  |   |  | 
|  | == DFA Minimization ==
 |  | 
|  |   |  | 
|  | == Input Processing ==
 |  | 
|  |   |  | 
|  | == Recognizing Multiple Expressions ==
 |  | 
|  |   |  | 
|  | == Example 1:Ambiguous Expressions ==
 |  | 
|  |   |  | 
|  | == Example 2: Backtracking ==
 |  | 
|  |   |  | 
|  | [[category:Teaching]]
 |  | 
|  | [[category:Compilers]]
 |  |