|   |     | 
| (93 intermediate revisions by the same user not shown) | 
| Line 1: | Line 1: | 
|  | {{TOCright}}
 |  | #REDIRECT [[ist:Theoretical Aspects of Lexical Analysis]] | 
|  | Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
 |  | 
|  |   |  | 
|  | In this case, we will be using regular expressions for recognizing portions of the input text.
 |  | 
|  |   |  | 
|  | == Regular Expressions ==
 |  | 
|  |   |  | 
|  | Regular expressions are defined considering a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε''':
 |  | 
|  |   |  | 
|  | The languages (sets of strings) for each of these entities are:
 |  | 
|  | * '''{ε}''', for '''ε'''
 |  | 
|  | * '''{a}''', for an entry '''a''' in '''Σ'''
 |  | 
|  |   |  | 
|  | The following primitive constructors are defined:
 |  | 
|  | * concatenation
 |  | 
|  | * alternative
 |  | 
|  | * Kleene-star (*)
 |  | 
|  |   |  | 
|  | Extensions (derived from the above):
 |  | 
|  | * Transitive closure (+) - a+ ("one or more 'a'")
 |  | 
|  | * Optionality (?) - a? ("zero or one 'a'")
 |  | 
|  | * Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)
 |  | 
|  |   |  | 
|  | == Recognizing/Matching Regular Expressions ==
 |  | 
|  |   |  | 
|  | == Building the NFA: Thompson's Algorithm ==
 |  | 
|  |   |  | 
|  | == Building DFAs from NFAs ==
 |  | 
|  |   |  | 
|  | == DFA Minimization ==
 |  | 
|  |   |  | 
|  | == Input Processing ==
 |  | 
|  |   |  | 
|  | == Recognizing Multiple Expressions ==
 |  | 
|  |   |  | 
|  | == Example 1: Ambiguous Expressions ==
 |  | 
|  |   |  | 
|  | == Example 2: Backtracking ==
 |  | 
|  |   |  | 
|  | [[category:Teaching]] |  | 
|  | [[category:Compilers]]
 |  |