Theoretical Aspects of Lexical Analysis

{{TOCright}}
Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) is assigned a label (a token).
 
In what follows, we use regular expressions to recognize and classify portions of the input text.
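As a rough illustration of the idea (not part of the formal treatment below), the following sketch uses Python's re module to split an input string into lexemes and label each one with a token; the token names and patterns are invented for this example only.

<syntaxhighlight lang="python">
import re

# Hypothetical token patterns, made up for this example; a real lexer
# derives them from the language's lexical specification.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),     # one or more digits
    ("IDENT",  r"[a-z]+"),  # one or more lowercase letters
    ("PLUS",   r"\+"),      # the '+' operator
    ("SKIP",   r"\s+"),     # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (token, lexeme) pairs, one per recognized segment of the input."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize("x + 42")))
# [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
</syntaxhighlight>

Note that this sketch simply skips over characters that match no pattern; a real lexer would report them as lexical errors.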
 
== Regular Expressions ==
 
Regular expressions are defined over a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε'''.
 
The languages (sets of strings) for each of these entities are:
* '''{ε}''', for '''ε'''
* '''{a}''', for an entry '''a''' in '''Σ'''
 
The following primitive constructors are defined:
* Concatenation - ab ("'a' followed by 'b'")
* Alternative (|) - a|b ("'a' or 'b'")
* Kleene star (*) - a* ("zero or more 'a'")
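
For completeness (these standard definitions are not spelled out above), the language denoted by each constructor, for regular expressions '''r''' and '''s''', is:

<math>L(rs) = L(r)\,L(s) \qquad L(r|s) = L(r) \cup L(s) \qquad L(r^{*}) = \{\varepsilon\} \cup L(r) \cup L(r)L(r) \cup \cdots</math>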
 
Extensions (derived from the above):
* Transitive closure (+) - a+ ("one or more 'a'")
* Optionality (?) - a? ("zero or one 'a'")
* Character classes - [a-z] ("any single character in the 'a-z' range")
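
Purely as an illustration, the sketch below exercises each of these constructs with Python's re module; the test strings are arbitrary examples over the alphabet { a, b }, not part of the definitions above.

<syntaxhighlight lang="python">
import re

# Illustrative (pattern, string) pairs: each string belongs to the
# language described by the corresponding regular expression.
examples = [
    ("ab",     "ab"),    # concatenation: 'a' followed by 'b'
    ("a|b",    "b"),     # alternative: 'a' or 'b'
    ("a*",     ""),      # Kleene star: zero or more 'a' (accepts the empty string)
    ("a+",     "aaa"),   # transitive closure: one or more 'a'
    ("a?b",    "b"),     # optionality: zero or one 'a', followed by 'b'
    ("[a-b]+", "abba"),  # character class: one or more characters in the range 'a-b'
]

# fullmatch requires the whole string to belong to the pattern's language
for pattern, text in examples:
    assert re.fullmatch(pattern, text) is not None, (pattern, text)
    print(f"{pattern!r} accepts {text!r}")
</syntaxhighlight>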
 
== Recognizing Regular Expressions ==
 
== Building the NFA: Thompson's Algorithm ==
 
 
== Building DFAs from NFAs ==
 
== DFA Minimization ==
 
== Input Processing ==
 
== Recognizing Multiple Expressions ==
 
== Example 1: Ambiguous Expressions ==
 
== Example 2: Backtracking ==
 
[[category:Teaching]]
[[category:Compilers]]
