{{TOCright}}

Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) is assigned a label (the token).
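
For example, the input '''position = initial + rate * 60''' might be split into the lexemes '''position''', '''=''', '''initial''', '''+''', '''rate''', '''*''' and '''60''', labeled with tokens such as ''identifier'', ''operator'' and ''number''.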

In this case, we will be using regular expressions to recognize portions of the input text.

== Regular Expressions ==

Regular expressions are defined over a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε'''.

The languages (sets of strings) for each of these entities are:
* '''{ε}''', for '''ε'''
* '''{a}''', for an entry '''a''' in '''Σ'''

The following primitive constructors are defined:
* concatenation
* alternative
* Kleene-star (*)
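
Given expressions '''r''' and '''s''' with languages '''L(r)''' and '''L(s)''', these constructors denote the following languages:
* '''L(rs) = { uv : u ∈ L(r), v ∈ L(s) }''' (concatenation)
* '''L(r|s) = L(r) ∪ L(s)''' (alternative)
* '''L(r*) = {ε} ∪ L(r) ∪ L(rr) ∪ ...''' (Kleene-star: zero or more repetitions)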

Extensions (derived from the above):
* Transitive closure (+) - a+ ("one or more 'a'")
* Optionality (?) - a? ("zero or one 'a'")
* Character classes - [a-z] ("any single character in the 'a-z' range")
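
Each extension can be rewritten using only the primitive constructors: '''a+ = aa*''', '''a? = (a|ε)''', and '''[a-z] = (a|b|...|z)'''.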

== Recognizing Regular Expressions ==
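
A regular expression is recognized by converting it into a finite automaton: a nondeterministic finite automaton (NFA) is first built from the expression, the NFA is then converted into a deterministic finite automaton (DFA), and the DFA may be minimized before it is used to process input. The sections below follow this pipeline.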

== Building the NFA: Thompson's Algorithm ==
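
Thompson's algorithm builds the NFA compositionally: each symbol yields a small fragment with one start and one accepting state, and the constructors combine fragments using ε-transitions. The following is a minimal sketch in Python; the representation (integer state ids and an edge map keyed by (state, symbol), with <code>None</code> standing for ε) is an assumption made for this example:

<syntaxhighlight lang="python">
import itertools

fresh = itertools.count()  # generator of fresh NFA state ids

def merge(e1, e2):
    """Union two edge maps of the form {(state, symbol): set_of_states}."""
    out = {k: set(v) for k, v in e1.items()}
    for k, v in e2.items():
        out.setdefault(k, set()).update(v)
    return out

def symbol(a):
    """Fragment for a single symbol: start --a--> accept."""
    s, f = next(fresh), next(fresh)
    return (s, f, {(s, a): {f}})

def concat(n1, n2):
    """Fragment for 'r s': epsilon edge from n1's accept to n2's start."""
    s1, f1, e1 = n1
    s2, f2, e2 = n2
    edges = merge(e1, e2)
    edges.setdefault((f1, None), set()).add(s2)  # None stands for epsilon
    return (s1, f2, edges)

def alternative(n1, n2):
    """Fragment for 'r|s': a new start/accept pair wired around both branches."""
    s1, f1, e1 = n1
    s2, f2, e2 = n2
    s, f = next(fresh), next(fresh)
    edges = merge(e1, e2)
    edges.setdefault((s, None), set()).update({s1, s2})
    edges.setdefault((f1, None), set()).add(f)
    edges.setdefault((f2, None), set()).add(f)
    return (s, f, edges)

def star(n):
    """Fragment for 'r*': epsilon edges allow skipping or repeating r."""
    s1, f1, e1 = n
    s, f = next(fresh), next(fresh)
    edges = merge(e1, {})
    edges.setdefault((s, None), set()).update({s1, f})
    edges.setdefault((f1, None), set()).update({s1, f})
    return (s, f, edges)

# Example: NFA for a(b|c)*
nfa = concat(symbol('a'), star(alternative(symbol('b'), symbol('c'))))
</syntaxhighlight>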

== Building DFAs from NFAs ==
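
The standard conversion is the subset construction: each DFA state corresponds to the set of NFA states that can be active after reading some input prefix, where "active" includes every state reachable through ε-transitions (the ε-closure). A sketch, reusing the <code>(start, accept, edges)</code> representation assumed above:

<syntaxhighlight lang="python">
def eclosure(states, edges):
    """All NFA states reachable from `states` via epsilon (None) edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in edges.get((q, None), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(nfa, alphabet):
    """Build a DFA whose states are (frozen) sets of NFA states."""
    start, accept, edges = nfa
    d0 = eclosure({start}, edges)
    delta, work = {}, [d0]
    while work:
        S = work.pop()
        if S in delta:
            continue  # this DFA state was already processed
        delta[S] = {}
        for a in alphabet:
            moved = set()
            for q in S:  # NFA states reachable from S on symbol a...
                moved |= edges.get((q, a), set())
            if moved:
                T = eclosure(moved, edges)  # ...plus their epsilon-closures
                delta[S][a] = T
                work.append(T)
    finals = {S for S in delta if accept in S}
    return d0, finals, delta

# Example, for the NFA built above:
# d0, finals, delta = subset_construction(nfa, {'a', 'b', 'c'})
</syntaxhighlight>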

== DFA Minimization ==
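
The subset construction can produce a DFA with more states than necessary. Minimization merges states that no input string can distinguish. One simple formulation is iterative partition refinement (Hopcroft's algorithm is the asymptotically faster alternative): start with the two blocks of accepting and non-accepting states, and repeatedly split any block whose states disagree on which block some symbol leads to. A sketch over the DFA representation used above:

<syntaxhighlight lang="python">
def minimize(delta, finals, alphabet):
    """Partition refinement: returns the blocks of mutually equivalent states."""
    states = set(delta)
    partition = [b for b in (states & finals, states - finals) if b]
    while True:
        def block_of(q):
            # index of the block containing q (missing transitions map to -1)
            for i, b in enumerate(partition):
                if q in b:
                    return i
            return -1
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # signature: which block each input symbol leads to from q
                sig = tuple(block_of(delta[q].get(a)) for a in sorted(alphabet))
                groups.setdefault(sig, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):
            return refined  # no block was split: the partition is stable
        partition = refined
</syntaxhighlight>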

== Input Processing ==
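
To split an input into lexemes, the lexer does not stop at the first accepting state it reaches: it keeps consuming characters while a transition exists, remembers the last position at which the DFA was in an accepting state, and emits the longest possible lexeme (the ''maximal munch'' rule) before resuming the scan from that position. A sketch, assuming no token matches the empty string:

<syntaxhighlight lang="python">
def longest_match(d0, finals, delta, text, pos):
    """Run the DFA from `pos`; return the end of the longest accepted lexeme."""
    state, last_accept, i = d0, None, pos
    while i < len(text) and text[i] in delta.get(state, {}):
        state = delta[state][text[i]]
        i += 1
        if state in finals:
            last_accept = i  # remember the furthest accepting position
    return last_accept       # None: no lexeme starts at pos (lexical error)
</syntaxhighlight>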

== Recognizing Multiple Expressions ==
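
To recognize several token types simultaneously, the individual expressions are combined into a single automaton whose accepting states are tagged with the token each one recognizes. The same effect can be sketched with Python's <code>re</code> module by joining the patterns as named groups; the token names and patterns below are illustrative only. Since <code>re</code> alternation is first-match rather than longest-match, keywords are reclassified after matching a general identifier, a common workaround:

<syntaxhighlight lang="python">
import re

# (token name, pattern) pairs; purely illustrative
TOKEN_SPEC = [
    ('NUM',  r'[0-9]+'),
    ('ID',   r'[a-z][a-z0-9]*'),
    ('OP',   r'[+\-*/=]'),
    ('SKIP', r'[ \t]+'),
]
KEYWORDS = {'if', 'then', 'else'}  # matched as IDs, then reclassified
master = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text):
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if not m:
            raise SyntaxError(f'unexpected character {text[pos]!r} at {pos}')
        kind, lexeme = m.lastgroup, m.group()
        if kind == 'ID' and lexeme in KEYWORDS:
            kind = 'KEYWORD'
        if kind != 'SKIP':  # whitespace is discarded
            yield (kind, lexeme)
        pos = m.end()

# list(tokenize('if x then 42'))
# -> [('KEYWORD', 'if'), ('ID', 'x'), ('KEYWORD', 'then'), ('NUM', '42')]
</syntaxhighlight>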

== Example 1: Ambiguous Expressions ==
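
The same lexeme can often be matched by more than one expression: '''if''', for instance, matches both a keyword pattern and a general identifier pattern such as [a-z]+. Lexer generators conventionally resolve the ambiguity with two rules: prefer the longest match, and, for matches of equal length, prefer the expression listed first. Under these rules '''ifx''' is a single identifier (longest match wins), while '''if''' is a keyword as long as the keyword pattern is listed before the identifier pattern.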

== Example 2: Backtracking ==
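
Longest-match recognition may force the automaton past the end of the actual token before it fails. Consider, for instance, expressions for '''a''' and '''abc''' and the input '''abd''': the automaton accepts after reading '''a''', continues through '''ab''' hoping to complete '''abc''', fails on '''d''', and must backtrack to the last accepting position, emitting '''a''' as the lexeme and resuming the scan at '''b'''. This is why the input-processing sketch above records '''last_accept''' rather than stopping at the first accepting state.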

[[category:Teaching]]
[[category:Compilers]]