Theoretical Aspects of Lexical Analysis

Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).

In this case, we will be using regular expressions for recognizing portions of the input text.

Regular Expressions

Regular expressions are defined considering an alphabet { a, b, ..., c } and the empty string eps.

eps
a

Primitive constructors:

concatenation
alternative
Kleene-star (*)

Extensions:

Transitive closure (+) - a+ ("one or more 'a'")
Optionality (?) - a? ("zero or one 'a'")
Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Theoretical Aspects of Lexical Analysis

From Wiki**3

Revision as of 01:18, 14 March 2008 by Root (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Regular Expressions

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Theoretical Aspects of Lexical Analysis

From Wiki**3

Revision as of 01:18, 14 March 2008 by Root (talk | contribs)(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Regular Expressions

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Revision as of 01:18, 14 March 2008 by Root (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)