Theoretical Aspects of Lexical Analysis: Difference between revisions
From Wiki**3
| No edit summary | |||
| Line 6: | Line 6: | ||
| == Regular Expressions == | == Regular Expressions == | ||
| Regular expressions are defined considering  | Regular expressions are defined considering a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε''': | ||
| The languages (sets of strings) for each of these entities are: | |||
| * '''{ε}''', for '''ε''' | |||
| * '''{a}''', for an entry '''a''' in '''Σ''' | |||
| The following primitive constructors are defined: | |||
| * concatenation | * concatenation | ||
| * alternative | * alternative | ||
| * Kleene-star (*) | * Kleene-star (*) | ||
| Extensions: | Extensions (derived from the above): | ||
| * Transitive closure (+) - a+ ("one or more 'a'") | * Transitive closure (+) - a+ ("one or more 'a'") | ||
| * Optionality (?) - a? ("zero or one 'a'") | * Optionality (?) - a? ("zero or one 'a'") | ||
Revision as of 01:24, 14 March 2008
Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
In this case, we will be using regular expressions for recognizing portions of the input text.
Regular Expressions
Regular expressions are defined considering a finite alphabet Σ = { a, b, ..., c } and the empty string ε:
The languages (sets of strings) for each of these entities are:
- {ε}, for ε
- {a}, for an entry a in Σ
The following primitive constructors are defined:
- concatenation
- alternative
- Kleene-star (*)
Extensions (derived from the above):
- Transitive closure (+) - a+ ("one or more 'a'")
- Optionality (?) - a? ("zero or one 'a'")
- Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)