Theoretical Aspects of Lexical Analysis
From Wiki**3
Revision as of 01:18, 14 March 2008
Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
Here, regular expressions are used to recognize portions of the input text.
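As a rough illustration of splitting input into lexemes and labelling each with a token, the sketch below uses Python's standard `re` module. The token names (`NUMBER`, `IDENT`, `PLUS`, `SKIP`), their patterns, and the `tokenize` function are assumptions made for this example, not part of the original text.

```python
import re

# Hypothetical token classes, chosen only for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[a-z]+"),
    ("PLUS", r"\+"),
    ("SKIP", r"[ \t]+"),  # whitespace is recognized but not reported
]

# One alternative per token class, each wrapped in a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token, lexeme) pairs; raise on unrecognized input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            # lastgroup is the name of the token class that matched.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, `tokenize("abc + 42")` yields the pairs `("IDENT", "abc")`, `("PLUS", "+")`, `("NUMBER", "42")`: each lexeme paired with its token.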
Regular Expressions
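Regular expressions are defined over an alphabet { a, b, ..., c } and the empty string eps. The primitive expressions are:
- eps (matches the empty string)
- a (matches the single symbol 'a'; likewise for every other symbol of the alphabet)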
Primitive constructors:
- concatenation - ab ("an 'a' followed by a 'b'")
- alternative - a|b ("either 'a' or 'b'")
- Kleene-star (*) - a* ("zero or more 'a'")
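Extensions (each can be rewritten using only the primitive constructors):
- Transitive closure (+) - a+ ("one or more 'a'", equivalent to aa*)
- Optionality (?) - a? ("zero or one 'a'", equivalent to a|eps)
- Character classes - [a-z] ("any single character in the 'a-z' range", equivalent to a|b|...|z)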
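The constructors and extensions listed above can be sketched as a small matcher in which the extensions are defined purely in terms of the primitives. This is a minimal sketch, not an efficient recognizer: every name in it (`eps`, `sym`, `cat`, `alt`, `star`, `plus`, `opt`, `char_class`, `matches`) is an assumption made for illustration. Each expression is represented as a function that, given a string and a start position, returns the set of positions where a match can end.

```python
def eps():
    """Empty string: matches without consuming input."""
    return lambda s, i: {i}


def sym(c):
    """Single symbol c: consumes exactly one character if it equals c."""
    return lambda s, i: {i + 1} if i < len(s) and s[i] == c else set()


def cat(r1, r2):
    """Concatenation: r1 followed by r2."""
    return lambda s, i: {k for j in r1(s, i) for k in r2(s, j)}


def alt(r1, r2):
    """Alternative: either r1 or r2."""
    return lambda s, i: r1(s, i) | r2(s, i)


def star(r):
    """Kleene star: zero or more repetitions of r."""
    def run(s, i):
        reached = {i}      # zero repetitions always match
        frontier = {i}
        while frontier:    # terminates: positions are bounded by len(s)
            new = {k for j in frontier for k in r(s, j)} - reached
            reached |= new
            frontier = new
        return reached
    return run


# Extensions, written using only the primitives above.
def plus(r):
    """r+ is r followed by r*."""
    return cat(r, star(r))


def opt(r):
    """r? is r or the empty string."""
    return alt(r, eps())


def char_class(first, last):
    """[first-last] is the alternative of every symbol in the range."""
    r = sym(first)
    for code in range(ord(first) + 1, ord(last) + 1):
        r = alt(r, sym(chr(code)))
    return r


def matches(r, s):
    """True if r matches the whole string s."""
    return len(s) in r(s, 0)
```

With these definitions, `matches(plus(sym("a")), "aaa")` is true while `matches(plus(sym("a")), "")` is false, and `matches(char_class("a", "z"), "qq")` is false because a character class matches exactly one character.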