A regular expression based lexical analyzer generator for Erlang, similar to lex or flex.
The Leex module should be considered experimental, as it is subject to change in future releases.
ErrorInfo = {ErrorLine,module(),error_descriptor()}
ErrorLine = integer()
Token = tuple()
Generates a lexical analyzer from the definition in the input
file. The input file has the extension
The current options are:
Generates a
Uses a specific or customised prologue file
instead of default
Causes errors to be printed as they occur. Default is
Causes warnings to be printed as they occur. Default is
This is a short form for both
If this flag is set,
If this flag is set, an extra field containing
This is a short form for both
Outputs information from parsing the input file and generating the internal tables.
Any of the Boolean options can be set to
Leex will add the extension
Returns a string which describes the error
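As a rough illustration of how generation and error reporting fit together, the sketch below calls leex:file/2 with the return option and turns each ErrorInfo tuple into a printable string by calling format_error/1 in the module named in the tuple. The module name leex_build_sketch and the helper describe/1 are hypothetical, and the shape of the error list (file name paired with a list of ErrorInfo tuples) is an assumption based on the usual convention of the Erlang compiler tools.

```erlang
-module(leex_build_sketch).
-export([build/1]).

%% Generate a scanner from XrlBase ++ ".xrl". Errors and warnings are
%% returned as strings instead of being printed by leex.
build(XrlBase) ->
    case leex:file(XrlBase, [{return, true}, {report, false}]) of
        {ok, ScannerFile} ->
            {ok, ScannerFile, []};
        {ok, ScannerFile, Warnings} ->
            {ok, ScannerFile, describe(Warnings)};
        {error, Errors, Warnings} ->
            {error, describe(Errors), describe(Warnings)}
    end.

%% Assumed shape: [{FileName, [{Location, Module, ErrorDescriptor}]}].
describe(PerFile) ->
    [lists:flatten(io_lib:format("~ts:~p: ~ts",
                                 [File, Loc, Mod:format_error(Desc)]))
     || {File, Infos} <- PerFile, {Loc, Mod, Desc} <- Infos].
```

Returning the messages instead of printing them is only one possible design; leaving the reporting options at their defaults lets leex print problems as they occur.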
The following functions are exported by the generated scanner.
Scans
It is an error if not all of the characters in
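A minimal sketch of calling the generated string/1 function is shown below. The wrapper module scan_string_sketch is hypothetical, and Scanner stands for whatever module leex generated and you compiled. It assumes the success and error tuples have the shapes {ok, Tokens, EndLine} and {error, ErrorInfo, EndLine}.

```erlang
-module(scan_string_sketch).
-export([scan/2]).

%% Scan a whole string with a leex-generated scanner module.
%% On failure, the error descriptor is converted to readable text by
%% the module named inside the ErrorInfo tuple.
scan(Scanner, String) ->
    case Scanner:string(String) of
        {ok, Tokens, EndLine} ->
            {ok, Tokens, EndLine};
        {error, {Line, Mod, Desc}, _EndLine} ->
            {error, Line, Mod:format_error(Desc)}
    end.
```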
This is a re-entrant call to try and scan one token from
It is not designed to be called directly by an application, but to be used through the I/O system, where an application typically invokes it with:
io:request(InFile, {get_until,Prompt,Module,token,[Line]})
-> TokenRet
This is a re-entrant call to try and scan tokens from
This function differs from
It is not designed to be called directly by an application, but to be used through the I/O system, where an application typically invokes it with:
io:request(InFile, {get_until,Prompt,Module,tokens,[Line]})
-> TokensRet
Erlang style comments starting with a % are allowed in scanner files. A definition file has the following format:
<Header>
Definitions.
<Macro Definitions>
Rules.
<Token Rules>
Erlang code.
<Erlang code>
The "Definitions.", "Rules." and "Erlang code." headings are mandatory and must occur at the beginning of a source line. The <Header>, <Macro Definitions> and <Erlang code> sections may be empty but there must be at least one rule.
Macro definitions have the following format:
NAME = VALUE
and there must be spaces around =.
When macros are expanded in expressions, the macro calls are replaced by the macro value without any form of quoting or enclosing in parentheses.
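To see why the lack of automatic parentheses matters, consider the following sketch of a definition file. The macro name A and the token tags first and second are hypothetical. Because the macro value is substituted textually, a macro containing an alternation changes meaning when something is concatenated to it.

```
Definitions.

A = a|b

Rules.

% Expands to a|bc, which matches "a" or "bc", but not "ac".
{A}c : {token,{first,TokenLine,TokenChars}}.

% Expands to (a|b)c, which matches "ac" or "bc".
({A})c : {token,{second,TokenLine,TokenChars}}.

Erlang code.
```

Wrapping either the macro value or each macro call in parentheses avoids this surprise.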
Rules have the following format:
<Regexp> : <Erlang code>.
The <Regexp> must occur at the start of a line and not
include any blanks; use
TokenChars  A list of the characters in the matched token.
TokenLen    The number of characters in the matched token.
TokenLine   The line number where the token occurred.
The code must return:
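{token,Token}      Return Token to the caller.
{end_token,Token}  Return Token; it is the last token in a tokens call.
skip_token         Skip this token completely.
{error,ErrString}  An error in the token; ErrString is a string describing the error.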
It is also possible to push back characters into the input characters with the following returns:
These have the same meanings as the normal returns but the
characters in
Note that pushing back characters makes it easy to send the scanner into an endless loop!
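The fragment below is a sketch of both a harmless and a harmful use of push-back, using the return forms {token,Token,PushBackList} and {skip_token,PushBackList}. The token tags a_before_b and b are hypothetical.

```
Rules.

% Harmless: "ab" is matched, and "b" is pushed back so that it is
% scanned again and matched by the next rule.
ab : {token,{a_before_b,TokenLine},"b"}.
b : {token,{b,TokenLine}}.

% Loops forever: the pushed-back "x" is matched by this same rule again.
x : {skip_token,"x"}.
```

A simple way to stay safe is to make sure that the pushed-back characters are always strictly shorter than the text the rule consumed.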
The following example would match a simple Erlang integer or float and return a token which could be sent to the Erlang parser:
D = [0-9]
{D}+ :
{token,{integer,TokenLine,list_to_integer(TokenChars)}}.
{D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
{token,{float,TokenLine,list_to_float(TokenChars)}}.
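For context, the rules above could sit in a complete definition file along the following lines. This is only a sketch: the whitespace rule is an added assumption so that several numbers can be scanned from one string, and the "Erlang code." section is left empty.

```
Definitions.

D = [0-9]

Rules.

{D}+ :
  {token,{integer,TokenLine,list_to_integer(TokenChars)}}.

{D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
  {token,{float,TokenLine,list_to_float(TokenChars)}}.

% Assumption: whitespace between numbers is not significant.
[\s\t\r\n]+ : skip_token.

Erlang code.
```

Assuming the file is saved under a hypothetical name such as number_scanner.xrl, running leex:file("number_scanner") and compiling the result should give a module whose string/1 function returns integer and float tokens for input such as "12 3.5".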
The Erlang code in the "Erlang code." section is written into the output file directly after the module declaration and the predefined exports declaration, so it is possible to add extra exports, define imports, and add other attributes, which are then visible in the whole file.
The regular expressions allowed here are a subset of the set
found in
c           Matches the non-metacharacter c.
\c          Matches the escape sequence or literal character c.
.           Matches any character.
^           Matches the beginning of a string.
$           Matches the end of a string.
[abc...]    Character class, which matches any of the characters abc....
[^abc...]   Negated character class, which matches any character except abc....
r1 | r2     Alternation. It matches either r1 or r2.
r1r2        Concatenation. It matches r1 and then r2.
r+          Matches one or more rs.
r*          Matches zero or more rs.
r?          Matches zero or one rs.
(r)         Grouping. It matches r.
The escape sequences allowed are the same as for Erlang strings:
\b        Backspace.
\f        Form feed.
\n        Newline (line feed).
\r        Carriage return.
\t        Tab.
\e        Escape.
\v        Vertical tab.
\s        Space.
\d        Delete.
\ddd      The octal value ddd.
\xhh      The hexadecimal value hh.
\x{h...}  The hexadecimal value h....
\c        Any other character literally, for example \\ for backslash.
The following examples define Erlang data types:
Atoms      [a-z][0-9a-zA-Z_]*
Variables  [A-Z_][0-9a-zA-Z_]*
Floats     (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?
Anchoring a regular expression with