From 84adefa331c4159d432d22840663c38f155cd4c1 Mon Sep 17 00:00:00 2001 From: Erlang/OTP Date: Fri, 20 Nov 2009 14:54:40 +0000 Subject: The R13B03 release. --- lib/stdlib/doc/src/erl_scan.xml | 417 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 417 insertions(+) create mode 100644 lib/stdlib/doc/src/erl_scan.xml (limited to 'lib/stdlib/doc/src/erl_scan.xml') diff --git a/lib/stdlib/doc/src/erl_scan.xml b/lib/stdlib/doc/src/erl_scan.xml new file mode 100644 index 0000000000..4175146c3c --- /dev/null +++ b/lib/stdlib/doc/src/erl_scan.xml @@ -0,0 +1,417 @@ + + + + +
+ + 1996-2009 + Ericsson AB. All Rights Reserved. + + + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + + + erl_scan + Robert Virding + Bjarne Däcker + 1 + Bjarne Däcker + + 97-01-24 + B + erl_scan.sgml +
+ erl_scan + The Erlang Token Scanner + +

This module contains functions for tokenizing characters into + Erlang tokens.

+
+
+ DATA TYPES
+
+category() = atom()
+column() = integer() > 0
+line() = integer()
+location() = line() | {line(), column()}
+reserved_word_fun() = fun(atom()) -> bool()
+set_attribute_fun() = fun(term()) -> term()
+symbol() = atom() | float() | integer() | string()
+token() = {category(), attributes()} | {category(), attributes(), symbol()}
+attributes() = line() | list() | tuple()
+
+
+
+ string(String) -> Return
+ string(String, StartLocation) -> Return
+ string(String, StartLocation, Options) -> Return
+ Scan a string and return the Erlang tokens
+
+ String = string()
+ Return = {ok, Tokens, EndLocation} | Error
+ Tokens = [token()]
+ Error = {error, ErrorInfo, EndLocation}
+ StartLocation = EndLocation = location()
+ Options = Option | [Option]
+ Option = {reserved_word_fun,reserved_word_fun()}
+ | return_comments | return_white_spaces | return
+ | text
+
+
+

Takes the list of characters String and tries to + scan (tokenize) them. Returns {ok, Tokens, EndLocation}, + where Tokens are the Erlang tokens from + String. EndLocation is the first location + after the last token.

+

{error, ErrorInfo, EndLocation} is returned if an + error occurs. EndLocation is the first location after + the erroneous token.

+

string(String) is equivalent to + string(String, 1), and string(String, + StartLocation) is equivalent to string(String, + StartLocation, []).

+

StartLocation indicates the initial location when + scanning starts. If StartLocation is a line, + attributes() as well as EndLocation and + ErrorLocation will be lines. If + StartLocation is a pair of a line and a column, + attributes() takes the form of an opaque compound + data type, and EndLocation and ErrorLocation + will be pairs of a line and a column. The token + attributes contain information about the column and the + line where the token begins, as well as the text of the + token (if the text option is given), all of which can + be accessed by calling token_info/1,2 or attributes_info/1,2.
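The effect of StartLocation on the returned EndLocation can be sketched as follows. This is a minimal illustration, not part of the reference text; the module name scan_locations and the function end_locations/1 are hypothetical.

```erlang
-module(scan_locations).
-export([end_locations/1]).

%% Scan the same input twice: once starting from a plain line number,
%% once starting from a {Line, Column} pair, and return both
%% EndLocation values for comparison.
end_locations(String) ->
    {ok, _Tokens1, EndLine} = erl_scan:string(String, 1),
    {ok, _Tokens2, EndLineColumn} = erl_scan:string(String, {1, 1}),
    {EndLine, EndLineColumn}.
```

For a two-line input such as "foo.\nbar." the first element is expected to be a line and the second a line and column pair. The token attributes themselves are deliberately not inspected here, since their form is described as opaque.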

+

A token is a tuple containing information about + syntactic category, the token attributes, and the actual + terminal symbol. For punctuation characters (e.g. ;, + |) and reserved words, the category and the symbol + coincide, and the token is represented by a two-tuple. + Three-tuples have one of the following forms: {atom, + Info, atom()}, + {char, Info, integer()}, {comment, Info, + string()}, {float, Info, float()}, {integer, + Info, integer()}, {var, Info, atom()}, + and {white_space, Info, string()}.
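The two token shapes described above can be told apart by pattern matching. The following is a small sketch under the assumption that scanning starts from the default location; the module name scan_shapes and its function are hypothetical.

```erlang
-module(scan_shapes).
-export([shapes/1]).

%% Scan String and tag each token by its tuple shape.
%% Two-tuples carry only a category; three-tuples also carry a symbol.
shapes(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    [shape(T) || T <- Tokens].

shape({Category, _Attributes}) ->
    {category_only, Category};
shape({Category, _Attributes, Symbol}) ->
    {with_symbol, Category, Symbol}.
```

Calling scan_shapes:shapes("foo(Bar) -> ok.") tags the punctuation tokens and the dot as category_only, and the atoms and the variable as with_symbol.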

+

The valid options are:

+ + {reserved_word_fun, reserved_word_fun()} +

A callback function that is called when the scanner + has found an unquoted atom. If the function returns + true, the unquoted atom itself will be the category + of the token; if the function returns false, + atom will be the category of the unquoted atom.

+
+ return_comments +

Return comment tokens.

+
+ return_white_spaces +

Return white space tokens. By convention, if there is + a newline character, it is always the first character of the + text (there cannot be more than one newline in a white space + token).

+
+ return +

Short for [return_comments, return_white_spaces].

+
+ text +

Include the token's text in the token attributes. The + text is the part of the input corresponding to the token.

+
+
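As an illustration of the options listed above, the sketch below passes a custom reserved_word_fun and the return option to string/3. The module name scan_options, both function names, and the choice of unless as an extra reserved word are assumptions made for this example only.

```erlang
-module(scan_options).
-export([with_unless/1, categories/1]).

%% Treat the (hypothetical) word `unless` as reserved, in addition to
%% the words that erl_scan:reserved_word/1 already reports as reserved.
with_unless(String) ->
    IsReserved = fun(unless) -> true;
                    (Atom) -> erl_scan:reserved_word(Atom)
                 end,
    erl_scan:string(String, 1, [{reserved_word_fun, IsReserved}]).

%% Scan with the `return` option so that comment and white space
%% tokens are kept, and list the category of every token.
categories(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, 1, [return]),
    [element(1, T) || T <- Tokens].
```

With with_unless/1, the unquoted atom unless becomes its own category and is returned as a two-tuple rather than as an atom token. With categories/1, the result includes comment and white_space entries that are dropped when no option is given.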
+
+
+
+ tokens(Continuation, CharSpec, StartLocation) -> Return
+ tokens(Continuation, CharSpec, StartLocation, Options) -> Return
+ Re-entrant scanner
+
+ Continuation = [] | Continuation1
+ Return = {done, Result, LeftOverChars} | {more, Continuation1}
+ LeftOverChars = CharSpec
+ CharSpec = string() | eof
+ Continuation1 = tuple()
+ Result = {ok, Tokens, EndLocation} | {eof, EndLocation} | Error
+ Tokens = [token()]
+ Error = {error, ErrorInfo, EndLocation}
+ StartLocation = EndLocation = location()
+ Options = Option | [Option]
+ Option = {reserved_word_fun,reserved_word_fun()}
+ | return_comments | return_white_spaces | return
+
+
+

This is the re-entrant scanner which scans characters until + a dot ('.' followed by a white space) or + eof has been reached. It returns:

+ + {done, Result, LeftOverChars} + +

This return indicates that there is sufficient input + data to get a result. Result is:

+ + {ok, Tokens, EndLocation} + +

The scanning was successful. Tokens is the + list of tokens including dot.

+
+ {eof, EndLocation} + +

End of file was encountered before any more tokens.

+
+ {error, ErrorInfo, EndLocation} + +

An error occurred. LeftOverChars contains the remaining + characters of the input data, + starting from EndLocation.

+
+
+
+ {more, Continuation1} + +

More data is required for building a term. + Continuation1 must be passed in a new call to + tokens/3,4 when more data is available.

+
+
+

The CharSpec eof signals end of file. + LeftOverChars will then take the value eof as + well.

+

tokens(Continuation, CharSpec, StartLocation) is + equivalent to tokens(Continuation, CharSpec, + StartLocation, []).

+

See string/3 for a + description of the various options.
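The re-entrant protocol can be sketched as a loop that feeds input in chunks, starting with the continuation [] and passing each returned continuation to the next call. This is a minimal illustration; the module name scan_chunks and the function first_form/1 are hypothetical, and the final clause assumes that passing eof always produces a done result.

```erlang
-module(scan_chunks).
-export([first_form/1]).

%% Feed a list of string chunks to erl_scan:tokens/3 until one
%% complete form (terminated by a dot) has been scanned.
%% Returns {Result, LeftOverChars, UnusedChunks}.
first_form(Chunks) ->
    feed([], Chunks, 1).

feed(Continuation, [Chunk | Rest], StartLocation) ->
    case erl_scan:tokens(Continuation, Chunk, StartLocation) of
        {done, Result, LeftOverChars} ->
            {Result, LeftOverChars, Rest};
        {more, Continuation1} ->
            %% Not enough input yet: continue with the next chunk.
            feed(Continuation1, Rest, StartLocation)
    end;
feed(Continuation, [], StartLocation) ->
    %% Input exhausted: signal end of file, which is assumed here
    %% to always yield a done result.
    {done, Result, LeftOverChars} =
        erl_scan:tokens(Continuation, eof, StartLocation),
    {Result, LeftOverChars, []}.
```

For example, scan_chunks:first_form(["foo(", "bar). "]) needs both chunks before the dot is seen, so the first call returns a continuation and the second call returns the tokens.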

+
+
+
+ reserved_word(Atom) -> bool()
+ Test for a reserved word
+
+ Atom = atom()
+
+
+

Returns true if Atom is an Erlang reserved + word, otherwise false.
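A short sketch of how reserved_word/1 might be used as a predicate; the module name scan_reserved and the function split/1 are hypothetical.

```erlang
-module(scan_reserved).
-export([split/1]).

%% Partition a list of atoms into reserved words and ordinary atoms.
split(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```

For instance, scan_reserved:split(['case', foo, 'end', bar]) places 'case' and 'end' in the first list and foo and bar in the second.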

+
+
+
+ token_info(Token) -> TokenInfo
+ Return information about a token
+
+ Token = token()
+ TokenInfo = [TokenInfoTuple]
+ TokenInfoTuple = {TokenItem, Info}
+ TokenItem = atom()
+ Info = term()
+
+
+

Returns a list containing information about the token + Token. The order of the TokenInfoTuples is not + defined. The following TokenItems are returned: + category, column, length, + line, symbol, and text. See token_info/2 for + information about specific + TokenInfoTuples.

+

Note that if token_info(Token, TokenItem) returns + undefined for some TokenItem in the list above, the + item is not included in TokenInfo.

+
+
+
+ token_info(Token, TokenItemSpec) -> TokenInfo
+ Return information about a token
+
+ Token = token()
+ TokenItemSpec = TokenItem | [TokenItem]
+ TokenInfo = TokenInfoTuple | undefined | [TokenInfoTuple]
+ TokenInfoTuple = {TokenItem, Info}
+ TokenItem = atom()
+ Info = term()
+
+
+

Returns information about the token + Token. If TokenItemSpec is a single + TokenItem, the returned value is the corresponding + TokenInfoTuple, or undefined if the + TokenItem has no value. If TokenItemSpec is a + list of + TokenItems, the result is a list of + TokenInfoTuples. The TokenInfoTuples + appear in the same order as the corresponding + TokenItems appeared in the list of TokenItems. TokenItems + with no value are not included in the list of + TokenInfoTuples.

+

The following TokenInfoTuples with corresponding + TokenItems are valid:

+ + {category, category()} +

The category of the token.

+
+ {column, column()} +

The column where the token begins.

+
+ {length, integer() > 0} +

The length of the token's text.

+
+ {line, line()} +

The line where the token begins.

+
+ {location, location()} +

The line and column where the token begins, or + just the line if the column is unknown.

+
+ {symbol, symbol()} +

The token's symbol.

+
+ {text, string()} +

The token's text.

+
+
+
+
+
+ attributes_info(Attributes) -> AttributesInfo
+ Return information about token attributes
+
+ Attributes = attributes()
+ AttributesInfo = [AttributeInfoTuple]
+ AttributeInfoTuple = {AttributeItem, Info}
+ AttributeItem = atom()
+ Info = term()
+
+
+

Returns a list containing information about the token + attributes Attributes. The order of the + AttributeInfoTuples is not defined. The following + AttributeItems are returned: + column, length, line, and text. + See attributes_info/2 for + information about specific + AttributeInfoTuples.

+

Note that if attributes_info(Attributes, AttributeItem) + returns undefined for some AttributeItem in + the list above, the item is not included in + AttributesInfo.

+
+
+
+ attributes_info(Attributes, AttributeItemSpec) -> AttributesInfo
+ Return information about token attributes
+
+ Attributes = attributes()
+ AttributeItemSpec = AttributeItem | [AttributeItem]
+ AttributesInfo = AttributeInfoTuple | undefined
+ | [AttributeInfoTuple]
+ AttributeInfoTuple = {AttributeItem, Info}
+ AttributeItem = atom()
+ Info = term()
+
+
+

Returns information about the token + attributes Attributes. If AttributeItemSpec is + a single AttributeItem, the returned value is the + corresponding AttributeInfoTuple, or undefined + if the AttributeItem has no value. If + AttributeItemSpec is a list of + AttributeItems, the result is a list of + AttributeInfoTuples. The AttributeInfoTuples + appear in the same order as the corresponding + AttributeItems appeared in the + list of AttributeItems. AttributeItems with no + value are not included in the list of + AttributeInfoTuples.

+

The following AttributeInfoTuples with corresponding + AttributeItems are valid:

+ + {column, column()} +

The column where the token begins.

+
+ {length, integer() > 0} +

The length of the token's text.

+
+ {line, line()} +

The line where the token begins.

+
+ {location, location()} +

The line and column where the token begins, or + just the line if the column is unknown.

+
+ {text, string()} +

The token's text.

+
+
+
+
+
+ set_attribute(AttributeItem, Attributes, SetAttributeFun) -> attributes()
+ Set a token attribute value
+
+ AttributeItem = line
+ Attributes = attributes()
+ SetAttributeFun = set_attribute_fun()
+
+
+

Sets the value of the line attribute of the token + attributes Attributes.

+

The SetAttributeFun is called with the value of + the line attribute, and is to return the new value of + the line attribute.

+
+
+
+ format_error(ErrorDescriptor) -> string()
+ Format an error descriptor
+
+ ErrorDescriptor = errordesc()
+
+
+

Takes an ErrorDescriptor and returns a string which + describes the error or warning. This function is usually + called implicitly when processing an ErrorInfo + structure (see below).

+
+
+
+ +
+ Error Information +

The ErrorInfo mentioned above is the standard + ErrorInfo structure which is returned from all IO + modules. It has the following format:

+ +{ErrorLocation, Module, ErrorDescriptor} +

A string which describes the error is obtained with the + following call:

+ +Module:format_error(ErrorDescriptor) +
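The call above can be combined with string/1 to turn a scan error into a readable message. The following is a sketch only; the module name scan_errors and the function check/1 are hypothetical, and lists:flatten/1 is applied on the assumption that the returned string may be a deep list.

```erlang
-module(scan_errors).
-export([check/1]).

%% Scan String; on failure, unpack the ErrorInfo tuple and ask the
%% reporting module for a description of the error.
check(String) ->
    case erl_scan:string(String) of
        {ok, Tokens, _EndLocation} ->
            {ok, Tokens};
        {error, {ErrorLocation, Module, ErrorDescriptor}, _EndLocation} ->
            Message = lists:flatten(Module:format_error(ErrorDescriptor)),
            {error, ErrorLocation, Message}
    end.
```

An input with an unterminated quoted atom, such as "'oops", is one way to reach the error branch.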
+ +
+ Notes +

The continuation of the first call to the re-entrant input + functions must be []. Refer to Armstrong, Virding and + Williams, 'Concurrent Programming in Erlang', Chapter 13, for a + complete description of how the re-entrant input scheme works.

+
+ +
+ See Also +

io(3), + erl_parse(3)

+
+