Diffstat (limited to 'lib/stdlib/doc/src/erl_scan.xml')
-rw-r--r-- | lib/stdlib/doc/src/erl_scan.xml | 417 |
1 files changed, 417 insertions, 0 deletions
diff --git a/lib/stdlib/doc/src/erl_scan.xml b/lib/stdlib/doc/src/erl_scan.xml new file mode 100644 index 0000000000..4175146c3c --- /dev/null +++ b/lib/stdlib/doc/src/erl_scan.xml @@ -0,0 +1,417 @@ +<?xml version="1.0" encoding="latin1" ?> +<!DOCTYPE erlref SYSTEM "erlref.dtd"> + +<erlref> + <header> + <copyright> + <year>1996</year><year>2009</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>erl_scan</title> + <prepared>Robert Virding</prepared> + <responsible>Bjarne Däcker</responsible> + <docno>1</docno> + <approved>Bjarne Däcker</approved> + <checked></checked> + <date>97-01-24</date> + <rev>B</rev> + <file>erl_scan.sgml</file> + </header> + <module>erl_scan</module> + <modulesummary>The Erlang Token Scanner</modulesummary> + <description> + <p>This module contains functions for tokenizing characters into + Erlang tokens.</p> + </description> + <section> + <title>DATA TYPES</title> + <code type="none"> +category() = atom() +column() = integer() > 0 +line() = integer() +location() = line() | {line(), column()} +reserved_word_fun() -> fun(atom()) -> bool() +set_attribute_fun() -> fun(term()) -> term() +symbol() = atom() | float() | integer() | string() +token() = {category(), attributes()} | {category(), attributes(), symbol()} +attributes() = line() | list() | tuple()</code> + </section> + <funcs> + <func> + <name>string(String) -> 
Return</name> + <name>string(String, StartLocation) -> Return</name> + <name>string(String, StartLocation, Options) -> Return</name> + <fsummary>Scan a string and return the Erlang tokens</fsummary> + <type> + <v>String = string()</v> + <v>Return = {ok, Tokens, EndLocation} | Error</v> + <v>Tokens = [token()]</v> + <v>Error = {error, ErrorInfo, EndLocation}</v> + <v>StartLocation = EndLocation = location()</v> + <v>Options = Option | [Option]</v> + <v>Option = {reserved_word_fun,reserved_word_fun()} + | return_comments | return_white_spaces | return + | text</v> + </type> + <desc> + <p>Takes the list of characters <c>String</c> and tries to + scan (tokenize) them. Returns <c>{ok, Tokens, EndLocation}</c>, + where <c>Tokens</c> are the Erlang tokens from + <c>String</c>. <c>EndLocation</c> is the first location + after the last token.</p> + <p><c>{error, ErrorInfo, EndLocation}</c> is returned if an + error occurs. <c>EndLocation</c> is the first location after + the erroneous token.</p> + <p><c>string(String)</c> is equivalent to + <c>string(String, 1)</c>, and <c>string(String, + StartLocation)</c> is equivalent to <c>string(String, + StartLocation, [])</c>.</p> + <p><c>StartLocation</c> indicates the initial location when + scanning starts. If <c>StartLocation</c> is a line, + <c>attributes()</c> as well as <c>EndLocation</c> and + <c>ErrorLocation</c> will be lines. If + <c>StartLocation</c> is a pair of a line and a column, + <c>attributes()</c> takes the form of an opaque compound + data type, and <c>EndLocation</c> and <c>ErrorLocation</c> + will be pairs of a line and a column. 
The <em>token + attributes</em> contain information about the column and the + line where the token begins, as well as the text of the + token (if the <c>text</c> option is given), all of which can + be accessed by calling <seealso + marker="#token_info/1">token_info/1,2</seealso> or <seealso + marker="#attributes_info/1">attributes_info/1,2</seealso>.</p> + <p>A <em>token</em> is a tuple containing information about + syntactic category, the token attributes, and the actual + terminal symbol. For punctuation characters (e.g. <c>;</c>, + <c>|</c>) and reserved words, the category and the symbol + coincide, and the token is represented by a two-tuple. + Three-tuples have one of the following forms: <c>{atom, + Info, atom()}</c>, + <c>{char, Info, integer()}</c>, <c>{comment, Info, + string()}</c>, <c>{float, Info, float()}</c>, <c>{integer, + Info, integer()}</c>, <c>{var, Info, atom()}</c>, + and <c>{white_space, Info, string()}</c>.</p> + <p>The valid options are:</p> + <taglist> + <tag><c>{reserved_word_fun, reserved_word_fun()}</c></tag> + <item><p>A callback function that is called when the scanner + has found an unquoted atom. If the function returns + <c>true</c>, the unquoted atom itself will be the category + of the token; if the function returns <c>false</c>, + <c>atom</c> will be the category of the unquoted atom.</p> + </item> + <tag><c>return_comments</c></tag> + <item><p>Return comment tokens.</p> + </item> + <tag><c>return_white_spaces</c></tag> + <item><p>Return white space tokens. By convention, if there is + a newline character, it is always the first character of the + text (there cannot be more than one newline in a white space + token).</p> + </item> + <tag><c>return</c></tag> + <item><p>Short for <c>[return_comments, return_white_spaces]</c>.</p> + </item> + <tag><c>text</c></tag> + <item><p>Include the token's text in the token attributes. 
The + text is the part of the input corresponding to the token.</p> + </item> + </taglist> + </desc> + </func> + <func> + <name>tokens(Continuation, CharSpec, StartLocation) -> Return</name> + <name>tokens(Continuation, CharSpec, StartLocation, Options) -> Return</name> + <fsummary>Re-entrant scanner</fsummary> + <type> + <v>Continuation = [] | Continuation1</v> + <v>Return = {done, Result, LeftOverChars} | {more, Continuation1}</v> + <v>LeftOverChars = CharSpec</v> + <v>CharSpec = string() | eof</v> + <v>Continuation1 = tuple()</v> + <v>Result = {ok, Tokens, EndLocation} | {eof, EndLocation} | Error</v> + <v>Tokens = [token()]</v> + <v>Error = {error, ErrorInfo, EndLocation}</v> + <v>StartLocation = EndLocation = location()</v> + <v>Options = Option | [Option]</v> + <v>Option = {reserved_word_fun,reserved_word_fun()} + | return_comments | return_white_spaces | return</v> + </type> + <desc> + <p>This is the re-entrant scanner which scans characters until + a <em>dot</em> ('.' followed by a white space) or + <c>eof</c> has been reached. It returns:</p> + <taglist> + <tag><c>{done, Result, LeftOverChars}</c></tag> + <item> + <p>This return indicates that there is sufficient input + data to get a result. <c>Result</c> is:</p> + <taglist> + <tag><c>{ok, Tokens, EndLocation}</c></tag> + <item> + <p>The scanning was successful. <c>Tokens</c> is the + list of tokens including <em>dot</em>.</p> + </item> + <tag><c>{eof, EndLocation}</c></tag> + <item> + <p>End of file was encountered before any more tokens.</p> + </item> + <tag><c>{error, ErrorInfo, EndLocation}</c></tag> + <item> + <p>An error occurred. <c>LeftOverChars</c> is the remaining + characters of the input data, + starting from <c>EndLocation</c>.</p> + </item> + </taglist> + </item> + <tag><c>{more, Continuation1}</c></tag> + <item> + <p>More data is required for building a term. 
+ <c>Continuation1</c> must be passed in a new call to + <c>tokens/3,4</c> when more data is available.</p> + </item> + </taglist> + <p>The <c>CharSpec</c> <c>eof</c> signals end of file. + <c>LeftOverChars</c> will then take the value <c>eof</c> as + well.</p> + <p><c>tokens(Continuation, CharSpec, StartLocation)</c> is + equivalent to <c>tokens(Continuation, CharSpec, + StartLocation, [])</c>.</p> + <p>See <seealso marker="#string/3">string/3</seealso> for a + description of the various options.</p> + </desc> + </func> + <func> + <name>reserved_word(Atom) -> bool()</name> + <fsummary>Test for a reserved word</fsummary> + <type> + <v>Atom = atom()</v> + </type> + <desc> + <p>Returns <c>true</c> if <c>Atom</c> is an Erlang reserved + word, otherwise <c>false</c>.</p> + </desc> + </func> + <func> + <name>token_info(Token) -> TokenInfo</name> + <fsummary>Return information about a token</fsummary> + <type> + <v>Token = token()</v> + <v>TokenInfo = [TokenInfoTuple]</v> + <v>TokenInfoTuple = {TokenItem, Info}</v> + <v>TokenItem = atom()</v> + <v>Info = term()</v> + </type> + <desc> + <p>Returns a list containing information about the token + <c>Token</c>. The order of the <c>TokenInfoTuple</c>s is not + defined. The following <c>TokenItem</c>s are returned: + <c>category</c>, <c>column</c>, <c>length</c>, + <c>line</c>, <c>symbol</c>, and <c>text</c>. 
See <seealso + marker="#token_info/2">token_info/2</seealso> for + information about specific + <c>TokenInfoTuple</c>s.</p> + <p>Note that if <c>token_info(Token, TokenItem)</c> returns + <c>undefined</c> for some <c>TokenItem</c> in the list above, the + item is not included in <c>TokenInfo</c>.</p> + </desc> + </func> + <func> + <name>token_info(Token, TokenItemSpec) -> TokenInfo</name> + <fsummary>Return information about a token</fsummary> + <type> + <v>Token = token()</v> + <v>TokenItemSpec = TokenItem | [TokenItem]</v> + <v>TokenInfo = TokenInfoTuple | undefined | [TokenInfoTuple]</v> + <v>TokenInfoTuple = {TokenItem, Info}</v> + <v>TokenItem = atom()</v> + <v>Info = term()</v> + </type> + <desc> + <p>Returns a list containing information about the token + <c>Token</c>. If <c>TokenItemSpec</c> is a single + <c>TokenItem</c>, the returned value is the corresponding + <c>TokenInfoTuple</c>, or <c>undefined</c> if the + <c>TokenItem</c> has no value. If <c>TokenItemSpec</c> is a + list of + <c>TokenItem</c>, the result is a list of + <c>TokenInfoTuple</c>. The <c>TokenInfoTuple</c>s will + appear with the corresponding + <c>TokenItem</c>s in the same order as the <c>TokenItem</c>s + appeared in the list of <c>TokenItem</c>s. 
<c>TokenItem</c>s + with no value are not included in the list of + <c>TokenInfoTuple</c>.</p> + <p>The following <c>TokenInfoTuple</c>s with corresponding + <c>TokenItem</c>s are valid:</p> + <taglist> + <tag><c>{category, category()}</c></tag> + <item><p>The category of the token.</p> + </item> + <tag><c>{column, column()}</c></tag> + <item><p>The column where the token begins.</p> + </item> + <tag><c>{length, integer() > 0}</c></tag> + <item><p>The length of the token's text.</p> + </item> + <tag><c>{line, line()}</c></tag> + <item><p>The line where the token begins.</p> + </item> + <tag><c>{location, location()}</c></tag> + <item><p>The line and column where the token begins, or + just the line if the column is unknown.</p> + </item> + <tag><c>{symbol, symbol()}</c></tag> + <item><p>The token's symbol.</p> + </item> + <tag><c>{text, string()}</c></tag> + <item><p>The token's text.</p> + </item> + </taglist> + </desc> + </func> + <func> + <name>attributes_info(Attributes) -> AttributesInfo</name> + <fsummary>Return information about token attributes</fsummary> + <type> + <v>Attributes = attributes()</v> + <v>AttributesInfo = [AttributeInfoTuple]</v> + <v>AttributeInfoTuple = {AttributeItem, Info}</v> + <v>AttributeItem = atom()</v> + <v>Info = term()</v> + </type> + <desc> + <p>Returns a list containing information about the token + attributes <c>Attributes</c>. The order of the + <c>AttributeInfoTuple</c>s is not defined. The following + <c>AttributeItem</c>s are returned: + <c>column</c>, <c>length</c>, <c>line</c>, and <c>text</c>. 
+ See <seealso + marker="#attributes_info/2">attributes_info/2</seealso> for + information about specific + <c>AttributeInfoTuple</c>s.</p> + <p>Note that if <c>attributes_info(Attributes, AttributeItem)</c> + returns <c>undefined</c> for some <c>AttributeItem</c> in + the list above, the item is not included in + <c>AttributesInfo</c>.</p> + </desc> + </func> + <func> + <name>attributes_info(Attributes, AttributeItemSpec) -> AttributesInfo</name> + <fsummary>Return information about token attributes</fsummary> + <type> + <v>Attributes = attributes()</v> + <v>AttributeItemSpec = AttributeItem | [AttributeItem]</v> + <v>AttributesInfo = AttributeInfoTuple | undefined + | [AttributeInfoTuple]</v> + <v>AttributeInfoTuple = {AttributeItem, Info}</v> + <v>AttributeItem = atom()</v> + <v>Info = term()</v> + </type> + <desc> + <p>Returns a list containing information about the token + attributes <c>Attributes</c>. If <c>AttributeItemSpec</c> is + a single <c>AttributeItem</c>, the returned value is the + corresponding <c>AttributeInfoTuple</c>, or <c>undefined</c> + if the <c>AttributeItem</c> has no value. If + <c>AttributeItemSpec</c> is a list of + <c>AttributeItem</c>, the result is a list of + <c>AttributeInfoTuple</c>. The <c>AttributeInfoTuple</c>s + will appear with the corresponding <c>AttributeItem</c>s in + the same order as the <c>AttributeItem</c>s appeared in the + list of <c>AttributeItem</c>s. 
<c>AttributeItem</c>s with no + value are not included in the list of + <c>AttributeInfoTuple</c>.</p> + <p>The following <c>AttributeInfoTuple</c>s with corresponding + <c>AttributeItem</c>s are valid:</p> + <taglist> + <tag><c>{column, column()}</c></tag> + <item><p>The column where the token begins.</p> + </item> + <tag><c>{length, integer() > 0}</c></tag> + <item><p>The length of the token's text.</p> + </item> + <tag><c>{line, line()}</c></tag> + <item><p>The line where the token begins.</p> + </item> + <tag><c>{location, location()}</c></tag> + <item><p>The line and column where the token begins, or + just the line if the column is unknown.</p> + </item> + <tag><c>{text, string()}</c></tag> + <item><p>The token's text.</p> + </item> + </taglist> + </desc> + </func> + <func> + <name>set_attribute(AttributeItem, Attributes, SetAttributeFun) -> AttributesInfo</name> + <fsummary>Set a token attribute value</fsummary> + <type> + <v>AttributeItem = line</v> + <v>Attributes = attributes()</v> + <v>SetAttributeFun = set_attribute_fun()</v> + </type> + <desc> + <p>Sets the value of the <c>line</c> attribute of the token + attributes <c>Attributes</c>.</p> + <p>The <c>SetAttributeFun</c> is called with the value of + the <c>line</c> attribute, and is to return the new value of + the <c>line</c> attribute.</p> + </desc> + </func> + <func> + <name>format_error(ErrorDescriptor) -> string()</name> + <fsummary>Format an error descriptor</fsummary> + <type> + <v>ErrorDescriptor = errordesc()</v> + </type> + <desc> + <p>Takes an <c>ErrorDescriptor</c> and returns a string which + describes the error or warning. This function is usually + called implicitly when processing an <c>ErrorInfo</c> + structure (see below).</p> + </desc> + </func> + </funcs> + + <section> + <title>Error Information</title> + <p>The <c>ErrorInfo</c> mentioned above is the standard + <c>ErrorInfo</c> structure which is returned from all IO + modules. 
It has the following format:</p> + <code type="none"> +{ErrorLocation, Module, ErrorDescriptor}</code> + <p>A string which describes the error is obtained with the + following call:</p> + <code type="none"> +Module:format_error(ErrorDescriptor)</code> + </section> + + <section> + <title>Notes</title> + <p>The continuation of the first call to the re-entrant input + functions must be <c>[]</c>. Refer to Armstrong, Virding and + Williams, 'Concurrent Programming in Erlang', Chapter 13, for a + complete description of how the re-entrant input scheme works.</p> + </section> + + <section> + <title>See Also</title> + <p><seealso marker="io">io(3)</seealso>, + <seealso marker="erl_parse">erl_parse(3)</seealso></p> + </section> +</erlref>