From 68d53c01b0b8e9a007a6a30158c19e34b2d2a34e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Gustavsson?= Functional, extendible arrays. Arrays can have fixed size, or
-can grow automatically as needed. A default value is used for entries
-that have not been explicitly set. Arrays uses zero based indexing. This is a deliberate design
-choice and differs from other erlang datastructures, e.g. tuples. Unless specified by the user when the array is created, the default
- value is the atom The array never shrinks automatically; if an index Examples:
- A functional, extendible array. The representation is
- not documented and is subject to change without notice. Note that
- arrays cannot be directly compared for equality. Get the value used for uninitialized entries.
- See also: Fix the size of the array. This prevents it from growing
- automatically upon insertion; see also See also: Fold the elements of the array using the given function and
- initial accumulator value. The elements are visited in order from the
- lowest index to the highest. If See also: Fold the elements of the array right-to-left using the given
- function and initial accumulator value. The elements are visited in
- order from the highest index to the lowest. If See also: Equivalent to Convert a list to an extendible array. See also: Equivalent to Convert an ordered list of pairs See also: Get the value of entry If the array does not have fixed size, this function will return the
- default value for any index See also: Returns Check if the array has fixed size.
- Returns See also: Map the given function onto each element of the array. The
- elements are visited in order from the lowest index to the highest.
- If See also: Create a new, extendible array with initial size zero. See also: Create a new array according to the given options. By default,
-the array is extendible and has initial size zero. Array indices
-start at 0. Specifies the initial size of the array; this also implies
- Creates a fixed-size array; see also Creates an extendible (non fixed-size) array. Sets the default value for the array to
-Options are processed in the order they occur in the list, i.e.,
-later options have higher precedence. The default value is used as the value of uninitialized entries, and
-cannot be changed once the array has been created. Examples:
- creates a fixed-size array of size 100.
- creates an empty, extendible array
- whose default value is 0.
- creates an
- extendible array with initial size 10 whose default value is -1.
- See also: Create a new array according to the given size and options. If
- If Example:
- creates a fixed-size array of size
- 100, whose default value is 0.
- See also: Make the array resizable. (Reverses the effects of See also: Reset entry If See also: Change the size of the array to that reported by See also: Change the size of the array. If Set entry If the array does not have fixed size, and See also: Get the number of entries in the array. Entries are numbered
- from 0 to See also: Fold the elements of the array using the given function and
- initial accumulator value, skipping default-valued entries. The
- elements are visited in order from the lowest index to the highest.
- If See also: Fold the elements of the array right-to-left using the given
- function and initial accumulator value, skipping default-valued
- entries. The elements are visited in order from the highest index to
- the lowest. If See also: Map the given function onto each element of the array, skipping
- default-valued entries. The elements are visited in order from the
- lowest index to the highest. If See also: Get the number of entries in the array up until the last
- non-default valued entry. In other words, returns See also: Converts the array to a list, skipping default-valued entries.
- See also: Convert the array to an ordered list of pairs See also: Converts the array to a list.
- See also: Convert the array to an ordered list of pairs See also: Functional, extendible arrays. Arrays can have fixed size, or can grow
+ automatically as needed. A default value is used for entries that have not
+ been explicitly set. Arrays use zero-based indexing. This is a deliberate design
+ choice and differs from other Erlang data structures, for example,
+ tuples. Unless specified by the user when the array is created, the default
+ value is the atom The array never shrinks automatically. If an index Examples: Create a fixed-size array with entries 0-9 set to Create an extendible array and set entry 17 to Read back a stored value: Accessing an unset entry returns the default value: Accessing an entry beyond the last set entry also returns the default
+ value, if the array does not have fixed size: "Sparse" functions ignore default-valued entries: An extendible array can be made fixed-size later: A fixed-size array does not grow automatically and does not allow
+ accesses beyond the last set entry: A functional, extendible array. The representation is not documented
+ and is subject to change without notice. Notice that arrays cannot be
+ directly compared for equality. Gets the value used for uninitialized entries. See also Fixes the array size. This prevents it from growing automatically
+ upon insertion. See also Folds the array elements using the specified function and initial
+ accumulator value. The elements are visited in order from the lowest
+ index to the highest. If See also Folds the array elements right-to-left using the specified function
+ and initial accumulator value. The elements are visited in order from
+ the highest index to the lowest. If See also Equivalent to
+ Converts a list to an extendible array. See also Equivalent to
+ Converts an ordered list of pairs See also Gets the value of entry If the array does not have fixed size, the default value for any
+ index See also Returns Checks if the array has fixed size. Returns See also Maps the specified function onto each array element. The elements are
+ visited in order from the lowest index to the highest. If
+ See also Creates a new, extendible array with initial size zero. See also Creates a new array according to the specified options. By default,
+ the array is extendible and has initial size zero. Array indices
+ start at Specifies the initial array size; this also implies
+ Creates a fixed-size array. See also
+ Creates an extendible (non-fixed-size) array. Sets the default value for the array to Options are processed in the order they occur in the list, that is,
+ later options have higher precedence. The default value is used as the value of uninitialized entries, and
+ cannot be changed once the array has been created. Examples: creates a fixed-size array of size 100. creates an empty, extendible array whose default value is creates an extendible array with initial size 10 whose default value
+ is See also Creates a new array according to the specified size and options. If
+ If Example: creates a fixed-size array of size 100, whose default value is
+ See also Makes the array resizable. (Reverses the effects of
+ See also Resets entry If See also Changes the array size to that reported by
+ See also Changes the array size. If Sets entry If the array does not have fixed size, and See also Gets the number of entries in the array. Entries are numbered from
+ See also Folds the array elements using the specified function and initial
+ accumulator value, skipping default-valued entries. The elements are
+ visited in order from the lowest index to the highest. If
+ See also Folds the array elements right-to-left using the specified
+ function and initial accumulator value, skipping default-valued
+ entries. The elements are visited in order from the highest index to
+ the lowest. If See also Maps the specified function onto each array element, skipping
+ default-valued entries. The elements are visited in order from the
+ lowest index to the highest. If See also Gets the number of entries in the array up until the last
+ non-default-valued entry. That is, returns See also Converts the array to a list, skipping default-valued entries. See also Converts the array to an ordered list of pairs See also
+ Converts the array to a list. See also Converts the array to an ordered list of pairs See also
+ The include file These macros are defined in the Stdlib include file
- %% Create a fixed-size array with entries 0-9 set to 'undefined'
- A0 = array:new(10).
- 10 = array:size(A0).
-
- %% Create an extendible array and set entry 17 to 'true',
- %% causing the array to grow automatically
- A1 = array:set(17, true, array:new()).
- 18 = array:size(A1).
-
- %% Read back a stored value
- true = array:get(17, A1).
-
- %% Accessing an unset entry returns the default value
- undefined = array:get(3, A1).
-
- %% Accessing an entry beyond the last set entry also returns the
- %% default value, if the array does not have fixed size
- undefined = array:get(18, A1).
-
- %% "sparse" functions ignore default-valued entries
- A2 = array:set(4, false, A1).
- [{4, false}, {17, true}] = array:sparse_to_orddict(A2).
-
- %% An extendible array can be made fixed-size later
- A3 = array:fix(A2).
-
- %% A fixed-size array does not grow automatically and does not
- %% allow accesses beyond the last set entry
- {'EXIT',{badarg,_}} = (catch array:set(18, true, A3)).
- {'EXIT',{badarg,_}} = (catch array:get(18, A3)). array:new(100)
array:new({default,0}) array:new([{size,10},{fixed,false},{default,-1}]) array:new(100, {default,0})
+A0 = array:new(10).
+10 = array:size(A0).
+
+
+A1 = array:set(17, true, array:new()).
+18 = array:size(A1).
+
+
+true = array:get(17, A1).
+
+
+undefined = array:get(3, A1).
+
+
+undefined = array:get(18, A1).
+
+
+A2 = array:set(4, false, A1).
+[{4, false}, {17, true}] = array:sparse_to_orddict(A2).
+
+
+A3 = array:fix(A2).
+
+
+{'EXIT',{badarg,_}} = (catch array:set(18, true, A3)).
+{'EXIT',{badarg,_}} = (catch array:get(18, A3)).
+
+array:new(100)
+
+array:new({default,0})
+
+array:new([{size,10},{fixed,false},{default,-1}])
+
+array:new(100, {default,0})
+
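The array options and the "sparse" functions described above can be tried together in a small module. This is only a sketch: the module name array_sketch and the function names sum_set_entries/0 and options/0 are hypothetical and chosen for illustration. It uses only array:new/1, array:set/3, array:size/1, array:sparse_size/1, array:sparse_foldl/3, array:is_fix/1, array:default/1, and array:get/2.

```erlang
-module(array_sketch).
-export([sum_set_entries/0, options/0]).

%% Sum only the explicitly set entries of an extendible array
%% whose default value is 0.
sum_set_entries() ->
    A0 = array:new({default, 0}),
    A1 = array:set(2, 10, A0),
    A2 = array:set(5, 32, A1),
    %% sparse_foldl/3 skips default-valued entries. The fun receives
    %% the index, the value, and the accumulator.
    Sum = array:sparse_foldl(fun(_Index, Value, Acc) -> Value + Acc end,
                             0, A2),
    {array:size(A2), array:sparse_size(A2), Sum}.

%% Options are processed in order, so {fixed,false} overrides the
%% fixed size implied by {size,10}.
options() ->
    A = array:new([{size, 10}, {fixed, false}, {default, -1}]),
    {array:size(A), array:is_fix(A), array:default(A), array:get(3, A)}.
```

With these definitions, array_sketch:sum_set_entries() returns {6,6,42} and array_sketch:options() returns {10,false,-1,-1}.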
+ assertions in your program code.
Include the following directive in the module from which the function is + called:
+ +
-include_lib("stdlib/include/assert.hrl").
- When an assertion succeeds, the assert macro yields the atom
-
If the macro
When an assertion succeeds, the assert macro yields the atom
If the macro
For example, using
+ disable all assertions:
+
+
erlc -DNOASSERT=true *.erl
- (The value of NOASSERT does not matter, only the fact that it
- is defined.)
+
+ The value of NOASSERT does not matter, only the fact that it is
+ defined.
+
A few other macros also have effect on the enabling or disabling of
- assertions:
+ assertions:
+
- - If
NODEBUG is defined, it implies NOASSERT , unless
- DEBUG is also defined, which is assumed to take
- precedence.
- - If
ASSERT is defined, it overrides NOASSERT , that
- is, the assertions will remain enabled.
+ If NODEBUG is defined, it implies NOASSERT , unless
+ DEBUG is also defined, which is assumed to take precedence.
+
+ If ASSERT is defined, it overrides NOASSERT , that
+ is, the assertions remain enabled.
- If you prefer, you can thus use only DEBUG /NODEBUG as
- the main flags to control the behaviour of the assertions (which is
- useful if you have other compiler conditionals or debugging macros
- controlled by those flags), or you can use ASSERT /NOASSERT
- to control only the assert macros.
+ If you prefer, you can thus use only DEBUG /NODEBUG as the
+ main flags to control the behavior of the assertions (which is useful if
+ you have other compiler conditionals or debugging macros controlled by
+ those flags), or you can use ASSERT /NOASSERT to control only
+ the assert macros.
Macros
assert(BoolExpr)
- Tests that BoolExpr completes normally returning
- true .
+ -
+
Tests that BoolExpr completes normally returning
+ true .
-
assertNot(BoolExpr)
- Tests that BoolExpr completes normally returning
- false .
+ -
+
Tests that BoolExpr completes normally returning
+ false .
-
assertMatch(GuardedPattern, Expr)
- Tests that Expr completes normally yielding a value
- that matches GuardedPattern . For example:
+ -
+
Tests that Expr completes normally yielding a value that
+ matches GuardedPattern , for example:
- ?assertMatch({bork, _}, f())
- Note that a guard when ... can be included:
+?assertMatch({bork, _}, f())
+ Notice that a guard
- ?assertMatch({bork, X} when X > 0, f())
+?assertMatch({bork, X} when X > 0, f())
-
Tests that
As in
Tests that
As in
Tests that
Tests that
Tests that
Tests that
Tests that
Note that both
Tests that
Notice that both
Tests that
As in
Tests that
As in
Equivalent to
Equivalent to
Equivalent to
Equivalent to
Equivalent to
Equivalent to
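As an illustration of the macros above, the following sketch uses assert and assertMatch with a guard. The module name assert_sketch and the functions check/1 and wrap/1 are hypothetical. The sketch assumes that assertions are enabled, that is, that neither NOASSERT nor NODEBUG is defined at compile time.

```erlang
-module(assert_sketch).
-export([check/1]).

-include_lib("stdlib/include/assert.hrl").

%% Returns ok if Value is a positive integer; otherwise one of the
%% assert macros raises an error.
check(Value) ->
    ?assert(is_integer(Value)),
    %% A guard can follow the pattern, as in a case clause.
    ?assertMatch({ok, N} when N > 0, wrap(Value)),
    ok.

%% Hypothetical helper standing in for the expression under test.
wrap(Value) ->
    {ok, Value}.
```

Here assert_sketch:check(5) returns ok, while assert_sketch:check(0) fails in assertMatch because the guard N > 0 does not hold.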
Implements base 64 encode and decode, see RFC2045.
+Provides base64 encode and decode, see
+
A
A
Encodes a plain ASCII string into base64. The result will - be 33% larger than the data.
-Decodes a base64 encoded string to plain ASCII. See RFC4648.
-
Decodes a base64-encoded string to plain ASCII. See
+
Encodes a plain ASCII string into base64. The result is 33% larger + than the data.
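The size increase can be observed with a simple round trip. The following is a sketch only; the module name base64_sketch and the function round_trip/1 are hypothetical, and only base64:encode/1 and base64:decode/1 are used.

```erlang
-module(base64_sketch).
-export([round_trip/1]).

%% Encode a binary, decode it again, and report the encoded form,
%% both sizes, and whether the round trip preserved the data.
round_trip(Data) when is_binary(Data) ->
    Encoded = base64:encode(Data),
    Decoded = base64:decode(Encoded),
    {Encoded, byte_size(Data), byte_size(Encoded), Decoded =:= Data}.
```

For example, base64_sketch:round_trip(<<"erlang">>) encodes 6 bytes into the 8 bytes <<"ZXJsYW5n">>, one third more than the input.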
This module provides an interface to files created by + the BEAM Compiler ("BEAM files"). The format used, a variant of "EA IFF 1985" Standard for Interchange Format Files, divides data into chunks.
+Chunk data can be returned as binaries or as compound terms. Compound terms are returned when chunks are referenced by names - (atoms) rather than identifiers (strings). The names recognized - and the corresponding identifiers are:
+ (atoms) rather than identifiers (strings). The recognized names + and the corresponding identifiers are as follows: +The option
Option
Source code can be reconstructed from the debug information. - Use encrypted debug information (see below) to prevent this.
+ To prevent this, use encrypted debug information (see below).The debug information can also be removed from BEAM files
- using
Here is an example of how to reconstruct source code from
- the debug information in a BEAM file
- {ok,{_,[{abstract_code,{_,AC}}]}} = beam_lib:chunks(Beam,[abstract_code]).
- io:fwrite("~s~n", [erl_prettypr:format(erl_syntax:form_list(AC))]).
- The debug information can be encrypted in order to keep - the source code secret, but still being able to use tools such as - Xref or Debugger.
-To use encrypted debug information, a key must be provided to
- the compiler and
The default type -- and currently the only type -- of crypto
- algorithm is
As far as we know by the time of writing, it is
- infeasible to break
There are two ways to provide the key:
-Use the compiler option
If no such fun is registered,
Store the key in a text file named
In this case, the compiler option
The following example shows how to reconstruct source code from
+ the debug information in a BEAM file
+{ok,{_,[{abstract_code,{_,AC}}]}} = beam_lib:chunks(Beam,[abstract_code]).
+io:fwrite("~s~n", [erl_prettypr:format(erl_syntax:form_list(AC))]).
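Chunks other than abstract_code are read the same way. The following sketch reads the exports chunk by name, which returns a compound term. The module name beam_lib_sketch and the function exports_of/1 are hypothetical, and the sketch assumes that the module is loaded from a BEAM file on disk, so that code:which/1 returns a filename.

```erlang
-module(beam_lib_sketch).
-export([exports_of/1]).

%% Return the list of {Function, Arity} pairs exported by Module,
%% read from the BEAM file instead of from the loaded code.
exports_of(Module) when is_atom(Module) ->
    %% Assumption: code:which/1 returns a path here, not an atom
    %% such as preloaded or non_existing.
    Beam = code:which(Module),
    {ok, {Module, [{exports, Exports}]}} =
        beam_lib:chunks(Beam, [exports]),
    Exports.
```

For example, the list returned by beam_lib_sketch:exports_of(lists) contains {reverse,1}.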
The
- {debug_info, Mode, Module, Key}
- The
Here is an example of an
+ Encrypted Debug Information
+ The debug information can be encrypted to keep
+ the source code secret, but still be able to use tools such as
+ Debugger or Xref.
+
+ To use encrypted debug information, a key must be provided to
+ the compiler and beam_lib . The key is specified as a string.
+ It is recommended that the string contains at least 32 characters and
+ that both upper and lower case letters as well as digits and
+ special characters are used.
+
+ The default type (and currently the only type) of crypto
+ algorithm is des3_cbc , three rounds of DES. The key string
+ is scrambled using
+ erlang:md5/1
+ to generate the keys used for des3_cbc .
+
+
+ As far as we know at the time of writing, it is
+ infeasible to break des3_cbc encryption without any
+ knowledge of the key. Therefore, as long as the key is kept
+ safe and is unguessable, the encrypted debug information
+ should be safe from intruders.
+
+
+ The key can be provided in the following two ways:
+
+
+ -
+
Use Compiler option {debug_info,Key} , see
+ compile(3)
+ and function
+ crypto_key_fun/1
+ to register a fun that returns the key whenever
+ beam_lib must decrypt the debug information.
+ If no such fun is registered, beam_lib instead
+ searches for an .erlang.crypt file, see the next section.
+
+ -
+
Store the key in a text file named .erlang.crypt .
+ In this case, Compiler option encrypt_debug_info
+ can be used, see
+ compile(3) .
+
+
+
+ File
+{debug_info, Mode, Module, Key}
+
+ The following is an example of an
7}|pc/DM6Cga*68$Mw]L#&_Gejr]G^"}].]]>
- And here is a slightly more complicated example of an
-
The following is a slightly more complicated example of an
+ .erlang.crypt providing one key for module
+ t and another key for all other modules:
+
+ 7}|pc/DM6Cga*68$Mw]L#&_Gejr]G^"}].]]>
-
- Do not use any of the keys in these examples. Use your own
- keys.
-
- Do not use any of the keys in these examples. Use your own keys.
+Each of the functions described below accept either the - module name, the filename, or a binary containing the beam + module name, the filename, or a binary containing the BEAM module.
The list of attributes is sorted on
It is not checked that the forms conform to the abstract format
- indicated by
Reads chunk data for all chunks.
Builds a BEAM module (as a binary) from a list of chunks.
+Reads chunk data for selected chunks refs. The order of +
Reads chunk data for selected chunk references. The order of the returned list of chunk data is determined by the order of the list of chunk references.
Reads chunk data for selected chunks refs. The order of +
Reads chunk data for selected chunk references. The order of the returned list of chunk data is determined by the order of the list of chunk references.
-By default, if any requested chunk is missing in
By default, if any requested chunk is missing in
+
Builds a BEAM module (as a binary) from a list of chunks.
+Unregisters the crypto key fun and terminates the process
+ holding it, started by
+
Returns either
Returns the module version(s). A version is defined by
- the module attribute
-1> beam_lib:version(a). % -vsn(1).
-{ok,{a,[1]}}
-2> beam_lib:version(b). % -vsn([1]).
-{ok,{b,[1]}}
-3> beam_lib:version(c). % -vsn([1]). -vsn(2).
-{ok,{c,[1,2]}}
-4> beam_lib:version(d). % no -vsn attribute
-{ok,{d,[275613208176997377698094100858909383631]}}
+ Compares the contents of two BEAM files. If the module names
+ are the same, and all chunks except for chunk
Calculates an MD5 redundancy check for the code of the module - (compilation date and other attributes are not included).
+Compares the BEAM files in
+ two directories. Only files with extension
Registers a unary fun
+ that is called if
If a fun is already registered when attempting to
+ register a fun,
The fun must handle the following arguments:
+
+CryptoKeyFun(init) -> ok | {ok, NewCryptoKeyFun} | {error, Term}
+ Called when the fun is registered, in the process that holds
+ the fun. Here the crypto key fun can do any necessary
+ initializations. If
+CryptoKeyFun({debug_info, Mode, Module, Filename}) -> Key
+ Called when the key is needed for module
+CryptoKeyFun(clear) -> term()
+ Called before the fun is unregistered. Here any cleaning up
+ can be done. The return value is not important, but is passed
+ back to the caller of
Compares the BEAM files in two directories as
+
For a specified error returned by any function in this module,
+ this function returns a descriptive string
+ of the error in English. For file errors, function
+
Returns a list containing some information about a BEAM file
as tuples
The name (string) of the BEAM file, or the binary from which the information was extracted.
@@ -310,7 +417,8 @@The name (atom) of the module.
For each chunk, the identifier (string) and the position and size of the chunk data, in bytes.
@@ -318,135 +426,75 @@Compares the contents of two BEAM files. If the module names
- are the same, and all chunks except for the
The
The
Calculates an MD5 redundancy check for the code of the module + (compilation date and other attributes are not included).
The
Removes all chunks from a BEAM
file except those needed by the loader. In particular,
- the debug information (
The
Removes all chunks except
those needed by the loader from BEAM files. In particular,
- the debug information (
The
Removes all chunks
except those needed by the loader from the BEAM files of a
- release.
Given the error returned by any function in this module,
- the function
The
If there already is a fun registered when attempting to
- register a fun,
The fun must handle the following arguments:
-
- CryptoKeyFun(init) -> ok | {ok, NewCryptoKeyFun} | {error, Term}
- Called when the fun is registered, in the process that holds
- the fun. Here the crypto key fun can do any necessary
- initializations. If
- CryptoKeyFun({debug_info, Mode, Module, Filename}) -> Key
- Called when the key is needed for the module
- CryptoKeyFun(clear) -> term()
- Called before the fun is unregistered. Here any cleaning up
- can be done. The return value is not important, but is passed
- back to the caller of
Unregisters the crypto key fun and terminates the process
- holding it, started by
The
Returns the module version or versions. A version is defined by
+ module attribute
Examples:
+
+1> beam_lib:version(a). % -vsn(1).
+{ok,{a,[1]}}
+2> beam_lib:version(b). % -vsn([1]).
+{ok,{b,[1]}}
+3> beam_lib:version(c). % -vsn([1]). -vsn(2).
+{ok,{c,[1,2]}}
+4> beam_lib:version(d). % no -vsn attribute
+{ok,{d,[275613208176997377698094100858909383631]}}
This module contains functions for manipulating byte-oriented - binaries. Although the majority of functions could be implemented + binaries. Although the majority of functions could be provided using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume - less memory (or both) than a counterpart written in pure Erlang.
+ less memory, or both, than a counterpart written in pure Erlang. -The module is implemented according to the EEP (Erlang Enhancement Proposal) 31.
+The module is provided according to Erlang Enhancement Proposal + (EEP) 31.
- The library handles byte-oriented data. Bitstrings that are not
- binaries (does not contain whole octets of bits) will result in a
The library handles byte-oriented data. For bitstrings that are not
+ binaries (do not contain whole octets of bits) a
Opaque data-type representing a compiled - search-pattern. Guaranteed to be a tuple() to allow programs to - distinguish it from non precompiled search patterns.
+Opaque data type representing a compiled
+ search pattern. Guaranteed to be a
A representaion of a part (or range) in a binary. Start is a
- zero-based offset into a binary() and Length is the length of
- that part. As input to functions in this module, a reverse
+ A representation of a part (or range) in a binary.
Returns the byte at position
Returns the byte at position
The same as
Same as
Converts
Converts
Example:
-1> binary:bin_to_list(<<"erlang">>,{1,3}).
+1> binary:bin_to_list(<<"erlang">>, {1,3}).
"rla"
-%% or [114,108,97] in list notation.
-
- If
If
The same as
Same as
Builds an internal structure representing a compilation of a
- search-pattern, later to be used in the
When a list of binaries is given, it denotes a set of
- alternative binaries to search for. I.e if
+ search pattern, later to be used in functions
+
When a list of binaries is specified, it denotes a set of
+ alternative binaries to search for. For example, if
The list of binaries used for search alternatives shall be flat and proper.
+ alternatives; when only a single binary is specified, the set has + only one element. The order of alternatives in a pattern is + not significant. -If
The list of binaries used for search alternatives must be flat and + proper.
+If
The same as
Same as
Creates a binary with the content of
Creates a binary with the content of
This function will always create a new binary, even if
This function always creates a new binary, even if
By deliberately copying a single binary to avoid referencing - a larger binary, one might, instead of freeing up the larger + a larger binary, one can, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed. Sharing binary data is usually good. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate - copying might be a good idea.
If
If
The same as
Same as
Converts the binary digit representation, in big endian or little
+ endian, of a positive integer in
Converts the binary digit representation, in big or little
- endian, of a positive integer in
Example:
+Example:
1> binary:decode_unsigned(<<169,138,199>>,big).
-11111111
-
+11111111
The same as
Same as
Converts a positive integer to the smallest possible - representation in a binary digit representation, either big + representation in a binary digit representation, either big endian or little endian.
-Example:
+Example:
-1> binary:encode_unsigned(11111111,big).
-<<169,138,199>>
-
+1> binary:encode_unsigned(11111111, big).
+<<169,138,199>>
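The two conversions are inverses of each other when the same endianness is used for both. The following sketch shows this. The module name unsigned_sketch and the function round_trip/1 are hypothetical; only binary:encode_unsigned/2 and binary:decode_unsigned/2 are used.

```erlang
-module(unsigned_sketch).
-export([round_trip/1]).

%% Encode a non-negative integer in both byte orders and check that
%% decoding with the matching endianness gives the integer back.
round_trip(N) when is_integer(N), N >= 0 ->
    Big = binary:encode_unsigned(N, big),
    Little = binary:encode_unsigned(N, little),
    %% Both matches fail with badmatch if the round trip is lossy.
    N = binary:decode_unsigned(Big, big),
    N = binary:decode_unsigned(Little, little),
    {Big, Little}.
```

For example, unsigned_sketch:round_trip(11111111) returns {<<169,138,199>>,<<199,138,169>>}.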
Returns the first byte of the binary
Returns the first byte of binary
Returns the last byte of the binary
Returns the last byte of binary
Works exactly as
Works exactly as
+
Returns the length of the longest common prefix of the
- binaries in the list
Example:
-1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]).
+1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
2
-2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]).
-0
-
+2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
+0
- If
If
Returns the length of the longest common suffix of the
- binaries in the list
Example:
-1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]).
+1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
3
-2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]).
-0
-
-
- If
If
The same as
Same as
Searches for the first occurrence of
Searches for the first occurrence of
The function returns
The function will return
Example:
-1> binary:match(<<"abcde">>, [<<"bcde">>,<<"cd">>],[]).
-{1,4}
-
+1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
+{1,4}
Even though Only the given part is searched. Return values still have
- offsets from the beginning of Only the specified part is searched. Return values still have
+ offsets from the beginning of
If none of the strings in
-
If none of the strings in
For a description of
For a description of
If
If
The same as
Same as
Works like
As
The first and longest match is preferred to a shorter, which is illustrated by the following example:
@@ -367,76 +396,84 @@
1> binary:matches(<<"abcde">>,
[<<"bcde">>,<<"bc">>,<<"de">>],[]).
-[{1,4}]
-
-
- The result shows that <<"bcde">> is selected instead of the - shorter match <<"bc">> (which would have given raise to one - more match,<<"de">>). This corresponds to the behavior of posix - regular expressions (and programs like awk), but is not - consistent with alternative matches in re (and Perl), where +[{1,4}] + +
The result shows that <<"bcde">> is selected instead of
+ the shorter match <<"bc">> (which would have given rise to
+ one more match, <<"de">>).
+ This corresponds to the behavior of
+ POSIX regular expressions (and programs like awk), but is not
+ consistent with alternative matches in
If none of the strings in pattern is found, an empty list is returned.
- -For a description of
If none of the strings in a pattern is found, an empty list is + returned.
-If
For a description of
If
Extracts the part of binary
Extracts the part of the binary
Negative length can be used to extract bytes at the end of a binary:
+A negative length can be used to extract bytes at the end of a + binary:
1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
-2> binary:part(Bin,{byte_size(Bin), -5}).
-<<6,7,8,9,10>>
-
+2> binary:part(Bin, {byte_size(Bin), -5}).
+<<6,7,8,9,10>>
If
If
The same as
Same as
If a binary references a larger binary (often described as
+ being a subbinary), it can be useful to get the size of the
+ referenced binary. This function can be used in a program to trigger the
+ use of
If a binary references a larger binary (often described as
- being a sub-binary), it can be useful to get the size of the
- actual referenced binary. This function can be used in a program
- to trigger the use of
Example:
+Example:
store(Binary, GBSet) ->
@@ -447,26 +484,24 @@ store(Binary, GBSet) ->
_ ->
Binary
end,
- gb_sets:insert(NewBin,GBSet).
-
+ gb_sets:insert(NewBin,GBSet).
In this example, we chose to copy the binary content before
- inserting it in the
Binary sharing will occur whenever binaries are taken apart, - this is the fundamental reason why binaries are fast, +
Binary sharing occurs whenever binaries are taken apart.
+ This is the fundamental reason why binaries are fast,
decomposition can always be done with O(1) complexity. In rare
circumstances this data sharing is however undesirable, why this
- function together with
Example of binary sharing:
-1> A = binary:copy(<<1>>,100).
+1> A = binary:copy(<<1>>, 100).
<<1,1,1,1,1 ...
2> byte_size(A).
100
@@ -477,141 +512,138 @@ store(Binary, GBSet) ->
5> byte_size(B).
10
6> binary:referenced_byte_size(B)
-100
-
+100
Binary data is shared among processes. If another process still references the larger binary, copying the part this - process uses only consumes more memory and will not free up the + process uses only consumes more memory and does not free up the larger binary for garbage collection. Use this kind of intrusive - functions with extreme care, and only if a real problem is - detected.
+ functions with extreme care and only if a real problem is detected.The same as
Same as
Constructs a new binary by replacing the parts in
-
If the matching subpart of
If the matching sub-part of
Example:
-1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]).
+1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
<<"a[b]cde">>
-2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
- [global,{insert_replaced,1}]).
+2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
<<"a[b]c[d]e">>
-3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,
- [global,{insert_replaced,[1,1]}]).
+3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
<<"a[bb]c[dd]e">>
-4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,
- [global,{insert_replaced,[1,2]}]).
-<<"a[b-b]c[d-d]e">>
-
+4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
+<<"a[b-b]c[d-d]e">>
- If any position given in
If any position specified in
The options
Options
For a description of
For a description of
The same as
Same as
Splits
Splits
The parts of
The parts of
Example:
+Example:
1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
[<<1,255,4>>, <<2,3>>]
2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
-[<<0,1>>,<<4>>,<<9>>]
-
+[<<0,1>>,<<4>>,<<9>>]
Summary of options:
-Works as in Works as in
Removes trailing empty parts of the result (as does trim in
Removes trailing empty parts of the result (as does
Removes all empty parts of the result.
Repeats the split until the
Repeats the split until
Example of the difference between a scope and taking the binary apart before splitting:
-1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]).
+1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
[<<"ban">>,<<"na">>]
-2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]).
-[<<"n">>,<<"n">>]
-
+2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
+[<<"n">>,<<"n">>]
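The effect of the trimming options summarized above can be compared side by side. The following is a sketch; the module name split_sketch and the function fields/1 are hypothetical, and the comma separator is an arbitrary choice for illustration.

```erlang
-module(split_sketch).
-export([fields/1]).

%% Split a comma-separated binary three ways: keeping all parts,
%% dropping trailing empty parts, and dropping all empty parts.
fields(Line) when is_binary(Line) ->
    Plain = binary:split(Line, <<",">>, [global]),
    Trim = binary:split(Line, <<",">>, [global, trim]),
    TrimAll = binary:split(Line, <<",">>, [global, trim_all]),
    {Plain, Trim, TrimAll}.
```

For the input <<"a,b,,c,">>, the first list keeps both empty parts, the second drops only the trailing one, and the third contains only <<"a">>, <<"b">>, and <<"c">>.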
The return type is always a list of binaries that are all
- referencing
For a description of
For a description of
The
This module enables users to enter the short form of some commonly used commands.
These functions are are intended for interactive use in - the Erlang shell only. The module prefix may be omitted.
+These functions are intended for interactive use in + the Erlang shell only. The module prefix can be omitted.
Stack backtrace for a process. Equivalent to
Compiles and then purges and loads the code for a file.
+
compile:file(File , Options ++ [report_errors, report_warnings])
- Note that purging the code means that any processes +
Notice that purging the code means that any processes
lingering in old code for the module are killed without
- warning. See
Changes working directory to
Changes working directory to
Example:
2> cd("../erlang").
/home/ron/erlang
Flushes any messages sent to the shell.
Displays help information: all valid shell internal commands, and commands in this module.
Displays information about a process, Equivalent to
-
Purges and loads, or reloads, a module by calling
Note that purging the code means that any processes +
Notice that purging the code means that any processes
lingering in old code for the module are killed without
- warning. See
Compiles a list of files by calling
Compiles a list of files by calling
+
For information about
Lists files in the current directory.
Lists files in directory
Lists files in directory
Displays information about the loaded modules, including the files from which they have been loaded.
Displays information about
Memory allocation information. Equivalent to
-
Memory allocation information. Equivalent to
-
Compiles and then loads the code for a file on all nodes.
-
compile:file(File , Options ++ [report_errors, report_warnings])
Loads
Converts
Converts
Prints the name of the working directory.
This function is shorthand for
Prints the node uptime (as given by
-
Prints the node uptime (as specified by
+
This function finds undefined functions, unused functions, +
Finds undefined functions, unused functions,
and calls to deprecated functions in a module by calling
Generates an LALR-1 parser. Equivalent to:
yecc:file(File)
+ For information about
Generates an LALR-1 parser. Equivalent to:
yecc:file(File, Options)
+ For information about
This module provides computation of local and universal time, - day-of-the-week, and several time conversion functions.
+ day of the week, and many time conversion functions.
Time is local when it is adjusted in accordance with the current time zone and daylight saving. Time is universal when it reflects the time at longitude zero, without any adjustment for daylight saving. Universal Coordinated Time (UTC) is also called Greenwich Mean Time (GMT).
The time functions
The Gregorian calendar in this module is extended back to year 0. For a given date, the gregorian days is the number of days up to and including the date specified. Similarly, - the gregorian seconds for a given date and time, is - the the number of seconds up to and including the specified date + the gregorian seconds for a specified date and time is + the number of seconds up to and including the specified date and time.
For computing differences between epochs in time, use
the functions counting gregorian days or seconds. If epochs are
- given as local time, they must be converted to universal time, in
- order to get the correct value of the elapsed time between epochs.
- Use of the function
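As a minimal sketch of the advice above, the elapsed time between two epochs can be computed from gregorian seconds once both epochs are expressed in universal time. The module name `elapsed_sketch` and the function `elapsed/2` are illustrative only; the only library calls assumed are `calendar:datetime_to_gregorian_seconds/1` and `calendar:seconds_to_daystime/1`.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(elapsed_sketch).
-export([elapsed/2]).

%% From and To are calendar:datetime() tuples in universal time.
%% Returns {Seconds, {Days, {Hours, Minutes, Seconds}}} for the interval.
elapsed(From, To) ->
    Secs = calendar:datetime_to_gregorian_seconds(To)
         - calendar:datetime_to_gregorian_seconds(From),
    {Secs, calendar:seconds_to_daystime(Secs)}.
```

For example, `elapsed_sketch:elapsed({{2000,1,1},{0,0,0}}, {{2000,1,2},{1,1,1}})` yields `{90061,{1,{1,1,1}}}`.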
There exists different definitions for the week of the year.
- The calendar module contains a week of the year implementation
- which conforms to the ISO 8601 standard. Since the week number for
- a given date can fall on the previous, the current or on the next
- year it is important to provide the information which year is it
- together with the week number. The function
Different definitions exist for the week of the year.
+ This module contains a week of the year implementation
+ conforming to the ISO 8601 standard. As the week number for a
+ specified date can fall on the previous, the current, or on the next
+ year, it is important to specify both the year and the week number.
+ Functions
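The following sketch illustrates why the year must accompany the week number: a date early in January can belong to the last ISO week of the previous year. The module `iso_week_sketch` and its function are hypothetical; only `calendar:iso_week_number/1` is a library call.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(iso_week_sketch).
-export([same_iso_week/2]).

%% True if both dates fall in the same ISO 8601 week.
%% The comparison includes the year, not only the week number.
same_iso_week(Date1, Date2) ->
    calendar:iso_week_number(Date1) =:= calendar:iso_week_number(Date2).
```

Here `calendar:iso_week_number({2005,1,1})` returns `{2004,53}`, so 1 January 2005 shares a week with 31 December 2004, but not with 3 January 2005.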
Year cannot be abbreviated. Example: 93 denotes year - 93, not 1993. Valid range depends on the underlying OS. The - date tuple must denote a valid date.
+Year cannot be abbreviated. For example, 93 denotes year + 93, not 1993. The valid range depends on the underlying operating + system. The date tuple must denote a valid date.
This function computes the number of gregorian days starting - with year 0 and ending at the given date.
+Computes the number of gregorian days starting + with year 0 and ending at the specified date.
This function computes the number of gregorian seconds - starting with year 0 and ending at the given date and time.
+Computes the number of gregorian seconds starting + with year 0 and ending at the specified date and time.
This function computes the day of the week given
Computes the day of the week from the specified
+
This function computes the date given the number of - gregorian days.
+Computes the date from the specified number of gregorian days.
This function computes the date and time from the given +
Computes the date and time from the specified number of gregorian seconds.
This function checks if a year is a leap year.
+Checks if the specified year is a leap year.
This function returns the tuple {Year, WeekNum} representing
- the iso week number for the actual date. For determining the
- actual date, the function
Returns tuple
This function returns the tuple {Year, WeekNum} representing - the iso week number for the given date.
+Returns tuple
This function computes the number of days in a month.
+Computes the number of days in a month.
This function returns the local time reported by +
Returns the local time reported by the underlying operating system.
This function converts from local time to Universal
- Coordinated Time (UTC).
Converts from local time to Universal Coordinated Time (UTC).
+
This function is deprecated. Use
-
This function converts from local time to Universal
- Coordinated Time (UTC).
Converts from local time to Universal Coordinated Time (UTC).
+
The return value is a list of 0, 1 or 2 possible UTC times:
+The return value is a list of 0, 1, or 2 possible UTC times:
For a local
For a local
For all other local times there is only one - corresponding UTC.
+For all other local times only one corresponding UTC exists.
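A caller typically has to handle all three list lengths. The sketch below shows one possible way to do that; the module name, the `nonexistent_local_time` atom, and the policy of picking the first element for an ambiguous time are assumptions made for illustration, not behavior prescribed by this module.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(dst_sketch).
-export([to_utc/1]).

%% Converts a local datetime() to UTC, making the three cases explicit.
to_utc(Local) ->
    case calendar:local_time_to_universal_time_dst(Local) of
        [] ->
            %% The local time was skipped when switching to daylight saving.
            {error, nonexistent_local_time};
        [Utc] ->
            {ok, Utc};
        [UtcDst, _Utc] ->
            %% Ambiguous local time; this sketch arbitrarily picks the first.
            {ok, UtcDst}
    end.
```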
Returns Universal Coordinated Time (UTC)
+ converted from the return value from
+
This function returns local date and time converted from
- the return value from
-
Returns local date and time converted from the return value from
+
This function returns Universal Coordinated Time (UTC)
- converted from the return value from
-
Returns Universal Coordinated Time (UTC)
+ converted from the return value from
+
This function transforms a given number of seconds into days,
- hours, minutes, and seconds. The
Converts a specified number of seconds into days, hours, minutes,
+ and seconds.
This function computes the time from the given number of
- seconds.
Computes the time from the specified number of seconds.
+
This function returns the difference between two
Returns the difference between two
This function is obsolete. Use the conversion functions for @@ -317,33 +352,38 @@
This function computes the number of seconds since midnight +
Returns the number of seconds since midnight up to the specified time.
This function returns the Universal Coordinated Time (UTC) - reported by the underlying operating system. Local time is - returned if universal time is not available.
+Returns the Universal Coordinated Time (UTC) + reported by the underlying operating system. Returns local time if + universal time is unavailable.
This function converts from Universal Coordinated Time (UTC)
- to local time.
Converts from Universal Coordinated Time (UTC) to local time.
+
The notion that every fourth year is a leap year is not completely true. By the Gregorian rule, a year Y is a leap year if - either of the following rules is valid:
+ one of the following rules is valid:
Y is divisible by 4, but not by 100; or
+Y is divisible by 4, but not by 100.
Y is divisible by 400.
Accordingly, 1996 is a leap year, 1900 is not, but 2000 is.
+Hence, 1996 is a leap year, 1900 is not, but 2000 is.
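The two rules can be written out directly and compared against `calendar:is_leap_year/1`. This is only a sketch; the module `leap_sketch` and its functions are hypothetical names introduced for illustration.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(leap_sketch).
-export([is_leap/1, check/0]).

%% Gregorian rule: divisible by 400, or divisible by 4 but not by 100.
is_leap(Y) when Y rem 400 =:= 0 -> true;
is_leap(Y) when Y rem 100 =:= 0 -> false;
is_leap(Y) when Y rem 4 =:= 0 -> true;
is_leap(_) -> false.

%% Compares the sketch with the library function for a few years.
check() ->
    [is_leap(Y) =:= calendar:is_leap_year(Y) || Y <- [1900, 1996, 2000, 2001]].
```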
Local time is obtained from the Erlang BIF
The following facts apply:
+The following facts apply:
The module
This module provides a term storage on file. The stored terms, in this module called objects, are tuples such that one element is defined to be the key. A Dets table is a collection of objects with the key at the same position stored on a file.
-Dets is used by the Mnesia application, and is provided as is - for users who are interested in an efficient storage of Erlang - terms on disk only. Many applications just need to store some + +
This module is used by the Mnesia application, and is provided + "as is" for users who are interested in efficient storage of Erlang + terms on disk only. Many applications only need to store some terms in a file. Mnesia adds transactions, queries, and distribution. The size of Dets files cannot exceed 2 GB. If larger - tables are needed, Mnesia's table fragmentation can be used.
-There are three types of Dets tables: set, bag and - duplicate_bag. A table of type set has at most one object - with a given key. If an object with a key already present in the - table is inserted, the existing object is overwritten by the new - object. A table of type bag has zero or more different - objects with a given key. A table of type duplicate_bag - has zero or more possibly matching objects with a given key.
+ tables are needed, table fragmentation in Mnesia can be used. + +Three types of Dets tables exist:
+ +Dets tables must be opened before they can be updated or read, - and when finished they must be properly closed. If a table has not - been properly closed, Dets will automatically repair the table. + and when finished they must be properly closed. If a table is not + properly closed, Dets automatically repairs the table. This can take a substantial time if the table is large. A Dets table is closed when the process which opened the table - terminates. If several Erlang processes (users) open the same Dets - table, they will share the table. The table is properly closed + terminates. If many Erlang processes (users) open the same Dets + table, they share the table. The table is properly closed when all users have either terminated or closed the table. Dets - tables are not properly closed if the Erlang runtime system is - terminated abnormally.
+ tables are not properly closed if the Erlang runtime system + terminates abnormally. +A ^C command abnormally terminates an Erlang runtime +
A
Since all operations performed by Dets are disk operations, it + +
As all operations performed by Dets are disk operations, it
is important to realize that a single look-up operation involves a
- series of disk seek and read operations. For this reason, the Dets
- functions are much slower than the corresponding Ets functions,
+ series of disk seek and read operations. The Dets functions
+ are therefore much slower than the corresponding
+
Dets organizes data as a linear hash list and the hash list
grows gracefully as more data is inserted into the table. Space
management on the file is performed by what is called a buddy
system. The current implementation keeps the entire buddy system
in RAM, which implies that if the table gets heavily fragmented,
quite some memory can be used up. The only way to defragment a
- table is to close it and then open it again with the
It is worth noting that the ordered_set type present in Ets is
- not yet implemented by Dets, neither is the limited support for
- concurrent updates which makes a sequence of
Notice that type
Two versions of the format used for storing objects on file are supported by Dets. The first version, 8, is the format always used - for tables created by OTP R7 and earlier. The second version, 9, - is the default version of tables created by OTP R8 (and later OTP - releases). OTP R8 can create version 8 tables, and convert version - 8 tables to version 9, and vice versa, upon request. -
+ for tables created by Erlang/OTP R7 and earlier. The second version, 9, + is the default version of tables created by Erlang/OTP R8 (and later + releases). Erlang/OTP R8 can create version 8 tables, and convert version + 8 tables to version 9, and conversely, upon request.
All Dets functions return
Match specifications, see the
Match specifications, see section
+
Opaque continuation used by
See
For a description of patterns, see
+
Returns a list of the names of all open tables on this - node.
+Returns a list of the names of all open tables on this node.
Returns a list of objects stored in a table. The exact
representation of the returned objects is not public. The
- lists of data can be used for initializing a table by giving
- the value
Unless the table is protected using
The first time
The
Closes a table. Only processes that have opened a table are - allowed to close it. -
+ allowed to close it.
All open tables must be closed before the system is - stopped. If an attempt is made to open a table which has not - been properly closed, Dets automatically tries to repair the - table.
+ stopped. If an attempt is made to open a table that is not + properly closed, Dets automatically tries to repair it.
Deletes all objects with the key
Deletes all objects with key
Deletes all instances of a given object from a table. If a
- table is of type
Deletes all instances of a specified object from a table. If a
+ table is of type
Returns the first key stored in the table
Returns the first key stored in table
Unless the table is protected using
Should an error occur, the process is exited with an error
- tuple
If an error occurs, the process is exited with an error
+ tuple
There are two reasons why
Calls
Deletes all objects of the table
Deletes all objects of table
Returns information about the table
Returns information about table
Returns the information associated with
-
-
Replaces the existing objects of the table
Replaces the existing objects of table
When called with the argument
When called with argument
If the type of the table is
If the table type is
It is important that the table has a sufficient number of
- slots for the objects. If not, the hash list will start to
- grow when
The
Argument
Inserts one or more objects into the table
Inserts one or more objects into table
Returns
Returns
Returns
Returns a list of all objects with the key
Returns a list of all objects with key
2> dets:open_file(abc, [{type, bag}]).
{ok,abc}
@@ -561,394 +597,419 @@ ok
4> dets:insert(abc, {1,3,4}).
ok
5> dets:lookup(abc, 1).
-[{1,2,3},{1,3,4}]
- If the table is of type
If the table type is
Note that the order of objects returned is unspecified. In +
Notice that the order of objects returned is unspecified. In particular, the order in which objects were inserted is not reflected.
Matches some objects stored in a table and returns a
- non-empty list of the bindings that match a given pattern in
+ non-empty list of the bindings matching a specified pattern in
some unspecified order. The table, the pattern, and the number
of objects that are matched are all defined by
-
When all objects of the table have been matched,
+
When all table objects are matched,
Returns for each object of the table
Returns for each object of table
Matches some or all objects of the table
Matches some or all objects of table
A tuple of the bindings and a continuation is returned,
unless the table is empty, in which case
If the keypos'th element of
The table should always be protected using
-
The table is always to be protected using
+
Deletes all objects that match
Deletes all objects that match
If the keypos'th element of
Returns a non-empty list of some objects stored in a table
that match a given pattern in some unspecified order. The
table, the pattern, and the number of objects that are matched
are all defined by
When all objects of the table have been matched, +
When all table objects are matched,
Returns a list of all objects of the table
Returns a list of all objects of table
If the keypos'th element of
Using the
Matches some or all objects stored in the table
Matches some or all objects stored in table
A list of objects and a continuation is returned, unless
the table is empty, in which case
If the keypos'th element of
If the keypos'th element of
The table should always be protected using
-
The table is always to be protected using
+
Works like
Works like
Returns the key following
Should an error occur, the process is exited with an error +
Returns either the key following
If an error occurs, the process is exited with an error
tuple
Use
To find the first key in the table, use
+
Opens an existing table. If the table has not been properly - closed, it will be repaired. The returned reference is to be - used as the name of the table. This function is most useful - for debugging purposes.
+Opens an existing table. If the table is not properly closed, + it is repaired. The returned reference is to be used as the table + name. This function is most useful for debugging purposes.
Opens a table. An empty Dets table is created if no file exists.
-The atom
The atom
If two processes open the same table by giving the same - name and arguments, then the table will have two users. If one - user closes the table, it still remains open until the second - user closes the table.
-The
Argument
The value
Value
The
Option
Returns the name of the table given the pid of a process +
Returns the table name given the pid of a process
that handles requests to a table, or
This function is meant to be used for debugging only.
This function can be used to restore an opaque continuation
- returned by
The reason for this function is that continuation terms
- contain compiled match specifications and therefore will be
+ contain compiled match specifications and therefore are
invalidated if converted to external term format. Given that
the original match specification is kept intact, the
continuation can be restored, meaning it can once again be
used in subsequent
See also
For more information and examples, see the
+
This function is very rarely needed in application code. It
- is used by Mnesia to implement distributed
This function is rarely needed in application code. It is used by
+ application Mnesia to provide distributed
The reason for not having an external representation of - compiled match specifications is performance. It may be + compiled match specifications is performance. It can be subject to change in future releases, while this interface - will remain for backward compatibility.
+ remains for backward compatibility.
If
If
If several processes fix a table, the table will remain + terminates.
+If many processes fix a table, the table remains fixed until all processes have released it or terminated. A reference counter is kept on a per process basis, and N consecutive fixes require N releases to release the table.
It is not guaranteed that calls to
If objects have been added while the table was fixed, the - hash list will start to grow when the table is released which - will significantly slow down access to the table for a period + hash list starts to grow when the table is released, which + significantly slows down access to the table for a period of time.
Applies a match specification to some objects stored in a
table and returns a non-empty list of the results. The
table, the match specification, and the number of objects
that are matched are all defined by
When all objects of the table have been matched,
Returns the results of applying the match specification
-
Returns the results of applying match specification
+
If the keypos'th element of
Using the
Returns the results of applying the match specification
-
Returns the results of applying match specification
+
A tuple of the results of applying the match specification
and a continuation is returned, unless the table is empty,
in which case
If the keypos'th element of
If the keypos'th element of
The table should always be protected using
-
The table is always to be protected using
+
Deletes each object from the table
Deletes each object from table
If the keypos'th element of
The objects of a table are distributed among slots,
- starting with slot
Ensures that all updates made to the table
Note that the space management data structures kept in RAM, - the buddy system, is also written to the disk. This may take +
Ensures that all updates made to table
Notice that the space management data structures kept in RAM, + the buddy system, are also written to the disk. This can take some time if the table is fragmented.
When there are only simple restrictions on the key position
-
Returns a Query List
+ Comprehension (QLC) query handle. The
+
When there are only simple restrictions on the key position,
+
Simple filters are translated into equivalent match + specifications.
+More complicated filters must be applied to all
+ objects returned by
An example with implicit match specification:
-2> QH2 = qlc:q([{Y} || {X,Y} <- dets:table(t), (X > 1) or (X < 5)]).
- The latter example is in fact equivalent to the former which
- can be verified using the function
The latter example is equivalent to the former, which
+ can be verified using function
3> qlc:info(QH1) =:= qlc:info(QH2). -true-
Inserts the objects of the Dets table
Inserts the objects of the Dets table
Applies
Applies
Continue to perform the traversal. For example, the - following function can be used to print out the contents + following function can be used to print the contents of a table:
-fun(X) -> io:format("~p~n", [X]), continue end.
+fun(X) -> io:format("~p~n", [X]), continue end.
Continue the traversal and accumulate
Continue the traversal and accumulate
-fun(X) -> {continue, X} end.
+fun(X) -> {continue, X} end.
Terminate the traversal and return
Terminate the traversal and return
+
Any other value
Any other value
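As a rough illustration of the accumulating form, the sketch below collects the keys of a small table with `dets:traverse/2`. The module name, table name, and file name are hypothetical, and the file is created in the current directory and removed afterwards.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(dets_traverse_sketch).
-export([run/0]).

run() ->
    File = "traverse_sketch.dets",
    {ok, T} = dets:open_file(traverse_sketch, [{file, File}, {type, set}]),
    ok = dets:insert(T, [{1, a}, {2, b}, {3, c}]),
    %% {continue, K} accumulates K; traversal order is unspecified.
    Keys = dets:traverse(T, fun({K, _V}) -> {continue, K} end),
    ok = dets:close(T),
    ok = file:delete(File),
    lists:sort(Keys).
```

The result is sorted before it is returned because the traversal order is not defined.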
Updates the object with key
Updates the object with key
This function provides a way of updating a counter, without having to look up an object, update the object by - incrementing an element and insert the resulting object into + incrementing an element, and insert the resulting object into the table again.
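A minimal sketch of a counter kept in a table of type `set` follows. The module, table, and file names are hypothetical; the integer increment is assumed to apply to the element directly following the key.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(dets_counter_sketch).
-export([run/0]).

run() ->
    File = "counter_sketch.dets",
    {ok, T} = dets:open_file(counter_sketch, [{file, File}, {type, set}]),
    ok = dets:insert(T, {hits, 0}),
    %% Each call returns the new counter value.
    1 = dets:update_counter(T, hits, 1),
    3 = dets:update_counter(T, hits, 2),
    Result = dets:lookup(T, hits),
    ok = dets:close(T),
    ok = file:delete(File),
    Result.
```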
This module provides a
This module provides exactly the same interface as the module
-
This module provides the same interface as the
+
Dictionary as returned by
Dictionary as returned by
+
This function appends a new
Appends a new
See also section
This function appends a list of values
Appends a list of values
See also section
This function erases all items with a given key from a - dictionary.
+Erases all items with a given key from a dictionary.
This function returns the value associated with
Returns the value associated with
See also section
This function returns a list of all keys in the dictionary.
+Returns a list of all keys in dictionary
This function searches for a key in a dictionary. Returns
-
Searches for a key in dictionary
See also section
Calls
This function converts the
Converts the
Returns
This function tests if
Tests if
Calls
Merges two dictionaries,
merge(Fun, D1, D2) ->
fold(fun (K, V1, D) ->
update(K, fun (V2) -> Fun(K, V1, V2) end, V1, D)
end, D2, D1).
- but is faster.
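For illustration, the sketch below merges two small dictionaries and adds the values of keys that occur in both. The module and function names are hypothetical.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(dict_merge_sketch).
-export([run/0]).

run() ->
    D1 = dict:from_list([{a, 1}, {b, 2}]),
    D2 = dict:from_list([{b, 10}, {c, 20}]),
    %% Fun is called only for keys present in both dictionaries.
    Merged = dict:merge(fun(_Key, V1, V2) -> V1 + V2 end, D1, D2),
    lists:sort(dict:to_list(Merged)).
```

Only key `b` occurs in both inputs, so the sorted result is `[{a,1},{b,12},{c,20}]`.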
This function creates a new dictionary.
+Creates a new dictionary.
Returns the number of elements in a
Returns
Returns the number of elements in dictionary
+
This function stores a
Stores a
This function converts the dictionary to a list - representation.
+Converts dictionary
Update a value in a dictionary by calling
Updates a value in a dictionary by calling
Update a value in a dictionary by calling
Updates a value in a dictionary by calling
append(Key, Val, D) ->
update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).
Add
This could be defined as:
+Adds
This can be defined as follows, but is faster:
update_counter(Key, Incr, D) ->
update(Key, fun (Old) -> Old + Incr end, Incr, D).
- but is faster.
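A typical use is counting occurrences. The following sketch, with hypothetical module and function names, folds a list of terms into a dictionary of counts.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(dict_count_sketch).
-export([count/1]).

%% Returns a dictionary mapping each distinct term in Words to
%% the number of times it occurs.
count(Words) ->
    lists:foldl(fun(W, D) -> dict:update_counter(W, 1, D) end,
                dict:new(), Words).
```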
The functions
Functions
> D0 = dict:new(), @@ -256,19 +287,18 @@ update_counter(Key, Incr, D) -> D3 = dict:append(files, f2, D2), D4 = dict:append(files, f3, D3), dict:fetch(files, D4). -[f1,f2,f3]+[f1,f2,f3]
This saves the trouble of first fetching a keyed value, appending a new value to the list of stored values, and storing - the result. -
-The function
Function
The
This module provides a version of labeled + directed graphs. What makes the graphs provided here non-proper directed graphs is that multiple edges between vertices are allowed. However, the customary definition of - directed graphs will be used in the text that follows. -
-A
Digraphs can be annotated with additional information. Such
- information may be attached to the vertices and to the edges of
- the digraph. A digraph which has been annotated is called a
- labeled digraph, and the information attached to a
- vertex or an edge is called a
An edge e = (v, w) is said to
-
A
In this module, V is allowed to be empty. The so obtained unique
+ digraph is called the
+
Digraphs can be annotated with more information. Such information
+ can be attached to the vertices and to the edges of the digraph. An
+ annotated digraph is called a labeled digraph, and the
+ information attached to a vertex or an edge is called a
+
An edge e = (v, w) is said to
+
The
The
If an edge is emanating from v and incident on w, then w is
+ said to be an
A
The
Path P is
Path P is a
A
A
An
A digraph as returned by
A digraph as returned by
+
If the edge would create a cycle in
- an
If the edge would create a cycle in
+ an
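The sketch below shows the rejected edge for an acyclic digraph. The module name is hypothetical; the digraph is deleted at the end because it is backed by ETS tables.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(digraph_acyclic_sketch).
-export([run/0]).

run() ->
    G = digraph:new([acyclic]),
    a = digraph:add_vertex(G, a),
    b = digraph:add_vertex(G, b),
    Edge = digraph:add_edge(G, a, b),
    %% Adding b -> a would close a cycle, so it is refused.
    Rejected = digraph:add_edge(G, b, a),
    NoEdges = digraph:no_edges(G),
    true = digraph:delete(G),
    {Edge, Rejected, NoEdges}.
```

The second element of the result has the form `{error, {bad_edge, Path}}`, and the digraph still contains only one edge.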
Deletes the edge
Deletes edge
Deletes the edges in the list
Deletes the edges in list
Deletes edges from the digraph
A sketch of the procedure employed: Find an arbitrary
-
Deletes edges from digraph
A sketch of the procedure employed:
+Find an arbitrary
+
Remove all edges of
Repeat until there is no path between
Deletes the vertex
Deletes vertex
Deletes the vertices in the list
Deletes the vertices in list
Deletes the digraph
Deletes digraph
Returns
Returns
+
Returns a list of all edges of the digraph
Returns a list of all edges of digraph
Returns a list of all
- edges
Returns a list of all
+ edges
If there is
- a
If a
Tries to find
- a
The digraph
Tries to find
+ a
Digraph
Tries to find an as short as
- possible
Tries to find an as short as possible
+
Tries to find an as short as
- possible
The digraph
Tries to find an as short as possible
+
Digraph
Returns the
Returns the
Returns a list of all
- edges
Returns a list of all
+ edges
Returns a list of
- all
Returns a list of
+ all
Returns a list of
Returns a list of
Equivalent to
Equivalent to
Returns
- an
Returns
+ an
Allows
The digraph is to be kept
+
Other processes can read the digraph (default).
The digraph can be read and modified by the creating + process only.
If an unrecognized type option
If an unrecognized type option
Returns the number of edges of the digraph
Returns the number of edges of digraph
Returns the number of vertices of the digraph
Returns the number of vertices of digraph
Returns the
Returns the
Returns a list of all
- edges
Returns a list of all
+ edges
Returns a list of
- all
Returns
Returns
Returns a list of all vertices of the digraph
Returns a list of all vertices of digraph
The
A
Digraphs can be annotated with additional information. Such
- information may be attached to the vertices and to the edges of
- the digraph. A digraph which has been annotated is called a
- labeled digraph, and the information attached to a
- vertex or an edge is called a
An edge e = (v, w) is said
- to
This module provides algorithms based on depth-first traversal of
+ directed graphs. For basic functions on directed graphs, see the
+
A
A
A
A
Digraphs can be annotated with more information. Such information
+ can be attached to the vertices and to the edges of the digraph. An
+ annotated digraph is called a labeled digraph, and the
+ information attached to a vertex or an edge is called a
+
An edge e = (v, w) is said to
+
If an edge is emanating from v and incident on w, then w is
+ said to be an
A
The
Path P is a
A
An
A
A
The problem of
+
A
G' is maximal with respect to a property P if all other + subgraphs that include the vertices of G' do not have property P.
+A
A
An
A
Returns
Returns
Returns a list
- of
Returns a list
+ of
Creates a digraph where the vertices are
- the
Creates a digraph where the vertices are
+ the
The created digraph has the same type as
Each and every
Each
Returns a list of
Returns a list of
Returns
Returns
Returns
Returns
Returns
Returns
Returns a list of all vertices of
Returns a list of all vertices of
Returns all vertices of the digraph
Returns all vertices of digraph
Returns all vertices of the digraph
Returns all vertices of digraph
Returns an unsorted list of digraph vertices such that for
- each vertex in the list, there is
- a
Returns an unsorted list of digraph vertices such that for
- each vertex in the list, there is
- a
Returns an unsorted list of digraph vertices such that for
- each vertex in the list, there is
- a
Returns an unsorted list of digraph vertices such that for
- each vertex in the list, there is
- a
Returns a list of
Returns a list of
Creates a maximal
Creates a maximal
If the value of the option
If the value of option
If the value of the option
There will be a
If the value of option
If any of the arguments are invalid, a
Returns a
Returns a
The Erlang code preprocessor includes functions which are used
- by
The Erlang code preprocessor includes functions that are used by the
+
The Erlang source file
+ the matching string is not a valid encoding, it is ignored. The + valid encodings are Latin-1 and UTF-8, where the + case of the characters can be chosen freely. + +Examples:
+
+%% coding: utf-8
+
+%% For this file we have chosen encoding = Latin-1
+
+%% -*- coding: latin-1 -*-
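As a small sketch of how such a comment is picked up, the code below writes a temporary source file and reads its encoding back with `epp:read_encoding/1`. The module name and file name are hypothetical, and the file is created in the current directory and deleted afterwards.

```erlang
%% Hypothetical helper module; not part of STDLIB.
-module(epp_encoding_sketch).
-export([run/0]).

run() ->
    File = "encoding_sketch.erl",
    Source = <<"%% -*- coding: latin-1 -*-\n-module(encoding_sketch).\n">>,
    ok = file:write_file(File, Source),
    %% Returns the encoding found in the leading comment lines, or none.
    Encoding = epp:read_encoding(File),
    ok = file:delete(File),
    Encoding.
```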
Handle to the epp server.
Handle to the
Opens a file for preprocessing.
-If
Closes the preprocessing of a file.
Equivalent to
Returns the default encoding of Erlang source files.
Equivalent to
Returns a string representation of an encoding. The string
+ is recognized by
+
Closes the preprocessing of a file.
+Takes an
Returns the next Erlang form from the opened Erlang source file.
- The tuple
Opens a file for preprocessing.
+If
Preprocesses and parses an Erlang source file.
- Note that the tuple
If
Equivalent to
+
Equivalent to
Equivalent to
Returns the default encoding of Erlang source files.
+Returns the next Erlang form from the opened Erlang source file.
+ Tuple
Returns a string representation of an encoding. The string
- is recognized by
Preprocesses and parses an Erlang source file.
+ Notice that tuple
If
Equivalent to
Read the
The option
Option
Read the
The option
Option
Reads the
Returns the read encoding, or
Reads the
Returns the read encoding, or
Takes an
The
- {ErrorLine, Module, ErrorDescriptor}
- A string which describes the error is obtained with the following call: -
+{ErrorLine, Module, ErrorDescriptor} +A string describing the error is obtained with the following call:
- Module:format_error(ErrorDescriptor)
+Module:format_error(ErrorDescriptor)
This module implements an abstract type that is used by the +
This module provides an abstract type that is used by the
Erlang Compiler and its helper modules for holding data such as
column, line number, and text. The data type is a collection of
The Erlang Token Scanner returns tokens with a subset of the following annotations, depending on the options:
+The column where the token begins.
The line and column where the token begins, or - just the line if the column unknown.
-The token's text.
From the above the following annotation is derived:
+ +From this, the following annotation is derived:
+The line where the token begins.
Furthermore, the following annotations are supported by - this module, and used by various modules:
+ +This module also supports the following annotations, + which are used by various modules:
+A filename.
A Boolean indicating if the abstract code is - compiler generated. The Erlang Compiler does not emit warnings - for such code.
-A Boolean indicating if the origin of the abstract - code is a record. Used by Dialyzer to assign types to tuple - elements.
+ code is a record. Used by +The functions
-
The functions
-
To be changed to a non-negative integer in Erlang/OTP 19.0.
-Returns the column of the annotations
Returns the column of the annotations
Returns the end location of the text of the
annotations
Returns the filename of the annotations
Returns annotations with the representation
See also
Returns annotations with representation
See also
Returns
Returns
Returns
Returns the line of the annotations
Returns the line of the annotations
Returns the location of the annotations
Returns the location of the annotations
Creates a new collection of annotations given a location.
Modifies the filename of the annotations
Modifies the filename of the annotations
Modifies the generated marker of the annotations
-
Modifies the generated marker of the annotations
Modifies the line of the annotations
Modifies the line of the annotations
Modifies the location of the annotations
Modifies the location of the annotations
Modifies the record marker of the annotations
Modifies the record marker of the annotations
Modifies the text of the annotations
Modifies the text of the annotations
Returns the text of the annotations
Returns the term representing the annotations
See also
Returns the term representing the annotations
See also
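As an illustrative sketch of the accessor and modifier functions listed above, the following module builds annotations from a line and column location, attaches a filename, and reads the values back. The module name `anno_sketch` and the filename `"demo.erl"` are assumptions made for this example only.

```erlang
-module(anno_sketch).
-export([run/0]).

%% Build annotations from a {Line, Column} location, set a filename,
%% and read the individual annotations back.
run() ->
    Anno0 = erl_anno:new({3, 7}),
    %% "demo.erl" is a hypothetical filename used only for illustration.
    Anno1 = erl_anno:set_file("demo.erl", Anno0),
    {erl_anno:line(Anno1),
     erl_anno:column(Anno1),
     erl_anno:location(Anno1),
     erl_anno:file(Anno1)}.
```

The internal representation of the annotations is deliberately not inspected here; only the accessor functions are used.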
This module provides an interpreter for Erlang expressions. The
expressions are in the abstract syntax as returned by
Further described
-
Further described in section
+
Further described
-
Further described in section
+
Evaluates
Returns
Adds binding
Returns the binding of
Returns the list of bindings contained in the binding + structure.
+Removes the binding of
Evaluates
Returns
Returns
Evaluates a list of expressions in parallel, using the same
initial bindings for each expression. Attempts are made to
- merge the bindings returned from each evaluation. This
- function is useful in the
Returns
Returns an empty binding structure.
-Returns the list of bindings contained in the binding - structure.
-Returns the binding of
Adds the binding
Evaluates
Returns
Removes the binding of
Returns an empty binding structure.
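A minimal sketch of how the binding functions and the evaluator can fit together, assuming the expression text is first turned into abstract forms with `erl_scan:string/1` and `erl_parse:parse_exprs/1`. The module name `eval_sketch` and the expression string are illustrative assumptions.

```erlang
-module(eval_sketch).
-export([run/0]).

%% Scan and parse an expression sequence, evaluate it with X pre-bound,
%% and then look up the binding that the evaluation created for Y.
run() ->
    {ok, Tokens, _EndLocation} = erl_scan:string("Y = X * 2, Y + 1."),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Bindings0 = erl_eval:add_binding('X', 20, erl_eval:new_bindings()),
    {value, Value, Bindings1} = erl_eval:exprs(Exprs, Bindings0),
    {value, Y} = erl_eval:binding('Y', Bindings1),
    {Value, Y}.
```

The value of the last expression is returned together with the updated bindings, so a caller can thread the bindings into a later evaluation.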
During evaluation of a function, no calls can be made to local
functions. An undefined function error would be
generated. However, the optional argument
-
This defines a local function handler which is called with:
+This defines a local function handler that is called with:
-Func(Name, Arguments)
+Func(Name, Arguments)
This defines a local function handler which is called with:
+This defines a local function handler that is called with:
-Func(Name, Arguments, Bindings)
+Func(Name, Arguments, Bindings)
-{value,Value,NewBindings}
+{value,Value,NewBindings}
The optional argument
A functional object (fun) is called.
A built-in function is called.
A function is called using the
An operator
Exceptions are calls to
This defines an nonlocal function handler which is called with:
+This defines a non-local function handler that is called with:
-Func(FuncSpec, Arguments)
+Func(FuncSpec, Arguments)
There is no nonlocal function handler.
+There is no non-local function handler.
For calls such as
The non-local function handler is however called with the
+ evaluated arguments of the call to
+
Calls to functions defined by evaluating fun expressions +
Calls to functions defined by evaluating fun expressions
The nonlocal function handler argument is probably not used as + handlers.
+ +The non-local function handler argument is probably not used as
frequently as the local function handler argument. A possible
use is to call
Undocumented functions in
Undocumented functions in this module are not to be used.
This module expands records in a module.
Expands all records in a module. The returned module has no - references to records, neither attributes nor code.
+ references to records, attributes, or code.The
Section
This module performs an identity parse transformation of Erlang code.
- It is included as an example for users who may wish to write their own
- parse transformers. If the option
Performs an identity transformation on Erlang forms, as an example.
+Performs an identity transformation on Erlang forms, as an example. +
Parse transformations are used if a programmer wants to use Erlang syntax, but with different semantics. The original Erlang - code is then transformed into other Erlang code. -
+ code is then transformed into other Erlang code.Programmers are strongly advised not to engage in parse transformations and no support is offered for problems encountered.
+Programmers are strongly advised not to engage in parse + transformations. No support is offered for problems encountered.
This module defines Erlang BIFs, guard tests and operators. +
This module defines Erlang BIFs, guard tests, and operators. This module is only of interest to programmers who manipulate Erlang code.
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns the
Returns
Returns
Returns the
Returns
This module is used to check Erlang code for illegal syntax and - other bugs. It also warns against coding practices which are - not recommended.
+ other bugs. It also warns against coding practices that are + not recommended. +The errors detected include:
+Warnings include:
+ +The warnings detected include:
+Some of the warnings are optional, and can be turned on by - giving the appropriate option, described below.
+ specifying the appropriate option, described below. +The functions in this module are invoked automatically by the - Erlang compiler and there is no reason to invoke these + Erlang compiler. There is no reason to invoke these functions separately unless you have written your own Erlang compiler.
Takes an
Tests if
This function checks all the forms in a module for errors. - It returns: -
+Checks all the forms in a module for errors. It returns:
There were no errors in the module.
+There are no errors in the module.
There were errors in the module.
+There are errors in the module.
Since this module is of interest only to the maintainers of - the compiler, and to avoid having the same description in - two places to avoid the usual maintenance nightmare, the +
As this module is of interest only to the maintainers of the
+ compiler, and to avoid the same description in two places, the
elements of
The
- [{FileName2 ,[ErrorInfo ]}]
- The errors and warnings are listed in the order in which - they are encountered in the forms. This means that the - errors from one file may be split into different entries in - the list of errors.
-This function tests if
Takes an
The errors and warnings are listed in the order in which they are + encountered in the forms. The errors from one file can therefore be + split into different entries in the list of errors.
The
- {ErrorLine, Module, ErrorDescriptor}
- A string which describes the error is obtained with the following call: -
+{ErrorLine, Module, ErrorDescriptor} +A string describing the error is obtained with the following call:
- Module:format_error(ErrorDescriptor)
+Module:format_error(ErrorDescriptor)
This module is the basic Erlang parser which converts tokens into - the abstract form of either forms (i.e., top-level constructs), +
This module is the basic Erlang parser that converts tokens into
+ the abstract form of either forms (that is, top-level constructs),
expressions, or terms. The Abstract Format is described in the
This function parses
The parsing was successful.
An error occurred.
-This function parses
The parsing was successful.
An error occurred.
-This function parses
The parsing was successful.
An error occurred.
-Uses an
This function generates a list of tokens representing the abstract
- form
Converts the abstract form
Converts the Erlang data structure
Converts the Erlang data structure
The
The
Option
Option
Modifies the
Assumes that
Returns a term where each collection of annotations
+
Updates an accumulator by applying
Uses an
Modifies the
Modifies the
Assumes that
Assumes that
Converts the abstract form
Returns a term where each collection of annotations
-
Parses
The parsing was successful.
An error occurred.
+Parses
The parsing was successful.
An error occurred.
+Parses
The parsing was successful.
An error occurred.
+Generates a list of tokens representing the abstract
+ form
The
- {ErrorLine, Module, ErrorDescriptor}
- A string which describes the error is obtained with the following call: -
+{ErrorLine, Module, ErrorDescriptor} +A string describing the error is obtained with the following call:
- Module:format_error(ErrorDescriptor)
+Module:format_error(ErrorDescriptor)
The functions in this module are used to generate aesthetically attractive representations of abstract - forms, which are suitable for printing. All functions return (possibly deep) + forms, which are suitable for printing. + All functions return (possibly deep) lists of characters and generate an error if the form is wrong.
-All functions can have an optional argument which specifies a hook + +
All functions can have an optional argument, which specifies a hook that is called if an attempt is made to print an unknown form.
The optional argument
If
The called hook function should return a (possibly deep) list
- of characters.
If
Optional argument
The called hook function is to return a (possibly deep) list of
+ characters. Function
If
Pretty prints a
-
Same as
The same as
Prints one expression. It is useful for implementing hooks (see
+ section
+
The same as
Same as
The same as
Pretty prints a
+
The same as
Same as
This function prints one expression. It is useful for implementing hooks (see below).
+Same as
It should be possible to have hook functions for unknown forms - at places other than expressions.
+It is not possible to have hook functions for unknown forms + at other places than expressions.
This module contains functions for tokenizing characters into +
This module contains functions for tokenizing (scanning) characters into Erlang tokens.
Returns the category of
Returns the column of
Returns the end location of the text of
+
Uses an
Returns the line of
Returns the location of
Returns
Takes the list of characters
An error occurred.
A token is a tuple containing information about
- syntactic category, the token annotations, and the actual
- terminal symbol. For punctuation characters (e.g.
The valid options are:
+ Three-tuples have one of the following forms: +Valid options:
A callback function that is called when the scanner
- has found an unquoted atom. If the function returns
-
Return comment tokens.
-Return white space tokens. By convention, if there is - a newline character, it is always the first character of the - text (there cannot be more than one newline in a white space - token).
-Short for
Include the token's text in the token annotation. The - text is the part of the input corresponding to the token.
-A callback function that is called when the scanner
+ has found an unquoted atom. If the function returns
+
Return comment tokens.
+Return white space tokens. By convention, a newline + character, if present, is always the first character of the + text (there cannot be more than one newline in a white space + token).
+Short for
Include the token text in the token annotation. The + text is the part of the input corresponding to the token.
+Returns the symbol of
Returns the text of
This is the re-entrant scanner which scans characters until
- a dot ('.' followed by a white space) or
-
This is the re-entrant scanner, which scans characters until
+ either a dot ('.' followed by a white space) or
+
This return indicates that there is sufficient input +
Indicates that there is sufficient input
data to get a result.
The scanning was successful.
End of file was encountered before any more tokens.
An error occurred.
The
See
Returns
Returns the category of
Returns the symbol of
Returns the column of
Returns the end location of the text of
-
Returns the line of
Returns the location of
Returns the text of
Takes an
For a description of the options, see
+
The
{ErrorLocation, Module, ErrorDescriptor}
- A string which describes the error is obtained with the - following call:
+A string describing the error is obtained with the following call:
Module:format_error(ErrorDescriptor)
The continuation of the first call to the re-entrant input
- functions must be
The
This module archives and extracts files to and from
+ a tar file. This module supports the
By convention, the name of a tar file should end in "
Tar files can be created in one operation using the
-
Alternatively, for more control, the
-
To extract all files from a tar file, use the
-
By convention, the name of a tar file is to end in "
Tar files can be created in one operation using function
+
Alternatively, for more control, use functions
+
To extract all files from a tar file, use function
+
To return a list of the files in a tar file,
- use either the
To convert an error term returned from one of the functions
- above to a readable message, use the
-
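The one-operation workflow described above can be sketched as follows. This is a sketch under stated assumptions, not a reference usage: the module name `tar_sketch`, the archive name `tar_sketch_example.tar` (created in the current working directory), and the archived content are all hypothetical. The file contents are supplied as a binary so that the example needs no input files.

```erlang
-module(tar_sketch).
-export([run/0]).

%% Create a tar file from in-memory data, list its contents, and
%% extract it back into memory instead of onto disk.
run() ->
    %% Hypothetical archive name, written to the current directory.
    TarFile = "tar_sketch_example.tar",
    ok = erl_tar:create(TarFile, [{"hello.txt", <<"hello\n">>}]),
    {ok, Names} = erl_tar:table(TarFile),
    {ok, Files} = erl_tar:extract(TarFile, [memory]),
    ok = file:delete(TarFile),
    {Names, Files}.
```

With option `memory`, the extracted data comes back as a list of `{Filename, Binary}` tuples, which keeps the example free of temporary directories.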
If
If
If
If
The
An example of this is the sftp support in
The
An example of this is the SFTP support in
+
For maximum compatibility, it is safe to archive files with names
- up to 100 characters in length. Such tar files can generally be
- extracted by any
If filenames exceed 100 characters in length, the resulting tar
- file can only be correctly extracted by a POSIX-compatible
File have longer names than 256 bytes cannot be stored at all.
-The filename of the file a symbolic link points is always limited - to 100 characters.
+For maximum compatibility, it is safe to archive files with names
+ up to 100 characters in length. Such tar files can generally be
+ extracted by any
For filenames exceeding 100 characters in length, the resulting tar
+ file can only be correctly extracted by a POSIX-compatible
Files with longer names than 256 bytes cannot be stored.
+The file name a symbolic link points to is always limited + to 100 characters.
+The
Adds a file to a tar file that has been opened for writing by
+
Options:
By default, symbolic links will be stored as symbolic links
- in the tar file. Use the
By default, symbolic links are stored as symbolic links
+ in the tar file. To override the default and store the file
+ that the symbolic link points to into the tar file, use
+ option
Print an informational message about the file being added.
+Prints an informational message about the added file.
+Reads data in parts from the file. This is intended for
+ memory-limited machines that, for example, build a tar file
+ on a remote machine over SFTP, see
+
Read data in parts from the file. This is intended for memory-limited
- machines that for example builds a tar file on a remote machine over
-
The
Adds a file to a tar file that has been opened for writing by
+
The
Closes a tar file
+ opened by
The
Creates a tar file and archives the files whose names are specified
+ in
The
The options in
Creates a tar file and archives the files whose names are specified
+ in
The options in
The entire tar file will be compressed, as if it has +
The entire tar file is compressed, as if it has
been run through the
By default, the
By default, function
By default, symbolic links will be stored as symbolic links
- in the tar file. Use the
By default, symbolic links are stored as symbolic links in
+ the tar file. To override the default and store the file that
+ the symbolic link points to into the tar file, use
+ option
Print an informational message about each file being added.
+Prints an informational message about each added file.
The
If the
If the
Otherwise,
Extracts all files from a tar archive.
+If argument
If argument
Otherwise,
The
If the
If the
Otherwise,
Extracts files from a tar archive.
+If argument
If argument
Otherwise,
The following options modify the defaults for the extraction as - follows.
+ follows:Files with relative filenames will by default be extracted
- to the current working directory.
- Given the
Files with relative filenames are by default extracted
+ to the current working directory. With this option, files are
+ instead extracted into directory
By default, all files will be extracted from the tar file.
- Given the
By default, all files are extracted from the tar file. With
+ this option, only those files are extracted whose names are
+ included in
Given the
With this option, the file is uncompressed while extracting. + If the tar file is not compressed, this option is ignored.
By default, the
By default, function
Instead of extracting to a directory, the memory option will - give the result as a list of tuples {Filename, Binary}, where - Binary is a binary containing the extracted data of the file named - Filename in the tar file.
+Instead of extracting to a directory, this option gives the
+ result as a list of tuples
By default, all existing files with the same name as file in
- the tar file will be overwritten
- Given the
By default, all existing files with the same name as files in + the tar file are overwritten. With this option, existing + files are not overwritten.
Print an informational message as each file is being extracted.
+Prints an informational message for each extracted file.
The
Converts an error reason term to a human-readable error message + string.
The
By convention, the name of a tar file should end in "
Except for the
The
The
Parameter
The following are the fun clauses parameter lists:
The entire tar file will be compressed, as if it has
- been run through the
Writes term
Closes the access.
+By default, the
Reads using
Sets the position of
Use the
Example:
+The following is a complete
+ExampleFun =
+ fun(write, {Fd,Data}) -> file:write(Fd, Data);
+ (position, {Fd,Pos}) -> file:position(Fd, Pos);
+ (read2, {Fd,Size}) -> file:read(Fd, Size);
+ (close, Fd) -> file:close(Fd)
+ end
+ Here
+{ok,Fd} = file:open(Name, ...).
+{ok,TarDesc} = erl_tar:init(Fd, [write], ExampleFun),
+
+erl_tar:add(TarDesc, SomeValueIwantToAdd, FileNameInTarFile),
+...,
+erl_tar:close(TarDesc)
+ When the
This example with the
The
The
The
The
The parameter
The fun clauses parameter lists are:
-A complete
- ExampleFun =
- fun(write, {Fd,Data}) -> file:write(Fd, Data);
- (position, {Fd,Pos}) -> file:position(Fd, Pos);
- (read2, {Fd,Size}) -> file:read(Fd,Size);
- (close, Fd) -> file:close(Fd)
- end
-
- where
- {ok,Fd} = file:open(Name,...).
- {ok,TarDesc} = erl_tar:init(Fd, [write], ExampleFun),
-
- The
- erl_tar:add(TarDesc, SomeValueIwantToAdd, FileNameInTarFile),
- ....,
- erl_tar:close(TarDesc)
-
- When the erl_tar core wants to e.g. write a piece of Data, it would call
-
The example above with
Creates a tar file for writing (any existing file with the same + name is truncated).
+By convention, the name of a tar file is to end in "
Except for the
The entire tar file is compressed, as if it has been run
+ through the
By default, the tar file is opened in
To add one file at a time to an opened tar file, use function
+
The
The
The
Retrieves the names of all files in the tar file
The
Retrieves the names of all files in the tar file
The
Prints the names of all files in the tar file
The
Prints names and information about all files in the tar file
+
This module is an interface to the Erlang built-in term storage
BIFs. These provide the ability to store very large quantities of
data in an Erlang runtime system, and to have constant access
time to the data. (In the case of
Data is organized as a set of dynamic tables, which can store tuples. Each table is created by a process. When the process terminates, the table is automatically destroyed. Every table has access rights set at creation.
+Tables are divided into four different types,
The number of tables stored at one Erlang node is limited.
- The current default limit is approximately 1400 tables. The upper
- limit can be increased by setting the environment variable
+ The current default limit is about 1400 tables. The upper
+ limit can be increased by setting environment variable
Note that there is no automatic garbage collection for tables. + +
Notice that there is no automatic garbage collection for tables.
Even if there are no references to a table from any process, it
- will not automatically be destroyed unless the owner process
- terminates. It can be destroyed explicitly by using
-
Some implementation details:
+In the current implementation, every object insert and + look-up operation results in a copy of the object.
Also worth noting is the subtle difference between + +
Notice the subtle difference between
matching and comparing equal, which is
- demonstrated by the different table types
Two Erlang terms
Two Erlang terms compare equal
+ if they either are of the same type and value, or if
+ both are numeric types and extend to the same value, so that
+
The
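The difference between matching and comparing equal can be illustrated with a small sketch. The module name `ets_equal_sketch` and the table names are assumptions for this example. The keys `1` and `1.0` do not match, so a `set` table keeps both objects. They do compare equal, so an `ordered_set` table treats them as the same key.

```erlang
-module(ets_equal_sketch).
-export([run/0]).

%% Insert objects with keys 1 and 1.0 into a set and an ordered_set
%% table and report how many objects each table ends up holding.
run() ->
    Set = ets:new(set_tab, [set]),
    Ord = ets:new(ord_tab, [ordered_set]),
    Insert = fun(Tab) ->
                     true = ets:insert(Tab, {1, integer_key}),
                     true = ets:insert(Tab, {1.0, float_key})
             end,
    Insert(Set),
    Insert(Ord),
    Result = {ets:info(Set, size),
              ets:info(Ord, size),
              length(ets:lookup(Set, 1.0)),
              length(ets:lookup(Ord, 1))},
    true = ets:delete(Set),
    true = ets:delete(Ord),
    Result.
```

The objects are inserted one at a time on purpose: when a single list contains several objects whose keys collide, which object is kept is not defined.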
In general, the functions below will exit with reason
-
The functions in this module exit with reason
+
This module provides some limited support for concurrent access. All updates to single objects are guaranteed to be both atomic - and isolated. This means that an updating operation towards - a single object will either succeed or fail completely without any - effect at all (atomicity). - Nor can any intermediate results of the update be seen by other - processes (isolation). Some functions that update several objects + and isolated. This means that an updating operation to + a single object either succeeds or fails completely without any + effect (atomicity) and that + no intermediate results of the update can be seen by other + processes (isolation). Some functions that update many objects state that they even guarantee atomicity and isolation for the entire operation. In database terms the isolation level can be seen as - "serializable", as if all isolated operations were carried out serially, + "serializable", as if all isolated operations are carried out serially, one after the other in a strict order.
-No other support is available within ETS that would guarantee
- consistency between objects. However, the
No other support is available within this module that would guarantee
+ consistency between objects. However, function
+
Some of the functions uses a match specification,
- match_spec. A brief explanation is given in
-
Some of the functions use a match specification,
+
Opaque continuation used by
A table identifier, as returned by new/2.
A table identifier, as returned by
+
Returns a list of all tables at the node. Named tables are - given by their names, unnamed tables are given by their + specified by their names, unnamed tables are specified by their table identifiers.
-There is no guarantee of consistency in the returned list. Tables created - or deleted by other processes "during" the ets:all() call may or may - not be included in the list. Only tables created/deleted before - ets:all() is called are guaranteed to be included/excluded.
+There is no guarantee of consistency in the returned list. Tables
+ created or deleted by other processes "during" the
Deletes the entire table
Deletes all objects with the key
Deletes all objects with key
Delete all objects in the ETS table
Delete the exact object
Delete the exact object
Reads a file produced by
Equivalent to
Reads a file produced by
Equivalent to
Reads a file produced by
Reads a file produced by
The currently only supported option is
If no
If verification is turned on and the file was written with
- the option
The only supported option is
If no
If verification is turned on and the file was written with
+ option
Returns the first key
Use
Returns the first key
To find subsequent keys in the table, use
+
If
If
If
If
Fills an already created ETS table with the objects in the
- already opened Dets table named
Throws a badarg error if any of the tables does not exist or the - dets table is not open.
+ already opened Dets tableIf any of the tables does not exist or the Dets table is
+ not open, a
Pseudo function that by means of a
Pseudo function that by a
The parse transform is implemented in the module
-
The parse transform is provided in the
The fun is very restricted, it can take only a single
parameter (the object to match): a sole variable or a
- tuple. It needs to use the
The return value is the resulting match_spec.
-Example:
+ tuple. It must use theThe return value is the resulting match specification.
+Example:
1> ets:fun2ms(fun({M,N}) when N > 3 -> M end).
[{{'$1','$2'},[{'>','$2',3}],['$1']}]
- Variables from the environment can be imported, so that this - works:
+Variables from the environment can be imported, so that the + following works:
2> X=3.
3
3> ets:fun2ms(fun({M,N}) when N > X -> M end).
[{{'$1','$2'},[{'>','$2',{const,3}}],['$1']}]
- The imported variables will be replaced by match_spec +
The imported variables are replaced by match specification
4> ets:fun2ms(fun({M,N}) when N > X, is_atomm(M) -> M end).
Error: fun containing local Erlang function calls
@@ -362,724 +405,832 @@ Error: fun containing local Erlang function calls
{error,transform_error}
5> ets:fun2ms(fun({M,N}) when N > X, is_atom(M) -> M end).
[{{'$1','$2'},[{'>','$2',{const,3}},{is_atom,'$1'}],['$1']}]
- As can be seen by the example, the function can be called - from the shell too. The fun needs to be literally in the call - when used from the shell as well. Other means than the - parse_transform are used in the shell case, but more or less - the same restrictions apply (the exception being records, - as they are not handled by the shell).
+As shown by the example, the function can be called + from the shell also. The fun must be literally in the call + when used from the shell as well.
If the parse_transform is not applied to a module which
- calls this pseudo function, the call will fail in runtime
- (with a
If the
For more information, see
-
For more information, see
Make process
The process
Note that
Make process
The process
Notice that this function does not affect option
+
Displays information about all ETS tables on tty.
+Displays information about all ETS tables on a terminal.
Browses the table
Browses table
Returns information about the table
Returns information about table
Indicates if the table is compressed.
+The pid of the heir of the table, or
The key position.
+The number of words allocated to the table.
+The table name.
+Indicates if the table is named.
+The node where the table is stored. This field is no longer + meaningful, as tables cannot be accessed from other nodes.
+The pid of the owner of the table.
+The table access rights.
+The number of objects inserted in the table.
+The table type.
+Indicates whether the table uses
Indicates whether the table uses
Returns the information associated with
In R11B and earlier, this function would not fail but return
-
In addition to the
Returns the information associated with
In Erlang/OTP R11B and earlier, this function would not fail but
+ return
In addition to the
-
Indicates if the table is fixed by any process.
+If the table has been fixed using
-
The format and value of
The format and value of
If the table never has been fixed, the call returns
-
- Returns internal statistics about set, bag and duplicate_bag tables on an internal format used by OTP test suites.
- Not for production use.
Returns internal statistics about
Replaces the existing objects of the table
Replaces the existing objects of table
When called with the argument
When called with argument
If the type of the table is
If the table type is
Inserts the object or all of the objects in the list
-
Inserts the object or all of the objects in list
+
If the table type is
If the table type is
If the list contains more than one object with
+ matching keys and the table type is
The entire operation is guaranteed to be
This function works exactly like
Same as
If
This function is used to check if a term is a valid
- compiled
Checks if a term is a valid
+ compiled
Examples:
+The following expression yields
ets:is_compiled_ms(ets:match_spec_compile([{'_',[],[true]}])).
- will yield
The following expressions yield
MS = ets:match_spec_compile([{'_',[],[true]}]),
Broken = binary_to_term(term_to_binary(MS)),
ets:is_compiled_ms(Broken).
- will yield false, as the variable
The fact that compiled match_specs has no external - representation is for performance reasons. It may be subject - to change in future releases, while this interface will - still remain for backward compatibility reasons.
+The reason for not having an external representation of + compiled match specifications is performance. It can be + subject to change in future releases, while this interface + remains for backward compatibility.
Returns the last key
Use
Returns the last key
To find preceding keys in the table, use
+
Returns a list of all objects with the key
In the case of
Returns a list of all objects with key
For tables of type
For tables of type
The difference is the same as between
As an example, one can insert an object with
If the table is of type
For tables of type
Note that the time order of object insertions is preserved;
- the first object inserted with the given key will be first
+ key. For tables of type
Notice that the time order of object insertions is preserved; + the first object inserted with the specified key is the first in the resulting list, and so on.
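The insertion-order guarantee can be sketched with a `bag` table. The module name `ets_order_sketch`, the table name, and the stored tuples are illustrative assumptions.

```erlang
-module(ets_order_sketch).
-export([run/0]).

%% Insert two objects under the same key into a bag table and look
%% them up; the result list follows the order of insertion.
run() ->
    Tab = ets:new(order_tab, [bag]),
    true = ets:insert(Tab, {k, first}),
    true = ets:insert(Tab, {k, second}),
    Objects = ets:lookup(Tab, k),
    true = ets:delete(Tab),
    Objects.
```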
-Insert and look-up times in tables of type
Insert and lookup times in tables of type
If the table
If the table is of type
If no object with the key
The difference between
For a table
For tables of type
If no object with key
The difference between
Continues a match started with
+
When there are no more objects in the table,
Matches the objects in the table
Matches the objects in table
A pattern is a term that may contain:
+A pattern is a term that can contain:
The function returns a list with one element for each matching object, where each element is an ordered list of - pattern variable bindings. An example:
+ pattern variable bindings, for example:
-6> ets:match(T, '$1'). % Matches every object in the table
+6> ets:match(T, '$1'). % Matches every object in table
[[{rufsen,dog,7}],[{brunte,horse,5}],[{ludde,dog,5}]]
7> ets:match(T, {'_',dog,'$1'}).
[[7],[5]]
8> ets:match(T, {'_',cow,'$1'}).
[]
If the key is specified in the pattern, the match is very - efficient. If the key is not specified, i.e. if it is a + efficient. If the key is not specified, that is, if it is a variable or an underscore, the entire table must be searched. The search time can be substantial if the table is very large.
-On tables of the
For tables of type
Works like
Works like
If the table is empty,
Continues a match started with
Deletes all objects that match pattern
Deletes all objects which match the pattern
Continues a match started with
+
When there are no more objects in the table,
Matches the objects in the table
Matches the objects in table
If the key is specified in the pattern, the match is very - efficient. If the key is not specified, i.e. if it is a + efficient. If the key is not specified, that is, if it is a variable or an underscore, the entire table must be searched. The search time can be substantial if the table is very large.
-On tables of the
For tables of type
Works like
Continues a match started with
Works like
If the table is empty,
This function transforms a
-
If the term
Transforms a
+
If term
This function has limited use in normal code, it is used by
- Dets to perform the
This function has limited use in normal code. It is used by the
+
This function executes the matching specified in a
- compiled
The matching will be executed on each element in
Executes the matching specified in a compiled
+
The matching is executed on each element in
Example:
+The following two calls give the same result (but certainly not the + same execution time):
Table = ets:new...
-MatchSpec = ....
+MatchSpec = ...
% The following call...
ets:match_spec_run(ets:tab2list(Table),
ets:match_spec_compile(MatchSpec)),
-% ...will give the same result as the more common (and more efficient)
-ets:select(Table,MatchSpec),
+% ...gives the same result as the more common (and more efficient)
+ets:select(Table, MatchSpec),
This function has limited use in normal code, it is used by
- Dets to perform the
This function has limited use in normal code. It is used by the
+
Works like
Works like
Creates a new table and returns a table identifier which can +
Creates a new table and returns a table identifier that can be used in subsequent operations. The table identifier can be sent to other processes so that a table can be shared between different processes within a node.
-The parameter
Parameter
The table is a
The table is a
The table is a
The table is a
Any process can read or write to the table.
+The owner process can read and write to the table. Other processes can only read the table. This is the default setting for the access rights.
+Only the owner process can read or write to the table.
If this option is present, name
Specifies which element in the stored tuples to use
+ as key. By default, it is the first element, that is,
+
Note that any tuple stored in the table must have at +
Notice that any tuple stored in the table must have at
least
- Set a process as heir. The heir will inherit the table if
- the owner terminates. The message
-
Set a process as heir. The heir inherits the table if
+ the owner terminates. Message
+
In the current implementation, table type
Performance tuning. Defaults to
Option
Notice that this option does not change any guarantees about
+
Table type
Performance tuning. Defaults to
You typically want to enable this option when concurrent read + operations are much more frequent than write operations, or when + concurrent reads and writes come in large read and write bursts + (that is, many reads not interrupted by writes, and many + writes not interrupted by reads).
+You typically do + not want to enable this option when the common access + pattern is a few read operations interleaved with a few write + operations repeatedly. In this case, you would get a performance + degradation by enabling this option.
+Option
If this option is present, the table data is stored in a more
+ compact format to consume less memory. However, it makes
+ table operations slower, especially operations that need to
+ inspect entire objects, such as
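As a rough illustration of how several of the options described above combine, the following sketch creates a protected table keyed on a record field. The module name, the record, and the stored data are assumptions made for the example.

```erlang
-module(ets_new_demo).
-export([run/0]).

%% Hypothetical record; the key is the 'id' field, not the first element.
-record(user, {name, id}).

run() ->
    %% keypos selects the tuple element used as key. #user.id expands to
    %% the position of field 'id' in the record tuple (3 here).
    T = ets:new(users, [set, protected,
                        {keypos, #user.id},
                        {read_concurrency, true}]),
    true = ets:insert(T, #user{name = "ann", id = 1}),
    [#user{name = Name}] = ets:lookup(T, 1),
    true = ets:delete(T),
    Name.
```

Calling `ets_new_demo:run()` looks the record up by its `id` field and returns the stored name.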
Returns the next key
Use
Unless a table of type
Returns the next key
To find the first key in the table, use
+
Unless a table of type
Returns the previous key
Use
Returns the previous key
To find the last key in the table, use
+
Renames the named table
This function can be used to restore an opaque continuation
- returned by
Restores an opaque continuation returned by
+
The reason for this function is that continuation terms
- contain compiled match_specs and therefore will be
- invalidated if converted to external term format. Given that
- the original match_spec is kept intact, the continuation can
+ contain compiled match specifications and therefore are
+ invalidated if converted to external term format. Given that the
+ original match specification is kept intact, the continuation can
be restored, meaning it can once again be used in subsequent
-
As an example, the following sequence of calls will fail:
+Examples:
+The following sequence of calls fails:
T=ets:new(x,[]),
...
@@ -1089,7 +1240,9 @@ A
end),10),
Broken = binary_to_term(term_to_binary(C)),
ets:select(Broken).
- ...while the following sequence will work:
+The following sequence works, as the call to
+
T=ets:new(x,[]),
...
@@ -1100,45 +1253,44 @@ end),
{_,C} = ets:select(T,MS,10),
Broken = binary_to_term(term_to_binary(C)),
ets:select(ets:repair_continuation(Broken,MS)).
- ...as the call to
This function is very rarely needed in application code. It
- is used by Mnesia to implement distributed
This function is rarely needed in application code. It is used
+ by Mnesia to provide distributed
The reason for not having an external representation of a - compiled match_spec is performance. It may be subject to - change in future releases, while this interface will remain + compiled match specification is performance. It can be subject to + change in future releases, while this interface remains for backward compatibility.
Fixes a table of the
Fixes a table of type
A process fixes a table by calling
-
If several processes fix a table, the table will remain fixed +
If many processes fix a table, the table remains fixed until all processes have released it (or terminated). A reference counter is kept on a per process basis, and N - consecutive fixes requires N releases to actually release - the table.
-When a table is fixed, a sequence of
When a table is fixed, a sequence of
+
Example:
clean_all_with_value(Tab,X) ->
safe_fixtable(Tab,true),
@@ -1155,218 +1307,205 @@ clean_all_with_value(Tab,X,Key) ->
true
end,
clean_all_with_value(Tab,X,ets:next(Tab,Key)).
- Note that no deleted objects are actually removed from a +
Notice that no deleted objects are removed from a fixed table until it has been released. If a process fixes a table but never releases it, the memory used by the deleted - objects will never be freed. The performance of operations on - the table will also degrade significantly.
-Use
-
To retrieve information about which processes have fixed which
+ tables, use
Note that for tables of the
Notice that for table type
Continues a match started with
+
When there are no more objects in the table,
Matches the objects in the table
This means that the match_spec is always a list of one or
- more tuples (of arity 3). The tuples first element should be
- a pattern as described in the documentation of
-
Matches the objects in table
+MatchSpec = [MatchFunction]
+MatchFunction = {MatchHead, [Guard], [Result]}
+MatchHead = "Pattern as in ets:match"
+Guard = {"Guardtest name", ...}
+Result = "Term construct"
+ This means that the match specification is always a list of one or
+ more tuples (of arity 3). The first element of the tuple is to be
+ a pattern as described in
+
The return value is constructed using the "match variables"
- bound in the MatchHead or using the special match variables
+ bound in
ets:match(Tab,{'$1','$2','$3'})
is exactly equivalent to:
ets:select(Tab,[{{'$1','$2','$3'},[],['$$']}])
- - and the following
And that the following
ets:match_object(Tab,{'$1','$2','$1'})
is exactly equivalent to
ets:select(Tab,[{{'$1','$2','$1'},[],['$_']}])
Composite terms can be constructed in the
ets:select(Tab,[{{'$1','$2','$3'},[],['$$']}])
gives the same output as:
ets:select(Tab,[{{'$1','$2','$3'},[],[['$1','$2','$3']]}])
- i.e. all the bound variables in the match head as a list. If +
That is, all the bound variables in the match head as a list. If
tuples are to be constructed, one has to write a tuple of
- arity 1 with the single element in the tuple being the tuple
- one wants to construct (as an ordinary tuple could be mistaken
- for a
Therefore the following call:
ets:select(Tab,[{{'$1','$2','$1'},[],['$_']}])
gives the same output as:
ets:select(Tab,[{{'$1','$2','$1'},[],[{{'$1','$2','$1'}}]}])
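The three parts of a match function (match head, guards, and result) can be seen together in the following sketch. The module name and the table contents are assumptions; the guard list corresponds to the guard sequence is_integer(X), is_integer(Y), X + Y < 4711.

```erlang
-module(select_guard_demo).
-export([run/0]).

run() ->
    T = ets:new(pairs, [bag]),
    true = ets:insert(T, [{1,2}, {4000,800}, {a,1}, {10,20}]),
    MS = [{%% MatchHead: bind both elements of a 2-tuple.
           {'$1','$2'},
           %% Guards: both integers, and their sum below 4711.
           [{is_integer,'$1'}, {is_integer,'$2'},
            {'<', {'+','$1','$2'}, 4711}],
           %% Result: a tuple is built by wrapping it in a 1-tuple.
           [{{'$1','$2'}}]}],
    Result = lists:sort(ets:select(T, MS)),
    true = ets:delete(T),
    Result.
```

Here `{4000,800}` is rejected by the sum guard and `{a,1}` by the type guard, so only the two small integer pairs are returned.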
- - this syntax is equivalent to the syntax used in the trace
- patterns (see
-
The
This syntax is equivalent to the syntax used in the trace
+ patterns (see the
+
The
The
- is expressed like this (X replaced with '$1' and Y with - '$2'):
+is expressed as follows (
- On tables of the
For tables of type
Works like
Continues a match started with
-
Works like
If the table is empty,
Matches the objects in the table
Matches the objects in table
The function could be described as a
This function can be described as a
+
The function returns the number of objects matched.
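A minimal sketch of counting matching objects, assuming a hypothetical module name and table contents. The match specification returns true for every object that is to be counted.

```erlang
-module(select_count_demo).
-export([run/0]).

run() ->
    T = ets:new(animals, [set]),
    true = ets:insert(T, [{1,dog}, {2,cat}, {3,dog}]),
    %% Count every object whose second element is the atom dog.
    N = ets:select_count(T, [{{'_',dog}, [], [true]}]),
    true = ets:delete(T),
    N.
```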
Matches the objects in the table
The function returns the number of objects actually +
Matches the objects in table
The function returns the number of objects deleted from the table.
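Deleting by match specification can be sketched in the same way. The module name and data are assumptions; an object is deleted when the match specification returns true for it.

```erlang
-module(select_delete_demo).
-export([run/0]).

run() ->
    T = ets:new(animals, [set]),
    true = ets:insert(T, [{1,dog}, {2,cat}, {3,dog}]),
    %% Delete every object whose second element is the atom dog.
    Deleted = ets:select_delete(T, [{{'_',dog}, [], [true]}]),
    Remaining = ets:tab2list(T),
    true = ets:delete(T),
    {Deleted, Remaining}.
```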
The
The match specification has to return the atom
Works like
Works like
Note that this is not equivalent to
- reversing the result list of a
Continues a match started with
-
For all other table types, the behaviour is exactly that of
Example:
+Continues a match started with
Example:
1> T = ets:new(x,[ordered_set]).
2> [ ets:insert(T,{N}) || N <- lists:seq(1,10) ].
@@ -1384,217 +1523,288 @@ is_integer(X), is_integer(Y), X + Y < 4711]]>
8> R2.
[{2},{1}]
9> '$end_of_table' = ets:select_reverse(C2).
-...
-
+...
Works like
Works like
Notice that this is not equivalent to
+ reversing the result list of a
Set table options. The only option that currently is allowed to be
- set after the table has been created is
-
Sets table options. The only allowed option to be set after the
+ table has been created is
+
This function is mostly for debugging purposes. Normally
- one should use
Returns all objects in the
Returns all objects in slot
Unless a table of type
Unless a table of type
Dumps the table
Equivalent to
Dumps table
Equivalent to
+
Dumps the table
When dumping the table, certain information about the table - is dumped to a header at the beginning of the dump. This - information contains data about the table type, - name, protection, size, version and if it's a named table. It - also contains notes about what extended information is added - to the file, which can be a count of the objects in the file - or a MD5 sum of the header and records in the file.
-The size field in the header might not correspond to the - actual number of records in the file if the table is public - and records are added or removed from the table during - dumping. Public tables updated during dump, and that one wants - to verify when reading, needs at least one field of extended - information for the read verification process to be reliable - later.
-The
The number of objects actually written to the file is - noted in the file footer, why verification of file truncation - is possible even if the file was updated during - dump.
The header and objects in the file are checksummed using - the built in MD5 functions. The MD5 sum of all objects is - written in the file footer, so that verification while reading - will detect the slightest bitflip in the file data. Using this - costs a fair amount of CPU time.
Whenever the
The
Dumps table
When dumping the table, some information about the table + is dumped to a header at the beginning of the dump. This + information contains data about the table type, + name, protection, size, version, and if it is a named table. It + also contains notes about what extended information is added + to the file, which can be a count of the objects in the file + or an MD5 sum of the header and records in the file.
+The size field in the header might not correspond to the + number of records in the file if the table is public + and records are added or removed from the table during + dumping. Public tables that are updated during the dump, and that are + to be verified when reading, need at least one field of extended + information for the read verification process to be reliable + later.
+Option
The number of objects written to the file is + noted in the file footer, so file truncation can be + verified even if the file was updated during dump.
+The header and objects in the file are checksummed using + the built-in MD5 functions. The MD5 sum of all objects is + written in the file footer, so that verification while reading + detects the slightest bitflip in the file data. Using this + costs a fair amount of CPU time.
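The following sketch dumps a table with both kinds of extended information and reads it back with verification enabled. The module name, the filename in the current directory, and the table contents are assumptions made for the example.

```erlang
-module(tab2file_demo).
-export([run/0]).

run() ->
    File = "ets_dump_demo.tab",            % hypothetical filename
    T = ets:new(demo, [set, public]),
    true = ets:insert(T, [{1,a}, {2,b}]),
    %% Write an object count and an MD5 sum to the file footer.
    ok = ets:tab2file(T, File,
                      [{extended_info, [md5sum, object_count]}]),
    true = ets:delete(T),
    %% verify makes file2tab check the extended information.
    {ok, T2} = ets:file2tab(File, [{verify, true}]),
    Objects = lists:sort(ets:tab2list(T2)),
    true = ets:delete(T2),
    ok = file:delete(File),
    Objects.
```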
+Whenever option
If option
Returns a list of all objects in the table
Returns a list of all objects in table
Returns information about the table dumped to file by
-
The following items are returned:
-The name of the dumped table. If the table was a
- named table, a table with the same name cannot exist when the
- table is loaded from file with
-
An error is returned if the file is inaccessible,
- badly damaged or not an file produced with
Returns information about the table dumped to file by
+
The following items are returned:
+The name of the dumped table. If the table was a
+ named table, a table with the same name cannot exist when the
+ table is loaded from file with
+
The ETS type of the dumped table (that is,
The protection of the dumped table (that is,
The
The number of objects in the table when the table dump
+ to file started. For a
The extended information written in the file footer to
+ allow stronger verification during table loading from file, as
+ specified to
A tuple
An error is returned if the file is inaccessible,
+ badly damaged, or not produced with
+
Returns a Query List
+ Comprehension (QLC) query handle. The
+
When there are only simple restrictions on the key position
- QLC uses
When there are only simple restrictions on the key position,
+ QLC uses
The table is traversed one key at a time by calling
+
The table is traversed one key at a time by calling
+
The table is traversed by calling
+
As for
The following example uses an explicit match_spec to - traverse the table:
+ +Examples:
+An explicit match specification is here used to traverse the + table:
9> true = ets:insert(Tab = ets:new(t, []), [{1,a},{2,b},{3,c},{4,d}]),
MS = ets:fun2ms(fun({X,Y}) when (X > 1) or (X < 5) -> {Y} end),
QH1 = ets:table(Tab, [{traverse, {select, MS}}]).
- An example with implicit match_spec:
+An example with an implicit match specification:
10> QH2 = qlc:q([{Y} || {X,Y} <- ets:table(Tab), (X > 1) or (X < 5)]).
- The latter example is in fact equivalent to the former which
- can be verified using the function
The latter example is equivalent to the former, which
+ can be verified using function
11> qlc:info(QH1) =:= qlc:info(QH2). true@@ -1603,52 +1813,60 @@ true two query handles.
Returns and removes a list of all objects with key
+
The specified
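A short sketch of the return-and-remove behavior, with an assumed module name and table contents:

```erlang
-module(take_demo).
-export([run/0]).

run() ->
    T = ets:new(t, [set]),
    true = ets:insert(T, [{1,a}, {2,b}]),
    Taken = ets:take(T, 1),        % returns the objects and removes them
    After = ets:lookup(T, 1),      % the key is gone afterwards
    Missing = ets:take(T, 99),     % a key that was never present
    true = ets:delete(T),
    {Taken, After, Missing}.
```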
This function is a utility to test a
-
If the match specification is syntactically correct, the function
+ either returns
If the match specification contains errors, tuple
+
This is a useful debugging and test tool, especially when
- writing complicated
See also:
Returns a list of all objects with the key
The given
Fills an already created/opened Dets table with the objects
- in the already opened ETS table named
This function provides an efficient way to update one or more - counters, without the hassle of having to look up an object, update - the object by incrementing an element and insert the resulting object - into the table again. (The update is done atomically; i.e. no process - can access the ets table in the middle of the operation.) -
-It will destructively update the object with key
This function destructively updates the object with key
+
If a
If a
A list of
The given
If a default object
A list of
The specified
If a default object
The function will fail with reason
The function fails with reason
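The different argument forms of the counter update can be sketched as follows. The module name, keys, and values are assumptions made for the example.

```erlang
-module(update_counter_demo).
-export([run/0]).

run() ->
    T = ets:new(counters, [set]),
    true = ets:insert(T, {hits, 0}),
    %% Plain increment of the element after the key.
    A = ets:update_counter(T, hits, 1),
    %% {Pos, Incr}: add 4 to element 2.
    B = ets:update_counter(T, hits, {2, 4}),
    %% {Pos, Incr, Threshold, SetValue}: 5 + 10 exceeds 10, so reset to 0.
    C = ets:update_counter(T, hits, {2, 10, 10, 0}),
    %% Default object: inserted first because key misses does not exist.
    D = ets:update_counter(T, misses, 1, {misses, 0}),
    true = ets:delete(T),
    [A, B, C, D].
```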
This function provides an efficient way to update one or more - elements within an object, without the hassle of having to look up, - update and write back the entire object. -
-It will destructively update the object with key
A list of
The function returns
The given
The function will fail with reason
This function destructively updates the object with key
+
A list of
Returns
The specified
The function fails with reason
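A minimal sketch of updating elements in place, assuming a hypothetical module name and a single stored tuple:

```erlang
-module(update_element_demo).
-export([run/0]).

run() ->
    T = ets:new(t, [set]),
    true = ets:insert(T, {k, a, b}),
    %% Update one element, then several elements in one call.
    R1 = ets:update_element(T, k, {2, x}),
    R2 = ets:update_element(T, k, [{2, y}, {3, z}]),
    %% A missing key gives false instead of failing.
    R3 = ets:update_element(T, missing, {2, x}),
    Object = ets:lookup(T, k),
    true = ets:delete(T),
    {R1, R2, R3, Object}.
```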
The functions of this module sort terms on files, merge already - sorted files, and check files for sortedness. Chunks containing - binary terms are read from a sequence of files, sorted +
This module contains functions for sorting terms on files, merging + already sorted files, and checking files for sortedness. Chunks + containing binary terms are read from a sequence of files, sorted internally in memory and written on temporary files, which are merged producing one sorted file as output. Merging is provided as an optimization; it is faster when the files are already - sorted, but it always works to sort instead of merge. -
+ sorted, but it always works to sort instead of merge. +On a file, a term is represented by a header and a binary. Two - options define the format of terms on files: -
-Option
Option
Other options are: -
-Other options are:
+ +The default is to sort terms in
+ ascending order, but that can be changed by value
+
When sorting or merging files,
+ only the first of a sequence of terms that compare equal (
The directory where
+ temporary files are put can be chosen explicitly. The
+ default, implied by value
Temporary files and the output file can be compressed. Defaults
+
By default about 512*1024 bytes read from files are sorted + internally. This option is rarely needed.
By default 16 files are merged at a time. This option is rarely + needed.
As an alternative to sorting files, a function of one argument
- can be given as input. When called with the argument
A function of one argument can be given as output. The results
+ can be specified as input. When called with argument
Any other value is immediately returned as value of the current call
+ to
A function of one argument can be specified as output. The results
of sorting or merging the input is collected in a non-empty
sequence of variable length lists of binaries or terms depending
on the format. The output function is called with one list at a
@@ -151,18 +176,20 @@
call to the sort or merge function. Each output function is
called exactly once. When some output function has been applied
to all of the results or an error occurs, the last function is
- called with the argument
If a function is specified as input and the last input function
+ returns
As an example, consider sorting the terms on a disk log file. A function that reads chunks from the disk log and returns a list of binaries is used as input. The results are collected in a list of terms.
+
sort(Log) ->
{ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
@@ -193,29 +220,32 @@ output(L) ->
lists:append(lists:reverse(L));
(Terms) ->
output([Terms | L])
- end.
- Further examples of functions as input and output can be found
- at the end of the
For more examples of functions as input and output, see
+ the end of the
The possible values of
Sorts terms on files.
Checks files for sortedness. If a file is not sorted, the + first out-of-order element is returned. The first term on a + file has position 1.
+Sorts terms on files.
Checks files for sortedness. If a file is not sorted, the + first out-of-order element is returned. The first term on a + file has position 1.
+Merges tuples on files. Each input file is assumed to be + sorted on key(s).
+Sorts tuples on files.
Sorts tuples on files.
+Sorts tuples on files. The sort is performed on the
element(s) mentioned in
Merges tuples on files. Each input file is assumed to be - sorted on key(s).
-Checks files for sortedness. If a file is not sorted, the - first out-of-order element is returned. The first term on a - file has position 1.
-Sorts terms on files.
+Checks files for sortedness. If a file is not sorted, the - first out-of-order element is returned. The first term on a - file has position 1.
-Sorts terms on files.
+This module contains utilities on a higher level than the
This module does not support "raw" file names (i.e. files whose names - do not comply with the expected encoding). Such files will be ignored - by the functions in this module.
-For more information about raw file names, see the
This module contains utilities on a higher level than the
+
This module does not support "raw" filenames (that is, files whose + names do not comply with the expected encoding). Such files are ignored + by the functions in this module.
+ +For more information about raw filenames, see the
+
The
Ensures that all parent directories for the specified file or
+ directory name
Returns
The
Returns the size of the specified file.
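Several of these functions can be combined as in the sketch below. The module name and the directory and file names, which are created under the current directory and removed again, are assumptions made for the example.

```erlang
-module(filelib_demo).
-export([run/0]).

run() ->
    File = filename:join(["filelib_demo_dir", "sub", "data.txt"]),
    %% Creates filelib_demo_dir/sub, but not the file itself.
    ok = filelib:ensure_dir(File),
    IsDir = filelib:is_dir(filename:dirname(File)),
    ok = file:write_file(File, <<"hello">>),
    Size = filelib:file_size(File),
    IsRegular = filelib:is_regular(File),
    %% Clean up what the example created.
    ok = file:delete(File),
    ok = file:del_dir(filename:join("filelib_demo_dir", "sub")),
    ok = file:del_dir("filelib_demo_dir"),
    {IsDir, Size, IsRegular}.
```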
The
If Unicode file name translation is in effect and the file
- system is completely transparent, file names that cannot be
- interpreted as Unicode may be encountered, in which case the
-
For more information about raw file names, see the
-
Folds function
If Unicode filename translation is in effect and the file
+ system is transparent, filenames that cannot be
+ interpreted as Unicode can be encountered, in which case the
+
For more information about raw filenames, see the
+
The
Returns
The
Returns
The
Returns
The
Returns the date and time the specified file or directory was last
+ modified, or
The
Returns a list of all files that match Unix-style wildcard string
+
The wildcard string looks like an ordinary filename, except - that certain "wildcard characters" are interpreted in a special - way. The following characters are special: -
+ that the following "wildcard characters" are interpreted in a special + way:Two adjacent
Two adjacent
Matches any of the characters listed. Two characters
- separated by a hyphen will match a range of characters.
- Example:
Other characters represent themselves. Only filenames that - have exactly the same character in the same position will match. - (Matching is case-sensitive; i.e. "a" will not match "A"). -
-Note that multiple "*" characters are allowed - (as in Unix wildcards, but opposed to Windows/DOS wildcards). -
-Examples:
+ have exactly the same character in the same position match. + Matching is case-sensitive, for example, "a" does not match "A". +Notice that multiple "*" characters are allowed + (as in Unix wildcards, but as opposed to Windows/DOS wildcards).
+Examples:
The following examples assume that the current directory is the - top of an Erlang/OTP installation. -
-To find all
To find all
- filelib:wildcard("lib/*/ebin/*.beam").
- To find either
To find
- filelib:wildcard("lib/*/src/*.?rl")
- or the following line
+filelib:wildcard("lib/*/src/*.?rl")
- filelib:wildcard("lib/*/src/*.{erl,hrl}")
- can be used.
-To find all
To find all
- filelib:wildcard("lib/*/{src,include}/*.hrl").
+filelib:wildcard("lib/*/{src,include}/*.hrl").
To find all
- filelib:wildcard("lib/*/{src,include}/*.{erl,hrl}")
- To find all
To find all
- filelib:wildcard("lib/**/*.{erl,hrl}")
+filelib:wildcard("lib/**/*.{erl,hrl}")
The
Same as
The module
This module provides functions
+ for analyzing and manipulating filenames. These functions are
designed so that the Erlang code can work on many different
- platforms with different formats for file names. With file name
- is meant all strings that can be used to denote a file. They can
- be short relative names like
In Windows, all functions return file names with forward slashes
- only, even if the arguments contain back slashes. Use
-
The module supports raw file names in the way that if a binary is present, or the file name cannot be interpreted according to the return value of
-
In Windows, all functions return filenames with forward slashes
+ only, even if the arguments contain backslashes. To normalize a
+ filename by removing redundant directory separators, use
+
The module supports raw filenames in the way that if a binary is
+ present, or the filename cannot be interpreted according to the return
+ value of
Converts a relative
Converts a relative
Unix examples:
+Unix examples:
1> pwd().
"/usr/local"
@@ -72,7 +83,7 @@
"/usr/local/../x"
4> filename:absname("/").
"/"
- Windows examples:
+Windows examples:
1> pwd(). "D:/usr/local" @@ -84,28 +95,32 @@ "D:/"
This function works like
Same as
Joins an absolute directory with a relative filename.
- Similar to
Joins an absolute directory with a relative filename. Similar to
+
- Returns a suitable path, or paths, for a given type.
- If
The options
Returns the last component of
Examples:
5> filename:basename("foo").
"foo"
@@ -271,15 +289,18 @@ true
[]
Returns the last component of
Returns the last component of
Examples:
8> filename:basename("~/src/kalle.erl", ".erl").
"kalle"
@@ -293,27 +314,32 @@ true
"kalle"
Returns the directory part of
Examples:
13> filename:dirname("/usr/src/kalle.erl").
"/usr/src"
14> filename:dirname("kalle.erl").
-"."
-
+"."
+
5> filename:dirname("\\usr\\src/kalle.erl"). % Windows
"/usr/src"
Returns the file extension of
Returns the file extension of
Examples:
15> filename:extension("foo.erl").
".erl"
@@ -321,69 +347,123 @@ true
[]
Finds the source filename and compiler options for a module.
+ The result can be fed to
It is not recommended to use this function. If possible,
+ use the
Argument
+[{"", ""}, {"ebin", "src"}, {"ebin", "esrc"}]
+ If the source file is found in the resulting directory, the function
+ returns that location together with
The function returns
Converts a possibly deep list filename consisting of characters and atoms into the corresponding flat string filename.
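For illustration, a deep list that mixes strings, characters, and atoms can be flattened as in this sketch. The module name and the path components are assumptions.

```erlang
-module(flatten_demo).
-export([run/0]).

run() ->
    %% Nested lists, a character ($/), and atoms (local, bin).
    Deep = ["/usr", [$/, local], ["/", [bin]]],
    filename:flatten(Deep).
```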
Joins a list of file name
Joins a list of filename
The result is "normalized":
Examples:
17> filename:join(["/usr", "local", "bin"]). "/usr/local/bin" 18> filename:join(["a/b///c/"]). -"a/b/c" - +"a/b/c"+
6> filename:join(["B:a\\b///c/"]). % Windows "b:a/b/c"+
Joins two file name components with directory separators.
- Equivalent to
Joins two filename components with directory separators.
+ Equivalent to
Converts
Converts
Examples:
19> filename:nativename("/usr/local/bin/"). % Unix
-"/usr/local/bin"
-
+"/usr/local/bin"
+
7> filename:nativename("/usr/local/bin/"). % Windows
"\\usr\\local\\bin"
Returns the type of path, one of
Returns the path type, which is one of the following:
Remove a filename extension.
Removes a filename extension.
Examples:
20> filename:rootname("/beam.src/kalle").
"/beam.src/kalle"
@@ -427,12 +509,14 @@ true
"/beam.src/foo.beam"
Returns a list whose elements are the path components of
Examples:
24> filename:split("/usr/local/bin").
["/","usr","local","bin"]
@@ -442,50 +526,6 @@ true
["a:/","msdev","include"]
Finds the source filename and compiler options for a module.
- The result can be fed to
We don't recommend using this function. If possible,
- use
The
-[{"", ""}, {"ebin", "src"}, {"ebin", "esrc"}]
- If the source file is found in the resulting directory, then
- the function returns that location together with
-
The function returns
An implementation of ordered sets using Prof. Arne Andersson's - General Balanced Trees. This can be much more efficient than +
This module provides ordered sets using Prof. Arne Andersson's + General Balanced Trees. Ordered sets can be much more efficient than using ordered lists, for larger sets, but this depends on the application.
+This module considers two elements as different if and only if
they do not compare equal (
The complexity on set operations is bounded by either O(|S|) or - O(|T| * log(|S|)), where S is the largest given set, depending +
The complexity on set operations is bounded by either O(|S|) or + O(|T| * log(|S|)), where S is the largest given set, depending on which is fastest for any particular function call. For operating on sets of almost equal size, this implementation is about 3 times slower than using ordered-list sets directly. For sets of very different sizes, however, this solution can be - arbitrarily much faster; in practical cases, often between 10 - and 100 times. This implementation is particularly suited for + arbitrarily much faster; in practical cases, often + 10-100 times. This implementation is particularly suited for accumulating elements a few at a time, building up a large set - (more than 100-200 elements), and repeatedly testing for + (> 100-200 elements), and repeatedly testing for membership in the current set.
+As with normal tree structures, lookup (membership testing), - insertion and deletion have logarithmic complexity.
+ insertion, and deletion have logarithmic complexity.All of the following functions in this module also exist
- and do the same thing in the
The following functions in this module also exist and provide
+ the same functionality in the
+
A GB set.
A general balanced set.
A GB set iterator.
A general balanced set iterator.
Returns a new set formed from
Rebalances the tree representation of
Rebalances the tree representation of
Returns a new set formed from
Returns a new set formed from
Returns a new set formed from
Returns only the elements of
Returns only the elements of
Returns a new empty set.
Filters elements in
Folds
Folds
Returns a set of the elements in
Turns an ordered-set list
Turns an ordered-set list
Returns a new set formed from
Returns the intersection of
Returns the intersection of the non-empty list of sets.
Returns the intersection of the non-empty list of sets.
+Returns the intersection of
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns an iterator that can be used for traversing the
- entries of
Returns an iterator that can be used for traversing the entries of
+
Returns an iterator that can be used for traversing the
- entries of
Returns the largest element in
Returns a new empty set.
+Returns
Returns
Returns a set containing only the element
Returns a set containing only element
Returns the number of elements in
Returns the smallest element in
Returns only the elements of
Returns
Returns
Returns
Returns
Returns the elements of
Returns the merged (union) set of
Returns the merged (union) set of the list of sets.
Returns the merged (union) set of the list of sets.
+Returns the merged (union) set of
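A few of the set operations can be sketched together as follows. The module name and the element values are assumptions; the membership test with 1.0 illustrates that elements are distinguished by comparison (==), not by matching.

```erlang
-module(gb_sets_demo).
-export([run/0]).

run() ->
    S0 = gb_sets:from_list([3, 1, 2, 3]),     % duplicates collapse
    S1 = gb_sets:add(5, S0),
    IsMember = gb_sets:is_member(2, S1),
    %% 1.0 compares equal to 1, so it counts as a member.
    FloatMember = gb_sets:is_member(1.0, S1),
    U = gb_sets:union(S1, gb_sets:from_list([5, 6])),
    {IsMember, FloatMember, gb_sets:size(U), gb_sets:to_list(U)}.
```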
An efficient implementation of Prof. Arne Andersson's General +
This module provides Prof. Arne Andersson's General Balanced Trees. These have no storage overhead compared to - unbalanced binary trees, and their performance is in general + unbalanced binary trees, and their performance is better than AVL trees.
+This module considers two keys as different if and only if
they do not compare equal (
Data structure:
+
-
-- {Size, Tree}, where `Tree' is composed of nodes of the form:
- - {Key, Value, Smaller, Bigger}, and the "empty tree" node:
- - nil.
- There is no attempt to balance trees after deletions. Since +{Size, Tree} + +
There is no attempt to balance trees after deletions. As deletions do not increase the height of a tree, this should be OK.
-Original balance condition h(T) <= ceil(c * log(|T|)) + +
The original balance condition h(T) <= ceil(c * log(|T|)) has been changed to the similar (but not quite equivalent) condition 2 ^ h(T) <= |T| ^ c. This should also be OK.
-Performance is comparable to the AVL trees in the Erlang book - (and faster in general due to less overhead); the difference is - that deletion works for these trees, but not for the book's - trees. Behaviour is logarithmic (as it should be).
A GB tree.
A general balanced tree.
A GB tree iterator.
A general balanced tree iterator.
Rebalances
Rebalances
Removes the node with key
Removes the node with key
Removes the node with key
Removes the node with key
Returns a new empty tree
+Returns a new empty tree.
Inserts
Inserts
Turns an ordered list
Turns an ordered list
Retrieves the value stored with
Retrieves the value stored with
Inserts
Inserts
Returns
Returns
Returns
Returns
Returns an iterator that can be used for traversing the
- entries of
Returns an iterator that can be used for traversing the
- entries of
Returns the keys in
Returns
Returns
Looks up
Looks up
Maps the function F(
Maps function F(
Returns
Returns
Returns the number of nodes in
Returns
Returns
Returns
Returns
Returns
Returns
Converts a tree into an ordered list of key-value tuples.
Updates
Updates
Returns the values in
Returns the values in
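The basic tree operations can be sketched as follows, assuming a hypothetical module name and keys. Note that insert and update are strict about whether the key exists, while enter accepts both cases.

```erlang
-module(gb_trees_demo).
-export([run/0]).

run() ->
    T0 = gb_trees:empty(),
    T1 = gb_trees:insert(b, 2, T0),     % key must not already exist
    T2 = gb_trees:enter(a, 1, T1),      % inserts or updates
    T3 = gb_trees:update(b, 20, T2),    % key must already exist
    Found = gb_trees:lookup(b, T3),
    NotFound = gb_trees:lookup(c, T3),
    %% The iterator visits the entries in key order.
    {FirstKey, FirstVal, _Iter} = gb_trees:next(gb_trees:iterator(T3)),
    {Found, NotFound, {FirstKey, FirstVal}, gb_trees:to_list(T3)}.
```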
A behaviour module for implementing event handling functionality. - The OTP event handling model consists of a generic event manager - process with an arbitrary number of event handlers which are added and - deleted dynamically.
-An event manager implemented using this module will have a standard - set of interface functions and include functionality for tracing and - error reporting. It will also fit into an OTP supervision tree. - Refer to OTP Design Principles for more information.
+This behavior module provides event handling functionality. It + consists of a generic event manager process with any number of + event handlers that are added and deleted dynamically.
+ +An event manager implemented using this module has a standard
+ set of interface functions and includes functionality for tracing and
+ error reporting. It also fits into an OTP supervision tree. For more
+ information, see
+
Each event handler is implemented as a callback module exporting - a pre-defined set of functions. The relationship between the behaviour - functions and the callback functions can be illustrated as follows:
+ a predefined set of functions. The relationship between the behavior + functions and the callback functions is as follows: +gen_event module Callback module ---------------- --------------- @@ -69,39 +73,46 @@ gen_event:which_handlers -----> - gen_event:stop -----> Module:terminate/2 - -----> Module:code_change/3-
Since each event handler is one callback module, an event manager
- will have several callback modules which are added and deleted
- dynamically. Therefore
As each event handler is one callback module, an event manager
+ has many callback modules that are added and deleted
+ dynamically.
A gen_event process handles system messages as documented in
-
A
Note that an event manager does trap exit signals + +
Notice that an event manager does trap exit signals automatically.
-The gen_event process can go into hibernation
- (see
It's also worth noting that when multiple event handlers are
- invoked, it's sufficient that one single event handler returns a
-
The
Notice that when multiple event handlers are
+ invoked, it is sufficient that one single event handler returns a
+
Unless otherwise stated, all functions in this module fail if the specified event manager does not exist or if bad arguments are - given.
+ specified.Creates an event manager process as part of a supervision - tree. The function should be called, directly or indirectly, - by the supervisor. It will, among other things, ensure that - the event manager is linked to the supervisor.
-If
If the event manager is successfully created the function
- returns
Creates a stand-alone event manager process, i.e. an event - manager which is not part of a supervision tree and thus has - no supervisor.
-See
Adds a new event handler to the event manager
Adds a new event handler to event manager
If
Adds a new event handler in the same way as
Adds a new event handler in the same way as
+
If the event handler later is deleted, the event manager +
If the event handler is deleted later, the event manager
sends a message
A term, if the event handler is removed because of an error. + Which term depends on the error.
See
Sends an event notification to the event manager
-
See
For a description of the arguments and return values, see
+
Makes a synchronous call to the event handler
See
Makes a synchronous call to event handler
For a description of
The return value
Deletes an event handler from the event manager
-
See
Deletes an event handler from event manager
+
For a description of
The return value is the return value of
Sends an event notification to event manager
+
For a description of
Creates a stand-alone event manager process, that is, an event + manager that is not part of a supervision tree and thus has + no supervisor.
+For a description of the arguments and return values, see
+
Creates an event manager process as part of a supervision + tree. The function is to be called, directly or indirectly, + by the supervisor. Among other things, it ensures that + the event manager is linked to the supervisor.
+If
If
If
If the event manager is successfully created, the function
+ returns
Orders event manager
The function returns
If the process does not exist, a
For a description of
Replaces an old event handler with a new event handler in
- the event manager
See
For a description of the arguments, see
+
First the old event handler
Then the new event handler
The new handler will be added even if the the specified old event
- handler is not installed in which case
The new handler is added even if the specified old event
+ handler is not installed, in which case
If there was a supervised connection between
If
Replaces an event handler in the event manager
Replaces an event handler in event manager
See
For a description of the arguments and return values, see
+
Returns a list of all event handlers installed in the event +
Returns a list of all event handlers installed in event
manager
See
Orders the event manager
The function returns
If the process does not exist, a
See
For a description of
The following functions should be exported from a
The following functions are to be exported from a
Whenever a new event handler is added to an event manager, - this function is called to initialize the event handler.
-If the event handler is added due to a call to
-
If the event handler is replacing another event handler due to
- a call to
If successful, the function should return
If
This function is called for an installed event handler that
+ is to update its internal state during a release
+ upgrade/downgrade, that is, when the instruction
+
For an upgrade,
The function is to return the updated internal state.
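The callback functions described in this section can be sketched as one small handler module. This is only an illustration: the module name `counter_handler`, the `count` request, and the helper `demo/0` are assumptions made for the example, and the handler state is simply the number of events seen so far.

```erlang
-module(counter_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3, demo/0]).

%% The argument passed to gen_event:add_handler/3 becomes the state.
init(Start) when is_integer(Start) -> {ok, Start}.

%% Every event, whatever it is, increments the counter.
handle_event(_Event, Count) -> {ok, Count + 1}.

%% Requests sent with gen_event:call/3,4 are answered here.
handle_call(count, Count) -> {ok, Count, Count};
handle_call(_Other, Count) -> {ok, {error, unknown_request}, Count}.

handle_info(_Info, Count) -> {ok, Count}.

%% The return value is passed back by gen_event:delete_handler/3.
terminate(_Arg, Count) -> Count.

%% No state conversion is needed in this sketch.
code_change(_OldVsn, Count, _Extra) -> {ok, Count}.

%% Starts a manager, installs the handler, and sends two events.
demo() ->
    {ok, Mgr} = gen_event:start_link(),
    ok = gen_event:add_handler(Mgr, ?MODULE, 0),
    ok = gen_event:notify(Mgr, first),
    ok = gen_event:sync_notify(Mgr, second),
    2 = gen_event:call(Mgr, ?MODULE, count),
    Final = gen_event:delete_handler(Mgr, ?MODULE, stop),
    ok = gen_event:stop(Mgr),
    Final.
```

`demo/0` returns `2`, the value that `terminate/2` hands back when the handler is deleted.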
Whenever an event manager receives an event sent using
-
This callback is optional, so event handler modules need
+ not export it. If a handler does not export this function,
+ the
This function is called by a
This function is useful for changing the form and
+ appearance of the event handler state for these cases. An
+ event handler callback module wishing to change
+ the
If the function returns
If
If the function returns
-
If the function returns
The function is to return
When
When an event handler terminates abnormally,
One use for this function is to return compact alternative + state representations to avoid printing large state terms + in log files.
Whenever an event manager receives a request sent using
-
The return values are the same as for
+
Whenever an event manager receives an event sent using
+
The return values are the same as for
If
If
If
If
See
For a description of
Whenever a new event handler is added to an event manager, + this function is called to initialize the event handler.
+If the event handler is added because of a call to
+
If the event handler replaces another event handler because of
+ a call to
+
If successful, the function returns
If
Whenever an event handler is deleted from an event manager,
- this function is called. It should be the opposite of
-
If the event handler is deleted due to a call to
-
If the event handler is deleted because of a call to
+
The event manager will terminate if it is part of a supervision
- tree and it is ordered by its supervisor to terminate.
- Even if it is not part of a supervision tree, it will
- terminate if it receives an
The event manager terminates if it is part of a supervision
+ tree and it is ordered by its supervisor to terminate.
+ Even if it is not part of a supervision tree, it
+ terminates if it receives an
The function may return any term. If the event handler is
- deleted due to a call to
The function can return any term. If the event handler is
+ deleted because of a call to
This function is called for an installed event handler which
- should update its internal state during a release
- upgrade/downgrade, i.e. when the instruction
-
In the case of an upgrade,
The function should return the updated internal state.
-This callback is optional, so event handler modules need - not export it. If a handler does not export this function, - the gen_event module uses the handler state directly for - the purposes described below.
-This function is called by a gen_event process when:
-This function is useful for customising the form and
- appearance of the event handler state for these cases. An
- event handler callback module wishing to customise
- the
The function should return
One use for this function is to return compact alternative - state representations to avoid having large state terms - printed in logfiles.
-
There is a new behaviour
A behaviour module for implementing a finite state machine.
- A generic finite state machine process (gen_fsm) implemented
- using this module will have a standard set of interface functions
- and include functionality for tracing and error reporting. It will
- also fit into an OTP supervision tree. Refer to
-
This behavior module provides a finite state machine.
+ A generic finite state machine process (
A gen_fsm assumes all specific parts to be located in a callback - module exporting a pre-defined set of functions. The relationship - between the behaviour functions and the callback functions can be - illustrated as follows:
+ +A
gen_fsm module Callback module -------------- --------------- @@ -73,34 +74,261 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 - -----> Module:terminate/3 - -----> Module:code_change/4-
If a callback function fails or returns a bad value, the gen_fsm - will terminate.
-A gen_fsm handles system messages as documented in
-
Note that a gen_fsm does not trap exit signals automatically, - this must be explicitly initiated in the callback module.
+ +If a callback function fails or returns a bad value, the
A
Notice that a
Unless otherwise stated, all functions in this module fail if - the specified gen_fsm does not exist or if bad arguments are - given.
-The gen_fsm process can go into hibernation
- (see
The
Cancels an internal timer referred to by
If the timer has already timed out, but the event has not yet + been delivered, it is cancelled as if it had not + timed out, so there is no false timer event after + returning from this function.
+Returns the remaining time in milliseconds until the timer would
+ have expired if
Makes an existing process into a
This function is useful when a more complex initialization
+ procedure is needed than the
The function fails if the calling process was not started by a
+
This function can be used by a
Return value
Sends an event asynchronously to the
For a description of the arguments, see
+
The difference between
Sends an event asynchronously to the
Sends a delayed event internally in the
The
Creates a standalone
For a description of arguments and return values, see
+
Creates a gen_fsm process as part of a supervision tree. - The function should be called, directly or indirectly, by - the supervisor. It will, among other things, ensure that - the gen_fsm is linked to the supervisor.
-The gen_fsm process calls
Creates a
The
If
If no name is provided, - the gen_fsm is not registered.
+If
If
If
If no name is provided, the
If the option
If option
If the option
If the option
If option
If option
Using the spawn option
Using spawn option
If the gen_fsm is successfully created and initialized
- the function returns
If the
If
Creates a stand-alone gen_fsm process, i.e. a gen_fsm which - is not part of a supervision tree and thus has no supervisor.
-See
Sends a time-out event internally in the
The
Orders a generic FSM to exit with the given
The function returns
Orders a generic finite state machine to exit with the specified
+
The function returns
If the process does not exist, a
Sends an event asynchronously to the gen_fsm
If the process does not exist, a
Sends an event asynchronously to the gen_fsm
See
The difference between
Sends an event to the
For a description of
For a discussion about the difference between
+
Sends an event to the gen_fsm
Sends an event to the
See
For a description of
The return value
Return value
The ancient behaviour of sometimes consuming the server
+ The ancient behavior of sometimes consuming the server
exit message if the server died during the call while
- linked to the client has been removed in OTP R12B/Erlang 5.6.
The following functions are to be exported from a
state name denotes a state of the state machine.
+ +state data denotes the internal state of the Erlang process + that implements the state machine.
+Sends an event to the gen_fsm
See
See
-
This function is called by a
For an upgrade,
The function is to return the new current state name and + updated internal data.
This function can be used by a gen_fsm to explicitly send a
- reply to a client process that called
-
The return value
This callback is optional, so callback modules need not
+ export it. The
This function is called by a
This function is useful for changing the form and
+ appearance of the
The function is to return
One use for this function is to return compact alternative + state data representations to avoid printing large state terms + in log files.
Sends a delayed event internally in the gen_fsm that calls
- this function after
The gen_fsm will call
Sends a timeout event internally in the gen_fsm that calls
- this function after
The gen_fsm will call
Whenever a
For a description of the other arguments and possible return values,
+ see
Cancels an internal timer referred by
If the timer has already timed out, but the event not yet - been delivered, it is cancelled as if it had not - timed out, so there will be no false timer event after - returning from this function.
-Returns the remaining time in ms until the timer would
- have expired if
This function is called by a
For a description of the other arguments and possible return values,
+ see
Makes an existing process into a gen_fsm. Does not return,
- instead the calling process will enter the gen_fsm receive
- loop and become a gen_fsm process. The process must
- have been started using one of the start functions in
-
This function is useful when a more complex initialization - procedure is needed than the gen_fsm behaviour provides.
-Failure: If the calling process was not started by a
-
Whenever a
For a description of the other arguments and possible return values,
+ see
The following functions should be exported from a
In the description, the expression state name is used to - denote a state of the state machine. state data is used - to denote the internal state of the Erlang process which - implements the state machine.
-Whenever a gen_fsm is started using
-
Whenever a
If initialization is successful, the function should return
-
If initialization is successful, the function is to return
+
If an integer timeout value is provided, a timeout will occur
+ state data of the
If an integer time-out value is provided, a time-out occurs
unless an event or a message is received within
If
If something goes wrong during the initialization
- the function should return
If
If the initialization fails, the function returns
+
There should be one instance of this function for each
- possible state name. Whenever a gen_fsm receives an event
- sent using
-
There is to be one instance of this function for each
+ possible state name. Whenever a
If the function returns
If the function returns
Whenever a gen_fsm receives an event sent using
-
See
There should be one instance of this function for each
- possible state name. Whenever a gen_fsm receives an event
- sent using
-
There is to be one instance of this function for each
+ possible state name. Whenever a
If the function returns
-
If the function returns
-
If the function returns
-
Whenever a gen_fsm receives an event sent using
-
See
This function is called by a gen_fsm when it receives any - other message than a synchronous or asynchronous event (or a - system message).
-See
If
If
If the function returns
+
This function is called by a gen_fsm when it is about to
- terminate. It should be the opposite of
This function is called by a
If the gen_fsm is part of a supervision tree and is ordered - by its supervisor to terminate, this function will be called +
If the
The
The shutdown strategy as defined in the child specification of
+ the supervisor is an integer time-out value, not
+
Even if the gen_fsm is not part of a supervision tree,
- this function will be called if it receives an
Otherwise, the gen_fsm will be immediately terminated.
-Note that for any other reason than
This function is called by a gen_fsm when it should update
- its internal state data during a release upgrade/downgrade,
- i.e. when the instruction
In the case of an upgrade,
The function should return the new current state name and - updated internal data.
-This callback is optional, so callback modules need not - export it. The gen_fsm module provides a default - implementation of this function that returns the callback - module state data.
-This function is called by a gen_fsm process when:
-This function is useful for customising the form and
- appearance of the gen_fsm status for these cases. A callback
- module wishing to customise the
The function should return
One use for this function is to return compact alternative - state data representations to avoid having large state terms - printed in logfiles.
+Even if the
Otherwise, the
Notice that for any other reason than
A behaviour module for implementing the server of a client-server
- relation. A generic server process (gen_server) implemented using
- this module will have a standard set of interface functions and
- include functionality for tracing and error reporting. It will
- also fit into an OTP supervision tree. Refer to
-
A gen_server assumes all specific parts to be located in a - callback module exporting a pre-defined set of functions. - The relationship between the behaviour functions and the callback - functions can be illustrated as follows:
+This behavior module provides the server of a client-server
+ relation. A generic server process (
A
gen_server module Callback module ----------------- --------------- @@ -59,175 +62,65 @@ gen_server:abcast -----> Module:handle_cast/2 - -----> Module:terminate/2 -- -----> Module:code_change/3-
If a callback function fails or returns a bad value, - the gen_server will terminate.
-A gen_server handles system messages as documented in
-
Note that a gen_server does not trap exit signals automatically, - this must be explicitly initiated in the callback module.
+- -----> Module:code_change/3 + +If a callback function fails or returns a bad value, the
+
A
Notice that a
Unless otherwise stated, all functions in this module fail if - the specified gen_server does not exist or if bad arguments are - given.
- -The gen_server process can go into hibernation
- (see
The
Creates a gen_server process as part of a supervision tree. - The function should be called, directly or indirectly, by - the supervisor. It will, among other things, ensure that - the gen_server is linked to the supervisor.
-The gen_server process calls
If
If the option
If the option
If the option
Using the spawn option
If the gen_server is successfully created and initialized
- the function returns
If
Creates a stand-alone gen_server process, i.e. a gen_server - which is not part of a supervision tree and thus has no - supervisor.
-See
Orders a generic server to exit with the
- given
The function returns
If the process does not exist, a
Sends an asynchronous request to the
For a description of the arguments, see
+
Makes a synchronous call to the gen_server
Makes a synchronous call to the
The return value
The call may fail for several reasons, including timeout and - the called gen_server dying before or during the call.
-The ancient behaviour of sometimes consuming the server +
The call can fail for many reasons, including time-out and the
+ called
The ancient behavior of sometimes consuming the server exit message if the server died during the call while - linked to the client has been removed in OTP R12B/Erlang 5.6.
+ linked to the client was removed in Erlang 5.6/OTP R12B. +Sends an asynchronous request to the
For a description of
Makes an existing process into a
This function is useful when a more complex initialization procedure
+ is needed than the
The function fails if the calling process was not started by a
+
Makes a synchronous call to all gen_servers locally +
Makes a synchronous call to all
The function returns a tuple
The function returns a tuple
When a reply
When a reply
If one of the nodes is not capable of process monitors,
- for example C or Java nodes, and the gen_server is not started
- when the requests are sent, but starts within 2 seconds,
- this function waits the whole
If one of the nodes does not support process monitors, for example,
+ C or Java nodes, and the
This problem does not exist if all nodes are Erlang nodes.
To prevent late answers (after the timeout) from polluting - the caller's message queue, a middleman process is used to - do the actual calls. Late answers will then be discarded +
To prevent late answers (after the time-out) from polluting + the message queue of the caller, a middleman process is used to + do the calls. Late answers are then discarded when they arrive at a terminated process.
Sends an asynchronous request to the gen_server
-
See
This function can be used by a
The return value
Sends an asynchronous request to the gen_servers locally
- registered as
See
-
Creates a standalone
For a description of arguments and return values, see
+
This function can be used by a gen_server to explicitly send
- a reply to a client that called
The return value
Creates a
The
If
If
If
If option
If option
If option
Using spawn option
If the
If
Makes an existing process into a gen_server. Does not return,
- instead the calling process will enter the gen_server receive
- loop and become a gen_server process. The process
- must have been started using one of the start
- functions in
This function is useful when a more complex initialization - procedure is needed than the gen_server behaviour provides.
-Failure: If the calling process was not started by a
-
Orders a generic server to exit with the specified
The function returns
If the process does not exist, a
The following functions
- should be exported from a
Whenever a gen_server is started using
-
If the initialization is successful, the function should
- return
If an integer timeout value is provided, a timeout will occur
- unless a request or a message is received within
-
If
If something goes wrong during the initialization
- the function should return
This function is called by a
For an upgrade,
If successful, the function must return the updated + internal state.
+If the function returns
This callback is optional, so callback modules need not
+ export it. The
This function is called by a
One of
The
This function is useful for changing the form and
+ appearance of the
The function is to return
One use for this function is to return compact alternative + state representations to avoid printing large state terms + in log files.
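The use case above can be sketched as a server that hides one field of its state from status reports and crash logs. Everything specific here is an assumption made for the example: the module name `secret_server`, the map-based state, and the helper `redact/1`. The sketch uses the two-argument `format_status/2` form that this text describes; later OTP releases add a one-argument variant.

```erlang
-module(secret_server).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3, format_status/2]).

start_link(Password) ->
    gen_server:start_link(?MODULE, Password, []).

init(Password) ->
    {ok, #{password => Password, hits => 0}}.

%% Replies with the number of earlier requests, then counts this one.
handle_call(hits, _From, State = #{hits := N}) ->
    {reply, N, State#{hits := N + 1}}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

%% normal: called for sys:get_status/1,2.
%% terminate: called when the server terminates abnormally and logs an error.
format_status(normal, [_PDict, State]) ->
    [{data, [{"State", redact(State)}]}];
format_status(terminate, [_PDict, State]) ->
    redact(State).

%% Hypothetical helper: replaces the sensitive field with a placeholder.
redact(State) ->
    State#{password := hidden}.
```

With this in place, the password would no longer appear in the output of `sys:get_status/1` or in the termination error log, while the server itself keeps working with the real state.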
+Whenever a gen_server receives a request sent using
-
Whenever a
If the function returns
If the function returns
If the function returns
If
For a description of
If
If
If
Whenever a gen_server receives a request sent using
-
Whenever a
See
For a description of the arguments and possible return values, see
+
This function is called by a gen_server when a timeout - occurs or when it receives any other message than a +
This function is called by a
See
For a description of the other arguments and possible return values,
+ see
Whenever a
If the initialization is successful, the function is to
+ return
If an integer time-out value is provided, a time-out occurs
+ unless a request or a message is received within
+
If
If the initialization fails, the function is to return
+
This function is called by a gen_server when it is about to
- terminate. It should be the opposite of
This function is called by a
If the gen_server is part of a supervision tree and is - ordered by its supervisor to terminate, this function will be +
If the
The
The shutdown strategy as defined in the child specification
+ of the supervisor is an integer time-out value, not
+
Even if the gen_server is not part of a supervision tree,
- this function will be called if it receives an
Otherwise, the gen_server will be immediately terminated.
-Note that for any other reason than
This function is called by a gen_server when it should
- update its internal state during a release upgrade/downgrade,
- i.e. when the instruction
In the case of an upgrade,
If successful, the function shall return the updated - internal state.
-If the function returns
This callback is optional, so callback modules need not - export it. The gen_server module provides a default - implementation of this function that returns the callback - module state.
-This function is called by a gen_server process when:
-This function is useful for customising the form and
- appearance of the gen_server status for these cases. A
- callback module wishing to customise
- the
The function should return
One use for this function is to return compact alternative - state representations to avoid having large state terms - printed in logfiles.
+Even if the
Otherwise, the
Notice that for any other reason than
The
The Standard Erlang Libraries application, STDLIB, is mandatory + in the sense that the minimal system based on Erlang/OTP consists of + STDLIB and Kernel.
+ +STDLIB contains the following functional areas:
+ +It is assumed that the reader is familiar with the Erlang programming + language.
+This module provides an interface to standard Erlang I/O servers.
The output functions all return
In the following description, all functions have an optional + +
All functions in this module have an optional
parameter
For a description of the IO protocols refer to the
As of R13A, data supplied to the
If an IO device is set in binary mode, the functions
To work with binaries in ISO-latin-1 encoding, use the
For conversion functions between character encodings, see the
For a description of the I/O protocols, see section
+
As of Erlang/OTP R13A, data supplied to function
+
If an I/O device is set in binary mode, functions
+
To work with binaries in ISO Latin-1 encoding, use the
+
For conversion functions between character encodings, see the
+
An IO device. Either
An I/O device, either
What the I/O-server sends when there is no data.
What the I/O server sends when there is no data.
Retrieves the number of columns of the
-
Writes the characters of
Writes new line to the standard output (
Reads
The input characters. If the IO device supports Unicode, - the data may represent codepoints larger than 255 (the - latin1 range). If the I/O server is set to deliver - binaries, they will be encoded in UTF-8 (regardless of if - the IO device actually supports Unicode or not).
-End of file was encountered.
-Other (rare) error condition, for instance
Reads a line from the standard input (
The characters in the line terminated by a LF (or end of - file). If the IO device supports Unicode, - the data may represent codepoints larger than 255 (the - latin1 range). If the I/O server is set to deliver - binaries, they will be encoded in UTF-8 (regardless of if - the IO device actually supports Unicode or not).
-End of file was encountered.
-Other (rare) error condition, for instance
This function requests all available options and their current values for a specific IO device. Example:
-
-1> {ok,F} = file:open("/dev/null",[read]).
-{ok,<0.42.0>}
-2> io:getopts(F).
-[{binary,false},{encoding,latin1}]
- Here the file I/O-server returns all available options for a file,
- which are the expected ones,
-3> io:getopts().
-[{expand_fun,#Fun<group.0.120017273>},
- {echo,true},
- {binary,false},
- {encoding,unicode}]
- This example is, as can be seen, run in an environment where the terminal supports Unicode input and output.
+Retrieves the number of columns of the
+
Return the user requested range of printable Unicode characters.
-The user can request a range of characters that are to be considered printable in heuristic detection of strings by the shell and by the formatting functions. This is done by supplying
Currently the only valid values for
By default, Erlang is started so that only the
The simplest way to utilize the setting is to call
In the future, this function may return more values and ranges. It is recommended to use the io_lib:printable_list/1 function to avoid compatibility problems.
Set options for the standard IO device (
Possible options and values vary depending on the actual
- IO device. For a list of supported options and their current values
- on a specific IO device, use the
The options and values supported by the current OTP IO devices are:
-If set in binary mode (
By default, all IO devices in OTP are set in list mode, but the I/O functions can handle any of these modes and so should other, user written, modules behaving as clients to I/O-servers.
-This option is supported by the standard shell (
Denotes if the terminal should echo input. Only supported for the standard shell I/O-server (
Provide a function for tab-completion (expansion)
- like the Erlang shell. This function is called
- when the user presses the TAB key. The expansion is
- active when calling line-reading functions such as
-
The function is called with the current line, upto
- the cursor, as a reversed string. It should return a
- three-tuple:
Trivial example (beep on anything except empty line, which
- is expanded to
- fun("") -> {yes, "quit", []};
- (_) -> {no, "", ["quit"]} end
- This option is supported by the standard shell only (
Specifies how characters are input or output from or to the actual IO device, implying that i.e. a terminal is set to handle Unicode input and output or a file is set to handle UTF-8 data encoding.
-The option does not affect how data is returned from the I/O functions or how it is sent in the I/O-protocol, it only affects how the IO device is to handle Unicode characters towards the "physical" device.
-The standard shell will be set for either Unicode or latin1 encoding when the system is started. The actual encoding is set with the help of the
The IO device used when Erlang is started with the "-oldshell" or "-noshell" flags is by default set to latin1 encoding, meaning that any characters beyond codepoint 255 will be escaped and that input is expected to be plain 8-bit ISO-latin-1. If the encoding is changed to Unicode, input and output from the standard file descriptors will be in UTF-8 (regardless of operating system).
-Files can also be set in
For disk files, the encoding can be set to various UTF variants. This will have the effect that data is expected to be read as the specified encoding from the file and the data will be written in the specified encoding to the disk file.
-The extended encodings are only supported on disk files (opened by the
Writes the term
Reads a term
The parsing was successful.
-End of file was encountered.
-The parsing failed.
-Other (rare) error condition, for instance
Reads a term
The parsing was successful.
-End of file was encountered.
-The parsing failed.
-Other (rare) error condition, for instance
Writes the items in
Writes the items in
1> io:fwrite("Hello world!~n", []).
Hello world!
ok
- The general format of a control sequence is
The general format of a control sequence is
Character
The following control sequences are available:
+Available control sequences:
The character
Character
The argument is a number that will be interpreted as an +
The argument is a number that is interpreted as an ASCII code. The precision is the number of times the - character is printed and it defaults to the field width, - which in turn defaults to 1. The following example - illustrates:
+ character is printed and defaults to the field width, + which in turn defaults to 1. Example:
1> io:fwrite("|~10.5c|~-10.5c|~5c|~n", [$a, $b, $c]).
| aaaaa|bbbbb |ccccc|
ok
If the Unicode translation modifier (
2> io:fwrite("~tc~n",[1024]).
@@ -435,29 +201,28 @@ ok
3> io:fwrite("~c~n",[1024]).
^@
ok
-
The argument is a float which is written as +
The argument is a float that is written as
The argument is a float which is written as +
The argument is a float that is written as
The argument is a float which is written as
The argument is a float that is written as
This format can be used for printing any object and truncating the output so it fits a specified field:
@@ -484,7 +250,8 @@ ok
3> io:fwrite("|~-10.8s|~n", [io_lib:write({hey, hey, hey})]).
|{hey,hey |
ok
- A list with integers larger than 255 is considered an error if the Unicode translation modifier is not given:
+A list with integers > 255 is considered an error if the + Unicode translation modifier is not specified:
4> io:fwrite("~ts~n",[[1024]]).
\x{400}
@@ -497,8 +264,8 @@ ok
-
Writes data with the standard syntax. This is used to
output Erlang terms. Atoms are printed within quotes if
- they contain embedded non-printable characters, and
- floats are printed accurately as the shortest, correctly
+ they contain embedded non-printable characters.
+ Floats are printed accurately as the shortest, correctly
rounded string.
p
@@ -506,11 +273,11 @@ ok
Writes the data with standard syntax in the same way as
~w , but breaks terms whose printed representation
is longer than one line into many lines and indents each
- line sensibly. Left justification is not supported.
+ line sensibly. Left-justification is not supported.
It also tries to detect lists of
printable characters and to output these as strings. The
Unicode translation modifier is used for determining
- what characters are printable. For example:
+ what characters are printable, for example:
1> T = [{attributes,[[{id,age,1.50000},{mode,explicit},
{typename,"INTEGER"}], [{id,cho},{mode,explicit},{typename,'Cho'}]]},
@@ -531,12 +298,13 @@ ok
{tag,{'PRIVATE',3}},
{mode,implicit}]
ok
- The field width specifies the maximum line length. It
- defaults to 80. The precision specifies the initial
+
The field width specifies the maximum line length.
+ Defaults to 80. The precision specifies the initial
indentation of the term. It defaults to the number of
- characters printed on this line in the same call to
- io:fwrite or io:format . For example, using
- T above:
+ characters printed on this line in the same call to
+ write/1 or
+ format/1,2,3 .
+ For example, using T above:
4> io:fwrite("Here T = ~62p~n", [T]).
Here T = [{attributes,[[{id,age,1.5},
@@ -549,8 +317,8 @@ Here T = [{attributes,[[{id,age,1.5},
{tag,{'PRIVATE',3}},
{mode,implicit}]
ok
- When the modifier l is given no detection of
- printable character lists will take place. For example:
+ When the modifier l is specified, no detection of
+ printable character lists takes place, for example:
5> S = [{a,"a"}, {b, "b"}].
6> io:fwrite("~15p~n", [S]).
@@ -561,9 +329,9 @@ ok
[{a,[97]},
{b,[98]}]
ok
- Binaries that look like UTF-8 encoded strings will be
+
Binaries that look like UTF-8 encoded strings are
output with the string syntax if the Unicode translation
- modifier is given:
+ modifier is specified:
9> io:fwrite("~p~n",[[1024]]).
[1024]
@@ -578,7 +346,7 @@ ok
W
-
Writes data in the same way as ~w , but takes an
- extra argument which is the maximum depth to which terms
+ extra argument that is the maximum depth to which terms
are printed. Anything below this depth is replaced with
... . For example, using T above:
@@ -587,17 +355,17 @@ ok
[{id,cho},{mode,...},{...}]]},{typename,'Person'},
{tag,{'PRIVATE',3}},{mode,implicit}]
ok
- If the maximum depth has been reached, then it is - impossible to read in the resultant output. Also, the +
If the maximum depth is reached, it cannot
+ be read in the resultant output. Also, the
Writes data in the same way as
9> io:fwrite("~62P~n", [T,9]).
[{attributes,[[{id,age,1.5},{mode,explicit},{typename,...}],
@@ -609,9 +377,9 @@ ok
Writes an integer in base 2..36, the default base is +
Writes an integer in base 2-36, the default base is 10. A leading dash is printed for negative integers.
-The precision field selects base. For example:
+The precision field selects base, for example:
1> io:fwrite("~.16B~n", [31]).
1F
@@ -629,7 +397,7 @@ ok
prefix to insert before the number, but after the leading
dash, if any.
The prefix can be a possibly deep list of characters or - an atom.
+ an atom. Example:
1> io:fwrite("~X~n", [31,"10#"]).
10#31
@@ -641,7 +409,7 @@ ok
Like
1> io:fwrite("~.10#~n", [31]).
10#31
@@ -671,14 +439,14 @@ ok
Ignores the next term.
Returns:
+The function returns:
The formatting succeeded.
If an error occurs, there is no output. For example:
+If an error occurs, there is no output. Example:
1> io:fwrite("~s ~w ~i ~w ~c ~n",['abc def', 'abc def', {foo, 1},{foo, 1}, 65]).
abc def 'abc def' {foo,1} A
@@ -692,45 +460,57 @@ ok
in function io:o_request/2
In this example, an attempt was made to output the single character 65 with the aid of the string formatting directive - "~s".
+Reads characters from the standard input (
Reads characters from the standard input
+ (
White space characters (SPACE, TAB and NEWLINE) which - cause input to be read to the next non-white space - character.
+Whitespace characters (Space, Tab, and + Newline) that cause input to be read to the next + non-whitespace character.
Ordinary characters which must match the next input +
Ordinary characters that must match the next input character.
Control sequences, which have the general format
-
Unless otherwise specified, leading white-space is
+
Character
Unless otherwise specified, leading whitespace is ignored for all control sequences. An input field cannot - be more than one line wide. The following control - sequences are available:
+ be more than one line wide. +Available control sequences:
An unsigned integer in base 2..36 is expected. The +
An unsigned integer in base 2-36 is expected. The field width parameter is used to specify base. Leading - white-space characters are not skipped.
+ whitespace characters are not skipped.An optional sign character is expected. A sign
- character
An integer in base 2..36 with Erlang-style base
- prefix (for example
An integer in base 2-36 with Erlang-style base
+ prefix (for example,
A string of non-white-space characters is read. If a +
A string of non-whitespace characters is read. If a field width has been specified, this number of - characters are read and all trailing white-space + characters are read and all trailing whitespace characters are stripped. An Erlang string (list of characters) is returned.
- -If Unicode translation is in effect (
If Unicode translation is in effect (
1> io:fread("Prompt> ","~s").
Prompt> <Characters beyond latin1 range not printable in this medium>
@@ -785,22 +562,23 @@ Prompt> <Characters beyond latin1 range not printable in this medium&g
2> io:fread("Prompt> ","~ts").
Prompt> <Characters beyond latin1 range not printable in this medium>
{ok,[[1091,1085,1080,1094,1086,1076,1077]]}
-
Similar to
The Unicode translation modifier is not allowed (atoms can not contain characters beyond the latin1 range).
+The Unicode translation modifier is not allowed (atoms
+ cannot contain characters beyond the
The number of characters equal to the field width are
read (default is 1) and returned as an Erlang string.
- However, leading and trailing white-space characters
+ However, leading and trailing whitespace characters
are not omitted as they are with
The Unicode translation modifier works as with
The Unicode translation modifier works as with
1> io:fread("Prompt> ","~c").
Prompt> <Character beyond latin1 range not printable in this medium>
@@ -808,21 +586,20 @@ Prompt> <Character beyond latin1 range not printable in this medium>
2> io:fread("Prompt> ","~tc").
Prompt> <Character beyond latin1 range not printable in this medium>
{ok,[[1091]]}
-
Returns the number of characters which have been - scanned up to that point, including white-space +
Returns the number of characters that have been + scanned up to that point, including whitespace characters.
It returns:
+The function returns:
The read was successful and
The read was successful and
The read operation failed and the parameter
-
The read operation failed and parameter
+
Examples:
+Examples:
20> io:fread('enter>', "~f~f~f").
enter>1.9 35.5e3 15.0
@@ -854,104 +632,127 @@ enter>: alan : joe
Retrieves the number of rows of the
-
Reads
The function returns:
+The input characters. If the I/O device supports Unicode,
+ the data can represent codepoints > 255 (the
+
End of file was encountered.
+Other (rare) error condition, such as
Reads data from the standard input (
Reads a line from the standard input (
The function returns:
The tokenization succeeded.
-End of file was encountered by the tokenizer.
+The characters in the line terminated by a line feed (or end of
+ file). If the I/O device supports Unicode,
+ the data can represent codepoints > 255 (the
+
End of file was encountered by the I/O-server.
+End of file was encountered.
An error occurred while tokenizing.
-Other (rare) error condition, for instance
Other (rare) error condition, such as
Example:
-
-23> io:scan_erl_exprs('enter>').
-enter>abc(), "hey".
-{ok,[{atom,1,abc},{'(',1},{')',1},{',',1},{string,1,"hey"},{dot,1}],2}
-24> io:scan_erl_exprs('enter>').
-enter>1.0er.
-{error,{1,erl_scan,{illegal,float}},2}
Reads data from the standard input (
Requests all available options and their current + values for a specific I/O device, for example:
+
+1> {ok,F} = file:open("/dev/null",[read]).
+{ok,<0.42.0>}
+2> io:getopts(F).
+[{binary,false},{encoding,latin1}]
+ Here the file I/O server returns all available options for a file,
+ which are the expected ones,
+3> io:getopts().
+[{expand_fun,#Fun<group.0.120017273>},
+ {echo,true},
+ {binary,false},
+ {encoding,unicode}]
+ This example is, as can be seen, run in an environment where the + terminal supports Unicode input and output.
Writes new line to the standard output
+ (
Reads data from the standard input
(
The function returns:
End of file was encountered by the I/O-server.
+End of file was encountered by the I/O server.
An error occurred while tokenizing or parsing.
Other (rare) error condition, for instance
Other (rare) error condition, such as
Example:
@@ -985,24 +786,25 @@ enter>abc("hey".
{error,{1,erl_parse,["syntax error before: ",["'.'"]]},2}
Reads data from the standard input (
The function returns:
End of file was encountered by the I/O-server.
+End of file was encountered by the I/O server.
An error occurred while tokenizing or parsing.
Other (rare) error condition, for instance
Other (rare) error condition, such as
Returns the user-requested range of printable Unicode characters.
+The user can request a range of characters that are to be considered
+ printable in heuristic detection of strings by the shell and by the
+ formatting functions. This is done by supplying
+
The only valid values for
By default, Erlang is started so that only the
The simplest way to use the setting is to call
+
In a future release, this function may return more values and
+ ranges. To avoid compatibility problems, it is recommended to use
+ function
Writes the characters of
Reads a term
The function returns:
+The parsing was successful.
+End of file was encountered.
+The parsing failed.
+Other (rare) error condition, such as
Reads a term
The function returns:
+The parsing was successful.
+End of file was encountered.
+The parsing failed.
+Other (rare) error condition, such as
Retrieves the number of rows of
Reads data from the standard input (
The function returns:
+The tokenization succeeded.
+End of file was encountered by the tokenizer.
+End of file was encountered by the I/O server.
+An error occurred while tokenizing.
+Other (rare) error condition, such as
Example:
+
+23> io:scan_erl_exprs('enter>').
+enter>abc(), "hey".
+{ok,[{atom,1,abc},{'(',1},{')',1},{',',1},{string,1,"hey"},{dot,1}],2}
+24> io:scan_erl_exprs('enter>').
+enter>1.0er.
+{error,{1,erl_scan,{illegal,float}},2}
+ Reads data from the standard input (
The return values are the same as for
+
Set options for the standard I/O device
+ (
Possible options and values vary depending on the
+ I/O device. For a list of supported options and their current values
+ on a specific I/O device, use function
+
The options and values supported by the OTP I/O devices + are as follows:
+If set in binary mode (
By default, all I/O devices in OTP are set in
This option is supported by the standard shell
+ (
Denotes if the terminal is to echo input. Only supported for
+ the standard shell I/O server (
Provides a function for tab-completion (expansion)
+ like the Erlang shell. This function is called
+ when the user presses the Tab key. The expansion is
+ active when calling line-reading functions, such as
+
The function is called with the current line, up to
+ the cursor, as a reversed string. It is to return a
+ three-tuple:
Trivial example (beep on anything except empty line, which
+ is expanded to
+fun("") -> {yes, "quit", []};
+ (_) -> {no, "", ["quit"]} end
+ This option is only supported by the standard shell
+ (
Specifies how characters are input or output from or to the I/O + device, implying that, for example, a terminal is set to handle + Unicode input and output or a file is set to handle UTF-8 data + encoding.
+The option does not affect how data is returned from the + I/O functions or how it is sent in the I/O protocol, it only + affects how the I/O device is to handle Unicode characters to the + "physical" device.
+The standard shell is set for
The I/O device used when Erlang is started with the "-oldshell"
+ or "-noshell" flags is by default set to
Files can also be set in
For disk files, the encoding can be set to various UTF variants. + This has the effect that data is expected to be read as the + specified encoding from the file, and the data is written in the + specified encoding to the disk file.
+The extended encodings are only supported on disk files
+ (opened by function
+
Writes term
All Erlang processes have a default standard IO device. This +
All Erlang processes have a default standard I/O device. This
device is used when no
27> io:read('enter>').
enter>foo.
@@ -1047,30 +1170,37 @@ enter>foo.
28> io:read(standard_io, 'enter>').
enter>bar.
{ok,bar}
+
There is always a process registered under the name of
In certain situations, especially when the standard output is redirected, access to an I/O-server specific for error messages might be convenient. The IO device
In certain situations, especially when the standard output is
+ redirected, access to an I/O server specific for error messages can be
+ convenient. The I/O device
$ erl -noshell -noinput -eval 'io:format(standard_error,"Error: ~s~n",["error 11"]),'\ 'init:stop().' > /dev/null Error: error 11- - -
The
The
{ErrorLocation, Module, ErrorDescriptor}
- A string which describes the error is obtained with the following + +
A string that describes the error is obtained with the following call:
+
Module:format_error(ErrorDescriptor)
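As a minimal sketch of the call above, the helper below turns an ErrorInfo tuple into a flat string. The module name error_info_sketch and the function describe/1 are hypothetical and exist only for illustration; the sketch assumes that Module exports format_error/1, as the text requires.

```erlang
-module(error_info_sketch).
-export([describe/1]).

%% Hypothetical helper, not part of OTP: takes an ErrorInfo tuple
%% {ErrorLocation, Module, ErrorDescriptor} and asks Module for a
%% description of the error.
describe({_ErrorLocation, Module, ErrorDescriptor}) ->
    %% format_error/1 can return a deep character list, so flatten it.
    lists:flatten(Module:format_error(ErrorDescriptor)).
```

One possible use is to pass the ErrorInfo element of an error tuple returned by erl_scan:string/1 to describe/1.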
This module contains functions for converting to and from
strings (lists of characters). They are used for implementing the
- functions in the
A continuation as returned by
A continuation as returned by
+
Description:
+Where:
Returns a character list which represents a new line - character.
+For details, see
+
Returns a character list which represents
-1> lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9})).
-"{1,[2],[3],[4,5],6,7,8,9}"
-2> lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9}, 5)).
-"{1,[2],[3],[...],...}"
+ Returns
Also returns a list of characters which represents
-
Returns
Returns a character list which represents
Returns
If (and only if) the Unicode translation modifier is used - in the format string (i.e. ~ts or ~tc), the resulting list - may contain characters beyond the ISO-latin-1 character - range (in other words, numbers larger than 255). If so, the - result is not an ordinary Erlang string(), but can well be - used in any context where Unicode data is allowed.
- +Returns a character list that represents
If and only if the Unicode translation modifier is used in the
+ format string (that is,
Tries to read
Tries to read
The function returns:
The string was read.
The string was read.
The string was read, but more input is needed in order
- to complete the original format string.
The string was read, but more input is needed to complete the
+ original format string.
The read operation failed and the parameter
The read operation failed and parameter
Example:
+Example:
3> io_lib:fread("~f~f~f", "15.6 17.3e-6 24.5").
{ok,[15.6,1.73e-5,24.5],[]}
This is the re-entrant formatted reader. The continuation of
- the first call to the functions must be
The function returns:
The input is complete. The result is one of the - following:
+The input is complete. The result is one of the following:
The string was read.
The string was read.
End of file has been encountered. +
End of file was encountered.
An error occurred and the parameter
An error occurred and parameter
More data is required to build a term.
-
Returns the list of characters needed to print the atom
-
Returns the indentation if
Returns the list of characters needed to print
-
Returns
Returns the list of characters needed to print
-
Returns a character list that represents a new line character.
Returns the list of characters needed to print
-
Returns a list of characters that represents
+
Returns the list of characters needed to print a character - constant in the Unicode character set.
+Returns
Returns the list of characters needed to print a character - constant in the Unicode character set. Non-Latin-1 characters - are escaped.
+Returns
What is a printable character in this case is determined by
+ startup flag
Returns the list of characters needed to print a character - constant in the ISO-latin-1 character set.
+Returns
Returns a list corresponding to the given format string, +
Returns a list corresponding to the specified format string,
where control sequences have been replaced with
- corresponding tuples. This list can be passed to
A typical use of this function is to replace unbounded-size
control sequences like
See
See
For details, see
+
Returns the indentation if
Returns a character list that represents
Example:
+
+1> lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9})).
+"{1,[2],[3],[4,5],6,7,8,9}"
+2> lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9}, 5)).
+"{1,[2],[3],[...],...}"
Returns
Returns the list of characters needed to print atom
+
Returns
Returns the list of characters needed to print a character + constant in the Unicode character set.
Returns
Returns the list of characters needed to print a character + constant in the Unicode character set. Non-Latin-1 characters + are escaped.
Returns
Returns the list of characters needed to print a character + constant in the ISO Latin-1 character set.
Returns
What is a printable character in this case is determined by the
-
Returns the list of characters needed to print
+
Returns
Returns the list of characters needed to print
+
Returns
Returns the list of characters needed to print
+
The I/O-protocol in Erlang specifies a way for a client to communicate -with an I/O server and vice versa. The I/O server is a process that handles -the requests and performs the requested task on e.g. an IO device. The -client is any Erlang process wishing to read or write data from/to the -IO device.
- -The common I/O-protocol has been present in OTP since the -beginning, but has been fairly undocumented and has also somewhat -evolved over the years. In an addendum to Robert Virdings rationale -the original I/O-protocol is described. This document describes the -current I/O-protocol.
- -The original I/O-protocol was simple and flexible. Demands for spacial -and execution time efficiency has triggered extensions to the protocol -over the years, making the protocol larger and somewhat less easy to -implement than the original. It can certainly be argued that the -current protocol is too complex, but this text describes how it looks -today, not how it should have looked.
- -The basic ideas from the original protocol still hold. The I/O server -and client communicate with one single, rather simplistic protocol and -no server state is ever present in the client. Any I/O server can be -used together with any client code and client code need not be aware -of the actual IO device the I/O server communicates with.
- -As described in Robert's paper, I/O servers and clients communicate using
-
{io_request, From, ReplyAs, Request}
-{io_reply, ReplyAs, Reply}
The client sends an
When an I/O server receives an
To output characters on an IO device, the following
-{put_chars, Encoding, Characters}
-{put_chars, Encoding, Module, Function, Args}
-
The I/O server replies to the client with an
-ok
-{error, Error}
-
For backward compatibility the following
-{put_chars, Characters}
-{put_chars, Module, Function, Args}
-
These should behave as
To read characters from an IO device, the following
{get_until, Encoding, Prompt, Module, Function, ExtraArgs}
- -
-{done, Result, RestChars}
-{more, Continuation}
-
The
The function will be called with the data the I/O server finds on
- its IO device, returning
+ The I/O protocol in Erlang enables bi-directional communication between
+ clients and servers.
+
+
+ -
+
The I/O server is a process that handles the requests and performs
+ the requested task on, for example, an I/O device.
+
+ -
+
The client is any Erlang process wishing to read or write data from/to
+ the I/O device.
+
+
+
+ The common I/O protocol has been present in OTP since the beginning, but
+ has been undocumented and has also evolved over the years. In an
+ addendum to Robert Virding's rationale, the original I/O protocol is
+ described. This section describes the current I/O protocol.
+
+ The original I/O protocol was simple and flexible. Demands for memory
+ efficiency and execution time efficiency have triggered extensions
+ to the protocol over the years, making the protocol larger and somewhat
+ less easy to implement than the original. It can certainly be argued that
+ the current protocol is too complex, but this section describes how it
+ looks today, not how it should have looked.
+
+ The basic ideas from the original protocol still hold. The I/O server
+ and client communicate with one single, rather simplistic protocol and no
+ server state is ever present in the client. Any I/O server can be used
+ together with any client code, and the client code does not need to be
+ aware of the I/O device that the I/O server communicates with.
+
+
+ Protocol Basics
+ As described in Robert's paper, I/O servers and clients communicate
+ using io_request/io_reply tuples as follows:
+
+
+{io_request, From, ReplyAs, Request}
+{io_reply, ReplyAs, Reply}
+
+ The client sends an io_request tuple to the I/O server and the
+ server eventually sends a corresponding io_reply tuple.
+
+
+ -
+
From is the pid() of the client, the process which
+ the I/O server sends the I/O reply to.
+
+ -
+
ReplyAs can be any datum and is returned in the
+ corresponding io_reply. The
+ io module monitors the
+ I/O server and uses the monitor reference as the ReplyAs
+ datum. A more complicated client can have many outstanding I/O
+ requests to the same I/O server and can use different references (or
+ something else) to differentiate among the incoming I/O replies.
+ Element ReplyAs is to be considered opaque by the I/O
+ server.
+ Notice that the pid() of the I/O server is not explicitly
+ present in tuple io_reply. The reply can be sent from any
+ process, not necessarily the actual I/O server.
+
+ -
+
Request and Reply are described below.
+
+
+
+ When an I/O server receives an io_request tuple, it acts upon the
+ Request part and eventually sends an io_reply tuple with
+ the corresponding Reply part.
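The exchange described above can be sketched from the client side as follows. This is an illustration under stated assumptions, not the implementation used by the io module: the module name io_client_sketch and the function put_chars/2 are hypothetical, and the request sent is the {put_chars, Encoding, Characters} output request. As in the description of ReplyAs, a monitor reference is used as the opaque datum.

```erlang
-module(io_client_sketch).
-export([put_chars/2]).

%% Hypothetical helper, not part of OTP: sends one output request to
%% IoServer and waits for the matching io_reply tuple.
put_chars(IoServer, Chars) ->
    %% The monitor reference doubles as the ReplyAs datum, so the reply
    %% can be told apart from any other message in the mailbox.
    Ref = erlang:monitor(process, IoServer),
    IoServer ! {io_request, self(), Ref, {put_chars, unicode, Chars}},
    receive
        {io_reply, Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, _Pid, Reason} ->
            %% The I/O server terminated before replying.
            {error, Reason}
    end.
```

For example, io_client_sketch:put_chars(group_leader(), "hello\n") sends the request to the I/O server of the calling process.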
+
+
+
+ Output Requests
+ To output characters on an I/O device, the following Requests
+ exist:
+
+
+{put_chars, Encoding, Characters}
+{put_chars, Encoding, Module, Function, Args}
+
+
+ -
+
Encoding is unicode or latin1, meaning that the
+ characters are (in case of binaries) encoded as UTF-8 or ISO Latin-1
+ (pure bytes). A well-behaved I/O server is also to return an error
+ indication if list elements contain integers > 255
+ when Encoding is set to latin1.
+ Notice that this does not in any way tell how characters are to be
+ put on the I/O device or handled by the I/O server. Different I/O
+ servers can handle the characters however they want; this only tells
+ the I/O server which format the data is expected to have. In the
+ Module/Function/Args case, Encoding tells
+ which format the designated function produces.
+ Notice also that byte-oriented data is most simply sent using the ISO
+ Latin-1 encoding.
+
+ -
+
Characters are the data to be put on the I/O device. If
+ Encoding is latin1, this is an iolist(). If
+ Encoding is unicode, this is an Erlang standard mixed
+ Unicode list (one integer in a list per character, characters in
+ binaries represented as UTF-8).
+
+ -
+
Module, Function, and Args denote a function
+ that is called to produce the data (like
+ io_lib:format/2).
+
+ Args is a list of arguments to the function. The function is
+ to produce data in the specified Encoding . The I/O server is
+ to call the function as apply(Mod, Func, Args) and put the
+ returned data on the I/O device as if it was sent in a
+ {put_chars, Encoding, Characters} request. If the function
+ returns anything other than a binary or list, or throws an exception,
+ an error is to be sent back to the client.
+
+
+
+ The I/O server replies to the client with an io_reply tuple, where
+ element Reply is one of:
+
+
+ok
+{error, Error}
+
+
+ Error describes the error to the client, which can do
+ whatever it wants with it. The
+ io module typically
+ returns it "as is".
+
+
+ For backward compatibility, the following Requests are also to be
+ handled by an I/O server (they are not to be present after
+ Erlang/OTP R15B):
+
+
+{put_chars, Characters}
+{put_chars, Module, Function, Args}
+
+ These are to behave as {put_chars, latin1, Characters} and
+ {put_chars, latin1, Module, Function, Args}, respectively.
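The output requests above, including the two backward-compatible forms, could be handled roughly as in the following sketch. It is an assumption-laden illustration rather than a reference server: the module name io_server_sketch, the {contents, From} message, and the error terms other than {error, request} are hypothetical, and the "device" is simply a list held in the server loop.

```erlang
-module(io_server_sketch).
-export([start/0, loop/1]).

%% Illustrative only: an I/O server that stores written characters in a
%% list (newest first) instead of writing to a real device.
start() ->
    spawn(?MODULE, loop, [[]]).

loop(Acc) ->
    receive
        {io_request, From, ReplyAs, Request} ->
            {Reply, NewAcc} = handle(Request, Acc),
            From ! {io_reply, ReplyAs, Reply},
            loop(NewAcc);
        {contents, From} ->
            %% Hypothetical inspection message, not part of the protocol.
            From ! {contents, lists:append(lists:reverse(Acc))},
            loop(Acc);
        stop ->
            ok
    end.

handle({put_chars, Encoding, Chars}, Acc)
  when Encoding =:= unicode; Encoding =:= latin1 ->
    try unicode:characters_to_list(Chars, Encoding) of
        List when is_list(List) -> {ok, [List | Acc]};
        _Other -> {{error, put_chars}, Acc}
    catch
        error:_ -> {{error, put_chars}, Acc}
    end;
handle({put_chars, Encoding, Mod, Func, Args}, Acc) ->
    %% Call the function and treat its result as ordinary characters.
    try apply(Mod, Func, Args) of
        Chars -> handle({put_chars, Encoding, Chars}, Acc)
    catch
        _:_ -> {{error, Func}, Acc}
    end;
handle({put_chars, Chars}, Acc) ->
    %% Backward-compatible form, treated as latin1.
    handle({put_chars, latin1, Chars}, Acc);
handle({put_chars, Mod, Func, Args}, Acc) ->
    handle({put_chars, latin1, Mod, Func, Args}, Acc);
handle(_Unknown, Acc) ->
    {{error, request}, Acc}.
```

Dispatching the legacy tuples to the latin1 variants keeps a single code path for the conversion and the error handling.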
+
+
+
+ Input Requests
+ To read characters from an I/O device, the following Requests
+ exist:
+
+
+{get_until, Encoding, Prompt, Module, Function, ExtraArgs}
+
+
+ -
+
Encoding denotes how data is to be sent back to the client
+ and what data is sent to the function denoted by
+ Module/Function/ExtraArgs. If the function
+ supplied returns data as a list, the data is converted to this
+ encoding. If the function supplied returns data in some other format,
+ no conversion can be done, and it is up to the client-supplied
+ function to return data in a proper way.
+ If Encoding is latin1, lists of integers 0..255
+ or binaries containing plain bytes are sent back to the client when
+ possible. If Encoding is unicode, lists with integers
+ in the whole Unicode range or binaries encoded in UTF-8 are sent to
+ the client. The user-supplied function always sees lists of
+ integers, never binaries, but the list can contain numbers > 255
+ if Encoding is unicode.
+
+ -
+
Prompt is a list of characters (not mixed, no binaries) or an
+ atom to be output as a prompt for input on the I/O device.
+ Prompt is often ignored by the I/O server; if set to '',
+ it is always to be ignored (and results in nothing being written to
+ the I/O device).
+
+ -
+
Module, Function, and ExtraArgs denote a
+ function and arguments to determine when enough data is read. The
+ function is to take two more arguments, the last state, and a list of
+ characters. The function is to return one of:
+
+{done, Result, RestChars}
+{more, Continuation}
+ Result can be any Erlang term, but if it is a list(),
+ the I/O server can convert it to a binary() of appropriate
+ format before returning it to the client, if the I/O server is set in
+ binary mode (see below).
+ The function is called with the data the I/O server finds on its I/O
+ device, returning one of:
+
+ -
+
{done, Result, RestChars} when enough data is read. In
+ this case Result is sent to the client and RestChars
+ is kept in the I/O server as a buffer for later input.
+
+ -
+
{more, Continuation}, which indicates that more
+ characters are needed to complete the request.
+
+
+ Continuation is sent as the state in later calls to the
+ function when more characters are available. When no more characters
+ are available, the function must return {done, eof, Rest}. The
+ initial state is the empty list. The data when an end of file is
+ reached on the I/O device is the atom eof.
+ An emulation of the get_line request can be (inefficiently)
+ implemented using the following functions:
+
-module(demo).
-export([until_newline/3, get_line/1]).
@@ -234,226 +268,253 @@ get_line(IoServer) ->
receive
{io_reply, IoServer, Data} ->
Data
- end.
-
- Note especially that the last element in the Request tuple ([$\n] )
- is appended to the argument list when the function is called. The
- function should be called like
- apply(Module, Function, [ State, Data | ExtraArgs ]) by the I/O server
-
-
-
-A fixed number of characters is requested using this Request :
-
-{get_chars, Encoding, Prompt, N}
-
-
-
-Encoding and Prompt as for get_until .
-
-N is the number of characters to be read from the IO device.
-
-
-A single line (like in the example above) is requested with this Request :
-
-{get_line, Encoding, Prompt}
-
-
-
-Encoding and Prompt as above.
-
-
-Obviously, the get_chars and get_line could be implemented with the
-get_until request (and indeed they were originally), but demands for
-efficiency has made these additions necessary.
-
-The I/O server replies to the client with an io_reply tuple where the Reply
-element is one of:
-
-Data
-eof
-{error, Error}
-
-
-
-Data is the characters read, in either list or binary form
- (depending on the I/O server mode, see below).
-Error describes the error to the client, which may do whatever it
- wants with it. The Erlang io
- module typically returns it as is.
-eof is returned when input end is reached and no more data is
-available to the client process.
-
-
-For backward compatibility the following Request s should also be
-handled by an I/O server (these reqeusts should not be present after
-R15B of OTP):
-
-
-{get_until, Prompt, Module, Function, ExtraArgs}
-{get_chars, Prompt, N}
-{get_line, Prompt}
-
-
-These should behave as {get_until, latin1, Prompt, Module, Function,
-ExtraArgs} , {get_chars, latin1, Prompt, N} and {get_line, latin1,
-Prompt} respectively.
-
-
-I/O-server Modes
-
-Demands for efficiency when reading data from an I/O server has not
-only lead to the addition of the get_line and get_chars requests, but
-has also added the concept of I/O server options. No options are
-mandatory to implement, but all I/O servers in the Erlang standard
-libraries honor the binary option, which allows the Data element of the
-io_reply tuple to be a binary instead of a list when possible.
-If the data is sent as a binary, Unicode data will be sent in the
-standard Erlang Unicode
-format, i.e. UTF-8 (note that the function of the get_until request still gets
-list data regardless of the I/O server mode).
-
-Note that i.e. the get_until request allows for a function with the data specified as always being a list. Also the return value data from such a function can be of any type (as is indeed the case when an io:fread request is sent to an I/O server). The client has to be prepared for data received as answers to those requests to be in a variety of forms, but the I/O server should convert the results to binaries whenever possible (i.e. when the function supplied to get_until actually returns a list). The example shown later in this text does just that.
-
-An I/O-server in binary mode will affect the data sent to the client,
-so that it has to be able to handle binary data. For convenience, it
-is possible to set and retrieve the modes of an I/O server using the
-following I/O requests:
-
-
-{setopts, Opts}
-
-
-
-
-Opts is a list of options in the format recognized by proplists (and
- of course by the I/O server itself).
-
-As an example, the I/O server for the interactive shell (in group.erl )
-understands the following options:
-
-{binary, boolean()} (or binary/list)
-{echo, boolean()}
-{expand_fun, fun()}
-{encoding, unicode/latin1} (or unicode/latin1)
-
-
-- of which the binary and encoding options are common for all
-I/O servers in OTP, while echo and expand are valid only for this
-I/O server. It is worth noting that the unicode option notifies how
-characters are actually put on the physical IO device, i.e. if the
-terminal per se is Unicode aware, it does not affect how characters
-are sent in the I/O-protocol, where each request contains encoding
-information for the provided or returned data.
-
-The I/O server should send one of the following as Reply :
-
-ok
-{error, Error}
-
-
-An error (preferably enotsup ) is to be expected if the option is
-not supported by the I/O server (like if an echo option is sent in a
-setopts request to a plain file).
-
-To retrieve options, this request is used:
-
-getopts
-
-
-The getopts request asks for a complete list of all options
-supported by the I/O server as well as their current values.
-
-The I/O server replies:
-
-OptList
-{error, Error}
-
-
-
-OptList is a list of tuples {Option, Value} where Option is always
- an atom.
-
-
-
-Multiple I/O Requests
-
-The Request element can in itself contain several Request s by using
-the following format:
-
-{requests, Requests}
-
-
-Requests is a list of valid io_request tuples for the protocol, they
- shall be executed in the order in which they appear in the list and
- the execution should continue until one of the requests result in an
- error or the list is consumed. The result of the last request is
- sent back to the client.
-
-
-The I/O server can for a list of requests send any of the valid results in
-the reply:
-
-
-ok
-{ok, Data}
-{ok, Options}
-{error, Error}
-
-- depending on the actual requests in the list.
-
-
-Optional I/O Requests
-
-The following I/O request is optional to implement and a client
-should be prepared for an error return:
-
-{get_geometry, Geometry}
-
-
-Geometry is either the atom rows or the atom columns .
-
-The I/O server should send the Reply as:
-
-{ok, N}
-{error, Error}
-
-
-
-N is the number of character rows or columns the IO device has, if
- applicable to the IO device the I/O server handles, otherwise {error,
- enotsup} is a good answer.
-
-
-
-Unimplemented Request Types
-
-If an I/O server encounters a request it does not recognize (i.e. the
-io_request tuple is in the expected format, but the actual Request is
-unknown), the I/O server should send a valid reply with the error tuple:
-
-{error, request}
-
-
-This makes it possible to extend the protocol with optional requests
-and for the clients to be somewhat backwards compatible.
-
-
-An Annotated and Working Example I/O Server
-
-An I/O server is any process capable of handling the I/O protocol. There is
-no generic I/O server behavior, but could well be. The framework is
-simple enough, a process handling incoming requests, usually both
-I/O-requests and other IO device-specific requests (for i.e. positioning,
-closing etc.).
-
-Our example I/O server stores characters in an ETS table, making up a
-fairly crude ram-file (it is probably not useful, but working).
-
-The module begins with the usual directives, a function to start the
-I/O server and a main loop handling the requests:
-
-
+ end.
+ Notice that the last element in the Request tuple
+ ([$\n] ) is appended to the argument list when the function is
+ called. The function is to be called like
+ apply(Module, Function, [ State, Data | ExtraArgs ]) by the
+ I/O server.
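
As a rough illustration of this calling convention, the sketch below shows a function that could be named in a get_until request to collect characters up to a stop character. The module name until_demo, the function until_char/3, and the driver feed/2 are made up for this example. The return shapes {done, Result, Rest} and {more, Continuation} follow the helper functions shown later in this text.

```erlang
-module(until_demo).
-export([until_char/3, feed/2]).

%% Intended to be called by an I/O server as
%%   apply(until_demo, until_char, [State, Data, StopChar]).
%% State is [] on the first call and afterwards the continuation
%% returned in {more, Continuation}. Data is a list of characters or eof.
until_char([], eof, _Stop) ->
    {done, eof, []};
until_char(SoFar, eof, _Stop) ->
    {done, SoFar, []};
until_char(SoFar, Chars, Stop) ->
    case lists:splitwith(fun(C) -> C =/= Stop end, Chars) of
        {Before, [Stop | Rest]} ->
            %% Stop character found: return the line and the unread rest.
            {done, SoFar ++ Before ++ [Stop], Rest};
        {_All, []} ->
            %% Not found yet: ask for more data.
            {more, SoFar ++ Chars}
    end.

%% Hypothetical driver that imitates an I/O server feeding chunks of
%% characters to the function until it reports done (or data runs out).
feed(Chunks, Stop) ->
    feed(Chunks, Stop, []).

feed([], Stop, State) ->
    until_char(State, eof, Stop);
feed([Chunk | Chunks], Stop, State) ->
    case until_char(State, Chunk, Stop) of
        {more, Cont} -> feed(Chunks, Stop, Cont);
        {done, _Result, _Rest} = Done -> Done
    end.
```

The driver only exists to make the sketch runnable. A real I/O server would read the chunks from its device and reply to the client with the Result part.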
+ A fixed number of characters is requested using the following
+
+{get_chars, Encoding, Prompt, N}
+
+ A single line (as in former example) is requested with the
+ following
+{get_line, Encoding, Prompt}
+
+ Clearly,
The I/O server replies to the client with an
+Data
+eof
+{error, Error}
+
+ For backward compatibility, the following
+{get_until, Prompt, Module, Function, ExtraArgs}
+{get_chars, Prompt, N}
+{get_line, Prompt}
+
+ These are to behave as
+
Demands for efficiency when reading data from an I/O server have not only
+ led to the addition of the
Notice that the
An I/O server in binary mode affects the data sent to the client, so
+ the client must be able to handle binary data. For convenience, the modes of an
+ I/O server can be set and retrieved using the following I/O requests:
+ +
+{setopts, Opts}
+
+ As an example, the I/O server for the interactive shell (in
+
+{binary, boolean()} (or binary/list)
+{echo, boolean()}
+{expand_fun, fun()}
+{encoding, unicode/latin1} (or unicode/latin1)
+
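
The options above can arrive either as tuples or as the bare atoms binary and list. A minimal sketch of how a server might normalize them before inspecting them, using proplists, is shown below. The module name opts_demo is hypothetical, and the negation pair [{list, binary}] is an assumption about how list is mapped onto {binary, false}.

```erlang
-module(opts_demo).
-export([normalize/1]).

%% Turn the atoms binary/list and the tuple {list, Bool} into the
%% single canonical form {binary, Bool}; other options pass through.
normalize(Opts0) ->
    proplists:unfold(
      proplists:substitute_negations([{list, binary}], Opts0)).
```

After normalization, a server only needs to look for {binary, Bool} and {encoding, Enc} tuples.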
+ Options
The I/O server is to send one of the following as
+ok
+{error, Error}
+
+ An error (preferably
To retrieve options, the following request is used:
+
+getopts
+
This request asks for a complete list of all options supported by the
+ I/O server as well as their current values.
+
+The I/O server replies:
+ +
+OptList
+{error, Error}
+
+ The
+{requests, Requests}
+
+ The I/O server can, for a list of requests, send any of the following
+ valid results in the reply, depending on the requests in the list:
+ +
+ok
+{ok, Data}
+{ok, Options}
+{error, Error}
+ The following I/O request is optional to implement and a client is to
+ be prepared for an error return:
+ +
+{get_geometry, Geometry}
+
+ The I/O server is to send the
+{ok, N}
+{error, Error}
+
+ If an I/O server encounters a request that it does not recognize (that
+ is, the
+{error, request}
+
+ This makes it possible to extend the protocol with optional requests + and for the clients to be somewhat backward compatible.
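
To illustrate what being prepared for an error return can look like on the client side, here is a small sketch that asks for the number of columns and falls back to a default value. The module geometry_client and its functions are hypothetical. The message shapes {io_request, From, ReplyAs, Request} and {io_reply, ReplyAs, Reply} are those of the I/O protocol, and the {ok, N} reply follows the description above.

```erlang
-module(geometry_client).
-export([columns/2, demo/0]).

%% Ask an I/O server for its column count. Any error reply, including
%% {error, enotsup} and {error, request}, yields Default instead.
columns(IoServer, Default) ->
    %% The monitor reference doubles as the ReplyAs tag.
    Ref = erlang:monitor(process, IoServer),
    IoServer ! {io_request, self(), Ref, {get_geometry, columns}},
    receive
        {io_reply, Ref, {ok, N}} when is_integer(N) ->
            erlang:demonitor(Ref, [flush]),
            N;
        {io_reply, Ref, _ErrorOrUnexpected} ->
            erlang:demonitor(Ref, [flush]),
            Default;
        {'DOWN', Ref, process, _Pid, _Info} ->
            %% The server died before replying.
            Default
    end.

%% Stand-in server that does not recognize any request.
demo() ->
    Server = spawn(fun() ->
        receive
            {io_request, From, ReplyAs, _Request} ->
                From ! {io_reply, ReplyAs, {error, request}}
        end
    end),
    columns(Server, 80).
```

Because unknown requests are answered with an error tuple rather than ignored, the client never blocks waiting for a reply that an older server cannot produce.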
+An I/O server is any process capable of handling the I/O protocol. There
+ is no generic I/O server behavior, but there could well be. The framework is
+ simple: a process handling incoming requests, usually both I/O-requests
+ and other I/O device-specific requests (positioning, closing, and so on).
+
+
+The example I/O server stores characters in an ETS table, making
+ up a fairly crude RAM file.
+
+The module begins with the usual directives, a function to start the
+ I/O server, and a main loop handling the requests:
+ +
-module(ets_io_server).
-export([start_link/0, init/0, loop/1, until_newline/3, until_enough/3]).
@@ -490,39 +551,34 @@ loop(State) ->
?MODULE:loop(State#state{position = 0});
_Unknown ->
?MODULE:loop(State)
- end.
-
-
-The main loop receives messages from the client (which might be using
-the
The "private" message
The main loop receives messages from the client (which can use
+ the
Let us look at the reply function first...
+The "private" message
+ First, we examine the reply function:
+
reply(From, ReplyAs, Reply) ->
- From ! {io_reply, ReplyAs, Reply}.
+ From ! {io_reply, ReplyAs, Reply}.
-
+ It sends the
Simple enough, it sends the
We need to handle some requests. First the requests for writing + characters:
-Now look at the different requests we need to handle. First the -requests for writing characters:
- -
+
request({put_chars, Encoding, Chars}, State) ->
put_chars(unicode:characters_to_list(Chars,Encoding),State);
request({put_chars, Encoding, Module, Function, Args}, State) ->
@@ -531,23 +587,22 @@ request({put_chars, Encoding, Module, Function, Args}, State) ->
catch
_:_ ->
{error, {error,Function}, State}
- end;
-
+ end;
-The
The
When
When
Let us handle the requests for retrieving data too:
+We handle the requests for retrieving data:
-
+
request({get_until, Encoding, _Prompt, M, F, As}, State) ->
get_until(Encoding, M, F, As, State);
request({get_chars, Encoding, _Prompt, N}, State) ->
@@ -555,17 +610,16 @@ request({get_chars, Encoding, _Prompt, N}, State) ->
get_until(Encoding, ?MODULE, until_enough, [N], State);
request({get_line, Encoding, _Prompt}, State) ->
%% To simplify the code, get_line is implemented using get_until
- get_until(Encoding, ?MODULE, until_newline, [$\n], State);
-
+ get_until(Encoding, ?MODULE, until_newline, [$\n], State);
-Here we have cheated a little by more or less only implementing
-
Here we have cheated a little by more or less only implementing
+
+
request({get_geometry,_}, State) ->
{error, {error,enotsup}, State};
request({setopts, Opts}, State) ->
@@ -573,23 +627,23 @@ request({setopts, Opts}, State) ->
request(getopts, State) ->
getopts(State);
request({requests, Reqs}, State) ->
- multi_request(Reqs, {ok, ok, State});
-
+ multi_request(Reqs, {ok, ok, State});
-The
Request
The multi-request tag (
The multi-request tag (
What is left is to handle backward compatibility and the
We need to handle backward compatibility and the
+
+
request({put_chars,Chars}, State) ->
request({put_chars,latin1,Chars}, State);
request({put_chars,M,F,As}, State) ->
@@ -599,38 +653,35 @@ request({get_chars,Prompt,N}, State) ->
request({get_line,Prompt}, State) ->
request({get_line,latin1,Prompt}, State);
request({get_until, Prompt,M,F,As}, State) ->
- request({get_until,latin1,Prompt,M,F,As}, State);
-
+ request({get_until,latin1,Prompt,M,F,As}, State);
-OK, what is left now is to return
+
request(_Other, State) ->
- {error, {error, request}, State}.
-
+ {error, {error, request}, State}.
-Let us move further and actually handle the different requests, first -the fairly generic multi-request type:
+Next we handle the different requests, first the fairly generic + multi-request type:
-
+
multi_request([R|Rs], {ok, _Res, State}) ->
multi_request(Rs, request(R, State));
multi_request([_|_], Error) ->
Error;
multi_request([], Result) ->
- Result.
-
+ Result.
-We loop through the requests one at the time, stopping when we either
-encounter an error or the list is exhausted. The last return value is
-sent back to the client (it is first returned to the main loop and then
-sent back by the function
We loop through the requests one at a time, stopping when we either
+ encounter an error or the list is exhausted. The last return value is
+ sent back to the client (it is first returned to the main loop and then
+ sent back by function
The
Requests
+
setopts(Opts0,State) ->
Opts = proplists:unfold(
proplists:substitute_negations(
@@ -662,46 +713,44 @@ getopts(#state{mode=M} = S) ->
true;
_ ->
false
- end}],S}.
-
+ end}],S}.
-As a convention, all I/O servers handle both
As a convention, all I/O servers handle both
The
Request
So far our I/O server has been fairly generic (except for the
So far this I/O server is fairly generic (except for request
+
To make the example runnable, we now start implementing the actual
-reading and writing of the data to/from the ETS table. First the
-
To make the example runnable, we start implementing the reading and
+ writing of the data to/from the ETS table. First function
+
+
put_chars(Chars, #state{table = T, position = P} = State) ->
R = P div ?CHARS_PER_REC,
C = P rem ?CHARS_PER_REC,
[ apply_update(T,U) || U <- split_data(Chars, R, C) ],
- {ok, ok, State#state{position = (P + length(Chars))}}.
-
+ {ok, ok, State#state{position = (P + length(Chars))}}.
-We already have the data as (Unicode) lists and therefore just split
-the list in runs of a predefined size and put each run in the
-table at the current position (and forward). The functions
-
We already have the data as (Unicode) lists and therefore only split
+ the list into runs of a predefined size and put each run in the table at
+ the current position (and forward). Functions
Now we want to read data from the table. The
Now we want to read data from the table. Function
+
get_until(Encoding, Mod, Func, As,
#state{position = P, mode = M, table = T} = State) ->
case get_loop(Mod,Func,As,T,P,[]) of
@@ -737,34 +786,34 @@ get_loop(M,F,A,T,P,C) ->
get_loop(M,F,A,T,NewP,NewC);
_ ->
{error,F}
- end.
-
-
-Here we also handle the mode (binary or list ) that can be set by
-the setopts request. By default, all OTP I/O servers send data back to
-the client as lists, but switching mode to binary might increase
-efficiency if the I/O server handles it in an appropriate way. The
-implementation of get_until is hard to get efficient as the supplied
-function is defined to take lists as arguments, but get_chars and
-get_line can be optimized for binary mode. This example does not
-optimize anything however. It is important though that the returned
-data is of the right type depending on the options set, so we convert
-the lists to binaries in the correct encoding if possible
-before returning. The function supplied in the get_until request tuple may,
-as its final result return anything, so only functions actually
-returning lists can get them converted to binaries. If the request
-contained the encoding tag unicode , the lists can contain all Unicode
-codepoints and the binaries should be in UTF-8, if the encoding tag
-was latin1 , the client should only get characters in the range
-0..255. The function check/2 takes care of not returning arbitrary
-Unicode codepoints in lists if the encoding was given as latin1 . If
-the function did not return a list, the check cannot be performed and
-the result will be that of the supplied function untouched.
-
-Now we are more or less done. We implement the utility functions below
-to actually manipulate the table:
-
-
+ end.
+
+ Here we also handle the mode (binary or list ) that can be
+ set by request setopts . By default, all OTP I/O servers send data
+ back to the client as lists, but switching mode to binary can
+ increase efficiency if the I/O server handles it in an appropriate way.
+ The implementation of get_until is difficult to get efficient, as
+ the supplied function is defined to take lists as arguments, but
+ get_chars and get_line can be optimized for binary mode.
+ However, this example does not optimize anything.
+
+ It is important though that the returned data is of the correct type
+ depending on the options set. We therefore convert the lists to binaries
+ in the correct encoding if possible before returning. The
+ function supplied in the get_until request tuple can, as its final
+ result, return anything, so only functions returning lists can get them
+ converted to binaries. If the request contains encoding tag
+ unicode , the lists can contain all Unicode code points and the
+ binaries are to be in UTF-8. If the encoding tag is latin1 , the
+ client is only to get characters in the range 0..255 . Function
+ check/2 takes care of not returning arbitrary Unicode code points
+ in lists if the encoding was specified as latin1 . If the function
+ does not return a list, the check cannot be performed and the result is
+ that of the supplied function untouched.
+
+ To manipulate the table we implement the following utility functions:
+
+
check(unicode, List) ->
List;
check(latin1, List) ->
@@ -775,18 +824,16 @@ check(latin1, List) ->
catch
throw:_ ->
{error,{cannot_convert, unicode, latin1}}
- end.
-
+ end.
-The function check takes care of providing an error tuple if Unicode -codepoints above 255 is to be returned if the client requested -latin1.
+The function check provides an error tuple if Unicode code points >
+ 255 are to be returned when the client requested
The two functions
+ The two functions until_newline/3 and until_enough/3 are
+ helpers used together with function get_until/5 to implement
+ get_chars and get_line (inefficiently):
+
+
until_newline([],eof,_MyStopCharacter) ->
{done,eof,[]};
until_newline(ThisFar,eof,_MyStopCharacter) ->
@@ -810,16 +857,15 @@ until_enough(ThisFar,CharList,N)
{Res,Rest} = my_split(N,ThisFar ++ CharList, []),
{done,Res,Rest};
until_enough(ThisFar,CharList,_N) ->
- {more,ThisFar++CharList}.
-
+ {more,ThisFar++CharList}.
-As can be seen, the functions above are just the type of functions
-that should be provided in
As can be seen, the functions above are just the type of functions that
+ are to be provided in
Now we only need to read and write the table in an appropriate way to -complete the I/O server:
+To complete the I/O server, we only need to read and write the table in
+ an appropriate way:
-
+
get(P,Tab) ->
R = P div ?CHARS_PER_REC,
C = P rem ?CHARS_PER_REC,
@@ -856,18 +902,16 @@ apply_update(Table, {Row, Col, List}) ->
{Part1,_} = my_split(Col,OldData,[]),
{_,Part2} = my_split(Col+length(List),OldData,[]),
ets:insert(Table,{Row, Part1 ++ List ++ Part2})
- end.
-
-
-The table is read or written in chunks of ?CHARS_PER_REC , overwriting
-when necessary. The implementation is obviously not efficient, it is
-just working.
-
-This concludes the example. It is fully runnable and you can read or
-write to the I/O server by using i.e. the io module or even the file
-module. It is as simple as that to implement a fully fledged I/O server
-in Erlang.
-The table is read or written in chunks of
This concludes the example. It is fully runnable and you can read or
+ write to the I/O server by using, for example, the
+
This module is retained for compatibility. It may disappear - without warning in a future release.
+This module is retained for backward compatibility. It can disappear
+ without warning in a future Erlang/OTP release.
Flushes the message buffer of the current process.
-Prints error message
Returns the name of the script that started the current - Erlang session.
+Flushes the message buffer of the current process.
Removes the last newline character, if any, in
Returns the name of the script that started the current + Erlang session.
+This function to makes it possible to send a message using
- the
Makes it possible to send a message using the
As
As
sendw(To, Msg) ->
To ! {self(),Msg},
receive
Reply -> Reply
end.
- The message returned is not necessarily a reply to the - message sent.
+The returned message is not necessarily a reply to the sent + message.
This module contains functions for list processing.
@@ -44,132 +44,156 @@Whenever an
if x
If x
if x
If x
x
An example of a typical ordering function is less than or equal
- to,
An example of a typical ordering function is less than or equal
+ to:
Returns
Returns
Returns
Returns
Returns a list in which all the sub-lists of
-
Returns a list in which all the sublists of
+
Example:
> lists:append([[1, 2, 3], [a, b], [4, 5, 6]]).
[1,2,3,a,b,4,5,6]
Returns a new list
Returns a new list
Example:
> lists:append("abc", "def").
"abcdef"
Concatenates the text representation of the elements
- of
Concatenates the text representation of the elements of
+
Example:
> lists:concat([doc, '/', file, '.', 3]).
"doc/file.3"
Returns a copy of
Drops the last element of a
Drops the last element of a
Drops elements
Drops elements
Returns a list which contains
Returns a list containing
Example:
> lists:duplicate(5, xx).
[xx,xx,xx,xx,xx]
Calls
That is,
Calls
That is,
filtermap(Fun, List1) ->
lists:foldr(fun(Elem, Acc) ->
@@ -179,26 +203,29 @@ filtermap(Fun, List1) ->
{true,Value} -> [Value|Acc]
end
end, [], List1).
- Example:
+Example:
> lists:filtermap(fun(X) -> case X rem 2 of 0 -> {true, X div 2}; _ -> false end end, [1,2,3,4,5]).
[1,2]
Equivalent to
Equivalent to
Takes a function from
Takes a function from
That is,
flatmap(Fun, List1) ->
append(map(Fun, List1)).
-
Example:
+Example:
> lists:flatmap(fun(X)->[X,X] end, [a,b,c]).
[a,a,b,b,c,c]
Returns a flattened version of
Returns a flattened version of
Returns a flattened version of
Calls
Calls
Example:
> lists:foldl(fun(X, Sum) -> X + Sum end, 0, [1,2,3,4,5]).
15
@@ -244,12 +276,14 @@ flatmap(Fun, List1) ->
120
Like
Like
Example:
> P = fun(A, AccIn) -> io:format("~p ", [A]), AccIn end.
#Fun<erl_eval.12.2225172>
@@ -257,10 +291,11 @@ flatmap(Fun, List1) ->
1 2 3 void
> lists:foldr(P, void, [1,2,3]).
3 2 1 void
- Calls
Calls
Returns a copy of
Searches the list of tuples
Returns a list of tuples where, for each tuple in
-
Examples:
+Examples:
> Fun = fun(Atom) -> atom_to_list(Atom) end.
#Fun<erl_eval.6.10732646>
@@ -324,33 +366,37 @@ flatmap(Fun, List1) ->
[{name,"jane",22},{name,"lizzie",20},{name,"lydia",15}]
Returns
Returns
Returns the sorted list formed by merging
Returns the sorted list formed by merging
+
Returns a copy of
Searches the list of tuples
This function is retained for backward compatibility.
- The function
This function is retained for backward compatibility. Function
+
Returns a list containing the sorted elements of the list
-
Returns a list containing the sorted elements of list
+
Returns a copy of
Searches the list of tuples
Searches the list of tuples
Returns the last element in
Takes a function from
Takes a function from
Combines the operations of
+
Example:
+Summing the elements in a list and doubling them at the same time:
> lists:mapfoldl(fun(X, Sum) -> {2*X, X+Sum} end,
0, [1,2,3,4,5]).
{[2,4,6,8,10],15}
Combines the operations of
+
Returns the first element of
Returns
Returns
Returns the sorted list formed by merging all the sub-lists
- of
Returns the sorted list formed by merging all the sublists of
+
Returns the sorted list formed by merging
Returns the sorted list formed by merging
Returns the sorted list formed by merging
Returns the sorted list formed by merging
Returns the first element of
Returns the
Returns the
Example:
> lists:nth(3, [a, b, c, d, e]).
c
Returns the
Returns the
Example:
> lists:nthtail(3, [a, b, c, d, e]). [d,e] @@ -557,70 +636,91 @@ c[]
Partitions
Examples:
+Partitions
Examples:
> lists:partition(fun(A) -> A rem 2 == 1 end, [1,2,3,4,5,6,7]).
{[1,3,5,7],[2,4,6]}
> lists:partition(fun(A) -> is_atom(A) end, [a,b,1,c,d,2,3,4,e]).
{[a,b,c,d,e],[1,2,3,4]}
- See also
For a different way to partition a list, see
+
Returns
Returns a list with the elements in
Returns a list with the elements in
Example:
> lists:reverse([1, 2, 3, 4], [a, b, c]).
[4,3,2,1,a,b,c]
Returns a sequence of integers which starts with
Returns a sequence of integers that starts with
+
Failure: If
Failures:
+If
If
If
The following equalities hold for all sequences:
-length(lists:seq(From, To)) == To-From+1
-length(lists:seq(From, To, Incr)) == (To-From+Incr) div Incr
- Examples:
+length(lists:seq(From, To)) =:= To - From + 1
+length(lists:seq(From, To, Incr)) =:= (To - From + Incr) div Incr
+Examples:
> lists:seq(1, 10). [1,2,3,4,5,6,7,8,9,10] @@ -634,74 +734,87 @@ length(lists:seq(From, To, Incr)) == (To-From+Incr) div Incr [1]
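
The two length equalities can be checked directly. The following is only a small sketch. The module name seq_check and the function names are invented for the example, and the arguments are assumed to be valid input for lists:seq/2,3.

```erlang
-module(seq_check).
-export([holds/2, holds/3]).

%% length(lists:seq(From, To)) =:= To - From + 1
holds(From, To) ->
    length(lists:seq(From, To)) =:= To - From + 1.

%% length(lists:seq(From, To, Incr)) =:= (To - From + Incr) div Incr
holds(From, To, Incr) ->
    length(lists:seq(From, To, Incr)) =:= (To - From + Incr) div Incr.
```

For example, lists:seq(1, 20, 3) is [1,4,7,10,13,16,19], which has 7 elements, and (20 - 1 + 3) div 3 is also 7.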
Returns a list containing the sorted elements of
Returns a list containing the sorted elements of
Splits
Splits
Partitions
splitwith(Pred, List) ->
{takewhile(Pred, List), dropwhile(Pred, List)}.
- Examples:
+Examples:
> lists:splitwith(fun(A) -> A rem 2 == 1 end, [1,2,3,4,5,6,7]).
{[1],[2,3,4,5,6,7]}
> lists:splitwith(fun(A) -> is_atom(A) end, [a,b,1,c,d,2,3,4,e]).
{[a,b],[1,c,d,2,3,4,e]}
- See also
For a different way to partition a list, see
+
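
The difference between the two ways of dividing a list is easiest to see side by side. The sketch below applies the same predicate with lists:partition/2 and lists:splitwith/2. The module split_demo and the function compare/2 are hypothetical names used only here.

```erlang
-module(split_demo).
-export([compare/2]).

%% partition/2 tests every element, while splitwith/2 stops at the
%% first element for which Pred returns false.
compare(Pred, List) ->
    #{partition => lists:partition(Pred, List),
      splitwith => lists:splitwith(Pred, List)}.
```

With the predicate fun(A) -> A rem 2 == 1 end and the list [1,2,3,4,5,6,7], partition gives {[1,3,5,7],[2,4,6]}, while splitwith gives {[1],[2,3,4,5,6,7]}, matching the examples above.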
Returns the sub-list of
Returns the sublist of
Returns the sub-list of
Returns the sublist of
Examples:
> lists:sublist([1,2,3,4], 2, 2).
[2,3]
@@ -711,142 +824,163 @@ splitwith(Pred, List) ->
[]
Returns a new list
Returns a new list
Example:
> lists:subtract("123212", "212").
"312"
The complexity of
The complexity of
Returns
Returns the sum of the elements in
Takes elements
Takes elements
Returns the sorted list formed by merging
Returns the sorted list formed by merging
+
Returns a list containing the sorted elements of the list
-
Returns a list containing the sorted elements of list
+
Returns the sorted list formed by merging all the sub-lists
- of
Returns the sorted list formed by merging all the sublists
+ of
Returns the sorted list formed by merging
Returns the sorted list formed by merging
Returns the sorted list formed by merging
Returns the sorted list formed by merging
"Unzips" a list of two-tuples into two lists, where the first list contains the first element of each tuple, and the second list contains the second element of each tuple.
"Unzips" a list of three-tuples into three lists, where the first list contains the first element of each tuple, @@ -854,76 +988,84 @@ splitwith(Pred, List) -> the third list contains the third element of each tuple.
Returns a list containing the sorted elements of
-
Returns a list which contains the sorted elements of
-
"Zips" two lists of equal length into one list of two-tuples, where the first element of each tuple is taken from the first - list and the second element is taken from corresponding + list and the second element is taken from the corresponding element in the second list.
"Zips" three lists of equal length into one list of three-tuples, where the first element of each tuple is taken from the first list, the second element is taken from - corresponding element in the second list, and the third - element is taken from the corresponding element in the third - list.
+ the corresponding element in the second list, and the third + element is taken from the corresponding element in the third list.Combine the elements of two lists of equal length into one
- list. For each pair
Combines the elements of two lists of equal length into one list.
+ For each pair
Example:
+Example:
> lists:zipwith(fun(X, Y) -> X+Y end, [1,2,3], [4,5,6]).
[5,7,9]
Combine the elements of three lists of equal length into one
- list. For each triple
Examples:
+Combines the elements of three lists of equal length into one
+ list. For each triple
Examples:
> lists:zipwith3(fun(X, Y, Z) -> X+Y+Z end, [1,2,3], [4,5,6], [7,8,9]).
[12,15,18]
diff --git a/lib/stdlib/doc/src/log_mf_h.xml b/lib/stdlib/doc/src/log_mf_h.xml
index 65622e52f5..edc3d31025 100644
--- a/lib/stdlib/doc/src/log_mf_h.xml
+++ b/lib/stdlib/doc/src/log_mf_h.xml
@@ -32,48 +32,56 @@
Martin Björklund
1996-10-31
A
- log_mf_h.sgml
+ log_mf_h.xml
log_mf_h
- An Event Handler which Logs Events to Disk
+ An event handler that logs events to disk.
- The log_mf_h is a gen_event handler module which
- can be installed in any gen_event process. It logs onto disk all events
- which are sent to an event manager. Each event is written as a
- binary which makes the logging very fast. However, a tool such as the Report Browser (rb ) must be used in order to read the files. The events are written to multiple files. When all files have been used, the first one is re-used and overwritten. The directory location, the number of files, and the size of each file are configurable. The directory will include one file called index , and
- report files 1, 2, .... .
-
+ This module is a gen_event handler module that can be installed
+ in any gen_event process. It logs onto disk all events that are
+ sent to an event manager. Each event is written as a binary, which makes
+ the logging very fast. However, a tool such as the Report Browser
+ (rb(3) ) must be used to read
+ the files. The events are written to multiple files. When all files have
+ been used, the first one is reused and overwritten. The directory
+ location, the number of files, and the size of each file are configurable.
+ The directory will include one file called index , and report files
+ 1, 2, ... .
+
Term to be sent to
- gen_event:add_handler/3 .
+ gen_event:add_handler/3 .
+ Initiates the event handler. This function returns
-
Initiates the event handler. Returns
This module contains functions for maps processing.
-
- Returns a map
- The call will fail with a
Example:
-
+ maps
+ Björn-Egil Dahlberg
+ 1
+ 2014-02-28
+ A
+ This module contains functions for maps processing.
+Returns a map
The call fails with a
Example:
+
> M = #{a => 2, b => 3, c=> 4, "a" => 1, "b" => 2, "c" => 4},
Pred = fun(K,V) -> is_atom(K) andalso (V rem 2) =:= 0 end,
maps:filter(Pred,M).
-#{a => 2,c => 4}
-
- Returns a tuple
- The call will fail with a
Example:
-
+
+
+
+
+ Returns a tuple {ok, Value} , where Value
+ is the value associated with Key , or error
+ if no value is associated with Key in
+ Map .
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{"hi" => 42},
Key = "hi",
maps:find(Key,Map).
-{ok,42}
-
-
+{ok,42}
+
- Calls
Example:
-
+
+
+
+
+ Calls F(K, V, AccIn) for every K to value
+ V association in Map in
+ any order. Function fun F/3 must return a new
+ accumulator, which is passed to the next successive call.
+ This function returns the final value of the accumulator. The initial
+ accumulator value Init is returned if the map is
+ empty.
+ Example:
+
> Fun = fun(K,V,AccIn) when is_list(K) -> AccIn + V end,
Map = #{"k1" => 1, "k2" => 2, "k3" => 3},
maps:fold(Fun,0,Map).
6
-
-
+ - The function takes a list of key-value tuples elements and builds a - map. The associations may be in any order and both keys and values in the - association may be of any term. If the same key appears more than once, - the latter (rightmost) value is used and the previous values are ignored. -
-Example:
-
+
+
+
+
+ Takes a list of key-value tuples and builds a map. The
+ associations can be in any order, and both keys and values in the
+ associations can be any term. If the same key appears more than
+ once, the last (rightmost) value is used and the previous values
+ are ignored.
+ Example:
+
> List = [{"a",ignored},{1337,"value two"},{42,value_three},{"a",1}],
maps:from_list(List).
#{42 => value_three,1337 => "value two","a" => 1}
-
-
+
- Returns the value
- The call will fail with a
Example:
-
+
+
+
+
+ Returns value Value associated with
+ Key if Map contains
+ Key .
+ The call fails with a {badmap,Map} exception if
+ Map is not a map, or with a {badkey,Key}
+ exception if no value is associated with Key .
+ Example:
+
> Key = 1337,
Map = #{42 => value_two,1337 => "value one","a" => 1},
maps:get(Key,Map).
"value one"
-
-
-
-
-
-
-
-
- Returns the value Value associated with Key if
- Map contains Key .
- If no value is associated with Key then returns Default .
-
-
- The call will fail with a {badmap,Map} exception if Map is not a map.
+
+
-
- Example:
-
+
+
+
+
+ Returns value Value associated with
+ Key if Map contains
+ Key . If no value is associated with
+ Key , Default is returned.
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{ key1 => val1, key2 => val2 }.
#{key1 => val1,key2 => val2}
> maps:get(key1, Map, "Default value").
val1
> maps:get(key3, Map, "Default value").
"Default value"
-
-
+
- Returns
- The call will fail with a
Example:
-
+
+
+
+
+ Returns true if map Map contains
+ Key and returns false if it does not
+ contain the Key .
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{"42" => value}.
#{"42" => value}
> maps:is_key("42",Map).
true
> maps:is_key(value,Map).
false
-
-
+
- Returns a complete list of keys, in arbitrary order, which resides within
- The call will fail with a
Example:
-
+
+
+
+
+ Returns a complete list of keys, in any order, that reside
+ within Map .
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{42 => value_three,1337 => "value two","a" => 1},
maps:keys(Map).
[42,1337,"a"]
-
-
+
- The function produces a new map
Example:
-
+
+
+
+
+ Produces a new map Map2 by calling function
+ fun F(K, V1) for every K to value
+ V1 association in Map1 in
+ any order. Function fun F/2 must return value
+ V2 to be associated with key K
+ for the new map Map2 .
+ Example:
+
> Fun = fun(K,V1) when is_list(K) -> V1*2 end,
Map = #{"k1" => 1, "k2" => 2, "k3" => 3},
maps:map(Fun,Map).
#{"k1" => 2,"k2" => 4,"k3" => 6}
-
-
+
- Merges two maps into a single map
- The call will fail with a
Example:
-
+
+
+
+
+ Merges two maps into a single map Map3 . If the same
+ key exists in both maps, the value in Map1 is
+ superseded by the value in Map2 .
+ The call fails with a {badmap,Map} exception if
+ Map1 or Map2 is not a map.
+ Example:
+
> Map1 = #{a => "value_one", b => "value_two"},
Map2 = #{a => 1, c => 2},
maps:merge(Map1,Map2).
#{a => 1,b => "value_two",c => 2}
-
-
+ - Returns a new empty map. -
-Example:
-
+
+
+
+
+ Returns a new empty map.
+ Example:
+
> maps:new().
#{}
-
-
-
-
-
-
-
-
- Associates Key with value Value and inserts the association into map Map2 .
- If key Key already exists in map Map1 , the old associated value is
- replaced by value Value . The function returns a new map Map2 containing the new association and
- the old associations in Map1 .
-
-
- The call will fail with a {badmap,Map} exception if Map1 is not a map.
-
+
+
- Example:
-
+
+
+
+
+ Associates Key with value
+ Value and inserts the association into map
+ Map2 . If key Key already exists in map
+ Map1 , the old associated value is replaced by
+ value Value . The function returns a new map
+ Map2 containing the new association and the old
+ associations in Map1 .
+ The call fails with a {badmap,Map} exception if
+ Map1 is not a map.
+ Example:
+
> Map = #{"a" => 1}.
#{"a" => 1}
> maps:put("a", 42, Map).
#{"a" => 42}
> maps:put("b", 1337, Map).
#{"a" => 1,"b" => 1337}
-
-
+
- The function removes the
- The call will fail with a
Example:
-
+
+
+
+
+ Removes the Key , if it exists, and its
+ associated value from Map1 and returns a new map
+ Map2 without key Key .
+ The call fails with a {badmap,Map} exception if
+ Map1 is not a map.
+ Example:
+
> Map = #{"a" => 1}.
#{"a" => 1}
> maps:remove("a",Map).
#{}
> maps:remove("b",Map).
#{"a" => 1}
-
-
+ Returns the number of key-value associations in
+
Example:
+
+> Map = #{42 => value_two,1337 => "value one","a" => 1},
+ maps:size(Map).
+3
+
- The function removes the
- The call will fail with a
Example:
-
+
+
+
+
+ Removes Key, if it
+ exists, and its associated value from Map1
+ and returns a tuple with the removed Value
+ and the new map Map2 without key
+ Key. If the key does not exist,
+ error is returned.
+
+ The call fails with a {badmap,Map} exception if
+ Map1 is not a map.
+
+ Example:
+
> Map = #{"a" => "hello", "b" => "world"}.
#{"a" => "hello", "b" => "world"}
> maps:take("a",Map).
{"hello",#{"b" => "world"}}
> maps:take("does not exist",Map).
error
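Because maps:take/2 returns either a {Value,Map2} tuple or the atom error, callers usually branch on the result. The following is a minimal sketch rather than part of the original examples; the fun name Pop and the result tags ok and not_found are made up for illustration:

```erlang
%% Hypothetical helper: Pop, ok and not_found are illustrative names only.
Pop = fun(Key, Map) ->
          case maps:take(Key, Map) of
              {Value, Rest} -> {ok, Value, Rest};   % key existed
              error -> {not_found, Map}             % key missing, map unchanged
          end
      end,
Map = #{"a" => "hello", "b" => "world"},
{ok, "hello", Rest1} = Pop("a", Map),
false = maps:is_key("a", Rest1),
{not_found, Map} = Pop("does not exist", Map).
```

The sequence can be pasted into the Erlang shell; each match fails with a badmatch error if the assumption next to it does not hold.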
-
-
-
-
-
-
-
-
- The function returns the number of key-value associations in the Map .
- This operation happens in constant time.
-
- Example:
-
-> Map = #{42 => value_two,1337 => "value one","a" => 1},
- maps:size(Map).
-3
-
-
+
- The fuction returns a list of pairs representing the key-value associations of
- The call will fail with a
Example:
-
+
+
+
+
+ Returns a list of pairs representing the key-value associations of
+ Map , where the pairs
+ [{K1,V1}, ..., {Kn,Vn}] are returned in arbitrary order.
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{42 => value_three,1337 => "value two","a" => 1},
maps:to_list(Map).
[{42,value_three},{1337,"value two"},{"a",1}]
-
-
+
- If
- The call will fail with a
Example:
-
+
+
+
+
+ If Key exists in Map1 , the
+ old associated value is replaced by value Value .
+ The function returns a new map Map2 containing
+ the new associated value.
+ The call fails with a {badmap,Map} exception if
+ Map1 is not a map, or with a {badkey,Key}
+ exception if no value is associated with Key .
+ Example:
+
> Map = #{"a" => 1}.
#{"a" => 1}
> maps:update("a", 42, Map).
#{"a" => 42}
-
-
+ Update a value in a
Example:
-
+
+
+
+
+ Updates the value in Map1 associated
+ with Key by calling
+ Fun on the old value to get a new
+ value. An exception {badkey,Key} is
+ generated if Key is not present in the
+ map.
+ Example:
+
> Map = #{"counter" => 1},
Fun = fun(V) -> V + 1 end,
maps:update_with("counter",Fun,Map).
#{"counter" => 2}
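The {badkey,Key} exception can be caught when a missing key is an expected case. This is a small sketch under that assumption; the tag no_such_key is an illustrative name, not something the maps module defines:

```erlang
%% Sketch: turning the badkey error of maps:update_with/3 into a tagged value.
Map = #{"counter" => 1},
Inc = fun(V) -> V + 1 end,
Result =
    try
        maps:update_with("missing", Inc, Map)
    catch
        error:{badkey, Key} -> {no_such_key, Key}   % no_such_key is hypothetical
    end,
{no_such_key, "missing"} = Result.
```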
-
-
+ Update a value in a
Example:
-
+
+
+
+
+ Updates the value in Map1 associated
+ with Key by calling
+ Fun on the old value to get a new value.
+ If Key is not present in
+ Map1, Init is
+ associated with Key.
+
+ Example:
+
> Map = #{"counter" => 1},
Fun = fun(V) -> V + 1 end,
maps:update_with("new counter",Fun,42,Map).
@@ -417,56 +392,54 @@ error
-
-
-
-
-
- Returns a complete list of values, in arbitrary order, contained in map Map .
-
-
- The call will fail with a {badmap,Map} exception if Map is not a map.
-
- Example:
-
+
+
+
+
+ Returns a complete list of values, in arbitrary order, contained in
+ map Map .
+ The call fails with a {badmap,Map} exception if
+ Map is not a map.
+ Example:
+
> Map = #{42 => value_three,1337 => "value two","a" => 1},
maps:values(Map).
[value_three,"value two",1]
-
-
+
+
-
-
-
-
-
- Returns a new map Map2 with the keys K1 through Kn and their associated values from map Map1 .
- Any key in Ks that does not exist in Map1 are ignored.
-
- Example:
-
+
+
+
+
+ Returns a new map Map2 with the keys K1
+ through Kn and their associated values from map
+ Map1 . Any key in Ks that does
+ not exist in Map1 is ignored.
+ Example:
+
> Map = #{42 => value_three,1337 => "value two","a" => 1},
Ks = ["a",42,"other key"],
maps:with(Ks,Map).
#{42 => value_three,"a" => 1}
-
-
+
+
-
-
-
-
-
- Returns a new map Map2 without the keys K1 through Kn and their associated values from map Map1 .
- Any key in Ks that does not exist in Map1 are ignored.
-
- Example:
-
+
+
+
+
+ Returns a new map Map2 without keys K1
+ through Kn and their associated values from map
+ Map1 . Any key in Ks that does
+ not exist in Map1 is ignored.
+ Example:
+
> Map = #{42 => value_three,1337 => "value two","a" => 1},
Ks = ["a",42,"other key"],
maps:without(Ks,Map).
#{1337 => "value two"}
-
-
- This module provides an interface to a number of mathematical functions.
+Not all functions are implemented on all platforms. In particular,
- the
Not all functions are provided on all platforms. In particular,
+ the
A useful number.
-A collection of math functions which return floats. Arguments - are numbers.
+A collection of mathematical functions that return floats. Arguments + are numbers.
Returns the error function of
Returns the error function of
-erf(X) = 2/sqrt(pi)*integral from 0 to X of exp(-t*t) dt.
+erf(X) = 2/sqrt(pi)*integral from 0 to X of exp(-t*t) dt.
A useful number.
+As these are the C library, the bugs are the same.
+As these are the C library, the same limitations apply.
This module implements the parse_transform that makes calls to
-
The translations from fun's to match_specs
- is accessed through the two "pseudo
- functions"
Actually this introduction is more or less an introduction to the
- whole concept of match specifications. Since everyone trying to use
-
There are some caveats one should be aware of, please read through - the whole manual page if it's the first time you're using the - transformations.
-Match specifications are used more or less as filters.
- They resemble usual Erlang matching in a list comprehension or in
- a
As the match specifications execution and structure is quite like - that of a fun, it would for most programmers be more straight forward - to simply write it using the familiar fun syntax and having that - translated into a match specification automatically. Of course a real - fun is more powerful than the match specifications allow, but bearing - the match specifications in mind, and what they can do, it's still +
This module provides the parse transformation that makes calls to
+
The translation from funs to match specifications
+ is accessed through the two "pseudo functions"
+
As everyone trying to use
+
Read the whole manual page if it is the first time you are using + the transformations.
+ +Match specifications are used more or less as filters. They resemble
+ usual Erlang matching in a list comprehension or in a fun used with
+
As the execution and structure of the match specifications are like + that of a fun, it is more straightforward + to write it using the familiar fun syntax and to have that + translated into a match specification automatically. A real fun is + clearly more powerful than the match specifications allow, but bearing + the match specifications in mind, and what they can do, it is still more convenient to write it all as a fun. This module contains the - code that simply translates the fun syntax into match_spec terms.
-Let's start with an ets example. Using
As an example, consider a simple table of employees:
+ code that translates the fun syntax into match specification + terms. +Using
Consider a simple table of employees:
+
-record(emp, {empno, %Employee number as a string, the key
surname, %Surname of the employee
givenname, %Given name of employee
- dept, %Department one of {dev,sales,prod,adm}
- empyear}). %Year the employee was employed
+ dept, %Department, one of {dev,sales,prod,adm}
+ empyear}). %Year the employee was employed
+
We create the table using:
+
-ets:new(emp_tab,[{keypos,#emp.empno},named_table,ordered_set]).
- Let's also fill it with some randomly chosen data for the examples:
+ets:new(emp_tab, [{keypos,#emp.empno},named_table,ordered_set]). + +We fill the table with randomly chosen data:
+
[{emp,"011103","Black","Alfred",sales,2000},
{emp,"041231","Doe","John",prod,2001},
@@ -96,167 +112,204 @@ ets:new(emp_tab,[{keypos,#emp.empno},named_table,ordered_set]).
{emp,"535216","Chalker","Samuel",adm,1998},
{emp,"789789","Harrysson","Joe",adm,1996},
{emp,"963721","Scott","Juliana",dev,2003},
- {emp,"989891","Brown","Gabriel",prod,1999}]
- Now, the amount of data in the table is of course to small to justify
- complicated ets searches, but on real tables, using
Lets say for example that we'd want the employee numbers of
- everyone in the sales department. One might use
Assuming that we want the employee numbers of everyone in the sales + department, there are several ways.
+ +
1> ets:match(emp_tab, {'_', '$1', '_', '_', sales, '_'}).
-[["011103"],["076324"]]
- Even though
ets:foldr(fun(#emp{empno = E, dept = sales},Acc) -> [E | Acc];
(_,Acc) -> Acc
end,
[],
- emp_tab).
- Running that would result in
The result is
Consider a "pure"
-ets:select(emp_tab,[{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]).
- Even though the record syntax is used, it's still somewhat hard to +ets:select(emp_tab, [{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]). + +
Although the record syntax is used, it is still hard to
read and even harder to write. The first element of the tuple,
-
We have one efficient but hardly readable way of doing it and one
- inefficient but fairly readable (at least to the skilled Erlang
- programmer) way of doing it. With the use of
Using
-include_lib("stdlib/include/ms_transform.hrl").
-% ...
-
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = E, dept = sales}) ->
E
- end)).
- This may not be the shortest of the expressions, but it requires no
- special knowledge of match specifications to read. The fun's head
- should simply match what you want to filter out and the body returns
- what you want returned. As long as the fun can be kept within the
- limits of the match specifications, there is no need to transfer all
- data of the table to the process for filtering as in the
-
It's worth noting in the above
Let's look at some more
This example requires no special knowledge of match
+ specifications to understand. The head of the fun matches what
+ you want to filter out and the body returns what you want
+ returned. As long as the fun can be kept within the limits of the
+ match specifications, there is no need to transfer all table data
+ to the process for filtering as in the
In the
Assume that we want to get all the employee numbers of employees
+ hired before year 2000. Using
ets:foldr(fun(#emp{empno = E, empyear = Y},Acc) when Y < 2000 -> [E | Acc];
(_,Acc) -> Acc
end,
[],
emp_tab). ]]>
- The result will be
-
The result is
- This gives the same result, the
This gives the same result.
We write it using
- E
+ E
end)). ]]>
- Obviously readability is gained by using the parse transformation.
-I'll show some more examples without the tiresome
- comparing-to-alternatives stuff. Let's say we'd want the whole object
- matching instead of only one element. We could of course assign a
- variable to every part of the record and build it up once again in the
- body of the
Assume that we want the whole object matching instead of only one + element. One alternative is to assign a variable to every part + of the record and build it up once again in the body of the fun, but + the following is easier:
+
Obj
- end)). ]]>
- Just as in ordinary Erlang matching, you can bind a variable to the
- whole matched object using a "match in then match", i.e. a
-
Let's do something in the
As in ordinary Erlang matching, you can bind a variable to the
+ whole matched object using a "match inside the match", that is, a
+
This example concerns the body of the fun. Assume that all employee
+ numbers beginning with zero (
ets:select(emp_tab, ets:fun2ms(
fun(#emp{empno = [$0 | Rest] }) ->
{[$0|Rest],[$1|Rest]}
- end)).
- As a matter of fact, this query hits the feature of partially bound
- keys in the table type
The fun of course can have several clauses, so that if one could do
- the following: For each employee, if he or she is hired prior to 1997,
- return the tuple
This query hits the feature of partially bound
+ keys in table type
The fun can have many clauses. Assume that we want to do + the following:
+ +If an employee started before 1997, return the tuple
+
If an employee started 1997 or later, but before 2001, return
+
For all other employees, return
+
This is accomplished as follows:
+
@@ -268,7 +321,9 @@ ets:select(emp_tab, ets:fun2ms(
(#emp{empno = E, empyear = Y}) -> % 1997 -- 2001
{rookie, E}
end)). ]]>
- The result will be:
+ +The result is as follows:
+
[{rookie,"011103"},
{rookie,"041231"},
@@ -278,162 +333,207 @@ ets:select(emp_tab, ets:fun2ms(
{rookie,"535216"},
{inventory,"789789"},
{newbie,"963721"},
- {rookie,"989891"}]
- and so the Smith's will be happy...
-So, what more can you do? Well, the simple answer would be; look
- in the documentation of match specifications in ERTS users
- guide. However let's briefly go through the most useful "built in
- functions" that you can use when the
What more can you do? A simple answer is: see the documentation of
+
The head of the
The head of the fun is a head matching (or mismatching)
+ one parameter, one object of the table we select
from. The object is always a single variable (can be
The guard section can contain any guard expression of Erlang.
- Even the "old" type test are allowed on the toplevel of the guard
- (
Type tests:
Boolean operators:
Relational operators: >, >=, <, =<, =:=, ==, =/=, /=
+Arithmetics:
Bitwise operators:
The guard BIFs:
Contrary to the fact with "handwritten" match specifications, the
Semicolons (
Semicolons (
The body of the
The body of the fun is used to construct the
+ resulting value. When selecting from tables, one usually constructs
a suitable term here, using ordinary Erlang term construction, like
- tuple parentheses, list brackets and variables matched out in the
- head, possibly in conjunction with the occasional constant. Whatever
- expressions are allowed in guards are also allowed here, but there are
- no special functions except
The
Let's move on to the
The same reasons for using the parse transformation applies to
-
Let's manufacture a toy module to trace on
+ only thing left, more or less, is term construction. +This section describes the slightly different match specifications
+ translated by
The same reasons for using the parse transformation apply to
+
The following is an example module to trace on:
+
-module(toy).
-export([start/1, store/2, retrieve/1]).
start(Args) ->
- toy_table = ets:new(toy_table,Args).
+ toy_table = ets:new(toy_table, Args).
store(Key, Value) ->
- ets:insert(toy_table,{Key,Value}).
+ ets:insert(toy_table, {Key,Value}).
retrieve(Key) ->
- [{Key, Value}] = ets:lookup(toy_table,Key),
- Value.
- During model testing, the first test bails out with a + [{Key, Value}] = ets:lookup(toy_table, Key), + Value. + +
During model testing, the first test results in
We suspect the ets call, as we match hard on the return value, but
- want only the particular
We suspect the
1> dbg:tracer().
{ok,<0.88.0>}
- And so we turn on call tracing for all processes, we are going to - make a pretty restrictive trace pattern, so there's no need to call - trace only a few processes (it usually isn't):
+
+We turn on call tracing for all processes. We want to
+ make a pretty restrictive trace pattern, so there is no need to call
+ trace only a few processes (there usually is not):
+
2> dbg:p(all,call).
-{ok,[{matched,nonode@nohost,25}]}
- It's time to specify the filter. We want to view calls that resemble
-
We specify the filter; we want to view calls that resemble
+
3> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> true end)).
-{ok,[{matched,nonode@nohost,1},{saved,1}]}
- As can be seen, the
As can be seen, the fun used with
+
When we run the test of our module now, we get the following trace - output:
+ function. A single variable can also be used. The body
+ of the fun expresses, in a more imperative way, actions to be taken if
+ the fun head (and the guards) matches.
+
+The following trace output is received during test:
(<0.86.0>) call ets:new(toy_table,[ordered_set]) ]]>
- Let's play we haven't spotted the problem yet, and want to see what
-
Assume that we have not found the problem yet, and want to see what
+
4> dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> return_trace() end)).-
Resulting in the following trace output when we run the test:
+ +The following trace output is received during test:
(<0.86.0>) call ets:new(toy_table,[ordered_set])
(<0.86.0>) returned from ets:new/2 -> 24 ]]>
- The call to
The call to
As the test now fails with
The test now fails with
start(Args) ->
- toy_table = ets:new(toy_table,[named_table |Args]).
- And with the same tracing turned on, we get the following trace - output:
+ toy_table = ets:new(toy_table, [named_table|Args]). + +With the same tracing turned on, the following trace output is + received:
(<0.86.0>) call ets:new(toy_table,[named_table,ordered_set])
(<0.86.0>) returned from ets:new/2 -> toy_table ]]>
- Very well. Let's say the module now passes all testing and goes into
- the system. After a while someone realizes that the table
-
Assume that the module now passes all testing and goes into
+ the system. After a while, it is found that table
+
1> dbg:tracer().
{ok,<0.88.0>}
@@ -441,80 +541,101 @@ start(Args) ->
{ok,[{matched,nonode@nohost,25}]}
3> dbg:tpl(toy,store,dbg:fun2ms(fun([A,_]) when is_atom(A) -> true end)).
{ok,[{matched,nonode@nohost,1},{saved,1}]}
- We use
Let's say nothing happens when we trace in this way. Our function
- is never called with these parameters. We make the conclusion that
- someone else (some other module) is doing it and we realize that we
- must trace on ets:insert and want to see the calling function. The
- calling function may be retrieved using the match specification
- function
We use
Assume that nothing happens when tracing in this way. The function
+ is never called with these parameters. We conclude that
+ someone else (some other module) is doing it and realize that we
+ must trace on
4> dbg:tpl(ets,insert,dbg:fun2ms(fun([toy_table,{A,_}]) when is_atom(A) ->
message(caller())
end)).
-{ok,[{matched,nonode@nohost,1},{saved,2}]}
- The caller will now appear in the "additional message" part of the - trace output, and so after a while, the following output comes:
+{ok,[{matched,nonode@nohost,1},{saved,2}]} + +The caller is now displayed in the "additional message" part of the + trace output, and the following is displayed after a while:
+) call ets:insert(toy_table,{garbage,can}) ({evil_mod,evil_fun,2}) ]]>
- You have found out that the function
This was just a toy example, but it illustrated the most used
- calls in match specifications for
To end this chatty introduction with something more precise, here
- follows some parts about caveats and restrictions concerning the fun's
- used in conjunction with
You have realized that function
This example illustrates the most used calls in match specifications for
+
The following warnings and restrictions apply to the funs used
+ with
To use the pseudo functions triggering the translation, one
- has to include the header file
To use the pseudo functions triggering the translation,
+ make sure to include the header file
The
The fun must be literally constructed inside the
+ parameter list to the pseudo functions. The fun cannot
be bound to a variable first and then passed to
-
Several restrictions apply to the fun that is being translated - into a match_spec. To put it simple you cannot use anything in - the fun that you cannot use in a match_spec. This means that, + +
Many restrictions apply to the fun that is translated into a match
+ specification. To put it simply: you cannot use anything in the fun
+ that you cannot use in a match specification. This means that,
among others, the following restrictions apply to the fun itself:
+Variables that are not appearing in the head are imported
- from the environment and made into
- match_spec
Functions written in Erlang cannot be called, neither can + local functions, global functions, or real funs.
+Everything that is written as a function call is translated
+ into a match specification call to a built-in function, so that
+ the call
Variables occurring in the head of the fun are replaced by
+ match specification variables in the order of occurrence, so
+ that fragment
Variables that are not included in the head are imported
+ from the environment and made into match specification
+
1> X = 25.
25
@@ -523,7 +644,7 @@ start(Args) ->
Matching with
1> ets:fun2ms(fun({A,[B|C]} = D) when A > B -> D end).
@@ -534,106 +655,125 @@ match_spec
{error,transform_error}
3> ets:fun2ms(fun({A,[B|C]}) when A > B -> D = [B|C], D end).
Error: fun with body matching ('=' in body) is illegal as match_spec
-{error,transform_error}
- All variables are bound in the head of a match_spec, so the
- translator can not allow multiple bindings. The special case
- when matching is done on the top level makes the variable bind
- to
All variables are bound in the head of a match specification, so
+ the translator cannot allow multiple bindings. The special case
+ when matching is done on the top level makes the variable bind
+ to
The following expressions are translated equally:
ets:fun2ms(fun({a,_} = A) -> A end).
ets:fun2ms(fun({a,_}) -> object() end).
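The equivalence of the two forms can be checked at runtime by comparing the generated match specifications. The module below is only a sketch; the module name ms_sketch and the function same/0 are made up for illustration, and the header include is what triggers the parse transformation:

```erlang
%% Hypothetical module: ms_sketch and same/0 are illustrative names only.
-module(ms_sketch).
-include_lib("stdlib/include/ms_transform.hrl").
-export([same/0]).

%% Returns true if both funs translate to the same match specification.
same() ->
    MS1 = ets:fun2ms(fun({a,_} = A) -> A end),
    MS2 = ets:fun2ms(fun({a,_}) -> object() end),
    MS1 =:= MS2.
```

After compiling the module, calling ms_sketch:same() is expected to return true, because both funs return the whole matched object.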
The special match_spec variables
The special match specification variables
ets:match_object(Table, {'$1',test,'$2'}).
- ...is the same as...
+This is the same as:
ets:select(Table, ets:fun2ms(fun({A,test,B}) -> object() end)).
- (This was just an example, in this simple case the former
- expression is probably preferable in terms of readability).
- The
In this simple case, the former + expression is probably preferable in terms of readability.
+The
ets:select(Table, [{{'$1',test,'$2'},[],['$_']}]).
- Matching on the top level of the fun head might feel like a +
Matching on the top level of the fun head can be a
more natural way to access
Term constructions/literals are translated as much as is needed to
+ get them into a valid match specification. This way tuples are made
+ into match specification tuple constructions (a one element tuple
+ containing the tuple) and constant expressions are used when
+ importing variables from the environment. Records are also
+ translated into plain tuple constructions, calls to element,
+ and so on. The guard test
Language constructions such as
If header file
Ensure that the header is included when using
If the pseudo function triggering the translation is
+
The translation from fun's to match_specs is done at compile +
The translation from funs to match specifications is done at compile time, so runtime performance is not affected by using these pseudo - functions. The compile time might be somewhat longer though.
-For more information about match_specs, please read about them - in ERTS users guide.
- + functions. +For more information about match specifications, see the
+
Implements the actual transformation at compile time. This
- function is called by the compiler to do the source code
- transformation if and when the
Takes an error code returned by one of the other functions + in the module and creates a textual description of the + error.
Implements the actual transformation when the
Implements the transformation at compile time. This
+ function is called by the compiler to do the source code
+ transformation if and when header file
For information about how to use this parse transformation, see
+
For a description of match specifications, see section
+
Takes an error code returned by one of the other functions - in the module and creates a textual description of the - error. Fairly uninteresting function actually.
+Implements the transformation when the
This module provides a
This module provides exactly the same interface as the module
-
This module provides the same interface as the
+
Dictionary as returned by
Dictionary as returned by
+
This function appends a new
Appends a new
See also section
This function appends a list of values
Appends a list of values
See also section
This function erases all items with a given key from a - dictionary.
+Erases all items with a specified key from a dictionary.
This function returns the value associated with
Returns the value associated with
See also section
This function returns a list of all keys in the dictionary.
+Returns a list of all keys in a dictionary.
This function searches for a key in a dictionary. Returns
-
Searches for a key in a dictionary. Returns
+
See also section
Calls
This function converts the
Converts the
Returns
This function tests if
Tests if
Calls
Merges two dictionaries,
merge(Fun, D1, D2) ->
fold(fun (K, V1, D) ->
update(K, fun (V2) -> Fun(K, V1, V2) end, V1, D)
end, D2, D1).
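As a usage sketch of the definition above, the following merges two small dictionaries and sums the values of keys present in both. The variable names and the fun Sum are chosen for illustration only:

```erlang
%% Sketch: resolving key conflicts in orddict:merge/3 by adding the values.
D1 = orddict:from_list([{a, 1}, {b, 2}]),
D2 = orddict:from_list([{b, 3}, {c, 4}]),
Sum = fun(_Key, V1, V2) -> V1 + V2 end,   % called only for keys in both
Merged = orddict:merge(Sum, D1, D2),
[{a, 1}, {b, 5}, {c, 4}] = orddict:to_list(Merged).
```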
- but is faster.
This function creates a new dictionary.
+Creates a new dictionary.
Returns the number of elements in an
Returns
This function stores a
Stores a
This function converts the dictionary to a list - representation.
+Converts a dictionary to a list representation.
Update a value in a dictionary by calling
Updates a value in a dictionary by calling
Update a value in a dictionary by calling
Updates a value in a dictionary by calling
append(Key, Val, D) ->
update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).
Add
Adds
This could be defined as:
+This can be defined as follows, but is faster:
update_counter(Key, Incr, D) ->
update(Key, fun (Old) -> Old + Incr end, Incr, D).
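A short sketch of how the definition behaves for a missing and for an existing key; the key name hits is arbitrary:

```erlang
%% Sketch: orddict:update_counter/3 on an absent and on a present key.
D0 = orddict:new(),
D1 = orddict:update_counter(hits, 1, D0),   % key absent: stored as Incr
D2 = orddict:update_counter(hits, 5, D1),   % key present: Old + Incr
1 = orddict:fetch(hits, D1),
6 = orddict:fetch(hits, D2).
```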
- but is faster.
The functions
Functions
> D0 = orddict:new(), @@ -264,19 +293,18 @@ update_counter(Key, Incr, D) -> D3 = orddict:append(files, f2, D2), D4 = orddict:append(files, f3, D3), orddict:fetch(files, D4). -[f1,f2,f3]+[f1,f2,f3]
This saves the trouble of first fetching a keyed value, appending a new value to the list of stored values, and storing - the result. -
-The function
Function
Sets are collections of elements with no duplicate elements.
An
This module provides exactly the same interface as the module
-
This module provides the same interface as the
+
As returned by new/0.
As returned by
+
Returns a new empty ordered set.
+Returns a new ordered set formed from
Returns
Returns
Returns the number of elements in
Filters elements in
Returns the elements of
Folds
Returns an ordered set of the elements in
Returns an ordered set of the elements in
Returns
Returns the intersection of the non-empty list of sets.
Returns a new ordered set formed from
Returns the intersection of
Returns
Returns
Returns the merged (union) set of
Returns
Returns the merged (union) set of the list of sets.
+Returns
Returns the intersection of
Returns
Returns the intersection of the non-empty list of sets.
+Returns a new empty ordered set.
Returns
Returns the number of elements in
Returns only the elements of
Returns only the elements of
Returns
Returns the elements of
Fold
Returns the merged (union) set of the list of sets.
Filter elements in
Returns the merged (union) set of
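The set operations described in this module can be tried together in the shell. This is a minimal sketch with arbitrary integer elements, not an example taken from the original page:

```erlang
%% Sketch: basic ordsets operations on two small sets.
S1 = ordsets:from_list([3, 1, 2, 3]),   % duplicates are dropped
S2 = ordsets:from_list([2, 3, 4]),
[1, 2, 3]    = ordsets:to_list(S1),
[1, 2, 3, 4] = ordsets:to_list(ordsets:union(S1, S2)),
[2, 3]       = ordsets:to_list(ordsets:intersection(S1, S2)),
[1]          = ordsets:to_list(ordsets:subtract(S1, S2)),
true         = ordsets:is_subset(ordsets:from_list([2, 3]), S1),
false        = ordsets:is_element(4, S1).
```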
The Erlang standard library STDLIB.
-This module can be used to run a set of Erlang nodes as a pool of computational processors. It is organized as a master and a set of slave nodes and includes the following features:
+The BIF
The slave nodes are started with the
If the master node fails, the entire pool will exit.
+ +The slave nodes are started with the
+
If the master node fails, the entire pool exits.
+Starts a new pool. The file
The slave nodes are started with
Access rights must be set so that all nodes in the pool have - the authority to access each other.
-The function is synchronous and all the nodes, as well as - all the system servers, are running when it returns a value.
-This function ensures that a pool master is running and
- includes
Ensures that a pool master is running and includes
+
Stops the pool and kills all the slave nodes.
+Returns the node with the expected lowest future load.
Returns a list of the current member nodes of the pool.
Spawns a process on the pool node which is expected to have +
Spawns a process on the pool node that is expected to have the lowest future load.
Spawn links a process on the pool node which is expected to +
Spawns and links to a process on the pool node that is expected to have the lowest future load.
Returns the node with the expected lowest future load.
+Starts a new pool. The file
The slave nodes are started with
+
Access rights must be set so that all nodes in the pool have + the authority to access each other.
+The function is synchronous and all the nodes, and + all the system servers, are running when it returns a value.
+Stops the pool and kills all the slave nodes.
This module is used to start processes adhering to
- the
Some useful information is initialized when a process starts. The registered names, or the process identifiers, of the parent process, and the parent ancestors, are stored together with information about the function initially called in the process.
-While in "plain Erlang" a process is said to terminate normally
- only for the exit reason
While in "plain Erlang", a process is said to terminate normally
+ only for exit reason
When a process started using
When a process that is started using
The crash report contains the previously stored information such
+ if the
The crash report contains the previously stored information, such as ancestors and initial function, the termination reason, and - information regarding other processes which terminate as a result + information about other processes that terminate as a result of this process terminating.
See
Equivalent to
This function can be used by a user-defined event handler to
+ format a crash report. The crash report is sent using
+
This function can be used by a user-defined event handler to
+ format a crash report. When
This function does the same as (and does call) the
+
Always use this function instead of the BIF for processes started
+ using
This function must be used by a process that has been started by
+ a
Function
If this function is not called, the start function
+ returns an error tuple (if a link and/or a time-out is used) or
+ hangs otherwise.
+The following example illustrates how this function and
+
+-module(my_proc).
+-export([start_link/0]).
+-export([init/1]).
+
+start_link() ->
+ proc_lib:start_link(my_proc, init, [self()]).
+
+init(Parent) ->
+ case do_initialization() of
+ ok ->
+ proc_lib:init_ack(Parent, {ok, self()});
+ {error, Reason} ->
+ exit(Reason)
+ end,
+ loop().
+
+...
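The example above leaves do_initialization/0 and loop/0 undefined. The following is a self-contained sketch of the same handshake that can be compiled as is; the module name my_proc_sketch, the stop/1 function, and the bodies of do_initialization/0 and loop/0 are assumptions made for illustration:

```erlang
%% Hypothetical module: names and helper bodies are illustrative only.
-module(my_proc_sketch).
-export([start_link/0, stop/1]).
-export([init/1]).

start_link() ->
    %% Blocks until init/1 calls proc_lib:init_ack/2 or the process dies.
    proc_lib:start_link(?MODULE, init, [self()]).

stop(Pid) ->
    Pid ! stop,
    ok.

init(Parent) ->
    case do_initialization() of
        ok ->
            %% The second argument becomes the return value of start_link/0.
            proc_lib:init_ack(Parent, {ok, self()});
        {error, Reason} ->
            exit(Reason)
    end,
    loop().

%% Placeholder: a real process would set up its state here.
do_initialization() ->
    ok.

%% Minimal receive loop that terminates normally on the message stop.
loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.
```

With this sketch, my_proc_sketch:start_link() returns {ok,Pid} only after the new process has acknowledged its initialization.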
+ Extracts the initial call of a process that was started
+ using one of the spawn or start functions in this module.
+
The list
If the process was spawned using a fun,
Spawns a new process and initializes it as described above.
- The process is spawned using the
-
Spawns a new process and initializes it as described in the
+ beginning of this manual page. The process is spawned using the
+
Spawns a new process and initializes it as described above.
- The process is spawned using the
-
Spawns a new process and initializes it as described in the
+ beginning of this manual page. The process is spawned using the
+
Spawns a new process and initializes it as described above.
- The process is spawned using the
-
Spawns a new process and initializes it as described in the
+ beginning of this manual page. The process is spawned using the
+
Using the spawn option
Using spawn option
Starts a new process synchronously. Spawns the process and
- waits for it to start. When the process has started, it
+ waits for it to start. When the process has started, it
must call
-
If the
If function
If
The
If
Argument
Using the spawn option
Using spawn option
This function must be used by a process that has been started by
- a
The
If this function is not called, the start function will - return an error tuple (if a link and/or a timeout is used) or - hang otherwise.
-The following example illustrates how this function and
-
--module(my_proc).
--export([start_link/0]).
--export([init/1]).
-start_link() ->
- proc_lib:start_link(my_proc, init, [self()]).
-
-init(Parent) ->
- case do_initialization() of
- ok ->
- proc_lib:init_ack(Parent, {ok, self()});
- {error, Reason} ->
- exit(Reason)
- end,
- loop().
-
-...
- Equivalent to
This function can be used by a user defined event handler to
- format a crash report. The crash report is sent using
-
Equivalent to
This function can be used by a user defined event handler to
- format a crash report. When
Orders the process to exit with the specified
Returns
If the call times out, a
If the process does not exist, a
The implementation of this function is based on the
+
Extracts the initial call of a process that was started
- using one of the spawn or start functions described above.
-
The list
If the process was spawned using a fun,
This function is used by the
Extracts the initial call of a process that was started
- using one of the spawn or start functions described above,
- and translates it to more useful information.
This function is used by functions
+
This function extracts the initial call of a process that was
+ started using one of the spawn or start functions in this module,
+ and translates it to more useful information.
+
If the initial call is to one of the system defined behaviors +
If the initial call is to one of the system-defined behaviors
such as
A
By default,
This function does the same as (and does call) the BIF
-
Equivalent to
Orders the process to exit with the given
The function returns
If the call times out, a
If the process does not exist, a
The implementation of this function is based on the
-
Property lists are ordinary lists containing entries in the form
of either tuples, whose first elements are keys used for lookup and
- insertion, or atoms, which work as shorthand for tuples
Property lists are useful for representing inherited properties, - such as options passed to a function where a user may specify options + such as options passed to a function where a user can specify options overriding the default settings, object properties, annotations, - etc.
-Two keys are considered equal if they match (
Two keys are considered equal if they match (
Similar to
Similar to
+
Example:
+
+append_values(a, [{a, [1,2]}, {b, 0}, {a, 3}, {c, -1}, {a, [4]}])
+ returns:
+
+[1,2,3,4]
Minimizes the representation of all entries in the list. This is
equivalent to
See also:
See also
+
Expands particular properties to corresponding sets of
- properties (or other terms). For each pair
For example, the following expressions all return
For example, the following expressions all return
+
- expand([{foo, [bar, baz]}],
- [fie, foo, fum])
- expand([{{foo, true}, [bar, baz]}],
- [fie, foo, fum])
- expand([{{foo, false}, [bar, baz]}],
- [fie, {foo, false}, fum])
- However, no expansion is done in the following call:
+expand([{foo, [bar, baz]}], [fie, foo, fum]) +expand([{{foo, true}, [bar, baz]}], [fie, foo, fum]) +expand([{{foo, false}, [bar, baz]}], [fie, {foo, false}, fum]) +However, no expansion is done in the following call
+ because
- expand([{{foo, true}, [bar, baz]}],
- [{foo, false}, fie, foo, fum])
- because
Note that if the original property term is to be preserved in the +expand([{{foo, true}, [bar, baz]}], [{foo, false}, fie, foo, fum]) +
Notice that if the original property term is to be preserved in the
result when expanded, it must be included in the expansion list. The
inserted terms are not expanded recursively. If
-
See also:
See also
+
Similar to
See also:
Similar to
+
Returns the value of a boolean key/value option. If
-
See also:
See also
+
Returns an unordered list of the keys used in
Returns an unordered list of the keys used in
+
Equivalent to
Equivalent to
+
Returns the value of a simple key/value property in
-
See also:
See also
+
Returns
See also:
See also
+
Returns the list of all entries associated with
See also:
Returns the list of all entries associated with
+
See also
+
Passes
For a
For an
The final result is automatically compacted (compare
+
Typically you want to substitute negations first, then aliases, then perform one or more expansions (sometimes you want to pre-expand particular entries before doing the main expansion). You might want to substitute negations and/or aliases repeatedly, to allow such forms in the right-hand side of aliases and expansion lists.
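As a minimal sketch of the ordering described above (negations first, then aliases), the following module normalizes an option list. The module name and the option names (no_verbose, verbose, color, colour) are hypothetical and chosen only for illustration.

```erlang
-module(proplists_normalize_sketch).
-export([normalize_opts/1]).

%% Hypothetical options: `no_verbose' is treated as the negation of
%% `verbose', and `color' as an alias for `colour'.
normalize_opts(Opts) ->
    proplists:normalize(Opts,
                        [{negations, [{no_verbose, verbose}]},
                         {aliases, [{color, colour}]}]).
```

After normalization, callers only need to look up the canonical keys, for example with proplists:get_bool(verbose, Opts) and proplists:get_value(colour, Opts).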
-See also:
See also
Creates a normal form (minimal) representation of a property. If
-
See also:
See also
+
Creates a normal form (minimal) representation of a simple
- key/value property. Returns
See also:
Creates a normal form (minimal) representation of a simple key/value
+ property. Returns
See also
+
Partitions
Example: - split([{c, 2}, {e, 1}, a, {c, 3, 4}, d, {b, 5}, b], [a, b, c])
-returns
-{[[a], [{b, 5}, b],[{c, 2}, {c, 3, 4}]], [{e, 1}, d]}
+Example:
+
+split([{c, 2}, {e, 1}, a, {c, 3, 4}, d, {b, 5}, b], [a, b, c])
+ returns:
+
+{[[a], [{b, 5}, b],[{c, 2}, {c, 3, 4}]], [{e, 1}, d]}
Substitutes keys of properties. For each entry in
-
Example:
For example,
See also:
See also
+
Substitutes keys of boolean-valued properties and
simultaneously negates their values. For each entry in
-
Example:
For example,
See also:
See also
+
Unfolds all occurrences of atoms in
Unfolds all occurrences of atoms in
The
This module provides a query interface to
+
The This module provides a query interface to QLC
+ tables. Typical QLC tables are Mnesia, ETS, and
+ Dets tables. Support is also provided for user-defined tables, see section
+ While ordinary list comprehensions evaluate to lists, calling
- Syntactically QLCs have the same parts as ordinary list
comprehensions: [Expression || Qualifier1, Qualifier2, ...]
+
+[Expression || Qualifier1, Qualifier2, ...]
-
The evaluation of a query handle begins by the inspection of
- options and the collection of information about tables. As a
- result qualifiers are modified during the optimization phase.
- Next all list expressions are evaluated. If a cursor has been
- created evaluation takes place in the cursor process. For those
- list expressions that are QLCs, the list expressions of the
- QLCs' generators are evaluated as well. One has to be careful if
- list expressions have side effects since the order in which list
- expressions are evaluated is unspecified. Finally the answers
- are found by evaluating the qualifiers from left to right,
- backtracking when some filter returns
Filters that do not return A query handle is evaluated in the following order: Inspection of options and the collection of information about
+ tables. As a result, qualifiers are modified during the optimization
+ phase. All list expressions are evaluated. If a cursor has been created,
+ evaluation takes place in the cursor process. For list expressions
+ that are QLCs, the list expressions of the generators of the QLCs
+ are evaluated as well. Be careful if list expressions have side
+ effects, as list expressions are evaluated in unspecified order. The answers are found by evaluating the qualifiers from left to
+ right, backtracking when some filter returns Filters that do not return
+
+
+
The
Lookup join traverses all objects of one query handle
+ and finds objects of the other handle (a QLC table) such that the
values at
Merge join sorts the objects of each query handle if
necessary and filters out objects where the values at
-
The
The join is to be expressed as a guard filter. The filter must
be placed immediately after the two joined generators, possibly
after guard filters that use variables from no other generators
- but the two joined generators. The
The following options are accepted by
The following options are accepted by
+
As mentioned earlier,
+ queries are expressed in the list comprehension syntax as described
+ in section
+ Many list comprehension expressions can be evaluated by the
-
What the
Besides
Besides
As another example, consider concatenating the answers to two
- queries QH1 and QH2 while removing all duplicates. The means to
- accomplish this is to use the
+
+
- The cost is substantial: every returned answer will be stored - in an ETS table. Before returning an answer it is looked up in +
The cost is substantial: every returned answer is stored
+ in an ETS table. Before returning an answer, it is looked up in
the ETS table to check if it has already been returned. Without
- the
If the order of the answers is not important there is the - alternative to sort the answers uniquely:
+If the order of the answers is not important, there is an
+ alternative to the
+
+
- This query also removes duplicates but the answers will be - sorted. If there are many answers temporary files will be used. - Note that in order to get the first unique answer all answers - have to be found and sorted. Both alternatives find duplicates - by comparing answers, that is, if A1 and A2 are answers found in - that order, then A2 is a removed if A1 == A2.
+This query also removes duplicates but the answers are
+ sorted. If there are many answers, temporary files are used.
+ Notice that to get the first unique answer, all answers
+ must be found and sorted. Both alternatives find duplicates by comparing
+ answers, that is, if
To return just a few answers cursors can be used. The following +
To return only a few answers, cursors can be used. The following code returns no more than five answers using an ETS table for storing the unique answers:
-
+
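A sketch of how such cursor code could look, assuming two query handles passed in as arguments; the module and function names are hypothetical. The cursor is deleted explicitly once the answers have been fetched.

```erlang
-module(qlc_cursor_sketch).
-export([first_unique/3]).
%% Required for the parse transform that turns the QLC into a query handle.
-include_lib("stdlib/include/qlc.hrl").

%% Returns at most N unique answers from the concatenation of QH1 and QH2.
first_unique(QH1, QH2, N) ->
    QH = qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true}),
    Cursor = qlc:cursor(QH),
    Answers = qlc:next_answers(Cursor, N),
    ok = qlc:delete_cursor(Cursor),
    Answers.
```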
- Query list comprehensions are convenient for stating - constraints on data from two or more tables. An example that +
QLCs are convenient for stating + constraints on data from two or more tables. The following example does a natural join on two query handles on position 2:
-
+
- The
The
The
Option
+
- In this case the filter will be applied to every possible pair - of answers to QH1 and QH2, one at a time. If there are M answers - to QH1 and N answers to QH2 the filter will be run M*N - times.
- -If QH2 is a call to the function for
In this case the filter is applied to every possible pair
+ of answers to
If
-
- or just
- -
-
- The effect of the
+
+
+ or only
+ +
+
+
+ The effect of option
Option
There is an option
Option
As an example of
+ how to use function
+
TF is the traversal function. The qlc module
requires that there is a way of traversing all objects of the
- data structure; in gb_trees there is an iterator function
- suitable for that purpose. Note that for each object returned a
+ data structure. gb_trees has an iterator function
+ suitable for that purpose. Notice that for each object returned, a
new fun is created. As long as the list is not terminated by
- [] it is assumed that the tail of the list is a nullary
+ [] , it is assumed that the tail of the list is a nullary
function and that calling the function returns further objects
(and functions).
The lookup function is optional. It is assumed that the lookup
function always finds values much faster than it would take to
traverse the table. The first argument is the position of the
- key. Since qlc_next returns the objects as
- {Key, Value} pairs the position is 1. Note that the lookup
- function should return {Key, Value} pairs, just as the
- traversal function does.
+ key. As qlc_next/1 returns the objects as {Key, Value}
+ pairs, the position is 1. Notice that the lookup function is to return
+ {Key, Value} pairs, as the traversal function does.
The format function is also optional. It is called by
- qlc:info to give feedback at runtime of how the query
- will be evaluated. One should try to give as good feedback as
- possible without showing too much details. In the example at
- most 7 objects of the table are shown. The format function
+ info/1,2
+ to give feedback at runtime of how the query
+ is to be evaluated. Try to give as good feedback as
+ possible without showing too many details. In the example, at
+ most seven objects of the table are shown. The format function
handles two cases: all means that all objects of the
- table will be traversed; {lookup, 1, KeyValues}
- means that the lookup function will be used for looking up key
+ table are traversed; {lookup, 1, KeyValues}
+ means that the lookup function is used for looking up key
values.
- Whether the whole table will be traversed or just some keys
- looked up depends on how the query is stated. If the query has
+
Whether the whole table is traversed or only some keys
+ looked up depends on how the query is expressed. If the query has
the form
-
+
+
- and P is a tuple, the qlc module analyzes P and F in
- compile time to find positions of the tuple P that are tested
+
and P is a tuple, the qlc module analyzes
+ P and F in
+ compile time to find positions of tuple P that are tested
for equality to constants. If such a position at runtime turns
out to be the key position, the lookup function can be used,
- otherwise all objects of the table have to be traversed. It is
- the info function InfoFun that returns the key position.
+ otherwise all objects of the table must be traversed.
+ The info function InfoFun returns the key position.
There can be indexed positions as well, also returned by the
info function. An index is an extra table that makes lookup on
- some position fast. Mnesia maintains indices upon request,
- thereby introducing so called secondary keys. The qlc
+ some position fast. Mnesia maintains indexes upon request,
+ and introduces so-called secondary keys. The qlc
module prefers to look up objects using the key before secondary
keys regardless of the number of constants to look up.
-
In Erlang there are two operators for testing term equality,
- namely Erlang/OTP has two operators for testing term equality: If the If the In the example the In the example, operator Looking up just Looking up only If the table uses Lookup join is handled analogously to lookup of constants in a
- table: if the join operator is Parse trees for Erlang expression, see the Parse trees for Erlang expression, see section Match specification, see the Match specification, see section Actually an integer > 1. An integer > 1. A literal
@@ -586,11 +613,11 @@ ets:table(53264,
3> lists:sort(qlc:e(Q2)).
[a,b,c]
-
See
Returns a query handle. When evaluating the query handle
-
Returns a query handle. When evaluating query handle
+
Returns a query handle. When evaluating the query handle
-
Returns a query handle. When evaluating query handle
+
Creates a query cursor and
makes the calling process the owner of the cursor. The
- cursor is to be used as argument to
Example:
1> QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),
QC = qlc:cursor(QH),
@@ -759,15 +786,15 @@ ok
Evaluates a query handle in the calling process and collects all answers in a list.
- +Example:
1> QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),
qlc:eval(QH).
@@ -786,11 +813,11 @@ ok
the query handle together with an extra argument
Example:
1> QH = [1,2,3,4,5,6], qlc:fold(fun(X, Sum) -> X + Sum end, 0, QH). @@ -818,30 +845,46 @@ ok
Returns information about a query handle. The information describes the simplifications and optimizations that are the results of preparing the - query for evaluation. This function is probably useful - mostly during debugging.
- + query for evaluation. This function is probably mainly useful + during debugging.The information has the form of an Erlang expression where QLCs most likely occur. Depending on the format functions of - mentioned QLC tables it may not be absolutely accurate.
- -The default is to return a sequence of QLCs in a block, but
- if the option
Options:
+The default is to return a sequence of QLCs in a block, but
+ if option
The default is to return a string, but if
+ option
The default is to return all elements in lists, but if
+ option
The default is to show all parts of
+ objects and match specifications,
+ but if option
Examples:
+In the following example two simple QLCs are inserted only to
+ hold option
1> QH = qlc:q([{X,Y} || X <- [x,y], Y <- [a,b]]),
io:format("~s~n", [qlc:info(QH, unique_all)]).
@@ -865,10 +908,11 @@ begin
],
[{unique,true}])
end
-
- In this example two simple QLCs have been inserted just to
- hold the
In the following example QLC
1> E1 = ets:new(e1, []),
E2 = ets:new(e2, []),
@@ -898,15 +942,6 @@ begin
[{X,Z}|{W,Y}] <- V2
])
end
-
- In this example the query list comprehension
Returns a query handle. When evaluating the query handle
-
Returns a query handle. When evaluating query handle
+
The sorter will use temporary files only if +
The sorter uses temporary files only if
Returns some or all of the remaining answers to a query
cursor. Only the owner of
The optional argument
Optional argument
Returns a query handle for a QLC. + The QLC must be the first argument to this function, otherwise + it is evaluated as an ordinary list comprehension. It is also + necessary to add the following line to the source code:
-include_lib("stdlib/include/qlc.hrl").
-
- to the source file. This causes a parse transform to - substitute a fun for the query list comprehension. The - (compiled) fun will be called when the query handle is - evaluated.
- -When calling
This causes a parse transform to substitute a fun for the QLC. The + (compiled) fun is called when the query handle is evaluated.
+When calling
To be very explicit, this will not work:
- +To be explicit, this does not work:
...
A = [X || {X} <- [{1},{2}]],
QH = qlc:q(A),
...
-
- The variable
Variable
The
The
The
The
The
Options:
+Option
Option
Option
Option
Options
Example:
+In the following example the cached results of the merge join are
+ traversed for each value of
1> Q = qlc:q([{A,X,Z,W} ||
A <- [a,b,c],
@@ -1076,29 +1104,31 @@ begin
X =:= Y
])
end
-
- In this example the cached results of the merge join are
- traversed for each value of
Sometimes (see
Example:
+In the following example, using the
1> T = gb_trees:empty(),
QH = qlc:q([X || {{X,Y},_} <- gb_table:table(T),
@@ -1119,39 +1149,41 @@ ets:match_spec_run(
end,
[{1,a},{1,b},{1,c},{2,a},{2,b},{2,c}]),
ets:match_spec_compile([{{{'$1','$2'},'_'},[],['$1']}]))
-
- In this example using the
The
The
Options:
+Option
Option
The evaluation of the query fails if the
Returns a query handle. When evaluating the query handle
-
Returns a query handle. When evaluating query handle
+
The sorter will use temporary files only if
+ marker="file_sorter#sort/3">
The sorter uses temporary files only if
A string version of
A string version of
Example:
1> L = [1,2,3],
Bs = erl_eval:add_binding('L', L, erl_eval:new_bindings()),
QH = qlc:string_to_handle("[X+1 || X <- L].", [], Bs),
qlc:eval(QH).
[2,3,4]
-
This function is probably useful mostly when called from - outside of Erlang, for instance from a driver written in C.
+This function is probably mainly useful when called from + outside of Erlang, for example from a driver written in C.
Returns a query handle for a QLC table.
+ In Erlang/OTP there is support for ETS, Dets, and
+ Mnesia tables, but many other data structures can be turned
+ into QLC tables. This is accomplished by letting function(s) in the
+ module implementing the data structure create a query handle by
+ calling
The callback function
The unary callback function
Callback function
Modules that can use match specifications for optimized
+ traversal of tables are to call
Other modules can provide a nullary
+
Unary callback function
Argument
The value of
The value of
Nullary callback function
+
The pre (post) functions for different tables are evaluated in + unspecified order.
+Other table access than reading, such as calling
+
The key position is obtained by calling
+
The unary callback function
Unary callback function
Unary callback function
No way of finding all possible answers by looking up keys
+ was found, but the filters could be transformed into a
+ match specification. All answers are found by calling
+
No optimization was found. A match specification matching
+ all objects is used if
If calling
See
For the various options recognized by
This module implements (double ended) FIFO queues +
This module provides (double-ended) FIFO queues in an efficient manner.
+All functions fail with reason
Some functions, where noted, fail with reason
The data representing a queue as used by this module - should be regarded as opaque by other modules. Any code + is to be regarded as opaque by other modules. Any code assuming knowledge of the format is running on thin ice.
All operations have an amortized O(1) running time, except
-
Queues are double ended. The mental picture of + +
Queues are double-ended. The mental picture of a queue is a line of people (items) waiting for their turn. The queue front is the end with the item that has waited the longest. The queue rear is the end an item enters when it starts to wait. If instead using the mental picture of a list, the front is called head and the rear is called tail.
+Entering at the front and exiting at the rear are reverse operations on the queue.
-The module has several sets of interface functions. The - "Original API", the "Extended API" and the "Okasaki API".
+ +This module has three sets of interface functions: the + "Original API", the "Extended API", and the "Okasaki API".
+The "Original API" and the "Extended API" both use the - mental picture of a waiting line of items. Both also + mental picture of a waiting line of items. Both have reverse operations suffixed "_r".
+The "Original API" item removal functions return compound terms with both the removed item and the resulting queue. - The "Extended API" contain alternative functions that build - less garbage as well as functions for just inspecting the + The "Extended API" contains alternative functions that build + less garbage and functions for just inspecting the queue ends. Also the "Okasaki API" functions build less garbage.
-The "Okasaki API" is inspired by "Purely Functional Data structures" + +
The "Okasaki API" is inspired by "Purely Functional Data Structures" by Chris Okasaki. It regards queues as lists. - The API is by many regarded as strange and avoidable. - For example many reverse operations have lexically reversed names, + This API is by many regarded as strange and avoidable. + For example, many reverse operations have lexically reversed names, some with more readable but perhaps less understandable aliases.
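The waiting-line picture can be illustrated with a small sketch that uses only functions from the "Original API" and the "Extended API". The module name is hypothetical.

```erlang
-module(queue_api_sketch).
-export([demo/0]).

demo() ->
    Q0 = queue:new(),
    Q1 = queue:in(a, Q0),                  % a enters at the rear
    Q2 = queue:in(b, Q1),                  % b enters behind a
    {{value, Front}, Q3} = queue:out(Q2),  % Original API: remove at the front
    {{value, Rear}, _} = queue:out_r(Q2),  % reverse operation: remove at the rear
    Peek = queue:get(Q2),                  % Extended API: inspect the front item
    {Front, Rear, queue:to_list(Q3), Peek}.
```

Because queues are functional, Q2 is unchanged by queue:out/1 and can be reused for the reverse operation.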
As returned by
As returned by
+
Returns an empty queue.
+Returns a queue
If
So,
Tests if
Returns a queue containing the items in
Tests if
Inserts
Calculates and returns the length of queue
Inserts
Inserts
Tests if
Inserts
Tests if
Removes the item at the front of queue
Returns a queue
Removes the item at the rear of the queue
Calculates and returns the length of queue
Returns a queue containing the items in
Returns
Returns a list of the items in the queue in the same order; - the front item of the queue will become the head of the list.
+Returns an empty queue.
Returns a queue
Removes the item at the front of queue
Splits
Removes the item at the rear of queue
Returns a queue
Returns a queue
Returns a queue
If
So,
Splits
Returns
Returns a list of the items in the queue in the same order; + the front item of the queue becomes the head of the list.
Returns
Fails with reason
Returns
Fails with reason
Returns a queue
Fails with reason
Returns a queue
Fails with reason
Returns
Fails with reason
Returns
Fails with reason
Returns the tuple
Returns tuple
Returns the tuple
Returns tuple
Inserts
Inserts
Returns the tail item of queue
Fails with reason
Returns
Returns
Fails with reason
Returns a queue
Fails with reason
Inserts
Returns a queue
Fails with reason
The name
Returns the tail item of queue
Fails with reason
Returns a queue
Fails with reason
The name
Inserts
Returns a queue
Fails with reason
Random number generator.
- -The module contains several different algorithms and can be
- extended with more in the future. The current uniform
- distribution algorithms uses the
-
The implemented algorithms are:
+This module provides a random number generator. The module contains
+ a number of algorithms. The uniform distribution algorithms use the
+
The following algorithms are provided:
Xorshift116+, 58 bits precision and a period of 2^116-1
+Xorshift64*, 64 bits precision and a period of 2^64-1
+Xorshift1024*, 64 bits precision and a period of 2^1024-1
+The current default algorithm is
The default algorithm is
Every time a random number is requested, a state is used to - calculate it and a new state produced. The state can either be - implicit or it can be an explicit argument and return value. -
+ calculate it and a new state is produced. The state can either be + implicit or be an explicit argument and return value. The functions with implicit state use the process dictionary
- variable
If a process calls
+
The functions with explicit state never use the process dictionary.
+ +Examples:
+ +Simple use; creates and seeds the default algorithm + with a non-constant seed if not already done:
+ ++R0 = rand:uniform(), +R1 = rand:uniform(),-
If a process calls
Use a specified algorithm:
-The functions with explicit state never use the process - dictionary.
++_ = rand:seed(exs1024), +R2 = rand:uniform(),+ +
Use a specified algorithm with a constant seed:
-Examples:
- %% Simple usage. Creates and seeds the default algorithm
- %% with a non-constant seed if not already done.
- R0 = rand:uniform(),
- R1 = rand:uniform(),
-
- %% Use a given algorithm.
- _ = rand:seed(exs1024),
- R2 = rand:uniform(),
-
- %% Use a given algorithm with a constant seed.
- _ = rand:seed(exs1024, {123, 123534, 345345}),
- R3 = rand:uniform(),
-
- %% Use the functional api with non-constant seed.
- S0 = rand:seed_s(exsplus),
- {R4, S1} = rand:uniform_s(S0),
-
- %% Create a standard normal deviate.
- {SND0, S2} = rand:normal_s(S1),
-
-
- This random number generator is not cryptographically
- strong. If a strong cryptographic random number generator is
- needed, use one of functions in the
-
Use the functional API with a non-constant seed:
+ +
+S0 = rand:seed_s(exsplus),
+{R4, S1} = rand:uniform_s(S0),
+
+ Create a standard normal deviate:
+ +
+{SND0, S2} = rand:normal_s(S1),
+
+ This random number generator is not cryptographically
+ strong. If a strong cryptographic random number generator is
+ needed, use one of the functions in the
+
Algorithm dependent state.
Algorithm-dependent state.
Algorithm dependent state which can be printed or saved to file.
Algorithm-dependent state that can be printed or saved to + file.
Seeds random number generation with the given algorithm and time dependent
- data if
Otherwise recreates the exported seed in the process
- dictionary, and returns the state.
- See also:
Returns the random number state in an external format.
+ To be used with
Seeds random number generation with the given algorithm and time dependent
- data if
Otherwise recreates the exported seed and returns the state.
- See also:
Returns the random number generator state in an external format.
+ To be used with
Seeds random number generation with the given algorithm and - integers in the process dictionary and returns - the state.
+Returns a standard normal deviate float (that is, the mean + is 0 and the standard deviation is 1) and updates the state in + the process dictionary.
Seeds random number generation with the given algorithm and - integers and returns the state.
+Returns, for a specified state, a standard normal + deviate float (that is, the mean is 0 and the standard + deviation is 1) and a new state.
Returns the random number state in an external format.
- To be used with
Seeds random number generation with the specified algorithm and
+ time-dependent data if
Otherwise recreates the exported seed in the process dictionary,
+ and returns the state. See also
+
Returns the random number generator state in an external format.
- To be used with
Seeds random number generation with the specified algorithm and + integers in the process dictionary and returns the state.
Returns a random float uniformly distributed in the value
- range
Seeds random number generation with the specified algorithm and
+ time-dependent data if
Otherwise recreates the exported seed and returns the state.
+ See also
Given a state,
Seeds random number generation with the specified algorithm and + integers and returns the state.
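The explicit-state functions make it possible to save a generator state and replay it later. The following is a minimal sketch, assuming a constant seed is acceptable; the module name and the seed integers are hypothetical.

```erlang
-module(rand_state_sketch).
-export([replay/0]).

replay() ->
    S0 = rand:seed_s(exsplus, {1, 2, 3}),  % explicit state, constant seed
    Exported = rand:export_seed_s(S0),     % external format, can be saved
    {R1, _S1} = rand:uniform_s(S0),
    S0Again = rand:seed_s(Exported),       % recreate the state from the export
    {R2, _} = rand:uniform_s(S0Again),
    {R1, R2}.                              % both floats come from the same state
```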
Given an integer
Returns a random float uniformly distributed in the value
+ range
Given an integer
Returns, for a specified integer
Returns a standard normal deviate float (that is, the mean - is 0 and the standard deviation is 1) and updates the state in - the process dictionary.
+Returns, for a specified state, random float
+ uniformly distributed in the value range
Given a state,
Returns, for a specified integer
Random number generator. The method is attributed to - B.A. Wichmann and I.D.Hill, in 'An efficient and portable +
This module provides a random number generator. The method is attributed + to B.A. Wichmann and I.D. Hill in 'An efficient and portable pseudo-random number generator', Journal of Applied - Statistics. AS183. 1982. Also Byte March 1987.
-The current algorithm is a modification of the version attributed - to Richard A O'Keefe in the standard Prolog library.
+ Statistics. AS183. 1982. Also Byte March 1987. + +The algorithm is a modification of the version attributed + to Richard A. O'Keefe in the standard Prolog library.
+Every time a random number is requested, a state is used to calculate
- it, and a new state produced. The state can either be implicit (kept
+ it, and a new state is produced. The state can either be implicit (kept
in the process dictionary) or be an explicit argument and return value.
In this implementation, the state (the type
It should be noted that this random number generator is not cryptographically
- strong. If a strong cryptographic random number generator is needed for
- example
The new and improved
This random number generator is not cryptographically
+ strong. If a strong cryptographic random number generator is
+ needed, use one of the functions in the
+
The improved
The state.
Seeds random number generation with default (fixed) values - in the process dictionary, and returns the old state.
+ in the process dictionary and returns the old state.Seeds random number generation with integer values in the process - dictionary, and returns the old state.
-One easy way of obtaining a unique value to seed with is to:
+ dictionary and returns the old state. +The following is an easy way of obtaining a unique value to seed + with:
random:seed(erlang:phash2([node()]),
erlang:monotonic_time(),
erlang:unique_integer())
- See
For details, see
+
-
Returns the default state.
Returns a random float uniformly distributed between
Given an integer
Returns, for a specified integer
Given a state,
Returns, for a specified state, a random float uniformly
distributed between
Given an integer
Returns, for a specified integer
Some of the functions use the process dictionary variable
If a process calls
The implementation changed in R15. Upgrading to R15 will break
- applications that expect a specific output for a given seed. The output
- is still deterministic number series, but different compared to releases
- older than R15. The seed
If a process calls
+
The implementation changed in Erlang/OTP R15. Upgrading to R15 breaks
+ applications that expect a specific output for a specified seed. The
+ output is still a deterministic number series, but different compared to
+ releases older than R15. Seed
This module contains regular expression matching functions for - strings and binaries.
+ strings and binaries.The
The library's matching algorithms are currently based on the - PCRE library, but not all of the PCRE library is interfaced and - some parts of the library go beyond what PCRE offers. The sections of - the PCRE documentation which are relevant to this module are included - here.
+The matching algorithms of the library are based on the + PCRE library, but not all of the PCRE library is interfaced and + some parts of the library go beyond what PCRE offers. The sections of + the PCRE documentation that are relevant to this module are included + here.
The Erlang literal syntax for strings uses the "\" - (backslash) character as an escape code. You need to escape - backslashes in literal strings, both in your code and in the shell, - with an additional backslash, i.e.: "\\".
+The Erlang literal syntax for strings uses the "\" + (backslash) character as an escape code. You need to escape + backslashes in literal strings, both in your code and in the shell, + with an extra backslash, that is, "\\".
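As a small illustration of the escaping rule, the sketch below matches a run of digits. The module name is hypothetical; the pattern written as "\\d+" in source code is the three-character regular expression \d+.

```erlang
-module(re_escape_sketch).
-export([digits/1]).

%% Returns {match, [Digits]} for the first run of digits, or nomatch.
digits(Subject) ->
    re:run(Subject, "\\d+", [{capture, first, list}]).
```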
Opaque datatype containing a compiled regular expression. - The mp() is guaranteed to be a tuple() having the atom - 're_pattern' as its first element, to allow for matching in - guards. The arity of the tuple() or the content of the other fields - may change in future releases.
+Opaque data type containing a compiled regular expression.
+
The same as
This function compiles a regular expression with the syntax - described below into an internal format to be used later as a - parameter to the run/2,3 functions.
-Compiling the regular expression before matching is useful if - the same expression is to be used in matching against multiple - subjects during the program's lifetime. Compiling once and - executing many times is far more efficient than compiling each - time one wants to match.
-When the unicode option is given, the regular expression should be given as a valid Unicode
By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless
When
Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang.
-Compiles a regular expression, with the syntax
+ described below, into an internal format to be used later as a
+ parameter to
+
Compiling the regular expression before matching is useful if + the same expression is to be used in matching against multiple + subjects during the lifetime of the program. Compiling once and + executing many times is far more efficient than compiling each + time one wants to match.
+When option
Options:
+The regular expression is specified as a Unicode
+
The pattern is forced to be "anchored", that is, it is + constrained to match only at the first matching point in the + string that is searched (the "subject string"). This effect can + also be achieved by appropriate constructs in the pattern + itself.
+Letters in the pattern match both uppercase and lowercase
+ letters. It is equivalent to Perl option
A dollar metacharacter in the pattern matches only at the end of
+ the subject string. Without this option, a dollar also matches
+ immediately before a newline at the end of the string (but not
+ before any other newlines). This option is ignored if option
+
A dot in the pattern matches all characters, including those
+ indicating newline. Without it, a dot does not match when the
+ current position is at a newline. This option is equivalent to
+ Perl option
Whitespace data characters in the pattern are ignored except
+ when escaped or inside a character class. Whitespace does not
+ include character 'vt' (ASCII 11). Characters between an
+ unescaped
With this option, comments inside complicated patterns can be
+ included. However, notice that this applies only to data
+ characters. Whitespace characters can never appear within special
+ character sequences in a pattern, for example within sequence
+
An unanchored pattern is required to match before or at the first + newline in the subject string, although the matched text can + continue over the newline.
+By default, PCRE treats the subject string as consisting of a
+ single line of characters (even if it contains newlines). The
+ "start of line" metacharacter (
When this option is specified, the "start of line" and "end of
+ line" constructs match immediately following or immediately
+ before internal newlines in the subject string, respectively, as
+ well as at the very start and end. This is equivalent to Perl
+ option
Disables the use of numbered capturing parentheses in the
+ pattern. Any opening parenthesis that is not followed by
Names used to identify capturing subpatterns need not be unique. + This can be helpful for certain types of pattern when it is known + that only one instance of the named subpattern can ever be + matched. More details of named subpatterns are provided below.
+Inverts the "greediness" of the quantifiers so that they are not
+ greedy by default, but become greedy if followed by "?". It is
+ not compatible with Perl. It can also be set by a
Overrides the default definition of a newline in the subject + string, which is LF (ASCII 10) in Erlang.
+Newline is indicated by a single character
Newline is indicated by a single character LF (ASCII 10), the + default.
+Newline is indicated by the two-character CRLF (ASCII 13 + followed by ASCII 10) sequence.
+Any of the three preceding sequences is to be recognized.
+Any of the newline sequences above, and the Unicode sequences + VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next + line, U+0085), LS (line separator, U+2028), and PS (paragraph + separator, U+2029).
+Specifies specifically that \R is to match only the CR, + LF, or CRLF sequences, not the Unicode-specific newline + characters.
+Specifies specifically that \R is to match all the Unicode + newline characters (including CRLF, and so on, the default).
+Disables optimization that can malfunction if "Special
+ start-of-pattern items" are present in the regular expression. A
+ typical example would be when matching "DEFABC" against
+ "(*COMMIT)ABC", where the start optimization of PCRE would skip
+ the subject up to "A" and never realize that the (*COMMIT)
+ instruction should have made the matching fail. This option is only
+ relevant if you use "start-of-pattern items", as discussed in
+ section
Specifies that Unicode character properties are to be used when + resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, + only ISO Latin-1 properties are used. Using Unicode properties + hurts performance, but is semantically correct when working with + Unicode characters beyond the ISO Latin-1 range.
+Specifies that the (*UTF) and/or (*UTF8) "start-of-pattern
+ items" are forbidden. This flag cannot be combined with option
+
This function takes a compiled regular expression and an item, returning the relevant data from the regular expression. Currently the only supported item is
Example:
-
+ Takes a compiled regular expression and an item, and returns the
+ relevant data from the regular expression. The only
+ supported item is namelist , which returns the tuple
+ {namelist, [binary()]} , containing the names of all (unique)
+ named subpatterns in the regular expression. For example:
+
1> {ok,MP} = re:compile("(?<A>A)|(?<B>B)|(?<C>C)").
{ok,{re_pattern,3,0,0,
<<69,82,67,80,119,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255,
@@ -181,8 +304,15 @@ This option makes it possible to include comments inside complicated patterns. N
255,255,...>>}}
4> re:inspect(MPD,namelist).
{namelist,[<<"B">>,<<"C">>]}
- Note specifically in the second example that the duplicate name only occurs once in the returned list, and that the list is in alphabetical order regardless of where the names are positioned in the regular expression. The order of the names is the same as the order of captured subexpressions if {capture, all_names} is given as an option to re:run/3 . You can therefore create a name-to-value mapping from the result of re:run/3 like this:
-
+ Notice in the second example that the duplicate name only occurs
+ once in the returned list, and that the list is in alphabetical order
+ regardless of where the names are positioned in the regular
+ expression. The order of the names is the same as the order of
+ captured subexpressions if {capture, all_names} is specified as
+ an option to run/3 .
+ You can therefore create a name-to-value mapping from the result of
+ run/3 like this:
+
1> {ok,MP} = re:compile("(?<A>A)|(?<B>B)|(?<C>C)").
{ok,{re_pattern,3,0,0,
<<69,82,67,80,119,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255,
@@ -193,249 +323,318 @@ This option makes it possible to include comments inside complicated patterns. N
{match,[<<"A">>,<<>>,<<>>]}
4> NameMap = lists:zip(N,L).
[{<<"A">>,<<"A">>},{<<"B">>,<<>>},{<<"C">>,<<>>}]
- More items are expected to be added in the future.
+ Same as
Replaces the matched part of the
The permissible options are the same as for
+
As in function
The replacement string can contain the special character
+
To insert an & or a \ in the result, precede it
+ with a \. Notice that Erlang already gives a special meaning to
+ \ in literal strings, so a single \ must be written as
+
Example:
+
+re:replace("abcd","c","[&]",[{return,list}]).
+ gives
+
+"ab[c]d"
+ while
+
+re:replace("abcd","c","[\\&]",[{return,list}]).
+ gives
+
+"ab[&]d"
+ As with
The same as
Same as
Executes a regexp matching, returning
When compilation is involved, the exception
If the regular expression is previously compiled, the option
- list can only contain the options
If the regular expression was previously compiled with the
- option
The
If the capture options describe that no substring capturing
- at all is to be done (
The
The options relevant for execution are:
- -Implements global (repetitive) search (the
The interaction of the global option with a regular
- expression which matches an empty string surprises some users.
- When the global option is given,
re:run("cat","(|at)",[global]).
-
- The following matching will be performed:
-The result of the call is:
- - {match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]}
-An empty string is not considered to be a valid match if this - option is given. If there are alternatives in the pattern, they - are tried. If all the alternatives match the empty string, the - entire match fails. For example, if the pattern
- a?b?
- is applied to a string not beginning with "a" or "b", it
- would normally match the empty string at the start of the
- subject. With the
This is like
Perl has no direct equivalent of
This option gives better control of the error handling in
The possible run-time errors are:
-It is important to understand that what is referred to as - "recursion" when limiting matches is not actually recursion on - the C stack of the Erlang machine, neither is it recursion on - the Erlang process stack. The version of PCRE compiled into the - Erlang VM uses machine "heap" memory to store values that needs to be - kept over recursion in regular expression matches.
-This option limits the execution time of a match in an - implementation-specific way. It is described in the following - way by the PCRE documentation:
- -
+ Executes a regular expression matching, and returns
+ match/{match, Captured } or nomatch . The
+ regular expression can be specified either as iodata() in
+ which case it is automatically compiled (as by compile/2 ) and
+ executed, or as a precompiled mp() in which case it is executed
+ against the subject directly.
+ When compilation is involved, exception badarg is thrown if a
+ compilation error occurs. Call compile/2 to get information
+ about the location of the error in the regular expression.
+ If the regular expression is previously compiled, the option list can
+ only contain the following options:
+
+ anchored
+ {capture, ValueSpec }/{capture,
+ ValueSpec , Type }
+ global
+ {match_limit, integer() >= 0}
+ {match_limit_recursion, integer() >= 0}
+ {newline, NLSpec }
+ notbol
+ notempty
+ notempty_atstart
+ noteol
+ {offset, integer() >= 0}
+ report_errors
+
+ Otherwise all options valid for function compile/2 are also
+ allowed. Options allowed both for compilation and execution of a
+ match, namely anchored and {newline,
+ NLSpec } , affect both the compilation and execution if
+ present together with a non-precompiled regular expression.
+ If the regular expression was previously compiled with option
+ unicode , Subject is to be provided as a
+ valid Unicode charlist() , otherwise any iodata() will
+ do. If compilation is involved and option unicode is specified,
+ both Subject and the regular expression are to be
+ specified as valid Unicode charlists() .
+ {capture, ValueSpec }/{capture,
+ ValueSpec , Type } defines what to return
+ from the function upon successful matching. The capture tuple
+ can contain both a value specification, telling which of the captured
+ substrings are to be returned, and a type specification, telling how
+ captured substrings are to be returned (as index tuples, lists, or
+ binaries). The options are described in detail below.
+ If the capture options describe that no substring capturing is to be
+ done ({capture, none} ), the function returns the single atom
+ match upon successful matching, otherwise the tuple
+ {match, ValueList } . Disabling capturing can be
+ done either by specifying none or an empty list as
+ ValueSpec .
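A minimal sketch of the two return shapes described above, assuming a throwaway module name (capture_none_demo is hypothetical; only re:run/3 and its options come from this documentation):

```erlang
-module(capture_none_demo).
-export([demo/0]).

%% Hypothetical demo module; each line asserts the result by pattern matching.
demo() ->
    %% Default capturing returns {match, ValueList}.
    {match, [{3,4}]} = re:run("ABCabcdABC", "abcd", []),
    %% With capturing disabled, only the atom match is returned.
    match = re:run("ABCabcdABC", "abcd", [{capture, none}]),
    %% An empty list as ValueSpec disables capturing in the same way.
    match = re:run("ABCabcdABC", "abcd", [{capture, []}]),
    %% A failed match is still reported as nomatch.
    nomatch = re:run("ABCabcdABC", "xyz", [{capture, none}]),
    ok.
```

Disabling capturing is convenient when only a yes/no answer is needed.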
+ Option report_errors adds the possibility that an error tuple
+ is returned. The tuple either indicates a matching error
+ (match_limit or match_limit_recursion ), or a compilation
+ error, where the error tuple has the format {error, {compile,
+ CompileErr }} . Notice that if option
+ report_errors is not specified, the function never returns
+ error tuples, but reports compilation errors as a badarg
+ exception and failed matches because of exceeded match limits simply
+ as nomatch .
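The difference in error handling can be sketched as follows. The module name and the deliberately broken pattern are illustrative assumptions, and the exact contents of the compile error term are left unmatched on purpose:

```erlang
-module(report_errors_demo).
-export([demo/0]).

%% Hypothetical demo module showing both error-reporting styles.
demo() ->
    Bad = "(",  % unbalanced parenthesis, so compilation fails
    %% Without report_errors, the compilation error surfaces as a badarg exception.
    badarg = try re:run("abc", Bad, []) catch error:badarg -> badarg end,
    %% With report_errors, the same failure is returned as an error tuple.
    {error, {compile, _CompileErr}} = re:run("abc", Bad, [report_errors]),
    ok.
```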
+ The following options are relevant for execution:
+
+ anchored
+ -
+
Limits run/3 to matching at the first matching
+ position. If a pattern was compiled with anchored , or
+ turned out to be anchored by virtue of its contents, it cannot
+ be made unanchored at matching time, hence there is no
+ unanchored option.
+ global
+ -
+
Implements global (repetitive) search (flag g in Perl).
+ Each match is returned as a separate list() containing the
+ specific match and any matching subexpressions (or as specified
+ by option capture ). The Captured part
+ of the return value is hence a list() of list() s
+ when this option is specified.
+ The interaction of option global with a regular
+ expression that matches an empty string surprises some users.
+ When option global is specified, run/3 handles
+ empty matches in the same way as Perl: a zero-length match at any
+ point is also retried with options [anchored,
+ notempty_atstart] . If that search gives a result of length
+ > 0, the result is included. Example:
+
+re:run("cat","(|at)",[global]).
+ The following matchings are performed:
+
+ At offset 0
+ -
+
The regular expression (|at) first matches at the
+ initial position of string cat , giving the result set
+ [{0,0},{0,0}] (the second {0,0} is because of
+ the subexpression marked by the parentheses). As the length
+ of the match is 0, we do not advance to the next position
+ yet.
+
+ At offset 0 with [anchored,
+ notempty_atstart]
+ -
+
The search is retried with options [anchored,
+ notempty_atstart] at the same position, which does not
+ give any interesting result of longer length, so the search
+ position is advanced to the next character (a ).
+
+ At offset 1
+ -
+
The search results in [{1,0},{1,0}] , so this search is
+ also repeated with the extra options.
+
+ At offset 1 with [anchored,
+ notempty_atstart]
+ -
+
Alternative at is found and the result is
+ [{1,2},{1,2}]. The result is added to the list of results and
+ the position in the search string is advanced two steps.
+
+ At offset 3
+ -
+
The search once again matches the empty string, giving
+ [{3,0},{3,0}] .
+
+ At offset 3 with [anchored,
+ notempty_atstart]
+ -
+
This gives no result of length > 0 and we are at the last
+ position, so the global search is complete.
+
+
+ The result of the call is:
+
+{match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]}
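For comparison, a simpler sketch of option global with a pattern that cannot match the empty string, so no retry step is involved. The module name and the subject string are assumptions made for illustration:

```erlang
-module(global_demo).
-export([demo/0]).

%% Hypothetical demo module; the subject "a1b22c333" is made up for this sketch.
demo() ->
    %% Each match is returned as its own list, so Captured is a list of lists.
    {match, [[{1,1}], [{3,2}], [{6,3}]]} =
        re:run("a1b22c333", "[0-9]+", [global]),
    %% The same search, returning only the complete match of each hit as a string.
    {match, [["1"], ["22"], ["333"]]} =
        re:run("a1b22c333", "[0-9]+", [global, {capture, first, list}]),
    ok.
```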
+
+ notempty
+ -
+
An empty string is not considered to be a valid match if this
+ option is specified. If alternatives in the pattern exist, they
+ are tried. If all the alternatives match the empty string, the
+ entire match fails.
+ Example:
+ If the following pattern is applied to a string not beginning
+ with "a" or "b", it would normally match the empty string at the
+ start of the subject:
+
+a?b?
+ With option notempty , this match is invalid, so
+ run/3 searches further into the string for occurrences of
+ "a" or "b".
+
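The effect of notempty on the pattern above can be sketched like this (the module name and the subject strings are illustrative assumptions):

```erlang
-module(notempty_demo).
-export([demo/0]).

%% Hypothetical demo module; the subjects "xyb" and "xyz" are made up.
demo() ->
    %% Without notempty, a?b? matches the empty string at offset 0.
    {match, [{0,0}]} = re:run("xyb", "a?b?", []),
    %% With notempty, the empty match is rejected and the search
    %% continues until it finds "b" at offset 2.
    {match, [{2,1}]} = re:run("xyb", "a?b?", [notempty]),
    %% If the subject contains neither "a" nor "b", nothing is left to match.
    nomatch = re:run("xyz", "a?b?", [notempty]),
    ok.
```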
+ notempty_atstart
+ -
+
Like notempty , except that an empty string match that is
+ not at the start of the subject is permitted. If the pattern is
+ anchored, such a match can occur only if the pattern contains
+ \K.
+ Perl has no direct equivalent of notempty or
+ notempty_atstart , but it does make a special case of a
+ pattern match of the empty string within its split() function,
+ and when using modifier /g . The Perl behavior can be
+ emulated after matching a null string by first trying the
+ match again at the same offset with notempty_atstart and
+ anchored , and then, if that fails, by advancing the
+ starting offset (see below) and trying an ordinary match
+ again.
+
+ notbol
+ -
+
Specifies that the first character of the subject string is not
+ the beginning of a line, so the circumflex metacharacter is not
+ to match before it. Setting this without multiline (at
+ compile time) causes circumflex never to match. This option only
+ affects the behavior of the circumflex metacharacter. It does not
+ affect \A.
+
+ noteol
+ -
+
Specifies that the end of the subject string is not the end of a
+ line, so the dollar metacharacter is not to match it nor (except
+ in multiline mode) a newline immediately before it. Setting this
+ without multiline (at compile time) causes dollar never to
+ match. This option affects only the behavior of the dollar
+ metacharacter. It does not affect \Z or \z.
+
+ report_errors
+ -
+
Gives better control of the error handling in run/3 . When
+ specified, compilation errors (if the regular expression is not
+ already compiled) and runtime errors are explicitly returned as
+ an error tuple.
+ The following are the possible runtime errors:
+
+ match_limit
+ -
+
The PCRE library sets a limit on how many times the internal
+ match function can be called. Defaults to 10,000,000 in the
+ library compiled for Erlang. If {error, match_limit}
+ is returned, the execution of the regular expression has
+ reached this limit. This is normally to be regarded as a
+ nomatch , which is the default return value when this
+ occurs, but by specifying report_errors , you are
+ informed when the match fails because of too many internal
+ calls.
+
+ match_limit_recursion
+ -
+
This error is very similar to match_limit , but occurs
+ when the internal match function of PCRE is "recursively"
+ called more times than the match_limit_recursion limit,
+ which defaults to 10,000,000 as well. Notice that as long as
+ the match_limit
+ and match_limit_recursion values are
+ kept at the default values, the match_limit_recursion
+ error cannot occur, as the match_limit error occurs
+ before that (each recursive call is also a call, but not
+ conversely). Both limits can however be changed, either by
+ setting limits directly in the regular expression string (see
+ section PCRE Regular
+ Expression Details ) or by specifying options to
+ run/3 .
+
+
+ It is important to understand that what is referred to as
+ "recursion" when limiting matches is not recursion on the C stack
+ of the Erlang machine or on the Erlang process stack. The PCRE
+ version compiled into the Erlang VM uses machine "heap" memory to
+ store values that must be kept over recursion in regular
+ expression matches.
+
+ {match_limit, integer() >= 0}
+ -
+
Limits the execution time of a match in an
+ implementation-specific way. It is described as follows by the
+ PCRE documentation:
+
The match_limit field provides a means of preventing PCRE from using
up a vast amount of resources when running patterns that are not going
to match, but which have a very large number of possibilities in their
@@ -448,26 +647,22 @@ imposed on the number of times this function is called during a match,
which has the effect of limiting the amount of backtracking that can
take place. For patterns that are not anchored, the count restarts
from zero for each position in the subject string.
-
- This means that runaway regular expression matches can fail
- faster if the limit is lowered using this option. The default
- value compiled into the Erlang virtual machine is 10000000
-
- This option does in no way affect the execution of the
- Erlang virtual machine in terms of "long running
- BIF's". re:run always give control back to the scheduler
- of Erlang processes at intervals that ensures the real time
- properties of the Erlang system.
-
-
- {match_limit_recursion, integer() >= 0}
-
- This option limits the execution time and memory
- consumption of a match in an implementation-specific way, very
- similar to match_limit . It is described in the following
- way by the PCRE documentation:
-
-
+ This means that runaway regular expression matches can fail
+ faster if the limit is lowered using this option. The default
+ value 10,000,000 is compiled into the Erlang VM.
+
+ This option in no way affects the execution of the Erlang
+ VM in terms of "long running BIFs". run/3 always gives
+ control back to the scheduler of Erlang processes at intervals
+ that ensure the real-time properties of the Erlang system.
+
+
+ {match_limit_recursion, integer() >= 0}
+ -
+
Limits the execution time and memory consumption of a match in an
+ implementation-specific way, very similar to match_limit .
+ It is described as follows by the PCRE documentation:
+
The match_limit_recursion field is similar to match_limit, but instead
of limiting the total number of times that match() is called, it
limits the depth of recursion. The recursion depth is a smaller number
@@ -477,3273 +672,3535 @@ match_limit.
Limiting the recursion depth limits the amount of machine stack that
can be used, or, when PCRE has been compiled to use memory on the heap
-instead of the stack, the amount of heap memory that can be
-used.
-
- The Erlang virtual machine uses a PCRE library where heap
- memory is used when regular expression match recursion happens,
- why this limits the usage of machine heap, not C stack.
-
- Specifying a lower value may result in matches with deep recursion failing, when they should actually have matched:
-
+instead of the stack, the amount of heap memory that can be used.
+ The Erlang VM uses a PCRE library where heap memory is used when
+ regular expression match recursion occurs. This therefore limits
+ the use of machine heap, not C stack.
+ Specifying a lower value can result in matches with deep
+ recursion failing, when they should have matched:
+
1> re:run("aaaaaaaaaaaaaz","(a+)*z").
{match,[{0,14},{0,13}]}
2> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5}]).
nomatch
3> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5},report_errors]).
{error,match_limit_recursion}
-
- This option, as well as the match_limit option should
- only be used in very rare cases. Understanding of the PCRE
- library internals is recommended before tampering with these
- limits.
-
-
- {offset, integer() >= 0}
-
- - Start matching at the offset (position) given in the
- subject string. The offset is zero-based, so that the default is
-
{offset,0} (all of the subject string).
-
- {newline, NLSpec }
- -
-
Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang.
-
- cr
- - Newline is indicated by a single character CR (ASCII 13)
- lf
- - Newline is indicated by a single character LF (ASCII 10), the default
- crlf
- - Newline is indicated by the two-character CRLF (ASCII 13 followed by ASCII 10) sequence.
- anycrlf
- - Any of the three preceding sequences should be recognized.
- any
- - Any of the newline sequences above, plus the Unicode sequences VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).
-
-
- bsr_anycrlf
- - Specifies specifically that \R is to match only the cr, lf or crlf sequences, not the Unicode specific newline characters. (overrides compilation option)
- bsr_unicode
- - Specifies specifically that \R is to match all the Unicode newline characters (including crlf etc, the default).(overrides compilation option)
-
- {capture, ValueSpec } /{capture, ValueSpec , Type }
- -
-
-
Specifies which captured substrings are returned and in what
- format. By default,
- re:run/3 captures all of the matching part of the
- substring as well as all capturing subpatterns (all of the
- pattern is automatically captured). The default return type is
- (zero-based) indexes of the captured parts of the string, given as
- {Offset,Length} pairs (the index Type of
- capturing).
-
- As an example of the default behavior, the following call:
-
- re:run("ABCabcdABC","abcd",[]).
-
- returns, as first and only captured string the matching part of the subject ("abcd" in the middle) as a index pair {3,4} , where character positions are zero based, just as in offsets. The return value of the call above would then be:
- {match,[{3,4}]}
- Another (and quite common) case is where the regular expression matches all of the subject, as in:
- re:run("ABCabcdABC",".*abcd.*",[]).
- where the return value correspondingly will point out all of the string, beginning at index 0 and being 10 characters long:
- {match,[{0,10}]}
-
- If the regular expression contains capturing subpatterns,
- like in the following case:
-
- re:run("ABCabcdABC",".*(abcd).*",[]).
-
- all of the matched subject is captured, as
- well as the captured substrings:
-
- {match,[{0,10},{3,4}]}
-
- the complete matching pattern always giving the first return value in the
- list and the rest of the subpatterns being added in the
- order they occurred in the regular expression.
-
- The capture tuple is built up as follows:
-
- ValueSpec
- Specifies which captured (sub)patterns are to be returned. The ValueSpec can either be an atom describing a predefined set of return values, or a list containing either the indexes or the names of specific subpatterns to return.
- The predefined sets of subpatterns are:
-
- all
- - All captured subpatterns including the complete matching string. This is the default.
- all_names
- - All named subpatterns in the regular expression, as if a
list()
- of all the names in alphabetical order was given. The list of all names can also be retrieved with the inspect/2 function.
- first
- - Only the first captured subpattern, which is always the complete matching part of the subject. All explicitly captured subpatterns are discarded.
- all_but_first
- - All but the first matching subpattern, i.e. all explicitly captured subpatterns, but not the complete matching part of the subject string. This is useful if the regular expression as a whole matches a large part of the subject, but the part you're interested in is in an explicitly captured subpattern. If the return type is
list or binary , not returning subpatterns you're not interested in is a good way to optimize.
- none
- - Do not return matching subpatterns at all, yielding the single atom
match as the return value of the function when matching successfully instead of the {match, list()} return. Specifying an empty list gives the same behavior.
-
- The value list is a list of indexes for the subpatterns to return, where index 0 is for all of the pattern, and 1 is for the first explicit capturing subpattern in the regular expression, and so forth. When using named captured subpatterns (see below) in the regular expression, one can use atom() s or string() s to specify the subpatterns to be returned. For example, consider the regular expression:
- ".*(abcd).*"
- matched against the string "ABCabcdABC", capturing only the "abcd" part (the first explicit subpattern):
- re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]).
- The call will yield the following result:
- {match,[{3,4}]}
- as the first explicitly captured subpattern is "(abcd)", matching "abcd" in the subject, at (zero-based) position 3, of length 4.
- Now consider the same regular expression, but with the subpattern explicitly named 'FOO':
- ".*(?<FOO>abcd).*"
- With this expression, we could still give the index of the subpattern with the following call:
- re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,[1]}]).
- giving the same result as before. But, since the subpattern is named, we can also specify its name in the value list:
- re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,['FOO']}]).
- which would yield the same result as the earlier examples, namely:
- {match,[{3,4}]}
-
- The values list might specify indexes or names not present in
- the regular expression, in which case the return values vary
- depending on the type. If the type is index , the tuple
- {-1,0} is returned for values having no corresponding
- subpattern in the regexp, but for the other types
- (binary and list ), the values are the empty binary
- or list respectively.
-
-
- Type
- Optionally specifies how captured substrings are to be returned. If omitted, the default of index is used. The Type can be one of the following:
-
- index
- - Return captured substrings as pairs of byte indexes into the subject string and length of the matching string in the subject (as if the subject string was flattened with
iolist_to_binary/1 or unicode:characters_to_binary/2 prior to matching). Note that the unicode option results in byte-oriented indexes in a (possibly virtual) UTF-8 encoded binary. A byte index tuple {0,2} might therefore represent one or two characters when unicode is in effect. This might seem counter-intuitive, but has been deemed the most effective and useful way to way to do it. To return lists instead might result in simpler code if that is desired. This return type is the default.
- list
- - Return matching substrings as lists of characters (Erlang
string() s). It the unicode option is used in combination with the \C sequence in the regular expression, a captured subpattern can contain bytes that are not valid UTF-8 (\C matches bytes regardless of character encoding). In that case the list capturing may result in the same types of tuples that unicode:characters_to_list/2 can return, namely three-tuples with the tag incomplete or error , the successfully converted characters and the invalid UTF-8 tail of the conversion as a binary. The best strategy is to avoid using the \C sequence when capturing lists.
- binary
- - Return matching substrings as binaries. If the
unicode option is used, these binaries are in UTF-8. If the \C sequence is used together with unicode the binaries may be invalid UTF-8.
+ This option and option match_limit are only to be used in
+ rare cases. Understanding of the PCRE library internals is
+ recommended before tampering with these limits.
+
+ {offset, integer() >= 0}
+ -
+
Start matching at the offset (position) specified in the
+ subject string. The offset is zero-based, so that the default is
+ {offset,0} (all of the subject string).
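A small sketch of option offset, assuming a hypothetical module name. Notice that the returned index pairs are still relative to the start of the subject string, not to the offset:

```erlang
-module(offset_demo).
-export([demo/0]).

%% Hypothetical demo module; the subject is reused from the capture examples.
demo() ->
    %% Default offset 0: the first "ABC" is found at the start of the subject.
    {match, [{0,3}]} = re:run("ABCabcdABC", "ABC", []),
    %% Starting at offset 1 skips the first occurrence and finds the second.
    {match, [{7,3}]} = re:run("ABCabcdABC", "ABC", [{offset, 1}]),
    %% From offset 8 only "BC" remains, so the match fails.
    nomatch = re:run("ABCabcdABC", "ABC", [{offset, 8}]),
    ok.
```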
+
+ {newline, NLSpec }
+ -
+
Overrides the default definition of a newline in the subject
+ string, which is LF (ASCII 10) in Erlang.
+
+ cr
+ -
+
Newline is indicated by a single character CR (ASCII 13).
+
+ lf
+ -
+
Newline is indicated by a single character LF (ASCII 10),
+ the default.
+
+ crlf
+ -
+
Newline is indicated by the two-character CRLF (ASCII 13
+ followed by ASCII 10) sequence.
+
+ anycrlf
+ -
+
Any of the three preceding sequences is to be recognized.
+
+ any
+ -
+
Any of the newline sequences above, and the Unicode
+ sequences VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL
+ (next line, U+0085), LS (line separator, U+2028), and PS
+ (paragraph separator, U+2029).
+
+
+
+ bsr_anycrlf
+ -
+
Specifies specifically that \R is to match only the CR,
+ LF, or CRLF sequences, not the Unicode-specific newline
+ characters. (Overrides the compilation option.)
+
+ bsr_unicode
+ -
+
Specifies specifically that \R is to match all the Unicode
+ newline characters (including CRLF, and so on, the default).
+ (Overrides the compilation option.)
+
+ {capture, ValueSpec } /{capture,
+ ValueSpec , Type }
+ -
+
Specifies which captured substrings are returned and in what
+ format. By default, run/3 captures all of the matching
+ part of the subject string and all capturing subpatterns (all of the
+ pattern is automatically captured). The default return type is
+ (zero-based) indexes of the captured parts of the string,
+ specified as {Offset,Length} pairs (the index
+ Type of capturing).
+ As an example of the default behavior, the following call
+ returns, as first and only captured string, the matching part of
+ the subject ("abcd" in the middle) as an index pair {3,4} ,
+ where character positions are zero-based, just as in offsets:
+
+re:run("ABCabcdABC","abcd",[]).
+ The return value of this call is:
+
+{match,[{3,4}]}
+ Another (and quite common) case is where the regular expression
+ matches all of the subject:
+
+re:run("ABCabcdABC",".*abcd.*",[]).
+ Here the return value correspondingly points out all of the
+ string, beginning at index 0, and it is 10 characters long:
+
+{match,[{0,10}]}
+ If the regular expression contains capturing subpatterns, like
+ in:
+
+re:run("ABCabcdABC",".*(abcd).*",[]).
+ all of the matched subject is captured, as well as the captured
+ substrings:
+
+{match,[{0,10},{3,4}]}
+ The complete matching pattern always gives the first return
+ value in the list and the remaining subpatterns are added in the
+ order they occurred in the regular expression.
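The index pairs above can be turned back into text by slicing the subject. The following is a minimal sketch, assuming the subject is passed as a binary so that binary:part/3 can be applied directly; the variable names are illustrative only:

```erlang
%% Subject as a binary, so that binary:part/3 can slice it directly.
Subject = <<"ABCabcdABC">>.
%% Default capture: the complete match first, then each subpattern,
%% each as a zero-based {Offset,Length} pair.
{match, [{0,10}, {Start,Len}]} = re:run(Subject, ".*(abcd).*", []).
%% The pair for the first explicit subpattern selects "abcd".
<<"abcd">> = binary:part(Subject, Start, Len).
```

As the offsets are zero-based, they can be passed to binary:part/3 without adjustment.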
+ The capture tuple is built up as follows:
+
+ ValueSpec
+ -
+
Specifies which captured (sub)patterns are to be returned.
+ ValueSpec can either be an atom describing
+ a predefined set of return values, or a list containing the
+ indexes or the names of specific subpatterns to return.
+ The following are the predefined sets of subpatterns:
+
+ all
+ -
+
All captured subpatterns including the complete matching
+ string. This is the default.
+
+ all_names
+ -
+
All named subpatterns in the regular expression,
+ as if a list() of all the names in
+ alphabetical order was specified. The list of all
+ names can also be retrieved with
+
+ inspect/2 .
+
+ first
+ -
+
Only the first captured subpattern, which is always the
+ complete matching part of the subject. All explicitly
+ captured subpatterns are discarded.
+
+ all_but_first
+ -
+
All but the first matching subpattern, that is, all
+ explicitly captured subpatterns, but not the complete
+ matching part of the subject string. This is useful if
+ the regular expression as a whole matches a large part of
+ the subject, but the part you are interested in is in an
+ explicitly captured subpattern. If the return type is
+ list or binary , not returning subpatterns
+ you are not interested in is a good way to optimize.
+
+ none
+ -
+
Returns no matching subpatterns, gives the single
+ atom match as the return value of the function
+ when matching successfully instead of the {match,
+ list()} return. Specifying an empty list gives the
+ same behavior.
+
+
+ The value list is a list of indexes for the subpatterns to
+ return, where index 0 is for all of the pattern, and 1 is for
+ the first explicit capturing subpattern in the regular
+ expression, and so on. When using named captured subpatterns
+ (see below) in the regular expression, one can use
+ atom() s or string() s to specify the subpatterns
+ to be returned. For example, consider the regular
+ expression:
+
+".*(abcd).*"
+ matched against string "ABCabcdABC", capturing only the
+ "abcd" part (the first explicit subpattern):
+
+re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]).
+ The call gives the following result, as the first explicitly
+ captured subpattern is "(abcd)", matching "abcd" in the
+ subject, at (zero-based) position 3, of length 4:
+
+{match,[{3,4}]}
+ Consider the same regular expression, but with the subpattern
+ explicitly named 'FOO':
+
+".*(?<FOO>abcd).*"
+ With this expression, we could still give the index of the
+ subpattern with the following call:
+
+re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,[1]}]).
+ giving the same result as before. But, as the subpattern is
+ named, we can also specify its name in the value list:
+
+re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,['FOO']}]).
+ This would give the same result as the earlier examples,
+ namely:
+
+{match,[{3,4}]}
+ The values list can specify indexes or names not present in
+ the regular expression, in which case the return values vary
+ depending on the type. If the type is index , the tuple
+ {-1,0} is returned for values with no corresponding
+ subpattern in the regular expression, but for the other types
+ (binary and list ), the values are the empty
+ binary or list, respectively.
+
+ Type
+ -
+
Optionally specifies how captured substrings are to be
+ returned. If omitted, the default of index is used.
+ Type can be one of the following:
+
+ index
+ -
+
Returns captured substrings as pairs of byte indexes
+ into the subject string and length of the matching string
+ in the subject (as if the subject string was flattened
+ with
+ erlang:iolist_to_binary/1 or
+
+ unicode:characters_to_binary/2 before
+ matching). Notice that option unicode results in
+ byte-oriented indexes in a (possibly virtual)
+ UTF-8 encoded binary. A byte index tuple
+ {0,2} can therefore represent one or two
+ characters when unicode is in effect. This can seem
+ counter-intuitive, but has been deemed the most effective
+ and useful way to do it. Returning lists instead can
+ result in simpler code, if that is desired. This return
+ type is the default.
+
+ list
+ -
+
Returns matching substrings as lists of characters
+ (Erlang string() s). If option unicode is
+ used in combination with the \C sequence in the
+ regular expression, a captured subpattern can contain
+ bytes that are not valid UTF-8 (\C matches bytes
+ regardless of character encoding). In that case the
+ list capturing can result in the same types of
+ tuples that
+
+ unicode:characters_to_list/2 can return,
+ namely three-tuples with tag incomplete or
+ error , the successfully converted characters and
+ the invalid UTF-8 tail of the conversion as a binary. The
+ best strategy is to avoid using the \C sequence
+ when capturing lists.
+
+ binary
+ -
+
Returns matching substrings as binaries. If option
+ unicode is used, these binaries are in UTF-8. If
+ the \C sequence is used together with
+ unicode , the binaries can be invalid UTF-8.
+
+
+
+
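The byte-oriented nature of the index type can be seen with a subject that contains a character outside the ASCII range. This is a small sketch, assuming the subject is given as a list of code points; 229 is the code point of "å", which occupies two bytes in UTF-8:

```erlang
%% [229,$b] is the two-character string "åb".
Subject = [229, $b].
%% With unicode, the offset of "b" is counted in bytes of the
%% (virtual) UTF-8 encoded binary: "å" takes two bytes.
{match, [{2,1}]} = re:run(Subject, "b", [unicode]).
%% Without unicode, the list is treated as plain bytes.
{match, [{1,1}]} = re:run(Subject, "b", []).
%% Returning lists avoids the byte arithmetic altogether.
{match, ["b"]} = re:run(Subject, "b", [unicode, {capture, all, list}]).
```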
+ In general, subpatterns that were not assigned a value in the
+ match are returned as the tuple {-1,0} when type is
+ index . Unassigned subpatterns are returned as the empty
+ binary or list, respectively, for other return types. Consider
+ the following regular expression:
+
+".*((?<FOO>abdd)|a(..d)).*"
+ There are three explicitly capturing subpatterns, where the
+ opening parenthesis position determines the order in the result,
+ hence ((?<FOO>abdd)|a(..d)) is subpattern index 1,
+ (?<FOO>abdd) is subpattern index 2, and (..d)
+ is subpattern index 3. When matched against the following
+ string:
+
+"ABCabcdABC"
+ the subpattern at index 2 does not match, as "abdd" is not
+ present in the string, but the complete pattern matches (because
+ of the alternative a(..d) ). The subpattern at index 2 is
+ therefore unassigned and the default return value is:
+
+{match,[{0,10},{3,4},{-1,0},{4,3}]}
+ Setting the capture Type to binary
+ gives:
+
+{match,[<<"ABCabcdABC">>,<<"abcd">>,<<>>,<<"bcd">>]}
+ Here the empty binary (<<>> ) represents the
+ unassigned subpattern. In the binary case, some information
+ about the matching is therefore lost, as
+ <<>> can
+ also be a captured empty string.
+ If differentiation between empty matches and non-existing
+ subpatterns is necessary, use the type index and do
+ the conversion to the final type in Erlang code.
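One possible way to do that conversion is sketched here. The fun ToBinary and the choice of the atom undefined as the marker for unassigned subpatterns are assumptions made for this illustration, not part of the re module:

```erlang
Subject = <<"ABCabcdABC">>.
RE = ".*((?<FOO>abdd)|a(..d)).*".
%% Hypothetical helper: map the {-1,0} marker to 'undefined' and
%% slice every other index pair out of the subject.
ToBinary = fun({-1, 0}) -> undefined;
              ({Start, Len}) -> binary:part(Subject, Start, Len)
           end.
{match, Pairs} = re:run(Subject, RE, [{capture, all, index}]).
%% The unassigned subpattern (index 2) is now distinguishable from
%% a subpattern that matched the empty string.
[<<"ABCabcdABC">>, <<"abcd">>, undefined, <<"bcd">>] =
    lists:map(ToBinary, Pairs).
```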
+ When option global is specified, the capture
+ specification affects each match separately, so that:
+
+re:run("cacb","c(a|b)",[global,{capture,[1],list}]).
+ gives
+
+{match,[["a"],["b"]]}
+
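For comparison, the same global match can be run with the default ValueSpec. In this sketch each match contributes its own list, holding the complete match followed by the explicit subpattern:

```erlang
%% One inner list per match; each holds the whole match and group 1.
{match, [["ca","a"], ["cb","b"]]} =
    re:run("cacb", "c(a|b)", [global, {capture, all, list}]).
%% Restricting the ValueSpec to [1] keeps only the subpattern.
{match, [["a"], ["b"]]} =
    re:run("cacb", "c(a|b)", [global, {capture, [1], list}]).
```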
-
-
- In general, subpatterns that were not assigned a value in the match are returned as the tuple {-1,0} when type is index . Unassigned subpatterns are returned as the empty binary or list, respectively, for other return types. Consider the regular expression:
- ".*((?<FOO>abdd)|a(..d)).*"
- There are three explicitly capturing subpatterns, where the opening parenthesis position determines the order in the result, hence ((?<FOO>abdd)|a(..d)) is subpattern index 1, (?<FOO>abdd) is subpattern index 2 and (..d) is subpattern index 3. When matched against the following string:
- "ABCabcdABC"
- the subpattern at index 2 won't match, as "abdd" is not present in the string, but the complete pattern matches (due to the alternative a(..d) . The subpattern at index 2 is therefore unassigned and the default return value will be:
- {match,[{0,10},{3,4},{-1,0},{4,3}]}
- Setting the capture Type to binary would give the following:
- {match,[<<"ABCabcdABC">>,<<"abcd">>,<<>>,<<"bcd">>]}
- where the empty binary (<<>> ) represents the unassigned subpattern. In the binary case, some information about the matching is therefore lost, the <<>> might just as well be an empty string captured.
- If differentiation between empty matches and non existing subpatterns is necessary, use the type index
- and do the conversion to the final type in Erlang code.
-
- When the option global is given, the capture
- specification affects each match separately, so that:
-
- re:run("cacb","c(a|b)",[global,{capture,[1],list}]).
-
- gives the result:
-
- {match,[["a"],["b"]]}
-
- The options solely affecting the compilation step are described in the
The same as
Replaces the matched part of the
The permissible options are the same as for
As in the
The replacement string can contain the special character
-
To insert an
re:replace("abcd","c","[&]",[{return,list}]).
- gives
- "ab[c]d"
- while
- re:replace("abcd","c","[\\&]",[{return,list}]).
- gives
- "ab[&]d"
- As with
For a description of the options that only affect the compilation step,
+ see
The same as
Same as
This function splits the input into parts by finding tokens - according to the regular expression supplied.
- -The splitting is done basically by running a global regexp match and - dividing the initial string wherever a match occurs. The matching part - of the string is removed from the output.
- -As in the
The result is given as a list of "strings", the
- preferred datatype given in the
If subexpressions are given in the regular expression, the - matching subexpressions are returned in the resulting list as - well. An example:
- - re:split("Erlang","[ln]",[{return,list}]).
-
- will yield the result:
- - ["Er","a","g"]
-
- while
- - re:split("Erlang","([ln])",[{return,list}]).
-
- will yield
- - ["Er","l","a","n","g"]
-
- The text matching the subexpression (marked by the parentheses - in the regexp) is - inserted in the result list where it was found. In effect this means - that concatenating the result of a split where the whole regexp is a - single subexpression (as in the example above) will always result in - the original string.
- -As there is no matching subexpression for the last part in
- the example (the "g"), there is nothing inserted after
- that. To make the group of strings and the parts matching the
- subexpressions more obvious, one might use the
re:split("Erlang","([ln])",[{return,list},group]).
-
- gives:
- - [["Er","l"],["a","n"],["g"]]
-
- Here the regular expression matched first the "l", - causing "Er" to be the first part in the result. When - the regular expression matched, the (only) subexpression was - bound to the "l", so the "l" is inserted - in the group together with "Er". The next match is of - the "n", making "a" the next part to be - returned. Since the subexpression is bound to the substring - "n" in this case, the "n" is inserted into - this group. The last group consists of the rest of the string, - as no more matches are found.
- - -By default, all parts of the string, including the empty - strings, are returned from the function. For example:
- - re:split("Erlang","[lg]",[{return,list}]).
-
- will return:
- - ["Er","an",[]]
-
- since the matching of the "g" in the end of the string
- leaves an empty rest which is also returned. This behaviour
- differs from the default behaviour of the split function in
- Perl, where empty strings at the end are by default removed. To
- get the
- "trimming" default behavior of Perl, specify
-
re:split("Erlang","[lg]",[{return,list},trim]).
-
- The result will be:
- - ["Er","an"]
-
- The "trim" option in effect says; "give me as
- many parts as possible except the empty ones", which might
- be useful in some circumstances. You can also specify how many
- parts you want, by specifying
re:split("Erlang","[lg]",[{return,list},{parts,2}]).
-
- This will give:
- - ["Er","ang"]
-
- Note that the last part is "ang", not
- "an", as we only specified splitting into two parts,
- and the splitting stops when enough parts are given, which is
- why the result differs from that of
More than three parts are not possible with this indata, so
- - re:split("Erlang","[lg]",[{return,list},{parts,4}]).
-
- will give the same result as the default, which is to be - viewed as "an infinite number of parts".
- -Specifying
Splits the input into parts by finding tokens according to the + regular expression supplied. The splitting is basically done by + running a global regular expression match and dividing the initial + string wherever a match occurs. The matching part of the string is + removed from the output.
+As in
The result is given as a list of "strings", the preferred
+ data type specified in option
If subexpressions are specified in the regular expression, the + matching subexpressions are returned in the resulting list as + well. For example:
+
+re:split("Erlang","[ln]",[{return,list}]).
+ gives
+
+["Er","a","g"]
+ while
+
+re:split("Erlang","([ln])",[{return,list}]).
+ gives
+
+["Er","l","a","n","g"]
+ The text matching the subexpression (marked by the parentheses in the + regular expression) is inserted in the result list where it was found. + This means that concatenating the result of a split where the whole + regular expression is a single subexpression (as in the last example) + always results in the original string.
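The round-trip property described above can be checked directly. This is a short sketch using lists:append/1 to concatenate the parts:

```erlang
%% With the whole regular expression as one subexpression, no text is lost.
Parts = re:split("Erlang", "([ln])", [{return, list}]).
["Er","l","a","n","g"] = Parts.
"Erlang" = lists:append(Parts).
%% Without the parentheses, the separators are dropped from the result.
"Erag" = lists:append(re:split("Erlang", "[ln]", [{return, list}])).
```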
+As there is no matching subexpression for the last part in the
+ example (the "g"), nothing is inserted after that. To make
+ the group of strings and the parts matching the subexpressions more
+ obvious, one can use option
+re:split("Erlang","([ln])",[{return,list},group]).
+ gives
+
+[["Er","l"],["a","n"],["g"]]
+ Here the regular expression first matched the "l", + causing "Er" to be the first part in the result. When + the regular expression matched, the (only) subexpression was + bound to the "l", so the "l" is inserted + in the group together with "Er". The next match is of + the "n", making "a" the next part to be + returned. As the subexpression is bound to substring + "n" in this case, the "n" is inserted into + this group. The last group consists of the remaining string, + as no more matches are found.
+By default, all parts of the string, including the empty strings, + are returned from the function, for example:
+
+re:split("Erlang","[lg]",[{return,list}]).
+ gives
+
+["Er","an",[]]
+ as the matching of the "g" in the end of the string
+ leaves an empty rest, which is also returned. This behavior
+ differs from the default behavior of the split function in
+ Perl, where empty strings at the end are by default removed. To
+ get the "trimming" default behavior of Perl, specify
+
+re:split("Erlang","[lg]",[{return,list},trim]).
+ gives
+
+["Er","an"]
+ The "trim" option says: "give me as many parts as
+ possible except the empty ones", which sometimes can be
+ useful. You can also specify how many parts you want, by specifying
+
+re:split("Erlang","[lg]",[{return,list},{parts,2}]).
+ gives
+
+["Er","ang"]
+ Notice that the last part is "ang", not
+ "an", as splitting was specified into two parts,
+ and the splitting stops when enough parts are given, which is
+ why the result differs from that of
More than three parts are not possible with this input data, so
+
+re:split("Erlang","[lg]",[{return,list},{parts,4}]).
+ gives the same result as the default, which is to be + viewed as "an infinite number of parts".
+Specifying
The
Summary of options not previously described for function
+
Specifies how the parts of the original string are presented in + the result list. Valid types:
+The variant of
All parts returned as binaries.
All parts returned as lists of characters + ("strings").
+Groups together the part of the string with + the parts of the string matching the subexpressions of the + regular expression.
+The return value from the function is in this case a
+
Specifies the number of parts the subject string is to be + split into.
+The number of parts is to be a positive integer for a specific
+ maximum number of parts, and
Specifies that empty parts at the end of the result list are
+ to be disregarded. The same as specifying
If you are familiar with Perl, the
The following sections contain reference material for the regular + expressions used by this module. The information is based on the PCRE + documentation, with changes where this module behaves differently to + the PCRE library.
+Summary of options not previously described for the
Specifies how the parts of the original string are presented in the result list. The possible types are:
-The syntax and semantics of the regular expressions supported by PCRE are + described in detail in the following sections. Perl's regular expressions + are described in its own documentation, and regular expressions in general + are covered in many books, some with copious examples. + Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, + covers regular expressions in great detail. This description of the PCRE + regular expressions is intended as reference material.
+ +The reference material is divided into the following sections:
+ +Groups together the part of the string with - the parts of the string matching the subexpressions of the - regexp.
-The return value from the function will in this case be a
-
Specifies the number of parts the subject string is to be - split into.
- -The number of parts should be a positive integer for a specific maximum on the
- number of parts and
Specifies that empty parts at the end of the result list are
- to be disregarded. The same as specifying
Some options that can be passed to
UTF Support
+ +Unicode support is basically UTF-8 based. To use Unicode characters, you
+ either call
+(*UTF8)
+(*UTF)
+
+ Both options give the same effect: the input string is interpreted as
+ UTF-8. Notice that with these instructions, the automatic conversion of
+ lists to UTF-8 is not performed by the
Some applications that allow their users to supply patterns may wish to
+ restrict them to non-UTF data for security reasons. If option
+
Unicode Property Support
+ +The following is another special sequence that can appear at the start of + a pattern:
+ +
+(*UCP)
+
+ This has the same effect as setting option
Disabling Startup Optimizations
+ +If a pattern starts with
Newline Conventions
+PCRE supports five conventions for indicating line breaks in strings: a + single CR (carriage return) character, a single LF (line feed) character, + the two-character sequence CRLF, any of the three preceding, and any + Unicode newline sequence.
+ +A newline convention can also be specified by starting a pattern string + with one of the following five sequences:
+ +These override the default and the options specified to
+
+(*CR)a.b
+
+ This pattern matches
The newline convention affects where the circumflex and dollar assertions
+ are true. It also affects the interpretation of the dot metacharacter when
+
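The effect of the (*CR) prefix shown earlier can be tried out as follows. This is a minimal sketch; {capture,none} is used only to reduce the result to match or nomatch:

```erlang
%% By default LF is the newline, and dot does not match a newline.
nomatch = re:run("a\nb", "a.b", [{capture, none}]).
%% (*CR) makes CR the only newline, so dot can now match LF.
match = re:run("a\nb", "(*CR)a.b", [{capture, none}]).
%% The same convention selected with an option instead of a prefix.
match = re:run("a\nb", "a.b", [{newline, cr}, {capture, none}]).
```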
Setting Match and Recursion Limits
+ +The caller of
+(*LIMIT_MATCH=d)
+(*LIMIT_RECURSION=d)
+
+ Here d is any number of decimal digits. However, the value of the setting
+ must be less than the value set by the caller of
The default value for both the limits is 10,000,000 in the Erlang + VM. Notice that the recursion limit does not affect the stack depth of the + VM, as PCRE for Erlang is compiled in such a way that the match function + never does recursion on the C stack.
The syntax and semantics of the regular expressions that are supported by PCRE -are described in detail below. Perl's regular expressions are described in its own documentation, and -regular expressions in general are covered in a number of books, some of which -have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", -published by O'Reilly, covers regular expressions in great detail. This -description of PCRE's regular expressions is intended as reference material.
-The reference material is divided into the following sections:
-A number of options that can be passed to
UTF support
-
-Unicode support is basically UTF-8 based. To use Unicode characters, you either
-call
-- -(*UTF8)
-(*UTF)
-
Both options give the same effect, the input string is interpreted
-as UTF-8. Note that with these instructions, the automatic conversion
-of lists to UTF-8 is not performed by the
-Some applications that allow their users to supply patterns may wish to
-restrict them to non-UTF data for security reasons. If the
Unicode property support
-Another special sequence that may appear at the start of a pattern is
---(*UCP)
-
This has the same effect as setting the
Disabling start-up optimizations
-
-If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
-
Newline conventions
- -PCRE supports -five -different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF -, any of the three preceding, or any -Unicode newline sequence.
- -It is also possible to specify a newline convention by starting a pattern -string with one of the following five sequences:
- -These override the default and the options given to
-- -(*CR)a.b
-
changes the convention to CR. That pattern matches "a\nb" because LF is no -longer a newline. If more than one of them is present, the last one -is used.
- -The newline convention affects where the circumflex and dollar assertions are
-true. It also affects the interpretation of the dot metacharacter when
-
Setting match and recursion limits
- -The caller of
--(*LIMIT_MATCH=d)
-(*LIMIT_RECURSION=d)
-
where d is any number of decimal digits. However, the value of the setting must be less than the value set by the caller of
The current default value for both the limits are 10000000 in the Erlang -VM. Note that the recursion limit does not actually affect the stack -depth of the VM, as PCRE for Erlang is compiled in such a way that the -match function never does recursion on the "C-stack".
- -A regular expression is a pattern that is matched against a subject -string from left to right. Most characters stand for themselves in a -pattern, and match the corresponding characters in the subject. As a -trivial example, the pattern
- --- -The quick brown fox
-
matches a portion of a subject string that is identical to
-itself. When caseless matching is specified (the
The power of regular expressions comes from the ability to include -alternatives and repetitions in the pattern. These are encoded in the -pattern by the use of metacharacters, which do not stand for -themselves but instead are interpreted in some special way.
- -There are two different sets of metacharacters: those that are recognized -anywhere in the pattern except within square brackets, and those that are -recognized within square brackets. Outside square brackets, the metacharacters -are as follows:
- -Part of a pattern that is in square brackets is called a "character class". In -a character class the only metacharacters are:
- -A regular expression is a pattern that is matched against a subject + string from left to right. Most characters stand for themselves in a + pattern and match the corresponding characters in the subject. As a + trivial example, the following pattern matches a portion of a subject + string that is identical to itself:
+ +
+The quick brown fox
+
+ When caseless matching is specified (option
The power of regular expressions comes from the ability to include + alternatives and repetitions in the pattern. These are encoded in the + pattern by the use of metacharacters, which do not stand for + themselves but instead are interpreted in some special way.
+ +Two sets of metacharacters exist: those that are recognized anywhere in + the pattern except within square brackets, and those that are recognized + within square brackets. Outside square brackets, the metacharacters are + as follows:
+ +The following sections describe the use of each of the metacharacters.
- - -Part of a pattern within square brackets is called a "character class". + The following are the only metacharacters in a character class:
+The backslash character has several uses. Firstly, if it is followed by a -character that is not a number or a letter, it takes away any special meaning that character -may have. This use of backslash as an escape character applies both inside and -outside character classes.
- -For example, if you want to match a * character, you write \* in the pattern. -This escaping action applies whether or not the following character would -otherwise be interpreted as a metacharacter, so it is always safe to precede a -non-alphanumeric with backslash to specify that it stands for itself. In -particular, if you want to match a backslash, you write \\.
- -In
The following sections describe the use of each metacharacter.
+If a pattern is compiled with the
The backslash character has many uses. First, if it is followed by a + character that is not a number or a letter, it takes away any special + meaning that the character can have. This use of backslash as an escape + character applies both inside and outside character classes.
+ +For example, if you want to match a * character, you write \* in the + pattern. This escaping action applies whether or not the following character would + otherwise be interpreted as a metacharacter, so it is always safe to + precede a non-alphanumeric with backslash to specify that it stands for + itself. In particular, if you want to match a backslash, write \\.
+ +In
If a pattern is compiled with option
To remove the special meaning from a sequence of characters, put them + between \Q and \E. This is different from Perl in that $ and @ are + handled as literals in \Q...\E sequences in PCRE, while $ and @ cause + variable interpolation in Perl. Notice the following examples:
-If you want to remove the special meaning from a sequence of characters, you -can do so by putting them between \Q and \E. This is different from Perl in -that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in -Perl, $ and @ cause variable interpolation. Note the following examples:
- Pattern PCRE matches Perl matches
-
- \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
- \Qabc\$xyz\E abc\$xyz abc\$xyz
- \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
-
-
-The \Q...\E sequence is recognized both inside and outside -character classes. An isolated \E that is not preceded by \Q is -ignored. If \Q is not followed by \E later in the pattern, the literal -interpretation continues to the end of the pattern (that is, \E is -assumed at the end). If the isolated \Q is inside a character class, -this causes an error, because the character class is not -terminated.
- -Non-printing characters
- -A second use of backslash provides a way of encoding non-printing characters -in patterns in a visible manner. There is no restriction on the appearance of -non-printing characters, apart from the binary zero that terminates a pattern, -but when a pattern is being prepared by text editing, it is often easier to use -one of the following escape sequences than the binary character it represents:
- -The precise effect of \cx on ASCII characters is as follows: if x is a lower -case letter, it is converted to upper case. Then bit 6 of the character (hex -40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A), -but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the -data item (byte or 16-bit value) following \c has a value greater than 127, a -compile-time error occurs. This locks out non-ASCII characters in all modes.
- -The \c facility was designed for use with ASCII characters, but with the -extension to Unicode it is even less useful than it once was.
- -By default, after \x, from zero to two hexadecimal digits are read (letters -can be in upper or lower case). Any number of hexadecimal digits may appear -between \x{ and }, but the character code is constrained as follows:
-Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called -"surrogate" codepoints), and 0xffef.
- -If characters other than hexadecimal digits appear between \x{ and }, or if -there is no terminating }, this form of escape is not recognized. Instead, the -initial \x will be interpreted as a basic hexadecimal escape, with no -following digits, giving a character whose value is zero.
- -Characters whose value is less than 256 can be defined by either of the two -syntaxes for \x. There is no difference in the way they are handled. For -example, \xdc is exactly the same as \x{dc}.
- -After \0 up to two further octal digits are read. If there are fewer than two -digits, just those that are present are used. Thus the sequence \0\x\07 -specifies two binary zeros followed by a BEL character (code value 7). Make -sure you supply two digits after the initial zero if the pattern character that -follows is itself an octal digit.
- -The handling of a backslash followed by a digit other than 0 is complicated. -Outside a character class, PCRE reads it and any following digits as a decimal -number. If the number is less than 10, or if there have been at least that many -previous capturing left parentheses in the expression, the entire sequence is -taken as a back reference. A description of how this works is given -later, following the discussion of parenthesized subpatterns.
- - -Inside a character class, or if the decimal number is greater than 9 and there -have not been that many capturing subpatterns, PCRE re-reads up to three octal -digits following the backslash, and uses them to generate a data character. Any -subsequent digits stand for themselves. The value of the character is -constrained in the same way as characters specified in hexadecimal. -For example:
- -Note that octal values of 100 or greater must not be introduced by -a leading zero, because no more than three octal digits are ever -read.
- -All the sequences that define a single character value can be used both inside -and outside character classes. In addition, inside a character class, \b is -interpreted as the backspace character (hex 08).
-\N is not allowed in a character class. \B, \R, and \X are not special -inside a character class. Like other unrecognized escape sequences, they are -treated as the literal characters "B", "R", and "X". Outside a character class, these -sequences have different meanings.
- -Unsupported escape sequences
- -In Perl, the sequences \l, \L, \u, and \U are recognized by its string -handler and used to modify the case of following characters. PCRE -does not support these escape sequences.
- -Absolute and relative back references
- -The sequence \g followed by an unsigned or a negative number, -optionally enclosed in braces, is an absolute or relative back -reference. A named back reference can be coded as \g{name}. Back -references are discussed later, following the discussion of -parenthesized subpatterns.
- -Absolute and relative subroutine calls
-For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a "subroutine". Details are discussed -later. -Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not -synonymous. The former is a back reference; the latter is a -subroutine call.
- -Generic character types
- -Another use of backslash is for specifying generic character types:
- -There is also the single sequence \N, which matches a non-newline character.
-This is the same as the "." metacharacter
-when
Each pair of lower and upper case escape sequences partitions the complete set -of characters into two disjoint sets. Any given character matches one, and only -one, of each pair. The sequences can appear both inside and outside character -classes. They each match one character of the appropriate type. If the current -matching point is at the end of the subject string, all of them fail, because -there is no character to match.
- -For compatibility with Perl, \s does not match the VT character (code 11). -This makes it different from the POSIX "space" class. The \s characters -are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is -included in a Perl script, \s may match the VT character. In PCRE, it never -does.
- -A "word" character is an underscore or any character that is a letter or digit.
-By default, the definition of letters and digits is controlled by PCRE's
-low-valued character tables, in Erlang's case (and without the
By default, in
The upper case escapes match the inverse sets of characters. Note that \d
-matches only decimal digits, whereas \w matches any Unicode digit, as well as
-any Unicode letter, and underscore. Note also that
The sequences \h, \H, \v, and \V are features that were added to Perl at
-release 5.10. In contrast to the other sequences, which match only ASCII
-characters by default, these always match certain high-valued codepoints,
-whether or not
The vertical space characters are:
- -In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are -relevant.
- -Newline sequences
- -Outside a character class, by default, the escape sequence \R matches any -Unicode newline sequence. In non-UTF-8 mode \R is -equivalent to the following:
- -- -(?>\r\n|\n|\x0b|\f|\r|\x85)
This is an example of an "atomic group", details of which are given below.
- -This particular group matches either the two-character sequence CR followed by -LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, -U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next -line, U+0085). The two-character sequence is treated as a single unit that -cannot be split.
- -In Unicode mode, two additional characters whose codepoints are greater than 255 -are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). -Unicode character property support is not needed for these characters to be -recognized.
- - -It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
-complete set of Unicode line endings) by setting the option
(*BSR_ANYCRLF) CR, LF, or CRLF only - (*BSR_UNICODE) any Unicode newline sequence
- -These override the default and the options given to the compiling function, but -they can themselves be overridden by options given to a matching function. Note -that these special settings, which are not Perl-compatible, are recognized only -at the very start of a pattern, and that they must be in upper case. If more -than one of them is present, the last one is used. They can be combined with a -change of newline convention; for example, a pattern can start with:
- -(*ANY)(*BSR_ANYCRLF)
- -They can also be combined with the (*UTF8), (*UTF) or -(*UCP) special sequences. Inside a character class, \R is treated as an -unrecognized escape sequence, and so matches the letter "R" by default.
- -Unicode character properties
- -Three additional -escape sequences that match characters with specific properties are available. -When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing -characters whose codepoints are less than 256, but they do work in this mode. -The extra escape sequences are:
-The property names represented by xx above are limited to the Unicode -script names, the general category properties, "Any", which matches any -character (including newline), and some special PCRE properties (described -in the next section). -Other Perl properties such as "InMusicalSymbols" are not currently supported by -PCRE. Note that \P{Any} does not match any characters, so always causes a -match failure.
- -Sets of Unicode characters are defined as belonging to certain scripts. A -character from one of these sets can be matched using a script name. For -example:
- -\p{Greek} - \P{Han}
- -Those that are not part of an identified script are lumped together as -"Common". The current list of scripts is:
- -Each character has exactly one Unicode general category property, specified by -a two-letter abbreviation. For compatibility with Perl, negation can be -specified by including a circumflex between the opening brace and the property -name. For example, \p{^Lu} is the same as \P{Lu}.
- -If only one letter is specified with \p or \P, it includes all the general -category properties that start with that letter. In this case, in the absence -of negation, the curly brackets in the escape sequence are optional; these two -examples have the same effect:
- -The following general category property codes are supported:
- -The \Q...\E sequence is recognized both inside and outside character + classes. An isolated \E that is not preceded by \Q is ignored. If \Q is + not followed by \E later in the pattern, the literal interpretation + continues to the end of the pattern (that is, \E is assumed at the end). + If the isolated \Q is inside a character class, this causes an error, as + the character class is not terminated.
+ +Non-Printing Characters
+A second use of backslash provides a way of encoding non-printing + characters in patterns in a visible manner. There is no restriction on the + appearance of non-printing characters, apart from the binary zero that + terminates a pattern. When a pattern is prepared by text editing, it is + often easier to use one of the following escape sequences than the binary + character it represents:
+ +The precise effect of \cx on ASCII characters is as follows: if x is a + lowercase letter, it is converted to upper case. Then bit 6 of the + character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A + (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes + hex 7B (; is 3B). If the data item (byte or 16-bit value) following \c + has a value > 127, a compile-time error occurs. This locks out + non-ASCII characters in all modes.
+ +The \c facility was designed for use with ASCII characters, but with the + extension to Unicode it is even less useful than it once was.
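The bit-inversion rule for \cx can be checked with a small Erlang sketch. The module name `ctrl_escape_demo` and the helper `ctrl_code/1` are hypothetical names introduced for illustration; only `re:run/3` belongs to the `re` module. The sketch assumes a PCRE-based `re` that accepts the \c forms described above.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(ctrl_escape_demo).
-export([run/0, ctrl_code/1]).

%% Mirror the documented rule: convert a lowercase letter to upper case,
%% then invert bit 6 (hex 40) of the character.
ctrl_code(X) when X >= $a, X =< $z -> (X - 32) bxor 16#40;
ctrl_code(X) when X >= 0, X =< 127 -> X bxor 16#40.

run() ->
    16#01 = ctrl_code($A),          % \cA is hex 01
    16#1A = ctrl_code($z),          % \cz is treated as \cZ, hex 1A
    16#3B = ctrl_code(${),          % \c{ is hex 3B
    16#7B = ctrl_code($;),          % \c; is hex 7B, that is, "{"
    %% The same mapping as seen by the regular expression engine.
    match = re:run([1], "\\cA", [{capture, none}]),
    match = re:run("{", "\\c;", [{capture, none}]),
    ok.
```

Calling `ctrl_escape_demo:run()` returns `ok` if every assertion holds.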
+ +By default, after \x, from zero to two hexadecimal digits are read + (letters can be in upper or lower case). Any number of hexadecimal digits + can appear between \x{ and }, but the character code is constrained as + follows:
+ +Invalid Unicode code points are the range 0xd800 to 0xdfff (the so-called + "surrogate" code points), and 0xffef.
+ +If characters other than hexadecimal digits appear between \x{ and }, + or if there is no terminating }, this form of escape is not recognized. + Instead, the initial \x is interpreted as a basic hexadecimal escape, + with no following digits, giving a character whose value is zero.
+ +Characters whose value is < 256 can be defined by either of the two + syntaxes for \x. There is no difference in the way they are handled. For + example, \xdc is the same as \x{dc}.
+ +After \0 up to two further octal digits are read. If there are fewer than + two digits, only those that are present are used. Thus the sequence + \0\x\07 specifies two binary zeros followed by a BEL character (code value + 7). Make sure to supply two digits after the initial zero if the pattern + character that follows is itself an octal digit.
+ +The handling of a backslash followed by a digit other than 0 is + complicated. Outside a character class, PCRE reads it and any following + digits as a decimal number. If the number is < 10, or if there have + been at least that many previous capturing left parentheses in the + expression, the entire sequence is taken as a back reference. A + description of how this works is provided later, following the discussion + of parenthesized subpatterns.
+ +Inside a character class, or if the decimal number is > 9 and there + have not been that many capturing subpatterns, PCRE re-reads up to three + octal digits following the backslash, and uses them to generate a data + character. Any subsequent digits stand for themselves. The value of the + character is constrained in the same way as characters specified in + hexadecimal. For example:
+ +Notice that octal values >= 100 must not be introduced by a leading + zero, as no more than three octal digits are ever read.
+ +All the sequences that define a single character value can be used both + inside and outside character classes. Also, inside a character class, \b + is interpreted as the backspace character (hex 08).
+ +\N is not allowed in a character class. \B, \R, and \X are not special + inside a character class. Like other unrecognized escape sequences, they + are treated as the literal characters "B", "R", and "X". Outside a + character class, these sequences have different meanings.
+ +Unsupported Escape Sequences
+ +In Perl, the sequences \l, \L, \u, and \U are recognized by its string + handler and used to modify the case of following characters. PCRE does not + support these escape sequences.
+ +Absolute and Relative Back References
+ +The sequence \g followed by an unsigned or a negative number, optionally + enclosed in braces, is an absolute or relative back reference. A named + back reference can be coded as \g{name}. Back references are discussed + later, following the discussion of parenthesized subpatterns.
+ +Absolute and Relative Subroutine Calls
+ +For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + alternative syntax for referencing a subpattern as a "subroutine". + Details are discussed later. Notice that \g{...} (Perl syntax) and + \g<...> (Oniguruma syntax) are not synonymous. The former + is a back reference and the latter is a subroutine call.
+ +Generic Character Types
+Another use of backslash is for specifying generic character types:
+ +There is also the single sequence \N, which matches a non-newline
+ character. This is the same as the "." metacharacter when
Each pair of lowercase and uppercase escape sequences partitions the + complete set of characters into two disjoint sets. Any given character + matches one, and only one, of each pair. The sequences can appear both + inside and outside character classes. They each match one character of the + appropriate type. If the current matching point is at the end of the + subject string, all fail, as there is no character to match.
+ +For compatibility with Perl, \s does not match the VT character + (code 11). This makes it different from the POSIX "space" class. The \s + characters are HT (9), LF (10), FF (12), CR (13), and space (32). If "use + locale;" is included in a Perl script, \s can match the VT character. In + PCRE, it never does.
+ +A "word" character is an underscore or any character that is a letter or
+ a digit. By default, the definition of letters and digits is controlled by
+ the PCRE low-valued character tables, in Erlang's case (and without option
+
By default, in
The uppercase escapes match the inverse sets of characters. Notice that
+ \d matches only decimal digits, while \w matches any Unicode digit, any
+ Unicode letter, and underscore. Notice also that
The sequences \h, \H, \v, and \V are features that were added to Perl in
+ release 5.10. In contrast to the other sequences, which match only ASCII
+ characters by default, these always match certain high-valued code points,
+ regardless of whether
The following are the horizontal space characters:
+ +The following are the vertical space characters:
+ +In 8-bit, non-UTF-8 mode, only the characters with code points < 256 + are relevant.
+ +Newline Sequences
+Outside a character class, by default, the escape sequence \R matches any + Unicode newline sequence. In non-UTF-8 mode, \R is equivalent to the + following:
+ +
+(?>\r\n|\n|\x0b|\f|\r|\x85)
+
+ This is an example of an "atomic group"; details are provided below.
+ +This particular group matches either the two-character sequence CR + followed by LF, or one of the single characters LF (line feed, U+000A), + VT (vertical tab, U+000B), FF (form feed, U+000C), CR (carriage return, + U+000D), or NEL (next line, U+0085). The two-character sequence is + treated as a single unit that cannot be split.
+ +In Unicode mode, two more characters whose code points are > 255 are + added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). + Unicode character property support is not needed for these characters to + be recognized.
+ +\R can be restricted to match only CR, LF, or CRLF (instead of the
+ complete set of Unicode line endings) by setting option
These override the default and the options specified to the compiling + function, but they can themselves be overridden by options specified to a + matching function. Notice that these special settings, which are not + Perl-compatible, are recognized only at the very start of a pattern, and + that they must be in upper case. If more than one of them is present, the + last one is used. They can be combined with a change of newline + convention; for example, a pattern can start with:
+ +
+(*ANY)(*BSR_ANYCRLF)
+
+ They can also be combined with the (*UTF8), (*UTF), or (*UCP) special + sequences. Inside a character class, \R is treated as an unrecognized + escape sequence, and so matches the letter "R" by default.
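The behavior of \R and its restricted form can be sketched in Erlang as follows. The module name `newline_seq_demo` is hypothetical; `re:split/3`, `re:run/3`, and option `bsr_anycrlf` are taken from the `re` module. The subject strings are made up for illustration.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(newline_seq_demo).
-export([run/0]).

run() ->
    %% CRLF is consumed as one unit, so no empty field appears between
    %% "one" and "two".
    ["one", "two", "three"] =
        re:split("one\r\ntwo\nthree", "\\R", [{return, list}]),
    %% By default, \R also matches VT (vertical tab, written \v in Erlang).
    match = re:run("a\vb", "a\\Rb", [{capture, none}]),
    %% Restricting \R to CR, LF, or CRLF, either with the option or with
    %% the in-pattern setting, excludes VT.
    nomatch = re:run("a\vb", "a\\Rb", [bsr_anycrlf, {capture, none}]),
    nomatch = re:run("a\vb", "(*BSR_ANYCRLF)a\\Rb", [{capture, none}]),
    %% CRLF is still accepted in the restricted mode.
    match = re:run("a\r\nb", "(*BSR_ANYCRLF)a\\Rb", [{capture, none}]),
    ok.
```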
+ +Unicode Character Properties
+ +Three more escape sequences that match characters with specific + properties are available. When in 8-bit non-UTF-8 mode, these sequences + are limited to testing characters whose code points are < + 256, but they do work in this mode. The following are the extra escape + sequences:
+ +The property names represented by xx above are limited to the + Unicode script names, the general category properties, "Any", which + matches any character (including newline), and some special PCRE + properties (described in the next section). Other Perl properties, such as + "InMusicalSymbols", are currently not supported by PCRE. Notice that + \P{Any} does not match any characters and always causes a match + failure.
+ +Sets of Unicode characters are defined as belonging to certain scripts. + A character from one of these sets can be matched using a script name, for + example:
+ +
+\p{Greek} \P{Han}
+
+ Those that are not part of an identified script are lumped together as + "Common". The following is the current list of scripts:
+ +Each character has exactly one Unicode general category property, + specified by a two-letter abbreviation. For compatibility with Perl, negation + can be specified by including a circumflex between the opening brace and + the property name. For example, \p{^Lu} is the same as \P{Lu}.
+ +If only one letter is specified with \p or \P, it includes all the + general category properties that start with that letter. In this case, in + the absence of negation, the curly brackets in the escape sequence are + optional. The following two examples have the same effect:
+ +
+\p{L}
+\pL
+
+ The following general category property codes are supported:
+ +The special property L& is also supported. It matches a character + that has the Lu, Ll, or Lt property, that is, a letter that is not + classified as a modifier or "other".
+ +The Cs (Surrogate) property applies only to characters in the range + U+D800 to U+DFFF. Such characters are invalid in Unicode strings and so + cannot be tested by PCRE. Perl does not support the Cs property.
+ +The long synonyms for property names supported by Perl (such as + \p{Letter}) are not supported by PCRE. It is not permitted to prefix any + of these properties with "Is".
+ +No character in the Unicode table has the Cn (unassigned) property. + This property is instead assumed for any code point that is not in the + Unicode table.
+ +Specifying caseless matching does not affect these escape sequences. For + example, \p{Lu} always matches only uppercase letters. This is different + from the behavior of current versions of Perl.
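A minimal Erlang sketch of the property escapes follows. The module name `uprop_demo` is hypothetical, and the sample characters are chosen only for illustration. Code points above 255 are passed as integers in a list together with option `unicode`, so the example does not depend on the source file encoding.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(uprop_demo).
-export([run/0]).

run() ->
    Omega = [16#03A9],              % GREEK CAPITAL LETTER OMEGA
    Opts = [unicode, {capture, none}],
    %% Script names: \p matches a character in the script, \P one outside it.
    match   = re:run(Omega, "\\p{Greek}", Opts),
    nomatch = re:run(Omega, "\\P{Greek}", Opts),
    match   = re:run("a", "\\P{Greek}", Opts),
    %% General category: Lu is an uppercase letter; \p{^Lu} equals \P{Lu}.
    match   = re:run("A", "\\p{Lu}", Opts),
    nomatch = re:run("A", "\\p{^Lu}", Opts),
    nomatch = re:run("A", "\\P{Lu}", Opts),
    %% With a single letter, the braces are optional.
    match   = re:run("x", "\\p{L}", Opts),
    match   = re:run("x", "\\pL", Opts),
    nomatch = re:run("1", "\\pL", Opts),
    ok.
```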
+ +Matching characters by Unicode property is not fast, as PCRE must do a
+ multistage table lookup to find a character property. That is why the
+ traditional escape sequences such as \d and \w do not use Unicode
+ properties in PCRE by default. However, you can make them do so by setting
+ option
Extended Grapheme Clusters
+ +The \X escape matches any number of Unicode characters that form an
+ "extended grapheme cluster", and treats the sequence as an atomic group
+ (see below). Up to and including release 8.31, PCRE matched an earlier,
+ simpler definition that was equivalent to
This simple definition was extended in Unicode to include more + complicated kinds of composite character by giving each character a + grapheme breaking property, and creating rules that use these properties + to define the boundaries of extended grapheme clusters. In PCRE releases + later than 8.31, \X matches one of these clusters.
+ +\X always matches at least one character. Then it decides whether to add + more characters according to the following rules for ending a cluster:
+ +End at the end of the subject string.
+Do not end between CR and LF; otherwise end after any control + character.
+Do not break Hangul (a Korean script) syllable sequences. Hangul + characters are of five types: L, V, T, LV, and LVT. An L character can + be followed by an L, V, LV, or LVT character. An LV or V character can + be followed by a V or T character. An LVT or T character can be + followed only by a T character.
+Do not end before extending characters or spacing marks. Characters + with the "mark" property always have the "extend" grapheme breaking + property.
+Do not end after prepend characters.
+Otherwise, end the cluster.
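The cluster rules can be observed from Erlang with a short sketch. The module name `grapheme_demo` is hypothetical. The sketch assumes a PCRE release later than 8.31, so that \X follows the extended grapheme cluster rules listed above.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(grapheme_demo).
-export([run/0]).

run() ->
    Opts = [unicode, {capture, first, list}],
    %% "e" followed by U+0301 (combining acute accent, an extending
    %% character) forms one cluster; the following "x" starts a new one.
    {match, [[$e, 16#0301]]} = re:run([$e, 16#0301, $x], "\\X", Opts),
    %% A cluster does not end between CR and LF.
    {match, ["\r\n"]} = re:run("\r\nz", "\\X", Opts),
    %% A plain character is a cluster of length one.
    {match, ["x"]} = re:run("xy", "\\X", Opts),
    ok.
```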
+PCRE Additional Properties
-In addition to the standard Unicode properties described earlier, PCRE
+ supports four more that make it possible to convert traditional escape
+ sequences, such as \w and \s, and POSIX character classes to use Unicode
+ properties. PCRE uses these non-standard, non-Perl properties internally
+ when
Any alphanumeric character. Matches characters that have either the + L (letter) or the N (number) property.
+Any POSIX space character. Matches the characters tab, line feed, + vertical tab, form feed, carriage return, and any other character + that has the Z (separator) property.
+Any Perl space character. Matches the same as Xps, except that + vertical tab is excluded.
+Any Perl "word" character. Matches the same characters as Xan, plus + underscore.
+There is another non-standard property, Xuc, which matches any character + that can be represented by a Universal Character Name in C++ and other + programming languages. These are the characters $, @, ` (grave accent), + and all characters with Unicode code points >= U+00A0, except for the + surrogates U+D800 to U+DFFF. Notice that most base (ASCII) characters are + excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH, + where H is a hexadecimal digit. Notice that the Xuc property does not + match these sequences but the characters that they represent.)
+ +Resetting the Match Start
+ +The escape sequence \K causes any previously matched characters not to + be included in the final matched sequence. For example, the following + pattern matches "foobar", but reports that it has matched "bar":
+ +
+foo\Kbar
+
+ This feature is similar to a lookbehind assertion + + + (described below). However, in this case, the part of the subject before + the real match does not have to be of fixed length, as lookbehind + assertions do. The use of \K does not interfere with the setting of + captured substrings. For example, when the following pattern matches + "foobar", the first substring is still set to "foo":
-
+(foo)\Kbar
+
+ Perl documents that the use of \K within assertions is "not well + defined". In PCRE, \K is acted upon when it occurs inside positive + assertions, but is ignored in negative assertions.
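The two \K patterns above can be tried from Erlang as in the following sketch. The module name `match_reset_demo` is hypothetical; `re:run/3` and the `capture` option come from the `re` module.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(match_reset_demo).
-export([run/0]).

run() ->
    %% "foo" must be present, but only "bar" is reported as the match.
    {match, ["bar"]} =
        re:run("foobar", "foo\\Kbar", [{capture, all, list}]),
    %% As offset and length: the reported match starts at offset 3.
    {match, [{3, 3}]} = re:run("foobar", "foo\\Kbar"),
    %% \K does not disturb captured substrings: group 1 is still "foo".
    {match, ["bar", "foo"]} =
        re:run("foobar", "(foo)\\Kbar", [{capture, all, list}]),
    %% Without the leading "foo", there is no match at all.
    nomatch = re:run("bar", "foo\\Kbar"),
    ok.
```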
+ +Simple Assertions
+ +The final use of backslash is for certain simple assertions. An + assertion specifies a condition that must be met at a particular point in + a match, without consuming any characters from the subject string. The + use of subpatterns for more complicated assertions is described below. The + following are the backslashed assertions:
+ +Inside a character class, \b has a different meaning; it matches the + backspace character. If any other of these assertions appears in a + character class, by default it matches the corresponding literal character + (for example, \B matches the letter B).
+ +A word boundary is a position in the subject string where the current
+ character and the previous character do not both match \w or \W (that is,
+ one matches \w and the other matches \W), or the start or end of the
+ string if the first or last character matches \w, respectively. In UTF
+ mode, the meanings of \w and \W can be changed by setting option
+
The \A, \Z, and \z assertions differ from the traditional circumflex and
+ dollar (described in the next section) in that they only ever match at the
+ very start and end of the subject string, whatever options are set. Thus,
+ they are independent of multiline mode. These three assertions are not
+ affected by options
The \G assertion is true only when the current matching position is at
+ the start point of the match, as specified by argument
Notice, however, that the PCRE interpretation of \G, as the start of the + current match, is subtly different from Perl, which defines it as the end + of the previous match. In Perl, these can be different when the previously + matched string was empty. As PCRE does only one match at a time, it cannot + reproduce this behavior.
+ +If all the alternatives of a pattern begin with \G, the expression is + anchored to the starting match position, and the "anchored" flag is set in + the compiled regular expression.
+The special property L& is also supported: it matches a character that has -the Lu, Ll, or Lt property, in other words, a letter that is not classified as -a modifier or "other".
- -The Cs (Surrogate) property applies only to characters in the range U+D800 to -U+DFFF. Such characters are not valid in Unicode strings and so -cannot be tested by PCRE. Perl does not support the Cs property
- -The long synonyms for property names that Perl supports (such as \p{Letter}) -are not supported by PCRE, nor is it permitted to prefix any of these -properties with "Is".
- -No character that is in the Unicode table has the Cn (unassigned) property. -Instead, this property is assumed for any code point that is not in the -Unicode table.
- -Specifying caseless matching does not affect these escape sequences. For -example, \p{Lu} always matches only upper case letters. This is different from -the behaviour of current versions of Perl.
-Matching characters by Unicode property is not fast, because PCRE has to do a
-multistage table lookup in order to find a character's property. That is why
-the traditional escape sequences such as \d and \w do not use Unicode
-properties in PCRE by default, though you can make them do so by setting the
-
Extended grapheme clusters
-The \X escape matches any number of Unicode characters that form an "extended -grapheme cluster", and treats the sequence as an atomic group (see below). -Up to and including release 8.31, PCRE matched an earlier, simpler definition -that was equivalent to
- -- -(?>\PM\pM*)
That is, it matched a character without the "mark" property, followed by zero -or more characters with the "mark" property. Characters with the "mark" -property are typically non-spacing accents that affect the preceding character.
- -This simple definition was extended in Unicode to include more complicated -kinds of composite character by giving each character a grapheme breaking -property, and creating rules that use these properties to define the boundaries -of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches -one of these clusters.
- -\X always matches at least one character. Then it decides whether to add -additional characters according to the following rules for ending a cluster:
-The circumflex and dollar metacharacters are zero-width assertions. That + is, they test for a particular condition to be true without consuming any + characters from the subject string.
+ +Outside a character class, in the default matching mode, the circumflex
+ character is an assertion that is true only if the current matching point
+ is at the start of the subject string. If argument
Circumflex need not be the first character of the pattern if + some alternatives are involved, but it must be the first thing in + each alternative in which it appears if the pattern is ever to match that + branch. If all possible alternatives start with a circumflex, that is, if + the pattern is constrained to match only at the start of the subject, it + is said to be an "anchored" pattern. (There are also other constructs that + can cause a pattern to be anchored.)
+ +The dollar character is an assertion that is true only if the current + matching point is at the end of the subject string, or immediately before + a newline at the end of the string (by default). Notice, however, that it + does not match the newline. Dollar need not be the last character of + the pattern if some alternatives are involved, but it must be the + last item in any branch in which it appears. Dollar has no special meaning + in a character class.
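The default behavior of circumflex and dollar can be sketched in Erlang as follows. The module name `anchor_demo` is hypothetical, and the sketch assumes the default settings of `re` (LF as newline, no multiline mode).

```erlang
%% Hypothetical demo module, not part of OTP.
-module(anchor_demo).
-export([run/0]).

run() ->
    %% Circumflex is true only at the start of the subject.
    {match, [{0, 3}]} = re:run("abcx", "^abc"),
    nomatch = re:run("xabc", "^abc"),
    %% Every alternative starts with ^, so the pattern is anchored.
    {match, [{0, 1}]} = re:run("b", "^a|^b"),
    nomatch = re:run("xb", "^a|^b"),
    %% Dollar is true just before a newline that ends the subject,
    %% but the newline is not part of the match (length 3, not 4).
    {match, [{0, 3}]} = re:run("abc\n", "abc$"),
    %% It is not true before a newline that is followed by more text.
    nomatch = re:run("abc\n\n", "abc$"),
    nomatch = re:run("abc\nx", "abc$"),
    ok.
```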
+ +The meaning of dollar can be changed so that it matches only at the very
+ end of the string, by setting option
The meanings of the circumflex and dollar characters are changed if
+ option
For example, the pattern /^abc$/ matches the subject string "def\nabc"
+ (where \n represents a newline) in multiline mode, but not otherwise.
+ So, patterns that are anchored in single-line mode because all
+ branches start with ^ are not anchored in multiline mode, and a match for
+ circumflex is possible when argument startoffset of
Notice that the sequences \A, \Z, and \z can be used to match the start
+ and end of the subject in both modes. If all branches of a pattern start
+ with \A, it is always anchored, regardless of whether
PCRE's additional properties
- -As well as the standard Unicode properties described above, PCRE supports four -more that make it possible to convert traditional escape sequences such as \w -and \s and POSIX character classes to use Unicode properties. PCRE uses these -non-standard, non-Perl properties internally when PCRE_UCP is set. However, -they may also be used explicitly. These properties are:
-Xan matches characters that have either the L (letter) or the N (number) -property. Xps matches the characters tab, linefeed, vertical tab, form feed, or -carriage return, and any other character that has the Z (separator) property. -Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the -same characters as Xan, plus underscore.
- -There is another non-standard property, Xuc, which matches any character that -can be represented by a Universal Character Name in C++ and other programming -languages. These are the characters $, @, ` (grave accent), and all characters -with Unicode code points greater than or equal to U+00A0, except for the -surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are -excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH -where H is a hexadecimal digit. Note that the Xuc property does not match these -sequences but the characters that they represent.)
- -Resetting the match start
- -The escape sequence \K causes any previously matched characters not to be -included in the final matched sequence. For example, the pattern:
- -- -foo\Kbar
matches "foobar", but reports that it has matched "bar". This feature is -similar to a lookbehind assertion - - -(described below). - -However, in this case, the part of the subject before the real match does not -have to be of fixed length, as lookbehind assertions do. The use of \K does -not interfere with the setting of -captured substrings. -For example, when the pattern
- -- -(foo)\Kbar
matches "foobar", the first substring is still set to "foo".
- -Perl documents that the use of \K within assertions is "not well defined". In -PCRE, \K is acted upon when it occurs inside positive assertions, but is -ignored in negative assertions.
- -Simple assertions
- -The final use of backslash is for certain simple assertions. An -assertion specifies a condition that has to be met at a particular -point in a match, without consuming any characters from the subject -string. The use of subpatterns for more complicated assertions is -described below. The backslashed assertions are:
- -Outside a character class, a dot in the pattern matches any character in + the subject string except (by default) a character that signifies the end + of a line.
+ +When a line ending is defined as a single character, dot never matches + that character. When the two-character sequence CRLF is used, dot does not + match CR if it is immediately followed by LF, otherwise it matches all + characters (including isolated CRs and LFs). When any Unicode line endings + are recognized, dot does not match CR, LF, or any of the other + line-ending characters.
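The interaction between dot and the newline convention can be sketched in Erlang. The module name `dot_demo` is hypothetical; option `{newline, NLSpec}` comes from the `re` module. Each pattern starts with a circumflex so that only the first character of the subject is tested.

```erlang
%% Hypothetical demo module, not part of OTP.
-module(dot_demo).
-export([run/0]).

%% Try to match one character at the very start of Subject.
first_char(Subject, Opts) ->
    re:run(Subject, "^.", [{capture, none} | Opts]).

run() ->
    %% Default convention (LF): dot never matches LF, but it matches CR.
    nomatch = first_char("\n", []),
    match   = first_char("\r", []),
    %% CRLF convention: CR directly followed by LF is not matched,
    %% while isolated CRs and LFs are.
    nomatch = first_char("\r\n", [{newline, crlf}]),
    match   = first_char("\r", [{newline, crlf}]),
    match   = first_char("\n", [{newline, crlf}]),
    %% Any Unicode line ending: neither CR nor LF is matched.
    nomatch = first_char("\r", [{newline, any}]),
    nomatch = first_char("\n", [{newline, any}]),
    ok.
```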
+ +The behavior of dot regarding newlines can be changed. If option
+
The handling of dot is entirely independent of the handling of circumflex + and dollar; the only relationship is that both involve newlines. Dot has + no special meaning in a character class.
+ +The escape sequence \N behaves like a dot, except that it is not affected
+ by option
Inside a character class, \b has a different meaning; it matches the backspace -character. If any other of these assertions appears in a character class, by -default it matches the corresponding literal character (for example, \B -matches the letter B).
- -A word boundary is a position in the subject string where the current character
-and the previous character do not both match \w or \W (i.e. one matches
-\w and the other matches \W), or the start or end of the string if the
-first or last character matches \w, respectively. In a UTF mode, the meanings
-of \w and \W can be changed by setting the
The \A, \Z, and \z assertions differ from the traditional circumflex and
-dollar (described in the next section) in that they only ever match at the very
-start and end of the subject string, whatever options are set. Thus, they are
-independent of multiline mode. These three assertions are not affected by the
-
The \G assertion is true only when the current matching position is at the
-start point of the match, as specified by the startoffset argument of
-
Note, however, that PCRE's interpretation of \G, as the start of the current -match, is subtly different from Perl's, which defines it as the end of the -previous match. In Perl, these can be different when the previously matched -string was empty. Because PCRE does just one match at a time, it cannot -reproduce this behaviour.
- -If all the alternatives of a pattern begin with \G, the expression is anchored -to the starting match position, and the "anchored" flag is set in the compiled -regular expression.
- -The circumflex and dollar metacharacters are zero-width assertions. That is, -they test for a particular condition being true without consuming any -characters from the subject string.
- -Outside a character class, in the default matching mode, the circumflex
-character is an assertion that is true only if the current matching point is at
-the start of the subject string. If the startoffset argument of
-
Circumflex need not be the first character of the pattern if a number of -alternatives are involved, but it should be the first thing in each alternative -in which it appears if the pattern is ever to match that branch. If all -possible alternatives start with a circumflex, that is, if the pattern is -constrained to match only at the start of the subject, it is said to be an -"anchored" pattern. (There are also other constructs that can cause a pattern -to be anchored.)
- -The dollar character is an assertion that is true only if the current matching -point is at the end of the subject string, or immediately before a newline at -the end of the string (by default). Note, however, that it does not actually -match the newline. Dollar need not be the last character of the pattern if a -number of alternatives are involved, but it should be the last item in any -branch in which it appears. Dollar has no special meaning in a character class.
- -The meaning of dollar can be changed so that it matches only at the
-very end of the string, by setting the
The meanings of the circumflex and dollar characters are changed if the
-
For example, the pattern /^abc$/ matches the subject string
-"def\nabc" (where \n represents a newline) in multiline mode, but
-not otherwise. Consequently, patterns that are anchored in single line
-mode because all branches start with ^ are not anchored in multiline
-mode, and a match for circumflex is possible when the
-startoffset argument of
Note that the sequences \A, \Z, and \z can be used to match the start and
-end of the subject in both modes, and if all branches of a pattern start with
-\A it is always anchored, whether or not
Outside a character class, a dot in the pattern matches any one character in -the subject string except (by default) a character that signifies the end of a -line. -
- -When a line ending is defined as a single character, dot never matches that -character; when the two-character sequence CRLF is used, dot does not match CR -if it is immediately followed by LF, but otherwise it matches all characters -(including isolated CRs and LFs). -When any Unicode line endings are being -recognized, dot does not match CR or LF or any of the other line ending -characters. -
- -The behaviour of dot with regard to newlines can be changed. If
-the
The handling of dot is entirely independent of the handling of -circumflex and dollar, the only relationship being that they both -involve newlines. Dot has no special meaning in a character class.
- -The escape sequence \N behaves like a dot, except that it is not affected by -the PCRE_DOTALL option. In other words, it matches any character except one -that signifies the end of a line. Perl also uses \N to match characters by -name; PCRE does not support this.
- -Outside a character class, the escape sequence \C matches any one data unit, -whether or not a UTF mode is set. One data unit is one -byte. Unlike a dot, \C always -matches line-ending characters. The feature is provided in Perl in order to -match individual bytes in UTF-8 mode, but it is unclear how it can usefully be -used. Because \C breaks up characters into individual data units, matching one -unit with \C in a UTF mode means that the rest of the string may start with a -malformed UTF character. This has undefined results, because PCRE assumes that -it is dealing with valid UTF strings.
- -PCRE does not allow \C to appear in lookbehind assertions (described below) -in a UTF mode, because this would make it impossible to calculate the length of -the lookbehind.
- -In general, the \C escape sequence is best avoided. However, one -way of using it that avoids the problem of malformed UTF characters is to use a -lookahead to check the length of the next character, as in this pattern, which -could be used with a UTF-8 string (ignore white space and line breaks):
+Outside a character class, the escape sequence \C matches any data unit, + regardless of whether a UTF mode is set. One data unit is one byte. Unlike a dot, + \C always matches line-ending characters. The feature is provided in Perl + to match individual bytes in UTF-8 mode, but it is unclear how it can + usefully be used. As \C breaks up characters into individual data units, + matching one unit with \C in a UTF mode means that the remaining string + can start with a malformed UTF character. This has undefined results, as + PCRE assumes that it deals with valid UTF strings.
+ +PCRE does not allow \C to appear in lookbehind assertions (described + below) in a UTF mode, as this would make it impossible to calculate the + length of the lookbehind.
+ +The \C escape sequence is best avoided. However, one way of using it that + avoids the problem of malformed UTF characters is to use a lookahead to + check the length of the next character, as in the following pattern, which + can be used with a UTF-8 string (ignore whitespace and line breaks):
- (?| (?=[\x00-\x7f])(\C) |
- (?=[\x80-\x{7ff}])(\C)(\C) |
- (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
- (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
-
-A group that starts with (?| resets the capturing parentheses numbers in each -alternative (see "Duplicate Subpattern Numbers" -below). The assertions at the start of each branch check the next UTF-8 -character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The -character's individual bytes are then captured by the appropriate number of -groups.
- -An opening square bracket introduces a character class, terminated by a closing -square bracket. A closing square bracket on its own is not special by default. -However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square -bracket causes a compile-time error. If a closing square bracket is required as -a member of the class, it should be the first data character in the class -(after an initial circumflex, if present) or escaped with a backslash.
- -A character class matches a single character in the subject. In a UTF mode, the -character may be more than one data unit long. A matched character must be in -the set of characters defined by the class, unless the first character in the -class definition is a circumflex, in which case the subject character must not -be in the set defined by the class. If a circumflex is actually required as a -member of the class, ensure it is not the first character, or escape it with a -backslash.
- -For example, the character class [aeiou] matches any lower case vowel, while -[^aeiou] matches any character that is not a lower case vowel. Note that a -circumflex is just a convenient notation for specifying the characters that -are in the class by enumerating those that are not. A class that starts with a -circumflex is not an assertion; it still consumes a character from the subject -string, and therefore it fails if the current pointer is at the end of the -string.
- -In UTF-8 mode, characters with values greater than 255 (0xffff) -can be included in a class as a literal string of data units, or by using the -\x{ escaping mechanism.
- -When caseless matching is set, any letters in a class represent both their -upper case and lower case versions, so for example, a caseless [aeiou] matches -"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a -caseful version would. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 256, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. -If you want to use caseless matching in a UTF mode for characters 256 and -above, you must ensure that PCRE is compiled with Unicode property support as -well as with UTF support.
- -Characters that might indicate line breaks are never treated in any special way -when matching character classes, whatever line-ending sequence is in use, and -whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class -such as [^a] always matches one of these characters.
- -The minus (hyphen) character can be used to specify a range of characters in a -character class. For example, [d-m] matches any letter between d and m, -inclusive. If a minus character is required in a class, it must be escaped with -a backslash or appear in a position where it cannot be interpreted as -indicating a range, typically as the first or last character in the class.
- -It is not possible to have the literal character "]" as the end character of a -range. A pattern such as [W-]46] is interpreted as a class of two characters -("W" and "-") followed by a literal string "46]", so it would match "W46]" or -"-46]". However, if the "]" is escaped with a backslash it is interpreted as -the end of range, so [W-\]46] is interpreted as a class containing a range -followed by two other characters. The octal or hexadecimal representation of -"]" can also be used to end a range.
- -Ranges operate in the collating sequence of character values. They can also be -used for characters specified numerically, for example [\000-\037]. Ranges -can include any characters that are valid for the current mode.
- -If a range that includes letters is used when caseless matching is set, it -matches the letters in either case. For example, [W-c] is equivalent to -[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character -tables for a French locale are in use, [\xc8-\xcb] matches accented E -characters in both cases. In UTF modes, PCRE supports the concept of case for -characters with values greater than 255 only when it is compiled with Unicode -property support.
- -The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
-\V, \w, and \W may appear in a character class, and add the characters that
-they match to the class. For example, [\dABCDEF] matches any hexadecimal
-digit. In UTF modes, the
A circumflex can conveniently be used with the upper case character types to -specify a more restricted set of characters than the matching lower case type. -For example, the class [^\W_] matches any letter or digit, but not underscore, -whereas [\w] includes underscore. A positive character class should be read as -"something OR something OR ..." and a negative class as "NOT something AND NOT -something AND NOT ...".
- -The only metacharacters that are recognized in character classes -are backslash, hyphen (only where it can be interpreted as specifying -a range), circumflex (only at the start), opening square bracket (only -when it can be interpreted as introducing a POSIX class name - see the -next section), and the terminating closing square bracket. However, -escaping other non-alphanumeric characters does no harm.
-Perl supports the POSIX notation for character classes. This uses names -enclosed by [: and :] within the enclosing square brackets. PCRE also supports -this notation. For example,
- -- -[01[:alpha:]%]
matches "0", "1", any alphabetic character, or "%". The supported class names -are:
- -The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and -space (32). Notice that this list includes the VT character (code 11). This -makes "space" different to \s, which does not include VT (for Perl -compatibility).
- -The name "word" is a Perl extension, and "blank" is a GNU extension -from Perl 5.8. Another Perl extension is negation, which is indicated -by a ^ character after the colon. For example,
- -- -[12[:^digit:]]
matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX -syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not -supported, and an error is given if they are encountered.
- -By default, in UTF modes, characters with values greater than 255 do not match -any of the POSIX character classes. However, if the PCRE_UCP option is passed -to pcre_compile(), some of the classes are changed so that Unicode -character properties are used. This is achieved by replacing the POSIX classes -by other sequences, as follows:
- -Negated versions, such as [:^alpha:] use \P instead of \p. The other POSIX -classes are unchanged, and match only characters with code points less than -256.
- -Vertical bar characters are used to separate alternative -patterns. For example, the pattern
- -- -gilbert|sullivan
matches either "gilbert" or "sullivan". Any number of alternatives -may appear, and an empty alternative is permitted (matching the empty -string). The matching process tries each alternative in turn, from -left to right, and the first one that succeeds is used. If the -alternatives are within a subpattern (defined below), "succeeds" means -matching the rest of the main pattern as well as the alternative in -the subpattern.
- -The settings of the
For example, (?im) sets caseless, multiline matching. It is also possible to
-unset these options by preceding the letter with a hyphen, and a combined
-setting and unsetting such as (?im-sx), which sets
The PCRE-specific options
When one of these option changes occurs at top level (that is, not inside -subpattern parentheses), the change applies to the remainder of the pattern -that follows. If the change is placed right at the start of a pattern, PCRE -extracts it into the global options.
- -An option change within a subpattern (see below for a description of -subpatterns) affects only that part of the subpattern that follows it, so
- -- -(a(?i)b)c
matches abc and aBc and no other strings (assuming
- -(a(?i)b|c)
matches "ab", "aB", "c", and "C", even though when matching "C" the first -branch is abandoned before the option setting. This is because the effects of -option settings happen at compile time. There would be some very weird -behaviour otherwise.
- -Note: There are other PCRE-specific options that can be set by the
-application when the compiling or matching functions are called. In some cases
-the pattern can contain special leading sequences such as (*CRLF) to override
-what the application has set or what has been defaulted. Details are given in
-the section entitled "Newline sequences"
-above. There are also the (*UTF8) and (*UCP) leading
-sequences that can be used to set UTF and Unicode property modes; they are
-equivalent to setting the
Subpatterns are delimited by parentheses (round brackets), which -can be nested. Turning part of a pattern into a subpattern does two -things:
- -1. It localizes a set of alternatives. For example, the pattern
- -- -cat(aract|erpillar|)
matches "cataract", "caterpillar", or "cat". Without the parentheses, it would -match "cataract", "erpillar" or an empty string.
- -2. It sets up the subpattern as a capturing subpattern. This means that, when
-the complete pattern matches, that portion of the subject string that matched the
-subpattern is passed back to the caller via the return value of
-
Opening parentheses are counted from left to right (starting -from 1) to obtain numbers for the capturing subpatterns.For example, if the string -"the red king" is matched against the pattern
- -- -the ((red|white) (king|queen))
the captured substrings are "red king", "red", and "king", and are numbered 1, -2, and 3, respectively.
- -The fact that plain parentheses fulfil two functions is not always helpful. -There are often times when a grouping subpattern is required without a -capturing requirement. If an opening parenthesis is followed by a question mark -and a colon, the subpattern does not do any capturing, and is not counted when -computing the number of any subsequent capturing subpatterns. For example, if -the string "the white queen" is matched against the pattern
- -- -the ((?:red|white) (king|queen))
the captured substrings are "white queen" and "queen", and are numbered 1 and -2. The maximum number of capturing subpatterns is 65535.
- -As a convenient shorthand, if any option settings are required at the start of -a non-capturing subpattern, the option letters may appear between the "?" and -the ":". Thus the two patterns
+(?| (?=[\x00-\x7f])(\C) | + (?=[\x80-\x{7ff}])(\C)(\C) | + (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | + (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C)) + +A group that starts with (?| resets the capturing parentheses numbers in
+ each alternative (see section
An opening square bracket introduces a character class, terminated by a
+ closing square bracket. A closing square bracket on its own is not special
+ by default. However, if option
A character class matches a single character in the subject. In a UTF + mode, the character can be more than one data unit long. A matched + character must be in the set of characters defined by the class, unless + the first character in the class definition is a circumflex, in which case + the subject character must not be in the set defined by the class. If a + circumflex is required as a member of the class, ensure that it is not the + first character, or escape it with a backslash.
+ +For example, the character class
In UTF-8 mode, characters with values > 255 (0xff) can be included + in a class as a literal string of data units, or by using the \x{ escaping + mechanism.
+ +When caseless matching is set, any letters in a class represent both
+ their uppercase and lowercase versions. For example, a caseless
+
Characters that can indicate line breaks are never treated in any special
+ way when matching character classes, whatever line-ending sequence is in
+ use, and whatever setting of options
The minus (hyphen) character can be used to specify a range of characters + in a character class. For example, [d-m] matches any letter between d and + m, inclusive. If a minus character is required in a class, it must be + escaped with a backslash or appear in a position where it cannot be + interpreted as indicating a range, typically as the first or last + character in the class.
+ +The literal character "]" cannot be the end character of a range. A + pattern such as [W-]46] is interpreted as a class of two characters ("W" + and "-") followed by a literal string "46]", so it would match "W46]" or + "-46]". However, if "]" is escaped with a backslash, it is interpreted as + the end of range, so [W-\]46] is interpreted as a class containing a range + followed by two other characters. The octal or hexadecimal representation + of "]" can also be used to end a range.
+ +Ranges operate in the collating sequence of character values. They can + also be used for characters specified numerically, for example, + [\000-\037]. Ranges can include any characters that are valid for the + current mode.
+ +If a range that includes letters is used when caseless matching is set, + it matches the letters in either case. For example, [W-c] is equivalent to + [][\\^_`wxyzabc], matched caselessly. In a non-UTF mode, if character + tables for a French locale are in use, [\xc8-\xcb] matches accented E + characters in both cases. In UTF modes, PCRE supports the concept of case + for characters with values > 255 only when it is compiled with Unicode + property support.
+ +The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,
+ \w, and \W can appear in a character class, and add the characters that
+ they match to the class. For example, [\dABCDEF] matches any hexadecimal
+ digit. In UTF modes, option
A circumflex can conveniently be used with the uppercase character types + to specify a more restricted set of characters than the matching lowercase + type. For example, class [^\W_] matches any letter or digit, but not + underscore, while [\w] includes underscore. A positive character class + is to be read as "something OR something OR ..." and a negative class as + "NOT something AND NOT something AND NOT ...".
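
The negated-class idiom described above can be tried out with a short sketch. It uses Python's `re` module as a stand-in engine, which is an assumption made only for illustration: Python's `re` is not PCRE, but for ASCII input it treats `[^\W_]` and `\w` the same way. The sample strings and variable names are hypothetical.

```python
import re

# [^\W_] reads as "NOT a non-word character AND NOT underscore",
# that is, a letter or a digit. Plain \w also accepts underscore.
letter_or_digit = re.compile(r"[^\W_]")
word_char = re.compile(r"\w")

samples = ["a", "Z", "7", "_", "-"]

# Keep the samples that each class accepts as a single character.
narrow = [s for s in samples if letter_or_digit.fullmatch(s)]
wide = [s for s in samples if word_char.fullmatch(s)]

print(narrow)  # letters and digits only
print(wide)    # letters, digits, and underscore
```

The only difference between the two result lists is the underscore, which is exactly what the extra `_` inside the negated class excludes.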
+ +Only the following metacharacters are recognized in character + classes:
+ +However, escaping other non-alphanumeric characters does no harm.
+match exactly the same set of strings. Because alternative branches are tried -from left to right, and options are not reset until the end of the subpattern -is reached, an option setting in one branch does affect subsequent branches, so -the above patterns match "SUNDAY" as well as "Saturday".
+Perl supports the Posix notation for character classes. This uses names + enclosed by [: and :] within the enclosing square brackets. PCRE also + supports this notation. For example, the following matches "0", "1", any + alphabetic character, or "%":
+ +
+[01[:alpha:]%]
+
+ The following are the supported class names:
+ +The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), + and space (32). Notice that this list includes the VT character (code 11). + This makes "space" different to \s, which does not include VT (for Perl + compatibility).
+ +The name "word" is a Perl extension, and "blank" is a GNU extension from + Perl 5.8. Another Perl extension is negation, which is indicated by a ^ + character after the colon. For example, the following matches "1", "2", + or any non-digit:
+ +
+[12[:^digit:]]
+
+ PCRE (and Perl) also recognize the Posix syntax [.ch.] and [=ch=] where + "ch" is a "collating element", but these are not supported, and an error + is given if they are encountered.
+ +By default, in UTF modes, characters with values > 255 do not match
+ any of the Posix character classes. However, if option
Negated versions, such as [:^alpha:], use \P instead of \p. The other + Posix classes are unchanged, and match only characters with code points + < 256.
+Vertical bar characters are used to separate alternative patterns. For + example, the following pattern matches either "gilbert" or "sullivan":
+ +
+gilbert|sullivan
+
+ Any number of alternatives can appear, and an empty alternative is
+ permitted (matching the empty string). The matching process tries each
+ alternative in turn, from left to right, and the first that succeeds is
+ used. If the alternatives are within a subpattern (defined in section
+
The settings of the Perl-compatible options
For example,
The PCRE-specific options
When one of these option changes occurs at top level (that is, not inside + subpattern parentheses), the change applies to the remainder of the + pattern that follows. If the change is placed right at the start of a + pattern, PCRE extracts it into the global options.
+An option change within a subpattern (see section
+
+(a(?i)b)c
+
+ By this means, options can be made to have different settings in + different parts of the pattern. Any changes made in one alternative do + carry on into subsequent branches within the same subpattern. For + example:
+ +
+(a(?i)b|c)
+
+ matches "ab", "aB", "c", and "C", although when matching "C" the first + branch is abandoned before the option setting. This is because the effects + of option settings occur at compile time. There would be some weird + behavior otherwise.
-Perl 5.10 introduced a feature whereby each alternative in a subpattern uses -the same numbers for its capturing parentheses. Such a subpattern starts with -(?| and is itself a non-capturing subpattern. For example, consider this -pattern:
+Other PCRE-specific options can be set by the application when the
+ compiling or matching functions are called. Sometimes the pattern can
+ contain special leading sequences, such as (*CRLF), to override what
+ the application has set or what has been defaulted. Details are provided
+ in section
The (*UTF8) and (*UCP) leading sequences can be used to set UTF and
+ Unicode property modes. They are equivalent to setting options
+
+(?|(Sat)ur|(Sun))day
Subpatterns are delimited by parentheses (round brackets), which can be + nested. Turning part of a pattern into a subpattern does two things:
-Because the two alternatives are inside a (?| group, both sets of capturing -parentheses are numbered one. Thus, when the pattern matches, you can look -at captured substring number one, whichever alternative matched. This construct -is useful when you want to capture part, but not all, of one of a number of -alternatives. Inside a (?| group, parentheses are numbered as usual, but the -number is reset at the start of each branch. The numbers of any capturing -parentheses that follow the subpattern start after the highest number used in -any branch. The following example is taken from the Perl documentation. The -numbers underneath show in which buffer the captured content will be stored.
+It localizes a set of alternatives. For example, the following + pattern matches "cataract", "caterpillar", or "cat":
+
+cat(aract|erpillar|)
+ Without the parentheses, it would match "cataract", "erpillar", or an + empty string.
+It sets up the subpattern as a capturing subpattern. That is, when
+ the complete pattern matches, that portion of the subject string that
+ matched the subpattern is passed back to the caller through the
+ return value of
Opening parentheses are counted from left to right (starting from 1) to + obtain numbers for the capturing subpatterns. For example, if the string + "the red king" is matched against the following pattern, the captured + substrings are "red king", "red", and "king", and are numbered 1, 2, and + 3, respectively:
+ +
+the ((red|white) (king|queen))
+
+ It is not always helpful that plain parentheses fulfill two functions. + Often a grouping subpattern is required without a capturing requirement. + If an opening parenthesis is followed by a question mark and a colon, the + subpattern does not do any capturing, and is not counted when computing + the number of any subsequent capturing subpatterns. For example, if the + string "the white queen" is matched against the following pattern, the + captured substrings are "white queen" and "queen", and are numbered 1 and + 2:
+ +
+the ((?:red|white) (king|queen))
+
+ The maximum number of capturing subpatterns is 65535.
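
The numbering rules above can be checked with a small sketch. It again uses Python's `re` module as a stand-in (an assumption for illustration only; it is a different engine from PCRE, but it numbers capturing and non-capturing groups in the same left-to-right way). The variable names are hypothetical.

```python
import re

# Every plain opening parenthesis starts a capturing group; groups are
# numbered from 1 by counting opening parentheses from left to right.
capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
groups_capturing = capturing.groups()

# (?:...) only groups: it captures nothing and is skipped when numbering,
# so (king|queen) becomes group 2 instead of group 3.
non_capturing = re.fullmatch(
    r"the ((?:red|white) (king|queen))", "the white queen"
)
groups_non_capturing = non_capturing.groups()

print(groups_capturing)
print(groups_non_capturing)
```

The first pattern yields three captured substrings and the second yields two, matching the numbering given in the text.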
+ +As a convenient shorthand, if any option settings are required at the + start of a non-capturing subpattern, the option letters can appear between + "?" and ":". Thus, the following two patterns match the same set of + strings:
+ +
+(?i:saturday|sunday)
+(?:(?i)saturday|sunday)
+
+ As alternative branches are tried from left to right, and options are not + reset until the end of the subpattern is reached, an option setting in one + branch does affect subsequent branches, so the above patterns match both + "SUNDAY" and "Saturday".
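
A minimal sketch of the shorthand form, using Python's `re` module as a stand-in engine (an assumption for illustration; it is not PCRE). Only the `(?i:...)` form is shown, because recent Python versions reject an option setting such as `(?i)` in the middle of a pattern, so the second pattern above cannot be reproduced there. The subject strings are hypothetical.

```python
import re

# (?i:...) turns on caseless matching for everything inside the group,
# so the option covers both alternatives, not only the first one.
pattern = re.compile(r"(?i:saturday|sunday)")

subjects = ["SUNDAY", "Saturday", "sunday", "Monday"]
results = {s: bool(pattern.fullmatch(s)) for s in subjects}

for subject, matched in results.items():
    print(subject, matched)
```

Both "SUNDAY" and "Saturday" match, which is consistent with the option applying to every branch of the subpattern.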
+
- # before ---------------branch-reset----------- after
- / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
- # 1 2 2 3 2 3 4
-
-A back reference to a numbered subpattern uses the most recent value that is -set for that number by any subpattern. The following pattern matches "abcabc" -or "defdef":
- -- -/(?|(abc)|(def))\1/
In contrast, a subroutine call to a numbered subpattern always refers to the -first one in the pattern with the given number. The following pattern matches -"abcabc" or "defabc":
- -- -/(?|(abc)|(def))(?1)/
If a condition test -for a subpattern's having matched refers to a non-unique number, the test is -true if any of the subpatterns of that number have matched.
- -An alternative approach to using this "branch reset" feature is to use -duplicate named subpatterns, as described in the next section.
- -Identifying capturing parentheses by number is simple, but it can be very hard -to keep track of the numbers in complicated regular expressions. Furthermore, -if an expression is modified, the numbers may change. To help with this -difficulty, PCRE supports the naming of subpatterns. This feature was not -added to Perl until release 5.10. Python had the feature earlier, and PCRE -introduced it at release 4.0, using the Python syntax. PCRE now supports both -the Perl and the Python syntax. Perl allows identically numbered subpatterns to -have different names, but PCRE does not.
- -In PCRE, a subpattern can be named in one of three ways: -(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) -as in Python. References to capturing parentheses from other parts of -the pattern, such as back references, recursion, and conditions, can be -made by name as well as by number.
- -Names consist of up to 32 alphanumeric characters and underscores. Named
-capturing parentheses are still allocated numbers as well as names, exactly as
-if the names were not present.
-
-The
By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the
Perl 5.10 introduced a feature where each alternative in a subpattern
+ uses the same numbers for its capturing parentheses. Such a subpattern
+ starts with
+(?|(Sat)ur|(Sun))day
+
+ As the two alternatives are inside a
- (?<DN>Mon|Fri|Sun)(?:day)?|
- (?<DN>Tue)(?:sday)?|
- (?<DN>Wed)(?:nesday)?|
- (?<DN>Thu)(?:rsday)?|
- (?<DN>Sat)(?:urday)?
-
-There are five capturing substrings, but only one is ever set after a match. -(An alternative way of solving this problem is to use a "branch reset" -subpattern, as described in the previous section.)
- - - -In case of capturing named subpatterns which names are not unique, the first matching occurrence (counted from left to right in the subject) is returned from
Warning: You cannot use different names to distinguish between two
-subpatterns with the same number because PCRE uses only the numbers when
-matching. For this reason, an error is given at compile time if different names
-are given to subpatterns with the same number. However, you can give the same
-name to subpatterns with the same number, even when
Repetition is specified by quantifiers, which can follow any of the -following items:
- -The general repetition quantifier specifies a minimum and maximum number of -permitted matches, by giving the two numbers in curly brackets (braces), -separated by a comma. The numbers must be less than 65536, and the first must -be less than or equal to the second. For example:
- -- -z{2,4}
matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special -character. If the second number is omitted, but the comma is present, there is -no upper limit; if the second number and the comma are both omitted, the -quantifier specifies an exact number of required matches. Thus
- -- -[aeiou]{3,}
matches at least 3 successive vowels, but may match many more, while
- -- -\d{8}
matches exactly 8 digits. An opening curly bracket that appears in a position -where a quantifier is not allowed, or one that does not match the syntax of a -quantifier, is taken as a literal character. For example, {,6} is not a -quantifier, but a literal string of four characters.
- -In Unicode mode, quantifiers apply to characters rather than to individual data -units. Thus, for example, \x{100}{2} matches two characters, each of -which is represented by a two-byte sequence in a UTF-8 string. Similarly, -\X{3} matches three Unicode extended grapheme clusters, each of which may be -several data units long (and they may be of different lengths).
-The quantifier {0} is permitted, causing the expression to behave as if the -previous item and the quantifier were not present. This may be useful for -subpatterns that are referenced as subroutines -from elsewhere in the pattern (but see also the section entitled -"Defining subpatterns for use by reference only" -below). Items other than subpatterns that have a {0} quantifier are omitted -from the compiled pattern.
- -For convenience, the three most common quantifiers have single-character -abbreviations:
- -It is possible to construct infinite loops by following a -subpattern that can match no characters with a quantifier that has no -upper limit, for example:
- -- -(a?)*
Earlier versions of Perl and PCRE used to give an error at compile time for -such patterns. However, because there are cases where this can be useful, such -patterns are now accepted, but if any repetition of the subpattern does in fact -match no characters, the loop is forcibly broken.
- -By default, the quantifiers are "greedy", that is, they match as much as -possible (up to the maximum number of permitted times), without causing the -rest of the pattern to fail. The classic example of where this gives problems -is in trying to match comments in C programs. These appear between /* and */ -and within the comment, individual * and / characters may appear. An attempt to -match C comments by applying the pattern
- -- -/\*.*\*/
to the string
- -- -/* first comment */ not comment /* second comment */
fails, because it matches the entire string owing to the greediness of the .* -item.
- -However, if a quantifier is followed by a question mark, it ceases to be -greedy, and instead matches the minimum number of times possible, so the -pattern
- -- -/\*.*?\*/
does the right thing with the C comments. The meaning of the various -quantifiers is not otherwise changed, just the preferred number of matches. -Do not confuse this use of question mark with its use as a quantifier in its -own right. Because it has two uses, it can sometimes appear doubled, as in
- -- -\d??\d
which matches one digit by preference, but can match two if that is the only -way the rest of the pattern matches.
- -If the
When a parenthesized subpattern is quantified with a minimum repeat count that -is greater than 1 or with a limited maximum, more memory is required for the -compiled pattern, in proportion to the size of the minimum or maximum.
- -If a pattern starts with .* or .{0,} and the
In cases where it is known that the subject string contains no newlines, it is
-worth setting
However, there are some cases where the optimization cannot be used. When .* -is inside capturing parentheses that are the subject of a back reference -elsewhere in the pattern, a match at the start may fail where a later one -succeeds. Consider, for example:
- -- -(.*)abc\1
If the subject is "xyz123abc123" the match point is the fourth character. For -this reason, such a pattern is not implicitly anchored.
- -Another case where implicit anchoring is not applied is when the leading .* is -inside an atomic group. Once again, a match at the start may fail where a later -one succeeds. Consider this pattern:
- -- -(?>.*?a)b
It matches "ab" in the subject "aab". The use of the backtracking control verbs -(*PRUNE) and (*SKIP) also disable this optimization.
- -When a capturing subpattern is repeated, the value captured is the substring -that matched the final iteration. For example, after
- -- -(tweedle[dume]{3}\s*)+
has matched "tweedledum tweedledee" the value of the captured substring is -"tweedledee". However, if there are nested capturing subpatterns, the -corresponding captured values may have been set in previous iterations. For -example, after
- -- -/(a|(b))+/
matches "aba" the value of the second captured substring is "b".
- - -With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") -repetition, failure of what follows normally causes the repeated item to be -re-evaluated to see if a different number of repeats allows the rest of the -pattern to match. Sometimes it is useful to prevent this, either to change the -nature of the match, or to cause it fail earlier than it otherwise might, when -the author of the pattern knows there is no point in carrying on.
+A back reference to a numbered subpattern uses the most recent value that + is set for that number by any subpattern. The following pattern matches + "abcabc" or "defdef":
-Consider, for example, the pattern \d+foo when applied to the subject line
+
+/(?|(abc)|(def))\1/
-+123456bar
In contrast, a subroutine call to a numbered subpattern always refers to + the first one in the pattern with the given number. The following pattern + matches "abcabc" or "defabc":
-After matching all 6 digits and then failing to match "foo", the normal -action of the matcher is to try again with only 5 digits matching the \d+ -item, and then with 4, and so on, before ultimately failing. "Atomic grouping" -(a term taken from Jeffrey Friedl's book) provides the means for specifying -that once a subpattern has matched, it is not to be re-evaluated in this way.
+
+/(?|(abc)|(def))(?1)/
-If we use atomic grouping for the previous example, the matcher gives up -immediately on failing to match "foo" the first time. The notation is a kind of -special parenthesis, starting with (?> as in this example:
+If a condition test for a subpattern having matched refers to a + non-unique number, the test is true if any of the subpatterns of that + number have matched.
-- -(?>\d+)foo
This kind of parenthesis "locks up" the part of the pattern it contains once -it has matched, and a failure further into the pattern is prevented from -backtracking into it. Backtracking past it to previous items, however, works as -normal.
- -An alternative description is that a subpattern of this type matches the string -of characters that an identical standalone pattern would match, if anchored at -the current point in the subject string.
- -Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as -the above example can be thought of as a maximizing repeat that must swallow -everything it can. So, while both \d+ and \d+? are prepared to adjust the -number of digits they match in order to make the rest of the pattern match, -(?>\d+) can only match an entire sequence of digits.
- -Atomic groups in general can of course contain arbitrarily complicated -subpatterns, and can be nested. However, when the subpattern for an atomic -group is just a single repeated item, as in the example above, a simpler -notation, called a "possessive quantifier" can be used. This consists of an -additional + character following a quantifier. Using this notation, the -previous example can be rewritten as
- -- -\d++foo
Note that a possessive quantifier can be used with an entire group, for -example:
- -- -(abc|xyz){2,3}+
Possessive quantifiers are always greedy; the setting of the
The possessive quantifier syntax is an extension to the Perl 5.8 syntax. -Jeffrey Friedl originated the idea (and the name) in the first edition of his -book. Mike McCloskey liked it, so implemented it when he built Sun's Java -package, and PCRE copied it from there. It ultimately found its way into Perl -at release 5.10.
- -PCRE has an optimization that automatically "possessifies" certain simple -pattern constructs. For example, the sequence A+B is treated as A++B because -there is no point in backtracking into a sequence of A's when B must follow.
+An alternative approach using this "branch reset" feature is to use + duplicate named subpatterns, as described in the next section.
+When a pattern contains an unlimited repeat inside a subpattern that can itself -be repeated an unlimited number of times, the use of an atomic group is the -only way to avoid some failing matches taking a very long time indeed. The -pattern
+Identifying capturing parentheses by number is simple, but it can be + hard to keep track of the numbers in complicated regular expressions. + Also, if an expression is modified, the numbers can change. To help with + this difficulty, PCRE supports the naming of subpatterns. This feature was + not added to Perl until release 5.10. Python had the feature earlier, and + PCRE introduced it at release 4.0, using the Python syntax. PCRE now + supports both the Perl and the Python syntax. Perl allows identically + numbered subpatterns to have different names, but PCRE does not.
+ +In PCRE, a subpattern can be named in one of three ways:
+
Names consist of up to 32 alphanumeric characters and underscores. Named
+ capturing parentheses are still allocated numbers as well as names,
+ exactly as if the names were not present.
+ The
+(\D+|<\d+>)*[!?]
By default, a name must be unique within a pattern, but this constraint
+ can be relaxed by setting option
+(?<DN>Mon|Fri|Sun)(?:day)?|
+(?<DN>Tue)(?:sday)?|
+(?<DN>Wed)(?:nesday)?|
+(?<DN>Thu)(?:rsday)?|
+(?<DN>Sat)(?:urday)?
+
+ There are five capturing substrings, but only one is ever set after a + match. (An alternative way of solving this problem is to use a "branch + reset" subpattern, as described in the previous section.)
+ +For capturing named subpatterns whose names are not unique, the first
+ matching occurrence (counted from left to right in the subject) is
+ returned from
matches an unlimited number of substrings that either consist of non-digits, or -digits enclosed in <>, followed by either ! or ?. When it matches, it runs -quickly. However, if it is applied to
+You cannot use different names to distinguish between two subpatterns
+ with the same number, as PCRE uses only the numbers when matching. For
+ this reason, an error is given at compile time if different names are
+ specified to subpatterns with the same number. However, you can specify
+ the same name to subpatterns with the same number, even when
+
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Repetition is specified by quantifiers, which can follow any of the + following items:
+ +The general repetition quantifier specifies a minimum and maximum number + of permitted matches, by giving the two numbers in curly brackets + (braces), separated by a comma. The numbers must be < 65536, and the + first must be less than or equal to the second. For example, the following + matches "zz", "zzz", or "zzzz":
+ +
+z{2,4}
+
+ A closing brace on its own is not a special character. If the second + number is omitted, but the comma is present, there is no upper limit. If + the second number and the comma are both omitted, the quantifier specifies + an exact number of required matches. Thus, the following matches at least + three successive vowels, but can match many more:
+ +
+[aeiou]{3,}
+
+ The following matches exactly eight digits:
+ +
+\d{8}
+
+ An opening curly bracket that appears in a position where a quantifier is + not allowed, or one that does not match the syntax of a quantifier, is + taken as a literal character. For example, {,6} is not a quantifier, but a + literal string of four characters.
+ +In Unicode mode, quantifiers apply to characters rather than to + individual data units. Thus, for example, \x{100}{2} matches two + characters, each of which is represented by a 2-byte sequence in a + UTF-8 string. Similarly, \X{3} matches three Unicode extended grapheme + clusters, each of which can be many data units long (and they can be of + different lengths).
+ +The quantifier {0} is permitted, causing the expression to behave as if
+ the previous item and the quantifier were not present. This can be useful
+ for subpatterns that are referenced as subroutines from elsewhere in the
+ pattern (but see also section
For convenience, the three most common quantifiers have single-character + abbreviations:
+ +Infinite loops can be constructed by following a subpattern that can + match no characters with a quantifier that has no upper limit, for + example:
+ +
+(a?)*
+
+ Earlier versions of Perl and PCRE used to give an error at compile time + for such patterns. However, as there are cases where this can be useful, + such patterns are now accepted, but if any repetition of the + subpattern matches no characters, the loop is forcibly broken.
+ +By default, the quantifiers are "greedy", that is, they match as much as + possible (up to the maximum number of permitted times), without causing + the remaining pattern to fail. The classic example of where this gives + problems is in trying to match comments in C programs. These appear + between /* and */. Within the comment, individual * and / characters can + appear. An attempt to match C comments by applying the pattern
+ +
+/\*.*\*/
+
+ to the string
+ +
+/* first comment */ not comment /* second comment */
+
+ fails, as it matches the entire string owing to the greediness of the .* + item.
+ +However, if a quantifier is followed by a question mark, it ceases to be + greedy, and instead matches the minimum number of times possible, so the + following pattern does the right thing with the C comments:
+ +
+/\*.*?\*/
+
+ The meaning of the various quantifiers is not otherwise changed, only + the preferred number of matches. Do not confuse this use of question mark + with its use as a quantifier in its own right. As it has two uses, it can + sometimes appear doubled, as in
+ +
+\d??\d
+
+ which matches one digit by preference, but can match two if that is the + only way the remaining pattern matches.
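The difference between the greedy and the lazy form can be checked from Erlang with re:run/3. The following is a minimal sketch, not part of the original text; the module and function names are hypothetical. Notice that each backslash of the pattern is doubled inside an Erlang string literal.

```erlang
-module(greedy_demo).
-export([greedy/0, lazy/0]).

%% The subject string from the C comment example.
subject() ->
    "/* first comment */ not comment /* second comment */".

%% Greedy .* runs on to the last "*/", so the whole subject is matched.
greedy() ->
    re:run(subject(), "/\\*.*\\*/", [{capture, first, list}]).

%% Lazy .*? stops at the first "*/", so only the first comment is matched.
lazy() ->
    re:run(subject(), "/\\*.*?\\*/", [{capture, first, list}]).
```

Under these assumptions, greedy_demo:greedy() returns the entire subject as the match, while greedy_demo:lazy() returns {match,["/* first comment */"]}.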
-it takes a long time before reporting failure. This is because the string can -be divided between the internal \D+ repeat and the external * repeat in a -large number of ways, and all have to be tried. (The example uses [!?] rather -than a single character at the end, because both PCRE and Perl have an -optimization that allows for fast failure when a single character is used. They -remember the last single character that is required for a match, and fail early -if it is not present in the string.) If the pattern is changed so that it uses -an atomic group, like this:
+If option
+((?>\D+)|<\d+>)*[!?]
When a parenthesized subpattern is quantified with a minimum repeat count + that is > 1 or with a limited maximum, more memory is required for the + compiled pattern, in proportion to the size of the minimum or maximum.
-sequences of non-digits cannot be broken, and failure happens quickly.
- -Outside a character class, a backslash followed by a digit greater than 0 (and -possibly further digits) is a back reference to a capturing subpattern earlier -(that is, to its left) in the pattern, provided there have been that many -previous capturing left parentheses.
- -However, if the decimal number following the backslash is less than 10, it is -always taken as a back reference, and causes an error only if there are not -that many capturing left parentheses in the entire pattern. In other words, the -parentheses that are referenced need not be to the left of the reference for -numbers less than 10. A "forward back reference" of this type can make sense -when a repetition is involved and the subpattern to the right has participated -in an earlier iteration.
- -It is not possible to have a numerical "forward back reference" to -a subpattern whose number is 10 or more using this syntax because a -sequence such as \50 is interpreted as a character defined in -octal. See the subsection entitled "Non-printing characters" above for -further details of the handling of digits following a backslash. There -is no such problem when named parentheses are used. A back reference -to any subpattern is possible using named parentheses (see below).
- -Another way of avoiding the ambiguity inherent in the use of digits following a -backslash is to use the \g escape sequence. This escape must be followed by an -unsigned number or a negative number, optionally enclosed in braces. These -examples are all identical:
- -An unsigned number specifies an absolute reference without the -ambiguity that is present in the older syntax. It is also useful when -literal digits follow the reference. A negative number is a relative -reference. Consider this example:
- -- -(abc(def)ghi)\g{-1}
The sequence \g{-1} is a reference to the most recently started capturing -subpattern before \g, that is, is it equivalent to \2 in this example. -Similarly, \g{-2} would be equivalent to \1. The use of relative references -can be helpful in long patterns, and also in patterns that are created by -joining together fragments that contain references within themselves.
- -A back reference matches whatever actually matched the capturing -subpattern in the current subject string, rather than anything -matching the subpattern itself (see "Subpatterns as subroutines" below -for a way of doing that). So the pattern
- -- -(sens|respons)e and \1ibility
matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If caseful matching is in force at the time of the -back reference, the case of letters is relevant. For example,
- -- -((?i)rah)\s+\1
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original -capturing subpattern is matched caselessly.
- -There are several different ways of writing back references to named -subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or -\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified -back reference syntax, in which \g can be used for both numeric and named -references, is also supported. We could rewrite the above example in any of -the following ways:
- -A subpattern that is referenced by name may appear in the pattern before or -after the reference.
- -There may be more than one back reference to the same subpattern. If a -subpattern has not actually been used in a particular match, any back -references to it always fail. For example, the pattern
- -- -(a|(bc))\2
always fails if it starts to match "a" rather than "bc". Because
-there may be many capturing parentheses in a pattern, all digits
-following the backslash are taken as part of a potential back
-reference number. If the pattern continues with a digit character,
-some delimiter must be used to terminate the back reference. If the
-
Recursive back references
- -A back reference that occurs inside the parentheses to which it refers fails -when the subpattern is first used, so, for example, (a\1) never matches. -However, such references can be useful inside repeated subpatterns. For -example, the pattern
- -- -(a|b\1)+
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of -the subpattern, the back reference matches the character string corresponding -to the previous iteration. In order for this to work, the pattern must be such -that the first iteration does not need to match the back reference. This can be -done using alternation, as in the example above, or by a quantifier with a -minimum of zero.
- -Back references of this type cause the group that they reference to be treated -as an atomic group. -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group.
- -An assertion is a test on the characters following or preceding the current -matching point that does not actually consume any characters. The simple -assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described -above.
- - -More complicated assertions are coded as subpatterns. There are two kinds: -those that look ahead of the current position in the subject string, and those -that look behind it. An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed.
- -Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of -numbering the capturing subpatterns in the whole pattern. However, substring -capturing is carried out only for positive assertions. (Perl sometimes, but not -always, does do capturing in negative assertions.)
- -For compatibility with Perl, assertion subpatterns may be repeated; though -it makes no sense to assert the same thing several times, the side effect of -capturing parentheses may occasionally be useful. In practice, there only three -cases:
- -If a pattern starts with .* or .{0,} and option
Lookahead assertions
+In cases where it is known that the subject string contains no newlines,
+ it is worth setting
Lookahead assertions start with (?= for positive assertions and (?! for -negative assertions. For example,
+However, there are some cases where the optimization cannot be used. When + .* is inside capturing parentheses that are the subject of a back + reference elsewhere in the pattern, a match at the start can fail where a + later one succeeds. Consider, for example:
+ +
+(.*)abc\1
-+\w+(?=;)
If the subject is "xyz123abc123", the match point is the fourth + character. Therefore, such a pattern is not implicitly anchored.
-matches a word followed by a semicolon, but does not include the semicolon in -the match, and
+Another case where implicit anchoring is not applied is when the leading + .* is inside an atomic group. Once again, a match at the start can fail + where a later one succeeds. Consider the following pattern:
-+foo(?!bar)
+(?>.*?a)b
-matches any occurrence of "foo" that is not followed by "bar". Note that the -apparently similar pattern
+It matches "ab" in the subject "aab". The use of the backtracking control + verbs (*PRUNE) and (*SKIP) also disables this optimization.
-+(?!foo)bar
When a capturing subpattern is repeated, the value captured is the + substring that matched the final iteration. For example, after
-does not find an occurrence of "bar" that is preceded by something other than -"foo"; it finds any occurrence of "bar" whatsoever, because the assertion -(?!foo) is always true when the next three characters are "bar". A -lookbehind assertion is needed to achieve the other effect.
+
+(tweedle[dume]{3}\s*)+
-If you want to force a matching failure at some point in a pattern, the most -convenient way to do it is with (?!) because an empty string always matches, so -an assertion that requires there not to be an empty string must always fail. -The backtracking control verb (*FAIL) or (*F) is a synonym for (?!).
+has matched "tweedledum tweedledee", the value of the captured substring + is "tweedledee". However, if there are nested capturing subpatterns, the + corresponding captured values can have been set in previous iterations. + For example, after
+
+/(a|(b))+/
-Lookbehind assertions
+matches "aba", the value of the second captured substring is "b".
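The two capture rules above can be observed with re:run/3. This is an illustrative sketch only; the module and function names are hypothetical, and the pattern delimiters (the slashes) are dropped because re:run/3 takes the bare pattern.

```erlang
-module(repeat_capture_demo).
-export([last_iteration/0, nested/0]).

%% Group 1 is repeated twice; only the substring from the final
%% iteration is kept.
last_iteration() ->
    re:run("tweedledum tweedledee",
           "(tweedle[dume]{3}\\s*)+",
           [{capture, [1], list}]).

%% Group 1 ends up holding the last "a", but group 2 still holds the
%% "b" that was set during the second iteration.
nested() ->
    re:run("aba", "(a|(b))+", [{capture, [1, 2], list}]).
```

With these definitions, last_iteration() returns {match,["tweedledee"]} and nested() returns {match,["a","b"]}.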
+Lookbehind assertions start with (?<= for positive assertions and (?<! for -negative assertions. For example,
+With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") + repetition, failure of what follows normally causes the repeated item to + be re-evaluated to see if a different number of repeats allows the + remaining pattern to match. Sometimes it is useful to prevent this, either + to change the nature of the match, or to cause it to fail earlier than it + otherwise might, when the author of the pattern knows that there is no + point in carrying on.
+ +Consider, for example, the pattern \d+foo when applied to the following + subject line:
+ +
+123456bar
+
+ After matching all six digits and then failing to match "foo", the normal + action of the matcher is to try again with only five digits matching item + \d+, and then with four, and so on, before ultimately failing. "Atomic + grouping" (a term taken from Jeffrey Friedl's book) provides the means for + specifying that once a subpattern has matched, it is not to be + re-evaluated in this way.
+ +If atomic grouping is used for the previous example, the matcher gives up
+ immediately on failing to match "foo" the first time. The notation is a
+ kind of special parenthesis, starting with
+(?>\d+)foo
+
+ This kind of parenthesis "locks up" the part of the pattern it contains + once it has matched, and a failure further into the pattern is prevented + from backtracking into it. Backtracking past it to previous items, + however, works as normal.
+ +An alternative description is that a subpattern of this type matches the + string of characters that an identical standalone pattern would match, if + anchored at the current point in the subject string.
+ +Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+ such as the above example can be thought of as a maximizing repeat that
+ must swallow everything it can. So, while both \d+ and \d+? are prepared
+ to adjust the number of digits they match to make the remaining pattern
+ match,
Atomic groups in general can contain arbitrarily complicated + subpatterns, and can be nested. However, when the subpattern for an atomic + group is just a single repeated item, as in the example above, a simpler + notation, called a "possessive quantifier", can be used. This consists of + an extra + character following a quantifier. Using this notation, the + previous example can be rewritten as
+ +
+\d++foo
+
+ Notice that a possessive quantifier can be used with an entire group, + for example:
+ +
+(abc|xyz){2,3}+
+
+ Possessive quantifiers are always greedy; the setting of option
+
The possessive quantifier syntax is an extension to the Perl 5.8 syntax. + Jeffrey Friedl originated the idea (and the name) in the first edition of + his book. Mike McCloskey liked it, so implemented it when he built the + Sun Java package, and PCRE copied it from there. It ultimately found its + way into Perl at release 5.10.
+ +PCRE has an optimization that automatically "possessifies" certain simple + pattern constructs. For example, the sequence A+B is treated as A++B, as + there is no point in backtracking into a sequence of A's when B must + follow.
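A case where giving up backtracking changes the result, rather than only the running time, can make the behavior easier to see. The sketch below is an illustration under assumed names (the module and its functions are hypothetical): the item after the repeat is another \d, so the plain quantifier must hand back one digit for the match to succeed.

```erlang
-module(possessive_demo).
-export([backtracking/0, atomic/0, possessive/0]).

%% \d+ first takes all three digits, then gives one back so that the
%% trailing \d can match.
backtracking() ->
    re:run("123", "\\d+\\d", [{capture, first, list}]).

%% The atomic group keeps every digit it matched, so the trailing \d
%% never finds a digit and the match fails at each start position.
atomic() ->
    re:run("123", "(?>\\d+)\\d", [{capture, first, list}]).

%% The possessive quantifier is shorthand for the atomic group above.
possessive() ->
    re:run("123", "\\d++\\d", [{capture, first, list}]).
```

Here backtracking() returns {match,["123"]}, while atomic() and possessive() both return nomatch.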
+ +When a pattern contains an unlimited repeat inside a subpattern that can + itself be repeated an unlimited number of times, the use of an atomic + group is the only way to avoid some failing matches taking a long time. + The pattern
+ +
+(\D+|<\d+>)*[!?]
+
+ matches an unlimited number of substrings that either consist of + non-digits, or digits enclosed in <>, followed by ! or ?. When it + matches, it runs quickly. However, if it is applied to
+ +
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+ it takes a long time before reporting failure. This is because the string + can be divided between the internal \D+ repeat and the external * repeat + in many ways, and all must be tried. (The example uses [!?] rather than a + single character at the end, as both PCRE and Perl have an optimization + that allows for fast failure when a single character is used. They + remember the last single character that is required for a match, and fail + early if it is not present in the string.) If the pattern is changed so + that it uses an atomic group, like the following, sequences of non-digits + cannot be broken, and failure happens quickly:
+ +
+((?>\D+)|<\d+>)*[!?]
+ +(?<!foo)bar
Outside a character class, a backslash followed by a digit > 0 (and + possibly further digits) is a back reference to a capturing subpattern + earlier (that is, to its left) in the pattern, provided there have been + that many previous capturing left parentheses.
+ +However, if the decimal number following the backslash is < 10, it is + always taken as a back reference, and causes an error only if there are + not that many capturing left parentheses in the entire pattern. That is, + the parentheses that are referenced need not be to the left of the + reference for numbers < 10. A "forward back reference" of this type can + make sense when a repetition is involved and the subpattern to the right + has participated in an earlier iteration.
+ +It is not possible to have a numerical "forward back reference" to a
+ subpattern whose number is 10 or more using this syntax, as a sequence
+ such as \50 is interpreted as a character defined in octal. For more
+ details of the handling of digits following a backslash, see section
+
Another way to avoid the ambiguity inherent in the use of digits + following a backslash is to use the \g escape sequence. This escape must + be followed by an unsigned number or a negative number, optionally + enclosed in braces. The following examples are identical:
+ +
+(ring), \1
+(ring), \g1
+(ring), \g{1}
+
+ An unsigned number specifies an absolute reference without the ambiguity + that is present in the older syntax. It is also useful when literal digits + follow the reference. A negative number is a relative reference. Consider + the following example:
+ +
+(abc(def)ghi)\g{-1}
+
+ The sequence \g{-1} is a reference to the most recently started capturing + subpattern before \g, that is, it is equivalent to \2 in this example. + Similarly, \g{-2} would be equivalent to \1. The use of relative + references can be helpful in long patterns, and also in patterns that are + created by joining fragments containing references within themselves.
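Relative references can be tried out as follows. This is a small sketch with hypothetical module and function names; option {capture, none} makes re:run/3 return only match or nomatch.

```erlang
-module(relative_ref_demo).
-export([inner/1, outer/1]).

%% \g{-1} refers to the most recently started group before it,
%% that is (def), so it behaves like \2 here.
inner(Subject) ->
    re:run(Subject, "(abc(def)ghi)\\g{-1}", [{capture, none}]).

%% \g{-2} refers to the group started before that one,
%% that is (abc(def)ghi), so it behaves like \1 here.
outer(Subject) ->
    re:run(Subject, "(abc(def)ghi)\\g{-2}", [{capture, none}]).
```

For example, inner("abcdefghidef") and outer("abcdefghiabcdefghi") return match, whereas inner("abcdefghiabcdefghi") returns nomatch.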
+ +A back reference matches whatever matched the capturing subpattern in the
+ current subject string, rather than anything matching the subpattern
+ itself (section
+(sens|respons)e and \1ibility
+
+ If caseful matching is in force at the time of the back reference, the + case of letters is relevant. For example, the following matches "rah rah" + and "RAH RAH", but not "RAH rah", although the original capturing + subpattern is matched caselessly:
+ +
+((?i)rah)\s+\1
+
+ There are many different ways of writing back references to named
+ subpatterns. The .NET syntax
+(?<p1>(?i)rah)\s+\k<p1>
+(?'p1'(?i)rah)\s+\k{p1}
+(?P<p1>(?i)rah)\s+(?P=p1)
+(?<p1>(?i)rah)\s+\g{p1}
+
+ A subpattern that is referenced by name can appear in the pattern before + or after the reference.
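The four spellings above can be compared side by side. The following sketch uses hypothetical names (the module, patterns/0, and matches/1 are not part of the re module) and runs every variant against the same subject.

```erlang
-module(named_backref_demo).
-export([matches/1]).

%% The four equivalent ways of writing the named back reference.
patterns() ->
    ["(?<p1>(?i)rah)\\s+\\k<p1>",
     "(?'p1'(?i)rah)\\s+\\k{p1}",
     "(?P<p1>(?i)rah)\\s+(?P=p1)",
     "(?<p1>(?i)rah)\\s+\\g{p1}"].

%% Returns one match | nomatch result per pattern variant.
matches(Subject) ->
    [re:run(Subject, Pattern, [{capture, none}]) || Pattern <- patterns()].
```

All variants are expected to agree: matches("rah rah") and matches("RAH RAH") give four match results, and matches("RAH rah") gives four nomatch results, because the back reference itself is matched casefully.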
+ +There can be more than one back reference to the same subpattern. If a + subpattern has not been used in a particular match, any back references to + it always fail. For example, the following pattern always fails if it + starts to match "a" rather than "bc":
+ +
+(a|(bc))\2
+
+ As there can be many capturing parentheses in a pattern, all digits
+ following the backslash are taken as part of a potential back reference
+ number. If the pattern continues with a digit character, some delimiter
+ must be used to terminate the back reference. If option
Recursive Back References
+ +A back reference that occurs inside the parentheses to which it refers + fails when the subpattern is first used, so, for example, (a\1) never + matches. However, such references can be useful inside repeated + subpatterns. For example, the following pattern matches any number of + "a"s and also "aba", "ababbaa", and so on:
+ +
+(a|b\1)+
+
+ At each iteration of the subpattern, the back reference matches the + character string corresponding to the previous iteration. In order for + this to work, the pattern must be such that the first iteration does not + need to match the back reference. This can be done using alternation, as + in the example above, or by a quantifier with a minimum of zero.
+ +Back references of this type cause the group that they reference to be + treated as an atomic group. Once the whole group has been matched, a + subsequent matching failure cannot cause backtracking into the middle of + the group.
+does find an occurrence of "bar" that is not preceded by "foo". The contents of -a lookbehind assertion are restricted such that all the strings it matches must -have a fixed length. However, if there are several top-level alternatives, they -do not all have to have the same fixed length. Thus
+An assertion is a test on the characters following or preceding the + current matching point that does not consume any characters. The simple + assertions coded as \b, \B, \A, \G, \Z, \z, ^, and $ are described in + the previous sections.
+ +More complicated assertions are coded as subpatterns. There are two + kinds: those that look ahead of the current position in the subject + string, and those that look behind it. An assertion subpattern is matched + in the normal way, except that it does not cause the current matching + position to be changed.
+ +Assertion subpatterns are not capturing subpatterns. If such an assertion + contains capturing subpatterns within it, these are counted for the + purposes of numbering the capturing subpatterns in the whole pattern. + However, substring capturing is done only for positive assertions. (Perl + sometimes, but not always, performs capturing in negative assertions.)
+ +For compatibility with Perl, assertion subpatterns can be repeated. + Although it makes no sense to assert the same thing many times, the side + effect of capturing parentheses can occasionally be useful. In practice, + there are only three cases:
+ +If the quantifier is {0}, the assertion is never obeyed during + matching. However, it can contain internal capturing parenthesized + groups that are called from elsewhere through the subroutine + mechanism.
+If the quantifier is {0,n}, where n > 0, it is treated as if it was + {0,1}. At runtime, the remaining pattern match is tried with and + without the assertion; the order depends on the greediness of the + quantifier.
+If the minimum repetition is > 0, the quantifier is ignored. The + assertion is obeyed only once when encountered during matching.
++(?<=bullock|donkey)
Lookahead Assertions
-is permitted, but
+Lookahead assertions start with (?= for positive assertions and (?! for + negative assertions. For example, the following matches a word followed by + a semicolon, but does not include the semicolon in the match:
-+(?<!dogs?|cats?)
+\w+(?=;)
-causes an error at compile time. Branches that match different length strings -are permitted only at the top level of a lookbehind assertion. This is an -extension compared with Perl, which requires all branches to -match the same length of string. An assertion such as
+The following matches any occurrence of "foo" that is not followed by + "bar":
-+(?<=ab(c|de))
+foo(?!bar)
-is not permitted, because its single top-level branch can match two different -lengths, but it is acceptable to PCRE if rewritten to use two top-level -branches:
+Notice that the apparently similar pattern
-+(?<=abc|abde)
+(?!foo)bar
-In some cases, the escape sequence \K (see above) can be -used instead of a lookbehind assertion to get round the fixed-length -restriction.
+does not find an occurrence of "bar" that is preceded by something other + than "foo". It finds any occurrence of "bar" whatsoever, as the assertion + (?!foo) is always true when the next three characters are "bar". A + lookbehind assertion is needed to achieve the other effect.
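The difference between these patterns can be checked with a small sketch. It assumes Python's standard `re` module purely as a stand-in engine (it is not PCRE, and it is not the library documented here); for these simple assertions the behavior is the same. The subject strings are made up for illustration.

```python
import re

# Assumption: Python's re module stands in for PCRE; only the patterns
# themselves come from the text above.

# \w+(?=;) matches a word followed by a semicolon, excluding the semicolon.
word = re.search(r"\w+(?=;)", "x = value;")

# foo(?!bar) matches "foo" only where it is not followed by "bar".
not_followed = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar matches every "bar": at the point where "bar" begins, the next
# three characters are "bar", so they can never be "foo".
any_bar = [m.start() for m in re.finditer(r"(?!foo)bar", "foobar xbar")]

# (?<!foo)bar is the lookbehind form that gives the intended effect.
not_preceded = [m.start() for m in re.finditer(r"(?<!foo)bar", "foobar xbar")]

print(word.group(), not_followed, any_bar, not_preceded)
```

The lookahead in `(?!foo)bar` inspects the same characters that `bar` is about to consume, which is why it can never reject a match.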
-The implementation of lookbehind assertions is, for each alternative, to -temporarily move the current position back by the fixed length and then try to -match. If there are insufficient characters before the current position, the -assertion fails.
+If you want to force a matching failure at some point in a pattern, the + most convenient way to do it is with (?!), as an empty string always + matches. So, an assertion that requires there not to be an empty + string must always fail. The backtracking control verb (*FAIL) or (*F) is + a synonym for (?!).
-In a UTF mode, PCRE does not allow the \C escape (which matches a single data -unit even in a UTF mode) to appear in lookbehind assertions, because it makes -it impossible to calculate the length of the lookbehind. The \X and \R -escapes, which can match different numbers of data units, are also not -permitted.
-"Subroutine" calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long -as the subpattern matches a fixed-length string. Recursion, -however, is not supported.
+Lookbehind Assertions
-Possessive quantifiers can be used in conjunction with lookbehind assertions to -specify efficient matching of fixed-length strings at the end of subject -strings. Consider a simple pattern such as
+Lookbehind assertions start with (?<= for positive assertions and + (?<! for negative assertions. For example, the following finds an + occurrence of "bar" that is not preceded by "foo":
-+abcd$
+(?<!foo)bar
-when applied to a long string that does not match. Because matching proceeds -from left to right, PCRE will look for each "a" in the subject and then see if -what follows matches the rest of the pattern. If the pattern is specified as
+The contents of a lookbehind assertion are restricted such that all the + strings it matches must have a fixed length. However, if there are many + top-level alternatives, they do not all have to have the same fixed + length. Thus, the following is permitted:
-+^.*abcd$
+(?<=bullock|donkey)
-the initial .* matches the entire string at first, but when this fails (because -there is no following "a"), it backtracks to match all but the last character, -then all but the last two characters, and so on. Once again the search for "a" -covers the entire string, from right to left, so we are no better off. However, -if the pattern is written as
+The following causes an error at compile time:
-+^.*+(?<=abcd)
+(?<!dogs?|cats?)
-there can be no backtracking for the .*+ item; it can match only the entire -string. The subsequent lookbehind assertion does a single test on the last four -characters. If it fails, the match fails immediately. For long strings, this -approach makes a significant difference to the processing time.
+Branches that match different length strings are permitted only at the + top level of a lookbehind assertion. This is an extension compared with + Perl, which requires all branches to match the same length of string. An + assertion such as the following is not permitted, as its single top-level + branch can match two different lengths:
-Using multiple assertions
+
+(?<=ab(c|de))
-Several assertions (of any sort) may occur in succession. For example,
+However, it is acceptable to PCRE if rewritten to use two top-level + branches:
-+(?<=\d{3})(?<!999)foo
+(?<=abc|abde)
-matches "foo" preceded by three digits that are not "999". Notice -that each of the assertions is applied independently at the same point -in the subject string. First there is a check that the previous three -characters are all digits, and then there is a check that the same -three characters are not "999". This pattern does not match -"foo" preceded by six characters, the first of which are digits and -the last three of which are not "999". For example, it doesn't match -"123abcfoo". A pattern to do that is
+Sometimes the escape sequence \K (see above) can be used instead of + a lookbehind assertion to get round the fixed-length restriction.
-+(?<=\d{3}...)(?<!999)foo
The implementation of lookbehind assertions is, for each alternative, to + move the current position back temporarily by the fixed length and then + try to match. If there are insufficient characters before the current + position, the assertion fails.
-This time the first assertion looks at the preceding six -characters, checking that the first three are digits, and then the -second assertion checks that the preceding three characters are not -"999".
+In a UTF mode, PCRE does not allow the \C escape (which matches a single + data unit even in a UTF mode) to appear in lookbehind assertions, as it + makes it impossible to calculate the length of the lookbehind. The \X and + \R escapes, which can match different numbers of data units, are not + permitted either.
-Assertions can be nested in any combination. For example,
+"Subroutine" calls (see below), such as (?2) or (?&X), are permitted + in lookbehinds, as long as the subpattern matches a fixed-length string. + Recursion, however, is not supported.
-+(?<=(?<!foo)bar)baz
Possessive quantifiers can be used with lookbehind + assertions to specify efficient matching of fixed-length strings at the + end of subject strings. Consider the following simple pattern when applied + to a long string that does not match:
-matches an occurrence of "baz" that is preceded by "bar" which in -turn is not preceded by "foo", while
+
+abcd$
-+(?<=\d{3}(?!999)...)foo
As matching proceeds from left to right, PCRE looks for each "a" in the + subject and then sees if what follows matches the remaining pattern. If + the pattern is specified as
-is another pattern that matches "foo" preceded by three digits and any three -characters that are not "999".
+
+^.*abcd$
-the initial .* matches the entire string at first. However, when this + fails (as there is no following "a"), it backtracks to match all but the + last character, then all but the last two characters, and so on. Once + again the search for "a" covers the entire string, from right to left, so + we are no better off. However, if the pattern is written as
-
+^.*+(?<=abcd)
-It is possible to cause the matching process to obey a subpattern -conditionally or to choose between two alternative subpatterns, depending on -the result of an assertion, or whether a specific capturing subpattern has -already been matched. The two possible forms of conditional subpattern are:
+there can be no backtracking for the .*+ item; it can match only the + entire string. The subsequent lookbehind assertion does a single test on + the last four characters. If it fails, the match fails immediately. For + long strings, this approach makes a significant difference to the + processing time.
-Using Multiple Assertions
-If the condition is satisfied, the yes-pattern is used; otherwise the -no-pattern (if present) is used. If there are more than two alternatives in the -subpattern, a compile-time error occurs. Each of the two alternatives may -itself contain nested subpatterns of any form, including conditional -subpatterns; the restriction to two alternatives applies only at the level of -the condition. This pattern fragment is an example where the alternatives are -complex:
+Several assertions (of any sort) can occur in succession. For example, the + following matches "foo" preceded by three digits that are not "999":
-+(?(1) (A|B|C) | (D | (?(2)E|F) | E) )
+(?<=\d{3})(?<!999)foo
-There are four kinds of condition: references to subpatterns, references to -recursion, a pseudo-condition called DEFINE, and assertions.
+Notice that each of the assertions is applied independently at the same + point in the subject string. First there is a check that the previous + three characters are all digits, and then there is a check that the same + three characters are not "999". This pattern does not match + "foo" preceded by six characters, the first of which are digits and the + last three of which are not "999". For example, it does not match + "123abcfoo". A pattern to do that is the following:
+
+(?<=\d{3}...)(?<!999)foo
-Checking for a used subpattern by number
+This time the first assertion looks at the preceding six characters, + checks that the first three are digits, and then the second assertion + checks that the preceding three characters are not "999".
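The two patterns can be compared side by side in a minimal sketch. It assumes Python's standard `re` module as a stand-in engine (not PCRE); both lookbehinds here have a single fixed length, so they are accepted by that engine as well. The subject strings other than "123abcfoo" are illustrative assumptions.

```python
import re

# Assumption: Python's re module stands in for PCRE for these patterns.

# Both assertions inspect the same three characters before "foo".
same_three = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion inspects six characters; the second only the last three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

results = {
    "same_three/123foo": same_three.search("123foo") is not None,
    "same_three/999foo": same_three.search("999foo") is not None,
    "same_three/123abcfoo": same_three.search("123abcfoo") is not None,
    "six_back/123abcfoo": six_back.search("123abcfoo") is not None,
    "six_back/123999foo": six_back.search("123999foo") is not None,
}

for name, matched in results.items():
    print(name, matched)
```

Each assertion is evaluated at the same position, immediately before "foo", and none of them moves the matching point.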
-If the text between the parentheses consists of a sequence of -digits, the condition is true if a capturing subpattern of that number has previously -matched. If there is more than one capturing subpattern with the same number -(see the earlier section about duplicate subpattern numbers), -the condition is true if any of them have matched. An alternative notation is -to precede the digits with a plus or minus sign. In this case, the subpattern -number is relative rather than absolute. The most recently opened parentheses -can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside -loops it can also make sense to refer to subsequent groups. The next -parentheses to be opened can be referenced as (?(+1), and so on. (The value -zero in any of these forms is not used; it provokes a compile-time error.)
+Assertions can be nested in any combination. For example, the following + matches an occurrence of "baz" that is preceded by "bar", which in turn is + not preceded by "foo":
-Consider the following pattern, which contains non-significant
-whitespace to make it more readable (assume the
+(?<=(?<!foo)bar)baz
-+( \( )? [^()]+ (?(1) \) )
The following pattern matches "foo" preceded by three digits and any + three characters that are not "999":
-The first part matches an optional opening parenthesis, and if that -character is present, sets it as the first captured substring. The second part -matches one or more characters that are not parentheses. The third part is a -conditional subpattern that tests whether or not the first set of parentheses matched -or not. If they did, that is, if subject started with an opening parenthesis, -the condition is true, and so the yes-pattern is executed and a closing -parenthesis is required. Otherwise, since no-pattern is not present, the -subpattern matches nothing. In other words, this pattern matches a sequence of -non-parentheses, optionally enclosed in parentheses.
+
+(?<=\d{3}(?!999)...)foo
+ If you were embedding this pattern in a larger one, you could use a relative -reference:
+It is possible to cause the matching process to obey a subpattern + conditionally or to choose between two alternative subpatterns, depending + on the result of an assertion, or whether a specific capturing subpattern + has already been matched. The following are the two possible forms of + conditional subpattern:
+ +
+(?(condition)yes-pattern)
+(?(condition)yes-pattern|no-pattern)
+
+ If the condition is satisfied, the yes-pattern is used; otherwise the + no-pattern (if present) is used. If more than two alternatives exist in the + subpattern, a compile-time error occurs. Each of the two alternatives can + itself contain nested subpatterns of any form, including conditional + subpatterns; the restriction to two alternatives applies only at the level + of the condition. The following pattern fragment is an example where the + alternatives are complex:
+ +
+(?(1) (A|B|C) | (D | (?(2)E|F) | E) )
+
+ There are four kinds of condition: references to subpatterns, references + to recursion, a pseudo-condition called DEFINE, and assertions.
+ +Checking for a Used Subpattern By Number
+ +If the text between the parentheses consists of a sequence of digits,
+ the condition is true if a capturing subpattern of that number has
+ previously matched. If more than one capturing subpattern with the same
+ number exists (see section
Consider the following pattern, which contains non-significant whitespace
+ to make it more readable (assume option
+( \( )? [^()]+ (?(1) \) )
+
+ The first part matches an optional opening parenthesis, and if that + character is present, sets it as the first captured substring. The second + part matches one or more characters that are not parentheses. The third + part is a conditional subpattern that tests whether the first set of + parentheses matched or not. If they did, that is, if the subject started with + an opening parenthesis, the condition is true, and so the yes-pattern is + executed and a closing parenthesis is required. Otherwise, as no-pattern + is not present, the subpattern matches nothing. That is, this pattern + matches a sequence of non-parentheses, optionally enclosed in + parentheses.
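As a rough illustration of this conditional, the sketch below assumes Python's standard `re` module as a stand-in engine (not PCRE), with `re.VERBOSE` playing the role of the extended option so that the whitespace in the pattern is ignored. The helper name `accepts` and the subject strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE, and re.VERBOSE stands in
# for the extended option that makes pattern whitespace non-significant.
balanced = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)


def accepts(subject: str) -> bool:
    """Return True if the whole subject matches the conditional pattern."""
    # fullmatch is used so that a partial match such as "abc" inside "(abc"
    # does not count as success.
    return balanced.fullmatch(subject) is not None


for subject in ["abc", "(abc)", "(abc", "abc)"]:
    print(repr(subject), accepts(subject))
```

The closing parenthesis is required only when group 1 took part in the match, so "abc" and "(abc)" are accepted while the unbalanced forms are rejected.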
+ +If this pattern is embedded in a larger one, a relative reference can be + used:
+ +
+...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...
+
+ This makes the fragment independent of the parentheses in the larger + pattern.
+ +Checking for a Used Subpattern By Name
+ +Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a + used subpattern by name. For compatibility with earlier versions of PCRE, + which had this facility before Perl, the syntax (?(name)...) is also + recognized. However, there is a possible ambiguity with this syntax, as + subpattern names can consist entirely of digits. PCRE looks first for a + named subpattern; if it cannot find one and the name consists entirely of + digits, PCRE looks for a subpattern of that number, which must be > 0. + Using subpattern names that consist entirely of digits is not + recommended.
+ +Rewriting the previous example to use a named subpattern gives:
+ +
+(?<OPEN> \( )? [^()]+ (?(<OPEN>) \) )
+
+ If the name used in a condition of this kind is a duplicate, the test is + applied to all subpatterns of the same name, and is true if any one of + them has matched.
+ +Checking for Pattern Recursion
+ +If the condition is the string (R), and there is no subpattern with the + name R, the condition is true if a recursive call to the whole pattern or + any subpattern has been made. If digits or a name preceded by ampersand + follow the letter R, for example:
+ +
+(?(R3)...) or (?(R&name)...)
+
+ the condition is true if the most recent recursion is into a subpattern + whose number or name is given. This condition does not check the entire + recursion stack. If the name used in a condition of this kind is a + duplicate, the test is applied to all subpatterns of the same name, and is + true if any one of them is the most recent recursion.
+ +At "top-level", all these recursion test conditions are false. The syntax + for recursive patterns is described below.
+ +Defining Subpatterns for Use By Reference Only
+If the condition is the string (DEFINE), and there is no subpattern with + the name DEFINE, the condition is always false. In this case, there can be + only one alternative in the subpattern. It is always skipped if control + reaches this point in the pattern. The idea of DEFINE is that it can be + used to define "subroutines" that can be referenced from elsewhere. (The + use of subroutines is described below.) For example, a pattern to match + an IPv4 address, such as "192.168.23.245", can be written like this + (ignore whitespace and line breaks):
+ +
+(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) \b (?&byte) (\.(?&byte)){3} \b
+
+ The first part of the pattern is a DEFINE group inside which another + group named "byte" is defined. This matches an individual component of an + IPv4 address (a number < 256). When matching takes place, this part of + the pattern is skipped, as DEFINE acts like a false condition. The + remaining pattern uses references to the named group to match the four + dot-separated components of an IPv4 address, insisting on a word boundary + at each end.
+ +Assertion Conditions
+ +If the condition is not in any of the above formats, it must be an + assertion. This can be a positive or negative lookahead or lookbehind + assertion. Consider the following pattern, containing non-significant + whitespace, and with the two alternatives on the second line:
+ +
+(?(?=[^a-z]*[a-z])
+\d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
+
+ The condition is a positive lookahead assertion that matches an optional + sequence of non-letters followed by a letter. That is, it tests for the + presence of at least one letter in the subject. If a letter is found, the + subject is matched against the first alternative, otherwise it is matched + against the second. This pattern matches strings in one of the two forms + dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
++...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...
There are two ways to include comments in patterns that are processed by + PCRE. In both cases, the start of the comment must not be in a character + class, or in the middle of any other sequence of related characters such + as (?: or a subpattern name or number. The characters that make up a + comment play no part in the pattern matching.
+ +The sequence (?# marks the start of a comment that continues up to the
+ next closing parenthesis. Nested parentheses are not permitted. If option
+ PCRE_EXTENDED is set, an unescaped # character also introduces a comment,
+ which in this case continues to immediately after the next newline
+ character or character sequence in the pattern. Which characters are
+ interpreted as newlines is controlled by the options passed to a
+ compiling function or by a special sequence at the start of the pattern,
+ as described in section
Notice that the end of this type of comment is a literal newline sequence
+ in the pattern; escape sequences that happen to represent a newline do not
+ count. For example, consider the following pattern when
+abc #comment \n still comment
+
+ On encountering character #,
This makes the fragment independent of the parentheses in the larger pattern.
+Consider the problem of matching a string in parentheses, allowing for + unlimited nested parentheses. Without the use of recursion, the best that + can be done is to use a pattern that matches up to some fixed depth of + nesting. It is not possible to handle an arbitrary nesting depth.
+ +For some time, Perl has provided a facility that allows regular + expressions to recurse (among other things). It does this by + interpolating Perl code in the expression at runtime, and the code can + refer to the expression itself. A Perl pattern using code interpolation to + solve the parentheses problem can be created like this:
+ +
+$re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
+
+ Item (?p{...}) interpolates Perl code at runtime, and in this case refers + recursively to the pattern in which it appears.
+ +Obviously, PCRE cannot support the interpolation of Perl code. Instead, + it supports special syntax for recursion of the entire pattern, and for + individual subpattern recursion. After its introduction in PCRE and + Python, this kind of recursion was later introduced into Perl at + release 5.10.
+ +A special item that consists of (? followed by a number > 0 and a + closing parenthesis is a recursive subroutine call of the subpattern of + the given number, if it occurs inside that subpattern. (If not, + it is a non-recursive subroutine call, which is described in the next + section.) The special item (?R) or (?0) is a recursive call of the entire + regular expression.
-Checking for a used subpattern by name
+This PCRE pattern solves the nested parentheses problem (assume that
+ option
+\( ( [^()]++ | (?R) )* \)
+
+ First it matches an opening parenthesis. Then it matches any number of + substrings, which can either be a sequence of non-parentheses or a + recursive match of the pattern itself (that is, a correctly parenthesized + substring). Finally there is a closing parenthesis. Notice the use of a + possessive quantifier to avoid backtracking into sequences of + non-parentheses.
+ +If this were part of a larger pattern, you would not want to recurse the + entire pattern, so instead you can use:
+ +
+( \( ( [^()]++ | (?1) )* \) )
+
+ The pattern is here within parentheses so that the recursion refers to + them instead of the whole pattern.
+ +In a larger pattern, keeping track of parenthesis numbers can be tricky. + This is made easier by the use of relative references. Instead of (?1) in + the pattern above, you can write (?-2) to refer to the second most + recently opened parentheses preceding the recursion. That is, a negative + number counts capturing parentheses leftwards from the point at which it + is encountered.
+ +It is also possible to refer to later opened parentheses, by + writing references such as (?+2). However, these cannot be recursive, as + the reference is not inside the parentheses that are referenced. They are + always non-recursive subroutine calls, as described in the next + section.
+ +An alternative approach is to use named parentheses instead. The Perl + syntax for this is (?&name). The earlier PCRE syntax (?P>name) is + also supported. We can rewrite the above example as follows:
+ +
+(?<pn> \( ( [^()]++ | (?&pn) )* \) )
+
+ If there is more than one subpattern with the same name, the earliest + one is used.
+ +This particular example pattern that we have studied contains nested + unlimited repeats, and so the use of a possessive quantifier for matching + strings of non-parentheses is important when applying the pattern to + strings that do not match. For example, when this pattern is applied + to
+ +
+(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+
+ it gives "no match" quickly. However, if a possessive quantifier is not + used, the match runs for a long time, as there are so many different + ways the + and * repeats can carve up the subject, and all must be tested + before failure can be reported.
+ +At the end of a match, the values of capturing parentheses are those from + the outermost level. If the pattern above is matched against
+ +
+(ab(cd)ef)
+
+ the value for the inner capturing parentheses (numbered 2) is "ef", + which is the last value taken on at the top level. If a capturing + subpattern is not matched at the top level, its final captured value is + unset, even if it was (temporarily) set at a deeper level during the + matching process.
+ +Do not confuse item (?R) with condition (R), which tests for recursion. + Consider the following pattern, which matches text in angle brackets, + allowing for arbitrary nesting. Only digits are allowed in nested brackets + (that is, when recursing), while any characters are permitted at the + outer level.
+ +
+< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >
+
+ Here (?(R) is the start of a conditional subpattern, with two different + alternatives for the recursive and non-recursive cases. Item (?R) is the + actual recursive call.
+ +Differences in Recursion Processing between PCRE and Perl
+ +Recursion processing in PCRE differs from Perl in two important ways. In + PCRE (like Python, but unlike Perl), a recursive subpattern call is always + treated as an atomic group. That is, once it has matched some of the + subject string, it is never re-entered, even if it contains untried + alternatives and there is a subsequent matching failure. This can be + illustrated by the following pattern, which is meant to match a palindromic + string containing an odd number of characters (for example, "a", "aba", + "abcba", "abcdcba"):
+ +
+^(.|(.)(?1)\2)$
+
+ The idea is that it either matches a single character, or two identical + characters surrounding a subpalindrome. In Perl, this pattern works; in + PCRE it does not work if the subject string is longer than three characters. + Consider the subject string "abcba".
+ +At the top level, the first character is matched, but as it is not at + the end of the string, the first alternative fails, the second + alternative is taken, and the recursion kicks in. The recursive call to + subpattern 1 successfully matches the next character ("b"). (Notice that + the beginning and end of line tests are not part of the recursion.)
+ +Back at the top level, the next character ("c") is compared with what + subpattern 2 matched, which was "a". This fails. As the recursion is + treated as an atomic group, there are now no backtracking points, and so + the entire match fails. (Perl can now re-enter the recursion + and try the second alternative.) However, if the pattern is written with + the alternatives in the other order, things are different:
+ +
+^((.)(?1)\2|.)$
+
+ This time, the recursing alternative is tried first, and continues to + recurse until it runs out of characters, at which point the recursion + fails. But this time we have another alternative to try at the higher + level. That is the significant difference: in the previous case the + remaining alternative is at a deeper recursion level, which PCRE cannot + use.
+ +To change the pattern so that it matches all palindromic strings, not + only those with an odd number of characters, it is tempting to change the + pattern to this:
+ +
+^((.)(?1)\2|.?)$
+
+ Again, this works in Perl, but not in PCRE, and for the same reason. When + a deeper recursion has matched a single character, it cannot be entered + again to match an empty string. The solution is to separate the two cases, + and write out the odd and even cases as alternatives at the higher + level:
+ +
+^(?:((.)(?1)\2|)|((.)(?3)\4|.))
+
+ If you want to match typical palindromic phrases, the pattern must ignore + all non-word characters, which can be done as follows:
+ +
+^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+
+ If run with option
Perl uses the syntax (?(<name>)...) or (?('name')...) to test -for a used subpattern by name. For compatibility with earlier versions -of PCRE, which had this facility before Perl, the syntax (?(name)...) -is also recognized. However, there is a possible ambiguity with this -syntax, because subpattern names may consist entirely of digits. PCRE -looks first for a named subpattern; if it cannot find one and the name -consists entirely of digits, PCRE looks for a subpattern of that -number, which must be greater than zero. Using subpattern names that -consist entirely of digits is not recommended.
+The palindrome-matching patterns above work only if the subject string + does not start with a palindrome that is shorter than the entire string. + For example, although "abcba" is correctly matched, if the subject is + "ababa", PCRE finds palindrome "aba" at the start, and then fails at top + level, as the end of the string does not follow. Once again, it cannot + jump back into the recursion to try other alternatives, so the entire + match fails.
+Rewriting the above example to use a named subpattern gives this:
+The second way in which PCRE and Perl differ in their recursion + processing is in the handling of captured values. In Perl, when a + subpattern is called recursively or as a subroutine (see the next + section), it has no access to any values that were captured outside the + recursion. In PCRE these values can be referenced. Consider the following + pattern:
+ +
+^(.)(\1|a(?2))
+
+ In PCRE, it matches "bab". The first capturing parentheses match "b", + then in the second group, when the back reference \1 fails to match "b", + the second alternative matches "a", and then recurses. In the recursion, + \1 does now match "b" and so the whole match succeeds. In Perl, the + pattern fails to match because inside the recursive call \1 cannot access + the externally set value.
++(?<OPEN> \( )? [^()]+ (?(<OPEN>) \) )
If the syntax for a recursive subpattern call (either by number or by + name) is used outside the parentheses to which it refers, it operates + like a subroutine in a programming language. The called subpattern can be + defined before or after the reference. A numbered reference can be + absolute or relative, as in the following examples:
+ +
+(...(absolute)...)...(?2)...
+(...(relative)...)...(?-1)...
+(...(?+1)...(relative)...
+
+ An earlier example pointed out that the following pattern matches "sense + and sensibility" and "response and responsibility", but not "sense and + responsibility":
+ +
+(sens|respons)e and \1ibility
+
+ If instead the following pattern is used, it matches "sense and + responsibility" and the other two strings:
+ +
+(sens|respons)e and (?1)ibility
+
+ Another example is provided in the discussion of DEFINE earlier.
+ +All subroutine calls, recursive or not, are always treated as atomic + groups. That is, once a subroutine has matched some of the subject string, + it is never re-entered, even if it contains untried alternatives and there + is a subsequent matching failure. Any capturing parentheses that are set + during the subroutine call revert to their previous values afterwards.
+ +Processing options such as case-independence are fixed when a subpattern + is defined, so if it is used as a subroutine, such options cannot be + changed for different calls. For example, the following pattern matches + "abcabc" but not "abcABC", as the change of processing option does not + affect the called subpattern:
+ +
+(abc)(?i:(?-1))
+ If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them has -matched.
+For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + an alternative syntax for referencing a subpattern as a subroutine, possibly + recursively. Here follow two of the examples used above, rewritten using + this syntax:
+ +
+(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
+(sens|respons)e and \g'1'ibility
+
+ PCRE supports an extension to Oniguruma: if a number is preceded by a + plus or minus sign, it is taken as a relative reference, for example:
+ +
+(abc)(?i:\g<-1>)
+
+ Notice that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) + are not synonymous. The former is a back reference; the latter + is a subroutine call.
+Checking for pattern recursion
+Perl 5.10 introduced some "Special Backtracking Control Verbs", + which are still described in the Perl documentation as "experimental and + subject to change or removal in a future version of Perl". It goes on to + say: "Their usage in production code should be noted to avoid problems + during upgrades." The same remarks apply to the PCRE features described + in this section.
+ +The new verbs make use of what was previously invalid syntax: an opening + parenthesis followed by an asterisk. They are generally of the form + (*VERB) or (*VERB:NAME). Some can take either form, possibly behaving + differently depending on whether a name is present. A name is any sequence + of characters that does not include a closing parenthesis. The maximum + name length is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit + libraries. If the name is empty, that is, if the closing parenthesis + immediately follows the colon, the effect is as if the colon was not + there. Any number of these verbs can occur in a pattern.
+ +The behavior of these verbs in repeated groups, assertions, and in + subpatterns called as subroutines (whether or not recursively) is + described below.
+ +Optimizations That Affect Backtracking Verbs
+ +PCRE contains some optimizations that are used to speed up matching by
+ running some checks at the start of each match attempt. For example, it
+ can know the minimum length of matching subject, or that a particular
+ character must be present. When one of these optimizations bypasses the
+ running of a match, any included backtracking verbs are not processed.
+ You can suppress the start-of-match optimizations by setting
+ option
Experiments with Perl suggest that it too has similar optimizations, + sometimes leading to anomalous results.
+ +Verbs That Act Immediately
+ +The following verbs act as soon as they are encountered. They must not + be followed by a name.
+ +
+(*ACCEPT)
+
+ This verb causes the match to end successfully, skipping the remainder of + the pattern. However, when it is inside a subpattern that is called as a + subroutine, only that subpattern is ended successfully. Matching then + continues at the outer level. If (*ACCEPT) is triggered in a positive + assertion, the assertion succeeds; in a negative assertion, the assertion + fails.
+ +If (*ACCEPT) is inside capturing parentheses, the data so far is + captured. For example, the following matches "AB", "AAD", or "ACD". When + it matches "AB", "B" is captured by the outer parentheses.
+ +
+A((?:A|B(*ACCEPT)|C)D)
+
+ The following verb causes a matching failure, forcing backtracking to + occur. It is equivalent to (?!) but easier to read.
+ +
+(*FAIL) or (*F)
+
+ The Perl documentation states that it is probably useful only when + combined with (?{}) or (??{}). Those are Perl features that + are not present in PCRE.
+ +A match with the string "aaaa" always fails, but the callout is taken + before each backtrack occurs (in this example, 10 times).
+ +Recording Which Path Was Taken
+ +The main purpose of this verb is to track how a match was arrived at, + although it also has a secondary use in conjunction with advancing the match + starting point (see (*SKIP) below).
-If the condition is the string (R), and there is no subpattern with -the name R, the condition is true if a recursive call to the whole -pattern or any subpattern has been made. If digits or a name preceded -by ampersand follow the letter R, for example:
+In Erlang, there is no interface to retrieve a mark with
+
+(?(R3)...) or (?(R&name)...)
The rest of this section is therefore deliberately not adapted for + reading by the Erlang programmer, but the examples can help in + understanding NAMES as they can be used by (*SKIP).
+the condition is true if the most recent recursion is into a -subpattern whose number or name is given. This condition does not -check the entire recursion stack. If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them is -the most recent recursion.
+
+(*MARK:NAME) or (*:NAME)
-At "top level", all these recursion test conditions are false. The syntax for recursive -patterns is described below.
- -Defining subpatterns for use by reference only
- -If the condition is the string (DEFINE), and there is no subpattern with the -name DEFINE, the condition is always false. In this case, there may be only one -alternative in the subpattern. It is always skipped if control reaches this -point in the pattern; the idea of DEFINE is that it can be used to define -"subroutines" that can be referenced from elsewhere. (The use of subroutines -is described below.) For example, a pattern to match an IPv4 address such as -"192.168.23.245" could be -written like this (ignore whitespace and line breaks):
+A name is always required with this verb. There can be as many instances + of (*MARK) as you like in a pattern, and their names do not have to be + unique.
-- -(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) - \b (?&byte) (\.(?&byte)){3} \b
The first part of the pattern is a DEFINE group inside which a -another group named "byte" is defined. This matches an individual -component of an IPv4 address (a number less than 256). When matching -takes place, this part of the pattern is skipped because DEFINE acts -like a false condition. The rest of the pattern uses references to the -named group to match the four dot-separated components of an IPv4 -address, insisting on a word boundary at each end.
- -Assertion conditions
+When a match succeeds, the name of the last encountered (*MARK:NAME),
+ (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the
+ caller as described in section "Extra data for
If the condition is not in any of the above formats, it must be an -assertion. This may be a positive or negative lookahead or lookbehind -assertion. Consider this pattern, again containing non-significant -whitespace, and with the two alternatives on the second line:
+
+ re> /X(*MARK:A)Y|X(*MARK:B)Z/K
+data> XY
+ 0: XY
+MK: A
+XZ
+ 0: XZ
+MK: B
+
+ The (*MARK) name is tagged with "MK:" in this output, and in this example + it indicates which of the two alternatives matched. This is a more + efficient way of obtaining this information than putting each alternative + in its own capturing parentheses.
+ +If a verb with a name is encountered in a positive assertion that is + true, the name is recorded and passed back if it is the last encountered. + This does not occur for negative assertions or failing positive + assertions.
+ +After a partial match or a failed match, the last encountered name in the + entire match process is returned, for example:
+ +
+ re> /X(*MARK:A)Y|X(*MARK:B)Z/K
+data> XP
+No match, mark = B
+
+ Notice that in this unanchored example, the mark is retained from the + match attempt that started at letter "X" in the subject. Subsequent match + attempts starting at "P" and then with an empty string do not get as far + as the (*MARK) item, but nevertheless do not reset it.
+ +Verbs That Act after Backtracking
+ +The following verbs do nothing when they are encountered. Matching + continues with what follows, but if there is no subsequent match, causing + a backtrack to the verb, a failure is forced. That is, backtracking cannot + pass to the left of the verb. However, when one of these verbs appears + inside an atomic group or an assertion that is true, its effect is + confined to that group, as once the group has been matched, there is never + any backtracking into it. In this situation, backtracking can "jump back" + to the left of the entire atomic group or assertion. (Remember also, as + stated above, that this localization also applies in subroutine + calls.)
+ +These verbs differ in exactly what kind of failure occurs when + backtracking reaches them. The behavior described below is what occurs + when the verb is not in a subroutine or an assertion. Subsequent sections + cover these special cases.
+ +The following verb, which must not be followed by a name, causes the + whole match to fail outright if there is a later matching failure that + causes backtracking to reach it. Even if the pattern is unanchored, no + further attempts to find a match by advancing the starting point take + place.
+ +
+(*COMMIT)
+
+ If (*COMMIT) is the only backtracking verb that is encountered, once it
+ has been passed,
+a+(*COMMIT)b
+
+ This matches "xxaab" but not "aacaab". It can be thought of as a kind of + dynamic anchor, or "I've started, so I must finish". The name of the most + recently passed (*MARK) in the path is passed back when (*COMMIT) forces + a match failure.
+ +If more than one backtracking verb exists in a pattern, a different one + that follows (*COMMIT) can be triggered first, so merely passing (*COMMIT) + during a match does not always guarantee that a match must be at this + starting point.
+ +Notice that (*COMMIT) at the start of a pattern is not the same as an + anchor, unless the PCRE start-of-match optimizations are turned off, as + shown in the following example:
- (?(?=[^a-z]*[a-z])
- \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
-
-The condition is a positive lookahead assertion that matches an optional -sequence of non-letters followed by a letter. In other words, it tests for the -presence of at least one letter in the subject. If a letter is found, the -subject is matched against the first alternative; otherwise it is matched -against the second. This pattern matches strings in one of the two forms -dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
- - -There are two ways of including comments in patterns that are processed by -PCRE. In both cases, the start of the comment must not be in a character class, -nor in the middle of any other sequence of related characters such as (?: or a -subpattern name or number. The characters that make up a comment play no part -in the pattern matching.
- -The sequence (?# marks the start of a comment that continues up to the next
-closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED
-option is set, an unescaped # character also introduces a comment, which in
-this case continues to immediately after the next newline character or
-character sequence in the pattern. Which characters are interpreted as newlines
-is controlled by the options passed to a compiling function or by a special
-sequence at the start of the pattern, as described in the section entitled
-"Newline conventions"
-above. Note that the end of this type of comment is a literal newline sequence
-in the pattern; escape sequences that happen to represent a newline do not
-count. For example, consider this pattern when
- -abc #comment \n still comment
On encountering the # character, pcre_compile() skips along, looking for -a newline in the pattern. The sequence \n is still literal at this stage, so -it does not terminate the comment. Only an actual character with the code value -0x0a (the default newline) does so.
- -PCRE knows that any match must start with "a", so the optimization skips
+ along the subject to "a" before running the first match attempt, which
+ succeeds. When the optimization is disabled by option
+
The following verb causes the match to fail at the current starting + position in the subject if there is a later matching failure that causes + backtracking to reach it:
+ +
+(*PRUNE) or (*PRUNE:NAME)
+
+ If the pattern is unanchored, the normal "bumpalong" advance to the next + starting character then occurs. Backtracking can occur as usual to the + left of (*PRUNE), before it is reached, or when matching to the right of + (*PRUNE), but if there is no match to the right, backtracking cannot + cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an + alternative to an atomic group or possessive quantifier, but there are + some uses of (*PRUNE) that cannot be expressed in any other way. In an + anchored pattern, (*PRUNE) has the same effect as (*COMMIT).
+ +The behavior of (*PRUNE:NAME) is not the same as + (*MARK:NAME)(*PRUNE). It is like (*MARK:NAME) in that the name is + remembered for passing back to the caller. However, (*SKIP:NAME) searches + only for names set with (*MARK).
-Consider the problem of matching a string in parentheses, allowing for -unlimited nested parentheses. Without the use of recursion, the best that can -be done is to use a pattern that matches up to some fixed depth of nesting. It -is not possible to handle an arbitrary nesting depth.
- -For some time, Perl has provided a facility that allows regular -expressions to recurse (amongst other things). It does this by -interpolating Perl code in the expression at run time, and the code -can refer to the expression itself. A Perl pattern using code -interpolation to solve the parentheses problem can be created like -this:
- -- -$re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
The (?p{...}) item interpolates Perl code at run time, and in this -case refers recursively to the pattern in which it appears.
- -Obviously, PCRE cannot support the interpolation of Perl code. Instead, it -supports special syntax for recursion of the entire pattern, and also for -individual subpattern recursion. After its introduction in PCRE and Python, -this kind of recursion was subsequently introduced into Perl at release 5.10.
- -A special item that consists of (? followed by a number greater -than zero and a closing parenthesis is a recursive subroutine call of the -subpattern of the given number, provided that it occurs inside that -subpattern. (If not, it is a non-recursive subroutine call, which is described in -the next section.) The special item (?R) or (?0) is a recursive call -of the entire regular expression.
- -This PCRE pattern solves the nested parentheses problem (assume the
-
- -\( ( [^()]++ | (?R) )* \)
First it matches an opening parenthesis. Then it matches any number -of substrings which can either be a sequence of non-parentheses, or a -recursive match of the pattern itself (that is, a correctly -parenthesized substring). Finally there is a closing -parenthesis. Note the use of a possessive quantifier to avoid -backtracking into sequences of non-parentheses.
- -If this were part of a larger pattern, you would not want to -recurse the entire pattern, so instead you could use this:
- -- -( \( ( [^()]++ | (?1) )* \) )
We have put the pattern into parentheses, and caused the recursion -to refer to them instead of the whole pattern.
- -In a larger pattern, keeping track of parenthesis numbers can be tricky. This -is made easier by the use of relative references. Instead of (?1) in the -pattern above you can write (?-2) to refer to the second most recently opened -parentheses preceding the recursion. In other words, a negative number counts -capturing parentheses leftwards from the point at which it is encountered.
- -It is also possible to refer to subsequently opened parentheses, by -writing references such as (?+2). However, these cannot be recursive -because the reference is not inside the parentheses that are -referenced. They are always non-recursive subroutine calls, as described in the -next section.
- -An alternative approach is to use named parentheses instead. The -Perl syntax for this is (?&name); PCRE's earlier syntax -(?P>name) is also supported. We could rewrite the above example as -follows:
- -- -(?<pn> \( ( [^()]++ | (?&pn) )* \) )
If there is more than one subpattern with the same name, the earliest one is -used.
- -This particular example pattern that we have been looking at contains nested -unlimited repeats, and so the use of a possessive quantifier for matching -strings of non-parentheses is important when applying the pattern to strings -that do not match. For example, when this pattern is applied to
- -- -(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
it yields "no match" quickly. However, if a possessive quantifier is not used, -the match runs for a very long time indeed because there are so many different -ways the + and * repeats can carve up the subject, and all have to be tested -before failure can be reported.
- -At the end of a match, the values of capturing parentheses are those from -the outermost level. If the pattern above is matched against
- -- -(ab(cd)ef)
the value for the inner capturing parentheses (numbered 2) is "ef", which is -the last value taken on at the top level. If a capturing subpattern is not -matched at the top level, its final captured value is unset, even if it was -(temporarily) set at a deeper level during the matching process.
- -Do not confuse the (?R) item with the condition (R), which tests for recursion. -Consider this pattern, which matches text in angle brackets, allowing for -arbitrary nesting. Only digits are allowed in nested brackets (that is, when -recursing), whereas any characters are permitted at the outer level.
- -- -< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >
In this pattern, (?(R) is the start of a conditional subpattern, with two -different alternatives for the recursive and non-recursive cases. The (?R) item -is the actual recursive call.
- -Differences in recursion processing between PCRE and Perl
- -Recursion processing in PCRE differs from Perl in two important ways. In PCRE -(like Python, but unlike Perl), a recursive subpattern call is always treated -as an atomic group. That is, once it has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. This can be illustrated by the following pattern, -which purports to match a palindromic string that contains an odd number of -characters (for example, "a", "aba", "abcba", "abcdcba"):
- -- -^(.|(.)(?1)\2)$
The idea is that it either matches a single character, or two identical -characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE -it does not if the pattern is longer than three characters. Consider the -subject string "abcba":
- -At the top level, the first character is matched, but as it is not at the end -of the string, the first alternative fails; the second alternative is taken -and the recursion kicks in. The recursive call to subpattern 1 successfully -matches the next character ("b"). (Note that the beginning and end of line -tests are not part of the recursion).
- -Back at the top level, the next character ("c") is compared with what -subpattern 2 matched, which was "a". This fails. Because the recursion is -treated as an atomic group, there are now no backtracking points, and so the -entire match fails. (Perl is able, at this point, to re-enter the recursion and -try the second alternative.) However, if the pattern is written with the -alternatives in the other order, things are different:
- -- -^((.)(?1)\2|.)$
This time, the recursing alternative is tried first, and continues to recurse -until it runs out of characters, at which point the recursion fails. But this -time we do have another alternative to try at the higher level. That is the big -difference: in the previous case the remaining alternative is at a deeper -recursion level, which PCRE cannot use.
- -To change the pattern so that it matches all palindromic strings, not just -those with an odd number of characters, it is tempting to change the pattern to -this:
- -- -^((.)(?1)\2|.?)$
Again, this works in Perl, but not in PCRE, and for the same reason. When a -deeper recursion has matched a single character, it cannot be entered again in -order to match an empty string. The solution is to separate the two cases, and -write out the odd and even cases as alternatives at the higher level:
- -- -^(?:((.)(?1)\2|)|((.)(?3)\4|.))
If you want to match typical palindromic phrases, the pattern has to ignore all -non-word characters, which can be done like this:
- -- -^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
If run with the
WARNING: The palindrome-matching patterns above work only if the subject -string does not start with a palindrome that is shorter than the entire string. -For example, although "abcba" is correctly matched, if the subject is "ababa", -PCRE finds the palindrome "aba" at the start, then fails at top level because -the end of the string does not follow. Once again, it cannot jump back into the -recursion to try other alternatives, so the entire match fails.
- -The second way in which PCRE and Perl differ in their recursion processing is -in the handling of captured values. In Perl, when a subpattern is called -recursively or as a subpattern (see the next section), it has no access to any -values that were captured outside the recursion, whereas in PCRE these values -can be referenced. Consider this pattern:
- -- -^(.)(\1|a(?2))
In PCRE, this pattern matches "bab". The first capturing parentheses match "b", -then in the second group, when the back reference \1 fails to match "b", the -second alternative matches "a" and then recurses. In the recursion, \1 does -now match "b" and so the whole match succeeds. In Perl, the pattern fails to -match because inside the recursive call \1 cannot access the externally set -value.
- -If the syntax for a recursive subpattern call (either by number or by -name) is used outside the parentheses to which it refers, it operates like a -subroutine in a programming language. The called subpattern may be defined -before or after the reference. A numbered reference can be absolute or -relative, as in these examples:
- -An earlier example pointed out that the pattern
- -- -(sens|respons)e and \1ibility
matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If instead the pattern
- -- -(sens|respons)e and (?1)ibility
is used, it does match "sense and responsibility" as well as the other two -strings. Another example is given in the discussion of DEFINE above.
- -All subroutine calls, whether recursive or not, are always treated as atomic -groups. That is, once a subroutine has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. Any capturing parentheses that are set during the -subroutine call revert to their previous values afterwards.
- -Processing options such as case-independence are fixed when a subpattern is -defined, so if it is used as a subroutine, such options cannot be changed for -different calls. For example, consider this pattern:
-- -(abc)(?i:(?-1))
It matches "abcabc". It does not match "abcABC" because the change of -processing option does not affect the called subpattern.
- -For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a subroutine, possibly recursively. Here -are two of the examples used above, rewritten using this syntax:
---(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
-(sens|respons)e and \g'1'ibility
-
PCRE supports an extension to Oniguruma: if a number is preceded by a -plus or a minus sign it is taken as a relative reference. For example:
- -- -(abc)(?i:\g<-1>)
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not -synonymous. The former is a back reference; the latter is a subroutine call.
- -(*SKIP) signifies that whatever text was matched leading up to it cannot + be part of a successful match. Consider:
-Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which -are still described in the Perl documentation as "experimental and subject to -change or removal in a future version of Perl". It goes on to say: "Their usage -in production code should be noted to avoid problems during upgrades." The same -remarks apply to the PCRE features described in this section.
- -The new verbs make use of what was previously invalid syntax: an opening -parenthesis followed by an asterisk. They are generally of the form -(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving -differently depending on whether or not a name is present. A name is any -sequence of characters that does not include a closing parenthesis. The maximum -length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit -libraries. If the name is empty, that is, if the closing parenthesis -immediately follows the colon, the effect is as if the colon were not there. -Any number of these verbs may occur in a pattern.
- - -The behaviour of these verbs in -repeated groups, assertions, -and in subpatterns called as subroutines -(whether or not recursively) is documented below.
- -Optimizations that affect backtracking verbs
- -PCRE contains some optimizations that are used to speed up matching by running
-some checks at the start of each match attempt. For example, it may know the
-minimum length of matching subject, or that a particular character must be
-present. When one of these optimizations bypasses the running of a match, any
-included backtracking verbs will not, of course, be processed. You can suppress
-the start-of-match optimizations by setting the
Experiments with Perl suggest that it too has similar optimizations, sometimes -leading to anomalous results.
- -Verbs that act immediately
- -The following verbs act as soon as they are encountered. They may not be -followed by a name.
- -- -(*ACCEPT)
This verb causes the match to end successfully, skipping the remainder of the -pattern. However, when it is inside a subpattern that is called as a -subroutine, only that subpattern is ended successfully. Matching then continues -at the outer level. If (*ACCEPT) in triggered in a positive assertion, the -assertion succeeds; in a negative assertion, the assertion fails.
- -If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For -example:
- -- -A((?:A|B(*ACCEPT)|C)D)
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by -the outer parentheses.
- -- -(*FAIL) or (*F)
This verb causes a matching failure, forcing backtracking to occur. It is -equivalent to (?!) but easier to read. The Perl documentation notes that it is -probably useful only when combined with (?{}) or (??{}). Those are, of course, -Perl features that are not present in PCRE. The nearest equivalent is the -callout feature, as for example in this pattern:
- -- -a+(?C)(*FAIL)
A match with the string "aaaa" always fails, but the callout is taken before -each backtrack happens (in this example, 10 times).
- -Recording which path was taken
- -There is one verb whose main purpose is to track how a match was arrived at, -though it also has a secondary use in conjunction with advancing the match -starting point (see (*SKIP) below).
- -In Erlang, there is no interface to retrieve a mark with
The rest of this section is therefore deliberately not adapted for reading -by the Erlang programmer, however the examples might help in understanding NAMES as -they can be used by (*SKIP).
-- -(*MARK:NAME) or (*:NAME)
A name is always required with this verb. There may be as many instances of -(*MARK) as you like in a pattern, and their names do not have to be unique.
- -When a match succeeds, the name of the last-encountered (*MARK:NAME),
-(*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the
-caller as described in the section entitled "Extra data for
- re> /X(*MARK:A)Y|X(*MARK:B)Z/K
- data> XY
- 0: XY
- MK: A
- XZ
- 0: XZ
- MK: B
-
-The (*MARK) name is tagged with "MK:" in this output, and in this example it -indicates which of the two alternatives matched. This is a more efficient way -of obtaining this information than putting each alternative in its own -capturing parentheses.
- -If a verb with a name is encountered in a positive assertion that is true, the -name is recorded and passed back if it is the last-encountered. This does not -happen for negative assertions or failing positive assertions.
- -After a partial match or a failed match, the last encountered name in the -entire match process is returned. For example:
-
- re> /X(*MARK:A)Y|X(*MARK:B)Z/K
- data> XP
- No match, mark = B
-
-Note that in this unanchored example the mark is retained from the match -attempt that started at the letter "X" in the subject. Subsequent match -attempts starting at "P" and then with an empty string do not get as far as the -(*MARK) item, but nevertheless do not reset it.
- - - -Verbs that act after backtracking
- -The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. However, when one of these verbs appears inside an atomic group or an -assertion that is true, its effect is confined to that group, because once the -group has been matched, there is never any backtracking into it. In this -situation, backtracking can "jump back" to the left of the entire atomic group -or assertion. (Remember also, as stated above, that this localization also -applies in subroutine calls.)
- -These verbs differ in exactly what kind of failure occurs when backtracking -reaches them. The behaviour described below is what happens when the verb is -not in a subroutine or an assertion. Subsequent sections cover these special -cases.
- -- -(*COMMIT)
This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed
- -a+(*COMMIT)b
This matches "xxaab" but not "aacaab". It can be thought of as a kind of -dynamic anchor, or "I've started, so I must finish." The name of the most -recently passed (*MARK) in the path is passed back when (*COMMIT) forces a -match failure.
- -If there is more than one backtracking verb in a pattern, a different one that -follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a -match does not always guarantee that a match must be at this starting point.
- -Note that (*COMMIT) at the start of a pattern is not the same as an anchor, -unless PCRE's start-of-match optimizations are turned off, as shown in this - example:
-
- 1> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list}]).
- {match,["abc"]}
- 2> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list},no_start_optimize]).
- nomatch
-
-PCRE knows that any match must start with "a", so the optimization skips along
-the subject to "a" before running the first match attempt, which succeeds. When
-the optimization is disabled by the
- -(*PRUNE) or (*PRUNE:NAME)
This verb causes the match to fail at the current starting position in the -subject if there is a later matching failure that causes backtracking to reach -it. If the pattern is unanchored, the normal "bumpalong" advance to the next -starting character then happens. Backtracking can occur as usual to the left of -(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but -if there is no match to the right, backtracking cannot cross (*PRUNE). In -simple cases, the use of (*PRUNE) is just an alternative to an atomic group or -possessive quantifier, but there are some uses of (*PRUNE) that cannot be -expressed in any other way. In an anchored pattern (*PRUNE) has the same effect -as (*COMMIT).
- -The behaviour of (*PRUNE:NAME) is the not the same as (*MARK:NAME)(*PRUNE). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
- -The fact that (*PRUNE:NAME) remembers the name is useless to the Erlang programmer, -as names can not be retrieved.
-- -(*SKIP)
This verb, when given without a name, is like (*PRUNE), except that if the -pattern is unanchored, the "bumpalong" advance is not to the next character, -but to the position in the subject where (*SKIP) was encountered. (*SKIP) -signifies that whatever text was matched leading up to it cannot be part of a -successful match. Consider:
- -- -a+(*SKIP)b
If the subject is "aaaac...", after the first match attempt fails (starting at -the first character in the string), the starting point skips on to start the -next attempt at "c". Note that a possessive quantifer does not have the same -effect as this example; although it would suppress backtracking during the -first match attempt, the second attempt would start at the second character -instead of skipping on to "c".
- -- -(*SKIP:NAME)
When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored.
- -Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores -names that are set by (*PRUNE:NAME) or (*THEN:NAME).
- -- -(*THEN) or (*THEN:NAME)
This verb causes a skip to the next innermost alternative when backtracking -reaches it. That is, it cancels any further backtracking within the current -alternative. Its name comes from the observation that it can be used for a -pattern-based if-then-else block:
- -+( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
+a+(*SKIP)b
-If the COND1 pattern matches, FOO is tried (and possibly further items after -the end of the group if FOO succeeds); on failure, the matcher skips to the -second alternative and tries COND2, without backtracking into COND1. If that -succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no -more alternatives, so there is a backtrack to whatever came before the entire -group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
- -The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
- -The fact that (*THEN:NAME) remembers the name is useless to the Erlang programmer, -as names can not be retrieved.
-A subpattern that does not contain a | character is just a part of the -enclosing alternative; it is not a nested alternation with only one -alternative. The effect of (*THEN) extends beyond such a subpattern to the -enclosing alternative. Consider this pattern, where A, B, etc. are complex -pattern fragments that do not contain any | characters at this level:
- -+A (B(*THEN)C) | D
If the subject is "aaaac...", after the first match attempt fails + (starting at the first character in the string), the starting point skips + on to start the next attempt at "c". Notice that a possessive quantifier + does not have the same effect as this example; although it would suppress + backtracking during the first match attempt, the second attempt would + start at the second character instead of skipping on to "c".
-If A and B are matched, but there is a failure in C, matching does not -backtrack into A; instead it moves to the next alternative, that is, D. -However, if the subpattern containing (*THEN) is given an alternative, it -behaves differently:
- -+A (B(*THEN)C | (*FAIL)) | D
When (*SKIP) has an associated name, its behavior is modified:
-The effect of (*THEN) is now confined to the inner subpattern. After a failure -in C, matching moves to (*FAIL), which causes the whole subpattern to fail -because there are no more alternatives to try. In this case, matching does now -backtrack into A.
+
+(*SKIP:NAME)
-Note that a conditional subpattern is not considered as having two -alternatives, because only one is ever used. In other words, the | character in -a conditional subpattern has a different meaning. Ignoring white space, -consider:
+When this is triggered, the previous path through the pattern is searched + for the most recent (*MARK) that has the same name. If one is found, the + "bumpalong" advance is to the subject position that corresponds to that + (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a + matching name is found, (*SKIP) is ignored.
-+^.*? (?(?=a) a | b(*THEN)c )
Notice that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It + ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
-If the subject is "ba", this pattern does not match. Because .*? is ungreedy, -it initially matches zero characters. The condition (?=a) then fails, the -character "b" is matched, but "c" is not. At this point, matching does not -backtrack to .*? as might perhaps be expected from the presence of the | -character. The conditional subpattern is part of the single alternative that -comprises the whole pattern, and so the match fails. (If there was a backtrack -into .*?, allowing it to match "b", the match would succeed.)
+The following verb causes a skip to the next innermost alternative when + backtracking reaches it. That is, it cancels any further backtracking + within the current alternative.
-The verbs just described provide four different "strengths" of control when -subsequent matching fails. (*THEN) is the weakest, carrying on the match at the -next alternative. (*PRUNE) comes next, failing the match at the current -starting position, but allowing an advance to the next character (for an -unanchored pattern). (*SKIP) is similar, except that the advance may be more -than one character. (*COMMIT) is the strongest, causing the entire match to -fail.
+
+(*THEN) or (*THEN:NAME)
+ The verb name comes from the observation that it can be used for a + pattern-based if-then-else block:
-More than one backtracking verb
+
+( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
-If more than one backtracking verb is present in a pattern, the one that is -backtracked onto first acts. For example, consider this pattern, where A, B, -etc. are complex pattern fragments:
+If the COND1 pattern matches, FOO is tried (and possibly further items + after the end of the group if FOO succeeds). On failure, the matcher skips + to the second alternative and tries COND2, without backtracking into + COND1. If that succeeds and BAR fails, COND3 is tried. If BAZ then fails, + there are no more alternatives, so there is a backtrack to whatever + came before the entire group. If (*THEN) is not inside an alternation, it + acts like (*PRUNE).
-+(A(*COMMIT)B(*THEN)C|ABD)
The behavior of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). + It is like (*MARK:NAME) in that the name is remembered for passing back to + the caller. However, (*SKIP:NAME) searches only for names set with + (*MARK).
-If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to -fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes -the next alternative (ABD) to be tried. This behaviour is consistent, but is -not always the same as Perl's. It means that if two or more backtracking verbs -appear in succession, all the the last of them has no effect. Consider this -example:
+The fact that (*THEN:NAME) remembers the name is useless to the Erlang + programmer, as names cannot be retrieved.
++...(*COMMIT)(*PRUNE)...
A subpattern that does not contain a | character is just a part of the + enclosing alternative; it is not a nested alternation with only one + alternative. The effect of (*THEN) extends beyond such a subpattern to the + enclosing alternative. Consider the following pattern, where A, B, and so + on, are complex pattern fragments that do not contain any | characters at + this level:
+ +
+A (B(*THEN)C) | D
+
+ If A and B are matched, but there is a failure in C, matching does not + backtrack into A; instead it moves to the next alternative, that is, D. + However, if the subpattern containing (*THEN) is given an alternative, it + behaves differently:
+ +
+A (B(*THEN)C | (*FAIL)) | D
+
+ The effect of (*THEN) is now confined to the inner subpattern. After a + failure in C, matching moves to (*FAIL), which causes the whole subpattern + to fail, as there are no more alternatives to try. In this case, matching + does now backtrack into A.
+ +Notice that a conditional subpattern is not considered as having two + alternatives, as only one is ever used. That is, the | character in a + conditional subpattern has a different meaning. Ignoring whitespace, + consider:
+ +
+^.*? (?(?=a) a | b(*THEN)c )
+
+ If the subject is "ba", this pattern does not match. As .*? is ungreedy, + it initially matches zero characters. The condition (?=a) then fails, the + character "b" is matched, but "c" is not. At this point, matching does not + backtrack to .*? as can perhaps be expected from the presence of the | + character. The conditional subpattern is part of the single alternative + that comprises the whole pattern, and so the match fails. (If there was a + backtrack into .*?, allowing it to match "b", the match would + succeed.)
+ +The verbs described above provide four different "strengths" of control + when subsequent matching fails:
+ +(*THEN) is the weakest, carrying on the match at the next + alternative.
+(*PRUNE) comes next, failing the match at the current starting + position, but allowing an advance to the next character (for an + unanchored pattern).
+(*SKIP) is similar, except that the advance can be more than one + character.
+(*COMMIT) is the strongest, causing the entire match to fail.
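Three of these strengths can be compared from Erlang with re:run/3. The following is a minimal sketch and not part of the original text: the module name verb_sketch and the subject string "aacaab" are illustrative assumptions. (*THEN) is left out because it only differs from (*PRUNE) inside an alternation.

```erlang
-module(verb_sketch).
-export([demo/0]).

%% Run one subject against a plain pattern and three verb variants.
%% In every variant, a+ first consumes "aa" at offset 0 and then
%% fails to find "b", so backtracking reaches the verb.
demo() ->
    Subject = "aacaab",
    Opts = [{capture, first, list}],
    [{Name, re:run(Subject, Pattern, Opts)}
     || {Name, Pattern} <- [{plain,  "a+b"},
                            {prune,  "a+(*PRUNE)b"},
                            {skip,   "a+(*SKIP)b"},
                            {commit, "a+(*COMMIT)b"}]].
```

With this subject, the plain, (*PRUNE), and (*SKIP) patterns still find "aab" at a later starting position, while (*COMMIT) makes the entire match fail, so re:run/3 returns nomatch for it.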
+If there is a matching failure to the right, backtracking onto (*PRUNE) causes -it to be triggered, and its action is taken. There can never be a backtrack -onto (*COMMIT).
+More than One Backtracking Verb
-Backtracking verbs in repeated groups
+If more than one backtracking verb is present in a pattern, the one that + is backtracked onto first acts. For example, consider the following + pattern, where A, B, and so on, are complex pattern fragments:
-PCRE differs from Perl in its handling of backtracking verbs in repeated -groups. For example, consider:
+
+(A(*COMMIT)B(*THEN)C|ABD)
-+/(a(*COMMIT)b)+ac/
If A matches but B fails, the backtrack to (*COMMIT) causes the entire + match to fail. However, if A and B match, but C fails, the backtrack to + (*THEN) causes the next alternative (ABD) to be tried. This behavior is + consistent, but is not always the same as in Perl. It means that if two or + more backtracking verbs appear in succession, the last of them has no + effect. Consider the following example:
-If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in -the second repeat of the group acts.
+
+...(*COMMIT)(*PRUNE)...
-Backtracking verbs in assertions
+If there is a matching failure to the right, backtracking onto (*PRUNE) + causes it to be triggered, and its action is taken. There can never be a + backtrack onto (*COMMIT).
-(*FAIL) in an assertion has its normal effect: it forces an immediate backtrack.
+Backtracking Verbs in Repeated Groups
-(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing.
+PCRE differs from Perl in its handling of backtracking verbs in repeated + groups. For example, consider:
-The other backtracking verbs are not treated specially if they appear in a -positive assertion. In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion.
+
+/(a(*COMMIT)b)+ac/
-Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion. -Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE).
+If the subject is "abac", Perl matches, but PCRE fails because the + (*COMMIT) in the second repeat of the group acts.
-Backtracking verbs in subroutines
+Backtracking Verbs in Assertions
-These behaviours occur whether or not the subpattern is called recursively. -Perl's treatment of subroutines is different in some cases.
+(*FAIL) in an assertion has its normal effect: it forces an immediate + backtrack.
-(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces -an immediate backtrack.
+(*ACCEPT) in a positive assertion causes the assertion to succeed without + any further processing. In a negative assertion, (*ACCEPT) causes the + assertion to fail without any further processing.
-(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to -succeed without any further processing. Matching then continues after the -subroutine call.
+The other backtracking verbs are not treated specially if they appear in + a positive assertion. In particular, (*THEN) skips to the next alternative + in the innermost enclosing group that has alternations, regardless of whether this + is within the assertion.
-(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause -the subroutine match to fail.
+Negative assertions are, however, different, to ensure that changing a + positive assertion into a negative assertion changes its result. + Backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative + assertion to be true, without considering any further alternative branches + in the assertion. Backtracking into (*THEN) causes it to skip to the next + enclosing alternative within the assertion (the normal behavior), but if + the assertion does not have such an alternative, (*THEN) behaves like + (*PRUNE).
-(*THEN) skips to the next alternative in the innermost enclosing group within -the subpattern that has alternatives. If there is no such group within the -subpattern, (*THEN) causes the subroutine match to fail.
+Backtracking Verbs in Subroutines
-These behaviors occur regardless of whether the subpattern is called recursively. + The treatment of subroutines in Perl is different in some cases.
+(*FAIL) in a subpattern called as a subroutine has its normal effect: + it forces an immediate backtrack.
+(*ACCEPT) in a subpattern called as a subroutine causes the + subroutine match to succeed without any further processing. Matching + then continues after the subroutine call.
+(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a + subroutine cause the subroutine match to fail.
+(*THEN) skips to the next alternative in the innermost enclosing + group within the subpattern that has alternatives. If there is no such + group within the subpattern, (*THEN) causes the subroutine match to + fail.
+The Standard Erlang Libraries application, STDLIB, - contains modules for manipulating lists, strings and files etc.
-Sets are collections of elements with no duplicate elements. - The representation of a set is not defined.
-This module provides exactly the same interface as the module
-
This module provides the same interface as the
+
As returned by
As returned by
+
Returns a new empty set.
+Returns a new set formed from
Returns
Returns
Returns the number of elements in
Filters elements in
Returns the elements of
Folds
Returns a set of the elements in
Returns
Returns the intersection of the non-empty list of sets.
Returns a new set formed from
Returns the intersection of
Returns
Returns
Returns the merged (union) set of
Returns
Returns the merged (union) set of the list of sets.
+Returns
Returns the intersection of
Returns
Returns the intersection of the non-empty list of sets.
+Returns a new empty set.
Returns
Returns the number of elements in
Returns only the elements of
Returns only the elements of
Returns
Returns the elements of
Fold
Returns the merged (union) set of the list of sets.
Filter elements in
Returns the merged (union) set of
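The operations listed above can be combined as in the following minimal sketch. The module name sets_sketch and the sample values are illustrative assumptions. Because the representation of a set is not defined, the sketch converts results to sorted lists before returning them instead of comparing set terms directly.

```erlang
-module(sets_sketch).
-export([demo/0]).

%% Build two small sets and exercise the common operations.
demo() ->
    A = sets:from_list([1, 2, 3]),
    B = sets:from_list([3, 4]),
    Union = sets:union(A, B),
    Common = sets:intersection(A, B),
    OnlyA = sets:subtract(A, B),
    %% Keep only the even elements of the union.
    Evens = sets:filter(fun(X) -> X rem 2 =:= 0 end, Union),
    %% Fold over the union to sum its elements.
    Sum = sets:fold(fun(X, Acc) -> X + Acc end, 0, Union),
    #{size => sets:size(Union),
      common => lists:sort(sets:to_list(Common)),
      only_a => lists:sort(sets:to_list(OnlyA)),
      evens => lists:sort(sets:to_list(Evens)),
      sum => Sum,
      has_4 => sets:is_element(4, Union),
      subset => sets:is_subset(A, Union)}.
```

The order of the list returned by sets:to_list/1 is not specified, which is why the sketch sorts it.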
The module
The shell is a user interface program +
This module provides an Erlang shell.
+ +The shell is a user interface program
for entering expression sequences. The expressions are
- evaluated and a value is returned.
+ evaluated and a value is returned.
A history mechanism saves previous commands and their
values, which can then be incorporated in later commands.
How many commands and results to save can be determined by the user,
- either interactively, by calling
The shell uses a helper process for evaluating commands in
- order to protect the history mechanism from exceptions. By
+
The shell uses a helper process for evaluating commands
+ to protect the history mechanism from exceptions. By
default the evaluator process is killed when an exception
- occurs, but by calling
Variable bindings, and local process dictionary changes - which are generated in user expressions are preserved, and the variables + that are generated in user expressions are preserved, and the variables can be used in later commands to access their values. The - bindings can also be forgotten so the variables can be re-used. -
+ bindings can also be forgotten so the variables can be reused. +The special shell commands all have the syntax of (local) function calls. They are evaluated as normal function calls and many commands can be used in one - expression sequence. -
+ expression sequence. +If a command (local function call) is not recognized by the
- shell, an attempt is first made to find the function in the
+ shell, an attempt is first made to find the function in
module
The shell also permits the user to start multiple concurrent - jobs. A job can be regarded as a set of processes which can - communicate with the shell. -
+ jobs. A job can be regarded as a set of processes that can + communicate with the shell. +There is some support for reading and printing records in
the shell. During compilation record expressions are translated
    to tuple expressions. At runtime it is not known whether a tuple
- actually represents a record. Nor are the record definitions
- used by compiler available at runtime. So in order to read the
+ represents a record, and the record definitions
+ used by the compiler are unavailable at runtime. So, to read the
record syntax and print tuples as records when possible, record
- definitions have to be maintained by the shell itself. The shell
- commands for reading, defining, forgetting, listing, and
- printing records are described below. Note that each job has its
- own set of record definitions. To facilitate matters record
- definitions in the modules
The shell commands for reading, defining, forgetting, listing, and
+ printing records are described below. Notice that each job has its
+ own set of record definitions. To facilitate matters, record
+ definitions in modules
- -include_lib("kernel/include/file.hrl").
- to
The shell runs in two modes:
+-include_lib("kernel/include/file.hrl"). + +The shell runs in two modes:
+Job Control Mode,
Only the currently connected job can 'talk' to the shell.
Removes all variable bindings. -
+Removes all variable bindings.
Removes the binding of variable
Removes the binding of variable
Prints the history list. -
+Prints the history list.
Sets the number of previous commands to keep in the
history list to
Sets the number of results from previous commands to keep in
the history list to
Repeats the command
Repeats command
Uses the return value of the command
Uses the return value of command
Evaluates
Evaluates
Evaluates
Defines a record in the shell.
Removes all record definitions, then reads record
definitions from the modules
Removes selected record definitions.
Prints all record definitions. -
+Prints all record definitions.
Prints selected record definitions.
-
Prints a term using the record definitions known to the
shell. All of
Reads record definitions from a module's BEAM file. If
there are no record definitions in the BEAM file, the
source file is located and read instead. Returns the names
- of the record definitions read.
Reads record definitions from files. Existing
definitions of any of the record names read are replaced.
Reads record definitions from files but
discards record names not mentioned in
Reads record definitions from files. The compiler
options
The following example is a long dialogue with the shell. Commands +
The following example is a long dialog with the shell. Commands
starting with
strider 1> erl Erlang (BEAM) emulator version 5.3 [hipe] [threads:0] Eshell V5.3 (abort with ^G) -1>Str = "abcd". -"abcd" +1> Str = "abcd". +"abcd"+ +
Command 1 sets variable
2> L = length(Str). -4 +4+ +
Command 2 sets
3> Descriptor = {L, list_to_atom(Str)}.
-{4,abcd}
+{4,abcd}
+
+ Command 3 builds the tuple
4> L. -4 +4+ +
Command 4 prints the value of variable
5> b().
Descriptor = {4,abcd}
L = 4
Str = "abcd"
-ok
+ok
+
+ Command 5 evaluates the internal shell command
6> f(L). -ok +ok+ +
Command 6 evaluates the internal shell command
7> b().
Descriptor = {4,abcd}
Str = "abcd"
-ok
+ok
+
+ Command 7 prints the new bindings.
+ +8> f(L). -ok +ok+ +
Command 8 has no effect, as
9> {L, _} = Descriptor.
-{4,abcd}
+{4,abcd}
+
+ Command 9 performs a pattern matching operation on
+
10> L. -4 +4+ +
Command 10 prints the current value of
11> {P, Q, R} = Descriptor.
-** exception error: no match of right hand side value {4,abcd}
+** exception error: no match of right hand side value {4,abcd}
+
+ Command 11 tries to match
12> P.
-* 1: variable 'P' is unbound **
+* 1: variable 'P' is unbound
13> Descriptor.
-{4,abcd}
+{4,abcd}
+
+ Commands 12 and 13 show that
14> {P, Q} = Descriptor.
{4,abcd}
15> P.
-4
+4
+
+ Commands 14 and 15 show a correct match where
16> f(). -ok +ok+ +
Command 16 clears all bindings.
+ +The next few commands assume that
+
+demo(X) ->
+    put(aa, worked),
+    X = 1,
+    X + 10.
+
17> put(aa, hello). undefined 18> get(aa). -hello +hello+ +
Commands 17 and 18 set and inspect the value of item
+
19> Y = test1:demo(1). -11 +11+ +
Command 19 evaluates
20> get().
[{aa,worked}]
21> put(aa, hello).
worked
22> Z = test1:demo(2).
** exception error: no match of right hand side value 1
- in function test1:demo/1
+ in function test1:demo/1
+
+ Commands 21 and 22 change the value of dictionary item
+
23> Z. -* 1: variable 'Z' is unbound ** +* 1: variable 'Z' is unbound 24> get(aa). -hello +hello+ +
Commands 23 and 24 show that
25> erase(), put(aa, hello). undefined 26> spawn(test1, demo, [1]). <0.57.0> 27> get(aa). -hello +hello+ +
Commands 25, 26, and 27 show the effect of evaluating
+
28> io:format("hello hello\n").
hello hello
ok
@@ -341,31 +444,96 @@ ok
hello hello
ok
30> v(28).
-ok
+ok
+
+ Commands 28, 29, and 30 use the history facilities of the shell. + Command 29 re-evaluates command 28. Command 30 uses the value (result) + of command 28. In the case of a pure function (a function + with no side effects), the result is the same. For a function + with side effects, the result can be different.
+ +The next few commands show some record manipulation. It is
+ assumed that
+
+-record(rec, {a, b = val()}).
+
+val() ->
+    3.
+
31> c(ex).
{ok,ex}
32> rr(ex).
-[rec]
+[rec]
+
+ Commands 31 and 32 compile file
33> rl(rec).
-record(rec,{a,b = val()}).
-ok
+ok
+
+ Command 33 prints the definition of the record named
+
34> #rec{}.
-** exception error: undefined shell command val/0
+** exception error: undefined shell command val/0
+
+ Command 34 tries to create a
35> #rec{b = 3}.
-#rec{a = undefined,b = 3}
+#rec{a = undefined,b = 3}
+
+ Command 35 shows the workaround: explicitly assign values to record + fields that cannot otherwise be initialized.
+ +
36> rp(v(-1)).
#rec{a = undefined,b = 3}
-ok
+ok
+
+ Command 36 prints the newly created record using record + definitions maintained by the shell.
+ +
37> rd(rec, {f = orddict:new()}).
-rec
+rec
+
+ Command 37 defines a record directly in the shell. The
+ definition replaces the one read from file
38> #rec{}.
#rec{f = []}
-ok
+ok
+
+ Command 38 creates a record using the new definition, and + prints the result.
+ +
39> rd(rec, {c}), A.
-* 1: variable 'A' is unbound **
+* 1: variable 'A' is unbound
40> #rec{}.
#rec{c = undefined}
-ok
+ok
+
+ Commands 39 and 40 show that record definitions are updated
+ as side effects. The evaluation of the command fails, but
+ the definition of
For the next command, it is assumed that
+
+loop(N) ->
+    io:format("Hello Number: ~w~n", [N]),
+    loop(N+1).
+
41> test1:loop(0). Hello Number: 0 Hello Number: 1 @@ -383,225 +551,122 @@ Hello Number: 3375 Hello Number: 3376 Hello Number: 3377 Hello Number: 3378 -** exception exit: killed +** exception exit: killed+ +
Command 41 evaluates
In this particular case, command
42> E = ets:new(t, []). -17 +17+ +
Command 42 creates an ETS table.
+ +
43> ets:insert({d,1,2}).
-** exception error: undefined function ets:insert/1
+** exception error: undefined function ets:insert/1
+
+ Command 43 tries to insert a tuple into the ETS table, but the + first argument (the table) is missing. The exception kills the + evaluator process.
+ +
44> ets:insert(E, {d,1,2}).
** exception error: argument is of wrong type
in function ets:insert/2
- called as ets:insert(16,{d,1,2})
+ called as ets:insert(16,{d,1,2})
+
+ Command 44 corrects the mistake, but the ETS table has been + destroyed as it was owned by the killed evaluator process.
+ +45> f(E). ok 46> catch_exception(true). -false +false+ +
Command 46 sets the exception handling of the evaluator process
+ to
47> E = ets:new(t, []).
18
48> ets:insert({d,1,2}).
-* exception error: undefined function ets:insert/1
-49> ets:insert(E, {d,1,2}).
-true
-50> halt().
-strider 2>
- Command 1 sets the variable
Command 2 sets
Command 3 builds the tuple
Command 4 prints the value of the variable
Command 5 evaluates the internal shell command
Command 6
Command 7 prints the new bindings. -
-Command 8 has no effect since
Command 9 performs a pattern matching operation on
-
Command 10 prints the current value of
Command 11 tries to match
Commands 12 and 13 show that
Commands 14 and 15 show a correct match where
Command 16 clears all bindings. -
-The next few commands assume that
-demo(X) -> - put(aa, worked), - X = 1, - X + 10.-
Commands 17 and 18 set and inspect the value of the item
-
Command 19 evaluates
Commands 21 and 22 change the value of the dictionary item
-
Commands 23 and 24 show that
Commands 25, 26 and 27 show the effect of evaluating
-
Commands 28, 29 and 30 use the history facilities of the shell. -
-Command 29 is
The next few commands show some record manipulation. It is
- assumed that
--record(rec, {a, b = val()}).
-
-val() ->
- 3.
- Commands 31 and 32 compiles the file
Command 33 prints the definition of the record named
-
Command 34 tries to create a
Command 36 prints the newly created record using record - definitions maintained by the shell. -
-Command 37 defines a record directly in the shell. The
- definition replaces the one read from the file
Command 38 creates a record using the new definition, and - prints the result. -
-Command 39 and 40 show that record definitions are updated
- as side effects. The evaluation of the command fails but
- the definition of
For the next command, it is assumed that
-loop(N) ->
- io:format("Hello Number: ~w~n", [N]),
- loop(N+1).
- Command 41 evaluates
In this particular case, the
Command 42 creates an ETS table.
-Command 43 tries to insert a tuple into the ETS table but the - first argument (the table) is missing. The exception kills the - evaluator process.
-Command 44 corrects the mistake, but the ETS table has been - destroyed since it was owned by the killed evaluator process.
-Command 46 sets the exception handling of the evaluator process
- to
Command 48 makes the same mistake as in command 43, but this time the evaluator process lives on. The single star at the beginning of the printout signals that the exception has been caught.
+ +
+49> ets:insert(E, {d,1,2}).
+true
+
Command 49 successfully inserts the tuple into the ETS table.
-The
+50> halt(). +strider 2>+ +
Command 50 exits the Erlang runtime system.
When the shell starts, it starts a single evaluator
- process. This process, together with any local processes which
+ process. This process, together with any local processes that
it spawns, is referred to as a
All jobs which do not use standard IO run in the normal way. -
-The shell escape key
- --> ? - c [nn] - connect to job - i [nn] - interrupt job - k [nn] - kill job - j - list all jobs - s [shell] - start local shell - r [node [shell]] - start remote shell - q - quit erlang - ? | h - this message+ with standard I/O. All other jobs, which are said to be
All jobs that do not use standard I/O run in the normal way.
+ +The shell escape key
+--> ? +c [nn] - connect to job +i [nn] - interrupt job +k [nn] - kill job +j - list all jobs +s [shell] - start local shell +r [node [shell]] - start remote shell +q - quit erlang +? | h - this message+
The
Connects to job number
Stops the current evaluator process for job number
Lists all jobs. A list of all known jobs is - printed. The current job name is prefixed with '*'. -
+ printed. The current job name is prefixed with '*'.Starts a new job. This will be assigned the new index
-
Starts a new job. This is assigned the new index
+
Starts a new job. This will be assigned the new index
-
Starts a new job. This is assigned the new index
+
Starts a remote job on
Quits Erlang. Note that this option is disabled if
- Erlang is started with the ignore break,
Quits Erlang. Notice that this option is disabled if
+ Erlang is started with the ignore break,
Displays this message.
+Displays the help message above.
It is possible to alter the behavior of shell escape by means
- of the STDLIB application variable
The behavior of shell escape can be changed by the STDLIB
+ application variable
If you want an Erlang node to have a remote job active from the start
- (rather than the default local job), you start Erlang with the
-
If you want an Erlang node to have a remote job active from the start
+ (rather than the default local job), start Erlang with flag
+
The shell may be started in a +
The shell can be started in a
restricted mode. In this mode, the shell evaluates a function call
only if allowed. This feature makes it possible to, for example,
prevent a user from accidentally calling a function from the
prompt that could harm a running system (useful in combination
- with the the system flag
When the restricted shell evaluates an expression and
- encounters a function call or an operator application,
+ encounters a function call or an operator application,
it calls a callback function (with
information about the function call in question). This callback
function returns
to determine if the call to the local function
to determine if the call to non-local function
-
These callback functions are in fact called from local and
+
+ This is used to determine if the call to the local function
+ This is used to determine if the call to non-local function
+
+
+
+
These callback functions are called from local and
non-local evaluation function handlers, described in the
-
The
Argument
There are two ways to start a restricted shell session:
+Use STDLIB application variable
From a normal shell session, call function
+
Notes:
When restricted shell mode is activated or + deactivated, new jobs started on the node run in restricted + or normal mode, respectively.
+If restricted mode has been enabled on a + particular node, remote shells connecting to this node also + run in restricted mode.
+The callback functions cannot be used to allow or disallow + execution of functions called from compiled code (only functions + called from expressions entered at the shell prompt).
+Errors when loading the callback module are handled in different ways depending on how the restricted shell is activated:
+If the restricted shell is activated by setting the STDLIB
+ variable during emulator startup, and the callback module cannot be
+ loaded, a default restricted shell allowing only the commands
+
If the restricted shell is activated using
+
The default shell prompt function displays the name of the node
(if the node can be part of a distributed system) and the
current command number. The user can customize the prompt
- function by calling
-
A customized prompt function is stated as a tuple
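As a minimal sketch of such a prompt function, assuming the shell passes a list of key-value pairs that contains {history, N} with the current command number (the module name prompt_sketch and the prompt text are hypothetical):

```erlang
-module(prompt_sketch).
-export([prompt/1]).

%% Build a prompt string from the command number supplied by the shell.
%% Falls back to 0 if the history pair is missing.
prompt(Info) ->
    N = proplists:get_value(history, Info, 0),
    lists:flatten(io_lib:format("sketch(~w)> ", [N])).
```

It would be installed from the shell with shell:prompt_func({prompt_sketch, prompt}).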
Sets the number of previous commands to keep in the
- history list to
Sets the number of results from previous commands to keep in
- the history list to
Sets the number of previous commands to keep in the
+ history list to
Sets the shell prompt function to
Sets the number of results from previous commands to keep in
+ the history list to
Exits a normal shell and starts a restricted
- shell.
Exits a normal shell and starts a restricted shell.
+
If the callback module cannot be loaded, an error tuple is
returned. The
Exits a restricted shell and starts a normal shell. The function is meant to be called from the shell.
Sets pretty printing of lists to
The flag can also be set by the STDLIB application variable
-
The functions in
Consider the following shell dialogue:
+The functions in this module are called when no module name is + specified in a shell command.
+ +Consider the following shell dialog:
+
-1 > lists:reverse("abc").
+1> lists:reverse("abc").
"cba"
-2 > c(foo).
-{ok, foo}
- In command one, the module
In command one, module
To add your own commands to the shell, create a module called
-code:load_abs("$PATH/user_default").
+code:load_abs("$PATH/user_default").
+
This module provides functions for starting Erlang slave nodes. - All slave nodes which are started by a master will terminate - automatically when the master terminates. All TTY output produced - at the slave will be sent back to the master node. File I/O is - done via the master.
+ All slave nodes that are started by a master terminate + automatically when the master terminates. All terminal output produced + at the slave is sent back to the master node. File I/O is + done through the master. +Slave nodes on other hosts than the current one are started with
- the program
An alternative to the
The slave node should use the same file system at the master. At - least, Erlang/OTP should be installed in the same place on both - computers and the same version of Erlang should be used.
-Currently, a node running on Windows NT can only start slave
+ the command line to
+
+-rsh Program+ +
The slave node is to use the same file system as the master. At + least, Erlang/OTP is to be installed in the same place on both + computers and the same version of Erlang is to be used.
+ +A node running on Windows can only start slave nodes on the host on which it is running.
+The master node must be alive.
Calls
+% erl -name abc -s slave pseudo klacke@super x --+
Starts a number of pseudo servers. A pseudo server is a + server with a registered name that does nothing + but pass on all messages to the real server that executes at a + master node. A pseudo server is an intermediary that only has + the same registered name as the real server.
+For example, if you have started a slave node
+rpc:call(N, slave, pseudo, [node(), [pxw_server]]).
+ Runs a pseudo server. This function never returns any value
+ and the process that executes the function receives
+ messages. All messages received are simply passed on to
+
Starts a slave node on the host
Starts a slave node on host
The name of the started node will be
The name of the started node becomes
+
The slave node resets its
The
As an example, suppose that we want to start a slave node at
- host
Argument
As an example, suppose that you want to start a slave node at
+ host
directory
the Mnesia directory should be set to
the unix
The following code is executed to achieve this:
E = " -env DISPLAY " ++ net_adm:localhost() ++ ":0 ",
Arg = "-mnesia_dir " ++ M ++ " -pa " ++ Dir ++ E,
slave:start(H, Name, Arg).
- If successful, the function returns
The function returns
The master node failed to get in contact with the slave - node. This can happen in a number of circumstances:
+ node. This can occur in a number of circumstances:A node with the name
A node with name
Starts a slave node in the same way as
See
For a description of arguments and return values, see
+
Stops (kills) a node.
Calls
-% erl -name abc -s slave pseudo klacke@super x ---
Starts a number of pseudo servers. A pseudo server is a - server with a registered name which does absolutely nothing - but pass on all message to the real server which executes at a - master node. A pseudo server is an intermediary which only has - the same registered name as the real server.
-For example, if we have started a slave node
-rpc:call(N, slave, pseudo, [node(), [pxw_server]]).
- Runs a pseudo server. This function never returns any value
- and the process which executes the function will receive
- messages. All messages received will simply be passed on to
-
The
This module provides operations on finite sets and relations represented as sets. Intuitively, a set is a collection of elements; every element belongs to the set, and the set contains every element.
+Given a set A and a sentence S(x), where x is a free variable, a new set B whose elements are exactly those elements of A for which S(x) holds can be formed, this is denoted B = {x in A : S(x)}. Sentences are expressed using the logical operators "for some" (or "there exists"), "for all", "and", "or", "not". If the existence of a set containing all the - specified elements is known (as will always be the case in this - module), we write B = {x : S(x)}.
-The unordered set containing the elements a, b and c - is denoted {a, b, c}. This notation is not to be - confused with tuples. The ordered pair of a and b, with - first coordinate a and second coordinate b, is denoted - (a, b). An ordered pair is an ordered set of two - elements. In this module ordered sets can contain one, two or - more elements, and parentheses are used to enclose the elements. - Unordered sets and ordered sets are orthogonal, again in this - module; there is no unordered set equal to any ordered set.
-The set that contains no elements is called the empty set.
- If two sets A and B contain the same elements, then A
- is
The
A
Sometimes, when the range of a function is more important than
- the function itself, the function is called a family.
- The domain of a family is called the index set, and the
- range is called the indexed set. If x is a family from
- I to X, then x[i] denotes the value of the function at index i.
- The notation "a family in X" is used for such a family. When the
- indexed set is a set of subsets of a set X, then we call x
- a
A
The unordered set containing the elements a, b, and c is + denoted {a, b, c}. This notation is not to be confused with + tuples.
+The ordered pair of a and b, with first coordinate + a and second coordinate b, is denoted (a, b). An ordered pair + is an ordered set of two elements. In this module, ordered + sets can contain one, two, or more elements, and parentheses are + used to enclose the elements.
+Unordered sets and ordered sets are orthogonal, again in this + module; there is no unordered set equal to any ordered set.
The empty set contains no elements.
+Set A is
Set B is a
The
The
Two sets are
The
The
The
The
The
A
The
The
The
If A is a subset of X, the
If R is a relation from X to Y, and S is a relation from Y to Z, the
+
The
If S is a restriction of R to A, then R is an
+
If X = Y, then R is called a relation in X.
+The
If R is a relation in X, and if S is defined so that x S y
+ if x R y and not x = y, then S is the
+
A relation R in X is reflexive if x R x for every + element x of X, it is symmetric if x R y implies + that y R x, and it is transitive if + x R y and y R z imply that x R z.
+A
Instead of writing (x, y) in F or x F y, we + write F(x) = y when F is a function, and say that F maps x + onto y, or that the value of F at x is y.
+As functions are relations, the definitions of the last item (domain, + range, and so on) apply to functions as well.
+If the converse of a function F is a function F', then F' is called
+ the
The relative product of two functions F1 and F2 is called
+ the
Sometimes, when the range of a function is more important than the + function itself, the function is called a family.
+The domain of a family is called the index set, and the + range is called the indexed set.
+If x is a family from I to X, then x[i] denotes the value of the + function at index i. The notation "a family in X" is used for such a + family.
+When the indexed set is a set of subsets of a set X, we call x a
+
If x is a family of subsets of X, the union of the range of x is + called the union of the family x.
+If x is non-empty (the index set is non-empty), the intersection + of the family x is the intersection of the range of x.
+In this module, the only families that are considered are families + of subsets of some set X; in the following, the word "family" is + used for such families of subsets.
+A
A relation in a set is an equivalence relation if it is + reflexive, symmetric, and transitive.
+If R is an equivalence relation in X, and x is an element of X, the
+
If R is an equivalence relation in X, the
+
We call a set of ordered sets (x[1], ..., x[n]) an
+
The
The relative product of binary relations can be generalized to n-ary
+ relations as follows. Let TR be an ordered set
+ (R[1], ..., R[n]) of binary relations from X to Y[i]
+ and S a binary relation from
+ (Y[1] × ... × Y[n]) to Z. The
+
The
For every atom T, except '_', and for every term X, + (T, X) belongs to Sets (atomic sets).
+(['_'], []) belongs to Sets (the untyped empty + set).
+For every tuple T = {T[1], ..., T[n]} and + for every tuple X = {X[1], ..., X[n]}, if + (T[i], X[i]) belongs to Sets for every + 1 <= i <= n, then (T, X) belongs + to Sets (ordered sets).
+For every term T, if X is the empty list or a non-empty + sorted list [X[1], ..., X[n]] without duplicates + such that (T, X[i]) belongs to Sets for every + 1 <= i <= n, then ([T], X) + belongs to Sets (typed unordered sets).
+An
A
If S is an element (T, X) of Sets, then T is a
+
The sets represented by Sets are the elements of the range of + function Set from Sets to Erlang terms and sets of Erlang terms:
+When there is no risk of confusion, elements of Sets are identified
+ with the sets they represent. For example, if U is the result of
+ calling
An
The actual sets represented by Sets are the elements of the - range of the function Set from Sets to Erlang terms and sets of - Erlang terms:
-When there is no risk of confusion, elements of Sets will be
- identified with the sets they represent. For instance, if U is
- the result of calling
The types are used to implement the various conditions that
- sets need to fulfill. As an example, consider the relative
+ sets must fulfill. As an example, consider the relative
product of two sets R and S, and recall that the relative
product of R and S is defined if R is a binary relation to Y and
- S is a binary relation from Y. The function that implements the relative
- product,
A few functions of this module (
A few functions of this module
+ (
If SetFun is specified as a fun, the fun is applied to each element + of the given set and the return value is assumed to be a set.
+If SetFun is specified as a tuple
Specifying a SetFun as an integer I is equivalent to
+ specifying
Examples of SetFuns:
+
fun sofs:union/1
fun(S) -> sofs:partition(1, S) end
@@ -325,22 +367,31 @@ fun(S) -> sofs:partition(1, S) end
{external, fun({_,{_,C}}) -> C end}
{external, fun({_,{_,{_,E}=C}}) -> {E,{E,C}} end}
2
+
The order in which a SetFun is applied to the elements of an - unordered set is not specified, and may change in future - versions of sofs.
+ unordered set is not specified, and can change in future + versions of this module. +The execution time of the functions of this module is dominated
by the time it takes to sort lists. When no sorting is needed,
the execution time is in the worst case proportional to the sum
of the sizes of the input arguments and the returned value. A
- few functions execute in constant time:
The functions of this module exit the process with a
When comparing external sets the operator
When comparing external sets, operator
A tuple where the elements are of type
Creates a
Returns the binary relation containing the elements
- (E, Set) such that Set belongs to
1> Ss = sofs:from_term([[a,b],[b,c]]),
CR = sofs:canonical_relation(Ss),
@@ -435,13 +488,14 @@ fun(S) -> sofs:partition(1, S) end
[{a,[a,b]},{b,[a,b]},{b,[b,c]},{c,[b,c]}]
Returns the
1> F1 = sofs:a_function([{a,1},{b,2},{c,2}]),
F2 = sofs:a_function([{1,x},{2,y},{3,z}]),
@@ -450,13 +504,14 @@ fun(S) -> sofs:partition(1, S) end
[{a,x},{b,y},{c,y}]
Creates the
1> S = sofs:set([a,b]),
E = sofs:from_term(1),
@@ -465,12 +520,13 @@ fun(S) -> sofs:partition(1, S) end
[{a,1},{b,1}]
Returns the
1> R1 = sofs:relation([{1,a},{2,b},{3,a}]),
R2 = sofs:converse(R1),
@@ -478,39 +534,42 @@ fun(S) -> sofs:partition(1, S) end
[{a,1},{a,3},{b,2}]
Returns the
Returns the
Creates a
If G is a directed graph, it holds that the vertices and
edges of G are the same as the vertices and edges of
Returns the
Returns the
1> R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),
S = sofs:domain(R),
@@ -518,14 +577,15 @@ fun(S) -> sofs:partition(1, S) end
[1,2]
Returns the difference between the binary relation
-
1> R1 = sofs:relation([{1,a},{2,b},{3,c}]),
S = sofs:set([2,4,6]),
@@ -536,14 +596,15 @@ fun(S) -> sofs:partition(1, S) end
difference(R, restriction(R, S)) .
Returns a subset of
Returns a subset of
1> SetFun = {external, fun({_A,B,C}) -> {B,C} end},
R1 = sofs:relation([{a,aa,1},{b,bb,2},{c,cc,3}]),
@@ -555,24 +616,27 @@ fun(S) -> sofs:partition(1, S) end
difference(S1, restriction(F, S1, S2)) .
Returns the Returns the
Returns the
Returns the
1> S = sofs:set([b,c]),
A = sofs:empty_set(),
@@ -582,31 +646,33 @@ fun(S) -> sofs:partition(1, S) end
[{a,[1,2]},{b,[3]},{c,[]}]
Creates a
Creates a
If
If
1> F1 = sofs:family([{a,[1,2]},{b,[3,4]}]),
F2 = sofs:family([{b,[4,5]},{c,[6,7]}]),
@@ -615,19 +681,20 @@ fun(S) -> sofs:partition(1, S) end
[{a,[1,2]},{b,[3]}]
If
If
1> FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),
F = sofs:family_domain(FR),
@@ -635,43 +702,46 @@ fun(S) -> sofs:partition(1, S) end
[{a,[1,2,3]},{b,[]},{c,[4,5]}]
If
If
1> FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),
F = sofs:family_field(FR),
sofs:to_external(F).
[{a,[1,2,3,a,b,c]},{b,[]},{c,[4,5,d,e]}]
If
If
If
If
1> F1 = sofs:from_term([{a,[[1,2,3],[2,3,4]]},{b,[[x,y,z],[x,y]]}]),
@@ -680,17 +750,18 @@ fun(S) -> sofs:partition(1, S) end
[{a,[2,3]},{b,[x,y]}]
If
If
1> F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]),
F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]),
@@ -699,17 +770,18 @@ fun(S) -> sofs:partition(1, S) end
[{b,[4]},{c,[]}]
If
If
1> F1 = sofs:from_term([{a,[[1,2],[2,3]]},{b,[[]]}]),
F2 = sofs:family_projection(fun sofs:union/1, F1),
@@ -717,19 +789,20 @@ fun(S) -> sofs:partition(1, S) end
[{a,[1,2,3]},{b,[]}]
If
If
1> FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),
F = sofs:family_range(FR),
@@ -737,22 +810,23 @@ fun(S) -> sofs:partition(1, S) end
[{a,[a,b,c]},{b,[]},{c,[d,e]}]
If
If
1> F1 = sofs:family([{a,[1,2,3]},{b,[1,2]},{c,[1]}]),
SpecFun = fun(S) -> sofs:no_elements(S) =:= 2 end,
@@ -761,23 +835,24 @@ fun(S) -> sofs:partition(1, S) end
[{b,[1,2]}]
Creates a directed graph from
- the
Creates a directed graph from
+
If no graph type is given
If no graph type is specified,
If F is a family, it holds that F is a subset of
If
If
1> F = sofs:family([{a,[]}, {b,[1]}, {c,[2,3]}]),
R = sofs:family_to_relation(F),
@@ -803,19 +879,20 @@ fun(S) -> sofs:partition(1, S) end
[{b,1},{c,2},{c,3}]
If
If
1> F1 = sofs:from_term([{a,[[1,2],[2,3]]},{b,[[]]}]),
F2 = sofs:family_union(F1),
@@ -825,19 +902,20 @@ fun(S) -> sofs:partition(1, S) end
family_projection(fun sofs:union/1, F) .
If
If
1> F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]),
F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]),
@@ -846,40 +924,43 @@ fun(S) -> sofs:partition(1, S) end
[{a,[1,2]},{b,[3,4,5]},{c,[5,6,7,8]},{d,[9,10]}]
Returns the
1> R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),
S = sofs:field(R),
sofs:to_external(S).
[1,2,a,b,c]
- Creates a set from the
Returns the
Returns the
1> S1 = sofs:relation([{a,1},{b,2}]),
S2 = sofs:relation([{x,3},{y,4}]),
@@ -888,31 +969,33 @@ fun(S) -> sofs:partition(1, S) end
[[{a,1},{b,2}],[{x,3},{y,4}]]
Returns the
Returns the
1> S = sofs:from_term([{{"foo"},[1,1]},{"foo",[2,2]}],
[{atom,[atom]}]),
@@ -920,12 +1003,12 @@ fun(S) -> sofs:partition(1, S) end
[{{"foo"},[1]},{"foo",[2]}]
1> A = sofs:from_term(a),
@@ -935,19 +1018,25 @@ fun(S) -> sofs:partition(1, S) end
Ss = sofs:from_sets([P1,P2]),
sofs:to_external(Ss).
[{a,[1,2,3]},{b,[4,5,6]}]
- Other functions that create sets are
Other functions that create sets are
+
Returns the
Returns the
1> R = sofs:relation([{1,a},{2,b},{2,c},{3,d}]),
S1 = sofs:set([1,2]),
@@ -956,32 +1045,35 @@ fun(S) -> sofs:partition(1, S) end
[a,b,c]
Returns
- the
Returns
+ the
Intersecting an empty set of sets exits the process with a
Returns
- the
Returns
+ the
Returns the intersection of
- the
Returns the intersection of
+
Intersecting an empty family exits the process with a
Returns the
1> R1 = sofs:relation([{1,a},{2,b},{3,c}]),
R2 = sofs:inverse(R1),
@@ -1005,14 +1098,15 @@ fun(S) -> sofs:partition(1, S) end
[{a,1},{b,2},{c,3}]
Returns the
1> R = sofs:relation([{1,a},{2,b},{2,c},{3,d}]),
S1 = sofs:set([c,d,e]),
@@ -1021,42 +1115,46 @@ fun(S) -> sofs:partition(1, S) end
[2,3]
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
1> S1 = sofs:set([1.0]), S2 = sofs:set([1]), @@ -1064,50 +1162,55 @@ fun(S) -> sofs:partition(1, S) end true
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns
Returns the
Returns the
1> R1 = sofs:relation([{a,x,1},{b,y,2}]),
R2 = sofs:relation([{1,f,g},{1,h,i},{2,3,4}]),
@@ -1116,18 +1219,19 @@ true
[{a,x,1,f,g},{a,x,1,h,i},{b,y,2,3,4}]
If
If
1> Ri = sofs:relation([{a,1},{b,2},{c,3}]),
R = sofs:relation([{a,b},{b,c},{c,a}]),
@@ -1136,22 +1240,24 @@ true
[{1,2},{2,3},{3,1}]
Returns the number of elements of the ordered or unordered
- set
Returns the
Returns the
1> Sets1 = sofs:from_term([[a,b,c],[d,e,f],[g,h,i]]), Sets2 = sofs:from_term([[b,c,d],[e,f,g],[h,i,j]]), @@ -1160,13 +1266,14 @@ true[[a],[b,c],[d],[e,f],[g],[h,i],[j]]
Returns the
Returns the
1> Ss = sofs:from_term([[a],[b],[c,d],[e,f]]), SetFun = fun(S) -> sofs:from_term(sofs:no_elements(S)) end, @@ -1175,17 +1282,18 @@ true[[[a],[b]],[[c,d],[e,f]]]
Returns a pair of sets that, regarded as constituting a
- set, forms a
1> R1 = sofs:relation([{1,a},{2,b},{3,c}]),
S = sofs:set([2,4,6]),
@@ -1193,23 +1301,23 @@ true
{sofs:to_external(R2),sofs:to_external(R3)}.
{[{2,b}],[{1,a},{3,c}]}
Returns the
Returns
1> S = sofs:relation([{a,a,a,a},{a,a,b,b},{a,b,b,b}]),
SetFun = {external, fun({A,_,C,_}) -> {A,C} end},
@@ -1218,16 +1326,16 @@ true
[{{a,a},[{a,a,a,a}]},{{a,b},[{a,a,b,b},{a,b,b,b}]}]
Returns the
Returns the
1> S1 = sofs:set([a,b]), S2 = sofs:set([1,2]), @@ -1237,13 +1345,14 @@ true[{a,1,x},{a,1,y},{a,2,x},{a,2,y},{b,1,x},{b,1,y},{b,2,x},{b,2,y}]
Returns the
Returns the
1> S1 = sofs:set([1,2]), S2 = sofs:set([a,b]), @@ -1254,17 +1363,18 @@ true
Returns the set created by substituting each element of
-
If
If
1> S1 = sofs:from_term([{1,a},{2,b},{3,a}]),
S2 = sofs:projection(2, S1),
@@ -1272,12 +1382,13 @@ true
[a,b]
Returns the
1> R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),
S = sofs:range(R),
@@ -1285,6 +1396,7 @@ true
[a,b,c]
Creates a
Returns the
Returns
1> R = sofs:relation([{b,1},{c,2},{c,3}]),
F = sofs:relation_to_family(R),
@@ -1320,20 +1433,21 @@ true
[{b,[1]},{c,[2,3]}]
If
If
If
If
Note that
Notice that
Returns the
Returns the
Returns the
1> R1 = sofs:relation([{1,a},{1,aa},{2,b}]),
R2 = sofs:relation([{1,u},{2,v},{3,c}]),
@@ -1382,13 +1496,14 @@ true
Returns the
1> R1 = sofs:relation([{1,a},{2,b},{3,c}]),
S = sofs:set([1,2,4]),
@@ -1397,13 +1512,14 @@ true
[{1,a},{2,b}]
Returns a subset of
Returns a subset of
1> S1 = sofs:relation([{1,a},{2,b},{3,c}]),
S2 = sofs:set([b,c,d]),
@@ -1412,28 +1528,30 @@ true
[{2,b},{3,c}]
Creates an
Creates an
Returns the set containing every element
- of
1> R1 = sofs:relation([{a,1},{b,2}]),
@@ -1444,14 +1562,15 @@ true
[[{a,1},{b,2}]]
Returns the Returns the
1> R1 = sofs:relation([{1,1},{1,2},{2,1},{2,2}]),
R2 = sofs:strict_relation(R1),
@@ -1459,13 +1578,14 @@ true
[{1,2},{2,1}]
Returns a function, the domain of which
- is
1> L = [{a,1},{b,2}].
@@ -1482,24 +1602,24 @@ true
1> I = sofs:substitution(fun(A) -> A end, sofs:set([a,b,c])),
sofs:to_external(I).
[{a,a},{b,b},{c,c}]
- Let SetOfSets be a set of sets and BinRel a binary
- relation. The function that maps each element Set of
- SetOfSets onto the
Let
images(SetOfSets, BinRel) ->
Fun = fun(Set) -> sofs:image(BinRel, Set) end,
sofs:substitution(Fun, SetOfSets).
- Here might be the place to reveal something that was more - or less stated before, namely that external unordered sets - are represented as sorted lists. As a consequence, creating - the image of a set under a relation R may traverse all +
External unordered sets are represented as sorted lists. So,
+ creating the image of a set under a relation R can traverse all
elements of R (to that comes the sorting of results, the
- image). In
images2(SetOfSets, BinRel) ->
CR = sofs:canonical_relation(SetOfSets),
@@ -1507,13 +1627,14 @@ images2(SetOfSets, BinRel) ->
sofs:relation_to_family(R).
Returns the Returns the
1> S1 = sofs:set([1,2,3]), S2 = sofs:set([2,3,4]), @@ -1522,68 +1643,81 @@ images2(SetOfSets, BinRel) -> [1,4]
Returns a triple of sets:
Returns a triple of sets:
+Returns the
Returns the
Returns the elements of the ordered set
Returns the elements of the ordered set
Returns the
Returns the
Returns the
Returns the union of
- the
Returns the union of
1> F = sofs:family([{a,[0,2,4]},{b,[0,1,2]},{c,[2,3]}]),
S = sofs:union_of_family(F),
@@ -1591,16 +1725,17 @@ images2(SetOfSets, BinRel) ->
[0,1,2,3,4]
Returns a subset S of the
@@ -1614,11 +1749,11 @@ images2(SetOfSets, BinRel) ->diff --git a/lib/stdlib/doc/src/stdlib_app.xml b/lib/stdlib/doc/src/stdlib_app.xml index 5508be9c5d..cde73269a8 100644 --- a/lib/stdlib/doc/src/stdlib_app.xml +++ b/lib/stdlib/doc/src/stdlib_app.xml @@ -29,38 +29,38 @@ See Also -+
dict(3) , -digraph(3) , -orddict(3) , -ordsets(3) , -sets(3)
, + dict(3) , + digraph(3) , + orddict(3) , + ordsets(3) sets(3) STDLIB -The STDLIB Application +The STDLIB application. - The STDLIB is mandatory in the sense that the minimal system - based on Erlang/OTP consists of Kernel and STDLIB. The STDLIB - application contains no services.
+The
STDLIB application is mandatory in the sense that the minimal + system based on Erlang/OTP consists ofKernel andSTDLIB . + TheSTDLIB application contains no services.Configuration -The following configuration parameters are defined for the STDLIB - application. See
+app(4) for more information about - configuration parameters.The following configuration parameters are defined for the
+STDLIB + application. For more information about configuration parameters, see the +module in Kernel. app(4) shell_esc = icl | abort - -
This parameter can be used to alter the behaviour of - the Erlang shell when ^G is pressed.
+Can be used to change the behavior of the Erlang shell when + ^G is pressed.
restricted_shell = module() - -
This parameter can be used to run the Erlang shell - in restricted mode.
+Can be used to run the Erlang shell in restricted mode.
shell_catch_exception = boolean() - -
This parameter can be used to set the exception handling - of the Erlang shell's evaluator process.
+Can be used to set the exception handling of the evaluator process of + the Erlang shell.
shell_history_length = integer() >= 0 - -
This parameter can be used to determine how many - commands are saved by the Erlang shell.
+Can be used to determine how many commands are saved by the Erlang + shell.
shell_prompt_func = {Mod, Func} | default - @@ -69,27 +69,26 @@
Mod = atom() - -
Func = atom() This parameter can be used to set a customized - Erlang shell prompt function.
+Can be used to set a customized Erlang shell prompt function.
shell_saved_results = integer() >= 0 - -
This parameter can be used to determine how many - results are saved by the Erlang shell.
+Can be used to determine how many results are saved by the Erlang + shell.
shell_strings = boolean() - -
This parameter can be used to determine how the Erlang - shell outputs lists of integers.
+Can be used to determine how the Erlang shell outputs lists of + integers.
diff --git a/lib/stdlib/doc/src/string.xml b/lib/stdlib/doc/src/string.xml index a9ecb60244..dddedf1132 100644 --- a/lib/stdlib/doc/src/string.xml +++ b/lib/stdlib/doc/src/string.xml @@ -24,306 +24,372 @@ See Also -+
app(4) , -application(3) , -shell(3) ,
, + app(4) , + application(3) shell(3) string Robert Virding -Bjarne Dacker +Bjarne Däcker 1 Bjarne Däcker - 96-09-28 +1996-09-28 A -string.sgml +string.xml string -String Processing Functions +String processing functions. - +This module contains functions for string processing.
+This module provides functions for string processing.
- +- Return the length of a string ++ + Center a string. - Returns the number of characters in the string.
+Returns a string, where
is centered in the + string and surrounded by blanks or String . + The resulting string has length Character . Number - +- Test string equality ++ + Returns a string consisting of numbers of characters. - Tests whether two strings are equal. Returns
+true if - they are, otherwisefalse .Returns a string consisting of
characters + Number . Optionally, the string can end with + string Character . Tail - +- Concatenate two strings ++ Return the index of the first occurrence of + a character in a string. - Concatenates two strings to form a new string. Returns the - new string.
+Returns the index of the first occurrence of +
in Character . Returns + String 0 ifdoes not occur. Character - +- - Return the index of the first/last occurrence of +Character inString + Concatenate two strings. - Returns the index of the first/last occurrence of -
+in Character . String 0 is returned ifdoes not - occur. Character Concatenates
and + String1 to form a new string + String2 , which is returned. String3 - +- - Find the index of a substring ++ Copy a string. - Returns the position where the first/last occurrence of -
-begins in SubString . String 0 is returned if- does not exist in SubString . - For example: String -> string:str(" Hello Hello World World ", "Hello World"). -8+Returns a string containing
repeated + String times. Number - +- Span characters at start of string +Span characters at start of a string. Returns the length of the maximum initial segment of -
-, which consists entirely of characters from (not - from) String . Chars For example:
+, which consists entirely of characters + not from String . + Chars Example:
-> string:span("\t abcdef", " \t"). -5 > string:cspan("\t abcdef", " \t"). -0+0- + +- - Return a substring of +String + Test string equality. ++ +Returns
+true ifand + String1 are equal, otherwise String2 false .+ ++ Join a list of strings with separator. - Returns a substring of
-, starting at the - position String , and ending at the end of the string or - at length Start . Length For example:
+Returns a string with the elements of
++ separated by the string in StringList . Separator Example:
-> substr("Hello World", 4, 5). -"lo Wo"+> join(["one", "two", "three"], ", "). +"one, two, three"- +- Split string into tokens ++ + Adjust left end of a string. - Returns a list of tokens in
-, separated by the - characters in String . SeparatorList For example:
+Returns
+with the length adjusted in + accordance with String . The left margin is + fixed. If Number length( < +String ), then Number is padded + with blanks or String s. Character Example:
-> tokens("abc defxxghix jkl", "x "). -["abc", "def", "ghi", "jkl"]-Note that, as shown in the example above, two or more - adjacent separator characters in
+> string:left("Hello",10,$.). +"Hello....."- will be treated as one. That is, there will not be any empty - strings in the resulting list of tokens. String - +- Join a list of strings with separator ++ Return the length of a string. - Returns a string with the elements of
-- separated by the string in StringList . Separator For example:
--> join(["one", "two", "three"], ", "). -"one, two, three"+Returns the number of characters in
. String - +- - Returns a string consisting of numbers of characters ++ Return the index of the last occurrence of + a character in a string. - Returns a string consisting of
+of characters - Number . Optionally, the string can end with the - string Character . Tail Returns the index of the last occurrence of +
in Character . Returns + String 0 ifdoes not occur. Character - +- Copy a string ++ + Adjust right end of a string. - Returns a string containing
+repeated - String times. Number Returns
+with the length adjusted in + accordance with String . The right margin is + fixed. If the length of Number ( < +String ), then Number is padded + with blanks or String s. Character Example:
++> string:right("Hello", 10, $.). +".....Hello"- +- - Count blank separated words ++ Find the index of a substring. - Returns the number of words in
-, separated by - blanks or String . Character For example:
+Returns the position where the last occurrence of +
+begins in SubString . + Returns String 0 if+ does not exist in SubString . String Example:
-> words(" Hello old boy!", $o). -4+> string:rstr(" Hello Hello World World ", "Hello World"). +8- + +- - Extract subword ++ Span characters at start of a string. - Returns the word in position
-of Number . - Words are separated by blanks or String s. Character For example:
+Returns the length of the maximum initial segment of +
+, which consists entirely of characters + from String . Chars Example:
-> string:sub_word(" Hello old boy !",3,$o). -"ld b"+> string:span("\t abcdef", " \t"). +5+ ++ Find the index of a substring. ++ +Returns the position where the first occurrence of +
+begins in SubString . + Returns String 0 if+ does not exist in SubString . String Example:
++> string:str(" Hello Hello World World ", "Hello World"). +8++ - Strip leading or trailing characters +Strip leading or trailing characters. Returns a string, where leading and/or trailing blanks or a number of
-have been removed. - Character can be Direction left ,right , or -both and indicates from which direction blanks are to be - removed. The functionstrip/1 is equivalent to +, which can be Direction left ,right , + orboth , indicates from which direction blanks are to be + removed.strip/1 is equivalent tostrip(String, both) .For example:
+Example:
> string:strip("...Hello.....", both, $.). -"Hello"+"Hello"- +- - Adjust left end of string ++ + Extract a substring. - Returns the
-with the length adjusted in - accordance with String . The left margin is - fixed. If the Number length( <String ), - Number is padded with blanks or String s. Character For example:
+Returns a substring of
+, starting at + position String to the end of the string, or to + and including position Start . Stop Example:
-> string:left("Hello",10,$.). -"Hello....."+sub_string("Hello World", 4, 8). +"lo Wo"- -- - Adjust right end of string ++ + Return a substring of a string. - -Returns the
-with the length adjusted in - accordance with String . The right margin is - fixed. If the length of Number ( <String ), - Number is padded with blanks or String s. Character For example:
+Returns a substring of
+, starting at + position String , and ending at the end of the + string or at length Start . Length Example:
-> string:right("Hello", 10, $.). -".....Hello"-- +- - Center a string -- Returns a string, where
+> substr("Hello World", 4, 5). +"lo Wo"is centred in the - string and surrounded by blanks or characters. The resulting - string will have the length String . Number - +- - Extract a substring ++ + Extract subword. - Returns a substring of
-, starting at the - position String to the end of the string, or to and - including the Start position. Stop For example:
+Returns the word in position
+of + Number . Words are separated by blanks or + String s. Character Example:
-sub_string("Hello World", 4, 8). -"lo Wo"+> string:sub_word(" Hello old boy !",3,$o). +"ld b"+ - Returns a float whose text representation is the integers (ASCII values) in String. +Returns a float whose text representation is the integers + (ASCII values) in a string. - Argument
-is expected to start with a valid text - represented float (the digits being ASCII values). Remaining characters - in the string after the float are returned in String . Rest Example:
+Argument
+is expected to start with a + valid text represented float (the digits are ASCII values). + Remaining characters in the string after the float are returned in + String . Rest Example:
- > {F1,Fs} = string:to_float("1.0-1.0e-1"), - > {F2,[]} = string:to_float(Fs), - > F1+F2. - 0.9 - > string:to_float("3/2=1.5"). - {error,no_float} - > string:to_float("-1.5eX"). - {-1.5,"eX"}+> {F1,Fs} = string:to_float("1.0-1.0e-1"), +> {F2,[]} = string:to_float(Fs), +> F1+F2. +0.9 +> string:to_float("3/2=1.5"). +{error,no_float} +> string:to_float("-1.5eX"). +{-1.5,"eX"}+ - Returns an integer whose text representation is the integers (ASCII values) in String. +Returns an integer whose text representation is the integers + (ASCII values) in a string. - Argument
-is expected to start with a valid text - represented integer (the digits being ASCII values). Remaining characters - in the string after the integer are returned in String . Rest Example:
+Argument
+is expected to start with a + valid text represented integer (the digits are ASCII values). + Remaining characters in the string after the integer are returned in + String . Rest Example:
- > {I1,Is} = string:to_integer("33+22"), - > {I2,[]} = string:to_integer(Is), - > I1-I2. - 11 - > string:to_integer("0.5"). - {0,".5"} - > string:to_integer("x=2"). - {error,no_integer}+> {I1,Is} = string:to_integer("33+22"), +> {I2,[]} = string:to_integer(Is), +> I1-I2. +11 +> string:to_integer("0.5"). +{0,".5"} +> string:to_integer("x=2"). +{error,no_integer}+ + - Convert case of string (ISO/IEC 8859-1) +Convert case of string (ISO/IEC 8859-1). - +The given string or character is case-converted. Note that - the supported character set is ISO/IEC 8859-1 (a.k.a. Latin 1), - all values outside this set is unchanged
+The specified string or character is case-converted. Notice that + the supported character set is ISO/IEC 8859-1 (also called Latin 1); + all values outside this set are unchanged
++ + ++ Split string into tokens. ++ +Returns a list of tokens in
+, separated + by the characters in String . SeparatorList Example:
++> tokens("abc defxxghix jkl", "x "). +["abc", "def", "ghi", "jkl"]+Notice that, as shown in this example, two or more + adjacent separator characters in
++ are treated as one. That is, there are no empty + strings in the resulting list of tokens. String + + + Count blank separated words. ++ Returns the number of words in
+, separated + by blanks or String . Character Example:
++> words(" Hello old boy!", $o). +4diff --git a/lib/stdlib/doc/src/supervisor.xml b/lib/stdlib/doc/src/supervisor.xml index 29e5a732d5..294196f746 100644 --- a/lib/stdlib/doc/src/supervisor.xml +++ b/lib/stdlib/doc/src/supervisor.xml @@ -29,124 +29,138 @@ Notes -Some of the general string functions may seem to overlap each - other. The reason for this is that this string package is the - combination of two earlier packages and all the functions of - both packages have been retained. -
+Some of the general string functions can seem to overlap each + other. The reason is that this string package is the + combination of two earlier packages and all functions of + both packages have been retained.
+- Any undocumented functions in
+string should not be used.Any undocumented functions in
string are not to be used.supervisor -Generic Supervisor Behaviour +Generic supervisor behavior. - A behaviour module for implementing a supervisor, a process which +
This behavior module provides a supervisor, a process that supervises other processes called child processes. A child process can either be another supervisor or a worker process. Worker processes are normally implemented using one of the -
+ nice way to structure a fault-tolerant application. For more + information, seegen_event ,gen_fsm ,gen_statem orgen_server - behaviours. A supervisor implemented using this module will have +, + gen_event , + gen_fsm , or + gen_server + behaviors. A supervisor implemented using this module has a standard set of interface functions and include functionality for tracing and error reporting. Supervisors are used to build a hierarchical process structure called a supervision tree, a - nice way to structure a fault tolerant application. Refer to - OTP Design Principles for more information. gen_statem + Supervisor Behaviour in OTP Design Principles. +A supervisor expects the definition of which child processes to supervise to be specified in a callback module exporting a - pre-defined set of functions.
-Unless otherwise stated, all functions in this module will fail + predefined set of functions.
+ +Unless otherwise stated, all functions in this module fail if the specified supervisor does not exist or if bad arguments - are given.
+ are specified.+ +Supervision Principles -The supervisor is responsible for starting, stopping and +
The supervisor is responsible for starting, stopping, and monitoring its child processes. The basic idea of a supervisor is - that it shall keep its child processes alive by restarting them + that it must keep its child processes alive by restarting them when necessary.
+The children of a supervisor are defined as a list of child specifications. When the supervisor is started, the child processes are started in order from left to right according to this list. When the supervisor terminates, it first terminates its child processes in reversed start order, from right to left.
+- The properties of a supervisor are defined by the supervisor - flags. This is the type definition for the supervisor flags: -
-sup_flags() = #{strategy => strategy(), % optional ++ +The supervisor properties are defined by the supervisor flags. + The type definition for the supervisor flags is as follows:
+ ++sup_flags() = #{strategy => strategy(), % optional intensity => non_neg_integer(), % optional - period => pos_integer()} % optional --A supervisor can have one of the following restart - strategies, specified with the
+ period => pos_integer()} % optionalstrategy key in the - above map: -A supervisor can have one of the following restart strategies + specified with the
+strategy key in the above map:+
- -
one_for_one - if one child process terminates and - should be restarted, only that child process is +
one_for_one - If one child process terminates and + is to be restarted, only that child process is affected. This is the default restart strategy.- -
one_for_all - if one child process terminates and - should be restarted, all other child processes are terminated +
one_for_all - If one child process terminates and + is to be restarted, all other child processes are terminated and then all child processes are restarted.- -
rest_for_one - if one child process terminates and - should be restarted, the 'rest' of the child processes -- - i.e. the child processes after the terminated child process - in the start order -- are terminated. Then the terminated +
rest_for_one - If one child process terminates and + is to be restarted, the 'rest' of the child processes (that + is, the child processes after the terminated child process + in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.- -
simple_one_for_one - a simplifiedone_for_one +-
simple_one_for_one - A simplifiedone_for_one supervisor, where all child processes are dynamically added - instances of the same process type, i.e. running the same + instances of the same process type, that is, running the same code.The functions
delete_child/2 - andrestart_child/2 are invalid for -simple_one_for_one supervisors and will return +Functions +
-and + delete_child/2 + are invalid for restart_child/2 simple_one_for_one supervisors and return{error,simple_one_for_one} if the specified supervisor uses this restart strategy.The function
terminate_child/2 can be used for +Function
-+ can be used for children underterminate_child/2 simple_one_for_one supervisors by - giving the child'spid() as the second argument. If + specifying the child'spid() as the second argument. If instead the child specification identifier is used, -terminate_child/2 will return +terminate_child/2 returns{error,simple_one_for_one} .Because a
simple_one_for_one supervisor could have +As a
simple_one_for_one supervisor can have many children, it shuts them all down asynchronously. This - means that the children will do their cleanup in parallel, + means that the children do their cleanup in parallel, and therefore the order in which they are stopped is not defined.To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart intensity is defined using two integer values specified - with the
+ andintensity andperiod keys in the above + with keysintensity andperiod in the above map. Assuming the valuesMaxR forintensity - andMaxT forperiod , then if more thanMaxR - restarts occur withinMaxT seconds, the supervisor will - terminate all child processes and then itself. The default value - forintensity is1 , and the default value - forperiod is5 . -MaxT forperiod , then, if more thanMaxR + restarts occur withinMaxT seconds, the supervisor + terminates all child processes and then itself.intensity + defaults to1 andperiod defaults to5 . +- This is the type definition of a child specification:
-child_spec() = #{id => child_id(), % mandatory +The type definition of a child specification is as follows:
+ ++child_spec() = #{id => child_id(), % mandatory start => mfargs(), % mandatory restart => restart(), % optional shutdown => shutdown(), % optional type => worker(), % optional modules => modules()} % optional+The old tuple format is kept for backwards compatibility, see
+ but the map is preferred. +child_spec() , - but the map is preferred. -
id is used to identify the child specification internally by the supervisor.The
-id key is mandatory.Note that this identifier on occations has been called - "name". As far as possible, the terms "identifier" or "id" - are now used but in order to keep backwards compatibility, - some occurences of "name" can still be found, for example - in error messages.
+Notice that this identifier on occasions has been called + "name". As far as possible, the terms "identifier" or "id" + are now used but to keep backward compatibility, + some occurrences of "name" can still be found, for example + in error messages.
start defines the function call used to start the @@ -154,84 +168,86 @@ tuple{M,F,A} used asapply(M,F,A) .The start function must create and link to the child process, and must return
{ok,Child} or -{ok,Child,Info} whereChild is the pid of - the child process andInfo an arbitrary term which is +{ok,Child,Info} , whereChild is the pid of + the child process andInfo any term that is ignored by the supervisor.The start function can also return
-ignore if the child process for some reason cannot be started, in which case - the child specification will be kept by the supervisor - (unless it is a temporary child) but the non-existing child - process will be ignored.If something goes wrong, the function may also return an + the child specification is kept by the supervisor + (unless it is a temporary child) but the non-existing child + process is ignored.
+If something goes wrong, the function can also return an error tuple
-{error,Error} .Note that the
-start_link functions of the different - behaviour modules fulfill the above requirements.The
+start key is mandatory.Notice that the
+start_link functions of the different + behavior modules fulfill the above requirements.The
start key is mandatory.-
restart defines when a terminated child process - shall be restarted. Apermanent child process will - always be restarted, atemporary child process will - never be restarted (even when the supervisor's restart strategy + must be restarted. Apermanent child process is + always restarted. Atemporary child process is + never restarted (even when the supervisor's restart strategy isrest_for_one orone_for_all and a sibling's - death causes the temporary process to be terminated) and a -transient child process will be restarted only if - it terminates abnormally, i.e. with another exit reason - thannormal ,shutdown or{shutdown,Term} .The
+ death causes the temporary process to be terminated). + Arestart key is optional. If it is not given, the - default valuepermanent will be used.transient child process is restarted only if + it terminates abnormally, that is, with another exit reason + thannormal ,shutdown , or{shutdown,Term} . +The
restart key is optional. If it is not specified, + it defaults topermanent .- -
shutdown defines how a child process shall be - terminated.brutal_kill means the child process will - be unconditionally terminated usingexit(Child,kill) . - An integer timeout value means that the supervisor will tell +
shutdown defines how a child process must be + terminated.brutal_kill means that the child process + is unconditionally terminated usingexit(Child,kill) . + An integer time-out value means that the supervisor tells the child process to terminate by callingexit(Child,shutdown) and then wait for an exit signal - with reasonshutdown back from the child process. If - no exit signal is received within the specified number of milliseconds, + with reasonshutdown back from the child process. If no + exit signal is received within the specified number of milliseconds, the child process is unconditionally terminated usingexit(Child,kill) .If the child process is another supervisor, the shutdown time - should be set to
infinity to give the subtree ample + is to be set toinfinity to give the subtree ample time to shut down. It is also allowed to set it toinfinity , if the child process is a worker.- Be careful when setting the shutdown time to -
+infinity when the child process is a worker. Because, in this - situation, the termination of the supervision tree depends on the - child process, it must be implemented in a safe way and its cleanup - procedure must always return.infinity when the child process is a worker. Because, in this + situation, the termination of the supervision tree depends on the + child process, it must be implemented in a safe way and its cleanup + procedure must always return.Note that all child processes implemented using the standard - OTP behaviour modules automatically adhere to the shutdown +
Notice that all child processes implemented using the standard + OTP behavior modules automatically adhere to the shutdown protocol.
-The
+shutdown key is optional. If it is not given, - the default value5000 will be used if the child is - of typeworker ; andinfinity will be used if - the child is of typesupervisor .The
shutdown key is optional. If it is not specified, + it defaults to5000 if the child is + of typeworker and it defaults toinfinity if + the child is of typesupervisor .-
type specifies if the child process is a supervisor or a worker.The
+type key is optional. If it is not given, the - default valueworker will be used.The
type key is optional. If it is not specified, + it defaults toworker .-
modules is used by the release handler during code replacement to determine which processes are using a certain module. As a rule of thumb, if the child process is asupervisor ,gen_server , -gen_fsm orgen_statem - this should be a list with one element[Module] , - whereModule is the callback module. If the child - process is an event manager (gen_event ) with a - dynamic set of callback modules, the valuedynamic - shall be used. See OTP Design Principles for more - information about release handling.The
+modules key is optional. If it is not given, it - defaults to[M] , whereM comes from the - child's start{M,F,A} gen_statem , orgen_fsm , + this is to be a list with one element[Module] , + whereModule is the callback module. If the child + process is an event manager (gen_event ) with a + dynamic set of callback modules, valuedynamic + must be used. For more information about release handling, see ++ Release Handling + in OTP Design Principles. +The
modules key is optional. If it is not specified, it + defaults to[M] , whereM comes from the + child's start{M,F,A} .Internally, the supervisor also keeps track of the pid @@ -240,6 +256,7 @@
+ @@ -250,20 +267,18 @@ - + The tuple format is kept for backwards compatibility - only. A map is preferred; see more details -
above .The tuple format is kept for backward compatibility + only. A map is preferred; see more details +
above .- - The value
+undefined for(the - argument list) is only to be used internally - in A supervisor . If the restart type of the child - istemporary , then the process is never to be - restarted and therefore there is no need to store the real - argument list. The valueundefined will then be - stored instead.Value
undefined for(the + argument list) is only to be used internally + in A supervisor . If the restart type of the child + istemporary , the process is never to be + restarted and therefore there is no need to store the real + argument list. Valueundefined is then stored instead.@@ -280,9 +295,9 @@ - + The tuple format is kept for backwards compatibility - only. A map is preferred; see more details -
above .The tuple format is kept for backward compatibility + only. A map is preferred; see more details +
above .@@ -291,307 +306,355 @@ - +- - Create a supervisor process. -- - + + Check if children specifications are syntactically correct. + - Creates a supervisor process as part of a supervision tree. - The function will, among other things, ensure that - the supervisor is linked to the calling process (its - supervisor).
-The created supervisor process calls
-to - find out about restart strategy, maximum restart intensity - and child processes. To ensure a synchronized start-up - procedure, Module :init/1start_link/2,3 does not return until -has returned and all child processes - have been started. Module :init/1If
-, the supervisor is registered - locally as SupName ={local,Name}Name usingregister/2 . If -the supervisor is registered - globally as SupName ={global,Name}Name usingglobal:register_name/2 . If -the supervisor - is registered as SupName ={via,Module ,Name }Name using the registry represented by -Module . TheModule callback must export the functions -register_name/2 ,unregister_name/1 andsend/2 , - which shall behave like the corresponding functions inglobal . - Thus,{via,global, is a valid reference.Name }If no name is provided, the supervisor is not registered.
--
is the name of the callback module. Module -
is an arbitrary term which is passed as - the argument to Args . Module :init/1If the supervisor and its child processes are successfully - created (i.e. if all child process start functions return -
-{ok,Child} ,{ok,Child,Info} , orignore ), - the function returns{ok,Pid} , wherePid is - the pid of the supervisor. If there already exists a process - with the specified, the function returns - SupName {error,{already_started,Pid}} , wherePid is - the pid of that process.If
-returns Module :init/1ignore , this function - returnsignore as well, and the supervisor terminates - with reasonnormal . - Iffails or returns an incorrect value, - this function returns Module :init/1{error,Term} whereTerm - is a term with information about the error, and the supervisor - terminates with reasonTerm .If any child process start function fails or returns an error - tuple or an erroneous value, the supervisor will first terminate - all already started child processes with reason
+shutdown - and then terminate itself and return -{error, {shutdown, Reason}} .Takes a list of child specification as argument + and returns
ok if all of them are syntactically + correct, otherwise{error, .Error }- -- Dynamically add a child process to a supervisor. -- + + Return counts for the number of child specifications, + active children, supervisors, and workers. - Dynamically adds a child specification to the supervisor -
-which starts the corresponding child process. SupRef +
can be: SupRef Returns a property list (see
+ ) containing the + counts for each of the following elements of the supervisor's + child specifications and managed processes:proplists -
-- the pid,
-- -
Name , if the supervisor is locally registered,- -
{Name,Node} , if the supervisor is locally - registered at another node, or- -
{global,Name} , if the supervisor is globally - registered.- +
{via,Module,Name} , if the supervisor is registered - through an alternative process registry.- +
++
specs - The total count of children, dead or alive.- +
++
active - The count of all actively running child + processes managed by this supervisor. For +simple_one_for_one supervisors, no check is done to ensure + that each child process is still alive, although the result + provided here is likely to be very + accurate unless the supervisor is heavily overloaded.- +
++
supervisors - The count of all children marked as +child_type = supervisor in the specification list, + regardless of whether the child process is still alive.- +
+
workers - The count of all children marked as +child_type = worker in the specification list, + regardless of whether the child process is still alive.-
must be a valid child specification - (unless the supervisor is a ChildSpec simple_one_for_one - supervisor; see below). The child process will be started by - using the start function as defined in the child - specification.In the case of a
-simple_one_for_one supervisor, - the child specification defined inModule:init/1 will - be used, andshall instead be an arbitrary - list of terms ChildSpec . The child process will then be - started by appending List to the existing start - function arguments, i.e. by calling - List apply(M, F, A++ whereList ){M,F,A} is the start - function defined in the child specification.If there already exists a child specification with - the specified identifier,
-is discarded, and - the function returns ChildSpec {error,already_present} or -{error,{already_started, , depending on if - the corresponding child process is running or not.Child }}If the child process start function returns
-{ok, - orChild }{ok, , the child specification and pid are - added to the supervisor and the function returns the same - value.Child ,Info }If the child process start function returns
-ignore , - the child specification is added to the supervisor (unless the - supervisor is asimple_one_for_one supervisor, see below), - the pid is set toundefined and the function returns -{ok,undefined} . -In the case of a
-simple_one_for_one supervisor, when a child - process start function returnsignore the functions returns -{ok,undefined} and no child is added to the supervisor. -If the child process start function returns an error tuple or - an erroneous value, or if it fails, the child specification is - discarded, and the function returns
+{error,Error} where -Error is a term containing information about the error - and child specification.For a description of
, see + SupRef . start_child/2 - - Terminate a child process belonging to a supervisor. -- -Tells the supervisor
- -to terminate the given - child. SupRef If the supervisor is not
- -simple_one_for_one , -must be the child specification - identifier. The process, if there is one, is terminated and, - unless it is a temporary child, the child specification is - kept by the supervisor. The child process may later be - restarted by the supervisor. The child process can also be - restarted explicitly by calling - Id restart_child/2 . Usedelete_child/2 to remove - the child specification.If the child is temporary, the child specification is deleted as - soon as the process terminates. This means - that
-delete_child/2 has no meaning, - andrestart_child/2 can not be used for these - children.If the supervisor is
-simple_one_for_one ,- must be the child process' Id pid() . If the specified - process is alive, but is not a child of the given - supervisor, the function will return -{error,not_found} . If the child specification - identifier is given instead of apid() , the - function will return{error,simple_one_for_one} .If successful, the function returns
-ok . If there is - no child specification with the specified, the - function returns Id {error,not_found} .See
-- for a description of start_child/2 . SupRef + + Delete a child specification from a supervisor. - +Tells the supervisor
-to delete the child - specification identified by SupRef . The corresponding child - process must not be running. Use Id terminate_child/2 to - terminate it.See
+- for a description of start_child/2 . SupRef Tells supervisor
+to delete the child + specification identified by SupRef . The corresponding + child process must not be running. Use + Id + to terminate it.terminate_child/2 For a description of
, see + SupRef . start_child/2 If successful, the function returns
+ returnsok . If the child - specification identified byexists but - the corresponding child process is running or about to be restarted, - the function returns Id {error,running} or -{error,restarting} , respectively. If the child specification + specification identified byexists but the + corresponding child process is running or is about to be restarted, + the function returns Id {error,running} or +{error,restarting} , respectively. If the child specification identified bydoes not exist, the function - returns Id {error,not_found} .{error,not_found} . ++ ++ Return the child specification map for the specified + child. ++ Returns the child specification map for the child identified + by
+Id under supervisorSupRef . The returned + map contains all keys, both mandatory and optional.For a description of
, see + SupRef . start_child/2 + - Restart a terminated child process belonging to a supervisor. +Restart a terminated child process belonging to a supervisor. + - Tells the supervisor
to restart + SupRef Tells supervisor
-to restart a child process corresponding to the child specification identified by SupRef . The child specification must exist, and the corresponding child process must not be running. Id Note that for temporary children, the child specification - is automatically deleted when the child terminates; thus - it is not possible to restart such children.
-See
+- for a description of start_child/2 SupRef .Notice that for temporary children, the child specification + is automatically deleted when the child terminates; thus, + it is not possible to restart such children.
+For a description of
, see + SupRef . start_child/2 If the child specification identified by
+ function returnsdoes not exist, the function returns Id {error,not_found} . If the child specification exists but the corresponding process is already running, the - function returns -{error,running} .{error,running} .If the child process start function returns
{ok, orChild }{ok, , the pid is added to the supervisor and the function returns the same value.Child ,Info }If the child process start function returns
ignore , - the pid remains set toundefined , and the function + the pid remains set toundefined and the function returns{ok,undefined} .If the child process start function returns an error tuple or an erroneous value, or if it fails, the function returns -
{error, +Error }{error, , whereError }is a term containing information about the error. Error - +- Return information about all children specifications and - child processes belonging to a supervisor. ++ Dynamically add a child process to a supervisor. ++ - Returns a newly created list with information about all child - specifications and child processes belonging to - the supervisor
-. SupRef Note that calling this function when supervising a large - number of children under low memory conditions can cause an - out of memory exception.
-See
-for a description of - start_child/2 SupRef .The information given for each child specification/process - is:
+Dynamically adds a child specification to supervisor +
+, which starts the corresponding child + process. SupRef +
can be any of the + following: SupRef +
+- The pid
+- +
Name , if the supervisor is locally registered- +
{Name,Node} , if the supervisor is locally + registered at another node- +
{global,Name} , if the supervisor is globally + registered- +
{via,Module,Name} , if the supervisor is registered + through an alternative process registry+
must be a valid child specification + (unless the supervisor is a ChildSpec simple_one_for_one + supervisor; see below). The child process is started by + using the start function as defined in the child specification.For a
simple_one_for_one supervisor, + the child specification defined inModule:init/1 is used, + andmust instead be an arbitrary + list of terms ChildSpec . The child process is then + started by appending List to the existing start + function arguments, that is, by calling + List apply(M, F, A++ , whereList ){M,F,A} is the + start function defined in the child specification.+
- -
--
- as defined in the child specification or - Id undefined in the case of a -simple_one_for_one supervisor.- -
+
- the pid of the corresponding child - process, the atom Child restarting if the process is about to be - restarted, orundefined if there is no such process.If there already exists a child specification with the specified + identifier,
is discarded, and + the function returns ChildSpec {error,already_present} or +{error,{already_started, , depending on + if the corresponding child process is running or not.Child }}- -
+
- as defined in the child specification. Type If the child process start function returns +
{ok, or +Child }{ok, , the child + specification and pid are added to the supervisor and the + function returns the same value.Child ,Info }- -
+
- as defined in the child specification. Modules If the child process start function returns
ignore , + the child specification is added to the supervisor (unless the + supervisor is asimple_one_for_one supervisor, see below), + the pid is set toundefined , and the function returns +{ok,undefined} .For a
+simple_one_for_one supervisor, when a child + process start function returnsignore , the function returns +{ok,undefined} and no child is added to the supervisor.If the child process start function returns an error tuple or + an erroneous value, or if it fails, the child specification is + discarded, and the function returns
{error,Error} , where +Error is a term containing information about the error + and child specification.- +- Return counts for the number of child specifications, - active children, supervisors, and workers. ++ + Create a supervisor process. ++ + - Returns a property list (see
+proplists ) containing the - counts for each of the following elements of the supervisor's - child specifications and managed processes:Creates a supervisor process as part of a supervision tree. + For example, the function ensures that the supervisor is linked to + the calling process (its supervisor).
+The created supervisor process calls +
to + find out about restart strategy, maximum restart intensity, + and child processes. To ensure a synchronized startup + procedure, Module :init/1start_link/2,3 does not return until +has returned and all child + processes have been started. Module :init/1+
- -
+
specs - the total count of children, dead or alive.If
, the supervisor is + registered locally as SupName ={local,Name}Name usingregister/2 .- -
+
active - the count of all actively running child processes - managed by this supervisor. In the case ofsimple_one_for_one - supervisors, no check is carried out to ensure that each child process - is still alive, though the result provided here is likely to be very - accurate unless the supervisor is heavily overloaded.If
, the supervisor is + registered globally as SupName ={global,Name}Name using ++ .global:register_name/2 - -
++
supervisors - the count of all children marked as - child_type = supervisor in the spec list, whether or not the - child process is still alive.If +
+, + the supervisor is registered as SupName ={via,Module ,Name }Name using the registry + represented byModule . TheModule callback must + export the functionsregister_name/2 , +unregister_name/1 , andsend/2 , which must behave + like the corresponding functions in +. Thus, + global {via,global, is a valid reference.Name }If no name is provided, the supervisor is not registered.
++
is the name of the callback module. Module +
is any term that is passed as + the argument to Args . Module :init/1+
-- +
If the supervisor and its child processes are successfully + created (that is, if all child process start functions return +
{ok,Child} ,{ok,Child,Info} , orignore ), + the function returns{ok,Pid} , wherePid is + the pid of the supervisor.- -
++
workers - the count of all children marked as - child_type = worker in the spec list, whether or not the child - process is still alive.If there already exists a process with the specified +
+, the function returns + SupName {error,{already_started,Pid}} , wherePid is + the pid of that process.- +
+If
+returns Module :init/1ignore , this + function returnsignore as well, and the supervisor + terminates with reasonnormal .- +
+If
+fails or returns an + incorrect value, this function returns Module :init/1{error,Term} , where +Term is a term with information about the error, and the + supervisor terminates with reasonTerm .- +
If any child process start function fails or returns an error + tuple or an erroneous value, the supervisor first terminates + all already started child processes with reason
shutdown + and then terminates itself and returns +{error, {shutdown, Reason}} .See
- for a description of start_child/2 . SupRef - +- Check if children specifications are syntactically correct. ++ Terminate a child process belonging to a supervisor. - This function takes a list of child specification as argument - and returns
+ok if all of them are syntactically - correct, or{error, otherwise.Error }Tells supervisor
+to terminate the + specified child. SupRef If the supervisor is not
+simple_one_for_one , +must be the child specification + identifier. The process, if any, is terminated and, + unless it is a temporary child, the child specification is + kept by the supervisor. The child process can later be + restarted by the supervisor. The child process can also be + restarted explicitly by calling + Id . + Use + restart_child/2 + to remove the child specification. delete_child/2 If the child is temporary, the child specification is deleted as + soon as the process terminates. This means + that
+delete_child/2 has no meaning + andrestart_child/2 cannot be used for these children.If the supervisor is
+simple_one_for_one , ++ must be the Id pid() of the child process. If the specified + process is alive, but is not a child of the specified + supervisor, the function returns +{error,not_found} . If the child specification + identifier is specified instead of apid() , the + function returns{error,simple_one_for_one} .If successful, the function returns
+ok . If there is + no child specification with the specified, the + function returns Id {error,not_found} .For a description of
, see + SupRef . start_child/2 - - Return the child specification map for the given - child. ++ Return information about all children specifications and + child processes belonging to a supervisor. - Returns the child specification map for the child identified - by
-Id under supervisorSupRef . The returned - map contains all keys, both mandatory and optional.See
+- for a description of start_child/2 . SupRef Returns a newly created list with information about all child + specifications and child processes belonging to + supervisor
+. SupRef Notice that calling this function when supervising many + children under low memory conditions can cause an + out of memory exception.
+For a description of
+, see + SupRef . start_child/2 The following information is given for each child + specification/process:
++
- +
++
- As defined in the child specification or + Id undefined for asimple_one_for_one supervisor.- +
++
- The pid of the corresponding child + process, the atom Child restarting if the process is about to be + restarted, orundefined if there is no such process.- +
++
- As defined in the child + specification. Type - +
++
- As defined in the child + specification. Modules - +CALLBACK FUNCTIONS -The following functions must be exported from a +
Callback Functions +The following function must be exported from a
supervisor callback module.Module:init(Args) -> Result @@ -599,47 +662,52 @@Args = term() Result = {ok,{SupFlags,[ChildSpec]}} | ignore -SupFlags = -sup_flags() ChildSpec = +child_spec() SupFlags = + +sup_flags() ChildSpec = + child_spec() Whenever a supervisor is started using -
supervisor:start_link/2,3 , this function is called by +, + this function is called by the new process to find out about restart strategy, maximum restart intensity, and child specifications. start_link/2,3
Args is theArgs argument provided to the start function.-
SupFlags is the supervisor flags defining the - restart strategy and max restart intensity for the + restart strategy and maximum restart intensity for the supervisor.[ChildSpec] is a list of valid child specifications defining which child processes the supervisor - shall start and monitor. See the discussion about - Supervision Principles above.Note that when the restart strategy is + must start and monitor. See the discussion in section +
++ earlier.Supervision Principles Notice that when the restart strategy is
-simple_one_for_one , the list of child specifications must be a list with one child specification only. - (The child specification identifier is ignored.) No child process is then started + (The child specification identifier is ignored.) + No child process is then started during the initialization phase, but all children are assumed to be started dynamically using -supervisor:start_child/2 .The function may also return
-ignore .Note that this function might also be called as a part of a - code upgrade procedure. For this reason, the function should - not have any side effects. See -
+Design - Principles for more information about code upgrade - of supervisors.. + start_child/2 The function can also return
+ignore .Notice that this function can also be called as a part of a code + upgrade procedure. Therefore, the function is not to have any side + effects. For more information about code upgrade of supervisors, see + section +
Changing + a Supervisor in OTP Design Principles.- diff --git a/lib/stdlib/doc/src/supervisor_bridge.xml b/lib/stdlib/doc/src/supervisor_bridge.xml index e40c8bbd6f..c4c1b37548 100644 --- a/lib/stdlib/doc/src/supervisor_bridge.xml +++ b/lib/stdlib/doc/src/supervisor_bridge.xml @@ -31,73 +31,106 @@SEE ALSO -+
gen_event(3) , -gen_fsm(3) , -gen_statem(3) , -gen_server(3) , -sys(3) See Also +
, + gen_event(3) , + gen_fsm(3) , + gen_statem(3) , + gen_server(3) sys(3) supervisor_bridge -Generic Supervisor Bridge Behaviour. +Generic supervisor bridge behavior. - +A behaviour module for implementing a supervisor_bridge, a process - which connects a subsystem not designed according to the OTP design - principles to a supervision tree. The supervisor_bridge sits between +
This behavior module provides a supervisor bridge, a process + that connects a subsystem not designed according to the OTP design + principles to a supervision tree. The supervisor bridge sits between a supervisor and the subsystem. It behaves like a real supervisor to its own supervisor, but has a different interface than a real - supervisor to the subsystem. Refer to OTP Design Principles - for more information.
-A supervisor_bridge assumes the functions for starting and stopping + supervisor to the subsystem. For more information, see +
+ ++ Supervisor Behaviour in OTP Design Principles. +A supervisor bridge assumes the functions for starting and stopping the subsystem to be located in a callback module exporting a - pre-defined set of functions.
-The
-sys module can be used for debugging a - supervisor_bridge.Unless otherwise stated, all functions in this module will fail if - the specified supervisor_bridge does not exist or if bad arguments are - given.
+ predefined set of functions. + +The
+ +module can be used + for debugging a supervisor bridge. sys(3) Unless otherwise stated, all functions in this module fail if + the specified supervisor bridge does not exist or if bad arguments are + specified.
Create a supervisor bridge process. - Creates a supervisor_bridge process, linked to the calling - process, which calls
to start the subsystem. - To ensure a synchronized start-up procedure, this function does + Module :init/1Creates a supervisor bridge process, linked to the calling process, + which calls
-to start the subsystem. + To ensure a synchronized startup procedure, this function does not return until Module :init/1has returned. Module :init/1If
+the supervisor_bridge is - registered locally as SupBridgeName ={local,Name }using Name register/2 . - Ifthe supervisor_bridge is - registered globally as SupBridgeName ={global,Name }using - Name global:register_name/2 . - Ifthe supervisor_bridge is - registered as SupBridgeName ={via,Module ,Name }using a registry represented - by Name Module . TheModule callback should export - the functionsregister_name/2 ,unregister_name/1 - andsend/2 , which should behave like the - corresponding functions inglobal . Thus, -{via,global,GlobalName} is a valid reference. - If no name is provided, the supervisor_bridge is not registered. - If there already exists a process with the specified -the function returns - SupBridgeName {error,{already_started, , wherePid }}is the pid - of that process. Pid +
+- +
+If
+, + the supervisor bridge is registered locally as + SupBridgeName ={local,Name }using Name register/2 .- +
+If
+, + the supervisor bridge is registered globally as + SupBridgeName ={global,Name }using + Name + .global:register_name/2 - +
+If +
+, + the supervisor bridge is registered as SupBridgeName ={via,Module ,Name }+ using a registry represented by Name Module . The +Module callback is to export functions +register_name/2 ,unregister_name/1 , andsend/2 , + which are to behave like the corresponding functions in +. + Thus, global {via,global,GlobalName} is a valid reference.If no name is provided, the supervisor bridge is not registered.
-
is the name of the callback module. Module -
is an arbitrary term which is passed as the argument - to Args . Module :init/1If the supervisor_bridge and the subsystem are successfully - started the function returns
-{ok, , wherePid }is - is the pid of the supervisor_bridge. Pid If
+returns Module :init/1ignore , this function - returnsignore as well and the supervisor_bridge terminates - with reasonnormal . - Iffails or returns an error tuple or an - incorrect value, this function returns Module :init/1{error, where -Error r}is a term with information about the error, and - the supervisor_bridge terminates with reason Error . Error +
is an arbitrary term that is passed as the + argument to Args . Module :init/1+
- +
+If the supervisor bridge and the subsystem are successfully + started, the function returns
+{ok, , where +Pid }is the pid of the supervisor + bridge. Pid
+If there already exists a process with the specified +
+, the function returns + SupBridgeName {error,{already_started, , where +Pid }}is the pid of that process. Pid - +
+If
+returns Module :init/1ignore , this + function returnsignore as well and the supervisor bridge + terminates with reasonnormal .- +
+If
+fails or returns an error + tuple or an incorrect value, this function returns + Module :init/1{error, , where +Error r}is a term with information about the + error, and the supervisor bridge + terminates with reason Error . Error - +CALLBACK FUNCTIONS -The following functions should be exported from a +
Callback Functions +The following functions must be exported from a
supervisor_bridge callback module.+ Module:init(Args) -> Result @@ -110,25 +143,26 @@Error = term() - Whenever a supervisor_bridge is started using -
supervisor_bridge:start_link/2,3 , this function is called +Whenever a supervisor bridge is started using +
, + this function is called by the new process to start the subsystem and initialize. start_link/2,3 -
Args is theArgs argument provided to the start function.The function should return
{ok,Pid,State} wherePid +The function is to return
{ok,Pid,State} , wherePid is the pid of the main process in the subsystem andState is any term.If later
-Pid terminates with a reasonReason , - the supervisor bridge will terminate with reasonReason as - well. - If later the supervisor_bridge is stopped by its supervisor with - reasonReason , it will call + the supervisor bridge terminates with reasonReason as well. + If later the supervisor bridge is stopped by its supervisor with + reasonReason , it callsModule:terminate(Reason,State) to terminate.If something goes wrong during the initialization the function - should return
+{error,Error} whereError is any - term, orignore .If the initialization fails, the function is to return +
{error,Error} , whereError is any term, + orignore .Module:terminate(Reason, State) Clean up and stop subsystem. @@ -137,15 +171,15 @@State = term() - @@ -153,9 +187,9 @@This function is called by the supervisor_bridge when it is about - to terminate. It should be the opposite of
Module:init/1 +This function is called by the supervisor bridge when it is about + to terminate. It is to be the opposite of
-Module:init/1 and stop the subsystem and do any necessary cleaning up. The return value is ignored.
Reason isshutdown if the supervisor_bridge is - terminated by its supervisor. If the supervisor_bridge terminates ++ then
Reason isshutdown if the supervisor bridge is + terminated by its supervisor. If the supervisor bridge terminates because a linked process (apart from the main process of the subsystem) has terminated with reasonTerm , -Reason will beTerm .Reason becomesTerm .
State is taken from the return value ofModule:init/1 .- diff --git a/lib/stdlib/doc/src/sys.xml b/lib/stdlib/doc/src/sys.xml index 2255395f46..1120b926d5 100644 --- a/lib/stdlib/doc/src/sys.xml +++ b/lib/stdlib/doc/src/sys.xml @@ -4,7 +4,7 @@SEE ALSO -+
supervisor(3) , -sys(3) See Also +
, + supervisor(3) sys(3) - 1996 2016 +1996 2014 Ericsson AB. All Rights Reserved. @@ -30,62 +30,67 @@ 1996-06-06 - sys.sgml +sys.xml sys -A Functional Interface to System Messages +A functional interface to system messages. - This module contains functions for sending system messages used by programs, and messages used for debugging purposes. -
-Functions used for implementation of processes - should also understand system messages such as debugging - messages and code change. These functions must be used to implement the use of system messages for a process; either directly, or through standard behaviours, such as
-gen_server .The default timeout is 5000 ms, unless otherwise specified. The -
timeout defines the time period to wait for the process to +This module contains functions for sending system messages used by + programs, and messages used for debugging purposes.
+Functions used for implementation of processes are also expected to + understand system messages, such as debug messages and code change. These + functions must be used to implement the use of system messages for a + process; either directly, or through standard behaviors, such as +
+. gen_server The default time-out is 5000 ms, unless otherwise specified. +
-timeout defines the time to wait for the process to respond to a request. If the process does not respond, the function evaluatesexit({timeout, {M, F, A}}) .+
The functions make reference to a debug structure. - The debug structure is a list of dbg_opt() . -dbg_opt() is an internal data type used by the -handle_system_msg/6 function. No debugging is performed if it is an empty list. -+ The functions make references to a debug structure. + The debug structure is a list of
dbg_opt() , which is an internal + data type used by function+ . No debugging is performed if it is + an empty list.handle_system_msg/6 @@ -93,15 +98,16 @@ System Messages -Processes which are not implemented as one of the standard - behaviours must still understand system - messages. There are three different messages which must be - understood: -
+Processes that are not implemented as one of the standard + behaviors must still understand system messages. The following + three messages must be understood:
Plain system messages. These are received as
+ receiving process module. When a system message is received, function +{system, From, Msg} . The content and meaning of this message are not interpreted by the - receiving process module. When a system message has been - received, the functionsys:handle_system_msg/6 - is called in order to handle the request. -+ + is called to handle the request.handle_system_msg/6 Shutdown messages. If the process traps exits, it must - be able to handle an shut-down request from its parent, the + be able to handle a shutdown request from its parent, the supervisor. The message
{'EXIT', Parent, Reason} - from the parent is an order to terminate. The process must terminate when this message is received, normally with the + from the parent is an order to terminate. The process must + terminate when this message is received, normally with the sameReason asParent .- -
There is one more message which the process must understand if the modules used to implement the process change dynamically during runtime. An example of such a process is the
+gen_event processes. This message is{get_modules, From} . The reply to this message isFrom ! {modules, Modules} , - whereModules is a list of the currently active modules in the process. -If the modules used to implement the process change dynamically + during runtime, the process must understand one more message. An + example is the
+ processes. The message is gen_event {get_modules, From} . + The reply to this message isFrom ! {modules, Modules} , where +Modules is a list of the currently active modules in the + process.This message is used by the release handler to find which - processes execute a certain module. The process may at a - later time be suspended and ordered to perform a code change - for one of its modules. -
+ processes execute a certain module. The process can later be + suspended and ordered to perform a code change for one of its + modules.+ System Events When debugging a process with the functions of this - module, the process generates system_events which are + module, the process generates system_events, which are then treated in the debug function. For example,
-trace - formats the system events to the tty. + formats the system events to the terminal.There are three predefined system events which are used when a +
Three predefined system events are used when a process receives or sends a message. The process can also define its own system events. It is always up to the process itself to format these events.
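The system messages and system events described above can be sketched as a small special process. This is only an illustrative sketch, not part of the patch: the module name sys_sketch, the integer counter used as process state, and the formatting function write_debug/3 are assumptions made for the example. The calls to proc_lib and sys are the documented ones.

```erlang
-module(sys_sketch).
-export([start_link/0, init/1]).
-export([system_continue/3, system_terminate/4, system_code_change/4,
         write_debug/3]).

%% Hypothetical special process; its state is a plain message counter.
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    %% Trap exits so a shutdown order arrives as {'EXIT', Parent, Reason}.
    process_flag(trap_exit, true),
    Deb = sys:debug_options([]),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb, 0).

loop(Parent, Deb, Count) ->
    receive
        {system, From, Msg} ->
            %% Plain system message: sys interprets it and calls back into
            %% system_continue/3 or system_terminate/4.
            sys:handle_system_msg(Msg, From, Parent, ?MODULE, Deb, Count);
        {'EXIT', Parent, Reason} ->
            %% Shutdown message: terminate with the same reason as the parent.
            exit(Reason);
        {get_modules, From} ->
            %% Used by the release handler; this process runs one module.
            From ! {modules, [?MODULE]},
            loop(Parent, Deb, Count);
        Other ->
            %% Generate a system event for every other incoming message.
            Deb1 = sys:handle_debug(Deb, fun ?MODULE:write_debug/3,
                                    ?MODULE, {in, Other}),
            loop(Parent, Deb1, Count + 1)
    end.

%% Callbacks used by sys:handle_system_msg/6.
system_continue(Parent, Deb, Count) ->
    loop(Parent, Deb, Count).

system_terminate(Reason, _Parent, _Deb, _Count) ->
    exit(Reason).

system_code_change(Count, _Module, _OldVsn, _Extra) ->
    {ok, Count}.

%% The process itself decides how its system events are formatted.
write_debug(Dev, Event, Name) ->
    io:format(Dev, "~p event = ~p~n", [Name, Event]).
```

Assuming the module is compiled and started with sys_sketch:start_link(), calling sys:trace(Pid, true) and then sending the process a message should make the event appear on the terminal through write_debug/3. Because the sketch does not export system_get_state/1, sys:get_state(Pid) is expected to return the counter directly.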
+ @@ -111,7 +117,7 @@ - + See
above .See the introduction of this manual page.
@@ -120,421 +126,594 @@ - -- - Log system events in memory -- -Turns the logging of system events On or Off. If On, a - maximum of
-events are kept in the - debug structure (the default is 10). If N is Flag get , a list of all - logged events is returned. Ifis Flag standard_io . The events are - formatted with a function that is defined by the process that - generated the event (with a call to -sys:handle_debug/4 ).- -- - Log system events to the specified file -- -Enables or disables the logging of all system events in textual - format to the file. The events are formatted with a function that is - defined by the process that generated the event (with a call - to
-sys:handle_debug/4 ).- +- - Enable or disable the collections of statistics ++ + Send the code change system message to the process. - Enables or disables the collection of statistics. If
+is - Flag get , the statistical collection is returned.Tells the process to change code. The process must be + suspended to handle this message. Argument
+ is reserved for each process to use as its own. Function + Extra is called. + Module :system_code_change/4is the old version of the + OldVsn . Module - +- - Print all system events on +standard_io + + Get the state of the process. - Prints all system events on
+standard_io . The events are - formatted with a function that is defined by the process that - generated the event (with a call to -sys:handle_debug/4 ).Gets the state of the process.
++ +These functions are intended only to help with debugging. They are + provided for convenience, allowing developers to avoid having to + create their own state extraction functions and also avoid having + to interactively extract the state from the return values of +
+or + get_status/1 + while debugging. get_status/2 The value of
+varies for different types of + processes, as follows: State +
+- +
+For a +
++ process, the returned gen_server + is the state of the callback module. State - +
+For a +
++ process, gen_fsm is the tuple + State {CurrentStateName, CurrentStateData} .- +
+For a +
++ process, gen_statem is the tuple + State {CurrentState,CurrentData} .- +
+For a +
++ process, gen_event is a list of tuples, + where each tuple corresponds to an event handler registered + in the process and contains State {Module, Id, HandlerState} , + as follows:+ ++ Module - +
+The module name of the event handler.
++ Id - +
+The ID of the handler (which is
+false if it was + registered without an ID).+ HandlerState - +
+The state of the handler.
+If the callback module exports a function
+system_get_state/1 , + it is called in the target process to get its state. Its argument is + the same as theMisc value returned by +, and + function get_status/1,2 + is expected to extract the + state of the callback module from it. Function +Module:system_get_state/1 system_get_state/1 must return{ok, State} , where +State is the state of the callback module.If the callback module does not export a
+system_get_state/1 + function,get_state/1,2 assumes that theMisc value is + the state of the callback module and returns it directly instead.If the callback module's
+system_get_state/1 function crashes + or throws an exception, the caller exits with error +{callback_failed, {Module, system_get_state}, {Class, Reason}} , + whereModule is the name of the callback module and +Class andReason indicate details of the exception.Function
+system_get_state/1 is primarily useful for + user-defined behaviors and modules that implement OTP +special processes . + Thegen_server ,gen_fsm , +gen_statem , andgen_event OTP + behavior modules export this function, so callback modules for those + behaviors need not supply their own.For more information about a process, including its state, see +
and + get_status/1 . get_status/2 - +- - Turn off debugging ++ + Get the status of the process. - Turns off all debugging for the process. This includes - functions that have been installed explicitly with the -
+install function, for example triggers.Gets the status of the process.
+The value of
+varies for different types of + processes, for example: Misc +
+- +
+A
++ process returns the state of the callback module. gen_server - +
+A
++ process returns information, such as its current + state name and state data. gen_fsm - +
+A
++ process returns information, such as its current + state name and state data. gen_statem - +
+A
++ process returns information about each of its + registered handlers. gen_event Callback modules for
gen_server , +gen_fsm ,gen_statem , andgen_event + can also change the value of+ by exporting a function Misc format_status/2 , which contributes + module-specific information. For details, see ++ , +gen_server:format_status/2 + , +gen_fsm:format_status/2 + , and +gen_statem:format_status/2 + .gen_event:format_status/2 - +- - Suspend the process ++ + Install a debug function in the process. - Suspends the process. When the process is suspended, it - will only respond to other system messages, but not other - messages.
+Enables installation of alternative debug functions. An example of + such a function is a trigger, a function that waits for some + special event and performs some action when the event is + generated, for example, turning on low-level tracing.
+
is called whenever a system event is + generated. This function is to return Func done , or a new +Func state. In the first case, the function is removed. It is + also removed if the function fails.- +- - Resume a suspended process ++ + Log system events in memory. - Resumes a suspended process.
+Turns the logging of system events on or off. If on, a + maximum of
+events are kept in the + debug structure (default is 10). N If
+is Flag get , a list of all logged + events is returned.If
+is Flag standard_io .The events are formatted with a function that is defined by the + process that generated the event (with a call to +
+ .handle_debug/4 )- +- - Send the code change system message to the process ++ + Log system events to the specified file. - Tells the process to change code. The process must be - suspended to handle this message. The
+argument is - reserved for each process to use as its own. The function - Extra is called. Module :system_code_change/4is - the old version of the OldVsn . Module Enables or disables the logging of all system events in text + format to the file. The events are formatted with a function that is + defined by the process that generated the event (with a call to +
). + handle_debug/4 - +- - Get the status of the process ++ + Turn off debugging. - Gets the status of the process.
-The value of
+varies for different types of - processes. For example, a Misc gen_server process returns - the callback module's state, agen_fsm process - returns information such as its current state name and state data, - agen_statem process returns information about - its current state and data, and agen_event process - returns information about each of its - registered handlers. Callback modules forgen_server , -gen_fsm ,gen_statem andgen_event - can also customise the value - ofby exporting a Misc format_status/2 - function that contributes module-specific information; - seegen_server format_status/2 , -gen_fsm format_status/2 , -gen_statem format_status/2 , and -gen_event format_status/2 - for more details.Turns off all debugging for the process. This includes + functions that are installed explicitly with function +
, + for example, triggers. install/2,3 - +- - Get the state of the process ++ + Remove a debug function from the process. - Gets the state of the process.
-- -These functions are intended only to help with debugging. They are provided for - convenience, allowing developers to avoid having to create their own state extraction - functions and also avoid having to interactively extract state from the return values of -
-or - get_status/1 while debugging. get_status/2 The value of
-varies for different types of - processes. For a State gen_server process, the returned- is simply the callback module's state. For a State gen_fsm process, -is the tuple State {CurrentStateName, CurrentStateData} . - For agen_statem processis - the tuple State {CurrentState,CurrentData}. - For agen_event process,a list of tuples, - where each tuple corresponds to an event handler registered in the process and contains - State {Module, Id, HandlerState} , whereModule is the event handler's module name, -Id is the handler's ID (which is the valuefalse if it was registered without - an ID), andHandlerState is the handler's state.If the callback module exports a
-system_get_state/1 function, it will be called in the - target process to get its state. Its argument is the same as theMisc value returned by -get_status/1,2 , and thesystem_get_state/1 - function is expected to extract the callback module's state from it. Thesystem_get_state/1 - function must return{ok, State} whereState is the callback module's state.If the callback module does not export a
-system_get_state/1 function,get_state/1,2 - assumes theMisc value is the callback module's state and returns it directly instead.If the callback module's
-system_get_state/1 function crashes or throws an exception, the - caller exits with error{callback_failed, {Module, system_get_state}, {Class, Reason}} where -Module is the name of the callback module andClass andReason indicate - details of the exception.The
-system_get_state/1 function is primarily useful for user-defined - behaviours and modules that implement OTPspecial - processes . Thegen_server ,gen_fsm , -gen_statem andgen_event OTP - behaviour modules export this function, so callback modules for those behaviours - need not supply their own.To obtain more information about a process, including its state, see -
+get_status/1 and -get_status/2 .Removes an installed debug function from the + process.
must be the same as previously + installed. Func + - Replace the state of the process +Replace the state of the process. Replaces the state of the process, and returns the new state.
- -These functions are intended only to help with debugging, and they should not be - be called from normal code. They are provided for convenience, allowing developers - to avoid having to create their own custom state replacement functions.
+These functions are intended only to help with debugging, and are + not to be called from normal code. They are provided for + convenience, allowing developers to avoid having to create their own + custom state replacement functions.
The
-function provides a new state for the process. - The StateFun argument and State return value - of NewState vary for different types of processes. For a - StateFun gen_server process,is simply the callback module's - state, and State is a new instance of that state. For a - NewState gen_fsm process,is the tuple - State {CurrentStateName, CurrentStateData} , and- is a similar tuple that may contain a new state name, new state data, or both. - The same applies for a NewState gen_statem process but - it names the tuple fields{CurrentState,CurrentData} . - For agen_event process,is the tuple - State {Module, Id, HandlerState} whereModule is the event handler's module name, -Id is the handler's ID (which is the valuefalse if it was registered without - an ID), andHandlerState is the handler's state.is a - similar tuple where NewState Module andId shall have the same values as in -but the value of State HandlerState may be different. Returning - awhose NewState Module orId values differ from those of -will result in the event handler's state remaining unchanged. For a - State gen_event process,is called once for each event handler - registered in the StateFun gen_event process.If a
-function decides not to effect any change in process - state, then regardless of process type, it may simply return its StateFun - argument. State If a
function crashes or throws an exception, then - for StateFun gen_server ,gen_fsm orgen_statem processes, - the original state of the process is - unchanged. Forgen_event processes, a crashing or failing- function means that only the state of the particular event handler it was working on when it - failed or crashed is unchanged; it can still succeed in changing the states of other event + StateFun Function
+provides a new state for the + process. Argument StateFun and the + State return value of + NewState vary for different types of + processes as follows: StateFun +
+- +
+For a
++ process, gen_server is the state of the callback + module and State + is a new instance of that state. NewState - +
+For a
+process, + gen_fsm is the tuple State {CurrentStateName, + CurrentStateData} , andis a + similar tuple, which can contain + a new state name, new state data, or both. NewState - +
+For a
++ process, gen_statem is the + tuple State {CurrentState,CurrentData} , + andis a + similar tuple, which can contain + a new current state, new state data, or both. NewState - +
+For a
++ process, gen_event is the + tuple State {Module, Id, HandlerState} as follows:+ ++ Module - +
+The module name of the event handler.
++ Id - +
+The ID of the handler (which is
+false if it was + registered without an ID).+ HandlerState - +
+The state of the handler.
++
is a similar tuple where + NewState Module andId are to have the same values as in +, but the value of State HandlerState + can be different. Returning a, whose + NewState Module orId values differ from those of +, leaves the state of the event handler + unchanged. For a State gen_event process, +is called once for each event handler + registered in the StateFun gen_event process.If a
+function decides not to effect any + change in process state, then regardless of process type, it can + return its StateFun argument. State If a
-function crashes or throws an + exception, the original state of the process is unchanged for + StateFun gen_server ,gen_fsm , andgen_statem processes. + Forgen_event processes, a crashing or + failingfunction + means that only the state of the particular event handler it was + working on when it failed or crashed is unchanged; it can still + succeed in changing the states of other event handlers registered in the same StateFun gen_event process.If the callback module exports a
-system_replace_state/2 function, it will be called in the - target process to replace its state usingStateFun . Its two arguments areStateFun - andMisc , whereMisc is the same as theMisc value returned by -get_status/1,2 . Asystem_replace_state/2 function - is expected to return{ok, NewState, NewMisc} whereNewState is the callback module's - new state obtained by callingStateFun , andNewMisc is a possibly new value used to - replace the originalMisc (required sinceMisc often contains the callback - module's state within it).If the callback module does not export a
-system_replace_state/2 function, -replace_state/2,3 assumes theMisc value is the callback module's state, passes it - toStateFun and uses the return value as both the new state and as the new value of -Misc .If the callback module's
system_replace_state/2 function crashes or throws an exception, - the caller exits with error{callback_failed, {Module, system_replace_state}, {Class, Reason}} - whereModule is the name of the callback module andClass andReason indicate details - of the exception. If the callback module does not provide asystem_replace_state/2 function and -StateFun crashes or throws an exception, the caller exits with error +If the callback module exports a +
++ function, it is called in the + target process to replace its state usingsystem_replace_state/2 StateFun . Its two + arguments areStateFun andMisc , where +Misc is the same as theMisc value returned by +. + A get_status/1,2 system_replace_state/2 function is expected to return +{ok, NewState, NewMisc} , whereNewState is the new state + of the callback module, obtained by callingStateFun , and +NewMisc is + a possibly new value used to replace the originalMisc + (required asMisc often contains the state of the callback + module within it).If the callback module does not export a +
+system_replace_state/2 function, ++ assumes that replace_state/2,3 Misc is the state of the callback module, + passes it toStateFun and uses the return value as + both the new state and as the new value ofMisc .If the callback module's function
-system_replace_state/2 + crashes or throws an exception, the caller exits with error +{callback_failed, {Module, system_replace_state}, {Class, + Reason}} , whereModule is the name of the callback module + andClass andReason indicate details of the exception. + If the callback module does not provide a +system_replace_state/2 function andStateFun crashes or + throws an exception, the caller exits with error{callback_failed, StateFun, {Class, Reason}} .The
+system_replace_state/2 function is primarily useful for user-defined behaviours and - modules that implement OTPspecial processes . The -gen_server ,gen_fsm ,gen_statem and -gen_event OTP behaviour modules export this function, - and so callback modules for those behaviours need not supply their own.Function
system_replace_state/2 is primarily useful for + user-defined behaviors and modules that implement OTP +special processes . The + OTP behavior modules gen_server , +gen_fsm , gen_statem , and gen_event + export this function, so callback modules for those + behaviors need not supply their own.- +- - Install a debug function in the process ++ + Resume a suspended process. - This function makes it possible to install other debug - functions than the ones defined above. An example of such a - function is a trigger, a function that waits for some - special event and performs some action when the event is - generated. This could, for example, be turning on low level tracing. -
-+
is called whenever a system event is - generated. This function should return Func done , or a new - func state. In the first case, the function is removed. It is removed - if the function fails.Resumes a suspended process.
- + +- - Remove a debug function from the process ++ + Enable or disable the collections of statistics. - Removes a previously installed debug function from the - process.
+must be the same as previously - installed. Func Enables or disables the collection of statistics. If +
is Flag get , + the statistical collection is returned.+ ++ + Suspend the process. ++ +Suspends the process. When the process is suspended, it + only responds to other system messages, but not other + messages.
++ + - Terminate the process +Terminate the process. - +This function orders the process to terminate with the - given
+. The termination is done - asynchronously, so there is no guarantee that the process is - actually terminated when the function returns. Reason Orders the process to terminate with the + specified
+. The termination is done + asynchronously, so it is not guaranteed that the process is + terminated when the function returns. Reason + + + Print all system events on +standard_io .+ Prints all system events on
standard_io . The events are + formatted with a function that is defined by the process that + generated the event (with a call to +). + handle_debug/4 + Process Implementation Functions -+
The following functions are used when implementing a - special process. This is an ordinary process which does not use a - standard behaviour, but a process which understands the standard system messages. + The following functions are used when implementing a + special process. This is an ordinary process, which does not use a + standard behavior, but a process that understands the standard system + messages.
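As an illustration of how the callbacks named in this section fit together, here is a minimal sketch of a special process. The module name `sp_sketch`, the `bump` message, and the use of an integer counter as `Misc` are assumptions made for this example only; they do not come from the patch.

```erlang
-module(sp_sketch).
-export([start/0, init/1]).
%% Callbacks that sys:handle_system_msg/6 expects the module to export.
-export([system_continue/3, system_terminate/4, system_code_change/4,
         system_get_state/1, system_replace_state/2]).

%% Start the process through proc_lib so that it has a proper parent.
start() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    Deb = sys:debug_options([]),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(0, Parent, Deb).

%% Count is the internal state; it is handed to sys as Misc.
loop(Count, Parent, Deb) ->
    receive
        {system, From, Request} ->
            %% Never returns; continues in system_continue/3 or
            %% system_terminate/4.
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Deb, Count);
        bump ->
            loop(Count + 1, Parent, Deb)
    end.

system_continue(Parent, Deb, Count) ->
    loop(Count, Parent, Deb).

system_terminate(Reason, _Parent, _Deb, _Count) ->
    exit(Reason).

system_code_change(Count, _Module, _OldVsn, _Extra) ->
    {ok, Count}.

system_get_state(Count) ->
    {ok, Count}.

%% Here Misc is the state itself, so both returned values are the same.
system_replace_state(StateFun, Count) ->
    NCount = StateFun(Count),
    {ok, NCount, NCount}.
```

With this sketch, `sys:get_state(Pid)` returns the counter and `sys:replace_state(Pid, fun(C) -> C + 10 end)` returns the new counter value.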
diff --git a/lib/stdlib/doc/src/timer.xml b/lib/stdlib/doc/src/timer.xml index 4f259d57a8..8f2ce36b06 100644 --- a/lib/stdlib/doc/src/timer.xml +++ b/lib/stdlib/doc/src/timer.xml @@ -30,26 +30,25 @@ + - Convert a list of options to a debug structure +Convert a list of options to a debug structure. - This function can be used by a process that initiates a debug - structure from a list of options. The values of the -
argument are the same as the corresponding + Opt Can be used by a process that initiates a debug + structure from a list of options. The values of argument +
are the same as for the corresponding functions. Opt + - Get the data associated with a debug option +Get the data associated with a debug option. - This function gets the data associated with a debug option.
+is returned if the - Default is not found. Can be - used by the process to retrieve debug data for printing - before it terminates. Item Gets the data associated with a debug option. +
+ is returned if Default is not found. Can be + used by the process to retrieve debug data for printing before it + terminates. Item + - Generate a system event +Generate a system event. This function is called by a process when it generates a - system event.
+ system event.is a formatting - function which is called as FormFunc in order to print - the events, which is necessary if tracing is activated. - FormFunc (Device, -Event ,Extra )is any extra information which the - process needs in the format function, for example the name - of the process. Extra is a formatting + function, called as FormFunc to print the events, + which is necessary if tracing is activated. + FormFunc (Device, +Event ,Extra )is any extra information that the + process needs in the format function, for example, the process + name. Extra + - Take care of system messages +Take care of system messages. - This function is used by a process module that wishes to take care of system - messages. The process receives a
-{system, - message and passes theFrom ,Msg }and Msg to this - function. - From This function never returns. It calls the function -
-where the - process continues the execution, or - Module :system_continue(Parent , NDebug,Misc )if - the process should terminate. The Module :system_terminate(Reason,Parent ,Debug ,Misc )must export - Module system_continue/3 ,system_terminate/4 , -system_code_change/4 ,system_get_state/1 and -system_replace_state/2 (see below). -The
argument can be used to save internal data - in a process, for example its state. It is sent to + Misc This function is used by a process module to take care of system + messages. The process receives a +
+{system, message and + passesFrom ,Msg }and Msg to this + function. From This function never returns. It calls either of the + following functions:
++
+- +
++
, + where the process continues the execution. Module :system_continue(Parent , + NDebug,Misc )- +
++
, + if the process is to terminate. Module :system_terminate(Reason, +Parent ,Debug ,Misc )+
must export the following: Module +
+- +
system_continue/3 - +
system_terminate/4 - +
system_code_change/4 - +
system_get_state/1 - +
system_replace_state/2 Argument
+can be used to save internal data + in a process, for example, its state. It is sent to Misc or - Module :system_continue/3Module :system_terminate/4. Module :system_terminate/4+ - Print the logged events in the debug structure +Print the logged events in the debug structure. - Prints the logged system events in the debug structure +
Prints the logged system events in the debug structure, using
+ generated by a call to +FormFunc as defined when the event was - generated by a call tohandle_debug/4 .. handle_debug/4 - +Mod:system_continue(Parent, Debug, Misc) -> none() -Called when the process should continue its execution +Module:system_code_change(Misc, Module, OldVsn, Extra) -> + {ok, NMisc} +Called when the process is to perform a code change. - Parent = pid() -Debug = [ dbg_opt() ]Misc = term() +OldVsn = undefined | term() +Module = atom() +Extra = term() +NMisc = term() - This function is called from
+sys:handle_system_msg/6 when the process - should continue its execution (for example after it has been - suspended). This function never returns.Called from
+ when the process is to perform a + code change. The code change is used when the + internal data structure has changed. This function + converts argumenthandle_system_msg/6 Misc to the new data + structure.OldVsn is attribute vsn of the + old version of theModule . If no such attribute is + defined, the atomundefined is sent.- +Mod:system_terminate(Reason, Parent, Debug, Misc) -> none() -Called when the process should terminate +Module:system_continue(Parent, Debug, Misc) -> none() +Called when the process is to continue its execution. - Reason = term() Parent = pid() Debug = [ dbg_opt() ]Misc = term() - This function is called from
+sys:handle_system_msg/6 when the process - should terminate. For example, this function is called when - the process is suspended and its parent orders shut-down. - It gives the process a chance to do a clean-up. This function never - returns.Called from
+ when the process is to continue + its execution (for example, after it has been + suspended). This function never returns.handle_system_msg/6 - +Mod:system_code_change(Misc, Module, OldVsn, Extra) -> {ok, NMisc} -Called when the process should perform a code change +Module:system_get_state(Misc) -> {ok, State} +Called when the process is to return its current state. + Misc = term() -OldVsn = undefined | term() -Module = atom() -Extra = term() -NMisc = term() +State = term() - Called from
+sys:handle_system_msg/6 when the process - should perform a code change. The code change is used when the - internal data structure has changed. This function - converts theMisc argument to the new data - structure.OldVsn is the vsn attribute of the - old version of theModule . If no such attribute was - defined, the atomundefined is sent.Called from
+ + when the process is to return a term that reflects its current state. +handle_system_msg/6 State is the value returned by +. get_state/2 - +Mod:system_get_state(Misc) -> {ok, State} -Called when the process should return its current state +Module:system_replace_state(StateFun, Misc) -> + {ok, NState, NMisc} +Called when the process is to replace its current state. + + +StateFun = fun((State :: term()) -> NState) Misc = term() -State = term() -NState = term() +NMisc = term() +- This function is called from
+sys:handle_system_msg/6 when the process - should return a term that reflects its current state.State is the - value returned bysys:get_state/2 .Called from
+ when the process is to replace + its current state.handle_system_msg/6 NState is the value returned by +. + replace_state/3 - Mod:system_replace_state(StateFun, Misc) -> {ok, NState, NMisc} -Called when the process should replace its current state +Module:system_terminate(Reason, Parent, Debug, Misc) -> none() +Called when the process is to terminate. - +StateFun = fun((State :: term()) -> NState) +Reason = term() +Parent = pid() +Debug = [ dbg_opt() ]Misc = term() -NState = term() -NMisc = term() -- This function is called from
+sys:handle_system_msg/6 when the process - should replace its current state.NState is the value returned by -sys:replace_state/3 .Called from
+ when the process is to terminate. + For example, this function is called when + the process is suspended and its parent orders shutdown. + It gives the process a chance to do a cleanup. This function never + returns.handle_system_msg/6 1998-09-09 D -timer.sgml +timer.xml timer -Timer Functions +Timer functions. + This module provides useful functions related to time. Unless otherwise - stated, time is always measured in
-milliseconds . All - timer functions return immediately, regardless of work carried - out by another process. -Successful evaluations of the timer functions yield return values - containing a timer reference, denoted
-TRef below. By using -cancel/1 , the returned reference can be used to cancel any - requested action. ATRef is an Erlang term, the contents - of which must not be altered. -The timeouts are not exact, but should be
+ stated, time is always measured in milliseconds. All + timer functions return immediately, regardless of work done by another + process. +at least as long - as requested. -Successful evaluations of the timer functions give return values + containing a timer reference, denoted
+TRef . By using + cancel/1 , + the returned reference can be used to cancel any + requested action. A TRef is an Erlang term, the contents of which + must not be changed.The time-outs are not exact, but are at least as long + as requested.
+ @@ -60,231 +59,286 @@ A timer reference.
- +- Start a global timer server (named +timer_server ).+ Apply Module:Function(Arguments) after a specified +Time .- Starts the timer server. Normally, the server does not need - to be started explicitly. It is started dynamically if it - is needed. This is useful during development, but in a - target system the server should be started explicitly. Use - configuration parameters for
+kernel for this.Evaluates
+apply( afterModule ,Function , +Arguments )+ milliseconds. Time Returns
{ok, or +TRef }{error, .Reason }- +- Apply +Module:Function(Arguments) after a specifiedTime .+ Evaluate Module:Function(Arguments) repeatedly at + intervals ofTime .- Evaluates
+apply( afterModule ,Function ,Arguments )amount of time - has elapsed. Returns Time {ok, , orTRef }{error, .Reason }Evaluates
+apply( repeatedly at intervals of +Module ,Function , +Arguments ). Time Returns
{ok, or +TRef }{error, .Reason }- +- - Send +Message toPid after a specifiedTime .+ Cancel a previously requested time-out identified by + TRef .- - +- send_after/3 - -
-Evaluates
-after Pid !Message amount - of time has elapsed. ( Time can also be an atom of a - registered name.) Returns Pid {ok, , or -TRef }{error, .Reason }- send_after/2 - -
-Same as
-send_after( .Time , self(),Message )Cancels a previously requested time-out.
+is + a unique + timer reference returned by the related timer function. TRef Returns
{ok, cancel} , or{error, + whenReason }is not a timer reference. TRef - + +- - Send an exit signal with +Reason after a specifiedTime .Send an exit signal with +Reason after a specified +Time .+ ++
exit_after/2 is the same as +exit_after( .Time , self(), +Reason1 )+
exit_after/3 sends an exit signal with reason +to + pid Reason1 . Returns Pid {ok, + orTRef }{error, .Reason2 }+ + ++ Convert +Hours +Minutes +Seconds to +Milliseconds .+ +Returns the number of milliseconds in
+. Hours + +Minutes +Seconds + + ++ Convert +Hours toMilliseconds .+ +Returns the number of milliseconds in
+. Hours + + ++ + Send an exit signal with +Reason after a specified +Time .+ ++
kill_after/1 is the same as +exit_after( .Time , self(), kill)+
kill_after/2 is the same as +exit_after( .Time ,Pid , kill)+ + ++ Converts +Minutes toMilliseconds .+ +Returns the number of milliseconds in +
+. Minutes + + ++ Calculate time difference between time stamps. +In microseconds ++ +Calculates the time difference
+in microseconds, + where Tdiff = +T2 -T1 and T1 + are time-stamp tuples on the same format as returned from + T2 + or +erlang:timestamp/0 + .os:timestamp/0 + + ++ Convert +Seconds toMilliseconds .+ +Returns the number of milliseconds in +
+. Seconds + -+ + Send Message toPid after a specified +Time .- - exit_after/3 - -
-Send an exit signal with reason
-to Pid - Reason1 . Returns Pid {ok, , or -TRef }{error, .Reason2 }- exit_after/2 - -
-Same as
-exit_after( .Time , self(),Reason1 )+ kill_after/2 send_after/3 - -
-Same as
+exit_after( .Time ,Pid , kill)Evaluates
+after + Pid !Message milliseconds. ( Time + can also be an atom of a registered name.) Pid Returns
{ok, or +TRef }{error, .Reason }+ kill_after/1 send_after/2 - -
Same as
+exit_after( .Time , self(), kill)Same as
send_after( .Time , self(), +Message )- +- Evaluate -Module:Function(Arguments) repeatedly at intervals ofTime .- -Evaluates
-apply( repeatedly at - intervals ofModule ,Function ,Arguments ). Returns Time {ok, , or -TRef }{error, .Reason }+ - Send +Message repeatedly at intervals ofTime .Send Message repeatedly at intervals ofTime . +send_interval/3 - -
Evaluates
repeatedly after Pid !Message - amount of time has elapsed. ( Time can also be an atom of - a registered name.) Returns Pid {ok, or +TRef }Evaluates
++ repeatedly after Pid !Message milliseconds. + ( Time can also be + an atom of a registered name.) Pid Returns
{ok, orTRef }{error, .Reason }send_interval/2 - -
Same as
+send_interval( .Time , self(),Message )Same as
send_interval( .Time , self(), +Message )- +- Cancel a previously requested timeout identified by +TRef .+ Suspend the calling process for Time milliseconds. +- Cancels a previously requested timeout.
+is a unique - timer reference returned by the timer function in question. Returns - TRef {ok, cancel} , or{error, whenReason }- is not a timer reference. TRef Suspends the process calling this function for +
milliseconds and then returns Time ok , + or suspends the process forever ifis the + atom Time infinity . Naturally, this + function does not return immediately.- +- Suspend the calling process for +Time amount of milliseconds.+ Start a global timer server (named timer_server ). +- Suspends the process calling this function for
+amount - of milliseconds and then returns Time ok , or suspend the process - forever ifis the atom Time infinity . Naturally, this - function does not return immediately.Starts the timer server. Normally, the server does not need + to be started explicitly. It is started dynamically if it + is needed. This is useful during development, but in a + target system the server is to be started explicitly. Use + configuration parameters for +
for this. Kernel Measure the real time it takes to evaluate + Function, Arguments) orapply(Module, - Function, Arguments) orapply(Fun, Arguments) apply(Fun, Arguments) .
Evaluates
Evaluates
Returns
Evaluates
Evaluates
Evaluates
Evaluates
Calculates the time difference
Returns the number of milliseconds in
Return the number of milliseconds in
Returns the number of milliseconds in
Returns the number of milliseconds in
This example illustrates how to print out "Hello World!" in 5 seconds:
+Example 1
+The following example shows how to print "Hello World!" in 5 seconds:
- 1> timer:apply_after(5000, io, format, ["~nHello World!~n", []]).
- {ok,TRef}
- Hello World!
- The following coding example illustrates a process which performs a - certain action and if this action is not completed within a certain - limit, then the process is killed.
+1> timer:apply_after(5000, io, format, ["~nHello World!~n", []]). +{ok,TRef} +Hello World! + +Example 2
+The following example shows a process performing a + certain action, and if this action is not completed within a certain + limit, the process is killed:
- Pid = spawn(mod, fun, [foo, bar]),
- %% If pid is not finished in 10 seconds, kill him
- {ok, R} = timer:kill_after(timer:seconds(10), Pid),
- ...
- %% We change our mind...
- timer:cancel(R),
- ...
+Pid = spawn(mod, 'fun', [foo, bar]),
+%% If Pid has not finished within 10 seconds, kill it
+{ok, R} = timer:kill_after(timer:seconds(10), Pid),
+...
+%% We change our mind...
+timer:cancel(R),
+...
A timer can always be removed by calling
An interval timer, i.e. a timer created by evaluating any of the
- functions
A one-shot timer, i.e. a timer created by evaluating any of the
- functions
A timer can always be removed by calling
+
An interval timer, that is, a timer created by evaluating any of the
+ functions
+
A one-shot timer, that is, a timer created by evaluating any of the
+ functions
+
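The timer calls described above can be combined as in the following minimal sketch. The module name `timer_sketch` and the messages `ping` and `never_sent` are hypothetical and only serve the example.

```erlang
-module(timer_sketch).
-export([demo/0]).

%% Schedule one message, cancel a second timer before it fires,
%% and wait for the first message to arrive.
demo() ->
    %% send_after/2 sends the message to the calling process.
    {ok, _TRef1} = timer:send_after(100, ping),
    %% timer:seconds/1 converts seconds to milliseconds.
    {ok, TRef2} = timer:send_after(timer:seconds(5), never_sent),
    %% Cancelling a valid timer reference returns {ok, cancel}.
    {ok, cancel} = timer:cancel(TRef2),
    receive
        ping -> ok
    after 1000 ->
        timeout
    end.
```

The `after` clause is only a safety net, because time-outs are at least as long as requested but not exact.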
This module contains functions for converting between different character representations. Basically it converts between ISO-latin-1 characters and Unicode ditto, but it can also convert between different Unicode encodings (like UTF-8, UTF-16 and UTF-32).
-The default Unicode encoding in Erlang is in binaries UTF-8, which is also the format in which built in functions and libraries in OTP expect to find binary Unicode data. In lists, Unicode data is encoded as integers, each integer representing one character and encoded simply as the Unicode codepoint for the character.
-Other Unicode encodings than integers representing codepoints or UTF-8 in binaries are referred to as "external encodings". The ISO-latin-1 encoding is in binaries and lists referred to as latin1-encoding.
-It is recommended to only use external encodings for communication with external entities where this is required. When working inside the Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when representing Unicode characters. Latin1 encoding is supported both for backward compatibility and for communication with external entities not supporting Unicode character sets.
+This module contains functions for converting between different character + representations. It converts between ISO Latin-1 characters and Unicode + characters, but it can also convert between different Unicode encodings + (like UTF-8, UTF-16, and UTF-32).
+The default Unicode encoding in Erlang is in binaries UTF-8, which is also + the format in which built-in functions and libraries in OTP expect to find + binary Unicode data. In lists, Unicode data is encoded as integers, each + integer representing one character and encoded simply as the Unicode code + point for the character.
+Other Unicode encodings than integers representing code points or UTF-8 + in binaries are referred to as "external encodings". The ISO + Latin-1 encoding + is in binaries and lists referred to as latin1-encoding.
+It is recommended to only use external encodings for communication with + external entities where this is required. When working inside the + Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when + representing Unicode characters. ISO Latin-1 encoding is supported both + for backward compatibility and for communication + with external entities not supporting Unicode character sets.
A
A
A
A
A
A
An An
The same as
Same as
The same as
Same as
Check for a UTF byte order mark (BOM) in the beginning of a
- binary. If the supplied binary
If no BOM is found, the function returns
Checks for a UTF Byte Order Mark (BOM) in the beginning of a
+ binary. If the supplied binary
If no BOM is found, the function returns
Same as
Same as
Converts a possibly deep list of integers and
- binaries into a list of integers representing Unicode
- characters. The binaries in the input may have characters
- encoded as latin1 (0 - 255, one character per byte), in which
- case the
If
The purpose of the function is mainly to be able to convert
- combinations of Unicode characters into a pure Unicode
- string in list representation for further processing. For
- writing the data to an external entity, the reverse function
-
The option
If for some reason, the data cannot be converted, either
- because of illegal Unicode/latin1 characters in the list, or
- because of invalid UTF encoding in any binaries, an error
- tuple is returned. The error tuple contains the tag
-
However, if the input
Errors occur for the following reasons:
-A special type of error is when no actual invalid integers or
- bytes are found, but a trailing
If one UTF characters is split over two consecutive
- binaries in the
- decode_data(Data) ->
- case unicode:characters_to_list(Data,unicode) of
- {incomplete,Encoded, Rest} ->
- More = get_some_more_data(),
- Encoded ++ decode_data([Rest, More]);
- {error,Encoded,Rest} ->
- handle_error(Encoded,Rest);
- List ->
- List
- end.
-
- Bit-strings that are not whole bytes are however not allowed, - so a UTF character has to be split along 8-bit boundaries to - ever be decoded.
- -If any parameters are of the wrong type, the list structure
- is invalid (a number as tail) or the binaries do not contain
- whole bytes (bit-strings), a
Same as
Same as
Behaves as
Options:
+An alias for
An alias for
An alias for
The atoms
Errors and exceptions occur as in
+
Same as
Same as
Behaves as
The option
Errors and exceptions occur as in
Converts a possibly deep list of integers and + binaries into a list of integers representing Unicode + characters. The binaries in the input can have characters + encoded as one of the following:
+ISO Latin-1 (0-255, one character per byte). Here,
+ case parameter
One of the UTF-encodings, which is specified as parameter
+
Only when
If
The purpose of the function is mainly to convert
+ combinations of Unicode characters into a pure Unicode
+ string in list representation for further processing. For
+ writing the data to an external entity, the reverse function
+
Option
If the data cannot be converted, either
+ because of illegal Unicode/ISO Latin-1 characters in the list,
+ or because of invalid UTF encoding in any binaries, an error
+ tuple is returned. The error tuple contains the tag
+
However, if the input
Errors occur for the following reasons:
+Integers out of range.
+If
If
An integer > 16#10FFFF + (the maximum Unicode character)
+An integer in the range 16#D800 to 16#DFFF (invalid range + reserved for UTF-16 surrogate pairs)
+Incorrect UTF encoding.
+If
Errors can occur for various reasons, including the + following:
+"Pure" decoding errors + (like the upper bits of the bytes being wrong).
+The bytes are decoded to a too large number.
+The bytes are decoded to a code point in the invalid + Unicode range.
+Encoding is "overlong", meaning that a number + should have been encoded in fewer bytes.
+The case of a truncated UTF is handled specially, see the + paragraph about incomplete binaries below.
+If
A special type of error is when no actual invalid integers or
+ bytes are found, but a trailing
If one UTF character is split over two consecutive binaries in
+ the
Example:
+
+decode_data(Data) ->
+ case unicode:characters_to_list(Data,unicode) of
+ {incomplete,Encoded, Rest} ->
+ More = get_some_more_data(),
+ Encoded ++ decode_data([Rest, More]);
+ {error,Encoded,Rest} ->
+ handle_error(Encoded,Rest);
+ List ->
+ List
+ end.
+ However, bit strings that are not whole bytes are not allowed, + so a UTF character must be split along 8-bit boundaries to + ever be decoded.
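The three kinds of result discussed above (a plain list, an `incomplete` tuple, and an `error` tuple) can be seen in a small sketch. The module name `unicode_sketch` is an assumption for illustration, and the third tuple element is deliberately not matched, because its exact shape depends on the input.

```erlang
-module(unicode_sketch).
-export([demo/0]).

demo() ->
    %% U+00E5 is encoded as the two bytes 16#C3, 16#A5 in UTF-8.
    Full = <<"a", 16#C3, 16#A5>>,
    [$a, 16#E5] = unicode:characters_to_list(Full, utf8),
    %% The same code point is a single byte in ISO Latin-1.
    [16#E5] = unicode:characters_to_list(<<16#E5>>, latin1),
    %% A binary cut off in the middle of a character is incomplete.
    {incomplete, "a", _Rest1} =
        unicode:characters_to_list(<<"a", 16#C3>>, utf8),
    %% 16#FF can never occur in UTF-8, so this is a decoding error.
    {error, "a", _Rest2} =
        unicode:characters_to_list(<<"a", 16#FF>>, utf8),
    ok.
```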
+A
Create a UTF byte order mark (BOM) as a binary from the
- supplied
The function returns
It can be noted that the BOM for UTF-8 is seldom used, and it - is really not a byte order mark. There are obviously no - byte order issues with UTF-8, so the BOM is only there to - differentiate UTF-8 encoding from other UTF formats.
- +Creates a UTF Byte Order Mark (BOM) as a binary from the
+ supplied
The function returns
Notice that the BOM for UTF-8 is seldom used, and it + is really not a byte order mark. There are obviously no + byte order issues with UTF-8, so the BOM is only there to + differentiate UTF-8 encoding from other UTF formats.
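A short sketch of the two BOM functions used together follows. The module name `bom_sketch` is hypothetical, and the sample binaries are made up for the example.

```erlang
-module(bom_sketch).
-export([demo/0]).

demo() ->
    %% The UTF-8 BOM is three bytes long.
    Bom = unicode:encoding_to_bom(utf8),
    <<16#EF, 16#BB, 16#BF>> = Bom,
    %% bom_to_encoding/1 returns the encoding and the BOM length.
    {utf8, 3} = unicode:bom_to_encoding(<<Bom/binary, "abc">>),
    {{utf16, big}, 2} = unicode:bom_to_encoding(<<16#FE, 16#FF, 0, $a>>),
    %% Without a BOM the result is {latin1, 0}.
    {latin1, 0} = unicode:bom_to_encoding(<<"abc">>),
    %% latin1 has no BOM, so an empty binary is returned.
    <<>> = unicode:encoding_to_bom(latin1),
    ok.
```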
Implementing support for Unicode character sets is an ongoing - process. The Erlang Enhancement Proposal (EEP) 10 outlined the - basics of Unicode support and also specified a default encoding in - binaries that all Unicode-aware modules should handle in the - future.
- -The functionality described in EEP10 was implemented in Erlang/OTP
- R13A, but that was by no means the end of it. In Erlang/OTP R14B01 support
- for Unicode file names was added, although it was in no way complete
- and was by default disabled on platforms where no guarantee was given
- for the file name encoding. With Erlang/OTP R16A came support for UTF-8 encoded
- source code, among with enhancements to many of the applications to
- support both Unicode encoded file names as well as support for UTF-8
- encoded files in several circumstances. Most notable is the support
- for UTF-8 in files read by
This guide outlines the current Unicode support and gives a couple - of recipes for working with Unicode data.
-Experience with the Unicode support in Erlang has made it - painfully clear that understanding Unicode characters and encodings - is not as easy as one would expect. The complexity of the field as - well as the implications of the standard requires thorough - understanding of concepts rarely before thought of.
- -Furthermore the Erlang implementation requires understanding of - concepts that never were an issue for many (Erlang) programmers. To - understand and use Unicode characters requires that you study the - subject thoroughly, even if you're an experienced programmer.
- -As an example, one could contemplate the issue of converting - between upper and lower case letters. Reading the standard will make - you realize that, to begin with, there's not a simple one to one - mapping in all scripts. Take German as an example, where there's a - letter "ß" (Sharp s) in lower case, but the uppercase equivalent is - "SS". Or Greek, where "Σ" has two different lowercase forms: "ς" in - word-final position and "σ" elsewhere. Or Turkish where dotted and - dot-less "i" both exist in lower case and upper case forms, or - Cyrillic "I" which usually has no lowercase form. Or of course - languages that have no concept of upper case (or lower case). So, a - conversion function will need to know not only one character at a - time, but possibly the whole sentence, maybe the natural language - the translation should be in and also take into account differences - in input and output string length and so on. There is at the time of - writing no Unicode to_upper/to_lower functionality in Erlang/OTP, but - there are publicly available libraries that address these issues.
- -Another example is the accented characters where the same glyph - has two different representations. Let's look at the Swedish - "ö". There's a code point for that in the Unicode standard, but you - can also write it as "o" followed by U+0308 (Combining Diaeresis, - with the simplified meaning that the last letter should have a "¨" - above). They have exactly the same glyph. They are for most - purposes the same, but they have completely different - representations. For example MacOS X converts all file names to use - Combining Diaeresis, while most other programs (including Erlang) - try to hide that by doing the opposite when for example listing - directories. However it's done, it's usually important to normalize - such characters to avoid utter confusion.
- -The list of examples can be made as long as the Unicode standard, I - suspect. The point is that one need a kind of knowledge that was - never needed when programs only took one or two languages into - account. The complexity of human languages and scripts, certainly - has made this a challenge when constructing a universal - standard. Supporting Unicode properly in your program will require - effort.
- -Unicode is a standard defining code points (numbers) for all - known, living or dead, scripts. In principle, every known symbol - used in any language has a Unicode code point.
-Unicode code points are defined and published by the Unicode - Consortium, which is a non profit organization.
-Support for Unicode is increasing throughout the world of - computing, as the benefits of one common character set are - overwhelming when programs are used in a global environment.
-Along with the base of the standard: the code points for all the - scripts, there are a couple of encoding standards available.
-It is vital to understand the difference between encodings and - Unicode characters. Unicode characters are code points according to - the Unicode standard, while the encodings are ways to represent such - code points. An encoding is just a standard for representation, - UTF-8 can for example be used to represent a very limited part of - the Unicode character set (e.g. ISO-Latin-1), or the full Unicode - range. It's just an encoding format.
-As long as all character sets were limited to 256 characters, - each character could be stored in one single byte, so there was more - or less only one practical encoding for the characters. Encoding - each character in one byte was so common that the encoding wasn't - even named. When we now, with the Unicode system, have a lot more - than 256 characters, we need a common way to represent these. The - common ways of representing the code points are the encodings. This - means a whole new concept to the programmer, the concept of - character representation, which was before a non-issue.
- -Different operating systems and tools support different - encodings. For example Linux and MacOS X has chosen the UTF-8 - encoding, which is backwards compatible with 7-bit ASCII and - therefore affects programs written in plain English the - least. Windows on the other hand supports a limited version of - UTF-16, namely all the code planes where the characters can be - stored in one single 16-bit entity, which includes most living - languages.
- -The most widely spread encodings are:
-Certain ranges of numbers are left unused in the Unicode standard - and certain ranges are even deemed invalid. The most notable invalid - range is 16#D800 - 16#DFFF, as the UTF-16 encoding does not allow - for encoding of these numbers. It can be speculated that the UTF-16 - encoding standard was, from the beginning, expected to be able to - hold all Unicode characters in one 16-bit entity, but then had to be - extended, leaving a hole in the Unicode range to cope with backward - compatibility.
-Additionally, the code point 16#FEFF is used for byte order marks - (BOM's) and use of that character is not encouraged in other - contexts than that. It actually is valid though, as the character - "ZWNBS" (Zero Width Non Breaking Space). BOM's are used to identify - encodings and byte order for programs where such parameters are not - known in advance. Byte order marks are more seldom used than one - could expect, but their use might become more widely spread as they - provide the means for programs to make educated guesses about the - Unicode format of a certain file.
-To support Unicode in Erlang, problems in several areas have been - addressed. Each area is described briefly in this section and more - thoroughly further down in this document:
-
-%% -*- coding: utf-8 -*-
-
- in the beginning of the file. This of course requires your editor to
- support UTF-8 as well. The same comment is also interpreted by
- functions like In Erlang, strings are actually lists of integers. A string was - up until Erlang/OTP R13 defined to be encoded in the ISO-latin-1 (ISO8859-1) - character set, which is, code point by code point, a sub-range of - the Unicode character set.
-The standard list encoding for strings was therefore easily - extended to cope with the whole Unicode range: A Unicode string in - Erlang is simply a list containing integers, each integer being a - valid Unicode code point and representing one character in the - Unicode character set.
-Erlang strings in ISO-latin-1 are a subset of Unicode - strings.
-Only if a string contains code points < 256, can it be
- directly converted to a binary by using
- i.e.
Binaries are more troublesome. For performance reasons, programs
- often store textual data in binaries instead of lists, mainly
- because they are more compact (one byte per character instead of two
- words per character, as is the case with lists). Using
-
As the UTF-8 encoding is widely spread and provides some backward - compatibility in the 7-bit ASCII range, it is selected as the - standard encoding for Unicode characters in binaries for Erlang.
-The standard binary encoding is used whenever a library function - in Erlang should cope with Unicode data in binaries, but is of - course not enforced when communicating externally. Functions and - bit-syntax exist to encode and decode both UTF-8, UTF-16 and UTF-32 - in binaries. Library functions dealing with binaries and Unicode in - general, however, only deal with the default encoding.
- -Character data may be combined from several sources, sometimes
- available in a mix of strings and binaries. Erlang has for long had
- the concept of
+
+ Unicode Implementation
+ Implementing support for Unicode character sets is an ongoing process.
+ The Erlang Enhancement Proposal (EEP) 10 outlined the basics of Unicode
+ support and specified a default encoding in binaries that all
+ Unicode-aware modules are to handle in the future.
+
+ Here is an overview of what has been done so far:
+
+
+ The functionality described in EEP10 was implemented
+ in Erlang/OTP R13A.
+
+ Erlang/OTP R14B01 added support for Unicode
+ filenames, but it was not complete and was by default
+ disabled on platforms where no guarantee was given for the
+ filename encoding.
+
+ With Erlang/OTP R16A came support for UTF-8 encoded
+ source code, with enhancements to many of the applications to
+ support both Unicode encoded filenames and support for UTF-8
+ encoded files in many circumstances. Most notable is the
+ support for UTF-8 in files read by file:consult/1,
+ release handler support for UTF-8, and more support for
+ Unicode character sets in the I/O system.
+
+ In Erlang/OTP 17.0, the encoding default for Erlang
+ source files was switched to UTF-8.
+
+
+ This section outlines the current Unicode support and gives some
+ recipes for working with Unicode data.
+
+
+
+ Understanding Unicode
+ Experience with the Unicode support in Erlang has made it clear that
+ understanding Unicode characters and encodings is not as easy as one
+ would expect. The complexity of the field and the implications of the
+ standard require a thorough understanding of concepts that were rarely
+ considered before.
+
+ Also, the Erlang implementation requires understanding of
+ concepts that were never an issue for many (Erlang) programmers. To
+ understand and use Unicode characters requires that you study the
+ subject thoroughly, even if you are an experienced programmer.
+
+ As an example, contemplate the issue of converting between upper and
+ lower case letters. Reading the standard makes you realize that there is
+ not a simple one to one mapping in all scripts, for example:
+
+
+ -
+
In German, the letter "ß" (sharp s) is in lower case, but the
+ uppercase equivalent is "SS".
+
+ -
+
In Greek, the letter "Σ" has two different lowercase forms,
+ "ς" in word-final position and "σ" elsewhere.
+
+ -
+
In Turkish, both dotted and dotless "i" exist in lower case and
+ upper case forms.
+
+ -
+
Cyrillic "I" usually has no lowercase form.
+
+ -
+
Some languages have no concept of upper case (or lower case).
+
+
+
+ So, a conversion function must know not only one character at a time,
+ but possibly the whole sentence, the natural language to translate to,
+ the differences in input and output string length, and so on.
+ Erlang/OTP currently has no Unicode to_upper/to_lower
+ functionality, but publicly available libraries address these issues.
+
+ Another example is the accented characters, where the same glyph has two
+ different representations. The Swedish letter "ö" is one example.
+ The Unicode standard has a code point for it, but you can also write it
+ as "o" followed by "U+0308" (Combining Diaeresis, with the simplified
+ meaning that the last letter is to have "¨" above). They have the same
+ glyph. They are for most purposes the same, but have different
+ representations. For example, MacOS X converts all filenames to use
+ Combining Diaeresis, while most other programs (including Erlang) try to
+ hide that by doing the opposite when, for example, listing directories.
+ However it is done, it is usually important to normalize such
+ characters to avoid confusion.
+
+ The list of examples can be made long. One needs a kind of knowledge that
+ was not needed when programs only considered one or two languages. The
+ complexity of human languages and scripts has certainly made this a
+ challenge when constructing a universal standard. Supporting Unicode
+ properly in your program will require effort.
+
+
+
+ What Unicode Is
+ Unicode is a standard defining code points (numbers) for all known,
+ living or dead, scripts. In principle, every symbol used in any
+ language has a Unicode code point. Unicode code points are defined and
+ published by the Unicode Consortium, which is a non-profit
+ organization.
+
+ Support for Unicode is increasing throughout the world of computing, as
+ the benefits of one common character set are overwhelming when programs
+ are used in a global environment. Along with the base of the standard,
+ the code points for all the scripts, some encoding standards are
+ available.
+
+ It is vital to understand the difference between encodings and Unicode
+ characters. Unicode characters are code points according to the Unicode
+ standard, while the encodings are ways to represent such code points. An
+ encoding is only a standard for representation. UTF-8 can, for example,
+ be used to represent a very limited part of the Unicode character set
+ (for example ISO-Latin-1) or the full Unicode range. It is only an
+ encoding format.
+
+ As long as all character sets were limited to 256 characters, each
+ character could be stored in one single byte, so there was more or less
+ only one practical encoding for the characters. Encoding each character
+ in one byte was so common that the encoding was not even named. With the
+ Unicode system there are many more than 256 characters, so a common way
+ is needed to represent these. The common ways of representing the code
+ points are the encodings. This means a whole new concept to the
+ programmer, the concept of character representation, which was a
+ non-issue earlier.
+
+ Different operating systems and tools support different encodings. For
+ example, Linux and MacOS X have chosen the UTF-8 encoding, which is
+ backward compatible with 7-bit ASCII and therefore affects programs
+ written in plain English the least. Windows supports a limited version
+ of UTF-16, namely all the code planes where the characters can be
+ stored in one single 16-bit entity, which includes most living
+ languages.
+
+ The following are the most widely spread encodings:
+
+
+ Bytewise representation
+ -
+
This is not a proper Unicode representation, but the representation
+ used for characters before the Unicode standard. It can still be used
+ to represent character code points in the Unicode standard with
+ numbers < 256, which exactly corresponds to the ISO Latin-1
+ character set. In Erlang, this is commonly denoted latin1
+ encoding, which is slightly misleading as ISO Latin-1 is a
+ character code range, not an encoding.
+
+ UTF-8
+ -
+
Each character is stored in one to four bytes depending on code
+ point. The encoding is backward compatible with bytewise
+ representation of 7-bit ASCII, as all 7-bit characters are stored in
+ one single byte in UTF-8. The characters beyond code point 127 are
+ stored in more bytes, letting the most significant bit in the first
+ byte indicate a multi-byte character. For details on the
+ encoding, see the publicly available RFC 3629.
+ Notice that UTF-8 is not compatible with bytewise
+ representation for code points from 128 through 255, so an ISO
+ Latin-1 bytewise representation is generally incompatible with
+ UTF-8.
+
+ UTF-16
+ -
+
This encoding has many similarities to UTF-8, but the basic
+ unit is a 16-bit number. This means that all characters occupy
+ at least two bytes, and some high numbers four bytes. Some
+ programs, libraries, and operating systems claiming to use
+ UTF-16 only allow for characters that can be stored in one
+ 16-bit entity, which is usually sufficient to handle living
+ languages. As the basic unit is more than one byte, byte-order
+ issues occur, which is why UTF-16 exists in both a big-endian
+ and a little-endian variant.
+ In Erlang, the full UTF-16 range is supported when applicable, like
+ in the unicode
+ module and in the bit syntax.
+
+ UTF-32
+ -
+
The most straightforward representation. Each character is stored in
+ one single 32-bit number. There is no need for escapes or any
+ variable number of entities for one character. All Unicode code
+ points can be stored in one single 32-bit entity. As with UTF-16,
+ there are byte-order issues. UTF-32 can be both big-endian and
+ little-endian.
+
+ UCS-4
+ -
+
Basically the same as UTF-32, but without some Unicode semantics.
+ It is defined by ISO/IEC 10646 and has little use as a separate encoding standard.
+ For all normal (and possibly abnormal) use, UTF-32 and UCS-4 are
+ interchangeable.
+
+
+
+ Certain number ranges are unused in the Unicode standard and certain
+ ranges are even deemed invalid. The most notable invalid range is
+ 16#D800-16#DFFF, as the UTF-16 encoding does not allow for encoding of
+ these numbers. This is possibly because the UTF-16 encoding standard,
+ from the beginning, was expected to be able to hold all Unicode
+ characters in one 16-bit entity, but was then extended, leaving a hole
+ in the Unicode range to handle backward compatibility.
+
+ Code point 16#FEFF is used for Byte Order Marks (BOMs) and use of that
+ character is not encouraged in other contexts. It is valid though, as
+ the character "ZWNBSP" (Zero Width Non Breaking Space). BOMs are used to
+ identify encodings and byte order for programs where such parameters are
+ not known in advance. BOMs are used less often than one might expect, but can
+ become more widespread as they provide the means for programs to make
+ educated guesses about the Unicode format of a certain file.
+
+
+
+ Areas of Unicode Support
+ To support Unicode in Erlang, problems in various areas have been
+ addressed. This section describes each area briefly; more thorough
+ descriptions follow later in this User's Guide.
+
+
+ Representation
+ -
+
To handle Unicode characters in Erlang, a common representation
+ in both lists and binaries is needed. EEP 10 and the subsequent
+ initial implementation in Erlang/OTP R13A settled a standard
+ representation of Unicode characters in Erlang.
+
+ Manipulation
+ -
+
The Unicode characters need to be processed by the Erlang
+ program, which is why library functions must be able to handle
+ them. In some cases functionality has been added to already
+ existing interfaces (as the string module now can
+ handle lists with any code points). In some cases new
+ functionality or options have been added (as in the io module, the file
+ handling, the unicode module, and
+ the bit syntax). Today most modules in Kernel and
+ STDLIB, as well as the VM, are Unicode-aware.
+
+ File I/O
+ -
+
I/O is by far the most problematic area for Unicode. A file is an
+ entity where bytes are stored, and the lore of programming has been
+ to treat characters and bytes as interchangeable. With Unicode
+ characters, you must decide on an encoding when you want to store
+ the data in a file. In Erlang, you can open a text file with an
+ encoding option, so that you can read characters from it rather than
+ bytes, but you can also open a file for bytewise I/O.
+ The Erlang I/O-system has been designed (or at least used) in a way
+ where you expect any I/O server to handle any string data.
+ That is, however, no longer the case when working with Unicode
+ characters. The Erlang programmer must now know the
+ capabilities of the device where the data ends up. Also, ports in
+ Erlang are byte-oriented, so an arbitrary string of (Unicode)
+ characters cannot be sent to a port without first converting it to an
+ encoding of choice.
+
+ Terminal I/O
+ -
+
Terminal I/O is slightly easier than file I/O. The output is meant
+ for human reading and is usually Erlang syntax (for example, in the
+ shell). A syntactic representation exists for any Unicode
+ character without displaying the glyph (instead written as
+ \x{HHH}). Unicode data can therefore usually be
+ displayed even if the terminal as such does not support the whole
+ Unicode range.
+
+ Filenames
+ -
+
Filenames can be stored as Unicode strings in different ways
+ depending on the underlying operating system and file system. This
+ can be handled fairly easily by a program. The problems arise when the
+ file system is inconsistent in its encodings. For example, Linux
+ allows files to be named with any sequence of bytes, leaving it to each
+ program to interpret those bytes. On systems where these
+ "transparent" filenames are used, Erlang must be informed about the
+ filename encoding by a startup flag. The default is bytewise
+ interpretation, which is usually wrong, but allows for interpretation
+ of all filenames.
+ The concept of "raw filenames" can be used to handle wrongly encoded
+ filenames if one enables Unicode filename translation (+fnu)
+ on platforms where this is not the default.
+
+ Source code encoding
+ -
+
The Erlang source code has support for the UTF-8 encoding
+ and bytewise encoding. The default in Erlang/OTP R16B was bytewise
+ (latin1) encoding. It was changed to UTF-8 in Erlang/OTP 17.0.
+ You can control the encoding by a comment like the following in the
+ beginning of the file:
+
+%% -*- coding: utf-8 -*-
+ This of course requires your editor to support UTF-8 as well. The
+ same comment is also interpreted by functions like
+ file:consult/1,
+ the release handler, and so on, so that you can have all text files
+ in your source directories in UTF-8 encoding.
+
+ The language
+ -
+
Having the source code in UTF-8 also allows you to write string
+ literals containing Unicode characters with code points > 255,
+ although atoms, module names, and function names are restricted to
+ the ISO Latin-1 range. Binary literals, where you use type
+ /utf8, can also be expressed using Unicode characters > 255.
+ Having module names using characters other than 7-bit ASCII can cause
+ trouble on operating systems with inconsistent file naming schemes,
+ and can hurt portability, so it is not recommended.
+ EEP 40 suggests that the language is also to allow for Unicode
+ characters > 255 in variable names. Whether to implement that EEP
+ is yet to be decided.
+
+
+
+
+
+ Standard Unicode Representation
+ In Erlang, strings are lists of integers. A string was until
+ Erlang/OTP R13 defined to be encoded in the ISO Latin-1 (ISO 8859-1)
+ character set, which is, code point by code point, a subrange of the
+ Unicode character set.
+
+ The standard list encoding for strings was therefore easily extended to
+ handle the whole Unicode range. A Unicode string in Erlang is a list
+ containing integers, where each integer is a valid Unicode code point and
+ represents one character in the Unicode character set.
+
+ Erlang strings in ISO Latin-1 are a subset of Unicode strings.
+
+ Only if a string contains code points < 256 can it be directly
+ converted to a binary by using, for example,
+ erlang:iolist_to_binary/1
+ or be sent directly to a port. If the string contains Unicode
+ characters > 255, an encoding must be decided upon and the string is to
+ be converted to a binary in the preferred encoding using
+ unicode:characters_to_binary/1,2,3.
+ Strings are not generally lists of bytes, as they were before
+ Erlang/OTP R13; they are lists of characters. Characters are not
+ generally bytes; they are Unicode code points.
+
+ Binaries are more troublesome. For performance reasons, programs often
+ store textual data in binaries instead of lists, mainly because they are
+ more compact (one byte per character instead of two words per character,
+ as is the case with lists). Using
+ erlang:list_to_binary/1,
+ an ISO Latin-1 Erlang string can be converted into a binary, effectively
+ using bytewise encoding: one byte per character. This was convenient for
+ those limited Erlang strings, but cannot be done for arbitrary Unicode
+ lists.
+
+ As the UTF-8 encoding is widely spread and provides some backward
+ compatibility in the 7-bit ASCII range, it is selected as the standard
+ encoding for Unicode characters in binaries for Erlang.
+
+ The standard binary encoding is used whenever a library function in
+ Erlang is to handle Unicode data in binaries, but is of course not
+ enforced when communicating externally. Functions and bit syntax exist to
+ encode and decode UTF-8, UTF-16, and UTF-32 in binaries. However,
+ library functions dealing with binaries and Unicode in general only deal
+ with the default encoding.
+
+ Character data can be combined from many sources, sometimes available in
+ a mix of strings and binaries. Erlang has long had the concept of
+ iodata or iolists, where binaries and lists can be combined
+ to represent a sequence of bytes. In the same way, the Unicode-aware
+ modules often allow for combinations of binaries and lists, where the
+ binaries have characters encoded in UTF-8 and the lists contain such
+ binaries or numbers representing Unicode code points:
+
+
unicode_binary() = binary() with characters encoded in UTF-8 coding standard
chardata() = charlist() | unicode_binary()
charlist() = maybe_improper_list(char() | unicode_binary() | charlist(),
- unicode_binary() | nil())
- The module unicode in STDLIB even
- supports similar mixes with binaries containing other encodings than
- UTF-8, but that is a special case to allow for conversions to and
- from external data:
-
-external_unicode_binary() = binary() with characters coded in
- a user specified Unicode encoding other than UTF-8 (UTF-16 or UTF-32)
+ unicode_binary() | nil())
+
+ The module unicode
+ even supports similar mixes with binaries containing other encodings than
+ UTF-8, but that is a special case to allow for conversions to and from
+ external data:
+
+
+external_unicode_binary() = binary() with characters coded in a user-specified
+ Unicode encoding other than UTF-8 (UTF-16 or UTF-32)
external_chardata() = external_charlist() | external_unicode_binary()
-external_charlist() = maybe_improper_list(char() |
- external_unicode_binary() |
- external_charlist(),
- external_unicode_binary() | nil())
-
-
- Basic Language Support
- As of Erlang/OTP R16 Erlang
- source files can be written in either UTF-8 or bytewise encoding
- (a.k.a. latin1 encoding). The details on how to state the encoding
- of an Erlang source file can be found in
- epp(3) . Strings and comments
- can be written using Unicode, but functions still have to be named
- using characters from the ISO-latin-1 character set and atoms are
- restricted to the same ISO-latin-1 range. These restrictions in the
- language are of course independent of the encoding of the source
- file.
+external_charlist() = maybe_improper_list(char() | external_unicode_binary() |
+ external_charlist(), external_unicode_binary() | nil())
+ The bit-syntax contains types for coping with binary data in the
- three main encodings. The types are named
+ Basic Language Support
+ As of Erlang/OTP R16, Erlang source
+ files can be written in UTF-8 or bytewise (latin1) encoding. For
+ information about how to state the encoding of an Erlang source file, see
+ the epp(3) module.
+ Strings and comments can be written using Unicode, but functions must
+ still be named using characters from the ISO Latin-1 character set, and
+ atoms are restricted to the same ISO Latin-1 range. These restrictions in
+ the language are of course independent of the encoding of the source
+ file.
+
+
+ Bit Syntax
+ The bit syntax contains types for handling binary data in the
+ three main encodings. The types are named utf8, utf16,
+ and utf32. The utf16 and utf32 types can be in a
+ big-endian or a little-endian variant:
+
+
<<Ch/utf8,_/binary>> = Bin1,
<<Ch/utf16-little,_/binary>> = Bin2,
Bin3 = <<$H/utf32-little, $e/utf32-little, $l/utf32-little, $l/utf32-little,
$o/utf32-little>>,
- For convenience, literal strings can be encoded with a Unicode
- encoding in binaries using the following (or similar) syntax:
-
+
+ For convenience, literal strings can be encoded with a Unicode
+ encoding in binaries using the following (or similar) syntax:
+
+
Bin4 = <<"Hello"/utf16>>,
-
-
- String and Character Literals
- For source code, there is an extension to the \ OOO
- (backslash followed by three octal numbers) and \x HH
- (backslash followed by x , followed by two hexadecimal
- characters) syntax, namely \x{ H ...} (a backslash
- followed by an x , followed by left curly bracket, any
- number of hexadecimal digits and a terminating right curly
- bracket). This allows for entering characters of any code point
- literally in a string even when the encoding of the source file is
- bytewise (latin1 ).
- In the shell, if using a Unicode input device, or in source
- code stored in UTF-8, $ can be followed directly by a
- Unicode character producing an integer. In the following example
- the code point of a Cyrillic с is output:
-
+
+
+
+ String and Character Literals
+ For source code, there is an extension to the syntax \OOO
+ (backslash followed by three octal digits) and \xHH (backslash
+ followed by x, followed by two hexadecimal digits), namely
+ \x{H ...} (backslash followed by x, followed by
+ left curly bracket, any number of hexadecimal digits, and a terminating
+ right curly bracket). This allows for entering characters of any code
+ point literally in a string even when the encoding of the source file
+ is bytewise (latin1).
+
+ In the shell, if using a Unicode input device, or in source code
+ stored in UTF-8, $ can be followed directly by a Unicode
+ character producing an integer. In the following example, the code
+ point of a Cyrillic с is output:
+
+
7> $с.
1089
-
-
- Heuristic String Detection
- In certain output functions and in the output of return values
- in the shell, Erlang tries to heuristically detect string data in
- lists and binaries. Typically you will see heuristic detection in
- a situation like this:
-
+
+
+
+ Heuristic String Detection
+ In certain output functions and in the output of return values in
+ the shell, Erlang tries to detect string data in lists and binaries
+ heuristically. Typically you will see heuristic detection in a
+ situation like this:
+
+
1> [97,98,99].
"abc"
2> <<97,98,99>>.
<<"abc">>
3> <<195,165,195,164,195,182>>.
<<"åäö"/utf8>>
- Here the shell will detect lists containing printable
- characters or binaries containing printable characters either in
- bytewise or UTF-8 encoding. The question here is: what is a
- printable character? One view would be that anything the Unicode
- standard thinks is printable, will also be printable according to
- the heuristic detection. The result would be that almost any list
- of integers will be deemed a string, resulting in all sorts of
- characters being printed, maybe even characters your terminal does
- not have in its font set (resulting in some generic output you
- probably will not appreciate). Another way is to keep it backwards
- compatible so that only the ISO-Latin-1 character set is used to
- detect a string. A third way would be to let the user decide
- exactly what Unicode ranges are to be viewed as characters. Since
- Erlang/OTP R16B you can select either the whole Unicode range or the
- ISO-Latin-1 range by supplying the startup flag +pc
- Range, where Range is either latin1 or
- unicode . For backwards compatibility, the default is
- latin1 . This only controls how heuristic string detection
- is done. In the future, more ranges are expected to be added, so
- that one can tailor the heuristics to the language and region
- relevant to the user.
- Lets look at an example with the two different startup options:
-
+
+ Here the shell detects lists containing printable characters or
+ binaries containing printable characters in bytewise or UTF-8 encoding.
+ But what is a printable character? One view is that anything the Unicode
+ standard thinks is printable, is also printable according to the
+ heuristic detection. The result is then that almost any list of
+ integers is deemed a string, and all sorts of characters are printed,
+ maybe also characters that your terminal lacks in its font set
+ (resulting in some unappreciated generic output).
+ Another way is to keep it backward compatible so that only the ISO
+ Latin-1 character set is used to detect a string. A third way is to let
+ the user decide exactly which Unicode ranges are to be viewed as
+ characters.
+
+ As of Erlang/OTP R16B, you can select the ISO Latin-1 range or the
+ whole Unicode range by supplying the startup flag +pc latin1 or
+ +pc unicode, respectively. For backward compatibility,
+ latin1 is the default. This only controls how heuristic string
+ detection is done. More ranges are expected to be added in the future,
+ enabling tailoring of the heuristics to the language and region
+ relevant to the user.
+
+ The following examples show the two startup options:
+
+
$ erl +pc latin1
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
@@ -467,9 +535,9 @@ Eshell V5.10.1 (abort with ^G)
4> <<208,174,208,189,208,184,208,186,208,190,208,180>>.
<<208,174,208,189,208,184,208,186,208,190,208,180>>
5> <<229/utf8,228/utf8,246/utf8>>.
-<<"åäö"/utf8>>
-
-
+<<"åäö"/utf8>>
+
+
$ erl +pc unicode
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
@@ -483,78 +551,88 @@ Eshell V5.10.1 (abort with ^G)
4> <<208,174,208,189,208,184,208,186,208,190,208,180>>.
<<"Юникод"/utf8>>
5> <<229/utf8,228/utf8,246/utf8>>.
-<<"åäö"/utf8>>
-
- In the examples, we can see that the default Erlang shell will
- only interpret characters from the ISO-Latin1 range as printable
- and will only detect lists or binaries with those "printable"
- characters as containing string data. The valid UTF-8 binary
- containing "Юникод", will not be printed as a string. When, on the
- other hand, started with all Unicode characters printable (+pc
- unicode ), the shell will output anything containing printable
- Unicode data (in binaries either UTF-8 or bytewise encoded) as
- string data.
-
- These heuristics are also used by
- io (_lib ):format/2 and friends when the
- t modifier is used in conjunction with ~p or
- ~P :
-
+<<"åäö"/utf8>>
+
+ In the examples, you can see that the default Erlang shell interprets
+ only characters from the ISO Latin-1 range as printable and only detects
+ lists or binaries with those "printable" characters as containing
+ string data. The valid UTF-8 binary containing the Russian word
+ "Юникод" is not printed as a string. When started with all Unicode
+ characters printable (+pc unicode), the shell outputs anything
+ containing printable Unicode data (in binaries, either UTF-8 or
+ bytewise encoded) as string data.
+
+ These heuristics are also used by
+ io:format/2,
+ io_lib:format/2,
+ and friends when the t modifier is used with ~p or
+ ~P:
+
+
$ erl +pc latin1
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).
{<<"åäö">>,<<"åäö"/utf8>>,<<208,174,208,189,208,184,208,186,208,190,208,180>>}
-ok
-
-
+ok
+
+
$ erl +pc unicode
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).
{<<"åäö">>,<<"åäö"/utf8>>,<<"Юникод"/utf8>>}
-ok
-
- Please observe that this only affects heuristic interpretation
- of lists and binaries on output. For example the ~ts format
- sequence does always output a valid lists of characters,
- regardless of the +pc setting, as the programmer has
- explicitly requested string output.
+ok
+
+ Notice that this only affects heuristic interpretation of
+ lists and binaries on output. For example, the ~ts format
+ sequence always outputs a valid list of characters, regardless of the
+ +pc setting, as the programmer has explicitly requested string
+ output.
+
The interactive Erlang shell, when started towards a terminal or
- started using the
On Windows, proper operation requires that a suitable font
- is installed and selected for the Erlang application to use. If no
- suitable font is available on your system, try installing the DejaVu
- fonts (
On Unix-like operating systems, the terminal should be able
- to handle UTF-8 on input and output (modern versions of XTerm, KDE
- konsole and the Gnome terminal do for example) and your locale
- settings have to be proper. As an example, my
+ +++ -The Interactive Shell +The interactive Erlang shell, when started to a terminal or started + using command
+ +werl on Windows, can support Unicode input and + output.On Windows, proper operation requires that a suitable font is + installed and selected for the Erlang application to use. If no suitable + font is available on your system, try installing the +
+ +DejaVu fonts, which are freely + available, and then select that font in the Erlang shell application. On Unix-like operating systems, the terminal must be able to handle + UTF-8 on input and output (this is done by, for example, modern versions + of XTerm, KDE Konsole, and the Gnome terminal) + and your locale settings must be correct. As + an example, a
+ +LANG environment variable can be set as follows:$ echo $LANG en_US.UTF-8-Actually, most systems handle the
-LC_CTYPE variable before -LANG , so if that is set, it has to be set to -UTF-8 :+ +Most systems handle variable
+ +LC_CTYPE beforeLANG , so if + that is set, it must be set toUTF-8 :$ echo $LC_CTYPE en_US.UTF-8-The
-LANG orLC_CTYPE setting should be consistent - with what the terminal is capable of, there is no portable way for - Erlang to ask the actual terminal about its UTF-8 capacity, we have - to rely on the language and character type settings.To investigate what Erlang thinks about the terminal, the -
-io:getopts() call can be used when the shell is started:+ +The
+ +LANG or LC_CTYPE setting must be consistent with + what the terminal is capable of. There is no portable way for Erlang to + ask the terminal about its UTF-8 capacity, so we have to rely on the + language and character type settings.To investigate what Erlang thinks about the terminal, the call +
+ ++ can be used when the shell is started: io:getopts() $ LC_CTYPE=en_US.ISO-8859-1 erl Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -571,27 +649,31 @@ Eshell V5.10.1 (abort with ^G) {encoding,unicode} 2>-When (finally?) everything is in order with the locale settings, - fonts and the terminal emulator, you probably also have discovered a - way to input characters in the script you desire. For testing, the - simplest way is to add some keyboard mappings for other languages, - usually done with some applet in your desktop environment. In my KDE - environment, I start the KDE Control Center (Personal Settings), - select "Regional and Accessibility" and then "Keyboard Layout". On - Windows XP, I start Control Panel->Regional and Language - Options, select the Language tab and click the Details... button in - the square named "Text services and input Languages". Your - environment probably provides similar means of changing the keyboard - layout. Make sure you have a way to easily switch back and forth - between keyboards if you are not used to this, entering commands - using a Cyrillic character set is, as an example, not easily done in - the Erlang shell.
- -Now you are set up for some Unicode input and output. The - simplest thing to do is of course to enter a string in the - shell:
- -+When (finally?) everything is in order with the locale settings, fonts, + and the terminal emulator, you have probably found a way to input + characters in the script you desire. For testing, the simplest way is to + add some keyboard mappings for other languages, usually done with some + applet in your desktop environment.
+ +In a KDE environment, select KDE Control Center (Personal + Settings) > Regional and Accessibility > Keyboard + Layout.
+ +On Windows XP, select Control Panel > Regional and Language + Options, select tab Language, and click button + Details... in the square named Text Services and Input + Languages.
+ +Your environment + probably provides similar means of changing the keyboard layout. Ensure + that you have a way to switch back and forth between keyboards easily if + you are not used to this. For example, entering commands using a Cyrillic + character set is not easily done in the Erlang shell.
+ +Now you are set up for some Unicode input and output. The simplest thing + to do is to enter a string in the shell:
+ +$ erl Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -603,12 +685,13 @@ Eshell V5.10.1 (abort with ^G) 3> io:format("~ts~n", [v(2)]). Юникод ok -4>-While strings can be input as Unicode characters, the language - elements are still limited to the ISO-latin-1 character set. Only - character constants and strings are allowed to be beyond that - range:
-+4>+ +While strings can be input as Unicode characters, the language elements + are still limited to the ISO Latin-1 character set. Only character + constants and strings are allowed to be beyond that range:
+ +$ erl Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -618,371 +701,398 @@ Eshell V5.10.1 (abort with ^G) 2> Юникод. * 1: illegal character 2>-- -Unicode File Names -- Most modern operating systems support Unicode file names in some - way or another. There are several different ways to do this and - Erlang by default treats the different approaches differently:
-- - -Mandatory Unicode file naming -- -
-Windows and, for most common uses, MacOS X enforces Unicode - support for file names. All files created in the file system have - names that can consistently be interpreted. In MacOS X, all file - names are retrieved in UTF-8 encoding, while Windows has - selected an approach where each system call handling file names - has a special Unicode aware variant, giving much the same - effect. There are no file names on these systems that are not - Unicode file names, why the default behavior of the Erlang VM is - to work in "Unicode file name translation mode", - meaning that a file name can be given as a Unicode list and that - will be automatically translated to the proper name encoding for - the underlying operating and file system.
-Doing i.e. a
-file:list_dir/1 on one of these systems - may return Unicode lists with code points beyond 255, depending - on the content of the actual file system.As the feature is fairly new, you may still stumble upon non - core applications that cannot handle being provided with file - names containing characters with code points larger than 255, but - the core Erlang system should have no problems with Unicode file - names.
-Transparent file naming -- -
-Most Unix operating systems have adopted a simpler approach, - namely that Unicode file naming is not enforced, but by - convention. Those systems usually use UTF-8 encoding for Unicode - file names, but do not enforce it. On such a system, a file name - containing characters having code points between 128 and 255 may - be named either as plain ISO-latin-1 or using UTF-8 encoding. As - no consistency is enforced, the Erlang VM can do no consistent - translation of all file names.
- -By default on such systems, Erlang starts in
- -utf8 file - name mode if the terminal supports UTF-8, otherwise in -latin1 mode.In the
- -latin1 mode, file names are bytewise endcoded. - This allows for list representation of all file names in - the system, but, for example, a file named "Östersund.txt", will - appear infile:list_dir/1 as either "Östersund.txt" (if - the file name was encoded in bytewise ISO-Latin-1 by the program - creating the file, or more probably as -[195,150,115,116,101,114,115,117,110,100] , which is a - list containing UTF-8 bytes - not what you would want... If you - on the other hand use Unicode file name translation on such a - system, non-UTF-8 file names will simply be ignored by functions - likefile:list_dir/1 . They can be retrieved with -file:list_dir_all/1 , but wrongly encoded file names will - appear as "raw file names".The Unicode file naming support was introduced with Erlang/OTP - R14B01. A VM operating in Unicode file name translation mode can - work with files having names in any language or character set (as - long as it is supported by the underlying OS and file system). The - Unicode character list is used to denote file or directory names and - if the file system content is listed, you will also get - Unicode lists as return value. The support lies in the Kernel and - STDLIB modules, why most applications (that does not explicitly - require the file names to be in the ISO-latin-1 range) will benefit - from the Unicode support without change.
- -On operating systems with mandatory Unicode file names, this - means that you more easily conform to the file names of other (non - Erlang) applications, and you can also process file names that, at - least on Windows, were completely inaccessible (due to having names - that could not be represented in ISO-latin-1). Also you will avoid - creating incomprehensible file names on MacOS X as the vfs layer of - the OS will accept all your file names as UTF-8 and will not rewrite - them.
- -For most systems, turning on Unicode file name translation is no - problem even if it uses transparent file naming. Very few systems - have mixed file name encodings. A consistent UTF-8 named system will - work perfectly in Unicode file name mode. It was still however - considered experimental in Erlang/OTP R14B01 and is still not the default on - such systems. Unicode file name translation is turned on with the -
- -+fnu switch to the On Linux, a VM started without explicitly - stating the file name translation mode will default tolatin1 - as the native file name encoding. On Windows and MacOS X, the - default behavior is that of Unicode file name translation, why the -file:native_name_encoding/0 by default returnsutf8 on - those systems (the fact that Windows actually does not use UTF-8 on - the file system level can safely be ignored by the Erlang - programmer). The default behavior can, as stated before, be - changed using the+fnu or+fnl options to the VM, see - theprogram. If the - VM is started in Unicode file name translation mode, - erl file:native_name_encoding/0 will return the atom -utf8 . The+fnu switch can be followed byw , -i ore , to control how wrongly encoded file names are - to be reported.w means that a warning is sent to the -error_logger whenever a wrongly encoded file name is - "skipped" in directory listings,i means that those wrongly - encoded file names are silently ignored ande means that the - API function will return an error whenever a wrongly encoded file - (or directory) name is encountered.w is the default. Note - thatfile:read_link/1 will always return an error if the link - points to an invalid file name.In Unicode file name mode, file names given to the BIF -
- -open_port/2 with the option{spawn_executable,...} are - also interpreted as Unicode. So is the parameter list given in the -args option available when usingspawn_executable . The - UTF-8 translation of arguments can be avoided using binaries, see - the discussion about raw file names below.It is worth noting that the file
- -encoding options given - when opening a file has nothing to do with the file name - encoding convention. You can very well open files containing data - encoded in UTF-8 but having file names in bytewise (latin1 ) encoding - or vice versa.- - Erlang drivers and NIF shared objects still can not be - named with names containing code points beyond 127. This is a known - limitation to be removed in a future release. Erlang modules however - can, but it is definitely not a good idea and is still considered - experimental.
- -Notes About Raw File Names -- Raw file names were introduced together with Unicode file name - support in erts-5.8.2 (Erlang/OTP R14B01). The reason "raw file - names" was introduced in the system was to be able to - consistently represent file names given in different encodings on - the same system. Having the VM automatically translate a file name - that is not in UTF-8 to a list of Unicode characters might seem - practical, but this would open up for both duplicate file names and - other inconsistent behavior. Consider a directory containing a file - named "björn" in ISO-latin-1, while the Erlang VM is - operating in Unicode file name mode (and therefore expecting UTF-8 - file naming). The ISO-latin-1 name is not valid UTF-8 and one could - be tempted to think that automatic conversion in for example -
- -file:list_dir/1 is a good idea. But what would happen if we - later tried to open the file and have the name as a Unicode list - (magically converted from the ISO-latin-1 file name)? The VM will - convert the file name given to UTF-8, as this is the encoding - expected. Effectively this means trying to open the file named - <<"björn"/utf8>>. This file does not exist, - and even if it existed it would not be the same file as the one that - was listed. We could even create two files named "björn", - one named in the UTF-8 encoding and one not. If -file:list_dir/1 would automatically convert the ISO-latin-1 - file name to a list, we would get two identical file names as the - result. To avoid this, we need to differentiate between file names - being properly encoded according to the Unicode file naming - convention (i.e. UTF-8) and file names being invalid under the - encoding. By the commonfile:list_dir/1 function, the wrongly - encoded file names are simply ignored in Unicode file name - translation mode, but by thefile:list_dir_all/1 function, - the file names with invalid encoding are returned as "raw" - file names, i.e. as binaries.The Erlang
- -file module accepts raw file names as - input.open_port({spawn_executable, ...} ...) also accepts - them. As mentioned earlier, the arguments given in the option list - toopen_port({spawn_executable, ...} ...) undergo the same - conversion as the file names, meaning that the executable will be - provided with arguments in UTF-8 as well. This translation is - avoided consistently with how the file names are treated, by giving - the argument as a binary.To force Unicode file name translation mode on systems where this - is not the default was considered experimental in Erlang/OTP R14B01 due to - the fact that the initial implementation did not ignore wrongly - encoded file names, so that raw file names could spread unexpectedly - throughout the system. Beginning with Erlang/OTP R16B, the wrongly encoded file - names are only retrieved by special functions - (e.g.
- -file:list_dir_all/1 ), so the impact on existing code is - much lower, why it is now supported. Unicode file name translation - is expected to be default in future releases.Even if you are operating without Unicode file naming translation - automatically done by the VM, you can access and create files with - names in UTF-8 encoding by using raw file names encoded as - UTF-8. Enforcing the UTF-8 encoding regardless of the mode the - Erlang VM is started in might, in some circumstances be a good idea, - as the convention of using UTF-8 file names is spreading.
-- -Notes About MacOS X -MacOS X's vfs layer enforces UTF-8 file names in a quite - aggressive way. Older versions did this by simply refusing to create - non UTF-8 conforming file names, while newer versions replace - offending bytes with the sequence "%HH", where HH is the - original character in hexadecimal notation. As Unicode translation - is enabled by default on MacOS X, the only way to come up against - this is to either start the VM with the
- -+fnl flag or to use a - raw file name in bytewise (latin1 ) encoding. If using a raw - filename, with a bytewise encoding containing characters between 127 - and 255, to create a file, the file can not be opened using the same - name as the one used to create it. There is no remedy for this - behaviour, other than keeping the file names in the right - encoding.MacOS X also reorganizes the names of files so that the - representation of accents etc is using the "combining characters", - i.e. the character
-ö is represented as the code points - [111,776], where 111 is the charactero and 776 is the - special accent character "combining diaeresis". This way of - normalizing Unicode is otherwise very seldom used and Erlang - normalizes those file names in the opposite way upon retrieval, so - that file names using combining accents are not passed up to the - Erlang application. In Erlang the file name "björn" is - retrieved as [98,106,246,114,110], not as [98,106,117,776,114,110], - even though the file system might think differently. The - normalization into combining accents are redone when actually - accessing files, so this can usually be ignored by the Erlang - programmer.- -Unicode in Environment and Parameters -- Environment variables and their interpretation is handled much in - the same way as file names. If Unicode file names are enabled, - environment variables as well as parameters to the Erlang VM are - expected to be in Unicode.
-If Unicode file names are enabled, the calls to -
-, - os:getenv/0 , - os:getenv/1 and - os:putenv/2 - will handle Unicode strings. On Unix-like platforms, the built-in - functions will translate environment variables in UTF-8 to/from - Unicode strings, possibly with code points > 255. On Windows the - Unicode versions of the environment system API will be used, also - allowing for code points > 255. os:unsetenv/1 On Unix-like operating systems, parameters are expected to be - UTF-8 without translation if Unicode file names are enabled.
-- -Unicode-aware Modules -Most of the modules in Erlang/OTP are of course Unicode-unaware - in the sense that they have no notion of Unicode and really should - not have. Typically they handle non-textual or byte-oriented data - (like
-gen_tcp etc).Modules that actually handle textual data (like
-io_lib , -string etc) are sometimes subject to conversion or extension - to be able to handle Unicode characters.Fortunately, most textual data has been stored in lists and range - checking has been sparse, why modules like
-string works well - for Unicode lists with little need for conversion or extension.Some modules are however changed to be explicitly - Unicode-aware. These modules include:
-- -- unicode - -
-The module
-- is obviously Unicode-aware. It contains functions for conversion - between different Unicode formats as well as some utilities for - identifying byte order marks. Few programs handling Unicode data - will survive without this module. unicode - io - -
-The
-module has been - extended along with the actual I/O-protocol to handle Unicode - data. This means that several functions require binaries to be - in UTF-8 and there are modifiers to formatting control sequences - to allow for outputting of Unicode strings. io - file ,group ,user - -
-I/O-servers throughout the system are able to handle - Unicode data and has options for converting data upon actual - output or input to/from the device. As shown earlier, the -
-has support for - Unicode terminals and the shell module allows for - translation to and from various Unicode formats on disk. file The actual reading and writing of files with Unicode data is - however not best done with the
-file module as its - interface is byte oriented. A file opened with a Unicode - encoding (like UTF-8), is then best read or written using the -module. io - re - -
-The
-module allows - for matching Unicode strings as a special option. As the library - is actually centered on matching in binaries, the Unicode - support is UTF-8-centered. re - wx - -
-The
-graphical library - has extensive support for Unicode text wx The module
-works perfectly for - Unicode strings as well as for ISO-latin-1 strings with the - exception of the language-dependent string and - to_upper - functions, which are only correct for the ISO-latin-1 character - set. Actually they can never function correctly for Unicode - characters in their current form, as there are language and locale - issues as well as multi-character mappings to consider when - converting text between cases. Converting case in an international - environment is a big subject not yet addressed in OTP. to_lower - + +Unicode Data in Files -The fact that Erlang as such can handle Unicode data in many forms - does not automatically mean that the content of any file can be - Unicode text. The external entities such as ports or I/O-servers are - not generally Unicode capable.
-Ports are always byte oriented, so before sending data that you - are not sure is bytewise encoded to a port, make sure to encode it - in a proper Unicode encoding. Sometimes this will mean that only - part of the data shall be encoded as e.g. UTF-8, some parts may be - binary data (like a length indicator) or something else that shall - not undergo character encoding, so no automatic translation is - present.
-I/O-servers behave a little differently. The I/O-servers connected - to terminals (or stdout) can usually cope with Unicode data - regardless of the
-encoding option. This is convenient when - one expects a modern environment but do not want to crash when - writing to a archaic terminal or pipe. Files on the other hand are - more picky. A file can have an encoding option which makes it - generally usable by the io-module (e.g.{encoding,utf8} ), but - is by default opened as a byte oriented file. Themodule is byte oriented, why only - ISO-Latin-1 characters can be written using that module. The - file module is the one to use if - Unicode data is to be output to a file with other io encoding - thanlatin1 (a.k.a. bytewise encoding). It is slightly - confusing that a file opened with - e.g.file:open(Name,[read,{encoding,utf8}]) , cannot be - properly read usingfile:read(File,N) but you have to use the -io module to retrieve the Unicode data from it. The reason is - thatfile:read andfile:write (and friends) are purely - byte oriented, and should so be, as that is the way to access - files other than text files - byte by byte. Just as with ports, you - can of course write encoded data into a file by "manually" converting - the data to the encoding of choice (using themodule or the bit syntax) - and then output it on a bytewise encoded ( unicode latin1 ) file.The rule of thumb is that the
- -module should be used for files - opened for bytewise access ( file {encoding,latin1} ) and the -module should be used when - accessing files with any other encoding - (e.g. io {encoding,uf8} ).Functions reading Erlang syntax from files generally recognize - the
-coding: comment and can therefore handle Unicode data on - input. When writing Erlang Terms to a file, you should insert - such comments when applicable:++ + +Unicode Filenames ++ Most modern operating systems support Unicode filenames in some way. + There are many different ways to do this and Erlang by default treats the + different approaches differently:
+ ++ + +Mandatory Unicode file naming +- +
+Windows and, for most common uses, MacOS X enforce Unicode support + for filenames. All files created in the file system have names that + can consistently be interpreted. In MacOS X, all filenames are + retrieved in UTF-8 encoding. In Windows, each system call handling + filenames has a special Unicode-aware variant, giving much the same + effect. There are no filenames on these systems that are not Unicode + filenames. So, the default behavior of the Erlang VM is to work in + "Unicode filename translation mode". This means that a + filename can be specified as a Unicode list, which is automatically + translated to the proper name encoding for the underlying operating + system and file system.
+Doing, for example, a +
++ on one of these systems can return Unicode lists with code points + > 255, depending on the content of the file system. file:list_dir/1 Transparent file naming +- +
+Most Unix operating systems have adopted a simpler approach, namely + that Unicode file naming is not enforced, but followed by convention. Those + systems usually use UTF-8 encoding for Unicode filenames, but do not + enforce it. On such a system, a filename containing characters with + code points from 128 through 255 can be named as plain ISO Latin-1 or + use UTF-8 encoding. As no consistency is enforced, the Erlang VM + cannot do consistent translation of all filenames.
+By default on such systems, Erlang starts in
+utf8 filename + mode if the terminal supports UTF-8, otherwise inlatin1 + mode.In
+latin1 mode, filenames are bytewise encoded. This allows + for list representation of all filenames in the system. However, + a file named "Östersund.txt" appears in + file:list_dir/1 + either as "Östersund.txt" (if the filename was encoded in bytewise + ISO Latin-1 by the program creating the file) or more probably as + [195,150,115,116,101,114,115,117,110,100], which is a list + containing UTF-8 bytes (not what you want). If you use Unicode + filename translation on such a system, non-UTF-8 filenames are + ignored by functions like file:list_dir/1. They can be + retrieved with function + file:list_dir_all/1, + but wrongly encoded filenames appear as "raw filenames". + The Unicode file naming support was introduced in Erlang/OTP + R14B01. A VM operating in Unicode filename translation mode can + work with files having names in any language or character set (as + long as it is supported by the underlying operating system and + file system). The Unicode character list is used to denote + filenames or directory names. If the file system content is + listed, you also get Unicode lists as return value. The support + lies in the
+ +Kernel and STDLIB modules, which is why + most applications (that do not explicitly require the filenames + to be in the ISO Latin-1 range) benefit from the Unicode support + without change.On operating systems with mandatory Unicode filenames, this means that + you more easily conform to the filenames of other (non-Erlang) + applications. You can also process filenames that, at least on Windows, + were inaccessible (because of having names that could not be represented + in ISO Latin-1). Also, you avoid creating incomprehensible filenames + on MacOS X, as the
+ +vfs layer of the operating system accepts all + your filenames as UTF-8 and does not rewrite them.For most systems, turning on Unicode filename translation is no problem + even if it uses transparent file naming. Very few systems have mixed + filename encodings. A consistent UTF-8 named system works perfectly in + Unicode filename mode. It was still, however, considered experimental in + Erlang/OTP R14B01 and is still not the default on such systems.
+ +Unicode filename translation is turned on with switch
+ ++fnu . On + Linux, a VM started without explicitly stating the filename translation + mode defaults tolatin1 as the native filename encoding. On + Windows and MacOS X, the default behavior is that of Unicode filename + translation. Therefore ++ by default returns file:native_name_encoding/0 utf8 on those systems (Windows does not use + UTF-8 on the file system level, but this can safely be ignored by the + Erlang programmer). The default behavior can, as stated earlier, be + changed using option+fnu or+fnl to the VM, see the +program. If the VM is + started in Unicode filename translation mode, + erl file:native_name_encoding/0 returns atomutf8 . Switch ++fnu can be followed byw ,i , ore to control + how wrongly encoded filenames are to be reported.+
+ +- +
++
w means that a warning is sent to theerror_logger + whenever a wrongly encoded filename is "skipped" in directory + listings.w is the default.- +
++
i means that wrongly encoded filenames are silently ignored. +- +
++
e means that the API function returns an error whenever a + wrongly encoded filename (or directory name) is encountered.Notice that +
+ ++ always returns an error if the link points to an invalid filename. file:read_link/1 In Unicode filename mode, filenames given to BIF
+ +open_port/2 with + option {spawn_executable,...} are also interpreted as Unicode. So + is the parameter list specified in option args available when + using spawn_executable. The UTF-8 translation of arguments can be + avoided using binaries, see section +Notes About Raw Filenames. +Notice that the file encoding options specified when opening a file have + nothing to do with the filename encoding convention. You can very well + open files containing data encoded in UTF-8, but having filenames in + bytewise (
+ +latin1) encoding or vice versa.+ + Erlang drivers and NIF shared objects still cannot be named with + names containing code points > 127. This limitation will be removed in + a future release. However, Erlang modules can, but it is definitely not a + good idea and is still considered experimental.
++ + +Notes About Raw Filenames ++ Raw filenames were introduced together with Unicode filename support + in
+ +ERTS 5.8.2 (Erlang/OTP R14B01). The reason "raw + filenames" were introduced in the system was + to be able to represent + filenames, specified in different encodings on the same system, + consistently. It can seem practical to have the VM automatically + translate a filename that is not in UTF-8 to a list of Unicode + characters, but this would open up for both duplicate filenames and + other inconsistent behavior.Consider a directory containing a file named "björn" in ISO + Latin-1, while the Erlang VM is operating in Unicode filename mode (and + therefore expects UTF-8 file naming). The ISO Latin-1 name is not valid + UTF-8 and one can be tempted to think that automatic conversion in, for + example, +
+ ++ is a good idea. But what would happen if we later tried to open the file + and have the name as a Unicode list (magically converted from the ISO + Latin-1 filename)? The VM converts the filename to UTF-8, as this is + the encoding expected. Effectively this means trying to open the file + named <<"björn"/utf8>>. This file does not exist, + and even if it existed it would not be the same file as the one that was + listed. We could even create two files named "björn", one + named in UTF-8 encoding and one not. If file:list_dir/1 file:list_dir/1 would + automatically convert the ISO Latin-1 filename to a list, we would get + two identical filenames as the result. To avoid this, we must + differentiate between filenames that are properly encoded according to + the Unicode file naming convention (that is, UTF-8) and filenames that + are invalid under the encoding. By the common function +file:list_dir/1 , the wrongly encoded filenames are ignored in + Unicode filename translation mode, but by function ++ the filenames with invalid encoding are returned as "raw" + filenames, that is, as binaries. file:list_dir_all/1 The
+ +file module accepts raw filenames as input. +open_port({spawn_executable, ...} ...) also accepts them. As + mentioned earlier, the arguments specified in the option list to +open_port({spawn_executable, ...} ...) undergo the same + conversion as the filenames, meaning that the executable is provided + with arguments in UTF-8 as well. This translation is avoided + consistently with how the filenames are treated, by giving the argument + as a binary.To force Unicode filename translation mode on systems where this is not + the default was considered experimental in Erlang/OTP R14B01. This was + because the initial implementation did not ignore wrongly encoded + filenames, so that raw filenames could spread unexpectedly throughout + the system. As from Erlang/OTP R16B, the wrongly encoded + filenames are only retrieved by special functions (such as +
+ +file:list_dir_all/1 ). Since the impact on existing code is + therefore much lower it is now supported. + Unicode filename translation is + expected to be default in future releases.Even if you are operating without Unicode file naming translation + automatically done by the VM, you can access and create files with + names in UTF-8 encoding by using raw filenames encoded as UTF-8. + Enforcing the UTF-8 encoding regardless of the mode the Erlang VM is + started in can in some circumstances be a good idea, as the convention + of using UTF-8 filenames is spreading.
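As a minimal sketch of how an application might deal with the mix of names returned by file:list_dir_all/1, the following code separates properly encoded names (lists) from raw names (binaries). The module name raw_names and both function names are hypothetical and exist only for this illustration.

```erlang
-module(raw_names).
-export([split_names/1, list_dir_split/1]).

%% Separate properly encoded filenames (lists of Unicode code points)
%% from raw filenames (binaries whose encoding is unknown).
%% Returns {EncodedNames, RawNames}.
split_names(Names) ->
    lists:partition(fun erlang:is_list/1, Names).

%% List a directory, including wrongly encoded names, and split the
%% result. Errors from file:list_dir_all/1 are passed through unchanged.
list_dir_split(Dir) ->
    case file:list_dir_all(Dir) of
        {ok, Names} -> {ok, split_names(Names)};
        {error, _} = Error -> Error
    end.
```

The raw names can be passed back unchanged to functions in the file module, which is why the sketch keeps them as binaries instead of trying to convert them.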
++ +Notes About MacOS X +The
+ +vfs layer of MacOS X enforces UTF-8 filenames in an + aggressive way. Older versions did this by refusing to create non-UTF-8 + conforming filenames, while newer versions replace offending bytes with + the sequence "%HH", where HH is the original character in + hexadecimal notation. As Unicode translation is enabled by default on + MacOS X, the only way to come up against this is to either start the VM + with flag+fnl or to use a raw filename in bytewise + (latin1 ) encoding. If using a raw filename, with a bytewise + encoding containing characters from 127 through 255, to create a file, + the file cannot be opened using the same name as the one used to create + it. There is no remedy for this behavior, except keeping the filenames + in the correct encoding.MacOS X reorganizes the filenames so that the representation of + accents, and so on, uses the "combining characters". For example, + character
+ö is represented as code points [111,776], + where 111 is character o and 776 is the special + accent character "Combining Diaeresis". This way of normalizing Unicode + is otherwise very seldom used. Erlang normalizes those filenames in the + opposite way upon retrieval, so that filenames using combining accents + are not passed up to the Erlang application. In Erlang, filename + "björn" is retrieved as [98,106,246,114,110], not as + [98,106,111,776,114,110], although the file system can think + differently. The normalization into combining accents is redone when + accessing files, so this can usually be ignored by the Erlang + programmer.+ + +Unicode in Environment and Parameters ++ Environment variables and their interpretation are handled much in the + same way as filenames. If Unicode filenames are enabled, environment + variables as well as parameters to the Erlang VM are expected to be in + Unicode.
+ +If Unicode filenames are enabled, the calls to +
+, + os:getenv/0,1 , and + os:putenv/2 + handle Unicode strings. On Unix-like platforms, the built-in functions + translate environment variables in UTF-8 to/from Unicode strings, possibly + with code points > 255. On Windows, the Unicode versions of the + environment system API are used, and code points > 255 are allowed. os:unsetenv/1 On Unix-like operating systems, parameters are expected to be UTF-8 + without translation if Unicode filenames are enabled.
++ + +Unicode-Aware Modules +Most of the modules in Erlang/OTP are Unicode-unaware in the sense that + they have no notion of Unicode and should not have. Typically they handle + non-textual or byte-oriented data (such as
+ +gen_tcp ).Modules handling textual data (such as +
+ +and + io_lib are sometimes + subject to conversion or extension to be able to handle Unicode + characters. string Fortunately, most textual data has been stored in lists and range + checking has been sparse, so modules like
+ +string work well for + Unicode lists with little need for conversion or extension.Some modules are, however, changed to be explicitly Unicode-aware. These + modules include:
+ ++ + ++ unicode - +
+The
++ module is clearly Unicode-aware. It contains functions for conversion + between different Unicode formats and some utilities for identifying + byte order marks. Few programs handling Unicode data survive without + this module. unicode + io - +
+The
+module has been + extended along with the actual I/O protocol to handle Unicode data. + This means that many functions require binaries to be in UTF-8, and + there are modifiers to format control sequences to allow for output + of Unicode strings. io + file ,group ,user - +
+I/O-servers throughout the system can handle Unicode data and have + options for converting data upon output or input to/from the device. + As shown earlier, the +
+module has + support for Unicode terminals and the + shell module + allows for translation to and from various Unicode formats on + disk. file Reading and writing of files with Unicode data is, however, not best + done with the
+file module, as its interface is + byte-oriented. A file opened with a Unicode encoding (like UTF-8) is + best read or written using the +module. io + re - +
+The
+module allows + for matching Unicode strings as a special option. As the library is + centered on matching in binaries, the Unicode support is + UTF-8-centered. re + wx - +
+The graphical library
+ has extensive support for Unicode text. wx The
+string module works + perfectly for Unicode strings and ISO Latin-1 strings, except the + language-dependent functions + string:to_upper/1 + and + string:to_lower/1, + which are only correct for the ISO Latin-1 character set. These two + functions can never function correctly for Unicode characters in their + current form, as there are language and locale issues as well as + multi-character mappings to consider when converting text between cases. + Converting case in an international environment is a large subject not + yet addressed in OTP. + -Unicode Data in Files +That Erlang can handle Unicode data in many forms does not + automatically mean that the content of any file can be Unicode text. The + external entities, such as ports and I/O servers, are not generally + Unicode-capable.
+ +Ports are always byte-oriented, so before sending data that you are not + sure is bytewise-encoded to a port, ensure to encode it in a proper + Unicode encoding. Sometimes this means that only part of the data must + be encoded as, for example, UTF-8. Some parts can be binary data (like a + length indicator) or something else that must not undergo character + encoding, so no automatic translation is present.
+ +I/O servers behave a little differently. The I/O servers connected to + terminals (or
+ +stdout) can usually cope with Unicode data + regardless of the encoding option. This is convenient when one expects + a modern environment but does not want to crash when writing to an archaic + terminal or pipe. A file can have an encoding option that makes it generally usable by the +
+ +module (for example + io {encoding,utf8} ), but is by default opened as a byte-oriented file. + Themodule is + byte-oriented, so only ISO Latin-1 characters can be written using that + module. Use the file io module if Unicode data is to be output to a + file with otherencoding thanlatin1 (bytewise encoding). + It is slightly confusing that a file opened with, for example, +file:open(Name,[read,{encoding,utf8}]) cannot be properly read + usingfile:read(File,N) , but using theio module to retrieve + the Unicode data from it. The reason is thatfile:read and +file:write (and friends) are purely byte-oriented, and should be, + as that is the way to access files other than text files, byte by byte. + As with ports, you can write encoded data into a file by "manually" + converting the data to the encoding of choice (using the +module or the + bit syntax) and then output it on a bytewise ( unicode latin1 ) encoded + file.Recommendations:
+ ++
+ +- +
Use the +
+module for + files opened for bytewise access ( file {encoding,latin1} ).- +
Use the
+io module + when accessing files with any other encoding (for example + {encoding,utf8}). Functions reading Erlang syntax from files recognize the
+ +coding: + comment and can therefore handle Unicode data on input. When writing + Erlang terms to a file, you are advised to insert such comments when + applicable:$ erl +fna +pc unicode Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -990,202 +1100,224 @@ Eshell V5.10.1 (abort with ^G) 1> file:write_file("test.term",<<"%% coding: utf-8\n[{\"Юникод\",4711}].\n"/utf8>>). ok 2> file:consult("test.term"). -{ok,[[{"Юникод",4711}]]} --- -Summary of Options -- The Unicode support is controlled by both command line switches, - some standard environment variables and the version of OTP you are - using. Most options affect mainly the way Unicode data is displayed, - not the actual functionality of the API's in the standard - libraries. This means that Erlang programs usually do not - need to concern themselves with these options, they are more for the - development environment. An Erlang program can be written so that it - works well regardless of the type of system or the Unicode options - that are in effect.
- -Here follows a summary of the settings affecting Unicode:
-- -The -LANG andLC_CTYPE environment variables- -
-The language setting in the OS mainly affects the shell. The - terminal (i.e. the group leader) will operate with
-{encoding, - unicode} only if the environment tells it that UTF-8 is - allowed. This setting should correspond to the actual terminal - you are using.The environment can also affect file name interpretation, if - Erlang is started with the
-+fna flag (which is default from - Erlang/OTP 17.0).You can check the setting of this by calling -
-io:getopts() , which will give you an option list - containing{encoding,unicode} or -{encoding,latin1} .The -+pc {unicode |latin1 } flag to -erl(1) - -
-This flag affects what is interpreted as string data when - doing heuristic string detection in the shell and in -
-io /io_lib:format with the"~tp" and -~tP formatting instructions, as described above.You can check this option by calling io:printable_range/0, - which will return
-unicode orlatin1 . To be - compatible with future (expected) extensions to the settings, - one should rather useio_lib:printable_list/1 to check if - a list is printable according to the setting. That function will - take into account new possible settings returned from -io:printable_range/0 .The -+fn {l |a |u } - [{w |i |e }] - flag toerl(1) - -
-This flag affects how the file names are to be interpreted. On - operating systems with transparent file naming, this has to be - specified to allow for file naming in Unicode characters (and - for correct interpretation of file names containing characters - > 255.
--
+fnl means bytewise interpretation of file names, which - was the usual way to represent ISO-Latin-1 file names before - UTF-8 file naming got widespread.-
+fnu means that file names are encoded in UTF-8, which - is nowadays the common scheme (although not enforced).- -
+fna means that you automatically select between -+fnl and+fnu , based on theLANG and -LC_CTYPE environment variables. This is optimistic - heuristics indeed, nothing enforces a user to have a terminal - with the same encoding as the file system, but usually, this is - the case. This is the default on all Unix-like operating - systems except MacOS X.The file name translation mode can be read with the -
-file:native_name_encoding/0 function, which returns -latin1 (meaning bytewise encoding) orutf8 .- - epp:default_encoding/0 - -
-This function returns the default encoding for Erlang source - files (if no encoding comment is present) in the currently - running release. In Erlang/OTP R16B
-latin1 was returned (meaning - bytewise encoding). In Erlang/OTP 17.0 and forward it returns -utf8 .The encoding of each file can be specified using comments as - described in -
-. epp(3) - and the io:setopts/ {1 ,2 }-oldshell /-noshell flags.- -
-When Erlang is started with
--oldshell or --noshell , the I/O-server forstandard_io is default - set to bytewise encoding, while an interactive shell defaults to - what the environment variables says.With the
-io:setopts/2 function you can set the - encoding of a file or other I/O-server. This can also be set when - opening a file. Setting the terminal (or other -standard_io server) unconditionally to the option -{encoding,utf8} will for example make UTF-8 encoded characters - being written to the device regardless of how Erlang was started or - the users environment.Opening files with
-encoding option is convenient when - writing or reading text files in a known encoding.You can retrieve the
-encoding setting for an I/O-server - using. io:getopts() - Recipes -When starting with Unicode, one often stumbles over some common - issues. I try to outline some methods of dealing with Unicode data - in this section.
+{ok,[[{"Юникод",4711}]]}
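The advice about coding: comments can be sketched as a small helper that writes terms in a form file:consult/1 can read back. The module name consult_demo and function write_terms/2 are hypothetical, and the sketch assumes that every term can be printed with ~tp and parsed again (which does not hold for pids, ports, or funs).

```erlang
-module(consult_demo).
-export([write_terms/2]).

%% Write Terms to FileName, one term per line, preceded by a
%% "coding: utf-8" comment so that readers know the file encoding.
write_terms(FileName, Terms) ->
    %% ~tp allows code points above 255 in the printed representation.
    Body = [io_lib:format("~tp.~n", [T]) || T <- Terms],
    %% The formatted output can contain characters that do not fit in
    %% one byte, so it is encoded as UTF-8 explicitly.
    Data = unicode:characters_to_binary(["%% coding: utf-8\n", Body]),
    file:write_file(FileName, Data).
```

The explicit call to unicode:characters_to_binary/1 matters here: file:write_file/2 is byte-oriented and does not accept a list with code points above 255.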
The Unicode support is controlled by both command-line switches, some + standard environment variables, and the OTP version you are using. Most + options affect mainly how Unicode data is displayed, not the + functionality of the APIs in the standard libraries. This means that + Erlang programs usually do not need to concern themselves with these + options, they are more for the development environment. An Erlang program + can be written so that it works well regardless of the type of system or + the Unicode options that are in effect.
+ +Here follows a summary of the settings affecting Unicode:
+ +The language setting in the operating system mainly affects the
+ shell. The terminal (that is, the group leader) operates with
+
The environment can also affect filename interpretation, if Erlang
+ is started with flag
You can check the setting of this by calling
+
This flag affects what is interpreted as string data when doing
+ heuristic string detection in the shell and in
+
You can check this option by calling
+
This flag affects how the filenames are to be interpreted. On + operating systems with transparent file naming, this must be + specified to allow for file naming in Unicode characters (and for + correct interpretation of filenames containing characters > 255). +
+The filename translation mode can be read with function
+
This function returns the default encoding for Erlang source files
+ (if no encoding comment is present) in the currently running release.
+ In Erlang/OTP R16B,
The encoding of each file can be specified using comments as
+ described in the
+
When Erlang is started with
You can set the encoding of a file or other I/O server with function
+
Opening files with option
You can retrieve the
A common method of identifying encoding in text-files is to put
- a byte order mark (BOM) first in the file. The BOM is the
- code point 16#FEFF encoded in the same way as the rest of the
- file. If such a file is to be read, the first few bytes (depending
- on encoding) is not part of the actual text. This code outlines
- how to open a file which is believed to have a BOM and set the
- files encoding and position for further sequential reading
- (preferably using the
+ Recipes
+ When starting with Unicode, one often stumbles over some common issues.
+ This section describes some methods of dealing with Unicode data.
+
+
+ Byte Order Marks
+ A common method of identifying encoding in text files is to put a Byte
+ Order Mark (BOM) first in the file. The BOM is the code point 16#FEFF
+    encoded in the same way as the rest of the file. If such a file is to be
+    read, the first few bytes (depending on encoding) are not part of the
+    text. This code outlines how to open a file that is believed to
+    have a BOM, and set the file's encoding and position for further
+ sequential reading (preferably using the
+ io module).
+
+ Notice that error handling is omitted from the code:
+
+
open_bom_file_for_reading(File) ->
{ok,F} = file:open(File,[read,binary]),
{ok,Bin} = file:read(F,4),
{Type,Bytes} = unicode:bom_to_encoding(Bin),
file:position(F,Bytes),
io:setopts(F,[{encoding,Type}]),
- {ok,F}.
-
- The unicode:bom_to_encoding/1 function identifies the
- encoding from a binary of at least four bytes. It returns, along
- with an term suitable for setting the encoding of the file, the
- actual length of the BOM, so that the file position can be set
- accordingly. Note that file:position/2 always works on
- byte-offsets, so that the actual byte-length of the BOM is
- needed.
- To open a file for writing and putting the BOM first is even
- simpler:
-
+ {ok,F}.
+
+ Function
+ unicode:bom_to_encoding/1
+ identifies the encoding from a binary of at least four bytes. It
+ returns, along with a term suitable for setting the encoding of the
+ file, the byte length of the BOM, so that the file position can be set
+ accordingly. Notice that function
+ file:position/2
+ always works on byte-offsets, so that the byte length of the BOM is
+ needed.
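The same detection can be tried on a binary that is already in memory. The following is a minimal sketch; the module name bom_demo and function detect/1 are hypothetical, and the guard reflects the requirement above that the binary is at least four bytes long.

```erlang
-module(bom_demo).
-export([detect/1]).

%% Return the encoding indicated by a leading BOM, the byte length of
%% the BOM, and the remaining data with the BOM stripped off.
%% Without a BOM, unicode:bom_to_encoding/1 reports {latin1,0}.
detect(Bin) when is_binary(Bin), byte_size(Bin) >= 4 ->
    {Encoding, BomLength} = unicode:bom_to_encoding(Bin),
    <<_Bom:BomLength/binary, Rest/binary>> = Bin,
    {Encoding, BomLength, Rest}.
```

For example, bom_demo:detect(<<16#EF,16#BB,16#BF,"abc">>) returns {utf8,3,<<"abc">>}, where 3 is the byte offset that would be passed to file:position/2.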
+
+ To open a file for writing and place the BOM first is even simpler:
+
+
open_bom_file_for_writing(File,Encoding) ->
{ok,F} = file:open(File,[write,binary]),
     ok = file:write(F,unicode:encoding_to_bom(Encoding)),
io:setopts(F,[{encoding,Encoding}]),
- {ok,F}.
-
- In both cases the file is then best processed using the
- io module, as the functions in io can handle code
- points beyond the ISO-latin-1 range.
-
-
- Formatted I/O
- When reading and writing to Unicode-aware entities, like the
- User or a file opened for Unicode translation, you will probably
- want to format text strings using the functions in io or io_lib . For backward
- compatibility reasons, these functions do not accept just any list
- as a string, but require a special translation modifier
- when working with Unicode texts. The modifier is t . When
- applied to the s control character in a formatting string,
- it accepts all Unicode code points and expect binaries to be in
- UTF-8:
-
+ {ok,F}.
+
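When the data is not written through an I/O server, a BOM can also be prepended by hand. This is a minimal sketch under the assumption that the whole text fits in memory; the module name bom_write_demo and function with_bom/2 are hypothetical.

```erlang
-module(bom_write_demo).
-export([with_bom/2]).

%% Encode Chars (a Unicode string) in Encoding and put the matching
%% BOM first. For latin1, unicode:encoding_to_bom/1 returns <<>>, so
%% no BOM is added. If Chars cannot be encoded, the binary
%% construction fails with badarg, as error handling is omitted.
with_bom(Chars, Encoding) ->
    Bom = unicode:encoding_to_bom(Encoding),
    Body = unicode:characters_to_binary(Chars, unicode, Encoding),
    <<Bom/binary, Body/binary>>.
```

The resulting binary can be written with the byte-oriented file:write_file/2, as the encoding has already been applied.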
+    In both cases, the file is then best processed using the
+
When reading and writing to Unicode-aware entities, like a
+ file opened for Unicode translation, you probably want to format text
+ strings using the functions in the
+
1> io:format("~ts~n",[<<"åäö"/utf8>>]).
åäö
ok
2> io:format("~s~n",[<<"åäö"/utf8>>]).
åäö
ok
- Obviously the second
As long as the data is always lists, the
The function
+ +Clearly, the second
+ +io:format/2 gives undesired output, as the + UTF-8 binary is not in latin1. For backward compatibility, the + non-prefixed control characters expect bytewise-encoded ISO + Latin-1 characters in binaries and lists containing only code points + < 256. As long as the data is always lists, modifier
+ +t can be used for + any string, but when binary data is involved, care must be taken to + make the correct choice of formatting characters. A bytewise-encoded + binary is also interpreted as a string, and printed even when using + ~ts, but it can be mistaken for a valid UTF-8 string. Therefore, avoid + using the ~ts control if the binary contains + bytewise-encoded characters and not UTF-8. Function +
+ ++ behaves similarly. It is defined to return a deep list of characters + and the output can easily be converted to binary data for outputting on + any device by a simple + io_lib:format/2 . + When the translation modifier is used, the list can, however, contain + characters that cannot be stored in one byte. The call to + erlang:list_to_binary/1 erlang:list_to_binary/1 then fails. However, if the I/O server + you want to communicate with is Unicode-aware, the returned list can + still be used directly:$ erl +pc unicode Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -1195,55 +1327,56 @@ Eshell V5.10.1 (abort with ^G) 2> io:put_chars(io_lib:format("~ts~n", ["Γιούνικοντ"])). Γιούνικοντ ok-The Unicode string is returned as a Unicode list, which is - recognized as such since the Erlang shell uses the Unicode - encoding (and is started with all Unicode characters considered - printable). The Unicode list is valid input to the
-function, - so data can be output on any Unicode capable device. If the device - is a terminal, characters will be output in the io:put_chars/2 \x{ H - ...} format if encoding islatin1 otherwise in UTF-8 - (for the non-interactive terminal - "oldshell" or "noshell") or - whatever is suitable to show the character properly (for an - interactive terminal - the regular shell). The bottom line is that - you can always send Unicode data to thestandard_io - device. Files will however only accept Unicode code points beyond - ISO-latin-1 ifencoding is set to something else than -latin1 .
While it is - strongly encouraged that the actual encoding of characters in - binary data is known prior to processing, that is not always - possible. On a typical Linux system, there is a mix of UTF-8 - and ISO-latin-1 text files and there are seldom any BOM's in the - files to identify them.
-UTF-8 is designed in such a way that ISO-latin-1 characters
- with numbers beyond the 7-bit ASCII range are seldom considered
- valid when decoded as UTF-8. Therefore one can usually use
- heuristics to determine if a file is in UTF-8 or if it is encoded
- in ISO-latin-1 (one byte per character) encoding. The
-
+
+ The Unicode string is returned as a Unicode list, which is recognized
+ as such, as the Erlang shell uses the Unicode encoding (and is started
+ with all Unicode characters considered printable). The Unicode list is
+ valid input to function
+ io:put_chars/2 ,
+ so data can be output on any Unicode-capable device. If the device is a
+    terminal, characters are output in format \x{H...} if
+    encoding is latin1. Otherwise, they are output in UTF-8 (for the
+    non-interactive terminal: "oldshell" or "noshell") or in whatever form
+    is suitable to show the character properly (for an interactive
+    terminal: the regular shell).
+
+ So, you can always send Unicode data to the standard_io device.
+    Files, however, accept Unicode code points beyond ISO Latin-1 only if
+    encoding is set to something other than latin1.
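When the formatted text must end up in a binary, one possible approach is to replace erlang:list_to_binary/1 with unicode:characters_to_binary/1, which accepts the deep list returned by io_lib:format/2 even when it contains code points above 255. The module name format_demo and function format_utf8/2 are hypothetical names for this sketch.

```erlang
-module(format_demo).
-export([format_utf8/2]).

%% Format Args according to Format and return the result as a UTF-8
%% encoded binary. Unlike erlang:list_to_binary/1, this also works
%% when ~ts has produced characters that do not fit in one byte.
format_utf8(Format, Args) ->
    unicode:characters_to_binary(io_lib:format(Format, Args)).
```

The returned binary is already UTF-8 encoded, so it can be written with the byte-oriented functions in the file module or sent to a port.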
+ While it is strongly encouraged that the encoding of characters + in binary data is known before processing, that is not always possible. + On a typical Linux system, there is a mix of UTF-8 and ISO Latin-1 text + files, and there are seldom any BOMs in the files to identify them.
+ +UTF-8 is designed so that ISO Latin-1 characters with numbers beyond
+ the 7-bit ASCII range are seldom considered valid when decoded as UTF-8.
+ Therefore one can usually use heuristics to determine if a file is in
+ UTF-8 or if it is encoded in ISO Latin-1 (one byte per character).
+ The
heuristic_encoding_bin(Bin) when is_binary(Bin) ->
case unicode:characters_to_binary(Bin,utf8,utf8) of
Bin ->
utf8;
_ ->
latin1
- end.
-
- If one does not have a complete binary of the file content, one
- could instead chunk through the file and check part by part. The
- return-tuple
+ end.
+
+ If you do not have a complete binary of the file content, you can
+ instead chunk through the file and check part by part. The return-tuple
+
heuristic_encoding_file(FileName) ->
{ok,F} = file:open(FileName,[read,binary]),
loop_through_file(F,<<>>,file:read(F,1024)).
@@ -1260,13 +1393,14 @@ loop_through_file(F,Acc,{ok,Bin}) when is_binary(Bin) ->
loop_through_file(F,Rest,file:read(F,1024));
Res when is_binary(Res) ->
loop_through_file(F,<<>>,file:read(F,1024))
- end.
-
- Another option is to try to read the whole file in UTF-8
- encoding and see if it fails. Here we need to read the file using
-
+ end.
+
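The three kinds of results that the chunked check must distinguish can be illustrated on small binaries. This is a sketch only; the module name utf8_probe, function classify/1, and the atoms valid and invalid are hypothetical and not part of any OTP API.

```erlang
-module(utf8_probe).
-export([classify/1]).

%% Classify a binary by trying to decode it as UTF-8.
%% valid              - the whole binary is well-formed UTF-8
%% {incomplete, Rest} - the binary ends in the middle of a character,
%%                      so more data is needed before deciding
%% {invalid, Rest}    - Rest starts with bytes that are not UTF-8
classify(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Bin -> valid;
        {incomplete, _Converted, Rest} -> {incomplete, Rest};
        {error, _Converted, Rest} -> {invalid, Rest}
    end.
```

When reading a file in chunks, the Rest of an incomplete result is what has to be prepended to the next chunk, as a character can be split across a chunk boundary.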
+ Another option is to try to read the whole file in UTF-8 encoding and
+ see if it fails. Here we need to read the file using function
+
heuristic_encoding_file2(FileName) ->
{ok,F} = file:open(FileName,[read,binary,{encoding,utf8}]),
loop_through_file2(F,io:get_chars(F,'',1024)).
@@ -1276,69 +1410,71 @@ loop_through_file2(_,eof) ->
loop_through_file2(_,{error,_Err}) ->
latin1;
loop_through_file2(F,Bin) when is_binary(Bin) ->
- loop_through_file2(F,io:get_chars(F,'',1024)).
-
- For various reasons, you may find yourself having a list of - UTF-8 bytes. This is not a regular string of Unicode characters as - each element in the list does not contain one character. Instead - you get the "raw" UTF-8 encoding that you have in binaries. This - is easily converted to a proper Unicode string by first converting - byte per byte into a binary and then converting the binary of - UTF-8 encoded characters back to a Unicode string:
-
- utf8_list_to_string(StrangeList) ->
- unicode:characters_to_list(list_to_binary(StrangeList)).
-
- When working with binaries, you may get the horrible "double
- UTF-8 encoding", where strange characters are encoded in your
- binaries or files that you did not expect. What you may have got,
- is a UTF-8 encoded binary that is for the second time encoded as
- UTF-8. A common situation is where you read a file, byte by byte,
- but the actual content is already UTF-8. If you then convert the
- bytes to UTF-8, using i.e. the
The by far most common situation where this happens, is when - you get lists of UTF-8 instead of proper Unicode strings, and then - convert them to UTF-8 in a binary or on a file:
-
- wrong_thing_to_do() ->
- {ok,Bin} = file:read_file("an_utf8_encoded_file.txt"),
- MyList = binary_to_list(Bin), %% Wrong! It is an utf8 binary!
- {ok,C} = file:open("catastrophe.txt",[write,{encoding,utf8}]),
- io:put_chars(C,MyList), %% Expects a Unicode string, but get UTF-8
- %% bytes in a list!
- file:close(C). %% The file catastrophe.txt contains more or less unreadable
- %% garbage!
-
- Make very sure you know what a binary contains before - converting it to a string. If no other option exists, try - heuristics:
-
- if_you_can_not_know() ->
- {ok,Bin} = file:read_file("maybe_utf8_encoded_file.txt"),
- MyList = case unicode:characters_to_list(Bin) of
- L when is_list(L) ->
- L;
- _ ->
- binary_to_list(Bin) %% The file was bytewise encoded
- end,
- %% Now we know that the list is a Unicode string, not a list of UTF-8 bytes
- {ok,G} = file:open("greatness.txt",[write,{encoding,utf8}]),
- io:put_chars(G,MyList), %% Expects a Unicode string, which is what it gets!
- file:close(G). %% The file contains valid UTF-8 encoded Unicode characters!
-
+ loop_through_file2(F,io:get_chars(F,'',1024)).
+ For various reasons, you can sometimes have a list of UTF-8 + bytes. This is not a regular string of Unicode characters, as each list + element does not contain one character. Instead you get the "raw" UTF-8 + encoding that you have in binaries. This is easily converted to a proper + Unicode string by first converting byte per byte into a binary, and then + converting the binary of UTF-8 encoded characters back to a Unicode + string:
+ +
+utf8_list_to_string(StrangeList) ->
+ unicode:characters_to_list(list_to_binary(StrangeList)).
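As a concrete illustration, the name "björn" stored as a list of UTF-8 bytes has six elements, while the proper Unicode string has five. The module name utf8_list_demo is hypothetical; the function body is the conversion shown above.

```erlang
-module(utf8_list_demo).
-export([utf8_list_to_string/1]).

%% Convert a list of UTF-8 bytes to a list of Unicode code points:
%% first pack the bytes into a binary, then decode the binary as UTF-8.
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).
```

Calling utf8_list_demo:utf8_list_to_string([98,106,195,182,114,110]) returns [98,106,246,114,110], as the two bytes 195 and 182 are the UTF-8 encoding of the single character ö (code point 246).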
+ When working with binaries, you can get the horrible "double UTF-8
+ encoding", where strange characters are encoded in your binaries or
+ files. In other words, you can get a UTF-8 encoded binary that for the
+ second time is encoded as UTF-8. A common situation is where you read a
+ file, byte by byte, but the content is already UTF-8. If you then
+ convert the bytes to UTF-8, using, for example, the
+
By far the most common situation where this occurs, is when you get + lists of UTF-8 instead of proper Unicode strings, and then convert them + to UTF-8 in a binary or on a file:
+ +
+wrong_thing_to_do() ->
+ {ok,Bin} = file:read_file("an_utf8_encoded_file.txt"),
+    MyList = binary_to_list(Bin), %% Wrong! It is a UTF-8 binary!
+    {ok,C} = file:open("catastrophe.txt",[write,{encoding,utf8}]),
+    io:put_chars(C,MyList), %% Expects a Unicode string, but gets UTF-8
+                            %% bytes in a list!
+ file:close(C). %% The file catastrophe.txt contains more or less unreadable
+ %% garbage!
+
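The effect of double UTF-8 encoding can also be reproduced on a small binary, without any files. The following sketch uses the hypothetical module name double_utf8_demo; the repair function assumes that exactly one extra level of encoding was applied, which cannot be known for certain from the data alone.

```erlang
-module(double_utf8_demo).
-export([double_encode/1, undo_double_encode/1]).

%% Reproduce the mistake: the bytes of a UTF-8 binary are treated as
%% code points and encoded as UTF-8 a second time.
double_encode(Utf8Bin) when is_binary(Utf8Bin) ->
    unicode:characters_to_binary(binary_to_list(Utf8Bin)).

%% Try to reverse one level of double encoding. Decoding must yield
%% only code points below 256, as these are the original bytes.
undo_double_encode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        Bytes when is_list(Bytes) ->
            case lists:all(fun(C) -> C < 256 end, Bytes) of
                true -> {ok, list_to_binary(Bytes)};
                false -> {error, not_double_encoded}
            end;
        _ ->
            {error, invalid_utf8}
    end.
```

For the character å (UTF-8 bytes 195,165), double_encode/1 returns <<195,131,194,165>>: four bytes where two were expected, which is the kind of garbage that ends up in the file.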
+ Ensure you know what a binary contains before converting it to a + string. If no other option exists, try heuristics:
+ +
+if_you_can_not_know() ->
+ {ok,Bin} = file:read_file("maybe_utf8_encoded_file.txt"),
+ MyList = case unicode:characters_to_list(Bin) of
+ L when is_list(L) ->
+ L;
+ _ ->
+ binary_to_list(Bin) %% The file was bytewise encoded
+ end,
+ %% Now we know that the list is a Unicode string, not a list of UTF-8 bytes
+ {ok,G} = file:open("greatness.txt",[write,{encoding,utf8}]),
+ io:put_chars(G,MyList), %% Expects a Unicode string, which is what it gets!
+ file:close(G). %% The file contains valid UTF-8 encoded Unicode characters!
+ This module provides read and write access to the registry on Windows. It is essentially a port driver wrapped around the Win32 API calls for accessing the registry.
The registry is a hierarchical database, used to store various system - and software information in Windows. It is available in Windows 95 and - Windows NT. It contains installation data, and is updated by installers + and software information in Windows. + It contains installation data, and is updated by installers and system programs. The Erlang installer updates the registry by adding data that Erlang needs.
The registry contains keys and values. Keys are like the directories in a file system, they form a hierarchy. Values are like files, they have a name and a value, and also a type.
-Paths to keys are left to right, with sub-keys to the right and backslash
- between keys. (Remember that backslashes must be doubled in Erlang strings.)
- Case is preserved but not significant.
- Example:
Paths to keys are left to right, with subkeys to the right and backslash + between keys. (Remember that backslashes must be doubled in Erlang + strings.) Case is preserved but not significant.
+For example,
+
There are six entry points in the Windows registry, top level keys. They can be
- abbreviated in the
There are six entry points in the Windows registry, top-level keys. + They can be abbreviated in this module as follows:
-Abbrev. Registry key -======= ============ +Abbreviation Registry key +============ ============ hkcr HKEY_CLASSES_ROOT current_user HKEY_CURRENT_USER hkcu HKEY_CURRENT_USER @@ -67,29 +68,39 @@ current_config HKEY_CURRENT_CONFIG hkcc HKEY_CURRENT_CONFIG dyn_data HKEY_DYN_DATA hkdd HKEY_DYN_DATA-
The key above could be written as
The
The key above can be written as
+
This module uses a current key. It works much like the + current directory. From the current key, values can be fetched, subkeys can be listed, and so on.
-Under a key, any number of named values can be stored. They have name, and +
Under a key, any number of named values can be stored. They have names, types, and data.
-Currently, the
There is also a "default" value, which has the empty string as name. It is read and
- written with the atom
Some registry values are stored as strings with references to environment variables,
- e.g.
For additional information on the Windows registry consult the Win32 +
Other types can be read, and are returned as binaries.
+There is also a "default" value, which has the empty string as name. It
+ is read and written with the atom
Some registry values are stored as strings with references to environment
+ variables, for example,
For more information on the Windows registry, consult the Win32 Programmer's Reference.
As returned by
As returned by
+
Changes the current key to another key. Works like cd. +
Changes the current key to another key. Works like
Creates a key, or just changes to it, if it is already there. Works
- like a combination of
The registry must have been opened in write-mode.
+ like a combination ofThe registry must have been opened in write mode.
Closes the registry. After that, the
Closes the registry. After that, the
Returns the path to the current key. This is the equivalent of
Note that the current key is stored in the driver, and might be - invalid (e.g. if the key has been removed).
+Returns the path to the current key. This is the equivalent of
+
Notice that the current key is stored in the driver, and can be + invalid (for example, if the key has been removed).
Deletes the current key, if it is valid. Calls the Win32 API
- function
Deletes a named value on the current key. The atom
The registry must have been opened in write-mode.
+ used for the default value. +The registry must have been opened in write mode.
Expands a string containing environment variables between percent - characters. Anything between two % is taken for a environment - variable, and is replaced by the value. Two consecutive % is replaced - by one %.
-A variable name that is not in the environment, will result in an error.
+ characters. Anything between twoA variable name that is not in the environment results in an + error.
Convert an POSIX errorcode to a string (by calling
Converts a POSIX error code to a string
+ (by calling
Opens the registry for reading or writing. The current key will be the root
- (
Use
Opens the registry for reading or writing. The current key is the
+ root (
Use
Sets the named (or default) value to value. Calls the Win32
- API function
The registry must have been opened in write-mode.
+Sets the named (or default) value to
Other types cannot be added or changed.
+The registry must have been opened in write mode.
Returns a list of subkeys to the current key. Calls the Win32
API function
Avoid calling this on the root keys, it can be slow.
+Avoid calling this on the root keys, as it can be slow.
Retrieves the named value (or default) on the current key.
- Registry values of type
Retrieves a list of all values on the current key. The values
- have types corresponding to the registry types, see
Win32 Programmer's Reference (from Microsoft)
-The Windows 95 Registry (book from O'Reilly)
+The
This module archives and extracts files to and from a zip
+ archive. The zip format is specified by the "ZIP Appnote.txt" file,
+ available on the PKWARE web site
+
The zip module supports zip archive versions up to 6.1. However, password-protection and Zip64 are not supported.
-By convention, the name of a zip file should end in "
Zip archives are created with the
-
To extract files from a zip archive, use the
-
To fold a function over all files in a zip archive, use the
-
To return a list of the files in a zip archive, use the
-
To print a list of files to the Erlang shell,
- use either the
In some cases, it is desirable to open a zip archive, and to
- unzip files from it file by file, without having to reopen the
- archive. The functions
-
By convention, the name of a zip file is to end with
To create zip archives, use function
+
To extract files from a zip archive, use function
+
To fold a function over all files in a zip archive, use function
+
To return a list of the files in a zip archive, use function
+
To print a list of files to the Erlang shell, use function
+
Sometimes it is desirable to open a zip archive, and to
+ unzip files from it file by file, without having to reopen the
+ archive. This can be done with functions
+
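A minimal sketch of the open/get/close workflow follows. It assumes an archive built in memory so that no files are touched; the entry names and contents are made up for illustration.

```erlang
%% Build a small archive in memory (entry names and data are illustrative).
Files = [{"a.txt", <<"alpha">>}, {"b.txt", <<"beta">>}],
{ok, {_ArchiveName, ZipBin}} = zip:create("demo.zip", Files, [memory]),

%% Open once, then fetch entries one by one without reopening the archive.
{ok, Handle} = zip:zip_open(ZipBin, [memory]),
{ok, Listing} = zip:zip_list_dir(Handle),
{ok, {_EntryName, Data}} = zip:zip_get("b.txt", Handle),
io:format("~p entries listed, fetched ~p bytes~n",
          [length(Listing), byte_size(Data)]),
ok = zip:zip_close(Handle).
```

Because the archive was opened with option `memory`, `zip:zip_get/2` returns the entry as a `{FileName, Binary}` tuple instead of writing it to disk.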
Zip64 archives are not currently supported.
-Password-protected and encrypted archives are not currently - supported
-Only the DEFLATE (zlib-compression) and the STORE (uncompressed - data) zip methods are supported.
-The size of the archive is limited to 2 G-byte (32 bits).
-Comments for individual files is not supported when creating zip - archives. The zip archive comment for the whole zip archive is - supported.
-There is currently no support for altering an existing zip archive. - To add or remove a file from an archive, the whole archive must be - recreated.
+Zip64 archives are not supported.
+Password-protected and encrypted archives are not supported.
+Only the DEFLATE (zlib-compression) and the STORE (uncompressed + data) zip methods are supported.
+The archive size is limited to 2 GB (32 bits).
+Comments for individual files are not supported when creating zip + archives. The zip archive comment for the whole zip archive is + supported.
+Changing a zip archive is not supported. + To add or remove a file from an archive, the whole archive must be + recreated.
+The record
The record
The record
The record
the name of the file
+The filename
file info as in
-
File information as in
+
the comment for the file in the zip archive
+The comment for the file in the zip archive
the offset of the file in the zip archive (used internally)
+The file offset in the zip archive (used internally)
the compressed size of the file (the uncompressed size is found
- in
The size of the compressed file (the size of the uncompressed
+ file is found in
These options are described in
These options are described in
As returned by
As returned by
+
The
As synonyms, the functions
The file-list is a list of files, with paths relative to the - current directory, they will be stored with this path in the - archive. Files may also be specified with data in binaries, - to create an archive directly from data.
-Files will be compressed using the DEFLATE compression, as
- described in the Appnote.txt file. However, files will be
- stored without compression if they already are compressed.
- The
It is possible to override the default behavior and
- explicitly control what types of files that should be
- compressed by using the
The following options are available:
-By default, the
Print an informational message about each file - being added.
-The output will not be to a file, but instead as a tuple
-
Add a comment to the zip-archive.
-Use the given directory as current directory, it will be - prepended to file names when adding them, although it will not - be in the zip-archive. (Acting like a file:set_cwd/1, but - without changing the global cwd property.)
-Controls what types of files will be
- compressed. It is by default set to
means that all files will be compressed (as long
- as they pass the
means that only files with exactly these extensions - will be compressed.
adds these extensions to the list of compress - extensions.
deletes these extensions from the list of compress - extensions.
Controls what types of files will be uncompressed. It is by
- default set to
means that no files will be compressed.
means that files with these extensions will be - uncompressed.
adds these extensions to the list of uncompress - extensions.
deletes these extensions from the list of uncompress - extensions.
The
If the
The following options are available:
-By default, all files will be extracted from the zip
- archive. With the
By default, the
By default, all existing files with the same name as file in
- the zip archive will be overwritten. With the
Print an informational message as each file is being - extracted.
-Instead of extracting to the current directory, the
-
Use the given directory as current directory, it will be - prepended to file names when extracting them from the - zip-archive. (Acting like a file:set_cwd/1, but without - changing the global cwd property.)
-The
For example:
+Calls
Both
The
Example:
> Name = "dummy.zip". "dummy.zip" @@ -380,97 +232,300 @@
The
As synonyms, the functions
The result value is the tuple
The following options are available:
+One option is available:
By default, the
By default, this function opens the zip file in
+
The
Prints all filenames in the zip archive
The
Prints filenames and information about all files in the zip archive
+
The
The archive must be closed with
The
If argument
Options:
+By default, all files are extracted from the zip
+ archive. With option
By default, this function opens the
+ zip file in
By default, all files with the same name as files in
+ the zip archive are overwritten. With option
Prints an informational message for each extracted file.
+Instead of extracting to the current directory,
+ the result is given as a list of tuples
+
Uses the specified directory as current directory. It is
+ prepended to filenames when extracting them from the
+ zip archive. (Acting like
+
The
Creates a zip archive containing the files specified in
+
+Files are compressed using DEFLATE compression, as
+ described in the "Appnote.txt" file. However, files are
+ stored without compression if they are already compressed.
+
It is possible to override the default behavior and control
+ which types of files are to be compressed by using options
+
+For a file to be compressed, its extension must match the
+
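As a hedged illustration of these conditions, the sketch below adds `".png"` to the list of uncompress extensions so that the PNG entry is stored rather than deflated. The file names and contents are assumptions made up for the example, and the archive is created in memory.

```erlang
%% Entry names and data are illustrative only.
Files = [{"notes.txt", <<"plain text that is worth deflating">>},
         {"pic.png",   <<"pretend this is already compressed">>}],

%% "notes.txt" passes both conditions and is deflated;
%% "pic.png" matches the uncompress condition and is stored as is.
{ok, {_ArchiveName, ZipBin}} =
    zip:create("demo.zip", Files,
               [memory, {uncompress, {add, [".png"]}}]),

%% The listing starts with the archive comment, followed by one
%% entry per file.
{ok, Entries} = zip:list_dir(ZipBin),
io:format("~p~n", [Entries]).
```

Using `{add, Extensions}` extends the default list of uncompress extensions, whereas passing a plain list would replace it.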
Options:
+By default, this function opens the
+ zip file in mode
Prints an informational message about each added file.
+The output is not to a file, but instead as a tuple
+
Adds a comment to the zip archive.
+Uses the specified directory as current work directory
+ (
+Controls which types of files are compressed. Defaults to
+
All files are compressed (as long
+ as they pass the
Only files with exactly these extensions + are compressed.
+Adds these extensions to the list of compress + extensions.
+Deletes these extensions from the list of compress + extensions.
+Controls which types of files are stored uncompressed. Defaults to
+
No files are compressed.
+Files with these extensions are uncompressed.
+Adds these extensions to the list of uncompress + extensions.
+Deletes these extensions from the list of uncompress + extensions.
+Closes a zip archive, previously opened with
+
The
The files will be unzipped to memory or to file, depending on
- the options given to the
Extracts one or all files from an open archive.
+The files are unzipped to memory or to file, depending on
+ the options specified to function
+
The
Returns the file list of an open zip archive. The first returned + element is the zip archive comment.
+Opens a zip archive, and reads and saves its directory. This
+ means that reading files from the archive later is
+ faster than unzipping files one at a time with
+
The archive must be closed with
+
The