Diffstat (limited to 'lib/stdlib/doc')
71 files changed, 21698 insertions, 18053 deletions
diff --git a/lib/stdlib/doc/src/array.xml b/lib/stdlib/doc/src/array.xml index bff98245bf..db0ab42372 100644 --- a/lib/stdlib/doc/src/array.xml +++ b/lib/stdlib/doc/src/array.xml @@ -1,7 +1,8 @@ <?xml version="1.0" encoding="utf-8" ?> <!DOCTYPE erlref SYSTEM "erlref.dtd"> + <erlref> -<header> + <header> <copyright> <year>2007</year><year>2016</year> <holder>Ericsson AB. All Rights Reserved.</holder> @@ -21,469 +22,541 @@ </legalnotice> -<title>array</title> -<prepared></prepared> -<responsible></responsible> -<docno>1</docno> -<approved></approved> -<checked></checked> -<date></date> -<rev>A</rev> -<file>array.xml</file></header> -<module>array</module> -<modulesummary>Functional, extendible arrays.</modulesummary> -<description> -<p>Functional, extendible arrays. Arrays can have fixed size, or -can grow automatically as needed. A default value is used for entries -that have not been explicitly set.</p> - - <p>Arrays uses <em>zero</em> based indexing. This is a deliberate design -choice and differs from other erlang datastructures, e.g. tuples.</p> - - <p>Unless specified by the user when the array is created, the default - value is the atom <c>undefined</c>. There is no difference between an - unset entry and an entry which has been explicitly set to the same - value as the default one (cf. <seealso marker="#reset-2">reset/2</seealso>). If you need to -differentiate between unset and set entries, you must make sure that -the default value cannot be confused with the values of set entries.</p> - - <p>The array never shrinks automatically; if an index <c>I</c> has been used - successfully to set an entry, all indices in the range [0,<c>I</c>] will - stay accessible unless the array size is explicitly changed by - calling <seealso marker="#resize-2">resize/2</seealso>.</p> - - <p>Examples: - </p><pre> %% Create a fixed-size array with entries 0-9 set to 'undefined' - A0 = array:new(10). - 10 = array:size(A0). 
- - %% Create an extendible array and set entry 17 to 'true', - %% causing the array to grow automatically - A1 = array:set(17, true, array:new()). - 18 = array:size(A1). - - %% Read back a stored value - true = array:get(17, A1). - - %% Accessing an unset entry returns the default value - undefined = array:get(3, A1). - - %% Accessing an entry beyond the last set entry also returns the - %% default value, if the array does not have fixed size - undefined = array:get(18, A1). - - %% "sparse" functions ignore default-valued entries - A2 = array:set(4, false, A1). - [{4, false}, {17, true}] = array:sparse_to_orddict(A2). - - %% An extendible array can be made fixed-size later - A3 = array:fix(A2). - - %% A fixed-size array does not grow automatically and does not - %% allow accesses beyond the last set entry - {'EXIT',{badarg,_}} = (catch array:set(18, true, A3)). - {'EXIT',{badarg,_}} = (catch array:get(18, A3)).</pre></description> -<datatypes> - <datatype> - <name name="array" n_vars="1"/> - <desc> - <p>A functional, extendible array. The representation is - not documented and is subject to change without notice. Note that - arrays cannot be directly compared for equality.</p> - </desc> - </datatype> - <datatype> - <name name="array" n_vars="0"/> - </datatype> - <datatype> - <name name="array_indx"/> - </datatype> - <datatype> - <name name="array_opts"/> - </datatype> - <datatype> - <name name="array_opt"/> - </datatype> - <datatype> - <name name="indx_pairs"/> - </datatype> - <datatype> - <name name="indx_pair"/> - </datatype> -</datatypes> - -<funcs> -<func> -<name name="default" arity="1"/> -<fsummary>Get the value used for uninitialized entries.</fsummary> - -<desc><marker id="default-1"/> - -<p>Get the value used for uninitialized entries. 
- </p> -<p><em>See also:</em> <seealso marker="#new-2">new/2</seealso>.</p> -</desc></func> -<func> -<name name="fix" arity="1"/> -<fsummary>Fix the size of the array.</fsummary> - -<desc><marker id="fix-1"/> - -<p>Fix the size of the array. This prevents it from growing - automatically upon insertion; see also <seealso marker="#set-3">set/3</seealso>.</p> -<p><em>See also:</em> <seealso marker="#relax-1">relax/1</seealso>.</p> -</desc></func> -<func> -<name name="foldl" arity="3"/> -<fsummary>Fold the elements of the array using the given function and - initial accumulator value.</fsummary> -<desc><marker id="foldl-3"/> -<p>Fold the elements of the array using the given function and - initial accumulator value. The elements are visited in order from the - lowest index to the highest. If <c><anno>Function</anno></c> is not a function, the - call fails with reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#foldr-3">foldr/3</seealso>, <seealso marker="#map-2">map/2</seealso>, <seealso marker="#sparse_foldl-3">sparse_foldl/3</seealso>.</p> -</desc></func> -<func> -<name name="foldr" arity="3"/> -<fsummary>Fold the elements of the array right-to-left using the given - function and initial accumulator value.</fsummary> -<desc><marker id="foldr-3"/> - -<p>Fold the elements of the array right-to-left using the given - function and initial accumulator value. The elements are visited in - order from the highest index to the lowest. If <c><anno>Function</anno></c> is not a - function, the call fails with reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#foldl-3">foldl/3</seealso>, <seealso marker="#map-2">map/2</seealso>.</p> -</desc></func> -<func> -<name name="from_list" arity="1"/> -<fsummary>Equivalent to from_list(List, undefined). 
-</fsummary> - -<desc><marker id="from_list-1"/> -<p>Equivalent to <seealso marker="#from_list-2">from_list(<c><anno>List</anno></c>, undefined)</seealso>.</p> -</desc></func> -<func> -<name name="from_list" arity="2"/> -<fsummary>Convert a list to an extendible array.</fsummary> - -<desc><marker id="from_list-2"/> - -<p>Convert a list to an extendible array. <c><anno>Default</anno></c> is used as the value - for uninitialized entries of the array. If <c><anno>List</anno></c> is not a proper list, - the call fails with reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#new-2">new/2</seealso>, <seealso marker="#to_list-1">to_list/1</seealso>.</p> -</desc></func> -<func> -<name name="from_orddict" arity="1"/> -<fsummary>Equivalent to from_orddict(Orddict, undefined). -</fsummary> - -<desc><marker id="from_orddict-1"/> -<p>Equivalent to <seealso marker="#from_orddict-2">from_orddict(<c><anno>Orddict</anno></c>, undefined)</seealso>.</p> -</desc></func> -<func> -<name name="from_orddict" arity="2"/> -<fsummary>Convert an ordered list of pairs {Index, Value} to a - corresponding extendible array.</fsummary> - -<desc><marker id="from_orddict-2"/> - -<p>Convert an ordered list of pairs <c>{Index, <anno>Value</anno>}</c> to a - corresponding extendible array. <c><anno>Default</anno></c> is used as the value for - uninitialized entries of the array. If <c><anno>Orddict</anno></c> is not a proper, - ordered list of pairs whose first elements are nonnegative - integers, the call fails with reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#new-2">new/2</seealso>, <seealso marker="#to_orddict-1">to_orddict/1</seealso>.</p> -</desc></func> -<func> -<name name="get" arity="2"/> -<fsummary>Get the value of entry I.</fsummary> - -<desc><marker id="get-2"/> - -<p>Get the value of entry <c><anno>I</anno></c>. 
If <c><anno>I</anno></c> is not a nonnegative - integer, or if the array has fixed size and <c><anno>I</anno></c> is larger than the - maximum index, the call fails with reason <c>badarg</c>.</p> - - <p>If the array does not have fixed size, this function will return the - default value for any index <c><anno>I</anno></c> greater than <c>size(<anno>Array</anno>)-1</c>.</p> -<p><em>See also:</em> <seealso marker="#set-3">set/3</seealso>.</p> -</desc></func> -<func> -<name name="is_array" arity="1"/> -<fsummary>Returns true if X appears to be an array, otherwise false.</fsummary> - -<desc><marker id="is_array-1"/> - -<p>Returns <c>true</c> if <c><anno>X</anno></c> appears to be an array, otherwise <c>false</c>. - Note that the check is only shallow; there is no guarantee that <c><anno>X</anno></c> - is a well-formed array representation even if this function returns - <c>true</c>.</p> -</desc></func> -<func> -<name name="is_fix" arity="1"/> -<fsummary>Check if the array has fixed size.</fsummary> - -<desc><marker id="is_fix-1"/> - -<p>Check if the array has fixed size. - Returns <c>true</c> if the array is fixed, otherwise <c>false</c>.</p> -<p><em>See also:</em> <seealso marker="#fix-1">fix/1</seealso>.</p> -</desc></func> -<func> -<name name="map" arity="2"/> -<fsummary>Map the given function onto each element of the array.</fsummary> -<desc><marker id="map-2"/> - -<p>Map the given function onto each element of the array. The - elements are visited in order from the lowest index to the highest. - If <c><anno>Function</anno></c> is not a function, the call fails with reason <c>badarg</c>. 
- </p> -<p><em>See also:</em> <seealso marker="#foldl-3">foldl/3</seealso>, <seealso marker="#foldr-3">foldr/3</seealso>, <seealso marker="#sparse_map-2">sparse_map/2</seealso>.</p> -</desc></func> -<func> -<name name="new" arity="0"/> -<fsummary>Create a new, extendible array with initial size zero.</fsummary> - -<desc><marker id="new-0"/> - -<p>Create a new, extendible array with initial size zero.</p> -<p><em>See also:</em> <seealso marker="#new-1">new/1</seealso>, <seealso marker="#new-2">new/2</seealso>.</p> -</desc></func> -<func> -<name name="new" arity="1"/> -<fsummary>Create a new array according to the given options.</fsummary> - -<desc><marker id="new-1"/> - -<p>Create a new array according to the given options. By default, -the array is extendible and has initial size zero. Array indices -start at 0.</p> - - <p><c><anno>Options</anno></c> is a single term or a list of terms, selected from the - following: - </p><taglist> - <tag><c>N::integer() >= 0</c> or <c>{size, N::integer() >= 0}</c></tag> - <item><p>Specifies the initial size of the array; this also implies - <c>{fixed, true}</c>. If <c>N</c> is not a nonnegative integer, the call - fails with reason <c>badarg</c>.</p></item> - <tag><c>fixed</c> or <c>{fixed, true}</c></tag> - <item><p>Creates a fixed-size array; see also <seealso marker="#fix-1">fix/1</seealso>.</p></item> - <tag><c>{fixed, false}</c></tag> - <item><p>Creates an extendible (non fixed-size) array.</p></item> - <tag><c>{default, Value}</c></tag> - <item><p>Sets the default value for the array to <c>Value</c>.</p></item> - </taglist><p> -Options are processed in the order they occur in the list, i.e., -later options have higher precedence.</p> - - <p>The default value is used as the value of uninitialized entries, and -cannot be changed once the array has been created.</p> - - <p>Examples: - </p><pre> array:new(100)</pre><p> creates a fixed-size array of size 100. 
- </p><pre> array:new({default,0})</pre><p> creates an empty, extendible array - whose default value is 0. - </p><pre> array:new([{size,10},{fixed,false},{default,-1}])</pre><p> creates an - extendible array with initial size 10 whose default value is -1. - </p> -<p><em>See also:</em> <seealso marker="#fix-1">fix/1</seealso>, <seealso marker="#from_list-2">from_list/2</seealso>, <seealso marker="#get-2">get/2</seealso>, <seealso marker="#new-0">new/0</seealso>, <seealso marker="#new-2">new/2</seealso>, <seealso marker="#set-3">set/3</seealso>.</p> -</desc></func> -<func> -<name name="new" arity="2"/> -<fsummary>Create a new array according to the given size and options.</fsummary> - -<desc><marker id="new-2"/> - -<p>Create a new array according to the given size and options. If - <c><anno>Size</anno></c> is not a nonnegative integer, the call fails with reason - <c>badarg</c>. By default, the array has fixed size. Note that any size - specifications in <c><anno>Options</anno></c> will override the <c><anno>Size</anno></c> parameter.</p> - - <p>If <c><anno>Options</anno></c> is a list, this is simply equivalent to <c>new([{size, - <anno>Size</anno>} | <anno>Options</anno>]</c>, otherwise it is equivalent to <c>new([{size, <anno>Size</anno>} | - [<anno>Options</anno>]]</c>. However, using this function directly is more efficient.</p> - - <p>Example: - </p><pre> array:new(100, {default,0})</pre><p> creates a fixed-size array of size - 100, whose default value is 0. - </p> -<p><em>See also:</em> <seealso marker="#new-1">new/1</seealso>.</p> -</desc></func> -<func> -<name name="relax" arity="1"/> -<fsummary>Make the array resizable.</fsummary> - -<desc><marker id="relax-1"/> - -<p>Make the array resizable. 
(Reverses the effects of <seealso marker="#fix-1">fix/1</seealso>.)</p> -<p><em>See also:</em> <seealso marker="#fix-1">fix/1</seealso>.</p> -</desc></func> -<func> -<name name="reset" arity="2"/> -<fsummary>Reset entry I to the default value for the array.</fsummary> - -<desc><marker id="reset-2"/> - -<p>Reset entry <c><anno>I</anno></c> to the default value for the array. - If the value of entry <c><anno>I</anno></c> is the default value the array will be - returned unchanged. Reset will never change size of the array. - Shrinking can be done explicitly by calling <seealso marker="#resize-2">resize/2</seealso>.</p> - - <p>If <c><anno>I</anno></c> is not a nonnegative integer, or if the array has fixed size - and <c><anno>I</anno></c> is larger than the maximum index, the call fails with reason - <c>badarg</c>; cf. <seealso marker="#set-3">set/3</seealso> - </p> -<p><em>See also:</em> <seealso marker="#new-2">new/2</seealso>, <seealso marker="#set-3">set/3</seealso>.</p> -</desc></func> -<func> -<name name="resize" arity="1"/> -<fsummary>Change the size of the array to that reported by sparse_size/1.</fsummary> - -<desc><marker id="resize-1"/> - -<p>Change the size of the array to that reported by <seealso marker="#sparse_size-1">sparse_size/1</seealso>. If the given array has fixed size, the resulting - array will also have fixed size.</p> -<p><em>See also:</em> <seealso marker="#resize-2">resize/2</seealso>, <seealso marker="#sparse_size-1">sparse_size/1</seealso>.</p> -</desc></func> -<func> -<name name="resize" arity="2"/> -<fsummary>Change the size of the array.</fsummary> - -<desc><marker id="resize-2"/> - -<p>Change the size of the array. If <c><anno>Size</anno></c> is not a nonnegative - integer, the call fails with reason <c>badarg</c>. 
If the given array has - fixed size, the resulting array will also have fixed size.</p> -</desc></func> -<func> -<name name="set" arity="3"/> -<fsummary>Set entry I of the array to Value.</fsummary> - -<desc><marker id="set-3"/> - -<p>Set entry <c><anno>I</anno></c> of the array to <c><anno>Value</anno></c>. If <c><anno>I</anno></c> is not a - nonnegative integer, or if the array has fixed size and <c><anno>I</anno></c> is larger - than the maximum index, the call fails with reason <c>badarg</c>.</p> - - <p>If the array does not have fixed size, and <c><anno>I</anno></c> is greater than - <c>size(<anno>Array</anno>)-1</c>, the array will grow to size <c><anno>I</anno>+1</c>. - </p> -<p><em>See also:</em> <seealso marker="#get-2">get/2</seealso>, <seealso marker="#reset-2">reset/2</seealso>.</p> -</desc></func> -<func> -<name name="size" arity="1"/> -<fsummary>Get the number of entries in the array.</fsummary> - -<desc><marker id="size-1"/> - -<p>Get the number of entries in the array. Entries are numbered - from 0 to <c>size(<anno>Array</anno>)-1</c>; hence, this is also the index of the first - entry that is guaranteed to not have been previously set.</p> -<p><em>See also:</em> <seealso marker="#set-3">set/3</seealso>, <seealso marker="#sparse_size-1">sparse_size/1</seealso>.</p> -</desc></func> -<func> -<name name="sparse_foldl" arity="3"/> -<fsummary>Fold the elements of the array using the given function and - initial accumulator value, skipping default-valued entries.</fsummary> -<desc><marker id="sparse_foldl-3"/> - -<p>Fold the elements of the array using the given function and - initial accumulator value, skipping default-valued entries. The - elements are visited in order from the lowest index to the highest. - If <c><anno>Function</anno></c> is not a function, the call fails with reason <c>badarg</c>. 
- </p> -<p><em>See also:</em> <seealso marker="#foldl-3">foldl/3</seealso>, <seealso marker="#sparse_foldr-3">sparse_foldr/3</seealso>.</p> -</desc></func> -<func> -<name name="sparse_foldr" arity="3"/> -<fsummary>Fold the elements of the array right-to-left using the given - function and initial accumulator value, skipping default-valued - entries.</fsummary> -<desc><marker id="sparse_foldr-3"/> - -<p>Fold the elements of the array right-to-left using the given - function and initial accumulator value, skipping default-valued - entries. The elements are visited in order from the highest index to - the lowest. If <c><anno>Function</anno></c> is not a function, the call fails with - reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#foldr-3">foldr/3</seealso>, <seealso marker="#sparse_foldl-3">sparse_foldl/3</seealso>.</p> -</desc></func> -<func> -<name name="sparse_map" arity="2"/> -<fsummary>Map the given function onto each element of the array, skipping - default-valued entries.</fsummary> -<desc><marker id="sparse_map-2"/> - -<p>Map the given function onto each element of the array, skipping - default-valued entries. The elements are visited in order from the - lowest index to the highest. If <c><anno>Function</anno></c> is not a function, the - call fails with reason <c>badarg</c>. - </p> -<p><em>See also:</em> <seealso marker="#map-2">map/2</seealso>.</p> -</desc></func> -<func> -<name name="sparse_size" arity="1"/> -<fsummary>Get the number of entries in the array up until the last - non-default valued entry.</fsummary> - -<desc><marker id="sparse_size-1"/> - -<p>Get the number of entries in the array up until the last - non-default valued entry. 
In other words, returns <c>I+1</c> if <c>I</c> is the - last non-default valued entry in the array, or zero if no such entry - exists.</p> -<p><em>See also:</em> <seealso marker="#resize-1">resize/1</seealso>, <seealso marker="#size-1">size/1</seealso>.</p> -</desc></func> -<func> -<name name="sparse_to_list" arity="1"/> -<fsummary>Converts the array to a list, skipping default-valued entries.</fsummary> - -<desc><marker id="sparse_to_list-1"/> - -<p>Converts the array to a list, skipping default-valued entries. - </p> -<p><em>See also:</em> <seealso marker="#to_list-1">to_list/1</seealso>.</p> -</desc></func> -<func> -<name name="sparse_to_orddict" arity="1"/> -<fsummary>Convert the array to an ordered list of pairs {Index, Value}, - skipping default-valued entries.</fsummary> - -<desc><marker id="sparse_to_orddict-1"/> - -<p>Convert the array to an ordered list of pairs <c>{Index, <anno>Value</anno>}</c>, - skipping default-valued entries. - </p> -<p><em>See also:</em> <seealso marker="#to_orddict-1">to_orddict/1</seealso>.</p> -</desc></func> -<func> -<name name="to_list" arity="1"/> -<fsummary>Converts the array to a list.</fsummary> - -<desc><marker id="to_list-1"/> - -<p>Converts the array to a list. - </p> -<p><em>See also:</em> <seealso marker="#from_list-2">from_list/2</seealso>, <seealso marker="#sparse_to_list-1">sparse_to_list/1</seealso>.</p> -</desc></func> -<func> -<name name="to_orddict" arity="1"/> -<fsummary>Convert the array to an ordered list of pairs {Index, Value}.</fsummary> - -<desc><marker id="to_orddict-1"/> - -<p>Convert the array to an ordered list of pairs <c>{Index, <anno>Value</anno>}</c>. 
- </p> -<p><em>See also:</em> <seealso marker="#from_orddict-2">from_orddict/2</seealso>, <seealso marker="#sparse_to_orddict-1">sparse_to_orddict/1</seealso>.</p> -</desc></func></funcs> - + <title>array</title> + <prepared></prepared> + <responsible></responsible> + <docno>1</docno> + <approved></approved> + <checked></checked> + <date></date> + <rev>A</rev> + <file>array.xml</file> + </header> + <module>array</module> + <modulesummary>Functional, extendible arrays.</modulesummary> + <description> + <p>Functional, extendible arrays. Arrays can have fixed size, or can grow + automatically as needed. A default value is used for entries that have not + been explicitly set.</p> + + <p>Arrays uses <em>zero</em>-based indexing. This is a deliberate design + choice and differs from other Erlang data structures, for example, + tuples.</p> + + <p>Unless specified by the user when the array is created, the default + value is the atom <c>undefined</c>. There is no difference between an + unset entry and an entry that has been explicitly set to the same value + as the default one (compare + <seealso marker="#reset-2"><c>reset/2</c></seealso>). If you need to + differentiate between unset and set entries, ensure that the default value + cannot be confused with the values of set entries.</p> + + <p>The array never shrinks automatically. If an index <c>I</c> has been used + to set an entry successfully, all indices in the range [0,<c>I</c>] stay + accessible unless the array size is explicitly changed by calling + <seealso marker="#resize-2"><c>resize/2</c></seealso>.</p> + + <p><em>Examples:</em></p> + + <p>Create a fixed-size array with entries 0-9 set to <c>undefined</c>:</p> + + <pre> +A0 = array:new(10). +10 = array:size(A0).</pre> + + <p>Create an extendible array and set entry 17 to <c>true</c>, causing the + array to grow automatically:</p> + + <pre> +A1 = array:set(17, true, array:new()). 
+18 = array:size(A1).</pre> + + <p>Read back a stored value:</p> + + <pre> +true = array:get(17, A1).</pre> + + <p>Accessing an unset entry returns default value:</p> + + <pre> +undefined = array:get(3, A1)</pre> + + <p>Accessing an entry beyond the last set entry also returns the default + value, if the array does not have fixed size:</p> + + <pre> +undefined = array:get(18, A1).</pre> + + <p>"Sparse" functions ignore default-valued entries:</p> + + <pre> +A2 = array:set(4, false, A1). +[{4, false}, {17, true}] = array:sparse_to_orddict(A2).</pre> + + <p>An extendible array can be made fixed-size later:</p> + + <pre> +A3 = array:fix(A2).</pre> + + <p>A fixed-size array does not grow automatically and does not allow + accesses beyond the last set entry:</p> + + <pre> +{'EXIT',{badarg,_}} = (catch array:set(18, true, A3)). +{'EXIT',{badarg,_}} = (catch array:get(18, A3)).</pre> + </description> + + <datatypes> + <datatype> + <name name="array" n_vars="1"/> + <desc> + <p>A functional, extendible array. The representation is not documented + and is subject to change without notice. Notice that arrays cannot be + directly compared for equality.</p> + </desc> + </datatype> + <datatype> + <name name="array" n_vars="0"/> + </datatype> + <datatype> + <name name="array_indx"/> + </datatype> + <datatype> + <name name="array_opts"/> + </datatype> + <datatype> + <name name="array_opt"/> + </datatype> + <datatype> + <name name="indx_pairs"/> + </datatype> + <datatype> + <name name="indx_pair"/> + </datatype> + </datatypes> + + <funcs> + <func> + <name name="default" arity="1"/> + <fsummary>Get the value used for uninitialized entries.</fsummary> + <desc><marker id="default-1"/> + <p>Gets the value used for uninitialized entries.</p> + <p>See also <seealso marker="#new-2"><c>new/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="fix" arity="1"/> + <fsummary>Fix the array size.</fsummary> + <desc><marker id="fix-1"/> + <p>Fixes the array size. 
This prevents it from growing automatically + upon insertion.</p> + <p>See also <seealso marker="#set-3"><c>set/3</c></seealso> and + <seealso marker="#relax-1"><c>relax/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="foldl" arity="3"/> + <fsummary>Fold the array elements using the specified function and initial + accumulator value.</fsummary> + <desc><marker id="foldl-3"/> + <p>Folds the array elements using the specified function and initial + accumulator value. The elements are visited in order from the lowest + index to the highest. If <c><anno>Function</anno></c> is not a + function, the call fails with reason <c>badarg</c>.</p> + <p>See also <seealso marker="#foldr-3"><c>foldr/3</c></seealso>, + <seealso marker="#map-2"><c>map/2</c></seealso>, + <seealso marker="#sparse_foldl-3"><c>sparse_foldl/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="foldr" arity="3"/> + <fsummary>Fold the array elements right-to-left using the specified + function and initial accumulator value.</fsummary> + <desc><marker id="foldr-3"/> + <p>Folds the array elements right-to-left using the specified function + and initial accumulator value. The elements are visited in order from + the highest index to the lowest. If <c><anno>Function</anno></c> is + not a function, the call fails with reason <c>badarg</c>.</p> + <p>See also <seealso marker="#foldl-3"><c>foldl/3</c></seealso>, + <seealso marker="#map-2"><c>map/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="from_list" arity="1"/> + <fsummary>Equivalent to <c>from_list(List, undefined)</c>.</fsummary> + <desc><marker id="from_list-1"/> + <p>Equivalent to + <seealso marker="#from_list-2"><c>from_list(<anno>List</anno>, undefined)</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="from_list" arity="2"/> + <fsummary>Convert a list to an extendible array.</fsummary> + <desc><marker id="from_list-2"/> + <p>Converts a list to an extendible array. 
<c><anno>Default</anno></c> + is used as the value for uninitialized entries of the array. If + <c><anno>List</anno></c> is not a proper list, the call fails with + reason <c>badarg</c>.</p> + <p>See also <seealso marker="#new-2"><c>new/2</c></seealso>, + <seealso marker="#to_list-1"><c>to_list/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="from_orddict" arity="1"/> + <fsummary>Equivalent to <c>from_orddict(Orddict, undefined)</c>. + </fsummary> + <desc><marker id="from_orddict-1"/> + <p>Equivalent to + <seealso marker="#from_orddict-2"><c>from_orddict(<anno>Orddict</anno>, undefined)</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="from_orddict" arity="2"/> + <fsummary>Convert an ordered list of pairs <c>{Index, Value}</c> to a + corresponding extendible array.</fsummary> + <desc><marker id="from_orddict-2"/> + <p>Converts an ordered list of pairs <c>{Index, <anno>Value</anno>}</c> + to a corresponding extendible array. <c><anno>Default</anno></c> is + used as the value for uninitialized entries of the array. If + <c><anno>Orddict</anno></c> is not a proper, ordered list of pairs + whose first elements are non-negative integers, the call fails with + reason <c>badarg</c>.</p> + <p>See also <seealso marker="#new-2"><c>new/2</c></seealso>, + <seealso marker="#to_orddict-1"><c>to_orddict/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="get" arity="2"/> + <fsummary>Get the value of entry <c>I</c>.</fsummary> + <desc><marker id="get-2"/> + <p>Gets the value of entry <c><anno>I</anno></c>. 
If + <c><anno>I</anno></c> is not a non-negative integer, or if the array + has fixed size and <c><anno>I</anno></c> is larger than the maximum + index, the call fails with reason <c>badarg</c>.</p> + <p>If the array does not have fixed size, the default value for any + index <c><anno>I</anno></c> greater than + <c>size(<anno>Array</anno>)-1</c> is returned.</p> + <p>See also <seealso marker="#set-3"><c>set/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="is_array" arity="1"/> + <fsummary>Returns <c>true</c> if <c>X</c> is an array, otherwise + <c>false</c>.</fsummary> + <desc><marker id="is_array-1"/> + <p>Returns <c>true</c> if <c><anno>X</anno></c> is an array, otherwise + <c>false</c>. Notice that the check is only shallow, as there is no + guarantee that <c><anno>X</anno></c> is a well-formed array + representation even if this function returns <c>true</c>.</p> + </desc> + </func> + + <func> + <name name="is_fix" arity="1"/> + <fsummary>Check if the array has fixed size.</fsummary> + <desc><marker id="is_fix-1"/> + <p>Checks if the array has fixed size. Returns <c>true</c> if the array + is fixed, otherwise <c>false</c>.</p> + <p>See also <seealso marker="#fix-1"><c>fix/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="map" arity="2"/> + <fsummary>Map the specified function onto each array element.</fsummary> + <desc><marker id="map-2"/> + <p>Maps the specified function onto each array element. The elements are + visited in order from the lowest index to the highest. If + <c><anno>Function</anno></c> is not a function, the call fails with + reason <c>badarg</c>.</p> + <p>See also <seealso marker="#foldl-3"><c>foldl/3</c></seealso>, + <seealso marker="#foldr-3"><c>foldr/3</c></seealso>, + <seealso marker="#sparse_map-2"><c>sparse_map/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="new" arity="0"/> + <fsummary>Create a new, extendible array with initial size zero. 
+ </fsummary> + <desc><marker id="new-0"/> + <p>Creates a new, extendible array with initial size zero.</p> + <p>See also <seealso marker="#new-1"><c>new/1</c></seealso>, + <seealso marker="#new-2"><c>new/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="new" arity="1"/> + <fsummary>Create a new array according to the specified options. + </fsummary> + <desc><marker id="new-1"/> + <p>Creates a new array according to the specified otions. By default, + the array is extendible and has initial size zero. Array indices + start at <c>0</c>.</p> + <p><c><anno>Options</anno></c> is a single term or a list of terms, + selected from the following:</p> + <taglist> + <tag><c>N::integer() >= 0</c> or <c>{size, N::integer() >= 0}</c> + </tag> + <item><p>Specifies the initial array size; this also implies + <c>{fixed, true}</c>. If <c>N</c> is not a non-negative integer, the + call fails with reason <c>badarg</c>.</p></item> + <tag><c>fixed</c> or <c>{fixed, true}</c></tag> + <item><p>Creates a fixed-size array. See also + <seealso marker="#fix-1"><c>fix/1</c></seealso>.</p></item> + <tag><c>{fixed, false}</c></tag> + <item><p>Creates an extendible (non-fixed-size) array.</p></item> + <tag><c>{default, Value}</c></tag> + <item><p>Sets the default value for the array to <c>Value</c>.</p> + </item> + </taglist> + <p>Options are processed in the order they occur in the list, that is, + later options have higher precedence.</p> + <p>The default value is used as the value of uninitialized entries, and + cannot be changed once the array has been created.</p> + <p><em>Examples:</em></p> + <pre> +array:new(100)</pre> + <p>creates a fixed-size array of size 100.</p> + <pre> +array:new({default,0})</pre> + <p>creates an empty, extendible array whose default value is <c>0</c>. 
+ </p> + <pre> +array:new([{size,10},{fixed,false},{default,-1}])</pre> + <p>creates an extendible array with initial size 10 whose default value + is <c>-1</c>.</p> + <p>See also <seealso marker="#fix-1"><c>fix/1</c></seealso>, + <seealso marker="#from_list-2"><c>from_list/2</c></seealso>, + <seealso marker="#get-2"><c>get/2</c></seealso>, + <seealso marker="#new-0"><c>new/0</c></seealso>, + <seealso marker="#new-2"><c>new/2</c></seealso>, + <seealso marker="#set-3"><c>set/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="new" arity="2"/> + <fsummary>Create a new array according to the specified size and options. + </fsummary> + <desc><marker id="new-2"/> + <p>Creates a new array according to the specified size and options. If + <c><anno>Size</anno></c> is not a non-negative integer, the call fails + with reason <c>badarg</c>. By default, the array has fixed size. + Notice that any size specifications in <c><anno>Options</anno></c> + override parameter <c><anno>Size</anno></c>.</p> + <p>If <c><anno>Options</anno></c> is a list, this is equivalent to + <c>new([{size, <anno>Size</anno>} | <anno>Options</anno>]</c>, + otherwise it is equivalent to <c>new([{size, <anno>Size</anno>} | + [<anno>Options</anno>]]</c>. However, using this function directly is + more efficient.</p> + <p><em>Example:</em></p> + <pre> +array:new(100, {default,0})</pre> + <p>creates a fixed-size array of size 100, whose default value is + <c>0</c>.</p> + <p>See also <seealso marker="#new-1"><c>new/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="relax" arity="1"/> + <fsummary>Make the array resizable.</fsummary> + <desc><marker id="relax-1"/> + <p>Makes the array resizable. (Reverses the effects of + <seealso marker="#fix-1"><c>fix/1</c></seealso>.)</p> + <p>See also <seealso marker="#fix-1"><c>fix/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="reset" arity="2"/> + <fsummary>Reset entry <c>I</c> to the default value for the array. 
+ </fsummary> + <desc><marker id="reset-2"/> + <p>Resets entry <c><anno>I</anno></c> to the default value for the + array. If the value of entry <c><anno>I</anno></c> is the default + value, the array is returned unchanged. Reset never changes the array + size. Shrinking can be done explicitly by calling + <seealso marker="#resize-2"><c>resize/2</c></seealso>.</p> + <p>If <c><anno>I</anno></c> is not a non-negative integer, or if the + array has fixed size and <c><anno>I</anno></c> is larger than the + maximum index, the call fails with reason <c>badarg</c>; compare + <seealso marker="#set-3"><c>set/3</c></seealso></p> + <p>See also <seealso marker="#new-2"><c>new/2</c></seealso>, + <seealso marker="#set-3"><c>set/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="resize" arity="1"/> + <fsummary>Change the array size to that reported by <c>sparse_size/1</c>. + </fsummary> + <desc><marker id="resize-1"/> + <p>Changes the array size to that reported by + <seealso marker="#sparse_size-1"><c>sparse_size/1</c></seealso>. If + the specified array has fixed size, also the resulting array has fixed + size.</p> + <p>See also <seealso marker="#resize-2"><c>resize/2</c></seealso>, + <seealso marker="#sparse_size-1"><c>sparse_size/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="resize" arity="2"/> + <fsummary>Change the array size.</fsummary> + <desc><marker id="resize-2"/> + <p>Change the array size. If <c><anno>Size</anno></c> is not a + non-negative integer, the call fails with reason <c>badarg</c>. If + the specified array has fixed size, also the resulting array has fixed + size.</p> + </desc> + </func> + + <func> + <name name="set" arity="3"/> + <fsummary>Set entry <c>I</c> of the array to <c>Value</c>.</fsummary> + <desc><marker id="set-3"/> + <p>Sets entry <c><anno>I</anno></c> of the array to + <c><anno>Value</anno></c>. 
If <c><anno>I</anno></c> is not a + non-negative integer, or if the array has fixed size and + <c><anno>I</anno></c> is larger than the maximum index, the call + fails with reason <c>badarg</c>.</p> + <p>If the array does not have fixed size, and <c><anno>I</anno></c> is + greater than <c>size(<anno>Array</anno>)-1</c>, the array grows to + size <c><anno>I</anno>+1</c>.</p> + <p>See also <seealso marker="#get-2"><c>get/2</c></seealso>, + <seealso marker="#reset-2"><c>reset/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="size" arity="1"/> + <fsummary>Get the number of entries in the array.</fsummary> + <desc><marker id="size-1"/> + <p>Gets the number of entries in the array. Entries are numbered from + <c>0</c> to <c>size(<anno>Array</anno>)-1</c>. Hence, this is also the + index of the first entry that is guaranteed to not have been + previously set.</p> + <p>See also <seealso marker="#set-3"><c>set/3</c></seealso>, + <seealso marker="#sparse_size-1"><c>sparse_size/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_foldl" arity="3"/> + <fsummary>Fold the array elements using the specified function and initial + accumulator value, skipping default-valued entries.</fsummary> + <desc><marker id="sparse_foldl-3"/> + <p>Folds the array elements using the specified function and initial + accumulator value, skipping default-valued entries. The elements are + visited in order from the lowest index to the highest. 
If + <c><anno>Function</anno></c> is not a function, the call fails with + reason <c>badarg</c>.</p> + <p>See also <seealso marker="#foldl-3"><c>foldl/3</c></seealso>, + <seealso marker="#sparse_foldr-3"><c>sparse_foldr/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_foldr" arity="3"/> + <fsummary>Fold the array elements right-to-left using the specified + function and initial accumulator value, skipping default-valued + entries.</fsummary> + <desc><marker id="sparse_foldr-3"/> + <p>Folds the array elements right-to-left using the specified + function and initial accumulator value, skipping default-valued + entries. The elements are visited in order from the highest index to + the lowest. If <c><anno>Function</anno></c> is not a function, the + call fails with reason <c>badarg</c>.</p> + <p>See also <seealso marker="#foldr-3"><c>foldr/3</c></seealso>, + <seealso marker="#sparse_foldl-3"><c>sparse_foldl/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_map" arity="2"/> + <fsummary>Map the specified function onto each array element, skipping + default-valued entries.</fsummary> + <desc><marker id="sparse_map-2"/> + <p>Maps the specified function onto each array element, skipping + default-valued entries. The elements are visited in order from the + lowest index to the highest. If <c><anno>Function</anno></c> is not a + function, the call fails with reason <c>badarg</c>.</p> + <p>See also <seealso marker="#map-2"><c>map/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_size" arity="1"/> + <fsummary>Get the number of entries in the array up until the last + non-default-valued entry.</fsummary> + <desc><marker id="sparse_size-1"/> + <p>Gets the number of entries in the array up until the last + non-default-valued entry. 
That is, returns <c>I+1</c> if <c>I</c> is + the last non-default-valued entry in the array, or zero if no such + entry exists.</p> + <p>See also <seealso marker="#resize-1"><c>resize/1</c></seealso>, + <seealso marker="#size-1"><c>size/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_to_list" arity="1"/> + <fsummary>Convert the array to a list, skipping default-valued entries. + </fsummary> + <desc><marker id="sparse_to_list-1"/> + <p>Converts the array to a list, skipping default-valued entries.</p> + <p>See also <seealso marker="#to_list-1"><c>to_list/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="sparse_to_orddict" arity="1"/> + <fsummary>Convert the array to an ordered list of pairs <c>{Index, + Value}</c>, skipping default-valued entries.</fsummary> + <desc><marker id="sparse_to_orddict-1"/> + <p>Converts the array to an ordered list of pairs <c>{Index, + <anno>Value</anno>}</c>, skipping default-valued entries.</p> + <p>See also + <seealso marker="#to_orddict-1"><c>to_orddict/1</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="to_list" arity="1"/> + <fsummary>Convert the array to a list.</fsummary> + <desc><marker id="to_list-1"/> + <p>Converts the array to a list.</p> + <p>See also <seealso marker="#from_list-2"><c>from_list/2</c></seealso>, + <seealso marker="#sparse_to_list-1"><c>sparse_to_list/1</c></seealso>. + </p> + </desc> + </func> + + <func> + <name name="to_orddict" arity="1"/> + <fsummary>Convert the array to an ordered list of pairs <c>{Index, + Value}</c>.</fsummary> + <desc><marker id="to_orddict-1"/> + <p>Converts the array to an ordered list of pairs <c>{Index, + <anno>Value</anno>}</c>.</p> + <p>See also + <seealso marker="#from_orddict-2"><c>from_orddict/2</c></seealso>, + <seealso marker="#sparse_to_orddict-1"><c>sparse_to_orddict/1</c></seealso>. 
+ </p> + </desc> + </func> + </funcs> </erlref> diff --git a/lib/stdlib/doc/src/assert_hrl.xml b/lib/stdlib/doc/src/assert_hrl.xml index ef4f928e57..e2dfc2ab9b 100644 --- a/lib/stdlib/doc/src/assert_hrl.xml +++ b/lib/stdlib/doc/src/assert_hrl.xml @@ -28,131 +28,134 @@ <date></date> <rev></rev> </header> - <file>assert.hrl</file> - <filesummary>Assert Macros</filesummary> + <file>assert.hrl.xml</file> + <filesummary>Assert macros.</filesummary> <description> <p>The include file <c>assert.hrl</c> provides macros for inserting - assertions in your program code.</p> - <p>These macros are defined in the Stdlib include file - <c>assert.hrl</c>. Include the following directive in the module - from which the function is called:</p> - <code type="none"> + assertions in your program code.</p> + + <p>Include the following directive in the module from which the function is + called:</p> + + <code type="none"> -include_lib("stdlib/include/assert.hrl").</code> - <p>When an assertion succeeds, the assert macro yields the atom - <c>ok</c>. When an assertion fails, an exception of type <c>error</c> is - instead generated. The associated error term will have the form - <c>{Macro, Info}</c>, where <c>Macro</c> is the name of the macro, for - example <c>assertEqual</c>, and <c>Info</c> will be a list of tagged - values such as <c>[{module, M}, {line, L}, ...]</c> giving more - information about the location and cause of the exception. All entries - in the <c>Info</c> list are optional, and you should not rely - programatically on any of them being present.</p> - - <p>If the macro <c>NOASSERT</c> is defined when the <c>assert.hrl</c> - include file is read by the compiler, the macros will be defined as - equivalent to the atom <c>ok</c>. The test will not be performed, and - there will be no cost at runtime.</p> + + <p>When an assertion succeeds, the assert macro yields the atom <c>ok</c>. + When an assertion fails, an exception of type <c>error</c> is generated. 
+ The associated error term has the form <c>{Macro, Info}</c>. <c>Macro</c> + is the macro name, for example, <c>assertEqual</c>. <c>Info</c> is a list + of tagged values, such as <c>[{module, M}, {line, L}, ...]</c>, which + gives more information about the location and cause of the exception. All + entries in the <c>Info</c> list are optional; do not rely programmatically + on any of them being present.</p> + + <p>If the macro <c>NOASSERT</c> is defined when <c>assert.hrl</c> is read + by the compiler, the macros are defined as equivalent to the atom + <c>ok</c>. The test is not performed and there is no cost at runtime.</p> <p>For example, using <c>erlc</c> to compile your modules, the following - will disable all assertions:</p> - <code type="none"> + disables all assertions:</p> + + <code type="none"> erlc -DNOASSERT=true *.erl</code> - <p>(The value of <c>NOASSERT</c> does not matter, only the fact that it - is defined.)</p> + + <p>The value of <c>NOASSERT</c> does not matter, only the fact that it is + defined.</p> + <p>A few other macros also have effect on the enabling or disabling of - assertions:</p> + assertions:</p> + <list type="bulleted"> - <item>If <c>NODEBUG</c> is defined, it implies <c>NOASSERT</c>, unless - <c>DEBUG</c> is also defined, which is assumed to take - precedence.</item> - <item>If <c>ASSERT</c> is defined, it overrides <c>NOASSERT</c>, that - is, the assertions will remain enabled.</item> + <item><p>If <c>NODEBUG</c> is defined, it implies <c>NOASSERT</c>, unless + <c>DEBUG</c> is also defined, which is assumed to take precedence.</p> + </item> + <item><p>If <c>ASSERT</c> is defined, it overrides <c>NOASSERT</c>, that + is, the assertions remain enabled.</p></item> </list> - <p>If you prefer, you can thus use only <c>DEBUG</c>/<c>NODEBUG</c> as - the main flags to control the behaviour of the assertions (which is - useful if you have other compiler conditionals or debugging macros - controlled by those flags), or you can use 
<c>ASSERT</c>/<c>NOASSERT</c> - to control only the assert macros.</p> + <p>If you prefer, you can thus use only <c>DEBUG</c>/<c>NODEBUG</c> as the + main flags to control the behavior of the assertions (which is useful if + you have other compiler conditionals or debugging macros controlled by + those flags), or you can use <c>ASSERT</c>/<c>NOASSERT</c> to control only + the assert macros.</p> </description> <section> <title>Macros</title> <taglist> <tag><c>assert(BoolExpr)</c></tag> - <item><p>Tests that <c>BoolExpr</c> completes normally returning - <c>true</c>.</p> + <item> + <p>Tests that <c>BoolExpr</c> completes normally returning + <c>true</c>.</p> </item> - <tag><c>assertNot(BoolExpr)</c></tag> - <item><p>Tests that <c>BoolExpr</c> completes normally returning - <c>false</c>.</p> + <item> + <p>Tests that <c>BoolExpr</c> completes normally returning + <c>false</c>.</p> </item> - <tag><c>assertMatch(GuardedPattern, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> completes normally yielding a value - that matches <c>GuardedPattern</c>. 
For example:</p> + <item> + <p>Tests that <c>Expr</c> completes normally yielding a value that + matches <c>GuardedPattern</c>, for example:</p> <code type="none"> - ?assertMatch({bork, _}, f())</code> - <p>Note that a guard <c>when ...</c> can be included:</p> +?assertMatch({bork, _}, f())</code> + <p>Notice that a guard <c>when ...</c> can be included:</p> <code type="none"> - ?assertMatch({bork, X} when X > 0, f())</code> +?assertMatch({bork, X} when X > 0, f())</code> </item> - <tag><c>assertNotMatch(GuardedPattern, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> completes normally yielding a value - that does not match <c>GuardedPattern</c>.</p> - <p>As in <c>assertMatch</c>, <c>GuardedPattern</c> can have a - <c>when</c> part.</p> + <item> + <p>Tests that <c>Expr</c> completes normally yielding a value that does + not match <c>GuardedPattern</c>.</p> + <p>As in <c>assertMatch</c>, <c>GuardedPattern</c> can have a + <c>when</c> part.</p> </item> - <tag><c>assertEqual(ExpectedValue, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> completes normally yielding a value - that is exactly equal to <c>ExpectedValue</c>.</p> + <item> + <p>Tests that <c>Expr</c> completes normally yielding a value that is + exactly equal to <c>ExpectedValue</c>.</p> </item> - <tag><c>assertNotEqual(ExpectedValue, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> completes normally yielding a value - that is not exactly equal to <c>ExpectedValue</c>.</p> + <item> + <p>Tests that <c>Expr</c> completes normally yielding a value that is + not exactly equal to <c>ExpectedValue</c>.</p> </item> - <tag><c>assertException(Class, Term, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> completes abnormally with an exception - of type <c>Class</c> and with the associated <c>Term</c>. 
The - assertion fails if <c>Expr</c> raises a different exception or if it - completes normally returning any value.</p> - <p>Note that both <c>Class</c> and <c>Term</c> can be guarded - patterns, as in <c>assertMatch</c>.</p> + <item> + <p>Tests that <c>Expr</c> completes abnormally with an exception of type + <c>Class</c> and with the associated <c>Term</c>. The assertion fails + if <c>Expr</c> raises a different exception or if it completes + normally returning any value.</p> + <p>Notice that both <c>Class</c> and <c>Term</c> can be guarded + patterns, as in <c>assertMatch</c>.</p> </item> - <tag><c>assertNotException(Class, Term, Expr)</c></tag> - <item><p>Tests that <c>Expr</c> does not evaluate abnormally with an - exception of type <c>Class</c> and with the associated <c>Term</c>. - The assertion succeeds if <c>Expr</c> raises a different exception or - if it completes normally returning any value.</p> - <p>As in <c>assertException</c>, both <c>Class</c> and <c>Term</c> - can be guarded patterns.</p> + <item> + <p>Tests that <c>Expr</c> does not evaluate abnormally with an + exception of type <c>Class</c> and with the associated <c>Term</c>. 
+ The assertion succeeds if <c>Expr</c> raises a different exception or + if it completes normally returning any value.</p> + <p>As in <c>assertException</c>, both <c>Class</c> and <c>Term</c> can + be guarded patterns.</p> </item> - <tag><c>assertError(Term, Expr)</c></tag> - <item><p>Equivalent to <c>assertException(error, Term, - Expr)</c></p> + <item> + <p>Equivalent to <c>assertException(error, Term, Expr)</c></p> </item> - <tag><c>assertExit(Term, Expr)</c></tag> - <item><p>Equivalent to <c>assertException(exit, Term, Expr)</c></p> + <item> + <p>Equivalent to <c>assertException(exit, Term, Expr)</c></p> </item> - <tag><c>assertThrow(Term, Expr)</c></tag> - <item><p>Equivalent to <c>assertException(throw, Term, Expr)</c></p> + <item> + <p>Equivalent to <c>assertException(throw, Term, Expr)</c></p> </item> - </taglist> </section> <section> - <title>SEE ALSO</title> - <p><seealso marker="compiler:compile">compile(3)</seealso></p> - <p><seealso marker="erts:erlc">erlc(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="compiler:compile"><c>compile(3)</c></seealso>, + <seealso marker="erts:erlc"><c>erlc(3)</c></seealso></p> </section> </fileref> diff --git a/lib/stdlib/doc/src/base64.xml b/lib/stdlib/doc/src/base64.xml index 7b82d7dd3d..cfa1ecc006 100644 --- a/lib/stdlib/doc/src/base64.xml +++ b/lib/stdlib/doc/src/base64.xml @@ -27,50 +27,57 @@ <docno></docno> <date>2007-02-22</date> <rev></rev> - <file>base64.sgml</file> + <file>base64.xml</file> </header> <module>base64</module> - <modulesummary>Implements base 64 encode and decode, see RFC2045.</modulesummary> + <modulesummary>Provides base64 encode and decode, see + RFC 2045.</modulesummary> <description> - <p>Implements base 64 encode and decode, see RFC2045. 
</p> + <p>Provides base64 encode and decode, see + <url href="https://www.ietf.org/rfc/rfc2045.txt">RFC 2045</url>.</p> </description> + <datatypes> <datatype> <name name="ascii_string"/> </datatype> <datatype> <name name="ascii_binary"/> - <desc><p>A <c>binary()</c> with ASCII characters in the range 1 to 255.</p> + <desc><p>A <c>binary()</c> with ASCII characters in the range 1 to + 255.</p> </desc> </datatype> </datatypes> + <funcs> <func> - <name name="encode" arity="1"/> - <name name="encode_to_string" arity="1"/> - <fsummary>Encodes data into base64. </fsummary> - <type variable="Data"/> - <type variable="Base64" name_i="1"/> - <type variable="Base64String"/> - <desc> - <p>Encodes a plain ASCII string into base64. The result will - be 33% larger than the data.</p> - </desc> - </func> - <func> <name name="decode" arity="1"/> <name name="decode_to_string" arity="1"/> <name name="mime_decode" arity="1"/> <name name="mime_decode_to_string" arity="1"/> - <fsummary>Decodes a base64 encoded string to data. </fsummary> + <fsummary>Decode a base64 encoded string to data.</fsummary> <type variable="Base64" name_i="1"/> <type variable="Data" name_i="1"/> <type variable="DataString" name_i="2"/> <desc> - <p>Decodes a base64 encoded string to plain ASCII. See RFC4648. - <c>mime_decode/1</c> and <c>mime_decode_to_string/1</c> - strips away illegal characters, while <c>decode/1</c> and - <c>decode_to_string/1</c> only strips away whitespace characters.</p> + <p>Decodes a base64-encoded string to plain ASCII. 
See + <url href="https://www.ietf.org/html/rfc4648">RFC 4648</url>.</p> + <p><c>mime_decode/1</c> and <c>mime_decode_to_string/1</c> strip away + illegal characters, while <c>decode/1</c> and + <c>decode_to_string/1</c> only strip away whitespace characters.</p> + </desc> + </func> + + <func> + <name name="encode" arity="1"/> + <name name="encode_to_string" arity="1"/> + <fsummary>Encode data into base64.</fsummary> + <type variable="Data"/> + <type variable="Base64" name_i="1"/> + <type variable="Base64String"/> + <desc> + <p>Encodes a plain ASCII string into base64. The result is 33% larger + than the data.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/beam_lib.xml b/lib/stdlib/doc/src/beam_lib.xml index 7c89c8b43e..d5ec90b060 100644 --- a/lib/stdlib/doc/src/beam_lib.xml +++ b/lib/stdlib/doc/src/beam_lib.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>2000</year><year>2015</year> + <year>2000</year><year>2016</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -29,137 +29,159 @@ <rev>PA1</rev> </header> <module>beam_lib</module> - <modulesummary>An Interface To the BEAM File Format</modulesummary> + <modulesummary>An interface to the BEAM file format.</modulesummary> <description> - <p><c>beam_lib</c> provides an interface to files created by - the BEAM compiler ("BEAM files"). The format used, a variant of + <p>This module provides an interface to files created by + the BEAM Compiler ("BEAM files"). The format used, a variant of "EA IFF 1985" Standard for Interchange Format Files, divides data into chunks.</p> + <p>Chunk data can be returned as binaries or as compound terms. Compound terms are returned when chunks are referenced by names - (atoms) rather than identifiers (strings). The names recognized - and the corresponding identifiers are:</p> + (atoms) rather than identifiers (strings). 
The recognized names + and the corresponding identifiers are as follows:</p> + <list type="bulleted"> <item><c>abstract_code ("Abst")</c></item> + <item><c>atoms ("Atom")</c></item> <item><c>attributes ("Attr")</c></item> <item><c>compile_info ("CInf")</c></item> <item><c>exports ("ExpT")</c></item> - <item><c>labeled_exports ("ExpT")</c></item> <item><c>imports ("ImpT")</c></item> <item><c>indexed_imports ("ImpT")</c></item> - <item><c>locals ("LocT")</c></item> + <item><c>labeled_exports ("ExpT")</c></item> <item><c>labeled_locals ("LocT")</c></item> - <item><c>atoms ("Atom")</c></item> + <item><c>locals ("LocT")</c></item> </list> </description> <section> <marker id="debug_info"></marker> <title>Debug Information/Abstract Code</title> - <p>The option <c>debug_info</c> can be given to the compiler (see - <seealso marker="compiler:compile#debug_info">compile(3)</seealso>) - in order to have debug information in the form of abstract code - (see <seealso marker="erts:absform">The Abstract Format</seealso> - in ERTS User's Guide) stored in the <c>abstract_code</c> chunk. + <p>Option <c>debug_info</c> can be specified to the Compiler (see + <seealso marker="compiler:compile#debug_info"><c>compile(3)</c></seealso>) + to have debug information in the form of abstract code (see section + <seealso marker="erts:absform">The Abstract Format</seealso> in the + ERTS User's Guide) stored in the <c>abstract_code</c> chunk. Tools such as Debugger and Xref require the debug information to be included.</p> + <warning> <p>Source code can be reconstructed from the debug information. 
- Use encrypted debug information (see below) to prevent this.</p> + To prevent this, use encrypted debug information (see below).</p> </warning> + <p>The debug information can also be removed from BEAM files - using <seealso marker="#strip/1">strip/1</seealso>, - <seealso marker="#strip_files/1">strip_files/1</seealso> and/or - <seealso marker="#strip_release/1">strip_release/1</seealso>.</p> + using <seealso marker="#strip/1"><c>strip/1</c></seealso>, + <seealso marker="#strip_files/1"><c>strip_files/1</c></seealso>, and/or + <seealso marker="#strip_release/1"><c>strip_release/1</c></seealso>.</p> </section> - <section> - <title>Reconstructing source code</title> - <p>Here is an example of how to reconstruct source code from - the debug information in a BEAM file <c>Beam</c>:</p> - <code type="none"> - {ok,{_,[{abstract_code,{_,AC}}]}} = beam_lib:chunks(Beam,[abstract_code]). - io:fwrite("~s~n", [erl_prettypr:format(erl_syntax:form_list(AC))]).</code> - </section> - <section> - <title>Encrypted debug information</title> - <p>The debug information can be encrypted in order to keep - the source code secret, but still being able to use tools such as - Xref or Debugger. </p> - <p>To use encrypted debug information, a key must be provided to - the compiler and <c>beam_lib</c>. The key is given as a string and - it is recommended that it contains at least 32 characters and - that both upper and lower case letters as well as digits and - special characters are used.</p> - <p>The default type -- and currently the only type -- of crypto - algorithm is <c>des3_cbc</c>, three rounds of DES. The key string - will be scrambled using <c>erlang:md5/1</c> to generate - the actual keys used for <c>des3_cbc</c>.</p> - <note> - <p>As far as we know by the time of writing, it is - infeasible to break <c>des3_cbc</c> encryption without any - knowledge of the key. 
Therefore, as long as the key is kept - safe and is unguessable, the encrypted debug information - <em>should</em> be safe from intruders.</p> - </note> - <p>There are two ways to provide the key:</p> - <list type="ordered"> - <item> - <p>Use the compiler option <c>{debug_info,Key}</c>, see - <seealso marker="compiler:compile#debug_info_key">compile(3)</seealso>, - and the function - <seealso marker="#crypto_key_fun/1">crypto_key_fun/1</seealso> - to register a fun which returns the key whenever - <c>beam_lib</c> needs to decrypt the debug information.</p> - <p>If no such fun is registered, <c>beam_lib</c> will instead - search for a <c>.erlang.crypt</c> file, see below.</p> - </item> - <item> - <p>Store the key in a text file named <c>.erlang.crypt</c>.</p> - <p>In this case, the compiler option <c>encrypt_debug_info</c> - can be used, see - <seealso marker="compiler:compile#encrypt_debug_info">compile(3)</seealso>.</p> - </item> - </list> + + <section> + <title>Reconstruct Source Code</title> + <p>The following example shows how to reconstruct source code from + the debug information in a BEAM file <c>Beam</c>:</p> + + <code type="none"> +{ok,{_,[{abstract_code,{_,AC}}]}} = beam_lib:chunks(Beam,[abstract_code]). +io:fwrite("~s~n", [erl_prettypr:format(erl_syntax:form_list(AC))]).</code> </section> - <section> - <title>.erlang.crypt</title> - <p><c>beam_lib</c> searches for <c>.erlang.crypt</c> in the current - directory and then the home directory for the current user. If - the file is found and contains a key, <c>beam_lib</c> will - implicitly create a crypto key fun and register it.</p> - <p>The <c>.erlang.crypt</c> file should contain a single list of - tuples:</p> - <code type="none"> - {debug_info, Mode, Module, Key}</code> - <p><c>Mode</c> is the type of crypto algorithm; currently, the only - allowed value thus is <c>des3_cbc</c>. 
<c>Module</c> is either an - atom, in which case <c>Key</c> will only be used for the module - <c>Module</c>, or <c>[]</c>, in which case <c>Key</c> will be - used for all modules. <c>Key</c> is the non-empty key string.</p> - <p>The <c>Key</c> in the first tuple where both <c>Mode</c> and - <c>Module</c> matches will be used.</p> - <p>Here is an example of an <c>.erlang.crypt</c> file that returns - the same key for all modules:</p> - <code type="none"><![CDATA[ + + <section> + <title>Encrypted Debug Information</title> + <p>The debug information can be encrypted to keep + the source code secret, but still be able to use tools such as + Debugger or Xref.</p> + + <p>To use encrypted debug information, a key must be provided to + the compiler and <c>beam_lib</c>. The key is specified as a string. + It is recommended that the string contains at least 32 characters and + that both upper and lower case letters as well as digits and + special characters are used.</p> + + <p>The default type (and currently the only type) of crypto + algorithm is <c>des3_cbc</c>, three rounds of DES. The key string + is scrambled using + <seealso marker="erts:erlang#md5/1"><c>erlang:md5/1</c></seealso> + to generate the keys used for <c>des3_cbc</c>.</p> + + <note> + <p>As far as we know by the time of writing, it is + infeasible to break <c>des3_cbc</c> encryption without any + knowledge of the key. 
Therefore, as long as the key is kept + safe and is unguessable, the encrypted debug information + <em>should</em> be safe from intruders.</p> + </note> + + <p>The key can be provided in the following two ways:</p> + + <list type="ordered"> + <item> + <p>Use Compiler option <c>{debug_info,Key}</c>, see + <seealso marker="compiler:compile#debug_info_key"><c>compile(3)</c></seealso> + and function + <seealso marker="#crypto_key_fun/1"><c>crypto_key_fun/1</c></seealso> + to register a fun that returns the key whenever + <c>beam_lib</c> must decrypt the debug information.</p> + <p>If no such fun is registered, <c>beam_lib</c> instead + searches for an <c>.erlang.crypt</c> file, see the next section.</p> + </item> + <item> + <p>Store the key in a text file named <c>.erlang.crypt</c>.</p> + <p>In this case, Compiler option <c>encrypt_debug_info</c> + can be used, see + <seealso marker="compiler:compile#encrypt_debug_info"><c>compile(3)</c></seealso>. + </p> + </item> + </list> + </section> + + <section> + <title>.erlang.crypt</title> + <p><c>beam_lib</c> searches for <c>.erlang.crypt</c> in the current + directory and then the home directory for the current user. If + the file is found and contains a key, <c>beam_lib</c> + implicitly creates a crypto key fun and registers it.</p> + + <p>File <c>.erlang.crypt</c> is to contain a single list of tuples:</p> + + <code type="none"> +{debug_info, Mode, Module, Key}</code> + + <p><c>Mode</c> is the type of crypto algorithm; currently, the only + allowed value is <c>des3_cbc</c>. <c>Module</c> is either an + atom, in which case <c>Key</c> is only used for the module + <c>Module</c>, or <c>[]</c>, in which case <c>Key</c> is + used for all modules. 
<c>Key</c> is the non-empty key string.</p> + + <p><c>Key</c> in the first tuple where both <c>Mode</c> and + <c>Module</c> match is used.</p> + + <p>The following is an example of an <c>.erlang.crypt</c> file that returns + the same key for all modules:</p> + + <code type="none"><![CDATA[ [{debug_info, des3_cbc, [], "%>7}|pc/DM6Cga*68$Mw]L#&_Gejr]G^"}].]]></code> - <p>And here is a slightly more complicated example of an - <c>.erlang.crypt</c> which provides one key for the module - <c>t</c>, and another key for all other modules:</p> - <code type="none"><![CDATA[ + + <p>The following is a slightly more complicated example of an + <c>.erlang.crypt</c> providing one key for module + <c>t</c> and another key for all other modules:</p> + + <code type="none"><![CDATA[ [{debug_info, des3_cbc, t, "My KEY"}, {debug_info, des3_cbc, [], "%>7}|pc/DM6Cga*68$Mw]L#&_Gejr]G^"}].]]></code> - <note> - <p>Do not use any of the keys in these examples. Use your own - keys.</p> - </note> - </section> + + <note> + <p>Do not use any of the keys in these examples. Use your own keys.</p> + </note> + </section> <datatypes> <datatype> <name name="beam"/> <desc> <p>Each of the functions described below accept either the - module name, the filename, or a binary containing the beam + module name, the filename, or a binary containing the BEAM module.</p> </desc> </datatype> @@ -167,7 +189,7 @@ <name name="chunkdata"/> <desc> <p>The list of attributes is sorted on <c>Attribute</c> - (in attrib_entry()), and each + (in <c>attrib_entry()</c>) and each attribute name occurs once in the list. The attribute values occur in the same order as in the file. The lists of functions are also sorted.</p> @@ -186,8 +208,8 @@ <name name="abst_code"/> <desc> <p>It is not checked that the forms conform to the abstract format - indicated by <c><anno>AbstVersion</anno></c>. <c>no_abstract_code</c> means - that the <c>"Abst"</c> chunk is present, but empty.</p> + indicated by <c><anno>AbstVersion</anno></c>. 
<c>no_abstract_code</c> + means that chunk <c>"Abst"</c> is present, but empty.</p> </desc> </datatype> <datatype> @@ -230,78 +252,163 @@ <p>Reads chunk data for all chunks.</p> </desc> </func> + + <func> + <name name="build_module" arity="1"/> + <fsummary>Create a BEAM module from a list of chunks.</fsummary> + <desc> + <p>Builds a BEAM module (as a binary) from a list of chunks.</p> + </desc> + </func> + <func> <name name="chunks" arity="2"/> - <fsummary>Read selected chunks from a BEAM file or binary</fsummary> + <fsummary>Read selected chunks from a BEAM file or binary.</fsummary> <desc> - <p>Reads chunk data for selected chunks refs. The order of + <p>Reads chunk data for selected chunks references. The order of the returned list of chunk data is determined by the order of the list of chunks references.</p> </desc> </func> + <func> <name name="chunks" arity="3"/> - <fsummary>Read selected chunks from a BEAM file or binary</fsummary> + <fsummary>Read selected chunks from a BEAM file or binary.</fsummary> <desc> - <p>Reads chunk data for selected chunks refs. The order of + <p>Reads chunk data for selected chunks references. The order of the returned list of chunk data is determined by the order of the list of chunks references.</p> - <p>By default, if any requested chunk is missing in <c><anno>Beam</anno></c>, - an <c>error</c> tuple is returned. - However, if the option <c>allow_missing_chunks</c> has been given, - a result will be returned even if chunks are missing. - In the result list, any missing chunks will be represented as + <p>By default, if any requested chunk is missing in + <c><anno>Beam</anno></c>, an <c>error</c> tuple is returned. + However, if option <c>allow_missing_chunks</c> is specified, + a result is returned even if chunks are missing. + In the result list, any missing chunks are represented as <c>{<anno>ChunkRef</anno>,missing_chunk}</c>. 
- Note, however, that if the <c>"Atom"</c> chunk if missing, that is - considered a fatal error and the return value will be an <c>error</c> + Notice however that if chunk <c>"Atom"</c> is missing, that is + considered a fatal error and the return value is an <c>error</c> tuple.</p> </desc> </func> + <func> - <name name="build_module" arity="1"/> - <fsummary>Creates a BEAM module from a list of chunks</fsummary> + <name name="clear_crypto_key_fun" arity="0"/> + <fsummary>Unregister the current crypto key fun.</fsummary> <desc> - <p>Builds a BEAM module (as a binary) from a list of chunks.</p> + <p>Unregisters the crypto key fun and terminates the process + holding it, started by + <seealso marker="#crypto_key_fun/1"><c>crypto_key_fun/1</c></seealso>. + </p> + <p>Returns either <c>{ok, undefined}</c> if no crypto key fun is + registered, or <c>{ok, Term}</c>, where <c>Term</c> is + the return value from <c>CryptoKeyFun(clear)</c>, see + <c>crypto_key_fun/1</c>.</p> </desc> </func> + <func> - <name name="version" arity="1"/> - <fsummary>Read the BEAM file's module version</fsummary> + <name name="cmp" arity="2"/> + <fsummary>Compare two BEAM files.</fsummary> + <type name="cmp_rsn"/> <desc> - <p>Returns the module version(s). A version is defined by - the module attribute <c>-vsn(Vsn)</c>. If this attribute is - not specified, the version defaults to the checksum of - the module. Note that if the version <c>Vsn</c> is not a list, - it is made into one, that is <c>{ok,{Module,[Vsn]}}</c> is - returned. If there are several <c>-vsn</c> module attributes, - the result is the concatenated list of versions. Examples:</p> - <pre> -1> <input>beam_lib:version(a).</input> % -vsn(1). -{ok,{a,[1]}} -2> <input>beam_lib:version(b).</input> % -vsn([1]). -{ok,{b,[1]}} -3> <input>beam_lib:version(c).</input> % -vsn([1]). -vsn(2). 
-{ok,{c,[1,2]}} -4> <input>beam_lib:version(d).</input> % no -vsn attribute -{ok,{d,[275613208176997377698094100858909383631]}}</pre> + <p>Compares the contents of two BEAM files. If the module names + are the same, and all chunks except for chunk <c>"CInf"</c> + (the chunk containing the compilation information that is + returned by <c>Module:module_info(compile)</c>) + have the same contents in both files, + <c>ok</c> is returned. Otherwise an error message is returned.</p> </desc> </func> + <func> - <name name="md5" arity="1"/> - <fsummary>Read the BEAM file's module version</fsummary> + <name name="cmp_dirs" arity="2"/> + <fsummary>Compare the BEAM files in two directories.</fsummary> <desc> - <p>Calculates an MD5 redundancy check for the code of the module - (compilation date and other attributes are not included).</p> + <p>Compares the BEAM files in + two directories. Only files with extension <c>".beam"</c> are + compared. BEAM files that exist only in directory + <c><anno>Dir1</anno></c> (<c><anno>Dir2</anno></c>) are returned in + <c><anno>Only1</anno></c> (<c><anno>Only2</anno></c>). + BEAM files that exist in both directories but + are considered different by <c>cmp/2</c> are returned as + pairs {<c><anno>Filename1</anno></c>, <c><anno>Filename2</anno></c>}, + where <c><anno>Filename1</anno></c> (<c><anno>Filename2</anno></c>) + exists in directory <c><anno>Dir1</anno></c> + (<c><anno>Dir2</anno></c>).</p> </desc> </func> + + <func> + <name name="crypto_key_fun" arity="1"/> + <fsummary>Register a fun that provides a crypto key.</fsummary> + <type name="crypto_fun"/> + <type name="crypto_fun_arg"/> + <type name="mode"/> + <desc> + <p>Registers a unary fun + that is called if <c>beam_lib</c> must read an + <c>abstract_code</c> chunk that has been encrypted.
The fun + is held in a process that is started by the function.</p> + <p>If a fun is already registered when attempting to + register a fun, <c>{error, exists}</c> is returned.</p> + <p>The fun must handle the following arguments:</p> + <code type="none"> +CryptoKeyFun(init) -> ok | {ok, NewCryptoKeyFun} | {error, Term}</code> + <p>Called when the fun is registered, in the process that holds + the fun. Here the crypto key fun can do any necessary + initializations. If <c>{ok, NewCryptoKeyFun}</c> is returned, + <c>NewCryptoKeyFun</c> is registered instead of + <c>CryptoKeyFun</c>. If <c>{error, Term}</c> is returned, + the registration is aborted and <c>crypto_key_fun/1</c> + also returns <c>{error, Term}</c>.</p> + <code type="none"> +CryptoKeyFun({debug_info, Mode, Module, Filename}) -> Key</code> + <p>Called when the key is needed for module <c>Module</c> + in the file named <c>Filename</c>. <c>Mode</c> is the type of + crypto algorithm; currently, the only possible value is + <c>des3_cbc</c>. The call is to fail (raise an exception) if + no key is available.</p> + <code type="none"> +CryptoKeyFun(clear) -> term()</code> + <p>Called before the fun is unregistered. Here any cleaning up + can be done. The return value is not important, but is passed + back to the caller of <c>clear_crypto_key_fun/0</c> as part + of its return value.</p> + </desc> + </func> + + <func> + <name name="diff_dirs" arity="2"/> + <fsummary>Compare the BEAM files in two directories.</fsummary> + <desc> + <p>Compares the BEAM files in two directories as + <seealso marker="#cmp_dirs/2"><c>cmp_dirs/2</c></seealso>, but the + names of files that exist in only one directory or are different are + presented on standard output.</p> + </desc> + </func> + + <func> + <name name="format_error" arity="1"/> + <fsummary>Return an English description of a BEAM read error reply. 
+ </fsummary> + <desc> + <p>For a specified error returned by any function in this module, + this function returns a descriptive string + of the error in English. For file errors, function + <seealso marker="kernel:file#format_error/1"><c>file:format_error(Posix)</c></seealso> + is to be called.</p> + </desc> + </func> + <func> <name name="info" arity="1"/> - <fsummary>Information about a BEAM file</fsummary> + <fsummary>Information about a BEAM file.</fsummary> <desc> <p>Returns a list containing some information about a BEAM file as tuples <c>{Item, Info}</c>:</p> <taglist> - <tag><c>{file, <anno>Filename</anno>} | {binary, <anno>Binary</anno>}</c></tag> + <tag><c>{file, <anno>Filename</anno>} | {binary, + <anno>Binary</anno>}</c></tag> <item> <p>The name (string) of the BEAM file, or the binary from which the information was extracted.</p> @@ -310,7 +417,8 @@ <item> <p>The name (atom) of the module.</p> </item> - <tag><c>{chunks, [{<anno>ChunkId</anno>, <anno>Pos</anno>, <anno>Size</anno>}]}</c></tag> + <tag><c>{chunks, [{<anno>ChunkId</anno>, <anno>Pos</anno>, + <anno>Size</anno>}]}</c></tag> <item> <p>For each chunk, the identifier (string) and the position and size of the chunk data, in bytes.</p> @@ -318,135 +426,75 @@ </taglist> </desc> </func> + <func> - <name name="cmp" arity="2"/> - <fsummary>Compare two BEAM files</fsummary> - <type name="cmp_rsn"/> - <desc> - <p>Compares the contents of two BEAM files. If the module names - are the same, and all chunks except for the <c>"CInf"</c> chunk - (the chunk containing the compilation information which is - returned by <c>Module:module_info(compile)</c>) - have the same contents in both files, - <c>ok</c> is returned. Otherwise an error message is returned.</p> - </desc> - </func> - <func> - <name name="cmp_dirs" arity="2"/> - <fsummary>Compare the BEAM files in two directories</fsummary> - <desc> - <p>The <c>cmp_dirs/2</c> function compares the BEAM files in - two directories. 
Only files with extension <c>".beam"</c> are - compared. BEAM files that exist in directory <c><anno>Dir1</anno></c> - (<c><anno>Dir2</anno></c>) only are returned in <c><anno>Only1</anno></c> - (<c><anno>Only2</anno></c>). BEAM files that exist on both directories but - are considered different by <c>cmp/2</c> are returned as - pairs {<c><anno>Filename1</anno></c>, <c><anno>Filename2</anno></c>} where - <c><anno>Filename1</anno></c> (<c><anno>Filename2</anno></c>) exists in directory - <c><anno>Dir1</anno></c> (<c><anno>Dir2</anno></c>).</p> - </desc> - </func> - <func> - <name name="diff_dirs" arity="2"/> - <fsummary>Compare the BEAM files in two directories</fsummary> + <name name="md5" arity="1"/> + <fsummary>Read the module version of the BEAM file.</fsummary> <desc> - <p>The <c>diff_dirs/2</c> function compares the BEAM files in - two directories the way <c>cmp_dirs/2</c> does, but names of - files that exist in only one directory or are different are - presented on standard output.</p> + <p>Calculates an MD5 redundancy check for the code of the module + (compilation date and other attributes are not included).</p> </desc> </func> + <func> <name name="strip" arity="1"/> - <fsummary>Removes chunks not needed by the loader from a BEAM file</fsummary> + <fsummary>Remove chunks not needed by the loader from a BEAM file. + </fsummary> <desc> - <p>The <c>strip/1</c> function removes all chunks from a BEAM + <p>Removes all chunks from a BEAM file except those needed by the loader. In particular, - the debug information (<c>abstract_code</c> chunk) is removed.</p> + the debug information (chunk <c>abstract_code</c>) is removed.</p> </desc> </func> + <func> <name name="strip_files" arity="1"/> - <fsummary>Removes chunks not needed by the loader from BEAM files</fsummary> + <fsummary>Removes chunks not needed by the loader from BEAM files. 
+ </fsummary> <desc> - <p>The <c>strip_files/1</c> function removes all chunks except + <p>Removes all chunks except those needed by the loader from BEAM files. In particular, - the debug information (<c>abstract_code</c> chunk) is removed. - The returned list contains one element for each given file - name, in the same order as in <c>Files</c>.</p> + the debug information (chunk <c>abstract_code</c>) is removed. + The returned list contains one element for each specified filename, + in the same order as in <c>Files</c>.</p> </desc> </func> + <func> <name name="strip_release" arity="1"/> - <fsummary>Removes chunks not needed by the loader from all BEAM files of a release</fsummary> + <fsummary>Remove chunks not needed by the loader from all BEAM files of + a release.</fsummary> <desc> - <p>The <c>strip_release/1</c> function removes all chunks + <p>Removes all chunks except those needed by the loader from the BEAM files of a - release. <c><anno>Dir</anno></c> should be the installation root + release. <c><anno>Dir</anno></c> is to be the installation root directory. For example, the current OTP release can be stripped with the call <c>beam_lib:strip_release(code:root_dir())</c>.</p> </desc> </func> + <func> - <name name="format_error" arity="1"/> - <fsummary>Return an English description of a BEAM read error reply</fsummary> - <desc> - <p>Given the error returned by any function in this module, - the function <c>format_error</c> returns a descriptive string - of the error in English. For file errors, the function - <c>file:format_error(Posix)</c> should be called.</p> - </desc> - </func> - <func> - <name name="crypto_key_fun" arity="1"/> - <fsummary>Register a fun that provides a crypto key</fsummary> - <type name="crypto_fun"/> - <type name="crypto_fun_arg"/> - <type name="mode"/> - <desc> - <p>The <c>crypto_key_fun/1</c> function registers a unary fun - that will be called if <c>beam_lib</c> needs to read an - <c>abstract_code</c> chunk that has been encrypted. 
The fun - is held in a process that is started by the function.</p> - <p>If there already is a fun registered when attempting to - register a fun, <c>{error, exists}</c> is returned.</p> - <p>The fun must handle the following arguments:</p> - <code type="none"> - CryptoKeyFun(init) -> ok | {ok, NewCryptoKeyFun} | {error, Term}</code> - <p>Called when the fun is registered, in the process that holds - the fun. Here the crypto key fun can do any necessary - initializations. If <c>{ok, NewCryptoKeyFun}</c> is returned - then <c>NewCryptoKeyFun</c> will be registered instead of - <c>CryptoKeyFun</c>. If <c>{error, Term}</c> is returned, - the registration is aborted and <c>crypto_key_fun/1</c> - returns <c>{error, Term}</c> as well.</p> - <code type="none"> - CryptoKeyFun({debug_info, Mode, Module, Filename}) -> Key</code> - <p>Called when the key is needed for the module <c>Module</c> - in the file named <c>Filename</c>. <c>Mode</c> is the type of - crypto algorithm; currently, the only possible value thus is - <c>des3_cbc</c>. The call should fail (raise an exception) if - there is no key available.</p> - <code type="none"> - CryptoKeyFun(clear) -> term()</code> - <p>Called before the fun is unregistered. Here any cleaning up - can be done. 
The return value is not important, but is passed - back to the caller of <c>clear_crypto_key_fun/0</c> as part - of its return value.</p> - </desc> - </func> - <func> - <name name="clear_crypto_key_fun" arity="0"/> - <fsummary>Unregister the current crypto key fun</fsummary> + <name name="version" arity="1"/> + <fsummary>Read the module version of the BEAM file.</fsummary> <desc> - <p>Unregisters the crypto key fun and terminates the process - holding it, started by <c>crypto_key_fun/1</c>.</p> - <p>The <c>clear_crypto_key_fun/1</c> either returns - <c>{ok, undefined}</c> if there was no crypto key fun - registered, or <c>{ok, Term}</c>, where <c>Term</c> is - the return value from <c>CryptoKeyFun(clear)</c>, see - <c>crypto_key_fun/1</c>.</p> + <p>Returns the module version or versions. A version is defined by + module attribute <c>-vsn(Vsn)</c>. If this attribute is + not specified, the version defaults to the checksum of + the module. Notice that if version <c>Vsn</c> is not a list, + it is made into one, that is <c>{ok,{Module,[Vsn]}}</c> is + returned. If there are many <c>-vsn</c> module attributes, + the result is the concatenated list of versions.</p> + <p><em>Examples:</em></p> + <pre> +1> <input>beam_lib:version(a).</input> % -vsn(1). +{ok,{a,[1]}} +2> <input>beam_lib:version(b).</input> % -vsn([1]). +{ok,{b,[1]}} +3> <input>beam_lib:version(c).</input> % -vsn([1]). -vsn(2). 
+{ok,{c,[1,2]}} +4> <input>beam_lib:version(d).</input> % no -vsn attribute +{ok,{d,[275613208176997377698094100858909383631]}}</pre> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/binary.xml b/lib/stdlib/doc/src/binary.xml index 933157fc34..6a86d6c7ba 100644 --- a/lib/stdlib/doc/src/binary.xml +++ b/lib/stdlib/doc/src/binary.xml @@ -35,285 +35,311 @@ <file>binary.xml</file> </header> <module>binary</module> - <modulesummary>Library for handling binary data</modulesummary> + <modulesummary>Library for handling binary data.</modulesummary> <description> <p>This module contains functions for manipulating byte-oriented - binaries. Although the majority of functions could be implemented + binaries. Although the majority of functions could be provided using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume - less memory (or both) than a counterpart written in pure Erlang.</p> + less memory, or both, than a counterpart written in pure Erlang.</p> - <p>The module is implemented according to the EEP (Erlang Enhancement Proposal) 31.</p> + <p>The module is provided according to Erlang Enhancement Proposal + (EEP) 31.</p> <note> - <p> - The library handles byte-oriented data. Bitstrings that are not - binaries (does not contain whole octets of bits) will result in a <c>badarg</c> - exception being thrown from any of the functions in this - module. - </p> + <p>The library handles byte-oriented data. For bitstrings that are not + binaries (does not contain whole octets of bits) a <c>badarg</c> + exception is thrown from any of the functions in this module.</p> </note> </description> + <datatypes> <datatype> <name name="cp"/> - <desc><p>Opaque data-type representing a compiled - search-pattern. Guaranteed to be a tuple() to allow programs to - distinguish it from non precompiled search patterns.</p> + <desc><p>Opaque data type representing a compiled + search pattern. 
Guaranteed to be a <c>tuple()</c> to allow programs to + distinguish it from non-precompiled search patterns.</p> </desc> </datatype> <datatype> <name name="part"/> - <desc><p>A representaion of a part (or range) in a binary. Start is a - zero-based offset into a binary() and Length is the length of - that part. As input to functions in this module, a reverse + <desc><p>A representation of a part (or range) in a binary. <c>Start</c> is + a zero-based offset into a <c>binary()</c> and <c>Length</c> is the + length of that part. As input to functions in this module, a reverse part specification is allowed, constructed with a negative - Length, so that the part of the binary begins at Start + - Length and is -Length long. This is useful for referencing the - last N bytes of a binary as {size(Binary), -N}. The functions - in this module always return part()'s with positive Length.</p> + <c>Length</c>, so that the part of the binary begins at <c>Start</c> + + <c>Length</c> and is -<c>Length</c> long. This is useful for referencing + the last <c>N</c> bytes of a binary as <c>{size(Binary), -N}</c>. The + functions in this module always return <c>part()</c>s with positive + <c>Length</c>.</p> </desc> </datatype> </datatypes> + <funcs> <func> <name name="at" arity="2"/> - <fsummary>Returns the byte at a specific position in a binary</fsummary> + <fsummary>Return the byte at a specific position in a binary.</fsummary> <desc> - - <p>Returns the byte at position <c><anno>Pos</anno></c> (zero-based) in the binary - <c><anno>Subject</anno></c> as an integer. If <c><anno>Pos</anno></c> >= <c>byte_size(<anno>Subject</anno>)</c>, - a <c>badarg</c> - exception is raised.</p> - + <p>Returns the byte at position <c><anno>Pos</anno></c> (zero-based) in + binary <c><anno>Subject</anno></c> as an integer.
If + <c><anno>Pos</anno></c> >= <c>byte_size(<anno>Subject</anno>)</c>, + a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="bin_to_list" arity="1"/> - <fsummary>Convert a binary to a list of integers</fsummary> + <fsummary>Convert a binary to a list of integers.</fsummary> <desc> - <p>The same as <c>bin_to_list(<anno>Subject</anno>,{0,byte_size(<anno>Subject</anno>)})</c>.</p> + <p>Same as <c>bin_to_list(<anno>Subject</anno>, {0,byte_size(<anno>Subject</anno>)})</c>.</p> </desc> </func> + <func> <name name="bin_to_list" arity="2"/> - <fsummary>Convert a binary to a list of integers</fsummary> + <fsummary>Convert a binary to a list of integers.</fsummary> <desc> + <p>Converts <c><anno>Subject</anno></c> to a list of <c>byte()</c>s, each + representing the value of one byte. <c>part()</c> denotes which part of + the <c>binary()</c> to convert.</p> - <p>Converts <c><anno>Subject</anno></c> to a list of <c>byte()</c>s, each representing - the value of one byte. The <c>part()</c> denotes which part of the - <c>binary()</c> to convert. Example:</p> + <p><em>Example:</em></p> <code> -1> binary:bin_to_list(<<"erlang">>,{1,3}). +1> binary:bin_to_list(<<"erlang">>, {1,3}). "rla" -%% or [114,108,97] in list notation. 
-</code> - <p>If <c><anno>PosLen</anno></c> in any way references outside the binary, a <c>badarg</c> exception is raised.</p> +%% or [114,108,97] in list notation.</code> + + <p>If <c><anno>PosLen</anno></c> in any way references outside the binary, + a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="bin_to_list" arity="3"/> - <fsummary>Convert a binary to a list of integers</fsummary> + <fsummary>Convert a binary to a list of integers.</fsummary> <desc> - <p>The same as<c> bin_to_list(<anno>Subject</anno>,{<anno>Pos</anno>,<anno>Len</anno>})</c>.</p> + <p>Same as<c> bin_to_list(<anno>Subject</anno>, {<anno>Pos</anno>, <anno>Len</anno>})</c>.</p> </desc> </func> + <func> <name name="compile_pattern" arity="1"/> - <fsummary>Pre-compiles a binary search pattern</fsummary> + <fsummary>Precompile a binary search pattern.</fsummary> <desc> - <p>Builds an internal structure representing a compilation of a - search-pattern, later to be used in the <seealso marker="#match-3">match/3</seealso>, - <seealso marker="#matches-3">matches/3</seealso>, - <seealso marker="#split-3">split/3</seealso> or - <seealso marker="#replace-4">replace/4</seealso> - functions. The <c>cp()</c> returned is guaranteed to be a - <c>tuple()</c> to allow programs to distinguish it from non - pre-compiled search patterns</p> - - <p>When a list of binaries is given, it denotes a set of - alternative binaries to search for. I.e if + search pattern, later to be used in functions + <seealso marker="#match-3"><c>match/3</c></seealso>, + <seealso marker="#matches-3"><c>matches/3</c></seealso>, + <seealso marker="#split-3"><c>split/3</c></seealso>, or + <seealso marker="#replace-4"><c>replace/4</c></seealso>. + The <c>cp()</c> returned is guaranteed to be a + <c>tuple()</c> to allow programs to distinguish it from + non-precompiled search patterns.</p> + + <p>When a list of binaries is specified, it denotes a set of + alternative binaries to search for. 
For example, if <c>[<<"functional">>,<<"programming">>]</c> - is given as <c><anno>Pattern</anno></c>, this - means "either <c><<"functional">></c> or + is specified as <c><anno>Pattern</anno></c>, this + means either <c><<"functional">></c> or <c><<"programming">></c>". The pattern is a set of - alternatives; when only a single binary is given, the set has - only one element. The order of alternatives in a pattern is not significant.</p> - - <p>The list of binaries used for search alternatives shall be flat and proper.</p> + alternatives; when only a single binary is specified, the set has + only one element. The order of alternatives in a pattern is + not significant.</p> - <p>If <c><anno>Pattern</anno></c> is not a binary or a flat proper list of binaries with length > 0, - a <c>badarg</c> exception will be raised.</p> + <p>The list of binaries used for search alternatives must be flat and + proper.</p> + <p>If <c><anno>Pattern</anno></c> is not a binary or a flat proper list of + binaries with length > 0, a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="copy" arity="1"/> - <fsummary>Creates a duplicate of a binary</fsummary> + <fsummary>Create a duplicate of a binary.</fsummary> <desc> - <p>The same as <c>copy(<anno>Subject</anno>, 1)</c>.</p> + <p>Same as <c>copy(<anno>Subject</anno>, 1)</c>.</p> </desc> </func> + <func> <name name="copy" arity="2"/> - <fsummary>Duplicates a binary N times and creates a new</fsummary> + <fsummary>Duplicate a binary <c>N</c> times and create a new.</fsummary> <desc> - <p>Creates a binary with the content of <c><anno>Subject</anno></c> duplicated <c><anno>N</anno></c> times.</p> + <p>Creates a binary with the content of <c><anno>Subject</anno></c> + duplicated <c><anno>N</anno></c> times.</p> - <p>This function will always create a new binary, even if <c><anno>N</anno> = - 1</c>. 
By using <c>copy/1</c> on a binary referencing a larger binary, one - might free up the larger binary for garbage collection.</p> + <p>This function always creates a new binary, even if <c><anno>N</anno> = + 1</c>. By using <seealso marker="#copy/1"><c>copy/1</c></seealso> + on a binary referencing a larger binary, one + can free up the larger binary for garbage collection.</p> <note> <p>By deliberately copying a single binary to avoid referencing - a larger binary, one might, instead of freeing up the larger + a larger binary, one can, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed. Sharing binary data is usually good. Only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process, deliberate - copying might be a good idea.</p> </note> + copying can be a good idea.</p> + </note> - <p>If <c><anno>N</anno></c> < <c>0</c>, a <c>badarg</c> exception is raised.</p> + <p>If <c><anno>N</anno></c> < <c>0</c>, a <c>badarg</c> exception is + raised.</p> </desc> </func> + <func> <name name="decode_unsigned" arity="1"/> - <fsummary>Decode a whole binary into an integer of arbitrary size</fsummary> + <fsummary>Decode a whole binary into an integer of arbitrary size. + </fsummary> <desc> - <p>The same as <c>decode_unsigned(<anno>Subject</anno>, big)</c>.</p> + <p>Same as <c>decode_unsigned(<anno>Subject</anno>, big)</c>.</p> </desc> </func> + <func> <name name="decode_unsigned" arity="2"/> - <fsummary>Decode a whole binary into an integer of arbitrary size</fsummary> + <fsummary>Decode a whole binary into an integer of arbitrary size. 
+ </fsummary> <desc> + <p>Converts the binary digit representation, in big endian or little + endian, of a positive integer in <c><anno>Subject</anno></c> to an Erlang + <c>integer()</c>.</p> - <p>Converts the binary digit representation, in big or little - endian, of a positive integer in <c><anno>Subject</anno></c> to an Erlang <c>integer()</c>.</p> - - <p>Example:</p> + <p><em>Example:</em></p> <code> 1> binary:decode_unsigned(<<169,138,199>>,big). -11111111 - </code> +11111111</code> </desc> </func> + <func> <name name="encode_unsigned" arity="1"/> - <fsummary>Encodes an unsigned integer into the minimal binary</fsummary> + <fsummary>Encode an unsigned integer into the minimal binary.</fsummary> <desc> - <p>The same as <c>encode_unsigned(<anno>Unsigned</anno>, big)</c>.</p> + <p>Same as <c>encode_unsigned(<anno>Unsigned</anno>, big)</c>.</p> </desc> </func> + <func> <name name="encode_unsigned" arity="2"/> - <fsummary>Encodes an unsigned integer into the minimal binary</fsummary> + <fsummary>Encode an unsigned integer into the minimal binary.</fsummary> <desc> - <p>Converts a positive integer to the smallest possible - representation in a binary digit representation, either big + representation in a binary digit representation, either big endian or little endian.</p> - <p>Example:</p> + <p><em>Example:</em></p> <code> -1> binary:encode_unsigned(11111111,big). -<<169,138,199>> - </code> +1> binary:encode_unsigned(11111111, big). +<<169,138,199>></code> </desc> </func> + <func> <name name="first" arity="1"/> - <fsummary>Returns the first byte of a binary</fsummary> + <fsummary>Return the first byte of a binary.</fsummary> <desc> - - <p>Returns the first byte of the binary <c><anno>Subject</anno></c> as an integer. If the - size of <c><anno>Subject</anno></c> is zero, a <c>badarg</c> exception is raised.</p> - + <p>Returns the first byte of binary <c><anno>Subject</anno></c> as an + integer. 
If the size of <c><anno>Subject</anno></c> is zero, a + <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="last" arity="1"/> - <fsummary>Returns the last byte of a binary</fsummary> + <fsummary>Return the last byte of a binary.</fsummary> <desc> - - <p>Returns the last byte of the binary <c><anno>Subject</anno></c> as an integer. If the - size of <c><anno>Subject</anno></c> is zero, a <c>badarg</c> exception is raised.</p> - + <p>Returns the last byte of binary <c><anno>Subject</anno></c> as an + integer. If the size of <c><anno>Subject</anno></c> is zero, a + <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="list_to_bin" arity="1"/> - <fsummary>Convert a list of integers and binaries to a binary</fsummary> + <fsummary>Convert a list of integers and binaries to a binary.</fsummary> <desc> - <p>Works exactly as <c>erlang:list_to_binary/1</c>, added for completeness.</p> + <p>Works exactly as + <seealso marker="erts:erlang#list_to_binary/1"><c>erlang:list_to_binary/1</c></seealso>, + added for completeness.</p> </desc> </func> + <func> <name name="longest_common_prefix" arity="1"/> - <fsummary>Returns length of longest common prefix for a set of binaries</fsummary> + <fsummary>Return length of longest common prefix for a set of binaries. + </fsummary> <desc> - <p>Returns the length of the longest common prefix of the - binaries in the list <c><anno>Binaries</anno></c>. Example:</p> + binaries in list <c><anno>Binaries</anno></c>.</p> + + <p><em>Example:</em></p> <code> -1> binary:longest_common_prefix([<<"erlang">>,<<"ergonomy">>]). +1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]). 2 -2> binary:longest_common_prefix([<<"erlang">>,<<"perl">>]). -0 -</code> +2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]). 
+0</code> - <p>If <c><anno>Binaries</anno></c> is not a flat list of binaries, a <c>badarg</c> exception is raised.</p> + <p>If <c><anno>Binaries</anno></c> is not a flat list of binaries, a + <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="longest_common_suffix" arity="1"/> - <fsummary>Returns length of longest common suffix for a set of binaries</fsummary> + <fsummary>Return length of longest common suffix for a set of binaries. + </fsummary> <desc> - <p>Returns the length of the longest common suffix of the - binaries in the list <c><anno>Binaries</anno></c>. Example:</p> + binaries in list <c><anno>Binaries</anno></c>.</p> + + <p><em>Example:</em></p> <code> -1> binary:longest_common_suffix([<<"erlang">>,<<"fang">>]). +1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]). 3 -2> binary:longest_common_suffix([<<"erlang">>,<<"perl">>]). -0 -</code> - - <p>If <c>Binaries</c> is not a flat list of binaries, a <c>badarg</c> exception is raised.</p> +2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]). +0</code> + <p>If <c>Binaries</c> is not a flat list of binaries, a <c>badarg</c> + exception is raised.</p> </desc> </func> + <func> <name name="match" arity="2"/> - <fsummary>Searches for the first match of a pattern in a binary</fsummary> + <fsummary>Search for the first match of a pattern in a binary.</fsummary> <desc> - <p>The same as <c>match(<anno>Subject</anno>, <anno>Pattern</anno>, [])</c>.</p> + <p>Same as <c>match(<anno>Subject</anno>, <anno>Pattern</anno>, [])</c>. 
+ </p> </desc> </func> + <func> <name name="match" arity="3"/> - <fsummary>Searches for the first match of a pattern in a binary</fsummary> + <fsummary>Search for the first match of a pattern in a binary.</fsummary> <type name="part"/> <desc> + <p>Searches for the first occurrence of <c><anno>Pattern</anno></c> in + <c><anno>Subject</anno></c> and returns the position and length.</p> - <p>Searches for the first occurrence of <c><anno>Pattern</anno></c> in <c><anno>Subject</anno></c> and - returns the position and length.</p> + <p>The function returns <c>{Pos, Length}</c> for the binary + in <c><anno>Pattern</anno></c>, starting at the lowest position in + <c><anno>Subject</anno></c>.</p> - <p>The function will return <c>{Pos, Length}</c> for the binary - in <c><anno>Pattern</anno></c> starting at the lowest position in - <c><anno>Subject</anno></c>, Example:</p> + <p><em>Example:</em></p> <code> -1> binary:match(<<"abcde">>, [<<"bcde">>,<<"cd">>],[]). -{1,4} -</code> +1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]). +{1,4}</code> <p>Even though <c><<"cd">></c> ends before <c><<"bcde">></c>, <c><<"bcde">></c> @@ -325,41 +351,44 @@ <taglist> <tag>{scope, {<anno>Start</anno>, <anno>Length</anno>}}</tag> - <item><p>Only the given part is searched. Return values still have - offsets from the beginning of <c><anno>Subject</anno></c>. A negative <c>Length</c> is - allowed as described in the <c>DATA TYPES</c> section of this manual.</p></item> + <item><p>Only the specified part is searched. Return values still have + offsets from the beginning of <c><anno>Subject</anno></c>. 
A negative + <c>Length</c> is allowed as described in section Data Types in this + manual.</p></item> </taglist> - <p>If none of the strings in - <c><anno>Pattern</anno></c> is found, the atom <c>nomatch</c> is returned.</p> + <p>If none of the strings in <c><anno>Pattern</anno></c> is found, the + atom <c>nomatch</c> is returned.</p> - <p>For a description of <c><anno>Pattern</anno></c>, see - <seealso marker="#compile_pattern-1">compile_pattern/1</seealso>.</p> + <p>For a description of <c><anno>Pattern</anno></c>, see function + <seealso marker="#compile_pattern-1"><c>compile_pattern/1</c></seealso>. + </p> - <p>If <c>{scope, {Start,Length}}</c> is given in the options - such that <c>Start</c> is larger than the size of - <c>Subject</c>, <c>Start + Length</c> is less than zero or - <c>Start + Length</c> is larger than the size of + <p>If <c>{scope, {Start,Length}}</c> is specified in the options such + that <c>Start</c> > size of <c>Subject</c>, <c>Start</c> + + <c>Length</c> < 0 or <c>Start</c> + <c>Length</c> > size of <c>Subject</c>, a <c>badarg</c> exception is raised.</p> - </desc> </func> + <func> <name name="matches" arity="2"/> - <fsummary>Searches for all matches of a pattern in a binary</fsummary> + <fsummary>Search for all matches of a pattern in a binary.</fsummary> <desc> - <p>The same as <c>matches(<anno>Subject</anno>, <anno>Pattern</anno>, [])</c>.</p> + <p>Same as <c>matches(<anno>Subject</anno>, <anno>Pattern</anno>, [])</c>. 
+ </p> </desc> </func> + <func> <name name="matches" arity="3"/> - <fsummary>Searches for all matches of a pattern in a binary</fsummary> + <fsummary>Search for all matches of a pattern in a binary.</fsummary> <type name="part"/> <desc> - - <p>Works like <c>match/2</c>, but the <c><anno>Subject</anno></c> is searched until + <p>As <seealso marker="#match-2"><c>match/2</c></seealso>, + but <c><anno>Subject</anno></c> is searched until exhausted and a list of all non-overlapping parts matching - <c><anno>Pattern</anno></c> is returned (in order). </p> + <c><anno>Pattern</anno></c> is returned (in order).</p> <p>The first and longest match is preferred to a shorter, which is illustrated by the following example:</p> @@ -367,76 +396,84 @@ <code> 1> binary:matches(<<"abcde">>, [<<"bcde">>,<<"bc">>,<<"de">>],[]). -[{1,4}] -</code> - - <p>The result shows that <<"bcde">> is selected instead of the - shorter match <<"bc">> (which would have given raise to one - more match,<<"de">>). This corresponds to the behavior of posix - regular expressions (and programs like awk), but is not - consistent with alternative matches in re (and Perl), where +[{1,4}]</code> + + <p>The result shows that <<"bcde">> is selected instead of + the shorter match <<"bc">> (which would have given rise to + one more match, <<"de">>). 
+ This corresponds to the behavior of + POSIX regular expressions (and programs like awk), but is not + consistent with alternative matches in <c>re</c> (and Perl), where instead lexical ordering in the search pattern selects which string matches.</p> - <p>If none of the strings in pattern is found, an empty list is returned.</p> - - <p>For a description of <c><anno>Pattern</anno></c>, see <seealso marker="#compile_pattern-1">compile_pattern/1</seealso> and for a - description of available options, see <seealso marker="#match-3">match/3</seealso>.</p> + <p>If none of the strings in a pattern is found, an empty list is + returned.</p> - <p>If <c>{scope, {<anno>Start</anno>,<anno>Length</anno>}}</c> is given in the options such that - <c><anno>Start</anno></c> is larger than the size of <c><anno>Subject</anno></c>, <c><anno>Start</anno> + <anno>Length</anno></c> is - less than zero or <c><anno>Start</anno> + <anno>Length</anno></c> is larger than the size of - <c><anno>Subject</anno></c>, a <c>badarg</c> exception is raised.</p> + <p>For a description of <c><anno>Pattern</anno></c>, see + <seealso marker="#compile_pattern-1"><c>compile_pattern/1</c></seealso>. 
+ For a description of available options, see + <seealso marker="#match-3"><c>match/3</c></seealso>.</p> + <p>If <c>{scope, {<anno>Start</anno>,<anno>Length</anno>}}</c> is + specified in the options such that <c><anno>Start</anno></c> > size + of <c><anno>Subject</anno></c>, <c><anno>Start</anno> + + <anno>Length</anno></c> < 0 or <c><anno>Start</anno> + + <anno>Length</anno></c> is > size of <c><anno>Subject</anno></c>, + a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="part" arity="2"/> - <fsummary>Extracts a part of a binary</fsummary> + <fsummary>Extract a part of a binary.</fsummary> <desc> + <p>Extracts the part of binary <c><anno>Subject</anno></c> described by + <c><anno>PosLen</anno></c>.</p> - <p>Extracts the part of the binary <c><anno>Subject</anno></c> described by <c><anno>PosLen</anno></c>.</p> - - <p>Negative length can be used to extract bytes at the end of a binary:</p> + <p>A negative length can be used to extract bytes at the end of a + binary:</p> <code> 1> Bin = <<1,2,3,4,5,6,7,8,9,10>>. -2> binary:part(Bin,{byte_size(Bin), -5}). -<<6,7,8,9,10>> -</code> +2> binary:part(Bin, {byte_size(Bin), -5}). +<<6,7,8,9,10>></code> <note> - <p><seealso marker="#part-2">part/2</seealso>and <seealso - marker="#part-3">part/3</seealso> are also available in the - <c>erlang</c> module under the names <c>binary_part/2</c> and + <p><seealso marker="#part-2">part/2</seealso> and + <seealso marker="#part-3">part/3</seealso> are also available in the + <seealso marker="erts:erlang"><c>erlang</c></seealso> + module under the names <c>binary_part/2</c> and <c>binary_part/3</c>. 
Those BIFs are allowed in guard tests.</p> </note> - <p>If <c><anno>PosLen</anno></c> in any way references outside the binary, a <c>badarg</c> exception - is raised.</p> - + <p>If <c><anno>PosLen</anno></c> in any way references outside the binary, + a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="part" arity="3"/> - <fsummary>Extracts a part of a binary</fsummary> + <fsummary>Extract a part of a binary.</fsummary> <desc> - <p>The same as <c>part(<anno>Subject</anno>, {<anno>Pos</anno>, <anno>Len</anno>})</c>.</p> + <p>Same as <c>part(<anno>Subject</anno>, {<anno>Pos</anno>, + <anno>Len</anno>})</c>.</p> </desc> </func> + <func> <name name="referenced_byte_size" arity="1"/> - <fsummary>Determines the size of the actual binary pointed out by a sub-binary</fsummary> + <fsummary>Determine the size of the binary pointed out by a subbinary. + </fsummary> <desc> + <p>If a binary references a larger binary (often described as + being a subbinary), it can be useful to get the size of the + referenced binary. This function can be used in a program to trigger the + use of <seealso marker="#copy/1"><c>copy/1</c></seealso>. By copying a + binary, one can dereference the original, possibly large, binary that a + smaller binary is a reference to.</p> - <p>If a binary references a larger binary (often described as - being a sub-binary), it can be useful to get the size of the - actual referenced binary. This function can be used in a program - to trigger the use of <c>copy/1</c>. By copying a binary, one might - dereference the original, possibly large, binary which a smaller - binary is a reference to.</p> - - <p>Example:</p> + <p><em>Example:</em></p> <code> store(Binary, GBSet) -> @@ -447,26 +484,24 @@ store(Binary, GBSet) -> _ -> Binary end, - gb_sets:insert(NewBin,GBSet). 
- </code> + gb_sets:insert(NewBin,GBSet).</code> <p>In this example, we chose to copy the binary content before - inserting it in the <c>gb_sets:set()</c> if it references a binary more than - twice the size of the data we're going to keep. Of course - different rules for when copying will apply to different - programs.</p> + inserting it in <c>gb_sets:set()</c> if it references a binary more than + twice the data size we want to keep. Of course, + different rules apply when copying to different programs.</p> - <p>Binary sharing will occur whenever binaries are taken apart, - this is the fundamental reason why binaries are fast, + <p>Binary sharing occurs whenever binaries are taken apart. + This is the fundamental reason why binaries are fast, decomposition can always be done with O(1) complexity. In rare circumstances this data sharing is however undesirable, why this - function together with <c>copy/1</c> might be useful when optimizing + function together with <c>copy/1</c> can be useful when optimizing for memory use.</p> <p>Example of binary sharing:</p> <code> -1> A = binary:copy(<<1>>,100). +1> A = binary:copy(<<1>>, 100). <<1,1,1,1,1 ... 2> byte_size(A). 100 @@ -477,141 +512,138 @@ store(Binary, GBSet) -> 5> byte_size(B). 10 6> binary:referenced_byte_size(B) -100 - </code> +100</code> <note> <p>Binary data is shared among processes. If another process still references the larger binary, copying the part this - process uses only consumes more memory and will not free up the + process uses only consumes more memory and does not free up the larger binary for garbage collection. 
Use this kind of intrusive - functions with extreme care, and only if a real problem is - detected.</p> + functions with extreme care and only if a real problem is detected.</p> </note> - </desc> </func> + <func> <name name="replace" arity="3"/> - <fsummary>Replaces bytes in a binary according to a pattern</fsummary> + <fsummary>Replace bytes in a binary according to a pattern.</fsummary> <desc> - <p>The same as <c>replace(<anno>Subject</anno>,<anno>Pattern</anno>,<anno>Replacement</anno>,[])</c>.</p> + <p>Same as <c>replace(<anno>Subject</anno>, <anno>Pattern</anno>, <anno>Replacement</anno>,[])</c>.</p> </desc> </func> + <func> <name name="replace" arity="4"/> - <fsummary>Replaces bytes in a binary according to a pattern</fsummary> + <fsummary>Replace bytes in a binary according to a pattern.</fsummary> <type_desc variable="OnePos">An integer() =< byte_size(<anno>Replacement</anno>) </type_desc> <desc> - <p>Constructs a new binary by replacing the parts in - <c><anno>Subject</anno></c> matching <c><anno>Pattern</anno></c> with the content of - <c><anno>Replacement</anno></c>.</p> + <c><anno>Subject</anno></c> matching <c><anno>Pattern</anno></c> with + the content of <c><anno>Replacement</anno></c>.</p> + + <p>If the matching subpart of <c><anno>Subject</anno></c> giving rise + to the replacement is to be inserted in the result, option + <c>{insert_replaced, <anno>InsPos</anno>}</c> inserts the matching part + into <c><anno>Replacement</anno></c> at the specified position (or + positions) before inserting <c><anno>Replacement</anno></c> into + <c><anno>Subject</anno></c>.</p> - <p>If the matching sub-part of <c><anno>Subject</anno></c> giving raise to the - replacement is to be inserted in the result, the option - <c>{insert_replaced, <anno>InsPos</anno>}</c> will insert the matching part into - <c><anno>Replacement</anno></c> at the given position (or positions) before actually - inserting <c><anno>Replacement</anno></c> into the <c><anno>Subject</anno></c>. 
Example:</p> + <p><em>Example:</em></p> <code> -1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>,[{insert_replaced,1}]). +1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]). <<"a[b]cde">> -2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, - [global,{insert_replaced,1}]). +2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]). <<"a[b]c[d]e">> -3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>, - [global,{insert_replaced,[1,1]}]). +3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]). <<"a[bb]c[dd]e">> -4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>, - [global,{insert_replaced,[1,2]}]). -<<"a[b-b]c[d-d]e">> -</code> +4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]). +<<"a[b-b]c[d-d]e">></code> - <p>If any position given in <c><anno>InsPos</anno></c> is greater than the size of the replacement binary, a <c>badarg</c> exception is raised.</p> + <p>If any position specified in <c><anno>InsPos</anno></c> > size + of the replacement binary, a <c>badarg</c> exception is raised.</p> - <p>The options <c>global</c> and <c>{scope, part()}</c> work as for <seealso marker="#split-3">split/3</seealso>. The return type is always a <c>binary()</c>.</p> + <p>Options <c>global</c> and <c>{scope, part()}</c> work as for + <seealso marker="#split-3"><c>split/3</c></seealso>. + The return type is always a <c>binary()</c>.</p> - <p>For a description of <c><anno>Pattern</anno></c>, see <seealso marker="#compile_pattern-1">compile_pattern/1</seealso>.</p> + <p>For a description of <c><anno>Pattern</anno></c>, see + <seealso marker="#compile_pattern-1"><c>compile_pattern/1</c></seealso>. 
+ </p> </desc> </func> + <func> <name name="split" arity="2"/> - <fsummary>Splits a binary according to a pattern</fsummary> + <fsummary>Split a binary according to a pattern.</fsummary> <desc> - <p>The same as <c>split(<anno>Subject</anno>, <anno>Pattern</anno>, [])</c>.</p> + <p>Same as <c>split(<anno>Subject</anno>, <anno>Pattern</anno>, + [])</c>.</p> </desc> </func> + <func> <name name="split" arity="3"/> - <fsummary>Splits a binary according to a pattern</fsummary> + <fsummary>Split a binary according to a pattern.</fsummary> <desc> + <p>Splits <c><anno>Subject</anno></c> into a list of binaries based on + <c><anno>Pattern</anno></c>. If option <c>global</c> is not specified, + only the first occurrence of <c><anno>Pattern</anno></c> in + <c><anno>Subject</anno></c> gives rise to a split.</p> - <p>Splits <c><anno>Subject</anno></c> into a list of binaries based on <c><anno>Pattern</anno></c>. If - the option global is not given, only the first occurrence of - <c><anno>Pattern</anno></c> in <c><anno>Subject</anno></c> will give rise to a split.</p> + <p>The parts of <c><anno>Pattern</anno></c> found in + <c><anno>Subject</anno></c> are not included in the result.</p> - <p>The parts of <c><anno>Pattern</anno></c> actually found in <c><anno>Subject</anno></c> are not included in the result.</p> - - <p>Example:</p> + <p><em>Example:</em></p> <code> 1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]). [<<1,255,4>>, <<2,3>>] 2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]). -[<<0,1>>,<<4>>,<<9>>] -</code> +[<<0,1>>,<<4>>,<<9>>]</code> <p>Summary of options:</p> - <taglist> + <taglist> <tag>{scope, part()}</tag> - - <item><p>Works as in <seealso marker="#match-3">match/3</seealso> and - <seealso marker="#matches-3">matches/3</seealso>. Note that + <item><p>Works as in <seealso marker="#match-3"><c>match/3</c></seealso> + and <seealso marker="#matches-3"><c>matches/3</c></seealso>. 
Notice that this only defines the scope of the search for matching strings, - it does not cut the binary before splitting. The bytes before - and after the scope will be kept in the result. See example - below.</p></item> - + it does not cut the binary before splitting. The bytes before and after + the scope are kept in the result. See the example below.</p></item> <tag>trim</tag> - - <item><p>Removes trailing empty parts of the result (as does trim in <c>re:split/3</c>)</p></item> - + <item><p>Removes trailing empty parts of the result (as does <c>trim</c> + in <seealso marker="re#split/3"><c>re:split/3</c></seealso>.</p></item> <tag>trim_all</tag> - <item><p>Removes all empty parts of the result.</p></item> - <tag>global</tag> - - <item><p>Repeats the split until the <c><anno>Subject</anno></c> is - exhausted. Conceptually the global option makes split work on - the positions returned by <seealso marker="#matches-3">matches/3</seealso>, - while it normally - works on the position returned by - <seealso marker="#match-3">match/3</seealso>.</p></item> - + <item><p>Repeats the split until <c><anno>Subject</anno></c> is + exhausted. Conceptually option <c>global</c> makes split work + on the positions returned by + <seealso marker="#matches-3"><c>matches/3</c></seealso>, while it + normally works on the position returned by + <seealso marker="#match-3"><c>match/3</c></seealso>.</p></item> </taglist> <p>Example of the difference between a scope and taking the binary apart before splitting:</p> <code> -1> binary:split(<<"banana">>,[<<"a">>],[{scope,{2,3}}]). +1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]). [<<"ban">>,<<"na">>] -2> binary:split(binary:part(<<"banana">>,{2,3}),[<<"a">>],[]). -[<<"n">>,<<"n">>] -</code> +2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]). +[<<"n">>,<<"n">>]</code> <p>The return type is always a list of binaries that are all - referencing <c><anno>Subject</anno></c>. 
This means that the data in <c><anno>Subject</anno></c> is not - actually copied to new binaries and that <c><anno>Subject</anno></c> cannot be - garbage collected until the results of the split are no longer - referenced.</p> - - <p>For a description of <c><anno>Pattern</anno></c>, see <seealso marker="#compile_pattern-1">compile_pattern/1</seealso>.</p> + referencing <c><anno>Subject</anno></c>. This means that the data in + <c><anno>Subject</anno></c> is not copied to new binaries, and that + <c><anno>Subject</anno></c> cannot be garbage collected until the results + of the split are no longer referenced.</p> + <p>For a description of <c><anno>Pattern</anno></c>, see + <seealso marker="#compile_pattern-1"><c>compile_pattern/1</c></seealso>. + </p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/book.xml b/lib/stdlib/doc/src/book.xml index 84ce3f0788..008d7f4319 100644 --- a/lib/stdlib/doc/src/book.xml +++ b/lib/stdlib/doc/src/book.xml @@ -27,7 +27,7 @@ <docno></docno> <date>1997-05-02</date> <rev>1.3</rev> - <file>book.sgml</file> + <file>book.xml</file> </header> <insidecover> </insidecover> @@ -48,3 +48,4 @@ <index></index> </book> + diff --git a/lib/stdlib/doc/src/c.xml b/lib/stdlib/doc/src/c.xml index 9b4a9489c0..92ab59c6b0 100644 --- a/lib/stdlib/doc/src/c.xml +++ b/lib/stdlib/doc/src/c.xml @@ -25,270 +25,310 @@ <title>c</title> <prepared>Joe Armstrong</prepared> <docno>1</docno> - <date>96-10-30</date> + <date>1996-10-30</date> <rev>B</rev> </header> <module>c</module> - <modulesummary>Command Interface Module</modulesummary> + <modulesummary>Command interface module.</modulesummary> <description> - <p>The <c>c</c> module enables users to enter the short form of + <p>This module enables users to enter the short form of some commonly used commands.</p> <note> - <p>These functions are are intended for interactive use in - the Erlang shell only. 
The module prefix may be omitted.</p> + <p>These functions are intended for interactive use in + the Erlang shell only. The module prefix can be omitted.</p> </note> </description> + <funcs> <func> <name name="bt" arity="1"/> - <fsummary>Stack backtrace for a process</fsummary> + <fsummary>Stack backtrace for a process.</fsummary> <desc> <p>Stack backtrace for a process. Equivalent to <c>erlang:process_display(<anno>Pid</anno>, backtrace)</c>.</p> </desc> </func> + <func> <name name="c" arity="1"/> <name name="c" arity="2"/> - <fsummary>Compile and load code in a file</fsummary> + <fsummary>Compile and load code in a file.</fsummary> <desc> - <p><c>c/1,2</c> compiles and then purges and loads the code for - a file. <c><anno>Options</anno></c> defaults to []. Compilation is + <p>Compiles and then purges and loads the code for a file. + <c><anno>Options</anno></c> defaults to <c>[]</c>. Compilation is equivalent to:</p> <code type="none"> compile:file(<anno>File</anno>, <anno>Options</anno> ++ [report_errors, report_warnings])</code> - <p>Note that purging the code means that any processes + <p>Notice that purging the code means that any processes lingering in old code for the module are killed without - warning. See <c>code/3</c> for more information.</p> + warning. 
For more information, see <c>code/3</c>.</p> </desc> </func> + <func> <name name="cd" arity="1"/> - <fsummary>Change working directory</fsummary> + <fsummary>Change working directory.</fsummary> <desc> - <p>Changes working directory to <c><anno>Dir</anno></c>, which may be a + <p>Changes working directory to <c><anno>Dir</anno></c>, which can be a relative name, and then prints the name of the new working directory.</p> + <p><em>Example:</em></p> <pre> 2> <input>cd("../erlang").</input> /home/ron/erlang</pre> </desc> </func> + <func> <name name="flush" arity="0"/> - <fsummary>Flush any messages sent to the shell</fsummary> + <fsummary>Flush any messages sent to the shell.</fsummary> <desc> <p>Flushes any messages sent to the shell.</p> </desc> </func> + <func> <name name="help" arity="0"/> - <fsummary>Help information</fsummary> + <fsummary>Help information.</fsummary> <desc> <p>Displays help information: all valid shell internal commands, and commands in this module.</p> </desc> </func> + <func> <name name="i" arity="0"/> <name name="ni" arity="0"/> - <fsummary>Information about the system</fsummary> + <fsummary>System information.</fsummary> <desc> - <p><c>i/0</c> displays information about the system, listing + <p><c>i/0</c> displays system information, listing information about all processes. 
<c>ni/0</c> does the same, but for all nodes in the network.</p> </desc> </func> + <func> <name name="i" arity="3"/> - <fsummary>Information about pid <X.Y.Z></fsummary> + <fsummary>Information about pid <X.Y.Z>.</fsummary> <desc> <p>Displays information about a process. Equivalent to - <c>process_info(pid(<anno>X</anno>, <anno>Y</anno>, <anno>Z</anno>))</c>, but location transparent.</p> + <c>process_info(pid(<anno>X</anno>, <anno>Y</anno>, + <anno>Z</anno>))</c>, but location transparent.</p> </desc> </func> + <func> <name name="l" arity="1"/> - <fsummary>Load or reload module</fsummary> + <fsummary>Load or reload a module.</fsummary> <desc> <p>Purges and loads, or reloads, a module by calling <c>code:purge(<anno>Module</anno>)</c> followed by <c>code:load_file(<anno>Module</anno>)</c>.</p> - <p>Note that purging the code means that any processes + <p>Notice that purging the code means that any processes lingering in old code for the module are killed without - warning. See <c>code/3</c> for more information.</p> + warning. For more information, see <c>code/3</c>.</p> </desc> </func> + <func> <name>lc(Files) -> ok</name> - <fsummary>Compile a list of files</fsummary> + <fsummary>Compile a list of files.</fsummary> <type> <v>Files = [File]</v> - <v>File = <seealso marker="file#type-filename">file:filename() - </seealso></v> + <v>File</v> </type> <desc> - <p>Compiles a list of files by calling <c>compile:file(File, [report_errors, report_warnings])</c> for each <c>File</c> - in <c>Files</c>.</p> + <p>Compiles a list of files by calling + <c>compile:file(File, [report_errors, report_warnings])</c> for each + <c>File</c> in <c>Files</c>.</p> + <p>For information about <c>File</c>, see + <seealso marker="file#type-filename"><c>file:filename()</c></seealso>. 
+ </p> </desc> </func> + <func> <name name="ls" arity="0"/> - <fsummary>List files in the current directory</fsummary> + <fsummary>List files in the current directory.</fsummary> <desc> <p>Lists files in the current directory.</p> </desc> </func> + <func> <name name="ls" arity="1"/> - <fsummary>List files in a directory or a single file</fsummary> + <fsummary>List files in a directory or a single file.</fsummary> <desc> - <p>Lists files in directory <c><anno>Dir</anno></c> or, if Dir is a file, only list it.</p> + <p>Lists files in directory <c><anno>Dir</anno></c> or, if <c>Dir</c> + is a file, only lists it.</p> </desc> </func> + <func> <name name="m" arity="0"/> - <fsummary>Which modules are loaded</fsummary> + <fsummary>Which modules are loaded.</fsummary> <desc> <p>Displays information about the loaded modules, including the files from which they have been loaded.</p> </desc> </func> + <func> <name name="m" arity="1"/> - <fsummary>Information about a module</fsummary> + <fsummary>Information about a module.</fsummary> <desc> <p>Displays information about <c><anno>Module</anno></c>.</p> </desc> </func> + <func> <name name="memory" arity="0"/> - <fsummary>Memory allocation information</fsummary> + <fsummary>Memory allocation information.</fsummary> <desc> <p>Memory allocation information. Equivalent to - <seealso marker="erts:erlang#memory/0"><c>erlang:memory/0</c> - </seealso>.</p> + <seealso marker="erts:erlang#memory/0"><c>erlang:memory/0</c></seealso>.</p> </desc> </func> + <func> <name name="memory" arity="1" clause_i="1"/> <name name="memory" arity="1" clause_i="2"/> - <fsummary>Memory allocation information</fsummary> + <fsummary>Memory allocation information.</fsummary> <desc> <p>Memory allocation information. 
Equivalent to - <seealso marker="erts:erlang#memory/1"><c>erlang:memory/1</c> - </seealso>.</p> + <seealso marker="erts:erlang#memory/1"><c>erlang:memory/1</c></seealso>.</p> </desc> </func> + <func> <name name="nc" arity="1"/> <name name="nc" arity="2"/> - <fsummary>Compile and load code in a file on all nodes</fsummary> + <fsummary>Compile and load code in a file on all nodes.</fsummary> <desc> <p>Compiles and then loads the code for a file on all nodes. - <c><anno>Options</anno></c> defaults to []. Compilation is equivalent to:</p> + <c><anno>Options</anno></c> defaults to <c>[]</c>. + Compilation is equivalent to:</p> <code type="none"> compile:file(<anno>File</anno>, <anno>Options</anno> ++ [report_errors, report_warnings])</code> </desc> </func> + <func> <name name="nl" arity="1"/> - <fsummary>Load module on all nodes</fsummary> + <fsummary>Load module on all nodes.</fsummary> <desc> <p>Loads <c><anno>Module</anno></c> on all nodes.</p> </desc> </func> + <func> <name name="pid" arity="3"/> - <fsummary>Convert X,Y,Z to a pid</fsummary> + <fsummary>Convert <c>X,Y,Z</c> to a pid.</fsummary> <desc> - <p>Converts <c><anno>X</anno></c>, <c><anno>Y</anno></c>, <c><anno>Z</anno></c> to the pid - <c><![CDATA[<X.Y.Z>]]></c>. This function should only be used when - debugging.</p> + <p>Converts <c><anno>X</anno></c>, <c><anno>Y</anno></c>, + <c><anno>Z</anno></c> to pid <c><![CDATA[<X.Y.Z>]]></c>. 
+ This function is only to be used when debugging.</p> </desc> </func> + <func> <name name="pwd" arity="0"/> - <fsummary>Print working directory</fsummary> + <fsummary>Print working directory.</fsummary> <desc> <p>Prints the name of the working directory.</p> </desc> </func> + <func> <name name="q" arity="0"/> - <fsummary>Quit - shorthand for <c>init:stop()</c></fsummary> + <fsummary>Quit - shorthand for <c>init:stop()</c>.</fsummary> <desc> <p>This function is shorthand for <c>init:stop()</c>, that is, it causes the node to stop in a controlled fashion.</p> </desc> </func> + <func> <name name="regs" arity="0"/> <name name="nregs" arity="0"/> - <fsummary>Information about registered processes</fsummary> + <fsummary>Information about registered processes.</fsummary> <desc> <p><c>regs/0</c> displays information about all registered processes. <c>nregs/0</c> does the same, but for all nodes in the network.</p> </desc> </func> + <func> <name name="uptime" arity="0"/> - <fsummary>Print node uptime</fsummary> + <fsummary>Print node uptime.</fsummary> <desc> - <p>Prints the node uptime (as given by - <c>erlang:statistics(wall_clock)</c>), in human-readable form.</p> + <p>Prints the node uptime (as specified by + <c>erlang:statistics(wall_clock)</c>) in human-readable form.</p> </desc> </func> + <func> <name>xm(ModSpec) -> void()</name> - <fsummary>Cross reference check a module</fsummary> + <fsummary>Cross-reference check a module.</fsummary> <type> <v>ModSpec = Module | Filename</v> <v> Module = atom()</v> <v> Filename = string()</v> </type> <desc> - <p>This function finds undefined functions, unused functions, + <p>Finds undefined functions, unused functions, and calls to deprecated functions in a module by calling <c>xref:m/1</c>.</p> </desc> </func> + <func> <name>y(File) -> YeccRet</name> - <fsummary>Generate an LALR-1 parser</fsummary> + <fsummary>Generate an LALR-1 parser.</fsummary> <type> - <v>File = name() -- see filename(3)</v> - <v>YeccRet = -- see 
yecc:file/2</v> + <v>File = name()</v> + <v>YeccRet</v> </type> <desc> <p>Generates an LALR-1 parser. Equivalent to:</p> <code type="none"> yecc:file(File)</code> + <p>For information about <c>File = name()</c>, see + <seealso marker="filename"><c>filename(3)</c></seealso>. + For information about <c>YeccRet</c>, see + <seealso marker="parsetools:yecc#file/1"><c>yecc:file/2</c></seealso>. + </p> </desc> </func> + <func> <name>y(File, Options) -> YeccRet</name> - <fsummary>Generate an LALR-1 parser</fsummary> + <fsummary>Generate an LALR-1 parser.</fsummary> <type> - <v>File = name() -- see filename(3)</v> - <v>Options, YeccRet = -- see yecc:file/2</v> + <v>File = name()</v> + <v>Options, YeccRet</v> </type> <desc> <p>Generates an LALR-1 parser. Equivalent to:</p> <code type="none"> yecc:file(File, Options)</code> + <p>For information about <c>File = name()</c>, see + <seealso marker="filename"><c>filename(3)</c></seealso>. + For information about <c>Options</c> and <c>YeccRet</c>, see + <seealso marker="parsetools:yecc#file/1"><c>yecc:file/2</c></seealso>. 
+ </p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="compiler:compile">compile(3)</seealso>, - <seealso marker="filename">filename(3)</seealso>, - <seealso marker="erts:erlang">erlang(3)</seealso>, - <seealso marker="parsetools:yecc">yecc(3)</seealso>, - <seealso marker="tools:xref">xref(3)</seealso></p> + <p><seealso marker="filename"><c>filename(3)</c></seealso>, + <seealso marker="compiler:compile"><c>compile(3)</c></seealso>, + <seealso marker="erts:erlang"><c>erlang(3)</c></seealso>, + <seealso marker="parsetools:yecc"><c>yecc(3)</c></seealso>, + <seealso marker="tools:xref"><c>xref(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/calendar.xml b/lib/stdlib/doc/src/calendar.xml index 38bf55679e..65b3edcdf6 100644 --- a/lib/stdlib/doc/src/calendar.xml +++ b/lib/stdlib/doc/src/calendar.xml @@ -29,20 +29,21 @@ <rev>B</rev> </header> <module>calendar</module> - <modulesummary>Local and universal time, day-of-the-week, date and time conversions</modulesummary> + <modulesummary>Local and universal time, day of the week, date and time + conversions.</modulesummary> <description> <p>This module provides computation of local and universal time, - day-of-the-week, and several time conversion functions.</p> + day of the week, and many time conversion functions.</p> <p>Time is local when it is adjusted in accordance with the current time zone and daylight saving. Time is universal when it reflects the time at longitude zero, without any adjustment for daylight saving. Universal Coordinated Time (UTC) time is also called Greenwich Mean Time (GMT).</p> <p>The time functions <c>local_time/0</c> and - <c>universal_time/0</c> provided in this module both return date - and time. The reason for this is that separate functions for date - and time may result in a date/time combination which is displaced - by 24 hours. 
This happens if one of the functions is called + <c>universal_time/0</c> in this module both return date + and time. This is because separate functions for date + and time can result in a date/time combination that is displaced + by 24 hours. This occurs if one of the functions is called before midnight, and the other after midnight. This problem also applies to the Erlang BIFs <c>date/0</c> and <c>time/0</c>, and their use is strongly discouraged if a reliable date/time stamp @@ -56,22 +57,21 @@ <p>The Gregorian calendar in this module is extended back to year 0. For a given date, the <em>gregorian days</em> is the number of days up to and including the date specified. Similarly, - the <em>gregorian seconds</em> for a given date and time, is - the the number of seconds up to and including the specified date + the <em>gregorian seconds</em> for a specified date and time is + the number of seconds up to and including the specified date and time.</p> <p>For computing differences between epochs in time, use the functions counting gregorian days or seconds. If epochs are - given as local time, they must be converted to universal time, in - order to get the correct value of the elapsed time between epochs. - Use of the function <c>time_difference/2</c> is discouraged.</p> - <p>There exists different definitions for the week of the year. - The calendar module contains a week of the year implementation - which conforms to the ISO 8601 standard. Since the week number for - a given date can fall on the previous, the current or on the next - year it is important to provide the information which year is it - together with the week number. The function <c>iso_week_number/0</c> - and <c>iso_week_number/1</c> returns a tuple of the year and the - week number.</p> + specified as local time, they must be converted to universal time + to get the correct value of the elapsed time between epochs. 
+ Use of function <c>time_difference/2</c> is discouraged.</p> + <p>Different definitions exist for the week of the year. + This module contains a week of the year implementation + conforming to the ISO 8601 standard. As the week number for a + specified date can fall on the previous, the current, or on the next + year, it is important to specify both the year and the week number. + Functions <c>iso_week_number/0</c> and <c>iso_week_number/1</c> + return a tuple of the year and the week number.</p> </description> <datatypes> @@ -86,9 +86,9 @@ </datatype> <datatype> <name name="year"/> - <desc><p>Year cannot be abbreviated. Example: 93 denotes year - 93, not 1993. Valid range depends on the underlying OS. The - date tuple must denote a valid date.</p> + <desc><p>Year cannot be abbreviated. For example, 93 denotes year + 93, not 1993. The valid range depends on the underlying operating + system. The date tuple must denote a valid date.</p> </desc> </datatype> <datatype> @@ -130,186 +130,221 @@ <func> <name name="date_to_gregorian_days" arity="1"/> <name name="date_to_gregorian_days" arity="3"/> - <fsummary>Compute the number of days from year 0 up to the given date</fsummary> + <fsummary>Compute the number of days from year 0 up to the specified + date.</fsummary> <type variable="Date" name_i="1"/> <type variable="Year"/> <type variable="Month"/> <type variable="Day"/> <desc> - <p>This function computes the number of gregorian days starting - with year 0 and ending at the given date.</p> + <p>Computes the number of gregorian days starting + with year 0 and ending at the specified date.</p> </desc> </func> + <func> <name name="datetime_to_gregorian_seconds" arity="1"/> - <fsummary>Compute the number of seconds from year 0 up to the given date and time</fsummary> + <fsummary>Compute the number of seconds from year 0 up to the specified + date and time.</fsummary> <desc> - <p>This function computes the number of gregorian seconds - starting with year 0 and ending at the 
given date and time.</p> + <p>Computes the number of gregorian seconds starting + with year 0 and ending at the specified date and time.</p> </desc> </func> + <func> <name name="day_of_the_week" arity="1"/> <name name="day_of_the_week" arity="3"/> - <fsummary>Compute the day of the week</fsummary> + <fsummary>Compute the day of the week.</fsummary> <type variable="Date" name_i="1"/> <type variable="Year"/> <type variable="Month"/> <type variable="Day"/> <desc> - <p>This function computes the day of the week given <c><anno>Year</anno></c>, - <c><anno>Month</anno></c> and <c><anno>Day</anno></c>. The return value denotes the day - of the week as <c>1</c>: Monday, <c>2</c>: Tuesday, and so on.</p> + <p>Computes the day of the week from the specified + <c><anno>Year</anno></c>, <c><anno>Month</anno></c>, and + <c><anno>Day</anno></c>. Returns the day of the week as + <c>1</c>: Monday, <c>2</c>: Tuesday, and so on.</p> </desc> </func> + <func> <name name="gregorian_days_to_date" arity="1"/> - <fsummary>Compute the date given the number of gregorian days</fsummary> + <fsummary>Compute the date from the number of gregorian days.</fsummary> <desc> - <p>This function computes the date given the number of - gregorian days.</p> + <p>Computes the date from the specified number of gregorian days.</p> </desc> </func> + <func> <name name="gregorian_seconds_to_datetime" arity="1"/> - <fsummary>Compute the date given the number of gregorian days</fsummary> + <fsummary>Compute the date and time from the number of gregorian seconds. 
+ </fsummary> <desc> - <p>This function computes the date and time from the given + <p>Computes the date and time from the specified number of gregorian seconds.</p> </desc> </func> + <func> <name name="is_leap_year" arity="1"/> - <fsummary>Check if a year is a leap year</fsummary> + <fsummary>Check if the year is a leap year.</fsummary> <desc> - <p>This function checks if a year is a leap year.</p> + <p>Checks if the specified year is a leap year.</p> </desc> </func> + <func> <name name="iso_week_number" arity="0"/> - <fsummary>Compute the iso week number for the actual date</fsummary> + <fsummary>Compute the ISO week number for the actual date.</fsummary> <desc> - <p>This function returns the tuple {Year, WeekNum} representing - the iso week number for the actual date. For determining the - actual date, the function <c>local_time/0</c> is used.</p> + <p>Returns tuple <c>{Year, WeekNum}</c> representing + the ISO week number for the actual date. The actual date + is determined using function + <seealso marker="#local_time/0"><c>local_time/0</c></seealso>.</p> </desc> </func> + <func> <name name="iso_week_number" arity="1"/> - <fsummary>Compute the iso week number for the given date</fsummary> + <fsummary>Compute the ISO week number for the specified date.</fsummary> <desc> - <p>This function returns the tuple {Year, WeekNum} representing - the iso week number for the given date.</p> + <p>Returns tuple <c>{Year, WeekNum}</c> representing + the ISO week number for the specified date.</p> </desc> </func> + <func> <name name="last_day_of_the_month" arity="2"/> - <fsummary>Compute the number of days in a month</fsummary> + <fsummary>Compute the number of days in a month.</fsummary> <desc> - <p>This function computes the number of days in a month.</p> + <p>Computes the number of days in a month.</p> </desc> </func> + <func> <name name="local_time" arity="0"/> - <fsummary>Compute local time</fsummary> + <fsummary>Compute local time.</fsummary> <desc> - <p>This function
returns the local time reported by + <p>Returns the local time reported by the underlying operating system.</p> </desc> </func> + <func> <name name="local_time_to_universal_time" arity="1"/> - <fsummary>Convert from local time to universal time (deprecated)</fsummary> + <fsummary>Convert from local time to universal time (deprecated). + </fsummary> <desc> - <p>This function converts from local time to Universal - Coordinated Time (UTC). <c><anno>DateTime1</anno></c> must refer to a local + <p>Converts from local time to Universal Coordinated Time (UTC). + <c><anno>DateTime1</anno></c> must refer to a local date after Jan 1, 1970.</p> <warning> <p>This function is deprecated. Use - <c>local_time_to_universal_time_dst/1</c> instead, as it - gives a more correct and complete result. Especially for - the period that does not exist since it gets skipped during + <seealso marker="#local_time_to_universal_time_dst/1"> + <c>local_time_to_universal_time_dst/1</c></seealso> + instead, as it gives a more correct and complete result. + Especially for + the period that does not exist, as it is skipped during the switch <em>to</em> daylight saving time, this function still returns a result.</p> </warning> </desc> </func> + <func> <name name="local_time_to_universal_time_dst" arity="1"/> - <fsummary>Convert from local time to universal time(s)</fsummary> + <fsummary>Convert from local time to universal time(s).</fsummary> <desc> - <p>This function converts from local time to Universal - Coordinated Time (UTC). <c><anno>DateTime1</anno></c> must refer to a local + <p>Converts from local time to Universal Coordinated Time (UTC). 
+ <c><anno>DateTime1</anno></c> must refer to a local date after Jan 1, 1970.</p> - <p>The return value is a list of 0, 1 or 2 possible UTC times:</p> + <p>The return value is a list of 0, 1, or 2 possible UTC times:</p> <taglist> <tag><c>[]</c></tag> <item> <p>For a local <c>{Date1, Time1}</c> during the period that is skipped when switching <em>to</em> daylight saving - time, there is no corresponding UTC since the local time - is illegal - it has never happened.</p> + time, there is no corresponding UTC, as the local time + is illegal (it has never occurred).</p> </item> <tag><c>[DstDateTimeUTC, DateTimeUTC]</c></tag> <item> <p>For a local <c>{Date1, Time1}</c> during the period that is repeated when switching <em>from</em> daylight saving - time, there are two corresponding UTCs. One for the first + time, two corresponding UTCs exist; one for the first instance of the period when daylight saving time is still active, and one for the second instance.</p> </item> <tag><c>[DateTimeUTC]</c></tag> <item> - <p>For all other local times there is only one - corresponding UTC.</p> + <p>For all other local times only one corresponding UTC exists.</p> </item> </taglist> </desc> </func> + + <func> + <name name="now_to_datetime" arity="1"/> + <fsummary>Convert now to date and time.</fsummary> + <desc> + <p>Returns Universal Coordinated Time (UTC) + converted from the return value from + <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>.
+ </p> + </desc> + </func> + <func> <name name="now_to_local_time" arity="1"/> - <fsummary>Convert now to local date and time</fsummary> + <fsummary>Convert now to local date and time.</fsummary> <desc> - <p>This function returns local date and time converted from - the return value from - <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>.</p> + <p>Returns local date and time converted from the return value from + <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>. + </p> </desc> </func> + <func> <name name="now_to_universal_time" arity="1"/> - <name name="now_to_datetime" arity="1"/> - <fsummary>Convert now to date and time</fsummary> + <fsummary>Convert now to date and time.</fsummary> <desc> - <p>This function returns Universal Coordinated Time (UTC) - converted from the return value from - <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>.</p> + <p>Returns Universal Coordinated Time (UTC) + converted from the return value from + <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>. + </p> </desc> </func> + <func> <name name="seconds_to_daystime" arity="1"/> - <fsummary>Compute days and time from seconds</fsummary> + <fsummary>Compute days and time from seconds.</fsummary> <desc> - <p>This function transforms a given number of seconds into days, - hours, minutes, and seconds. The <c><anno>Time</anno></c> part is always - non-negative, but <c><anno>Days</anno></c> is negative if the argument + <p>Converts a specified number of seconds into days, hours, minutes, + and seconds. 
<c><anno>Time</anno></c> is always non-negative, but + <c><anno>Days</anno></c> is negative if argument <c><anno>Seconds</anno></c> is.</p> </desc> </func> + <func> <name name="seconds_to_time" arity="1"/> - <fsummary>Compute time from seconds</fsummary> + <fsummary>Compute time from seconds.</fsummary> <type name="secs_per_day"/> <desc> - <p>This function computes the time from the given number of - seconds. <c><anno>Seconds</anno></c> must be less than the number of + <p>Computes the time from the specified number of seconds. + <c><anno>Seconds</anno></c> must be less than the number of seconds per day (86400).</p> </desc> </func> + <func> <name name="time_difference" arity="2"/> - <fsummary>Compute the difference between two times (deprecated)</fsummary> + <fsummary>Compute the difference between two times (deprecated). + </fsummary> <desc> - <p>This function returns the difference between two <c>{Date, Time}</c> tuples. <c><anno>T2</anno></c> should refer to an epoch later + <p>Returns the difference between two <c>{Date, Time}</c> tuples. + <c><anno>T2</anno></c> is to refer to an epoch later than <c><anno>T1</anno></c>.</p> <warning> <p>This function is obsolete. Use the conversion functions for @@ -317,33 +352,38 @@ </warning> </desc> </func> + <func> <name name="time_to_seconds" arity="1"/> - <fsummary>Compute the number of seconds since midnight up to the given time</fsummary> + <fsummary>Compute the number of seconds since midnight up to the + specified time.</fsummary> <type name="secs_per_day"/> <desc> - <p>This function computes the number of seconds since midnight + <p>Returns the number of seconds since midnight up to the specified time.</p> </desc> </func> + <func> <name name="universal_time" arity="0"/> - <fsummary>Compute universal time</fsummary> + <fsummary>Compute universal time.</fsummary> <desc> - <p>This function returns the Universal Coordinated Time (UTC) - reported by the underlying operating system. 
Local time is - returned if universal time is not available.</p> + <p>Returns the Universal Coordinated Time (UTC) + reported by the underlying operating system. Returns local time if + universal time is unavailable.</p> </desc> </func> + <func> <name name="universal_time_to_local_time" arity="1"/> - <fsummary>Convert from universal time to local time</fsummary> + <fsummary>Convert from universal time to local time.</fsummary> <desc> - <p>This function converts from Universal Coordinated Time (UTC) - to local time. <c><anno>DateTime</anno></c> must refer to a date after Jan 1, - 1970.</p> + <p>Converts from Universal Coordinated Time (UTC) to local time. + <c><anno>DateTime</anno></c> must refer to a date after Jan 1, 1970. + </p> </desc> </func> + <func> <name name="valid_date" arity="1"/> <name name="valid_date" arity="3"/> @@ -362,31 +402,31 @@ <title>Leap Years</title> <p>The notion that every fourth year is a leap year is not completely true. By the Gregorian rule, a year Y is a leap year if - either of the following rules is valid:</p> + one of the following rules is valid:</p> <list type="bulleted"> <item> - <p>Y is divisible by 4, but not by 100; or</p> + <p>Y is divisible by 4, but not by 100.</p> </item> <item> <p>Y is divisible by 400.</p> </item> </list> - <p>Accordingly, 1996 is a leap year, 1900 is not, but 2000 is.</p> + <p>Hence, 1996 is a leap year, 1900 is not, but 2000 is.</p> </section> <section> <title>Date and Time Source</title> <p>Local time is obtained from the Erlang BIF <c>localtime/0</c>. 
Universal time is computed from the BIF <c>universaltime/0</c>.</p> - <p>The following facts apply:</p> + <p>The following facts apply:</p> <list type="bulleted"> - <item>there are 86400 seconds in a day</item> - <item>there are 365 days in an ordinary year</item> - <item>there are 366 days in a leap year</item> - <item>there are 1461 days in a 4 year period</item> - <item>there are 36524 days in a 100 year period</item> - <item>there are 146097 days in a 400 year period</item> - <item>there are 719528 days between Jan 1, 0 and Jan 1, 1970.</item> + <item>There are 86400 seconds in a day.</item> + <item>There are 365 days in an ordinary year.</item> + <item>There are 366 days in a leap year.</item> + <item>There are 1461 days in a 4 year period.</item> + <item>There are 36524 days in a 100 year period.</item> + <item>There are 146097 days in a 400 year period.</item> + <item>There are 719528 days between Jan 1, 0 and Jan 1, 1970.</item> </list> </section> </erlref> diff --git a/lib/stdlib/doc/src/dets.xml b/lib/stdlib/doc/src/dets.xml index 177c2ba508..2e4261d72e 100644 --- a/lib/stdlib/doc/src/dets.xml +++ b/lib/stdlib/doc/src/dets.xml @@ -26,82 +26,100 @@ <prepared>Claes Wikström</prepared> <responsible>Claes Wikström</responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2001-06-06</date> <rev>B</rev> - <file>dets.sgml</file> + <file>dets.xml</file> </header> <module>dets</module> - <modulesummary>A Disk Based Term Storage</modulesummary> + <modulesummary>A disk-based term storage.</modulesummary> <description> - <p>The module <c>dets</c> provides a term storage on file. The + <p>This module provides a term storage on file. The stored terms, in this module called <em>objects</em>, are tuples such that one element is defined to be the key.
A Dets <em>table</em> is a collection of objects with the key at the same position stored on a file.</p> - <p>Dets is used by the Mnesia application, and is provided as is - for users who are interested in an efficient storage of Erlang - terms on disk only. Many applications just need to store some + + <p>This module is used by the Mnesia application, and is provided + "as is" for users who are interested in efficient storage of Erlang + terms on disk only. Many applications only need to store some terms in a file. Mnesia adds transactions, queries, and distribution. The size of Dets files cannot exceed 2 GB. If larger - tables are needed, Mnesia's table fragmentation can be used.</p> - <p>There are three types of Dets tables: set, bag and - duplicate_bag. A table of type <em>set</em> has at most one object - with a given key. If an object with a key already present in the - table is inserted, the existing object is overwritten by the new - object. A table of type <em>bag</em> has zero or more different - objects with a given key. A table of type <em>duplicate_bag</em> - has zero or more possibly matching objects with a given key.</p> + tables are needed, table fragmentation in Mnesia can be used.</p> + + <p>Three types of Dets tables exist:</p> + + <list type="bulleted"> + <item><p><c>set</c>. A table of this type has at most one object with a + given key. If an object with a key already present in the + table is inserted, the existing object is overwritten by the new + object.</p> + </item> + <item><p><c>bag</c>. A table of this type has zero or more different + objects with a given key.</p> + </item> + <item><p><c>duplicate_bag</c>. A table of this type has zero or more + possibly matching objects with a given key.</p> + </item> + </list> + <p>Dets tables must be opened before they can be updated or read, - and when finished they must be properly closed. If a table has not - been properly closed, Dets will automatically repair the table. 
+ and when finished they must be properly closed. If a table is not + properly closed, Dets automatically repairs the table. This can take a substantial time if the table is large. A Dets table is closed when the process which opened the table - terminates. If several Erlang processes (users) open the same Dets - table, they will share the table. The table is properly closed + terminates. If many Erlang processes (users) open the same Dets + table, they share the table. The table is properly closed when all users have either terminated or closed the table. Dets - tables are not properly closed if the Erlang runtime system is - terminated abnormally.</p> + tables are not properly closed if the Erlang runtime system + terminates abnormally.</p> + <note> - <p>A ^C command abnormally terminates an Erlang runtime + <p>A <c>^C</c> command abnormally terminates an Erlang runtime system in a Unix environment with a break-handler.</p> </note> - <p>Since all operations performed by Dets are disk operations, it + + <p>As all operations performed by Dets are disk operations, it is important to realize that a single look-up operation involves a - series of disk seek and read operations. For this reason, the Dets - functions are much slower than the corresponding Ets functions, + series of disk seek and read operations. The Dets functions + are therefore much slower than the corresponding + <seealso marker="ets"><c>ets(3)</c></seealso> functions, although Dets exports a similar interface.</p> + <p>Dets organizes data as a linear hash list and the hash list grows gracefully as more data is inserted into the table. Space management on the file is performed by what is called a buddy system. The current implementation keeps the entire buddy system in RAM, which implies that if the table gets heavily fragmented, quite some memory can be used up. 
The only way to defragment a - table is to close it and then open it again with the <c>repair</c> - option set to <c>force</c>.</p> - <p>It is worth noting that the ordered_set type present in Ets is - not yet implemented by Dets, neither is the limited support for - concurrent updates which makes a sequence of <c>first</c> and - <c>next</c> calls safe to use on fixed Ets tables. Both these - features will be implemented by Dets in a future release of - Erlang/OTP. Until then, the Mnesia application (or some user - implemented method for locking) has to be used to implement safe - concurrency. Currently, no library of Erlang/OTP has support for - ordered disk based term storage.</p> + table is to close it and then open it again with option <c>repair</c> + set to <c>force</c>.</p> + + <p>Notice that type <c>ordered_set</c> in Ets is not yet + provided by Dets, neither is the limited support for + concurrent updates that makes a sequence of <c>first</c> and + <c>next</c> calls safe to use on fixed ETS tables. Both these + features will be provided by Dets in a future release of + Erlang/OTP. Until then, the Mnesia application (or some + user-implemented method for locking) must be used to implement safe + concurrency. Currently, no Erlang/OTP library has support for + ordered disk-based term storage.</p> + <p>Two versions of the format used for storing objects on file are supported by Dets. The first version, 8, is the format always used - for tables created by OTP R7 and earlier. The second version, 9, - is the default version of tables created by OTP R8 (and later OTP - releases). OTP R8 can create version 8 tables, and convert version - 8 tables to version 9, and vice versa, upon request. - </p> + for tables created by Erlang/OTP R7 and earlier. The second version, 9, + is the default version of tables created by Erlang/OTP R8 (and later + releases). 
Erlang/OTP R8 can create version 8 tables, and convert version + 8 tables to version 9, and conversely, upon request.</p> <p>All Dets functions return <c>{error, Reason}</c> if an error - occurs (<c>first/1</c> and <c>next/2</c> are exceptions, they exit - the process with the error tuple). If given badly formed - arguments, all functions exit the process with a <c>badarg</c> + occurs (<seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso> are exceptions, they + exit the process with the error tuple). If badly formed arguments are + specified, all functions exit the process with a <c>badarg</c> message.</p> </description> + <datatypes> <datatype> <name name="access"/> @@ -130,10 +148,11 @@ <datatype> <name name="match_spec"/> <desc> - <p>Match specifications, see the <seealso - marker="erts:match_spec">match specification</seealso> - documentation in the ERTS User's Guide and <seealso - marker="ms_transform">ms_transform(3).</seealso></p> + <p>Match specifications, see section + <seealso marker="erts:match_spec"> + Match Specification in Erlang</seealso> in ERTS User's Guide and the + <seealso marker="ms_transform"><c>ms_transform(3)</c></seealso> + module.</p> </desc> </datatype> <datatype> @@ -146,15 +165,15 @@ <name name="object_cont"/> <desc> <p>Opaque continuation used by <seealso marker="#match_object/1"> - <c>match_object/1</c></seealso> and <seealso marker="#match_object/3"> - <c>match_object/3</c></seealso>.</p> + <c>match_object/1</c></seealso> and + <seealso marker="#match_object/3"><c>match_object/3</c></seealso>.</p> </desc> </datatype> <datatype> <name name="pattern"/> <desc> - <p>See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns.</p> + <p>For a description of patterns, see + <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>.</p> </desc> </datatype> <datatype> @@ -175,67 +194,69 @@ <name name="version"/> </datatype> </datatypes> + <funcs> <func> <name 
name="all" arity="0"/> - <fsummary>Return a list of the names of all open Dets tables on this node.</fsummary> + <fsummary>Return a list of the names of all open Dets tables on + this node.</fsummary> <desc> - <p>Returns a list of the names of all open tables on this - node.</p> + <p>Returns a list of the names of all open tables on this node.</p> </desc> </func> + <func> <name name="bchunk" arity="2"/> - <fsummary>Return a chunk of objects stored in a Dets table.</fsummary> + <fsummary>Return a chunk of objects stored in a Dets table. + </fsummary> <desc> <p>Returns a list of objects stored in a table. The exact representation of the returned objects is not public. The - lists of data can be used for initializing a table by giving - the value <c>bchunk</c> to the <c>format</c> option of the + lists of data can be used for initializing a table by specifying + value <c>bchunk</c> to option <c>format</c> of function <seealso marker="#init_table/3"><c>init_table/3</c></seealso> - function. The Mnesia application uses this + The Mnesia application uses this function for copying open tables.</p> <p>Unless the table is protected using <c>safe_fixtable/2</c>, - calls to <c>bchunk/2</c> may not work as expected if + calls to <c>bchunk/2</c> might not work as expected if concurrent updates are made to the table.</p> <p>The first time <c>bchunk/2</c> is called, an initial continuation, the atom <c>start</c>, must be provided.</p> - <p>The <c>bchunk/2</c> function returns a tuple + <p><c>bchunk/2</c> returns a tuple <c>{<anno>Continuation2</anno>, <anno>Data</anno>}</c>, where <c><anno>Data</anno></c> is a list of objects. <c><anno>Continuation2</anno></c> is another continuation - which is - to be passed on to a subsequent call to <c>bchunk/2</c>. With - a series of calls to <c>bchunk/2</c> it is possible to extract - all objects of the table. - </p> + that is to be passed on to a subsequent call to <c>bchunk/2</c>.
With + a series of calls to <c>bchunk/2</c>, all table objects can be + extracted.</p> <p><c>bchunk/2</c> returns <c>'$end_of_table'</c> when all - objects have been returned, or <c>{error, <anno>Reason</anno>}</c> - if an error occurs. - </p> + objects are returned, or <c>{error, <anno>Reason</anno>}</c> + if an error occurs.</p> </desc> </func> + <func> <name name="close" arity="1"/> <fsummary>Close a Dets table.</fsummary> <desc> <p>Closes a table. Only processes that have opened a table are - allowed to close it. - </p> + allowed to close it.</p> <p>All open tables must be closed before the system is - stopped. If an attempt is made to open a table which has not - been properly closed, Dets automatically tries to repair the - table.</p> + stopped. If an attempt is made to open a table that is not + properly closed, Dets automatically tries to repair it.</p> </desc> </func> + <func> <name name="delete" arity="2"/> - <fsummary>Delete all objects with a given key from a Dets table.</fsummary> + <fsummary>Delete all objects with a specified key from a Dets + table.</fsummary> <desc> - <p>Deletes all objects with the key <c><anno>Key</anno></c> from - the table <c><anno>Name</anno></c>.</p> + <p>Deletes all objects with key <c><anno>Key</anno></c> from + table <c><anno>Name</anno></c>.</p> </desc> </func> + <func> <name name="delete_all_objects" arity="1"/> <fsummary>Delete all objects from a Dets table.</fsummary> @@ -245,264 +266,275 @@ is equivalent to <c>match_delete(T, '_')</c>.</p> </desc> </func> + <func> <name name="delete_object" arity="2"/> - <fsummary>Delete a given object from a Dets table.</fsummary> + <fsummary>Delete a specified object from a Dets table.</fsummary> <desc> - <p>Deletes all instances of a given object from a table. If a - table is of type <c>bag</c> or <c>duplicate_bag</c>, the - <c>delete/2</c> function cannot be used to delete only some of - the objects with a given key. 
This function makes this - possible.</p> + <p>Deletes all instances of a specified object from a table. If a + table is of type <c>bag</c> or <c>duplicate_bag</c>, this + function can be used to delete only some of + the objects with a specified key.</p> </desc> </func> + <func> <name name="first" arity="1"/> <fsummary>Return the first key stored in a Dets table.</fsummary> <desc> - <p>Returns the first key stored in the table <c><anno>Name</anno></c> - according to the table's internal order, or + <p>Returns the first key stored in table <c><anno>Name</anno></c> + according to the internal order of the table, or <c>'$end_of_table'</c> if the table is empty.</p> <p>Unless the table is protected using <c>safe_fixtable/2</c>, subsequent calls to <seealso marker="#next/2"><c>next/2</c></seealso> - may not work as expected if + might not work as expected if concurrent updates are made to the table.</p> - <p>Should an error occur, the process is exited with an error - tuple <c>{error, Reason}</c>. The reason for not returning the - error tuple is that it cannot be distinguished from a key.</p> + <p>If an error occurs, the process is exited with an error + tuple <c>{error, Reason}</c>. The error tuple is not returned, + as it cannot be distinguished from a key.</p> <p>There are two reasons why <c>first/1</c> and <c>next/2</c> - should not be used: they are not very efficient, and they - prevent the use of the key <c>'$end_of_table'</c> since this - atom is used to indicate the end of the table. If possible, - the <c>match</c>, <c>match_object</c>, and <c>select</c> - functions should be used for traversing tables.</p> + are not to be used: they are not efficient, and they + prevent the use of key <c>'$end_of_table'</c>, as this atom + is used to indicate the end of the table.
If possible, use functions + <seealso marker="#match/1"><c>match</c></seealso>, + <seealso marker="#match_object/1"><c>match_object</c></seealso>, and + <seealso marker="#select/1"><c>select</c></seealso> + for traversing tables.</p> </desc> </func> + <func> <name name="foldl" arity="3"/> <name name="foldr" arity="3"/> <fsummary>Fold a function over a Dets table.</fsummary> <desc> <p>Calls <c><anno>Function</anno></c> on successive elements of - the table <c><anno>Name</anno></c> together with an extra argument - <c>AccIn</c>. The - order in which the elements of the table are traversed is - unspecified. <c><anno>Function</anno></c> must return a new - accumulator which is passed to the next call. - <c><anno>Acc0</anno></c> is returned if - the table is empty.</p> + table <c><anno>Name</anno></c> together with an extra argument + <c>AccIn</c>. The table elements are traversed in unspecified + order. <c><anno>Function</anno></c> must return a new + accumulator that is passed to the next call. + <c><anno>Acc0</anno></c> is returned if the table is empty.</p> </desc> </func> + <func> <name name="from_ets" arity="2"/> - <fsummary>Replace the objects of a Dets table with the objects of an Ets table.</fsummary> + <fsummary>Replace the objects of a Dets table with the objects + of an ETS table.</fsummary> <desc> - <p>Deletes all objects of the table <c><anno>Name</anno></c> and then - inserts all the objects of the Ets table <c><anno>EtsTab</anno></c>. - The order in which the objects are inserted is not specified. - Since <c>ets:safe_fixtable/2</c> is called the Ets table must - be public or owned by the calling process.</p> + <p>Deletes all objects of table <c><anno>Name</anno></c> and then + inserts all the objects of the ETS table + <c><anno>EtsTab</anno></c>. The objects are inserted in unspecified + order. 
As <c>ets:safe_fixtable/2</c> is called, the ETS table + must be public or owned by the calling process.</p> </desc> </func> + <func> <name name="info" arity="1"/> <fsummary>Return information about a Dets table.</fsummary> <desc> - <p>Returns information about the table <c><anno>Name</anno></c> - as a list of tuples:</p> + <p>Returns information about table <c><anno>Name</anno></c> + as a list of tuples:</p> <list type="bulleted"> <item> - <p><c>{file_size, integer() >= 0}</c>, the size of the file in - bytes.</p> + <p><c>{file_size, integer() >= 0}</c> - The file size, in + bytes.</p> </item> <item> - <p><c>{filename, </c><seealso marker="file#type-name">file:name()</seealso><c>}</c>, - the name of the file where objects are stored.</p> + <p><c>{filename, </c><seealso marker="file#type-name"> + <c>file:name()</c></seealso><c>}</c> - The name of the file + where objects are stored.</p> </item> <item> - <p><c>{keypos, </c><seealso marker="#type-keypos">keypos()</seealso> - <c>}</c>, the position of the key.</p> + <p><c>{keypos, </c><seealso marker="#type-keypos"> + <c>keypos()</c></seealso><c>}</c> - The key position.</p> </item> <item> - <p><c>{size, integer() >= 0}</c>, the number of objects stored - in the table.</p> + <p><c>{size, integer() >= 0}</c> - The number of objects + stored in the table.</p> </item> <item> - <p><c>{type, </c><seealso marker="#type-type">type()</seealso> - <c>}</c>, the type of the table.</p> + <p><c>{type, </c><seealso marker="#type-type"> + <c>type()</c></seealso><c>}</c> - The table type.</p> </item> </list> </desc> </func> + <func> <name name="info" arity="2"/> - <fsummary>Return the information associated with a given item for a Dets table.</fsummary> + <fsummary>Return the information associated with a specified item for + a Dets table.</fsummary> <desc> <p>Returns the information associated with <c><anno>Item</anno></c> - for the table <c><anno>Name</anno></c>. + for table <c><anno>Name</anno></c>.
In addition to the <c>{<anno>Item</anno>, <anno>Value</anno>}</c> - pairs defined for <c>info/1</c>, the following items are - allowed:</p> + pairs defined for <seealso marker="#info/1"><c>info/1</c></seealso>, + the following items are allowed:</p> <list type="bulleted"> <item> - <p><c>{access, </c><seealso marker="#type-access">access()</seealso> - <c>}</c>, the access mode.</p> + <p><c>{access, </c><seealso marker="#type-access"> + <c>access()</c></seealso><c>}</c> - The access mode.</p> </item> <item> <p><c>{auto_save, </c><seealso marker="#type-auto_save"> - auto_save()</seealso><c>}</c>, the auto save interval.</p> + <c>auto_save()</c></seealso><c>}</c> - The autosave interval.</p> </item> <item> - <p><c>{bchunk_format, binary()}</c>, an opaque binary + <p><c>{bchunk_format, binary()}</c> - An opaque binary describing the format of the objects returned by <c>bchunk/2</c>. The binary can be used as argument to <c>is_compatible_chunk_format/2</c>. Only available for version 9 tables.</p> </item> <item> - <p><c>{hash,</c> Hash<c>}</c>. Describes which BIF is - used to calculate the hash values of the objects stored in - the Dets table. Possible values of Hash are <c>hash</c>, - which implies that the <c>erlang:hash/2</c> BIF is used, - <c>phash</c>, which implies that the <c>erlang:phash/2</c> - BIF is used, and <c>phash2</c>, which implies that the - <c>erlang:phash2/1</c> BIF is used.</p> + <p><c>{hash, Hash}</c> - Describes which BIF is + used to calculate the hash values of the objects stored in the + Dets table. Possible values of <c>Hash</c>:</p> + <list> + <item> + <p><c>hash</c> - Implies that the <c>erlang:hash/2</c> BIF + is used.</p> + </item> + <item> + <p><c>phash</c> - Implies that the <c>erlang:phash/2</c> BIF + is used.</p> + </item> + <item> + <p><c>phash2</c> - Implies that the <c>erlang:phash2/1</c> BIF + is used.</p> + </item> + </list> </item> <item> - <p><c>{memory, integer() >= 0}</c>, the size of the file in - bytes. 
The same value is associated with the item - <c>file_size</c>.</p> + <p><c>{memory, integer() >= 0}</c> - The file size, in bytes. + The same value is associated with item <c>file_size</c>.</p> </item> <item> - <p><c>{no_keys, integer >= 0()}</c>, the number of different + <p><c>{no_keys, integer >= 0()}</c> - The number of different keys stored in the table. Only available for version 9 tables.</p> </item> <item> - <p><c>{no_objects, integer >= 0()}</c>, the number of objects + <p><c>{no_objects, integer >= 0()}</c> - The number of objects stored in the table.</p> </item> <item> - <p><c>{no_slots, {</c>Min<c>, </c>Used<c>, </c>Max<c>}}</c>, - the number of - slots of the table. <c>Min</c> is the minimum number of + <p><c>{no_slots, {Min, Used, Max}}</c> - The + number of slots of the table. <c>Min</c> is the minimum number of slots, <c>Used</c> is the number of currently used slots, and <c>Max</c> is the maximum number of slots. Only available for version 9 tables.</p> </item> <item> - <p><c>{owner, pid()}</c>, the pid of the process that + <p><c>{owner, pid()}</c> - The pid of the process that handles requests to the Dets table.</p> </item> <item> - <p><c>{ram_file, boolean()}</c>, whether the table is + <p><c>{ram_file, boolean()}</c> - Whether the table is kept in RAM.</p> </item> <item> - <p><c>{safe_fixed_monotonic_time, SafeFixed}</c>. If the table - is fixed, <c>SafeFixed</c> is a tuple <c>{FixedAtTime, [{Pid,RefCount}]}</c>. - <c>FixedAtTime</c> is the time when + <p><c>{safe_fixed_monotonic_time, SafeFixed}</c> - If the table + is fixed, <c>SafeFixed</c> is a tuple + <c>{FixedAtTime, [{Pid,RefCount}]}</c>. + <c>FixedAtTime</c> is the time when the table was first fixed, and <c>Pid</c> is the pid of the process that fixes the table <c>RefCount</c> times. - There may be any number of processes in the list. 
If the - table is not fixed, SafeFixed is the atom <c>false</c>.</p> - <p><c>FixedAtTime</c> will correspond to the result - returned by - <seealso marker="erts:erlang#monotonic_time/0">erlang:monotonic_time/0</seealso> - at the time of fixation. The usage of <c>safe_fixed_monotonic_time</c> is - <seealso marker="erts:time_correction#Time_Warp_Safe_Code">time warp - safe</seealso>.</p> + There can be any number of processes in the list. If the table + is not fixed, <c>SafeFixed</c> is the atom <c>false</c>.</p> + <p><c>FixedAtTime</c> corresponds to the result returned by + <seealso marker="erts:erlang#monotonic_time/0"> + <c>erlang:monotonic_time/0</c></seealso> at the time of fixation. + The use of <c>safe_fixed_monotonic_time</c> is + <seealso marker="erts:time_correction#Time_Warp_Safe_Code"> + time warp safe</seealso>.</p> </item> <item> - <p> - <c>{safe_fixed, SafeFixed}</c>. The same as - <c>{safe_fixed_monotonic_time, SafeFixed}</c> with the exception - of the format and value of <c>FixedAtTime</c>. - </p> - <p> - <c>FixedAtTime</c> will correspond to the result returned by - <seealso marker="erts:erlang#timestamp/0">erlang:timestamp/0</seealso> - at the time of fixation. Note that when the system is using - single or multi - <seealso marker="erts:time_correction#Time_Warp_Modes">time warp - modes</seealso> this might produce strange results. This - since the usage of <c>safe_fixed</c> is not - <seealso marker="erts:time_correction#Time_Warp_Safe_Code">time warp - safe</seealso>. Time warp safe code need to use - <c>safe_fixed_monotonic_time</c> instead.</p> + <p><c>{safe_fixed, SafeFixed}</c> - The same as + <c>{safe_fixed_monotonic_time, SafeFixed}</c> except + the format and value of <c>FixedAtTime</c>.</p> + <p><c>FixedAtTime</c> corresponds to the result returned by + <seealso marker="erts:erlang#timestamp/0"> + <c>erlang:timestamp/0</c></seealso> at the time of fixation. 
+ Notice that when the system uses single or multi + <seealso marker="erts:time_correction#Time_Warp_Modes">time warp + modes</seealso>, this can produce strange results. This is + because the use of <c>safe_fixed</c> is not + <seealso marker="erts:time_correction#Time_Warp_Safe_Code"> + time warp safe</seealso>. Time warp safe code must use + <c>safe_fixed_monotonic_time</c> instead.</p> </item> <item> - <p><c>{version, integer()}</c>, the version of the format of + <p><c>{version, integer()}</c> - The version of the format of the table.</p> </item> </list> </desc> </func> + <func> <name name="init_table" arity="2"/> <name name="init_table" arity="3"/> <fsummary>Replace all objects of a Dets table.</fsummary> <desc> - <p>Replaces the existing objects of the table <c><anno>Name</anno></c> + <p>Replaces the existing objects of table <c><anno>Name</anno></c> with objects created by calling the input function <c><anno>InitFun</anno></c>, see below. The reason for using this function rather than - calling <c>insert/2</c> is that of efficiency. It should be - noted that the input functions are called by the process that + calling <c>insert/2</c> is that of efficiency. Notice + that the input functions are called by the process that handles requests to the Dets table, not by the calling process.</p> - <p>When called with the argument <c>read</c> the function - <c><anno>InitFun</anno></c> is assumed to return - <c>end_of_input</c> when - there is no more input, or <c>{Objects, Fun}</c>, where + <p>When called with argument <c>read</c>, function + <c><anno>InitFun</anno></c> is assumed to return <c>end_of_input</c> + when there is no more input, or <c>{Objects, Fun}</c>, where <c>Objects</c> is a list of objects and <c>Fun</c> is a new - input function. Any other value Value is returned as an error - <c>{error, {init_fun, Value}}</c>. 
Each input function will be - called exactly once, and should an error occur, the last - function is called with the argument <c>close</c>, the reply + input function. Any other value <c>Value</c> is returned as an error + <c>{error, {init_fun, Value}}</c>. Each input function is + called exactly once, and if an error occurs, the last + function is called with argument <c>close</c>, the reply of which is ignored.</p> - <p>If the type of the table is <c>set</c> and there is more - than one object with a given key, one of the objects is + <p>If the table type is <c>set</c> and more + than one object exists with a given key, one of the objects is chosen. This is not necessarily the last object with the given key in the sequence of objects returned by the input - functions. Duplicate keys should be avoided, or the file - will be unnecessarily fragmented. This holds also for duplicated + functions. Avoid duplicate keys, otherwise the file becomes + unnecessarily fragmented. This holds also for duplicated objects stored in tables of type <c>bag</c>.</p> <p>It is important that the table has a sufficient number of - slots for the objects. If not, the hash list will start to - grow when <c>init_table/2</c> returns which will significantly - slow down access to the table for a period of time. The - minimum number of slots is set by the <c>open_file/2</c> - option <c>min_no_slots</c> and returned by the <c>info/2</c> - item <c>no_slots</c>. See also the <c>min_no_slots</c> option - below. - </p> - <p>The <c><anno>Options</anno></c> argument is a list of - <c>{Key, Val}</c> - tuples where the following values are allowed:</p> + slots for the objects. If not, the hash list starts to + grow when <c>init_table/2</c> returns, which significantly + slows down access to the table for a period of time. The + minimum number of slots is set by the <c>open_file/2</c> option + <c>min_no_slots</c> and returned by the <c>info/2</c> + item <c>no_slots</c>. 
See also option <c>min_no_slots</c> below.</p> + <p>Argument <c><anno>Options</anno></c> is a list of <c>{Key, Val}</c> + tuples, where the following values are allowed:</p> <list type="bulleted"> <item> - <p><c>{min_no_slots, no_slots()}</c>. Specifies the - estimated number of different keys that will be stored - in the table. The <c>open_file</c> option with the same - name is ignored unless the table is created, and in that + <p><c>{min_no_slots, no_slots()}</c> - Specifies the + estimated number of different keys to be stored + in the table. The <c>open_file/2</c> option with the same + name is ignored, unless the table is created, in which case performance can be enhanced by supplying an estimate when initializing the table.</p> </item> <item> - <p><c>{format, Format}</c>. Specifies the format of the - objects returned by the function <c><anno>InitFun</anno></c>. If + <p><c>{format, Format}</c> - Specifies the format of the + objects returned by function <c><anno>InitFun</anno></c>. If <c>Format</c> is <c>term</c> (the default), - <c><anno>InitFun</anno></c> is assumed to return a list of tuples. If - <c>Format</c> is <c>bchunk</c>, <c><anno>InitFun</anno></c> is + <c><anno>InitFun</anno></c> is assumed to return a list of tuples. + If <c>Format</c> is <c>bchunk</c>, <c><anno>InitFun</anno></c> is assumed to return <c><anno>Data</anno></c> as returned by <seealso marker="#bchunk/2"><c>bchunk/2</c></seealso>. - This option overrides the - <c>min_no_slots</c> option.</p> + This option overrides option <c>min_no_slots</c>.</p> </item> </list> </desc> </func> + <func> <name name="insert" arity="2"/> <fsummary>Insert one or more objects into a Dets table.</fsummary> @@ -513,46 +545,50 @@ the old object will be replaced.</p> </desc> </func> + <func> <name name="insert_new" arity="2"/> <fsummary>Insert one or more objects into a Dets table.</fsummary> <desc> - <p>Inserts one or more objects into the table <c><anno>Name</anno></c>. 
+ <p>Inserts one or more objects into table <c><anno>Name</anno></c>. If there already exists some object with a key matching the key - of any of the given objects the table is not updated and - <c>false</c> is returned, otherwise the objects are inserted + of any of the specified objects, the table is not updated and + <c>false</c> is returned. Otherwise the objects are inserted and <c>true</c> returned.</p> </desc> </func> + <func> <name name="is_compatible_bchunk_format" arity="2"/> - <fsummary>Test compatibility of a table's chunk data.</fsummary> + <fsummary>Test compatibility of chunk data of a table.</fsummary> <desc> <p>Returns <c>true</c> if it would be possible to initialize - the table <c><anno>Name</anno></c>, using - <seealso marker="#init_table/3"><c>init_table/3</c></seealso> - with the - option <c>{format, bchunk}</c>, with objects read with + table <c><anno>Name</anno></c>, using + <seealso marker="#init_table/3"><c>init_table/3</c></seealso> with + option <c>{format, bchunk}</c>, with objects read with <seealso marker="#bchunk/2"><c>bchunk/2</c></seealso> from some - table <c>T</c> such that calling + table <c>T</c>, such that calling <c>info(T, bchunk_format)</c> returns <c>BchunkFormat</c>.</p> </desc> </func> + <func> <name name="is_dets_file" arity="1"/> <fsummary>Test for a Dets table.</fsummary> <desc> - <p>Returns <c>true</c> if the file <c><anno>Filename</anno></c> - is a Dets table, <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if file <c><anno>Filename</anno></c> + is a Dets table, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="lookup" arity="2"/> - <fsummary>Return all objects with a given key stored in a Dets table.</fsummary> + <fsummary>Return all objects with a specified key stored in a + Dets table.</fsummary> <desc> - <p>Returns a list of all objects with the key <c><anno>Key</anno></c> - stored in the table <c><anno>Name</anno></c>. 
For example:</p> + <p>Returns a list of all objects with key <c><anno>Key</anno></c> + stored in table <c><anno>Name</anno></c>, for example:</p> <pre> 2> <input>dets:open_file(abc, [{type, bag}]).</input> {ok,abc} @@ -561,394 +597,419 @@ ok 4> <input>dets:insert(abc, {1,3,4}).</input> ok 5> <input>dets:lookup(abc, 1).</input> -[{1,2,3},{1,3,4}] </pre> - <p>If the table is of type <c>set</c>, the function returns +[{1,2,3},{1,3,4}]</pre> + <p>If the table type is <c>set</c>, the function returns either the empty list or a list with one object, as there cannot be more than one object with a given key. If the table - is of type <c>bag</c> or <c>duplicate_bag</c>, the function + type is <c>bag</c> or <c>duplicate_bag</c>, the function returns a list of arbitrary length.</p> - <p>Note that the order of objects returned is unspecified. In + <p>Notice that the order of objects returned is unspecified. In particular, the order in which objects were inserted is not reflected.</p> </desc> </func> + <func> <name name="match" arity="1"/> - <fsummary>Match a chunk of objects stored in a Dets table and return a list of variable bindings.</fsummary> + <fsummary>Match a chunk of objects stored in a Dets table and + return a list of variable bindings.</fsummary> <desc> <p>Matches some objects stored in a table and returns a - non-empty list of the bindings that match a given pattern in + non-empty list of the bindings matching a specified pattern in some unspecified order. 
The table, the pattern, and the number of objects that are matched are all defined by - <c><anno>Continuation</anno></c>, which has been returned by a prior - call to <c>match/1</c> or <c>match/3</c>.</p> - <p>When all objects of the table have been matched, + <c><anno>Continuation</anno></c>, which has been returned by a + previous call to <c>match/1</c> or <c>match/3</c>.</p> + <p>When all table objects are matched, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="match" arity="2"/> - <fsummary>Match the objects stored in a Dets table and return a list of variable bindings.</fsummary> + <fsummary>Match the objects stored in a Dets table and return a + list of variable bindings.</fsummary> <desc> - <p>Returns for each object of the table <c><anno>Name</anno></c> that - matches <c><anno>Pattern</anno></c> a list of bindings in some unspecified - order. See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns. If the keypos'th element of - <c><anno>Pattern</anno></c> is unbound, all objects of the table are + <p>Returns for each object of table <c><anno>Name</anno></c> that + matches <c><anno>Pattern</anno></c> a list of bindings in some + unspecified order. For a description of patterns, see + <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>. + If the keypos'th element of + <c><anno>Pattern</anno></c> is unbound, all table objects are matched. 
If the keypos'th element is bound, only the - objects with the right key are matched.</p> + objects with the correct key are matched.</p> </desc> </func> + <func> <name name="match" arity="3"/> - <fsummary>Match the first chunk of objects stored in a Dets table and return a list of variable bindings.</fsummary> + <fsummary>Match the first chunk of objects stored in a Dets table + and return a list of variable bindings.</fsummary> <desc> - <p>Matches some or all objects of the table <c><anno>Name</anno></c> and + <p>Matches some or all objects of table <c><anno>Name</anno></c> and returns a non-empty list of the bindings that match <c><anno>Pattern</anno></c> in some unspecified order. - See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns.</p> + For a description of patterns, see + <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>.</p> <p>A tuple of the bindings and a continuation is returned, unless the table is empty, in which case <c>'$end_of_table'</c> is returned. The continuation is to be used when matching further objects by calling <seealso marker="#match/1"><c>match/1</c></seealso>.</p> <p>If the keypos'th element of <c><anno>Pattern</anno></c> is bound, - all objects of the table are matched. If the keypos'th element is - unbound, all objects of the table are matched, <c><anno>N</anno></c> + all table objects are matched. If the keypos'th element is + unbound, all table objects are matched, <c><anno>N</anno></c> objects at a time, until at least one object matches or the - end of the table has been reached. The default, indicated by - giving <c><anno>N</anno></c> the value <c>default</c>, - is to let the number - of objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all objects with the same - key are always matched at the same time which implies that - more than <anno>N</anno> objects may sometimes be matched. 
- </p> - <p>The table should always be protected using - <c>safe_fixtable/2</c> before calling <c>match/3</c>, or - errors may occur when calling <c>match/1</c>.</p> + end of the table is reached. The default, indicated by + giving <c><anno>N</anno></c> the value <c>default</c>, is to let + the number of objects vary depending on the sizes of the objects. If + <c><anno>Name</anno></c> is a version 9 table, all objects with the + same key are always matched at the same time, which implies that + more than <anno>N</anno> objects can sometimes be matched.</p> + <p>The table is always to be protected using + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso> + before calling <c>match/3</c>, otherwise + errors can occur when calling <c>match/1</c>.</p> </desc> </func> + <func> <name name="match_delete" arity="2"/> - <fsummary>Delete all objects that match a given pattern from a Dets table.</fsummary> + <fsummary>Delete all objects that match a given pattern from a + Dets table.</fsummary> <desc> - <p>Deletes all objects that match <c><anno>Pattern</anno></c> from the - table <c><anno>Name</anno></c>. - See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns.</p> + <p>Deletes all objects that match <c><anno>Pattern</anno></c> from + table <c><anno>Name</anno></c>. For a description of patterns, + see <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>.</p> <p>If the keypos'th element of <c>Pattern</c> is bound, - only the objects with the right key are matched.</p> + only the objects with the correct key are matched.</p> </desc> </func> + <func> <name name="match_object" arity="1"/> - <fsummary>Match a chunk of objects stored in a Dets table and return a list of objects.</fsummary> + <fsummary>Match a chunk of objects stored in a Dets table and + return a list of objects.</fsummary> <desc> <p>Returns a non-empty list of some objects stored in a table that match a given pattern in some unspecified order. 
The table, the pattern, and the number of objects that are matched are all defined by <c><anno>Continuation</anno></c>, which has been - returned by a prior call to <c>match_object/1</c> or + returned by a previous call to <c>match_object/1</c> or <c>match_object/3</c>.</p> - <p>When all objects of the table have been matched, + <p>When all table objects are matched, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="match_object" arity="2"/> - <fsummary>Match the objects stored in a Dets table and return a list of objects.</fsummary> + <fsummary>Match the objects stored in a Dets table and return + a list of objects.</fsummary> <desc> - <p>Returns a list of all objects of the table <c><anno>Name</anno></c> that + <p>Returns a list of all objects of table <c><anno>Name</anno></c> that match <c><anno>Pattern</anno></c> in some unspecified order. - See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns. - </p> + For a description of patterns, see + <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>.</p> <p>If the keypos'th element of <c><anno>Pattern</anno></c> is - unbound, all objects of the table are matched. If the + unbound, all table objects are matched. 
If the keypos'th element of <c><anno>Pattern</anno></c> is bound, only the - objects with the right key are matched.</p> + objects with the correct key are matched.</p> <p>Using the <c>match_object</c> functions for traversing all - objects of a table is more efficient than calling + table objects is more efficient than calling <c>first/1</c> and <c>next/2</c> or <c>slot/2</c>.</p> </desc> </func> + <func> <name name="match_object" arity="3"/> - <fsummary>Match the first chunk of objects stored in a Dets table and return a list of objects.</fsummary> + <fsummary>Match the first chunk of objects stored in a Dets table + and return a list of objects.</fsummary> <desc> - <p>Matches some or all objects stored in the table <c><anno>Name</anno></c> + <p>Matches some or all objects stored in table <c><anno>Name</anno></c> and returns a non-empty list of the objects that match <c><anno>Pattern</anno></c> in some unspecified order. - See <seealso marker="ets#match/2">ets:match/2</seealso> for a - description of patterns.</p> + For a description of patterns, see + <seealso marker="ets#match/2"><c>ets:match/2</c></seealso>.</p> <p>A list of objects and a continuation is returned, unless the table is empty, in which case <c>'$end_of_table'</c> is returned. The continuation is to be used when matching - further objects by calling <c>match_object/1</c>.</p> - <p>If the keypos'th element of <c><anno>Pattern</anno></c> is bound, all - objects of the table are matched. If the keypos'th element is - unbound, all objects of the table are matched, <c><anno>N</anno></c> + further objects by calling + <seealso marker="#match_object/1"><c>match_object/1</c></seealso>.</p> + <p>If the keypos'th element of <c><anno>Pattern</anno></c> is bound, + all table objects are matched. If the keypos'th element is + unbound, all table objects are matched, <c><anno>N</anno></c> objects at a time, until at least one object matches or the - end of the table has been reached. 
The default, indicated by - giving <c><anno>N</anno></c> the value <c>default</c>, is to let the number + end of the table is reached. The default, indicated by + giving <c><anno>N</anno></c> the value <c>default</c>, + is to let the number of objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all matching objects with - the same key are always returned in the same reply which - implies that more than <anno>N</anno> objects may sometimes be returned. - </p> - <p>The table should always be protected using - <c>safe_fixtable/2</c> before calling <c>match_object/3</c>, - or errors may occur when calling <c>match_object/1</c>.</p> + <c><anno>Name</anno></c> is a version 9 table, all matching objects + with the same key are always returned in the same reply, which implies + that more than <anno>N</anno> objects can sometimes be returned.</p> + <p>The table is always to be protected using + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso> + before calling <c>match_object/3</c>, otherwise + errors can occur when calling <c>match_object/1</c>.</p> </desc> </func> + <func> <name name="member" arity="2"/> <fsummary>Test for occurrence of a key in a Dets table.</fsummary> <desc> - <p>Works like <c>lookup/2</c>, but does not return the - objects. The function returns <c>true</c> if one or more - elements of the table has the key <c><anno>Key</anno></c>, <c>false</c> - otherwise.</p> + <p>Works like <seealso marker="#lookup/2"><c>lookup/2</c></seealso>, + but does not return the objects. 
Returns <c>true</c> if one or more + table elements has key <c><anno>Key</anno></c>, otherwise + <c>false</c>.</p> </desc> </func> + <func> <name name="next" arity="2"/> <fsummary>Return the next key in a Dets table.</fsummary> <desc> - <p>Returns the key following <c><anno>Key1</anno></c> in the table - <c><anno>Name</anno></c> according to the table's internal order, or - <c>'$end_of_table'</c> if there is no next key.</p> - <p>Should an error occur, the process is exited with an error + <p>Returns either the key following <c><anno>Key1</anno></c> in table + <c><anno>Name</anno></c> according to the internal order of the + table, or <c>'$end_of_table'</c> if there is no next key.</p> + <p>If an error occurs, the process is exited with an error tuple <c>{error, Reason}</c>.</p> - <p>Use <seealso marker="#first/1"><c>first/1</c></seealso> to find - the first key in the table.</p> + <p>To find the first key in the table, use + <seealso marker="#first/1"><c>first/1</c></seealso>.</p> </desc> </func> + <func> <name name="open_file" arity="1"/> <fsummary>Open an existing Dets table.</fsummary> <desc> - <p>Opens an existing table. If the table has not been properly - closed, it will be repaired. The returned reference is to be - used as the name of the table. This function is most useful - for debugging purposes.</p> + <p>Opens an existing table. If the table is not properly closed, + it is repaired. The returned reference is to be used as the table + name. This function is most useful for debugging purposes.</p> </desc> </func> + <func> <name name="open_file" arity="2"/> <fsummary>Open a Dets table.</fsummary> <desc> <p>Opens a table. An empty Dets table is created if no file exists.</p> - <p>The atom <c><anno>Name</anno></c> is the name of the table. The table + <p>The atom <c><anno>Name</anno></c> is the table name. The table name must be provided in all subsequent operations on the table. 
The name can be used by other processes as well, and - several process can share one table. - </p> + many processes can share one table.</p> <p>If two processes open the same table by giving the same - name and arguments, then the table will have two users. If one - user closes the table, it still remains open until the second - user closes the table.</p> - <p>The <c><anno>Args</anno></c> argument is a list of <c>{Key, Val}</c> - tuples where the following values are allowed:</p> + name and arguments, the table has two users. If one + user closes the table, it remains open until the second + user closes it.</p> + <p>Argument <c><anno>Args</anno></c> is a list of <c>{Key, Val}</c> + tuples, where the following values are allowed:</p> <list type="bulleted"> <item> <p><c>{access, </c><seealso marker="#type-access"> - access()</seealso><c>}</c>. It is possible to open - existing tables in read-only mode. A table which is opened + <c>access()</c></seealso><c>}</c> - Existing tables can be + opened in read-only mode. A table that is opened in read-only mode is not subjected to the automatic file reparation algorithm if it is later opened after a crash. - The default value is <c>read_write</c>.</p> + Defaults to <c>read_write</c>.</p> </item> <item> <p><c>{auto_save, </c><seealso marker="#type-auto_save"> - auto_save()</seealso><c>}</c>, the auto save + <c>auto_save()</c></seealso><c>}</c> - The autosave interval. If the interval is an integer <c>Time</c>, the table is flushed to disk whenever it is not accessed for <c>Time</c> milliseconds. A table that has been flushed - will require no reparation when reopened after an + requires no reparation when reopened after an uncontrolled emulator halt. If the interval is the atom - <c>infinity</c>, auto save is disabled. The default value - is 180000 (3 minutes).</p> + <c>infinity</c>, autosave is disabled. 
Defaults to + 180000 (3 minutes).</p> </item> <item> <p><c>{estimated_no_objects, </c><seealso marker="#type-no_slots"> - no_slots()</seealso><c>}</c>. Equivalent to the - <c>min_no_slots</c> option.</p> + <c>no_slots()</c></seealso><c>}</c> - Equivalent to option + <c>min_no_slots</c>.</p> </item> <item> <p><c>{file, </c><seealso marker="file#type-name"> - file:name()</seealso><c>}</c>, the name of the file to be - opened. The default value is the name of the table.</p> + <c>file:name()</c></seealso><c>}</c> - The name of the file to be + opened. Defaults to the table name.</p> </item> <item> <p><c>{max_no_slots, </c><seealso marker="#type-no_slots"> - no_slots()</seealso><c>}</c>, the maximum number - of slots that will be used. The default value as well as - the maximal value is 32 M. Note that a higher value may - increase the fragmentation of the table, and conversely, - that a smaller value may decrease the fragmentation, at + <c>no_slots()</c></seealso><c>}</c> - The maximum number + of slots to be used. Defaults to 32 M, which is the + maximal value. Notice that a higher value can + increase the table fragmentation, and + a smaller value can decrease the fragmentation, at the expense of execution time. Only available for version 9 tables.</p> </item> <item> <p><c>{min_no_slots, </c><seealso marker="#type-no_slots"> - no_slots()</seealso><c>}</c>. Application + <c>no_slots()</c></seealso><c>}</c> - Application performance can be enhanced with this flag by specifying, when the table is created, the estimated number of - different keys that will be stored in the table. The - default value as well as the minimum value is 256.</p> + different keys to be stored in the table. Defaults to 256, + which is the minimum value.</p> </item> <item> <p><c>{keypos, </c><seealso marker="#type-keypos"> - keypos()</seealso><c>}</c>, the position of the - element of each object to be used as key. The default - value is 1. 
The ability to explicitly state the key + <c>keypos()</c></seealso><c>}</c> - The position of the + element of each object to be used as key. Defaults to 1. + The ability to explicitly state the key position is most convenient when we want to store Erlang records in which the first position of the record is the name of the record type.</p> </item> <item> - <p><c>{ram_file, boolean()}</c>, whether the table is to - be kept in RAM. Keeping the table in RAM may sound like an + <p><c>{ram_file, boolean()}</c> - Whether the table is to + be kept in RAM. Keeping the table in RAM can sound like an anomaly, but can enhance the performance of applications - which open a table, insert a set of objects, and then + that open a table, insert a set of objects, and then close the table. When the table is closed, its contents - are written to the disk file. The default value is - <c>false</c>.</p> + are written to the disk file. Defaults to <c>false</c>.</p> </item> <item> - <p><c>{repair, Value}</c>. <c>Value</c> can be either + <p><c>{repair, Value}</c> - <c>Value</c> can be either a <c>boolean()</c> or the atom <c>force</c>. The flag - specifies whether the Dets server should invoke the - automatic file reparation algorithm. The default is - <c>true</c>. If <c>false</c> is specified, there is no - attempt to repair the file and <c>{error, {needs_repair, - FileName}}</c> is returned if the table needs to be - repaired.</p> - <p>The value <c>force</c> means that a reparation will - take place even if the table has been properly closed. + specifies if the Dets server is to invoke the + automatic file reparation algorithm. Defaults to + <c>true</c>. If <c>false</c> is specified, no attempt is + made to repair the file, and <c>{error, {needs_repair, + FileName}}</c> is returned if the table must be repaired.</p> + <p>Value <c>force</c> means that a reparation + is made even if the table is properly closed. This is how to convert tables created by older versions of STDLIB. 
An example is tables hashed with the deprecated - <c>erlang:hash/2</c> BIF. Tables created with Dets from a - STDLIB version of 1.8.2 and later use the - <c>erlang:phash/2</c> function or the - <c>erlang:phash2/1</c> function, which is preferred.</p> - <p>The <c>repair</c> option is ignored if the table is - already open.</p> + <c>erlang:hash/2</c> BIF. Tables created with Dets from + STDLIB version 1.8.2 or later use function + <c>erlang:phash/2</c> or function <c>erlang:phash2/1</c>, + which is preferred.</p> + <p>Option <c>repair</c> is ignored if the table is already open.</p> </item> <item> - <p><c>{type, </c><seealso marker="#type-type">type()</seealso><c>}</c>, - the type of the table. The default value is <c>set</c>.</p> + <p><c>{type, </c><seealso marker="#type-type"> + <c>type()</c></seealso><c>}</c> - The table type. Defaults to + <c>set</c>.</p> </item> <item> <p><c>{version, </c><seealso marker="#type-version"> - version()</seealso><c>}</c>, the version of the format - used for the table. The default value is <c>9</c>. Tables - on the format used before OTP R8 can be created by giving - the value <c>8</c>. A version 8 table can be converted to - a version 9 table by giving the options <c>{version,9}</c> + <c>version()</c></seealso><c>}</c> - The version of the format + used for the table. Defaults to <c>9</c>. Tables on the format + used before Erlang/OTP R8 can be created by specifying value + <c>8</c>. 
A version 8 table can be converted to a version 9 + table by specifying options <c>{version,9}</c> and <c>{repair,force}</c>.</p> </item> </list> </desc> </func> + <func> <name name="pid2name" arity="1"/> <fsummary>Return the name of the Dets table handled by a pid.</fsummary> <desc> - <p>Returns the name of the table given the pid of a process + <p>Returns the table name given the pid of a process that handles requests to a table, or <c>undefined</c> if there is no such table.</p> <p>This function is meant to be used for debugging only.</p> </desc> </func> + <func> <name name="repair_continuation" arity="2"/> - <fsummary>Repair a continuation from select/1 or select/3.</fsummary> + <fsummary>Repair a continuation from <c>select/1</c> or <c>select/3</c>. + </fsummary> <desc> <p>This function can be used to restore an opaque continuation - returned by <c>select/3</c> or <c>select/1</c> if the + returned by + <seealso marker="#select/3"><c>select/3</c></seealso> or + <seealso marker="#select/1"><c>select/1</c></seealso> if the continuation has passed through external term format (been sent between nodes or stored on disk).</p> <p>The reason for this function is that continuation terms - contain compiled match specifications and therefore will be + contain compiled match specifications and therefore are invalidated if converted to external term format. Given that the original match specification is kept intact, the continuation can be restored, meaning it can once again be used in subsequent <c>select/1</c> calls even though it has been stored on disk or on another node.</p> - <p>See also <c>ets(3)</c> for further explanations and - examples. - </p> + <p>For more information and examples, see the + <seealso marker="ets"><c>ets(3)</c></seealso> module.</p> <note> - <p>This function is very rarely needed in application code. It - is used by Mnesia to implement distributed <c>select/3</c> + <p>This function is rarely needed in application code. 
It is used by + application Mnesia to provide distributed <c>select/3</c> and <c>select/1</c> sequences. A normal application would either use Mnesia or keep the continuation from being converted to external format.</p> <p>The reason for not having an external representation of - compiled match specifications is performance. It may be + compiled match specifications is performance. It can be subject to change in future releases, while this interface - will remain for backward compatibility.</p> + remains for backward compatibility.</p> </note> </desc> </func> + <func> <name name="safe_fixtable" arity="2"/> <fsummary>Fix a Dets table for safe traversal.</fsummary> <desc> - <p>If <c><anno>Fix</anno></c> is <c>true</c>, the table + <p>If <c><anno>Fix</anno></c> is <c>true</c>, table <c><anno>Name</anno></c> is fixed (once more) by the calling process, otherwise the table is released. The table is also released when a fixing process - terminates. - </p> - <p>If several processes fix a table, the table will remain + terminates.</p> + <p>If many processes fix a table, the table remains fixed until all processes have released it or terminated. A reference counter is kept on a per process basis, and N consecutive fixes require N releases to release the table.</p> <p>It is not guaranteed that calls to <c>first/1</c>, - <c>next/2</c>, select and match functions work as expected - even if the table has been fixed; the limited support for - concurrency implemented in Ets has not yet been implemented - in Dets. Fixing a table currently only disables resizing of + <c>next/2</c>, or select and match functions work as expected + even if the table is fixed; the limited support for + concurrency provided by the + <seealso marker="ets"><c>ets(3)</c></seealso> module is not yet + provided by Dets. 
+ Fixing a table currently only disables resizing of the hash list of the table.</p> <p>If objects have been added while the table was fixed, the - hash list will start to grow when the table is released which - will significantly slow down access to the table for a period + hash list starts to grow when the table is released, which + significantly slows down access to the table for a period of time.</p> </desc> </func> + <func> <name name="select" arity="1"/> - <fsummary>Apply a match specification to some objects stored in a Dets table.</fsummary> + <fsummary>Apply a match specification to some objects stored in a + Dets table.</fsummary> <desc> <p>Applies a match specification to some objects stored in a table and returns a non-empty list of the results. The table, the match specification, and the number of objects that are matched are all defined by <c><anno>Continuation</anno></c>, - which has been returned by a prior call to <c>select/1</c> - or <c>select/3</c>.</p> + which is returned by a previous call to + <seealso marker="#select/1"><c>select/1</c></seealso> or + <seealso marker="#select/3"><c>select/3</c></seealso>.</p> <p>When all objects of the table have been matched, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="select" arity="2"/> - <fsummary>Apply a match specification to all objects stored in a Dets table.</fsummary> + <fsummary>Apply a match specification to all objects stored in a + Dets table.</fsummary> <desc> - <p>Returns the results of applying the match specification - <c><anno>MatchSpec</anno></c> to all or some objects stored in the table - <c><anno>Name</anno></c>. The order of the objects is not specified. See - the ERTS User's Guide for a description of match - specifications.</p> + <p>Returns the results of applying match specification + <c><anno>MatchSpec</anno></c> to all or some objects stored in table + <c><anno>Name</anno></c>. The order of the objects is not specified. 
+ For a description of match specifications, see the + <seealso marker="erts:match_spec">ERTS User's Guide</seealso>.</p> <p>If the keypos'th element of <c><anno>MatchSpec</anno></c> is unbound, the match specification is applied to all objects of the table. If the keypos'th element is bound, the match - specification is applied to the objects with the right key(s) + specification is applied to the objects with the correct key(s) only.</p> <p>Using the <c>select</c> functions for traversing all objects of a table is more efficient than calling @@ -956,116 +1017,138 @@ ok </p> </desc> </func> + <func> <name name="select" arity="3"/> - <fsummary>Apply a match specification to the first chunk of objects stored in a Dets table.</fsummary> + <fsummary>Apply a match specification to the first chunk of objects + stored in a Dets table.</fsummary> <desc> - <p>Returns the results of applying the match specification - <c><anno>MatchSpec</anno></c> to some or all objects stored in the table - <c><anno>Name</anno></c>. The order of the objects is not specified. See - the ERTS User's Guide for a description of match - specifications.</p> + <p>Returns the results of applying match specification + <c><anno>MatchSpec</anno></c> to some or all objects stored in table + <c><anno>Name</anno></c>. The order of the objects is not specified. + For a description of match specifications, see the + <seealso marker="erts:match_spec">ERTS User's Guide</seealso>.</p> <p>A tuple of the results of applying the match specification and a continuation is returned, unless the table is empty, in which case <c>'$end_of_table'</c> is returned. The - continuation is to be used when matching further objects by - calling <c>select/1</c>.</p> - <p>If the keypos'th element of <c><anno>MatchSpec</anno></c> is bound, the - match specification is applied to all objects of the table - with the right key(s). 
If the keypos'th element of + continuation is to be used when matching more objects by calling + <seealso marker="#select/1"><c>select/1</c></seealso>.</p> + <p>If the keypos'th element of <c><anno>MatchSpec</anno></c> is bound, + the match specification is applied to all objects of the table + with the correct key(s). If the keypos'th element of <c><anno>MatchSpec</anno></c> is unbound, the match specification is - applied to all objects of the table, <c><anno>N</anno></c> objects at a - time, until at least one object matches or the end of the - table has been reached. The default, indicated by giving - <c><anno>N</anno></c> the value <c>default</c>, is to let the number of - objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all objects with the same - key are always handled at the same time which implies that the - match specification may be applied to more than <anno>N</anno> objects. - </p> - <p>The table should always be protected using - <c>safe_fixtable/2</c> before calling <c>select/3</c>, or - errors may occur when calling <c>select/1</c>.</p> + applied to all objects of the table, <c><anno>N</anno></c> objects at + a time, until at least one object matches or the end of the + table is reached. The default, indicated by giving + <c><anno>N</anno></c> the value <c>default</c>, is to let the number + of objects vary depending on the sizes of the objects. 
If + <c><anno>Name</anno></c> is a version 9 table, all objects with the + same key are always handled at the same time, which implies that the + match specification can be applied to more than <anno>N</anno> + objects.</p> + <p>The table is always to be protected using + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso> + before calling <c>select/3</c>, otherwise + errors can occur when calling <c>select/1</c>.</p> </desc> </func> + <func> <name name="select_delete" arity="2"/> - <fsummary>Delete all objects that match a given pattern from a Dets table.</fsummary> + <fsummary>Delete all objects that match a given pattern from a + Dets table.</fsummary> <desc> - <p>Deletes each object from the table <c><anno>Name</anno></c> such that - applying the match specification <c><anno>MatchSpec</anno></c> to the - object returns the value <c>true</c>. See the ERTS - User's Guide for a description of match - specifications. Returns the number of deleted objects.</p> + <p>Deletes each object from table <c><anno>Name</anno></c> such that + applying match specification <c><anno>MatchSpec</anno></c> to the + object returns value <c>true</c>. + For a description of match specifications, see the + <seealso marker="erts:match_spec">ERTS User's Guide</seealso>. + Returns the number of deleted objects.</p> <p>If the keypos'th element of <c><anno>MatchSpec</anno></c> is bound, the match specification is applied to the objects - with the right key(s) only.</p> + with the correct key(s) only.</p> </desc> </func> + <func> <name name="slot" arity="2"/> - <fsummary>Return the list of objects associated with a slot of a Dets table.</fsummary> + <fsummary>Return the list of objects associated with a slot of a + Dets table.</fsummary> <desc> <p>The objects of a table are distributed among slots, - starting with slot <c>0</c> and ending with slot n. This - function returns the list of objects associated with slot - <c><anno>I</anno></c>. 
If <c><anno>I</anno></c> is greater than n + starting with slot <c>0</c> and ending with slot <c>n</c>. + Returns the list of objects associated with slot + <c><anno>I</anno></c>. If <c><anno>I</anno></c> > <c>n</c>, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="sync" arity="1"/> - <fsummary>Ensure that all updates made to a Dets table are written to disk.</fsummary> + <fsummary>Ensure that all updates made to a Dets table are written + to disk.</fsummary> <desc> - <p>Ensures that all updates made to the table <c><anno>Name</anno></c> are - written to disk. This also applies to tables which have been - opened with the <c>ram_file</c> flag set to <c>true</c>. In - this case, the contents of the RAM file are flushed to - disk.</p> - <p>Note that the space management data structures kept in RAM, - the buddy system, is also written to the disk. This may take + <p>Ensures that all updates made to table <c><anno>Name</anno></c> are + written to disk. This also applies to tables that have been + opened with flag <c>ram_file</c> set to <c>true</c>. In + this case, the contents of the RAM file are flushed to disk.</p> + <p>Notice that the space management data structures kept in RAM, + the buddy system, is also written to the disk. This can take some time if the table is fragmented.</p> </desc> </func> + <func> <name name="table" arity="1"/> <name name="table" arity="2"/> <fsummary>Return a QLC query handle.</fsummary> <desc> - <p><marker id="qlc_table"></marker>Returns a QLC (Query List - Comprehension) query handle. The module <c>qlc</c> - implements a query language aimed mainly at Mnesia but Ets - tables, Dets tables, and lists are also recognized by <c>qlc</c> - as sources of data. 
Calling <c>dets:table/1,2</c> is the - means to make the Dets table <c><anno>Name</anno></c> usable to <c>qlc</c>.</p> - <p>When there are only simple restrictions on the key position - <c>qlc</c> uses <c>dets:lookup/2</c> to look up the keys, but when - that is not possible the whole table is traversed. The - option <c>traverse</c> determines how this is done:</p> + <p>Returns a Query List + Comprehension (QLC) query handle. The + <seealso marker="qlc"><c>qlc(3)</c></seealso> module + provides a query language aimed mainly for Mnesia, but + ETS tables, Dets tables, and lists are also recognized + by <c>qlc</c> as sources of data. Calling + <seealso marker="dets#table/1"><c>dets:table/1,2</c></seealso> is the + means to make Dets table <c><anno>Name</anno></c> usable to + <c>qlc</c>.</p> + <p>When there are only simple restrictions on the key position, + <c>qlc</c> uses + <seealso marker="dets#lookup/2"><c>dets:lookup/2</c></seealso> + to look up the keys. When + that is not possible, the whole table is traversed. + Option <c>traverse</c> determines how this is done:</p> <list type="bulleted"> <item> - <p><c>first_next</c>. The table is traversed one key at - a time by calling <c>dets:first/1</c> and - <c>dets:next/2</c>.</p> + <p><c>first_next</c> - The table is traversed one key at + a time by calling <c>dets:first/1</c> and <c>dets:next/2</c>.</p> </item> <item> - <p><c>select</c>. The table is traversed by calling - <c>dets:select/3</c> and <c>dets:select/1</c>. The option - <c>n_objects</c> determines the number of objects + <p><c>select</c> - The table is traversed by calling + <seealso marker="dets:select/3"><c>dets:select/3</c></seealso> and + <seealso marker="dets:select/1"><c>dets:select/1</c></seealso>. + Option <c>n_objects</c> determines the number of objects returned (the third argument of <c>select/3</c>). 
The match specification (the second argument of - <c>select/3</c>) is assembled by <c>qlc</c>: simple filters are - translated into equivalent match specifications while - more complicated filters have to be applied to all - objects returned by <c>select/3</c> given a match - specification that matches all objects.</p> + <c>select/3</c>) is assembled by <c>qlc</c>:</p> + <list type="bulleted"> + <item> + <p>Simple filters are translated into equivalent match + specifications.</p> + </item> + <item> + <p>More complicated filters must be applied to all + objects returned by <c>select/3</c> given a match + specification that matches all objects.</p> + </item> + </list> </item> <item> <p><c>{select, </c><seealso marker="#type-match_spec"> - match_spec()</seealso><c>}</c>. As for <c>select</c> + match_spec()</seealso><c>}</c> - As for <c>select</c>, the table is traversed by calling <c>dets:select/3</c> and <c>dets:select/1</c>. The difference is that the - match specification is explicitly given. This is how to + match specification is specified explicitly. 
This is how to state match specifications that cannot easily be expressed within the syntax provided by <c>qlc</c>.</p> </item> @@ -1076,70 +1159,79 @@ ok 1> <input>dets:open_file(t, []),</input> <input>ok = dets:insert(t, [{1,a},{2,b},{3,c},{4,d}]),</input> <input>MS = ets:fun2ms(fun({X,Y}) when (X > 1) or (X < 5) -> {Y} end),</input> -<input>QH1 = dets:table(t, [{traverse, {select, MS}}]).</input> </pre> +<input>QH1 = dets:table(t, [{traverse, {select, MS}}]).</input></pre> <p>An example with implicit match specification:</p> <pre> -2> <input>QH2 = qlc:q([{Y} || {X,Y} <- dets:table(t), (X > 1) or (X < 5)]).</input> </pre> - <p>The latter example is in fact equivalent to the former which - can be verified using the function <c>qlc:info/1</c>:</p> +2> <input>QH2 = qlc:q([{Y} || {X,Y} <- dets:table(t), (X > 1) or (X < 5)]).</input></pre> + <p>The latter example is equivalent to the former, which + can be verified using function <c>qlc:info/1</c>:</p> <pre> 3> <input>qlc:info(QH1) =:= qlc:info(QH2).</input> -true </pre> - <p><c>qlc:info/1</c> returns information about a query handle, - and in this case identical information is returned for the +true</pre> + <p><c>qlc:info/1</c> returns information about a query handle. + In this case identical information is returned for the two query handles.</p> </desc> </func> + <func> <name name="to_ets" arity="2"/> - <fsummary>Insert all objects of a Dets table into an Ets table.</fsummary> + <fsummary>Insert all objects of a Dets table into an ETS + table.</fsummary> <desc> - <p>Inserts the objects of the Dets table <c><anno>Name</anno></c> into the - Ets table <c><anno>EtsTab</anno></c>. The order in which the objects are - inserted is not specified. The existing objects of the Ets + <p>Inserts the objects of the Dets table <c><anno>Name</anno></c> + into the ETS table <c><anno>EtsTab</anno></c>. The order in + which the objects are + inserted is not specified. 
The existing objects of the ETS table are kept unless overwritten.</p> </desc> </func> + <func> <name name="traverse" arity="2"/> - <fsummary>Apply a function to all or some objects stored in a Dets table.</fsummary> + <fsummary>Apply a function to all or some objects stored in a Dets + table.</fsummary> <desc> - <p>Applies <c><anno>Fun</anno></c> to each object stored in the table - <c><anno>Name</anno></c> in some unspecified order. Different actions are + <p>Applies <c><anno>Fun</anno></c> to each object stored in table + <c><anno>Name</anno></c> in some unspecified order. Different + actions are taken depending on the return value of <c><anno>Fun</anno></c>. The following <c><anno>Fun</anno></c> return values are allowed:</p> <taglist> <tag><c>continue</c></tag> <item> <p>Continue to perform the traversal. For example, the - following function can be used to print out the contents + following function can be used to print the contents of a table:</p> <pre> -fun(X) -> io:format("~p~n", [X]), continue end. </pre> +fun(X) -> io:format("~p~n", [X]), continue end.</pre> </item> <tag><c>{continue, Val}</c></tag> <item> - <p>Continue the traversal and accumulate <c><anno>Val</anno></c>. The - following function is supplied in order to collect all - objects of a table in a list: </p> + <p>Continue the traversal and accumulate <c><anno>Val</anno></c>. + The following function is supplied to collect all + objects of a table in a list:</p> <pre> -fun(X) -> {continue, X} end. </pre> +fun(X) -> {continue, X} end.</pre> </item> <tag><c>{done, <anno>Value</anno>}</c></tag> <item> - <p>Terminate the traversal and return <c>[<anno>Value</anno> | Acc]</c>.</p> + <p>Terminate the traversal and return + <c>[<anno>Value</anno> | Acc]</c>.</p> </item> </taglist> - <p>Any other value <c><anno>OtherValue</anno></c> returned by <c><anno>Fun</anno></c> terminates the - traversal and is immediately returned. 
- </p> + <p>Any other value <c><anno>OtherValue</anno></c> returned by + <c><anno>Fun</anno></c> terminates the + traversal and is returned immediately.</p> </desc> </func> + <func> <name name="update_counter" arity="3"/> - <fsummary>Update a counter object stored in a Dets table.</fsummary> + <fsummary>Update a counter object stored in a Dets table. + </fsummary> <desc> - <p>Updates the object with key <c><anno>Key</anno></c> stored in the + <p>Updates the object with key <c><anno>Key</anno></c> stored in table <c><anno>Name</anno></c> of type <c>set</c> by adding <c><anno>Incr</anno></c> to the element at the <c><anno>Pos</anno></c>:th position. @@ -1148,7 +1240,7 @@ fun(X) -> {continue, X} end. </pre> following the key is updated.</p> <p>This functions provides a way of updating a counter, without having to look up an object, update the object by - incrementing an element and insert the resulting object into + incrementing an element, and insert the resulting object into the table again.</p> </desc> </func> @@ -1156,8 +1248,9 @@ fun(X) -> {continue, X} end. </pre> <section> <title>See Also</title> - <p><seealso marker="ets">ets(3)</seealso>, - mnesia(3), - <seealso marker="qlc">qlc(3)</seealso></p> + <p><seealso marker="ets"><c>ets(3)</c></seealso>, + <seealso marker="mnesia:mnesia"><c>mnesia(3)</c></seealso>, + <seealso marker="qlc"><c>qlc(3)</c></seealso></p> </section> </erlref> + diff --git a/lib/stdlib/doc/src/dict.xml b/lib/stdlib/doc/src/dict.xml index 20bab99a9c..c926ff1b5b 100644 --- a/lib/stdlib/doc/src/dict.xml +++ b/lib/stdlib/doc/src/dict.xml @@ -29,12 +29,13 @@ <rev>B</rev> </header> <module>dict</module> - <modulesummary>Key-Value Dictionary</modulesummary> + <modulesummary>Key-value dictionary.</modulesummary> <description> - <p><c>Dict</c> implements a <c>Key</c> - <c>Value</c> dictionary. + <p>This module provides a <c>Key</c>-<c>Value</c> dictionary. 
The representation of a dictionary is not defined.</p> - <p>This module provides exactly the same interface as the module - <c>orddict</c>. One difference is that while this module + <p>This module provides the same interface as the + <seealso marker="orddict"><c>orddict(3)</c></seealso> module. + One difference is that while this module considers two keys as different if they do not match (<c>=:=</c>), <c>orddict</c> considers two keys as different if and only if they do not compare equal (<c>==</c>).</p> @@ -43,211 +44,241 @@ <datatypes> <datatype> <name name="dict" n_vars="2"/> - <desc><p>Dictionary as returned by <c>new/0</c>.</p></desc> + <desc><p>Dictionary as returned by + <seealso marker="#new/0"><c>new/0</c></seealso>.</p> + </desc> </datatype> <datatype> <name name="dict" n_vars="0"/> </datatype> </datatypes> + <funcs> <func> <name name="append" arity="3"/> - <fsummary>Append a value to keys in a dictionary</fsummary> + <fsummary>Append a value to keys in a dictionary.</fsummary> <desc> - <p>This function appends a new <c><anno>Value</anno></c> to the current list + <p>Appends a new <c><anno>Value</anno></c> to the current list of values associated with <c><anno>Key</anno></c>.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="append_list" arity="3"/> - <fsummary>Append new values to keys in a dictionary</fsummary> + <fsummary>Append new values to keys in a dictionary.</fsummary> <desc> - <p>This function appends a list of values <c><anno>ValList</anno></c> to + <p>Appends a list of values <c><anno>ValList</anno></c> to the current list of values associated with <c><anno>Key</anno></c>. 
An exception is generated if the initial value associated with <c><anno>Key</anno></c> is not a list of values.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="erase" arity="2"/> - <fsummary>Erase a key from a dictionary</fsummary> + <fsummary>Erase a key from a dictionary.</fsummary> <desc> - <p>This function erases all items with a given key from a - dictionary.</p> + <p>Erases all items with a given key from a dictionary.</p> </desc> </func> + <func> <name name="fetch" arity="2"/> - <fsummary>Look-up values in a dictionary</fsummary> + <fsummary>Look up values in a dictionary.</fsummary> <desc> - <p>This function returns the value associated with <c><anno>Key</anno></c> - in the dictionary <c><anno>Dict</anno></c>. <c>fetch</c> assumes that - the <c><anno>Key</anno></c> is present in the dictionary and an exception + <p>Returns the value associated with <c><anno>Key</anno></c> + in dictionary <c><anno>Dict</anno></c>. This function assumes that + <c><anno>Key</anno></c> is present in dictionary <c>Dict</c>, + and an exception is generated if <c><anno>Key</anno></c> is not in the dictionary.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="fetch_keys" arity="1"/> - <fsummary>Return all keys in a dictionary</fsummary> + <fsummary>Return all keys in a dictionary.</fsummary> <desc> - <p>This function returns a list of all keys in the dictionary.</p> + <p>Returns a list of all keys in dictionary <c>Dict</c>.</p> </desc> </func> + <func> <name name="filter" arity="2"/> - <fsummary>Choose elements which satisfy a predicate</fsummary> + <fsummary>Select elements that satisfy a predicate.</fsummary> <desc> <p><c><anno>Dict2</anno></c> is a dictionary of all keys and values in - <c><anno>Dict1</anno></c> for which <c><anno>Pred</anno>(<anno>Key</anno>, <anno>Value</anno>)</c> is <c>true</c>.</p> + <c><anno>Dict1</anno></c> for which + 
<c><anno>Pred</anno>(<anno>Key</anno>, <anno>Value</anno>)</c> is + <c>true</c>.</p> </desc> </func> + <func> <name name="find" arity="2"/> - <fsummary>Search for a key in a dictionary</fsummary> + <fsummary>Search for a key in a dictionary.</fsummary> <desc> - <p>This function searches for a key in a dictionary. Returns - <c>{ok, <anno>Value</anno>}</c> where <c><anno>Value</anno></c> is the value associated - with <c><anno>Key</anno></c>, or <c>error</c> if the key is not present in - the dictionary.</p> + <p>Searches for a key in dictionary <c>Dict</c>. Returns + <c>{ok, <anno>Value</anno>}</c>, where <c><anno>Value</anno></c> is + the value associated with <c><anno>Key</anno></c>, or <c>error</c> + if the key is not present in the dictionary.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="fold" arity="3"/> - <fsummary>Fold a function over a dictionary</fsummary> + <fsummary>Fold a function over a dictionary.</fsummary> <desc> <p>Calls <c><anno>Fun</anno></c> on successive keys and values of - <c><anno>Dict</anno></c> together with an extra argument <c>Acc</c> + dictionary <c><anno>Dict</anno></c> together with an extra argument + <c>Acc</c> (short for accumulator). <c><anno>Fun</anno></c> must return a new - accumulator which is passed to the next call. <c><anno>Acc0</anno></c> is - returned if the dict is empty. The evaluation order is + accumulator that is passed to the next call. <c><anno>Acc0</anno></c> + is returned if the dictionary is empty. 
The evaluation order is undefined.</p> </desc> </func> + <func> <name name="from_list" arity="1"/> - <fsummary>Convert a list of pairs to a dictionary</fsummary> + <fsummary>Convert a list of pairs to a dictionary.</fsummary> <desc> - <p>This function converts the <c><anno>Key</anno></c> - <c><anno>Value</anno></c> list - <c><anno>List</anno></c> to a dictionary.</p> + <p>Converts the <c><anno>Key</anno></c>-<c><anno>Value</anno></c> list + <c><anno>List</anno></c> to dictionary <c>Dict</c>.</p> </desc> </func> + + <func> + <name name="is_empty" arity="1"/> + <fsummary>Return <c>true</c> if the dictionary is empty.</fsummary> + <desc> + <p>Returns <c>true</c> if dictionary <c><anno>Dict</anno></c> has no + elements, otherwise <c>false</c>.</p> + </desc> + </func> + <func> <name name="is_key" arity="2"/> - <fsummary>Test if a key is in a dictionary</fsummary> + <fsummary>Test if a key is in a dictionary.</fsummary> <desc> - <p>This function tests if <c><anno>Key</anno></c> is contained in - the dictionary <c><anno>Dict</anno></c>.</p> + <p>Tests if <c><anno>Key</anno></c> is contained in + dictionary <c><anno>Dict</anno></c>.</p> </desc> </func> + <func> <name name="map" arity="2"/> - <fsummary>Map a function over a dictionary</fsummary> + <fsummary>Map a function over a dictionary.</fsummary> <desc> - <p><c>map</c> calls <c><anno>Fun</anno></c> on successive keys and values - of <c><anno>Dict1</anno></c> to return a new value for each key. - The evaluation order is undefined.</p> + <p>Calls <c><anno>Fun</anno></c> on successive keys and values + of dictionary <c><anno>Dict1</anno></c> to return a new value for + each key. The evaluation order is undefined.</p> </desc> </func> + <func> <name name="merge" arity="3"/> - <fsummary>Merge two dictionaries</fsummary> + <fsummary>Merge two dictionaries.</fsummary> <desc> - <p><c>merge</c> merges two dictionaries, <c><anno>Dict1</anno></c> and - <c><anno>Dict2</anno></c>, to create a new dictionary. 
All the <c><anno>Key</anno></c> - - <c><anno>Value</anno></c> pairs from both dictionaries are included in - the new dictionary. If a key occurs in both dictionaries then - <c><anno>Fun</anno></c> is called with the key and both values to return a - new value. <c>merge</c> could be defined as:</p> + <p>Merges two dictionaries, <c><anno>Dict1</anno></c> and + <c><anno>Dict2</anno></c>, to create a new dictionary. All the + <c><anno>Key</anno></c>-<c><anno>Value</anno></c> pairs from both + dictionaries are included in the new dictionary. If a key occurs + in both dictionaries, <c><anno>Fun</anno></c> is called with the + key and both values to return a new value. + <c>merge</c> can be defined as follows, but is faster:</p> <code type="none"> merge(Fun, D1, D2) -> fold(fun (K, V1, D) -> update(K, fun (V2) -> Fun(K, V1, V2) end, V1, D) end, D2, D1).</code> - <p>but is faster.</p> </desc> </func> + <func> <name name="new" arity="0"/> - <fsummary>Create a dictionary</fsummary> + <fsummary>Create a dictionary.</fsummary> <desc> - <p>This function creates a new dictionary.</p> + <p>Creates a new dictionary.</p> </desc> </func> + <func> <name name="size" arity="1"/> - <fsummary>Return the number of elements in a dictionary</fsummary> + <fsummary>Return the number of elements in a dictionary.</fsummary> <desc> - <p>Returns the number of elements in a <c><anno>Dict</anno></c>.</p> - </desc> - </func> - <func> - <name name="is_empty" arity="1"/> - <fsummary>Return true if the dictionary is empty</fsummary> - <desc> - <p>Returns <c>true</c> if <c><anno>Dict</anno></c> has no elements, <c>false</c> otherwise.</p> + <p>Returns the number of elements in dictionary + <c><anno>Dict</anno></c>.</p> </desc> </func> + <func> <name name="store" arity="3"/> - <fsummary>Store a value in a dictionary</fsummary> + <fsummary>Store a value in a dictionary.</fsummary> <desc> - <p>This function stores a <c><anno>Key</anno></c> - <c><anno>Value</anno></c> pair in a - dictionary. 
If the <c><anno>Key</anno></c> already exists in <c><anno>Dict1</anno></c>, + <p>Stores a <c><anno>Key</anno></c>-<c><anno>Value</anno></c> pair in + dictionary <c>Dict2</c>. If <c><anno>Key</anno></c> already exists in + <c><anno>Dict1</anno></c>, the associated value is replaced by <c><anno>Value</anno></c>.</p> </desc> </func> + <func> <name name="to_list" arity="1"/> - <fsummary>Convert a dictionary to a list of pairs</fsummary> + <fsummary>Convert a dictionary to a list of pairs.</fsummary> <desc> - <p>This function converts the dictionary to a list - representation.</p> + <p>Converts dictionary <c>Dict</c> to a list representation.</p> </desc> </func> + <func> <name name="update" arity="3"/> - <fsummary>Update a value in a dictionary</fsummary> + <fsummary>Update a value in a dictionary.</fsummary> <desc> - <p>Update a value in a dictionary by calling <c><anno>Fun</anno></c> on - the value to get a new value. An exception is generated if + <p>Updates a value in a dictionary by calling <c><anno>Fun</anno></c> on + the value to get a new value. An exception is generated if <c><anno>Key</anno></c> is not present in the dictionary.</p> </desc> </func> + <func> <name name="update" arity="4"/> - <fsummary>Update a value in a dictionary</fsummary> + <fsummary>Update a value in a dictionary.</fsummary> <desc> - <p>Update a value in a dictionary by calling <c><anno>Fun</anno></c> on - the value to get a new value. If <c><anno>Key</anno></c> is not present - in the dictionary then <c><anno>Initial</anno></c> will be stored as - the first value. For example <c>append/3</c> could be defined - as:</p> + <p>Updates a value in a dictionary by calling <c><anno>Fun</anno></c> on + the value to get a new value. If <c><anno>Key</anno></c> is not + present in the dictionary, <c><anno>Initial</anno></c> is stored as + the first value. 
For example, <c>append/3</c> can be defined as:</p> <code type="none"> append(Key, Val, D) -> update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).</code> </desc> </func> + <func> <name name="update_counter" arity="3"/> - <fsummary>Increment a value in a dictionary</fsummary> + <fsummary>Increment a value in a dictionary.</fsummary> <desc> - <p>Add <c><anno>Increment</anno></c> to the value associated with <c><anno>Key</anno></c> - and store this value. If <c><anno>Key</anno></c> is not present in - the dictionary then <c><anno>Increment</anno></c> will be stored as - the first value.</p> - <p>This could be defined as:</p> + <p>Adds <c><anno>Increment</anno></c> to the value associated with + <c><anno>Key</anno></c> and stores this value. + If <c><anno>Key</anno></c> is not present in the dictionary, + <c><anno>Increment</anno></c> is stored as the first value.</p> + <p>This can be defined as follows, but is faster:</p> <code type="none"> update_counter(Key, Incr, D) -> update(Key, fun (Old) -> Old + Incr end, Incr, D).</code> - <p>but is faster.</p> </desc> </func> </funcs> <section> <title>Notes</title> - <p>The functions <c>append</c> and <c>append_list</c> are included - so we can store keyed values in a list <em>accumulator</em>. For + <marker id="notes"/> + <p>Functions <c>append</c> and <c>append_list</c> are included + so that keyed values can be stored in a list <em>accumulator</em>, for example:</p> <pre> > D0 = dict:new(), @@ -256,19 +287,18 @@ update_counter(Key, Incr, D) -> D3 = dict:append(files, f2, D2), D4 = dict:append(files, f3, D3), dict:fetch(files, D4). -[f1,f2,f3] </pre> +[f1,f2,f3]</pre> <p>This saves the trouble of first fetching a keyed value, appending a new value to the list of stored values, and storing - the result. 
- </p> - <p>The function <c>fetch</c> should be used if the key is known to - be in the dictionary, otherwise <c>find</c>.</p> + the result.</p> + <p>Function <c>fetch</c> is to be used if the key is known to + be in the dictionary, otherwise function <c>find</c>.</p> </section> <section> <title>See Also</title> - <p><seealso marker="gb_trees">gb_trees(3)</seealso>, - <seealso marker="orddict">orddict(3)</seealso></p> + <p><seealso marker="gb_trees"><c>gb_trees(3)</c></seealso>, + <seealso marker="orddict"><c>orddict(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/digraph.xml b/lib/stdlib/doc/src/digraph.xml index 16dd789caf..5332d7aba5 100644 --- a/lib/stdlib/doc/src/digraph.xml +++ b/lib/stdlib/doc/src/digraph.xml @@ -30,64 +30,92 @@ <checked></checked> <date>2001-08-27</date> <rev>C</rev> - <file>digraph.sgml</file> + <file>digraph.xml</file> </header> <module>digraph</module> - <modulesummary>Directed Graphs</modulesummary> + <modulesummary>Directed graphs.</modulesummary> <description> - <p>The <c>digraph</c> module implements a version of labeled - directed graphs. What makes the graphs implemented here + <p>This module provides a version of labeled + directed graphs. What makes the graphs provided here non-proper directed graphs is that multiple edges between vertices are allowed. However, the customary definition of - directed graphs will be used in the text that follows. - </p> - <p>A <marker id="digraph"></marker><em>directed graph</em> (or just - "digraph") is a pair (V, E) of a finite set V of - <marker id="vertex"></marker><em>vertices</em> and a finite set E of - <marker id="edge"></marker><em>directed edges</em> (or just "edges"). - The set of - edges E is a subset of V × V (the Cartesian - product of V with itself). In this module, V is allowed to be - empty; the so obtained unique digraph is called the - <marker id="empty_digraph"></marker><em>empty digraph</em>. 
- Both vertices and edges are represented by unique Erlang terms. - </p> - <p>Digraphs can be annotated with additional information. Such - information may be attached to the vertices and to the edges of - the digraph. A digraph which has been annotated is called a - <em>labeled digraph</em>, and the information attached to a - vertex or an edge is called a <marker id="label"></marker> - <em>label</em>. Labels are Erlang terms. - </p> - <p>An edge e = (v, w) is said to - <marker id="emanate"></marker><em>emanate</em> from vertex v and - to be <marker id="incident"></marker><em>incident</em> on vertex w. - The <marker id="out_degree"></marker><em>out-degree</em> of a vertex - is the number of edges emanating from that vertex. - The <marker id="in_degree"></marker><em>in-degree</em> of a vertex - is the number of edges incident on that vertex. - If there is an edge emanating from v and incident on w, then w is - said to be an <marker id="out_neighbour"></marker> - <em>out-neighbour</em> of v, and v is said to be an - <marker id="in_neighbour"></marker><em>in-neighbour</em> of w. - A <marker id="path"></marker><em>path</em> P from v[1] to v[k] - in a digraph (V, E) is a non-empty sequence - v[1], v[2], ..., v[k] of vertices in V such that - there is an edge (v[i],v[i+1]) in E for - 1 <= i < k. - The <marker id="length"></marker><em>length</em> of the path P is k-1. - P is <marker id="simple_path"></marker><em>simple</em> if all - vertices are distinct, except that the first and the last vertices - may be the same. - P is a <marker id="cycle"></marker><em>cycle</em> if the length - of P is not zero and v[1] = v[k]. - A <marker id="loop"></marker><em>loop</em> is a cycle of length one. - A <marker id="simple_cycle"></marker><em>simple cycle</em> is a path - that is both a cycle and simple. - An <marker id="acyclic_digraph"></marker><em>acyclic digraph</em> - is a digraph that has no cycles. 
- </p> + directed graphs is used here.</p> + + <list type="bulleted"> + <item> + <p>A <marker id="digraph"></marker><em>directed graph</em> (or just + "digraph") is a pair (V, E) of a finite set V of + <marker id="vertex"></marker><em>vertices</em> and a finite set E of + <marker id="edge"></marker><em>directed edges</em> (or just "edges"). + The set of edges E is a subset of V × V (the + Cartesian product of V with itself).</p> + <p>In this module, V is allowed to be empty. The so obtained unique + digraph is called the + <marker id="empty_digraph"></marker><em>empty digraph</em>. Both + vertices and edges are represented by unique Erlang terms.</p> + </item> + <item> + <p>Digraphs can be annotated with more information. Such information + can be attached to the vertices and to the edges of the digraph. An + annotated digraph is called a <em>labeled digraph</em>, and the + information attached to a vertex or an edge is called a + <marker id="label"></marker><em>label</em>. Labels are Erlang + terms.</p> + </item> + <item> + <p>An edge e = (v, w) is said to + <marker id="emanate"></marker><em>emanate</em> from vertex v and to + be <marker id="incident"></marker><em>incident</em> on vertex w.</p> + </item> + <item> + <p>The <marker id="out_degree"></marker><em>out-degree</em> of a vertex + is the number of edges emanating from that vertex.</p> + </item> + <item> + <p>The <marker id="in_degree"></marker><em>in-degree</em> of a vertex + is the number of edges incident on that vertex.</p> + </item> + <item> + <p>If an edge is emanating from v and incident on w, then w is + said to be an <marker id="out_neighbour"></marker> + <em>out-neighbor</em> of v, and v is said to be an + <marker id="in_neighbour"></marker><em>in-neighbor</em> of w.</p> + </item> + <item> + <p>A <marker id="path"></marker><em>path</em> P from v[1] to v[k] + in a digraph (V, E) is a non-empty sequence + v[1], v[2], ..., v[k] of vertices in V such that + there is an edge (v[i],v[i+1]) in E for + 1 
<= i < k.</p> + </item> + <item> + <p>The <marker id="length"></marker><em>length</em> of path P is + k-1.</p> + </item> + <item> + <p>Path P is <marker id="simple_path"></marker><em>simple</em> if all + vertices are distinct, except that the first and the last vertices + can be the same.</p> + </item> + <item> + <p>Path P is a <marker id="cycle"></marker><em>cycle</em> if the + length of P is not zero and v[1] = v[k].</p> + </item> + <item> + <p>A <marker id="loop"></marker><em>loop</em> is a cycle of length + one.</p> + </item> + <item> + <p>A <marker id="simple_cycle"></marker><em>simple cycle</em> is a path + that is both a cycle and simple.</p> + </item> + <item> + <p>An <marker id="acyclic_digraph"></marker><em>acyclic digraph</em> + is a digraph without cycles.</p> + </item> + </list> </description> + <datatypes> <datatype> <name name="d_type"/> @@ -100,20 +128,20 @@ </datatype> <datatype> <name name="graph"/> - <desc><p>A digraph as returned by <c>new/0,1</c>.</p></desc> + <desc><p>A digraph as returned by + <seealso marker="#new/0"><c>new/0,1</c></seealso>.</p></desc> </datatype> <datatype> <name>edge()</name> - <desc><p><marker id="type-edge"/></p></desc> </datatype> <datatype> <name name="label"/> </datatype> <datatype> <name>vertex()</name> - <desc><p><marker id="type-vertex"/></p></desc> </datatype> </datatypes> + <funcs> <func> <name name="add_edge" arity="3"/> @@ -122,291 +150,313 @@ <fsummary>Add an edge to a digraph.</fsummary> <type name="add_edge_err_rsn"/> <desc> - <p><c>add_edge/5</c> creates (or modifies) the edge <c><anno>E</anno></c> - of the digraph <c><anno>G</anno></c>, using <c><anno>Label</anno></c> as the (new) - <seealso marker="#label">label</seealso> of the edge. The + <p><c>add_edge/5</c> creates (or modifies) edge <c><anno>E</anno></c> + of digraph <c><anno>G</anno></c>, using <c><anno>Label</anno></c> as + the (new) <seealso marker="#label">label</seealso> of the edge. 
The edge is <seealso marker="#emanate">emanating</seealso> from - <c><anno>V1</anno></c> and <seealso marker="#incident">incident</seealso> - on <c><anno>V2</anno></c>. Returns <c><anno>E</anno></c>. - </p> - <p><c>add_edge(<anno>G</anno>, <anno>V1</anno>, <anno>V2</anno>, <anno>Label</anno>)</c> is - equivalent to + <c><anno>V1</anno></c> and + <seealso marker="#incident">incident</seealso> + on <c><anno>V2</anno></c>. Returns <c><anno>E</anno></c>.</p> + <p><c>add_edge(<anno>G</anno>, <anno>V1</anno>, <anno>V2</anno>, <anno>Label</anno>)</c> + is equivalent to <c>add_edge(<anno>G</anno>, <anno>E</anno>, <anno>V1</anno>, <anno>V2</anno>, <anno>Label</anno>)</c>, where <c><anno>E</anno></c> is a created edge. The created edge is - represented by the term <c>['$e' | N]</c>, where N - is an integer >= 0. - </p> - <p><c>add_edge(<anno>G</anno>, <anno>V1</anno>, <anno>V2</anno>)</c> is equivalent to + represented by term <c>['$e' | N]</c>, where <c>N</c> + is an integer >= 0.</p> + <p><c>add_edge(<anno>G</anno>, <anno>V1</anno>, <anno>V2</anno>)</c> + is equivalent to <c>add_edge(<anno>G</anno>, <anno>V1</anno>, <anno>V2</anno>, [])</c>. - </p> - <p>If the edge would create a cycle in - an <seealso marker="#acyclic_digraph">acyclic digraph</seealso>, - then <c>{error, {bad_edge, <anno>Path</anno>}}</c> is returned. If - either of <c><anno>V1</anno></c> or <c><anno>V2</anno></c> is not a vertex of the - digraph <c><anno>G</anno></c>, then + </p> + <p>If the edge would create a cycle in + an <seealso marker="#acyclic_digraph">acyclic digraph</seealso>, + <c>{error, {bad_edge, <anno>Path</anno>}}</c> is returned. + If either of <c><anno>V1</anno></c> or <c><anno>V2</anno></c> is not + a vertex of digraph <c><anno>G</anno></c>, <c>{error, {bad_vertex, </c><anno>V</anno><c>}}</c> is returned, <anno>V</anno> = <c><anno>V1</anno></c> or - <anno>V</anno> = <c><anno>V2</anno></c>. 
- </p> + <anno>V</anno> = <c><anno>V2</anno></c>.</p> </desc> </func> + <func> <name name="add_vertex" arity="1"/> <name name="add_vertex" arity="2"/> <name name="add_vertex" arity="3"/> <fsummary>Add or modify a vertex of a digraph.</fsummary> <desc> - <p><c>add_vertex/3</c> creates (or modifies) the vertex <c><anno>V</anno></c> - of the digraph <c><anno>G</anno></c>, using <c><anno>Label</anno></c> as the (new) + <p><c>add_vertex/3</c> creates (or modifies) vertex + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, using + <c><anno>Label</anno></c> as the (new) <seealso marker="#label">label</seealso> of the - vertex. Returns <c><anno>V</anno></c>. - </p> - <p><c>add_vertex(<anno>G</anno>, <anno>V</anno>)</c> is equivalent to - <c>add_vertex(<anno>G</anno>, <anno>V</anno>, [])</c>. - </p> + vertex. Returns <c><anno>V</anno></c>.</p> + <p><c>add_vertex(<anno>G</anno>, <anno>V</anno>)</c> is equivalent + to <c>add_vertex(<anno>G</anno>, <anno>V</anno>, [])</c>. + </p> <p><c>add_vertex/1</c> creates a vertex using the empty list as label, and returns the created vertex. The created vertex - is represented by the term <c>['$v' | N]</c>, - where N is an integer >= 0. - </p> + is represented by term <c>['$v' | N]</c>, + where <c>N</c> is an integer >= 0.</p> </desc> </func> + <func> <name name="del_edge" arity="2"/> <fsummary>Delete an edge from a digraph.</fsummary> <desc> - <p>Deletes the edge <c><anno>E</anno></c> from the digraph <c><anno>G</anno></c>. - </p> + <p>Deletes edge <c><anno>E</anno></c> from digraph + <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="del_edges" arity="2"/> <fsummary>Delete edges from a digraph.</fsummary> <desc> - <p>Deletes the edges in the list <c><anno>Edges</anno></c> from the digraph - <c><anno>G</anno></c>. 
- </p> + <p>Deletes the edges in list <c><anno>Edges</anno></c> from digraph + <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="del_path" arity="3"/> <fsummary>Delete paths from a digraph.</fsummary> <desc> - <p>Deletes edges from the digraph <c><anno>G</anno></c> until there are no - <seealso marker="#path">paths</seealso> from the vertex - <c><anno>V1</anno></c> to the vertex <c><anno>V2</anno></c>. - </p> - <p>A sketch of the procedure employed: Find an arbitrary - <seealso marker="#simple_path">simple path</seealso> - v[1], v[2], ..., v[k] from <c><anno>V1</anno></c> to - <c><anno>V2</anno></c> in <c><anno>G</anno></c>. Remove all edges of - <c><anno>G</anno></c> <seealso marker="#emanate">emanating</seealso> from v[i] - and <seealso marker="#incident">incident</seealso> to v[i+1] for - 1 <= i < k (including multiple - edges). Repeat until there is no path between <c><anno>V1</anno></c> and - <c><anno>V2</anno></c>. - </p> + <p>Deletes edges from digraph <c><anno>G</anno></c> until there are no + <seealso marker="#path">paths</seealso> from vertex + <c><anno>V1</anno></c> to vertex <c><anno>V2</anno></c>.</p> + <p>A sketch of the procedure employed:</p> + <list type="bulleted"> + <item> + <p>Find an arbitrary + <seealso marker="#simple_path">simple path</seealso> + v[1], v[2], ..., v[k] from <c><anno>V1</anno></c> + to <c><anno>V2</anno></c> in <c><anno>G</anno></c>.</p> + </item> + <item> + <p>Remove all edges of <c><anno>G</anno></c> + <seealso marker="#emanate">emanating</seealso> from v[i] and + <seealso marker="#incident">incident</seealso> to v[i+1] for + 1 <= i < k (including multiple + edges).</p> + </item> + <item> + <p>Repeat until there is no path between <c><anno>V1</anno></c> + and <c><anno>V2</anno></c>.</p> + </item> + </list> </desc> </func> + <func> <name name="del_vertex" arity="2"/> <fsummary>Delete a vertex from a digraph.</fsummary> <desc> - <p>Deletes the vertex <c><anno>V</anno></c> from the digraph <c><anno>G</anno></c>. 
Any - edges <seealso marker="#emanate">emanating</seealso> from - <c><anno>V</anno></c> or <seealso marker="#incident">incident</seealso> - on <c><anno>V</anno></c> are also deleted. - </p> + <p>Deletes vertex <c><anno>V</anno></c> from digraph + <c><anno>G</anno></c>. Any edges + <seealso marker="#emanate">emanating</seealso> from + <c><anno>V</anno></c> or + <seealso marker="#incident">incident</seealso> + on <c><anno>V</anno></c> are also deleted.</p> </desc> </func> + <func> <name name="del_vertices" arity="2"/> <fsummary>Delete vertices from a digraph.</fsummary> <desc> - <p>Deletes the vertices in the list <c><anno>Vertices</anno></c> from the - digraph <c><anno>G</anno></c>. - </p> + <p>Deletes the vertices in list <c><anno>Vertices</anno></c> from + digraph <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="delete" arity="1"/> <fsummary>Delete a digraph.</fsummary> <desc> - <p>Deletes the digraph <c><anno>G</anno></c>. This call is important - because digraphs are implemented with <c>ETS</c>. There is - no garbage collection of <c>ETS</c> tables. The digraph - will, however, be deleted if the process that created the - digraph terminates. - </p> + <p>Deletes digraph <c><anno>G</anno></c>. This call is important + as digraphs are implemented with ETS. There is + no garbage collection of ETS tables. However, the digraph + is deleted if the process that created the digraph terminates.</p> </desc> </func> + <func> <name name="edge" arity="2"/> - <fsummary>Return the vertices and the label of an edge of a digraph.</fsummary> + <fsummary>Return the vertices and the label of an edge of a digraph. 
+ </fsummary> <desc> - <p>Returns <c>{<anno>E</anno>, <anno>V1</anno>, <anno>V2</anno>, <anno>Label</anno>}</c> where - <c><anno>Label</anno></c> is the <seealso marker="#label">label</seealso> - of the edge - <c><anno>E</anno></c> <seealso marker="#emanate">emanating</seealso> from - <c><anno>V1</anno></c> and <seealso marker="#incident">incident</seealso> on - <c><anno>V2</anno></c> of the digraph <c><anno>G</anno></c>. - If there is no edge <c><anno>E</anno></c> of the - digraph <c><anno>G</anno></c>, then <c>false</c> is returned. - </p> + <p>Returns + <c>{<anno>E</anno>, <anno>V1</anno>, <anno>V2</anno>, <anno>Label</anno>}</c>, + where <c><anno>Label</anno></c> is the + <seealso marker="#label">label</seealso> of edge + <c><anno>E</anno></c> <seealso marker="#emanate">emanating</seealso> + from <c><anno>V1</anno></c> and + <seealso marker="#incident">incident</seealso> on + <c><anno>V2</anno></c> of digraph <c><anno>G</anno></c>. + If no edge <c><anno>E</anno></c> of + digraph <c><anno>G</anno></c> exists, <c>false</c> is returned.</p> </desc> </func> + <func> <name name="edges" arity="1"/> <fsummary>Return all edges of a digraph.</fsummary> <desc> - <p>Returns a list of all edges of the digraph <c><anno>G</anno></c>, in - some unspecified order. 
- </p> + <p>Returns a list of all edges of digraph <c><anno>G</anno></c>, in + some unspecified order.</p> </desc> </func> + <func> <name name="edges" arity="2"/> - <fsummary>Return the edges emanating from or incident on a vertex of a digraph.</fsummary> + <fsummary>Return the edges emanating from or incident on a vertex of + a digraph.</fsummary> <desc> - <p>Returns a list of all - edges <seealso marker="#emanate">emanating</seealso> from - or <seealso marker="#incident">incident</seealso> on <c><anno>V</anno></c> - of the digraph <c><anno>G</anno></c>, in some unspecified order.</p> + <p>Returns a list of all + edges <seealso marker="#emanate">emanating</seealso> from or + <seealso marker="#incident">incident</seealso> on<c><anno>V</anno></c> + of digraph <c><anno>G</anno></c>, in some unspecified order.</p> </desc> </func> + <func> <name name="get_cycle" arity="2"/> <fsummary>Find one cycle in a digraph.</fsummary> <desc> - <p>If there is - a <seealso marker="#simple_cycle">simple cycle</seealso> of - length two or more through the vertex - <c><anno>V</anno></c>, then the cycle is returned as a list - <c>[<anno>V</anno>, ..., <anno>V</anno>]</c> of vertices, otherwise if there - is a <seealso marker="#loop">loop</seealso> through - <c><anno>V</anno></c>, then the loop is returned as a list <c>[<anno>V</anno>]</c>. If - there are no cycles through <c><anno>V</anno></c>, then <c>false</c> is - returned. - </p> - <p><c>get_path/3</c> is used for finding a simple cycle - through <c><anno>V</anno></c>. - </p> + <p>If a <seealso marker="#simple_cycle">simple cycle</seealso> of + length two or more exists through vertex <c><anno>V</anno></c>, the + cycle is returned as a list + <c>[<anno>V</anno>, ..., <anno>V</anno>]</c> of vertices. + If a <seealso marker="#loop">loop</seealso> through + <c><anno>V</anno></c> exists, the loop is returned as a list + <c>[<anno>V</anno>]</c>. 
If no cycles through + <c><anno>V</anno></c> exist, <c>false</c> is returned.</p> + <p><seealso marker="#get_path/3"><c>get_path/3</c></seealso> is used + for finding a simple cycle through <c><anno>V</anno></c>.</p> </desc> </func> + <func> <name name="get_path" arity="3"/> <fsummary>Find one path in a digraph.</fsummary> <desc> - <p>Tries to find - a <seealso marker="#simple_path">simple path</seealso> from - the vertex <c><anno>V1</anno></c> to the vertex - <c><anno>V2</anno></c> of the digraph <c><anno>G</anno></c>. Returns the path as a - list <c>[<anno>V1</anno>, ..., <anno>V2</anno>]</c> of vertices, or - <c>false</c> if no simple path from <c><anno>V1</anno></c> to <c><anno>V2</anno></c> - of length one or more exists. - </p> - <p>The digraph <c><anno>G</anno></c> is traversed in a depth-first manner, - and the first path found is returned. - </p> + <p>Tries to find + a <seealso marker="#simple_path">simple path</seealso> from vertex + <c><anno>V1</anno></c> to vertex <c><anno>V2</anno></c> of digraph + <c><anno>G</anno></c>. Returns the path as a list + <c>[<anno>V1</anno>, ..., <anno>V2</anno>]</c> of vertices, + or <c>false</c> if no simple path from <c><anno>V1</anno></c> to + <c><anno>V2</anno></c> of length one or more exists.</p> + <p>Digraph <c><anno>G</anno></c> is traversed in a depth-first manner, + and the first found path is returned.</p> </desc> </func> + <func> <name name="get_short_cycle" arity="2"/> <fsummary>Find one short cycle in a digraph.</fsummary> <desc> - <p>Tries to find an as short as - possible <seealso marker="#simple_cycle">simple cycle</seealso> through - the vertex <c><anno>V</anno></c> of the digraph <c>G</c>. Returns the cycle - as a list <c>[<anno>V</anno>, ..., <anno>V</anno>]</c> of vertices, or + <p>Tries to find an as short as possible + <seealso marker="#simple_cycle">simple cycle</seealso> through + vertex <c><anno>V</anno></c> of digraph <c>G</c>. 
Returns the cycle + as a list <c>[<anno>V</anno>, ..., <anno>V</anno>]</c> + of vertices, or <c>false</c> if no simple cycle through <c><anno>V</anno></c> exists. - Note that a <seealso marker="#loop">loop</seealso> through - <c><anno>V</anno></c> is returned as the list <c>[<anno>V</anno>, <anno>V</anno>]</c>. - </p> - <p><c>get_short_path/3</c> is used for finding a simple cycle - through <c><anno>V</anno></c>. - </p> + Notice that a <seealso marker="#loop">loop</seealso> through + <c><anno>V</anno></c> is returned as list + <c>[<anno>V</anno>, <anno>V</anno>]</c>.</p> + <p><seealso marker="#get_short_path/3"><c>get_short_path/3</c></seealso> + is used for finding a simple cycle through <c><anno>V</anno></c>.</p> </desc> </func> + <func> <name name="get_short_path" arity="3"/> <fsummary>Find one short path in a digraph.</fsummary> <desc> - <p>Tries to find an as short as - possible <seealso marker="#simple_path">simple path</seealso> from - the vertex <c><anno>V1</anno></c> to the vertex <c><anno>V2</anno></c> of the digraph <c><anno>G</anno></c>. - Returns the path as a list <c>[<anno>V1</anno>, ..., <anno>V2</anno>]</c> of - vertices, or <c>false</c> if no simple path from <c><anno>V1</anno></c> - to <c><anno>V2</anno></c> of length one or more exists. - </p> - <p>The digraph <c><anno>G</anno></c> is traversed in a breadth-first - manner, and the first path found is returned. - </p> + <p>Tries to find an as short as possible + <seealso marker="#simple_path">simple path</seealso> from vertex + <c><anno>V1</anno></c> to vertex <c><anno>V2</anno></c> of digraph + <c><anno>G</anno></c>. 
Returns the path as a list + <c>[<anno>V1</anno>, ..., <anno>V2</anno>]</c> of + vertices, or <c>false</c> if no simple path from + <c><anno>V1</anno></c> + to <c><anno>V2</anno></c> of length one or more exists.</p> + <p>Digraph <c><anno>G</anno></c> is traversed in a breadth-first + manner, and the first found path is returned.</p> </desc> </func> + <func> <name name="in_degree" arity="2"/> <fsummary>Return the in-degree of a vertex of a digraph.</fsummary> <desc> - <p>Returns the <seealso marker="#in_degree">in-degree</seealso> of the vertex - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>. - </p> + <p>Returns the <seealso marker="#in_degree">in-degree</seealso> of + vertex <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="in_edges" arity="2"/> - <fsummary>Return all edges incident on a vertex of a digraph.</fsummary> + <fsummary>Return all edges incident on a vertex of a digraph.</fsummary> <desc> - <p>Returns a list of all - edges <seealso marker="#incident">incident</seealso> on - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>, in some unspecified order. - </p> + <p>Returns a list of all + edges <seealso marker="#incident">incident</seealso> on + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, + in some unspecified order.</p> </desc> </func> + <func> <name name="in_neighbours" arity="2"/> - <fsummary>Return all in-neighbours of a vertex of a digraph.</fsummary> + <fsummary>Return all in-neighbors of a vertex of a digraph.</fsummary> <desc> - <p>Returns a list of - all <seealso marker="#in_neighbour">in-neighbours</seealso> of - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>, in some unspecified order. 
- </p> + <p>Returns a list of + all <seealso marker="#in_neighbour">in-neighbors</seealso> of + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, + in some unspecified order.</p> </desc> </func> + <func> <name name="info" arity="1"/> <fsummary>Return information about a digraph.</fsummary> <type name="d_cyclicity"/> <type name="d_protection"/> <desc> - <p>Returns a list of <c>{Tag, Value}</c> pairs describing the - digraph <c><anno>G</anno></c>. The following pairs are returned: - </p> + <p>Returns a list of <c>{Tag, Value}</c> pairs describing + digraph <c><anno>G</anno></c>. The following pairs are returned:</p> <list type="bulleted"> <item> - <p><c>{cyclicity, <anno>Cyclicity</anno>}</c>, where <c><anno>Cyclicity</anno></c> + <p><c>{cyclicity, <anno>Cyclicity</anno>}</c>, where + <c><anno>Cyclicity</anno></c> is <c>cyclic</c> or <c>acyclic</c>, according to the options given to <c>new</c>.</p> </item> <item> - <p><c>{memory, <anno>NoWords</anno>}</c>, where <c><anno>NoWords</anno></c> is - the number of words allocated to the <c>ETS</c> tables.</p> + <p><c>{memory, <anno>NoWords</anno>}</c>, where + <c><anno>NoWords</anno></c> is + the number of words allocated to the ETS tables.</p> </item> <item> - <p><c>{protection, <anno>Protection</anno>}</c>, where <c><anno>Protection</anno></c> + <p><c>{protection, <anno>Protection</anno>}</c>, where + <c><anno>Protection</anno></c> is <c>protected</c> or <c>private</c>, according to the options given to <c>new</c>.</p> </item> </list> </desc> </func> + <func> <name name="new" arity="0"/> - <fsummary>Return a protected empty digraph, where cycles are allowed.</fsummary> + <fsummary>Return a protected empty digraph, where cycles are allowed. + </fsummary> <desc> - <p>Equivalent to <c>new([])</c>. 
- </p> + <p>Equivalent to <c>new([])</c>.</p> </desc> </func> + <func> <name name="new" arity="1"/> <fsummary>Create a new empty digraph.</fsummary> @@ -415,97 +465,103 @@ <type name="d_cyclicity"/> <type name="d_protection"/> <desc> - <p>Returns - an <seealso marker="#empty_digraph">empty digraph</seealso> with - properties according to the options in <c><anno>Type</anno></c>:</p> + <p>Returns + an <seealso marker="#empty_digraph">empty digraph</seealso> with + properties according to the options in <c><anno>Type</anno></c>:</p> <taglist> <tag><c>cyclic</c></tag> - <item>Allow <seealso marker="#cycle">cycles</seealso> in the - digraph (default).</item> + <item><p>Allows <seealso marker="#cycle">cycles</seealso> in the + digraph (default).</p></item> <tag><c>acyclic</c></tag> - <item>The digraph is to be kept <seealso marker="#acyclic_digraph">acyclic</seealso>.</item> + <item><p>The digraph is to be kept + <seealso marker="#acyclic_digraph">acyclic</seealso>.</p></item> <tag><c>protected</c></tag> - <item>Other processes can read the digraph (default).</item> + <item><p>Other processes can read the digraph (default).</p></item> <tag><c>private</c></tag> - <item>The digraph can be read and modified by the creating - process only.</item> + <item><p>The digraph can be read and modified by the creating + process only.</p></item> </taglist> - <p>If an unrecognized type option <c>T</c> is given or <c><anno>Type</anno></c> - is not a proper list, there will be a <c>badarg</c> exception. - </p> + <p>If an unrecognized type option <c>T</c> is specified or + <c><anno>Type</anno></c> + is not a proper list, a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="no_edges" arity="1"/> - <fsummary>Return the number of edges of the a digraph.</fsummary> + <fsummary>Return the number of edges of a digraph.</fsummary> <desc> - <p>Returns the number of edges of the digraph <c><anno>G</anno></c>. 
- </p> + <p>Returns the number of edges of digraph <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="no_vertices" arity="1"/> <fsummary>Return the number of vertices of a digraph.</fsummary> <desc> - <p>Returns the number of vertices of the digraph <c><anno>G</anno></c>. - </p> + <p>Returns the number of vertices of digraph <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="out_degree" arity="2"/> <fsummary>Return the out-degree of a vertex of a digraph.</fsummary> <desc> - <p>Returns the <seealso marker="#out_degree">out-degree</seealso> of the vertex - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>. - </p> + <p>Returns the <seealso marker="#out_degree">out-degree</seealso> of + vertex <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>.</p> </desc> </func> + <func> <name name="out_edges" arity="2"/> - <fsummary>Return all edges emanating from a vertex of a digraph.</fsummary> + <fsummary>Return all edges emanating from a vertex of a digraph. + </fsummary> <desc> - <p>Returns a list of all - edges <seealso marker="#emanate">emanating</seealso> from - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>, in some unspecified order. - </p> + <p>Returns a list of all + edges <seealso marker="#emanate">emanating</seealso> from + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, + in some unspecified order.</p> </desc> </func> + <func> <name name="out_neighbours" arity="2"/> - <fsummary>Return all out-neighbours of a vertex of a digraph.</fsummary> + <fsummary>Return all out-neighbors of a vertex of a digraph.</fsummary> <desc> <p>Returns a list of - all <seealso marker="#out_neighbour">out-neighbours</seealso> of - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>, in some unspecified order. 
- </p> + all <seealso marker="#out_neighbour">out-neighbors</seealso> of + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, + in some unspecified order.</p> </desc> </func> + <func> <name name="vertex" arity="2"/> <fsummary>Return the label of a vertex of a digraph.</fsummary> <desc> - <p>Returns <c>{<anno>V</anno>, <anno>Label</anno>}</c> where <c><anno>Label</anno></c> is the + <p>Returns <c>{<anno>V</anno>, <anno>Label</anno>}</c>, + where <c><anno>Label</anno></c> is the <seealso marker="#label">label</seealso> of the vertex - <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>, or <c>false</c> if there - is no vertex <c><anno>V</anno></c> of the digraph <c><anno>G</anno></c>. - </p> + <c><anno>V</anno></c> of digraph <c><anno>G</anno></c>, + or <c>false</c> if no vertex <c><anno>V</anno></c> + of digraph <c><anno>G</anno></c> exists.</p> </desc> </func> + <func> <name name="vertices" arity="1"/> <fsummary>Return all vertices of a digraph.</fsummary> <desc> - <p>Returns a list of all vertices of the digraph <c><anno>G</anno></c>, in - some unspecified order. 
- </p> + <p>Returns a list of all vertices of digraph <c><anno>G</anno></c>, in + some unspecified order.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="digraph_utils">digraph_utils(3)</seealso>, - <seealso marker="ets">ets(3)</seealso></p> + <p><seealso marker="digraph_utils"><c>digraph_utils(3)</c></seealso>, + <seealso marker="ets"><c>ets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/digraph_utils.xml b/lib/stdlib/doc/src/digraph_utils.xml index 9bddee546f..cb316e5b93 100644 --- a/lib/stdlib/doc/src/digraph_utils.xml +++ b/lib/stdlib/doc/src/digraph_utils.xml @@ -24,345 +24,386 @@ <title>digraph_utils</title> <prepared>Hans Bolinder</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2001-08-27</date> <rev>PA1</rev> - <file>digraph_utils.sgml</file> + <file>digraph_utils.xml</file> </header> <module>digraph_utils</module> - <modulesummary>Algorithms for Directed Graphs</modulesummary> + <modulesummary>Algorithms for directed graphs.</modulesummary> <description> - <p>The <c>digraph_utils</c> module implements some algorithms - based on depth-first traversal of directed graphs. See the - <c>digraph</c> module for basic functions on directed graphs. - </p> - <p>A <marker id="digraph"></marker><em>directed graph</em> (or - just "digraph") is a pair (V, E) of a finite set V of - <marker id="vertex"></marker><em>vertices</em> and a finite set E - of <marker id="edge"></marker><em>directed edges</em> (or just - "edges"). The set of edges E is a subset of V × V - (the Cartesian product of V with itself). - </p> - <p>Digraphs can be annotated with additional information. Such - information may be attached to the vertices and to the edges of - the digraph. 
A digraph which has been annotated is called a - <em>labeled digraph</em>, and the information attached to a - vertex or an edge is called a <marker id="label"></marker> - <em>label</em>.</p> - <p>An edge e = (v, w) is said - to <marker id="emanate"></marker><em>emanate</em> from vertex v and - to be <marker id="incident"></marker><em>incident</em> on vertex w. - If there is an edge emanating from v and incident on w, then w is - said to be - an <marker id="out_neighbour"></marker><em>out-neighbour</em> of v, - and v is said to be - an <marker id="in_neighbour"></marker><em>in-neighbour</em> of w. - A <marker id="path"></marker><em>path</em> P from v[1] to v[k] in a - digraph (V, E) is a non-empty sequence - v[1], v[2], ..., v[k] of vertices in V such that - there is an edge (v[i],v[i+1]) in E for - 1 <= i < k. - The <marker id="length"></marker><em>length</em> of the path P is k-1. - P is a <marker id="cycle"></marker><em>cycle</em> if the length of P - is not zero and v[1] = v[k]. - A <marker id="loop"></marker><em>loop</em> is a cycle of length one. - An <marker id="acyclic_digraph"></marker><em>acyclic digraph</em> is - a digraph that has no cycles. - </p> + <p>This module provides algorithms based on depth-first traversal of + directed graphs. For basic functions on directed graphs, see the + <seealso marker="digraph"><c>digraph(3)</c></seealso> module.</p> - <p>A <marker id="depth_first_traversal"></marker> <em>depth-first - traversal</em> of a directed digraph can be viewed as a process - that visits all vertices of the digraph. Initially, all vertices - are marked as unvisited. The traversal starts with an - arbitrarily chosen vertex, which is marked as visited, and - follows an edge to an unmarked vertex, marking that vertex. The - search then proceeds from that vertex in the same fashion, until - there is no edge leading to an unvisited vertex. At that point - the process backtracks, and the traversal continues as long as - there are unexamined edges. 
If there remain unvisited vertices - when all edges from the first vertex have been examined, some - hitherto unvisited vertex is chosen, and the process is - repeated. - </p> - <p>A <marker id="partial_ordering"></marker><em>partial ordering</em> of - a set S is a transitive, antisymmetric and reflexive relation - between the objects of S. The problem - of <marker id="topsort"></marker><em>topological sorting</em> is to - find a total - ordering of S that is a superset of the partial ordering. A - digraph G = (V, E) is equivalent to a relation E - on V (we neglect the fact that the version of directed graphs - implemented in the <c>digraph</c> module allows multiple edges - between vertices). If the digraph has no cycles of length two or - more, then the reflexive and transitive closure of E is a - partial ordering. - </p> - <p>A <marker id="subgraph"></marker><em>subgraph</em> G' of G is a - digraph whose vertices and edges form subsets of the vertices - and edges of G. G' is <em>maximal</em> with respect to a - property P if all other subgraphs that include the vertices of - G' do not have the property P. A <marker - id="strong_components"></marker> <em>strongly connected - component</em> is a maximal subgraph such that there is a path - between each pair of vertices. A <marker - id="components"></marker><em>connected component</em> is a - maximal subgraph such that there is a path between each pair of - vertices, considering all edges undirected. An <marker - id="arborescence"></marker><em>arborescence</em> is an acyclic - digraph with a vertex V, the <marker - id="root"></marker><em>root</em>, such that there is a unique - path from V to every other vertex of G. 
A <marker - id="tree"></marker><em>tree</em> is an acyclic non-empty digraph - such that there is a unique path between every pair of vertices, - considering all edges undirected.</p> + <list type="bulleted"> + <item> + <p>A <marker id="digraph"></marker><em>directed graph</em> (or just + "digraph") is a pair (V, E) of a finite set V of + <marker id="vertex"></marker><em>vertices</em> and a finite set E of + <marker id="edge"></marker><em>directed edges</em> (or just "edges"). + The set of edges E is a subset of V × V (the + Cartesian product of V with itself).</p> + </item> + <item> + <p>Digraphs can be annotated with more information. Such information + can be attached to the vertices and to the edges of the digraph. An + annotated digraph is called a <em>labeled digraph</em>, and the + information attached to a vertex or an edge is called a + <marker id="label"></marker><em>label</em>.</p> + </item> + <item> + <p>An edge e = (v, w) is said to + <marker id="emanate"></marker><em>emanate</em> from vertex v and to + be <marker id="incident"></marker><em>incident</em> on vertex w.</p> + </item> + <item> + <p>If an edge is emanating from v and incident on w, then w is + said to be an <marker id="out_neighbour"></marker> + <em>out-neighbor</em> of v, and v is said to be an + <marker id="in_neighbour"></marker><em>in-neighbor</em> of w.</p> + </item> + <item> + <p>A <marker id="path"></marker><em>path</em> P from v[1] to v[k] + in a digraph (V, E) is a non-empty sequence + v[1], v[2], ..., v[k] of vertices in V such that + there is an edge (v[i],v[i+1]) in E for + 1 <= i < k.</p> + </item> + <item> + <p>The <marker id="length"></marker><em>length</em> of path P is + k-1.</p> + </item> + <item> + <p>Path P is a <marker id="cycle"></marker><em>cycle</em> if the + length of P is not zero and v[1] = v[k].</p> + </item> + <item> + <p>A <marker id="loop"></marker><em>loop</em> is a cycle of length + one.</p> + </item> + <item> + <p>An <marker 
id="acyclic_digraph"></marker><em>acyclic digraph</em> + is a digraph without cycles.</p> + </item> + <item> + <p>A <marker id="depth_first_traversal"></marker><em>depth-first + traversal</em> of a directed digraph can be viewed as a process + that visits all vertices of the digraph. Initially, all vertices + are marked as unvisited. The traversal starts with an + arbitrarily chosen vertex, which is marked as visited, and + follows an edge to an unmarked vertex, marking that vertex. The + search then proceeds from that vertex in the same fashion, until + there is no edge leading to an unvisited vertex. At that point + the process backtracks, and the traversal continues as long as + there are unexamined edges. If unvisited vertices remain + when all edges from the first vertex have been examined, some + so far unvisited vertex is chosen, and the process is repeated.</p> + </item> + <item> + <p>A <marker id="partial_ordering"></marker><em>partial ordering</em> + of a set S is a transitive, antisymmetric, and reflexive relation + between the objects of S.</p> + </item> + <item> + <p>The problem of + <marker id="topsort"></marker><em>topological sorting</em> is to find + a total ordering of S that is a superset of the partial ordering. A + digraph G = (V, E) is equivalent to a relation E + on V (we neglect that the version of directed graphs + provided by the <c>digraph</c> module allows multiple edges + between vertices). 
If the digraph has no cycles of length two or + more, the reflexive and transitive closure of E is a + partial ordering.</p> + </item> + <item> + <p>A <marker id="subgraph"></marker><em>subgraph</em> G' of G is a + digraph whose vertices and edges form subsets of the vertices + and edges of G.</p> + </item> + <item> + <p>G' is <em>maximal</em> with respect to a property P if all other + subgraphs that include the vertices of G' do not have property P.</p> + </item> + <item> + <p>A <marker id="strong_components"></marker><em>strongly connected + component</em> is a maximal subgraph such that there is a path + between each pair of vertices.</p> + </item> + <item> + <p>A <marker id="components"></marker><em>connected component</em> + is a maximal subgraph such that there is a path between each pair of + vertices, considering all edges undirected.</p> + </item> + <item> + <p>An <marker id="arborescence"></marker><em>arborescence</em> is an + acyclic digraph with a vertex V, the + <marker id="root"></marker><em>root</em>, such that there is a unique + path from V to every other vertex of G.</p> + </item> + <item> + <p>A <marker id="tree"></marker><em>tree</em> is an acyclic non-empty + digraph such that there is a unique path between every pair of + vertices, considering all edges undirected.</p> + </item> + </list> </description> - <datatypes> - <datatype> - <name>digraph()</name> - <desc><p><marker id="type-digraph"/> - A digraph as returned by <c>digraph:new/0,1</c>.</p></desc> - </datatype> - </datatypes> <funcs> <func> <name name="arborescence_root" arity="1"/> <fsummary>Check if a digraph is an arborescence.</fsummary> <desc> - <p>Returns <c>{yes, <anno>Root</anno>}</c> if <c><anno>Root</anno></c> is - the <seealso marker="#root">root</seealso> of the arborescence - <c><anno>Digraph</anno></c>, <c>no</c> otherwise. 
- </p> + <p>Returns <c>{yes, <anno>Root</anno>}</c> if <c><anno>Root</anno></c> + is the <seealso marker="#root">root</seealso> of the arborescence + <c><anno>Digraph</anno></c>, otherwise <c>no</c>.</p> </desc> </func> + <func> <name name="components" arity="1"/> <fsummary>Return the components of a digraph.</fsummary> <desc> - <p>Returns a list - of <seealso marker="#components">connected components</seealso>. - Each component is represented by its + <p>Returns a list + of <seealso marker="#components">connected components</seealso>. + Each component is represented by its vertices. The order of the vertices and the order of the - components are arbitrary. Each vertex of the digraph - <c><anno>Digraph</anno></c> occurs in exactly one component. - </p> + components are arbitrary. Each vertex of digraph + <c><anno>Digraph</anno></c> occurs in exactly one component.</p> </desc> </func> + <func> <name name="condensation" arity="1"/> <fsummary>Return a condensed graph of a digraph.</fsummary> <desc> - <p>Creates a digraph where the vertices are - the <seealso marker="#strong_components">strongly connected - components</seealso> of <c><anno>Digraph</anno></c> as returned by - <c>strong_components/1</c>. If X and Y are two different strongly - connected components, and there exist vertices x and y in X - and Y respectively such that there is an - edge <seealso marker="#emanate">emanating</seealso> from x - and <seealso marker="#incident">incident</seealso> on y, then - an edge emanating from X and incident on Y is created. - </p> + <p>Creates a digraph where the vertices are + the <seealso marker="#strong_components">strongly connected + components</seealso> of <c><anno>Digraph</anno></c> as returned by + <seealso marker="#strong_components/1"> + <c>strong_components/1</c></seealso>. 
+ If X and Y are two different strongly + connected components, and vertices x and y exist in X + and Y, respectively, such that there is an + edge <seealso marker="#emanate">emanating</seealso> from x + and <seealso marker="#incident">incident</seealso> on y, then + an edge emanating from X and incident on Y is created.</p> <p>The created digraph has the same type as <c><anno>Digraph</anno></c>. - All vertices and edges have the - default <seealso marker="#label">label</seealso> <c>[]</c>. - </p> - <p>Each and every <seealso marker="#cycle">cycle</seealso> is - included in some strongly connected component, which implies - that there always exists - a <seealso marker="#topsort">topological ordering</seealso> of the - created digraph.</p> + All vertices and edges have the + default <seealso marker="#label">label</seealso> <c>[]</c>.</p> + <p>Each <seealso marker="#cycle">cycle</seealso> is + included in some strongly connected component, which implies that + a <seealso marker="#topsort">topological ordering</seealso> of the + created digraph always exists.</p> </desc> </func> + <func> <name name="cyclic_strong_components" arity="1"/> <fsummary>Return the cyclic strong components of a digraph.</fsummary> <desc> - <p>Returns a list of <seealso marker="#strong_components">strongly - connected components</seealso>. - Each strongly component is represented + <p>Returns a list of <seealso marker="#strong_components">strongly + connected components</seealso>. Each strongly component is represented by its vertices. The order of the vertices and the order of the components are arbitrary. Only vertices that are included in some <seealso marker="#cycle">cycle</seealso> in - <c><anno>Digraph</anno></c> are returned, otherwise the returned list is - equal to that returned by <c>strong_components/1</c>. 
- </p> + <c><anno>Digraph</anno></c> are returned, otherwise the returned + list is equal to that returned by + <seealso marker="#strong_components/1"> + <c>strong_components/1</c></seealso>.</p> </desc> </func> + <func> <name name="is_acyclic" arity="1"/> <fsummary>Check if a digraph is acyclic.</fsummary> <desc> - <p>Returns <c>true</c> if and only if the digraph - <c><anno>Digraph</anno></c> is <seealso marker="#acyclic_digraph">acyclic</seealso>.</p> + <p>Returns <c>true</c> if and only if digraph + <c><anno>Digraph</anno></c> is + <seealso marker="#acyclic_digraph">acyclic</seealso>.</p> </desc> </func> + <func> <name name="is_arborescence" arity="1"/> <fsummary>Check if a digraph is an arborescence.</fsummary> <desc> - <p>Returns <c>true</c> if and only if the digraph + <p>Returns <c>true</c> if and only if digraph <c><anno>Digraph</anno></c> is an <seealso marker="#arborescence">arborescence</seealso>.</p> </desc> </func> + <func> <name name="is_tree" arity="1"/> <fsummary>Check if a digraph is a tree.</fsummary> <desc> - <p>Returns <c>true</c> if and only if the digraph + <p>Returns <c>true</c> if and only if digraph <c><anno>Digraph</anno></c> is - a <seealso marker="#tree">tree</seealso>.</p> + a <seealso marker="#tree">tree</seealso>.</p> </desc> </func> + <func> <name name="loop_vertices" arity="1"/> - <fsummary>Return the vertices of a digraph included in some loop.</fsummary> + <fsummary>Return the vertices of a digraph included in some loop. 
+ </fsummary> <desc> - <p>Returns a list of all vertices of <c><anno>Digraph</anno></c> that are - included in some <seealso marker="#loop">loop</seealso>.</p> + <p>Returns a list of all vertices of <c><anno>Digraph</anno></c> that + are included in some <seealso marker="#loop">loop</seealso>.</p> </desc> </func> + <func> <name name="postorder" arity="1"/> - <fsummary>Return the vertices of a digraph in post-order.</fsummary> + <fsummary>Return the vertices of a digraph in postorder.</fsummary> <desc> - <p>Returns all vertices of the digraph <c><anno>Digraph</anno></c>. The - order is given by - a <seealso marker="#depth_first_traversal">depth-first - traversal</seealso> of the digraph, collecting visited + <p>Returns all vertices of digraph <c><anno>Digraph</anno></c>. + The order is given by + a <seealso marker="#depth_first_traversal">depth-first + traversal</seealso> of the digraph, collecting visited vertices in postorder. More precisely, the vertices visited while searching from an arbitrarily chosen vertex are collected in postorder, and all those collected vertices are - placed before the subsequently visited vertices. - </p> + placed before the subsequently visited vertices.</p> </desc> </func> + <func> <name name="preorder" arity="1"/> - <fsummary>Return the vertices of a digraph in pre-order.</fsummary> + <fsummary>Return the vertices of a digraph in preorder.</fsummary> <desc> - <p>Returns all vertices of the digraph <c><anno>Digraph</anno></c>. The - order is given by - a <seealso marker="#depth_first_traversal">depth-first - traversal</seealso> of the digraph, collecting visited - vertices in pre-order.</p> + <p>Returns all vertices of digraph <c><anno>Digraph</anno></c>. 
+ The order is given by + a <seealso marker="#depth_first_traversal">depth-first + traversal</seealso> of the digraph, collecting visited + vertices in preorder.</p> </desc> </func> + <func> <name name="reachable" arity="2"/> - <fsummary>Return the vertices reachable from some vertices of a digraph.</fsummary> + <fsummary>Return the vertices reachable from some vertices of a digraph. + </fsummary> <desc> <p>Returns an unsorted list of digraph vertices such that for - each vertex in the list, there is - a <seealso marker="#path">path</seealso> in <c><anno>Digraph</anno></c> from some + each vertex in the list, there is a + <seealso marker="#path">path</seealso> in <c><anno>Digraph</anno></c> + from some vertex of <c><anno>Vertices</anno></c> to the vertex. In particular, - since paths may have length zero, the vertices of - <c><anno>Vertices</anno></c> are included in the returned list. - </p> + as paths can have length zero, the vertices of + <c><anno>Vertices</anno></c> are included in the returned list.</p> </desc> </func> + <func> <name name="reachable_neighbours" arity="2"/> - <fsummary>Return the neighbours reachable from some vertices of a digraph.</fsummary> + <fsummary>Return the neighbors reachable from some vertices of a + digraph.</fsummary> <desc> <p>Returns an unsorted list of digraph vertices such that for - each vertex in the list, there is - a <seealso marker="#path">path</seealso> in <c><anno>Digraph</anno></c> of length + each vertex in the list, there is a + <seealso marker="#path">path</seealso> in <c><anno>Digraph</anno></c> + of length one or more from some vertex of <c><anno>Vertices</anno></c> to the - vertex. As a consequence, only those vertices - of <c><anno>Vertices</anno></c> that are included in - some <seealso marker="#cycle">cycle</seealso> are returned. - </p> + vertex. 
As a consequence, only those vertices + of <c><anno>Vertices</anno></c> that are included in + some <seealso marker="#cycle">cycle</seealso> are returned.</p> </desc> </func> + <func> <name name="reaching" arity="2"/> - <fsummary>Return the vertices that reach some vertices of a digraph.</fsummary> + <fsummary>Return the vertices that reach some vertices of a digraph. + </fsummary> <desc> <p>Returns an unsorted list of digraph vertices such that for - each vertex in the list, there is - a <seealso marker="#path">path</seealso> from the vertex to some - vertex of <c><anno>Vertices</anno></c>. In particular, since paths may have - length zero, the vertices of <c><anno>Vertices</anno></c> are included in - the returned list. - </p> + each vertex in the list, there is + a <seealso marker="#path">path</seealso> from the vertex to some + vertex of <c><anno>Vertices</anno></c>. In particular, as paths + can have length zero, the vertices of <c><anno>Vertices</anno></c> + are included in the returned list.</p> </desc> </func> + <func> <name name="reaching_neighbours" arity="2"/> - <fsummary>Return the neighbours that reach some vertices of a digraph.</fsummary> + <fsummary>Return the neighbors that reach some vertices of a digraph. + </fsummary> <desc> <p>Returns an unsorted list of digraph vertices such that for - each vertex in the list, there is - a <seealso marker="#path">path</seealso> of length one or more - from the vertex to some vertex of <c><anno>Vertices</anno></c>. As a consequence, - only those vertices of <c><anno>Vertices</anno></c> that are included in - some <seealso marker="#cycle">cycle</seealso> are returned. - </p> + each vertex in the list, there is + a <seealso marker="#path">path</seealso> of length one or more + from the vertex to some vertex of <c><anno>Vertices</anno></c>. 
+ Therefore only those vertices of <c><anno>Vertices</anno></c> + that are included + in some <seealso marker="#cycle">cycle</seealso> are returned.</p> </desc> </func> + <func> <name name="strong_components" arity="1"/> <fsummary>Return the strong components of a digraph.</fsummary> <desc> - <p>Returns a list of <seealso marker="#strong_components">strongly - connected components</seealso>. - Each strongly component is represented + <p>Returns a list of <seealso marker="#strong_components">strongly + connected components</seealso>. + Each strongly component is represented by its vertices. The order of the vertices and the order of - the components are arbitrary. Each vertex of the digraph + the components are arbitrary. Each vertex of digraph <c><anno>Digraph</anno></c> occurs in exactly one strong component. - </p> + </p> </desc> </func> + <func> <name name="subgraph" arity="2"/> <name name="subgraph" arity="3"/> <fsummary>Return a subgraph of a digraph.</fsummary> <desc> - <p>Creates a maximal <seealso marker="#subgraph">subgraph</seealso> of <c>Digraph</c> having + <p>Creates a maximal <seealso marker="#subgraph">subgraph</seealso> + of <c>Digraph</c> having as vertices those vertices of <c><anno>Digraph</anno></c> that are - mentioned in <c><anno>Vertices</anno></c>. - </p> - <p>If the value of the option <c>type</c> is <c>inherit</c>, - which is the default, then the type of <c><anno>Digraph</anno></c> is used + mentioned in <c><anno>Vertices</anno></c>.</p> + <p>If the value of option <c>type</c> is <c>inherit</c>, which is + the default, the type of <c><anno>Digraph</anno></c> is used for the subgraph as well. Otherwise the option value of <c>type</c> - is used as argument to <c>digraph:new/1</c>. - </p> - <p>If the value of the option <c>keep_labels</c> is <c>true</c>, - which is the default, then - the <seealso marker="#label">labels</seealso> of vertices and edges - of <c><anno>Digraph</anno></c> are used for the subgraph as well. 
If the value - is <c>false</c>, then the default label, <c>[]</c>, is used - for the subgraph's vertices and edges. - </p> - <p><c>subgraph(<anno>Digraph</anno>, <anno>Vertices</anno>)</c> is equivalent to - <c>subgraph(<anno>Digraph</anno>, <anno>Vertices</anno>, [])</c>. - </p> - <p>There will be a <c>badarg</c> exception if any of the arguments - are invalid. - </p> + is used as argument to + <seealso marker="digraph:new/1"><c>digraph:new/1</c></seealso>.</p> + <p>If the value of option <c>keep_labels</c> is <c>true</c>, + which is the default, + the <seealso marker="#label">labels</seealso> of vertices and edges + of <c><anno>Digraph</anno></c> are used for the subgraph as well. If + the value is <c>false</c>, default label <c>[]</c> is used + for the vertices and edges of the subgraph.</p> + <p><c>subgraph(<anno>Digraph</anno>, <anno>Vertices</anno>)</c> is + equivalent to + <c>subgraph(<anno>Digraph</anno>, <anno>Vertices</anno>, [])</c>.</p> + <p>If any of the arguments are invalid, a <c>badarg</c> exception is + raised.</p> </desc> </func> + <func> <name name="topsort" arity="1"/> - <fsummary>Return a topological sorting of the vertices of a digraph.</fsummary> + <fsummary>Return a topological sorting of the vertices of a digraph. + </fsummary> <desc> - <p>Returns a <seealso marker="#topsort">topological - ordering</seealso> of the vertices of the digraph - <c><anno>Digraph</anno></c> if such an ordering exists, <c>false</c> - otherwise. For each vertex in the returned list, there are - no <seealso marker="#out_neighbour">out-neighbours</seealso> - that occur earlier in the list.</p> + <p>Returns a <seealso marker="#topsort">topological + ordering</seealso> of the vertices of digraph + <c><anno>Digraph</anno></c> if such an ordering exists, otherwise + <c>false</c>. 
For each vertex in the returned list, + no <seealso marker="#out_neighbour">out-neighbors</seealso> + occur earlier in the list.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="digraph">digraph(3)</seealso></p> + <p><seealso marker="digraph"><c>digraph(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/epp.xml b/lib/stdlib/doc/src/epp.xml index ac87f9c2b6..1dc0161398 100644 --- a/lib/stdlib/doc/src/epp.xml +++ b/lib/stdlib/doc/src/epp.xml @@ -28,214 +28,241 @@ <docno>1</docno> <approved>Kenneth Lundin</approved> <checked></checked> - <date>97-01-31</date> + <date>1997-01-31</date> <rev>B</rev> - <file>epp.sgml</file> + <file>epp.xml</file> </header> <module>epp</module> - <modulesummary>An Erlang Code Preprocessor</modulesummary> + <modulesummary>An Erlang code preprocessor.</modulesummary> <description> - <p>The Erlang code preprocessor includes functions which are used - by <c>compile</c> to preprocess macros and include files before - the actual parsing takes place.</p> + <p>The Erlang code preprocessor includes functions that are used by the + <seealso marker="compiler:compile"><c>compile</c></seealso> + module to preprocess macros and include files before + the parsing takes place.</p> + <p>The Erlang source file <marker id="encoding"/><em>encoding</em> is selected by a comment in one of the first two lines of the source file. The - first string that matches the regular expression + first string matching the regular expression <c>coding\s*[:=]\s*([-a-zA-Z0-9])+</c> selects the encoding. If - the matching string is not a valid encoding it is ignored. The - valid encodings are <c>Latin-1</c> and <c>UTF-8</c> where the - case of the characters can be chosen freely. Examples:</p> - <pre> + the matching string is not a valid encoding, it is ignored. 
The + valid encodings are <c>Latin-1</c> and <c>UTF-8</c>, where the + case of the characters can be chosen freely.</p> + + <p><em>Examples:</em></p> + + <pre> %% coding: utf-8</pre> - <pre> + + <pre> %% For this file we have chosen encoding = Latin-1</pre> - <pre> + + <pre> %% -*- coding: latin-1 -*-</pre> </description> + <datatypes> <datatype> <name name="macros"></name> </datatype> <datatype> <name name="epp_handle"></name> - <desc><p>Handle to the epp server.</p></desc> + <desc><p>Handle to the <c>epp</c> server.</p></desc> </datatype> <datatype> <name name="source_encoding"></name> </datatype> </datatypes> + <funcs> <func> - <name name="open" arity="1"/> - <fsummary>Open a file for preprocessing</fsummary> + <name name="close" arity="1"/> + <fsummary>Close the preprocessing of the file associated with <c>Epp</c>. + </fsummary> <desc> - <p>Opens a file for preprocessing.</p> - <p>If <c>extra</c> is given in - <c><anno>Options</anno></c>, the return value will be - <c>{ok, <anno>Epp</anno>, <anno>Extra</anno>}</c> instead - of <c>{ok, <anno>Epp</anno>}</c>.</p> + <p>Closes the preprocessing of a file.</p> </desc> </func> + <func> - <name name="open" arity="2"/> - <fsummary>Open a file for preprocessing</fsummary> + <name name="default_encoding" arity="0"/> + <fsummary>Return the default encoding of Erlang source files.</fsummary> <desc> - <p>Equivalent to <c>epp:open([{name, FileName}, {includes, IncludePath}])</c>.</p> + <p>Returns the default encoding of Erlang source files.</p> </desc> </func> + <func> - <name name="open" arity="3"/> - <fsummary>Open a file for preprocessing</fsummary> + <name name="encoding_to_string" arity="1"/> + <fsummary>Return a string representation of an encoding.</fsummary> <desc> - <p>Equivalent to <c>epp:open([{name, FileName}, {includes, IncludePath}, - {macros, PredefMacros}])</c>.</p> + <p>Returns a string representation of an encoding. 
The string + is recognized by + <seealso marker="#read_encoding/1"><c>read_encoding/1,2</c></seealso>, + <seealso marker="#read_encoding_from_binary/1"> + <c>read_encoding_from_binary/1,2</c></seealso>, and + <seealso marker="#set_encoding/1"><c>set_encoding/1,2</c></seealso> + as a valid encoding.</p> </desc> </func> + <func> - <name name="close" arity="1"/> - <fsummary>Close the preprocessing of the file associated with <c>Epp</c></fsummary> + <name name="format_error" arity="1"/> + <fsummary>Format an error descriptor.</fsummary> <desc> - <p>Closes the preprocessing of a file.</p> + <p>Takes an <c><anno>ErrorDescriptor</anno></c> and returns + a string that + describes the error or warning. This function is usually + called implicitly when processing an <c>ErrorInfo</c> + structure (see section + <seealso marker="#errorinfo">Error Information</seealso>).</p> </desc> </func> + <func> - <name name="parse_erl_form" arity="1"/> - <fsummary>Return the next Erlang form from the opened Erlang source file</fsummary> - <type name="warning_info"/> + <name name="open" arity="1"/> + <fsummary>Open a file for preprocessing.</fsummary> <desc> - <p>Returns the next Erlang form from the opened Erlang source file. - The tuple <c>{eof, <anno>Line</anno>}</c> is returned at end-of-file. The first - form corresponds to an implicit attribute <c>-file(File,1).</c>, where - <c>File</c> is the name of the file.</p> + <p>Opens a file for preprocessing.</p> + <p>If <c>extra</c> is specified in + <c><anno>Options</anno></c>, the return value is + <c>{ok, <anno>Epp</anno>, <anno>Extra</anno>}</c> instead + of <c>{ok, <anno>Epp</anno>}</c>.</p> </desc> </func> + <func> - <name name="parse_file" arity="2"/> - <fsummary>Preprocess and parse an Erlang source file</fsummary> + <name name="open" arity="2"/> + <fsummary>Open a file for preprocessing.</fsummary> <desc> - <p>Preprocesses and parses an Erlang source file. 
- Note that the tuple <c>{eof, <anno>Line</anno>}</c> returned - at end-of-file is included as a "form".</p> - <p>If <c>extra</c> is given in - <c><anno>Options</anno></c>, the return value will be - <c>{ok, [<anno>Form</anno>], <anno>Extra</anno>}</c> instead - of <c>{ok, [<anno>Form</anno>]}</c>.</p> + <p>Equivalent to + <c>epp:open([{name, FileName}, {includes, IncludePath}])</c>.</p> </desc> </func> + <func> - <name name="parse_file" arity="3"/> - <fsummary>Preprocess and parse an Erlang source file</fsummary> + <name name="open" arity="3"/> + <fsummary>Open a file for preprocessing.</fsummary> <desc> - <p>Equivalent to <c>epp:parse_file(FileName, [{includes, IncludePath}, - {macros, PredefMacros}])</c>.</p> + <p>Equivalent to <c>epp:open([{name, FileName}, {includes, IncludePath}, + {macros, PredefMacros}])</c>.</p> </desc> </func> + <func> - <name name="default_encoding" arity="0"/> - <fsummary>Return the default encoding of Erlang source files</fsummary> + <name name="parse_erl_form" arity="1"/> + <fsummary>Return the next Erlang form from the opened Erlang source file. + </fsummary> + <type name="warning_info"/> <desc> - <p>Returns the default encoding of Erlang source files.</p> + <p>Returns the next Erlang form from the opened Erlang source file. + Tuple <c>{eof, <anno>Line</anno>}</c> is returned at the end of the + file. The first form corresponds to an implicit attribute + <c>-file(File,1).</c>, where <c>File</c> is the file name.</p> </desc> </func> + <func> - <name name="encoding_to_string" arity="1"/> - <fsummary>Return a string representation of an encoding</fsummary> + <name name="parse_file" arity="2"/> + <fsummary>Preprocess and parse an Erlang source file.</fsummary> <desc> - <p>Returns a string representation of an encoding. The string - is recognized by <c>read_encoding/1,2</c>, - <c>read_encoding_from_binary/1,2</c>, and - <c>set_encoding/1,2</c> as a valid encoding.</p> + <p>Preprocesses and parses an Erlang source file. 
+ Notice that tuple <c>{eof, <anno>Line</anno>}</c> returned at the + end of the file is included as a "form".</p> + <p>If <c>extra</c> is specified in + <c><anno>Options</anno></c>, the return value is + <c>{ok, [<anno>Form</anno>], <anno>Extra</anno>}</c> instead + of <c>{ok, [<anno>Form</anno>]}</c>.</p> + </desc> + </func> + + <func> + <name name="parse_file" arity="3"/> + <fsummary>Preprocess and parse an Erlang source file.</fsummary> + <desc> + <p>Equivalent to <c>epp:parse_file(FileName, [{includes, IncludePath}, + {macros, PredefMacros}])</c>.</p> </desc> </func> + <func> <name name="read_encoding" arity="1"/> <name name="read_encoding" arity="2"/> - <fsummary>Read the encoding from a file</fsummary> + <fsummary>Read the encoding from a file.</fsummary> <desc> <p>Read the <seealso marker="#encoding">encoding</seealso> from a file. Returns the read encoding, or <c>none</c> if no - valid encoding was found.</p> - <p>The option <c>in_comment_only</c> is <c>true</c> by + valid encoding is found.</p> + <p>Option <c>in_comment_only</c> is <c>true</c> by default, which is correct for Erlang source files. If set to - <c>false</c> the encoding string does not necessarily have to + <c>false</c>, the encoding string does not necessarily have to occur in a comment.</p> </desc> </func> + <func> <name name="read_encoding_from_binary" arity="1"/> <name name="read_encoding_from_binary" arity="2"/> - <fsummary>Read the encoding from a binary</fsummary> + <fsummary>Read the encoding from a binary.</fsummary> <desc> <p>Read the <seealso marker="#encoding">encoding</seealso> from a binary. Returns the read encoding, or <c>none</c> if no - valid encoding was found.</p> - <p>The option <c>in_comment_only</c> is <c>true</c> by + valid encoding is found.</p> + <p>Option <c>in_comment_only</c> is <c>true</c> by default, which is correct for Erlang source files. 
If set to - <c>false</c> the encoding string does not necessarily have to + <c>false</c>, the encoding string does not necessarily have to occur in a comment.</p> </desc> </func> + <func> <name name="set_encoding" arity="1"/> - <fsummary>Read and set the encoding of an IO device</fsummary> + <fsummary>Read and set the encoding of an I/O device.</fsummary> <desc> <p>Reads the <seealso marker="#encoding">encoding</seealso> from - an IO device and sets the encoding of the device - accordingly. The position of the IO device referenced by + an I/O device and sets the encoding of the device + accordingly. The position of the I/O device referenced by <c><anno>File</anno></c> is not affected. If no valid - encoding can be read from the IO device the encoding of the - IO device is set to the default encoding.</p> + encoding can be read from the I/O device, the encoding of the + I/O device is set to the default encoding.</p> <p>Returns the read encoding, or <c>none</c> if no valid - encoding was found.</p> + encoding is found.</p> </desc> </func> + <func> <name name="set_encoding" arity="2"/> - <fsummary>Read and set the encoding of an IO device</fsummary> + <fsummary>Read and set the encoding of an I/O device.</fsummary> <desc> <p>Reads the <seealso marker="#encoding">encoding</seealso> from - an IO device and sets the encoding of the device - accordingly. The position of the IO device referenced by + an I/O device and sets the encoding of the device + accordingly. The position of the I/O device referenced by <c><anno>File</anno></c> is not affected. 
If no valid - encoding can be read from the IO device the encoding of the - IO device is set to the - <seealso marker="#encoding">encoding</seealso> given by - <c><anno>Default</anno></c>.</p> + encoding can be read from the I/O device, the encoding of the + I/O device is set to the + <seealso marker="#encoding">encoding</seealso> specified by + <c><anno>Default</anno></c>.</p> <p>Returns the read encoding, or <c>none</c> if no valid - encoding was found.</p> - </desc> - </func> - <func> - <name name="format_error" arity="1"/> - <fsummary>Format an error descriptor</fsummary> - <desc> - <p>Takes an <c><anno>ErrorDescriptor</anno></c> and returns - a string which - describes the error or warning. This function is usually - called implicitly when processing an <c>ErrorInfo</c> - structure (see below).</p> + encoding is found.</p> </desc> </func> </funcs> <section> <title>Error Information</title> - <p>The <c>ErrorInfo</c> mentioned above is the standard - <c>ErrorInfo</c> structure which is returned from all IO - modules. It has the following format: - </p> + <marker id="errorinfo"/> + <p><c>ErrorInfo</c> is the standard <c>ErrorInfo</c> structure that is + returned from all I/O modules. 
The format is as follows:</p> <code type="none"> - {ErrorLine, Module, ErrorDescriptor} </code> - <p>A string which describes the error is obtained with the following call: - </p> +{ErrorLine, Module, ErrorDescriptor}</code> + <p>A string describing the error is obtained with the following call:</p> <code type="none"> - Module:format_error(ErrorDescriptor) </code> +Module:format_error(ErrorDescriptor)</code> </section> <section> <title>See Also</title> - <p><seealso marker="erl_parse">erl_parse(3)</seealso></p> + <p><seealso marker="erl_parse"><c>erl_parse(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_anno.xml b/lib/stdlib/doc/src/erl_anno.xml index ddc8b8c765..f316f63d98 100644 --- a/lib/stdlib/doc/src/erl_anno.xml +++ b/lib/stdlib/doc/src/erl_anno.xml @@ -5,7 +5,7 @@ <header> <copyright> <year>2015</year> - <year>2015</year> + <year>2016</year> <holder>Ericsson AB, All Rights Reserved</holder> </copyright> <legalnotice> @@ -35,75 +35,81 @@ <file>erl_anno.xml</file> </header> <module>erl_anno</module> - - <modulesummary> - Abstract Datatype for the Annotations of the Erlang Compiler + <modulesummary>Abstract datatype for the annotations of the Erlang Compiler. </modulesummary> <description> - <p>This module implements an abstract type that is used by the + <p>This module provides an abstract type that is used by the Erlang Compiler and its helper modules for holding data such as column, line number, and text. 
The data type is a collection of <marker id="annotations"/><em>annotations</em> as described in the following.</p> + <p>The Erlang Token Scanner returns tokens with a subset of the following annotations, depending on the options:</p> + <taglist> <tag><c>column</c></tag> <item><p>The column where the token begins.</p></item> <tag><c>location</c></tag> <item><p>The line and column where the token begins, or - just the line if the column unknown.</p> - </item> + just the line if the column is unknown.</p></item> <tag><c>text</c></tag> <item><p>The token's text.</p></item> </taglist> - <p>From the above the following annotation is derived:</p> + + <p>From this, the following annotation is derived:</p> + <taglist> <tag><c>line</c></tag> <item><p>The line where the token begins.</p></item> </taglist> - <p>Furthermore, the following annotations are supported by - this module, and used by various modules:</p> + + <p>This module also supports the following annotations, + which are used by various modules:</p> + <taglist> <tag><c>file</c></tag> <item><p>A filename.</p></item> <tag><c>generated</c></tag> <item><p>A Boolean indicating if the abstract code is - compiler generated. The Erlang Compiler does not emit warnings - for such code.</p> - </item> + compiler-generated. The Erlang Compiler does not emit warnings + for such code.</p></item> <tag><c>record</c></tag> <item><p>A Boolean indicating if the origin of the abstract - code is a record. Used by Dialyzer to assign types to tuple - elements.</p> + code is a record. 
Used by + <seealso marker="dialyzer:dialyzer">Dialyzer</seealso> + to assign types to tuple elements.</p> </item> </taglist> + <p>The functions - <seealso marker="erl_scan#column/1">column()</seealso>, - <seealso marker="erl_scan#end_location/1">end_location()</seealso>, - <seealso marker="erl_scan#line/1">line()</seealso>, - <seealso marker="erl_scan#location/1">location()</seealso>, and - <seealso marker="erl_scan#text/1">text()</seealso> + <seealso marker="erl_scan#column/1"><c>column()</c></seealso>, + <seealso marker="erl_scan#end_location/1"><c>end_location()</c></seealso>, + <seealso marker="erl_scan#line/1"><c>line()</c></seealso>, + <seealso marker="erl_scan#location/1"><c>location()</c></seealso>, and + <seealso marker="erl_scan#text/1"><c>text()</c></seealso> in the <c>erl_scan</c> module can be used for inspecting annotations in tokens.</p> + <p>The functions - <seealso marker="erl_parse#map_anno/2">map_anno()</seealso>, - <seealso marker="erl_parse#fold_anno/3">fold_anno()</seealso>, - <seealso marker="erl_parse#mapfold_anno/3">mapfold_anno()</seealso>, - <seealso marker="erl_parse#new_anno/1">new_anno()</seealso>, <seealso marker="erl_parse#anno_from_term/1"> - anno_from_term()</seealso>, and + <c>anno_from_term()</c></seealso>, <seealso marker="erl_parse#anno_to_term/1"> - anno_to_term()</seealso> in the <c>erl_parse</c> module can be - used for manipulating annotations in abstract code. 
- </p> + <c>anno_to_term()</c></seealso>, + <seealso marker="erl_parse#fold_anno/3"><c>fold_anno()</c></seealso>, + <seealso marker="erl_parse#map_anno/2"><c>map_anno()</c></seealso>, + <seealso marker="erl_parse#mapfold_anno/3"> + <c>mapfold_anno()</c></seealso>, + and <seealso marker="erl_parse#new_anno/1"><c>new_anno()</c></seealso>, + in the <c>erl_parse</c> module can be + used for manipulating annotations in abstract code.</p> </description> <datatypes> <datatype> <name>anno()</name> - <desc><p><marker id="type-anno"/>A collection of annotations.</p> + <desc><p>A collection of annotations.</p> </desc> </datatype> <datatype> @@ -118,9 +124,6 @@ </datatype> <datatype> <name name="line"></name> - <desc> - <p>To be changed to a non-negative integer in Erlang/OTP 19.0.</p> - </desc> </datatype> <datatype> <name name="location"></name> @@ -133,177 +136,169 @@ <funcs> <func> <name name="column" arity="1"/> - <fsummary>Return the column</fsummary> + <fsummary>Return the column.</fsummary> <type name="column"></type> <desc> - <p>Returns the column of the annotations <anno>Anno</anno>. - </p> + <p>Returns the column of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="end_location" arity="1"/> - <fsummary>Return the end location of the text</fsummary> + <fsummary>Return the end location of the text.</fsummary> <type name="location"></type> <desc> <p>Returns the end location of the text of the annotations <anno>Anno</anno>. If there is no text, - <c>undefined</c> is returned. - </p> + <c>undefined</c> is returned.</p> </desc> </func> + <func> <name name="file" arity="1"/> - <fsummary>Return the filename</fsummary> + <fsummary>Return the filename.</fsummary> <type name="filename"></type> <desc> <p>Returns the filename of the annotations <anno>Anno</anno>. - If there is no filename, <c>undefined</c> is returned. 
- </p> + If there is no filename, <c>undefined</c> is returned.</p> </desc> </func> + <func> <name name="from_term" arity="1"/> - <fsummary>Return annotations given a term</fsummary> + <fsummary>Return annotations given a term.</fsummary> <desc> - <p>Returns annotations with the representation <anno>Term</anno>. - </p> - <!-- - <p>Although it is possible to create new annotations by calling - <c>from_term/1</c>, the intention is that one should not do - so - the proper way to create annotations is to call - <c>new/1</c> and then modify the annotations - by calling the <c>set_*</c> functions.</p> - --> - <p>See also <seealso marker="#to_term/1">to_term()</seealso>. - </p> + <p>Returns annotations with representation <anno>Term</anno>.</p> + <p>See also <seealso marker="#to_term/1">to_term()</seealso>.</p> </desc> </func> + <func> <name name="generated" arity="1"/> - <fsummary>Return the generated Boolean</fsummary> + <fsummary>Return the generated Boolean.</fsummary> <type name="generated"></type> <desc> - <p>Returns <c>true</c> if the annotations <anno>Anno</anno> - has been marked as generated. The default is to return - <c>false</c>. - </p> + <p>Returns <c>true</c> if annotations <anno>Anno</anno> + is marked as generated. The default is to return + <c>false</c>.</p> </desc> </func> + <func> <name name="is_anno" arity="1"/> - <fsummary>Test for a collection of annotations</fsummary> + <fsummary>Test for a collection of annotations.</fsummary> <desc> <p>Returns <c>true</c> if <anno>Term</anno> is a collection of - annotations, <c>false</c> otherwise.</p> + annotations, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="line" arity="1"/> - <fsummary>Return the line</fsummary> + <fsummary>Return the line.</fsummary> <type name="line"></type> <desc> - <p>Returns the line of the annotations <anno>Anno</anno>. 
- </p> + <p>Returns the line of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="location" arity="1"/> - <fsummary>Return the location</fsummary> + <fsummary>Return the location.</fsummary> <type name="location"></type> <desc> - <p>Returns the location of the annotations <anno>Anno</anno>. - </p> + <p>Returns the location of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="new" arity="1"/> - <fsummary>Create a new collection of annotations</fsummary> + <fsummary>Create a new collection of annotations.</fsummary> <type name="location"></type> <desc> <p>Creates a new collection of annotations given a location.</p> </desc> </func> + <func> <name name="set_file" arity="2"/> - <fsummary>Modify the filename</fsummary> + <fsummary>Modify the filename.</fsummary> <type name="filename"></type> <desc> - <p>Modifies the filename of the annotations <anno>Anno</anno>. - </p> + <p>Modifies the filename of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="set_generated" arity="2"/> - <fsummary>Modify the generated marker</fsummary> + <fsummary>Modify the generated marker.</fsummary> <type name="generated"></type> <desc> - <p>Modifies the generated marker of the annotations - <anno>Anno</anno>. + <p>Modifies the generated marker of the annotations <anno>Anno</anno>. </p> </desc> </func> + <func> <name name="set_line" arity="2"/> - <fsummary>Modify the line</fsummary> + <fsummary>Modify the line.</fsummary> <type name="line"></type> <desc> - <p>Modifies the line of the annotations <anno>Anno</anno>. - </p> + <p>Modifies the line of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="set_location" arity="2"/> - <fsummary>Modify the location</fsummary> + <fsummary>Modify the location.</fsummary> <type name="location"></type> <desc> - <p>Modifies the location of the annotations <anno>Anno</anno>. 
- </p> + <p>Modifies the location of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="set_record" arity="2"/> - <fsummary>Modify the record marker</fsummary> + <fsummary>Modify the record marker.</fsummary> <type name="record"></type> <desc> - <p>Modifies the record marker of the annotations <anno>Anno</anno>. - </p> + <p>Modifies the record marker of the annotations <anno>Anno</anno>.</p> </desc> </func> + <func> <name name="set_text" arity="2"/> - <fsummary>Modify the text</fsummary> + <fsummary>Modify the text.</fsummary> <type name="text"></type> <desc> - <p>Modifies the text of the annotations <anno>Anno</anno>. - </p> + <p>Modifies the text of the annotations <anno>Anno</anno>.</p> </desc> </func> <func> + <name name="text" arity="1"/> - <fsummary>Return the text</fsummary> + <fsummary>Return the text.</fsummary> <type name="text"></type> <desc> <p>Returns the text of the annotations <anno>Anno</anno>. - If there is no text, <c>undefined</c> is returned. - </p> + If there is no text, <c>undefined</c> is returned.</p> </desc> </func> + <func> <name name="to_term" arity="1"/> - <fsummary>Return the term representing a collection of - annotations</fsummary> + <fsummary>Return the term representing a collection of annotations. + </fsummary> <desc> - <p>Returns the term representing the annotations <anno>Anno</anno>. - </p> - <p>See also <seealso marker="#from_term/1">from_term()</seealso>. 
- </p> + <p>Returns the term representing the annotations <anno>Anno</anno>.</p> + <p>See also <seealso marker="#from_term/1">from_term()</seealso>.</p> </desc> </func> </funcs> + <section> <title>See Also</title> - <p><seealso marker="erl_scan">erl_scan(3)</seealso>, - <seealso marker="erl_parse">erl_parse(3)</seealso> - </p> + <p><seealso marker="erl_parse"><c>erl_parse(3)</c></seealso>, + <seealso marker="erl_scan"><c>erl_scan(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_eval.xml b/lib/stdlib/doc/src/erl_eval.xml index d60b04b510..1c0f7f062f 100644 --- a/lib/stdlib/doc/src/erl_eval.xml +++ b/lib/stdlib/doc/src/erl_eval.xml @@ -28,19 +28,19 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-21</date> + <date>1997-01-21</date> <rev>B</rev> - <file>erl_eval.sgml</file> + <file>erl_eval.xml</file> </header> <module>erl_eval</module> - <modulesummary>The Erlang Meta Interpreter</modulesummary> + <modulesummary>The Erlang meta interpreter.</modulesummary> <description> <p>This module provides an interpreter for Erlang expressions. 
The expressions are in the abstract syntax as returned by <seealso marker="erl_parse"><c>erl_parse</c></seealso>, - the Erlang parser, or <seealso marker="io"> - <c>io</c></seealso>.</p> + the Erlang parser, or <seealso marker="io"><c>io</c></seealso>.</p> </description> + <datatypes> <datatype> <name name="bindings"/> @@ -73,9 +73,9 @@ </datatype> <datatype> <name name="local_function_handler"/> - <desc><p>Further described - <seealso marker="#local_function_handler">below.</seealso></p> - </desc> + <desc><p>Further described in section + <seealso marker="#local_function_handler"> + Local Function Handler</seealso> in this module</p></desc> </datatype> <datatype> <name name="name"/> @@ -85,152 +85,164 @@ </datatype> <datatype> <name name="non_local_function_handler"/> - <desc><p>Further described - <seealso marker="#non_local_function_handler">below.</seealso></p> - </desc> + <desc><p>Further described in section + <seealso marker="#non_local_function_handler"> + Non-Local Function Handler</seealso> in this module.</p></desc> </datatype> <datatype> <name name="value"/> </datatype> </datatypes> + <funcs> <func> - <name name="exprs" arity="2"/> - <name name="exprs" arity="3"/> - <name name="exprs" arity="4"/> - <fsummary>Evaluate expressions</fsummary> + <name name="add_binding" arity="3"/> + <fsummary>Add a binding.</fsummary> <desc> - <p>Evaluates <c><anno>Expressions</anno></c> with the set of bindings - <c><anno>Bindings</anno></c>, where <c><anno>Expressions</anno></c> - is a sequence of - expressions (in abstract syntax) of a type which may be - returned by <seealso marker="io#parse_erl_exprs/2"> - <c>io:parse_erl_exprs/2</c></seealso>. See below for an - explanation of how and when to use the arguments - <c><anno>LocalFunctionHandler</anno></c> and - <c><anno>NonLocalFunctionHandler</anno></c>. 
- </p> - <p>Returns <c>{value, <anno>Value</anno>, <anno>NewBindings</anno>}</c> - </p> + <p>Adds binding <c><anno>Name</anno>=<anno>Value</anno></c> + to <c><anno>BindingStruct</anno></c>. + Returns an updated binding structure.</p> </desc> </func> + + <func> + <name name="binding" arity="2"/> + <fsummary>Return bindings.</fsummary> + <desc> + <p>Returns the binding of <c><anno>Name</anno></c> + in <c><anno>BindingStruct</anno></c>.</p> + </desc> + </func> + + <func> + <name name="bindings" arity="1"/> + <fsummary>Return bindings.</fsummary> + <desc> + <p>Returns the list of bindings contained in the binding + structure.</p> + </desc> + </func> + + <func> + <name name="del_binding" arity="2"/> + <fsummary>Delete a binding.</fsummary> + <desc> + <p>Removes the binding of <c><anno>Name</anno></c> + in <c><anno>BindingStruct</anno></c>. + Returns an updated binding structure.</p> + </desc> + </func> + <func> <name name="expr" arity="2"/> <name name="expr" arity="3"/> <name name="expr" arity="4"/> <name name="expr" arity="5"/> - <fsummary>Evaluate expression</fsummary> + <fsummary>Evaluate expression.</fsummary> <desc> <p>Evaluates <c><anno>Expression</anno></c> with the set of bindings - <c><anno>Bindings</anno></c>. <c><anno>Expression</anno></c> - is an expression in - abstract syntax. See below for an explanation of - how and when to use the arguments + <c><anno>Bindings</anno></c>. <c><anno>Expression</anno></c> is an + expression in abstract syntax. + For an explanation of when and how to use arguments <c><anno>LocalFunctionHandler</anno></c> and - <c><anno>NonLocalFunctionHandler</anno></c>. - </p> - <p>Returns <c>{value, <anno>Value</anno>, - <anno>NewBindings</anno>}</c> by default. 
But if the - <c><anno>ReturnFormat</anno></c> is <c>value</c> only - the <c><anno>Value</anno></c> is returned.</p> + <c><anno>NonLocalFunctionHandler</anno></c>, see sections + <seealso marker="#local_function_handler"> + Local Function Handler</seealso> and + <seealso marker="#non_local_function_handler"> + Non-Local Function Handler</seealso> in this module.</p> + <p>Returns <c>{value, <anno>Value</anno>, <anno>NewBindings</anno>}</c> + by default. If <c><anno>ReturnFormat</anno></c> is <c>value</c>, + only <c><anno>Value</anno></c> is returned.</p> </desc> </func> + <func> <name name="expr_list" arity="2"/> <name name="expr_list" arity="3"/> <name name="expr_list" arity="4"/> - <fsummary>Evaluate a list of expressions</fsummary> + <fsummary>Evaluate a list of expressions.</fsummary> <desc> <p>Evaluates a list of expressions in parallel, using the same initial bindings for each expression. Attempts are made to - merge the bindings returned from each evaluation. This - function is useful in the <c>LocalFunctionHandler</c>. See below. - </p> + merge the bindings returned from each evaluation. This + function is useful in <c>LocalFunctionHandler</c>, see section + <seealso marker="#local_function_handler"> + Local Function Handler</seealso> in this module.</p> <p>Returns <c>{<anno>ValueList</anno>, <anno>NewBindings</anno>}</c>. 
</p> </desc> </func> + <func> - <name name="new_bindings" arity="0"/> - <fsummary>Return a bindings structure</fsummary> - <desc> - <p>Returns an empty binding structure.</p> - </desc> - </func> - <func> - <name name="bindings" arity="1"/> - <fsummary>Return bindings</fsummary> - <desc> - <p>Returns the list of bindings contained in the binding - structure.</p> - </desc> - </func> - <func> - <name name="binding" arity="2"/> - <fsummary>Return bindings</fsummary> - <desc> - <p>Returns the binding of <c><anno>Name</anno></c> - in <c><anno>BindingStruct</anno></c>.</p> - </desc> - </func> - <func> - <name name="add_binding" arity="3"/> - <fsummary>Add a binding</fsummary> + <name name="exprs" arity="2"/> + <name name="exprs" arity="3"/> + <name name="exprs" arity="4"/> + <fsummary>Evaluate expressions.</fsummary> <desc> - <p>Adds the binding <c><anno>Name</anno> = <anno>Value</anno></c> - to <c><anno>BindingStruct</anno></c>. - Returns an updated binding structure.</p> + <p>Evaluates <c><anno>Expressions</anno></c> with the set of bindings + <c><anno>Bindings</anno></c>, where <c><anno>Expressions</anno></c> + is a sequence of expressions (in abstract syntax) of a type that can + be returned by <seealso marker="io#parse_erl_exprs/2"> + <c>io:parse_erl_exprs/2</c></seealso>. 
+ For an explanation of when and how to use arguments + <c><anno>LocalFunctionHandler</anno></c> and + <c><anno>NonLocalFunctionHandler</anno></c>, see sections + <seealso marker="#local_function_handler"> + Local Function Handler</seealso> and + <seealso marker="#non_local_function_handler"> + Non-Local Function Handler</seealso> in this module.</p> + <p>Returns <c>{value, <anno>Value</anno>, <anno>NewBindings</anno>}</c> + </p> </desc> </func> + <func> - <name name="del_binding" arity="2"/> - <fsummary>Delete a binding</fsummary> + <name name="new_bindings" arity="0"/> + <fsummary>Return a bindings structure.</fsummary> <desc> - <p>Removes the binding of <c><anno>Name</anno></c> - in <c><anno>BindingStruct</anno></c>. - Returns an updated binding structure.</p> + <p>Returns an empty binding structure.</p> </desc> </func> </funcs> <section> + <marker id="local_function_handler"></marker> <title>Local Function Handler</title> - <p><marker id="local_function_handler"></marker> - During evaluation of a function, no calls can be made to local + <p>During evaluation of a function, no calls can be made to local functions. An undefined function error would be generated. However, the optional argument - <c>LocalFunctionHandler</c> may be used to define a function - which is called when there is a call to a local function. The + <c>LocalFunctionHandler</c> can be used to define a function + that is called when there is a call to a local function. The argument can have the following formats:</p> <taglist> <tag><c>{value,Func}</c></tag> <item> - <p>This defines a local function handler which is called with:</p> + <p>This defines a local function handler that is called with:</p> <code type="none"> -Func(Name, Arguments) </code> +Func(Name, Arguments)</code> <p><c>Name</c> is the name of the local function (an atom) and <c>Arguments</c> is a list of the <em>evaluated</em> arguments. The function handler returns the value of the - local function. 
In this case, it is not possible to access - the current bindings. To signal an error, the function - handler just calls <c>exit/1</c> with a suitable exit value.</p> + local function. In this case, the current bindings cannot be + accessed. To signal an error, the function + handler calls <c>exit/1</c> with a suitable exit value.</p> </item> <tag><c>{eval,Func}</c></tag> <item> - <p>This defines a local function handler which is called with:</p> + <p>This defines a local function handler that is called with:</p> <code type="none"> -Func(Name, Arguments, Bindings) </code> +Func(Name, Arguments, Bindings)</code> <p><c>Name</c> is the name of the local function (an atom), <c>Arguments</c> is a list of the <em>unevaluated</em> arguments, and <c>Bindings</c> are the current variable bindings. The function handler returns:</p> <code type="none"> -{value,Value,NewBindings} </code> +{value,Value,NewBindings}</code> <p><c>Value</c> is the value of the local function and <c>NewBindings</c> are the updated variable bindings. In this case, the function handler must itself evaluate all the function arguments and manage the bindings. To signal an - error, the function handler just calls <c>exit/1</c> with a + error, the function handler calls <c>exit/1</c> with a suitable exit value.</p> </item> <tag><c>none</c></tag> @@ -241,55 +253,66 @@ Func(Name, Arguments, Bindings) </code> </section> <section> - <title>Non-local Function Handler</title> - <p><marker id="non_local_function_handler"></marker> - The optional argument <c>NonlocalFunctionHandler</c> may be - used to define a function which is called in the following - cases: a functional object (fun) is called; a built-in function - is called; a function is called using the M:F syntax, where M - and F are atoms or expressions; an operator Op/A is called - (this is handled as a call to the function <c>erlang:Op/A</c>). 
- Exceptions are calls to <c>erlang:apply/2,3</c>; neither of the - function handlers will be called for such calls. + <marker id="non_local_function_handler"></marker> + <title>Non-Local Function Handler</title> + <p>The optional argument <c>NonLocalFunctionHandler</c> can be + used to define a function that is called in the following + cases:</p> + <list type="bulleted"> + <item><p>A functional object (fun) is called.</p></item> + <item><p>A built-in function is called.</p></item> + <item><p>A function is called using the <c>M:F</c> syntax, where <c>M</c> + and <c>F</c> are atoms or expressions.</p></item> + <item><p>An operator <c>Op/A</c> is called (this is handled as a call to + function <c>erlang:Op/A</c>).</p></item> + </list> + <p>Exceptions are calls to <c>erlang:apply/2,3</c>; neither of the + function handlers are called for such calls. The argument can have the following formats:</p> <taglist> <tag><c>{value,Func}</c></tag> <item> - <p>This defines an nonlocal function handler which is called with:</p> + <p>This defines a non-local function handler that is called with:</p> <code type="none"> -Func(FuncSpec, Arguments) </code> +Func(FuncSpec, Arguments)</code> <p><c>FuncSpec</c> is the name of the function on the form <c>{Module,Function}</c> or a fun, and <c>Arguments</c> is a list of the <em>evaluated</em> arguments. The function handler returns the value of the function. 
To - signal an error, the function handler just calls + signal an error, the function handler calls <c>exit/1</c> with a suitable exit value.</p> </item> <tag><c>none</c></tag> <item> - <p>There is no nonlocal function handler.</p> + <p>There is no non-local function handler.</p> </item> </taglist> <note> <p>For calls such as <c>erlang:apply(Fun, Args)</c> or - <c>erlang:apply(Module, Function, Args)</c> the call of the + <c>erlang:apply(Module, Function, Args)</c>, the call of the non-local function handler corresponding to the call to - <c>erlang:apply/2,3</c> itself--<c>Func({erlang, apply}, [Fun, Args])</c> or <c>Func({erlang, apply}, [Module, Function, Args])</c>--will never take place. The non-local function - handler <em>will</em> however be called with the evaluated - arguments of the call to <c>erlang:apply/2,3</c>: <c>Func(Fun, Args)</c> or <c>Func({Module, Function}, Args)</c> (assuming + <c>erlang:apply/2,3</c> itself + (<c>Func({erlang, apply}, [Fun, Args])</c> or + <c>Func({erlang, apply}, [Module, Function, Args])</c>) + never takes place.</p> + <p>The non-local function handler <em>is</em> however called with the + evaluated arguments of the call to + <c>erlang:apply/2,3</c>: <c>Func(Fun, Args)</c> or + <c>Func({Module, Function}, Args)</c> (assuming that <c>{Module, Function}</c> is not <c>{erlang, apply}</c>).</p> - <p>Calls to functions defined by evaluating fun expressions + <p>Calls to functions defined by evaluating fun expressions <c>"fun ... end"</c> are also hidden from non-local function - handlers.</p> </note> - <p>The nonlocal function handler argument is probably not used as + handlers.</p> + </note> + <p>The non-local function handler argument is probably not used as frequently as the local function handler argument. 
A possible use is to call <c>exit/1</c> on calls to functions that for some reason are not allowed to be called.</p> </section> <section> - <title>Bugs</title> - <p>Undocumented functions in <c>erl_eval</c> should not be used.</p> + <title>Known Limitation</title> + <p>Undocumented functions in this module are not to be used.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_expand_records.xml b/lib/stdlib/doc/src/erl_expand_records.xml index 93e464c733..7e4aa2db37 100644 --- a/lib/stdlib/doc/src/erl_expand_records.xml +++ b/lib/stdlib/doc/src/erl_expand_records.xml @@ -26,33 +26,35 @@ <title>erl_expand_records</title> <prepared>Hans Bolinder</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2005-12-23</date> <rev>PA1</rev> - <file>erl_expand_records.sgml</file> + <file>erl_expand_records.xml</file> </header> <module>erl_expand_records</module> - <modulesummary>Expands Records in a Module</modulesummary> + <modulesummary>Expands records in a module.</modulesummary> <description> + <p>This module expands records in a module.</p> </description> + <funcs> <func> <name name="module" arity="2"/> - <fsummary>Expand all records in a module</fsummary> + <fsummary>Expand all records in a module.</fsummary> <desc> <p>Expands all records in a module. 
The returned module has no - references to records, neither attributes nor code.</p> + references to records, neither in attributes nor in code.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p>The <seealso marker="erts:absform">abstract format</seealso> - documentation in ERTS User's Guide</p> + <p>Section <seealso marker="erts:absform">The Abstract Format</seealso> + in ERTS User's Guide.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_id_trans.xml b/lib/stdlib/doc/src/erl_id_trans.xml index 153bd4148e..16952a9582 100644 --- a/lib/stdlib/doc/src/erl_id_trans.xml +++ b/lib/stdlib/doc/src/erl_id_trans.xml @@ -30,30 +30,32 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-21</date> + <date>1997-01-21</date> <rev>B</rev> - <file>erl_id_trans.sgml</file> + <file>erl_id_trans.xml</file> </header> <module>erl_id_trans</module> - <modulesummary>An Identity Parse Transform</modulesummary> + <modulesummary>An identity parse transform.</modulesummary> <description> <p>This module performs an identity parse transformation of Erlang code. - It is included as an example for users who may wish to write their own - parse transformers. If the option <c>{parse_transform,Module}</c> is passed - to the compiler, a user written function <c>parse_transform/2</c> - is called by the compiler before the code is checked for - errors.</p> + It is included as an example for users who want to write their own + parse transformers.
If option <c>{parse_transform,Module}</c> is passed + to the compiler, a user-written function <c>parse_transform/2</c> + is called by the compiler before the code is checked for errors.</p> </description> + <funcs> <func> <name>parse_transform(Forms, Options) -> Forms</name> - <fsummary>Transform Erlang forms</fsummary> + <fsummary>Transform Erlang forms.</fsummary> <type> - <v>Forms = [<seealso marker="erl_parse#type-abstract_form">erl_parse:abstract_form()</seealso>]</v> + <v>Forms = [<seealso marker="erl_parse#type-abstract_form">erl_parse:abstract_form()</seealso> + | <seealso marker="erl_parse#type-form_info">erl_parse:form_info()</seealso>]</v> <v>Options = [<seealso marker="compile#type-option">compile:option()</seealso>]</v> </type> <desc> - <p>Performs an identity transformation on Erlang forms, as an example.</p> + <p>Performs an identity transformation on Erlang forms, as an example. + </p> </desc> </func> </funcs> @@ -62,17 +64,17 @@ <title>Parse Transformations</title> <p>Parse transformations are used if a programmer wants to use Erlang syntax, but with different semantics. The original Erlang - code is then transformed into other Erlang code. - </p> + code is then transformed into other Erlang code.</p> <note> - <p>Programmers are strongly advised not to engage in parse transformations and no support is offered for problems encountered.</p> + <p>Programmers are strongly advised not to engage in parse + transformations. 
No support is offered for problems encountered.</p> </note> </section> <section> <title>See Also</title> - <p><seealso marker="erl_parse">erl_parse(3)</seealso>, - <seealso marker="compiler:compile">compile(3)</seealso>.</p> + <p><seealso marker="erl_parse"><c>erl_parse(3)</c></seealso>, + <seealso marker="compiler:compile"><c>compile(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_internal.xml b/lib/stdlib/doc/src/erl_internal.xml index 940f8c5b40..cf49df0972 100644 --- a/lib/stdlib/doc/src/erl_internal.xml +++ b/lib/stdlib/doc/src/erl_internal.xml @@ -30,91 +30,100 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-21</date> + <date>1997-01-21</date> <rev>B</rev> - <file>erl_internal.sgml</file> + <file>erl_internal.xml</file> </header> <module>erl_internal</module> - <modulesummary>Internal Erlang Definitions</modulesummary> + <modulesummary>Internal Erlang definitions.</modulesummary> <description> - <p>This module defines Erlang BIFs, guard tests and operators. + <p>This module defines Erlang BIFs, guard tests, and operators. 
This module is only of interest to programmers who manipulate Erlang code.</p> </description> + <funcs> <func> - <name name="bif" arity="2"/> - <fsummary>Test for an Erlang BIF</fsummary> + <name name="arith_op" arity="2"/> + <fsummary>Test for an arithmetic operator.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> is an Erlang BIF - which is automatically recognized by the compiler, otherwise - <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> + is an arithmetic operator, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="guard_bif" arity="2"/> - <fsummary>Test for an Erlang BIF allowed in guards</fsummary> + <name name="bif" arity="2"/> + <fsummary>Test for an Erlang BIF.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> is an Erlang BIF - which is allowed in guards, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> + is an Erlang BIF that is automatically recognized by the compiler, + otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="type_test" arity="2"/> - <fsummary>Test for a valid type test</fsummary> + <name name="bool_op" arity="2"/> + <fsummary>Test for a Boolean operator.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> is a valid Erlang - type test, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> + is a Boolean operator, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="arith_op" arity="2"/> - <fsummary>Test for an arithmetic operator</fsummary> + <name name="comp_op" arity="2"/> + <fsummary>Test for a comparison operator.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> is an arithmetic - operator, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if 
<c><anno>OpName</anno>/<anno>Arity</anno></c> + is a comparison operator, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="bool_op" arity="2"/> - <fsummary>Test for a Boolean operator</fsummary> + <name name="guard_bif" arity="2"/> + <fsummary>Test for an Erlang BIF allowed in guards.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> is a Boolean - operator, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> is + an Erlang BIF that is allowed in guards, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="comp_op" arity="2"/> - <fsummary>Test for a comparison operator</fsummary> + <name name="list_op" arity="2"/> + <fsummary>Test for a list operator.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> is a comparison - operator, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> + is a list operator, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="list_op" arity="2"/> - <fsummary>Test for a list operator</fsummary> + <name name="op_type" arity="2"/> + <fsummary>Return operator type.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> is a list - operator, otherwise <c>false</c>.</p> + <p>Returns the <c><anno>Type</anno></c> of operator that + <c><anno>OpName</anno>/<anno>Arity</anno></c> belongs to, + or generates a <c>function_clause</c> error if it is not an + operator.</p> </desc> </func> + <func> <name name="send_op" arity="2"/> - <fsummary>Test for a send operator</fsummary> + <fsummary>Test for a send operator.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> is a send - operator, otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>OpName</anno>/<anno>Arity</anno></c> + is a send operator, otherwise <c>false</c>.</p> </desc> </func> + 
<func> - <name name="op_type" arity="2"/> - <fsummary>Return operator type</fsummary> + <name name="type_test" arity="2"/> + <fsummary>Test for a valid type test.</fsummary> <desc> - <p>Returns the <c><anno>Type</anno></c> of operator that <c><anno>OpName</anno>/<anno>Arity</anno></c> - belongs to, - or generates a <c>function_clause</c> error if it is not an - operator at all.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno>/<anno>Arity</anno></c> is + a valid Erlang type test, otherwise <c>false</c>.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/erl_lint.xml b/lib/stdlib/doc/src/erl_lint.xml index 3747b0f3c3..77cb7a9916 100644 --- a/lib/stdlib/doc/src/erl_lint.xml +++ b/lib/stdlib/doc/src/erl_lint.xml @@ -28,39 +28,45 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-27</date> + <date>1997-01-27</date> <rev>B</rev> - <file>erl_lint.sgml</file> + <file>erl_lint.xml</file> </header> <module>erl_lint</module> - <modulesummary>The Erlang Code Linter</modulesummary> + <modulesummary>The Erlang code linter.</modulesummary> <description> <p>This module is used to check Erlang code for illegal syntax and - other bugs. It also warns against coding practices which are - not recommended. </p> + other bugs. 
It also warns against coding practices that are + not recommended.</p> + <p>The errors detected include:</p> + <list type="bulleted"> - <item>redefined and undefined functions</item> - <item>unbound and unsafe variables</item> - <item>illegal record usage.</item> + <item>Redefined and undefined functions</item> + <item>Unbound and unsafe variables</item> + <item>Illegal record use</item> </list> - <p>Warnings include:</p> + + <p>The warnings detected include:</p> + <list type="bulleted"> - <item>unused functions and imports</item> - <item>unused variables</item> - <item>variables imported into matches</item> - <item>variables exported from - <c>if</c>/<c>case</c>/<c>receive</c></item> - <item>variables shadowed in lambdas and list - comprehensions.</item> + <item>Unused functions and imports</item> + <item>Unused variables</item> + <item>Variables imported into matches</item> + <item>Variables exported from + <c>if</c>/<c>case</c>/<c>receive</c></item> + <item>Variables shadowed in funs and list comprehensions</item> </list> + <p>Some of the warnings are optional, and can be turned on by - giving the appropriate option, described below.</p> + specifying the appropriate option, described below.</p> + <p>The functions in this module are invoked automatically by the - Erlang compiler and there is no reason to invoke these + Erlang compiler. There is no reason to invoke these functions separately unless you have written your own Erlang compiler.</p> </description> + <datatypes> <datatype> <name name="error_info"/> @@ -69,86 +75,87 @@ <name name="error_description"/> </datatype> </datatypes> + <funcs> <func> + <name name="format_error" arity="1"/> + <fsummary>Format an error descriptor.</fsummary> + <desc> + <p>Takes an <c><anno>ErrorDescriptor</anno></c> and returns a string + that describes the error or warning. 
This function is usually + called implicitly when processing an <c>ErrorInfo</c> structure + (see section + <seealso marker="#errorinfo">Error Information</seealso>).</p> + </desc> + </func> + + <func> + <name name="is_guard_test" arity="1"/> + <fsummary>Test for a guard test.</fsummary> + <desc> + <p>Tests if <c><anno>Expr</anno></c> is a legal guard test. + <c><anno>Expr</anno></c> is an Erlang term representing the abstract + form for the expression. <seealso marker="erl_parse#parse_exprs/1"> + <c>erl_parse:parse_exprs(Tokens)</c></seealso> + can be used to generate a list of <c><anno>Expr</anno></c>.</p> + </desc> + </func> + + <func> <name name="module" arity="1"/> <name name="module" arity="2"/> <name name="module" arity="3"/> - <fsummary>Check a module for errors</fsummary> + <fsummary>Check a module for errors.</fsummary> <desc> - <p>This function checks all the forms in a module for errors. - It returns: - </p> + <p>Checks all the forms in a module for errors. It returns:</p> <taglist> <tag><c>{ok,<anno>Warnings</anno>}</c></tag> <item> - <p>There were no errors in the module.</p> + <p>There are no errors in the module.</p> </item> <tag><c>{error,<anno>Errors</anno>,<anno>Warnings</anno>}</c></tag> <item> - <p>There were errors in the module.</p> + <p>There are errors in the module.</p> </item> </taglist> - <p>Since this module is of interest only to the maintainers of - the compiler, and to avoid having the same description in - two places to avoid the usual maintenance nightmare, the + <p>As this module is of interest only to the maintainers of the + compiler, and to avoid the same description in two places, the elements of <c>Options</c> that control the warnings are - only described in <seealso marker="compiler:compile#erl_lint_options">compile(3)</seealso>. - </p> - <p>The <c><anno>AbsForms</anno></c> of a module which comes from a file - that is read through <c>epp</c>, the Erlang pre-processor, - can come from many files. 
This means that any references to - errors must include the file name (see <seealso marker="epp">epp(3)</seealso>, or parser <seealso marker="erl_parse">erl_parse(3)</seealso>). - The warnings and errors returned have the following format: - </p> + only described in the + <seealso marker="compiler:compile#erl_lint_options"> + <c>compile(3)</c></seealso> module.</p> + <p><c><anno>AbsForms</anno></c> of a module, which comes from a file + that is read through <c>epp</c>, the Erlang preprocessor, can come + from many files. This means that any references to errors must + include the filename, see the <seealso marker="epp"> + <c>epp(3)</c></seealso> module or parser (see the + <seealso marker="erl_parse"><c>erl_parse(3)</c></seealso> module). + The returned errors and warnings have the following format:</p> <code type="none"> - [{<anno>FileName2</anno>,[<anno>ErrorInfo</anno>]}] </code> - <p>The errors and warnings are listed in the order in which - they are encountered in the forms. This means that the - errors from one file may be split into different entries in - the list of errors.</p> - </desc> - </func> - <func> - <name name="is_guard_test" arity="1"/> - <fsummary>Test for a guard test</fsummary> - <desc> - <p>This function tests if <c><anno>Expr</anno></c> is a legal guard test. - <c><anno>Expr</anno></c> is an Erlang term representing the abstract form - for the expression. <c>erl_parse:parse_exprs(Tokens)</c> can - be used to generate a list of <c><anno>Expr</anno></c>.</p> - </desc> - </func> - <func> - <name name="format_error" arity="1"/> - <fsummary>Format an error descriptor</fsummary> - <desc> - <p>Takes an <c><anno>ErrorDescriptor</anno></c> and returns a string which - describes the error or warning. 
This function is usually - called implicitly when processing an <c>ErrorInfo</c> - structure (see below).</p> +[{<anno>FileName2</anno>,[<anno>ErrorInfo</anno>]}]</code> + <p>The errors and warnings are listed in the order in which they are + encountered in the forms. The errors from one file can therefore be + split into different entries in the list of errors.</p> </desc> </func> </funcs> <section> + <marker id="errorinfo"/> <title>Error Information</title> - <p>The <c>ErrorInfo</c> mentioned above is the standard - <c>ErrorInfo</c> structure which is returned from all IO - modules. It has the following format: - </p> + <p><c>ErrorInfo</c> is the standard <c>ErrorInfo</c> structure that is + returned from all I/O modules. The format is as follows:</p> <code type="none"> - {ErrorLine, Module, ErrorDescriptor} </code> - <p>A string which describes the error is obtained with the following call: - </p> +{ErrorLine, Module, ErrorDescriptor}</code> + <p>A string describing the error is obtained with the following call:</p> <code type="none"> - Module:format_error(ErrorDescriptor) </code> +Module:format_error(ErrorDescriptor)</code> </section> <section> <title>See Also</title> - <p><seealso marker="erl_parse">erl_parse(3)</seealso>, - <seealso marker="epp">epp(3)</seealso></p> + <p><seealso marker="epp"><c>epp(3)</c></seealso>, + <seealso marker="erl_parse"><c>erl_parse(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_parse.xml b/lib/stdlib/doc/src/erl_parse.xml index 13be488c33..647f36883c 100644 --- a/lib/stdlib/doc/src/erl_parse.xml +++ b/lib/stdlib/doc/src/erl_parse.xml @@ -28,43 +28,41 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-24</date> + <date>1997-01-24</date> <rev>B</rev> - <file>erl_parse.sgml</file> + <file>erl_parse.xml</file> </header> <module>erl_parse</module> - <modulesummary>The Erlang Parser</modulesummary> + <modulesummary>The Erlang parser.</modulesummary> <description> - 
<p>This module is the basic Erlang parser which converts tokens into - the abstract form of either forms (i.e., top-level constructs), + <p>This module is the basic Erlang parser that converts tokens into + the abstract form of either forms (that is, top-level constructs), expressions, or terms. The Abstract Format is described in the <seealso marker="erts:absform">ERTS User's Guide</seealso>. - Note that a token list must end with the <em>dot</em> token in order - to be acceptable to the parse functions (see <seealso marker="erl_scan">erl_scan(3)</seealso>).</p> + Notice that a token list must end with the <em>dot</em> token to be + acceptable to the parse functions (see the <seealso marker="erl_scan"> + <c>erl_scan(3)</c></seealso>) module.</p> </description> + <datatypes> <datatype> <name>abstract_clause()</name> - <desc><p><marker id="type-abstract_clause"/> - Abstract form of an Erlang clause.</p> + <desc><p>Abstract form of an Erlang clause.</p> </desc> </datatype> <datatype> <name>abstract_expr()</name> - <desc><p><marker id="type-abstract_expr"/> - Abstract form of an Erlang expression.</p> + <desc><p>Abstract form of an Erlang expression.</p> </desc> </datatype> <datatype> <name>abstract_form()</name> - <desc><p><marker id="type-abstract_form"/> - Abstract form of an Erlang form.</p> + <desc><p>Abstract form of an Erlang form.</p> </desc> </datatype> <datatype> <name>abstract_type()</name> - <desc><p><marker id="type-abstract_type"/> - Abstract form of an Erlang type.</p> + <desc><p>Abstract form of an Erlang type.</p> </desc> </datatype> <datatype> @@ -77,261 +75,268 @@ <name name="error_info"></name> </datatype> <datatype> + <name name="form_info"></name> + <desc><p>Tuples <c>{error, error_info()}</c> and <c>{warning, + error_info()}</c>, denoting syntactically incorrect forms and + warnings, and <c>{eof, line()}</c>, denoting an end-of-stream + encountered before a complete form had been parsed.</p> + </desc> + </datatype> + <datatype> <name 
name="token"></name> </datatype> </datatypes> + <funcs> <func> - <name name="parse_form" arity="1"/> - <fsummary>Parse an Erlang form</fsummary> - <desc> - <p>This function parses <c><anno>Tokens</anno></c> as if it were - a form. It returns:</p> - <taglist> - <tag><c>{ok, <anno>AbsForm</anno>}</c></tag> - <item> - <p>The parsing was successful. <c><anno>AbsForm</anno></c> is the - abstract form of the parsed form.</p> - </item> - <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> - <item> - <p>An error occurred.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="parse_exprs" arity="1"/> - <fsummary>Parse Erlang expressions</fsummary> - <desc> - <p>This function parses <c><anno>Tokens</anno></c> as if it were - a list of expressions. It returns:</p> - <taglist> - <tag><c>{ok, <anno>ExprList</anno>}</c></tag> - <item> - <p>The parsing was successful. <c><anno>ExprList</anno></c> is a - list of the abstract forms of the parsed expressions.</p> - </item> - <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> - <item> - <p>An error occurred.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="parse_term" arity="1"/> - <fsummary>Parse an Erlang term</fsummary> - <desc> - <p>This function parses <c><anno>Tokens</anno></c> as if it were - a term. It returns:</p> - <taglist> - <tag><c>{ok, <anno>Term</anno>}</c></tag> - <item> - <p>The parsing was successful. <c><anno>Term</anno></c> is - the Erlang term corresponding to the token list.</p> - </item> - <tag><c>{error, ErrorInfo}</c></tag> - <item> - <p>An error occurred.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name>format_error(ErrorDescriptor) -> Chars</name> - <fsummary>Format an error descriptor</fsummary> - <type> - <v>ErrorDescriptor = <seealso - marker="#type-error_info">error_description()</seealso></v> - <v>Chars = [char() | Chars]</v> - </type> - <desc> - <p>Uses an <c>ErrorDescriptor</c> and returns a string - which describes the error. 
This function is usually called - implicitly when an <c>ErrorInfo</c> structure is processed - (see below).</p> - </desc> - </func> - <func> - <name name="tokens" arity="1"/> - <name name="tokens" arity="2"/> - <fsummary>Generate a list of tokens for an expression</fsummary> - <desc> - <p>This function generates a list of tokens representing the abstract - form <c><anno>AbsTerm</anno></c> of an expression. Optionally, it - appends <c><anno>MoreTokens</anno></c>.</p> - </desc> - </func> - <func> - <name name="normalise" arity="1"/> - <fsummary>Convert abstract form to an Erlang term</fsummary> - <desc> - <p>Converts the abstract form <c><anno>AbsTerm</anno></c> of a - term into a - conventional Erlang data structure (i.e., the term itself). - This is the inverse of <c>abstract/1</c>.</p> - </desc> - </func> - <func> <name name="abstract" arity="1"/> - <fsummary>Convert an Erlang term into an abstract form</fsummary> + <fsummary>Convert an Erlang term into an abstract form.</fsummary> <desc> <p>Converts the Erlang data structure <c><anno>Data</anno></c> into an abstract form of type <c><anno>AbsTerm</anno></c>. 
- This is the inverse of <c>normalise/1</c>.</p> + This function is the inverse of + <seealso marker="#normalise/1"><c>normalise/1</c></seealso>.</p> <p><c>erl_parse:abstract(T)</c> is equivalent to <c>erl_parse:abstract(T, 0)</c>.</p> </desc> </func> + <func> <name name="abstract" arity="2"/> - <fsummary>Convert an Erlang term into an abstract form</fsummary> + <fsummary>Convert an Erlang term into an abstract form.</fsummary> <type name="encoding_func"/> <desc> <p>Converts the Erlang data structure <c><anno>Data</anno></c> into an abstract form of type <c><anno>AbsTerm</anno></c>.</p> - <p>The <c><anno>Line</anno></c> option is the line that will - be assigned to each node of <c><anno>AbsTerm</anno></c>.</p> - <p>The <c><anno>Encoding</anno></c> option is used for - selecting which integer lists will be considered + <p>Option <c><anno>Line</anno></c> is the line to be + assigned to each node of <c><anno>AbsTerm</anno></c>.</p> + <p>Option <c><anno>Encoding</anno></c> is used for + selecting which integer lists to be considered as strings. The default is to use the encoding returned by - <seealso marker="epp#default_encoding/0"> + function <seealso marker="epp#default_encoding/0"> <c>epp:default_encoding/0</c></seealso>. - The value <c>none</c> means that no integer lists will be - considered as strings. The <c>encoding_func()</c> will be - called with one integer of a list at a time, and if it - returns <c>true</c> for every integer the list will be + Value <c>none</c> means that no integer lists are + considered as strings. 
<c>encoding_func()</c> is + called with one integer of a list at a time; if it + returns <c>true</c> for every integer, the list is considered a string.</p> </desc> </func> + <func> - <name name="map_anno" arity="2"/> - <fsummary> - Map a function over the annotations of a <c>erl_parse</c> tree - </fsummary> + <name name="anno_from_term" arity="1"/> + <fsummary>Return annotations as terms.</fsummary> <desc> - <p>Modifies the <c>erl_parse</c> tree <c><anno>Abstr</anno></c> - by applying <c><anno>Fun</anno></c> on each collection of - annotations of the nodes of the <c>erl_parse</c> tree. The - <c>erl_parse</c> tree is traversed in a depth-first, - left-to-right, fashion. - </p> + <p>Assumes that <c><anno>Term</anno></c> is a term with the same + structure as a <c>erl_parse</c> tree, but with terms, + say <c>T</c>, where a <c>erl_parse</c> tree has collections + of annotations. Returns a <c>erl_parse</c> tree where each + term <c>T</c> is replaced by the value returned by + <seealso marker="erl_anno#from_term/1"> + <c>erl_anno:from_term(T)</c></seealso>. The term + <c><anno>Term</anno></c> is traversed in a depth-first, + left-to-right fashion.</p> </desc> </func> + + <func> + <name name="anno_to_term" arity="1"/> + <fsummary>Return the representation of annotations.</fsummary> + <desc> + <p>Returns a term where each collection of annotations + <c>Anno</c> of the nodes of the <c>erl_parse</c> tree + <c><anno>Abstr</anno></c> is replaced by the term + returned by <seealso marker="erl_anno#to_term/1"> + <c>erl_anno:to_term(Anno)</c></seealso>. The + <c>erl_parse</c> tree is traversed in a depth-first, + left-to-right fashion.</p> + </desc> + </func> + <func> <name name="fold_anno" arity="3"/> - <fsummary> - Fold a function over the annotations of a <c>erl_parse</c> tree + <fsummary>Fold a function over the annotations of an <c>erl_parse</c> tree. 
</fsummary> <desc> <p>Updates an accumulator by applying <c><anno>Fun</anno></c> on each collection of annotations of the <c>erl_parse</c> tree <c><anno>Abstr</anno></c>. The first call to <c><anno>Fun</anno></c> has <c><anno>AccIn</anno></c> as - argument, and the returned accumulator + argument, the returned accumulator <c><anno>AccOut</anno></c> is passed to the next call, and so on. The final value of the accumulator is returned. The - <c>erl_parse</c> tree is traversed in a depth-first, left-to-right, - fashion. - </p> + <c>erl_parse</c> tree is traversed in a depth-first, left-to-right + fashion.</p> </desc> </func> + <func> - <name name="mapfold_anno" arity="3"/> - <fsummary> - Map and fold a function over the annotations of a - <c>erl_parse</c> tree + <name>format_error(ErrorDescriptor) -> Chars</name> + <fsummary>Format an error descriptor.</fsummary> + <type> + <v>ErrorDescriptor = <seealso + marker="#type-error_info">error_description()</seealso></v> + <v>Chars = [char() | Chars]</v> + </type> + <desc> + <p>Uses an <c>ErrorDescriptor</c> and returns a string + that describes the error. This function is usually called + implicitly when an <c>ErrorInfo</c> structure is processed + (see section <seealso marker="#errorinfo"> + Error Information</seealso>).</p> + </desc> + </func> + + <func> + <name name="map_anno" arity="2"/> + <fsummary>Map a function over the annotations of an <c>erl_parse</c> tree. </fsummary> <desc> <p>Modifies the <c>erl_parse</c> tree <c><anno>Abstr</anno></c> - by applying <c><anno>Fun</anno></c> on each collection of - annotations of the nodes of the <c>erl_parse</c> tree, while - at the same time updating an accumulator. The first call to - <c><anno>Fun</anno></c> has <c><anno>AccIn</anno></c> as - second argument, and the returned accumulator - <c><anno>AccOut</anno></c> is passed to the next call, and - so on. The modified <c>erl_parse</c> tree as well as the the - final value of the accumulator are returned. 
The - <c>erl_parse</c> tree is traversed in a depth-first, - left-to-right, fashion. - </p> + by applying <c><anno>Fun</anno></c> on each collection of + annotations of the nodes of the <c>erl_parse</c> tree. The + <c>erl_parse</c> tree is traversed in a depth-first, + left-to-right fashion.</p> + </desc> + </func> + + <func> + <name name="mapfold_anno" arity="3"/> + <fsummary>Map and fold a function over the annotations of an + <c>erl_parse</c> tree.</fsummary> + <desc> + <p>Modifies the <c>erl_parse</c> tree <c><anno>Abstr</anno></c> + by applying <c><anno>Fun</anno></c> on each collection of + annotations of the nodes of the <c>erl_parse</c> tree, while + at the same time updating an accumulator. The first call to + <c><anno>Fun</anno></c> has <c><anno>AccIn</anno></c> as + second argument, the returned accumulator + <c><anno>AccOut</anno></c> is passed to the next call, and + so on. The modified <c>erl_parse</c> tree and the + final value of the accumulator are returned. The + <c>erl_parse</c> tree is traversed in a depth-first, + left-to-right fashion.</p> </desc> </func> + <func> <name name="new_anno" arity="1"/> - <fsummary> - Create new annotations - </fsummary> + <fsummary>Create new annotations.</fsummary> <desc> <p>Assumes that <c><anno>Term</anno></c> is a term with the same structure as a <c>erl_parse</c> tree, but with <seealso marker="erl_anno#type-location">locations</seealso> where a <c>erl_parse</c> tree has collections of annotations. Returns a <c>erl_parse</c> tree where each location <c>L</c> - has been replaced by the value returned by <seealso + is replaced by the value returned by <seealso marker="erl_anno#new/1"><c>erl_anno:new(L)</c></seealso>. The term <c><anno>Term</anno></c> is traversed in a - depth-first, left-to-right, fashion. 
- </p> + depth-first, left-to-right fashion.</p> </desc> </func> + <func> - <name name="anno_from_term" arity="1"/> - <fsummary> - Return annotations as terms - </fsummary> + <name name="normalise" arity="1"/> + <fsummary>Convert abstract form to an Erlang term.</fsummary> <desc> - <p>Assumes that <c><anno>Term</anno></c> is a term with the same - structure as a <c>erl_parse</c> tree, but with terms, - <c>T</c> say, where a <c>erl_parse</c> tree has collections - of annotations. Returns a <c>erl_parse</c> tree where each - term <c>T</c> has been replaced by the value returned by - <seealso marker="erl_anno#from_term/1"> - <c>erl_anno:from_term(T)</c></seealso>. The term - <c><anno>Term</anno></c> is traversed in a depth-first, - left-to-right, fashion. - </p> + <p>Converts the abstract form <c><anno>AbsTerm</anno></c> of a + term into a conventional Erlang data structure (that is, the + term itself). This function is the inverse of + <seealso marker="#abstract/1"><c>abstract/1</c></seealso>.</p> </desc> </func> + <func> - <name name="anno_to_term" arity="1"/> - <fsummary> - Return the representation of annotations - </fsummary> + <name name="parse_exprs" arity="1"/> + <fsummary>Parse Erlang expressions.</fsummary> <desc> - <p>Returns a term where each collection of annotations - <c>Anno</c> of the nodes of the <c>erl_parse</c> tree - <c><anno>Abstr</anno></c> has been replaced by the term - returned by <seealso marker="erl_anno#to_term/1"> - <c>erl_anno:to_term(Anno)</c></seealso>. The - <c>erl_parse</c> tree is traversed in a depth-first, - left-to-right, fashion. - </p> + <p>Parses <c><anno>Tokens</anno></c> as if it was a list of expressions. + Returns one of the following:</p> + <taglist> + <tag><c>{ok, <anno>ExprList</anno>}</c></tag> + <item> + <p>The parsing was successful. 
<c><anno>ExprList</anno></c> is a + list of the abstract forms of the parsed expressions.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> + <item> + <p>An error occurred.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="parse_form" arity="1"/> + <fsummary>Parse an Erlang form.</fsummary> + <desc> + <p>Parses <c><anno>Tokens</anno></c> as if it was a form. Returns one + of the following:</p> + <taglist> + <tag><c>{ok, <anno>AbsForm</anno>}</c></tag> + <item> + <p>The parsing was successful. <c><anno>AbsForm</anno></c> is the + abstract form of the parsed form.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> + <item> + <p>An error occurred.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="parse_term" arity="1"/> + <fsummary>Parse an Erlang term.</fsummary> + <desc> + <p>Parses <c><anno>Tokens</anno></c> as if it was a term. Returns + one of the following:</p> + <taglist> + <tag><c>{ok, <anno>Term</anno>}</c></tag> + <item> + <p>The parsing was successful. <c><anno>Term</anno></c> is + the Erlang term corresponding to the token list.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> + <item> + <p>An error occurred.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="tokens" arity="1"/> + <name name="tokens" arity="2"/> + <fsummary>Generate a list of tokens for an expression.</fsummary> + <desc> + <p>Generates a list of tokens representing the abstract + form <c><anno>AbsTerm</anno></c> of an expression. Optionally, + <c><anno>MoreTokens</anno></c> is appended.</p> </desc> </func> </funcs> <section> + <marker id="errorinfo"/> <title>Error Information</title> - <p>The <c>ErrorInfo</c> mentioned above is the standard - <c>ErrorInfo</c> structure which is returned from all IO - modules. It has the format: - </p> + <p><c>ErrorInfo</c> is the standard <c>ErrorInfo</c> structure that is + returned from all I/O modules. 
The format is as follows:</p> <code type="none"> - {ErrorLine, Module, ErrorDescriptor} </code> - <p>A string which describes the error is obtained with the following call: - </p> +{ErrorLine, Module, ErrorDescriptor}</code> + <p>A string describing the error is obtained with the following call:</p> <code type="none"> - Module:format_error(ErrorDescriptor) </code> +Module:format_error(ErrorDescriptor)</code> </section> <section> <title>See Also</title> - <p><seealso marker="io">io(3)</seealso>, - <seealso marker="erl_anno">erl_anno(3)</seealso>, - <seealso marker="erl_scan">erl_scan(3)</seealso>, - <seealso marker="erts:absform">ERTS User's Guide</seealso></p> + <p><seealso marker="erl_anno"><c>erl_anno(3)</c></seealso>, + <seealso marker="erl_scan"><c>erl_scan(3)</c></seealso>, + <seealso marker="io"><c>io(3)</c></seealso>, + section <seealso marker="erts:absform">The Abstract Format</seealso> + in the ERTS User's Guide</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_pp.xml b/lib/stdlib/doc/src/erl_pp.xml index e96fd576ec..77a7f1e8d1 100644 --- a/lib/stdlib/doc/src/erl_pp.xml +++ b/lib/stdlib/doc/src/erl_pp.xml @@ -7,7 +7,7 @@ <year>1996</year> <year>2016</year> <holder>Ericsson AB, All Rights Reserved</holder> - </copyright> + </copyright> <legalnotice> Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -30,38 +30,37 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-24</date> + <date>1997-01-24</date> <rev>B</rev> - <file>erl_pp.sgml</file> + <file>erl_pp.xml</file> </header> <module>erl_pp</module> - <modulesummary>The Erlang Pretty Printer</modulesummary> + <modulesummary>The Erlang pretty printer.</modulesummary> <description> <p>The functions in this module are used to generate aesthetically attractive representations of abstract - forms, which are suitable for printing. 
All functions return (possibly deep) + forms, which are suitable for printing. + All functions return (possibly deep) lists of characters and generate an error if the form is wrong.</p> - <p>All functions can have an optional argument which specifies a hook + + <p>All functions can have an optional argument, which specifies a hook that is called if an attempt is made to print an unknown form.</p> </description> + <datatypes> <datatype> <name name="hook_function"/> <desc> - <p>The optional argument <marker id="hook_function"/> - <c>HookFunction</c>, shown in the functions described below, - defines a function which is called when an unknown form occurs where there - should be a valid expression.</p> - - <p>If <c>HookFunction</c> is equal to <c>none</c> there is no hook - function.</p> - - <p>The called hook function should return a (possibly deep) list - of characters. <seealso marker="#expr/4"><c>expr/4</c></seealso> - is useful in a hook. - </p> - <p>If <c><anno>CurrentIndentation</anno></c> is negative, there will be no line - breaks and only a space is used as a separator.</p> + <p>Optional argument <marker id="hook_function"/><c>HookFunction</c>, + shown in the functions described in this module, defines a function + that is called when an unknown form occurs where there + is to be a valid expression. If <c>HookFunction</c> is equal to + <c>none</c>, there is no hook function.</p> + <p>The called hook function is to return a (possibly deep) list of + characters. 
Function <seealso marker="#expr/4"><c>expr/4</c></seealso> + is useful in a hook.</p> + <p>If <c><anno>CurrentIndentation</anno></c> is negative, there are no + line breaks and only a space is used as a separator.</p> </desc> </datatype> <datatype> @@ -71,78 +70,88 @@ <name name="options"/> </datatype> </datatypes> + <funcs> <func> - <name name="form" arity="1"/> - <name name="form" arity="2"/> - <fsummary>Pretty print a form</fsummary> + <name name="attribute" arity="1"/> + <name name="attribute" arity="2"/> + <fsummary>Pretty print an attribute.</fsummary> <desc> - <p>Pretty prints a - <c><anno>Form</anno></c> which is an abstract form of a type which is - returned by <seealso marker="erl_parse#parse_form/1"> - <c>erl_parse:parse_form/1</c></seealso>.</p> + <p>Same as <seealso marker="#form/1"><c>form/1,2</c></seealso>, + but only for attribute <c><anno>Attribute</anno></c>.</p> </desc> </func> + <func> - <name name="attribute" arity="1"/> - <name name="attribute" arity="2"/> - <fsummary>Pretty print an attribute</fsummary> + <name name="expr" arity="1"/> + <name name="expr" arity="2"/> + <name name="expr" arity="3"/> + <name name="expr" arity="4"/> + <fsummary>Pretty print one <c>Expression</c>.</fsummary> <desc> - <p>The same as <c>form</c>, but only for the attribute - <c><anno>Attribute</anno></c>.</p> + <p>Prints one expression. 
It is useful for implementing hooks (see + section + <seealso marker="#knownlimitations">Known Limitations</seealso>).</p> </desc> </func> + <func> - <name name="function" arity="1"/> - <name name="function" arity="2"/> - <fsummary>Pretty print a function</fsummary> + <name name="exprs" arity="1"/> + <name name="exprs" arity="2"/> + <name name="exprs" arity="3"/> + <fsummary>Pretty print <c>Expressions</c>.</fsummary> <desc> - <p>The same as <c>form</c>, but only for the function - <c><anno>Function</anno></c>.</p> + <p>Same as <seealso marker="#form/1"><c>form/1,2</c></seealso>, + but only for the sequence of + expressions in <c><anno>Expressions</anno></c>.</p> </desc> </func> + <func> - <name name="guard" arity="1"/> - <name name="guard" arity="2"/> - <fsummary>Pretty print a guard</fsummary> + <name name="form" arity="1"/> + <name name="form" arity="2"/> + <fsummary>Pretty print a form.</fsummary> <desc> - <p>The same as <c>form</c>, but only for the guard test - <c><anno>Guard</anno></c>.</p> + <p>Pretty prints a + <c><anno>Form</anno></c>, which is an abstract form of a type that is + returned by <seealso marker="erl_parse#parse_form/1"> + <c>erl_parse:parse_form/1</c></seealso>.</p> </desc> </func> + <func> - <name name="exprs" arity="1"/> - <name name="exprs" arity="2"/> - <name name="exprs" arity="3"/> - <fsummary>Pretty print <c>Expressions</c></fsummary> + <name name="function" arity="1"/> + <name name="function" arity="2"/> + <fsummary>Pretty print a function.</fsummary> <desc> - <p>The same as <c>form</c>, but only for the sequence of - expressions in <c><anno>Expressions</anno></c>.</p> + <p>Same as <seealso marker="#form/1"><c>form/1,2</c></seealso>, + but only for function <c><anno>Function</anno></c>.</p> </desc> </func> + <func> - <name name="expr" arity="1"/> - <name name="expr" arity="2"/> - <name name="expr" arity="3"/> - <name name="expr" arity="4"/> - <fsummary>Pretty print one <c>Expression</c></fsummary> + <name name="guard" arity="1"/> + 
<name name="guard" arity="2"/> + <fsummary>Pretty print a guard.</fsummary> <desc> - <p>This function prints one expression. It is useful for implementing hooks (see below).</p> + <p>Same as <seealso marker="#form/1"><c>form/1,2</c></seealso>, + but only for the guard test <c><anno>Guard</anno></c>.</p> </desc> </func> </funcs> <section> - <title>Bugs</title> - <p>It should be possible to have hook functions for unknown forms - at places other than expressions.</p> + <marker id="knownlimitations"/> + <title>Known Limitations</title> + <p>It is not possible to have hook functions for unknown forms + at other places than expressions.</p> </section> <section> <title>See Also</title> - <p><seealso marker="io">io(3)</seealso>, - <seealso marker="erl_parse">erl_parse(3)</seealso>, - <seealso marker="erl_eval">erl_eval(3)</seealso></p> + <p><seealso marker="erl_eval"><c>erl_eval(3)</c></seealso>, + <seealso marker="erl_parse"><c>erl_parse(3)</c></seealso>, + <seealso marker="io"><c>io(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_scan.xml b/lib/stdlib/doc/src/erl_scan.xml index ee0d6b6033..137ccd3416 100644 --- a/lib/stdlib/doc/src/erl_scan.xml +++ b/lib/stdlib/doc/src/erl_scan.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1996</year><year>2015</year> + <year>1996</year><year>2016</year> <holder>Ericsson AB. 
All Rights Reserved.</holder> </copyright> <legalnotice> @@ -28,16 +28,17 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-24</date> + <date>1997-01-24</date> <rev>B</rev> - <file>erl_scan.sgml</file> + <file>erl_scan.xml</file> </header> <module>erl_scan</module> - <modulesummary>The Erlang Token Scanner</modulesummary> + <modulesummary>The Erlang token scanner.</modulesummary> <description> - <p>This module contains functions for tokenizing characters into + <p>This module contains functions for tokenizing (scanning) characters into Erlang tokens.</p> </description> + <datatypes> <datatype> <name name="category"></name> @@ -70,23 +71,96 @@ <name name="tokens_result"></name> </datatype> </datatypes> + <funcs> <func> + <name name="category" arity="1"/> + <fsummary>Return the category.</fsummary> + <desc> + <p>Returns the category of <c><anno>Token</anno></c>.</p> + </desc> + </func> + + <func> + <name name="column" arity="1"/> + <fsummary>Return the column.</fsummary> + <desc> + <p>Returns the column of <c><anno>Token</anno></c>'s + collection of annotations.</p> + </desc> + </func> + + <func> + <name name="end_location" arity="1"/> + <fsummary>Return the end location of the text.</fsummary> + <desc> + <p>Returns the end location of the text of + <c><anno>Token</anno></c>'s collection of annotations. If + there is no text, <c>undefined</c> is returned.</p> + </desc> + </func> + + <func> + <name name="format_error" arity="1"/> + <fsummary>Format an error descriptor.</fsummary> + <desc> + <p>Uses an <c><anno>ErrorDescriptor</anno></c> and returns a string + that describes the error or warning. 
This function is usually + called implicitly when an <c>ErrorInfo</c> structure is + processed (see section + <seealso marker="#errorinfo">Error Information</seealso>).</p> + </desc> + </func> + + <func> + <name name="line" arity="1"/> + <fsummary>Return the line.</fsummary> + <desc> + <p>Returns the line of <c><anno>Token</anno></c>'s collection + of annotations.</p> + </desc> + </func> + + <func> + <name name="location" arity="1"/> + <fsummary>Return the location.</fsummary> + <desc> + <p>Returns the location of <c><anno>Token</anno></c>'s + collection of annotations.</p> + </desc> + </func> + + <func> + <name name="reserved_word" arity="1"/> + <fsummary>Test for a reserved word.</fsummary> + <desc> + <p>Returns <c>true</c> if <c><anno>Atom</anno></c> is an + Erlang reserved word, otherwise <c>false</c>.</p> + </desc> + </func> + + <func> <name name="string" arity="1"/> <name name="string" arity="2"/> <name name="string" arity="3"/> - <fsummary>Scan a string and return the Erlang tokens</fsummary> + <fsummary>Scan a string and return the Erlang tokens.</fsummary> <desc> <p>Takes the list of characters <c><anno>String</anno></c> and tries to - scan (tokenize) them. Returns <c>{ok, <anno>Tokens</anno>, - <anno>EndLocation</anno>}</c>, - where <c><anno>Tokens</anno></c> are the Erlang tokens from - <c><anno>String</anno></c>. <c><anno>EndLocation</anno></c> - is the first location after the last token.</p> - <p><c>{error, <anno>ErrorInfo</anno>, <anno>ErrorLocation</anno>}</c> - is returned if an error occurs. - <c><anno>ErrorLocation</anno></c> is the first location after - the erroneous token.</p> + scan (tokenize) them. Returns one of the following:</p> + <taglist> + <tag><c>{ok, <anno>Tokens</anno>, <anno>EndLocation</anno>}</c></tag> + <item> + <p><c><anno>Tokens</anno></c> are the Erlang tokens from + <c><anno>String</anno></c>. 
<c><anno>EndLocation</anno></c> + is the first location after the last token.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>, + <anno>ErrorLocation</anno>}</c></tag> + <item> + <p>An error occurred. <c><anno>ErrorLocation</anno></c> is the + first location after the erroneous token.</p> + </item> + </taglist> <p><c>string(<anno>String</anno>)</c> is equivalent to <c>string(<anno>String</anno>, 1)</c>, and <c>string(<anno>String</anno>, @@ -95,80 +169,102 @@ <anno>StartLocation</anno>, [])</c>.</p> <p><c><anno>StartLocation</anno></c> indicates the initial location when scanning starts. If <c><anno>StartLocation</anno></c> is a line, - <c>Anno</c> as well as <c><anno>EndLocation</anno></c> and - <c><anno>ErrorLocation</anno></c> will be lines. If - <c><anno>StartLocation</anno></c> is a pair of a line and a column + <c>Anno</c>, <c><anno>EndLocation</anno></c>, and + <c><anno>ErrorLocation</anno></c> are lines. If + <c><anno>StartLocation</anno></c> is a pair of a line and a column, <c>Anno</c> takes the form of an opaque compound data type, and <c><anno>EndLocation</anno></c> and <c><anno>ErrorLocation</anno></c> - will be pairs of a line and a column. The <em>token + are pairs of a line and a column. 
The <em>token annotations</em> contain information about the column and the line where the token begins, as well as the text of the - token (if the <c>text</c> option is given), all of which can + token (if option <c>text</c> is specified), all of which can be accessed by calling - <seealso marker="#column/1">column/1</seealso>, - <seealso marker="#line/1">line/1</seealso>, - <seealso marker="#location/1">location/1</seealso>, and - <seealso marker="#text/1">text/1</seealso>.</p> + <seealso marker="#column/1"><c>column/1</c></seealso>, + <seealso marker="#line/1"><c>line/1</c></seealso>, + <seealso marker="#location/1"><c>location/1</c></seealso>, and + <seealso marker="#text/1"><c>text/1</c></seealso>.</p> <p>A <em>token</em> is a tuple containing information about - syntactic category, the token annotations, and the actual - terminal symbol. For punctuation characters (e.g. <c>;</c>, + syntactic category, the token annotations, and the + terminal symbol. For punctuation characters (such as <c>;</c> and <c>|</c>) and reserved words, the category and the symbol coincide, and the token is represented by a two-tuple. 
- Three-tuples have one of the following forms: <c>{atom, - Info, atom()}</c>, - <c>{char, Info, integer()}</c>, <c>{comment, Info, - string()}</c>, <c>{float, Info, float()}</c>, <c>{integer, - Info, integer()}</c>, <c>{var, Info, atom()}</c>, - and <c>{white_space, Info, string()}</c>.</p> - <p>The valid options are:</p> + Three-tuples have one of the following forms:</p> + <list type="bulleted"> + <item><c>{atom, Anno, atom()}</c></item> + <item><c>{char, Anno, char()}</c></item> + <item><c>{comment, Anno, string()}</c></item> + <item><c>{float, Anno, float()}</c></item> + <item><c>{integer, Anno, integer()}</c></item> + <item><c>{var, Anno, atom()}</c></item> + <item><c>{white_space, Anno, string()}</c></item> + </list> + <p>Valid options:</p> <taglist> - <tag><c>{reserved_word_fun, reserved_word_fun()}</c></tag> - <item><p>A callback function that is called when the scanner - has found an unquoted atom. If the function returns - <c>true</c>, the unquoted atom itself will be the category - of the token; if the function returns <c>false</c>, - <c>atom</c> will be the category of the unquoted atom.</p> - </item> - <tag><c>return_comments</c></tag> - <item><p>Return comment tokens.</p> - </item> - <tag><c>return_white_spaces</c></tag> - <item><p>Return white space tokens. By convention, if there is - a newline character, it is always the first character of the - text (there cannot be more than one newline in a white space - token).</p> - </item> - <tag><c>return</c></tag> - <item><p>Short for <c>[return_comments, return_white_spaces]</c>.</p> - </item> - <tag><c>text</c></tag> - <item><p>Include the token's text in the token annotation. The - text is the part of the input corresponding to the token.</p> - </item> + <tag><c>{reserved_word_fun, reserved_word_fun()}</c></tag> + <item><p>A callback function that is called when the scanner + has found an unquoted atom. If the function returns + <c>true</c>, the unquoted atom itself becomes the category + of the token. 
If the function returns <c>false</c>, + <c>atom</c> becomes the category of the unquoted atom.</p> + </item> + <tag><c>return_comments</c></tag> + <item><p>Return comment tokens.</p> + </item> + <tag><c>return_white_spaces</c></tag> + <item><p>Return white space tokens. By convention, a newline + character, if present, is always the first character of the + text (there cannot be more than one newline in a white space + token).</p> + </item> + <tag><c>return</c></tag> + <item><p>Short for <c>[return_comments, return_white_spaces]</c>.</p> + </item> + <tag><c>text</c></tag> + <item><p>Include the token text in the token annotation. The + text is the part of the input corresponding to the token.</p> + </item> </taglist> </desc> </func> + + <func> + <name name="symbol" arity="1"/> + <fsummary>Return the symbol.</fsummary> + <desc> + <p>Returns the symbol of <c><anno>Token</anno></c>.</p> + </desc> + </func> + + <func> + <name name="text" arity="1"/> + <fsummary>Return the text.</fsummary> + <desc> + <p>Returns the text of <c><anno>Token</anno></c>'s collection + of annotations. If there is no text, <c>undefined</c> is + returned.</p> + </desc> + </func> + <func> <name name="tokens" arity="3"/> <name name="tokens" arity="4"/> - <fsummary>Re-entrant scanner</fsummary> + <fsummary>Re-entrant scanner.</fsummary> <type name="char_spec"/> <type name="return_cont"/> - <type_desc name="return_cont">An opaque continuation</type_desc> + <type_desc name="return_cont">An opaque continuation.</type_desc> <desc> - <p>This is the re-entrant scanner which scans characters until - a <em>dot</em> ('.' followed by a white space) or - <c>eof</c> has been reached. It returns:</p> + <p>This is the re-entrant scanner, which scans characters until + either a <em>dot</em> ('.' followed by a white space) or + <c>eof</c> is reached. 
It returns:</p> <taglist> <tag><c>{done, <anno>Result</anno>, <anno>LeftOverChars</anno>}</c> </tag> <item> - <p>This return indicates that there is sufficient input + <p>Indicates that there is sufficient input data to get a result. <c><anno>Result</anno></c> is:</p> <taglist> - <tag><c>{ok, Tokens, EndLocation}</c> - </tag> + <tag><c>{ok, Tokens, EndLocation}</c></tag> <item> <p>The scanning was successful. <c>Tokens</c> is the list of tokens including <em>dot</em>.</p> @@ -177,8 +273,7 @@ <item> <p>End of file was encountered before any more tokens.</p> </item> - <tag><c>{error, ErrorInfo, EndLocation}</c> - </tag> + <tag><c>{error, ErrorInfo, EndLocation}</c></tag> <item> <p>An error occurred. <c><anno>LeftOverChars</anno></c> is the remaining characters of the input data, @@ -194,110 +289,26 @@ </item> </taglist> <p>The <c><anno>CharSpec</anno></c> <c>eof</c> signals end of file. - <c><anno>LeftOverChars</anno></c> will then take the value <c>eof</c> + <c><anno>LeftOverChars</anno></c> then takes the value <c>eof</c> as well.</p> <p><c>tokens(<anno>Continuation</anno>, <anno>CharSpec</anno>, <anno>StartLocation</anno>)</c> is equivalent to <c>tokens(<anno>Continuation</anno>, <anno>CharSpec</anno>, <anno>StartLocation</anno>, [])</c>.</p> - <p>See <seealso marker="#string/3">string/3</seealso> for a - description of the various options.</p> - </desc> - </func> - <func> - <name name="reserved_word" arity="1"/> - <fsummary>Test for a reserved word</fsummary> - <desc> - <p>Returns <c>true</c> if <c><anno>Atom</anno></c> is an Erlang - reserved word, otherwise <c>false</c>.</p> - </desc> - </func> - <func> - <name name="category" arity="1"/> - <fsummary>Return the category</fsummary> - <desc> - <p>Returns the category of <c><anno>Token</anno></c>. - </p> - </desc> - </func> - <func> - <name name="symbol" arity="1"/> - <fsummary>Return the symbol</fsummary> - <desc> - <p>Returns the symbol of <c><anno>Token</anno></c>. 
- </p> - </desc> - </func> - <func> - <name name="column" arity="1"/> - <fsummary>Return the column</fsummary> - <desc> - <p>Returns the column of <c><anno>Token</anno></c>'s - collection of annotations. - </p> - </desc> - </func> - <func> - <name name="end_location" arity="1"/> - <fsummary>Return the end location of the text</fsummary> - <desc> - <p>Returns the end location of the text of - <c><anno>Token</anno></c>'s collection of annotations. If - there is no text, - <c>undefined</c> is returned. - </p> - </desc> - </func> - <func> - <name name="line" arity="1"/> - <fsummary>Return the line</fsummary> - <desc> - <p>Returns the line of <c><anno>Token</anno></c>'s collection - of annotations. - </p> - </desc> - </func> - <func> - <name name="location" arity="1"/> - <fsummary>Return the location</fsummary> - <desc> - <p>Returns the location of <c><anno>Token</anno></c>'s - collection of annotations. - </p> - </desc> - </func> - <func> - <name name="text" arity="1"/> - <fsummary>Return the text</fsummary> - <desc> - <p>Returns the text of <c><anno>Token</anno></c>'s collection - of annotations. If there is no text, <c>undefined</c> is - returned. - </p> - </desc> - </func> - <func> - <name name="format_error" arity="1"/> - <fsummary>Format an error descriptor</fsummary> - <desc> - <p>Takes an <c><anno>ErrorDescriptor</anno></c> and returns - a string which - describes the error or warning. This function is usually - called implicitly when processing an <c>ErrorInfo</c> - structure (see below).</p> + <p>For a description of the options, see + <seealso marker="#string/3"><c>string/3</c></seealso>.</p> </desc> </func> </funcs> <section> + <marker id="errorinfo"/> <title>Error Information</title> - <p>The <c>ErrorInfo</c> mentioned above is the standard - <c>ErrorInfo</c> structure which is returned from all IO - modules. It has the following format:</p> + <p><c>ErrorInfo</c> is the standard <c>ErrorInfo</c> structure that is + returned from all I/O modules. 
The format is as follows:</p> <code type="none"> {ErrorLocation, Module, ErrorDescriptor}</code> - <p>A string which describes the error is obtained with the - following call:</p> + <p>A string describing the error is obtained with the following call:</p> <code type="none"> Module:format_error(ErrorDescriptor)</code> </section> @@ -305,15 +316,15 @@ Module:format_error(ErrorDescriptor)</code> <section> <title>Notes</title> <p>The continuation of the first call to the re-entrant input - functions must be <c>[]</c>. Refer to Armstrong, Virding and - Williams, 'Concurrent Programming in Erlang', Chapter 13, for a - complete description of how the re-entrant input scheme works.</p> + functions must be <c>[]</c>. For a complete description of how the + re-entrant input scheme works, see Armstrong, Virding and + Williams: 'Concurrent Programming in Erlang', Chapter 13.</p> </section> <section> <title>See Also</title> - <p><seealso marker="io">io(3)</seealso>, - <seealso marker="erl_anno">erl_anno(3)</seealso>, - <seealso marker="erl_parse">erl_parse(3)</seealso></p> + <p><seealso marker="erl_anno"><c>erl_anno(3)</c></seealso>, + <seealso marker="erl_parse"><c>erl_parse(3)</c></seealso>, + <seealso marker="io"><c>io(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/erl_tar.xml b/lib/stdlib/doc/src/erl_tar.xml index 1f4a43f622..24e7b64b9e 100644 --- a/lib/stdlib/doc/src/erl_tar.xml +++ b/lib/stdlib/doc/src/erl_tar.xml @@ -28,117 +28,146 @@ <docno>1</docno> <approved>Kenneth Lundin</approved> <checked></checked> - <date>03-01-21</date> + <date>2003-01-21</date> <rev>A</rev> - <file>erl_tar.sgml</file> + <file>erl_tar.xml</file> </header> <module>erl_tar</module> - <modulesummary>Unix 'tar' utility for reading and writing tar archives</modulesummary> + <modulesummary>Unix 'tar' utility for reading and writing tar archives. + </modulesummary> <description> - <p>The <c>erl_tar</c> module archives and extract files to and from - a tar file. 
<c>erl_tar</c> supports the <c>ustar</c> format + <p>This module archives and extract files to and from + a tar file. This module supports the <c>ustar</c> format (IEEE Std 1003.1 and ISO/IEC 9945-1). All modern <c>tar</c> programs (including GNU tar) can read this format. To ensure that that GNU tar produces a tar file that <c>erl_tar</c> can read, - give the <c>--format=ustar</c> option to GNU tar.</p> - <p>By convention, the name of a tar file should end in "<c>.tar</c>". - To abide to the convention, you'll need to add "<c>.tar</c>" yourself - to the name.</p> - <p>Tar files can be created in one operation using the - <seealso marker="#create_2">create/2</seealso> or - <seealso marker="#create_3">create/3</seealso> function.</p> - <p>Alternatively, for more control, the - <seealso marker="#open">open</seealso>, - <seealso marker="#add">add/3,4</seealso>, and - <seealso marker="#close">close/1</seealso> functions can be used.</p> - <p>To extract all files from a tar file, use the - <seealso marker="#extract_1">extract/1</seealso> function. + specify option <c>--format=ustar</c> to GNU tar.</p> + + <p>By convention, the name of a tar file is to end in "<c>.tar</c>". + To abide to the convention, add "<c>.tar</c>" to the name.</p> + + <p>Tar files can be created in one operation using function + <seealso marker="#create/2"><c>create/2</c></seealso> or + <seealso marker="#create/3"><c>create/3</c></seealso>.</p> + + <p>Alternatively, for more control, use functions + <seealso marker="#open/2"><c>open/2</c></seealso>, + <seealso marker="#add/3"><c>add/3,4</c></seealso>, and + <seealso marker="#close/1"><c>close/1</c></seealso>.</p> + + <p>To extract all files from a tar file, use function + <seealso marker="#extract/1"><c>extract/1</c></seealso>. 
To extract only some files or to be able to specify some more options, - use the <seealso marker="#extract_2">extract/2</seealso> function.</p> + use function <seealso marker="#extract/2"><c>extract/2</c></seealso>.</p> + <p>To return a list of the files in a tar file, - use either the <seealso marker="#table_1">table/1</seealso> or - <seealso marker="#table_2">table/2</seealso> function. + use function <seealso marker="#table/1"><c>table/1</c></seealso> or + <seealso marker="#table/2"><c>table/2</c></seealso>. To print a list of files to the Erlang shell, - use either the <seealso marker="#t_1">t/1</seealso> or - <seealso marker="#tt_1">tt/1</seealso> function.</p> + use function <seealso marker="#t/1"><c>t/1</c></seealso> or + <seealso marker="#tt/1"><c>tt/1</c></seealso>.</p> + <p>To convert an error term returned from one of the functions - above to a readable message, use the - <seealso marker="#format_error_1">format_error/1</seealso> function.</p> + above to a readable message, use function + <seealso marker="#format_error/1"><c>format_error/1</c></seealso>.</p> </description> <section> - <title>UNICODE SUPPORT</title> - <p>If <seealso - marker="kernel:file#native_name_encoding/0">file:native_name_encoding/0</seealso> - returns <c>utf8</c>, path names will be encoded in UTF-8 when - creating tar files and path names will be assumed to be encoded in - UTF-8 when extracting tar files.</p> + <title>Unicode Support</title> + <p>If <seealso marker="kernel:file#native_name_encoding/0"> + <c>file:native_name_encoding/0</c></seealso> + returns <c>utf8</c>, path names are encoded in UTF-8 when + creating tar files, and path names are assumed to be encoded in + UTF-8 when extracting tar files.</p> - <p>If <seealso - marker="kernel:file#native_name_encoding/0">file:native_name_encoding/0</seealso> - returns <c>latin1</c>, no translation of path names will be - done.</p> + <p>If <seealso marker="kernel:file#native_name_encoding/0"> + 
<c>file:native_name_encoding/0</c></seealso> + returns <c>latin1</c>, no translation of path names is done.</p> </section> <section> - <title>OTHER STORAGE MEDIA</title> - <p>The <c>erl_ftp</c> module normally accesses the tar-file on disk using the <seealso marker="kernel:file">file module</seealso>. When other needs arise, there is a way to define your own low-level Erlang functions to perform the writing and reading on the storage media. See <seealso marker="#init/3">init/3</seealso> for usage.</p> - <p>An example of this is the sftp support in <seealso marker="ssh:ssh_sftp#open_tar/3">ssh_sftp:open_tar/3</seealso>. That function opens a tar file on a remote machine using an sftp channel.</p> + <title>Other Storage Media</title> + <p>The <seealso marker="inets:ftp"><c>ftp</c></seealso> + module (Inets) normally accesses the tar file on disk using + the <seealso marker="kernel:file"><c>file</c></seealso> module. + When other needs arise, you can define your own low-level Erlang + functions to perform the writing and reading on the storage media; + use function <seealso marker="#init/3"><c>init/3</c></seealso>.</p> + + <p>An example of this is the SFTP support in + <seealso marker="ssh:ssh_sftp#open_tar/3"> + <c>ssh_sftp:open_tar/3</c></seealso>. This function opens a tar file + on a remote machine using an SFTP channel.</p> </section> <section> - <title>LIMITATIONS</title> - <p>For maximum compatibility, it is safe to archive files with names - up to 100 characters in length. 
Such tar files can generally be - extracted by any <c>tar</c> program.</p> - <p>If filenames exceed 100 characters in length, the resulting tar - file can only be correctly extracted by a POSIX-compatible <c>tar</c> - program (such as Solaris <c>tar</c>), not by GNU tar.</p> - <p>File have longer names than 256 bytes cannot be stored at all.</p> - <p>The filename of the file a symbolic link points is always limited - to 100 characters.</p> + <title>Limitations</title> + <list type="bulleted"> + <item> + <p>For maximum compatibility, it is safe to archive files with names + up to 100 characters in length. Such tar files can generally be + extracted by any <c>tar</c> program.</p> + </item> + <item> + <p>For filenames exceeding 100 characters in length, the resulting tar + file can only be correctly extracted by a POSIX-compatible <c>tar</c> + program (such as Solaris <c>tar</c> or a modern GNU <c>tar</c>).</p> + </item> + <item> + <p>Files with longer names than 256 bytes cannot be stored.</p> + </item> + <item> + <p>The file name a symbolic link points is always limited + to 100 characters.</p> + </item> + </list> </section> + <funcs> <func> <name>add(TarDescriptor, Filename, Options) -> RetValue</name> - <fsummary>Add a file to an open tar file</fsummary> + <fsummary>Add a file to an open tar file.</fsummary> <type> <v>TarDescriptor = term()</v> <v>Filename = filename()</v> <v>Options = [Option]</v> <v>Option = dereference|verbose|{chunks,ChunkSize}</v> - <v>ChunkSize = positive_integer()</v> + <v>ChunkSize = positive_integer()</v> <v>RetValue = ok|{error,{Filename,Reason}}</v> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="add"></marker><c>add/3</c> function adds - a file to a tar file that has been opened for writing by - <seealso marker="#open">open/1</seealso>.</p> + <p>Adds a file to a tar file that has been opened for writing by + <seealso marker="#open/2"><c>open/1</c></seealso>.</p> + <p>Options:</p> <taglist> <tag><c>dereference</c></tag> 
<item> - <p>By default, symbolic links will be stored as symbolic links - in the tar file. Use the <c>dereference</c> option to override the - default and store the file that the symbolic link points to into - the tar file.</p> + <p>By default, symbolic links are stored as symbolic links + in the tar file. To override the default and store the file + that the symbolic link points to into the tar file, use + option <c>dereference</c>.</p> </item> <tag><c>verbose</c></tag> <item> - <p>Print an informational message about the file being added.</p> + <p>Prints an informational message about the added file.</p> + </item> + <tag><c>{chunks,ChunkSize}</c></tag> + <item> + <p>Reads data in parts from the file. This is intended for + memory-limited machines that, for example, builds a tar file + on a remote machine over SFTP, see + <seealso marker="ssh:ssh_sftp#open_tar/3"> + <c>ssh_sftp:open_tar/3</c></seealso>.</p> </item> - <tag><c>{chunks,ChunkSize}</c></tag> - <item> - <p>Read data in parts from the file. This is intended for memory-limited - machines that for example builds a tar file on a remote machine over - <seealso marker="ssh:ssh_sftp#open_tar/3">sftp</seealso>.</p> - </item> </taglist> </desc> </func> + <func> - <name>add(TarDescriptor, FilenameOrBin, NameInArchive, Options) -> RetValue </name> - <fsummary>Add a file to an open tar file</fsummary> + <name>add(TarDescriptor, FilenameOrBin, NameInArchive, Options) -> + RetValue </name> + <fsummary>Add a file to an open tar file.</fsummary> <type> <v>TarDescriptor = term()</v> <v>FilenameOrBin = filename()|binary()</v> @@ -150,53 +179,55 @@ <v>Reason = term()</v> </type> <desc> - <p>The <c>add/4</c> function adds a file to a tar file - that has been opened for writing by - <seealso marker="#open">open/1</seealso>. It accepts the same - options as <seealso marker="#add">add/3</seealso>.</p> - <p><c>NameInArchive</c> is the name under which the file will - be stored in the tar file. 
That is the name that the file will - get when it will be extracted from the tar file.</p> + <p>Adds a file to a tar file that has been opened for writing by + <seealso marker="#open/2"><c>open/2</c></seealso>. This function + accepts the same options as + <seealso marker="#add/3"><c>add/3</c></seealso>.</p> + <p><c>NameInArchive</c> is the name under which the file becomes + stored in the tar file. The file gets this name when it is + extracted from the tar file.</p> </desc> </func> + <func> <name>close(TarDescriptor)</name> - <fsummary>Close an open tar file</fsummary> + <fsummary>Close an open tar file.</fsummary> <type> <v>TarDescriptor = term()</v> </type> <desc> - <p>The <marker id="close"></marker><c>close/1</c> function - closes a tar file - opened by <seealso marker="#open">open/1</seealso>.</p> + <p>Closes a tar file + opened by <seealso marker="#open/2"><c>open/2</c></seealso>.</p> </desc> </func> + <func> <name>create(Name, FileList) ->RetValue </name> - <fsummary>Create a tar archive</fsummary> + <fsummary>Create a tar archive.</fsummary> <type> <v>Name = filename()</v> - <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, Filename}]</v> + <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, + Filename}]</v> <v>Filename = filename()</v> <v>NameInArchive = filename()</v> <v>RetValue = ok|{error,{Name,Reason}}</v> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="create_2"></marker><c>create/2</c> function - creates a tar file and - archives the files whose names are given in <c>FileList</c> into it. - The files may either be read from disk or given as - binaries.</p> + <p>Creates a tar file and archives the files whose names are specified + in <c>FileList</c> into it. 
The files can either be read from disk + or be specified as binaries.</p> </desc> </func> + <func> <name>create(Name, FileList, OptionList)</name> - <fsummary>Create a tar archive with options</fsummary> + <fsummary>Create a tar archive with options.</fsummary> <type> <v>Name = filename()</v> - <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, Filename}]</v> - <v>Filename = filename()</v> + <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, + Filename}]</v> + <v>Filename = filename()</v> <v>NameInArchive = filename()</v> <v>OptionList = [Option]</v> <v>Option = compressed|cooked|dereference|verbose</v> @@ -204,68 +235,66 @@ <v>Reason = term()</v> </type> <desc> - <p>The <marker id="create_3"></marker><c>create/3</c> function - creates a tar file and archives the files whose names are given - in <c>FileList</c> into it. The files may either be read from - disk or given as binaries.</p> - <p>The options in <c>OptionList</c> modify the defaults as follows. - </p> + <p>Creates a tar file and archives the files whose names are specified + in <c>FileList</c> into it. The files can either be read from disk + or be specified as binaries.</p> + <p>The options in <c>OptionList</c> modify the defaults as follows:</p> <taglist> <tag><c>compressed</c></tag> <item> - <p>The entire tar file will be compressed, as if it has + <p>The entire tar file is compressed, as if it has been run through the <c>gzip</c> program. To abide to the - convention that a compressed tar file should end in "<c>.tar.gz</c>" or - "<c>.tgz</c>", you'll need to add the appropriate extension yourself.</p> + convention that a compressed tar file is to end in + "<c>.tar.gz</c>" or "<c>.tgz</c>", add the appropriate + extension.</p> </item> <tag><c>cooked</c></tag> <item> - <p>By default, the <c>open/2</c> function will open the tar file - in <c>raw</c> mode, which is faster but does not allow a remote (erlang) - file server to be used. 
Adding <c>cooked</c> to the mode list will - override the default and open the tar file without the <c>raw</c> - option.</p> + <p>By default, function <c>open/2</c> opens the tar file in + <c>raw</c> mode, which is faster but does not allow a remote + (Erlang) file server to be used. Adding <c>cooked</c> to the + mode list overrides the default and opens the tar file without + option <c>raw</c>.</p> </item> <tag><c>dereference</c></tag> <item> - <p>By default, symbolic links will be stored as symbolic links - in the tar file. Use the <c>dereference</c> option to override the - default and store the file that the symbolic link points to into - the tar file.</p> + <p>By default, symbolic links are stored as symbolic links in + the tar file. To override the default and store the file that + the symbolic link points to into the tar file, use + option <c>dereference</c>.</p> </item> <tag><c>verbose</c></tag> <item> - <p>Print an informational message about each file being added.</p> + <p>Prints an informational message about each added file.</p> </item> </taglist> </desc> </func> + <func> <name>extract(Name) -> RetValue</name> - <fsummary>Extract all files from a tar file</fsummary> + <fsummary>Extract all files from a tar file.</fsummary> <type> <v>Name = filename()</v> <v>RetValue = ok|{error,{Name,Reason}}</v> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="extract_1"></marker><c>extract/1</c> function - extracts all files from a tar archive.</p> - <p>If the <c>Name</c> argument is given as "<c>{binary,Binary}</c>", - the contents of the binary is assumed to be a tar archive. - </p> - <p>If the <c>Name</c> argument is given as "<c>{file,Fd}</c>", - <c>Fd</c> is assumed to be a file descriptor returned from - the <c>file:open/2</c> function. 
- </p> - <p>Otherwise, <c>Name</c> should be a filename.</p> + <p>Extracts all files from a tar archive.</p> + <p>If argument <c>Name</c> is specified as <c>{binary,Binary}</c>, + the contents of the binary is assumed to be a tar archive.</p> + <p>If argument <c>Name</c> is specified as <c>{file,Fd}</c>, + <c>Fd</c> is assumed to be a file descriptor returned from function + <c>file:open/2</c>.</p> + <p>Otherwise, <c>Name</c> is to be a filename.</p> </desc> </func> + <func> <name>extract(Name, OptionList)</name> - <fsummary>Extract files from a tar file</fsummary> + <fsummary>Extract files from a tar file.</fsummary> <type> - <v>Name = filename() | {binary,Binary} | {file,Fd} </v> + <v>Name = filename() | {binary,Binary} | {file,Fd}</v> <v>Binary = binary()</v> <v>Fd = file_descriptor()</v> <v>OptionList = [Option]</v> @@ -278,272 +307,263 @@ <v>Reason = term()</v> </type> <desc> - <p>The <marker id="extract_2"></marker><c>extract/2</c> function - extracts files from a tar archive.</p> - <p>If the <c>Name</c> argument is given as "<c>{binary,Binary}</c>", - the contents of the binary is assumed to be a tar archive. - </p> - <p>If the <c>Name</c> argument is given as "<c>{file,Fd}</c>", - <c>Fd</c> is assumed to be a file descriptor returned from - the <c>file:open/2</c> function. - </p> - <p>Otherwise, <c>Name</c> should be a filename. - </p> + <p>Extracts files from a tar archive.</p> + <p>If argument <c>Name</c> is specified as <c>{binary,Binary}</c>, + the contents of the binary is assumed to be a tar archive.</p> + <p>If argument <c>Name</c> is specified as <c>{file,Fd}</c>, + <c>Fd</c> is assumed to be a file descriptor returned from function + <c>file:open/2</c>.</p> + <p>Otherwise, <c>Name</c> is to be a filename.</p> <p>The following options modify the defaults for the extraction as - follows.</p> + follows:</p> <taglist> <tag><c>{cwd,Cwd}</c></tag> <item> - <p>Files with relative filenames will by default be extracted - to the current working directory. 
- Given the <c>{cwd,Cwd}</c> option, the <c>extract/2</c> function - will extract into the directory <c>Cwd</c> instead of to the current - working directory.</p> + <p>Files with relative filenames are by default extracted + to the current working directory. With this option, files are + instead extracted into directory <c>Cwd</c>.</p> </item> <tag><c>{files,FileList}</c></tag> <item> - <p>By default, all files will be extracted from the tar file. - Given the <c>{files,Files}</c> option, the <c>extract/2</c> function - will only extract the files whose names are included in <c>FileList</c>.</p> + <p>By default, all files are extracted from the tar file. With + this option, only those files are extracted whose names are + included in <c>FileList</c>.</p> </item> <tag><c>compressed</c></tag> <item> - <p>Given the <c>compressed</c> option, the <c>extract/2</c> - function will uncompress the file while extracting - If the tar file is not actually compressed, the <c>compressed</c> - will effectively be ignored.</p> + <p>With this option, the file is uncompressed while extracting. + If the tar file is not compressed, this option is ignored.</p> </item> <tag><c>cooked</c></tag> <item> - <p>By default, the <c>open/2</c> function will open the tar file - in <c>raw</c> mode, which is faster but does not allow a remote (erlang) - file server to be used. Adding <c>cooked</c> to the mode list will - override the default and open the tar file without the <c>raw</c> - option.</p> + <p>By default, function <c>open/2</c> function opens the tar file + in <c>raw</c> mode, which is faster but does not allow a remote + (Erlang) file server to be used. 
Adding <c>cooked</c> to the mode + list overrides the default and opens the tar file without option + <c>raw</c>.</p> </item> <tag><c>memory</c></tag> <item> - <p>Instead of extracting to a directory, the memory option will - give the result as a list of tuples {Filename, Binary}, where - Binary is a binary containing the extracted data of the file named - Filename in the tar file.</p> + <p>Instead of extracting to a directory, this option gives the + result as a list of tuples <c>{Filename, Binary}</c>, where + <c>Binary</c> is a binary containing the extracted data of the + file named <c>Filename</c> in the tar file.</p> </item> <tag><c>keep_old_files</c></tag> <item> - <p>By default, all existing files with the same name as file in - the tar file will be overwritten - Given the <c>keep_old_files</c> option, the <c>extract/2</c> function - will not overwrite any existing files.</p> + <p>By default, all existing files with the same name as files in + the tar file are overwritten. With this option, existing + files are not overwriten.</p> </item> <tag><c>verbose</c></tag> <item> - <p>Print an informational message as each file is being extracted.</p> + <p>Prints an informational message for each extracted file.</p> </item> </taglist> </desc> </func> + <func> <name>format_error(Reason) -> string()</name> - <fsummary>Convert error term to a readable string</fsummary> + <fsummary>Convert error term to a readable string.</fsummary> <type> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="format_error_1"></marker><c>format_error/1</c> - function converts - an error reason term to a human-readable error message string.</p> + <p>Cconverts an error reason term to a human-readable error message + string.</p> </desc> </func> + <func> - <name>open(Name, OpenModeList) -> RetValue</name> - <fsummary>Open a tar file for writing.</fsummary> + <name>init(UserPrivate, AccessMode, Fun) -> + {ok,TarDescriptor} | {error,Reason}</name> + <fsummary>Create a <c>TarDescriptor</c> 
used in subsequent tar operations + when defining own low-level storage access functions.</fsummary> <type> - <v>Name = filename()</v> - <v>OpenModeList = [OpenMode]</v> - <v>Mode = write|compressed|cooked</v> - <v>RetValue = {ok,TarDescriptor}|{error,{Name,Reason}}</v> - <v>TarDescriptor = term()</v> + <v>UserPrivate = term()</v> + <v>AccessMode = [write] | [read]</v> + <v>Fun when AccessMode is [write] = + fun(write, {UserPrivate,DataToWrite})->...; + (position,{UserPrivate,Position})->...; + (close, UserPrivate)->... end</v> + <v>Fun when AccessMode is [read] = + fun(read2, {UserPrivate,Size})->...; + (position,{UserPrivate,Position})->...; + (close, UserPrivate)->... end</v> + <v>TarDescriptor = term()</v> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="open"></marker><c>open/2</c> function creates - a tar file for writing. - (Any existing file with the same name will be truncated.)</p> - <p>By convention, the name of a tar file should end in "<c>.tar</c>". - To abide to the convention, you'll need to add "<c>.tar</c>" yourself - to the name.</p> - <p>Except for the <c>write</c> atom the following atoms - may be added to <c>OpenModeList</c>:</p> + <p>The <c>Fun</c> is the definition of what to do when the different + storage operations functions are to be called from the higher tar + handling functions (such as <c>add/3</c>, <c>add/4</c>, and + <c>close/1</c>).</p> + <p>The <c>Fun</c> is called when the tar function wants to do a + low-level operation, like writing a block to a file. The <c>Fun</c> + is called as <c>Fun(Op, {UserPrivate,Parameters...})</c>, where + <c>Op</c> is the operation name, <c>UserPrivate</c> is the term + passed as the first argument to <c>init/1</c> and + <c>Parameters...</c> are the data added by the tar function to be + passed down to the storage handling function.</p> + <p>Parameter <c>UserPrivate</c> is typically the result of opening a + low-level structure like a file descriptor or an SFTP channel id. 
+ The different <c>Fun</c> clauses operate on that very term.</p> + <p>The following are the fun clauses parameter lists:</p> <taglist> - <tag><c>compressed</c></tag> + <tag><c>(write, {UserPrivate,DataToWrite})</c></tag> <item> - <p>The entire tar file will be compressed, as if it has - been run through the <c>gzip</c> program. To abide to the - convention that a compressed tar file should end in "<c>.tar.gz</c>" or - "<c>.tgz</c>", you'll need to add the appropriate extension yourself.</p> + <p>Writes term <c>DataToWrite</c> using <c>UserPrivate</c>.</p> </item> - <tag><c>cooked</c></tag> + <tag><c>(close, UserPrivate)</c></tag> + <item> + <p>Closes the access.</p> + </item> + <tag><c>(read2, {UserPrivate,Size})</c></tag> <item> - <p>By default, the <c>open/2</c> function will open the tar file - in <c>raw</c> mode, which is faster but does not allow a remote (erlang) - file server to be used. Adding <c>cooked</c> to the mode list will - override the default and open the tar file without the <c>raw</c> - option.</p> + <p>Reads using <c>UserPrivate</c> but only <c>Size</c> bytes. + Notice that there is only an arity-2 read function, not an arity-1 + function.</p> + </item> + <tag><c>(position,{UserPrivate,Position})</c></tag> + <item> + <p>Sets the position of <c>UserPrivate</c> as defined for files in + <seealso marker="kernel:file#position-2"> + <c>file:position/2</c></seealso></p> </item> </taglist> - <p>Use the <seealso marker="#add">add/3,4</seealso> functions - to add one file at the time into an opened tar file. 
When you are - finished adding files, use the <seealso marker="#close">close</seealso> - function to close the tar file.</p> + <p><em>Example:</em></p> + <p>The following is a complete <c>Fun</c> parameter for reading and + writing on files using the + <seealso marker="kernel:file"><c>file</c></seealso> module:</p> + <code type="none"> +ExampleFun = + fun(write, {Fd,Data}) -> file:write(Fd, Data); + (position, {Fd,Pos}) -> file:position(Fd, Pos); + (read2, {Fd,Size}) -> file:read(Fd, Size); + (close, Fd) -> file:close(Fd) + end</code> + <p>Here <c>Fd</c> was specified to function <c>init/3</c> as:</p> + <code> +{ok,Fd} = file:open(Name, ...). +{ok,TarDesc} = erl_tar:init(Fd, [write], ExampleFun),</code> + <p><c>TarDesc</c> is then used:</p> + <code> +erl_tar:add(TarDesc, SomeValueIwantToAdd, FileNameInTarFile), +..., +erl_tar:close(TarDesc)</code> + <p>When the <c>erl_tar</c> core wants to, for example, write a piece + of <c>Data</c>, it would call + <c>ExampleFun(write, {UserPrivate,Data})</c>.</p> + <note> + <p>This example with the <c>file</c> module operations is + not necessary to use directly, as that is what function + <seealso marker="#open/2"><c>open/2</c></seealso> in principle + does.</p> + </note> <warning> - <p>The <c>TarDescriptor</c> term is not a file descriptor. - You should not rely on the specific contents of the <c>TarDescriptor</c> - term, as it may change in future versions as more features are added - to the <c>erl_tar</c> module.</p> + <p>The <c>TarDescriptor</c> term is not a file descriptor. 
You are + advised not to rely on the specific contents of this term, as it + can change in future Erlang/OTP releases when more features are + added to this module.</p> </warning> </desc> </func> <func> - <name>init(UserPrivate, AccessMode, Fun) -> {ok,TarDescriptor} | {error,Reason} -</name> - <fsummary>Creates a TarDescriptor used in subsequent tar operations when - defining own low-level storage access functions - </fsummary> + <name>open(Name, OpenModeList) -> RetValue</name> + <fsummary>Open a tar file for writing.</fsummary> <type> - <v>UserPrivate = term()</v> - <v>AccessMode = [write] | [read]</v> - <v>Fun when AccessMode is [write] = fun(write, {UserPrivate,DataToWrite})->...; - (position,{UserPrivate,Position})->...; - (close, UserPrivate)->... - end - </v> - <v>Fun when AccessMode is [read] = fun(read2, {UserPrivate,Size})->...; - (position,{UserPrivate,Position})->...; - (close, UserPrivate)->... - end - </v> - <v>TarDescriptor = term()</v> - <v>Reason = term()</v> + <v>Name = filename()</v> + <v>OpenModeList = [OpenMode]</v> + <v>Mode = write|compressed|cooked</v> + <v>RetValue = {ok,TarDescriptor}|{error,{Name,Reason}}</v> + <v>TarDescriptor = term()</v> + <v>Reason = term()</v> </type> <desc> - <p>The <c>Fun</c> is the definition of what to do when the different - storage operations functions are to be called from the higher tar - handling functions (<c>add/3</c>, <c>add/4</c>, <c>close/1</c>...). - </p> - <p>The <c>Fun</c> will be called when the tar function wants to do - a low-level operation, like writing a block to a file. The Fun is called - as <c>Fun(Op,{UserPrivate,Parameters...})</c> where <c>Op</c> is the operation name, - <c>UserPrivate</c> is the term passed as the first argument to <c>init/1</c> and - <c>Parameters...</c> are the data added by the tar function to be passed down to - the storage handling function. 
- </p> - <p>The parameter <c>UserPrivate</c> is typically the result of opening a low level - structure like a file descriptor, a sftp channel id or such. The different <c>Fun</c> - clauses operates on that very term. - </p> - <p>The fun clauses parameter lists are:</p> - <taglist> - <tag><c>(write, {UserPrivate,DataToWrite})</c></tag> - <item>Write the term <c>DataToWrite</c> using <c>UserPrivate</c></item> - <tag><c>(close, UserPrivate)</c></tag> - <item>Close the access.</item> - <tag><c>(read2, {UserPrivate,Size})</c></tag> - <item>Read using <c>UserPrivate</c> but only <c>Size</c> bytes. Note that there is - only an arity-2 read function, not an arity-1 - </item> - <tag><c> (position,{UserPrivate,Position})</c></tag> - <item>Sets the position of <c>UserPrivate</c> as defined for files in <seealso marker="kernel:file#position-2">file:position/2</seealso></item> - <tag><c></c></tag> - <item></item> - </taglist> - <p>A complete <c>Fun</c> parameter for reading and writing on files using the - <seealso marker="kernel:file">file module</seealso> could be: - </p> - <code type="none"> - ExampleFun = - fun(write, {Fd,Data}) -> file:write(Fd, Data); - (position, {Fd,Pos}) -> file:position(Fd, Pos); - (read2, {Fd,Size}) -> file:read(Fd,Size); - (close, Fd) -> file:close(Fd) - end - </code> - <p>where <c>Fd</c> was given to the <c>init/3</c> function as:</p> - <code> - {ok,Fd} = file:open(Name,...). - {ok,TarDesc} = erl_tar:init(Fd, [write], ExampleFun), - </code> - <p>The <c>TarDesc</c> is then used:</p> - <code> - erl_tar:add(TarDesc, SomeValueIwantToAdd, FileNameInTarFile), - ...., - erl_tar:close(TarDesc) - </code> - <p>When the erl_tar core wants to e.g. write a piece of Data, it would call - <c>ExampleFun(write,{UserPrivate,Data})</c>. - </p> - <note> - <p>The example above with <c>file</c> module operations is not necessary to - use directly since that is what the <seealso marker="#open">open</seealso> function - in principle does. 
- </p> - </note> + <p>Creates a tar file for writing (any existing file with the same + name is truncated).</p> + <p>By convention, the name of a tar file is to end in "<c>.tar</c>". + To abide to the convention, add "<c>.tar</c>" to the name.</p> + <p>Except for the <c>write</c> atom, the following atoms + can be added to <c>OpenModeList</c>:</p> + <taglist> + <tag><c>compressed</c></tag> + <item> + <p>The entire tar file is compressed, as if it has been run + through the <c>gzip</c> program. To abide to the convention + that a compressed tar file is to end in "<c>.tar.gz</c>" or + "<c>.tgz</c>", add the appropriate extension.</p> + </item> + <tag><c>cooked</c></tag> + <item> + <p>By default, the tar file is opened in <c>raw</c> mode, which is + faster but does not allow a remote (Erlang) file server to be + used. Adding <c>cooked</c> to the mode list overrides the + default and opens the tar file without option <c>raw</c>.</p> + </item> + </taglist> + <p>To add one file at the time into an opened tar file, use function + <seealso marker="#add/3"><c>add/3,4</c></seealso>. When you are + finished adding files, use function <seealso marker="#close/1"> + <c>close/1</c></seealso> to close the tar file.</p> <warning> - <p>The <c>TarDescriptor</c> term is not a file descriptor. - You should not rely on the specific contents of the <c>TarDescriptor</c> - term, as it may change in future versions as more features are added - to the <c>erl_tar</c> module.</p> + <p>The <c>TarDescriptor</c> term is not a file descriptor. 
You are + advised not to rely on the specific contents of this term, as it + can change in future Erlang/OTP releases when more features are + added to this module..</p> </warning> </desc> </func> <func> <name>table(Name) -> RetValue</name> - <fsummary>Retrieve the name of all files in a tar file</fsummary> + <fsummary>Retrieve the name of all files in a tar file.</fsummary> <type> <v>Name = filename()</v> <v>RetValue = {ok,[string()]}|{error,{Name,Reason}}</v> <v>Reason = term()</v> </type> <desc> - <p>The <marker id="table_1"></marker><c>table/1</c> function - retrieves the names of all files in the tar file <c>Name</c>.</p> + <p>Retrieves the names of all files in the tar file <c>Name</c>.</p> </desc> </func> + <func> <name>table(Name, Options)</name> - <fsummary>Retrieve name and information of all files in a tar file</fsummary> + <fsummary>Retrieve name and information of all files in a tar file. + </fsummary> <type> <v>Name = filename()</v> </type> <desc> - <p>The <marker id="table_2"></marker><c>table/2</c> function - retrieves the names of all files in the tar file <c>Name</c>.</p> + <p>Retrieves the names of all files in the tar file <c>Name</c>.</p> </desc> </func> + <func> <name>t(Name)</name> - <fsummary>Print the name of each file in a tar file</fsummary> + <fsummary>Print the name of each file in a tar file.</fsummary> <type> <v>Name = filename()</v> </type> <desc> - <p>The <marker id="t_1"></marker><c>t/1</c> function prints the names - of all files in the tar file <c>Name</c> to the Erlang shell. - (Similar to "<c>tar t</c>".)</p> + <p>Prints the names of all files in the tar file <c>Name</c> to the + Erlang shell (similar to "<c>tar t</c>").</p> </desc> </func> + <func> <name>tt(Name)</name> - <fsummary>Print name and information for each file in a tar file</fsummary> + <fsummary>Print name and information for each file in a tar file. 
+ </fsummary> <type> <v>Name = filename()</v> </type> <desc> - <p>The <marker id="tt_1"></marker><c>tt/1</c> function prints - names and - information about all files in the tar file <c>Name</c> to - the Erlang shell. (Similar to "<c>tar tv</c>".)</p> + <p>Prints names and information about all files in the tar file + <c>Name</c> to the Erlang shell (similar to "<c>tar tv</c>").</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/ets.xml b/lib/stdlib/doc/src/ets.xml index 2d69c201bc..5f5d2b7f36 100644 --- a/lib/stdlib/doc/src/ets.xml +++ b/lib/stdlib/doc/src/ets.xml @@ -29,102 +29,131 @@ <rev></rev> </header> <module>ets</module> - <modulesummary>Built-In Term Storage</modulesummary> + <modulesummary>Built-in term storage.</modulesummary> <description> <p>This module is an interface to the Erlang built-in term storage BIFs. These provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data. (In the case of <c>ordered_set</c>, see below, access time is proportional to the logarithm of the number of - objects stored).</p> + stored objects.)</p> + <p>Data is organized as a set of dynamic tables, which can store tuples. Each table is created by a process. When the process terminates, the table is automatically destroyed. Every table has access rights set at creation.</p> + <p>Tables are divided into four different types, <c>set</c>, - <c>ordered_set</c>, <c>bag</c> and <c>duplicate_bag</c>. + <c>ordered_set</c>, <c>bag</c>, and <c>duplicate_bag</c>. A <c>set</c> or <c>ordered_set</c> table can only have one object - associated with each key. A <c>bag</c> or <c>duplicate_bag</c> can + associated with each key. A <c>bag</c> or <c>duplicate_bag</c> table can have many objects associated with each key.</p> + <p>The number of tables stored at one Erlang node is limited. - The current default limit is approximately 1400 tables. 
The upper - limit can be increased by setting the environment variable + The current default limit is about 1400 tables. The upper + limit can be increased by setting environment variable <c>ERL_MAX_ETS_TABLES</c> before starting the Erlang runtime - system (i.e. with the <c>-env</c> option to - <c>erl</c>/<c>werl</c>). The actual limit may be slightly higher + system (that is, with option <c>-env</c> to + <c>erl</c>/<c>werl</c>). The actual limit can be slightly higher than the one specified, but never lower.</p> - <p>Note that there is no automatic garbage collection for tables. + + <p>Notice that there is no automatic garbage collection for tables. Even if there are no references to a table from any process, it - will not automatically be destroyed unless the owner process - terminates. It can be destroyed explicitly by using - <c>delete/1</c>. The default owner is the process that created the - table. Table ownership can be transferred at process termination - by using the <seealso marker="#heir">heir</seealso> option or explicitly - by calling <seealso marker="#give_away/3">give_away/3</seealso>.</p> + is not automatically destroyed unless the owner process + terminates. To destroy a table explicitly, use function + <seealso marker="#delete/1"><c>delete/1</c></seealso>. + The default owner is the process that created the + table. 
To transfer table ownership at process termination, use + option <seealso marker="#heir"><c>heir</c></seealso> or call + <seealso marker="#give_away/3"><c>give_away/3</c></seealso>.</p> + <p>Some implementation details:</p> + <list type="bulleted"> - <item>In the current implementation, every object insert and - look-up operation results in a copy of the object.</item> - <item><c>'$end_of_table'</c> should not be used as a key since - this atom is used to mark the end of the table when using - <c>first</c>/<c>next</c>.</item> + <item><p>In the current implementation, every object insert and + look-up operation results in a copy of the object.</p></item> + <item><p><c>'$end_of_table'</c> is not to be used as a key, as + this atom is used to mark the end of the table when using functions + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso>.</p></item> </list> - <p>Also worth noting is the subtle difference between + + <p>Notice the subtle difference between <em>matching</em> and <em>comparing equal</em>, which is - demonstrated by the different table types <c>set</c> and - <c>ordered_set</c>. Two Erlang terms <c>match</c> if they are of - the same type and have the same value, so that <c>1</c> matches - <c>1</c>, but not <c>1.0</c> (as <c>1.0</c> is a <c>float()</c> - and not an <c>integer()</c>). Two Erlang terms <em>compare equal</em> if they either are of the same type and value, or if - both are numeric types and extend to the same value, so that - <c>1</c> compares equal to both <c>1</c> and <c>1.0</c>. 
The - <c>ordered_set</c> works on the <em>Erlang term order</em> and - there is no defined order between an <c>integer()</c> and a - <c>float()</c> that extends to the same value, hence the key - <c>1</c> and the key <c>1.0</c> are regarded as equal in an - <c>ordered_set</c> table.</p> + demonstrated by table types <c>set</c> and <c>ordered_set</c>:</p> + + <list type="bulleted"> + <item> + <p>Two Erlang terms <c>match</c> if they are of + the same type and have the same value, so that <c>1</c> matches + <c>1</c>, but not <c>1.0</c> (as <c>1.0</c> is a <c>float()</c> + and not an <c>integer()</c>).</p> + </item> + <item> + <p>Two Erlang terms <em>compare equal</em> + if they either are of the same type and value, or if + both are numeric types and extend to the same value, so that + <c>1</c> compares equal to both <c>1</c> and <c>1.0</c>.</p> + </item> + <item> + <p>The <c>ordered_set</c> works on the <em>Erlang term order</em> and + no defined order exists between an <c>integer()</c> and a + <c>float()</c> that extends to the same value. Hence the key + <c>1</c> and the key <c>1.0</c> are regarded as equal in an + <c>ordered_set</c> table.</p> + </item> + </list> </description> + <section> <title>Failure</title> - <p>In general, the functions below will exit with reason - <c>badarg</c> if any argument is of the wrong format, if the - table identifier is invalid or if the operation is denied due to + <p>The functions in this module exits with reason + <c>badarg</c> if any argument has the wrong format, if the + table identifier is invalid, or if the operation is denied because of table access rights (<seealso marker="#protected">protected</seealso> or <seealso marker="#private">private</seealso>).</p> </section> + <section><marker id="concurrency"></marker> <title>Concurrency</title> <p>This module provides some limited support for concurrent access. All updates to single objects are guaranteed to be both <em>atomic</em> - and <em>isolated</em>. 
This means that an updating operation towards - a single object will either succeed or fail completely without any - effect at all (atomicy). - Nor can any intermediate results of the update be seen by other - processes (isolation). Some functions that update several objects - state that they even guarantee atomicy and isolation for the entire + and <em>isolated</em>. This means that an updating operation to + a single object either succeeds or fails completely without any + effect (atomicity) and that + no intermediate results of the update can be seen by other + processes (isolation). Some functions that update many objects + state that they even guarantee atomicity and isolation for the entire operation. In database terms the isolation level can be seen as - "serializable", as if all isolated operations were carried out serially, + "serializable", as if all isolated operations are carried out serially, one after the other in a strict order.</p> - <p>No other support is available within ETS that would guarantee - consistency between objects. However, the <c>safe_fixtable/2</c> - function can be used to guarantee that a sequence of - <c>first/1</c> and <c>next/2</c> calls will traverse the table - without errors and that each existing object in the table is visited - exactly once, even if another process (or the same process) + + <p>No other support is available within this module that would guarantee + consistency between objects. However, function + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso> + can be used to guarantee that a sequence of + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso> calls traverse the + table without errors and that each existing object in the table is + visited exactly once, even if another (or the same) process simultaneously deletes or inserts objects into the table. 
- Nothing more is guaranteed; in particular objects that are inserted - or deleted during such a traversal may be visited once or not at all. - Functions that internally traverse over a table, like <c>select</c> - and <c>match</c>, will give the same guarantee as <c>safe_fixtable</c>.</p> + Nothing else is guaranteed; in particular objects that are inserted + or deleted during such a traversal can be visited once or not at all. + Functions that internally traverse over a table, like + <seealso marker="#select/1"><c>select</c></seealso> and + <seealso marker="#match/1"><c>match</c></seealso>, + give the same guarantee as + <seealso marker="#safe_fixtable/2"><c>safe_fixtable</c></seealso>.</p> </section> + <section> <marker id="match_spec"></marker> <title>Match Specifications</title> - <p>Some of the functions uses a <em>match specification</em>, - match_spec. A brief explanation is given in - <seealso marker="#select/2">select/2</seealso>. For a detailed - description, see the chapter "Match specifications in Erlang" in - <em>ERTS User's Guide</em>.</p> + <p>Some of the functions use a <em>match specification</em>, + <c>match_spec</c>. For a brief explanation, see + <seealso marker="#select/2"><c>select/2</c></seealso>. 
For a detailed + description, see section <seealso marker="erts:match_spec"> + Match Specifications in Erlang</seealso> in ERTS User's Guide.</p> </section> <datatypes> @@ -134,11 +163,9 @@ <datatype> <name>continuation()</name> <desc> - <p><marker id="type-continuation"/> - Opaque continuation used by <seealso marker="#select/1"> + <p>Opaque continuation used by <seealso marker="#select/1"> <c>select/1,3</c></seealso>, <seealso marker="#select_reverse/1"> - <c>select_reverse/1,3</c></seealso>, <seealso - marker="#match/1"> + <c>select_reverse/1,3</c></seealso>, <seealso marker="#match/1"> <c>match/1,3</c></seealso>, and <seealso marker="#match_object/1"> <c>match_object/1,3</c></seealso>.</p> </desc> @@ -159,26 +186,30 @@ </datatype> <datatype> <name name="tid"/> - <desc><p>A table identifier, as returned by new/2.</p></desc> + <desc><p>A table identifier, as returned by + <seealso marker="#new/2"><c>new/2</c></seealso>.</p></desc> </datatype> <datatype> <name name="type"/> </datatype> </datatypes> + <funcs> <func> <name name="all" arity="0"/> <fsummary>Return a list of all ETS tables.</fsummary> <desc> <p>Returns a list of all tables at the node. Named tables are - given by their names, unnamed tables are given by their + specified by their names, unnamed tables are specified by their table identifiers.</p> - <p>There is no guarantee of consistency in the returned list. Tables created - or deleted by other processes "during" the ets:all() call may or may - not be included in the list. Only tables created/deleted <em>before</em> - ets:all() is called are guaranteed to be included/excluded.</p> + <p>There is no guarantee of consistency in the returned list. Tables + created or deleted by other processes "during" the <c>ets:all()</c> + call either are or are not included in the list. 
Only tables + created/deleted <em>before</em> <c>ets:all()</c> is called are + guaranteed to be included/excluded.</p> </desc> </func> + <func> <name name="delete" arity="1"/> <fsummary>Delete an entire ETS table.</fsummary> @@ -186,175 +217,187 @@ <p>Deletes the entire table <c><anno>Tab</anno></c>.</p> </desc> </func> + <func> <name name="delete" arity="2"/> - <fsummary>Delete all objects with a given key from an ETS table.</fsummary> + <fsummary>Delete all objects with a specified key from an ETS + table.</fsummary> <desc> - <p>Deletes all objects with the key <c><anno>Key</anno></c> from the table + <p>Deletes all objects with key <c><anno>Key</anno></c> from table <c><anno>Tab</anno></c>.</p> </desc> </func> + <func> <name name="delete_all_objects" arity="1"/> <fsummary>Delete all objects in an ETS table.</fsummary> <desc> <p>Delete all objects in the ETS table <c><anno>Tab</anno></c>. - The operation is guaranteed to be - <seealso marker="#concurrency">atomic and isolated</seealso>.</p> + The operation is guaranteed to be + <seealso marker="#concurrency">atomic and isolated</seealso>.</p> </desc> </func> + <func> <name name="delete_object" arity="2"/> <fsummary>Deletes a specific from an ETS table.</fsummary> <desc> - <p>Delete the exact object <c><anno>Object</anno></c> from the ETS table, + <p>Delete the exact object <c><anno>Object</anno></c> from the + ETS table, leaving objects with the same key but other differences - (useful for type <c>bag</c>). In a <c>duplicate_bag</c>, all - instances of the object will be deleted.</p> + (useful for type <c>bag</c>). 
In a <c>duplicate_bag</c> table, all + instances of the object are deleted.</p> </desc> </func> + <func> <name name="file2tab" arity="1"/> <fsummary>Read an ETS table from a file.</fsummary> <desc> - <p>Reads a file produced by <seealso - marker="#tab2file/2">tab2file/2</seealso> or - <seealso marker="#tab2file/3">tab2file/3</seealso> and creates the - corresponding table <c><anno>Tab</anno></c>.</p> - <p>Equivalent to <c>file2tab(<anno>Filename</anno>, [])</c>.</p> + <p>Reads a file produced by <seealso marker="#tab2file/2"> + <c>tab2file/2</c></seealso> or + <seealso marker="#tab2file/3"><c>tab2file/3</c></seealso> and + creates the corresponding table <c><anno>Tab</anno></c>.</p> + <p>Equivalent to <c>file2tab(<anno>Filename</anno>, [])</c>.</p> </desc> </func> + <func> <name name="file2tab" arity="2"/> <fsummary>Read an ETS table from a file.</fsummary> <desc> - <p>Reads a file produced by <seealso - marker="#tab2file/2">tab2file/2</seealso> or - <seealso marker="#tab2file/3">tab2file/3</seealso> and creates the + <p>Reads a file produced by <seealso marker="#tab2file/2"> + <c>tab2file/2</c></seealso> or <seealso marker="#tab2file/3"> + <c>tab2file/3</c></seealso> and creates the corresponding table <c><anno>Tab</anno></c>.</p> - <p>The currently only supported option is <c>{verify,boolean()}</c>. If - verification is turned on (by means of specifying - <c>{verify,true}</c>), the function utilizes whatever - information is present in the file to assert that the - information is not damaged. How this is done depends on which - <c>extended_info</c> was written using - <seealso marker="#tab2file/3">tab2file/3</seealso>.</p> - <p>If no <c>extended_info</c> is present in the file and - <c>{verify,true}</c> is specified, the number of objects - written is compared to the size of the original table when the - dump was started. This might make verification fail if the - table was - <c>public</c> and objects were added or removed while the - table was dumped to file. 
To avoid this type of problems, - either do not verify files dumped while updated simultaneously - or use the <c>{extended_info, [object_count]}</c> option to - <seealso marker="#tab2file/3">tab2file/3</seealso>, which - extends the information in the file with the number of objects - actually written.</p> - <p>If verification is turned on and the file was written with - the option <c>{extended_info, [md5sum]}</c>, reading the file - is slower and consumes radically more CPU time than - otherwise.</p> + <p>The only supported option is <c>{verify,boolean()}</c>. + If verification is turned on (by specifying <c>{verify,true}</c>), + the function uses whatever information is present in the file to + assert that the information is not damaged. How this is done depends + on which <c>extended_info</c> was written using + <seealso marker="#tab2file/3"><c>tab2file/3</c></seealso>.</p> + <p>If no <c>extended_info</c> is present in the file and + <c>{verify,true}</c> is specified, the number of objects + written is compared to the size of the original table when the + dump was started. This can make verification fail if the table was + <c>public</c> and objects were added or removed while the + table was dumped to file. To avoid this problem, + either do not verify files dumped while updated simultaneously + or use option <c>{extended_info, [object_count]}</c> to + <seealso marker="#tab2file/3"><c>tab2file/3</c></seealso>, which + extends the information in the file with the number of objects + written.</p> + <p>If verification is turned on and the file was written with + option <c>{extended_info, [md5sum]}</c>, reading the file + is slower and consumes radically more CPU time than otherwise.</p> <p><c>{verify,false}</c> is the default.</p> </desc> </func> + <func> <name name="first" arity="1"/> <fsummary>Return the first key in an ETS table.</fsummary> <desc> - <p>Returns the first key <c><anno>Key</anno></c> in the table <c><anno>Tab</anno></c>. 
- If the table is of the <c>ordered_set</c> type, the first key - in Erlang term order will be returned. If the table is of any - other type, the first key according to the table's internal - order will be returned. If the table is empty, - <c>'$end_of_table'</c> will be returned.</p> - <p>Use <c>next/2</c> to find subsequent keys in the table.</p> + <p>Returns the first key <c><anno>Key</anno></c> in table + <c><anno>Tab</anno></c>. For an <c>ordered_set</c> table, the first + key in Erlang term order is returned. For other + table types, the first key according to the internal + order of the table is returned. If the table is empty, + <c>'$end_of_table'</c> is returned.</p> + <p>To find subsequent keys in the table, use + <seealso marker="#next/2"><c>next/2</c></seealso>.</p> </desc> </func> + <func> <name name="foldl" arity="3"/> - <fsummary>Fold a function over an ETS table</fsummary> + <fsummary>Fold a function over an ETS table.</fsummary> <desc> <p><c><anno>Acc0</anno></c> is returned if the table is empty. - This function is similar to <c>lists:foldl/3</c>. The order in - which the elements of the table are traversed is unspecified, - except for tables of type <c>ordered_set</c>, for which they - are traversed first to last.</p> - - <p>If <c><anno>Function</anno></c> inserts objects into the table, or another - process inserts objects into the table, those objects <em>may</em> - (depending on key ordering) be included in the traversal.</p> + This function is similar to + <seealso marker="lists#foldl/3"><c>lists:foldl/3</c></seealso>. 
+ The table elements are traversed is unspecified order, except for + <c>ordered_set</c> tables, where they are traversed first to last.</p> + <p>If <c><anno>Function</anno></c> inserts objects into the table, + or another + process inserts objects into the table, those objects <em>can</em> + (depending on key ordering) be included in the traversal.</p> </desc> </func> + <func> <name name="foldr" arity="3"/> - <fsummary>Fold a function over an ETS table</fsummary> + <fsummary>Fold a function over an ETS table.</fsummary> <desc> <p><c><anno>Acc0</anno></c> is returned if the table is empty. - This function is similar to <c>lists:foldr/3</c>. The order in - which the elements of the table are traversed is unspecified, - except for tables of type <c>ordered_set</c>, for which they - are traversed last to first.</p> - - <p>If <c><anno>Function</anno></c> inserts objects into the table, or another - process inserts objects into the table, those objects <em>may</em> - (depending on key ordering) be included in the traversal.</p> + This function is similar to + <seealso marker="lists#foldr/3"><c>lists:foldr/3</c></seealso>. + The table elements are traversed is unspecified order, except for + <c>ordered_set</c> tables, where they are traversed last to first.</p> + <p>If <c><anno>Function</anno></c> inserts objects into the table, + or another + process inserts objects into the table, those objects <em>can</em> + (depending on key ordering) be included in the traversal.</p> </desc> </func> + <func> <name name="from_dets" arity="2"/> - <fsummary>Fill an ETS table with objects from a Dets table.</fsummary> + <fsummary>Fill an ETS table with objects from a Dets + table.</fsummary> <desc> <p>Fills an already created ETS table with the objects in the - already opened Dets table named <c><anno>DetsTab</anno></c>. 
The existing - objects of the ETS table are kept unless overwritten.</p> - <p>Throws a badarg error if any of the tables does not exist or the - dets table is not open.</p> + already opened Dets table <c><anno>DetsTab</anno></c>. + Existing objects in the ETS table are kept unless + overwritten.</p> + <p>If any of the tables does not exist or the Dets table is + not open, a <c>badarg</c> exception is raised.</p> </desc> </func> + <func> <name name="fun2ms" arity="1"/> - <fsummary>Pseudo function that transforms fun syntax to a match_spec.</fsummary> + <fsummary>Pseudo function that transforms fun syntax to a match + specification.</fsummary> <desc> - <p>Pseudo function that by means of a <c>parse_transform</c> - translates <c><anno>LiteralFun</anno></c> typed as parameter in the - function call to a - <seealso marker="#match_spec">match_spec</seealso>. With - "literal" is meant that the fun needs to textually be written + <p>Pseudo function that by a <c>parse_transform</c> translates + <c><anno>LiteralFun</anno></c> typed as parameter in the function + call to a + <seealso marker="#match_spec">match specification</seealso>. + With "literal" is meant that the fun must textually be written as the parameter of the function, it cannot be held in a - variable which in turn is passed to the function).</p> - <p>The parse transform is implemented in the module - <c>ms_transform</c> and the source <em>must</em> include the + variable that in turn is passed to the function.</p> + <p>The parse transform is provided in the <c>ms_transform</c> + module and the source <em>must</em> include file <c>ms_transform.hrl</c> in STDLIB for this pseudo function to work. Failing to include the hrl file in - the source will result in a runtime error, not a compile - time ditto. The include file is easiest included by adding - the line + the source results in a runtime error, not a compile + time error. 
The include file is easiest included by adding line <c>-include_lib("stdlib/include/ms_transform.hrl").</c> to the source file.</p> <p>The fun is very restricted, it can take only a single parameter (the object to match): a sole variable or a - tuple. It needs to use the <c>is_</c> guard tests. - Language constructs that have no representation - in a match_spec (like <c>if</c>, <c>case</c>, <c>receive</c> - etc) are not allowed.</p> - <p>The return value is the resulting match_spec.</p> - <p>Example:</p> + tuple. It must use the <c>is_</c> guard tests. + Language constructs that have no representation in a match + specification (<c>if</c>, <c>case</c>, <c>receive</c>, + and so on) are not allowed.</p> + <p>The return value is the resulting match specification.</p> + <p><em>Example:</em></p> <pre> 1> <input>ets:fun2ms(fun({M,N}) when N > 3 -> M end).</input> [{{'$1','$2'},[{'>','$2',3}],['$1']}]</pre> - <p>Variables from the environment can be imported, so that this - works:</p> + <p>Variables from the environment can be imported, so that the + following works:</p> <pre> 2> <input>X=3.</input> 3 3> <input>ets:fun2ms(fun({M,N}) when N > X -> M end).</input> [{{'$1','$2'},[{'>','$2',{const,3}}],['$1']}]</pre> - <p>The imported variables will be replaced by match_spec + <p>The imported variables are replaced by match specification <c>const</c> expressions, which is consistent with the - static scoping for Erlang funs. Local or global function - calls can not be in the guard or body of the fun however. - Calls to builtin match_spec functions of course is allowed:</p> + static scoping for Erlang funs. However, local or global function + calls cannot be in the guard or body of the fun. 
Calls to built-in + match specification functions is of course allowed:</p> <pre> 4> <input>ets:fun2ms(fun({M,N}) when N > X, is_atomm(M) -> M end).</input> Error: fun containing local Erlang function calls @@ -362,724 +405,832 @@ Error: fun containing local Erlang function calls {error,transform_error} 5> <input>ets:fun2ms(fun({M,N}) when N > X, is_atom(M) -> M end).</input> [{{'$1','$2'},[{'>','$2',{const,3}},{is_atom,'$1'}],['$1']}]</pre> - <p>As can be seen by the example, the function can be called - from the shell too. The fun needs to be literally in the call - when used from the shell as well. Other means than the - parse_transform are used in the shell case, but more or less - the same restrictions apply (the exception being records, - as they are not handled by the shell).</p> + <p>As shown by the example, the function can be called + from the shell also. The fun must be literally in the call + when used from the shell as well.</p> <warning> - <p>If the parse_transform is not applied to a module which - calls this pseudo function, the call will fail in runtime - (with a <c>badarg</c>). The module <c>ets</c> actually - exports a function with this name, but it should never - really be called except for when using the function in the + <p>If the <c>parse_transform</c> is not applied to a module that + calls this pseudo function, the call fails in runtime + (with a <c>badarg</c>). The <c>ets</c> module + exports a function with this name, but it is never to + be called except when using the function in the shell. 
If the <c>parse_transform</c> is properly applied by - including the <c>ms_transform.hrl</c> header file, compiled - code will never call the function, but the function call is - replaced by a literal match_spec.</p> + including header file <c>ms_transform.hrl</c>, compiled + code never calls the function, but the function call is + replaced by a literal match specification.</p> </warning> - <p>For more information, see - <seealso marker="ms_transform#top">ms_transform(3)</seealso>.</p> + <p>For more information, see <seealso marker="ms_transform#top"> + <c>ms_transform(3)</c></seealso>.</p> </desc> </func> + <func> <name name="give_away" arity="3"/> <fsummary>Change owner of a table.</fsummary> <desc> - <p>Make process <c><anno>Pid</anno></c> the new owner of table <c><anno>Tab</anno></c>. - If successful, the message - <c>{'ETS-TRANSFER',<anno>Tab</anno>,FromPid,<anno>GiftData</anno>}</c> will be sent - to the new owner.</p> - <p>The process <c><anno>Pid</anno></c> must be alive, local and not already the - owner of the table. The calling process must be the table owner.</p> - <p>Note that <c>give_away</c> does not at all affect the - <seealso marker="#heir">heir</seealso> option of the table. A table - owner can for example set the <c>heir</c> to itself, give the table - away and then get it back in case the receiver terminates.</p> + <p>Make process <c><anno>Pid</anno></c> the new owner of table + <c><anno>Tab</anno></c>. If successful, message + <c>{'ETS-TRANSFER',<anno>Tab</anno>,FromPid,<anno>GiftData</anno>}</c> + is sent to the new owner.</p> + <p>The process <c><anno>Pid</anno></c> must be alive, local, and not + already the owner of the table. + The calling process must be the table owner.</p> + <p>Notice that this function does not affect option + <seealso marker="#heir"><c>heir</c></seealso> of the table. 
A table + owner can, for example, set <c>heir</c> to itself, give the table + away, and then get it back if the receiver terminates.</p> </desc> </func> + <func> <name name="i" arity="0"/> - <fsummary>Display information about all ETS tables on tty.</fsummary> + <fsummary>Display information about all ETS tables on a terminal. + </fsummary> <desc> - <p>Displays information about all ETS tables on tty.</p> + <p>Displays information about all ETS tables on a terminal.</p> </desc> </func> + <func> <name name="i" arity="1"/> - <fsummary>Browse an ETS table on tty.</fsummary> + <fsummary>Browse an ETS table on a terminal.</fsummary> <desc> - <p>Browses the table <c><anno>Tab</anno></c> on tty.</p> + <p>Browses table <c><anno>Tab</anno></c> on a terminal.</p> </desc> </func> + <func> <name name="info" arity="1"/> - <fsummary>Return information about an ETS table.</fsummary> + <fsummary>Return information about an <c>table</c>.</fsummary> <desc> - <p>Returns information about the table <c><anno>Tab</anno></c> as a list of + <p>Returns information about table <c><anno>Tab</anno></c> as a list of tuples. If <c><anno>Tab</anno></c> has the correct type - for a table identifier, but does not refer to an existing ETS - table, <c>undefined</c> is returned. 
If <c><anno>Tab</anno></c> is not of the - correct type, this function fails with reason <c>badarg</c>.</p> - - <list type="bulleted"> - <item><c>{compressed, boolean()}</c> <br></br> - - Indicates if the table is compressed or not.</item> - <item><c>{heir, pid() | none}</c> <br></br> - - The pid of the heir of the table, or <c>none</c> if no heir is set.</item> - <item><c>{keypos, integer() >= 1}</c> <br></br> - - The key position.</item> - <item><c>{memory, integer() >= 0</c> <br></br> - - The number of words allocated to the table.</item> - <item><c>{name, atom()}</c> <br></br> - - The name of the table.</item> - <item><c>{named_table, boolean()}</c> <br></br> - - Indicates if the table is named or not.</item> - <item><c>{node, node()}</c> <br></br> - - The node where the table is stored. This field is no longer - meaningful as tables cannot be accessed from other nodes.</item> - <item><c>{owner, pid()}</c> <br></br> - - The pid of the owner of the table.</item> - <item><c>{protection, </c><seealso marker="#type-access">access()</seealso><c>}</c> <br></br> - - The table access rights.</item> - <item><c>{size, integer() >= 0</c> <br></br> - - The number of objects inserted in the table.</item> - <item><c>{type, </c><seealso marker="#type-type">type()</seealso><c>}</c> <br></br> - - The table type.</item> - <item><c>{read_concurrency, boolean()}</c> <br></br> - - Indicates whether the table uses read_concurrency or not.</item> - <item><c>{write_concurrency, boolean()}</c> <br></br> - - Indicates whether the table uses write_concurrency or not.</item> - </list> + for a table identifier, but does not refer to an existing ETS + table, <c>undefined</c> is returned. 
If <c><anno>Tab</anno></c> is + not of the correct type, a <c>badarg</c> exception is raised.</p> + <taglist> + <tag><c>{compressed, boolean()}</c></tag> + <item> + <p>Indicates if the table is compressed.</p> + </item> + <tag><c>{heir, pid() | none}</c></tag> + <item> + <p>The pid of the heir of the table, or <c>none</c> if no heir + is set.</p> + </item> + <tag><c>{keypos, integer() >= 1}</c></tag> + <item> + <p>The key position.</p> + </item> + <tag><c>{memory, integer() >= 0</c></tag> + <item> + <p>The number of words allocated to the table.</p> + </item> + <tag><c>{name, atom()}</c></tag> + <item> + <p>The table name.</p> + </item> + <tag><c>{named_table, boolean()}</c></tag> + <item> + <p>Indicates if the table is named.</p> + </item> + <tag><c>{node, node()}</c></tag> + <item> + <p>The node where the table is stored. This field is no longer + meaningful, as tables cannot be accessed from other nodes.</p> + </item> + <tag><c>{owner, pid()}</c></tag> + <item> + <p>The pid of the owner of the table.</p> + </item> + <tag><c>{protection,</c> <seealso marker="#type-access"> + <c>access()</c></seealso><c>}</c></tag> + <item> + <p>The table access rights.</p> + </item> + <tag><c>{size, integer() >= 0</c></tag> + <item> + <p>The number of objects inserted in the table.</p> + </item> + <tag><c>{type,</c> <seealso marker="#type-type"> + <c>type()</c></seealso><c>}</c></tag> + <item> + <p>The table type.</p> + </item> + <tag><c>{read_concurrency, boolean()}</c></tag> + <item> + <p>Indicates whether the table uses <c>read_concurrency</c> or + not.</p> + </item> + <tag><c>{write_concurrency, boolean()}</c></tag> + <item> + <p>Indicates whether the table uses <c>write_concurrency</c>.</p> + </item> + </taglist> </desc> </func> + <func> <name name="info" arity="2"/> - <fsummary>Return the information associated with given item for an ETS table.</fsummary> + <fsummary>Return the information associated with the specified item for + an ETS table.</fsummary> <desc> - <p>Returns 
the information associated with <c>Item</c> for - the table <c><anno>Tab</anno></c>, or returns <c>undefined</c> if <c>Tab</c> - does not refer an existing ETS table. - If <c><anno>Tab</anno></c> is not of the correct type, or if <c><anno>Item</anno></c> is not - one of the allowed values, this function fails with reason <c>badarg</c>.</p> - - <warning><p>In R11B and earlier, this function would not fail but return - <c>undefined</c> for invalid values for <c>Item</c>.</p> - </warning> - - <p>In addition to the <c>{<anno>Item</anno>,<anno>Value</anno>}</c> - pairs defined for <c>info/1</c>, the following items are - allowed:</p> + <p>Returns the information associated with <c>Item</c> for table + <c><anno>Tab</anno></c>, or returns <c>undefined</c> if <c>Tab</c> + does not refer an existing ETS table. If + <c><anno>Tab</anno></c> is + not of the correct type, or if <c><anno>Item</anno></c> is not + one of the allowed values, a <c>badarg</c> exception is raised.</p> + <warning> + <p>In Erlang/OTP R11B and earlier, this function would not fail but + return <c>undefined</c> for invalid values for <c>Item</c>.</p> + </warning> + <p>In addition to the <c>{<anno>Item</anno>,<anno>Value</anno>}</c> + pairs defined for <seealso marker="#info/1"><c>info/1</c></seealso>, + the following items are allowed:</p> <list type="bulleted"> - <item><c>Item=fixed, Value=boolean()</c> <br></br> - - Indicates if the table is fixed by any process or not.</item> - <item><marker id="info_2_safe_fixed_monotonic_time"/> - <p><c>Item=safe_fixed|safe_fixed_monotonic_time, Value={FixationTime,Info}|false</c> <br></br> -</p> + <item> + <p><c>Item=fixed, Value=boolean()</c></p> + <p>Indicates if the table is fixed by any process.</p> + </item> + <item> + <p><marker id="info_2_safe_fixed_monotonic_time"/></p> + <p><c>Item=safe_fixed|safe_fixed_monotonic_time, + Value={FixationTime,Info}|false</c></p> <p>If the table has been fixed using - <seealso 
marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso>, + <seealso marker="#safe_fixtable/2"> + <c>safe_fixtable/2</c></seealso>, the call returns a tuple where <c>FixationTime</c> is the - time when the table was first fixed by a process, which - may or may not be one of the processes it is fixed by - right now.</p> - <p>The format and value of <c>FixationTime</c> depends on - <c>Item</c>:</p> - <taglist> - <tag><c>safe_fixed</c></tag> - <item><p><c>FixationTime</c> will correspond to the result - returned by - <seealso marker="erts:erlang#timestamp/0">erlang:timestamp/0</seealso> - at the time of fixation. Note that when the system is using - single or multi - <seealso marker="erts:time_correction#Time_Warp_Modes">time warp - modes</seealso> this might produce strange results. This - since the usage of <c>safe_fixed</c> is not - <seealso marker="erts:time_correction#Time_Warp_Safe_Code">time warp - safe</seealso>. Time warp safe code need to use - <c>safe_fixed_monotonic_time</c> instead.</p></item> - - <tag><c>safe_fixed_monotonic_time</c></tag> - <item><p><c>FixationTime</c> will correspond to the result - returned by - <seealso marker="erts:erlang#monotonic_time/0">erlang:monotonic_time/0</seealso> - at the time of fixation. The usage of <c>safe_fixed_monotonic_time</c> is - <seealso marker="erts:time_correction#Time_Warp_Safe_Code">time warp - safe</seealso>.</p></item> - </taglist> + time when the table was first fixed by a process, which either + is or is not one of the processes it is fixed by now.</p> + <p>The format and value of <c>FixationTime</c> depends on + <c>Item</c>:</p> + <taglist> + <tag><c>safe_fixed</c></tag> + <item> + <p><c>FixationTime</c> corresponds to the result returned by + <seealso marker="erts:erlang#timestamp/0"> + <c>erlang:timestamp/0</c></seealso> at the time of fixation. 
+ Notice that when the system uses single or multi + <seealso marker="erts:time_correction#Time_Warp_Modes">time + warp modes</seealso> this can produce strange results, as + the use of <c>safe_fixed</c> is not + <seealso marker="erts:time_correction#Time_Warp_Safe_Code"> + time warp safe</seealso>. Time warp safe code must use + <c>safe_fixed_monotonic_time</c> instead.</p> + </item> + <tag><c>safe_fixed_monotonic_time</c></tag> + <item> + <p><c>FixationTime</c> corresponds to the result returned by + <seealso marker="erts:erlang#monotonic_time/0"> + <c>erlang:monotonic_time/0</c></seealso> at the time of + fixation. The use of <c>safe_fixed_monotonic_time</c> is + <seealso marker="erts:time_correction#Time_Warp_Safe_Code"> + time warp safe</seealso>.</p> + </item> + </taglist> <p><c>Info</c> is a possibly empty lists of tuples <c>{Pid,RefCount}</c>, one tuple for every process the - table is fixed by right now. <c>RefCount</c> is the value - of the reference counter, keeping track of how many times + table is fixed by now. <c>RefCount</c> is the value + of the reference counter and it keeps track of how many times the table has been fixed by the process.</p> <p>If the table never has been fixed, the call returns - <c>false</c>.</p></item> - <item><p><c>Item=stats, Value=tuple()</c> <br></br> - Returns internal statistics about set, bag and duplicate_bag tables on an internal format used by OTP test suites. - Not for production use.</p></item> + <c>false</c>.</p> + </item> + <item> + <p><c>Item=stats, Value=tuple()</c></p> + <p>Returns internal statistics about <c>set</c>, <c>bag</c>, and + <c>duplicate_bag</c> tables on an internal format used by OTP + test suites. 
Not for production use.</p></item> </list> </desc> </func> + <func> <name name="init_table" arity="2"/> <fsummary>Replace all objects of an ETS table.</fsummary> <desc> - <p>Replaces the existing objects of the table <c><anno>Tab</anno></c> with - objects created by calling the input function <c><anno>InitFun</anno></c>, + <p>Replaces the existing objects of table <c><anno>Tab</anno></c> with + objects created by calling the input function + <c><anno>InitFun</anno></c>, see below. This function is provided for compatibility with the <c>dets</c> module, it is not more efficient than filling - a table by using <c>ets:insert/2</c>. - </p> - <p>When called with the argument <c>read</c> the function - <c><anno>InitFun</anno></c> is assumed to return <c>end_of_input</c> when + a table by using + <seealso marker="#insert/2"><c>insert/2</c></seealso>.</p> + <p>When called with argument <c>read</c>, the function + <c><anno>InitFun</anno></c> is assumed to return + <c>end_of_input</c> when there is no more input, or <c>{<anno>Objects</anno>, Fun}</c>, where - <c><anno>Objects</anno></c> is a list of objects and <c>Fun</c> is a new - input function. Any other value Value is returned as an error - <c>{error, {init_fun, Value}}</c>. Each input function will be - called exactly once, and should an error occur, the last - function is called with the argument <c>close</c>, the reply + <c><anno>Objects</anno></c> is a list of objects and <c>Fun</c> is a + new input function. Any other value <c>Value</c> is returned as an + error <c>{error, {init_fun, Value}}</c>. Each input function is + called exactly once, and if an error occur, the last + function is called with argument <c>close</c>, the reply of which is ignored.</p> - <p>If the type of the table is <c>set</c> and there is more - than one object with a given key, one of the objects is + <p>If the table type is <c>set</c> and more than one object + exists with a given key, one of the objects is chosen. 
This is not necessarily the last object with the given key in the sequence of objects returned by the input functions. This holds also for duplicated objects stored in tables of type <c>bag</c>.</p> </desc> </func> + <func> <name name="insert" arity="2"/> <fsummary>Insert an object into an ETS table.</fsummary> <desc> - <p>Inserts the object or all of the objects in the list - <c><anno>ObjectOrObjects</anno></c> into the table <c><anno>Tab</anno></c>. - If the table is a <c>set</c> and the key of the inserted - objects <em>matches</em> the key of any object in the table, - the old object will be replaced. If the table is an - <c>ordered_set</c> and the key of the inserted object - <em>compares equal</em> to the key of any object in the - table, the old object is also replaced. If the list contains - more than one object with <em>matching</em> keys and the table is a - <c>set</c>, one will be inserted, which one is - not defined. The same thing holds for <c>ordered_set</c>, but - will also happen if the keys <em>compare equal</em>.</p> + <p>Inserts the object or all of the objects in list + <c><anno>ObjectOrObjects</anno></c> into table + <c><anno>Tab</anno></c>.</p> + <list type="bulleted"> + <item> + <p>If the table type is <c>set</c> and the key of the inserted + objects <em>matches</em> the key of any object in the table, + the old object is replaced.</p> + </item> + <item> + <p>If the table type is <c>ordered_set</c> and the key of the + inserted object <em>compares equal</em> to the key of any object + in the table, the old object is replaced.</p> + </item> + <item> + <p>If the list contains more than one object with + <em>matching</em> keys and the table type is <c>set</c>, one is + inserted, which one is not defined. 
+ The same holds for table type <c>ordered_set</c> + if the keys <em>compare equal</em>.</p> + </item> + </list> <p>The entire operation is guaranteed to be <seealso marker="#concurrency">atomic and isolated</seealso>, even when a list of objects is inserted.</p> </desc> </func> + <func> <name name="insert_new" arity="2"/> - <fsummary>Insert an object into an ETS table if the key is not already present.</fsummary> + <fsummary>Insert an object into an ETS table if the key is not + already present.</fsummary> <desc> - <p>This function works exactly like <c>insert/2</c>, with the - exception that instead of overwriting objects with the same - key (in the case of <c>set</c> or <c>ordered_set</c>) or - adding more objects with keys already existing in the table - (in the case of <c>bag</c> and <c>duplicate_bag</c>), it - simply returns <c>false</c>. If <c><anno>ObjectOrObjects</anno></c> is a - list, the function checks <em>every</em> key prior to - inserting anything. Nothing will be inserted if not + <p>Same as <seealso marker="#insert/2"><c>insert/2</c></seealso> + except that instead of overwriting objects with the same key + (for <c>set</c> or <c>ordered_set</c>) or adding more objects with + keys already existing in the table (for <c>bag</c> and + <c>duplicate_bag</c>), <c>false</c> is returned.</p> + <p>If <c><anno>ObjectOrObjects</anno></c> is a + list, the function checks <em>every</em> key before + inserting anything. Nothing is inserted unless <em>all</em> keys present in the list are absent from the table. 
Like <c>insert/2</c>, the entire operation is guaranteed to be <seealso marker="#concurrency">atomic and isolated</seealso>.</p> </desc> </func> + <func> <name name="is_compiled_ms" arity="1"/> - <fsummary>Checks if an Erlang term is the result of ets:match_spec_compile</fsummary> + <fsummary>Check if an Erlang term is the result of + <c>match_spec_compile</c>.</fsummary> <desc> - <p>This function is used to check if a term is a valid - compiled <seealso marker="#match_spec">match_spec</seealso>. - The compiled match_spec is an opaque datatype which can - <em>not</em> be sent between Erlang nodes nor be stored on + <p>Checks if a term is a valid + compiled <seealso marker="#match_spec">match specification</seealso>. + The compiled match specification is an opaque datatype that + <em>cannot</em> be sent between Erlang nodes or be stored on disk. Any attempt to create an external representation of a - compiled match_spec will result in an empty binary - (<c><![CDATA[<<>>]]></c>). As an example, the following - expression:</p> + compiled match specification results in an empty binary + (<c><![CDATA[<<>>]]></c>).</p> + <p><em>Examples:</em></p> + <p>The following expression yields <c>true</c>::</p> <code type="none"> ets:is_compiled_ms(ets:match_spec_compile([{'_',[],[true]}])).</code> - <p>will yield <c>true</c>, while the following expressions:</p> + <p>The following expressions yield <c>false</c>, as variable + <c>Broken</c> contains a compiled match specification that has + passed through external representation:</p> <code type="none"> MS = ets:match_spec_compile([{'_',[],[true]}]), Broken = binary_to_term(term_to_binary(MS)), ets:is_compiled_ms(Broken).</code> - <p>will yield false, as the variable <c>Broken</c> will contain - a compiled match_spec that has passed through external - representation.</p> <note> - <p>The fact that compiled match_specs has no external - representation is for performance reasons. 
It may be subject - to change in future releases, while this interface will - still remain for backward compatibility reasons.</p> + <p>The reason for not having an external representation of + compiled match specifications is performance. It can be + subject to change in future releases, while this interface + remains for backward compatibility.</p> </note> </desc> </func> + <func> <name name="last" arity="1"/> - <fsummary>Return the last key in an ETS table of type<c>ordered_set</c>.</fsummary> + <fsummary>Return the last key in an ETS table of type + <c>ordered_set</c>.</fsummary> <desc> - <p>Returns the last key <c><anno>Key</anno></c> according to Erlang term - order in the table <c>Tab</c> of the <c>ordered_set</c> type. - If the table is of any other type, the function is synonymous - to <c>first/1</c>. If the table is empty, - <c>'$end_of_table'</c> is returned.</p> - <p>Use <c>prev/2</c> to find preceding keys in the table.</p> + <p>Returns the last key <c><anno>Key</anno></c> according to Erlang + term order in table <c>Tab</c> of type <c>ordered_set</c>. For + other table types, the function is synonymous to + <seealso marker="#first/1"><c>first/1</c></seealso>. + If the table is empty, <c>'$end_of_table'</c> is returned.</p> + <p>To find preceding keys in the table, use + <seealso marker="#prev/2"><c>prev/2</c></seealso>.</p> </desc> </func> + <func> <name name="lookup" arity="2"/> - <fsummary>Return all objects with a given key in an ETS table.</fsummary> + <fsummary>Return all objects with a specified key in an ETS table. + </fsummary> <desc> - <p>Returns a list of all objects with the key <c><anno>Key</anno></c> in - the table <c><anno>Tab</anno></c>.</p> - <p>In the case of <c>set, bag and duplicate_bag</c>, an object - is returned only if the given key <em>matches</em> the key - of the object in the table. 
If the table is an - <c>ordered_set</c> however, an object is returned if the key - given <em>compares equal</em> to the key of an object in the - table. The difference being the same as between <c>=:=</c> - and <c>==</c>. As an example, one might insert an object - with the + <p>Returns a list of all objects with key <c><anno>Key</anno></c> in + table <c><anno>Tab</anno></c>.</p> + <list type="bulleted"> + <item> + <p>For tables of type <c>set</c>, <c>bag</c>, or + <c>duplicate_bag</c>, an object is returned only if the specified + key <em>matches</em> the key of the object in the table.</p> + </item> + <item> + <p>For tables of type <c>ordered_set</c>, an object is returned if + the specified key <em>compares equal</em> to the key of an object + in the table.</p> + </item> + </list> + <p>The difference is the same as between <c>=:=</c> and <c>==</c>.</p> + <p>As an example, one can insert an object with <c>integer()</c> <c>1</c> as a key in an <c>ordered_set</c> - and get the object returned as a result of doing a - <c>lookup/2</c> with the <c>float()</c> <c>1.0</c> as the - key to search for.</p> - <p>If the table is of type <c>set</c> or <c>ordered_set</c>, + and get the object returned as a result of doing a <c>lookup/2</c> + with <c>float()</c> <c>1.0</c> as the key to search for.</p> + <p>For tables of type <c>set</c> or <c>ordered_set</c>, the function returns either the empty list or a list with one element, as there cannot be more than one object with the same - key. If the table is of type <c>bag</c> or - <c>duplicate_bag</c>, the function returns a list of - arbitrary length.</p> - <p>Note that the time order of object insertions is preserved; - the first object inserted with the given key will be first + key. 
For tables of type <c>bag</c> or <c>duplicate_bag</c>, the + function returns a list of arbitrary length.</p> + <p>Notice that the time order of object insertions is preserved; + the first object inserted with the specified key is the first in the resulting list, and so on.</p> - <p>Insert and look-up times in tables of type <c>set</c>, - <c>bag</c> and <c>duplicate_bag</c> are constant, regardless - of the size of the table. For the <c>ordered_set</c> - data-type, time is proportional to the (binary) logarithm of + <p>Insert and lookup times in tables of type <c>set</c>, + <c>bag</c>, and <c>duplicate_bag</c> are constant, regardless + of the table size. For the <c>ordered_set</c> + datatype, time is proportional to the (binary) logarithm of the number of objects.</p> </desc> </func> + <func> <name name="lookup_element" arity="3"/> - <fsummary>Return the <c>Pos</c>:th element of all objects with a given key in an ETS table.</fsummary> + <fsummary>Return the <c>Pos</c>:th element of all objects with a + specified key in an ETS table.</fsummary> <desc> - <p>If the table <c><anno>Tab</anno></c> is of type <c>set</c> or - <c>ordered_set</c>, the function returns the <c><anno>Pos</anno></c>:th - element of the object with the key <c><anno>Key</anno></c>.</p> - <p>If the table is of type <c>bag</c> or <c>duplicate_bag</c>, - the functions returns a list with the <c><anno>Pos</anno></c>:th element of - every object with the key <c><anno>Key</anno></c>.</p> - <p>If no object with the key <c><anno>Key</anno></c> exists, the function - will exit with reason <c>badarg</c>.</p> - <p>The difference between <c>set</c>, <c>bag</c> and + <p>For a table <c><anno>Tab</anno></c> of type <c>set</c> or + <c>ordered_set</c>, the function returns the + <c><anno>Pos</anno></c>:th + element of the object with key <c><anno>Key</anno></c>.</p> + <p>For tables of type <c>bag</c> or <c>duplicate_bag</c>, + the function returns a list with the <c><anno>Pos</anno></c>:th + element of every 
object with key <c><anno>Key</anno></c>.</p> + <p>If no object with key <c><anno>Key</anno></c> exists, the + function exits with reason <c>badarg</c>.</p> + <p>The difference between <c>set</c>, <c>bag</c>, and <c>duplicate_bag</c> on one hand, and <c>ordered_set</c> on - the other, regarding the fact that <c>ordered_set</c>'s + the other, regarding the fact that <c>ordered_set</c> tables view keys as equal when they <em>compare equal</em> - whereas the other table types only regard them equal when - they <em>match</em>, naturally holds for - <c>lookup_element</c> as well as for <c>lookup</c>.</p> + whereas the other table types regard them equal only when + they <em>match</em>, holds for <c>lookup_element/3</c>.</p> </desc> </func> + + <func> + <name name="match" arity="1"/> + <fsummary>Continues matching objects in an ETS table.</fsummary> + <desc> + <p>Continues a match started with + <seealso marker="#match/3"><c>match/3</c></seealso>. The next + chunk of the size specified in the initial <c>match/3</c> + call is returned together with a new <c><anno>Continuation</anno></c>, + which can be used in subsequent calls to this function.</p> + <p>When there are no more objects in the table, <c>'$end_of_table'</c> + is returned.</p> + </desc> + </func> + <func> <name name="match" arity="2"/> - <fsummary>Match the objects in an ETS table against a pattern.</fsummary> + <fsummary>Match the objects in an ETS table against a pattern. 
+ </fsummary> <desc> - <p>Matches the objects in the table <c><anno>Tab</anno></c> against the + <p>Matches the objects in table <c><anno>Tab</anno></c> against pattern <c><anno>Pattern</anno></c>.</p> - <p>A pattern is a term that may contain:</p> + <p>A pattern is a term that can contain:</p> <list type="bulleted"> - <item>bound parts (Erlang terms),</item> - <item><c>'_'</c> which matches any Erlang term, and</item> - <item>pattern variables: <c>'$N'</c> where - <c>N</c>=0,1,...</item> + <item>Bound parts (Erlang terms)</item> + <item><c>'_'</c> that matches any Erlang term</item> + <item>Pattern variables <c>'$N'</c>, where <c>N</c>=0,1,...</item> </list> <p>The function returns a list with one element for each matching object, where each element is an ordered list of - pattern variable bindings. An example:</p> + pattern variable bindings, for example:</p> <pre> -6> <input>ets:match(T, '$1').</input> % Matches every object in the table +6> <input>ets:match(T, '$1').</input> % Matches every object in table [[{rufsen,dog,7}],[{brunte,horse,5}],[{ludde,dog,5}]] 7> <input>ets:match(T, {'_',dog,'$1'}).</input> [[7],[5]] 8> <input>ets:match(T, {'_',cow,'$1'}).</input> []</pre> <p>If the key is specified in the pattern, the match is very - efficient. If the key is not specified, i.e. if it is a + efficient. If the key is not specified, that is, if it is a variable or an underscore, the entire table must be searched. 
The search time can be substantial if the table is very large.</p> - <p>On tables of the <c>ordered_set</c> type, the result is in - the same order as in a <c>first/next</c> traversal.</p> + <p>For tables of type <c>ordered_set</c>, the result is in + the same order as in a <c>first</c>/<c>next</c> traversal.</p> </desc> </func> + <func> <name name="match" arity="3"/> - <fsummary>Match the objects in an ETS table against a pattern and returns part of the answers.</fsummary> + <fsummary>Match the objects in an ETS table against a pattern + and return part of the answers.</fsummary> <desc> - <p>Works like <c>ets:match/2</c> but only returns a limited - (<c><anno>Limit</anno></c>) number of matching objects. The - <c><anno>Continuation</anno></c> term can then be used in subsequent calls - to <c>ets:match/1</c> to get the next chunk of matching - objects. This is a space efficient way to work on objects in a - table which is still faster than traversing the table object - by object using <c>ets:first/1</c> and <c>ets:next/1</c>.</p> - <p><c>'$end_of_table'</c> is returned if the table is empty.</p> + <p>Works like <seealso marker="#match/2"><c>match/2</c></seealso>, + but returns only a limited (<c><anno>Limit</anno></c>) number of + matching objects. Term <c><anno>Continuation</anno></c> can then + be used in subsequent calls to <seealso marker="#match/1"> + <c>match/1</c></seealso> to get the next chunk of matching + objects. 
This is a space-efficient way to work on objects in a + table, which is faster than traversing the table object + by object using + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso>.</p> + <p>If the table is empty, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> - <name name="match" arity="1"/> - <fsummary>Continues matching objects in an ETS table.</fsummary> + <name name="match_delete" arity="2"/> + <fsummary>Delete all objects that match a specified pattern from an + ETS table.</fsummary> <desc> - <p>Continues a match started with <c>ets:match/3</c>. The next - chunk of the size given in the initial <c>ets:match/3</c> - call is returned together with a new <c><anno>Continuation</anno></c> - that can be used in subsequent calls to this function.</p> - <p><c>'$end_of_table'</c> is returned when there are no more - objects in the table.</p> + <p>Deletes all objects that match pattern <c><anno>Pattern</anno></c> + from table <c><anno>Tab</anno></c>. For a description of patterns, + see <seealso marker="#match/2"><c>match/2</c></seealso>.</p> </desc> </func> + <func> - <name name="match_delete" arity="2"/> - <fsummary>Delete all objects which match a given pattern from an ETS table.</fsummary> + <name name="match_object" arity="1"/> + <fsummary>Continues matching objects in an ETS table.</fsummary> <desc> - <p>Deletes all objects which match the pattern <c><anno>Pattern</anno></c> - from the table <c><anno>Tab</anno></c>. See <c>match/2</c> for a - description of patterns.</p> + <p>Continues a match started with + <seealso marker="#match_object/3"><c>match_object/3</c></seealso>. 
+ The next chunk of the size specified in the initial + <c>match_object/3</c> call is returned together with a new + <c><anno>Continuation</anno></c>, which can be used in subsequent + calls to this function.</p> + <p>When there are no more objects in the table, <c>'$end_of_table'</c> + is returned.</p> </desc> </func> + <func> <name name="match_object" arity="2"/> - <fsummary>Match the objects in an ETS table against a pattern.</fsummary> + <fsummary>Match the objects in an ETS table against a pattern. + </fsummary> <desc> - <p>Matches the objects in the table <c><anno>Tab</anno></c> against the - pattern <c><anno>Pattern</anno></c>. See <c>match/2</c> for a description - of patterns. The function returns a list of all objects which + <p>Matches the objects in table <c><anno>Tab</anno></c> against + pattern <c><anno>Pattern</anno></c>. For a description of patterns, + see <seealso marker="#match/2"><c>match/2</c></seealso>. + The function returns a list of all objects that match the pattern.</p> <p>If the key is specified in the pattern, the match is very - efficient. If the key is not specified, i.e. if it is a + efficient. If the key is not specified, that is, if it is a variable or an underscore, the entire table must be searched. The search time can be substantial if the table is very large.</p> - <p>On tables of the <c>ordered_set</c> type, the result is in - the same order as in a <c>first/next</c> traversal.</p> + <p>For tables of type <c>ordered_set</c>, the result is in + the same order as in a <c>first</c>/<c>next</c> traversal.</p> </desc> </func> + <func> <name name="match_object" arity="3"/> - <fsummary>Match the objects in an ETS table against a pattern and returns part of the answers.</fsummary> + <fsummary>Match the objects in an ETS table against a pattern and + return part of the answers.</fsummary> <desc> - <p>Works like <c>ets:match_object/2</c> but only returns a - limited (<c><anno>Limit</anno></c>) number of matching objects. 
The - <c><anno>Continuation</anno></c> term can then be used in subsequent calls - to <c>ets:match_object/1</c> to get the next chunk of matching - objects. This is a space efficient way to work on objects in a - table which is still faster than traversing the table object - by object using <c>ets:first/1</c> and <c>ets:next/1</c>.</p> - <p><c>'$end_of_table'</c> is returned if the table is empty.</p> - </desc> - </func> - <func> - <name name="match_object" arity="1"/> - <fsummary>Continues matching objects in an ETS table.</fsummary> - <desc> - <p>Continues a match started with <c>ets:match_object/3</c>. - The next chunk of the size given in the initial - <c>ets:match_object/3</c> call is returned together with a - new <c><anno>Continuation</anno></c> that can be used in subsequent calls - to this function.</p> - <p><c>'$end_of_table'</c> is returned when there are no more - objects in the table.</p> + <p>Works like <seealso marker="#match_object/2"> + <c>match_object/2</c></seealso>, but only returns a + limited (<c><anno>Limit</anno></c>) number of matching objects. Term + <c><anno>Continuation</anno></c> can then be used in subsequent + calls to <seealso marker="#match_object/1"> + <c>match_object/1</c></seealso> to get the next chunk of matching + objects. This is a space-efficient way to work on objects in a + table, which is faster than traversing the table object + by object using + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso>.</p> + <p>If the table is empty, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="match_spec_compile" arity="1"/> - <fsummary>Compiles a match specification into its internal representation</fsummary> + <fsummary>Compile a match specification into its internal representation. 
+ </fsummary> <desc> - <p>This function transforms a - <seealso marker="#match_spec">match_spec</seealso> into an - internal representation that can be used in subsequent calls - to <c>ets:match_spec_run/2</c>. The internal representation is - opaque and can not be converted to external term format and - then back again without losing its properties (meaning it can - not be sent to a process on another node and still remain a - valid compiled match_spec, nor can it be stored on disk). - The validity of a compiled match_spec can be checked using - <c>ets:is_compiled_ms/1</c>.</p> - <p>If the term <c><anno>MatchSpec</anno></c> can not be compiled (does not - represent a valid match_spec), a <c>badarg</c> fault is - thrown.</p> + <p>Transforms a + <seealso marker="#match_spec">match specification</seealso> into an + internal representation that can be used in subsequent calls to + <seealso marker="#match_spec_run/2"><c>match_spec_run/2</c></seealso>. + The internal representation is + opaque and cannot be converted to external term format and + then back again without losing its properties (that is, it cannot + be sent to a process on another node and still remain a + valid compiled match specification, nor can it be stored on disk). + To check the validity of a compiled match specification, use + <seealso marker="#is_compiled_ms/1"><c>is_compiled_ms/1</c></seealso>. + </p> + <p>If term <c><anno>MatchSpec</anno></c> cannot be compiled (does not + represent a valid match specification), a <c>badarg</c> exception is + raised.</p> <note> - <p>This function has limited use in normal code, it is used by - Dets to perform the <c>dets:select</c> operations.</p> + <p>This function has limited use in normal code. 
It is used by the + <seealso marker="dets"><c>dets</c></seealso> module + to perform the <c>dets:select()</c> operations.</p> </note> </desc> </func> + <func> <name name="match_spec_run" arity="2"/> - <fsummary>Performs matching, using a compiled match_spec, on a list of tuples</fsummary> + <fsummary>Perform matching, using a compiled match specification on a + list of tuples.</fsummary> <desc> - <p>This function executes the matching specified in a - compiled <seealso marker="#match_spec">match_spec</seealso> on - a list of tuples. The <c><anno>CompiledMatchSpec</anno></c> term should be - the result of a call to <c>ets:match_spec_compile/1</c> and - is hence the internal representation of the match_spec one - wants to use.</p> - <p>The matching will be executed on each element in <c><anno>List</anno></c> - and the function returns a list containing all results. If an - element in <c><anno>List</anno></c> does not match, nothing is returned + <p>Executes the matching specified in a compiled + <seealso marker="#match_spec">match specification</seealso> on a list + of tuples. Term <c><anno>CompiledMatchSpec</anno></c> is to be + the result of a call to <seealso marker="#match_spec_compile/1"> + <c>match_spec_compile/1</c></seealso> and is hence the internal + representation of the match specification one wants to use.</p> + <p>The matching is executed on each element in <c><anno>List</anno></c> + and the function returns a list containing all results. If an element + in <c><anno>List</anno></c> does not match, nothing is returned for that element. The length of the result list is therefore - equal or less than the the length of the parameter - <c><anno>List</anno></c>. The two calls in the following example will give - the same result (but certainly not the same execution - time...):</p> + equal or less than the length of parameter <c><anno>List</anno></c>. 
+ </p> + <p><em>Example:</em></p> + <p>The following two calls give the same result (but certainly not the + same execution time):</p> <code type="none"> Table = ets:new... -MatchSpec = .... +MatchSpec = ... % The following call... ets:match_spec_run(ets:tab2list(Table), ets:match_spec_compile(MatchSpec)), -% ...will give the same result as the more common (and more efficient) -ets:select(Table,MatchSpec),</code> +% ...gives the same result as the more common (and more efficient) +ets:select(Table, MatchSpec),</code> <note> - <p>This function has limited use in normal code, it is used by - Dets to perform the <c>dets:select</c> operations and by + <p>This function has limited use in normal code. It is used by the + <seealso marker="dets"><c>dets</c></seealso> module + to perform the <c>dets:select()</c> operations and by Mnesia during transactions.</p> </note> </desc> </func> + <func> <name name="member" arity="2"/> - <fsummary>Tests for occurrence of a key in an ETS table</fsummary> + <fsummary>Tests for occurrence of a key in an ETS table.</fsummary> <desc> - <p>Works like <c>lookup/2</c>, but does not return the objects. - The function returns <c>true</c> if one or more elements in - the table has the key <c><anno>Key</anno></c>, <c>false</c> otherwise.</p> + <p>Works like <seealso marker="#lookup/2"><c>lookup/2</c></seealso>, + but does not return the objects. Returns <c>true</c> if one or more + elements in the table has key <c><anno>Key</anno></c>, otherwise + <c>false</c>.</p> </desc> </func> + <func> <name name="new" arity="2"/> <fsummary>Create a new ETS table.</fsummary> <desc> - <p>Creates a new table and returns a table identifier which can + <p>Creates a new table and returns a table identifier that can be used in subsequent operations. 
The table identifier can be sent to other processes so that a table can be shared between different processes within a node.</p> - <p>The parameter <c><anno>Options</anno></c> is a list of atoms which - specifies table type, access rights, key position and if the - table is named or not. If one or more options are left out, - the default values are used. This means that not specifying - any options (<c>[]</c>) is the same as specifying - <c>[set, protected, {keypos,1}, {heir,none}, {write_concurrency,false}, {read_concurrency,false}]</c>.</p> - <list type="bulleted"> + <p>Parameter <c><anno>Options</anno></c> is a list of atoms that + specifies table type, access rights, key position, and whether the + table is named. Default values are used for omitted options. + This means that not specifying any options (<c>[]</c>) is the same + as specifying <c>[set, protected, {keypos,1}, {heir,none}, + {write_concurrency,false}, {read_concurrency,false}]</c>.</p> + <taglist> + <tag><c>set</c></tag> <item> - <p><c>set</c> - The table is a <c>set</c> table - one key, one object, + <p>The table is a <c>set</c> table: one key, one object, no order among objects. This is the default table type.</p> </item> + <tag><c>ordered_set</c></tag> <item> - <p><c>ordered_set</c> - The table is a <c>ordered_set</c> table - one key, one + <p>The table is an <c>ordered_set</c> table: one key, one object, ordered in Erlang term order, which is the order implied by the < and > operators. Tables of this type have a somewhat different behavior in some situations - than tables of the other types. Most notably the + than tables of other types. Most notably, the <c>ordered_set</c> tables regard keys as equal when they <em>compare equal</em>, not only when they match. This - means that to an <c>ordered_set</c>, the - <c>integer()</c> <c>1</c> and the <c>float()</c> <c>1.0</c> are regarded as equal. 
This also means that the + means that to an <c>ordered_set</c> table, <c>integer()</c> + <c>1</c> and <c>float()</c> <c>1.0</c> are regarded as equal. + This also means that the key used to lookup an element not necessarily - <em>matches</em> the key in the elements returned, if + <em>matches</em> the key in the returned elements, if <c>float()</c>'s and <c>integer()</c>'s are mixed in keys of a table.</p> </item> + <tag><c>bag</c></tag> <item> - <p><c>bag</c> - The table is a <c>bag</c> table which can have many + <p>The table is a <c>bag</c> table, which can have many objects, but only one instance of each object, per key.</p> </item> + <tag><c>duplicate_bag</c></tag> <item> - <p><c>duplicate_bag</c> - The table is a <c>duplicate_bag</c> table which can have + <p>The table is a <c>duplicate_bag</c> table, which can have many objects, including multiple copies of the same object, per key.</p> </item> + <tag><c>public</c></tag> <item> - <p><c>public</c> - Any process may read or write to the table.</p> + <p>Any process can read or write to the table.</p> + <marker id="protected"></marker> </item> + <tag><c>protected</c></tag> <item> - <marker id="protected"></marker> - <p><c>protected</c> - The owner process can read and write to the table. Other + <p>The owner process can read and write to the table. Other processes can only read the table. This is the default setting for the access rights.</p> + <marker id="private"></marker> </item> + <tag><c>private</c></tag> <item> - <marker id="private"></marker> - <p><c>private</c> - Only the owner process can read or write to the table.</p> + <p>Only the owner process can read or write to the table.</p> </item> + <tag><c>named_table</c></tag> <item> - <p><c>named_table</c> - If this option is present, the name <c><anno>Name</anno></c> is + <p>If this option is present, name <c><anno>Name</anno></c> is associated with the table identifier. 
The name can then be used instead of the table identifier in subsequent operations.</p> </item> + <tag><c>{keypos,<anno>Pos</anno>}</c></tag> <item> - <p><c>{keypos,<anno>Pos</anno>}</c> - Specifies which element in the stored tuples should be - used as key. By default, it is the first element, i.e. - <c><anno>Pos</anno>=1</c>. However, this is not always appropriate. In + <p>Specifies which element in the stored tuples to use + as key. By default, it is the first element, that is, + <c><anno>Pos</anno>=1</c>. However, this is not always + appropriate. In particular, we do not want the first element to be the key if we want to store Erlang records in a table.</p> - <p>Note that any tuple stored in the table must have at + <p>Notice that any tuple stored in the table must have at least <c><anno>Pos</anno></c> number of elements.</p> - </item> - <item> <marker id="heir"></marker> - <p><c>{heir,<anno>Pid</anno>,<anno>HeirData</anno>} | {heir,none}</c><br></br> - Set a process as heir. The heir will inherit the table if - the owner terminates. The message - <c>{'ETS-TRANSFER',tid(),FromPid,<anno>HeirData</anno>}</c> will be sent to - the heir when that happens. The heir must be a local process. - Default heir is <c>none</c>, which will destroy the table when - the owner terminates.</p> </item> + <tag><c>{heir,<anno>Pid</anno>,<anno>HeirData</anno>} | + {heir,none}</c></tag> <item> + <p>Set a process as heir. The heir inherits the table if + the owner terminates. Message + <c>{'ETS-TRANSFER',tid(),FromPid,<anno>HeirData</anno>}</c> is + sent to the heir when that occurs. The heir must be a local + process. Default heir is <c>none</c>, which destroys the table + when the owner terminates.</p> <marker id="new_2_write_concurrency"></marker> - <p><c>{write_concurrency,boolean()}</c> - Performance tuning. 
Default is <c>false</c>, in which case an operation that - mutates (writes to) the table will obtain exclusive access, - blocking any concurrent access of the same table until finished. - If set to <c>true</c>, the table is optimized towards concurrent - write access. Different objects of the same table can be mutated - (and read) by concurrent processes. This is achieved to some degree - at the expense of memory consumption and the performance of - sequential access and concurrent reading. - The <c>write_concurrency</c> option can be combined with the - <seealso marker="#new_2_read_concurrency">read_concurrency</seealso> - option. You typically want to combine these when large concurrent - read bursts and large concurrent write bursts are common (see the - documentation of the - <seealso marker="#new_2_read_concurrency">read_concurrency</seealso> - option for more information). - Note that this option does not change any guarantees about - <seealso marker="#concurrency">atomicy and isolation</seealso>. - Functions that makes such promises over several objects (like - <c>insert/2</c>) will gain less (or nothing) from this option.</p> - <p>In current implementation, table type <c>ordered_set</c> is not - affected by this option. Also, the memory consumption inflicted by - both <c>write_concurrency</c> and <c>read_concurrency</c> is a - constant overhead per table. This overhead can be especially large - when both options are combined.</p> </item> + <tag><c>{write_concurrency,boolean()}</c></tag> <item> + <p>Performance tuning. Defaults to <c>false</c>, in which case an + operation that + mutates (writes to) the table obtains exclusive access, + blocking any concurrent access of the same table until finished. + If set to <c>true</c>, the table is optimized to concurrent + write access. Different objects of the same table can be mutated + (and read) by concurrent processes. 
This is achieved to some + degree at the expense of memory consumption and the performance + of sequential access and concurrent reading.</p> + <p>Option <c>write_concurrency</c> can be combined with option + <seealso marker="#new_2_read_concurrency"> + <c>read_concurrency</c></seealso>. You typically want to combine + these when large concurrent read bursts and large concurrent + write bursts are common; for more information, see option + <seealso marker="#new_2_read_concurrency"> + <c>read_concurrency</c></seealso>.</p> + <p>Notice that this option does not change any guarantees about + <seealso marker="#concurrency">atomicity and isolation</seealso>. + Functions that make such promises over many objects (like + <seealso marker="#insert/2"><c>insert/2</c></seealso>) + gain less (or nothing) from this option.</p> + <p>Table type <c>ordered_set</c> is not affected by this option. + Also, the memory consumption inflicted by + both <c>write_concurrency</c> and <c>read_concurrency</c> is a + constant overhead per table. This overhead can be especially + large when both options are combined.</p> <marker id="new_2_read_concurrency"></marker> - <p><c>{read_concurrency,boolean()}</c> - Performance tuning. Default is <c>false</c>. When set to - <c>true</c>, the table is optimized for concurrent read - operations. When this option is enabled on a runtime system with - SMP support, read operations become much cheaper; especially on - systems with multiple physical processors. However, switching - between read and write operations becomes more expensive. You - typically want to enable this option when concurrent read - operations are much more frequent than write operations, or when - concurrent reads and writes comes in large read and write - bursts (i.e., lots of reads not interrupted by writes, and lots - of writes not interrupted by reads). 
You typically do - <em>not</em> want to enable this option when the common access - pattern is a few read operations interleaved with a few write - operations repeatedly. In this case you will get a performance - degradation by enabling this option. The <c>read_concurrency</c> - option can be combined with the - <seealso marker="#new_2_write_concurrency">write_concurrency</seealso> - option. You typically want to combine these when large concurrent - read bursts and large concurrent write bursts are common.</p> </item> + <tag><c>{read_concurrency,boolean()}</c></tag> <item> + <p>Performance tuning. Defaults to <c>false</c>. When set to + <c>true</c>, the table is optimized for concurrent read + operations. When this option is enabled on a runtime system with + SMP support, read operations become much cheaper; especially on + systems with multiple physical processors. However, switching + between read and write operations becomes more expensive.</p> + <p>You typically want to enable this option when concurrent read + operations are much more frequent than write operations, or when + concurrent reads and writes come in large read and write bursts + (that is, many reads not interrupted by writes, and many + writes not interrupted by reads).</p> + <p>You typically do + <em>not</em> want to enable this option when the common access + pattern is a few read operations interleaved with a few write + operations repeatedly. In this case, you would get a performance + degradation by enabling this option.</p> + <p>Option <c>read_concurrency</c> can be combined with option + <seealso marker="#new_2_write_concurrency"> + <c>write_concurrency</c></seealso>. + You typically want to combine these when large concurrent + read bursts and large concurrent write bursts are common.</p> <marker id="new_2_compressed"></marker> - <p><c>compressed</c> - If this option is present, the table data will be stored in a more compact format to - consume less memory. 
The downside is that it will make table operations slower. - Especially operations that need to inspect entire objects, - such as <c>match</c> and <c>select</c>, will get much slower. The key element - is not compressed in current implementation.</p> </item> - </list> + <tag><c>compressed</c></tag> + <item> + <p>If this option is present, the table data is stored in a more + compact format to consume less memory. However, it will make + table operations slower. Especially operations that need to + inspect entire objects, such as <c>match</c> and <c>select</c>, + get much slower. The key element is not compressed.</p> + </item> + </taglist> </desc> </func> + <func> <name name="next" arity="2"/> <fsummary>Return the next key in an ETS table.</fsummary> <desc> - <p>Returns the next key <c><anno>Key2</anno></c>, following the key - <c><anno>Key1</anno></c> in the table <c><anno>Tab</anno></c>. If the table is of the - <c>ordered_set</c> type, the next key in Erlang term order is - returned. If the table is of any other type, the next key - according to the table's internal order is returned. If there - is no next key, <c>'$end_of_table'</c> is returned.</p> - <p>Use <c>first/1</c> to find the first key in the table.</p> - <p>Unless a table of type <c>set</c>, <c>bag</c> or + <p>Returns the next key <c><anno>Key2</anno></c>, following key + <c><anno>Key1</anno></c> in table <c><anno>Tab</anno></c>. For table + type <c>ordered_set</c>, the next key in Erlang term order is + returned. For other table types, the next key + according to the internal order of the table is returned. If no + next key exists, <c>'$end_of_table'</c> is returned.</p> + <p>To find the first key in the table, use + <seealso marker="#first/1"><c>first/1</c></seealso>.</p> + <p>Unless a table of type <c>set</c>, <c>bag</c>, or <c>duplicate_bag</c> is protected using - <c>safe_fixtable/2</c>, see below, a traversal may fail if - concurrent updates are made to the table. 
If the table is of + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso>, + a traversal can fail if + concurrent updates are made to the table. For table type <c>ordered_set</c>, the function returns the next key in order, even if the object does no longer exist.</p> </desc> </func> + <func> <name name="prev" arity="2"/> - <fsummary>Return the previous key in an ETS table of type<c>ordered_set</c>.</fsummary> + <fsummary>Return the previous key in an ETS table of type + <c>ordered_set</c>.</fsummary> <desc> - <p>Returns the previous key <c><anno>Key2</anno></c>, preceding the key - <c><anno>Key1</anno></c> according the Erlang term order in the table - <c><anno>Tab</anno></c> of the <c>ordered_set</c> type. If the table is of - any other type, the function is synonymous to <c>next/2</c>. - If there is no previous key, <c>'$end_of_table'</c> is - returned.</p> - <p>Use <c>last/1</c> to find the last key in the table.</p> + <p>Returns the previous key <c><anno>Key2</anno></c>, preceding key + <c><anno>Key1</anno></c> according to Erlang term order in table + <c><anno>Tab</anno></c> of type <c>ordered_set</c>. For other + table types, the function is synonymous to + <seealso marker="#next/2"><c>next/2</c></seealso>. + If no previous key exists, <c>'$end_of_table'</c> is returned.</p> + <p>To find the last key in the table, use + <seealso marker="#last/1"><c>last/1</c></seealso>.</p> </desc> </func> + <func> <name name="rename" arity="2"/> <fsummary>Rename a named ETS table.</fsummary> <desc> <p>Renames the named table <c><anno>Tab</anno></c> to the new name - <c><anno>Name</anno></c>. Afterwards, the old name can not be used to + <c><anno>Name</anno></c>. Afterwards, the old name cannot be used to access the table. 
Renaming an unnamed table has no effect.</p> </desc> </func> + <func> <name name="repair_continuation" arity="2"/> - <fsummary>Repair a continuation from ets:select/1 or ets:select/3 that has passed through external representation</fsummary> + <fsummary>Repair a continuation from <c>ets:select/1 or ets:select/3</c> + that has passed through external representation.</fsummary> <desc> - <p>This function can be used to restore an opaque continuation - returned by <c>ets:select/3</c> or <c>ets:select/1</c> if the + <p>Restores an opaque continuation returned by + <seealso marker="#select/3"><c>select/3</c></seealso> or + <seealso marker="#select/1"><c>select/1</c></seealso> if the continuation has passed through external term format (been sent between nodes or stored on disk).</p> <p>The reason for this function is that continuation terms - contain compiled match_specs and therefore will be - invalidated if converted to external term format. Given that - the original match_spec is kept intact, the continuation can + contain compiled match specifications and therefore are + invalidated if converted to external term format. Given that the + original match specification is kept intact, the continuation can be restored, meaning it can once again be used in subsequent - <c>ets:select/1</c> calls even though it has been stored on + <c>select/1</c> calls even though it has been stored on disk or on another node.</p> - <p>As an example, the following sequence of calls will fail:</p> + <p><em>Examples:</em></p> + <p>The following sequence of calls fails:</p> <code type="none"> T=ets:new(x,[]), ... @@ -1089,7 +1240,9 @@ A end),10), Broken = binary_to_term(term_to_binary(C)), ets:select(Broken).</code> - <p>...while the following sequence will work:</p> + <p>The following sequence works, as the call to + <c>repair_continuation/2</c> reestablishes the (deliberately) + invalidated continuation <c>Broken</c>.</p> <code type="none"> T=ets:new(x,[]), ... 
@@ -1100,45 +1253,44 @@ end), {_,C} = ets:select(T,MS,10), Broken = binary_to_term(term_to_binary(C)), ets:select(ets:repair_continuation(Broken,MS)).</code> - <p>...as the call to <c>ets:repair_continuation/2</c> will - reestablish the (deliberately) invalidated continuation - <c>Broken</c>.</p> <note> - <p>This function is very rarely needed in application code. It - is used by Mnesia to implement distributed <c>select/3</c> + <p>This function is rarely needed in application code. It is used + by Mnesia to provide distributed <c>select/3</c> and <c>select/1</c> sequences. A normal application would either use Mnesia or keep the continuation from being converted to external format.</p> <p>The reason for not having an external representation of a - compiled match_spec is performance. It may be subject to - change in future releases, while this interface will remain + compiled match specification is performance. It can be subject to + change in future releases, while this interface remains for backward compatibility.</p> </note> </desc> </func> + <func> <name name="safe_fixtable" arity="2"/> <fsummary>Fix an ETS table for safe traversal.</fsummary> <desc> - <p>Fixes a table of the <c>set</c>, <c>bag</c> or - <c>duplicate_bag</c> table type for safe traversal.</p> + <p>Fixes a table of type <c>set</c>, <c>bag</c>, or + <c>duplicate_bag</c> for safe traversal.</p> <p>A process fixes a table by calling - <c>safe_fixtable(<anno>Tab</anno>, true)</c>. The table remains fixed until - the process releases it by calling + <c>safe_fixtable(<anno>Tab</anno>, true)</c>. The table remains + fixed until the process releases it by calling <c>safe_fixtable(<anno>Tab</anno>, false)</c>, or until the process terminates.</p> - <p>If several processes fix a table, the table will remain fixed + <p>If many processes fix a table, the table remains fixed until all processes have released it (or terminated). 
A reference counter is kept on a per process basis, and N - consecutive fixes requires N releases to actually release - the table.</p> - <p>When a table is fixed, a sequence of <c>first/1</c> and - <c>next/2</c> calls are guaranteed to succeed and each object in - the table will only be returned once, even if objects - are removed or inserted during the traversal. - The keys for new objects inserted during the traversal <em>may</em> - be returned by <seealso marker="#next/2">next/2</seealso> - (it depends on the internal ordering of the keys). An example:</p> + consecutive fixes requires N releases to release the table.</p> + <p>When a table is fixed, a sequence of + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso> calls are + guaranteed to succeed, and each object in + the table is returned only once, even if objects + are removed or inserted during the traversal. The keys for new + objects inserted during the traversal <em>can</em> be returned by + <c>next/2</c> (it depends on the internal ordering of the keys).</p> + <p><em>Example:</em></p> <code type="none"> clean_all_with_value(Tab,X) -> safe_fixtable(Tab,true), @@ -1155,218 +1307,205 @@ clean_all_with_value(Tab,X,Key) -> true end, clean_all_with_value(Tab,X,ets:next(Tab,Key)).</code> - <p>Note that no deleted objects are actually removed from a + <p>Notice that no deleted objects are removed from a fixed table until it has been released. If a process fixes a table but never releases it, the memory used by the deleted - objects will never be freed. The performance of operations on - the table will also degrade significantly.</p> - <p>Use - <seealso marker="#info_2_safe_fixed_monotonic_time"><c>info(Tab, - safe_fixed_monotonic_time)</c></seealso> to retrieve information - about which processes have fixed which tables. A system with a lot - of processes fixing tables may need a monitor which sends alarms + objects is never freed. 
The performance of operations on + the table also degrades significantly.</p> + <p>To retrieve information about which processes have fixed which + tables, use <seealso marker="#info_2_safe_fixed_monotonic_time"> + <c>info(Tab, safe_fixed_monotonic_time)</c></seealso>. A system with + many processes fixing tables can need a monitor that sends alarms when tables have been fixed for too long.</p> - <p>Note that for tables of the <c>ordered_set</c> type, - <c>safe_fixtable/2</c> is not necessary as calls to - <c>first/1</c> and <c>next/2</c> will always succeed.</p> + <p>Notice that for table type <c>ordered_set</c>, + <c>safe_fixtable/2</c> is not necessary, as calls to + <c>first/1</c> and <c>next/2</c> always succeed.</p> </desc> </func> + + <func> + <name name="select" arity="1"/> + <fsummary>Continue matching objects in an ETS table.</fsummary> + <desc> + <p>Continues a match started with + <seealso marker="#select/3"><c>select/3</c></seealso>. The next + chunk of the size specified in the initial <c>select/3</c> + call is returned together with a new <c><anno>Continuation</anno></c>, + which can be used in subsequent calls to this function.</p> + <p>When there are no more objects in the table, <c>'$end_of_table'</c> + is returned.</p> + </desc> + </func> + <func> <name name="select" arity="2"/> - <fsummary>Match the objects in an ETS table against a match_spec.</fsummary> + <fsummary>Match the objects in an ETS table against a + match specification.</fsummary> <desc> - <p>Matches the objects in the table <c><anno>Tab</anno></c> using a - <seealso marker="#match_spec">match_spec</seealso>. This is a - more general call than the <c>ets:match/2</c> and - <c>ets:match_object/2</c> calls. 
In its simplest forms the - match_specs look like this:</p> - <list type="bulleted"> - <item>MatchSpec = [MatchFunction]</item> - <item>MatchFunction = {MatchHead, [Guard], [Result]}</item> - <item>MatchHead = "Pattern as in ets:match"</item> - <item>Guard = {"Guardtest name", ...}</item> - <item>Result = "Term construct"</item> - </list> - <p>This means that the match_spec is always a list of one or - more tuples (of arity 3). The tuples first element should be - a pattern as described in the documentation of - <c>ets:match/2</c>. The second element of the tuple should + <p>Matches the objects in table <c><anno>Tab</anno></c> using a + <seealso marker="#match_spec">match specification</seealso>. + This is a more general call than + <seealso marker="#match/2"><c>match/2</c></seealso> and + <seealso marker="#match_object/2"><c>match_object/2</c></seealso> + calls. In its simplest form, the match specification is as + follows:</p> + <code type="none"> +MatchSpec = [MatchFunction] +MatchFunction = {MatchHead, [Guard], [Result]} +MatchHead = "Pattern as in ets:match" +Guard = {"Guardtest name", ...} +Result = "Term construct"</code> + <p>This means that the match specification is always a list of one or + more tuples (of arity 3). The first element of the tuple is to be + a pattern as described in + <seealso marker="#match/2"><c>match/2</c></seealso>. + The second element of the tuple is to be a list of 0 or more guard tests (described below). The - third element of the tuple should be a list containing a - description of the value to actually return. In almost all - normal cases the list contains exactly one term which fully + third element of the tuple is to be a list containing a + description of the value to return. 
In almost all + normal cases, the list contains exactly one term that fully describes the value to return for each object.</p> <p>The return value is constructed using the "match variables" - bound in the MatchHead or using the special match variables + bound in <c>MatchHead</c> or using the special match variables <c>'$_'</c> (the whole matching object) and <c>'$$'</c> (all match variables in a list), so that the following - <c>ets:match/2</c> expression:</p> + <c>match/2</c> expression:</p> <code type="none"> ets:match(Tab,{'$1','$2','$3'})</code> <p>is exactly equivalent to:</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$3'},[],['$$']}])</code> - <p>- and the following <c>ets:match_object/2</c> call:</p> + <p>And that the following <c>match_object/2</c> call:</p> <code type="none"> ets:match_object(Tab,{'$1','$2','$1'})</code> <p>is exactly equivalent to</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$1'},[],['$_']}])</code> <p>Composite terms can be constructed in the <c>Result</c> part - either by simply writing a list, so that this code:</p> + either by simply writing a list, so that the following code:</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$3'},[],['$$']}])</code> <p>gives the same output as:</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$3'},[],[['$1','$2','$3']]}])</code> - <p>i.e. all the bound variables in the match head as a list. If + <p>That is, all the bound variables in the match head as a list. If tuples are to be constructed, one has to write a tuple of - arity 1 with the single element in the tuple being the tuple - one wants to construct (as an ordinary tuple could be mistaken - for a <c>Guard</c>). 
Therefore the following call:</p> + arity 1 where the single element in the tuple is the tuple + one wants to construct (as an ordinary tuple can be mistaken + for a <c>Guard</c>).</p> + <p>Therefore the following call:</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$1'},[],['$_']}])</code> <p>gives the same output as:</p> <code type="none"> ets:select(Tab,[{{'$1','$2','$1'},[],[{{'$1','$2','$3'}}]}])</code> - <p>- this syntax is equivalent to the syntax used in the trace - patterns (see - <seealso marker="runtime_tools:dbg">dbg(3)</seealso>).</p> - <p>The <c>Guard</c>s are constructed as tuples where the first - element is the name of the test and the rest of the elements - are the parameters of the test. To check for a specific type + <p>This syntax is equivalent to the syntax used in the trace + patterns (see the + <seealso marker="runtime_tools:dbg"> + <c>dbg(3)</c></seealso>) module in Runtime_Tools.</p> + <p>The <c>Guard</c>s are constructed as tuples, where the first + element is the test name and the remaining elements + are the test parameters. To check for a specific type (say a list) of the element bound to the match variable <c>'$1'</c>, one would write the test as <c>{is_list, '$1'}</c>. If the test fails, the object in the - table will not match and the next <c>MatchFunction</c> (if - any) will be tried. Most guard tests present in Erlang can be + table does not match and the next <c>MatchFunction</c> (if + any) is tried. 
Most guard tests present in Erlang can be used, but only the new versions prefixed <c>is_</c> are - allowed (like <c>is_float</c>, <c>is_atom</c> etc).</p> + allowed (<c>is_float</c>, <c>is_atom</c>, and so on).</p> <p>The <c>Guard</c> section can also contain logic and arithmetic operations, which are written with the same syntax - as the guard tests (prefix notation), so that a guard test - written in Erlang looking like this:</p> + as the guard tests (prefix notation), so that the following + guard test written in Erlang:</p> <code type="none"><![CDATA[ is_integer(X), is_integer(Y), X + Y < 4711]]></code> - <p>is expressed like this (X replaced with '$1' and Y with - '$2'):</p> + <p>is expressed as follows (<c>X</c> replaced with <c>'$1'</c> and + <c>Y</c> with <c>'$2'</c>):</p> <code type="none"><![CDATA[ [{is_integer, '$1'}, {is_integer, '$2'}, {'<', {'+', '$1', '$2'}, 4711}]]]></code> - <p>On tables of the <c>ordered_set</c> type, objects are visited - in the same order as in a <c>first/next</c> - traversal. This means that the match specification will be - executed against objects with keys in the <c>first/next</c> - order and the corresponding result list will be in the order of that + <p>For tables of type <c>ordered_set</c>, objects are visited + in the same order as in a <c>first</c>/<c>next</c> + traversal. This means that the match specification is + executed against objects with keys in the <c>first</c>/<c>next</c> + order and the corresponding result list is in the order of that execution.</p> - </desc> </func> + <func> <name name="select" arity="3"/> - <fsummary>Match the objects in an ETS table against a match_spec and returns part of the answers.</fsummary> + <fsummary>Match the objects in an ETS table against a match + specification and return part of the answers.</fsummary> <desc> - <p>Works like <c>ets:select/2</c> but only returns a limited - (<c><anno>Limit</anno></c>) number of matching objects. 
The - <c><anno>Continuation</anno></c> term can then be used in subsequent calls - to <c>ets:select/1</c> to get the next chunk of matching - objects. This is a space efficient way to work on objects in a - table which is still faster than traversing the table object - by object using <c>ets:first/1</c> and <c>ets:next/1</c>.</p> - <p><c>'$end_of_table'</c> is returned if the table is empty.</p> - </desc> - </func> - <func> - <name name="select" arity="1"/> - <fsummary>Continue matching objects in an ETS table.</fsummary> - <desc> - <p>Continues a match started with - <c>ets:select/3</c>. The next - chunk of the size given in the initial <c>ets:select/3</c> - call is returned together with a new <c><anno>Continuation</anno></c> - that can be used in subsequent calls to this function.</p> - <p><c>'$end_of_table'</c> is returned when there are no more - objects in the table.</p> + <p>Works like <seealso marker="#select/2"><c>select/2</c></seealso>, + but only returns a limited + (<c><anno>Limit</anno></c>) number of matching objects. Term + <c><anno>Continuation</anno></c> can then be used in subsequent + calls to <seealso marker="#select/1"><c>select/1</c></seealso> + to get the next chunk of matching + objects. 
This is a space-efficient way to work on objects in a + table, which is still faster than traversing the table object by + object using <seealso marker="#first/1"><c>first/1</c></seealso> + and <seealso marker="#next/2"><c>next/2</c></seealso>.</p> + <p>If the table is empty, <c>'$end_of_table'</c> is returned.</p> </desc> </func> + <func> <name name="select_count" arity="2"/> - <fsummary>Match the objects in an ETS table against a match_spec and returns the number of objects for which the match_spec returned 'true'</fsummary> + <fsummary>Match the objects in an ETS table against a match + specification and return the number of objects for which the match + specification returned <c>true</c>.</fsummary> <desc> - <p>Matches the objects in the table <c><anno>Tab</anno></c> using a - <seealso marker="#match_spec">match_spec</seealso>. If the - match_spec returns <c>true</c> for an object, that object + <p>Matches the objects in table <c><anno>Tab</anno></c> using a + <seealso marker="#match_spec">match specification</seealso>. If the + match specification returns <c>true</c> for an object, that object considered a match and is counted. 
For any other result from - the match_spec the object is not considered a match and is + the match specification the object is not considered a match and is therefore not counted.</p> - <p>The function could be described as a <c>match_delete/2</c> - that does not actually delete any elements, but only counts - them.</p> + <p>This function can be described as a + <seealso marker="#match_delete/2"><c>match_delete/2</c></seealso> + function that does not delete any elements, but only counts them.</p> <p>The function returns the number of objects matched.</p> </desc> </func> + <func> <name name="select_delete" arity="2"/> - <fsummary>Match the objects in an ETS table against a match_spec and deletes objects where the match_spec returns 'true'</fsummary> + <fsummary>Match the objects in an ETS table against a match + specification and delete objects where the match specification + returns <c>true</c>.</fsummary> <desc> - <p>Matches the objects in the table <c><anno>Tab</anno></c> using a - <seealso marker="#match_spec">match_spec</seealso>. If the - match_spec returns <c>true</c> for an object, that object is - removed from the table. For any other result from the - match_spec the object is retained. This is a more general - call than the <c>ets:match_delete/2</c> call.</p> - <p>The function returns the number of objects actually + <p>Matches the objects in table <c><anno>Tab</anno></c> using a + <seealso marker="#match_spec">match specification</seealso>. If the + match specification returns <c>true</c> for an object, that object is + removed from the table. For any other result from the match + specification the object is retained. This is a more general + call than the <seealso marker="#match_delete/2"> + <c>match_delete/2</c></seealso> call.</p> + <p>The function returns the number of objects deleted from the table.</p> <note> - <p>The <c>match_spec</c> has to return the atom <c>true</c> if - the object is to be deleted. 
No other return value will get the - object deleted, why one can not use the same match specification for + <p>The match specification has to return the atom <c>true</c> if + the object is to be deleted. No other return value gets the + object deleted. So one cannot use the same match specification for looking up elements as for deleting them.</p> </note> </desc> </func> - <func> - <name name="select_reverse" arity="2"/> - <fsummary>Match the objects in an ETS table against a match_spec.</fsummary> - <desc> - - <p>Works like <c>select/2</c>, but returns the list in reverse - order for the <c>ordered_set</c> table type. For all other table - types, the return value is identical to that of <c>select/2</c>.</p> - - </desc> - </func> - <func> - <name name="select_reverse" arity="3"/> - <fsummary>Match the objects in an ETS table against a match_spec and returns part of the answers.</fsummary> - <desc> - - <p>Works like <c>select/3</c>, but for the <c>ordered_set</c> - table type, traversing is done starting at the last object in - Erlang term order and moves towards the first. For all other - table types, the return value is identical to that of - <c>select/3</c>.</p> - <p>Note that this is <em>not</em> equivalent to - reversing the result list of a <c>select/3</c> call, as the result list - is not only reversed, but also contains the last <c><anno>Limit</anno></c> - matching objects in the table, not the first.</p> - - </desc> - </func> <func> <name name="select_reverse" arity="1"/> <fsummary>Continue matching objects in an ETS table.</fsummary> <desc> - - <p>Continues a match started with - <c>ets:select_reverse/3</c>. If the table is an - <c>ordered_set</c>, the traversal of the table will continue - towards objects with keys earlier in the Erlang term order. 
The - returned list will also contain objects with keys in reverse - order.</p> - - <p>For all other table types, the behaviour is exactly that of <c>select/1</c>.</p> - <p>Example:</p> + <p>Continues a match started with <seealso marker="#select_reverse/3"> + <c>select_reverse/3</c></seealso>. For tables of type + <c>ordered_set</c>, the traversal of the table continues + to objects with keys earlier in the Erlang term order. The + returned list also contains objects with keys in reverse order. + For all other table types, the behavior is exactly that of + <seealso marker="#select/1"><c>select/1</c></seealso>.</p> + <p><em>Example:</em></p> <code> 1> T = ets:new(x,[ordered_set]). 2> [ ets:insert(T,{N}) || N <- lists:seq(1,10) ]. @@ -1384,217 +1523,288 @@ is_integer(X), is_integer(Y), X + Y < 4711]]></code> 8> R2. [{2},{1}] 9> '$end_of_table' = ets:select_reverse(C2). -... - </code> +...</code> </desc> </func> + + <func> + <name name="select_reverse" arity="2"/> + <fsummary>Match the objects in an ETS table against a + match specification.</fsummary> + <desc> + <p>Works like <seealso marker="#select/2"><c>select/2</c></seealso>, + but returns the list in reverse order for table type <c>ordered_set</c>. + For all other table types, the return value is identical to that of + <c>select/2</c>.</p> + </desc> + </func> + + <func> + <name name="select_reverse" arity="3"/> + <fsummary>Match the objects in an ETS table against a + match specification and return part of the answers.</fsummary> + <desc> + <p>Works like <seealso marker="#select/3"><c>select/3</c></seealso>, + but for table type <c>ordered_set</c> + traversing is done starting at the last object in + Erlang term order and moves to the first. 
For all other table + types, the return value is identical to that of <c>select/3</c>.</p> + <p>Notice that this is <em>not</em> equivalent to + reversing the result list of a <c>select/3</c> call, as the result list + is not only reversed, but also contains the last + <c><anno>Limit</anno></c> + matching objects in the table, not the first.</p> + </desc> + </func> + <func> <name name="setopts" arity="2"/> <fsummary>Set table options.</fsummary> <desc> - <p>Set table options. The only option that currently is allowed to be - set after the table has been created is - <seealso marker="#heir">heir</seealso>. The calling process must be - the table owner.</p> + <p>Sets table options. The only allowed option to be set after the + table has been created is + <seealso marker="#heir"><c>heir</c></seealso>. + The calling process must be the table owner.</p> </desc> </func> + <func> <name name="slot" arity="2"/> - <fsummary>Return all objects in a given slot of an ETS table.</fsummary> + <fsummary>Return all objects in a specified slot of an ETS table. + </fsummary> <desc> <p>This function is mostly for debugging purposes, Normally - one should use <c>first/next</c> or <c>last/prev</c> instead.</p> - <p>Returns all objects in the <c><anno>I</anno></c>:th slot of the table - <c><anno>Tab</anno></c>. A table can be traversed by repeatedly calling - the function, starting with the first slot <c><anno>I</anno>=0</c> and + <c>first</c>/<c>next</c> or <c>last</c>/<c>prev</c> are to be used + instead.</p> + <p>Returns all objects in slot <c><anno>I</anno></c> of table + <c><anno>Tab</anno></c>. A table can be traversed by repeatedly + calling the function, + starting with the first slot <c><anno>I</anno>=0</c> and ending when <c>'$end_of_table'</c> is returned. 
- The function will fail with reason <c>badarg</c> if the - <c><anno>I</anno></c> argument is out of range.</p> - <p>Unless a table of type <c>set</c>, <c>bag</c> or + If argument <c><anno>I</anno></c> is out of range, + the function fails with reason <c>badarg</c>.</p> + <p>Unless a table of type <c>set</c>, <c>bag</c>, or <c>duplicate_bag</c> is protected using - <c>safe_fixtable/2</c>, see above, a traversal may fail if - concurrent updates are made to the table. If the table is of - type <c>ordered_set</c>, the function returns a list - containing the <c><anno>I</anno></c>:th object in Erlang term order.</p> + <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso>, + a traversal can fail if + concurrent updates are made to the table. For table type + <c>ordered_set</c>, the function returns a list containing + object <c><anno>I</anno></c> in Erlang term order.</p> </desc> </func> + <func> <name name="tab2file" arity="2"/> <fsummary>Dump an ETS table to a file.</fsummary> <desc> - <p>Dumps the table <c><anno>Tab</anno></c> to the file <c><anno>Filename</anno></c>.</p> - <p>Equivalent to <c>tab2file(<anno>Tab</anno>, <anno>Filename</anno>,[])</c></p> - + <p>Dumps table <c><anno>Tab</anno></c> to file + <c><anno>Filename</anno></c>.</p> + <p>Equivalent to + <c>tab2file(<anno>Tab</anno>, <anno>Filename</anno>,[])</c></p> </desc> </func> + <func> <name name="tab2file" arity="3"/> <fsummary>Dump an ETS table to a file.</fsummary> <desc> - <p>Dumps the table <c><anno>Tab</anno></c> to the file <c><anno>Filename</anno></c>.</p> - <p>When dumping the table, certain information about the table - is dumped to a header at the beginning of the dump. This - information contains data about the table type, - name, protection, size, version and if it's a named table. 
It - also contains notes about what extended information is added - to the file, which can be a count of the objects in the file - or a MD5 sum of the header and records in the file.</p> - <p>The size field in the header might not correspond to the - actual number of records in the file if the table is public - and records are added or removed from the table during - dumping. Public tables updated during dump, and that one wants - to verify when reading, needs at least one field of extended - information for the read verification process to be reliable - later.</p> - <p>The <c>extended_info</c> option specifies what extra - information is written to the table dump:</p> - <taglist> - <tag><c>object_count</c></tag> - <item><p>The number of objects actually written to the file is - noted in the file footer, why verification of file truncation - is possible even if the file was updated during - dump.</p></item> - <tag><c>md5sum</c></tag> - <item><p>The header and objects in the file are checksummed using - the built in MD5 functions. The MD5 sum of all objects is - written in the file footer, so that verification while reading - will detect the slightest bitflip in the file data. Using this - costs a fair amount of CPU time.</p></item> - </taglist> - <p>Whenever the <c>extended_info</c> option is used, it - results in a file not readable by versions of ets prior to - that in stdlib-1.15.1</p> - <p>The <c>sync</c> option, if set to <c>true</c>, ensures that - the content of the file is actually written to the disk before - <c>tab2file</c> returns. Default is <c>{sync, false}</c>.</p> + <p>Dumps table <c><anno>Tab</anno></c> to file + <c><anno>Filename</anno></c>.</p> + <p>When dumping the table, some information about the table + is dumped to a header at the beginning of the dump. This + information contains data about the table type, + name, protection, size, version, and if it is a named table. 
It + also contains notes about what extended information is added + to the file, which can be a count of the objects in the file + or a MD5 sum of the header and records in the file.</p> + <p>The size field in the header might not correspond to the + number of records in the file if the table is public + and records are added or removed from the table during + dumping. Public tables updated during dump, and that one wants + to verify when reading, needs at least one field of extended + information for the read verification process to be reliable + later.</p> + <p>Option <c>extended_info</c> specifies what extra + information is written to the table dump:</p> + <taglist> + <tag><c>object_count</c></tag> + <item> + <p>The number of objects written to the file is + noted in the file footer, so file truncation can be + verified even if the file was updated during dump.</p> + </item> + <tag><c>md5sum</c></tag> + <item> + <p>The header and objects in the file are checksummed using + the built-in MD5 functions. The MD5 sum of all objects is + written in the file footer, so that verification while reading + detects the slightest bitflip in the file data. Using this + costs a fair amount of CPU time.</p> + </item> + </taglist> + <p>Whenever option <c>extended_info</c> is used, it + results in a file not readable by versions of ETS before + that in STDLIB 1.15.1</p> + <p>If option <c>sync</c> is set to <c>true</c>, it ensures that + the content of the file is written to the disk before + <c>tab2file</c> returns. 
Defaults to <c>{sync, false}</c>.</p> </desc> </func> + <func> <name name="tab2list" arity="1"/> <fsummary>Return a list of all objects in an ETS table.</fsummary> <desc> - <p>Returns a list of all objects in the table <c><anno>Tab</anno></c>.</p> + <p>Returns a list of all objects in table <c><anno>Tab</anno></c>.</p> </desc> </func> + <func> <name name="tabfile_info" arity="1"/> <fsummary>Return a list of all objects in an ETS table.</fsummary> <desc> - <p>Returns information about the table dumped to file by - <seealso marker="#tab2file/2">tab2file/2</seealso> or - <seealso marker="#tab2file/3">tab2file/3</seealso></p> - <p>The following items are returned:</p> - <taglist> - <tag>name</tag> - <item><p>The name of the dumped table. If the table was a - named table, a table with the same name cannot exist when the - table is loaded from file with - <seealso marker="#file2tab/2">file2tab/2</seealso>. If the table is - not saved as a named table, this field has no significance - at all when loading the table from file.</p></item> - <tag>type</tag> - <item>The ets type of the dumped table (i.e. <c>set</c>, <c>bag</c>, - <c>duplicate_bag</c> or <c>ordered_set</c>). This type will be used - when loading the table again.</item> - <tag>protection</tag> - <item>The protection of the dumped table (i.e. <c>private</c>, - <c>protected</c> or <c>public</c>). A table loaded from the file - will get the same protection.</item> - <tag>named_table</tag> - <item><c>true</c> if the table was a named table when dumped - to file, otherwise <c>false</c>. 
Note that when a named table - is loaded from a file, there cannot exist a table in the - system with the same name.</item> - <tag>keypos</tag> - <item>The <c>keypos</c> of the table dumped to file, which - will be used when loading the table again.</item> - <tag>size</tag> - <item>The number of objects in the table when the table dump - to file started, which in case of a <c>public</c> table need - not correspond to the number of objects actually saved to the - file, as objects might have been added or deleted by another - process during table dump.</item> - <tag>extended_info</tag> - <item>The extended information written in the file footer to - allow stronger verification during table loading from file, as - specified to <seealso - marker="#tab2file/3">tab2file/3</seealso>. Note that this - function only tells <em>which</em> information is present, not - the values in the file footer. The value is a list containing - one or more of the atoms <c>object_count</c> and - <c>md5sum</c>.</item> - <tag>version</tag> - <item>A tuple <c>{<anno>Major</anno>,<anno>Minor</anno>}</c> containing the major and - minor version of the file format for ets table dumps. This - version field was added beginning with stdlib-1.5.1, files - dumped with older versions will return <c>{0,0}</c> in this - field.</item> - </taglist> - <p>An error is returned if the file is inaccessible, - badly damaged or not an file produced with <seealso - marker="#tab2file/2">tab2file/2</seealso> or <seealso - marker="#tab2file/3">tab2file/3</seealso>.</p> + <p>Returns information about the table dumped to file by + <seealso marker="#tab2file/2"><c>tab2file/2</c></seealso> or + <seealso marker="#tab2file/3"><c>tab2file/3</c></seealso>.</p> + <p>The following items are returned:</p> + <taglist> + <tag><c>name</c></tag> + <item> + <p>The name of the dumped table. 
If the table was a + named table, a table with the same name cannot exist when the + table is loaded from file with + <seealso marker="#file2tab/2"><c>file2tab/2</c></seealso>. + If the table is + not saved as a named table, this field has no significance + when loading the table from file.</p> + </item> + <tag><c>type</c></tag> + <item> + <p>The ETS type of the dumped table (that is, <c>set</c>, + <c>bag</c>, <c>duplicate_bag</c>, or <c>ordered_set</c>). This + type is used when loading the table again.</p> + </item> + <tag><c>protection</c></tag> + <item> + <p>The protection of the dumped table (that is, <c>private</c>, + <c>protected</c>, or <c>public</c>). A table loaded from the + file gets the same protection.</p> + </item> + <tag><c>named_table</c></tag> + <item> + <p><c>true</c> if the table was a named table when dumped + to file, otherwise <c>false</c>. Notice that when a named table + is loaded from a file, there cannot exist a table in the + system with the same name.</p> + </item> + <tag><c>keypos</c></tag> + <item> + <p>The <c>keypos</c> of the table dumped to file, which + is used when loading the table again.</p> + </item> + <tag><c>size</c></tag> + <item> + <p>The number of objects in the table when the table dump + to file started. For a <c>public</c> table, this number + does not need to correspond to the number of objects saved to + the file, as objects can have been added or deleted by another + process during table dump.</p> + </item> + <tag><c>extended_info</c></tag> + <item> + <p>The extended information written in the file footer to + allow stronger verification during table loading from file, as + specified to <seealso marker="#tab2file/3"> + <c>tab2file/3</c></seealso>. Notice that this + function only tells <em>which</em> information is present, not + the values in the file footer. 
The value is a list containing one + or more of the atoms <c>object_count</c> and <c>md5sum</c>.</p> + </item> + <tag><c>version</c></tag> + <item> + <p>A tuple <c>{<anno>Major</anno>,<anno>Minor</anno>}</c> + containing the major and + minor version of the file format for ETS table dumps. This + version field was added beginning with STDLIB 1.5.1. + Files dumped with older versions return <c>{0,0}</c> in this + field.</p> + </item> + </taglist> + <p>An error is returned if the file is inaccessible, + badly damaged, or not produced with + <seealso marker="#tab2file/2"><c>tab2file/2</c></seealso> or + <seealso marker="#tab2file/3"><c>tab2file/3</c></seealso>.</p> </desc> </func> + <func> <name name="table" arity="1"/> <name name="table" arity="2"/> <fsummary>Return a QLC query handle.</fsummary> <desc> - <p><marker id="qlc_table"></marker>Returns a QLC (Query List - Comprehension) query handle. The module <c>qlc</c> implements - a query language aimed mainly at Mnesia but ETS tables, Dets - tables, and lists are also recognized by QLC as sources of - data. Calling <c>ets:table/1,2</c> is the means to make the + <p>Returns a Query List + Comprehension (QLC) query handle. The + <seealso marker="qlc"><c>qlc</c></seealso> module provides + a query language aimed mainly at Mnesia, but ETS + tables, Dets tables, + and lists are also recognized by QLC as sources of + data. Calling <c>table/1,2</c> is the means to make the ETS table <c>Tab</c> usable to QLC.</p> - <p>When there are only simple restrictions on the key position - QLC uses <c>ets:lookup/2</c> to look up the keys, but when - that is not possible the whole table is traversed. The - option <c>traverse</c> determines how this is done:</p> - <list type="bulleted"> + <p>When there are only simple restrictions on the key position, + QLC uses <seealso marker="#lookup/2"><c>lookup/2</c></seealso> + to look up the keys. When + that is not possible, the whole table is traversed. 
+ Option <c>traverse</c> determines how this is done:</p> + <taglist> + <tag><c>first_next</c></tag> <item> - <p><c>first_next</c>. The table is traversed one key at - a time by calling <c>ets:first/1</c> and - <c>ets:next/2</c>.</p> + <p>The table is traversed one key at a time by calling + <seealso marker="#first/1"><c>first/1</c></seealso> and + <seealso marker="#next/2"><c>next/2</c></seealso>.</p> </item> + <tag><c>last_prev</c></tag> <item> - <p><c>last_prev</c>. The table is traversed one key at - a time by calling <c>ets:last/1</c> and - <c>ets:prev/2</c>.</p> + <p>The table is traversed one key at a time by calling + <seealso marker="#last/1"><c>last/1</c></seealso> and + <seealso marker="#prev/2"><c>prev/2</c></seealso>.</p> </item> + <tag><c>select</c></tag> <item> - <p><c>select</c>. The table is traversed by calling - <c>ets:select/3</c> and <c>ets:select/1</c>. The option - <c>n_objects</c> determines the number of objects + <p>The table is traversed by calling + <seealso marker="#select/3"><c>select/3</c></seealso> and + <seealso marker="#select/1"><c>select/1</c></seealso>. + Option <c>n_objects</c> determines the number of objects returned (the third argument of <c>select/3</c>); the default is to return <c>100</c> objects at a time. 
The - <seealso marker="#match_spec">match_spec</seealso> (the - second argument of <c>select/3</c>) is assembled by QLC: - simple filters are translated into equivalent match_specs - while more complicated filters have to be applied to all - objects returned by <c>select/3</c> given a match_spec + <seealso marker="#match_spec">match specification</seealso> (the + second argument of <c>select/3</c>) is assembled by QLC: simple + filters are translated into equivalent match specifications + while more complicated filters must be applied to all + objects returned by <c>select/3</c> given a match specification that matches all objects.</p> </item> + <tag><c>{select, <anno>MatchSpec</anno>}</c></tag> <item> - <p><c>{select, <anno>MatchSpec</anno>}</c>. As for <c>select</c> - the table is traversed by calling <c>ets:select/3</c> and - <c>ets:select/1</c>. The difference is that the - match_spec is explicitly given. This is how to state - match_specs that cannot easily be expressed within the - syntax provided by QLC.</p> + <p>As for <c>select</c>, the table is traversed by calling + <seealso marker="#select/3"><c>select/3</c></seealso> and + <seealso marker="#select/1"><c>select/1</c></seealso>. + The difference is that the match specification is explicitly + specified. 
This is how to state match specifications that cannot + easily be expressed within the syntax provided by QLC.</p> </item> - </list> - <p>The following example uses an explicit match_spec to - traverse the table:</p> + </taglist> + <p><em>Examples:</em></p> + <p>An explicit match specification is here used to traverse the + table:</p> <pre> 9> <input>true = ets:insert(Tab = ets:new(t, []), [{1,a},{2,b},{3,c},{4,d}]),</input> <input>MS = ets:fun2ms(fun({X,Y}) when (X > 1) or (X < 5) -> {Y} end),</input> <input>QH1 = ets:table(Tab, [{traverse, {select, MS}}]).</input></pre> - <p>An example with implicit match_spec:</p> + <p>An example with an implicit match specification:</p> <pre> 10> <input>QH2 = qlc:q([{Y} || {X,Y} <- ets:table(Tab), (X > 1) or (X < 5)]).</input></pre> - <p>The latter example is in fact equivalent to the former which - can be verified using the function <c>qlc:info/1</c>:</p> + <p>The latter example is equivalent to the former, which + can be verified using function <c>qlc:info/1</c>:</p> <pre> 11> <input>qlc:info(QH1) =:= qlc:info(QH2).</input> true</pre> @@ -1603,52 +1813,60 @@ true</pre> two query handles.</p> </desc> </func> + + <func> + <name name="take" arity="2"/> + <fsummary>Return and remove all objects with a specified key from an + ETS table.</fsummary> + <desc> + <p>Returns and removes a list of all objects with key + <c><anno>Key</anno></c> in table <c><anno>Tab</anno></c>.</p> + <p>The specified <c><anno>Key</anno></c> is used to identify the object + by either <em>comparing equal</em> the key of an object in an + <c>ordered_set</c> table, or <em>matching</em> in other types of + tables (for details on the difference, see + <seealso marker="#lookup/2"><c>lookup/2</c></seealso> and + <seealso marker="#new/2"><c>new/2</c></seealso>).</p> + </desc> + </func> <func> <name name="test_ms" arity="2"/> - <fsummary>Test a match_spec for use in ets:select/2.</fsummary> + <fsummary>Test a match specification for use in <c>select/2</c>. 
+ </fsummary> <desc> <p>This function is a utility to test a - <seealso marker="#match_spec">match_spec</seealso> used in - calls to <c>ets:select/2</c>. The function both tests - <c><anno>MatchSpec</anno></c> for "syntactic" correctness and runs the - match_spec against the object <c><anno>Tuple</anno></c>. If the match_spec - contains errors, the tuple <c>{error, <anno>Errors</anno>}</c> is returned + <seealso marker="#match_spec">match specification</seealso> used in + calls to <seealso marker="#select/2"><c>select/2</c></seealso>. + The function both tests <c><anno>MatchSpec</anno></c> for "syntactic" + correctness and runs the match specification against object + <c><anno>Tuple</anno></c>.</p> + <p>If the match specification is syntactically correct, the function + either returns <c>{ok,<anno>Result</anno>}</c>, where + <c><anno>Result</anno></c> is what would have been the result in a + real <c>select/2</c> call, or <c>false</c> if the match specification + does not match object <c><anno>Tuple</anno></c>.</p> + <p>If the match specification contains errors, tuple + <c>{error, <anno>Errors</anno>}</c> is returned, where <c><anno>Errors</anno></c> is a list of natural language - descriptions of what was wrong with the match_spec. 
If the - match_spec is syntactically OK, the function returns - <c>{ok,<anno>Result</anno>}</c> where <c><anno>Result</anno></c> is what would have been - the result in a real <c>ets:select/2</c> call or <c>false</c> - if the match_spec does not match the object <c><anno>Tuple</anno></c>.</p> + descriptions of what was wrong with the match specification.</p> <p>This is a useful debugging and test tool, especially when - writing complicated <c>ets:select/2</c> calls.</p> + writing complicated <c>select/2</c> calls.</p> <p>See also: <seealso marker="erts:erlang#match_spec_test/3"> erlang:match_spec_test/3</seealso>.</p> </desc> </func> - <func> - <name name="take" arity="2"/> - <fsummary>Return and remove all objects with a given key from an ETS - table.</fsummary> - <desc> - <p>Returns a list of all objects with the key <c><anno>Key</anno></c> in - the table <c><anno>Tab</anno></c> and removes.</p> - <p>The given <c><anno>Key</anno></c> is used to identify the object by - either <em>comparing equal</em> the key of an object in an - <c>ordered_set</c> table, or <em>matching</em> in other types of - tables (see <seealso marker="#lookup/2">lookup/2</seealso> and - <seealso marker="#new/2">new/2</seealso> for details on the - difference).</p> - </desc> - </func> + <func> <name name="to_dets" arity="2"/> - <fsummary>Fill a Dets table with objects from an ETS table.</fsummary> + <fsummary>Fill a Dets table with objects from an ETS table. + </fsummary> <desc> <p>Fills an already created/opened Dets table with the objects - in the already opened ETS table named <c><anno>Tab</anno></c>. The Dets - table is emptied before the objects are inserted.</p> + in the already opened ETS table named <c><anno>Tab</anno></c>. 
+ The Dets table is emptied before the objects are inserted.</p> </desc> </func> + <func> <name name="update_counter" arity="3" clause_i="1"/> <name name="update_counter" arity="4" clause_i="1"/> @@ -1666,107 +1884,112 @@ true</pre> <type variable="Default"/> <desc> <p>This function provides an efficient way to update one or more - counters, without the hassle of having to look up an object, update - the object by incrementing an element and insert the resulting object - into the table again. (The update is done atomically; i.e. no process - can access the ets table in the middle of the operation.) - </p> - <p>It will destructively update the object with key <c><anno>Key</anno></c> - in the table <c><anno>Tab</anno></c> by adding <c><anno>Incr</anno></c> to the element - at the <c><anno>Pos</anno></c>:th position. The new counter value is + counters, without the trouble of having to look up an object, update + the object by incrementing an element, and insert the resulting + object into the table again. (The update is done atomically, + that is, no process + can access the ETS table in the middle of the operation.)</p> + <p>This function destructively updates the object with key + <c><anno>Key</anno></c> in table <c><anno>Tab</anno></c> by adding + <c><anno>Incr</anno></c> to the element at position + <c><anno>Pos</anno></c>. The new counter value is returned. 
If no position is specified, the element directly - following the key (<c><![CDATA[<keypos>+1]]></c>) is updated.</p> - <p>If a <c><anno>Threshold</anno></c> is specified, the counter will be - reset to the value <c><anno>SetValue</anno></c> if the following + following key (<c><![CDATA[<keypos>+1]]></c>) is updated.</p> + <p>If a <c><anno>Threshold</anno></c> is specified, the counter is + reset to value <c><anno>SetValue</anno></c> if the following conditions occur:</p> <list type="bulleted"> - <item>The <c><anno>Incr</anno></c> is not negative (<c>>= 0</c>) and the - result would be greater than (<c>></c>) <c><anno>Threshold</anno></c></item> - <item>The <c><anno>Incr</anno></c> is negative (<c><![CDATA[< 0]]></c>) and the - result would be less than (<c><![CDATA[<]]></c>) - <c><anno>Threshold</anno></c></item> + <item><p><c><anno>Incr</anno></c> is not negative (<c>>= 0</c>) and + the result would be greater than (<c>></c>) + <c><anno>Threshold</anno></c>.</p> + </item> + <item><p><c><anno>Incr</anno></c> is negative + (<c><![CDATA[< 0]]></c>) and the result would be less than + (<c><![CDATA[<]]></c>) <c><anno>Threshold</anno></c>.</p> + </item> </list> - <p>A list of <c><anno>UpdateOp</anno></c> can be supplied to do several update - operations within the object. The operations are carried out in the - order specified in the list. If the same counter position occurs - more than one time in the list, the corresponding counter will thus - be updated several times, each time based on the previous result. - The return value is a list of the new counter values from each - update operation in the same order as in the operation list. If an - empty list is specified, nothing is updated and an empty list is - returned. If the function should fail, no updates will be done at - all. 
- </p> - <p>The given <c><anno>Key</anno></c> is used to identify the object by either - <em>matching</em> the key of an object in a <c>set</c> table, - or <em>compare equal</em> to the key of an object in an - <c>ordered_set</c> table (see - <seealso marker="#lookup/2">lookup/2</seealso> and - <seealso marker="#new/2">new/2</seealso> - for details on the difference).</p> - <p>If a default object <c><anno>Default</anno></c> is given, it is used + <p>A list of <c><anno>UpdateOp</anno></c> can be supplied to do many + update operations within the object. + The operations are carried out in the + order specified in the list. If the same counter position occurs + more than once in the list, the corresponding counter is thus + updated many times, each time based on the previous result. + The return value is a list of the new counter values from each + update operation in the same order as in the operation list. If an + empty list is specified, nothing is updated and an empty list is + returned. If the function fails, no updates are done.</p> + <p>The specified <c><anno>Key</anno></c> is used to identify the object + by either <em>matching</em> the key of an object in a <c>set</c> + table, or <em>compare equal</em> to the key of an object in an + <c>ordered_set</c> table (for details on the difference, see + <seealso marker="#lookup/2"><c>lookup/2</c></seealso> and + <seealso marker="#new/2"><c>new/2</c></seealso>).</p> + <p>If a default object <c><anno>Default</anno></c> is specified, + it is used as the object to be updated if the key is missing from the table. The value in place of the key is ignored and replaced by the proper key value. 
The return value is as if the default object had not been used, - that is a single updated element or a list of them.</p> - <p>The function will fail with reason <c>badarg</c> if:</p> + that is, a single updated element or a list of them.</p> + <p>The function fails with reason <c>badarg</c> in the following + situations:</p> <list type="bulleted"> - <item>the table is not of type <c>set</c> or - <c>ordered_set</c>,</item> - <item>no object with the right key exists and no default object were - supplied,</item> - <item>the object has the wrong arity,</item> - <item>the default object arity is smaller than - <c><![CDATA[<keypos>]]></c></item> - <item>any field from the default object being updated is not an - integer</item> - <item>the element to update is not an integer,</item> - <item>the element to update is also the key, or,</item> - <item>any of <c><anno>Pos</anno></c>, <c><anno>Incr</anno></c>, <c><anno>Threshold</anno></c> or - <c><anno>SetValue</anno></c> is not an integer</item> + <item>The table type is not <c>set</c> or + <c>ordered_set</c>.</item> + <item>No object with the correct key exists and no default object was + supplied.</item> + <item>The object has the wrong arity.</item> + <item>The default object arity is smaller than + <c><![CDATA[<keypos>]]></c>.</item> + <item>Any field from the default object that is updated is not an + integer.</item> + <item>The element to update is not an integer.</item> + <item>The element to update is also the key.</item> + <item>Any of <c><anno>Pos</anno></c>, <c><anno>Incr</anno></c>, + <c><anno>Threshold</anno></c>, or <c><anno>SetValue</anno></c> + is not an integer.</item> </list> </desc> </func> + <func> <name name="update_element" arity="3" clause_i="1"/> <name name="update_element" arity="3" clause_i="2"/> - <fsummary>Updates the <c>Pos</c>:th element of the object with a given key in an ETS table.</fsummary> + <fsummary>Update the <c>Pos</c>:th element of the object with a + specified key in an ETS 
table.</fsummary> <type variable="Tab"/> <type variable="Key"/> <type variable="Value"/> <type variable="Pos"/> <desc> <p>This function provides an efficient way to update one or more - elements within an object, without the hassle of having to look up, - update and write back the entire object. - </p> - <p>It will destructively update the object with key <c><anno>Key</anno></c> - in the table <c><anno>Tab</anno></c>. The element at the <c><anno>Pos</anno></c>:th position - will be given the value <c><anno>Value</anno></c>. </p> - <p>A list of <c>{<anno>Pos</anno>,<anno>Value</anno>}</c> can be supplied to update several - elements within the same object. If the same position occurs more - than one in the list, the last value in the list will be written. If - the list is empty or the function fails, no updates will be done at - all. The function is also atomic in the sense that other processes - can never see any intermediate results. - </p> - <p>The function returns <c>true</c> if an object with the key - <c><anno>Key</anno></c> was found, <c>false</c> otherwise. - </p> - <p>The given <c><anno>Key</anno></c> is used to identify the object by either - <em>matching</em> the key of an object in a <c>set</c> table, - or <em>compare equal</em> to the key of an object in an - <c>ordered_set</c> table (see - <seealso marker="#lookup/2">lookup/2</seealso> and - <seealso marker="#new/2">new/2</seealso> - for details on the difference).</p> - <p>The function will fail with reason <c>badarg</c> if:</p> + elements within an object, without the trouble of having to look up, + update, and write back the entire object.</p> + <p>This function destructively updates the object with key + <c><anno>Key</anno></c> in table <c><anno>Tab</anno></c>. + The element at position <c><anno>Pos</anno></c> is given + the value <c><anno>Value</anno></c>.</p> + <p>A list of <c>{<anno>Pos</anno>,<anno>Value</anno>}</c> can be + supplied to update many + elements within the same object. 
If the same position occurs more + than once in the list, the last value in the list is written. If + the list is empty or the function fails, no updates are done. + The function is also atomic in the sense that other processes + can never see any intermediate results.</p> + <p>Returns <c>true</c> if an object with key <c><anno>Key</anno></c> + is found, otherwise <c>false</c>.</p> + <p>The specified <c><anno>Key</anno></c> is used to identify the object + by either <em>matching</em> the key of an object in a <c>set</c> + table, or <em>compare equal</em> to the key of an object in an + <c>ordered_set</c> table (for details on the difference, see + <seealso marker="#lookup/2"><c>lookup/2</c></seealso> and + <seealso marker="#new/2"><c>new/2</c></seealso>).</p> + <p>The function fails with reason <c>badarg</c> in the following + situations:</p> <list type="bulleted"> - <item>the table is not of type <c>set</c> or - <c>ordered_set</c>,</item> - <item><c><anno>Pos</anno></c> is less than 1 or greater than the object - arity, or,</item> - <item>the element to update is also the key</item> + <item>The table type is not <c>set</c> or <c>ordered_set</c>.</item> + <item><c><anno>Pos</anno></c> < 1.</item> + <item><c><anno>Pos</anno></c> > object arity.</item> + <item>The element to update is also the key.</item> </list> </desc> </func> diff --git a/lib/stdlib/doc/src/file_sorter.xml b/lib/stdlib/doc/src/file_sorter.xml index bc24f02a99..e988d58c2f 100644 --- a/lib/stdlib/doc/src/file_sorter.xml +++ b/lib/stdlib/doc/src/file_sorter.xml @@ -24,125 +24,150 @@ <title>file_sorter</title> <prepared>Hans Bolinder</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2001-03-13</date> <rev>PA1</rev> - <file>file_sorter.sgml</file> + <file>file_sorter.xml</file> </header> <module>file_sorter</module> - <modulesummary>File 
Sorter</modulesummary> + <modulesummary>File sorter.</modulesummary> <description> - <p>The functions of this module sort terms on files, merge already - sorted files, and check files for sortedness. Chunks containing - binary terms are read from a sequence of files, sorted + <p>This module contains functions for sorting terms on files, merging + already sorted files, and checking files for sortedness. Chunks + containing binary terms are read from a sequence of files, sorted internally in memory and written on temporary files, which are merged producing one sorted file as output. Merging is provided as an optimization; it is faster when the files are already - sorted, but it always works to sort instead of merge. - </p> + sorted, but it always works to sort instead of merge.</p> + <p>On a file, a term is represented by a header and a binary. Two - options define the format of terms on files: - </p> - <list type="bulleted"> - <item><c>{header, HeaderLength}</c>. HeaderLength determines the - number of bytes preceding each binary and containing the - length of the binary in bytes. Default is 4. The order of the - header bytes is defined as follows: if <c>B</c> is a binary - containing a header only, the size <c>Size</c> of the binary - is calculated as - <c><![CDATA[<<Size:HeaderLength/unit:8>> = B]]></c>. + options define the format of terms on files:</p> + + <taglist> + <tag><c>{header, HeaderLength}</c></tag> + <item> + <p><c>HeaderLength</c> determines the + number of bytes preceding each binary and containing the + length of the binary in bytes. Defaults to 4. The order of the + header bytes is defined as follows: if <c>B</c> is a binary + containing a header only, size <c>Size</c> of the binary + is calculated as + <c><![CDATA[<<Size:HeaderLength/unit:8>> = B]]></c>.</p> </item> - <item><c>{format, Format}</c>. The format determines the - function that is applied to binaries in order to create the - terms that will be sorted. 
The default value is - <c>binary_term</c>, which is equivalent to - <c>fun binary_to_term/1</c>. The value <c>binary</c> is - equivalent to <c>fun(X) -> X end</c>, which means that the - binaries will be sorted as they are. This is the fastest - format. If <c>Format</c> is <c>term</c>, <c>io:read/2</c> is - called to read terms. In that case only the default value of - the <c>header</c> option is allowed. The <c>format</c> option - also determines what is written to the sorted output file: if - <c>Format</c> is <c>term</c> then <c>io:format/3</c> is called - to write each term, otherwise the binary prefixed by a header - is written. Note that the binary written is the same binary - that was read; the results of applying the <c>Format</c> - function are thrown away as soon as the terms have been - sorted. Reading and writing terms using the <c>io</c> module - is very much slower than reading and writing binaries. + <tag><c>{format, Format}</c></tag> + <item> + <p>Option <c>Format</c> determines the + function that is applied to binaries to create the + terms to be sorted. Defaults to + <c>binary_term</c>, which is equivalent to + <c>fun binary_to_term/1</c>. Value <c>binary</c> is + equivalent to <c>fun(X) -> X end</c>, which means that the + binaries are sorted as they are. This is the fastest + format. If <c>Format</c> is <c>term</c>, <c>io:read/2</c> is + called to read terms. In that case, only the default value of + option <c>header</c> is allowed.</p> + <p>Option <c>format</c> also determines what is written to the + sorted output file: if + <c>Format</c> is <c>term</c>, then <c>io:format/3</c> is called + to write each term, otherwise the binary prefixed by a header + is written. Notice that the binary written is the same binary + that was read; the results of applying function <c>Format</c> + are thrown away when the terms have been sorted. 
+ Reading and writing terms using the <c>io</c> module + is much slower than reading and writing binaries.</p> </item> - </list> - <p>Other options are: - </p> - <list type="bulleted"> - <item><c>{order, Order}</c>. The default is to sort terms in - ascending order, but that can be changed by the value - <c>descending</c> or by giving an ordering function <c>Fun</c>. - An ordering function is antisymmetric, transitive and total. - <c>Fun(A, B)</c> should return <c>true</c> if <c>A</c> - comes before <c>B</c> in the ordering, <c>false</c> otherwise. - An example of a typical ordering function is less than or equal - to, <c>=</2</c>. - Using an ordering function will slow down the sort - considerably. The <c>keysort</c>, <c>keymerge</c> and - <c>keycheck</c> functions do not accept ordering functions. + </taglist> + + <p>Other options are:</p> + + <taglist> + <tag><c>{order, Order}</c></tag> + <item> + <p>The default is to sort terms in + ascending order, but that can be changed by value + <c>descending</c> or by specifying an ordering function <c>Fun</c>. + An ordering function is antisymmetric, transitive, and total. + <c>Fun(A, B)</c> is to return <c>true</c> if <c>A</c> + comes before <c>B</c> in the ordering, otherwise <c>false</c>. + An example of a typical ordering function is less than or equal + to, <c>=</2</c>. Using an ordering function slows down the sort + considerably. Functions <c>keysort</c>, <c>keymerge</c> and + <c>keycheck</c> do not accept ordering functions.</p> </item> - <item><c>{unique, boolean()}</c>. When sorting or merging files, - only the first of a sequence of terms that compare equal (<c>==</c>) - is output if this option is set to <c>true</c>. The default - value is <c>false</c> which implies that all terms that - compare equal are output. When checking files for - sortedness, a check that no pair of consecutive terms - compares equal is done if this option is set to <c>true</c>. 
+ <tag><c>{unique, boolean()}</c></tag> + <item> + <p>When sorting or merging files, + only the first of a sequence of terms that compare equal (<c>==</c>) + is output if this option is set to <c>true</c>. Defaults + to <c>false</c>, which implies that all terms that + compare equal are output. When checking files for + sortedness, a check that no pair of consecutive terms + compares equal is done if this option is set to <c>true</c>.</p> </item> - <item><c>{tmpdir, TempDirectory}</c>. The directory where - temporary files are put can be chosen explicitly. The - default, implied by the value <c>""</c>, is to put temporary - files on the same directory as the sorted output file. If - output is a function (see below), the directory returned by - <c>file:get_cwd()</c> is used instead. The names of - temporary files are derived from the Erlang nodename - (<c>node()</c>), the process identifier of the current Erlang - emulator (<c>os:getpid()</c>), and a unique integer - (<c>erlang:unique_integer([positive])</c>); a typical name would be - <c>fs_mynode@myhost_1763_4711.17</c>, where - <c>17</c> is a sequence number. Existing files will be - overwritten. Temporary files are deleted unless some - uncaught EXIT signal occurs. + <tag><c>{tmpdir, TempDirectory}</c></tag> + <item> + <p>The directory where + temporary files are put can be chosen explicitly. The + default, implied by value <c>""</c>, is to put temporary + files on the same directory as the sorted output file. If + output is a function (see below), the directory returned by + <c>file:get_cwd()</c> is used instead. The names of + temporary files are derived from the Erlang nodename + (<c>node()</c>), the process identifier of the current Erlang + emulator (<c>os:getpid()</c>), and a unique integer + (<c>erlang:unique_integer([positive])</c>). A typical name is + <c>fs_mynode@myhost_1763_4711.17</c>, where + <c>17</c> is a sequence number. Existing files are + overwritten. 
Temporary files are deleted unless some + uncaught <c>EXIT</c> signal occurs.</p> </item> - <item><c>{compressed, boolean()}</c>. Temporary files and the - output file may be compressed. The default value - <c>false</c> implies that written files are not - compressed. Regardless of the value of the <c>compressed</c> - option, compressed files can always be read. Note that - reading and writing compressed files is significantly slower - than reading and writing uncompressed files. + <tag><c>{compressed, boolean()}</c></tag> + <item> + <p>Temporary files and the output file can be compressed. Defaults to + <c>false</c>, which implies that written files are not + compressed. Regardless of the value of option <c>compressed</c>, + compressed files can always be read. Notice that + reading and writing compressed files are significantly slower + than reading and writing uncompressed files.</p> </item> - <item><c>{size, Size}</c>. By default approximately 512*1024 - bytes read from files are sorted internally. This option - should rarely be needed. + <tag><c>{size, Size}</c></tag> + <item> + <p>By default about 512*1024 bytes read from files are sorted + internally. This option is rarely needed.</p> </item> - <item><c>{no_files, NoFiles}</c>. By default 16 files are - merged at a time. This option should rarely be needed. + <tag><c>{no_files, NoFiles}</c></tag> + <item> + <p>By default 16 files are merged at a time. This option is rarely + needed.</p> </item> - </list> + </taglist> + <p>As an alternative to sorting files, a function of one argument - can be given as input. When called with the argument <c>read</c> - the function is assumed to return <c>end_of_input</c> or - <c>{end_of_input, Value}}</c> when there is no more input - (<c>Value</c> is explained below), or <c>{Objects, Fun}</c>, - where <c>Objects</c> is a list of binaries or terms depending on - the format and <c>Fun</c> is a new input function.
Any other - value is immediately returned as value of the current call to - <c>sort</c> or <c>keysort</c>. Each input function will be - called exactly once, and should an error occur, the last - function is called with the argument <c>close</c>, the reply of - which is ignored. - </p> - <p>A function of one argument can be given as output. The results + can be specified as input. When called with argument <c>read</c>, + the function is assumed to return either of the following:</p> + + <list type="bulleted"> + <item> + <p><c>end_of_input</c> or <c>{end_of_input, Value}</c> when there + is no more input (<c>Value</c> is explained below).</p> + </item> + <item> + <p><c>{Objects, Fun}</c>, where <c>Objects</c> is a list of binaries + or terms depending on the format, and <c>Fun</c> is a new input + function.</p> + </item> + </list> + + <p>Any other value is immediately returned as value of the current call + to <c>sort</c> or <c>keysort</c>. Each input function is + called exactly once. If an error occurs, the last + function is called with argument <c>close</c>, the reply of + which is ignored.</p> + + <p>A function of one argument can be specified as output. The results of sorting or merging the input is collected in a non-empty sequence of variable length lists of binaries or terms depending on the format. The output function is called with one list at a @@ -151,18 +176,20 @@ call to the sort or merge function. Each output function is called exactly once. When some output function has been applied to all of the results or an error occurs, the last function is - called with the argument <c>close</c>, and the reply is returned - as value of the current call to the sort or merge function. If a - function is given as input and the last input function returns - <c>{end_of_input, Value}</c>, the function given as output will - be called with the argument <c>{value, Value}</c>.
This makes it + called with argument <c>close</c>, and the reply is returned + as value of the current call to the sort or merge function.</p> + + <p>If a function is specified as input and the last input function + returns <c>{end_of_input, Value}</c>, the function specified as output + is called with argument <c>{value, Value}</c>. This makes it easy to initiate the sequence of output functions with a value - calculated by the input functions. - </p> + calculated by the input functions.</p> + <p>As an example, consider sorting the terms on a disk log file. A function that reads chunks from the disk log and returns a list of binaries is used as input. The results are collected in a list of terms.</p> + <pre> sort(Log) -> {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]), @@ -193,29 +220,32 @@ output(L) -> lists:append(lists:reverse(L)); (Terms) -> output([Terms | L]) - end. </pre> - <p>Further examples of functions as input and output can be found - at the end of the <c>file_sorter</c> module; the <c>term</c> - format is implemented with functions. - </p> + end.</pre> + + <p>For more examples of functions as input and output, see + the end of the <c>file_sorter</c> module; the <c>term</c> + format is implemented with functions.</p> + <p>The possible values of <c>Reason</c> returned when an error occurs are:</p> + <list type="bulleted"> <item> - <p><c>bad_object</c>, <c>{bad_object, FileName}</c>. + <p><c>bad_object</c>, <c>{bad_object, FileName}</c> - Applying the format function failed for some binary, or the key(s) could not be extracted from some term.</p> </item> <item> - <p><c>{bad_term, FileName}</c>. <c>io:read/2</c> failed + <p><c>{bad_term, FileName}</c> - <c>io:read/2</c> failed to read some term.</p> </item> <item> - <p><c>{file_error, FileName, file:posix()}</c>. 
See - <c>file(3)</c> for an explanation of <c>file:posix()</c>.</p> + <p><c>{file_error, FileName, file:posix()}</c> - For an + explanation of <c>file:posix()</c>, see + <seealso marker="kernel:file"><c>file(3)</c></seealso>.</p> </item> <item> - <p><c>{premature_eof, FileName}</c>. End-of-file was + <p><c>{premature_eof, FileName}</c> - End-of-file was encountered inside some binary term.</p> </item> </list> @@ -304,30 +334,53 @@ output(L) -> <funcs> <func> - <name name="sort" arity="1"/> - <fsummary>Sort terms on files.</fsummary> + <name name="check" arity="1"/> + <name name="check" arity="2"/> + <fsummary>Check whether terms on files are sorted.</fsummary> <desc> - <p>Sorts terms on files. <c>sort(FileName)</c> is equivalent - to <c>sort([FileName], FileName)</c>.</p> + <p>Checks files for sortedness. If a file is not sorted, the + first out-of-order element is returned. The first term on a + file has position 1.</p> + <p><c>check(FileName)</c> is equivalent to + <c>check([FileName], [])</c>.</p> </desc> </func> + <func> - <name name="sort" arity="2"/> - <name name="sort" arity="3"/> - <fsummary>Sort terms on files.</fsummary> + <name name="keycheck" arity="2"/> + <name name="keycheck" arity="3"/> + <fsummary>Check whether terms on files are sorted by key.</fsummary> <desc> - <p>Sorts terms on files. <c>sort(Input, Output)</c> is - equivalent to <c>sort(Input, Output, [])</c>.</p> + <p>Checks files for sortedness. If a file is not sorted, the + first out-of-order element is returned. The first term on a + file has position 1.</p> + <p><c>keycheck(KeyPos, FileName)</c> is equivalent + to <c>keycheck(KeyPos, [FileName], [])</c>.</p> </desc> </func> + + <func> + <name name="keymerge" arity="3"/> + <name name="keymerge" arity="4"/> + <fsummary>Merge terms on files by key.</fsummary> + <desc> + <p>Merges tuples on files. 
Each input file is assumed to be + sorted on key(s).</p> + <p><c>keymerge(KeyPos, FileNames, Output)</c> is equivalent + to <c>keymerge(KeyPos, FileNames, Output, [])</c>.</p> + </desc> + </func> + <func> <name name="keysort" arity="2"/> <fsummary>Sort terms on files by key.</fsummary> <desc> - <p>Sorts tuples on files. <c>keysort(N, FileName)</c> is + <p>Sorts tuples on files.</p> + <p><c>keysort(N, FileName)</c> is equivalent to <c>keysort(N, [FileName], FileName)</c>.</p> </desc> </func> + <func> <name name="keysort" arity="3"/> <name name="keysort" arity="4"/> @@ -335,13 +388,14 @@ output(L) -> <desc> <p>Sorts tuples on files. The sort is performed on the element(s) mentioned in <c><anno>KeyPos</anno></c>. If two - tuples compare equal (<c>==</c>) on one element, next + tuples compare equal (<c>==</c>) on one element, the next element according to <c><anno>KeyPos</anno></c> is compared. The sort is stable.</p> <p><c>keysort(N, Input, Output)</c> is equivalent to <c>keysort(N, Input, Output, [])</c>.</p> </desc> </func> + <func> <name name="merge" arity="2"/> <name name="merge" arity="3"/> @@ -353,39 +407,25 @@ output(L) -> <c>merge(FileNames, Output, [])</c>.</p> </desc> </func> + <func> - <name name="keymerge" arity="3"/> - <name name="keymerge" arity="4"/> - <fsummary>Merge terms on files by key.</fsummary> - <desc> - <p>Merges tuples on files. Each input file is assumed to be - sorted on key(s).</p> - <p><c>keymerge(KeyPos, FileNames, Output)</c> is equivalent - to <c>keymerge(KeyPos, FileNames, Output, [])</c>.</p> - </desc> - </func> - <func> - <name name="check" arity="1"/> - <name name="check" arity="2"/> - <fsummary>Check whether terms on files are sorted.</fsummary> + <name name="sort" arity="1"/> + <fsummary>Sort terms on files.</fsummary> <desc> - <p>Checks files for sortedness. If a file is not sorted, the - first out-of-order element is returned. 
The first term on a - file has position 1.</p> - <p><c>check(FileName)</c> is equivalent to - <c>check([FileName], [])</c>.</p> + <p>Sorts terms on files.</p> + <p><c>sort(FileName)</c> is equivalent + to <c>sort([FileName], FileName)</c>.</p> </desc> </func> + <func> - <name name="keycheck" arity="2"/> - <name name="keycheck" arity="3"/> - <fsummary>Check whether terms on files are sorted by key.</fsummary> + <name name="sort" arity="2"/> + <name name="sort" arity="3"/> + <fsummary>Sort terms on files.</fsummary> <desc> - <p>Checks files for sortedness. If a file is not sorted, the - first out-of-order element is returned. The first term on a - file has position 1.</p> - <p><c>keycheck(KeyPos, FileName)</c> is equivalent - to <c>keycheck(KeyPos, [FileName], [])</c>.</p> + <p>Sorts terms on files.</p> + <p><c>sort(Input, Output)</c> is + equivalent to <c>sort(Input, Output, [])</c>.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/filelib.xml b/lib/stdlib/doc/src/filelib.xml index 3ad159a66d..7c6380ce28 100644 --- a/lib/stdlib/doc/src/filelib.xml +++ b/lib/stdlib/doc/src/filelib.xml @@ -28,19 +28,23 @@ <docno>1</docno> <approved>Kenneth Lundin</approved> <checked></checked> - <date>03-01-21</date> + <date>2003-01-21</date> <rev>A</rev> - <file>filelib.sgml</file> + <file>filelib.xml</file> </header> <module>filelib</module> - <modulesummary>File utilities, such as wildcard matching of filenames</modulesummary> + <modulesummary>File utilities, such as wildcard matching of filenames. + </modulesummary> <description> - <p>This module contains utilities on a higher level than the <c>file</c> - module.</p> - <p>This module does not support "raw" file names (i.e. files whose names - do not comply with the expected encoding). 
Such files will be ignored - by the functions in this module.</p> - <p>For more information about raw file names, see the <seealso marker="kernel:file">file</seealso> module.</p> + <p>This module contains utilities on a higher level than the + <seealso marker="kernel:file"><c>file</c></seealso> module.</p> + + <p>This module does not support "raw" filenames (that is, files whose + names do not comply with the expected encoding). Such files are ignored + by the functions in this module.</p> + + <p>For more information about raw filenames, see the + <seealso marker="kernel:file"><c>file</c></seealso> module.</p> </description> <datatypes> @@ -61,93 +65,99 @@ <funcs> <func> <name name="ensure_dir" arity="1"/> - <fsummary>Ensure that all parent directories for a file or directory exist.</fsummary> + <fsummary>Ensure that all parent directories for a file or directory + exist.</fsummary> <desc> - <p>The <c>ensure_dir/1</c> function ensures that all parent - directories for the given file or directory name <c><anno>Name</anno></c> + <p>Ensures that all parent directories for the specified file or + directory name <c><anno>Name</anno></c> exist, trying to create them if necessary.</p> <p>Returns <c>ok</c> if all parent directories already exist - or could be created, or <c>{error, <anno>Reason</anno>}</c> if some parent - directory does not exist and could not be created for some - reason.</p> + or can be created. 
Returns <c>{error, <anno>Reason</anno>}</c> if + some parent directory does not exist and cannot be created.</p> </desc> </func> + <func> <name name="file_size" arity="1"/> - <fsummary>Return the size in bytes of the file.</fsummary> + <fsummary>Return the size in bytes of a file.</fsummary> <desc> - <p>The <c>file_size</c> function returns the size of the given file.</p> + <p>Returns the size of the specified file.</p> </desc> </func> + <func> <name name="fold_files" arity="5"/> <fsummary>Fold over all files matching a regular expression.</fsummary> <desc> - <p>The <c>fold_files/5</c> function folds the function - <c><anno>Fun</anno></c> over all (regular) files <c><anno>F</anno></c> in the - directory <c><anno>Dir</anno></c> that match the regular expression <c><anno>RegExp</anno></c> - (see the <seealso marker="re">re</seealso> module for a description - of the allowed regular expressions). - If <c><anno>Recursive</anno></c> is true all sub-directories to <c>Dir</c> - are processed. The regular expression matching is done on just - the filename without the directory part.</p> - - <p>If Unicode file name translation is in effect and the file - system is completely transparent, file names that cannot be - interpreted as Unicode may be encountered, in which case the - <c>fun()</c> must be prepared to handle raw file names - (i.e. binaries). If the regular expression contains - codepoints beyond 255, it will not match file names that do - not conform to the expected character encoding (i.e. are not - encoded in valid UTF-8).</p> - - <p>For more information about raw file names, see the - <seealso marker="kernel:file">file</seealso> module.</p> + <p>Folds function <c><anno>Fun</anno></c> over all (regular) files + <c><anno>F</anno></c> in directory <c><anno>Dir</anno></c> that match + the regular expression <c><anno>RegExp</anno></c> (for a description + of the allowed regular expressions, + see the <seealso marker="re"><c>re</c></seealso> module). 
+ If <c><anno>Recursive</anno></c> is <c>true</c>, all subdirectories + to <c>Dir</c> + are processed. The regular expression matching is only done on + the filename without the directory part.</p> + <p>If Unicode filename translation is in effect and the file + system is transparent, filenames that cannot be + interpreted as Unicode can be encountered, in which case the + <c>fun()</c> must be prepared to handle raw filenames + (that is, binaries). If the regular expression contains + codepoints > 255, it does not match filenames that do + not conform to the expected character encoding (that is, are not + encoded in valid UTF-8).</p> + <p>For more information about raw filenames, see the + <seealso marker="kernel:file"><c>file</c></seealso> module.</p> </desc> </func> + <func> <name name="is_dir" arity="1"/> - <fsummary>Test whether Name refer to a directory or not</fsummary> + <fsummary>Test whether <c>Name</c> refers to a directory.</fsummary> <desc> - <p>The <c>is_dir/1</c> function returns <c>true</c> if <c><anno>Name</anno></c> - refers to a directory, and <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno></c> + refers to a directory, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_file" arity="1"/> - <fsummary>Test whether Name refer to a file or directory.</fsummary> + <fsummary>Test whether <c>Name</c> refers to a file or directory. 
+ </fsummary> <desc> - <p>The <c>is_file/1</c> function returns <c>true</c> if <c><anno>Name</anno></c> - refers to a file or a directory, and <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno></c> + refers to a file or a directory, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_regular" arity="1"/> - <fsummary>Test whether Name refer to a (regular) file.</fsummary> + <fsummary>Test whether <c>Name</c> refers to a (regular) file.</fsummary> <desc> - <p>The <c>is_regular/1</c> function returns <c>true</c> if <c><anno>Name</anno></c> - refers to a file (regular file), and <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Name</anno></c> + refers to a (regular) file, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="last_modified" arity="1"/> - <fsummary>Return the local date and time when a file was last modified.</fsummary> + <fsummary>Return the local date and time when a file was last modified. + </fsummary> <desc> - <p>The <c>last_modified/1</c> function returns the date and time the - given file or directory was last modified, or 0 if the file - does not exist.</p> + <p>Returns the date and time the specified file or directory was last + modified, or <c>0</c> if the file does not exist.</p> </desc> </func> + <func> <name name="wildcard" arity="1"/> <fsummary>Match filenames using Unix-style wildcards.</fsummary> <desc> - <p>The <c>wildcard/1</c> function returns a list of all files - that match Unix-style wildcard-string <c><anno>Wildcard</anno></c>.</p> + <p>Returns a list of all files that match Unix-style wildcard string + <c><anno>Wildcard</anno></c>.</p> <p>The wildcard string looks like an ordinary filename, except - that certain "wildcard characters" are interpreted in a special - way. 
The following characters are special: - </p> + that the following "wildcard characters" are interpreted in a special + way:</p> <taglist> <tag>?</tag> <item> @@ -160,14 +170,14 @@ </item> <tag>**</tag> <item> - <p>Two adjacent <c>*</c>'s used as a single pattern will - match all files and zero or more directories and subdirectories.</p> + <p>Two adjacent <c>*</c> used as a single pattern match + all files and zero or more directories and subdirectories.</p> </item> <tag>[Character1,Character2,...]</tag> <item> <p>Matches any of the characters listed. Two characters - separated by a hyphen will match a range of characters. - Example: <c>[A-Z]</c> will match any uppercase letter.</p> + separated by a hyphen match a range of characters. + Example: <c>[A-Z]</c> matches any uppercase letter.</p> </item> <tag>{Item,...}</tag> <item> @@ -175,49 +185,45 @@ </item> </taglist> <p>Other characters represent themselves. Only filenames that - have exactly the same character in the same position will match. - (Matching is case-sensitive; i.e. "a" will not match "A"). - </p> - <p>Note that multiple "*" characters are allowed - (as in Unix wildcards, but opposed to Windows/DOS wildcards). - </p> - <p>Examples:</p> + have exactly the same character in the same position match. + Matching is case-sensitive, for example, "a" does not match "A".</p> + <p>Notice that multiple "*" characters are allowed + (as in Unix wildcards, but opposed to Windows/DOS wildcards).</p> + <p><em>Examples:</em></p> <p>The following examples assume that the current directory is the - top of an Erlang/OTP installation. - </p> - <p>To find all <c>.beam</c> files in all applications, the following - line can be used:</p> + top of an Erlang/OTP installation.</p> + <p>To find all <c>.beam</c> files in all applications, use the + following line:</p> <code type="none"> - filelib:wildcard("lib/*/ebin/*.beam"). 
</code> - <p>To find either <c>.erl</c> or <c>.hrl</c> in all applications - <c>src</c> directories, the following</p> +filelib:wildcard("lib/*/ebin/*.beam").</code> + <p>To find <c>.erl</c> or <c>.hrl</c> in all applications <c>src</c> + directories, use either of the following lines:</p> <code type="none"> - filelib:wildcard("lib/*/src/*.?rl") </code> - <p>or the following line</p> +filelib:wildcard("lib/*/src/*.?rl")</code> <code type="none"> - filelib:wildcard("lib/*/src/*.{erl,hrl}") </code> - <p>can be used.</p> - <p>To find all <c>.hrl</c> files in either <c>src</c> or <c>include</c> - directories, use:</p> +filelib:wildcard("lib/*/src/*.{erl,hrl}")</code> + <p>To find all <c>.hrl</c> files in <c>src</c> or <c>include</c> + directories:</p> <code type="none"> - filelib:wildcard("lib/*/{src,include}/*.hrl"). </code> +filelib:wildcard("lib/*/{src,include}/*.hrl").</code> <p>To find all <c>.erl</c> or <c>.hrl</c> files in either - <c>src</c> or <c>include</c> directories, use:</p> + <c>src</c> or <c>include</c> directories:</p> <code type="none"> - filelib:wildcard("lib/*/{src,include}/*.{erl,hrl}") </code> - <p>To find all <c>.erl</c> or <c>.hrl</c> files in any - subdirectory, use:</p> +filelib:wildcard("lib/*/{src,include}/*.{erl,hrl}")</code> + <p>To find all <c>.erl</c> or <c>.hrl</c> files in any subdirectory:</p> <code type="none"> - filelib:wildcard("lib/**/*.{erl,hrl}") </code> +filelib:wildcard("lib/**/*.{erl,hrl}")</code> </desc> </func> + <func> <name name="wildcard" arity="2"/> - <fsummary>Match filenames using Unix-style wildcards starting at a specified directory.</fsummary> + <fsummary>Match filenames using Unix-style wildcards starting at a + specified directory.</fsummary> <desc> - <p>The <c>wildcard/2</c> function works like <c>wildcard/1</c>, - except that instead of the actual working directory, <c><anno>Cwd</anno></c> - will be used.</p> + <p>Same as <seealso marker="#wildcard/1"><c>wildcard/1</c></seealso>, + except that 
<c><anno>Cwd</anno></c> is used instead of the working + directory.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/filename.xml b/lib/stdlib/doc/src/filename.xml index f284a7596c..2a413835d0 100644 --- a/lib/stdlib/doc/src/filename.xml +++ b/lib/stdlib/doc/src/filename.xml @@ -25,27 +25,37 @@ <title>filename</title> <prepared>Kenneth Lundin</prepared> <docno>1</docno> - <date>97-11-13</date> + <date>1997-11-13</date> <rev>B</rev> </header> <module>filename</module> - <modulesummary>Filename Manipulation Functions</modulesummary> + <modulesummary>Filename manipulation functions.</modulesummary> <description> - <p>The module <c>filename</c> provides a number of useful functions - for analyzing and manipulating file names. These functions are + <p>This module provides functions + for analyzing and manipulating filenames. These functions are designed so that the Erlang code can work on many different - platforms with different formats for file names. With file name - is meant all strings that can be used to denote a file. They can - be short relative names like <c>foo.erl</c>, very long absolute - name which include a drive designator and directory names like + platforms with different filename formats. With filename + is meant all strings that can be used to denote a file. The filename + can be a short relative name like <c>foo.erl</c>, a long absolute + name including a drive designator, a directory name like <c>D:\usr/local\bin\erl/lib\tools\foo.erl</c>, or any variations in between.</p> - <p>In Windows, all functions return file names with forward slashes - only, even if the arguments contain back slashes. 
Use - <c>join/1</c> to normalize a file name by removing redundant - directory separators.</p> - <p>The module supports raw file names in the way that if a binary is present, or the file name cannot be interpreted according to the return value of - <seealso marker="kernel:file#native_name_encoding/0">file:native_name_encoding/0</seealso>, a raw file name will also be returned. For example filename:join/1 provided with a path component being a binary (and also not being possible to interpret under the current native file name encoding) will result in a raw file name being returned (the join operation will have been performed of course). For more information about raw file names, see the <seealso marker="kernel:file">file</seealso> module.</p> + + <p>In Windows, all functions return filenames with forward slashes + only, even if the arguments contain backslashes. To normalize a + filename by removing redundant directory separators, use + <seealso marker="#join/1"><c>join/1</c></seealso>.</p> + + <p>The module supports raw filenames in the way that if a binary is + present, or the filename cannot be interpreted according to the return + value of <seealso marker="kernel:file#native_name_encoding/0"> + <c>file:native_name_encoding/0</c></seealso>, a raw filename is also + returned. For example, <c>join/1</c> provided with a path component + that is a binary (and cannot be interpreted under the current + native filename encoding) results in a raw filename that is returned + (the join operation is performed of course). 
For more information + about raw filenames, see the + <seealso marker="kernel:file"><c>file</c></seealso> module.</p> </description> <datatypes> <datatype> @@ -56,13 +66,14 @@ <funcs> <func> <name name="absname" arity="1"/> - <fsummary>Convert a filename to an absolute name, relative the working directory</fsummary> + <fsummary>Convert a filename to an absolute name, relative the working + directory.</fsummary> <desc> - <p>Converts a relative <c><anno>Filename</anno></c> and returns an absolute - name. No attempt is made to create the shortest absolute name, - because this can give incorrect results on file systems which + <p>Converts a relative <c><anno>Filename</anno></c> and returns an + absolute name. No attempt is made to create the shortest absolute + name, as this can give incorrect results on file systems that allow links.</p> - <p>Unix examples:</p> + <p><em>Unix examples:</em></p> <pre> 1> <input>pwd().</input> "/usr/local" @@ -72,7 +83,7 @@ "/usr/local/../x" 4> <input>filename:absname("/").</input> "/"</pre> - <p>Windows examples:</p> + <p><em>Windows examples:</em></p> <pre> 1> <input>pwd().</input> "D:/usr/local" @@ -84,28 +95,32 @@ "D:/"</pre> </desc> </func> + <func> <name name="absname" arity="2"/> - <fsummary>Convert a filename to an absolute name, relative a specified directory</fsummary> + <fsummary>Convert a filename to an absolute name, relative a specified + directory.</fsummary> <desc> - <p>This function works like <c>absname/1</c>, except that - the directory to which the file name should be made relative - is given explicitly in the <c><anno>Dir</anno></c> argument.</p> + <p>Same as <seealso marker="#absname/1"><c>absname/1</c></seealso>, + except that the directory to which the filename is to be made + relative is specified in argument <c><anno>Dir</anno></c>.</p> </desc> </func> + <func> <name name="absname_join" arity="2"/> - <fsummary>Join an absolute directory with a relative filename</fsummary> + <fsummary>Join an absolute directory 
with a relative filename.</fsummary> <desc> - <p>Joins an absolute directory with a relative filename. - Similar to <c>join/2</c>, but on platforms with tight - restrictions on raw filename length and no support for - symbolic links (read: VxWorks), leading parent directory - components in <c><anno>Filename</anno></c> are matched against trailing - directory components in <c><anno>Dir</anno></c> so they can be removed - from the result - minimizing its length.</p> + <p>Joins an absolute directory with a relative filename. Similar to + <seealso marker="#join/2"><c>join/2</c></seealso>, but on platforms + with tight restrictions on raw filename length and no support for + symbolic links (read: VxWorks), leading parent directory components + in <c><anno>Filename</anno></c> are matched against trailing + directory components in <c><anno>Dir</anno></c> so they can be + removed from the result - minimizing its length.</p> </desc> </func> + <func> <name name="basedir" arity="2"/> <fsummary>Equivalent to <c>basedir(<anno>Type</anno>,<anno>Application</anno>,#{})</c>.</fsummary> @@ -121,11 +136,13 @@ <fsummary></fsummary> <desc><marker id="basedir-3"/> <p> - Returns a suitable path, or paths, for a given type. - If <c>os</c> is not set in <c><anno>Opts</anno></c> the function will default to - the native option, i.e. <c>'linux'</c>, <c>'darwin'</c> or <c>'windows'</c>, as understood - by <c>os:type/0</c>. Anything not recognized as <c>'darwin'</c> or <c>'windows'</c> is - interpreted as <c>'linux'</c>.</p> + Returns a suitable path, or paths, for a given type. If + <c>os</c> is not set in <c><anno>Opts</anno></c> the + function will default to the native option, that is + <c>'linux'</c>, <c>'darwin'</c> or <c>'windows'</c>, as + understood by <c>os:type/0</c>. Anything not recognized + as <c>'darwin'</c> or <c>'windows'</c> is interpreted as + <c>'linux'</c>.</p> <p> The options <c>'author'</c> and <c>'version'</c> are only used with <c>'windows'</c> option mode. 
</p> @@ -257,11 +274,12 @@ true </func> <func> <name name="basename" arity="1"/> - <fsummary>Return the last component of a filename</fsummary> + <fsummary>Return the last component of a filename.</fsummary> <desc> <p>Returns the last component of <c><anno>Filename</anno></c>, or - <c><anno>Filename</anno></c> itself if it does not contain any directory - separators.</p> + <c><anno>Filename</anno></c> itself if it does not contain any + directory separators.</p> + <p><em>Examples:</em></p> <pre> 5> <input>filename:basename("foo").</input> "foo" @@ -271,15 +289,18 @@ true []</pre> </desc> </func> + <func> <name name="basename" arity="2"/> - <fsummary>Return the last component of a filename, stripped of the specified extension</fsummary> + <fsummary>Return the last component of a filename, stripped of the + specified extension.</fsummary> <desc> - <p>Returns the last component of <c><anno>Filename</anno></c> with the - extension <c><anno>Ext</anno></c> stripped. This function should be used - to remove a specific extension which might, or might not, be - there. Use <c>rootname(basename(Filename))</c> to remove an - extension that exists, but you are not sure which one it is.</p> + <p>Returns the last component of <c><anno>Filename</anno></c> with + extension <c><anno>Ext</anno></c> stripped. This function is to be + used to remove a (possible) specific extension. 
To remove an + existing extension when you are unsure which one it is, use + <c>rootname(basename(Filename))</c>.</p> + <p><em>Examples:</em></p> <pre> 8> <input>filename:basename("~/src/kalle.erl", ".erl").</input> "kalle" @@ -293,27 +314,32 @@ true "kalle"</pre> </desc> </func> + <func> <name name="dirname" arity="1"/> - <fsummary>Return the directory part of a path name</fsummary> + <fsummary>Return the directory part of a path name.</fsummary> <desc> <p>Returns the directory part of <c><anno>Filename</anno></c>.</p> + <p><em>Examples:</em></p> <pre> 13> <input>filename:dirname("/usr/src/kalle.erl").</input> "/usr/src" 14> <input>filename:dirname("kalle.erl").</input> -"." - +"."</pre> + <pre> 5> <input>filename:dirname("\\usr\\src/kalle.erl").</input> % Windows "/usr/src"</pre> </desc> </func> + <func> <name name="extension" arity="1"/> - <fsummary>Return the file extension</fsummary> + <fsummary>Return the file extension.</fsummary> <desc> - <p>Returns the file extension of <c><anno>Filename</anno></c>, including - the period. Returns an empty string if there is no extension.</p> + <p>Returns the file extension of <c><anno>Filename</anno></c>, + including the period. Returns an empty string if no extension + exists.</p> + <p><em>Examples:</em></p> <pre> 15> <input>filename:extension("foo.erl").</input> ".erl" @@ -321,69 +347,123 @@ true []</pre> </desc> </func> + + <func> + <name name="find_src" arity="1"/> + <name name="find_src" arity="2"/> + <fsummary>Find the filename and compiler options for a module.</fsummary> + <desc> + <p>Finds the source filename and compiler options for a module. + The result can be fed to <seealso marker="compiler:compile#file/2"> + <c>compile:file/2</c></seealso> to compile the file again.</p> + <warning><p>It is not recommended to use this function. 
If possible, + use the <seealso marker="beam_lib"><c>beam_lib(3)</c></seealso> + module to extract the abstract code format from the Beam file and + compile that instead.</p></warning> + <p>Argument <c><anno>Beam</anno></c>, which can be a string or an atom, + specifies either the module name or the path to the source + code, with or without extension <c>".erl"</c>. In either + case, the module must be known by the code server, that is, + <c>code:which(<anno>Module</anno>)</c> must succeed.</p> + <p><c><anno>Rules</anno></c> describes how the source directory can be + found when the object code directory is known. It is a list of + tuples <c>{<anno>BinSuffix</anno>, <anno>SourceSuffix</anno>}</c> and + is interpreted as follows: if the end of the directory name where the + object is located matches <c><anno>BinSuffix</anno></c>, then the + source code directory has the same name, but with + <c><anno>BinSuffix</anno></c> replaced by + <c><anno>SourceSuffix</anno></c>. <c><anno>Rules</anno></c> defaults + to:</p> + <code type="none"> +[{"", ""}, {"ebin", "src"}, {"ebin", "esrc"}]</code> + <p>If the source file is found in the resulting directory, the function + returns that location together with <c><anno>Options</anno></c>. + Otherwise the next rule is tried, and so on.</p> + <p>The function returns <c>{<anno>SourceFile</anno>, + <anno>Options</anno>}</c> if it succeeds. + <c><anno>SourceFile</anno></c> is the absolute path to the source + file without extension <c>".erl"</c>. <c><anno>Options</anno></c> + includes the options that are necessary to recompile the file with + <c>compile:file/2</c>, but excludes options such as <c>report</c> + and <c>verbose</c>, which do not change the way code is generated. 
+ The paths in options <c>{outdir, <anno>Path</anno>}</c> and + <c>{i, Path}</c> are guaranteed to be absolute.</p> + </desc> + </func> + <func> <name name="flatten" arity="1"/> - <fsummary>Convert a filename to a flat string</fsummary> + <fsummary>Convert a filename to a flat string.</fsummary> <desc> <p>Converts a possibly deep list filename consisting of characters and atoms into the corresponding flat string filename.</p> </desc> </func> + <func> <name name="join" arity="1"/> - <fsummary>Join a list of filename components with directory separators</fsummary> + <fsummary>Join a list of filename components with directory separators. + </fsummary> <desc> - <p>Joins a list of file name <c><anno>Components</anno></c> with directory - separators. If one of the elements of <c><anno>Components</anno></c> - includes an absolute path, for example <c>"/xxx"</c>, + <p>Joins a list of filename <c><anno>Components</anno></c> with + directory separators. + If one of the elements of <c><anno>Components</anno></c> + includes an absolute path, such as <c>"/xxx"</c>, the preceding elements, if any, are removed from the result.</p> <p>The result is "normalized":</p> <list type="bulleted"> <item>Redundant directory separators are removed.</item> <item>In Windows, all directory separators are forward - slashes and the drive letter is in lower case.</item> + slashes and the drive letter is in lower case.</item> </list> + <p><em>Examples:</em></p> <pre> 17> <input>filename:join(["/usr", "local", "bin"]).</input> "/usr/local/bin" 18> <input>filename:join(["a/b///c/"]).</input> -"a/b/c" - +"a/b/c"</pre> + <pre> 6> <input>filename:join(["B:a\\b///c/"]).</input> % Windows "b:a/b/c"</pre> </desc> </func> + <func> <name name="join" arity="2"/> - <fsummary>Join two filename components with directory separators</fsummary> + <fsummary>Join two filename components with directory separators. + </fsummary> <desc> - <p>Joins two file name components with directory separators. 
- Equivalent to <c>join([<anno>Name1</anno>, <anno>Name2</anno>])</c>.</p> + <p>Joins two filename components with directory separators. + Equivalent to <c>join([<anno>Name1</anno>, <anno>Name2</anno>])</c>. + </p> </desc> </func> + <func> <name name="nativename" arity="1"/> - <fsummary>Return the native form of a file path</fsummary> + <fsummary>Return the native form of a file path.</fsummary> <desc> - <p>Converts <c><anno>Path</anno></c> to a form accepted by the command shell - and native applications on the current platform. On Windows, + <p>Converts <c><anno>Path</anno></c> to a form accepted by the command + shell and native applications on the current platform. On Windows, forward slashes are converted to backward slashes. On all - platforms, the name is normalized as done by <c>join/1</c>.</p> + platforms, the name is normalized as done by + <seealso marker="#join/1"><c>join/1</c></seealso>.</p> + <p><em>Examples:</em></p> <pre> 19> <input>filename:nativename("/usr/local/bin/").</input> % Unix -"/usr/local/bin" - +"/usr/local/bin"</pre> + <pre> 7> <input>filename:nativename("/usr/local/bin/").</input> % Windows "\\usr\\local\\bin"</pre> </desc> </func> + <func> <name name="pathtype" arity="1"/> - <fsummary>Return the type of a path</fsummary> + <fsummary>Return the path type.</fsummary> <desc> - <p>Returns the type of path, one of <c>absolute</c>, - <c>relative</c>, or <c>volumerelative</c>.</p> + <p>Returns the path type, which is one of the following:</p> <taglist> <tag><c>absolute</c></tag> <item> @@ -408,14 +488,16 @@ true </taglist> </desc> </func> + <func> <name name="rootname" arity="1"/> <name name="rootname" arity="2"/> - <fsummary>Remove a filename extension</fsummary> + <fsummary>Remove a filename extension.</fsummary> <desc> - <p>Remove a filename extension. <c>rootname/2</c> works as + <p>Removes a filename extension. 
<c>rootname/2</c> works as <c>rootname/1</c>, except that the extension is removed only if it is <c><anno>Ext</anno></c>.</p> + <p><em>Examples:</em></p> <pre> 20> <input>filename:rootname("/beam.src/kalle").</input> "/beam.src/kalle" @@ -427,12 +509,14 @@ true "/beam.src/foo.beam"</pre> </desc> </func> + <func> <name name="split" arity="1"/> - <fsummary>Split a filename into its path components</fsummary> + <fsummary>Split a filename into its path components.</fsummary> <desc> <p>Returns a list whose elements are the path components of <c><anno>Filename</anno></c>.</p> + <p><em>Examples:</em></p> <pre> 24> <input>filename:split("/usr/local/bin").</input> ["/","usr","local","bin"] @@ -442,50 +526,6 @@ true ["a:/","msdev","include"]</pre> </desc> </func> - <func> - <name name="find_src" arity="1"/> - <name name="find_src" arity="2"/> - <fsummary>Find the filename and compiler options for a module</fsummary> - <desc> - <p>Finds the source filename and compiler options for a module. - The result can be fed to <c>compile:file/2</c> in order to - compile the file again.</p> - - <warning><p>We don't recommend using this function. If possible, - use <seealso marker="beam_lib">beam_lib(3)</seealso> to extract - the abstract code format from the BEAM file and compile that - instead.</p></warning> - - <p>The <c><anno>Beam</anno></c> argument, which can be a string or an atom, - specifies either the module name or the path to the source - code, with or without the <c>".erl"</c> extension. In either - case, the module must be known by the code server, i.e. - <c>code:which(<anno>Module</anno>)</c> must succeed.</p> - <p><c><anno>Rules</anno></c> describes how the source directory can be found, - when the object code directory is known. 
It is a list of - tuples <c>{<anno>BinSuffix</anno>, <anno>SourceSuffix</anno>}</c> and is interpreted - as follows: If the end of the directory name where the object - is located matches <c><anno>BinSuffix</anno></c>, then the source code - directory has the same name, but with <c><anno>BinSuffix</anno></c> - replaced by <c><anno>SourceSuffix</anno></c>. <c><anno>Rules</anno></c> defaults to:</p> - <code type="none"> -[{"", ""}, {"ebin", "src"}, {"ebin", "esrc"}]</code> - <p>If the source file is found in the resulting directory, then - the function returns that location together with - <c><anno>Options</anno></c>. Otherwise, the next rule is tried, and so on.</p> - - <p>The function returns <c>{<anno>SourceFile</anno>, <anno>Options</anno>}</c> if it succeeds. - <c><anno>SourceFile</anno></c> is the absolute path to the source file - without the <c>".erl"</c> extension. <c><anno>Options</anno></c> include - the options which are necessary to recompile the file with - <c>compile:file/2</c>, but excludes options such as - <c>report</c> or <c>verbose</c> which do not change the way - code is generated. The paths in the <c>{outdir, <anno>Path</anno>}</c> - and <c>{i, Path}</c> options are guaranteed to be - absolute.</p> - - </desc> - </func> </funcs> </erlref> diff --git a/lib/stdlib/doc/src/gb_sets.xml b/lib/stdlib/doc/src/gb_sets.xml index 84609a0f7c..d677dd6f83 100644 --- a/lib/stdlib/doc/src/gb_sets.xml +++ b/lib/stdlib/doc/src/gb_sets.xml @@ -29,87 +29,75 @@ <rev></rev> </header> <module>gb_sets</module> - <modulesummary>General Balanced Trees</modulesummary> + <modulesummary>General balanced trees.</modulesummary> <description> - <p>An implementation of ordered sets using Prof. Arne Andersson's - General Balanced Trees. This can be much more efficient than + <p>This module provides ordered sets using Prof. Arne Andersson's + General Balanced Trees. 
Ordered sets can be much more efficient than using ordered lists, for larger sets, but depends on the application.</p> + <p>This module considers two elements as different if and only if they do not compare equal (<c>==</c>).</p> </description> <section> - <title>Complexity note</title> - <p>The complexity on set operations is bounded by either O(|S|) or - O(|T| * log(|S|)), where S is the largest given set, depending + <title>Complexity Note</title> + <p>The complexity on set operations is bounded by either <em>O(|S|)</em> or + <em>O(|T| * log(|S|))</em>, where S is the largest given set, depending on which is fastest for any particular function call. For operating on sets of almost equal size, this implementation is about 3 times slower than using ordered-list sets directly. For sets of very different sizes, however, this solution can be - arbitrarily much faster; in practical cases, often between 10 - and 100 times. This implementation is particularly suited for + arbitrarily much faster; in practical cases, often + 10-100 times. This implementation is particularly suited for accumulating elements a few at a time, building up a large set - (more than 100-200 elements), and repeatedly testing for + (> 100-200 elements), and repeatedly testing for membership in the current set.</p> + <p>As with normal tree structures, lookup (membership testing), - insertion and deletion have logarithmic complexity.</p> + insertion, and deletion have logarithmic complexity.</p> </section> <section> <title>Compatibility</title> - <p>All of the following functions in this module also exist - and do the same thing in the <c>sets</c> and <c>ordsets</c> + <p>The following functions in this module also exist and provide + the same functionality in the + <seealso marker="sets"><c>sets(3)</c></seealso> and + <seealso marker="ordsets"><c>ordsets(3)</c></seealso> modules. 
That is, by only changing the module name for each call, you can try out different set representations.</p> <list type="bulleted"> - <item> - <p><c>add_element/2</c></p> + <item><seealso marker="#add_element/2"><c>add_element/2</c></seealso> </item> - <item> - <p><c>del_element/2</c></p> + <item><seealso marker="#del_element/2"><c>del_element/2</c></seealso> </item> - <item> - <p><c>filter/2</c></p> + <item><seealso marker="#filter/2"><c>filter/2</c></seealso> </item> - <item> - <p><c>fold/3</c></p> + <item><seealso marker="#fold/3"><c>fold/3</c></seealso> </item> - <item> - <p><c>from_list/1</c></p> + <item><seealso marker="#from_list/1"><c>from_list/1</c></seealso> </item> - <item> - <p><c>intersection/1</c></p> + <item><seealso marker="#intersection/1"><c>intersection/1</c></seealso> </item> - <item> - <p><c>intersection/2</c></p> + <item><seealso marker="#intersection/2"><c>intersection/2</c></seealso> </item> - <item> - <p><c>is_element/2</c></p> + <item><seealso marker="#is_element/2"><c>is_element/2</c></seealso> </item> - <item> - <p><c>is_set/1</c></p> + <item><seealso marker="#is_set/1"><c>is_set/1</c></seealso> </item> - <item> - <p><c>is_subset/2</c></p> + <item><seealso marker="#is_subset/2"><c>is_subset/2</c></seealso> </item> - <item> - <p><c>new/0</c></p> + <item><seealso marker="#new/0"><c>new/0</c></seealso> </item> - <item> - <p><c>size/1</c></p> + <item><seealso marker="#size/1"><c>size/1</c></seealso> </item> - <item> - <p><c>subtract/2</c></p> + <item><seealso marker="#subtract/2"><c>subtract/2</c></seealso> </item> - <item> - <p><c>to_list/1</c></p> + <item><seealso marker="#to_list/1"><c>to_list/1</c></seealso> </item> - <item> - <p><c>union/1</c></p> + <item><seealso marker="#union/1"><c>union/1</c></seealso> </item> - <item> - <p><c>union/2</c></p> + <item><seealso marker="#union/2"><c>union/2</c></seealso> </item> </list> </section> @@ -117,290 +105,369 @@ <datatypes> <datatype> <name name="set" n_vars="1"/> - <desc><p>A GB 
set.</p></desc> + <desc><p>A general balanced set.</p></desc> </datatype> <datatype> <name name="set" n_vars="0"/> </datatype> <datatype> <name name="iter" n_vars="1"/> - <desc><p>A GB set iterator.</p></desc> + <desc><p>A general balanced set iterator.</p></desc> </datatype> <datatype> <name name="iter" n_vars="0"/> </datatype> </datatypes> + <funcs> <func> <name name="add" arity="2"/> <name name="add_element" arity="2"/> - <fsummary>Add a (possibly existing) element to a set</fsummary> + <fsummary>Add a (possibly existing) element to a set.</fsummary> <desc> <p>Returns a new set formed from <c><anno>Set1</anno></c> with - <c><anno>Element</anno></c> inserted. If <c><anno>Element</anno></c> is already an + <c><anno>Element</anno></c> inserted. If <c><anno>Element</anno></c> + is already an element in <c><anno>Set1</anno></c>, nothing is changed.</p> </desc> </func> + <func> <name name="balance" arity="1"/> - <fsummary>Rebalance tree representation of a set</fsummary> + <fsummary>Rebalance tree representation of a set.</fsummary> <desc> - <p>Rebalances the tree representation of <c><anno>Set1</anno></c>. Note that - this is rarely necessary, but may be motivated when a large + <p>Rebalances the tree representation of <c><anno>Set1</anno></c>. + Notice that + this is rarely necessary, but can be motivated when a large number of elements have been deleted from the tree without - further insertions. Rebalancing could then be forced in order - to minimise lookup times, since deletion only does not + further insertions. Rebalancing can then be forced + to minimise lookup times, as deletion does not rebalance the tree.</p> </desc> </func> + + <func> + <name name="del_element" arity="2"/> + <fsummary>Remove a (possibly non-existing) element from a set.</fsummary> + <desc> + <p>Returns a new set formed from <c><anno>Set1</anno></c> with + <c><anno>Element</anno></c> removed. 
If <c><anno>Element</anno></c> + is not an element + in <c><anno>Set1</anno></c>, nothing is changed.</p> + </desc> + </func> + <func> <name name="delete" arity="2"/> - <fsummary>Remove an element from a set</fsummary> + <fsummary>Remove an element from a set.</fsummary> <desc> <p>Returns a new set formed from <c><anno>Set1</anno></c> with - <c><anno>Element</anno></c> removed. Assumes that <c><anno>Element</anno></c> is present + <c><anno>Element</anno></c> removed. Assumes that + <c><anno>Element</anno></c> is present in <c><anno>Set1</anno></c>.</p> </desc> </func> + <func> <name name="delete_any" arity="2"/> - <name name="del_element" arity="2"/> - <fsummary>Remove a (possibly non-existing) element from a set</fsummary> + <fsummary>Remove a (possibly non-existing) element from a set.</fsummary> <desc> <p>Returns a new set formed from <c><anno>Set1</anno></c> with - <c><anno>Element</anno></c> removed. If <c><anno>Element</anno></c> is not an element + <c><anno>Element</anno></c> removed. 
If <c><anno>Element</anno></c> + is not an element in <c><anno>Set1</anno></c>, nothing is changed.</p> </desc> </func> + <func> <name name="difference" arity="2"/> - <name name="subtract" arity="2"/> - <fsummary>Return the difference of two sets</fsummary> + <fsummary>Return the difference of two sets.</fsummary> <desc> - <p>Returns only the elements of <c><anno>Set1</anno></c> which are not also - elements of <c><anno>Set2</anno></c>.</p> + <p>Returns only the elements of <c><anno>Set1</anno></c> that are not + also elements of <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> <name name="empty" arity="0"/> - <name name="new" arity="0"/> - <fsummary>Return an empty set</fsummary> + <fsummary>Return an empty set.</fsummary> <desc> <p>Returns a new empty set.</p> </desc> </func> + <func> <name name="filter" arity="2"/> - <fsummary>Filter set elements</fsummary> + <fsummary>Filter set elements.</fsummary> <desc> <p>Filters elements in <c><anno>Set1</anno></c> using predicate function <c><anno>Pred</anno></c>.</p> </desc> </func> + <func> <name name="fold" arity="3"/> - <fsummary>Fold over set elements</fsummary> + <fsummary>Fold over set elements.</fsummary> <desc> - <p>Folds <c><anno>Function</anno></c> over every element in <c><anno>Set</anno></c> + <p>Folds <c><anno>Function</anno></c> over every element in + <c><anno>Set</anno></c> returning the final value of the accumulator.</p> </desc> </func> + <func> <name name="from_list" arity="1"/> - <fsummary>Convert a list into a set</fsummary> + <fsummary>Convert a list into a set.</fsummary> <desc> <p>Returns a set of the elements in <c><anno>List</anno></c>, where - <c><anno>List</anno></c> may be unordered and contain duplicates.</p> + <c><anno>List</anno></c> can be unordered and contain duplicates.</p> </desc> </func> + <func> <name name="from_ordset" arity="1"/> - <fsummary>Make a set from an ordset list</fsummary> + <fsummary>Make a set from an ordset list.</fsummary> <desc> - <p>Turns an ordered-set list 
<c><anno>List</anno></c> into a set. The list - must not contain duplicates.</p> + <p>Turns an ordered-set list <c><anno>List</anno></c> into a set. + The list must not contain duplicates.</p> </desc> </func> + <func> <name name="insert" arity="2"/> - <fsummary>Add a new element to a set</fsummary> + <fsummary>Add a new element to a set.</fsummary> <desc> <p>Returns a new set formed from <c><anno>Set1</anno></c> with - <c><anno>Element</anno></c> inserted. Assumes that <c><anno>Element</anno></c> is not + <c><anno>Element</anno></c> inserted. Assumes that + <c><anno>Element</anno></c> is not present in <c><anno>Set1</anno></c>.</p> </desc> </func> + <func> - <name name="intersection" arity="2"/> - <fsummary>Return the intersection of two sets</fsummary> + <name name="intersection" arity="1"/> + <fsummary>Return the intersection of a list of sets.</fsummary> <desc> - <p>Returns the intersection of <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c>.</p> + <p>Returns the intersection of the non-empty list of sets.</p> </desc> </func> + <func> - <name name="intersection" arity="1"/> - <fsummary>Return the intersection of a list of sets</fsummary> + <name name="intersection" arity="2"/> + <fsummary>Return the intersection of two sets.</fsummary> <desc> - <p>Returns the intersection of the non-empty list of sets.</p> + <p>Returns the intersection of <c><anno>Set1</anno></c> and + <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> <name name="is_disjoint" arity="2"/> - <fsummary>Check whether two sets are disjoint</fsummary> + <fsummary>Check whether two sets are disjoint.</fsummary> <desc> <p>Returns <c>true</c> if <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c> are disjoint (have no elements in common), - and <c>false</c> otherwise.</p> + otherwise <c>false</c>.</p> + </desc> + </func> + + <func> + <name name="is_element" arity="2"/> + <fsummary>Test for membership of a set.</fsummary> + <desc> + <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an 
element of + <c><anno>Set</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_empty" arity="1"/> - <fsummary>Test for empty set</fsummary> + <fsummary>Test for empty set.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Set</anno></c> is an empty set, and - <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Set</anno></c> is an empty set, + otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_member" arity="2"/> - <name name="is_element" arity="2"/> - <fsummary>Test for membership of a set</fsummary> + <fsummary>Test for membership of a set.</fsummary> <desc> <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an element of <c><anno>Set</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_set" arity="1"/> - <fsummary>Test for a set</fsummary> + <fsummary>Test for a set.</fsummary> <desc> <p>Returns <c>true</c> if <c><anno>Term</anno></c> appears to be a set, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_subset" arity="2"/> - <fsummary>Test for subset</fsummary> + <fsummary>Test for subset.</fsummary> <desc> <p>Returns <c>true</c> when every element of <c><anno>Set1</anno></c> is also a member of <c><anno>Set2</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="iterator" arity="1"/> - <fsummary>Return an iterator for a set</fsummary> + <fsummary>Return an iterator for a set.</fsummary> <desc> - <p>Returns an iterator that can be used for traversing the - entries of <c><anno>Set</anno></c>; see <c>next/1</c>. The implementation + <p>Returns an iterator that can be used for traversing the entries of + <c><anno>Set</anno></c>; see + <seealso marker="#next/1"><c>next/1</c></seealso>. The implementation of this is very efficient; traversing the whole set using - <c>next/1</c> is only slightly slower than getting the list - of all elements using <c>to_list/1</c> and traversing that. 
+ <c>next/1</c> is only slightly slower than getting the list of all + elements using <seealso marker="#to_list/1"><c>to_list/1</c></seealso> + and traversing that. The main advantage of the iterator approach is that it does not require the complete list of all elements to be built in memory at one time.</p> </desc> </func> + <func> <name name="iterator_from" arity="2"/> - <fsummary>Return an iterator for a set starting from a specified element</fsummary> + <fsummary>Return an iterator for a set starting from a specified element. + </fsummary> <desc> <p>Returns an iterator that can be used for traversing the - entries of <c><anno>Set</anno></c>; see <c>next/1</c>. + entries of <c><anno>Set</anno></c>; see + <seealso marker="#next/1"><c>next/1</c></seealso>. The difference as compared to the iterator returned by - <c>iterator/1</c> is that the first element greater than + <seealso marker="#iterator/1"><c>iterator/1</c></seealso> + is that the first element greater than or equal to <c><anno>Element</anno></c> is returned.</p> </desc> </func> + <func> <name name="largest" arity="1"/> - <fsummary>Return largest element</fsummary> + <fsummary>Return largest element.</fsummary> <desc> <p>Returns the largest element in <c><anno>Set</anno></c>. 
Assumes that - <c><anno>Set</anno></c> is nonempty.</p> + <c><anno>Set</anno></c> is not empty.</p> </desc> </func> + + <func> + <name name="new" arity="0"/> + <fsummary>Return an empty set.</fsummary> + <desc> + <p>Returns a new empty set.</p> + </desc> + </func> + <func> <name name="next" arity="1"/> - <fsummary>Traverse a set with an iterator</fsummary> + <fsummary>Traverse a set with an iterator.</fsummary> <desc> - <p>Returns <c>{<anno>Element</anno>, <anno>Iter2</anno>}</c> where <c><anno>Element</anno></c> is the - smallest element referred to by the iterator <c><anno>Iter1</anno></c>, + <p>Returns <c>{<anno>Element</anno>, <anno>Iter2</anno>}</c>, where + <c><anno>Element</anno></c> is the smallest element referred to by + iterator <c><anno>Iter1</anno></c>, and <c><anno>Iter2</anno></c> is the new iterator to be used for traversing the remaining elements, or the atom <c>none</c> if no elements remain.</p> </desc> </func> + <func> <name name="singleton" arity="1"/> - <fsummary>Return a set with one element</fsummary> + <fsummary>Return a set with one element.</fsummary> <desc> - <p>Returns a set containing only the element <c><anno>Element</anno></c>.</p> + <p>Returns a set containing only element <c><anno>Element</anno></c>. + </p> </desc> </func> + <func> <name name="size" arity="1"/> - <fsummary>Return the number of elements in a set</fsummary> + <fsummary>Return the number of elements in a set.</fsummary> <desc> <p>Returns the number of elements in <c><anno>Set</anno></c>.</p> </desc> </func> + <func> <name name="smallest" arity="1"/> - <fsummary>Return smallest element</fsummary> + <fsummary>Return smallest element.</fsummary> <desc> <p>Returns the smallest element in <c><anno>Set</anno></c>. 
Assumes that - <c><anno>Set</anno></c> is nonempty.</p> + <c><anno>Set</anno></c> is not empty.</p> </desc> </func> + + <func> + <name name="subtract" arity="2"/> + <fsummary>Return the difference of two sets.</fsummary> + <desc> + <p>Returns only the elements of <c><anno>Set1</anno></c> that are not + also elements of <c><anno>Set2</anno></c>.</p> + </desc> + </func> + <func> <name name="take_largest" arity="1"/> - <fsummary>Extract largest element</fsummary> + <fsummary>Extract largest element.</fsummary> <desc> - <p>Returns <c>{<anno>Element</anno>, <anno>Set2</anno>}</c>, where <c><anno>Element</anno></c> is the - largest element in <c><anno>Set1</anno></c>, and <c><anno>Set2</anno></c> is this set - with <c><anno>Element</anno></c> deleted. Assumes that <c><anno>Set1</anno></c> is - nonempty.</p> + <p>Returns <c>{<anno>Element</anno>, <anno>Set2</anno>}</c>, where + <c><anno>Element</anno></c> is the largest element in + <c><anno>Set1</anno></c>, and <c><anno>Set2</anno></c> is this set + with <c><anno>Element</anno></c> deleted. Assumes that + <c><anno>Set1</anno></c> is not empty.</p> </desc> </func> + <func> <name name="take_smallest" arity="1"/> - <fsummary>Extract smallest element</fsummary> + <fsummary>Extract smallest element.</fsummary> <desc> - <p>Returns <c>{<anno>Element</anno>, <anno>Set2</anno>}</c>, where <c><anno>Element</anno></c> is the - smallest element in <c><anno>Set1</anno></c>, and <c><anno>Set2</anno></c> is this set - with <c><anno>Element</anno></c> deleted. Assumes that <c><anno>Set1</anno></c> is - nonempty.</p> + <p>Returns <c>{<anno>Element</anno>, <anno>Set2</anno>}</c>, where + <c><anno>Element</anno></c> is the smallest element in + <c><anno>Set1</anno></c>, and <c><anno>Set2</anno></c> is this set + with <c><anno>Element</anno></c> deleted. 
Assumes that + <c><anno>Set1</anno></c> is not empty.</p> </desc> </func> + <func> <name name="to_list" arity="1"/> - <fsummary>Convert a set into a list</fsummary> + <fsummary>Convert a set into a list.</fsummary> <desc> <p>Returns the elements of <c><anno>Set</anno></c> as a list.</p> </desc> </func> + <func> - <name name="union" arity="2"/> - <fsummary>Return the union of two sets</fsummary> + <name name="union" arity="1"/> + <fsummary>Return the union of a list of sets.</fsummary> <desc> - <p>Returns the merged (union) set of <c><anno>Set1</anno></c> and - <c><anno>Set2</anno></c>.</p> + <p>Returns the merged (union) set of the list of sets.</p> </desc> </func> + <func> - <name name="union" arity="1"/> - <fsummary>Return the union of a list of sets</fsummary> + <name name="union" arity="2"/> + <fsummary>Return the union of two sets.</fsummary> <desc> - <p>Returns the merged (union) set of the list of sets.</p> + <p>Returns the merged (union) set of <c><anno>Set1</anno></c> and + <c><anno>Set2</anno></c>.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="gb_trees">gb_trees(3)</seealso>, - <seealso marker="ordsets">ordsets(3)</seealso>, - <seealso marker="sets">sets(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="gb_trees"><c>gb_trees(3)</c></seealso>, + <seealso marker="ordsets"><c>ordsets(3)</c></seealso>, + <seealso marker="sets"><c>sets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/gb_trees.xml b/lib/stdlib/doc/src/gb_trees.xml index 5d1f27c014..9a49d66820 100644 --- a/lib/stdlib/doc/src/gb_trees.xml +++ b/lib/stdlib/doc/src/gb_trees.xml @@ -29,277 +29,320 @@ <rev></rev> </header> <module>gb_trees</module> - <modulesummary>General Balanced Trees</modulesummary> + <modulesummary>General balanced trees.</modulesummary> <description> - <p>An efficient implementation of Prof. Arne Andersson's General + <p>This module provides Prof. Arne Andersson's General Balanced Trees. 
These have no storage overhead compared to - unbalanced binary trees, and their performance is in general + unbalanced binary trees, and their performance is better than AVL trees.</p> + <p>This module considers two keys as different if and only if they do not compare equal (<c>==</c>).</p> </description> <section> - <title>Data structure</title> - <p>Data structure:</p> + <title>Data Structure</title> <code type="none"> - -- {Size, Tree}, where `Tree' is composed of nodes of the form: - - {Key, Value, Smaller, Bigger}, and the "empty tree" node: - - nil.</code> - <p>There is no attempt to balance trees after deletions. Since +{Size, Tree}</code> + + <p><c>Tree</c> is composed of nodes of the form <c>{Key, Value, Smaller, + Bigger}</c> and the "empty tree" node <c>nil</c>.</p> + + <p>There is no attempt to balance trees after deletions. As deletions do not increase the height of a tree, this should be OK.</p> - <p>Original balance condition <em>h(T) <= ceil(c * log(|T|))</em> + + <p>The original balance condition <em>h(T) <= ceil(c * log(|T|))</em> has been changed to the similar (but not quite equivalent) condition <em>2 ^ h(T) <= |T| ^ c</em>. This should also be OK.</p> - <p>Performance is comparable to the AVL trees in the Erlang book - (and faster in general due to less overhead); the difference is - that deletion works for these trees, but not for the book's - trees. 
Behaviour is logarithmic (as it should be).</p> </section> <datatypes> <datatype> <name name="tree" n_vars="2"/> - <desc><p>A GB tree.</p></desc> + <desc><p>A general balanced tree.</p></desc> </datatype> <datatype> <name name="tree" n_vars="0"/> </datatype> <datatype> <name name="iter" n_vars="2"/> - <desc><p>A GB tree iterator.</p></desc> + <desc><p>A general balanced tree iterator.</p></desc> </datatype> <datatype> <name name="iter" n_vars="0"/> </datatype> </datatypes> + <funcs> <func> <name name="balance" arity="1"/> - <fsummary>Rebalance a tree</fsummary> + <fsummary>Rebalance a tree.</fsummary> <desc> - <p>Rebalances <c><anno>Tree1</anno></c>. Note that this is rarely necessary, - but may be motivated when a large number of nodes have been + <p>Rebalances <c><anno>Tree1</anno></c>. Notice that this is + rarely necessary, + but can be motivated when many nodes have been deleted from the tree without further insertions. Rebalancing - could then be forced in order to minimise lookup times, since - deletion only does not rebalance the tree.</p> + can then be forced to minimize lookup times, as + deletion does not rebalance the tree.</p> </desc> </func> + <func> <name name="delete" arity="2"/> - <fsummary>Remove a node from a tree</fsummary> + <fsummary>Remove a node from a tree.</fsummary> <desc> - <p>Removes the node with key <c><anno>Key</anno></c> from <c><anno>Tree1</anno></c>; - returns new tree. Assumes that the key is present in the tree, - crashes otherwise.</p> + <p>Removes the node with key <c><anno>Key</anno></c> from + <c><anno>Tree1</anno></c> and returns the new tree. 
Assumes that the + key is present in the tree, crashes otherwise.</p> </desc> </func> + <func> <name name="delete_any" arity="2"/> - <fsummary>Remove a (possibly non-existing) node from a tree</fsummary> + <fsummary>Remove a (possibly non-existing) node from a tree.</fsummary> <desc> - <p>Removes the node with key <c><anno>Key</anno></c> from <c><anno>Tree1</anno></c> if - the key is present in the tree, otherwise does nothing; - returns new tree.</p> + <p>Removes the node with key <c><anno>Key</anno></c> from + <c><anno>Tree1</anno></c> if + the key is present in the tree, otherwise does nothing. + Returns the new tree.</p> </desc> </func> + <func> <name name="empty" arity="0"/> - <fsummary>Return an empty tree</fsummary> + <fsummary>Return an empty tree.</fsummary> <desc> - <p>Returns a new empty tree</p> + <p>Returns a new empty tree.</p> </desc> </func> + <func> <name name="enter" arity="3"/> - <fsummary>Insert or update key with value in a tree</fsummary> + <fsummary>Insert or update key with value in a tree.</fsummary> <desc> - <p>Inserts <c><anno>Key</anno></c> with value <c><anno>Value</anno></c> into <c><anno>Tree1</anno></c> if - the key is not present in the tree, otherwise updates - <c><anno>Key</anno></c> to value <c><anno>Value</anno></c> in <c><anno>Tree1</anno></c>. Returns the + <p>Inserts <c><anno>Key</anno></c> with value <c><anno>Value</anno></c> + into <c><anno>Tree1</anno></c> if the key is not present in the tree, + otherwise updates <c><anno>Key</anno></c> to value + <c><anno>Value</anno></c> in <c><anno>Tree1</anno></c>. Returns the new tree.</p> </desc> </func> + <func> <name name="from_orddict" arity="1"/> - <fsummary>Make a tree from an orddict</fsummary> + <fsummary>Make a tree from an orddict.</fsummary> <desc> - <p>Turns an ordered list <c><anno>List</anno></c> of key-value tuples into a - tree. The list must not contain duplicate keys.</p> + <p>Turns an ordered list <c><anno>List</anno></c> of key-value tuples + into a tree. 
The list must not contain duplicate keys.</p> </desc> </func> + <func> <name name="get" arity="2"/> - <fsummary>Look up a key in a tree, if present</fsummary> + <fsummary>Look up a key in a tree, if present.</fsummary> <desc> - <p>Retrieves the value stored with <c><anno>Key</anno></c> in <c><anno>Tree</anno></c>. + <p>Retrieves the value stored with <c><anno>Key</anno></c> in + <c><anno>Tree</anno></c>. Assumes that the key is present in the tree, crashes otherwise.</p> </desc> </func> + <func> <name name="insert" arity="3"/> - <fsummary>Insert a new key and value in a tree</fsummary> + <fsummary>Insert a new key and value in a tree.</fsummary> <desc> - <p>Inserts <c><anno>Key</anno></c> with value <c><anno>Value</anno></c> into <c><anno>Tree1</anno></c>; + <p>Inserts <c><anno>Key</anno></c> with value <c><anno>Value</anno></c> + into <c><anno>Tree1</anno></c> and returns the new tree. Assumes that the key is not present in the tree, crashes otherwise.</p> </desc> </func> + <func> <name name="is_defined" arity="2"/> - <fsummary>Test for membership of a tree</fsummary> + <fsummary>Test for membership of a tree.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Key</anno></c> is present in <c><anno>Tree</anno></c>, - otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>Key</anno></c> is present in + <c><anno>Tree</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_empty" arity="1"/> - <fsummary>Test for empty tree</fsummary> + <fsummary>Test for empty tree.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Tree</anno></c> is an empty tree, and - <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Tree</anno></c> is an empty tree, + otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="iterator" arity="1"/> - <fsummary>Return an iterator for a tree</fsummary> + <fsummary>Return an iterator for a tree.</fsummary> <desc> <p>Returns an iterator that can be used for traversing the - entries of 
<c><anno>Tree</anno></c>; see <c>next/1</c>. The implementation + entries of <c><anno>Tree</anno></c>; see + <seealso marker="#next/1"><c>next/1</c></seealso>. The implementation of this is very efficient; traversing the whole tree using <c>next/1</c> is only slightly slower than getting the list - of all elements using <c>to_list/1</c> and traversing that. + of all elements using + <seealso marker="#to_list/1"><c>to_list/1</c></seealso> + and traversing that. The main advantage of the iterator approach is that it does not require the complete list of all elements to be built in memory at one time.</p> </desc> </func> + <func> <name name="iterator_from" arity="2"/> - <fsummary>Return an iterator for a tree starting from specified key</fsummary> + <fsummary>Return an iterator for a tree starting from a specified key. + </fsummary> <desc> <p>Returns an iterator that can be used for traversing the - entries of <c><anno>Tree</anno></c>; see <c>next/1</c>. - The difference as compared to the iterator returned by - <c>iterator/1</c> is that the first key greater than - or equal to <c><anno>Key</anno></c> is returned.</p> + entries of <c><anno>Tree</anno></c>; see + <seealso marker="#next/1"><c>next/1</c></seealso>. 
+ The difference as compared to the iterator returned by + <seealso marker="#iterator/1"><c>iterator/1</c></seealso> + is that the first key greater than + or equal to <c><anno>Key</anno></c> is returned.</p> </desc> </func> + <func> <name name="keys" arity="1"/> - <fsummary>Return a list of the keys in a tree</fsummary> + <fsummary>Return a list of the keys in a tree.</fsummary> <desc> <p>Returns the keys in <c><anno>Tree</anno></c> as an ordered list.</p> </desc> </func> + <func> <name name="largest" arity="1"/> - <fsummary>Return largest key and value</fsummary> + <fsummary>Return largest key and value.</fsummary> <desc> - <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>}</c>, where <c><anno>Key</anno></c> is the largest - key in <c><anno>Tree</anno></c>, and <c><anno>Value</anno></c> is the value associated - with this key. Assumes that the tree is nonempty.</p> + <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>}</c>, where + <c><anno>Key</anno></c> is the largest + key in <c><anno>Tree</anno></c>, and <c><anno>Value</anno></c> is + the value associated + with this key. Assumes that the tree is not empty.</p> </desc> </func> + <func> <name name="lookup" arity="2"/> - <fsummary>Look up a key in a tree</fsummary> + <fsummary>Look up a key in a tree.</fsummary> <desc> - <p>Looks up <c><anno>Key</anno></c> in <c><anno>Tree</anno></c>; returns - <c>{value, <anno>Value</anno>}</c>, or <c>none</c> if <c><anno>Key</anno></c> is not - present.</p> + <p>Looks up <c><anno>Key</anno></c> in <c><anno>Tree</anno></c>. 
+ Returns <c>{value, <anno>Value</anno>}</c>, or <c>none</c> if + <c><anno>Key</anno></c> is not present.</p> </desc> </func> + <func> <name name="map" arity="2"/> - <fsummary>Return largest key and value</fsummary> - <desc><p>Maps the function F(<anno>K</anno>, <anno>V1</anno>) -> <anno>V2</anno> to all key-value pairs - of the tree <c><anno>Tree1</anno></c> and returns a new tree <c><anno>Tree2</anno></c> with the same set of keys - as <c><anno>Tree1</anno></c> and the new set of values <c><anno>V2</anno></c>.</p> + <fsummary>Return largest key and value.</fsummary> + <desc> + <p>Maps function F(<anno>K</anno>, <anno>V1</anno>) -> <anno>V2</anno> + to all key-value pairs of tree <c><anno>Tree1</anno></c>. Returns a + new tree <c><anno>Tree2</anno></c> with the same set of keys as + <c><anno>Tree1</anno></c> and the new set of values + <c><anno>V2</anno></c>.</p> </desc> </func> + <func> <name name="next" arity="1"/> - <fsummary>Traverse a tree with an iterator</fsummary> + <fsummary>Traverse a tree with an iterator.</fsummary> <desc> - <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, <anno>Iter2</anno>}</c> where <c><anno>Key</anno></c> is the - smallest key referred to by the iterator <c><anno>Iter1</anno></c>, and + <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, + <anno>Iter2</anno>}</c>, where <c><anno>Key</anno></c> is the + smallest key referred to by iterator <c><anno>Iter1</anno></c>, and <c><anno>Iter2</anno></c> is the new iterator to be used for traversing the remaining nodes, or the atom <c>none</c> if no nodes remain.</p> </desc> </func> + <func> <name name="size" arity="1"/> - <fsummary>Return the number of nodes in a tree</fsummary> + <fsummary>Return the number of nodes in a tree.</fsummary> <desc> <p>Returns the number of nodes in <c><anno>Tree</anno></c>.</p> </desc> </func> + <func> <name name="smallest" arity="1"/> - <fsummary>Return smallest key and value</fsummary> + <fsummary>Return smallest key and value.</fsummary> <desc> - 
<p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>}</c>, where <c><anno>Key</anno></c> is the smallest - key in <c><anno>Tree</anno></c>, and <c><anno>Value</anno></c> is the value associated - with this key. Assumes that the tree is nonempty.</p> + <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>}</c>, where + <c><anno>Key</anno></c> is the smallest + key in <c><anno>Tree</anno></c>, and <c><anno>Value</anno></c> is + the value associated + with this key. Assumes that the tree is not empty.</p> </desc> </func> + <func> <name name="take_largest" arity="1"/> - <fsummary>Extract largest key and value</fsummary> + <fsummary>Extract largest key and value.</fsummary> <desc> - <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, <anno>Tree2</anno>}</c>, where <c><anno>Key</anno></c> is the - largest key in <c><anno>Tree1</anno></c>, <c><anno>Value</anno></c> is the value - associated with this key, and <c><anno>Tree2</anno></c> is this tree with - the corresponding node deleted. Assumes that the tree is - nonempty.</p> + <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, + <anno>Tree2</anno>}</c>, where <c><anno>Key</anno></c> is the + largest key in <c><anno>Tree1</anno></c>, <c><anno>Value</anno></c> + is the value associated with this key, and <c><anno>Tree2</anno></c> + is this tree with the corresponding node deleted. Assumes that the + tree is not empty.</p> </desc> </func> + <func> <name name="take_smallest" arity="1"/> - <fsummary>Extract smallest key and value</fsummary> + <fsummary>Extract smallest key and value.</fsummary> <desc> - <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, <anno>Tree2</anno>}</c>, where <c><anno>Key</anno></c> is the - smallest key in <c><anno>Tree1</anno></c>, <c><anno>Value</anno></c> is the value - associated with this key, and <c><anno>Tree2</anno></c> is this tree with - the corresponding node deleted. 
Assumes that the tree is - nonempty.</p> + <p>Returns <c>{<anno>Key</anno>, <anno>Value</anno>, + <anno>Tree2</anno>}</c>, where <c><anno>Key</anno></c> is the + smallest key in <c><anno>Tree1</anno></c>, <c><anno>Value</anno></c> + is the value associated with this key, and <c><anno>Tree2</anno></c> + is this tree with the corresponding node deleted. Assumes that the + tree is not empty.</p> </desc> </func> + <func> <name name="to_list" arity="1"/> - <fsummary>Convert a tree into a list</fsummary> + <fsummary>Convert a tree into a list.</fsummary> <desc> <p>Converts a tree into an ordered list of key-value tuples.</p> </desc> </func> + <func> <name name="update" arity="3"/> - <fsummary>Update a key to new value in a tree</fsummary> + <fsummary>Update a key to new value in a tree.</fsummary> <desc> - <p>Updates <c><anno>Key</anno></c> to value <c><anno>Value</anno></c> in <c><anno>Tree1</anno></c>; - returns the new tree. Assumes that the key is present in the - tree.</p> + <p>Updates <c><anno>Key</anno></c> to value <c><anno>Value</anno></c> + in <c><anno>Tree1</anno></c> and + returns the new tree. Assumes that the key is present in the tree.</p> </desc> </func> + <func> <name name="values" arity="1"/> - <fsummary>Return a list of the values in a tree</fsummary> + <fsummary>Return a list of the values in a tree.</fsummary> <desc> - <p>Returns the values in <c><anno>Tree</anno></c> as an ordered list, sorted - by their corresponding keys. Duplicates are not removed.</p> + <p>Returns the values in <c><anno>Tree</anno></c> as an ordered list, + sorted by their corresponding keys. 
Duplicates are not removed.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="gb_sets">gb_sets(3)</seealso>, - <seealso marker="dict">dict(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="dict"><c>dict(3)</c></seealso>, + <seealso marker="gb_sets"><c>gb_sets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/gen_event.xml b/lib/stdlib/doc/src/gen_event.xml index b2c482d3ed..c24542002a 100644 --- a/lib/stdlib/doc/src/gen_event.xml +++ b/lib/stdlib/doc/src/gen_event.xml @@ -29,19 +29,23 @@ <rev></rev> </header> <module>gen_event</module> - <modulesummary>Generic Event Handling Behaviour</modulesummary> + <modulesummary>Generic event handling behavior.</modulesummary> <description> - <p>A behaviour module for implementing event handling functionality. - The OTP event handling model consists of a generic event manager - process with an arbitrary number of event handlers which are added and - deleted dynamically.</p> - <p>An event manager implemented using this module will have a standard - set of interface functions and include functionality for tracing and - error reporting. It will also fit into an OTP supervision tree. - Refer to <em>OTP Design Principles</em> for more information.</p> + <p>This behavior module provides event handling functionality. It + consists of a generic event manager process with any number of + event handlers that are added and deleted dynamically.</p> + + <p>An event manager implemented using this module has a standard + set of interface functions and includes functionality for tracing and + error reporting. It also fits into an OTP supervision tree. For more + information, see + <seealso marker="doc/design_principles:events">OTP Design Principles</seealso>. + </p> + <p>Each event handler is implemented as a callback module exporting - a pre-defined set of functions. 
The relationship between the behaviour - functions and the callback functions can be illustrated as follows:</p> + a predefined set of functions. The relationship between the behavior + functions and the callback functions is as follows:</p> + <pre> gen_event module Callback module ---------------- --------------- @@ -69,39 +73,46 @@ gen_event:which_handlers -----> - gen_event:stop -----> Module:terminate/2 - -----> Module:code_change/3</pre> - <p>Since each event handler is one callback module, an event manager - will have several callback modules which are added and deleted - dynamically. Therefore <c>gen_event</c> is more tolerant of callback - module errors than the other behaviours. If a callback function for + + <p>As each event handler is one callback module, an event manager + has many callback modules that are added and deleted + dynamically. <c>gen_event</c> is therefore more tolerant of callback + module errors than the other behaviors. If a callback function for an installed event handler fails with <c>Reason</c>, or returns a - bad value <c>Term</c>, the event manager will not fail. It will delete - the event handler by calling the callback function - <c>Module:terminate/2</c> (see below), giving as argument + bad value <c>Term</c>, the event manager does not fail. It deletes + the event handler by calling callback function + <seealso marker="#Module:terminate/2"><c>Module:terminate/2</c></seealso>, + giving as argument <c>{error,{'EXIT',Reason}}</c> or <c>{error,Term}</c>, respectively. - No other event handler will be affected.</p> - <p>A gen_event process handles system messages as documented in - <seealso marker="sys">sys(3)</seealso>. The <c>sys</c> module + No other event handler is affected.</p> + + <p>A <c>gen_event</c> process handles system messages as described in + <seealso marker="sys"><c>sys(3)</c></seealso>. 
The <c>sys</c> module can be used for debugging an event manager.</p> - <p>Note that an event manager <em>does</em> trap exit signals + + <p>Notice that an event manager <em>does</em> trap exit signals automatically.</p> - <p>The gen_event process can go into hibernation - (see <seealso marker="erts:erlang#erlang:hibernate/3">erlang(3)</seealso>) if a callback - function in a handler module specifies <c>'hibernate'</c> in its return value. - This might be useful if the server is expected to be idle for a long - time. However this feature should be used with care as hibernation - implies at least two garbage collections (when hibernating and - shortly after waking up) and is not something you'd want to do - between each event handled by a busy event manager.</p> - - <p>It's also worth noting that when multiple event handlers are - invoked, it's sufficient that one single event handler returns a - <c>'hibernate'</c> request for the whole event manager to go into - hibernation.</p> + + <p>The <c>gen_event</c> process can go into hibernation + (see <seealso marker="erts:erlang#hibernate/3"> + <c>erlang:hibernate/3</c></seealso>) if a callback function in + a handler module specifies <c>hibernate</c> in its return value. + This can be useful if the server is expected to be idle for a long + time. 
However, use this feature with care, as hibernation + implies at least two garbage collections (when hibernating and + shortly after waking up) and is not something you want to do + between each event handled by a busy event manager.</p> + + <p>Notice that when multiple event handlers are + invoked, it is sufficient that one single event handler returns a + <c>hibernate</c> request for the whole event manager to go into + hibernation.</p> <p>Unless otherwise stated, all functions in this module fail if the specified event manager does not exist or if bad arguments are - given.</p> + specified.</p> </description> + <datatypes> <datatype> <name name="handler"/> @@ -116,66 +127,9 @@ gen_event:stop -----> Module:terminate/2 <name name="del_handler_ret"/> </datatype> </datatypes> + <funcs> <func> - <name>start_link() -> Result</name> - <name>start_link(EventMgrName) -> Result</name> - <fsummary>Create a generic event manager process in a supervision tree.</fsummary> - <type> - <v>EventMgrName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> - <v> Pid = pid()</v> - </type> - <desc> - <p>Creates an event manager process as part of a supervision - tree. The function should be called, directly or indirectly, - by the supervisor. It will, among other things, ensure that - the event manager is linked to the supervisor.</p> - <p>If <c>EventMgrName={local,Name}</c>, the event manager is - registered locally as <c>Name</c> using <c>register/2</c>. - If <c>EventMgrName={global,GlobalName}</c>, the event manager is - registered globally as <c>GlobalName</c> using - <c>global:register_name/2</c>. If no name is provided, - the event manager is not registered. - If <c>EventMgrName={via,Module,ViaName}</c>, the event manager will - register with the registry represented by <c>Module</c>. 
- The <c>Module</c> callback should export the functions - <c>register_name/2</c>, <c>unregister_name/1</c>, - <c>whereis_name/1</c> and <c>send/2</c>, which should behave like the - corresponding functions in <c>global</c>. Thus, - <c>{via,global,GlobalName}</c> is a valid reference.</p> - <p>If the event manager is successfully created the function - returns <c>{ok,Pid}</c>, where <c>Pid</c> is the pid of - the event manager. If there already exists a process with - the specified <c>EventMgrName</c> the function returns - <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is - the pid of that process.</p> - </desc> - </func> - <func> - <name>start() -> Result</name> - <name>start(EventMgrName) -> Result</name> - <fsummary>Create a stand-alone event manager process.</fsummary> - <type> - <v>EventMgrName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> - <v> Pid = pid()</v> - </type> - <desc> - <p>Creates a stand-alone event manager process, i.e. an event - manager which is not part of a supervision tree and thus has - no supervisor.</p> - <p>See <c>start_link/0,1</c> for a description of arguments and - return values.</p> - </desc> - </func> - <func> <name>add_handler(EventMgrRef, Handler, Args) -> Result</name> <fsummary>Add an event handler to a generic event manager.</fsummary> <type> @@ -191,26 +145,27 @@ gen_event:stop -----> Module:terminate/2 <v> Reason = term()</v> </type> <desc> - <p>Adds a new event handler to the event manager <c>EventMgrRef</c>. - The event manager will call <c>Module:init/1</c> to initiate - the event handler and its internal state.</p> - <p><c>EventMgrRef</c> can be:</p> + <p>Adds a new event handler to event manager <c>EventMgrRef</c>. 
+ The event manager calls + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> + to initiate the event handler and its internal state.</p> + <p><c>EventMgrRef</c> can be any of the following:</p> <list type="bulleted"> - <item>the pid,</item> - <item><c>Name</c>, if the event manager is locally registered,</item> + <item>The pid</item> + <item><c>Name</c>, if the event manager is locally registered</item> <item><c>{Name,Node}</c>, if the event manager is locally - registered at another node, or</item> + registered at another node</item> <item><c>{global,GlobalName}</c>, if the event manager is globally - registered.</item> - <item><c>{via,Module,ViaName}</c>, if the event manager is registered - through an alternative process registry.</item> + registered</item> + <item><c>{via,Module,ViaName}</c>, if the event manager is registered + through an alternative process registry</item> </list> <p><c>Handler</c> is the name of the callback module <c>Module</c> or a tuple <c>{Module,Id}</c>, where <c>Id</c> is any term. The <c>{Module,Id}</c> representation makes it possible to - identify a specific event handler when there are several event - handlers using the same callback module.</p> - <p><c>Args</c> is an arbitrary term which is passed as the argument + identify a specific event handler when many event handlers + use the same callback module.</p> + <p><c>Args</c> is any term that is passed as the argument to <c>Module:init/1</c>.</p> <p>If <c>Module:init/1</c> returns a correct value indicating successful completion, the event manager adds the event @@ -221,9 +176,11 @@ gen_event:stop -----> Module:terminate/2 <c>{error,Reason}</c>, respectively.</p> </desc> </func> + <func> <name>add_sup_handler(EventMgrRef, Handler, Args) -> Result</name> - <fsummary>Add a supervised event handler to a generic event manager.</fsummary> + <fsummary>Add a supervised event handler to a generic event manager. 
+ </fsummary> <type> <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} | {via,Module,ViaName} | pid()</v> @@ -237,63 +194,52 @@ gen_event:stop -----> Module:terminate/2 <v> Reason = term()</v> </type> <desc> - <p>Adds a new event handler in the same way as <c>add_handler/3</c> - but will also supervise the connection between the event handler + <p>Adds a new event handler in the same way as + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>, + but also supervises the connection between the event handler and the calling process.</p> <list type="bulleted"> <item>If the calling process later terminates with <c>Reason</c>, - the event manager will delete the event handler by calling - <c>Module:terminate/2</c> with <c>{stop,Reason}</c> as argument.</item> + the event manager deletes the event handler by calling + <seealso marker="#Module:terminate/2"> + <c>Module:terminate/2</c></seealso> + with <c>{stop,Reason}</c> as argument. + </item> <item> - <p>If the event handler later is deleted, the event manager + <p>If the event handler is deleted later, the event manager sends a message<c>{gen_event_EXIT,Handler,Reason}</c> to the calling process. <c>Reason</c> is one of the following:</p> <list type="bulleted"> - <item><c>normal</c>, if the event handler has been removed due to a - call to <c>delete_handler/3</c>, or <c>remove_handler</c> - has been returned by a callback function (see below).</item> - <item><c>shutdown</c>, if the event handler has been removed - because the event manager is terminating.</item> - <item><c>{swapped,NewHandler,Pid}</c>, if the process <c>Pid</c> - has replaced the event handler with another event handler - <c>NewHandler</c> using a call to <c>swap_handler/3</c> or - <c>swap_sup_handler/3</c>.</item> - <item>a term, if the event handler is removed due to an error. 
- Which term depends on the error.</item> + <item> + <p><c>normal</c>, if the event handler has been removed + because of a + call to <c>delete_handler/3</c>, or <c>remove_handler</c> + has been returned by a callback function (see below).</p> + </item> + <item> + <p><c>shutdown</c>, if the event handler has been removed + because the event manager is terminating.</p> + </item> + <item> + <p><c>{swapped,NewHandler,Pid}</c>, if the process <c>Pid</c> + has replaced the event handler with another event handler + <c>NewHandler</c> using a call to + <seealso marker="#swap_handler/3"> + <c>swap_handler/3</c></seealso> or + <seealso marker="#swap_sup_handler/3"> + <c>swap_sup_handler/3</c></seealso>.</p> + </item> + <item> + <p>A term, if the event handler is removed because of an error. + Which term depends on the error.</p></item> </list> </item> </list> - <p>See <c>add_handler/3</c> for a description of the arguments - and return values.</p> - </desc> - </func> - <func> - <name>notify(EventMgrRef, Event) -> ok</name> - <name>sync_notify(EventMgrRef, Event) -> ok</name> - <fsummary>Notify an event manager about an event.</fsummary> - <type> - <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> - <v> Name = Node = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Event = term()</v> - </type> - <desc> - <p>Sends an event notification to the event manager - <c>EventMgrRef</c>. The event manager will call - <c>Module:handle_event/2</c> for each installed event handler to - handle the event.</p> - <p><c>notify</c> is asynchronous and will return immediately after - the event notification has been sent. 
<c>sync_notify</c> is - synchronous in the sense that it will return <c>ok</c> after - the event has been handled by all event handlers.</p> - <p>See <c>add_handler/3</c> for a description of <c>EventMgrRef</c>.</p> - <p><c>Event</c> is an arbitrary term which is passed as one of - the arguments to <c>Module:handle_event/2</c>.</p> - <p><c>notify</c> will not fail even if the specified event manager - does not exist, unless it is specified as <c>Name</c>.</p> + <p>For a description of the arguments and return values, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> </desc> </func> + <func> <name>call(EventMgrRef, Handler, Request) -> Result</name> <name>call(EventMgrRef, Handler, Request, Timeout) -> Result</name> @@ -314,18 +260,18 @@ gen_event:stop -----> Module:terminate/2 <v> Reason = term()</v> </type> <desc> - <p>Makes a synchronous call to the event handler <c>Handler</c> - installed in the event manager <c>EventMgrRef</c> by sending a - request and waiting until a reply arrives or a timeout occurs. - The event manager will call <c>Module:handle_call/2</c> to handle - the request.</p> - <p>See <c>add_handler/3</c> for a description of <c>EventMgrRef</c> - and <c>Handler</c>.</p> - <p><c>Request</c> is an arbitrary term which is passed as one of + <p>Makes a synchronous call to event handler <c>Handler</c> + installed in event manager <c>EventMgrRef</c> by sending a + request and waiting until a reply arrives or a time-out occurs. 
+ The event manager calls <seealso marker="#Module:handle_call/2"> + <c>Module:handle_call/2</c></seealso> to handle the request.</p> + <p>For a description of <c>EventMgrRef</c> and <c>Handler</c>, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> + <p><c>Request</c> is any term that is passed as one of the arguments to <c>Module:handle_call/2</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which specifies + <p><c>Timeout</c> is an integer greater than zero that specifies how many milliseconds to wait for a reply, or the atom - <c>infinity</c> to wait indefinitely. Default value is 5000. + <c>infinity</c> to wait indefinitely. Defaults to 5000. If no reply is received within the specified time, the function call fails.</p> <p>The return value <c>Reply</c> is defined in the return value of @@ -337,7 +283,8 @@ gen_event:stop -----> Module:terminate/2 respectively.</p> </desc> </func> - <func> + + <func> <name>delete_handler(EventMgrRef, Handler, Args) -> Result</name> <fsummary>Delete an event handler from a generic event manager.</fsummary> <type> @@ -353,12 +300,14 @@ gen_event:stop -----> Module:terminate/2 <v> Reason = term()</v> </type> <desc> - <p>Deletes an event handler from the event manager - <c>EventMgrRef</c>. The event manager will call - <c>Module:terminate/2</c> to terminate the event handler.</p> - <p>See <c>add_handler/3</c> for a description of <c>EventMgrRef</c> - and <c>Handler</c>.</p> - <p><c>Args</c> is an arbitrary term which is passed as one of + <p>Deletes an event handler from event manager + <c>EventMgrRef</c>. 
The event manager calls + <seealso marker="#Module:terminate/2"> + <c>Module:terminate/2</c></seealso> to terminate the event + handler.</p> + <p>For a description of <c>EventMgrRef</c> and <c>Handler</c>, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> + <p><c>Args</c> is any term that is passed as one of the arguments to <c>Module:terminate/2</c>.</p> <p>The return value is the return value of <c>Module:terminate/2</c>. If the specified event handler is not installed, the function @@ -367,6 +316,148 @@ gen_event:stop -----> Module:terminate/2 <c>{'EXIT',Reason}</c>.</p> </desc> </func> + + <func> + <name>notify(EventMgrRef, Event) -> ok</name> + <name>sync_notify(EventMgrRef, Event) -> ok</name> + <fsummary>Notify an event manager about an event.</fsummary> + <type> + <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} + | {via,Module,ViaName} | pid()</v> + <v> Name = Node = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Event = term()</v> + </type> + <desc> + <p>Sends an event notification to event manager + <c>EventMgrRef</c>. The event manager calls + <seealso marker="#Module:handle_event/2"> + <c>Module:handle_event/2</c></seealso> + for each installed event handler to handle the event.</p> + <p><c>notify/2</c> is asynchronous and returns immediately after + the event notification has been sent. 
<c>sync_notify/2</c> is + synchronous in the sense that it returns <c>ok</c> after + the event has been handled by all event handlers.</p> + <p>For a description of <c>EventMgrRef</c>, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> + <p><c>Event</c> is any term that is passed as one of + the arguments to <seealso marker="#Module:handle_event/2"> + <c>Module:handle_event/2</c></seealso>.</p> + <p><c>notify/2</c> does not fail even if the specified event manager + does not exist, unless it is specified as <c>Name</c>.</p> + </desc> + </func> + + <func> + <name>start() -> Result</name> + <name>start(EventMgrName) -> Result</name> + <fsummary>Create a stand-alone event manager process.</fsummary> + <type> + <v>EventMgrName = {local,Name} | {global,GlobalName} + | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> + <v> Pid = pid()</v> + </type> + <desc> + <p>Creates a stand-alone event manager process, that is, an event + manager that is not part of a supervision tree and thus has + no supervisor.</p> + <p>For a description of the arguments and return values, see + <seealso marker="#start_link/0"><c>start_link/0,1</c></seealso>.</p> + </desc> + </func> + + <func> + <name>start_link() -> Result</name> + <name>start_link(EventMgrName) -> Result</name> + <fsummary>Create a generic event manager process in a supervision tree. + </fsummary> + <type> + <v>EventMgrName = {local,Name} | {global,GlobalName} + | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> + <v> Pid = pid()</v> + </type> + <desc> + <p>Creates an event manager process as part of a supervision + tree. The function is to be called, directly or indirectly, + by the supervisor. 
For example, it ensures that + the event manager is linked to the supervisor.</p> + <list type="bulleted"> + <item> + <p>If <c>EventMgrName={local,Name}</c>, the event manager is + registered locally as <c>Name</c> using <c>register/2</c>.</p> + </item> + <item> + <p>If <c>EventMgrName={global,GlobalName}</c>, the event manager is + registered globally as <c>GlobalName</c> using + <seealso marker="kernel:global#register_name/2"> + <c>global:register_name/2</c></seealso>. + If no name is provided, the event manager is not registered.</p> + </item> + <item> + <p>If <c>EventMgrName={via,Module,ViaName}</c>, the event manager + registers with the registry represented by <c>Module</c>. + The <c>Module</c> callback is to export the functions + <c>register_name/2</c>, <c>unregister_name/1</c>, + <c>whereis_name/1</c>, and <c>send/2</c>, which are to behave + as the corresponding functions in + <seealso marker="kernel:global"><c>global</c></seealso>. + Thus, <c>{via,global,GlobalName}</c> is a valid reference.</p> + </item> + </list> + <p>If the event manager is successfully created, the function + returns <c>{ok,Pid}</c>, where <c>Pid</c> is the pid of + the event manager. If a process with the specified + <c>EventMgrName</c> exists already, the function returns + <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is + the pid of that process.</p> + </desc> + </func> + + <func> + <name>stop(EventMgrRef) -> ok</name> + <name>stop(EventMgrRef, Reason, Timeout) -> ok</name> + <fsummary>Terminate a generic event manager.</fsummary> + <type> + <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} + | {via,Module,ViaName} | pid()</v> + <v>Name = Node = atom()</v> + <v>GlobalName = ViaName = term()</v> + <v>Reason = term()</v> + <v>Timeout = int()>0 | infinity</v> + </type> + <desc> + <p>Orders event manager <c>EventMgrRef</c> to exit with + the specified <c>Reason</c> and waits for it to + terminate. 
Before terminating, <c>gen_event</c> calls + <seealso marker="#Module:terminate/2"> + <c>Module:terminate(stop,...)</c></seealso> + for each installed event handler.</p> + <p>The function returns <c>ok</c> if the event manager terminates + with the expected reason. Any other reason than <c>normal</c>, + <c>shutdown</c>, or <c>{shutdown,Term}</c> causes an + error report to be issued using + <seealso marker="kernel:error_logger#format/2"> + <c>error_logger:format/2</c></seealso>. + The default <c>Reason</c> is <c>normal</c>.</p> + <p><c>Timeout</c> is an integer greater than zero that + specifies how many milliseconds to wait for the event manager to + terminate, or the atom <c>infinity</c> to wait + indefinitely. Defaults to <c>infinity</c>. If the + event manager has not terminated within the specified time, a + <c>timeout</c> exception is raised.</p> + <p>If the process does not exist, a <c>noproc</c> exception + is raised.</p> + <p>For a description of <c>EventMgrRef</c>, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> + </desc> + </func> + <func> <name>swap_handler(EventMgrRef, {Handler1,Args1}, {Handler2,Args2}) -> Result</name> <fsummary>Replace an event handler in a generic event manager.</fsummary> @@ -385,34 +476,35 @@ gen_event:stop -----> Module:terminate/2 </type> <desc> <p>Replaces an old event handler with a new event handler in - the event manager <c>EventMgrRef</c>.</p> - <p>See <c>add_handler/3</c> for a description of the arguments.</p> + event manager <c>EventMgrRef</c>.</p> + <p>For a description of the arguments, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> <p>First the old event handler <c>Handler1</c> is deleted. 
The event manager calls <c>Module1:terminate(Args1, ...)</c>, where <c>Module1</c> is the callback module of <c>Handler1</c>, and collects the return value.</p> <p>Then the new event handler <c>Handler2</c> is added and initiated by calling <c>Module2:init({Args2,Term})</c>, where <c>Module2</c> - is the callback module of <c>Handler2</c> and <c>Term</c> + is the callback module of <c>Handler2</c> and <c>Term</c> is the return value of <c>Module1:terminate/2</c>. This makes it possible to transfer information from <c>Handler1</c> to <c>Handler2</c>.</p> - <p>The new handler will be added even if the the specified old event - handler is not installed in which case <c>Term=error</c>, or if - <c>Module1:terminate/2</c> fails with <c>Reason</c> in which case - <c>Term={'EXIT',Reason}</c>. - The old handler will be deleted even if <c>Module2:init/1</c> - fails.</p> + <p>The new handler is added even if the specified old event + handler is not installed, in which case <c>Term=error</c>, or if + <c>Module1:terminate/2</c> fails with <c>Reason</c>, + in which case <c>Term={'EXIT',Reason}</c>. + The old handler is deleted even if <c>Module2:init/1</c> fails.</p> <p>If there was a supervised connection between <c>Handler1</c> and - a process <c>Pid</c>, there will be a supervised connection + a process <c>Pid</c>, there is a supervised connection between <c>Handler2</c> and <c>Pid</c> instead.</p> <p>If <c>Module2:init/1</c> returns a correct value, this function returns <c>ok</c>. 
If <c>Module2:init/1</c> fails with - <c>Reason</c> or returns an unexpected value <c>Term</c>, this + <c>Reason</c> or returns an unexpected value <c>Term</c>, this function returns <c>{error,{'EXIT',Reason}}</c> or <c>{error,Term}</c>, respectively.</p> </desc> </func> + <func> <name>swap_sup_handler(EventMgrRef, {Handler1,Args1}, {Handler2,Args2}) -> Result</name> <fsummary>Replace an event handler in a generic event manager.</fsummary> @@ -430,16 +522,18 @@ gen_event:stop -----> Module:terminate/2 <v> Reason = term()</v> </type> <desc> - <p>Replaces an event handler in the event manager <c>EventMgrRef</c> - in the same way as <c>swap_handler/3</c> but will also supervise + <p>Replaces an event handler in event manager <c>EventMgrRef</c> + in the same way as <c>swap_handler/3</c>, but also supervises the connection between <c>Handler2</c> and the calling process.</p> - <p>See <c>swap_handler/3</c> for a description of the arguments - and return values.</p> + <p>For a description of the arguments and return values, see + <seealso marker="#swap_handler/3"><c>swap_handler/3</c></seealso>.</p> </desc> </func> + <func> <name>which_handlers(EventMgrRef) -> [Handler]</name> - <fsummary>Return all event handlers installed in a generic event manager.</fsummary> + <fsummary>Return all event handlers installed in a generic event manager. 
+ </fsummary> <type> <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} | {via,Module,ViaName} | pid()</v> @@ -450,132 +544,106 @@ gen_event:stop -----> Module:terminate/2 <v> Id = term()</v> </type> <desc> - <p>Returns a list of all event handlers installed in the event + <p>Returns a list of all event handlers installed in event manager <c>EventMgrRef</c>.</p> - <p>See <c>add_handler/3</c> for a description of <c>EventMgrRef</c> - and <c>Handler</c>.</p> - </desc> - </func> - <func> - <name>stop(EventMgrRef) -> ok</name> - <name>stop(EventMgrRef, Reason, Timeout) -> ok</name> - <fsummary>Terminate a generic event manager.</fsummary> - <type> - <v>EventMgrRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> - <v>Name = Node = atom()</v> - <v>GlobalName = ViaName = term()</v> - <v>Reason = term()</v> - <v>Timeout = int()>0 | infinity</v> - </type> - <desc> - <p>Orders the event manager <c>EventMgrRef</c> to exit with - the given <c>Reason</c> and waits for it to - terminate. Before terminating, the gen_event will call - <seealso marker="#Module:terminate/2">Module:terminate(stop,...)</seealso> - for each installed event handler.</p> - <p>The function returns <c>ok</c> if the event manager terminates - with the expected reason. Any other reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> will cause an - error report to be issued using - <seealso marker="kernel:error_logger#format/2">error_logger:format/2</seealso>. - The default <c>Reason</c> is <c>normal</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which - specifies how many milliseconds to wait for the event manager to - terminate, or the atom <c>infinity</c> to wait - indefinitely. The default value is <c>infinity</c>. 
If the - event manager has not terminated within the specified time, a - <c>timeout</c> exception is raised.</p> - <p>If the process does not exist, a <c>noproc</c> exception - is raised.</p> - <p>See <c>add_handler/3</c> for a description of <c>EventMgrRef</c>.</p> + <p>For a description of <c>EventMgrRef</c> and <c>Handler</c>, see + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso>.</p> </desc> </func> </funcs> <section> - <title>CALLBACK FUNCTIONS</title> - <p>The following functions should be exported from a <c>gen_event</c> + <title>Callback Functions</title> + <p>The following functions are to be exported from a <c>gen_event</c> callback module.</p> </section> + <funcs> <func> - <name>Module:init(InitArgs) -> {ok,State} | {ok,State,hibernate} | {error,Reason}</name> - <fsummary>Initialize an event handler.</fsummary> + <name>Module:code_change(OldVsn, State, Extra) -> {ok, NewState}</name> + <fsummary>Update the internal state during upgrade/downgrade.</fsummary> <type> - <v>InitArgs = Args | {Args,Term}</v> - <v> Args = Term = term()</v> - <v>State = term()</v> - <v>Reason = term()</v> + <v>OldVsn = Vsn | {down, Vsn}</v> + <v> Vsn = term()</v> + <v>State = NewState = term()</v> + <v>Extra = term()</v> </type> <desc> - <p>Whenever a new event handler is added to an event manager, - this function is called to initialize the event handler.</p> - <p>If the event handler is added due to a call to - <c>gen_event:add_handler/3</c> or - <c>gen_event:add_sup_handler/3</c>, <c>InitArgs</c> is - the <c>Args</c> argument of these functions.</p> - <p>If the event handler is replacing another event handler due to - a call to <c>gen_event:swap_handler/3</c> or - <c>gen_event:swap_sup_handler/3</c>, or due to a <c>swap</c> - return tuple from one of the other callback functions, - <c>InitArgs</c> is a tuple <c>{Args,Term}</c> where <c>Args</c> is - the argument provided in the function call/return tuple and - <c>Term</c> is the result of terminating the old 
event handler, - see <c>gen_event:swap_handler/3</c>.</p> - <p>If successful, the function should return <c>{ok,State}</c> - or <c>{ok,State,hibernate}</c> where <c>State</c> is the - initial internal state of the event handler.</p> - <p>If <c>{ok,State,hibernate}</c> is returned, the event - manager will go into hibernation (by calling <seealso - marker="proc_lib#hibernate/3">proc_lib:hibernate/3</seealso>), - waiting for the next event to occur.</p> + <p>This function is called for an installed event handler that + is to update its internal state during a release + upgrade/downgrade, that is, when the instruction + <c>{update,Module,Change,...}</c>, where + <c>Change={advanced,Extra}</c>, is specified in the <c>.appup</c> + file. For more information, see <seealso + marker="doc/design_principles:users_guide">OTP Design Principles</seealso>.</p> + <p>For an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and for a downgrade, + <c>OldVsn</c> is <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the + <c>vsn</c> attribute(s) of the old version of the callback module + <c>Module</c>. 
If no such attribute is defined, the version + is the checksum of the Beam file.</p> + <p><c>State</c> is the internal state of the event handler.</p> + <p><c>Extra</c> is passed "as is" from the <c>{advanced,Extra}</c> + part of the update instruction.</p> + <p>The function is to return the updated internal state.</p> </desc> </func> + <func> - <name>Module:handle_event(Event, State) -> Result</name> - <fsummary>Handle an event.</fsummary> + <name>Module:format_status(Opt, [PDict, State]) -> Status</name> + <fsummary>Optional function for providing a term describing the + current event handler state.</fsummary> <type> - <v>Event = term()</v> + <v>Opt = normal | terminate</v> + <v>PDict = [{Key, Value}]</v> <v>State = term()</v> - <v>Result = {ok,NewState} | {ok,NewState,hibernate} </v> - <v> | {swap_handler,Args1,NewState,Handler2,Args2} | remove_handler</v> - <v> NewState = term()</v> - <v> Args1 = Args2 = term()</v> - <v> Handler2 = Module2 | {Module2,Id}</v> - <v> Module2 = atom()</v> - <v> Id = term()</v> + <v>Status = term()</v> </type> <desc> - <p>Whenever an event manager receives an event sent using - <c>gen_event:notify/2</c> or <c>gen_event:sync_notify/2</c>, this - function is called for each installed event handler to handle - the event.</p> - <p><c>Event</c> is the <c>Event</c> argument of - <c>notify</c>/<c>sync_notify</c>.</p> + <note> + <p>This callback is optional, so event handler modules need + not export it. If a handler does not export this function, + the <c>gen_event</c> module uses the handler state directly for + the purposes described below.</p> + </note> + <p>This function is called by a <c>gen_event</c> process in the + following situations:</p> + <list type="bulleted"> + <item>One of <seealso marker="sys#get_status/1"> + <c>sys:get_status/1,2</c></seealso> + is invoked to get the <c>gen_event</c> status. 
<c>Opt</c> is set + to the atom <c>normal</c> for this case.</item> + <item>The event handler terminates abnormally and <c>gen_event</c> + logs an error. <c>Opt</c> is set to the + atom <c>terminate</c> for this case.</item> + </list> + <p>This function is useful for changing the form and + appearance of the event handler state for these cases. An + event handler callback module wishing to change + the <c>sys:get_status/1,2</c> return value as well as how + its state appears in termination error logs exports an + instance of <c>format_status/2</c> that returns a term + describing the current state of the event handler.</p> + <p><c>PDict</c> is the current value of the + process dictionary of <c>gen_event</c>.</p> <p><c>State</c> is the internal state of the event handler.</p> - <p>If the function returns <c>{ok,NewState}</c> or <c>{ok,NewState,hibernate}</c> - the event handler - will remain in the event manager with the possible updated - internal state <c>NewState</c>.</p> - <p>If <c>{ok,NewState,hibernate}</c> is returned, the event - manager will also go into hibernation (by calling <seealso - marker="proc_lib#hibernate/3">proc_lib:hibernate/3</seealso>), - waiting for the next event to occur. It is sufficient that one of the event - handlers return <c>{ok,NewState,hibernate}</c> for the whole event manager - process to hibernate.</p> - <p>If the function returns - <c>{swap_handler,Args1,NewState,Handler2,Args2}</c> the event - handler will be replaced by <c>Handler2</c> by first calling - <c>Module:terminate(Args1,NewState)</c> and then - <c>Module2:init({Args2,Term})</c> where <c>Term</c> is the return - value of <c>Module:terminate/2</c>. 
- See <c>gen_event:swap_handler/3</c> for more information.</p> - <p>If the function returns <c>remove_handler</c> the event handler - will be deleted by calling - <c>Module:terminate(remove_handler,State)</c>.</p> + <p>The function is to return <c>Status</c>, a term that + changes the details of the current state of the event + handler. Any term is allowed for <c>Status</c>. The + <c>gen_event</c> module uses <c>Status</c> as follows:</p> + <list type="bulleted"> + <item><p>When <c>sys:get_status/1,2</c> is called, <c>gen_event</c> + ensures that its return value contains <c>Status</c> in + place of the state term of the event handler.</p></item> + <item><p>When an event handler terminates abnormally, <c>gen_event</c> + logs <c>Status</c> in place of the state term of the + event handler.</p></item> + </list> + <p>One use for this function is to return compact alternative + state representations to avoid large state terms + being printed in log files.</p> </desc> </func> + <func> <name>Module:handle_call(Request, State) -> Result</name> <fsummary>Handle a synchronous request.</fsummary> @@ -594,15 +662,77 @@ gen_event:stop -----> Module:terminate/2 </type> <desc> <p>Whenever an event manager receives a request sent using - <c>gen_event:call/3,4</c>, this function is called for + <seealso marker="#call/3"><c>call/3,4</c></seealso>, + this function is called for the specified event handler to handle the request.</p> - <p><c>Request</c> is the <c>Request</c> argument of <c>call</c>.</p> + <p><c>Request</c> is the <c>Request</c> argument of <c>call/3,4</c>.</p> + <p><c>State</c> is the internal state of the event handler.</p> + <p>The return values are the same as for + <seealso marker="#Module:handle_event/2"> + <c>Module:handle_event/2</c></seealso> + except that they also contain a term <c>Reply</c>, which is the reply + to the client as the return value of <c>call/3,4</c>.</p> + </desc> + </func> + + <func> + <name>Module:handle_event(Event, State) -> 
Result</name> + <fsummary>Handle an event.</fsummary> + <type> + <v>Event = term()</v> + <v>State = term()</v> + <v>Result = {ok,NewState} | {ok,NewState,hibernate} </v> + <v> | {swap_handler,Args1,NewState,Handler2,Args2} + | remove_handler</v> + <v> NewState = term()</v> + <v> Args1 = Args2 = term()</v> + <v> Handler2 = Module2 | {Module2,Id}</v> + <v> Module2 = atom()</v> + <v> Id = term()</v> + </type> + <desc> + <p>Whenever an event manager receives an event sent using + <seealso marker="#notify/2"><c>notify/2</c></seealso> or + <seealso marker="#sync_notify/2"><c>sync_notify/2</c></seealso>, + this function is called for each installed event handler to handle + the event.</p> + <p><c>Event</c> is the <c>Event</c> argument of + <c>notify/2</c>/<c>sync_notify/2</c>.</p> <p><c>State</c> is the internal state of the event handler.</p> - <p>The return values are the same as for <c>handle_event/2</c> - except they also contain a term <c>Reply</c> which is the reply - given back to the client as the return value of <c>call</c>.</p> + <list type="bulleted"> + <item> + <p>If <c>{ok,NewState}</c> or <c>{ok,NewState,hibernate}</c> + is returned, the event handler + remains in the event manager with the possible updated + internal state <c>NewState</c>.</p> + </item> + <item> + <p>If <c>{ok,NewState,hibernate}</c> is returned, the event + manager also goes into hibernation (by calling + <seealso marker="proc_lib#hibernate/3"> + <c>proc_lib:hibernate/3</c></seealso>), waiting for the next + event to occur. It is sufficient that one of the + event handlers return <c>{ok,NewState,hibernate}</c> for the + whole event manager process to hibernate.</p> + </item> + <item> + <p>If <c>{swap_handler,Args1,NewState,Handler2,Args2}</c> is + returned, the event handler is replaced by <c>Handler2</c> by + first calling <c>Module:terminate(Args1,NewState)</c> and then + <c>Module2:init({Args2,Term})</c>, where <c>Term</c> is the return + value of <c>Module:terminate/2</c>. 
For more information, see + <seealso marker="#swap_handler/3"><c>swap_handler/3</c></seealso>. + </p> + </item> + <item> + <p>If <c>remove_handler</c> is returned, the event handler is + deleted by calling + <c>Module:terminate(remove_handler,State)</c>.</p> + </item> + </list> </desc> </func> + <func> <name>Module:handle_info(Info, State) -> Result</name> <fsummary>Handle an incoming message.</fsummary> @@ -610,7 +740,8 @@ gen_event:stop -----> Module:terminate/2 <v>Info = term()</v> <v>State = term()</v> <v>Result = {ok,NewState} | {ok,NewState,hibernate}</v> - <v> | {swap_handler,Args1,NewState,Handler2,Args2} | remove_handler</v> + <v> | {swap_handler,Args1,NewState,Handler2,Args2} + | remove_handler</v> <v> NewState = term()</v> <v> Args1 = Args2 = term()</v> <v> Handler2 = Module2 | {Module2,Id}</v> @@ -622,10 +753,49 @@ gen_event:stop -----> Module:terminate/2 an event manager receives any other message than an event or a synchronous request (or a system message).</p> <p><c>Info</c> is the received message.</p> - <p>See <c>Module:handle_event/2</c> for a description of State - and possible return values.</p> + <p>For a description of <c>State</c> and possible return values, see + <seealso marker="#Module:handle_event/2"> + <c>Module:handle_event/2</c></seealso>.</p> + </desc> + </func> + + <func> + <name>Module:init(InitArgs) -> {ok,State} | {ok,State,hibernate} | {error,Reason}</name> + <fsummary>Initialize an event handler.</fsummary> + <type> + <v>InitArgs = Args | {Args,Term}</v> + <v> Args = Term = term()</v> + <v>State = term()</v> + <v>Reason = term()</v> + </type> + <desc> + <p>Whenever a new event handler is added to an event manager, + this function is called to initialize the event handler.</p> + <p>If the event handler is added because of a call to + <seealso marker="#add_handler/3"><c>add_handler/3</c></seealso> or + <seealso marker="#add_sup_handler/3"> + <c>add_sup_handler/3</c></seealso>, <c>InitArgs</c> is + the <c>Args</c> argument of these 
functions.</p> + <p>If the event handler replaces another event handler because of + a call to + <seealso marker="#swap_handler/3"><c>swap_handler/3</c></seealso> or + <seealso marker="#swap_sup_handler/3"> + <c>swap_sup_handler/3</c></seealso>, or because of a <c>swap</c> + return tuple from one of the other callback functions, + <c>InitArgs</c> is a tuple <c>{Args,Term}</c>, where <c>Args</c> is + the argument provided in the function call/return tuple and + <c>Term</c> is the result of terminating the old event handler, see + <seealso marker="#swap_handler/3"><c>swap_handler/3</c></seealso>.</p> + <p>If successful, the function returns <c>{ok,State}</c> + or <c>{ok,State,hibernate}</c>, where <c>State</c> is the + initial internal state of the event handler.</p> + <p>If <c>{ok,State,hibernate}</c> is returned, the event + manager goes into hibernation (by calling <seealso + marker="proc_lib#hibernate/3"><c>proc_lib:hibernate/3</c></seealso>), + waiting for the next event to occur.</p> </desc> </func> + <func> <name>Module:terminate(Arg, State) -> term()</name> <fsummary>Clean up before deletion.</fsummary> @@ -636,22 +806,25 @@ gen_event:stop -----> Module:terminate/2 </type> <desc> <p>Whenever an event handler is deleted from an event manager, - this function is called. It should be the opposite of - <c>Module:init/1</c> and do any necessary cleaning up.</p> - <p>If the event handler is deleted due to a call to - <c>gen_event:delete_handler</c>, <c>gen_event:swap_handler/3</c> - or <c>gen_event:swap_sup_handler/3</c>, <c>Arg</c> is + this function is called. 
It is to be the opposite of + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> + and do any necessary cleaning up.</p> + <p>If the event handler is deleted because of a call to + <seealso marker="#delete_handler/3"><c>delete_handler/3</c></seealso>, + <seealso marker="#swap_handler/3"><c>swap_handler/3</c></seealso>, or + <seealso marker="#swap_sup_handler/3"> + <c>swap_sup_handler/3</c></seealso>, <c>Arg</c> is the <c>Args</c> argument of this function call.</p> <p><c>Arg={stop,Reason}</c> if the event handler has a supervised - connection to a process which has terminated with reason + connection to a process that has terminated with reason <c>Reason</c>.</p> <p><c>Arg=stop</c> if the event handler is deleted because the event manager is terminating.</p> - <p>The event manager will terminate if it is part of a supervision - tree and it is ordered by its supervisor to terminate. - Even if it is <em>not</em> part of a supervision tree, it will - terminate if it receives an <c>'EXIT'</c> message from - its parent.</p> + <p>The event manager terminates if it is part of a supervision + tree and it is ordered by its supervisor to terminate. + Even if it is <em>not</em> part of a supervision tree, it + terminates if it receives an <c>'EXIT'</c> message from + its parent.</p> <p><c>Arg=remove_handler</c> if the event handler is deleted because another callback function has returned <c>remove_handler</c> or <c>{remove_handler,Reply}</c>.</p> @@ -660,104 +833,20 @@ gen_event:stop -----> Module:terminate/2 or <c>Arg={error,{'EXIT',Reason}}</c> if a callback function failed.</p> <p><c>State</c> is the internal state of the event handler.</p> - <p>The function may return any term. If the event handler is - deleted due to a call to <c>gen_event:delete_handler</c>, - the return value of that function will be the return value of this + <p>The function can return any term. 
If the event handler is + deleted because of a call to <c>gen_event:delete_handler/3</c>, + the return value of that function becomes the return value of this function. If the event handler is to be replaced with another event - handler due to a swap, the return value will be passed to + handler because of a swap, the return value is passed to the <c>init</c> function of the new event handler. Otherwise the return value is ignored.</p> </desc> </func> - <func> - <name>Module:code_change(OldVsn, State, Extra) -> {ok, NewState}</name> - <fsummary>Update the internal state during upgrade/downgrade.</fsummary> - <type> - <v>OldVsn = Vsn | {down, Vsn}</v> - <v> Vsn = term()</v> - <v>State = NewState = term()</v> - <v>Extra = term()</v> - </type> - <desc> - <p>This function is called for an installed event handler which - should update its internal state during a release - upgrade/downgrade, i.e. when the instruction - <c>{update,Module,Change,...}</c> where - <c>Change={advanced,Extra}</c> is given in the <c>.appup</c> - file. See <em>OTP Design Principles</em> for more - information.</p> - <p>In the case of an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and - in the case of a downgrade, <c>OldVsn</c> is - <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the <c>vsn</c> - attribute(s) of the old version of the callback module - <c>Module</c>. 
If no such attribute is defined, the version - is the checksum of the BEAM file.</p> - <p><c>State</c> is the internal state of the event handler.</p> - <p><c>Extra</c> is passed as-is from the <c>{advanced,Extra}</c> - part of the update instruction.</p> - <p>The function should return the updated internal state.</p> - </desc> - </func> - <func> - <name>Module:format_status(Opt, [PDict, State]) -> Status</name> - <fsummary>Optional function for providing a term describing the - current event handler state.</fsummary> - <type> - <v>Opt = normal | terminate</v> - <v>PDict = [{Key, Value}]</v> - <v>State = term()</v> - <v>Status = term()</v> - </type> - <desc> - <note> - <p>This callback is optional, so event handler modules need - not export it. If a handler does not export this function, - the gen_event module uses the handler state directly for - the purposes described below.</p> - </note> - <p>This function is called by a gen_event process when:</p> - <list type="bulleted"> - <item>One - of <seealso marker="sys#get_status/1">sys:get_status/1,2</seealso> - is invoked to get the gen_event status. <c>Opt</c> is set - to the atom <c>normal</c> for this case.</item> - <item>The event handler terminates abnormally and gen_event - logs an error. <c>Opt</c> is set to the - atom <c>terminate</c> for this case.</item> - </list> - <p>This function is useful for customising the form and - appearance of the event handler state for these cases. 
An - event handler callback module wishing to customise - the <c>sys:get_status/1,2</c> return value as well as how - its state appears in termination error logs exports an - instance of <c>format_status/2</c> that returns a term - describing the current state of the event handler.</p> - <p><c>PDict</c> is the current value of the gen_event's - process dictionary.</p> - <p><c>State</c> is the internal state of the event - handler.</p> - <p>The function should return <c>Status</c>, a term that - customises the details of the current state of the event - handler. Any term is allowed for <c>Status</c>. The - gen_event module uses <c>Status</c> as follows:</p> - <list type="bulleted"> - <item>When <c>sys:get_status/1,2</c> is called, gen_event - ensures that its return value contains <c>Status</c> in - place of the event handler's actual state term.</item> - <item>When an event handler terminates abnormally, gen_event - logs <c>Status</c> in place of the event handler's actual - state term.</item> - </list> - <p>One use for this function is to return compact alternative - state representations to avoid having large state terms - printed in logfiles.</p> - </desc> - </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="supervisor">supervisor(3)</seealso>, - <seealso marker="sys">sys(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="supervisor"><c>supervisor(3)</c></seealso>, + <seealso marker="sys"><c>sys(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/gen_fsm.xml b/lib/stdlib/doc/src/gen_fsm.xml index 835e252704..de06987d38 100644 --- a/lib/stdlib/doc/src/gen_fsm.xml +++ b/lib/stdlib/doc/src/gen_fsm.xml @@ -29,29 +29,30 @@ <rev></rev> </header> <module>gen_fsm</module> - <modulesummary>Generic Finite State Machine Behaviour</modulesummary> + <modulesummary>Generic finite state machine behavior.</modulesummary> <description> <note> <p> There is a new behaviour <seealso 
marker="gen_statem"><c>gen_statem</c></seealso> that is intended to replace <c>gen_fsm</c> for new code. - It has the same features and add some really useful. - This module will not be removed for the foreseeable future + <c>gen_fsm</c> will not be removed for the foreseeable future to keep old state machine implementations running. </p> </note> - <p>A behaviour module for implementing a finite state machine. - A generic finite state machine process (gen_fsm) implemented - using this module will have a standard set of interface functions - and include functionality for tracing and error reporting. It will - also fit into an OTP supervision tree. Refer to - <seealso marker="doc/design_principles:fsm">OTP Design Principles</seealso> for more information. + <p>This behavior module provides a finite state machine. + A generic finite state machine process (<c>gen_fsm</c>) implemented + using this module has a standard set of interface functions + and includes functionality for tracing and error reporting. It + also fits into an OTP supervision tree. For more information, see + <seealso marker="doc/design_principles:fsm">OTP Design Principles</seealso>. </p> - <p>A gen_fsm assumes all specific parts to be located in a callback - module exporting a pre-defined set of functions. The relationship - between the behaviour functions and the callback functions can be - illustrated as follows:</p> + + <p>A <c>gen_fsm</c> process assumes all specific parts to be located in a + callback module exporting a predefined set of functions. 
The relationship + between the behavior functions and the callback functions is as + follows:</p> + <pre> gen_fsm module Callback module -------------- --------------- @@ -73,34 +74,261 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 - -----> Module:terminate/3 - -----> Module:code_change/4</pre> - <p>If a callback function fails or returns a bad value, the gen_fsm - will terminate.</p> - <p>A gen_fsm handles system messages as documented in - <seealso marker="sys">sys(3)</seealso>. The <c>sys</c> module - can be used for debugging a gen_fsm.</p> - <p>Note that a gen_fsm does not trap exit signals automatically, - this must be explicitly initiated in the callback module.</p> + + <p>If a callback function fails or returns a bad value, the <c>gen_fsm</c> + process terminates.</p> + + <p>A <c>gen_fsm</c> process handles system messages as described in + <seealso marker="sys"><c>sys(3)</c></seealso>. The <c>sys</c> module + can be used for debugging a <c>gen_fsm</c> process.</p> + + <p>Notice that a <c>gen_fsm</c> process does not trap exit signals + automatically, this must be explicitly initiated in the callback + module.</p> + <p>Unless otherwise stated, all functions in this module fail if - the specified gen_fsm does not exist or if bad arguments are - given.</p> - <p>The gen_fsm process can go into hibernation - (see <seealso marker="erts:erlang#erlang:hibernate/3">erlang(3)</seealso>) if a callback - function specifies <c>'hibernate'</c> instead of a timeout value. This - might be useful if the server is expected to be idle for a long - time. 
However this feature should be used with care as hibernation - implies at least two garbage collections (when hibernating and - shortly after waking up) and is not something you'd want to do - between each call to a busy state machine.</p> + the specified <c>gen_fsm</c> process does not exist or if bad arguments + are specified.</p> + <p>The <c>gen_fsm</c> process can go into hibernation + (see <seealso marker="erts:erlang#hibernate/3"> + <c>erlang:hibernate/3</c></seealso>) if a callback function + specifies <c>'hibernate'</c> instead of a time-out value. This + can be useful if the server is expected to be idle for a long + time. However, use this feature with care, as hibernation + implies at least two garbage collections (when hibernating and + shortly after waking up) and is not something you want to do + between each call to a busy state machine.</p> </description> + <funcs> <func> + <name>cancel_timer(Ref) -> RemainingTime | false</name> + <fsummary>Cancel an internal timer in a generic FSM.</fsummary> + <type> + <v>Ref = reference()</v> + <v>RemainingTime = integer()</v> + </type> + <desc> + <p>Cancels an internal timer referred to by <c>Ref</c> in the + <c>gen_fsm</c> process that calls this function.</p> + <p><c>Ref</c> is a reference returned from + <seealso marker="#send_event_after/2"> + <c>send_event_after/2</c></seealso> or + <seealso marker="#start_timer/2"><c>start_timer/2</c></seealso>.</p> + <p>If the timer has already timed out, but the event has not yet + been delivered, it is cancelled as if it had <em>not</em> + timed out, so there is no false timer event after + returning from this function.</p> + <p>Returns the remaining time in milliseconds until the timer would + have expired if <c>Ref</c> referred to an active timer, otherwise + <c>false</c>.</p> + </desc> + </func> + + <func> + <name>enter_loop(Module, Options, StateName, StateData)</name> + <name>enter_loop(Module, Options, StateName, StateData, FsmName)</name> + <name>enter_loop(Module, 
Options, StateName, StateData, Timeout)</name> + <name>enter_loop(Module, Options, StateName, StateData, FsmName, Timeout)</name> + <fsummary>Enter the <c>gen_fsm</c> receive loop.</fsummary> + <type> + <v>Module = atom()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics</v> + <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v>StateName = atom()</v> + <v>StateData = term()</v> + <v>FsmName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Timeout = int() | infinity</v> + </type> + <desc> + <p>Makes an existing process into a <c>gen_fsm</c> process. + Does not return, + instead the calling process enters the <c>gen_fsm</c> receive + loop and becomes a <c>gen_fsm</c> process. The process <em>must</em> + have been started using one of the start functions in + <seealso marker="proc_lib"><c>proc_lib(3)</c></seealso>. The user is + responsible for any initialization of the process, including + registering a name for it.</p> + <p>This function is useful when a more complex initialization + procedure is needed than the <c>gen_fsm</c> behavior provides.</p> + <p><c>Module</c>, <c>Options</c>, and <c>FsmName</c> have + the same meanings as when calling + <seealso marker="#start_link/3"><c>start[_link]/3,4</c></seealso>. + However, if <c>FsmName</c> is specified, the process must have + been registered accordingly <em>before</em> this function is + called.</p> + <p><c>StateName</c>, <c>StateData</c>, and <c>Timeout</c> have + the same meanings as in the return value of + <seealso marker="#Moduleinit"><c>Module:init/1</c></seealso>. 
+ The callback module <c>Module</c> does not need to + export an <c>init/1</c> function.</p> + <p>The function fails if the calling process was not started by a + <c>proc_lib</c> start function, or if it is not registered + according to <c>FsmName</c>.</p> + </desc> + </func> + + <func> + <name>reply(Caller, Reply) -> Result</name> + <fsummary>Send a reply to a caller.</fsummary> + <type> + <v>Caller - see below</v> + <v>Reply = term()</v> + <v>Result = term()</v> + </type> + <desc> + <p>This function can be used by a <c>gen_fsm</c> process to + explicitly send a reply to a client process that called + <seealso marker="#sync_send_event/2"> + <c>sync_send_event/2,3</c></seealso> or + <seealso marker="#sync_send_all_state_event/2"> + <c>sync_send_all_state_event/2,3</c></seealso> + when the reply cannot be defined in the return value of + <seealso marker="#Module:StateName/3"> + <c>Module:StateName/3</c></seealso> or + <seealso marker="#Module:handle_sync_event/4"> + <c>Module:handle_sync_event/4</c></seealso>.</p> + <p><c>Caller</c> must be the <c>From</c> argument provided to + the callback function. <c>Reply</c> is any term + given back to the client as the return value of + <c>sync_send_event/2,3</c> or + <c>sync_send_all_state_event/2,3</c>.</p> + <p>Return value <c>Result</c> is not further defined, and + is always to be ignored.</p> + </desc> + </func> + + <func> + <name>send_all_state_event(FsmRef, Event) -> ok</name> + <fsummary>Send an event asynchronously to a generic FSM.</fsummary> + <type> + <v>FsmRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> + <v> Name = Node = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Event = term()</v> + </type> + <desc> + <p>Sends an event asynchronously to the <c>FsmRef</c> of the + <c>gen_fsm</c> process and returns <c>ok</c> immediately. 
+ The <c>gen_fsm</c> process calls + <seealso marker="#Module:handle_event/3"> + <c>Module:handle_event/3</c></seealso> to handle the event.</p> + <p>For a description of the arguments, see + <seealso marker="#send_event/2"><c>send_event/2</c></seealso>.</p> + <p>The difference between <c>send_event/2</c> and + <c>send_all_state_event/2</c> is which callback function is + used to handle the event. This function is useful when + sending events that are handled the same way in every state, + as only one <c>handle_event</c> clause is needed to handle + the event instead of one clause in each state name function.</p> + </desc> + </func> + + <func> + <name>send_event(FsmRef, Event) -> ok</name> + <fsummary>Send an event asynchronously to a generic FSM.</fsummary> + <type> + <v>FsmRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> + <v> Name = Node = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Event = term()</v> + </type> + <desc> + <p>Sends an event asynchronously to the <c>FsmRef</c> of the + <c>gen_fsm</c> process + and returns <c>ok</c> immediately. 
The <c>gen_fsm</c> process calls + <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso> to handle the event, where + <c>StateName</c> is the name of the current state of + the <c>gen_fsm</c> process.</p> + <p><c>FsmRef</c> can be any of the following:</p> + <list type="bulleted"> + <item>The pid</item> + <item><c>Name</c>, if the <c>gen_fsm</c> process is locally + registered</item> + <item><c>{Name,Node}</c>, if the <c>gen_fsm</c> process is locally + registered at another node</item> + <item><c>{global,GlobalName}</c>, if the <c>gen_fsm</c> process is + globally registered</item> + <item><c>{via,Module,ViaName}</c>, if the <c>gen_fsm</c> process is + registered through an alternative process registry</item> + </list> + <p><c>Event</c> is any term that is passed as one of + the arguments to <c>Module:StateName/2</c>.</p> + </desc> + </func> + + <func> + <name>send_event_after(Time, Event) -> Ref</name> + <fsummary>Send a delayed event internally in a generic FSM.</fsummary> + <type> + <v>Time = integer()</v> + <v>Event = term()</v> + <v>Ref = reference()</v> + </type> + <desc> + <p>Sends a delayed event internally in the <c>gen_fsm</c> process + that calls this function after <c>Time</c> milliseconds. 
+ Returns immediately a + reference that can be used to cancel the delayed send using + <seealso marker="#cancel_timer/1"><c>cancel_timer/1</c></seealso>.</p> + <p>The <c>gen_fsm</c> process calls + <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso> to handle + the event, where <c>StateName</c> is the name of the current + state of the <c>gen_fsm</c> process at the time the delayed event is + delivered.</p> + <p><c>Event</c> is any term that is passed as one of + the arguments to <c>Module:StateName/2</c>.</p> + </desc> + </func> + + <func> + <name>start(Module, Args, Options) -> Result</name> + <name>start(FsmName, Module, Args, Options) -> Result</name> + <fsummary>Create a standalone <c>gen_fsm</c> process.</fsummary> + <type> + <v>FsmName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Module = atom()</v> + <v>Args = term()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics</v> + <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v> SOpts = [term()]</v> + <v>Result = {ok,Pid} | ignore | {error,Error}</v> + <v> Pid = pid()</v> + <v> Error = {already_started,Pid} | term()</v> + </type> + <desc> + <p>Creates a standalone <c>gen_fsm</c> process, that is, a process that + is not part of a supervision tree and thus has no supervisor.</p> + <p>For a description of arguments and return values, see + <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso>.</p> + </desc> + </func> + + <func> <name>start_link(Module, Args, Options) -> Result</name> <name>start_link(FsmName, Module, Args, Options) -> Result</name> - <fsummary>Create a gen_fsm process in a supervision tree.</fsummary> + <fsummary>Create a <c>gen_fsm</c> process in a supervision tree. 
+ </fsummary> <type> - <v>FsmName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> + <v>FsmName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> <v> Name = atom()</v> <v> GlobalName = ViaName = term()</v> <v>Module = atom()</v> @@ -117,54 +345,64 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v> Error = {already_started,Pid} | term()</v> </type> <desc> - <p>Creates a gen_fsm process as part of a supervision tree. - The function should be called, directly or indirectly, by - the supervisor. It will, among other things, ensure that - the gen_fsm is linked to the supervisor.</p> - <p>The gen_fsm process calls <c>Module:init/1</c> to - initialize. To ensure a synchronized start-up procedure, + <p>Creates a <c>gen_fsm</c> process as part of a supervision tree. + The function is to be called, directly or indirectly, by + the supervisor. For example, it ensures that + the <c>gen_fsm</c> process is linked to the supervisor.</p> + <p>The <c>gen_fsm</c> process calls + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> to + initialize. To ensure a synchronized startup procedure, <c>start_link/3,4</c> does not return until <c>Module:init/1</c> has returned.</p> - <p>If <c>FsmName={local,Name}</c>, the gen_fsm is registered - locally as <c>Name</c> using <c>register/2</c>. - If <c>FsmName={global,GlobalName}</c>, the gen_fsm is - registered globally as <c>GlobalName</c> using - <c>global:register_name/2</c>. - If <c>FsmName={via,Module,ViaName}</c>, the gen_fsm will - register with the registry represented by <c>Module</c>. - The <c>Module</c> callback should export the functions - <c>register_name/2</c>, <c>unregister_name/1</c>, - <c>whereis_name/1</c> and <c>send/2</c>, which should behave like the - corresponding functions in <c>global</c>. 
Thus, - <c>{via,global,GlobalName}</c> is a valid reference.</p> - <p>If no name is provided, - the gen_fsm is not registered.</p> + <list type="bulleted"> + <item> + <p>If <c>FsmName={local,Name}</c>, the <c>gen_fsm</c> process is + registered locally as <c>Name</c> using <c>register/2</c>.</p> + </item> + <item> + <p>If <c>FsmName={global,GlobalName}</c>, the <c>gen_fsm</c> process + is registered globally as <c>GlobalName</c> using + <seealso marker="kernel:global#register_name/2"> + <c>global:register_name/2</c></seealso>.</p> + </item> + <item> + <p>If <c>FsmName={via,Module,ViaName}</c>, the <c>gen_fsm</c> + process registers with the registry represented by <c>Module</c>. + The <c>Module</c> callback is to export the functions + <c>register_name/2</c>, <c>unregister_name/1</c>, + <c>whereis_name/1</c>, and <c>send/2</c>, which are to behave + like the corresponding functions in + <seealso marker="kernel:global"><c>global</c></seealso>. + Thus, <c>{via,global,GlobalName}</c> is a valid reference.</p> + </item> + </list> + <p>If no name is provided, the <c>gen_fsm</c> process is not + registered.</p> <p><c>Module</c> is the name of the callback module.</p> - <p><c>Args</c> is an arbitrary term which is passed as + <p><c>Args</c> is any term that is passed as the argument to <c>Module:init/1</c>.</p> - <p>If the option <c>{timeout,Time}</c> is present, the gen_fsm - is allowed to spend <c>Time</c> milliseconds initializing - or it will be terminated and the start function will return + <p>If option <c>{timeout,Time}</c> is present, the <c>gen_fsm</c> + process is allowed to spend <c>Time</c> milliseconds initializing + or it terminates and the start function returns <c>{error,timeout}</c>.</p> - <p>If the option <c>{debug,Dbgs}</c> is present, - the corresponding <c>sys</c> function will be called for each - item in <c>Dbgs</c>. 
See - <seealso marker="sys">sys(3)</seealso>.</p> - <p>If the option <c>{spawn_opt,SOpts}</c> is present, - <c>SOpts</c> will be passed as option list to - the <c>spawn_opt</c> BIF which is used to spawn the gen_fsm - process. See - <seealso marker="erts:erlang#spawn_opt/2">erlang(3)</seealso>.</p> + <p>If option <c>{debug,Dbgs}</c> is present, the corresponding + <c>sys</c> function is called for each item in <c>Dbgs</c>; see + <seealso marker="sys"><c>sys(3)</c></seealso>.</p> + <p>If option <c>{spawn_opt,SOpts}</c> is present, <c>SOpts</c> is + passed as option list to the <c>spawn_opt</c> BIF that is used to + spawn the <c>gen_fsm</c> process; see + <seealso marker="erts:erlang#spawn_opt/2"> + <c>spawn_opt/2</c></seealso>.</p> <note> - <p>Using the spawn option <c>monitor</c> is currently not - allowed, but will cause the function to fail with reason + <p>Using spawn option <c>monitor</c> is not + allowed, it causes the function to fail with reason <c>badarg</c>.</p> </note> - <p>If the gen_fsm is successfully created and initialized - the function returns <c>{ok,Pid}</c>, where <c>Pid</c> is - the pid of the gen_fsm. If there already exists a process with - the specified <c>FsmName</c>, the function returns - <c>{error,{already_started,Pid}}</c> where <c>Pid</c> is + <p>If the <c>gen_fsm</c> process is successfully created and + initialized, the function returns <c>{ok,Pid}</c>, where <c>Pid</c> + is the pid of the <c>gen_fsm</c> process. If a process with the + specified <c>FsmName</c> exists already, the function returns + <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is the pid of that process.</p> <p>If <c>Module:init/1</c> fails with <c>Reason</c>, the function returns <c>{error,Reason}</c>. 
If @@ -173,129 +411,106 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 returns <c>{error,Reason}</c> or <c>ignore</c>, respectively.</p> </desc> </func> + <func> - <name>start(Module, Args, Options) -> Result</name> - <name>start(FsmName, Module, Args, Options) -> Result</name> - <fsummary>Create a stand-alone gen_fsm process.</fsummary> + <name>start_timer(Time, Msg) -> Ref</name> + <fsummary>Send a time-out event internally in a generic FSM.</fsummary> <type> - <v>FsmName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Module = atom()</v> - <v>Args = term()</v> - <v>Options = [Option]</v> - <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> - <v> Dbgs = [Dbg]</v> - <v> Dbg = trace | log | statistics</v> - <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> - <v> SOpts = [term()]</v> - <v>Result = {ok,Pid} | ignore | {error,Error}</v> - <v> Pid = pid()</v> - <v> Error = {already_started,Pid} | term()</v> + <v>Time = integer()</v> + <v>Msg = term()</v> + <v>Ref = reference()</v> </type> <desc> - <p>Creates a stand-alone gen_fsm process, i.e. a gen_fsm which - is not part of a supervision tree and thus has no supervisor.</p> - <p>See <seealso marker="#start_link/3">start_link/3,4</seealso> - for a description of arguments and return values.</p> + <p>Sends a time-out event internally in the <c>gen_fsm</c> + process that calls this function after <c>Time</c> milliseconds. 
+ Returns immediately a + reference that can be used to cancel the timer using + <seealso marker="#cancel_timer/1"><c>cancel_timer/1</c></seealso>.</p> + <p>The <c>gen_fsm</c> process calls + <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso> to handle + the event, where <c>StateName</c> is the name of the current + state of the <c>gen_fsm</c> process at the time the time-out + message is delivered.</p> + <p><c>Msg</c> is any term that is passed in the + time-out message, <c>{timeout, Ref, Msg}</c>, as one of + the arguments to <c>Module:StateName/2</c>.</p> </desc> </func> + <func> <name>stop(FsmRef) -> ok</name> <name>stop(FsmRef, Reason, Timeout) -> ok</name> <fsummary>Synchronously stop a generic FSM.</fsummary> <type> - <v>FsmRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> + <v>FsmRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> <v> Node = atom()</v> <v> GlobalName = ViaName = term()</v> <v>Reason = term()</v> <v>Timeout = int()>0 | infinity</v> </type> <desc> - <p>Orders a generic FSM to exit with the given <c>Reason</c> - and waits for it to terminate. The gen_fsm will call - <seealso marker="#Module:terminate/3">Module:terminate/3</seealso> - before exiting.</p> - <p>The function returns <c>ok</c> if the generic FSM terminates - with the expected reason. Any other reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> will cause an + <p>Orders a generic finite state machine to exit with the specified + <c>Reason</c> and waits for it to terminate. The <c>gen_fsm</c> + process calls <seealso marker="#Module:terminate/3"> + <c>Module:terminate/3</c></seealso> before exiting.</p> + <p>The function returns <c>ok</c> if the generic finite state machine + terminates with the expected reason. 
Any other reason than + <c>normal</c>, <c>shutdown</c>, or <c>{shutdown,Term}</c> causes an error report to be issued using - <seealso marker="kernel:error_logger#format/2">error_logger:format/2</seealso>. - The default <c>Reason</c> is <c>normal</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which + <seealso marker="kernel:error_logger#format/2"> + <c>error_logger:format/2</c></seealso>. + The default <c>Reason</c> is <c>normal</c>.</p> + <p><c>Timeout</c> is an integer greater than zero that specifies how many milliseconds to wait for the generic FSM to terminate, or the atom <c>infinity</c> to wait indefinitely. The default value is <c>infinity</c>. If the - generic FSM has not terminated within the specified time, a - <c>timeout</c> exception is raised.</p> - <p>If the process does not exist, a <c>noproc</c> exception - is raised.</p> - </desc> - </func> - <func> - <name>send_event(FsmRef, Event) -> ok</name> - <fsummary>Send an event asynchronously to a generic FSM.</fsummary> - <type> - <v>FsmRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> - <v> Name = Node = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Event = term()</v> - </type> - <desc> - <p>Sends an event asynchronously to the gen_fsm <c>FsmRef</c> - and returns <c>ok</c> immediately. 
The gen_fsm will call - <c>Module:StateName/2</c> to handle the event, where - <c>StateName</c> is the name of the current state of - the gen_fsm.</p> - <p><c>FsmRef</c> can be:</p> - <list type="bulleted"> - <item>the pid,</item> - <item><c>Name</c>, if the gen_fsm is locally registered,</item> - <item><c>{Name,Node}</c>, if the gen_fsm is locally - registered at another node, or</item> - <item><c>{global,GlobalName}</c>, if the gen_fsm is globally - registered.</item> - <item><c>{via,Module,ViaName}</c>, if the gen_fsm is registered - through an alternative process registry.</item> - </list> - <p><c>Event</c> is an arbitrary term which is passed as one of - the arguments to <c>Module:StateName/2</c>.</p> + generic finite state machine has not terminated within the specified + time, a <c>timeout</c> exception is raised.</p> + <p>If the process does not exist, a <c>noproc</c> exception + is raised.</p> </desc> </func> + <func> - <name>send_all_state_event(FsmRef, Event) -> ok</name> - <fsummary>Send an event asynchronously to a generic FSM.</fsummary> + <name>sync_send_all_state_event(FsmRef, Event) -> Reply</name> + <name>sync_send_all_state_event(FsmRef, Event, Timeout) -> Reply</name> + <fsummary>Send an event synchronously to a generic FSM.</fsummary> <type> - <v>FsmRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> + <v>FsmRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> <v> Name = Node = atom()</v> <v> GlobalName = ViaName = term()</v> <v>Event = term()</v> + <v>Timeout = int()>0 | infinity</v> + <v>Reply = term()</v> </type> <desc> - <p>Sends an event asynchronously to the gen_fsm <c>FsmRef</c> - and returns <c>ok</c> immediately. 
The gen_fsm will call - <c>Module:handle_event/3</c> to handle the event.</p> - <p>See <seealso marker="#send_event/2">send_event/2</seealso> - for a description of the arguments.</p> - <p>The difference between <c>send_event</c> and - <c>send_all_state_event</c> is which callback function is - used to handle the event. This function is useful when - sending events that are handled the same way in every state, - as only one <c>handle_event</c> clause is needed to handle - the event instead of one clause in each state name function.</p> + <p>Sends an event to the <c>FsmRef</c> of the <c>gen_fsm</c> + process and waits until a reply arrives or a time-out occurs. + The <c>gen_fsm</c> process calls + <seealso marker="#Module:handle_sync_event/4"> + <c>Module:handle_sync_event/4</c></seealso> to handle the event.</p> + <p>For a description of <c>FsmRef</c> and <c>Event</c>, see + <seealso marker="#send_event/2">send_event/2</seealso>. + For a description of <c>Timeout</c> and <c>Reply</c>, see + <seealso marker="#sync_send_event/3"> + <c>sync_send_event/3</c></seealso>.</p> + <p>For a discussion about the difference between + <c>sync_send_event</c> and <c>sync_send_all_state_event</c>, see + <seealso marker="#send_all_state_event/2"> + <c>send_all_state_event/2</c></seealso>.</p> </desc> </func> + <func> <name>sync_send_event(FsmRef, Event) -> Reply</name> <name>sync_send_event(FsmRef, Event, Timeout) -> Reply</name> <fsummary>Send an event synchronously to a generic FSM.</fsummary> <type> - <v>FsmRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> + <v>FsmRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> <v> Name = Node = atom()</v> <v> GlobalName = ViaName = term()</v> <v>Event = term()</v> @@ -303,210 +518,231 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>Reply = term()</v> </type> <desc> - <p>Sends an event to the gen_fsm <c>FsmRef</c> and waits until a - reply 
arrives or a timeout occurs. The gen_fsm will call - <c>Module:StateName/3</c> to handle the event, where + <p>Sends an event to the <c>FsmRef</c> of the <c>gen_fsm</c> + process and waits until a reply arrives or a time-out occurs. + The <c>gen_fsm</c> process calls + <seealso marker="#Module:StateName/3"> + <c>Module:StateName/3</c></seealso> to handle the event, where <c>StateName</c> is the name of the current state of - the gen_fsm.</p> - <p>See <seealso marker="#send_event/2">send_event/2</seealso> - for a description of <c>FsmRef</c> and <c>Event</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which + the <c>gen_fsm</c> process.</p> + <p>For a description of <c>FsmRef</c> and <c>Event</c>, see + <seealso marker="#send_event/2"><c>send_event/2</c></seealso>.</p> + <p><c>Timeout</c> is an integer greater than zero that specifies how many milliseconds to wait for a reply, or - the atom <c>infinity</c> to wait indefinitely. Default value - is 5000. If no reply is received within the specified time, + the atom <c>infinity</c> to wait indefinitely. Defaults + to 5000. 
If no reply is received within the specified time, the function call fails.</p> - <p>The return value <c>Reply</c> is defined in the return value + <p>Return value <c>Reply</c> is defined in the return value of <c>Module:StateName/3</c>.</p> - <p>The ancient behaviour of sometimes consuming the server + <note> + <p>The ancient behavior of sometimes consuming the server exit message if the server died during the call while - linked to the client has been removed in OTP R12B/Erlang 5.6.</p> + linked to the client was removed in Erlang 5.6/OTP R12B.</p> + </note> </desc> </func> + </funcs> + + <section> + <title>Callback Functions</title> + <p>The following functions are to be exported from a <c>gen_fsm</c> + callback module.</p> + + <p><em>state name</em> denotes a state of the state machine.</p> + + <p><em>state data</em> denotes the internal state of the Erlang process + that implements the state machine.</p> + </section> + + <funcs> <func> - <name>sync_send_all_state_event(FsmRef, Event) -> Reply</name> - <name>sync_send_all_state_event(FsmRef, Event, Timeout) -> Reply</name> - <fsummary>Send an event synchronously to a generic FSM.</fsummary> + <name>Module:code_change(OldVsn, StateName, StateData, Extra) -> {ok, NextStateName, NewStateData}</name> + <fsummary>Update the internal state data during upgrade/downgrade. + </fsummary> <type> - <v>FsmRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> - <v> Name = Node = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Event = term()</v> - <v>Timeout = int()>0 | infinity</v> - <v>Reply = term()</v> + <v>OldVsn = Vsn | {down, Vsn}</v> + <v> Vsn = term()</v> + <v>StateName = NextStateName = atom()</v> + <v>StateData = NewStateData = term()</v> + <v>Extra = term()</v> </type> <desc> - <p>Sends an event to the gen_fsm <c>FsmRef</c> and waits until a - reply arrives or a timeout occurs. 
The gen_fsm will call - <c>Module:handle_sync_event/4</c> to handle the event.</p> - <p>See <seealso marker="#send_event/2">send_event/2</seealso> - for a description of <c>FsmRef</c> and <c>Event</c>. See - <seealso marker="#sync_send_event/3">sync_send_event/3</seealso> - for a description of <c>Timeout</c> and <c>Reply</c>.</p> - <p>See - <seealso marker="#send_all_state_event/2">send_all_state_event/2</seealso> - for a discussion about the difference between - <c>sync_send_event</c> and <c>sync_send_all_state_event</c>.</p> + <p>This function is called by a <c>gen_fsm</c> process when it is to + update its internal state data during a release upgrade/downgrade, + that is, when instruction <c>{update,Module,Change,...}</c>, + where <c>Change={advanced,Extra}</c>, is given in + the <c>appup</c> file; see section + <seealso marker="doc/design_principles:release_handling#instr"> + Release Handling Instructions</seealso> in OTP Design Principles.</p> + <p>For an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and for a downgrade, + <c>OldVsn</c> is <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the + <c>vsn</c> attribute(s) of the old version of the callback module + <c>Module</c>. 
If no such attribute is defined, the version is + the checksum of the Beam file.</p> + <p><c>StateName</c> is the current state name and <c>StateData</c> the + internal state data of the <c>gen_fsm</c> process.</p> + <p><c>Extra</c> is passed "as is" from the <c>{advanced,Extra}</c> + part of the update instruction.</p> + <p>The function is to return the new current state name and + updated internal data.</p> </desc> </func> + <func> - <name>reply(Caller, Reply) -> Result</name> - <fsummary>Send a reply to a caller.</fsummary> + <name>Module:format_status(Opt, [PDict, StateData]) -> Status</name> + <fsummary>Optional function for providing a term describing the + current <c>gen_fsm</c> process status.</fsummary> <type> - <v>Caller - see below</v> - <v>Reply = term()</v> - <v>Result = term()</v> + <v>Opt = normal | terminate</v> + <v>PDict = [{Key, Value}]</v> + <v>StateData = term()</v> + <v>Status = term()</v> </type> <desc> - <p>This function can be used by a gen_fsm to explicitly send a - reply to a client process that called - <seealso marker="#sync_send_event/2">sync_send_event/2,3</seealso> - or - <seealso marker="#sync_send_all_state_event/2">sync_send_all_state_event/2,3</seealso>, - when the reply cannot be defined in the return value of - <c>Module:State/3</c> or <c>Module:handle_sync_event/4</c>.</p> - <p><c>Caller</c> must be the <c>From</c> argument provided to - the callback function. <c>Reply</c> is an arbitrary term, - which will be given back to the client as the return value of - <c>sync_send_event/2,3</c> or - <c>sync_send_all_state_event/2,3</c>.</p> - <p>The return value <c>Result</c> is not further defined, and - should always be ignored.</p> + <note> + <p>This callback is optional, so callback modules need not + export it. 
The <c>gen_fsm</c> module provides a default + implementation of this function that returns the callback + module state data.</p> + </note> + <p>This function is called by a <c>gen_fsm</c> process in the + following situations:</p> + <list type="bulleted"> + <item>One of <seealso marker="sys#get_status/1"> + <c>sys:get_status/1,2</c></seealso> + is invoked to get the <c>gen_fsm</c> status. <c>Opt</c> is set to + the atom <c>normal</c> for this case.</item> + <item>The <c>gen_fsm</c> process terminates abnormally and logs an + error. <c>Opt</c> is set to the atom <c>terminate</c> for + this case.</item> + </list> + <p>This function is useful for changing the form and + appearance of the <c>gen_fsm</c> status for these cases. A callback + module wishing to change the <c>sys:get_status/1,2</c> + return value as well as how its status appears in + termination error logs, exports an instance + of <c>format_status/2</c> that returns a term describing the + current status of the <c>gen_fsm</c> process.</p> + <p><c>PDict</c> is the current value of the process dictionary of the + <c>gen_fsm</c> process.</p> + <p><c>StateData</c> is the internal state data of the + <c>gen_fsm</c> process.</p> + <p>The function is to return <c>Status</c>, a term that + changes the details of the current state and status of + the <c>gen_fsm</c> process. There are no restrictions on the + form <c>Status</c> can take, but for + the <c>sys:get_status/1,2</c> case (when <c>Opt</c> + is <c>normal</c>), the recommended form for + the <c>Status</c> value is <c>[{data, [{"StateData", + Term}]}]</c>, where <c>Term</c> provides relevant details of + the <c>gen_fsm</c> state data. 
Following this recommendation is not + required, but it makes the callback module status + consistent with the rest of the <c>sys:get_status/1,2</c> + return value.</p> + <p>One use for this function is to return compact alternative + state data representations to avoid that large state terms + are printed in log files.</p> </desc> </func> + <func> - <name>send_event_after(Time, Event) -> Ref</name> - <fsummary>Send a delayed event internally in a generic FSM.</fsummary> + <name>Module:handle_event(Event, StateName, StateData) -> Result</name> + <fsummary>Handle an asynchronous event.</fsummary> <type> - <v>Time = integer()</v> <v>Event = term()</v> - <v>Ref = reference()</v> - </type> - <desc> - <p>Sends a delayed event internally in the gen_fsm that calls - this function after <c>Time</c> ms. Returns immediately a - reference that can be used to cancel the delayed send using - <seealso marker="#cancel_timer/1">cancel_timer/1</seealso>.</p> - <p>The gen_fsm will call <c>Module:StateName/2</c> to handle - the event, where <c>StateName</c> is the name of the current - state of the gen_fsm at the time the delayed event is - delivered.</p> - <p><c>Event</c> is an arbitrary term which is passed as one of - the arguments to <c>Module:StateName/2</c>.</p> - </desc> - </func> - <func> - <name>start_timer(Time, Msg) -> Ref</name> - <fsummary>Send a timeout event internally in a generic FSM.</fsummary> - <type> - <v>Time = integer()</v> - <v>Msg = term()</v> - <v>Ref = reference()</v> + <v>StateName = atom()</v> + <v>StateData = term()</v> + <v>Result = {next_state,NextStateName,NewStateData}</v> + <v> | {next_state,NextStateName,NewStateData,Timeout}</v> + <v> | {next_state,NextStateName,NewStateData,hibernate}</v> + <v> | {stop,Reason,NewStateData}</v> + <v> NextStateName = atom()</v> + <v> NewStateData = term()</v> + <v> Timeout = int()>0 | infinity</v> + <v> Reason = term()</v> </type> <desc> - <p>Sends a timeout event internally in the gen_fsm that calls - this 
function after <c>Time</c> ms. Returns immediately a - reference that can be used to cancel the timer using - <seealso marker="#cancel_timer/1">cancel_timer/1</seealso>.</p> - <p>The gen_fsm will call <c>Module:StateName/2</c> to handle - the event, where <c>StateName</c> is the name of the current - state of the gen_fsm at the time the timeout message is - delivered.</p> - <p><c>Msg</c> is an arbitrary term which is passed in the - timeout message, <c>{timeout, Ref, Msg}</c>, as one of - the arguments to <c>Module:StateName/2</c>.</p> + <p>Whenever a <c>gen_fsm</c> process receives an event sent using + <seealso marker="#send_all_state_event/2"> + <c>send_all_state_event/2</c></seealso>, + this function is called to handle the event.</p> + <p><c>StateName</c> is the current state name of the <c>gen_fsm</c> + process.</p> + <p>For a description of the other arguments and possible return values, + see <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso>.</p> </desc> </func> + <func> - <name>cancel_timer(Ref) -> RemainingTime | false</name> - <fsummary>Cancel an internal timer in a generic FSM.</fsummary> + <name>Module:handle_info(Info, StateName, StateData) -> Result</name> + <fsummary>Handle an incoming message.</fsummary> <type> - <v>Ref = reference()</v> - <v>RemainingTime = integer()</v> + <v>Info = term()</v> + <v>StateName = atom()</v> + <v>StateData = term()</v> + <v>Result = {next_state,NextStateName,NewStateData}</v> + <v> | {next_state,NextStateName,NewStateData,Timeout}</v> + <v> | {next_state,NextStateName,NewStateData,hibernate}</v> + <v> | {stop,Reason,NewStateData}</v> + <v> NextStateName = atom()</v> + <v> NewStateData = term()</v> + <v> Timeout = int()>0 | infinity</v> + <v> Reason = normal | term()</v> </type> <desc> - <p>Cancels an internal timer referred by <c>Ref</c> in the - gen_fsm that calls this function.</p> - <p><c>Ref</c> is a reference returned from - <seealso 
marker="#send_event_after/2">send_event_after/2</seealso> - or - <seealso marker="#start_timer/2">start_timer/2</seealso>.</p> - <p>If the timer has already timed out, but the event not yet - been delivered, it is cancelled as if it had <em>not</em> - timed out, so there will be no false timer event after - returning from this function.</p> - <p>Returns the remaining time in ms until the timer would - have expired if <c>Ref</c> referred to an active timer, - <c>false</c> otherwise.</p> + <p>This function is called by a <c>gen_fsm</c> process when it receives + any other message than a synchronous or asynchronous event (or a + system message).</p> + <p><c>Info</c> is the received message.</p> + <p>For a description of the other arguments and possible return values, + see <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso>.</p> </desc> </func> + <func> - <name>enter_loop(Module, Options, StateName, StateData)</name> - <name>enter_loop(Module, Options, StateName, StateData, FsmName)</name> - <name>enter_loop(Module, Options, StateName, StateData, Timeout)</name> - <name>enter_loop(Module, Options, StateName, StateData, FsmName, Timeout)</name> - <fsummary>Enter the gen_fsm receive loop</fsummary> + <name>Module:handle_sync_event(Event, From, StateName, StateData) -> Result</name> + <fsummary>Handle a synchronous event.</fsummary> <type> - <v>Module = atom()</v> - <v>Options = [Option]</v> - <v> Option = {debug,Dbgs}</v> - <v> Dbgs = [Dbg]</v> - <v> Dbg = trace | log | statistics</v> - <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v>Event = term()</v> + <v>From = {pid(),Tag}</v> <v>StateName = atom()</v> <v>StateData = term()</v> - <v>FsmName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Timeout = int() | infinity</v> + <v>Result = {reply,Reply,NextStateName,NewStateData}</v> + <v> | {reply,Reply,NextStateName,NewStateData,Timeout}</v> + 
<v> | {reply,Reply,NextStateName,NewStateData,hibernate}</v> + <v> | {next_state,NextStateName,NewStateData}</v> + <v> | {next_state,NextStateName,NewStateData,Timeout}</v> + <v> | {next_state,NextStateName,NewStateData,hibernate}</v> + <v> | {stop,Reason,Reply,NewStateData} | {stop,Reason,NewStateData}</v> + <v> Reply = term()</v> + <v> NextStateName = atom()</v> + <v> NewStateData = term()</v> + <v> Timeout = int()>0 | infinity</v> + <v> Reason = term()</v> </type> <desc> - <p>Makes an existing process into a gen_fsm. Does not return, - instead the calling process will enter the gen_fsm receive - loop and become a gen_fsm process. The process <em>must</em> - have been started using one of the start functions in - <c>proc_lib</c>, see - <seealso marker="proc_lib">proc_lib(3)</seealso>. The user is - responsible for any initialization of the process, including - registering a name for it.</p> - <p>This function is useful when a more complex initialization - procedure is needed than the gen_fsm behaviour provides.</p> - <p><c>Module</c>, <c>Options</c> and <c>FsmName</c> have - the same meanings as when calling - <seealso marker="#start_link/3">start[_link]/3,4</seealso>. - However, if <c>FsmName</c> is specified, the process must have - been registered accordingly <em>before</em> this function is - called.</p> - <p><c>StateName</c>, <c>StateData</c> and <c>Timeout</c> have - the same meanings as in the return value of - <seealso marker="#Moduleinit">Module:init/1</seealso>. 
- Also, the callback module <c>Module</c> does not need to - export an <c>init/1</c> function.</p> - <p>Failure: If the calling process was not started by a - <c>proc_lib</c> start function, or if it is not registered - according to <c>FsmName</c>.</p> + <p>Whenever a <c>gen_fsm</c> process receives an event sent using + <seealso marker="#sync_send_all_state_event/2"> + <c>sync_send_all_state_event/2,3</c></seealso>, + this function is called to handle the event.</p> + <p><c>StateName</c> is the current state name of the <c>gen_fsm</c> + process.</p> + <p>For a description of the other arguments and possible return values, + see <seealso marker="#Module:StateName/3"> + <c>Module:StateName/3</c></seealso>.</p> </desc> </func> - </funcs> - <section> - <title>CALLBACK FUNCTIONS</title> - <p>The following functions should be exported from a <c>gen_fsm</c> - callback module.</p> - <p>In the description, the expression <em>state name</em> is used to - denote a state of the state machine. <em>state data</em> is used - to denote the internal state of the Erlang process which - implements the state machine.</p> - </section> - <funcs> <func> <name>Module:init(Args) -> Result</name> - <fsummary>Initialize process and internal state name and state data.</fsummary> + <fsummary>Initialize process and internal state name and state data. 
+ </fsummary> <type> <v>Args = term()</v> <v>Result = {ok,StateName,StateData} | {ok,StateName,StateData,Timeout}</v> - <v> | {ok,StateName,StateData,hibernate}</v> + <v> | {ok,StateName,StateData,hibernate}</v> <v> | {stop,Reason} | ignore</v> <v> StateName = atom()</v> <v> StateData = term()</v> @@ -515,33 +751,36 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 </type> <desc> <marker id="Moduleinit"></marker> - <p>Whenever a gen_fsm is started using - <seealso marker="#start/3">gen_fsm:start/3,4</seealso> or - <seealso marker="#start_link/3">gen_fsm:start_link/3,4</seealso>, + <p>Whenever a <c>gen_fsm</c> process is started using + <seealso marker="#start/3"><c>start/3,4</c></seealso> or + <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso>, this function is called by the new process to initialize.</p> <p><c>Args</c> is the <c>Args</c> argument provided to the start function.</p> - <p>If initialization is successful, the function should return - <c>{ok,StateName,StateData}</c>, - <c>{ok,StateName,StateData,Timeout}</c> or <c>{ok,StateName,StateData,hibernate}</c>, - where <c>StateName</c> + <p>If initialization is successful, the function is to return + <c>{ok,StateName,StateData}</c>, + <c>{ok,StateName,StateData,Timeout}</c>, or + <c>{ok,StateName,StateData,hibernate}</c>, where <c>StateName</c> is the initial state name and <c>StateData</c> the initial - state data of the gen_fsm.</p> - <p>If an integer timeout value is provided, a timeout will occur + state data of the <c>gen_fsm</c> process.</p> + <p>If an integer time-out value is provided, a time-out occurs unless an event or a message is received within <c>Timeout</c> - milliseconds. A timeout is represented by the atom - <c>timeout</c> and should be handled by - the <c>Module:StateName/2</c> callback functions. The atom + milliseconds. 
A time-out is represented by the atom + <c>timeout</c> and is to be handled by the + <seealso marker="#Module:StateName/2"> + <c>Module:StateName/2</c></seealso> callback functions. The atom <c>infinity</c> can be used to wait indefinitely, this is the default value.</p> - <p>If <c>hibernate</c> is specified instead of a timeout value, the process will go - into hibernation when waiting for the next message to arrive (by calling - <seealso marker="proc_lib#hibernate/3">proc_lib:hibernate/3</seealso>).</p> - <p>If something goes wrong during the initialization - the function should return <c>{stop,Reason}</c>, where - <c>Reason</c> is any term, or <c>ignore</c>.</p> + <p>If <c>hibernate</c> is specified instead of a time-out value, the + process goes into hibernation when waiting for the next message + to arrive (by calling <seealso marker="proc_lib#hibernate/3"> + <c>proc_lib:hibernate/3</c></seealso>).</p> + <p>If the initialization fails, the function returns + <c>{stop,Reason}</c>, where <c>Reason</c> is any term, + or <c>ignore</c>.</p> </desc> </func> + <func> <name>Module:StateName(Event, StateData) -> Result</name> <fsummary>Handle an asynchronous event.</fsummary> @@ -549,8 +788,8 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>Event = timeout | term()</v> <v>StateData = term()</v> <v>Result = {next_state,NextStateName,NewStateData} </v> - <v> | {next_state,NextStateName,NewStateData,Timeout}</v> - <v> | {next_state,NextStateName,NewStateData,hibernate}</v> + <v> | {next_state,NextStateName,NewStateData,Timeout}</v> + <v> | {next_state,NextStateName,NewStateData,hibernate}</v> <v> | {stop,Reason,NewStateData}</v> <v> NextStateName = atom()</v> <v> NewStateData = term()</v> @@ -558,56 +797,33 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v> Reason = term()</v> </type> <desc> - <p>There should be one instance of this function for each - possible state name. 
Whenever a gen_fsm receives an event - sent using - <seealso marker="#send_event/2">gen_fsm:send_event/2</seealso>, + <p>There is to be one instance of this function for each + possible state name. Whenever a <c>gen_fsm</c> process receives + an event sent using + <seealso marker="#send_event/2"><c>send_event/2</c></seealso>, the instance of this function with the same name as the current state name <c>StateName</c> is called to handle - the event. It is also called if a timeout occurs.</p> - <p><c>Event</c> is either the atom <c>timeout</c>, if a timeout + the event. It is also called if a time-out occurs.</p> + <p><c>Event</c> is either the atom <c>timeout</c>, if a time-out has occurred, or the <c>Event</c> argument provided to <c>send_event/2</c>.</p> - <p><c>StateData</c> is the state data of the gen_fsm.</p> + <p><c>StateData</c> is the state data of the <c>gen_fsm</c> process.</p> <p>If the function returns <c>{next_state,NextStateName,NewStateData}</c>, - <c>{next_state,NextStateName,NewStateData,Timeout}</c> or - <c>{next_state,NextStateName,NewStateData,hibernate}</c>, - the gen_fsm will continue executing with the current state + <c>{next_state,NextStateName,NewStateData,Timeout}</c>, or + <c>{next_state,NextStateName,NewStateData,hibernate}</c>, the + <c>gen_fsm</c> process continues executing with the current state name set to <c>NextStateName</c> and with the possibly - updated state data <c>NewStateData</c>. See - <c>Module:init/1</c> for a description of <c>Timeout</c> and <c>hibernate</c>.</p> + updated state data <c>NewStateData</c>. 
For a description of + <c>Timeout</c> and <c>hibernate</c>, see + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso>.</p> <p>If the function returns <c>{stop,Reason,NewStateData}</c>, - the gen_fsm will call + the <c>gen_fsm</c> process calls <c>Module:terminate(Reason,StateName,NewStateData)</c> and - terminate.</p> - </desc> - </func> - <func> - <name>Module:handle_event(Event, StateName, StateData) -> Result</name> - <fsummary>Handle an asynchronous event.</fsummary> - <type> - <v>Event = term()</v> - <v>StateName = atom()</v> - <v>StateData = term()</v> - <v>Result = {next_state,NextStateName,NewStateData} </v> - <v> | {next_state,NextStateName,NewStateData,Timeout}</v> - <v> | {next_state,NextStateName,NewStateData,hibernate}</v> - <v> | {stop,Reason,NewStateData}</v> - <v> NextStateName = atom()</v> - <v> NewStateData = term()</v> - <v> Timeout = int()>0 | infinity</v> - <v> Reason = term()</v> - </type> - <desc> - <p>Whenever a gen_fsm receives an event sent using - <seealso marker="#send_all_state_event/2">gen_fsm:send_all_state_event/2</seealso>, - this function is called to handle the event.</p> - <p><c>StateName</c> is the current state name of the gen_fsm.</p> - <p>See <c>Module:StateName/2</c> for a description of the other - arguments and possible return values.</p> + terminates.</p> </desc> </func> + <func> <name>Module:StateName(Event, From, StateData) -> Result</name> <fsummary>Handle a synchronous event.</fsummary> @@ -616,11 +832,11 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>From = {pid(),Tag}</v> <v>StateData = term()</v> <v>Result = {reply,Reply,NextStateName,NewStateData}</v> - <v> | {reply,Reply,NextStateName,NewStateData,Timeout}</v> - <v> | {reply,Reply,NextStateName,NewStateData,hibernate}</v> + <v> | {reply,Reply,NextStateName,NewStateData,Timeout}</v> + <v> | {reply,Reply,NextStateName,NewStateData,hibernate}</v> <v> | {next_state,NextStateName,NewStateData}</v> - <v> | 
{next_state,NextStateName,NewStateData,Timeout}</v> - <v> | {next_state,NextStateName,NewStateData,hibernate}</v> + <v> | {next_state,NextStateName,NewStateData,Timeout}</v> + <v> | {next_state,NextStateName,NewStateData,hibernate}</v> <v> | {stop,Reason,Reply,NewStateData} | {stop,Reason,NewStateData}</v> <v> Reply = term()</v> <v> NextStateName = atom()</v> @@ -629,102 +845,56 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v> Reason = normal | term()</v> </type> <desc> - <p>There should be one instance of this function for each - possible state name. Whenever a gen_fsm receives an event - sent using - <seealso marker="#sync_send_event/2">gen_fsm:sync_send_event/2,3</seealso>, + <p>There is to be one instance of this function for each + possible state name. Whenever a <c>gen_fsm</c> process receives an + event sent using <seealso marker="#sync_send_event/2"> + <c>sync_send_event/2,3</c></seealso>, the instance of this function with the same name as the current state name <c>StateName</c> is called to handle the event.</p> <p><c>Event</c> is the <c>Event</c> argument provided to - <c>sync_send_event</c>.</p> + <c>sync_send_event/2,3</c>.</p> <p><c>From</c> is a tuple <c>{Pid,Tag}</c> where <c>Pid</c> is - the pid of the process which called <c>sync_send_event/2,3</c> + the pid of the process that called <c>sync_send_event/2,3</c> and <c>Tag</c> is a unique tag.</p> - <p><c>StateData</c> is the state data of the gen_fsm.</p> - <p>If the function returns - <c>{reply,Reply,NextStateName,NewStateData}</c>, - <c>{reply,Reply,NextStateName,NewStateData,Timeout}</c> or - <c>{reply,Reply,NextStateName,NewStateData,hibernate}</c>, - <c>Reply</c> will be given back to <c>From</c> as the return - value of <c>sync_send_event/2,3</c>. The gen_fsm then - continues executing with the current state name set to - <c>NextStateName</c> and with the possibly updated state data - <c>NewStateData</c>. 
See <c>Module:init/1</c> for a - description of <c>Timeout</c> and <c>hibernate</c>.</p> - <p>If the function returns - <c>{next_state,NextStateName,NewStateData}</c>, - <c>{next_state,NextStateName,NewStateData,Timeout}</c> or - <c>{next_state,NextStateName,NewStateData,hibernate}</c>, - the gen_fsm will continue executing in <c>NextStateName</c> - with <c>NewStateData</c>. Any reply to <c>From</c> must be - given explicitly using - <seealso marker="#reply/2">gen_fsm:reply/2</seealso>.</p> - <p>If the function returns - <c>{stop,Reason,Reply,NewStateData}</c>, <c>Reply</c> will be - given back to <c>From</c>. If the function returns - <c>{stop,Reason,NewStateData}</c>, any reply to <c>From</c> - must be given explicitly using <c>gen_fsm:reply/2</c>. - The gen_fsm will then call - <c>Module:terminate(Reason,StateName,NewStateData)</c> and - terminate.</p> - </desc> - </func> - <func> - <name>Module:handle_sync_event(Event, From, StateName, StateData) -> Result</name> - <fsummary>Handle a synchronous event.</fsummary> - <type> - <v>Event = term()</v> - <v>From = {pid(),Tag}</v> - <v>StateName = atom()</v> - <v>StateData = term()</v> - <v>Result = {reply,Reply,NextStateName,NewStateData}</v> - <v> | {reply,Reply,NextStateName,NewStateData,Timeout}</v> - <v> | {reply,Reply,NextStateName,NewStateData,hibernate}</v> - <v> | {next_state,NextStateName,NewStateData}</v> - <v> | {next_state,NextStateName,NewStateData,Timeout}</v> - <v> | {next_state,NextStateName,NewStateData,hibernate}</v> - <v> | {stop,Reason,Reply,NewStateData} | {stop,Reason,NewStateData}</v> - <v> Reply = term()</v> - <v> NextStateName = atom()</v> - <v> NewStateData = term()</v> - <v> Timeout = int()>0 | infinity</v> - <v> Reason = term()</v> - </type> - <desc> - <p>Whenever a gen_fsm receives an event sent using - <seealso marker="#sync_send_all_state_event/2">gen_fsm:sync_send_all_state_event/2,3</seealso>, - this function is called to handle the event.</p> - <p><c>StateName</c> is the current state 
name of the gen_fsm.</p> - <p>See <c>Module:StateName/3</c> for a description of the other - arguments and possible return values.</p> - </desc> - </func> - <func> - <name>Module:handle_info(Info, StateName, StateData) -> Result</name> - <fsummary>Handle an incoming message.</fsummary> - <type> - <v>Info = term()</v> - <v>StateName = atom()</v> - <v>StateData = term()</v> - <v>Result = {next_state,NextStateName,NewStateData}</v> - <v> | {next_state,NextStateName,NewStateData,Timeout}</v> - <v> | {next_state,NextStateName,NewStateData,hibernate}</v> - <v> | {stop,Reason,NewStateData}</v> - <v> NextStateName = atom()</v> - <v> NewStateData = term()</v> - <v> Timeout = int()>0 | infinity</v> - <v> Reason = normal | term()</v> - </type> - <desc> - <p>This function is called by a gen_fsm when it receives any - other message than a synchronous or asynchronous event (or a - system message).</p> - <p><c>Info</c> is the received message.</p> - <p>See <c>Module:StateName/2</c> for a description of the other - arguments and possible return values.</p> + <p><c>StateData</c> is the state data of the <c>gen_fsm</c> process.</p> + <list type="bulleted"> + <item> + <p>If <c>{reply,Reply,NextStateName,NewStateData}</c>, + <c>{reply,Reply,NextStateName,NewStateData,Timeout}</c>, or + <c>{reply,Reply,NextStateName,NewStateData,hibernate}</c> is + returned, <c>Reply</c> is given back to <c>From</c> as the return + value of <c>sync_send_event/2,3</c>. The <c>gen_fsm</c> process + then continues executing with the current state name set to + <c>NextStateName</c> and with the possibly updated state data + <c>NewStateData</c>. 
For a description of <c>Timeout</c> and + <c>hibernate</c>, see + <seealso marker="#Module:init/1"> + <c>Module:init/1</c></seealso>.</p> + </item> + <item> + <p>If <c>{next_state,NextStateName,NewStateData}</c>, + <c>{next_state,NextStateName,NewStateData,Timeout}</c>, or + <c>{next_state,NextStateName,NewStateData,hibernate}</c> is + returned, the <c>gen_fsm</c> process continues executing in + <c>NextStateName</c> with <c>NewStateData</c>. + Any reply to <c>From</c> must be specified explicitly using + <seealso marker="#reply/2"><c>reply/2</c></seealso>.</p> + </item> + <item> + <p>If the function returns + <c>{stop,Reason,Reply,NewStateData}</c>, <c>Reply</c> is + given back to <c>From</c>. If the function returns + <c>{stop,Reason,NewStateData}</c>, any reply to <c>From</c> + must be specified explicitly using <c>reply/2</c>. + The <c>gen_fsm</c> process then calls + <c>Module:terminate(Reason,StateName,NewStateData)</c> and + terminates.</p> + </item> + </list> </desc> </func> + <func> <name>Module:terminate(Reason, StateName, StateData)</name> <fsummary>Clean up before termination.</fsummary> @@ -734,134 +904,56 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>StateData = term()</v> </type> <desc> - <p>This function is called by a gen_fsm when it is about to - terminate. It should be the opposite of <c>Module:init/1</c> - and do any necessary cleaning up. When it returns, the gen_fsm - terminates with <c>Reason</c>. The return value is ignored.</p> + <p>This function is called by a <c>gen_fsm</c> process when it is about + to terminate. It is to be the opposite of + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> + and do any necessary cleaning up. When it returns, the <c>gen_fsm</c> + process terminates with <c>Reason</c>. The return value is ignored. 
+ </p> <p><c>Reason</c> is a term denoting the stop reason, <c>StateName</c> is the current state name, and - <c>StateData</c> is the state data of the gen_fsm.</p> - <p><c>Reason</c> depends on why the gen_fsm is terminating. If + <c>StateData</c> is the state data of the <c>gen_fsm</c> process.</p> + <p><c>Reason</c> depends on why the <c>gen_fsm</c> process is + terminating. If it is because another callback function has returned a stop - tuple <c>{stop,..}</c>, <c>Reason</c> will have the value - specified in that tuple. If it is due to a failure, + tuple <c>{stop,..}</c>, <c>Reason</c> has the value + specified in that tuple. If it is because of a failure, <c>Reason</c> is the error reason.</p> - <p>If the gen_fsm is part of a supervision tree and is ordered - by its supervisor to terminate, this function will be called + <p>If the <c>gen_fsm</c> process is part of a supervision tree and is + ordered by its supervisor to terminate, this function is called with <c>Reason=shutdown</c> if the following conditions apply:</p> <list type="bulleted"> - <item>the gen_fsm has been set to trap exit signals, and</item> - <item>the shutdown strategy as defined in the supervisor's - child specification is an integer timeout value, not - <c>brutal_kill</c>.</item> + <item> + <p>The <c>gen_fsm</c> process has been set to trap exit signals.</p> + </item> + <item> + <p>The shutdown strategy as defined in the child specification of + the supervisor is an integer time-out value, not + <c>brutal_kill</c>.</p> + </item> </list> - <p>Even if the gen_fsm is <em>not</em> part of a supervision tree, - this function will be called if it receives an <c>'EXIT'</c> - message from its parent. 
<c>Reason</c> will be the same as in - the <c>'EXIT'</c> message.</p> - <p>Otherwise, the gen_fsm will be immediately terminated.</p> - <p>Note that for any other reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> the gen_fsm is - assumed to terminate due to an error and - an error report is issued using - <seealso marker="kernel:error_logger#format/2">error_logger:format/2</seealso>.</p> - </desc> - </func> - <func> - <name>Module:code_change(OldVsn, StateName, StateData, Extra) -> {ok, NextStateName, NewStateData}</name> - <fsummary>Update the internal state data during upgrade/downgrade.</fsummary> - <type> - <v>OldVsn = Vsn | {down, Vsn}</v> - <v> Vsn = term()</v> - <v>StateName = NextStateName = atom()</v> - <v>StateData = NewStateData = term()</v> - <v>Extra = term()</v> - </type> - <desc> - <p>This function is called by a gen_fsm when it should update - its internal state data during a release upgrade/downgrade, - i.e. when the instruction <c>{update,Module,Change,...}</c> - where <c>Change={advanced,Extra}</c> is given in - the <c>appup</c> file. See - <seealso marker="doc/design_principles:release_handling#instr">OTP Design Principles</seealso>.</p> - <p>In the case of an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and - in the case of a downgrade, <c>OldVsn</c> is - <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the <c>vsn</c> - attribute(s) of the old version of the callback module - <c>Module</c>. 
If no such attribute is defined, the version is - the checksum of the BEAM file.</p> - <p><c>StateName</c> is the current state name and - <c>StateData</c> the internal state data of the gen_fsm.</p> - <p><c>Extra</c> is passed as-is from the <c>{advanced,Extra}</c> - part of the update instruction.</p> - <p>The function should return the new current state name and - updated internal data.</p> - </desc> - </func> - <func> - <name>Module:format_status(Opt, [PDict, StateData]) -> Status</name> - <fsummary>Optional function for providing a term describing the - current gen_fsm status.</fsummary> - <type> - <v>Opt = normal | terminate</v> - <v>PDict = [{Key, Value}]</v> - <v>StateData = term()</v> - <v>Status = term()</v> - </type> - <desc> - <note> - <p>This callback is optional, so callback modules need not - export it. The gen_fsm module provides a default - implementation of this function that returns the callback - module state data.</p> - </note> - <p>This function is called by a gen_fsm process when:</p> - <list type="bulleted"> - <item>One - of <seealso marker="sys#get_status/1">sys:get_status/1,2</seealso> - is invoked to get the gen_fsm status. <c>Opt</c> is set to - the atom <c>normal</c> for this case.</item> - <item>The gen_fsm terminates abnormally and logs an - error. <c>Opt</c> is set to the atom <c>terminate</c> for - this case.</item> - </list> - <p>This function is useful for customising the form and - appearance of the gen_fsm status for these cases. 
A callback - module wishing to customise the <c>sys:get_status/1,2</c> - return value as well as how its status appears in - termination error logs exports an instance - of <c>format_status/2</c> that returns a term describing the - current status of the gen_fsm.</p> - <p><c>PDict</c> is the current value of the gen_fsm's - process dictionary.</p> - <p><c>StateData</c> is the internal state data of the - gen_fsm.</p> - <p>The function should return <c>Status</c>, a term that - customises the details of the current state and status of - the gen_fsm. There are no restrictions on the - form <c>Status</c> can take, but for - the <c>sys:get_status/1,2</c> case (when <c>Opt</c> - is <c>normal</c>), the recommended form for - the <c>Status</c> value is <c>[{data, [{"StateData", - Term}]}]</c> where <c>Term</c> provides relevant details of - the gen_fsm state data. Following this recommendation isn't - required, but doing so will make the callback module status - consistent with the rest of the <c>sys:get_status/1,2</c> - return value.</p> - <p>One use for this function is to return compact alternative - state data representations to avoid having large state terms - printed in logfiles.</p> + <p>Even if the <c>gen_fsm</c> process is <em>not</em> part of a + supervision tree, + this function is called if it receives an <c>'EXIT'</c> + message from its parent. 
<c>Reason</c> is the same as in + the <c>'EXIT'</c> message.</p> + <p>Otherwise, the <c>gen_fsm</c> process terminates immediately.</p> + <p>Notice that for any other reason than <c>normal</c>, + <c>shutdown</c>, or <c>{shutdown,Term}</c> the <c>gen_fsm</c> process + is assumed to terminate because of an error and an error report is + issued using <seealso marker="kernel:error_logger#format/2"> + <c>error_logger:format/2</c></seealso>.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="gen_event">gen_event(3)</seealso>, - <seealso marker="gen_server">gen_server(3)</seealso>, - <seealso marker="gen_statem">gen_statem(3)</seealso>, - <seealso marker="supervisor">supervisor(3)</seealso>, - <seealso marker="proc_lib">proc_lib(3)</seealso>, - <seealso marker="sys">sys(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="gen_event"><c>gen_event(3)</c></seealso>, + <seealso marker="gen_server"><c>gen_server(3)</c></seealso>, + <seealso marker="gen_statem"><c>gen_statem(3)</c></seealso>, + <seealso marker="proc_lib"><c>proc_lib(3)</c></seealso>, + <seealso marker="supervisor"><c>supervisor(3)</c></seealso>, + <seealso marker="sys"><c>sys(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/gen_server.xml b/lib/stdlib/doc/src/gen_server.xml index 10dc978afc..4a7dd60858 100644 --- a/lib/stdlib/doc/src/gen_server.xml +++ b/lib/stdlib/doc/src/gen_server.xml @@ -29,18 +29,21 @@ <rev></rev> </header> <module>gen_server</module> - <modulesummary>Generic Server Behaviour</modulesummary> + <modulesummary>Generic server behavior.</modulesummary> <description> - <p>A behaviour module for implementing the server of a client-server - relation. A generic server process (gen_server) implemented using - this module will have a standard set of interface functions and - include functionality for tracing and error reporting. It will - also fit into an OTP supervision tree. 
Refer to - <seealso marker="doc/design_principles:gen_server_concepts">OTP Design Principles</seealso> for more information.</p> - <p>A gen_server assumes all specific parts to be located in a - callback module exporting a pre-defined set of functions. - The relationship between the behaviour functions and the callback - functions can be illustrated as follows:</p> + <p>This behavior module provides the server of a client-server + relation. A generic server process (<c>gen_server</c>) implemented using + this module has a standard set of interface functions and + includes functionality for tracing and error reporting. It also + fits into an OTP supervision tree. For more information, see section + <seealso marker="doc/design_principles:gen_server_concepts"> + gen_server Behaviour</seealso> in OTP Design Principles.</p> + + <p>A <c>gen_server</c> process assumes all specific parts to be located in + a callback module exporting a predefined set of functions. + The relationship between the behavior functions and the callback + functions is as follows:</p> + <pre> gen_server module Callback module ----------------- --------------- @@ -59,175 +62,65 @@ gen_server:abcast -----> Module:handle_cast/2 - -----> Module:terminate/2 -- -----> Module:code_change/3 </pre> - <p>If a callback function fails or returns a bad value, - the gen_server will terminate.</p> - <p>A gen_server handles system messages as documented in - <seealso marker="sys">sys(3)</seealso>. The <c>sys</c> module - can be used for debugging a gen_server.</p> - <p>Note that a gen_server does not trap exit signals automatically, - this must be explicitly initiated in the callback module.</p> +- -----> Module:code_change/3</pre> + + <p>If a callback function fails or returns a bad value, the + <c>gen_server</c> process terminates.</p> + + <p>A <c>gen_server</c> process handles system messages as described in + <seealso marker="sys"><c>sys(3)</c></seealso>. 
The <c>sys</c> module + can be used for debugging a <c>gen_server</c> process.</p> + + <p>Notice that a <c>gen_server</c> process does not trap exit signals + automatically, this must be explicitly initiated in the callback + module.</p> + <p>Unless otherwise stated, all functions in this module fail if - the specified gen_server does not exist or if bad arguments are - given.</p> - - <p>The gen_server process can go into hibernation - (see <seealso marker="erts:erlang#erlang:hibernate/3">erlang(3)</seealso>) if a callback - function specifies <c>'hibernate'</c> instead of a timeout value. This - might be useful if the server is expected to be idle for a long - time. However this feature should be used with care as hibernation - implies at least two garbage collections (when hibernating and - shortly after waking up) and is not something you'd want to do - between each call to a busy server.</p> + the specified <c>gen_server</c> process does not exist or if bad + arguments are specified.</p> + <p>The <c>gen_server</c> process can go into hibernation + (see <seealso marker="erts:erlang#hibernate/3"> + <c>erlang:hibernate/3</c></seealso>) if a callback + function specifies <c>'hibernate'</c> instead of a time-out value. This + can be useful if the server is expected to be idle for a long + time. 
However, use this feature with care, as hibernation + implies at least two garbage collections (when hibernating and + shortly after waking up) and is not something you want to do + between each call to a busy server.</p> </description> + <funcs> <func> - <name>start_link(Module, Args, Options) -> Result</name> - <name>start_link(ServerName, Module, Args, Options) -> Result</name> - <fsummary>Create a gen_server process in a supervision tree.</fsummary> - <type> - <v>ServerName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Module = atom()</v> - <v>Args = term()</v> - <v>Options = [Option]</v> - <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> - <v> Dbgs = [Dbg]</v> - <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> - <v> SOpts = [term()]</v> - <v>Result = {ok,Pid} | ignore | {error,Error}</v> - <v> Pid = pid()</v> - <v> Error = {already_started,Pid} | term()</v> - </type> - <desc> - <p>Creates a gen_server process as part of a supervision tree. - The function should be called, directly or indirectly, by - the supervisor. It will, among other things, ensure that - the gen_server is linked to the supervisor.</p> - <p>The gen_server process calls <c>Module:init/1</c> to - initialize. To ensure a synchronized start-up procedure, - <c>start_link/3,4</c> does not return until - <c>Module:init/1</c> has returned.</p> - <p>If <c>ServerName={local,Name}</c> the gen_server is - registered locally as <c>Name</c> using <c>register/2</c>. - If <c>ServerName={global,GlobalName}</c> the gen_server is - registered globally as <c>GlobalName</c> using - <c>global:register_name/2</c>. If no name is provided, - the gen_server is not registered. - If <c>ServerName={via,Module,ViaName}</c>, the gen_server will - register with the registry represented by <c>Module</c>. 
- The <c>Module</c> callback should export the functions - <c>register_name/2</c>, <c>unregister_name/1</c>, - <c>whereis_name/1</c> and <c>send/2</c>, which should behave like the - corresponding functions in <c>global</c>. Thus, - <c>{via,global,GlobalName}</c> is a valid reference.</p> - <p><c>Module</c> is the name of the callback module.</p> - <p><c>Args</c> is an arbitrary term which is passed as - the argument to <c>Module:init/1</c>.</p> - <p>If the option <c>{timeout,Time}</c> is present, - the gen_server is allowed to spend <c>Time</c> milliseconds - initializing or it will be terminated and the start function - will return <c>{error,timeout}</c>. - </p> - <p>If the option <c>{debug,Dbgs}</c> is present, - the corresponding <c>sys</c> function will be called for each - item in <c>Dbgs</c>. See - <seealso marker="sys">sys(3)</seealso>.</p> - <p>If the option <c>{spawn_opt,SOpts}</c> is present, - <c>SOpts</c> will be passed as option list to - the <c>spawn_opt</c> BIF which is used to spawn - the gen_server. See - <seealso marker="erts:erlang#spawn_opt/2">erlang(3)</seealso>.</p> - <note> - <p>Using the spawn option <c>monitor</c> is currently not - allowed, but will cause the function to fail with reason - <c>badarg</c>.</p> - </note> - <p>If the gen_server is successfully created and initialized - the function returns <c>{ok,Pid}</c>, where <c>Pid</c> is - the pid of the gen_server. If there already exists a process - with the specified <c>ServerName</c> the function returns - <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is - the pid of that process.</p> - <p>If <c>Module:init/1</c> fails with <c>Reason</c>, - the function returns <c>{error,Reason}</c>. 
If - <c>Module:init/1</c> returns <c>{stop,Reason}</c> or - <c>ignore</c>, the process is terminated and the function - returns <c>{error,Reason}</c> or <c>ignore</c>, respectively.</p> - </desc> - </func> - <func> - <name>start(Module, Args, Options) -> Result</name> - <name>start(ServerName, Module, Args, Options) -> Result</name> - <fsummary>Create a stand-alone gen_server process.</fsummary> - <type> - <v>ServerName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Module = atom()</v> - <v>Args = term()</v> - <v>Options = [Option]</v> - <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> - <v> Dbgs = [Dbg]</v> - <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> - <v> SOpts = [term()]</v> - <v>Result = {ok,Pid} | ignore | {error,Error}</v> - <v> Pid = pid()</v> - <v> Error = {already_started,Pid} | term()</v> - </type> - <desc> - <p>Creates a stand-alone gen_server process, i.e. 
a gen_server - which is not part of a supervision tree and thus has no - supervisor.</p> - <p>See <seealso marker="#start_link/3">start_link/3,4</seealso> - for a description of arguments and return values.</p> - </desc> - </func> - <func> - <name>stop(ServerRef) -> ok</name> - <name>stop(ServerRef, Reason, Timeout) -> ok</name> - <fsummary>Synchronously stop a generic server.</fsummary> + <name>abcast(Name, Request) -> abcast</name> + <name>abcast(Nodes, Name, Request) -> abcast</name> + <fsummary>Send an asynchronous request to many generic servers.</fsummary> <type> - <v>ServerRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> + <v>Nodes = [Node]</v> <v> Node = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Reason = term()</v> - <v>Timeout = int()>0 | infinity</v> + <v>Name = atom()</v> + <v>Request = term()</v> </type> <desc> - <p>Orders a generic server to exit with the - given <c>Reason</c> and waits for it to terminate. The - gen_server will call - <seealso marker="#Module:terminate/2">Module:terminate/2</seealso> - before exiting.</p> - <p>The function returns <c>ok</c> if the server terminates - with the expected reason. Any other reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> will cause an - error report to be issued using - <seealso marker="kernel:error_logger#format/2">error_logger:format/2</seealso>. - The default <c>Reason</c> is <c>normal</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which - specifies how many milliseconds to wait for the server to - terminate, or the atom <c>infinity</c> to wait - indefinitely. The default value is <c>infinity</c>. If the - server has not terminated within the specified time, a - <c>timeout</c> exception is raised.</p> - <p>If the process does not exist, a <c>noproc</c> exception - is raised.</p> + <p>Sends an asynchronous request to the <c>gen_server</c> processes + locally registered as <c>Name</c> at the specified nodes. 
The function + returns immediately and ignores nodes that do not exist, or + where the <c>gen_server</c> <c>Name</c> does not exist. + The <c>gen_server</c> processes call + <seealso marker="#Module:handle_cast/2"> + <c>Module:handle_cast/2</c></seealso> to handle the request.</p> + <p>For a description of the arguments, see + <seealso marker="#multi_call/2"><c>multi_call/2,3,4</c></seealso>.</p> </desc> </func> + <func> <name>call(ServerRef, Request) -> Reply</name> <name>call(ServerRef, Request, Timeout) -> Reply</name> <fsummary>Make a synchronous call to a generic server.</fsummary> <type> - <v>ServerRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> + <v>ServerRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> <v> Node = atom()</v> <v> GlobalName = ViaName = term()</v> <v>Request = term()</v> @@ -235,47 +128,126 @@ gen_server:abcast -----> Module:handle_cast/2 <v>Reply = term()</v> </type> <desc> - <p>Makes a synchronous call to the gen_server <c>ServerRef</c> + <p>Makes a synchronous call to the <c>ServerRef</c> of the + <c>gen_server</c> process by sending a request and waiting until a reply arrives or a - timeout occurs. The gen_server will call - <c>Module:handle_call/3</c> to handle the request.</p> - <p><c>ServerRef</c> can be:</p> + time-out occurs. 
The <c>gen_server</c> process calls + <seealso marker="#Module:handle_call/3"> + <c>Module:handle_call/3</c></seealso> to handle the request.</p> + <p><c>ServerRef</c> can be any of the following:</p> <list type="bulleted"> - <item>the pid,</item> - <item><c>Name</c>, if the gen_server is locally registered,</item> - <item><c>{Name,Node}</c>, if the gen_server is locally - registered at another node, or</item> - <item><c>{global,GlobalName}</c>, if the gen_server is - globally registered.</item> - <item><c>{via,Module,ViaName}</c>, if the gen_server is - registered through an alternative process registry.</item> + <item>The pid</item> + <item><c>Name</c>, if the <c>gen_server</c> process is locally + registered</item> + <item><c>{Name,Node}</c>, if the <c>gen_server</c> process is locally + registered at another node</item> + <item><c>{global,GlobalName}</c>, if the <c>gen_server</c> process is + globally registered</item> + <item><c>{via,Module,ViaName}</c>, if the <c>gen_server</c> process is + registered through an alternative process registry</item> </list> - <p><c>Request</c> is an arbitrary term which is passed as one of + <p><c>Request</c> is any term that is passed as one of the arguments to <c>Module:handle_call/3</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which + <p><c>Timeout</c> is an integer greater than zero that specifies how many milliseconds to wait for a reply, or - the atom <c>infinity</c> to wait indefinitely. Default value - is 5000. If no reply is received within the specified time, + the atom <c>infinity</c> to wait indefinitely. Defaults to + 5000. If no reply is received within the specified time, the function call fails. If the caller catches the failure and continues running, and the server is just late with the reply, - it may arrive at any time later into the caller's message queue. + it can arrive at any time later into the message queue of the caller. 
The caller must in this case be prepared for this and discard any such garbage messages that are two element tuples with a reference as the first element.</p> <p>The return value <c>Reply</c> is defined in the return value of <c>Module:handle_call/3</c>.</p> - <p>The call may fail for several reasons, including timeout and - the called gen_server dying before or during the call.</p> - <p>The ancient behaviour of sometimes consuming the server + <p>The call can fail for many reasons, including time-out and the + called <c>gen_server</c> process dying before or during the call.</p> + <note> + <p>The ancient behavior of sometimes consuming the server exit message if the server died during the call while - linked to the client has been removed in OTP R12B/Erlang 5.6.</p> + linked to the client was removed in Erlang 5.6/OTP R12B.</p> + </note> </desc> </func> + + <func> + <name>cast(ServerRef, Request) -> ok</name> + <fsummary>Send an asynchronous request to a generic server.</fsummary> + <type> + <v>ServerRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> + <v> Node = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Request = term()</v> + </type> + <desc> + <p>Sends an asynchronous request to the <c>ServerRef</c> of the + <c>gen_server</c> process + and returns <c>ok</c> immediately, ignoring + if the destination node or <c>gen_server</c> process does not exist. 
+ The <c>gen_server</c> process calls + <seealso marker="#Module:handle_cast/2"> + <c>Module:handle_cast/2</c></seealso> to handle the request.</p> + <p>For a description of <c>ServerRef</c>, see + <seealso marker="#call/2"><c>call/2,3</c></seealso>.</p> + <p><c>Request</c> is any term that is passed as one + of the arguments to <c>Module:handle_cast/2</c>.</p> + </desc> + </func> + + <func> + <name>enter_loop(Module, Options, State)</name> + <name>enter_loop(Module, Options, State, ServerName)</name> + <name>enter_loop(Module, Options, State, Timeout)</name> + <name>enter_loop(Module, Options, State, ServerName, Timeout)</name> + <fsummary>Enter the <c>gen_server</c> receive loop.</fsummary> + <type> + <v>Module = atom()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics</v> + <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v>State = term()</v> + <v>ServerName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Timeout = int() | infinity</v> + </type> + <desc> + <p>Makes an existing process into a <c>gen_server</c> process. Does not + return, instead the calling process enters the <c>gen_server</c> + process receive + loop and becomes a <c>gen_server</c> process. The process + <em>must</em> have been started using one of the start functions in + <seealso marker="proc_lib"><c>proc_lib(3)</c></seealso>. The user is + responsible for any initialization of the process, including + registering a name for it.</p> + <p>This function is useful when a more complex initialization procedure + is needed than the <c>gen_server</c> process behavior provides.</p> + <p><c>Module</c>, <c>Options</c>, and <c>ServerName</c> have + the same meanings as when calling + <seealso marker="#start_link/3"><c>start[_link]/3,4</c></seealso>. 
+ However, if <c>ServerName</c> is specified, the process must + have been registered accordingly <em>before</em> this function + is called.</p> + <p><c>State</c> and <c>Timeout</c> have the same meanings as in + the return value of + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso>. + The callback module <c>Module</c> does not need to + export an <c>init/1</c> function.</p> + <p>The function fails if the calling process was not started by a + <c>proc_lib</c> start function, or if it is not registered + according to <c>ServerName</c>.</p> + </desc> + </func> + <func> <name>multi_call(Name, Request) -> Result</name> <name>multi_call(Nodes, Name, Request) -> Result</name> <name>multi_call(Nodes, Name, Request, Timeout) -> Result</name> - <fsummary>Make a synchronous call to several generic servers.</fsummary> + <fsummary>Make a synchronous call to many generic servers.</fsummary> <type> <v>Nodes = [Node]</v> <v> Node = atom()</v> @@ -288,203 +260,339 @@ gen_server:abcast -----> Module:handle_cast/2 <v>BadNodes = [Node]</v> </type> <desc> - <p>Makes a synchronous call to all gen_servers locally + <p>Makes a synchronous call to all <c>gen_server</c> processes locally registered as <c>Name</c> at the specified nodes by first - sending a request to every node and then waiting for - the replies. The gen_servers will call - <c>Module:handle_call/3</c> to handle the request.</p> - <p>The function returns a tuple <c>{Replies,BadNodes}</c> where + sending a request to every node and then waits for + the replies. 
The <c>gen_server</c> process calls + <seealso marker="#Module:handle_call/3"> + <c>Module:handle_call/3</c></seealso> to handle the request.</p> + <p>The function returns a tuple <c>{Replies,BadNodes}</c>, where <c>Replies</c> is a list of <c>{Node,Reply}</c> and <c>BadNodes</c> is a list of node that either did not exist, - or where the gen_server <c>Name</c> did not exist or did not + or where the <c>gen_server</c> <c>Name</c> did not exist or did not reply.</p> <p><c>Nodes</c> is a list of node names to which the request - should be sent. Default value is the list of all known nodes + is to be sent. Default value is the list of all known nodes <c>[node()|nodes()]</c>.</p> <p><c>Name</c> is the locally registered name of each - gen_server.</p> - <p><c>Request</c> is an arbitrary term which is passed as one of + <c>gen_server</c> process.</p> + <p><c>Request</c> is any term that is passed as one of the arguments to <c>Module:handle_call/3</c>.</p> - <p><c>Timeout</c> is an integer greater than zero which + <p><c>Timeout</c> is an integer greater than zero that specifies how many milliseconds to wait for each reply, or - the atom <c>infinity</c> to wait indefinitely. Default value - is <c>infinity</c>. If no reply is received from a node within + the atom <c>infinity</c> to wait indefinitely. Defaults + to <c>infinity</c>. If no reply is received from a node within the specified time, the node is added to <c>BadNodes</c>.</p> - <p>When a reply <c>Reply</c> is received from the gen_server at - a node <c>Node</c>, <c>{Node,Reply}</c> is added to + <p>When a reply <c>Reply</c> is received from the <c>gen_server</c> + process at a node <c>Node</c>, <c>{Node,Reply}</c> is added to <c>Replies</c>. 
<c>Reply</c> is defined in the return value of <c>Module:handle_call/3</c>.</p> <warning> - <p>If one of the nodes is not capable of process monitors, - for example C or Java nodes, and the gen_server is not started - when the requests are sent, but starts within 2 seconds, - this function waits the whole <c>Timeout</c>, - which may be infinity.</p> + <p>If one of the nodes cannot process monitors, for example, + C or Java nodes, and the <c>gen_server</c> process is not started + when the requests are sent, but starts within 2 seconds, + this function waits the whole <c>Timeout</c>, + which may be infinity.</p> <p>This problem does not exist if all nodes are Erlang nodes.</p> </warning> - <p>To prevent late answers (after the timeout) from polluting - the caller's message queue, a middleman process is used to - do the actual calls. Late answers will then be discarded + <p>To prevent late answers (after the time-out) from polluting + the message queue of the caller, a middleman process is used to + do the calls. Late answers are then discarded when they arrive to a terminated process.</p> </desc> </func> + <func> - <name>cast(ServerRef, Request) -> ok</name> - <fsummary>Send an asynchronous request to a generic server.</fsummary> + <name>reply(Client, Reply) -> Result</name> + <fsummary>Send a reply to a client.</fsummary> <type> - <v>ServerRef = Name | {Name,Node} | {global,GlobalName} - | {via,Module,ViaName} | pid()</v> - <v> Node = atom()</v> - <v> GlobalName = ViaName = term()</v> - <v>Request = term()</v> + <v>Client - see below</v> + <v>Reply = term()</v> + <v>Result = term()</v> </type> <desc> - <p>Sends an asynchronous request to the gen_server - <c>ServerRef</c> and returns <c>ok</c> immediately, ignoring - if the destination node or gen_server does not exist. 
- The gen_server will call <c>Module:handle_cast/2</c> to - handle the request.</p> - <p>See <seealso marker="#call/2">call/2,3</seealso> for a - description of <c>ServerRef</c>.</p> - <p><c>Request</c> is an arbitrary term which is passed as one - of the arguments to <c>Module:handle_cast/2</c>.</p> + <p>This function can be used by a <c>gen_server</c> process to + explicitly send a reply to a client that called + <seealso marker="#call/2"><c>call/2,3</c></seealso> or + <seealso marker="#multi_call/2"><c>multi_call/2,3,4</c></seealso>, + when the reply cannot be defined in the return value of + <seealso marker="#Module:handle_call/3"> + <c>Module:handle_call/3</c></seealso>.</p> + <p><c>Client</c> must be the <c>From</c> argument provided to + the callback function. <c>Reply</c> is any term + given back to the client as the return value of + <c>call/2,3</c> or <c>multi_call/2,3,4</c>.</p> + <p>The return value <c>Result</c> is not further defined, and + is always to be ignored.</p> </desc> </func> + <func> - <name>abcast(Name, Request) -> abcast</name> - <name>abcast(Nodes, Name, Request) -> abcast</name> - <fsummary>Send an asynchronous request to several generic servers.</fsummary> + <name>start(Module, Args, Options) -> Result</name> + <name>start(ServerName, Module, Args, Options) -> Result</name> + <fsummary>Create a standalone <c>gen_server</c> process.</fsummary> <type> - <v>Nodes = [Node]</v> - <v> Node = atom()</v> - <v>Name = atom()</v> - <v>Request = term()</v> + <v>ServerName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Module = atom()</v> + <v>Args = term()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v> SOpts = [term()]</v> + <v>Result = {ok,Pid} | ignore | {error,Error}</v> + <v> 
Pid = pid()</v> + <v> Error = {already_started,Pid} | term()</v> </type> <desc> - <p>Sends an asynchronous request to the gen_servers locally - registered as <c>Name</c> at the specified nodes. The function - returns immediately and ignores nodes that do not exist, or - where the gen_server <c>Name</c> does not exist. - The gen_servers will call <c>Module:handle_cast/2</c> to - handle the request.</p> - <p>See - <seealso marker="#multi_call/2">multi_call/2,3,4</seealso> - for a description of the arguments.</p> + <p>Creates a standalone <c>gen_server</c> process, that is, a + <c>gen_server</c> process that is not part of a supervision tree + and thus has no supervisor.</p> + <p>For a description of arguments and return values, see + <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso>.</p> </desc> </func> + <func> - <name>reply(Client, Reply) -> Result</name> - <fsummary>Send a reply to a client.</fsummary> + <name>start_link(Module, Args, Options) -> Result</name> + <name>start_link(ServerName, Module, Args, Options) -> Result</name> + <fsummary>Create a <c>gen_server</c> process in a supervision tree. 
+ </fsummary> <type> - <v>Client - see below</v> - <v>Reply = term()</v> - <v>Result = term()</v> + <v>ServerName = {local,Name} | {global,GlobalName}</v> + <v> | {via,Module,ViaName}</v> + <v> Name = atom()</v> + <v> GlobalName = ViaName = term()</v> + <v>Module = atom()</v> + <v>Args = term()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v> SOpts = [term()]</v> + <v>Result = {ok,Pid} | ignore | {error,Error}</v> + <v> Pid = pid()</v> + <v> Error = {already_started,Pid} | term()</v> </type> <desc> - <p>This function can be used by a gen_server to explicitly send - a reply to a client that called <c>call/2,3</c> or - <c>multi_call/2,3,4</c>, when the reply cannot be defined in - the return value of <c>Module:handle_call/3</c>.</p> - <p><c>Client</c> must be the <c>From</c> argument provided to - the callback function. <c>Reply</c> is an arbitrary term, - which will be given back to the client as the return value of - <c>call/2,3</c> or <c>multi_call/2,3,4</c>.</p> - <p>The return value <c>Result</c> is not further defined, and - should always be ignored.</p> + <p>Creates a <c>gen_server</c> process as part of a supervision tree. + This function is to be called, directly or indirectly, by + the supervisor. For example, it ensures that + the <c>gen_server</c> process is linked to the supervisor.</p> + <p>The <c>gen_server</c> process calls + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> to + initialize. 
To ensure a synchronized startup procedure, + <c>start_link/3,4</c> does not return until + <c>Module:init/1</c> has returned.</p> + <list type="bulleted"> + <item> + <p>If <c>ServerName={local,Name}</c>, the <c>gen_server</c> process + is registered locally as <c>Name</c> using <c>register/2</c>.</p> + </item> + <item> + <p>If <c>ServerName={global,GlobalName}</c>, the <c>gen_server</c> + process is registered globally as <c>GlobalName</c> using + <seealso marker="kernel:global#register_name/2"> + <c>global:register_name/2</c></seealso>. If no name is + provided, the <c>gen_server</c> process is not registered.</p> + </item> + <item> + <p>If <c>ServerName={via,Module,ViaName}</c>, the <c>gen_server</c> + process registers with the registry represented by <c>Module</c>. + The <c>Module</c> callback is to export the functions + <c>register_name/2</c>, <c>unregister_name/1</c>, + <c>whereis_name/1</c>, and <c>send/2</c>, which are to behave + like the corresponding functions in + <seealso marker="kernel:global"><c>global</c></seealso>. 
+ Thus, <c>{via,global,GlobalName}</c> is a valid reference.</p> + </item> + </list> + <p><c>Module</c> is the name of the callback module.</p> + <p><c>Args</c> is any term that is passed as + the argument to + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso>.</p> + <list type="bulleted"> + <item> + <p>If option <c>{timeout,Time}</c> is present, the <c>gen_server</c> + process is allowed to spend <c>Time</c> milliseconds + initializing or it is terminated and the start function + returns <c>{error,timeout}</c>.</p> + </item> + <item> + <p>If option <c>{debug,Dbgs}</c> is present, + the corresponding <c>sys</c> function is called for each + item in <c>Dbgs</c>; see + <seealso marker="sys"><c>sys(3)</c></seealso>.</p> + </item> + <item> + <p>If option <c>{spawn_opt,SOpts}</c> is present, + <c>SOpts</c> is passed as option list to + the <c>spawn_opt</c> BIF, which is used to spawn + the <c>gen_server</c> process; see + <seealso marker="erts:erlang#spawn_opt/2"> + <c>spawn_opt/2</c></seealso>.</p> + </item> + </list> + <note> + <p>Using spawn option <c>monitor</c> is not + allowed, it causes the function to fail with reason + <c>badarg</c>.</p> + </note> + <p>If the <c>gen_server</c> process is successfully created and + initialized, the function returns <c>{ok,Pid}</c>, where <c>Pid</c> + is the pid of the <c>gen_server</c> process. If a process with the + specified <c>ServerName</c> exists already, the function returns + <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is + the pid of that process.</p> + <p>If <c>Module:init/1</c> fails with <c>Reason</c>, + the function returns <c>{error,Reason}</c>. 
If + <c>Module:init/1</c> returns <c>{stop,Reason}</c> or + <c>ignore</c>, the process is terminated and the function + returns <c>{error,Reason}</c> or <c>ignore</c>, respectively.</p> </desc> </func> + <func> - <name>enter_loop(Module, Options, State)</name> - <name>enter_loop(Module, Options, State, ServerName)</name> - <name>enter_loop(Module, Options, State, Timeout)</name> - <name>enter_loop(Module, Options, State, ServerName, Timeout)</name> - <fsummary>Enter the gen_server receive loop</fsummary> + <name>stop(ServerRef) -> ok</name> + <name>stop(ServerRef, Reason, Timeout) -> ok</name> + <fsummary>Synchronously stop a generic server.</fsummary> <type> - <v>Module = atom()</v> - <v>Options = [Option]</v> - <v> Option = {debug,Dbgs}</v> - <v> Dbgs = [Dbg]</v> - <v> Dbg = trace | log | statistics</v> - <v> | {log_to_file,FileName} | {install,{Func,FuncState}}</v> - <v>State = term()</v> - <v>ServerName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> - <v> Name = atom()</v> + <v>ServerRef = Name | {Name,Node} | {global,GlobalName}</v> + <v> | {via,Module,ViaName} | pid()</v> + <v> Node = atom()</v> <v> GlobalName = ViaName = term()</v> - <v>Timeout = int() | infinity</v> + <v>Reason = term()</v> + <v>Timeout = int()>0 | infinity</v> </type> <desc> - <p>Makes an existing process into a gen_server. Does not return, - instead the calling process will enter the gen_server receive - loop and become a gen_server process. The process - <em>must</em> have been started using one of the start - functions in <c>proc_lib</c>, see - <seealso marker="proc_lib">proc_lib(3)</seealso>. 
The user is - responsible for any initialization of the process, including - registering a name for it.</p> - <p>This function is useful when a more complex initialization - procedure is needed than the gen_server behaviour provides.</p> - <p><c>Module</c>, <c>Options</c> and <c>ServerName</c> have - the same meanings as when calling - <seealso marker="#start_link/3">gen_server:start[_link]/3,4</seealso>. - However, if <c>ServerName</c> is specified, the process must - have been registered accordingly <em>before</em> this function - is called.</p> - <p><c>State</c> and <c>Timeout</c> have the same meanings as in - the return value of - <seealso marker="#Moduleinit">Module:init/1</seealso>. - Also, the callback module <c>Module</c> does not need to - export an <c>init/1</c> function. </p> - <p>Failure: If the calling process was not started by a - <c>proc_lib</c> start function, or if it is not registered - according to <c>ServerName</c>.</p> + <p>Orders a generic server to exit with the specified <c>Reason</c> + and waits for it to terminate. The <c>gen_server</c> process calls + <seealso marker="#Module:terminate/2"> + <c>Module:terminate/2</c></seealso> before exiting.</p> + <p>The function returns <c>ok</c> if the server terminates + with the expected reason. Any other reason than <c>normal</c>, + <c>shutdown</c>, or <c>{shutdown,Term}</c> causes an + error report to be issued using + <seealso marker="kernel:error_logger#format/2"> + <c>error_logger:format/2</c></seealso>. + The default <c>Reason</c> is <c>normal</c>.</p> + <p><c>Timeout</c> is an integer greater than zero that + specifies how many milliseconds to wait for the server to + terminate, or the atom <c>infinity</c> to wait + indefinitely. Defaults to <c>infinity</c>. 
If the + server has not terminated within the specified time, a + <c>timeout</c> exception is raised.</p> + <p>If the process does not exist, a <c>noproc</c> exception + is raised.</p> </desc> </func> </funcs> <section> - <title>CALLBACK FUNCTIONS</title> + <title>Callback Functions</title> <p>The following functions - should be exported from a <c>gen_server</c> callback module.</p> + are to be exported from a <c>gen_server</c> callback module.</p> </section> + <funcs> <func> - <name>Module:init(Args) -> Result</name> - <fsummary>Initialize process and internal state.</fsummary> + <name>Module:code_change(OldVsn, State, Extra) -> {ok, NewState} | {error, Reason}</name> + <fsummary>Update the internal state during upgrade/downgrade.</fsummary> <type> - <v>Args = term()</v> - <v>Result = {ok,State} | {ok,State,Timeout} | {ok,State,hibernate}</v> - <v> | {stop,Reason} | ignore</v> - <v> State = term()</v> - <v> Timeout = int()>=0 | infinity</v> - <v> Reason = term()</v> + <v>OldVsn = Vsn | {down, Vsn}</v> + <v> Vsn = term()</v> + <v>State = NewState = term()</v> + <v>Extra = term()</v> + <v>Reason = term()</v> </type> <desc> - <marker id="Moduleinit"></marker> - <p>Whenever a gen_server is started using - <seealso marker="#start/3">gen_server:start/3,4</seealso> or - <seealso marker="#start_link/3">gen_server:start_link/3,4</seealso>, - this function is called by the new process to initialize.</p> - <p><c>Args</c> is the <c>Args</c> argument provided to the start - function.</p> - <p>If the initialization is successful, the function should - return <c>{ok,State}</c>, <c>{ok,State,Timeout}</c> or <c>{ok,State,hibernate}</c>, where - <c>State</c> is the internal state of the gen_server.</p> - <p>If an integer timeout value is provided, a timeout will occur - unless a request or a message is received within - <c>Timeout</c> milliseconds. A timeout is represented by - the atom <c>timeout</c> which should be handled by - the <c>handle_info/2</c> callback function. 
The atom - <c>infinity</c> can be used to wait indefinitely, this is - the default value.</p> - <p>If <c>hibernate</c> is specified instead of a timeout value, the process will go - into hibernation when waiting for the next message to arrive (by calling - <seealso marker="proc_lib#hibernate/3">proc_lib:hibernate/3</seealso>).</p> - <p>If something goes wrong during the initialization - the function should return <c>{stop,Reason}</c> where - <c>Reason</c> is any term, or <c>ignore</c>.</p> + <p>This function is called by a <c>gen_server</c> process when it is + to update its internal state during a release upgrade/downgrade, + that is, when the instruction <c>{update,Module,Change,...}</c>, + where <c>Change={advanced,Extra}</c>, is specified in + the <c>appup</c> file. For more information, see section + <seealso marker="doc/design_principles:release_handling#instr"> + Release Handling Instructions</seealso> in OTP Design Principles.</p> + <p>For an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and + for a downgrade, <c>OldVsn</c> is + <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the <c>vsn</c> + attribute(s) of the old version of the callback module + <c>Module</c>. 
If no such attribute is defined, the version + is the checksum of the Beam file.</p> + <p><c>State</c> is the internal state of the <c>gen_server</c> + process.</p> + <p><c>Extra</c> is passed "as is" from the <c>{advanced,Extra}</c> + part of the update instruction.</p> + <p>If successful, the function must return the updated + internal state.</p> + <p>If the function returns <c>{error,Reason}</c>, the ongoing + upgrade fails and rolls back to the old release.</p> </desc> </func> + + <func> + <name>Module:format_status(Opt, [PDict, State]) -> Status</name> + <fsummary>Optional function for providing a term describing the + current <c>gen_server</c> status.</fsummary> + <type> + <v>Opt = normal | terminate</v> + <v>PDict = [{Key, Value}]</v> + <v>State = term()</v> + <v>Status = term()</v> + </type> + <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_server</c> module provides a default + implementation of this function that returns the callback + module state.</p> + </note> + <p>This function is called by a <c>gen_server</c> process in the + following situations:</p> + <list type="bulleted"> + <item> + <p>One of <seealso marker="sys#get_status/1"> + <c>sys:get_status/1,2</c></seealso> + is invoked to get the <c>gen_server</c> status. <c>Opt</c> is set + to the atom <c>normal</c>.</p> + </item> + <item> + <p>The <c>gen_server</c> process terminates abnormally and logs an + error. <c>Opt</c> is set to the atom <c>terminate</c>.</p> + </item> + </list> + <p>This function is useful for changing the form and + appearance of the <c>gen_server</c> status for these cases. 
A + callback module wishing to change + the <c>sys:get_status/1,2</c> return value, as well as how + its status appears in termination error logs, exports an + instance of <c>format_status/2</c> that returns a term + describing the current status of the <c>gen_server</c> process.</p> + <p><c>PDict</c> is the current value of the process dictionary of + the <c>gen_server</c> process.</p> + <p><c>State</c> is the internal state of the <c>gen_server</c> + process.</p> + <p>The function is to return <c>Status</c>, a term that + changes the details of the current state and status of + the <c>gen_server</c> process. There are no restrictions on the + form <c>Status</c> can take, but for + the <c>sys:get_status/1,2</c> case (when <c>Opt</c> + is <c>normal</c>), the recommended form for + the <c>Status</c> value is <c>[{data, [{"State", + Term}]}]</c>, where <c>Term</c> provides relevant details of + the <c>gen_server</c> state. Following this recommendation is not + required, but it makes the callback module status + consistent with the rest of the <c>sys:get_status/1,2</c> + return value.</p> + <p>One use for this function is to return compact alternative + state representations to avoid that large state terms are + printed in log files.</p> + </desc> + </func> + <func> <name>Module:handle_call(Request, From, State) -> Result</name> <fsummary>Handle a synchronous request.</fsummary> @@ -493,9 +601,9 @@ gen_server:abcast -----> Module:handle_cast/2 <v>From = {pid(),Tag}</v> <v>State = term()</v> <v>Result = {reply,Reply,NewState} | {reply,Reply,NewState,Timeout}</v> - <v> | {reply,Reply,NewState,hibernate}</v> + <v> | {reply,Reply,NewState,hibernate}</v> <v> | {noreply,NewState} | {noreply,NewState,Timeout}</v> - <v> | {noreply,NewState,hibernate}</v> + <v> | {noreply,NewState,hibernate}</v> <v> | {stop,Reason,Reply,NewState} | {stop,Reason,NewState}</v> <v> Reply = term()</v> <v> NewState = term()</v> @@ -503,38 +611,52 @@ gen_server:abcast -----> Module:handle_cast/2 
<v> Reason = term()</v> </type> <desc> - <p>Whenever a gen_server receives a request sent using - <seealso marker="#call/2">gen_server:call/2,3</seealso> or - <seealso marker="#multi_call/2">gen_server:multi_call/2,3,4</seealso>, + <p>Whenever a <c>gen_server</c> process receives a request sent using + <seealso marker="#call/2"><c>call/2,3</c></seealso> or + <seealso marker="#multi_call/2"><c>multi_call/2,3,4</c></seealso>, this function is called to handle the request.</p> <p><c>Request</c> is the <c>Request</c> argument provided to <c>call</c> or <c>multi_call</c>.</p> - <p><c>From</c> is a tuple <c>{Pid,Tag}</c> where <c>Pid</c> is + <p><c>From</c> is a tuple <c>{Pid,Tag}</c>, where <c>Pid</c> is the pid of the client and <c>Tag</c> is a unique tag.</p> - <p><c>State</c> is the internal state of the gen_server.</p> - <p>If the function returns <c>{reply,Reply,NewState}</c>, - <c>{reply,Reply,NewState,Timeout}</c> or - <c>{reply,Reply,NewState,hibernate}</c>, <c>Reply</c> will be - given back to <c>From</c> as the return value of - <c>call/2,3</c> or included in the return value of - <c>multi_call/2,3,4</c>. The gen_server then continues - executing with the possibly updated internal state - <c>NewState</c>. See <c>Module:init/1</c> for a description - of <c>Timeout</c> and <c>hibernate</c>.</p> - <p>If the functions returns <c>{noreply,NewState}</c>, - <c>{noreply,NewState,Timeout}</c> or <c>{noreply,NewState,hibernate}</c>, - the gen_server will - continue executing with <c>NewState</c>. Any reply to - <c>From</c> must be given explicitly using - <seealso marker="#reply/2">gen_server:reply/2</seealso>.</p> - <p>If the function returns <c>{stop,Reason,Reply,NewState}</c>, - <c>Reply</c> will be given back to <c>From</c>. If - the function returns <c>{stop,Reason,NewState}</c>, any reply - to <c>From</c> must be given explicitly using - <c>gen_server:reply/2</c>. 
The gen_server will then call - <c>Module:terminate(Reason,NewState)</c> and terminate.</p> + <p><c>State</c> is the internal state of the <c>gen_server</c> + process.</p> + <list type="bulleted"> + <item> + <p>If <c>{reply,Reply,NewState}</c> is returned, + <c>{reply,Reply,NewState,Timeout}</c> or + <c>{reply,Reply,NewState,hibernate}</c>, <c>Reply</c> is + given back to <c>From</c> as the return value of + <c>call/2,3</c> or included in the return value of + <c>multi_call/2,3,4</c>. The <c>gen_server</c> process then + continues executing with the possibly updated internal state + <c>NewState</c>.</p> + <p>For a description of <c>Timeout</c> and <c>hibernate</c>, see + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso>.</p> + </item> + <item> + <p>If <c>{noreply,NewState}</c> is returned, + <c>{noreply,NewState,Timeout}</c>, or + <c>{noreply,NewState,hibernate}</c>, the <c>gen_server</c> + process continues executing with <c>NewState</c>. Any reply to + <c>From</c> must be specified explicitly using + <seealso marker="#reply/2"><c>reply/2</c></seealso>.</p> + </item> + <item> + <p>If <c>{stop,Reason,Reply,NewState}</c> is returned, + <c>Reply</c> is given back to <c>From</c>.</p> + </item> + <item> + <p>If <c>{stop,Reason,NewState}</c> is returned, any reply + to <c>From</c> must be specified explicitly using + <seealso marker="#reply/2"><c>reply/2</c></seealso>. 
+ The <c>gen_server</c> process then calls + <c>Module:terminate(Reason,NewState)</c> and terminates.</p> + </item> + </list> </desc> </func> + <func> <name>Module:handle_cast(Request, State) -> Result</name> <fsummary>Handle an asynchronous request.</fsummary> @@ -549,37 +671,82 @@ gen_server:abcast -----> Module:handle_cast/2 <v> Reason = term()</v> </type> <desc> - <p>Whenever a gen_server receives a request sent using - <seealso marker="#cast/2">gen_server:cast/2</seealso> or - <seealso marker="#abcast/2">gen_server:abcast/2,3</seealso>, + <p>Whenever a <c>gen_server</c> process receives a request sent using + <seealso marker="#cast/2"><c>cast/2</c></seealso> or + <seealso marker="#abcast/2"><c>abcast/2,3</c></seealso>, this function is called to handle the request.</p> - <p>See <c>Module:handle_call/3</c> for a description of - the arguments and possible return values.</p> + <p>For a description of the arguments and possible return values, see + <seealso marker="#Module:handle_call/3"> + <c>Module:handle_call/3</c></seealso>.</p> </desc> </func> + <func> <name>Module:handle_info(Info, State) -> Result</name> <fsummary>Handle an incoming message.</fsummary> <type> <v>Info = timeout | term()</v> <v>State = term()</v> - <v>Result = {noreply,NewState} | {noreply,NewState,Timeout} </v> - <v> | {noreply,NewState,hibernate}</v> + <v>Result = {noreply,NewState} | {noreply,NewState,Timeout}</v> + <v> | {noreply,NewState,hibernate}</v> <v> | {stop,Reason,NewState}</v> <v> NewState = term()</v> <v> Timeout = int()>=0 | infinity</v> <v> Reason = normal | term()</v> </type> <desc> - <p>This function is called by a gen_server when a timeout - occurs or when it receives any other message than a + <p>This function is called by a <c>gen_server</c> process when a + time-out occurs or when it receives any other message than a synchronous or asynchronous request (or a system message).</p> - <p><c>Info</c> is either the atom <c>timeout</c>, if a timeout + <p><c>Info</c> is either 
the atom <c>timeout</c>, if a time-out has occurred, or the received message.</p> - <p>See <c>Module:handle_call/3</c> for a description of - the other arguments and possible return values.</p> + <p>For a description of the other arguments and possible return values, + see <seealso marker="#Module:handle_call/3"> + <c>Module:handle_call/3</c></seealso>.</p> + </desc> + </func> + + <func> + <name>Module:init(Args) -> Result</name> + <fsummary>Initialize process and internal state.</fsummary> + <type> + <v>Args = term()</v> + <v>Result = {ok,State} | {ok,State,Timeout} | {ok,State,hibernate}</v> + <v> | {stop,Reason} | ignore</v> + <v> State = term()</v> + <v> Timeout = int()>=0 | infinity</v> + <v> Reason = term()</v> + </type> + <desc> + <p>Whenever a <c>gen_server</c> process is started using + <seealso marker="#start/3"><c>start/3,4</c></seealso> or + <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso>, + this function is called by the new process to initialize.</p> + <p><c>Args</c> is the <c>Args</c> argument provided to the start + function.</p> + <p>If the initialization is successful, the function is to + return <c>{ok,State}</c>, <c>{ok,State,Timeout}</c>, or + <c>{ok,State,hibernate}</c>, where <c>State</c> is the internal + state of the <c>gen_server</c> process.</p> + <p>If an integer time-out value is provided, a time-out occurs + unless a request or a message is received within + <c>Timeout</c> milliseconds. A time-out is represented by + the atom <c>timeout</c>, which is to be handled by the + <seealso marker="#Module:handle_info/2"> + <c>Module:handle_info/2</c></seealso> callback function. 
The atom + <c>infinity</c> can be used to wait indefinitely, this is + the default value.</p> + <p>If <c>hibernate</c> is specified instead of a time-out value, + the process goes into + hibernation when waiting for the next message to arrive (by calling + <seealso marker="proc_lib#hibernate/3"> + <c>proc_lib:hibernate/3</c></seealso>).</p> + <p>If the initialization fails, the function is to return + <c>{stop,Reason}</c>, where <c>Reason</c> is any term, or + <c>ignore</c>.</p> </desc> </func> + <func> <name>Module:terminate(Reason, State)</name> <fsummary>Clean up before termination.</fsummary> @@ -588,137 +755,57 @@ gen_server:abcast -----> Module:handle_cast/2 <v>State = term()</v> </type> <desc> - <p>This function is called by a gen_server when it is about to - terminate. It should be the opposite of <c>Module:init/1</c> + <p>This function is called by a <c>gen_server</c> process when it is + about to terminate. It is to be the opposite of + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> and do any necessary cleaning up. When it returns, - the gen_server terminates with <c>Reason</c>. The return - value is ignored.</p> - <p><c>Reason</c> is a term denoting the stop reason and - <c>State</c> is the internal state of the gen_server.</p> - <p><c>Reason</c> depends on why the gen_server is terminating. - If it is because another callback function has returned a - stop tuple <c>{stop,..}</c>, <c>Reason</c> will have - the value specified in that tuple. If it is due to a failure, + the <c>gen_server</c> process terminates with <c>Reason</c>. + The return value is ignored.</p> + <p><c>Reason</c> is a term denoting the stop reason and <c>State</c> + is the internal state of the <c>gen_server</c> process.</p> + <p><c>Reason</c> depends on why the <c>gen_server</c> process is + terminating. If it is because another callback function has returned + a stop tuple <c>{stop,..}</c>, <c>Reason</c> has + the value specified in that tuple. 
If it is because of a failure, <c>Reason</c> is the error reason.</p> - <p>If the gen_server is part of a supervision tree and is - ordered by its supervisor to terminate, this function will be + <p>If the <c>gen_server</c> process is part of a supervision tree and + is ordered by its supervisor to terminate, this function is called with <c>Reason=shutdown</c> if the following conditions apply:</p> <list type="bulleted"> - <item>the gen_server has been set to trap exit signals, and</item> - <item>the shutdown strategy as defined in the supervisor's - child specification is an integer timeout value, not - <c>brutal_kill</c>.</item> + <item> + <p>The <c>gen_server</c> process has been set to trap exit + signals.</p> + </item> + <item> + <p>The shutdown strategy as defined in the child specification + of the supervisor is an integer time-out value, not + <c>brutal_kill</c>.</p> + </item> </list> - <p>Even if the gen_server is <em>not</em> part of a supervision tree, - this function will be called if it receives an <c>'EXIT'</c> - message from its parent. <c>Reason</c> will be the same as in - the <c>'EXIT'</c> message.</p> - <p>Otherwise, the gen_server will be immediately terminated.</p> - <p>Note that for any other reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> the gen_server is - assumed to terminate due to an error and - an error report is issued using - <seealso marker="kernel:error_logger#format/2">error_logger:format/2</seealso>.</p> - </desc> - </func> - <func> - <name>Module:code_change(OldVsn, State, Extra) -> {ok, NewState} | {error, Reason}</name> - <fsummary>Update the internal state during upgrade/downgrade.</fsummary> - <type> - <v>OldVsn = Vsn | {down, Vsn}</v> - <v> Vsn = term()</v> - <v>State = NewState = term()</v> - <v>Extra = term()</v> - <v>Reason = term()</v> - </type> - <desc> - <p>This function is called by a gen_server when it should - update its internal state during a release upgrade/downgrade, - i.e. 
when the instruction <c>{update,Module,Change,...}</c> - where <c>Change={advanced,Extra}</c> is given in - the <c>appup</c> file. See - <seealso marker="doc/design_principles:release_handling#instr">OTP Design Principles</seealso> - for more information.</p> - <p>In the case of an upgrade, <c>OldVsn</c> is <c>Vsn</c>, and - in the case of a downgrade, <c>OldVsn</c> is - <c>{down,Vsn}</c>. <c>Vsn</c> is defined by the <c>vsn</c> - attribute(s) of the old version of the callback module - <c>Module</c>. If no such attribute is defined, the version - is the checksum of the BEAM file.</p> - <p><c>State</c> is the internal state of the gen_server.</p> - <p><c>Extra</c> is passed as-is from the <c>{advanced,Extra}</c> - part of the update instruction.</p> - <p>If successful, the function shall return the updated - internal state.</p> - <p>If the function returns <c>{error,Reason}</c>, the ongoing - upgrade will fail and roll back to the old release.</p> - </desc> - </func> - <func> - <name>Module:format_status(Opt, [PDict, State]) -> Status</name> - <fsummary>Optional function for providing a term describing the - current gen_server status.</fsummary> - <type> - <v>Opt = normal | terminate</v> - <v>PDict = [{Key, Value}]</v> - <v>State = term()</v> - <v>Status = term()</v> - </type> - <desc> - <note> - <p>This callback is optional, so callback modules need not - export it. The gen_server module provides a default - implementation of this function that returns the callback - module state.</p> - </note> - <p>This function is called by a gen_server process when:</p> - <list type="bulleted"> - <item>One - of <seealso marker="sys#get_status/1">sys:get_status/1,2</seealso> - is invoked to get the gen_server status. <c>Opt</c> is set - to the atom <c>normal</c> for this case.</item> - <item>The gen_server terminates abnormally and logs an - error. 
<c>Opt</c> is set to the atom <c>terminate</c> for this - case.</item> - </list> - <p>This function is useful for customising the form and - appearance of the gen_server status for these cases. A - callback module wishing to customise - the <c>sys:get_status/1,2</c> return value as well as how - its status appears in termination error logs exports an - instance of <c>format_status/2</c> that returns a term - describing the current status of the gen_server.</p> - <p><c>PDict</c> is the current value of the gen_server's - process dictionary.</p> - <p><c>State</c> is the internal state of the gen_server.</p> - <p>The function should return <c>Status</c>, a term that - customises the details of the current state and status of - the gen_server. There are no restrictions on the - form <c>Status</c> can take, but for - the <c>sys:get_status/1,2</c> case (when <c>Opt</c> - is <c>normal</c>), the recommended form for - the <c>Status</c> value is <c>[{data, [{"State", - Term}]}]</c> where <c>Term</c> provides relevant details of - the gen_server state. Following this recommendation isn't - required, but doing so will make the callback module status - consistent with the rest of the <c>sys:get_status/1,2</c> - return value.</p> - <p>One use for this function is to return compact alternative - state representations to avoid having large state terms - printed in logfiles.</p> + <p>Even if the <c>gen_server</c> process is <em>not</em> part of a + supervision tree, this function is called if it receives an + <c>'EXIT'</c> message from its parent. 
<c>Reason</c> is the same + as in the <c>'EXIT'</c> message.</p> + <p>Otherwise, the <c>gen_server</c> process terminates immediately.</p> + <p>Notice that for any other reason than <c>normal</c>, + <c>shutdown</c>, or <c>{shutdown,Term}</c>, the <c>gen_server</c> + process is assumed to terminate because of an error and + an error report is issued using + <seealso marker="kernel:error_logger#format/2"> + <c>error_logger:format/2</c></seealso>.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="gen_event">gen_event(3)</seealso>, - <seealso marker="gen_fsm">gen_fsm(3)</seealso>, - <seealso marker="gen_statem">gen_statem(3)</seealso>, - <seealso marker="supervisor">supervisor(3)</seealso>, - <seealso marker="proc_lib">proc_lib(3)</seealso>, - <seealso marker="sys">sys(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="gen_event"><c>gen_event(3)</c></seealso>, + <seealso marker="gen_fsm"><c>gen_fsm(3)</c></seealso>, + <seealso marker="gen_statem"><c>gen_statem(3)</c></seealso>, + <seealso marker="proc_lib"><c>proc_lib(3)</c></seealso>, + <seealso marker="supervisor"><c>supervisor(3)</c></seealso>, + <seealso marker="sys"><c>sys(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/gen_statem.xml b/lib/stdlib/doc/src/gen_statem.xml index 0e7d6e53e9..3322571b2c 100644 --- a/lib/stdlib/doc/src/gen_statem.xml +++ b/lib/stdlib/doc/src/gen_statem.xml @@ -97,6 +97,9 @@ gen_statem module Callback module gen_statem:start gen_statem:start_link -----> Module:init/1 +Server start or code change + -----> Module:callback_mode/0 + gen_statem:stop -----> Module:terminate/3 gen_statem:call @@ -116,9 +119,11 @@ erlang:'!' -----> Module:StateName/3 </p> <p> If a callback function fails or returns a bad value, - the <c>gen_statem</c> terminates. However, an exception of class + the <c>gen_statem</c> terminates, unless otherwise stated. 
+ However, an exception of class <seealso marker="erts:erlang#throw/1"><c>throw</c></seealso> - is not regarded as an error but as a valid return. + is not regarded as an error but as a valid return + from all callback functions. </p> <marker id="state_function"/> <p> @@ -127,7 +132,8 @@ erlang:'!' -----> Module:StateName/3 in a <c>gen_statem</c> is the callback function that is called for all events in this state. It is selected depending on which <seealso marker="#type-callback_mode"><em>callback mode</em></seealso> - that the implementation specifies when the server starts. + that the callback module defines with the callback function + <seealso marker="#Module:callback_mode/0"><c>Module:callback_mode/0</c></seealso>. </p> <p> When the @@ -138,9 +144,9 @@ erlang:'!' -----> Module:StateName/3 This gathers all code for a specific state in one function as the <c>gen_statem</c> engine branches depending on state name. - Notice that in this mode the mandatory callback function + Notice the fact that there is a mandatory callback function <seealso marker="#Module:terminate/3"><c>Module:terminate/3</c></seealso> - makes the state name <c>terminate</c> unusable. + makes the state name <c>terminate</c> unusable in this mode. </p> <p> When the @@ -249,11 +255,10 @@ erlang:'!' -----> Module:StateName/3 -behaviour(gen_statem). -export([start/0,push/0,get_count/0,stop/0]). --export([terminate/3,code_change/4,init/1]). +-export([terminate/3,code_change/4,init/1,callback_mode/0]). -export([on/3,off/3]). name() -> pushbutton_statem. % The registered server name -callback_mode() -> state_functions. %% API. This example uses a registered name name() %% and does not link to the caller. @@ -270,15 +275,14 @@ stop() -> terminate(_Reason, _State, _Data) -> void. code_change(_Vsn, State, Data, _Extra) -> - {callback_mode(),State,Data}. + {ok,State,Data}. init([]) -> - %% Set the callback mode and initial state + data. - %% Data is used only as a counter. 
+ %% Set the initial state + data. Data is used only as a counter. State = off, Data = 0, - {callback_mode(),State,Data}. - + {ok,State,Data}. +callback_mode() -> state_functions. -%%% State functions +%%% State function(s) off({call,From}, push, Data) -> %% Go to 'on', increment count and reply @@ -326,18 +330,13 @@ ok To compare styles, here follows the same example using <seealso marker="#type-callback_mode"><em>callback mode</em></seealso> <c>state_functions</c>, or rather the code to replace - from function <c>init/1</c> of the <c>pushbutton.erl</c> + after function <c>init/1</c> of the <c>pushbutton.erl</c> example file above: </p> <code type="erl"> -init([]) -> - %% Set the callback mode and initial state + data. - %% Data is used only as a counter. - State = off, Data = 0, - {handle_event_function,State,Data}. - +callback_mode() -> handle_event_function. -%%% Event handling +%%% State function(s) handle_event({call,From}, push, off, Data) -> %% Go to 'on', increment count and reply @@ -400,7 +399,7 @@ handle_event(_, _, State, Data) -> <item> <p> The <c>gen_statem</c> is globally registered in - <seealso marker="kernel:global"><c>kernel:global</c></seealso>. + <seealso marker="kernel:global"><c>global</c></seealso>. </p> </item> <tag><c>{via,RegMod,ViaName}</c></tag> @@ -413,7 +412,7 @@ handle_event(_, _, State, Data) -> <c>register_name/2</c>, <c>unregister_name/1</c>, <c>whereis_name/1</c>, and <c>send/2</c>, which are to behave like the corresponding functions in - <seealso marker="kernel:global"><c>kernel:global</c></seealso>. + <seealso marker="kernel:global"><c>global</c></seealso>. Thus, <c>{via,global,GlobalName}</c> is the same as <c>{global,GlobalName}</c>. </p> @@ -426,8 +425,8 @@ handle_event(_, _, State, Data) -> <desc> <p> Debug option that can be used when starting - a <c>gen_statem</c> server through, for example, - <seealso marker="#enter_loop/5"><c>enter_loop/5</c></seealso>. 
+ a <c>gen_statem</c> server through, + <seealso marker="#enter_loop/4"><c>enter_loop/4-6</c></seealso>. </p> <p> For every entry in <c><anno>Dbgs</anno></c>, @@ -525,12 +524,9 @@ handle_event(_, _, State, Data) -> <desc> <p> The <em>callback mode</em> is selected when starting the - <c>gen_statem</c> using the return value from - <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> - or when calling - <seealso marker="#enter_loop/5"><c>enter_loop/5,6,7</c></seealso>, - and with the return value from - <seealso marker="#Module:code_change/4"><c>Module:code_change/4</c></seealso>. + <c>gen_statem</c> and after code change + using the return value from + <seealso marker="#Module:callback_mode/0"><c>Module:callback_mode/0</c></seealso>. </p> <taglist> <tag><c>state_functions</c></tag> @@ -691,7 +687,7 @@ handle_event(_, _, State, Data) -> <seealso marker="#state_function">state function</seealso>, from <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> or by giving them to - <seealso marker="#enter_loop/6"><c>enter_loop/6,7</c></seealso>. + <seealso marker="#enter_loop/5"><c>enter_loop/5,6</c></seealso>. </p> <p> Actions are executed in the containing list order. @@ -923,7 +919,8 @@ handle_event(_, _, State, Data) -> </p> <note> <p> - To avoid getting a late reply in the caller's + For <c><anno>Timeout</anno> =/= infinity</c>, + to avoid getting a late reply in the caller's inbox, this function spawns a proxy process that does the call. A late reply gets delivered to the dead proxy process, hence gets discarded. 
This is @@ -958,35 +955,36 @@ handle_event(_, _, State, Data) -> </func> <func> - <name name="enter_loop" arity="5"/> + <name name="enter_loop" arity="4"/> <fsummary>Enter the <c>gen_statem</c> receive loop.</fsummary> <desc> <p> The same as - <seealso marker="#enter_loop/7"><c>enter_loop/7</c></seealso> - except that no + <seealso marker="#enter_loop/6"><c>enter_loop/6</c></seealso> + with <c>Actions = []</c> except that no <seealso marker="#type-server_name"><c>server_name()</c></seealso> - must have been registered. + must have been registered. This creates an anonymous server. </p> </desc> </func> <func> - <name name="enter_loop" arity="6"/> + <name name="enter_loop" arity="5"/> <fsummary>Enter the <c>gen_statem</c> receive loop.</fsummary> <desc> <p> If <c><anno>Server_or_Actions</anno></c> is a <c>list()</c>, the same as - <seealso marker="#enter_loop/7"><c>enter_loop/7</c></seealso> + <seealso marker="#enter_loop/6"><c>enter_loop/6</c></seealso> except that no <seealso marker="#type-server_name"><c>server_name()</c></seealso> must have been registered and <c>Actions = <anno>Server_or_Actions</anno></c>. + This creates an anonymous server. </p> <p> Otherwise the same as - <seealso marker="#enter_loop/7"><c>enter_loop/7</c></seealso> + <seealso marker="#enter_loop/6"><c>enter_loop/6</c></seealso> with <c>Server = <anno>Server_or_Actions</anno></c> and <c>Actions = []</c>. @@ -995,7 +993,7 @@ handle_event(_, _, State, Data) -> </func> <func> - <name name="enter_loop" arity="7"/> + <name name="enter_loop" arity="6"/> <fsummary>Enter the <c>gen_statem</c> receive loop.</fsummary> <desc> <p> @@ -1015,21 +1013,31 @@ handle_event(_, _, State, Data) -> the <c>gen_statem</c> behavior provides. 
</p> <p> - <c><anno>Module</anno></c>, <c><anno>Opts</anno></c>, and - <c><anno>Server</anno></c> have the same meanings - as when calling + <c><anno>Module</anno></c>, <c><anno>Opts</anno></c> + have the same meaning as when calling <seealso marker="#start_link/3"><c>start[_link]/3,4</c></seealso>. + </p> + <p> + If <c><anno>Server</anno></c> is <c>self()</c> an anonymous + server is created just as when using + <seealso marker="#start_link/3"><c>start[_link]/3</c></seealso>. + If <c><anno>Server</anno></c> is a + <seealso marker="#type-server_name"><c>server_name()</c></seealso> + a named server is created just as when using + <seealso marker="#start_link/4"><c>start[_link]/4</c></seealso>. However, the <seealso marker="#type-server_name"><c>server_name()</c></seealso> name must have been registered accordingly - <em>before</em> this function is called.</p> + <em>before</em> this function is called. + </p> <p> - <c><anno>CallbackMode</anno></c>, <c><anno>State</anno></c>, - <c><anno>Data</anno></c>, and <c><anno>Actions</anno></c> + <c><anno>State</anno></c>, <c><anno>Data</anno></c>, + and <c><anno>Actions</anno></c> have the same meanings as in the return value of <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso>. - Also, the callback module <c><anno>Module</anno></c> - does not need to export an <c>init/1</c> function. + Also, the callback module does not need to export a + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> + function. 
</p> <p> The function fails if the calling process was not started by a @@ -1253,6 +1261,48 @@ handle_event(_, _, State, Data) -> <funcs> <func> + <name>Module:callback_mode() -> CallbackMode</name> + <fsummary>Update the internal state during upgrade/downgrade.</fsummary> + <type> + <v> + CallbackMode = + <seealso marker="#type-callback_mode">callback_mode()</seealso> + </v> + </type> + <desc> + <p> + This function is called by a <c>gen_statem</c> + when it needs to find out the + <seealso marker="#type-callback_mode"><em>callback mode</em></seealso> + of the callback module. The value is cached by <c>gen_statem</c> + for efficiency reasons, so this function is only called + once after server start and after code change, + but before the first + <seealso marker="#state_function">state function</seealso> + is called. More occasions may be added in future versions + of <c>gen_statem</c>. + </p> + <p> + Server start happens either when + <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> + returns or when + <seealso marker="#enter_loop/4"><c>enter_loop/4-6</c></seealso> + is called. Code change happens when + <seealso marker="#Module:code_change/4"><c>Module:code_change/4</c></seealso> + returns. + </p> + <note> + <p> + If this function's body does not consist of solely one of two + possible + <seealso marker="#type-callback_mode">atoms</seealso> + the callback module is doing something strange. 
+ </p> + </note> + </desc> + </func> + + <func> <name>Module:code_change(OldVsn, OldState, OldData, Extra) -> Result </name> @@ -1262,11 +1312,7 @@ handle_event(_, _, State, Data) -> <v> Vsn = term()</v> <v>OldState = NewState = term()</v> <v>Extra = term()</v> - <v>Result = {NewCallbackMode,NewState,NewData} | Reason</v> - <v> - NewCallbackMode = - <seealso marker="#type-callback_mode">callback_mode()</seealso> - </v> + <v>Result = {ok,NewState,NewData} | Reason</v> <v> OldState = NewState = <seealso marker="#type-state">state()</seealso> @@ -1295,21 +1341,6 @@ handle_event(_, _, State, Data) -> <c>Module</c>. If no such attribute is defined, the version is the checksum of the Beam file. </p> - <note> - <p> - If you would dare to change - <seealso marker="#type-callback_mode"><em>callback mode</em></seealso> - during release upgrade/downgrade, the upgrade is no problem, - as the new code surely knows what <em>callback mode</em> - it needs. However, for a downgrade this function must - know from argument <c>Extra</c> that comes from the - <seealso marker="sasl:appup"><c>sasl:appup</c></seealso> - file what <em>callback mode</em> the old code did use. - It can also be possible to figure this out - from argument <c>{down,Vsn}</c>, as <c>Vsn</c> - in effect defines the old callback module version. - </p> - </note> <p> <c>OldState</c> and <c>OldData</c> is the internal state of the <c>gen_statem</c>. @@ -1321,31 +1352,32 @@ handle_event(_, _, State, Data) -> <p> If successful, the function must return the updated internal state in an - <c>{NewCallbackMode,NewState,NewData}</c> tuple. + <c>{ok,NewState,NewData}</c> tuple. </p> <p> - If the function returns <c>Reason</c>, the ongoing - upgrade fails and rolls back to the old release.</p> - <p> - This function can use - <seealso marker="erts:erlang#throw/1"><c>erlang:throw/1</c></seealso> - to return <c>Result</c> or <c>Reason</c>. 
 + If the function returns a failure <c>Reason</c>, the ongoing + upgrade fails and rolls back to the old release. + Note that <c>Reason</c> cannot be an <c>{ok,_,_}</c> tuple + since that will be regarded as a + <c>{ok,NewState,NewData}</c> tuple, + and that a tuple matching <c>{ok,_}</c> + is also an invalid failure <c>Reason</c>. + It is recommended to use an atom as <c>Reason</c> since + it will be wrapped in an <c>{error,Reason}</c> tuple. </p> </desc> </func> <func> <name>Module:init(Args) -> Result</name> - <fsummary>Initialize process and internal state.</fsummary> + <fsummary> + Optional function for initializing process and internal state. + </fsummary> <type> <v>Args = term()</v> - <v>Result = {CallbackMode,State,Data}</v> - <v> | {CallbackMode,State,Data,Actions}</v> + <v>Result = {ok,State,Data}</v> + <v> | {ok,State,Data,Actions}</v> <v> | {stop,Reason} | ignore</v> - <v> - CallbackMode = - <seealso marker="#type-callback_mode">callback_mode()</seealso> - </v> <v>State = <seealso marker="#type-state">state()</seealso></v> <v> Data = <seealso marker="#type-data">data()</seealso> @@ -1364,7 +1396,7 @@ handle_event(_, _, State, Data) -> <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso> or <seealso marker="#start/3"><c>start/3,4</c></seealso>, - this function is called by the new process to initialize + this optional function is called by the new process to initialize the implementation state and server data. </p> <p> @@ -1373,11 +1405,8 @@ handle_event(_, _, State, Data) -> </p> <p> If the initialization is successful, the function is to - return <c>{CallbackMode,State,Data}</c> or - <c>{CallbackMode,State,Data,Actions}</c>. - <c>CallbackMode</c> selects the - <seealso marker="#type-callback_mode"><em>callback mode</em></seealso> - of the <c>gen_statem</c>. + return <c>{ok,State,Data}</c> or + <c>{ok,State,Data,Actions}</c>. 
<c>State</c> is the initial <seealso marker="#type-state"><c>state()</c></seealso> and <c>Data</c> the initial server @@ -1395,11 +1424,16 @@ handle_event(_, _, State, Data) -> or <c>ignore</c>; see <seealso marker="#start_link/3"><c>start_link/3,4</c></seealso>. </p> - <p> - This function can use - <seealso marker="erts:erlang#throw/1"><c>erlang:throw/1</c></seealso> - to return <c>Result</c>. - </p> + <note> + <p> + This callback is optional, so a callback module does not need + to export it, but most do. If this function is not exported, + the <c>gen_statem</c> should be started through + <seealso marker="proc_lib"><c>proc_lib</c></seealso> + and + <seealso marker="#enter_loop/4"><c>enter_loop/4-6</c></seealso>. + </p> + </note> </desc> </func> @@ -1430,10 +1464,14 @@ handle_event(_, _, State, Data) -> This callback is optional, so a callback module does not need to export it. The <c>gen_statem</c> module provides a default implementation of this function that returns - <c>{State,Data}</c>. If this callback fails, the default - function returns <c>{State,Info}</c>, - where <c>Info</c> informs of the crash but no details, - to hide possibly sensitive data. + <c>{State,Data}</c>. + </p> + <p> + If this callback is exported but fails, + to hide possibly sensitive data, + the default function will instead return <c>{State,Info}</c>, + where <c>Info</c> says nothing but the fact that + <c>format_status/2</c> has crashed. </p> </note> <p>This function is called by a <c>gen_statem</c> process when @@ -1494,11 +1532,6 @@ handle_event(_, _, State, Data) -> printed in log files. Another use is to hide sensitive data from being written to the error log. </p> - <p> - This function can use - <seealso marker="erts:erlang#throw/1"><c>erlang:throw/1</c></seealso> - to return <c>Status</c>. - </p> </desc> </func> @@ -1573,9 +1606,12 @@ handle_event(_, _, State, Data) -> see <seealso marker="#type-action"><c>action()</c></seealso>. 
 </p> <p> - These functions can use - <seealso marker="erts:erlang#throw/1"><c>erlang:throw/1</c></seealso>, - to return the result. + Note that you can use + <seealso marker="erts:erlang#throw/1"><c>throw</c></seealso> + to return the result, which can be useful, + for example, to bail out with <c>throw(keep_state_and_data)</c> + from deep within complex code that is in no position to + return <c>{next_state,State,Data}</c>. </p> </desc> </func> @@ -1648,11 +1684,6 @@ handle_event(_, _, State, Data) -> and an error report is issued using <seealso marker="kernel:error_logger#format/2"><c>error_logger:format/2</c></seealso>. </p> - <p> - This function can use - <seealso marker="erts:erlang#throw/1"><c>erlang:throw/1</c></seealso> - to return <c>Ignored</c>, which is ignored anyway. - </p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/introduction.xml b/lib/stdlib/doc/src/introduction.xml new file mode 100644 index 0000000000..5bf545c65f --- /dev/null +++ b/lib/stdlib/doc/src/introduction.xml @@ -0,0 +1,72 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>1999</year> + <year>2013</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + </legalnotice> + + <title>Introduction</title> + <prepared></prepared> + <responsible></responsible> + <docno></docno> + <approved></approved> + <checked></checked> + <date>2016-03-04</date> + <rev>PA1</rev> + <file>introduction.xml</file> + </header> + + <section> + <title>Scope</title> + <p>The Standard Erlang Libraries application, <em>STDLIB</em>, is mandatory + in the sense that the minimal system based on Erlang/OTP consists of + <em>STDLIB</em> and <em>Kernel</em>.</p> + + <p><em>STDLIB</em> contains the following functional areas:</p> + + <list type="bulleted"> + <item>Erlang shell</item> + <item>Command interface</item> + <item>Query interface</item> + <item>Interface to standard Erlang I/O servers</item> + <item>Interface to the Erlang built-in term storage BIFs</item> + <item>Regular expression matching functions for strings and binaries</item> + <item>Finite state machine</item> + <item>Event handling</item> + <item>Functions for the server of a client-server relation</item> + <item>Function to control applications in a distributed manner</item> + <item>Start and control of slave nodes</item> + <item>Operations on finite sets and relations represented as sets</item> + <item>Library for handling binary data</item> + <item>Disk-based term storage</item> + <item>List processing</item> + <item>Maps processing</item> + </list> + </section> + + <section> + <title>Prerequisites</title> + <p>It is assumed that the reader is familiar with the Erlang programming + language.</p> + </section> +</chapter> + + diff --git a/lib/stdlib/doc/src/io.xml b/lib/stdlib/doc/src/io.xml index 9ae50ed90c..11a64c7f8a 100644 --- a/lib/stdlib/doc/src/io.xml +++ b/lib/stdlib/doc/src/io.xml @@ -29,48 +29,50 @@ <rev></rev> </header> <module>io</module> - <modulesummary>Standard I/O Server Interface Functions</modulesummary> + <modulesummary>Standard I/O server interface functions.</modulesummary> <description> <p>This module provides an interface to standard Erlang I/O servers. 
The output functions all return <c>ok</c> if they are successful, or exit if they are not.</p> - <p>In the following description, all functions have an optional + + <p>All functions in this module have an optional parameter <c>IoDevice</c>. If included, it must be the pid of a - process which handles the IO protocols. Normally, it is the + process that handles the I/O protocols. Normally, it is the <c>IoDevice</c> returned by - <seealso marker="kernel:file#open/2">file:open/2</seealso>.</p> - <p>For a description of the IO protocols refer to the <seealso marker="io_protocol">STDLIB User's Guide</seealso>.</p> - <warning> - - <p>As of R13A, data supplied to the <seealso - marker="#put_chars/2">put_chars</seealso> function should be in the - <seealso marker="unicode#type-chardata"><c>unicode:chardata()</c></seealso> format. This means that programs - supplying binaries to this function need to convert them to UTF-8 - before trying to output the data on an IO device.</p> - - <p>If an IO device is set in binary mode, the functions <seealso - marker="#get_chars/3">get_chars</seealso> and <seealso - marker="#get_line/2">get_line</seealso> may return binaries - instead of lists. The binaries will, as of R13A, be encoded in - UTF-8.</p> + <seealso marker="kernel:file#open/2"><c>file:open/2</c></seealso>.</p> - <p>To work with binaries in ISO-latin-1 encoding, use the <seealso - marker="kernel:file">file</seealso> module instead.</p> - - <p>For conversion functions between character encodings, see the <seealso - marker="stdlib:unicode">unicode</seealso> module.</p> + <p>For a description of the I/O protocols, see section + <seealso marker="io_protocol">The Erlang I/O Protocol</seealso> + in the User's Guide.</p> + <warning> + <p>As from Erlang/OTP R13A, data supplied to function + <seealso marker="#put_chars/2"><c>put_chars/2</c></seealso> + is to be in the <seealso marker="unicode#type-chardata"> + <c>unicode:chardata()</c></seealso> format. 
This means that programs + supplying binaries to this function must convert them to UTF-8 + before trying to output the data on an I/O device.</p> + <p>If an I/O device is set in binary mode, functions + <seealso marker="#get_chars/2"><c>get_chars/2,3</c></seealso> and + <seealso marker="#get_line/1"><c>get_line/1,2</c></seealso> + can return binaries instead of lists. + The binaries are, as from Erlang/OTP R13A, + encoded in UTF-8.</p> + <p>To work with binaries in ISO Latin-1 encoding, use the + <seealso marker="kernel:file"><c>file</c></seealso> module instead.</p> + <p>For conversion functions between character encodings, see the + <seealso marker="stdlib:unicode"><c>unicode</c></seealso> module.</p> </warning> - </description> <datatypes> <datatype> <name name="device"/> <desc> - <p>An IO device. Either <c>standard_io</c>, <c>standard_error</c>, a - registered name, or a pid handling IO protocols (returned from - <seealso marker="kernel:file#open/2">file:open/2</seealso>).</p> + <p>An I/O device, either <c>standard_io</c>, <c>standard_error</c>, a + registered name, or a pid handling I/O protocols (returned from + <seealso marker="kernel:file#open/2"><c>file:open/2</c></seealso>). + </p> </desc> </datatype> <datatype> @@ -96,7 +98,7 @@ </datatype> <datatype> <name name="server_no_data"/> - <desc><p>What the I/O-server sends when there is no data.</p></desc> + <desc><p>What the I/O server sends when there is no data.</p></desc> </datatype> </datatypes> @@ -104,329 +106,93 @@ <func> <name name="columns" arity="0"/> <name name="columns" arity="1"/> - <fsummary>Get the number of columns of an IO device</fsummary> - <desc> - <p>Retrieves the number of columns of the - <c><anno>IoDevice</anno></c> (i.e. the width of a terminal). 
The function - only succeeds for terminal devices, for all other IO devices - the function returns <c>{error, enotsup}</c></p> - </desc> - </func> - <func> - <name name="put_chars" arity="1"/> - <name name="put_chars" arity="2"/> - <fsummary>Write a list of characters</fsummary> - <desc> - <p>Writes the characters of <c><anno>CharData</anno></c> to the I/O server - (<c><anno>IoDevice</anno></c>).</p> - </desc> - </func> - <func> - <name name="nl" arity="0"/> - <name name="nl" arity="1"/> - <fsummary>Write a newline</fsummary> - <desc> - <p>Writes new line to the standard output (<c><anno>IoDevice</anno></c>).</p> - </desc> - </func> - <func> - <name name="get_chars" arity="2"/> - <name name="get_chars" arity="3"/> - <fsummary>Read a specified number of characters</fsummary> - <type name="server_no_data"/> - <desc> - <p>Reads <c><anno>Count</anno></c> characters from standard input - (<c><anno>IoDevice</anno></c>), prompting it with <c><anno>Prompt</anno></c>. It - returns:</p> - <taglist> - <tag><c><anno>Data</anno></c></tag> - <item> - <p>The input characters. If the IO device supports Unicode, - the data may represent codepoints larger than 255 (the - latin1 range). If the I/O server is set to deliver - binaries, they will be encoded in UTF-8 (regardless of if - the IO device actually supports Unicode or not).</p> - </item> - <tag><c>eof</c></tag> - <item> - <p>End of file was encountered.</p> - </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="get_line" arity="1"/> - <name name="get_line" arity="2"/> - <fsummary>Read a line</fsummary> - <type name="server_no_data"/> - <desc> - <p>Reads a line from the standard input (<c><anno>IoDevice</anno></c>), - prompting it with <c><anno>Prompt</anno></c>. 
It returns:</p> - <taglist> - <tag><c><anno>Data</anno></c></tag> - <item> - <p>The characters in the line terminated by a LF (or end of - file). If the IO device supports Unicode, - the data may represent codepoints larger than 255 (the - latin1 range). If the I/O server is set to deliver - binaries, they will be encoded in UTF-8 (regardless of if - the IO device actually supports Unicode or not).</p> - </item> - <tag><c>eof</c></tag> - <item> - <p>End of file was encountered.</p> - </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="getopts" arity="0"/> - <name name="getopts" arity="1"/> - <fsummary>Get the supported options and values from an I/O-server</fsummary> + <fsummary>Get the number of columns of an I/O device.</fsummary> <desc> - <p>This function requests all available options and their current values for a specific IO device. Example:</p> -<pre> -1> <input>{ok,F} = file:open("/dev/null",[read]).</input> -{ok,<0.42.0>} -2> <input>io:getopts(F).</input> -[{binary,false},{encoding,latin1}]</pre> - <p>Here the file I/O-server returns all available options for a file, - which are the expected ones, <c>encoding</c> and <c>binary</c>. The standard shell however has some more options:</p> -<pre> -3> io:getopts(). -[{expand_fun,#Fun<group.0.120017273>}, - {echo,true}, - {binary,false}, - {encoding,unicode}]</pre> - <p>This example is, as can be seen, run in an environment where the terminal supports Unicode input and output.</p> + <p>Retrieves the number of columns of the + <c><anno>IoDevice</anno></c> (that is, the width of a terminal). 
+ The function succeeds for terminal devices and returns + <c>{error, enotsup}</c> for all other I/O devices.</p> </desc> </func> - <func> - <name name="printable_range" arity="0"/> - <fsummary>Get user requested printable character range</fsummary> - <desc> - <p>Return the user requested range of printable Unicode characters.</p> - <p>The user can request a range of characters that are to be considered printable in heuristic detection of strings by the shell and by the formatting functions. This is done by supplying <c>+pc <range></c> when starting Erlang.</p> - <p>Currently the only valid values for <c><range></c> are <c>latin1</c> and <c>unicode</c>. <c>latin1</c> means that only code points below 256 (with the exception of control characters etc) will be considered printable. <c>unicode</c> means that all printable characters in all unicode character ranges are considered printable by the io functions.</p> - <p>By default, Erlang is started so that only the <c>latin1</c> range of characters will indicate that a list of integers is a string.</p> - <p>The simplest way to utilize the setting is to call <seealso marker="io_lib#printable_list/1">io_lib:printable_list/1</seealso>, which will use the return value of this function to decide if a list is a string of printable characters or not.</p> - <note><p>In the future, this function may return more values and ranges. It is recommended to use the io_lib:printable_list/1 function to avoid compatibility problems.</p></note> - </desc> - </func> - <func> - <name name="setopts" arity="1"/> - <name name="setopts" arity="2"/> - <fsummary>Set options</fsummary> - <desc> - <p>Set options for the standard IO device (<c><anno>IoDevice</anno></c>).</p> - <p>Possible options and values vary depending on the actual - IO device. 
For a list of supported options and their current values - on a specific IO device, use the <seealso - marker="#getopts/1">getopts/1</seealso> function.</p> - - <p>The options and values supported by the current OTP IO devices are:</p> - <taglist> - <tag><c>binary, list or {binary, boolean()}</c></tag> - <item> - <p>If set in binary mode (<c>binary</c> or <c>{binary, true}</c>), the I/O server sends binary data (encoded in UTF-8) as answers to the <c>get_line</c>, <c>get_chars</c> and, if possible, <c>get_until</c> requests (see the I/O protocol description in <seealso marker="io_protocol">STDLIB User's Guide</seealso> for details). The immediate effect is that <c>get_chars/2,3</c> and <c>get_line/1,2</c> return UTF-8 binaries instead of lists of chars for the affected IO device.</p> - <p>By default, all IO devices in OTP are set in list mode, but the I/O functions can handle any of these modes and so should other, user written, modules behaving as clients to I/O-servers.</p> - <p>This option is supported by the standard shell (<c>group.erl</c>), the 'oldshell' (<c>user.erl</c>) and the file I/O servers.</p> - </item> - <tag><c>{echo, boolean()}</c></tag> - <item> - <p>Denotes if the terminal should echo input. Only supported for the standard shell I/O-server (<c>group.erl</c>)</p> - </item> - <tag><c>{expand_fun, expand_fun()}</c></tag> - <item> - <p>Provide a function for tab-completion (expansion) - like the Erlang shell. This function is called - when the user presses the TAB key. The expansion is - active when calling line-reading functions such as - <c>get_line/1,2</c>.</p> - <p>The function is called with the current line, upto - the cursor, as a reversed string. It should return a - three-tuple: <c>{yes|no, string(), [string(), ...]}</c>. The - first element gives a beep if <c>no</c>, otherwise the - expansion is silent, the second is a string that will be - entered at the cursor position, and the third is a list of - possible expansions. 
If this list is non-empty, the list - will be printed and the current input line will be written - once again.</p> - <p>Trivial example (beep on anything except empty line, which - is expanded to <c>"quit"</c>):</p> - <code type="none"> - fun("") -> {yes, "quit", []}; - (_) -> {no, "", ["quit"]} end</code> - <p>This option is supported by the standard shell only (<c>group.erl</c>).</p> - </item> - <tag><c>{encoding, latin1 | unicode}</c></tag> - <item> - <p>Specifies how characters are input or output from or to the actual IO device, implying that i.e. a terminal is set to handle Unicode input and output or a file is set to handle UTF-8 data encoding.</p> - <p>The option <em>does not</em> affect how data is returned from the I/O functions or how it is sent in the I/O-protocol, it only affects how the IO device is to handle Unicode characters towards the "physical" device.</p> - <p>The standard shell will be set for either Unicode or latin1 encoding when the system is started. The actual encoding is set with the help of the <c>LANG</c> or <c>LC_CTYPE</c> environment variables on Unix-like system or by other means on other systems. The bottom line is that the user can input Unicode characters and the IO device will be in <c>{encoding, unicode}</c> mode if the IO device supports it. The mode can be changed, if the assumption of the runtime system is wrong, by setting this option.</p> - <p>The IO device used when Erlang is started with the "-oldshell" or "-noshell" flags is by default set to latin1 encoding, meaning that any characters beyond codepoint 255 will be escaped and that input is expected to be plain 8-bit ISO-latin-1. If the encoding is changed to Unicode, input and output from the standard file descriptors will be in UTF-8 (regardless of operating system).</p> - <p>Files can also be set in <c>{encoding, unicode}</c>, meaning that data is written and read as UTF-8. 
More encodings are possible for files, see below.</p> - <p><c>{encoding, unicode | latin1}</c> is supported by both the standard shell (<c>group.erl</c> including <c>werl</c> on Windows®), the 'oldshell' (<c>user.erl</c>) and the file I/O servers.</p> - </item> - <tag><c>{encoding, utf8 | utf16 | utf32 | {utf16,big} | {utf16,little} | {utf32,big} | {utf32,little}}</c></tag> - <item> - <p>For disk files, the encoding can be set to various UTF variants. This will have the effect that data is expected to be read as the specified encoding from the file and the data will be written in the specified encoding to the disk file.</p> - <p><c>{encoding, utf8}</c> will have the same effect as <c>{encoding, unicode}</c> on files.</p> - <p>The extended encodings are only supported on disk files (opened by the <seealso marker="kernel:file#open/2">file:open/2</seealso> function)</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="write" arity="1"/> - <name name="write" arity="2"/> - <fsummary>Write a term</fsummary> - <desc> - <p>Writes the term <c><anno>Term</anno></c> to the standard output - (<c><anno>IoDevice</anno></c>).</p> - </desc> - </func> <func> - <name name="read" arity="1"/> - <name name="read" arity="2"/> - <fsummary>Read a term</fsummary> - <type name="server_no_data"/> - <desc> - <p>Reads a term <c><anno>Term</anno></c> from the standard input - (<c><anno>IoDevice</anno></c>), prompting it with <c><anno>Prompt</anno></c>. 
It - returns:</p> - <taglist> - <tag><c>{ok, <anno>Term</anno>}</c></tag> - <item> - <p>The parsing was successful.</p> - </item> - <tag><c>eof</c></tag> - <item> - <p>End of file was encountered.</p> - </item> - <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> - <item> - <p>The parsing failed.</p> - </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="read" arity="3"/> - <name name="read" arity="4"/> - <fsummary>Read a term</fsummary> - <type name="server_no_data"/> - <desc> - <p>Reads a term <c><anno>Term</anno></c> from <c><anno>IoDevice</anno></c>, prompting it - with <c><anno>Prompt</anno></c>. Reading starts at location - <c><anno>StartLocation</anno></c>. The argument - <c><anno>Options</anno></c> is passed on as the <c>Options</c> - argument of the <c>erl_scan:tokens/4</c> function. 
It returns:</p> - <taglist> - <tag><c>{ok, Term, <anno>EndLocation</anno>}</c></tag> - <item> - <p>The parsing was successful.</p> - </item> - <tag><c>{eof, <anno>EndLocation</anno>}</c></tag> - <item> - <p>End of file was encountered.</p> - </item> - <tag><c>{error, <anno>ErrorInfo</anno>, <anno>ErrorLocation</anno>}</c></tag> - <item> - <p>The parsing failed.</p> - </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> - </taglist> - </desc> - </func> - <func> - <name name="fwrite" arity="1"/> - <name name="fwrite" arity="2"/> - <name name="fwrite" arity="3"/> <name name="format" arity="1"/> <name name="format" arity="2"/> <name name="format" arity="3"/> - <fsummary>Write formatted output</fsummary> + <name name="fwrite" arity="1"/> + <name name="fwrite" arity="2"/> + <name name="fwrite" arity="3"/> + <fsummary>Write formatted output.</fsummary> <desc> - <p>Writes the items in <c><anno>Data</anno></c> (<c>[]</c>) on the standard - output (<c><anno>IoDevice</anno></c>) in accordance with <c><anno>Format</anno></c>. - <c><anno>Format</anno></c> contains plain characters which are copied to + <p>Writes the items in <c><anno>Data</anno></c> (<c>[]</c>) on the + standard output (<c><anno>IoDevice</anno></c>) in accordance with + <c><anno>Format</anno></c>. <c><anno>Format</anno></c> contains + plain characters that are copied to the output device, and control sequences for formatting, see - below. If <c><anno>Format</anno></c> is an atom or a binary, it is first - converted to a list with the aid of <c>atom_to_list/1</c> - or <c>binary_to_list/1</c>.</p> + below. If <c><anno>Format</anno></c> is an atom or a binary, it is + first converted to a list with the aid of <c>atom_to_list/1</c> or + <c>binary_to_list/1</c>. Example:</p> <pre> 1> <input>io:fwrite("Hello world!~n", []).</input> Hello world! 
ok</pre> - <p>The general format of a control sequence is <c>~F.P.PadModC</c>. - The character <c>C</c> determines the type of control sequence + <p>The general format of a control sequence is <c>~F.P.PadModC</c>.</p> + <p>Character <c>C</c> determines the type of control sequence to be used, <c>F</c> and <c>P</c> are optional numeric arguments. If <c>F</c>, <c>P</c>, or <c>Pad</c> is <c>*</c>, the next argument in <c>Data</c> is used as the numeric value of <c>F</c> or <c>P</c>.</p> - <p><c>F</c> is the <c>field width</c> of the printed argument. A - negative value means that the argument will be left justified - within the field, otherwise it will be right justified. If no - field width is specified, the required print width will be - used. If the field width specified is too small, then the - whole field will be filled with <c>*</c> characters.</p> - <p><c>P</c> is the <c>precision</c> of the printed argument. A - default value is used if no precision is specified. The - interpretation of precision depends on the control sequences. - Unless otherwise specified, the argument <c>within</c> is used - to determine print width.</p> - <p><c>Pad</c> is the padding character. This is the character - used to pad the printed representation of the argument so that - it conforms to the specified field width and precision. Only - one padding character can be specified and, whenever - applicable, it is used for both the field width and precision. - The default padding character is <c>' '</c> (space).</p> - <p><c>Mod</c> is the control sequence modifier. It is either a - single character (currently only <c>t</c>, for Unicode - translation, and <c>l</c>, for stopping <c>p</c> and - <c>P</c> from detecting printable characters, are supported) - that changes the interpretation of Data.</p> - <p>The following control sequences are available:</p> + <list type="bulleted"> + <item> + <p><c>F</c> is the <c>field width</c> of the printed argument. 
A + negative value means that the argument is left-justified + within the field, otherwise right-justified. If no + field width is specified, the required print width is + used. If the field width specified is too small, the + whole field is filled with <c>*</c> characters.</p> + </item> + <item> + <p><c>P</c> is the <c>precision</c> of the printed argument. A + default value is used if no precision is specified. The + interpretation of precision depends on the control sequences. + Unless otherwise specified, argument <c>within</c> is used + to determine print width.</p> + </item> + <item> + <p><c>Pad</c> is the padding character. This is the character + used to pad the printed representation of the argument so that + it conforms to the specified field width and precision. Only + one padding character can be specified and, whenever + applicable, it is used for both the field width and precision. + The default padding character is <c>' '</c> (space).</p> + </item> + <item> + <p><c>Mod</c> is the control sequence modifier. It is either a + single character (<c>t</c>, for Unicode + translation, and <c>l</c>, for stopping <c>p</c> and + <c>P</c> from detecting printable characters) + that changes the interpretation of <c>Data</c>.</p> + </item> + </list> + <p><em>Available control sequences:</em></p> <taglist> <tag><c>~</c></tag> <item> - <p>The character <c>~</c> is written.</p> + <p>Character <c>~</c> is written.</p> </item> <tag><c>c</c></tag> <item> - <p>The argument is a number that will be interpreted as an + <p>The argument is a number that is interpreted as an ASCII code. The precision is the number of times the - character is printed and it defaults to the field width, - which in turn defaults to 1. The following example - illustrates:</p> + character is printed and defaults to the field width, + which in turn defaults to 1. 
Example:</p> <pre> 1> <input>io:fwrite("|~10.5c|~-10.5c|~5c|~n", [$a, $b, $c]).</input> | aaaaa|bbbbb |ccccc| ok</pre> <p>If the Unicode translation modifier (<c>t</c>) is in effect, the integer argument can be any number representing a - valid Unicode codepoint, otherwise it should be an integer + valid Unicode codepoint, otherwise it is to be an integer less than or equal to 255, otherwise it is masked with 16#FF:</p> <pre> 2> <input>io:fwrite("~tc~n",[1024]).</input> @@ -435,29 +201,28 @@ ok 3> <input>io:fwrite("~c~n",[1024]).</input> ^@ ok</pre> - </item> <tag><c>f</c></tag> <item> - <p>The argument is a float which is written as + <p>The argument is a float that is written as <c>[-]ddd.ddd</c>, where the precision is the number of digits after the decimal point. The default precision is 6 - and it cannot be less than 1.</p> + and it cannot be < 1.</p> </item> <tag><c>e</c></tag> <item> - <p>The argument is a float which is written as + <p>The argument is a float that is written as <c>[-]d.ddde+-ddd</c>, where the precision is the number of digits written. The default precision is 6 and it - cannot be less than 2.</p> + cannot be < 2.</p> </item> <tag><c>g</c></tag> <item> - <p>The argument is a float which is written as <c>f</c>, if + <p>The argument is a float that is written as <c>f</c>, if it is >= 0.1 and < 10000.0. Otherwise, it is written in the <c>e</c> format. The precision is the number of - significant digits. It defaults to 6 and should not be - less than 2. If the absolute value of the float does not + significant digits. It defaults to 6 and is not to be + < 2. If the absolute value of the float does not allow it to be written in the <c>f</c> format with the desired number of significant digits, it is also written in the <c>e</c> format.</p> @@ -471,8 +236,9 @@ ok</pre> the argument is <c>unicode:chardata()</c>, meaning that binaries are in UTF-8. The characters are printed without quotes. 
The string is first truncated - by the given precision and then padded and justified - to the given field width. The default precision is the field width.</p> + by the specified precision and then padded and justified to the + specified field width. The default precision is the field width. + </p> <p>This format can be used for printing any object and truncating the output so it fits a specified field:</p> <pre> @@ -484,7 +250,8 @@ ok 3> <input>io:fwrite("|~-10.8s|~n", [io_lib:write({hey, hey, hey})]).</input> |{hey,hey | ok</pre> - <p>A list with integers larger than 255 is considered an error if the Unicode translation modifier is not given:</p> + <p>A list with integers > 255 is considered an error if the + Unicode translation modifier is not specified:</p> <pre> 4> <input>io:fwrite("~ts~n",[[1024]]).</input> \x{400} @@ -497,8 +264,8 @@ ok <item> <p>Writes data with the standard syntax. This is used to output Erlang terms. Atoms are printed within quotes if - they contain embedded non-printable characters, and - floats are printed accurately as the shortest, correctly + they contain embedded non-printable characters. + Floats are printed accurately as the shortest, correctly rounded string.</p> </item> <tag><c>p</c></tag> @@ -506,11 +273,11 @@ ok <p>Writes the data with standard syntax in the same way as <c>~w</c>, but breaks terms whose printed representation is longer than one line into many lines and indents each - line sensibly. Left justification is not supported. + line sensibly. Left-justification is not supported. It also tries to detect lists of printable characters and to output these as strings. The Unicode translation modifier is used for determining - what characters are printable. 
 For example:</p> + what characters are printable, for example:</p> <pre> 1> <input>T = [{attributes,[[{id,age,1.50000},{mode,explicit},</input> <input>{typename,"INTEGER"}], [{id,cho},{mode,explicit},{typename,'Cho'}]]},</input> @@ -531,12 +298,13 @@ ok {tag,{'PRIVATE',3}}, {mode,implicit}] ok</pre> - <p>The field width specifies the maximum line length. It - defaults to 80. The precision specifies the initial + <p>The field width specifies the maximum line length. + It defaults to 80. The precision specifies the initial indentation of the term. It defaults to the number of - characters printed on this line in the <c>same</c> call to - <c>io:fwrite</c> or <c>io:format</c>. For example, using - <c>T</c> above:</p> + characters printed on this line in the <em>same</em> call to + <seealso marker="#fwrite/1"><c>fwrite/1,2,3</c></seealso> or + <seealso marker="#format/1"><c>format/1,2,3</c></seealso>. + For example, using <c>T</c> above:</p> <pre> 4> <input>io:fwrite("Here T = ~62p~n", [T]).</input> Here T = [{attributes,[[{id,age,1.5}, @@ -549,8 +317,8 @@ Here T = [{attributes,[[{id,age,1.5}, {tag,{'PRIVATE',3}}, {mode,implicit}] ok</pre> - <p>When the modifier <c>l</c> is given no detection of - printable character lists will take place. 
For example:</p> + <p>When the modifier <c>l</c> is specified, no detection of + printable character lists takes place, for example:</p> <pre> 5> <input>S = [{a,"a"}, {b, "b"}].</input> 6> <input>io:fwrite("~15p~n", [S]).</input> @@ -561,9 +329,9 @@ ok [{a,[97]}, {b,[98]}] ok</pre> - <p>Binaries that look like UTF-8 encoded strings will be + <p>Binaries that look like UTF-8 encoded strings are output with the string syntax if the Unicode translation - modifier is given:</p> + modifier is specified:</p> <pre> 9> <input>io:fwrite("~p~n",[[1024]]).</input> [1024] @@ -578,7 +346,7 @@ ok</pre> <tag><c>W</c></tag> <item> <p>Writes data in the same way as <c>~w</c>, but takes an - extra argument which is the maximum depth to which terms + extra argument that is the maximum depth to which terms are printed. Anything below this depth is replaced with <c>...</c>. For example, using <c>T</c> above:</p> <pre> @@ -587,17 +355,17 @@ ok</pre> [{id,cho},{mode,...},{...}]]},{typename,'Person'}, {tag,{'PRIVATE',3}},{mode,implicit}] ok</pre> - <p>If the maximum depth has been reached, then it is - impossible to read in the resultant output. Also, the + <p>If the maximum depth is reached, it cannot + be read in the resultant output. Also, the <c>,...</c> form in a tuple denotes that there are more elements in the tuple but these are below the print depth.</p> </item> <tag><c>P</c></tag> <item> <p>Writes data in the same way as <c>~p</c>, but takes an - extra argument which is the maximum depth to which terms + extra argument that is the maximum depth to which terms are printed. Anything below this depth is replaced with - <c>...</c>. For example:</p> + <c>...</c>, for example:</p> <pre> 9> <input>io:fwrite("~62P~n", [T,9]).</input> [{attributes,[[{id,age,1.5},{mode,explicit},{typename,...}], @@ -609,9 +377,9 @@ ok</pre> </item> <tag><c>B</c></tag> <item> - <p>Writes an integer in base 2..36, the default base is + <p>Writes an integer in base 2-36, the default base is 10. 
A leading dash is printed for negative integers.</p> - <p>The precision field selects base. For example:</p> + <p>The precision field selects base, for example:</p> <pre> 1> <input>io:fwrite("~.16B~n", [31]).</input> 1F @@ -629,7 +397,7 @@ ok</pre> prefix to insert before the number, but after the leading dash, if any.</p> <p>The prefix can be a possibly deep list of characters or - an atom.</p> + an atom. Example:</p> <pre> 1> <input>io:fwrite("~X~n", [31,"10#"]).</input> 10#31 @@ -641,7 +409,7 @@ ok</pre> <tag><c>#</c></tag> <item> <p>Like <c>B</c>, but prints the number with an Erlang style - <c>#</c>-separated base prefix.</p> + <c>#</c>-separated base prefix. Example:</p> <pre> 1> <input>io:fwrite("~.10#~n", [31]).</input> 10#31 @@ -671,14 +439,14 @@ ok</pre> <p>Ignores the next term.</p> </item> </taglist> - <p>Returns:</p> + <p>The function returns:</p> <taglist> <tag><c>ok</c></tag> <item> <p>The formatting succeeded.</p> </item> </taglist> - <p>If an error occurs, there is no output. For example:</p> + <p>If an error occurs, there is no output. Example:</p> <pre> 1> <input>io:fwrite("~s ~w ~i ~w ~c ~n",['abc def', 'abc def', {foo, 1},{foo, 1}, 65]).</input> abc def 'abc def' {foo,1} A @@ -692,45 +460,57 @@ ok in function io:o_request/2</pre> <p>In this example, an attempt was made to output the single character 65 with the aid of the string formatting directive - "~s".</p> + <c>"~s"</c>.</p> </desc> </func> + <func> <name name="fread" arity="2"/> <name name="fread" arity="3"/> - <fsummary>Read formatted input</fsummary> + <fsummary>Read formatted input.</fsummary> <type name="server_no_data"/> <desc> - <p>Reads characters from the standard input (<c><anno>IoDevice</anno></c>), - prompting it with <c><anno>Prompt</anno></c>. Interprets the characters in - accordance with <c><anno>Format</anno></c>. 
<c><anno>Format</anno></c> contains control - sequences which directs the interpretation of the input.</p> - <p><c><anno>Format</anno></c> may contain:</p> + <p>Reads characters from the standard input + (<c><anno>IoDevice</anno></c>), prompting it with + <c><anno>Prompt</anno></c>. Interprets the characters in accordance + with <c><anno>Format</anno></c>. <c><anno>Format</anno></c> contains + control sequences that directs the interpretation of the input.</p> + <p><c><anno>Format</anno></c> can contain the following:</p> <list type="bulleted"> <item> - <p>White space characters (SPACE, TAB and NEWLINE) which - cause input to be read to the next non-white space - character.</p> + <p>Whitespace characters (<em>Space</em>, <em>Tab</em>, and + <em>Newline</em>) that cause input to be read to the next + non-whitespace character.</p> </item> <item> - <p>Ordinary characters which must match the next input + <p>Ordinary characters that must match the next input character.</p> </item> <item> - <p>Control sequences, which have the general format - <c>~*FMC</c>. The character <c>*</c> is an optional - return suppression character. It provides a method to - specify a field which is to be omitted. <c>F</c> is the - <c>field width</c> of the input field, <c>M</c> is an optional - translation modifier (of which <c>t</c> is the only currently - supported, meaning Unicode translation) and <c>C</c> - determines the type of control sequence.</p> - - <p>Unless otherwise specified, leading white-space is + <c>~*FMC</c>, where:</p> + <list type="bulleted"> + <item> + <p>Character <c>*</c> is an optional return suppression + character. 
It provides a method to specify a field that + is to be omitted.</p> + </item> + <item> + <p><c>F</c> is the <c>field width</c> of the input field.</p> + </item> + <item> + <p><c>M</c> is an optional translation modifier (of which + <c>t</c> is the only supported, meaning Unicode + translation).</p> + </item> + <item> + <p><c>C</c> determines the type of control sequence.</p> + </item> + </list> + <p>Unless otherwise specified, leading whitespace is ignored for all control sequences. An input field cannot - be more than one line wide. The following control - sequences are available:</p> + be more than one line wide.</p> + <p><em>Available control sequences:</em></p> <taglist> <tag><c>~</c></tag> <item> @@ -742,22 +522,22 @@ ok </item> <tag><c>u</c></tag> <item> - <p>An unsigned integer in base 2..36 is expected. The + <p>An unsigned integer in base 2-36 is expected. The field width parameter is used to specify base. Leading - white-space characters are not skipped.</p> + whitespace characters are not skipped.</p> </item> <tag><c>-</c></tag> <item> <p>An optional sign character is expected. A sign - character <c>-</c> gives the return value <c>-1</c>. Sign + character <c>-</c> gives return value <c>-1</c>. Sign character <c>+</c> or none gives <c>1</c>. The field width - parameter is ignored. Leading white-space characters + parameter is ignored. Leading whitespace characters are not skipped.</p> </item> <tag><c>#</c></tag> <item> - <p>An integer in base 2..36 with Erlang-style base - prefix (for example <c>"16#ffff"</c>) is expected.</p> + <p>An integer in base 2-36 with Erlang-style base + prefix (for example, <c>"16#ffff"</c>) is expected.</p> </item> <tag><c>f</c></tag> <item> @@ -766,18 +546,15 @@ ok </item> <tag><c>s</c></tag> <item> - <p>A string of non-white-space characters is read. If a + <p>A string of non-whitespace characters is read. 
If a field width has been specified, this number of - characters are read and all trailing white-space + characters are read and all trailing whitespace characters are stripped. An Erlang string (list of characters) is returned.</p> - - <p>If Unicode translation is in effect (<c>~ts</c>), - characters larger than 255 are accepted, otherwise - not. With the translation modifier, the list - returned may as a consequence also contain - integers larger than 255:</p> - + <p>If Unicode translation is in effect (<c>~ts</c>), + characters > 255 are accepted, otherwise + not. With the translation modifier, the returned + list can as a consequence also contain integers > 255:</p> <pre> 1> <input>io:fread("Prompt> ","~s").</input> Prompt> <input><Characters beyond latin1 range not printable in this medium></input> @@ -785,22 +562,23 @@ Prompt> <input><Characters beyond latin1 range not printable in this medium&g 2> <input>io:fread("Prompt> ","~ts").</input> Prompt> <input><Characters beyond latin1 range not printable in this medium></input> {ok,[[1091,1085,1080,1094,1086,1076,1077]]}</pre> - </item> <tag><c>a</c></tag> <item> <p>Similar to <c>s</c>, but the resulting string is converted into an atom.</p> - <p>The Unicode translation modifier is not allowed (atoms can not contain characters beyond the latin1 range).</p> + <p>The Unicode translation modifier is not allowed (atoms + cannot contain characters beyond the <c>latin1</c> range).</p> </item> <tag><c>c</c></tag> <item> <p>The number of characters equal to the field width are read (default is 1) and returned as an Erlang string. - However, leading and trailing white-space characters + However, leading and trailing whitespace characters are not omitted as they are with <c>s</c>. 
All characters are returned.</p> - <p>The Unicode translation modifier works as with <c>s</c>:</p> + <p>The Unicode translation modifier works as with <c>s</c>: + </p> <pre> 1> <input>io:fread("Prompt> ","~c").</input> Prompt> <input><Character beyond latin1 range not printable in this medium></input> @@ -808,21 +586,20 @@ Prompt> <input><Character beyond latin1 range not printable in this medium> 2> <input>io:fread("Prompt> ","~tc").</input> Prompt> <input><Character beyond latin1 range not printable in this medium></input> {ok,[[1091]]}</pre> - </item> <tag><c>l</c></tag> <item> - <p>Returns the number of characters which have been - scanned up to that point, including white-space + <p>Returns the number of characters that have been + scanned up to that point, including whitespace characters.</p> </item> </taglist> - <p>It returns:</p> + <p>The function returns:</p> <taglist> <tag><c>{ok, <anno>Terms</anno>}</c></tag> <item> - <p>The read was successful and <c><anno>Terms</anno></c> is the list - of successfully matched and read items.</p> + <p>The read was successful and <c><anno>Terms</anno></c> is + the list of successfully matched and read items.</p> </item> <tag><c>eof</c></tag> <item> @@ -835,13 +612,14 @@ Prompt> <input><Character beyond latin1 range not printable in this medium> </item> <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> <item> - <p>The read operation failed and the parameter - <c><anno>ErrorDescription</anno></c> gives a hint about the error.</p> + <p>The read operation failed and parameter + <c><anno>ErrorDescription</anno></c> gives a hint about + the error.</p> </item> </taglist> </item> </list> - <p>Examples:</p> + <p><em>Examples:</em></p> <pre> 20> <input>io:fread('enter>', "~f~f~f").</input> enter><input>1.9 35.5e3 15.0</input> @@ -854,104 +632,127 @@ enter><input>:</input> <input>alan</input> <input>:</input> <input>joe</in {ok, ["alan", " joe "]}</pre> </desc> </func> + <func> - <name name="rows" arity="0"/> - <name 
name="rows" arity="1"/> - <fsummary>Get the number of rows of an IO device</fsummary> + <name name="get_chars" arity="2"/> + <name name="get_chars" arity="3"/> + <fsummary>Read a specified number of characters.</fsummary> + <type name="server_no_data"/> <desc> - <p>Retrieves the number of rows of the - <c><anno>IoDevice</anno></c> (i.e. the height of a terminal). The function - only succeeds for terminal devices, for all other IO devices - the function returns <c>{error, enotsup}</c></p> + <p>Reads <c><anno>Count</anno></c> characters from standard input + (<c><anno>IoDevice</anno></c>), prompting it with + <c><anno>Prompt</anno></c>.</p> + <p>The function returns:</p> + <taglist> + <tag><c><anno>Data</anno></c></tag> + <item> + <p>The input characters. If the I/O device supports Unicode, + the data can represent codepoints > 255 (the + <c>latin1</c> range). If the I/O server is set to deliver + binaries, they are encoded in UTF-8 (regardless of whether + the I/O device supports Unicode).</p> + </item> + <tag><c>eof</c></tag> + <item> + <p>End of file was encountered.</p> + </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> + </taglist> </desc> </func> + <func> - <name name="scan_erl_exprs" arity="1"/> - <name name="scan_erl_exprs" arity="2"/> - <name name="scan_erl_exprs" arity="3"/> - <name name="scan_erl_exprs" arity="4"/> - <fsummary>Read and tokenize Erlang expressions</fsummary> + <name name="get_line" arity="1"/> + <name name="get_line" arity="2"/> + <fsummary>Read a line.</fsummary> <type name="server_no_data"/> <desc> - <p>Reads data from the standard input (<c>IoDevice</c>), - prompting it with <c>Prompt</c>. Reading starts at location - <c>StartLocation</c> (<c>1</c>). The argument <c><anno>Options</anno></c> - is passed on as the <c>Options</c> argument of the - <c>erl_scan:tokens/4</c> function. 
The data is tokenized as if - it were a - sequence of Erlang expressions until a final dot (<c>.</c>) is - reached. This token is also returned. It returns:</p> + <p>Reads a line from the standard input (<c><anno>IoDevice</anno></c>), + prompting it with <c><anno>Prompt</anno></c>.</p> + <p>The function returns:</p> <taglist> - <tag><c>{ok, Tokens, EndLocation}</c></tag> + <tag><c><anno>Data</anno></c></tag> <item> - <p>The tokenization succeeded.</p> - </item> - <tag><c>{eof, EndLocation}</c></tag> - <item> - <p>End of file was encountered by the tokenizer.</p> + <p>The characters in the line terminated by a line feed (or end of + file). If the I/O device supports Unicode, + the data can represent codepoints > 255 (the + <c>latin1</c> range). If the I/O server is set to deliver + binaries, they are encoded in UTF-8 (regardless of if + the I/O device supports Unicode).</p> </item> <tag><c>eof</c></tag> <item> - <p>End of file was encountered by the I/O-server.</p> + <p>End of file was encountered.</p> </item> - <tag><c>{error, ErrorInfo, ErrorLocation}</c></tag> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> <item> - <p>An error occurred while tokenizing.</p> - </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> </item> </taglist> - <p>Example:</p> - <pre> -23> <input>io:scan_erl_exprs('enter>').</input> -enter><input>abc(), "hey".</input> -{ok,[{atom,1,abc},{'(',1},{')',1},{',',1},{string,1,"hey"},{dot,1}],2} -24> <input>io:scan_erl_exprs('enter>').</input> -enter><input>1.0er.</input> -{error,{1,erl_scan,{illegal,float}},2}</pre> </desc> </func> + <func> - <name name="scan_erl_form" arity="1"/> - <name name="scan_erl_form" arity="2"/> - <name name="scan_erl_form" arity="3"/> - <name name="scan_erl_form" 
arity="4"/> - <fsummary>Read and tokenize an Erlang form</fsummary> - <type name="server_no_data"/> + <name name="getopts" arity="0"/> + <name name="getopts" arity="1"/> + <fsummary>Get the supported options and values from an I/O server. + </fsummary> <desc> - <p>Reads data from the standard input (<c><anno>IoDevice</anno></c>), - prompting it with <c><anno>Prompt</anno></c>. Starts reading - at location <c><anno>StartLocation</anno></c> (<c>1</c>). The - argument <c><anno>Options</anno></c> is passed on as the - <c>Options</c> argument of the <c>erl_scan:tokens/4</c> - function. The data is tokenized as if it were an - Erlang form - one of the valid Erlang expressions in an - Erlang source file - until a final dot (<c>.</c>) is reached. - This last token is also returned. The return values are the - same as for <c>scan_erl_exprs/1,2,3</c> above.</p> + <p>Requests all available options and their current + values for a specific I/O device, for example:</p> +<pre> +1> <input>{ok,F} = file:open("/dev/null",[read]).</input> +{ok,<0.42.0>} +2> <input>io:getopts(F).</input> +[{binary,false},{encoding,latin1}]</pre> + <p>Here the file I/O server returns all available options for a file, + which are the expected ones, <c>encoding</c> and <c>binary</c>. + However, the standard shell has some more options:</p> +<pre> +3> io:getopts(). 
+[{expand_fun,#Fun<group.0.120017273>}, + {echo,true}, + {binary,false}, + {encoding,unicode}]</pre> + <p>This example is, as can be seen, run in an environment where the + terminal supports Unicode input and output.</p> </desc> </func> + + <func> + <name name="nl" arity="0"/> + <name name="nl" arity="1"/> + <fsummary>Write a newline.</fsummary> + <desc> + <p>Writes new line to the standard output + (<c><anno>IoDevice</anno></c>).</p> + </desc> + </func> + <func> <name name="parse_erl_exprs" arity="1"/> <name name="parse_erl_exprs" arity="2"/> <name name="parse_erl_exprs" arity="3"/> <name name="parse_erl_exprs" arity="4"/> - <fsummary>Read, tokenize and parse Erlang expressions</fsummary> + <fsummary>Read, tokenize, and parse Erlang expressions.</fsummary> <type name="parse_ret"/> <type name="server_no_data"/> <desc> <p>Reads data from the standard input (<c><anno>IoDevice</anno></c>), prompting it with <c><anno>Prompt</anno></c>. Starts reading at location - <c><anno>StartLocation</anno></c> (<c>1</c>). The argument - <c><anno>Options</anno></c> is passed on as the - <c>Options</c> argument of the <c>erl_scan:tokens/4</c> - function. The data is tokenized and parsed as if it were a - sequence of Erlang expressions until a final dot (<c>.</c>) is reached. - It returns:</p> + <c><anno>StartLocation</anno></c> (<c>1</c>). Argument + <c><anno>Options</anno></c> is passed on as argument + <c>Options</c> of function <seealso marker="erl_scan#tokens/4"> + <c>erl_scan:tokens/4</c></seealso>. 
The data is tokenized and parsed + as if it was a sequence of Erlang expressions until a final dot + (<c>.</c>) is reached.</p> + <p>The function returns:</p> <taglist> <tag><c>{ok, ExprList, EndLocation}</c></tag> <item> @@ -963,17 +764,17 @@ enter><input>1.0er.</input> </item> <tag><c>eof</c></tag> <item> - <p>End of file was encountered by the I/O-server.</p> + <p>End of file was encountered by the I/O server.</p> </item> <tag><c>{error, ErrorInfo, ErrorLocation}</c></tag> <item> <p>An error occurred while tokenizing or parsing.</p> </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> </taglist> <p>Example:</p> <pre> @@ -985,24 +786,25 @@ enter><input>abc("hey".</input> {error,{1,erl_parse,["syntax error before: ",["'.'"]]},2}</pre> </desc> </func> + <func> <name name="parse_erl_form" arity="1"/> <name name="parse_erl_form" arity="2"/> <name name="parse_erl_form" arity="3"/> <name name="parse_erl_form" arity="4"/> - <fsummary>Read, tokenize and parse an Erlang form</fsummary> + <fsummary>Read, tokenize, and parse an Erlang form.</fsummary> <type name="parse_form_ret"/> <type name="server_no_data"/> <desc> <p>Reads data from the standard input (<c><anno>IoDevice</anno></c>), prompting it with <c><anno>Prompt</anno></c>. Starts reading at - location <c><anno>StartLocation</anno></c> (<c>1</c>). The argument - <c><anno>Options</anno></c> is passed on as the - <c>Options</c> argument of the <c>erl_scan:tokens/4</c> - function. The data is tokenized and parsed as if - it were an Erlang form - one of the valid Erlang expressions - in an Erlang source file - until a final dot (<c>.</c>) is reached. 
It - returns:</p> + location <c><anno>StartLocation</anno></c> (<c>1</c>). Argument + <c><anno>Options</anno></c> is passed on as argument + <c>Options</c> of function <seealso marker="erl_scan#tokens/4"> + <c>erl_scan:tokens/4</c></seealso>. The data is tokenized and parsed + as if it was an Erlang form (one of the valid Erlang expressions + in an Erlang source file) until a final dot (<c>.</c>) is reached.</p> + <p>The function returns:</p> <taglist> <tag><c>{ok, AbsForm, EndLocation}</c></tag> <item> @@ -1014,32 +816,353 @@ enter><input>abc("hey".</input> </item> <tag><c>eof</c></tag> <item> - <p>End of file was encountered by the I/O-server.</p> + <p>End of file was encountered by the I/O server.</p> </item> <tag><c>{error, ErrorInfo, ErrorLocation}</c></tag> <item> <p>An error occurred while tokenizing or parsing.</p> </item> - <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> - <item> - <p>Other (rare) error condition, for instance <c>{error, estale}</c> - if reading from an NFS file system.</p> - </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="printable_range" arity="0"/> + <fsummary>Get user-requested printable character range.</fsummary> + <desc> + <p>Returns the user-requested range of printable Unicode characters.</p> + <p>The user can request a range of characters that are to be considered + printable in heuristic detection of strings by the shell and by the + formatting functions. This is done by supplying + <c>+pc <range></c> when starting Erlang.</p> + <p>The only valid values for <c><range></c> are + <c>latin1</c> and <c>unicode</c>. <c>latin1</c> means that only code + points < 256 (except control characters, and so on) + are considered printable. 
<c>unicode</c> means that all printable + characters in all Unicode character ranges are considered printable + by the I/O functions.</p> + <p>By default, Erlang is started so that only the <c>latin1</c> range + of characters indicate that a list of integers is a string.</p> + <p>The simplest way to use the setting is to call + <seealso marker="io_lib#printable_list/1"> + <c>io_lib:printable_list/1</c></seealso>, which uses the return + value of this function to decide if a list is a string of printable + characters.</p> + <note> + <p>In a future release, this function may return more values and + ranges. To avoid compatibility problems, it is recommended to use + function <seealso marker="io_lib#printable_list/1"> + <c>io_lib:printable_list/1</c></seealso>.</p></note> + </desc> + </func> + + <func> + <name name="put_chars" arity="1"/> + <name name="put_chars" arity="2"/> + <fsummary>Write a list of characters.</fsummary> + <desc> + <p>Writes the characters of <c><anno>CharData</anno></c> to the I/O + server (<c><anno>IoDevice</anno></c>).</p> + </desc> + </func> + + <func> + <name name="read" arity="1"/> + <name name="read" arity="2"/> + <fsummary>Read a term.</fsummary> + <type name="server_no_data"/> + <desc> + <p>Reads a term <c><anno>Term</anno></c> from the standard input + (<c><anno>IoDevice</anno></c>), prompting it with + <c><anno>Prompt</anno></c>.</p> + <p>The function returns:</p> + <taglist> + <tag><c>{ok, <anno>Term</anno>}</c></tag> + <item> + <p>The parsing was successful.</p> + </item> + <tag><c>eof</c></tag> + <item> + <p>End of file was encountered.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>}</c></tag> + <item> + <p>The parsing failed.</p> + </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="read" arity="3"/> + <name name="read" 
arity="4"/> + <fsummary>Read a term.</fsummary> + <type name="server_no_data"/> + <desc> + <p>Reads a term <c><anno>Term</anno></c> from + <c><anno>IoDevice</anno></c>, prompting it + with <c><anno>Prompt</anno></c>. Reading starts at location + <c><anno>StartLocation</anno></c>. Argument + <c><anno>Options</anno></c> is passed on as argument <c>Options</c> + of function <seealso marker="erl_scan#tokens/4"> + <c>erl_scan:tokens/4</c></seealso>.</p> + <p>The function returns:</p> + <taglist> + <tag><c>{ok, Term, <anno>EndLocation</anno>}</c></tag> + <item> + <p>The parsing was successful.</p> + </item> + <tag><c>{eof, <anno>EndLocation</anno>}</c></tag> + <item> + <p>End of file was encountered.</p> + </item> + <tag><c>{error, <anno>ErrorInfo</anno>, + <anno>ErrorLocation</anno>}</c></tag> + <item> + <p>The parsing failed.</p> + </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> + </taglist> + </desc> + </func> + + <func> + <name name="rows" arity="0"/> + <name name="rows" arity="1"/> + <fsummary>Get the number of rows of an I/O device.</fsummary> + <desc> + <p>Retrieves the number of rows of <c><anno>IoDevice</anno></c> + (that is, the height of a terminal). The function + only succeeds for terminal devices, for all other I/O devices + the function returns <c>{error, enotsup}</c>.</p> + </desc> + </func> + + <func> + <name name="scan_erl_exprs" arity="1"/> + <name name="scan_erl_exprs" arity="2"/> + <name name="scan_erl_exprs" arity="3"/> + <name name="scan_erl_exprs" arity="4"/> + <fsummary>Read and tokenize Erlang expressions.</fsummary> + <type name="server_no_data"/> + <desc> + <p>Reads data from the standard input (<c>IoDevice</c>), + prompting it with <c>Prompt</c>. Reading starts at location + <c>StartLocation</c> (<c>1</c>). 
Argument <c><anno>Options</anno></c> + is passed on as argument <c>Options</c> of function + <seealso marker="erl_scan#tokens/4"> + <c>erl_scan:tokens/4</c></seealso>. The data is tokenized as if it + were a sequence of Erlang expressions until a final dot (<c>.</c>) is + reached. This token is also returned.</p> + <p>The function returns:</p> + <taglist> + <tag><c>{ok, Tokens, EndLocation}</c></tag> + <item> + <p>The tokenization succeeded.</p> + </item> + <tag><c>{eof, EndLocation}</c></tag> + <item> + <p>End of file was encountered by the tokenizer.</p> + </item> + <tag><c>eof</c></tag> + <item> + <p>End of file was encountered by the I/O server.</p> + </item> + <tag><c>{error, ErrorInfo, ErrorLocation}</c></tag> + <item> + <p>An error occurred while tokenizing.</p> + </item> + <tag><c>{error, <anno>ErrorDescription</anno>}</c></tag> + <item> + <p>Other (rare) error condition, such as <c>{error, estale}</c> + if reading from an NFS file system.</p> + </item> + </taglist> + <p><em>Example:</em></p> + <pre> +23> <input>io:scan_erl_exprs('enter>').</input> +enter><input>abc(), "hey".</input> +{ok,[{atom,1,abc},{'(',1},{')',1},{',',1},{string,1,"hey"},{dot,1}],2} +24> <input>io:scan_erl_exprs('enter>').</input> +enter><input>1.0er.</input> +{error,{1,erl_scan,{illegal,float}},2}</pre> + </desc> + </func> + + <func> + <name name="scan_erl_form" arity="1"/> + <name name="scan_erl_form" arity="2"/> + <name name="scan_erl_form" arity="3"/> + <name name="scan_erl_form" arity="4"/> + <fsummary>Read and tokenize an Erlang form.</fsummary> + <type name="server_no_data"/> + <desc> + <p>Reads data from the standard input (<c><anno>IoDevice</anno></c>), + prompting it with <c><anno>Prompt</anno></c>. Starts reading + at location <c><anno>StartLocation</anno></c> (<c>1</c>). + Argument <c><anno>Options</anno></c> is passed on as argument + <c>Options</c> of function <seealso marker="erl_scan#tokens/4"> + <c>erl_scan:tokens/4</c></seealso>. 
The data is tokenized as if it + was an Erlang form (one of the valid Erlang expressions in an + Erlang source file) until a final dot (<c>.</c>) is reached. + This last token is also returned.</p> + <p>The return values are the same as for + <seealso marker="#scan_erl_exprs/1"> + <c>scan_erl_exprs/1,2,3,4</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="setopts" arity="1"/> + <name name="setopts" arity="2"/> + <fsummary>Set options.</fsummary> + <desc> + <p>Set options for the standard I/O device + (<c><anno>IoDevice</anno></c>).</p> + <p>Possible options and values vary depending on the + I/O device. For a list of supported options and their current values + on a specific I/O device, use function + <seealso marker="#getopts/1"><c>getopts/1</c></seealso>.</p> + <p>The options and values supported by the OTP I/O devices + are as follows:</p> + <taglist> + <tag><c>binary</c>, <c>list</c>, or <c>{binary, boolean()}</c></tag> + <item> + <p>If set in binary mode (<c>binary</c> or <c>{binary, true}</c>), + the I/O server sends binary data (encoded in UTF-8) as answers + to the <c>get_line</c>, <c>get_chars</c>, and, if possible, + <c>get_until</c> requests (for details, see section + <seealso marker="io_protocol">The Erlang I/O Protocol</seealso>) + in the User's Guide). The immediate effect is that + <seealso marker="#get_chars/2"><c>get_chars/2,3</c></seealso> and + <seealso marker="#get_line/1"><c>get_line/1,2</c></seealso> + return UTF-8 binaries instead of lists of characters + for the affected I/O device.</p> + <p>By default, all I/O devices in OTP are set in <c>list</c> mode. 
+ However, the I/O functions can handle any of these modes and so + should other, user-written, modules behaving as clients to I/O + servers.</p> + <p>This option is supported by the standard shell + (<c>group.erl</c>), the 'oldshell' (<c>user.erl</c>), and the + file I/O servers.</p> + </item> + <tag><c>{echo, boolean()}</c></tag> + <item> + <p>Denotes if the terminal is to echo input. Only supported for + the standard shell I/O server (<c>group.erl</c>)</p> + </item> + <tag><c>{expand_fun, expand_fun()}</c></tag> + <item> + <p>Provides a function for tab-completion (expansion) + like the Erlang shell. This function is called + when the user presses the <em>Tab</em> key. The expansion is + active when calling line-reading functions, such as + <seealso marker="#get_line/1"><c>get_line/1,2</c></seealso>.</p> + <p>The function is called with the current line, up to + the cursor, as a reversed string. It is to return a + three-tuple: <c>{yes|no, string(), [string(), ...]}</c>. The + first element gives a beep if <c>no</c>, otherwise the + expansion is silent; the second is a string that will be + entered at the cursor position; the third is a list of + possible expansions. 
If this list is not empty, + it is printed and the current input line is written + once again.</p> + <p>Trivial example (beep on anything except empty line, which + is expanded to <c>"quit"</c>):</p> + <code type="none"> +fun("") -> {yes, "quit", []}; + (_) -> {no, "", ["quit"]} end</code> + <p>This option is only supported by the standard shell + (<c>group.erl</c>).</p> + </item> + <tag><c>{encoding, latin1 | unicode}</c></tag> + <item> + <p>Specifies how characters are input or output from or to the I/O + device, implying that, for example, a terminal is set to handle + Unicode input and output or a file is set to handle UTF-8 data + encoding.</p> + <p>The option <em>does not</em> affect how data is returned from the + I/O functions or how it is sent in the I/O protocol, it only + affects how the I/O device is to handle Unicode characters to the + "physical" device.</p> + <p>The standard shell is set for <c>unicode</c> or <c>latin1</c> + encoding when + the system is started. The encoding is set with the help of the + <c>LANG</c> or <c>LC_CTYPE</c> environment variables on Unix-like + system or by other means on other systems. + So, the user can input Unicode characters and the I/O device + is in <c>{encoding, unicode}</c> mode if the I/O device supports + it. The mode can be changed, if the assumption of the runtime + system is wrong, by setting this option.</p> + <p>The I/O device used when Erlang is started with the "-oldshell" + or "-noshell" flags is by default set to <c>latin1</c> encoding, + meaning that any characters > codepoint 255 are escaped + and that input is expected to be plain 8-bit ISO Latin-1. + If the encoding is changed to Unicode, input and output from + the standard file descriptors are in UTF-8 (regardless of + operating system).</p> + <p>Files can also be set in <c>{encoding, unicode}</c>, meaning + that data is written and read as UTF-8. 
More encodings are + possible for files, see below.</p> + <p><c>{encoding, unicode | latin1}</c> is supported by both the + standard shell (<c>group.erl</c> including <c>werl</c> on + Windows), the 'oldshell' (<c>user.erl</c>), and the file I/O + servers.</p> + </item> + <tag><c>{encoding, utf8 | utf16 | utf32 | {utf16,big} | + {utf16,little} | {utf32,big} | {utf32,little}}</c></tag> + <item> + <p>For disk files, the encoding can be set to various UTF variants. + This has the effect that data is expected to be read as the + specified encoding from the file, and the data is written in the + specified encoding to the disk file.</p> + <p><c>{encoding, utf8}</c> has the same effect as + <c>{encoding, unicode}</c> on files.</p> + <p>The extended encodings are only supported on disk files + (opened by function + <seealso marker="kernel:file#open/2"> + <c>file:open/2</c></seealso>).</p> + </item> </taglist> </desc> </func> + + <func> + <name name="write" arity="1"/> + <name name="write" arity="2"/> + <fsummary>Write a term.</fsummary> + <desc> + <p>Writes term <c><anno>Term</anno></c> to the standard output + (<c><anno>IoDevice</anno></c>).</p> + </desc> + </func> </funcs> <section> <title>Standard Input/Output</title> - <p>All Erlang processes have a default standard IO device. This + <p>All Erlang processes have a default standard I/O device. This device is used when no <c>IoDevice</c> argument is specified in - the above function calls. However, it is sometimes desirable to - use an explicit <c>IoDevice</c> argument which refers to the - default IO device. This is the case with functions that can - access either a file or the default IO device. The atom + the function calls in this module. However, it is sometimes desirable to + use an explicit <c>IoDevice</c> argument that refers to the + default I/O device. This is the case with functions that can + access either a file or the default I/O device. The atom <c>standard_io</c> has this special meaning. 
The following example illustrates this:</p> + <pre> 27> <input>io:read('enter>').</input> enter><input>foo.</input> @@ -1047,30 +1170,37 @@ enter><input>foo.</input> 28> <input>io:read(standard_io, 'enter>').</input> enter><input>bar.</input> {ok,bar}</pre> + <p>There is always a process registered under the name of <c>user</c>. This can be used for sending output to the user.</p> </section> + <section> <title>Standard Error</title> - <p>In certain situations, especially when the standard output is redirected, access to an I/O-server specific for error messages might be convenient. The IO device <c>standard_error</c> can be used to direct output to whatever the current operating system considers a suitable IO device for error output. Example on a Unix-like operating system:</p> + <p>In certain situations, especially when the standard output is + redirected, access to an I/O server specific for error messages can be + convenient. The I/O device <c>standard_error</c> can be used to direct + output to whatever the current operating system considers a suitable + I/O device for error output. Example on a Unix-like operating system:</p> + <pre> $ <input>erl -noshell -noinput -eval 'io:format(standard_error,"Error: ~s~n",["error 11"]),'\</input> <input>'init:stop().' > /dev/null</input> Error: error 11</pre> - - - </section> <section> <title>Error Information</title> - <p>The <c>ErrorInfo</c> mentioned above is the standard - <c>ErrorInfo</c> structure which is returned from all IO modules. - It has the format:</p> + <p>The <c>ErrorInfo</c> mentioned in this module is the standard + <c>ErrorInfo</c> structure that is returned from all I/O modules. 
+ It has the following format:</p> + <code type="none"> {ErrorLocation, Module, ErrorDescriptor}</code> - <p>A string which describes the error is obtained with the following + + <p>A string that describes the error is obtained with the following call:</p> + <code type="none"> Module:format_error(ErrorDescriptor)</code> </section> diff --git a/lib/stdlib/doc/src/io_lib.xml b/lib/stdlib/doc/src/io_lib.xml index b22ec15a0c..931e50f6f2 100644 --- a/lib/stdlib/doc/src/io_lib.xml +++ b/lib/stdlib/doc/src/io_lib.xml @@ -29,14 +29,16 @@ <rev></rev> </header> <module>io_lib</module> - <modulesummary>IO Library Functions</modulesummary> + <modulesummary>I/O library functions.</modulesummary> <description> <p>This module contains functions for converting to and from strings (lists of characters). They are used for implementing the - functions in the <c>io</c> module. There is no guarantee that the + functions in the <seealso marker="io"><c>io</c></seealso> module. + There is no guarantee that the character lists returned from some of the functions are flat, - they can be deep lists. <c>lists:flatten/1</c> can be used for - flattening deep lists.</p> + they can be deep lists. 
Function + <seealso marker="lists#flatten/1"><c>lists:flatten/1</c></seealso> + can be used for flattening deep lists.</p> </description> <datatypes> @@ -45,7 +47,8 @@ </datatype> <datatype> <name name="continuation"/> - <desc><p>A continuation as returned by <seealso marker="#fread/3"><c>fread/3</c></seealso>.</p> + <desc><p>A continuation as returned by + <seealso marker="#fread/3"><c>fread/3</c></seealso>.</p> </desc> </datatype> <datatype> @@ -62,338 +65,377 @@ </datatype> <datatype> <name name="format_spec"/> - <desc><p>Description:</p> + <desc><p>Where:</p> <list type="bulleted"> <item><p><c>control_char</c> is the type of control - sequence: <c>$P</c>, <c>$w</c>, and so on;</p> + sequence: <c>$P</c>, <c>$w</c>, and so on.</p> </item> <item><p><c>args</c> is a list of the arguments used by the control sequence, or an empty list if the control sequence - does not take any arguments;</p> + does not take any arguments.</p> </item> - <item><p><c>width</c> is the field width;</p> + <item><p><c>width</c> is the field width.</p> </item> - <item><p><c>adjust</c> is the adjustment;</p> + <item><p><c>adjust</c> is the adjustment.</p> </item> <item><p><c>precision</c> is the precision of the printed - argument;</p> + argument.</p> </item> - <item><p><c>pad_char</c> is the padding character;</p> + <item><p><c>pad_char</c> is the padding character.</p> </item> - <item><p><c>encoding</c> is set to <c>true</c> if the translation - modifier <c>t</c> is present;</p> + <item><p><c>encoding</c> is set to <c>true</c> if translation + modifier <c>t</c> is present.</p> </item> - <item><p><c>strings</c> is set to <c>false</c> if the modifier + <item><p><c>strings</c> is set to <c>false</c> if modifier <c>l</c> is present.</p> </item> </list> </desc> </datatype> </datatypes> + <funcs> <func> - <name name="nl" arity="0"/> - <fsummary>Write a newline</fsummary> + <name name="build_text" arity="1"/> + <fsummary>Build the output text for a preparsed format list.</fsummary> <desc> - 
<p>Returns a character list which represents a new line - character.</p> + <p>For details, see + <seealso marker="#scan_format/2"><c>scan_format/2</c></seealso>.</p> </desc> </func> + <func> - <name name="write" arity="1"/> - <name name="write" arity="2"/> - <fsummary>Write a term</fsummary> + <name name="char_list" arity="1"/> + <fsummary>Test for a list of characters.</fsummary> <desc> - <p>Returns a character list which represents <c><anno>Term</anno></c>. The - <c><anno>Depth</anno></c> (-1) argument controls the depth of the - structures written. When the specified depth is reached, - everything below this level is replaced by "...". For - example:</p> - <pre> -1> <input>lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9})).</input> -"{1,[2],[3],[4,5],6,7,8,9}" -2> <input>lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9}, 5)).</input> -"{1,[2],[3],[...],...}"</pre> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of + characters in the Unicode range, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="print" arity="1"/> - <name name="print" arity="4"/> - <fsummary>Pretty print a term</fsummary> + <name name="deep_char_list" arity="1"/> + <fsummary>Test for a deep list of characters.</fsummary> <desc> - <p>Also returns a list of characters which represents - <c><anno>Term</anno></c>, but breaks representations which are longer than - one line into many lines and indents each line sensibly. It - also tries to detect and output lists of printable characters - as strings. 
<c><anno>Column</anno></c> is the starting column (1), - <c><anno>LineLength</anno></c> the maximum line length (80), and - <c><anno>Depth</anno></c> (-1) the maximum print depth.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a, possibly deep, + list of characters in the Unicode range, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="fwrite" arity="2"/> - <name name="format" arity="2"/> - <fsummary>Write formatted output</fsummary> + <name name="deep_latin1_char_list" arity="1"/> + <fsummary>Test for a deep list of characters.</fsummary> <desc> - <p>Returns a character list which represents <c><anno>Data</anno></c> - formatted in accordance with <c><anno>Format</anno></c>. See - <seealso marker="io#fwrite/1">io:fwrite/1,2,3</seealso> for a detailed - description of the available formatting options. A fault is - generated if there is an error in the format string or - argument list.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a, possibly deep, + list of characters in the ISO Latin-1 range, otherwise + <c>false</c>.</p> + </desc> + </func> - <p>If (and only if) the Unicode translation modifier is used - in the format string (i.e. ~ts or ~tc), the resulting list - may contain characters beyond the ISO-latin-1 character - range (in other words, numbers larger than 255). If so, the - result is not an ordinary Erlang string(), but can well be - used in any context where Unicode data is allowed.</p> - + <func> + <name name="format" arity="2"/> + <name name="fwrite" arity="2"/> + <fsummary>Write formatted output.</fsummary> + <desc> + <p>Returns a character list that represents <c><anno>Data</anno></c> + formatted in accordance with <c><anno>Format</anno></c>. + For a detailed description of the available formatting options, see + <seealso marker="io#fwrite/1"><c>io:fwrite/1,2,3</c></seealso>. 
+ If the format string or argument list contains an error, a fault is + generated.</p> + <p>If and only if the Unicode translation modifier is used in the + format string (that is, <c>~ts</c> or <c>~tc</c>), the resulting list + can contain characters beyond the ISO Latin-1 character range + (that is, numbers > 255). If so, the + result is not an ordinary Erlang <c>string()</c>, but can well be + used in any context where Unicode data is allowed.</p> </desc> </func> + <func> <name name="fread" arity="2"/> - <fsummary>Read formatted input</fsummary> + <fsummary>Read formatted input.</fsummary> <desc> - <p>Tries to read <c><anno>String</anno></c> in accordance with the control - sequences in <c><anno>Format</anno></c>. See - <seealso marker="io#fread/3">io:fread/3</seealso> for a detailed - description of the available formatting options. It is - assumed that <c><anno>String</anno></c> contains whole lines. It returns:</p> + <p>Tries to read <c><anno>String</anno></c> in accordance with the + control sequences in <c><anno>Format</anno></c>. + For a detailed description of the available formatting options, see + <seealso marker="io#fread/3"><c>io:fread/3</c></seealso>. It is + assumed that <c><anno>String</anno></c> contains whole lines.</p> + <p>The function returns:</p> <taglist> - <tag><c>{ok, <anno>InputList</anno>, <anno>LeftOverChars</anno>}</c></tag> + <tag><c>{ok, <anno>InputList</anno>, + <anno>LeftOverChars</anno>}</c></tag> <item> - <p>The string was read. <c><anno>InputList</anno></c> is the list of - successfully matched and read items, and - <c><anno>LeftOverChars</anno></c> are the input characters not used.</p> + <p>The string was read. 
<c><anno>InputList</anno></c> is the list + of successfully matched and read items, and + <c><anno>LeftOverChars</anno></c> are the input characters not + used.</p> </item> - <tag><c>{more, <anno>RestFormat</anno>, <anno>Nchars</anno>, <anno>InputStack</anno>}</c></tag> + <tag><c>{more, <anno>RestFormat</anno>, <anno>Nchars</anno>, + <anno>InputStack</anno>}</c></tag> <item> - <p>The string was read, but more input is needed in order - to complete the original format string. <c><anno>RestFormat</anno></c> - is the remaining format string, <c><anno>Nchars</anno></c> the number + <p>The string was read, but more input is needed to complete the + original format string. <c><anno>RestFormat</anno></c> is the + remaining format string, <c><anno>Nchars</anno></c> is the number of characters scanned, and <c><anno>InputStack</anno></c> is the reversed list of inputs matched up to that point.</p> </item> <tag><c>{error, <anno>What</anno>}</c></tag> <item> - <p>The read operation failed and the parameter <c><anno>What</anno></c> + <p>The read operation failed and parameter <c><anno>What</anno></c> gives a hint about the error.</p> </item> </taglist> - <p>Example:</p> + <p><em>Example:</em></p> <pre> 3> <input>io_lib:fread("~f~f~f", "15.6 17.3e-6 24.5").</input> {ok,[15.6,1.73e-5,24.5],[]}</pre> </desc> </func> + <func> <name name="fread" arity="3"/> <fsummary>Re-entrant formatted reader</fsummary> <desc> <p>This is the re-entrant formatted reader. The continuation of - the first call to the functions must be <c>[]</c>. Refer to - Armstrong, Virding, Williams, 'Concurrent Programming in - Erlang', Chapter 13 for a complete description of how the - re-entrant input scheme works.</p> + the first call to the functions must be <c>[]</c>. 
For a complete + description of how the re-entrant input scheme works, see + Armstrong, Virding, Williams: 'Concurrent Programming in + Erlang', Chapter 13.</p> <p>The function returns:</p> <taglist> - <tag><c>{done, <anno>Result</anno>, <anno>LeftOverChars</anno>}</c></tag> + <tag><c>{done, <anno>Result</anno>, + <anno>LeftOverChars</anno>}</c></tag> <item> - <p>The input is complete. The result is one of the - following:</p> + <p>The input is complete. The result is one of the following:</p> <taglist> <tag><c>{ok, <anno>InputList</anno>}</c></tag> <item> - <p>The string was read. <c><anno>InputList</anno></c> is the list of - successfully matched and read items, and - <c><anno>LeftOverChars</anno></c> are the remaining characters.</p> + <p>The string was read. <c><anno>InputList</anno></c> is the + list of successfully matched and read items, and + <c><anno>LeftOverChars</anno></c> are the remaining + characters.</p> </item> <tag><c>eof</c></tag> <item> - <p>End of file has been encountered. + <p>End of file was encountered. <c><anno>LeftOverChars</anno></c> are the input characters not used.</p> </item> <tag><c>{error, <anno>What</anno>}</c></tag> <item> - <p>An error occurred and the parameter <c><anno>What</anno></c> gives - a hint about the error.</p> + <p>An error occurred and parameter <c><anno>What</anno></c> + gives a hint about the error.</p> </item> </taglist> </item> <tag><c>{more, <anno>Continuation</anno>}</c></tag> <item> <p>More data is required to build a term. 
- <c><anno>Continuation</anno></c> must be passed to <c>fread/3</c>, + <c><anno>Continuation</anno></c> must be passed to <c>fread/3</c> when more data becomes available.</p> </item> </taglist> </desc> </func> + <func> - <name name="write_atom" arity="1"/> - <fsummary>Write an atom</fsummary> + <name name="indentation" arity="2"/> + <fsummary>Indentation after printing string.</fsummary> <desc> - <p>Returns the list of characters needed to print the atom - <c><anno>Atom</anno></c>.</p> + <p>Returns the indentation if <c><anno>String</anno></c> has been + printed, starting at <c><anno>StartIndent</anno></c>.</p> </desc> </func> + <func> - <name name="write_string" arity="1"/> - <fsummary>Write a string</fsummary> + <name name="latin1_char_list" arity="1"/> + <fsummary>Test for a list of ISO Latin-1 characters.</fsummary> <desc> - <p>Returns the list of characters needed to print - <c><anno>String</anno></c> as a string.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of + characters in the ISO Latin-1 range, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="write_string_as_latin1" arity="1"/> - <fsummary>Write a string</fsummary> + <name name="nl" arity="0"/> + <fsummary>Write a newline.</fsummary> <desc> - <p>Returns the list of characters needed to print - <c><anno>String</anno></c> as a string. Non-Latin-1 - characters are escaped.</p> + <p>Returns a character list that represents a new line character.</p> </desc> </func> + <func> - <name name="write_latin1_string" arity="1"/> - <fsummary>Write an ISO-latin-1 string</fsummary> + <name name="print" arity="1"/> + <name name="print" arity="4"/> + <fsummary>Pretty print a term.</fsummary> <desc> - <p>Returns the list of characters needed to print - <c><anno>Latin1String</anno></c> as a string.</p> + <p>Returns a list of characters that represents + <c><anno>Term</anno></c>, but breaks representations longer + than one line into many lines and indents each line sensibly. 
+ Also tries to detect and output lists of printable characters + as strings.</p> + <list type="bulleted"> + <item><c><anno>Column</anno></c> is the starting column; defaults + to 1.</item> + <item><c><anno>LineLength</anno></c> is the maximum line length; + defaults to 80.</item> + <item><c><anno>Depth</anno></c> is the maximum print depth; + defaults to -1, which means no limitation.</item> + </list> </desc> </func> + <func> - <name name="write_char" arity="1"/> - <fsummary>Write a character</fsummary> + <name name="printable_latin1_list" arity="1"/> + <fsummary>Test for a list of printable ISO Latin-1 characters.</fsummary> <desc> - <p>Returns the list of characters needed to print a character - constant in the Unicode character set.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of + printable ISO Latin-1 characters, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="write_char_as_latin1" arity="1"/> - <fsummary>Write a character</fsummary> + <name name="printable_list" arity="1"/> + <fsummary>Test for a list of printable characters.</fsummary> <desc> - <p>Returns the list of characters needed to print a character - constant in the Unicode character set. 
Non-Latin-1 characters - are escaped.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of + printable characters, otherwise <c>false</c>.</p> + <p>What is a printable character in this case is determined by + startup flag <c>+pc</c> to the Erlang VM; see + <seealso marker="io#printable_range/0"> + <c>io:printable_range/0</c></seealso> and + <seealso marker="erts:erl"><c>erl(1)</c></seealso>.</p> </desc> </func> + <func> - <name name="write_latin1_char" arity="1"/> - <fsummary>Write an ISO-latin-1 character</fsummary> + <name name="printable_unicode_list" arity="1"/> + <fsummary>Test for a list of printable Unicode characters.</fsummary> <desc> - <p>Returns the list of characters needed to print a character - constant in the ISO-latin-1 character set.</p> + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of + printable Unicode characters, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="scan_format" arity="2"/> - <fsummary>Parse all control sequences in the format string</fsummary> + <fsummary>Parse all control sequences in the format string.</fsummary> <desc> - <p>Returns a list corresponding to the given format string, + <p>Returns a list corresponding to the specified format string, where control sequences have been replaced with - corresponding tuples. This list can be passed to <seealso - marker="#build_text/1">io_lib:build_text/1</seealso> to have - the same effect as <c>io_lib:format(Format, Args)</c>, or to - <seealso - marker="#unscan_format/1">io_lib:unscan_format/1</seealso> - in order to get the corresponding pair of <c>Format</c> and - <c>Args</c> (with every <c>*</c> and corresponding argument - expanded to numeric values).</p> + corresponding tuples. 
This list can be passed to:</p> + <list type="bulleted"> + <item> + <p><seealso marker="#build_text/1"><c>build_text/1</c></seealso> + to have the same effect as <c>format(Format, Args)</c></p> + </item> + <item> + <p><seealso marker="#unscan_format/1"> + <c>unscan_format/1</c></seealso> to get the corresponding pair + of <c>Format</c> and <c>Args</c> (with every <c>*</c> and + corresponding argument expanded to numeric values)</p> + </item> + </list> <p>A typical use of this function is to replace unbounded-size control sequences like <c>~w</c> and <c>~p</c> with the depth-limited variants <c>~W</c> and <c>~P</c> before - formatting to text, e.g. in a logger.</p> + formatting to text in, for example, a logger.</p> </desc> </func> + <func> <name name="unscan_format" arity="1"/> - <fsummary>Revert a pre-parsed format list to a plain character list - and a list of arguments</fsummary> - <desc> - <p>See <seealso - marker="#scan_format/2">io_lib:scan_format/2</seealso> for - details.</p> - </desc> - </func> - <func> - <name name="build_text" arity="1"/> - <fsummary>Build the output text for a pre-parsed format list</fsummary> + <fsummary>Revert a preparsed format list to a plain character list + and a list of arguments.</fsummary> <desc> - <p>See <seealso - marker="#scan_format/2">io_lib:scan_format/2</seealso> for - details.</p> + <p>For details, see + <seealso marker="#scan_format/2"><c>scan_format/2</c></seealso>.</p> </desc> </func> + <func> - <name name="indentation" arity="2"/> - <fsummary>Indentation after printing string</fsummary> + <name name="write" arity="1"/> + <name name="write" arity="2"/> + <fsummary>Write a term.</fsummary> <desc> - <p>Returns the indentation if <c><anno>String</anno></c> has been printed, - starting at <c><anno>StartIndent</anno></c>.</p> + <p>Returns a character list that represents <c><anno>Term</anno></c>. + Argument <c><anno>Depth</anno></c> controls the depth of the + structures written. 
When the specified depth is reached, + everything below this level is replaced by "<c>...</c>". + <c><anno>Depth</anno></c> defaults to -1, which means + no limitation.</p> + <p><em>Example:</em></p> + <pre> +1> <input>lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9})).</input> +"{1,[2],[3],[4,5],6,7,8,9}" +2> <input>lists:flatten(io_lib:write({1,[2],[3],[4,5],6,7,8,9}, 5)).</input> +"{1,[2],[3],[...],...}"</pre> </desc> </func> + <func> - <name name="char_list" arity="1"/> - <fsummary>Test for a list of characters</fsummary> + <name name="write_atom" arity="1"/> + <fsummary>Write an atom.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of - characters in the Unicode range, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print atom + <c><anno>Atom</anno></c>.</p> </desc> </func> + <func> - <name name="latin1_char_list" arity="1"/> - <fsummary>Test for a list of ISO-latin-1 characters</fsummary> + <name name="write_char" arity="1"/> + <fsummary>Write a character.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of - characters in the ISO-latin-1 range, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print a character + constant in the Unicode character set.</p> </desc> </func> + <func> - <name name="deep_char_list" arity="1"/> - <fsummary>Test for a deep list of characters</fsummary> + <name name="write_char_as_latin1" arity="1"/> + <fsummary>Write a character.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a, possibly deep, list - of characters in the Unicode range, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print a character + constant in the Unicode character set. 
Non-Latin-1 characters + are escaped.</p> </desc> </func> + <func> - <name name="deep_latin1_char_list" arity="1"/> - <fsummary>Test for a deep list of characters</fsummary> + <name name="write_latin1_char" arity="1"/> + <fsummary>Write an ISO Latin-1 character.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a, possibly deep, list - of characters in the ISO-latin-1 range, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print a character + constant in the ISO Latin-1 character set.</p> </desc> </func> + <func> - <name name="printable_list" arity="1"/> - <fsummary>Test for a list of printable characters</fsummary> + <name name="write_latin1_string" arity="1"/> + <fsummary>Write an ISO Latin-1 string.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of - printable characters, otherwise it returns <c>false</c>.</p> - <p>What is a printable character in this case is determined by the - <c>+pc</c> start up flag to the Erlang VM. 
See - <seealso marker="io#printable_range/0">io:printable_range/0</seealso> - and <seealso marker="erts:erl#erl">erl(1)</seealso>.</p> + <p>Returns the list of characters needed to print + <c><anno>Latin1String</anno></c> as a string.</p> </desc> </func> + <func> - <name name="printable_latin1_list" arity="1"/> - <fsummary>Test for a list of printable ISO-latin-1 characters</fsummary> + <name name="write_string" arity="1"/> + <fsummary>Write a string.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of - printable ISO-latin-1 characters, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print + <c><anno>String</anno></c> as a string.</p> </desc> </func> + <func> - <name name="printable_unicode_list" arity="1"/> - <fsummary>Test for a list of printable Unicode characters</fsummary> + <name name="write_string_as_latin1" arity="1"/> + <fsummary>Write a string.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Term</anno></c> is a flat list of - printable Unicode characters, otherwise it returns <c>false</c>.</p> + <p>Returns the list of characters needed to print + <c><anno>String</anno></c> as a string. Non-Latin-1 + characters are escaped.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/io_protocol.xml b/lib/stdlib/doc/src/io_protocol.xml index f2a669a49a..84b5f62c7f 100644 --- a/lib/stdlib/doc/src/io_protocol.xml +++ b/lib/stdlib/doc/src/io_protocol.xml @@ -23,7 +23,7 @@ </legalnotice> - <title>The Erlang I/O-protocol</title> + <title>The Erlang I/O Protocol</title> <prepared>Patrik Nyblom</prepared> <responsible></responsible> <docno></docno> @@ -34,183 +34,217 @@ <file>io_protocol.xml</file> </header> - -<p>The I/O-protocol in Erlang specifies a way for a client to communicate -with an I/O server and vice versa. The I/O server is a process that handles -the requests and performs the requested task on e.g. an IO device. 
The -client is any Erlang process wishing to read or write data from/to the -IO device.</p> - -<p>The common I/O-protocol has been present in OTP since the -beginning, but has been fairly undocumented and has also somewhat -evolved over the years. In an addendum to Robert Virdings rationale -the original I/O-protocol is described. This document describes the -current I/O-protocol.</p> - -<p>The original I/O-protocol was simple and flexible. Demands for spacial -and execution time efficiency has triggered extensions to the protocol -over the years, making the protocol larger and somewhat less easy to -implement than the original. It can certainly be argued that the -current protocol is too complex, but this text describes how it looks -today, not how it should have looked.</p> - -<p>The basic ideas from the original protocol still hold. The I/O server -and client communicate with one single, rather simplistic protocol and -no server state is ever present in the client. Any I/O server can be -used together with any client code and client code need not be aware -of the actual IO device the I/O server communicates with.</p> - -<section> -<title>Protocol Basics</title> - -<p>As described in Robert's paper, I/O servers and clients communicate using -<c>io_request</c>/<c>io_reply</c> tuples as follows:</p> - -<p><em>{io_request, From, ReplyAs, Request}</em><br/> -<em>{io_reply, ReplyAs, Reply}</em></p> - -<p>The client sends an <c>io_request</c> tuple to the I/O server and -the server eventually sends a corresponding <c>io_reply</c> tuple.</p> - -<list type="bulleted"> -<item><c>From</c> is the <c>pid()</c> of the client, the process which -the I/O server sends the IO reply to.</item> - -<item><c>ReplyAs</c> can be any datum and is returned in the corresponding -<c>io_reply</c>. The <seealso marker="stdlib:io">io</seealso> module monitors -the I/O server, and uses the monitor reference as the <c>ReplyAs</c> datum. 
-A more complicated client -could have several outstanding I/O requests to the same I/O server and -would then use different references (or something else) to differentiate among -the incoming IO replies. The <c>ReplyAs</c> element should be considered -opaque by the I/O server. Note that the <c>pid()</c> of the I/O server is not -explicitly present in the <c>io_reply</c> tuple. The reply can be sent from any -process, not necessarily the actual I/O server. The <c>ReplyAs</c> element is -the only thing that connects one I/O request with an I/O-reply.</item> - -<item><c>Request</c> and <c>Reply</c> are described below.</item> -</list> - -<p>When an I/O server receives an <c>io_request</c> tuple, it acts upon the actual -<c>Request</c> part and eventually sends an <c>io_reply</c> tuple with the corresponding -<c>Reply</c> part.</p> -</section> -<section> -<title>Output Requests</title> - -<p>To output characters on an IO device, the following <c>Request</c>s exist:</p> - -<p> -<em>{put_chars, Encoding, Characters}</em><br/> -<em>{put_chars, Encoding, Module, Function, Args}</em> -</p> -<list type="bulleted"> -<item><c>Encoding</c> is either <c>unicode</c> or <c>latin1</c>, meaning that the - characters are (in case of binaries) encoded as either UTF-8 or - ISO-latin-1 (pure bytes). A well behaved I/O server should also - return error if list elements contain integers > 255 when - <c>Encoding</c> is set to <c>latin1</c>. Note that this does not in any way tell - how characters should be put on the actual IO device or how the - I/O server should handle them. Different I/O servers may handle the - characters however they want, this simply tells the I/O server which - format the data is expected to have. In the <c>Module</c>/<c>Function</c>/<c>Args</c> - case, <c>Encoding</c> tells which format the designated function - produces. 
Note that byte-oriented data is simplest sent using the ISO-latin-1 - encoding.</item> - -<item>Characters are the data to be put on the IO device. If <c>Encoding</c> is - <c>latin1</c>, this is an <c>iolist()</c>. If <c>Encoding</c> is <c>unicode</c>, this is an - Erlang standard mixed Unicode list (one integer in a list per - character, characters in binaries represented as UTF-8).</item> - -<item><c>Module</c>, <c>Function</c>, and <c>Args</c> denote a function which will be called to - produce the data (like <c>io_lib:format/2</c>). <c>Args</c> is a list of arguments - to the function. The function should produce data in the given - <c>Encoding</c>. The I/O server should call the function as - <c>apply(Mod, Func, Args)</c> and will put the returned data on the IO device as if it was sent - in a <c>{put_chars, Encoding, Characters}</c> request. If the function - returns anything else than a binary or list or throws an exception, - an error should be sent back to the client.</item> -</list> - -<p>The I/O server replies to the client with an <c>io_reply</c> tuple where the <c>Reply</c> -element is one of:</p> -<p> -<em>ok</em><br/> -<em>{error, Error}</em> -</p> - -<list type="bulleted"> -<item><c>Error</c> describes the error to the client, which may do whatever - it wants with it. The Erlang <seealso marker="stdlib:io">io</seealso> - module typically returns it as is.</item> -</list> - -<p>For backward compatibility the following <c>Request</c>s should also be -handled by an I/O server (these requests should not be present after -R15B of OTP):</p> -<p> -<em>{put_chars, Characters}</em><br/> -<em>{put_chars, Module, Function, Args}</em> -</p> - -<p>These should behave as <c>{put_chars, latin1, Characters}</c> and -<c>{put_chars, latin1, Module, Function, Args}</c> respectively. 
</p> -</section> -<section> -<title>Input Requests</title> - -<p>To read characters from an IO device, the following <c>Request</c>s exist:</p> - -<p><em>{get_until, Encoding, Prompt, Module, Function, ExtraArgs}</em></p> - -<list type="bulleted"> -<item><c>Encoding</c> denotes how data is to be sent back to the client and - what data is sent to the function denoted by - <c>Module</c>/<c>Function</c>/<c>ExtraArgs</c>. If the function supplied returns data as a - list, the data is converted to this encoding. If however the - function supplied returns data in some other format, no conversion - can be done and it is up to the client supplied function to return - data in a proper way. If <c>Encoding</c> is <c>latin1</c>, lists of integers - 0..255 or binaries containing plain bytes are sent back to the - client when possible; if <c>Encoding</c> is <c>unicode</c>, lists with integers in - the whole Unicode range or binaries encoded in UTF-8 are sent to the - client. The user supplied function will always see lists of integers, never - binaries, but the list may contain numbers > 255 if the <c>Encoding</c> is - <c>unicode</c>.</item> - -<item><c>Prompt</c> is a list of characters (not mixed, no binaries) or an atom - to be output as a prompt for input on the IO device. <c>Prompt</c> is - often ignored by the I/O server and if set to <c>''</c> it should always - be ignored (and result in nothing being written to the IO device).</item> - -<item><p><c>Module</c>, <c>Function</c>, and <c>ExtraArgs</c> denote a function and arguments to - determine when enough data is written. The function should take two - additional arguments, the last state, and a list of characters. 
The - function should return one of:</p> -<p> -<em>{done, Result, RestChars}</em><br/> -<em>{more, Continuation}</em> -</p> - <p>The <c>Result</c> can be any Erlang term, but if it is a <c>list()</c>, the - I/O server may convert it to a <c>binary()</c> of appropriate format before - returning it to the client, if the I/O server is set in binary mode (see - below).</p> - - <p>The function will be called with the data the I/O server finds on - its IO device, returning <c>{done, Result, RestChars}</c> when enough data is - read (in which case <c>Result</c> is sent to the client and <c>RestChars</c> is - kept in the I/O server as a buffer for subsequent input) or - <c>{more, Continuation}</c>, indicating that more characters are needed to - complete the request. The <c>Continuation</c> will be sent as the state in - subsequent calls to the function when more characters are - available. When no more characters are available, the function - shall return <c>{done, eof, Rest}</c>. - The initial state is the empty list and the data when an - end of file is reached on the IO device is the atom <c>eof</c>. An emulation - of the <c>get_line</c> request could be (inefficiently) implemented using - the following functions:</p> -<code> + <p>The I/O protocol in Erlang enables bi-directional communication between + clients and servers.</p> + + <list type="bulleted"> + <item> + <p>The I/O server is a process that handles the requests and performs + the requested task on, for example, an I/O device.</p> + </item> + <item> + <p>The client is any Erlang process wishing to read or write data from/to + the I/O device.</p> + </item> + </list> + + <p>The common I/O protocol has been present in OTP since the beginning, but + has been undocumented and has also evolved over the years. In an + addendum to Robert Virding's rationale, the original I/O protocol is + described. This section describes the current I/O protocol.</p> + + <p>The original I/O protocol was simple and flexible. 
Demands for memory + efficiency and execution time efficiency have triggered extensions + to the protocol over the years, making the protocol larger and somewhat + less easy to implement than the original. It can certainly be argued that + the current protocol is too complex, but this section describes how it + looks today, not how it should have looked.</p> + + <p>The basic ideas from the original protocol still hold. The I/O server + and client communicate with one single, rather simplistic protocol and no + server state is ever present in the client. Any I/O server can be used + together with any client code, and the client code does not need to be + aware of the I/O device that the I/O server communicates with.</p> + + <section> + <title>Protocol Basics</title> + <p>As described in Robert's paper, I/O servers and clients communicate + using <c>io_request</c>/<c>io_reply</c> tuples as follows:</p> + + <pre> +{io_request, From, ReplyAs, Request} +{io_reply, ReplyAs, Reply}</pre> + + <p>The client sends an <c>io_request</c> tuple to the I/O server and the + server eventually sends a corresponding <c>io_reply</c> tuple.</p> + + <list type="bulleted"> + <item> + <p><c>From</c> is the <c>pid()</c> of the client, the process which + the I/O server sends the I/O reply to.</p> + </item> + <item> + <p><c>ReplyAs</c> can be any datum and is returned in the + corresponding <c>io_reply</c>. The + <seealso marker="stdlib:io"><c>io</c></seealso> module monitors the + I/O server and uses the monitor reference as the <c>ReplyAs</c> + datum. A more complicated client can have many outstanding I/O + requests to the same I/O server and can use different references (or + something else) to differentiate among the incoming I/O replies. + Element <c>ReplyAs</c> is to be considered opaque by the I/O + server.</p> + <p>Notice that the <c>pid()</c> of the I/O server is not explicitly + present in tuple <c>io_reply</c>. 
The reply can be sent from any + process, not necessarily the actual I/O server.</p> + </item> + <item> + <p><c>Request</c> and <c>Reply</c> are described below.</p> + </item> + </list> + + <p>When an I/O server receives an <c>io_request</c> tuple, it acts upon the + <c>Request</c> part and eventually sends an <c>io_reply</c> tuple with + the corresponding <c>Reply</c> part.</p> + </section> + + <section> + <title>Output Requests</title> + <p>To output characters on an I/O device, the following <c>Request</c>s + exist:</p> + + <pre> +{put_chars, Encoding, Characters} +{put_chars, Encoding, Module, Function, Args}</pre> + + <list type="bulleted"> + <item> + <p><c>Encoding</c> is <c>unicode</c> or <c>latin1</c>, meaning that the + characters are (in case of binaries) encoded as UTF-8 or ISO Latin-1 + (pure bytes). A well-behaved I/O server is also to return an error + indication if list elements contain integers > 255 + when <c>Encoding</c> is set to <c>latin1</c>.</p> + <p>Notice that this does not in any way tell how characters are to be + put on the I/O device or handled by the I/O server. Different I/O + servers can handle the characters however they want, this only tells + the I/O server which format the data is expected to have. In the + <c>Module</c>/<c>Function</c>/<c>Args</c> case, <c>Encoding</c> tells + which format the designated function produces.</p> + <p>Notice also that byte-oriented data is simplest sent using the ISO + Latin-1 encoding.</p> + </item> + <item> + <p><c>Characters</c> are the data to be put on the I/O device. If + <c>Encoding</c> is <c>latin1</c>, this is an <c>iolist()</c>. 
If + <c>Encoding</c> is <c>unicode</c>, this is an Erlang standard mixed + Unicode list (one integer in a list per character, characters in + binaries represented as UTF-8).</p> + </item> + <item> + <p><c>Module</c>, <c>Function</c>, and <c>Args</c> denote a function + that is called to produce the data (like + <seealso marker="stdlib:io_lib#format/2"><c>io_lib:format/2</c></seealso>). + </p> + <p><c>Args</c> is a list of arguments to the function. The function is + to produce data in the specified <c>Encoding</c>. The I/O server is + to call the function as <c>apply(Mod, Func, Args)</c> and put the + returned data on the I/O device as if it was sent in a + <c>{put_chars, Encoding, Characters}</c> request. If the function + returns anything else than a binary or list, or throws an exception, + an error is to be sent back to the client.</p> + </item> + </list> + + <p>The I/O server replies to the client with an <c>io_reply</c> tuple, where + element <c>Reply</c> is one of:</p> + + <pre> +ok +{error, Error}</pre> + + <list type="bulleted"> + <item><c>Error</c> describes the error to the client, which can do + whatever it wants with it. 
The + <seealso marker="stdlib:io"><c>io</c></seealso> module typically + returns it "as is".</item> + </list> + + <p>For backward compatibility, the following <c>Request</c>s are also to be + handled by an I/O server (they are not to be present after + Erlang/OTP R15B):</p> + + <pre> +{put_chars, Characters} +{put_chars, Module, Function, Args}</pre> + + <p>These are to behave as <c>{put_chars, latin1, Characters}</c> and + <c>{put_chars, latin1, Module, Function, Args}</c>, respectively.</p> + </section> + + <section> + <title>Input Requests</title> + <p>To read characters from an I/O device, the following <c>Request</c>s + exist:</p> + + <pre> +{get_until, Encoding, Prompt, Module, Function, ExtraArgs}</pre> + + <list type="bulleted"> + <item> + <p><c>Encoding</c> denotes how data is to be sent back to the client + and what data is sent to the function denoted by + <c>Module</c>/<c>Function</c>/<c>ExtraArgs</c>. If the function + supplied returns data as a list, the data is converted to this + encoding. If the function supplied returns data in some other format, + no conversion can be done, and it is up to the client-supplied + function to return data in a proper way.</p> + <p>If <c>Encoding</c> is <c>latin1</c>, lists of integers <c>0..255</c> + or binaries containing plain bytes are sent back to the client when + possible. If <c>Encoding</c> is <c>unicode</c>, lists with integers + in the whole Unicode range or binaries encoded in UTF-8 are sent to + the client. The user-supplied function always sees lists of + integers, never binaries, but the list can contain numbers > 255 + if <c>Encoding</c> is <c>unicode</c>.</p> + </item> + <item> + <p><c>Prompt</c> is a list of characters (not mixed, no binaries) or an + atom to be output as a prompt for input on the I/O device. 
+ <c>Prompt</c> is often ignored by the I/O server; if set to <c>''</c>, + it is always to be ignored (and results in nothing being written to + the I/O device).</p> + </item> + <item> + <p><c>Module</c>, <c>Function</c>, and <c>ExtraArgs</c> denote a + function and arguments to determine when enough data is written. The + function is to take two more arguments, the last state, and a list of + characters. The function is to return one of:</p> + <pre> +{done, Result, RestChars} +{more, Continuation}</pre> + <p><c>Result</c> can be any Erlang term, but if it is a <c>list()</c>, + the I/O server can convert it to a <c>binary()</c> of appropriate + format before returning it to the client, if the I/O server is set in + binary mode (see below).</p> + <p>The function is called with the data the I/O server finds on its I/O + device, returning one of:</p> + <list type="bulleted"> + <item> + <p><c>{done, Result, RestChars}</c> when enough data is read. In + this case <c>Result</c> is sent to the client and <c>RestChars</c> + is kept in the I/O server as a buffer for later input.</p> + </item> + <item> + <p><c>{more, Continuation}</c>, which indicates that more + characters are needed to complete the request.</p> + </item> + </list> + <p><c>Continuation</c> is sent as the state in later calls to the + function when more characters are available. When no more characters + are available, the function must return <c>{done, eof, Rest}</c>. The + initial state is the empty list. The data when an end of file is + reached on the IO device is the atom <c>eof</c>.</p> + <p>An emulation of the <c>get_line</c> request can be (inefficiently) + implemented using the following functions:</p> + <code> -module(demo). -export([until_newline/3, get_line/1]). @@ -234,226 +268,253 @@ get_line(IoServer) -> receive {io_reply, IoServer, Data} -> Data - end. 
-</code> - <p>Note especially that the last element in the <c>Request</c> tuple (<c>[$\n]</c>) - is appended to the argument list when the function is called. The - function should be called like - <c>apply(Module, Function, [ State, Data | ExtraArgs ])</c> by the I/O server</p> -</item> -</list> - -<p>A fixed number of characters is requested using this <c>Request</c>:</p> -<p> -<em>{get_chars, Encoding, Prompt, N}</em> -</p> - -<list type="bulleted"> -<item><c>Encoding</c> and <c>Prompt</c> as for <c>get_until</c>.</item> - -<item><c>N</c> is the number of characters to be read from the IO device.</item> -</list> - -<p>A single line (like in the example above) is requested with this <c>Request</c>:</p> -<p> -<em>{get_line, Encoding, Prompt}</em> -</p> - -<list type="bulleted"> -<item><c>Encoding</c> and <c>Prompt</c> as above.</item> -</list> - -<p>Obviously, the <c>get_chars</c> and <c>get_line</c> could be implemented with the -<c>get_until</c> request (and indeed they were originally), but demands for -efficiency has made these additions necessary.</p> - -<p>The I/O server replies to the client with an <c>io_reply</c> tuple where the <c>Reply</c> -element is one of:</p> -<p> -<em>Data</em><br/> -<em>eof</em><br/> -<em>{error, Error}</em> -</p> - -<list type="bulleted"> -<item><c>Data</c> is the characters read, in either list or binary form - (depending on the I/O server mode, see below).</item> -<item><c>Error</c> describes the error to the client, which may do whatever it - wants with it. 
The Erlang <seealso marker="stdlib:io">io</seealso> - module typically returns it as is.</item> -<item><c>eof</c> is returned when input end is reached and no more data is -available to the client process.</item> -</list> - -<p>For backward compatibility the following <c>Request</c>s should also be -handled by an I/O server (these reqeusts should not be present after -R15B of OTP):</p> - -<p> -<em>{get_until, Prompt, Module, Function, ExtraArgs}</em><br/> -<em>{get_chars, Prompt, N}</em><br/> -<em>{get_line, Prompt}</em><br/> -</p> - -<p>These should behave as <c>{get_until, latin1, Prompt, Module, Function, -ExtraArgs}</c>, <c>{get_chars, latin1, Prompt, N}</c> and <c>{get_line, latin1, -Prompt}</c> respectively.</p> -</section> -<section> -<title>I/O-server Modes</title> - -<p>Demands for efficiency when reading data from an I/O server has not -only lead to the addition of the <c>get_line</c> and <c>get_chars</c> requests, but -has also added the concept of I/O server options. No options are -mandatory to implement, but all I/O servers in the Erlang standard -libraries honor the <c>binary</c> option, which allows the <c>Data</c> element of the -<c>io_reply</c> tuple to be a binary instead of a list <em>when possible</em>. -If the data is sent as a binary, Unicode data will be sent in the -standard Erlang Unicode -format, i.e. UTF-8 (note that the function of the <c>get_until</c> request still gets -list data regardless of the I/O server mode).</p> - -<p>Note that i.e. the <c>get_until</c> request allows for a function with the data specified as always being a list. Also the return value data from such a function can be of any type (as is indeed the case when an <c>io:fread</c> request is sent to an I/O server). The client has to be prepared for data received as answers to those requests to be in a variety of forms, but the I/O server should convert the results to binaries whenever possible (i.e. 
when the function supplied to <c>get_until</c> actually returns a list). The example shown later in this text does just that.</p> - -<p>An I/O-server in binary mode will affect the data sent to the client, -so that it has to be able to handle binary data. For convenience, it -is possible to set and retrieve the modes of an I/O server using the -following I/O requests:</p> - -<p> -<em>{setopts, Opts}</em> -</p> - - -<list type="bulleted"> -<item><c>Opts</c> is a list of options in the format recognized by <seealso marker="stdlib:proplists">proplists</seealso> (and - of course by the I/O server itself).</item> -</list> -<p>As an example, the I/O server for the interactive shell (in <c>group.erl</c>) -understands the following options:</p> -<p> -<em>{binary, boolean()}</em> (or <em>binary</em>/<em>list</em>)<br/> -<em>{echo, boolean()}</em><br/> -<em>{expand_fun, fun()}</em><br/> -<em>{encoding, unicode/latin1}</em> (or <em>unicode</em>/<em>latin1</em>) -</p> - -<p>- of which the <c>binary</c> and <c>encoding</c> options are common for all -I/O servers in OTP, while <c>echo</c> and <c>expand</c> are valid only for this -I/O server. It is worth noting that the <c>unicode</c> option notifies how -characters are actually put on the physical IO device, i.e. 
if the -terminal per se is Unicode aware, it does not affect how characters -are sent in the I/O-protocol, where each request contains encoding -information for the provided or returned data.</p> - -<p>The I/O server should send one of the following as <c>Reply</c>:</p> -<p> -<em>ok</em><br/> -<em>{error, Error}</em> -</p> - -<p>An error (preferably <c>enotsup</c>) is to be expected if the option is -not supported by the I/O server (like if an <c>echo</c> option is sent in a -<c>setopts</c> request to a plain file).</p> - -<p>To retrieve options, this request is used:</p> -<p> -<em>getopts</em> -</p> - -<p>The <c>getopts</c> request asks for a complete list of all options -supported by the I/O server as well as their current values.</p> - -<p>The I/O server replies:</p> -<p> -<em>OptList</em><br/> -<em>{error, Error}</em> -</p> - -<list type="bulleted"> -<item><c>OptList</c> is a list of tuples <c>{Option, Value}</c> where <c>Option</c> is always - an atom.</item> -</list> -</section> -<section> -<title>Multiple I/O Requests</title> - -<p>The <c>Request</c> element can in itself contain several <c>Request</c>s by using -the following format:</p> -<p> -<em>{requests, Requests}</em> -</p> -<list type="bulleted"> -<item><c>Requests</c> is a list of valid <c>io_request</c> tuples for the protocol, they - shall be executed in the order in which they appear in the list and - the execution should continue until one of the requests result in an - error or the list is consumed. 
The result of the last request is - sent back to the client.</item> -</list> - -<p>The I/O server can for a list of requests send any of the valid results in -the reply:</p> - -<p> -<em>ok</em><br/> -<em>{ok, Data}</em><br/> -<em>{ok, Options}</em><br/> -<em>{error, Error}</em> -</p> -<p>- depending on the actual requests in the list.</p> -</section> -<section> -<title>Optional I/O Requests</title> - -<p>The following I/O request is optional to implement and a client -should be prepared for an error return:</p> -<p> -<em>{get_geometry, Geometry}</em> -</p> -<list type="bulleted"> -<item><c>Geometry</c> is either the atom <c>rows</c> or the atom <c>columns</c>.</item> -</list> -<p>The I/O server should send the <c>Reply</c> as:</p> -<p> -<em>{ok, N}</em><br/> -<em>{error, Error}</em> -</p> - -<list type="bulleted"> -<item><c>N</c> is the number of character rows or columns the IO device has, if - applicable to the IO device the I/O server handles, otherwise <c>{error, - enotsup}</c> is a good answer.</item> -</list> -</section> -<section> -<title>Unimplemented Request Types</title> - -<p>If an I/O server encounters a request it does not recognize (i.e. the -<c>io_request</c> tuple is in the expected format, but the actual <c>Request</c> is -unknown), the I/O server should send a valid reply with the error tuple:</p> -<p> -<em>{error, request}</em> -</p> - -<p>This makes it possible to extend the protocol with optional requests -and for the clients to be somewhat backwards compatible.</p> -</section> -<section> -<title>An Annotated and Working Example I/O Server</title> - -<p>An I/O server is any process capable of handling the I/O protocol. There is -no generic I/O server behavior, but could well be. The framework is -simple enough, a process handling incoming requests, usually both -I/O-requests and other IO device-specific requests (for i.e. 
positioning, -closing etc.).</p> - -<p>Our example I/O server stores characters in an ETS table, making up a -fairly crude ram-file (it is probably not useful, but working).</p> - -<p>The module begins with the usual directives, a function to start the -I/O server and a main loop handling the requests:</p> - -<code> + end.</code> + <p>Notice that the last element in the <c>Request</c> tuple + (<c>[$\n]</c>) is appended to the argument list when the function is + called. The function is to be called like + <c>apply(Module, Function, [ State, Data | ExtraArgs ])</c> by the + I/O server.</p> + </item> + </list> + + <p>A fixed number of characters is requested using the following + <c>Request</c>:</p> + + <pre> +{get_chars, Encoding, Prompt, N}</pre> + + <list type="bulleted"> + <item> + <p><c>Encoding</c> and <c>Prompt</c> as for <c>get_until</c>.</p> + </item> + <item> + <p><c>N</c> is the number of characters to be read from the I/O + device.</p> + </item> + </list> + + <p>A single line (as in former example) is requested with the + following <c>Request</c>:</p> + + <pre> +{get_line, Encoding, Prompt}</pre> + + <list type="bulleted"> + <item><c>Encoding</c> and <c>Prompt</c> as for <c>get_until</c>.</item> + </list> + + <p>Clearly, <c>get_chars</c> and <c>get_line</c> could be implemented with + the <c>get_until</c> request (and indeed they were originally), but + demands for efficiency have made these additions necessary.</p> + + <p>The I/O server replies to the client with an <c>io_reply</c> tuple, where + element <c>Reply</c> is one of:</p> + + <pre> +Data +eof +{error, Error}</pre> + + <list type="bulleted"> + <item> + <p><c>Data</c> is the characters read, in list or binary form + (depending on the I/O server mode, see the next section).</p> + </item> + <item> + <p><c>eof</c> is returned when input end is reached and no more data is + available to the client process.</p> + </item> + <item> + <p><c>Error</c> describes the error to the client, which can do + 
whatever it wants with it. The + <seealso marker="stdlib:io"><c>io</c></seealso> module typically + returns it as is.</p> + </item> + </list> + + <p>For backward compatibility, the following <c>Request</c>s are also to be + handled by an I/O server (they are not to be present after + Erlang/OTP R15B):</p> + + <pre> +{get_until, Prompt, Module, Function, ExtraArgs} +{get_chars, Prompt, N} +{get_line, Prompt}</pre> + + <p>These are to behave as + <c>{get_until, latin1, Prompt, Module, Function, ExtraArgs}</c>, + <c>{get_chars, latin1, Prompt, N}</c>, and + <c>{get_line, latin1, Prompt}</c>, respectively.</p> + </section> + + <section> + <title>I/O Server Modes</title> + <p>Demands for efficiency when reading data from an I/O server have not only + led to the addition of the <c>get_line</c> and <c>get_chars</c> requests, + but have also added the concept of I/O server options. No options are + mandatory to implement, but all I/O servers in the Erlang standard + libraries honor the <c>binary</c> option, which allows element + <c>Data</c> of the <c>io_reply</c> tuple to be a binary instead of a list + <em>when possible</em>. If the data is sent as a binary, Unicode data is + sent in the standard Erlang Unicode format, that is, UTF-8 (notice that + the function of the <c>get_until</c> request still gets list data + regardless of the I/O server mode).</p> + + <p>Notice that the <c>get_until</c> request allows for a function with the + data specified as always being a list. Also, the return value data from + such a function can be of any type (as is indeed the case when an + <seealso marker="stdlib:io#fread/2"><c>io:fread/2,3</c></seealso> + request is sent to an I/O server). + The client must be prepared for data received as + answers to those requests to be in various forms. However, the I/O + server is to convert the results to binaries whenever possible (that is, + when the function supplied to <c>get_until</c> returns a list). 
This is + done in the example in section + <seealso marker="#example_io_server">An Annotated and Working Example I/O Server</seealso>. + </p> + + <p>An I/O server in binary mode affects the data sent to the client, so that + it must be able to handle binary data. For convenience, the modes of an + I/O server can be set and retrieved using the following I/O requests:</p> + + <pre> +{setopts, Opts}</pre> + + <list type="bulleted"> + <item><c>Opts</c> is a list of options in the format recognized by the + <seealso marker="stdlib:proplists"><c>proplists</c></seealso> module + (and by the I/O server).</item> + </list> + + <p>As an example, the I/O server for the interactive shell (in + <c>group.erl</c>) understands the following options:</p> + + <pre> +{binary, boolean()} (or binary/list) +{echo, boolean()} +{expand_fun, fun()} +{encoding, unicode/latin1} (or unicode/latin1)</pre> + + <p>Options <c>binary</c> and <c>encoding</c> are common for all I/O servers + in OTP, while <c>echo</c> and <c>expand</c> are valid only for this I/O + server. Option <c>unicode</c> notifies how characters are put on the + physical I/O device, that is, if the terminal itself is Unicode-aware. 
+ It does not affect how characters are sent in the I/O protocol, where + each request contains encoding information for the provided or returned + data.</p> + + <p>The I/O server is to send one of the following as <c>Reply</c>:</p> + + <pre> +ok +{error, Error}</pre> + + <p>An error (preferably <c>enotsup</c>) is to be expected if the option is + not supported by the I/O server (like if an <c>echo</c> option is sent in + a <c>setopts</c> request to a plain file).</p> + + <p>To retrieve options, the following request is used:</p> + + <pre> +getopts</pre> + + <p>This request asks for a complete list of all options supported by the + I/O server as well as their current values.</p> + + <p>The I/O server replies:</p> + + <pre> +OptList +{error, Error}</pre> + + <list type="bulleted"> + <item><c>OptList</c> is a list of tuples <c>{Option, Value}</c>, where + <c>Option</c> always is an atom.</item> + </list> + </section> + + <section> + <title>Multiple I/O Requests</title> + <p>The <c>Request</c> element can in itself contain many <c>Request</c>s + by using the following format:</p> + + <pre> +{requests, Requests}</pre> + + <list type="bulleted"> + <item><c>Requests</c> is a list of valid <c>io_request</c> tuples for the + protocol. They must be executed in the order that they appear in + the list. The execution is to continue until one of the requests results + in an error or the list is consumed. 
The result of the last request is + sent back to the client.</item> + </list> + + <p>The I/O server can, for a list of requests, send any of the following + valid results in the reply, depending on the requests in the list:</p> + + <pre> +ok +{ok, Data} +{ok, Options} +{error, Error}</pre> + </section> + + <section> + <title>Optional I/O Request</title> + <p>The following I/O request is optional to implement and a client is to + be prepared for an error return:</p> + + <pre> +{get_geometry, Geometry}</pre> + + <list type="bulleted"> + <item><c>Geometry</c> is the atom <c>rows</c> or the atom + <c>columns</c>.</item> + </list> + + <p>The I/O server is to send the <c>Reply</c> as:</p> + + <pre> +{ok, N} +{error, Error}</pre> + + <list type="bulleted"> + <item><c>N</c> is the number of character rows or columns that the I/O + device has, if applicable to the I/O device handled by the I/O server, + otherwise <c>{error, enotsup}</c> is a good answer.</item> + </list> + </section> + + <section> + <title>Unimplemented Request Types</title> + <p>If an I/O server encounters a request that it does not recognize (that + is, the <c>io_request</c> tuple has the expected format, but the + <c>Request</c> is unknown), the I/O server is to send a valid reply with + the error tuple:</p> + + <pre> +{error, request}</pre> + + <p>This makes it possible to extend the protocol with optional requests + and for the clients to be somewhat backward compatible.</p> + </section> + + <section> + <title>An Annotated and Working Example I/O Server</title> + <marker id="example_io_server"/> + <p>An I/O server is any process capable of handling the I/O protocol. There + is no generic I/O server behavior, but could well be. The framework is + simple, a process handling incoming requests, usually both I/O-requests + and other I/O device-specific requests (positioning, closing, and so on). 
+ </p> + + <p>The example I/O server stores characters in an ETS table, making + up a fairly crude RAM file.</p> + + <p>The module begins with the usual directives, a function to start the + I/O server and a main loop handling the requests:</p> + + <code> -module(ets_io_server). -export([start_link/0, init/0, loop/1, until_newline/3, until_enough/3]). @@ -490,39 +551,34 @@ loop(State) -> ?MODULE:loop(State#state{position = 0}); _Unknown -> ?MODULE:loop(State) - end. -</code> - -<p>The main loop receives messages from the client (which might be using -the <seealso marker="stdlib:io">io</seealso> module to send requests). -For each request the function -<c>request/2</c> is called and a reply is eventually sent using the <c>reply/3</c> -function.</p> + end.</code> -<p>The "private" message <c>{From, rewind}</c> results in the -current position in the pseudo-file to be reset to 0 (the beginning of -the "file"). This is a typical example of IO device-specific -messages not being part of the I/O-protocol. It is usually a bad idea -to embed such private messages in <c>io_request</c> tuples, as that might be -confusing to the reader.</p> + <p>The main loop receives messages from the client (which can use the + <seealso marker="stdlib:io"><c>io</c></seealso> module to send + requests). For each request, the function <c>request/2</c> is called and a + reply is eventually sent using function <c>reply/3</c>.</p> -<p>Let us look at the reply function first...</p> + <p>The "private" message <c>{From, rewind}</c> results in the + current position in the pseudo-file to be reset to <c>0</c> (the beginning + of the "file"). This is a typical example of I/O device-specific + messages not being part of the I/O protocol. It is usually a bad idea to + embed such private messages in <c>io_request</c> tuples, as that can + confuse the reader.</p> -<code> + <p>First, we examine the reply function:</p> + <code> reply(From, ReplyAs, Reply) -> - From ! {io_reply, ReplyAs, Reply}. + From ! 
{io_reply, ReplyAs, Reply}.</code> -</code> + <p>It sends the <c>io_reply</c> tuple back to the client, providing element + <c>ReplyAs</c> received in the request along with the result of the + request, as described earlier.</p> -<p>Simple enough, it sends the <c>io_reply</c> tuple back to the client, -providing the <c>ReplyAs</c> element received in the request along with the -result of the request, as described above.</p> + <p>We need to handle some requests. First the requests for writing + characters:</p> -<p>Now look at the different requests we need to handle. First the -requests for writing characters:</p> - -<code> + <code> request({put_chars, Encoding, Chars}, State) -> put_chars(unicode:characters_to_list(Chars,Encoding),State); request({put_chars, Encoding, Module, Function, Args}, State) -> @@ -531,23 +587,22 @@ request({put_chars, Encoding, Module, Function, Args}, State) -> catch _:_ -> {error, {error,Function}, State} - end; -</code> + end;</code> -<p>The <c>Encoding</c> tells us how the characters in the request are -represented. We want to store the characters as lists in the -ETS table, so we convert them to lists using the -<seealso marker="stdlib:unicode#characters_to_list/2"><c>unicode:characters_to_list/2</c></seealso> function. The conversion function -conveniently accepts the encoding types <c>unicode</c> or <c>latin1</c>, so we can -use <c>Encoding</c> directly.</p> + <p>The <c>Encoding</c> says how the characters in the request are + represented. We want to store the characters as lists in the ETS + table, so we convert them to lists using function + <seealso marker="stdlib:unicode#characters_to_list/2"><c>unicode:characters_to_list/2</c></seealso>. 
+ The conversion function conveniently accepts the encoding types + <c>unicode</c> and <c>latin1</c>, so we can use <c>Encoding</c> directly.</p> -<p>When <c>Module</c>, <c>Function</c> and <c>Arguments</c> are provided, we simply apply it -and do the same thing with the result as if the data was provided -directly.</p> + <p>When <c>Module</c>, <c>Function</c>, and <c>Arguments</c> are provided, + we apply it and do the same with the result as if the data was provided + directly.</p> -<p>Let us handle the requests for retrieving data too:</p> + <p>We handle the requests for retrieving data:</p> -<code> + <code> request({get_until, Encoding, _Prompt, M, F, As}, State) -> get_until(Encoding, M, F, As, State); request({get_chars, Encoding, _Prompt, N}, State) -> @@ -555,17 +610,16 @@ request({get_chars, Encoding, _Prompt, N}, State) -> get_until(Encoding, ?MODULE, until_enough, [N], State); request({get_line, Encoding, _Prompt}, State) -> %% To simplify the code, get_line is implemented using get_until - get_until(Encoding, ?MODULE, until_newline, [$\n], State); -</code> + get_until(Encoding, ?MODULE, until_newline, [$\n], State);</code> -<p>Here we have cheated a little by more or less only implementing -<c>get_until</c> and using internal helpers to implement <c>get_chars</c> and -<c>get_line</c>. In production code, this might be too inefficient, but that -of course depends on the frequency of the different requests. Before -we start actually implementing the functions <c>put_chars/2</c> and -<c>get_until/5</c>, let us look into the few remaining requests:</p> + <p>Here we have cheated a little by more or less only implementing + <c>get_until</c> and using internal helpers to implement <c>get_chars</c> + and <c>get_line</c>. In production code, this can be inefficient, but + that depends on the frequency of the different requests. 
Before we start + implementing functions <c>put_chars/2</c> and <c>get_until/5</c>, we + examine the few remaining requests:</p> -<code> + <code> request({get_geometry,_}, State) -> {error, {error,enotsup}, State}; request({setopts, Opts}, State) -> @@ -573,23 +627,23 @@ request({setopts, Opts}, State) -> request(getopts, State) -> getopts(State); request({requests, Reqs}, State) -> - multi_request(Reqs, {ok, ok, State}); -</code> + multi_request(Reqs, {ok, ok, State});</code> -<p>The <c>get_geometry</c> request has no meaning for this I/O server, so the -reply will be <c>{error, enotsup}</c>. The only option we handle is the -<c>binary</c>/<c>list</c> option, which is done in separate functions.</p> + <p>Request <c>get_geometry</c> has no meaning for this I/O server, so the + reply is <c>{error, enotsup}</c>. The only option we handle is + <c>binary</c>/<c>list</c>, which is done in separate functions.</p> -<p>The multi-request tag (<c>requests</c>) is handled in a separate loop -function applying the requests in the list one after another, -returning the last result.</p> + <p>The multi-request tag (<c>requests</c>) is handled in a separate loop + function applying the requests in the list one after another, returning + the last result.</p> -<p>What is left is to handle backward compatibility and the <seealso marker="kernel:file">file</seealso> module -(which uses the old requests until backward compatibility with pre-R13 -nodes is no longer needed). Note that the I/O server will not work with -a simple <c>file:write/2</c> if these are not added:</p> + <p>We need to handle backward compatibility and the + <seealso marker="kernel:file"><c>file</c></seealso> module (which + uses the old requests until backward compatibility with pre-R13 nodes is + no longer needed). 
Notice that the I/O server does not work with a simple + <c>file:write/2</c> if these are not added:</p> -<code> + <code> request({put_chars,Chars}, State) -> request({put_chars,latin1,Chars}, State); request({put_chars,M,F,As}, State) -> @@ -599,38 +653,35 @@ request({get_chars,Prompt,N}, State) -> request({get_line,Prompt}, State) -> request({get_line,latin1,Prompt}, State); request({get_until, Prompt,M,F,As}, State) -> - request({get_until,latin1,Prompt,M,F,As}, State); -</code> + request({get_until,latin1,Prompt,M,F,As}, State);</code> -<p>OK, what is left now is to return <c>{error, request}</c> if the request is -not recognized:</p> + <p><c>{error, request}</c> must be returned if the request is not + recognized:</p> -<code> + <code> request(_Other, State) -> - {error, {error, request}, State}. -</code> + {error, {error, request}, State}.</code> -<p>Let us move further and actually handle the different requests, first -the fairly generic multi-request type:</p> + <p>Next we handle the different requests, first the fairly generic + multi-request type:</p> -<code> + <code> multi_request([R|Rs], {ok, _Res, State}) -> multi_request(Rs, request(R, State)); multi_request([_|_], Error) -> Error; multi_request([], Result) -> - Result. -</code> + Result.</code> -<p>We loop through the requests one at the time, stopping when we either -encounter an error or the list is exhausted. The last return value is -sent back to the client (it is first returned to the main loop and then -sent back by the function <c>io_reply</c>).</p> + <p>We loop through the requests one at the time, stopping when we either + encounter an error or the list is exhausted. 
The last return value is + sent back to the client (it is first returned to the main loop and then + sent back by function <c>io_reply</c>).</p> -<p>The <c>getopts</c> and <c>setopts</c> requests are also simple to handle, we just -change or read our state record:</p> + <p>Requests <c>getopts</c> and <c>setopts</c> are also simple to handle. + We only change or read the state record:</p> -<code> + <code> setopts(Opts0,State) -> Opts = proplists:unfold( proplists:substitute_negations( @@ -662,46 +713,44 @@ getopts(#state{mode=M} = S) -> true; _ -> false - end}],S}. -</code> + end}],S}.</code> -<p>As a convention, all I/O servers handle both <c>{setopts, [binary]}</c>, -<c>{setopts, [list]}</c> and <c>{setopts,[{binary, boolean()}]}</c>, hence the trick -with <c>proplists:substitute_negations/2</c> and <c>proplists:unfold/1</c>. If -invalid options are sent to us, we send <c>{error, enotsup}</c> back to the -client.</p> + <p>As a convention, all I/O servers handle both <c>{setopts, [binary]}</c>, + <c>{setopts, [list]}</c>, and <c>{setopts,[{binary, boolean()}]}</c>, + hence the trick with <c>proplists:substitute_negations/2</c> and + <c>proplists:unfold/1</c>. If invalid options are sent to us, we send + <c>{error, enotsup}</c> back to the client.</p> -<p>The <c>getopts</c> request should return a list of <c>{Option, Value}</c> tuples, -which has the twofold function of providing both the current values -and the available options of this I/O server. We have only one option, -and hence return that.</p> + <p>Request <c>getopts</c> is to return a list of <c>{Option, Value}</c> + tuples. This has the twofold function of providing both the current values + and the available options of this I/O server. We have only one option, and + hence return that.</p> -<p>So far our I/O server has been fairly generic (except for the <c>rewind</c> -request handled in the main loop and the creation of an ETS table). 
-Most I/O servers contain code similar to the one above.</p> + <p>So far this I/O server is fairly generic (except for request + <c>rewind</c> handled in the main loop and the creation of an ETS + table). Most I/O servers contain code similar to this one.</p> -<p>To make the example runnable, we now start implementing the actual -reading and writing of the data to/from the ETS table. First the -<c>put_chars/3</c> function:</p> + <p>To make the example runnable, we start implementing the reading and + writing of the data to/from the ETS table. First function + <c>put_chars/3</c>:</p> -<code> + <code> put_chars(Chars, #state{table = T, position = P} = State) -> R = P div ?CHARS_PER_REC, C = P rem ?CHARS_PER_REC, [ apply_update(T,U) || U <- split_data(Chars, R, C) ], - {ok, ok, State#state{position = (P + length(Chars))}}. -</code> + {ok, ok, State#state{position = (P + length(Chars))}}.</code> -<p>We already have the data as (Unicode) lists and therefore just split -the list in runs of a predefined size and put each run in the -table at the current position (and forward). The functions -<c>split_data/3</c> and <c>apply_update/2</c> are implemented below.</p> + <p>We already have the data as (Unicode) lists and therefore only split + the list in runs of a predefined size and put each run in the table at + the current position (and forward). Functions <c>split_data/3</c> and + <c>apply_update/2</c> are implemented below.</p> -<p>Now we want to read data from the table. The <c>get_until/5</c> function reads -data and applies the function until it says it is done. The result is -sent back to the client:</p> + <p>Now we want to read data from the table. Function <c>get_until/5</c> + reads data and applies the function until it says that it is done. 
The + result is sent back to the client:</p> -<code> + <code> get_until(Encoding, Mod, Func, As, #state{position = P, mode = M, table = T} = State) -> case get_loop(Mod,Func,As,T,P,[]) of @@ -737,34 +786,34 @@ get_loop(M,F,A,T,P,C) -> get_loop(M,F,A,T,NewP,NewC); _ -> {error,F} - end. -</code> - -<p>Here we also handle the mode (<c>binary</c> or <c>list</c>) that can be set by -the <c>setopts</c> request. By default, all OTP I/O servers send data back to -the client as lists, but switching mode to <c>binary</c> might increase -efficiency if the I/O server handles it in an appropriate way. The -implementation of <c>get_until</c> is hard to get efficient as the supplied -function is defined to take lists as arguments, but <c>get_chars</c> and -<c>get_line</c> can be optimized for binary mode. This example does not -optimize anything however. It is important though that the returned -data is of the right type depending on the options set, so we convert -the lists to binaries in the correct encoding <em>if possible</em> -before returning. The function supplied in the <c>get_until</c> request tuple may, -as its final result return anything, so only functions actually -returning lists can get them converted to binaries. If the request -contained the encoding tag <c>unicode</c>, the lists can contain all Unicode -codepoints and the binaries should be in UTF-8, if the encoding tag -was <c>latin1</c>, the client should only get characters in the range -0..255. The function <c>check/2</c> takes care of not returning arbitrary -Unicode codepoints in lists if the encoding was given as <c>latin1</c>. If -the function did not return a list, the check cannot be performed and -the result will be that of the supplied function untouched.</p> - -<p>Now we are more or less done. 
We implement the utility functions below -to actually manipulate the table:</p> - -<code> + end.</code> + + <p>Here we also handle the mode (<c>binary</c> or <c>list</c>) that can be + set by request <c>setopts</c>. By default, all OTP I/O servers send data + back to the client as lists, but switching mode to <c>binary</c> can + increase efficiency if the I/O server handles it in an appropriate way. + The implementation of <c>get_until</c> is difficult to get efficient, as + the supplied function is defined to take lists as arguments, but + <c>get_chars</c> and <c>get_line</c> can be optimized for binary mode. + However, this example does not optimize anything.</p> + + <p>It is important though that the returned data is of the correct type + depending on the options set. We therefore convert the lists to binaries + in the correct encoding <em>if possible</em> before returning. The + function supplied in the <c>get_until</c> request tuple can, as its final + result return anything, so only functions returning lists can get them + converted to binaries. If the request contains encoding tag + <c>unicode</c>, the lists can contain all Unicode code points and the + binaries are to be in UTF-8. If the encoding tag is <c>latin1</c>, the + client is only to get characters in the range <c>0..255</c>. Function + <c>check/2</c> takes care of not returning arbitrary Unicode code points + in lists if the encoding was specified as <c>latin1</c>. If the function + does not return a list, the check cannot be performed and the result is + that of the supplied function untouched.</p> + + <p>To manipulate the table we implement the following utility functions:</p> + + <code> check(unicode, List) -> List; check(latin1, List) -> @@ -775,18 +824,16 @@ check(latin1, List) -> catch throw:_ -> {error,{cannot_convert, unicode, latin1}} - end. 
-</code> + end.</code> -<p>The function check takes care of providing an error tuple if Unicode -codepoints above 255 is to be returned if the client requested -latin1.</p> + <p>The function check provides an error tuple if Unicode code points > + 255 are to be returned if the client requested <c>latin1</c>.</p> -<p>The two functions <c>until_newline/3</c> and <c>until_enough/3</c> are helpers used -together with the <c>get_until/5</c> function to implement <c>get_chars</c> and -<c>get_line</c> (inefficiently):</p> - -<code> + <p>The two functions <c>until_newline/3</c> and <c>until_enough/3</c> are + helpers used together with function <c>get_until/5</c> to implement + <c>get_chars</c> and <c>get_line</c> (inefficiently):</p> + + <code> until_newline([],eof,_MyStopCharacter) -> {done,eof,[]}; until_newline(ThisFar,eof,_MyStopCharacter) -> @@ -810,16 +857,15 @@ until_enough(ThisFar,CharList,N) {Res,Rest} = my_split(N,ThisFar ++ CharList, []), {done,Res,Rest}; until_enough(ThisFar,CharList,_N) -> - {more,ThisFar++CharList}. -</code> + {more,ThisFar++CharList}.</code> -<p>As can be seen, the functions above are just the type of functions -that should be provided in <c>get_until</c> requests.</p> + <p>As can be seen, the functions above are just the type of functions that + are to be provided in <c>get_until</c> requests.</p> -<p>Now we only need to read and write the table in an appropriate way to -complete the I/O server:</p> + <p>To complete the I/O server, we only need to read and write the table in + an appropriate way:</p> -<code> + <code> get(P,Tab) -> R = P div ?CHARS_PER_REC, C = P rem ?CHARS_PER_REC, @@ -856,18 +902,16 @@ apply_update(Table, {Row, Col, List}) -> {Part1,_} = my_split(Col,OldData,[]), {_,Part2} = my_split(Col+length(List),OldData,[]), ets:insert(Table,{Row, Part1 ++ List ++ Part2}) - end. -</code> - -<p>The table is read or written in chunks of <c>?CHARS_PER_REC</c>, overwriting -when necessary. 
The implementation is obviously not efficient, it is -just working.</p> - -<p>This concludes the example. It is fully runnable and you can read or -write to the I/O server by using i.e. the <seealso marker="stdlib:io">io</seealso> module or even the <seealso marker="kernel:file">file</seealso> -module. It is as simple as that to implement a fully fledged I/O server -in Erlang.</p> -</section> + end.</code> + + <p>The table is read or written in chunks of <c>?CHARS_PER_REC</c>, + overwriting when necessary. The implementation is clearly not efficient, + it is just working.</p> + + <p>This concludes the example. It is fully runnable and you can read or + write to the I/O server by using, for example, the + <seealso marker="stdlib:io"><c>io</c></seealso> module or even the + <seealso marker="kernel:file"><c>file</c></seealso> module. It is + as simple as that to implement a fully fledged I/O server in Erlang.</p> + </section> </chapter> - - diff --git a/lib/stdlib/doc/src/lib.xml b/lib/stdlib/doc/src/lib.xml index ac41987eaf..58dad7c9e0 100644 --- a/lib/stdlib/doc/src/lib.xml +++ b/lib/stdlib/doc/src/lib.xml @@ -29,68 +29,73 @@ <rev></rev> </header> <module>lib</module> - <modulesummary>A number of useful library functions</modulesummary> + <modulesummary>Useful library functions.</modulesummary> <description> <warning> - <p>This module is retained for compatibility. It may disappear - without warning in a future release.</p> + <p>This module is retained for backward compatibility. 
It can disappear + without warning in a future Erlang/OTP release.</p> </warning> </description> + <funcs> <func> - <name name="flush_receive" arity="0"/> - <fsummary>Flush messages</fsummary> - <desc> - <p>Flushes the message buffer of the current process.</p> - </desc> - </func> - <func> <name name="error_message" arity="2"/> - <fsummary>Print error message</fsummary> + <fsummary>Print error message.</fsummary> <desc> <p>Prints error message <c><anno>Args</anno></c> in accordance with - <c><anno>Format</anno></c>. Similar to <c>io:format/2</c>, see - <seealso marker="io#fwrite/1">io(3)</seealso>.</p> + <c><anno>Format</anno></c>. Similar to + <seealso marker="io#format/1"><c>io:format/2</c></seealso>.</p> </desc> </func> + <func> - <name name="progname" arity="0"/> - <fsummary>Return name of Erlang start script</fsummary> + <name name="flush_receive" arity="0"/> + <fsummary>Flush messages.</fsummary> <desc> - <p>Returns the name of the script that started the current - Erlang session.</p> + <p>Flushes the message buffer of the current process.</p> </desc> </func> + <func> <name name="nonl" arity="1"/> - <fsummary>Remove last newline</fsummary> + <fsummary>Remove last newline.</fsummary> <desc> <p>Removes the last newline character, if any, in <c><anno>String1</anno></c>.</p> </desc> </func> + + <func> + <name name="progname" arity="0"/> + <fsummary>Return name of Erlang start script.</fsummary> + <desc> + <p>Returns the name of the script that started the current + Erlang session.</p> + </desc> + </func> + <func> <name name="send" arity="2"/> - <fsummary>Send a message</fsummary> + <fsummary>Send a message.</fsummary> <desc> - <p>This function to makes it possible to send a message using - the <c>apply/3</c> BIF.</p> + <p>Makes it possible to send a message using the <c>apply/3</c> BIF.</p> </desc> </func> + <func> <name name="sendw" arity="2"/> - <fsummary>Send a message and wait for an answer</fsummary> + <fsummary>Send a message and wait for an 
answer.</fsummary> <desc> - <p>As <c>send/2</c>, but waits for an answer. It is implemented - as follows:</p> + <p>As <seealso marker="#send/2"><c>send/2</c></seealso>, + but waits for an answer. It is implemented as follows:</p> <code type="none"> sendw(To, Msg) -> To ! {self(),Msg}, receive Reply -> Reply end.</code> - <p>The message returned is not necessarily a reply to the - message sent.</p> + <p>The returned message is not necessarily a reply to the sent + message.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/lists.xml b/lib/stdlib/doc/src/lists.xml index 03d0063599..60dbae70c2 100644 --- a/lib/stdlib/doc/src/lists.xml +++ b/lib/stdlib/doc/src/lists.xml @@ -25,11 +25,11 @@ <title>lists</title> <prepared>Robert Virding</prepared> <docno>1</docno> - <date>96-09-28</date> + <date>1996-09-28</date> <rev>A</rev> </header> <module>lists</module> - <modulesummary>List Processing Functions</modulesummary> + <modulesummary>List processing functions.</modulesummary> <description> <p>This module contains functions for list processing.</p> @@ -44,132 +44,156 @@ <p>Whenever an <marker id="ordering_function"></marker><em>ordering function</em> <c>F</c> is expected as argument, it is assumed that the - following properties hold of <c>F</c> for all x, y and z:</p> + following properties hold of <c>F</c> for all x, y, and z:</p> + <list type="bulleted"> - <item><p>if x <c>F</c> y and y <c>F</c> x then x = y (<c>F</c> - is antisymmetric);</p> + <item><p>If x <c>F</c> y and y <c>F</c> x, then x = y (<c>F</c> + is antisymmetric).</p> </item> - <item><p>if x <c>F</c> y and y <c>F</c> z then x <c>F</c> z - (<c>F</c> is transitive);</p> + <item><p>If x <c>F</c> y and y <c>F</c> z, then x <c>F</c> z + (<c>F</c> is transitive).</p> </item> <item><p>x <c>F</c> y or y <c>F</c> x (<c>F</c> is total).</p> </item> </list> - <p>An example of a typical ordering function is less than or equal - to, <c>=</2</c>.</p> + <p>An example of a typical ordering function is less than 
or equal + to: <c>=</2</c>.</p> </description> + <funcs> <func> <name name="all" arity="2"/> - <fsummary>Return true if all elements in the list satisfy<c>Pred</c></fsummary> + <fsummary>Return <c>true</c> if all elements in a list satisfy + <c>Pred</c>.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns - <c>true</c> for all elements <c><anno>Elem</anno></c> in <c><anno>List</anno></c>, - otherwise <c>false</c>.</p> + <p>Returns <c>true</c> if <c><anno>Pred</anno>(<anno>Elem</anno>)</c> + returns <c>true</c> for all elements <c><anno>Elem</anno></c> in + <c><anno>List</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="any" arity="2"/> - <fsummary>Return true if any of the elements in the list satisfies<c>Pred</c></fsummary> + <fsummary>Return <c>true</c> if any of the elements in a list + satisfies <c>Pred</c>.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns - <c>true</c> for at least one element <c><anno>Elem</anno></c> in - <c><anno>List</anno></c>.</p> + <p>Returns <c>true</c> if <c><anno>Pred</anno>(<anno>Elem</anno>)</c> + returns <c>true</c> for at least one element <c><anno>Elem</anno></c> + in <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="append" arity="1"/> - <fsummary>Append a list of lists</fsummary> + <fsummary>Append a list of lists.</fsummary> <desc> - <p>Returns a list in which all the sub-lists of - <c><anno>ListOfLists</anno></c> have been appended. 
For example:</p> + <p>Returns a list in which all the sublists of + <c><anno>ListOfLists</anno></c> have been appended.</p> + <p><em>Example:</em></p> <pre> > <input>lists:append([[1, 2, 3], [a, b], [4, 5, 6]]).</input> [1,2,3,a,b,4,5,6]</pre> </desc> </func> + <func> <name name="append" arity="2"/> - <fsummary>Append two lists</fsummary> + <fsummary>Append two lists.</fsummary> <desc> - <p>Returns a new list <c><anno>List3</anno></c> which is made from + <p>Returns a new list <c><anno>List3</anno></c>, which is made from the elements of <c><anno>List1</anno></c> followed by the elements of - <c><anno>List2</anno></c>. For example:</p> + <c><anno>List2</anno></c>.</p> + <p><em>Example:</em></p> <pre> > <input>lists:append("abc", "def").</input> "abcdef"</pre> <p><c>lists:append(A, B)</c> is equivalent to <c>A ++ B</c>.</p> </desc> </func> + <func> <name name="concat" arity="1"/> - <fsummary>Concatenate a list of atoms</fsummary> + <fsummary>Concatenate a list of atoms.</fsummary> <desc> - <p>Concatenates the text representation of the elements - of <c><anno>Things</anno></c>. The elements of <c><anno>Things</anno></c> can be atoms, - integers, floats or strings.</p> + <p>Concatenates the text representation of the elements of + <c><anno>Things</anno></c>. 
The elements of <c><anno>Things</anno></c> + can be atoms, integers, floats, or strings.</p> + <p><em>Example:</em></p> <pre> > <input>lists:concat([doc, '/', file, '.', 3]).</input> "doc/file.3"</pre> </desc> </func> + <func> <name name="delete" arity="2"/> - <fsummary>Delete an element from a list</fsummary> + <fsummary>Delete an element from a list.</fsummary> <desc> <p>Returns a copy of <c><anno>List1</anno></c> where the first element matching <c><anno>Elem</anno></c> is deleted, if there is such an element.</p> </desc> </func> + <func> <name name="droplast" arity="1"/> - <fsummary>Drop the last element of a list</fsummary> + <fsummary>Drop the last element of a list.</fsummary> <desc> - <p>Drops the last element of a <c><anno>List</anno></c>. The list should - be non-empty, otherwise the function will crash with a <c>function_clause</c></p> + <p>Drops the last element of a <c><anno>List</anno></c>. The list is to + be non-empty, otherwise the function crashes with a + <c>function_clause</c>.</p> </desc> </func> + <func> <name name="dropwhile" arity="2"/> - <fsummary>Drop elements from a list while a predicate is true</fsummary> + <fsummary>Drop elements from a list while a predicate is <c>true</c>. + </fsummary> <desc> - <p>Drops elements <c><anno>Elem</anno></c> from <c><anno>List1</anno></c> while - <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c> and returns - the remaining list.</p> + <p>Drops elements <c><anno>Elem</anno></c> from + <c><anno>List1</anno></c> while + <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c> and + returns the remaining list.</p> </desc> </func> + <func> <name name="duplicate" arity="2"/> - <fsummary>Make N copies of element</fsummary> + <fsummary>Make <c>N</c> copies of element.</fsummary> <desc> - <p>Returns a list which contains <c><anno>N</anno></c> copies of the term - <c><anno>Elem</anno></c>. 
For example:</p> + <p>Returns a list containing <c><anno>N</anno></c> copies of term + <c><anno>Elem</anno></c>.</p> + <p><em>Example:</em></p> <pre> > <input>lists:duplicate(5, xx).</input> [xx,xx,xx,xx,xx]</pre> </desc> </func> + <func> <name name="filter" arity="2"/> - <fsummary>Choose elements which satisfy a predicate</fsummary> + <fsummary>Select elements that satisfy a predicate.</fsummary> <desc> - <p><c><anno>List2</anno></c> is a list of all elements <c><anno>Elem</anno></c> in - <c><anno>List1</anno></c> for which <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns - <c>true</c>.</p> + <p><c><anno>List2</anno></c> is a list of all elements + <c><anno>Elem</anno></c> in <c><anno>List1</anno></c> for which + <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c>.</p> </desc> </func> + <func> <name name="filtermap" arity="2"/> - <fsummary>Filter and map elements which satisfy a function</fsummary> - <desc> - <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>)</c> on successive elements <c>Elem</c> - of <c><anno>List1</anno></c>. <c><anno>Fun</anno>/2</c> must return either a boolean - or a tuple <c>{true, <anno>Value</anno>}</c>. The function returns the list of elements - for which <c><anno>Fun</anno></c> returns a new value, where a value of <c>true</c> - is synonymous with <c>{true, <anno>Elem</anno>}</c>.</p> - <p>That is, <c>filtermap</c> behaves as if it had been defined as follows:</p> + <fsummary>Filter and map elements that satisfy a function.</fsummary> + <desc> + <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>)</c> on successive + elements <c>Elem</c> of <c><anno>List1</anno></c>. + <c><anno>Fun</anno>/2</c> must return either a Boolean or a tuple + <c>{true, <anno>Value</anno>}</c>. 
The function returns the list of + elements for which <c><anno>Fun</anno></c> returns a new value, where + a value of <c>true</c> is synonymous with + <c>{true, <anno>Elem</anno>}</c>.</p> + <p>That is, <c>filtermap</c> behaves as if it had been defined as + follows:</p> <code type="none"> filtermap(Fun, List1) -> lists:foldr(fun(Elem, Acc) -> @@ -179,26 +203,29 @@ filtermap(Fun, List1) -> {true,Value} -> [Value|Acc] end end, [], List1).</code> - <p>Example:</p> + <p><em>Example:</em></p> <pre> > <input>lists:filtermap(fun(X) -> case X rem 2 of 0 -> {true, X div 2}; _ -> false end end, [1,2,3,4,5]).</input> [1,2]</pre> </desc> </func> + <func> <name name="flatlength" arity="1"/> - <fsummary>Length of flattened deep list</fsummary> + <fsummary>Length of flattened deep list.</fsummary> <desc> - <p>Equivalent to <c>length(flatten(<anno>DeepList</anno>))</c>, but more - efficient.</p> + <p>Equivalent to <c>length(flatten(<anno>DeepList</anno>))</c>, but + more efficient.</p> </desc> </func> + <func> <name name="flatmap" arity="2"/> - <fsummary>Map and flatten in one pass</fsummary> + <fsummary>Map and flatten in one pass.</fsummary> <desc> - <p>Takes a function from <c><anno>A</anno></c>s to lists of <c><anno>B</anno></c>s, and a - list of <c><anno>A</anno></c>s (<c><anno>List1</anno></c>) and produces a list of + <p>Takes a function from <c><anno>A</anno></c>s to lists of + <c><anno>B</anno></c>s, and a list of <c><anno>A</anno></c>s + (<c><anno>List1</anno></c>) and produces a list of <c><anno>B</anno></c>s by applying the function to every element in <c><anno>List1</anno></c> and appending the resulting lists.</p> <p>That is, <c>flatmap</c> behaves as if it had been defined as @@ -206,37 +233,42 @@ filtermap(Fun, List1) -> <code type="none"> flatmap(Fun, List1) -> append(map(Fun, List1)).</code> - <p>Example:</p> + <p><em>Example:</em></p> <pre> > <input>lists:flatmap(fun(X)->[X,X] end, [a,b,c]).</input> [a,a,b,b,c,c]</pre> </desc> </func> + <func> <name 
name="flatten" arity="1"/> - <fsummary>Flatten a deep list</fsummary> + <fsummary>Flatten a deep list.</fsummary> <desc> <p>Returns a flattened version of <c><anno>DeepList</anno></c>.</p> </desc> </func> + <func> <name name="flatten" arity="2"/> - <fsummary>Flatten a deep list</fsummary> + <fsummary>Flatten a deep list.</fsummary> <desc> - <p>Returns a flattened version of <c><anno>DeepList</anno></c> with the tail + <p>Returns a flattened version of <c><anno>DeepList</anno></c> with tail <c><anno>Tail</anno></c> appended.</p> </desc> </func> + <func> <name name="foldl" arity="3"/> - <fsummary>Fold a function over a list</fsummary> - <desc> - <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>, <anno>AccIn</anno>)</c> on successive elements <c>A</c> - of <c><anno>List</anno></c>, starting with <c><anno>AccIn</anno> == <anno>Acc0</anno></c>. - <c><anno>Fun</anno>/2</c> must return a new accumulator which is passed to - the next call. The function returns the final value of - the accumulator. <c><anno>Acc0</anno></c> is returned if the list is empty. - For example:</p> + <fsummary>Fold a function over a list.</fsummary> + <desc> + <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>, <anno>AccIn</anno>)</c> + on successive elements <c>A</c> of <c><anno>List</anno></c>, starting + with <c><anno>AccIn</anno> == <anno>Acc0</anno></c>. + <c><anno>Fun</anno>/2</c> must return a new accumulator, which is + passed to the next call. The function returns the final value of + the accumulator. <c><anno>Acc0</anno></c> is returned if the list is + empty.</p> + <p><em>Example:</em></p> <pre> > <input>lists:foldl(fun(X, Sum) -> X + Sum end, 0, [1,2,3,4,5]).</input> 15 @@ -244,12 +276,14 @@ flatmap(Fun, List1) -> 120</pre> </desc> </func> + <func> <name name="foldr" arity="3"/> - <fsummary>Fold a function over a list</fsummary> + <fsummary>Fold a function over a list.</fsummary> <desc> - <p>Like <c>foldl/3</c>, but the list is traversed from right to - left. 
For example:</p> + <p>Like <seealso marker="#foldl/3"><c>foldl/3</c></seealso>, but the + list is traversed from right to left.</p> + <p><em>Example:</em></p> <pre> > <input>P = fun(A, AccIn) -> io:format("~p ", [A]), AccIn end.</input> #Fun<erl_eval.12.2225172> @@ -257,10 +291,11 @@ flatmap(Fun, List1) -> 1 2 3 void > <input>lists:foldr(P, void, [1,2,3]).</input> 3 2 1 void</pre> - <p><c>foldl/3</c> is tail recursive and would usually be - preferred to <c>foldr/3</c>.</p> + <p><c>foldl/3</c> is tail recursive and is usually preferred to + <c>foldr/3</c>.</p> </desc> </func> + <func> <name name="join" arity="2"/> <fsummary>Insert an element between elements in a list</fsummary> @@ -278,45 +313,52 @@ flatmap(Fun, List1) -> </func> <func> <name name="foreach" arity="2"/> - <fsummary>Apply a function to each element of a list</fsummary> + <fsummary>Apply a function to each element of a list.</fsummary> <desc> - <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>)</c> for each element <c><anno>Elem</anno></c> in - <c><anno>List</anno></c>. This function is used for its side effects and + <p>Calls <c><anno>Fun</anno>(<anno>Elem</anno>)</c> for each element + <c><anno>Elem</anno></c> in <c><anno>List</anno></c>. 
This function + is used for its side effects and the evaluation order is defined to be the same as the order of the elements in the list.</p> </desc> </func> + <func> <name name="keydelete" arity="3"/> - <fsummary>Delete an element from a list of tuples</fsummary> + <fsummary>Delete an element from a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Returns a copy of <c><anno>TupleList1</anno></c> where the first - occurrence of a tuple whose <c><anno>N</anno></c>th element compares equal to + occurrence of a tuple whose <c><anno>N</anno></c>th element compares + equal to <c><anno>Key</anno></c> is deleted, if there is such a tuple.</p> </desc> </func> + <func> <name name="keyfind" arity="3"/> - <fsummary>Search for an element in a list of tuples</fsummary> + <fsummary>Search for an element in a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Searches the list of tuples <c><anno>TupleList</anno></c> for a - tuple whose <c><anno>N</anno></c>th element compares equal to <c><anno>Key</anno></c>. + tuple whose <c><anno>N</anno></c>th element compares equal to + <c><anno>Key</anno></c>. 
Returns <c><anno>Tuple</anno></c> if such a tuple is found, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="keymap" arity="3"/> - <fsummary>Map a function over a list of tuples</fsummary> + <fsummary>Map a function over a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Returns a list of tuples where, for each tuple in - <c><anno>TupleList1</anno></c>, the <c><anno>N</anno></c>th element <c><anno>Term1</anno></c> of the tuple + <c><anno>TupleList1</anno></c>, the <c><anno>N</anno></c>th element + <c><anno>Term1</anno></c> of the tuple has been replaced with the result of calling <c><anno>Fun</anno>(<anno>Term1</anno>)</c>.</p> - <p>Examples:</p> + <p><em>Examples:</em></p> <pre> > <input>Fun = fun(Atom) -> atom_to_list(Atom) end.</input> #Fun<erl_eval.6.10732646> @@ -324,33 +366,37 @@ flatmap(Fun, List1) -> [{name,"jane",22},{name,"lizzie",20},{name,"lydia",15}]</pre> </desc> </func> + <func> <name name="keymember" arity="3"/> - <fsummary>Test for membership of a list of tuples</fsummary> + <fsummary>Test for membership of a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Returns <c>true</c> if there is a tuple in <c><anno>TupleList</anno></c> - whose <c><anno>N</anno></c>th element compares equal to <c><anno>Key</anno></c>, otherwise - <c>false</c>.</p> + <p>Returns <c>true</c> if there is a tuple in + <c><anno>TupleList</anno></c> whose <c><anno>N</anno></c>th element + compares equal to <c><anno>Key</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="keymerge" arity="3"/> - <fsummary>Merge two key-sorted lists of tuples</fsummary> + <fsummary>Merge two key-sorted lists of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Returns the sorted list formed by merging <c><anno>TupleList1</anno></c> - and <c><anno>TupleList2</anno></c>. 
The merge is performed on - the <c><anno>N</anno></c>th element of each tuple. Both <c><anno>TupleList1</anno></c> and - <c><anno>TupleList2</anno></c> must be key-sorted prior to evaluating this - function. When two tuples compare equal, the tuple from + <p>Returns the sorted list formed by merging + <c><anno>TupleList1</anno></c> and <c><anno>TupleList2</anno></c>. + The merge is performed on the <c><anno>N</anno></c>th element of each + tuple. Both <c><anno>TupleList1</anno></c> and + <c><anno>TupleList2</anno></c> must be key-sorted before evaluating + this function. When two tuples compare equal, the tuple from <c><anno>TupleList1</anno></c> is picked before the tuple from <c><anno>TupleList2</anno></c>.</p> </desc> </func> + <func> <name name="keyreplace" arity="4"/> - <fsummary>Replace an element in a list of tuples</fsummary> + <fsummary>Replace an element in a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Returns a copy of <c><anno>TupleList1</anno></c> where the first @@ -359,193 +405,226 @@ flatmap(Fun, List1) -> <c><anno>NewTuple</anno></c>, if there is such a tuple <c>T</c>.</p> </desc> </func> + <func> <name name="keysearch" arity="3"/> - <fsummary>Search for an element in a list of tuples</fsummary> + <fsummary>Search for an element in a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Searches the list of tuples <c><anno>TupleList</anno></c> for a - tuple whose <c><anno>N</anno></c>th element compares equal to <c><anno>Key</anno></c>. + tuple whose <c><anno>N</anno></c>th element compares equal to + <c><anno>Key</anno></c>. Returns <c>{value, <anno>Tuple</anno>}</c> if such a tuple is found, otherwise <c>false</c>.</p> - <note><p>This function is retained for backward compatibility. 
- The function <c>lists:keyfind/3</c> (introduced in R13A) - is in most cases more convenient.</p></note> + <note> + <p>This function is retained for backward compatibility. Function + <seealso marker="#keyfind/3"><c>keyfind/3</c></seealso> + is usually more convenient.</p> + </note> </desc> </func> + <func> <name name="keysort" arity="2"/> - <fsummary>Sort a list of tuples</fsummary> + <fsummary>Sort a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Returns a list containing the sorted elements of the list - <c><anno>TupleList1</anno></c>. Sorting is performed on the <c><anno>N</anno></c>th - element of the tuples. The sort is stable.</p> + <p>Returns a list containing the sorted elements of list + <c><anno>TupleList1</anno></c>. Sorting is performed on the + <c><anno>N</anno></c>th element of the tuples. The sort is stable.</p> </desc> </func> + <func> <name name="keystore" arity="4"/> - <fsummary>Store an element in a list of tuples</fsummary> + <fsummary>Store an element in a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> <p>Returns a copy of <c><anno>TupleList1</anno></c> where the first occurrence of a tuple <c>T</c> whose <c><anno>N</anno></c>th element compares equal to <c><anno>Key</anno></c> is replaced with - <c><anno>NewTuple</anno></c>, if there is such a tuple <c>T</c>. If there - is no such tuple <c>T</c> a copy of <c><anno>TupleList1</anno></c> where + <c><anno>NewTuple</anno></c>, if there is such a tuple <c>T</c>. 
+ If there is no such tuple <c>T</c>, a copy of + <c><anno>TupleList1</anno></c> where [<c><anno>NewTuple</anno></c>] has been appended to the end is returned.</p> </desc> </func> + <func> <name name="keytake" arity="3"/> - <fsummary>Extract an element from a list of tuples</fsummary> + <fsummary>Extract an element from a list of tuples.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Searches the list of tuples <c><anno>TupleList1</anno></c> for a tuple - whose <c><anno>N</anno></c>th element compares equal to <c><anno>Key</anno></c>. - Returns <c>{value, <anno>Tuple</anno>, <anno>TupleList2</anno>}</c> if such a tuple is - found, otherwise <c>false</c>. <c><anno>TupleList2</anno></c> is a copy + <p>Searches the list of tuples <c><anno>TupleList1</anno></c> for a + tuple whose <c><anno>N</anno></c>th element compares equal to + <c><anno>Key</anno></c>. Returns <c>{value, <anno>Tuple</anno>, + <anno>TupleList2</anno>}</c> if such a tuple is found, otherwise + <c>false</c>. <c><anno>TupleList2</anno></c> is a copy of <c><anno>TupleList1</anno></c> where the first occurrence of <c><anno>Tuple</anno></c> has been removed.</p> </desc> </func> + <func> <name name="last" arity="1"/> - <fsummary>Return last element in a list</fsummary> + <fsummary>Return last element in a list.</fsummary> <desc> <p>Returns the last element in <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="map" arity="2"/> - <fsummary>Map a function over a list</fsummary> + <fsummary>Map a function over a list.</fsummary> <desc> - <p>Takes a function from <c><anno>A</anno></c>s to <c><anno>B</anno></c>s, and a list of - <c><anno>A</anno></c>s and produces a list of <c><anno>B</anno></c>s by applying + <p>Takes a function from <c><anno>A</anno></c>s to + <c><anno>B</anno></c>s, and a list of <c><anno>A</anno></c>s and + produces a list of <c><anno>B</anno></c>s by applying the function to every element in the list. 
This function is - used to obtain the return values. The evaluation order is - implementation dependent.</p> + used to obtain the return values. The evaluation order depends on + the implementation.</p> </desc> </func> + <func> <name name="mapfoldl" arity="3"/> - <fsummary>Map and fold in one pass</fsummary> + <fsummary>Map and fold in one pass.</fsummary> <desc> - <p><c>mapfoldl</c> combines the operations of <c>map/2</c> and - <c>foldl/3</c> into one pass. An example, summing - the elements in a list and double them at the same time:</p> + <p>Combines the operations of + <seealso marker="#map/2"><c>map/2</c></seealso> and + <seealso marker="#foldl/3"><c>foldl/3</c></seealso> into one pass.</p> + <p><em>Example:</em></p> + <p>Summing the elements in a list and double them at the same time:</p> <pre> > <input>lists:mapfoldl(fun(X, Sum) -> {2*X, X+Sum} end,</input> <input>0, [1,2,3,4,5]).</input> {[2,4,6,8,10],15}</pre> </desc> </func> + <func> <name name="mapfoldr" arity="3"/> - <fsummary>Map and fold in one pass</fsummary> + <fsummary>Map and fold in one pass.</fsummary> <desc> - <p><c>mapfoldr</c> combines the operations of <c>map/2</c> and - <c>foldr/3</c> into one pass.</p> + <p>Combines the operations of + <seealso marker="#map/2"><c>map/2</c></seealso> and + <seealso marker="#foldr/3"><c>foldr/3</c></seealso> into one pass.</p> </desc> </func> + <func> <name name="max" arity="1"/> - <fsummary>Return maximum element of a list</fsummary> + <fsummary>Return maximum element of a list.</fsummary> <desc> <p>Returns the first element of <c><anno>List</anno></c> that compares greater than or equal to all other elements of <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="member" arity="2"/> - <fsummary>Test for membership of a list</fsummary> + <fsummary>Test for membership of a list.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Elem</anno></c> matches some element of - <c><anno>List</anno></c>, otherwise <c>false</c>.</p> + <p>Returns 
<c>true</c> if <c><anno>Elem</anno></c> matches some element + of <c><anno>List</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="merge" arity="1"/> - <fsummary>Merge a list of sorted lists</fsummary> + <fsummary>Merge a list of sorted lists.</fsummary> <desc> - <p>Returns the sorted list formed by merging all the sub-lists - of <c><anno>ListOfLists</anno></c>. All sub-lists must be sorted prior to + <p>Returns the sorted list formed by merging all the sublists of + <c><anno>ListOfLists</anno></c>. All sublists must be sorted before evaluating this function. When two elements compare equal, - the element from the sub-list with the lowest position in - <c><anno>ListOfLists</anno></c> is picked before the other element.</p> + the element from the sublist with the lowest position in + <c><anno>ListOfLists</anno></c> is picked before the other + element.</p> </desc> </func> + <func> <name name="merge" arity="2"/> - <fsummary>Merge two sorted lists</fsummary> + <fsummary>Merge two sorted lists.</fsummary> <desc> - <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> and - <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and <c><anno>List2</anno></c> must be - sorted prior to evaluating this function. When two elements + <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> + and <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and + <c><anno>List2</anno></c> must be + sorted before evaluating this function. When two elements compare equal, the element from <c><anno>List1</anno></c> is picked before the element from <c><anno>List2</anno></c>.</p> </desc> </func> + <func> <name name="merge" arity="3"/> - <fsummary>Merge two sorted list</fsummary> + <fsummary>Merge two sorted list.</fsummary> <desc> - <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> and - <c><anno>List2</anno></c>. 
Both <c><anno>List1</anno></c> and <c><anno>List2</anno></c> must be - sorted according to the <seealso + <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> + and <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and + <c><anno>List2</anno></c> must be sorted according to the <seealso marker="#ordering_function">ordering function</seealso> - <c><anno>Fun</anno></c> prior to evaluating this function. <c><anno>Fun</anno>(<anno>A</anno>, - <anno>B</anno>)</c> should return <c>true</c> if <c><anno>A</anno></c> compares less - than or equal to <c><anno>B</anno></c> in the ordering, <c>false</c> - otherwise. When two elements compare equal, the element from + <c><anno>Fun</anno></c> before evaluating this function. + <c><anno>Fun</anno>(<anno>A</anno>, <anno>B</anno>)</c> is to return + <c>true</c> if <c><anno>A</anno></c> compares less + than or equal to <c><anno>B</anno></c> in the ordering, otherwise + <c>false</c>. When two elements compare equal, the element from <c><anno>List1</anno></c> is picked before the element from <c><anno>List2</anno></c>.</p> </desc> </func> + <func> <name name="merge3" arity="3"/> - <fsummary>Merge three sorted lists</fsummary> + <fsummary>Merge three sorted lists.</fsummary> <desc> <p>Returns the sorted list formed by merging <c><anno>List1</anno></c>, - <c><anno>List2</anno></c> and <c><anno>List3</anno></c>. All of <c><anno>List1</anno></c>, - <c><anno>List2</anno></c> and <c><anno>List3</anno></c> must be sorted prior to - evaluating this function. When two elements compare equal, - the element from <c><anno>List1</anno></c>, if there is such an element, + <c><anno>List2</anno></c>, and <c><anno>List3</anno></c>. All of + <c><anno>List1</anno></c>, <c><anno>List2</anno></c>, and + <c><anno>List3</anno></c> must be sorted before evaluating this + function. 
When two elements compare equal, the element from + <c><anno>List1</anno></c>, if there is such an element, is picked before the other element, otherwise the element from <c><anno>List2</anno></c> is picked before the element from <c><anno>List3</anno></c>.</p> </desc> </func> + <func> <name name="min" arity="1"/> - <fsummary>Return minimum element of a list</fsummary> + <fsummary>Return minimum element of a list.</fsummary> <desc> <p>Returns the first element of <c><anno>List</anno></c> that compares less than or equal to all other elements of <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="nth" arity="2"/> - <fsummary>Return the Nth element of a list</fsummary> + <fsummary>Return the <c>N</c>th element of a list.</fsummary> <type_desc variable="N">1..length(<anno>List</anno>)</type_desc> <desc> - <p>Returns the <c><anno>N</anno></c>th element of <c><anno>List</anno></c>. For example:</p> + <p>Returns the <c><anno>N</anno></c>th element of + <c><anno>List</anno></c>.</p> + <p><em>Example:</em></p> <pre> > <input>lists:nth(3, [a, b, c, d, e]).</input> c</pre> </desc> </func> + <func> <name name="nthtail" arity="2"/> - <fsummary>Return the Nth tail of a list</fsummary> + <fsummary>Return the <c>N</c>th tail of a list.</fsummary> <type_desc variable="N">0..length(<anno>List</anno>)</type_desc> <desc> - <p>Returns the <c><anno>N</anno></c>th tail of <c><anno>List</anno></c>, that is, the sublist of - <c><anno>List</anno></c> starting at <c><anno>N</anno>+1</c> and continuing up to - the end of the list. 
For example:</p> + <p>Returns the <c><anno>N</anno></c>th tail of <c><anno>List</anno></c>, + that is, the sublist of <c><anno>List</anno></c> starting at + <c><anno>N</anno>+1</c> and continuing up to the end of the list.</p> + <p><em>Example</em></p> <pre> > <input>lists:nthtail(3, [a, b, c, d, e]).</input> [d,e] @@ -557,70 +636,91 @@ c</pre> []</pre> </desc> </func> + <func> <name name="partition" arity="2"/> - <fsummary>Partition a list into two lists based on a predicate</fsummary> - <desc> - <p>Partitions <c><anno>List</anno></c> into two lists, where the first list - contains all elements for which <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns - <c>true</c>, and the second list contains all elements for - which <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>false</c>.</p> - <p>Examples:</p> + <fsummary>Partition a list into two lists based on a predicate.</fsummary> + <desc> + <p>Partitions <c><anno>List</anno></c> into two lists, where the first + list contains all elements for which + <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c>, + and the second list contains all elements for which + <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>false</c>.</p> + <p><em>Examples:</em></p> <pre> > <input>lists:partition(fun(A) -> A rem 2 == 1 end, [1,2,3,4,5,6,7]).</input> {[1,3,5,7],[2,4,6]} > <input>lists:partition(fun(A) -> is_atom(A) end, [a,b,1,c,d,2,3,4,e]).</input> {[a,b,c,d,e],[1,2,3,4]}</pre> - <p>See also <c>splitwith/2</c> for a different way to partition - a list.</p> + <p>For a different way to partition a list, see + <seealso marker="#splitwith/2"><c>splitwith/2</c></seealso>.</p> </desc> </func> + <func> <name name="prefix" arity="2"/> - <fsummary>Test for list prefix</fsummary> + <fsummary>Test for list prefix.</fsummary> <desc> <p>Returns <c>true</c> if <c><anno>List1</anno></c> is a prefix of <c><anno>List2</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="reverse" arity="1"/> - 
<fsummary>Reverse a list</fsummary> + <fsummary>Reverse a list.</fsummary> <desc> <p>Returns a list with the elements in <c><anno>List1</anno></c> in reverse order.</p> </desc> </func> + <func> <name name="reverse" arity="2"/> - <fsummary>Reverse a list appending a tail</fsummary> + <fsummary>Reverse a list appending a tail.</fsummary> <desc> <p>Returns a list with the elements in <c><anno>List1</anno></c> - in reverse order, with the tail <c><anno>Tail</anno></c> appended. For - example:</p> + in reverse order, with tail <c><anno>Tail</anno></c> appended.</p> + <p><em>Example:</em></p> <pre> > <input>lists:reverse([1, 2, 3, 4], [a, b, c]).</input> [4,3,2,1,a,b,c]</pre> </desc> </func> + <func> <name name="seq" arity="2"/> <name name="seq" arity="3"/> - <fsummary>Generate a sequence of integers</fsummary> + <fsummary>Generate a sequence of integers.</fsummary> <desc> - <p>Returns a sequence of integers which starts with <c><anno>From</anno></c> - and contains the successive results of adding <c><anno>Incr</anno></c> to - the previous element, until <c><anno>To</anno></c> has been reached or - passed (in the latter case, <c><anno>To</anno></c> is not an element of + <p>Returns a sequence of integers that starts with + <c><anno>From</anno></c> and contains the successive results of + adding <c><anno>Incr</anno></c> to the previous element, until + <c><anno>To</anno></c> is reached or passed (in the latter case, + <c><anno>To</anno></c> is not an element of the sequence). 
<c><anno>Incr</anno></c> defaults to 1.</p> - <p>Failure: If <c><anno>To</anno><<anno>From</anno>-<anno>Incr</anno></c> and <c><anno>Incr</anno></c> - is positive, or if <c><anno>To</anno>><anno>From</anno>-<anno>Incr</anno></c> and <c><anno>Incr</anno></c> is - negative, or if <c><anno>Incr</anno>==0</c> and <c><anno>From</anno>/=<anno>To</anno></c>.</p> + <p>Failures:</p> + <list type="bulleted"> + <item> + <p>If <c><anno>To</anno> < + <anno>From</anno> - <anno>Incr</anno></c> + and <c><anno>Incr</anno> > 0</c>.</p> + </item> + <item> + <p>If <c><anno>To</anno> > + <anno>From</anno> - <anno>Incr</anno></c> and + <c><anno>Incr</anno> < 0</c>.</p> + </item> + <item> + <p>If <c><anno>Incr</anno> =:= 0</c> and + <c><anno>From</anno> =/= <anno>To</anno></c>.</p> + </item> + </list> <p>The following equalities hold for all sequences:</p> <code type="none"> -length(lists:seq(From, To)) == To-From+1 -length(lists:seq(From, To, Incr)) == (To-From+Incr) div Incr</code> - <p>Examples:</p> +length(lists:seq(From, To)) =:= To - From + 1 +length(lists:seq(From, To, Incr)) =:= (To - From + Incr) div Incr</code> + <p><em>Examples:</em></p> <pre> > <input>lists:seq(1, 10).</input> [1,2,3,4,5,6,7,8,9,10] @@ -634,74 +734,87 @@ length(lists:seq(From, To, Incr)) == (To-From+Incr) div Incr</code> [1]</pre> </desc> </func> + <func> <name name="sort" arity="1"/> - <fsummary>Sort a list</fsummary> + <fsummary>Sort a list.</fsummary> <desc> <p>Returns a list containing the sorted elements of <c><anno>List1</anno></c>.</p> </desc> </func> + <func> <name name="sort" arity="2"/> - <fsummary>Sort a list</fsummary> + <fsummary>Sort a list.</fsummary> <desc> <p>Returns a list containing the sorted elements of <c><anno>List1</anno></c>, according to the <seealso marker="#ordering_function">ordering function</seealso> - <c><anno>Fun</anno></c>. 
<c><anno>Fun</anno>(<anno>A</anno>, <anno>B</anno>)</c> should return <c>true</c> if - <c><anno>A</anno></c> compares less than or equal to <c><anno>B</anno></c> in the - ordering, <c>false</c> otherwise.</p> + <c><anno>Fun</anno></c>. <c><anno>Fun</anno>(<anno>A</anno>, + <anno>B</anno>)</c> is to return <c>true</c> if <c><anno>A</anno></c> + compares less than or equal to <c><anno>B</anno></c> in the + ordering, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="split" arity="2"/> - <fsummary>Split a list into two lists</fsummary> + <fsummary>Split a list into two lists.</fsummary> <type_desc variable="N">0..length(<anno>List1</anno>)</type_desc> <desc> - <p>Splits <c><anno>List1</anno></c> into <c><anno>List2</anno></c> and <c><anno>List3</anno></c>. - <c><anno>List2</anno></c> contains the first <c><anno>N</anno></c> elements and - <c><anno>List3</anno></c> the rest of the elements (the <c><anno>N</anno></c>th tail).</p> + <p>Splits <c><anno>List1</anno></c> into <c><anno>List2</anno></c> and + <c><anno>List3</anno></c>. <c><anno>List2</anno></c> contains the + first <c><anno>N</anno></c> elements and <c><anno>List3</anno></c> + the remaining elements (the <c><anno>N</anno></c>th tail).</p> </desc> </func> + <func> <name name="splitwith" arity="2"/> - <fsummary>Split a list into two lists based on a predicate</fsummary> + <fsummary>Split a list into two lists based on a predicate.</fsummary> <desc> <p>Partitions <c><anno>List</anno></c> into two lists according to - <c><anno>Pred</anno></c>. <c>splitwith/2</c> behaves as if it is defined - as follows:</p> + <c><anno>Pred</anno></c>. 
<c>splitwith/2</c> behaves as if it is + defined as follows:</p> <code type="none"> splitwith(Pred, List) -> {takewhile(Pred, List), dropwhile(Pred, List)}.</code> - <p>Examples:</p> + <p><em>Examples:</em></p> <pre> > <input>lists:splitwith(fun(A) -> A rem 2 == 1 end, [1,2,3,4,5,6,7]).</input> {[1],[2,3,4,5,6,7]} > <input>lists:splitwith(fun(A) -> is_atom(A) end, [a,b,1,c,d,2,3,4,e]).</input> {[a,b],[1,c,d,2,3,4,e]}</pre> - <p>See also <c>partition/2</c> for a different way to partition - a list.</p> + <p>For a different way to partition a list, see + <seealso marker="#partition/2"><c>partition/2</c></seealso>.</p> </desc> </func> + <func> <name name="sublist" arity="2"/> - <fsummary>Return a sub-list of a certain length, starting at the first position</fsummary> + <fsummary>Return a sublist of a certain length, starting at the first + position.</fsummary> <desc> - <p>Returns the sub-list of <c><anno>List1</anno></c> starting at position 1 - and with (max) <c><anno>Len</anno></c> elements. It is not an error for - <c><anno>Len</anno></c> to exceed the length of the list, in that case - the whole list is returned.</p> + <p>Returns the sublist of <c><anno>List1</anno></c> starting at + position 1 and with (maximum) <c><anno>Len</anno></c> elements. It is + not an error for <c><anno>Len</anno></c> to exceed the length of the + list, in that case the whole list is returned.</p> </desc> </func> + <func> <name name="sublist" arity="3"/> - <fsummary>Return a sub-list starting at a given position and with a given number of elements</fsummary> + <fsummary>Return a sublist starting at a specified position and with a + specified number of elements.</fsummary> <type_desc variable="Start">1..(length(<anno>List1</anno>)+1)</type_desc> <desc> - <p>Returns the sub-list of <c><anno>List1</anno></c> starting at <c><anno>Start</anno></c> - and with (max) <c><anno>Len</anno></c> elements. 
It is not an error for - <c><anno>Start</anno>+<anno>Len</anno></c> to exceed the length of the list.</p> + <p>Returns the sublist of <c><anno>List1</anno></c> starting at + <c><anno>Start</anno></c> and with (maximum) <c><anno>Len</anno></c> + elements. It is not an error for + <c><anno>Start</anno>+<anno>Len</anno></c> to exceed the length of + the list.</p> + <p><em>Examples:</em></p> <pre> > <input>lists:sublist([1,2,3,4], 2, 2).</input> [2,3] @@ -711,142 +824,163 @@ splitwith(Pred, List) -> []</pre> </desc> </func> + <func> <name name="subtract" arity="2"/> - <fsummary>Subtract the element in one list from another list</fsummary> + <fsummary>Subtract the element in one list from another list.</fsummary> <desc> - <p>Returns a new list <c><anno>List3</anno></c> which is a copy of - <c><anno>List1</anno></c>, subjected to the following procedure: for each - element in <c><anno>List2</anno></c>, its first occurrence in <c><anno>List1</anno></c> - is deleted. For example:</p> + <p>Returns a new list <c><anno>List3</anno></c> that is a copy of + <c><anno>List1</anno></c>, subjected to the following procedure: + for each element in <c><anno>List2</anno></c>, its first occurrence + in <c><anno>List1</anno></c> is deleted.</p> + <p><em>Example:</em></p> <pre> > <input>lists:subtract("123212", "212").</input> "312".</pre> <p><c>lists:subtract(A, B)</c> is equivalent to <c>A -- B</c>.</p> - <warning><p>The complexity of <c>lists:subtract(A, B)</c> is proportional - to <c>length(A)*length(B)</c>, meaning that it will be very slow if - both <c>A</c> and <c>B</c> are long lists. - (Using ordered lists and - <seealso marker="ordsets#subtract/2">ordsets:subtract/2</seealso> - is a much better choice if both lists are long.)</p></warning> + <warning> + <p>The complexity of <c>lists:subtract(A, B)</c> is proportional to + <c>length(A)*length(B)</c>, meaning that it is very slow if both + <c>A</c> and <c>B</c> are long lists. 
(If both lists are long, it + is a much better choice to use ordered lists and + <seealso marker="ordsets#subtract/2"> + <c>ordsets:subtract/2</c></seealso>.</p> + </warning> </desc> </func> + <func> <name name="suffix" arity="2"/> - <fsummary>Test for list suffix</fsummary> + <fsummary>Test for list suffix.</fsummary> <desc> <p>Returns <c>true</c> if <c><anno>List1</anno></c> is a suffix of <c><anno>List2</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="sum" arity="1"/> - <fsummary>Return sum of elements in a list</fsummary> + <fsummary>Return the sum of elements in a list.</fsummary> <desc> <p>Returns the sum of the elements in <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="takewhile" arity="2"/> - <fsummary>Take elements from a list while a predicate is true</fsummary> + <fsummary>Take elements from a list while a predicate is <c>true</c>. + </fsummary> <desc> - <p>Takes elements <c><anno>Elem</anno></c> from <c><anno>List1</anno></c> while - <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c>, that is, - the function returns the longest prefix of the list for which + <p>Takes elements <c><anno>Elem</anno></c> from + <c><anno>List1</anno></c> while + <c><anno>Pred</anno>(<anno>Elem</anno>)</c> returns <c>true</c>, that + is, the function returns the longest prefix of the list for which all elements satisfy the predicate.</p> </desc> </func> + <func> <name name="ukeymerge" arity="3"/> - <fsummary>Merge two key-sorted lists of tuples, removing duplicates</fsummary> + <fsummary>Merge two key-sorted lists of tuples, removing duplicates. + </fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Returns the sorted list formed by merging <c><anno>TupleList1</anno></c> - and <c><anno>TupleList2</anno></c>. The merge is performed on the - <c><anno>N</anno></c>th element of each tuple. 
Both <c><anno>TupleList1</anno></c> and - <c><anno>TupleList2</anno></c> must be key-sorted without duplicates - prior to evaluating this function. When two tuples compare - equal, the tuple from <c><anno>TupleList1</anno></c> is picked and the - one from <c><anno>TupleList2</anno></c> deleted.</p> + <p>Returns the sorted list formed by merging + <c><anno>TupleList1</anno></c> and + <c><anno>TupleList2</anno></c>. The merge is performed on the + <c><anno>N</anno></c>th element of each tuple. Both + <c><anno>TupleList1</anno></c> and <c><anno>TupleList2</anno></c> + must be key-sorted without duplicates before evaluating this function. + When two tuples compare equal, the tuple from + <c><anno>TupleList1</anno></c> is picked and the + one from <c><anno>TupleList2</anno></c> is deleted.</p> </desc> </func> + <func> <name name="ukeysort" arity="2"/> - <fsummary>Sort a list of tuples, removing duplicates</fsummary> + <fsummary>Sort a list of tuples, removing duplicates.</fsummary> <type_desc variable="N">1..tuple_size(<anno>Tuple</anno>)</type_desc> <desc> - <p>Returns a list containing the sorted elements of the list - <c><anno>TupleList1</anno></c> where all but the first tuple of the - tuples comparing equal have been deleted. Sorting is + <p>Returns a list containing the sorted elements of list + <c><anno>TupleList1</anno></c> where all except the first tuple of + the tuples comparing equal have been deleted. Sorting is performed on the <c><anno>N</anno></c>th element of the tuples.</p> </desc> </func> + <func> <name name="umerge" arity="1"/> - <fsummary>Merge a list of sorted lists, removing duplicates</fsummary> + <fsummary>Merge a list of sorted lists, removing duplicates.</fsummary> <desc> - <p>Returns the sorted list formed by merging all the sub-lists - of <c><anno>ListOfLists</anno></c>. All sub-lists must be sorted and - contain no duplicates prior to evaluating this function. 
- When two elements compare equal, the element from the - sub-list with the lowest position in <c><anno>ListOfLists</anno></c> is - picked and the other one deleted.</p> + <p>Returns the sorted list formed by merging all the sublists + of <c><anno>ListOfLists</anno></c>. All sublists must be sorted and + contain no duplicates before evaluating this function. + When two elements compare equal, the element from the sublist + with the lowest position in <c><anno>ListOfLists</anno></c> is + picked and the other is deleted.</p> </desc> </func> + <func> <name name="umerge" arity="2"/> - <fsummary>Merge two sorted lists, removing duplicates</fsummary> + <fsummary>Merge two sorted lists, removing duplicates.</fsummary> <desc> - <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> and - <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and <c><anno>List2</anno></c> must be - sorted and contain no duplicates prior to evaluating this + <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> + and <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and + <c><anno>List2</anno></c> must be + sorted and contain no duplicates before evaluating this function. When two elements compare equal, the element from - <c><anno>List1</anno></c> is picked and the one from <c><anno>List2</anno></c> - deleted.</p> + <c><anno>List1</anno></c> is picked and the one from + <c><anno>List2</anno></c> is deleted.</p> </desc> </func> + <func> <name name="umerge" arity="3"/> - <fsummary>Merge two sorted lists, removing duplicates</fsummary> + <fsummary>Merge two sorted lists, removing duplicates.</fsummary> <desc> - <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> and - <c><anno>List2</anno></c>. Both <c><anno>List1</anno></c> and <c><anno>List2</anno></c> must be - sorted according to the <seealso + <p>Returns the sorted list formed by merging <c><anno>List1</anno></c> + and <c><anno>List2</anno></c>. 
Both <c><anno>List1</anno></c> and + <c><anno>List2</anno></c> must be sorted according to the <seealso marker="#ordering_function">ordering function</seealso> - <c>Fun</c> and contain no duplicates prior to evaluating - this function. <c><anno>Fun</anno>(<anno>A</anno>, <anno>B</anno>)</c> should return <c>true</c> if - <c><anno>A</anno></c> compares less than or equal to <c><anno>B</anno></c> in the - ordering, <c>false</c> otherwise. When two elements compare - equal, the element from - <c><anno>List1</anno></c> is picked and the one from <c><anno>List2</anno></c> - deleted.</p> + <c>Fun</c> and contain no duplicates before evaluating this function. + <c><anno>Fun</anno>(<anno>A</anno>, <anno>B</anno>)</c> is to return + <c>true</c> if <c><anno>A</anno></c> compares less than or equal to + <c><anno>B</anno></c> in the ordering, otherwise <c>false</c>. When + two elements compare equal, the element from <c><anno>List1</anno></c> + is picked and the one from <c><anno>List2</anno></c> is deleted.</p> </desc> </func> + <func> <name name="umerge3" arity="3"/> - <fsummary>Merge three sorted lists, removing duplicates</fsummary> + <fsummary>Merge three sorted lists, removing duplicates.</fsummary> <desc> <p>Returns the sorted list formed by merging <c><anno>List1</anno></c>, - <c><anno>List2</anno></c> and <c><anno>List3</anno></c>. All of <c><anno>List1</anno></c>, - <c><anno>List2</anno></c> and <c><anno>List3</anno></c> must be sorted and contain no - duplicates prior to evaluating this function. When two + <c><anno>List2</anno></c>, and <c><anno>List3</anno></c>. All of + <c><anno>List1</anno></c>, <c><anno>List2</anno></c>, and + <c><anno>List3</anno></c> must be sorted and contain no + duplicates before evaluating this function. 
When two elements compare equal, the element from <c><anno>List1</anno></c> is - picked if there is such an element, otherwise the element - from <c><anno>List2</anno></c> is picked, and the other one deleted.</p> + picked if there is such an element, otherwise the element from + <c><anno>List2</anno></c> is picked, and the other is deleted.</p> </desc> </func> + <func> <name name="unzip" arity="1"/> - <fsummary>Unzip a list of two-tuples into two lists</fsummary> + <fsummary>Unzip a list of two-tuples into two lists.</fsummary> <desc> <p>"Unzips" a list of two-tuples into two lists, where the first list contains the first element of each tuple, and the second list contains the second element of each tuple.</p> </desc> </func> + <func> <name name="unzip3" arity="1"/> - <fsummary>Unzip a list of three-tuples into three lists</fsummary> + <fsummary>Unzip a list of three-tuples into three lists.</fsummary> <desc> <p>"Unzips" a list of three-tuples into three lists, where the first list contains the first element of each tuple, @@ -854,76 +988,84 @@ splitwith(Pred, List) -> the third list contains the third element of each tuple.</p> </desc> </func> + <func> <name name="usort" arity="1"/> - <fsummary>Sort a list, removing duplicates</fsummary> + <fsummary>Sort a list, removing duplicates.</fsummary> <desc> <p>Returns a list containing the sorted elements of - <c><anno>List1</anno></c> where all but the first element of the elements - comparing equal have been deleted.</p> + <c><anno>List1</anno></c> where all except the first element of the + elements comparing equal have been deleted.</p> </desc> </func> + <func> <name name="usort" arity="2"/> - <fsummary>Sort a list, removing duplicates</fsummary> + <fsummary>Sort a list, removing duplicates.</fsummary> <desc> - <p>Returns a list which contains the sorted elements of - <c><anno>List1</anno></c> where all but the first element of the elements - comparing equal according to the <seealso + <p>Returns a list containing 
the sorted elements of + <c><anno>List1</anno></c> where all except the first element of the + elements comparing equal according to the <seealso marker="#ordering_function">ordering function</seealso> - <c><anno>Fun</anno></c> have been deleted. <c><anno>Fun</anno>(A, B)</c> should return + <c><anno>Fun</anno></c> have been deleted. + <c><anno>Fun</anno>(A, B)</c> is to return <c>true</c> if <c>A</c> compares less than or equal to - <c>B</c> in the ordering, <c>false</c> otherwise.</p> + <c>B</c> in the ordering, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="zip" arity="2"/> - <fsummary>Zip two lists into a list of two-tuples</fsummary> + <fsummary>Zip two lists into a list of two-tuples.</fsummary> <desc> <p>"Zips" two lists of equal length into one list of two-tuples, where the first element of each tuple is taken from the first - list and the second element is taken from corresponding + list and the second element is taken from the corresponding element in the second list.</p> </desc> </func> + <func> <name name="zip3" arity="3"/> - <fsummary>Zip three lists into a list of three-tuples</fsummary> + <fsummary>Zip three lists into a list of three-tuples.</fsummary> <desc> <p>"Zips" three lists of equal length into one list of three-tuples, where the first element of each tuple is taken from the first list, the second element is taken from - corresponding element in the second list, and the third - element is taken from the corresponding element in the third - list.</p> + the corresponding element in the second list, and the third + element is taken from the corresponding element in the third list.</p> </desc> </func> + <func> <name name="zipwith" arity="3"/> - <fsummary>Zip two lists into one list according to a fun</fsummary> + <fsummary>Zip two lists into one list according to a fun.</fsummary> <desc> - <p>Combine the elements of two lists of equal length into one - list. 
For each pair <c><anno>X</anno>, <anno>Y</anno></c> of list elements from the two - lists, the element in the result list will be + <p>Combines the elements of two lists of equal length into one list. + For each pair <c><anno>X</anno>, <anno>Y</anno></c> of list elements + from the two lists, the element in the result list is <c><anno>Combine</anno>(<anno>X</anno>, <anno>Y</anno>)</c>.</p> <p><c>zipwith(fun(X, Y) -> {X,Y} end, List1, List2)</c> is equivalent to <c>zip(List1, List2)</c>.</p> - <p>Example:</p> + <p><em>Example:</em></p> <pre> > <input>lists:zipwith(fun(X, Y) -> X+Y end, [1,2,3], [4,5,6]).</input> [5,7,9]</pre> </desc> </func> + <func> <name name="zipwith3" arity="4"/> - <fsummary>Zip three lists into one list according to a fun</fsummary> - <desc> - <p>Combine the elements of three lists of equal length into one - list. For each triple <c><anno>X</anno>, <anno>Y</anno>, <anno>Z</anno></c> of list elements from - the three lists, the element in the result list will be - <c><anno>Combine</anno>(<anno>X</anno>, <anno>Y</anno>, <anno>Z</anno>)</c>.</p> - <p><c>zipwith3(fun(X, Y, Z) -> {X,Y,Z} end, List1, List2, List3)</c> is equivalent to <c>zip3(List1, List2, List3)</c>.</p> - <p>Examples:</p> + <fsummary>Zip three lists into one list according to a fun.</fsummary> + <desc> + <p>Combines the elements of three lists of equal length into one + list. 
For each triple <c><anno>X</anno>, <anno>Y</anno>, + <anno>Z</anno></c> of list elements from the three lists, the element + in the result list is <c><anno>Combine</anno>(<anno>X</anno>, + <anno>Y</anno>, <anno>Z</anno>)</c>.</p> + <p><c>zipwith3(fun(X, Y, Z) -> {X,Y,Z} end, List1, List2, List3)</c> is + equivalent to <c>zip3(List1, List2, List3)</c>.</p> + <p><em>Examples:</em></p> <pre> > <input>lists:zipwith3(fun(X, Y, Z) -> X+Y+Z end, [1,2,3], [4,5,6], [7,8,9]).</input> [12,15,18] diff --git a/lib/stdlib/doc/src/log_mf_h.xml b/lib/stdlib/doc/src/log_mf_h.xml index 65622e52f5..edc3d31025 100644 --- a/lib/stdlib/doc/src/log_mf_h.xml +++ b/lib/stdlib/doc/src/log_mf_h.xml @@ -32,48 +32,56 @@ <checked>Martin Björklund</checked> <date>1996-10-31</date> <rev>A</rev> - <file>log_mf_h.sgml</file> + <file>log_mf_h.xml</file> </header> <module>log_mf_h</module> - <modulesummary>An Event Handler which Logs Events to Disk</modulesummary> + <modulesummary>An event handler that logs events to disk.</modulesummary> <description> - <p>The <c>log_mf_h</c> is a <c>gen_event</c> handler module which - can be installed in any <c>gen_event</c> process. It logs onto disk all events - which are sent to an event manager. Each event is written as a - binary which makes the logging very fast. However, a tool such as the <c>Report Browser</c> (<c>rb</c>) must be used in order to read the files. The events are written to multiple files. When all files have been used, the first one is re-used and overwritten. The directory location, the number of files, and the size of each file are configurable. The directory will include one file called <c>index</c>, and - report files <c>1, 2, ....</c>. - </p> + <p>This module is a <c>gen_event</c> handler module that can be installed + in any <c>gen_event</c> process. It logs onto disk all events that are + sent to an event manager. Each event is written as a binary, which makes + the logging very fast. 
However, a tool such as the Report Browser + (<seealso marker="sasl:rb"><c>rb(3)</c></seealso>) must be used to read + the files. The events are written to multiple files. When all files have + been used, the first one is reused and overwritten. The directory + location, the number of files, and the size of each file are configurable. + The directory will include one file called <c>index</c>, and report files + <c>1, 2, ...</c>.</p> </description> + <datatypes> <datatype> <name name="args"/> <desc><p>Term to be sent to <seealso marker="gen_event#add_handler/3"> - gen_event:add_handler/3</seealso>.</p></desc> + <c>gen_event:add_handler/3</c></seealso>.</p> + </desc> </datatype> </datatypes> + <funcs> <func> <name name="init" arity="3"/> <name name="init" arity="4"/> - <fsummary>Initiate the event handler</fsummary> + <fsummary>Initiate the event handler.</fsummary> <desc> - <p>Initiates the event handler. This function returns - <c><anno>Args</anno></c>, which should be used in a call to + <p>Initiates the event handler. Returns <c><anno>Args</anno></c>, which + is to be used in a call to <c>gen_event:add_handler(EventMgr, log_mf_h, <anno>Args</anno>)</c>. - </p> + </p> <p><c><anno>Dir</anno></c> specifies which directory to use for the log - files. <c><anno>MaxBytes</anno></c> specifies the size of each individual - file. <c><anno>MaxFiles</anno></c> specifies how many files are - used. <c><anno>Pred</anno></c> is a predicate function used to filter the - events. If no predicate function is specified, all events are - logged.</p> + files. <c><anno>MaxBytes</anno></c> specifies the size of each + individual file. <c><anno>MaxFiles</anno></c> specifies how many + files are used. <c><anno>Pred</anno></c> is a predicate function used + to filter the events. 
If no predicate function is specified, all + events are logged.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="gen_event">gen_event(3)</seealso>, rb(3) </p> + <p><seealso marker="gen_event"><c>gen_event(3)</c></seealso>, + <seealso marker="sasl:rb"><c>rb(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/maps.xml b/lib/stdlib/doc/src/maps.xml index bf45461e2b..e1edbadcd3 100644 --- a/lib/stdlib/doc/src/maps.xml +++ b/lib/stdlib/doc/src/maps.xml @@ -2,12 +2,12 @@ <!DOCTYPE erlref SYSTEM "erlref.dtd"> <erlref> - <header> - <copyright> - <year>2013</year><year>2016</year> - <holder>Ericsson AB. All Rights Reserved.</holder> - </copyright> - <legalnotice> + <header> + <copyright> + <year>2013</year><year>2016</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at @@ -19,397 +19,372 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. - </legalnotice> - <title>maps</title> - <prepared>Björn-Egil Dahlberg</prepared> - <docno>1</docno> - <date>2014-02-28</date> - <rev>A</rev> - </header> - <module>maps</module> - <modulesummary>Maps Processing Functions</modulesummary> - <description> - <p>This module contains functions for maps processing.</p> - </description> - <funcs> + </legalnotice> - <func> - <name name="filter" arity="2"/> - <fsummary>Choose pairs which satisfy a predicate</fsummary> - <desc> - <p> - Returns a map <c><anno>Map2</anno></c> for which predicate - <c><anno>Pred</anno></c> holds true in <c><anno>Map1</anno></c>. 
- </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if - <c><anno>Map1</anno></c> is not a map or with <c>badarg</c> if - <c><anno>Pred</anno></c> is not a function of arity 2. - </p> - <p>Example:</p> - <code type="none"> + <title>maps</title> + <prepared>Björn-Egil Dahlberg</prepared> + <docno>1</docno> + <date>2014-02-28</date> + <rev>A</rev> + </header> + <module>maps</module> + <modulesummary>Maps processing functions.</modulesummary> + <description> + <p>This module contains functions for maps processing.</p> + </description> + + <funcs> + <func> + <name name="filter" arity="2"/> + <fsummary>Select pairs that satisfy a predicate.</fsummary> + <desc> + <p>Returns a map <c><anno>Map2</anno></c> for which predicate + <c><anno>Pred</anno></c> holds true in <c><anno>Map1</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> is not a map, or with <c>badarg</c> if + <c><anno>Pred</anno></c> is not a function of arity 2.</p> + <p><em>Example:</em></p> + <code type="none"> > M = #{a => 2, b => 3, c=> 4, "a" => 1, "b" => 2, "c" => 4}, Pred = fun(K,V) -> is_atom(K) andalso (V rem 2) =:= 0 end, maps:filter(Pred,M). -#{a => 2,c => 4} </code> - </desc> - </func> +#{a => 2,c => 4}</code> + </desc> + </func> - <func> - <name name="find" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Returns a tuple <c>{ok, Value}</c> where <c><anno>Value</anno></c> is the value associated with <c><anno>Key</anno></c>, - or <c>error</c> if no value is associated with <c><anno>Key</anno></c> in <c><anno>Map</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. 
- </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="find" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Returns a tuple <c>{ok, Value}</c>, where <c><anno>Value</anno></c> + is the value associated with <c><anno>Key</anno></c>, or <c>error</c> + if no value is associated with <c><anno>Key</anno></c> in + <c><anno>Map</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{"hi" => 42}, Key = "hi", maps:find(Key,Map). -{ok,42} </code> - </desc> - </func> +{ok,42}</code> + </desc> + </func> - <func> - <name name="fold" arity="3"/> - <fsummary></fsummary> - <desc> - <p> - Calls <c>F(K, V, AccIn)</c> for every <c><anno>K</anno></c> to value <c><anno>V</anno></c> - association in <c><anno>Map</anno></c> in - arbitrary order. The function <c>fun F/3</c> must return a new accumulator - which is passed to the next successive call. <c>maps:fold/3</c> returns the final - value of the accumulator. The initial accumulator value <c><anno>Init</anno></c> is returned if - the map is empty. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="fold" arity="3"/> + <fsummary></fsummary> + <desc> + <p>Calls <c>F(K, V, AccIn)</c> for every <c><anno>K</anno></c> to value + <c><anno>V</anno></c> association in <c><anno>Map</anno></c> in + any order. Function <c>fun F/3</c> must return a new + accumulator, which is passed to the next successive call. + This function returns the final value of the accumulator. The initial + accumulator value <c><anno>Init</anno></c> is returned if the map is + empty.</p> + <p><em>Example:</em></p> + <code type="none"> > Fun = fun(K,V,AccIn) when is_list(K) -> AccIn + V end, Map = #{"k1" => 1, "k2" => 2, "k3" => 3}, maps:fold(Fun,0,Map). 
6</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="from_list" arity="1"/> - <fsummary></fsummary> - <desc> - <p> - The function takes a list of key-value tuples elements and builds a - map. The associations may be in any order and both keys and values in the - association may be of any term. If the same key appears more than once, - the latter (rightmost) value is used and the previous values are ignored. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="from_list" arity="1"/> + <fsummary></fsummary> + <desc> + <p>Takes a list of key-value tuples elements and builds a map. The + associations can be in any order, and both keys and values in the + association can be of any term. If the same key appears more than + once, the latter (right-most) value is used and the previous values + are ignored.</p> + <p><em>Example:</em></p> + <code type="none"> > List = [{"a",ignored},{1337,"value two"},{42,value_three},{"a",1}], maps:from_list(List). #{42 => value_three,1337 => "value two","a" => 1}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="get" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Returns the value <c><anno>Value</anno></c> associated with <c><anno>Key</anno></c> if - <c><anno>Map</anno></c> contains <c><anno>Key</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map, - or with a <c>{badkey,Key}</c> exception if no value is associated with <c><anno>Key</anno></c>. 
- </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="get" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Returns value <c><anno>Value</anno></c> associated with + <c><anno>Key</anno></c> if <c><anno>Map</anno></c> contains + <c><anno>Key</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map, or with a <c>{badkey,Key}</c> + exception if no value is associated with <c><anno>Key</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> > Key = 1337, Map = #{42 => value_two,1337 => "value one","a" => 1}, maps:get(Key,Map). "value one"</code> - </desc> - </func> - - <func> - <name name="get" arity="3"/> - <fsummary></fsummary> - <desc> - <p> - Returns the value <c><anno>Value</anno></c> associated with <c><anno>Key</anno></c> if - <c><anno>Map</anno></c> contains <c><anno>Key</anno></c>. - If no value is associated with <c><anno>Key</anno></c> then returns <c><anno>Default</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. + </desc> + </func> - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="get" arity="3"/> + <fsummary></fsummary> + <desc> + <p>Returns value <c><anno>Value</anno></c> associated with + <c><anno>Key</anno></c> if <c><anno>Map</anno></c> contains + <c><anno>Key</anno></c>. If no value is associated with + <c><anno>Key</anno></c>, <c><anno>Default</anno></c> is returned.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{ key1 => val1, key2 => val2 }. #{key1 => val1,key2 => val2} > maps:get(key1, Map, "Default value"). val1 > maps:get(key3, Map, "Default value"). 
"Default value"</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="is_key" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Returns <c>true</c> if map <c><anno>Map</anno></c> contains <c><anno>Key</anno></c> and returns - <c>false</c> if it does not contain the <c><anno>Key</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="is_key" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Returns <c>true</c> if map <c><anno>Map</anno></c> contains + <c><anno>Key</anno></c> and returns <c>false</c> if it does not + contain the <c><anno>Key</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{"42" => value}. #{"42" => value} > maps:is_key("42",Map). true > maps:is_key(value,Map). false</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="keys" arity="1"/> - <fsummary></fsummary> - <desc> - <p> - Returns a complete list of keys, in arbitrary order, which resides within <c><anno>Map</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="keys" arity="1"/> + <fsummary></fsummary> + <desc> + <p>Returns a complete list of keys, in any order, which resides + within <c><anno>Map</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{42 => value_three,1337 => "value two","a" => 1}, maps:keys(Map). 
[42,1337,"a"]</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="map" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - The function produces a new map <c><anno>Map2</anno></c> by calling the function <c>fun F(K, V1)</c> for - every <c><anno>K</anno></c> to value <c><anno>V1</anno></c> association in <c><anno>Map1</anno></c> in arbitrary order. - The function <c>fun F/2</c> must return the value <c><anno>V2</anno></c> to be associated with key <c><anno>K</anno></c> for - the new map <c><anno>Map2</anno></c>. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="map" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Produces a new map <c><anno>Map2</anno></c> by calling function + <c>fun F(K, V1)</c> for every <c><anno>K</anno></c> to value + <c><anno>V1</anno></c> association in <c><anno>Map1</anno></c> in + any order. Function <c>fun F/2</c> must return value + <c><anno>V2</anno></c> to be associated with key <c><anno>K</anno></c> + for the new map <c><anno>Map2</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> > Fun = fun(K,V1) when is_list(K) -> V1*2 end, Map = #{"k1" => 1, "k2" => 2, "k3" => 3}, maps:map(Fun,Map). #{"k1" => 2,"k2" => 4,"k3" => 6}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="merge" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Merges two maps into a single map <c><anno>Map3</anno></c>. If two keys exists in both maps the - value in <c><anno>Map1</anno></c> will be superseded by the value in <c><anno>Map2</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map1</anno></c> or - <c><anno>Map2</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="merge" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Merges two maps into a single map <c><anno>Map3</anno></c>. 
If two + keys exist in both maps, the value in <c><anno>Map1</anno></c> is + superseded by the value in <c><anno>Map2</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> or <c><anno>Map2</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map1 = #{a => "value_one", b => "value_two"}, Map2 = #{a => 1, c => 2}, maps:merge(Map1,Map2). #{a => 1,b => "value_two",c => 2}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="new" arity="0"/> - <fsummary></fsummary> - <desc> - <p> - Returns a new empty map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="new" arity="0"/> + <fsummary></fsummary> + <desc> + <p>Returns a new empty map.</p> + <p><em>Example:</em></p> + <code type="none"> > maps:new(). #{}</code> - </desc> - </func> - - <func> - <name name="put" arity="3"/> - <fsummary></fsummary> - <desc> - <p> - Associates <c><anno>Key</anno></c> with value <c><anno>Value</anno></c> and inserts the association into map <c>Map2</c>. - If key <c><anno>Key</anno></c> already exists in map <c><anno>Map1</anno></c>, the old associated value is - replaced by value <c><anno>Value</anno></c>. The function returns a new map <c><anno>Map2</anno></c> containing the new association and - the old associations in <c><anno>Map1</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map1</anno></c> is not a map. - </p> + </desc> + </func> - <p>Example:</p> - <code type="none"> + <func> + <name name="put" arity="3"/> + <fsummary></fsummary> + <desc> + <p>Associates <c><anno>Key</anno></c> with value + <c><anno>Value</anno></c> and inserts the association into map + <c>Map2</c>. If key <c><anno>Key</anno></c> already exists in map + <c><anno>Map1</anno></c>, the old associated value is replaced by + value <c><anno>Value</anno></c>. 
The function returns a new map + <c><anno>Map2</anno></c> containing the new association and the old + associations in <c><anno>Map1</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{"a" => 1}. #{"a" => 1} > maps:put("a", 42, Map). #{"a" => 42} > maps:put("b", 1337, Map). #{"a" => 1,"b" => 1337}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="remove" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - The function removes the <c><anno>Key</anno></c>, if it exists, and its associated value from - <c><anno>Map1</anno></c> and returns a new map <c><anno>Map2</anno></c> without key <c><anno>Key</anno></c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map1</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="remove" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Removes the <c><anno>Key</anno></c>, if it exists, and its + associated value from <c><anno>Map1</anno></c> and returns a new map + <c><anno>Map2</anno></c> without key <c><anno>Key</anno></c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{"a" => 1}. #{"a" => 1} > maps:remove("a",Map). #{} > maps:remove("b",Map). #{"a" => 1}</code> - </desc> - </func> + </desc> + </func> + + <func> + <name name="size" arity="1"/> + <fsummary></fsummary> + <desc> + <p>Returns the number of key-value associations in + <c><anno>Map</anno></c>. This operation occurs in constant time.</p> + <p><em>Example:</em></p> + <code type="none"> +> Map = #{42 => value_two,1337 => "value one","a" => 1}, + maps:size(Map). 
+3</code> + </desc> + </func> - <func> - <name name="take" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - The function removes the <c><anno>Key</anno></c>, if it exists, and its associated value from - <c><anno>Map1</anno></c> and returns a tuple with the removed <c><anno>Value</anno></c> and - the new map <c><anno>Map2</anno></c> without key <c><anno>Key</anno></c>. - If the key does not exist <c>error</c> is returned. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map1</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="take" arity="2"/> + <fsummary></fsummary> + <desc> + <p>The function removes the <c><anno>Key</anno></c>, if it + exists, and its associated value from <c><anno>Map1</anno></c> + and returns a tuple with the removed <c><anno>Value</anno></c> + and the new map <c><anno>Map2</anno></c> without key + <c><anno>Key</anno></c>. If the key does not exist + <c>error</c> is returned. + </p> + <p>The call will fail with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> is not a map. + </p> + <p>Example:</p> + <code type="none"> > Map = #{"a" => "hello", "b" => "world"}. #{"a" => "hello", "b" => "world"} > maps:take("a",Map). {"hello",#{"b" => "world"}} > maps:take("does not exist",Map). error</code> - </desc> - </func> - - <func> - <name name="size" arity="1"/> - <fsummary></fsummary> - <desc> - <p> - The function returns the number of key-value associations in the <c><anno>Map</anno></c>. - This operation happens in constant time. - </p> - <p>Example:</p> - <code type="none"> -> Map = #{42 => value_two,1337 => "value one","a" => 1}, - maps:size(Map). 
-3</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="to_list" arity="1"/> - <fsummary></fsummary> - <desc> - <p> - The fuction returns a list of pairs representing the key-value associations of <c><anno>Map</anno></c>, - where the pairs, <c>[{K1,V1}, ..., {Kn,Vn}]</c>, are returned in arbitrary order. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="to_list" arity="1"/> + <fsummary></fsummary> + <desc> + <p>Returns a list of pairs representing the key-value associations of + <c><anno>Map</anno></c>, where the pairs + <c>[{K1,V1}, ..., {Kn,Vn}]</c> are returned in arbitrary order.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{42 => value_three,1337 => "value two","a" => 1}, maps:to_list(Map). [{42,value_three},{1337,"value two"},{"a",1}]</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="update" arity="3"/> - <fsummary></fsummary> - <desc> - <p> - If <c><anno>Key</anno></c> exists in <c><anno>Map1</anno></c> the old associated value is - replaced by value <c><anno>Value</anno></c>. The function returns a new map <c><anno>Map2</anno></c> containing - the new associated value. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map1</anno></c> is not a map, - or with a <c>{badkey,Key}</c> exception if no value is associated with <c><anno>Key</anno></c>. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="update" arity="3"/> + <fsummary></fsummary> + <desc> + <p>If <c><anno>Key</anno></c> exists in <c><anno>Map1</anno></c>, the + old associated value is replaced by value <c><anno>Value</anno></c>. 
+ The function returns a new map <c><anno>Map2</anno></c> containing + the new associated value.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map1</anno></c> is not a map, or with a <c>{badkey,Key}</c> + exception if no value is associated with <c><anno>Key</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{"a" => 1}. #{"a" => 1} > maps:update("a", 42, Map). #{"a" => 42}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="update_with" arity="3"/> - <fsummary></fsummary> - <desc> - <p>Update a value in a <c><anno>Map1</anno></c> associated with <c><anno>Key</anno></c> by - calling <c><anno>Fun</anno></c> on the old value to get a new value. An exception - <c>{badkey,<anno>Key</anno>}</c> is generated if - <c><anno>Key</anno></c> is not present in the map.</p> - <p>Example:</p> - <code type="none"> + <func> + <name name="update_with" arity="3"/> + <fsummary></fsummary> + <desc> + <p>Update a value in a <c><anno>Map1</anno></c> associated + with <c><anno>Key</anno></c> by calling + <c><anno>Fun</anno></c> on the old value to get a new + value. An exception <c>{badkey,<anno>Key</anno>}</c> is + generated if <c><anno>Key</anno></c> is not present in the + map.</p> + <p>Example:</p> + <code type="none"> > Map = #{"counter" => 1}, Fun = fun(V) -> V + 1 end, maps:update_with("counter",Fun,Map). #{"counter" => 2}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="update_with" arity="4"/> - <fsummary></fsummary> - <desc> - <p>Update a value in a <c><anno>Map1</anno></c> associated with <c><anno>Key</anno></c> by - calling <c><anno>Fun</anno></c> on the old value to get a new value. - If <c><anno>Key</anno></c> is not present - in <c><anno>Map1</anno></c> then <c><anno>Init</anno></c> will be associated with - <c><anno>Key</anno></c>. 
- </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="update_with" arity="4"/> + <fsummary></fsummary> + <desc> + <p>Update a value in a <c><anno>Map1</anno></c> associated + with <c><anno>Key</anno></c> by calling + <c><anno>Fun</anno></c> on the old value to get a new value. + If <c><anno>Key</anno></c> is not present in + <c><anno>Map1</anno></c> then <c><anno>Init</anno></c> will be + associated with <c><anno>Key</anno></c>. + </p> + <p>Example:</p> + <code type="none"> > Map = #{"counter" => 1}, Fun = fun(V) -> V + 1 end, maps:update_with("new counter",Fun,42,Map). @@ -417,56 +392,54 @@ error</code> </desc> </func> - <func> - <name name="values" arity="1"/> - <fsummary></fsummary> - <desc> - <p> - Returns a complete list of values, in arbitrary order, contained in map <c>Map</c>. - </p> - <p> - The call will fail with a <c>{badmap,Map}</c> exception if <c><anno>Map</anno></c> is not a map. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="values" arity="1"/> + <fsummary></fsummary> + <desc> + <p>Returns a complete list of values, in arbitrary order, contained in + map <c>Map</c>.</p> + <p>The call fails with a <c>{badmap,Map}</c> exception if + <c><anno>Map</anno></c> is not a map.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{42 => value_three,1337 => "value two","a" => 1}, maps:values(Map). [value_three,"value two",1]</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="with" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Returns a new map <c><anno>Map2</anno></c> with the keys <c>K1</c> through <c>Kn</c> and their associated values from map <c><anno>Map1</anno></c>. - Any key in <c><anno>Ks</anno></c> that does not exist in <c><anno>Map1</anno></c> are ignored. 
- </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="with" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Returns a new map <c><anno>Map2</anno></c> with the keys <c>K1</c> + through <c>Kn</c> and their associated values from map + <c><anno>Map1</anno></c>. Any key in <c><anno>Ks</anno></c> that does + not exist in <c><anno>Map1</anno></c> is ignored.</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{42 => value_three,1337 => "value two","a" => 1}, Ks = ["a",42,"other key"], maps:with(Ks,Map). #{42 => value_three,"a" => 1}</code> - </desc> - </func> + </desc> + </func> - <func> - <name name="without" arity="2"/> - <fsummary></fsummary> - <desc> - <p> - Returns a new map <c><anno>Map2</anno></c> without the keys <c>K1</c> through <c>Kn</c> and their associated values from map <c><anno>Map1</anno></c>. - Any key in <c><anno>Ks</anno></c> that does not exist in <c><anno>Map1</anno></c> are ignored. - </p> - <p>Example:</p> - <code type="none"> + <func> + <name name="without" arity="2"/> + <fsummary></fsummary> + <desc> + <p>Returns a new map <c><anno>Map2</anno></c> without keys <c>K1</c> + through <c>Kn</c> and their associated values from map + <c><anno>Map1</anno></c>. Any key in <c><anno>Ks</anno></c> that does + not exist in <c><anno>Map1</anno></c> is ignored</p> + <p><em>Example:</em></p> + <code type="none"> > Map = #{42 => value_three,1337 => "value two","a" => 1}, Ks = ["a",42,"other key"], maps:without(Ks,Map). 
#{1337 => "value two"}</code> - </desc> - </func> - </funcs> + </desc> + </func> + </funcs> </erlref> diff --git a/lib/stdlib/doc/src/math.xml b/lib/stdlib/doc/src/math.xml index 38084638f6..1358ce5cbf 100644 --- a/lib/stdlib/doc/src/math.xml +++ b/lib/stdlib/doc/src/math.xml @@ -30,78 +30,86 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-15</date> + <date>1997-01-15</date> <rev>B</rev> - <file>math.sgml</file> + <file>math.xml</file> </header> <module>math</module> - <modulesummary>Mathematical Functions</modulesummary> + <modulesummary>Mathematical functions.</modulesummary> <description> <p>This module provides an interface to a number of mathematical functions.</p> + <note> - <p>Not all functions are implemented on all platforms. In particular, - the <c>erf/1</c> and <c>erfc/1</c> functions are not implemented on Windows.</p> + <p>Not all functions are provided on all platforms. In particular, + the <seealso marker="#erf/1"><c>erf/1</c></seealso> and + <seealso marker="#erfc/1"><c>erfc/1</c></seealso> functions + are not provided on Windows.</p> </note> </description> + <funcs> <func> - <name name="pi" arity="0"/> - <fsummary>A useful number</fsummary> - <desc> - <p>A useful number.</p> - </desc> - </func> - <func> - <name name="sin" arity="1"/> - <name name="cos" arity="1"/> - <name name="tan" arity="1"/> - <name name="asin" arity="1"/> <name name="acos" arity="1"/> + <name name="acosh" arity="1"/> + <name name="asin" arity="1"/> + <name name="asinh" arity="1"/> <name name="atan" arity="1"/> <name name="atan2" arity="2"/> - <name name="sinh" arity="1"/> - <name name="cosh" arity="1"/> - <name name="tanh" arity="1"/> - <name name="asinh" arity="1"/> - <name name="acosh" arity="1"/> <name name="atanh" arity="1"/> + <name name="cos" arity="1"/> + <name name="cosh" arity="1"/> <name name="exp" arity="1"/> <name name="log" arity="1"/> - <name name="log2" arity="1"/> <name name="log10" arity="1"/> + <name name="log2" 
arity="1"/> <name name="pow" arity="2"/> + <name name="sin" arity="1"/> + <name name="sinh" arity="1"/> <name name="sqrt" arity="1"/> - <fsummary>Diverse math functions</fsummary> - <type variable="X" name_i="7"/> - <type variable="Y" name_i="7"/> + <name name="tan" arity="1"/> + <name name="tanh" arity="1"/> + <fsummary>Diverse math functions.</fsummary> + <type variable="X" name_i="6"/> + <type variable="Y" name_i="6"/> <desc> - <p>A collection of math functions which return floats. Arguments - are numbers. </p> + <p>A collection of mathematical functions that return floats. Arguments + are numbers.</p> </desc> </func> + <func> <name name="erf" arity="1"/> <fsummary>Error function.</fsummary> <desc> - <p>Returns the error function of <c><anno>X</anno></c>, where</p> + <p>Returns the error function of <c><anno>X</anno></c>, where:</p> <pre> -erf(X) = 2/sqrt(pi)*integral from 0 to X of exp(-t*t) dt. </pre> +erf(X) = 2/sqrt(pi)*integral from 0 to X of exp(-t*t) dt.</pre> </desc> </func> + <func> <name name="erfc" arity="1"/> - <fsummary>Another error function</fsummary> + <fsummary>Another error function.</fsummary> <desc> - <p><c>erfc(X)</c> returns <c>1.0 - erf(X)</c>, computed by - methods that avoid cancellation for large <c><anno>X</anno></c>. 
</p> + <p><c>erfc(X)</c> returns <c>1.0</c> - <c>erf(X)</c>, computed by + methods that avoid cancellation for large <c><anno>X</anno></c>.</p> </desc> </func> + + <func> + <name name="pi" arity="0"/> + <fsummary>A useful number.</fsummary> + <desc> + <p>A useful number.</p> + </desc> + </func> + </funcs> <section> - <title>Bugs</title> - <p>As these are the C library, the bugs are the same.</p> + <title>Limitations</title> + <p>As these are the C library, the same limitations apply.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/ms_transform.xml b/lib/stdlib/doc/src/ms_transform.xml index 84712486ea..0a05fa37c5 100644 --- a/lib/stdlib/doc/src/ms_transform.xml +++ b/lib/stdlib/doc/src/ms_transform.xml @@ -28,65 +28,81 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>99-02-09</date> + <date>1999-02-09</date> <rev>C</rev> - <file>ms_transform.sgml</file> + <file>ms_transform.xml</file> </header> <module>ms_transform</module> - <modulesummary>Parse_transform that translates fun syntax into match specifications. </modulesummary> + <modulesummary>A parse transformation that translates fun syntax into match + specifications.</modulesummary> <description> <marker id="top"></marker> - <p>This module implements the parse_transform that makes calls to - <c>ets</c> and <c>dbg</c>:<c>fun2ms/1</c> translate into literal - match specifications. It also implements the back end for the same - functions when called from the Erlang shell.</p> - <p>The translations from fun's to match_specs - is accessed through the two "pseudo - functions" <c>ets:fun2ms/1</c> and <c>dbg:fun2ms/1</c>.</p> - <p>Actually this introduction is more or less an introduction to the - whole concept of match specifications. 
Since everyone trying to use - <c>ets:select</c> or <c>dbg</c> seems to end up reading - this page, it seems in good place to explain a little more than - just what this module does.</p> - <p>There are some caveats one should be aware of, please read through - the whole manual page if it's the first time you're using the - transformations. </p> - <p>Match specifications are used more or less as filters. - They resemble usual Erlang matching in a list comprehension or in - a <c>fun</c> used in conjunction with <c>lists:foldl</c> etc. The - syntax of pure match specifications is somewhat awkward though, as - they are made up purely by Erlang terms and there is no syntax in the - language to make the match specifications more readable.</p> - <p>As the match specifications execution and structure is quite like - that of a fun, it would for most programmers be more straight forward - to simply write it using the familiar fun syntax and having that - translated into a match specification automatically. Of course a real - fun is more powerful than the match specifications allow, but bearing - the match specifications in mind, and what they can do, it's still + <p>This module provides the parse transformation that makes calls to + <seealso marker="ets"><c>ets</c></seealso> and + <seealso marker="runtime_tools:dbg#fun2ms/1"><c>dbg:fun2ms/1</c></seealso> + translate into literal match specifications. 
It also provides the back end + for the same functions when called from the Erlang shell.</p> + + <p>The translation from funs to match specifications + is accessed through the two "pseudo functions" + <seealso marker="ets#fun2ms/1"><c>ets:fun2ms/1</c></seealso> and + <seealso marker="runtime_tools:dbg#fun2ms/1"><c>dbg:fun2ms/1</c></seealso>.</p> + + <p>As everyone trying to use + <seealso marker="ets#select/1"><c>ets:select/2</c></seealso> or + <seealso marker="runtime_tools:dbg"><c>dbg</c></seealso> seems to end up + reading this manual page, this description is an introduction to the + concept of match specifications.</p> + + <p>Read the whole manual page if it is the first time you are using + the transformations.</p> + + <p>Match specifications are used more or less as filters. They resemble + usual Erlang matching in a list comprehension or in a fun used with + <seealso marker="lists#foldl/3"><c>lists:foldl/3</c></seealso>, and so on. + However, the syntax of pure match specifications is awkward, as + they are made up purely by Erlang terms, and the language has no + syntax to make the match specifications more readable.</p> + + <p>As the execution and structure of the match specifications are like + that of a fun, it is more straightforward + to write it using the familiar fun syntax and to have that + translated into a match specification automatically. A real fun is + clearly more powerful than the match specifications allow, but bearing + the match specifications in mind, and what they can do, it is still more convenient to write it all as a fun. This module contains the - code that simply translates the fun syntax into match_spec terms.</p> - <p>Let's start with an ets example. Using <c>ets:select</c> and - a match specification, one can filter out rows of a table and construct - a list of tuples containing relevant parts of the data in these - rows. Of course one could use <c>ets:foldl</c> instead, but the - select call is far more efficient. 
Without the translation, one has to - struggle with writing match specifications terms to accommodate this, - or one has to resort to the less powerful - <c>ets:match(_object)</c> calls, or simply give up and use - the more inefficient method of <c>ets:foldl</c>. Using the - <c>ets:fun2ms</c> transformation, a <c>ets:select</c> call - is at least as easy to write as any of the alternatives.</p> - <p>As an example, consider a simple table of employees:</p> + code that translates the fun syntax into match specification + terms.</p> + </description> + + <section> + <title>Example 1</title> + <p>Using <seealso marker="ets#select/2"><c>ets:select/2</c></seealso> + and a match specification, one can filter out rows of + a table and construct a list of tuples containing relevant parts + of the data in these rows. + One can use <seealso marker="ets#foldl/3"><c>ets:foldl/3</c></seealso> + instead, but the <c>ets:select/2</c> call is far more efficient. + Without the translation provided by <c>ms_transform</c>, + one must struggle with writing match specifications terms + to accommodate this.</p> + + <p>Consider a simple table of employees:</p> + <code type="none"> -record(emp, {empno, %Employee number as a string, the key surname, %Surname of the employee givenname, %Given name of employee - dept, %Department one of {dev,sales,prod,adm} - empyear}). %Year the employee was employed </code> + dept, %Department, one of {dev,sales,prod,adm} + empyear}). %Year the employee was employed</code> + <p>We create the table using:</p> + <code type="none"> -ets:new(emp_tab,[{keypos,#emp.empno},named_table,ordered_set]). 
</code> - <p>Let's also fill it with some randomly chosen data for the examples:</p> +ets:new(emp_tab, [{keypos,#emp.empno},named_table,ordered_set]).</code> + + <p>We fill the table with randomly chosen data:</p> + <code type="none"> [{emp,"011103","Black","Alfred",sales,2000}, {emp,"041231","Doe","John",prod,2001}, @@ -96,167 +112,204 @@ ets:new(emp_tab,[{keypos,#emp.empno},named_table,ordered_set]). </code> {emp,"535216","Chalker","Samuel",adm,1998}, {emp,"789789","Harrysson","Joe",adm,1996}, {emp,"963721","Scott","Juliana",dev,2003}, - {emp,"989891","Brown","Gabriel",prod,1999}] </code> - <p>Now, the amount of data in the table is of course to small to justify - complicated ets searches, but on real tables, using <c>select</c> to get - exactly the data you want will increase efficiency remarkably.</p> - <p>Lets say for example that we'd want the employee numbers of - everyone in the sales department. One might use <c>ets:match</c> - in such a situation:</p> + {emp,"989891","Brown","Gabriel",prod,1999}]</code> + + <p>Assuming that we want the employee numbers of everyone in the sales + department, there are several ways.</p> + + <p><c>ets:match/2</c> can be used:</p> + <pre> 1> <input>ets:match(emp_tab, {'_', '$1', '_', '_', sales, '_'}).</input> -[["011103"],["076324"]] </pre> - <p>Even though <c>ets:match</c> does not require a full match - specification, but a simpler type, it's still somewhat unreadable, and - one has little control over the returned result, it's always a list of - lists. OK, one might use <c>ets:foldl</c> or - <c>ets:foldr</c> instead:</p> +[["011103"],["076324"]]</pre> + + <p><c>ets:match/2</c> uses a simpler type of match specification, + but it is still unreadable, and one has little control over the + returned result. 
It is always a list of lists.</p> + + <p><seealso marker="ets#foldl/3"><c>ets:foldl/3</c></seealso> or + <seealso marker="ets#foldr/3"><c>ets:foldr/3</c></seealso> can be used to avoid the nested lists:</p> + <code type="none"> ets:foldr(fun(#emp{empno = E, dept = sales},Acc) -> [E | Acc]; (_,Acc) -> Acc end, [], - emp_tab). </code> - <p>Running that would result in <c>["011103","076324"]</c> - , which at least gets rid of the extra lists. The fun is also quite + emp_tab).</code> + + <p>The result is <c>["011103","076324"]</c>. The fun is straightforward, so the only problem is that all the data from the - table has to be transferred from the table to the calling process for - filtering. That's inefficient compared to the <c>ets:match</c> + table must be transferred from the table to the calling process for + filtering. That is inefficient compared to the <c>ets:match/2</c> call where the filtering can be done "inside" the emulator and only - the result is transferred to the process. Remember that ets tables are - all about efficiency, if it wasn't for efficiency all of ets could be - implemented in Erlang, as a process receiving requests and sending - answers back. One uses ets because one wants performance, and - therefore one wouldn't want all of the table transferred to the - process for filtering. OK, let's look at a pure - <c>ets:select</c> call that does what the <c>ets:foldr</c> - does:</p> + the result is transferred to the process.</p> + + <p>Consider a "pure" <c>ets:select/2</c> call that does what + <c>ets:foldr</c> does:</p> + <code type="none"> -ets:select(emp_tab,[{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]). </code> - <p>Even though the record syntax is used, it's still somewhat hard to +ets:select(emp_tab, [{#emp{empno = '$1', dept = sales, _='_'},[],['$1']}]).</code> + + <p>Although the record syntax is used, it is still hard to read and even harder to write. 
The first element of the tuple, - <c>#emp{empno = '$1', dept = sales, _='_'}</c> tells what to - match, elements not matching this will not be returned at all, as in - the <c>ets:match</c> example. The second element, the empty list - is a list of guard expressions, which we need none, and the third + <c>#emp{empno = '$1', dept = sales, _='_'}</c>, tells what to + match. Elements not matching this are not returned, as in + the <c>ets:match/2</c> example. The second element, the empty list, + is a list of guard expressions, which we do not need. The third element is the list of expressions constructing the return value (in - ets this almost always is a list containing one single term). In our - case <c>'$1'</c> is bound to the employee number in the head - (first element of tuple), and hence it is the employee number that is - returned. The result is <c>["011103","076324"]</c>, just as in - the <c>ets:foldr</c> example, but the result is retrieved much - more efficiently in terms of execution speed and memory consumption.</p> - <p>We have one efficient but hardly readable way of doing it and one - inefficient but fairly readable (at least to the skilled Erlang - programmer) way of doing it. With the use of <c>ets:fun2ms</c>, - one could have something that is as efficient as possible but still is - written as a filter using the fun syntax:</p> + ETS this is almost always a list containing one single term). + In our case <c>'$1'</c> is bound to the employee number in the head + (first element of the tuple), and hence the employee number is + returned. The result is <c>["011103","076324"]</c>, as in + the <c>ets:foldr/3</c> example, but the result is retrieved much + more efficiently in terms of execution speed and + memory consumption.</p> + + <p>Using <c>ets:fun2ms/1</c>, we can combine the ease of use of + the <c>ets:foldr/3</c> and the efficiency of the pure + <c>ets:select/2</c> example:</p> + <code type="none"> -include_lib("stdlib/include/ms_transform.hrl"). 
-% ... - ets:select(emp_tab, ets:fun2ms( fun(#emp{empno = E, dept = sales}) -> E - end)). </code> - <p>This may not be the shortest of the expressions, but it requires no - special knowledge of match specifications to read. The fun's head - should simply match what you want to filter out and the body returns - what you want returned. As long as the fun can be kept within the - limits of the match specifications, there is no need to transfer all - data of the table to the process for filtering as in the - <c>ets:foldr</c> example. In fact it's even easier to read then - the <c>ets:foldr</c> example, as the select call in itself - discards anything that doesn't match, while the fun of the - <c>foldr</c> call needs to handle both the elements matching and - the ones not matching.</p> - <p>It's worth noting in the above <c>ets:fun2ms</c> example that one - needs to include <c>ms_transform.hrl</c> in the source code, as this is - what triggers the parse transformation of the <c>ets:fun2ms</c> call - to a valid match specification. This also implies that the - transformation is done at compile time (except when called from the - shell of course) and therefore will take no resources at all in - runtime. So although you use the more intuitive fun syntax, it gets as - efficient in runtime as writing match specifications by hand.</p> - <p>Let's look at some more <c>ets</c> examples. Let's say one - wants to get all the employee numbers of any employee hired before the - year 2000. Using <c>ets:match</c> isn't an alternative here as - relational operators cannot be expressed there. Once again, an - <c>ets:foldr</c> could do it (slowly, but correct):</p> + end)).</code> + + <p>This example requires no special knowledge of match + specifications to understand. The head of the fun matches what + you want to filter out and the body returns what you want + returned. 
As long as the fun can be kept within the limits of the + match specifications, there is no need to transfer all table data + to the process for filtering as in the <c>ets:foldr/3</c> + example. It is easier to read than the <c>ets:foldr/3</c> example, + as the select call in itself discards anything that does not + match, while the fun of the <c>ets:foldr/3</c> call needs to + handle both the elements matching and the ones not matching.</p> + + <p>In the <c>ets:fun2ms/1</c> example above, it is needed to + include <c>ms_transform.hrl</c> in the source code, as this is + what triggers the parse transformation of the <c>ets:fun2ms/1</c> + call to a valid match specification. This also implies that the + transformation is done at compile time (except when called from + the shell) and therefore takes no resources in runtime. That is, + although you use the more intuitive fun syntax, it gets as + efficient in runtime as writing match specifications by hand.</p> + </section> + + <section> + <title>Example 2</title> + <p>Assume that we want to get all the employee numbers of employees + hired before year 2000. Using <c>ets:match/2</c> is not + an alternative here, as relational operators cannot be + expressed there. + Once again, <c>ets:foldr/3</c> can do it (slowly, but correct):</p> + <code type="none"><![CDATA[ ets:foldr(fun(#emp{empno = E, empyear = Y},Acc) when Y < 2000 -> [E | Acc]; (_,Acc) -> Acc end, [], emp_tab). ]]></code> - <p>The result will be - <c>["052341","076324","535216","789789","989891"]</c>, as - expected. Now the equivalent expression using a handwritten match - specification would look something like this:</p> + + <p>The result is <c>["052341","076324","535216","789789","989891"]</c>, + as expected. 
The equivalent expression using a handwritten match + specification would look like this:</p> + <code type="none"><![CDATA[ -ets:select(emp_tab,[{#emp{empno = '$1', empyear = '$2', _='_'}, +ets:select(emp_tab, [{#emp{empno = '$1', empyear = '$2', _='_'}, [{'<', '$2', 2000}], ['$1']}]). ]]></code> - <p>This gives the same result, the <c><![CDATA[[{'<', '$2', 2000}]]]></c> is in - the guard part and therefore discards anything that does not have a - empyear (bound to '$2' in the head) less than 2000, just as the guard - in the <c>foldl</c> example. Lets jump on to writing it using - <c>ets:fun2ms</c></p> + + <p>This gives the same result. <c><![CDATA[[{'<', '$2', 2000}]]]></c> is in + the guard part and therefore discards anything that does not have an + <c>empyear</c> (bound to <c>'$2'</c> in the head) less than 2000, as + the guard in the <c>foldr/3</c> example.</p> + + <p>We write it using <c>ets:fun2ms/1</c>:</p> + <code type="none"><![CDATA[ -include_lib("stdlib/include/ms_transform.hrl"). -% ... - ets:select(emp_tab, ets:fun2ms( fun(#emp{empno = E, empyear = Y}) when Y < 2000 -> - E + E end)). ]]></code> - <p>Obviously readability is gained by using the parse transformation.</p> - <p>I'll show some more examples without the tiresome - comparing-to-alternatives stuff. Let's say we'd want the whole object - matching instead of only one element. We could of course assign a - variable to every part of the record and build it up once again in the - body of the <c>fun</c>, but it's easier to do like this:</p> + </section> + + <section> + <title>Example 3</title> + <p>Assume that we want the whole object matching instead of only one + element. One alternative is to assign a variable to every part + of the record and build it up once again in the body of the fun, but + the following is easier:</p> + <code type="none"><![CDATA[ ets:select(emp_tab, ets:fun2ms( fun(Obj = #emp{empno = E, empyear = Y}) when Y < 2000 -> Obj - end)). 
]]></code> - <p>Just as in ordinary Erlang matching, you can bind a variable to the - whole matched object using a "match in then match", i.e. a - <c>=</c>. Unfortunately this is not general in <c>fun's</c> translated - to match specifications, only on the "top level", i.e. matching the - <em>whole</em> object arriving to be matched into a separate variable, - is it allowed. For the one's used to writing match specifications by - hand, I'll have to mention that the variable A will simply be - translated into '$_'. It's not general, but it has very common usage, - why it is handled as a special, but useful, case. If this bothers you, - the pseudo function <c>object</c> also returns the whole matched - object, see the part about caveats and limitations below.</p> - <p>Let's do something in the <c>fun</c>'s body too: Let's say - that someone realizes that there are a few people having an employee - number beginning with a zero (<c>0</c>), which shouldn't be - allowed. All those should have their numbers changed to begin with a - one (<c>1</c>) instead and one wants the - list <c><![CDATA[[{<Old empno>,<New empno>}]]]></c> created:</p> + end)).]]></code> + + <p>As in ordinary Erlang matching, you can bind a variable to the + whole matched object using a "match inside the match", that is, a + <c>=</c>. Unfortunately in funs translated to match specifications, + it is allowed only at the "top-level", that is, + matching the <em>whole</em> object arriving to be matched + into a separate variable. + If you are used to writing match specifications by hand, we + mention that variable A is simply translated into '$_'. + Alternatively, pseudo function <c>object/0</c> + also returns the whole matched object, see section + <seealso marker="#warnings_and_restrictions"> + Warnings and Restrictions</seealso>.</p> + </section> + + <section> + <title>Example 4</title> + <p>This example concerns the body of the fun. 
Assume that all employee + numbers beginning with zero (<c>0</c>) must be changed to begin with + one (<c>1</c>) instead, and that we want to create the list + <c><![CDATA[[{<Old empno>,<New empno>}]]]></c>:</p> + <code type="none"> ets:select(emp_tab, ets:fun2ms( fun(#emp{empno = [$0 | Rest] }) -> {[$0|Rest],[$1|Rest]} - end)). </code> - <p>As a matter of fact, this query hits the feature of partially bound - keys in the table type <c>ordered_set</c>, so that not the whole - table need be searched, only the part of the table containing keys - beginning with <c>0</c> is in fact looked into. </p> - <p>The fun of course can have several clauses, so that if one could do - the following: For each employee, if he or she is hired prior to 1997, - return the tuple <c><![CDATA[{inventory, <employee number>}]]></c>, for each hired 1997 - or later, but before 2001, return <c><![CDATA[{rookie, <employee number>}]]></c>, for all others return <c><![CDATA[{newbie, <employee number>}]]></c>. All except for the ones named <c>Smith</c> as - they would be affronted by anything other than the tag - <c>guru</c> and that is also what's returned for their numbers; - <c><![CDATA[{guru, <employee number>}]]></c>:</p> + end)).</code> + + <p>This query hits the feature of partially bound + keys in table type <c>ordered_set</c>, so that not the whole + table needs to be searched, only the part containing keys + beginning with <c>0</c> is looked into.</p> + </section> + + <section> + <title>Example 5</title> + <p>The fun can have many clauses. 
Assume that we want to do + the following:</p> + + <list type="bulleted"> + <item> + <p>If an employee started before 1997, return the tuple + <c><![CDATA[{inventory, <employee number>}]]></c>.</p> + </item> + <item> + <p>If an employee started 1997 or later, but before 2001, return + <c><![CDATA[{rookie, <employee number>}]]></c>.</p> + </item> + <item> + <p>For all other employees, return + <c><![CDATA[{newbie, <employee number>}]]></c>, except for those + named <c>Smith</c> as they would be affronted by anything other + than the tag <c>guru</c> and that is also what is returned for their + numbers: <c><![CDATA[{guru, <employee number>}]]></c>.</p> + </item> + </list> + + <p>This is accomplished as follows:</p> + <code type="none"><![CDATA[ ets:select(emp_tab, ets:fun2ms( fun(#emp{empno = E, surname = "Smith" }) -> @@ -268,7 +321,9 @@ ets:select(emp_tab, ets:fun2ms( (#emp{empno = E, empyear = Y}) -> % 1997 -- 2001 {rookie, E} end)). ]]></code> - <p>The result will be:</p> + + <p>The result is as follows:</p> + <code type="none"> [{rookie,"011103"}, {rookie,"041231"}, @@ -278,162 +333,207 @@ ets:select(emp_tab, ets:fun2ms( {rookie,"535216"}, {inventory,"789789"}, {newbie,"963721"}, - {rookie,"989891"}] </code> - <p>and so the Smith's will be happy...</p> - <p>So, what more can you do? Well, the simple answer would be; look - in the documentation of match specifications in ERTS users - guide. However let's briefly go through the most useful "built in - functions" that you can use when the <c>fun</c> is to be - translated into a match specification by <c>ets:fun2ms</c> (it's - worth mentioning, although it might be obvious to some, that calling - other functions than the one's allowed in match specifications cannot - be done. No "usual" Erlang code can be executed by the <c>fun</c> being - translated by <c>fun2ms</c>, the <c>fun</c> is after all limited + {rookie,"989891"}]</code> + </section> + + <section> + <title>Useful BIFs</title> + <p>What more can you do? 
A simple answer is: see the documentation of + <seealso marker="erts:match_spec">match specifications</seealso> + in ERTS User's Guide. + However, the following is a brief overview of the most useful "built-in + functions" that you can use when the fun is to be translated into a match + specification by + <seealso marker="ets#fun2ms/1"> <c>ets:fun2ms/1</c></seealso>. It is not + possible to call other functions than those allowed in match + specifications. No "usual" Erlang code can be executed by the fun that + is translated by <c>ets:fun2ms/1</c>. The fun is limited exactly to the power of the match specifications, which is - unfortunate, but the price one has to pay for the execution speed of - an <c>ets:select</c> compared to <c>ets:foldl/foldr</c>).</p> - <p>The head of the <c>fun</c> is obviously a head matching (or mismatching) - <em>one</em> parameter, one object of the table we <c>select</c> + unfortunate, but the price one must pay for the execution speed of + <c>ets:select/2</c> compared to <c>ets:foldl/foldr</c>.</p> + + <p>The head of the fun is a head matching (or mismatching) + <em>one</em> parameter, one object of the table we select from. The object is always a single variable (can be <c>_</c>) or - a tuple, as that's what's in <c>ets, dets</c> and - <c>mnesia</c> tables (the match specification returned by - <c>ets:fun2ms</c> can of course be used with - <c>dets:select</c> and <c>mnesia:select</c> as well as - with <c>ets:select</c>). The use of <c>=</c> in the head - is allowed (and encouraged) on the top level.</p> + a tuple, as ETS, Dets, and Mnesia tables include + that. The match specification returned by <c>ets:fun2ms/1</c> can + be used with <c>dets:select/2</c> and <c>mnesia:select/2</c>, and + with <c>ets:select/2</c>. The use of <c>=</c> in the head + is allowed (and encouraged) at the top-level.</p> + <p>The guard section can contain any guard expression of Erlang. 
- Even the "old" type test are allowed on the toplevel of the guard - (<c>integer(X)</c> instead of <c>is_integer(X)</c>). As the new type tests (the - <c>is_</c> tests) are in practice just guard bif's they can also - be called from within the body of the fun, but so they can in ordinary - Erlang code. Also arithmetics is allowed, as well as ordinary guard - bif's. Here's a list of bif's and expressions:</p> + The following is a list of BIFs and expressions:</p> + <list type="bulleted"> - <item>The type tests: is_atom, is_float, is_integer, - is_list, is_number, is_pid, is_port, is_reference, is_tuple, - is_binary, is_function, is_record</item> - <item>The boolean operators: not, and, or, andalso, orelse </item> - <item>The relational operators: >, >=, <, =<, =:=, ==, =/=, /=</item> - <item>Arithmetics: +, -, *, div, rem</item> - <item>Bitwise operators: band, bor, bxor, bnot, bsl, bsr</item> - <item>The guard bif's: abs, element, hd, length, node, round, size, tl, - trunc, self</item> - <item>The obsolete type test (only in guards): - atom, float, integer, - list, number, pid, port, reference, tuple, - binary, function, record</item> + <item> + <p>Type tests: <c>is_atom</c>, <c>is_float</c>, <c>is_integer</c>, + <c>is_list</c>, <c>is_number</c>, <c>is_pid</c>, <c>is_port</c>, + <c>is_reference</c>, <c>is_tuple</c>, <c>is_binary</c>, + <c>is_function</c>, <c>is_record</c></p> + </item> + <item> + <p>Boolean operators: <c>not</c>, <c>and</c>, <c>or</c>, + <c>andalso</c>, <c>orelse</c></p> + </item> + <item> + <p>Relational operators: >, >=, <, =<, =:=, ==, =/=, /=</p> + </item> + <item> + <p>Arithmetics: <c>+</c>, <c>-</c>, <c>*</c>, + <c>div</c>, <c>rem</c></p> + </item> + <item> + <p>Bitwise operators: <c>band</c>, <c>bor</c>, <c>bxor</c>, <c>bnot</c>, + <c>bsl</c>, <c>bsr</c></p> + </item> + <item> + <p>The guard BIFs: <c>abs</c>, <c>element</c>, + <c>hd</c>, <c>length</c>, + <c>node</c>, <c>round</c>, <c>size</c>, <c>tl</c>, <c>trunc</c>, + <c>self</c></p> + 
</item> </list> + <p>Contrary to the fact with "handwritten" match specifications, the <c>is_record</c> guard works as in ordinary Erlang code.</p> - <p>Semicolons (<c>;</c>) in guards are allowed, the result will be (as - expected) one "match_spec-clause" for each semicolon-separated - part of the guard. The semantics being identical to the Erlang + + <p>Semicolons (<c>;</c>) in guards are allowed, the result is (as + expected) one "match specification clause" for each semicolon-separated + part of the guard. The semantics is identical to the Erlang semantics.</p> - <p>The body of the <c>fun</c> is used to construct the - resulting value. When selecting from tables one usually just construct + + <p>The body of the fun is used to construct the + resulting value. When selecting from tables, one usually construct a suiting term here, using ordinary Erlang term construction, like - tuple parentheses, list brackets and variables matched out in the - head, possibly in conjunction with the occasional constant. Whatever - expressions are allowed in guards are also allowed here, but there are - no special functions except <c>object</c> and + tuple parentheses, list brackets, and variables matched out in the + head, possibly with the occasional constant. Whatever + expressions are allowed in guards are also allowed here, but no special + functions exist except <c>object</c> and <c>bindings</c> (see further down), which returns the whole - matched object and all known variable bindings respectively.</p> + matched object and all known variable bindings, respectively.</p> + <p>The <c>dbg</c> variants of match specifications have an - imperative approach to the match specification body, the ets dialect - hasn't. The fun body for <c>ets:fun2ms</c> returns the result - without side effects, and as matching (<c>=</c>) in the body of + imperative approach to the match specification body, the ETS + dialect has not. 
The fun body for <c>ets:fun2ms/1</c> returns the result + without side effects. As matching (<c>=</c>) in the body of the match specifications is not allowed (for performance reasons) the - only thing left, more or less, is term construction...</p> - <p>Let's move on to the <c>dbg</c> dialect, the slightly - different match specifications translated by <c>dbg:fun2ms</c>. </p> - <p>The same reasons for using the parse transformation applies to - <c>dbg</c>, maybe even more so as filtering using Erlang code is - simply not a good idea when tracing (except afterwards, if you trace - to file). The concept is similar to that of <c>ets:fun2ms</c> - except that you usually use it directly from the shell (which can also - be done with <c>ets:fun2ms</c>). </p> - <p>Let's manufacture a toy module to trace on </p> + only thing left, more or less, is term construction.</p> + </section> + + <section> + <title>Example with dbg</title> + <p>This section describes the slightly different match specifications + translated by <seealso marker="runtime_tools:dbg#fun2ms/1"> + <c>dbg:fun2ms/1</c></seealso>.</p> + + <p>The same reasons for using the parse transformation apply to + <c>dbg</c>, maybe even more, as filtering using Erlang code is + not a good idea when tracing (except afterwards, if you trace + to file). The concept is similar to that of <c>ets:fun2ms/1</c> + except that you usually use it directly from the shell + (which can also be done with <c>ets:fun2ms/1</c>).</p> + + <p>The following is an example module to trace on:</p> + <code type="none"> -module(toy). -export([start/1, store/2, retrieve/1]). start(Args) -> - toy_table = ets:new(toy_table,Args). + toy_table = ets:new(toy_table, Args). store(Key, Value) -> - ets:insert(toy_table,{Key,Value}). + ets:insert(toy_table, {Key,Value}). retrieve(Key) -> - [{Key, Value}] = ets:lookup(toy_table,Key), - Value. 
</code> - <p>During model testing, the first test bails out with a + [{Key, Value}] = ets:lookup(toy_table, Key), + Value.</code> + + <p>During model testing, the first test results in <c>{badmatch,16}</c> in <c>{toy,start,1}</c>, why?</p> - <p>We suspect the ets call, as we match hard on the return value, but - want only the particular <c>new</c> call with - <c>toy_table</c> as first parameter. - So we start a default tracer on the node:</p> + + <p>We suspect the <c>ets:new/2</c> call, as we match hard on the + return value, but want only the particular <c>new/2</c> call with + <c>toy_table</c> as first parameter. So we start a default tracer + on the node:</p> + <pre> 1> <input>dbg:tracer().</input> {ok,<0.88.0>}</pre> - <p>And so we turn on call tracing for all processes, we are going to - make a pretty restrictive trace pattern, so there's no need to call - trace only a few processes (it usually isn't):</p> + + <p>We turn on call tracing for all processes, we want to + make a pretty restrictive trace pattern, so there is no need to call + trace only a few processes (usually it is not):</p> + <pre> 2> <input>dbg:p(all,call).</input> -{ok,[{matched,nonode@nohost,25}]} </pre> - <p>It's time to specify the filter. We want to view calls that resemble - <c><![CDATA[ets:new(toy_table,<something>)]]></c>:</p> +{ok,[{matched,nonode@nohost,25}]}</pre> + + <p>We specify the filter, we want to view calls that resemble + <c><![CDATA[ets:new(toy_table, <something>)]]></c>:</p> + <pre> 3> <input>dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> true end)).</input> -{ok,[{matched,nonode@nohost,1},{saved,1}]} </pre> - <p>As can be seen, the <c>fun</c>'s used with - <c>dbg:fun2ms</c> takes a single list as parameter instead of a +{ok,[{matched,nonode@nohost,1},{saved,1}]}</pre> + + <p>As can be seen, the fun used with + <c>dbg:fun2ms/1</c> takes a single list as parameter instead of a single tuple. The list matches a list of the parameters to the traced - function. 
A single variable may also be used of course. The body - of the fun expresses in a more imperative way actions to be taken if - the fun head (and the guards) matches. I return <c>true</c> here, but it's - only because the body of a fun cannot be empty, the return value will - be discarded. </p> - <p>When we run the test of our module now, we get the following trace - output:</p> + function. A single variable can also be used. The body + of the fun expresses, in a more imperative way, actions to be taken if + the fun head (and the guards) matches. <c>true</c> is returned here, + only because the body of a fun cannot be empty. The return value + is discarded.</p> + + <p>The following trace output is received during test:</p> + <code type="none"><![CDATA[ -(<0.86.0>) call ets:new(toy_table,[ordered_set]) ]]></code> - <p>Let's play we haven't spotted the problem yet, and want to see what - <c>ets:new</c> returns. We do a slightly different trace - pattern:</p> +(<0.86.0>) call ets:new(toy_table, [ordered_set]) ]]></code> + + <p>Assume that we have not found the problem yet, and want to see what + <c>ets:new/2</c> returns. We use a slightly different trace pattern:</p> + <pre> 4> <input>dbg:tp(ets,new,dbg:fun2ms(fun([toy_table,_]) -> return_trace() end)).</input></pre> - <p>Resulting in the following trace output when we run the test:</p> + + <p>The following trace output is received during test:</p> + <code type="none"><![CDATA[ (<0.86.0>) call ets:new(toy_table,[ordered_set]) (<0.86.0>) returned from ets:new/2 -> 24 ]]></code> - <p>The call to <c>return_trace</c>, makes a trace message appear + + <p>The call to <c>return_trace</c> results in a trace message when the function returns. It applies only to the specific function call triggering the match specification (and matching the head/guards of - the match specification). This is the by far the most common call in the + the match specification). 
This is by far the most common call in the body of a <c>dbg</c> match specification.</p> - <p>As the test now fails with <c>{badmatch,24}</c>, it's obvious - that the badmatch is because the atom <c>toy_table</c> does not - match the number returned for an unnamed table. So we spotted the - problem, the table should be named and the arguments supplied by our - test program does not include <c>named_table</c>. We rewrite the - start function to:</p> + + <p>The test now fails with <c>{badmatch,24}</c> because the atom + <c>toy_table</c> does not match the number returned for an unnamed table. + So, the problem is found, the table is to be named, and the arguments + supplied by the test program do not include <c>named_table</c>. We + rewrite the start function:</p> + <code type="none"> start(Args) -> - toy_table = ets:new(toy_table,[named_table |Args]). </code> - <p>And with the same tracing turned on, we get the following trace - output:</p> + toy_table = ets:new(toy_table, [named_table|Args]).</code> + + <p>With the same tracing turned on, the following trace output is + received:</p> + <code type="none"><![CDATA[ (<0.86.0>) call ets:new(toy_table,[named_table,ordered_set]) (<0.86.0>) returned from ets:new/2 -> toy_table ]]></code> - <p>Very well. Let's say the module now passes all testing and goes into - the system. After a while someone realizes that the table - <c>toy_table</c> grows while the system is running and that for some - reason there are a lot of elements with atom's as keys. You had - expected only integer keys and so does the rest of the system. Well, - obviously not all of the system. You turn on call tracing and try to - see calls to your module with an atom as the key:</p> + + <p>Assume that the module now passes all testing and goes into + the system. After a while, it is found that table + <c>toy_table</c> grows while the system is running and that + there are many elements with atoms as keys. 
We expected + only integer keys and so does the rest of the system, but + clearly not the entire system. We turn on call tracing and try to + see calls to the module with an atom as the key:</p> + <pre> 1> <input>dbg:tracer().</input> {ok,<0.88.0>} @@ -441,80 +541,101 @@ start(Args) -> {ok,[{matched,nonode@nohost,25}]} 3> <input>dbg:tpl(toy,store,dbg:fun2ms(fun([A,_]) when is_atom(A) -> true end)).</input> {ok,[{matched,nonode@nohost,1},{saved,1}]}</pre> - <p>We use <c>dbg:tpl</c> here to make sure to catch local calls - (let's say the module has grown since the smaller version and we're - not sure this inserting of atoms is not done locally...). When in - doubt always use local call tracing.</p> - <p>Let's say nothing happens when we trace in this way. Our function - is never called with these parameters. We make the conclusion that - someone else (some other module) is doing it and we realize that we - must trace on ets:insert and want to see the calling function. The - calling function may be retrieved using the match specification - function <c>caller</c> and to get it into the trace message, one - has to use the match spec function <c>message</c>. The filter - call looks like this (looking for calls to <c>ets:insert</c>):</p> + + <p>We use <c>dbg:tpl/3</c> to ensure to catch local calls + (assume that the module has grown since the smaller version and we are + unsure if this inserting of atoms is not done locally). When in + doubt, always use local call tracing.</p> + + <p>Assume that nothing happens when tracing in this way. The function + is never called with these parameters. We conclude that + someone else (some other module) is doing it and realize that we + must trace on <c>ets:insert/2</c> and want to see the calling function. + The calling function can be retrieved using the match specification + function <c>caller</c>. To get it into the trace message, the match + specification function <c>message</c> must be used. 
The filter + call looks like this (looking for calls to <c>ets:insert/2</c>):</p> + <pre> 4> <input>dbg:tpl(ets,insert,dbg:fun2ms(fun([toy_table,{A,_}]) when is_atom(A) -> </input> <input> message(caller()) </input> <input> end)). </input> -{ok,[{matched,nonode@nohost,1},{saved,2}]} </pre> - <p>The caller will now appear in the "additional message" part of the - trace output, and so after a while, the following output comes:</p> +{ok,[{matched,nonode@nohost,1},{saved,2}]}</pre> + + <p>The caller is now displayed in the "additional message" part of the + trace output, and the following is displayed after a while:</p> + <code type="none"><![CDATA[ (<0.86.0>) call ets:insert(toy_table,{garbage,can}) ({evil_mod,evil_fun,2}) ]]></code> - <p>You have found out that the function <c>evil_fun</c> of the - module <c>evil_mod</c>, with arity <c>2</c>, is the one - causing all this trouble.</p> - <p>This was just a toy example, but it illustrated the most used - calls in match specifications for <c>dbg</c> The other, more - esotheric calls are listed and explained in the <em>Users guide of the ERTS application</em>, they really are beyond the scope of this - document.</p> - <p>To end this chatty introduction with something more precise, here - follows some parts about caveats and restrictions concerning the fun's - used in conjunction with <c>ets:fun2ms</c> and - <c>dbg:fun2ms</c>:</p> + + <p>You have realized that function <c>evil_fun</c> of the + <c>evil_mod</c> module, with arity <c>2</c>, is causing all this trouble. + </p> + + <p>This example illustrates the most used calls in match specifications for + <c>dbg</c>. 
The other, more esoteric, calls are listed and explained in + <seealso marker="erts:match_spec">Match specifications in Erlang</seealso> + in ERTS User's Guide, as they are beyond + the scope of this description.</p> + </section> + + <section> + <title>Warnings and Restrictions</title> + <marker id="warnings_and_restrictions"/> + <p>The following warnings and restrictions apply to the funs used in + with <c>ets:fun2ms/1</c> and <c>dbg:fun2ms/1</c>.</p> + <warning> - <p>To use the pseudo functions triggering the translation, one - <em>has to</em> include the header file <c>ms_transform.hrl</c> - in the source code. Failure to do so will possibly result in - runtime errors rather than compile time, as the expression may + <p>To use the pseudo functions triggering the translation, + ensure to include the header file <c>ms_transform.hrl</c> + in the source code. Failure to do so possibly results in + runtime errors rather than compile time, as the expression can be valid as a plain Erlang program without translation.</p> </warning> + <warning> - <p>The <c>fun</c> has to be literally constructed inside the - parameter list to the pseudo functions. The <c>fun</c> cannot + <p>The fun must be literally constructed inside the + parameter list to the pseudo functions. The fun cannot be bound to a variable first and then passed to - <c>ets:fun2ms</c> or <c>dbg:fun2ms</c>, i.e this - will work: <c>ets:fun2ms(fun(A) -> A end)</c> but not this: - <c>F = fun(A) -> A end, ets:fun2ms(F)</c>. The later will result - in a compile time error if the header is included, otherwise a - runtime error. Even if the later construction would ever - appear to work, it really doesn't, so don't ever use it.</p> + <c>ets:fun2ms/1</c> or <c>dbg:fun2ms/1</c>. For example, + <c>ets:fun2ms(fun(A) -> A end)</c> works, but not + <c>F = fun(A) -> A end, ets:fun2ms(F)</c>. 
The latter results + in a compile-time error if the header is included, otherwise a + runtime error.</p> </warning> - <p>Several restrictions apply to the fun that is being translated - into a match_spec. To put it simple you cannot use anything in - the fun that you cannot use in a match_spec. This means that, + + <p>Many restrictions apply to the fun that is translated into a match + specification. To put it simple: you cannot use anything in the fun + that you cannot use in a match specification. This means that, among others, the following restrictions apply to the fun itself:</p> + <list type="bulleted"> - <item>Functions written in Erlang cannot be called, neither - local functions, global functions or real fun's</item> - <item>Everything that is written as a function call will be - translated into a match_spec call to a builtin function, so that - the call <c>is_list(X)</c> will be translated to <c>{'is_list', '$1'}</c> (<c>'$1'</c> is just an example, the numbering may - vary). If one tries to call a function that is not a match_spec - builtin, it will cause an error.</item> - <item>Variables occurring in the head of the <c>fun</c> will be - replaced by match_spec variables in the order of occurrence, so - that the fragment <c>fun({A,B,C})</c> will be replaced by - <c>{'$1', '$2', '$3'}</c> etc. Every occurrence of such a - variable later in the match_spec will be replaced by a - match_spec variable in the same way, so that the fun - <c>fun({A,B}) when is_atom(A) -> B end</c> will be translated into - <c>[{{'$1','$2'},[{is_atom,'$1'}],['$2']}]</c>.</item> <item> - <p>Variables that are not appearing in the head are imported - from the environment and made into - match_spec <c>const</c> expressions. 
Example from the shell:</p> + <p>Functions written in Erlang cannot be called, neither can + local functions, global functions, or real funs.</p> + </item> + <item> + <p>Everything that is written as a function call is translated + into a match specification call to a built-in function, so that + the call <c>is_list(X)</c> is translated to <c>{'is_list', '$1'}</c> + (<c>'$1'</c> is only an example, the numbering can vary). + If one tries to call a function that is not a match specification + built-in, it causes an error.</p> + </item> + <item> + <p>Variables occurring in the head of the fun are replaced by + match specification variables in the order of occurrence, so + that fragment <c>fun({A,B,C})</c> is replaced by + <c>{'$1', '$2', '$3'}</c>, and so on. Every occurrence of such a + variable in the match specification is replaced by a match + specification variable in the same way, so that the fun + <c>fun({A,B}) when is_atom(A) -> B end</c> is translated into + <c>[{{'$1','$2'},[{is_atom,'$1'}],['$2']}]</c>.</p> + </item> + <item> + <p>Variables that are not included in the head are imported + from the environment and made into match specification + <c>const</c> expressions. Example from the shell:</p> <pre> 1> <input>X = 25.</input> 25 @@ -523,7 +644,7 @@ start(Args) -> </item> <item> <p>Matching with <c>=</c> cannot be used in the body. It can only - be used on the top level in the head of the fun. + be used on the top-level in the head of the fun. Example from the shell again:</p> <pre> 1> <input>ets:fun2ms(fun({A,[B|C]} = D) when A > B -> D end).</input> @@ -534,106 +655,125 @@ match_spec {error,transform_error} 3> <input>ets:fun2ms(fun({A,[B|C]}) when A > B -> D = [B|C], D end).</input> Error: fun with body matching ('=' in body) is illegal as match_spec -{error,transform_error} </pre> - <p>All variables are bound in the head of a match_spec, so the - translator can not allow multiple bindings. 
The special case - when matching is done on the top level makes the variable bind - to <c>'$_'</c> in the resulting match_spec, it is to allow a more - natural access to the whole matched object. The pseudo - function <c>object()</c> could be used instead, see below. - The following expressions are translated equally: </p> +{error,transform_error}</pre> + <p>All variables are bound in the head of a match specification, so + the translator cannot allow multiple bindings. The special case + when matching is done on the top-level makes the variable bind + to <c>'$_'</c> in the resulting match specification. It is to allow + a more natural access to the whole matched object. Pseudo + function <c>object()</c> can be used instead, see below.</p> + <p>The following expressions are translated equally:</p> <code type="none"> ets:fun2ms(fun({a,_} = A) -> A end). ets:fun2ms(fun({a,_}) -> object() end).</code> </item> <item> - <p>The special match_spec variables <c>'$_'</c> and <c>'$*'</c> + <p>The special match specification variables <c>'$_'</c> and <c>'$*'</c> can be accessed through the pseudo functions <c>object()</c> (for <c>'$_'</c>) and <c>bindings()</c> (for <c>'$*'</c>). - as an example, one could translate the following - <c>ets:match_object/2</c> call to a <c>ets:select</c> call:</p> + As an example, one can translate the following + <c>ets:match_object/2</c> call to a <c>ets:select/2</c> call:</p> <code type="none"> ets:match_object(Table, {'$1',test,'$2'}). </code> - <p>...is the same as...</p> + <p>This is the same as:</p> <code type="none"> ets:select(Table, ets:fun2ms(fun({A,test,B}) -> object() end)).</code> - <p>(This was just an example, in this simple case the former - expression is probably preferable in terms of readability). 
- The <c>ets:select/2</c> call will conceptually look like this + <p>In this simple case, the former + expression is probably preferable in terms of readability.</p> + <p>The <c>ets:select/2</c> call conceptually looks like this in the resulting code:</p> <code type="none"> ets:select(Table, [{{'$1',test,'$2'},[],['$_']}]).</code> - <p>Matching on the top level of the fun head might feel like a + <p>Matching on the top-level of the fun head can be a more natural way to access <c>'$_'</c>, see above.</p> </item> - <item>Term constructions/literals are translated as much as is - needed to get them into valid match_specs, so that tuples are - made into match_spec tuple constructions (a one element tuple - containing the tuple) and constant expressions are used when - importing variables from the environment. Records are also - translated into plain tuple constructions, calls to element - etc. The guard test <c>is_record/2</c> is translated into - match_spec code using the three parameter version that's built - into match_specs, so that <c>is_record(A,t)</c> is translated - into <c>{is_record,'$1',t,5}</c> given that the record size of - record type <c>t</c> is 5.</item> - <item>Language constructions like <c>case</c>, <c>if</c>, - <c>catch</c> etc that are not present in match_specs are not - allowed.</item> - <item>If the header file <c>ms_transform.hrl</c> is not included, - the fun won't be translated, which may result in a - <em>runtime error</em> (depending on if the fun is valid in a - pure Erlang context). Be absolutely sure that the header is - included when using <c>ets</c> and <c>dbg:fun2ms/1</c> in - compiled code.</item> - <item>If the pseudo function triggering the translation is - <c>ets:fun2ms/1</c>, the fun's head must contain a single - variable or a single tuple. 
If the pseudo function is - <c>dbg:fun2ms/1</c> the fun's head must contain a single - variable or a single list.</item> + <item> + <p>Term constructions/literals are translated as much as is needed to + get them into valid match specification. This way tuples are made + into match specification tuple constructions (a one element tuple + containing the tuple) and constant expressions are used when + importing variables from the environment. Records are also + translated into plain tuple constructions, calls to element, + and so on. The guard test <c>is_record/2</c> is translated into + match specification code using the three parameter version that is + built into match specification, so that <c>is_record(A,t)</c> is + translated into <c>{is_record,'$1',t,5}</c> if the record + size of record type <c>t</c> is 5.</p> + </item> + <item> + <p>Language constructions such as <c>case</c>, <c>if</c>, and + <c>catch</c> that are not present in match specifications are not + allowed.</p> + </item> + <item> + <p>If header file <c>ms_transform.hrl</c> is not included, + the fun is not translated, which can result in a + <em>runtime error</em> (depending on whether the fun is + valid in a pure Erlang context).</p> + <p>Ensure that the header is included when using <c>ets</c> and + <c>dbg:fun2ms/1</c> in compiled code.</p> + </item> + <item> + <p>If pseudo function triggering the translation is + <c>ets:fun2ms/1</c>, the head of the fun must contain a single + variable or a single tuple. If the pseudo function is + <c>dbg:fun2ms/1</c>, the head of the fun must contain a single + variable or a single list.</p> + </item> </list> - <p>The translation from fun's to match_specs is done at compile + <p>The translation from funs to match specifications is done at compile time, so runtime performance is not affected by using these pseudo - functions. The compile time might be somewhat longer though. 
</p> - <p>For more information about match_specs, please read about them - in <em>ERTS users guide</em>.</p> - </description> + functions.</p> + <p>For more information about match specifications, see the + <seealso marker="erts:match_spec">Match specifications in Erlang</seealso> + in ERTS User's Guide.</p> + </section> + <funcs> <func> - <name name="parse_transform" arity="2"/> - <fsummary>Transforms Erlang abstract format containing calls to ets/dbg:fun2ms into literal match specifications.</fsummary> - <type_desc variable="Options">Option list, required but not used.</type_desc> + <name name="format_error" arity="1"/> + <fsummary>Error formatting function as required by the parse transformation interface.</fsummary> <desc> - <p>Implements the actual transformation at compile time. This - function is called by the compiler to do the source code - transformation if and when the <c>ms_transform.hrl</c> header - file is included in your source code. See the <c>ets</c> and - <c>dbg</c>:<c>fun2ms/1</c> function manual pages for - documentation on how to use this parse_transform, see the - <c>match_spec</c> chapter in <c>ERTS</c> users guide for a - description of match specifications. </p> + <p>Takes an error code returned by one of the other functions + in the module and creates a textual description of the + error.</p> </desc> </func> + <func> - <name name="transform_from_shell" arity="3"/> - <fsummary>Used when transforming fun's created in the shell into match_specifications.</fsummary> - <type_desc variable="BoundEnvironment">List of variable bindings in the shell environment.</type_desc> + <name name="parse_transform" arity="2"/> + <fsummary>Transforms Erlang abstract format containing calls to + ets/dbg:fun2ms/1 into literal match specifications.</fsummary> + <type_desc variable="Options">Option list, required but not used. + </type_desc> <desc> - <p>Implements the actual transformation when the <c>fun2ms</c> - functions are called from the shell. 
In this case the abstract - form is for one single fun (parsed by the Erlang shell), and - all imported variables should be in the key-value list passed - as <c><anno>BoundEnvironment</anno></c>. The result is a term, normalized, - i.e. not in abstract format.</p> + <p>Implements the transformation at compile time. This + function is called by the compiler to do the source code + transformation if and when header file <c>ms_transform.hrl</c> + is included in the source code.</p> + <p>For information about how to use this parse transformation, see + <seealso marker="ets"><c>ets</c></seealso> and + <seealso marker="runtime_tools:dbg#fun2ms/1"> + <c>dbg:fun2ms/1</c></seealso>.</p> + <p>For a description of match specifications, see section + <seealso marker="erts:match_spec"> + Match Specification in Erlang</seealso> in ERTS User's Guide.</p> </desc> </func> + <func> - <name name="format_error" arity="1"/> - <fsummary>Error formatting function as required by the parse_transform interface.</fsummary> + <name name="transform_from_shell" arity="3"/> + <fsummary>Used when transforming funs created in the shell into + match_specifications.</fsummary> + <type_desc variable="BoundEnvironment">List of variable bindings in the + shell environment.</type_desc> <desc> - <p>Takes an error code returned by one of the other functions - in the module and creates a textual description of the - error. Fairly uninteresting function actually.</p> + <p>Implements the transformation when the <c>fun2ms/1</c> + functions are called from the shell. In this case, the abstract + form is for one single fun (parsed by the Erlang shell). + All imported variables are to be in the key-value list passed + as <c><anno>BoundEnvironment</anno></c>. 
The result is a term, + normalized, that is, not in abstract format.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/notes.xml b/lib/stdlib/doc/src/notes.xml index 76d49e37c2..f0347703e7 100644 --- a/lib/stdlib/doc/src/notes.xml +++ b/lib/stdlib/doc/src/notes.xml @@ -31,6 +31,21 @@ </header> <p>This document describes the changes made to the STDLIB application.</p> +<section><title>STDLIB 3.0.1</title> + + <section><title>Fixed Bugs and Malfunctions</title> + <list> + <item> + <p> Correct a bug regarding typed records in the Erlang + shell. The bug was introduced in OTP-19.0. </p> + <p> + Own Id: OTP-13719 Aux Id: ERL-182 </p> + </item> + </list> + </section> + +</section> + <section><title>STDLIB 3.0</title> <section><title>Fixed Bugs and Malfunctions</title> @@ -53,7 +68,7 @@ <item> <p> Avoid stray corner-case math errors on Solaris, e.g. an - error is thrown on undeflows in exp() and pow() when it + error is thrown on underflows in exp() and pow() when it shouldn't be.</p> <p> Own Id: OTP-13531</p> @@ -67,6 +82,28 @@ <p> Own Id: OTP-13534 Aux Id: ERL-135 </p> </item> + <item> + <p> + Fixed a bug in re on openbsd where sometimes re:run would + return an incorrect result.</p> + <p> + Own Id: OTP-13602</p> + </item> + <item> + <p> + To avoid potential timer bottleneck on supervisor + restart, timer server is no longer used when the + supervisor is unable to restart a child.</p> + <p> + Own Id: OTP-13618 Aux Id: PR-1001 </p> + </item> + <item> + <p> The Erlang code preprocessor (<c>epp</c>) can handle + file names spanning over many tokens. Example: + <c>-include("a" "file" "name").</c>. </p> + <p> + Own Id: OTP-13662 Aux Id: seq13136 </p> + </item> </list> </section> @@ -287,6 +324,37 @@ <p> Own Id: OTP-13524 Aux Id: PR-1002 </p> </item> + <item> + <p> + Supervisors now explicitly add their callback module in + the return from sys:get_status/1,2. This is to simplify + custom supervisor implementations. 
The Misc part of the + return value from sys:get_status/1,2 for a supervisor is + now:</p> + <p> + [{data, [{"State", + State}]},{supervisor,[{"Callback",Module}]}]</p> + <p> + *** POTENTIAL INCOMPATIBILITY ***</p> + <p> + Own Id: OTP-13619 Aux Id: PR-1000 </p> + </item> + <item> + <p> + Relax translation of initial calls in <c>proc_lib</c>, + i.e. remove the restriction to only do the translation + for <c>gen_server</c> and <c>gen_fsm</c>. This enables + user defined <c>gen</c> based generic callback modules to + be displayed nicely in <c>c:i()</c> and observer.</p> + <p> + Own Id: OTP-13623</p> + </item> + <item> + <p>The function <c>queue:lait/1</c> (misspelling of + <c>liat/1</c>) is now deprecated.</p> + <p> + Own Id: OTP-13658</p> + </item> </list> </section> @@ -458,7 +526,7 @@ </item> <item> <p> - The <c>stdlib</c> reference manual is updated to show + The STDLIB reference manual is updated to show correct information about the return value of <c>gen_fsm:reply/2</c>.</p> <p> @@ -6168,7 +6236,7 @@ documentation for <c>compile</c> on how to provide the key for encrypting, and the documentation for <c>beam_lib</c> on how to provide the key for decryption so that tools such - as the Debugger, <c>xref</c>, or <c>cover</c> can be used.</p> + as the Debugger, Xref, or Cover can be used.</p> <p>The <c>beam_lib:chunks/2</c> functions now accepts an additional chunk type <c>compile_info</c> to retrieve the compilation information directly as a term. 
(Thanks diff --git a/lib/stdlib/doc/src/orddict.xml b/lib/stdlib/doc/src/orddict.xml index 950f688735..076b06fc38 100644 --- a/lib/stdlib/doc/src/orddict.xml +++ b/lib/stdlib/doc/src/orddict.xml @@ -24,33 +24,35 @@ <title>orddict</title> <prepared>Robert Virding</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2007-04-16</date> <rev>B</rev> - <file>orddict.sgml</file> + <file>orddict.xml</file> </header> <module>orddict</module> - <modulesummary>Key-Value Dictionary as Ordered List</modulesummary> + <modulesummary>Key-value dictionary as ordered list.</modulesummary> <description> - <p><c>Orddict</c> implements a <c>Key</c> - <c>Value</c> dictionary. + <p>This module provides a <c>Key</c>-<c>Value</c> dictionary. An <c>orddict</c> is a representation of a dictionary, where a list of pairs is used to store the keys and values. The list is ordered after the keys.</p> - <p>This module provides exactly the same interface as the module - <c>dict</c> but with a defined representation. One difference is + + <p>This module provides the same interface as the + <seealso marker="dict"><c>dict(3)</c></seealso> module + but with a defined representation. 
One difference is that while <c>dict</c> considers two keys as different if they do not match (<c>=:=</c>), this module considers two keys as - different if and only if they do not compare equal - (<c>==</c>).</p> + different if and only if they do not compare equal (<c>==</c>).</p> </description> <datatypes> <datatype> <name name="orddict" n_vars="2"/> - <desc><p>Dictionary as returned by <c>new/0</c>.</p></desc> + <desc><p>Dictionary as returned by + <seealso marker="#new/0"><c>new/0</c></seealso>.</p></desc> </datatype> <datatype> <name name="orddict" n_vars="0"/> @@ -60,202 +62,229 @@ <funcs> <func> <name name="append" arity="3"/> - <fsummary>Append a value to keys in a dictionary</fsummary> + <fsummary>Append a value to keys in a dictionary.</fsummary> <desc> - <p>This function appends a new <c><anno>Value</anno></c> to the current list - of values associated with <c><anno>Key</anno></c>. An exception is - generated if the initial value associated with <c><anno>Key</anno></c> is - not a list of values.</p> + <p>Appends a new <c><anno>Value</anno></c> to the current list + of values associated with <c><anno>Key</anno></c>. An exception is + generated if the initial value associated with <c><anno>Key</anno></c> + is not a list of values.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="append_list" arity="3"/> - <fsummary>Append new values to keys in a dictionary</fsummary> + <fsummary>Append new values to keys in a dictionary.</fsummary> <desc> - <p>This function appends a list of values <c><anno>ValList</anno></c> to - the current list of values associated with <c><anno>Key</anno></c>. An - exception is generated if the initial value associated with + <p>Appends a list of values <c><anno>ValList</anno></c> to + the current list of values associated with <c><anno>Key</anno></c>. 
+ An exception is generated if the initial value associated with <c><anno>Key</anno></c> is not a list of values.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="erase" arity="2"/> - <fsummary>Erase a key from a dictionary</fsummary> + <fsummary>Erase a key from a dictionary.</fsummary> <desc> - <p>This function erases all items with a given key from a - dictionary.</p> + <p>Erases all items with a specified key from a dictionary.</p> </desc> </func> + <func> <name name="fetch" arity="2"/> - <fsummary>Look-up values in a dictionary</fsummary> + <fsummary>Look up values in a dictionary.</fsummary> <desc> - <p>This function returns the value associated with <c><anno>Key</anno></c> - in the dictionary <c><anno>Orddict</anno></c>. <c>fetch</c> assumes that - the <c><anno>Key</anno></c> is present in the dictionary and an exception + <p>Returns the value associated with <c><anno>Key</anno></c> + in dictionary <c><anno>Orddict</anno></c>. This function assumes that + the <c><anno>Key</anno></c> is present in the dictionary. 
An exception is generated if <c><anno>Key</anno></c> is not in the dictionary.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="fetch_keys" arity="1"/> - <fsummary>Return all keys in a dictionary</fsummary> + <fsummary>Return all keys in a dictionary.</fsummary> <desc> - <p>This function returns a list of all keys in the dictionary.</p> + <p>Returns a list of all keys in a dictionary.</p> </desc> </func> + <func> <name name="filter" arity="2"/> - <fsummary>Choose elements which satisfy a predicate</fsummary> + <fsummary>Select elements that satisfy a predicate.</fsummary> <desc> - <p><c><anno>Orddict2</anno></c> is a dictionary of all keys and values in - <c><anno>Orddict1</anno></c> for which <c><anno>Pred</anno>(<anno>Key</anno>, <anno>Value</anno>)</c> is <c>true</c>.</p> + <p><c><anno>Orddict2</anno></c> is a dictionary of all keys and values + in <c><anno>Orddict1</anno></c> for which + <c><anno>Pred</anno>(<anno>Key</anno>, <anno>Value</anno>)</c> is + <c>true</c>.</p> </desc> </func> + <func> <name name="find" arity="2"/> - <fsummary>Search for a key in a dictionary</fsummary> + <fsummary>Search for a key in a dictionary.</fsummary> <desc> - <p>This function searches for a key in a dictionary. Returns - <c>{ok, <anno>Value</anno>}</c> where <c><anno>Value</anno></c> is the value associated - with <c><anno>Key</anno></c>, or <c>error</c> if the key is not present in - the dictionary.</p> + <p>Searches for a key in a dictionary. 
Returns + <c>{ok, <anno>Value</anno>}</c>, where <c><anno>Value</anno></c> is + the value associated with <c><anno>Key</anno></c>, or <c>error</c> if + the key is not present in the dictionary.</p> + <p>See also section <seealso marker="#notes">Notes</seealso>.</p> </desc> </func> + <func> <name name="fold" arity="3"/> - <fsummary>Fold a function over a dictionary</fsummary> + <fsummary>Fold a function over a dictionary.</fsummary> <desc> <p>Calls <c><anno>Fun</anno></c> on successive keys and values of <c><anno>Orddict</anno></c> together with an extra argument <c>Acc</c> (short for accumulator). <c><anno>Fun</anno></c> must return a new - accumulator which is passed to the next call. <c><anno>Acc0</anno></c> is - returned if the list is empty.</p> + accumulator that is passed to the next call. <c><anno>Acc0</anno></c> + is returned if the list is empty.</p> </desc> </func> + <func> <name name="from_list" arity="1"/> - <fsummary>Convert a list of pairs to a dictionary</fsummary> + <fsummary>Convert a list of pairs to a dictionary.</fsummary> <desc> - <p>This function converts the <c><anno>Key</anno></c> - <c><anno>Value</anno></c> list + <p>Converts the <c><anno>Key</anno></c>-<c><anno>Value</anno></c> list <c><anno>List</anno></c> to a dictionary.</p> </desc> </func> + + <func> + <name name="is_empty" arity="1"/> + <fsummary>Return true if the dictionary is empty.</fsummary> + <desc> + <p>Returns <c>true</c> if <c><anno>Orddict</anno></c> has no elements, + otherwise <c>false</c>.</p> + </desc> + </func> + <func> <name name="is_key" arity="2"/> - <fsummary>Test if a key is in a dictionary</fsummary> + <fsummary>Test if a key is in a dictionary.</fsummary> <desc> - <p>This function tests if <c><anno>Key</anno></c> is contained in - the dictionary <c><anno>Orddict</anno></c>.</p> + <p>Tests if <c><anno>Key</anno></c> is contained in + dictionary <c><anno>Orddict</anno></c>.</p> </desc> </func> + <func> <name name="map" arity="2"/> - <fsummary>Map a function over a 
dictionary</fsummary> + <fsummary>Map a function over a dictionary.</fsummary> <desc> - <p><c>map</c> calls <c><anno>Fun</anno></c> on successive keys and values - of <c><anno>Orddict1</anno></c> to return a new value for each key.</p> + <p>Calls <c><anno>Fun</anno></c> on successive keys and values of + <c><anno>Orddict1</anno></c> to return a new value for each key.</p> </desc> </func> + <func> <name name="merge" arity="3"/> - <fsummary>Merge two dictionaries</fsummary> + <fsummary>Merge two dictionaries.</fsummary> <desc> - <p><c>merge</c> merges two dictionaries, <c><anno>Orddict1</anno></c> and - <c><anno>Orddict2</anno></c>, to create a new dictionary. All the <c><anno>Key</anno></c> - - <c><anno>Value</anno></c> pairs from both dictionaries are included in - the new dictionary. If a key occurs in both dictionaries then - <c><anno>Fun</anno></c> is called with the key and both values to return a - new value. <c>merge</c> could be defined as:</p> + <p>Merges two dictionaries, <c><anno>Orddict1</anno></c> and + <c><anno>Orddict2</anno></c>, to create a new dictionary. All the + <c><anno>Key</anno></c>-<c><anno>Value</anno></c> pairs from both + dictionaries are included in the new dictionary. If a key occurs in + both dictionaries, <c><anno>Fun</anno></c> is called with the key and + both values to return a new value. + <c>merge/3</c> can be defined as follows, but is faster:</p> <code type="none"> merge(Fun, D1, D2) -> fold(fun (K, V1, D) -> update(K, fun (V2) -> Fun(K, V1, V2) end, V1, D) end, D2, D1).</code> - <p>but is faster.</p> </desc> </func> + <func> <name name="new" arity="0"/> - <fsummary>Create a dictionary</fsummary> + <fsummary>Create a dictionary.</fsummary> <desc> - <p>This function creates a new dictionary.</p> + <p>Creates a new dictionary.</p> </desc> </func> + <func> <name name="size" arity="1"/> - <fsummary>Return the number of elements in an ordered dictionary</fsummary> + <fsummary>Return the number of elements in an ordered dictionary. 
+ </fsummary> <desc> <p>Returns the number of elements in an <c><anno>Orddict</anno></c>.</p> </desc> </func> - <func> - <name name="is_empty" arity="1"/> - <fsummary>Return true if the dictionary is empty</fsummary> - <desc> - <p>Returns <c>true</c> if <c><anno>Orddict</anno></c> has no elements, <c>false</c> otherwise.</p> - </desc> - </func> + <func> <name name="store" arity="3"/> - <fsummary>Store a value in a dictionary</fsummary> + <fsummary>Store a value in a dictionary.</fsummary> <desc> - <p>This function stores a <c><anno>Key</anno></c> - <c><anno>Value</anno></c> pair in a - dictionary. If the <c><anno>Key</anno></c> already exists in <c><anno>Orddict1</anno></c>, + <p>Stores a <c><anno>Key</anno></c>-<c><anno>Value</anno></c> pair in a + dictionary. If the <c><anno>Key</anno></c> already exists in + <c><anno>Orddict1</anno></c>, the associated value is replaced by <c><anno>Value</anno></c>.</p> </desc> </func> + <func> <name name="to_list" arity="1"/> - <fsummary>Convert a dictionary to a list of pairs</fsummary> + <fsummary>Convert a dictionary to a list of pairs.</fsummary> <desc> - <p>This function converts the dictionary to a list - representation.</p> + <p>Converts a dictionary to a list representation.</p> </desc> </func> + <func> <name name="update" arity="3"/> - <fsummary>Update a value in a dictionary</fsummary> + <fsummary>Update a value in a dictionary.</fsummary> <desc> - <p>Update a value in a dictionary by calling <c><anno>Fun</anno></c> on - the value to get a new value. An exception is generated if + <p>Updates a value in a dictionary by calling <c><anno>Fun</anno></c> + on the value to get a new value. 
An exception is generated if <c><anno>Key</anno></c> is not present in the dictionary.</p> </desc> </func> + <func> <name name="update" arity="4"/> - <fsummary>Update a value in a dictionary</fsummary> + <fsummary>Update a value in a dictionary.</fsummary> <desc> - <p>Update a value in a dictionary by calling <c><anno>Fun</anno></c> on - the value to get a new value. If <c><anno>Key</anno></c> is not present - in the dictionary then <c><anno>Initial</anno></c> will be stored as - the first value. For example <c>append/3</c> could be defined - as:</p> + <p>Updates a value in a dictionary by calling <c><anno>Fun</anno></c> + on the value to get a new value. If <c><anno>Key</anno></c> is not + present in the dictionary, <c><anno>Initial</anno></c> is stored as + the first value. For example, <c>append/3</c> can be defined + as follows:</p> <code type="none"> append(Key, Val, D) -> update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).</code> </desc> </func> + <func> <name name="update_counter" arity="3"/> - <fsummary>Increment a value in a dictionary</fsummary> + <fsummary>Increment a value in a dictionary.</fsummary> <desc> - <p>Add <c><anno>Increment</anno></c> to the value associated with <c><anno>Key</anno></c> - and store this value. If <c><anno>Key</anno></c> is not present in - the dictionary then <c><anno>Increment</anno></c> will be stored as + <p>Adds <c><anno>Increment</anno></c> to the value associated with + <c><anno>Key</anno></c> + and store this value. 
If <c><anno>Key</anno></c> is not present in + the dictionary, <c><anno>Increment</anno></c> is stored as the first value.</p> - <p>This could be defined as:</p> + <p>This can be defined as follows, but is faster:</p> <code type="none"> update_counter(Key, Incr, D) -> update(Key, fun (Old) -> Old + Incr end, Incr, D).</code> - <p>but is faster.</p> </desc> </func> </funcs> <section> <title>Notes</title> - <p>The functions <c>append</c> and <c>append_list</c> are included - so we can store keyed values in a list <em>accumulator</em>. For + <marker id="notes"/> + <p>Functions <c>append/3</c> and <c>append_list/3</c> are included + so that keyed values can be stored in a list <em>accumulator</em>, for example:</p> <pre> > D0 = orddict:new(), @@ -264,19 +293,18 @@ update_counter(Key, Incr, D) -> D3 = orddict:append(files, f2, D2), D4 = orddict:append(files, f3, D3), orddict:fetch(files, D4). -[f1,f2,f3] </pre> +[f1,f2,f3]</pre> <p>This saves the trouble of first fetching a keyed value, appending a new value to the list of stored values, and storing - the result. 
- </p> - <p>The function <c>fetch</c> should be used if the key is known to - be in the dictionary, otherwise <c>find</c>.</p> + the result.</p> + <p>Function <c>fetch/2</c> is to be used if the key is known to + be in the dictionary, otherwise function <c>find/2</c>.</p> </section> <section> <title>See Also</title> - <p><seealso marker="dict">dict(3)</seealso>, - <seealso marker="gb_trees">gb_trees(3)</seealso></p> + <p><seealso marker="dict"><c>dict(3)</c></seealso>, + <seealso marker="gb_trees"><c>gb_trees(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/ordsets.xml b/lib/stdlib/doc/src/ordsets.xml index 0d5d618b66..148281fcf7 100644 --- a/lib/stdlib/doc/src/ordsets.xml +++ b/lib/stdlib/doc/src/ordsets.xml @@ -24,23 +24,26 @@ <title>ordsets</title> <prepared>Robert Virding</prepared> - <responsible>Bjarne Dacker</responsible> + <responsible>Bjarne Däcker</responsible> <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>99-07-27</date> + <date>1999-07-27</date> <rev>A</rev> - <file>ordsets.sgml</file> + <file>ordsets.xml</file> </header> <module>ordsets</module> - <modulesummary>Functions for Manipulating Sets as Ordered Lists</modulesummary> + <modulesummary>Functions for manipulating sets as ordered lists. + </modulesummary> <description> <p>Sets are collections of elements with no duplicate elements. An <c>ordset</c> is a representation of a set, where an ordered list is used to store the elements of the set. An ordered list is more efficient than an unordered list.</p> - <p>This module provides exactly the same interface as the module - <c>sets</c> but with a defined representation. One difference is + + <p>This module provides the same interface as the + <seealso marker="sets"><c>sets(3)</c></seealso> module + but with a defined representation. 
One difference is that while <c>sets</c> considers two elements as different if they do not match (<c>=:=</c>), this module considers two elements as different if and only if they do not compare equal (<c>==</c>).</p> @@ -49,146 +52,168 @@ <datatypes> <datatype> <name name="ordset" n_vars="1"/> - <desc><p>As returned by new/0.</p></desc> + <desc><p>As returned by + <seealso marker="#new/0"><c>new/0</c></seealso>.</p></desc> </datatype> </datatypes> + <funcs> <func> - <name name="new" arity="0"/> - <fsummary>Return an empty set</fsummary> + <name name="add_element" arity="2"/> + <fsummary>Add an element to an <c>Ordset</c>.</fsummary> <desc> - <p>Returns a new empty ordered set.</p> + <p>Returns a new ordered set formed from <c><anno>Ordset1</anno></c> + with <c><anno>Element</anno></c> inserted.</p> </desc> </func> + <func> - <name name="is_set" arity="1"/> - <fsummary>Test for an <c>Ordset</c></fsummary> + <name name="del_element" arity="2"/> + <fsummary>Remove an element from an <c>Ordset</c>.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Ordset</anno></c> is an ordered set of - elements, otherwise <c>false</c>.</p> + <p>Returns <c><anno>Ordset1</anno></c>, but with + <c><anno>Element</anno></c> removed.</p> </desc> </func> + <func> - <name name="size" arity="1"/> - <fsummary>Return the number of elements in a set</fsummary> + <name name="filter" arity="2"/> + <fsummary>Filter set elements.</fsummary> <desc> - <p>Returns the number of elements in <c><anno>Ordset</anno></c>.</p> + <p>Filters elements in <c><anno>Ordset1</anno></c> with boolean function + <c><anno>Pred</anno></c>.</p> </desc> </func> + <func> - <name name="to_list" arity="1"/> - <fsummary>Convert an <c>Ordset</c>into a list</fsummary> + <name name="fold" arity="3"/> + <fsummary>Fold over set elements.</fsummary> <desc> - <p>Returns the elements of <c><anno>Ordset</anno></c> as a list.</p> + <p>Folds <c><anno>Function</anno></c> over every element in + <c><anno>Ordset</anno></c> and returns 
the final value of the + accumulator.</p> </desc> </func> + <func> <name name="from_list" arity="1"/> - <fsummary>Convert a list into an <c>Ordset</c></fsummary> + <fsummary>Convert a list into an <c>Ordset</c>.</fsummary> <desc> - <p>Returns an ordered set of the elements in <c><anno>List</anno></c>.</p> + <p>Returns an ordered set of the elements in <c><anno>List</anno></c>. + </p> </desc> </func> + <func> - <name name="is_element" arity="2"/> - <fsummary>Test for membership of an <c>Ordset</c></fsummary> + <name name="intersection" arity="1"/> + <fsummary>Return the intersection of a list of <c>Ordsets</c></fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an element of - <c><anno>Ordset</anno></c>, otherwise <c>false</c>.</p> + <p>Returns the intersection of the non-empty list of sets.</p> </desc> </func> + <func> - <name name="add_element" arity="2"/> - <fsummary>Add an element to an <c>Ordset</c></fsummary> + <name name="intersection" arity="2"/> + <fsummary>Return the intersection of two <c>Ordsets</c>.</fsummary> <desc> - <p>Returns a new ordered set formed from <c><anno>Ordset1</anno></c> with - <c><anno>Element</anno></c> inserted.</p> + <p>Returns the intersection of <c><anno>Ordset1</anno></c> and + <c><anno>Ordset2</anno></c>.</p> </desc> </func> + <func> - <name name="del_element" arity="2"/> - <fsummary>Remove an element from an <c>Ordset</c></fsummary> + <name name="is_disjoint" arity="2"/> + <fsummary>Check whether two <c>Ordsets</c> are disjoint.</fsummary> <desc> - <p>Returns <c><anno>Ordset1</anno></c>, but with <c><anno>Element</anno></c> removed.</p> + <p>Returns <c>true</c> if <c><anno>Ordset1</anno></c> and + <c><anno>Ordset2</anno></c> are disjoint (have no elements in common), + otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="union" arity="2"/> - <fsummary>Return the union of two <c>Ordsets</c></fsummary> + <name name="is_element" arity="2"/> + <fsummary>Test for membership of an 
<c>Ordset</c>.</fsummary> <desc> - <p>Returns the merged (union) set of <c><anno>Ordset1</anno></c> and - <c><anno>Ordset2</anno></c>.</p> + <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an element of + <c><anno>Ordset</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="union" arity="1"/> - <fsummary>Return the union of a list of <c>Ordsets</c></fsummary> + <name name="is_set" arity="1"/> + <fsummary>Test for an <c>Ordset</c>.</fsummary> <desc> - <p>Returns the merged (union) set of the list of sets.</p> + <p>Returns <c>true</c> if <c><anno>Ordset</anno></c> is an ordered set + of elements, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="intersection" arity="2"/> - <fsummary>Return the intersection of two <c>Ordsets</c></fsummary> + <name name="is_subset" arity="2"/> + <fsummary>Test for subset.</fsummary> <desc> - <p>Returns the intersection of <c><anno>Ordset1</anno></c> and - <c><anno>Ordset2</anno></c>.</p> + <p>Returns <c>true</c> when every element of <c><anno>Ordset1</anno></c> + is also a member of <c><anno>Ordset2</anno></c>, otherwise + <c>false</c>.</p> </desc> </func> + <func> - <name name="intersection" arity="1"/> - <fsummary>Return the intersection of a list of <c>Ordsets</c></fsummary> + <name name="new" arity="0"/> + <fsummary>Return an empty set.</fsummary> <desc> - <p>Returns the intersection of the non-empty list of sets.</p> + <p>Returns a new empty ordered set.</p> </desc> </func> + <func> - <name name="is_disjoint" arity="2"/> - <fsummary>Check whether two <c>Ordsets</c> are disjoint</fsummary> + <name name="size" arity="1"/> + <fsummary>Return the number of elements in a set.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Ordset1</anno></c> and - <c><anno>Ordset2</anno></c> are disjoint (have no elements in common), - and <c>false</c> otherwise.</p> + <p>Returns the number of elements in <c><anno>Ordset</anno></c>.</p> </desc> </func> + <func> <name name="subtract" arity="2"/> 
- <fsummary>Return the difference of two <c>Ordsets</c></fsummary> + <fsummary>Return the difference of two <c>Ordsets</c>.</fsummary> <desc> - <p>Returns only the elements of <c><anno>Ordset1</anno></c> which are not + <p>Returns only the elements of <c><anno>Ordset1</anno></c> that are not also elements of <c><anno>Ordset2</anno></c>.</p> </desc> </func> + <func> - <name name="is_subset" arity="2"/> - <fsummary>Test for subset</fsummary> + <name name="to_list" arity="1"/> + <fsummary>Convert an <c>Ordset</c> into a list.</fsummary> <desc> - <p>Returns <c>true</c> when every element of <c><anno>Ordset1</anno></c> is - also a member of <c><anno>Ordset2</anno></c>, otherwise <c>false</c>.</p> + <p>Returns the elements of <c><anno>Ordset</anno></c> as a list.</p> </desc> </func> + <func> - <name name="fold" arity="3"/> - <fsummary>Fold over set elements</fsummary> + <name name="union" arity="1"/> + <fsummary>Return the union of a list of <c>Ordsets</c>.</fsummary> <desc> - <p>Fold <c><anno>Function</anno></c> over every element in <c><anno>Ordset</anno></c> - returning the final value of the accumulator.</p> + <p>Returns the merged (union) set of the list of sets.</p> </desc> </func> + <func> - <name name="filter" arity="2"/> - <fsummary>Filter set elements</fsummary> + <name name="union" arity="2"/> + <fsummary>Return the union of two <c>Ordsets</c>.</fsummary> <desc> - <p>Filter elements in <c><anno>Ordset1</anno></c> with boolean function - <c><anno>Pred</anno></c>.</p> + <p>Returns the merged (union) set of <c><anno>Ordset1</anno></c> and + <c><anno>Ordset2</anno></c>.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="gb_sets">gb_sets(3)</seealso>, - <seealso marker="sets">sets(3)</seealso></p> + <p><seealso marker="gb_sets"><c>gb_sets(3)</c></seealso>, + <seealso marker="sets"><c>sets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/part.xml b/lib/stdlib/doc/src/part.xml index 15b7bd4a1d..93c47405bf 
100644 --- a/lib/stdlib/doc/src/part.xml +++ b/lib/stdlib/doc/src/part.xml @@ -31,9 +31,10 @@ <rev>1.0</rev> <file>part.xml</file> </header> - <description> - <p>The Erlang standard library <em>STDLIB</em>.</p> - </description> + + <description></description> + + <xi:include href="introduction.xml"/> <xi:include href="io_protocol.xml"/> <xi:include href="unicode_usage.xml"/> </part> diff --git a/lib/stdlib/doc/src/pool.xml b/lib/stdlib/doc/src/pool.xml index d217d071da..05d12ade28 100644 --- a/lib/stdlib/doc/src/pool.xml +++ b/lib/stdlib/doc/src/pool.xml @@ -29,89 +29,103 @@ <rev></rev> </header> <module>pool</module> - <modulesummary>Load Distribution Facility</modulesummary> + <modulesummary>Load distribution facility.</modulesummary> <description> - <p><c>pool</c> can be used to run a set of Erlang nodes as a pool + <p>This module can be used to run a set of Erlang nodes as a pool of computational processors. It is organized as a master and a set of slave nodes and includes the following features:</p> + <list type="bulleted"> <item>The slave nodes send regular reports to the master about - their current load.</item> + their current load.</item> <item>Queries can be sent to the master to determine which node - will have the least load.</item> + will have the least load.</item> </list> + <p>The BIF <c>statistics(run_queue)</c> is used for estimating future loads. It returns the length of the queue of ready to run processes in the Erlang runtime system.</p> - <p>The slave nodes are started with the <c>slave</c> module. This - effects, tty IO, file IO, and code loading.</p> - <p>If the master node fails, the entire pool will exit.</p> + + <p>The slave nodes are started with the + <seealso marker="slave"><c>slave(3)</c></seealso> module. 
This + effects terminal I/O, file I/O, and code loading.</p> + <p>If the master node fails, the entire pool exits.</p> </description> + <funcs> <func> - <name name="start" arity="1"/> - <name name="start" arity="2"/> - <fsummary>>Start a new pool</fsummary> - <desc> - <p>Starts a new pool. The file <c>.hosts.erlang</c> is read to - find host names where the pool nodes can be started. See - section <seealso marker="#files">Files</seealso> below. The - start-up procedure fails if the file is not found.</p> - <p>The slave nodes are started with <c>slave:start/2,3</c>, - passing along <c><anno>Name</anno></c> and, if provided, - <c><anno>Args</anno></c>. - <c><anno>Name</anno></c> is used as the first part of the node names, - <c><anno>Args</anno></c> is used to specify command line arguments. See - <seealso marker="slave#start/2">slave(3)</seealso>.</p> - <p>Access rights must be set so that all nodes in the pool have - the authority to access each other.</p> - <p>The function is synchronous and all the nodes, as well as - all the system servers, are running when it returns a value.</p> - </desc> - </func> - <func> <name name="attach" arity="1"/> - <fsummary>Ensure that a pool master is running</fsummary> + <fsummary>Ensure that a pool master is running.</fsummary> <desc> - <p>This function ensures that a pool master is running and - includes <c><anno>Node</anno></c> in the pool master's pool of nodes.</p> + <p>Ensures that a pool master is running and includes + <c><anno>Node</anno></c> in the pool master's pool of nodes.</p> </desc> </func> + <func> - <name name="stop" arity="0"/> - <fsummary>Stop the pool and kill all the slave nodes</fsummary> + <name name="get_node" arity="0"/> + <fsummary>Return the node with the expected lowest future load.</fsummary> <desc> - <p>Stops the pool and kills all the slave nodes.</p> + <p>Returns the node with the expected lowest future load.</p> </desc> </func> + <func> <name name="get_nodes" arity="0"/> - <fsummary>Return a list 
of the current member nodes of the pool</fsummary> + <fsummary>Return a list of the current member nodes of the pool. + </fsummary> <desc> <p>Returns a list of the current member nodes of the pool.</p> </desc> </func> + <func> <name name="pspawn" arity="3"/> - <fsummary>Spawn a process on the pool node with expected lowest future load</fsummary> + <fsummary>Spawn a process on the pool node with expected lowest future + load.</fsummary> <desc> - <p>Spawns a process on the pool node which is expected to have + <p>Spawns a process on the pool node that is expected to have the lowest future load.</p> </desc> </func> + <func> <name name="pspawn_link" arity="3"/> - <fsummary>Spawn and link to a process on the pool node with expected lowest future load</fsummary> + <fsummary>Spawn and link to a process on the pool node with expected + lowest future load.</fsummary> <desc> - <p>Spawn links a process on the pool node which is expected to + <p>Spawns and links to a process on the pool node that is expected to have the lowest future load.</p> </desc> </func> + <func> - <name name="get_node" arity="0"/> - <fsummary>Return the node with the expected lowest future load</fsummary> + <name name="start" arity="1"/> + <name name="start" arity="2"/> + <fsummary>>Start a new pool.</fsummary> <desc> - <p>Returns the node with the expected lowest future load.</p> + <p>Starts a new pool. The file <c>.hosts.erlang</c> is read to + find host names where the pool nodes can be started; see + section <seealso marker="#files">Files</seealso>. The + startup procedure fails if the file is not found.</p> + <p>The slave nodes are started with + <seealso marker="slave#start/2"><c>slave:start/2,3</c></seealso>, + passing along <c><anno>Name</anno></c> and, if provided, + <c><anno>Args</anno></c>. 
<c><anno>Name</anno></c> is used as the + first part of the node names, <c><anno>Args</anno></c> is used to + specify command-line arguments.</p> + <p>Access rights must be set so that all nodes in the pool have + the authority to access each other.</p> + <p>The function is synchronous and all the nodes, and + all the system servers, are running when it returns a value.</p> + </desc> + </func> + + <func> + <name name="stop" arity="0"/> + <fsummary>Stop the pool and kill all the slave nodes.</fsummary> + <desc> + <p>Stops the pool and kills all the slave nodes.</p> </desc> </func> </funcs> @@ -120,12 +134,12 @@ <marker id="files"></marker> <title>Files</title> <p><c>.hosts.erlang</c> is used to pick hosts where nodes can - be started. See - <seealso marker="kernel:net_adm#host_file/0">net_adm(3)</seealso> - for information about format and location of this file.</p> - <p><c>$HOME/.erlang.slave.out.HOST</c> is used for all additional IO - that may come from the slave nodes on standard IO. If the start-up - procedure does not work, this file may indicate the reason.</p> + be started. For information about format and location of this file, see + <seealso marker="kernel:net_adm#host_file/0"> + <c>net_adm:host_file/0</c></seealso>.</p> + <p><c>$HOME/.erlang.slave.out.HOST</c> is used for all extra I/O + that can come from the slave nodes on standard I/O. 
If the startup + procedure does not work, this file can indicate the reason.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/proc_lib.xml b/lib/stdlib/doc/src/proc_lib.xml index f02b1f0651..da03c39a26 100644 --- a/lib/stdlib/doc/src/proc_lib.xml +++ b/lib/stdlib/doc/src/proc_lib.xml @@ -29,44 +29,55 @@ <rev></rev> </header> <module>proc_lib</module> - <modulesummary>Functions for asynchronous and synchronous start of processes adhering to the OTP design principles.</modulesummary> + <modulesummary>Functions for asynchronous and synchronous start of processes + adhering to the OTP design principles.</modulesummary> <description> <p>This module is used to start processes adhering to - the <seealso marker="doc/design_principles:des_princ">OTP Design Principles</seealso>. Specifically, the functions in this - module are used by the OTP standard behaviors (<c>gen_server</c>, - <c>gen_fsm</c>, <c>gen_statem</c>, ...) when starting new processes. - The functions can also be used to start <em>special processes</em>, - user defined processes which comply to the OTP design principles. See - <seealso marker="doc/design_principles:spec_proc">Sys and Proc_Lib</seealso> in OTP Design Principles for an example.</p> + the <seealso marker="doc/design_principles:des_princ"> + OTP Design Principles</seealso>. Specifically, the functions in this + module are used by the OTP standard behaviors (for example, + <c>gen_server</c>, <c>gen_fsm</c>, and <c>gen_statem</c>) + when starting new processes. The functions + can also be used to start <em>special processes</em>, user-defined + processes that comply to the OTP design principles. For an example, + see section <seealso marker="doc/design_principles:spec_proc"> + sys and proc_lib</seealso> in OTP Design Principles.</p> + + <p>Some useful information is initialized when a process starts. 
The registered names, or the process identifiers, of the parent process, and the parent ancestors, are stored together with information about the function initially called in the process.</p> - <p>While in "plain Erlang" a process is said to terminate normally - only for the exit reason <c>normal</c>, a process started + + <p>While in "plain Erlang", a process is said to terminate normally + only for exit reason <c>normal</c>, a process started using <c>proc_lib</c> is also said to terminate normally if it exits with reason <c>shutdown</c> or <c>{shutdown,Term}</c>. <c>shutdown</c> is the reason used when an application (supervision tree) is stopped.</p> - <p>When a process started using <c>proc_lib</c> terminates - abnormally -- that is, with another exit reason than <c>normal</c>, - <c>shutdown</c>, or <c>{shutdown,Term}</c> -- a <em>crash report</em> + + <p>When a process that is started using <c>proc_lib</c> terminates + abnormally (that is, with another exit reason than <c>normal</c>, + <c>shutdown</c>, or <c>{shutdown,Term}</c>), a <em>crash report</em> is generated, which is written to terminal by the default SASL event handler. That is, the crash report is normally only visible - if the SASL application is started. 
See - <seealso marker="sasl:sasl_app">sasl(6)</seealso> and - <seealso marker="sasl:error_logging">SASL User's Guide</seealso>.</p> - <p>The crash report contains the previously stored information such + if the SASL application is started; see + <seealso marker="sasl:sasl_app"><c>sasl(6)</c></seealso> and section + <seealso marker="sasl:error_logging">SASL Error Logging</seealso> + in the SASL User's Guide.</p> + + <p>The crash report contains the previously stored information, such as ancestors and initial function, the termination reason, and - information regarding other processes which terminate as a result + information about other processes that terminate as a result of this process terminating.</p> </description> + <datatypes> <datatype> <name name="spawn_option"/> <desc> <p>See <seealso marker="erts:erlang#spawn_opt/4"> - erlang:spawn_opt/2,3,4,5</seealso>.</p> + <c>erlang:spawn_opt/2,3,4,5</c></seealso>.</p> </desc> </datatype> <datatype> @@ -83,8 +94,129 @@ <name name="dict_or_pid"/> </datatype> </datatypes> + <funcs> <func> + <name name="format" arity="1"/> + <fsummary>Format a crash report.</fsummary> + <desc> + <p>Equivalent to <seealso marker="#format/2"> + <c>format(<anno>CrashReport</anno>, latin1)</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="format" arity="2"/> + <fsummary>Format a crash report.</fsummary> + <desc> + <p>This function can be used by a user-defined event handler to + format a crash report. The crash report is sent using + <seealso marker="kernel:error_logger#error_report/2"> + <c>error_logger:error_report(crash_report, + <anno>CrashReport</anno>)</c></seealso>. 
+ That is, the event to be handled is of the format + <c>{error_report, GL, {Pid, crash_report, + <anno>CrashReport</anno>}}</c>, + where <c>GL</c> is the group leader pid of process + <c>Pid</c> that sent the crash report.</p> + </desc> + </func> + + <func> + <name name="format" arity="3"/> + <fsummary>Format a crash report.</fsummary> + <desc> + <p>This function can be used by a user-defined event handler to + format a crash report. When <anno>Depth</anno> is specified as a + positive integer, it is used in the format string to + limit the output as follows: <c>io_lib:format("~P", + [Term,<anno>Depth</anno>])</c>.</p> + </desc> + </func> + + <func> + <name name="hibernate" arity="3"/> + <fsummary>Hibernate a process until a message is sent to it.</fsummary> + <desc> + <p>This function does the same as (and does call) the + <seealso marker="erts:erlang#erlang:hibernate/3"> + <c>hibernate/3</c></seealso> BIF, + but ensures that exception handling and logging continues to + work as expected when the process wakes up.</p> + <p>Always use this function instead of the BIF for processes started + using <c>proc_lib</c> functions.</p> + </desc> + </func> + + <func> + <name name="init_ack" arity="1"/> + <name name="init_ack" arity="2"/> + <fsummary>Used by a process when it has started.</fsummary> + <desc> + <p>This function must be used by a process that has been started by + a <seealso marker="#start/3"><c>start[_link]/3,4,5</c></seealso> + function. It tells <c><anno>Parent</anno></c> that the process has + initialized itself, has started, or has failed to initialize + itself.</p> + <p>Function <c>init_ack/1</c> uses the parent value + previously stored by the start function used.</p> + <p>If this function is not called, the start function + returns an error tuple (if a link and/or a time-out is used) or + hang otherwise.</p> + <p>The following example illustrates how this function and + <c>proc_lib:start_link/3</c> are used:</p> + <code type="none"> +-module(my_proc). 
+-export([start_link/0]). +-export([init/1]). + +start_link() -> + proc_lib:start_link(my_proc, init, [self()]). + +init(Parent) -> + case do_initialization() of + ok -> + proc_lib:init_ack(Parent, {ok, self()}); + {error, Reason} -> + exit(Reason) + end, + loop(). + +...</code> + </desc> + </func> + + <func> + <name name="initial_call" arity="1"/> + <fsummary>Extract the initial call of a <c>proc_lib</c>spawned process. + </fsummary> + <desc> + <p>Extracts the initial call of a process that was started + using one of the spawn or start functions in this module. + <c><anno>Process</anno></c> can either be a pid, an integer tuple + (from which a pid can be created), or the process information of a + process <c>Pid</c> fetched through an + <c>erlang:process_info(Pid)</c> function call.</p> + <note> + <p>The list <c><anno>Args</anno></c> no longer contains the + arguments, but the same number of atoms as the number of arguments; + the first atom is <c>'Argument__1'</c>, the second + <c>'Argument__2'</c>, and so on. The reason is that the argument + list could waste a significant amount of memory, and if the + argument list contained funs, it could be impossible to upgrade the + code for the module.</p> + <p>If the process was spawned using a fun, <c>initial_call/1</c> no + longer returns the fun, but the module, function for the + local function implementing the fun, and the arity, for example, + <c>{some_module,-work/3-fun-0-,0}</c> (meaning that the fun was + created in function <c>some_module:work/3</c>). The reason is that + keeping the fun would prevent code upgrade for the module, and that + a significant amount of memory could be wasted.</p> + </note> + </desc> + </func> + + <func> <name name="spawn" arity="1"/> <name name="spawn" arity="2"/> <name name="spawn" arity="3"/> @@ -96,11 +228,12 @@ <type variable="Function"/> <type variable="Args"/> <desc> - <p>Spawns a new process and initializes it as described above. 
- The process is spawned using the - <seealso marker="erts:erlang#spawn/1">spawn</seealso> BIFs.</p> + <p>Spawns a new process and initializes it as described in the + beginning of this manual page. The process is spawned using the + <seealso marker="erts:erlang#spawn/1"><c>spawn</c></seealso> BIFs.</p> </desc> </func> + <func> <name name="spawn_link" arity="1"/> <name name="spawn_link" arity="2"/> @@ -113,18 +246,19 @@ <type variable="Function"/> <type variable="Args"/> <desc> - <p>Spawns a new process and initializes it as described above. - The process is spawned using the - <seealso marker="erts:erlang#spawn_link/1">spawn_link</seealso> + <p>Spawns a new process and initializes it as described in the + beginning of this manual page. The process is spawned using the + <seealso marker="erts:erlang#spawn_link/1"><c>spawn_link</c></seealso> BIFs.</p> </desc> </func> + <func> <name name="spawn_opt" arity="2"/> <name name="spawn_opt" arity="3"/> <name name="spawn_opt" arity="4"/> <name name="spawn_opt" arity="5"/> - <fsummary>Spawn a new process with given options.</fsummary> + <fsummary>Spawn a new process with specified options.</fsummary> <type variable="Node"/> <type variable="Fun" name_i="1"/> <type variable="Module"/> @@ -132,17 +266,18 @@ <type variable="Args"/> <type variable="SpawnOpts"/> <desc> - <p>Spawns a new process and initializes it as described above. - The process is spawned using the - <seealso marker="erts:erlang#spawn_opt/2">spawn_opt</seealso> + <p>Spawns a new process and initializes it as described in the + beginning of this manual page. The process is spawned using the + <seealso marker="erts:erlang#spawn_opt/2"><c>spawn_opt</c></seealso> BIFs.</p> <note> - <p>Using the spawn option <c>monitor</c> is currently not - allowed, but will cause the function to fail with reason + <p>Using spawn option <c>monitor</c> is not + allowed. 
It causes the function to fail with reason <c>badarg</c>.</p> </note> </desc> </func> + <func> <name name="start" arity="3"/> <name name="start" arity="4"/> @@ -153,151 +288,94 @@ <fsummary>Start a new process synchronously.</fsummary> <desc> <p>Starts a new process synchronously. Spawns the process and - waits for it to start. When the process has started, it + waits for it to start. When the process has started, it <em>must</em> call - <seealso marker="#init_ack/2">init_ack(Parent,Ret)</seealso> - or <seealso marker="#init_ack/1">init_ack(Ret)</seealso>, + <seealso marker="#init_ack/2"><c>init_ack(Parent, Ret)</c></seealso> + or <seealso marker="#init_ack/1"><c>init_ack(Ret)</c></seealso>, where <c>Parent</c> is the process that evaluates this - function. At this time, <c>Ret</c> is returned.</p> - <p>If the <c>start_link/3,4,5</c> function is used and + function. At this time, <c>Ret</c> is returned.</p> + <p>If function <c>start_link/3,4,5</c> is used and the process crashes before it has called <c>init_ack/1,2</c>, - <c>{error, <anno>Reason</anno>}</c> is returned if the calling process - traps exits.</p> - <p>If <c><anno>Time</anno></c> is specified as an integer, this function - waits for <c><anno>Time</anno></c> milliseconds for the new process to call - <c>init_ack</c>, or <c>{error, timeout}</c> is returned, and - the process is killed.</p> - <p>The <c><anno>SpawnOpts</anno></c> argument, if given, will be passed - as the last argument to the <c>spawn_opt/2,3,4,5</c> BIF.</p> + <c>{error, <anno>Reason</anno>}</c> is returned if the calling + process traps exits.</p> + <p>If <c><anno>Time</anno></c> is specified as an integer, this + function waits for <c><anno>Time</anno></c> milliseconds for the + new process to call <c>init_ack</c>, or <c>{error, timeout}</c> is + returned, and the process is killed.</p> + <p>Argument <c><anno>SpawnOpts</anno></c>, if specified, is passed + as the last argument to the <seealso marker="erts:erlang#spawn_opt/2"> + 
<c>spawn_opt/2,3,4,5</c></seealso> BIF.</p> <note> - <p>Using the spawn option <c>monitor</c> is currently not - allowed, but will cause the function to fail with reason + <p>Using spawn option <c>monitor</c> is not + allowed. It causes the function to fail with reason <c>badarg</c>.</p> </note> </desc> </func> - <func> - <name name="init_ack" arity="1"/> - <name name="init_ack" arity="2"/> - <fsummary>Used by a process when it has started.</fsummary> - <desc> - <p>This function must be used by a process that has been started by - a <seealso marker="#start/3">start[_link]/3,4,5</seealso> - function. It tells <c><anno>Parent</anno></c> that the process has - initialized itself, has started, or has failed to initialize - itself.</p> - <p>The <c>init_ack/1</c> function uses the parent value - previously stored by the start function used.</p> - <p>If this function is not called, the start function will - return an error tuple (if a link and/or a timeout is used) or - hang otherwise.</p> - <p>The following example illustrates how this function and - <c>proc_lib:start_link/3</c> are used.</p> - <code type="none"> --module(my_proc). --export([start_link/0]). --export([init/1]). -start_link() -> - proc_lib:start_link(my_proc, init, [self()]). - -init(Parent) -> - case do_initialization() of - ok -> - proc_lib:init_ack(Parent, {ok, self()}); - {error, Reason} -> - exit(Reason) - end, - loop(). - -...</code> - </desc> - </func> <func> - <name name="format" arity="1"/> - <fsummary>Format a crash report.</fsummary> - <desc> - <p>Equivalent to <c>format(<anno>CrashReport</anno>, latin1)</c>.</p> - </desc> - </func> - <func> - <name name="format" arity="2"/> - <fsummary>Format a crash report.</fsummary> + <name name="stop" arity="1"/> + <fsummary>Terminate a process synchronously.</fsummary> + <type variable="Process"/> <desc> - <p>This function can be used by a user defined event handler to - format a crash report. 
The crash report is sent using - <c>error_logger:error_report(crash_report, <anno>CrashReport</anno>)</c>. - That is, the event to be handled is of the format - <c>{error_report, GL, {Pid, crash_report, <anno>CrashReport</anno>}}</c> - where <c>GL</c> is the group leader pid of the process - <c>Pid</c> which sent the crash report.</p> + <p>Equivalent to <seealso marker="#stop/3"> + <c>stop(Process, normal, infinity)</c></seealso>.</p> </desc> </func> + <func> - <name name="format" arity="3"/> - <fsummary>Format a crash report.</fsummary> + <name name="stop" arity="3"/> + <fsummary>Terminate a process synchronously.</fsummary> + <type variable="Process"/> + <type variable="Reason"/> + <type variable="Timeout"/> <desc> - <p>This function can be used by a user defined event handler to - format a crash report. When <anno>Depth</anno> is given as an - positive integer, it will be used in the format string to - limit the output as follows: <c>io_lib:format("~P", - [Term,<anno>Depth</anno>])</c>.</p> + <p>Orders the process to exit with the specified <c>Reason</c> and + waits for it to terminate.</p> + <p>Returns <c>ok</c> if the process exits with + the specified <c>Reason</c> within <c>Timeout</c> milliseconds.</p> + <p>If the call times out, a <c>timeout</c> exception is raised.</p> + <p>If the process does not exist, a <c>noproc</c> + exception is raised.</p> + <p>The implementation of this function is based on the + <c>terminate</c> system message, and requires that the + process handles system messages correctly. 
+ For information about system messages, see + <seealso marker="sys"><c>sys(3)</c></seealso> and section + <seealso marker="doc/design_principles:spec_proc"> + sys and proc_lib</seealso> in OTP Design Principles.</p> </desc> </func> - <func> - <name name="initial_call" arity="1"/> - <fsummary>Extract the initial call of a <c>proc_lib</c>spawned process.</fsummary> - <desc> - <p>Extracts the initial call of a process that was started - using one of the spawn or start functions described above. - <c><anno>Process</anno></c> can either be a pid, an integer tuple (from - which a pid can be created), or the process information of a - process <c>Pid</c> fetched through an - <c>erlang:process_info(Pid)</c> function call.</p> - - <note><p>The list <c><anno>Args</anno></c> no longer contains the actual arguments, - but the same number of atoms as the number of arguments; the first atom - is always <c>'Argument__1'</c>, the second <c>'Argument__2'</c>, and - so on. The reason is that the argument list could waste a significant - amount of memory, and if the argument list contained funs, it could - be impossible to upgrade the code for the module.</p> - <p>If the process was spawned using a fun, <c>initial_call/1</c> no - longer returns the actual fun, but the module, function for the local - function implementing the fun, and the arity, for instance - <c>{some_module,-work/3-fun-0-,0}</c> (meaning that the fun was - created in the function <c>some_module:work/3</c>). 
- The reason is that keeping the fun would prevent code upgrade for the - module, and that a significant amount of memory could be wasted.</p> - </note> - </desc> - </func> <func> <name name="translate_initial_call" arity="1"/> - <fsummary>Extract and translate the initial call of a <c>proc_lib</c>spawned process.</fsummary> + <fsummary>Extract and translate the initial call of a + <c>proc_lib</c>spawned process.</fsummary> <desc> - <p>This function is used by the <c>c:i/0</c> and - <c>c:regs/0</c> functions in order to present process - information.</p> - <p>Extracts the initial call of a process that was started - using one of the spawn or start functions described above, - and translates it to more useful information. <c><anno>Process</anno></c> + <p>This function is used by functions + <seealso marker="c#i/0"><c>c:i/0</c></seealso> and + <seealso marker="c#regs/0"><c>c:regs/0</c></seealso> + to present process information.</p> + <p>This function extracts the initial call of a process that was + started using one of the spawn or start functions in this module, + and translates it to more useful information. + <c><anno>Process</anno></c> can either be a pid, an integer tuple (from which a pid can be created), or the process information of a process <c>Pid</c> fetched through an <c>erlang:process_info(Pid)</c> function call.</p> - <p>If the initial call is to one of the system defined behaviors + <p>If the initial call is to one of the system-defined behaviors such as <c>gen_server</c> or <c>gen_event</c>, it is translated to more useful information. If a <c>gen_server</c> is spawned, the returned <c><anno>Module</anno></c> is the name of the callback module and <c><anno>Function</anno></c> is <c>init</c> (the function that initiates the new server).</p> <p>A <c>supervisor</c> and a <c>supervisor_bridge</c> are also - <c>gen_server</c> processes. In order to return information + <c>gen_server</c> processes. 
To return information that this process is a supervisor and the name of the - call-back module, <c><anno>Module</anno></c> is <c>supervisor</c> and + callback module, <c><anno>Module</anno></c> is <c>supervisor</c> and <c><anno>Function</anno></c> is the name of the supervisor callback - module. <c><anno>Arity</anno></c> is <c>1</c> since the <c>init/1</c> + module. <c><anno>Arity</anno></c> is <c>1</c>, as the <c>init/1</c> function is called initially in the callback module.</p> <p>By default, <c>{proc_lib,init_p,5}</c> is returned if no information about the initial call can be found. It is @@ -305,57 +383,12 @@ init(Parent) -> spawned with the <c>proc_lib</c> module.</p> </desc> </func> - <func> - <name name="hibernate" arity="3"/> - <fsummary>Hibernate a process until a message is sent to it</fsummary> - <desc> - <p>This function does the same as (and does call) the BIF - <seealso marker="erts:erlang#erlang:hibernate/3">hibernate/3</seealso>, - but ensures that exception handling and logging continues to - work as expected when the process wakes up. 
Always use this - function instead of the BIF for processes started using - <c>proc_lib</c> functions.</p> - </desc> - </func> - <func> - <name name="stop" arity="1"/> - <fsummary>Terminate a process synchronously.</fsummary> - <type variable="Process"/> - <desc> - <p>Equivalent to <seealso marker="#stop/3">stop(Process, - normal, infinity)</seealso>.</p> - </desc> - </func> - <func> - <name name="stop" arity="3"/> - <fsummary>Terminate a process synchronously.</fsummary> - <type variable="Process"/> - <type variable="Reason"/> - <type variable="Timeout"/> - <desc> - <p>Orders the process to exit with the given <c>Reason</c> and - waits for it to terminate.</p> - <p>The function returns <c>ok</c> if the process exits with - the given <c>Reason</c> within <c>Timeout</c> - milliseconds.</p> - <p>If the call times out, a <c>timeout</c> exception is - raised.</p> - <p>If the process does not exist, a <c>noproc</c> - exception is raised.</p> - <p>The implementation of this function is based on the - <c>terminate</c> system message, and requires that the - process handles system messages correctly. 
- See <seealso marker="sys">sys(3)</seealso> - and <seealso marker="doc/design_principles:spec_proc">OTP - Design Principles</seealso> for information about system - messages.</p> - </desc> - </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="kernel:error_logger">error_logger(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="kernel:error_logger"> + <c>error_logger(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/proplists.xml b/lib/stdlib/doc/src/proplists.xml index 832df9556a..fe6b8cc3bf 100644 --- a/lib/stdlib/doc/src/proplists.xml +++ b/lib/stdlib/doc/src/proplists.xml @@ -30,51 +30,66 @@ <checked></checked> <date>2002-09-28</date> <rev>A</rev> - <file>proplists.sgml</file> + <file>proplists.xml</file> </header> <module>proplists</module> - <modulesummary>Support functions for property lists</modulesummary> + <modulesummary>Support functions for property lists.</modulesummary> <description> <p>Property lists are ordinary lists containing entries in the form of either tuples, whose first elements are keys used for lookup and - insertion, or atoms, which work as shorthand for tuples <c>{Atom, true}</c>. (Other terms are allowed in the lists, but are ignored - by this module.) If there is more than one entry in a list for a + insertion, or atoms, which work as shorthand for tuples + <c>{Atom, true}</c>. (Other terms are allowed in the lists, but are + ignored by this module.) If there is more than one entry in a list for a certain key, the first occurrence normally overrides any later (irrespective of the arity of the tuples).</p> + <p>Property lists are useful for representing inherited properties, - such as options passed to a function where a user may specify options + such as options passed to a function where a user can specify options overriding the default settings, object properties, annotations, - etc.</p> - <p>Two keys are considered equal if they match (<c>=:=</c>). 
In other - words, numbers are compared literally rather than by value, so that, - for instance, <c>1</c> and <c>1.0</c> are different keys.</p> + and so on.</p> + + <p>Two keys are considered equal if they match (<c>=:=</c>). That is, + numbers are compared literally rather than by value, so that, + for example, <c>1</c> and <c>1.0</c> are different keys.</p> </description> + <datatypes> <datatype> <name name="property"/> </datatype> </datatypes> + <funcs> <func> <name name="append_values" arity="2"/> <fsummary></fsummary> <desc> - <p>Similar to <c>get_all_values/2</c>, but each value is - wrapped in a list unless it is already itself a list, and the - resulting list of lists is concatenated. This is often useful for - "incremental" options; e.g., <c>append_values(a, [{a, [1,2]}, {b, 0}, {a, 3}, {c, -1}, {a, [4]}])</c> will return the list - <c>[1,2,3,4]</c>.</p> + <p>Similar to + <seealso marker="#get_all_values/2"><c>get_all_values/2</c></seealso>, + but each value is wrapped in a list unless it is already itself a + list. The resulting list of lists is concatenated. This is often + useful for "incremental" options.</p> + <p><em>Example:</em></p> + <code type="none"> +append_values(a, [{a, [1,2]}, {b, 0}, {a, 3}, {c, -1}, {a, [4]}])</code> + <p>returns:</p> + <code type="none"> +[1,2,3,4]</code> </desc> </func> + <func> <name name="compact" arity="1"/> <fsummary></fsummary> <desc> <p>Minimizes the representation of all entries in the list. 
This is equivalent to <c><![CDATA[[property(P) || P <- ListIn]]]></c>.</p> - <p>See also: <c>property/1</c>, <c>unfold/1</c>.</p> + <p>See also + <seealso marker="#property/1"><c>property/1</c></seealso>, + <seealso marker="#unfold/1"><c>unfold/1</c></seealso>.</p> </desc> </func> + <func> <name name="delete" arity="2"/> <fsummary></fsummary> @@ -83,96 +98,111 @@ <c><anno>List</anno></c>.</p> </desc> </func> + <func> <name name="expand" arity="2"/> <fsummary></fsummary> <desc> <p>Expands particular properties to corresponding sets of - properties (or other terms). For each pair <c>{<anno>Property</anno>, <anno>Expansion</anno>}</c> in <c><anno>Expansions</anno></c>, if <c>E</c> is - the first entry in <c><anno>ListIn</anno></c> with the same key as - <c><anno>Property</anno></c>, and <c>E</c> and <c><anno>Property</anno></c> - have equivalent normal forms, then <c>E</c> is replaced with - the terms in <c><anno>Expansion</anno></c>, and any following entries with - the same key are deleted from <c><anno>ListIn</anno></c>.</p> - <p>For example, the following expressions all return <c>[fie, bar, baz, fum]</c>:</p> + properties (or other terms). 
For each pair <c>{<anno>Property</anno>, + <anno>Expansion</anno>}</c> in <c><anno>Expansions</anno></c>: if + <c>E</c> is the first entry in <c><anno>ListIn</anno></c> with the + same key as <c><anno>Property</anno></c>, and <c>E</c> and + <c><anno>Property</anno></c> have equivalent normal forms, then + <c>E</c> is replaced with the terms in <c><anno>Expansion</anno></c>, + and any following entries with the same key are deleted from + <c><anno>ListIn</anno></c>.</p> + <p>For example, the following expressions all return + <c>[fie, bar, baz, fum]</c>:</p> <code type="none"> - expand([{foo, [bar, baz]}], - [fie, foo, fum]) - expand([{{foo, true}, [bar, baz]}], - [fie, foo, fum]) - expand([{{foo, false}, [bar, baz]}], - [fie, {foo, false}, fum])</code> - <p>However, no expansion is done in the following call:</p> +expand([{foo, [bar, baz]}], [fie, foo, fum]) +expand([{{foo, true}, [bar, baz]}], [fie, foo, fum]) +expand([{{foo, false}, [bar, baz]}], [fie, {foo, false}, fum])</code> + <p>However, no expansion is done in the following call + because <c>{foo, false}</c> shadows <c>foo</c>:</p> <code type="none"> - expand([{{foo, true}, [bar, baz]}], - [{foo, false}, fie, foo, fum])</code> - <p>because <c>{foo, false}</c> shadows <c>foo</c>.</p> - <p>Note that if the original property term is to be preserved in the +expand([{{foo, true}, [bar, baz]}], [{foo, false}, fie, foo, fum])</code> + <p>Notice that if the original property term is to be preserved in the result when expanded, it must be included in the expansion list. The inserted terms are not expanded recursively. 
If - <c><anno>Expansions</anno></c> contains more than one property with the same - key, only the first occurrence is used.</p> - <p>See also: <c>normalize/2</c>.</p> + <c><anno>Expansions</anno></c> contains more than one property with + the same key, only the first occurrence is used.</p> + <p>See also + <seealso marker="#normalize/2"><c>normalize/2</c></seealso>.</p> </desc> </func> + <func> <name name="get_all_values" arity="2"/> <fsummary></fsummary> <desc> - <p>Similar to <c>get_value/2</c>, but returns the list of - values for <em>all</em> entries <c>{Key, Value}</c> in - <c><anno>List</anno></c>. If no such entry exists, the result is the empty - list.</p> - <p>See also: <c>get_value/2</c>.</p> + <p>Similar to + <seealso marker="#get_value/2"><c>get_value/2</c></seealso>, + but returns the list of values for <em>all</em> entries + <c>{Key, Value}</c> in <c><anno>List</anno></c>. If no such entry + exists, the result is the empty list.</p> </desc> </func> + <func> <name name="get_bool" arity="2"/> <fsummary></fsummary> <desc> <p>Returns the value of a boolean key/value option. 
If - <c>lookup(<anno>Key</anno>, <anno>List</anno>)</c> would yield <c>{<anno>Key</anno>, true}</c>, - this function returns <c>true</c>; otherwise <c>false</c> - is returned.</p> - <p>See also: <c>get_value/2</c>, <c>lookup/2</c>.</p> + <c>lookup(<anno>Key</anno>, <anno>List</anno>)</c> would yield + <c>{<anno>Key</anno>, true}</c>, this function returns <c>true</c>, + otherwise <c>false</c>.</p> + <p>See also + <seealso marker="#get_value/2"><c>get_value/2</c></seealso>, + <seealso marker="#lookup/2"><c>lookup/2</c></seealso>.</p> </desc> </func> + <func> <name name="get_keys" arity="1"/> <fsummary></fsummary> <desc> - <p>Returns an unordered list of the keys used in <c><anno>List</anno></c>, - not containing duplicates.</p> + <p>Returns an unordered list of the keys used in + <c><anno>List</anno></c>, not containing duplicates.</p> </desc> </func> + <func> <name name="get_value" arity="2"/> <fsummary></fsummary> <desc> - <p>Equivalent to <c>get_value(<anno>Key</anno>, <anno>List</anno>, undefined)</c>.</p> + <p>Equivalent to + <c>get_value(<anno>Key</anno>, <anno>List</anno>, undefined)</c>.</p> </desc> </func> + <func> <name name="get_value" arity="3"/> <fsummary></fsummary> <desc> <p>Returns the value of a simple key/value property in - <c><anno>List</anno></c>. If <c>lookup(<anno>Key</anno>, <anno>List</anno>)</c> would yield - <c>{<anno>Key</anno>, Value}</c>, this function returns the corresponding - <c>Value</c>, otherwise <c><anno>Default</anno></c> is returned.</p> - <p>See also: <c>get_all_values/2</c>, <c>get_bool/2</c>, - <c>get_value/2</c>, <c>lookup/2</c>.</p> + <c><anno>List</anno></c>. 
If <c>lookup(<anno>Key</anno>, + <anno>List</anno>)</c> would yield <c>{<anno>Key</anno>, Value}</c>, + this function returns the corresponding <c>Value</c>, otherwise + <c><anno>Default</anno></c>.</p> + <p>See also + <seealso marker="#get_all_values/2"><c>get_all_values/2</c></seealso>, + <seealso marker="#get_bool/2"><c>get_bool/2</c></seealso>, + <seealso marker="#get_value/2"><c>get_value/2</c></seealso>, + <seealso marker="#lookup/2"><c>lookup/2</c></seealso>.</p> </desc> </func> + <func> <name name="is_defined" arity="2"/> <fsummary></fsummary> <desc> <p>Returns <c>true</c> if <c><anno>List</anno></c> contains at least one entry associated with <c><anno>Key</anno></c>, otherwise - <c>false</c> is returned.</p> + <c>false</c>.</p> </desc> </func> + <func> <name name="lookup" arity="2"/> <fsummary></fsummary> @@ -181,128 +211,160 @@ <c><anno>List</anno></c>, if one exists, otherwise returns <c>none</c>. For an atom <c>A</c> in the list, the tuple <c>{A, true}</c> is the entry associated with <c>A</c>.</p> - <p>See also: <c>get_bool/2</c>, <c>get_value/2</c>, - <c>lookup_all/2</c>.</p> + <p>See also + <seealso marker="#get_bool/2"><c>get_bool/2</c></seealso>, + <seealso marker="#get_value/2"><c>get_value/2</c></seealso>, + <seealso marker="#lookup_all/2"><c>lookup_all/2</c></seealso>.</p> </desc> </func> + <func> <name name="lookup_all" arity="2"/> <fsummary></fsummary> <desc> - <p>Returns the list of all entries associated with <c><anno>Key</anno></c> - in <c><anno>List</anno></c>. If no such entry exists, the result is the - empty list.</p> - <p>See also: <c>lookup/2</c>.</p> + <p>Returns the list of all entries associated with + <c><anno>Key</anno></c> in <c><anno>List</anno></c>. 
If no such entry + exists, the result is the empty list.</p> + <p>See also + <seealso marker="#lookup/2"><c>lookup/2</c></seealso>.</p> </desc> </func> + <func> <name name="normalize" arity="2"/> <fsummary></fsummary> <desc> <p>Passes <c><anno>ListIn</anno></c> through a sequence of substitution/expansion stages. For an <c>aliases</c> operation, - the function <c>substitute_aliases/2</c> is applied using the - given list of aliases; for a <c>negations</c> operation, - <c>substitute_negations/2</c> is applied using the given - negation list; for an <c>expand</c> operation, the function - <c>expand/2</c> is applied using the given list of expansions. - The final result is automatically compacted (cf. - <c>compact/1</c>).</p> + function <seealso marker="#substitute_aliases/2"> + <c>substitute_aliases/2</c></seealso> is applied using the + specified list of aliases:</p> + <list type="bulleted"> + <item> + <p>For a <c>negations</c> operation, <c>substitute_negations/2</c> + is applied using the specified negation list.</p> + </item> + <item> + <p>For an <c>expand</c> operation, function + <seealso marker="#expand/2"><c>expand/2</c></seealso> + is applied using the specified list of expansions.</p> + </item> + </list> + <p>The final result is automatically compacted (compare + <seealso marker="#compact/1"><c>compact/1</c></seealso>).</p> <p>Typically you want to substitute negations first, then aliases, then perform one or more expansions (sometimes you want to pre-expand particular entries before doing the main expansion). 
You might want to substitute negations and/or aliases repeatedly, to allow such forms in the right-hand side of aliases and expansion lists.</p> - <p>See also: <c>compact/1</c>, <c>expand/2</c>, - <c>substitute_aliases/2</c>, <c>substitute_negations/2</c>.</p> + <p>See also <seealso marker="#substitute_negations/2"> + <c>substitute_negations/2</c></seealso>.</p> </desc> </func> + <func> <name name="property" arity="1"/> <fsummary></fsummary> <desc> <p>Creates a normal form (minimal) representation of a property. If - <c><anno>PropertyIn</anno></c> is <c>{Key, true}</c> where - <c>Key</c> is an atom, this returns <c>Key</c>, otherwise - the whole term <c><anno>PropertyIn</anno></c> is returned.</p> - <p>See also: <c>property/2</c>.</p> + <c><anno>PropertyIn</anno></c> is <c>{Key, true}</c>, where + <c>Key</c> is an atom, <c>Key</c> is returned, otherwise + the whole term <c><anno>PropertyIn</anno></c> is returned.</p> + <p>See also + <seealso marker="#property/2"><c>property/2</c></seealso>.</p> </desc> </func> + <func> <name name="property" arity="2"/> <fsummary></fsummary> <desc> - <p>Creates a normal form (minimal) representation of a simple - key/value property. Returns <c><anno>Key</anno></c> if <c><anno>Value</anno></c> is - <c>true</c> and <c><anno>Key</anno></c> is an atom, otherwise a tuple - <c>{<anno>Key</anno>, <anno>Value</anno>}</c> is returned.</p> - <p>See also: <c>property/1</c>.</p> + <p>Creates a normal form (minimal) representation of a simple key/value + property. Returns <c><anno>Key</anno></c> if <c><anno>Value</anno></c> + is <c>true</c> and <c><anno>Key</anno></c> is an atom, otherwise a + tuple <c>{<anno>Key</anno>, <anno>Value</anno>}</c> is returned.</p> + <p>See also + <seealso marker="#property/1"><c>property/1</c></seealso>.</p> </desc> </func> + <func> <name name="split" arity="2"/> <fsummary></fsummary> <desc> <p>Partitions <c><anno>List</anno></c> into a list of sublists and a - remainder. 
<c><anno>Lists</anno></c> contains one sublist for each key in - <c><anno>Keys</anno></c>, in the corresponding order. The relative order of - the elements in each sublist is preserved from the original - <c><anno>List</anno></c>. <c><anno>Rest</anno></c> contains the elements in - <c><anno>List</anno></c> that are not associated with any of the given keys, + remainder. <c><anno>Lists</anno></c> contains one sublist for each key + in <c><anno>Keys</anno></c>, in the corresponding order. The relative + order of the elements in each sublist is preserved from the original + <c><anno>List</anno></c>. <c><anno>Rest</anno></c> contains the + elements in <c><anno>List</anno></c> that are not associated with any + of the specified keys, also with their original relative order preserved.</p> - <p>Example: - split([{c, 2}, {e, 1}, a, {c, 3, 4}, d, {b, 5}, b], [a, b, c])</p> - <p>returns</p> - <p>{[[a], [{b, 5}, b],[{c, 2}, {c, 3, 4}]], [{e, 1}, d]}</p> + <p><em>Example:</em></p> + <code type="none"> +split([{c, 2}, {e, 1}, a, {c, 3, 4}, d, {b, 5}, b], [a, b, c])</code> + <p>returns:</p> + <code type="none"> +{[[a], [{b, 5}, b],[{c, 2}, {c, 3, 4}]], [{e, 1}, d]}</code> </desc> </func> + <func> <name name="substitute_aliases" arity="2"/> <fsummary></fsummary> <desc> <p>Substitutes keys of properties. For each entry in - <c><anno>ListIn</anno></c>, if it is associated with some key <c>K1</c> - such that <c>{K1, K2}</c> occurs in <c><anno>Aliases</anno></c>, the + <c><anno>ListIn</anno></c>, if it is associated with some key + <c>K1</c> such that <c>{K1, K2}</c> occurs in + <c><anno>Aliases</anno></c>, the key of the entry is changed to <c>K2</c>. 
If the same <c>K1</c> occurs more than once in <c><anno>Aliases</anno></c>, only the first occurrence is used.</p> - <p>Example: <c>substitute_aliases([{color, colour}], L)</c> - will replace all tuples <c>{color, ...}</c> in <c>L</c> + <p>For example, <c>substitute_aliases([{color, colour}], L)</c> + replaces all tuples <c>{color, ...}</c> in <c>L</c> with <c>{colour, ...}</c>, and all atoms <c>color</c> with <c>colour</c>.</p> - <p>See also: <c>normalize/2</c>, <c>substitute_negations/2</c>.</p> + <p>See also + <seealso marker="#normalize/2"><c>normalize/2</c></seealso>, + <seealso marker="#substitute_negations/2"> + <c>substitute_negations/2</c></seealso>.</p> </desc> </func> + <func> <name name="substitute_negations" arity="2"/> <fsummary></fsummary> <desc> <p>Substitutes keys of boolean-valued properties and simultaneously negates their values. For each entry in - <c><anno>ListIn</anno></c>, if it is associated with some key <c>K1</c> - such that <c>{K1, K2}</c> occurs in <c><anno>Negations</anno></c>, then - if the entry was <c>{K1, true}</c> it will be replaced with - <c>{K2, false}</c>, otherwise it will be replaced with - <c>{K2, true}</c>, thus changing the name of the option and - simultaneously negating the value given by - <c>get_bool(ListIn)</c>. If the same <c>K1</c> occurs more - than once in <c><anno>Negations</anno></c>, only the first occurrence is - used.</p> - <p>Example: <c>substitute_negations([{no_foo, foo}], L)</c> - will replace any atom <c>no_foo</c> or tuple + <c><anno>ListIn</anno></c>, if it is associated with some key + <c>K1</c> such that <c>{K1, K2}</c> occurs in + <c><anno>Negations</anno></c>: if the entry was + <c>{K1, true}</c>, it is replaced with <c>{K2, false}</c>, otherwise + with <c>{K2, true}</c>, thus changing the name of the option and + simultaneously negating the value specified by + <seealso marker="#get_bool/2"> + <c>get_bool(Key, <anno>ListIn</anno>)</c></seealso>. 
+ If the same <c>K1</c> occurs more than once in + <c><anno>Negations</anno></c>, only the first occurrence is used.</p> + <p>For example, <c>substitute_negations([{no_foo, foo}], L)</c> + replaces any atom <c>no_foo</c> or tuple <c>{no_foo, true}</c> in <c>L</c> with <c>{foo, false}</c>, - and any other tuple <c>{no_foo, ...}</c> with - <c>{foo, true}</c>.</p> - <p>See also: <c>get_bool/2</c>, <c>normalize/2</c>, - <c>substitute_aliases/2</c>.</p> + and any other tuple <c>{no_foo, ...}</c> with <c>{foo, true}</c>.</p> + <p>See also + <seealso marker="#get_bool/2"><c>get_bool/2</c></seealso>, + <seealso marker="#normalize/2"><c>normalize/2</c></seealso>, + <seealso marker="#substitute_aliases/2"> + <c>substitute_aliases/2</c></seealso>.</p> </desc> </func> + <func> <name name="unfold" arity="1"/> <fsummary></fsummary> <desc> - <p>Unfolds all occurrences of atoms in <c><anno>ListIn</anno></c> to tuples - <c>{Atom, true}</c>.</p> + <p>Unfolds all occurrences of atoms in <c><anno>ListIn</anno></c> to + tuples <c>{Atom, true}</c>.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/qlc.xml b/lib/stdlib/doc/src/qlc.xml index 2966e94ec1..fe14a6334c 100644 --- a/lib/stdlib/doc/src/qlc.xml +++ b/lib/stdlib/doc/src/qlc.xml @@ -24,102 +24,121 @@ <title>qlc</title> <prepared>Hans Bolinder</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2004-08-25</date> <rev>PA1</rev> - <file>qlc.sgml</file> + <file>qlc.xml</file> </header> <module>qlc</module> - <modulesummary>Query Interface to Mnesia, ETS, Dets, etc</modulesummary> + <modulesummary>Query interface to Mnesia, ETS, Dets, and so on. + </modulesummary> <description> - <p>The <c>qlc</c> module provides a query interface to Mnesia, ETS, - Dets and other data structures that implement an iterator style - traversal of objects. 
</p> + <p>This module provides a query interface to + <seealso marker="mnesia:mnesia">Mnesia</seealso>, + <seealso marker="ets">ETS</seealso>, + <seealso marker="dets">Dets</seealso>, + and other data structures that provide an iterator style + traversal of objects.</p> </description> - <section><title>Overview</title> - - <p>The <c>qlc</c> module implements a query interface to <em>QLC - tables</em>. Typical QLC tables are ETS, Dets, and Mnesia - tables. There is also support for user defined tables, see the - <seealso marker="#implementing_a_qlc_table">Implementing a QLC - table</seealso> section. <marker - id="query_list_comprehension"></marker> - A <em>query</em> is stated using + <section> + <title>Overview</title> + <p>This module provides a query interface to <em>QLC + tables</em>. Typical QLC tables are Mnesia, ETS, and + Dets tables. Support is also provided for user-defined tables, see section + <seealso marker="#implementing_a_qlc_table"> + Implementing a QLC Table</seealso>. + <marker id="query_list_comprehension"></marker> + A <em>query</em> is expressed using <em>Query List Comprehensions</em> (QLCs). The answers to a query are determined by data in QLC tables that fulfill the constraints expressed by the QLCs of the query. QLCs are similar - to ordinary list comprehensions as described in the Erlang - Reference Manual and Programming Examples except that variables - introduced in patterns cannot be used in list expressions. In - fact, in the absence of optimizations and options such as - <c>cache</c> and <c>unique</c> (see below), every QLC free of - QLC tables evaluates to the same list of answers as the + to ordinary list comprehensions as described in + <seealso marker="doc/reference_manual:expressions#lcs"> + Erlang Reference Manual</seealso> and + <seealso marker="doc/programming_examples:list_comprehensions"> + Programming Examples</seealso>, except that variables + introduced in patterns cannot be used in list expressions. 
+ In the absence of optimizations and options such as + <c>cache</c> and <c>unique</c> (see section + <seealso marker="#common_options">Common Options</seealso>), every + QLC free of QLC tables evaluates to the same list of answers as the identical ordinary list comprehension.</p> <p>While ordinary list comprehensions evaluate to lists, calling - <seealso marker="#q">qlc:q/1,2</seealso> returns a <marker - id="query_handle"></marker><em> Query - Handle</em>. To obtain all the answers to a query, <seealso - marker="#eval">qlc:eval/1,2</seealso> should be called with the + <seealso marker="#q/1"><c>q/1,2</c></seealso> returns a + <marker id="query_handle"></marker><em>query handle</em>. + To obtain all the answers to a query, <seealso marker="#eval/1"> + <c>eval/1,2</c></seealso> is to be called with the query handle as first argument. Query handles are essentially - functional objects ("funs") created in the module calling <c>q/1,2</c>. - As the funs refer to the module's code, one should - be careful not to keep query handles too long if the module's - code is to be replaced. - Code replacement is described in the <seealso - marker="doc/reference_manual:code_loading">Erlang Reference - Manual</seealso>. The list of answers can also be traversed in - chunks by use of a <marker - id="query_cursor"></marker><em>Query Cursor</em>. Query cursors are - created by calling <seealso - marker="#cursor">qlc:cursor/1,2</seealso> with a query handle as - first argument. Query cursors are essentially Erlang processes. + functional objects (funs) created in the module calling <c>q/1,2</c>. + As the funs refer to the module code, be careful not to keep query + handles too long if the module code is to be replaced. + Code replacement is described in section + <seealso marker="doc/reference_manual:code_loading"> + Compilation and Code Loading</seealso> in the Erlang Reference Manual. 
+ The list of answers can also be traversed in chunks by use of a + <marker id="query_cursor"></marker><em>query cursor</em>. + Query cursors are created by calling + <seealso marker="#cursor/1"><c>cursor/1,2</c></seealso> with a query + handle as first argument. Query cursors are essentially Erlang processes. One answer at a time is sent from the query cursor process to the process that created the cursor.</p> - </section> - <section><title>Syntax</title> - + <section> + <title>Syntax</title> <p>Syntactically QLCs have the same parts as ordinary list comprehensions:</p> - <code type="none">[Expression || Qualifier1, Qualifier2, ...]</code> + <code type="none"> +[Expression || Qualifier1, Qualifier2, ...]</code> - <p><c>Expression</c> (the <em>template</em>) is an arbitrary + <p><c>Expression</c> (the <em>template</em>) is any Erlang expression. Qualifiers are either <em>filters</em> or <em>generators</em>. Filters are Erlang expressions returning - <c>bool()</c>. Generators have the form + <c>boolean()</c>. Generators have the form <c><![CDATA[Pattern <- ListExpression]]></c>, where <c>ListExpression</c> is an expression evaluating to a query handle or a list. Query handles are returned from - <c>qlc:table/2</c>, <c>qlc:append/1,2</c>, <c>qlc:sort/1,2</c>, - <c>qlc:keysort/2,3</c>, <c>qlc:q/1,2</c>, and - <c>qlc:string_to_handle/1,2,3</c>.</p> - + <seealso marker="#append/1"><c>append/1,2</c></seealso>, + <seealso marker="#keysort/2"><c>keysort/2,3</c></seealso>, + <seealso marker="#q/1"><c>q/1,2</c></seealso>, + <seealso marker="#sort/1"><c>sort/1,2</c></seealso>, + <seealso marker="#string_to_handle/1"> + <c>string_to_handle/1,2,3</c></seealso>, and + <seealso marker="#table/2"><c>table/2</c></seealso>.</p> </section> - <section><title>Evaluation</title> - - <p>The evaluation of a query handle begins by the inspection of - options and the collection of information about tables. As a - result qualifiers are modified during the optimization phase. 
- Next all list expressions are evaluated. If a cursor has been - created evaluation takes place in the cursor process. For those - list expressions that are QLCs, the list expressions of the - QLCs' generators are evaluated as well. One has to be careful if - list expressions have side effects since the order in which list - expressions are evaluated is unspecified. Finally the answers - are found by evaluating the qualifiers from left to right, - backtracking when some filter returns <c>false</c>, or - collecting the template when all filters return <c>true</c>.</p> - - <p>Filters that do not return <c>bool()</c> but fail are handled - differently depending on their syntax: if the filter is a guard + <section> + <title>Evaluation</title> + <p>A query handle is evaluated in the following order:</p> + + <list type="bulleted"> + <item> + <p>Inspection of options and the collection of information about + tables. As a result, qualifiers are modified during the optimization + phase.</p> + </item> + <item> + <p>All list expressions are evaluated. If a cursor has been created, + evaluation takes place in the cursor process. For list expressions + that are QLCs, the list expressions of the generators of the QLCs + are evaluated as well. Be careful if list expressions have side + effects, as list expressions are evaluated in unspecified order.</p> + </item> + <item> + <p>The answers are found by evaluating the qualifiers from left to + right, backtracking when some filter returns <c>false</c>, or + collecting the template when all filters return <c>true</c>.</p> + </item> + </list> + + <p>Filters that do not return <c>boolean()</c> but fail are handled + differently depending on their syntax: if the filter is a guard, it returns <c>false</c>, otherwise the query evaluation fails. This behavior makes it possible for the <c>qlc</c> module to do some optimizations without affecting the meaning of a query. 
For @@ -131,302 +150,311 @@ candidate objects can often be found by looking up some key values of the table or by traversing the table using a match specification. It is necessary to place the guard filters - immediately after the table's generator, otherwise the candidate - objects will not be restricted to a small set. The reason is + immediately after the table generator, otherwise the candidate + objects are not restricted to a small set. The reason is that objects that could make the query evaluation fail must not - be excluded by looking up a key or running a match - specification.</p> - + be excluded by looking up a key or running a match specification.</p> </section> - <section><title>Join</title> - + <section> + <title>Join</title> <p>The <c>qlc</c> module supports fast join of two query handles. Fast join is possible if some position <c>P1</c> of one query handler and some position <c>P2</c> of another query handler are - tested for equality. Two fast join methods have been - implemented:</p> + tested for equality. Two fast join methods are provided:</p> <list type="bulleted"> - <item>Lookup join traverses all objects of one query handle and - finds objects of the other handle (a QLC table) such that the + <item><p><em>Lookup join</em> traverses all objects of one query handle + and finds objects of the other handle (a QLC table) such that the values at <c>P1</c> and <c>P2</c> match or compare equal. The <c>qlc</c> module does not create - any indices but looks up values using the key position and - the indexed positions of the QLC table. + any indexes but looks up values using the key position and + the indexed positions of the QLC table.</p> </item> - <item>Merge join sorts the objects of each query handle if + <item><p><em>Merge join</em> sorts the objects of each query handle if necessary and filters out objects where the values at - <c>P1</c> and <c>P2</c> do not compare equal. 
If there are - many objects with the same value of <c>P2</c> a temporary - file will be used for the equivalence classes. + <c>P1</c> and <c>P2</c> do not compare equal. If + many objects with the same value of <c>P2</c> exist, a temporary + file is used for the equivalence classes.</p> </item> </list> <p>The <c>qlc</c> module warns at compile time if a QLC combines query handles in such a way that more than one join is - possible. In other words, there is no query planner that can - choose a good order between possible join operations. It is up + possible. That is, no query planner is provided that can + select a good order between possible join operations. It is up to the user to order the joins by introducing query handles.</p> <p>The join is to be expressed as a guard filter. The filter must be placed immediately after the two joined generators, possibly after guard filters that use variables from no other generators - but the two joined generators. The <c>qlc</c> module inspects + but the two joined generators. 
The <c>qlc</c> module inspects the operands of <c>=:=/2</c>, <c>==/2</c>, <c>is_record/2</c>, <c>element/2</c>, and logical operators (<c>and/2</c>, <c>or/2</c>, <c>andalso/2</c>, <c>orelse/2</c>, <c>xor/2</c>) when determining which joins to consider.</p> - </section> - <section><title>Common options</title> - - <p>The following options are accepted by <c>cursor/2</c>, - <c>eval/2</c>, <c>fold/4</c>, and <c>info/2</c>:</p> + <section> + <marker id="common_options"></marker> + <title>Common Options</title> + <p>The following options are accepted by + <seealso marker="#cursor/2"><c>cursor/2</c></seealso>, + <seealso marker="#eval/2"><c>eval/2</c></seealso>, + <seealso marker="#fold/4"><c>fold/4</c></seealso>, and + <seealso marker="#info/2"><c>info/2</c></seealso>:</p> <list type="bulleted"> - <item><c>{cache_all, Cache}</c> where <c>Cache</c> is + <item><p><c>{cache_all, Cache}</c>, where <c>Cache</c> is equal to <c>ets</c> or <c>list</c> adds a <c>{cache, Cache}</c> option to every list expression - of the query except tables and lists. Default is - <c>{cache_all, no}</c>. The option <c>cache_all</c> is - equivalent to <c>{cache_all, ets}</c>. + of the query except tables and lists. Defaults to + <c>{cache_all, no}</c>. Option <c>cache_all</c> is + equivalent to <c>{cache_all, ets}</c>.</p> </item> - <item><c>{max_list_size, MaxListSize}</c> <marker - id="max_list_size"></marker> where <c>MaxListSize</c> is the + <item><p><marker id="max_list_size"></marker><c>{max_list_size, + MaxListSize}</c>, where <c>MaxListSize</c> is the size in bytes of terms on the external format. If the accumulated size of collected objects exceeds - <c>MaxListSize</c> the objects are written onto a temporary - file. This option is used by the <c>{cache, list}</c> - option as well as by the merge join method. Default is - 512*1024 bytes. + <c>MaxListSize</c>, the objects are written onto a temporary + file. 
This option is used by option <c>{cache, list}</c> + and by the merge join method. Defaults to 512*1024 bytes.</p> </item> - <item><c>{tmpdir_usage, TmpFileUsage}</c> determines the + <item><p><c>{tmpdir_usage, TmpFileUsage}</c> determines the action taken when <c>qlc</c> is about to create temporary - files on the directory set by the <c>tmpdir</c> option. If the - value is <c>not_allowed</c> an error tuple is returned, + files on the directory set by option <c>tmpdir</c>. If the + value is <c>not_allowed</c>, an error tuple is returned, otherwise temporary files are created as needed. Default is - <c>allowed</c> which means that no further action is taken. + <c>allowed</c>, which means that no further action is taken. The values <c>info_msg</c>, <c>warning_msg</c>, and <c>error_msg</c> mean that the function with the corresponding - name in the module <c>error_logger</c> is called for printing - some information (currently the stacktrace). + name in module + <seealso marker="kernel:error_logger"><c>error_logger</c></seealso> + is called for printing some information (currently the stacktrace).</p> </item> - <item><c>{tmpdir, TempDirectory}</c> sets the directory used by - merge join for temporary files and by the - <c>{cache, list}</c> option. The option also overrides - the <c>tmpdir</c> option of <c>keysort/3</c> and - <c>sort/2</c>. The default value is <c>""</c> which means that - the directory returned by <c>file:get_cwd()</c> is used. + <item><p><c>{tmpdir, TempDirectory}</c> sets the directory used by + merge join for temporary files and by option + <c>{cache, list}</c>. The option also overrides + option <c>tmpdir</c> of + <seealso marker="#keysort/3"><c>keysort/3</c></seealso> and + <seealso marker="#sort/2"><c>sort/2</c></seealso>. 
+ Defaults to <c>""</c>, which means that + the directory returned by <c>file:get_cwd()</c> is used.</p> </item> - <item><c>{unique_all, true}</c> adds a + <item><p><c>{unique_all, true}</c> adds a <c>{unique, true}</c> option to every list expression of - the query. Default is <c>{unique_all, false}</c>. The - option <c>unique_all</c> is equivalent to - <c>{unique_all, true}</c>. + the query. Defaults to <c>{unique_all, false}</c>. + Option <c>unique_all</c> is equivalent to + <c>{unique_all, true}</c>.</p> </item> </list> - </section> - <section><title>Getting started</title> - - <p><marker id="getting_started"></marker> As already mentioned - queries are stated in the list comprehension syntax as described - in the <seealso marker="doc/reference_manual:expressions">Erlang - Reference Manual</seealso>. In the following some familiarity - with list comprehensions is assumed. There are examples in - <seealso - marker="doc/programming_examples:list_comprehensions">Programming - Examples</seealso> that can get you started. It should be - stressed that list comprehensions do not add any computational + <section> + <marker id="getting_started"></marker> + <title>Getting Started</title> + <p>As mentioned earlier, + queries are expressed in the list comprehension syntax as described + in section + <seealso marker="doc/reference_manual:expressions">Expressions</seealso> + in Erlang Reference Manual. In the following, some familiarity + with list comprehensions is assumed. The examples in section + <seealso marker="doc/programming_examples:list_comprehensions"> + List Comprehensions</seealso> in Programming Examples can get you + started. Notice that list comprehensions do not add any computational power to the language; anything that can be done with list - comprehensions can also be done without them. But they add a - syntax for expressing simple search problems which is compact + comprehensions can also be done without them. 
But they add + syntax for expressing simple search problems, which is compact and clear once you get used to it.</p> <p>Many list comprehension expressions can be evaluated by the - <c>qlc</c> module. Exceptions are expressions such that + <c>qlc</c> module. Exceptions are expressions such that variables introduced in patterns (or filters) are used in some - generator later in the list comprehension. As an example - consider an implementation of lists:append(L): - <c><![CDATA[[X ||Y <- L, X <- Y]]]></c>. - Y is introduced in the first generator and used in the second. + generator later in the list comprehension. As an example, + consider an implementation of <c>lists:append(L)</c>: + <c><![CDATA[[X || Y <- L, X <- Y]]]></c>. + <c>Y</c> is introduced in the first generator and used in the second. The ordinary list comprehension is normally to be preferred when there is a choice as to which to use. One difference is that - <c>qlc:eval/1,2</c> collects answers in a list which is finally + <seealso marker="#eval/1"><c>eval/1,2</c></seealso> + collects answers in a list that is finally reversed, while list comprehensions collect answers on the stack - which is finally unwound.</p> + that is finally unwound.</p> <p>What the <c>qlc</c> module primarily adds to list comprehensions is that data can be read from QLC tables in small - chunks. A QLC table is created by calling <c>qlc:table/2</c>. + chunks. A QLC table is created by calling + <seealso marker="#table/2"><c>qlc:table/2</c></seealso>. Usually <c>qlc:table/2</c> is not called directly from the query - but via an interface function of some data structure. There are - a few examples of such functions in Erlang/OTP: - <c>mnesia:table/1,2</c>, <c>ets:table/1,2</c>, and - <c>dets:table/1,2</c>. For a given data structure there can be - several functions that create QLC tables, but common for all - these functions is that they return a query handle created by - <c>qlc:table/2</c>. 
Using the QLC tables provided by OTP is - probably sufficient in most cases, but for the more advanced - user the section <seealso - marker="#implementing_a_qlc_table">Implementing a QLC - table</seealso> describes the implementation of a function + but through an interface function of some data structure. + Erlang/OTP includes a few examples of such functions: + <seealso marker="mnesia:mnesia#table/1"><c>mnesia:table/1,2</c></seealso>, + <seealso marker="ets#table/1"><c>ets:table/1,2</c></seealso>, and + <seealso marker="dets#table/1"><c>dets:table/1,2</c></seealso>. + For a given data structure, several functions can create QLC tables, but + common for these functions is that they return a query handle created by + <seealso marker="#table/2"><c>qlc:table/2</c></seealso>. + Using the QLC tables provided by Erlang/OTP is probably + sufficient in most cases, but for the more advanced user section + <seealso marker="#implementing_a_qlc_table">Implementing a QLC + Table</seealso> describes the implementation of a function calling <c>qlc:table/2</c>.</p> - <p>Besides <c>qlc:table/2</c> there are other functions that - return query handles. They might not be used as often as tables, - but are useful from time to time. <c>qlc:append</c> traverses - objects from several tables or lists after each other. If, for - instance, you want to traverse all answers to a query QH and + <p>Besides <c>qlc:table/2</c>, other functions + return query handles. They are used less often than tables, + but are sometimes useful. + <seealso marker="#append/1"><c>qlc:append/1,2</c></seealso> traverses + objects from several tables or lists after each other. If, for + example, you want to traverse all answers to a query <c>QH</c> and then finish off by a term <c>{finished}</c>, you can do that by - calling <c>qlc:append(QH, [{finished}])</c>. <c>append</c> first - returns all objects of QH, then <c>{finished}</c>. 
If there is - one tuple <c>{finished}</c> among the answers to QH it will be - returned twice from <c>append</c>.</p> + calling <c>qlc:append(QH, [{finished}])</c>. <c>append/2</c> first + returns all objects of <c>QH</c>, then <c>{finished}</c>. If a tuple + <c>{finished}</c> exists among the answers to <c>QH</c>, it is + returned twice from <c>append/2</c>.</p> <p>As another example, consider concatenating the answers to two - queries QH1 and QH2 while removing all duplicates. The means to - accomplish this is to use the <c>unique</c> option:</p> + queries <c>QH1</c> and <c>QH2</c> while removing all duplicates. This is + accomplished by using option <c>unique</c>:</p> - <code type="none"><![CDATA[ -qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})]]></code> + <code type="none"> +<![CDATA[qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})]]></code> - <p>The cost is substantial: every returned answer will be stored - in an ETS table. Before returning an answer it is looked up in + <p>The cost is substantial: every returned answer is stored + in an ETS table. Before returning an answer, it is looked up in the ETS table to check if it has already been returned. Without - the <c>unique</c> options all answers to QH1 would be returned - followed by all answers to QH2. The <c>unique</c> options keeps + the <c>unique</c> option, all answers to <c>QH1</c> would be returned + followed by all answers to <c>QH2</c>. 
The <c>unique</c> option keeps the order between the remaining answers.</p> - <p>If the order of the answers is not important there is the - alternative to sort the answers uniquely:</p> + <p>If the order of the answers is not important, there is an + alternative to the <c>unique</c> option, namely to sort the + answers uniquely:</p> - <code type="none"><![CDATA[ -qlc:sort(qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})).]]></code> + <code type="none"> +<![CDATA[qlc:sort(qlc:q([X || X <- qlc:append(QH1, QH2)], {unique, true})).]]></code> - <p>This query also removes duplicates but the answers will be - sorted. If there are many answers temporary files will be used. - Note that in order to get the first unique answer all answers - have to be found and sorted. Both alternatives find duplicates - by comparing answers, that is, if A1 and A2 are answers found in - that order, then A2 is a removed if A1 == A2.</p> + <p>This query also removes duplicates, but the answers are + sorted. If there are many answers, temporary files are used. + Notice that to get the first unique answer, all answers + must be found and sorted. Both alternatives find duplicates by comparing + answers, that is, if <c>A1</c> and <c>A2</c> are answers found in + that order, then <c>A2</c> is removed if <c>A1 == A2</c>.</p> - <p>To return just a few answers cursors can be used. The following + <p>To return only a few answers, cursors can be used. The following code returns no more than five answers using an ETS table for storing the unique answers:</p> - <code type="none"><![CDATA[ -C = qlc:cursor(qlc:q([X || X <- qlc:append(QH1, QH2)],{unique,true})), + <code type="none"> +<![CDATA[C = qlc:cursor(qlc:q([X || X <- qlc:append(QH1, QH2)],{unique,true})), R = qlc:next_answers(C, 5), ok = qlc:delete_cursor(C), R.]]></code> - <p>Query list comprehensions are convenient for stating - constraints on data from two or more tables. 
An example that + <p>QLCs are convenient for stating + constraints on data from two or more tables. The following example does a natural join on two query handles on position 2:</p> - <code type="none"><![CDATA[ -qlc:q([{X1,X2,X3,Y1} || + <code type="none"> +<![CDATA[qlc:q([{X1,X2,X3,Y1} || {X1,X2,X3} <- QH1, {Y1,Y2} <- QH2, X2 =:= Y2])]]></code> - <p>The <c>qlc</c> module will evaluate this differently depending on - the query - handles <c>QH1</c> and <c>QH2</c>. If, for example, <c>X2</c> is - matched against the key of a QLC table the lookup join method - will traverse the objects of <c>QH2</c> while looking up key - values in the table. On the other hand, if neither <c>X2</c> nor + <p>The <c>qlc</c> module evaluates this differently depending on the + query handles <c>QH1</c> and <c>QH2</c>. If, for example, <c>X2</c> is + matched against the key of a QLC table, the lookup join method + traverses the objects of <c>QH2</c> while looking up key + values in the table. However, if neither <c>X2</c> nor <c>Y2</c> is matched against the key or an indexed position of a - QLC table, the merge join method will make sure that <c>QH1</c> + QLC table, the merge join method ensures that <c>QH1</c> and <c>QH2</c> are both sorted on position 2 and next do the join by traversing the objects one by one.</p> - <p>The <c>join</c> option can be used to force the <c>qlc</c> module - to use a - certain join method. For the rest of this section it is assumed + <p>Option <c>join</c> can be used to force the <c>qlc</c> module to use + a certain join method. For the rest of this section it is assumed that the excessively slow join method called "nested loop" has been chosen:</p> - <code type="none"><![CDATA[ -qlc:q([{X1,X2,X3,Y1} || + <code type="none"> +<![CDATA[qlc:q([{X1,X2,X3,Y1} || {X1,X2,X3} <- QH1, {Y1,Y2} <- QH2, X2 =:= Y2], {join, nested_loop})]]></code> - <p>In this case the filter will be applied to every possible pair - of answers to QH1 and QH2, one at a time. 
If there are M answers - to QH1 and N answers to QH2 the filter will be run M*N - times.</p> - - <p>If QH2 is a call to the function for <c>gb_trees</c> as defined - in the <seealso marker="#implementing_a_qlc_table">Implementing - a QLC table</seealso> section, <c>gb_table:table/1</c>, the - iterator for the gb-tree will be initiated for each answer to - QH1 after which the objects of the gb-tree will be returned one + <p>In this case the filter is applied to every possible pair + of answers to <c>QH1</c> and <c>QH2</c>, one at a time. + If there are M answers to <c>QH1</c> and N answers to <c>QH2</c>, + the filter is run M*N times.</p> + + <p>If <c>QH2</c> is a call to the function for + <seealso marker="gb_trees"><c>gb_trees</c></seealso>, as defined + in section <seealso marker="#implementing_a_qlc_table">Implementing + a QLC Table</seealso>, that is, <c>gb_table:table/1</c>, the + iterator for the gb-tree is initiated for each answer to + <c>QH1</c>. The objects of the gb-tree are then returned one by one. This is probably the most efficient way of traversing - the table in that case since it takes minimal computational - power to get the following object. But if QH2 is not a table but - a more complicated QLC, it can be more efficient use some RAM + the table in that case, as it takes minimal computational + power to get the following object. But if <c>QH2</c> is not a table but + a more complicated QLC, it can be more efficient to use some RAM memory for collecting the answers in a cache, particularly if there are only a few answers. It must then be assumed that - evaluating QH2 has no side effects so that the meaning of the - query does not change if QH2 is evaluated only once. One way of - caching the answers is to evaluate QH2 first of all and - substitute the list of answers for QH2 in the query. Another way - is to use the <c>cache</c> option. 
It is stated like this:</p> - - <code type="none"><![CDATA[ -QH2' = qlc:q([X || X <- QH2], {cache, ets})]]></code> - - <p>or just</p> - - <code type="none"><![CDATA[ -QH2' = qlc:q([X || X <- QH2], cache)]]></code> - - <p>The effect of the <c>cache</c> option is that when the - generator QH2' is run the first time every answer is stored in - an ETS table. When next answer of QH1 is tried, answers to QH2' - are copied from the ETS table which is very fast. As for the - <c>unique</c> option the cost is a possibly substantial amount - of RAM memory. The <c>{cache, list}</c> option offers the + evaluating <c>QH2</c> has no side effects so that the meaning of the + query does not change if <c>QH2</c> is evaluated only once. One way of + caching the answers is to evaluate <c>QH2</c> first of all and + substitute the list of answers for <c>QH2</c> in the query. Another way + is to use option <c>cache</c>. It is expressed like this:</p> + + <code type="none"> +<![CDATA[QH2' = qlc:q([X || X <- QH2], {cache, ets})]]></code> + + <p>or only</p> + + <code type="none"> +<![CDATA[QH2' = qlc:q([X || X <- QH2], cache)]]></code> + + <p>The effect of option <c>cache</c> is that when + generator <c>QH2'</c> is run the first time, every answer is stored in + an ETS table. When the next answer of <c>QH1</c> is tried, + answers to <c>QH2'</c> + are copied from the ETS table, which is very fast. As for + option <c>unique</c> the cost is a possibly substantial amount + of RAM memory.</p> + + <p>Option <c>{cache, list}</c> offers the possibility to store the answers in a list on the process heap. - While this has the potential of being faster than ETS tables - since there is no need to copy answers from the table it can - often result in slower evaluation due to more garbage - collections of the process' heap as well as increased RAM memory - consumption due to larger heaps. Another drawback with cache - lists is that if the size of the list exceeds a limit a - temporary file will be used. 
Reading the answers from a file is - very much slower than copying them from an ETS table. But if the - available RAM memory is scarce setting the <seealso + This has the potential of being faster than ETS tables, + as there is no need to copy answers from the table. However, it can + often result in slower evaluation because of more garbage + collections of the process heap and increased RAM memory + consumption because of larger heaps. Another drawback with cache + lists is that if the list size exceeds a limit, a + temporary file is used. Reading the answers from a file is + much slower than copying them from an ETS table. But if the + available RAM memory is scarce, setting the <seealso marker="#max_list_size">limit</seealso> to some low value is an alternative.</p> - <p>There is an option <c>cache_all</c> that can be set to + <p>Option <c>cache_all</c> can be set to <c>ets</c> or <c>list</c> when evaluating a query. It adds a <c>cache</c> or <c>{cache, list}</c> option to every list expression except QLC tables and lists on all levels of the query. This can be used for testing if caching would improve - efficiency at all. If the answer is yes further testing is - needed to pinpoint the generators that should be cached.</p> - + efficiency at all. 
If the answer is yes, further testing is + needed to pinpoint the generators that are to be cached.</p> </section> - <section><title>Implementing a QLC table</title> - - <p><marker id="implementing_a_qlc_table"></marker>As an example of - how to use the <seealso marker="#q">qlc:table/2</seealso> - function the implementation of a QLC table for the <seealso - marker="gb_trees">gb_trees</seealso> module is given:</p> + <section> + <marker id="implementing_a_qlc_table"></marker> + <title>Implementing a QLC Table</title> + <p>As an example of + how to use function <seealso marker="#table/2"><c>table/2</c></seealso>, + the implementation of a QLC table for the <seealso + marker="gb_trees"><c>gb_trees</c></seealso> module is given:</p> - <code type="none"><![CDATA[ --module(gb_table). + <code type="none"> +<![CDATA[-module(gb_table). -export([table/1]). @@ -486,65 +514,64 @@ gb_iter(I0, N, EFun) -> <p><c>TF</c> is the traversal function. The <c>qlc</c> module requires that there is a way of traversing all objects of the - data structure; in <c>gb_trees</c> there is an iterator function - suitable for that purpose. Note that for each object returned a + data structure. <c>gb_trees</c> has an iterator function + suitable for that purpose. Notice that for each object returned, a new fun is created. As long as the list is not terminated by - <c>[]</c> it is assumed that the tail of the list is a nullary + <c>[]</c>, it is assumed that the tail of the list is a nullary function and that calling the function returns further objects (and functions).</p> <p>The lookup function is optional. It is assumed that the lookup function always finds values much faster than it would take to traverse the table. The first argument is the position of the - key. Since <c>qlc_next</c> returns the objects as - {Key, Value} pairs the position is 1. Note that the lookup - function should return {Key, Value} pairs, just as the - traversal function does.</p> + key. 
As <c>qlc_next/1</c> returns the objects as <c>{Key, Value}</c> + pairs, the position is 1. Notice that the lookup function is to return + <c>{Key, Value}</c> pairs, as the traversal function does.</p> <p>The format function is also optional. It is called by - <c>qlc:info</c> to give feedback at runtime of how the query - will be evaluated. One should try to give as good feedback as - possible without showing too much details. In the example at - most 7 objects of the table are shown. The format function + <seealso marker="#info/1"><c>info/1,2</c></seealso> + to give feedback at runtime of how the query + is to be evaluated. Try to give as good feedback as + possible without showing too much details. In the example, at + most seven objects of the table are shown. The format function handles two cases: <c>all</c> means that all objects of the - table will be traversed; <c>{lookup, 1, KeyValues}</c> - means that the lookup function will be used for looking up key + table are traversed; <c>{lookup, 1, KeyValues}</c> + means that the lookup function is used for looking up key values.</p> - <p>Whether the whole table will be traversed or just some keys - looked up depends on how the query is stated. If the query has + <p>Whether the whole table is traversed or only some keys + looked up depends on how the query is expressed. If the query has the form</p> - <code type="none"><![CDATA[ -qlc:q([T || P <- LE, F])]]></code> + <code type="none"> +<![CDATA[qlc:q([T || P <- LE, F])]]></code> - <p>and P is a tuple, the <c>qlc</c> module analyzes P and F in - compile time to find positions of the tuple P that are tested + <p>and <c>P</c> is a tuple, the <c>qlc</c> module analyzes + <c>P</c> and <c>F</c> in + compile time to find positions of tuple <c>P</c> that are tested for equality to constants. If such a position at runtime turns out to be the key position, the lookup function can be used, - otherwise all objects of the table have to be traversed. 
It is - the info function <c>InfoFun</c> that returns the key position. + otherwise all objects of the table must be traversed. + The info function <c>InfoFun</c> returns the key position. There can be indexed positions as well, also returned by the info function. An index is an extra table that makes lookup on - some position fast. Mnesia maintains indices upon request, - thereby introducing so called secondary keys. The <c>qlc</c> + some position fast. Mnesia maintains indexes upon request, + and introduces so called secondary keys. The <c>qlc</c> module prefers to look up objects using the key before secondary keys regardless of the number of constants to look up.</p> - </section> - <section><title>Key equality</title> - - <p>In Erlang there are two operators for testing term equality, - namely <c>==/2</c> and <c>=:=/2</c>. The difference between them - is all about the integers that can be represented by floats. For - instance, <c>2 == 2.0</c> evaluates to + <section> + <title>Key Equality</title> + <p>Erlang/OTP has two operators for testing term equality: <c>==/2</c> + and <c>=:=/2</c>. The difference is all about the integers that can be + represented by floats. For example, <c>2 == 2.0</c> evaluates to <c>true</c> while <c>2 =:= 2.0</c> evaluates to <c>false</c>. Normally this is a minor issue, but the <c>qlc</c> module cannot ignore the difference, which affects the user's choice of operators in QLCs.</p> - <p>If the <c>qlc</c> module can find out at compile time that some + <p>If the <c>qlc</c> module at compile time can determine that some constant is free of integers, it does not matter which one of <c>==/2</c> or <c>=:=/2</c> is used:</p> @@ -560,16 +587,16 @@ ets:match_spec_run(lists:flatmap(fun(V) -> [a,2.71]), ets:match_spec_compile([{{'$1'},[],['$1']}]))</pre> - <p>In the example the <c>==/2</c> operator has been handled - exactly as <c>=:=/2</c> would have been handled. 
On the other - hand, if it cannot be determined at compile time that some - constant is free of integers and the table uses <c>=:=/2</c> - when comparing keys for equality (see the option <seealso - marker="#key_equality">key_equality</seealso>), the - <c>qlc</c> module will not try to look up the constant. The + <p>In the example, operator <c>==/2</c> has been handled + exactly as <c>=:=/2</c> would have been handled. However, + if it cannot be determined at compile time that some + constant is free of integers, and the table uses <c>=:=/2</c> + when comparing keys for equality (see option <seealso + marker="#key_equality">key_equality</seealso>), then the + <c>qlc</c> module does not try to look up the constant. The reason is that there is in the general case no upper limit on the number of key values that can compare equal to such a - constant; every combination of integers and floats has to be + constant; every combination of integers and floats must be looked up:</p> <pre> @@ -586,11 +613,11 @@ ets:table(53264, 3> <input>lists:sort(qlc:e(Q2)).</input> [a,b,c]</pre> - <p>Looking up just <c>{2,2}</c> would not return <c>b</c> and + <p>Looking up only <c>{2,2}</c> would not return <c>b</c> and <c>c</c>.</p> <p>If the table uses <c>==/2</c> when comparing keys for equality, - the <c>qlc</c> module will look up the constant regardless of + the <c>qlc</c> module looks up the constant regardless of which operator is used in the QLC. 
However, <c>==/2</c> is to be preferred:</p> @@ -608,19 +635,18 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), [b]</pre> <p>Lookup join is handled analogously to lookup of constants in a - table: if the join operator is <c>==/2</c> and the table where + table: if the join operator is <c>==/2</c>, and the table where constants are to be looked up uses <c>=:=/2</c> when testing - keys for equality, the <c>qlc</c> module will not consider + keys for equality, then the <c>qlc</c> module does not consider lookup join for that table.</p> - </section> <datatypes> <datatype> <name name="abstract_expr"></name> - <desc><p>Parse trees for Erlang expression, see the <seealso - marker="erts:absform">abstract format</seealso> - documentation in the ERTS User's Guide.</p></desc> + <desc><p>Parse trees for Erlang expression, see section <seealso + marker="erts:absform">The Abstract Format</seealso> + in the ERTS User's Guide.</p></desc> </datatype> <datatype> <name name="answer"></name> @@ -633,14 +659,14 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), </datatype> <datatype> <name name="match_expression"></name> - <desc><p>Match specification, see the <seealso - marker="erts:match_spec">match specification</seealso> - documentation in the ERTS User's Guide and <seealso - marker="ms_transform">ms_transform(3).</seealso></p></desc> + <desc><p>Match specification, see section <seealso + marker="erts:match_spec">Match Specifications in Erlang</seealso> + in the ERTS User's Guide and <seealso + marker="ms_transform"><c>ms_transform(3)</c></seealso>.</p></desc> </datatype> <datatype> <name name="no_files"></name> - <desc><p>Actually an integer > 1.</p></desc> + <desc><p>An integer > 1.</p></desc> </datatype> <datatype> <name name="key_pos"></name> @@ -671,7 +697,7 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), <name name="query_list_comprehension"></name> <desc><p>A literal <seealso marker="#query_list_comprehension">query - list comprehension</seealso>.</p></desc> + list 
comprehension</seealso>.</p></desc> </datatype> <datatype> <name name="spawn_options"></name> @@ -682,7 +708,7 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), <datatype> <name name="sort_option"></name> <desc><p>See <seealso - marker="file_sorter">file_sorter(3)</seealso>.</p></desc> + marker="file_sorter"><c>file_sorter(3)</c></seealso>.</p></desc> </datatype> <datatype> <name name="tmp_directory"></name> @@ -693,15 +719,14 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), </datatypes> <funcs> - <func> <name name="append" arity="1"/> <fsummary>Return a query handle.</fsummary> <desc> - <p>Returns a query handle. When evaluating the query handle - <c><anno>QH</anno></c> all answers to the first query handle in - <c><anno>QHL</anno></c> are returned followed by all answers - to the rest of the query handles in <c><anno>QHL</anno></c>.</p> + <p>Returns a query handle. When evaluating query handle + <c><anno>QH</anno></c>, all answers to the first query handle in + <c><anno>QHL</anno></c> are returned, followed by all answers + to the remaining query handles in <c><anno>QHL</anno></c>.</p> </desc> </func> @@ -709,11 +734,10 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), <name name="append" arity="2"/> <fsummary>Return a query handle.</fsummary> <desc> - <p>Returns a query handle. When evaluating the query handle - <c><anno>QH3</anno></c> all answers to - <c><anno>QH1</anno></c> are returned followed by all answers + <p>Returns a query handle. 
When evaluating query handle + <c><anno>QH3</anno></c>, all answers to + <c><anno>QH1</anno></c> are returned, followed by all answers to <c><anno>QH2</anno></c>.</p> - <p><c>append(QH1, QH2)</c> is equivalent to <c>append([QH1, QH2])</c>.</p> </desc> @@ -724,15 +748,18 @@ ets:match_spec_run(ets:lookup(86033, {2,2}), <name name="cursor" arity="2"/> <fsummary>Create a query cursor.</fsummary> <desc> - <p><marker id="cursor"></marker>Creates a query cursor and + <p>Creates a query cursor and makes the calling process the owner of the cursor. The - cursor is to be used as argument to <c>next_answers/1,2</c> - and (eventually) <c>delete_cursor/1</c>. Calls - <c>erlang:spawn_opt</c> to spawn and link a process which - will evaluate the query handle. The value of the option + cursor is to be used as argument to + <seealso marker="#next_answers/1"> + <c>next_answers/1,2</c></seealso> and (eventually) + <seealso marker="#delete_cursor/1"><c>delete_cursor/1</c></seealso>. + Calls <seealso marker="erts:erlang#spawn_opt/2"> + <c>erlang:spawn_opt/2</c></seealso> to spawn and link to + a process that evaluates the query handle. The value of option <c>spawn_options</c> is used as last argument when calling - <c>spawn_opt</c>. The default value is <c>[link]</c>.</p> - + <c>spawn_opt/2</c>. 
Defaults to <c>[link]</c>.</p> + <p><em>Example:</em></p> <pre> 1> <input>QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),</input> <input>QC = qlc:cursor(QH),</input> @@ -759,15 +786,15 @@ ok</pre> </func> <func> - <name name="eval" arity="1"/> - <name name="eval" arity="2"/> <name name="e" arity="1"/> <name name="e" arity="2"/> + <name name="eval" arity="1"/> + <name name="eval" arity="2"/> <fsummary>Return all answers to a query.</fsummary> <desc> - <p><marker id="eval"></marker>Evaluates a query handle in the + <p>Evaluates a query handle in the calling process and collects all answers in a list.</p> - + <p><em>Example:</em></p> <pre> 1> <input>QH = qlc:q([{X,Y} || X <- [a,b], Y <- [1,2]]),</input> <input>qlc:eval(QH).</input> @@ -786,11 +813,11 @@ ok</pre> the query handle together with an extra argument <c><anno>AccIn</anno></c>. The query handle and the function are evaluated in the calling process. - <c><anno>Function</anno></c> must return a new accumulator + <c><anno>Function</anno></c> must return a new accumulator, which is passed to the next call. <c><anno>Acc0</anno></c> is returned if there are no answers to the query handle.</p> - + <p><em>Example:</em></p> <pre> 1> <input>QH = [1,2,3,4,5,6],</input> <input>qlc:fold(fun(X, Sum) -> X + Sum end, 0, QH).</input> @@ -818,30 +845,46 @@ ok</pre> <name name="info" arity="2"/> <fsummary>Return code describing a query handle.</fsummary> <desc> - <p><marker id="info"></marker>Returns information about a + <p>Returns information about a query handle. The information describes the simplifications and optimizations that are the results of preparing the - query for evaluation. This function is probably useful - mostly during debugging.</p> - + query for evaluation. This function is probably mainly useful + during debugging.</p> <p>The information has the form of an Erlang expression where QLCs most likely occur. 
Depending on the format functions of - mentioned QLC tables it may not be absolutely accurate.</p> - - <p>The default is to return a sequence of QLCs in a block, but - if the option <c>{flat, false}</c> is given, one single - QLC is returned. The default is to return a string, but if - the option <c>{format, abstract_code}</c> is given, - abstract code is returned instead. In the abstract code - port identifiers, references, and pids are represented by - strings. The default is to return - all elements in lists, but if the - <c>{n_elements, NElements}</c> option is given, only a - limited number of elements are returned. The default is to - show all of objects and match specifications, but if the - <c>{depth, Depth}</c> option is given, parts of terms - below a certain depth are replaced by <c>'...'</c>.</p> - + mentioned QLC tables, it is not certain that the information + is absolutely accurate.</p> + <p>Options:</p> + <list type="bulleted"> + <item> + <p>The default is to return a sequence of QLCs in a block, but + if option <c>{flat, false}</c> is specified, one single + QLC is returned.</p> + </item> + <item> + <p>The default is to return a string, but if + option <c>{format, abstract_code}</c> is specified, + abstract code is returned instead. 
In the abstract code, + port identifiers, references, and pids are represented by + strings.</p> + </item> + <item> + <p>The default is to return all elements in lists, but if + option <c>{n_elements, NElements}</c> is specified, only + a limited number of elements are returned.</p> + </item> + <item> + <p>The default is to show all parts of + objects and match specifications, + but if option <c>{depth, Depth}</c> is specified, parts + of terms below a certain depth are replaced by <c>'...'</c>.</p> + </item> + </list> + <p><c>info(<anno>QH</anno>)</c> is equivalent to + <c>info(<anno>QH</anno>, [])</c>.</p> + <p><em>Examples:</em></p> + <p>In the following example two simple QLCs are inserted only to + hold option <c>{unique, true}</c>:</p> <pre> 1> <input>QH = qlc:q([{X,Y} || X <- [x,y], Y <- [a,b]]),</input> <input>io:format("~s~n", [qlc:info(QH, unique_all)]).</input> @@ -865,10 +908,11 @@ begin ], [{unique,true}]) end</pre> - - <p>In this example two simple QLCs have been inserted just to - hold the <c>{unique, true}</c> option.</p> - + <p>In the following example QLC <c>V2</c> has + been inserted to show the joined generators and the join + method chosen. A convention is used for lookup join: the + first generator (<c>G2</c>) is the one traversed, the second + (<c>G1</c>) is the table where constants are looked up.</p> <pre> 1> <input>E1 = ets:new(e1, []),</input> <input>E2 = ets:new(e2, []),</input> @@ -898,15 +942,6 @@ begin [{X,Z}|{W,Y}] <- V2 ]) end</pre> - - <p>In this example the query list comprehension <c>V2</c> has - been inserted to show the joined generators and the join - method chosen. 
A convention is used for lookup join: the - first generator (<c>G2</c>) is the one traversed, the second - one (<c>G1</c>) is the table where constants are looked up.</p> - - <p><c>info(<anno>QH</anno>)</c> is equivalent to - <c>info(<anno>QH</anno>, [])</c>.</p> </desc> </func> @@ -915,18 +950,16 @@ end</pre> <name name="keysort" arity="3"/> <fsummary>Return a query handle.</fsummary> <desc> - <p>Returns a query handle. When evaluating the query handle - <c><anno>QH2</anno></c> the answers to the query handle + <p>Returns a query handle. When evaluating query handle + <c><anno>QH2</anno></c>, the answers to query handle <c><anno>QH1</anno></c> are sorted by <seealso - marker="file_sorter">file_sorter:keysort/4</seealso> + marker="file_sorter#keysort/4"><c>file_sorter:keysort/4</c></seealso> according to the options.</p> - - <p>The sorter will use temporary files only if + <p>The sorter uses temporary files only if <c><anno>QH1</anno></c> does not evaluate to a list and the size of the binary representation of the answers exceeds - <c>Size</c> bytes, where <c>Size</c> is the value of the - <c>size</c> option.</p> - + <c>Size</c> bytes, where <c>Size</c> is the value of option + <c>size</c>.</p> <p><c>keysort(<anno>KeyPos</anno>, <anno>QH1</anno>)</c> is equivalent to <c>keysort(<anno>KeyPos</anno>, <anno>QH1</anno>, [])</c>.</p> @@ -941,10 +974,10 @@ end</pre> <p>Returns some or all of the remaining answers to a query cursor. Only the owner of <c><anno>QueryCursor</anno></c> can retrieve answers.</p> - <p>The optional argument <c>NumberOfAnswers</c>determines the - maximum number of answers returned. The default value is + <p>Optional argument <c>NumberOfAnswers</c> determines the + maximum number of answers returned. Defaults to <c>10</c>. 
If less than the requested number of answers is - returned, subsequent calls to <c>next_answers</c> will + returned, subsequent calls to <c>next_answers</c> return <c>[]</c>.</p> </desc> </func> @@ -954,92 +987,87 @@ end</pre> <name name="q" arity="2"/> <fsummary>Return a handle for a query list comprehension.</fsummary> <desc> - <p><marker id="q"></marker>Returns a query handle for a query - list comprehension. The query list comprehension must be the - first argument to <c>qlc:q/1,2</c> or it will be evaluated - as an ordinary list comprehension. It is also necessary to - add the line</p> - + <p>Returns a query handle for a QLC. + The QLC must be the first argument to this function, otherwise + it is evaluated as an ordinary list comprehension. It is also + necessary to add the following line to the source code:</p> <code type="none"> -include_lib("stdlib/include/qlc.hrl").</code> - - <p>to the source file. This causes a parse transform to - substitute a fun for the query list comprehension. The - (compiled) fun will be called when the query handle is - evaluated.</p> - - <p>When calling <c>qlc:q/1,2</c> from the Erlang shell the - parse transform is automatically called. When this happens - the fun substituted for the query list comprehension is not - compiled but will be evaluated by <c>erl_eval(3)</c>. This - is also true when expressions are evaluated by means of + <p>This causes a parse transform to substitute a fun for the QLC. The + (compiled) fun is called when the query handle is evaluated.</p> + <p>When calling <c>qlc:q/1,2</c> from the Erlang shell, the + parse transform is automatically called. When this occurs, the fun + substituted for the QLC is not compiled but is evaluated by + <seealso marker="erl_eval"><c>erl_eval(3)</c></seealso>. This + is also true when expressions are evaluated by <c>file:eval/1,2</c> or in the debugger.</p> - - <p>To be very explicit, this will not work:</p> - + <p>To be explicit, this does not work:</p> <pre> ... 
A = [X || {X} <- [{1},{2}]], QH = qlc:q(A), ...</pre> - - <p>The variable <c>A</c> will be bound to the evaluated value + <p>Variable <c>A</c> is bound to the evaluated value of the list comprehension (<c>[1,2]</c>). The compiler complains with an error message ("argument is not a query list comprehension"); the shell process stops with a <c>badarg</c> reason.</p> - <p><c>q(<anno>QLC</anno>)</c> is equivalent to <c>q(<anno>QLC</anno>, [])</c>.</p> - - <p>The <c>{cache, ets}</c> option can be used to cache - the answers to a query list comprehension. The answers are - stored in one ETS table for each cached query list - comprehension. When a cached query list comprehension is - evaluated again, answers are fetched from the table without - any further computations. As a consequence, when all answers - to a cached query list comprehension have been found, the - ETS tables used for caching answers to the query list - comprehension's qualifiers can be emptied. The option - <c>cache</c> is equivalent to <c>{cache, ets}</c>.</p> - - <p>The <c>{cache, list}</c> option can be used to cache - the answers to a query list comprehension just like - <c>{cache, ets}</c>. The difference is that the answers - are kept in a list (on the process heap). If the answers - would occupy more than a certain amount of RAM memory a - temporary file is used for storing the answers. The option - <c>max_list_size</c> sets the limit in bytes and the temporary - file is put on the directory set by the <c>tmpdir</c> option.</p> - - <p>The <c>cache</c> option has no effect if it is known that - the query list comprehension will be evaluated at most once. - This is always true for the top-most query list - comprehension and also for the list expression of the first - generator in a list of qualifiers. 
Note that in the presence - of side effects in filters or callback functions the answers - to query list comprehensions can be affected by the - <c>cache</c> option.</p> - - <p>The <c>{unique, true}</c> option can be used to remove - duplicate answers to a query list comprehension. The unique - answers are stored in one ETS table for each query list - comprehension. The table is emptied every time it is known - that there are no more answers to the query list - comprehension. The option <c>unique</c> is equivalent to - <c>{unique, true}</c>. If the <c>unique</c> option is - combined with the <c>{cache, ets}</c> option, two ETS - tables are used, but the full answers are stored in one - table only. If the <c>unique</c> option is combined with the - <c>{cache, list}</c> option the answers are sorted - twice using <c>keysort/3</c>; once to remove duplicates, and - once to restore the order.</p> - - <p>The <c>cache</c> and <c>unique</c> options apply not only - to the query list comprehension itself but also to the - results of looking up constants, running match - specifications, and joining handles. </p> - + <p>Options:</p> + <list type="bulleted"> + <item> + <p>Option <c>{cache, ets}</c> can be used to cache + the answers to a QLC. The answers are stored in one ETS + table for each cached QLC. When a cached QLC is + evaluated again, answers are fetched from the table without + any further computations. Therefore, when all answers to a + cached QLC have been found, the ETS tables used for + caching answers to the qualifiers of the QLC can be emptied. + Option <c>cache</c> is equivalent to <c>{cache, ets}</c>.</p> + </item> + <item> + <p>Option <c>{cache, list}</c> can be used to cache + the answers to a QLC like + <c>{cache, ets}</c>. The difference is that the answers + are kept in a list (on the process heap). If the answers + would occupy more than a certain amount of RAM memory, a + temporary file is used for storing the answers. 
Option + <c>max_list_size</c> sets the limit in bytes and the temporary + file is put on the directory set by option <c>tmpdir</c>.</p> + <p>Option <c>cache</c> has no effect if it is known that + the QLC is to be evaluated at most once. + This is always true for the top-most QLC + and also for the list expression of the first + generator in a list of qualifiers. Notice that in the presence + of side effects in filters or callback functions, the answers + to QLCs can be affected by option <c>cache</c>.</p> + </item> + <item> + <p>Option <c>{unique, true}</c> can be used to remove + duplicate answers to a QLC. The unique + answers are stored in one ETS table for each QLC. + The table is emptied every time it is known + that there are no more answers to the QLC. + Option <c>unique</c> is equivalent to + <c>{unique, true}</c>. If option <c>unique</c> is + combined with option <c>{cache, ets}</c>, two ETS + tables are used, but the full answers are stored in one + table only. If option <c>unique</c> is combined with option + <c>{cache, list}</c>, the answers are sorted + twice using + <seealso marker="#keysort/3"><c>keysort/3</c></seealso>; + once to remove duplicates and once to restore the order.</p> + </item> + </list> + <p>Options <c>cache</c> and <c>unique</c> apply not only + to the QLC itself but also to the results of looking up constants, + running match specifications, and joining handles.</p> + <p><em>Example:</em></p> + <p>In the following example the cached results of the merge join are + traversed for each value of <c>A</c>. Notice that without option + <c>cache</c> the join would have been carried out + three times, once for each value of <c>A</c>.</p> <pre> 1> <input>Q = qlc:q([{A,X,Z,W} ||</input> <input>A <- [a,b,c],</input> @@ -1076,29 +1104,31 @@ begin X =:= Y ]) end</pre> - - <p>In this example the cached results of the merge join are - traversed for each value of <c>A</c>. 
Note that without the - <c>cache</c> option the join would have been carried out - three times, once for each value of <c>A</c></p> - - <p><c>sort/1,2</c> and <c>keysort/2,3</c> can also be used for + <p><seealso marker="#sort/1"><c>sort/1,2</c></seealso> and + <seealso marker="#keysort/2"><c>keysort/2,3</c></seealso> + can also be used for caching answers and for removing duplicates. When sorting answers are cached in a list, possibly stored on a temporary file, and no ETS tables are used.</p> - <p>Sometimes (see <seealso - marker="#lookup_fun">qlc:table/2</seealso> below) traversal + marker="#table/2"><c>table/2</c></seealso>) traversal of tables can be done by looking up key values, which is - assumed to be fast. Under certain (rare) circumstances it - could happen that there are too many key values to look up. - <marker id="max_lookup"></marker> The - <c>{max_lookup, MaxLookup}</c> option can then be used + assumed to be fast. Under certain (rare) circumstances + there can be too many key values to look up. + <marker id="max_lookup"></marker> + Option <c>{max_lookup, MaxLookup}</c> can then be used to limit the number of lookups: if more than - <c>MaxLookup</c> lookups would be required no lookups are - done but the table traversed instead. The default value is - <c>infinity</c> which means that there is no limit on the + <c>MaxLookup</c> lookups would be required, no lookups are + done but the table is traversed instead. Defaults to + <c>infinity</c>, which means that there is no limit on the number of keys to look up.</p> + <p><em>Example:</em></p> + <p>In the following example, using the <c>gb_table</c> module from + section <seealso marker="#implementing_a_qlc_table">Implementing a + QLC Table</seealso>, there are six keys to look up: + <c>{1,a}</c>, <c>{1,b}</c>, <c>{1,c}</c>, <c>{2,a}</c>, + <c>{2,b}</c>, and <c>{2,c}</c>. 
The reason is that the two + elements of key <c>{X, Y}</c> are compared separately.</p> <pre> 1> <input>T = gb_trees:empty(),</input> <input>QH = qlc:q([X || {{X,Y},_} <- gb_table:table(T),</input> @@ -1119,39 +1149,41 @@ ets:match_spec_run( end, [{1,a},{1,b},{1,c},{2,a},{2,b},{2,c}]), ets:match_spec_compile([{{{'$1','$2'},'_'},[],['$1']}]))</pre> - - <p>In this example using the <c>gb_table</c> module from the - <seealso marker="#implementing_a_qlc_table">Implementing a - QLC table</seealso> section there are six keys to look up: - <c>{1,a}</c>, <c>{1,b}</c>, <c>{1,c}</c>, <c>{2,a}</c>, - <c>{2,b}</c>, and <c>{2,c}</c>. The reason is that the two - elements of the key {X, Y} are compared separately.</p> - - <p>The <c>{lookup, true}</c> option can be used to ensure - that the <c>qlc</c> module will look up constants in some - QLC table. If there - are more than one QLC table among the generators' list - expressions, constants have to be looked up in at least one - of the tables. The evaluation of the query fails if there - are no constants to look up. This option is useful in - situations when it would be unacceptable to traverse all - objects in some table. Setting the <c>lookup</c> option to - <c>false</c> ensures that no constants will be looked up - (<c>{max_lookup, 0}</c> has the same effect). The - default value is <c>any</c> which means that constants will - be looked up whenever possible.</p> - - <p>The <c>{join, Join}</c> option can be used to ensure - that a certain join method will be used: - <c>{join, lookup}</c> invokes the lookup join method; - <c>{join, merge}</c> invokes the merge join method; and - <c>{join, nested_loop}</c> invokes the method of - matching every pair of objects from two handles. The last - method is mostly very slow. The evaluation of the query - fails if the <c>qlc</c> module cannot carry out the chosen - join method. 
The - default value is <c>any</c> which means that some fast join - method will be used if possible.</p> + <p>Options:</p> + <list type="bulleted"> + <item> + <p>Option <c>{lookup, true}</c> can be used to ensure + that the <c>qlc</c> module looks up constants in some + QLC table. If there are more than one QLC table among the + list expressions of the generators, + constants must be looked up in at least one + of the tables. The evaluation of the query fails if there + are no constants to look up. This option is useful + when it would be unacceptable to traverse all + objects in some table. Setting option <c>lookup</c> to + <c>false</c> ensures that no constants are looked up + (<c>{max_lookup, 0}</c> has the same effect). + Defaults to <c>any</c>, which means that constants are + looked up whenever possible.</p> + </item> + <item> + <p>Option <c>{join, Join}</c> can be used to ensure + that a certain join method is used:</p> + <list type="bulleted"> + <item><c>{join, lookup}</c> invokes the lookup join + method.</item> + <item><c>{join, merge}</c> invokes the merge join + method.</item> + <item><c>{join, nested_loop}</c> invokes the method of + matching every pair of objects from two handles. This + method is mostly very slow.</item> + </list> + <p>The evaluation of the query fails if the <c>qlc</c> module + cannot carry out the chosen join method. Defaults to + <c>any</c>, which means that some fast join + method is used if possible.</p> + </item> + </list> </desc> </func> @@ -1160,21 +1192,18 @@ ets:match_spec_run( <name name="sort" arity="2"/> <fsummary>Return a query handle.</fsummary> <desc> - <p>Returns a query handle. When evaluating the query handle - <c><anno>QH2</anno></c> the answers to the query handle + <p>Returns a query handle. 
When evaluating query handle + <c><anno>QH2</anno></c>, the answers to query handle <c><anno>QH1</anno></c> are sorted by <seealso - marker="file_sorter">file_sorter:sort/3</seealso> according - to the options.</p> - - <p>The sorter will use temporary files only if + marker="file_sorter#sort/3"><c>file_sorter:sort/3</c></seealso> + according to the options.</p> + <p>The sorter uses temporary files only if <c><anno>QH1</anno></c> does not evaluate to a list and the size of the binary representation of the answers exceeds - <c>Size</c> bytes, where <c>Size</c> is the value of the - <c>size</c> option.</p> - + <c>Size</c> bytes, where <c>Size</c> is the value of option + <c>size</c>.</p> <p><c>sort(<anno>QH1</anno>)</c> is equivalent to <c>sort(<anno>QH1</anno>, [])</c>.</p> - </desc> </func> @@ -1184,31 +1213,27 @@ ets:match_spec_run( <name name="string_to_handle" arity="3"/> <fsummary>Return a handle for a query list comprehension.</fsummary> <desc> - <p>A string version of <c>qlc:q/1,2</c>. When the query handle - is evaluated the fun created by the parse transform is - interpreted by <c>erl_eval(3)</c>. The query string is to be - one single query list comprehension terminated by a - period.</p> - + <p>A string version of <seealso marker="#q/1"><c>q/1,2</c></seealso>. + When the query handle is evaluated, the fun created by the parse + transform is interpreted by + <seealso marker="erl_eval"><c>erl_eval(3)</c></seealso>. 
+ The query string is to be one single QLC terminated by a period.</p> + <p><em>Example:</em></p> <pre> 1> <input>L = [1,2,3],</input> <input>Bs = erl_eval:add_binding('L', L, erl_eval:new_bindings()),</input> <input>QH = qlc:string_to_handle("[X+1 || X <- L].", [], Bs),</input> <input>qlc:eval(QH).</input> [2,3,4]</pre> - <p><c>string_to_handle(<anno>QueryString</anno>)</c> is equivalent to <c>string_to_handle(<anno>QueryString</anno>, [])</c>.</p> - <p><c>string_to_handle(<anno>QueryString</anno>, - <anno>Options</anno>)</c> - is equivalent to + <anno>Options</anno>)</c> is equivalent to <c>string_to_handle(<anno>QueryString</anno>, <anno>Options</anno>, erl_eval:new_bindings())</c>.</p> - - <p>This function is probably useful mostly when called from - outside of Erlang, for instance from a driver written in C.</p> + <p>This function is probably mainly useful when called from + outside of Erlang, for example from a driver written in C.</p> </desc> </func> @@ -1216,199 +1241,222 @@ ets:match_spec_run( <name name="table" arity="2"/> <fsummary>Return a query handle for a table.</fsummary> <desc> - <p><marker id="table"></marker>Returns a query handle for a - QLC table. In Erlang/OTP there is support for ETS, Dets and - Mnesia tables, but it is also possible to turn many other - data structures into QLC tables. The way to accomplish this - is to let function(s) in the module implementing the data - structure create a query handle by calling - <c>qlc:table/2</c>. The different ways to traverse the table - as well as properties of the table are handled by callback + <p>Returns a query handle for a QLC table. + In Erlang/OTP there is support for ETS, Dets, and + Mnesia tables, but many other data structures can be turned + into QLC tables. This is accomplished by letting function(s) in the + module implementing the data structure create a query handle by + calling <c>qlc:table/2</c>. 
The different ways to traverse the table + and properties of the table are handled by callback functions provided as options to <c>qlc:table/2</c>.</p> - - <p>The callback function <c><anno>TraverseFun</anno></c> is - used for traversing the table. It is to return a list of - objects terminated by either <c>[]</c> or a nullary fun to - be used for traversing the not yet traversed objects of the - table. Any other return value is immediately returned as - value of the query evaluation. Unary - <c><anno>TraverseFun</anno></c>s are to accept a match - specification as argument. The match specification is - created by the parse transform by analyzing the pattern of - the generator calling <c>qlc:table/2</c> and filters using - variables introduced in the pattern. If the parse transform - cannot find a match specification equivalent to the pattern - and filters, <c><anno>TraverseFun</anno></c> will be called - with a match specification returning every object. Modules - that can utilize match specifications for optimized - traversal of tables should call <c>qlc:table/2</c> with a - unary - <c><anno>TraverseFun</anno></c> while other modules can - provide a nullary - <c><anno>TraverseFun</anno></c>. <c>ets:table/2</c> is an - example of the former; <c>gb_table:table/1</c> in the - <seealso marker="#implementing_a_qlc_table">Implementing a - QLC table</seealso> section is an example of the latter.</p> - - <p><c><anno>PreFun</anno></c> is a unary callback function - that is called once before the table is read for the first - time. If the call fails, the query evaluation fails. - Similarly, the nullary callback function - <c><anno>PostFun</anno></c> is called once after the table - was last read. The return value, which is caught, is - ignored. If <c><anno>PreFun</anno></c> has been called for a - table, - <c><anno>PostFun</anno></c> is guaranteed to be called for - that table, even if the evaluation of the query fails for - some reason. 
The order in which pre (post) functions for - different tables are evaluated is not specified. Other table - access than reading, such as calling - <c><anno>InfoFun</anno></c>, is assumed to be OK at any - time. The argument <c><anno>PreArgs</anno></c> is a list of - tagged values. Currently there are two tags, - <c>parent_value</c> and <c>stop_fun</c>, used by Mnesia for - managing transactions. The value of <c>parent_value</c> is - the value returned by <c><anno>ParentFun</anno></c>, or - <c>undefined</c> if there is no <c>ParentFun</c>. - <c><anno>ParentFun</anno></c> is called once just before the - call of - <c><anno>PreFun</anno></c> in the context of the process - calling - <c>eval</c>, <c>fold</c>, or - <c>cursor</c>. The value of <c>stop_fun</c> is a nullary fun - that deletes the cursor if called from the parent, or - <c>undefined</c> if there is no cursor.</p> - - <p><marker id="lookup_fun"></marker>The binary callback - function <c><anno>LookupFun</anno></c> is used for looking - up objects in the table. The first argument - <c><anno>Position</anno></c> is the key position or an - indexed position and the second argument - <c><anno>Keys</anno></c> is a sorted list of unique values. - The return value is to be a list of all objects (tuples) - such that the element at <c>Position</c> is a member of - <c><anno>Keys</anno></c>. Any other return value is - immediately returned as value of the query evaluation. - <c><anno>LookupFun</anno></c> is called instead of - traversing the table if the parse transform at compile time - can find out that the filters match and compare the element - at <c><anno>Position</anno></c> in such a way that only - <c><anno>Keys</anno></c> need to be looked up in order to - find all potential answers. The key position is obtained by - calling - <c><anno>InfoFun</anno>(keypos)</c> and the indexed - positions by calling - <c><anno>InfoFun</anno>(indices)</c>. 
If the key position - can be used for lookup it is always chosen, otherwise the - indexed position requiring the least number of lookups is - chosen. If there is a tie between two indexed positions the - one occurring first in the list returned by - <c><anno>InfoFun</anno></c> is chosen. Positions requiring - more than <seealso marker="#max_lookup">max_lookup</seealso> - lookups are ignored.</p> - - <p>The unary callback function <c><anno>InfoFun</anno></c> is - to return information about the table. <c>undefined</c> - should be returned if the value of some tag is unknown:</p> - <list type="bulleted"> - <item><c>indices</c>. Returns a list of indexed - positions, a list of positive integers. - </item> - <item><c>is_unique_objects</c>. Returns <c>true</c> if - the objects returned by <c>TraverseFun</c> are unique. + <item> + <p>Callback function <c><anno>TraverseFun</anno></c> is + used for traversing the table. It is to return a list of + objects terminated by either <c>[]</c> or a nullary fun to + be used for traversing the not yet traversed objects of the + table. Any other return value is immediately returned as + value of the query evaluation. Unary + <c><anno>TraverseFun</anno></c>s are to accept a match + specification as argument. The match specification is + created by the parse transform by analyzing the pattern of + the generator calling <c>qlc:table/2</c> and filters using + variables introduced in the pattern. If the parse transform + cannot find a match specification equivalent to the pattern + and filters, <c><anno>TraverseFun</anno></c> is called + with a match specification returning every object.</p> + <list type="bulleted"> + <item> + <p>Modules that can use match specifications for optimized + traversal of tables are to call <c>qlc:table/2</c> with a unary + <c><anno>TraverseFun</anno></c>.
An example is + <seealso marker="ets#table/2"> + <c>ets:table/2</c></seealso>.</p> + </item> + <item> + <p>Other modules can provide a nullary + <c><anno>TraverseFun</anno></c>. An example is + <c>gb_table:table/1</c> in section + <seealso marker="#implementing_a_qlc_table">Implementing a + QLC Table</seealso>.</p> + </item> + </list> </item> - <item><c>keypos</c>. Returns the position of the table's - key, a positive integer. + <item> + <p>Unary callback function <c><anno>PreFun</anno></c> is + called once before the table is read for the first time. + If the call fails, the query evaluation fails.</p> + <p>Argument <c><anno>PreArgs</anno></c> is a list of tagged values. + There are two tags, <c>parent_value</c> and <c>stop_fun</c>, used + by Mnesia for managing transactions.</p> + <list type="bulleted"> + <item> + <p>The value of <c>parent_value</c> is + the value returned by <c><anno>ParentFun</anno></c>, or + <c>undefined</c> if there is no <c>ParentFun</c>. + <c><anno>ParentFun</anno></c> is called once just before the + call of <c><anno>PreFun</anno></c> in the context of the + process calling + <seealso marker="#eval/1"><c>eval/1,2</c></seealso>, + <seealso marker="#fold/3"><c>fold/3,4</c></seealso>, or + <seealso marker="#cursor/1"><c>cursor/1,2</c></seealso>. + </p> + </item> + <item> + <p>The value of <c>stop_fun</c> is a nullary fun + that deletes the cursor if called from the parent, or + <c>undefined</c> if there is no cursor.</p> + </item> + </list> </item> - <item><c>is_sorted_key</c>. Returns <c>true</c> if - the objects returned by <c>TraverseFun</c> are sorted - on the key. + <item> + <p>Nullary callback function + <c><anno>PostFun</anno></c> is called once after the table + was last read. The return value, which is caught, is ignored. 
+ If <c><anno>PreFun</anno></c> has been called for a table, + <c><anno>PostFun</anno></c> is guaranteed to be called for + that table, even if the evaluation of the query fails for + some reason.</p> + <p>The pre (post) functions for different tables are evaluated in + unspecified order.</p> + <p>Other table access than reading, such as calling + <c><anno>InfoFun</anno></c>, is assumed to be OK at any time.</p> </item> - <item><c>num_of_objects</c>. Returns the number of - objects in the table, a non-negative integer. + <item> + <p><marker id="lookup_fun"></marker>Binary callback + function <c><anno>LookupFun</anno></c> is used for looking + up objects in the table. The first argument + <c><anno>Position</anno></c> is the key position or an + indexed position and the second argument + <c><anno>Keys</anno></c> is a sorted list of unique values. + The return value is to be a list of all objects (tuples), + such that the element at <c>Position</c> is a member of + <c><anno>Keys</anno></c>. Any other return value is + immediately returned as value of the query evaluation. + <c><anno>LookupFun</anno></c> is called instead of + traversing the table if the parse transform at compile time + can determine that the filters match and compare the element + at <c><anno>Position</anno></c> in such a way that only + <c><anno>Keys</anno></c> need to be looked up to + find all potential answers.</p> + <p>The key position is obtained by calling + <c><anno>InfoFun</anno>(keypos)</c> and the indexed + positions by calling + <c><anno>InfoFun</anno>(indices)</c>. If the key position + can be used for lookup, it is always chosen, otherwise the + indexed position requiring the least number of lookups is + chosen. If there is a tie between two indexed positions, the + one occurring first in the list returned by + <c><anno>InfoFun</anno></c> is chosen. 
Positions requiring + more than <seealso marker="#max_lookup">max_lookup</seealso> + lookups are ignored.</p> </item> - </list> - - <p>The unary callback function <c><anno>FormatFun</anno></c> - is used by <seealso marker="#info">qlc:info/1,2</seealso> - for displaying the call that created the table's query - handle. The default value, <c>undefined</c>, means that - <c>info/1,2</c> displays a call to <c>'$MOD':'$FUN'/0</c>. - It is up to <c><anno>FormatFun</anno></c> to present the - selected objects of the table in a suitable way. However, if - a character list is chosen for presentation it must be an - Erlang expression that can be scanned and parsed (a trailing - dot will be added by <c>qlc:info</c> though). - <c><anno>FormatFun</anno></c> is called with an argument - that describes the selected objects based on optimizations - done as a result of analyzing the filters of the QLC where - the call to - <c>qlc:table/2</c> occurs. The possible values of the - argument are:</p> - - <list type="bulleted"> - <item><c>{lookup, Position, Keys, NElements, DepthFun}</c>. - <c>LookupFun</c> is used for looking up objects in the - table. + <item> + <p>Unary callback function <c><anno>InfoFun</anno></c> is + to return information about the table. <c>undefined</c> + is to be returned if the value of some tag is unknown:</p> + <taglist> + <tag><c>indices</c></tag> + <item>Returns a list of indexed positions, a list of positive + integers.</item> + <tag><c>is_unique_objects</c></tag> + <item>Returns <c>true</c> if the objects returned by + <c>TraverseFun</c> are unique. + </item> + <tag><c>keypos</c></tag> + <item>Returns the position of the table key, a positive integer. + </item> + <tag><c>is_sorted_key</c></tag> + <item>Returns <c>true</c> if the objects returned by + <c>TraverseFun</c> are sorted on the key. + </item> + <tag><c>num_of_objects</c></tag> + <item>Returns the number of objects in the table, a non-negative + integer. 
+ </item> + </taglist> </item> - <item><c>{match_spec, MatchExpression}</c>. No way of - finding all possible answers by looking up keys was - found, but the filters could be transformed into a - match specification. All answers are found by calling - <c>TraverseFun(MatchExpression)</c>. + <item> + <p>Unary callback function <c><anno>FormatFun</anno></c> + is used by <seealso marker="#info/1"><c>info/1,2</c></seealso> + for displaying the call that created the query handle of the + table. Defaults to <c>undefined</c>, which means that + <c>info/1,2</c> displays a call to <c>'$MOD':'$FUN'/0</c>. + It is up to <c><anno>FormatFun</anno></c> to present the + selected objects of the table in a suitable way. However, if + a character list is chosen for presentation, it must be an + Erlang expression that can be scanned and parsed (a trailing + dot is added by <c>info/1,2</c> though).</p> + <p><c><anno>FormatFun</anno></c> is called with an argument + that describes the selected objects based on optimizations + done as a result of analyzing the filters of the QLC where + the call to <c>qlc:table/2</c> occurs. The argument can have the + following values:</p> + <taglist> + <tag><c>{lookup, Position, Keys, NElements, DepthFun}</c>.</tag> + <item> + <p><c>LookupFun</c> is used for looking up objects in the + table.</p> + </item> + <tag><c>{match_spec, MatchExpression}</c></tag> + <item> + <p>No way of finding all possible answers by looking up keys + was found, but the filters could be transformed into a + match specification. All answers are found by calling + <c>TraverseFun(MatchExpression)</c>.</p> + </item> + <tag><c>{all, NElements, DepthFun}</c></tag> + <item> + <p>No optimization was found. 
A match specification matching + all objects is used if <c>TraverseFun</c> is unary.</p> + <p><c>NElements</c> is the value of the <c>info/1,2</c> option + <c>n_elements</c>.</p> + <p><c>DepthFun</c> is a function that can be used for + limiting the size of terms; calling + <c>DepthFun(Term)</c> substitutes <c>'...'</c> for + parts of <c>Term</c> below the depth specified by the + <c>info/1,2</c> option <c>depth</c>.</p> + <p>If calling <c><anno>FormatFun</anno></c> with an + argument including <c>NElements</c> and + <c>DepthFun</c> fails, <c><anno>FormatFun</anno></c> + is called once again with an argument excluding + <c>NElements</c> and <c>DepthFun</c> + (<c>{lookup, Position, Keys}</c> or + <c>all</c>).</p> + </item> + </taglist> </item> - <item><c>{all, NElements, DepthFun}</c>. No optimization was - found. A match specification matching all objects will be - used if <c>TraverseFun</c> is unary. + <item><p><marker id="key_equality"></marker>The value of option + <c>key_equality</c> is to be <c>'=:='</c> if the table + considers two keys equal if they match, and to be + <c>'=='</c> if two keys are equal if they compare equal. + Defaults to <c>'=:='</c>.</p> </item> </list> - - <p><c>NElements</c> is the value of the <c>info/1,2</c> option - <c>n_elements</c>, and <c>DepthFun</c> is a function that - can be used for limiting the size of terms; calling - <c>DepthFun(Term)</c> substitutes <c>'...'</c> for parts of - <c>Term</c> below the depth specified by the <c>info/1,2</c> - option <c>depth</c>. 
If calling - <c><anno>FormatFun</anno></c> with an argument including - <c>NElements</c> and <c>DepthFun</c> fails, - <c><anno>FormatFun</anno></c> is called once again with an - argument excluding - <c>NElements</c> and <c>DepthFun</c> - (<c>{lookup, Position, Keys}</c> or - <c>all</c>).</p> - - <p><marker id="key_equality"></marker>The value of - <c>key_equality</c> is to be <c>'=:='</c> if the table - considers two keys equal if they match, and to be - <c>'=='</c> if two keys are equal if they compare equal. The - default is <c>'=:='</c>.</p> - - <p>See <seealso marker="ets#qlc_table">ets(3)</seealso>, - <seealso marker="dets#qlc_table">dets(3)</seealso> and - <seealso marker="mnesia:mnesia#qlc_table">mnesia(3)</seealso> - for the various options recognized by <c>table/1,2</c> in - respective module.</p> + <p>For the various options recognized by <c>table/1,2</c> + in respective module, see + <seealso marker="ets#table/1"><c>ets(3)</c></seealso>, + <seealso marker="dets#table/1"><c>dets(3)</c></seealso>, and + <seealso marker="mnesia:mnesia#table/1"><c>mnesia(3)</c></seealso>. 
+ </p> </desc> </func> - </funcs> <section> <title>See Also</title> - <p><seealso marker="dets">dets(3)</seealso>, + <p><seealso marker="dets"><c>dets(3)</c></seealso>, + <seealso marker="erl_eval"><c>erl_eval(3)</c></seealso>, + <seealso marker="erts:erlang"><c>erlang(3)</c></seealso>, + <seealso marker="kernel:error_logger"><c>error_logger(3)</c></seealso>, + <seealso marker="ets"><c>ets(3)</c></seealso>, + <seealso marker="kernel:file"><c>file(3)</c></seealso>, + <seealso marker="file_sorter"><c>file_sorter(3)</c></seealso>, + <seealso marker="mnesia:mnesia"><c>mnesia(3)</c></seealso>, + <seealso marker="shell"><c>shell(3)</c></seealso>, <seealso marker="doc/reference_manual:users_guide"> - Erlang Reference Manual</seealso>, - <seealso marker="erl_eval">erl_eval(3)</seealso>, - <seealso marker="erts:erlang">erlang(3)</seealso>, - <seealso marker="ets">ets(3)</seealso>, - <seealso marker="kernel:file">file(3)</seealso>, - <seealso marker="error_logger:file">error_logger(3)</seealso>, - <seealso marker="file_sorter">file_sorter(3)</seealso>, - <seealso marker="mnesia:mnesia">mnesia(3)</seealso>, + Erlang Reference Manual</seealso>, <seealso marker="doc/programming_examples:users_guide"> - Programming Examples</seealso>, - <seealso marker="shell">shell(3)</seealso></p> + Programming Examples</seealso></p> </section> </erlref> - diff --git a/lib/stdlib/doc/src/queue.xml b/lib/stdlib/doc/src/queue.xml index e1a96f5c65..9f3aff03a3 100644 --- a/lib/stdlib/doc/src/queue.xml +++ b/lib/stdlib/doc/src/queue.xml @@ -28,63 +28,74 @@ <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-15</date> + <date>1997-01-15</date> <rev>B</rev> - <file>queue.sgml</file> + <file>queue.xml</file> </header> <module>queue</module> - <modulesummary>Abstract Data Type for FIFO Queues</modulesummary> + <modulesummary>Abstract data type for FIFO queues.</modulesummary> <description> - <p>This module implements (double ended) FIFO queues + <p>This module provides 
(double-ended) FIFO queues in an efficient manner.</p> + <p>All functions fail with reason <c>badarg</c> if arguments - are of wrong type, for example queue arguments are not - queues, indexes are not integers, list arguments are + are of wrong type, for example, queue arguments are not + queues, indexes are not integers, and list arguments are not lists. Improper lists cause internal crashes. An index out of range for a queue also causes a failure with reason <c>badarg</c>.</p> + <p>Some functions, where noted, fail with reason <c>empty</c> for an empty queue.</p> + <p>The data representing a queue as used by this module - should be regarded as opaque by other modules. Any code + is to be regarded as opaque by other modules. Any code assuming knowledge of the format is running on thin ice.</p> + <p>All operations have an amortized O(1) running time, except - <c>len/1</c>, <c>join/2</c>, <c>split/2</c>, <c>filter/2</c> - and <c>member/2</c> that have O(n). + <seealso marker="#filter/2"><c>filter/2</c></seealso>, + <seealso marker="#join/2"><c>join/2</c></seealso>, + <seealso marker="#len/1"><c>len/1</c></seealso>, + <seealso marker="#member/2"><c>member/2</c></seealso>, and + <seealso marker="#split/2"><c>split/2</c></seealso> that have O(n). To minimize the size of a queue minimizing the amount of garbage built by queue operations, the queues do not contain explicit length information, and that is why <c>len/1</c> is O(n). If better performance for this particular operation is essential, it is easy for the caller to keep track of the length.</p> - <p>Queues are double ended. The mental picture of + + <p>Queues are double-ended. The mental picture of a queue is a line of people (items) waiting for their turn. The queue front is the end with the item that has waited the longest. The queue rear is the end an item enters when it starts to wait.
If instead using the mental picture of a list, the front is called head and the rear is called tail.</p> + <p>Entering at the front and exiting at the rear are reverse operations on the queue.</p> - <p>The module has several sets of interface functions. The - "Original API", the "Extended API" and the "Okasaki API".</p> + + <p>This module has three sets of interface functions: the + "Original API", the "Extended API", and the "Okasaki API".</p> + <p>The "Original API" and the "Extended API" both use the - mental picture of a waiting line of items. Both also + mental picture of a waiting line of items. Both have reverse operations suffixed "_r".</p> + <p>The "Original API" item removal functions return compound terms with both the removed item and the resulting queue. - The "Extended API" contain alternative functions that build - less garbage as well as functions for just inspecting the + The "Extended API" contains alternative functions that build + less garbage and functions for just inspecting the queue ends. Also the "Okasaki API" functions build less garbage.</p> - <p>The "Okasaki API" is inspired by "Purely Functional Data structures" + + <p>The "Okasaki API" is inspired by "Purely Functional Data Structures" by Chris Okasaki. It regards queues as lists. - The API is by many regarded as strange and avoidable. - For example many reverse operations have lexically reversed names, + This API is by many regarded as strange and avoidable. 
+ For example, many reverse operations have lexically reversed names, some with more readable but perhaps less understandable aliases.</p> </description> - - <section> <title>Original API</title> </section> @@ -92,7 +103,8 @@ <datatypes> <datatype> <name name="queue" n_vars="1"/> - <desc><p>As returned by <c>new/0</c>.</p></desc> + <desc><p>As returned by + <seealso marker="#new/0"><c>new/0</c></seealso>.</p></desc> </datatype> <datatype> <name name="queue" n_vars="0"/> @@ -101,205 +113,229 @@ <funcs> <func> - <name name="new" arity="0"/> - <fsummary>Create an empty queue</fsummary> + <name name="filter" arity="2"/> + <fsummary>Filter a queue.</fsummary> <desc> - <p>Returns an empty queue.</p> + <p>Returns a queue <c><anno>Q2</anno></c> that is the result of calling + <c><anno>Fun</anno>(<anno>Item</anno>)</c> on all items in + <c><anno>Q1</anno></c>, in order from front to rear.</p> + <p>If <c><anno>Fun</anno>(<anno>Item</anno>)</c> returns <c>true</c>, + <c>Item</c> is copied to the result queue. If it returns <c>false</c>, + <c><anno>Item</anno></c> is not copied. If it returns a list, + the list elements are inserted instead of <c>Item</c> in the + result queue.</p> + <p>So, <c><anno>Fun</anno>(<anno>Item</anno>)</c> returning + <c>[<anno>Item</anno>]</c> is thereby + semantically equivalent to returning <c>true</c>, just + as returning <c>[]</c> is semantically equivalent to + returning <c>false</c>. 
But returning a list builds + more garbage than returning an atom.</p> </desc> </func> + <func> - <name name="is_queue" arity="1"/> - <fsummary>Test if a term is a queue</fsummary> + <name name="from_list" arity="1"/> + <fsummary>Convert a list to a queue.</fsummary> <desc> - <p>Tests if <c><anno>Term</anno></c> is a queue and returns <c>true</c> if so and - <c>false</c> otherwise.</p> + <p>Returns a queue containing the items in <c><anno>L</anno></c> in the + same order; the head item of the list becomes the front + item of the queue.</p> </desc> </func> + <func> - <name name="is_empty" arity="1"/> - <fsummary>Test if a queue is empty</fsummary> + <name name="in" arity="2"/> + <fsummary>Insert an item at the rear of a queue.</fsummary> <desc> - <p>Tests if <c><anno>Q</anno></c> is empty and returns <c>true</c> if so and - <c>false</c> otherwise.</p> + <p>Inserts <c><anno>Item</anno></c> at the rear of queue + <c><anno>Q1</anno></c>. + Returns the resulting queue <c><anno>Q2</anno></c>.</p> </desc> </func> + <func> - <name name="len" arity="1"/> - <fsummary>Get the length of a queue</fsummary> + <name name="in_r" arity="2"/> + <fsummary>Insert an item at the front of a queue.</fsummary> <desc> - <p>Calculates and returns the length of queue <c><anno>Q</anno></c>.</p> + <p>Inserts <c><anno>Item</anno></c> at the front of queue + <c><anno>Q1</anno></c>. + Returns the resulting queue <c><anno>Q2</anno></c>.</p> </desc> </func> <func> - <name name="in" arity="2"/> - <fsummary>Insert an item at the rear of a queue</fsummary> + <name name="is_empty" arity="1"/> + <fsummary>Test if a queue is empty.</fsummary> <desc> - <p>Inserts <c><anno>Item</anno></c> at the rear of queue <c><anno>Q1</anno></c>. 
- Returns the resulting queue <c><anno>Q2</anno></c>.</p> + <p>Tests if <c><anno>Q</anno></c> is empty and returns <c>true</c> if + so, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="in_r" arity="2"/> - <fsummary>Insert an item at the front of a queue</fsummary> + <name name="is_queue" arity="1"/> + <fsummary>Test if a term is a queue.</fsummary> <desc> - <p>Inserts <c><anno>Item</anno></c> at the front of queue <c><anno>Q1</anno></c>. - Returns the resulting queue <c><anno>Q2</anno></c>.</p> + <p>Tests if <c><anno>Term</anno></c> is a queue and returns <c>true</c> + if so, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="out" arity="1"/> - <fsummary>Remove the front item from a queue</fsummary> + <name name="join" arity="2"/> + <fsummary>Join two queues.</fsummary> <desc> - <p>Removes the item at the front of queue <c><anno>Q1</anno></c>. Returns the - tuple <c>{{value, <anno>Item</anno>}, <anno>Q2</anno>}</c>, where <c><anno>Item</anno></c> is the - item removed and <c><anno>Q2</anno></c> is the resulting queue. If <c><anno>Q1</anno></c> is - empty, the tuple <c>{empty, <anno>Q1</anno>}</c> is returned.</p> + <p>Returns a queue <c><anno>Q3</anno></c> that is the result of joining + <c><anno>Q1</anno></c> and <c><anno>Q2</anno></c> with + <c><anno>Q1</anno></c> in front of <c><anno>Q2</anno></c>.</p> </desc> </func> + <func> - <name name="out_r" arity="1"/> - <fsummary>Remove the rear item from a queue</fsummary> + <name name="len" arity="1"/> + <fsummary>Get the length of a queue.</fsummary> <desc> - <p>Removes the item at the rear of the queue <c><anno>Q1</anno></c>. Returns the - tuple <c>{{value, <anno>Item</anno>}, <anno>Q2</anno>}</c>, where <c><anno>Item</anno></c> is the - item removed and <c><anno>Q2</anno></c> is the new queue. If <c><anno>Q1</anno></c> is - empty, the tuple <c>{empty, <anno>Q1</anno>}</c> is returned.
</p> + <p>Calculates and returns the length of queue <c><anno>Q</anno></c>.</p> </desc> </func> <func> - <name name="from_list" arity="1"/> - <fsummary>Convert a list to a queue</fsummary> + <name name="member" arity="2"/> + <fsummary>Test if an item is in a queue.</fsummary> <desc> - <p>Returns a queue containing the items in <c><anno>L</anno></c> in the - same order; the head item of the list will become the front - item of the queue.</p> + <p>Returns <c>true</c> if <c><anno>Item</anno></c> matches some element + in <c><anno>Q</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="to_list" arity="1"/> - <fsummary>Convert a queue to a list</fsummary> + <name name="new" arity="0"/> + <fsummary>Create an empty queue.</fsummary> <desc> - <p>Returns a list of the items in the queue in the same order; - the front item of the queue will become the head of the list.</p> + <p>Returns an empty queue.</p> </desc> </func> <func> - <name name="reverse" arity="1"/> - <fsummary>Reverse a queue</fsummary> + <name name="out" arity="1"/> + <fsummary>Remove the front item from a queue.</fsummary> <desc> - <p>Returns a queue <c><anno>Q2</anno></c> that contains the items of - <c><anno>Q1</anno></c> in the reverse order.</p> + <p>Removes the item at the front of queue <c><anno>Q1</anno></c>. + Returns tuple <c>{{value, <anno>Item</anno>}, <anno>Q2</anno>}</c>, + where <c><anno>Item</anno></c> is the item removed and + <c><anno>Q2</anno></c> is the resulting queue. If + <c><anno>Q1</anno></c> is empty, tuple + <c>{empty, <anno>Q1</anno>}</c> is returned.</p> </desc> </func> + <func> - <name name="split" arity="2"/> - <fsummary>Split a queue in two</fsummary> + <name name="out_r" arity="1"/> + <fsummary>Remove the rear item from a queue.</fsummary> <desc> - <p>Splits <c><anno>Q1</anno></c> in two. 
The <c><anno>N</anno></c> front items - are put in <c><anno>Q2</anno></c> and the rest in <c><anno>Q3</anno></c></p> + <p>Removes the item at the rear of queue <c><anno>Q1</anno></c>. + Returns tuple <c>{{value, <anno>Item</anno>}, <anno>Q2</anno>}</c>, + where <c><anno>Item</anno></c> is the item removed and + <c><anno>Q2</anno></c> is the new queue. If <c><anno>Q1</anno></c> is + empty, tuple <c>{empty, <anno>Q1</anno>}</c> is returned.</p> </desc> </func> + <func> - <name name="join" arity="2"/> - <fsummary>Join two queues</fsummary> + <name name="reverse" arity="1"/> + <fsummary>Reverse a queue.</fsummary> <desc> - <p>Returns a queue <c><anno>Q3</anno></c> that is the result of joining - <c><anno>Q1</anno></c> and <c><anno>Q2</anno></c> with <c><anno>Q1</anno></c> in front of - <c><anno>Q2</anno></c>.</p> + <p>Returns a queue <c><anno>Q2</anno></c> containing the items of + <c><anno>Q1</anno></c> in the reverse order.</p> </desc> </func> + <func> - <name name="filter" arity="2"/> - <fsummary>Filter a queue</fsummary> + <name name="split" arity="2"/> + <fsummary>Split a queue in two.</fsummary> <desc> - <p>Returns a queue <c><anno>Q2</anno></c> that is the result of calling - <c><anno>Fun</anno>(<anno>Item</anno>)</c> on all items in <c><anno>Q1</anno></c>, - in order from front to rear.</p> - <p>If <c><anno>Fun</anno>(<anno>Item</anno>)</c> returns <c>true</c>, <c>Item</c> - is copied to the result queue. If it returns <c>false</c>, - <c><anno>Item</anno></c> is not copied. If it returns a list - the list elements are inserted instead of <c>Item</c> in the - result queue.</p> - <p>So, <c><anno>Fun</anno>(<anno>Item</anno>)</c> returning <c>[<anno>Item</anno>]</c> is thereby - semantically equivalent to returning <c>true</c>, just - as returning <c>[]</c> is semantically equivalent to - returning <c>false</c>. But returning a list builds - more garbage than returning an atom.</p> + <p>Splits <c><anno>Q1</anno></c> in two. 
The <c><anno>N</anno></c> + front items are put in <c><anno>Q2</anno></c> and the rest in + <c><anno>Q3</anno></c>.</p> </desc> </func> + <func> - <name name="member" arity="2"/> - <fsummary>Test if an item is in a queue</fsummary> + <name name="to_list" arity="1"/> + <fsummary>Convert a queue to a list.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Item</anno></c> matches some element - in <c><anno>Q</anno></c>, otherwise <c>false</c>.</p> + <p>Returns a list of the items in the queue in the same order; + the front item of the queue becomes the head of the list.</p> </desc> </func> </funcs> - - <section> <title>Extended API</title> </section> <funcs> <func> - <name name="get" arity="1"/> - <fsummary>Return the front item of a queue</fsummary> - <desc> - <p>Returns <c><anno>Item</anno></c> at the front of queue <c><anno>Q</anno></c>.</p> - <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> - </desc> - </func> - <func> - <name name="get_r" arity="1"/> - <fsummary>Return the rear item of a queue</fsummary> - <desc> - <p>Returns <c><anno>Item</anno></c> at the rear of queue <c><anno>Q</anno></c>.</p> - <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> - </desc> - </func> - <func> <name name="drop" arity="1"/> - <fsummary>Remove the front item from a queue</fsummary> + <fsummary>Remove the front item from a queue.</fsummary> <desc> <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing the front item from <c><anno>Q1</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> </desc> </func> + <func> <name name="drop_r" arity="1"/> - <fsummary>Remove the rear item from a queue</fsummary> + <fsummary>Remove the rear item from a queue.</fsummary> <desc> <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing the rear item from <c><anno>Q1</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> </desc> </func> + + <func> 
+ <name name="get" arity="1"/> + <fsummary>Return the front item of a queue.</fsummary> + <desc> + <p>Returns <c><anno>Item</anno></c> at the front of queue + <c><anno>Q</anno></c>.</p> + <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> + </desc> + </func> + + <func> + <name name="get_r" arity="1"/> + <fsummary>Return the rear item of a queue.</fsummary> + <desc> + <p>Returns <c><anno>Item</anno></c> at the rear of queue + <c><anno>Q</anno></c>.</p> + <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> + </desc> + </func> + <func> <name name="peek" arity="1"/> - <fsummary>Return the front item of a queue</fsummary> + <fsummary>Return the front item of a queue.</fsummary> <desc> - <p>Returns the tuple <c>{value, <anno>Item</anno>}</c> where <c><anno>Item</anno></c> is the - front item of <c><anno>Q</anno></c>, or <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> + <p>Returns tuple <c>{value, <anno>Item</anno>}</c>, where + <c><anno>Item</anno></c> is the front item of <c><anno>Q</anno></c>, + or <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> </desc> </func> + <func> <name name="peek_r" arity="1"/> - <fsummary>Return the rear item of a queue</fsummary> + <fsummary>Return the rear item of a queue.</fsummary> <desc> - <p>Returns the tuple <c>{value, <anno>Item</anno>}</c> where <c><anno>Item</anno></c> is the - rear item of <c><anno>Q</anno></c>, or <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> + <p>Returns tuple <c>{value, <anno>Item</anno>}</c>, where + <c><anno>Item</anno></c> is the rear item of <c><anno>Q</anno></c>, + or <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> </desc> </func> </funcs> - <section> <title>Okasaki API</title> </section> @@ -307,58 +343,92 @@ <funcs> <func> <name name="cons" arity="2"/> - <fsummary>Insert an item at the head of a queue</fsummary> + <fsummary>Insert an item at the head of a queue.</fsummary> <desc> - <p>Inserts <c><anno>Item</anno></c> at the head of queue 
<c><anno>Q1</anno></c>. Returns + <p>Inserts <c><anno>Item</anno></c> at the head of queue + <c><anno>Q1</anno></c>. Returns the new queue <c><anno>Q2</anno></c>.</p> </desc> </func> + + <func> + <name name="daeh" arity="1"/> + <fsummary>Return the tail item of a queue.</fsummary> + <desc> + <p>Returns the tail item of queue <c><anno>Q</anno></c>.</p> + <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> + </desc> + </func> + <func> <name name="head" arity="1"/> - <fsummary>Return the item at the head of a queue</fsummary> + <fsummary>Return the item at the head of a queue.</fsummary> <desc> - <p>Returns <c><anno>Item</anno></c> from the head of queue <c><anno>Q</anno></c>.</p> + <p>Returns <c><anno>Item</anno></c> from the head of queue + <c><anno>Q</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> </desc> </func> + <func> - <name name="tail" arity="1"/> - <fsummary>Remove the head item from a queue</fsummary> + <name name="init" arity="1"/> + <fsummary>Remove the tail item from a queue.</fsummary> <desc> <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing - the head item from <c><anno>Q1</anno></c>.</p> + the tail item from <c><anno>Q1</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> </desc> </func> + <func> - <name name="snoc" arity="2"/> - <fsummary>Insert an item at the tail of a queue</fsummary> + <name name="lait" arity="1"/> + <fsummary>Remove the tail item from a queue.</fsummary> <desc> - <p>Inserts <c><anno>Item</anno></c> as the tail item of queue <c><anno>Q1</anno></c>. 
Returns - the new queue <c><anno>Q2</anno></c>.</p> + <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing + the tail item from <c><anno>Q1</anno></c>.</p> + <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> + <p>The name <c>lait/1</c> is a misspelling - do not use it anymore.</p> </desc> </func> + <func> - <name name="daeh" arity="1"/> <name name="last" arity="1"/> - <fsummary>Return the tail item of a queue</fsummary> + <fsummary>Return the tail item of a queue.</fsummary> <desc> <p>Returns the tail item of queue <c><anno>Q</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q</anno></c> is empty.</p> </desc> </func> + <func> <name name="liat" arity="1"/> - <name name="init" arity="1"/> - <name name="lait" arity="1"/> - <fsummary>Remove the tail item from a queue</fsummary> + <fsummary>Remove the tail item from a queue.</fsummary> <desc> <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing the tail item from <c><anno>Q1</anno></c>.</p> <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> - <p>The name <c>lait/1</c> is a misspelling - do not use it anymore.</p> </desc> </func> - </funcs> + <func> + <name name="snoc" arity="2"/> + <fsummary>Insert an item at the tail of a queue.</fsummary> + <desc> + <p>Inserts <c><anno>Item</anno></c> as the tail item of queue + <c><anno>Q1</anno></c>. 
Returns + the new queue <c><anno>Q2</anno></c>.</p> + </desc> + </func> + + <func> + <name name="tail" arity="1"/> + <fsummary>Remove the head item from a queue.</fsummary> + <desc> + <p>Returns a queue <c><anno>Q2</anno></c> that is the result of removing + the head item from <c><anno>Q1</anno></c>.</p> + <p>Fails with reason <c>empty</c> if <c><anno>Q1</anno></c> is empty.</p> + </desc> + </func> + </funcs> </erlref> + diff --git a/lib/stdlib/doc/src/rand.xml b/lib/stdlib/doc/src/rand.xml index 50057259c6..1dcc3de000 100644 --- a/lib/stdlib/doc/src/rand.xml +++ b/lib/stdlib/doc/src/rand.xml @@ -33,215 +33,231 @@ <file>rand.xml</file> </header> <module>rand</module> - <modulesummary>Pseudo random number generation</modulesummary> + <modulesummary>Pseudo random number generation.</modulesummary> <description> - <p>Random number generator.</p> - - <p>The module contains several different algorithms and can be - extended with more in the future. The current uniform - distribution algorithms uses the - <url href="http://xorshift.di.unimi.it"> - scrambled Xorshift algorithms by Sebastiano Vigna</url> and the - normal distribution algorithm uses the - <url href="http://www.jstatsoft.org/v05/i08"> - Ziggurat Method by Marsaglia and Tsang</url>. - </p> - - <p>The implemented algorithms are:</p> + <p>This module provides a random number generator. The module contains + a number of algorithms. The uniform distribution algorithms use the + <url href="http://xorshift.di.unimi.it">scrambled Xorshift algorithms by + Sebastiano Vigna</url>. 
The normal distribution algorithm uses the + <url href="http://www.jstatsoft.org/v05/i08">Ziggurat Method by Marsaglia + and Tsang</url>.</p> + + <p>The following algorithms are provided:</p> + <taglist> - <tag><c>exsplus</c></tag> <item>Xorshift116+, 58 bits precision and period of 2^116-1.</item> - <tag><c>exs64</c></tag> <item>Xorshift64*, 64 bits precision and a period of 2^64-1.</item> - <tag><c>exs1024</c></tag> <item>Xorshift1024*, 64 bits precision and a period of 2^1024-1.</item> + <tag><c>exsplus</c></tag> + <item> + <p>Xorshift116+, 58 bits precision and period of 2^116-1</p> + </item> + <tag><c>exs64</c></tag> + <item> + <p>Xorshift64*, 64 bits precision and a period of 2^64-1</p> + </item> + <tag><c>exs1024</c></tag> + <item> + <p>Xorshift1024*, 64 bits precision and a period of 2^1024-1</p> + </item> </taglist> - <p>The current default algorithm is <c>exsplus</c>. The default - may change in future. If a specific algorithm is required make - sure to always use <seealso marker="#seed-1">seed/1</seealso> - to initialize the state. - </p> + <p>The default algorithm is <c>exsplus</c>. If a specific algorithm is + required, ensure to always use <seealso marker="#seed-1"> + <c>seed/1</c></seealso> to initialize the state.</p> <p>Every time a random number is requested, a state is used to - calculate it and a new state produced. The state can either be - implicit or it can be an explicit argument and return value. - </p> + calculate it and a new state is produced. 
The state can either be + implicit or be an explicit argument and return value.</p> <p>The functions with implicit state use the process dictionary - variable <c>rand_seed</c> to remember the current state.</p> + variable <c>rand_seed</c> to remember the current state.</p> + + <p>If a process calls + <seealso marker="#uniform-0"><c>uniform/0</c></seealso> or + <seealso marker="#uniform-1"><c>uniform/1</c></seealso> without + setting a seed first, <seealso marker="#seed-1"><c>seed/1</c></seealso> + is called automatically with the default algorithm and creates a + non-constant seed.</p> + + <p>The functions with explicit state never use the process dictionary.</p> + + <p><em>Examples:</em></p> + + <p>Simple use; creates and seeds the default algorithm + with a non-constant seed if not already done:</p> + + <pre> +R0 = rand:uniform(), +R1 = rand:uniform(),</pre> - <p>If a process calls <seealso marker="#uniform-0">uniform/0</seealso> or - <seealso marker="#uniform-1">uniform/1</seealso> without - setting a seed first, <seealso marker="#seed-1">seed/1</seealso> - is called automatically with the default algorithm and creates a - non-constant seed.</p> + <p>Use a specified algorithm:</p> - <p>The functions with explicit state never use the process - dictionary.</p> + <pre> +_ = rand:seed(exs1024), +R2 = rand:uniform(),</pre> + + <p>Use a specified algorithm with a constant seed:</p> - <p>Examples:</p> <pre> - %% Simple usage. Creates and seeds the default algorithm - %% with a non-constant seed if not already done. - R0 = rand:uniform(), - R1 = rand:uniform(), - - %% Use a given algorithm. - _ = rand:seed(exs1024), - R2 = rand:uniform(), - - %% Use a given algorithm with a constant seed. - _ = rand:seed(exs1024, {123, 123534, 345345}), - R3 = rand:uniform(), - - %% Use the functional api with non-constant seed. - S0 = rand:seed_s(exsplus), - {R4, S1} = rand:uniform_s(S0), - - %% Create a standard normal deviate. 
- {SND0, S2} = rand:normal_s(S1), - </pre> - - <note><p>This random number generator is not cryptographically - strong. If a strong cryptographic random number generator is - needed, use one of functions in the - <seealso marker="crypto:crypto">crypto</seealso> - module, for example <c>crypto:strong_rand_bytes/1</c>.</p></note> +_ = rand:seed(exs1024, {123, 123534, 345345}), +R3 = rand:uniform(),</pre> + + <p>Use the functional API with a non-constant seed:</p> + + <pre> +S0 = rand:seed_s(exsplus), +{R4, S1} = rand:uniform_s(S0),</pre> + + <p>Create a standard normal deviate:</p> + + <pre> +{SND0, S2} = rand:normal_s(S1),</pre> + + <note> + <p>This random number generator is not cryptographically + strong. If a strong cryptographic random number generator is + needed, use one of the functions in the + <seealso marker="crypto:crypto"><c>crypto</c></seealso> + module, for example, <seealso marker="crypto:crypto"> + <c>crypto:strong_rand_bytes/1</c></seealso>.</p> + </note> + </description> <datatypes> <datatype> <name name="alg"/> </datatype> - <datatype> <name name="state"/> - <desc><p>Algorithm dependent state.</p></desc> + <desc><p>Algorithm-dependent state.</p></desc> </datatype> - <datatype> <name name="export_state"/> - <desc><p>Algorithm dependent state which can be printed or saved to file.</p></desc> + <desc><p>Algorithm-dependent state that can be printed or saved to + file.</p></desc> </datatype> </datatypes> <funcs> <func> - <name name="seed" arity="1"/> - <fsummary>Seed random number generator</fsummary> - <desc> - <marker id="seed-1"/> - <p>Seeds random number generation with the given algorithm and time dependent - data if <anno>AlgOrExpState</anno> is an algorithm.</p> - <p>Otherwise recreates the exported seed in the process - dictionary, and returns the state. 
- <em>See also:</em> <seealso marker="#export_seed-0">export_seed/0</seealso>.</p> + <name name="export_seed" arity="0"/> + <fsummary>Export the random number generation state.</fsummary> + <desc><marker id="export_seed-0"/> + <p>Returns the random number state in an external format. + To be used with <seealso marker="#seed-1"><c>seed/1</c></seealso>.</p> </desc> </func> + <func> - <name name="seed_s" arity="1"/> - <fsummary>Seed random number generator</fsummary> - <desc> - <p>Seeds random number generation with the given algorithm and time dependent - data if <anno>AlgOrExpState</anno> is an algorithm.</p> - <p>Otherwise recreates the exported seed and returns the state. - <em>See also:</em> <seealso marker="#export_seed-0">export_seed/0</seealso>.</p> + <name name="export_seed_s" arity="1"/> + <fsummary>Export the random number generation state.</fsummary> + <desc><marker id="export_seed_s-1"/> + <p>Returns the random number generator state in an external format. + To be used with <seealso marker="#seed-1"><c>seed/1</c></seealso>.</p> </desc> </func> + <func> - <name name="seed" arity="2"/> - <fsummary>Seed the random number generation</fsummary> + <name name="normal" arity="0"/> + <fsummary>Return a standard normal distributed random float.</fsummary> <desc> - <p>Seeds random number generation with the given algorithm and - integers in the process dictionary and returns - the state.</p> + <p>Returns a standard normal deviate float (that is, the mean + is 0 and the standard deviation is 1) and updates the state in + the process dictionary.</p> </desc> </func> + <func> - <name name="seed_s" arity="2"/> - <fsummary>Seed the random number generation</fsummary> + <name name="normal_s" arity="1"/> + <fsummary>Return a standard normal distributed random float.</fsummary> <desc> - <p>Seeds random number generation with the given algorithm and - integers and returns the state.</p> + <p>Returns, for a specified state, a standard normal + deviate float (that is, the mean 
is 0 and the standard + deviation is 1) and a new state.</p> </desc> </func> <func> - <name name="export_seed" arity="0"/> - <fsummary>Export the random number generation state</fsummary> - <desc><marker id="export_seed-0"/> - <p>Returns the random number state in an external format. - To be used with <seealso marker="#seed-1">seed/1</seealso>.</p> + <name name="seed" arity="1"/> + <fsummary>Seed random number generator.</fsummary> + <desc> + <marker id="seed-1"/> + <p>Seeds random number generation with the specified algorithm and + time-dependent data if <anno>AlgOrExpState</anno> is an algorithm.</p> + <p>Otherwise recreates the exported seed in the process dictionary, + and returns the state. See also + <seealso marker="#export_seed-0"><c>export_seed/0</c></seealso>.</p> </desc> </func> <func> - <name name="export_seed_s" arity="1"/> - <fsummary>Export the random number generation state</fsummary> - <desc><marker id="export_seed_s-1"/> - <p>Returns the random number generator state in an external format. - To be used with <seealso marker="#seed-1">seed/1</seealso>.</p> + <name name="seed" arity="2"/> + <fsummary>Seed the random number generation.</fsummary> + <desc> + <p>Seeds random number generation with the specified algorithm and + integers in the process dictionary and returns the state.</p> </desc> </func> <func> - <name name="uniform" arity="0"/> - <fsummary>Return a random float</fsummary> + <name name="seed_s" arity="1"/> + <fsummary>Seed random number generator.</fsummary> <desc> - <marker id="uniform-0"/> - <p>Returns a random float uniformly distributed in the value - range <c>0.0 < <anno>X</anno> < 1.0 </c> and - updates the state in the process dictionary.</p> + <p>Seeds random number generation with the specified algorithm and + time-dependent data if <anno>AlgOrExpState</anno> is an algorithm.</p> + <p>Otherwise recreates the exported seed and returns the state. 
+ See also <seealso marker="#export_seed-0"> + <c>export_seed/0</c></seealso>.</p> </desc> </func> + <func> - <name name="uniform_s" arity="1"/> - <fsummary>Return a random float</fsummary> + <name name="seed_s" arity="2"/> + <fsummary>Seed the random number generation.</fsummary> <desc> - <p>Given a state, <c>uniform_s/1</c> returns a random float - uniformly distributed in the value range <c>0.0 < - <anno>X</anno> < 1.0</c> and a new state.</p> + <p>Seeds random number generation with the specified algorithm and + integers and returns the state.</p> </desc> </func> <func> - <name name="uniform" arity="1"/> - <fsummary>Return a random integer</fsummary> - <desc> - <marker id="uniform-1"/> - <p>Given an integer <c><anno>N</anno> >= 1</c>, - <c>uniform/1</c> returns a random integer uniformly - distributed in the value range - <c>1 <= <anno>X</anno> <= <anno>N</anno></c> and - updates the state in the process dictionary.</p> + <name name="uniform" arity="0"/> + <fsummary>Return a random float.</fsummary> + <desc><marker id="uniform-0"/> + <p>Returns a random float uniformly distributed in the value + range <c>0.0 < <anno>X</anno> < 1.0</c> and + updates the state in the process dictionary.</p> </desc> </func> + <func> - <name name="uniform_s" arity="2"/> - <fsummary>Return a random integer</fsummary> - <desc> - <p>Given an integer <c><anno>N</anno> >= 1</c> and a state, - <c>uniform_s/2</c> returns a random integer uniformly - distributed in the value range <c>1 <= <anno>X</anno> <= - <anno>N</anno></c> and a new state.</p> + <name name="uniform" arity="1"/> + <fsummary>Return a random integer.</fsummary> + <desc><marker id="uniform-1"/> + <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c>, + a random integer uniformly distributed in the value range + <c>1 <= <anno>X</anno> <= <anno>N</anno></c> and + updates the state in the process dictionary.</p> </desc> </func> <func> - <name name="normal" arity="0"/> - <fsummary>Return a standard normal distributed 
random float</fsummary> + <name name="uniform_s" arity="1"/> + <fsummary>Return a random float.</fsummary> <desc> - <p>Returns a standard normal deviate float (that is, the mean - is 0 and the standard deviation is 1) and updates the state in - the process dictionary.</p> + <p>Returns, for a specified state, a random float + uniformly distributed in the value range <c>0.0 < + <anno>X</anno> < 1.0</c> and a new state.</p> </desc> </func> + <func> - <name name="normal_s" arity="1"/> - <fsummary>Return a standard normal distributed random float</fsummary> + <name name="uniform_s" arity="2"/> + <fsummary>Return a random integer.</fsummary> <desc> - <p>Given a state, <c>normal_s/1</c> returns a standard normal - deviate float (that is, the mean is 0 and the standard - deviation is 1) and a new state.</p> + <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c> + and a state, a random integer uniformly distributed in the value + range <c>1 <= <anno>X</anno> <= <anno>N</anno></c> and a + new state.</p> </desc> </func> - </funcs> </erlref> diff --git a/lib/stdlib/doc/src/random.xml b/lib/stdlib/doc/src/random.xml index dea4e43c95..8d090d20b3 100644 --- a/lib/stdlib/doc/src/random.xml +++ b/lib/stdlib/doc/src/random.xml @@ -24,116 +24,140 @@ <title>random</title> <prepared>Joe Armstrong</prepared> - <responsible>Bjarne Dacker</responsible> + <responsible>Bjarne Däcker</responsible> <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>96-09-09</date> + <date>1996-09-09</date> <rev>A</rev> - <file>random.sgml</file> + <file>random.xml</file> </header> <module>random</module> - <modulesummary>Pseudo random number generation</modulesummary> + <modulesummary>Pseudo-random number generation.</modulesummary> <description> - <p>Random number generator. The method is attributed to - B.A. Wichmann and I.D.Hill, in 'An efficient and portable + <p>This module provides a random number generator. The method is attributed + to B.A. Wichmann and I.D. 
Hill in 'An efficient and portable pseudo-random number generator', Journal of Applied - Statistics. AS183. 1982. Also Byte March 1987. </p> - <p>The current algorithm is a modification of the version attributed - to Richard A O'Keefe in the standard Prolog library.</p> + Statistics. AS183. 1982. Also Byte March 1987.</p> + + <p>The algorithm is a modification of the version attributed + to Richard A. O'Keefe in the standard Prolog library.</p> + <p>Every time a random number is requested, a state is used to calculate - it, and a new state produced. The state can either be implicit (kept + it, and a new state is produced. The state can either be implicit (kept in the process dictionary) or be an explicit argument and return value. In this implementation, the state (the type <c>ran()</c>) consists of a tuple of three integers.</p> - <p>It should be noted that this random number generator is not cryptographically - strong. If a strong cryptographic random number generator is needed for - example <c>crypto:strong_rand_bytes/1</c> could be used instead.</p> - <note><p>The new and improved <seealso - marker="stdlib:rand">rand</seealso> module should be used - instead of this module.</p></note> + + <note> + <p>This random number generator is not cryptographically + strong. 
If a strong cryptographic random number generator is + needed, use one of the functions in the + <seealso marker="crypto:crypto"><c>crypto</c></seealso> + module, for example, <seealso marker="crypto:crypto"> + <c>crypto:strong_rand_bytes/1</c></seealso>.</p> + </note> + + <note> + <p>The improved <seealso marker="rand"><c>rand</c></seealso> + module is to be used instead of this module.</p> + </note> </description> + <datatypes> <datatype> <name name="ran"/> <desc><p>The state.</p></desc> </datatype> </datatypes> + <funcs> <func> <name name="seed" arity="0"/> - <fsummary>Seeds random number generation with default values</fsummary> + <fsummary>Seed random number generation with default values.</fsummary> <desc> <p>Seeds random number generation with default (fixed) values - in the process dictionary, and returns the old state.</p> + in the process dictionary and returns the old state.</p> </desc> </func> + + <func> + <name name="seed" arity="1"/> + <fsummary>Seed random number generator.</fsummary> + <desc> + <p><c>seed({<anno>A1</anno>, <anno>A2</anno>, <anno>A3</anno>})</c> + is equivalent to + <c>seed(<anno>A1</anno>, <anno>A2</anno>, <anno>A3</anno>)</c>.</p> + </desc> + </func> + <func> <name name="seed" arity="3"/> - <fsummary>Seeds random number generator</fsummary> + <fsummary>Seed random number generator.</fsummary> <desc> <p>Seeds random number generation with integer values in the process - dictionary, and returns the old state.</p> - <p>One easy way of obtaining a unique value to seed with is to:</p> + dictionary and returns the old state.</p> + <p>The following is an easy way of obtaining a unique value to seed + with:</p> <code type="none"> random:seed(erlang:phash2([node()]), erlang:monotonic_time(), erlang:unique_integer())</code> - <p>See <seealso marker="erts:erlang#phash2/1"> - erlang:phash2/1</seealso>, <seealso marker="erts:erlang#node/0"> - node/0</seealso>, <seealso marker="erts:erlang#monotonic_time/0"> - erlang:monotonic_time/0</seealso>, and 
+ <p>For details, see + <seealso marker="erts:erlang#phash2/1"> + <c>erlang:phash2/1</c></seealso>, + <seealso marker="erts:erlang#node/0"> + <c>erlang:node/0</c></seealso>, + <seealso marker="erts:erlang#monotonic_time/0"> + <c>erlang:monotonic_time/0</c></seealso>, and <seealso marker="erts:erlang#unique_integer/0"> - erlang:unique_integer/0</seealso>) for details.</p> - </desc> - </func> - <func> - <name name="seed" arity="1"/> - <fsummary>Seeds random number generator</fsummary> - <desc> - <p> - <c>seed({<anno>A1</anno>, <anno>A2</anno>, <anno>A3</anno>})</c> is equivalent to <c>seed(<anno>A1</anno>, <anno>A2</anno>, <anno>A3</anno>)</c>. - </p> + <c>erlang:unique_integer/0</c></seealso>.</p> </desc> </func> + <func> <name name="seed0" arity="0"/> - <fsummary>Return default state for random number generation</fsummary> + <fsummary>Return default state for random number generation.</fsummary> <desc> <p>Returns the default state.</p> </desc> </func> + <func> <name name="uniform" arity="0"/> - <fsummary>Return a random float</fsummary> + <fsummary>Return a random float.</fsummary> <desc> <p>Returns a random float uniformly distributed between <c>0.0</c> and <c>1.0</c>, updating the state in the process dictionary.</p> </desc> </func> + <func> <name name="uniform" arity="1"/> - <fsummary>Return a random integer</fsummary> + <fsummary>Return a random integer.</fsummary> <desc> - <p>Given an integer <c><anno>N</anno> >= 1</c>, <c>uniform/1</c> returns a - random integer uniformly distributed between <c>1</c> and - <c><anno>N</anno></c>, updating the state in the process dictionary.</p> + <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c>, + a random integer uniformly distributed between <c>1</c> and + <c><anno>N</anno></c>, updating the state in the process + dictionary.</p> </desc> </func> + <func> <name name="uniform_s" arity="1"/> - <fsummary>Return a random float</fsummary> + <fsummary>Return a random float.</fsummary> <desc> - <p>Given a state, 
<c>uniform_s/1</c>returns a random float uniformly + <p>Returns, for a specified state, a random float uniformly distributed between <c>0.0</c> and <c>1.0</c>, and a new state.</p> </desc> </func> + <func> <name name="uniform_s" arity="2"/> - <fsummary>Return a random integer</fsummary> + <fsummary>Return a random integer.</fsummary> <desc> - <p>Given an integer <c><anno>N</anno> >= 1</c> and a state, <c>uniform_s/2</c> - returns a random integer uniformly distributed between <c>1</c> and + <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c> and a + state, a random integer uniformly distributed between <c>1</c> and <c><anno>N</anno></c>, and a new state.</p> </desc> </func> @@ -143,12 +167,18 @@ random:seed(erlang:phash2([node()]), <title>Note</title> <p>Some of the functions use the process dictionary variable <c>random_seed</c> to remember the current seed.</p> - <p>If a process calls <c>uniform/0</c> or <c>uniform/1</c> without - setting a seed first, <c>seed/0</c> is called automatically.</p> - <p>The implementation changed in R15. Upgrading to R15 will break - applications that expect a specific output for a given seed. The output - is still deterministic number series, but different compared to releases - older than R15. The seed <c>{0,0,0}</c> will, for example, no longer + + <p>If a process calls + <seealso marker="#uniform/0"><c>uniform/0</c></seealso> or + <seealso marker="#uniform/1"><c>uniform/1</c></seealso> + without setting a seed first, + <seealso marker="#seed/0"><c>seed/0</c></seealso> + is called automatically.</p> + + <p>The implementation changed in Erlang/OTP R15. Upgrading to R15 breaks + applications that expect a specific output for a specified seed. The + output is still deterministic number series, but different compared to + releases older than R15. 
Seed <c>{0,0,0}</c> does, for example, no longer produce a flawed series of only zeros.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/re.xml b/lib/stdlib/doc/src/re.xml index fda79d51d5..7f4f0aa18c 100644 --- a/lib/stdlib/doc/src/re.xml +++ b/lib/stdlib/doc/src/re.xml @@ -35,39 +35,37 @@ <file>re.xml</file> </header> <module>re</module> - <modulesummary>Perl like regular expressions for Erlang</modulesummary> + <modulesummary>Perl-like regular expressions for Erlang.</modulesummary> <description> - <p>This module contains regular expression matching functions for - strings and binaries.</p> + strings and binaries.</p> <p>The <seealso marker="#regexp_syntax">regular expression</seealso> - syntax and semantics resemble that of Perl.</p> + syntax and semantics resemble that of Perl.</p> - <p>The library's matching algorithms are currently based on the - PCRE library, but not all of the PCRE library is interfaced and - some parts of the library go beyond what PCRE offers. The sections of - the PCRE documentation which are relevant to this module are included - here.</p> + <p>The matching algorithms of the library are based on the + PCRE library, but not all of the PCRE library is interfaced and + some parts of the library go beyond what PCRE offers. The sections of + the PCRE documentation that are relevant to this module are included + here.</p> <note> - <p>The Erlang literal syntax for strings uses the "\" - (backslash) character as an escape code. You need to escape - backslashes in literal strings, both in your code and in the shell, - with an additional backslash, i.e.: "\\".</p> + <p>The Erlang literal syntax for strings uses the "\" + (backslash) character as an escape code. You need to escape + backslashes in literal strings, both in your code and in the shell, + with an extra backslash, that is, "\\".</p> </note> - - </description> + <datatypes> <datatype> <name name="mp"/> <desc> - <p>Opaque datatype containing a compiled regular expression. 
- The mp() is guaranteed to be a tuple() having the atom - 're_pattern' as its first element, to allow for matching in - guards. The arity of the tuple() or the content of the other fields - may change in future releases.</p> + <p>Opaque data type containing a compiled regular expression. + <c>mp()</c> is guaranteed to be a tuple() having the atom + <c>re_pattern</c> as its first element, to allow for matching in + guards. The arity of the tuple or the content of the other fields + can change in future Erlang/OTP releases.</p> </desc> </datatype> <datatype> @@ -77,6 +75,7 @@ <name name="compile_option"/> </datatype> </datatypes> + <funcs> <func> <name name="compile" arity="1"/> @@ -85,90 +84,214 @@ <p>The same as <c>compile(<anno>Regexp</anno>,[])</c></p> </desc> </func> + <func> <name name="compile" arity="2"/> - <fsummary>Compile a regular expression into a match program</fsummary> + <fsummary>Compile a regular expression into a match program.</fsummary> <desc> - <p>This function compiles a regular expression with the syntax - described below into an internal format to be used later as a - parameter to the run/2,3 functions.</p> - <p>Compiling the regular expression before matching is useful if - the same expression is to be used in matching against multiple - subjects during the program's lifetime. Compiling once and - executing many times is far more efficient than compiling each - time one wants to match.</p> - <p>When the unicode option is given, the regular expression should be given as a valid Unicode <c>charlist()</c>, otherwise as any valid <c>iodata()</c>.</p> - - <p><marker id="compile_options"/>The options have the following meanings:</p> - <taglist> - <tag><c>unicode</c></tag> - <item>The regular expression is given as a Unicode <c>charlist()</c> and the resulting regular expression code is to be run against a valid Unicode <c>charlist()</c> subject. 
Also consider the <c>ucp</c> option when using Unicode characters.</item> - <tag><c>anchored</c></tag> - <item>The pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself.</item> - <tag><c>caseless</c></tag> - <item>Letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a (?i) option setting. Uppercase and lowercase letters are defined as in the ISO-8859-1 character set.</item> - <tag><c>dollar_endonly</c></tag> - <item>A dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). The <c>dollar_endonly</c> option is ignored if <c>multiline</c> is given. There is no equivalent option in Perl, and no way to set it within a pattern.</item> - <tag><c>dotall</c></tag> - <item>A dot in the pattern matches all characters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches newline characters, independent of this option's setting.</item> - <tag><c>extended</c></tag> - <item>Whitespace data characters in the pattern are ignored except when escaped or inside a character class. Whitespace does not include the VT character (ASCII 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option setting. 
- -This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence <c>(?(</c> which introduces a conditional subpattern.</item> - <tag><c>firstline</c></tag> - <item>An unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline.</item> - <tag><c>multiline</c></tag> - <item><p>By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless <c>dollar_endonly</c> is given). This is the same as Perl.</p> - -<p>When <c>multiline</c> is given, the "start of line" and "end of line" constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting <c>multiline</c> has no effect.</p> </item> - <tag><c>no_auto_capture</c></tag> - <item>Disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl. -</item> - <tag><c>dupnames</c></tag> - <item>Names used to identify capturing subpatterns need not be unique. 
This can be helpful for certain types of pattern when it is known that only one instance of the named subpattern can ever be matched. There are more details of named subpatterns below</item> - <tag><c>ungreedy</c></tag> - <item>This option inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. It can also be set by a (?U) option setting within the pattern.</item> - <tag><c>{newline, NLSpec}</c></tag> - <item> - <p>Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang.</p> - <taglist> - <tag><c>cr</c></tag> - <item>Newline is indicated by a single character CR (ASCII 13)</item> - <tag><c>lf</c></tag> - <item>Newline is indicated by a single character LF (ASCII 10), the default</item> - <tag><c>crlf</c></tag> - <item>Newline is indicated by the two-character CRLF (ASCII 13 followed by ASCII 10) sequence.</item> - <tag><c>anycrlf</c></tag> - <item>Any of the three preceding sequences should be recognized.</item> - <tag><c>any</c></tag> - <item>Any of the newline sequences above, plus the Unicode sequences VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). </item> - </taglist> - </item> - <tag><c>bsr_anycrlf</c></tag> - <item>Specifies specifically that \R is to match only the cr, lf or crlf sequences, not the Unicode specific newline characters.</item> - <tag><c>bsr_unicode</c></tag> - <item>Specifies specifically that \R is to match all the Unicode newline characters (including crlf etc, the default).</item> - <tag><c>no_start_optimize</c></tag> - <item>This option disables optimization that may malfunction if "Special start-of-pattern items" are present in the regular expression. 
A typical example would be when matching "DEFABC" against "(*COMMIT)ABC", where the start optimization of PCRE would skip the subject up to the "A" and would never realize that the (*COMMIT) instruction should have made the matching fail. This option is only relevant if you use "start-of-pattern items", as discussed in the section "PCRE regular expression details" below.</item> - <tag><c>ucp</c></tag> - <item>Specifies that Unicode Character Properties should be used when - resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, only - ISO-Latin-1 properties are used. Using Unicode properties hurts - performance, but is semantically correct when working with Unicode - characters beyond the ISO-Latin-1 range.</item> - <tag><c>never_utf</c></tag> - <item>Specifies that the (*UTF) and/or (*UTF8) "start-of-pattern items" are forbidden. This flag can not be combined with <c>unicode</c>. Useful if ISO-Latin-1 patterns from an external source are to be compiled.</item> - </taglist> - </desc> + <p>Compiles a regular expression, with the syntax + described below, into an internal format to be used later as a + parameter to + <seealso marker="#run/2"><c>run/2</c></seealso> and + <seealso marker="#run/3"><c>run/3</c></seealso>.</p> + <p>Compiling the regular expression before matching is useful if + the same expression is to be used in matching against multiple + subjects during the lifetime of the program. Compiling once and + executing many times is far more efficient than compiling each + time one wants to match.</p> + <p>When option <c>unicode</c> is specified, the regular expression + is to be specified as a valid Unicode <c>charlist()</c>, otherwise as + any valid <c>iodata()</c>.</p> + <marker id="compile_options"/> + <p>Options:</p> + <taglist> + <tag><c>unicode</c></tag> + <item> + <p>The regular expression is specified as a Unicode + <c>charlist()</c> and the resulting regular expression code is to + be run against a valid Unicode <c>charlist()</c> subject. 
Also + consider option <c>ucp</c> when using Unicode characters.</p> + </item> + <tag><c>anchored</c></tag> + <item> + <p>The pattern is forced to be "anchored", that is, it is + constrained to match only at the first matching point in the + string that is searched (the "subject string"). This effect can + also be achieved by appropriate constructs in the pattern + itself.</p> + </item> + <tag><c>caseless</c></tag> + <item> + <p>Letters in the pattern match both uppercase and lowercase + letters. It is equivalent to Perl option <c>/i</c> and can be + changed within a pattern by a <c>(?i)</c> option setting. + Uppercase and lowercase letters are defined as in the ISO 8859-1 + character set.</p> + </item> + <tag><c>dollar_endonly</c></tag> + <item> + <p>A dollar metacharacter in the pattern matches only at the end of + the subject string. Without this option, a dollar also matches + immediately before a newline at the end of the string (but not + before any other newlines). This option is ignored if option + <c>multiline</c> is specified. There is no equivalent option in + Perl, and it cannot be set within a pattern.</p> + </item> + <tag><c>dotall</c></tag> + <item> + <p>A dot in the pattern matches all characters, including those + indicating newline. Without it, a dot does not match when the + current position is at a newline. This option is equivalent to + Perl option <c>/s</c> and it can be changed within a pattern by a + <c>(?s)</c> option setting. A negative class, such as <c>[^a]</c>, + always matches newline characters, independent of the setting of + this option.</p> + </item> + <tag><c>extended</c></tag> + <item> + <p>Whitespace data characters in the pattern are ignored except + when escaped or inside a character class. Whitespace does not + include character 'vt' (ASCII 11). Characters between an + unescaped <c>#</c> outside a character class and the next newline, + inclusive, are also ignored. 
This is equivalent to Perl option + <c>/x</c> and can be changed within a pattern by a <c>(?x)</c> + option setting.</p> + <p>With this option, comments inside complicated patterns can be + included. However, notice that this applies only to data + characters. Whitespace characters can never appear within special + character sequences in a pattern, for example within sequence + <c>(?(</c> that introduces a conditional subpattern.</p> + </item> + <tag><c>firstline</c></tag> + <item> + <p>An unanchored pattern is required to match before or at the first + newline in the subject string, although the matched text can + continue over the newline.</p> + </item> + <tag><c>multiline</c></tag> + <item> + <p>By default, PCRE treats the subject string as consisting of a + single line of characters (even if it contains newlines). The + "start of line" metacharacter (<c>^</c>) matches only at the + start of the string, while the "end of line" metacharacter + (<c>$</c>) matches only at the end of the string, or before a + terminating newline (unless option <c>dollar_endonly</c> is + specified). This is the same as in Perl.</p> + <p>When this option is specified, the "start of line" and "end of + line" constructs match immediately following or immediately + before internal newlines in the subject string, respectively, as + well as at the very start and end. This is equivalent to Perl + option <c>/m</c> and can be changed within a pattern by a + <c>(?m)</c> option setting. If there are no newlines in a subject + string, or no occurrences of <c>^</c> or <c>$</c> in a pattern, + setting <c>multiline</c> has no effect.</p> </item> + <tag><c>no_auto_capture</c></tag> + <item> + <p>Disables the use of numbered capturing parentheses in the + pattern. Any opening parenthesis that is not followed by <c>?</c> + behaves as if it is followed by <c>?:</c>. Named parentheses can + still be used for capturing (and they acquire numbers in the + usual way). 
There is no equivalent option in Perl.</p> + </item> + <tag><c>dupnames</c></tag> + <item> + <p>Names used to identify capturing subpatterns need not be unique. + This can be helpful for certain types of pattern when it is known + that only one instance of the named subpattern can ever be + matched. More details of named subpatterns are provided below.</p> + </item> + <tag><c>ungreedy</c></tag> + <item> + <p>Inverts the "greediness" of the quantifiers so that they are not + greedy by default, but become greedy if followed by "?". It is + not compatible with Perl. It can also be set by a <c>(?U)</c> + option setting within the pattern.</p> + </item> + <tag><c>{newline, NLSpec}</c></tag> + <item> + <p>Overrides the default definition of a newline in the subject + string, which is LF (ASCII 10) in Erlang.</p> + <taglist> + <tag><c>cr</c></tag> + <item> + <p>Newline is indicated by a single character <c>cr</c> + (ASCII 13).</p> + </item> + <tag><c>lf</c></tag> + <item> + <p>Newline is indicated by a single character LF (ASCII 10), the + default.</p> + </item> + <tag><c>crlf</c></tag> + <item> + <p>Newline is indicated by the two-character CRLF (ASCII 13 + followed by ASCII 10) sequence.</p> + </item> + <tag><c>anycrlf</c></tag> + <item> + <p>Any of the three preceding sequences is to be recognized.</p> + </item> + <tag><c>any</c></tag> + <item> + <p>Any of the newline sequences above, and the Unicode sequences + VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next + line, U+0085), LS (line separator, U+2028), and PS (paragraph + separator, U+2029).</p> + </item> + </taglist> + </item> + <tag><c>bsr_anycrlf</c></tag> + <item> + <p>Specifies specifically that \R is to match only the CR, + LF, or CRLF sequences, not the Unicode-specific newline + characters.</p> + </item> + <tag><c>bsr_unicode</c></tag> + <item> + <p>Specifies specifically that \R is to match all the Unicode + newline characters (including CRLF, and so on, the default).</p> + </item> + 
<tag><c>no_start_optimize</c></tag> + <item> + <p>Disables optimization that can malfunction if "Special + start-of-pattern items" are present in the regular expression. A + typical example would be when matching "DEFABC" against + "(*COMMIT)ABC", where the start optimization of PCRE would skip + the subject up to "A" and never realize that the (*COMMIT) + instruction is to have made the matching fail. This option is only + relevant if you use "start-of-pattern items", as discussed in + section <seealso marker="#regexp_syntax_details">PCRE Regular Expression + Details</seealso>.</p> + </item> + <tag><c>ucp</c></tag> + <item> + <p>Specifies that Unicode character properties are to be used when + resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, + only ISO Latin-1 properties are used. Using Unicode properties + hurts performance, but is semantically correct when working with + Unicode characters beyond the ISO Latin-1 range.</p> + </item> + <tag><c>never_utf</c></tag> + <item> + <p>Specifies that the (*UTF) and/or (*UTF8) "start-of-pattern + items" are forbidden. This flag cannot be combined with option + <c>unicode</c>. Useful if ISO Latin-1 patterns from an external + source are to be compiled.</p> + </item> + </taglist> + </desc> </func> <func> <name name="inspect" arity="2"/> - <fsummary>Inspects a compiled regular expression</fsummary> + <fsummary>Inspects a compiled regular expression.</fsummary> <desc> - <p>This function takes a compiled regular expression and an item, returning the relevant data from the regular expression. Currently the only supported item is <c>namelist</c>, which returns the tuple <c>{namelist, [ binary()]}</c>, containing the names of all (unique) named subpatterns in the regular expression.</p> - <p>Example:</p> - <code type="none"> + <p>Takes a compiled regular expression and an item, and returns the + relevant data from the regular expression. 
The only + supported item is <c>namelist</c>, which returns the tuple + <c>{namelist, [binary()]}</c>, containing the names of all (unique) + named subpatterns in the regular expression. For example:</p> + <code type="none"> 1> {ok,MP} = re:compile("(?<A>A)|(?<B>B)|(?<C>C)"). {ok,{re_pattern,3,0,0, <<69,82,67,80,119,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255, @@ -181,8 +304,15 @@ This option makes it possible to include comments inside complicated patterns. N 255,255,...>>}} 4> re:inspect(MPD,namelist). {namelist,[<<"B">>,<<"C">>]}</code> - <p>Note specifically in the second example that the duplicate name only occurs once in the returned list, and that the list is in alphabetical order regardless of where the names are positioned in the regular expression. The order of the names is the same as the order of captured subexpressions if <c>{capture, all_names}</c> is given as an option to <c>re:run/3</c>. You can therefore create a name-to-value mapping from the result of <c>re:run/3</c> like this:</p> -<code> + <p>Notice in the second example that the duplicate name only occurs + once in the returned list, and that the list is in alphabetical order + regardless of where the names are positioned in the regular + expression. The order of the names is the same as the order of + captured subexpressions if <c>{capture, all_names}</c> is specified as + an option to <seealso marker="#run/3"><c>run/3</c></seealso>. + You can therefore create a name-to-value mapping from the result of + <c>run/3</c> like this:</p> + <code> 1> {ok,MP} = re:compile("(?<A>A)|(?<B>B)|(?<C>C)"). {ok,{re_pattern,3,0,0, <<69,82,67,80,119,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255, @@ -193,249 +323,318 @@ This option makes it possible to include comments inside complicated patterns. N {match,[<<"A">>,<<>>,<<>>]} 4> NameMap = lists:zip(N,L). 
[{<<"A">>,<<"A">>},{<<"B">>,<<>>},{<<"C">>,<<>>}]</code> - <p>More items are expected to be added in the future.</p> + </desc> + </func> + + <func> + <name name="replace" arity="3"/> + <fsummary>Match a subject against regular expression and replace matching + elements with Replacement.</fsummary> + <desc> + <p>Same as <c>replace(<anno>Subject</anno>, <anno>RE</anno>, + <anno>Replacement</anno>, [])</c>.</p> </desc> </func> + + <func> + <name name="replace" arity="4"/> + <fsummary>Match a subject against regular expression and replace matching + elements with Replacement.</fsummary> + <desc> + <p>Replaces the matched part of the <c><anno>Subject</anno></c> string + with the contents of <c><anno>Replacement</anno></c>.</p> + <p>The permissible options are the same as for + <seealso marker="#run/3"><c>run/3</c></seealso>, except that option + <c>capture</c> is not allowed. Instead, a <c>{return, + <anno>ReturnType</anno>}</c> is present. The default return type is + <c>iodata</c>, constructed in a way to minimize copying. The + <c>iodata</c> result can be used directly in many I/O operations. If a + flat <c>list()</c> is desired, specify <c>{return, list}</c>. If a + binary is desired, specify <c>{return, binary}</c>.</p> + <p>As in function <c>run/3</c>, an <c>mp()</c> compiled with option + <c>unicode</c> requires <c><anno>Subject</anno></c> to be a Unicode + <c>charlist()</c>. If compilation is done implicitly and the + <c>unicode</c> compilation option is specified to this function, both + the regular expression and <c><anno>Subject</anno></c> are to be + specified as valid Unicode <c>charlist()</c>s.</p> + <p>The replacement string can contain the special character + <c>&</c>, which inserts the whole matching expression in the + result, and the special sequence <c>\</c>N (where N is an integer > + 0), <c>\g</c>N, or <c>\g{</c>N<c>}</c>, each of which inserts subexpression + number N in the result.
If no subexpression with that + number is generated by the regular expression, nothing is + inserted.</p> + <p>To insert an & or a \ in the result, precede it + with a \. Notice that Erlang already gives a special meaning to + \ in literal strings, so a single \ must be written as + <c>"\\"</c> and therefore a double \ as <c>"\\\\"</c>.</p> + <p><em>Example:</em></p> + <code> +re:replace("abcd","c","[&]",[{return,list}]).</code> + <p>gives</p> + <code> +"ab[c]d"</code> + <p>while</p> + <code> +re:replace("abcd","c","[\\&]",[{return,list}]).</code> + <p>gives</p> + <code> +"ab[&]d"</code> + <p>As with <c>run/3</c>, compilation errors raise the <c>badarg</c> + exception. <seealso marker="#compile/2"><c>compile/2</c></seealso> + can be used to get more information about the error.</p> + </desc> + </func> + <func> <name name="run" arity="2"/> - <fsummary>Match a subject against regular expression and capture subpatterns</fsummary> + <fsummary>Match a subject against regular expression and capture + subpatterns.</fsummary> <desc> - <p>The same as <c>run(<anno>Subject</anno>,<anno>RE</anno>,[])</c>.</p> + <p>Same as <c>run(<anno>Subject</anno>,<anno>RE</anno>,[])</c>.</p> </desc> </func> + <func> <name name="run" arity="3"/> - <fsummary>Match a subject against regular expression and capture subpatterns</fsummary> - <type_desc variable="CompileOpt">See <seealso marker="#compile_options">compile/2</seealso> above.</type_desc> + <fsummary>Match a subject against regular expression and capture + subpatterns.</fsummary> + <type_desc variable="CompileOpt">See <seealso marker="#compile_options"> + <c>compile/2</c></seealso>.</type_desc> <desc> - - <p>Executes a regexp matching, returning <c>match/{match, - <anno>Captured</anno>}</c> or <c>nomatch</c>. 
The regular expression can be - given either as <c>iodata()</c> in which case it is - automatically compiled (as by <c>re:compile/2</c>) and executed, - or as a pre-compiled <c>mp()</c> in which case it is executed - against the subject directly.</p> - - <p>When compilation is involved, the exception <c>badarg</c> is - thrown if a compilation error occurs. Call <c>re:compile/2</c> - to get information about the location of the error in the - regular expression.</p> - - <p>If the regular expression is previously compiled, the option - list can only contain the options <c>anchored</c>, - <c>global</c>, <c>notbol</c>, <c>noteol</c>, <c>report_errors</c>, - <c>notempty</c>, <c>notempty_atstart</c>, <c>{offset, integer() >= 0}</c>, - <c>{match_limit, integer() >= 0}</c>, - <c>{match_limit_recursion, integer() >= 0}</c>, - <c>{newline, - <anno>NLSpec</anno>}</c> and - <c>{capture, <anno>ValueSpec</anno>}/{capture, <anno>ValueSpec</anno>, - <anno>Type</anno>}</c>. Otherwise all options valid for the - <c>re:compile/2</c> function are allowed as well. Options - allowed both for compilation and execution of a match, namely - <c>anchored</c> and <c>{newline, <anno>NLSpec</anno>}</c>, - will affect both - the compilation and execution if present together with a non - pre-compiled regular expression.</p> - - <p>If the regular expression was previously compiled with the - option <c>unicode</c>, the <c><anno>Subject</anno></c> should be provided as - a valid Unicode <c>charlist()</c>, otherwise any <c>iodata()</c> - will do. If compilation is involved and the option - <c>unicode</c> is given, both the <c><anno>Subject</anno></c> and the regular - expression should be given as valid Unicode - <c>charlists()</c>.</p> - - <p>The <c>{capture, <anno>ValueSpec</anno>}/{capture, <anno>ValueSpec</anno>, <anno>Type</anno>}</c> - defines what to return from the function upon successful - matching. 
The <c>capture</c> tuple may contain both a - value specification telling which of the captured - substrings are to be returned, and a type specification, telling - how captured substrings are to be returned (as index tuples, - lists or binaries). The <c>capture</c> option makes the function - quite flexible and powerful. The different options are described - in detail below.</p> - - <p>If the capture options describe that no substring capturing - at all is to be done (<c>{capture, none}</c>), the function will - return the single atom <c>match</c> upon successful matching, - otherwise the tuple - <c>{match, <anno>ValueList</anno>}</c> is returned. Disabling capturing can - be done either by specifying <c>none</c> or an empty list as - <c><anno>ValueSpec</anno></c>.</p> - - <p>The <c>report_errors</c> option adds the possibility that an - error tuple is returned. The tuple will either indicate a - matching error (<c>match_limit</c> or - <c>match_limit_recursion</c>) or a compilation error, where the - error tuple has the format <c>{error, {compile, - <anno>CompileErr</anno>}}</c>. Note that if the option - <c>report_errors</c> is not given, the function never returns - error tuples, but will report compilation errors as a badarg - exception and failed matches due to exceeded match limits simply - as <c>nomatch</c>.</p> - - <p>The options relevant for execution are:</p> - - <taglist> - <tag><c>anchored</c></tag> - - <item>Limits <c>re:run/3</c> to matching at the first matching - position. If a pattern was compiled with <c>anchored</c>, or - turned out to be anchored by virtue of its contents, it cannot - be made unanchored at matching time, hence there is no - <c>unanchored</c> option.</item> - - <tag><c>global</c></tag> - <item> - - <p>Implements global (repetitive) search (the <c>g</c> flag in - Perl). 
Each match is returned as a separate - <c>list()</c> containing the specific match as well as any - matching subexpressions (or as specified by the <c>capture</c> - option). The <c><anno>Captured</anno></c> part of the return value will - hence be a <c>list()</c> of <c>list()</c>s when this - option is given.</p> - - <p>The interaction of the global option with a regular - expression which matches an empty string surprises some users. - When the global option is given, <c>re:run/3</c> handles empty - matches in the same way as Perl: a zero-length match at any - point will be retried with the options <c>[anchored, - notempty_atstart]</c> as well. If that search gives a result of length - > 0, the result is included. For example:</p> - -<code> re:run("cat","(|at)",[global]).</code> - - <p>The following matching will be performed:</p> - <taglist> - <tag>At offset <c>0</c></tag> - <item>The regexp <c>(|at)</c> will first match at the initial - position of the string <c>cat</c>, giving the result set - <c>[{0,0},{0,0}]</c> (the second <c>{0,0}</c> is due to the - subexpression marked by the parentheses). As the length of the - match is 0, we don't advance to the next position yet.</item> - <tag>At offset <c>0</c> with <c>[anchored, notempty_atstart]</c></tag> - <item> The search is retried - with the options <c>[anchored, notempty_atstart]</c> at the same - position, which does not give any interesting result of longer - length, so the search position is now advanced to the next - character (<c>a</c>).</item> - <tag>At offset <c>1</c></tag> - <item>This time, the search results in - <c>[{1,0},{1,0}]</c>, so this search will also be repeated - with the extra options.</item> - <tag>At offset <c>1</c> with <c>[anchored, notempty_atstart]</c></tag> - <item>Now the <c>at</c> alternative - is found and the result will be [{1,2},{1,2}].
The result is - added to the list of results and the position in the - search string is advanced two steps.</item> - <tag>At offset <c>3</c></tag> - <item>The search now once again - matches the empty string, giving <c>[{3,0},{3,0}]</c>.</item> - <tag>At offset <c>3</c> with <c>[anchored, notempty_atstart]</c></tag> - <item>This will give no result of length > 0 and we are at - the last position, so the global search is complete.</item> - </taglist> - <p>The result of the call is:</p> - -<code> {match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]}</code> -</item> - - <tag><c>notempty</c></tag> - <item> - <p>An empty string is not considered to be a valid match if this - option is given. If there are alternatives in the pattern, they - are tried. If all the alternatives match the empty string, the - entire match fails. For example, if the pattern</p> -<code> a?b?</code> - <p>is applied to a string not beginning with "a" or "b", it - would normally match the empty string at the start of the - subject. With the <c>notempty</c> option, this match is not - valid, so re:run/3 searches further into the string for - occurrences of "a" or "b".</p> - </item> - <tag><c>notempty_atstart</c></tag> - <item> - <p>This is like <c>notempty</c>, except that an empty string - match that is not at the start of the subject is permitted. If - the pattern is anchored, such a match can occur only if the - pattern contains \K.</p> - <p>Perl has no direct equivalent of <c>notempty</c> or <c>notempty_atstart</c>, but it does - make a special case of a pattern match of the empty string - within its split() function, and when using the /g modifier.
It - is possible to emulate Perl's behavior after matching a null - string by first trying the match again at the same offset with - <c>notempty_atstart</c> and <c>anchored</c>, and then, if that fails, by - advancing the starting offset (see below) and trying an ordinary - match again.</p> - </item> - <tag><c>notbol</c></tag> - - <item>This option specifies that the first character of the subject - string is not the beginning of a line, so the circumflex - metacharacter should not match before it. Setting this without - <c>multiline</c> (at compile time) causes circumflex never to - match. This option only affects the behavior of the circumflex - metacharacter. It does not affect \A.</item> - - <tag><c>noteol</c></tag> - - <item>This option specifies that the end of the subject string - is not the end of a line, so the dollar metacharacter should not - match it nor (except in multiline mode) a newline immediately - before it. Setting this without <c>multiline</c> (at compile time) - causes dollar never to match. This option affects only the - behavior of the dollar metacharacter. It does not affect \Z or - \z.</item> - - <tag><c>report_errors</c></tag> - - <item><p>This option gives better control of the error handling in <c>re:run/3</c>. When it is given, compilation errors (if the regular expression isn't already compiled) as well as run-time errors are explicitly returned as an error tuple.</p> - <p>The possible run-time errors are:</p> - <taglist> - <tag><c>match_limit</c></tag> - - <item>The PCRE library sets a limit on how many times the - internal match function can be called. The default value for - this is 10000000 in the library compiled for Erlang. If - <c>{error, match_limit}</c> is returned, it means that the - execution of the regular expression has reached this - limit. 
Normally this is to be regarded as a <c>nomatch</c>, - which is the default return value when this happens, but by - specifying <c>report_errors</c>, you will get informed when - the match fails due to too many internal calls.</item> - - <tag><c>match_limit_recursion</c></tag> - - <item>This error is very similar to <c>match_limit</c>, but - occurs when the internal match function of PCRE is - "recursively" called more times than the - "match_limit_recursion" limit, which is by default 10000000 as - well. Note that as long as the <c>match_limit</c> and - <c>match_limit_recursion</c> values are kept at the default - values, the <c>match_limit_recursion</c> error can not occur, - as the <c>match_limit</c> error will occur before that (each - recursive call is also a call, but not vice versa). Both - limits can however be changed, either by setting limits - directly in the regular expression string (see reference - section below) or by giving options to <c>re:run/3</c></item> - - </taglist> - <p>It is important to understand that what is referred to as - "recursion" when limiting matches is not actually recursion on - the C stack of the Erlang machine, neither is it recursion on - the Erlang process stack. The version of PCRE compiled into the - Erlang VM uses machine "heap" memory to store values that need to be - kept over recursion in regular expression matches.</p> - </item> - <tag><c>{match_limit, integer() >= 0}</c></tag> - - <item><p>This option limits the execution time of a match in an - implementation-specific way. It is described in the following - way by the PCRE documentation:</p> - - <code> + <p>Executes a regular expression matching, and returns + <c>match/{match, <anno>Captured</anno>}</c> or <c>nomatch</c>.
The + regular expression can be specified either as <c>iodata()</c> in + which case it is automatically compiled (as by <c>compile/2</c>) and + executed, or as a precompiled <c>mp()</c> in which case it is executed + against the subject directly.</p> + <p>When compilation is involved, exception <c>badarg</c> is thrown if a + compilation error occurs. Call <c>compile/2</c> to get information + about the location of the error in the regular expression.</p> + <p>If the regular expression is previously compiled, the option list can + only contain the following options:</p> + <list type="bulleted"> + <item><c>anchored</c></item> + <item><c>{capture, <anno>ValueSpec</anno>}/{capture, + <anno>ValueSpec</anno>, <anno>Type</anno>}</c></item> + <item><c>global</c></item> + <item><c>{match_limit, integer() >= 0}</c></item> + <item><c>{match_limit_recursion, integer() >= 0}</c></item> + <item><c>{newline, <anno>NLSpec</anno>}</c></item> + <item><c>notbol</c></item> + <item><c>notempty</c></item> + <item><c>notempty_atstart</c></item> + <item><c>noteol</c></item> + <item><c>{offset, integer() >= 0}</c></item> + <item><c>report_errors</c></item> + </list> + <p>Otherwise all options valid for function <c>compile/2</c> are also + allowed. Options allowed both for compilation and execution of a + match, namely <c>anchored</c> and <c>{newline, + <anno>NLSpec</anno>}</c>, affect both the compilation and execution if + present together with a non-precompiled regular expression.</p> + <p>If the regular expression was previously compiled with option + <c>unicode</c>, <c><anno>Subject</anno></c> is to be provided as a + valid Unicode <c>charlist()</c>, otherwise any <c>iodata()</c> will + do. 
If compilation is involved and option <c>unicode</c> is specified, + both <c><anno>Subject</anno></c> and the regular expression are to be + specified as valid Unicode <c>charlists()</c>.</p> + <p><c>{capture, <anno>ValueSpec</anno>}/{capture, + <anno>ValueSpec</anno>, <anno>Type</anno>}</c> defines what to return + from the function upon successful matching. The <c>capture</c> tuple + can contain both a value specification, telling which of the captured + substrings are to be returned, and a type specification, telling how + captured substrings are to be returned (as index tuples, lists, or + binaries). The options are described in detail below.</p> + <p>If the capture options describe that no substring capturing is to be + done (<c>{capture, none}</c>), the function returns the single atom + <c>match</c> upon successful matching, otherwise the tuple + <c>{match, <anno>ValueList</anno>}</c>. Disabling capturing can be + done either by specifying <c>none</c> or an empty list as + <c><anno>ValueSpec</anno></c>.</p> + <p>Option <c>report_errors</c> adds the possibility that an error tuple + is returned. The tuple either indicates a matching error + (<c>match_limit</c> or <c>match_limit_recursion</c>), or a compilation + error, where the error tuple has the format <c>{error, {compile, + <anno>CompileErr</anno>}}</c>. Notice that if option + <c>report_errors</c> is not specified, the function never returns + error tuples, but reports compilation errors as a <c>badarg</c> + exception and failed matches because of exceeded match limits simply + as <c>nomatch</c>.</p> + <p>The following options are relevant for execution:</p> + <taglist> + <tag><c>anchored</c></tag> + <item> + <p>Limits <c>run/3</c> to matching at the first matching + position. 
If a pattern was compiled with <c>anchored</c>, or + turned out to be anchored by virtue of its contents, it cannot + be made unanchored at matching time, hence there is no + <c>unanchored</c> option.</p></item> + <tag><c>global</c></tag> + <item> + <p>Implements global (repetitive) search (flag <c>g</c> in Perl). + Each match is returned as a separate <c>list()</c> containing the + specific match and any matching subexpressions (or as specified + by option <c>capture</c>). The <c><anno>Captured</anno></c> part + of the return value is hence a <c>list()</c> of <c>list()</c>s + when this option is specified.</p> + <p>The interaction of option <c>global</c> with a regular + expression that matches an empty string surprises some users. + When option <c>global</c> is specified, <c>run/3</c> handles + empty matches in the same way as Perl: a zero-length match at any + point is also retried with options <c>[anchored, + notempty_atstart]</c>. If that search gives a result of length + > 0, the result is included. Example:</p> + <code> +re:run("cat","(|at)",[global]).</code> + <p>The following matchings are performed:</p> + <taglist> + <tag>At offset <c>0</c></tag> + <item> + <p>The regular expression <c>(|at)</c> first matches at the + initial position of string <c>cat</c>, giving the result set + <c>[{0,0},{0,0}]</c> (the second <c>{0,0}</c> is because of + the subexpression marked by the parentheses). 
As the length + of the match is 0, we do not advance to the next position + yet.</p> + </item> + <tag>At offset <c>0</c> with <c>[anchored, + notempty_atstart]</c></tag> + <item> + <p>The search is retried with options <c>[anchored, + notempty_atstart]</c> at the same position, which does not + give any interesting result of longer length, so the search + position is advanced to the next character (<c>a</c>).</p> + </item> + <tag>At offset <c>1</c></tag> + <item> + <p>The search results in <c>[{1,0},{1,0}]</c>, so this search is + also repeated with the extra options.</p> + </item> + <tag>At offset <c>1</c> with <c>[anchored, + notempty_atstart]</c></tag> + <item> + <p>Alternative <c>at</c> is found and the result is + <c>[{1,2},{1,2}]</c>. The result is added to the list of results and + the position in the search string is advanced two steps.</p> + </item> + <tag>At offset <c>3</c></tag> + <item> + <p>The search once again matches the empty string, giving + <c>[{3,0},{3,0}]</c>.</p> + </item> + <tag>At offset <c>3</c> with <c>[anchored, + notempty_atstart]</c></tag> + <item> + <p>This gives no result of length > 0 and we are at the last + position, so the global search is complete.</p> + </item> + </taglist> + <p>The result of the call is:</p> + <code> +{match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]}</code> + </item> + <tag><c>notempty</c></tag> + <item> + <p>An empty string is not considered to be a valid match if this + option is specified. If alternatives in the pattern exist, they + are tried. 
If all the alternatives match the empty string, the + entire match fails.</p> + <p><em>Example:</em></p> + <p>If the following pattern is applied to a string not beginning + with "a" or "b", it would normally match the empty string at the + start of the subject:</p> + <code> +a?b?</code> + <p>With option <c>notempty</c>, this match is invalid, so + <c>run/3</c> searches further into the string for occurrences of + "a" or "b".</p> + </item> + <tag><c>notempty_atstart</c></tag> + <item> + <p>Like <c>notempty</c>, except that an empty string match that is + not at the start of the subject is permitted. If the pattern is + anchored, such a match can occur only if the pattern contains + \K.</p> + <p>Perl has no direct equivalent of <c>notempty</c> or + <c>notempty_atstart</c>, but it does make a special case of a + pattern match of the empty string within its split() function, + and when using modifier <c>/g</c>. The Perl behavior can be + emulated after matching a null string by first trying the + match again at the same offset with <c>notempty_atstart</c> and + <c>anchored</c>, and then, if that fails, by advancing the + starting offset (see below) and trying an ordinary match + again.</p> + </item> + <tag><c>notbol</c></tag> + <item> + <p>Specifies that the first character of the subject string is not + the beginning of a line, so the circumflex metacharacter is not + to match before it. Setting this without <c>multiline</c> (at + compile time) causes circumflex never to match. This option only + affects the behavior of the circumflex metacharacter. It does not + affect \A.</p> + </item> + <tag><c>noteol</c></tag> + <item> + <p>Specifies that the end of the subject string is not the end of a + line, so the dollar metacharacter is not to match it nor (except + in multiline mode) a newline immediately before it. Setting this + without <c>multiline</c> (at compile time) causes dollar never to + match. This option affects only the behavior of the dollar + metacharacter. 
It does not affect \Z or \z.</p> + </item> + <tag><c>report_errors</c></tag> + <item> + <p>Gives better control of the error handling in <c>run/3</c>. When + specified, compilation errors (if the regular expression is not + already compiled) and runtime errors are explicitly returned as + an error tuple.</p> + <p>The following are the possible runtime errors:</p> + <taglist> + <tag><c>match_limit</c></tag> + <item> + <p>The PCRE library sets a limit on how many times the internal + match function can be called. Defaults to 10,000,000 in the + library compiled for Erlang. If <c>{error, match_limit}</c> + is returned, the execution of the regular expression has + reached this limit. This is normally to be regarded as a + <c>nomatch</c>, which is the default return value when this + occurs, but by specifying <c>report_errors</c>, you are + informed when the match fails because of too many internal + calls.</p> + </item> + <tag><c>match_limit_recursion</c></tag> + <item> + <p>This error is very similar to <c>match_limit</c>, but occurs + when the internal match function of PCRE is "recursively" + called more times than the <c>match_limit_recursion</c> limit, + which defaults to 10,000,000 as well. Notice that as long as + the <c>match_limit</c> + and <c>match_limit_recursion</c> values are + kept at the default values, the <c>match_limit_recursion</c> + error cannot occur, as the <c>match_limit</c> error occurs + before that (each recursive call is also a call, but not + conversely). Both limits can however be changed, either by + setting limits directly in the regular expression string (see + section <seealso marker="#regexp_syntax_details">PCRE Regular + Expression Details</seealso>) or by specifying options to + <c>run/3</c>.</p> + </item> + </taglist> + <p>It is important to understand that what is referred to as + "recursion" when limiting matches is not recursion on the C stack + of the Erlang machine or on the Erlang process stack. 
The PCRE + version compiled into the Erlang VM uses machine "heap" memory to + store values that must be kept over recursion in regular + expression matches.</p> + </item> + <tag><c>{match_limit, integer() >= 0}</c></tag> + <item> + <p>Limits the execution time of a match in an + implementation-specific way. It is described as follows by the + PCRE documentation:</p> + <code> The match_limit field provides a means of preventing PCRE from using up a vast amount of resources when running patterns that are not going to match, but which have a very large number of possibilities in their @@ -448,26 +647,22 @@ imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.</code> - - <p>This means that runaway regular expression matches can fail - faster if the limit is lowered using this option. The default - value compiled into the Erlang virtual machine is 10000000</p> - - <note><p>This option does in no way affect the execution of the - Erlang virtual machine in terms of "long running - BIF's". <c>re:run</c> always give control back to the scheduler - of Erlang processes at intervals that ensures the real time - properties of the Erlang system.</p></note> - </item> - - <tag><c>{match_limit_recursion, integer() >= 0}</c></tag> - - <item><p>This option limits the execution time and memory - consumption of a match in an implementation-specific way, very - similar to <c>match_limit</c>. It is described in the following - way by the PCRE documentation:</p> - - <code> + <p>This means that runaway regular expression matches can fail + faster if the limit is lowered using this option. The default + value 10,000,000 is compiled into the Erlang VM.</p> + <note> + <p>This option does in no way affect the execution of the Erlang + VM in terms of "long running BIFs". 
<c>run/3</c> always gives + control back to the scheduler of Erlang processes at intervals + that ensures the real-time properties of the Erlang system.</p> + </note> + </item> + <tag><c>{match_limit_recursion, integer() >= 0}</c></tag> + <item> + <p>Limits the execution time and memory consumption of a match in an + implementation-specific way, very similar to <c>match_limit</c>. + It is described as follows by the PCRE documentation:</p> + <code> The match_limit_recursion field is similar to match_limit, but instead of limiting the total number of times that match() is called, it limits the depth of recursion. The recursion depth is a smaller number @@ -477,3273 +672,3535 @@ match_limit. Limiting the recursion depth limits the amount of machine stack that can be used, or, when PCRE has been compiled to use memory on the heap -instead of the stack, the amount of heap memory that can be -used.</code> - - <p>The Erlang virtual machine uses a PCRE library where heap - memory is used when regular expression match recursion happens, - why this limits the usage of machine heap, not C stack.</p> - - <p>Specifying a lower value may result in matches with deep recursion failing, when they should actually have matched:</p> - <code type="none"> +instead of the stack, the amount of heap memory that can be used.</code> + <p>The Erlang VM uses a PCRE library where heap memory is used when + regular expression match recursion occurs. This therefore limits + the use of machine heap, not C stack.</p> + <p>Specifying a lower value can result in matches with deep + recursion failing, when they should have matched:</p> + <code type="none"> 1> re:run("aaaaaaaaaaaaaz","(a+)*z"). {match,[{0,14},{0,13}]} 2> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5}]). nomatch 3> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5},report_errors]). 
{error,match_limit_recursion}</code> - - <p>This option, as well as the <c>match_limit</c> option should - only be used in very rare cases. Understanding of the PCRE - library internals is recommended before tampering with these - limits.</p> - </item> - - <tag><c>{offset, integer() >= 0}</c></tag> - - <item>Start matching at the offset (position) given in the - subject string. The offset is zero-based, so that the default is - <c>{offset,0}</c> (all of the subject string).</item> - - <tag><c>{newline, <anno>NLSpec</anno>}</c></tag> - <item> - <p>Override the default definition of a newline in the subject string, which is LF (ASCII 10) in Erlang.</p> - <taglist> - <tag><c>cr</c></tag> - <item>Newline is indicated by a single character CR (ASCII 13)</item> - <tag><c>lf</c></tag> - <item>Newline is indicated by a single character LF (ASCII 10), the default</item> - <tag><c>crlf</c></tag> - <item>Newline is indicated by the two-character CRLF (ASCII 13 followed by ASCII 10) sequence.</item> - <tag><c>anycrlf</c></tag> - <item>Any of the three preceding sequences should be recognized.</item> - <tag><c>any</c></tag> - <item>Any of the newline sequences above, plus the Unicode sequences VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). </item> - </taglist> - </item> - <tag><c>bsr_anycrlf</c></tag> - <item>Specifies specifically that \R is to match only the cr, lf or crlf sequences, not the Unicode specific newline characters. (overrides compilation option)</item> - <tag><c>bsr_unicode</c></tag> - <item>Specifies specifically that \R is to match all the Unicode newline characters (including crlf etc, the default).(overrides compilation option)</item> - - <tag><c>{capture, <anno>ValueSpec</anno>}</c>/<c>{capture, <anno>ValueSpec</anno>, <anno>Type</anno>}</c></tag> - <item> - - <p>Specifies which captured substrings are returned and in what - format. 
By default, - <c>re:run/3</c> captures all of the matching part of the - substring as well as all capturing subpatterns (all of the - pattern is automatically captured). The default return type is - (zero-based) indexes of the captured parts of the string, given as - <c>{Offset,Length}</c> pairs (the <c>index</c> <c><anno>Type</anno></c> of - capturing).</p> - - <p>As an example of the default behavior, the following call:</p> - - <code> re:run("ABCabcdABC","abcd",[]).</code> - - <p>returns, as first and only captured string the matching part of the subject ("abcd" in the middle) as a index pair <c>{3,4}</c>, where character positions are zero based, just as in offsets. The return value of the call above would then be:</p> - <code> {match,[{3,4}]}</code> - <p>Another (and quite common) case is where the regular expression matches all of the subject, as in:</p> - <code> re:run("ABCabcdABC",".*abcd.*",[]).</code> - <p>where the return value correspondingly will point out all of the string, beginning at index 0 and being 10 characters long:</p> - <code> {match,[{0,10}]}</code> - - <p>If the regular expression contains capturing subpatterns, - like in the following case:</p> - - <code> re:run("ABCabcdABC",".*(abcd).*",[]).</code> - - <p>all of the matched subject is captured, as - well as the captured substrings:</p> - - <code> {match,[{0,10},{3,4}]}</code> - - <p>the complete matching pattern always giving the first return value in the - list and the rest of the subpatterns being added in the - order they occurred in the regular expression.</p> - - <p>The capture tuple is built up as follows:</p> - <taglist> - <tag><c><anno>ValueSpec</anno></c></tag> - <item><p>Specifies which captured (sub)patterns are to be returned. 
The <c><anno>ValueSpec</anno></c> can either be an atom describing a predefined set of return values, or a list containing either the indexes or the names of specific subpatterns to return.</p> - <p>The predefined sets of subpatterns are:</p> - <taglist> - <tag><c>all</c></tag> - <item>All captured subpatterns including the complete matching string. This is the default.</item> - <tag><c>all_names</c></tag> - <item>All <em>named</em> subpatterns in the regular expression, as if a <c>list()</c> - of all the names <em>in alphabetical order</em> was given. The list of all names can also be retrieved with the <seealso marker="#inspect/2">inspect/2</seealso> function.</item> - <tag><c>first</c></tag> - <item>Only the first captured subpattern, which is always the complete matching part of the subject. All explicitly captured subpatterns are discarded.</item> - <tag><c>all_but_first</c></tag> - <item>All but the first matching subpattern, i.e. all explicitly captured subpatterns, but not the complete matching part of the subject string. This is useful if the regular expression as a whole matches a large part of the subject, but the part you're interested in is in an explicitly captured subpattern. If the return type is <c>list</c> or <c>binary</c>, not returning subpatterns you're not interested in is a good way to optimize.</item> - <tag><c>none</c></tag> - <item>Do not return matching subpatterns at all, yielding the single atom <c>match</c> as the return value of the function when matching successfully instead of the <c>{match, list()}</c> return. Specifying an empty list gives the same behavior.</item> - </taglist> - <p>The value list is a list of indexes for the subpatterns to return, where index 0 is for all of the pattern, and 1 is for the first explicit capturing subpattern in the regular expression, and so forth. 
When using named captured subpatterns (see below) in the regular expression, one can use <c>atom()</c>s or <c>string()</c>s to specify the subpatterns to be returned. For example, consider the regular expression:</p> - <code> ".*(abcd).*"</code> - <p>matched against the string "ABCabcdABC", capturing only the "abcd" part (the first explicit subpattern):</p> - <code> re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]).</code> - <p>The call will yield the following result:</p> - <code> {match,[{3,4}]}</code> - <p>as the first explicitly captured subpattern is "(abcd)", matching "abcd" in the subject, at (zero-based) position 3, of length 4.</p> - <p>Now consider the same regular expression, but with the subpattern explicitly named 'FOO':</p> - <code> ".*(?<FOO>abcd).*"</code> - <p>With this expression, we could still give the index of the subpattern with the following call:</p> - <code> re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,[1]}]).</code> - <p>giving the same result as before. But, since the subpattern is named, we can also specify its name in the value list:</p> - <code> re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,['FOO']}]).</code> - <p>which would yield the same result as the earlier examples, namely:</p> - <code> {match,[{3,4}]}</code> - - <p>The values list might specify indexes or names not present in - the regular expression, in which case the return values vary - depending on the type. If the type is <c>index</c>, the tuple - <c>{-1,0}</c> is returned for values having no corresponding - subpattern in the regexp, but for the other types - (<c>binary</c> and <c>list</c>), the values are the empty binary - or list respectively.</p> - - </item> - <tag><c><anno>Type</anno></c></tag> - <item><p>Optionally specifies how captured substrings are to be returned. If omitted, the default of <c>index</c> is used. 
The <c><anno>Type</anno></c> can be one of the following:</p> - <taglist> - <tag><c>index</c></tag> - <item>Return captured substrings as pairs of byte indexes into the subject string and length of the matching string in the subject (as if the subject string was flattened with <c>iolist_to_binary/1</c> or <c>unicode:characters_to_binary/2</c> prior to matching). Note that the <c>unicode</c> option results in <em>byte-oriented</em> indexes in a (possibly virtual) <em>UTF-8 encoded</em> binary. A byte index tuple <c>{0,2}</c> might therefore represent one or two characters when <c>unicode</c> is in effect. This might seem counter-intuitive, but has been deemed the most effective and useful way to way to do it. To return lists instead might result in simpler code if that is desired. This return type is the default.</item> - <tag><c>list</c></tag> - <item>Return matching substrings as lists of characters (Erlang <c>string()</c>s). It the <c>unicode</c> option is used in combination with the \C sequence in the regular expression, a captured subpattern can contain bytes that are not valid UTF-8 (\C matches bytes regardless of character encoding). In that case the <c>list</c> capturing may result in the same types of tuples that <c>unicode:characters_to_list/2</c> can return, namely three-tuples with the tag <c>incomplete</c> or <c>error</c>, the successfully converted characters and the invalid UTF-8 tail of the conversion as a binary. The best strategy is to avoid using the \C sequence when capturing lists.</item> - <tag><c>binary</c></tag> - <item>Return matching substrings as binaries. If the <c>unicode</c> option is used, these binaries are in UTF-8. If the \C sequence is used together with <c>unicode</c> the binaries may be invalid UTF-8.</item> + <p>This option and option <c>match_limit</c> are only to be used in + rare cases. 
Understanding of the PCRE library internals is + recommended before tampering with these limits.</p> + </item> + <tag><c>{offset, integer() >= 0}</c></tag> + <item> + <p>Starts matching at the offset (position) specified in the + subject string. The offset is zero-based, so that the default is + <c>{offset,0}</c> (all of the subject string).</p> + </item> + <tag><c>{newline, <anno>NLSpec</anno>}</c></tag> + <item> + <p>Overrides the default definition of a newline in the subject + string, which is LF (ASCII 10) in Erlang.</p> + <taglist> + <tag><c>cr</c></tag> + <item> + <p>Newline is indicated by a single character CR (ASCII 13).</p> + </item> + <tag><c>lf</c></tag> + <item> + <p>Newline is indicated by a single character LF (ASCII 10), + the default.</p> + </item> + <tag><c>crlf</c></tag> + <item> + <p>Newline is indicated by the two-character CRLF (ASCII 13 + followed by ASCII 10) sequence.</p> + </item> + <tag><c>anycrlf</c></tag> + <item> + <p>Any of the three preceding sequences is recognized.</p> + </item> + <tag><c>any</c></tag> + <item> + <p>Any of the newline sequences above, and the Unicode + sequences VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL + (next line, U+0085), LS (line separator, U+2028), and PS + (paragraph separator, U+2029).</p> + </item> + </taglist> + </item> + <tag><c>bsr_anycrlf</c></tag> + <item> + <p>Specifies that \R is to match only the CR, + LF, or CRLF sequences, not the Unicode-specific newline + characters. (Overrides the compilation option.)</p> + </item> + <tag><c>bsr_unicode</c></tag> + <item> + <p>Specifies that \R is to match all the Unicode + newline characters (including CRLF, and so on, the default). + (Overrides the compilation option.)</p> + </item> + <tag><c>{capture, <anno>ValueSpec</anno>}</c>/<c>{capture, + <anno>ValueSpec</anno>, <anno>Type</anno>}</c></tag> + <item> + <p>Specifies which captured substrings are returned and in what + format. 
By default, <c>run/3</c> captures all of the matching + part of the substring and all capturing subpatterns (all of the + pattern is automatically captured). The default return type is + (zero-based) indexes of the captured parts of the string, + specified as <c>{Offset,Length}</c> pairs (the <c>index</c> + <c><anno>Type</anno></c> of capturing).</p> + <p>As an example of the default behavior, the following call + returns, as first and only captured string, the matching part of + the subject ("abcd" in the middle) as an index pair <c>{3,4}</c>, + where character positions are zero-based, just as in offsets:</p> + <code> +re:run("ABCabcdABC","abcd",[]).</code> + <p>The return value of this call is:</p> + <code> +{match,[{3,4}]}</code> + <p>Another (and quite common) case is where the regular expression + matches all of the subject:</p> + <code> +re:run("ABCabcdABC",".*abcd.*",[]).</code> + <p>Here the return value correspondingly points out all of the + string, beginning at index 0, and it is 10 characters long:</p> + <code> +{match,[{0,10}]}</code> + <p>If the regular expression contains capturing subpatterns, like + in:</p> + <code> +re:run("ABCabcdABC",".*(abcd).*",[]).</code> + <p>all of the matched subject is captured, as well as the captured + substrings:</p> + <code> +{match,[{0,10},{3,4}]}</code> + <p>The complete matching pattern always gives the first return + value in the list and the remaining subpatterns are added in the + order they occurred in the regular expression.</p> + <p>The capture tuple is built up as follows:</p> + <taglist> + <tag><c><anno>ValueSpec</anno></c></tag> + <item> + <p>Specifies which captured (sub)patterns are to be returned. 
+ <c><anno>ValueSpec</anno></c> can either be an atom describing + a predefined set of return values, or a list containing the + indexes or the names of specific subpatterns to return.</p> + <p>The following are the predefined sets of subpatterns:</p> + <taglist> + <tag><c>all</c></tag> + <item> + <p>All captured subpatterns including the complete matching + string. This is the default.</p> + </item> + <tag><c>all_names</c></tag> + <item> + <p>All <em>named</em> subpatterns in the regular expression, + as if a <c>list()</c> of all the names <em>in + alphabetical order</em> was specified. The list of all + names can also be retrieved with + <seealso marker="#inspect/2"> + <c>inspect/2</c></seealso>.</p> + </item> + <tag><c>first</c></tag> + <item> + <p>Only the first captured subpattern, which is always the + complete matching part of the subject. All explicitly + captured subpatterns are discarded.</p> + </item> + <tag><c>all_but_first</c></tag> + <item> + <p>All but the first matching subpattern, that is, all + explicitly captured subpatterns, but not the complete + matching part of the subject string. This is useful if + the regular expression as a whole matches a large part of + the subject, but the part you are interested in is in an + explicitly captured subpattern. If the return type is + <c>list</c> or <c>binary</c>, not returning subpatterns + you are not interested in is a good way to optimize.</p> + </item> + <tag><c>none</c></tag> + <item> + <p>Returns no matching subpatterns, gives the single + atom <c>match</c> as the return value of the function + when matching successfully instead of the <c>{match, + list()}</c> return. Specifying an empty list gives the + same behavior.</p> + </item> + </taglist> + <p>The value list is a list of indexes for the subpatterns to + return, where index 0 is for all of the pattern, and 1 is for + the first explicit capturing subpattern in the regular + expression, and so on. 
When using named captured subpatterns + (see below) in the regular expression, one can use + <c>atom()</c>s or <c>string()</c>s to specify the subpatterns + to be returned. For example, consider the regular + expression:</p> + <code> +".*(abcd).*"</code> + <p>matched against string "ABCabcdABC", capturing only the + "abcd" part (the first explicit subpattern):</p> + <code> +re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]).</code> + <p>The call gives the following result, as the first explicitly + captured subpattern is "(abcd)", matching "abcd" in the + subject, at (zero-based) position 3, of length 4:</p> + <code> +{match,[{3,4}]}</code> + <p>Consider the same regular expression, but with the subpattern + explicitly named 'FOO':</p> + <code> +".*(?<FOO>abcd).*"</code> + <p>With this expression, we could still give the index of the + subpattern with the following call:</p> + <code> +re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,[1]}]).</code> + <p>giving the same result as before. But, as the subpattern is + named, we can also specify its name in the value list:</p> + <code> +re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,['FOO']}]).</code> + <p>This would give the same result as the earlier examples, + namely:</p> + <code> +{match,[{3,4}]}</code> + <p>The values list can specify indexes or names not present in + the regular expression, in which case the return values vary + depending on the type. If the type is <c>index</c>, the tuple + <c>{-1,0}</c> is returned for values with no corresponding + subpattern in the regular expression, but for the other types + (<c>binary</c> and <c>list</c>), the values are the empty + binary or list, respectively.</p> + </item> + <tag><c><anno>Type</anno></c></tag> + <item> + <p>Optionally specifies how captured substrings are to be + returned. 
If omitted, the default of <c>index</c> is used.</p> + <p><c><anno>Type</anno></c> can be one of the following:</p> + <taglist> + <tag><c>index</c></tag> + <item> + <p>Returns captured substrings as pairs of byte indexes + into the subject string and length of the matching string + in the subject (as if the subject string was flattened + with <seealso marker="erts:erlang#iolist_to_binary/1"> + <c>erlang:iolist_to_binary/1</c></seealso> or + <seealso marker="unicode#characters_to_binary/2"> + <c>unicode:characters_to_binary/2</c></seealso> before + matching). Notice that option <c>unicode</c> results in + <em>byte-oriented</em> indexes in a (possibly virtual) + <em>UTF-8 encoded</em> binary. A byte index tuple + <c>{0,2}</c> can therefore represent one or two + characters when <c>unicode</c> is in effect. This can seem + counter-intuitive, but has been deemed the most effective + and useful way to do it. To return lists instead can + result in simpler code if that is desired. This return + type is the default.</p> + </item> + <tag><c>list</c></tag> + <item> + <p>Returns matching substrings as lists of characters + (Erlang <c>string()</c>s). If option <c>unicode</c> is + used in combination with the \C sequence in the + regular expression, a captured subpattern can contain + bytes that are not valid UTF-8 (\C matches bytes + regardless of character encoding). In that case the + <c>list</c> capturing can result in the same types of + tuples that + <seealso marker="unicode#characters_to_list/2"> + <c>unicode:characters_to_list/2</c></seealso> can return, + namely three-tuples with tag <c>incomplete</c> or + <c>error</c>, the successfully converted characters and + the invalid UTF-8 tail of the conversion as a binary. The + best strategy is to avoid using the \C sequence + when capturing lists.</p> + </item> + <tag><c>binary</c></tag> + <item> + <p>Returns matching substrings as binaries. If option + <c>unicode</c> is used, these binaries are in UTF-8. 
If + the \C sequence is used together with + <c>unicode</c>, the binaries can be invalid UTF-8.</p> + </item> + </taglist> + </item> + </taglist> + <p>In general, subpatterns that were not assigned a value in the + match are returned as the tuple <c>{-1,0}</c> when <c>type</c> is + <c>index</c>. Unassigned subpatterns are returned as the empty + binary or list, respectively, for other return types. Consider + the following regular expression:</p> + <code> +".*((?<FOO>abdd)|a(..d)).*"</code> + <p>There are three explicitly capturing subpatterns, where the + opening parenthesis position determines the order in the result, + hence <c>((?<FOO>abdd)|a(..d))</c> is subpattern index 1, + <c>(?<FOO>abdd)</c> is subpattern index 2, and <c>(..d)</c> + is subpattern index 3. When matched against the following + string:</p> + <code> +"ABCabcdABC"</code> + <p>the subpattern at index 2 does not match, as "abdd" is not + present in the string, but the complete pattern matches (because + of the alternative <c>a(..d)</c>). The subpattern at index 2 is + therefore unassigned and the default return value is:</p> + <code> +{match,[{0,10},{3,4},{-1,0},{4,3}]}</code> + <p>Setting the capture <c><anno>Type</anno></c> to <c>binary</c> + gives:</p> + <code> +{match,[<<"ABCabcdABC">>,<<"abcd">>,<<>>,<<"bcd">>]}</code> + <p>Here the empty binary (<c><<>></c>) represents the + unassigned subpattern. 
In the <c>binary</c> case, some information + about the matching is therefore lost, as + <c><<>></c> can + also be an empty string captured.</p> + <p>If differentiation between empty matches and non-existing + subpatterns is necessary, use the <c>type</c> <c>index</c> and do + the conversion to the final type in Erlang code.</p> + <p>When option <c>global</c> is specified, the <c>capture</c> + specification affects each match separately, so that:</p> + <code> +re:run("cacb","c(a|b)",[global,{capture,[1],list}]).</code> + <p>gives</p> + <code> +{match,[["a"],["b"]]}</code> + </item> </taglist> - </item> - </taglist> - <p>In general, subpatterns that were not assigned a value in the match are returned as the tuple <c>{-1,0}</c> when <c>type</c> is <c>index</c>. Unassigned subpatterns are returned as the empty binary or list, respectively, for other return types. Consider the regular expression:</p> -<code> ".*((?<FOO>abdd)|a(..d)).*"</code> - <p>There are three explicitly capturing subpatterns, where the opening parenthesis position determines the order in the result, hence <c>((?<FOO>abdd)|a(..d))</c> is subpattern index 1, <c>(?<FOO>abdd)</c> is subpattern index 2 and <c>(..d)</c> is subpattern index 3. When matched against the following string:</p> -<code> "ABCabcdABC"</code> - <p>the subpattern at index 2 won't match, as "abdd" is not present in the string, but the complete pattern matches (due to the alternative <c>a(..d)</c>. The subpattern at index 2 is therefore unassigned and the default return value will be:</p> -<code> {match,[{0,10},{3,4},{-1,0},{4,3}]}</code> - <p>Setting the capture <c><anno>Type</anno></c> to <c>binary</c> would give the following:</p> -<code> {match,[<<"ABCabcdABC">>,<<"abcd">>,<<>>,<<"bcd">>]}</code> - <p>where the empty binary (<c><<>></c>) represents the unassigned subpattern. 
In the <c>binary</c> case, some information about the matching is therefore lost, the <c><<>></c> might just as well be an empty string captured.</p> - <p>If differentiation between empty matches and non existing subpatterns is necessary, use the <c>type</c> <c>index</c> - and do the conversion to the final type in Erlang code.</p> - - <p>When the option <c>global</c> is given, the <c>capture</c> - specification affects each match separately, so that:</p> - - <code> re:run("cacb","c(a|b)",[global,{capture,[1],list}]).</code> - - <p>gives the result:</p> - - <code> {match,[["a"],["b"]]}</code> - - </item> - </taglist> - <p>The options solely affecting the compilation step are described in the <c>re:compile/2</c> function.</p> - </desc> - </func> - <func> - <name name="replace" arity="3"/> - <fsummary>Match a subject against regular expression and replace matching elements with Replacement</fsummary> - <desc> - <p>The same as <c>replace(<anno>Subject</anno>,<anno>RE</anno>,<anno>Replacement</anno>,[])</c>.</p> - </desc> - </func> - <func> - <name name="replace" arity="4"/> - <fsummary>Match a subject against regular expression and replace matching elements with Replacement</fsummary> - <desc> - <p>Replaces the matched part of the <c><anno>Subject</anno></c> string with the contents of <c><anno>Replacement</anno></c>.</p> - <p>The permissible options are the same as for <c>re:run/3</c>, except that the <c>capture</c> option is not allowed. - Instead a <c>{return, <anno>ReturnType</anno>}</c> is present. The default return type is <c>iodata</c>, constructed in a - way to minimize copying. The <c>iodata</c> result can be used directly in many I/O-operations. If a flat <c>list()</c> is - desired, specify <c>{return, list}</c> and if a binary is preferred, specify <c>{return, binary}</c>.</p> - - <p>As in the <c>re:run/3</c> function, an <c>mp()</c> compiled - with the <c>unicode</c> option requires the <c><anno>Subject</anno></c> to be - a Unicode <c>charlist()</c>. 
If compilation is done implicitly - and the <c>unicode</c> compilation option is given to this - function, both the regular expression and the <c><anno>Subject</anno></c> - should be given as valid Unicode <c>charlist()</c>s.</p> - - <p>The replacement string can contain the special character - <c>&</c>, which inserts the whole matching expression in the - result, and the special sequence <c>\</c>N (where N is an integer > 0), - <c>\g</c>N or <c>\g{</c>N<c>}</c> resulting in the subexpression number N will be - inserted in the result. If no subexpression with that number is - generated by the regular expression, nothing is inserted.</p> - <p>To insert an <c>&</c> or <c>\</c> in the result, precede it - with a <c>\</c>. Note that Erlang already gives a special - meaning to <c>\</c> in literal strings, so a single <c>\</c> - has to be written as <c>"\\"</c> and therefore a double <c>\</c> - as <c>"\\\\"</c>. Example:</p> - <code> re:replace("abcd","c","[&]",[{return,list}]).</code> - <p>gives</p> - <code> "ab[c]d"</code> - <p>while</p> - <code> re:replace("abcd","c","[\\&]",[{return,list}]).</code> - <p>gives</p> - <code> "ab[&]d"</code> - <p>As with <c>re:run/3</c>, compilation errors raise the <c>badarg</c> - exception, <c>re:compile/2</c> can be used to get more information - about the error.</p> + <p>For a description of options only affecting the compilation step, + see <seealso marker="#compile/2"><c>compile/2</c></seealso>.</p> </desc> </func> + <func> <name name="split" arity="2"/> - <fsummary>Split a string by tokens specified as a regular expression</fsummary> + <fsummary>Split a string by tokens specified as a regular expression. 
+ </fsummary> <desc> - <p>The same as <c>split(<anno>Subject</anno>,<anno>RE</anno>,[])</c>.</p> + <p>Same as <c>split(<anno>Subject</anno>, <anno>RE</anno>, [])</c>.</p> </desc> </func> <func> <name name="split" arity="3"/> <fsummary>Split a string by tokens specified as a regular expression</fsummary> - <type_desc variable="CompileOpt">See <seealso marker="#compile_options">compile/2</seealso> above.</type_desc> + <type_desc variable="CompileOpt">See <seealso marker="#compile_options"> + <c>compile/2</c></seealso>.</type_desc> <desc> - <p>This function splits the input into parts by finding tokens - according to the regular expression supplied.</p> - - <p>The splitting is done basically by running a global regexp match and - dividing the initial string wherever a match occurs. The matching part - of the string is removed from the output.</p> - - <p>As in the <c>re:run/3</c> function, an <c>mp()</c> compiled - with the <c>unicode</c> option requires the <c><anno>Subject</anno></c> to be - a Unicode <c>charlist()</c>. If compilation is done implicitly - and the <c>unicode</c> compilation option is given to this - function, both the regular expression and the <c><anno>Subject</anno></c> - should be given as valid Unicode <c>charlist()</c>s.</p> - - <p>The result is given as a list of "strings", the - preferred datatype given in the <c>return</c> option (default iodata).</p> - <p>If subexpressions are given in the regular expression, the - matching subexpressions are returned in the resulting list as - well. An example:</p> - -<code> re:split("Erlang","[ln]",[{return,list}]).</code> - - <p>will yield the result:</p> - -<code> ["Er","a","g"]</code> - - <p>while</p> - -<code> re:split("Erlang","([ln])",[{return,list}]).</code> - - <p>will yield</p> - -<code> ["Er","l","a","n","g"]</code> - - <p>The text matching the subexpression (marked by the parentheses - in the regexp) is - inserted in the result list where it was found. 
In effect this means - that concatenating the result of a split where the whole regexp is a - single subexpression (as in the example above) will always result in - the original string.</p> - - <p>As there is no matching subexpression for the last part in - the example (the "g"), there is nothing inserted after - that. To make the group of strings and the parts matching the - subexpressions more obvious, one might use the <c>group</c> - option, which groups together the part of the subject string with the - parts matching the subexpressions when the string was split:</p> - -<code> re:split("Erlang","([ln])",[{return,list},group]).</code> - - <p>gives:</p> - -<code> [["Er","l"],["a","n"],["g"]]</code> - - <p>Here the regular expression matched first the "l", - causing "Er" to be the first part in the result. When - the regular expression matched, the (only) subexpression was - bound to the "l", so the "l" is inserted - in the group together with "Er". The next match is of - the "n", making "a" the next part to be - returned. Since the subexpression is bound to the substring - "n" in this case, the "n" is inserted into - this group. The last group consists of the rest of the string, - as no more matches are found.</p> - - - <p>By default, all parts of the string, including the empty - strings, are returned from the function. For example:</p> - -<code> re:split("Erlang","[lg]",[{return,list}]).</code> - - <p>will return:</p> - -<code> ["Er","an",[]]</code> - - <p>since the matching of the "g" in the end of the string - leaves an empty rest which is also returned. This behaviour - differs from the default behaviour of the split function in - Perl, where empty strings at the end are by default removed. 
To - get the - "trimming" default behavior of Perl, specify - <c>trim</c> as an option:</p> - -<code> re:split("Erlang","[lg]",[{return,list},trim]).</code> - - <p>The result will be:</p> - -<code> ["Er","an"]</code> - - <p>The "trim" option in effect says; "give me as - many parts as possible except the empty ones", which might - be useful in some circumstances. You can also specify how many - parts you want, by specifying <c>{parts,</c>N<c>}</c>:</p> - -<code> re:split("Erlang","[lg]",[{return,list},{parts,2}]).</code> - - <p>This will give:</p> - -<code> ["Er","ang"]</code> - - <p>Note that the last part is "ang", not - "an", as we only specified splitting into two parts, - and the splitting stops when enough parts are given, which is - why the result differs from that of <c>trim</c>.</p> - - <p>More than three parts are not possible with this indata, so</p> - -<code> re:split("Erlang","[lg]",[{return,list},{parts,4}]).</code> - - <p>will give the same result as the default, which is to be - viewed as "an infinite number of parts".</p> - - <p>Specifying <c>0</c> as the number of parts gives the same - effect as the option <c>trim</c>. If subexpressions are - captured, empty subexpression matches at the end are also - stripped from the result if <c>trim</c> or <c>{parts,0}</c> is - specified.</p> + <p>Splits the input into parts by finding tokens according to the + regular expression supplied. The splitting is basically done by + running a global regular expression match and dividing the initial + string wherever a match occurs. The matching part of the string is + removed from the output.</p> + <p>As in <seealso marker="#run/3"><c>run/3</c></seealso>, an <c>mp()</c> + compiled with option <c>unicode</c> requires + <c><anno>Subject</anno></c> to be a Unicode <c>charlist()</c>. 
If + compilation is done implicitly and the <c>unicode</c> compilation + option is specified to this function, both the regular expression and + <c><anno>Subject</anno></c> are to be specified as valid Unicode + <c>charlist()</c>s.</p> + <p>The result is given as a list of "strings", the preferred + data type specified in option <c>return</c> (default + <c>iodata</c>).</p> + <p>If subexpressions are specified in the regular expression, the + matching subexpressions are returned in the resulting list as + well. For example:</p> + <code> +re:split("Erlang","[ln]",[{return,list}]).</code> + <p>gives</p> + <code> +["Er","a","g"]</code> + <p>while</p> + <code> +re:split("Erlang","([ln])",[{return,list}]).</code> + <p>gives</p> + <code> +["Er","l","a","n","g"]</code> + <p>The text matching the subexpression (marked by the parentheses in the + regular expression) is inserted in the result list where it was found. + This means that concatenating the result of a split where the whole + regular expression is a single subexpression (as in the last example) + always results in the original string.</p> + <p>As there is no matching subexpression for the last part in the + example (the "g"), nothing is inserted after that. To make + the group of strings and the parts matching the subexpressions more + obvious, one can use option <c>group</c>, which groups together the + part of the subject string with the parts matching the subexpressions + when the string was split:</p> + <code> +re:split("Erlang","([ln])",[{return,list},group]).</code> + <p>gives</p> + <code> +[["Er","l"],["a","n"],["g"]]</code> + <p>Here the regular expression first matched the "l", + causing "Er" to be the first part in the result. When + the regular expression matched, the (only) subexpression was + bound to the "l", so the "l" is inserted + in the group together with "Er". The next match is of + the "n", making "a" the next part to be + returned. 
As the subexpression is bound to substring + "n" in this case, the "n" is inserted into + this group. The last group consists of the remaining string, + as no more matches are found.</p> + <p>By default, all parts of the string, including the empty strings, + are returned from the function, for example:</p> + <code> +re:split("Erlang","[lg]",[{return,list}]).</code> + <p>gives</p> + <code> +["Er","an",[]]</code> + <p>as the matching of the "g" in the end of the string + leaves an empty rest, which is also returned. This behavior + differs from the default behavior of the split function in + Perl, where empty strings at the end are by default removed. To + get the "trimming" default behavior of Perl, specify + <c>trim</c> as an option:</p> + <code> +re:split("Erlang","[lg]",[{return,list},trim]).</code> + <p>gives</p> + <code> +["Er","an"]</code> + <p>The "trim" option says; "give me as many parts as + possible except the empty ones", which sometimes can be + useful. You can also specify how many parts you want, by specifying + <c>{parts,</c>N<c>}</c>:</p> + <code> +re:split("Erlang","[lg]",[{return,list},{parts,2}]).</code> + <p>gives</p> + <code> +["Er","ang"]</code> + <p>Notice that the last part is "ang", not + "an", as splitting was specified into two parts, + and the splitting stops when enough parts are given, which is + why the result differs from that of <c>trim</c>.</p> + <p>More than three parts are not possible with this indata, so</p> + <code> +re:split("Erlang","[lg]",[{return,list},{parts,4}]).</code> + <p>gives the same result as the default, which is to be + viewed as "an infinite number of parts".</p> + <p>Specifying <c>0</c> as the number of parts gives the same + effect as option <c>trim</c>. If subexpressions are + captured, empty subexpressions matched at the end are also + stripped from the result if <c>trim</c> or <c>{parts,0}</c> is + specified.</p> + <p>The <c>trim</c> behavior corresponds exactly to the Perl default. 
+ <c>{parts,N}</c>, where N is a positive integer, corresponds + exactly to the Perl behavior with a positive numerical third + parameter. The default behavior of <c>split/3</c> corresponds + to the Perl behavior when a negative integer is specified as + the third parameter for the Perl routine.</p> + <p>Summary of options not previously described for function + <c>run/3</c>:</p> + <taglist> + <tag><c>{return,<anno>ReturnType</anno>}</c></tag> + <item> + <p>Specifies how the parts of the original string are presented in + the result list. Valid types:</p> + <taglist> + <tag><c>iodata</c></tag> + <item> + <p>The variant of <c>iodata()</c> that gives the least copying + of data with the current implementation (often a binary, but + do not depend on it).</p></item> + <tag><c>binary</c></tag> + <item> + <p>All parts returned as binaries.</p></item> + <tag><c>list</c></tag> + <item> + <p>All parts returned as lists of characters + ("strings").</p> + </item> + </taglist> + </item> + <tag><c>group</c></tag> + <item> + <p>Groups together the part of the string with + the parts of the string matching the subexpressions of the + regular expression.</p> + <p>The return value from the function is in this case a + <c>list()</c> of <c>list()</c>s. Each sublist begins with the + string picked out of the subject string, followed by the parts + matching each of the subexpressions in order of occurrence in the + regular expression.</p> + </item> + <tag><c>{parts,N}</c></tag> + <item> + <p>Specifies the number of parts the subject string is to be + split into.</p> + <p>The number of parts is to be a positive integer for a specific + maximum number of parts, and <c>infinity</c> for the + maximum number of parts possible (the default). 
Specifying + <c>{parts,0}</c> gives as many parts as possible disregarding + empty parts at the end, the same as specifying <c>trim</c>.</p> + </item> + <tag><c>trim</c></tag> + <item> + <p>Specifies that empty parts at the end of the result list are + to be disregarded. The same as specifying <c>{parts,0}</c>. This + corresponds to the default behavior of the <c>split</c> + built-in function in Perl.</p> + </item> + </taglist> + </desc> + </func> + </funcs> - <p>If you are familiar with Perl, the <c>trim</c> - behaviour corresponds exactly to the Perl default, the - <c>{parts,N}</c> where N is a positive integer corresponds - exactly to the Perl behaviour with a positive numerical third - parameter and the default behaviour of <c>re:split/3</c> corresponds - to that when the Perl routine is given a negative integer as the - third parameter.</p> + <section> + <marker id="regexp_syntax"></marker> + <title>Perl-Like Regular Expression Syntax</title> + <p>The following sections contain reference material for the regular + expressions used by this module. The information is based on the PCRE + documentation, with changes where this module behaves differently to + the PCRE library.</p> + </section> - <p>Summary of options not previously described for the <c>re:run/3</c> function:</p> - <taglist> - <tag>{return,<anno>ReturnType</anno>}</tag> - <item><p>Specifies how the parts of the original string are presented in the result list. 
The possible types are:</p> - <taglist> - <tag>iodata</tag> - <item>The variant of <c>iodata()</c> that gives the least copying of data with the current implementation (often a binary, but don't depend on it).</item> - <tag>binary</tag> - <item>All parts returned as binaries.</item> - <tag>list</tag> - <item>All parts returned as lists of characters ("strings").</item> - </taglist> + <section> + <marker id="regexp_syntax_details"></marker> + <title>PCRE Regular Expression Details</title> + <p>The syntax and semantics of the regular expressions supported by PCRE are + described in detail in the following sections. Perl's regular expressions + are described in its own documentation, and regular expressions in general + are covered in many books, some with copious examples. + Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, + covers regular expressions in great detail. This description of the PCRE + regular expressions is intended as reference material.</p> + + <p>The reference material is divided into the following sections:</p> + + <list type="bulleted"> + <item><seealso marker="#sect1">Special Start-of-Pattern Items</seealso> </item> - <tag>group</tag> - <item> - - <p>Groups together the part of the string with - the parts of the string matching the subexpressions of the - regexp.</p> - <p>The return value from the function will in this case be a - <c>list()</c> of <c>list()</c>s. Each sublist begins with the - string picked out of the subject string, followed by the parts - matching each of the subexpressions in order of occurrence in the - regular expression.</p> - + <item><seealso marker="#sect2">Characters and Metacharacters</seealso> </item> - <tag>{parts,N}</tag> - <item> - - <p>Specifies the number of parts the subject string is to be - split into.</p> - - <p>The number of parts should be a positive integer for a specific maximum on the - number of parts and <c>infinity</c> for the maximum number of - parts possible (the default). 
Specifying <c>{parts,0}</c> gives as many parts as - possible disregarding empty parts at the end, the same as - specifying <c>trim</c></p> + <item><seealso marker="#sect3">Backslash</seealso></item> + <item><seealso marker="#sect4">Circumflex and Dollar</seealso></item> + <item><seealso marker="#sect5">Full Stop (Period, Dot) and \N</seealso> </item> - <tag>trim</tag> - <item> - - <p>Specifies that empty parts at the end of the result list are - to be disregarded. The same as specifying <c>{parts,0}</c>. This - corresponds to the default behaviour of the <c>split</c> - built in function in Perl.</p> + <item><seealso marker="#sect6">Matching a Single Data Unit</seealso> + </item> + <item><seealso marker="#sect7">Square Brackets and Character + Classes</seealso></item> + <item><seealso marker="#sect8">Posix Character Classes</seealso></item> + <item><seealso marker="#sect9">Vertical Bar</seealso></item> + <item><seealso marker="#sect10">Internal Option Setting</seealso></item> + <item><seealso marker="#sect11">Subpatterns</seealso></item> + <item><seealso marker="#sect12">Duplicate Subpattern Numbers</seealso> + </item> + <item><seealso marker="#sect13">Named Subpatterns</seealso></item> + <item><seealso marker="#sect14">Repetition</seealso></item> + <item><seealso marker="#sect15">Atomic Grouping and Possessive + Quantifiers</seealso></item> + <item><seealso marker="#sect16">Back References</seealso></item> + <item><seealso marker="#sect17">Assertions</seealso></item> + <item><seealso marker="#sect18">Conditional Subpatterns</seealso></item> + <item><seealso marker="#sect19">Comments</seealso></item> + <item><seealso marker="#sect20">Recursive Patterns</seealso></item> + <item><seealso marker="#sect21">Subpatterns as Subroutines</seealso> </item> - </taglist> + <item><seealso marker="#sect22">Oniguruma Subroutine Syntax</seealso> + </item> + <item><seealso marker="#sect23">Backtracking Control</seealso></item> + </list> + </section> - </desc> - </func> - </funcs> - 
<section> - <title>PERL LIKE REGULAR EXPRESSIONS SYNTAX</title> - <p><marker id="regexp_syntax"></marker> - The following sections contain reference material for the - regular expressions used by this module. The regular expression - reference is based on the PCRE documentation, with changes in - cases where the re module behaves differently to the PCRE library.</p> + <marker id="sect1"></marker> + <title>Special Start-of-Pattern Items</title> + <p>Some options that can be passed to <seealso marker="#compile/2"> + <c>compile/2</c></seealso> can also be set by special items at the start + of a pattern. These are not Perl-compatible, but are provided to make + these options accessible to pattern writers who are not able to change + the program that processes the pattern. Any number of these items can + appear, but they must all be together right at the start of the + pattern string, and the letters must be in upper case.</p> + + <p><em>UTF Support</em></p> + + <p>Unicode support is basically UTF-8 based. To use Unicode characters, you + either call <seealso marker="#compile/2"><c>compile/2</c></seealso> or + <seealso marker="#run/3"><c>run/3</c></seealso> with option + <c>unicode</c>, or the pattern must start with one of these special + sequences:</p> + + <code> +(*UTF8) +(*UTF)</code> + + <p>Both options give the same effect, the input string is interpreted as + UTF-8. Notice that with these instructions, the automatic conversion of + lists to UTF-8 is not performed by the <c>re</c> functions. Therefore, + using these sequences is not recommended. + Add option <c>unicode</c> when running + <seealso marker="#compile/2"><c>compile/2</c></seealso> instead.</p> + + <p>Some applications that allow their users to supply patterns can wish to + restrict them to non-UTF data for security reasons. 
If option + <c>never_utf</c> is set at compile time, (*UTF), and so on, are not + allowed, and their appearance causes an error.</p> + + <p><em>Unicode Property Support</em></p> + + <p>The following is another special sequence that can appear at the start of + a pattern:</p> + + <code> +(*UCP)</code> + + <p>This has the same effect as setting option <c>ucp</c>: it causes + sequences such as \d and \w to use Unicode properties to + determine character types, instead of recognizing only characters with + codes < 256 through a lookup table.</p> + + <p><em>Disabling Startup Optimizations</em></p> + + <p>If a pattern starts with <c>(*NO_START_OPT)</c>, + it has the same effect as + setting option <c>no_start_optimize</c> at compile time.</p> + + <p><em>Newline Conventions</em></p> + <marker id="newline_conventions"></marker> + + <p>PCRE supports five conventions for indicating line breaks in strings: a + single CR (carriage return) character, a single LF (line feed) character, + the two-character sequence CRLF, any of the three preceding, and any + Unicode newline sequence.</p> + + <p>A newline convention can also be specified by starting a pattern string + with one of the following five sequences:</p> + + <taglist> + <tag>(*CR)</tag><item>Carriage return</item> + <tag>(*LF)</tag><item>Line feed</item> + <tag>(*CRLF)</tag><item>Carriage return followed by + line feed</item> + <tag>(*ANYCRLF)</tag><item>Any of the three above</item> + <tag>(*ANY)</tag><item>All Unicode newline sequences</item> + </taglist> + + <p>These override the default and the options specified to + <seealso marker="#compile/2"><c>compile/2</c></seealso>. For example, the + following pattern changes the convention to CR:</p> + + <code> +(*CR)a.b</code> + + <p>This pattern matches <c>a\nb</c>, as LF is no longer a newline. + If more than one of them is present, the last one is used.</p> + + <p>The newline convention affects where the circumflex and dollar assertions + are true. 
It also affects the interpretation of the dot metacharacter when + <c>dotall</c> is not set, and the behavior of \N. However, it does not + affect what the \R escape sequence matches. By default, this is any + Unicode newline sequence, for Perl compatibility. However, this can be + changed; see the description of \R in section + <seealso marker="#newline_sequences">Newline Sequences</seealso>. A change + of the \R setting can be combined with a change of the newline + convention.</p> + + <p><em>Setting Match and Recursion Limits</em></p> + + <p>The caller of <seealso marker="#run/3"><c>run/3</c></seealso> can set a + limit on the number of times the internal match() function is called and + on the maximum depth of recursive calls. These facilities are provided to + catch runaway matches that are provoked by patterns with huge matching + trees (a typical example is a pattern with nested unlimited repeats) and + to avoid running out of system stack by too much recursion. When one of + these limits is reached, <c>pcre_exec()</c> gives an error return. The + limits can also be set by items at the start of the pattern of the + following forms:</p> + + <code> +(*LIMIT_MATCH=d) +(*LIMIT_RECURSION=d)</code> + + <p>Here d is any number of decimal digits. However, the value of the setting + must be less than the value set by the caller of <c>run/3</c> for it to + have any effect. That is, the pattern writer can lower the limit set by + the programmer, but not raise it. If there is more than one setting of one + of these limits, the lower value is used.</p> + + <p>The default value for both the limits is 10,000,000 in the Erlang + VM. 
Notice that the recursion limit does not affect the stack depth of the + VM, as PCRE for Erlang is compiled in such a way that the match function + never does recursion on the C stack.</p> </section> -<section><title>PCRE regular expression details</title> - -<p>The syntax and semantics of the regular expressions that are supported by PCRE -are described in detail below. Perl's regular expressions are described in its own documentation, and -regular expressions in general are covered in a number of books, some of which -have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", -published by O'Reilly, covers regular expressions in great detail. This -description of PCRE's regular expressions is intended as reference material.</p> -<p>The reference material is divided into the following sections:</p> -<list> -<item><seealso marker="#sect1">Special start-of-pattern items</seealso></item> -<item><seealso marker="#sect2">Characters and metacharacters</seealso></item> -<item><seealso marker="#sect3">Backslash</seealso></item> -<item><seealso marker="#sect4">Circumflex and dollar</seealso></item> -<item><seealso marker="#sect5">Full stop (period, dot) and \N</seealso></item> -<item><seealso marker="#sect6">Matching a single data unit</seealso></item> -<item><seealso marker="#sect7">Square brackets and character classes</seealso></item> -<item><seealso marker="#sect8">POSIX character classes</seealso></item> -<item><seealso marker="#sect9">Vertical bar</seealso></item> -<item><seealso marker="#sect10">Internal option setting</seealso></item> -<item><seealso marker="#sect11">Subpatterns</seealso></item> -<item><seealso marker="#sect12">Duplicate subpattern numbers</seealso></item> -<item><seealso marker="#sect13">Named subpatterns</seealso></item> -<item><seealso marker="#sect14">Repetition</seealso></item> -<item><seealso marker="#sect15">Atomic grouping and possessive quantifiers</seealso></item> -<item><seealso marker="#sect16">Back 
references</seealso></item> -<item><seealso marker="#sect17">Assertions</seealso></item> -<item><seealso marker="#sect18">Conditional subpatterns</seealso></item> -<item><seealso marker="#sect19">Comments</seealso></item> -<item><seealso marker="#sect20">Recursive patterns</seealso></item> -<item><seealso marker="#sect21">Subpatterns as subroutines</seealso></item> -<item><seealso marker="#sect22">Oniguruma subroutine syntax</seealso></item> -<!-- XXX C Interface -<item><seealso marker="#sect22">Callouts</seealso></item> ---> -<item><seealso marker="#sect23">Backtracking control</seealso></item> -</list> - -</section> - - -<section><marker id="sect1"></marker><title>Special start-of-pattern items</title> - -<p>A number of options that can be passed to <c>re:compile/2</c> can also be set -by special items at the start of a pattern. These are not Perl-compatible, but -are provided to make these options accessible to pattern writers who are not -able to change the program that processes the pattern. Any number of these -items may appear, but they must all be together right at the start of the -pattern string, and the letters must be in upper case.</p> - -<p><em>UTF support</em></p> -<p> -Unicode support is basically UTF-8 based. To use Unicode characters, you either -call <c>re:compile/2</c>/<c>re:run/3</c> with the <c>unicode</c> option, or the - pattern must start with one of these special sequences:</p> -<quote> -<p> (*UTF8)</p> -<p> (*UTF)</p> -</quote> - -<p>Both options give the same effect, the input string is interpreted -as UTF-8. Note that with these instructions, the automatic conversion -of lists to UTF-8 is not performed by the <c>re</c> functions, why -using these options is not recommended. Add the <c>unicode</c> option -when running <c>re:compile/2</c> instead.</p> - -<p> -Some applications that allow their users to supply patterns may wish to -restrict them to non-UTF data for security reasons. 
If the <c>never_utf</c> -option is set at compile time, (*UTF) etc. are not allowed, and their -appearance causes an error. -</p> - -<p><em>Unicode property support</em></p> -<p>Another special sequence that may appear at the start of a pattern is</p> -<quote> -<p> (*UCP)</p> -</quote> -<p>This has the same effect as setting the <c>ucp</c> option: it causes sequences -such as \d and \w to use Unicode properties to determine character types, -instead of recognizing only characters with codes less than 256 via a lookup -table. -</p> - -<p><em>Disabling start-up optimizations</em></p> -<p> -If a pattern starts with (*NO_START_OPT), it has the same effect as setting the -<c>no_Start_optimize</c> option at compile time.</p> - -<p><em>Newline conventions</em></p> - -<p>PCRE supports -five -different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF -, any of the three preceding, or any -Unicode newline sequence.</p> - -<p>It is also possible to specify a newline convention by starting a pattern -string with one of the following five sequences:</p> - -<taglist> - <tag>(*CR)</tag> <item>carriage return</item> - <tag>(*LF)</tag> <item>linefeed</item> - <tag>(*CRLF)</tag> <item>carriage return, followed by linefeed</item> - <tag>(*ANYCRLF)</tag> <item>any of the three above</item> - <tag>(*ANY)</tag> <item>all Unicode newline sequences</item> -</taglist> - -<p>These override the default and the options given to <c>re:compile/2</c>. For -example, the pattern:</p> - -<quote> -<p> (*CR)a.b</p> -</quote> - -<p>changes the convention to CR. That pattern matches "a\nb" because LF is no -longer a newline. If more than one of them is present, the last one -is used.</p> - -<p>The newline convention affects where the circumflex and dollar assertions are -true. It also affects the interpretation of the dot metacharacter when -<c>dotall</c> is not set, and the behaviour of \N. 
However, it does not affect -what the \R escape sequence matches. By default, this is any Unicode newline -sequence, for Perl compatibility. However, this can be changed; see the -description of \R in the section entitled - -<em>"Newline sequences"</em> - -below. A change of \R setting can be combined with a change of newline -convention.</p> - -<p><em>Setting match and recursion limits</em></p> - -<p>The caller of <c>re:run/3</c> can set a limit on the number of times the internal match() function is called and on the maximum depth of recursive calls. These facilities are provided to catch runaway matches that are provoked by patterns with huge matching trees (a typical example is a pattern with nested unlimited repeats) and to avoid running out of system stack by too much recursion. When one of these limits is reached, pcre_exec() gives an error return. The limits can also be set by items at the start of the pattern of the form</p> -<quote> -<p> (*LIMIT_MATCH=d)</p> -<p> (*LIMIT_RECURSION=d)</p> -</quote> -<p>where d is any number of decimal digits. However, the value of the setting must be less than the value set by the caller of <c>re:run/3</c> for it to have any effect. In other words, the pattern writer can lower the limit set by the programmer, but not raise it. If there is more than one setting of one of these limits, the lower value is used.</p> - -<p>The current default value for both the limits are 10000000 in the Erlang -VM. Note that the recursion limit does not actually affect the stack -depth of the VM, as PCRE for Erlang is compiled in such a way that the -match function never does recursion on the "C-stack".</p> - -</section> - -<section><marker id="sect2"></marker><title>Characters and metacharacters</title> -<!-- .rs --> - -<p>A regular expression is a pattern that is matched against a subject -string from left to right. Most characters stand for themselves in a -pattern, and match the corresponding characters in the subject. 
As a -trivial example, the pattern</p> - -<quote> -<p> The quick brown fox</p> -</quote> - -<p>matches a portion of a subject string that is identical to -itself. When caseless matching is specified (the <c>caseless</c> -option), letters are matched independently of case.</p> - -<p>The power of regular expressions comes from the ability to include -alternatives and repetitions in the pattern. These are encoded in the -pattern by the use of <em>metacharacters</em>, which do not stand for -themselves but instead are interpreted in some special way.</p> - -<p>There are two different sets of metacharacters: those that are recognized -anywhere in the pattern except within square brackets, and those that are -recognized within square brackets. Outside square brackets, the metacharacters -are as follows:</p> - -<taglist> - <tag>\</tag> <item>general escape character with several uses</item> - <tag>^</tag> <item>assert start of string (or line, in multiline mode)</item> - <tag>$</tag> <item>assert end of string (or line, in multiline mode)</item> - <tag>.</tag> <item>match any character except newline (by default)</item> - <tag>[</tag> <item>start character class definition</item> - <tag>|</tag> <item>start of alternative branch</item> - <tag>(</tag> <item>start subpattern</item> - <tag>)</tag> <item>end subpattern</item> - <tag>?</tag> <item>extends the meaning of (, - also 0 or 1 quantifier, - also quantifier minimizer</item> - <tag>*</tag> <item>0 or more quantifier</item> - <tag>+</tag> <item>1 or more quantifier, - also "possessive quantifier"</item> - <tag>{</tag> <item>start min/max quantifier</item> -</taglist> - -<p>Part of a pattern that is in square brackets is called a "character class". 
In -a character class the only metacharacters are:</p> - -<taglist> - <tag>\</tag> <item>general escape character</item> - <tag>^</tag> <item>negate the class, but only if the first character</item> - <tag>-</tag> <item>indicates character range</item> - <tag>[</tag> <item>POSIX character class (only if followed by POSIX - syntax)</item> - <tag>]</tag> <item>terminates the character class</item> + <section> + <marker id="sect2"></marker> + <title>Characters and Metacharacters</title> + <!-- .rs --> + <p>A regular expression is a pattern that is matched against a subject + string from left to right. Most characters stand for themselves in a + pattern and match the corresponding characters in the subject. As a + trivial example, the following pattern matches a portion of a subject + string that is identical to itself:</p> + + <code> +The quick brown fox</code> + + <p>When caseless matching is specified (option <c>caseless</c>), letters + are matched independently of case.</p> + + <p>The power of regular expressions comes from the ability to include + alternatives and repetitions in the pattern. These are encoded in the + pattern by the use of <em>metacharacters</em>, which do not stand for + themselves but instead are interpreted in some special way.</p> + + <p>Two sets of metacharacters exist: those that are recognized anywhere in + the pattern except within square brackets, and those that are recognized + within square brackets. 
Outside square brackets, the metacharacters are + as follows:</p> + + <taglist> + <tag>\</tag><item>General escape character with many uses</item> + <tag>^</tag><item>Assert start of string (or line, in multiline mode) + </item> + <tag>$</tag><item>Assert end of string (or line, in multiline mode)</item> + <tag>.</tag><item>Match any character except newline (by default)</item> + <tag>[</tag><item>Start character class definition</item> + <tag>|</tag><item>Start of alternative branch</item> + <tag>(</tag><item>Start subpattern</item> + <tag>)</tag><item>End subpattern</item> + <tag>?</tag><item>Extends the meaning of (, also 0 or 1 quantifier, also + quantifier minimizer</item> + <tag>*</tag><item>0 or more quantifier</item> + <tag>+</tag><item>1 or more quantifier, also "possessive quantifier" + </item> + <tag>{</tag><item>Start min/max quantifier</item> </taglist> -<p>The following sections describe the use of each of the metacharacters.</p> - - -</section> - -<section><marker id="sect3"></marker><title>Backslash</title> + <p>Part of a pattern within square brackets is called a "character class". + The following are the only metacharacters in a character class:</p> + <taglist> + <tag>\</tag><item>General escape character</item> + <tag>^</tag><item>Negate the class, but only if the first character</item> + <tag>-</tag><item>Indicates character range</item> + <tag>[</tag><item>POSIX character class (only if followed by POSIX syntax) + </item> + <tag>]</tag><item>Terminates the character class</item> + </taglist> -<p>The backslash character has several uses. Firstly, if it is followed by a -character that is not a number or a letter, it takes away any special meaning that character -may have. This use of backslash as an escape character applies both inside and -outside character classes.</p> - -<p>For example, if you want to match a * character, you write \* in the pattern. 
-This escaping action applies whether or not the following character would -otherwise be interpreted as a metacharacter, so it is always safe to precede a -non-alphanumeric with backslash to specify that it stands for itself. In -particular, if you want to match a backslash, you write \\.</p> - -<p>In <c>unicode</c> mode, only ASCII numbers and letters have any special meaning after a -backslash. All other characters (in particular, those whose codepoints are -greater than 127) are treated as literals.</p> + <p>The following sections describe the use of each metacharacter.</p> + </section> -<p>If a pattern is compiled with the <c>extended</c> option, white space in the -pattern (other than in a character class) and characters between a # outside -a character class and the next newline are ignored. An escaping backslash can -be used to include a white space or # character as part of the pattern.</p> + <section> + <marker id="sect3"></marker> + <title>Backslash</title> + <p>The backslash character has many uses. First, if it is followed by a + character that is not a number or a letter, it takes away any special + meaning that character can have. This use of backslash as an escape + character applies both inside and outside character classes.</p> + + <p>For example, if you want to match a * character, you write \* in the + pattern. This escaping action applies whether or not the following + character would otherwise be interpreted as a metacharacter, so it is + always safe to precede a non-alphanumeric with backslash to specify that + it stands for itself. In particular, if you want to match a backslash, + write \\.</p> + + <p>In <c>unicode</c> mode, only ASCII numbers and letters have any special + meaning after a backslash. 
All other characters (in particular, those + whose code points are > 127) are treated as literals.</p> + + <p>If a pattern is compiled with option <c>extended</c>, whitespace in the + pattern (other than in a character class) and characters between a # + outside a character class and the next newline are ignored. An escaping + backslash can be used to include a whitespace or # character as part of + the pattern.</p> + + <p>To remove the special meaning from a sequence of characters, put them + between \Q and \E. This is different from Perl in that $ and @ are + handled as literals in \Q...\E sequences in PCRE, while $ and @ cause + variable interpolation in Perl. Notice the following examples:</p> -<p>If you want to remove the special meaning from a sequence of characters, you -can do so by putting them between \Q and \E. This is different from Perl in -that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in -Perl, $ and @ cause variable interpolation. Note the following examples:</p> <code type="none"> - Pattern PCRE matches Perl matches - - \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz - \Qabc\$xyz\E abc\$xyz abc\$xyz - \Qabc\E\$\Qxyz\E abc$xyz abc$xyz</code> - - -<p>The \Q...\E sequence is recognized both inside and outside -character classes. An isolated \E that is not preceded by \Q is -ignored. If \Q is not followed by \E later in the pattern, the literal -interpretation continues to the end of the pattern (that is, \E is -assumed at the end). If the isolated \Q is inside a character class, -this causes an error, because the character class is not -terminated.</p> - -<p><em>Non-printing characters</em></p> - -<p>A second use of backslash provides a way of encoding non-printing characters -in patterns in a visible manner. 
There is no restriction on the appearance of -non-printing characters, apart from the binary zero that terminates a pattern, -but when a pattern is being prepared by text editing, it is often easier to use -one of the following escape sequences than the binary character it represents:</p> - -<taglist> - <tag>\a</tag> <item>alarm, that is, the BEL character (hex 07)</item> - <tag>\cx</tag> <item>"control-x", where x is any ASCII character</item> - <tag>\e </tag> <item>escape (hex 1B)</item> - <tag>\f</tag> <item>form feed (hex 0C)</item> - <tag>\n</tag> <item>linefeed (hex 0A)</item> - <tag>\r</tag> <item>carriage return (hex 0D)</item> - <tag>\t </tag> <item>tab (hex 09)</item> - <tag>\ddd</tag> <item>character with octal code ddd, or back reference</item> - <tag>\xhh </tag> <item>character with hex code hh</item> - <tag>\x{hhh..}</tag> <item>character with hex code hhh..</item> -</taglist> - -<p>The precise effect of \cx on ASCII characters is as follows: if x is a lower -case letter, it is converted to upper case. Then bit 6 of the character (hex -40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A), -but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the -data item (byte or 16-bit value) following \c has a value greater than 127, a -compile-time error occurs. This locks out non-ASCII characters in all modes.</p> - -<p>The \c facility was designed for use with ASCII characters, but with the -extension to Unicode it is even less useful than it once was.</p> - -<p>By default, after \x, from zero to two hexadecimal digits are read (letters -can be in upper or lower case). 
Any number of hexadecimal digits may appear -between \x{ and }, but the character code is constrained as follows:</p> -<taglist> - <tag>8-bit non-Unicode mode</tag> <item>less than 0x100</item> - <tag>8-bit UTF-8 mode</tag> <item>less than 0x10ffff and a valid codepoint</item> -</taglist> -<p>Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called -"surrogate" codepoints), and 0xffef.</p> - -<p>If characters other than hexadecimal digits appear between \x{ and }, or if -there is no terminating }, this form of escape is not recognized. Instead, the -initial \x will be interpreted as a basic hexadecimal escape, with no -following digits, giving a character whose value is zero.</p> - -<p>Characters whose value is less than 256 can be defined by either of the two -syntaxes for \x. There is no difference in the way they are handled. For -example, \xdc is exactly the same as \x{dc}.</p> - -<p>After \0 up to two further octal digits are read. If there are fewer than two -digits, just those that are present are used. Thus the sequence \0\x\07 -specifies two binary zeros followed by a BEL character (code value 7). Make -sure you supply two digits after the initial zero if the pattern character that -follows is itself an octal digit.</p> - -<p>The handling of a backslash followed by a digit other than 0 is complicated. -Outside a character class, PCRE reads it and any following digits as a decimal -number. If the number is less than 10, or if there have been at least that many -previous capturing left parentheses in the expression, the entire sequence is -taken as a <em>back reference</em>. A description of how this works is given -later, following the discussion of parenthesized subpatterns.</p> - - -<p>Inside a character class, or if the decimal number is greater than 9 and there -have not been that many capturing subpatterns, PCRE re-reads up to three octal -digits following the backslash, and uses them to generate a data character. 
Any -subsequent digits stand for themselves. The value of the character is -constrained in the same way as characters specified in hexadecimal. -For example:</p> - -<taglist> - <tag>\040</tag> <item>is another way of writing a ASCII space</item> - - <tag>\40</tag> <item>is the same, provided there are fewer than 40 - previous capturing subpatterns</item> - <tag>\7</tag> <item>is always a back reference</item> - - <tag>\11</tag> <item> might be a back reference, or another way of - writing a tab</item> - <tag>\011</tag> <item>is always a tab</item> - <tag>\0113</tag> <item>is a tab followed by the character "3"</item> - - <tag>\113</tag> <item>might be a back reference, otherwise the - character with octal code 113</item> - - <tag>\377</tag> <item>might be a back reference, otherwise - the value 255 (decimal)</item> - - <tag>\81</tag> <item>is either a back reference, or a binary zero - followed by the two characters "8" and "1"</item> -</taglist> - -<p>Note that octal values of 100 or greater must not be introduced by -a leading zero, because no more than three octal digits are ever -read.</p> - -<p>All the sequences that define a single character value can be used both inside -and outside character classes. In addition, inside a character class, \b is -interpreted as the backspace character (hex 08).</p> -<p>\N is not allowed in a character class. \B, \R, and \X are not special -inside a character class. Like other unrecognized escape sequences, they are -treated as the literal characters "B", "R", and "X". Outside a character class, these -sequences have different meanings.</p> - -<p><em>Unsupported escape sequences</em></p> - -<p>In Perl, the sequences \l, \L, \u, and \U are recognized by its string -handler and used to modify the case of following characters. 
PCRE -does not support these escape sequences.</p> - -<p><em>Absolute and relative back references</em></p> - -<p>The sequence \g followed by an unsigned or a negative number, -optionally enclosed in braces, is an absolute or relative back -reference. A named back reference can be coded as \g{name}. Back -references are discussed later, following the discussion of -parenthesized subpatterns.</p> - -<p><em>Absolute and relative subroutine calls</em></p> -<p>For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a "subroutine". Details are discussed -later. -Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <em>not</em> -synonymous. The former is a back reference; the latter is a -subroutine call.</p> - -<p><em>Generic character types</em></p> - -<p>Another use of backslash is for specifying generic character types:</p> - -<taglist> - <tag>\d</tag> <item>any decimal digit</item> - <tag>\D</tag> <item>any character that is not a decimal digit</item> - <tag>\h</tag> <item>any horizontal white space character</item> - <tag>\H</tag> <item>any character that is not a horizontal white space character</item> - <tag>\s</tag> <item>any white space character</item> - <tag>\S</tag> <item>any character that is not a white space character</item> - <tag>\v</tag> <item>any vertical white space character</item> - <tag>\V</tag> <item>any character that is not a vertical white space character</item> - <tag>\w</tag> <item>any "word" character</item> - <tag>\W</tag> <item>any "non-word" character</item> -</taglist> - -<p>There is also the single sequence \N, which matches a non-newline character. -This is the same as the "." metacharacter -when <c>dotall</c> is not set. 
Perl also uses \N to match characters by name; -PCRE does not support this.</p> - -<p>Each pair of lower and upper case escape sequences partitions the complete set -of characters into two disjoint sets. Any given character matches one, and only -one, of each pair. The sequences can appear both inside and outside character -classes. They each match one character of the appropriate type. If the current -matching point is at the end of the subject string, all of them fail, because -there is no character to match.</p> - -<p>For compatibility with Perl, \s does not match the VT character (code 11). -This makes it different from the POSIX "space" class. The \s characters -are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is -included in a Perl script, \s may match the VT character. In PCRE, it never -does.</p> - -<p>A "word" character is an underscore or any character that is a letter or digit. -By default, the definition of letters and digits is controlled by PCRE's -low-valued character tables, in Erlang's case (and without the <c>unicode</c> option), -the ISO-Latin-1 character set.</p> - -<p>By default, in <c>unicode</c> mode, characters with values greater than 255, -i.e. all characters outside the ISO-Latin-1 character set, never match -\d, \s, or \w, and always match \D, \S, and \W. These sequences retain -their original meanings from before UTF support was available, mainly for -efficiency reasons. However, if the <c>ucp</c> option is set, the behaviour is changed so that Unicode -properties are used to determine character types, as follows:</p> -<taglist> - <tag>\d</tag> <item>any character that \p{Nd} matches (decimal digit)</item> - <tag>\s</tag> <item>any character that \p{Z} matches, plus HT, LF, FF, CR)</item> - <tag> \w</tag> <item>any character that \p{L} or \p{N} matches, plus underscore)</item> -</taglist> -<p>The upper case escapes match the inverse sets of characters. 
Note that \d -matches only decimal digits, whereas \w matches any Unicode digit, as well as -any Unicode letter, and underscore. Note also that <c>ucp</c> affects \b, and -\B because they are defined in terms of \w and \W. Matching these sequences -is noticeably slower when <c>ucp</c> is set.</p> - -<p>The sequences \h, \H, \v, and \V are features that were added to Perl at -release 5.10. In contrast to the other sequences, which match only ASCII -characters by default, these always match certain high-valued codepoints, -whether or not <c>ucp</c> is set. The horizontal space characters are:</p> - -<taglist> - <tag>U+0009</tag> <item>Horizontal tab (HT)</item> - <tag>U+0020</tag> <item>Space</item> - <tag>U+00A0</tag> <item>Non-break space</item> - <tag>U+1680</tag> <item>Ogham space mark</item> - <tag>U+180E</tag> <item>Mongolian vowel separator</item> - <tag>U+2000</tag> <item>En quad</item> - <tag>U+2001</tag> <item>Em quad</item> - <tag>U+2002</tag> <item>En space</item> - <tag>U+2003</tag> <item>Em space</item> - <tag>U+2004</tag> <item>Three-per-em space</item> - <tag>U+2005</tag> <item>Four-per-em space</item> - <tag>U+2006</tag> <item>Six-per-em space</item> - <tag>U+2007</tag> <item>Figure space</item> - <tag>U+2008</tag> <item>Punctuation space</item> - <tag>U+2009</tag> <item>Thin space</item> - <tag>U+200A</tag> <item>Hair space</item> - <tag>U+202F</tag> <item>Narrow no-break space</item> - <tag>U+205F</tag> <item>Medium mathematical space</item> - <tag>U+3000</tag> <item>Ideographic space</item> -</taglist> - -<p>The vertical space characters are:</p> - -<taglist> - <tag>U+000A</tag> <item>Linefeed (LF)</item> - <tag>U+000B</tag> <item>Vertical tab (VT)</item> - <tag>U+000C</tag> <item>Form feed (FF)</item> - <tag>U+000D</tag> <item>Carriage return (CR)</item> - <tag>U+0085</tag> <item>Next line (NEL)</item> - <tag>U+2028</tag> <item>Line separator</item> - <tag>U+2029</tag> <item>Paragraph separator</item> -</taglist> - -<p>In 8-bit, non-UTF-8 mode, 
only the characters with codepoints less than 256 are -relevant.</p> - -<p><em>Newline sequences</em></p> - -<p>Outside a character class, by default, the escape sequence \R matches any -Unicode newline sequence. In non-UTF-8 mode \R is -equivalent to the following:</p> - -<quote><p> (?>\r\n|\n|\x0b|\f|\r|\x85)</p></quote> - -<p>This is an example of an "atomic group", details of which are given below.</p> - -<p>This particular group matches either the two-character sequence CR followed by -LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, -U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next -line, U+0085). The two-character sequence is treated as a single unit that -cannot be split.</p> - -<p>In Unicode mode, two additional characters whose codepoints are greater than 255 -are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). -Unicode character property support is not needed for these characters to be -recognized.</p> - - -<p>It is possible to restrict \R to match only CR, LF, or CRLF (instead of the -complete set of Unicode line endings) by setting the option <c>bsr_anycrlf</c> -either at compile time or when the pattern is matched. (BSR is an abbreviation -for "backslash R".) This can be made the default when PCRE is built; if this is -the case, the other behaviour can be requested via the <c>bsr_unicode</c> option. -It is also possible to specify these settings by starting a pattern string with -one of the following sequences:</p> - -<p> (*BSR_ANYCRLF) CR, LF, or CRLF only - (*BSR_UNICODE) any Unicode newline sequence</p> - -<p>These override the default and the options given to the compiling function, but -they can themselves be overridden by options given to a matching function. Note -that these special settings, which are not Perl-compatible, are recognized only -at the very start of a pattern, and that they must be in upper case. 
If more -than one of them is present, the last one is used. They can be combined with a -change of newline convention; for example, a pattern can start with:</p> - -<p> (*ANY)(*BSR_ANYCRLF)</p> - -<p>They can also be combined with the (*UTF8), (*UTF) or -(*UCP) special sequences. Inside a character class, \R is treated as an -unrecognized escape sequence, and so matches the letter "R" by default.</p> - -<p><em>Unicode character properties</em></p> - -<p>Three additional -escape sequences that match characters with specific properties are available. -When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing -characters whose codepoints are less than 256, but they do work in this mode. -The extra escape sequences are:</p> -<taglist> -<tag>\p{<em>xx</em>}</tag> <item>a character with the <em>xx</em> property</item> -<tag>\P{<em>xx</em>}</tag> <item>a character without the <em>xx</em> property</item> -<tag>\X</tag> <item>a Unicode extended grapheme cluster</item> -</taglist> - -<p>The property names represented by <i>xx</i> above are limited to the Unicode -script names, the general category properties, "Any", which matches any -character (including newline), and some special PCRE properties (described -in the next section). -Other Perl properties such as "InMusicalSymbols" are not currently supported by -PCRE. Note that \P{Any} does not match any characters, so always causes a -match failure.</p> - -<p>Sets of Unicode characters are defined as belonging to certain scripts. A -character from one of these sets can be matched using a script name. For -example:</p> - -<p> \p{Greek} - \P{Han}</p> - -<p>Those that are not part of an identified script are lumped together as -"Common". 
The current list of scripts is:</p> - -<list> -<item>Arabic</item> -<item>Armenian</item> -<item>Avestan</item> -<item>Balinese</item> -<item>Bamum</item> -<item>Batak</item> -<item>Bengali</item> -<item>Bopomofo</item> -<item>Braille</item> -<item>Buginese</item> -<item>Buhid</item> -<item>Canadian_Aboriginal</item> -<item>Carian</item> -<item>Chakma</item> -<item>Cham</item> -<item>Cherokee</item> -<item>Common</item> -<item>Coptic</item> -<item>Cuneiform</item> -<item>Cypriot</item> -<item>Cyrillic</item> -<item>Deseret</item> -<item>Devanagari</item> -<item>Egyptian_Hieroglyphs</item> -<item>Ethiopic</item> -<item>Georgian</item> -<item>Glagolitic</item> -<item>Gothic</item> -<item>Greek</item> -<item>Gujarati</item> -<item>Gurmukhi</item> -<item>Han</item> -<item>Hangul</item> -<item>Hanunoo</item> -<item>Hebrew</item> -<item>Hiragana</item> -<item>Imperial_Aramaic</item> -<item>Inherited</item> -<item>Inscriptional_Pahlavi</item> -<item>Inscriptional_Parthian</item> -<item>Javanese</item> -<item>Kaithi</item> -<item>Kannada</item> -<item>Katakana</item> -<item>Kayah_Li</item> -<item>Kharoshthi</item> -<item>Khmer</item> -<item>Lao</item> -<item>Latin</item> -<item>Lepcha</item> -<item>Limbu</item> -<item>Linear_B</item> -<item>Lisu</item> -<item>Lycian</item> -<item>Lydian</item> -<item>Malayalam</item> -<item>Mandaic</item> -<item>Meetei_Mayek</item> -<item>Meroitic_Cursive</item> -<item>Meroitic_Hieroglyphs</item> -<item>Miao</item> -<item>Mongolian</item> -<item>Myanmar</item> -<item>New_Tai_Lue</item> -<item>Nko</item> -<item>Ogham</item> -<item>Old_Italic</item> -<item>Old_Persian</item> -<item>Oriya</item> -<item>Old_South_Arabian</item> -<item>Old_Turkic</item> -<item>Ol_Chiki</item> -<item>Osmanya</item> -<item>Phags_Pa</item> -<item>Phoenician</item> -<item>Rejang</item> -<item>Runic</item> -<item>Samaritan</item> -<item>Saurashtra</item> -<item>Sharada</item> -<item>Shavian</item> -<item>Sinhala</item> -<item>Sora_Sompeng</item> 
-<item>Sundanese</item> -<item>Syloti_Nagri</item> -<item>Syriac</item> -<item>Tagalog</item> -<item>Tagbanwa</item> -<item>Tai_Le</item> -<item>Tai_Tham</item> -<item>Tai_Viet</item> -<item>Takri</item> -<item>Tamil</item> -<item>Telugu</item> -<item>Thaana</item> -<item>Thai</item> -<item>Tibetan</item> -<item>Tifinagh</item> -<item>Ugaritic</item> -<item>Vai</item> -<item>Yi</item> -</list> - -<p>Each character has exactly one Unicode general category property, specified by -a two-letter abbreviation. For compatibility with Perl, negation can be -specified by including a circumflex between the opening brace and the property -name. For example, \p{^Lu} is the same as \P{Lu}.</p> - -<p>If only one letter is specified with \p or \P, it includes all the general -category properties that start with that letter. In this case, in the absence -of negation, the curly brackets in the escape sequence are optional; these two -examples have the same effect:</p> - -<list><item>\p{L}</item> - <item>\pL</item></list> - -<p>The following general category property codes are supported:</p> - -<taglist> - <tag>C</tag> <item>Other</item> - <tag>Cc</tag> <item>Control</item> - <tag>Cf</tag> <item>Format</item> - <tag>Cn</tag> <item>Unassigned</item> - <tag>Co</tag> <item>Private use</item> - <tag>Cs</tag> <item>Surrogate</item> -</taglist> - -<taglist> - <tag>L</tag> <item>Letter</item> - <tag>Ll</tag> <item>Lower case letter</item> - <tag>Lm</tag> <item>Modifier letter</item> - <tag>Lo</tag> <item>Other letter</item> - <tag>Lt</tag> <item>Title case letter</item> - <tag>Lu</tag> <item>Upper case letter</item> -</taglist> - - -<taglist> - <tag>M</tag> <item>Mark</item> - <tag>Mc</tag> <item>Spacing mark</item> - <tag>Me</tag> <item>Enclosing mark</item> - <tag>Mn</tag> <item>Non-spacing mark</item> -</taglist> +Pattern PCRE matches Perl matches + +\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz +\Qabc\$xyz\E abc\$xyz abc\$xyz +\Qabc\E\$\Qxyz\E abc$xyz abc$xyz</code> + + + 
<p>The \Q...\E sequence is recognized both inside and outside character + classes. An isolated \E that is not preceded by \Q is ignored. If \Q is + not followed by \E later in the pattern, the literal interpretation + continues to the end of the pattern (that is, \E is assumed at the end). + If the isolated \Q is inside a character class, this causes an error, as + the character class is not terminated.</p> + + <p><em>Non-Printing Characters</em></p> + <marker id="non_printing_characters"></marker> + + <p>A second use of backslash provides a way of encoding non-printing + characters in patterns in a visible manner. There is no restriction on the + appearance of non-printing characters, apart from the binary zero that + terminates a pattern. When a pattern is prepared by text editing, it is + often easier to use one of the following escape sequences than the binary + character it represents:</p> + + <taglist> + <tag>\a</tag><item>Alarm, that is, the BEL character (hex 07)</item> + <tag>\cx</tag><item>"Control-x", where x is any ASCII character</item> + <tag>\e</tag><item>Escape (hex 1B)</item> + <tag>\f</tag><item>Form feed (hex 0C)</item> + <tag>\n</tag><item>Line feed (hex 0A)</item> + <tag>\r</tag><item>Carriage return (hex 0D)</item> + <tag>\t</tag><item>Tab (hex 09)</item> + <tag>\ddd</tag><item>Character with octal code ddd, or back reference + </item> + <tag>\xhh</tag><item>Character with hex code hh</item> + <tag>\x{hhh..}</tag><item>Character with hex code hhh..</item> + </taglist> + + <p>The precise effect of \cx on ASCII characters is as follows: if x is a + lowercase letter, it is converted to upper case. Then bit 6 of the + character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A + (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes + hex 7B (; is 3B). If the data item (byte or 16-bit value) following \c + has a value > 127, a compile-time error occurs. 
This locks out + non-ASCII characters in all modes.</p> + + <p>The \c facility was designed for use with ASCII characters, but with the + extension to Unicode it is even less useful than it once was.</p> + + <p>By default, after \x, from zero to two hexadecimal digits are read + (letters can be in upper or lower case). Any number of hexadecimal digits + can appear between \x{ and }, but the character code is constrained as + follows:</p> + + <taglist> + <tag>8-bit non-Unicode mode</tag> + <item>< 0x100</item> + <tag>8-bit UTF-8 mode</tag> + <item>< 0x10ffff and a valid code point</item> + </taglist> + + <p>Invalid Unicode code points are the range 0xd800 to 0xdfff (the so-called + "surrogate" code points), and 0xffef.</p> + + <p>If characters other than hexadecimal digits appear between \x{ and }, + or if there is no terminating }, this form of escape is not recognized. + Instead, the initial \x is interpreted as a basic hexadecimal escape, + with no following digits, giving a character whose value is zero.</p> + + <p>Characters whose value is < 256 can be defined by either of the two + syntaxes for \x. There is no difference in the way they are handled. For + example, \xdc is the same as \x{dc}.</p> + + <p>After \0 up to two further octal digits are read. If there are fewer than + two digits, only those that are present are used. Thus the sequence + \0\x\07 specifies two binary zeros followed by a BEL character (code value + 7). Ensure to supply two digits after the initial zero if the pattern + character that follows is itself an octal digit.</p> + + <p>The handling of a backslash followed by a digit other than 0 is + complicated. Outside a character class, PCRE reads it and any following + digits as a decimal number. If the number is < 10, or if there have + been at least that many previous capturing left parentheses in the + expression, the entire sequence is taken as a <em>back reference</em>. 
A + description of how this works is provided later, following the discussion + of parenthesized subpatterns.</p> + + <p>Inside a character class, or if the decimal number is > 9 and there + have not been that many capturing subpatterns, PCRE re-reads up to three + octal digits following the backslash, and uses them to generate a data + character. Any subsequent digits stand for themselves. The value of the + character is constrained in the same way as characters specified in + hexadecimal. For example:</p> + + <taglist> + <tag>\040</tag> + <item>Another way of writing an ASCII space</item> + <tag>\40</tag> + <item>The same, provided there are < 40 previous capturing + subpatterns</item> + <tag>\7</tag> + <item>Always a back reference</item> + <tag>\11</tag> + <item>Can be a back reference, or another way of writing a tab</item> + <tag>\011</tag> + <item>Always a tab</item> + <tag>\0113</tag> + <item>A tab followed by character "3"</item> + <tag>\113</tag> + <item>Can be a back reference, otherwise the character with octal code + 113 </item> + <tag>\377</tag> + <item>Can be a back reference, otherwise value 255 (decimal)</item> + <tag>\81</tag> + <item>Either a back reference, or a binary zero followed by the two + characters "8" and "1"</item> + </taglist> + + <p>Notice that octal values >= 100 must not be introduced by a leading + zero, as no more than three octal digits are ever read.</p> + + <p>All the sequences that define a single character value can be used both + inside and outside character classes. Also, inside a character class, \b + is interpreted as the backspace character (hex 08).</p> + + <p>\N is not allowed in a character class. \B, \R, and \X are not special + inside a character class. Like other unrecognized escape sequences, they + are treated as the literal characters "B", "R", and "X". 
Outside a + character class, these sequences have different meanings.</p> + + <p><em>Unsupported Escape Sequences</em></p> + + <p>In Perl, the sequences \l, \L, \u, and \U are recognized by its string + handler and used to modify the case of following characters. PCRE does not + support these escape sequences.</p> + + <p><em>Absolute and Relative Back References</em></p> + + <p>The sequence \g followed by an unsigned or a negative number, optionally + enclosed in braces, is an absolute or relative back reference. A named + back reference can be coded as \g{name}. Back references are discussed + later, following the discussion of parenthesized subpatterns.</p> + + <p><em>Absolute and Relative Subroutine Calls</em></p> + + <p>For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + alternative syntax for referencing a subpattern as a "subroutine". + Details are discussed later. Notice that \g{...} (Perl syntax) and + \g<...> (Oniguruma syntax) are <em>not</em> synonymous. 
The former + is a back reference and the latter is a subroutine call.</p> + + <p><em>Generic Character Types</em></p> + <marker id="generic_character_types"></marker> + + <p>Another use of backslash is for specifying generic character types:</p> + + <taglist> + <tag>\d</tag><item>Any decimal digit</item> + <tag>\D</tag><item>Any character that is not a decimal digit</item> + <tag>\h</tag><item>Any horizontal whitespace character</item> + <tag>\H</tag><item>Any character that is not a horizontal whitespace + character</item> + <tag>\s</tag><item>Any whitespace character</item> + <tag>\S</tag><item>Any character that is not a whitespace character + </item> + <tag>\v</tag><item>Any vertical whitespace character</item> + <tag>\V</tag><item>Any character that is not a vertical whitespace + character</item> + <tag>\w</tag><item>Any "word" character</item> + <tag>\W</tag><item>Any "non-word" character</item> + </taglist> + + <p>There is also the single sequence \N, which matches a non-newline + character. This is the same as the "." metacharacter when <c>dotall</c> + is not set. Perl also uses \N to match characters by name, but PCRE does + not support this.</p> + + <p>Each pair of lowercase and uppercase escape sequences partitions the + complete set of characters into two disjoint sets. Any given character + matches one, and only one, of each pair. The sequences can appear both + inside and outside character classes. They each match one character of the + appropriate type. If the current matching point is at the end of the + subject string, all fail, as there is no character to match.</p> + + <p>For compatibility with Perl, \s does not match the VT character + (code 11). This makes it different from the Posix "space" class. The \s + characters are HT (9), LF (10), FF (12), CR (13), and space (32). If "use + locale;" is included in a Perl script, \s can match the VT character. 
In + PCRE, it never does.</p> + + <p>A "word" character is an underscore or any character that is a letter or + a digit. By default, the definition of letters and digits is controlled by + the PCRE low-valued character tables, in Erlang's case (and without option + <c>unicode</c>), the ISO Latin-1 character set.</p> + + <p>By default, in <c>unicode</c> mode, characters with values > 255, that + is, all characters outside the ISO Latin-1 character set, never match \d, + \s, or \w, and always match \D, \S, and \W. These sequences retain their + original meanings from before UTF support was available, mainly for + efficiency reasons. However, if option <c>ucp</c> is set, the behavior is + changed so that Unicode properties are used to determine character types, + as follows:</p> + + <taglist> + <tag>\d</tag><item>Any character that \p{Nd} matches (decimal digit) + </item> + <tag>\s</tag><item>Any character that \p{Z} matches, plus HT, LF, FF, CR + </item> + <tag>\w</tag><item>Any character that \p{L} or \p{N} matches, plus + underscore</item> + </taglist> + + <p>The uppercase escapes match the inverse sets of characters. Notice that + \d matches only decimal digits, while \w matches any Unicode digit, any + Unicode letter, and underscore. Notice also that <c>ucp</c> affects \b and + \B, as they are defined in terms of \w and \W. Matching these sequences is + noticeably slower when <c>ucp</c> is set.</p> + + <p>The sequences \h, \H, \v, and \V are features that were added to Perl in + release 5.10. 
In contrast to the other sequences, which match only ASCII + characters by default, these always match certain high-valued code points, + regardless if <c>ucp</c> is set.</p> + + <p>The following are the horizontal space characters:</p> + + <taglist> + <tag>U+0009</tag><item>Horizontal tab (HT)</item> + <tag>U+0020</tag><item>Space</item> + <tag>U+00A0</tag><item>Non-break space</item> + <tag>U+1680</tag><item>Ogham space mark</item> + <tag>U+180E</tag><item>Mongolian vowel separator</item> + <tag>U+2000</tag><item>En quad</item> + <tag>U+2001</tag><item>Em quad</item> + <tag>U+2002</tag><item>En space</item> + <tag>U+2003</tag><item>Em space</item> + <tag>U+2004</tag><item>Three-per-em space</item> + <tag>U+2005</tag><item>Four-per-em space</item> + <tag>U+2006</tag><item>Six-per-em space</item> + <tag>U+2007</tag><item>Figure space</item> + <tag>U+2008</tag><item>Punctuation space</item> + <tag>U+2009</tag><item>Thin space</item> + <tag>U+200A</tag><item>Hair space</item> + <tag>U+202F</tag><item>Narrow no-break space</item> + <tag>U+205F</tag><item>Medium mathematical space</item> + <tag>U+3000</tag><item>Ideographic space</item> + </taglist> + + <p>The following are the vertical space characters:</p> + + <taglist> + <tag>U+000A</tag><item>Line feed (LF)</item> + <tag>U+000B</tag><item>Vertical tab (VT)</item> + <tag>U+000C</tag><item>Form feed (FF)</item> + <tag>U+000D</tag><item>Carriage return (CR)</item> + <tag>U+0085</tag><item>Next line (NEL)</item> + <tag>U+2028</tag><item>Line separator</item> + <tag>U+2029</tag><item>Paragraph separator</item> + </taglist> + + <p>In 8-bit, non-UTF-8 mode, only the characters with code points < 256 + are relevant.</p> + + <p><em>Newline Sequences</em></p> + <marker id="newline_sequences"></marker> + + <p>Outside a character class, by default, the escape sequence \R matches any + Unicode newline sequence. 
In non-UTF-8 mode, \R is equivalent to the + following:</p> + + <code> +(?>\r\n|\n|\x0b|\f|\r|\x85)</code> + + <p>This is an example of an "atomic group", details are provided below.</p> + + <p>This particular group matches either the two-character sequence CR + followed by LF, or one of the single characters LF (line feed, U+000A), + VT (vertical tab, U+000B), FF (form feed, U+000C), CR (carriage return, + U+000D), or NEL (next line, U+0085). The two-character sequence is + treated as a single unit that cannot be split.</p> + + <p>In Unicode mode, two more characters whose code points are > 255 are + added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). + Unicode character property support is not needed for these characters to + be recognized.</p> + + <p>\R can be restricted to match only CR, LF, or CRLF (instead of the + complete set of Unicode line endings) by setting option <c>bsr_anycrlf</c> + either at compile time or when the pattern is matched. (BSR is an acronym + for "backslash R".) This can be made the default when PCRE is built; if + so, the other behavior can be requested through option + <c>bsr_unicode</c>. These settings can also be specified by starting a + pattern string with one of the following sequences:</p> + + <taglist> + <tag>(*BSR_ANYCRLF)</tag> + <item>CR, LF, or CRLF only</item> + <tag>(*BSR_UNICODE)</tag> + <item>Any Unicode newline sequence</item> + </taglist> + + <p>These override the default and the options specified to the compiling + function, but they can themselves be overridden by options specified to a + matching function. Notice that these special settings, which are not + Perl-compatible, are recognized only at the very start of a pattern, and + that they must be in upper case. If more than one of them is present, the + last one is used. 
They can be combined with a change of newline + convention; for example, a pattern can start with:</p> + + <code> +(*ANY)(*BSR_ANYCRLF)</code> + + <p>They can also be combined with the (*UTF8), (*UTF), or (*UCP) special + sequences. Inside a character class, \R is treated as an unrecognized + escape sequence, and so matches the letter "R" by default.</p> + + <p><em>Unicode Character Properties</em></p> + + <p>Three more escape sequences that match characters with specific + properties are available. When in 8-bit non-UTF-8 mode, these sequences + are limited to testing characters whose code points are < + 256, but they do work in this mode. The following are the extra escape + sequences:</p> + + <taglist> + <tag>\p{<em>xx</em>}</tag> + <item>A character with property <em>xx</em></item> + <tag>\P{<em>xx</em>}</tag> + <item>A character without property <em>xx</em></item> + <tag>\X</tag> + <item>A Unicode extended grapheme cluster</item> + </taglist> + + <p>The property names represented by <em>xx</em> above are limited to the + Unicode script names, the general category properties, "Any", which + matches any character (including newline), and some special PCRE + properties (described in the next section). Other Perl properties, such as + "InMusicalSymbols", are currently not supported by PCRE. Notice that + \P{Any} does not match any characters and always causes a match + failure.</p> + + <p>Sets of Unicode characters are defined as belonging to certain scripts. + A character from one of these sets can be matched using a script name, for + example:</p> + + <code> +\p{Greek} \P{Han}</code> + + <p>Those that are not part of an identified script are lumped together as + "Common". 
The following is the current list of scripts:</p> + + <list type="bulleted"> + <item>Arabic</item> + <item>Armenian</item> + <item>Avestan</item> + <item>Balinese</item> + <item>Bamum</item> + <item>Batak</item> + <item>Bengali</item> + <item>Bopomofo</item> + <item>Braille</item> + <item>Buginese</item> + <item>Buhid</item> + <item>Canadian_Aboriginal</item> + <item>Carian</item> + <item>Chakma</item> + <item>Cham</item> + <item>Cherokee</item> + <item>Common</item> + <item>Coptic</item> + <item>Cuneiform</item> + <item>Cypriot</item> + <item>Cyrillic</item> + <item>Deseret</item> + <item>Devanagari</item> + <item>Egyptian_Hieroglyphs</item> + <item>Ethiopic</item> + <item>Georgian</item> + <item>Glagolitic</item> + <item>Gothic</item> + <item>Greek</item> + <item>Gujarati</item> + <item>Gurmukhi</item> + <item>Han</item> + <item>Hangul</item> + <item>Hanunoo</item> + <item>Hebrew</item> + <item>Hiragana</item> + <item>Imperial_Aramaic</item> + <item>Inherited</item> + <item>Inscriptional_Pahlavi</item> + <item>Inscriptional_Parthian</item> + <item>Javanese</item> + <item>Kaithi</item> + <item>Kannada</item> + <item>Katakana</item> + <item>Kayah_Li</item> + <item>Kharoshthi</item> + <item>Khmer</item> + <item>Lao</item> + <item>Latin</item> + <item>Lepcha</item> + <item>Limbu</item> + <item>Linear_B</item> + <item>Lisu</item> + <item>Lycian</item> + <item>Lydian</item> + <item>Malayalam</item> + <item>Mandaic</item> + <item>Meetei_Mayek</item> + <item>Meroitic_Cursive</item> + <item>Meroitic_Hieroglyphs</item> + <item>Miao</item> + <item>Mongolian</item> + <item>Myanmar</item> + <item>New_Tai_Lue</item> + <item>Nko</item> + <item>Ogham</item> + <item>Old_Italic</item> + <item>Old_Persian</item> + <item>Oriya</item> + <item>Old_South_Arabian</item> + <item>Old_Turkic</item> + <item>Ol_Chiki</item> + <item>Osmanya</item> + <item>Phags_Pa</item> + <item>Phoenician</item> + <item>Rejang</item> + <item>Runic</item> + <item>Samaritan</item> + <item>Saurashtra</item> + 
<item>Sharada</item> + <item>Shavian</item> + <item>Sinhala</item> + <item>Sora_Sompeng</item> + <item>Sundanese</item> + <item>Syloti_Nagri</item> + <item>Syriac</item> + <item>Tagalog</item> + <item>Tagbanwa</item> + <item>Tai_Le</item> + <item>Tai_Tham</item> + <item>Tai_Viet</item> + <item>Takri</item> + <item>Tamil</item> + <item>Telugu</item> + <item>Thaana</item> + <item>Thai</item> + <item>Tibetan</item> + <item>Tifinagh</item> + <item>Ugaritic</item> + <item>Vai</item> + <item>Yi</item> + </list> + + <p>Each character has exactly one Unicode general category property, + specified by a two-letter acronym. For compatibility with Perl, negation + can be specified by including a circumflex between the opening brace and + the property name. For example, \p{^Lu} is the same as \P{Lu}.</p> + + <p>If only one letter is specified with \p or \P, it includes all the + general category properties that start with that letter. In this case, in + the absence of negation, the curly brackets in the escape sequence are + optional. 
The following two examples have the same effect:</p> + + <code> +\p{L} +\pL</code> + + <p>The following general category property codes are supported:</p> + + <taglist> + <tag>C</tag><item>Other</item> + <tag>Cc</tag><item>Control</item> + <tag>Cf</tag><item>Format</item> + <tag>Cn</tag><item>Unassigned</item> + <tag>Co</tag><item>Private use</item> + <tag>Cs</tag><item>Surrogate</item> + <tag>L</tag><item>Letter</item> + <tag>Ll</tag><item>Lowercase letter</item> + <tag>Lm</tag><item>Modifier letter</item> + <tag>Lo</tag><item>Other letter</item> + <tag>Lt</tag><item>Title case letter</item> + <tag>Lu</tag><item>Uppercase letter</item> + <tag>M</tag><item>Mark</item> + <tag>Mc</tag><item>Spacing mark</item> + <tag>Me</tag><item>Enclosing mark</item> + <tag>Mn</tag><item>Non-spacing mark</item> + <tag>N</tag><item>Number</item> + <tag>Nd</tag><item>Decimal number</item> + <tag>Nl</tag><item>Letter number</item> + <tag>No</tag><item>Other number</item> + <tag>P</tag><item>Punctuation</item> + <tag>Pc</tag><item>Connector punctuation</item> + <tag>Pd</tag><item>Dash punctuation</item> + <tag>Pe</tag><item>Close punctuation</item> + <tag>Pf</tag><item>Final punctuation</item> + <tag>Pi</tag><item>Initial punctuation</item> + <tag>Po</tag><item>Other punctuation</item> + <tag>Ps</tag><item>Open punctuation</item> + <tag>S</tag><item>Symbol</item> + <tag>Sc</tag><item>Currency symbol</item> + <tag>Sk</tag><item>Modifier symbol</item> + <tag>Sm</tag><item>Mathematical symbol</item> + <tag>So</tag><item>Other symbol</item> + <tag>Z</tag><item>Separator</item> + <tag>Zl</tag><item>Line separator</item> + <tag>Zp</tag><item>Paragraph separator</item> + <tag>Zs</tag><item>Space separator</item> + </taglist> + + <p>The special property L& is also supported. 
It matches a character + that has the Lu, Ll, or Lt property, that is, a letter that is not + classified as a modifier or "other".</p> + + <p>The Cs (Surrogate) property applies only to characters in the range + U+D800 to U+DFFF. Such characters are invalid in Unicode strings and so + cannot be tested by PCRE. Perl does not support the Cs property.</p> + + <p>The long synonyms for property names supported by Perl (such as + \p{Letter}) are not supported by PCRE. It is not permitted to prefix any + of these properties with "Is".</p> + + <p>No character in the Unicode table has the Cn (unassigned) property. + This property is instead assumed for any code point that is not in the + Unicode table.</p> + + <p>Specifying caseless matching does not affect these escape sequences. For + example, \p{Lu} always matches only uppercase letters. This is different + from the behavior of current versions of Perl.</p> + + <p>Matching characters by Unicode property is not fast, as PCRE must do a + multistage table lookup to find a character property. That is why the + traditional escape sequences such as \d and \w do not use Unicode + properties in PCRE by default. However, you can make them do so by setting + option <c>ucp</c> or by starting the pattern with (*UCP).</p> + + <p><em>Extended Grapheme Clusters</em></p> + + <p>The \X escape matches any number of Unicode characters that form an + "extended grapheme cluster", and treats the sequence as an atomic group + (see below). Up to and including release 8.31, PCRE matched an earlier, + simpler definition that was equivalent to <c>(?>\PM\pM*)</c>. That is, + it matched a character without the "mark" property, followed by zero or + more characters with the "mark" property. 
Characters with the "mark" + property are typically non-spacing accents that affect the preceding + character.</p> + + <p>This simple definition was extended in Unicode to include more + complicated kinds of composite character by giving each character a + grapheme breaking property, and creating rules that use these properties + to define the boundaries of extended grapheme clusters. In PCRE releases + later than 8.31, \X matches one of these clusters.</p> + + <p>\X always matches at least one character. Then it decides whether to add + more characters according to the following rules for ending a cluster:</p> + + <list type="ordered"> + <item> + <p>End at the end of the subject string.</p> + </item> + <item> + <p>Do not end between CR and LF; otherwise end after any control + character.</p> + </item> + <item> + <p>Do not break Hangul (a Korean script) syllable sequences. Hangul + characters are of five types: L, V, T, LV, and LVT. An L character can + be followed by an L, V, LV, or LVT character. An LV or V character can + be followed by a V or T character. An LVT or T character can be + followed only by a T character.</p> + </item> + <item> + <p>Do not end before extending characters or spacing marks. 
Characters + with the "mark" property always have the "extend" grapheme breaking + property.</p> + </item> + <item> + <p>Do not end after prepend characters.</p> + </item> + <item> + <p>Otherwise, end the cluster.</p> + </item> + </list> -<taglist> - <tag>N</tag> <item>Number</item> - <tag>Nd</tag> <item>Decimal number</item> - <tag>Nl</tag> <item>Letter number</item> - <tag>No</tag> <item>Other number</item> -</taglist> + <p><em>PCRE Additional Properties</em></p> -<taglist> - <tag>P</tag> <item>Punctuation</item> - <tag>Pc</tag> <item>Connector punctuation</item> - <tag>Pd</tag> <item>Dash punctuation</item> - <tag>Pe</tag> <item>Close punctuation</item> - <tag>Pf</tag> <item>Final punctuation</item> - <tag>Pi</tag> <item>Initial punctuation</item> - <tag>Po</tag> <item>Other punctuation</item> - <tag>Ps</tag> <item>Open punctuation</item> -</taglist> + <p>In addition to the standard Unicode properties described earlier, PCRE + supports four more that make it possible to convert traditional escape + sequences, such as \w and \s, and Posix character classes to use Unicode + properties. PCRE uses these non-standard, non-Perl properties internally + when <c>PCRE_UCP</c> is set. However, they can also be used explicitly. + The properties are as follows:</p> -<taglist> - <tag>S</tag> <item>Symbol</item> - <tag>Sc</tag> <item>Currency symbol</item> - <tag>Sk</tag> <item>Modifier symbol</item> - <tag>Sm</tag> <item>Mathematical symbol</item> - <tag>So</tag> <item>Other symbol</item> -</taglist> + <taglist> + <tag>Xan</tag> + <item> + <p>Any alphanumeric character. Matches characters that have either the + L (letter) or the N (number) property.</p> + </item> + <tag>Xps</tag> + <item> + <p>Any Posix space character. Matches the characters tab, line feed, + vertical tab, form feed, carriage return, and any other character + that has the Z (separator) property.</p> + </item> + <tag>Xsp</tag> + <item> + <p>Any Perl space character. 
Matches the same as Xps, except that + vertical tab is excluded.</p> + </item> + <tag>Xwd</tag> + <item> + <p>Any Perl "word" character. Matches the same characters as Xan, plus + underscore.</p> + </item> + </taglist> + + <p>There is another non-standard property, Xuc, which matches any character + that can be represented by a Universal Character Name in C++ and other + programming languages. These are the characters $, @, ` (grave accent), + and all characters with Unicode code points >= U+00A0, except for the + surrogates U+D800 to U+DFFF. Notice that most base (ASCII) characters are + excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH, + where H is a hexadecimal digit. Notice that the Xuc property does not + match these sequences but the characters that they represent.)</p> + + <p><em>Resetting the Match Start</em></p> + + <p>The escape sequence \K causes any previously matched characters not to + be included in the final matched sequence. For example, the following + pattern matches "foobar", but reports that it has matched "bar":</p> + + <code> +foo\Kbar</code> + + <p>This feature is similar to a lookbehind assertion + <!-- HTML <a href="#lookbehind"> --> + <!-- </a> --> + (described below). However, in this case, the part of the subject before + the real match does not have to be of fixed length, as lookbehind + assertions do. The use of \K does not interfere with the setting of + captured substrings. For example, when the following pattern matches + "foobar", the first substring is still set to "foo":</p> -<taglist> - <tag>Z</tag> <item>Separator</item> - <tag>Zl</tag> <item>Line separator</item> - <tag>Zp</tag> <item>Paragraph separator</item> - <tag>Zs</tag> <item>Space separator</item> -</taglist> +<code> +(foo)\Kbar</code> + + <p>Perl documents that the use of \K within assertions is "not well + defined". 
In PCRE, \K is acted upon when it occurs inside positive + assertions, but is ignored in negative assertions.</p> + + <p><em>Simple Assertions</em></p> + + <p>The final use of backslash is for certain simple assertions. An + assertion specifies a condition that must be met at a particular point in + a match, without consuming any characters from the subject string. The + use of subpatterns for more complicated assertions is described below. The + following are the backslashed assertions:</p> + + <taglist> + <tag>\b</tag><item>Matches at a word boundary.</item> + <tag>\B</tag><item>Matches when not at a word boundary.</item> + <tag>\A</tag><item>Matches at the start of the subject.</item> + <tag>\Z</tag><item>Matches at the end of the subject, and before a newline + at the end of the subject.</item> + <tag>\z</tag><item>Matches only at the end of the subject.</item> + <tag>\G</tag><item>Matches at the first matching position in the subject. + </item> + </taglist> + + <p>Inside a character class, \b has a different meaning; it matches the + backspace character. If any other of these assertions appears in a + character class, by default it matches the corresponding literal character + (for example, \B matches the letter B).</p> + + <p>A word boundary is a position in the subject string where the current + character and the previous character do not both match \w or \W (that is, + one matches \w and the other matches \W), or the start or end of the + string if the first or last character matches \w, respectively. In UTF + mode, the meanings of \w and \W can be changed by setting option + <c>ucp</c>. When this is done, it also affects \b and \B. PCRE and Perl do + not have a separate "start of word" or "end of word" metasequence. + However, whatever follows \b normally determines which it is. 
For example, + the fragment \ba matches "a" at the start of a word.</p> + + <p>The \A, \Z, and \z assertions differ from the traditional circumflex and + dollar (described in the next section) in that they only ever match at the + very start and end of the subject string, whatever options are set. Thus, + they are independent of multiline mode. These three assertions are not + affected by options <c>notbol</c> or <c>noteol</c>, which affect only the + behavior of the circumflex and dollar metacharacters. However, if argument + <c>startoffset</c> of <seealso marker="#run/3"><c>run/3</c></seealso> is + non-zero, indicating that matching is to start at a point other than the + beginning of the subject, \A can never match. The difference between \Z + and \z is that \Z matches before a newline at the end of the string and + at the very end, while \z matches only at the end.</p> + + <p>The \G assertion is true only when the current matching position is at + the start point of the match, as specified by argument <c>startoffset</c> + of <c>run/3</c>. It differs from \A when the value of <c>startoffset</c> + is non-zero. By calling <c>run/3</c> multiple times with appropriate + arguments, you can mimic the Perl option <c>/g</c>, and it is in this + kind of implementation where \G can be useful.</p> + + <p>Notice, however, that the PCRE interpretation of \G, as the start of the + current match, is subtly different from Perl, which defines it as the end + of the previous match. In Perl, these can be different when the previously + matched string was empty. 
As PCRE does only one match at a time, it cannot + reproduce this behavior.</p> + + <p>If all the alternatives of a pattern begin with \G, the expression is + anchored to the starting match position, and the "anchored" flag is set in + the compiled regular expression.</p> + </section> -<p>The special property L& is also supported: it matches a character that has -the Lu, Ll, or Lt property, in other words, a letter that is not classified as -a modifier or "other".</p> - -<p>The Cs (Surrogate) property applies only to characters in the range U+D800 to -U+DFFF. Such characters are not valid in Unicode strings and so -cannot be tested by PCRE. Perl does not support the Cs property</p> - -<p>The long synonyms for property names that Perl supports (such as \p{Letter}) -are not supported by PCRE, nor is it permitted to prefix any of these -properties with "Is".</p> - -<p>No character that is in the Unicode table has the Cn (unassigned) property. -Instead, this property is assumed for any code point that is not in the -Unicode table.</p> - -<p>Specifying caseless matching does not affect these escape sequences. For -example, \p{Lu} always matches only upper case letters. This is different from -the behaviour of current versions of Perl.</p> -<p>Matching characters by Unicode property is not fast, because PCRE has to do a -multistage table lookup in order to find a character's property. That is why -the traditional escape sequences such as \d and \w do not use Unicode -properties in PCRE by default, though you can make them do so by setting the -<c>ucp</c> option or by starting the pattern with (*UCP).</p> - -<p><em>Extended grapheme clusters</em></p> -<p>The \X escape matches any number of Unicode characters that form an "extended -grapheme cluster", and treats the sequence as an atomic group (see below). 
-Up to and including release 8.31, PCRE matched an earlier, simpler definition -that was equivalent to</p> - -<quote><p> (?>\PM\pM*)</p></quote> - -<p>That is, it matched a character without the "mark" property, followed by zero -or more characters with the "mark" property. Characters with the "mark" -property are typically non-spacing accents that affect the preceding character.</p> - -<p>This simple definition was extended in Unicode to include more complicated -kinds of composite character by giving each character a grapheme breaking -property, and creating rules that use these properties to define the boundaries -of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches -one of these clusters.</p> - -<p>\X always matches at least one character. Then it decides whether to add -additional characters according to the following rules for ending a cluster:</p> -<taglist> -<tag>1.</tag> <item>End at the end of the subject string.</item> -<tag>2.</tag> <item>Do not end between CR and LF; otherwise end after any control character.</item> -<tag>3.</tag> <item>Do not break Hangul (a Korean script) syllable sequences. Hangul characters -are of five types: L, V, T, LV, and LVT. An L character may be followed by an -L, V, LV, or LVT character; an LV or V character may be followed by a V or T -character; an LVT or T character may be follwed only by a T character.</item> -<tag>4.</tag> <item>Do not end before extending characters or spacing marks. Characters with -the "mark" property always have the "extend" grapheme breaking property.</item> -<tag>5.</tag> <item>Do not end after prepend characters.</item> -<tag>6.</tag> <item>Otherwise, end the cluster.</item> -</taglist> + <section> + <marker id="sect4"></marker> + <title>Circumflex and Dollar</title> + <p>The circumflex and dollar metacharacters are zero-width assertions. 
That + is, they test for a particular condition to be true without consuming any + characters from the subject string.</p> + + <p>Outside a character class, in the default matching mode, the circumflex + character is an assertion that is true only if the current matching point + is at the start of the subject string. If argument <c>startoffset</c> of + <seealso marker="#run/3"><c>run/3</c></seealso> is non-zero, circumflex + can never match if option <c>multiline</c> is unset. Inside a character + class, circumflex has an entirely different meaning (see below).</p> + + <p>Circumflex need not be the first character of the pattern if + some alternatives are involved, but it must be the first thing in + each alternative in which it appears if the pattern is ever to match that + branch. If all possible alternatives start with a circumflex, that is, if + the pattern is constrained to match only at the start of the subject, it + is said to be an "anchored" pattern. (There are also other constructs that + can cause a pattern to be anchored.)</p> + + <p>The dollar character is an assertion that is true only if the current + matching point is at the end of the subject string, or immediately before + a newline at the end of the string (by default). Notice, however, that it + does not match the newline. Dollar need not be the last character of + the pattern if some alternatives are involved, but it must be the + last item in any branch in which it appears. Dollar has no special meaning + in a character class.</p> + + <p>The meaning of dollar can be changed so that it matches only at the very + end of the string, by setting option <c>dollar_endonly</c> at compile + time. This does not affect the \Z assertion.</p> + + <p>The meanings of the circumflex and dollar characters are changed if + option <c>multiline</c> is set. When this is the case, a circumflex + matches immediately after internal newlines and at the start of the + subject string. 
It does not match after a newline that ends the string. A + dollar matches before any newlines in the string, and at the very end, + when <c>multiline</c> is set. When newline is specified as the + two-character sequence CRLF, isolated CR and LF characters do not + indicate newlines.</p> + + <p>For example, the pattern /^abc$/ matches the subject string "def\nabc" + (where \n represents a newline) in multiline mode, but not otherwise. + So, patterns that are anchored in single-line mode because all + branches start with ^ are not anchored in multiline mode, and a match for + circumflex is possible when argument <em>startoffset</em> of <c>run/3</c> + is non-zero. Option <c>dollar_endonly</c> is ignored if <c>multiline</c> + is set.</p> + + <p>Notice that the sequences \A, \Z, and \z can be used to match the start + and end of the subject in both modes. If all branches of a pattern start + with \A, it is always anchored, regardless if <c>multiline</c> is set.</p> + </section> -<p><em>PCRE's additional properties</em></p> - -<p>As well as the standard Unicode properties described above, PCRE supports four -more that make it possible to convert traditional escape sequences such as \w -and \s and POSIX character classes to use Unicode properties. PCRE uses these -non-standard, non-Perl properties internally when PCRE_UCP is set. However, -they may also be used explicitly. These properties are:</p> -<taglist> - <tag>Xan</tag> <item>Any alphanumeric character</item> - <tag>Xps</tag> <item>Any POSIX space character</item> - <tag>Xsp</tag> <item>Any Perl space character</item> - <tag>Xwd</tag> <item>Any Perl "word" character</item> -</taglist> -<p>Xan matches characters that have either the L (letter) or the N (number) -property. Xps matches the characters tab, linefeed, vertical tab, form feed, or -carriage return, and any other character that has the Z (separator) property. -Xsp is the same as Xps, except that vertical tab is excluded. 
Xwd matches the -same characters as Xan, plus underscore.</p> - -<p>There is another non-standard property, Xuc, which matches any character that -can be represented by a Universal Character Name in C++ and other programming -languages. These are the characters $, @, ` (grave accent), and all characters -with Unicode code points greater than or equal to U+00A0, except for the -surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are -excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH -where H is a hexadecimal digit. Note that the Xuc property does not match these -sequences but the characters that they represent.)</p> - -<p><em>Resetting the match start</em></p> - -<p>The escape sequence \K causes any previously matched characters not to be -included in the final matched sequence. For example, the pattern:</p> - -<quote><p> foo\Kbar</p></quote> - -<p>matches "foobar", but reports that it has matched "bar". This feature is -similar to a lookbehind assertion -<!-- HTML <a href="#lookbehind"> --> -<!-- </a> --> -(described below). - -However, in this case, the part of the subject before the real match does not -have to be of fixed length, as lookbehind assertions do. The use of \K does -not interfere with the setting of -captured substrings. -For example, when the pattern</p> - -<quote><p> (foo)\Kbar</p></quote> - -<p>matches "foobar", the first substring is still set to "foo".</p> - -<p>Perl documents that the use of \K within assertions is "not well defined". In -PCRE, \K is acted upon when it occurs inside positive assertions, but is -ignored in negative assertions.</p> - -<p><em>Simple assertions</em></p> - -<p>The final use of backslash is for certain simple assertions. An -assertion specifies a condition that has to be met at a particular -point in a match, without consuming any characters from the subject -string. The use of subpatterns for more complicated assertions is -described below. 
The backslashed assertions are:</p> - -<taglist> - <tag>\b</tag> <item>matches at a word boundary</item> - <tag>\B</tag> <item>matches when not at a word boundary</item> - <tag>\A</tag> <item>matches at the start of the subject</item> - <tag>\Z</tag> <item>matches at the end of the subject - also matches before a newline at the end of - the subject</item> - <tag>\z</tag> <item>matches only at the end of the subject</item> - <tag>\G</tag> <item>matches at the first matching position in the - subject</item> -</taglist> + <section> + <marker id="sect5"></marker> + <title>Full Stop (Period, Dot) and \N</title> + <p>Outside a character class, a dot in the pattern matches any character in + the subject string except (by default) a character that signifies the end + of a line.</p> + + <p>When a line ending is defined as a single character, dot never matches + that character. When the two-character sequence CRLF is used, dot does not + match CR if it is immediately followed by LF, otherwise it matches all + characters (including isolated CRs and LFs). When any Unicode line endings + are recognized, dot does not match CR, LF, or any of the other + line-ending characters.</p> + + <p>The behavior of dot regarding newlines can be changed. If option + <c>dotall</c> is set, a dot matches any character, without exception. If + the two-character sequence CRLF is present in the subject string, it takes + two dots to match it.</p> + + <p>The handling of dot is entirely independent of the handling of circumflex + and dollar, the only relationship is that both involve newlines. Dot has + no special meaning in a character class.</p> + + <p>The escape sequence \N behaves like a dot, except that it is not affected + by option <c>PCRE_DOTALL</c>. That is, it matches any character except one + that signifies the end of a line. 
Perl also uses \N to match characters by + name but PCRE does not support this.</p> + </section> -<p>Inside a character class, \b has a different meaning; it matches the backspace -character. If any other of these assertions appears in a character class, by -default it matches the corresponding literal character (for example, \B -matches the letter B). </p> - -<p>A word boundary is a position in the subject string where the current character -and the previous character do not both match \w or \W (i.e. one matches -\w and the other matches \W), or the start or end of the string if the -first or last character matches \w, respectively. In a UTF mode, the meanings -of \w and \W can be changed by setting the <c>ucp</c> option. When this is -done, it also affects \b and \B. Neither PCRE nor Perl has a separate "start -of word" or "end of word" metasequence. However, whatever follows \b normally -determines which it is. For example, the fragment \ba matches "a" at the start -of a word.</p> - -<p>The \A, \Z, and \z assertions differ from the traditional circumflex and -dollar (described in the next section) in that they only ever match at the very -start and end of the subject string, whatever options are set. Thus, they are -independent of multiline mode. These three assertions are not affected by the -<c>notbol</c> or <c>noteol</c> options, which affect only the behaviour of the -circumflex and dollar metacharacters. However, if the <em>startoffset</em> -argument of <c>re:run/3</c> is non-zero, indicating that matching is to start -at a point other than the beginning of the subject, \A can never match. The -difference between \Z and \z is that \Z matches before a newline at the end -of the string as well as at the very end, whereas \z matches only at the end.</p> - -<p>The \G assertion is true only when the current matching position is at the -start point of the match, as specified by the <em>startoffset</em> argument of -<c>re:run/3</c>. 
It differs from \A when the value of <em>startoffset</em> is -non-zero. By calling <c>re:run/3</c> multiple times with appropriate -arguments, you can mimic Perl's /g option, and it is in this kind of -implementation where \G can be useful.</p> - -<p>Note, however, that PCRE's interpretation of \G, as the start of the current -match, is subtly different from Perl's, which defines it as the end of the -previous match. In Perl, these can be different when the previously matched -string was empty. Because PCRE does just one match at a time, it cannot -reproduce this behaviour.</p> - -<p>If all the alternatives of a pattern begin with \G, the expression is anchored -to the starting match position, and the "anchored" flag is set in the compiled -regular expression.</p> - -</section> - -<section><marker id="sect4"></marker><title>Circumflex and dollar</title> - -<p>The circumflex and dollar metacharacters are zero-width assertions. That is, -they test for a particular condition being true without consuming any -characters from the subject string.</p> - -<p>Outside a character class, in the default matching mode, the circumflex -character is an assertion that is true only if the current matching point is at -the start of the subject string. If the <i>startoffset</i> argument of -<c>re:run/3</c> is non-zero, circumflex can never match if the <c>multiline</c> -option is unset. Inside a character class, circumflex has an entirely different -meaning (see below).</p> - -<p>Circumflex need not be the first character of the pattern if a number of -alternatives are involved, but it should be the first thing in each alternative -in which it appears if the pattern is ever to match that branch. If all -possible alternatives start with a circumflex, that is, if the pattern is -constrained to match only at the start of the subject, it is said to be an -"anchored" pattern. 
(There are also other constructs that can cause a pattern -to be anchored.)</p> - -<p>The dollar character is an assertion that is true only if the current matching -point is at the end of the subject string, or immediately before a newline at -the end of the string (by default). Note, however, that it does not actually -match the newline. Dollar need not be the last character of the pattern if a -number of alternatives are involved, but it should be the last item in any -branch in which it appears. Dollar has no special meaning in a character class.</p> - -<p>The meaning of dollar can be changed so that it matches only at the -very end of the string, by setting the <c>dollar_endonly</c> option at -compile time. This does not affect the \Z assertion.</p> - -<p>The meanings of the circumflex and dollar characters are changed if the -<c>multiline</c> option is set. When this is the case, a circumflex matches -immediately after internal newlines as well as at the start of the subject -string. It does not match after a newline that ends the string. A dollar -matches before any newlines in the string, as well as at the very end, when -<c>multiline</c> is set. When newline is specified as the two-character -sequence CRLF, isolated CR and LF characters do not indicate newlines.</p> - -<p>For example, the pattern /^abc$/ matches the subject string -"def\nabc" (where \n represents a newline) in multiline mode, but -not otherwise. Consequently, patterns that are anchored in single line -mode because all branches start with ^ are not anchored in multiline -mode, and a match for circumflex is possible when the -<em>startoffset</em> argument of <c>re:run/3</c> is non-zero. 
The -<c>dollar_endonly</c> option is ignored if <c>multiline</c> is set.</p> - -<p>Note that the sequences \A, \Z, and \z can be used to match the start and -end of the subject in both modes, and if all branches of a pattern start with -\A it is always anchored, whether or not <c>multiline</c> is set.</p> - - -</section> - -<section><marker id="sect5"></marker><title>Full stop (period, dot) and \N</title> - -<p>Outside a character class, a dot in the pattern matches any one character in -the subject string except (by default) a character that signifies the end of a -line. -</p> - -<p>When a line ending is defined as a single character, dot never matches that -character; when the two-character sequence CRLF is used, dot does not match CR -if it is immediately followed by LF, but otherwise it matches all characters -(including isolated CRs and LFs). -When any Unicode line endings are being -recognized, dot does not match CR or LF or any of the other line ending -characters. -</p> - -<p>The behaviour of dot with regard to newlines can be changed. If -the <c>dotall</c> option is set, a dot matches any one character, -without exception. If the two-character sequence CRLF is present in -the subject string, it takes two dots to match it.</p> - -<p>The handling of dot is entirely independent of the handling of -circumflex and dollar, the only relationship being that they both -involve newlines. Dot has no special meaning in a character class.</p> - -<p>The escape sequence \N behaves like a dot, except that it is not affected by -the PCRE_DOTALL option. In other words, it matches any character except one -that signifies the end of a line. Perl also uses \N to match characters by -name; PCRE does not support this.</p> - -</section> - -<section><marker id="sect6"></marker><title>Matching a single data unit</title> - -<p>Outside a character class, the escape sequence \C matches any one data unit, -whether or not a UTF mode is set. One data unit is one -byte. 
Unlike a dot, \C always -matches line-ending characters. The feature is provided in Perl in order to -match individual bytes in UTF-8 mode, but it is unclear how it can usefully be -used. Because \C breaks up characters into individual data units, matching one -unit with \C in a UTF mode means that the rest of the string may start with a -malformed UTF character. This has undefined results, because PCRE assumes that -it is dealing with valid UTF strings.</p> - -<p>PCRE does not allow \C to appear in lookbehind assertions (described below) -in a UTF mode, because this would make it impossible to calculate the length of -the lookbehind.</p> - -<p>In general, the \C escape sequence is best avoided. However, one -way of using it that avoids the problem of malformed UTF characters is to use a -lookahead to check the length of the next character, as in this pattern, which -could be used with a UTF-8 string (ignore white space and line breaks):</p> + <section> + <marker id="sect6"></marker> + <title>Matching a Single Data Unit</title> + <p>Outside a character class, the escape sequence \C matches any data unit, + regardless if a UTF mode is set. One data unit is one byte. Unlike a dot, + \C always matches line-ending characters. The feature is provided in Perl + to match individual bytes in UTF-8 mode, but it is unclear how it can + usefully be used. As \C breaks up characters into individual data units, + matching one unit with \C in a UTF mode means that the remaining string + can start with a malformed UTF character. This has undefined results, as + PCRE assumes that it deals with valid UTF strings.</p> + + <p>PCRE does not allow \C to appear in lookbehind assertions (described + below) in a UTF mode, as this would make it impossible to calculate the + length of the lookbehind.</p> + + <p>The \C escape sequence is best avoided. 
However, one way of using it that + avoids the problem of malformed UTF characters is to use a lookahead to + check the length of the next character, as in the following pattern, which + can be used with a UTF-8 string (ignore whitespace and line breaks):</p> <code type="none"> - (?| (?=[\x00-\x7f])(\C) | - (?=[\x80-\x{7ff}])(\C)(\C) | - (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | - (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))</code> - -<p>A group that starts with (?| resets the capturing parentheses numbers in each -alternative (see "Duplicate Subpattern Numbers" -below). The assertions at the start of each branch check the next UTF-8 -character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The -character's individual bytes are then captured by the appropriate number of -groups.</p> - -</section> - -<section><marker id="sect7"></marker><title>Square brackets and character classes</title> - -<p>An opening square bracket introduces a character class, terminated by a closing -square bracket. A closing square bracket on its own is not special by default. -However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square -bracket causes a compile-time error. If a closing square bracket is required as -a member of the class, it should be the first data character in the class -(after an initial circumflex, if present) or escaped with a backslash.</p> - -<p>A character class matches a single character in the subject. In a UTF mode, the -character may be more than one data unit long. A matched character must be in -the set of characters defined by the class, unless the first character in the -class definition is a circumflex, in which case the subject character must not -be in the set defined by the class. 
If a circumflex is actually required as a -member of the class, ensure it is not the first character, or escape it with a -backslash.</p> - -<p>For example, the character class [aeiou] matches any lower case vowel, while -[^aeiou] matches any character that is not a lower case vowel. Note that a -circumflex is just a convenient notation for specifying the characters that -are in the class by enumerating those that are not. A class that starts with a -circumflex is not an assertion; it still consumes a character from the subject -string, and therefore it fails if the current pointer is at the end of the -string.</p> - -<p>In UTF-8 mode, characters with values greater than 255 (0xffff) -can be included in a class as a literal string of data units, or by using the -\x{ escaping mechanism.</p> - -<p>When caseless matching is set, any letters in a class represent both their -upper case and lower case versions, so for example, a caseless [aeiou] matches -"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a -caseful version would. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 256, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. -If you want to use caseless matching in a UTF mode for characters 256 and -above, you must ensure that PCRE is compiled with Unicode property support as -well as with UTF support.</p> - -<p>Characters that might indicate line breaks are never treated in any special way -when matching character classes, whatever line-ending sequence is in use, and -whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class -such as [^a] always matches one of these characters.</p> - -<p>The minus (hyphen) character can be used to specify a range of characters in a -character class. 
For example, [d-m] matches any letter between d and m, -inclusive. If a minus character is required in a class, it must be escaped with -a backslash or appear in a position where it cannot be interpreted as -indicating a range, typically as the first or last character in the class.</p> - -<p>It is not possible to have the literal character "]" as the end character of a -range. A pattern such as [W-]46] is interpreted as a class of two characters -("W" and "-") followed by a literal string "46]", so it would match "W46]" or -"-46]". However, if the "]" is escaped with a backslash it is interpreted as -the end of range, so [W-\]46] is interpreted as a class containing a range -followed by two other characters. The octal or hexadecimal representation of -"]" can also be used to end a range.</p> - -<p>Ranges operate in the collating sequence of character values. They can also be -used for characters specified numerically, for example [\000-\037]. Ranges -can include any characters that are valid for the current mode.</p> - -<p>If a range that includes letters is used when caseless matching is set, it -matches the letters in either case. For example, [W-c] is equivalent to -[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character -tables for a French locale are in use, [\xc8-\xcb] matches accented E -characters in both cases. In UTF modes, PCRE supports the concept of case for -characters with values greater than 255 only when it is compiled with Unicode -property support.</p> - -<p>The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, -\V, \w, and \W may appear in a character class, and add the characters that -they match to the class. For example, [\dABCDEF] matches any hexadecimal -digit. In UTF modes, the <c>ucp</c> option affects the meanings of \d, \s, \w -and their upper case partners, just as it does when they appear outside a -character class, as described in the section entitled -"Generic character types" -above. 
The escape sequence \b has a different meaning inside a character -class; it matches the backspace character. The sequences \B, \N, \R, and \X -are not special inside a character class. Like any other unrecognized escape -sequences, they are treated as the literal characters "B", "N", "R", and "X".</p> - -<p>A circumflex can conveniently be used with the upper case character types to -specify a more restricted set of characters than the matching lower case type. -For example, the class [^\W_] matches any letter or digit, but not underscore, -whereas [\w] includes underscore. A positive character class should be read as -"something OR something OR ..." and a negative class as "NOT something AND NOT -something AND NOT ...".</p> - -<p>The only metacharacters that are recognized in character classes -are backslash, hyphen (only where it can be interpreted as specifying -a range), circumflex (only at the start), opening square bracket (only -when it can be interpreted as introducing a POSIX class name - see the -next section), and the terminating closing square bracket. However, -escaping other non-alphanumeric characters does no harm.</p> -</section> - -<section><marker id="sect8"></marker><title>POSIX character classes</title> - -<p>Perl supports the POSIX notation for character classes. This uses names -enclosed by [: and :] within the enclosing square brackets. PCRE also supports -this notation. For example,</p> - -<quote><p> [01[:alpha:]%]</p></quote> - -<p>matches "0", "1", any alphabetic character, or "%". 
The supported class names -are:</p> - -<taglist> - <tag>alnum</tag> <item>letters and digits</item> - <tag>alpha</tag> <item>letters</item> - <tag>ascii</tag> <item>character codes 0 - 127</item> - <tag>blank</tag> <item>space or tab only</item> - <tag>cntrl</tag> <item>control characters</item> - <tag>digit</tag> <item>decimal digits (same as \d)</item> - <tag>graph</tag> <item>printing characters, excluding space</item> - <tag>lower</tag> <item>lower case letters</item> - <tag>print</tag> <item>printing characters, including space</item> - <tag>punct</tag> <item>printing characters, excluding letters and digits and space</item> - <tag>space</tag> <item>whitespace (not quite the same as \s)</item> - <tag>upper</tag> <item>upper case letters</item> - <tag>word</tag> <item>"word" characters (same as \w)</item> - <tag>xdigit</tag> <item>hexadecimal digits</item> -</taglist> - -<p>The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and -space (32). Notice that this list includes the VT character (code 11). This -makes "space" different to \s, which does not include VT (for Perl -compatibility).</p> - -<p>The name "word" is a Perl extension, and "blank" is a GNU extension -from Perl 5.8. Another Perl extension is negation, which is indicated -by a ^ character after the colon. For example,</p> - -<quote><p> [12[:^digit:]]</p></quote> - -<p>matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX -syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not -supported, and an error is given if they are encountered.</p> - -<p>By default, in UTF modes, characters with values greater than 255 do not match -any of the POSIX character classes. However, if the PCRE_UCP option is passed -to <em>pcre_compile()</em>, some of the classes are changed so that Unicode -character properties are used. 
This is achieved by replacing the POSIX classes -by other sequences, as follows:</p> - -<taglist> - <tag>[:alnum:]</tag> <item>becomes <em>\p{Xan}</em></item> - <tag>[:alpha:]</tag> <item>becomes <em>\p{L}</em></item> - <tag>[:blank:]</tag> <item>becomes <em>\h</em></item> - <tag>[:digit:]</tag> <item>becomes <em>\p{Nd}</em></item> - <tag>[:lower:]</tag> <item>becomes <em>\p{Ll}</em></item> - <tag>[:space:]</tag> <item>becomes <em>\p{Xps}</em></item> - <tag>[:upper:]</tag> <item>becomes <em>\p{Lu}</em></item> - <tag>[:word:]</tag> <item>becomes <em>\p{Xwd}</em></item> -</taglist> - -<p>Negated versions, such as [:^alpha:] use \P instead of \p. The other POSIX -classes are unchanged, and match only characters with code points less than -256.</p> - -</section> - - -<section><marker id="sect9"></marker><title>Vertical bar</title> - -<p>Vertical bar characters are used to separate alternative -patterns. For example, the pattern</p> - -<quote><p> gilbert|sullivan</p></quote> - -<p>matches either "gilbert" or "sullivan". Any number of alternatives -may appear, and an empty alternative is permitted (matching the empty -string). The matching process tries each alternative in turn, from -left to right, and the first one that succeeds is used. If the -alternatives are within a subpattern (defined below), "succeeds" means -matching the rest of the main pattern as well as the alternative in -the subpattern.</p> - -</section> - -<section><marker id="sect10"></marker><title>Internal option setting</title> - -<p>The settings of the <c>caseless</c>, <c>multiline</c>, <c>dotall</c>, and -<c>extended</c> options (which are Perl-compatible) can be changed from within -the pattern by a sequence of Perl option letters enclosed between "(?" and ")". 
-The option letters are</p> - -<taglist> - <tag>i</tag> <item>for <c>caseless</c></item> - <tag>m</tag> <item>for <c>multiline</c></item> - <tag>s</tag> <item>for <c>dotall</c></item> - <tag>x</tag> <item>for <c>extended</c></item> -</taglist> - -<p>For example, (?im) sets caseless, multiline matching. It is also possible to -unset these options by preceding the letter with a hyphen, and a combined -setting and unsetting such as (?im-sx), which sets <c>caseless</c> and -<c>multiline</c> while unsetting <c>dotall</c> and <c>extended</c>, is also -permitted. If a letter appears both before and after the hyphen, the option is -unset.</p> - -<p>The PCRE-specific options <c>dupnames</c>, <c>ungreedy</c>, and -<c>extra</c> can be changed in the same way as the Perl-compatible -options by using the characters J, U and X respectively.</p> - -<p>When one of these option changes occurs at top level (that is, not inside -subpattern parentheses), the change applies to the remainder of the pattern -that follows. If the change is placed right at the start of a pattern, PCRE -extracts it into the global options.</p> - -<p>An option change within a subpattern (see below for a description of -subpatterns) affects only that part of the subpattern that follows it, so</p> - -<quote><p> (a(?i)b)c</p></quote> - -<p>matches abc and aBc and no other strings (assuming <c>caseless</c> -is not used). By this means, options can be made to have different -settings in different parts of the pattern. Any changes made in one -alternative do carry on into subsequent branches within the same -subpattern. For example,</p> - -<quote><p> (a(?i)b|c)</p></quote> - -<p>matches "ab", "aB", "c", and "C", even though when matching "C" the first -branch is abandoned before the option setting. This is because the effects of -option settings happen at compile time. 
There would be some very weird -behaviour otherwise.</p> - -<p><em>Note:</em> There are other PCRE-specific options that can be set by the -application when the compiling or matching functions are called. In some cases -the pattern can contain special leading sequences such as (*CRLF) to override -what the application has set or what has been defaulted. Details are given in -the section entitled "Newline sequences" -above. There are also the (*UTF8) and (*UCP) leading -sequences that can be used to set UTF and Unicode property modes; they are -equivalent to setting the <c>unicode</c> and the <c>ucp</c> -options, respectively. The (*UTF) sequence is a generic version that can be -used with any of the libraries. However, the application can set the -<c>never_utf</c> option, which locks out the use of the (*UTF) sequences.</p> - -</section> - -<section><marker id="sect11"></marker><title>Subpatterns</title> - -<p>Subpatterns are delimited by parentheses (round brackets), which -can be nested. Turning part of a pattern into a subpattern does two -things:</p> - -<p>1. It localizes a set of alternatives. For example, the pattern</p> - -<quote><p> cat(aract|erpillar|)</p></quote> - -<p>matches "cataract", "caterpillar", or "cat". Without the parentheses, it would -match "cataract", "erpillar" or an empty string.</p> - -<p>2. It sets up the subpattern as a capturing subpattern. 
This means that, when -the complete pattern matches, that portion of the subject string that matched the -subpattern is passed back to the caller via the return value of -<c>re:run/3</c>.</p> - -<p>Opening parentheses are counted from left to right (starting -from 1) to obtain numbers for the capturing subpatterns.For example, if the string -"the red king" is matched against the pattern</p> - -<quote><p> the ((red|white) (king|queen))</p></quote> - -<p>the captured substrings are "red king", "red", and "king", and are numbered 1, -2, and 3, respectively.</p> - -<p>The fact that plain parentheses fulfil two functions is not always helpful. -There are often times when a grouping subpattern is required without a -capturing requirement. If an opening parenthesis is followed by a question mark -and a colon, the subpattern does not do any capturing, and is not counted when -computing the number of any subsequent capturing subpatterns. For example, if -the string "the white queen" is matched against the pattern</p> - -<quote><p> the ((?:red|white) (king|queen))</p></quote> - -<p>the captured substrings are "white queen" and "queen", and are numbered 1 and -2. The maximum number of capturing subpatterns is 65535.</p> - -<p>As a convenient shorthand, if any option settings are required at the start of -a non-capturing subpattern, the option letters may appear between the "?" and -the ":". Thus the two patterns</p> +(?| (?=[\x00-\x7f])(\C) | + (?=[\x80-\x{7ff}])(\C)(\C) | + (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | + (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))</code> + + <p>A group that starts with (?| resets the capturing parentheses numbers in + each alternative (see section <seealso marker="#sect12">Duplicate + Subpattern Numbers</seealso>). The assertions at the start of each branch + check the next UTF-8 character for values whose encoding uses 1, 2, 3, or + 4 bytes, respectively. 
The individual bytes of the character are then + captured by the appropriate number of groups.</p> + </section> -<list> -<item>(?i:saturday|sunday)</item> -<item>(?:(?i)saturday|sunday)</item> -</list> + <section> + <marker id="sect7"></marker> + <title>Square Brackets and Character Classes</title> + <p>An opening square bracket introduces a character class, terminated by a + closing square bracket. A closing square bracket on its own is not special + by default. However, if option <c>PCRE_JAVASCRIPT_COMPAT</c> is set, a + lone closing square bracket causes a compile-time error. If a closing + square bracket is required as a member of the class, it is to be the first + data character in the class (after an initial circumflex, if present) or + escaped with a backslash.</p> + + <p>A character class matches a single character in the subject. In a UTF + mode, the character can be more than one data unit long. A matched + character must be in the set of characters defined by the class, unless + the first character in the class definition is a circumflex, in which case + the subject character must not be in the set defined by the class. If a + circumflex is required as a member of the class, ensure that it is not the + first character, or escape it with a backslash.</p> + + <p>For example, the character class <c>[aeiou]</c> matches any lowercase + vowel, while <c>[^aeiou]</c> matches any character that is not a lowercase + vowel. Notice that a circumflex is just a convenient notation for + specifying the characters that are in the class by enumerating those that + are not. 
A class that starts with a circumflex is not an assertion; it + still consumes a character from the subject string, and therefore it fails + if the current pointer is at the end of the string.</p> + + <p>In UTF-8 mode, characters with values > 255 (0xff) can be included + in a class as a literal string of data units, or by using the \x{ escaping + mechanism.</p> + + <p>When caseless matching is set, any letters in a class represent both + their uppercase and lowercase versions. For example, a caseless + <c>[aeiou]</c> matches "A" and "a", and a caseless <c>[^aeiou]</c> does + not match "A", but a caseful version would. In a UTF mode, PCRE always + understands the concept of case for characters whose values are < 256, + so caseless matching is always possible. For characters with higher + values, the concept of case is supported only if PCRE is compiled with + Unicode property support. If you want to use caseless matching in a UTF + mode for characters >= 256, ensure that PCRE is compiled with Unicode + property support and with UTF support.</p> + + <p>Characters that can indicate line breaks are never treated in any special + way when matching character classes, whatever line-ending sequence is in + use, and whatever setting of options <c>PCRE_DOTALL</c> and + <c>PCRE_MULTILINE</c> is used. A class such as [^a] always matches one of + these characters.</p> + + <p>The minus (hyphen) character can be used to specify a range of characters + in a character class. For example, [d-m] matches any letter between d and + m, inclusive. If a minus character is required in a class, it must be + escaped with a backslash or appear in a position where it cannot be + interpreted as indicating a range, typically as the first or last + character in the class.</p> + + <p>The literal character "]" cannot be the end character of a range.
A + pattern such as [W-]46] is interpreted as a class of two characters ("W" + and "-") followed by a literal string "46]", so it would match "W46]" or + "-46]". However, if "]" is escaped with a backslash, it is interpreted as + the end of range, so [W-\]46] is interpreted as a class containing a range + followed by two other characters. The octal or hexadecimal representation + of "]" can also be used to end a range.</p> + + <p>Ranges operate in the collating sequence of character values. They can + also be used for characters specified numerically, for example, + [\000-\037]. Ranges can include any characters that are valid for the + current mode.</p> + + <p>If a range that includes letters is used when caseless matching is set, + it matches the letters in either case. For example, [W-c] is equivalent to + [][\\^_`wxyzabc], matched caselessly. In a non-UTF mode, if character + tables for a French locale are in use, [\xc8-\xcb] matches accented E + characters in both cases. In UTF modes, PCRE supports the concept of case + for characters with values > 255 only when it is compiled with Unicode + property support.</p> + + <p>The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V, + \w, and \W can appear in a character class, and add the characters that + they match to the class. For example, [\dABCDEF] matches any hexadecimal + digit. In UTF modes, option <c>ucp</c> affects the meanings of \d, \s, \w + and their uppercase partners, just as it does when they appear outside a + character class, as described in section + <seealso marker="#generic_character_types">Generic Character + Types</seealso> earlier. The escape sequence \b has a different meaning + inside a character class; it matches the backspace character. The + sequences \B, \N, \R, and \X are not special inside a character class. 
+ Like any other unrecognized escape sequences, they are treated as the + literal characters "B", "N", "R", and "X".</p> + + <p>A circumflex can conveniently be used with the uppercase character types + to specify a more restricted set of characters than the matching lowercase + type. For example, class [^\W_] matches any letter or digit, but not + underscore, while [\w] includes underscore. A positive character class + is to be read as "something OR something OR ..." and a negative class as + "NOT something AND NOT something AND NOT ...".</p> + + <p>Only the following metacharacters are recognized in character + classes:</p> + + <list type="bulleted"> + <item>Backslash</item> + <item>Hyphen (only where it can be interpreted as specifying a + range)</item> + <item>Circumflex (only at the start)</item> + <item>Opening square bracket (only when it can be interpreted as + introducing a Posix class name; see the next section)</item> + <item>Terminating closing square bracket</item> + </list> + + <p>However, escaping other non-alphanumeric characters does no harm.</p> + </section> -<p>match exactly the same set of strings. Because alternative branches are tried -from left to right, and options are not reset until the end of the subpattern -is reached, an option setting in one branch does affect subsequent branches, so -the above patterns match "SUNDAY" as well as "Saturday".</p> + <section> + <marker id="sect8"></marker> + <title>Posix Character Classes</title> + <p>Perl supports the Posix notation for character classes. This uses names + enclosed by [: and :] within the enclosing square brackets. PCRE also + supports this notation. 
For example, the following matches "0", "1", any + alphabetic character, or "%":</p> + + <code> +[01[:alpha:]%]</code> + + <p>The following are the supported class names:</p> + + <taglist> + <tag>alnum</tag><item>Letters and digits</item> + <tag>alpha</tag><item>Letters</item> + <tag>ascii</tag><item>Character codes 0-127</item> + <tag>blank</tag><item>Space or tab only</item> + <tag>cntrl</tag><item>Control characters</item> + <tag>digit</tag><item>Decimal digits (same as \d)</item> + <tag>graph</tag><item>Printing characters, excluding space</item> + <tag>lower</tag><item>Lowercase letters</item> + <tag>print</tag><item>Printing characters, including space</item> + <tag>punct</tag><item>Printing characters, excluding letters, digits, and + space</item> + <tag>space</tag><item>Whitespace (not quite the same as \s)</item> + <tag>upper</tag><item>Uppercase letters</item> + <tag>word</tag><item>"Word" characters (same as \w)</item> + <tag>xdigit</tag><item>Hexadecimal digits</item> + </taglist> + + <p>The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), + and space (32). Notice that this list includes the VT character (code 11). + This makes "space" different to \s, which does not include VT (for Perl + compatibility).</p> + + <p>The name "word" is a Perl extension, and "blank" is a GNU extension from + Perl 5.8. Another Perl extension is negation, which is indicated by a ^ + character after the colon. For example, the following matches "1", "2", + or any non-digit:</p> + + <code> +[12[:^digit:]]</code> + + <p>PCRE (and Perl) also recognize the Posix syntax [.ch.] and [=ch=] where + "ch" is a "collating element", but these are not supported, and an error + is given if they are encountered.</p> + + <p>By default, in UTF modes, characters with values > 255 do not match + any of the Posix character classes. 
However, if option <c>PCRE_UCP</c> is + passed to <c>pcre_compile()</c>, some of the classes are changed so that + Unicode character properties are used. This is achieved by replacing the + Posix classes by other sequences, as follows:</p> + + <taglist> + <tag>[:alnum:]</tag><item>Becomes <em>\p{Xan}</em></item> + <tag>[:alpha:]</tag><item>Becomes <em>\p{L}</em></item> + <tag>[:blank:]</tag><item>Becomes <em>\h</em></item> + <tag>[:digit:]</tag><item>Becomes <em>\p{Nd}</em></item> + <tag>[:lower:]</tag><item>Becomes <em>\p{Ll}</em></item> + <tag>[:space:]</tag><item>Becomes <em>\p{Xps}</em></item> + <tag>[:upper:]</tag><item>Becomes <em>\p{Lu}</em></item> + <tag>[:word:]</tag><item>Becomes <em>\p{Xwd}</em></item> + </taglist> + + <p>Negated versions, such as [:^alpha:], use \P instead of \p. The other + Posix classes are unchanged, and match only characters with code points + < 256.</p> + </section> -</section> + <section> + <marker id="sect9"></marker> + <title>Vertical Bar</title> + <p>Vertical bar characters are used to separate alternative patterns. For + example, the following pattern matches either "gilbert" or "sullivan":</p> + + <code> +gilbert|sullivan</code> + + <p>Any number of alternatives can appear, and an empty alternative is + permitted (matching the empty string). The matching process tries each + alternative in turn, from left to right, and the first that succeeds is + used. 
If the alternatives are within a subpattern (defined in section + <seealso marker="#sect11">Subpatterns</seealso>), "succeeds" means + matching the remaining main pattern and the alternative in the + subpattern.</p> + </section> -<section><marker id="sect12"></marker><title>Duplicate subpattern numbers</title> + <section> + <marker id="sect10"></marker> + <title>Internal Option Setting</title> + <p>The settings of the Perl-compatible options <c>caseless</c>, + <c>multiline</c>, <c>dotall</c>, and <c>extended</c> can be changed from + within the pattern by a sequence of Perl option letters enclosed between + "(?" and ")". The option letters are as follows:</p> + + <taglist> + <tag>i</tag><item>For <c>caseless</c></item> + <tag>m</tag><item>For <c>multiline</c></item> + <tag>s</tag><item>For <c>dotall</c></item> + <tag>x</tag><item>For <c>extended</c></item> + </taglist> + + <p>For example, <c>(?im)</c> sets caseless, multiline matching. These + options can also be unset by preceding the letter with a hyphen. A + combined setting and unsetting such as <c>(?im-sx)</c>, which sets + <c>caseless</c> and <c>multiline</c>, while unsetting <c>dotall</c> and + <c>extended</c>, is also permitted. If a letter appears both before and + after the hyphen, the option is unset.</p> + + <p>The PCRE-specific options <c>dupnames</c>, <c>ungreedy</c>, and + <c>extra</c> can be changed in the same way as the Perl-compatible + options by using the characters J, U, and X respectively.</p> + + <p>When one of these option changes occurs at top-level (that is, not inside + subpattern parentheses), the change applies to the remainder of the + pattern that follows. If the change is placed right at the start of a + pattern, PCRE extracts it into the global options.</p> + <p>An option change within a subpattern (see section + <seealso marker="#sect11">Subpatterns</seealso>) affects only that part of + the subpattern that follows it. 
So, the following matches abc and aBc and + no other strings (assuming <c>caseless</c> is not used):</p> + + <code> +(a(?i)b)c</code> + + <p>By this means, options can be made to have different settings in + different parts of the pattern. Any changes made in one alternative do + carry on into subsequent branches within the same subpattern. For + example:</p> + + <code> +(a(?i)b|c)</code> + + <p>matches "ab", "aB", "c", and "C", although when matching "C" the first + branch is abandoned before the option setting. This is because the effects + of option settings occur at compile time. There would be some weird + behavior otherwise.</p> -<p>Perl 5.10 introduced a feature whereby each alternative in a subpattern uses -the same numbers for its capturing parentheses. Such a subpattern starts with -(?| and is itself a non-capturing subpattern. For example, consider this -pattern:</p> + <note> + <p>Other PCRE-specific options can be set by the application when the + compiling or matching functions are called. Sometimes the pattern can + contain special leading sequences, such as (*CRLF), to override what + the application has set or what has been defaulted. Details are provided + in section <seealso marker="#newline_sequences"> + Newline Sequences</seealso> earlier.</p> + <p>The (*UTF8) and (*UCP) leading sequences can be used to set UTF and + Unicode property modes. They are equivalent to setting options + <c>unicode</c> and <c>ucp</c>, respectively. The (*UTF) sequence is a + generic version that can be used with any of the libraries. However, + the application can set option <c>never_utf</c>, which locks out the + use of the (*UTF) sequences.</p> + </note> + </section> -<quote><p> (?|(Sat)ur|(Sun))day</p></quote> + <section> + <marker id="sect11"></marker> + <title>Subpatterns</title> + <p>Subpatterns are delimited by parentheses (round brackets), which can be + nested. 
Turning part of a pattern into a subpattern does two things:</p> -<p>Because the two alternatives are inside a (?| group, both sets of capturing -parentheses are numbered one. Thus, when the pattern matches, you can look -at captured substring number one, whichever alternative matched. This construct -is useful when you want to capture part, but not all, of one of a number of -alternatives. Inside a (?| group, parentheses are numbered as usual, but the -number is reset at the start of each branch. The numbers of any capturing -parentheses that follow the subpattern start after the highest number used in -any branch. The following example is taken from the Perl documentation. The -numbers underneath show in which buffer the captured content will be stored.</p> + <taglist> + <tag>1.</tag> + <item> + <p>It localizes a set of alternatives. For example, the following + pattern matches "cataract", "caterpillar", or "cat":</p> + <code> +cat(aract|erpillar|)</code> + <p>Without the parentheses, it would match "cataract", "erpillar", or an + empty string.</p> + </item> + <tag>2.</tag> + <item> + <p>It sets up the subpattern as a capturing subpattern. That is, when + the complete pattern matches, that portion of the subject string that + matched the subpattern is passed back to the caller through the + return value of <seealso marker="#run/3"><c>run/3</c></seealso>.</p> + </item> + </taglist> + + <p>Opening parentheses are counted from left to right (starting from 1) to + obtain numbers for the capturing subpatterns. For example, if the string + "the red king" is matched against the following pattern, the captured + substrings are "red king", "red", and "king", and are numbered 1, 2, and + 3, respectively:</p> + + <code> +the ((red|white) (king|queen))</code> + + <p>It is not always helpful that plain parentheses fulfill two functions. + Often a grouping subpattern is required without a capturing requirement. 
+ If an opening parenthesis is followed by a question mark and a colon, the + subpattern does not do any capturing, and is not counted when computing + the number of any subsequent capturing subpatterns. For example, if the + string "the white queen" is matched against the following pattern, the + captured substrings are "white queen" and "queen", and are numbered 1 and + 2:</p> + + <code> +the ((?:red|white) (king|queen))</code> + + <p>The maximum number of capturing subpatterns is 65535.</p> + + <p>As a convenient shorthand, if any option settings are required at the + start of a non-capturing subpattern, the option letters can appear between + "?" and ":". Thus, the following two patterns match the same set of + strings:</p> + + <code> +(?i:saturday|sunday) +(?:(?i)saturday|sunday)</code> + + <p>As alternative branches are tried from left to right, and options are not + reset until the end of the subpattern is reached, an option setting in one + branch does affect subsequent branches, so the above patterns match both + "SUNDAY" and "Saturday".</p> + </section> -<code type="none"> - # before ---------------branch-reset----------- after - / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x - # 1 2 2 3 2 3 4</code> - -<p>A back reference to a numbered subpattern uses the most recent value that is -set for that number by any subpattern. The following pattern matches "abcabc" -or "defdef":</p> - -<quote><p> /(?|(abc)|(def))\1/</p></quote> - -<p>In contrast, a subroutine call to a numbered subpattern always refers to the -first one in the pattern with the given number. 
The following pattern matches -"abcabc" or "defabc":</p> - -<quote><p> /(?|(abc)|(def))(?1)/</p></quote> - -<p>If a condition test -for a subpattern's having matched refers to a non-unique number, the test is -true if any of the subpatterns of that number have matched.</p> - -<p>An alternative approach to using this "branch reset" feature is to use -duplicate named subpatterns, as described in the next section.</p> - -</section> - -<section><marker id="sect13"></marker><title>Named subpatterns</title> - -<p>Identifying capturing parentheses by number is simple, but it can be very hard -to keep track of the numbers in complicated regular expressions. Furthermore, -if an expression is modified, the numbers may change. To help with this -difficulty, PCRE supports the naming of subpatterns. This feature was not -added to Perl until release 5.10. Python had the feature earlier, and PCRE -introduced it at release 4.0, using the Python syntax. PCRE now supports both -the Perl and the Python syntax. Perl allows identically numbered subpatterns to -have different names, but PCRE does not.</p> - -<p>In PCRE, a subpattern can be named in one of three ways: -(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) -as in Python. References to capturing parentheses from other parts of -the pattern, such as back references, recursion, and conditions, can be -made by name as well as by number.</p> - -<p>Names consist of up to 32 alphanumeric characters and underscores. Named -capturing parentheses are still allocated numbers as well as names, exactly as -if the names were not present. -<!-- XXX C Interface -The PCRE API provides function calls for -extracting the name-to-number translation table from a compiled pattern. There -is also a convenience function for extracting a captured substring by name. ---> -The <c>capture</c> specification to <c>re:run/3</c> can use named values if they are present in the regular expression. 
-</p> - -<p>By default, a name must be unique within a pattern, but it is possible to relax -this constraint by setting the <c>dupnames</c> option at compile time. (Duplicate -names are also always permitted for subpatterns with the same number, set up as -described in the previous section.) Duplicate names can be useful for patterns -where only one instance of the named parentheses can match. Suppose you want to -match the name of a weekday, either as a 3-letter abbreviation or as the full -name, and in both cases you want to extract the abbreviation. This pattern -(ignoring the line breaks) does the job:</p> + <section> + <marker id="sect12"></marker> + <title>Duplicate Subpattern Numbers</title> + <p>Perl 5.10 introduced a feature where each alternative in a subpattern + uses the same numbers for its capturing parentheses. Such a subpattern + starts with <c>(?|</c> and is itself a non-capturing subpattern. For + example, consider the following pattern:</p> + + <code> +(?|(Sat)ur|(Sun))day</code> + + <p>As the two alternatives are inside a <c>(?|</c> group, both sets of + capturing parentheses are numbered one. Thus, when the pattern matches, + you can look at captured substring number one, whichever alternative + matched. This construct is useful when you want to capture a part, but + not all, of one of many alternatives. Inside a <c>(?|</c> group, + parentheses are numbered as usual, but the number is reset at the start + of each branch. The numbers of any capturing parentheses that follow the + subpattern start after the highest number used in any branch. + The following example is from the Perl documentation; the numbers + underneath show in which buffer the captured content is stored:</p> <code type="none"> - (?<DN>Mon|Fri|Sun)(?:day)?| - (?<DN>Tue)(?:sday)?| - (?<DN>Wed)(?:nesday)?| - (?<DN>Thu)(?:rsday)?| - (?<DN>Sat)(?:urday)?</code> - -<p>There are five capturing substrings, but only one is ever set after a match. 
-(An alternative way of solving this problem is to use a "branch reset" -subpattern, as described in the previous section.)</p> - -<!-- XXX C Interface - -<p>The convenience function for extracting the data by name returns the substring -for the first (and in this example, the only) subpattern of that name that -matched. This saves searching to find which numbered subpattern it was. If you -make a reference to a non-unique named subpattern from elsewhere in the -pattern, the one that corresponds to the lowest number is used. For further -details of the interfaces for handling named subpatterns, see the -<em>pcreapi</em> - -documentation.</p> ---> - -<p>In case of capturing named subpatterns which names are not unique, the first matching occurrence (counted from left to right in the subject) is returned from <c>re:exec/3</c>, if the name is specified in the <c>values</c> part of the <c>capture</c> statement. The <c>all_names</c> capturing value will match all of the names in the same way.</p> - -<p><em>Warning:</em> You cannot use different names to distinguish between two -subpatterns with the same number because PCRE uses only the numbers when -matching. For this reason, an error is given at compile time if different names -are given to subpatterns with the same number. 
However, you can give the same -name to subpatterns with the same number, even when <c>dupnames</c> is not set.</p> - -</section> - -<section><marker id="sect14"></marker><title>Repetition</title> - -<p>Repetition is specified by quantifiers, which can follow any of the -following items:</p> - -<list> - <item>a literal data character</item> - <item>the dot metacharacter</item> - <item>the \C escape sequence</item> - <item>the \X escape sequence</item> - <item>the \R escape sequence</item> - <item>an escape such as \d or \pL that matches a single character</item> - <item>a character class</item> - <item>a back reference (see next section)</item> - <item>a parenthesized subpattern (including assertions)</item> - <item>a subroutine call to a subpattern (recursive or otherwise)</item> -</list> - -<p>The general repetition quantifier specifies a minimum and maximum number of -permitted matches, by giving the two numbers in curly brackets (braces), -separated by a comma. The numbers must be less than 65536, and the first must -be less than or equal to the second. For example:</p> - -<quote><p> z{2,4}</p></quote> - -<p>matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special -character. If the second number is omitted, but the comma is present, there is -no upper limit; if the second number and the comma are both omitted, the -quantifier specifies an exact number of required matches. Thus</p> - -<quote><p> [aeiou]{3,}</p></quote> - -<p>matches at least 3 successive vowels, but may match many more, while</p> - -<quote><p> \d{8}</p></quote> - -<p>matches exactly 8 digits. An opening curly bracket that appears in a position -where a quantifier is not allowed, or one that does not match the syntax of a -quantifier, is taken as a literal character. For example, {,6} is not a -quantifier, but a literal string of four characters.</p> - -<p>In Unicode mode, quantifiers apply to characters rather than to individual data -units. 
Thus, for example, \x{100}{2} matches two characters, each of -which is represented by a two-byte sequence in a UTF-8 string. Similarly, -\X{3} matches three Unicode extended grapheme clusters, each of which may be -several data units long (and they may be of different lengths).</p> -<p>The quantifier {0} is permitted, causing the expression to behave as if the -previous item and the quantifier were not present. This may be useful for -subpatterns that are referenced as subroutines -from elsewhere in the pattern (but see also the section entitled -"Defining subpatterns for use by reference only" -below). Items other than subpatterns that have a {0} quantifier are omitted -from the compiled pattern.</p> - -<p>For convenience, the three most common quantifiers have single-character -abbreviations:</p> - -<taglist> - <tag>*</tag> <item>is equivalent to {0,}</item> - <tag>+</tag> <item>is equivalent to {1,}</item> - <tag>?</tag> <item>is equivalent to {0,1}</item> -</taglist> - -<p>It is possible to construct infinite loops by following a -subpattern that can match no characters with a quantifier that has no -upper limit, for example:</p> - -<quote><p> (a?)*</p></quote> - -<p>Earlier versions of Perl and PCRE used to give an error at compile time for -such patterns. However, because there are cases where this can be useful, such -patterns are now accepted, but if any repetition of the subpattern does in fact -match no characters, the loop is forcibly broken.</p> - -<p>By default, the quantifiers are "greedy", that is, they match as much as -possible (up to the maximum number of permitted times), without causing the -rest of the pattern to fail. The classic example of where this gives problems -is in trying to match comments in C programs. These appear between /* and */ -and within the comment, individual * and / characters may appear. 
An attempt to -match C comments by applying the pattern</p> - -<quote><p> /\*.*\*/</p></quote> - -<p>to the string</p> - -<quote><p> /* first comment */ not comment /* second comment */</p></quote> - -<p>fails, because it matches the entire string owing to the greediness of the .* -item.</p> - -<p>However, if a quantifier is followed by a question mark, it ceases to be -greedy, and instead matches the minimum number of times possible, so the -pattern</p> - -<quote><p> /\*.*?\*/</p></quote> - -<p>does the right thing with the C comments. The meaning of the various -quantifiers is not otherwise changed, just the preferred number of matches. -Do not confuse this use of question mark with its use as a quantifier in its -own right. Because it has two uses, it can sometimes appear doubled, as in</p> - -<quote><p> \d??\d</p></quote> - -<p>which matches one digit by preference, but can match two if that is the only -way the rest of the pattern matches.</p> - -<p>If the <c>ungreedy</c> option is set (an option that is not available in Perl), -the quantifiers are not greedy by default, but individual ones can be made -greedy by following them with a question mark. In other words, it inverts the -default behaviour.</p> - -<p>When a parenthesized subpattern is quantified with a minimum repeat count that -is greater than 1 or with a limited maximum, more memory is required for the -compiled pattern, in proportion to the size of the minimum or maximum.</p> - -<p>If a pattern starts with .* or .{0,} and the <c>dotall</c> option (equivalent -to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is -implicitly anchored, because whatever follows will be tried against every -character position in the subject string, so there is no point in retrying the -overall match at any position after the first. 
PCRE normally treats such a -pattern as though it were preceded by \A.</p> - -<p>In cases where it is known that the subject string contains no newlines, it is -worth setting <c>dotall</c> in order to obtain this optimization, or -alternatively using ^ to indicate anchoring explicitly.</p> - -<p>However, there are some cases where the optimization cannot be used. When .* -is inside capturing parentheses that are the subject of a back reference -elsewhere in the pattern, a match at the start may fail where a later one -succeeds. Consider, for example:</p> - -<quote><p> (.*)abc\1</p></quote> - -<p>If the subject is "xyz123abc123" the match point is the fourth character. For -this reason, such a pattern is not implicitly anchored.</p> - -<p>Another case where implicit anchoring is not applied is when the leading .* is -inside an atomic group. Once again, a match at the start may fail where a later -one succeeds. Consider this pattern:</p> - -<quote><p> (?>.*?a)b</p></quote> - -<p>It matches "ab" in the subject "aab". The use of the backtracking control verbs -(*PRUNE) and (*SKIP) also disable this optimization.</p> - -<p>When a capturing subpattern is repeated, the value captured is the substring -that matched the final iteration. For example, after</p> - -<quote><p> (tweedle[dume]{3}\s*)+</p></quote> - -<p>has matched "tweedledum tweedledee" the value of the captured substring is -"tweedledee". However, if there are nested capturing subpatterns, the -corresponding captured values may have been set in previous iterations. 
For -example, after</p> - -<quote><p> /(a|(b))+/</p></quote> - -<p>matches "aba" the value of the second captured substring is "b".</p> - - -</section> - -<section><marker id="sect15"></marker><title>Atomic grouping and possessive quantifiers</title> +# before ---------------branch-reset----------- after +/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x +# 1 2 2 3 2 3 4</code> -<p>With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") -repetition, failure of what follows normally causes the repeated item to be -re-evaluated to see if a different number of repeats allows the rest of the -pattern to match. Sometimes it is useful to prevent this, either to change the -nature of the match, or to cause it fail earlier than it otherwise might, when -the author of the pattern knows there is no point in carrying on.</p> + <p>A back reference to a numbered subpattern uses the most recent value that + is set for that number by any subpattern. The following pattern matches + "abcabc" or "defdef":</p> -<p>Consider, for example, the pattern \d+foo when applied to the subject line</p> + <code> +/(?|(abc)|(def))\1/</code> -<quote><p> 123456bar</p></quote> + <p>In contrast, a subroutine call to a numbered subpattern always refers to + the first one in the pattern with the given number. The following pattern + matches "abcabc" or "defabc":</p> -<p>After matching all 6 digits and then failing to match "foo", the normal -action of the matcher is to try again with only 5 digits matching the \d+ -item, and then with 4, and so on, before ultimately failing. "Atomic grouping" -(a term taken from Jeffrey Friedl's book) provides the means for specifying -that once a subpattern has matched, it is not to be re-evaluated in this way.</p> + <code> +/(?|(abc)|(def))(?1)/</code> -<p>If we use atomic grouping for the previous example, the matcher gives up -immediately on failing to match "foo" the first time. 
The notation is a kind of -special parenthesis, starting with (?> as in this example:</p> + <p>If a condition test for a subpattern having matched refers to a + non-unique number, the test is true if any of the subpatterns of that + number have matched.</p> -<quote><p> (?>\d+)foo</p></quote> - -<p>This kind of parenthesis "locks up" the part of the pattern it contains once -it has matched, and a failure further into the pattern is prevented from -backtracking into it. Backtracking past it to previous items, however, works as -normal.</p> - -<p>An alternative description is that a subpattern of this type matches the string -of characters that an identical standalone pattern would match, if anchored at -the current point in the subject string.</p> - -<p>Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as -the above example can be thought of as a maximizing repeat that must swallow -everything it can. So, while both \d+ and \d+? are prepared to adjust the -number of digits they match in order to make the rest of the pattern match, -(?>\d+) can only match an entire sequence of digits.</p> - -<p>Atomic groups in general can of course contain arbitrarily complicated -subpatterns, and can be nested. However, when the subpattern for an atomic -group is just a single repeated item, as in the example above, a simpler -notation, called a "possessive quantifier" can be used. This consists of an -additional + character following a quantifier. Using this notation, the -previous example can be rewritten as</p> - -<quote><p> \d++foo</p></quote> - -<p>Note that a possessive quantifier can be used with an entire group, for -example:</p> - -<quote><p> (abc|xyz){2,3}+</p></quote> - -<p>Possessive quantifiers are always greedy; the setting of the <c>ungreedy</c> -option is ignored. They are a convenient notation for the simpler forms of -atomic group. 
However, there is no difference in the meaning of a possessive -quantifier and the equivalent atomic group, though there may be a performance -difference; possessive quantifiers should be slightly faster.</p> - -<p>The possessive quantifier syntax is an extension to the Perl 5.8 syntax. -Jeffrey Friedl originated the idea (and the name) in the first edition of his -book. Mike McCloskey liked it, so implemented it when he built Sun's Java -package, and PCRE copied it from there. It ultimately found its way into Perl -at release 5.10.</p> - -<p>PCRE has an optimization that automatically "possessifies" certain simple -pattern constructs. For example, the sequence A+B is treated as A++B because -there is no point in backtracking into a sequence of A's when B must follow.</p> + <p>An alternative approach using this "branch reset" feature is to use + duplicate named subpatterns, as described in the next section.</p> + </section> -<p>When a pattern contains an unlimited repeat inside a subpattern that can itself -be repeated an unlimited number of times, the use of an atomic group is the -only way to avoid some failing matches taking a very long time indeed. The -pattern</p> + <section> + <marker id="sect13"></marker> + <title>Named Subpatterns</title> + <p>Identifying capturing parentheses by number is simple, but it can be + hard to keep track of the numbers in complicated regular expressions. + Also, if an expression is modified, the numbers can change. To help with + this difficulty, PCRE supports the naming of subpatterns. This feature was + not added to Perl until release 5.10. Python had the feature earlier, and + PCRE introduced it at release 4.0, using the Python syntax. PCRE now + supports both the Perl and the Python syntax. 
Perl allows identically + numbered subpatterns to have different names, but PCRE does not.</p> + + <p>In PCRE, a subpattern can be named in one of three ways: + <c>(?<name>...)</c> or <c>(?'name'...)</c> as in Perl, or + <c>(?P<name>...)</c> as in Python. References to capturing + parentheses from other parts of the pattern, such as back references, + recursion, and conditions, can be made by name and by number.</p> + + <p>Names consist of up to 32 alphanumeric characters and underscores. Named + capturing parentheses are still allocated numbers as well as names, + exactly as if the names were not present. + The <c>capture</c> specification to <seealso marker="#run/3"> + <c>run/3</c></seealso> can use named values if they are present in the + regular expression.</p> -<quote><p> (\D+|<\d+>)*[!?]</p></quote> + <p>By default, a name must be unique within a pattern, but this constraint + can be relaxed by setting option <c>dupnames</c> at compile time. + (Duplicate names are also always permitted for subpatterns with the same + number, set up as described in the previous section.) Duplicate names can + be useful for patterns where only one instance of the named parentheses + can match. Suppose that you want to match the name of a weekday, either as + a 3-letter abbreviation or as the full name, and in both cases you want to + extract the abbreviation. The following pattern (ignoring the line + breaks) does the job:</p> + + <code type="none"> +(?<DN>Mon|Fri|Sun)(?:day)?| +(?<DN>Tue)(?:sday)?| +(?<DN>Wed)(?:nesday)?| +(?<DN>Thu)(?:rsday)?| +(?<DN>Sat)(?:urday)?</code> + + <p>There are five capturing substrings, but only one is ever set after a + match. 
(An alternative way of solving this problem is to use a "branch + reset" subpattern, as described in the previous section.)</p> + + <p>For capturing named subpatterns whose names are not unique, the first + matching occurrence (counted from left to right in the subject) is + returned from <seealso marker="#run/3"><c>run/3</c></seealso>, if the name + is specified in the <c>values</c> part of the <c>capture</c> statement. + The <c>all_names</c> capturing value matches all the names in the same + way.</p> -<p>matches an unlimited number of substrings that either consist of non-digits, or -digits enclosed in <>, followed by either ! or ?. When it matches, it runs -quickly. However, if it is applied to</p> + <note> + <p>You cannot use different names to distinguish between two subpatterns + with the same number, as PCRE uses only the numbers when matching. For + this reason, an error is given at compile time if different names are + specified for subpatterns with the same number. However, you can give + the same name to subpatterns with the same number, even when + <c>dupnames</c> is not set.</p> + </note> + </section> -<quote><p> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</p></quote> + <section> + <marker id="sect14"></marker> + <title>Repetition</title> + <p>Repetition is specified by quantifiers, which can follow any of the + following items:</p> + + <list type="bulleted"> + <item>A literal data character</item> + <item>The dot metacharacter</item> + <item>The \C escape sequence</item> + <item>The \X escape sequence</item> + <item>The \R escape sequence</item> + <item>An escape such as \d or \pL that matches a single character</item> + <item>A character class</item> + <item>A back reference (see the next section)</item> + <item>A parenthesized subpattern (including assertions)</item> + <item>A subroutine call to a subpattern (recursive or otherwise)</item> + </list> + + <p>The general repetition quantifier specifies a minimum and maximum number + of 
permitted matches, by giving the two numbers in curly brackets + (braces), separated by a comma. The numbers must be < 65536, and the + first must be less than or equal to the second. For example, the following + matches "zz", "zzz", or "zzzz":</p> + + <code> +z{2,4}</code> + + <p>A closing brace on its own is not a special character. If the second + number is omitted, but the comma is present, there is no upper limit. If + the second number and the comma are both omitted, the quantifier specifies + an exact number of required matches. Thus, the following matches at least + three successive vowels, but can match many more:</p> + + <code> +[aeiou]{3,}</code> + + <p>The following matches exactly eight digits:</p> + + <code> +\d{8}</code> + + <p>An opening curly bracket that appears in a position where a quantifier is + not allowed, or one that does not match the syntax of a quantifier, is + taken as a literal character. For example, {,6} is not a quantifier, but a + literal string of four characters.</p> + + <p>In Unicode mode, quantifiers apply to characters rather than to + individual data units. Thus, for example, \x{100}{2} matches two + characters, each of which is represented by a 2-byte sequence in a + UTF-8 string. Similarly, \X{3} matches three Unicode extended grapheme + clusters, each of which can be many data units long (and they can be of + different lengths).</p> + + <p>The quantifier {0} is permitted, causing the expression to behave as if + the previous item and the quantifier were not present. This can be useful + for subpatterns that are referenced as subroutines from elsewhere in the + pattern (but see also section <seealso marker="#defining_subpatterns"> + Defining Subpatterns for Use by Reference Only</seealso>). 
Items other + than subpatterns that have a {0} quantifier are omitted from the compiled + pattern.</p> + + <p>For convenience, the three most common quantifiers have single-character + abbreviations:</p> + + <taglist> + <tag>*</tag><item>Equivalent to {0,}</item> + <tag>+</tag><item>Equivalent to {1,}</item> + <tag>?</tag><item>Equivalent to {0,1}</item> + </taglist> + + <p>Infinite loops can be constructed by following a subpattern that can + match no characters with a quantifier that has no upper limit, for + example:</p> + + <code> +(a?)*</code> + + <p>Earlier versions of Perl and PCRE used to give an error at compile time + for such patterns. However, as there are cases where this can be useful, + such patterns are now accepted. However, if any repetition of the + subpattern matches no characters, the loop is forcibly broken.</p> + + <p>By default, the quantifiers are "greedy", that is, they match as much as + possible (up to the maximum number of permitted times), without causing + the remaining pattern to fail. The classic example of where this gives + problems is in trying to match comments in C programs. These appear + between /* and */. Within the comment, individual * and / characters can + appear. An attempt to match C comments by applying the pattern</p> + + <code> +/\*.*\*/</code> + + <p>to the string</p> + + <code> +/* first comment */ not comment /* second comment */</code> + + <p>fails, as it matches the entire string owing to the greediness of the .* + item.</p> + + <p>However, if a quantifier is followed by a question mark, it ceases to be + greedy, and instead matches the minimum number of times possible, so the + following pattern does the right thing with the C comments:</p> + + <code> +/\*.*?\*/</code> + + <p>The meaning of the various quantifiers is not otherwise changed, only + the preferred number of matches. Do not confuse this use of question mark + with its use as a quantifier in its own right. 
As it has two uses, it can + sometimes appear doubled, as in</p> + + <code> +\d??\d</code> + + <p>which matches one digit by preference, but can match two if that is the + only way the remaining pattern matches.</p> -<p>it takes a long time before reporting failure. This is because the string can -be divided between the internal \D+ repeat and the external * repeat in a -large number of ways, and all have to be tried. (The example uses [!?] rather -than a single character at the end, because both PCRE and Perl have an -optimization that allows for fast failure when a single character is used. They -remember the last single character that is required for a match, and fail early -if it is not present in the string.) If the pattern is changed so that it uses -an atomic group, like this:</p> + <p>If option <c>ungreedy</c> is set (an option that is not available in + Perl), the quantifiers are not greedy by default, but individual ones can + be made greedy by following them with a question mark. That is, it inverts + the default behavior.</p> -<quote><p> ((?>\D+)|<\d+>)*[!?]</p></quote> + <p>When a parenthesized subpattern is quantified with a minimum repeat count + that is > 1 or with a limited maximum, more memory is required for the + compiled pattern, in proportion to the size of the minimum or maximum.</p> -<p>sequences of non-digits cannot be broken, and failure happens quickly.</p> - -</section> - -<section><marker id="sect16"></marker><title>Back references</title> - -<p>Outside a character class, a backslash followed by a digit greater than 0 (and -possibly further digits) is a back reference to a capturing subpattern earlier -(that is, to its left) in the pattern, provided there have been that many -previous capturing left parentheses.</p> - -<p>However, if the decimal number following the backslash is less than 10, it is -always taken as a back reference, and causes an error only if there are not -that many capturing left parentheses in the entire pattern. 
In other words, the -parentheses that are referenced need not be to the left of the reference for -numbers less than 10. A "forward back reference" of this type can make sense -when a repetition is involved and the subpattern to the right has participated -in an earlier iteration.</p> - -<p>It is not possible to have a numerical "forward back reference" to -a subpattern whose number is 10 or more using this syntax because a -sequence such as \50 is interpreted as a character defined in -octal. See the subsection entitled "Non-printing characters" above for -further details of the handling of digits following a backslash. There -is no such problem when named parentheses are used. A back reference -to any subpattern is possible using named parentheses (see below).</p> - -<p>Another way of avoiding the ambiguity inherent in the use of digits following a -backslash is to use the \g escape sequence. This escape must be followed by an -unsigned number or a negative number, optionally enclosed in braces. These -examples are all identical:</p> - -<list> - <item>(ring), \1</item> - <item>(ring), \g1</item> - <item>(ring), \g{1}</item> -</list> - -<p>An unsigned number specifies an absolute reference without the -ambiguity that is present in the older syntax. It is also useful when -literal digits follow the reference. A negative number is a relative -reference. Consider this example:</p> - -<quote><p> (abc(def)ghi)\g{-1}</p></quote> - -<p>The sequence \g{-1} is a reference to the most recently started capturing -subpattern before \g, that is, is it equivalent to \2 in this example. -Similarly, \g{-2} would be equivalent to \1. 
The use of relative references -can be helpful in long patterns, and also in patterns that are created by -joining together fragments that contain references within themselves.</p> - -<p>A back reference matches whatever actually matched the capturing -subpattern in the current subject string, rather than anything -matching the subpattern itself (see "Subpatterns as subroutines" below -for a way of doing that). So the pattern</p> - -<quote><p> (sens|respons)e and \1ibility</p></quote> - -<p>matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If caseful matching is in force at the time of the -back reference, the case of letters is relevant. For example,</p> - -<quote><p> ((?i)rah)\s+\1</p></quote> - -<p>matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original -capturing subpattern is matched caselessly.</p> - -<p>There are several different ways of writing back references to named -subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or -\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified -back reference syntax, in which \g can be used for both numeric and named -references, is also supported. We could rewrite the above example in any of -the following ways:</p> - -<list> - <item>(?<p1>(?i)rah)\s+\k<p1></item> - <item>(?'p1'(?i)rah)\s+\k{p1}</item> - <item>(?P<p1>(?i)rah)\s+(?P=p1)</item> - <item>(?<p1>(?i)rah)\s+\g{p1}</item> -</list> - -<p>A subpattern that is referenced by name may appear in the pattern before or -after the reference.</p> - -<p>There may be more than one back reference to the same subpattern. If a -subpattern has not actually been used in a particular match, any back -references to it always fail. For example, the pattern</p> - -<quote><p> (a|(bc))\2</p></quote> - -<p>always fails if it starts to match "a" rather than "bc". 
Because -there may be many capturing parentheses in a pattern, all digits -following the backslash are taken as part of a potential back -reference number. If the pattern continues with a digit character, -some delimiter must be used to terminate the back reference. If the -<c>extended</c> option is set, this can be whitespace. Otherwise an -empty comment (see "Comments" below) can be used.</p> - -<p><em>Recursive back references</em></p> - -<p>A back reference that occurs inside the parentheses to which it refers fails -when the subpattern is first used, so, for example, (a\1) never matches. -However, such references can be useful inside repeated subpatterns. For -example, the pattern</p> - -<quote><p> (a|b\1)+</p></quote> - -<p>matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of -the subpattern, the back reference matches the character string corresponding -to the previous iteration. In order for this to work, the pattern must be such -that the first iteration does not need to match the back reference. This can be -done using alternation, as in the example above, or by a quantifier with a -minimum of zero.</p> - -<p>Back references of this type cause the group that they reference to be treated -as an atomic group. -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group.</p> - -</section> - -<section><marker id="sect17"></marker><title>Assertions</title> - -<p>An assertion is a test on the characters following or preceding the current -matching point that does not actually consume any characters. The simple -assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described -above.</p> - - -<p>More complicated assertions are coded as subpatterns. There are two kinds: -those that look ahead of the current position in the subject string, and those -that look behind it. 
An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed.</p> - -<p>Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of -numbering the capturing subpatterns in the whole pattern. However, substring -capturing is carried out only for positive assertions. (Perl sometimes, but not -always, does do capturing in negative assertions.)</p> - -<p>For compatibility with Perl, assertion subpatterns may be repeated; though -it makes no sense to assert the same thing several times, the side effect of -capturing parentheses may occasionally be useful. In practice, there only three -cases:</p> - -<taglist> -<tag>(1)</tag> <item>If the quantifier is {0}, the assertion is never obeyed during matching. -However, it may contain internal capturing parenthesized groups that are called -from elsewhere via the subroutine mechanism.</item> -<tag>(2)</tag> <item>If quantifier is {0,n} where n is greater than zero, it is treated as if it -were {0,1}. At run time, the rest of the pattern match is tried with and -without the assertion, the order depending on the greediness of the quantifier.</item> -<tag>(3)</tag> <item>If the minimum repetition is greater than zero, the quantifier is ignored. -The assertion is obeyed just once when encountered during matching.</item> -</taglist> + <p>If a pattern starts with .* or .{0,} and option <c>dotall</c> (equivalent + to Perl option <c>/s</c>) is set, thus allowing the dot to match newlines, + the pattern is implicitly anchored, because whatever follows is tried + against every character position in the subject string. So, there is no + point in retrying the overall match at any position after the first. 
PCRE + normally treats such a pattern as if it were preceded by \A.</p> -<p><em>Lookahead assertions</em></p> + <p>In cases where it is known that the subject string contains no newlines, + it is worth setting <c>dotall</c> to obtain this optimization, or + alternatively using ^ to indicate anchoring explicitly.</p> -<p>Lookahead assertions start with (?= for positive assertions and (?! for -negative assertions. For example,</p> + <p>However, there are some cases where the optimization cannot be used. When + .* is inside capturing parentheses that are the subject of a back + reference elsewhere in the pattern, a match at the start can fail where a + later one succeeds. Consider, for example:</p> + + <code> +(.*)abc\1</code> -<quote><p> \w+(?=;)</p></quote> + <p>If the subject is "xyz123abc123", the match point is the fourth + character. Therefore, such a pattern is not implicitly anchored.</p> -<p>matches a word followed by a semicolon, but does not include the semicolon in -the match, and</p> + <p>Another case where implicit anchoring is not applied is when the leading + .* is inside an atomic group. Once again, a match at the start can fail + where a later one succeeds. Consider the following pattern:</p> -<quote><p> foo(?!bar)</p></quote> + <code> +(?>.*?a)b</code> -<p>matches any occurrence of "foo" that is not followed by "bar". Note that the -apparently similar pattern</p> + <p>It matches "ab" in the subject "aab". The use of the backtracking control + verbs (*PRUNE) and (*SKIP) also disables this optimization.</p> -<quote><p> (?!foo)bar</p></quote> + <p>When a capturing subpattern is repeated, the value captured is the + substring that matched the final iteration. For example, after</p> -<p>does not find an occurrence of "bar" that is preceded by something other than -"foo"; it finds any occurrence of "bar" whatsoever, because the assertion -(?!foo) is always true when the next three characters are "bar". 
A -lookbehind assertion is needed to achieve the other effect.</p> + <code> +(tweedle[dume]{3}\s*)+</code> -<p>If you want to force a matching failure at some point in a pattern, the most -convenient way to do it is with (?!) because an empty string always matches, so -an assertion that requires there not to be an empty string must always fail. -The backtracking control verb (*FAIL) or (*F) is a synonym for (?!).</p> + <p>has matched "tweedledum tweedledee", the value of the captured substring + is "tweedledee". However, if there are nested capturing subpatterns, the + corresponding captured values can have been set in previous iterations. + For example, after</p> + <code> +/(a|(b))+/</code> -<p><em>Lookbehind assertions</em></p> + <p>matches "aba", the value of the second captured substring is "b".</p> + </section> -<p>Lookbehind assertions start with (?<= for positive assertions and (?<! for -negative assertions. For example,</p> + <section> + <marker id="sect15"></marker> + <title>Atomic Grouping and Possessive Quantifiers</title> + <p>With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") + repetition, failure of what follows normally causes the repeated item to + be re-evaluated to see if a different number of repeats allows the + remaining pattern to match. Sometimes it is useful to prevent this, either + to change the nature of the match, or to cause it to fail earlier than it + otherwise might, when the author of the pattern knows that there is no + point in carrying on.</p> + + <p>Consider, for example, the pattern \d+foo when applied to the following + subject line:</p> + + <code> +123456bar</code> + + <p>After matching all six digits and then failing to match "foo", the normal + action of the matcher is to try again with only five digits matching item + \d+, and then with four, and so on, before ultimately failing. 
"Atomic + grouping" (a term taken from Jeffrey Friedl's book) provides the means for + specifying that once a subpattern has matched, it is not to be + re-evaluated in this way.</p> + + <p>If atomic grouping is used for the previous example, the matcher gives up + immediately on failing to match "foo" the first time. The notation is a + kind of special parenthesis, starting with <c>(?></c> as in the + following example:</p> + + <code> +(?>\d+)foo</code> + + <p>This kind of parenthesis "locks up" the part of the pattern it contains + once it has matched, and a failure further into the pattern is prevented + from backtracking into it. Backtracking past it to previous items, + however, works as normal.</p> + + <p>An alternative description is that a subpattern of this type matches the + string of characters that an identical standalone pattern would match, if + anchored at the current point in the subject string.</p> + + <p>Atomic grouping subpatterns are not capturing subpatterns. Simple cases + such as the above example can be thought of as a maximizing repeat that + must swallow everything it can. So, while both \d+ and \d+? are prepared + to adjust the number of digits they match to make the remaining pattern + match, <c>(?>\d+)</c> can only match an entire sequence of digits.</p> + + <p>Atomic groups in general can contain any complicated + subpatterns, and can be nested. However, when the subpattern for an atomic + group is just a single repeated item, as in the example above, a simpler + notation, called a "possessive quantifier" can be used. This consists of + an extra + character following a quantifier. Using this notation, the + previous example can be rewritten as</p> + + <code> +\d++foo</code> + + <p>Notice that a possessive quantifier can be used with an entire group, + for example:</p> + + <code> +(abc|xyz){2,3}+</code> + + <p>Possessive quantifiers are always greedy; the setting of option + <c>ungreedy</c> is ignored. 
They are a convenient notation for the simpler + forms of an atomic group. There is no difference in meaning between + a possessive quantifier and the equivalent atomic group, but there can + be a performance difference; possessive quantifiers are probably slightly + faster.</p> + + <p>The possessive quantifier syntax is an extension to the Perl 5.8 syntax. + Jeffrey Friedl originated the idea (and the name) in the first edition of + his book. Mike McCloskey liked it, so implemented it when he built the + Sun Java package, and PCRE copied it from there. It ultimately found its + way into Perl at release 5.10.</p> + + <p>PCRE has an optimization that automatically "possessifies" certain simple + pattern constructs. For example, the sequence A+B is treated as A++B, as + there is no point in backtracking into a sequence of A's when B must + follow.</p> + + <p>When a pattern contains an unlimited repeat inside a subpattern that can + itself be repeated an unlimited number of times, the use of an atomic + group is the only way to avoid some failing matches taking a long time. + The pattern</p> + + <code> +(\D+|<\d+>)*[!?]</code> + + <p>matches an unlimited number of substrings that either consist of + non-digits, or digits enclosed in <>, followed by ! or ?. When it + matches, it runs quickly. However, if it is applied to</p> + + <code> +aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> + + <p>it takes a long time before reporting failure. This is because the string + can be divided between the internal \D+ repeat and the external * repeat + in many ways, and all must be tried. (The example uses [!?] rather than a + single character at the end, as both PCRE and Perl have an optimization + that allows for fast failure when a single character is used. They + remember the last single character that is required for a match, and fail + early if it is not present in the string.) 
If the pattern is changed so + that it uses an atomic group, like the following, sequences of non-digits + cannot be broken, and failure happens quickly:</p> + + <code> +((?>\D+)|<\d+>)*[!?]</code> + </section> -<quote><p> (?<!foo)bar</p></quote> + <section> + <marker id="sect16"></marker> + <title>Back References</title> + <p>Outside a character class, a backslash followed by a digit > 0 (and + possibly further digits) is a back reference to a capturing subpattern + earlier (that is, to its left) in the pattern, provided there have been + that many previous capturing left parentheses.</p> + + <p>However, if the decimal number following the backslash is < 10, it is + always taken as a back reference, and causes an error only if there are + not that many capturing left parentheses in the entire pattern. That is, + the parentheses that are referenced need not be to the left of the + reference for numbers < 10. A "forward back reference" of this type can + make sense when a repetition is involved and the subpattern to the right + has participated in an earlier iteration.</p> + + <p>It is not possible to have a numerical "forward back reference" to a + subpattern whose number is 10 or more using this syntax, as a sequence + such as \50 is interpreted as a character defined in octal. For more + details of the handling of digits following a backslash, see section + <seealso marker="#non_printing_characters">Non-Printing + Characters</seealso> earlier. There is no such problem when named + parentheses are used. A back reference to any subpattern is possible + using named parentheses (see below).</p> + + <p>Another way to avoid the ambiguity inherent in the use of digits + following a backslash is to use the \g escape sequence. This escape must + be followed by an unsigned number or a negative number, optionally + enclosed in braces. 
The following examples are identical:</p> + + <code> +(ring), \1 +(ring), \g1 +(ring), \g{1}</code> + + <p>An unsigned number specifies an absolute reference without the ambiguity + that is present in the older syntax. It is also useful when literal digits + follow the reference. A negative number is a relative reference. Consider + the following example:</p> + + <code> +(abc(def)ghi)\g{-1}</code> + + <p>The sequence \g{-1} is a reference to the most recently started capturing + subpattern before \g, that is, it is equivalent to \2 in this example. + Similarly, \g{-2} would be equivalent to \1. The use of relative + references can be helpful in long patterns, and also in patterns that are + created by joining fragments containing references within themselves.</p> + + <p>A back reference matches whatever matched the capturing subpattern in the + current subject string, rather than anything matching the subpattern + itself (section <seealso marker="#sect21">Subpattern as + Subroutines</seealso> describes a way of doing that). So, the + following pattern matches "sense and sensibility" and "response and + responsibility", but not "sense and responsibility":</p> + + <code> +(sens|respons)e and \1ibility</code> + + <p>If caseful matching is in force at the time of the back reference, the + case of letters is relevant. For example, the following matches "rah rah" + and "RAH RAH", but not "RAH rah", although the original capturing + subpattern is matched caselessly:</p> + + <code> +((?i)rah)\s+\1</code> + + <p>There are many different ways of writing back references to named + subpatterns. The .NET syntax <c>\k{name}</c> and the Perl syntax + <c>\k<name></c> or <c>\k'name'</c> are supported, as is the Python + syntax <c>(?P=name)</c>. The unified back reference syntax in Perl 5.10, + in which \g can be used for both numeric and named references, is also + supported. 
The previous example can be rewritten in the following + ways:</p> + + <code> +(?<p1>(?i)rah)\s+\k<p1> +(?'p1'(?i)rah)\s+\k{p1} +(?P<p1>(?i)rah)\s+(?P=p1) +(?<p1>(?i)rah)\s+\g{p1}</code> + + <p>A subpattern that is referenced by name can appear in the pattern before + or after the reference.</p> + + <p>There can be more than one back reference to the same subpattern. If a + subpattern has not been used in a particular match, any back references to + it always fail. For example, the following pattern always fails if it + starts to match "a" rather than "bc":</p> + + <code> +(a|(bc))\2</code> + + <p>As there can be many capturing parentheses in a pattern, all digits + following the backslash are taken as part of a potential back reference + number. If the pattern continues with a digit character, some delimiter + must be used to terminate the back reference. If option <c>extended</c> is + set, this can be whitespace. Otherwise an empty comment (see section + <seealso marker="#sect19">Comments</seealso>) can be used.</p> + + <p><em>Recursive Back References</em></p> + + <p>A back reference that occurs inside the parentheses to which it refers + fails when the subpattern is first used, so, for example, (a\1) never + matches. However, such references can be useful inside repeated + subpatterns. For example, the following pattern matches any number of + "a"s and also "aba", "ababbaa", and so on:</p> + + <code> +(a|b\1)+</code> + + <p>At each iteration of the subpattern, the back reference matches the + character string corresponding to the previous iteration. In order for + this to work, the pattern must be such that the first iteration does not + need to match the back reference. This can be done using alternation, as + in the example above, or by a quantifier with a minimum of zero.</p> + + <p>Back references of this type cause the group that they reference to be + treated as an atomic group. 
Once the whole group has been matched, a + subsequent matching failure cannot cause backtracking into the middle of + the group.</p> + </section> -<p>does find an occurrence of "bar" that is not preceded by "foo". The contents of -a lookbehind assertion are restricted such that all the strings it matches must -have a fixed length. However, if there are several top-level alternatives, they -do not all have to have the same fixed length. Thus</p> + <section> + <marker id="sect17"></marker> + <title>Assertions</title> + <p>An assertion is a test on the characters following or preceding the + current matching point that does not consume any characters. The simple + assertions coded as \b, \B, \A, \G, \Z, \z, ^, and $ are described in + the previous sections.</p> + + <p>More complicated assertions are coded as subpatterns. There are two + kinds: those that look ahead of the current position in the subject + string, and those that look behind it. An assertion subpattern is matched + in the normal way, except that it does not cause the current matching + position to be changed.</p> + + <p>Assertion subpatterns are not capturing subpatterns. If such an assertion + contains capturing subpatterns within it, these are counted for the + purposes of numbering the capturing subpatterns in the whole pattern. + However, substring capturing is done only for positive assertions. (Perl + sometimes, but not always, performs capturing in negative assertions.)</p> + + <p>For compatibility with Perl, assertion subpatterns can be repeated. + Although it makes no sense to assert the same thing many times, the side + effect of capturing parentheses can occasionally be useful. In practice, + there are only three cases:</p> + + <list type="bulleted"> + <item> + <p>If the quantifier is {0}, the assertion is never obeyed during + matching. 
However, it can contain internal capturing parenthesized + groups that are called from elsewhere through the subroutine + mechanism.</p> + </item> + <item> + <p>If the quantifier is {0,n}, where n > 0, it is treated as if it were + {0,1}. At runtime, the remaining pattern match is tried with and + without the assertion; the order depends on the greediness of the + quantifier.</p> + </item> + <item> + <p>If the minimum repetition is > 0, the quantifier is ignored. The + assertion is obeyed only once when encountered during matching.</p> + </item> + </list> -<quote><p> (?<=bullock|donkey)</p></quote> + <p><em>Lookahead Assertions</em></p> -<p>is permitted, but</p> + <p>Lookahead assertions start with (?= for positive assertions and (?! for + negative assertions. For example, the following matches a word followed by + a semicolon, but does not include the semicolon in the match:</p> -<quote><p> (?<!dogs?|cats?)</p></quote> + <code> +\w+(?=;)</code> -<p>causes an error at compile time. Branches that match different length strings -are permitted only at the top level of a lookbehind assertion. This is an -extension compared with Perl, which requires all branches to -match the same length of string. An assertion such as</p> + <p>The following matches any occurrence of "foo" that is not followed by + "bar":</p> -<quote><p> (?<=ab(c|de))</p></quote> + <code> +foo(?!bar)</code> -<p>is not permitted, because its single top-level branch can match two different -lengths, but it is acceptable to PCRE if rewritten to use two top-level -branches:</p> + <p>Notice that the apparently similar pattern</p> -<quote><p> (?<=abc|abde)</p></quote> + <code> +(?!foo)bar</code> -<p>In some cases, the escape sequence \K (see above) can be -used instead of a lookbehind assertion to get round the fixed-length -restriction.</p> + <p>does not find an occurrence of "bar" that is preceded by something other + than "foo". 
It finds any occurrence of "bar" whatsoever, as the assertion + (?!foo) is always true when the next three characters are "bar". A + lookbehind assertion is needed to achieve the other effect.</p> -<p>The implementation of lookbehind assertions is, for each alternative, to -temporarily move the current position back by the fixed length and then try to -match. If there are insufficient characters before the current position, the -assertion fails.</p> + <p>If you want to force a matching failure at some point in a pattern, the + most convenient way to do it is with (?!), as an empty string always + matches. So, an assertion that requires that there is no empty + string must always fail. The backtracking control verb (*FAIL) or (*F) is + a synonym for (?!).</p> -<p>In a UTF mode, PCRE does not allow the \C escape (which matches a single data -unit even in a UTF mode) to appear in lookbehind assertions, because it makes -it impossible to calculate the length of the lookbehind. The \X and \R -escapes, which can match different numbers of data units, are also not -permitted.</p> -<p>"Subroutine" calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long -as the subpattern matches a fixed-length string. Recursion, -however, is not supported.</p> + <p><em>Lookbehind Assertions</em></p> -<p>Possessive quantifiers can be used in conjunction with lookbehind assertions to -specify efficient matching of fixed-length strings at the end of subject -strings. Consider a simple pattern such as</p> + <p>Lookbehind assertions start with (?<= for positive assertions and + (?<! for negative assertions. For example, the following finds an + occurrence of "bar" that is not preceded by "foo":</p> -<quote><p> abcd$</p></quote> + <code> +(?<!foo)bar</code> -<p>when applied to a long string that does not match. Because matching proceeds -from left to right, PCRE will look for each "a" in the subject and then see if -what follows matches the rest of the pattern. 
If the pattern is specified as</p> + <p>The contents of a lookbehind assertion are restricted such that all the + strings it matches must have a fixed length. However, if there are many + top-level alternatives, they do not all have to have the same fixed + length. Thus, the following is permitted:</p> -<quote><p> ^.*abcd$</p></quote> + <code> +(?<=bullock|donkey)</code> -<p>the initial .* matches the entire string at first, but when this fails (because -there is no following "a"), it backtracks to match all but the last character, -then all but the last two characters, and so on. Once again the search for "a" -covers the entire string, from right to left, so we are no better off. However, -if the pattern is written as</p> + <p>The following causes an error at compile time:</p> -<quote><p> ^.*+(?<=abcd)</p></quote> + <code> +(?<!dogs?|cats?)</code> -<p>there can be no backtracking for the .*+ item; it can match only the entire -string. The subsequent lookbehind assertion does a single test on the last four -characters. If it fails, the match fails immediately. For long strings, this -approach makes a significant difference to the processing time.</p> + <p>Branches that match different length strings are permitted only at the + top-level of a lookbehind assertion. This is an extension compared with + Perl, which requires all branches to match the same length of string. An + assertion such as the following is not permitted, as its single top-level + branch can match two different lengths:</p> -<p><em>Using multiple assertions</em></p> + <code> +(?<=ab(c|de))</code> -<p>Several assertions (of any sort) may occur in succession. For example,</p> + <p>However, it is acceptable to PCRE if rewritten to use two top-level + branches:</p> -<quote><p> (?<=\d{3})(?<!999)foo</p></quote> + <code> +(?<=abc|abde)</code> -<p>matches "foo" preceded by three digits that are not "999". 
Notice -that each of the assertions is applied independently at the same point -in the subject string. First there is a check that the previous three -characters are all digits, and then there is a check that the same -three characters are not "999". This pattern does <em>not</em> match -"foo" preceded by six characters, the first of which are digits and -the last three of which are not "999". For example, it doesn't match -"123abcfoo". A pattern to do that is</p> + <p>Sometimes the escape sequence \K (see above) can be used instead of + a lookbehind assertion to get round the fixed-length restriction.</p> -<quote><p> (?<=\d{3}...)(?<!999)foo</p></quote> + <p>The implementation of lookbehind assertions is, for each alternative, to + move the current position back temporarily by the fixed length and then + try to match. If there are insufficient characters before the current + position, the assertion fails.</p> -<p>This time the first assertion looks at the preceding six -characters, checking that the first three are digits, and then the -second assertion checks that the preceding three characters are not -"999".</p> + <p>In a UTF mode, PCRE does not allow the \C escape (which matches a single + data unit even in a UTF mode) to appear in lookbehind assertions, as it + makes it impossible to calculate the length of the lookbehind. The \X and + \R escapes, which can match different numbers of data units, are not + permitted either.</p> -<p>Assertions can be nested in any combination. For example,</p> + <p>"Subroutine" calls (see below), such as (?2) or (?&X), are permitted + in lookbehinds, as long as the subpattern matches a fixed-length string. + Recursion, however, is not supported.</p> -<quote><p> (?<=(?<!foo)bar)baz</p></quote> + <p>Possessive quantifiers can be used with lookbehind + assertions to specify efficient matching of fixed-length strings at the + end of subject strings. 
Consider the following simple pattern when applied + to a long string that does not match:</p> -<p>matches an occurrence of "baz" that is preceded by "bar" which in -turn is not preceded by "foo", while</p> + <code> +abcd$</code> -<quote><p> (?<=\d{3}(?!999)...)foo</p></quote> + <p>As matching proceeds from left to right, PCRE looks for each "a" in the + subject and then sees if what follows matches the remaining pattern. If + the pattern is specified as</p> -<p>is another pattern that matches "foo" preceded by three digits and any three -characters that are not "999".</p> + <code> +^.*abcd$</code> -</section> + <p>the initial .* matches the entire string at first. However, when this + fails (as there is no following "a"), it backtracks to match all but the + last character, then all but the last two characters, and so on. Once + again the search for "a" covers the entire string, from right to left, so + we are no better off. However, if the pattern is written as</p> -<section><marker id="sect18"></marker><title>Conditional subpatterns</title> + <code> +^.*+(?<=abcd)</code> -<p>It is possible to cause the matching process to obey a subpattern -conditionally or to choose between two alternative subpatterns, depending on -the result of an assertion, or whether a specific capturing subpattern has -already been matched. The two possible forms of conditional subpattern are:</p> + <p>there can be no backtracking for the .*+ item; it can match only the + entire string. The subsequent lookbehind assertion does a single test on + the last four characters. If it fails, the match fails immediately. For + long strings, this approach makes a significant difference to the + processing time.</p> -<list> -<item>(?(condition)yes-pattern)</item> -<item>(?(condition)yes-pattern|no-pattern)</item> -</list> + <p><em>Using Multiple Assertions</em></p> -<p>If the condition is satisfied, the yes-pattern is used; otherwise the -no-pattern (if present) is used. 
If there are more than two alternatives in the -subpattern, a compile-time error occurs. Each of the two alternatives may -itself contain nested subpatterns of any form, including conditional -subpatterns; the restriction to two alternatives applies only at the level of -the condition. This pattern fragment is an example where the alternatives are -complex:</p> + <p>Many assertions (of any sort) can occur in succession. For example, the + following matches "foo" preceded by three digits that are not "999":</p> -<quote><p> (?(1) (A|B|C) | (D | (?(2)E|F) | E) )</p></quote> + <code> +(?<=\d{3})(?<!999)foo</code> -<p>There are four kinds of condition: references to subpatterns, references to -recursion, a pseudo-condition called DEFINE, and assertions.</p> + <p>Notice that each of the assertions is applied independently at the same + point in the subject string. First there is a check that the previous + three characters are all digits, and then there is a check that the same + three characters are not "999". This pattern does <em>not</em> match + "foo" preceded by six characters, the first of which are digits and the + last three of which are not "999". For example, it does not match + "123abcfoo". A pattern to do that is the following:</p> + <code> +(?<=\d{3}...)(?<!999)foo</code> -<p><em>Checking for a used subpattern by number</em></p> + <p>This time the first assertion looks at the preceding six characters, + checks that the first three are digits, and then the second assertion + checks that the preceding three characters are not "999".</p> -<p>If the text between the parentheses consists of a sequence of -digits, the condition is true if a capturing subpattern of that number has previously -matched. If there is more than one capturing subpattern with the same number -(see the earlier section about duplicate subpattern numbers), -the condition is true if any of them have matched. An alternative notation is -to precede the digits with a plus or minus sign. 
In this case, the subpattern -number is relative rather than absolute. The most recently opened parentheses -can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside -loops it can also make sense to refer to subsequent groups. The next -parentheses to be opened can be referenced as (?(+1), and so on. (The value -zero in any of these forms is not used; it provokes a compile-time error.)</p> + <p>Assertions can be nested in any combination. For example, the following + matches an occurrence of "baz" that is preceded by "bar", which in turn is + not preceded by "foo":</p> -<p>Consider the following pattern, which contains non-significant -whitespace to make it more readable (assume the <c>extended</c> -option) and to divide it into three parts for ease of discussion:</p> + <code> +(?<=(?<!foo)bar)baz</code> -<quote><p> ( \( )? [^()]+ (?(1) \) )</p></quote> + <p>The following pattern matches "foo" preceded by three digits and any + three characters that are not "999":</p> -<p>The first part matches an optional opening parenthesis, and if that -character is present, sets it as the first captured substring. The second part -matches one or more characters that are not parentheses. The third part is a -conditional subpattern that tests whether or not the first set of parentheses matched -or not. If they did, that is, if subject started with an opening parenthesis, -the condition is true, and so the yes-pattern is executed and a closing -parenthesis is required. Otherwise, since no-pattern is not present, the -subpattern matches nothing. 
In other words, this pattern matches a sequence of -non-parentheses, optionally enclosed in parentheses.</p> + <code> +(?<=\d{3}(?!999)...)foo</code> + </section> -<p>If you were embedding this pattern in a larger one, you could use a relative -reference:</p> + <section> + <marker id="sect18"></marker> + <title>Conditional Subpatterns</title> + <p>It is possible to cause the matching process to obey a subpattern + conditionally or to choose between two alternative subpatterns, depending + on the result of an assertion, or whether a specific capturing subpattern + has already been matched. The following are the two possible forms of + conditional subpattern:</p> + + <code> +(?(condition)yes-pattern) +(?(condition)yes-pattern|no-pattern)</code> + + <p>If the condition is satisfied, the yes-pattern is used, otherwise the + no-pattern (if present). If more than two alternatives exist in the + subpattern, a compile-time error occurs. Each of the two alternatives can + itself contain nested subpatterns of any form, including conditional + subpatterns; the restriction to two alternatives applies only at the level + of the condition. The following pattern fragment is an example where the + alternatives are complex:</p> + + <code> +(?(1) (A|B|C) | (D | (?(2)E|F) | E) )</code> + + <p>There are four kinds of condition: references to subpatterns, references + to recursion, a pseudo-condition called DEFINE, and assertions.</p> + + <p><em>Checking for a Used Subpattern By Number</em></p> + + <p>If the text between the parentheses consists of a sequence of digits, + the condition is true if a capturing subpattern of that number has + previously matched. If more than one capturing subpattern with the same + number exists (see section <seealso marker="#sect12"> + Duplicate Subpattern Numbers</seealso> earlier), the condition is true if + any of them have matched. An alternative notation is to precede the + digits with a plus or minus sign. 
In this case, the subpattern number is + relative rather than absolute. The most recently opened parentheses can be + referenced by (?(-1), the next most recent by (?(-2), and so on. Inside + loops, it can also make sense to refer to subsequent groups. The next + parentheses to be opened can be referenced as (?(+1), and so on. (The + value zero in any of these forms is not used; it provokes a compile-time + error.)</p> + + <p>Consider the following pattern, which contains non-significant whitespace + to make it more readable (assume option <c>extended</c>) and to divide it + into three parts for ease of discussion:</p> + + <code> +( \( )? [^()]+ (?(1) \) )</code> + + <p>The first part matches an optional opening parenthesis, and if that + character is present, sets it as the first captured substring. The second + part matches one or more characters that are not parentheses. The third + part is a conditional subpattern that tests whether the first set of + parentheses matched or not. If they did, that is, if subject started with + an opening parenthesis, the condition is true, and so the yes-pattern is + executed and a closing parenthesis is required. Otherwise, as no-pattern + is not present, the subpattern matches nothing. That is, this pattern + matches a sequence of non-parentheses, optionally enclosed in + parentheses.</p> + + <p>If this pattern is embedded in a larger one, a relative reference can be + used:</p> + + <code> +...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...</code> + + <p>This makes the fragment independent of the parentheses in the larger + pattern.</p> + + <p><em>Checking for a Used Subpattern By Name</em></p> + + <p>Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a + used subpattern by name. For compatibility with earlier versions of PCRE, + which had this facility before Perl, the syntax (?(name)...) is also + recognized. 
However, there is a possible ambiguity with this syntax, as + subpattern names can consist entirely of digits. PCRE looks first for a + named subpattern; if it cannot find one and the name consists entirely of + digits, PCRE looks for a subpattern of that number, which must be > 0. + Using subpattern names that consist entirely of digits is not + recommended.</p> + + <p>Rewriting the previous example to use a named subpattern gives:</p> + + <code> +(?<OPEN> \( )? [^()]+ (?(<OPEN>) \) )</code> + + <p>If the name used in a condition of this kind is a duplicate, the test is + applied to all subpatterns of the same name, and is true if any one of + them has matched.</p> + + <p><em>Checking for Pattern Recursion</em></p> + + <p>If the condition is the string (R), and there is no subpattern with the + name R, the condition is true if a recursive call to the whole pattern or + any subpattern has been made. If digits or a name preceded by ampersand + follow the letter R, for example:</p> + + <code> +(?(R3)...) or (?(R&name)...)</code> + + <p>the condition is true if the most recent recursion is into a subpattern + whose number or name is given. This condition does not check the entire + recursion stack. If the name used in a condition of this kind is a + duplicate, the test is applied to all subpatterns of the same name, and is + true if any one of them is the most recent recursion.</p> + + <p>At "top-level", all these recursion test conditions are false. The syntax + for recursive patterns is described below.</p> + + <p><em>Defining Subpatterns for Use By Reference Only</em></p> + <marker id="defining_subpatterns"/> + + <p>If the condition is the string (DEFINE), and there is no subpattern with + the name DEFINE, the condition is always false. In this case, there can be + only one alternative in the subpattern. It is always skipped if control + reaches this point in the pattern. 
The idea of DEFINE is that it can be + used to define "subroutines" that can be referenced from elsewhere. (The + use of subroutines is described below.) For example, a pattern to match + an IPv4 address, such as "192.168.23.245", can be written like this + (ignore whitespace and line breaks):</p> + + <code> +(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) \b (?&byte) (\.(?&byte)){3} \b</code> + + <p>The first part of the pattern is a DEFINE group inside which another + group named "byte" is defined. This matches an individual component of an + IPv4 address (a number < 256). When matching takes place, this part of + the pattern is skipped, as DEFINE acts like a false condition. The + remaining pattern uses references to the named group to match the four + dot-separated components of an IPv4 address, insisting on a word boundary + at each end.</p> + + <p><em>Assertion Conditions</em></p> + + <p>If the condition is not in any of the above formats, it must be an + assertion. This can be a positive or negative lookahead or lookbehind + assertion. Consider the following pattern, containing non-significant + whitespace, and with the two alternatives on the second line:</p> + + <code type="none"> +(?(?=[^a-z]*[a-z]) +\d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )</code> + + <p>The condition is a positive lookahead assertion that matches an optional + sequence of non-letters followed by a letter. That is, it tests for the + presence of at least one letter in the subject. If a letter is found, the + subject is matched against the first alternative; otherwise it is matched + against the second. This pattern matches strings in one of the two forms + dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.</p> + </section> -<quote><p> ...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...</p></quote> + <section> + <marker id="sect19"></marker> + <title>Comments</title> + <p>There are two ways to include comments in patterns that are processed by + PCRE. 
In both cases, the start of the comment must not be in a character + class, or in the middle of any other sequence of related characters such + as (?: or a subpattern name or number. The characters that make up a + comment play no part in the pattern matching.</p> + + <p>The sequence (?# marks the start of a comment that continues up to the + next closing parenthesis. Nested parentheses are not permitted. If option + PCRE_EXTENDED is set, an unescaped # character also introduces a comment, + which in this case continues to immediately after the next newline + character or character sequence in the pattern. Which characters are + interpreted as newlines is controlled by the options passed to a + compiling function or by a special sequence at the start of the pattern, + as described in section <seealso marker="#newline_conventions"> + Newline Conventions</seealso> earlier.</p> + + <p>Notice that the end of this type of comment is a literal newline sequence + in the pattern; escape sequences that happen to represent a newline do not + count. For example, consider the following pattern when <c>extended</c> is + set, and the default newline convention is in force:</p> + + <code> +abc #comment \n still comment</code> + + <p>On encountering character #, <c>pcre_compile()</c> skips along, looking + for a newline in the pattern. The sequence \n is still literal at this + stage, so it does not terminate the comment. Only a character with code + value 0x0a (the default newline) does so.</p> + </section> -<p>This makes the fragment independent of the parentheses in the larger pattern.</p> + <section> + <marker id="sect20"></marker> + <title>Recursive Patterns</title> + <p>Consider the problem of matching a string in parentheses, allowing for + unlimited nested parentheses. Without the use of recursion, the best that + can be done is to use a pattern that matches up to some fixed depth of + nesting. 
It is not possible to handle an arbitrary nesting depth.</p> + + <p>For some time, Perl has provided a facility that allows regular + expressions to recurse (among other things). It does this by + interpolating Perl code in the expression at runtime, and the code can + refer to the expression itself. A Perl pattern using code interpolation to + solve the parentheses problem can be created like this:</p> + + <code> +$re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;</code> + + <p>Item (?p{...}) interpolates Perl code at runtime, and in this case refers + recursively to the pattern in which it appears.</p> + + <p>Obviously, PCRE cannot support the interpolation of Perl code. Instead, + it supports special syntax for recursion of the entire pattern, and for + individual subpattern recursion. After its introduction in PCRE and + Python, this kind of recursion was later introduced into Perl at + release 5.10.</p> + + <p>A special item that consists of (? followed by a number > 0 and a + closing parenthesis is a recursive subroutine call of the subpattern of + the given number, if it occurs inside that subpattern. (If not, + it is a non-recursive subroutine call, which is described in the next + section.) The special item (?R) or (?0) is a recursive call of the entire + regular expression.</p> -<p><em>Checking for a used subpattern by name</em></p> + <p>This PCRE pattern solves the nested parentheses problem (assume that + option <c>extended</c> is set so that whitespace is ignored):</p> + + <code> +\( ( [^()]++ | (?R) )* \)</code> + + <p>First it matches an opening parenthesis. Then it matches any number of + substrings, which can either be a sequence of non-parentheses or a + recursive match of the pattern itself (that is, a correctly parenthesized + substring). Finally there is a closing parenthesis. 
Notice the use of a + possessive quantifier to avoid backtracking into sequences of + non-parentheses.</p> + + <p>If this was part of a larger pattern, you would not want to recurse the + entire pattern, so instead you can use:</p> + + <code> +( \( ( [^()]++ | (?1) )* \) )</code> + + <p>The pattern is here within parentheses so that the recursion refers to + them instead of the whole pattern.</p> + + <p>In a larger pattern, keeping track of parenthesis numbers can be tricky. + This is made easier by the use of relative references. Instead of (?1) in + the pattern above, you can write (?-2) to refer to the second most + recently opened parentheses preceding the recursion. That is, a negative + number counts capturing parentheses leftwards from the point at which it + is encountered.</p> + + <p>It is also possible to refer to later opened parentheses, by + writing references such as (?+2). However, these cannot be recursive, as + the reference is not inside the parentheses that are referenced. They are + always non-recursive subroutine calls, as described in the next + section.</p> + + <p>An alternative approach is to use named parentheses instead. The Perl + syntax for this is (?&name). The earlier PCRE syntax (?P>name) is + also supported. We can rewrite the above example as follows:</p> + + <code> +(?<pn> \( ( [^()]++ | (?&pn) )* \) )</code> + + <p>If there is more than one subpattern with the same name, the earliest + one is used.</p> + + <p>This particular example pattern that we have studied contains nested + unlimited repeats, and so the use of a possessive quantifier for matching + strings of non-parentheses is important when applying the pattern to + strings that do not match. For example, when this pattern is applied + to</p> + + <code> +(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()</code> + + <p>it gives "no match" quickly. 
However, if a possessive quantifier is not + used, the match runs for a long time, as there are so many different + ways the + and * repeats can carve up the subject, and all must be tested + before failure can be reported.</p> + + <p>At the end of a match, the values of capturing parentheses are those from + the outermost level. If the pattern above is matched against</p> + + <code> +(ab(cd)ef)</code> + + <p>the value for the inner capturing parentheses (numbered 2) is "ef", + which is the last value taken on at the top-level. If a capturing + subpattern is not matched at the top level, its final captured value is + unset, even if it was (temporarily) set at a deeper level during the + matching process.</p> + + <p>Do not confuse item (?R) with condition (R), which tests for recursion. + Consider the following pattern, which matches text in angle brackets, + allowing for arbitrary nesting. Only digits are allowed in nested brackets + (that is, when recursing), while any characters are permitted at the + outer level.</p> + + <code> +< (?: (?(R) \d++ | [^<>]*+) | (?R)) * ></code> + + <p>Here (?(R) is the start of a conditional subpattern, with two different + alternatives for the recursive and non-recursive cases. Item (?R) is the + actual recursive call.</p> + + <p><em>Differences in Recursion Processing between PCRE and Perl</em></p> + + <p>Recursion processing in PCRE differs from Perl in two important ways. In + PCRE (like Python, but unlike Perl), a recursive subpattern call is always + treated as an atomic group. That is, once it has matched some of the + subject string, it is never re-entered, even if it contains untried + alternatives and there is a subsequent matching failure. 
This can be + illustrated by the following pattern, which is meant to match a palindromic + string containing an odd number of characters (for example, "a", "aba", + "abcba", "abcdcba"):</p> + + <code> +^(.|(.)(?1)\2)$</code> + + <p>The idea is that it either matches a single character, or two identical + characters surrounding a subpalindrome. In Perl, this pattern works; in + PCRE it does not work if the subject string is longer than three characters. + Consider the subject string "abcba".</p> + + <p>At the top level, the first character is matched, but as it is not at + the end of the string, the first alternative fails, the second + alternative is taken, and the recursion kicks in. The recursive call to + subpattern 1 successfully matches the next character ("b"). (Notice that + the beginning and end of line tests are not part of the recursion.)</p> + + <p>Back at the top level, the next character ("c") is compared with what + subpattern 2 matched, which was "a". This fails. As the recursion is + treated as an atomic group, there are now no backtracking points, and so + the entire match fails. (Perl can now re-enter the recursion + and try the second alternative.) However, if the pattern is written with + the alternatives in the other order, things are different:</p> + + <code> +^((.)(?1)\2|.)$</code> + + <p>This time, the recursing alternative is tried first, and continues to + recurse until it runs out of characters, at which point the recursion + fails. But this time we have another alternative to try at the higher + level. That is the significant difference: in the previous case the + remaining alternative is at a deeper recursion level, which PCRE cannot + use.</p> + + <p>To change the pattern so that it matches all palindromic strings, not + only those with an odd number of characters, it is tempting to change the + pattern to this:</p> + + <code> +^((.)(?1)\2|.?)$</code> + + <p>Again, this works in Perl, but not in PCRE, and for the same reason. 
When + a deeper recursion has matched a single character, it cannot be entered + again to match an empty string. The solution is to separate the two cases, + and write out the odd and even cases as alternatives at the higher + level:</p> + + <code> +^(?:((.)(?1)\2|)|((.)(?3)\4|.))</code> + + <p>If you want to match typical palindromic phrases, the pattern must ignore + all non-word characters, which can be done as follows:</p> + + <code> +^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$</code> + + <p>If run with option <c>caseless</c>, this pattern matches phrases such as + "A man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. + Notice the use of the possessive quantifier *+ to avoid backtracking into + sequences of non-word characters. Without this, PCRE takes much longer + (10 times or more) to match typical phrases, and Perl takes so long that + you think it has gone into a loop.</p> -<p>Perl uses the syntax (?(<name>)...) or (?('name')...) to test -for a used subpattern by name. For compatibility with earlier versions -of PCRE, which had this facility before Perl, the syntax (?(name)...) -is also recognized. However, there is a possible ambiguity with this -syntax, because subpattern names may consist entirely of digits. PCRE -looks first for a named subpattern; if it cannot find one and the name -consists entirely of digits, PCRE looks for a subpattern of that -number, which must be greater than zero. Using subpattern names that -consist entirely of digits is not recommended.</p> + <note> + <p>The palindrome-matching patterns above work only if the subject string + does not start with a palindrome that is shorter than the entire string. + For example, although "abcba" is correctly matched, if the subject is + "ababa", PCRE finds palindrome "aba" at the start, and then fails at top + level, as the end of the string does not follow. 
Once again, it cannot + jump back into the recursion to try other alternatives, so the entire + match fails.</p> + </note> -<p>Rewriting the above example to use a named subpattern gives this:</p> + <p>The second way in which PCRE and Perl differ in their recursion + processing is in the handling of captured values. In Perl, when a + subpattern is called recursively or as a subpattern (see the next + section), it has no access to any values that were captured outside the + recursion. In PCRE these values can be referenced. Consider the following + pattern:</p> + + <code> +^(.)(\1|a(?2))</code> + + <p>In PCRE, it matches "bab". The first capturing parentheses match "b", + then in the second group, when the back reference \1 fails to match "b", + the second alternative matches "a", and then recurses. In the recursion, + \1 does now match "b" and so the whole match succeeds. In Perl, the + pattern fails to match because inside the recursive call \1 cannot access + the externally set value.</p> + </section> -<quote><p> (?<OPEN> \( )? [^()]+ (?(<OPEN>) \) )</p></quote> + <section> + <marker id="sect21"></marker> + <title>Subpatterns as Subroutines</title> + <p>If the syntax for a recursive subpattern call (either by number or by + name) is used outside the parentheses to which it refers, it operates + like a subroutine in a programming language. The called subpattern can be + defined before or after the reference. A numbered reference can be + absolute or relative, as in the following examples:</p> + + <code> +(...(absolute)...)...(?2)... +(...(relative)...)...(?-1)... 
+(...(?+1)...(relative)...</code> + + <p>An earlier example pointed out that the following pattern matches "sense + and sensibility" and "response and responsibility", but not "sense and + responsibility":</p> + + <code> +(sens|respons)e and \1ibility</code> + + <p>If instead the following pattern is used, it matches "sense and + responsibility" and the other two strings:</p> + + <code> +(sens|respons)e and (?1)ibility</code> + + <p>Another example is provided in the discussion of DEFINE earlier.</p> + + <p>All subroutine calls, recursive or not, are always treated as atomic + groups. That is, once a subroutine has matched some of the subject string, + it is never re-entered, even if it contains untried alternatives and there + is a subsequent matching failure. Any capturing parentheses that are set + during the subroutine call revert to their previous values afterwards.</p> + + <p>Processing options such as case-independence are fixed when a subpattern + is defined, so if it is used as a subroutine, such options cannot be + changed for different calls. For example, the following pattern matches + "abcabc" but not "abcABC", as the change of processing option does not + affect the called subpattern:</p> + + <code> +(abc)(?i:(?-1))</code> + </section> -<p>If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them has -matched.</p> + <section> + <marker id="sect22"></marker> + <title>Oniguruma Subroutine Syntax</title> + <p>For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + alternative syntax for referencing a subpattern as a subroutine, possibly + recursively. 
Here follow two of the examples used above, rewritten using + this syntax:</p> + + <code> +(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) +(sens|respons)e and \g'1'ibility</code> + + <p>PCRE supports an extension to Oniguruma: if a number is preceded by a + plus or minus sign, it is taken as a relative reference, for example:</p> + + <code> +(abc)(?i:\g<-1>)</code> + + <p>Notice that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) + are <em>not</em> synonymous. The former is a back reference; the latter + is a subroutine call.</p> + </section> -<p><em>Checking for pattern recursion</em></p> + <section> + <marker id="sect23"></marker> + <title>Backtracking Control</title> + <p>Perl 5.10 introduced some "Special Backtracking Control Verbs", + which are still described in the Perl documentation as "experimental and + subject to change or removal in a future version of Perl". It goes on to + say: "Their usage in production code should be noted to avoid problems + during upgrades." The same remarks apply to the PCRE features described + in this section.</p> + + <p>The new verbs make use of what was previously invalid syntax: an opening + parenthesis followed by an asterisk. They are generally of the form + (*VERB) or (*VERB:NAME). Some can take either form, possibly behaving + differently depending on whether a name is present. A name is any sequence + of characters that does not include a closing parenthesis. The maximum + name length is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit + libraries. If the name is empty, that is, if the closing parenthesis + immediately follows the colon, the effect is as if the colon was not + there. 
Any number of these verbs can occur in a pattern.</p> + + <p>The behavior of these verbs in repeated groups, assertions, and in + subpatterns called as subroutines (whether or not recursively) is + described below.</p> + + <p><em>Optimizations That Affect Backtracking Verbs</em></p> + + <p>PCRE contains some optimizations that are used to speed up matching by + running some checks at the start of each match attempt. For example, it + can know the minimum length of matching subject, or that a particular + character must be present. When one of these optimizations bypasses the + running of a match, any included backtracking verbs are not processed. + You can suppress the start-of-match optimizations by setting + option <c>no_start_optimize</c> when calling + <seealso marker="#compile/2"><c>compile/2</c></seealso> or + <seealso marker="#run/3"><c>run/3</c></seealso>, or by starting the + pattern with (*NO_START_OPT).</p> + + <p>Experiments with Perl suggest that it too has similar optimizations, + sometimes leading to anomalous results.</p> + + <p><em>Verbs That Act Immediately</em></p> + + <p>The following verbs act as soon as they are encountered. They must not + be followed by a name.</p> + + <code> +(*ACCEPT)</code> + + <p>This verb causes the match to end successfully, skipping the remainder of + the pattern. However, when it is inside a subpattern that is called as a + subroutine, only that subpattern is ended successfully. Matching then + continues at the outer level. If (*ACCEPT) is triggered in a positive + assertion, the assertion succeeds; in a negative assertion, the assertion + fails.</p> + + <p>If (*ACCEPT) is inside capturing parentheses, the data so far is + captured. For example, the following matches "AB", "AAD", or "ACD". When + it matches "AB", "B" is captured by the outer parentheses.</p> + + <code> +A((?:A|B(*ACCEPT)|C)D)</code> + + <p>The following verb causes a matching failure, forcing backtracking to + occur. 
It is equivalent to (?!) but easier to read.</p> + + <code> +(*FAIL) or (*F)</code> + + <p>The Perl documentation states that it is probably useful only when + combined with (?{}) or (??{}). Those are Perl features that + are not present in PCRE.</p> + + <p>A match with the string "aaaa" always fails, but the callout is taken + before each backtrack occurs (in this example, 10 times).</p> + + <p><em>Recording Which Path Was Taken</em></p> + + <p>The main purpose of this verb is to track how a match was arrived at, + although it also has a secondary use in conjunction with advancing the match + starting point (see (*SKIP) below).</p> -<p>If the condition is the string (R), and there is no subpattern with -the name R, the condition is true if a recursive call to the whole -pattern or any subpattern has been made. If digits or a name preceded -by ampersand follow the letter R, for example:</p> + <note> + <p>In Erlang, there is no interface to retrieve a mark with + <seealso marker="#run/2"><c>run/2,3</c></seealso>, so only the secondary + purpose is relevant to the Erlang programmer.</p> -<quote><p> (?(R3)...) or (?(R&name)...)</p></quote> + <p>The rest of this section is therefore deliberately not adapted for + reading by the Erlang programmer, but the examples can help in + understanding NAMES as they can be used by (*SKIP).</p> + </note> -<p>the condition is true if the most recent recursion is into a -subpattern whose number or name is given. This condition does not -check the entire recursion stack. If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them is -the most recent recursion.</p> + <code> +(*MARK:NAME) or (*:NAME)</code> -<p>At "top level", all these recursion test conditions are false. 
The syntax for recursive -patterns is described below.</p> - -<p><em>Defining subpatterns for use by reference only</em></p> - -<p>If the condition is the string (DEFINE), and there is no subpattern with the -name DEFINE, the condition is always false. In this case, there may be only one -alternative in the subpattern. It is always skipped if control reaches this -point in the pattern; the idea of DEFINE is that it can be used to define -"subroutines" that can be referenced from elsewhere. (The use of subroutines -is described below.) For example, a pattern to match an IPv4 address such as -"192.168.23.245" could be -written like this (ignore whitespace and line breaks):</p> + <p>A name is always required with this verb. There can be as many instances + of (*MARK) as you like in a pattern, and their names do not have to be + unique.</p> -<quote><p> (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) - \b (?&byte) (\.(?&byte)){3} \b</p></quote> - -<p>The first part of the pattern is a DEFINE group inside which a -another group named "byte" is defined. This matches an individual -component of an IPv4 address (a number less than 256). When matching -takes place, this part of the pattern is skipped because DEFINE acts -like a false condition. The rest of the pattern uses references to the -named group to match the four dot-separated components of an IPv4 -address, insisting on a word boundary at each end.</p> - -<p><em>Assertion conditions</em></p> + <p>When a match succeeds, the name of the last encountered (*MARK:NAME), + (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the + caller as described in section "Extra data for <c>pcre_exec()</c>" in the + <c>pcreapi</c> documentation. In the following example of <c>pcretest</c> + output, the /K modifier requests the retrieval and outputting of (*MARK) + data:</p> -<p>If the condition is not in any of the above formats, it must be an -assertion. 
This may be a positive or negative lookahead or lookbehind -assertion. Consider this pattern, again containing non-significant -whitespace, and with the two alternatives on the second line:</p> +<code> + re> /X(*MARK:A)Y|X(*MARK:B)Z/K +data> XY + 0: XY +MK: A +XZ + 0: XZ +MK: B</code> + + <p>The (*MARK) name is tagged with "MK:" in this output, and in this example + it indicates which of the two alternatives matched. This is a more + efficient way of obtaining this information than putting each alternative + in its own capturing parentheses.</p> + + <p>If a verb with a name is encountered in a positive assertion that is + true, the name is recorded and passed back if it is the last encountered. + This does not occur for negative assertions or failing positive + assertions.</p> + + <p>After a partial match or a failed match, the last encountered name in the + entire match process is returned, for example:</p> + + <code> + re> /X(*MARK:A)Y|X(*MARK:B)Z/K +data> XP +No match, mark = B</code> + + <p>Notice that in this unanchored example, the mark is retained from the + match attempt that started at letter "X" in the subject. Subsequent match + attempts starting at "P" and then with an empty string do not get as far + as the (*MARK) item, but nevertheless do not reset it.</p> + + <p><em>Verbs That Act after Backtracking</em></p> + + <p>The following verbs do nothing when they are encountered. Matching + continues with what follows, but if there is no subsequent match, causing + a backtrack to the verb, a failure is forced. That is, backtracking cannot + pass to the left of the verb. However, when one of these verbs appears + inside an atomic group or an assertion that is true, its effect is + confined to that group, as once the group has been matched, there is never + any backtracking into it. In this situation, backtracking can "jump back" + to the left of the entire atomic group or assertion. 
(Remember also, as + stated above, that this localization also applies in subroutine + calls.)</p> + + <p>These verbs differ in exactly what kind of failure occurs when + backtracking reaches them. The behavior described below is what occurs + when the verb is not in a subroutine or an assertion. Subsequent sections + cover these special cases.</p> + + <p>The following verb, which must not be followed by a name, causes the + whole match to fail outright if there is a later matching failure that + causes backtracking to reach it. Even if the pattern is unanchored, no + further attempts to find a match by advancing the starting point take + place.</p> + + <code> +(*COMMIT)</code> + + <p>If (*COMMIT) is the only backtracking verb that is encountered, once it + has been passed, <seealso marker="#run/2"><c>run/2,3</c></seealso> is + committed to find a match at the current starting point, or not at all, + for example:</p> + + <code> +a+(*COMMIT)b</code> + + <p>This matches "xxaab" but not "aacaab". It can be thought of as a kind of + dynamic anchor, or "I've started, so I must finish". The name of the most + recently passed (*MARK) in the path is passed back when (*COMMIT) forces + a match failure.</p> + + <p>If more than one backtracking verb exists in a pattern, a different one + that follows (*COMMIT) can be triggered first, so merely passing (*COMMIT) + during a match does not always guarantee that a match must be at this + starting point.</p> + + <p>Notice that (*COMMIT) at the start of a pattern is not the same as an + anchor, unless the PCRE start-of-match optimizations are turned off, as + shown in the following example:</p> <code type="none"> - (?(?=[^a-z]*[a-z]) - \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )</code> - -<p>The condition is a positive lookahead assertion that matches an optional -sequence of non-letters followed by a letter. In other words, it tests for the -presence of at least one letter in the subject. 
If a letter is found, the -subject is matched against the first alternative; otherwise it is matched -against the second. This pattern matches strings in one of the two forms -dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.</p> - - -</section> - -<section><marker id="sect19"></marker><title>Comments</title> - -<p>There are two ways of including comments in patterns that are processed by -PCRE. In both cases, the start of the comment must not be in a character class, -nor in the middle of any other sequence of related characters such as (?: or a -subpattern name or number. The characters that make up a comment play no part -in the pattern matching.</p> - -<p>The sequence (?# marks the start of a comment that continues up to the next -closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED -option is set, an unescaped # character also introduces a comment, which in -this case continues to immediately after the next newline character or -character sequence in the pattern. Which characters are interpreted as newlines -is controlled by the options passed to a compiling function or by a special -sequence at the start of the pattern, as described in the section entitled -"Newline conventions" -above. Note that the end of this type of comment is a literal newline sequence -in the pattern; escape sequences that happen to represent a newline do not -count. For example, consider this pattern when <c>extended</c> is set, and the -default newline convention is in force:</p> - -<quote><p> abc #comment \n still comment</p></quote> - -<p>On encountering the # character, <em>pcre_compile()</em> skips along, looking for -a newline in the pattern. The sequence \n is still literal at this stage, so -it does not terminate the comment. 
Only an actual character with the code value -0x0a (the default newline) does so.</p> - -</section> - -<section><marker id="sect20"></marker><title>Recursive patterns</title> +1> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list}]). +{match,["abc"]} +2> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list},no_start_optimize]). +nomatch</code> + + <p>PCRE knows that any match must start with "a", so the optimization skips + along the subject to "a" before running the first match attempt, which + succeeds. When the optimization is disabled by option + <c>no_start_optimize</c>, the match starts at "x" and so the (*COMMIT) + causes it to fail without trying any other starting points.</p> + + <p>The following verb causes the match to fail at the current starting + position in the subject if there is a later matching failure that causes + backtracking to reach it:</p> + + <code> +(*PRUNE) or (*PRUNE:NAME)</code> + + <p>If the pattern is unanchored, the normal "bumpalong" advance to the next + starting character then occurs. Backtracking can occur as usual to the + left of (*PRUNE), before it is reached, or when matching to the right of + (*PRUNE), but if there is no match to the right, backtracking cannot + cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an + alternative to an atomic group or possessive quantifier, but there are + some uses of (*PRUNE) that cannot be expressed in any other way. In an + anchored pattern, (*PRUNE) has the same effect as (*COMMIT).</p> + + <p>The behavior of (*PRUNE:NAME) is the not the same as + (*MARK:NAME)(*PRUNE). It is like (*MARK:NAME) in that the name is + remembered for passing back to the caller. However, (*SKIP:NAME) searches + only for names set with (*MARK).</p> -<p>Consider the problem of matching a string in parentheses, allowing for -unlimited nested parentheses. Without the use of recursion, the best that can -be done is to use a pattern that matches up to some fixed depth of nesting. 
It -is not possible to handle an arbitrary nesting depth.</p> - -<p>For some time, Perl has provided a facility that allows regular -expressions to recurse (amongst other things). It does this by -interpolating Perl code in the expression at run time, and the code -can refer to the expression itself. A Perl pattern using code -interpolation to solve the parentheses problem can be created like -this:</p> - -<quote><p> $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;</p></quote> - -<p>The (?p{...}) item interpolates Perl code at run time, and in this -case refers recursively to the pattern in which it appears.</p> - -<p>Obviously, PCRE cannot support the interpolation of Perl code. Instead, it -supports special syntax for recursion of the entire pattern, and also for -individual subpattern recursion. After its introduction in PCRE and Python, -this kind of recursion was subsequently introduced into Perl at release 5.10.</p> - -<p>A special item that consists of (? followed by a number greater -than zero and a closing parenthesis is a recursive subroutine call of the -subpattern of the given number, provided that it occurs inside that -subpattern. (If not, it is a non-recursive subroutine call, which is described in -the next section.) The special item (?R) or (?0) is a recursive call -of the entire regular expression.</p> - -<p>This PCRE pattern solves the nested parentheses problem (assume the -<c>extended</c> option is set so that whitespace is ignored):</p> - -<quote><p> \( ( [^()]++ | (?R) )* \)</p></quote> - -<p>First it matches an opening parenthesis. Then it matches any number -of substrings which can either be a sequence of non-parentheses, or a -recursive match of the pattern itself (that is, a correctly -parenthesized substring). Finally there is a closing -parenthesis. 
Note the use of a possessive quantifier to avoid -backtracking into sequences of non-parentheses.</p> - -<p>If this were part of a larger pattern, you would not want to -recurse the entire pattern, so instead you could use this:</p> - -<quote><p> ( \( ( [^()]++ | (?1) )* \) )</p></quote> - -<p>We have put the pattern into parentheses, and caused the recursion -to refer to them instead of the whole pattern.</p> - -<p>In a larger pattern, keeping track of parenthesis numbers can be tricky. This -is made easier by the use of relative references. Instead of (?1) in the -pattern above you can write (?-2) to refer to the second most recently opened -parentheses preceding the recursion. In other words, a negative number counts -capturing parentheses leftwards from the point at which it is encountered.</p> - -<p>It is also possible to refer to subsequently opened parentheses, by -writing references such as (?+2). However, these cannot be recursive -because the reference is not inside the parentheses that are -referenced. They are always non-recursive subroutine calls, as described in the -next section.</p> - -<p>An alternative approach is to use named parentheses instead. The -Perl syntax for this is (?&name); PCRE's earlier syntax -(?P>name) is also supported. We could rewrite the above example as -follows:</p> - -<quote><p> (?<pn> \( ( [^()]++ | (?&pn) )* \) )</p></quote> - -<p>If there is more than one subpattern with the same name, the earliest one is -used.</p> - -<p>This particular example pattern that we have been looking at contains nested -unlimited repeats, and so the use of a possessive quantifier for matching -strings of non-parentheses is important when applying the pattern to strings -that do not match. For example, when this pattern is applied to</p> - -<quote><p> (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()</p></quote> - -<p>it yields "no match" quickly. 
However, if a possessive quantifier is not used, -the match runs for a very long time indeed because there are so many different -ways the + and * repeats can carve up the subject, and all have to be tested -before failure can be reported.</p> - -<p>At the end of a match, the values of capturing parentheses are those from -the outermost level. If the pattern above is matched against</p> - -<quote><p> (ab(cd)ef)</p></quote> - -<p>the value for the inner capturing parentheses (numbered 2) is "ef", which is -the last value taken on at the top level. If a capturing subpattern is not -matched at the top level, its final captured value is unset, even if it was -(temporarily) set at a deeper level during the matching process.</p> - -<p>Do not confuse the (?R) item with the condition (R), which tests for recursion. -Consider this pattern, which matches text in angle brackets, allowing for -arbitrary nesting. Only digits are allowed in nested brackets (that is, when -recursing), whereas any characters are permitted at the outer level.</p> - -<quote><p> < (?: (?(R) \d++ | [^<>]*+) | (?R)) * ></p></quote> - -<p>In this pattern, (?(R) is the start of a conditional subpattern, with two -different alternatives for the recursive and non-recursive cases. The (?R) item -is the actual recursive call.</p> - -<p><em>Differences in recursion processing between PCRE and Perl</em></p> - -<p>Recursion processing in PCRE differs from Perl in two important ways. In PCRE -(like Python, but unlike Perl), a recursive subpattern call is always treated -as an atomic group. That is, once it has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. 
This can be illustrated by the following pattern, -which purports to match a palindromic string that contains an odd number of -characters (for example, "a", "aba", "abcba", "abcdcba"):</p> - -<quote><p> ^(.|(.)(?1)\2)$</p></quote> - -<p>The idea is that it either matches a single character, or two identical -characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE -it does not if the pattern is longer than three characters. Consider the -subject string "abcba":</p> - -<p>At the top level, the first character is matched, but as it is not at the end -of the string, the first alternative fails; the second alternative is taken -and the recursion kicks in. The recursive call to subpattern 1 successfully -matches the next character ("b"). (Note that the beginning and end of line -tests are not part of the recursion).</p> - -<p>Back at the top level, the next character ("c") is compared with what -subpattern 2 matched, which was "a". This fails. Because the recursion is -treated as an atomic group, there are now no backtracking points, and so the -entire match fails. (Perl is able, at this point, to re-enter the recursion and -try the second alternative.) However, if the pattern is written with the -alternatives in the other order, things are different:</p> - -<quote><p> ^((.)(?1)\2|.)$</p></quote> - -<p>This time, the recursing alternative is tried first, and continues to recurse -until it runs out of characters, at which point the recursion fails. But this -time we do have another alternative to try at the higher level. That is the big -difference: in the previous case the remaining alternative is at a deeper -recursion level, which PCRE cannot use.</p> - -<p>To change the pattern so that it matches all palindromic strings, not just -those with an odd number of characters, it is tempting to change the pattern to -this:</p> - -<quote><p> ^((.)(?1)\2|.?)$</p></quote> - -<p>Again, this works in Perl, but not in PCRE, and for the same reason. 
When a -deeper recursion has matched a single character, it cannot be entered again in -order to match an empty string. The solution is to separate the two cases, and -write out the odd and even cases as alternatives at the higher level:</p> - -<quote><p> ^(?:((.)(?1)\2|)|((.)(?3)\4|.))</p></quote> - -<p>If you want to match typical palindromic phrases, the pattern has to ignore all -non-word characters, which can be done like this:</p> - - <quote><p> ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$</p></quote> - -<p>If run with the <c>caseless</c> option, this pattern matches phrases such as "A -man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note -the use of the possessive quantifier *+ to avoid backtracking into sequences of -non-word characters. Without this, PCRE takes a great deal longer (ten times or -more) to match typical phrases, and Perl takes so long that you think it has -gone into a loop.</p> - -<p><em>WARNING</em>: The palindrome-matching patterns above work only if the subject -string does not start with a palindrome that is shorter than the entire string. -For example, although "abcba" is correctly matched, if the subject is "ababa", -PCRE finds the palindrome "aba" at the start, then fails at top level because -the end of the string does not follow. Once again, it cannot jump back into the -recursion to try other alternatives, so the entire match fails.</p> - -<p>The second way in which PCRE and Perl differ in their recursion processing is -in the handling of captured values. In Perl, when a subpattern is called -recursively or as a subpattern (see the next section), it has no access to any -values that were captured outside the recursion, whereas in PCRE these values -can be referenced. Consider this pattern:</p> - -<quote><p> ^(.)(\1|a(?2))</p></quote> - -<p>In PCRE, this pattern matches "bab". 
The first capturing parentheses match "b", -then in the second group, when the back reference \1 fails to match "b", the -second alternative matches "a" and then recurses. In the recursion, \1 does -now match "b" and so the whole match succeeds. In Perl, the pattern fails to -match because inside the recursive call \1 cannot access the externally set -value.</p> - -</section> - -<section><marker id="sect21"></marker><title>Subpatterns as subroutines</title> - -<p>If the syntax for a recursive subpattern call (either by number or by -name) is used outside the parentheses to which it refers, it operates like a -subroutine in a programming language. The called subpattern may be defined -before or after the reference. A numbered reference can be absolute or -relative, as in these examples:</p> - -<list> - <item>(...(absolute)...)...(?2)...</item> - <item>(...(relative)...)...(?-1)...</item> - <item>(...(?+1)...(relative)...</item> -</list> - -<p>An earlier example pointed out that the pattern</p> - -<quote><p> (sens|respons)e and \1ibility</p></quote> - -<p>matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If instead the pattern</p> - -<quote><p> (sens|respons)e and (?1)ibility</p></quote> - -<p>is used, it does match "sense and responsibility" as well as the other two -strings. Another example is given in the discussion of DEFINE above.</p> - -<p>All subroutine calls, whether recursive or not, are always treated as atomic -groups. That is, once a subroutine has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. Any capturing parentheses that are set during the -subroutine call revert to their previous values afterwards.</p> - -<p>Processing options such as case-independence are fixed when a subpattern is -defined, so if it is used as a subroutine, such options cannot be changed for -different calls. 
For example, consider this pattern:</p> -<quote><p> (abc)(?i:(?-1))</p></quote> - -<p>It matches "abcabc". It does not match "abcABC" because the change of -processing option does not affect the called subpattern.</p> - -</section> - -<section><marker id="sect22"></marker><title>Oniguruma subroutine syntax</title> -<p>For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a subroutine, possibly recursively. Here -are two of the examples used above, rewritten using this syntax:</p> -<quote> - <p> (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )</p> - <p> (sens|respons)e and \g'1'ibility</p> -</quote> -<p>PCRE supports an extension to Oniguruma: if a number is preceded by a -plus or a minus sign it is taken as a relative reference. For example:</p> - - <quote><p> (abc)(?i:\g<-1>)</p></quote> - -<p>Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> -synonymous. The former is a back reference; the latter is a subroutine call.</p> - -</section> -<!-- XXX C interface - -<section> <marker id="sect22"><title>Callouts</title></marker> - -<p>Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl -code to be obeyed in the middle of matching a regular expression. This makes it -possible, amongst other things, to extract different substrings that match the -same pair of parentheses when there is a repetition.</p> - -<p>PCRE provides a similar feature, but of course it cannot obey arbitrary Perl -code. The feature is called "callout". The caller of PCRE provides an external -function by putting its entry point in the global variable <em>pcre_callout</em>. -By default, this variable contains NULL, which disables all calling out.</p> - -<p>Within a regular expression, (?C) indicates the points at which the external -function is to be called. 
If you want to identify different callout points, you -can put a number less than 256 after the letter C. The default value is zero. -For example, this pattern has two callout points:</p> - -<quote><p> (?C1)abc(?C2)def</p></quote> - - -<p>If the <c>AUTO_CALLOUT</c> flag is passed to <c>re:compile/2</c>, callouts are -automatically installed before each item in the pattern. They are all numbered -255.</p> + <note> + <p>The fact that (*PRUNE:NAME) remembers the name is useless to the Erlang + programmer, as names cannot be retrieved.</p> + </note> -<p>During matching, when PCRE reaches a callout point (and <em>pcre_callout</em> is -set), the external function is called. It is provided with the number of the -callout, the position in the pattern, and, optionally, one item of data -originally supplied by the caller of <c>re:run/3</c>. The callout function -may cause matching to proceed, to backtrack, or to fail altogether. A complete -description of the interface to the callout function is given in the -<em>pcrecallout</em> -documentation.</p> + <p>The following verb, when specified without a name, is like (*PRUNE), + except that if the pattern is unanchored, the "bumpalong" advance is not + to the next character, but to the position in the subject where (*SKIP) + was encountered.</p> + <code> +(*SKIP)</code> -</section> ---> + <p>(*SKIP) signifies that whatever text was matched leading up to it cannot + be part of a successful match. Consider:</p> -<section><marker id="sect23"></marker><title>Backtracking control</title> - -<p>Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which -are still described in the Perl documentation as "experimental and subject to -change or removal in a future version of Perl". It goes on to say: "Their usage -in production code should be noted to avoid problems during upgrades." 
The same -remarks apply to the PCRE features described in this section.</p> - -<p>The new verbs make use of what was previously invalid syntax: an opening -parenthesis followed by an asterisk. They are generally of the form -(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving -differently depending on whether or not a name is present. A name is any -sequence of characters that does not include a closing parenthesis. The maximum -length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit -libraries. If the name is empty, that is, if the closing parenthesis -immediately follows the colon, the effect is as if the colon were not there. -Any number of these verbs may occur in a pattern.</p> - -<!-- XXX C interface -<p>Since these verbs are specifically related to backtracking, most of them can be -used only when the pattern is to be matched using one of the traditional -matching functions, because these use a backtracking algorithm. With the -exception of (*FAIL), which behaves like a failing negative assertion, the -backtracking control verbs cause an error if encountered by a DFA matching -function.</p> ---> -<p>The behaviour of these verbs in -repeated groups, assertions, -and in subpatterns called as subroutines -(whether or not recursively) is documented below.</p> - -<p><em>Optimizations that affect backtracking verbs</em></p> - -<p>PCRE contains some optimizations that are used to speed up matching by running -some checks at the start of each match attempt. For example, it may know the -minimum length of matching subject, or that a particular character must be -present. When one of these optimizations bypasses the running of a match, any -included backtracking verbs will not, of course, be processed. 
You can suppress -the start-of-match optimizations by setting the <c>no_start_optimize</c> option -when calling <c>re:compile/2</c> or <c>re:run/3</c>, or by starting the -pattern with (*NO_START_OPT).</p> - -<p>Experiments with Perl suggest that it too has similar optimizations, sometimes -leading to anomalous results.</p> - -<p><em>Verbs that act immediately</em></p> - -<p>The following verbs act as soon as they are encountered. They may not be -followed by a name.</p> - -<quote><p> (*ACCEPT)</p></quote> - -<p>This verb causes the match to end successfully, skipping the remainder of the -pattern. However, when it is inside a subpattern that is called as a -subroutine, only that subpattern is ended successfully. Matching then continues -at the outer level. If (*ACCEPT) in triggered in a positive assertion, the -assertion succeeds; in a negative assertion, the assertion fails.</p> - -<p>If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For -example:</p> - -<quote><p> A((?:A|B(*ACCEPT)|C)D)</p></quote> - -<p>This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by -the outer parentheses.</p> - -<quote><p> (*FAIL) or (*F)</p></quote> - -<p>This verb causes a matching failure, forcing backtracking to occur. It is -equivalent to (?!) but easier to read. The Perl documentation notes that it is -probably useful only when combined with (?{}) or (??{}). Those are, of course, -Perl features that are not present in PCRE. 
The nearest equivalent is the -callout feature, as for example in this pattern:</p> - -<quote><p> a+(?C)(*FAIL)</p></quote> - -<p>A match with the string "aaaa" always fails, but the callout is taken before -each backtrack happens (in this example, 10 times).</p> - -<p><em>Recording which path was taken</em></p> - -<p>There is one verb whose main purpose is to track how a match was arrived at, -though it also has a secondary use in conjunction with advancing the match -starting point (see (*SKIP) below).</p> - -<warning> -<p>In Erlang, there is no interface to retrieve a mark with <c>re:run/{2,3]</c>, -so only the secondary purpose is relevant to the Erlang programmer!</p> -<p>The rest of this section is therefore deliberately not adapted for reading -by the Erlang programmer, however the examples might help in understanding NAMES as -they can be used by (*SKIP).</p> -</warning> - -<quote><p> (*MARK:NAME) or (*:NAME)</p></quote> - -<p>A name is always required with this verb. There may be as many instances of -(*MARK) as you like in a pattern, and their names do not have to be unique.</p> - -<p>When a match succeeds, the name of the last-encountered (*MARK:NAME), -(*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the -caller as described in the section entitled "Extra data for <c>pcre_exec()</c>" -in the <c>pcreapi</c> -documentation. Here is an example of <c>pcretest</c> output, where the /K -modifier requests the retrieval and outputting of (*MARK) data:</p> -<code> - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XY - 0: XY - MK: A - XZ - 0: XZ - MK: B</code> - -<p>The (*MARK) name is tagged with "MK:" in this output, and in this example it -indicates which of the two alternatives matched. 
This is a more efficient way -of obtaining this information than putting each alternative in its own -capturing parentheses.</p> - -<p>If a verb with a name is encountered in a positive assertion that is true, the -name is recorded and passed back if it is the last-encountered. This does not -happen for negative assertions or failing positive assertions.</p> - -<p>After a partial match or a failed match, the last encountered name in the -entire match process is returned. For example:</p> -<code> - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XP - No match, mark = B</code> - -<p>Note that in this unanchored example the mark is retained from the match -attempt that started at the letter "X" in the subject. Subsequent match -attempts starting at "P" and then with an empty string do not get as far as the -(*MARK) item, but nevertheless do not reset it.</p> - -<!-- -<p>If you are interested in (*MARK) values after failed matches, you should -probably set the PCRE_NO_START_OPTIMIZE option -(see above) -to ensure that the match is always attempted.</p> ---> - -<p><em>Verbs that act after backtracking</em></p> - -<p>The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. However, when one of these verbs appears inside an atomic group or an -assertion that is true, its effect is confined to that group, because once the -group has been matched, there is never any backtracking into it. In this -situation, backtracking can "jump back" to the left of the entire atomic group -or assertion. (Remember also, as stated above, that this localization also -applies in subroutine calls.)</p> - -<p>These verbs differ in exactly what kind of failure occurs when backtracking -reaches them. The behaviour described below is what happens when the verb is -not in a subroutine or an assertion. 
Subsequent sections cover these special -cases.</p> - -<quote><p> (*COMMIT)</p></quote> - -<p>This verb, which may not be followed by a name, causes the whole match to fail -outright if there is a later matching failure that causes backtracking to reach -it. Even if the pattern is unanchored, no further attempts to find a match by -advancing the starting point take place. If (*COMMIT) is the only backtracking -verb that is encountered, once it has been passed <c>re:run/{2,3}</c> is -committed to finding a match at the current starting point, or not at all. For -example:</p> - -<quote><p> a+(*COMMIT)b</p></quote> - -<p>This matches "xxaab" but not "aacaab". It can be thought of as a kind of -dynamic anchor, or "I've started, so I must finish." The name of the most -recently passed (*MARK) in the path is passed back when (*COMMIT) forces a -match failure.</p> - -<p>If there is more than one backtracking verb in a pattern, a different one that -follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a -match does not always guarantee that a match must be at this starting point.</p> - -<p>Note that (*COMMIT) at the start of a pattern is not the same as an anchor, -unless PCRE's start-of-match optimizations are turned off, as shown in this - example:</p> -<code type="none"> - 1> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list}]). - {match,["abc"]} - 2> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list},no_start_optimize]). - nomatch</code> - -<p>PCRE knows that any match must start with "a", so the optimization skips along -the subject to "a" before running the first match attempt, which succeeds. 
When -the optimization is disabled by the <c>no_start_optimize</c> option, the match -starts at "x" and so the (*COMMIT) causes it to fail without trying any other -starting points.</p> - -<quote><p> (*PRUNE) or (*PRUNE:NAME)</p></quote> - -<p>This verb causes the match to fail at the current starting position in the -subject if there is a later matching failure that causes backtracking to reach -it. If the pattern is unanchored, the normal "bumpalong" advance to the next -starting character then happens. Backtracking can occur as usual to the left of -(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but -if there is no match to the right, backtracking cannot cross (*PRUNE). In -simple cases, the use of (*PRUNE) is just an alternative to an atomic group or -possessive quantifier, but there are some uses of (*PRUNE) that cannot be -expressed in any other way. In an anchored pattern (*PRUNE) has the same effect -as (*COMMIT).</p> - -<p>The behaviour of (*PRUNE:NAME) is the not the same as (*MARK:NAME)(*PRUNE). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK).</p> - -<warning> -<p>The fact that (*PRUNE:NAME) remembers the name is useless to the Erlang programmer, -as names can not be retrieved.</p> -</warning> - -<quote><p> (*SKIP)</p></quote> - -<p>This verb, when given without a name, is like (*PRUNE), except that if the -pattern is unanchored, the "bumpalong" advance is not to the next character, -but to the position in the subject where (*SKIP) was encountered. (*SKIP) -signifies that whatever text was matched leading up to it cannot be part of a -successful match. Consider:</p> - -<quote><p> a+(*SKIP)b</p></quote> - -<p>If the subject is "aaaac...", after the first match attempt fails (starting at -the first character in the string), the starting point skips on to start the -next attempt at "c". 
Note that a possessive quantifer does not have the same -effect as this example; although it would suppress backtracking during the -first match attempt, the second attempt would start at the second character -instead of skipping on to "c".</p> - -<quote><p> (*SKIP:NAME)</p></quote> - -<p>When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored.</p> - -<p>Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores -names that are set by (*PRUNE:NAME) or (*THEN:NAME).</p> - -<quote><p> (*THEN) or (*THEN:NAME)</p></quote> - -<p>This verb causes a skip to the next innermost alternative when backtracking -reaches it. That is, it cancels any further backtracking within the current -alternative. Its name comes from the observation that it can be used for a -pattern-based if-then-else block:</p> - -<quote><p> ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...</p></quote> + <code> +a+(*SKIP)b</code> -<p>If the COND1 pattern matches, FOO is tried (and possibly further items after -the end of the group if FOO succeeds); on failure, the matcher skips to the -second alternative and tries COND2, without backtracking into COND1. If that -succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no -more alternatives, so there is a backtrack to whatever came before the entire -group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).</p> - -<p>The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. 
However, (*SKIP:NAME) searches only for names set with (*MARK).</p> - -<warning> -<p>The fact that (*THEN:NAME) remembers the name is useless to the Erlang programmer, -as names can not be retrieved.</p> -</warning> - -<p>A subpattern that does not contain a | character is just a part of the -enclosing alternative; it is not a nested alternation with only one -alternative. The effect of (*THEN) extends beyond such a subpattern to the -enclosing alternative. Consider this pattern, where A, B, etc. are complex -pattern fragments that do not contain any | characters at this level:</p> - -<quote><p> A (B(*THEN)C) | D</p></quote> + <p>If the subject is "aaaac...", after the first match attempt fails + (starting at the first character in the string), the starting point skips + on to start the next attempt at "c". Notice that a possessive quantifier + does not have the same effect as this example; although it would suppress + backtracking during the first match attempt, the second attempt would + start at the second character instead of skipping on to "c".</p> -<p>If A and B are matched, but there is a failure in C, matching does not -backtrack into A; instead it moves to the next alternative, that is, D. -However, if the subpattern containing (*THEN) is given an alternative, it -behaves differently:</p> - -<quote><p> A (B(*THEN)C | (*FAIL)) | D</p></quote> + <p>When (*SKIP) has an associated name, its behavior is modified:</p> -<p>The effect of (*THEN) is now confined to the inner subpattern. After a failure -in C, matching moves to (*FAIL), which causes the whole subpattern to fail -because there are no more alternatives to try. In this case, matching does now -backtrack into A.</p> + <code> +(*SKIP:NAME)</code> -<p>Note that a conditional subpattern is not considered as having two -alternatives, because only one is ever used. In other words, the | character in -a conditional subpattern has a different meaning. 
Ignoring white space, -consider:</p> + <p>When this is triggered, the previous path through the pattern is searched + for the most recent (*MARK) that has the same name. If one is found, the + "bumpalong" advance is to the subject position that corresponds to that + (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a + matching name is found, (*SKIP) is ignored.</p> -<quote><p> ^.*? (?(?=a) a | b(*THEN)c )</p></quote> + <p>Notice that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It + ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).</p> -<p>If the subject is "ba", this pattern does not match. Because .*? is ungreedy, -it initially matches zero characters. The condition (?=a) then fails, the -character "b" is matched, but "c" is not. At this point, matching does not -backtrack to .*? as might perhaps be expected from the presence of the | -character. The conditional subpattern is part of the single alternative that -comprises the whole pattern, and so the match fails. (If there was a backtrack -into .*?, allowing it to match "b", the match would succeed.)</p> + <p>The following verb causes a skip to the next innermost alternative when + backtracking reaches it. That is, it cancels any further backtracking + within the current alternative.</p> -<p>The verbs just described provide four different "strengths" of control when -subsequent matching fails. (*THEN) is the weakest, carrying on the match at the -next alternative. (*PRUNE) comes next, failing the match at the current -starting position, but allowing an advance to the next character (for an -unanchored pattern). (*SKIP) is similar, except that the advance may be more -than one character. 
(*COMMIT) is the strongest, causing the entire match to -fail.</p> + <code> +(*THEN) or (*THEN:NAME)</code> + <p>The verb name comes from the observation that it can be used for a + pattern-based if-then-else block:</p> -<p><em>More than one backtracking verb</em></p> + <code> +( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...</code> -<p>If more than one backtracking verb is present in a pattern, the one that is -backtracked onto first acts. For example, consider this pattern, where A, B, -etc. are complex pattern fragments:</p> + <p>If the COND1 pattern matches, FOO is tried (and possibly further items + after the end of the group if FOO succeeds). On failure, the matcher skips + to the second alternative and tries COND2, without backtracking into + COND1. If that succeeds and BAR fails, COND3 is tried. If BAZ then fails, + there are no more alternatives, so there is a backtrack to whatever + came before the entire group. If (*THEN) is not inside an alternation, it + acts like (*PRUNE).</p> -<quote><p> (A(*COMMIT)B(*THEN)C|ABD)</p></quote> + <p>The behavior of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). + It is like (*MARK:NAME) in that the name is remembered for passing back to + the caller. However, (*SKIP:NAME) searches only for names set with + (*MARK).</p> -<p>If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to -fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes -the next alternative (ABD) to be tried. This behaviour is consistent, but is -not always the same as Perl's. It means that if two or more backtracking verbs -appear in succession, all the the last of them has no effect.
Consider this -example:</p> + <note> + <p>The fact that (*THEN:NAME) remembers the name is useless to the Erlang + programmer, as names cannot be retrieved.</p> + </note> -<quote><p> ...(*COMMIT)(*PRUNE)...</p></quote> + <p>A subpattern that does not contain a | character is just a part of the + enclosing alternative; it is not a nested alternation with only one + alternative. The effect of (*THEN) extends beyond such a subpattern to the + enclosing alternative. Consider the following pattern, where A, B, and so + on, are complex pattern fragments that do not contain any | characters at + this level:</p> + + <code> +A (B(*THEN)C) | D</code> + + <p>If A and B are matched, but there is a failure in C, matching does not + backtrack into A; instead it moves to the next alternative, that is, D. + However, if the subpattern containing (*THEN) is given an alternative, it + behaves differently:</p> + + <code> +A (B(*THEN)C | (*FAIL)) | D</code> + + <p>The effect of (*THEN) is now confined to the inner subpattern. After a + failure in C, matching moves to (*FAIL), which causes the whole subpattern + to fail, as there are no more alternatives to try. In this case, matching + does now backtrack into A.</p> + + <p>Notice that a conditional subpattern is not considered as having two + alternatives, as only one is ever used. That is, the | character in a + conditional subpattern has a different meaning. Ignoring whitespace, + consider:</p> + + <code> +^.*? (?(?=a) a | b(*THEN)c )</code> + + <p>If the subject is "ba", this pattern does not match. As .*? is ungreedy, + it initially matches zero characters. The condition (?=a) then fails, the + character "b" is matched, but "c" is not. At this point, matching does not + backtrack to .*? as can perhaps be expected from the presence of the | + character. The conditional subpattern is part of the single alternative + that comprises the whole pattern, and so the match fails. 
(If there was a + backtrack into .*?, allowing it to match "b", the match would + succeed.)</p> + + <p>The verbs described above provide four different "strengths" of control + when subsequent matching fails:</p> + + <list type="bulleted"> + <item> + <p>(*THEN) is the weakest, carrying on the match at the next + alternative.</p> + </item> + <item> + <p>(*PRUNE) comes next: it fails the match at the current starting + position, but allows an advance to the next character (for an + unanchored pattern).</p> + </item> + <item> + <p>(*SKIP) is similar, except that the advance can be more than one + character.</p> + </item> + <item> + <p>(*COMMIT) is the strongest, causing the entire match to fail.</p> + </item> + </list> -<p>If there is a matching failure to the right, backtracking onto (*PRUNE) cases -it to be triggered, and its action is taken. There can never be a backtrack -onto (*COMMIT).</p> + <p><em>More than One Backtracking Verb</em></p> -<p><em>Backtracking verbs in repeated groups</em></p> + <p>If more than one backtracking verb is present in a pattern, the one that + is backtracked onto first acts. For example, consider the following + pattern, where A, B, and so on, are complex pattern fragments:</p> -<p>PCRE differs from Perl in its handling of backtracking verbs in repeated -groups. For example, consider:</p> + <code> +(A(*COMMIT)B(*THEN)C|ABD)</code> -<quote><p> /(a(*COMMIT)b)+ac/</p></quote> + <p>If A matches but B fails, the backtrack to (*COMMIT) causes the entire + match to fail. However, if A and B match, but C fails, the backtrack to + (*THEN) causes the next alternative (ABD) to be tried. This behavior is + consistent, but is not always the same as in Perl. It means that if two or + more backtracking verbs appear in succession, all but the last of them have + no effect.
Consider the following example:</p> -<p>If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in -the second repeat of the group acts.</p> + <code> +...(*COMMIT)(*PRUNE)...</code> -<p><em>Backtracking verbs in assertions</em></p> + <p>If there is a matching failure to the right, backtracking onto (*PRUNE) + causes it to be triggered, and its action is taken. There can never be a + backtrack onto (*COMMIT).</p> -<p>(*FAIL) in an assertion has its normal effect: it forces an immediate backtrack.</p> + <p><em>Backtracking Verbs in Repeated Groups</em></p> -<p>(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing.</p> + <p>PCRE differs from Perl in its handling of backtracking verbs in repeated + groups. For example, consider:</p> -<p>The other backtracking verbs are not treated specially if they appear in a -positive assertion. In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion.</p> + <code> +/(a(*COMMIT)b)+ac/</code> -<p>Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion.
-Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE).</p> + <p>If the subject is "abac", Perl matches, but PCRE fails because the + (*COMMIT) in the second repeat of the group acts.</p> -<p><em>Backtracking verbs in subroutines</em></p> + <p><em>Backtracking Verbs in Assertions</em></p> -<p>These behaviours occur whether or not the subpattern is called recursively. -Perl's treatment of subroutines is different in some cases.</p> + <p>(*FAIL) in an assertion has its normal effect: it forces an immediate + backtrack.</p> -<p>(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces -an immediate backtrack.</p> + <p>(*ACCEPT) in a positive assertion causes the assertion to succeed without + any further processing. In a negative assertion, (*ACCEPT) causes the + assertion to fail without any further processing.</p> -<p>(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to -succeed without any further processing. Matching then continues after the -subroutine call.</p> + <p>The other backtracking verbs are not treated specially if they appear in + a positive assertion. In particular, (*THEN) skips to the next alternative + in the innermost enclosing group that has alternations, regardless of whether + this is within the assertion.</p> -<p>(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause -the subroutine match to fail.</p> + <p>Negative assertions are, however, different, to ensure that changing a + positive assertion into a negative assertion changes its result. + Backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative + assertion to be true, without considering any further alternative branches + in the assertion.
Backtracking into (*THEN) causes it to skip to the next + enclosing alternative within the assertion (the normal behavior), but if + the assertion does not have such an alternative, (*THEN) behaves like + (*PRUNE).</p> -<p>(*THEN) skips to the next alternative in the innermost enclosing group within -the subpattern that has alternatives. If there is no such group within the -subpattern, (*THEN) causes the subroutine match to fail.</p> + <p><em>Backtracking Verbs in Subroutines</em></p> -</section> + <p>These behaviors occur regardless of whether the subpattern is called recursively. + The treatment of subroutines in Perl is different in some cases.</p> + <list type="bulleted"> + <item> + <p>(*FAIL) in a subpattern called as a subroutine has its normal effect: + it forces an immediate backtrack.</p> + </item> + <item> + <p>(*ACCEPT) in a subpattern called as a subroutine causes the + subroutine match to succeed without any further processing. Matching + then continues after the subroutine call.</p> + </item> + <item> + <p>(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a + subroutine cause the subroutine match to fail.</p> + </item> + <item> + <p>(*THEN) skips to the next alternative in the innermost enclosing + group within the subpattern that has alternatives.
If there is no such + group within the subpattern, (*THEN) causes the subroutine match to + fail.</p> + </item> + </list> + </section> </erlref> diff --git a/lib/stdlib/doc/src/ref_man.xml b/lib/stdlib/doc/src/ref_man.xml index 404873ea32..878a3babc5 100644 --- a/lib/stdlib/doc/src/ref_man.xml +++ b/lib/stdlib/doc/src/ref_man.xml @@ -30,9 +30,6 @@ <file>application.xml</file> </header> <description> - <p>The Standard Erlang Libraries application, <em>STDLIB</em>, - contains modules for manipulating lists, strings and files etc.</p> - <br></br> </description> <xi:include href="stdlib_app.xml"/> <xi:include href="array.xml"/> diff --git a/lib/stdlib/doc/src/sets.xml b/lib/stdlib/doc/src/sets.xml index 531d18fbef..f7668af1ed 100644 --- a/lib/stdlib/doc/src/sets.xml +++ b/lib/stdlib/doc/src/sets.xml @@ -24,21 +24,23 @@ <title>sets</title> <prepared>Robert Virding</prepared> - <responsible>Bjarne Dacker</responsible> + <responsible>Bjarne Däcker</responsible> <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>99-07-27</date> + <date>1999-07-27</date> <rev>A</rev> - <file>sets.sgml</file> + <file>sets.xml</file> </header> <module>sets</module> - <modulesummary>Functions for Set Manipulation</modulesummary> + <modulesummary>Functions for set manipulation.</modulesummary> <description> <p>Sets are collections of elements with no duplicate elements. - The representation of a set is not defined.</p> - <p>This module provides exactly the same interface as the module - <c>ordsets</c> but with a defined representation. One difference is + The representation of a set is undefined.</p> + + <p>This module provides the same interface as the + <seealso marker="ordsets"><c>ordsets(3)</c></seealso> module + but with a defined representation. 
One difference is that while this module considers two elements as different if they do not match (<c>=:=</c>), <c>ordsets</c> considers two elements as different if and only if they do not compare equal (<c>==</c>).</p> @@ -47,151 +49,170 @@ <datatypes> <datatype> <name name="set" n_vars="1"/> - <desc><p>As returned by <c>new/0</c>.</p></desc> + <desc><p>As returned by + <seealso marker="#new/0"><c>new/0</c></seealso>.</p></desc> </datatype> <datatype> <name name="set" n_vars="0"/> </datatype> </datatypes> + <funcs> <func> - <name name="new" arity="0"/> - <fsummary>Return an empty set</fsummary> + <name name="add_element" arity="2"/> + <fsummary>Add an element to a <c>Set</c>.</fsummary> <desc> - <p>Returns a new empty set.</p> + <p>Returns a new set formed from <c><anno>Set1</anno></c> with + <c><anno>Element</anno></c> inserted.</p> </desc> </func> + <func> - <name name="is_set" arity="1"/> - <fsummary>Test for a <c>Set</c></fsummary> + <name name="del_element" arity="2"/> + <fsummary>Remove an element from a <c>Set</c>.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Set</anno></c> is a set of - elements, otherwise <c>false</c>.</p> + <p>Returns <c><anno>Set1</anno></c>, but with + <c><anno>Element</anno></c> removed.</p> </desc> </func> + <func> - <name name="size" arity="1"/> - <fsummary>Return the number of elements in a set</fsummary> + <name name="filter" arity="2"/> + <fsummary>Filter set elements.</fsummary> <desc> - <p>Returns the number of elements in <c><anno>Set</anno></c>.</p> + <p>Filters elements in <c><anno>Set1</anno></c> with boolean function + <c><anno>Pred</anno></c>.</p> </desc> </func> + <func> - <name name="to_list" arity="1"/> - <fsummary>Convert a <c>Set</c>into a list</fsummary> + <name name="fold" arity="3"/> + <fsummary>Fold over set elements.</fsummary> <desc> - <p>Returns the elements of <c><anno>Set</anno></c> as a list. 
- The order of the returned elements is undefined.</p> + <p>Folds <c><anno>Function</anno></c> over every element in + <c><anno>Set</anno></c> and returns the final value of the + accumulator. The evaluation order is undefined.</p> </desc> </func> + <func> <name name="from_list" arity="1"/> - <fsummary>Convert a list into a <c>Set</c></fsummary> + <fsummary>Convert a list into a <c>Set</c>.</fsummary> <desc> <p>Returns a set of the elements in <c><anno>List</anno></c>.</p> </desc> </func> + <func> - <name name="is_element" arity="2"/> - <fsummary>Test for membership of a <c>Set</c></fsummary> + <name name="intersection" arity="1"/> + <fsummary>Return the intersection of a list of <c>Sets</c>.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an element of - <c><anno>Set</anno></c>, otherwise <c>false</c>.</p> + <p>Returns the intersection of the non-empty list of sets.</p> </desc> </func> + <func> - <name name="add_element" arity="2"/> - <fsummary>Add an element to a <c>Set</c></fsummary> + <name name="intersection" arity="2"/> + <fsummary>Return the intersection of two <c>Sets</c>.</fsummary> <desc> - <p>Returns a new set formed from <c><anno>Set1</anno></c> with - <c><anno>Element</anno></c> inserted.</p> + <p>Returns the intersection of <c><anno>Set1</anno></c> and + <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> - <name name="del_element" arity="2"/> - <fsummary>Remove an element from a <c>Set</c></fsummary> + <name name="is_disjoint" arity="2"/> + <fsummary>Check whether two <c>Sets</c> are disjoint.</fsummary> <desc> - <p>Returns <c><anno>Set1</anno></c>, but with <c><anno>Element</anno></c> removed.</p> + <p>Returns <c>true</c> if <c><anno>Set1</anno></c> and + <c><anno>Set2</anno></c> are disjoint (have no elements in common), + otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="union" arity="2"/> - <fsummary>Return the union of two <c>Sets</c></fsummary> + <name name="is_element" arity="2"/> + 
<fsummary>Test for membership of a <c>Set</c>.</fsummary> <desc> - <p>Returns the merged (union) set of <c><anno>Set1</anno></c> and - <c><anno>Set2</anno></c>.</p> + <p>Returns <c>true</c> if <c><anno>Element</anno></c> is an element of + <c><anno>Set</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="union" arity="1"/> - <fsummary>Return the union of a list of <c>Sets</c></fsummary> + <name name="is_set" arity="1"/> + <fsummary>Test for a <c>Set</c>.</fsummary> <desc> - <p>Returns the merged (union) set of the list of sets.</p> + <p>Returns <c>true</c> if <c><anno>Set</anno></c> is a set of + elements, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="intersection" arity="2"/> - <fsummary>Return the intersection of two <c>Sets</c></fsummary> + <name name="is_subset" arity="2"/> + <fsummary>Test for subset.</fsummary> <desc> - <p>Returns the intersection of <c><anno>Set1</anno></c> and - <c><anno>Set2</anno></c>.</p> + <p>Returns <c>true</c> when every element of <c><anno>Set1</anno></c> is + also a member of <c><anno>Set2</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> - <name name="intersection" arity="1"/> - <fsummary>Return the intersection of a list of <c>Sets</c></fsummary> + <name name="new" arity="0"/> + <fsummary>Return an empty set.</fsummary> <desc> - <p>Returns the intersection of the non-empty list of sets.</p> + <p>Returns a new empty set.</p> </desc> </func> + <func> - <name name="is_disjoint" arity="2"/> - <fsummary>Check whether two <c>Sets</c> are disjoint</fsummary> + <name name="size" arity="1"/> + <fsummary>Return the number of elements in a set.</fsummary> <desc> - <p>Returns <c>true</c> if <c><anno>Set1</anno></c> and - <c><anno>Set2</anno></c> are disjoint (have no elements in common), - and <c>false</c> otherwise.</p> + <p>Returns the number of elements in <c><anno>Set</anno></c>.</p> </desc> </func> + <func> <name name="subtract" arity="2"/> - <fsummary>Return the difference of 
two <c>Sets</c></fsummary> + <fsummary>Return the difference of two <c>Sets</c>.</fsummary> <desc> - <p>Returns only the elements of <c><anno>Set1</anno></c> which are not + <p>Returns only the elements of <c><anno>Set1</anno></c> that are not also elements of <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> - <name name="is_subset" arity="2"/> - <fsummary>Test for subset</fsummary> + <name name="to_list" arity="1"/> + <fsummary>Convert a <c>Set</c>into a list.</fsummary> <desc> - <p>Returns <c>true</c> when every element of <c><anno>Set1</anno></c>1 is - also a member of <c><anno>Set2</anno></c>, otherwise <c>false</c>.</p> + <p>Returns the elements of <c><anno>Set</anno></c> as a list. + The order of the returned elements is undefined.</p> </desc> </func> + <func> - <name name="fold" arity="3"/> - <fsummary>Fold over set elements</fsummary> + <name name="union" arity="1"/> + <fsummary>Return the union of a list of <c>Sets</c>.</fsummary> <desc> - <p>Fold <c><anno>Function</anno></c> over every element in <c><anno>Set</anno></c> - returning the final value of the accumulator. 
- The evaluation order is undefined.</p> + <p>Returns the merged (union) set of the list of sets.</p> </desc> </func> + <func> - <name name="filter" arity="2"/> - <fsummary>Filter set elements</fsummary> + <name name="union" arity="2"/> + <fsummary>Return the union of two <c>Sets</c>.</fsummary> <desc> - <p>Filter elements in <c><anno>Set1</anno></c> with boolean function - <c><anno>Pred</anno></c>.</p> + <p>Returns the merged (union) set of <c><anno>Set1</anno></c> and + <c><anno>Set2</anno></c>.</p> </desc> </func> </funcs> <section> <title>See Also</title> - <p><seealso marker="ordsets">ordsets(3)</seealso>, - <seealso marker="gb_sets">gb_sets(3)</seealso></p> + <p><seealso marker="gb_sets"><c>gb_sets(3)</c></seealso>, + <seealso marker="ordsets"><c>ordsets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/shell.xml b/lib/stdlib/doc/src/shell.xml index 65c441203c..d6e8036d4e 100644 --- a/lib/stdlib/doc/src/shell.xml +++ b/lib/stdlib/doc/src/shell.xml @@ -24,87 +24,96 @@ <title>shell</title> <prepared>Bjorn Gustavsson</prepared> - <responsible>Bjarne Dacker</responsible> + <responsible>Bjarne Däcker</responsible> <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>97-01-24</date> + <date>1997-01-24</date> <rev>A</rev> - <file>shell.sgml</file> + <file>shell.xml</file> </header> <module>shell</module> - <modulesummary>The Erlang Shell</modulesummary> + <modulesummary>The Erlang shell.</modulesummary> <description> - <p>The module <c>shell</c> implements an Erlang shell. - </p> - <p>The shell is a user interface program + <p>This module provides an Erlang shell.</p> + + <p>The shell is a user interface program for entering expression sequences. The expressions are - evaluated and a value is returned. + evaluated and a value is returned. A history mechanism saves previous commands and their values, which can then be incorporated in later commands. 
How many commands and results to save can be determined by the user, - either interactively, by calling <c>shell:history/1</c> and - <c>shell:results/1</c>, or by setting the application configuration + either interactively, by calling + <seealso marker="#history/1"><c>history/1</c></seealso> and + <seealso marker="#results/1"><c>results/1</c></seealso>, + or by setting the application configuration parameters <c>shell_history_length</c> and - <c>shell_saved_results</c> for the application STDLIB. - </p> - <p>The shell uses a helper process for evaluating commands in - order to protect the history mechanism from exceptions. By + <c>shell_saved_results</c> for the STDLIB application.</p> + + <p>The shell uses a helper process for evaluating commands + to protect the history mechanism from exceptions. By default the evaluator process is killed when an exception - occurs, but by calling <c>shell:catch_exception/1</c> or by + occurs, but by calling <seealso marker="#catch_exception/1"> + <c>catch_exception/1</c></seealso> or by setting the application configuration parameter - <c>shell_catch_exception</c> for the application STDLIB - this behavior can be changed. See also the example below. - </p> + <c>shell_catch_exception</c> for the STDLIB application + this behavior can be changed. See also the example below.</p> + <p>Variable bindings, and local process dictionary changes - which are generated in user expressions are preserved, and the variables + that are generated in user expressions are preserved, and the variables can be used in later commands to access their values. The - bindings can also be forgotten so the variables can be re-used. - </p> + bindings can also be forgotten so the variables can be reused.</p> + <p>The special shell commands all have the syntax of (local) function calls. They are evaluated as normal function calls and many commands can be used in one - expression sequence. 
- </p> + expression sequence.</p> + <p>If a command (local function call) is not recognized by the - shell, an attempt is first made to find the function in the + shell, an attempt is first made to find the function in module <c>user_default</c>, where customized local commands - can be placed. If found, then the function is evaluated. - Otherwise, an attempt is made to evaluate the function in the - module <c>shell_default</c>. The module - <c>user_default</c> must be explicitly loaded. - </p> + can be placed. If found, the function is evaluated, + otherwise an attempt is made to evaluate the function in + module <c>shell_default</c>. Module + <c>user_default</c> must be explicitly loaded.</p> + <p>The shell also permits the user to start multiple concurrent - jobs. A job can be regarded as a set of processes which can - communicate with the shell. - </p> + jobs. A job can be regarded as a set of processes that can + communicate with the shell.</p> + <p>There is some support for reading and printing records in the shell. During compilation record expressions are translated to tuple expressions. In runtime it is not known whether a tuple - actually represents a record. Nor are the record definitions - used by compiler available at runtime. So in order to read the + represents a record, and the record definitions + used by the compiler are unavailable at runtime. So, to read the record syntax and print tuples as records when possible, record - definitions have to be maintained by the shell itself. The shell - commands for reading, defining, forgetting, listing, and - printing records are described below. Note that each job has its - own set of record definitions. To facilitate matters record - definitions in the modules <c>shell_default</c> and + definitions must be maintained by the shell itself.</p> + + <p>The shell commands for reading, defining, forgetting, listing, and + printing records are described below. 
Notice that each job has its + own set of record definitions. To facilitate matters, record + definitions in modules <c>shell_default</c> and <c>user_default</c> (if loaded) are read each time a new job is - started. For instance, adding the line</p> + started. For example, adding the following line + to <c>user_default</c> makes the definition of <c>file_info</c> + readily available in the shell:</p> + <code type="none"> - -include_lib("kernel/include/file.hrl").</code> - <p>to <c>user_default</c> makes the definition of <c>file_info</c> - readily available in the shell. - </p> - <p>The shell runs in two modes: </p> +-include_lib("kernel/include/file.hrl").</code> + + <p>The shell runs in two modes:</p> + <list type="bulleted"> - <item><c>Normal (possibly restricted)</c> mode, in which - commands can be edited and expressions evaluated. + <item> + <p><c>Normal (possibly restricted)</c> mode, in which + commands can be edited and expressions evaluated</p> </item> - <item>Job Control Mode <c>JCL</c>, in which jobs can be - started, killed, detached and connected. + <item> + <p>Job Control Mode, <c>JCL</c>, in which jobs can be + started, killed, detached, and connected</p> </item> </list> + <p>Only the currently connected job can 'talk' to the shell.</p> </description> @@ -117,60 +126,51 @@ </item> <tag><c>f()</c></tag> <item> - <p>Removes all variable bindings. - </p> + <p>Removes all variable bindings.</p> </item> <tag><c>f(X)</c></tag> <item> - <p>Removes the binding of variable <c>X</c>. - </p> + <p>Removes the binding of variable <c>X</c>.</p> </item> <tag><c>h()</c></tag> <item> - <p>Prints the history list. - </p> + <p>Prints the history list.</p> </item> <tag><c>history(N)</c></tag> <item> <p>Sets the number of previous commands to keep in the history list to <c>N</c>. The previous number is returned. - The default number is 20. 
- </p> + Defaults to 20.</p> </item> <tag><c>results(N)</c></tag> <item> <p>Sets the number of results from previous commands to keep in the history list to <c>N</c>. The previous number is returned. - The default number is 20. - </p> + Defaults to 20.</p> </item> <tag><c>e(N)</c></tag> <item> - <p>Repeats the command <c>N</c>, if <c>N</c> is positive. If + <p>Repeats command <c>N</c>, if <c>N</c> is positive. If it is negative, the <c>N</c>th previous command is repeated - (i.e., <c>e(-1)</c> repeats the previous command). - </p> + (that is, <c>e(-1)</c> repeats the previous command).</p> </item> <tag><c>v(N)</c></tag> <item> - <p>Uses the return value of the command <c>N</c> in the + <p>Uses the return value of command <c>N</c> in the current command, if <c>N</c> is positive. If it is negative, the return value of the <c>N</c>th previous command is used - (i.e., <c>v(-1)</c> uses the value of the previous command). - </p> + (that is, <c>v(-1)</c> uses the value of the previous command).</p> </item> <tag><c>help()</c></tag> <item> - <p>Evaluates <c>shell_default:help()</c>. - </p> + <p>Evaluates <c>shell_default:help()</c>.</p> </item> <tag><c>c(File)</c></tag> <item> <p>Evaluates <c>shell_default:c(File)</c>. This compiles and loads code in <c>File</c> and purges old versions of code, if necessary. Assumes that the file and module names - are the same. - </p> + are the same.</p> </item> <tag><c>catch_exception(Bool)</c></tag> <item> @@ -179,161 +179,264 @@ (<c>false</c>) is to kill the evaluator process when an exception occurs, which causes the shell to create a new evaluator process. When the exception handling is set to - <c>true</c> the evaluator process lives on which means that - for instance ports and ETS tables as well as processes - linked to the evaluator process survive the exception. - </p> + <c>true</c>, the evaluator process lives on. 
This means, + for example, that ports and ETS tables as well as processes + linked to the evaluator process survive the exception.</p> </item> <tag><c>rd(RecordName, RecordDefinition)</c></tag> <item> <p>Defines a record in the shell. <c>RecordName</c> is an atom and <c>RecordDefinition</c> lists the field names and the default values. Usually record definitions are made - known to the shell by use of the <c>rr</c> commands + known to the shell by use of the <c>rr/1,2,3</c> commands described below, but sometimes it is handy to define records - on the fly. - </p> + on the fly.</p> </item> <tag><c>rf()</c></tag> <item> <p>Removes all record definitions, then reads record definitions from the modules <c>shell_default</c> and <c>user_default</c> (if loaded). Returns the names of the - records defined. - </p> + records defined.</p> </item> <tag><c>rf(RecordNames)</c></tag> <item> <p>Removes selected record definitions. <c>RecordNames</c> is a record name or a list of record names. - Use <c>'_'</c> to remove all record definitions. - </p> + To remove all record definitions, use <c>'_'</c>.</p> </item> <tag><c>rl()</c></tag> <item> - <p>Prints all record definitions. - </p> + <p>Prints all record definitions.</p> </item> <tag><c>rl(RecordNames)</c></tag> <item> <p>Prints selected record definitions. - <c>RecordNames</c> is a record name or a list of record names. - </p> + <c>RecordNames</c> is a record name or a list of record names.</p> </item> <tag><c>rp(Term)</c></tag> <item> <p>Prints a term using the record definitions known to the shell. All of <c>Term</c> is printed; the depth is not - limited as is the case when a return value is printed. - </p> + limited as is the case when a return value is printed.</p> </item> <tag><c>rr(Module)</c></tag> <item> <p>Reads record definitions from a module's BEAM file. If there are no record definitions in the BEAM file, the source file is located and read instead. Returns the names - of the record definitions read. 
<c>Module</c> is an atom. - </p> + of the record definitions read. <c>Module</c> is an atom.</p> </item> <tag><c>rr(Wildcard)</c></tag> <item> <p>Reads record definitions from files. Existing definitions of any of the record names read are replaced. <c>Wildcard</c> is a wildcard string as defined in - <c>filelib(3)</c> but not an atom. - </p> + <seealso marker="filelib"><c>filelib(3)</c></seealso>, + but not an atom.</p> </item> <tag><c>rr(WildcardOrModule, RecordNames)</c></tag> <item> <p>Reads record definitions from files but discards record names not mentioned in <c>RecordNames</c> (a - record name or a list of record names). - </p> + record name or a list of record names).</p> </item> <tag><c>rr(WildcardOrModule, RecordNames, Options)</c></tag> <item> <p>Reads record definitions from files. The compiler options <c>{i, Dir}</c>, <c>{d, Macro}</c>, and <c>{d, Macro, Value}</c> are recognized and used - for setting up the include path and macro definitions. Use - <c>'_'</c> as value of <c>RecordNames</c> to read all record - definitions. - </p> + for setting up the include path and macro definitions. + To read all record definitions, use + <c>'_'</c> as value of <c>RecordNames</c>.</p> </item> </taglist> </section> <section> <title>Example</title> - <p>The following example is a long dialogue with the shell. Commands + <p>The following example is a long dialog with the shell. Commands starting with <c>></c> are inputs to the shell. All other lines - are output from the shell. All commands in this example are explained at the end of the dialogue. 
- .</p> + are output from the shell.</p> + <pre> strider 1> <input>erl</input> Erlang (BEAM) emulator version 5.3 [hipe] [threads:0] Eshell V5.3 (abort with ^G) -1><input>Str = "abcd".</input> -"abcd" +1> <input>Str = "abcd".</input> +"abcd"</pre> + + <p>Command 1 sets variable <c>Str</c> to string <c>"abcd"</c>.</p> + + <pre> 2> <input>L = length(Str).</input> -4 +4</pre> + + <p>Command 2 sets <c>L</c> to the length of string <c>Str</c>.</p> + + <pre> 3> <input>Descriptor = {L, list_to_atom(Str)}.</input> -{4,abcd} +{4,abcd}</pre> + + <p>Command 3 builds the tuple <c>Descriptor</c>, evaluating the BIF + <seealso marker="erts:erlang#list_to_atom/1"><c>list_to_atom/1</c> + </seealso>.</p> + + <pre> 4> <input>L.</input> -4 +4</pre> + + <p>Command 4 prints the value of variable <c>L</c>.</p> + + <pre> 5> <input>b().</input> Descriptor = {4,abcd} L = 4 Str = "abcd" -ok +ok</pre> + + <p>Command 5 evaluates the internal shell command <c>b()</c>, which + is an abbreviation of "bindings". This prints + the current shell variables and their bindings. <c>ok</c> at + the end is the return value of function <c>b()</c>.</p> + + <pre> 6> <input>f(L).</input> -ok +ok</pre> + + <p>Command 6 evaluates the internal shell command <c>f(L)</c> (abbreviation + of "forget"). 
The value of variable <c>L</c> is removed.</p> + + <pre> 7> <input>b().</input> Descriptor = {4,abcd} Str = "abcd" -ok +ok</pre> + + <p>Command 7 prints the new bindings.</p> + + <pre> 8> <input>f(L).</input> -ok +ok</pre> + + <p>Command 8 has no effect, as <c>L</c> has no value.</p> + + <pre> 9> <input>{L, _} = Descriptor.</input> -{4,abcd} +{4,abcd}</pre> + + <p>Command 9 performs a pattern matching operation on + <c>Descriptor</c>, binding a new value to <c>L</c>.</p> + + <pre> 10> <input>L.</input> -4 +4</pre> + + <p>Command 10 prints the current value of <c>L</c>.</p> + + <pre> 11> <input>{P, Q, R} = Descriptor.</input> -** exception error: no match of right hand side value {4,abcd} +** exception error: no match of right hand side value {4,abcd}</pre> + + <p>Command 11 tries to match <c>{P, Q, R}</c> against + <c>Descriptor</c>, which is <c>{4, abc}</c>. The match fails and + none of the new variables become bound. The printout starting + with "<c>** exception error:</c>" is not the value of the + expression (the expression had no value because its evaluation + failed), but a warning printed by the system to inform + the user that an error has occurred. 
The values of the other + variables (<c>L</c>, <c>Str</c>, and so on) are unchanged.</p> + + <pre> 12> <input>P.</input> -* 1: variable 'P' is unbound ** +* 1: variable 'P' is unbound 13> <input>Descriptor.</input> -{4,abcd} +{4,abcd}</pre> + + <p>Commands 12 and 13 show that <c>P</c> is unbound because the + previous command failed, and that <c>Descriptor</c> has not + changed.</p> + + <pre> 14><input>{P, Q} = Descriptor.</input> {4,abcd} 15> <input>P.</input> -4 +4</pre> + + <p>Commands 14 and 15 show a correct match where <c>P</c> and + <c>Q</c> are bound.</p> + + <pre> 16> <input>f().</input> -ok +ok</pre> + + <p>Command 16 clears all bindings.</p> + + <p>The next few commands assume that <c>test1:demo(X)</c> is + defined as follows:</p> + + <p><c>demo(X) -></c><br></br> + <c>put(aa, worked),</c><br></br> + <c>X = 1,</c><br></br> + <c>X + 10.</c></p> + + <pre> 17> <input>put(aa, hello).</input> undefined 18> <input>get(aa).</input> -hello +hello</pre> + + <p>Commands 17 and 18 set and inspect the value of item + <c>aa</c> in the process dictionary.</p> + + <pre> 19> <input>Y = test1:demo(1).</input> -11 +11</pre> + + <p>Command 19 evaluates <c>test1:demo(1)</c>. The evaluation + succeeds and the changes made in the process dictionary become + visible to the shell. The new value of dictionary item + <c>aa</c> can be seen in command 20.</p> + + <pre> 20> <input>get().</input> [{aa,worked}] 21> <input>put(aa, hello).</input> worked 22> <input>Z = test1:demo(2).</input> ** exception error: no match of right hand side value 1 - in function test1:demo/1 + in function test1:demo/1</pre> + + <p>Commands 21 and 22 change the value of dictionary item + <c>aa</c> to <c>hello</c> and call <c>test1:demo(2)</c>. 
Evaluation + fails and the changes made to the dictionary in + <c>test1:demo(2)</c>, before the error occurred, are discarded.</p> + + <pre> 23> <input>Z.</input> -* 1: variable 'Z' is unbound ** +* 1: variable 'Z' is unbound 24> <input>get(aa).</input> -hello +hello</pre> + + <p>Commands 23 and 24 show that <c>Z</c> was not bound and that + dictionary item <c>aa</c> has retained its original value.</p> + + <pre> 25> <input>erase(), put(aa, hello).</input> undefined 26> <input>spawn(test1, demo, [1]).</input> <0.57.0> 27> <input>get(aa).</input> -hello +hello</pre> + + <p>Commands 25, 26, and 27 show the effect of evaluating + <c>test1:demo(1)</c> in the background. In this case, the + expression is evaluated in a newly spawned process. Any + changes made in the process dictionary are local to the newly + spawned process and therefore not visible to the shell.</p> + + <pre> 28> <input>io:format("hello hello\n").</input> hello hello ok @@ -341,31 +444,96 @@ ok hello hello ok 30> <input>v(28).</input> -ok +ok</pre> + + <p>Commands 28, 29 and 30 use the history facilities of the shell. + Command 29 re-evaluates command 28. Command 30 uses the value (result) + of command 28. In the cases of a pure function (a function + with no side effects), the result is the same. For a function + with side effects, the result can be different.</p> + + <p>The next few commands show some record manipulation. It is + assumed that <c>ex.erl</c> defines a record as follows:</p> + + <p><c>-record(rec, {a, b = val()}).</c></p> + <p><c>val() -></c><br></br> + <c>3.</c></p> + + <pre> 31> <input>c(ex).</input> {ok,ex} 32> <input>rr(ex).</input> -[rec] +[rec]</pre> + + <p>Commands 31 and 32 compile file <c>ex.erl</c> and read + the record definitions in <c>ex.beam</c>. If the compiler did not + output any record definitions on the BEAM file, <c>rr(ex)</c> + tries to read record definitions from the source file instead.</p> + + <pre> 33> <input>rl(rec).</input> -record(rec,{a,b = val()}). 
-ok +ok</pre> + + <p>Command 33 prints the definition of the record named + <c>rec</c>.</p> + + <pre> 34> <input>#rec{}.</input> -** exception error: undefined shell command val/0 +** exception error: undefined shell command val/0</pre> + + <p>Command 34 tries to create a <c>rec</c> record, but fails + as function <c>val/0</c> is undefined.</p> + + <pre> 35> <input>#rec{b = 3}.</input> -#rec{a = undefined,b = 3} +#rec{a = undefined,b = 3}</pre> + + <p>Command 35 shows the workaround: explicitly assign values to record + fields that cannot otherwise be initialized.</p> + + <pre> 36> <input>rp(v(-1)).</input> #rec{a = undefined,b = 3} -ok +ok</pre> + + <p>Command 36 prints the newly created record using record + definitions maintained by the shell.</p> + + <pre> 37> <input>rd(rec, {f = orddict:new()}).</input> -rec +rec</pre> + + <p>Command 37 defines a record directly in the shell. The + definition replaces the one read from file <c>ex.beam</c>.</p> + + <pre> 38> <input>#rec{}.</input> #rec{f = []} -ok +ok</pre> + + <p>Command 38 creates a record using the new definition, and + prints the result.</p> + + <pre> 39> <input>rd(rec, {c}), A.</input> -* 1: variable 'A' is unbound ** +* 1: variable 'A' is unbound 40> <input>#rec{}.</input> #rec{c = undefined} -ok +ok</pre> + + <p>Command 39 and 40 show that record definitions are updated + as side effects. 
The evaluation of the command fails, but + the definition of <c>rec</c> has been carried out.</p> + + <p>For the next command, it is assumed that <c>test1:loop(N)</c> is + defined as follows:</p> + + <p><c>loop(N) -></c><br></br> + <c>io:format("Hello Number: ~w~n", [N]),</c><br></br> + <c>loop(N+1).</c></p> + + <pre> 41> <input>test1:loop(0).</input> Hello Number: 0 Hello Number: 1 @@ -383,225 +551,122 @@ Hello Number: 3375 Hello Number: 3376 Hello Number: 3377 Hello Number: 3378 -** exception exit: killed +** exception exit: killed</pre> + + <p>Command 41 evaluates <c>test1:loop(0)</c>, which puts the + system into an infinite loop. At this point the user types + <c>^G</c> (Control G), which suspends output from the + current process, + which is stuck in a loop, and activates <c>JCL</c> mode. In <c>JCL</c> + mode the user can start and stop jobs.</p> + + <p>In this particular case, command <c>i</c> ("interrupt") + terminates the looping program, and command <c>c</c> + connects to the shell again. As the process was + running in the background before we killed it, more + printouts occur before message "<c>** exception exit: killed</c>" + is shown.</p> + + <pre> 42> <input>E = ets:new(t, []).</input> -17 +17</pre> + + <p>Command 42 creates an ETS table.</p> + + <pre> 43> <input>ets:insert({d,1,2}).</input> -** exception error: undefined function ets:insert/1 +** exception error: undefined function ets:insert/1</pre> + + <p>Command 43 tries to insert a tuple into the ETS table, but the + first argument (the table) is missing. 
The exception kills the + evaluator process.</p> + + <pre> 44> <input>ets:insert(E, {d,1,2}).</input> ** exception error: argument is of wrong type in function ets:insert/2 - called as ets:insert(16,{d,1,2}) + called as ets:insert(16,{d,1,2})</pre> + + <p>Command 44 corrects the mistake, but the ETS table has been + destroyed as it was owned by the killed evaluator process.</p> + + <pre> 45> <input>f(E).</input> ok 46> <input>catch_exception(true).</input> -false +false</pre> + + <p>Command 46 sets the exception handling of the evaluator process + to <c>true</c>. The exception handling can also be set when + starting Erlang by <c>erl -stdlib shell_catch_exception true</c>.</p> + + <pre> 47> <input>E = ets:new(t, []).</input> 18 48> <input>ets:insert({d,1,2}).</input> -* exception error: undefined function ets:insert/1 -49> <input>ets:insert(E, {d,1,2}).</input> -true -50> <input>halt().</input> -strider 2></pre> - </section> +* exception error: undefined function ets:insert/1</pre> - <section> - <title>Comments</title> - <p>Command 1 sets the variable <c>Str</c> to the string - <c>"abcd"</c>. - </p> - <p>Command 2 sets <c>L</c> to the length of the string evaluating - the BIF <c>atom_to_list</c>. - </p> - <p>Command 3 builds the tuple <c>Descriptor</c>. - </p> - <p>Command 4 prints the value of the variable <c>L</c>. - </p> - <p>Command 5 evaluates the internal shell command <c>b()</c>, which - is an abbreviation of "bindings". This prints - the current shell variables and their bindings. The <c>ok</c> at - the end is the return value of the <c>b()</c> function. - </p> - <p>Command 6 <c>f(L)</c> evaluates the internal shell command - <c>f(L)</c> (abbreviation of "forget"). The value of the variable - <c>L</c> is removed. - </p> - <p>Command 7 prints the new bindings. - </p> - <p>Command 8 has no effect since <c>L</c> has no value.</p> - <p>Command 9 performs a pattern matching operation on - <c>Descriptor</c>, binding a new value to <c>L</c>. 
- </p> - <p>Command 10 prints the current value of <c>L</c>. - </p> - <p>Command 11 tries to match <c>{P, Q, R}</c> against - <c>Descriptor</c> which is <c>{4, abc}</c>. The match fails and - none of the new variables become bound. The printout starting - with "<c>** exception error:</c>" is not the value of the - expression (the expression had no value because its evaluation - failed), but rather a warning printed by the system to inform - the user that an error has occurred. The values of the other - variables (<c>L</c>, <c>Str</c>, etc.) are unchanged. - </p> - <p>Commands 12 and 13 show that <c>P</c> is unbound because the - previous command failed, and that <c>Descriptor</c> has not - changed. - </p> - <p>Commands 14 and 15 show a correct match where <c>P</c> and - <c>Q</c> are bound. - </p> - <p>Command 16 clears all bindings. - </p> - <p>The next few commands assume that <c>test1:demo(X)</c> is - defined in the following way:</p> - <pre> -demo(X) -> - put(aa, worked), - X = 1, - X + 10. </pre> - <p>Commands 17 and 18 set and inspect the value of the item - <c>aa</c> in the process dictionary. - </p> - <p>Command 19 evaluates <c>test1:demo(1)</c>. The evaluation - succeeds and the changes made in the process dictionary become - visible to the shell. The new value of the dictionary item - <c>aa</c> can be seen in command 20. - </p> - <p>Commands 21 and 22 change the value of the dictionary item - <c>aa</c> to <c>hello</c> and call <c>test1:demo(2)</c>. Evaluation - fails and the changes made to the dictionary in - <c>test1:demo(2)</c>, before the error occurred, are discarded. - </p> - <p>Commands 23 and 24 show that <c>Z</c> was not bound and that the - dictionary item <c>aa</c> has retained its original value. - </p> - <p>Commands 25, 26 and 27 show the effect of evaluating - <c>test1:demo(1)</c> in the background. In this case, the - expression is evaluated in a newly spawned process. 
Any - changes made in the process dictionary are local to the newly - spawned process and therefore not visible to the shell. - </p> - <p>Commands 28, 29 and 30 use the history facilities of the shell. - </p> - <p>Command 29 is <c>e(28)</c>. This re-evaluates command - 28. Command 30 is <c>v(28)</c>. This uses the value (result) of - command 28. In the cases of a pure function (a function - with no side effects), the result is the same. For a function - with side effects, the result can be different. - </p> - <p>The next few commands show some record manipulation. It is - assumed that <c>ex.erl</c> defines a record like this:</p> - <pre> --record(rec, {a, b = val()}). - -val() -> - 3. </pre> - <p>Commands 31 and 32 compiles the file <c>ex.erl</c> and reads - the record definitions in <c>ex.beam</c>. If the compiler did not - output any record definitions on the BEAM file, <c>rr(ex)</c> - tries to read record definitions from the source file instead. - </p> - <p>Command 33 prints the definition of the record named - <c>rec</c>. - </p> - <p>Command 34 tries to create a <c>rec</c> record, but fails - since the function <c>val/0</c> is undefined. Command 35 shows - the workaround: explicitly assign values to record fields that - cannot otherwise be initialized. - </p> - <p>Command 36 prints the newly created record using record - definitions maintained by the shell. - </p> - <p>Command 37 defines a record directly in the shell. The - definition replaces the one read from the file <c>ex.beam</c>. - </p> - <p>Command 38 creates a record using the new definition, and - prints the result. - </p> - <p>Command 39 and 40 show that record definitions are updated - as side effects. The evaluation of the command fails but - the definition of <c>rec</c> has been carried out. 
- </p> - <p>For the next command, it is assumed that <c>test1:loop(N)</c> is - defined in the following way:</p> - <pre> -loop(N) -> - io:format("Hello Number: ~w~n", [N]), - loop(N+1).</pre> - <p>Command 41 evaluates <c>test1:loop(0)</c>, which puts the - system into an infinite loop. At this point the user types - <c>Control G</c>, which suspends output from the current process, - which is stuck in a loop, and activates <c>JCL</c> mode. In <c>JCL</c> - mode the user can start and stop jobs. - </p> - <p>In this particular case, the <c>i</c> command ("interrupt") is - used to terminate the looping program, and the <c>c</c> command - is used to connect to the shell again. Since the process was - running in the background before we killed it, there will be - more printouts before the "<c>** exception exit: killed</c>" - message is shown. - </p> - <p>Command 42 creates an ETS table.</p> - <p>Command 43 tries to insert a tuple into the ETS table but the - first argument (the table) is missing. The exception kills the - evaluator process.</p> - <p>Command 44 corrects the mistake, but the ETS table has been - destroyed since it was owned by the killed evaluator process.</p> - <p>Command 46 sets the exception handling of the evaluator process - to <c>true</c>. The exception handling can also be set when - starting Erlang, like this: <c>erl -stdlib shell_catch_exception - true</c>.</p> <p>Command 48 makes the same mistake as in command 43, but this time the evaluator process lives on. The single star at the beginning of the printout signals that the exception has been caught.</p> + + <pre> +49> <input>ets:insert(E, {d,1,2}).</input> +true</pre> + <p>Command 49 successfully inserts the tuple into the ETS table.</p> - <p>The <c>halt()</c> command exits the Erlang runtime system. 
- </p> + + <pre> +50> <input>halt().</input> +strider 2></pre> + + <p>Command 50 exits the Erlang runtime system.</p> </section> <section> <title>JCL Mode</title> <p>When the shell starts, it starts a single evaluator - process. This process, together with any local processes which + process. This process, together with any local processes that it spawns, is referred to as a <c>job</c>. Only the current job, which is said to be <c>connected</c>, can perform operations - with standard IO. All other jobs, which are said to be <c>detached</c>, are - <c>blocked</c> if they attempt to use standard IO. - </p> - <p>All jobs which do not use standard IO run in the normal way. - </p> - <p>The shell escape key <em><c>^G</c></em> (Control G) detaches the current job - and activates <c>JCL</c> mode. The <c>JCL</c> mode prompt is <c>"-->"</c>. If <c>"?"</c> is entered at the prompt, the following help message is - displayed:</p> - <pre> - --> ? - c [nn] - connect to job - i [nn] - interrupt job - k [nn] - kill job - j - list all jobs - s [shell] - start local shell - r [node [shell]] - start remote shell - q - quit erlang - ? | h - this message </pre> + with standard I/O. All other jobs, which are said to be <c>detached</c>, + are <c>blocked</c> if they attempt to use standard I/O.</p> + + <p>All jobs that do not use standard I/O run in the normal way.</p> + + <p>The shell escape key <c>^G</c> (Control G) detaches the current + job and activates <c>JCL</c> mode. The <c>JCL</c> mode prompt is + <c>"-->"</c>. If <c>"?"</c> is entered at the prompt, the following help + message is displayed:</p> + + <pre> +--> ? +c [nn] - connect to job +i [nn] - interrupt job +k [nn] - kill job +j - list all jobs +s [shell] - start local shell +r [node [shell]] - start remote shell +q - quit erlang +? 
| h - this message</pre> + <p>The <c>JCL</c> commands have the following meaning:</p> + <taglist> <tag><c>c [nn]</c></tag> <item> <p>Connects to job number <c><![CDATA[<nn>]]></c> or the current - job. The standard shell is resumed. Operations which use - standard IO by the current job will be interleaved with - user inputs to the shell. - </p> + job. The standard shell is resumed. Operations that use + standard I/O by the current job are interleaved with + user inputs to the shell.</p> </item> <tag><c>i [nn]</c></tag> <item> <p>Stops the current evaluator process for job number <c>nn</c> or the current job, but does not kill the shell - process. Accordingly, any variable bindings and the process dictionary - will be preserved and the job can be connected again. - This command can be used to interrupt an endless loop. - </p> + process. So, any variable bindings and the process + dictionary are preserved and the job can be connected again. + This command can be used to interrupt an endless loop.</p> </item> <tag><c>k [nn]</c></tag> <item> @@ -609,135 +674,166 @@ loop(N) -> job. All spawned processes in the job are killed, provided they have not evaluated the <c>group_leader/1</c> BIF and are located on - the local machine. Processes spawned on remote nodes will - not be killed. - </p> + the local machine. Processes spawned on remote nodes + are not killed.</p> </item> <tag><c>j</c></tag> <item> <p>Lists all jobs. A list of all known jobs is - printed. The current job name is prefixed with '*'. - </p> + printed. The current job name is prefixed with '*'.</p> </item> <tag><c>s</c></tag> <item> - <p>Starts a new job. This will be assigned the new index - <c>[nn]</c> which can be used in references. - </p> + <p>Starts a new job. This is assigned the new index + <c>[nn]</c>, which can be used in references.</p> </item> <tag><c>s [shell]</c></tag> <item> - <p>Starts a new job. This will be assigned the new index - <c>[nn]</c> which can be used in references. 
- If the optional argument <c>shell</c> is given, it is assumed - to be a module that implements an alternative shell. - </p> + <p>Starts a new job. This is assigned the new index + <c>[nn]</c>, which can be used in references. + If optional argument <c>shell</c> is specified, it is assumed + to be a module that implements an alternative shell.</p> </item> <tag><c>r [node]</c></tag> <item> <p>Starts a remote job on <c>node</c>. This is used in distributed Erlang to allow a shell running on one node to - control a number of applications running on a network of - nodes. - If the optional argument <c>shell</c> is given, it is assumed - to be a module that implements an alternative shell. - </p> + control a number of applications running on a network of nodes. + If optional argument <c>shell</c> is specified, it is assumed + to be a module that implements an alternative shell.</p> </item> <tag><c>q</c></tag> <item> - <p>Quits Erlang. Note that this option is disabled if - Erlang is started with the ignore break, <c>+Bi</c>, - system flag (which may be useful e.g. when running - a restricted shell, see below). - </p> + <p>Quits Erlang. Notice that this option is disabled if + Erlang is started with the ignore break, <c>+Bi</c>, + system flag (which can be useful, for example when running + a restricted shell, see the next section).</p> </item> <tag><c>?</c></tag> <item> - <p>Displays this message.</p> + <p>Displays the help message above.</p> </item> </taglist> - <p>It is possible to alter the behavior of shell escape by means - of the STDLIB application variable <c>shell_esc</c>. The value of + + <p>The behavior of shell escape can be changed by the STDLIB + application variable <c>shell_esc</c>. The value of the variable can be either <c>jcl</c> (<c>erl -stdlib shell_esc jcl</c>) or <c>abort</c> (<c>erl -stdlib shell_esc abort</c>). The - first option sets ^G to activate <c>JCL</c> mode (which is also - default behavior). 
The latter sets ^G to terminate the current - shell and start a new one. <c>JCL</c> mode cannot be invoked when - <c>shell_esc</c> is set to <c>abort</c>. </p> - <p>If you want an Erlang node to have a remote job active from the start - (rather than the default local job), you start Erlang with the - <c>-remsh</c> flag. Example: <c>erl -sname this_node -remsh other_node@other_host</c></p> + first option sets <c>^G</c> to activate <c>JCL</c> mode (which + is also default behavior). The latter sets <c>^G</c> to + terminate the current shell and start a new one. + <c>JCL</c> mode cannot be invoked when + <c>shell_esc</c> is set to <c>abort</c>.</p> + + <p>If you want an Erlang node to have a remote job active from the start + (rather than the default local job), start Erlang with flag + <c>-remsh</c>, for example, + <c>erl -sname this_node -remsh other_node@other_host</c></p> </section> <section> <title>Restricted Shell</title> - <p>The shell may be started in a + <p>The shell can be started in a restricted mode. In this mode, the shell evaluates a function call only if allowed. This feature makes it possible to, for example, prevent a user from accidentally calling a function from the prompt that could harm a running system (useful in combination - with the the system flag <em><c>+Bi</c></em>).</p> + with system flag <c>+Bi</c>).</p> + <p>When the restricted shell evaluates an expression and - encounters a function call or an operator application, + encounters a function call or an operator application, it calls a callback function (with information about the function call in question). This callback function returns <c>true</c> to let the shell go ahead with the evaluation, or <c>false</c> to abort it. 
There are two possible callback functions for the user to implement:</p> - <p><em><c>local_allowed(Func, ArgList, State) -> {true,NewState} | {false,NewState}</c></em></p> - <p>to determine if the call to the local function <c>Func</c> - with arguments <c>ArgList</c> should be allowed.</p> - <p><em><c>non_local_allowed(FuncSpec, ArgList, State) -> {true,NewState} | {false,NewState} | {{redirect,NewFuncSpec,NewArgList},NewState}</c></em></p> - <p>to determine if the call to non-local function - <c>FuncSpec</c> (<c>{Module,Func}</c> or a fun) with arguments - <c>ArgList</c> should be allowed. The return value - <c>{redirect,NewFuncSpec,NewArgList}</c> can be used to let - the shell evaluate some other function than the one specified by - <c>FuncSpec</c> and <c>ArgList</c>.</p> - <p>These callback functions are in fact called from local and + + <list type="bulleted"> + <item> + <p><c>local_allowed(Func, ArgList, State) -> {boolean(),NewState}</c></p> + <p>This is used to determine if the call to the local function + <c>Func</c> with arguments <c>ArgList</c> is to be allowed.</p> + </item> + <item> + <p><c>non_local_allowed(FuncSpec, ArgList, State) + -> {boolean(),NewState} + | {{redirect,NewFuncSpec,NewArgList},NewState}</c></p> + <p>This is used to determine if the call to non-local function + <c>FuncSpec</c> (<c>{Module,Func}</c> or a fun) with arguments + <c>ArgList</c> is to be allowed. The return value + <c>{redirect,NewFuncSpec,NewArgList}</c> can be used to let + the shell evaluate some other function than the one specified by + <c>FuncSpec</c> and <c>ArgList</c>.</p> + </item> + </list> + + <p>These callback functions are called from local and non-local evaluation function handlers, described in the - <seealso marker="erl_eval">erl_eval</seealso> + <seealso marker="erl_eval"><c>erl_eval</c></seealso> manual page. 
(Arguments in <c>ArgList</c> are evaluated before the callback functions are called.)</p> - <p>The <c>State</c> argument is a tuple + + <p>Argument <c>State</c> is a tuple <c>{ShellState,ExprState}</c>. The return value <c>NewState</c> - has the same form. This may be used to carry a state between calls + has the same form. This can be used to carry a state between calls to the callback functions. Data saved in <c>ShellState</c> lives through an entire shell session. Data saved in <c>ExprState</c> lives only through the evaluation of the current expression.</p> + <p>There are two ways to start a restricted shell session:</p> + <list type="bulleted"> - <item>Use the STDLIB application variable <c>restricted_shell</c> - and specify, as its value, the name of the callback - module. Example (with callback functions implemented in - callback_mod.erl): <c>$ erl -stdlib restricted_shell callback_mod</c></item> - <item>From a normal shell session, call function - <c>shell:start_restricted/1</c>. This exits the current evaluator - and starts a new one in restricted mode.</item> + <item> + <p>Use STDLIB application variable <c>restricted_shell</c> + and specify, as its value, the name of the callback + module. Example (with callback functions implemented in + <c>callback_mod.erl</c>): + <c>$ erl -stdlib restricted_shell callback_mod</c>.</p> + </item> + <item> + <p>From a normal shell session, call function + <seealso marker="#start_restricted/1"> + <c>start_restricted/1</c></seealso>. 
This exits the current evaluator + and starts a new one in restricted mode.</p> + </item> </list> + <p><em>Notes:</em></p> <list type="bulleted"> - <item>When restricted shell mode is activated or - deactivated, new jobs started on the node will run in restricted - or normal mode respectively.</item> - <item>If restricted mode has been enabled on a - particular node, remote shells connecting to this node will also - run in restricted mode.</item> - <item>The callback functions cannot be used to allow or disallow - execution of functions called from compiled code (only functions - called from expressions entered at the shell prompt).</item> + <item> + <p>When restricted shell mode is activated or + deactivated, new jobs started on the node run in restricted + or normal mode, respectively.</p> + </item> + <item> + <p>If restricted mode has been enabled on a + particular node, remote shells connecting to this node also + run in restricted mode.</p> + </item> + <item> + <p>The callback functions cannot be used to allow or disallow + execution of functions called from compiled code (only functions + called from expressions entered at the shell prompt).</p> + </item> </list> + <p>Errors when loading the callback module is handled in different ways depending on how the restricted shell is activated:</p> + <list type="bulleted"> - <item>If the restricted shell is activated by setting the kernel - variable during emulator startup and the callback module cannot be - loaded, a default restricted shell allowing only the commands - <c>q()</c> and <c>init:stop()</c> is used as fallback.</item> - <item>If the restricted shell is activated using - <c>shell:start_restricted/1</c> and the callback module cannot be - loaded, an error report is sent to the error logger and the call - returns <c>{error,Reason}</c>.</item> + <item> + <p>If the restricted shell is activated by setting the STDLIB + variable during emulator startup, and the callback module cannot be + loaded, a default 
restricted shell allowing only the commands + <c>q()</c> and <c>init:stop()</c> is used as fallback.</p> + </item> + <item> + <p>If the restricted shell is activated using + <seealso marker="#start_restricted/1"> + <c>start_restricted/1</c></seealso> and the callback module cannot + be loaded, an error report is sent to the error logger and the call + returns <c>{error,Reason}</c>.</p> + </item> </list> </section> @@ -746,44 +842,27 @@ loop(N) -> <p>The default shell prompt function displays the name of the node (if the node can be part of a distributed system) and the current command number. The user can customize the prompt - function by calling - <c>shell:prompt_func/1</c> or by setting the application + function by calling <seealso marker="#prompt_func/1"> + <c>prompt_func/1</c></seealso> or by setting application configuration parameter <c>shell_prompt_func</c> for the - application STDLIB.</p> + STDLIB application.</p> + <p>A customized prompt function is stated as a tuple <c>{Mod, Func}</c>. The function is called as <c>Mod:Func(L)</c>, where <c>L</c> is a list of key-value pairs created by the shell. Currently there is only one pair: - <c>{history, N}</c>, where N is the current command number. The - function should return a list of characters or an atom. This - constraint is due to the Erlang I/O-protocol. Unicode characters - beyond codepoint 255 are allowed in the list. Note + <c>{history, N}</c>, where <c>N</c> is the current command number. The + function is to return a list of characters or an atom. This + constraint is because of the Erlang I/O protocol. Unicode characters + beyond code point 255 are allowed in the list. 
Notice that in restricted mode the call <c>Mod:Func(L)</c> must be - allowed or the default shell prompt function will be called.</p> - </section> + allowed or the default shell prompt function is called.</p> + </section> <funcs> <func> - <name name="history" arity="1"/> - <fsummary>Sets the number of previous commands to keep</fsummary> - <desc> - <p>Sets the number of previous commands to keep in the - history list to <c><anno>N</anno></c>. The previous number is returned. - The default number is 20.</p> - </desc> - </func> - <func> - <name name="results" arity="1"/> - <fsummary>Sets the number of previous results to keep</fsummary> - <desc> - <p>Sets the number of results from previous commands to keep in - the history list to <c><anno>N</anno></c>. The previous number is returned. - The default number is 20.</p> - </desc> - </func> - <func> <name>catch_exception(Bool) -> boolean()</name> - <fsummary>Sets the exception handling of the shell</fsummary> + <fsummary>Set the exception handling of the shell.</fsummary> <type> <v>Bool = boolean()</v> </type> @@ -793,52 +872,76 @@ loop(N) -> (<c>false</c>) is to kill the evaluator process when an exception occurs, which causes the shell to create a new evaluator process. When the exception handling is set to - <c>true</c> the evaluator process lives on which means that - for instance ports and ETS tables as well as processes + <c>true</c>, the evaluator process lives on, which means that, + for example, ports and ETS tables as well as processes linked to the evaluator process survive the exception.</p> </desc> </func> + + <func> + <name name="history" arity="1"/> + <fsummary>Set the number of previous commands to keep.</fsummary> + <desc> + <p>Sets the number of previous commands to keep in the + history list to <c><anno>N</anno></c>. The previous number is + returned. 
Defaults to 20.</p> + </desc> + </func> + <func> <name name="prompt_func" arity="1"/> - <fsummary>Sets the shell prompt</fsummary> + <fsummary>Set the shell prompt.</fsummary> <desc> <p>Sets the shell prompt function to <c><anno>PromptFunc</anno></c>. The previous prompt function is returned.</p> </desc> </func> + + <func> + <name name="results" arity="1"/> + <fsummary>Set the number of previous results to keep.</fsummary> + <desc> + <p>Sets the number of results from previous commands to keep in + the history list to <c><anno>N</anno></c>. The previous number is + returned. Defaults to 20.</p> + </desc> + </func> + <func> <name name="start_restricted" arity="1"/> - <fsummary>Exits a normal shell and starts a restricted shell.</fsummary> + <fsummary>Exit a normal shell and starts a restricted shell.</fsummary> <desc> - <p>Exits a normal shell and starts a restricted - shell. <c><anno>Module</anno></c> specifies the callback module for the + <p>Exits a normal shell and starts a restricted shell. + <c><anno>Module</anno></c> specifies the callback module for the functions <c>local_allowed/3</c> and <c>non_local_allowed/3</c>. The function is meant to be called from the shell.</p> <p>If the callback module cannot be loaded, an error tuple is returned. The <c><anno>Reason</anno></c> in the error tuple is the one - returned by the code loader when trying to load the code of the callback - module.</p> + returned by the code loader when trying to load the code of the + callback module.</p> </desc> </func> + <func> <name name="stop_restricted" arity="0"/> - <fsummary>Exits a restricted shell and starts a normal shell.</fsummary> + <fsummary>Exit a restricted shell and starts a normal shell.</fsummary> <desc> <p>Exits a restricted shell and starts a normal shell. 
The function is meant to be called from the shell.</p> </desc> </func> + <func> <name name="strings" arity="1"/> - <fsummary>Sets the shell's string recognition flag.</fsummary> + <fsummary>Set the shell's string recognition flag.</fsummary> <desc> <p>Sets pretty printing of lists to <c><anno>Strings</anno></c>. The previous value of the flag is returned.</p> <p>The flag can also be set by the STDLIB application variable - <c>shell_strings</c>. The default is - <c>true</c> which means that lists of integers will be - printed using the string syntax, when possible. The value - <c>false</c> means that no lists will be printed using the + <c>shell_strings</c>. Defaults to + <c>true</c>, which means that lists of integers are + printed using the string syntax, when possible. Value + <c>false</c> means that no lists are printed using the string syntax.</p> </desc> </func> diff --git a/lib/stdlib/doc/src/shell_default.xml b/lib/stdlib/doc/src/shell_default.xml index 4a90b7d7cc..81c99bce10 100644 --- a/lib/stdlib/doc/src/shell_default.xml +++ b/lib/stdlib/doc/src/shell_default.xml @@ -32,25 +32,27 @@ <checked>Joe Armstrong</checked> <date>1996-09-09</date> <rev>A</rev> - <file>shell_default.sgml</file> + <file>shell_default.xml</file> </header> <module>shell_default</module> - <modulesummary>Customizing the Erlang Environment</modulesummary> + <modulesummary>Customizing the Erlang environment.</modulesummary> <description> - <p>The functions in <c>shell_default</c> are called when no module - name is given in a shell command. - </p> - <p>Consider the following shell dialogue:</p> + <p>The functions in this module are called when no module name is + specified in a shell command.</p> + + <p>Consider the following shell dialog:</p> + <pre> -1 > <input>lists:reverse("abc").</input> +1> <input>lists:reverse("abc").</input> "cba" -2 > <input>c(foo).</input> -{ok, foo} </pre> - <p>In command one, the module <c>lists</c> is called. In command - two, no module name is specified. 
The shell searches the modules - <c>user_default</c> followed by <c>shell_default</c> for the - function <c>foo/1</c>. - </p> +2> <input>c(foo).</input> +{ok, foo}</pre> + + <p>In command one, module <seealso marker="lists"><c>lists</c></seealso> is + called. In command two, no module name is specified. The shell searches + module <c>user_default</c> followed by module <c>shell_default</c> for + function <c>foo/1</c>.</p> + <p><c>shell_default</c> is intended for "system wide" customizations to the shell. <c>user_default</c> is intended for "local" or individual user customizations.</p> @@ -60,10 +62,12 @@ <title>Hint</title> <p>To add your own commands to the shell, create a module called <c>user_default</c> and add the commands you want. Then add the - following line as the <em>first</em> line in your <c>.erlang</c> file in your - home directory. </p> + following line as the <em>first</em> line in your <c>.erlang</c> file in + your home directory.</p> + <pre> -code:load_abs("$PATH/user_default"). </pre> +code:load_abs("$PATH/user_default").</pre> + <p><c>$PATH</c> is the directory where your <c>user_default</c> module can be found.</p> </section> diff --git a/lib/stdlib/doc/src/slave.xml b/lib/stdlib/doc/src/slave.xml index 244822568b..e53ec8231b 100644 --- a/lib/stdlib/doc/src/slave.xml +++ b/lib/stdlib/doc/src/slave.xml @@ -29,89 +29,139 @@ <rev></rev> </header> <module>slave</module> - <modulesummary>Functions to Starting and Controlling Slave Nodes</modulesummary> + <modulesummary>Functions for starting and controlling slave nodes. + </modulesummary> <description> <p>This module provides functions for starting Erlang slave nodes. - All slave nodes which are started by a master will terminate - automatically when the master terminates. All TTY output produced - at the slave will be sent back to the master node. File I/O is - done via the master.</p> + All slave nodes that are started by a master terminate + automatically when the master terminates. 
All terminal output produced + at the slave is sent back to the master node. File I/O is + done through the master.</p> + <p>Slave nodes on other hosts than the current one are started with - the program <c>rsh</c>. The user must be allowed to <c>rsh</c> to + the <c>rsh</c> program. The user must be allowed to <c>rsh</c> to the remote hosts without being prompted for a password. This can - be arranged in a number of ways (refer to the <c>rsh</c> - documentation for details). A slave node started on the same host + be arranged in a number of ways (for details, see the <c>rsh</c> + documentation). A slave node started on the same host as the master inherits certain environment values from the master, such as the current directory and the environment variables. For what can be assumed about the environment when a slave is started - on another host, read the documentation for the <c>rsh</c> + on another host, see the documentation for the <c>rsh</c> program.</p> + <p>An alternative to the <c>rsh</c> program can be specified on - the command line to <c>erl</c> as follows: <c>-rsh Program</c>.</p> - <p>The slave node should use the same file system at the master. At - least, Erlang/OTP should be installed in the same place on both - computers and the same version of Erlang should be used.</p> - <p>Currently, a node running on Windows NT can only start slave + the command line to + <seealso marker="erts:erl#erl"><c>erl(1)</c></seealso> as follows:</p> + + <pre> +-rsh Program</pre> + + <p>The slave node is to use the same file system at the master. 
At + least, Erlang/OTP is to be installed in the same place on both + computers and the same version of Erlang is to be used.</p> + + <p>A node running on Windows can only start slave nodes on the host on which it is running.</p> + <p>The master node must be alive.</p> </description> + <funcs> <func> + <name>pseudo([Master | ServerList]) -> ok</name> + <fsummary>Start a number of pseudo servers.</fsummary> + <type> + <v>Master = node()</v> + <v>ServerList = [atom()]</v> + </type> + <desc> + <p>Calls <c>pseudo(Master, ServerList)</c>. If you want to start + a node from the command line and set up a number of pseudo + servers, an Erlang runtime system can be started as follows:</p> + <pre> +% erl -name abc -s slave pseudo klacke@super x --</pre> + </desc> + </func> + + <func> + <name name="pseudo" arity="2"/> + <fsummary>Start a number of pseudo servers.</fsummary> + <desc> + <p>Starts a number of pseudo servers. A pseudo server is a + server with a registered name that does nothing + but pass on all message to the real server that executes at a + master node. A pseudo server is an intermediary that only has + the same registered name as the real server.</p> + <p>For example, if you have started a slave node <c>N</c> and + want to execute <c>pxw</c> graphics code on this node, you can + start server <c>pxw_server</c> as a pseudo server at + the slave node. This is illustrated as follows:</p> + <code type="none"> +rpc:call(N, slave, pseudo, [node(), [pxw_server]]).</code> + </desc> + </func> + + <func> + <name name="relay" arity="1"/> + <fsummary>Run a pseudo server.</fsummary> + <desc> + <p>Runs a pseudo server. This function never returns any value + and the process that executes the function receives + messages. 
All messages received are simply passed on to + <c><anno>Pid</anno></c>.</p> + </desc> + </func> + + <func> <name name="start" arity="1"/> <name name="start" arity="2"/> <name name="start" arity="3"/> - <fsummary>Start a slave node on a host</fsummary> + <fsummary>Start a slave node on a host.</fsummary> <desc> - <p>Starts a slave node on the host <c><anno>Host</anno></c>. Host names need - not necessarily be specified as fully qualified names; short + <p>Starts a slave node on host <c><anno>Host</anno></c>. Host names + need not necessarily be specified as fully qualified names; short names can also be used. This is the same condition that applies to names of distributed Erlang nodes.</p> - <p>The name of the started node will be <c><anno>Name</anno>@<anno>Host</anno></c>. If no - name is provided, the name will be the same as the node which - executes the call (with the exception of the host name part of - the node name).</p> + <p>The name of the started node becomes + <c><anno>Name</anno>@<anno>Host</anno></c>. If no + name is provided, the name becomes the same as the node that + executes the call (except the host name part of the node name).</p> <p>The slave node resets its <c>user</c> process so that all - terminal I/O which is produced at the slave is automatically - relayed to the master. Also, the file process will be relayed + terminal I/O that is produced at the slave is automatically + relayed to the master. Also, the file process is relayed to the master.</p> - <p>The <c><anno>Args</anno></c> argument is used to set <c>erl</c> command - line arguments. If provided, it is passed to the new node and - can be used for a variety of purposes. See - <seealso marker="erts:erl#erl">erl(1)</seealso></p> - <p>As an example, suppose that we want to start a slave node at - host <c>H</c> with the node name <c>Name@H</c>, and we also + <p>Argument <c><anno>Args</anno></c> is used to set <c>erl</c> + command-line arguments. 
If provided, it is passed to the new + node and can be used for a variety of purposes; see + <seealso marker="erts:erl#erl"><c>erl(1)</c></seealso>.</p> + <p>As an example, suppose that you want to start a slave node at + host <c>H</c> with node name <c>Name@H</c> and want the slave node to have the following properties:</p> <list type="bulleted"> - <item> - <p>directory <c>Dir</c> should be added to the code path;</p> - </item> - <item> - <p>the Mnesia directory should be set to <c>M</c>;</p> - </item> - <item> - <p>the unix <c>DISPLAY</c> environment variable should be - set to the display of the master node.</p> - </item> + <item>Directory <c>Dir</c> is to be added to the code path.</item> + <item>The Mnesia directory is to be set to <c>M</c>.</item> + <item>The Unix <c>DISPLAY</c> environment variable is to be + set to the display of the master node.</item> </list> <p>The following code is executed to achieve this:</p> <code type="none"> E = " -env DISPLAY " ++ net_adm:localhost() ++ ":0 ", Arg = "-mnesia_dir " ++ M ++ " -pa " ++ Dir ++ E, slave:start(H, Name, Arg).</code> - <p>If successful, the function returns <c>{ok, <anno>Node</anno>}</c>, - where <c><anno>Node</anno></c> is the name of the new node. Otherwise it - returns <c>{error, <anno>Reason</anno>}</c>, where <c><anno>Reason</anno></c> can be - one of:</p> + <p>The function returns <c>{ok, <anno>Node</anno>}</c>, where + <c><anno>Node</anno></c> is the name of the new node, otherwise + <c>{error, <anno>Reason</anno>}</c>, where <c><anno>Reason</anno></c> + can be one of:</p> <taglist> <tag><c>timeout</c></tag> <item> <p>The master node failed to get in contact with the slave - node. This can happen in a number of circumstances:</p> + node. 
This can occur in a number of circumstances:</p> <list type="bulleted"> - <item>Erlang/OTP is not installed on the remote host</item> - <item>the file system on the other host has a different - structure to the the master</item> - <item>the Erlang nodes have different cookies.</item> + <item>Erlang/OTP is not installed on the remote host.</item> + <item>The file system on the other host has a different + structure to the the master.</item> + <item>The Erlang nodes have different cookies.</item> </list> </item> <tag><c>no_rsh</c></tag> @@ -120,75 +170,35 @@ slave:start(H, Name, Arg).</code> </item> <tag><c>{already_running, <anno>Node</anno>}</c></tag> <item> - <p>A node with the name <c><anno>Name</anno>@<anno>Host</anno></c> already exists.</p> + <p>A node with name <c><anno>Name</anno>@<anno>Host</anno></c> + already exists.</p> </item> </taglist> </desc> </func> + <func> <name name="start_link" arity="1"/> <name name="start_link" arity="2"/> <name name="start_link" arity="3"/> - <fsummary>Start and link to a slave node on a host</fsummary> + <fsummary>Start and link to a slave node on a host.</fsummary> <desc> <p>Starts a slave node in the same way as <c>start/1,2,3</c>, except that the slave node is linked to the currently executing process. If that process terminates, the slave node also terminates.</p> - <p>See <c>start/1,2,3</c> for a description of arguments and - return values.</p> + <p>For a description of arguments and return values, see + <seealso marker="#start/1"><c>start/1,2,3</c></seealso>.</p> </desc> </func> + <func> <name name="stop" arity="1"/> - <fsummary>Stop (kill) a node</fsummary> + <fsummary>Stop (kill) a node.</fsummary> <desc> <p>Stops (kills) a node.</p> </desc> </func> - <func> - <name>pseudo([Master | ServerList]) -> ok</name> - <fsummary>Start a number of pseudo servers</fsummary> - <type> - <v>Master = node()</v> - <v>ServerList = [atom()]</v> - </type> - <desc> - <p>Calls <c>pseudo(Master, ServerList)</c>. 
If we want to start - a node from the command line and set up a number of pseudo - servers, an Erlang runtime system can be started as - follows:</p> - <pre> -% erl -name abc -s slave pseudo klacke@super x --</pre> - </desc> - </func> - <func> - <name name="pseudo" arity="2"/> - <fsummary>Start a number of pseudo servers</fsummary> - <desc> - <p>Starts a number of pseudo servers. A pseudo server is a - server with a registered name which does absolutely nothing - but pass on all message to the real server which executes at a - master node. A pseudo server is an intermediary which only has - the same registered name as the real server.</p> - <p>For example, if we have started a slave node <c>N</c> and - want to execute <c>pxw</c> graphics code on this node, we can - start the server <c>pxw_server</c> as a pseudo server at - the slave node. The following code illustrates:</p> - <code type="none"> -rpc:call(N, slave, pseudo, [node(), [pxw_server]]).</code> - </desc> - </func> - <func> - <name name="relay" arity="1"/> - <fsummary>Run a pseudo server</fsummary> - <desc> - <p>Runs a pseudo server. This function never returns any value - and the process which executes the function will receive - messages. 
All messages received will simply be passed on to - <c><anno>Pid</anno></c>.</p> - </desc> - </func> </funcs> </erlref> diff --git a/lib/stdlib/doc/src/sofs.xml b/lib/stdlib/doc/src/sofs.xml index cf0855bc85..4cf1984d46 100644 --- a/lib/stdlib/doc/src/sofs.xml +++ b/lib/stdlib/doc/src/sofs.xml @@ -24,260 +24,284 @@ <title>sofs</title> <prepared>Hans Bolinder</prepared> - <responsible>nobody</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2001-08-25</date> <rev>PA1</rev> - <file>sofs.sgml</file> + <file>sofs.xml</file> </header> <module>sofs</module> - <modulesummary>Functions for Manipulating Sets of Sets</modulesummary> + <modulesummary>Functions for manipulating sets of sets.</modulesummary> <description> - <p>The <c>sofs</c> module implements operations on finite sets and + <p>This module provides operations on finite sets and relations represented as sets. Intuitively, a set is a collection of elements; every element belongs to the set, and the set contains every element.</p> + <p>Given a set A and a sentence S(x), where x is a free variable, a new set B whose elements are exactly those elements of A for which S(x) holds can be formed, this is denoted B = {x in A : S(x)}. Sentences are expressed using the logical operators "for some" (or "there exists"), "for all", "and", "or", "not". If the existence of a set containing all the - specified elements is known (as will always be the case in this - module), we write B = {x : S(x)}. </p> - <p>The <em>unordered set</em> containing the elements a, b and c - is denoted {a, b, c}. This notation is not to be - confused with tuples. The <em>ordered pair</em> of a and b, with - first <em>coordinate</em> a and second coordinate b, is denoted - (a, b). An ordered pair is an <em>ordered set</em> of two - elements. 
In this module ordered sets can contain one, two or - more elements, and parentheses are used to enclose the elements. - Unordered sets and ordered sets are orthogonal, again in this - module; there is no unordered set equal to any ordered set.</p> - <p>The set that contains no elements is called the <em>empty set</em>. - If two sets A and B contain the same elements, then A - is <marker id="equal"></marker><em>equal</em> to B, denoted - A = B. Two ordered sets are equal if they contain the - same number of elements and have equal elements at each - coordinate. If a set A contains all elements that B contains, - then B is a <marker id="subset"></marker><em>subset</em> of A. - The <marker id="union"></marker><em>union</em> of two sets A and B is - the smallest set that contains all elements of A and all elements of - B. The <marker id="intersection"></marker><em>intersection</em> of two - sets A and B is the set that contains all elements of A that - belong to B. - Two sets are <marker id="disjoint"></marker><em>disjoint</em> if their - intersection is the empty set. - The <marker id="difference"></marker><em>difference</em> of - two sets A and B is the set that contains all elements of A that - do not belong to B. - The <marker id="symmetric_difference"></marker><em>symmetric - difference</em> of - two sets is the set that contains those element that belong to - either of the two sets, but not both. - The <marker id="union_n"></marker><em>union</em> of a collection - of sets is the smallest set that contains all the elements that - belong to at least one set of the collection. - The <marker id="intersection_n"></marker><em>intersection</em> of - a non-empty collection of sets is the set that contains all elements - that belong to every set of the collection.</p> - <p>The <marker id="Cartesian_product"></marker><em>Cartesian - product</em> of - two sets X and Y, denoted X × Y, is the set - {a : a = (x, y) for some x in X and for - some y in Y}. 
- A <marker id="relation"></marker><em>relation</em> is a subset of - X × Y. Let R be a relation. The fact that - (x, y) belongs to R is written as x R y. Since - relations are sets, the definitions of the last paragraph - (subset, union, and so on) apply to relations as well. - The <marker id="domain"></marker><em>domain</em> of R is the - set {x : x R y for some y in Y}. - The <marker id="range"></marker><em>range</em> of R is the - set {y : x R y for some x in X}. - The <marker id="converse"></marker><em>converse</em> of R is the - set {a : a = (y, x) for some - (x, y) in R}. If A is a subset of X, then - the <marker id="image"></marker><em>image</em> of - A under R is the set {y : x R y for some - x in A}, and if B is a subset of Y, then - the <marker id="inverse_image"></marker><em>inverse image</em> of B is - the set {x : x R y for some y in B}. If R is a - relation from X to Y and S is a relation from Y to Z, then - the <marker id="relative_product"></marker><em>relative product</em> of - R and S is the relation T from X to Z defined so that x T z - if and only if there exists an element y in Y such that - x R y and y S z. - The <marker id="restriction"></marker><em>restriction</em> of R to A is - the set S defined so that x S y if and only if there exists an - element x in A such that x R y. If S is a restriction - of R to A, then R is - an <marker id="extension"></marker><em>extension</em> of S to X. - If X = Y then we call R a relation <em>in</em> X. - The <marker id="field"></marker><em>field</em> of a relation R in X - is the union of the domain of R and the range of R. - If R is a relation in X, and - if S is defined so that x S y if x R y and - not x = y, then S is - the <marker id="strict_relation"></marker><em>strict</em> relation - corresponding to - R, and vice versa, if S is a relation in X, and if R is defined - so that x R y if x S y or x = y, - then R is the <marker id="weak_relation"></marker><em>weak</em> relation - corresponding to S. 
A relation R in X is <em>reflexive</em> if - x R x for every element x of X; it is - <em>symmetric</em> if x R y implies that - y R x; and it is <em>transitive</em> if - x R y and y R z imply that x R z.</p> - <p>A <marker id="function"></marker><em>function</em> F is a relation, a - subset of X × Y, such that the domain of F is - equal to X and such that for every x in X there is a unique - element y in Y with (x, y) in F. The latter condition can - be formulated as follows: if x F y and x F z - then y = z. In this module, it will not be required - that the domain of F be equal to X for a relation to be - considered a function. Instead of writing - (x, y) in F or x F y, we write - F(x) = y when F is a function, and say that F maps x - onto y, or that the value of F at x is y. Since functions are - relations, the definitions of the last paragraph (domain, range, - and so on) apply to functions as well. If the converse of a - function F is a function F', then F' is called - the <marker id="inverse"></marker><em>inverse</em> of F. - The relative product of two functions F1 and F2 is called - the <marker id="composite"></marker><em>composite</em> of F1 and F2 - if the range of F1 is a subset of the domain of F2. </p> - <p>Sometimes, when the range of a function is more important than - the function itself, the function is called a <em>family</em>. - The domain of a family is called the <em>index set</em>, and the - range is called the <em>indexed set</em>. If x is a family from - I to X, then x[i] denotes the value of the function at index i. - The notation "a family in X" is used for such a family. When the - indexed set is a set of subsets of a set X, then we call x - a <marker id="family"></marker><em>family of subsets</em> of X. If x - is a family of subsets of X, then the union of the range of x is - called the <em>union of the family</em> x. 
If x is non-empty - (the index set is non-empty), - the <em>intersection of the family</em> x is the intersection of - the range of x. In this - module, the only families that will be considered are families - of subsets of some set X; in the following the word "family" - will be used for such families of subsets.</p> - <p>A <marker id="partition"></marker><em>partition</em> of a set X is a - collection S of non-empty subsets of X whose union is X and - whose elements are pairwise disjoint. A relation in a set is an - <em>equivalence relation</em> if it is reflexive, symmetric and - transitive. If R is an equivalence relation in X, and x is an - element of X, - the <marker id="equivalence_class"></marker><em>equivalence - class</em> of x with respect to R is the set of all those - elements y of X for which x R y holds. The equivalence - classes constitute a partitioning of X. Conversely, if C is a - partition of X, then the relation that holds for any two - elements of X if they belong to the same equivalence class, is - an equivalence relation induced by the partition C. If R is an - equivalence relation in X, then - the <marker id="canonical_map"></marker><em>canonical map</em> is - the function that maps every element of X onto its equivalence class. - </p> - <p><marker id="binary_relation"></marker>Relations as defined above - (as sets of ordered pairs) will from now on be referred to as - <em>binary relations</em>. We call a set of ordered sets - (x[1], ..., x[n]) an <marker id="n_ary_relation"></marker> - <em>(n-ary) relation</em>, and say that the relation is a subset of - the <marker id="Cartesian_product_tuple"></marker>Cartesian product - X[1] × ... × X[n] where x[i] is - an element of X[i], 1 <= i <= n. - The <marker id="projection"></marker><em>projection</em> of an n-ary - relation R onto coordinate i is the set {x[i] : - (x[1], ..., x[i], ..., x[n]) in R for some - x[j] in X[j], 1 <= j <= n - and not i = j}. 
The projections of a binary relation R - onto the first and second coordinates are the domain and the - range of R respectively. The relative product of binary - relations can be generalized to n-ary relations as follows. Let - TR be an ordered set (R[1], ..., R[n]) of binary - relations from X to Y[i] and S a binary relation from - (Y[1] × ... × Y[n]) to Z. - The <marker id="tuple_relative_product"></marker><em>relative - product</em> of - TR and S is the binary relation T from X to Z defined so that - x T z if and only if there exists an element y[i] in - Y[i] for each 1 <= i <= n such that - x R[i] y[i] and - (y[1], ..., y[n]) S z. Now let TR be a an - ordered set (R[1], ..., R[n]) of binary relations from - X[i] to Y[i] and S a subset of - X[1] × ... × X[n]. - The <marker id="multiple_relative_product"></marker><em>multiple - relative product</em> of TR and S is defined to be the - set {z : z = ((x[1], ..., x[n]), (y[1],...,y[n])) - for some (x[1], ..., x[n]) in S and for some - (x[i], y[i]) in R[i], - 1 <= i <= n}. - The <marker id="natural_join"></marker><em>natural join</em> of - an n-ary relation R - and an m-ary relation S on coordinate i and j is defined to be - the set {z : z = (x[1], ..., x[n], - y[1], ..., y[j-1], y[j+1], ..., y[m]) - for some (x[1], ..., x[n]) in R and for some - (y[1], ..., y[m]) in S such that - x[i] = y[j]}.</p> - <p><marker id="sets_definition"></marker>The sets recognized by this - module will be represented by elements of the relation Sets, defined as - the smallest set such that:</p> + specified elements is known (as is always the case in this + module), this is denoted B = {x : S(x)}.</p> + <list type="bulleted"> - <item>for every atom T except '_' and for every term X, - (T, X) belongs to Sets (<em>atomic sets</em>); + <item> + <p>The <em>unordered set</em> containing the elements a, b, and c is + denoted {a, b, c}. 
This notation is not to be confused with + tuples.</p> + <p>The <em>ordered pair</em> of a and b, with first <em>coordinate</em> + a and second coordinate b, is denoted (a, b). An ordered pair + is an <em>ordered set</em> of two elements. In this module, ordered + sets can contain one, two, or more elements, and parentheses are + used to enclose the elements.</p> + <p>Unordered sets and ordered sets are orthogonal, again in this + module; there is no unordered set equal to any ordered set.</p> </item> - <item>(['_'], []) belongs to Sets (the <em>untyped empty set</em>); + <item> + <p>The <em>empty set</em> contains no elements.</p> + <p>Set A is <marker id="equal"></marker><em>equal</em> to set B if they + contain the same elements, which is denoted A = B. Two + ordered sets are equal if they contain the same number of elements + and have equal elements at each coordinate.</p> + <p>Set B is a <marker id="subset"></marker><em>subset</em> of set A + if A contains all elements that B contains.</p> + <p>The <marker id="union"></marker><em>union</em> of two sets A and B + is the smallest set that contains all elements of A and all elements + of B.</p> + <p>The <marker id="intersection"></marker><em>intersection</em> of two + sets A and B is the set that contains all elements of A that belong + to B.</p> + <p>Two sets are <marker id="disjoint"></marker><em>disjoint</em> if + their intersection is the empty set.</p> + <p>The <marker id="difference"></marker><em>difference</em> of two sets + A and B is the set that contains all elements of A that do not belong + to B.</p> + <p>The <marker id="symmetric_difference"></marker><em>symmetric + difference</em> of two sets is the set that contains those element + that belong to either of the two sets, but not both.</p> + <p>The <marker id="union_n"></marker><em>union</em> of a collection + of sets is the smallest set that contains all the elements that + belong to at least one set of the collection.</p> + <p>The <marker 
id="intersection_n"></marker><em>intersection</em> of + a non-empty collection of sets is the set that contains all elements + that belong to every set of the collection.</p> </item> - <item>for every tuple T = {T[1], ..., T[n]} and - for every tuple X = {X[1], ..., X[n]}, if - (T[i], X[i]) belongs to Sets for every - 1 <= i <= n then (T, X) belongs - to Sets (<em>ordered sets</em>); + <item> + <p>The <marker id="Cartesian_product"></marker><em>Cartesian + product</em> of two sets X and Y, denoted X × Y, is + the set {a : a = (x, y) for some x in X and + for some y in Y}.</p> + <p>A <marker id="relation"></marker><em>relation</em> is a subset of + X × Y. Let R be a relation. The fact that (x, y) + belongs to R is written as x R y. As relations are sets, + the definitions of the last item (subset, union, and so on) apply to + relations as well.</p> + <p>The <marker id="domain"></marker><em>domain</em> of R is the set + {x : x R y for some y in Y}.</p> + <p>The <marker id="range"></marker><em>range</em> of R is the set + {y : x R y for some x in X}.</p> + <p>The <marker id="converse"></marker><em>converse</em> of R is the + set {a : a = (y, x) for some + (x, y) in R}.</p> + <p>If A is a subset of X, the <marker id="image"></marker><em>image</em> + of A under R is the set {y : x R y for some + x in A}. 
If B is a subset of Y, the + <marker id="inverse_image"></marker><em>inverse image</em> of B is the + set {x : x R y for some y in B}.</p> + <p>If R is a relation from X to Y, and S is a relation from Y to Z, the + <marker id="relative_product"></marker><em>relative product</em> of R + and S is the relation T from X to Z defined so that x T z + if and only if there exists an element y in Y such that + x R y and y S z.</p> + <p>The <marker id="restriction"></marker><em>restriction</em> of R to A + is the set S defined so that x S y if and only if there + exists an element x in A such that x R y.</p> + <p>If S is a restriction of R to A, then R is an + <marker id="extension"></marker><em>extension</em> of S to X.</p> + <p>If X = Y, then R is called a relation <em>in</em> X.</p> + <p>The <marker id="field"></marker><em>field</em> of a relation R in X + is the union of the domain of R and the range of R.</p> + <p>If R is a relation in X, and if S is defined so that x S y + if x R y and not x = y, then S is the + <marker id="strict_relation"></marker><em>strict</em> relation + corresponding to R. Conversely, if S is a relation in X, and if R is + defined so that x R y if x S y or x = y, + then R is the <marker id="weak_relation"></marker><em>weak</em> + relation corresponding to S.</p> + <p>A relation R in X is <em>reflexive</em> if x R x for every + element x of X, it is <em>symmetric</em> if x R y implies + that y R x, and it is <em>transitive</em> if + x R y and y R z imply that x R z.</p> + </item> + <item> + <p>A <marker id="function"></marker><em>function</em> F is a relation, + a subset of X × Y, such that the domain of F is equal + to X and such that for every x in X there is a unique element y in Y + with (x, y) in F. The latter condition can be formulated as + follows: if x F y and x F z, then y = z. 
+ In this module, it is not required that the domain of F is equal to X + for a relation to be considered a function.</p> + <p>Instead of writing (x, y) in F or x F y, we + write F(x) = y when F is a function, and say that F maps x + onto y, or that the value of F at x is y.</p> + <p>As functions are relations, the definitions of the last item (domain, + range, and so on) apply to functions as well.</p> + <p>If the converse of a function F is a function F', then F' is called + the <marker id="inverse"></marker><em>inverse</em> of F.</p> + <p>The relative product of two functions F1 and F2 is called + the <marker id="composite"></marker><em>composite</em> of F1 and F2 + if the range of F1 is a subset of the domain of F2.</p> + </item> + <item> + <p>Sometimes, when the range of a function is more important than the + function itself, the function is called a <em>family</em>.</p> + <p>The domain of a family is called the <em>index set</em>, and the + range is called the <em>indexed set</em>.</p> + <p>If x is a family from I to X, then x[i] denotes the value of the + function at index i. 
The notation "a family in X" is used for such a + family.</p> + <p>When the indexed set is a set of subsets of a set X, we call x a + <marker id="family"></marker><em>family of subsets</em> of X.</p> + <p>If x is a family of subsets of X, the union of the range of x is + called the <em>union of the family</em> x.</p> + <p>If x is non-empty (the index set is non-empty), the <em>intersection + of the family</em> x is the intersection of the range of x.</p> + <p>In this module, the only families that are considered are families + of subsets of some set X; in the following, the word "family" is + used for such families of subsets.</p> + </item> + <item> + <p>A <marker id="partition"></marker><em>partition</em> of a set X is a + collection S of non-empty subsets of X whose union is X and whose + elements are pairwise disjoint.</p> + <p>A relation in a set is an <em>equivalence relation</em> if it is + reflexive, symmetric, and transitive.</p> + <p>If R is an equivalence relation in X, and x is an element of X, the + <marker id="equivalence_class"></marker><em>equivalence class</em> of + x with respect to R is the set of all those elements y of X for which + x R y holds. The equivalence classes constitute a + partitioning of X. 
Conversely, if C is a partition of X, the relation + that holds for any two elements of X if they belong to the same + equivalence class, is an equivalence relation induced by the + partition C.</p> + <p>If R is an equivalence relation in X, the + <marker id="canonical_map"></marker><em>canonical map</em> is the + function that maps every element of X onto its equivalence class.</p> + </item> + <item> + <p><marker id="binary_relation"></marker>Relations as defined above + (as sets of ordered pairs) are from now on referred to as <em>binary + relations</em>.</p> + <p>We call a set of ordered sets (x[1], ..., x[n]) an + <marker id="n_ary_relation"></marker><em>(n-ary) relation</em>, and + say that the relation is a subset of the + <marker id="Cartesian_product_tuple"></marker>Cartesian product + X[1] × ... × X[n], where x[i] is + an element of X[i], 1 <= i <= n.</p> + <p>The <marker id="projection"></marker><em>projection</em> of an n-ary + relation R onto coordinate i is the set {x[i] : + (x[1], ..., x[i], ..., x[n]) in R for some + x[j] in X[j], 1 <= j <= n and + not i = j}. The projections of a binary relation R onto the + first and second coordinates are the domain and the range of R, + respectively.</p> + <p>The relative product of binary relations can be generalized to n-ary + relations as follows. Let TR be an ordered set + (R[1], ..., R[n]) of binary relations from X to Y[i] + and S a binary relation from + (Y[1] × ... × Y[n]) to Z. The + <marker id="tuple_relative_product"></marker><em>relative product</em> + of TR and S is the binary relation T from X to Z defined so that + x T z if and only if there exists an element y[i] in Y[i] + for each 1 <= i <= n such that + x R[i] y[i] and + (y[1], ..., y[n]) S z. Now let TR be a an + ordered set (R[1], ..., R[n]) of binary relations from + X[i] to Y[i] and S a subset of + X[1] × ... × X[n]. 
+ The <marker id="multiple_relative_product"></marker><em>multiple + relative product</em> of TR and S is defined to be the set + {z : z = ((x[1], ..., x[n]), (y[1],...,y[n])) + for some (x[1], ..., x[n]) in S and for some + (x[i], y[i]) in R[i], 1 <= i <= n}.</p> + <p>The <marker id="natural_join"></marker><em>natural join</em> of an + n-ary relation R and an m-ary relation S on coordinate i and j is + defined to be the set + {z : z = (x[1], ..., x[n], + y[1], ..., y[j-1], y[j+1], ..., y[m]) + for some (x[1], ..., x[n]) in R and for some + (y[1], ..., y[m]) in S such that + x[i] = y[j]}.</p> + </item> + <item> + <p><marker id="sets_definition"></marker>The sets recognized by this + module are represented by elements of the relation Sets, which is + defined as the smallest set such that:</p> + <list type="bulleted"> + <item> + <p>For every atom T, except '_', and for every term X, + (T, X) belongs to Sets (<em>atomic sets</em>).</p> + </item> + <item> + <p>(['_'], []) belongs to Sets (the <em>untyped empty + set</em>).</p> + </item> + <item> + <p>For every tuple T = {T[1], ..., T[n]} and + for every tuple X = {X[1], ..., X[n]}, if + (T[i], X[i]) belongs to Sets for every + 1 <= i <= n, then (T, X) belongs + to Sets (<em>ordered sets</em>).</p> + </item> + <item> + <p>For every term T, if X is the empty list or a non-empty + sorted list [X[1], ..., X[n]] without duplicates + such that (T, X[i]) belongs to Sets for every + 1 <= i <= n, then ([T], X) + belongs to Sets (<em>typed unordered sets</em>).</p> + </item> + </list> + <p>An <marker id="external_set"></marker><em>external set</em> is an + element of the range of Sets.</p> + <p>A <marker id="type"></marker><em>type</em> is an element of the + domain of Sets.</p> + <p>If S is an element (T, X) of Sets, then T is a + <marker id="valid_type"></marker><em>valid type</em> of X, T is the + type of S, and X is the external set of S. 
+ <seealso marker="#from_term/2"><c>from_term/2</c></seealso> creates a + set from a type and an Erlang term turned into an external set.</p> + <p>The sets represented by Sets are the elements of the range of + function Set from Sets to Erlang terms and sets of Erlang terms:</p> + <list type="bulleted"> + <item>Set(T,Term) = Term, where T is an atom</item> + <item>Set({T[1], ..., T[n]}, {X[1], ..., + X[n]}) = (Set(T[1], X[1]), ..., + Set(T[n], X[n]))</item> + <item>Set([T], [X[1], ..., X[n]]) = + {Set(T, X[1]), ..., Set(T, X[n])}</item> + <item>Set([T], []) = {}</item> + </list> + <p>When there is no risk of confusion, elements of Sets are identified + with the sets they represent. For example, if U is the result of + calling <seealso marker="#union/2"><c>union/2</c></seealso> with S1 + and S2 as arguments, then U is said to be the union of S1 and S2. + A more precise formulation is that Set(U) is the union of Set(S1) + and Set(S2).</p> </item> - <item>for every term T, if X is the empty list or a non-empty - sorted list [X[1], ..., X[n]] without duplicates - such that (T, X[i]) belongs to Sets for every - 1 <= i <= n, then ([T], X) - belongs to Sets (<em>typed unordered sets</em>).</item> - </list> - <p>An <marker id="external_set"></marker><em>external set</em> is an - element of the range of Sets. - A <marker id="type"></marker><em>type</em> - is an element of the domain of Sets. If S is an element - (T, X) of Sets, then T is - a <marker id="valid_type"></marker><em>valid type</em> of X, - T is the type of S, and X is the external set - of S. 
<seealso marker="#from_term">from_term/2</seealso> creates a - set from a type and an Erlang term turned into an external set.</p> - <p>The actual sets represented by Sets are the elements of the - range of the function Set from Sets to Erlang terms and sets of - Erlang terms:</p> - <list type="bulleted"> - <item>Set(T,Term) = Term, where T is an atom;</item> - <item>Set({T[1], ..., T[n]}, {X[1], ..., X[n]}) - = (Set(T[1], X[1]), ..., Set(T[n], X[n]));</item> - <item>Set([T], [X[1], ..., X[n]]) - = {Set(T, X[1]), ..., Set(T, X[n])};</item> - <item>Set([T], []) = {}.</item> </list> - <p>When there is no risk of confusion, elements of Sets will be - identified with the sets they represent. For instance, if U is - the result of calling <c>union/2</c> with S1 and S2 as - arguments, then U is said to be the union of S1 and S2. A more - precise formulation would be that Set(U) is the union of Set(S1) - and Set(S2).</p> + <p>The types are used to implement the various conditions that - sets need to fulfill. As an example, consider the relative + sets must fulfill. As an example, consider the relative product of two sets R and S, and recall that the relative product of R and S is defined if R is a binary relation to Y and - S is a binary relation from Y. The function that implements the relative - product, <seealso marker="#relprod_impl">relative_product/2</seealso>, checks + S is a binary relation from Y. The function that implements the + relative product, <seealso marker="#relative_product/2"> + <c>relative_product/2</c></seealso>, checks that the arguments represent binary relations by matching [{A,B}] against the type of the first argument (Arg1 say), and [{C,D}] against the type of the second argument (Arg2 say). The fact @@ -290,33 +314,51 @@ ensure that W is equal to Y. 
The untyped empty set is handled separately: its type, ['_'], matches the type of any unordered set.</p> - <p>A few functions of this module (<c>drestriction/3</c>, - <c>family_projection/2</c>, <c>partition/2</c>, - <c>partition_family/2</c>, <c>projection/2</c>, - <c>restriction/3</c>, <c>substitution/2</c>) accept an Erlang + + <p>A few functions of this module + (<seealso marker="#drestriction/3"><c>drestriction/3</c></seealso>, + <seealso marker="#family_projection/2"><c>family_projection/2</c></seealso>, + <seealso marker="#partition/2"><c>partition/2</c></seealso>, + <seealso marker="#partition_family/2"><c>partition_family/2</c></seealso>, + <seealso marker="#projection/2"><c>projection/2</c></seealso>, + <seealso marker="#restriction/3"><c>restriction/3</c></seealso>, + <seealso marker="#substitution/2"><c>substitution/2</c></seealso>) + accept an Erlang function as a means to modify each element of a given unordered set. <marker id="set_fun"></marker>Such a function, called - SetFun in the following, can be - specified as a functional object (fun), a tuple - <c>{external, Fun}</c>, or an integer. If SetFun is - specified as a fun, the fun is applied to each element of the - given set and the return value is assumed to be a set. If SetFun - is specified as a tuple <c>{external, Fun}</c>, Fun is applied - to the external set of each element of the given set and the - return value is assumed to be an external set. Selecting the - elements of an unordered set as external sets and assembling a - new unordered set from a list of external sets is in the present - implementation more efficient than modifying each element as a - set. However, this optimization can only be utilized when the - elements of the unordered set are atomic or ordered sets. 
It - must also be the case that the type of the elements matches some - clause of Fun (the type of the created set is the result of - applying Fun to the type of the given set), and that Fun does - nothing but selecting, duplicating or rearranging parts of the - elements. Specifying a SetFun as an integer I is equivalent to - specifying <c>{external, fun(X) -> element(I, X) end}</c>, - but is to be preferred since it makes it possible to handle this - case even more efficiently. Examples of SetFuns:</p> + SetFun in the following, can be specified as a functional object (fun), + a tuple <c>{external, Fun}</c>, or an integer:</p> + + <list type="bulleted"> + <item> + <p>If SetFun is specified as a fun, the fun is applied to each element + of the given set and the return value is assumed to be a set.</p> + </item> + <item> + <p>If SetFun is specified as a tuple <c>{external, Fun}</c>, Fun is + applied to the external set of each element of the given set and the + return value is assumed to be an external set. Selecting the + elements of an unordered set as external sets and assembling a + new unordered set from a list of external sets is in the present + implementation more efficient than modifying each element as a + set. However, this optimization can only be used when the + elements of the unordered set are atomic or ordered sets. 
It + must also be the case that the type of the elements matches some + clause of Fun (the type of the created set is the result of + applying Fun to the type of the given set), and that Fun does + nothing but selecting, duplicating, or rearranging parts of the + elements.</p> + </item> + <item> + <p>Specifying a SetFun as an integer I is equivalent to + specifying <c>{external, fun(X) -> + element(I, X) end}</c>, but is to be preferred, as it + makes it possible to handle this case even more efficiently.</p> + </item> + </list> + + <p>Examples of SetFuns:</p> + <pre> fun sofs:union/1 fun(S) -> sofs:partition(1, S) end @@ -325,22 +367,31 @@ fun(S) -> sofs:partition(1, S) end {external, fun({_,{_,C}}) -> C end} {external, fun({_,{_,{_,E}=C}}) -> {E,{E,C}} end} 2</pre> + <p>The order in which a SetFun is applied to the elements of an - unordered set is not specified, and may change in future - versions of sofs.</p> + unordered set is not specified, and can change in future + versions of this module.</p> + <p>The execution time of the functions of this module is dominated by the time it takes to sort lists. When no sorting is needed, the execution time is in the worst case proportional to the sum of the sizes of the input arguments and the returned value. 
A - few functions execute in constant time: <c>from_external</c>, - <c>is_empty_set</c>, <c>is_set</c>, <c>is_sofs_set</c>, - <c>to_external</c>, <c>type</c>.</p> + few functions execute in constant time: + <seealso marker="#from_external/2"><c>from_external/2</c></seealso>, + <seealso marker="#is_empty_set/1"><c>is_empty_set/1</c></seealso>, + <seealso marker="#is_set/1"><c>is_set/1</c></seealso>, + <seealso marker="#is_sofs_set/1"><c>is_sofs_set/1</c></seealso>, + <seealso marker="#to_external/1"><c>to_external/1</c></seealso>, + <seealso marker="#type/1"><c>type/1</c></seealso>.</p> + <p>The functions of this module exit the process with a <c>badarg</c>, <c>bad_function</c>, or <c>type_mismatch</c> message when given badly formed arguments or sets the types of which are not compatible.</p> - <p>When comparing external sets the operator <c>==/2</c> is used.</p> + + <p>When comparing external sets, operator <c>==/2</c> is used.</p> </description> + <datatypes> <datatype> <name name="anyset"></name> @@ -399,10 +450,10 @@ fun(S) -> sofs:partition(1, S) end <datatype> <!-- Parameterized opaque types are NYI: --> <name>tuple_of(T)</name> - <desc><p><marker id="type-tuple_of"/> - A tuple where the elements are of type <c>T</c>.</p></desc> + <desc><p>A tuple where the elements are of type <c>T</c>.</p></desc> </datatype> </datatypes> + <funcs> <func> <name name="a_function" arity="1"/> @@ -411,24 +462,25 @@ fun(S) -> sofs:partition(1, S) end <desc> <p>Creates a <seealso marker="#function">function</seealso>. <c>a_function(F, T)</c> is equivalent to - <c>from_term(F, T)</c>, if the result is a function. If + <c>from_term(F, T)</c> if the result is a function.
If no <seealso marker="#type">type</seealso> is explicitly - given, <c>[{atom, atom}]</c> is used as type of the - function.</p> + specified, <c>[{atom, atom}]</c> is used as the + function type.</p> </desc> </func> + <func> <name name="canonical_relation" arity="1"/> <fsummary>Return the canonical map.</fsummary> <desc> <p>Returns the binary relation containing the elements - (E, Set) such that Set belongs to <anno>SetOfSets</anno> and E - belongs to Set. If SetOfSets is - a <seealso marker="#partition">partition</seealso> of a set X and - R is the equivalence relation in X induced by SetOfSets, then the - returned relation is - the <seealso marker="#canonical_map">canonical map</seealso> from - X onto the equivalence classes with respect to R.</p> + (E, Set) such that Set belongs to <c><anno>SetOfSets</anno></c> + and E belongs to Set. If <c>SetOfSets</c> is + a <seealso marker="#partition">partition</seealso> of a set X and + R is the equivalence relation in X induced by <c>SetOfSets</c>, + then the returned relation is + the <seealso marker="#canonical_map">canonical map</seealso> from + X onto the equivalence classes with respect to R.</p> <pre> 1> <input>Ss = sofs:from_term([[a,b],[b,c]]),</input> <input>CR = sofs:canonical_relation(Ss),</input> @@ -436,13 +488,14 @@ fun(S) -> sofs:partition(1, S) end [{a,[a,b]},{b,[a,b]},{b,[b,c]},{c,[b,c]}]</pre> </desc> </func> + <func> <name name="composite" arity="2"/> <fsummary>Return the composite of two functions.</fsummary> <desc> <p>Returns the <seealso marker="#composite">composite</seealso> of - the functions <anno>Function1</anno> and - <anno>Function2</anno>.</p> + the functions <c><anno>Function1</anno></c> and + <c><anno>Function2</anno></c>.</p> <pre> 1> <input>F1 = sofs:a_function([{a,1},{b,2},{c,2}]),</input> <input>F2 = sofs:a_function([{1,x},{2,y},{3,z}]),</input> @@ -451,13 +504,14 @@ fun(S) -> sofs:partition(1, S) end [{a,x},{b,y},{c,y}]</pre> </desc> </func> + <func> <name name="constant_function" 
arity="2"/> - <fsummary>Create the function that maps each element of a + <fsummary>Create the function that maps each element of a set onto another set.</fsummary> <desc> <p>Creates the <seealso marker="#function">function</seealso> - that maps each element of the set Set onto AnySet.</p> + that maps each element of set <c>Set</c> onto <c>AnySet</c>.</p> <pre> 1> <input>S = sofs:set([a,b]),</input> <input>E = sofs:from_term(1),</input> @@ -466,12 +520,13 @@ fun(S) -> sofs:partition(1, S) end [{a,1},{b,1}]</pre> </desc> </func> + <func> <name name="converse" arity="1"/> <fsummary>Return the converse of a binary relation.</fsummary> <desc> <p>Returns the <seealso marker="#converse">converse</seealso> - of the binary relation <anno>BinRel1</anno>.</p> + of the binary relation <c><anno>BinRel1</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{2,b},{3,a}]),</input> <input>R2 = sofs:converse(R1),</input> @@ -479,39 +534,42 @@ fun(S) -> sofs:partition(1, S) end [{a,1},{a,3},{b,2}]</pre> </desc> </func> + <func> <name name="difference" arity="2"/> <fsummary>Return the difference of two sets.</fsummary> <desc> - <p>Returns the <seealso marker="#difference">difference</seealso> of - the sets <anno>Set1</anno> and <anno>Set2</anno>.</p> + <p>Returns the <seealso marker="#difference">difference</seealso> of + the sets <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> <name name="digraph_to_family" arity="1"/> <name name="digraph_to_family" arity="2"/> <fsummary>Create a family from a directed graph.</fsummary> <desc> <p>Creates a <seealso marker="#family">family</seealso> from - the directed graph <anno>Graph</anno>. Each vertex a of - <anno>Graph</anno> is - represented by a pair (a, {b[1], ..., b[n]}) - where the b[i]'s are the out-neighbours of a. If no type is - explicitly given, [{atom, [atom]}] is used as type of - the family. 
It is assumed that <anno>Type</anno> is - a <seealso marker="#valid_type">valid type</seealso> of the - external set of the family.</p> + the directed graph <c><anno>Graph</anno></c>. Each vertex a of + <c><anno>Graph</anno></c> is + represented by a pair (a, {b[1], ..., b[n]}), + where the b[i]:s are the out-neighbors of a. If no type is + explicitly specified, [{atom, [atom]}] is used as type of + the family. It is assumed that <c><anno>Type</anno></c> is + a <seealso marker="#valid_type">valid type</seealso> of the + external set of the family.</p> <p>If G is a directed graph, it holds that the vertices and edges of G are the same as the vertices and edges of <c>family_to_digraph(digraph_to_family(G))</c>.</p> </desc> </func> + <func> <name name="domain" arity="1"/> <fsummary>Return the domain of a binary relation.</fsummary> <desc> - <p>Returns the <seealso marker="#domain">domain</seealso> of - the binary relation <anno>BinRel</anno>.</p> + <p>Returns the <seealso marker="#domain">domain</seealso> of + the binary relation <c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),</input> <input>S = sofs:domain(R),</input> @@ -519,14 +577,15 @@ fun(S) -> sofs:partition(1, S) end [1,2]</pre> </desc> </func> + <func> <name name="drestriction" arity="2"/> <fsummary>Return a restriction of a binary relation.</fsummary> <desc> <p>Returns the difference between the binary relation - <anno>BinRel1</anno> + <c><anno>BinRel1</anno></c> and the <seealso marker="#restriction">restriction</seealso> - of <anno>BinRel1</anno> to <anno>Set</anno>.</p> + of <c><anno>BinRel1</anno></c> to <c><anno>Set</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{2,b},{3,c}]),</input> <input>S = sofs:set([2,4,6]),</input> @@ -537,14 +596,15 @@ fun(S) -> sofs:partition(1, S) end <c>difference(R, restriction(R, S))</c>.</p> </desc> </func> + <func> <name name="drestriction" arity="3"/> <fsummary>Return a restriction of a relation.</fsummary> <desc> 
- <p>Returns a subset of <anno>Set1</anno> containing those elements - that do - not yield an element in <anno>Set2</anno> as the result of applying - <anno>SetFun</anno>.</p> + <p>Returns a subset of <c><anno>Set1</anno></c> containing those + elements that do not give + an element in <c><anno>Set2</anno></c> as the result of applying + <c><anno>SetFun</anno></c>.</p> <pre> 1> <input>SetFun = {external, fun({_A,B,C}) -> {B,C} end},</input> <input>R1 = sofs:relation([{a,aa,1},{b,bb,2},{c,cc,3}]),</input> @@ -556,24 +616,27 @@ fun(S) -> sofs:partition(1, S) end <c>difference(S1, restriction(F, S1, S2))</c>.</p> </desc> </func> + <func> <name name="empty_set" arity="0"/> <fsummary>Return the untyped empty set.</fsummary> <desc> - <p>Returns the <seealso marker="#sets_definition">untyped empty + <p>Returns the <seealso marker="#sets_definition">untyped empty set</seealso>. <c>empty_set()</c> is equivalent to <c>from_term([], ['_'])</c>.</p> </desc> </func> + <func> <name name="extension" arity="3"/> <fsummary>Extend the domain of a binary relation.</fsummary> <desc> - <p>Returns the <seealso marker="#extension">extension</seealso> of - <anno>BinRel1</anno> such that - for each element E in <anno>Set</anno> that does not belong to the - <seealso marker="#domain">domain</seealso> of <anno>BinRel1</anno>, - <anno>BinRel2</anno> contains the pair (E, AnySet).</p> + <p>Returns the <seealso marker="#extension">extension</seealso> of + <c><anno>BinRel1</anno></c> such that for + each element E in <c><anno>Set</anno></c> that does not belong to the + <seealso marker="#domain">domain</seealso> of + <c><anno>BinRel1</anno></c>, <c><anno>BinRel2</anno></c> contains the + pair (E, <c>AnySet</c>).</p> <pre> 1> <input>S = sofs:set([b,c]),</input> <input>A = sofs:empty_set(),</input> @@ -583,31 +646,33 @@ fun(S) -> sofs:partition(1, S) end [{a,[1,2]},{b,[3]},{c,[]}]</pre> </desc> </func> + <func> <name name="family" arity="1"/> <name name="family" arity="2"/> <fsummary>Create a 
family of subsets.</fsummary> <desc> - <p>Creates a <seealso marker="#family">family of subsets</seealso>. - <c>family(F, T)</c> is equivalent to - <c>from_term(F, T)</c>, if the result is a family. If + <p>Creates a <seealso marker="#family">family of subsets</seealso>. + <c>family(F, T)</c> is equivalent to + <c>from_term(F, T)</c> if the result is a family. If no <seealso marker="#type">type</seealso> is explicitly - given, <c>[{atom, [atom]}]</c> is used as type of the - family.</p> + specified, <c>[{atom, [atom]}]</c> is used as the + family type.</p> </desc> </func> + <func> <name name="family_difference" arity="2"/> <fsummary>Return the difference of two families.</fsummary> <desc> - <p>If <anno>Family1</anno> and <anno>Family2</anno> - are <seealso marker="#family">families</seealso>, then - <anno>Family3</anno> is the family + <p>If <c><anno>Family1</anno></c> and <c><anno>Family2</anno></c> + are <seealso marker="#family">families</seealso>, then + <c><anno>Family3</anno></c> is the family such that the index set is equal to the index set of - <anno>Family1</anno>, and <anno>Family3</anno>[i] is the - difference between <anno>Family1</anno>[i] - and <anno>Family2</anno>[i] if <anno>Family2</anno> maps i, - <anno>Family1</anno>[i] otherwise.</p> + <c><anno>Family1</anno></c>, and <c><anno>Family3</anno></c>[i] is + the difference between <c><anno>Family1</anno></c>[i] + and <c><anno>Family2</anno></c>[i] if <c><anno>Family2</anno></c> + maps i, otherwise <c><anno>Family1</anno>[i]</c>.</p> <pre> 1> <input>F1 = sofs:family([{a,[1,2]},{b,[3,4]}]),</input> <input>F2 = sofs:family([{b,[4,5]},{c,[6,7]}]),</input> @@ -616,19 +681,20 @@ fun(S) -> sofs:partition(1, S) end [{a,[1,2]},{b,[3]}]</pre> </desc> </func> + <func> <name name="family_domain" arity="1"/> <fsummary>Return a family of domains.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso> - and <anno>Family1</anno>[i] is a 
binary relation for every i - in the index set of <anno>Family1</anno>, - then <anno>Family2</anno> is the family with the same index - set as <anno>Family1</anno> such - that <anno>Family2</anno>[i] is + and <c><anno>Family1</anno></c>[i] is a binary relation for every i + in the index set of <c><anno>Family1</anno></c>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the <seealso marker="#domain">domain</seealso> of - <anno>Family1</anno>[i].</p> + <c><anno>Family1</anno>[i]</c>.</p> <pre> 1> <input>FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),</input> <input>F = sofs:family_domain(FR),</input> @@ -636,43 +702,46 @@ fun(S) -> sofs:partition(1, S) end [{a,[1,2,3]},{b,[]},{c,[4,5]}]</pre> </desc> </func> + <func> <name name="family_field" arity="1"/> <fsummary>Return a family of fields.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso> - and <anno>Family1</anno>[i] is a binary relation for every i - in the index set of <anno>Family1</anno>, - then <anno>Family2</anno> is the family with the same index - set as <anno>Family1</anno> such - that <anno>Family2</anno>[i] is + and <c><anno>Family1</anno></c>[i] is a binary relation for every i + in the index set of <c><anno>Family1</anno></c>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the <seealso marker="#field">field</seealso> of - <anno>Family1</anno>[i].</p> + <c><anno>Family1</anno></c>[i].</p> <pre> 1> <input>FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),</input> <input>F = sofs:family_field(FR),</input> <input>sofs:to_external(F).</input> [{a,[1,2,3,a,b,c]},{b,[]},{c,[4,5,d,e]}]</pre> <p><c>family_field(Family1)</c> is equivalent to - 
<c>family_union(family_domain(Family1), family_range(Family1))</c>.</p> + <c>family_union(family_domain(Family1), + family_range(Family1))</c>.</p> </desc> </func> + <func> <name name="family_intersection" arity="1"/> <fsummary>Return the intersection of a family of sets of sets.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso> - and <anno>Family1</anno>[i] is a set of sets for every i in - the index set of <anno>Family1</anno>, - then <anno>Family2</anno> is the family with the same index - set as <anno>Family1</anno> such - that <anno>Family2</anno>[i] is + and <c><anno>Family1</anno></c>[i] is a set of sets for every i in + the index set of <c><anno>Family1</anno></c>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the <seealso marker="#intersection_n">intersection</seealso> - of <anno>Family1</anno>[i].</p> - <p>If <anno>Family1</anno>[i] is an empty set for some i, then + of <c><anno>Family1</anno></c>[i].</p> + <p>If <c><anno>Family1</anno></c>[i] is an empty set for some i, the process exits with a <c>badarg</c> message.</p> <pre> 1> <input>F1 = sofs:from_term([{a,[[1,2,3],[2,3,4]]},{b,[[x,y,z],[x,y]]}]),</input> @@ -681,17 +750,18 @@ fun(S) -> sofs:partition(1, S) end [{a,[2,3]},{b,[x,y]}]</pre> </desc> </func> + <func> <name name="family_intersection" arity="2"/> <fsummary>Return the intersection of two families.</fsummary> <desc> - <p>If <anno>Family1</anno> and <anno>Family2</anno> - are <seealso marker="#family">families</seealso>, - then <anno>Family3</anno> is the family such that the index - set is the intersection of <anno>Family1</anno>'s and - <anno>Family2</anno>'s index sets, - and <anno>Family3</anno>[i] is the intersection of - <anno>Family1</anno>[i] and <anno>Family2</anno>[i].</p> + <p>If <c><anno>Family1</anno></c> and <c><anno>Family2</anno></c> + are 
<seealso marker="#family">families</seealso>, + then <c><anno>Family3</anno></c> is the family such that the index + set is the intersection of <c><anno>Family1</anno></c>:s and + <c><anno>Family2</anno></c>:s index sets, + and <c><anno>Family3</anno></c>[i] is the intersection of + <c><anno>Family1</anno></c>[i] and <c><anno>Family2</anno></c>[i].</p> <pre> 1> <input>F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]),</input> <input>F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]),</input> @@ -700,17 +770,18 @@ fun(S) -> sofs:partition(1, S) end [{b,[4]},{c,[]}]</pre> </desc> </func> + <func> <name name="family_projection" arity="2"/> <fsummary>Return a family of modified subsets.</fsummary> <desc> - <p>If <anno>Family1</anno> is - a <seealso marker="#family">family</seealso> - then <anno>Family2</anno> is the family with the same index - set as <anno>Family1</anno> such - that <anno>Family2</anno>[i] is the result of - calling <anno>SetFun</anno> with <anno>Family1</anno>[i] as - argument.</p> + <p>If <c><anno>Family1</anno></c> is + a <seealso marker="#family">family</seealso>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the result of + calling <c><anno>SetFun</anno></c> with <c><anno>Family1</anno></c>[i] + as argument.</p> <pre> 1> <input>F1 = sofs:from_term([{a,[[1,2],[2,3]]},{b,[[]]}]),</input> <input>F2 = sofs:family_projection(fun sofs:union/1, F1),</input> @@ -718,19 +789,20 @@ fun(S) -> sofs:partition(1, S) end [{a,[1,2,3]},{b,[]}]</pre> </desc> </func> + <func> <name name="family_range" arity="1"/> <fsummary>Return a family of ranges.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso> - and <anno>Family1</anno>[i] is a binary relation for every i - in the index set of <anno>Family1</anno>, - then <anno>Family2</anno> is the family with the same index - set as 
<anno>Family1</anno> such - that <anno>Family2</anno>[i] is + and <c><anno>Family1</anno></c>[i] is a binary relation for every i + in the index set of <c><anno>Family1</anno></c>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the <seealso marker="#range">range</seealso> of - <anno>Family1</anno>[i].</p> + <c><anno>Family1</anno></c>[i].</p> <pre> 1> <input>FR = sofs:from_term([{a,[{1,a},{2,b},{3,c}]},{b,[]},{c,[{4,d},{5,e}]}]),</input> <input>F = sofs:family_range(FR),</input> @@ -738,22 +810,23 @@ fun(S) -> sofs:partition(1, S) end [{a,[a,b,c]},{b,[]},{c,[d,e]}]</pre> </desc> </func> + <func> <name name="family_specification" arity="2"/> <fsummary>Select a subset of a family using a predicate.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso>, - then <anno>Family2</anno> is + then <c><anno>Family2</anno></c> is the <seealso marker="#restriction">restriction</seealso> of - <anno>Family1</anno> to those elements i of the index set - for which <anno>Fun</anno> applied - to <anno>Family1</anno>[i] returns - <c>true</c>. If <anno>Fun</anno> is a - tuple <c>{external, Fun2}</c>, Fun2 is applied to + <c><anno>Family1</anno></c> to those elements i of the index set + for which <c><anno>Fun</anno></c> applied + to <c><anno>Family1</anno></c>[i] returns + <c>true</c>. 
If <c><anno>Fun</anno></c> is a + tuple <c>{external, Fun2}</c>, then <c>Fun2</c> is applied to the <seealso marker="#external_set">external set</seealso> - of <anno>Family1</anno>[i], otherwise <anno>Fun</anno> is - applied to <anno>Family1</anno>[i].</p> + of <c><anno>Family1</anno></c>[i], otherwise <c><anno>Fun</anno></c> + is applied to <c><anno>Family1</anno></c>[i].</p> <pre> 1> <input>F1 = sofs:family([{a,[1,2,3]},{b,[1,2]},{c,[1]}]),</input> <input>SpecFun = fun(S) -> sofs:no_elements(S) =:= 2 end,</input> @@ -762,23 +835,24 @@ fun(S) -> sofs:partition(1, S) end [{b,[1,2]}]</pre> </desc> </func> + <func> <name name="family_to_digraph" arity="1"/> <name name="family_to_digraph" arity="2"/> <fsummary>Create a directed graph from a family.</fsummary> <desc> - <p>Creates a directed graph from - the <seealso marker="#family">family</seealso> <anno>Family</anno>. + <p>Creates a directed graph from + <seealso marker="#family">family</seealso> <c><anno>Family</anno></c>. For each pair (a, {b[1], ..., b[n]}) - of <anno>Family</anno>, the vertex - a as well the edges (a, b[i]) for + of <c><anno>Family</anno></c>, vertex + a and the edges (a, b[i]) for 1 <= i <= n are added to a newly created directed graph.</p> - <p>If no graph type is given <seealso marker="digraph#new/0"> - digraph:new/0</seealso> is used for - creating the directed graph, otherwise the <anno>GraphType</anno> - argument is passed on as second argument to - <seealso marker="digraph#new/1">digraph:new/1</seealso>.</p> + <p>If no graph type is specified, <seealso marker="digraph#new/0"> + <c>digraph:new/0</c></seealso> is used for + creating the directed graph, otherwise argument + <c><anno>GraphType</anno></c> is passed on as second argument to + <seealso marker="digraph#new/1"><c>digraph:new/1</c></seealso>.</p> <p>It F is a family, it holds that F is a subset of <c>digraph_to_family(family_to_digraph(F), type(F))</c>. 
Equality holds if <c>union_of_family(F)</c> is a subset of @@ -787,16 +861,17 @@ fun(S) -> sofs:partition(1, S) end a <c>cyclic</c> message.</p> </desc> </func> + <func> <name name="family_to_relation" arity="1"/> <fsummary>Create a binary relation from a family.</fsummary> <desc> - <p>If <anno>Family</anno> is + <p>If <c><anno>Family</anno></c> is a <seealso marker="#family">family</seealso>, - then <anno>BinRel</anno> is the binary relation containing + then <c><anno>BinRel</anno></c> is the binary relation containing all pairs (i, x) such that i belongs to the index set - of <anno>Family</anno> and x belongs - to <anno>Family</anno>[i].</p> + of <c><anno>Family</anno></c> and x belongs + to <c><anno>Family</anno></c>[i].</p> <pre> 1> <input>F = sofs:family([{a,[]}, {b,[1]}, {c,[2,3]}]),</input> <input>R = sofs:family_to_relation(F),</input> @@ -804,19 +879,20 @@ fun(S) -> sofs:partition(1, S) end [{b,1},{c,2},{c,3}]</pre> </desc> </func> + <func> <name name="family_union" arity="1"/> <fsummary>Return the union of a family of sets of sets.</fsummary> <desc> - <p>If <anno>Family1</anno> is + <p>If <c><anno>Family1</anno></c> is a <seealso marker="#family">family</seealso> - and <anno>Family1</anno>[i] is a set of sets for each i in - the index set of <anno>Family1</anno>, - then <anno>Family2</anno> is the family with the same index - set as <anno>Family1</anno> such - that <anno>Family2</anno>[i] is + and <c><anno>Family1</anno></c>[i] is a set of sets for each i in + the index set of <c><anno>Family1</anno></c>, + then <c><anno>Family2</anno></c> is the family with the same index + set as <c><anno>Family1</anno></c> such + that <c><anno>Family2</anno></c>[i] is the <seealso marker="#union_n">union</seealso> of - <anno>Family1</anno>[i].</p> + <c><anno>Family1</anno></c>[i].</p> <pre> 1> <input>F1 = sofs:from_term([{a,[[1,2],[2,3]]},{b,[[]]}]),</input> <input>F2 = sofs:family_union(F1),</input> @@ -826,19 +902,20 @@ fun(S) -> sofs:partition(1, S) end 
<c>family_projection(fun sofs:union/1, F)</c>.</p> </desc> </func> + <func> <name name="family_union" arity="2"/> <fsummary>Return the union of two families.</fsummary> <desc> - <p>If <anno>Family1</anno> and <anno>Family2</anno> - are <seealso marker="#family">families</seealso>, - then <anno>Family3</anno> is the family such that the index - set is the union of <anno>Family1</anno>'s - and <anno>Family2</anno>'s index sets, - and <anno>Family3</anno>[i] is the union - of <anno>Family1</anno>[i] and <anno>Family2</anno>[i] if - both maps i, <anno>Family1</anno>[i] - or <anno>Family2</anno>[i] otherwise.</p> + <p>If <c><anno>Family1</anno></c> and <c><anno>Family2</anno></c> + are <seealso marker="#family">families</seealso>, + then <c><anno>Family3</anno></c> is the family such that the index + set is the union of <c><anno>Family1</anno></c>:s + and <c><anno>Family2</anno></c>:s index sets, + and <c><anno>Family3</anno></c>[i] is the union + of <c><anno>Family1</anno></c>[i] and <c><anno>Family2</anno></c>[i] + if both map i, otherwise <c><anno>Family1</anno></c>[i] + or <c><anno>Family2</anno></c>[i].</p> <pre> 1> <input>F1 = sofs:family([{a,[1,2]},{b,[3,4]},{c,[5,6]}]),</input> <input>F2 = sofs:family([{b,[4,5]},{c,[7,8]},{d,[9,10]}]),</input> @@ -847,40 +924,43 @@ fun(S) -> sofs:partition(1, S) end [{a,[1,2]},{b,[3,4,5]},{c,[5,6,7,8]},{d,[9,10]}]</pre> </desc> </func> + <func> <name name="field" arity="1"/> <fsummary>Return the field of a binary relation.</fsummary> <desc> <p>Returns the <seealso marker="#field">field</seealso> of the - binary relation <anno>BinRel</anno>.</p> + binary relation <c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),</input> <input>S = sofs:field(R),</input> <input>sofs:to_external(S).</input> [1,2,a,b,c]</pre> - <p><c>field(R)</c> is equivalent - to <c>union(domain(R), range(R))</c>.</p> + <p><c>field(R)</c> is equivalent + to <c>union(domain(R), range(R))</c>.</p> </desc> </func> + <func> 
<name name="from_external" arity="2"/> <fsummary>Create a set.</fsummary> <desc> <p>Creates a set from the <seealso marker="#external_set">external - set</seealso> <anno>ExternalSet</anno> - and the <seealso marker="#type">type</seealso> <anno>Type</anno>. - It is assumed that <anno>Type</anno> is + set</seealso> <c><anno>ExternalSet</anno></c> and + the <seealso marker="#type">type</seealso> <c><anno>Type</anno></c>. + It is assumed that <c><anno>Type</anno></c> is a <seealso marker="#valid_type">valid - type</seealso> of <anno>ExternalSet</anno>.</p> + type</seealso> of <c><anno>ExternalSet</anno></c>.</p> </desc> </func> + <func> <name name="from_sets" arity="1" clause_i="1"/> <fsummary>Create a set out of a list of sets.</fsummary> <desc> - <p>Returns the <seealso marker="#sets_definition">unordered - set</seealso> containing the sets of the list - <anno>ListOfSets</anno>.</p> + <p>Returns the <seealso marker="#sets_definition">unordered + set</seealso> containing the sets of list + <c><anno>ListOfSets</anno></c>.</p> <pre> 1> <input>S1 = sofs:relation([{a,1},{b,2}]),</input> <input>S2 = sofs:relation([{x,3},{y,4}]),</input> @@ -889,31 +969,33 @@ fun(S) -> sofs:partition(1, S) end [[{a,1},{b,2}],[{x,3},{y,4}]]</pre> </desc> </func> + <func> <name name="from_sets" arity="1" clause_i="2"/> <fsummary>Create an ordered set out of a tuple of sets.</fsummary> <desc> - <p>Returns the <seealso marker="#sets_definition">ordered - set</seealso> containing the sets of the non-empty tuple - <anno>TupleOfSets</anno>.</p> + <p>Returns the <seealso marker="#sets_definition">ordered + set</seealso> containing the sets of the non-empty tuple + <c><anno>TupleOfSets</anno></c>.</p> </desc> </func> + <func> <name name="from_term" arity="1"/> <name name="from_term" arity="2"/> <fsummary>Create a set.</fsummary> <desc> - <p><marker id="from_term"></marker>Creates an element - of <seealso marker="#sets_definition">Sets</seealso> by - traversing the term <anno>Term</anno>, sorting 
lists, - removing duplicates and - deriving or verifying a <seealso marker="#valid_type">valid - type</seealso> for the so obtained external set. An - explicitly given <seealso marker="#type">type</seealso> - <anno>Type</anno> + <p><marker id="from_term"></marker>Creates an element + of <seealso marker="#sets_definition">Sets</seealso> by + traversing term <c><anno>Term</anno></c>, sorting lists, + removing duplicates, and + deriving or verifying a <seealso marker="#valid_type">valid + type</seealso> for the so obtained external set. An + explicitly specified <seealso marker="#type">type</seealso> + <c><anno>Type</anno></c> can be used to limit the depth of the traversal; an atomic - type stops the traversal, as demonstrated by this example - where "foo" and {"foo"} are left unmodified:</p> + type stops the traversal, as shown by the following example + where <c>"foo"</c> and <c>{"foo"}</c> are left unmodified:</p> <pre> 1> <input>S = sofs:from_term([{{"foo"},[1,1]},{"foo",[2,2]}], [{atom,[atom]}]),</input> @@ -921,12 +1003,12 @@ fun(S) -> sofs:partition(1, S) end [{{"foo"},[1]},{"foo",[2]}]</pre> <p><c>from_term</c> can be used for creating atomic or ordered sets. The only purpose of such a set is that of later - building unordered sets since all functions in this module + building unordered sets, as all functions in this module that <em>do</em> anything operate on unordered sets. Creating unordered sets from a collection of ordered sets - may be the way to go if the ordered sets are big and one + can be the way to go if the ordered sets are big and one does not want to waste heap by rebuilding the elements of - the unordered set. An example showing that a set can be + the unordered set. 
The following example shows that a set can be built "layer by layer":</p> <pre> 1> <input>A = sofs:from_term(a),</input> @@ -936,19 +1018,25 @@ fun(S) -> sofs:partition(1, S) end <input>Ss = sofs:from_sets([P1,P2]),</input> <input>sofs:to_external(Ss).</input> [{a,[1,2,3]},{b,[4,5,6]}]</pre> - <p>Other functions that create sets are <c>from_external/2</c> - and <c>from_sets/1</c>. Special cases of <c>from_term/2</c> - are <c>a_function/1,2</c>, <c>empty_set/0</c>, - <c>family/1,2</c>, <c>relation/1,2</c>, and <c>set/1,2</c>.</p> + <p>Other functions that create sets are + <seealso marker="#from_external/2"><c>from_external/2</c></seealso> + and <seealso marker="#from_sets/1"><c>from_sets/1</c></seealso>. + Special cases of <c>from_term/2</c> are + <seealso marker="#a_function/1"><c>a_function/1,2</c></seealso>, + <seealso marker="#empty_set/0"><c>empty_set/0</c></seealso>, + <seealso marker="#family/1"><c>family/1,2</c></seealso>, + <seealso marker="#relation/1"><c>relation/1,2</c></seealso>, and + <seealso marker="#set/1"><c>set/1,2</c></seealso>.</p> </desc> </func> + <func> <name name="image" arity="2"/> <fsummary>Return the image of a set under a binary relation.</fsummary> <desc> - <p>Returns the <seealso marker="#image">image</seealso> of the - set <anno>Set1</anno> under the binary - relation <anno>BinRel</anno>.</p> + <p>Returns the <seealso marker="#image">image</seealso> of + set <c><anno>Set1</anno></c> under the binary + relation <c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{1,a},{2,b},{2,c},{3,d}]),</input> <input>S1 = sofs:set([1,2]),</input> @@ -957,32 +1045,35 @@ fun(S) -> sofs:partition(1, S) end [a,b,c]</pre> </desc> </func> + <func> <name name="intersection" arity="1"/> <fsummary>Return the intersection of a set of sets.</fsummary> <desc> - <p>Returns - the <seealso marker="#intersection_n">intersection</seealso> of - the set of sets <anno>SetOfSets</anno>.</p> + <p>Returns + the <seealso 
marker="#intersection_n">intersection</seealso> of + the set of sets <c><anno>SetOfSets</anno></c>.</p> <p>Intersecting an empty set of sets exits the process with a <c>badarg</c> message.</p> </desc> </func> + <func> <name name="intersection" arity="2"/> <fsummary>Return the intersection of two sets.</fsummary> <desc> - <p>Returns - the <seealso marker="#intersection">intersection</seealso> of - <anno>Set1</anno> and <anno>Set2</anno>.</p> + <p>Returns + the <seealso marker="#intersection">intersection</seealso> of + <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> <name name="intersection_of_family" arity="1"/> <fsummary>Return the intersection of a family.</fsummary> <desc> - <p>Returns the intersection of - the <seealso marker="#family">family</seealso> <anno>Family</anno>. + <p>Returns the intersection of + <seealso marker="#family">family</seealso> <c><anno>Family</anno></c>. </p> <p>Intersecting an empty family exits the process with a <c>badarg</c> message.</p> @@ -993,12 +1084,13 @@ fun(S) -> sofs:partition(1, S) end [2]</pre> </desc> </func> + <func> <name name="inverse" arity="1"/> <fsummary>Return the inverse of a function.</fsummary> <desc> <p>Returns the <seealso marker="#inverse">inverse</seealso> - of the function <anno>Function1</anno>.</p> + of function <c><anno>Function1</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{2,b},{3,c}]),</input> <input>R2 = sofs:inverse(R1),</input> @@ -1006,14 +1098,15 @@ fun(S) -> sofs:partition(1, S) end [{a,1},{b,2},{c,3}]</pre> </desc> </func> + <func> <name name="inverse_image" arity="2"/> - <fsummary>Return the inverse image of a set under + <fsummary>Return the inverse image of a set under a binary relation.</fsummary> <desc> <p>Returns the <seealso marker="#inverse_image">inverse - image</seealso> of <anno>Set1</anno> under the binary - relation <anno>BinRel</anno>.</p> + image</seealso> of <c><anno>Set1</anno></c> under the binary + relation 
<c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{1,a},{2,b},{2,c},{3,d}]),</input> <input>S1 = sofs:set([c,d,e]),</input> @@ -1022,42 +1115,46 @@ fun(S) -> sofs:partition(1, S) end [2,3]</pre> </desc> </func> + <func> <name name="is_a_function" arity="1"/> <fsummary>Test for a function.</fsummary> <desc> - <p>Returns <c>true</c> if the binary relation <anno>BinRel</anno> + <p>Returns <c>true</c> if the binary relation <c><anno>BinRel</anno></c> is a <seealso marker="#function">function</seealso> or the - untyped empty set, <c>false</c> otherwise.</p> + untyped empty set, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_disjoint" arity="2"/> <fsummary>Test for disjoint sets.</fsummary> <desc> - <p>Returns <c>true</c> if <anno>Set1</anno> - and <anno>Set2</anno> - are <seealso marker="#disjoint">disjoint</seealso>, <c>false</c> - otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Set1</anno></c> + and <c><anno>Set2</anno></c> + are <seealso marker="#disjoint">disjoint</seealso>, otherwise + <c>false</c>.</p> </desc> </func> + <func> <name name="is_empty_set" arity="1"/> <fsummary>Test for an empty set.</fsummary> <desc> - <p>Returns <c>true</c> if <anno>AnySet</anno> is an empty - unordered set, <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>AnySet</anno></c> is an empty + unordered set, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_equal" arity="2"/> <fsummary>Test two sets for equality.</fsummary> <desc> - <p>Returns <c>true</c> if the <anno>AnySet1</anno> - and <anno>AnySet2</anno> - are <seealso marker="#equal">equal</seealso>, <c>false</c> - otherwise. This example shows that <c>==/2</c> is used when - comparing sets for equality:</p> + <p>Returns <c>true</c> if <c><anno>AnySet1</anno></c> + and <c><anno>AnySet2</anno></c> + are <seealso marker="#equal">equal</seealso>, otherwise + <c>false</c>. 
The following example shows that <c>==/2</c> is + used when comparing sets for equality:</p> <pre> 1> <input>S1 = sofs:set([1.0]),</input> <input>S2 = sofs:set([1]),</input> @@ -1065,50 +1162,55 @@ fun(S) -> sofs:partition(1, S) end true</pre> </desc> </func> + <func> <name name="is_set" arity="1"/> <fsummary>Test for an unordered set.</fsummary> <desc> - <p>Returns <c>true</c> if <anno>AnySet</anno> is - an <seealso marker="#sets_definition">unordered set</seealso>, and - <c>false</c> if <anno>AnySet</anno> is an ordered set or an + <p>Returns <c>true</c> if <c><anno>AnySet</anno></c> is + an <seealso marker="#sets_definition">unordered set</seealso>, and + <c>false</c> if <c><anno>AnySet</anno></c> is an ordered set or an atomic set.</p> </desc> </func> + <func> <name name="is_sofs_set" arity="1"/> <fsummary>Test for an unordered set.</fsummary> <desc> - <p>Returns <c>true</c> if <anno>Term</anno> is + <p>Returns <c>true</c> if <c><anno>Term</anno></c> is an <seealso marker="#sets_definition">unordered set</seealso>, an - ordered set or an atomic set, <c>false</c> otherwise.</p> + ordered set, or an atomic set, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_subset" arity="2"/> <fsummary>Test two sets for subset.</fsummary> <desc> - <p>Returns <c>true</c> if <anno>Set1</anno> is - a <seealso marker="#subset">subset</seealso> - of <anno>Set2</anno>, <c>false</c> otherwise.</p> + <p>Returns <c>true</c> if <c><anno>Set1</anno></c> is + a <seealso marker="#subset">subset</seealso> + of <c><anno>Set2</anno></c>, otherwise <c>false</c>.</p> </desc> </func> + <func> <name name="is_type" arity="1"/> <fsummary>Test for a type.</fsummary> <desc> - <p>Returns <c>true</c> if the term <anno>Term</anno> is - a <seealso marker="#type">type</seealso>.</p> + <p>Returns <c>true</c> if term <c><anno>Term</anno></c> is + a <seealso marker="#type">type</seealso>.</p> </desc> </func> + <func> <name name="join" arity="4"/> <fsummary>Return the join of two 
relations.</fsummary> <desc> - <p>Returns the <seealso marker="#natural_join">natural - join</seealso> of the relations <anno>Relation1</anno> - and <anno>Relation2</anno> on coordinates <anno>I</anno> and - <anno>J</anno>.</p> + <p>Returns the <seealso marker="#natural_join">natural + join</seealso> of the relations <c><anno>Relation1</anno></c> + and <c><anno>Relation2</anno></c> on coordinates <c><anno>I</anno></c> + and <c><anno>J</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{a,x,1},{b,y,2}]),</input> <input>R2 = sofs:relation([{1,f,g},{1,h,i},{2,3,4}]),</input> @@ -1117,18 +1219,19 @@ true</pre> [{a,x,1,f,g},{a,x,1,h,i},{b,y,2,3,4}]</pre> </desc> </func> + <func> <name name="multiple_relative_product" arity="2"/> - <fsummary>Return the multiple relative product of a tuple of binary + <fsummary>Return the multiple relative product of a tuple of binary relations and a relation.</fsummary> <desc> - <p>If <anno>TupleOfBinRels</anno> is a non-empty tuple + <p>If <c><anno>TupleOfBinRels</anno></c> is a non-empty tuple {R[1], ..., R[n]} of binary relations - and <anno>BinRel1</anno> is a binary relation, - then <anno>BinRel2</anno> is - the <seealso marker="#multiple_relative_product">multiple relative - product</seealso> of the ordered set - (R[i], ..., R[n]) and <anno>BinRel1</anno>.</p> + and <c><anno>BinRel1</anno></c> is a binary relation, + then <c><anno>BinRel2</anno></c> is + the <seealso marker="#multiple_relative_product">multiple relative + product</seealso> of the ordered set + (R[i], ..., R[n]) and <c><anno>BinRel1</anno></c>.</p> <pre> 1> <input>Ri = sofs:relation([{a,1},{b,2},{c,3}]),</input> <input>R = sofs:relation([{a,b},{b,c},{c,a}]),</input> @@ -1137,22 +1240,24 @@ true</pre> [{1,2},{2,3},{3,1}]</pre> </desc> </func> + <func> <name name="no_elements" arity="1"/> <fsummary>Return the number of elements of a set.</fsummary> <desc> <p>Returns the number of elements of the ordered or unordered - set <anno>ASet</anno>.</p> + set 
<c><anno>ASet</anno></c>.</p> </desc> </func> + <func> <name name="partition" arity="1"/> <fsummary>Return the coarsest partition given a set of sets.</fsummary> <desc> - <p>Returns the <seealso marker="#partition">partition</seealso> of - the union of the set of sets <anno>SetOfSets</anno> such that two - elements are considered equal if they belong to the same - elements of <anno>SetOfSets</anno>.</p> + <p>Returns the <seealso marker="#partition">partition</seealso> of + the union of the set of sets <c><anno>SetOfSets</anno></c> such that + two elements are considered equal if they belong to the same + elements of <c><anno>SetOfSets</anno></c>.</p> <pre> 1> <input>Sets1 = sofs:from_term([[a,b,c],[d,e,f],[g,h,i]]),</input> <input>Sets2 = sofs:from_term([[b,c,d],[e,f,g],[h,i,j]]),</input> @@ -1161,13 +1266,14 @@ true</pre> [[a],[b,c],[d],[e,f],[g],[h,i],[j]]</pre> </desc> </func> + <func> <name name="partition" arity="2"/> <fsummary>Return a partition of a set.</fsummary> <desc> - <p>Returns the <seealso marker="#partition">partition</seealso> of - <anno>Set</anno> such that two elements are considered equal - if the results of applying <anno>SetFun</anno> are equal.</p> + <p>Returns the <seealso marker="#partition">partition</seealso> of + <c><anno>Set</anno></c> such that two elements are considered equal + if the results of applying <c><anno>SetFun</anno></c> are equal.</p> <pre> 1> <input>Ss = sofs:from_term([[a],[b],[c,d],[e,f]]),</input> <input>SetFun = fun(S) -> sofs:from_term(sofs:no_elements(S)) end,</input> @@ -1176,17 +1282,18 @@ true</pre> [[[a],[b]],[[c,d],[e,f]]]</pre> </desc> </func> + <func> <name name="partition" arity="3"/> <fsummary>Return a partition of a set.</fsummary> <desc> <p>Returns a pair of sets that, regarded as constituting a - set, forms a <seealso marker="#partition">partition</seealso> of - <anno>Set1</anno>. 
If the - result of applying <anno>SetFun</anno> to an element - of <anno>Set1</anno> yields an element in <anno>Set2</anno>, - the element belongs to <anno>Set3</anno>, otherwise the - element belongs to <anno>Set4</anno>.</p> + set, forms a <seealso marker="#partition">partition</seealso> of + <c><anno>Set1</anno></c>. If the + result of applying <c><anno>SetFun</anno></c> to an element of + <c><anno>Set1</anno></c> gives an element in <c><anno>Set2</anno></c>, + the element belongs to <c><anno>Set3</anno></c>, otherwise the + element belongs to <c><anno>Set4</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{2,b},{3,c}]),</input> <input>S = sofs:set([2,4,6]),</input> @@ -1194,23 +1301,23 @@ true</pre> <input>{sofs:to_external(R2),sofs:to_external(R3)}.</input> {[{2,b}],[{1,a},{3,c}]}</pre> <p><c>partition(F, S1, S2)</c> is equivalent to - <c>{restriction(F, S1, S2), + <c>{restriction(F, S1, S2), drestriction(F, S1, S2)}</c>.</p> </desc> </func> + <func> <name name="partition_family" arity="2"/> <fsummary>Return a family indexing a partition.</fsummary> <desc> - <p>Returns the <seealso marker="#family">family</seealso> - <anno>Family</anno> where the indexed set is - a <seealso marker="#partition">partition</seealso> - of <anno>Set</anno> such that two elements are considered - equal if the results of applying <anno>SetFun</anno> are the - same value i. This i is the index that <anno>Family</anno> - maps onto - the <seealso marker="#equivalence_class">equivalence - class</seealso>.</p> + <p>Returns <seealso marker="#family">family</seealso> + <c><anno>Family</anno></c> where the indexed set is + a <seealso marker="#partition">partition</seealso> + of <c><anno>Set</anno></c> such that two elements are considered + equal if the results of applying <c><anno>SetFun</anno></c> are the + same value i. 
This i is the index that <c><anno>Family</anno></c> + maps onto the <seealso marker="#equivalence_class">equivalence + class</seealso>.</p> <pre> 1> <input>S = sofs:relation([{a,a,a,a},{a,a,b,b},{a,b,b,b}]),</input> <input>SetFun = {external, fun({A,_,C,_}) -> {A,C} end},</input> @@ -1219,16 +1326,16 @@ true</pre> [{{a,a},[{a,a,a,a}]},{{a,b},[{a,a,b,b},{a,b,b,b}]}]</pre> </desc> </func> + <func> <name name="product" arity="1"/> <fsummary>Return the Cartesian product of a tuple of sets.</fsummary> <desc> - <p>Returns the <seealso marker="#Cartesian_product_tuple">Cartesian - product</seealso> of the non-empty tuple of sets - <anno>TupleOfSets</anno>. If (x[1], ..., x[n]) is - an element of the n-ary relation <anno>Relation</anno>, then - x[i] is drawn from element i - of <anno>TupleOfSets</anno>.</p> + <p>Returns the <seealso marker="#Cartesian_product_tuple">Cartesian + product</seealso> of the non-empty tuple of sets + <c><anno>TupleOfSets</anno></c>. If (x[1], ..., x[n]) is + an element of the n-ary relation <c><anno>Relation</anno></c>, then + x[i] is drawn from element i of <c><anno>TupleOfSets</anno></c>.</p> <pre> 1> <input>S1 = sofs:set([a,b]),</input> <input>S2 = sofs:set([1,2]),</input> @@ -1238,13 +1345,14 @@ true</pre> [{a,1,x},{a,1,y},{a,2,x},{a,2,y},{b,1,x},{b,1,y},{b,2,x},{b,2,y}]</pre> </desc> </func> + <func> <name name="product" arity="2"/> <fsummary>Return the Cartesian product of two sets.</fsummary> <desc> - <p>Returns the <seealso marker="#Cartesian_product">Cartesian - product</seealso> of <anno>Set1</anno> - and <anno>Set2</anno>.</p> + <p>Returns the <seealso marker="#Cartesian_product">Cartesian + product</seealso> of <c><anno>Set1</anno></c> + and <c><anno>Set2</anno></c>.</p> <pre> 1> <input>S1 = sofs:set([1,2]),</input> <input>S2 = sofs:set([a,b]),</input> @@ -1255,17 +1363,18 @@ true</pre> <c>product({S1, S2})</c>.</p> </desc> </func> + <func> <name name="projection" arity="2"/> <fsummary>Return a set of substituted elements.</fsummary> 
<desc> <p>Returns the set created by substituting each element of - <anno>Set1</anno> by the result of - applying <anno>SetFun</anno> to the element.</p> - <p>If <anno>SetFun</anno> is a number i >= 1 and - <anno>Set1</anno> is a relation, then the returned set is + <c><anno>Set1</anno></c> by the result of + applying <c><anno>SetFun</anno></c> to the element.</p> + <p>If <c><anno>SetFun</anno></c> is a number i >= 1 and + <c><anno>Set1</anno></c> is a relation, then the returned set is the <seealso marker="#projection">projection</seealso> of - <anno>Set1</anno> onto coordinate i.</p> + <c><anno>Set1</anno></c> onto coordinate i.</p> <pre> 1> <input>S1 = sofs:from_term([{1,a},{2,b},{3,a}]),</input> <input>S2 = sofs:projection(2, S1),</input> @@ -1273,12 +1382,13 @@ true</pre> [a,b]</pre> </desc> </func> + <func> <name name="range" arity="1"/> <fsummary>Return the range of a binary relation.</fsummary> <desc> <p>Returns the <seealso marker="#range">range</seealso> of the - binary relation <anno>BinRel</anno>.</p> + binary relation <c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{1,a},{1,b},{2,b},{2,c}]),</input> <input>S = sofs:range(R),</input> @@ -1286,6 +1396,7 @@ true</pre> [a,b,c]</pre> </desc> </func> + <func> <name name="relation" arity="1"/> <name name="relation" arity="2"/> @@ -1293,27 +1404,28 @@ true</pre> <desc> <p>Creates a <seealso marker="#relation">relation</seealso>. <c>relation(R, T)</c> is equivalent to - <c>from_term(R, T)</c>, if T is - a <seealso marker="#type">type</seealso> and the result is a - relation. If <anno>Type</anno> is an integer N, then - <c>[{atom, ..., atom}])</c>, where the size of the - tuple is N, is used as type of the relation. If no type is - explicitly given, the size of the first tuple of - <anno>Tuples</anno> is + <c>from_term(R, T)</c>, if T is + a <seealso marker="#type">type</seealso> and the result is a + relation. 
If <c><anno>Type</anno></c> is an integer N, then + <c>[{atom, ..., atom}])</c>, where the tuple size + is N, is used as type of the relation. If no type is + explicitly specified, the size of the first tuple of + <c><anno>Tuples</anno></c> is used if there is such a tuple. <c>relation([])</c> is equivalent to <c>relation([], 2)</c>.</p> </desc> </func> + <func> <name name="relation_to_family" arity="1"/> <fsummary>Create a family from a binary relation.</fsummary> <desc> - <p>Returns the <seealso marker="#family">family</seealso> - <anno>Family</anno> such that the index set is equal to - the <seealso marker="#domain">domain</seealso> of the binary - relation <anno>BinRel</anno>, and <anno>Family</anno>[i] is - the <seealso marker="#image">image</seealso> of the set of i - under <anno>BinRel</anno>.</p> + <p>Returns <seealso marker="#family">family</seealso> + <c><anno>Family</anno></c> such that the index set is equal to + the <seealso marker="#domain">domain</seealso> of the binary + relation <c><anno>BinRel</anno></c>, and <c><anno>Family</anno></c>[i] + is the <seealso marker="#image">image</seealso> of the set of i + under <c><anno>BinRel</anno></c>.</p> <pre> 1> <input>R = sofs:relation([{b,1},{c,2},{c,3}]),</input> <input>F = sofs:relation_to_family(R),</input> @@ -1321,20 +1433,21 @@ true</pre> [{b,[1]},{c,[2,3]}]</pre> </desc> </func> + <func> <name name="relative_product" arity="1"/> <name name="relative_product" arity="2" clause_i="1"/> <fsummary>Return the relative product of a list of binary relations - and a binary relation.</fsummary> + and a binary relation.</fsummary> <desc> - <p>If <anno>ListOfBinRels</anno> is a non-empty list + <p>If <c><anno>ListOfBinRels</anno></c> is a non-empty list [R[1], ..., R[n]] of binary relations and - <anno>BinRel1</anno> - is a binary relation, then <anno>BinRel2</anno> is the <seealso - marker="#tuple_relative_product">relative product</seealso> + <c><anno>BinRel1</anno></c> + is a binary relation, then 
<c><anno>BinRel2</anno></c> is the + <seealso marker="#tuple_relative_product">relative product</seealso> of the ordered set (R[i], ..., R[n]) and - <anno>BinRel1</anno>.</p> - <p>If <anno>BinRel1</anno> is omitted, the relation of equality + <c><anno>BinRel1</anno></c>.</p> + <p>If <c><anno>BinRel1</anno></c> is omitted, the relation of equality between the elements of the <seealso marker="#Cartesian_product_tuple">Cartesian product</seealso> of the ranges of R[i], @@ -1346,33 +1459,33 @@ true</pre> <input>R2 = sofs:relative_product([TR, R1]),</input> <input>sofs:to_external(R2).</input> [{1,{a,u}},{1,{aa,u}},{2,{b,v}}]</pre> - <p>Note that <c>relative_product([R1], R2)</c> is + <p>Notice that <c>relative_product([R1], R2)</c> is different from <c>relative_product(R1, R2)</c>; the - list of one element is not identified with the element - itself.</p> + list of one element is not identified with the element itself.</p> </desc> </func> + <func> <name name="relative_product" arity="2" clause_i="2"/> - <fsummary>Return the relative product of + <fsummary>Return the relative product of two binary relations.</fsummary> <desc> - <p><marker id="relprod_impl"></marker>Returns - the <seealso marker="#relative_product">relative - product</seealso> of the binary relations <anno>BinRel1</anno> - and <anno>BinRel2</anno>.</p> + <p>Returns the <seealso marker="#relative_product">relative + product</seealso> of the binary relations <c><anno>BinRel1</anno></c> + and <c><anno>BinRel2</anno></c>.</p> </desc> </func> + <func> <name name="relative_product1" arity="2"/> - <fsummary>Return the relative_product of + <fsummary>Return the relative_product of two binary relations.</fsummary> <desc> - <p>Returns the <seealso marker="#relative_product">relative - product</seealso> of - the <seealso marker="#converse">converse</seealso> of the - binary relation <anno>BinRel1</anno> and the binary - relation <anno>BinRel2</anno>.</p> + <p>Returns the <seealso marker="#relative_product">relative 
+ product</seealso> of + the <seealso marker="#converse">converse</seealso> of the + binary relation <c><anno>BinRel1</anno></c> and the binary + relation <c><anno>BinRel2</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{1,aa},{2,b}]),</input> <input>R2 = sofs:relation([{1,u},{2,v},{3,c}]),</input> @@ -1383,13 +1496,14 @@ true</pre> <c>relative_product(converse(R1), R2)</c>.</p> </desc> </func> + <func> <name name="restriction" arity="2"/> <fsummary>Return a restriction of a binary relation.</fsummary> <desc> <p>Returns the <seealso marker="#restriction">restriction</seealso> of - the binary relation <anno>BinRel1</anno> - to <anno>Set</anno>.</p> + the binary relation <c><anno>BinRel1</anno></c> + to <c><anno>Set</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,a},{2,b},{3,c}]),</input> <input>S = sofs:set([1,2,4]),</input> @@ -1398,13 +1512,14 @@ true</pre> [{1,a},{2,b}]</pre> </desc> </func> + <func> <name name="restriction" arity="3"/> <fsummary>Return a restriction of a set.</fsummary> <desc> - <p>Returns a subset of <anno>Set1</anno> containing those - elements that yield an element in <anno>Set2</anno> as the - result of applying <anno>SetFun</anno>.</p> + <p>Returns a subset of <c><anno>Set1</anno></c> containing those + elements that gives an element in <c><anno>Set2</anno></c> as the + result of applying <c><anno>SetFun</anno></c>.</p> <pre> 1> <input>S1 = sofs:relation([{1,a},{2,b},{3,c}]),</input> <input>S2 = sofs:set([b,c,d]),</input> @@ -1413,28 +1528,30 @@ true</pre> [{2,b},{3,c}]</pre> </desc> </func> + <func> <name name="set" arity="1"/> <name name="set" arity="2"/> <fsummary>Create a set of atoms or any type of sets.</fsummary> <desc> - <p>Creates an <seealso marker="#sets_definition">unordered - set</seealso>. <c>set(L, T)</c> is equivalent to + <p>Creates an <seealso marker="#sets_definition">unordered + set</seealso>. <c>set(L, T)</c> is equivalent to <c>from_term(L, T)</c>, if the result is an unordered set. 
If no <seealso marker="#type">type</seealso> is - explicitly given, <c>[atom]</c> is used as type of the set.</p> + explicitly specified, <c>[atom]</c> is used as the set type.</p> </desc> </func> + <func> <name name="specification" arity="2"/> <fsummary>Select a subset using a predicate.</fsummary> <desc> <p>Returns the set containing every element - of <anno>Set1</anno> for which <anno>Fun</anno> - returns <c>true</c>. If <anno>Fun</anno> is a tuple - <c>{external, Fun2}</c>, Fun2 is applied to the + of <c><anno>Set1</anno></c> for which <c><anno>Fun</anno></c> + returns <c>true</c>. If <c><anno>Fun</anno></c> is a tuple + <c>{external, Fun2}</c>, <c>Fun2</c> is applied to the <seealso marker="#external_set">external set</seealso> of - each element, otherwise <anno>Fun</anno> is applied to each + each element, otherwise <c><anno>Fun</anno></c> is applied to each element.</p> <pre> 1> <input>R1 = sofs:relation([{a,1},{b,2}]),</input> @@ -1445,14 +1562,15 @@ true</pre> [[{a,1},{b,2}]]</pre> </desc> </func> + <func> <name name="strict_relation" arity="1"/> - <fsummary>Return the strict relation corresponding to + <fsummary>Return the strict relation corresponding to a given relation.</fsummary> <desc> - <p>Returns the <seealso marker="#strict_relation">strict + <p>Returns the <seealso marker="#strict_relation">strict relation</seealso> corresponding to the binary - relation <anno>BinRel1</anno>.</p> + relation <c><anno>BinRel1</anno></c>.</p> <pre> 1> <input>R1 = sofs:relation([{1,1},{1,2},{2,1},{2,2}]),</input> <input>R2 = sofs:strict_relation(R1),</input> @@ -1460,13 +1578,14 @@ true</pre> [{1,2},{2,1}]</pre> </desc> </func> + <func> <name name="substitution" arity="2"/> <fsummary>Return a function with a given set as domain.</fsummary> <desc> <p>Returns a function, the domain of which - is <anno>Set1</anno>. The value of an element of the domain - is the result of applying <anno>SetFun</anno> to the + is <c><anno>Set1</anno></c>. 
The value of an element of the domain + is the result of applying <c><anno>SetFun</anno></c> to the element.</p> <pre> 1> <input>L = [{a,1},{b,2}].</input> @@ -1483,24 +1602,24 @@ true</pre> 1> <input>I = sofs:substitution(fun(A) -> A end, sofs:set([a,b,c])),</input> <input>sofs:to_external(I).</input> [{a,a},{b,b},{c,c}]</pre> - <p>Let SetOfSets be a set of sets and BinRel a binary - relation. The function that maps each element Set of - SetOfSets onto the <seealso marker="#image">image</seealso> - of Set under BinRel is returned by this function:</p> + <p>Let <c>SetOfSets</c> be a set of sets and <c>BinRel</c> a binary + relation. The function that maps each element <c>Set</c> of + <c>SetOfSets</c> onto the <seealso marker="#image">image</seealso> + of <c>Set</c> under <c>BinRel</c> is returned by the following + function:</p> <pre> images(SetOfSets, BinRel) -> Fun = fun(Set) -> sofs:image(BinRel, Set) end, sofs:substitution(Fun, SetOfSets).</pre> - <p>Here might be the place to reveal something that was more - or less stated before, namely that external unordered sets - are represented as sorted lists. As a consequence, creating - the image of a set under a relation R may traverse all + <p>External unordered sets are represented as sorted lists. So, + creating the image of a set under a relation R can traverse all elements of R (to that comes the sorting of results, the - image). In <c>images/2</c>, BinRel will be traversed once - for each element of SetOfSets, which may take too long. The - following efficient function could be used instead under the - assumption that the image of each element of SetOfSets under - BinRel is non-empty:</p> + image). In <seealso marker="#image/2"><c>image/2</c></seealso>, + <c>BinRel</c> is traversed once + for each element of <c>SetOfSets</c>, which can take too long. 
The + following efficient function can be used instead under the + assumption that the image of each element of <c>SetOfSets</c> under + <c>BinRel</c> is non-empty:</p> <pre> images2(SetOfSets, BinRel) -> CR = sofs:canonical_relation(SetOfSets), @@ -1508,13 +1627,14 @@ images2(SetOfSets, BinRel) -> sofs:relation_to_family(R).</pre> </desc> </func> + <func> <name name="symdiff" arity="2"/> <fsummary>Return the symmetric difference of two sets.</fsummary> <desc> - <p>Returns the <seealso marker="#symmetric_difference">symmetric + <p>Returns the <seealso marker="#symmetric_difference">symmetric difference</seealso> (or the Boolean sum) - of <anno>Set1</anno> and <anno>Set2</anno>.</p> + of <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c>.</p> <pre> 1> <input>S1 = sofs:set([1,2,3]),</input> <input>S2 = sofs:set([2,3,4]),</input> @@ -1523,68 +1643,81 @@ images2(SetOfSets, BinRel) -> [1,4]</pre> </desc> </func> + <func> <name name="symmetric_partition" arity="2"/> <fsummary>Return a partition of two sets.</fsummary> <desc> - <p>Returns a triple of sets: <anno>Set3</anno> contains the - elements of <anno>Set1</anno> that do not belong - to <anno>Set2</anno>; <anno>Set4</anno> contains the - elements of <anno>Set1</anno> that belong - to <anno>Set2</anno>; <anno>Set5</anno> contains the - elements of <anno>Set2</anno> that do not belong - to <anno>Set1</anno>.</p> + <p>Returns a triple of sets:</p> + <list type="bulleted"> + <item><c><anno>Set3</anno></c> contains the elements of + <c><anno>Set1</anno></c> that do not belong to + <c><anno>Set2</anno></c>. + </item> + <item><c><anno>Set4</anno></c> contains the elements of + <c><anno>Set1</anno></c> that belong to <c><anno>Set2</anno></c>. + </item> + <item><c><anno>Set5</anno></c> contains the elements of + <c><anno>Set2</anno></c> that do not belong to + <c><anno>Set1</anno></c>. 
+ </item> + </list> </desc> </func> + <func> <name name="to_external" arity="1"/> <fsummary>Return the elements of a set.</fsummary> <desc> - <p>Returns the <seealso marker="#external_set">external - set</seealso> of an atomic, ordered or unordered set.</p> + <p>Returns the <seealso marker="#external_set">external + set</seealso> of an atomic, ordered, or unordered set.</p> </desc> </func> + <func> <name name="to_sets" arity="1"/> - <fsummary>Return a list or a tuple of the elements of set.</fsummary> + <fsummary>Return a list or a tuple of the elements of a set.</fsummary> <desc> - <p>Returns the elements of the ordered set <anno>ASet</anno> + <p>Returns the elements of the ordered set <c><anno>ASet</anno></c> as a tuple of sets, and the elements of the unordered set - <anno>ASet</anno> as a sorted list of sets without + <c><anno>ASet</anno></c> as a sorted list of sets without duplicates.</p> </desc> </func> + <func> <name name="type" arity="1"/> <fsummary>Return the type of a set.</fsummary> <desc> <p>Returns the <seealso marker="#type">type</seealso> of an - atomic, ordered or unordered set.</p> + atomic, ordered, or unordered set.</p> </desc> </func> + <func> <name name="union" arity="1"/> <fsummary>Return the union of a set of sets.</fsummary> <desc> <p>Returns the <seealso marker="#union_n">union</seealso> of the - set of sets <anno>SetOfSets</anno>.</p> + set of sets <c><anno>SetOfSets</anno></c>.</p> </desc> </func> + <func> <name name="union" arity="2"/> <fsummary>Return the union of two sets.</fsummary> <desc> <p>Returns the <seealso marker="#union">union</seealso> of - <anno>Set1</anno> and <anno>Set2</anno>.</p> + <c><anno>Set1</anno></c> and <c><anno>Set2</anno></c>.</p> </desc> </func> + <func> <name name="union_of_family" arity="1"/> <fsummary>Return the union of a family.</fsummary> <desc> - <p>Returns the union of - the <seealso marker="#family">family</seealso> <anno>Family</anno>. 
- </p> + <p>Returns the union of <seealso marker="#family">family</seealso> + <c><anno>Family</anno></c>.</p> <pre> 1> <input>F = sofs:family([{a,[0,2,4]},{b,[0,1,2]},{c,[2,3]}]),</input> <input>S = sofs:union_of_family(F),</input> @@ -1592,16 +1725,17 @@ images2(SetOfSets, BinRel) -> [0,1,2,3,4]</pre> </desc> </func> + <func> <name name="weak_relation" arity="1"/> - <fsummary>Return the weak relation corresponding to + <fsummary>Return the weak relation corresponding to a given relation.</fsummary> <desc> <p>Returns a subset S of the <seealso marker="#weak_relation">weak relation</seealso> W - corresponding to the binary relation <anno>BinRel1</anno>. + corresponding to the binary relation <c><anno>BinRel1</anno></c>. Let F be the <seealso marker="#field">field</seealso> of - <anno>BinRel1</anno>. The + <c><anno>BinRel1</anno></c>. The subset S is defined so that x S y if x W y for some x in F and for some y in F.</p> <pre> @@ -1615,11 +1749,11 @@ images2(SetOfSets, BinRel) -> <section> <title>See Also</title> - <p><seealso marker="dict">dict(3)</seealso>, - <seealso marker="digraph">digraph(3)</seealso>, - <seealso marker="orddict">orddict(3)</seealso>, - <seealso marker="ordsets">ordsets(3)</seealso>, - <seealso marker="sets">sets(3)</seealso></p> + <p><seealso marker="dict"><c>dict(3)</c></seealso>, + <seealso marker="digraph"><c>digraph(3)</c></seealso>, + <seealso marker="orddict"><c>orddict(3)</c></seealso>, + <seealso marker="ordsets"><c>ordsets(3)</c></seealso>, + <seealso marker="sets"><c>sets(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/stdlib_app.xml b/lib/stdlib/doc/src/stdlib_app.xml index 5508be9c5d..f857cc394b 100644 --- a/lib/stdlib/doc/src/stdlib_app.xml +++ b/lib/stdlib/doc/src/stdlib_app.xml @@ -29,38 +29,38 @@ <rev></rev> </header> <app>STDLIB</app> - <appsummary>The STDLIB Application</appsummary> + <appsummary>The STDLIB application.</appsummary> <description> - <p>The STDLIB is mandatory in the sense that the 
minimal system - based on Erlang/OTP consists of Kernel and STDLIB. The STDLIB - application contains no services.</p> + <p>The STDLIB application is mandatory in the sense that the minimal + system based on Erlang/OTP consists of Kernel and STDLIB. + The STDLIB application contains no services.</p> </description> <section> <title>Configuration</title> <p>The following configuration parameters are defined for the STDLIB - application. See <c>app(4)</c> for more information about - configuration parameters.</p> + application. For more information about configuration parameters, see the + <seealso marker="kernel:app"><c>app(4)</c></seealso> module in Kernel.</p> + <taglist> <tag><c>shell_esc = icl | abort</c></tag> <item> - <p>This parameter can be used to alter the behaviour of - the Erlang shell when ^G is pressed.</p> + <p>Can be used to change the behavior of the Erlang shell when + <em>^G</em> is pressed.</p> </item> <tag><c>restricted_shell = module()</c></tag> <item> - <p>This parameter can be used to run the Erlang shell - in restricted mode.</p> + <p>Can be used to run the Erlang shell in restricted mode.</p> </item> <tag><c>shell_catch_exception = boolean()</c></tag> <item> - <p>This parameter can be used to set the exception handling - of the Erlang shell's evaluator process.</p> + <p>Can be used to set the exception handling of the evaluator process of + Erlang shell.</p> </item> <tag><c>shell_history_length = integer() >= 0</c></tag> <item> - <p>This parameter can be used to determine how many - commands are saved by the Erlang shell.</p> + <p>Can be used to determine how many commands are saved by the Erlang + shell.</p> </item> <tag><c>shell_prompt_func = {Mod, Func} | default</c></tag> <item> @@ -69,27 +69,26 @@ <item><c>Mod = atom()</c></item> <item><c>Func = atom()</c></item> </list> - <p>This parameter can be used to set a customized - Erlang shell prompt function.</p> + <p>Can be used to set a customized Erlang shell prompt function.</p> </item> 
<tag><c>shell_saved_results = integer() >= 0</c></tag> <item> - <p>This parameter can be used to determine how many - results are saved by the Erlang shell.</p> + <p>Can be used to determine how many results are saved by the Erlang + shell.</p> </item> <tag><c>shell_strings = boolean()</c></tag> <item> - <p>This parameter can be used to determine how the Erlang - shell outputs lists of integers.</p> + <p>Can be used to determine how the Erlang shell outputs lists of + integers.</p> </item> </taglist> </section> <section> <title>See Also</title> - <p><seealso marker="kernel:app">app(4)</seealso>, - <seealso marker="kernel:application">application(3)</seealso>, - <seealso marker="shell">shell(3)</seealso>, </p> + <p><seealso marker="kernel:app"><c>app(4)</c></seealso>, + <seealso marker="kernel:application"><c>application(3)</c></seealso>, + <seealso marker="shell">shell(3)</seealso></p> </section> </appref> diff --git a/lib/stdlib/doc/src/string.xml b/lib/stdlib/doc/src/string.xml index a9ecb60244..dddedf1132 100644 --- a/lib/stdlib/doc/src/string.xml +++ b/lib/stdlib/doc/src/string.xml @@ -24,306 +24,372 @@ <title>string</title> <prepared>Robert Virding</prepared> - <responsible>Bjarne Dacker</responsible> + <responsible>Bjarne Däcker</responsible> <docno>1</docno> <approved>Bjarne Däcker</approved> <checked></checked> - <date>96-09-28</date> + <date>1996-09-28</date> <rev>A</rev> - <file>string.sgml</file> + <file>string.xml</file> </header> <module>string</module> - <modulesummary>String Processing Functions</modulesummary> + <modulesummary>String processing functions.</modulesummary> <description> - <p>This module contains functions for string processing.</p> + <p>This module provides functions for string processing.</p> </description> + <funcs> <func> - <name name="len" arity="1"/> - <fsummary>Return the length of a string</fsummary> + <name name="centre" arity="2"/> + <name name="centre" arity="3"/> + <fsummary>Center a string.</fsummary> <desc> - <p>Returns 
the number of characters in the string.</p> + <p>Returns a string, where <c><anno>String</anno></c> is centered in the + string and surrounded by blanks or <c><anno>Character</anno></c>. + The resulting string has length <c><anno>Number</anno></c>.</p> </desc> </func> + <func> - <name name="equal" arity="2"/> - <fsummary>Test string equality</fsummary> + <name name="chars" arity="2"/> + <name name="chars" arity="3"/> + <fsummary>Returns a string consisting of numbers of characters.</fsummary> <desc> - <p>Tests whether two strings are equal. Returns <c>true</c> if - they are, otherwise <c>false</c>.</p> + <p>Returns a string consisting of <c><anno>Number</anno></c> characters + <c><anno>Character</anno></c>. Optionally, the string can end with + string <c><anno>Tail</anno></c>.</p> </desc> </func> + <func> - <name name="concat" arity="2"/> - <fsummary>Concatenate two strings</fsummary> + <name name="chr" arity="2"/> + <fsummary>Return the index of the first occurrence of + a character in a string.</fsummary> <desc> - <p>Concatenates two strings to form a new string. Returns the - new string.</p> + <p>Returns the index of the first occurrence of + <c><anno>Character</anno></c> in <c><anno>String</anno></c>. Returns + <c>0</c> if <c><anno>Character</anno></c> does not occur.</p> </desc> </func> + <func> - <name name="chr" arity="2"/> - <name name="rchr" arity="2"/> - <fsummary>Return the index of the first/last occurrence of<c>Character</c>in <c>String</c></fsummary> + <name name="concat" arity="2"/> + <fsummary>Concatenate two strings.</fsummary> <desc> - <p>Returns the index of the first/last occurrence of - <c><anno>Character</anno></c> in <c><anno>String</anno></c>. 
<c>0</c> is returned if <c><anno>Character</anno></c> does not - occur.</p> + <p>Concatenates <c><anno>String1</anno></c> and + <c><anno>String2</anno></c> to form a new string + <c><anno>String3</anno></c>, which is returned.</p> </desc> </func> + <func> - <name name="str" arity="2"/> - <name name="rstr" arity="2"/> - <fsummary>Find the index of a substring</fsummary> + <name name="copies" arity="2"/> + <fsummary>Copy a string.</fsummary> <desc> - <p>Returns the position where the first/last occurrence of - <c><anno>SubString</anno></c> begins in <c><anno>String</anno></c>. <c>0</c> is returned if <c><anno>SubString</anno></c> - does not exist in <c><anno>String</anno></c>. - For example:</p> - <code type="none"> -> string:str(" Hello Hello World World ", "Hello World"). -8 </code> + <p>Returns a string containing <c><anno>String</anno></c> repeated + <c><anno>Number</anno></c> times.</p> </desc> </func> + <func> - <name name="span" arity="2"/> <name name="cspan" arity="2"/> - <fsummary>Span characters at start of string</fsummary> + <fsummary>Span characters at start of a string.</fsummary> <desc> <p>Returns the length of the maximum initial segment of - <c><anno>String</anno></c>, which consists entirely of characters from (not - from) <c><anno>Chars</anno></c>.</p> - <p>For example:</p> + <c><anno>String</anno></c>, which consists entirely of characters + not from <c><anno>Chars</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> string:span("\t abcdef", " \t"). -5 > string:cspan("\t abcdef", " \t"). 
-0 </code> +0</code> </desc> </func> + <func> - <name name="substr" arity="2"/> - <name name="substr" arity="3"/> - <fsummary>Return a substring of <c>String</c></fsummary> + <name name="equal" arity="2"/> + <fsummary>Test string equality.</fsummary> + <desc> + <p>Returns <c>true</c> if <c><anno>String1</anno></c> and + <c><anno>String2</anno></c> are equal, otherwise <c>false</c>.</p> + </desc> + </func> + + <func> + <name name="join" arity="2"/> + <fsummary>Join a list of strings with separator.</fsummary> <desc> - <p>Returns a substring of <c><anno>String</anno></c>, starting at the - position <c><anno>Start</anno></c>, and ending at the end of the string or - at length <c><anno>Length</anno></c>.</p> - <p>For example:</p> + <p>Returns a string with the elements of <c><anno>StringList</anno></c> + separated by the string in <c><anno>Separator</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> substr("Hello World", 4, 5). -"lo Wo" </code> +> join(["one", "two", "three"], ", "). +"one, two, three"</code> </desc> </func> + <func> - <name name="tokens" arity="2"/> - <fsummary>Split string into tokens</fsummary> + <name name="left" arity="2"/> + <name name="left" arity="3"/> + <fsummary>Adjust left end of a string.</fsummary> <desc> - <p>Returns a list of tokens in <c><anno>String</anno></c>, separated by the - characters in <c><anno>SeparatorList</anno></c>.</p> - <p>For example:</p> + <p>Returns <c><anno>String</anno></c> with the length adjusted in + accordance with <c><anno>Number</anno></c>. The left margin is + fixed. If <c>length(<anno>String</anno>)</c> < + <c><anno>Number</anno></c>, then <c><anno>String</anno></c> is padded + with blanks or <c><anno>Character</anno></c>s.</p> + <p><em>Example:</em></p> <code type="none"> -> tokens("abc defxxghix jkl", "x "). -["abc", "def", "ghi", "jkl"] </code> - <p>Note that, as shown in the example above, two or more - adjacent separator characters in <c><anno>String</anno></c> - will be treated as one. 
That is, there will not be any empty - strings in the resulting list of tokens.</p> +> string:left("Hello",10,$.). +"Hello....."</code> </desc> </func> + <func> - <name name="join" arity="2"/> - <fsummary>Join a list of strings with separator</fsummary> + <name name="len" arity="1"/> + <fsummary>Return the length of a string.</fsummary> <desc> - <p>Returns a string with the elements of <c><anno>StringList</anno></c> - separated by the string in <c><anno>Separator</anno></c>.</p> - <p>For example:</p> - <code type="none"> -> join(["one", "two", "three"], ", "). -"one, two, three" </code> + <p>Returns the number of characters in <c><anno>String</anno></c>.</p> </desc> </func> + <func> - <name name="chars" arity="2"/> - <name name="chars" arity="3"/> - <fsummary>Returns a string consisting of numbers of characters</fsummary> + <name name="rchr" arity="2"/> + <fsummary>Return the index of the last occurrence of + a character in a string.</fsummary> <desc> - <p>Returns a string consisting of <c><anno>Number</anno></c> of characters - <c><anno>Character</anno></c>. Optionally, the string can end with the - string <c><anno>Tail</anno></c>.</p> + <p>Returns the index of the last occurrence of + <c><anno>Character</anno></c> in <c><anno>String</anno></c>. Returns + <c>0</c> if <c><anno>Character</anno></c> does not occur.</p> </desc> </func> + <func> - <name name="copies" arity="2"/> - <fsummary>Copy a string</fsummary> + <name name="right" arity="2"/> + <name name="right" arity="3"/> + <fsummary>Adjust right end of a string.</fsummary> <desc> - <p>Returns a string containing <c><anno>String</anno></c> repeated - <c><anno>Number</anno></c> times.</p> + <p>Returns <c><anno>String</anno></c> with the length adjusted in + accordance with <c><anno>Number</anno></c>. The right margin is + fixed. 
If the length of <c>(<anno>String</anno>)</c> < + <c><anno>Number</anno></c>, then <c><anno>String</anno></c> is padded + with blanks or <c><anno>Character</anno></c>s.</p> + <p><em>Example:</em></p> + <code type="none"> +> string:right("Hello", 10, $.). +".....Hello"</code> </desc> </func> + <func> - <name name="words" arity="1"/> - <name name="words" arity="2"/> - <fsummary>Count blank separated words</fsummary> + <name name="rstr" arity="2"/> + <fsummary>Find the index of a substring.</fsummary> <desc> - <p>Returns the number of words in <c><anno>String</anno></c>, separated by - blanks or <c><anno>Character</anno></c>.</p> - <p>For example:</p> + <p>Returns the position where the last occurrence of + <c><anno>SubString</anno></c> begins in <c><anno>String</anno></c>. + Returns <c>0</c> if <c><anno>SubString</anno></c> + does not exist in <c><anno>String</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> words(" Hello old boy!", $o). -4 </code> +> string:rstr(" Hello Hello World World ", "Hello World"). +8</code> </desc> </func> + <func> - <name name="sub_word" arity="2"/> - <name name="sub_word" arity="3"/> - <fsummary>Extract subword</fsummary> + <name name="span" arity="2"/> + <fsummary>Span characters at start of a string.</fsummary> <desc> - <p>Returns the word in position <c><anno>Number</anno></c> of <c><anno>String</anno></c>. - Words are separated by blanks or <c><anno>Character</anno></c>s.</p> - <p>For example:</p> + <p>Returns the length of the maximum initial segment of + <c><anno>String</anno></c>, which consists entirely of characters + from <c><anno>Chars</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> string:sub_word(" Hello old boy !",3,$o). -"ld b" </code> +> string:span("\t abcdef", " \t"). 
+5</code> </desc> </func> + + <func> + <name name="str" arity="2"/> + <fsummary>Find the index of a substring.</fsummary> + <desc> + <p>Returns the position where the first occurrence of + <c><anno>SubString</anno></c> begins in <c><anno>String</anno></c>. + Returns <c>0</c> if <c><anno>SubString</anno></c> + does not exist in <c><anno>String</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> +> string:str(" Hello Hello World World ", "Hello World"). +8</code> + </desc> + </func> + <func> <name name="strip" arity="1"/> <name name="strip" arity="2"/> <name name="strip" arity="3"/> - <fsummary>Strip leading or trailing characters</fsummary> + <fsummary>Strip leading or trailing characters.</fsummary> <desc> <p>Returns a string, where leading and/or trailing blanks or a number of <c><anno>Character</anno></c> have been removed. - <c><anno>Direction</anno></c> can be <c>left</c>, <c>right</c>, or - <c>both</c> and indicates from which direction blanks are to be - removed. The function <c>strip/1</c> is equivalent to + <c><anno>Direction</anno></c>, which can be <c>left</c>, <c>right</c>, + or <c>both</c>, indicates from which direction blanks are to be + removed. <c>strip/1</c> is equivalent to <c>strip(String, both)</c>.</p> - <p>For example:</p> + <p><em>Example:</em></p> <code type="none"> > string:strip("...Hello.....", both, $.). -"Hello" </code> +"Hello"</code> </desc> </func> + <func> - <name name="left" arity="2"/> - <name name="left" arity="3"/> - <fsummary>Adjust left end of string</fsummary> + <name name="sub_string" arity="2"/> + <name name="sub_string" arity="3"/> + <fsummary>Extract a substring.</fsummary> <desc> - <p>Returns the <c><anno>String</anno></c> with the length adjusted in - accordance with <c><anno>Number</anno></c>. The left margin is - fixed. 
If the <c>length(<anno>String</anno>)</c> < <c><anno>Number</anno></c>, - <c><anno>String</anno></c> is padded with blanks or <c><anno>Character</anno></c>s.</p> - <p>For example:</p> + <p>Returns a substring of <c><anno>String</anno></c>, starting at + position <c><anno>Start</anno></c> to the end of the string, or to + and including position <c><anno>Stop</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> string:left("Hello",10,$.). -"Hello....." </code> +sub_string("Hello World", 4, 8). +"lo Wo"</code> </desc> </func> + <func> - <name name="right" arity="2"/> - <name name="right" arity="3"/> - <fsummary>Adjust right end of string</fsummary> + <name name="substr" arity="2"/> + <name name="substr" arity="3"/> + <fsummary>Return a substring of a string.</fsummary> <desc> - <p>Returns the <c><anno>String</anno></c> with the length adjusted in - accordance with <c><anno>Number</anno></c>. The right margin is - fixed. If the length of <c>(<anno>String</anno>)</c> < <c><anno>Number</anno></c>, - <c><anno>String</anno></c> is padded with blanks or <c><anno>Character</anno></c>s.</p> - <p>For example:</p> + <p>Returns a substring of <c><anno>String</anno></c>, starting at + position <c><anno>Start</anno></c>, and ending at the end of the + string or at length <c><anno>Length</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> -> string:right("Hello", 10, $.). -".....Hello" </code> - </desc> - </func> - <func> - <name name="centre" arity="2"/> - <name name="centre" arity="3"/> - <fsummary>Center a string</fsummary> - <desc> - <p>Returns a string, where <c><anno>String</anno></c> is centred in the - string and surrounded by blanks or characters. The resulting - string will have the length <c><anno>Number</anno></c>.</p> +> substr("Hello World", 4, 5). 
+"lo Wo"</code> </desc> </func> + <func> - <name name="sub_string" arity="2"/> - <name name="sub_string" arity="3"/> - <fsummary>Extract a substring</fsummary> + <name name="sub_word" arity="2"/> + <name name="sub_word" arity="3"/> + <fsummary>Extract subword.</fsummary> <desc> - <p>Returns a substring of <c><anno>String</anno></c>, starting at the - position <c><anno>Start</anno></c> to the end of the string, or to and - including the <c><anno>Stop</anno></c> position.</p> - <p>For example:</p> + <p>Returns the word in position <c><anno>Number</anno></c> of + <c><anno>String</anno></c>. Words are separated by blanks or + <c><anno>Character</anno></c>s.</p> + <p><em>Example:</em></p> <code type="none"> -sub_string("Hello World", 4, 8). -"lo Wo" </code> +> string:sub_word(" Hello old boy !",3,$o). +"ld b"</code> </desc> </func> + <func> <name name="to_float" arity="1"/> - <fsummary>Returns a float whose text representation is the integers (ASCII values) in String.</fsummary> + <fsummary>Returns a float whose text representation is the integers + (ASCII values) in a string.</fsummary> <desc> - <p>Argument <c><anno>String</anno></c> is expected to start with a valid text - represented float (the digits being ASCII values). Remaining characters - in the string after the float are returned in <c><anno>Rest</anno></c>.</p> - <p>Example:</p> + <p>Argument <c><anno>String</anno></c> is expected to start with a + valid text represented float (the digits are ASCII values). + Remaining characters in the string after the float are returned in + <c><anno>Rest</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> - > {F1,Fs} = string:to_float("1.0-1.0e-1"), - > {F2,[]} = string:to_float(Fs), - > F1+F2. - 0.9 - > string:to_float("3/2=1.5"). - {error,no_float} - > string:to_float("-1.5eX"). - {-1.5,"eX"}</code> +> {F1,Fs} = string:to_float("1.0-1.0e-1"), +> {F2,[]} = string:to_float(Fs), +> F1+F2. +0.9 +> string:to_float("3/2=1.5"). 
+{error,no_float} +> string:to_float("-1.5eX"). +{-1.5,"eX"}</code> </desc> </func> + <func> <name name="to_integer" arity="1"/> - <fsummary>Returns an integer whose text representation is the integers (ASCII values) in String.</fsummary> + <fsummary>Returns an integer whose text representation is the integers + (ASCII values) in a string.</fsummary> <desc> - <p>Argument <c><anno>String</anno></c> is expected to start with a valid text - represented integer (the digits being ASCII values). Remaining characters - in the string after the integer are returned in <c><anno>Rest</anno></c>.</p> - <p>Example:</p> + <p>Argument <c><anno>String</anno></c> is expected to start with a + valid text represented integer (the digits are ASCII values). + Remaining characters in the string after the integer are returned in + <c><anno>Rest</anno></c>.</p> + <p><em>Example:</em></p> <code type="none"> - > {I1,Is} = string:to_integer("33+22"), - > {I2,[]} = string:to_integer(Is), - > I1-I2. - 11 - > string:to_integer("0.5"). - {0,".5"} - > string:to_integer("x=2"). - {error,no_integer}</code> +> {I1,Is} = string:to_integer("33+22"), +> {I2,[]} = string:to_integer(Is), +> I1-I2. +11 +> string:to_integer("0.5"). +{0,".5"} +> string:to_integer("x=2"). +{error,no_integer}</code> </desc> </func> + <func> <name name="to_lower" arity="1" clause_i="1"/> <name name="to_lower" arity="1" clause_i="2"/> <name name="to_upper" arity="1" clause_i="1"/> <name name="to_upper" arity="1" clause_i="2"/> - <fsummary>Convert case of string (ISO/IEC 8859-1)</fsummary> + <fsummary>Convert case of string (ISO/IEC 8859-1).</fsummary> <type variable="String" name_i="1"/> <type variable="Result" name_i="1"/> <type variable="Char"/> <type variable="CharResult"/> <desc> - <p>The given string or character is case-converted. Note that - the supported character set is ISO/IEC 8859-1 (a.k.a. Latin 1), - all values outside this set is unchanged</p> + <p>The specified string or character is case-converted. 
Notice that + the supported character set is ISO/IEC 8859-1 (also called Latin 1); + all values outside this set are unchanged.</p> + </desc> + </func> + + <func> + <name name="tokens" arity="2"/> + <fsummary>Split string into tokens.</fsummary> + <desc> + <p>Returns a list of tokens in <c><anno>String</anno></c>, separated + by the characters in <c><anno>SeparatorList</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> +> tokens("abc defxxghix jkl", "x "). +["abc", "def", "ghi", "jkl"]</code> + <p>Notice that, as shown in this example, two or more + adjacent separator characters in <c><anno>String</anno></c> + are treated as one. That is, there are no empty + strings in the resulting list of tokens.</p> + </desc> + </func> + + <func> + <name name="words" arity="1"/> + <name name="words" arity="2"/> + <fsummary>Count blank separated words.</fsummary> + <desc> + <p>Returns the number of words in <c><anno>String</anno></c>, separated + by blanks or <c><anno>Character</anno></c>.</p> + <p><em>Example:</em></p> + <code type="none"> +> words(" Hello old boy!", $o). +4</code> </desc> </func> </funcs> <section> <title>Notes</title> - <p>Some of the general string functions may seem to overlap each - other. The reason for this is that this string package is the - combination of two earlier packages and all the functions of - both packages have been retained. - </p> + <p>Some of the general string functions can seem to overlap each + other. 
The reason is that this string package is the + combination of two earlier packages and all functions of + both packages have been retained.</p> + <note> - <p>Any undocumented functions in <c>string</c> should not be used.</p> + <p>Any undocumented functions in <c>string</c> are not to be used.</p> </note> </section> </erlref> diff --git a/lib/stdlib/doc/src/supervisor.xml b/lib/stdlib/doc/src/supervisor.xml index 29e5a732d5..294196f746 100644 --- a/lib/stdlib/doc/src/supervisor.xml +++ b/lib/stdlib/doc/src/supervisor.xml @@ -29,124 +29,138 @@ <rev></rev> </header> <module>supervisor</module> - <modulesummary>Generic Supervisor Behaviour</modulesummary> + <modulesummary>Generic supervisor behavior.</modulesummary> <description> - <p>A behaviour module for implementing a supervisor, a process which + <p>This behavior module provides a supervisor, a process that supervises other processes called child processes. A child process can either be another supervisor or a worker process. Worker processes are normally implemented using one of the - <c>gen_event</c>, <c>gen_fsm</c>, <c>gen_statem</c> or <c>gen_server</c> - behaviours. A supervisor implemented using this module will have + <seealso marker="gen_event"><c>gen_event</c></seealso>, + <seealso marker="gen_fsm"><c>gen_fsm</c></seealso>, + <seealso marker="gen_server"><c>gen_server</c></seealso>, or + <seealso marker="gen_statem"><c>gen_statem</c></seealso> + behaviors. A supervisor implemented using this module has a standard set of interface functions and include functionality for tracing and error reporting. Supervisors are used to build a hierarchical process structure called a supervision tree, a - nice way to structure a fault tolerant application. Refer to - <em>OTP Design Principles</em> for more information.</p> + nice way to structure a fault-tolerant application. 
For more + information, see <seealso marker="doc/design_principles:sup_princ"> + Supervisor Behaviour</seealso> in OTP Design Principles.</p> + <p>A supervisor expects the definition of which child processes to supervise to be specified in a callback module exporting a - pre-defined set of functions.</p> - <p>Unless otherwise stated, all functions in this module will fail + predefined set of functions.</p> + + <p>Unless otherwise stated, all functions in this module fail if the specified supervisor does not exist or if bad arguments - are given.</p> + are specified.</p> </description> <section> + <marker id="supervision_princ"/> <title>Supervision Principles</title> - <p>The supervisor is responsible for starting, stopping and + <p>The supervisor is responsible for starting, stopping, and monitoring its child processes. The basic idea of a supervisor is - that it shall keep its child processes alive by restarting them + that it must keep its child processes alive by restarting them when necessary.</p> + <p>The children of a supervisor are defined as a list of <em>child specifications</em>. When the supervisor is started, the child processes are started in order from left to right according to this list. When the supervisor terminates, it first terminates its child processes in reversed start order, from right to left.</p> + <marker id="sup_flags"/> - <p>The properties of a supervisor are defined by the supervisor - flags. This is the type definition for the supervisor flags: - </p> - <pre>sup_flags() = #{strategy => strategy(), % optional + <p>The supervisor properties are defined by the supervisor flags. 
+ The type definition for the supervisor flags is as follows:</p> + + <pre> +sup_flags() = #{strategy => strategy(), % optional intensity => non_neg_integer(), % optional - period => pos_integer()} % optional - </pre> - <p>A supervisor can have one of the following <em>restart - strategies</em>, specified with the <c>strategy</c> key in the - above map: - </p> + period => pos_integer()} % optional</pre> + + <p>A supervisor can have one of the following <em>restart strategies</em> + specified with the <c>strategy</c> key in the above map:</p> + <list type="bulleted"> <item> - <p><c>one_for_one</c> - if one child process terminates and - should be restarted, only that child process is + <p><c>one_for_one</c> - If one child process terminates and + is to be restarted, only that child process is affected. This is the default restart strategy.</p> </item> <item> - <p><c>one_for_all</c> - if one child process terminates and - should be restarted, all other child processes are terminated + <p><c>one_for_all</c> - If one child process terminates and + is to be restarted, all other child processes are terminated and then all child processes are restarted.</p> </item> <item> - <p><c>rest_for_one</c> - if one child process terminates and - should be restarted, the 'rest' of the child processes -- - i.e. the child processes after the terminated child process - in the start order -- are terminated. Then the terminated + <p><c>rest_for_one</c> - If one child process terminates and + is to be restarted, the 'rest' of the child processes (that + is, the child processes after the terminated child process + in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.</p> </item> <item> - <p><c>simple_one_for_one</c> - a simplified <c>one_for_one</c> + <p><c>simple_one_for_one</c> - A simplified <c>one_for_one</c> supervisor, where all child processes are dynamically added - instances of the same process type, i.e. 
running the same + instances of the same process type, that is, running the same code.</p> - <p>The functions <c>delete_child/2</c> - and <c>restart_child/2</c> are invalid for - <c>simple_one_for_one</c> supervisors and will return + <p>Functions + <seealso marker="#delete_child/2"><c>delete_child/2</c></seealso> and + <seealso marker="#restart_child/2"><c>restart_child/2</c></seealso> + are invalid for <c>simple_one_for_one</c> supervisors and return <c>{error,simple_one_for_one}</c> if the specified supervisor uses this restart strategy.</p> - <p>The function <c>terminate_child/2</c> can be used for + <p>Function <seealso marker="#terminate_child/2"> + <c>terminate_child/2</c></seealso> can be used for children under <c>simple_one_for_one</c> supervisors by - giving the child's <c>pid()</c> as the second argument. If + specifying the child's <c>pid()</c> as the second argument. If instead the child specification identifier is used, - <c>terminate_child/2</c> will return + <c>terminate_child/2</c> returns <c>{error,simple_one_for_one}</c>.</p> - <p>Because a <c>simple_one_for_one</c> supervisor could have + <p>As a <c>simple_one_for_one</c> supervisor can have many children, it shuts them all down asynchronously. This - means that the children will do their cleanup in parallel, + means that the children do their cleanup in parallel, and therefore the order in which they are stopped is not defined.</p> </item> </list> + <p>To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a <em>maximum restart intensity</em> is defined using two integer values specified - with the <c>intensity</c> and <c>period</c> keys in the above + with keys <c>intensity</c> and <c>period</c> in the above map. Assuming the values <c>MaxR</c> for <c>intensity</c> - and <c>MaxT</c> for <c>period</c>, then if more than <c>MaxR</c> - restarts occur within <c>MaxT</c> seconds, the supervisor will - terminate all child processes and then itself. 
The default value - for <c>intensity</c> is <c>1</c>, and the default value - for <c>period</c> is <c>5</c>. - </p> + and <c>MaxT</c> for <c>period</c>, then, if more than <c>MaxR</c> + restarts occur within <c>MaxT</c> seconds, the supervisor + terminates all child processes and then itself. <c>intensity</c> + defaults to <c>1</c> and <c>period</c> defaults to <c>5</c>.</p> + <marker id="child_spec"/> - <p>This is the type definition of a child specification:</p> - <pre>child_spec() = #{id => child_id(), % mandatory + <p>The type definition of a child specification is as follows:</p> + + <pre> +child_spec() = #{id => child_id(), % mandatory start => mfargs(), % mandatory restart => restart(), % optional shutdown => shutdown(), % optional type => worker(), % optional modules => modules()} % optional</pre> + <p>The old tuple format is kept for backwards compatibility, see <seealso marker="#type-child_spec">child_spec()</seealso>, - but the map is preferred. - </p> + but the map is preferred.</p> + <list type="bulleted"> <item> <p><c>id</c> is used to identify the child specification internally by the supervisor.</p> <p>The <c>id</c> key is mandatory.</p> - <p>Note that this identifier on occasions has been called - "name". As far as possible, the terms "identifier" or "id" - are now used but in order to keep backwards compatibility, - some occurrences of "name" can still be found, for example - in error messages.</p> + <p>Notice that this identifier on occasions has been called + "name". 
As far as possible, the terms "identifier" or "id" + are now used but to keep backward compatibility, + some occurrences of "name" can still be found, for example + in error messages.</p> </item> <item> <p><c>start</c> defines the function call used to start the @@ -154,84 +168,86 @@ tuple <c>{M,F,A}</c> used as <c>apply(M,F,A)</c>.</p> <p>The start function <em>must create and link to</em> the child process, and must return <c>{ok,Child}</c> or - <c>{ok,Child,Info}</c> where <c>Child</c> is the pid of - the child process and <c>Info</c> an arbitrary term which is + <c>{ok,Child,Info}</c>, where <c>Child</c> is the pid of + the child process and <c>Info</c> any term that is ignored by the supervisor.</p> <p>The start function can also return <c>ignore</c> if the child process for some reason cannot be started, in which case - the child specification will be kept by the supervisor - (unless it is a temporary child) but the non-existing child - process will be ignored.</p> - <p>If something goes wrong, the function may also return an + the child specification is kept by the supervisor + (unless it is a temporary child) but the non-existing child + process is ignored.</p> + <p>If something goes wrong, the function can also return an error tuple <c>{error,Error}</c>.</p> - <p>Note that the <c>start_link</c> functions of the different - behaviour modules fulfill the above requirements.</p> - <p>The <c>start</c> key is mandatory.</p> + <p>Notice that the <c>start_link</c> functions of the different + behavior modules fulfill the above requirements.</p> + <p>The <c>start</c> key is mandatory.</p> </item> <item> <p><c>restart</c> defines when a terminated child process - shall be restarted. A <c>permanent</c> child process will - always be restarted, a <c>temporary</c> child process will - never be restarted (even when the supervisor's restart strategy + must be restarted. A <c>permanent</c> child process is + always restarted. 
A <c>temporary</c> child process is + never restarted (even when the supervisor's restart strategy is <c>rest_for_one</c> or <c>one_for_all</c> and a sibling's - death causes the temporary process to be terminated) and a - <c>transient</c> child process will be restarted only if - it terminates abnormally, i.e. with another exit reason - than <c>normal</c>, <c>shutdown</c> or <c>{shutdown,Term}</c>.</p> - <p>The <c>restart</c> key is optional. If it is not given, the - default value <c>permanent</c> will be used.</p> + death causes the temporary process to be terminated). + A <c>transient</c> child process is restarted only if + it terminates abnormally, that is, with another exit reason + than <c>normal</c>, <c>shutdown</c>, or <c>{shutdown,Term}</c>.</p> + <p>The <c>restart</c> key is optional. If it is not specified, + it defaults to <c>permanent</c>.</p> </item> <item> - <p><c>shutdown</c> defines how a child process shall be - terminated. <c>brutal_kill</c> means the child process will - be unconditionally terminated using <c>exit(Child,kill)</c>. - An integer timeout value means that the supervisor will tell + <p><c>shutdown</c> defines how a child process must be + terminated. <c>brutal_kill</c> means that the child process + is unconditionally terminated using <c>exit(Child,kill)</c>. + An integer time-out value means that the supervisor tells the child process to terminate by calling <c>exit(Child,shutdown)</c> and then wait for an exit signal - with reason <c>shutdown</c> back from the child process. If - no exit signal is received within the specified number of milliseconds, + with reason <c>shutdown</c> back from the child process. 
If no + exit signal is received within the specified number of milliseconds, the child process is unconditionally terminated using <c>exit(Child,kill)</c>.</p> <p>If the child process is another supervisor, the shutdown time - should be set to <c>infinity</c> to give the subtree ample + is to be set to <c>infinity</c> to give the subtree ample time to shut down. It is also allowed to set it to <c>infinity</c>, if the child process is a worker.</p> <warning> <p>Be careful when setting the shutdown time to - <c>infinity</c> when the child process is a worker. Because, in this - situation, the termination of the supervision tree depends on the - child process, it must be implemented in a safe way and its cleanup - procedure must always return.</p> + <c>infinity</c> when the child process is a worker. Because, in this + situation, the termination of the supervision tree depends on the + child process, it must be implemented in a safe way and its cleanup + procedure must always return.</p> </warning> - <p>Note that all child processes implemented using the standard - OTP behaviour modules automatically adhere to the shutdown + <p>Notice that all child processes implemented using the standard + OTP behavior modules automatically adhere to the shutdown protocol.</p> - <p>The <c>shutdown</c> key is optional. If it is not given, - the default value <c>5000</c> will be used if the child is - of type <c>worker</c>; and <c>infinity</c> will be used if - the child is of type <c>supervisor</c>.</p> + <p>The <c>shutdown</c> key is optional. If it is not specified, + it defaults to <c>5000</c> if the child is + of type <c>worker</c> and it defaults to <c>infinity</c> if + the child is of type <c>supervisor</c>.</p> </item> <item> <p><c>type</c> specifies if the child process is a supervisor or a worker.</p> - <p>The <c>type</c> key is optional. If it is not given, the - default value <c>worker</c> will be used.</p> + <p>The <c>type</c> key is optional. 
If it is not specified, + it defaults to <c>worker</c>.</p> </item> <item> <p><c>modules</c> is used by the release handler during code replacement to determine which processes are using a certain module. As a rule of thumb, if the child process is a <c>supervisor</c>, <c>gen_server</c>, - <c>gen_fsm</c> or <c>gen_statem</c> - this should be a list with one element <c>[Module]</c>, - where <c>Module</c> is the callback module. If the child - process is an event manager (<c>gen_event</c>) with a - dynamic set of callback modules, the value <c>dynamic</c> - shall be used. See <em>OTP Design Principles</em> for more - information about release handling.</p> - <p>The <c>modules</c> key is optional. If it is not given, it - defaults to <c>[M]</c>, where <c>M</c> comes from the - child's start <c>{M,F,A}</c></p> + <c>gen_statem</c>, or <c>gen_fsm</c>, + this is to be a list with one element <c>[Module]</c>, + where <c>Module</c> is the callback module. If the child + process is an event manager (<c>gen_event</c>) with a + dynamic set of callback modules, value <c>dynamic</c> + must be used. For more information about release handling, see + <seealso marker="doc/design_principles:release_handling"> + Release Handling</seealso> + in OTP Design Principles.</p> + <p>The <c>modules</c> key is optional. If it is not specified, it + defaults to <c>[M]</c>, where <c>M</c> comes from the + child's start <c>{M,F,A}</c>.</p> </item> <item> <p>Internally, the supervisor also keeps track of the pid @@ -240,6 +256,7 @@ </item> </list> </section> + <datatypes> <datatype> <name name="child"/> @@ -250,20 +267,18 @@ </datatype> <datatype> <name name="child_spec"/> - <desc><p>The tuple format is kept for backwards compatibility - only. A map is preferred; see more details - <seealso marker="#child_spec">above</seealso>.</p></desc> + <desc><p>The tuple format is kept for backward compatibility + only. 
A map is preferred; see more details + <seealso marker="#child_spec">above</seealso>.</p></desc> </datatype> <datatype> <name name="mfargs"/> - <desc> - <p>The value <c>undefined</c> for <c><anno>A</anno></c> (the - argument list) is only to be used internally - in <c>supervisor</c>. If the restart type of the child - is <c>temporary</c>, then the process is never to be - restarted and therefore there is no need to store the real - argument list. The value <c>undefined</c> will then be - stored instead.</p> + <desc><p>Value <c>undefined</c> for <c><anno>A</anno></c> (the + argument list) is only to be used internally + in <c>supervisor</c>. If the restart type of the child + is <c>temporary</c>, the process is never to be + restarted and therefore there is no need to store the real + argument list. Value <c>undefined</c> is then stored instead.</p> </desc> </datatype> <datatype> @@ -280,9 +295,9 @@ </datatype> <datatype> <name name="sup_flags"/> - <desc><p>The tuple format is kept for backwards compatibility - only. A map is preferred; see more details - <seealso marker="#sup_flags">above</seealso>.</p></desc> + <desc><p>The tuple format is kept for backward compatibility + only. A map is preferred; see more details + <seealso marker="#sup_flags">above</seealso>.</p></desc> </datatype> <datatype> <name name="sup_ref"/> @@ -291,307 +306,355 @@ <name name="worker"/> </datatype> </datatypes> + <funcs> <func> - <name name="start_link" arity="2"/> - <name name="start_link" arity="3"/> - <fsummary>Create a supervisor process.</fsummary> - <type name="startlink_ret"/> - <type name="startlink_err"/> - <type name="sup_name"/> + <name name="check_childspecs" arity="1"/> + <fsummary>Check if children specifications are syntactically correct. + </fsummary> <desc> - <p>Creates a supervisor process as part of a supervision tree. 
- The function will, among other things, ensure that - the supervisor is linked to the calling process (its - supervisor).</p> - <p>The created supervisor process calls <c><anno>Module</anno>:init/1</c> to - find out about restart strategy, maximum restart intensity - and child processes. To ensure a synchronized start-up - procedure, <c>start_link/2,3</c> does not return until - <c><anno>Module</anno>:init/1</c> has returned and all child processes - have been started.</p> - <p>If <c><anno>SupName</anno>={local,Name}</c>, the supervisor is registered - locally as <c>Name</c> using <c>register/2</c>. If - <c><anno>SupName</anno>={global,Name}</c> the supervisor is registered - globally as <c>Name</c> using <c>global:register_name/2</c>. If - <c><anno>SupName</anno>={via,<anno>Module</anno>,<anno>Name</anno>}</c> the supervisor - is registered as <c>Name</c> using the registry represented by - <c>Module</c>. The <c>Module</c> callback must export the functions - <c>register_name/2</c>, <c>unregister_name/1</c> and <c>send/2</c>, - which shall behave like the corresponding functions in <c>global</c>. - Thus, <c>{via,global,<anno>Name</anno>}</c> is a valid reference.</p> - <p>If no name is provided, the supervisor is not registered.</p> - <p><c><anno>Module</anno></c> is the name of the callback module.</p> - <p><c><anno>Args</anno></c> is an arbitrary term which is passed as - the argument to <c><anno>Module</anno>:init/1</c>.</p> - <p>If the supervisor and its child processes are successfully - created (i.e. if all child process start functions return - <c>{ok,Child}</c>, <c>{ok,Child,Info}</c>, or <c>ignore</c>), - the function returns <c>{ok,Pid}</c>, where <c>Pid</c> is - the pid of the supervisor. 
If there already exists a process - with the specified <c><anno>SupName</anno></c>, the function returns - <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is - the pid of that process.</p> - <p>If <c><anno>Module</anno>:init/1</c> returns <c>ignore</c>, this function - returns <c>ignore</c> as well, and the supervisor terminates - with reason <c>normal</c>. - If <c><anno>Module</anno>:init/1</c> fails or returns an incorrect value, - this function returns <c>{error,Term}</c> where <c>Term</c> - is a term with information about the error, and the supervisor - terminates with reason <c>Term</c>.</p> - <p>If any child process start function fails or returns an error - tuple or an erroneous value, the supervisor will first terminate - all already started child processes with reason <c>shutdown</c> - and then terminate itself and return - <c>{error, {shutdown, Reason}}</c>.</p> + <p>Takes a list of child specification as argument + and returns <c>ok</c> if all of them are syntactically + correct, otherwise <c>{error,<anno>Error</anno>}</c>.</p> </desc> </func> + <func> - <name name="start_child" arity="2"/> - <fsummary>Dynamically add a child process to a supervisor.</fsummary> - <type name="startchild_ret"/> - <type name="startchild_err"/> + <name name="count_children" arity="1"/> + <fsummary>Return counts for the number of child specifications, + active children, supervisors, and workers.</fsummary> <desc> - <p>Dynamically adds a child specification to the supervisor - <c><anno>SupRef</anno></c> which starts the corresponding child process.</p> - <p><marker id="SupRef"/><c><anno>SupRef</anno></c> can be:</p> + <p>Returns a property list (see <seealso marker="proplists"> + <c>proplists</c></seealso>) containing the + counts for each of the following elements of the supervisor's + child specifications and managed processes:</p> <list type="bulleted"> - <item>the pid,</item> - <item><c>Name</c>, if the supervisor is locally registered,</item> - 
<item><c>{Name,Node}</c>, if the supervisor is locally - registered at another node, or</item> - <item><c>{global,Name}</c>, if the supervisor is globally - registered.</item> - <item><c>{via,Module,Name}</c>, if the supervisor is registered - through an alternative process registry.</item> + <item> + <p><c>specs</c> - The total count of children, dead or alive.</p> + </item> + <item> + <p><c>active</c> - The count of all actively running child + processes managed by this supervisor. For a + <c>simple_one_for_one</c> supervisors, no check is done to ensure + that each child process is still alive, although the result + provided here is likely to be very + accurate unless the supervisor is heavily overloaded.</p> + </item> + <item> + <p><c>supervisors</c> - The count of all children marked as + <c>child_type = supervisor</c> in the specification list, + regardless if the child process is still alive.</p> + </item> + <item> + <p><c>workers</c> - The count of all children marked as + <c>child_type = worker</c> in the specification list, + regardless if the child process is still alive.</p> + </item> </list> - <p><c><anno>ChildSpec</anno></c> must be a valid child specification - (unless the supervisor is a <c>simple_one_for_one</c> - supervisor; see below). The child process will be started by - using the start function as defined in the child - specification.</p> - <p>In the case of a <c>simple_one_for_one</c> supervisor, - the child specification defined in <c>Module:init/1</c> will - be used, and <c><anno>ChildSpec</anno></c> shall instead be an arbitrary - list of terms <c><anno>List</anno></c>. The child process will then be - started by appending <c><anno>List</anno></c> to the existing start - function arguments, i.e. 
by calling - <c>apply(M, F, A++<anno>List</anno>)</c> where <c>{M,F,A}</c> is the start - function defined in the child specification.</p> - <p>If there already exists a child specification with - the specified identifier, <c><anno>ChildSpec</anno></c> is discarded, and - the function returns <c>{error,already_present}</c> or - <c>{error,{already_started,<anno>Child</anno>}}</c>, depending on if - the corresponding child process is running or not.</p> - <p>If the child process start function returns <c>{ok,<anno>Child</anno>}</c> - or <c>{ok,<anno>Child</anno>,<anno>Info</anno>}</c>, the child specification and pid are - added to the supervisor and the function returns the same - value.</p> - <p>If the child process start function returns <c>ignore</c>, - the child specification is added to the supervisor (unless the - supervisor is a <c>simple_one_for_one</c> supervisor, see below), - the pid is set to <c>undefined</c> and the function returns - <c>{ok,undefined}</c>. - </p> - <p>In the case of a <c>simple_one_for_one</c> supervisor, when a child - process start function returns <c>ignore</c> the functions returns - <c>{ok,undefined}</c> and no child is added to the supervisor. - </p> - <p>If the child process start function returns an error tuple or - an erroneous value, or if it fails, the child specification is - discarded, and the function returns <c>{error,Error}</c> where - <c>Error</c> is a term containing information about the error - and child specification.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> </desc> </func> - <func> - <name name="terminate_child" arity="2"/> - <fsummary>Terminate a child process belonging to a supervisor.</fsummary> - <desc> - <p>Tells the supervisor <c><anno>SupRef</anno></c> to terminate the given - child.</p> - - <p>If the supervisor is not <c>simple_one_for_one</c>, - <c><anno>Id</anno></c> must be the child specification - identifier. 
The process, if there is one, is terminated and, - unless it is a temporary child, the child specification is - kept by the supervisor. The child process may later be - restarted by the supervisor. The child process can also be - restarted explicitly by calling - <c>restart_child/2</c>. Use <c>delete_child/2</c> to remove - the child specification.</p> - - <p>If the child is temporary, the child specification is deleted as - soon as the process terminates. This means - that <c>delete_child/2</c> has no meaning, - and <c>restart_child/2</c> can not be used for these - children.</p> - <p>If the supervisor is <c>simple_one_for_one</c>, <c><anno>Id</anno></c> - must be the child process' <c>pid()</c>. If the specified - process is alive, but is not a child of the given - supervisor, the function will return - <c>{error,not_found}</c>. If the child specification - identifier is given instead of a <c>pid()</c>, the - function will return <c>{error,simple_one_for_one}</c>.</p> - <p>If successful, the function returns <c>ok</c>. If there is - no child specification with the specified <c><anno>Id</anno></c>, the - function returns <c>{error,not_found}</c>.</p> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> - for a description of <c><anno>SupRef</anno></c>.</p> - </desc> - </func> <func> <name name="delete_child" arity="2"/> <fsummary>Delete a child specification from a supervisor.</fsummary> <desc> - <p>Tells the supervisor <c><anno>SupRef</anno></c> to delete the child - specification identified by <c><anno>Id</anno></c>. The corresponding child - process must not be running. Use <c>terminate_child/2</c> to - terminate it.</p> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> - for a description of <c><anno>SupRef</anno></c>.</p> + <p>Tells supervisor <c><anno>SupRef</anno></c> to delete the child + specification identified by <c><anno>Id</anno></c>. The corresponding + child process must not be running. 
Use + <seealso marker="#terminate_child/2"> + <c>terminate_child/2</c></seealso> to terminate it.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> <p>If successful, the function returns <c>ok</c>. If the child - specification identified by <c><anno>Id</anno></c> exists but - the corresponding child process is running or about to be restarted, - the function returns <c>{error,running}</c> or - <c>{error,restarting}</c>, respectively. If the child specification + specification identified by <c><anno>Id</anno></c> exists but the + corresponding child process is running or is about to be restarted, + the function returns <c>{error,running}</c> or + <c>{error,restarting}</c>, respectively. If the child specification identified by <c><anno>Id</anno></c> does not exist, the function - returns <c>{error,not_found}</c>.</p> + returns <c>{error,not_found}</c>.</p> + </desc> + </func> + + <func> + <name name="get_childspec" arity="2"/> + <fsummary>Return the child specification map for the specified + child.</fsummary> + <desc> + <p>Returns the child specification map for the child identified + by <c>Id</c> under supervisor <c>SupRef</c>. The returned + map contains all keys, both mandatory and optional.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> </desc> </func> + <func> <name name="restart_child" arity="2"/> - <fsummary>Restart a terminated child process belonging to a supervisor.</fsummary> + <fsummary>Restart a terminated child process belonging to a supervisor. + </fsummary> <desc> - <p>Tells the supervisor <c><anno>SupRef</anno></c> to restart + <p>Tells supervisor <c><anno>SupRef</anno></c> to restart a child process corresponding to the child specification identified by <c><anno>Id</anno></c>. 
The child specification must exist, and the corresponding child process must not be running.</p> - <p>Note that for temporary children, the child specification - is automatically deleted when the child terminates; thus - it is not possible to restart such children.</p> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> - for a description of <c>SupRef</c>.</p> + <p>Notice that for temporary children, the child specification + is automatically deleted when the child terminates; thus, + it is not possible to restart such children.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> <p>If the child specification identified by <c><anno>Id</anno></c> does not exist, the function returns <c>{error,not_found}</c>. If the child specification exists but the corresponding process is already running, the - function returns - <c>{error,running}</c>.</p> + function returns <c>{error,running}</c>.</p> <p>If the child process start function returns <c>{ok,<anno>Child</anno>}</c> or <c>{ok,<anno>Child</anno>,<anno>Info</anno>}</c>, the pid is added to the supervisor and the function returns the same value.</p> <p>If the child process start function returns <c>ignore</c>, - the pid remains set to <c>undefined</c>, and the function + the pid remains set to <c>undefined</c> and the function returns <c>{ok,undefined}</c>.</p> <p>If the child process start function returns an error tuple or an erroneous value, or if it fails, the function returns - <c>{error,<anno>Error</anno>}</c> + <c>{error,<anno>Error</anno>}</c>, where <c><anno>Error</anno></c> is a term containing information about the error.</p> </desc> </func> + <func> - <name name="which_children" arity="1"/> - <fsummary>Return information about all children specifications and - child processes belonging to a supervisor.</fsummary> + <name name="start_child" arity="2"/> + <fsummary>Dynamically add a child process to a 
supervisor.</fsummary> + <type name="startchild_ret"/> + <type name="startchild_err"/> <desc> - <p>Returns a newly created list with information about all child - specifications and child processes belonging to - the supervisor <c><anno>SupRef</anno></c>.</p> - <p>Note that calling this function when supervising a large - number of children under low memory conditions can cause an - out of memory exception.</p> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> for a description of - <c>SupRef</c>.</p> - <p>The information given for each child specification/process - is:</p> + <p>Dynamically adds a child specification to supervisor + <c><anno>SupRef</anno></c>, which starts the corresponding child + process.</p> + <p><marker id="SupRef"/><c><anno>SupRef</anno></c> can be any of the + following:</p> + <list type="bulleted"> + <item>The pid</item> + <item><c>Name</c>, if the supervisor is locally registered</item> + <item><c>{Name,Node}</c>, if the supervisor is locally + registered at another node</item> + <item><c>{global,Name}</c>, if the supervisor is globally + registered</item> + <item><c>{via,Module,Name}</c>, if the supervisor is registered + through an alternative process registry</item> + </list> + <p><c><anno>ChildSpec</anno></c> must be a valid child specification + (unless the supervisor is a <c>simple_one_for_one</c> + supervisor; see below). The child process is started by + using the start function as defined in the child specification.</p> + <p>For a <c>simple_one_for_one</c> supervisor, + the child specification defined in <c>Module:init/1</c> is used, + and <c><anno>ChildSpec</anno></c> must instead be an arbitrary + list of terms <c><anno>List</anno></c>. 
The child process is then + started by appending <c><anno>List</anno></c> to the existing start + function arguments, that is, by calling + <c>apply(M, F, A++<anno>List</anno>)</c>, where <c>{M,F,A}</c> is the + start function defined in the child specification.</p> <list type="bulleted"> <item> - <p><c><anno>Id</anno></c> - as defined in the child specification or - <c>undefined</c> in the case of a - <c>simple_one_for_one</c> supervisor.</p> - </item> - <item> - <p><c><anno>Child</anno></c> - the pid of the corresponding child - process, the atom <c>restarting</c> if the process is about to be - restarted, or <c>undefined</c> if there is no such process.</p> + <p>If there already exists a child specification with the specified + identifier, <c><anno>ChildSpec</anno></c> is discarded, and + the function returns <c>{error,already_present}</c> or + <c>{error,{already_started,<anno>Child</anno>}}</c>, depending on + if the corresponding child process is running or not.</p> </item> <item> - <p><c><anno>Type</anno></c> - as defined in the child specification.</p> + <p>If the child process start function returns + <c>{ok,<anno>Child</anno>}</c> or + <c>{ok,<anno>Child</anno>,<anno>Info</anno>}</c>, the child + specification and pid are added to the supervisor and the + function returns the same value.</p> </item> <item> - <p><c><anno>Modules</anno></c> - as defined in the child specification.</p> + <p>If the child process start function returns <c>ignore</c>, + the child specification is added to the supervisor (unless the + supervisor is a <c>simple_one_for_one</c> supervisor, see below), + the pid is set to <c>undefined</c>, and the function returns + <c>{ok,undefined}</c>.</p> </item> </list> + <p>For a <c>simple_one_for_one</c> supervisor, when a child + process start function returns <c>ignore</c>, the function returns + <c>{ok,undefined}</c> and no child is added to the supervisor.</p> + <p>If the child process start function returns an error tuple or + an
erroneous value, or if it fails, the child specification is + discarded, and the function returns <c>{error,Error}</c>, where + <c>Error</c> is a term containing information about the error + and child specification.</p> </desc> </func> + <func> - <name name="count_children" arity="1"/> - <fsummary>Return counts for the number of child specifications, - active children, supervisors, and workers.</fsummary> + <name name="start_link" arity="2"/> + <name name="start_link" arity="3"/> + <fsummary>Create a supervisor process.</fsummary> + <type name="startlink_ret"/> + <type name="startlink_err"/> + <type name="sup_name"/> <desc> - <p>Returns a property list (see <c>proplists</c>) containing the - counts for each of the following elements of the supervisor's - child specifications and managed processes:</p> + <p>Creates a supervisor process as part of a supervision tree. + For example, the function ensures that the supervisor is linked to + the calling process (its supervisor).</p> + <p>The created supervisor process calls + <c><anno>Module</anno>:init/1</c> to + find out about restart strategy, maximum restart intensity, + and child processes. To ensure a synchronized startup + procedure, <c>start_link/2,3</c> does not return until + <c><anno>Module</anno>:init/1</c> has returned and all child + processes have been started.</p> <list type="bulleted"> <item> - <p><c>specs</c> - the total count of children, dead or alive.</p> + <p>If <c><anno>SupName</anno>={local,Name}</c>, the supervisor is + registered locally as <c>Name</c> using <c>register/2</c>.</p> </item> <item> - <p><c>active</c> - the count of all actively running child processes - managed by this supervisor. 
In the case of <c>simple_one_for_one</c> - supervisors, no check is carried out to ensure that each child process - is still alive, though the result provided here is likely to be very - accurate unless the supervisor is heavily overloaded.</p> + <p>If <c><anno>SupName</anno>={global,Name}</c>, the supervisor is + registered globally as <c>Name</c> using + <seealso marker="kernel:global#register_name/2"> + <c>global:register_name/2</c></seealso>.</p> </item> <item> - <p><c>supervisors</c> - the count of all children marked as - child_type = supervisor in the spec list, whether or not the - child process is still alive.</p> + <p>If + <c><anno>SupName</anno>={via,<anno>Module</anno>,<anno>Name</anno>}</c>, + the supervisor is registered as <c>Name</c> using the registry + represented by <c>Module</c>. The <c>Module</c> callback must + export the functions <c>register_name/2</c>, + <c>unregister_name/1</c>, and <c>send/2</c>, which must behave + like the corresponding functions in + <seealso marker="kernel:global"><c>global</c></seealso>. 
Thus, + <c>{via,global,<anno>Name</anno>}</c> is a valid reference.</p> + </item> + </list> + <p>If no name is provided, the supervisor is not registered.</p> + <p><c><anno>Module</anno></c> is the name of the callback module.</p> + <p><c><anno>Args</anno></c> is any term that is passed as + the argument to <c><anno>Module</anno>:init/1</c>.</p> + <list type="bulleted"> + <item> + <p>If the supervisor and its child processes are successfully + created (that is, if all child process start functions return + <c>{ok,Child}</c>, <c>{ok,Child,Info}</c>, or <c>ignore</c>), + the function returns <c>{ok,Pid}</c>, where <c>Pid</c> is + the pid of the supervisor.</p> </item> <item> - <p><c>workers</c> - the count of all children marked as - child_type = worker in the spec list, whether or not the child - process is still alive.</p> + <p>If there already exists a process with the specified + <c><anno>SupName</anno></c>, the function returns + <c>{error,{already_started,Pid}}</c>, where <c>Pid</c> is + the pid of that process.</p> + </item> + <item> + <p>If <c><anno>Module</anno>:init/1</c> returns <c>ignore</c>, this + function returns <c>ignore</c> as well, and the supervisor + terminates with reason <c>normal</c>.</p> + </item> + <item> + <p>If <c><anno>Module</anno>:init/1</c> fails or returns an + incorrect value, this function returns <c>{error,Term}</c>, where + <c>Term</c> is a term with information about the error, and the + supervisor terminates with reason <c>Term</c>.</p> + </item> + <item> + <p>If any child process start function fails or returns an error + tuple or an erroneous value, the supervisor first terminates + all already started child processes with reason <c>shutdown</c> + and then terminates itself and returns + <c>{error, {shutdown, Reason}}</c>.</p> </item> </list> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> - for a description of <c><anno>SupRef</anno></c>.</p> </desc> </func> + <func> - <name name="check_childspecs"
arity="1"/> - <fsummary>Check if children specifications are syntactically correct.</fsummary> + <name name="terminate_child" arity="2"/> + <fsummary>Terminate a child process belonging to a supervisor.</fsummary> <desc> - <p>This function takes a list of child specification as argument - and returns <c>ok</c> if all of them are syntactically - correct, or <c>{error,<anno>Error</anno>}</c> otherwise.</p> + <p>Tells supervisor <c><anno>SupRef</anno></c> to terminate the + specified child.</p> + <p>If the supervisor is not <c>simple_one_for_one</c>, + <c><anno>Id</anno></c> must be the child specification + identifier. The process, if any, is terminated and, + unless it is a temporary child, the child specification is + kept by the supervisor. The child process can later be + restarted by the supervisor. The child process can also be + restarted explicitly by calling + <seealso marker="#restart_child/2"><c>restart_child/2</c></seealso>. + Use + <seealso marker="#delete_child/2"><c>delete_child/2</c></seealso> + to remove the child specification.</p> + <p>If the child is temporary, the child specification is deleted as + soon as the process terminates. This means + that <c>delete_child/2</c> has no meaning + and <c>restart_child/2</c> cannot be used for these children.</p> + <p>If the supervisor is <c>simple_one_for_one</c>, + <c><anno>Id</anno></c> + must be the <c>pid()</c> of the child process. If the specified + process is alive, but is not a child of the specified + supervisor, the function returns + <c>{error,not_found}</c>. If the child specification + identifier is specified instead of a <c>pid()</c>, the + function returns <c>{error,simple_one_for_one}</c>.</p> + <p>If successful, the function returns <c>ok</c>. 
If there is + no child specification with the specified <c><anno>Id</anno></c>, the + function returns <c>{error,not_found}</c>.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> </desc> </func> + <func> - <name name="get_childspec" arity="2"/> - <fsummary>Return the child specification map for the given - child.</fsummary> + <name name="which_children" arity="1"/> + <fsummary>Return information about all children specifications and + child processes belonging to a supervisor.</fsummary> <desc> - <p>Returns the child specification map for the child identified - by <c>Id</c> under supervisor <c>SupRef</c>. The returned - map contains all keys, both mandatory and optional.</p> - <p>See <seealso marker="#SupRef"><c>start_child/2</c></seealso> - for a description of <c><anno>SupRef</anno></c>.</p> + <p>Returns a newly created list with information about all child + specifications and child processes belonging to + supervisor <c><anno>SupRef</anno></c>.</p> + <p>Notice that calling this function when supervising many + children under low memory conditions can cause an + out of memory exception.</p> + <p>For a description of <c><anno>SupRef</anno></c>, see + <seealso marker="#SupRef"><c>start_child/2</c></seealso>.</p> + <p>The following information is given for each child + specification/process:</p> + <list type="bulleted"> + <item> + <p><c><anno>Id</anno></c> - As defined in the child specification or + <c>undefined</c> for a <c>simple_one_for_one</c> supervisor.</p> + </item> + <item> + <p><c><anno>Child</anno></c> - The pid of the corresponding child + process, the atom <c>restarting</c> if the process is about to be + restarted, or <c>undefined</c> if there is no such process.</p> + </item> + <item> + <p><c><anno>Type</anno></c> - As defined in the child + specification.</p> + </item> + <item> + <p><c><anno>Modules</anno></c> - As defined in the child + specification.</p> + </item> +
</list> </desc> </func> </funcs> <section> - <title>CALLBACK FUNCTIONS</title> - <p>The following functions must be exported from a + <title>Callback Functions</title> + <p>The following function must be exported from a <c>supervisor</c> callback module.</p> </section> + <funcs> <func> <name>Module:init(Args) -> Result</name> @@ -599,47 +662,52 @@ <type> <v>Args = term()</v> <v>Result = {ok,{SupFlags,[ChildSpec]}} | ignore</v> - <v> SupFlags = <seealso marker="#type-sup_flags">sup_flags()</seealso></v> - <v> ChildSpec = <seealso marker="#type-child_spec">child_spec()</seealso></v> + <v> SupFlags = + <seealso marker="#type-sup_flags"><c>sup_flags()</c></seealso></v> + <v> ChildSpec = + <seealso marker="#type-child_spec"><c>child_spec()</c></seealso></v> </type> <desc> <p>Whenever a supervisor is started using - <c>supervisor:start_link/2,3</c>, this function is called by + <seealso marker="#start_link/2"><c>start_link/2,3</c></seealso>, + this function is called by the new process to find out about restart strategy, maximum restart intensity, and child specifications.</p> <p><c>Args</c> is the <c>Args</c> argument provided to the start function.</p> <p><c>SupFlags</c> is the supervisor flags defining the - restart strategy and max restart intensity for the + restart strategy and maximum restart intensity for the supervisor. <c>[ChildSpec]</c> is a list of valid child specifications defining which child processes the supervisor - shall start and monitor. See the discussion about - Supervision Principles above.</p> - <p>Note that when the restart strategy is + must start and monitor. See the discussion in section + <seealso marker="#supervision_princ"> + <c>Supervision Principles</c></seealso> earlier.</p> + <p>Notice that when the restart strategy is <c>simple_one_for_one</c>, the list of child specifications must be a list with one child specification only. - (The child specification identifier is ignored.) 
No child process is then started + (The child specification identifier is ignored.) + No child process is then started during the initialization phase, but all children are assumed to be started dynamically using - <c>supervisor:start_child/2</c>.</p> - <p>The function may also return <c>ignore</c>.</p> - <p>Note that this function might also be called as a part of a - code upgrade procedure. For this reason, the function should - not have any side effects. See - <seealso marker="doc/design_principles:appup_cookbook#sup">Design - Principles</seealso> for more information about code upgrade - of supervisors.</p> + <seealso marker="#start_child/2"><c>start_child/2</c></seealso>.</p> + <p>The function can also return <c>ignore</c>.</p> + <p>Notice that this function can also be called as a part of a code + upgrade procedure. Therefore, the function is not to have any side + effects. For more information about code upgrade of supervisors, see + section + <seealso marker="doc/design_principles:appup_cookbook#sup">Changing + a Supervisor</seealso> in OTP Design Principles.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="gen_event">gen_event(3)</seealso>, - <seealso marker="gen_fsm">gen_fsm(3)</seealso>, - <seealso marker="gen_statem">gen_statem(3)</seealso>, - <seealso marker="gen_server">gen_server(3)</seealso>, - <seealso marker="sys">sys(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="gen_event"><c>gen_event(3)</c></seealso>, + <seealso marker="gen_fsm"><c>gen_fsm(3)</c></seealso>, + <seealso marker="gen_statem"><c>gen_statem(3)</c></seealso>, + <seealso marker="gen_server"><c>gen_server(3)</c></seealso>, + <seealso marker="sys"><c>sys(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/supervisor_bridge.xml b/lib/stdlib/doc/src/supervisor_bridge.xml index e40c8bbd6f..c4c1b37548 100644 --- a/lib/stdlib/doc/src/supervisor_bridge.xml +++ b/lib/stdlib/doc/src/supervisor_bridge.xml @@ -31,73 
+31,106 @@ <rev></rev> </header> <module>supervisor_bridge</module> - <modulesummary>Generic Supervisor Bridge Behaviour.</modulesummary> + <modulesummary>Generic supervisor bridge behavior.</modulesummary> <description> - <p>A behaviour module for implementing a supervisor_bridge, a process - which connects a subsystem not designed according to the OTP design - principles to a supervision tree. The supervisor_bridge sits between + <p>This behavior module provides a supervisor bridge, a process + that connects a subsystem not designed according to the OTP design + principles to a supervision tree. The supervisor bridge sits between a supervisor and the subsystem. It behaves like a real supervisor to its own supervisor, but has a different interface than a real - supervisor to the subsystem. Refer to <em>OTP Design Principles</em> - for more information.</p> - <p>A supervisor_bridge assumes the functions for starting and stopping + supervisor to the subsystem. For more information, see + <seealso marker="doc/design_principles:sup_princ"> + Supervisor Behaviour</seealso> in OTP Design Principles. 
+ </p> + + <p>A supervisor bridge assumes the functions for starting and stopping the subsystem to be located in a callback module exporting a - pre-defined set of functions.</p> - <p>The <c>sys</c> module can be used for debugging a - supervisor_bridge.</p> - <p>Unless otherwise stated, all functions in this module will fail if - the specified supervisor_bridge does not exist or if bad arguments are - given.</p> + predefined set of functions.</p> + + <p>The <seealso marker="sys"><c>sys(3)</c></seealso> module can be used + for debugging a supervisor bridge.</p> + + <p>Unless otherwise stated, all functions in this module fail if + the specified supervisor bridge does not exist or if bad arguments are + specified.</p> </description> + <funcs> <func> <name name="start_link" arity="2"/> <name name="start_link" arity="3"/> <fsummary>Create a supervisor bridge process.</fsummary> <desc> - <p>Creates a supervisor_bridge process, linked to the calling - process, which calls <c><anno>Module</anno>:init/1</c> to start the subsystem. - To ensure a synchronized start-up procedure, this function does + <p>Creates a supervisor bridge process, linked to the calling process, + which calls <c><anno>Module</anno>:init/1</c> to start the subsystem. + To ensure a synchronized startup procedure, this function does not return until <c><anno>Module</anno>:init/1</c> has returned.</p> - <p>If <c><anno>SupBridgeName</anno>={local,<anno>Name</anno>}</c> the supervisor_bridge is - registered locally as <c><anno>Name</anno></c> using <c>register/2</c>. - If <c><anno>SupBridgeName</anno>={global,<anno>Name</anno>}</c> the supervisor_bridge is - registered globally as <c><anno>Name</anno></c> using - <c>global:register_name/2</c>. - If <c><anno>SupBridgeName</anno>={via,<anno>Module</anno>,<anno>Name</anno>}</c> the supervisor_bridge is - registered as <c><anno>Name</anno></c> using a registry represented - by <anno>Module</anno>. 
The <c>Module</c> callback should export - the functions <c>register_name/2</c>, <c>unregister_name/1</c> - and <c>send/2</c>, which should behave like the - corresponding functions in <c>global</c>. Thus, - <c>{via,global,GlobalName}</c> is a valid reference. - If no name is provided, the supervisor_bridge is not registered. - If there already exists a process with the specified - <c><anno>SupBridgeName</anno></c> the function returns - <c>{error,{already_started,<anno>Pid</anno>}}</c>, where <c><anno>Pid</anno></c> is the pid - of that process.</p> + <list type="bulleted"> + <item> + <p>If <c><anno>SupBridgeName</anno>={local,<anno>Name</anno>}</c>, + the supervisor bridge is registered locally as + <c><anno>Name</anno></c> using <c>register/2</c>.</p> + </item> + <item> + <p>If <c><anno>SupBridgeName</anno>={global,<anno>Name</anno>}</c>, + the supervisor bridge is registered globally as + <c><anno>Name</anno></c> using + <seealso marker="kernel:global#register_name/2"> + <c>global:register_name/2</c></seealso>.</p> + </item> + <item> + <p>If + <c><anno>SupBridgeName</anno>={via,<anno>Module</anno>,<anno>Name</anno>}</c>, + the supervisor bridge is registered as <c><anno>Name</anno></c> + using a registry represented by <anno>Module</anno>. The + <c>Module</c> callback is to export functions + <c>register_name/2</c>, <c>unregister_name/1</c>, and <c>send/2</c>, + which are to behave like the corresponding functions in + <seealso marker="kernel:global"><c>global</c></seealso>. 
+ Thus, <c>{via,global,GlobalName}</c> is a valid reference.</p> + </item> + </list> + <p>If no name is provided, the supervisor bridge is not registered.</p> <p><c><anno>Module</anno></c> is the name of the callback module.</p> - <p><c><anno>Args</anno></c> is an arbitrary term which is passed as the argument - to <c><anno>Module</anno>:init/1</c>.</p> - <p>If the supervisor_bridge and the subsystem are successfully - started the function returns <c>{ok,<anno>Pid</anno>}</c>, where <c><anno>Pid</anno></c> is - is the pid of the supervisor_bridge.</p> - <p>If <c><anno>Module</anno>:init/1</c> returns <c>ignore</c>, this function - returns <c>ignore</c> as well and the supervisor_bridge terminates - with reason <c>normal</c>. - If <c><anno>Module</anno>:init/1</c> fails or returns an error tuple or an - incorrect value, this function returns <c>{error,<anno>Error</anno>r}</c> where - <c><anno>Error</anno></c> is a term with information about the error, and - the supervisor_bridge terminates with reason <c><anno>Error</anno></c>.</p> + <p><c><anno>Args</anno></c> is an arbitrary term that is passed as the + argument to <c><anno>Module</anno>:init/1</c>.</p> + <list type="bulleted"> + <item> + <p>If the supervisor bridge and the subsystem are successfully + started, the function returns <c>{ok,<anno>Pid</anno>}</c>, where + <c><anno>Pid</anno></c> is the pid of the supervisor + bridge.</p> + </item> + <item> + <p>If there already exists a process with the specified + <c><anno>SupBridgeName</anno></c>, the function returns + <c>{error,{already_started,<anno>Pid</anno>}}</c>, where + <c><anno>Pid</anno></c> is the pid of that process.</p> + </item> + <item> + <p>If <c><anno>Module</anno>:init/1</c> returns <c>ignore</c>, this + function returns <c>ignore</c> as well and the supervisor bridge + terminates with reason <c>normal</c>.</p> + </item> + <item> + <p>If <c><anno>Module</anno>:init/1</c> fails or returns an error + tuple or an incorrect value, this function 
returns + <c>{error,<anno>Error</anno>}</c>, where + <c><anno>Error</anno></c> is a term with information about the + error, and the supervisor bridge + terminates with reason <c><anno>Error</anno></c>.</p> + </item> + </list> </desc> </func> </funcs> <section> - <title>CALLBACK FUNCTIONS</title> - <p>The following functions should be exported from a + <title>Callback Functions</title> + <p>The following functions must be exported from a <c>supervisor_bridge</c> callback module.</p> </section> + <funcs> <func> <name>Module:init(Args) -> Result</name> @@ -110,25 +143,26 @@ <v> Error = term()</v> </type> <desc> - <p>Whenever a supervisor_bridge is started using - <c>supervisor_bridge:start_link/2,3</c>, this function is called + <p>Whenever a supervisor bridge is started using + <seealso marker="#start_link/2"><c>start_link/2,3</c></seealso>, + this function is called by the new process to start the subsystem and initialize.</p> <p><c>Args</c> is the <c>Args</c> argument provided to the start function.</p> - <p>The function should return <c>{ok,Pid,State}</c> where <c>Pid</c> + <p>The function is to return <c>{ok,Pid,State}</c>, where <c>Pid</c> is the pid of the main process in the subsystem and <c>State</c> is any term.</p> <p>If later <c>Pid</c> terminates with a reason <c>Reason</c>, - the supervisor bridge will terminate with reason <c>Reason</c> as - well. - If later the supervisor_bridge is stopped by its supervisor with - reason <c>Reason</c>, it will call + the supervisor bridge terminates with reason <c>Reason</c> as well. 
+ If later the supervisor bridge is stopped by its supervisor with + reason <c>Reason</c>, it calls <c>Module:terminate(Reason,State)</c> to terminate.</p> - <p>If something goes wrong during the initialization the function - should return <c>{error,Error}</c> where <c>Error</c> is any - term, or <c>ignore</c>.</p> + <p>If the initialization fails, the function is to return + <c>{error,Error}</c>, where <c>Error</c> is any term, + or <c>ignore</c>.</p> </desc> </func> + <func> <name>Module:terminate(Reason, State)</name> <fsummary>Clean up and stop subsystem.</fsummary> @@ -137,15 +171,15 @@ <v>State = term()</v> </type> <desc> - <p>This function is called by the supervisor_bridge when it is about - to terminate. It should be the opposite of <c>Module:init/1</c> + <p>This function is called by the supervisor bridge when it is about + to terminate. It is to be the opposite of <c>Module:init/1</c> and stop the subsystem and do any necessary cleaning up. The return value is ignored.</p> - <p><c>Reason</c> is <c>shutdown</c> if the supervisor_bridge is - terminated by its supervisor. If the supervisor_bridge terminates + <p><c>Reason</c> is <c>shutdown</c> if the supervisor bridge is + terminated by its supervisor. 
If the supervisor bridge terminates because a linked process (apart from the main process of the subsystem) has terminated with reason <c>Term</c>, - <c>Reason</c> will be <c>Term</c>.</p> + then <c>Reason</c> becomes <c>Term</c>.</p> <p><c>State</c> is taken from the return value of <c>Module:init/1</c>.</p> </desc> @@ -153,9 +187,9 @@ </funcs> <section> - <title>SEE ALSO</title> - <p><seealso marker="supervisor">supervisor(3)</seealso>, - <seealso marker="sys">sys(3)</seealso></p> + <title>See Also</title> + <p><seealso marker="supervisor"><c>supervisor(3)</c></seealso>, + <seealso marker="sys"><c>sys(3)</c></seealso></p> </section> </erlref> diff --git a/lib/stdlib/doc/src/sys.xml b/lib/stdlib/doc/src/sys.xml index 2255395f46..1120b926d5 100644 --- a/lib/stdlib/doc/src/sys.xml +++ b/lib/stdlib/doc/src/sys.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1996</year><year>2016</year> + <year>1996</year><year>2014</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -30,62 +30,67 @@ <checked></checked> <date>1996-06-06</date> <rev></rev> - <file>sys.sgml</file> + <file>sys.xml</file> </header> <module>sys</module> - <modulesummary>A Functional Interface to System Messages</modulesummary> + <modulesummary>A functional interface to system messages.</modulesummary> <description> - <p>This module contains functions for sending system messages used by programs, and messages used for debugging purposes. - </p> - <p>Functions used for implementation of processes - should also understand system messages such as debugging - messages and code change. These functions must be used to implement the use of system messages for a process; either directly, or through standard behaviours, such as <c>gen_server</c>.</p> - <p>The default timeout is 5000 ms, unless otherwise specified. 
The - <c>timeout</c> defines the time period to wait for the process to + <p>This module contains functions for sending system messages used by + programs, and messages used for debugging purposes.</p> + <p>Functions used for implementation of processes are also expected to + understand system messages, such as debug messages and code change. These + functions must be used to implement the use of system messages for a + process; either directly, or through standard behaviors, such as + <seealso marker="gen_server"><c>gen_server</c></seealso>.</p> + <p>The default time-out is 5000 ms, unless otherwise specified. + <c>timeout</c> defines the time to wait for the process to respond to a request. If the process does not respond, the function evaluates <c>exit({timeout, {M, F, A}})</c>. </p> - <p><marker id="dbg_opt"/>The functions make reference to a debug structure. - The debug structure is a list of <c>dbg_opt()</c>. - <c>dbg_opt()</c> is an internal data type used by the - <c>handle_system_msg/6</c> function. No debugging is performed if it is an empty list. - </p> + <marker id="dbg_opt"/> + <p>The functions make references to a debug structure. + The debug structure is a list of <c>dbg_opt()</c>, which is an internal + data type used by function <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso>. No debugging is performed if it is + an empty list.</p> </description> <section> <title>System Messages</title> - <p>Processes which are not implemented as one of the standard - behaviours must still understand system - messages. There are three different messages which must be - understood: - </p> + <p>Processes that are not implemented as one of the standard + behaviors must still understand system messages. The following + three messages must be understood:</p> <list type="bulleted"> <item> <p>Plain system messages. These are received as <c>{system, From, Msg}</c>. 
The content and meaning of this message are not interpreted by the - receiving process module. When a system message has been - received, the function <c>sys:handle_system_msg/6</c> - is called in order to handle the request. - </p> + receiving process module. When a system message is received, function + <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> + is called to handle the request.</p> </item> <item> <p>Shutdown messages. If the process traps exits, it must - be able to handle an shut-down request from its parent, the + be able to handle a shutdown request from its parent, the supervisor. The message <c>{'EXIT', Parent, Reason}</c> - from the parent is an order to terminate. The process must terminate when this message is received, normally with the + from the parent is an order to terminate. The process must + terminate when this message is received, normally with the same <c>Reason</c> as <c>Parent</c>. </p> </item> <item> - <p>There is one more message which the process must understand if the modules used to implement the process change dynamically during runtime. An example of such a process is the <c>gen_event</c> processes. This message is <c>{get_modules, From}</c>. The reply to this message is <c>From ! {modules, Modules}</c>, - where <c>Modules</c> is a list of the currently active modules in the process. - </p> + <p>If the modules used to implement the process change dynamically + during runtime, the process must understand one more message. An + example is the <seealso marker="gen_event"><c>gen_event</c></seealso> + processes. The message is <c>{get_modules, From}</c>. + The reply to this message is <c>From ! {modules, Modules}</c>, where + <c>Modules</c> is a list of the currently active modules in the + process.</p> <p>This message is used by the release handler to find which - processes execute a certain module. 
The process may at a - later time be suspended and ordered to perform a code change - for one of its modules. - </p> + processes execute a certain module. The process can later be + suspended and ordered to perform a code change for one of its + modules.</p> </item> </list> </section> @@ -93,15 +98,16 @@ <section> <title>System Events</title> <p>When debugging a process with the functions of this - module, the process generates <em>system_events</em> which are + module, the process generates <em>system_events</em>, which are then treated in the debug function. For example, <c>trace</c> - formats the system events to the tty. + formats the system events to the terminal. </p> - <p>There are three predefined system events which are used when a + <p>Three predefined system events are used when a process receives or sends a message. The process can also define its own system events. It is always up to the process itself to format these events.</p> </section> + <datatypes> <datatype> <name name="name"/> @@ -111,7 +117,7 @@ </datatype> <datatype> <name name="dbg_opt"/> - <desc><p>See <seealso marker="#dbg_opt">above</seealso>.</p></desc> + <desc><p>See the introduction of this manual page.</p></desc> </datatype> <datatype> <name name="dbg_fun"/> @@ -120,421 +126,594 @@ <name name="format_fun"/> </datatype> </datatypes> + <funcs> <func> - <name name="log" arity="2"/> - <name name="log" arity="3"/> - <fsummary>Log system events in memory</fsummary> - <desc> - <p>Turns the logging of system events On or Off. If On, a - maximum of <c><anno>N</anno></c> events are kept in the - debug structure (the default is 10). If <c><anno>Flag</anno></c> is <c>get</c>, a list of all - logged events is returned. If <c><anno>Flag</anno></c> is <c>print</c>, the - logged events are printed to <c>standard_io</c>. 
The events are - formatted with a function that is defined by the process that - generated the event (with a call to - <c>sys:handle_debug/4</c>).</p> - </desc> - </func> - <func> - <name name="log_to_file" arity="2"/> - <name name="log_to_file" arity="3"/> - <fsummary>Log system events to the specified file</fsummary> - <desc> - <p>Enables or disables the logging of all system events in textual - format to the file. The events are formatted with a function that is - defined by the process that generated the event (with a call - to <c>sys:handle_debug/4</c>).</p> - </desc> - </func> - <func> - <name name="statistics" arity="2"/> - <name name="statistics" arity="3"/> - <fsummary>Enable or disable the collections of statistics</fsummary> + <name name="change_code" arity="4"/> + <name name="change_code" arity="5"/> + <fsummary>Send the code change system message to the process.</fsummary> <desc> - <p>Enables or disables the collection of statistics. If <c><anno>Flag</anno></c> is - <c>get</c>, the statistical collection is returned.</p> + <p>Tells the process to change code. The process must be + suspended to handle this message. Argument <c><anno>Extra</anno></c> + is reserved for each process to use as its own. Function + <c><anno>Module</anno>:system_code_change/4</c> is called. + <c><anno>OldVsn</anno></c> is the old version of the + <c><anno>Module</anno></c>.</p> </desc> </func> + <func> - <name name="trace" arity="2"/> - <name name="trace" arity="3"/> - <fsummary>Print all system events on <c>standard_io</c></fsummary> + <name name="get_state" arity="1"/> + <name name="get_state" arity="2"/> + <fsummary>Get the state of the process.</fsummary> <desc> - <p>Prints all system events on <c>standard_io</c>. The events are - formatted with a function that is defined by the process that - generated the event (with a call to - <c>sys:handle_debug/4</c>).</p> + <p>Gets the state of the process.</p> + <note> + <p>These functions are intended only to help with debugging. 
They are + provided for convenience, allowing developers to avoid having to + create their own state extraction functions and also avoid having + to interactively extract the state from the return values of + <seealso marker="#get_status-1"><c>get_status/1</c></seealso> or + <seealso marker="#get_status-2"><c>get_status/2</c></seealso> + while debugging.</p> + </note> + <p>The value of <c><anno>State</anno></c> varies for different types of + processes, as follows:</p> + <list type="bulleted"> + <item> + <p>For a + <seealso marker="gen_server"><c>gen_server</c></seealso> + process, the returned <c><anno>State</anno></c> + is the state of the callback module.</p> + </item> + <item> + <p>For a + <seealso marker="gen_fsm"><c>gen_fsm</c></seealso> + process, <c><anno>State</anno></c> is the tuple + <c>{CurrentStateName, CurrentStateData}</c>.</p> + </item> + <item> + <p>For a + <seealso marker="gen_statem"><c>gen_statem</c></seealso> + process, <c><anno>State</anno></c> is the tuple + <c>{CurrentState,CurrentData}</c>.</p> + </item> + <item> + <p>For a + <seealso marker="gen_event"><c>gen_event</c></seealso> + process, <c><anno>State</anno></c> is a list of tuples, + where each tuple corresponds to an event handler registered + in the process and contains <c>{Module, Id, HandlerState}</c>, + as follows:</p> + <taglist> + <tag><c>Module</c></tag> + <item> + <p>The module name of the event handler.</p> + </item> + <tag><c>Id</c></tag> + <item> + <p>The ID of the handler (which is <c>false</c> if it was + registered without an ID).</p> + </item> + <tag><c>HandlerState</c></tag> + <item> + <p>The state of the handler.</p> + </item> + </taglist> + </item> + </list> + <p>If the callback module exports a function <c>system_get_state/1</c>, + it is called in the target process to get its state. 
Its argument is + the same as the <c>Misc</c> value returned by + <seealso marker="#get_status-1"><c>get_status/1,2</c></seealso>, and + function <seealso marker="#Module:system_get_state/1"> + <c>Module:system_get_state/1</c></seealso> is expected to extract the + state of the callback module from it. Function + <c>system_get_state/1</c> must return <c>{ok, State}</c>, where + <c>State</c> is the state of the callback module.</p> + <p>If the callback module does not export a <c>system_get_state/1</c> + function, <c>get_state/1,2</c> assumes that the <c>Misc</c> value is + the state of the callback module and returns it directly instead.</p> + <p>If the callback module's <c>system_get_state/1</c> function crashes + or throws an exception, the caller exits with error + <c>{callback_failed, {Module, system_get_state}, {Class, Reason}}</c>, + where <c>Module</c> is the name of the callback module and + <c>Class</c> and <c>Reason</c> indicate details of the exception.</p> + <p>Function <c>system_get_state/1</c> is primarily useful for + user-defined behaviors and modules that implement OTP + <seealso marker="#special_process">special processes</seealso>. + The <c>gen_server</c>, <c>gen_fsm</c>, + <c>gen_statem</c>, and <c>gen_event</c> OTP + behavior modules export this function, so callback modules for those + behaviors need not supply their own.</p> + <p>For more information about a process, including its state, see + <seealso marker="#get_status-1"><c>get_status/1</c></seealso> and + <seealso marker="#get_status-2"><c>get_status/2</c></seealso>.</p> </desc> </func> + <func> - <name name="no_debug" arity="1"/> - <name name="no_debug" arity="2"/> - <fsummary>Turn off debugging</fsummary> + <name name="get_status" arity="1"/> + <name name="get_status" arity="2"/> + <fsummary>Get the status of the process.</fsummary> <desc> - <p>Turns off all debugging for the process. 
This includes - functions that have been installed explicitly with the - <c>install</c> function, for example triggers.</p> + <p>Gets the status of the process.</p> + <p>The value of <c><anno>Misc</anno></c> varies for different types of + processes, for example:</p> + <list type="bulleted"> + <item> + <p>A <seealso marker="gen_server"><c>gen_server</c></seealso> + process returns the state of the callback module.</p> + </item> + <item> + <p>A <seealso marker="gen_fsm"><c>gen_fsm</c></seealso> + process returns information, such as its current + state name and state data.</p> + </item> + <item> + <p>A <seealso marker="gen_statem"><c>gen_statem</c></seealso> + process returns information, such as its current + state name and state data.</p> + </item> + <item> + <p>A <seealso marker="gen_event"><c>gen_event</c></seealso> + process returns information about each of its + registered handlers.</p> + </item> + </list> + <p>Callback modules for <c>gen_server</c>, + <c>gen_fsm</c>, <c>gen_statem</c>, and <c>gen_event</c> + can also change the value of <c><anno>Misc</anno></c> + by exporting a function <c>format_status/2</c>, which contributes + module-specific information. For details, see + <seealso marker="gen_server#Module:format_status/2"> + <c>gen_server:format_status/2</c></seealso>, + <seealso marker="gen_fsm#Module:format_status/2"> + <c>gen_fsm:format_status/2</c></seealso>, + <seealso marker="gen_statem#Module:format_status/2"> + <c>gen_statem:format_status/2</c></seealso>, and + <seealso marker="gen_event#Module:format_status/2"> + <c>gen_event:format_status/2</c></seealso>.</p> </desc> </func> + <func> - <name name="suspend" arity="1"/> - <name name="suspend" arity="2"/> - <fsummary>Suspend the process</fsummary> + <name name="install" arity="2"/> + <name name="install" arity="3"/> + <fsummary>Install a debug function in the process.</fsummary> <desc> - <p>Suspends the process. 
When the process is suspended, it - will only respond to other system messages, but not other - messages.</p> + <p>Enables installation of alternative debug functions. An example of + such a function is a trigger, a function that waits for some + special event and performs some action when the event is + generated. For example, turning on low-level tracing.</p> + <p><c><anno>Func</anno></c> is called whenever a system event is + generated. This function is to return <c>done</c>, or a new + <c>Func</c> state. In the first case, the function is removed. It is + also removed if the function fails.</p> </desc> </func> + <func> - <name name="resume" arity="1"/> - <name name="resume" arity="2"/> - <fsummary>Resume a suspended process</fsummary> + <name name="log" arity="2"/> + <name name="log" arity="3"/> + <fsummary>Log system events in memory.</fsummary> <desc> - <p>Resumes a suspended process.</p> + <p>Turns the logging of system events on or off. If on, a + maximum of <c><anno>N</anno></c> events are kept in the + debug structure (default is 10).</p> + <p>If <c><anno>Flag</anno></c> is <c>get</c>, a list of all logged + events is returned.</p> + <p>If <c><anno>Flag</anno></c> is <c>print</c>, the logged events + are printed to <c>standard_io</c>.</p> + <p>The events are formatted with a function that is defined by the + process that generated the event (with a call to + <seealso marker="#handle_debug/4"> + <c>handle_debug/4</c></seealso>).</p> </desc> </func> + <func> - <name name="change_code" arity="4"/> - <name name="change_code" arity="5"/> - <fsummary>Send the code change system message to the process</fsummary> + <name name="log_to_file" arity="2"/> + <name name="log_to_file" arity="3"/> + <fsummary>Log system events to the specified file.</fsummary> <desc> - <p>Tells the process to change code. The process must be - suspended to handle this message. The <c><anno>Extra</anno></c> argument is - reserved for each process to use as its own. 
The function - <c><anno>Module</anno>:system_code_change/4</c> is called. <c><anno>OldVsn</anno></c> is - the old version of the <c><anno>Module</anno></c>.</p> + <p>Enables or disables the logging of all system events in text + format to the file. The events are formatted with a function that is + defined by the process that generated the event (with a call to + <seealso marker="#handle_debug/4"><c>handle_debug/4</c></seealso>). + </p> </desc> </func> + <func> - <name name="get_status" arity="1"/> - <name name="get_status" arity="2"/> - <fsummary>Get the status of the process</fsummary> + <name name="no_debug" arity="1"/> + <name name="no_debug" arity="2"/> + <fsummary>Turn off debugging.</fsummary> <desc> - <p>Gets the status of the process.</p> - <p>The value of <c><anno>Misc</anno></c> varies for different types of - processes. For example, a <c>gen_server</c> process returns - the callback module's state, a <c>gen_fsm</c> process - returns information such as its current state name and state data, - a <c>gen_statem</c> process returns information about - its current state and data, and a <c>gen_event</c> process - returns information about each of its - registered handlers. Callback modules for <c>gen_server</c>, - <c>gen_fsm</c>, <c>gen_statem</c> and <c>gen_event</c> - can also customise the value - of <c><anno>Misc</anno></c> by exporting a <c>format_status/2</c> - function that contributes module-specific information; - see <seealso marker="gen_server#Module:format_status/2">gen_server format_status/2</seealso>, - <seealso marker="gen_fsm#Module:format_status/2">gen_fsm format_status/2</seealso>, - <seealso marker="gen_statem#Module:format_status/2">gen_statem format_status/2</seealso>, and - <seealso marker="gen_event#Module:format_status/2">gen_event format_status/2</seealso> - for more details.</p> + <p>Turns off all debugging for the process. 
This includes + functions that are installed explicitly with function + <seealso marker="#install/2"><c>install/2,3</c></seealso>, + for example, triggers.</p> </desc> </func> + <func> - <name name="get_state" arity="1"/> - <name name="get_state" arity="2"/> - <fsummary>Get the state of the process</fsummary> + <name name="remove" arity="2"/> + <name name="remove" arity="3"/> + <fsummary>Remove a debug function from the process.</fsummary> <desc> - <p>Gets the state of the process.</p> - <note> - <p>These functions are intended only to help with debugging. They are provided for - convenience, allowing developers to avoid having to create their own state extraction - functions and also avoid having to interactively extract state from the return values of - <seealso marker="#get_status-1"><c>get_status/1</c></seealso> or - <seealso marker="#get_status-2"><c>get_status/2</c></seealso> while debugging.</p> - </note> - <p>The value of <c><anno>State</anno></c> varies for different types of - processes. For a <c>gen_server</c> process, the returned <c><anno>State</anno></c> - is simply the callback module's state. For a <c>gen_fsm</c> process, - <c><anno>State</anno></c> is the tuple <c>{CurrentStateName, CurrentStateData}</c>. - For a <c>gen_statem</c> process <c><anno>State</anno></c> is - the tuple <c>{CurrentState,CurrentData}.</c> - For a <c>gen_event</c> process, <c><anno>State</anno></c> a list of tuples, - where each tuple corresponds to an event handler registered in the process and contains - <c>{Module, Id, HandlerState}</c>, where <c>Module</c> is the event handler's module name, - <c>Id</c> is the handler's ID (which is the value <c>false</c> if it was registered without - an ID), and <c>HandlerState</c> is the handler's state.</p> - <p>If the callback module exports a <c>system_get_state/1</c> function, it will be called in the - target process to get its state. 
Its argument is the same as the <c>Misc</c> value returned by - <seealso marker="#get_status-1">get_status/1,2</seealso>, and the <c>system_get_state/1</c> - function is expected to extract the callback module's state from it. The <c>system_get_state/1</c> - function must return <c>{ok, State}</c> where <c>State</c> is the callback module's state.</p> - <p>If the callback module does not export a <c>system_get_state/1</c> function, <c>get_state/1,2</c> - assumes the <c>Misc</c> value is the callback module's state and returns it directly instead.</p> - <p>If the callback module's <c>system_get_state/1</c> function crashes or throws an exception, the - caller exits with error <c>{callback_failed, {Module, system_get_state}, {Class, Reason}}</c> where - <c>Module</c> is the name of the callback module and <c>Class</c> and <c>Reason</c> indicate - details of the exception.</p> - <p>The <c>system_get_state/1</c> function is primarily useful for user-defined - behaviours and modules that implement OTP <seealso marker="#special_process">special - processes</seealso>. The <c>gen_server</c>, <c>gen_fsm</c>, - <c>gen_statem</c> and <c>gen_event</c> OTP - behaviour modules export this function, so callback modules for those behaviours - need not supply their own.</p> - <p>To obtain more information about a process, including its state, see - <seealso marker="#get_status-1">get_status/1</seealso> and - <seealso marker="#get_status-2">get_status/2</seealso>.</p> + <p>Removes an installed debug function from the + process. 
<c><anno>Func</anno></c> must be the same as previously + installed.</p> </desc> </func> + <func> <name name="replace_state" arity="2"/> <name name="replace_state" arity="3"/> - <fsummary>Replace the state of the process</fsummary> + <fsummary>Replace the state of the process.</fsummary> <desc> <p>Replaces the state of the process, and returns the new state.</p> <note> - <p>These functions are intended only to help with debugging, and they should not be - be called from normal code. They are provided for convenience, allowing developers - to avoid having to create their own custom state replacement functions.</p> + <p>These functions are intended only to help with debugging, and are + not to be called from normal code. They are provided for + convenience, allowing developers to avoid having to create their own + custom state replacement functions.</p> </note> - <p>The <c><anno>StateFun</anno></c> function provides a new state for the process. - The <c><anno>State</anno></c> argument and <c><anno>NewState</anno></c> return value - of <c><anno>StateFun</anno></c> vary for different types of processes. For a - <c>gen_server</c> process, <c><anno>State</anno></c> is simply the callback module's - state, and <c><anno>NewState</anno></c> is a new instance of that state. For a - <c>gen_fsm</c> process, <c><anno>State</anno></c> is the tuple - <c>{CurrentStateName, CurrentStateData}</c>, and <c><anno>NewState</anno></c> - is a similar tuple that may contain a new state name, new state data, or both. - The same applies for a <c>gen_statem</c> process but - it names the tuple fields <c>{CurrentState,CurrentData}</c>. - For a <c>gen_event</c> process, <c><anno>State</anno></c> is the tuple - <c>{Module, Id, HandlerState}</c> where <c>Module</c> is the event handler's module name, - <c>Id</c> is the handler's ID (which is the value <c>false</c> if it was registered without - an ID), and <c>HandlerState</c> is the handler's state. 
<c><anno>NewState</anno></c> is a - similar tuple where <c>Module</c> and <c>Id</c> shall have the same values as in - <c><anno>State</anno></c> but the value of <c>HandlerState</c> may be different. Returning - a <c><anno>NewState</anno></c> whose <c>Module</c> or <c>Id</c> values differ from those of - <c><anno>State</anno></c> will result in the event handler's state remaining unchanged. For a - <c>gen_event</c> process, <c><anno>StateFun</anno></c> is called once for each event handler - registered in the <c>gen_event</c> process.</p> - <p>If a <c><anno>StateFun</anno></c> function decides not to effect any change in process - state, then regardless of process type, it may simply return its <c><anno>State</anno></c> - argument.</p> - <p>If a <c><anno>StateFun</anno></c> function crashes or throws an exception, then - for <c>gen_server</c>, <c>gen_fsm</c> or <c>gen_statem</c> processes, - the original state of the process is - unchanged. For <c>gen_event</c> processes, a crashing or failing <c><anno>StateFun</anno></c> - function means that only the state of the particular event handler it was working on when it - failed or crashed is unchanged; it can still succeed in changing the states of other event + <p>Function <c><anno>StateFun</anno></c> provides a new state for the + process. 
Argument <c><anno>State</anno></c> and the + <c><anno>NewState</anno></c> return value of + <c><anno>StateFun</anno></c> vary for different types of + processes as follows:</p> + <list type="bulleted"> + <item> + <p>For a <seealso marker="gen_server"><c>gen_server</c></seealso> + process, <c><anno>State</anno></c> is the state of the callback + module and <c><anno>NewState</anno></c> + is a new instance of that state.</p> + </item> + <item> + <p>For a <seealso marker="gen_fsm"><c>gen_fsm</c></seealso> process, + <c><anno>State</anno></c> is the tuple <c>{CurrentStateName, + CurrentStateData}</c>, and <c><anno>NewState</anno></c> is a + similar tuple, which can contain + a new state name, new state data, or both.</p> + </item> + <item> + <p>For a <seealso marker="gen_statem"><c>gen_statem</c></seealso> + process, <c><anno>State</anno></c> is the + tuple <c>{CurrentState,CurrentData}</c>, + and <c><anno>NewState</anno></c> is a + similar tuple, which can contain + a new current state, new state data, or both.</p> + </item> + <item> + <p>For a <seealso marker="gen_event"><c>gen_event</c></seealso> + process, <c><anno>State</anno></c> is the + tuple <c>{Module, Id, HandlerState}</c> as follows:</p> + <taglist> + <tag><c>Module</c></tag> + <item> + <p>The module name of the event handler.</p> + </item> + <tag><c>Id</c></tag> + <item> + <p>The ID of the handler (which is <c>false</c> if it was + registered without an ID).</p> + </item> + <tag><c>HandlerState</c></tag> + <item> + <p>The state of the handler.</p> + </item> + </taglist> + <p><c><anno>NewState</anno></c> is a similar tuple where + <c>Module</c> and <c>Id</c> are to have the same values as in + <c><anno>State</anno></c>, but the value of <c>HandlerState</c> + can be different. Returning a <c><anno>NewState</anno></c>, whose + <c>Module</c> or <c>Id</c> values differ from those of + <c><anno>State</anno></c>, leaves the state of the event handler + unchanged. 
For a <c>gen_event</c> process, + <c><anno>StateFun</anno></c> is called once for each event handler + registered in the <c>gen_event</c> process.</p> + </item> + </list> + <p>If a <c><anno>StateFun</anno></c> function decides not to effect any + change in process state, then regardless of process type, it can + return its <c><anno>State</anno></c> argument.</p> + <p>If a <c><anno>StateFun</anno></c> function crashes or throws an + exception, the original state of the process is unchanged for + <c>gen_server</c>, <c>gen_fsm</c>, and <c>gen_statem</c> processes. + For <c>gen_event</c> processes, a crashing or + failing <c><anno>StateFun</anno></c> function + means that only the state of the particular event handler it was + working on when it failed or crashed is unchanged; it can still + succeed in changing the states of other event handlers registered in the same <c>gen_event</c> process.</p> - <p>If the callback module exports a <c>system_replace_state/2</c> function, it will be called in the - target process to replace its state using <c>StateFun</c>. Its two arguments are <c>StateFun</c> - and <c>Misc</c>, where <c>Misc</c> is the same as the <c>Misc</c> value returned by - <seealso marker="#get_status-1">get_status/1,2</seealso>. 
A <c>system_replace_state/2</c> function - is expected to return <c>{ok, NewState, NewMisc}</c> where <c>NewState</c> is the callback module's - new state obtained by calling <c>StateFun</c>, and <c>NewMisc</c> is a possibly new value used to - replace the original <c>Misc</c> (required since <c>Misc</c> often contains the callback - module's state within it).</p> - <p>If the callback module does not export a <c>system_replace_state/2</c> function, - <c>replace_state/2,3</c> assumes the <c>Misc</c> value is the callback module's state, passes it - to <c>StateFun</c> and uses the return value as both the new state and as the new value of - <c>Misc</c>.</p> - <p>If the callback module's <c>system_replace_state/2</c> function crashes or throws an exception, - the caller exits with error <c>{callback_failed, {Module, system_replace_state}, {Class, Reason}}</c> - where <c>Module</c> is the name of the callback module and <c>Class</c> and <c>Reason</c> indicate details - of the exception. If the callback module does not provide a <c>system_replace_state/2</c> function and - <c>StateFun</c> crashes or throws an exception, the caller exits with error + <p>If the callback module exports a + <seealso marker="#Module:system_replace_state/2"> + <c>system_replace_state/2</c></seealso> function, it is called in the + target process to replace its state using <c>StateFun</c>. Its two + arguments are <c>StateFun</c> and <c>Misc</c>, where + <c>Misc</c> is the same as the <c>Misc</c> value returned by + <seealso marker="#get_status-1"><c>get_status/1,2</c></seealso>. 
+ A <c>system_replace_state/2</c> function is expected to return + <c>{ok, NewState, NewMisc}</c>, where <c>NewState</c> is the new state + of the callback module, obtained by calling <c>StateFun</c>, and + <c>NewMisc</c> is + a possibly new value used to replace the original <c>Misc</c> + (required as <c>Misc</c> often contains the state of the callback + module within it).</p> + <p>If the callback module does not export a + <c>system_replace_state/2</c> function, + <seealso marker="#replace_state/2"><c>replace_state/2,3</c></seealso> + assumes that <c>Misc</c> is the state of the callback module, + passes it to <c>StateFun</c> and uses the return value as + both the new state and as the new value of <c>Misc</c>.</p> + <p>If the callback module's function <c>system_replace_state/2</c> + crashes or throws an exception, the caller exits with error + <c>{callback_failed, {Module, system_replace_state}, {Class, + Reason}}</c>, where <c>Module</c> is the name of the callback module + and <c>Class</c> and <c>Reason</c> indicate details of the exception. + If the callback module does not provide a + <c>system_replace_state/2</c> function and <c>StateFun</c> crashes or + throws an exception, the caller exits with error <c>{callback_failed, StateFun, {Class, Reason}}</c>.</p> - <p>The <c>system_replace_state/2</c> function is primarily useful for user-defined behaviours and - modules that implement OTP <seealso marker="#special_process">special processes</seealso>. The - <c>gen_server</c>, <c>gen_fsm</c>, <c>gen_statem</c> and - <c>gen_event</c> OTP behaviour modules export this function, - and so callback modules for those behaviours need not supply their own.</p> + <p>Function <c>system_replace_state/2</c> is primarily useful for + user-defined behaviors and modules that implement OTP + <seealso marker="#special_process">special processes</seealso>. 
The + OTP behavior modules <c>gen_server</c>, + <c>gen_fsm</c>, <c>gen_statem</c>, and <c>gen_event</c> + export this function, so callback modules for those + behaviors need not supply their own.</p> </desc> </func> + <func> - <name name="install" arity="2"/> - <name name="install" arity="3"/> - <fsummary>Install a debug function in the process</fsummary> + <name name="resume" arity="1"/> + <name name="resume" arity="2"/> + <fsummary>Resume a suspended process.</fsummary> <desc> - <p>This function makes it possible to install other debug - functions than the ones defined above. An example of such a - function is a trigger, a function that waits for some - special event and performs some action when the event is - generated. This could, for example, be turning on low level tracing. - </p> - <p><c><anno>Func</anno></c> is called whenever a system event is - generated. This function should return <c>done</c>, or a new - func state. In the first case, the function is removed. It is removed - if the function fails.</p> + <p>Resumes a suspended process.</p> </desc> </func> + <func> - <name name="remove" arity="2"/> - <name name="remove" arity="3"/> - <fsummary>Remove a debug function from the process</fsummary> + <name name="statistics" arity="2"/> + <name name="statistics" arity="3"/> + <fsummary>Enable or disable the collection of statistics.</fsummary> <desc> - <p>Removes a previously installed debug function from the - process. <c><anno>Func</anno></c> must be the same as previously - installed.</p> + <p>Enables or disables the collection of statistics. If + <c><anno>Flag</anno></c> is <c>get</c>, + the statistical collection is returned.</p> </desc> </func> + + <func> + <name name="suspend" arity="1"/> + <name name="suspend" arity="2"/> + <fsummary>Suspend the process.</fsummary> + <desc> + <p>Suspends the process. 
When the process is suspended, it + responds only to other system messages, and not to other + messages.</p> + </desc> + </func> + <func> <name name="terminate" arity="2"/> <name name="terminate" arity="3"/> - <fsummary>Terminate the process</fsummary> + <fsummary>Terminate the process.</fsummary> <desc> - <p>This function orders the process to terminate with the - given <c><anno>Reason</anno></c>. The termination is done - asynchronously, so there is no guarantee that the process is - actually terminated when the function returns.</p> + <p>Orders the process to terminate with the + specified <c><anno>Reason</anno></c>. The termination is done + asynchronously, so it is not guaranteed that the process is + terminated when the function returns.</p> + </desc> + </func> + + <func> + <name name="trace" arity="2"/> + <name name="trace" arity="3"/> + <fsummary>Print all system events on <c>standard_io</c>.</fsummary> + <desc> + <p>Prints all system events on <c>standard_io</c>. The events are + formatted with a function that is defined by the process that + generated the event (with a call to + <seealso marker="#handle_debug/4"><c>handle_debug/4</c></seealso>). + </p> </desc> </func> </funcs> <section> <title>Process Implementation Functions</title> - <p><marker id="special_process"/>The following functions are used when implementing a - special process. This is an ordinary process which does not use a - standard behaviour, but a process which understands the standard system messages.</p> + <marker id="special_process"/> + <p>The following functions are used when implementing a + special process. 
This is an ordinary process, which does not use a + standard behavior, but a process that understands the standard system + messages.</p> </section> + <funcs> <func> <name name="debug_options" arity="1"/> - <fsummary>Convert a list of options to a debug structure</fsummary> + <fsummary>Convert a list of options to a debug structure.</fsummary> <desc> - <p>This function can be used by a process that initiates a debug - structure from a list of options. The values of the - <c><anno>Opt</anno></c> argument are the same as the corresponding + <p>Can be used by a process that initiates a debug + structure from a list of options. The values of argument + <c><anno>Opt</anno></c> are the same as for the corresponding functions.</p> </desc> </func> + <func> <name name="get_debug" arity="3"/> - <fsummary>Get the data associated with a debug option</fsummary> + <fsummary>Get the data associated with a debug option.</fsummary> <desc> - <p>This function gets the data associated with a debug option. <c><anno>Default</anno></c> is returned if the - <c><anno>Item</anno></c> is not found. Can be - used by the process to retrieve debug data for printing - before it terminates.</p> + <p>Gets the data associated with a debug option. + <c><anno>Default</anno></c> + is returned if <c><anno>Item</anno></c> is not found. Can be + used by the process to retrieve debug data for printing before it + terminates.</p> </desc> </func> + <func> <name name="handle_debug" arity="4"/> - <fsummary>Generate a system event</fsummary> + <fsummary>Generate a system event.</fsummary> <desc> <p>This function is called by a process when it generates a - system event. <c><anno>FormFunc</anno></c> is a formatting - function which is called as <c><anno>FormFunc</anno>(Device, - <anno>Event</anno>, <anno>Extra</anno>)</c> in order to print - the events, which is necessary if tracing is activated. 
- <c><anno>Extra</anno></c> is any extra information which the - process needs in the format function, for example the name - of the process.</p> + system event. <c><anno>FormFunc</anno></c> is a formatting + function, called as <c><anno>FormFunc</anno>(Device, + <anno>Event</anno>, <anno>Extra</anno>)</c> to print the events, + which is necessary if tracing is activated. + <c><anno>Extra</anno></c> is any extra information that the + process needs in the format function, for example, the process + name.</p> </desc> </func> + <func> <name name="handle_system_msg" arity="6"/> - <fsummary>Take care of system messages</fsummary> + <fsummary>Take care of system messages.</fsummary> <desc> - <p>This function is used by a process module that wishes to take care of system - messages. The process receives a <c>{system, <anno>From</anno>, <anno>Msg</anno>}</c> - message and passes the <c><anno>Msg</anno></c> and <c><anno>From</anno></c> to this - function. - </p> - <p>This function <em>never</em> returns. It calls the function - <c><anno>Module</anno>:system_continue(<anno>Parent</anno>, NDebug, <anno>Misc</anno>)</c> where the - process continues the execution, or - <c><anno>Module</anno>:system_terminate(Reason, <anno>Parent</anno>, <anno>Debug</anno>, <anno>Misc</anno>)</c> if - the process should terminate. The <c><anno>Module</anno></c> must export - <c>system_continue/3</c>, <c>system_terminate/4</c>, - <c>system_code_change/4</c>, <c>system_get_state/1</c> and - <c>system_replace_state/2</c> (see below). - </p> - <p>The <c><anno>Misc</anno></c> argument can be used to save internal data - in a process, for example its state. It is sent to + <p>This function is used by a process module to take care of system + messages. The process receives a + <c>{system, <anno>From</anno>, <anno>Msg</anno>}</c> message and + passes <c><anno>Msg</anno></c> and <c><anno>From</anno></c> to this + function.</p> + <p>This function <em>never</em> returns. 
It calls either of the + following functions:</p> + <list type="bulleted"> + <item> + <p><c><anno>Module</anno>:system_continue(<anno>Parent</anno>, + NDebug, <anno>Misc</anno>)</c>, + where the process continues the execution.</p> + </item> + <item> + <p><c><anno>Module</anno>:system_terminate(Reason, + <anno>Parent</anno>, <anno>Debug</anno>, <anno>Misc</anno>)</c>, + if the process is to terminate.</p> + </item> + </list> + <p><c><anno>Module</anno></c> must export the following:</p> + <list type="bulleted"> + <item><c>system_continue/3</c></item> + <item><c>system_terminate/4</c></item> + <item><c>system_code_change/4</c></item> + <item><c>system_get_state/1</c></item> + <item><c>system_replace_state/2</c></item> + </list> + <p>Argument <c><anno>Misc</anno></c> can be used to save internal data + in a process, for example, its state. It is sent to <c><anno>Module</anno>:system_continue/3</c> or - <c><anno>Module</anno>:system_terminate/4</c></p> + <c><anno>Module</anno>:system_terminate/4</c>.</p> </desc> </func> + <func> <name name="print_log" arity="1"/> - <fsummary>Print the logged events in the debug structure</fsummary> + <fsummary>Print the logged events in the debug structure.</fsummary> <desc> - <p>Prints the logged system events in the debug structure + <p>Prints the logged system events in the debug structure, using <c>FormFunc</c> as defined when the event was - generated by a call to <c>handle_debug/4</c>.</p> + generated by a call to + <seealso marker="#handle_debug/4"><c>handle_debug/4</c></seealso>.</p> </desc> </func> + <func> - <name>Mod:system_continue(Parent, Debug, Misc) -> none()</name> - <fsummary>Called when the process should continue its execution</fsummary> + <name>Module:system_code_change(Misc, Module, OldVsn, Extra) -> + {ok, NMisc}</name> + <fsummary>Called when the process is to perform a code change.</fsummary> <type> - <v>Parent = pid()</v> - <v>Debug = [<seealso marker="#type-dbg_opt">dbg_opt()</seealso>]</v> <v>Misc = 
term()</v> + <v>OldVsn = undefined | term()</v> + <v>Module = atom()</v> + <v>Extra = term()</v> + <v>NMisc = term()</v> </type> <desc> - <p>This function is called from <c>sys:handle_system_msg/6</c> when the process - should continue its execution (for example after it has been - suspended). This function never returns.</p> + <p>Called from <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> when the process is to perform a + code change. The code change is used when the + internal data structure has changed. This function + converts argument <c>Misc</c> to the new data + structure. <c>OldVsn</c> is attribute <em>vsn</em> of the + old version of the <c>Module</c>. If no such attribute is + defined, the atom <c>undefined</c> is sent.</p> </desc> </func> + <func> - <name>Mod:system_terminate(Reason, Parent, Debug, Misc) -> none()</name> - <fsummary>Called when the process should terminate</fsummary> + <name>Module:system_continue(Parent, Debug, Misc) -> none()</name> + <fsummary>Called when the process is to continue its execution.</fsummary> <type> - <v>Reason = term()</v> <v>Parent = pid()</v> <v>Debug = [<seealso marker="#type-dbg_opt">dbg_opt()</seealso>]</v> <v>Misc = term()</v> </type> <desc> - <p>This function is called from <c>sys:handle_system_msg/6</c> when the process - should terminate. For example, this function is called when - the process is suspended and its parent orders shut-down. - It gives the process a chance to do a clean-up. This function never - returns.</p> + <p>Called from <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> when the process is to continue + its execution (for example, after it has been + suspended). 
This function never returns.</p> </desc> </func> + <func> - <name>Mod:system_code_change(Misc, Module, OldVsn, Extra) -> {ok, NMisc}</name> - <fsummary>Called when the process should perform a code change</fsummary> + <name>Module:system_get_state(Misc) -> {ok, State}</name> + <fsummary>Called when the process is to return its current state. + </fsummary> <type> <v>Misc = term()</v> - <v>OldVsn = undefined | term()</v> - <v>Module = atom()</v> - <v>Extra = term()</v> - <v>NMisc = term()</v> + <v>State = term()</v> </type> <desc> - <p>Called from <c>sys:handle_system_msg/6</c> when the process - should perform a code change. The code change is used when the - internal data structure has changed. This function - converts the <c>Misc</c> argument to the new data - structure. <c>OldVsn</c> is the <em>vsn</em> attribute of the - old version of the <c>Module</c>. If no such attribute was - defined, the atom <c>undefined</c> is sent.</p> + <p>Called from <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> + when the process is to return a term that reflects its current state. + <c>State</c> is the value returned by + <seealso marker="#get_state/2"><c>get_state/2</c></seealso>.</p> </desc> </func> + <func> - <name>Mod:system_get_state(Misc) -> {ok, State}</name> - <fsummary>Called when the process should return its current state</fsummary> + <name>Module:system_replace_state(StateFun, Misc) -> + {ok, NState, NMisc}</name> + <fsummary>Called when the process is to replace its current state. + </fsummary> <type> + <v>StateFun = fun((State :: term()) -> NState)</v> <v>Misc = term()</v> - <v>State = term()</v> - </type> + <v>NState = term()</v> + <v>NMisc = term()</v> + </type> <desc> - <p>This function is called from <c>sys:handle_system_msg/6</c> when the process - should return a term that reflects its current state. 
<c>State</c> is the - value returned by <c>sys:get_state/2</c>.</p> + <p>Called from <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> when the process is to replace + its current state. <c>NState</c> is the value returned by + <seealso marker="#replace_state/3"><c>replace_state/3</c></seealso>. + </p> </desc> </func> + <func> - <name>Mod:system_replace_state(StateFun, Misc) -> {ok, NState, NMisc}</name> - <fsummary>Called when the process should replace its current state</fsummary> + <name>Module:system_terminate(Reason, Parent, Debug, Misc) -> none()</name> + <fsummary>Called when the process is to terminate.</fsummary> <type> - <v>StateFun = fun((State :: term()) -> NState)</v> + <v>Reason = term()</v> + <v>Parent = pid()</v> + <v>Debug = [<seealso marker="#type-dbg_opt">dbg_opt()</seealso>]</v> <v>Misc = term()</v> - <v>NState = term()</v> - <v>NMisc = term()</v> - </type> + </type> <desc> - <p>This function is called from <c>sys:handle_system_msg/6</c> when the process - should replace its current state. <c>NState</c> is the value returned by - <c>sys:replace_state/3</c>.</p> + <p>Called from <seealso marker="#handle_system_msg/6"> + <c>handle_system_msg/6</c></seealso> when the process is to terminate. + For example, this function is called when + the process is suspended and its parent orders shutdown. + It gives the process a chance to do a cleanup. This function never + returns.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/timer.xml b/lib/stdlib/doc/src/timer.xml index 4f259d57a8..fcaccdb2cb 100644 --- a/lib/stdlib/doc/src/timer.xml +++ b/lib/stdlib/doc/src/timer.xml @@ -30,26 +30,25 @@ <checked></checked> <date>1998-09-09</date> <rev>D</rev> - <file>timer.sgml</file> + <file>timer.xml</file> </header> <module>timer</module> - <modulesummary>Timer Functions</modulesummary> + <modulesummary>Timer functions.</modulesummary> <description> <p>This module provides useful functions related to time. 
Unless otherwise - stated, time is always measured in <c>milliseconds</c>. All - timer functions return immediately, regardless of work carried - out by another process. - </p> - <p>Successful evaluations of the timer functions yield return values - containing a timer reference, denoted <c>TRef</c> below. By using - <c>cancel/1</c>, the returned reference can be used to cancel any - requested action. A <c>TRef</c> is an Erlang term, the contents - of which must not be altered. - </p> - <p>The timeouts are not exact, but should be <c>at least</c> as long - as requested. - </p> + stated, time is always measured in <em>milliseconds</em>. All + timer functions return immediately, regardless of work done by another + process.</p> + <p>Successful evaluations of the timer functions give return values + containing a timer reference, denoted <c>TRef</c>. By using + <seealso marker="#cancel/1"><c>cancel/1</c></seealso>, + the returned reference can be used to cancel any + requested action. A <c>TRef</c> is an Erlang term, whose contents + must not be changed.</p> + <p>The time-outs are not exact, but are <em>at least</em> as long + as requested.</p> </description> + <datatypes> <datatype> <name name="time"/> @@ -60,231 +59,286 @@ <desc><p>A timer reference.</p></desc> </datatype> </datatypes> + <funcs> <func> - <name name="start" arity="0"/> - <fsummary>Start a global timer server (named <c>timer_server</c>).</fsummary> + <name name="apply_after" arity="4"/> + <fsummary>Apply <c>Module:Function(Arguments)</c> after a specified + <c>Time</c>.</fsummary> <desc> - <p>Starts the timer server. Normally, the server does not need - to be started explicitly. It is started dynamically if it - is needed. This is useful during development, but in a - target system the server should be started explicitly. 
Use - configuration parameters for <c>kernel</c> for this.</p> + <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, + <anno>Arguments</anno>)</c> after <c><anno>Time</anno></c> + milliseconds.</p> + <p>Returns <c>{ok, <anno>TRef</anno>}</c> or + <c>{error, <anno>Reason</anno>}</c>.</p> </desc> </func> + <func> - <name name="apply_after" arity="4"/> - <fsummary>Apply <c>Module:Function(Arguments)</c>after a specified <c>Time</c>.</fsummary> + <name name="apply_interval" arity="4"/> + <fsummary>Evaluate <c>Module:Function(Arguments)</c> repeatedly at + intervals of <c>Time</c>.</fsummary> <desc> - <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, <anno>Arguments</anno>)</c> after <c><anno>Time</anno></c> amount of time - has elapsed. Returns <c>{ok, <anno>TRef</anno>}</c>, or <c>{error, <anno>Reason</anno>}</c>.</p> + <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, + <anno>Arguments</anno>)</c> repeatedly at intervals of + <c><anno>Time</anno></c>.</p> + <p>Returns <c>{ok, <anno>TRef</anno>}</c> or + <c>{error, <anno>Reason</anno>}</c>.</p> </desc> </func> + <func> - <name name="send_after" arity="2"/> - <name name="send_after" arity="3"/> - <fsummary>Send <c>Message</c>to <c>Pid</c>after a specified <c>Time</c>.</fsummary> + <name name="cancel" arity="1"/> + <fsummary>Cancel a previously requested time-out identified by + <c>TRef</c>.</fsummary> <desc> - <taglist> - <tag><c>send_after/3</c></tag> - <item> - <p>Evaluates <c><anno>Pid</anno> ! <anno>Message</anno></c> after <c><anno>Time</anno></c> amount - of time has elapsed. (<c><anno>Pid</anno></c> can also be an atom of a - registered name.) Returns <c>{ok, <anno>TRef</anno>}</c>, or - <c>{error, <anno>Reason</anno>}</c>.</p> - </item> - <tag><c>send_after/2</c></tag> - <item> - <p>Same as <c>send_after(<anno>Time</anno>, self(), <anno>Message</anno>)</c>.</p> - </item> - </taglist> + <p>Cancels a previously requested time-out. 
<c><anno>TRef</anno></c> is + a unique + timer reference returned by the related timer function.</p> + <p>Returns <c>{ok, cancel}</c>, or <c>{error, <anno>Reason</anno>}</c> + when <c><anno>TRef</anno></c> is not a timer reference.</p> </desc> </func> + <func> - <name name="kill_after" arity="1"/> - <name name="kill_after" arity="2"/> <name name="exit_after" arity="2"/> <name name="exit_after" arity="3"/> - <fsummary>Send an exit signal with <c>Reason</c>after a specified <c>Time</c>.</fsummary> + <fsummary>Send an exit signal with <c>Reason</c> after a specified + <c>Time</c>.</fsummary> + <desc> + <p><c>exit_after/2</c> is the same as + <c>exit_after(<anno>Time</anno>, self(), + <anno>Reason1</anno>)</c>.</p> + <p><c>exit_after/3</c> sends an exit signal with reason + <c><anno>Reason1</anno></c> to + pid <c><anno>Pid</anno></c>. Returns <c>{ok, <anno>TRef</anno>}</c> + or <c>{error, <anno>Reason2</anno>}</c>.</p> + </desc> + </func> + + <func> + <name name="hms" arity="3"/> + <fsummary>Convert <c>Hours</c>+<c>Minutes</c>+<c>Seconds</c> to + <c>Milliseconds</c>.</fsummary> + <desc> + <p>Returns the number of milliseconds in <c><anno>Hours</anno> + + <anno>Minutes</anno> + <anno>Seconds</anno></c>.</p> + </desc> + </func> + + <func> + <name name="hours" arity="1"/> + <fsummary>Convert <c>Hours</c> to <c>Milliseconds</c>.</fsummary> + <desc> + <p>Returns the number of milliseconds in <c><anno>Hours</anno></c>.</p> + </desc> + </func> + + <func> + <name name="kill_after" arity="1"/> + <name name="kill_after" arity="2"/> + <fsummary>Send an exit signal with <c>Reason</c> after a specified + <c>Time</c>.</fsummary> + <desc> + <p><c>kill_after/1</c> is the same as + <c>exit_after(<anno>Time</anno>, self(), kill)</c>.</p> + <p><c>kill_after/2</c> is the same as + <c>exit_after(<anno>Time</anno>, <anno>Pid</anno>, kill)</c>.</p> + </desc> + </func> + + <func> + <name name="minutes" arity="1"/> + <fsummary>Converts <c>Minutes</c> to <c>Milliseconds</c>.</fsummary> + <desc> 
+ <p>Returns the number of milliseconds in + <c><anno>Minutes</anno></c>.</p> + </desc> + </func> + + <func> + <name name="now_diff" arity="2"/> + <fsummary>Calculate time difference between time stamps.</fsummary> + <type_desc variable="Tdiff">In microseconds</type_desc> + <desc> + <p>Calculates the time difference <c><anno>Tdiff</anno> = + <anno>T2</anno> - <anno>T1</anno></c> in <em>microseconds</em>, + where <c><anno>T1</anno></c> and <c><anno>T2</anno></c> + are time-stamp tuples in the same format as returned from + <seealso marker="erts:erlang#timestamp/0"> + <c>erlang:timestamp/0</c></seealso> or + <seealso marker="kernel:os#timestamp/0"> + <c>os:timestamp/0</c></seealso>.</p> + </desc> + </func> + + <func> + <name name="seconds" arity="1"/> + <fsummary>Convert <c>Seconds</c> to <c>Milliseconds</c>.</fsummary> + <desc> + <p>Returns the number of milliseconds in + <c><anno>Seconds</anno></c>.</p> + </desc> + </func> + + <func> + <name name="send_after" arity="2"/> + <name name="send_after" arity="3"/> + <fsummary>Send <c>Message</c> to <c>Pid</c> after a specified + <c>Time</c>.</fsummary> <desc> <taglist> - <tag><c>exit_after/3</c></tag> - <item> - <p>Send an exit signal with reason <c><anno>Reason1</anno></c> to Pid - <c><anno>Pid</anno></c>. Returns <c>{ok, <anno>TRef</anno>}</c>, or - <c>{error, <anno>Reason2</anno>}</c>.</p> - </item> - <tag><c>exit_after/2</c></tag> - <item> - <p>Same as <c>exit_after(<anno>Time</anno>, self(), <anno>Reason1</anno>)</c>. </p> - </item> - <tag><c>kill_after/2</c></tag> + <tag><c>send_after/3</c></tag> <item> - <p>Same as <c>exit_after(<anno>Time</anno>, <anno>Pid</anno>, kill)</c>. </p> + <p>Evaluates <c><anno>Pid</anno> ! <anno>Message</anno></c> after + <c><anno>Time</anno></c> milliseconds. 
(<c><anno>Pid</anno></c> + can also be an atom of a registered name.)</p> + <p>Returns <c>{ok, <anno>TRef</anno>}</c> or + <c>{error, <anno>Reason</anno>}</c>.</p> </item> - <tag><c>kill_after/1</c></tag> + <tag><c>send_after/2</c></tag> <item> - <p>Same as <c>exit_after(<anno>Time</anno>, self(), kill)</c>. </p> + <p>Same as <c>send_after(<anno>Time</anno>, self(), + <anno>Message</anno>)</c>.</p> </item> </taglist> </desc> </func> - <func> - <name name="apply_interval" arity="4"/> - <fsummary>Evaluate <c>Module:Function(Arguments)</c>repeatedly at intervals of <c>Time</c>.</fsummary> - <desc> - <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, <anno>Arguments</anno>)</c> repeatedly at - intervals of <c><anno>Time</anno></c>. Returns <c>{ok, <anno>TRef</anno>}</c>, or - <c>{error, <anno>Reason</anno>}</c>.</p> - </desc> - </func> + <func> <name name="send_interval" arity="2"/> <name name="send_interval" arity="3"/> - <fsummary>Send <c>Message</c>repeatedly at intervals of <c>Time</c>.</fsummary> + <fsummary>Send <c>Message</c> repeatedly at intervals of <c>Time</c>. + </fsummary> <desc> <taglist> <tag><c>send_interval/3</c></tag> <item> - <p>Evaluates <c><anno>Pid</anno> ! <anno>Message</anno></c> repeatedly after <c><anno>Time</anno></c> - amount of time has elapsed. (<c><anno>Pid</anno></c> can also be an atom of - a registered name.) Returns <c>{ok, <anno>TRef</anno>}</c> or + <p>Evaluates <c><anno>Pid</anno> ! <anno>Message</anno></c> + repeatedly after <c><anno>Time</anno></c> milliseconds. 
+ (<c><anno>Pid</anno></c> can also be + an atom of a registered name.)</p> + <p>Returns <c>{ok, <anno>TRef</anno>}</c> or <c>{error, <anno>Reason</anno>}</c>.</p> </item> <tag><c>send_interval/2</c></tag> <item> - <p>Same as <c>send_interval(<anno>Time</anno>, self(), <anno>Message</anno>)</c>.</p> + <p>Same as <c>send_interval(<anno>Time</anno>, self(), + <anno>Message</anno>)</c>.</p> </item> </taglist> </desc> </func> + <func> - <name name="cancel" arity="1"/> - <fsummary>Cancel a previously requested timeout identified by <c>TRef</c>.</fsummary> + <name name="sleep" arity="1"/> + <fsummary>Suspend the calling process for <c>Time</c> milliseconds. + </fsummary> <desc> - <p>Cancels a previously requested timeout. <c><anno>TRef</anno></c> is a unique - timer reference returned by the timer function in question. Returns - <c>{ok, cancel}</c>, or <c>{error, <anno>Reason</anno>}</c> when <c><anno>TRef</anno></c> - is not a timer reference.</p> + <p>Suspends the process calling this function for + <c><anno>Time</anno></c> milliseconds and then returns <c>ok</c>, + or suspends the process forever if <c><anno>Time</anno></c> is the + atom <c>infinity</c>. Naturally, this + function does <em>not</em> return immediately.</p> </desc> </func> + <func> - <name name="sleep" arity="1"/> - <fsummary>Suspend the calling process for <c>Time</c>amount of milliseconds.</fsummary> + <name name="start" arity="0"/> + <fsummary>Start a global timer server (named <c>timer_server</c>). + </fsummary> <desc> - <p>Suspends the process calling this function for <c><anno>Time</anno></c> amount - of milliseconds and then returns <c>ok</c>, or suspend the process - forever if <c><anno>Time</anno></c> is the atom <c>infinity</c>. Naturally, this - function does <em>not</em> return immediately.</p> + <p>Starts the timer server. Normally, the server does not need + to be started explicitly. It is started dynamically if it + is needed. 
This is useful during development, but in a + target system the server is to be started explicitly. Use + configuration parameters for + <seealso marker="kernel:index">Kernel</seealso> for this.</p> </desc> </func> + <func> <name name="tc" arity="1"/> <name name="tc" arity="2"/> <name name="tc" arity="3"/> <fsummary>Measure the real time it takes to evaluate <c>apply(Module, - Function, Arguments)</c> or <c>apply(Fun, Arguments)</c></fsummary> + Function, Arguments)</c> or <c>apply(Fun, Arguments)</c>.</fsummary> <type_desc variable="Time">In microseconds</type_desc> <desc> <taglist> <tag><c>tc/3</c></tag> <item> - <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, <anno>Arguments</anno>)</c> and measures - the elapsed real time as reported by <c>os:timestamp/0</c>. - Returns <c>{<anno>Time</anno>, <anno>Value</anno>}</c>, where - <c><anno>Time</anno></c> is the elapsed real time in <em>microseconds</em>, - and <c><anno>Value</anno></c> is what is returned from the apply.</p> + <p>Evaluates <c>apply(<anno>Module</anno>, <anno>Function</anno>, + <anno>Arguments</anno>)</c> and measures the elapsed real time as + reported by <seealso marker="kernel:os#timestamp/0"> + <c>os:timestamp/0</c></seealso>.</p> + <p>Returns <c>{<anno>Time</anno>, <anno>Value</anno>}</c>, where + <c><anno>Time</anno></c> is the elapsed real time in + <em>microseconds</em>, and <c><anno>Value</anno></c> is what is + returned from the apply.</p> </item> <tag><c>tc/2</c></tag> <item> - <p>Evaluates <c>apply(<anno>Fun</anno>, <anno>Arguments</anno>)</c>. Otherwise works - like <c>tc/3</c>.</p> + <p>Evaluates <c>apply(<anno>Fun</anno>, <anno>Arguments</anno>)</c>. + Otherwise the same as <c>tc/3</c>.</p> </item> <tag><c>tc/1</c></tag> <item> - <p>Evaluates <c><anno>Fun</anno>()</c>. 
Otherwise the same as + <c>tc/2</c>.</p> </item> - </taglist> </desc> </func> - <func> - <name name="now_diff" arity="2"/> - <fsummary>Calculate time difference between timestamps</fsummary> - <type_desc variable="Tdiff">In microseconds</type_desc> - <desc> - <p>Calculates the time difference <c><anno>Tdiff</anno> = <anno>T2</anno> - <anno>T1</anno></c> in - <em>microseconds</em>, where <c><anno>T1</anno></c> and <c><anno>T2</anno></c> - are timestamp tuples on the same format as returned from - <seealso marker="erts:erlang#timestamp/0"><c>erlang:timestamp/0</c></seealso>, - or <seealso marker="kernel:os#timestamp/0"><c>os:timestamp/0</c></seealso>.</p> - </desc> - </func> - <func> - <name name="seconds" arity="1"/> - <fsummary>Convert <c>Seconds</c>to <c>Milliseconds</c>.</fsummary> - <desc> - <p>Returns the number of milliseconds in <c><anno>Seconds</anno></c>.</p> - </desc> - </func> - <func> - <name name="minutes" arity="1"/> - <fsummary>Converts <c>Minutes</c> to <c>Milliseconds</c>.</fsummary> - <desc> - <p>Return the number of milliseconds in <c><anno>Minutes</anno></c>.</p> - </desc> - </func> - <func> - <name name="hours" arity="1"/> - <fsummary>Convert <c>Hours</c>to <c>Milliseconds</c>.</fsummary> - <desc> - <p>Returns the number of milliseconds in <c><anno>Hours</anno></c>.</p> - </desc> - </func> - <func> - <name name="hms" arity="3"/> - <fsummary>Convert <c>Hours</c>+<c>Minutes</c>+<c>Seconds</c>to <c>Milliseconds</c>.</fsummary> - <desc> - <p>Returns the number of milliseconds in <c><anno>Hours</anno> + <anno>Minutes</anno> + <anno>Seconds</anno></c>.</p> - </desc> - </func> </funcs> <section> <title>Examples</title> - <p>This example illustrates how to print out "Hello World!" in 5 seconds:</p> + <p><em>Example 1</em></p> + <p>The following example shows how to print "Hello World!" 
in 5 seconds:</p> <pre> - 1> <input>timer:apply_after(5000, io, format, ["~nHello World!~n", []]).</input> - {ok,TRef} - Hello World!</pre> - <p>The following coding example illustrates a process which performs a - certain action and if this action is not completed within a certain - limit, then the process is killed.</p> +1> <input>timer:apply_after(5000, io, format, ["~nHello World!~n", []]).</input> +{ok,TRef} +Hello World!</pre> + + <p><em>Example 2</em></p> + <p>The following example shows a process performing a + certain action, and if this action is not completed within a certain + limit, the process is killed:</p> <code type="none"> - Pid = spawn(mod, fun, [foo, bar]), - %% If pid is not finished in 10 seconds, kill him - {ok, R} = timer:kill_after(timer:seconds(10), Pid), - ... - %% We change our mind... - timer:cancel(R), - ...</code> +Pid = spawn(mod, fun, [foo, bar]), +%% If pid is not finished in 10 seconds, kill him +{ok, R} = timer:kill_after(timer:seconds(10), Pid), +... +%% We change our mind... +timer:cancel(R), +...</code> </section> <section> - <title>WARNING</title> - <p>A timer can always be removed by calling <c>cancel/1</c>. - </p> - <p>An interval timer, i.e. a timer created by evaluating any of the - functions <c>apply_interval/4</c>, <c>send_interval/3</c>, and - <c>send_interval/2</c>, is linked to the process towards which - the timer performs its task. - </p> - <p>A one-shot timer, i.e. a timer created by evaluating any of the - functions <c>apply_after/4</c>, <c>send_after/3</c>, - <c>send_after/2</c>, <c>exit_after/3</c>, <c>exit_after/2</c>, - <c>kill_after/2</c>, and <c>kill_after/1</c> is not linked to any - process. 
Hence, such a timer is removed only when it reaches its - timeout, or if it is explicitly removed by a call to <c>cancel/1</c>.</p> + <title>Notes</title> + <p>A timer can always be removed by calling + <seealso marker="#cancel/1"><c>cancel/1</c></seealso>.</p> + + <p>An interval timer, that is, a timer created by evaluating any of the + functions + <seealso marker="#apply_interval/4"><c>apply_interval/4</c></seealso>, + <seealso marker="#send_interval/3"><c>send_interval/3</c></seealso>, and + <seealso marker="#send_interval/2"><c>send_interval/2</c></seealso> + is linked to the process to which the timer performs its task.</p> + + <p>A one-shot timer, that is, a timer created by evaluating any of the + functions + <seealso marker="#apply_after/4"><c>apply_after/4</c></seealso>, + <seealso marker="#send_after/3"><c>send_after/3</c></seealso>, + <seealso marker="#send_after/2"><c>send_after/2</c></seealso>, + <seealso marker="#exit_after/3"><c>exit_after/3</c></seealso>, + <seealso marker="#exit_after/2"><c>exit_after/2</c></seealso>, + <seealso marker="#kill_after/2"><c>kill_after/2</c></seealso>, and + <seealso marker="#kill_after/1"><c>kill_after/1</c></seealso> + is not linked to any process. Hence, such a timer is removed only + when it reaches its time-out, or if it is explicitly removed by a call to + <seealso marker="#cancel/1"><c>cancel/1</c></seealso>.</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/unicode.xml b/lib/stdlib/doc/src/unicode.xml index edc6830cb5..93d0d37456 100644 --- a/lib/stdlib/doc/src/unicode.xml +++ b/lib/stdlib/doc/src/unicode.xml @@ -31,12 +31,27 @@ <rev></rev> </header> <module>unicode</module> - <modulesummary>Functions for converting Unicode characters</modulesummary> + <modulesummary>Functions for converting Unicode characters.</modulesummary> <description> - <p>This module contains functions for converting between different character representations. 
Basically it converts between ISO-latin-1 characters and Unicode ditto, but it can also convert between different Unicode encodings (like UTF-8, UTF-16 and UTF-32).</p> - <p>The default Unicode encoding in Erlang is in binaries UTF-8, which is also the format in which built in functions and libraries in OTP expect to find binary Unicode data. In lists, Unicode data is encoded as integers, each integer representing one character and encoded simply as the Unicode codepoint for the character.</p> - <p>Other Unicode encodings than integers representing codepoints or UTF-8 in binaries are referred to as "external encodings". The ISO-latin-1 encoding is in binaries and lists referred to as latin1-encoding.</p> - <p>It is recommended to only use external encodings for communication with external entities where this is required. When working inside the Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when representing Unicode characters. Latin1 encoding is supported both for backward compatibility and for communication with external entities not supporting Unicode character sets.</p> + <p>This module contains functions for converting between different character + representations. It converts between ISO Latin-1 characters and Unicode + characters, but it can also convert between different Unicode encodings + (like UTF-8, UTF-16, and UTF-32).</p> + <p>The default Unicode encoding in Erlang is in binaries UTF-8, which is also + the format in which built-in functions and libraries in OTP expect to find + binary Unicode data. In lists, Unicode data is encoded as integers, each + integer representing one character and encoded simply as the Unicode code + point for the character.</p> + <p>Other Unicode encodings than integers representing code points or UTF-8 + in binaries are referred to as "external encodings". 
The ISO + Latin-1 encoding + is in binaries and lists referred to as latin1-encoding.</p> + <p>It is recommended to only use external encodings for communication with + external entities where this is required. When working inside the + Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when + representing Unicode characters. ISO Latin-1 encoding is supported both + for backward compatibility and for communication + with external entities not supporting Unicode character sets.</p> </description> <datatypes> @@ -49,7 +64,8 @@ <datatype> <name name="unicode_binary"/> <desc> - <p>A <c>binary()</c> with characters encoded in the UTF-8 coding standard.</p> + <p>A <c>binary()</c> with characters encoded in the UTF-8 coding + standard.</p> </desc> </datatype> <datatype> @@ -61,8 +77,8 @@ <datatype> <name name="external_unicode_binary"/> <desc> - <p>A <c>binary()</c> with characters coded in a user specified Unicode - encoding other than UTF-8 (UTF-16 or UTF-32).</p> + <p>A <c>binary()</c> with characters coded in a user-specified Unicode + encoding other than UTF-8 (that is, UTF-16 or UTF-32).</p> </desc> </datatype> <datatype> @@ -73,23 +89,23 @@ </datatype> <datatype> <name name="latin1_binary"/> - <desc><p>A <c>binary()</c> with characters coded in ISO-latin-1.</p> + <desc><p>A <c>binary()</c> with characters coded in ISO Latin-1.</p> </desc> </datatype> <datatype> <name name="latin1_char"/> - <desc><p>An <c>integer()</c> representing valid latin1 + <desc><p>An <c>integer()</c> representing a valid ISO Latin-1 character (0-255).</p> </desc> </datatype> <datatype> <name name="latin1_chardata"/> - <desc><p>The same as <c>iodata()</c>.</p> + <desc><p>Same as <c>iodata()</c>.</p> </desc> </datatype> <datatype> <name name="latin1_charlist"/> - <desc><p>The same as <c>iolist()</c>.</p> + <desc><p>Same as <c>iolist()</c>.</p> </desc> </datatype> </datatypes> @@ -100,197 +116,224 @@ <fsummary>Identify UTF byte order marks in a binary.</fsummary> <type 
name="endian"/> <type_desc variable="Bin"> - A <c>binary()</c> such that <c>byte_size(<anno>Bin</anno>) >= 4</c>. + A <c>binary()</c> such that <c>byte_size(<anno>Bin</anno>) >= 4</c>. </type_desc> <desc> - - <p>Check for a UTF byte order mark (BOM) in the beginning of a - binary. If the supplied binary <c><anno>Bin</anno></c> begins with a valid - byte order mark for either UTF-8, UTF-16 or UTF-32, the function - returns the encoding identified along with the length of the BOM - in bytes.</p> - - <p>If no BOM is found, the function returns <c>{latin1,0}</c></p> + <p>Checks for a UTF Byte Order Mark (BOM) in the beginning of a + binary. If the supplied binary <c><anno>Bin</anno></c> begins with a + valid BOM for either UTF-8, UTF-16, or UTF-32, the function + returns the encoding identified along with the BOM length + in bytes.</p> + <p>If no BOM is found, the function returns <c>{latin1,0}</c>.</p> </desc> </func> + <func> - <name name="characters_to_list" arity="1"/> - <fsummary>Convert a collection of characters to list of Unicode characters</fsummary> + <name name="characters_to_binary" arity="1"/> + <fsummary>Convert a collection of characters to a UTF-8 binary.</fsummary> <desc> - <p>Same as <c>characters_to_list(<anno>Data</anno>, unicode)</c>.</p> + <p>Same as <c>characters_to_binary(<anno>Data</anno>, unicode, + unicode)</c>.</p> </desc> </func> - <func> - <name name="characters_to_list" arity="2"/> - <fsummary>Convert a collection of characters to list of Unicode characters</fsummary> - <desc> - - <p>Converts a possibly deep list of integers and - binaries into a list of integers representing Unicode - characters. The binaries in the input may have characters - encoded as latin1 (0 - 255, one character per byte), in which - case the <c><anno>InEncoding</anno></c> parameter should be given as - <c>latin1</c>, or have characters encoded as one of the - UTF-encodings, which is given as the <c><anno>InEncoding</anno></c> - parameter. 
Only when the <c><anno>InEncoding</anno></c> is one of the UTF - encodings, integers in the list are allowed to be greater than - 255.</p> - - <p>If <c><anno>InEncoding</anno></c> is <c>latin1</c>, the <c><anno>Data</anno></c> parameter - corresponds to the <c>iodata()</c> type, but for <c>unicode</c>, - the <c><anno>Data</anno></c> parameter can contain integers greater than 255 - (Unicode characters beyond the ISO-latin-1 range), which would - make it invalid as <c>iodata()</c>.</p> - - <p>The purpose of the function is mainly to be able to convert - combinations of Unicode characters into a pure Unicode - string in list representation for further processing. For - writing the data to an external entity, the reverse function - <seealso - marker="#characters_to_binary/3"><c>characters_to_binary/3</c></seealso> - comes in handy.</p> - - <p>The option <c>unicode</c> is an alias for <c>utf8</c>, as this is the - preferred encoding for Unicode characters in - binaries. <c>utf16</c> is an alias for <c>{utf16,big}</c> and - <c>utf32</c> is an alias for <c>{utf32,big}</c>. The <c>big</c> - and <c>little</c> atoms denote big or little endian - encoding.</p> - <p>If for some reason, the data cannot be converted, either - because of illegal Unicode/latin1 characters in the list, or - because of invalid UTF encoding in any binaries, an error - tuple is returned. The error tuple contains the tag - <c>error</c>, a list representing the characters that could be - converted before the error occurred and a representation of the - characters including and after the offending integer/bytes. The - last part is mostly for debugging as it still constitutes a - possibly deep and/or mixed list, not necessarily of the same - depth as the original data. 
The error occurs when traversing the - list and whatever is left to decode is simply returned as is.</p> - - <p>However, if the input <c><anno>Data</anno></c> is a pure binary, the third - part of the error tuple is guaranteed to be a binary as - well.</p> - - <p>Errors occur for the following reasons:</p> - <list type="bulleted"> - - <item>Integers out of range - If <c><anno>InEncoding</anno></c> is - <c>latin1</c>, an error occurs whenever an integer greater - than 255 is found in the lists. If <c><anno>InEncoding</anno></c> is - of a Unicode type, an error occurs whenever an integer - <list type="bulleted"> - <item>greater than <c>16#10FFFF</c> - (the maximum Unicode character),</item> - <item>in the range <c>16#D800</c> to <c>16#DFFF</c> - (invalid range reserved for UTF-16 surrogate pairs)</item> - </list> - is found. - </item> - - <item>UTF encoding incorrect - If <c><anno>InEncoding</anno></c> is - one of the UTF types, the bytes in any binaries have to be valid - in that encoding. Errors can occur for various - reasons, including "pure" decoding errors - (like the upper - bits of the bytes being wrong), the bytes are decoded to a - too large number, the bytes are decoded to a code-point in the - invalid Unicode - range, or encoding is "overlong", meaning that a - number should have been encoded in fewer bytes. The - case of a truncated UTF is handled specially, see the - paragraph about incomplete binaries below. If - <c><anno>InEncoding</anno></c> is <c>latin1</c>, binaries are always valid - as long as they contain whole bytes, - as each byte falls into the valid ISO-latin-1 range.</item> - - </list> - - <p>A special type of error is when no actual invalid integers or - bytes are found, but a trailing <c>binary()</c> consists of too - few bytes to decode the last character. This error might occur - if bytes are read from a file in chunks or binaries in other - ways are split on non UTF character boundaries. 
In this case an - <c>incomplete</c> tuple is returned instead of the <c>error</c> - tuple. It consists of the same parts as the <c>error</c> tuple, but - the tag is <c>incomplete</c> instead of <c>error</c> and the - last element is always guaranteed to be a binary consisting of - the first part of a (so far) valid UTF character.</p> - - <p>If one UTF characters is split over two consecutive - binaries in the <c><anno>Data</anno></c>, the conversion succeeds. This means - that a character can be decoded from a range of binaries as long - as the whole range is given as input without errors - occurring. Example:</p> - -<code> - decode_data(Data) -> - case unicode:characters_to_list(Data,unicode) of - {incomplete,Encoded, Rest} -> - More = get_some_more_data(), - Encoded ++ decode_data([Rest, More]); - {error,Encoded,Rest} -> - handle_error(Encoded,Rest); - List -> - List - end. -</code> - <p>Bit-strings that are not whole bytes are however not allowed, - so a UTF character has to be split along 8-bit boundaries to - ever be decoded.</p> - - <p>If any parameters are of the wrong type, the list structure - is invalid (a number as tail) or the binaries do not contain - whole bytes (bit-strings), a <c>badarg</c> exception is - thrown.</p> - + <func> + <name name="characters_to_binary" arity="2"/> + <fsummary>Convert a collection of characters to a UTF-8 binary.</fsummary> + <desc> + <p>Same as <c>characters_to_binary(<anno>Data</anno>, + <anno>InEncoding</anno>, unicode)</c>.</p> </desc> </func> + <func> - <name name="characters_to_binary" arity="1"/> - <fsummary>Convert a collection of characters to a UTF-8 binary</fsummary> + <name name="characters_to_binary" arity="3"/> + <fsummary>Convert a collection of characters to a UTF-8 binary.</fsummary> <desc> - <p>Same as <c>characters_to_binary(<anno>Data</anno>, unicode, unicode)</c>.</p> + <p>Behaves as <seealso marker="#characters_to_list/2"> + <c>characters_to_list/2</c></seealso>, but produces a binary + instead of a 
Unicode list.</p> + <p><c><anno>InEncoding</anno></c> defines how input is to be interpreted + if binaries are present in <c>Data</c></p> + <p><c><anno>OutEncoding</anno></c> defines in what format output is to + be generated.</p> + <p>Options:</p> + <taglist> + <tag><c>unicode</c></tag> + <item> + <p>An alias for <c>utf8</c>, as this is the preferred encoding for + Unicode characters in binaries.</p> + </item> + <tag><c>utf16</c></tag> + <item> + <p>An alias for <c>{utf16,big}</c>.</p> + </item> + <tag><c>utf32</c></tag> + <item> + <p>An alias for <c>{utf32,big}</c>.</p> + </item> + </taglist> + <p>The atoms <c>big</c> and <c>little</c> denote big- or little-endian + encoding.</p> + <p>Errors and exceptions occur as in + <seealso marker="#characters_to_list/2"> + <c>characters_to_list/2</c></seealso>, but the second element + in tuple <c>error</c> or <c>incomplete</c> is a <c>binary()</c> + and not a <c>list()</c>.</p> </desc> </func> - <func> - <name name="characters_to_binary" arity="2"/> - <fsummary>Convert a collection of characters to a UTF-8 binary</fsummary> + <func> + <name name="characters_to_list" arity="1"/> + <fsummary>Convert a collection of characters to a list of Unicode + characters.</fsummary> <desc> - <p>Same as <c>characters_to_binary(<anno>Data</anno>, <anno>InEncoding</anno>, unicode)</c>.</p> + <p>Same as <c>characters_to_list(<anno>Data</anno>, unicode)</c>.</p> </desc> - </func> + </func> + <func> - <name name="characters_to_binary" arity="3"/> - <fsummary>Convert a collection of characters to a UTF-8 binary</fsummary> + <name name="characters_to_list" arity="2"/> + <fsummary>Convert a collection of characters to a list of Unicode + characters.</fsummary> <desc> - - <p>Behaves as <seealso marker="#characters_to_list/2"> - <c>characters_to_list/2</c></seealso>, but produces an binary - instead of a Unicode list. 
The - <c><anno>InEncoding</anno></c> defines how input is to be interpreted if - binaries are present in the <c>Data</c>, while - <c><anno>OutEncoding</anno></c> defines in what format output is to be - generated.</p> - - <p>The option <c>unicode</c> is an alias for <c>utf8</c>, as this is the - preferred encoding for Unicode characters in - binaries. <c>utf16</c> is an alias for <c>{utf16,big}</c> and - <c>utf32</c> is an alias for <c>{utf32,big}</c>. The <c>big</c> - and <c>little</c> atoms denote big or little endian - encoding.</p> - - <p>Errors and exceptions occur as in <seealso - marker="#characters_to_list/2"> - <c>characters_to_list/2</c></seealso>, but the second element - in the <c>error</c> or - <c>incomplete</c> tuple will be a <c>binary()</c> and not a - <c>list()</c>.</p> - + <p>Converts a possibly deep list of integers and + binaries into a list of integers representing Unicode + characters. The binaries in the input can have characters + encoded as one of the following:</p> + <list type="bulleted"> + <item> + <p>ISO Latin-1 (0-255, one character per byte). Here, + case parameter <c><anno>InEncoding</anno></c> is to be specified + as <c>latin1</c>.</p> + </item> + <item> + <p>One of the UTF-encodings, which is specified as parameter + <c><anno>InEncoding</anno></c>.</p> + </item> + </list> + <p>Only when <c><anno>InEncoding</anno></c> is one of the UTF + encodings, integers in the list are allowed to be > 255.</p> + <p>If <c><anno>InEncoding</anno></c> is <c>latin1</c>, parameter + <c><anno>Data</anno></c> corresponds to the <c>iodata()</c> type, + but for <c>unicode</c>, parameter <c><anno>Data</anno></c> can + contain integers > 255 + (Unicode characters beyond the ISO Latin-1 range), which + makes it invalid as <c>iodata()</c>.</p> + <p>The purpose of the function is mainly to convert + combinations of Unicode characters into a pure Unicode + string in list representation for further processing. 
For + writing the data to an external entity, the reverse function + <seealso marker="#characters_to_binary/3"> + <c>characters_to_binary/3</c></seealso> + comes in handy.</p> + <p>Option <c>unicode</c> is an alias for <c>utf8</c>, as this is the + preferred encoding for Unicode characters in + binaries. <c>utf16</c> is an alias for <c>{utf16,big}</c> and + <c>utf32</c> is an alias for <c>{utf32,big}</c>. The atoms <c>big</c> + and <c>little</c> denote big- or little-endian encoding.</p> + <p>If the data cannot be converted, either + because of illegal Unicode/ISO Latin-1 characters in the list, + or because of invalid UTF encoding in any binaries, an error + tuple is returned. The error tuple contains the tag + <c>error</c>, a list representing the characters that could be + converted before the error occurred and a representation of the + characters including and after the offending integer/bytes. The + last part is mostly for debugging, as it still constitutes a + possibly deep or mixed list, or both, not necessarily of the same + depth as the original data. 
The error occurs when traversing the + list and whatever is left to decode is returned "as is".</p> + <p>However, if the input <c><anno>Data</anno></c> is a pure binary, + the third part of the error tuple is guaranteed to be a binary as + well.</p> + <p>Errors occur for the following reasons:</p> + <list type="bulleted"> + <item> + <p>Integers out of range.</p> + <p>If <c><anno>InEncoding</anno></c> is <c>latin1</c>, + an error occurs whenever an integer > 255 is found + in the lists.</p> + <p>If <c><anno>InEncoding</anno></c> is of a Unicode type, + an error occurs whenever either of the following is found:</p> + <list type="bulleted"> + <item> + <p>An integer > 16#10FFFF + (the maximum Unicode character)</p> + </item> + <item> + <p>An integer in the range 16#D800 to 16#DFFF (invalid range + reserved for UTF-16 surrogate pairs)</p> + </item> + </list> + </item> + <item> + <p>Incorrect UTF encoding.</p> + <p>If <c><anno>InEncoding</anno></c> is one of the UTF types, + the bytes in any binaries must be valid in that encoding.</p> + <p>Errors can occur for various reasons, including the + following:</p> + <list type="bulleted"> + <item> + <p>"Pure" decoding errors + (like the upper bits of the bytes being wrong).</p> + </item> + <item> + <p>The bytes are decoded to a too large number.</p> + </item> + <item> + <p>The bytes are decoded to a code point in the invalid + Unicode range.</p> + </item> + <item> + <p>Encoding is "overlong", meaning that a number + should have been encoded in fewer bytes.</p> + </item> + </list> + <p>The case of a truncated UTF is handled specially, see the + paragraph about incomplete binaries below.</p> + <p>If <c><anno>InEncoding</anno></c> is <c>latin1</c>, binaries are + always valid as long as they contain whole bytes, + as each byte falls into the valid ISO Latin-1 range.</p> + </item> + </list> + <p>A special type of error is when no actual invalid integers or + bytes are found, but a trailing <c>binary()</c> consists of too + few 
bytes to decode the last character. This error can occur + if bytes are read from a file in chunks or if binaries in other + ways are split on non-UTF character boundaries. An <c>incomplete</c> + tuple is then returned instead of the <c>error</c> tuple. + It consists of the same parts as the <c>error</c> tuple, but + the tag is <c>incomplete</c> instead of <c>error</c> and the + last element is always guaranteed to be a binary consisting of + the first part of a (so far) valid UTF character.</p> + <p>If one UTF character is split over two consecutive binaries in + the <c><anno>Data</anno></c>, the conversion succeeds. This means + that a character can be decoded from a range of binaries as long + as the whole range is specified as input without errors occurring.</p> + <p><em>Example:</em></p> + <code> +decode_data(Data) -> + case unicode:characters_to_list(Data,unicode) of + {incomplete,Encoded, Rest} -> + More = get_some_more_data(), + Encoded ++ decode_data([Rest, More]); + {error,Encoded,Rest} -> + handle_error(Encoded,Rest); + List -> + List + end.</code> + <p>However, bit strings that are not whole bytes are not allowed, + so a UTF character must be split along 8-bit boundaries to + ever be decoded.</p> + <p>A <c>badarg</c> exception is thrown for the following cases:</p> + <list type="bulleted"> + <item>Any parameters are of the wrong type.</item> + <item>The list structure is invalid (a number as tail).</item> + <item>The binaries do not contain whole bytes (bit strings).</item> + </list> </desc> </func> + <func> <name name="encoding_to_bom" arity="1"/> <fsummary>Create a binary UTF byte order mark from encoding.</fsummary> @@ -298,20 +341,15 @@ A <c>binary()</c> such that <c>byte_size(<anno>Bin</anno>) >= 4</c>. </type_desc> <desc> - - <p>Create a UTF byte order mark (BOM) as a binary from the - supplied <c><anno>InEncoding</anno></c>. 
The BOM is, if supported at all, - expected to be placed first in UTF encoded files or - messages.</p> - - <p>The function returns <c><<>></c> for the - <c>latin1</c> encoding as there is no BOM for ISO-latin-1.</p> - - <p>It can be noted that the BOM for UTF-8 is seldom used, and it - is really not a <em>byte order</em> mark. There are obviously no - byte order issues with UTF-8, so the BOM is only there to - differentiate UTF-8 encoding from other UTF formats.</p> - + <p>Creates a UTF Byte Order Mark (BOM) as a binary from the + supplied <c><anno>InEncoding</anno></c>. The BOM is, if supported at + all, expected to be placed first in UTF encoded files or messages.</p> + <p>The function returns <c><<>></c> for + <c>latin1</c> encoding, as there is no BOM for ISO Latin-1.</p> + <p>Notice that the BOM for UTF-8 is seldom used, and it + is really not a <em>byte order</em> mark. There are obviously no + byte order issues with UTF-8, so the BOM is only there to + differentiate UTF-8 encoding from other UTF formats.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/unicode_usage.xml b/lib/stdlib/doc/src/unicode_usage.xml index b4c9385e33..efc8b75075 100644 --- a/lib/stdlib/doc/src/unicode_usage.xml +++ b/lib/stdlib/doc/src/unicode_usage.xml @@ -33,427 +33,495 @@ <rev>PA1</rev> <file>unicode_usage.xml</file> </header> -<section> -<title>Unicode Implementation</title> - <p>Implementing support for Unicode character sets is an ongoing - process. The Erlang Enhancement Proposal (EEP) 10 outlined the - basics of Unicode support and also specified a default encoding in - binaries that all Unicode-aware modules should handle in the - future.</p> - - <p>The functionality described in EEP10 was implemented in Erlang/OTP - R13A, but that was by no means the end of it. In Erlang/OTP R14B01 support - for Unicode file names was added, although it was in no way complete - and was by default disabled on platforms where no guarantee was given - for the file name encoding. 
With Erlang/OTP R16A came support for UTF-8 encoded - source code, among with enhancements to many of the applications to - support both Unicode encoded file names as well as support for UTF-8 - encoded files in several circumstances. Most notable is the support - for UTF-8 in files read by <c>file:consult/1</c>, release handler support - for UTF-8 and more support for Unicode character sets in the - I/O-system. In Erlang/OTP 17.0, the encoding default for Erlang source files was - switched to UTF-8.</p> - - <p>This guide outlines the current Unicode support and gives a couple - of recipes for working with Unicode data.</p> -</section> -<section> -<title>Understanding Unicode</title> - <p>Experience with the Unicode support in Erlang has made it - painfully clear that understanding Unicode characters and encodings - is not as easy as one would expect. The complexity of the field as - well as the implications of the standard requires thorough - understanding of concepts rarely before thought of.</p> - - <p>Furthermore the Erlang implementation requires understanding of - concepts that never were an issue for many (Erlang) programmers. To - understand and use Unicode characters requires that you study the - subject thoroughly, even if you're an experienced programmer.</p> - - <p>As an example, one could contemplate the issue of converting - between upper and lower case letters. Reading the standard will make - you realize that, to begin with, there's not a simple one to one - mapping in all scripts. Take German as an example, where there's a - letter "ß" (Sharp s) in lower case, but the uppercase equivalent is - "SS". Or Greek, where "Σ" has two different lowercase forms: "ς" in - word-final position and "σ" elsewhere. Or Turkish where dotted and - dot-less "i" both exist in lower case and upper case forms, or - Cyrillic "I" which usually has no lowercase form. Or of course - languages that have no concept of upper case (or lower case). 
So, a - conversion function will need to know not only one character at a - time, but possibly the whole sentence, maybe the natural language - the translation should be in and also take into account differences - in input and output string length and so on. There is at the time of - writing no Unicode to_upper/to_lower functionality in Erlang/OTP, but - there are publicly available libraries that address these issues.</p> - - <p>Another example is the accented characters where the same glyph - has two different representations. Let's look at the Swedish - "ö". There's a code point for that in the Unicode standard, but you - can also write it as "o" followed by U+0308 (Combining Diaeresis, - with the simplified meaning that the last letter should have a "¨" - above). They have exactly the same glyph. They are for most - purposes the same, but they have completely different - representations. For example MacOS X converts all file names to use - Combining Diaeresis, while most other programs (including Erlang) - try to hide that by doing the opposite when for example listing - directories. However it's done, it's usually important to normalize - such characters to avoid utter confusion.</p> - - <p>The list of examples can be made as long as the Unicode standard, I - suspect. The point is that one need a kind of knowledge that was - never needed when programs only took one or two languages into - account. The complexity of human languages and scripts, certainly - has made this a challenge when constructing a universal - standard. Supporting Unicode properly in your program <em>will</em> require - effort.</p> - -</section> -<section> -<title>What Unicode Is</title> - <p>Unicode is a standard defining code points (numbers) for all - known, living or dead, scripts. 
In principle, every known symbol - used in any language has a Unicode code point.</p> - <p>Unicode code points are defined and published by the <em>Unicode - Consortium</em>, which is a non profit organization.</p> - <p>Support for Unicode is increasing throughout the world of - computing, as the benefits of one common character set are - overwhelming when programs are used in a global environment.</p> - <p>Along with the base of the standard: the code points for all the - scripts, there are a couple of <em>encoding standards</em> available.</p> - <p>It is vital to understand the difference between encodings and - Unicode characters. Unicode characters are code points according to - the Unicode standard, while the encodings are ways to represent such - code points. An encoding is just a standard for representation, - UTF-8 can for example be used to represent a very limited part of - the Unicode character set (e.g. ISO-Latin-1), or the full Unicode - range. It's just an encoding format.</p> - <p>As long as all character sets were limited to 256 characters, - each character could be stored in one single byte, so there was more - or less only one practical encoding for the characters. Encoding - each character in one byte was so common that the encoding wasn't - even named. When we now, with the Unicode system, have a lot more - than 256 characters, we need a common way to represent these. The - common ways of representing the code points are the encodings. This - means a whole new concept to the programmer, the concept of - character representation, which was before a non-issue.</p> - - <p>Different operating systems and tools support different - encodings. For example Linux and MacOS X has chosen the UTF-8 - encoding, which is backwards compatible with 7-bit ASCII and - therefore affects programs written in plain English the - least. 
Windows on the other hand supports a limited version of - UTF-16, namely all the code planes where the characters can be - stored in one single 16-bit entity, which includes most living - languages.</p> - - <p>The most widely spread encodings are:</p> - <taglist> - <tag>Bytewise representation</tag> - <item>This is not a proper Unicode representation, but the - representation used for characters before the Unicode standard. It - can still be used to represent character code points in the Unicode - standard that have numbers below 256, which corresponds exactly to - the ISO-Latin-1 character set. In Erlang, this is commonly denoted - <c>latin1</c> encoding, which is slightly misleading as ISO-Latin-1 is - a character code range, not an encoding.</item> - <tag>UTF-8</tag> - <item>Each character is stored in one to four bytes depending on - code point. The encoding is backwards compatible with bytewise - representation of 7-bit ASCII as all 7-bit characters are stored - in one single byte in UTF-8. The characters beyond code point 127 - are stored in more bytes, letting the most significant bit in the - first character indicate a multi-byte character. For details on - the encoding, the RFC is publicly available. Note that UTF-8 is - <em>not</em> compatible with bytewise representation for - code points between 128 and 255, so a ISO-Latin-1 bytewise - representation is not generally compatible with UTF-8.</item> - <tag>UTF-16</tag> - <item>This encoding has many similarities to UTF-8, but the basic - unit is a 16-bit number. This means that all characters occupy at - least two bytes, some high numbers even four bytes. Some programs, - libraries and operating systems claiming to use UTF-16 only allows - for characters that can be stored in one 16-bit entity, which is - usually sufficient to handle living languages. As the basic unit - is more than one byte, byte-order issues occur, why UTF-16 exists - in both a big-endian and little-endian variant. 
In Erlang, the - full UTF-16 range is supported when applicable, like in the - <c>unicode</c> module and in the bit syntax.</item> - <tag>UTF-32</tag> - <item>The most straight forward representation. Each character is - stored in one single 32-bit number. There is no need for escapes - or any variable amount of entities for one character, all Unicode - code points can be stored in one single 32-bit entity. As with - UTF-16, there are byte-order issues, UTF-32 can be both big- and - little-endian.</item> - <tag>UCS-4</tag> - <item>Basically the same as UTF-32, but without some Unicode - semantics, defined by IEEE and has little use as a separate - encoding standard. For all normal (and possibly abnormal) usages, - UTF-32 and UCS-4 are interchangeable.</item> - </taglist> - <p>Certain ranges of numbers are left unused in the Unicode standard - and certain ranges are even deemed invalid. The most notable invalid - range is 16#D800 - 16#DFFF, as the UTF-16 encoding does not allow - for encoding of these numbers. It can be speculated that the UTF-16 - encoding standard was, from the beginning, expected to be able to - hold all Unicode characters in one 16-bit entity, but then had to be - extended, leaving a hole in the Unicode range to cope with backward - compatibility.</p> - <p>Additionally, the code point 16#FEFF is used for byte order marks - (BOM's) and use of that character is not encouraged in other - contexts than that. It actually is valid though, as the character - "ZWNBS" (Zero Width Non Breaking Space). BOM's are used to identify - encodings and byte order for programs where such parameters are not - known in advance. 
Byte order marks are more seldom used than one - could expect, but their use might become more widely spread as they - provide the means for programs to make educated guesses about the - Unicode format of a certain file.</p> -</section> -<section> - <title>Areas of Unicode Support</title> - <p>To support Unicode in Erlang, problems in several areas have been - addressed. Each area is described briefly in this section and more - thoroughly further down in this document:</p> - <taglist> - <tag>Representation</tag> - <item>To handle Unicode characters in Erlang, we have to have a - common representation both in lists and binaries. The EEP (10) and - the subsequent initial implementation in Erlang/OTP R13A settled a standard - representation of Unicode characters in Erlang.</item> - <tag>Manipulation</tag> - <item>The Unicode characters need to be processed by the Erlang - program, why library functions need to be able to handle them. In - some cases functionality was added to already existing interfaces - (as the string module now can handle lists with arbitrary code points), - in some cases new functionality or options need to be added (as in - the <c>io</c>-module, the file handling, the <c>unicode</c> module - and the bit syntax). Today most modules in kernel and STDLIB, as - well as the VM are Unicode aware.</item> - <tag>File I/O</tag> - <item>I/O is by far the most problematic area for Unicode. A file - is an entity where bytes are stored and the lore of programming - has been to treat characters and bytes as interchangeable. With - Unicode characters, you need to decide on an encoding as soon as - you want to store the data in a file. In Erlang you can open a - text file with an encoding option, so that you can read characters - from it rather than bytes, but you can also open a file for - bytewise I/O. 
The I/O-system of Erlang has been designed (or at - least used) in a way where you expect any I/O-server to be - able to cope with any string data, but that is no longer the case - when you work with Unicode characters. Handling the fact that you - need to know the capabilities of the device where your data ends - up is something new to the Erlang programmer. Furthermore, ports - in Erlang are byte oriented, so an arbitrary string of (Unicode) - characters can not be sent to a port without first converting it - to an encoding of choice.</item> - <tag>Terminal I/O</tag> - <item>Terminal I/O is slightly easier than file I/O. The output is - meant for human reading and is usually Erlang syntax (e.g. in the - shell). There exists syntactic representation of any Unicode - character without actually displaying the glyph (instead written - as <c>\x{</c>HHH<c>}</c>), so Unicode data can usually be displayed - even if the terminal as such do not support the whole Unicode - range.</item> - <tag>File names</tag> - <item>File names can be stored as Unicode strings, in different - ways depending on the underlying OS and file system. This can be - handled fairly easy by a program. The problems arise when the file - system is not consistent in it's encodings, like for example - Linux. Linux allows files to be named with any sequence of bytes, - leaving to each program to interpret those bytes. On systems where - these "transparent" file names are used, Erlang has to be informed - about the file name encoding by a startup flag. The default is - bytewise interpretation, which is actually usually wrong, but - allows for interpretation of <em>all</em> file names. 
The concept - of "raw file names" can be used to handle wrongly encoded - file names if one enables Unicode file name translation - (<c>+fnu</c>) on platforms where this is not the default.</item> - <tag>Source code encoding</tag> - <item>When it comes to the Erlang source code, there is support - for the UTF-8 encoding and bytewise encoding. The default in - Erlang/OTP R16B was bytewise (or latin1) encoding; in Erlang/OTP 17.0 - it was changed to UTF-8. You can control the encoding by a comment like: -<code> -%% -*- coding: utf-8 -*- -</code> - in the beginning of the file. This of course requires your editor to - support UTF-8 as well. The same comment is also interpreted by - functions like <c>file:consult/1</c>, the release handler etc, so that - you can have all text files in your source directories in UTF-8 - encoding. - </item> - <tag>The language</tag> - <item>Having the source code in UTF-8 also allows you to write - string literals containing Unicode characters with code points > - 255, although atoms, module names and function names are - restricted to the ISO-Latin-1 range. Binary - literals where you use the <c>/utf8</c> type, can also be - expressed using Unicode characters > 255. Having module names - using characters other than 7-bit ASCII can cause trouble on - operating systems with inconsistent file naming schemes, and might - also hurt portability, so it's not really recommended. It is - suggested in EEP 40 that the language should also allow for - Unicode characters > 255 in variable names. Whether to - implement that EEP or not is yet to be decided.</item> - </taglist> -</section> -<section> - <title>Standard Unicode Representation</title> - <p>In Erlang, strings are actually lists of integers. 
A string was - up until Erlang/OTP R13 defined to be encoded in the ISO-latin-1 (ISO8859-1) - character set, which is, code point by code point, a sub-range of - the Unicode character set.</p> - <p>The standard list encoding for strings was therefore easily - extended to cope with the whole Unicode range: A Unicode string in - Erlang is simply a list containing integers, each integer being a - valid Unicode code point and representing one character in the - Unicode character set.</p> - <p>Erlang strings in ISO-latin-1 are a subset of Unicode - strings.</p> - <p>Only if a string contains code points < 256, can it be - directly converted to a binary by using - i.e. <c>erlang:iolist_to_binary/1</c> or can be sent directly to a - port. If the string contains Unicode characters > 255, an - encoding has to be decided upon and the string should be converted - to a binary in the preferred encoding using - <c>unicode:characters_to_binary/{1,2,3}</c>. Strings are not - generally lists of bytes, as they were before Erlang/OTP R13. They are lists of - characters. Characters are not generally bytes, they are Unicode - code points.</p> - - <p>Binaries are more troublesome. For performance reasons, programs - often store textual data in binaries instead of lists, mainly - because they are more compact (one byte per character instead of two - words per character, as is the case with lists). Using - <c>erlang:list_to_binary/1</c>, an ISO-Latin-1 Erlang string could - be converted into a binary, effectively using bytewise encoding - - one byte per character. 
This was very convenient for those limited - Erlang strings, but cannot be done for arbitrary Unicode lists.</p> - <p>As the UTF-8 encoding is widely spread and provides some backward - compatibility in the 7-bit ASCII range, it is selected as the - standard encoding for Unicode characters in binaries for Erlang.</p> - <p>The standard binary encoding is used whenever a library function - in Erlang should cope with Unicode data in binaries, but is of - course not enforced when communicating externally. Functions and - bit-syntax exist to encode and decode both UTF-8, UTF-16 and UTF-32 - in binaries. Library functions dealing with binaries and Unicode in - general, however, only deal with the default encoding.</p> - - <p>Character data may be combined from several sources, sometimes - available in a mix of strings and binaries. Erlang has for long had - the concept of <c>iodata</c> or <c>iolist</c>s, where binaries and - lists can be combined to represent a sequence of bytes. In the same - way, the Unicode aware modules often allow for combinations of - binaries and lists where the binaries have characters encoded in - UTF-8 and the lists contain such binaries or numbers representing - Unicode code points:</p> - <code type="none"> + <section> + <title>Unicode Implementation</title> + <p>Implementing support for Unicode character sets is an ongoing process. 
+ The Erlang Enhancement Proposal (EEP) 10 outlined the basics of Unicode + support and specified a default encoding in binaries that all + Unicode-aware modules are to handle in the future.</p> + + <p>Here is an overview what has been done so far:</p> + + <list type="bulleted"> + <item><p>The functionality described in EEP10 was implemented + in Erlang/OTP R13A.</p></item> + + <item><p>Erlang/OTP R14B01 added support for Unicode + filenames, but it was not complete and was by default + disabled on platforms where no guarantee was given for the + filename encoding.</p></item> + + <item><p>With Erlang/OTP R16A came support for UTF-8 encoded + source code, with enhancements to many of the applications to + support both Unicode encoded filenames and support for UTF-8 + encoded files in many circumstances. Most notable is the + support for UTF-8 in files read by <seealso + marker="kernel:file#consult/1"><c>file:consult/1</c></seealso>, + release handler support for UTF-8, and more support for + Unicode character sets in the I/O system.</p></item> + + <item><p>In Erlang/OTP 17.0, the encoding default for Erlang + source files was switched to UTF-8.</p></item> + </list> + + <p>This section outlines the current Unicode support and gives some + recipes for working with Unicode data.</p> + </section> + + <section> + <title>Understanding Unicode</title> + <p>Experience with the Unicode support in Erlang has made it clear that + understanding Unicode characters and encodings is not as easy as one + would expect. The complexity of the field and the implications of the + standard require thorough understanding of concepts rarely before + thought of.</p> + + <p>Also, the Erlang implementation requires understanding of + concepts that were never an issue for many (Erlang) programmers. 
To + understand and use Unicode characters requires that you study the + subject thoroughly, even if you are an experienced programmer.</p> + + <p>As an example, contemplate the issue of converting between upper and + lower case letters. Reading the standard makes you realize that there is + not a simple one to one mapping in all scripts, for example:</p> + + <list type="bulleted"> + <item> + <p>In German, the letter "ß" (sharp s) is in lower case, but the + uppercase equivalent is "SS".</p> + </item> + <item> + <p>In Greek, the letter "Σ" has two different lowercase forms, + "ς" in word-final position and "σ" elsewhere.</p> + </item> + <item> + <p>In Turkish, both dotted and dotless "i" exist in lower case and + upper case forms.</p> + </item> + <item> + <p>Cyrillic "I" has usually no lowercase form.</p> + </item> + <item> + <p>Languages with no concept of upper case (or lower case).</p> + </item> + </list> + + <p>So, a conversion function must know not only one character at a time, + but possibly the whole sentence, the natural language to translate to, + the differences in input and output string length, and so on. + Erlang/OTP has currently no Unicode <c>to_upper</c>/<c>to_lower</c> + functionality, but publicly available libraries address these issues.</p> + + <p>Another example is the accented characters, where the same glyph has two + different representations. The Swedish letter "ö" is one example. + The Unicode standard has a code point for it, but you can also write it + as "o" followed by "U+0308" (Combining Diaeresis, with the simplified + meaning that the last letter is to have "¨" above). They have the same + glyph. They are for most purposes the same, but have different + representations. For example, MacOS X converts all filenames to use + Combining Diaeresis, while most other programs (including Erlang) try to + hide that by doing the opposite when, for example, listing directories. 
+ However it is done, it is usually important to normalize such + characters to avoid confusion.</p> + + <p>The list of examples can be made long. One need a kind of knowledge that + was not needed when programs only considered one or two languages. The + complexity of human languages and scripts has certainly made this a + challenge when constructing a universal standard. Supporting Unicode + properly in your program will require effort.</p> + </section> + + <section> + <title>What Unicode Is</title> + <p>Unicode is a standard defining code points (numbers) for all known, + living or dead, scripts. In principle, every symbol used in any + language has a Unicode code point. Unicode code points are defined and + published by the Unicode Consortium, which is a non-profit + organization.</p> + + <p>Support for Unicode is increasing throughout the world of computing, as + the benefits of one common character set are overwhelming when programs + are used in a global environment. Along with the base of the standard, + the code points for all the scripts, some <em>encoding standards</em> are + available.</p> + + <p>It is vital to understand the difference between encodings and Unicode + characters. Unicode characters are code points according to the Unicode + standard, while the encodings are ways to represent such code points. An + encoding is only a standard for representation. UTF-8 can, for example, + be used to represent a very limited part of the Unicode character set + (for example ISO-Latin-1) or the full Unicode range. It is only an + encoding format.</p> + + <p>As long as all character sets were limited to 256 characters, each + character could be stored in one single byte, so there was more or less + only one practical encoding for the characters. Encoding each character + in one byte was so common that the encoding was not even named. With the + Unicode system there are much more than 256 characters, so a common way + is needed to represent these. 
The common ways of representing the code + points are the encodings. This means a whole new concept to the + programmer, the concept of character representation, which was a + non-issue earlier.</p> + + <p>Different operating systems and tools support different encodings. For + example, Linux and MacOS X have chosen the UTF-8 encoding, which is + backward compatible with 7-bit ASCII and therefore affects programs + written in plain English the least. Windows supports a limited version + of UTF-16, namely all the code planes where the characters can be + stored in one single 16-bit entity, which includes most living + languages.</p> + + <p>The following are the most widely spread encodings:</p> + + <taglist> + <tag>Bytewise representation</tag> + <item> + <p>This is not a proper Unicode representation, but the representation + used for characters before the Unicode standard. It can still be used + to represent character code points in the Unicode standard with + numbers < 256, which exactly corresponds to the ISO Latin-1 + character set. In Erlang, this is commonly denoted <c>latin1</c> + encoding, which is slightly misleading as ISO Latin-1 is a + character code range, not an encoding.</p> + </item> + <tag>UTF-8</tag> + <item> + <p>Each character is stored in one to four bytes depending on code + point. The encoding is backward compatible with bytewise + representation of 7-bit ASCII, as all 7-bit characters are stored in + one single byte in UTF-8. The characters beyond code point 127 are + stored in more bytes, letting the most significant bit in the first + character indicate a multi-byte character. 
For details on the + encoding, the RFC is publicly available.</p> + <p>Notice that UTF-8 is <em>not</em> compatible with bytewise + representation for code points from 128 through 255, so an ISO + Latin-1 bytewise representation is generally incompatible with + UTF-8.</p> + </item> + <tag>UTF-16</tag> + <item> + <p>This encoding has many similarities to UTF-8, but the basic + unit is a 16-bit number. This means that all characters occupy + at least two bytes, and some high numbers four bytes. Some + programs, libraries, and operating systems claiming to use + UTF-16 only allow for characters that can be stored in one + 16-bit entity, which is usually sufficient to handle living + languages. As the basic unit is more than one byte, byte-order + issues occur, which is why UTF-16 exists in both a big-endian + and a little-endian variant.</p> + <p>In Erlang, the full UTF-16 range is supported when applicable, like + in the <seealso marker="stdlib:unicode"><c>unicode</c></seealso> + module and in the bit syntax.</p> + </item> + <tag>UTF-32</tag> + <item> + <p>The most straightforward representation. Each character is stored in + one single 32-bit number. There is no need for escapes or any + variable number of entities for one character. All Unicode code + points can be stored in one single 32-bit entity. As with UTF-16, + there are byte-order issues. UTF-32 can be both big-endian and + little-endian.</p> + </item> + <tag>UCS-4</tag> + <item> + <p>Basically the same as UTF-32, but without some Unicode semantics, + defined by IEEE, and has little use as a separate encoding standard. + For all normal (and possibly abnormal) use, UTF-32 and UCS-4 are + interchangeable.</p> + </item> + </taglist> + + <p>Certain number ranges are unused in the Unicode standard and certain + ranges are even deemed invalid. The most notable invalid range is + 16#D800-16#DFFF, as the UTF-16 encoding does not allow for encoding of + these numbers. 
This is possibly because the UTF-16 encoding standard, + from the beginning, was expected to be able to hold all Unicode + characters in one 16-bit entity, but was then extended, leaving a hole + in the Unicode range to handle backward compatibility.</p> + + <p>Code point 16#FEFF is used for Byte Order Marks (BOMs) and use of that + character is not encouraged in other contexts. It is valid though, as + the character "ZWNBS" (Zero Width Non Breaking Space). BOMs are used to + identify encodings and byte order for programs where such parameters are + not known in advance. BOMs are more seldom used than expected, but can + become more widely spread as they provide the means for programs to make + educated guesses about the Unicode format of a certain file.</p> + </section> + + <section> + <title>Areas of Unicode Support</title> + <p>To support Unicode in Erlang, problems in various areas have been + addressed. This section describes each area briefly and more + thoroughly later in this User's Guide.</p> + + <taglist> + <tag>Representation</tag> + <item> + <p>To handle Unicode characters in Erlang, a common representation + in both lists and binaries is needed. EEP (10) and the subsequent + initial implementation in Erlang/OTP R13A settled a standard + representation of Unicode characters in Erlang.</p> + </item> + <tag>Manipulation</tag> + <item> + <p>The Unicode characters need to be processed by the Erlang + program, which is why library functions must be able to handle + them. In some cases functionality has been added to already + existing interfaces (as the <seealso + marker="stdlib:string"><c>string</c></seealso> module now can + handle lists with any code points). In some cases new + functionality or options have been added (as in the <seealso + marker="stdlib:io"><c>io</c></seealso> module, the file + handling, the <seealso + marker="stdlib:unicode"><c>unicode</c></seealso> module, and + the bit syntax). 
Today most modules in Kernel and + STDLIB, as well as the VM are Unicode-aware.</p> + </item> + <tag>File I/O</tag> + <item> + <p>I/O is by far the most problematic area for Unicode. A file is an + entity where bytes are stored, and the lore of programming has been + to treat characters and bytes as interchangeable. With Unicode + characters, you must decide on an encoding when you want to store + the data in a file. In Erlang, you can open a text file with an + encoding option, so that you can read characters from it rather than + bytes, but you can also open a file for bytewise I/O.</p> + <p>The Erlang I/O-system has been designed (or at least used) in a way + where you expect any I/O server to handle any string data. + That is, however, no longer the case when working with Unicode + characters. The Erlang programmer must now know the + capabilities of the device where the data ends up. Also, ports in + Erlang are byte-oriented, so an arbitrary string of (Unicode) + characters cannot be sent to a port without first converting it to an + encoding of choice.</p> + </item> + <tag>Terminal I/O</tag> + <item> + <p>Terminal I/O is slightly easier than file I/O. The output is meant + for human reading and is usually Erlang syntax (for example, in the + shell). There exists syntactic representation of any Unicode + character without displaying the glyph (instead written as + <c>\x</c>{<c>HHH</c>}). Unicode data can therefore usually be + displayed even if the terminal as such does not support the whole + Unicode range.</p> + </item> + <tag>Filenames</tag> + <item> + <p>Filenames can be stored as Unicode strings in different ways + depending on the underlying operating system and file system. This + can be handled fairly easy by a program. The problems arise when the + file system is inconsistent in its encodings. For example, Linux + allows files to be named with any sequence of bytes, leaving to each + program to interpret those bytes. 
On systems where these + "transparent" filenames are used, Erlang must be informed about the + filename encoding by a startup flag. The default is bytewise + interpretation, which is usually wrong, but allows for interpretation + of <em>all</em> filenames.</p> + <p>The concept of "raw filenames" can be used to handle wrongly encoded + filenames if one enables Unicode filename translation (<c>+fnu</c>) + on platforms where this is not the default.</p> + </item> + <tag>Source code encoding</tag> + <item> + <p>The Erlang source code has support for the UTF-8 encoding + and bytewise encoding. The default in Erlang/OTP R16B was bytewise + (<c>latin1</c>) encoding. It was changed to UTF-8 in Erlang/OTP 17.0. + You can control the encoding by a comment like the following in the + beginning of the file:</p> + <code> +%% -*- coding: utf-8 -*-</code> + <p>This of course requires your editor to support UTF-8 as well. The + same comment is also interpreted by functions like + <seealso marker="kernel:file#consult/1"><c>file:consult/1</c></seealso>, + the release handler, and so on, so that you can have all text files + in your source directories in UTF-8 encoding.</p> + </item> + <tag>The language</tag> + <item> + <p>Having the source code in UTF-8 also allows you to write string + literals containing Unicode characters with code points > 255, + although atoms, module names, and function names are restricted to + the ISO Latin-1 range. Binary literals, where you use type + <c>/utf8</c>, can also be expressed using Unicode characters > 255. + Having module names using characters other than 7-bit ASCII can cause + trouble on operating systems with inconsistent file naming schemes, + and can hurt portability, so it is not recommended.</p> + <p>EEP 40 suggests that the language is also to allow for Unicode + characters > 255 in variable names. 
Whether to implement that EEP + is yet to be decided.</p> + </item> + </taglist> + </section> + + <section> + <title>Standard Unicode Representation</title> + <p>In Erlang, strings are lists of integers. A string was until + Erlang/OTP R13 defined to be encoded in the ISO Latin-1 (ISO 8859-1) + character set, which is, code point by code point, a subrange of the + Unicode character set.</p> + + <p>The standard list encoding for strings was therefore easily extended to + handle the whole Unicode range. A Unicode string in Erlang is a list + containing integers, where each integer is a valid Unicode code point and + represents one character in the Unicode character set.</p> + + <p>Erlang strings in ISO Latin-1 are a subset of Unicode strings.</p> + + <p>Only if a string contains code points < 256, can it be directly + converted to a binary by using, for example, + <seealso marker="erts:erlang#iolist_to_binary/1"><c>erlang:iolist_to_binary/1</c></seealso> + or can be sent directly to a port. If the string contains Unicode + characters > 255, an encoding must be decided upon and the string is to + be converted to a binary in the preferred encoding using + <seealso marker="stdlib:unicode#characters_to_binary/1"><c>unicode:characters_to_binary/1,2,3</c></seealso>. + Strings are not generally lists of bytes, as they were before + Erlang/OTP R13, they are lists of characters. Characters are not + generally bytes, they are Unicode code points.</p> + + <p>Binaries are more troublesome. For performance reasons, programs often + store textual data in binaries instead of lists, mainly because they are + more compact (one byte per character instead of two words per character, + as is the case with lists). Using + <seealso marker="erts:erlang#list_to_binary/1"><c>erlang:list_to_binary/1</c></seealso>, + an ISO Latin-1 Erlang string can be converted into a binary, effectively + using bytewise encoding: one byte per character. 
This was convenient for + those limited Erlang strings, but cannot be done for arbitrary Unicode + lists.</p> + + <p>As the UTF-8 encoding is widely spread and provides some backward + compatibility in the 7-bit ASCII range, it is selected as the standard + encoding for Unicode characters in binaries for Erlang.</p> + + <p>The standard binary encoding is used whenever a library function in + Erlang is to handle Unicode data in binaries, but is of course not + enforced when communicating externally. Functions and bit syntax exist to + encode and decode both UTF-8, UTF-16, and UTF-32 in binaries. However, + library functions dealing with binaries and Unicode in general only deal + with the default encoding.</p> + + <p>Character data can be combined from many sources, sometimes available in + a mix of strings and binaries. Erlang has for long had the concept of + <c>iodata</c> or <c>iolist</c>s, where binaries and lists can be combined + to represent a sequence of bytes. In the same way, the Unicode-aware + modules often allow for combinations of binaries and lists, where the + binaries have characters encoded in UTF-8 and the lists contain such + binaries or numbers representing Unicode code points:</p> + + <code type="none"> unicode_binary() = binary() with characters encoded in UTF-8 coding standard chardata() = charlist() | unicode_binary() charlist() = maybe_improper_list(char() | unicode_binary() | charlist(), - unicode_binary() | nil())</code> - <p>The module <seealso - marker="stdlib:unicode"><c>unicode</c></seealso> in STDLIB even - supports similar mixes with binaries containing other encodings than - UTF-8, but that is a special case to allow for conversions to and - from external data:</p> - <code type="none"> -external_unicode_binary() = binary() with characters coded in - a user specified Unicode encoding other than UTF-8 (UTF-16 or UTF-32) + unicode_binary() | nil())</code> + + <p>The module <seealso marker="stdlib:unicode"><c>unicode</c></seealso> + 
even supports similar mixes with binaries containing other encodings than + UTF-8, but that is a special case to allow for conversions to and from + external data:</p> + + <code type="none"> +external_unicode_binary() = binary() with characters coded in a user-specified + Unicode encoding other than UTF-8 (UTF-16 or UTF-32) external_chardata() = external_charlist() | external_unicode_binary() -external_charlist() = maybe_improper_list(char() | - external_unicode_binary() | - external_charlist(), - external_unicode_binary() | nil())</code> -</section> -<section> - <title>Basic Language Support</title> - <p><marker id="unicode_in_erlang"/>As of Erlang/OTP R16 Erlang - source files can be written in either UTF-8 or bytewise encoding - (a.k.a. <c>latin1</c> encoding). The details on how to state the encoding - of an Erlang source file can be found in - <seealso marker="stdlib:epp#encoding"><c>epp(3)</c></seealso>. Strings and comments - can be written using Unicode, but functions still have to be named - using characters from the ISO-latin-1 character set and atoms are - restricted to the same ISO-latin-1 range. These restrictions in the - language are of course independent of the encoding of the source - file.</p> +external_charlist() = maybe_improper_list(char() | external_unicode_binary() | + external_charlist(), external_unicode_binary() | nil())</code> + </section> + <section> - <title>Bit-syntax</title> - <p>The bit-syntax contains types for coping with binary data in the - three main encodings. The types are named <c>utf8</c>, <c>utf16</c> - and <c>utf32</c> respectively. The <c>utf16</c> and <c>utf32</c> types - can be in a big- or little-endian variant:</p> - <code> + <title>Basic Language Support</title> + <p><marker id="unicode_in_erlang"/>As from Erlang/OTP R16, Erlang source + files can be written in UTF-8 or bytewise (<c>latin1</c>) encoding. 
For + information about how to state the encoding of an Erlang source file, see + the <seealso marker="stdlib:epp#encoding"><c>epp(3)</c></seealso> module. + Strings and comments can be written using Unicode, but functions must + still be named using characters from the ISO Latin-1 character set, and + atoms are restricted to the same ISO Latin-1 range. These restrictions in + the language are of course independent of the encoding of the source + file.</p> + + <section> + <title>Bit Syntax</title> + <p>The bit syntax contains types for handling binary data in the + three main encodings. The types are named <c>utf8</c>, <c>utf16</c>, + and <c>utf32</c>. The <c>utf16</c> and <c>utf32</c> types can be in a + big-endian or a little-endian variant:</p> + + <code> <<Ch/utf8,_/binary>> = Bin1, <<Ch/utf16-little,_/binary>> = Bin2, Bin3 = <<$H/utf32-little, $e/utf32-little, $l/utf32-little, $l/utf32-little, $o/utf32-little>>,</code> - <p>For convenience, literal strings can be encoded with a Unicode - encoding in binaries using the following (or similar) syntax:</p> - <code> + + <p>For convenience, literal strings can be encoded with a Unicode + encoding in binaries using the following (or similar) syntax:</p> + + <code> Bin4 = <<"Hello"/utf16>>,</code> - </section> - <section> - <title>String and Character Literals</title> - <p>For source code, there is an extension to the <c>\</c>OOO - (backslash followed by three octal numbers) and <c>\x</c>HH - (backslash followed by <c>x</c>, followed by two hexadecimal - characters) syntax, namely <c>\x{</c>H ...<c>}</c> (a backslash - followed by an <c>x</c>, followed by left curly bracket, any - number of hexadecimal digits and a terminating right curly - bracket). 
This allows for entering characters of any code point - literally in a string even when the encoding of the source file is - bytewise (<c>latin1</c>).</p> - <p>In the shell, if using a Unicode input device, or in source - code stored in UTF-8, <c>$</c> can be followed directly by a - Unicode character producing an integer. In the following example - the code point of a Cyrillic <c>с</c> is output:</p> - <pre> + </section> + + <section> + <title>String and Character Literals</title> + <p>For source code, there is an extension to syntax <c>\</c>OOO + (backslash followed by three octal numbers) and <c>\x</c>HH (backslash + followed by <c>x</c>, followed by two hexadecimal characters), namely + <c>\x{</c>H ...<c>}</c> (backslash followed by <c>x</c>, followed by + left curly bracket, any number of hexadecimal digits, and a terminating + right curly bracket). This allows for entering characters of any code + point literally in a string even when the encoding of the source file + is bytewise (<c>latin1</c>).</p> + + <p>In the shell, if using a Unicode input device, or in source code + stored in UTF-8, <c>$</c> can be followed directly by a Unicode + character producing an integer. In the following example, the code + point of a Cyrillic <c>с</c> is output:</p> + + <pre> 7> <input>$с.</input> 1089</pre> - </section> - <section> - <title>Heuristic String Detection</title> - <p>In certain output functions and in the output of return values - in the shell, Erlang tries to heuristically detect string data in - lists and binaries. Typically you will see heuristic detection in - a situation like this:</p> - <pre> + </section> + + <section> + <title>Heuristic String Detection</title> + <p>In certain output functions and in the output of return values in + the shell, Erlang tries to detect string data in lists and binaries + heuristically. 
Typically you will see heuristic detection in a + situation like this:</p> + + <pre> 1> <input>[97,98,99].</input> "abc" 2> <input><<97,98,99>>.</input> <<"abc">> 3> <input><<195,165,195,164,195,182>>.</input> <<"åäö"/utf8>></pre> - <p>Here the shell will detect lists containing printable - characters or binaries containing printable characters either in - bytewise or UTF-8 encoding. The question here is: what is a - printable character? One view would be that anything the Unicode - standard thinks is printable, will also be printable according to - the heuristic detection. The result would be that almost any list - of integers will be deemed a string, resulting in all sorts of - characters being printed, maybe even characters your terminal does - not have in its font set (resulting in some generic output you - probably will not appreciate). Another way is to keep it backwards - compatible so that only the ISO-Latin-1 character set is used to - detect a string. A third way would be to let the user decide - exactly what Unicode ranges are to be viewed as characters. Since - Erlang/OTP R16B you can select either the whole Unicode range or the - ISO-Latin-1 range by supplying the startup flag <c>+pc - </c><i>Range</i>, where <i>Range</i> is either <c>latin1</c> or - <c>unicode</c>. For backwards compatibility, the default is - <c>latin1</c>. This only controls how heuristic string detection - is done. In the future, more ranges are expected to be added, so - that one can tailor the heuristics to the language and region - relevant to the user.</p> - <p>Lets look at an example with the two different startup options:</p> -<pre> + + <p>Here the shell detects lists containing printable characters or + binaries containing printable characters in bytewise or UTF-8 encoding. + But what is a printable character? One view is that anything the Unicode + standard thinks is printable, is also printable according to the + heuristic detection. 
The result is then that almost any list of + integers is deemed a string, and all sorts of characters are printed, + maybe also characters that your terminal lacks in its font set + (resulting in some unappreciated generic output). + Another way is to keep it backward compatible so that only the ISO + Latin-1 character set is used to detect a string. A third way is to let + the user decide exactly which Unicode ranges are to be viewed as + characters.</p> + + <p>As from Erlang/OTP R16B you can select the ISO Latin-1 range or the + whole Unicode range by supplying startup flag <c>+pc latin1</c> or + <c>+pc unicode</c>, respectively. For backward compatibility, + <c>latin1</c> is the default. This only controls how heuristic string + detection is done. More ranges are expected to be added in the future, + enabling tailoring of the heuristics to the language and region + relevant to the user.</p> + + <p>The following examples show the two startup options:</p> + + <pre> $ <input>erl +pc latin1</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -467,9 +535,9 @@ Eshell V5.10.1 (abort with ^G) 4> <input><<208,174,208,189,208,184,208,186,208,190,208,180>>.</input> <<208,174,208,189,208,184,208,186,208,190,208,180>> 5> <input><<229/utf8,228/utf8,246/utf8>>.</input> -<<"åäö"/utf8>> -</pre> -<pre> +<<"åäö"/utf8>></pre> + + <pre> $ <input>erl +pc unicode</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -483,78 +551,88 @@ Eshell V5.10.1 (abort with ^G) 4> <input><<208,174,208,189,208,184,208,186,208,190,208,180>>.</input> <<"Юникод"/utf8>> 5> <input><<229/utf8,228/utf8,246/utf8>>.</input> -<<"åäö"/utf8>> -</pre> - <p>In the examples, we can see that the default Erlang shell will - only interpret characters from the ISO-Latin1 range as printable - and will only detect lists or binaries with those "printable" - characters as containing string data. 
The valid UTF-8 binary - containing "Юникод", will not be printed as a string. When, on the - other hand, started with all Unicode characters printable (<c>+pc - unicode</c>), the shell will output anything containing printable - Unicode data (in binaries either UTF-8 or bytewise encoded) as - string data.</p> - - <p>These heuristics are also used by - <c>io</c>(<c>_lib</c>)<c>:format/2</c> and friends when the - <c>t</c> modifier is used in conjunction with <c>~p</c> or - <c>~P</c>:</p> -<pre> +<<"åäö"/utf8>></pre> + + <p>In the examples, you can see that the default Erlang shell interprets + only characters from the ISO Latin-1 range as printable and only detects + lists or binaries with those "printable" characters as containing + string data. The valid UTF-8 binary containing the Russian word + "Юникод" is not printed as a string. When started with all Unicode + characters printable (<c>+pc unicode</c>), the shell outputs anything + containing printable Unicode data (in binaries, either UTF-8 or + bytewise encoded) as string data.</p> + + <p>These heuristics are also used by + <seealso marker="stdlib:io#format/2"><c>io:format/2</c></seealso>, + <seealso marker="stdlib:io_lib#format/2"><c>io_lib:format/2</c></seealso>, + and friends when modifier <c>t</c> is used with <c>~p</c> or + <c>~P</c>:</p> + + <pre> $ <input>erl +pc latin1</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.10.1 (abort with ^G) 1> <input>io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).</input> {<<"åäö">>,<<"åäö"/utf8>>,<<208,174,208,189,208,184,208,186,208,190,208,180>>} -ok -</pre> -<pre> +ok</pre> + + <pre> $ <input>erl +pc unicode</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.10.1 (abort with ^G) 1> <input>io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).</input> 
{<<"åäö">>,<<"åäö"/utf8>>,<<"Юникод"/utf8>>} -ok -</pre> - <p>Please observe that this only affects <i>heuristic</i> interpretation - of lists and binaries on output. For example the <c>~ts</c> format - sequence does always output a valid lists of characters, - regardless of the <c>+pc</c> setting, as the programmer has - explicitly requested string output.</p> +ok</pre> + + <p>Notice that this only affects <em>heuristic</em> interpretation of + lists and binaries on output. For example, the <c>~ts</c> format + sequence always outputs a valid list of characters, regardless of the + <c>+pc</c> setting, as the programmer has explicitly requested string + output.</p> + </section> </section> -</section> -<section> - <title>The Interactive Shell</title> - <p>The interactive Erlang shell, when started towards a terminal or - started using the <c>werl</c> command on windows, can support - Unicode input and output.</p> - <p>On Windows, proper operation requires that a suitable font - is installed and selected for the Erlang application to use. If no - suitable font is available on your system, try installing the DejaVu - fonts (<c>dejavu-fonts.org</c>), which are freely available and then - select that font in the Erlang shell application.</p> - <p>On Unix-like operating systems, the terminal should be able - to handle UTF-8 on input and output (modern versions of XTerm, KDE - konsole and the Gnome terminal do for example) and your locale - settings have to be proper. As an example, my <c>LANG</c> - environment variable is set as this:</p> - <pre> + + <section> + <title>The Interactive Shell</title> + <p>The interactive Erlang shell, when started to a terminal or started + using command <c>werl</c> on Windows, can support Unicode input and + output.</p> + + <p>On Windows, proper operation requires that a suitable font is + installed and selected for the Erlang application to use. 
If no suitable + font is available on your system, try installing the + <url href="http://dejavu-fonts.org">DejaVu fonts</url>, which are freely + available, and then select that font in the Erlang shell application.</p> + + <p>On Unix-like operating systems, the terminal must be able to handle + UTF-8 on input and output (this is done by, for example, modern versions + of XTerm, KDE Konsole, and the Gnome terminal) + and your locale settings must be proper. As + an example, a <c>LANG</c> environment variable can be set as follows:</p> + + <pre> $ <input>echo $LANG</input> en_US.UTF-8</pre> - <p>Actually, most systems handle the <c>LC_CTYPE</c> variable before - <c>LANG</c>, so if that is set, it has to be set to - <c>UTF-8</c>:</p> - <pre> + + <p>Most systems handle variable <c>LC_CTYPE</c> before <c>LANG</c>, so if + that is set, it must be set to a <c>UTF-8</c> locale:</p> + + <pre> $ echo <input>$LC_CTYPE</input> en_US.UTF-8</pre> - <p>The <c>LANG</c> or <c>LC_CTYPE</c> setting should be consistent - with what the terminal is capable of, there is no portable way for - Erlang to ask the actual terminal about its UTF-8 capacity, we have - to rely on the language and character type settings.</p> - <p>To investigate what Erlang thinks about the terminal, the - <c>io:getopts()</c> call can be used when the shell is started:</p> - <pre> + + <p>The <c>LANG</c> or <c>LC_CTYPE</c> setting must be consistent with + what the terminal is capable of. 
There is no portable way for Erlang to + ask the terminal about its UTF-8 capacity, so we have to rely on the + language and character type settings.</p> + + <p>To investigate what Erlang thinks about the terminal, the call + <seealso marker="stdlib:io#getopts/1"><c>io:getopts()</c></seealso> + can be used when the shell is started:</p> + + <pre> $ <input>LC_CTYPE=en_US.ISO-8859-1 erl</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -571,27 +649,31 @@ Eshell V5.10.1 (abort with ^G) {encoding,unicode} 2></pre> - <p>When (finally?) everything is in order with the locale settings, - fonts and the terminal emulator, you probably also have discovered a - way to input characters in the script you desire. For testing, the - simplest way is to add some keyboard mappings for other languages, - usually done with some applet in your desktop environment. In my KDE - environment, I start the KDE Control Center (Personal Settings), - select "Regional and Accessibility" and then "Keyboard Layout". On - Windows XP, I start Control Panel->Regional and Language - Options, select the Language tab and click the Details... button in - the square named "Text services and input Languages". Your - environment probably provides similar means of changing the keyboard - layout. Make sure you have a way to easily switch back and forth - between keyboards if you are not used to this, entering commands - using a Cyrillic character set is, as an example, not easily done in - the Erlang shell.</p> - - <p>Now you are set up for some Unicode input and output. The - simplest thing to do is of course to enter a string in the - shell:</p> - - <pre> + <p>When (finally?) everything is in order with the locale settings, fonts, + and the terminal emulator, you have probably found a way to input + characters in the script you desire. 
For testing, the simplest way is to + add some keyboard mappings for other languages, usually done with some + applet in your desktop environment.</p> + + <p>In a KDE environment, select <em>KDE Control Center (Personal + Settings)</em> > <em>Regional and Accessibility</em> > <em>Keyboard + Layout</em>.</p> + + <p>On Windows XP, select <em>Control Panel</em> > <em>Regional and Language + Options</em>, select tab <em>Language</em>, and click button + <em>Details...</em> in the square named <em>Text Services and Input + Languages</em>.</p> + + <p>Your environment + probably provides similar means of changing the keyboard layout. Ensure + that you have a way to switch back and forth between keyboards easily if + you are not used to this. For example, entering commands using a Cyrillic + character set is not easily done in the Erlang shell.</p> + + <p>Now you are set up for some Unicode input and output. The simplest thing + to do is to enter a string in the shell:</p> + + <pre> $ <input>erl</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -603,12 +685,13 @@ Eshell V5.10.1 (abort with ^G) 3> <input>io:format("~ts~n", [v(2)]).</input> Юникод ok -4> </pre> - <p>While strings can be input as Unicode characters, the language - elements are still limited to the ISO-latin-1 character set. Only - character constants and strings are allowed to be beyond that - range:</p> - <pre> +4></pre> + + <p>While strings can be input as Unicode characters, the language elements + are still limited to the ISO Latin-1 character set. 
Only character + constants and strings are allowed to be beyond that range:</p> + + <pre> $ <input>erl</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -618,371 +701,398 @@ Eshell V5.10.1 (abort with ^G) 2> <input>Юникод.</input> * 1: illegal character 2> </pre> -</section> -<section> - <title>Unicode File Names</title> - <marker id="unicode_file_names"/> - <p>Most modern operating systems support Unicode file names in some - way or another. There are several different ways to do this and - Erlang by default treats the different approaches differently:</p> - <taglist> - <tag>Mandatory Unicode file naming</tag> - <item> - <p>Windows and, for most common uses, MacOS X enforces Unicode - support for file names. All files created in the file system have - names that can consistently be interpreted. In MacOS X, all file - names are retrieved in UTF-8 encoding, while Windows has - selected an approach where each system call handling file names - has a special Unicode aware variant, giving much the same - effect. There are no file names on these systems that are not - Unicode file names, why the default behavior of the Erlang VM is - to work in "Unicode file name translation mode", - meaning that a file name can be given as a Unicode list and that - will be automatically translated to the proper name encoding for - the underlying operating and file system.</p> - <p>Doing i.e. 
a <c>file:list_dir/1</c> on one of these systems - may return Unicode lists with code points beyond 255, depending - on the content of the actual file system.</p> - <p>As the feature is fairly new, you may still stumble upon non - core applications that cannot handle being provided with file - names containing characters with code points larger than 255, but - the core Erlang system should have no problems with Unicode file - names.</p> - </item> - <tag>Transparent file naming</tag> - <item> - <p>Most Unix operating systems have adopted a simpler approach, - namely that Unicode file naming is not enforced, but by - convention. Those systems usually use UTF-8 encoding for Unicode - file names, but do not enforce it. On such a system, a file name - containing characters having code points between 128 and 255 may - be named either as plain ISO-latin-1 or using UTF-8 encoding. As - no consistency is enforced, the Erlang VM can do no consistent - translation of all file names.</p> - - <p>By default on such systems, Erlang starts in <c>utf8</c> file - name mode if the terminal supports UTF-8, otherwise in - <c>latin1</c> mode.</p> - - <p>In the <c>latin1</c> mode, file names are bytewise endcoded. - This allows for list representation of all file names in - the system, but, for example, a file named "Östersund.txt", will - appear in <c>file:list_dir/1</c> as either "Östersund.txt" (if - the file name was encoded in bytewise ISO-Latin-1 by the program - creating the file, or more probably as - <c>[195,150,115,116,101,114,115,117,110,100]</c>, which is a - list containing UTF-8 bytes - not what you would want... If you - on the other hand use Unicode file name translation on such a - system, non-UTF-8 file names will simply be ignored by functions - like <c>file:list_dir/1</c>. 
They can be retrieved with - <c>file:list_dir_all/1</c>, but wrongly encoded file names will - appear as "raw file names".</p> - - </item> - </taglist> - - <p>The Unicode file naming support was introduced with Erlang/OTP - R14B01. A VM operating in Unicode file name translation mode can - work with files having names in any language or character set (as - long as it is supported by the underlying OS and file system). The - Unicode character list is used to denote file or directory names and - if the file system content is listed, you will also get - Unicode lists as return value. The support lies in the Kernel and - STDLIB modules, why most applications (that does not explicitly - require the file names to be in the ISO-latin-1 range) will benefit - from the Unicode support without change.</p> - - <p>On operating systems with mandatory Unicode file names, this - means that you more easily conform to the file names of other (non - Erlang) applications, and you can also process file names that, at - least on Windows, were completely inaccessible (due to having names - that could not be represented in ISO-latin-1). Also you will avoid - creating incomprehensible file names on MacOS X as the vfs layer of - the OS will accept all your file names as UTF-8 and will not rewrite - them.</p> - - <p>For most systems, turning on Unicode file name translation is no - problem even if it uses transparent file naming. Very few systems - have mixed file name encodings. A consistent UTF-8 named system will - work perfectly in Unicode file name mode. It was still however - considered experimental in Erlang/OTP R14B01 and is still not the default on - such systems. Unicode file name translation is turned on with the - <c>+fnu</c> switch to the On Linux, a VM started without explicitly - stating the file name translation mode will default to <c>latin1</c> - as the native file name encoding. 
On Windows and MacOS X, the - default behavior is that of Unicode file name translation, why the - <c>file:native_name_encoding/0</c> by default returns <c>utf8</c> on - those systems (the fact that Windows actually does not use UTF-8 on - the file system level can safely be ignored by the Erlang - programmer). The default behavior can, as stated before, be - changed using the <c>+fnu</c> or <c>+fnl</c> options to the VM, see - the <seealso marker="erts:erl"><c>erl</c></seealso> program. If the - VM is started in Unicode file name translation mode, - <c>file:native_name_encoding/0</c> will return the atom - <c>utf8</c>. The <c>+fnu</c> switch can be followed by <c>w</c>, - <c>i</c> or <c>e</c>, to control how wrongly encoded file names are - to be reported. <c>w</c> means that a warning is sent to the - <c>error_logger</c> whenever a wrongly encoded file name is - "skipped" in directory listings, <c>i</c> means that those wrongly - encoded file names are silently ignored and <c>e</c> means that the - API function will return an error whenever a wrongly encoded file - (or directory) name is encountered. <c>w</c> is the default. Note - that <c>file:read_link/1</c> will always return an error if the link - points to an invalid file name.</p> - - <p>In Unicode file name mode, file names given to the BIF - <c>open_port/2</c> with the option <c>{spawn_executable,...}</c> are - also interpreted as Unicode. So is the parameter list given in the - <c>args</c> option available when using <c>spawn_executable</c>. The - UTF-8 translation of arguments can be avoided using binaries, see - the discussion about raw file names below.</p> - - <p>It is worth noting that the file <c>encoding</c> options given - when opening a file has nothing to do with the file <em>name</em> - encoding convention. 
You can very well open files containing data - encoded in UTF-8 but having file names in bytewise (<c>latin1</c>) encoding - or vice versa.</p> - - <note><p>Erlang drivers and NIF shared objects still can not be - named with names containing code points beyond 127. This is a known - limitation to be removed in a future release. Erlang modules however - can, but it is definitely not a good idea and is still considered - experimental.</p></note> - -<section> - <title>Notes About Raw File Names</title> - <marker id="notes-about-raw-filenames"/> - <p>Raw file names were introduced together with Unicode file name - support in erts-5.8.2 (Erlang/OTP R14B01). The reason "raw file - names" was introduced in the system was to be able to - consistently represent file names given in different encodings on - the same system. Having the VM automatically translate a file name - that is not in UTF-8 to a list of Unicode characters might seem - practical, but this would open up for both duplicate file names and - other inconsistent behavior. Consider a directory containing a file - named "björn" in ISO-latin-1, while the Erlang VM is - operating in Unicode file name mode (and therefore expecting UTF-8 - file naming). The ISO-latin-1 name is not valid UTF-8 and one could - be tempted to think that automatic conversion in for example - <c>file:list_dir/1</c> is a good idea. But what would happen if we - later tried to open the file and have the name as a Unicode list - (magically converted from the ISO-latin-1 file name)? The VM will - convert the file name given to UTF-8, as this is the encoding - expected. Effectively this means trying to open the file named - <<"björn"/utf8>>. This file does not exist, - and even if it existed it would not be the same file as the one that - was listed. We could even create two files named "björn", - one named in the UTF-8 encoding and one not. 
If - <c>file:list_dir/1</c> would automatically convert the ISO-latin-1 - file name to a list, we would get two identical file names as the - result. To avoid this, we need to differentiate between file names - being properly encoded according to the Unicode file naming - convention (i.e. UTF-8) and file names being invalid under the - encoding. By the common <c>file:list_dir/1</c> function, the wrongly - encoded file names are simply ignored in Unicode file name - translation mode, but by the <c>file:list_dir_all/1</c> function, - the file names with invalid encoding are returned as "raw" - file names, i.e. as binaries.</p> - - <p>The Erlang <c>file</c> module accepts raw file names as - input. <c>open_port({spawn_executable, ...} ...)</c> also accepts - them. As mentioned earlier, the arguments given in the option list - to <c>open_port({spawn_executable, ...} ...)</c> undergo the same - conversion as the file names, meaning that the executable will be - provided with arguments in UTF-8 as well. This translation is - avoided consistently with how the file names are treated, by giving - the argument as a binary.</p> - - <p>To force Unicode file name translation mode on systems where this - is not the default was considered experimental in Erlang/OTP R14B01 due to - the fact that the initial implementation did not ignore wrongly - encoded file names, so that raw file names could spread unexpectedly - throughout the system. Beginning with Erlang/OTP R16B, the wrongly encoded file - names are only retrieved by special functions - (e.g. <c>file:list_dir_all/1</c>), so the impact on existing code is - much lower, why it is now supported. Unicode file name translation - is expected to be default in future releases.</p> - - <p>Even if you are operating without Unicode file naming translation - automatically done by the VM, you can access and create files with - names in UTF-8 encoding by using raw file names encoded as - UTF-8. 
Enforcing the UTF-8 encoding regardless of the mode the - Erlang VM is started in might, in some circumstances be a good idea, - as the convention of using UTF-8 file names is spreading.</p> -</section> -<section> - <title>Notes About MacOS X</title> - <p>MacOS X's vfs layer enforces UTF-8 file names in a quite - aggressive way. Older versions did this by simply refusing to create - non UTF-8 conforming file names, while newer versions replace - offending bytes with the sequence "%HH", where HH is the - original character in hexadecimal notation. As Unicode translation - is enabled by default on MacOS X, the only way to come up against - this is to either start the VM with the <c>+fnl</c> flag or to use a - raw file name in bytewise (<c>latin1</c>) encoding. If using a raw - filename, with a bytewise encoding containing characters between 127 - and 255, to create a file, the file can not be opened using the same - name as the one used to create it. There is no remedy for this - behaviour, other than keeping the file names in the right - encoding.</p> - - <p>MacOS X also reorganizes the names of files so that the - representation of accents etc is using the "combining characters", - i.e. the character <c>ö</c> is represented as the code points - [111,776], where 111 is the character <c>o</c> and 776 is the - special accent character "combining diaeresis". This way of - normalizing Unicode is otherwise very seldom used and Erlang - normalizes those file names in the opposite way upon retrieval, so - that file names using combining accents are not passed up to the - Erlang application. In Erlang the file name "björn" is - retrieved as [98,106,246,114,110], not as [98,106,117,776,114,110], - even though the file system might think differently. 
The - normalization into combining accents are redone when actually - accessing files, so this can usually be ignored by the Erlang - programmer.</p> -</section> -</section> -<section> - <title>Unicode in Environment and Parameters</title> - <marker id="unicode_in_environment_and_parameters"/> - <p>Environment variables and their interpretation is handled much in - the same way as file names. If Unicode file names are enabled, - environment variables as well as parameters to the Erlang VM are - expected to be in Unicode.</p> - <p>If Unicode file names are enabled, the calls to - <seealso marker="kernel:os#getenv/0"><c>os:getenv/0</c></seealso>, - <seealso marker="kernel:os#getenv/1"><c>os:getenv/1</c></seealso>, - <seealso marker="kernel:os#putenv/2"><c>os:putenv/2</c></seealso> and - <seealso marker="kernel:os#unsetenv/1"><c>os:unsetenv/1</c></seealso> - will handle Unicode strings. On Unix-like platforms, the built-in - functions will translate environment variables in UTF-8 to/from - Unicode strings, possibly with code points > 255. On Windows the - Unicode versions of the environment system API will be used, also - allowing for code points > 255.</p> - <p>On Unix-like operating systems, parameters are expected to be - UTF-8 without translation if Unicode file names are enabled.</p> -</section> -<section> - <title>Unicode-aware Modules</title> - <p>Most of the modules in Erlang/OTP are of course Unicode-unaware - in the sense that they have no notion of Unicode and really should - not have. 
Typically they handle non-textual or byte-oriented data - (like <c>gen_tcp</c> etc).</p> - <p>Modules that actually handle textual data (like <c>io_lib</c>, - <c>string</c> etc) are sometimes subject to conversion or extension - to be able to handle Unicode characters.</p> - <p>Fortunately, most textual data has been stored in lists and range - checking has been sparse, why modules like <c>string</c> works well - for Unicode lists with little need for conversion or extension.</p> - <p>Some modules are however changed to be explicitly - Unicode-aware. These modules include:</p> - <taglist> - <tag><c>unicode</c></tag> - <item> - <p>The module <seealso marker="stdlib:unicode"><c>unicode</c></seealso> - is obviously Unicode-aware. It contains functions for conversion - between different Unicode formats as well as some utilities for - identifying byte order marks. Few programs handling Unicode data - will survive without this module.</p> - </item> - <tag><c>io</c></tag> - <item> - <p>The <seealso marker="stdlib:io"><c>io</c></seealso> module has been - extended along with the actual I/O-protocol to handle Unicode - data. This means that several functions require binaries to be - in UTF-8 and there are modifiers to formatting control sequences - to allow for outputting of Unicode strings.</p> - </item> - <tag><c>file</c>, <c>group</c>, <c>user</c></tag> - <item> - <p>I/O-servers throughout the system are able to handle - Unicode data and has options for converting data upon actual - output or input to/from the device. As shown earlier, the - <seealso marker="stdlib:shell"><c>shell</c></seealso> has support for - Unicode terminals and the <seealso - marker="kernel:file"><c>file</c></seealso> module allows for - translation to and from various Unicode formats on disk.</p> - <p>The actual reading and writing of files with Unicode data is - however not best done with the <c>file</c> module as its - interface is byte oriented. 
A file opened with a Unicode - encoding (like UTF-8), is then best read or written using the - <seealso marker="stdlib:io"><c>io</c></seealso> module.</p> - </item> - <tag><c>re</c></tag> - <item> - <p>The <seealso marker="stdlib:re"><c>re</c></seealso> module allows - for matching Unicode strings as a special option. As the library - is actually centered on matching in binaries, the Unicode - support is UTF-8-centered.</p> - </item> - <tag><c>wx</c></tag> - <item> - <p>The <seealso marker="wx:wx"><c>wx</c></seealso> graphical library - has extensive support for Unicode text</p> - </item> - </taglist> - <p>The module <seealso - marker="stdlib:string"><c>string</c></seealso> works perfectly for - Unicode strings as well as for ISO-latin-1 strings with the - exception of the language-dependent <seealso - marker="stdlib:string#to_upper/1"><c>to_upper</c></seealso> and - <seealso marker="stdlib:string#to_lower/1"><c>to_lower</c></seealso> - functions, which are only correct for the ISO-latin-1 character - set. Actually they can never function correctly for Unicode - characters in their current form, as there are language and locale - issues as well as multi-character mappings to consider when - converting text between cases. Converting case in an international - environment is a big subject not yet addressed in OTP.</p> -</section> -<section> - <title>Unicode Data in Files</title> - <p>The fact that Erlang as such can handle Unicode data in many forms - does not automatically mean that the content of any file can be - Unicode text. The external entities such as ports or I/O-servers are - not generally Unicode capable.</p> - <p>Ports are always byte oriented, so before sending data that you - are not sure is bytewise encoded to a port, make sure to encode it - in a proper Unicode encoding. Sometimes this will mean that only - part of the data shall be encoded as e.g. 
UTF-8, some parts may be - binary data (like a length indicator) or something else that shall - not undergo character encoding, so no automatic translation is - present.</p> - <p>I/O-servers behave a little differently. The I/O-servers connected - to terminals (or stdout) can usually cope with Unicode data - regardless of the <c>encoding</c> option. This is convenient when - one expects a modern environment but do not want to crash when - writing to a archaic terminal or pipe. Files on the other hand are - more picky. A file can have an encoding option which makes it - generally usable by the io-module (e.g. <c>{encoding,utf8}</c>), but - is by default opened as a byte oriented file. The <seealso - marker="kernel:file"><c>file</c></seealso> module is byte oriented, why only - ISO-Latin-1 characters can be written using that module. The - <seealso marker="stdlib:io"><c>io</c></seealso> module is the one to use if - Unicode data is to be output to a file with other <c>encoding</c> - than <c>latin1</c> (a.k.a. bytewise encoding). It is slightly - confusing that a file opened with - e.g. <c>file:open(Name,[read,{encoding,utf8}])</c>, cannot be - properly read using <c>file:read(File,N)</c> but you have to use the - <c>io</c> module to retrieve the Unicode data from it. The reason is - that <c>file:read</c> and <c>file:write</c> (and friends) are purely - byte oriented, and should so be, as that is the way to access - files other than text files - byte by byte. 
Just as with ports, you - can of course write encoded data into a file by "manually" converting - the data to the encoding of choice (using the <seealso - marker="stdlib:unicode"><c>unicode</c></seealso> module or the bit syntax) - and then output it on a bytewise encoded (<c>latin1</c>) file.</p> - <p>The rule of thumb is that the <seealso - marker="kernel:file"><c>file</c></seealso> module should be used for files - opened for bytewise access (<c>{encoding,latin1}</c>) and the - <seealso marker="stdlib:io"><c>io</c></seealso> module should be used when - accessing files with any other encoding - (e.g. <c>{encoding,uf8}</c>).</p> - - <p>Functions reading Erlang syntax from files generally recognize - the <c>coding:</c> comment and can therefore handle Unicode data on - input. When writing Erlang Terms to a file, you should insert - such comments when applicable:</p> - <pre> + </section> + + <section> + <title>Unicode Filenames</title> + <marker id="unicode_file_names"/> + <p>Most modern operating systems support Unicode filenames in some way. + There are many different ways to do this and Erlang by default treats the + different approaches differently:</p> + + <taglist> + <tag>Mandatory Unicode file naming</tag> + <item> + <p>Windows and, for most common uses, MacOS X enforce Unicode support + for filenames. All files created in the file system have names that + can consistently be interpreted. In MacOS X, all filenames are + retrieved in UTF-8 encoding. In Windows, each system call handling + filenames has a special Unicode-aware variant, giving much the same + effect. There are no filenames on these systems that are not Unicode + filenames. So, the default behavior of the Erlang VM is to work in + "Unicode filename translation mode". 
This means that a + filename can be specified as a Unicode list, which is automatically + translated to the proper name encoding for the underlying operating + system and file system.</p> + <p>Doing, for example, a + <seealso marker="kernel:file#list_dir/1"><c>file:list_dir/1</c></seealso> + on one of these systems can return Unicode lists with code points + > 255, depending on the content of the file system.</p> + </item> + <tag>Transparent file naming</tag> + <item> + <p>Most Unix operating systems have adopted a simpler approach, namely + that Unicode file naming is not enforced, but by convention. Those + systems usually use UTF-8 encoding for Unicode filenames, but do not + enforce it. On such a system, a filename containing characters with + code points from 128 through 255 can be named as plain ISO Latin-1 or + use UTF-8 encoding. As no consistency is enforced, the Erlang VM + cannot do consistent translation of all filenames.</p> + <p>By default on such systems, Erlang starts in <c>utf8</c> filename + mode if the terminal supports UTF-8, otherwise in <c>latin1</c> + mode.</p> + <p>In <c>latin1</c> mode, filenames are bytewise encoded. This allows + for list representation of all filenames in the system. However, a + a file named "Östersund.txt", appears in + <seealso marker="kernel:file#list_dir/1"><c>file:list_dir/1</c></seealso> + either as "Östersund.txt" (if the filename was encoded in bytewise + ISO Latin-1 by the program creating the file) or more probably as + <c>[195,150,115,116,101,114,115,117,110,100]</c>, which is a list + containing UTF-8 bytes (not what you want). If you use Unicode + filename translation on such a system, non-UTF-8 filenames are + ignored by functions like <c>file:list_dir/1</c>. They can be + retrieved with function + <seealso marker="kernel:file#list_dir_all/1"><c>file:list_dir_all/1</c></seealso>, + but wrongly encoded filenames appear as "raw filenames". 
+ </p> + </item> + </taglist> + + <p>The Unicode file naming support was introduced in Erlang/OTP + R14B01. A VM operating in Unicode filename translation mode can + work with files having names in any language or character set (as + long as it is supported by the underlying operating system and + file system). The Unicode character list is used to denote + filenames or directory names. If the file system content is + listed, you also get Unicode lists as return value. The support + lies in the Kernel and STDLIB modules, which is why + most applications (that does not explicitly require the filenames + to be in the ISO Latin-1 range) benefit from the Unicode support + without change.</p> + + <p>On operating systems with mandatory Unicode filenames, this means that + you more easily conform to the filenames of other (non-Erlang) + applications. You can also process filenames that, at least on Windows, + were inaccessible (because of having names that could not be represented + in ISO Latin-1). Also, you avoid creating incomprehensible filenames + on MacOS X, as the <c>vfs</c> layer of the operating system accepts all + your filenames as UTF-8 does not rewrite them.</p> + + <p>For most systems, turning on Unicode filename translation is no problem + even if it uses transparent file naming. Very few systems have mixed + filename encodings. A consistent UTF-8 named system works perfectly in + Unicode filename mode. It was still, however, considered experimental in + Erlang/OTP R14B01 and is still not the default on such systems.</p> + + <p>Unicode filename translation is turned on with switch <c>+fnu</c>. On + Linux, a VM started without explicitly stating the filename translation + mode defaults to <c>latin1</c> as the native filename encoding. On + Windows and MacOS X, the default behavior is that of Unicode filename + translation. 
Therefore + <seealso marker="kernel:file#native_name_encoding/0"><c>file:native_name_encoding/0</c></seealso> + by default returns <c>utf8</c> on those systems (Windows does not use + UTF-8 on the file system level, but this can safely be ignored by the + Erlang programmer). The default behavior can, as stated earlier, be + changed using option <c>+fnu</c> or <c>+fnl</c> to the VM, see the + <seealso marker="erts:erl"><c>erl</c></seealso> program. If the VM is + started in Unicode filename translation mode, + <c>file:native_name_encoding/0</c> returns atom <c>utf8</c>. Switch + <c>+fnu</c> can be followed by <c>w</c>, <c>i</c>, or <c>e</c> to control + how wrongly encoded filenames are to be reported.</p> + + <list type="bulleted"> + <item> + <p><c>w</c> means that a warning is sent to the <c>error_logger</c> + whenever a wrongly encoded filename is "skipped" in directory + listings. <c>w</c> is the default.</p> + </item> + <item> + <p><c>i</c> means that wrongly encoded filenames are silently ignored. + </p> + </item> + <item> + <p><c>e</c> means that the API function returns an error whenever a + wrongly encoded filename (or directory name) is encountered.</p> + </item> + </list> + + <p>Notice that + <seealso marker="kernel:file#read_link/1"><c>file:read_link/1</c></seealso> + always returns an error if the link points to an invalid filename.</p> + + <p>In Unicode filename mode, filenames given to BIF <c>open_port/2</c> with + option <c>{spawn_executable,...}</c> are also interpreted as Unicode. So + is the parameter list specified in option <c>args</c> available when + using <c>spawn_executable</c>. The UTF-8 translation of arguments can be + avoided using binaries, see section + <seealso marker="#notes-about-raw-filenames">Notes About Raw Filenames</seealso>. + </p> + + <p>Notice that the file encoding options specified when opening a file has + nothing to do with the filename encoding convention. 
You can very well + open files containing data encoded in UTF-8, but having filenames in + bytewise (<c>latin1</c>) encoding or conversely.</p> + + <note><p>Erlang drivers and NIF-shared objects still cannot be named with + names containing code points > 127. This limitation will be removed in + a future release. However, Erlang modules can, but it is definitely not a + good idea and is still considered experimental.</p> + </note> + + <section> + <title>Notes About Raw Filenames</title> + <marker id="notes-about-raw-filenames"/> + <p>Raw filenames were introduced together with Unicode filename support + in ERTS 5.8.2 (Erlang/OTP R14B01). The reason "raw + filenames" were introduced in the system was + to be able to represent + filenames, specified in different encodings on the same system, + consistently. It can seem practical to have the VM automatically + translate a filename that is not in UTF-8 to a list of Unicode + characters, but this would open up for both duplicate filenames and + other inconsistent behavior.</p> + + <p>Consider a directory containing a file named "björn" in ISO + Latin-1, while the Erlang VM is operating in Unicode filename mode (and + therefore expects UTF-8 file naming). The ISO Latin-1 name is not valid + UTF-8 and one can be tempted to think that automatic conversion in, for + example, + <seealso marker="kernel:file#list_dir/1"><c>file:list_dir/1</c></seealso> + is a good idea. But what would happen if we later tried to open the file + and have the name as a Unicode list (magically converted from the ISO + Latin-1 filename)? The VM converts the filename to UTF-8, as this is + the encoding expected. Effectively this means trying to open the file + named <<"björn"/utf8>>. This file does not exist, + and even if it existed it would not be the same file as the one that was + listed. We could even create two files named "björn", one + named in UTF-8 encoding and one not. 
If <c>file:list_dir/1</c> would + automatically convert the ISO Latin-1 filename to a list, we would get + two identical filenames as the result. To avoid this, we must + differentiate between filenames that are properly encoded according to + the Unicode file naming convention (that is, UTF-8) and filenames that + are invalid under the encoding. By the common function + <c>file:list_dir/1</c>, the wrongly encoded filenames are ignored in + Unicode filename translation mode, but by function + <seealso marker="kernel:file#list_dir_all/1"><c>file:list_dir_all/1</c></seealso> + the filenames with invalid encoding are returned as "raw" + filenames, that is, as binaries.</p> + + <p>The <c>file</c> module accepts raw filenames as input. + <c>open_port({spawn_executable, ...} ...)</c> also accepts them. As + mentioned earlier, the arguments specified in the option list to + <c>open_port({spawn_executable, ...} ...)</c> undergo the same + conversion as the filenames, meaning that the executable is provided + with arguments in UTF-8 as well. This translation is avoided + consistently with how the filenames are treated, by giving the argument + as a binary.</p> + + <p>To force Unicode filename translation mode on systems where this is not + the default was considered experimental in Erlang/OTP R14B01. This was + because the initial implementation did not ignore wrongly encoded + filenames, so that raw filenames could spread unexpectedly throughout + the system. As from Erlang/OTP R16B, the wrongly encoded + filenames are only retrieved by special functions (such as + <c>file:list_dir_all/1</c>). Since the impact on existing code is + therefore much lower it is now supported. + Unicode filename translation is + expected to be default in future releases.</p> + + <p>Even if you are operating without Unicode file naming translation + automatically done by the VM, you can access and create files with + names in UTF-8 encoding by using raw filenames encoded as UTF-8. 
+ Enforcing the UTF-8 encoding regardless of the mode the Erlang VM is + started in can in some circumstances be a good idea, as the convention + of using UTF-8 filenames is spreading.</p> + </section> + + <section> + <title>Notes About MacOS X</title> + <p>The <c>vfs</c> layer of MacOS X enforces UTF-8 filenames in an + aggressive way. Older versions did this by refusing to create non-UTF-8 + conforming filenames, while newer versions replace offending bytes with + the sequence "%HH", where HH is the original character in + hexadecimal notation. As Unicode translation is enabled by default on + MacOS X, the only way to come up against this is to either start the VM + with flag <c>+fnl</c> or to use a raw filename in bytewise + (<c>latin1</c>) encoding. If using a raw filename, with a bytewise + encoding containing characters from 127 through 255, to create a file, + the file cannot be opened using the same name as the one used to create + it. There is no remedy for this behavior, except keeping the filenames + in the correct encoding.</p> + + <p>MacOS X reorganizes the filenames so that the representation of + accents, and so on, uses the "combining characters". For example, + character <c>ö</c> is represented as code points <c>[111,776]</c>, + where <c>111</c> is character <c>o</c> and <c>776</c> is the special + accent character "Combining Diaeresis". This way of normalizing Unicode + is otherwise very seldom used. Erlang normalizes those filenames in the + opposite way upon retrieval, so that filenames using combining accents + are not passed up to the Erlang application. In Erlang, filename + "björn" is retrieved as <c>[98,106,246,114,110]</c>, not as + <c>[98,106,117,776,114,110]</c>, although the file system can think + differently. 
The normalization into combining accents is redone when + accessing files, so this can usually be ignored by the Erlang + programmer.</p> + </section> + </section> + + <section> + <title>Unicode in Environment and Parameters</title> + <marker id="unicode_in_environment_and_parameters"/> + <p>Environment variables and their interpretation are handled much in the + same way as filenames. If Unicode filenames are enabled, environment + variables as well as parameters to the Erlang VM are expected to be in + Unicode.</p> + + <p>If Unicode filenames are enabled, the calls to + <seealso marker="kernel:os#getenv/0"><c>os:getenv/0,1</c></seealso>, + <seealso marker="kernel:os#putenv/2"><c>os:putenv/2</c></seealso>, and + <seealso marker="kernel:os#unsetenv/1"><c>os:unsetenv/1</c></seealso> + handle Unicode strings. On Unix-like platforms, the built-in functions + translate environment variables in UTF-8 to/from Unicode strings, possibly + with code points > 255. On Windows, the Unicode versions of the + environment system API are used, and code points > 255 are allowed.</p> + <p>On Unix-like operating systems, parameters are expected to be UTF-8 + without translation if Unicode filenames are enabled.</p> + </section> + + <section> + <title>Unicode-Aware Modules</title> + <p>Most of the modules in Erlang/OTP are Unicode-unaware in the sense that + they have no notion of Unicode and should not have. 
Typically they handle + non-textual or byte-oriented data (such as <c>gen_tcp</c>).</p> + + <p>Modules handling textual data (such as + <seealso marker="stdlib:io_lib"><c>io_lib</c></seealso> and + <seealso marker="stdlib:string"><c>string</c></seealso> are sometimes + subject to conversion or extension to be able to handle Unicode + characters.</p> + + <p>Fortunately, most textual data has been stored in lists and range + checking has been sparse, so modules like <c>string</c> work well for + Unicode lists with little need for conversion or extension.</p> + + <p>Some modules are, however, changed to be explicitly Unicode-aware. These + modules include:</p> + + <taglist> + <tag><c>unicode</c></tag> + <item> + <p>The <seealso marker="stdlib:unicode"><c>unicode</c></seealso> + module is clearly Unicode-aware. It contains functions for conversion + between different Unicode formats and some utilities for identifying + byte order marks. Few programs handling Unicode data survive without + this module.</p> + </item> + <tag><c>io</c></tag> + <item> + <p>The <seealso marker="stdlib:io"><c>io</c></seealso> module has been + extended along with the actual I/O protocol to handle Unicode data. + This means that many functions require binaries to be in UTF-8, and + there are modifiers to format control sequences to allow for output + of Unicode strings.</p> + </item> + <tag><c>file</c>, <c>group</c>, <c>user</c></tag> + <item> + <p>I/O-servers throughout the system can handle Unicode data and have + options for converting data upon output or input to/from the device. + As shown earlier, the + <seealso marker="stdlib:shell"><c>shell</c></seealso> module has + support for Unicode terminals and the + <seealso marker="kernel:file"><c>file</c></seealso> module + allows for translation to and from various Unicode formats on + disk.</p> + <p>Reading and writing of files with Unicode data is, however, not best + done with the <c>file</c> module, as its interface is + byte-oriented. 
A file opened with a Unicode encoding (like UTF-8) is + best read or written using the + <seealso marker="stdlib:io"><c>io</c></seealso> module.</p> + </item> + <tag><c>re</c></tag> + <item> + <p>The <seealso marker="stdlib:re"><c>re</c></seealso> module allows + for matching Unicode strings as a special option. As the library is + centered on matching in binaries, the Unicode support is + UTF-8-centered.</p> + </item> + <tag><c>wx</c></tag> + <item> + <p>The graphical library <seealso marker="wx:wx"><c>wx</c></seealso> + has extensive support for Unicode text.</p></item> + </taglist> + + <p>The <seealso marker="stdlib:string"><c>string</c></seealso> module works + perfectly for Unicode strings and ISO Latin-1 strings, except the + language-dependent functions + <seealso marker="stdlib:string#to_upper/1"><c>string:to_upper/1</c></seealso> + and + <seealso marker="stdlib:string#to_lower/1"><c>string:to_lower/1</c></seealso>, + which are only correct for the ISO Latin-1 character set. These two + functions can never function correctly for Unicode characters in their + current form, as there are language and locale issues as well as + multi-character mappings to consider when converting text between cases. + Converting case in an international environment is a large subject not + yet addressed in OTP.</p> + </section> + + <section> + <title>Unicode Data in Files</title> + <p>Although Erlang can handle Unicode data in many forms does not + automatically mean that the content of any file can be Unicode text. The + external entities, such as ports and I/O servers, are not generally + Unicode capable.</p> + + <p>Ports are always byte-oriented, so before sending data that you are not + sure is bytewise-encoded to a port, ensure to encode it in a proper + Unicode encoding. Sometimes this means that only part of the data must + be encoded as, for example, UTF-8. 
Some parts can be binary data (like a + length indicator) or something else that must not undergo character + encoding, so no automatic translation is present.</p> + + <p>I/O servers behave a little differently. The I/O servers connected to + terminals (or <c>stdout</c>) can usually cope with Unicode data + regardless of the encoding option. This is convenient when one expects + a modern environment but do not want to crash when writing to an archaic + terminal or pipe.</p> + + <p>A file can have an encoding option that makes it generally usable by the + <seealso marker="stdlib:io"><c>io</c></seealso> module (for example + <c>{encoding,utf8}</c>), but is by default opened as a byte-oriented file. + The <seealso marker="kernel:file"><c>file</c></seealso> module is + byte-oriented, so only ISO Latin-1 characters can be written using that + module. Use the <c>io</c> module if Unicode data is to be output to a + file with other <c>encoding</c> than <c>latin1</c> (bytewise encoding). + It is slightly confusing that a file opened with, for example, + <c>file:open(Name,[read,{encoding,utf8}])</c> cannot be properly read + using <c>file:read(File,N)</c>, but using the <c>io</c> module to retrieve + the Unicode data from it. The reason is that <c>file:read</c> and + <c>file:write</c> (and friends) are purely byte-oriented, and should be, + as that is the way to access files other than text files, byte by byte. 
+ As with ports, you can write encoded data into a file by "manually" + converting the data to the encoding of choice (using the + <seealso marker="stdlib:unicode"><c>unicode</c></seealso> module or the + bit syntax) and then output it on a bytewise (<c>latin1</c>) encoded + file.</p> + + <p>Recommendations:</p> + + <list type="bulleted"> + <item><p>Use the + <seealso marker="kernel:file"><c>file</c></seealso> module for + files opened for bytewise access (<c>{encoding,latin1}</c>).</p> + </item> + <item><p>Use the <seealso marker="stdlib:io"><c>io</c></seealso> module + when accessing files with any other encoding (for example + <c>{encoding,uf8}</c>).</p> + </item> + </list> + + <p>Functions reading Erlang syntax from files recognize the <c>coding:</c> + comment and can therefore handle Unicode data on input. When writing + Erlang terms to a file, you are advised to insert such comments when + applicable:</p> + + <pre> $ <input>erl +fna +pc unicode</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -990,202 +1100,224 @@ Eshell V5.10.1 (abort with ^G) 1> <input>file:write_file("test.term",<<"%% coding: utf-8\n[{\"Юникод\",4711}].\n"/utf8>>).</input> ok 2> <input>file:consult("test.term").</input> -{ok,[[{"Юникод",4711}]]} - </pre> -</section> -<section> - <title>Summary of Options</title> - <marker id="unicode_options_summary"/> - <p>The Unicode support is controlled by both command line switches, - some standard environment variables and the version of OTP you are - using. Most options affect mainly the way Unicode data is displayed, - not the actual functionality of the API's in the standard - libraries. This means that Erlang programs usually do not - need to concern themselves with these options, they are more for the - development environment. 
An Erlang program can be written so that it - works well regardless of the type of system or the Unicode options - that are in effect.</p> - - <p>Here follows a summary of the settings affecting Unicode:</p> - <taglist> - <tag>The <c>LANG</c> and <c>LC_CTYPE</c> environment variables</tag> - <item> - <p>The language setting in the OS mainly affects the shell. The - terminal (i.e. the group leader) will operate with <c>{encoding, - unicode}</c> only if the environment tells it that UTF-8 is - allowed. This setting should correspond to the actual terminal - you are using.</p> - <p>The environment can also affect file name interpretation, if - Erlang is started with the <c>+fna</c> flag (which is default from - Erlang/OTP 17.0).</p> - <p>You can check the setting of this by calling - <c>io:getopts()</c>, which will give you an option list - containing <c>{encoding,unicode}</c> or - <c>{encoding,latin1}</c>.</p> - </item> - <tag>The <c>+pc </c>{<c>unicode</c>|<c>latin1</c>} flag to - <seealso marker="erts:erl"><c>erl(1)</c></seealso></tag> - <item> - <p>This flag affects what is interpreted as string data when - doing heuristic string detection in the shell and in - <c>io</c>/<c>io_lib:format</c> with the <c>"~tp"</c> and - <c>~tP</c> formatting instructions, as described above.</p> - <p>You can check this option by calling io:printable_range/0, - which will return <c>unicode</c> or <c>latin1</c>. To be - compatible with future (expected) extensions to the settings, - one should rather use <c>io_lib:printable_list/1</c> to check if - a list is printable according to the setting. That function will - take into account new possible settings returned from - <c>io:printable_range/0</c>.</p> - </item> - <tag>The <c>+fn</c>{<c>l</c>|<c>a</c>|<c>u</c>} - [{<c>w</c>|<c>i</c>|<c>e</c>}] - flag to <seealso marker="erts:erl"><c>erl(1)</c></seealso></tag> - <item> - <p>This flag affects how the file names are to be interpreted. 
On - operating systems with transparent file naming, this has to be - specified to allow for file naming in Unicode characters (and - for correct interpretation of file names containing characters - > 255.</p> - <p><c>+fnl</c> means bytewise interpretation of file names, which - was the usual way to represent ISO-Latin-1 file names before - UTF-8 file naming got widespread.</p> - <p><c>+fnu</c> means that file names are encoded in UTF-8, which - is nowadays the common scheme (although not enforced).</p> - <p><c>+fna</c> means that you automatically select between - <c>+fnl</c> and <c>+fnu</c>, based on the <c>LANG</c> and - <c>LC_CTYPE</c> environment variables. This is optimistic - heuristics indeed, nothing enforces a user to have a terminal - with the same encoding as the file system, but usually, this is - the case. This is the default on all Unix-like operating - systems except MacOS X.</p> - - <p>The file name translation mode can be read with the - <c>file:native_name_encoding/0</c> function, which returns - <c>latin1</c> (meaning bytewise encoding) or <c>utf8</c>.</p> - </item> - <tag><seealso marker="stdlib:epp#default_encoding/0"> - <c>epp:default_encoding/0</c></seealso></tag> - <item> - <p>This function returns the default encoding for Erlang source - files (if no encoding comment is present) in the currently - running release. In Erlang/OTP R16B <c>latin1</c> was returned (meaning - bytewise encoding). 
In Erlang/OTP 17.0 and forward it returns - <c>utf8</c>.</p> - <p>The encoding of each file can be specified using comments as - described in - <seealso marker="stdlib:epp#encoding"><c>epp(3)</c></seealso>.</p> - </item> - <tag><seealso marker="stdlib:io#setopts/1"><c>io:setopts/</c>{<c>1</c>,<c>2</c>}</seealso> and the <c>-oldshell</c>/<c>-noshell</c> flags.</tag> - <item> - <p>When Erlang is started with <c>-oldshell</c> or - <c>-noshell</c>, the I/O-server for <c>standard_io</c> is default - set to bytewise encoding, while an interactive shell defaults to - what the environment variables says.</p> - <p>With the <c>io:setopts/2</c> function you can set the - encoding of a file or other I/O-server. This can also be set when - opening a file. Setting the terminal (or other - <c>standard_io</c> server) unconditionally to the option - <c>{encoding,utf8}</c> will for example make UTF-8 encoded characters - being written to the device regardless of how Erlang was started or - the users environment.</p> - <p>Opening files with <c>encoding</c> option is convenient when - writing or reading text files in a known encoding.</p> - <p>You can retrieve the <c>encoding</c> setting for an I/O-server - using <seealso - marker="stdlib:io#getopts/1"><c>io:getopts()</c></seealso>.</p> - </item> - </taglist> -</section> -<section> - <title>Recipes</title> - <p>When starting with Unicode, one often stumbles over some common - issues. I try to outline some methods of dealing with Unicode data - in this section.</p> +{ok,[[{"Юникод",4711}]]}</pre> + </section> + + <section> + <title>Summary of Options</title> + <marker id="unicode_options_summary"/> + <p>The Unicode support is controlled by both command-line switches, some + standard environment variables, and the OTP version you are using. Most + options affect mainly how Unicode data is displayed, not the + functionality of the APIs in the standard libraries. 
This means that + Erlang programs usually do not need to concern themselves with these + options, they are more for the development environment. An Erlang program + can be written so that it works well regardless of the type of system or + the Unicode options that are in effect.</p> + + <p>Here follows a summary of the settings affecting Unicode:</p> + + <taglist> + <tag>The <c>LANG</c> and <c>LC_CTYPE</c> environment variables</tag> + <item> + <p>The language setting in the operating system mainly affects the + shell. The terminal (that is, the group leader) operates with + <c>{encoding, unicode}</c> only if the environment tells it that + UTF-8 is allowed. This setting is to correspond to the terminal you + are using.</p> + <p>The environment can also affect filename interpretation, if Erlang + is started with flag <c>+fna</c> (which is default from + Erlang/OTP 17.0).</p> + <p>You can check the setting of this by calling + <seealso marker="stdlib:io#getopts/1"><c>io:getopts()</c></seealso>, + which gives you an option list containing <c>{encoding,unicode}</c> + or <c>{encoding,latin1}</c>.</p> + </item> + <tag>The <c>+pc</c> {<c>unicode</c>|<c>latin1</c>} flag to + <seealso marker="erts:erl"><c>erl(1)</c></seealso></tag> + <item> + <p>This flag affects what is interpreted as string data when doing + heuristic string detection in the shell and in + <seealso marker="stdlib:io"><c>io</c></seealso>/ + <seealso marker="stdlib:io_lib#format/2"><c>io_lib:format</c></seealso> + with the <c>"~tp"</c> and <c>~tP</c> formatting instructions, as + described earlier.</p> + <p>You can check this option by calling + <seealso marker="stdlib:io#printable_range/0"><c>io:printable_range/0</c></seealso>, + which returns <c>unicode</c> or <c>latin1</c>. To be compatible with + future (expected) extensions to the settings, rather use + <seealso marker="stdlib:io_lib#printable_list/1"><c>io_lib:printable_list/1</c></seealso> + to check if a list is printable according to the setting. 
That + function takes into account new possible settings returned from + <c>io:printable_range/0</c>.</p> + </item> + <tag>The <c>+fn</c>{<c>l</c>|<c>u</c>|<c>a</c>} + [{<c>w</c>|<c>i</c>|<c>e</c>}] flag to + <seealso marker="erts:erl"><c>erl(1)</c></seealso></tag> + <item> + <p>This flag affects how the filenames are to be interpreted. On + operating systems with transparent file naming, this must be + specified to allow for file naming in Unicode characters (and for + correct interpretation of filenames containing characters > 255). + </p> + <list type="bulleted"> + <item> + <p><c>+fnl</c> means bytewise interpretation of filenames, which was + the usual way to represent ISO Latin-1 filenames before UTF-8 + file naming got widespread.</p> + </item> + <item> + <p><c>+fnu</c> means that filenames are encoded in UTF-8, which is + nowadays the common scheme (although not enforced).</p> + </item> + <item> + <p><c>+fna</c> means that you automatically select between + <c>+fnl</c> and <c>+fnu</c>, based on environment variables + <c>LANG</c> and <c>LC_CTYPE</c>. This is optimistic + heuristics indeed, nothing enforces a user to have a terminal with + the same encoding as the file system, but this is usually the + case. This is the default on all Unix-like operating systems, + except MacOS X.</p> + </item> + </list> + <p>The filename translation mode can be read with function + <seealso marker="kernel:file#native_name_encoding/0"><c>file:native_name_encoding/0</c></seealso>, + which returns <c>latin1</c> (bytewise encoding) or <c>utf8</c>.</p> + </item> + <tag><seealso marker="stdlib:epp#default_encoding/0"><c>epp:default_encoding/0</c></seealso></tag> + <item> + <p>This function returns the default encoding for Erlang source files + (if no encoding comment is present) in the currently running release. + In Erlang/OTP R16B, <c>latin1</c> (bytewise encoding) was returned. 
+ As from Erlang/OTP 17.0, <c>utf8</c> is returned.</p> + <p>The encoding of each file can be specified using comments as + described in the + <seealso marker="stdlib:epp#encoding"><c>epp(3)</c></seealso> module. + </p> + </item> + <tag><seealso marker="stdlib:io#setopts/1"><c>io:setopts/1,2</c></seealso> + and flags <c>-oldshell</c>/<c>-noshell</c></tag> + <item> + <p>When Erlang is started with <c>-oldshell</c> or <c>-noshell</c>, the + I/O server for <c>standard_io</c> is by default set to bytewise + encoding, while an interactive shell defaults to what the + environment variables say.</p> + <p>You can set the encoding of a file or other I/O server with function + <seealso marker="stdlib:io#setopts/1"><c>io:setopts/2</c></seealso>. + This can also be set when opening a file. Setting the terminal (or + other <c>standard_io</c> server) unconditionally to option + <c>{encoding,utf8}</c> implies that UTF-8 encoded characters are + written to the device, regardless of how Erlang was started or the + user's environment.</p> + <p>Opening files with option <c>encoding</c> is convenient when + writing or reading text files in a known encoding.</p> + <p>You can retrieve the <c>encoding</c> setting for an I/O server with + function + <seealso marker="stdlib:io#getopts/1"><c>io:getopts()</c></seealso>. + </p> + </item> + </taglist> + </section> + <section> - <title>Byte Order Marks</title> - <p>A common method of identifying encoding in text-files is to put - a byte order mark (BOM) first in the file. The BOM is the - code point 16#FEFF encoded in the same way as the rest of the - file. If such a file is to be read, the first few bytes (depending - on encoding) is not part of the actual text. This code outlines - how to open a file which is believed to have a BOM and set the - files encoding and position for further sequential reading - (preferably using the <seealso marker="stdlib:io"><c>io</c></seealso> - module). 
Note that error handling is omitted from the code:</p> -<code> + <title>Recipes</title> + <p>When starting with Unicode, one often stumbles over some common issues. + This section describes some methods of dealing with Unicode data.</p> + + <section> + <title>Byte Order Marks</title> + <p>A common method of identifying encoding in text files is to put a Byte + Order Mark (BOM) first in the file. The BOM is the code point 16#FEFF + encoded in the same way as the rest of the file. If such a file is to be + read, the first few bytes (depending on encoding) are not part of the + text. This code outlines how to open a file that is believed to + have a BOM, and set the file's encoding and position for further + sequential reading (preferably using the + <seealso marker="stdlib:io"><c>io</c></seealso> module).</p> + + <p>Notice that error handling is omitted from the code:</p> + + <code> open_bom_file_for_reading(File) -> {ok,F} = file:open(File,[read,binary]), {ok,Bin} = file:read(F,4), {Type,Bytes} = unicode:bom_to_encoding(Bin), file:position(F,Bytes), io:setopts(F,[{encoding,Type}]), - {ok,F}. -</code> - <p>The <c>unicode:bom_to_encoding/1</c> function identifies the - encoding from a binary of at least four bytes. It returns, along - with an term suitable for setting the encoding of the file, the - actual length of the BOM, so that the file position can be set - accordingly. Note that <c>file:position/2</c> always works on - byte-offsets, so that the actual byte-length of the BOM is - needed.</p> - <p>To open a file for writing and putting the BOM first is even - simpler:</p> -<code> + {ok,F}.</code> + + <p>Function + <seealso marker="stdlib:unicode#bom_to_encoding/1"><c>unicode:bom_to_encoding/1</c></seealso> + identifies the encoding from a binary of at least four bytes. It + returns, along with a term suitable for setting the encoding of the + file, the byte length of the BOM, so that the file position can be set + accordingly. 
Notice that function + <seealso marker="kernel:file#position/2"><c>file:position/2</c></seealso> + always works on byte-offsets, so that the byte length of the BOM is + needed.</p> + + <p>Opening a file for writing and placing the BOM first is even simpler:</p> + + <code> open_bom_file_for_writing(File,Encoding) -> {ok,F} = file:open(File,[write,binary]), ok = file:write(F,unicode:encoding_to_bom(Encoding)), io:setopts(F,[{encoding,Encoding}]), - {ok,F}. -</code> - <p>In both cases the file is then best processed using the - <c>io</c> module, as the functions in <c>io</c> can handle code - points beyond the ISO-latin-1 range.</p> - </section> - <section> - <title>Formatted I/O</title> - <p>When reading and writing to Unicode-aware entities, like the - User or a file opened for Unicode translation, you will probably - want to format text strings using the functions in <seealso - marker="stdlib:io"><c>io</c></seealso> or <seealso - marker="stdlib:io_lib"><c>io_lib</c></seealso>. For backward - compatibility reasons, these functions do not accept just any list - as a string, but require a special <em>translation modifier</em> - when working with Unicode texts. The modifier is <c>t</c>. When - applied to the <c>s</c> control character in a formatting string, - it accepts all Unicode code points and expect binaries to be in - UTF-8:</p> - <pre> + {ok,F}.</code> + + <p>In both cases, the file is then best processed using the + <seealso marker="stdlib:io"><c>io</c></seealso> module, as the functions + in that module can handle code points beyond the ISO Latin-1 range.</p> + </section> + + <section> + <title>Formatted I/O</title> + <p>When reading and writing to Unicode-aware entities, like a + file opened for Unicode translation, you probably want to format text + strings using the functions in the + <seealso marker="stdlib:io"><c>io</c></seealso> module or the + <seealso marker="stdlib:io_lib"><c>io_lib</c></seealso> module. 
For + backward compatibility reasons, these functions do not accept any list + as a string, but require a special <em>translation modifier</em> when + working with Unicode texts. The modifier is <c>t</c>. When applied to + control character <c>s</c> in a formatting string, it accepts all + Unicode code points and expects binaries to be in UTF-8:</p> + + <pre> 1> <input>io:format("~ts~n",[<<"åäö"/utf8>>]).</input> åäö ok 2> <input>io:format("~s~n",[<<"åäö"/utf8>>]).</input> åäö ok</pre> - <p>Obviously the second <c>io:format/2</c> gives undesired output - because the UTF-8 binary is not in latin1. For backward - compatibility, the non prefixed <c>s</c> control character expects - bytewise encoded ISO-latin-1 characters in binaries and lists - containing only code points < 256.</p> - <p>As long as the data is always lists, the <c>t</c> modifier can - be used for any string, but when binary data is involved, care - must be taken to make the right choice of formatting characters. A - bytewise encoded binary will also be interpreted as a string and - printed even when using <c>~ts</c>, but it might be mistaken for a - valid UTF-8 string and one should therefore avoid using the - <c>~ts</c> control if the binary contains bytewise encoded - characters and not UTF-8.</p> - <p>The function <c>format/2</c> in <c>io_lib</c> behaves - similarly. This function is defined to return a deep list of - characters and the output could easily be converted to binary data - for outputting on a device of any kind by a simple - <c>erlang:list_to_binary/1</c>. When the translation modifier is - used, the list can however contain characters that cannot be - stored in one byte. The call to <c>erlang:list_to_binary/1</c> - will in that case fail. 
However, if the I/O server you want to - communicate with is Unicode-aware, the list returned can still be - used directly:</p> -<pre> + + <p>Clearly, the second <c>io:format/2</c> gives undesired output, as the + UTF-8 binary is not in <c>latin1</c>. For backward compatibility, the + non-prefixed control character <c>s</c> expects bytewise-encoded ISO + Latin-1 characters in binaries and lists containing only code points + < 256.</p> + + <p>As long as the data is always lists, modifier <c>t</c> can be used for + any string, but when binary data is involved, care must be taken to + make the correct choice of formatting characters. A bytewise-encoded + binary is also interpreted as a string, and printed even when using + <c>~ts</c>, but it can be mistaken for a valid UTF-8 string. Avoid + therefore using the <c>~ts</c> control if the binary contains + bytewise-encoded characters and not UTF-8.</p> + + <p>Function + <seealso marker="stdlib:io_lib#format/2"><c>io_lib:format/2</c></seealso> + behaves similarly. It is defined to return a deep list of characters + and the output can easily be converted to binary data for outputting on + any device by a simple + <seealso marker="erts:erlang#list_to_binary/1"><c>erlang:list_to_binary/1</c></seealso>. + When the translation modifier is used, the list can, however, contain + characters that cannot be stored in one byte. The call to + <c>erlang:list_to_binary/1</c> then fails. 
However, if the I/O server + you want to communicate with is Unicode-aware, the returned list can + still be used directly:</p> + + <pre> $ <input>erl +pc unicode</input> Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false] @@ -1195,55 +1327,56 @@ Eshell V5.10.1 (abort with ^G) 2> <input>io:put_chars(io_lib:format("~ts~n", ["Γιούνικοντ"])).</input> Γιούνικοντ ok</pre> - <p>The Unicode string is returned as a Unicode list, which is - recognized as such since the Erlang shell uses the Unicode - encoding (and is started with all Unicode characters considered - printable). The Unicode list is valid input to the <seealso - marker="stdlib:io#put_chars/2"><c>io:put_chars/2</c></seealso> function, - so data can be output on any Unicode capable device. If the device - is a terminal, characters will be output in the <c>\x{</c>H - ...<c>}</c> format if encoding is <c>latin1</c> otherwise in UTF-8 - (for the non-interactive terminal - "oldshell" or "noshell") or - whatever is suitable to show the character properly (for an - interactive terminal - the regular shell). The bottom line is that - you can always send Unicode data to the <c>standard_io</c> - device. Files will however only accept Unicode code points beyond - ISO-latin-1 if <c>encoding</c> is set to something else than - <c>latin1</c>.</p> - </section> - <section> - <title>Heuristic Identification of UTF-8</title> - <p>While it is - strongly encouraged that the actual encoding of characters in - binary data is known prior to processing, that is not always - possible. On a typical Linux system, there is a mix of UTF-8 - and ISO-latin-1 text files and there are seldom any BOM's in the - files to identify them.</p> - <p>UTF-8 is designed in such a way that ISO-latin-1 characters - with numbers beyond the 7-bit ASCII range are seldom considered - valid when decoded as UTF-8. 
Therefore one can usually use - heuristics to determine if a file is in UTF-8 or if it is encoded - in ISO-latin-1 (one byte per character) encoding. The - <c>unicode</c> module can be used to determine if data can be - interpreted as UTF-8:</p> - <code> + + <p>The Unicode string is returned as a Unicode list, which is recognized + as such, as the Erlang shell uses the Unicode encoding (and is started + with all Unicode characters considered printable). The Unicode list is + valid input to function + <seealso marker="stdlib:io#put_chars/2"><c>io:put_chars/2</c></seealso>, + so data can be output on any Unicode-capable device. If the device is a + terminal, characters are output in format <c>\x{</c>H...<c>}</c> if + encoding is <c>latin1</c>. Otherwise in UTF-8 (for the non-interactive + terminal: "oldshell" or "noshell") or whatever is suitable to show the + character properly (for an interactive terminal: the regular shell).</p> + + <p>So, you can always send Unicode data to the <c>standard_io</c> device. + Files, however, accept only Unicode code points beyond ISO Latin-1 if + <c>encoding</c> is set to something else than <c>latin1</c>.</p> + </section> + + <section> + <title>Heuristic Identification of UTF-8</title> + <p>While it is strongly encouraged that the encoding of characters + in binary data is known before processing, that is not always possible. + On a typical Linux system, there is a mix of UTF-8 and ISO Latin-1 text + files, and there are seldom any BOMs in the files to identify them.</p> + + <p>UTF-8 is designed so that ISO Latin-1 characters with numbers beyond + the 7-bit ASCII range are seldom considered valid when decoded as UTF-8. + Therefore one can usually use heuristics to determine if a file is in + UTF-8 or if it is encoded in ISO Latin-1 (one byte per character). 
+ The <seealso marker="stdlib:unicode"><c>unicode</c></seealso> + module can be used to determine if data can be interpreted as UTF-8:</p> + + <code> heuristic_encoding_bin(Bin) when is_binary(Bin) -> case unicode:characters_to_binary(Bin,utf8,utf8) of Bin -> utf8; _ -> latin1 - end. - </code> - <p>If one does not have a complete binary of the file content, one - could instead chunk through the file and check part by part. The - return-tuple <c>{incomplete,Decoded,Rest}</c> from - <c>unicode:characters_to_binary/{1,2,3}</c> comes in handy. The - incomplete rest from one chunk of data read from the file is - prepended to the next chunk and we therefore circumvent the - problem of character boundaries when reading chunks of bytes in - UTF-8 encoding:</p> - <code> + end.</code> + + <p>If you do not have a complete binary of the file content, you can + instead chunk through the file and check part by part. The return-tuple + <c>{incomplete,Decoded,Rest}</c> from function + <seealso marker="stdlib:unicode#characters_to_binary/1"><c>unicode:characters_to_binary/1,2,3</c></seealso> + comes in handy. The incomplete rest from one chunk of data read from the + file is prepended to the next chunk and we therefore avoid the problem + of character boundaries when reading chunks of bytes in UTF-8 + encoding:</p> + + <code> heuristic_encoding_file(FileName) -> {ok,F} = file:open(FileName,[read,binary]), loop_through_file(F,<<>>,file:read(F,1024)). @@ -1260,13 +1393,14 @@ loop_through_file(F,Acc,{ok,Bin}) when is_binary(Bin) -> loop_through_file(F,Rest,file:read(F,1024)); Res when is_binary(Res) -> loop_through_file(F,<<>>,file:read(F,1024)) - end. - </code> - <p>Another option is to try to read the whole file in UTF-8 - encoding and see if it fails. 
Here we need to read the file using - <c>io:get_chars/3</c>, as we have to succeed in reading characters - with a code point over 255:</p> - <code> + end.</code> + + <p>Another option is to try to read the whole file in UTF-8 encoding and + see if it fails. Here we need to read the file using function + <seealso marker="stdlib:io#get_chars/3"><c>io:get_chars/3</c></seealso>, + as we have to read characters with a code point > 255:</p> + + <code> heuristic_encoding_file2(FileName) -> {ok,F} = file:open(FileName,[read,binary,{encoding,utf8}]), loop_through_file2(F,io:get_chars(F,'',1024)). @@ -1276,69 +1410,71 @@ loop_through_file2(_,eof) -> loop_through_file2(_,{error,_Err}) -> latin1; loop_through_file2(F,Bin) when is_binary(Bin) -> - loop_through_file2(F,io:get_chars(F,'',1024)). - </code> - </section> - <section> - <title>Lists of UTF-8 Bytes</title> - <p>For various reasons, you may find yourself having a list of - UTF-8 bytes. This is not a regular string of Unicode characters as - each element in the list does not contain one character. Instead - you get the "raw" UTF-8 encoding that you have in binaries. This - is easily converted to a proper Unicode string by first converting - byte per byte into a binary and then converting the binary of - UTF-8 encoded characters back to a Unicode string:</p> - <code> - utf8_list_to_string(StrangeList) -> - unicode:characters_to_list(list_to_binary(StrangeList)). - </code> - </section> - <section> - <title>Double UTF-8 Encoding</title> - <p>When working with binaries, you may get the horrible "double - UTF-8 encoding", where strange characters are encoded in your - binaries or files that you did not expect. What you may have got, - is a UTF-8 encoded binary that is for the second time encoded as - UTF-8. A common situation is where you read a file, byte by byte, - but the actual content is already UTF-8. If you then convert the - bytes to UTF-8, using i.e. 
the <c>unicode</c> module or by - writing to a file opened with the <c>{encoding,utf8}</c> - option. You will have each <i>byte</i> in the in the input file - encoded as UTF-8, not each character of the original text (one - character may have been encoded in several bytes). There is no - real remedy for this other than being very sure of which data is - actually encoded in which format, and never convert UTF-8 data - (possibly read byte by byte from a file) into UTF-8 again.</p> - <p>The by far most common situation where this happens, is when - you get lists of UTF-8 instead of proper Unicode strings, and then - convert them to UTF-8 in a binary or on a file:</p> - <code> - wrong_thing_to_do() -> - {ok,Bin} = file:read_file("an_utf8_encoded_file.txt"), - MyList = binary_to_list(Bin), %% Wrong! It is an utf8 binary! - {ok,C} = file:open("catastrophe.txt",[write,{encoding,utf8}]), - io:put_chars(C,MyList), %% Expects a Unicode string, but get UTF-8 - %% bytes in a list! - file:close(C). %% The file catastrophe.txt contains more or less unreadable - %% garbage! - </code> - <p>Make very sure you know what a binary contains before - converting it to a string. If no other option exists, try - heuristics:</p> - <code> - if_you_can_not_know() -> - {ok,Bin} = file:read_file("maybe_utf8_encoded_file.txt"), - MyList = case unicode:characters_to_list(Bin) of - L when is_list(L) -> - L; - _ -> - binary_to_list(Bin) %% The file was bytewise encoded - end, - %% Now we know that the list is a Unicode string, not a list of UTF-8 bytes - {ok,G} = file:open("greatness.txt",[write,{encoding,utf8}]), - io:put_chars(G,MyList), %% Expects a Unicode string, which is what it gets! - file:close(G). %% The file contains valid UTF-8 encoded Unicode characters! - </code> + loop_through_file2(F,io:get_chars(F,'',1024)).</code> + </section> + + <section> + <title>Lists of UTF-8 Bytes</title> + <p>For various reasons, you can sometimes have a list of UTF-8 + bytes. 
This is not a regular string of Unicode characters, as each list + element does not contain one character. Instead you get the "raw" UTF-8 + encoding that you have in binaries. This is easily converted to a proper + Unicode string by first converting byte by byte into a binary, and then + converting the binary of UTF-8 encoded characters back to a Unicode + string:</p> + + <code> +utf8_list_to_string(StrangeList) -> + unicode:characters_to_list(list_to_binary(StrangeList)).</code> + </section> + + <section> + <title>Double UTF-8 Encoding</title> + <p>When working with binaries, you can get the horrible "double UTF-8 + encoding", where strange characters are encoded in your binaries or + files. In other words, you can get a UTF-8 encoded binary that is + encoded as UTF-8 a second time. A common situation is where you read a + file, byte by byte, but the content is already UTF-8. If you then + convert the bytes to UTF-8, using, for example, the + <seealso marker="stdlib:unicode"><c>unicode</c></seealso> module, or by + writing to a file opened with option <c>{encoding,utf8}</c>, you have + each <em>byte</em> in the input file encoded as UTF-8, not each + character of the original text (one character can be encoded in + several bytes). There is no real remedy for this other than to be sure of + which data is encoded in which format, and never convert UTF-8 data + (possibly read byte by byte from a file) into UTF-8 again.</p> + + <p>By far the most common situation where this occurs is when you get + lists of UTF-8 bytes instead of proper Unicode strings, and then convert them + to UTF-8 in a binary or in a file:</p> + + <code> +wrong_thing_to_do() -> + {ok,Bin} = file:read_file("an_utf8_encoded_file.txt"), + MyList = binary_to_list(Bin), %% Wrong! It is a UTF-8 binary! + {ok,C} = file:open("catastrophe.txt",[write,{encoding,utf8}]), + io:put_chars(C,MyList), %% Expects a Unicode string, but gets UTF-8 + %% bytes in a list! + file:close(C). 
%% The file catastrophe.txt contains more or less unreadable + %% garbage!</code> + + <p>Ensure you know what a binary contains before converting it to a + string. If no other option exists, try heuristics:</p> + + <code> +if_you_can_not_know() -> + {ok,Bin} = file:read_file("maybe_utf8_encoded_file.txt"), + MyList = case unicode:characters_to_list(Bin) of + L when is_list(L) -> + L; + _ -> + binary_to_list(Bin) %% The file was bytewise encoded + end, + %% Now we know that the list is a Unicode string, not a list of UTF-8 bytes + {ok,G} = file:open("greatness.txt",[write,{encoding,utf8}]), + io:put_chars(G,MyList), %% Expects a Unicode string, which is what it gets! + file:close(G). %% The file contains valid UTF-8 encoded Unicode characters!</code> + </section> </section> -</section> </chapter> + diff --git a/lib/stdlib/doc/src/win32reg.xml b/lib/stdlib/doc/src/win32reg.xml index 52a8942c59..f4a4fa1626 100644 --- a/lib/stdlib/doc/src/win32reg.xml +++ b/lib/stdlib/doc/src/win32reg.xml @@ -24,38 +24,39 @@ <title>win32reg</title> <prepared>Bjorn Gustavsson</prepared> - <responsible>NN</responsible> + <responsible></responsible> <docno></docno> - <approved>nobody</approved> - <checked>no</checked> + <approved></approved> + <checked></checked> <date>2000-08-10</date> <rev>PA1</rev> - <file>win32reg.sgml</file> + <file>win32reg.xml</file> </header> <module>win32reg</module> - <modulesummary>win32reg provides access to the registry on Windows</modulesummary> + <modulesummary>Provides access to the registry on Windows.</modulesummary> <description> - <p><c>win32reg</c> provides read and write access to the + <p>This module provides read and write access to the registry on Windows. It is essentially a port driver wrapped around the Win32 API calls for accessing the registry.</p> <p>The registry is a hierarchical database, used to store various system - and software information in Windows. It is available in Windows 95 and - Windows NT. 
It contains installation data, and is updated by installers + and software information in Windows. + It contains installation data, and is updated by installers and system programs. The Erlang installer updates the registry by adding data that Erlang needs.</p> <p>The registry contains keys and values. Keys are like the directories in a file system, they form a hierarchy. Values are like files, they have a name and a value, and also a type.</p> - <p>Paths to keys are left to right, with sub-keys to the right and backslash - between keys. (Remember that backslashes must be doubled in Erlang strings.) - Case is preserved but not significant. - Example: <c>"\\hkey_local_machine\\software\\Ericsson\\Erlang\\5.0"</c> is the key + <p>Paths to keys are left to right, with subkeys to the right and backslash + between keys. (Remember that backslashes must be doubled in Erlang + strings.) Case is preserved but not significant.</p> + <p>For example, + <c>"\\hkey_local_machine\\software\\Ericsson\\Erlang\\5.0"</c> is the key for the installation data for the latest Erlang release.</p> - <p>There are six entry points in the Windows registry, top level keys. They can be - abbreviated in the <c>win32reg</c> module as:</p> + <p>There are six entry points in the Windows registry, top-level keys. + They can be abbreviated in this module as follows:</p> <pre> -Abbrev. Registry key -======= ============ +Abbreviation Registry key +============ ============ hkcr HKEY_CLASSES_ROOT current_user HKEY_CURRENT_USER hkcu HKEY_CURRENT_USER @@ -67,29 +68,39 @@ current_config HKEY_CURRENT_CONFIG hkcc HKEY_CURRENT_CONFIG dyn_data HKEY_DYN_DATA hkdd HKEY_DYN_DATA</pre> - <p>The key above could be written as <c>"\\hklm\\software\\ericsson\\erlang\\5.0"</c>.</p> - <p>The <c>win32reg</c> module uses a current key. It works much like the - current directory. 
From the current key, values can be fetched, sub-keys + <p>The key above can be written as + <c>"\\hklm\\software\\ericsson\\erlang\\5.0"</c>.</p> + <p>This module uses a current key. It works much like the + current directory. From the current key, values can be fetched, subkeys can be listed, and so on.</p> - <p>Under a key, any number of named values can be stored. They have name, and + <p>Under a key, any number of named values can be stored. They have names, types, and data.</p> - <p>Currently, the <c>win32reg</c> module supports storing only the following - types: REG_DWORD, which is an - integer, REG_SZ which is a string and REG_BINARY which is a binary. - Other types can be read, and will be returned as binaries.</p> - <p>There is also a "default" value, which has the empty string as name. It is read and - written with the atom <c>default</c> instead of the name.</p> - <p>Some registry values are stored as strings with references to environment variables, - e.g. <c>"%SystemRoot%Windows"</c>. <c>SystemRoot</c> is an environment variable, and should be - replaced with its value. A function <c>expand/1</c> is provided, so that environment - variables surrounded in % can be expanded to their values.</p> - <p>For additional information on the Windows registry consult the Win32 + <p><c>win32reg</c> supports storing of the following types:</p> + <list type="bulleted"> + <item><c>REG_DWORD</c>, which is an integer</item> + <item><c>REG_SZ</c>, which is a string</item> + <item><c>REG_BINARY</c>, which is a binary</item> + </list> + <p>Other types can be read, and are returned as binaries.</p> + <p>There is also a "default" value, which has the empty string as name. It + is read and written with the atom <c>default</c> instead of the name.</p> + <p>Some registry values are stored as strings with references to environment + variables, for example, <c>%SystemRoot%Windows</c>. <c>SystemRoot</c> is + an environment variable, and is to be replaced with its value. 
Function + <seealso marker="#expand/1"><c>expand/1</c></seealso> is provided so that + environment variables surrounded by <c>%</c> can be expanded to their + values.</p> + <p>For more information on the Windows registry, consult the Win32 Programmer's Reference.</p> </description> + <datatypes> <datatype> <name name="reg_handle"/> - <desc><p>As returned by <seealso marker="#open/1">open/1</seealso>.</p></desc> + <desc> + <p>As returned by + <seealso marker="#open/1"><c>open/1</c></seealso>.</p> + </desc> </datatype> <datatype> <name name="name"/> @@ -98,136 +109,164 @@ hkdd HKEY_DYN_DATA</pre> <name name="value"/> </datatype> </datatypes> + <funcs> <func> <name name="change_key" arity="2"/> - <fsummary>Move to a key in the registry</fsummary> + <fsummary>Move to a key in the registry.</fsummary> <desc> - <p>Changes the current key to another key. Works like cd. + <p>Changes the current key to another key. Works like <c>cd</c>. The key can be specified as a relative path or as an - absolute path, starting with \.</p> + absolute path, starting with <c>\</c>.</p> </desc> </func> + <func> <name name="change_key_create" arity="2"/> - <fsummary>Move to a key, create it if it is not there</fsummary> + <fsummary>Move to a key, create it if it is not there.</fsummary> <desc> <p>Creates a key, or just changes to it, if it is already there. Works - like a combination of <c>mkdir</c> and <c>cd</c>. Calls the Win32 API function - <c>RegCreateKeyEx()</c>.</p> - <p>The registry must have been opened in write-mode.</p> + like a combination of <c>mkdir</c> and <c>cd</c>. + Calls the Win32 API function <c>RegCreateKeyEx()</c>.</p> + <p>The registry must have been opened in write mode.</p> </desc> </func> + <func> <name name="close" arity="1"/> <fsummary>Close the registry.</fsummary> <desc> - <p>Closes the registry. 
After that, the <c><anno>RegHandle</anno></c> + cannot be used.</p> </desc> </func> + <func> <name name="current_key" arity="1"/> <fsummary>Return the path to the current key.</fsummary> <desc> - <p>Returns the path to the current key. This is the equivalent of <c>pwd</c>.</p> - <p>Note that the current key is stored in the driver, and might be - invalid (e.g. if the key has been removed).</p> + <p>Returns the path to the current key. This is the equivalent of + <c>pwd</c>.</p> + <p>Notice that the current key is stored in the driver, and can be + invalid (for example, if the key has been removed).</p> </desc> </func> + <func> <name name="delete_key" arity="1"/> - <fsummary>Delete the current key</fsummary> + <fsummary>Delete the current key.</fsummary> <desc> <p>Deletes the current key, if it is valid. Calls the Win32 API - function <c>RegDeleteKey()</c>. Note that this call does not change the current key, - (unlike <c>change_key_create/2</c>.) This means that after the call, the - current key is invalid.</p> + function <c>RegDeleteKey()</c>. Notice that this call does not change + the current key (unlike + <seealso marker="#change_key_create/2"> + <c>change_key_create/2</c></seealso>). + This means that after the call, the current key is invalid.</p> </desc> </func> + <func> <name name="delete_value" arity="2"/> <fsummary>Delete the named value on the current key.</fsummary> <desc> <p>Deletes a named value on the current key. The atom <c>default</c> is - used for the the default value.</p> - <p>The registry must have been opened in write-mode.</p> + used for the default value.</p> + <p>The registry must have been opened in write mode.</p> </desc> </func> + <func> <name name="expand" arity="1"/> - <fsummary>Expand a string with environment variables</fsummary> + <fsummary>Expand a string with environment variables.</fsummary> <desc> <p>Expands a string containing environment variables between percent - characters. 
Anything between two % is taken for a environment - variable, and is replaced by the value. Two consecutive % is replaced - by one %.</p> - <p>A variable name that is not in the environment, will result in an error.</p> + characters. Anything between two <c>%</c> is taken for an environment + variable, and is replaced by the value. Two consecutive <c>%</c> are + replaced by one <c>%</c>.</p> + <p>A variable name that is not in the environment results in an + error.</p> </desc> </func> + <func> <name name="format_error" arity="1"/> - <fsummary>Convert an POSIX errorcode to a string</fsummary> + <fsummary>Convert a POSIX error code to a string.</fsummary> <desc> - <p>Convert an POSIX errorcode to a string (by calling <c>erl_posix_msg:message</c>).</p> + <p>Converts a POSIX error code to a string + (by calling <c>erl_posix_msg:message/1</c>).</p> </desc> </func> + <func> <name name="open" arity="1"/> - <fsummary>Open the registry for reading or writing</fsummary> + <fsummary>Open the registry for reading or writing.</fsummary> <desc> - <p>Opens the registry for reading or writing. The current key will be the root - (<c>HKEY_CLASSES_ROOT</c>). The <c>read</c> flag in the mode list can be omitted.</p> - <p>Use <c>change_key/2</c> with an absolute path after <c>open</c>.</p> + <p>Opens the registry for reading or writing. The current key is the + root (<c>HKEY_CLASSES_ROOT</c>). Flag <c>read</c> in the mode list + can be omitted.</p> + <p>Use <seealso marker="#change_key/2"><c>change_key/2</c></seealso> + with an absolute path after + <seealso marker="#open/1"><c>open</c></seealso>.</p> </desc> </func> + <func> <name name="set_value" arity="3"/> - <fsummary>Set value at the current registry key with specified name.</fsummary> + <fsummary>Set value at the current registry key with specified name. + </fsummary> <desc> - <p>Sets the named (or default) value to value. Calls the Win32 - API function <c>RegSetValueEx()</c>. 
The value can be of three types, and - the corresponding registry type will be used. Currently the types supported - are: <c>REG_DWORD</c> for integers, <c>REG_SZ</c> for strings and - <c>REG_BINARY</c> for binaries. Other types cannot currently be added - or changed.</p> - <p>The registry must have been opened in write-mode.</p> + <p>Sets the named (or default) value to <c>value</c>. Calls the Win32 + API function <c>RegSetValueEx()</c>. The value can be of three types, + and the corresponding registry type is used. The supported types + are the following:</p> + <list type="bulleted"> + <item><c>REG_DWORD</c> for integers</item> + <item><c>REG_SZ</c> for strings</item> + <item><c>REG_BINARY</c> for binaries</item> + </list> + <p>Other types cannot be added or changed.</p> + <p>The registry must have been opened in write mode.</p> </desc> </func> + <func> <name name="sub_keys" arity="1"/> <fsummary>Get subkeys to the current key.</fsummary> <desc> <p>Returns a list of subkeys to the current key. Calls the Win32 API function <c>EnumRegKeysEx()</c>.</p> - <p>Avoid calling this on the root keys, it can be slow.</p> + <p>Avoid calling this on the root keys, as it can be slow.</p> </desc> </func> + <func> <name name="value" arity="2"/> <fsummary>Get the named value on the current key.</fsummary> <desc> <p>Retrieves the named value (or default) on the current key. - Registry values of type <c>REG_SZ</c>, are returned as strings. Type <c>REG_DWORD</c> - values are returned as integers. All other types are returned as binaries.</p> + Registry values of type <c>REG_SZ</c> are returned as strings. + Type <c>REG_DWORD</c> values are returned as integers. All other + types are returned as binaries.</p> </desc> </func> + <func> <name name="values" arity="1"/> <fsummary>Get all values on the current key.</fsummary> <desc> <p>Retrieves a list of all values on the current key. The values - have types corresponding to the registry types, see <c>value</c>. 
+ have types corresponding to the registry types, see + <seealso marker="#value/2"><c>value/2</c></seealso>. Calls the Win32 API function <c>EnumRegValuesEx()</c>.</p> </desc> </func> </funcs> <section> - <title>SEE ALSO</title> - <p>Win32 Programmer's Reference (from Microsoft)</p> - <p><c>erl_posix_msg</c></p> - <p>The Windows 95 Registry (book from O'Reilly)</p> + <title>See Also</title> + <p><c>erl_posix_msg</c>, + The Windows 95 Registry (book from O'Reilly), + Win32 Programmer's Reference (from Microsoft)</p> </section> </erlref> diff --git a/lib/stdlib/doc/src/zip.xml b/lib/stdlib/doc/src/zip.xml index 09a6587583..0b5eac1e16 100644 --- a/lib/stdlib/doc/src/zip.xml +++ b/lib/stdlib/doc/src/zip.xml @@ -28,98 +28,130 @@ <docno>1</docno> <approved></approved> <checked></checked> - <date>05-11-02</date> + <date>2005-11-02</date> <rev>PA1</rev> - <file>zip.sgml</file> + <file>zip.xml</file> </header> <module>zip</module> - <modulesummary>Utility for reading and creating 'zip' archives.</modulesummary> + <modulesummary>Utility for reading and creating 'zip' archives. + </modulesummary> <description> - <p>The <c>zip</c> module archives and extracts files to and from a zip - archive. The zip format is specified by the "ZIP Appnote.txt" file - available on PKWare's website www.pkware.com.</p> + <p>This module archives and extracts files to and from a zip + archive. The zip format is specified by the "ZIP Appnote.txt" file, + available on the PKWARE web site + <url href="http://www.pkware.com">www.pkware.com</url>.</p> <p>The zip module supports zip archive versions up to 6.1. However, password-protection and Zip64 are not supported.</p> - <p>By convention, the name of a zip file should end in "<c>.zip</c>". - To abide to the convention, you'll need to add "<c>.zip</c>" yourself - to the name.</p> - <p>Zip archives are created with the - <seealso marker="#zip_2">zip/2</seealso> or the - <seealso marker="#zip_2">zip/3</seealso> function. 
(They are - also available as <c>create</c>, to resemble the <c>erl_tar</c> - module.)</p> - <p>To extract files from a zip archive, use the - <seealso marker="#unzip_1">unzip/1</seealso> or the - <seealso marker="#unzip_2">unzip/2</seealso> function. (They are - also available as <c>extract</c>.)</p> - <p>To fold a function over all files in a zip archive, use the - <seealso marker="#foldl_3">foldl_3</seealso> function.</p> - <p>To return a list of the files in a zip archive, use the - <seealso marker="#list_dir_1">list_dir/1</seealso> or the - <seealso marker="#list_dir_2">list_dir/2</seealso> function. (They - are also available as <c>table</c>.)</p> - <p>To print a list of files to the Erlang shell, - use either the <seealso marker="#t_1">t/1</seealso> or - <seealso marker="#tt_1">tt/1</seealso> function.</p> - <p>In some cases, it is desirable to open a zip archive, and to - unzip files from it file by file, without having to reopen the - archive. The functions - <seealso marker="#zip_open">zip_open</seealso>, - <seealso marker="#zip_get">zip_get</seealso>, - <seealso marker="#zip_list_dir">zip_list_dir</seealso> and - <seealso marker="#zip_close">zip_close</seealso> do this.</p> + <p>By convention, the name of a zip file is to end with <c>.zip</c>. + To abide to the convention, add <c>.zip</c> to the filename.</p> + <list type="bulleted"> + <item> + <p>To create zip archives, use function + <seealso marker="#zip/2"><c>zip/2</c></seealso> or + <seealso marker="#zip/2"><c>zip/3</c></seealso>. They are + also available as <c>create/2,3</c>, to resemble the + <seealso marker="erl_tar"><c>erl_tar</c></seealso> module.</p> + </item> + <item> + <p>To extract files from a zip archive, use function + <seealso marker="#unzip/1"><c>unzip/1</c></seealso> or + <seealso marker="#unzip/2"><c>unzip/2</c></seealso>. 
They are + also available as <c>extract/1,2</c>, to resemble the + <seealso marker="erl_tar"><c>erl_tar</c></seealso> module.</p> + </item> + <item> + <p>To fold a function over all files in a zip archive, use function + <seealso marker="#foldl/3"><c>foldl/3</c></seealso>.</p> + </item> + <item> + <p>To return a list of the files in a zip archive, use function + <seealso marker="#list_dir/1"><c>list_dir/1</c></seealso> or + <seealso marker="#list_dir/2"><c>list_dir/2</c></seealso>. They are + also available as <c>table/1,2</c>, to resemble the + <seealso marker="erl_tar"><c>erl_tar</c></seealso> module.</p> + </item> + <item> + <p>To print a list of files to the Erlang shell, use function + <seealso marker="#t/1"><c>t/1</c></seealso> or + <seealso marker="#tt/1"><c>tt/1</c></seealso>.</p> + </item> + <item> + <p>Sometimes it is desirable to open a zip archive, and to + unzip files from it file by file, without having to reopen the + archive. This can be done by functions + <seealso marker="#zip_open/1"><c>zip_open/1,2</c></seealso>, + <seealso marker="#zip_get/1"><c>zip_get/1,2</c></seealso>, + <seealso marker="#zip_list_dir/1"><c>zip_list_dir/1</c></seealso>, and + <seealso marker="#zip_close/1"><c>zip_close/1</c></seealso>.</p> + </item> + </list> </description> <section> - <title>LIMITATIONS</title> - <p>Zip64 archives are not currently supported.</p> - <p>Password-protected and encrypted archives are not currently - supported</p> - <p>Only the DEFLATE (zlib-compression) and the STORE (uncompressed - data) zip methods are supported.</p> - <p>The size of the archive is limited to 2 G-byte (32 bits).</p> - <p>Comments for individual files is not supported when creating zip - archives. The zip archive comment for the whole zip archive is - supported.</p> - <p>There is currently no support for altering an existing zip archive. 
- To add or remove a file from an archive, the whole archive must be - recreated.</p> + <title>Limitations</title> + <list type="bulleted"> + <item> + <p>Zip64 archives are not supported.</p> + </item> + <item> + <p>Password-protected and encrypted archives are not supported.</p> + </item> + <item> + <p>Only the DEFLATE (zlib-compression) and the STORE (uncompressed + data) zip methods are supported.</p> + </item> + <item> + <p>The archive size is limited to 2 GB (32 bits).</p> + </item> + <item> + <p>Comments for individual files are not supported when creating zip + archives. The zip archive comment for the whole zip archive is + supported.</p> + </item> + <item> + <p>Changing a zip archive is not supported. + To add or remove a file from an archive, the whole archive must be + recreated.</p> + </item> + </list> </section> <datatypes> <datatype> <name name="zip_comment"/> <desc> - <p>The record <c>zip_comment</c> just contains the archive comment for - a zip archive</p> + <p>The record <c>zip_comment</c> only contains the archive comment for + a zip archive.</p> </desc> </datatype> <datatype> <name name="zip_file"/> <desc> - <p>The record <c>zip_file</c> contains the following fields.</p> + <p>The record <c>zip_file</c> contains the following fields:</p> <taglist> <tag><c>name</c></tag> <item> - <p>the name of the file</p> + <p>The filename</p> </item> <tag><c>info</c></tag> <item> - <p>file info as in - <seealso marker="kernel:file#read_file_info/1">file:read_file_info/1</seealso></p> + <p>File information as in + <seealso marker="kernel:file#read_file_info/1"> + <c>file:read_file_info/1</c></seealso> + in Kernel</p> </item> <tag><c>comment</c></tag> <item> - <p>the comment for the file in the zip archive</p> + <p>The comment for the file in the zip archive</p> </item> <tag><c>offset</c></tag> <item> - <p>the offset of the file in the zip archive (used internally)</p> + <p>The file offset in the zip archive (used internally)</p> </item> 
<tag><c>comp_size</c></tag> <item> - <p>the compressed size of the file (the uncompressed size is found - in <c>info</c>)</p> + <p>The size of the compressed file (the size of the uncompressed + file is found in <c>info</c>)</p> </item> </taglist> </desc> @@ -133,224 +165,44 @@ <datatype> <name name="create_option"/> <desc> - <p>These options are described in <seealso marker="#zip_options">create/3</seealso>.</p> + <p>These options are described in <seealso marker="#zip_options"> + <c>create/3</c></seealso>.</p> </desc> </datatype> <datatype> - <name name="handle"/> + <name name="handle"/> <desc> - <p>As returned by <seealso marker="#zip_open/2">zip_open/2</seealso>.</p> + <p>As returned by + <seealso marker="#zip_open/2"><c>zip_open/2</c></seealso>.</p> </desc> </datatype> </datatypes> + <funcs> <func> - <name name="zip" arity="2"/> - <name name="zip" arity="3"/> - <name name="create" arity="2"/> - <name name="create" arity="3"/> - <fsummary>Create a zip archive with options</fsummary> - <desc> - <p>The <marker id="zip_2"></marker><c>zip</c> function creates a - zip archive containing the files specified in <c><anno>FileList</anno></c>.</p> - <p>As synonyms, the functions <c>create/2</c> and <c>create/3</c> - are provided, to make it resemble the <c>erl_tar</c> module.</p> - <p>The file-list is a list of files, with paths relative to the - current directory, they will be stored with this path in the - archive. Files may also be specified with data in binaries, - to create an archive directly from data.</p> - <p>Files will be compressed using the DEFLATE compression, as - described in the Appnote.txt file. However, files will be - stored without compression if they already are compressed. - The <c>zip/2</c> and <c>zip/3</c> functions check the file extension - to see whether the file should be stored without compression. 
- Files with the following extensions are not compressed: - <c>.Z</c>, <c>.zip</c>, <c>.zoo</c>, <c>.arc</c>, <c>.lzh</c>, - <c>.arj</c>.</p> - <p>It is possible to override the default behavior and - explicitly control what types of files that should be - compressed by using the <c>{compress, <anno>What</anno>}</c> and - <c>{uncompress, <anno>What</anno>}</c> options. It is possible to have - several <c>compress</c> and <c>uncompress</c> options. In - order to trigger compression of a file, its extension must - match with the - <c>compress</c> condition and must not match the - <c>uncompress</c> condition. For example if <c>compress</c> is - set to <c>["gif", "jpg"]</c> and <c>uncompress</c> is set to - <c>["jpg"]</c>, only files with <c>"gif"</c> as extension will - be compressed. No other files will be compressed.</p> - <marker id="zip_options"></marker> - <p>The following options are available:</p> - <taglist> - <tag><c>cooked</c></tag> - <item> - <p>By default, the <c>open/2</c> function will open the - zip file in <c>raw</c> mode, which is faster but does not allow - a remote (erlang) file server to be used. Adding <c>cooked</c> - to the mode list will override the default and open the zip file - without the <c>raw</c> option. The same goes for the files - added.</p> - </item> - <tag><c>verbose</c></tag> - <item> - <p>Print an informational message about each file - being added.</p> - </item> - <tag><c>memory</c></tag> - <item> - <p>The output will not be to a file, but instead as a tuple - <c>{<anno>FileName</anno>, binary()}</c>. 
The binary will be a full zip - archive with header, and can be extracted with for instance - <c>unzip/2</c>.</p> - </item> - <tag><c>{comment, <anno>Comment</anno>}</c></tag> - <item> - <p>Add a comment to the zip-archive.</p> - </item> - <tag><c>{cwd, <anno>CWD</anno>}</c></tag> - <item> - <p>Use the given directory as current directory, it will be - prepended to file names when adding them, although it will not - be in the zip-archive. (Acting like a file:set_cwd/1, but - without changing the global cwd property.)</p> - </item> - <tag><c>{compress, <anno>What</anno>}</c></tag> - <item> - <p>Controls what types of files will be - compressed. It is by default set to <c>all</c>. The - following values of <c>What</c> are allowed:</p> - <taglist> - <tag><c>all</c></tag> - <item><p> means that all files will be compressed (as long - as they pass the <c>uncompress</c> condition).</p></item> - <tag><c>[<anno>Extension</anno>]</c></tag> - <item><p>means that only files with exactly these extensions - will be compressed.</p></item> - <tag><c>{add,[<anno>Extension</anno>]}</c></tag> - <item><p>adds these extensions to the list of compress - extensions.</p></item> - <tag><c>{del,[<anno>Extension</anno>]}</c></tag> - <item><p>deletes these extensions from the list of compress - extensions.</p></item> - </taglist> - </item> - <tag><c>{uncompress, <anno>What</anno>}</c></tag> - <item> - <p>Controls what types of files will be uncompressed. It is by - default set to <c>[".Z", ".zip", ".zoo", ".arc", ".lzh", ".arj"]</c>. 
- The following values of <c>What</c> are allowed:</p> - <taglist> - <tag><c>all</c></tag> - <item><p> means that no files will be compressed.</p></item> - <tag><c>[<anno>Extension</anno>]</c></tag> - <item><p>means that files with these extensions will be - uncompressed.</p></item> - <tag><c>{add,[<anno>Extension</anno>]}</c></tag> - <item><p>adds these extensions to the list of uncompress - extensions.</p></item> - <tag><c>{del,[<anno>Extension</anno>]}</c></tag> - <item><p>deletes these extensions from the list of uncompress - extensions.</p></item> - </taglist> - </item> - </taglist> - </desc> - </func> - <func> - <name name="unzip" arity="1"/> - <name name="unzip" arity="2"/> - <name name="extract" arity="1"/> - <name name="extract" arity="2"/> - <fsummary>Extract files from a zip archive</fsummary> - <desc> - <p>The <marker id="unzip_1"></marker><c>unzip/1</c> function extracts - all files from a zip archive. - The <marker id="unzip_2"></marker><c>unzip/2</c> function provides - options to extract some files, and more.</p> - <p>If the <c><anno>Archive</anno></c> argument is given as a binary, - the contents of the binary is assumed to be a zip archive, - otherwise it should be a filename.</p> - <p>The following options are available:</p> - <taglist> - <tag><c>{file_list, <anno>FileList</anno>}</c></tag> - <item> - <p>By default, all files will be extracted from the zip - archive. With the <c>{file_list, <anno>FileList</anno>}</c> option, - the <c>unzip/2</c> function will only extract the files - whose names are included in <c><anno>FileList</anno></c>. The full - paths, including the names of all sub directories within - the zip archive, must be specified.</p> - </item> - <tag><c>cooked</c></tag> - <item> - <p>By default, the <c>open/2</c> function will open the - zip file in <c>raw</c> mode, which is faster but does not allow - a remote (erlang) file server to be used. 
Adding <c>cooked</c> - to the mode list will override the default and open the zip file - without the <c>raw</c> option. The same goes for the files - extracted.</p> - </item> - <tag><c>keep_old_files</c></tag> - <item> - <p>By default, all existing files with the same name as file in - the zip archive will be overwritten. With the <c>keep_old_files</c> - option, the <c>unzip/2</c> function will not overwrite any existing - files. Note that even with the <c>memory</c> option given, which - means that no files will be overwritten, files existing will be - excluded from the result.</p> - </item> - <tag><c>verbose</c></tag> - <item> - <p>Print an informational message as each file is being - extracted.</p> - </item> - <tag><c>memory</c></tag> - <item> - <p>Instead of extracting to the current directory, the - <c>memory</c> option will give the result as a list of tuples - <c>{Filename, Binary}</c>, where <c>Binary</c> is a binary - containing the extracted data of the file named <c>Filename</c> - in the zip archive.</p> - </item> - <tag><c>{cwd, CWD}</c></tag> - <item> - <p>Use the given directory as current directory, it will be - prepended to file names when extracting them from the - zip-archive. (Acting like a file:set_cwd/1, but without - changing the global cwd property.)</p> - </item> - </taglist> - </desc> - </func> - <func> <name name="foldl" arity="3"/> - <fsummary>Fold a function over all files in a zip archive</fsummary> + <fsummary>Fold a function over all files in a zip archive.</fsummary> <desc> - <p>The <marker id="foldl_3"></marker> <c>foldl/3</c> function - calls <c><anno>Fun</anno>(<anno>FileInArchive</anno>, <anno>GetInfo - </anno>, <anno>GetBin</anno>, <anno>AccIn</anno>)</c> on - successive files in the <c>Archive</c>, starting with - <c><anno>AccIn</anno> - == <anno>Acc0</anno></c>. <c><anno>FileInArchive</anno></c> is - the name that the file - has in the archive. <c><anno>GetInfo</anno></c> is a fun that - returns info - about the the file. 
<c><anno>GetBin</anno></c> returns the contents - of the - file. Both <c><anno>GetInfo</anno></c> and <c><anno>GetBin</anno></c> - must be called - within the <c><anno>Fun</anno></c>. Their behavior is undefined if - they are - called outside the context of the <c><anno>Fun</anno></c>. - The <c><anno>Fun</anno></c> - must return a new accumulator which is passed to the next - call. <c>foldl/3</c> returns the final value of the - accumulator. <c><anno>Acc0</anno></c> is returned if the archive is - empty. It is not necessary to iterate over all files in the - archive. The iteration may be ended prematurely in a - controlled manner by throwing an exception.</p> - - <p>For example:</p> + <p>Calls <c><anno>Fun</anno>(<anno>FileInArchive</anno>, <anno>GetInfo + </anno>, <anno>GetBin</anno>, <anno>AccIn</anno>)</c> on + successive files in the <c>Archive</c>, starting with + <c><anno>AccIn</anno> == <anno>Acc0</anno></c>.</p> + <p><c><anno>FileInArchive</anno></c> is the name that the file + has in the archive.</p> + <p><c><anno>GetInfo</anno></c> is a fun that returns information + about the file.</p> + <p><c><anno>GetBin</anno></c> returns the file contents.</p> + <p>Both <c><anno>GetInfo</anno></c> and <c><anno>GetBin</anno></c> + must be called within the <c><anno>Fun</anno></c>. Their behavior is + undefined if they are called outside the context of + <c><anno>Fun</anno></c>.</p> + <p>The <c><anno>Fun</anno></c> must return a new accumulator, which is + passed to the next call. <c>foldl/3</c> returns the final accumulator + value. <c><anno>Acc0</anno></c> is returned if the archive is + empty. It is not necessary to iterate over all files in the archive. 
+ The iteration can be ended prematurely in a controlled manner + by throwing an exception.</p> + <p><em>Example:</em></p> <pre> > <input>Name = "dummy.zip".</input> "dummy.zip" @@ -380,97 +232,300 @@ </pre> </desc> </func> + <func> <name name="list_dir" arity="1"/> <name name="list_dir" arity="2"/> <name name="table" arity="1" /> <name name="table" arity="2"/> - <fsummary>Retrieve the name of all files in a zip archive</fsummary> + <fsummary>Retrieve the name of all files in a zip archive.</fsummary> <desc> - <p>The <marker id="list_dir_1"></marker><c>list_dir/1</c> - function retrieves the names of all files in the zip archive - <c><anno>Archive</anno></c>. The <marker id="list_dir_2"></marker> - <c>list_dir/2</c> function provides options.</p> - <p>As synonyms, the functions <c>table/2</c> and <c>table/3</c> - are provided, to make it resemble the <c>erl_tar</c> module.</p> + <p><c>list_dir/1</c> retrieves all filenames in the zip archive + <c><anno>Archive</anno></c>.</p> + <p><c>list_dir/2</c> provides options.</p> + <p><c>table/1</c> and <c>table/2</c> are provided as synonyms + to resemble the + <seealso marker="erl_tar"><c>erl_tar</c></seealso> module.</p> <p>The result value is the tuple <c>{ok, List}</c>, where <c>List</c> contains the zip archive comment as the first element.</p> - <p>The following options are available:</p> + <p>One option is available:</p> <taglist> <tag><c>cooked</c></tag> <item> - <p>By default, the <c>open/2</c> function will open the - zip file in <c>raw</c> mode, which is faster but does not allow - a remote (erlang) file server to be used. Adding <c>cooked</c> - to the mode list will override the default and open the zip file - without the <c>raw</c> option.</p> + <p>By default, this function opens the zip file in + <c>raw</c> mode, which is faster but does not allow a remote + (Erlang) file server to be used. 
Adding <c>cooked</c> to the + mode list overrides the default + and opens the zip file without option <c>raw</c>.</p> </item> </taglist> </desc> </func> + <func> <name name="t" arity="1"/> - <fsummary>Print the name of each file in a zip archive</fsummary> + <fsummary>Print the name of each file in a zip archive.</fsummary> <desc> - <p>The <marker id="t_1"></marker><c>t/1</c> function prints the names - of all files in the zip archive <c><anno>Archive</anno></c> to the Erlang shell. - (Similar to "<c>tar t</c>".)</p> + <p>Prints all filenames in the zip archive <c><anno>Archive</anno></c> + to the Erlang shell. (Similar to <c>tar t</c>.)</p> </desc> </func> + <func> <name name="tt" arity="1"/> - <fsummary>Print name and information for each file in a zip archive</fsummary> + <fsummary>Print name and information for each file in a zip archive. + </fsummary> <desc> - <p>The <marker id="tt_1"></marker><c>tt/1</c> function prints names and - information about all files in the zip archive <c><anno>Archive</anno></c> to - the Erlang shell. (Similar to "<c>tar tv</c>".)</p> + <p>Prints filenames and information about all files in the zip archive + <c><anno>Archive</anno></c> to the Erlang shell. + (Similar to <c>tar tv</c>.)</p> </desc> </func> + <func> - <name name="zip_open" arity="1"/> - <name name="zip_open" arity="2"/> - <fsummary>Open an archive and return a handle to it</fsummary> + <name name="unzip" arity="1"/> + <name name="unzip" arity="2"/> + <name name="extract" arity="1"/> + <name name="extract" arity="2"/> + <fsummary>Extract files from a zip archive.</fsummary> <desc> - <p>The <marker id="zip_open"></marker><c>zip_open</c> function - opens a - zip archive, and reads and saves its directory. 
This - means that subsequently reading files from the archive will be - faster than unzipping files one at a time with <c>unzip</c>.</p> - <p>The archive must be closed with <c>zip_close/1</c>.</p> - <p>The <c><anno>ZipHandle</anno></c> will be closed if the - process which originally opened the archive dies.</p> + <p><c>unzip/1</c> extracts all files from a zip archive.</p> + <p><c>unzip/2</c> provides options to extract some files, and more.</p> + <p><c>extract/1</c> and <c>extract/2</c> are provided as synonyms + to resemble module + <seealso marker="erl_tar"><c>erl_tar</c></seealso>.</p> + <p>If argument <c><anno>Archive</anno></c> is specified as a binary, + the contents of the binary is assumed to be a zip archive, + otherwise a filename.</p> + <p>Options:</p> + <taglist> + <tag><c>{file_list, <anno>FileList</anno>}</c></tag> + <item> + <p>By default, all files are extracted from the zip + archive. With option <c>{file_list, <anno>FileList</anno>}</c>, + function <c>unzip/2</c> only extracts the files + whose names are included in <c><anno>FileList</anno></c>. The full + paths, including the names of all subdirectories within + the zip archive, must be specified.</p> + </item> + <tag><c>cooked</c></tag> + <item> + <p>By default, this function opens the + zip file in <c>raw</c> mode, which is faster but does not allow + a remote (Erlang) file server to be used. Adding <c>cooked</c> + to the mode list overrides the default and opens the zip file + without option <c>raw</c>. The same applies for the files + extracted.</p> + </item> + <tag><c>keep_old_files</c></tag> + <item> + <p>By default, all files with the same name as files in + the zip archive are overwritten. With option <c>keep_old_files</c> + set, function <c>unzip/2</c> does not overwrite existing files. 
+ Notice that + even with option <c>memory</c> specified, which + means that no files are overwritten, existing files are + excluded from the result.</p> + </item> + <tag><c>verbose</c></tag> + <item> + <p>Prints an informational message for each extracted file.</p> + </item> + <tag><c>memory</c></tag> + <item> + <p>Instead of extracting to the current directory, + the result is given as a list of tuples + <c>{Filename, Binary}</c>, where <c>Binary</c> is a binary + containing the extracted data of file <c>Filename</c> + in the zip archive.</p> + </item> + <tag><c>{cwd, CWD}</c></tag> + <item> + <p>Uses the specified directory as current directory. It is + prepended to filenames when extracting them from the + zip archive. (Acting like + <seealso marker="kernel:file#set_cwd/1"> + <c>file:set_cwd/1</c></seealso> in Kernel, + but without changing the global <c>cwd</c> property.)</p> + </item> + </taglist> </desc> </func> + <func> - <name name="zip_list_dir" arity="1"/> - <fsummary>Return a table of files in open zip archive</fsummary> + <name name="zip" arity="2"/> + <name name="zip" arity="3"/> + <name name="create" arity="2"/> + <name name="create" arity="3"/> + <fsummary>Create a zip archive with options.</fsummary> <desc> - <p>The <marker id="zip_list_dir"></marker> - <c>zip_list_dir/1</c> function - returns the file list of an open zip archive. The first returned - element is the zip archive comment.</p> + <p>Creates a zip archive containing the files specified in + <c><anno>FileList</anno></c>.</p> + <p><c>create/2</c> and <c>create/3</c> are provided as synonyms + to resemble module + <seealso marker="erl_tar"><c>erl_tar</c></seealso>.</p> + <p><c><anno>FileList</anno></c> is a list of files, with paths relative + to the current directory, which are stored with this path in the + archive. 
Files can also be specified with data in binaries + to create an archive directly from data.</p> + <p>Files are compressed using the DEFLATE compression, as + described in the "Appnote.txt" file. However, files are + stored without compression if they are already compressed. + <c>zip/2</c> and <c>zip/3</c> check the file extension + to determine if the file is to be stored without compression. + Files with the following extensions are not compressed: + <c>.Z</c>, <c>.zip</c>, <c>.zoo</c>, <c>.arc</c>, <c>.lzh</c>, + <c>.arj</c>.</p> + <p>It is possible to override the default behavior and control + what types of files that are to be compressed by using options + <c>{compress, <anno>What</anno>}</c> and + <c>{uncompress, <anno>What</anno>}</c>. It is also possible to use + many <c>compress</c> and <c>uncompress</c> options.</p> + <p>To trigger file compression, its extension must match with the + <c>compress</c> condition and must not match the + <c>uncompress</c> condition. For example, if <c>compress</c> is + set to <c>["gif", "jpg"]</c> and <c>uncompress</c> is set to + <c>["jpg"]</c>, only files with extension <c>"gif"</c> are + compressed.</p> + <marker id="zip_options"></marker> + <p>Options:</p> + <taglist> + <tag><c>cooked</c></tag> + <item> + <p>By default, this function opens the + zip file in mode <c>raw</c>, which is faster but does not allow + a remote (Erlang) file server to be used. Adding <c>cooked</c> + to the mode list overrides the default and opens the zip file + without the <c>raw</c> option. The same applies for the files + added.</p> + </item> + <tag><c>verbose</c></tag> + <item> + <p>Prints an informational message about each added file.</p> + </item> + <tag><c>memory</c></tag> + <item> + <p>The output is not to a file, but instead as a tuple + <c>{<anno>FileName</anno>, binary()}</c>. 
The binary is a full zip + archive with header and can be extracted with, for example, + <seealso marker="#unzip/2"><c>unzip/2</c></seealso>.</p> + </item> + <tag><c>{comment, <anno>Comment</anno>}</c></tag> + <item> + <p>Adds a comment to the zip archive.</p> + </item> + <tag><c>{cwd, <anno>CWD</anno>}</c></tag> + <item> + <p>Uses the specified directory as current work directory + (<c>cwd</c>). This is prepended to filenames when adding them, + although not in the zip archive (acting like + <seealso marker="kernel:file#set_cwd/1"> + <c>file:set_cwd/1</c></seealso> in Kernel, but without + changing the global <c>cwd</c> property.).</p> + </item> + <tag><c>{compress, <anno>What</anno>}</c></tag> + <item> + <p>Controls what types of files to be compressed. Defaults to + <c>all</c>. The following values of <c>What</c> are allowed:</p> + <taglist> + <tag><c>all</c></tag> + <item> + <p>All files are compressed (as long + as they pass the <c>uncompress</c> condition).</p> + </item> + <tag><c>[<anno>Extension</anno>]</c></tag> + <item> + <p>Only files with exactly these extensions + are compressed.</p> + </item> + <tag><c>{add,[<anno>Extension</anno>]}</c></tag> + <item> + <p>Adds these extensions to the list of compress + extensions.</p> + </item> + <tag><c>{del,[<anno>Extension</anno>]}</c></tag> + <item> + <p>Deletes these extensions from the list of compress + extensions.</p> + </item> + </taglist> + </item> + <tag><c>{uncompress, <anno>What</anno>}</c></tag> + <item> + <p>Controls what types of files to be uncompressed. Defaults to + <c>[".Z", ".zip", ".zoo", ".arc", ".lzh", ".arj"]</c>. 
+ The following values of <c>What</c> are allowed:</p> + <taglist> + <tag><c>all</c></tag> + <item> + <p>No files are compressed.</p> + </item> + <tag><c>[<anno>Extension</anno>]</c></tag> + <item> + <p>Files with these extensions are uncompressed.</p> + </item> + <tag><c>{add,[<anno>Extension</anno>]}</c></tag> + <item> + <p>Adds these extensions to the list of uncompress + extensions.</p> + </item> + <tag><c>{del,[<anno>Extension</anno>]}</c></tag> + <item> + <p>Deletes these extensions from the list of uncompress + extensions.</p> + </item> + </taglist> + </item> + </taglist> </desc> </func> + + <func> + <name name="zip_close" arity="1"/> + <fsummary>Close an open archive.</fsummary> + <desc> + <p>Closes a zip archive, previously opened with + <seealso marker="#zip_open/1"><c>zip_open/1,2</c></seealso>. + All resources are closed, and the handle is not to be used after + closing.</p> + </desc> + </func> + <func> <name name="zip_get" arity="1"/> <name name="zip_get" arity="2"/> - <fsummary>Extract files from an open archive</fsummary> + <fsummary>Extract files from an open archive.</fsummary> <desc> - <p>The <marker id="zip_get"></marker><c>zip_get</c> function extracts - one or all files from an open archive.</p> - <p>The files will be unzipped to memory or to file, depending on - the options given to the <c>zip_open</c> function when the - archive was opened.</p> + <p>Extracts one or all files from an open archive.</p> + <p>The files are unzipped to memory or to file, depending on + the options specified to function + <seealso marker="#zip_open/1"><c>zip_open/1,2</c></seealso> + when opening the archive.</p> </desc> </func> + <func> - <name name="zip_close" arity="1"/> - <fsummary>Close an open archive</fsummary> + <name name="zip_list_dir" arity="1"/> + <fsummary>Return a table of files in open zip archive.</fsummary> <desc> - <p>The <marker id="zip_close"></marker><c>zip_close/1</c> function - closes a zip archive, previously opened with <c>zip_open</c>. 
All - resources are closed, and the handle should not be used after - closing.</p> + <p>Returns the file list of an open zip archive. The first returned + element is the zip archive comment.</p> + </desc> + </func> + + <func> + <name name="zip_open" arity="1"/> + <name name="zip_open" arity="2"/> + <fsummary>Open an archive and return a handle to it.</fsummary> + <desc> + <p>Opens a zip archive, and reads and saves its directory. This + means that later reading files from the archive is + faster than unzipping files one at a time with + <seealso marker="#unzip/1"><c>unzip/1,2</c></seealso>.</p> + <p>The archive must be closed with + <seealso marker="#zip_close/1"><c>zip_close/1</c></seealso>.</p> + <p>The <c><anno>ZipHandle</anno></c> is closed if the + process that originally opened the archive dies.</p> </desc> </func> </funcs>
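The `compress`/`uncompress` rule described for `zip/3` in the diff above (a file is compressed only if its extension matches the `compress` condition and does not match the `uncompress` condition) can be modelled in a few lines. The following is a minimal sketch in Python, not the Erlang implementation: the function name `should_compress` and the helper `_norm` are hypothetical, only the `all` and plain-list forms of `What` are covered (not `{add, ...}` or `{del, ...}`), and treating `"gif"` and `".gif"` as the same extension is an assumption made here because the documentation writes extensions both ways.

```python
import os

# Default list quoted from the documentation text in the diff above.
DEFAULT_UNCOMPRESS = [".Z", ".zip", ".zoo", ".arc", ".lzh", ".arj"]


def _norm(ext):
    """Drop one leading dot so "gif" and ".gif" compare equal (an assumption)."""
    return ext[1:] if ext.startswith(".") else ext


def should_compress(filename, compress="all", uncompress=DEFAULT_UNCOMPRESS):
    """Model of the documented rule; hypothetical helper, not part of any library."""
    ext = _norm(os.path.splitext(filename)[1])
    # uncompress == "all" means no file is compressed.
    if uncompress == "all":
        return False
    # A match on the uncompress condition always wins.
    if ext in {_norm(e) for e in uncompress}:
        return False
    # compress == "all" accepts every file that passed the uncompress check.
    if compress == "all":
        return True
    # Otherwise only the listed extensions are compressed.
    return ext in {_norm(e) for e in compress}
```

With `compress` set to `["gif", "jpg"]` and `uncompress` set to `["jpg"]`, this model compresses only files with extension `gif`, which matches the example given in the documentation text.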