diff options
Diffstat (limited to 'lib/stdlib/doc/src')
29 files changed, 1713 insertions, 367 deletions
diff --git a/lib/stdlib/doc/src/assert_hrl.xml b/lib/stdlib/doc/src/assert_hrl.xml index cb91b1f126..ea23cca2ee 100644 --- a/lib/stdlib/doc/src/assert_hrl.xml +++ b/lib/stdlib/doc/src/assert_hrl.xml @@ -4,7 +4,7 @@ <fileref> <header> <copyright> - <year>2012</year><year>2016</year> + <year>2012</year><year>2017</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -28,7 +28,7 @@ <date></date> <rev></rev> </header> - <file>assert.hrl.xml</file> + <file>assert.hrl</file> <filesummary>Assert macros.</filesummary> <description> <p>The include file <c>assert.hrl</c> provides macros for inserting @@ -49,25 +49,33 @@ entries in the <c>Info</c> list are optional; do not rely programatically on any of them being present.</p> + <p>Each assert macro has a corresponding version with an extra argument, + for adding comments to assertions. These can for example be printed as + part of error reports, to clarify the meaning of the check that + failed. For example, <c>?assertEqual(0, fib(0), "Fibonacci is defined + for zero")</c>. The comment text can be any character data (string, + UTF8-binary, or deep list of such data), and will be included in the + error term as <c>{comment, Text}</c>.</p> + <p>If the macro <c>NOASSERT</c> is defined when <c>assert.hrl</c> is read by the compiler, the macros are defined as equivalent to the atom - <c>ok</c>. The test is not performed and there is no cost at runtime.</p> + <c>ok</c>. The test will not be performed and there is no cost at runtime.</p> <p>For example, using <c>erlc</c> to compile your modules, the following - disable all assertions:</p> + disables all assertions:</p> <code type="none"> erlc -DNOASSERT=true *.erl</code> - <p>The value of <c>NOASSERT</c> does not matter, only the fact that it is - defined.</p> + <p>(The value of <c>NOASSERT</c> does not matter, only the fact that it is + defined.)</p> <p>A few other macros also have effect on the enabling or disabling of assertions:</p> <list type="bulleted"> - <item><p>If <c>NODEBUG</c> is defined, it implies <c>NOASSERT</c>, unless - <c>DEBUG</c> is also defined, which is assumed to take precedence.</p> + <item><p>If <c>NODEBUG</c> is defined, it implies <c>NOASSERT</c> (unless + <c>DEBUG</c> is also defined, which overrides <c>NODEBUG</c>).</p> </item> <item><p>If <c>ASSERT</c> is defined, it overrides <c>NOASSERT</c>, that is, the assertions remain enabled.</p></item> @@ -84,16 +92,22 @@ erlc -DNOASSERT=true *.erl</code> <title>Macros</title> <taglist> <tag><c>assert(BoolExpr)</c></tag> + <item></item> + <tag><c>URKAassert(BoolExpr, Comment)</c></tag> <item> <p>Tests that <c>BoolExpr</c> completes normally returning <c>true</c>.</p> </item> <tag><c>assertNot(BoolExpr)</c></tag> + <item></item> + <tag><c>assertNot(BoolExpr, Comment)</c></tag> <item> <p>Tests that <c>BoolExpr</c> completes normally returning <c>false</c>.</p> </item> <tag><c>assertMatch(GuardedPattern, Expr)</c></tag> + <item></item> + <tag><c>assertMatch(GuardedPattern, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> completes normally yielding a value that matches <c>GuardedPattern</c>, for example:</p> @@ -104,6 +118,8 @@ erlc -DNOASSERT=true *.erl</code> ?assertMatch({bork, X} when X > 0, f())</code> </item> <tag><c>assertNotMatch(GuardedPattern, Expr)</c></tag> + <item></item> + <tag><c>assertNotMatch(GuardedPattern, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> completes normally yielding a value that does not match <c>GuardedPattern</c>.</p> @@ -111,16 +127,22 @@ erlc -DNOASSERT=true *.erl</code> <c>when</c> part.</p> </item> <tag><c>assertEqual(ExpectedValue, Expr)</c></tag> + <item></item> + <tag><c>assertEqual(ExpectedValue, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> completes normally yielding a value that is exactly equal to <c>ExpectedValue</c>.</p> </item> <tag><c>assertNotEqual(ExpectedValue, Expr)</c></tag> + <item></item> + <tag><c>assertNotEqual(ExpectedValue, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> completes normally yielding a value that is not exactly equal to <c>ExpectedValue</c>.</p> </item> <tag><c>assertException(Class, Term, Expr)</c></tag> + <item></item> + <tag><c>assertException(Class, Term, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> completes abnormally with an exception of type <c>Class</c> and with the associated <c>Term</c>. The assertion fails @@ -130,6 +152,8 @@ erlc -DNOASSERT=true *.erl</code> patterns, as in <c>assertMatch</c>.</p> </item> <tag><c>assertNotException(Class, Term, Expr)</c></tag> + <item></item> + <tag><c>assertNotException(Class, Term, Expr, Comment)</c></tag> <item> <p>Tests that <c>Expr</c> does not evaluate abnormally with an exception of type <c>Class</c> and with the associated <c>Term</c>. @@ -139,14 +163,20 @@ erlc -DNOASSERT=true *.erl</code> be guarded patterns.</p> </item> <tag><c>assertError(Term, Expr)</c></tag> + <item></item> + <tag><c>assertError(Term, Expr, Comment)</c></tag> <item> <p>Equivalent to <c>assertException(error, Term, Expr)</c></p> </item> <tag><c>assertExit(Term, Expr)</c></tag> + <item></item> + <tag><c>assertExit(Term, Expr, Comment)</c></tag> <item> <p>Equivalent to <c>assertException(exit, Term, Expr)</c></p> </item> <tag><c>assertThrow(Term, Expr)</c></tag> + <item></item> + <tag><c>assertThrow(Term, Expr, Comment)</c></tag> <item> <p>Equivalent to <c>assertException(throw, Term, Expr)</c></p> </item> diff --git a/lib/stdlib/doc/src/c.xml b/lib/stdlib/doc/src/c.xml index 92ab59c6b0..7666699183 100644 --- a/lib/stdlib/doc/src/c.xml +++ b/lib/stdlib/doc/src/c.xml @@ -52,13 +52,27 @@ <func> <name name="c" arity="1"/> <name name="c" arity="2"/> - <fsummary>Compile and load code in a file.</fsummary> + <name name="c" arity="3"/> + <fsummary>Compile and load a file or module.</fsummary> <desc> - <p>Compiles and then purges and loads the code for a file. - <c><anno>Options</anno></c> defaults to <c>[]</c>. Compilation is - equivalent to:</p> - <code type="none"> -compile:file(<anno>File</anno>, <anno>Options</anno> ++ [report_errors, report_warnings])</code> + <p>Compiles and then purges and loads the code for a module. + <c><anno>Module</anno></c> can be either a module name or a source + file path, with or without <c>.erl</c> extension. + <c><anno>Options</anno></c> defaults to <c>[]</c>.</p> + <p>If <c><anno>Module</anno></c> is an atom and is not the path of a + source file, then the code path is searched to locate the object + file for the module and extract its original compiler options and + source path. If the source file is not found in the original + location, <seealso + marker="filelib#find_source/1"><c>filelib:find_source/1</c></seealso> + is used to search for it relative to the directory of the object + file.</p> + <p>The source file is compiled with the the original + options appended to the given <c><anno>Options</anno></c>, the + output replacing the old object file if and only if compilation + succeeds. A function <c><anno>Filter</anno></c> can be specified + for removing elements from from the original compiler options + before the new options are added.</p> <p>Notice that purging the code means that any processes lingering in old code for the module are killed without warning. For more information, see <c>code/3</c>.</p> @@ -148,6 +162,15 @@ compile:file(<anno>File</anno>, <anno>Options</anno> ++ [report_errors, report_w </func> <func> + <name name="lm" arity="0"/> + <fsummary>Loads all modified modules.</fsummary> + <desc> + <p>Reloads all currently loaded modules that have changed on disk (see <c>mm()</c>). + Returns the list of results from calling <c>l(M)</c> for each such <c>M</c>.</p> + </desc> + </func> + + <func> <name name="ls" arity="0"/> <fsummary>List files in the current directory.</fsummary> <desc> @@ -182,6 +205,15 @@ compile:file(<anno>File</anno>, <anno>Options</anno> ++ [report_errors, report_w </func> <func> + <name name="mm" arity="0"/> + <fsummary>Lists all modified modules.</fsummary> + <desc> + <p>Lists all modified modules. Shorthand for + <seealso marker="kernel:code#modified_modules/0"><c>code:modified_modules/0</c></seealso>.</p> + </desc> + </func> + + <func> <name name="memory" arity="0"/> <fsummary>Memory allocation information.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/dets.xml b/lib/stdlib/doc/src/dets.xml index 2e4261d72e..eb6e32aecf 100644 --- a/lib/stdlib/doc/src/dets.xml +++ b/lib/stdlib/doc/src/dets.xml @@ -100,18 +100,12 @@ provided by Dets, neither is the limited support for concurrent updates that makes a sequence of <c>first</c> and <c>next</c> calls safe to use on fixed ETS tables. Both these - features will be provided by Dets in a future release of + features may be provided by Dets in a future release of Erlang/OTP. Until then, the Mnesia application (or some user-implemented method for locking) must be used to implement safe concurrency. Currently, no Erlang/OTP library has support for ordered disk-based term storage.</p> - <p>Two versions of the format used for storing objects on file are - supported by Dets. The first version, 8, is the format always used - for tables created by Erlang/OTP R7 and earlier. The second version, 9, - is the default version of tables created by Erlang/OTP R8 (and later - releases). Erlang/OTP R8 can create version 8 tables, and convert version - 8 tables to version 9, and conversely, upon request.</p> <p>All Dets functions return <c>{error, Reason}</c> if an error occurs (<seealso marker="#first/1"><c>first/1</c></seealso> and <seealso marker="#next/2"><c>next/2</c></seealso> are exceptions, they @@ -190,9 +184,6 @@ <datatype> <name name="type"/> </datatype> - <datatype> - <name name="version"/> - </datatype> </datatypes> <funcs> @@ -385,8 +376,7 @@ <p><c>{bchunk_format, binary()}</c> - An opaque binary describing the format of the objects returned by <c>bchunk/2</c>. The binary can be used as argument to - <c>is_compatible_chunk_format/2</c>. Only available for - version 9 tables.</p> + <c>is_compatible_chunk_format/2</c>.</p> </item> <item> <p><c>{hash, Hash}</c> - Describes which BIF is @@ -394,10 +384,6 @@ Dets table. Possible values of <c>Hash</c>:</p> <list> <item> - <p><c>hash</c> - Implies that the <c>erlang:hash/2</c> BIF - is used.</p> - </item> - <item> <p><c>phash</c> - Implies that the <c>erlang:phash/2</c> BIF is used.</p> </item> @@ -413,8 +399,7 @@ </item> <item> <p><c>{no_keys, integer >= 0()}</c> - The number of different - keys stored in the table. Only available for version 9 - tables.</p> + keys stored in the table.</p> </item> <item> <p><c>{no_objects, integer >= 0()}</c> - The number of objects @@ -424,8 +409,7 @@ <p><c>{no_slots, {Min, Used, Max}}</c> - The number of slots of the table. <c>Min</c> is the minimum number of slots, <c>Used</c> is the number of currently used slots, - and <c>Max</c> is the maximum number of slots. Only - available for version 9 tables.</p> + and <c>Max</c> is the maximum number of slots.</p> </item> <item> <p><c>{owner, pid()}</c> - The pid of the process that @@ -466,10 +450,6 @@ time warp safe</seealso>. Time warp safe code must use <c>safe_fixed_monotonic_time</c> instead.</p> </item> - <item> - <p><c>{version, integer()}</c> - The version of the format of - the table.</p> - </item> </list> </desc> </func> @@ -662,8 +642,8 @@ ok objects at a time, until at least one object matches or the end of the table is reached. The default, indicated by giving <c><anno>N</anno></c> the value <c>default</c>, is to let - the number of objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all objects with the + the number of objects vary depending on the sizes of the objects. + All objects with the same key are always matched at the same time, which implies that more than <anno>N</anno> objects can sometimes be matched.</p> <p>The table is always to be protected using @@ -743,9 +723,9 @@ ok end of the table is reached. The default, indicated by giving <c><anno>N</anno></c> the value <c>default</c>, is to let the number - of objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all matching objects - with the same key are always returned in the same reply, which implies + of objects vary depending on the sizes of the objects. All + matching objects with the same key are always returned + in the same reply, which implies that more than <anno>N</anno> objects can sometimes be returned.</p> <p>The table is always to be protected using <seealso marker="#safe_fixtable/2"><c>safe_fixtable/2</c></seealso> @@ -842,8 +822,7 @@ ok maximal value. Notice that a higher value can increase the table fragmentation, and a smaller value can decrease the fragmentation, at - the expense of execution time. Only available for version - 9 tables.</p> + the expense of execution time.</p> </item> <item> <p><c>{min_no_slots, </c><seealso marker="#type-no_slots"> @@ -880,12 +859,7 @@ ok FileName}}</c> is returned if the table must be repaired.</p> <p>Value <c>force</c> means that a reparation is made even if the table is properly closed. - This is how to convert tables created by older versions of - STDLIB. An example is tables hashed with the deprecated - <c>erlang:hash/2</c> BIF. Tables created with Dets from - STDLIB version 1.8.2 or later use function - <c>erlang:phash/2</c> or function <c>erlang:phash2/1</c>, - which is preferred.</p> + This is a seldom needed option.</p> <p>Option <c>repair</c> is ignored if the table is already open.</p> </item> <item> @@ -893,15 +867,6 @@ ok <c>type()</c></seealso><c>}</c> - The table type. Defaults to <c>set</c>.</p> </item> - <item> - <p><c>{version, </c><seealso marker="#type-version"> - <c>version()</c></seealso><c>}</c> - The version of the format - used for the table. Defaults to <c>9</c>. Tables on the format - used before Erlang/OTP R8 can be created by specifying value - <c>8</c>. A version 8 table can be converted to a version 9 - table by specifying options <c>{version,9}</c> - and <c>{repair,force}</c>.</p> - </item> </list> </desc> </func> @@ -1041,8 +1006,8 @@ ok a time, until at least one object matches or the end of the table is reached. The default, indicated by giving <c><anno>N</anno></c> the value <c>default</c>, is to let the number - of objects vary depending on the sizes of the objects. If - <c><anno>Name</anno></c> is a version 9 table, all objects with the + of objects vary depending on the sizes of the objects. All + objects with the same key are always handled at the same time, which implies that the match specification can be applied to more than <anno>N</anno> objects.</p> diff --git a/lib/stdlib/doc/src/dict.xml b/lib/stdlib/doc/src/dict.xml index c926ff1b5b..c229a18721 100644 --- a/lib/stdlib/doc/src/dict.xml +++ b/lib/stdlib/doc/src/dict.xml @@ -106,6 +106,16 @@ </func> <func> + <name name="take" arity="2"/> + <fsummary>Return value and new dictionary without element with this value.</fsummary> + <desc> + <p>This function returns value from dictionary and a + new dictionary without this value. + Returns <c>error</c> if the key is not present in the dictionary.</p> + </desc> + </func> + + <func> <name name="filter" arity="2"/> <fsummary>Select elements that satisfy a predicate.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/erl_expand_records.xml b/lib/stdlib/doc/src/erl_expand_records.xml index 7e4aa2db37..b6aa75ed03 100644 --- a/lib/stdlib/doc/src/erl_expand_records.xml +++ b/lib/stdlib/doc/src/erl_expand_records.xml @@ -45,8 +45,10 @@ <name name="module" arity="2"/> <fsummary>Expand all records in a module.</fsummary> <desc> - <p>Expands all records in a module. The returned module has no - references to records, attributes, or code.</p> + <p>Expands all records in a module to use explicit tuple + operations and adds explicit module names to calls to BIFs and + imported functions. The returned module has no references to + records, attributes, or code.</p> </desc> </func> </funcs> diff --git a/lib/stdlib/doc/src/erl_internal.xml b/lib/stdlib/doc/src/erl_internal.xml index cf49df0972..17cd0fb240 100644 --- a/lib/stdlib/doc/src/erl_internal.xml +++ b/lib/stdlib/doc/src/erl_internal.xml @@ -44,6 +44,16 @@ <funcs> <func> + <name name="add_predefined_functions" arity="1"/> + <fsummary>Add code for pre-defined functions.</fsummary> + <desc> + <p>Adds to <c><anno>Forms</anno></c> the code for the standard + pre-defined functions (such as <c>module_info/0</c>) that are + to be included in every module.</p> + </desc> + </func> + + <func> <name name="arith_op" arity="2"/> <fsummary>Test for an arithmetic operator.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/erl_tar.xml b/lib/stdlib/doc/src/erl_tar.xml index 24e7b64b9e..fab7c832d5 100644 --- a/lib/stdlib/doc/src/erl_tar.xml +++ b/lib/stdlib/doc/src/erl_tar.xml @@ -37,12 +37,13 @@ </modulesummary> <description> <p>This module archives and extract files to and from - a tar file. This module supports the <c>ustar</c> format - (IEEE Std 1003.1 and ISO/IEC 9945-1). All modern <c>tar</c> - programs (including GNU tar) can read this format. To ensure that - that GNU tar produces a tar file that <c>erl_tar</c> can read, - specify option <c>--format=ustar</c> to GNU tar.</p> - + a tar file. This module supports reading most common tar formats, + namely v7, STAR, USTAR, and PAX, as well as some of GNU tar's extensions + to the USTAR format (sparse files most notably). It produces tar archives + in USTAR format, unless the files being archived require PAX format due to + restrictions in USTAR (such as unicode metadata, filename length, and more). + As such, <c>erl_tar</c> supports tar archives produced by most all modern + tar utilities, and produces tarballs which should be similarly portable.</p> <p>By convention, the name of a tar file is to end in "<c>.tar</c>". To abide to the convention, add "<c>.tar</c>" to the name.</p> @@ -83,6 +84,8 @@ <p>If <seealso marker="kernel:file#native_name_encoding/0"> <c>file:native_name_encoding/0</c></seealso> returns <c>latin1</c>, no translation of path names is done.</p> + + <p>Unicode metadata stored in PAX headers is preserved</p> </section> <section> @@ -104,21 +107,20 @@ <title>Limitations</title> <list type="bulleted"> <item> - <p>For maximum compatibility, it is safe to archive files with names - up to 100 characters in length. Such tar files can generally be - extracted by any <c>tar</c> program.</p> - </item> - <item> - <p>For filenames exceeding 100 characters in length, the resulting tar - file can only be correctly extracted by a POSIX-compatible <c>tar</c> - program (such as Solaris <c>tar</c> or a modern GNU <c>tar</c>).</p> - </item> - <item> - <p>Files with longer names than 256 bytes cannot be stored.</p> + <p>If you must remain compatible with the USTAR tar format, you must ensure file paths being + stored are less than 255 bytes in total, with a maximum filename component + length of 100 bytes. USTAR uses a header field (prefix) in addition to the name field, and + splits file paths longer than 100 bytes into two parts. This split is done on a directory boundary, + and is done in such a way to make the best use of the space available in those two fields, but in practice + this will often mean that you have less than 255 bytes for a path. <c>erl_tar</c> will + automatically upgrade the format to PAX to handle longer filenames, so this is only an issue if you + need to extract the archive with an older implementation of <c>erl_tar</c> or <c>tar</c> which does + not support PAX. In this case, the PAX headers will be extracted as regular files, and you will need to + apply them manually.</p> </item> <item> - <p>The file name a symbolic link points is always limited - to 100 characters.</p> + <p>Like the above, if you must remain USTAR compatible, you must also ensure than paths for + symbolic/hard links are no more than 100 bytes, otherwise PAX headers will be used.</p> </item> </list> </section> @@ -129,7 +131,9 @@ <fsummary>Add a file to an open tar file.</fsummary> <type> <v>TarDescriptor = term()</v> - <v>Filename = filename()</v> + <v>FilenameOrBin = filename()|binary()</v> + <v>NameInArchive = filename()</v> + <v>Filename = filename()|{NameInArchive,FilenameOrBin}</v> <v>Options = [Option]</v> <v>Option = dereference|verbose|{chunks,ChunkSize}</v> <v>ChunkSize = positive_integer()</v> @@ -139,6 +143,9 @@ <desc> <p>Adds a file to a tar file that has been opened for writing by <seealso marker="#open/2"><c>open/1</c></seealso>.</p> + <p><c>NameInArchive</c> is the name under which the file becomes + stored in the tar file. The file gets this name when it is + extracted from the tar file.</p> <p>Options:</p> <taglist> <tag><c>dereference</c></tag> @@ -183,9 +190,6 @@ <seealso marker="#open/2"><c>open/2</c></seealso>. This function accepts the same options as <seealso marker="#add/3"><c>add/3</c></seealso>.</p> - <p><c>NameInArchive</c> is the name under which the file becomes - stored in the tar file. The file gets this name when it is - extracted from the tar file.</p> </desc> </func> @@ -206,8 +210,8 @@ <fsummary>Create a tar archive.</fsummary> <type> <v>Name = filename()</v> - <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, - Filename}]</v> + <v>FileList = [Filename|{NameInArchive, FilenameOrBin}]</v> + <v>FilenameOrBin = filename()|binary()</v> <v>Filename = filename()</v> <v>NameInArchive = filename()</v> <v>RetValue = ok|{error,{Name,Reason}}</v> @@ -225,8 +229,8 @@ <fsummary>Create a tar archive with options.</fsummary> <type> <v>Name = filename()</v> - <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive, - Filename}]</v> + <v>FileList = [Filename|{NameInArchive, FilenameOrBin}]</v> + <v>FilenameOrBin = filename()|binary()</v> <v>Filename = filename()</v> <v>NameInArchive = filename()</v> <v>OptionList = [Option]</v> @@ -275,7 +279,8 @@ <name>extract(Name) -> RetValue</name> <fsummary>Extract all files from a tar file.</fsummary> <type> - <v>Name = filename()</v> + <v>Name = filename() | {binary,binary()} | {file,Fd}</v> + <v>Fd = file_descriptor()</v> <v>RetValue = ok|{error,{Name,Reason}}</v> <v>Reason = term()</v> </type> @@ -287,6 +292,10 @@ <c>Fd</c> is assumed to be a file descriptor returned from function <c>file:open/2</c>.</p> <p>Otherwise, <c>Name</c> is to be a filename.</p> + <note><p>Leading slashes in tar member names will be removed before + writing the file. That is, absolute paths will be turned into + relative paths. There will be an info message written to the error + logger when paths are changed in this way.</p></note> </desc> </func> @@ -294,8 +303,7 @@ <name>extract(Name, OptionList)</name> <fsummary>Extract files from a tar file.</fsummary> <type> - <v>Name = filename() | {binary,Binary} | {file,Fd}</v> - <v>Binary = binary()</v> + <v>Name = filename() | {binary,binary()} | {file,Fd}</v> <v>Fd = file_descriptor()</v> <v>OptionList = [Option]</v> <v>Option = {cwd,Cwd}|{files,FileList}|keep_old_files|verbose|memory</v> @@ -521,7 +529,7 @@ erl_tar:close(TarDesc)</code> <name>table(Name) -> RetValue</name> <fsummary>Retrieve the name of all files in a tar file.</fsummary> <type> - <v>Name = filename()</v> + <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v> <v>RetValue = {ok,[string()]}|{error,{Name,Reason}}</v> <v>Reason = term()</v> </type> @@ -535,7 +543,7 @@ erl_tar:close(TarDesc)</code> <fsummary>Retrieve name and information of all files in a tar file. </fsummary> <type> - <v>Name = filename()</v> + <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v> </type> <desc> <p>Retrieves the names of all files in the tar file <c>Name</c>.</p> @@ -546,7 +554,7 @@ erl_tar:close(TarDesc)</code> <name>t(Name)</name> <fsummary>Print the name of each file in a tar file.</fsummary> <type> - <v>Name = filename()</v> + <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v> </type> <desc> <p>Prints the names of all files in the tar file <c>Name</c> to the @@ -559,7 +567,7 @@ erl_tar:close(TarDesc)</code> <fsummary>Print name and information for each file in a tar file. </fsummary> <type> - <v>Name = filename()</v> + <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v> </type> <desc> <p>Prints names and information about all files in the tar file diff --git a/lib/stdlib/doc/src/ets.xml b/lib/stdlib/doc/src/ets.xml index 5f5d2b7f36..d1ec176f81 100644 --- a/lib/stdlib/doc/src/ets.xml +++ b/lib/stdlib/doc/src/ets.xml @@ -541,10 +541,6 @@ Error: fun containing local Erlang function calls <c><anno>Tab</anno></c> is not of the correct type, or if <c><anno>Item</anno></c> is not one of the allowed values, a <c>badarg</c> exception is raised.</p> - <warning> - <p>In Erlang/OTP R11B and earlier, this function would not fail but - return <c>undefined</c> for invalid values for <c>Item</c>.</p> - </warning> <p>In addition to the <c>{<anno>Item</anno>,<anno>Value</anno>}</c> pairs defined for <seealso marker="#info/1"><c>info/1</c></seealso>, the following items are allowed:</p> @@ -1495,6 +1491,25 @@ is_integer(X), is_integer(Y), X + Y < 4711]]></code> </func> <func> + <name name="select_replace" arity="2"/> + <fsummary>Match and replace objects atomically in an ETS table</fsummary> + <desc> + <p>Matches the objects in the table <c><anno>Tab</anno></c> using a + <seealso marker="#match_spec">match specification</seealso>. If + an object is matched, the existing object is replaced with + the match specification result, which <em>must</em> retain + the original key or the operation will fail with <c>badarg</c>.</p> + <p>For the moment, due to performance and semantic constraints, + tables of type <c>bag</c> are not yet supported.</p> + <p>The function returns the total number of replaced objects.</p> + <note> + <p>The match/replacement operation atomicity scope is limited + to each individual object.</p> + </note> + </desc> + </func> + + <func> <name name="select_reverse" arity="1"/> <fsummary>Continue matching objects in an ETS table.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/filelib.xml b/lib/stdlib/doc/src/filelib.xml index 7c6380ce28..ad73fc254a 100644 --- a/lib/stdlib/doc/src/filelib.xml +++ b/lib/stdlib/doc/src/filelib.xml @@ -60,6 +60,12 @@ <datatype> <name name="filename_all"/> </datatype> + <datatype> + <name name="find_file_rule"/> + </datatype> + <datatype> + <name name="find_source_rule"/> + </datatype> </datatypes> <funcs> @@ -226,7 +232,51 @@ filelib:wildcard("lib/**/*.{erl,hrl}")</code> directory.</p> </desc> </func> + + <func> + <name name="find_file" arity="2"/> + <name name="find_file" arity="3"/> + <fsummary>Find a file relative to a given directory.</fsummary> + <desc> + <p>Looks for a file of the given name by applying suffix rules to + the given directory path. For example, a rule <c>{"ebin", "src"}</c> + means that if the directory path ends with <c>"ebin"</c>, the + corresponding path ending in <c>"src"</c> should be searched.</p> + <p>If <c><anno>Rules</anno></c> is left out or is an empty list, the + default system rules are used. See also the Kernel application + parameter <seealso + marker="kernel:kernel_app#source_search_rules"><c>source_search_rules</c></seealso>.</p> + </desc> + </func> + <func> + <name name="find_source" arity="1"/> + <fsummary>Find the source file for a given object file.</fsummary> + <desc> + <p>Equivalent to <c>find_source(Base, Dir)</c>, where <c>Dir</c> is + <c>filename:dirname(<anno>FilePath</anno>)</c> and <c>Base</c> is + <c>filename:basename(<anno>FilePath</anno>)</c>.</p> + </desc> + </func> + <func> + <name name="find_source" arity="2"/> + <name name="find_source" arity="3"/> + <fsummary>Find a source file relative to a given directory.</fsummary> + <desc> + <p>Applies file extension specific rules to find the source file for + a given object file relative to the object directory. For example, + for a file with the extension <c>.beam</c>, the default rule is to + look for a file with a corresponding extension <c>.erl</c> by + replacing the suffix <c>"ebin"</c> of the object directory path with + <c>"src"</c>. + The file search is done through <seealso + marker="#find_file/3"><c>find_file/3</c></seealso>. The directory of + the object file is always tried before any other directory specified + by the rules.</p> + <p>If <c><anno>Rules</anno></c> is left out or is an empty list, the + default system rules are used. See also the Kernel application + parameter <seealso + marker="kernel:kernel_app#source_search_rules"><c>source_search_rules</c></seealso>.</p> + </desc> + </func> </funcs> </erlref> - - diff --git a/lib/stdlib/doc/src/filename.xml b/lib/stdlib/doc/src/filename.xml index f20500709c..0ccca37a9d 100644 --- a/lib/stdlib/doc/src/filename.xml +++ b/lib/stdlib/doc/src/filename.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1997</year><year>2017</year> + <year>1997</year><year>2016</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -356,10 +356,12 @@ true <p>Finds the source filename and compiler options for a module. The result can be fed to <seealso marker="compiler:compile#file/2"> <c>compile:file/2</c></seealso> to compile the file again.</p> - <warning><p>It is not recommended to use this function. If possible, - use the <seealso marker="beam_lib"><c>beam_lib(3)</c></seealso> - module to extract the abstract code format from the Beam file and - compile that instead.</p></warning> + <warning> + <p>This function is deprecated. Use <seealso marker="filelib#find_source/1"> + <c>filelib:find_source/1</c></seealso> instead for finding source files.</p> + <p>If possible, use the <seealso marker="beam_lib"><c>beam_lib(3)</c></seealso> + module to extract the compiler options and the abstract code + format from the Beam file and compile that instead.</p></warning> <p>Argument <c><anno>Beam</anno></c>, which can be a string or an atom, specifies either the module name or the path to the source code, with or without extension <c>".erl"</c>. In either diff --git a/lib/stdlib/doc/src/gb_trees.xml b/lib/stdlib/doc/src/gb_trees.xml index 790d4b8bf1..5cfff021c1 100644 --- a/lib/stdlib/doc/src/gb_trees.xml +++ b/lib/stdlib/doc/src/gb_trees.xml @@ -109,6 +109,28 @@ </func> <func> + <name name="take" arity="2"/> + <fsummary>Returns a value and new tree without node with key <c>Key</c>.</fsummary> + <desc> + <p>Returns a value <c><anno>Value</anno></c> from node with key <c><anno>Key</anno></c> + and new <c><anno>Tree2</anno></c> without the node with this value. + Assumes that the node with key is present in the tree, + crashes otherwise.</p> + </desc> + </func> + + <func> + <name name="take_any" arity="2"/> + <fsummary>Returns a value and new tree without node with key <c>Key</c>.</fsummary> + <desc> + <p>Returns a value <c><anno>Value</anno></c> from node with key <c><anno>Key</anno></c> + and new <c><anno>Tree2</anno></c> without the node with this value. + Returns <c>error</c> if the node with the key is not present in + the tree.</p> + </desc> + </func> + + <func> <name name="empty" arity="0"/> <fsummary>Return an empty tree.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/gen_event.xml b/lib/stdlib/doc/src/gen_event.xml index c24542002a..56cb7974a2 100644 --- a/lib/stdlib/doc/src/gen_event.xml +++ b/lib/stdlib/doc/src/gen_event.xml @@ -350,13 +350,18 @@ gen_event:stop -----> Module:terminate/2 <func> <name>start() -> Result</name> - <name>start(EventMgrName) -> Result</name> + <name>start(EventMgrName | Options) -> Result</name> + <name>start(EventMgrName, Options) -> Result</name> <fsummary>Create a stand-alone event manager process.</fsummary> <type> - <v>EventMgrName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> + <v>EventMgrName = {local,Name} | {global,GlobalName} | {via,Module,ViaName}</v> <v> Name = atom()</v> <v> GlobalName = ViaName = term()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v> SOpts = [term()]</v> <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> <v> Pid = pid()</v> </type> @@ -371,14 +376,19 @@ gen_event:stop -----> Module:terminate/2 <func> <name>start_link() -> Result</name> - <name>start_link(EventMgrName) -> Result</name> + <name>start_link(EventMgrName | Options) -> Result</name> + <name>start_link(EventMgrName, Options) -> Result</name> <fsummary>Create a generic event manager process in a supervision tree. </fsummary> <type> - <v>EventMgrName = {local,Name} | {global,GlobalName} - | {via,Module,ViaName}</v> + <v>EventMgrName = {local,Name} | {global,GlobalName} | {via,Module,ViaName}</v> <v> Name = atom()</v> <v> GlobalName = ViaName = term()</v> + <v>Options = [Option]</v> + <v> Option = {debug,Dbgs} | {timeout,Time} | {spawn_opt,SOpts}</v> + <v> Dbgs = [Dbg]</v> + <v> Dbg = trace | log | statistics | {log_to_file,FileName} | {install,{Func,FuncState}}</v> + <v> SOpts = [term()]</v> <v>Result = {ok,Pid} | {error,{already_started,Pid}}</v> <v> Pid = pid()</v> </type> @@ -569,6 +579,13 @@ gen_event:stop -----> Module:terminate/2 <v>Extra = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not export it. + If a release upgrade/downgrade with <c>Change={advanced,Extra}</c> + specified in the <c>.appup</c> file is made when <c>code_change/3</c> + isn't implemented the event handler will crash with an <c>undef</c> error + reason.</p> + </note> <p>This function is called for an installed event handler that is to update its internal state during a release upgrade/downgrade, that is, when the instruction @@ -749,6 +766,12 @@ gen_event:stop -----> Module:terminate/2 <v> Id = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_event</c> module provides a default + implementation of this function that logs about the unexpected + <c>Info</c> message, drops it and returns <c>{noreply, State}</c>.</p> + </note> <p>This function is called for each installed event handler when an event manager receives any other message than an event or a synchronous request (or a system message).</p> @@ -805,6 +828,11 @@ gen_event:stop -----> Module:terminate/2 <v> Args = Reason = Term = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_event</c> module provides a default + implementation without cleanup.</p> + </note> <p>Whenever an event handler is deleted from an event manager, this function is called. It is to be the opposite of <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> diff --git a/lib/stdlib/doc/src/gen_fsm.xml b/lib/stdlib/doc/src/gen_fsm.xml index de06987d38..691a039e34 100644 --- a/lib/stdlib/doc/src/gen_fsm.xml +++ b/lib/stdlib/doc/src/gen_fsm.xml @@ -534,11 +534,6 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 the function call fails.</p> <p>Return value <c>Reply</c> is defined in the return value of <c>Module:StateName/3</c>.</p> - <note> - <p>The ancient behavior of sometimes consuming the server - exit message if the server died during the call while - linked to the client was removed in Erlang 5.6/OTP R12B.</p> - </note> </desc> </func> </funcs> @@ -567,6 +562,13 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>Extra = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not export it. + If a release upgrade/downgrade with <c>Change={advanced,Extra}</c> + specified in the <c>appup</c> file is made when <c>code_change/4</c> + isn't implemented the process will crash with an <c>undef</c> exit + reason.</p> + </note> <p>This function is called by a <c>gen_fsm</c> process when it is to update its internal state data during a release upgrade/downgrade, that is, when instruction <c>{update,Module,Change,...}</c>, @@ -691,6 +693,13 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v> Reason = normal | term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_fsm</c> module provides a default + implementation of this function that logs about the unexpected + <c>Info</c> message, drops it and returns + <c>{next_state, StateName, StateData}</c>.</p> + </note> <p>This function is called by a <c>gen_fsm</c> process when it receives any other message than a synchronous or asynchronous event (or a system message).</p> @@ -904,6 +913,11 @@ gen_fsm:sync_send_all_state_event -----> Module:handle_sync_event/4 <v>StateData = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_fsm</c> module provides a default + implementation without cleanup.</p> + </note> <p>This function is called by a <c>gen_fsm</c> process when it is about to terminate. It is to be the opposite of <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> diff --git a/lib/stdlib/doc/src/gen_server.xml b/lib/stdlib/doc/src/gen_server.xml index 4a7dd60858..4540449792 100644 --- a/lib/stdlib/doc/src/gen_server.xml +++ b/lib/stdlib/doc/src/gen_server.xml @@ -162,11 +162,6 @@ gen_server:abcast -----> Module:handle_cast/2 of <c>Module:handle_call/3</c>.</p> <p>The call can fail for many reasons, including time-out and the called <c>gen_server</c> process dying before or during the call.</p> - <note> - <p>The ancient behavior of sometimes consuming the server - exit message if the server died during the call while - linked to the client was removed in Erlang 5.6/OTP R12B.</p> - </note> </desc> </func> @@ -509,6 +504,13 @@ gen_server:abcast -----> Module:handle_cast/2 <v>Reason = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not export it. + If a release upgrade/downgrade with <c>Change={advanced,Extra}</c> + specified in the <c>appup</c> file is made when <c>code_change/3</c> + isn't implemented the process will crash with an <c>undef</c> exit + reason.</p> + </note> <p>This function is called by a <c>gen_server</c> process when it is to update its internal state during a release upgrade/downgrade, that is, when the instruction <c>{update,Module,Change,...}</c>, @@ -695,6 +697,12 @@ gen_server:abcast -----> Module:handle_cast/2 <v> Reason = normal | term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_server</c> module provides a default + implementation of this function that logs about the unexpected + <c>Info</c> message, drops it and returns <c>{noreply, State}</c>.</p> + </note> <p>This function is called by a <c>gen_server</c> process when a time-out occurs or when it receives any other message than a synchronous or asynchronous request (or a system message).</p> @@ -755,6 +763,11 @@ gen_server:abcast -----> Module:handle_cast/2 <v>State = term()</v> </type> <desc> + <note> + <p>This callback is optional, so callback modules need not + export it. The <c>gen_server</c> module provides a default + implementation without cleanup.</p> + </note> <p>This function is called by a <c>gen_server</c> process when it is about to terminate. It is to be the opposite of <seealso marker="#Module:init/1"><c>Module:init/1</c></seealso> diff --git a/lib/stdlib/doc/src/io_lib.xml b/lib/stdlib/doc/src/io_lib.xml index 931e50f6f2..5ae400da62 100644 --- a/lib/stdlib/doc/src/io_lib.xml +++ b/lib/stdlib/doc/src/io_lib.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1996</year><year>2016</year> + <year>1996</year><year>2017</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -147,7 +147,7 @@ format string (that is, <c>~ts</c> or <c>~tc</c>), the resulting list can contain characters beyond the ISO Latin-1 character range (that is, numbers > 255). If so, the - result is not an ordinary Erlang <c>string()</c>, but can well be + result is still an ordinary Erlang <c>string()</c>, and can well be used in any context where Unicode data is allowed.</p> </desc> </func> @@ -384,6 +384,16 @@ </func> <func> + <name name="write_atom_as_latin1" arity="1"/> + <fsummary>Write an atom.</fsummary> + <desc> + <p>Returns the list of characters needed to print atom + <c><anno>Atom</anno></c>. Non-Latin-1 characters + are escaped.</p> + </desc> + </func> + + <func> <name name="write_char" arity="1"/> <fsummary>Write a character.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/math.xml b/lib/stdlib/doc/src/math.xml index 1358ce5cbf..b4f096217a 100644 --- a/lib/stdlib/doc/src/math.xml +++ b/lib/stdlib/doc/src/math.xml @@ -57,9 +57,12 @@ <name name="atan" arity="1"/> <name name="atan2" arity="2"/> <name name="atanh" arity="1"/> + <name name="ceil" arity="1"/> <name name="cos" arity="1"/> <name name="cosh" arity="1"/> <name name="exp" arity="1"/> + <name name="floor" arity="1"/> + <name name="fmod" arity="2"/> <name name="log" arity="1"/> <name name="log10" arity="1"/> <name name="log2" arity="1"/> diff --git a/lib/stdlib/doc/src/notes.xml b/lib/stdlib/doc/src/notes.xml index bb35224182..428d8a6e70 100644 --- a/lib/stdlib/doc/src/notes.xml +++ b/lib/stdlib/doc/src/notes.xml @@ -3267,7 +3267,7 @@ <p> Two bugs in io:format for ~F.~Ps has been corrected. When length(S) >= abs(F) > P, the precision P was incorrectly - ignored. When F == P > lenght(S) the result was + ignored. When F == P > length(S) the result was incorrectly left adjusted. Bug found by Ali Yakout who also provided a fix.</p> <p> diff --git a/lib/stdlib/doc/src/orddict.xml b/lib/stdlib/doc/src/orddict.xml index d048983c61..26bbf499c6 100644 --- a/lib/stdlib/doc/src/orddict.xml +++ b/lib/stdlib/doc/src/orddict.xml @@ -38,7 +38,7 @@ <p>This module provides a <c>Key</c>-<c>Value</c> dictionary. An <c>orddict</c> is a representation of a dictionary, where a list of pairs is used to store the keys and values. The list is - ordered after the keys.</p> + ordered after the keys in the <em>Erlang term order</em>.</p> <p>This module provides the same interface as the <seealso marker="dict"><c>dict(3)</c></seealso> module @@ -113,6 +113,15 @@ </func> <func> + <name name="take" arity="2"/> + <fsummary>Return value and new dictionary without element with this value.</fsummary> + <desc> + <p>This function returns value from dictionary and new dictionary without this value. + Returns <c>error</c> if the key is not present in the dictionary.</p> + </desc> + </func> + + <func> <name name="filter" arity="2"/> <fsummary>Select elements that satisfy a predicate.</fsummary> <desc> diff --git a/lib/stdlib/doc/src/ordsets.xml b/lib/stdlib/doc/src/ordsets.xml index 148281fcf7..7b590932e4 100644 --- a/lib/stdlib/doc/src/ordsets.xml +++ b/lib/stdlib/doc/src/ordsets.xml @@ -39,7 +39,8 @@ <p>Sets are collections of elements with no duplicate elements. An <c>ordset</c> is a representation of a set, where an ordered list is used to store the elements of the set. An ordered list - is more efficient than an unordered list.</p> + is more efficient than an unordered list. Elements are ordered + according to the <em>Erlang term order</em>.</p> <p>This module provides the same interface as the <seealso marker="sets"><c>sets(3)</c></seealso> module diff --git a/lib/stdlib/doc/src/proc_lib.xml b/lib/stdlib/doc/src/proc_lib.xml index da03c39a26..e64b2ce18a 100644 --- a/lib/stdlib/doc/src/proc_lib.xml +++ b/lib/stdlib/doc/src/proc_lib.xml @@ -66,6 +66,12 @@ <seealso marker="sasl:error_logging">SASL Error Logging</seealso> in the SASL User's Guide.</p> + <p>Unlike in "plain Erlang", <c>proc_lib</c> processes will not generate + <em>error reports</em>, which are written to the terminal by the + emulator and do not require SASL to be started. All exceptions are + converted to <em>exits</em> which are ignored by the default + <c>error_logger</c> handler.</p> + <p>The crash report contains the previously stored information, such as ancestors and initial function, the termination reason, and information about other processes that terminate as a result diff --git a/lib/stdlib/doc/src/proplists.xml b/lib/stdlib/doc/src/proplists.xml index fe6b8cc3bf..990d47b313 100644 --- a/lib/stdlib/doc/src/proplists.xml +++ b/lib/stdlib/doc/src/proplists.xml @@ -344,7 +344,7 @@ split([{c, 2}, {e, 1}, a, {c, 3, 4}, d, {b, 5}, b], [a, b, c])</code> with <c>{K2, true}</c>, thus changing the name of the option and simultaneously negating the value specified by <seealso marker="#get_bool/2"> - <c>get_bool(Key, <anno>ListIn</anno></c></seealso>. + <c>get_bool(Key, <anno>ListIn</anno>)</c></seealso>. If the same <c>K1</c> occurs more than once in <c><anno>Negations</anno></c>, only the first occurrence is used.</p> <p>For example, <c>substitute_negations([{no_foo, foo}], L)</c> diff --git a/lib/stdlib/doc/src/rand.xml b/lib/stdlib/doc/src/rand.xml index eb7870e367..e06d7e467d 100644 --- a/lib/stdlib/doc/src/rand.xml +++ b/lib/stdlib/doc/src/rand.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>2015</year><year>2016</year> + <year>2015</year><year>2017</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -41,27 +41,82 @@ Sebastiano Vigna</url>. The normal distribution algorithm uses the <url href="http://www.jstatsoft.org/v05/i08">Ziggurat Method by Marsaglia and Tsang</url>.</p> + <p>For some algorithms, jump functions are provided for generating + non-overlapping sequences for parallel computations. + The jump functions perform calculations + equivalent to perform a large number of repeated calls + for calculating new states. </p> <p>The following algorithms are provided:</p> <taglist> - <tag><c>exsplus</c></tag> + <tag><c>exrop</c></tag> <item> - <p>Xorshift116+, 58 bits precision and period of 2^116-1</p> + <p>Xoroshiro116+, 58 bits precision and period of 2^116-1</p> + <p>Jump function: equivalent to 2^64 calls</p> </item> - <tag><c>exs64</c></tag> + <tag><c>exs1024s</c></tag> <item> - <p>Xorshift64*, 64 bits precision and a period of 2^64-1</p> + <p>Xorshift1024*, 64 bits precision and a period of 2^1024-1</p> + <p>Jump function: equivalent to 2^512 calls</p> </item> - <tag><c>exs1024</c></tag> + <tag><c>exsp</c></tag> <item> - <p>Xorshift1024*, 64 bits precision and a period of 2^1024-1</p> + <p>Xorshift116+, 58 bits precision and period of 2^116-1</p> + <p>Jump function: equivalent to 2^64 calls</p> + <p> + This is a corrected version of the previous default algorithm, + that now has been superseeded by Xoroshiro116+ (<c>exrop</c>). + Since there is no native 58 bit rotate instruction this + algorithm executes a little (say < 15%) faster than <c>exrop</c>. + See the + <url href="http://xorshift.di.unimi.it">algorithms' homepage</url>. + </p> </item> </taglist> - <p>The default algorithm is <c>exsplus</c>. If a specific algorithm is + <p> + The default algorithm is <c>exrop</c> (Xoroshiro116+). + If a specific algorithm is required, ensure to always use <seealso marker="#seed-1"> - <c>seed/1</c></seealso> to initialize the state.</p> + <c>seed/1</c></seealso> to initialize the state. + </p> + + <p> + Undocumented (old) algorithms are deprecated but still implemented + so old code relying on them will produce + the same pseudo random sequences as before. + </p> + + <note> + <p> + There were a number of problems in the implementation + of the now undocumented algorithms, which is why + they are deprecated. The new algorithms are a bit slower + but do not have these problems: + </p> + <p> + Uniform integer ranges had a skew in the probability distribution + that was not noticable for small ranges but for large ranges + less than the generator's precision the probability to produce + a low number could be twice the probability for a high. + </p> + <p> + Uniform integer ranges larger than or equal to the generator's + precision used a floating point fallback that only calculated + with 52 bits which is smaller than the requested range + and therefore were not all numbers in the requested range + even possible to produce. + </p> + <p> + Uniform floats had a non-uniform density so small values + i.e less than 0.5 had got smaller intervals decreasing + as the generated value approached 0.0 although still uniformly + distributed for sufficiently large subranges. The new algorithms + produces uniformly distributed floats on the form N * 2.0^(-53) + hence equally spaced. + </p> + </note> <p>Every time a random number is requested, a state is used to calculate it and a new state is produced. The state can either be @@ -91,19 +146,19 @@ R1 = rand:uniform(),</pre> <p>Use a specified algorithm:</p> <pre> -_ = rand:seed(exs1024), +_ = rand:seed(exs1024s), R2 = rand:uniform(),</pre> <p>Use a specified algorithm with a constant seed:</p> <pre> -_ = rand:seed(exs1024, {123, 123534, 345345}), +_ = rand:seed(exs1024s, {123, 123534, 345345}), R3 = rand:uniform(),</pre> <p>Use the functional API with a non-constant seed:</p> <pre> -S0 = rand:seed_s(exsplus), +S0 = rand:seed_s(exrop), {R4, S1} = rand:uniform_s(S0),</pre> <p>Create a standard normal deviate:</p> @@ -111,28 +166,93 @@ S0 = rand:seed_s(exsplus), <pre> {SND0, S2} = rand:normal_s(S1),</pre> + <p>Create a normal deviate with mean -3 and variance 0.5:</p> + + <pre> +{ND0, S3} = rand:normal_s(-3, 0.5, S2),</pre> + <note> - <p>This random number generator is not cryptographically - strong. If a strong cryptographic random number generator is - needed, use one of functions in the - <seealso marker="crypto:crypto"><c>crypto</c></seealso> - module, for example, <seealso marker="crypto:crypto"> - <c>crypto:strong_rand_bytes/1</c></seealso>.</p> + <p>The builtin random number generator algorithms are not + cryptographically strong. If a cryptographically strong + random number generator is needed, use something like + <seealso marker="crypto:crypto#rand_seed-0"><c>crypto:rand_seed/0</c></seealso>. + </p> </note> + <p> + For all these generators the lowest bit(s) has got + a slightly less random behaviour than all other bits. + 1 bit for <c>exrop</c> (and <c>exsp</c>), + and 3 bits for <c>exs1024s</c>. + See for example the explanation in the + <url href="http://xoroshiro.di.unimi.it/xoroshiro128plus.c"> + Xoroshiro128+ + </url> + generator source code: + </p> + <pre> +Beside passing BigCrush, this generator passes the PractRand test suite +up to (and included) 16TB, with the exception of binary rank tests, +which fail due to the lowest bit being an LFSR; all other bits pass all +tests. We suggest to use a sign test to extract a random Boolean value.</pre> + <p> + If this is a problem; to generate a boolean + use something like this: + </p> + <pre>(rand:uniform(16) > 8)</pre> + <p> + And for a general range, with <c>N = 1</c> for <c>exrop</c>, + and <c>N = 3</c> for <c>exs1024s</c>: + </p> + <pre>(((rand:uniform(Range bsl N) - 1) bsr N) + 1)</pre> + <p> + The floating point generating functions in this module + waste the lowest bits when converting from an integer + so they avoid this snag. + </p> + + </description> <datatypes> <datatype> + <name name="builtin_alg"/> + </datatype> + <datatype> <name name="alg"/> </datatype> <datatype> + <name name="alg_handler"/> + </datatype> + <datatype> + <name name="alg_state"/> + </datatype> + <datatype> <name name="state"/> <desc><p>Algorithm-dependent state.</p></desc> </datatype> <datatype> <name name="export_state"/> - <desc><p>Algorithm-dependent state that can be printed or saved to - file.</p></desc> + <desc> + <p> + Algorithm-dependent state that can be printed or saved to file. + </p> + </desc> + </datatype> + <datatype> + <name name="exs64_state"/> + <desc><p>Algorithm specific internal state</p></desc> + </datatype> + <datatype> + <name name="exsplus_state"/> + <desc><p>Algorithm specific internal state</p></desc> + </datatype> + <datatype> + <name name="exs1024_state"/> + <desc><p>Algorithm specific internal state</p></desc> + </datatype> + <datatype> + <name name="exrop_state"/> + <desc><p>Algorithm specific internal state</p></desc> </datatype> </datatypes> @@ -156,6 +276,33 @@ S0 = rand:seed_s(exsplus), </func> <func> + <name name="jump" arity="0"/> + <fsummary>Return the seed after performing jump calculation + to the state in the process dictionary.</fsummary> + <desc><marker id="jump-0" /> + <p>Returns the state + after performing jump calculation + to the state in the process dictionary.</p> + <p>This function generates a <c>not_implemented</c> error exception + when the jump function is not implemented for + the algorithm specified in the state + in the process dictionary.</p> + </desc> + </func> + + <func> + <name name="jump" arity="1"/> + <fsummary>Return the seed after performing jump calculation.</fsummary> + <desc><marker id="jump-1" /> + <p>Returns the state after performing jump calculation + to the given state. </p> + <p>This function generates a <c>not_implemented</c> error exception + when the jump function is not implemented for + the algorithm specified in the state.</p> + </desc> + </func> + + <func> <name name="normal" arity="0"/> <fsummary>Return a standard normal distributed random float.</fsummary> <desc> @@ -166,6 +313,15 @@ S0 = rand:seed_s(exsplus), </func> <func> + <name name="normal" arity="2"/> + <fsummary>Return a normal distributed random float.</fsummary> + <desc> + <p>Returns a normal N(Mean, Variance) deviate float + and updates the state in the process dictionary.</p> + </desc> + </func> + + <func> <name name="normal_s" arity="1"/> <fsummary>Return a standard normal distributed random float.</fsummary> <desc> @@ -176,12 +332,24 @@ S0 = rand:seed_s(exsplus), </func> <func> + <name name="normal_s" arity="3"/> + <fsummary>Return a normal distributed random float.</fsummary> + <desc> + <p>Returns, for a specified state, a normal N(Mean, Variance) + deviate float and a new state.</p> + </desc> + </func> + + <func> <name name="seed" arity="1"/> <fsummary>Seed random number generator.</fsummary> <desc> <marker id="seed-1"/> - <p>Seeds random number generation with the specifed algorithm and - time-dependent data if <anno>AlgOrExpState</anno> is an algorithm.</p> + <p> + Seeds random number generation with the specifed algorithm and + time-dependent data if <c><anno>AlgOrStateOrExpState</anno></c> + is an algorithm. + </p> <p>Otherwise recreates the exported seed in the process dictionary, and returns the state. See also <seealso marker="#export_seed-0"><c>export_seed/0</c></seealso>.</p> @@ -201,8 +369,11 @@ S0 = rand:seed_s(exsplus), <name name="seed_s" arity="1"/> <fsummary>Seed random number generator.</fsummary> <desc> - <p>Seeds random number generation with the specifed algorithm and - time-dependent data if <anno>AlgOrExpState</anno> is an algorithm.</p> + <p> + Seeds random number generation with the specifed algorithm and + time-dependent data if <c><anno>AlgOrStateOrExpState</anno></c> + is an algorithm. + </p> <p>Otherwise recreates the exported seed and returns the state. See also <seealso marker="#export_seed-0"> <c>export_seed/0</c></seealso>.</p> @@ -223,7 +394,7 @@ S0 = rand:seed_s(exsplus), <fsummary>Return a random float.</fsummary> <desc><marker id="uniform-0"/> <p>Returns a random float uniformly distributed in the value - range <c>0.0 < <anno>X</anno> < 1.0</c> and + range <c>0.0 =< <anno>X</anno> < 1.0</c> and updates the state in the process dictionary.</p> </desc> </func> @@ -234,7 +405,7 @@ S0 = rand:seed_s(exsplus), <desc><marker id="uniform-1"/> <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c>, a random integer uniformly distributed in the value range - <c>1 <= <anno>X</anno> <= <anno>N</anno></c> and + <c>1 =< <anno>X</anno> =< <anno>N</anno></c> and updates the state in the process dictionary.</p> </desc> </func> @@ -244,7 +415,7 @@ S0 = rand:seed_s(exsplus), <fsummary>Return a random float.</fsummary> <desc> <p>Returns, for a specified state, random float - uniformly distributed in the value range <c>0.0 < + uniformly distributed in the value range <c>0.0 =< <anno>X</anno> < 1.0</c> and a new state.</p> </desc> </func> @@ -255,7 +426,7 @@ S0 = rand:seed_s(exsplus), <desc> <p>Returns, for a specified integer <c><anno>N</anno> >= 1</c> and a state, a random integer uniformly distributed in the value - range <c>1 <= <anno>X</anno> <= <anno>N</anno></c> and a + range <c>1 =< <anno>X</anno> =< <anno>N</anno></c> and a new state.</p> </desc> </func> diff --git a/lib/stdlib/doc/src/re.xml b/lib/stdlib/doc/src/re.xml index 7f4f0aa18c..078ca0e38c 100644 --- a/lib/stdlib/doc/src/re.xml +++ b/lib/stdlib/doc/src/re.xml @@ -5,7 +5,7 @@ <header> <copyright> <year>2007</year> - <year>2016</year> + <year>2017</year> <holder>Ericsson AB, All Rights Reserved</holder> </copyright> <legalnotice> @@ -45,9 +45,10 @@ <p>The matching algorithms of the library are based on the PCRE library, but not all of the PCRE library is interfaced and - some parts of the library go beyond what PCRE offers. The sections of - the PCRE documentation that are relevant to this module are included - here.</p> + some parts of the library go beyond what PCRE offers. Currently + PCRE version 8.40 (release date 2017-01-11) is used. The sections + of the PCRE documentation that are relevant to this module are + included here.</p> <note> <p>The Erlang literal syntax for strings uses the "\" @@ -78,6 +79,14 @@ <funcs> <func> + <name name="version" arity="0"/> + <fsummary>Gives the PCRE version of the system in a string format</fsummary> + <desc> + <p>The return of this function is a string with the PCRE version of the system that was used in the Erlang/OTP compilation.</p> + </desc> + </func> + + <func> <name name="compile" arity="1"/> <fsummary>Compile a regular expression into a match program</fsummary> <desc> @@ -149,13 +158,25 @@ </item> <tag><c>extended</c></tag> <item> - <p>Whitespace data characters in the pattern are ignored except - when escaped or inside a character class. Whitespace does not - include character 'vt' (ASCII 11). Characters between an - unescaped <c>#</c> outside a character class and the next newline, - inclusive, are also ignored. This is equivalent to Perl option - <c>/x</c> and can be changed within a pattern by a <c>(?x)</c> - option setting.</p> + <p>If this option is set, most white space characters in the + pattern are totally ignored except when escaped or inside a + character class. However, white space is not allowed within + sequences such as <c>(?></c> that introduce various + parenthesized subpatterns, nor within a numerical quantifier + such as <c>{1,3}</c>. However, ignorable white space is permitted + between an item and a following quantifier and between a + quantifier and a following + that indicates possessiveness. + </p> + <p>White space did not used to include the VT character (code + 11), because Perl did not treat this character as white space. + However, Perl changed at release 5.18, so PCRE followed at + release 8.34, and VT is now treated as white space. + </p> + <p>This also causes characters between an unescaped # + outside a character class and the next newline, inclusive, to + be ignored. This is equivalent to Perl's <c>/x</c> option, and it + can be changed within a pattern by a <c>(?x)</c> option setting. + </p> <p>With this option, comments inside complicated patterns can be included. However, notice that this applies only to data characters. Whitespace characters can never appear within special @@ -1321,6 +1342,8 @@ re:split("Erlang","[lg]",[{return,list},{parts,4}]).</code> VM. Notice that the recursion limit does not affect the stack depth of the VM, as PCRE for Erlang is compiled in such a way that the match function never does recursion on the C stack.</p> + <p>Note that <c>LIMIT_MATCH</c> and <c>LIMIT_RECURSION</c> can only reduce + the value of the limits set by the caller, not increase them.</p> </section> <section> @@ -1444,12 +1467,17 @@ Pattern PCRE matches Perl matches <tag>\n</tag><item>Line feed (hex 0A)</item> <tag>\r</tag><item>Carriage return (hex 0D)</item> <tag>\t</tag><item>Tab (hex 09)</item> + <tag>\0dd</tag><item>Character with octal code 0dd</item> <tag>\ddd</tag><item>Character with octal code ddd, or back reference </item> + <tag>\o{ddd..}</tag><item>character with octal code ddd..</item> <tag>\xhh</tag><item>Character with hex code hh</item> <tag>\x{hhh..}</tag><item>Character with hex code hhh..</item> </taglist> + <note><p>Note that \0dd is always an octal code, and that \8 and \9 are + the literal characters "8" and "9".</p></note> + <p>The precise effect of \cx on ASCII characters is as follows: if x is a lowercase letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A @@ -1461,50 +1489,38 @@ Pattern PCRE matches Perl matches <p>The \c facility was designed for use with ASCII characters, but with the extension to Unicode it is even less useful than it once was.</p> - <p>By default, after \x, from zero to two hexadecimal digits are read - (letters can be in upper or lower case). Any number of hexadecimal digits - can appear between \x{ and }, but the character code is constrained as - follows:</p> - - <taglist> - <tag>8-bit non-Unicode mode</tag> - <item>< 0x100</item> - <tag>8-bit UTF-8 mode</tag> - <item>< 0x10ffff and a valid code point</item> - </taglist> - - <p>Invalid Unicode code points are the range 0xd800 to 0xdfff (the so-called - "surrogate" code points), and 0xffef.</p> - - <p>If characters other than hexadecimal digits appear between \x{ and }, - or if there is no terminating }, this form of escape is not recognized. - Instead, the initial \x is interpreted as a basic hexadecimal escape, - with no following digits, giving a character whose value is zero.</p> - - <p>Characters whose value is < 256 can be defined by either of the two - syntaxes for \x. There is no difference in the way they are handled. For - example, \xdc is the same as \x{dc}.</p> - <p>After \0 up to two further octal digits are read. If there are fewer than - two digits, only those that are present are used. Thus the sequence - \0\x\07 specifies two binary zeros followed by a BEL character (code value - 7). Ensure to supply two digits after the initial zero if the pattern - character that follows is itself an octal digit.</p> + two digits, just those that are present are used. Thus the sequence + \0\x\015 specifies two binary zeros followed by a CR character (code value + 13). Make sure you supply two digits after the initial zero if the pattern + character that follows is itself an octal digit.</p> + + <p>The escape \o must be followed by a sequence of octal digits, enclosed + in braces. An error occurs if this is not the case. This escape is a recent + addition to Perl; it provides way of specifying character code points as + octal numbers greater than 0777, and it also allows octal numbers and back + references to be unambiguously specified.</p> + + <p>For greater clarity and unambiguity, it is best to avoid following \ by + a digit greater than zero. Instead, use \o{} or \x{} to specify character + numbers, and \g{} to specify back references. The following paragraphs + describe the old, ambiguous syntax.</p> <p>The handling of a backslash followed by a digit other than 0 is - complicated. Outside a character class, PCRE reads it and any following - digits as a decimal number. If the number is < 10, or if there have + complicated, and Perl has changed in recent releases, causing PCRE also + to change. Outside a character class, PCRE reads the digit and any following + digits as a decimal number. If the number is < 8, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a <em>back reference</em>. A description of how this works is provided later, following the discussion of parenthesized subpatterns.</p> - <p>Inside a character class, or if the decimal number is > 9 and there - have not been that many capturing subpatterns, PCRE re-reads up to three - octal digits following the backslash, and uses them to generate a data - character. Any subsequent digits stand for themselves. The value of the - character is constrained in the same way as characters specified in - hexadecimal. For example:</p> + <p>Inside a character class, or if the decimal number following \ is > + 7 and there have not been that many capturing subpatterns, PCRE handles + \8 and \9 as the literal characters "8" and "9", and otherwise re-reads + up to three octal digits following the backslash, and using them to + generate a data character. Any subsequent digits stand for themselves. + For example:</p> <taglist> <tag>\040</tag> @@ -1526,12 +1542,38 @@ Pattern PCRE matches Perl matches <tag>\377</tag> <item>Can be a back reference, otherwise value 255 (decimal)</item> <tag>\81</tag> - <item>Either a back reference, or a binary zero followed by the two - characters "8" and "1"</item> + <item>Either a back reference, or the two characters "8" and "1"</item> </taglist> - <p>Notice that octal values >= 100 must not be introduced by a leading - zero, as no more than three octal digits are ever read.</p> + <p>Notice that octal values >= 100 that are specified using this syntax + must not be introduced by a leading zero, as no more than three octal digits + are ever read.</p> + + <p>By default, after \x that is not followed by {, from zero to two + hexadecimal digits are read (letters can be in upper or lower case). Any + number of hexadecimal digits may appear between \x{ and }. If a character + other than a hexadecimal digit appears between \x{ and }, or if there is no + terminating }, an error occurs. + </p> + + <p>Characters whose value is less than 256 can be defined by either of the + two syntaxes for \x. There is no difference in the way they are handled. For + example, \xdc is exactly the same as \x{dc}.</p> + + <p><em>Constraints on character values</em></p> + + <p>Characters that are specified using octal or hexadecimal numbers are + limited to certain values, as follows:</p> + <taglist> + <tag>8-bit non-UTF mode</tag> + <item><p>< 0x100</p></item> + <tag>8-bit UTF-8 mode</tag> + <item><p>< 0x10ffff and a valid codepoint</p></item> + </taglist> + <p>Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the + so-called "surrogate" codepoints), and 0xffef.</p> + + <p><em>Escape sequences in character classes</em></p> <p>All the sequences that define a single character value can be used both inside and outside character classes. Also, inside a character class, \b @@ -1597,11 +1639,14 @@ Pattern PCRE matches Perl matches appropriate type. If the current matching point is at the end of the subject string, all fail, as there is no character to match.</p> - <p>For compatibility with Perl, \s does not match the VT character - (code 11). This makes it different from the Posix "space" class. The \s - characters are HT (9), LF (10), FF (12), CR (13), and space (32). If "use - locale;" is included in a Perl script, \s can match the VT character. In - PCRE, it never does.</p> + <p>For compatibility with Perl, \s did not used to match the VT character (code + 11), which made it different from the the POSIX "space" class. However, Perl + added VT at release 5.18, and PCRE followed suit at release 8.34. The default + \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space + (32), which are defined as white space in the "C" locale. This list may vary if + locale-specific matching is taking place. For example, in some locales the + "non-breaking space" character (\xA0) is recognized as white space, and in + others the VT character is not.</p> <p>A "word" character is an underscore or any character that is a letter or a digit. By default, the definition of letters and digits is controlled by @@ -1619,9 +1664,9 @@ Pattern PCRE matches Perl matches <taglist> <tag>\d</tag><item>Any character that \p{Nd} matches (decimal digit) </item> - <tag>\s</tag><item>Any character that \p{Z} matches, plus HT, LF, FF, CR + <tag>\s</tag><item>Any character that \p{Z} or \h or \v </item> - <tag>\w</tag><item>Any character that \p{L} or \p{N} matches, plus + <tag>\w</tag><item>Any character that matches \p{L} or \p{N} matches, plus underscore</item> </taglist> @@ -1769,6 +1814,7 @@ Pattern PCRE matches Perl matches <item>Avestan</item> <item>Balinese</item> <item>Bamum</item> + <item>Bassa_Vah</item> <item>Batak</item> <item>Bengali</item> <item>Bopomofo</item> @@ -1777,6 +1823,7 @@ Pattern PCRE matches Perl matches <item>Buhid</item> <item>Canadian_Aboriginal</item> <item>Carian</item> + <item>Caucasian_Albanian</item> <item>Chakma</item> <item>Cham</item> <item>Cherokee</item> @@ -1787,11 +1834,14 @@ Pattern PCRE matches Perl matches <item>Cyrillic</item> <item>Deseret</item> <item>Devanagari</item> + <item>Duployan</item> <item>Egyptian_Hieroglyphs</item> + <item>Elbasan</item> <item>Ethiopic</item> <item>Georgian</item> <item>Glagolitic</item> <item>Gothic</item> + <item>Grantha</item> <item>Greek</item> <item>Gujarati</item> <item>Gurmukhi</item> @@ -1811,40 +1861,56 @@ Pattern PCRE matches Perl matches <item>Kayah_Li</item> <item>Kharoshthi</item> <item>Khmer</item> + <item>Khojki</item> + <item>Khudawadi</item> <item>Lao</item> <item>Latin</item> <item>Lepcha</item> <item>Limbu</item> + <item>Linear_A</item> <item>Linear_B</item> <item>Lisu</item> <item>Lycian</item> <item>Lydian</item> + <item>Mahajani</item> <item>Malayalam</item> <item>Mandaic</item> + <item>Manichaean</item> <item>Meetei_Mayek</item> + <item>Mende_Kikakui</item> <item>Meroitic_Cursive</item> <item>Meroitic_Hieroglyphs</item> <item>Miao</item> + <item>Modi</item> <item>Mongolian</item> + <item>Mro</item> <item>Myanmar</item> + <item>Nabataean</item> <item>New_Tai_Lue</item> <item>Nko</item> <item>Ogham</item> + <item>Ol_Chiki</item> <item>Old_Italic</item> + <item>Old_North_Arabian</item> + <item>Old_Permic</item> <item>Old_Persian</item> <item>Oriya</item> <item>Old_South_Arabian</item> <item>Old_Turkic</item> - <item>Ol_Chiki</item> <item>Osmanya</item> + <item>Pahawh_Hmong</item> + <item>Palmyrene</item> + <item>Pau_Cin_Hau</item> <item>Phags_Pa</item> <item>Phoenician</item> + <item>Psalter_Pahlavi</item> <item>Rejang</item> <item>Runic</item> <item>Samaritan</item> <item>Saurashtra</item> <item>Sharada</item> <item>Shavian</item> + <item>Siddham</item> <item>Sinhala</item> <item>Sora_Sompeng</item> <item>Sundanese</item> @@ -1862,8 +1928,10 @@ Pattern PCRE matches Perl matches <item>Thai</item> <item>Tibetan</item> <item>Tifinagh</item> + <item>Tirhuta</item> <item>Ugaritic</item> <item>Vai</item> + <item>Warang_Citi</item> <item>Yi</item> </list> @@ -2001,10 +2069,10 @@ Pattern PCRE matches Perl matches <p>In addition to the standard Unicode properties described earlier, PCRE supports four more that make it possible to convert traditional escape - sequences, such as \w and \s, and Posix character classes to use Unicode + sequences, such as \w and \s to use Unicode properties. PCRE uses these non-standard, non-Perl properties internally - when <c>PCRE_UCP</c> is set. However, they can also be used explicitly. - The properties are as follows:</p> + when the <c>ucp</c> option is passed. However, they can also be used + explicitly. The properties are as follows:</p> <taglist> <tag>Xan</tag> @@ -2030,6 +2098,16 @@ Pattern PCRE matches Perl matches </item> </taglist> + <p>Perl and POSIX space are now the same. Perl added VT to its space + character set at release 5.18 and PCRE changed at release 8.34.</p> + + <p>Xan matches characters that have either the L (letter) or the N (number) + property. Xps matches the characters tab, linefeed, vertical tab, form feed, + or carriage return, and any other character that has the Z (separator) + property. Xsp is the same as Xps; it used to exclude vertical tab, for Perl + compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd + matches the same characters as Xan, plus underscore. + </p> <p>There is another non-standard property, Xuc, which matches any character that can be represented by a Universal Character Name in C++ and other programming languages. These are the characters $, @, ` (grave accent), @@ -2062,7 +2140,9 @@ foo\Kbar</code> <p>Perl documents that the use of \K within assertions is "not well defined". In PCRE, \K is acted upon when it occurs inside positive - assertions, but is ignored in negative assertions.</p> + assertions, but is ignored in negative assertions. Note that when a + pattern such as (?=ab\K) matches, the reported start of the match can + be greater than the end of the match.</p> <p><em>Simple Assertions</em></p> @@ -2301,7 +2381,8 @@ foo\Kbar</code> m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last - character in the class.</p> + character in the class, or immediately after a range. For example, [b-d-z] + matches letters in the range b to d, a hyphen character, or z.</p> <p>The literal character "]" cannot be the end character of a range. A pattern such as [W-]46] is interpreted as a class of two characters ("W" @@ -2311,6 +2392,11 @@ foo\Kbar</code> followed by two other characters. The octal or hexadecimal representation of "]" can also be used to end a range.</p> + <p>An error is generated if a POSIX character class (see below) or an + escape sequence other than one that defines a single character appears at + a point where a range ending character is expected. For example, [z-\xff] + is valid, but [A-\d] and [A-[:digit:]] are not.</p> + <p>Ranges operate in the collating sequence of character values. They can also be used for characters specified numerically, for example, [\000-\037]. Ranges can include any characters that are valid for the @@ -2353,7 +2439,8 @@ foo\Kbar</code> range)</item> <item>Circumflex (only at the start)</item> <item>Opening square bracket (only when it can be interpreted as - introducing a Posix class name; see the next section)</item> + introducing a Posix class name, or for a special compatibility + feature; see the next two sections)</item> <item>Terminating closing square bracket</item> </list> @@ -2385,16 +2472,18 @@ foo\Kbar</code> <tag>print</tag><item>Printing characters, including space</item> <tag>punct</tag><item>Printing characters, excluding letters, digits, and space</item> - <tag>space</tag><item>Whitespace (not quite the same as \s)</item> + <tag>space</tag><item>Whitespace (the same as \s from PCRE 8.34)</item> <tag>upper</tag><item>Uppercase letters</item> <tag>word</tag><item>"Word" characters (same as \w)</item> <tag>xdigit</tag><item>Hexadecimal digits</item> </taglist> - <p>The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), - and space (32). Notice that this list includes the VT character (code 11). - This makes "space" different to \s, which does not include VT (for Perl - compatibility).</p> + <p>The default "space" characters are HT (9), LF (10), VT (11), FF (12), + CR (13), and space (32). If locale-specific matching is taking place, the + list of space characters may be different; there may be fewer or more of + them. "Space" used to be different to \s, which did not include VT, for + Perl compatibility. However, Perl changed at release 5.18, and PCRE followed + at release 8.34. "Space" and \s now match the same set of characters.</p> <p>The name "word" is a Perl extension, and "blank" is a GNU extension from Perl 5.8. Another Perl extension is negation, which is indicated by a ^ @@ -2408,11 +2497,11 @@ foo\Kbar</code> "ch" is a "collating element", but these are not supported, and an error is given if they are encountered.</p> - <p>By default, in UTF modes, characters with values > 255 do not match + <p>By default, characters with values > 255 do not match any of the Posix character classes. However, if option <c>PCRE_UCP</c> is passed to <c>pcre_compile()</c>, some of the classes are changed so that - Unicode character properties are used. This is achieved by replacing the - Posix classes by other sequences, as follows:</p> + Unicode character properties are used. This is achieved by replacing + certain Posix classes by other sequences, as follows:</p> <taglist> <tag>[:alnum:]</tag><item>Becomes <em>\p{Xan}</em></item> @@ -2425,9 +2514,49 @@ foo\Kbar</code> <tag>[:word:]</tag><item>Becomes <em>\p{Xwd}</em></item> </taglist> - <p>Negated versions, such as [:^alpha:], use \P instead of \p. The other - Posix classes are unchanged, and match only characters with code points - < 256.</p> + <p>Negated versions, such as [:^alpha:], use \P instead of \p. Three other + POSIX classes are handled specially in UCP mode:</p> + <taglist> + <tag>[:graph:]</tag> + <item><p>This matches characters that have glyphs that mark the page + when printed. In Unicode property terms, it matches all characters with + the L, M, N, P, S, or Cf properties, except for:</p> + <taglist> + <tag>U+061C</tag><item><p>Arabic Letter Mark</p></item> + <tag>U+180E</tag><item><p>Mongolian Vowel Separator</p></item> + <tag>U+2066 - U+2069</tag><item><p>Various "isolate"s</p></item> + </taglist> + </item> + <tag>[:print:]</tag> + <item><p>This matches the same characters as [:graph:] plus space + characters that are not controls, that is, characters with the Zs + property.</p></item> + <tag>[:punct:]</tag><item><p>This matches all characters that have + the Unicode P (punctuation) property, plus those characters whose code + points are less than 128 that have the S (Symbol) property.</p></item> + </taglist> + <p>The other POSIX classes are unchanged, and match only characters with + code points less than 128. + </p> + + <p><em>Compatibility Feature for Word Boundaries</em></p> + + <p>In the POSIX.2 compliant library that was included in 4.4BSD Unix, + the ugly syntax [[:<:]] and [[:>:]] is used for matching "start + of word" and "end of word". PCRE treats these items as follows:</p> + <taglist> + <tag>[[:<:]]</tag><item><p>is converted to \b(?=\w)</p></item> + <tag>[[:>:]]</tag><item><p>is converted to \b(?<=\w)</p></item> + </taglist> + <p>Only these exact character sequences are recognized. A sequence such as + [a[:<:]b] provokes error for an unrecognized POSIX class name. This + support is not compatible with Perl. It is provided to help migrations from + other environments, and is best not used in any new patterns. Note that \b + matches at the start and the end of a word (see "Simple assertions" above), + and in a Perl-style pattern the preceding or following character normally + shows which is wanted, without the need for the assertions that are used + above in order to give exactly the POSIX behaviour.</p> + </section> <section> @@ -2476,8 +2605,7 @@ gilbert|sullivan</code> <p>When one of these option changes occurs at top-level (that is, not inside subpattern parentheses), the change applies to the remainder of the - pattern that follows. If the change is placed right at the start of a - pattern, PCRE extracts it into the global options.</p> + pattern that follows.</p> <p>An option change within a subpattern (see section <seealso marker="#sect11">Subpatterns</seealso>) affects only that part of the subpattern that follows it. So, the following matches abc and aBc and @@ -2645,9 +2773,9 @@ the ((?:red|white) (king|queen))</code> parentheses from other parts of the pattern, such as back references, recursion, and conditions, can be made by name and by number.</p> - <p>Names consist of up to 32 alphanumeric characters and underscores. Named - capturing parentheses are still allocated numbers as well as names, - exactly as if the names were not present. + <p>Names consist of up to 32 alphanumeric characters and underscores, but + must start with a non-digit. Named capturing parentheses are still allocated + numbers as well as names, exactly as if the names were not present. The <c>capture</c> specification to <seealso marker="#run/3"> <c>run/3</c></seealso> can use named values if they are present in the regular expression.</p> @@ -3118,7 +3246,14 @@ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</code> purposes of numbering the capturing subpatterns in the whole pattern. However, substring capturing is done only for positive assertions. (Perl sometimes, but not always, performs capturing in negative assertions.)</p> - + <warning> + <p>If a positive assertion containing one or more capturing subpatterns + succeeds, but failure to match later in the pattern causes backtracking over + this assertion, the captures within the assertion are reset only if no higher + numbered captures are already set. This is, unfortunately, a fundamental + limitation of the current implementation, and as PCRE1 is now in + maintenance-only status, it is unlikely ever to change.</p> + </warning> <p>For compatibility with Perl, assertion subpatterns can be repeated. However, it makes no sense to assert the same thing many times, the side effect of capturing parentheses can occasionally be useful. In practice, @@ -3371,12 +3506,7 @@ abcd$</code> <p>Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used subpattern by name. For compatibility with earlier versions of PCRE, which had this facility before Perl, the syntax (?(name)...) is also - recognized. However, there is a possible ambiguity with this syntax, as - subpattern names can consist entirely of digits. PCRE looks first for a - named subpattern; if it cannot find one and the name consists entirely of - digits, PCRE looks for a subpattern of that number, which must be > 0. - Using subpattern names that consist entirely of digits is not - recommended.</p> + recognized.</p> <p>Rewriting the previous example to use a named subpattern gives:</p> @@ -3958,11 +4088,13 @@ a+(*COMMIT)b</code> 2> re:run("xyzabc","(*COMMIT)abc",[{capture,all,list},no_start_optimize]). nomatch</code> - <p>PCRE knows that any match must start with "a", so the optimization skips - along the subject to "a" before running the first match attempt, which - succeeds. When the optimization is disabled by option - <c>no_start_optimize</c>, the match starts at "x" and so the (*COMMIT) - causes it to fail without trying any other starting points.</p> + <p>For this pattern, PCRE knows that any match must start with "a", so the + optimization skips along the subject to "a" before applying the pattern to the + first set of data. The match attempt then succeeds. In the second call the + <c>no_start_optimize</c> disables the optimization that skips along to the + first character. The pattern is now applied starting at "x", and so the + (*COMMIT) causes the match to fail without trying any other starting + points.</p> <p>The following verb causes the match to fail at the current starting position in the subject if there is a later matching failure that causes @@ -4138,7 +4270,7 @@ A (B(*THEN)C | (*FAIL)) | D</code> ...(*COMMIT)(*PRUNE)...</code> <p>If there is a matching failure to the right, backtracking onto (*PRUNE) - cases it to be triggered, and its action is taken. There can never be a + causes it to be triggered, and its action is taken. There can never be a backtrack onto (*COMMIT).</p> <p><em>Backtracking Verbs in Repeated Groups</em></p> diff --git a/lib/stdlib/doc/src/shell.xml b/lib/stdlib/doc/src/shell.xml index d6e8036d4e..ab62c2fcdd 100644 --- a/lib/stdlib/doc/src/shell.xml +++ b/lib/stdlib/doc/src/shell.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1996</year><year>2016</year> + <year>1996</year><year>2017</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -165,12 +165,12 @@ <item> <p>Evaluates <c>shell_default:help()</c>.</p> </item> - <tag><c>c(File)</c></tag> + <tag><c>c(Mod)</c></tag> <item> - <p>Evaluates <c>shell_default:c(File)</c>. This compiles - and loads code in <c>File</c> and purges old versions of - code, if necessary. Assumes that the file and module names - are the same.</p> + <p>Evaluates <c>shell_default:c(Mod)</c>. This compiles and + loads the module <c>Mod</c> and purges old versions of the + code, if necessary. <c>Mod</c> can be either a module name or a + a source file path, with or without <c>.erl</c> extension.</p> </item> <tag><c>catch_exception(Bool)</c></tag> <item> @@ -854,7 +854,7 @@ q - quit erlang <c>{history, N}</c>, where <c>N</c> is the current command number. The function is to return a list of characters or an atom. This constraint is because of the Erlang I/O protocol. Unicode characters - beyond code point 255 are allowed in the list. Notice + beyond code point 255 are allowed in the list and the atom. Notice that in restricted mode the call <c>Mod:Func(L)</c> must be allowed or the default shell prompt function is called.</p> </section> diff --git a/lib/stdlib/doc/src/string.xml b/lib/stdlib/doc/src/string.xml index dddedf1132..dc83c40a9a 100644 --- a/lib/stdlib/doc/src/string.xml +++ b/lib/stdlib/doc/src/string.xml @@ -36,8 +36,613 @@ <modulesummary>String processing functions.</modulesummary> <description> <p>This module provides functions for string processing.</p> + <p>A string in this module is represented by <seealso marker="unicode#type-chardata"> + <c>unicode:chardata()</c></seealso>, that is, a list of codepoints, + binaries with UTF-8-encoded codepoints + (<em>UTF-8 binaries</em>), or a mix of the two.</p> + <code> +"abcd" is a valid string +<<"abcd">> is a valid string +["abcd"] is a valid string +<<"abc..åäö"/utf8>> is a valid string +<<"abc..åäö">> is NOT a valid string, + but a binary with Latin-1-encoded codepoints +[<<"abc">>, "..åäö"] is a valid string +[atom] is NOT a valid string</code> + <p> + This module operates on grapheme clusters. A <em>grapheme cluster</em> + is a user-perceived character, which can be represented by several + codepoints. + </p> + <code> +"å" [229] or [97, 778] +"e̊" [101, 778]</code> + <p> + The string length of "ß↑e̊" is 3, even though it is represented by the + codepoints <c>[223,8593,101,778]</c> or the UTF-8 binary + <c><<195,159,226,134,145,101,204,138>></c>. + </p> + <p> + Grapheme clusters for codepoints of class <c>prepend</c> + and non-modern (or decomposed) Hangul is not handled for performance + reasons in + <seealso marker="#find/3"><c>find/3</c></seealso>, + <seealso marker="#replace/3"><c>replace/3</c></seealso>, + <seealso marker="#split/2"><c>split/2</c></seealso>, + <seealso marker="#lexemes/2"><c>split/2</c></seealso> and + <seealso marker="#trim/3"><c>trim/3</c></seealso>. + </p> + <p> + Splitting and appending strings is to be done on grapheme clusters + borders. + There is no verification that the results of appending strings are + valid or normalized. + </p> + <p> + Most of the functions expect all input to be normalized to one form, + see for example <seealso marker="unicode#characters_to_nfc_list/1"> + <c>unicode:characters_to_nfc_list/1</c></seealso>. + </p> + <p> + Language or locale specific handling of input is not considered + in any function. + </p> + <p> + The functions can crash for non-valid input strings. For example, + the functions expect UTF-8 binaries but not all functions + verify that all binaries are encoded correctly. + </p> + <p> + Unless otherwise specified the return value type is the same as + the input type. That is, binary input returns binary output, + list input returns a list output, and mixed input can return a + mixed output.</p> + <code> +1> string:trim(" sarah "). +"sarah" +2> string:trim(<<" sarah ">>). +<<"sarah">> +3> string:lexemes("foo bar", " "). +["foo","bar"] +4> string:lexemes(<<"foo bar">>, " "). +[<<"foo">>,<<"bar">>]</code> + <p>This module has been reworked in Erlang/OTP 20 to + handle <seealso marker="unicode#type-chardata"> + <c>unicode:chardata()</c></seealso> and operate on grapheme + clusters. The <seealso marker="#oldapi"> <c>old + functions</c></seealso> that only work on Latin-1 lists as input + are still available but should not be + used. They will be deprecated in Erlang/OTP 21. + </p> </description> + <datatypes> + <datatype> + <name name="direction"/> + <name name="grapheme_cluster"/> + <desc> + <p>A user-perceived character, consisting of one or more + codepoints.</p> + </desc> + </datatype> + </datatypes> + + <funcs> + + <func> + <name name="casefold" arity="1"/> + <fsummary>Convert a string to a comparable string.</fsummary> + <desc> + <p> + Converts <c><anno>String</anno></c> to a case-agnostic + comparable string. Function <c>casefold/1</c> is preferred + over <c>lowercase/1</c> when two strings are to be compared + for equality. See also <seealso marker="#equal/4"><c>equal/4</c></seealso>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:casefold("Ω and ẞ SHARP S").</input> +"ω and ss sharp s"</pre> + </desc> + </func> + + <func> + <name name="chomp" arity="1"/> + <fsummary>Remove trailing end of line control characters.</fsummary> + <desc> + <p> + Returns a string where any trailing <c>\n</c> or + <c>\r\n</c> have been removed from <c><anno>String</anno></c>. + </p> + <p><em>Example:</em></p> + <pre> +182> <input>string:chomp(<<"\nHello\n\n">>).</input> +<<"\nHello">> +183> <input>string:chomp("\nHello\r\r\n").</input> +"\nHello\r"</pre> + </desc> + </func> + + <func> + <name name="equal" arity="2"/> + <name name="equal" arity="3"/> + <name name="equal" arity="4"/> + <fsummary>Test string equality.</fsummary> + <desc> + <p> + Returns <c>true</c> if <c><anno>A</anno></c> and + <c><anno>B</anno></c> are equal, otherwise <c>false</c>. + </p> + <p> + If <c><anno>IgnoreCase</anno></c> is <c>true</c> + the function does <seealso marker="#casefold/1"> + <c>casefold</c>ing</seealso> on the fly before the equality test. + </p> + <p>If <c><anno>Norm</anno></c> is not <c>none</c> + the function applies normalization on the fly before the equality test. + There are four available normalization forms: + <seealso marker="unicode#characters_to_nfc_list/1"> <c>nfc</c></seealso>, + <seealso marker="unicode#characters_to_nfd_list/1"> <c>nfd</c></seealso>, + <seealso marker="unicode#characters_to_nfkc_list/1"> <c>nfkc</c></seealso>, and + <seealso marker="unicode#characters_to_nfkd_list/1"> <c>nfkd</c></seealso>. + </p> + <p>By default, + <c><anno>IgnoreCase</anno></c> is <c>false</c> and + <c><anno>Norm</anno></c> is <c>none</c>.</p> + <p><em>Example:</em></p> + <pre> +1> <input>string:equal("åäö", <<"åäö"/utf8>>).</input> +true +2> <input>string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).</input> +false +3> <input>string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).</input> +true</pre> + </desc> + </func> + + <func> + <name name="find" arity="2"/> + <name name="find" arity="3"/> + <fsummary>Find start of substring.</fsummary> + <desc> + <p> + Removes anything before <c><anno>SearchPattern</anno></c> in <c><anno>String</anno></c> + and returns the remainder of the string or <c>nomatch</c> if <c><anno>SearchPattern</anno></c> is not + found. + <c><anno>Dir</anno></c>, which can be <c>leading</c> or + <c>trailing</c>, indicates from which direction characters + are to be searched. + </p> + <p> + By default, <c><anno>Dir</anno></c> is <c>leading</c>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:find("ab..cd..ef", ".").</input> +"..cd..ef" +2> <input>string:find(<<"ab..cd..ef">>, "..", trailing).</input> +<<"..ef">> +3> <input>string:find(<<"ab..cd..ef">>, "x", leading).</input> +nomatch +4> <input>string:find("ab..cd..ef", "x", trailing).</input> +nomatch</pre> + </desc> + </func> + + <func> + <name name="is_empty" arity="1"/> + <fsummary>Check if the string is empty.</fsummary> + <desc> + <p>Returns <c>true</c> if <c><anno>String</anno></c> is the + empty string, otherwise <c>false</c>.</p> + <p><em>Example:</em></p> + <pre> +1> <input>string:is_empty("foo").</input> +false +2> <input>string:is_empty(["",<<>>]).</input> +true</pre> + </desc> + </func> + + <func> + <name name="length" arity="1"/> + <fsummary>Calculate length of the string.</fsummary> + <desc> + <p> + Returns the number of grapheme clusters in <c><anno>String</anno></c>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:length("ß↑e̊").</input> +3 +2> <input>string:length(<<195,159,226,134,145,101,204,138>>).</input> +3</pre> + </desc> + </func> + + <func> + <name name="lexemes" arity="2"/> + <fsummary>Split string into lexemes.</fsummary> + <desc> + <p> + Returns a list of lexemes in <c><anno>String</anno></c>, separated + by the grapheme clusters in <c><anno>SeparatorList</anno></c>. + </p> + <p> + Notice that, as shown in this example, two or more + adjacent separator graphemes clusters in <c><anno>String</anno></c> + are treated as one. That is, there are no empty + strings in the resulting list of lexemes. + See also <seealso marker="#split/3"><c>split/3</c></seealso> which returns + empty strings. + </p> + <p>Notice that <c>[$\r,$\n]</c> is one grapheme cluster.</p> + <p><em>Example:</em></p> + <pre> +1> <input>string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).</input> +["abc","de̊f","ghi","jkl","foo"] +2> <input>string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).</input> +[<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]</pre> + </desc> + </func> + + <func> + <name name="lowercase" arity="1"/> + <fsummary>Convert a string to lowercase</fsummary> + <desc> + <p> + Converts <c><anno>String</anno></c> to lowercase. + </p> + <p> + Notice that function <seealso marker="#casefold/1"><c>casefold/1</c></seealso> + should be used when converting a string to + be tested for equality. + </p> + <p><em>Example:</em></p> + <pre> +2> <input>string:lowercase(string:uppercase("Michał")).</input> +"michał"</pre> + </desc> + </func> + + <func> + <name name="next_codepoint" arity="1"/> + <fsummary>Pick the first codepoint.</fsummary> + <desc> + <p> + Returns the first codepoint in <c><anno>String</anno></c> + and the rest of <c><anno>String</anno></c> in the tail. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:next_codepoint(unicode:characters_to_binary("e̊fg")).</input> +[101|<<"̊fg"/utf8>>]</pre> + </desc> + </func> + + <func> + <name name="next_grapheme" arity="1"/> + <fsummary>Pick the first grapheme cluster.</fsummary> + <desc> + <p> + Returns the first grapheme cluster in <c><anno>String</anno></c> + and the rest of <c><anno>String</anno></c> in the tail. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:next_grapheme(unicode:characters_to_binary("e̊fg")).</input> +["e̊"|<<"fg">>]</pre> + </desc> + </func> + + <func> + <name name="nth_lexeme" arity="3"/> + <fsummary>Pick the nth lexeme.</fsummary> + <desc> + <p>Returns lexeme number <c><anno>N</anno></c> in + <c><anno>String</anno></c>, where lexemes are separated by + the grapheme clusters in <c><anno>SeparatorList</anno></c>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").</input> +"ghi"</pre> + </desc> + </func> + + <func> + <name name="pad" arity="2"/> + <name name="pad" arity="3"/> + <name name="pad" arity="4"/> + <fsummary>Pad a string to given length.</fsummary> + <desc> + <p> + Pads <c><anno>String</anno></c> to <c><anno>Length</anno></c> with + grapheme cluster <c><anno>Char</anno></c>. + <c><anno>Dir</anno></c>, which can be <c>leading</c>, <c>trailing</c>, + or <c>both</c>, indicates where the padding should be added. + </p> + <p>By default, <c><anno>Char</anno></c> is <c>$\s</c> and + <c><anno>Dir</anno></c> is <c>trailing</c>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:pad(<<"He̊llö"/utf8>>, 8).</input> +[<<72,101,204,138,108,108,195,182>>,32,32,32] +2> <input>io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).</input> +' He̊llö' +3> <input>io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).</input> +' He̊llö '</pre> + </desc> + </func> + + <func> + <name name="prefix" arity="2"/> + <fsummary>Remove prefix from string.</fsummary> + <desc> + <p> + If <c><anno>Prefix</anno></c> is the prefix of + <c><anno>String</anno></c>, removes it and returns the + remainder of <c><anno>String</anno></c>, otherwise returns + <c>nomatch</c>. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:prefix(<<"prefix of string">>, "pre").</input> +<<"fix of string">> +2> <input>string:prefix("pre", "prefix").</input> +nomatch</pre> + </desc> + </func> + + <func> + <name name="replace" arity="3"/> + <name name="replace" arity="4"/> + <fsummary>Replace a pattern in string.</fsummary> + <desc> + <p> + Replaces <c><anno>SearchPattern</anno></c> in <c><anno>String</anno></c> + with <c><anno>Replacement</anno></c>. + <c><anno>Where</anno></c>, default <c>leading</c>, indicates whether + the <c>leading</c>, the <c>trailing</c> or <c>all</c> encounters of + <c><anno>SearchPattern</anno></c> are to be replaced. + </p> + <p>Can be implemented as:</p> + <pre>lists:join(Replacement, split(String, SearchPattern, Where)).</pre> + <p><em>Example:</em></p> + <pre> +1> <input>string:replace(<<"ab..cd..ef">>, "..", "*").</input> +[<<"ab">>,"*",<<"cd..ef">>] +2> <input>string:replace(<<"ab..cd..ef">>, "..", "*", all).</input> +[<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]</pre> + </desc> + </func> + + <func> + <name name="reverse" arity="1"/> + <fsummary>Reverses a string</fsummary> + <desc> + <p> + Returns the reverse list of the grapheme clusters in <c><anno>String</anno></c>. + </p> + <p><em>Example:</em></p> + <pre> +1> Reverse = <input>string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).</input> +[[79,776],[65,776],[65,778]] +2> <input>io:format("~ts~n",[Reverse]).</input> +ÖÄÅ</pre> + </desc> + </func> + + <func> + <name name="slice" arity="2"/> + <name name="slice" arity="3"/> + <fsummary>Extract a part of string</fsummary> + <desc> + <p>Returns a substring of <c><anno>String</anno></c> of + at most <c><anno>Length</anno></c> grapheme clusters, starting at position + <c><anno>Start</anno></c>.</p> + <p>By default, <c><anno>Length</anno></c> is <c>infinity</c>.</p> + <p><em>Example:</em></p> + <pre> +1> <input>string:slice(<<"He̊llö Wörld"/utf8>>, 4).</input> +<<"ö Wörld"/utf8>> +2> <input>string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).</input> +"ö Wö" +3> <input>string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).</input> +"ö Wörld"</pre> + </desc> + </func> + + <func> + <name name="split" arity="2"/> + <name name="split" arity="3"/> + <fsummary>Split a string into substrings.</fsummary> + <desc> + <p> + Splits <c><anno>String</anno></c> where <c><anno>SearchPattern</anno></c> + is encountered and return the remaining parts. + <c><anno>Where</anno></c>, default <c>leading</c>, indicates whether + the <c>leading</c>, the <c>trailing</c> or <c>all</c> encounters of + <c><anno>SearchPattern</anno></c> will split <c><anno>String</anno></c>. + </p> + <p><em>Example:</em></p> + <pre> +0> <input>string:split("ab..bc..cd", "..").</input> +["ab","bc..cd"] +1> <input>string:split(<<"ab..bc..cd">>, "..", trailing).</input> +[<<"ab..bc">>,<<"cd">>] +2> <input>string:split(<<"ab..bc....cd">>, "..", all).</input> +[<<"ab">>,<<"bc">>,<<>>,<<"cd">>]</pre> + </desc> + </func> + + <func> + <name name="take" arity="2"/> + <name name="take" arity="3"/> + <name name="take" arity="4"/> + <fsummary>Take leading or trailing parts.</fsummary> + <desc> + <p>Takes characters from <c><anno>String</anno></c> as long as + the characters are members of set <c><anno>Characters</anno></c> + or the complement of set <c><anno>Characters</anno></c>. + <c><anno>Dir</anno></c>, + which can be <c>leading</c> or <c>trailing</c>, indicates from + which direction characters are to be taken. + </p> + <p><em>Example:</em></p> + <pre> +5> <input>string:take("abc0z123", lists:seq($a,$z)).</input> +{"abc","0z123"} +6> <input>string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).</input> +{<<"abc">>,<<"0z123">>} +7> <input>string:take("abc0z123", lists:seq($0,$9), false, trailing).</input> +{"abc0z","123"} +8> <input>string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).</input> +{<<"abc0z">>,<<"123">>}</pre> + </desc> + </func> + + <func> + <name name="titlecase" arity="1"/> + <fsummary>Convert a string to titlecase.</fsummary> + <desc> + <p> + Converts <c><anno>String</anno></c> to titlecase. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:titlecase("ß is a SHARP s").</input> +"Ss is a SHARP s"</pre> + </desc> + </func> + + <func> + <name name="to_float" arity="1"/> + <fsummary>Return a float whose text representation is the integers + (ASCII values) of a string.</fsummary> + <desc> + <p>Argument <c><anno>String</anno></c> is expected to start with a + valid text represented float (the digits are ASCII values). + Remaining characters in the string after the float are returned in + <c><anno>Rest</anno></c>.</p> + <p><em>Example:</em></p> + <pre> +> <input>{F1,Fs} = string:to_float("1.0-1.0e-1"),</input> +> <input>{F2,[]} = string:to_float(Fs),</input> +> <input>F1+F2.</input> +0.9 +> <input>string:to_float("3/2=1.5").</input> +{error,no_float} +> <input>string:to_float("-1.5eX").</input> +{-1.5,"eX"}</pre> + </desc> + </func> + + <func> + <name name="to_integer" arity="1"/> + <fsummary>Return an integer whose text representation is the integers + (ASCII values) of a string.</fsummary> + <desc> + <p>Argument <c><anno>String</anno></c> is expected to start with a + valid text represented integer (the digits are ASCII values). + Remaining characters in the string after the integer are returned in + <c><anno>Rest</anno></c>.</p> + <p><em>Example:</em></p> + <pre> +> <input>{I1,Is} = string:to_integer("33+22"),</input> +> <input>{I2,[]} = string:to_integer(Is),</input> +> <input>I1-I2.</input> +11 +> <input>string:to_integer("0.5").</input> +{0,".5"} +> <input>string:to_integer("x=2").</input> +{error,no_integer}</pre> + </desc> + </func> + + <func> + <name name="to_graphemes" arity="1"/> + <fsummary>Convert a string to a list of grapheme clusters.</fsummary> + <desc> + <p> + Converts <c><anno>String</anno></c> to a list of grapheme clusters. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:to_graphemes("ß↑e̊").</input> +[223,8593,[101,778]] +2> <input>string:to_graphemes(<<"ß↑e̊"/utf8>>).</input> +[223,8593,[101,778]]</pre> + </desc> + </func> + + <func> + <name name="trim" arity="1"/> + <name name="trim" arity="2"/> + <name name="trim" arity="3"/> + <fsummary>Trim leading or trailing, or both, characters.</fsummary> + <desc> + <p> + Returns a string, where leading or trailing, or both, + <c><anno>Characters</anno></c> have been removed. + <c><anno>Dir</anno></c> which can be <c>leading</c>, <c>trailing</c>, + or <c>both</c>, indicates from which direction characters + are to be removed. + </p> + <p> Default <c><anno>Characters</anno></c> are the set of + nonbreakable whitespace codepoints, defined as + Pattern_White_Space in + <url href="http://unicode.org/reports/tr31/">Unicode Standard Annex #31</url>. + <c>By default, <anno>Dir</anno></c> is <c>both</c>. + </p> + <p> + Notice that <c>[$\r,$\n]</c> is one grapheme cluster according + to the Unicode Standard. + </p> + <p><em>Example:</em></p> + <pre> +1> <input>string:trim("\t Hello \n").</input> +"Hello" +2> <input>string:trim(<<"\t Hello \n">>, leading).</input> +<<"Hello \n">> +3> <input>string:trim(<<".Hello.\n">>, trailing, "\n.").</input> +<<".Hello">></pre> + </desc> + </func> + + <func> + <name name="uppercase" arity="1"/> + <fsummary>Convert a string to uppercase.</fsummary> + <desc> + <p> + Converts <c><anno>String</anno></c> to uppercase. + </p> + <p>See also <seealso marker="#titlecase/1"><c>titlecase/1</c></seealso>.</p> + <p><em>Example:</em></p> + <pre> +1> <input>string:uppercase("Michał").</input> +"MICHAŁ"</pre> + </desc> + </func> + + </funcs> + + <section> + <marker id="oldapi"/> + <title>Obsolete API functions</title> + <p>Here follows the function of the old API. + These functions only work on a list of Latin-1 characters. + </p> + <note><p> + The functions are kept for backward compatibility, but are + not recommended. + They will be deprecated in Erlang/OTP 21. + </p> + <p>Any undocumented functions in <c>string</c> are not to be used.</p> + </note> + </section> + <funcs> <func> <name name="centre" arity="2"/> @@ -47,17 +652,24 @@ <p>Returns a string, where <c><anno>String</anno></c> is centered in the string and surrounded by blanks or <c><anno>Character</anno></c>. The resulting string has length <c><anno>Number</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#pad/3"><c>pad/3</c></seealso>. + </p> </desc> </func> <func> <name name="chars" arity="2"/> <name name="chars" arity="3"/> - <fsummary>Returns a string consisting of numbers of characters.</fsummary> + <fsummary>Return a string consisting of numbers of characters.</fsummary> <desc> <p>Returns a string consisting of <c><anno>Number</anno></c> characters <c><anno>Character</anno></c>. Optionally, the string can end with string <c><anno>Tail</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="lists#duplicate/2"><c>lists:duplicate/2</c></seealso>.</p> </desc> </func> @@ -69,6 +681,9 @@ <p>Returns the index of the first occurrence of <c><anno>Character</anno></c> in <c><anno>String</anno></c>. Returns <c>0</c> if <c><anno>Character</anno></c> does not occur.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#find/2"><c>find/2</c></seealso>.</p> </desc> </func> @@ -79,6 +694,16 @@ <p>Concatenates <c><anno>String1</anno></c> and <c><anno>String2</anno></c> to form a new string <c><anno>String3</anno></c>, which is returned.</p> + <p> + This function is <seealso marker="#oldapi">obsolete</seealso>. + Use <c>[<anno>String1</anno>, <anno>String2</anno>]</c> as + <c>Data</c> argument, and call + <seealso marker="unicode#characters_to_list/2"> + <c>unicode:characters_to_list/2</c></seealso> or + <seealso marker="unicode#characters_to_binary/2"> + <c>unicode:characters_to_binary/2</c></seealso> + to flatten the output. + </p> </desc> </func> @@ -88,6 +713,9 @@ <desc> <p>Returns a string containing <c><anno>String</anno></c> repeated <c><anno>Number</anno></c> times.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="lists#duplicate/2"><c>lists:duplicate/2</c></seealso>.</p> </desc> </func> @@ -98,6 +726,9 @@ <p>Returns the length of the maximum initial segment of <c><anno>String</anno></c>, which consists entirely of characters not from <c><anno>Chars</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#take/3"><c>take/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:cspan("\t abcdef", " \t"). @@ -106,20 +737,14 @@ </func> <func> - <name name="equal" arity="2"/> - <fsummary>Test string equality.</fsummary> - <desc> - <p>Returns <c>true</c> if <c><anno>String1</anno></c> and - <c><anno>String2</anno></c> are equal, otherwise <c>false</c>.</p> - </desc> - </func> - - <func> <name name="join" arity="2"/> <fsummary>Join a list of strings with separator.</fsummary> <desc> <p>Returns a string with the elements of <c><anno>StringList</anno></c> separated by the string in <c><anno>Separator</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="lists#join/2"><c>lists:join/2</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > join(["one", "two", "three"], ", "). @@ -137,6 +762,10 @@ fixed. If <c>length(<anno>String</anno>)</c> < <c><anno>Number</anno></c>, then <c><anno>String</anno></c> is padded with blanks or <c><anno>Character</anno></c>s.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#pad/2"><c>pad/2</c></seealso> or + <seealso marker="#pad/3"><c>pad/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:left("Hello",10,$.). @@ -149,6 +778,9 @@ <fsummary>Return the length of a string.</fsummary> <desc> <p>Returns the number of characters in <c><anno>String</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#length/1"><c>length/1</c></seealso>.</p> </desc> </func> @@ -160,6 +792,9 @@ <p>Returns the index of the last occurrence of <c><anno>Character</anno></c> in <c><anno>String</anno></c>. Returns <c>0</c> if <c><anno>Character</anno></c> does not occur.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#find/3"><c>find/3</c></seealso>.</p> </desc> </func> @@ -173,6 +808,9 @@ fixed. If the length of <c>(<anno>String</anno>)</c> < <c><anno>Number</anno></c>, then <c><anno>String</anno></c> is padded with blanks or <c><anno>Character</anno></c>s.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#pad/3"><c>pad/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:right("Hello", 10, $.). @@ -188,6 +826,9 @@ <c><anno>SubString</anno></c> begins in <c><anno>String</anno></c>. Returns <c>0</c> if <c><anno>SubString</anno></c> does not exist in <c><anno>String</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#find/3"><c>find/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:rstr(" Hello Hello World World ", "Hello World"). @@ -202,6 +843,9 @@ <p>Returns the length of the maximum initial segment of <c><anno>String</anno></c>, which consists entirely of characters from <c><anno>Chars</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#take/2"><c>take/2</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:span("\t abcdef", " \t"). @@ -217,6 +861,9 @@ <c><anno>SubString</anno></c> begins in <c><anno>String</anno></c>. Returns <c>0</c> if <c><anno>SubString</anno></c> does not exist in <c><anno>String</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#find/2"><c>find/2</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:str(" Hello Hello World World ", "Hello World"). @@ -230,12 +877,15 @@ <name name="strip" arity="3"/> <fsummary>Strip leading or trailing characters.</fsummary> <desc> - <p>Returns a string, where leading and/or trailing blanks or a + <p>Returns a string, where leading or trailing, or both, blanks or a number of <c><anno>Character</anno></c> have been removed. <c><anno>Direction</anno></c>, which can be <c>left</c>, <c>right</c>, or <c>both</c>, indicates from which direction blanks are to be removed. <c>strip/1</c> is equivalent to <c>strip(String, both)</c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#trim/3"><c>trim/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:strip("...Hello.....", both, $.). @@ -251,6 +901,9 @@ <p>Returns a substring of <c><anno>String</anno></c>, starting at position <c><anno>Start</anno></c> to the end of the string, or to and including position <c><anno>Stop</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#slice/3"><c>slice/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> sub_string("Hello World", 4, 8). @@ -266,6 +919,9 @@ sub_string("Hello World", 4, 8). <p>Returns a substring of <c><anno>String</anno></c>, starting at position <c><anno>Start</anno></c>, and ending at the end of the string or at length <c><anno>Length</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#slice/3"><c>slice/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > substr("Hello World", 4, 5). @@ -281,6 +937,9 @@ sub_string("Hello World", 4, 8). <p>Returns the word in position <c><anno>Number</anno></c> of <c><anno>String</anno></c>. Words are separated by blanks or <c><anno>Character</anno></c>s.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#nth_lexeme/3"><c>nth_lexeme/3</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > string:sub_word(" Hello old boy !",3,$o). @@ -289,50 +948,6 @@ sub_string("Hello World", 4, 8). </func> <func> - <name name="to_float" arity="1"/> - <fsummary>Returns a float whose text representation is the integers - (ASCII values) in a string.</fsummary> - <desc> - <p>Argument <c><anno>String</anno></c> is expected to start with a - valid text represented float (the digits are ASCII values). - Remaining characters in the string after the float are returned in - <c><anno>Rest</anno></c>.</p> - <p><em>Example:</em></p> - <code type="none"> -> {F1,Fs} = string:to_float("1.0-1.0e-1"), -> {F2,[]} = string:to_float(Fs), -> F1+F2. -0.9 -> string:to_float("3/2=1.5"). -{error,no_float} -> string:to_float("-1.5eX"). -{-1.5,"eX"}</code> - </desc> - </func> - - <func> - <name name="to_integer" arity="1"/> - <fsummary>Returns an integer whose text representation is the integers - (ASCII values) in a string.</fsummary> - <desc> - <p>Argument <c><anno>String</anno></c> is expected to start with a - valid text represented integer (the digits are ASCII values). - Remaining characters in the string after the integer are returned in - <c><anno>Rest</anno></c>.</p> - <p><em>Example:</em></p> - <code type="none"> -> {I1,Is} = string:to_integer("33+22"), -> {I2,[]} = string:to_integer(Is), -> I1-I2. -11 -> string:to_integer("0.5"). -{0,".5"} -> string:to_integer("x=2"). -{error,no_integer}</code> - </desc> - </func> - - <func> <name name="to_lower" arity="1" clause_i="1"/> <name name="to_lower" arity="1" clause_i="2"/> <name name="to_upper" arity="1" clause_i="1"/> @@ -346,6 +961,11 @@ sub_string("Hello World", 4, 8). <p>The specified string or character is case-converted. Notice that the supported character set is ISO/IEC 8859-1 (also called Latin 1); all values outside this set are unchanged</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso> use + <seealso marker="#lowercase/1"><c>lowercase/1</c></seealso>, + <seealso marker="#uppercase/1"><c>uppercase/1</c></seealso>, + <seealso marker="#titlecase/1"><c>titlecase/1</c></seealso> or + <seealso marker="#casefold/1"><c>casefold/1</c></seealso>.</p> </desc> </func> @@ -363,6 +983,9 @@ sub_string("Hello World", 4, 8). adjacent separator characters in <c><anno>String</anno></c> are treated as one. That is, there are no empty strings in the resulting list of tokens.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#lexemes/2"><c>lexemes/2</c></seealso>.</p> </desc> </func> @@ -373,6 +996,9 @@ sub_string("Hello World", 4, 8). <desc> <p>Returns the number of words in <c><anno>String</anno></c>, separated by blanks or <c><anno>Character</anno></c>.</p> + <p>This function is <seealso marker="#oldapi">obsolete</seealso>. + Use + <seealso marker="#lexemes/2"><c>lexemes/2</c></seealso>.</p> <p><em>Example:</em></p> <code type="none"> > words(" Hello old boy!", $o). @@ -387,10 +1013,7 @@ sub_string("Hello World", 4, 8). other. The reason is that this string package is the combination of two earlier packages and all functions of both packages have been retained.</p> - - <note> - <p>Any undocumented functions in <c>string</c> are not to be used.</p> - </note> </section> + </erlref> diff --git a/lib/stdlib/doc/src/supervisor.xml b/lib/stdlib/doc/src/supervisor.xml index 294196f746..bb06d3645e 100644 --- a/lib/stdlib/doc/src/supervisor.xml +++ b/lib/stdlib/doc/src/supervisor.xml @@ -133,8 +133,10 @@ sup_flags() = #{strategy => strategy(), % optional map. Assuming the values <c>MaxR</c> for <c>intensity</c> and <c>MaxT</c> for <c>period</c>, then, if more than <c>MaxR</c> restarts occur within <c>MaxT</c> seconds, the supervisor - terminates all child processes and then itself. <c>intensity</c> - defaults to <c>1</c> and <c>period</c> defaults to <c>5</c>.</p> + terminates all child processes and then itself. The termination + reason for the supervisor itself in that case will be <c>shutdown</c>. + <c>intensity</c> defaults to <c>1</c> and <c>period</c> defaults to + <c>5</c>.</p> <marker id="child_spec"/> <p>The type definition of a child specification is as follows:</p> diff --git a/lib/stdlib/doc/src/sys.xml b/lib/stdlib/doc/src/sys.xml index 0fed649660..45171f814d 100644 --- a/lib/stdlib/doc/src/sys.xml +++ b/lib/stdlib/doc/src/sys.xml @@ -4,7 +4,7 @@ <erlref> <header> <copyright> - <year>1996</year><year>2017</year> + <year>1996</year><year>2016</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> diff --git a/lib/stdlib/doc/src/unicode.xml b/lib/stdlib/doc/src/unicode.xml index 93d0d37456..382b253ba1 100644 --- a/lib/stdlib/doc/src/unicode.xml +++ b/lib/stdlib/doc/src/unicode.xml @@ -50,8 +50,35 @@ external entities where this is required. When working inside the Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when representing Unicode characters. ISO Latin-1 encoding is supported both - for backward compatibility and for communication - with external entities not supporting Unicode character sets.</p> + for backward compatibility and for communication + with external entities not supporting Unicode character sets.</p> + <p>Programs should always operate on a normalized form and compare + canonical-equivalent Unicode characters as equal. All characters + should thus be normalized to one form once on the system borders. + One of the following functions can convert characters to their + normalized forms <seealso marker="#characters_to_nfc_list/1"> + <c>characters_to_nfc_list/1</c></seealso>, + <seealso marker="#characters_to_nfc_binary/1"> + <c>characters_to_nfc_binary/1</c></seealso>, + <seealso marker="#characters_to_nfd_list/1"> + <c>characters_to_nfd_list/1</c></seealso> or + <seealso marker="#characters_to_nfd_binary/1"> + <c>characters_to_nfd_binary/1</c></seealso>. + For general text + <seealso marker="#characters_to_nfc_list/1"> + <c>characters_to_nfc_list/1</c></seealso> or + <seealso marker="#characters_to_nfc_binary/1"> + <c>characters_to_nfc_binary/1</c></seealso> is preferred, and + for identifiers one of the compatibility normalization + functions, such as + <seealso marker="#characters_to_nfkc_list/1"> + <c>characters_to_nfkc_list/1</c></seealso>, + is preferred for security reasons. + The normalization functions where introduced in OTP 20. + Additional information on normalization can be found in the + <url href="http://unicode.org/faq/normalization.html">Unicode FAQ</url>. + </p> + </description> <datatypes> @@ -335,6 +362,154 @@ decode_data(Data) -> </func> <func> + <name name="characters_to_nfc_list" arity="1"/> + <fsummary>Normalize characters to a list of canonical equivalent + composed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of canonical equivalent Composed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding. + </p> + <p>The result is a list of characters.</p> + <code> +3> unicode:characters_to_nfc_list([<<"abc..a">>,[778],$a,[776],$o,[776]]). +"abc..åäö" +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfc_binary" arity="1"/> + <fsummary>Normalize characters to a utf8 binary of canonical equivalent + composed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of canonical equivalent Composed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding.</p> + <p>The result is an utf8 encoded binary.</p> + <code> +4> unicode:characters_to_nfc_binary([<<"abc..a">>,[778],$a,[776],$o,[776]]). +<<"abc..åäö"/utf8>> +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfd_list" arity="1"/> + <fsummary>Normalize characters to a list of canonical equivalent + decomposed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of canonical equivalent Decomposed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding. + </p> + <p>The result is a list of characters.</p> + <code> +1> unicode:characters_to_nfd_list("abc..åäö"). +[97,98,99,46,46,97,778,97,776,111,776] +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfd_binary" arity="1"/> + <fsummary>Normalize characters to a utf8 binary of canonical equivalent + decomposed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of canonical equivalent Decomposed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding.</p> + <p>The result is an utf8 encoded binary.</p> + <code> +2> unicode:characters_to_nfd_binary("abc..åäö"). +<<97,98,99,46,46,97,204,138,97,204,136,111,204,136>> +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfkc_list" arity="1"/> + <fsummary>Normalize characters to a list of canonical equivalent + composed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of compatibly equivalent Composed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding. + </p> + <p>The result is a list of characters.</p> + <code> +3> unicode:characters_to_nfkc_list([<<"abc..a">>,[778],$a,[776],$o,[776],[65299,65298]]). +"abc..åäö32" +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfkc_binary" arity="1"/> + <fsummary>Normalize characters to a utf8 binary of compatibly equivalent + composed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of compatibly equivalent Composed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding.</p> + <p>The result is an utf8 encoded binary.</p> + <code> +4> unicode:characters_to_nfkc_binary([<<"abc..a">>,[778],$a,[776],$o,[776],[65299,65298]]). +<<"abc..åäö32"/utf8>> +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfkd_list" arity="1"/> + <fsummary>Normalize characters to a list of compatibly equivalent + decomposed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of compatibly equivalent Decomposed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding. + </p> + <p>The result is a list of characters.</p> + <code> +1> unicode:characters_to_nfkd_list(["abc..åäö",[65299,65298]]). +[97,98,99,46,46,97,778,97,776,111,776,51,50] +</code> + </desc> + </func> + + <func> + <name name="characters_to_nfkd_binary" arity="1"/> + <fsummary>Normalize characters to a utf8 binary of compatibly equivalent + decomposed Unicode characters.</fsummary> + <desc> + <p>Converts a possibly deep list of characters and binaries + into a Normalized Form of compatibly equivalent Decomposed + characters according to the Unicode standard.</p> + <p>Any binaries in the input must be encoded with utf8 + encoding.</p> + <p>The result is an utf8 encoded binary.</p> + <code> +2> unicode:characters_to_nfkd_binary(["abc..åäö",[65299,65298]]). +<<97,98,99,46,46,97,204,138,97,204,136,111,204,136,51,50>> +</code> + </desc> + </func> + + <func> <name name="encoding_to_bom" arity="1"/> <fsummary>Create a binary UTF byte order mark from encoding.</fsummary> <type_desc variable="Bin"> diff --git a/lib/stdlib/doc/src/unicode_usage.xml b/lib/stdlib/doc/src/unicode_usage.xml index efc8b75075..11b84f552a 100644 --- a/lib/stdlib/doc/src/unicode_usage.xml +++ b/lib/stdlib/doc/src/unicode_usage.xml @@ -62,6 +62,13 @@ <item><p>In Erlang/OTP 17.0, the encoding default for Erlang source files was switched to UTF-8.</p></item> + + <item><p>In Erlang/OTP 20.0, atoms and function can contain + Unicode characters. Module names are still restricted to + the ISO-Latin-1 range.</p> + <p>Support was added for normalizations forms in + <c>unicode</c> and the <c>string</c> module now handles + utf8-encoded binaries.</p></item> </list> <p>This section outlines the current Unicode support and gives some @@ -106,23 +113,27 @@ </item> </list> - <p>So, a conversion function must know not only one character at a time, - but possibly the whole sentence, the natural language to translate to, - the differences in input and output string length, and so on. - Erlang/OTP has currently no Unicode <c>to_upper</c>/<c>to_lower</c> - functionality, but publicly available libraries address these issues.</p> - - <p>Another example is the accented characters, where the same glyph has two - different representations. The Swedish letter "ö" is one example. - The Unicode standard has a code point for it, but you can also write it - as "o" followed by "U+0308" (Combining Diaeresis, with the simplified - meaning that the last letter is to have "¨" above). They have the same - glyph. They are for most purposes the same, but have different - representations. For example, MacOS X converts all filenames to use - Combining Diaeresis, while most other programs (including Erlang) try to - hide that by doing the opposite when, for example, listing directories. - However it is done, it is usually important to normalize such - characters to avoid confusion.</p> + <p>So, a conversion function must know not only one character at a + time, but possibly the whole sentence, the natural language to + translate to, the differences in input and output string length, + and so on. Erlang/OTP has currently no Unicode + <c>uppercase</c>/<c>lowercase</c> functionality with language + specific handling, but publicly available libraries address these + issues.</p> + + <p>Another example is the accented characters, where the same + glyph has two different representations. The Swedish letter "ö" is + one example. The Unicode standard has a code point for it, but + you can also write it as "o" followed by "U+0308" (Combining + Diaeresis, with the simplified meaning that the last letter is to + have "¨" above). They have the same glyph, user perceived + character. They are for most purposes the same, but have different + representations. For example, MacOS X converts all filenames to + use Combining Diaeresis, while most other programs (including + Erlang) try to hide that by doing the opposite when, for example, + listing directories. However it is done, it is usually important + to normalize such characters to avoid confusion. + </p> <p>The list of examples can be made long. One need a kind of knowledge that was not needed when programs only considered one or two languages. The @@ -269,7 +280,7 @@ them. In some cases functionality has been added to already existing interfaces (as the <seealso marker="stdlib:string"><c>string</c></seealso> module now can - handle lists with any code points). In some cases new + handle strings with any code points). In some cases new functionality or options have been added (as in the <seealso marker="stdlib:io"><c>io</c></seealso> module, the file handling, the <seealso @@ -339,9 +350,10 @@ <tag>The language</tag> <item> <p>Having the source code in UTF-8 also allows you to write string - literals containing Unicode characters with code points > 255, - although atoms, module names, and function names are restricted to - the ISO Latin-1 range. Binary literals, where you use type + literals, function names, and atoms containing Unicode + characters with code points > 255. + Module names are still restricted to the ISO Latin-1 range. + Binary literals, where you use type <c>/utf8</c>, can also be expressed using Unicode characters > 255. Having module names using characters other than 7-bit ASCII can cause trouble on operating systems with inconsistent file naming schemes, @@ -432,15 +444,17 @@ external_charlist() = maybe_improper_list(char() | external_unicode_binary() | <section> <title>Basic Language Support</title> - <p><marker id="unicode_in_erlang"/>As from Erlang/OTP R16, Erlang source - files can be written in UTF-8 or bytewise (<c>latin1</c>) encoding. For - information about how to state the encoding of an Erlang source file, see - the <seealso marker="stdlib:epp#encoding"><c>epp(3)</c></seealso> module. - Strings and comments can be written using Unicode, but functions must - still be named using characters from the ISO Latin-1 character set, and - atoms are restricted to the same ISO Latin-1 range. These restrictions in - the language are of course independent of the encoding of the source - file.</p> + <p><marker id="unicode_in_erlang"/>As from Erlang/OTP R16, Erlang + source files can be written in UTF-8 or bytewise (<c>latin1</c>) + encoding. For information about how to state the encoding of an + Erlang source file, see the <seealso + marker="stdlib:epp#encoding"><c>epp(3)</c></seealso> module. As + from Erlang/OTP R16, strings and comments can be written using + Unicode. As from Erlang/OTP 20, also atoms and functions can be + written using Unicode. Modules names must still be named using + characters from the ISO Latin-1 character set. (These + restrictions in the language are independent of the encoding of + the source file.)</p> <section> <title>Bit Syntax</title> @@ -970,7 +984,7 @@ Eshell V5.10.1 (abort with ^G) <p>Fortunately, most textual data has been stored in lists and range checking has been sparse, so modules like <c>string</c> work well for - Unicode lists with little need for conversion or extension.</p> + Unicode strings with little need for conversion or extension.</p> <p>Some modules are, however, changed to be explicitly Unicode-aware. These modules include:</p> @@ -1021,18 +1035,17 @@ Eshell V5.10.1 (abort with ^G) has extensive support for Unicode text.</p></item> </taglist> - <p>The <seealso marker="stdlib:string"><c>string</c></seealso> module works - perfectly for Unicode strings and ISO Latin-1 strings, except the - language-dependent functions - <seealso marker="stdlib:string#to_upper/1"><c>string:to_upper/1</c></seealso> - and - <seealso marker="stdlib:string#to_lower/1"><c>string:to_lower/1</c></seealso>, - which are only correct for the ISO Latin-1 character set. These two - functions can never function correctly for Unicode characters in their - current form, as there are language and locale issues as well as - multi-character mappings to consider when converting text between cases. - Converting case in an international environment is a large subject not - yet addressed in OTP.</p> + <p>The <seealso marker="stdlib:string"><c>string</c></seealso> + module works perfectly for Unicode strings and ISO Latin-1 + strings, except the language-dependent functions <seealso + marker="stdlib:string#uppercase/1"><c>string:uppercase/1</c></seealso> + and <seealso + marker="stdlib:string#lowercase/1"><c>string:lowercase/1</c></seealso>. + These two functions can never function correctly for Unicode + characters in their current form, as there are language and locale + issues to consider when converting text between cases. Converting + case in an international environment is a large subject not yet + addressed in OTP.</p> </section> <section> |