diff options
Diffstat (limited to 'lib/stdlib/doc/src/re.xml')
-rw-r--r-- | lib/stdlib/doc/src/re.xml | 135 |
1 files changed, 50 insertions, 85 deletions
diff --git a/lib/stdlib/doc/src/re.xml b/lib/stdlib/doc/src/re.xml index 9091035392..18867cfb68 100644 --- a/lib/stdlib/doc/src/re.xml +++ b/lib/stdlib/doc/src/re.xml @@ -59,28 +59,24 @@ </description> - <section> - <title>DATA TYPES</title> - <code type="none"> - iodata() = iolist() | binary() - iolist() = [char() | binary() | iolist()] - - a binary is allowed as the tail of the list</code> - <code type="none"> - unicode_binary() = binary() with characters encoded in UTF-8 coding standard - unicode_char() = integer() representing a valid unicode codepoint - - chardata() = charlist() | unicode_binary() - - charlist() = [unicode_char() | unicode_binary() | charlist()] - - a unicode_binary is allowed as the tail of the list</code> - - <code type="none"> - mp() = Opaque datatype containing a compiled regular expression. - - The mp() is guaranteed to be a tuple() having the atom + <datatypes> + <datatype> + <name name="mp"/> + <desc> + <p>Opaque datatype containing a compiled regular expression. + The mp() is guaranteed to be a tuple() having the atom 're_pattern' as its first element, to allow for matching in guards. The arity of the tuple() or the content of the other fields - may change in future releases.</code> - </section> + may change in future releases.</p> + </desc> + </datatype> + <datatype> + <name name="nl_spec"/> + </datatype> + <datatype> + <name name="compile_option"/> + </datatype> + </datatypes> <funcs> <func> <name>compile(Regexp) -> {ok, MP} | {error, ErrSpec}</name> @@ -96,14 +92,14 @@ <name>compile(Regexp,Options) -> {ok, MP} | {error, ErrSpec}</name> <fsummary>Compile a regular expression into a match program</fsummary> <type> - <v>Regexp = iodata() | charlist()</v> + <v>Regexp = iodata() | <seealso marker="unicode#type-charlist">io:charlist()</seealso></v> <v>Options = [ Option ]</v> - <v>Option = unicode | anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, NLSpec}| bsr_anycrlf | bsr_unicode</v> - <v>NLSpec = cr | crlf | lf | anycrlf | any </v> - <v>MP = mp()</v> + <v>Option = <seealso marker="#type-compile_option">compile_option()</seealso></v> + <v>NLSpec = <seealso marker="#type-nl_spec">nl_spec()</seealso></v> + <v>MP = <seealso marker="#type-mp">mp()</seealso></v> <v>ErrSpec = {ErrString, Position}</v> <v>ErrString = string()</v> - <v>Position = int()</v> + <v>Position = non_neg_integer()</v> </type> <desc> <p>This function compiles a regular expression with the syntax @@ -116,7 +112,7 @@ time one wants to match.</p> <p>When the unicode option is given, the regular expression should be given as a valid unicode <c>charlist()</c>, otherwise as any valid <c>iodata()</c>.</p> - <p>The options have the following meanings:</p> + <p><marker id="compile_options"/>The options have the following meanings:</p> <taglist> <tag><c>unicode</c></tag> <item>The regular expression is given as a unicode <c>charlist()</c> and the resulting regular expression code is to be run against a valid unicode <c>charlist()</c> subject.</item> @@ -137,7 +133,7 @@ This option makes it possible to include comments inside complicated patterns. N <tag><c>multiline</c></tag> <item><p>By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless <c>dollar_endonly</c> is given). This is the same as Perl.</p> -<p>When <c>multiline</c> it is given, the "start of line" and "end of line" constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting <c>multiline</c> has no effect.</p> </item> +<p>When <c>multiline</c> is given, the "start of line" and "end of line" constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting <c>multiline</c> has no effect.</p> </item> <tag><c>no_auto_capture</c></tag> <item>Disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl. </item> @@ -173,10 +169,10 @@ This option makes it possible to include comments inside complicated patterns. N <name>run(Subject,RE) -> {match, Captured} | nomatch</name> <fsummary>Match a subject against regular expression and capture subpatterns</fsummary> <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata() | charlist()</v> + <v>Subject = iodata() | <seealso marker="unicode#type-charlist">io:charlist()</seealso></v> + <v>RE = <seealso marker="#type-mp">mp()</seealso> | iodata()</v> <v>Captured = [ CaptureData ]</v> - <v>CaptureData = {int(),int()}</v> + <v>CaptureData = {integer(),integer()}</v> </type> <desc> <p>The same as <c>run(Subject,RE,[])</c>.</p> @@ -186,18 +182,19 @@ This option makes it possible to include comments inside complicated patterns. N <name>run(Subject,RE,Options) -> {match, Captured} | match | nomatch</name> <fsummary>Match a subject against regular expression and capture subpatterns</fsummary> <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata() | charlist()</v> + <v>Subject = iodata() | <seealso marker="unicode#type-charlist">io:charlist()</seealso></v> + <v>RE = <seealso marker="#type-mp">mp()</seealso> | iodata() | <seealso marker="unicode#type-charlist">io:charlist()</seealso></v> <v>Options = [ Option ]</v> - <v>Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | bsr_anycrlf | bsr_unicode | {capture, ValueSpec} | {capture, ValueSpec, Type} | CompileOpt</v> + <v>Option = anchored | global | notbol | noteol | notempty | {offset, integer() >= 0} | {newline, NLSpec} | bsr_anycrlf | bsr_unicode | {capture, ValueSpec} | {capture, ValueSpec, Type} | CompileOpt</v> <v>Type = index | list | binary</v> <v>ValueSpec = all | all_but_first | first | none | ValueList</v> <v>ValueList = [ ValueID ]</v> - <v>ValueID = int() | string() | atom()</v> - <v>CompileOpt = see compile/2 above</v> - <v>NLSpec = cr | crlf | lf | anycrlf | any</v> + <v>ValueID = integer() | string() | atom()</v> + <v>CompileOpt = <seealso marker="#type-compile_option">compile_option()</seealso></v> + <d>See <seealso marker="#compile_options">compile/2</seealso> above.</d> + <v>NLSpec = <seealso marker="#type-nl_spec">nl_spec()</seealso></v> <v>Captured = [ CaptureData ] | [ [ CaptureData ] ... ]</v> - <v>CaptureData = {int(),int()} | ListConversionData | binary()</v> + <v>CaptureData = {integer(),integer()} | ListConversionData | binary()</v> <v>ListConversionData = string() | {error, string(), binary()} | {incomplete, string(), binary()}</v> </type> <desc> @@ -217,7 +214,7 @@ This option makes it possible to include comments inside complicated patterns. N <p>If the regular expression is previously compiled, the option list can only contain the options <c>anchored</c>, <c>global</c>, <c>notbol</c>, <c>noteol</c>, - <c>notempty</c>, <c>{offset, int()}</c>, <c>{newline, + <c>notempty</c>, <c>{offset, integer() >= 0}</c>, <c>{newline, NLSpec}</c> and <c>{capture, ValueSpec}/{capture, ValueSpec, Type}</c>. Otherwise all options valid for the <c>re:compile/2</c> function are allowed as well. Options @@ -360,7 +357,7 @@ This option makes it possible to include comments inside complicated patterns. N behavior of the dollar metacharacter. It does not affect \Z or \z.</item> - <tag><c>{offset, int()}</c></tag> + <tag><c>{offset, integer() >= 0}</c></tag> <item>Start matching at the offset (position) given in the subject string. The offset is zero-based, so that the default is @@ -503,42 +500,27 @@ This option makes it possible to include comments inside complicated patterns. N </desc> </func> <func> - <name>replace(Subject,RE,Replacement) -> iodata() | charlist()</name> + <name name="replace" arity="3"/> <fsummary>Match a subject against regular expression and replace matching elements with Replacement</fsummary> - <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata()</v> - <v>Replacement = iodata() | charlist()</v> - </type> <desc> - <p>The same as <c>replace(Subject,RE,Replacement,[])</c>.</p> + <p>The same as <c>replace(<anno>Subject</anno>,<anno>RE</anno>,<anno>Replacement</anno>,[])</c>.</p> </desc> </func> <func> - <name>replace(Subject,RE,Replacement,Options) -> iodata() | charlist() | binary() | list()</name> + <name name="replace" arity="4"/> <fsummary>Match a subject against regular expression and replace matching elements with Replacement</fsummary> - <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata() | charlist()</v> - <v>Replacement = iodata() | charlist()</v> - <v>Options = [ Option ]</v> - <v>Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | bsr_anycrlf | bsr_unicode | {return, ReturnType} | CompileOpt</v> - <v>ReturnType = iodata | list | binary</v> - <v>CompileOpt = see compile/2 above</v> - <v>NLSpec = cr | crlf | lf | anycrlf | any </v> - </type> <desc> - <p>Replaces the matched part of the <c>Subject</c> string with the contents of <c>Replacement</c>.</p> + <p>Replaces the matched part of the <c><anno>Subject</anno></c> string with the contents of <c><anno>Replacement</anno></c>.</p> <p>The permissible options are the same as for <c>re:run/3</c>, except that the <c>capture</c> option is not allowed. - Instead a <c>{return, ReturnType}</c> is present. The default return type is <c>iodata</c>, constructed in a + Instead a <c>{return, <anno>ReturnType</anno>}</c> is present. The default return type is <c>iodata</c>, constructed in a way to minimize copying. The <c>iodata</c> result can be used directly in many i/o-operations. If a flat <c>list()</c> is desired, specify <c>{return, list}</c> and if a binary is preferred, specify <c>{return, binary}</c>.</p> <p>As in the <c>re:run/3</c> function, an <c>mp()</c> compiled - with the <c>unicode</c> option requires the <c>Subject</c> to be + with the <c>unicode</c> option requires the <c><anno>Subject</anno></c> to be a Unicode <c>charlist()</c>. If compilation is done implicitly and the <c>unicode</c> compilation option is given to this - function, both the regular expression and the <c>Subject</c> + function, both the regular expression and the <c><anno>Subject</anno></c> should be given as valid Unicode <c>charlist()</c>s.</p> <p>The replacement string can contain the special character @@ -565,34 +547,17 @@ This option makes it possible to include comments inside complicated patterns. N </desc> </func> <func> - <name>split(Subject,RE) -> SplitList</name> + <name name="split" arity="2"/> <fsummary>Split a string by tokens specified as a regular expression</fsummary> - <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata()</v> - <v>SplitList = [ iodata() | charlist() ]</v> - </type> <desc> - <p>The same as <c>split(Subject,RE,[])</c>.</p> + <p>The same as <c>split(<anno>Subject</anno>,<anno>RE</anno>,[])</c>.</p> </desc> </func> <func> - <name>split(Subject,RE,Options) -> SplitList</name> + <name name="split" arity="3"/> <fsummary>Split a string by tokens specified as a regular expression</fsummary> - <type> - <v>Subject = iodata() | charlist()</v> - <v>RE = mp() | iodata() | charlist()</v> - <v>Options = [ Option ]</v> - <v>Option = anchored | global | notbol | noteol | notempty | {offset, int()} | {newline, NLSpec} | bsr_anycrlf | bsr_unicode | {return, ReturnType} | {parts, NumParts} | group | trim | CompileOpt</v> - <v>NumParts = int() | infinity</v> - <v>ReturnType = iodata | list | binary</v> - <v>CompileOpt = see compile/2 above</v> - <v>NLSpec = cr | crlf | lf | anycrlf | any </v> - <v>SplitList = [ RetData ] | [ GroupedRetData ]</v> - <v>GroupedRetData = [ RetData ]</v> - <v>RetData = iodata() | charlist() | binary() | list()</v> - </type> + <type_desc variable="CompileOpt">See <seealso marker="#compile_options">compile/2</seealso> above.</type_desc> <desc> <p>This function splits the input into parts by finding tokens according to the regular expression supplied.</p> @@ -602,10 +567,10 @@ This option makes it possible to include comments inside complicated patterns. N of the string is removed from the output.</p> <p>As in the <c>re:run/3</c> function, an <c>mp()</c> compiled - with the <c>unicode</c> option requires the <c>Subject</c> to be + with the <c>unicode</c> option requires the <c><anno>Subject</anno></c> to be a Unicode <c>charlist()</c>. If compilation is done implicitly and the <c>unicode</c> compilation option is given to this - function, both the regular expression and the <c>Subject</c> + function, both the regular expression and the <c><anno>Subject</anno></c> should be given as valid Unicode <c>charlist()</c>s.</p> <p>The result is given as a list of "strings", the @@ -722,7 +687,7 @@ This option makes it possible to include comments inside complicated patterns. N <p>Summary of options not previously described for the <c>re:run/3</c> function:</p> <taglist> - <tag>{return,ReturnType}</tag> + <tag>{return,<anno>ReturnType</anno>}</tag> <item><p>Specifies how the parts of the original string are presented in the result list. The possible types are:</p> <taglist> <tag>iodata</tag> |