aboutsummaryrefslogtreecommitdiffstats
path: root/lib/stdlib/doc/src
diff options
context:
space:
mode:
Diffstat (limited to 'lib/stdlib/doc/src')
-rw-r--r--lib/stdlib/doc/src/io.xml3
-rw-r--r--lib/stdlib/doc/src/io_lib.xml70
-rw-r--r--lib/stdlib/doc/src/re.xml21
3 files changed, 84 insertions, 10 deletions
diff --git a/lib/stdlib/doc/src/io.xml b/lib/stdlib/doc/src/io.xml
index a28180b42a..8ebfdb2e7f 100644
--- a/lib/stdlib/doc/src/io.xml
+++ b/lib/stdlib/doc/src/io.xml
@@ -505,7 +505,8 @@ ok
<p>Writes the data with standard syntax in the same way as
<c>~w</c>, but breaks terms whose printed representation
is longer than one line into many lines and indents each
- line sensibly. It also tries to detect lists of
+ line sensibly. Left justification is not supported.
+ It also tries to detect lists of
printable characters and to output these as strings. The
Unicode translation modifier is used for determining
what characters are printable. For example:</p>
diff --git a/lib/stdlib/doc/src/io_lib.xml b/lib/stdlib/doc/src/io_lib.xml
index 3312b08064..2117d66381 100644
--- a/lib/stdlib/doc/src/io_lib.xml
+++ b/lib/stdlib/doc/src/io_lib.xml
@@ -4,7 +4,7 @@
<erlref>
<header>
<copyright>
- <year>1996</year><year>2013</year>
+ <year>1996</year><year>2014</year>
<holder>Ericsson AB. All Rights Reserved.</holder>
</copyright>
<legalnotice>
@@ -59,6 +59,35 @@
<datatype>
<name name="latin1_string"/>
</datatype>
+ <datatype>
+ <name name="format_spec"/>
+ <desc><p>Description:</p>
+ <list type="bulleted">
+ <item><p><c>control_char</c> is the type of control
+ sequence: <c>$P</c>, <c>$w</c>, and so on;</p>
+ </item>
+ <item><p><c>args</c> is a list of the arguments used by the
+ control sequence, or an empty list if the control sequence
+ does not take any arguments;</p>
+ </item>
+ <item><p><c>width</c> is the field width;</p>
+ </item>
+ <item><p><c>adjust</c> is the adjustment;</p>
+ </item>
+ <item><p><c>precision</c> is the precision of the printed
+ argument;</p>
+ </item>
+ <item><p><c>pad_char</c> is the padding character;</p>
+ </item>
+ <item><p><c>encoding</c> is set to <c>true</c> if the translation
+ modifier <c>t</c> is present;</p>
+ </item>
+ <item><p><c>strings</c> is set to <c>false</c> if the modifier
+ <c>l</c> is present.</p>
+ </item>
+ </list>
+ </desc>
+ </datatype>
</datatypes>
<funcs>
<func>
@@ -260,6 +289,45 @@
</desc>
</func>
<func>
+ <name name="scan_format" arity="2"/>
+ <fsummary>Parse all control sequences in the format string</fsummary>
+ <desc>
+ <p>Returns a list corresponding to the given format string,
+ where control sequences have been replaced with
+ corresponding tuples. This list can be passed to <seealso
+ marker="#build_text/1">io_lib:build_text/1</seealso> to have
+ the same effect as <c>io_lib:format(Format, Args)</c>, or to
+ <seealso
+ marker="#unscan_format/1">io_lib:unscan_format/1</seealso>
+ in order to get the corresponding pair of <c>Format</c> and
+ <c>Args</c> (with every <c>*</c> and corresponding argument
+ expanded to numeric values).</p>
+ <p>A typical use of this function is to replace unbounded-size
+ control sequences like <c>~w</c> and <c>~p</c> with the
+ depth-limited variants <c>~W</c> and <c>~P</c> before
+ formatting to text, e.g. in a logger.</p>
+ </desc>
+ </func>
+ <func>
+ <name name="unscan_format" arity="1"/>
+ <fsummary>Revert a pre-parsed format list to a plain character list
+ and a list of arguments</fsummary>
+ <desc>
+ <p>See <seealso
+ marker="#scan_format/2">io_lib:scan_format/2</seealso> for
+ details.</p>
+ </desc>
+ </func>
+ <func>
+ <name name="build_text" arity="1"/>
+ <fsummary>Build the output text for a pre-parsed format list</fsummary>
+ <desc>
+ <p>See <seealso
+ marker="#scan_format/2">io_lib:scan_format/2</seealso> for
+ details.</p>
+ </desc>
+ </func>
+ <func>
<name name="indentation" arity="2"/>
<fsummary>Indentation after printing string</fsummary>
<desc>
diff --git a/lib/stdlib/doc/src/re.xml b/lib/stdlib/doc/src/re.xml
index a1833f6a51..5af1468e9b 100644
--- a/lib/stdlib/doc/src/re.xml
+++ b/lib/stdlib/doc/src/re.xml
@@ -150,7 +150,11 @@ This option makes it possible to include comments inside complicated patterns. N
<tag><c>no_start_optimize</c></tag>
<item>This option disables optimization that may malfunction if "Special start-of-pattern items" are present in the regular expression. A typical example would be when matching "DEFABC" against "(*COMMIT)ABC", where the start optimization of PCRE would skip the subject up to the "A" and would never realize that the (*COMMIT) instruction should have made the matching fail. This option is only relevant if you use "start-of-pattern items", as discussed in the section "PCRE regular expression details" below.</item>
<tag><c>ucp</c></tag>
- <item>Specifies that Unicode Character Properties should be used when resolving \B, \b, \D, \d, \S, \s, \Wand \w. Without this flag, only ISO-Latin-1 properties are used. Using Unicode properties hurts performance, but is semantically correct when working with Unicode characters beyond the ISO-Latin-1 range.</item>
+ <item>Specifies that Unicode Character Properties should be used when
+ resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, only
+ ISO-Latin-1 properties are used. Using Unicode properties hurts
+ performance, but is semantically correct when working with Unicode
+ characters beyond the ISO-Latin-1 range.</item>
<tag><c>never_utf</c></tag>
<item>Specifies that the (*UTF) and/or (*UTF8) "start-of-pattern items" are forbidden. This flag can not be combined with <c>unicode</c>. Useful if ISO-Latin-1 patterns from an external source are to be compiled.</item>
</taglist>
@@ -966,7 +970,7 @@ appearance causes an error.
</quote>
<p>This has the same effect as setting the <c>ucp</c> option: it causes sequences
such as \d and \w to use Unicode properties to determine character types,
-instead of recognizing only characters with codes less than 128 via a lookup
+instead of recognizing only characters with codes less than 256 via a lookup
table.
</p>
@@ -1307,7 +1311,8 @@ By default, the definition of letters and digits is controlled by PCRE's
low-valued character tables, in Erlang's case (and without the <c>unicode</c> option),
the ISO-Latin-1 character set.</p>
-<p>By default, in <c>unicode</c> mode, characters with values greater than 128 never match
+<p>By default, in <c>unicode</c> mode, characters with values greater than 255,
+i.e. all characters outside the ISO-Latin-1 character set, never match
\d, \s, or \w, and always match \D, \S, and \W. These sequences retain
their original meanings from before UTF support was available, mainly for
efficiency reasons. However, if the <c>ucp</c> option is set, the behaviour is changed so that Unicode
@@ -1954,10 +1959,10 @@ can be included in a class as a literal string of data units, or by using the
upper case and lower case versions, so for example, a caseless [aeiou] matches
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
caseful version would. In a UTF mode, PCRE always understands the concept of
-case for characters whose values are less than 128, so caseless matching is
+case for characters whose values are less than 256, so caseless matching is
always possible. For characters with higher values, the concept of case is
supported if PCRE is compiled with Unicode property support, but not otherwise.
-If you want to use caseless matching in a UTF mode for characters 128 and
+If you want to use caseless matching in a UTF mode for characters 256 and
above, you must ensure that PCRE is compiled with Unicode property support as
well as with UTF support.</p>
@@ -1989,7 +1994,7 @@ matches the letters in either case. For example, [W-c] is equivalent to
[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
tables for a French locale are in use, [\xc8-\xcb] matches accented E
characters in both cases. In UTF modes, PCRE supports the concept of case for
-characters with values greater than 128 only when it is compiled with Unicode
+characters with values greater than 255 only when it is compiled with Unicode
property support.</p>
<p>The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
@@ -2062,7 +2067,7 @@ by a ^ character after the colon. For example,</p>
syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
supported, and an error is given if they are encountered.</p>
-<p>By default, in UTF modes, characters with values greater than 128 do not match
+<p>By default, in UTF modes, characters with values greater than 255 do not match
any of the POSIX character classes. However, if the PCRE_UCP option is passed
to <b>pcre_compile()</b>, some of the classes are changed so that Unicode
character properties are used. This is achieved by replacing the POSIX classes
@@ -2081,7 +2086,7 @@ by other sequences, as follows:</p>
<p>Negated versions, such as [:^alpha:] use \P instead of \p. The other POSIX
classes are unchanged, and match only characters with code points less than
-128.</p>
+256.</p>
</section>