diff options
author | Hans Bolinder <[email protected]> | 2018-03-08 15:47:42 +0100 |
---|---|---|
committer | GitHub <[email protected]> | 2018-03-08 15:47:42 +0100 |
commit | 1f90069ac153214a0510fa862361e1e31d558b91 (patch) | |
tree | 24897fb38b1c663a47f2153ea9f213ba195517d3 /lib/stdlib/doc/src/io.xml | |
parent | 88f654aa94e7a51681ad5774a0677bfa2fba77bd (diff) | |
parent | 437943f77f3fb9d2a12c659e52c8d35e3e24fbc6 (diff) | |
download | otp-1f90069ac153214a0510fa862361e1e31d558b91.tar.gz otp-1f90069ac153214a0510fa862361e1e31d558b91.tar.bz2 otp-1f90069ac153214a0510fa862361e1e31d558b91.zip |
Merge pull request #1737 from richcarl/io-format-printable-strings-doc
Improve documentation of io:format ~p when Unicode is involved
Diffstat (limited to 'lib/stdlib/doc/src/io.xml')
-rw-r--r-- | lib/stdlib/doc/src/io.xml | 72 |
1 files changed, 53 insertions, 19 deletions
diff --git a/lib/stdlib/doc/src/io.xml b/lib/stdlib/doc/src/io.xml index 72c774e6ef..6a7c06188b 100644 --- a/lib/stdlib/doc/src/io.xml +++ b/lib/stdlib/doc/src/io.xml @@ -167,11 +167,11 @@ ok</pre> The default padding character is <c>' '</c> (space).</p> </item> <item> - <p><c>Mod</c> is the control sequence modifier. It is either a - single character (<c>t</c>, for Unicode - translation, and <c>l</c>, for stopping <c>p</c> and - <c>P</c> from detecting printable characters) - that changes the interpretation of <c>Data</c>.</p> + <p><c>Mod</c> is the control sequence modifier. It is a + single character that changes the interpretation of + <c>Data</c>. This can be <c>t</c>, for Unicode translation, + or <c>l</c>, for stopping <c>p</c> and <c>P</c> from + detecting printable characters.</p> </item> </list> <p><em>Available control sequences:</em></p> @@ -277,10 +277,9 @@ ok <c>~w</c>, but breaks terms whose printed representation is longer than one line into many lines and indents each line sensibly. Left-justification is not supported. - It also tries to detect lists of - printable characters and to output these as strings. The - Unicode translation modifier is used for determining - what characters are printable, for example:</p> + It also tries to detect flat lists of + printable characters and output these as strings. + For example:</p> <pre> 1> <input>T = [{attributes,[[{id,age,1.50000},{mode,explicit},</input> <input>{typename,"INTEGER"}], [{id,cho},{mode,explicit},{typename,'Cho'}]]},</input> @@ -302,7 +301,7 @@ ok {mode,implicit}] ok</pre> <p>The field width specifies the maximum line length. - Defaults to 80. The precision specifies the initial + It defaults to 80. The precision specifies the initial indentation of the term. It defaults to the number of characters printed on this line in the <em>same</em> call to <seealso marker="#write/1"><c>write/1</c></seealso> or @@ -332,18 +331,53 @@ ok [{a,[97]}, {b,[98]}] ok</pre> - <p>Binaries that look like UTF-8 encoded strings are - output with the string syntax if the Unicode translation - modifier is specified:</p> + <p>The Unicode translation modifier <c>t</c> specifies how to treat + characters outside the Latin-1 range of codepoints, in + atoms, strings, and binaries. For example, printing an atom + containing a character > 255:</p> + <pre> +8> <input>io:fwrite("~p~n",[list_to_atom([1024])]).</input> +'\x{400}' +ok +9> <input>io:fwrite("~tp~n",[list_to_atom([1024])]).</input> +'Ѐ' +ok</pre> + <p>By default, Erlang only detects lists of characters + in the Latin-1 range as strings, but the <c>+pc unicode</c> + flag can be used to change this (see <seealso + marker="#printable_range/0"> + <c>printable_range/0</c></seealso> for details). For example:</p> + <pre> +10> <input>io:fwrite("~p~n",[[214]]).</input> +"Ö" +ok +11> <input>io:fwrite("~p~n",[[1024]]).</input> +[1024] +ok +12> <input>io:fwrite("~tp~n",[[1024]]).</input> +[1024] +ok +</pre> + <p>but if Erlang was started with <c>+pc unicode</c>:</p> <pre> -9> <input>io:fwrite("~p~n",[[1024]]).</input> +13> <input>io:fwrite("~p~n",[[1024]]).</input> [1024] -10> <input>io:fwrite("~tp~n",[[1024]]).</input> -"\x{400}" -11> <input>io:fwrite("~tp~n", [<<128,128>>]).</input> +ok +14> <input>io:fwrite("~tp~n",[[1024]]).</input> +"Ѐ" +ok</pre> + <p>Similarly, binaries that look like UTF-8 encoded strings + are output with the binary string syntax if the <c>t</c> + modifier is specified:</p> + <pre> +15> <input>io:fwrite("~p~n", [<<208,128>>]).</input> +<<208,128>> +ok +16> <input>io:fwrite("~tp~n", [<<208,128>>]).</input> +<<"Ѐ"/utf8>> +ok +17> <input>io:fwrite("~tp~n", [<<128,128>>]).</input> <<128,128>> -12> <input>io:fwrite("~tp~n", [<<208,128>>]).</input> -<<"\x{400}"/utf8>> ok</pre> </item> <tag><c>W</c></tag> |