From 437943f77f3fb9d2a12c659e52c8d35e3e24fbc6 Mon Sep 17 00:00:00 2001
From: Richard Carlsson
Date: Fri, 2 Mar 2018 17:01:22 +0100
Subject: Improve documentation of io:format ~p when Unicode is involved

---
 lib/stdlib/doc/src/io.xml | 72 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 53 insertions(+), 19 deletions(-)

diff --git a/lib/stdlib/doc/src/io.xml b/lib/stdlib/doc/src/io.xml
index 72c774e6ef..6a7c06188b 100644
--- a/lib/stdlib/doc/src/io.xml
+++ b/lib/stdlib/doc/src/io.xml
@@ -167,11 +167,11 @@ ok
 The default padding character is ' ' (space).

-Mod is the control sequence modifier. It is either a
-single character (t, for Unicode
-translation, and l, for stopping p and
-P from detecting printable characters)
-that changes the interpretation of Data.
+Mod is the control sequence modifier. It is a
+single character that changes the interpretation of
+Data. This can be t, for Unicode translation,
+or l, for stopping p and P from
+detecting printable characters.

Available control sequences:

@@ -277,10 +277,9 @@ ok
 ~w, but breaks terms whose printed representation is longer than one line into many lines and indents each line sensibly. Left-justification is not supported.
-It also tries to detect lists of
-printable characters and to output these as strings. The
-Unicode translation modifier is used for determining
-what characters are printable, for example:
+It also tries to detect flat lists of
+printable characters and output these as strings.
+For example:

 1> T = [{attributes,[[{id,age,1.50000},{mode,explicit},
 {typename,"INTEGER"}], [{id,cho},{mode,explicit},{typename,'Cho'}]]},
@@ -302,7 +301,7 @@ ok
  {mode,implicit}]
 ok

 The field width specifies the maximum line length.
-Defaults to 80. The precision specifies the initial
+It defaults to 80. The precision specifies the initial
 indentation of the term. It defaults to the number of characters printed on this line in the same call to write/1 or
@@ -332,18 +331,53 @@ ok
 [{a,[97]}, {b,[98]}]
 ok
-Binaries that look like UTF-8 encoded strings are
-output with the string syntax if the Unicode translation
-modifier is specified:
+The Unicode translation modifier t specifies how to treat
+characters outside the Latin-1 range of codepoints, in
+atoms, strings, and binaries. For example, printing an atom
+containing a character > 255:

+
+8> io:fwrite("~p~n",[list_to_atom([1024])]).
+'\x{400}'
+ok
+9> io:fwrite("~tp~n",[list_to_atom([1024])]).
+'Ѐ'
+ok
+By default, Erlang only detects lists of characters
+in the Latin-1 range as strings, but the +pc unicode
+flag can be used to change this (see
+printable_range/0 for details). For example:

+
+10> io:fwrite("~p~n",[[214]]).
+"Ö"
+ok
+11> io:fwrite("~p~n",[[1024]]).
+[1024]
+ok
+12> io:fwrite("~tp~n",[[1024]]).
+[1024]
+ok
+
+but if Erlang was started with +pc unicode:

-9> io:fwrite("~p~n",[[1024]]).
+13> io:fwrite("~p~n",[[1024]]).
 [1024]
-10> io:fwrite("~tp~n",[[1024]]).
-"\x{400}"
-11> io:fwrite("~tp~n", [<<128,128>>]).
+ok
+14> io:fwrite("~tp~n",[[1024]]).
+"Ѐ"
+ok
+Similarly, binaries that look like UTF-8 encoded strings
+are output with the binary string syntax if the t
+modifier is specified:

+
+15> io:fwrite("~p~n", [<<208,128>>]).
+<<208,128>>
+ok
+16> io:fwrite("~tp~n", [<<208,128>>]).
+<<"Ѐ"/utf8>>
+ok
+17> io:fwrite("~tp~n", [<<128,128>>]).
 <<128,128>>
-12> io:fwrite("~tp~n", [<<208,128>>]).
-<<"\x{400}"/utf8>>
 ok
 W
-- 
cgit v1.2.3