diff options
author | Björn Gustavsson <bjorn@erlang.org> | 2017-02-01 16:25:47 +0100 |
---|---|---|
committer | Björn Gustavsson <bjorn@erlang.org> | 2017-02-02 12:18:54 +0100 |
commit | db442323e9e86528edeb7226d55404e290b088b3 (patch) | |
tree | 58248d652e7878947f2e75d1627a9fccecd35bb9 /lib/stdlib/src/io_lib_format.erl | |
parent | cce3120dd0021c5ab5bf8d5b4088e7364f678dda (diff) | |
download | otp-db442323e9e86528edeb7226d55404e290b088b3.tar.gz otp-db442323e9e86528edeb7226d55404e290b088b3.tar.bz2 otp-db442323e9e86528edeb7226d55404e290b088b3.zip |
Make "~s" fail for Unicode atoms
26b59dfe67e introduced support for arbitrary Unicode characters in
atoms. After that commit, it is possible to print any atom with
a "~s" format string:
1> io:format("~s\n", ['спутник']).
спутник
Note that the same text as a string will fail:
2> io:format("~s\n", ["спутник"]).
** exception error: bad argument
in function io:format/3
called as io:format(<0.53.0>,"~s\n",
[[1089,1087,1091,1090,1085,1080,1082]])
Being more permissive for atoms is probably beneficial for io:format/2.
However, for io_lib:format/2, the new behavior breaks this guarantee
in the documentation for io_lib:format/2:
If and only if the Unicode translation modifier is used in
the format string (that is, ~ts or ~tc), the resulting list
can contain characters beyond the ISO Latin-1 character range
(that is, numbers > 255).
The problem is that you can no longer be sure whether io_lib:format/2
will return an iolist that can be successfully passed to a port
or iolist_to_binary/1.
We see three solutions:
1. Keep the new behavior. That means that you can get non-iolist data
when you use ~s for printing an atom, but a 'badarg' when printing
Unicode strings. That is inconsistent, and it delays error detection
if the result is passed to a port or iolist_to_binary/1.
2. Always allow Unicode characters for ~s. That would be incompatible,
because ~s says that any binary is encoded in latin1, while ~ts says
that any binary is encoded in UTF-8. To implement this solution, we
could no longer support latin1 binaries; all binaries would have to
be encoded in UTF-8.
3. Only allow ~s for atoms where all characters are less than 256.
Require ~ts to print atoms such as 'спутник'.
We reject solution 1 because it is slightly incompatible and is
inconsistent.
We reject solution 2 because it too incompatible.
Therefore, this commit implements solution 3.
Diffstat (limited to 'lib/stdlib/src/io_lib_format.erl')
-rw-r--r-- | lib/stdlib/src/io_lib_format.erl | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/lib/stdlib/src/io_lib_format.erl b/lib/stdlib/src/io_lib_format.erl index c7b75961cb..3113767614 100644 --- a/lib/stdlib/src/io_lib_format.erl +++ b/lib/stdlib/src/io_lib_format.erl @@ -265,7 +265,10 @@ control($W, [A,Depth], F, Adj, P, Pad, _Enc, _Str, _I) when is_integer(Depth) -> term(io_lib:write(A, Depth), F, Adj, P, Pad); control($P, [A,Depth], F, Adj, P, Pad, Enc, Str, I) when is_integer(Depth) -> print(A, Depth, F, Adj, P, Pad, Enc, Str, I); -control($s, [A], F, Adj, P, Pad, _Enc, _Str, _I) when is_atom(A) -> +control($s, [A], F, Adj, P, Pad, latin1, _Str, _I) when is_atom(A) -> + L = iolist_to_chars(atom_to_list(A)), + string(L, F, Adj, P, Pad); +control($s, [A], F, Adj, P, Pad, unicode, _Str, _I) when is_atom(A) -> string(atom_to_list(A), F, Adj, P, Pad); control($s, [L0], F, Adj, P, Pad, latin1, _Str, _I) -> L = iolist_to_chars(L0), |