diff options
author | José Valim <[email protected]> | 2016-05-31 14:28:54 +0200 |
---|---|---|
committer | José Valim <[email protected]> | 2017-01-30 15:24:05 +0100 |
commit | 26b59dfe67ef551cd94765557cdd8c79794bcc38 (patch) | |
tree | 696adc07b3e7a4a3f1ed6c52311ff6e163b218b4 /erts/doc | |
parent | 6c7539b0e39996f870385e5276e08c0dd98b6eb8 (diff) | |
download | otp-26b59dfe67ef551cd94765557cdd8c79794bcc38.tar.gz otp-26b59dfe67ef551cd94765557cdd8c79794bcc38.tar.bz2 otp-26b59dfe67ef551cd94765557cdd8c79794bcc38.zip |
Add new AtU8 beam chunk
The new chunk stores atoms encoded in UTF-8.
beam_lib has also been modified to handle the new
'utf8_atoms' attribute while the 'atoms' attribute
may be a missing chunk from now on.
The binary_to_atom/2 BIF can now encode any utf8
binary with up to 255 characters.
The list_to_atom/1 BIF can now accept codepoints
higher than 255 with up to 255 characters (thanks
to Björn Gustavsson).
Diffstat (limited to 'erts/doc')
-rw-r--r-- | erts/doc/src/erl_ext_dist.xml | 15 | ||||
-rw-r--r-- | erts/doc/src/erlang.xml | 35 |
2 files changed, 18 insertions, 32 deletions
diff --git a/erts/doc/src/erl_ext_dist.xml b/erts/doc/src/erl_ext_dist.xml index 4f799f8f34..a436a9ca74 100644 --- a/erts/doc/src/erl_ext_dist.xml +++ b/erts/doc/src/erl_ext_dist.xml @@ -119,16 +119,11 @@ <tcaption>Compressed Data Format when Expanded</tcaption></table> <marker id="utf8_atoms"/> <note> - <p>As from ERTS 5.10 (OTP R16) support - for UTF-8 encoded atoms has been introduced in the external format. - However, only characters that can be encoded using Latin-1 (ISO-8859-1) - are currently supported in atoms. The support for UTF-8 encoded atoms - in the external format has been implemented to be able to support - all Unicode characters in atoms in <em>some future release</em>. - Until full Unicode support for atoms has been introduced, - it is an <em>error</em> to pass atoms containing - characters that cannot be encoded in Latin-1, and <em>the behavior is - undefined</em>.</p> + <p>As from ERTS 9.0 (OTP 20), UTF-8 encoded atoms may contain any Unicode + character. Although the support for UTF-8 encoded atoms in the external + format is available since ERTS 5.10 (OTP R16), passing atoms that cannot + be encoded in Latin-1 is an <em>error</em> in versions earlier than + Erlang/OTP 20, and <em>the behavior is undefined</em>.</p> <p>When distribution flag <seealso marker="erl_dist_protocol#dflags"> <c>DFLAG_UTF8_ATOMS</c></seealso> has been exchanged between both nodes in the <seealso marker="erl_dist_protocol#distribution_handshake"> diff --git a/erts/doc/src/erlang.xml b/erts/doc/src/erlang.xml index b3fab3874b..cf038c49f0 100644 --- a/erts/doc/src/erlang.xml +++ b/erts/doc/src/erlang.xml @@ -325,16 +325,11 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code> is <c>latin1</c>, one byte exists for each character in the text representation. If <c><anno>Encoding</anno></c> is <c>utf8</c> or - <c>unicode</c>, the characters are encoded using UTF-8 - (that is, characters from 128 through 255 are - encoded in two bytes).</p> + <c>unicode</c>, the characters are encoded using UTF-8 where + characters may require multiple bytes.</p> <note> - <p><c>atom_to_binary(<anno>Atom</anno>, latin1)</c> never - fails, as the text representation of an atom can only - contain characters from 0 through 255. In a future release, - the text representation - of atoms can be allowed to contain any Unicode character and - <c>atom_to_binary(<anno>Atom</anno>, latin1)</c> then fails if the + <p>As from Erlang/OTP 20, atoms can contain any Unicode character + and <c>atom_to_binary(<anno>Atom</anno>, latin1)</c> may fail if the text representation for <c><anno>Atom</anno></c> contains a Unicode character > 255.</p> </note> @@ -402,13 +397,11 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code> translation of bytes in the binary is done. If <c><anno>Encoding</anno></c> is <c>utf8</c> or <c>unicode</c>, the binary must contain - valid UTF-8 sequences. Only Unicode characters up - to 255 are allowed.</p> + valid UTF-8 sequences.</p> <note> - <p><c>binary_to_atom(<anno>Binary</anno>, utf8)</c> fails if - the binary contains Unicode characters > 255. - In a future release, such Unicode characters can be allowed and - <c>binary_to_atom(<anno>Binary</anno>, utf8)</c> does then not fail. + <p>As from Erlang/OTP 20, <c>binary_to_atom(<anno>Binary</anno>, utf8)</c> + is capable of encoding any Unicode character. Earlier versions would + fail if the binary contained Unicode characters > 255. For more information about Unicode support in atoms, see the <seealso marker="erl_ext_dist#utf8_atoms">note on UTF-8 encoded atoms</seealso> @@ -419,9 +412,7 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code> > <input>binary_to_atom(<<"Erlang">>, latin1).</input> 'Erlang' > <input>binary_to_atom(<<1024/utf8>>, utf8).</input> -** exception error: bad argument - in function binary_to_atom/2 - called as binary_to_atom(<<208,128>>,utf8)</pre> +'Ѐ'</pre> </desc> </func> @@ -2401,10 +2392,10 @@ os_prompt%</pre> <desc> <p>Returns the atom whose text representation is <c><anno>String</anno></c>.</p> - <p><c><anno>String</anno></c> can only contain ISO-latin-1 - characters (that is, numbers < 256) as the implementation does not - allow Unicode characters equal to or above 256 in atoms. - For more information on Unicode support in atoms, see + <p>As from Erlang/OTP 20, <c><anno>String</anno></c> may contain + any Unicode character. Earlier versions allowed only ISO-latin-1 + characters as the implementation did not allow Unicode characters + above 255. For more information on Unicode support in atoms, see <seealso marker="erl_ext_dist#utf8_atoms">note on UTF-8 encoded atoms</seealso> in section "External Term Format" in the User's Guide.</p> |