aboutsummaryrefslogtreecommitdiffstats
path: root/erts/doc
diff options
context:
space:
mode:
authorJosé Valim <[email protected]>2016-05-31 14:28:54 +0200
committerJosé Valim <[email protected]>2017-01-30 15:24:05 +0100
commit26b59dfe67ef551cd94765557cdd8c79794bcc38 (patch)
tree696adc07b3e7a4a3f1ed6c52311ff6e163b218b4 /erts/doc
parent6c7539b0e39996f870385e5276e08c0dd98b6eb8 (diff)
downloadotp-26b59dfe67ef551cd94765557cdd8c79794bcc38.tar.gz
otp-26b59dfe67ef551cd94765557cdd8c79794bcc38.tar.bz2
otp-26b59dfe67ef551cd94765557cdd8c79794bcc38.zip
Add new AtU8 beam chunk
The new chunk stores atoms encoded in UTF-8. beam_lib has also been modified to handle the new 'utf8_atoms' attribute while the 'atoms' attribute may be a missing chunk from now on. The binary_to_atom/2 BIF can now encode any utf8 binary with up to 255 characters. The list_to_atom/1 BIF can now accept codepoints higher than 255 with up to 255 characters (thanks to Björn Gustavsson).
Diffstat (limited to 'erts/doc')
-rw-r--r--erts/doc/src/erl_ext_dist.xml15
-rw-r--r--erts/doc/src/erlang.xml35
2 files changed, 18 insertions, 32 deletions
diff --git a/erts/doc/src/erl_ext_dist.xml b/erts/doc/src/erl_ext_dist.xml
index 4f799f8f34..a436a9ca74 100644
--- a/erts/doc/src/erl_ext_dist.xml
+++ b/erts/doc/src/erl_ext_dist.xml
@@ -119,16 +119,11 @@
<tcaption>Compressed Data Format when Expanded</tcaption></table>
<marker id="utf8_atoms"/>
<note>
- <p>As from ERTS 5.10 (OTP R16) support
- for UTF-8 encoded atoms has been introduced in the external format.
- However, only characters that can be encoded using Latin-1 (ISO-8859-1)
- are currently supported in atoms. The support for UTF-8 encoded atoms
- in the external format has been implemented to be able to support
- all Unicode characters in atoms in <em>some future release</em>.
- Until full Unicode support for atoms has been introduced,
- it is an <em>error</em> to pass atoms containing
- characters that cannot be encoded in Latin-1, and <em>the behavior is
- undefined</em>.</p>
+ <p>As from ERTS 9.0 (OTP 20), UTF-8 encoded atoms may contain any Unicode
+ character. Although the support for UTF-8 encoded atoms in the external
+ format is available since ERTS 5.10 (OTP R16), passing atoms that cannot
+ be encoded in Latin-1 is an <em>error</em> in versions earlier than
+ Erlang/OTP 20, and <em>the behavior is undefined</em>.</p>
<p>When distribution flag <seealso marker="erl_dist_protocol#dflags">
<c>DFLAG_UTF8_ATOMS</c></seealso> has been exchanged between both nodes
in the <seealso marker="erl_dist_protocol#distribution_handshake">
diff --git a/erts/doc/src/erlang.xml b/erts/doc/src/erlang.xml
index b3fab3874b..cf038c49f0 100644
--- a/erts/doc/src/erlang.xml
+++ b/erts/doc/src/erlang.xml
@@ -325,16 +325,11 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code>
is <c>latin1</c>, one byte exists for each character
in the text representation. If <c><anno>Encoding</anno></c> is
<c>utf8</c> or
- <c>unicode</c>, the characters are encoded using UTF-8
- (that is, characters from 128 through 255 are
- encoded in two bytes).</p>
+ <c>unicode</c>, the characters are encoded using UTF-8 where
+ characters may require multiple bytes.</p>
<note>
- <p><c>atom_to_binary(<anno>Atom</anno>, latin1)</c> never
- fails, as the text representation of an atom can only
- contain characters from 0 through 255. In a future release,
- the text representation
- of atoms can be allowed to contain any Unicode character and
- <c>atom_to_binary(<anno>Atom</anno>, latin1)</c> then fails if the
+ <p>As from Erlang/OTP 20, atoms can contain any Unicode character
+ and <c>atom_to_binary(<anno>Atom</anno>, latin1)</c> may fail if the
text representation for <c><anno>Atom</anno></c> contains a Unicode
character &gt; 255.</p>
</note>
@@ -402,13 +397,11 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code>
translation of bytes in the binary is done.
If <c><anno>Encoding</anno></c>
is <c>utf8</c> or <c>unicode</c>, the binary must contain
- valid UTF-8 sequences. Only Unicode characters up
- to 255 are allowed.</p>
+ valid UTF-8 sequences.</p>
<note>
- <p><c>binary_to_atom(<anno>Binary</anno>, utf8)</c> fails if
- the binary contains Unicode characters &gt; 255.
- In a future release, such Unicode characters can be allowed and
- <c>binary_to_atom(<anno>Binary</anno>, utf8)</c> does then not fail.
+ <p>As from Erlang/OTP 20, <c>binary_to_atom(<anno>Binary</anno>, utf8)</c>
+ is capable of encoding any Unicode character. Earlier versions would
+ fail if the binary contained Unicode characters &gt; 255.
For more information about Unicode support in atoms, see the
<seealso marker="erl_ext_dist#utf8_atoms">note on UTF-8
encoded atoms</seealso>
@@ -419,9 +412,7 @@ Z = erlang:adler32_combine(X,Y,iolist_size(Data2)).</code>
> <input>binary_to_atom(&lt;&lt;"Erlang"&gt;&gt;, latin1).</input>
'Erlang'
> <input>binary_to_atom(&lt;&lt;1024/utf8&gt;&gt;, utf8).</input>
-** exception error: bad argument
- in function binary_to_atom/2
- called as binary_to_atom(&lt;&lt;208,128&gt;&gt;,utf8)</pre>
+'Ѐ'</pre>
</desc>
</func>
@@ -2401,10 +2392,10 @@ os_prompt%</pre>
<desc>
<p>Returns the atom whose text representation is
<c><anno>String</anno></c>.</p>
- <p><c><anno>String</anno></c> can only contain ISO-latin-1
- characters (that is, numbers &lt; 256) as the implementation does not
- allow Unicode characters equal to or above 256 in atoms.
- For more information on Unicode support in atoms, see
+ <p>As from Erlang/OTP 20, <c><anno>String</anno></c> may contain
+ any Unicode character. Earlier versions allowed only ISO-latin-1
+ characters as the implementation did not allow Unicode characters
+ above 255. For more information on Unicode support in atoms, see
<seealso marker="erl_ext_dist#utf8_atoms">note on UTF-8
encoded atoms</seealso>
in section "External Term Format" in the User's Guide.</p>