From 1b7b4f7398765188815f697444e42029454dcd3d Mon Sep 17 00:00:00 2001
From: Sverker Eriksson
Date: Wed, 8 Mar 2017 20:34:55 +0100
Subject: erts: Mark latin1 atom encoding as deprecated

which means tags ATOM_EXT and SMALL_ATOM_EXT.
---
 erts/doc/src/erl_ext_dist.xml | 156 ++++++++++++++++++++----------------------
 1 file changed, 76 insertions(+), 80 deletions(-)

diff --git a/erts/doc/src/erl_ext_dist.xml b/erts/doc/src/erl_ext_dist.xml
index a436a9ca74..da2dc94e5b 100644
--- a/erts/doc/src/erl_ext_dist.xml
+++ b/erts/doc/src/erl_ext_dist.xml
@@ -51,7 +51,7 @@
 term into the external format. To convert binary data encoding to a term, the BIF
- erlang:binary_to_term/1c> is used.
+ erlang:binary_to_term/1 is used.

The distribution does this implicitly when sending messages across @@ -119,22 +119,18 @@ Compressed Data Format when Expanded -

As from ERTS 9.0 (OTP 20), UTF-8 encoded atoms may contain any Unicode - character. Although the support for UTF-8 encoded atoms in the external - format is available since ERTS 5.10 (OTP R16), passing atoms that cannot - be encoded in Latin-1 is an error in versions earlier than - Erlang/OTP 20, and the behavior is undefined.

-

When distribution flag - DFLAG_UTF8_ATOMS has been exchanged between both nodes - in the - distribution handshake, all atoms in the distribution header - are encoded in UTF-8, otherwise in Latin-1. The two - new tags ATOM_UTF8_EXT - and - SMALL_ATOM_UTF8_EXT - are only used if the distribution flag DFLAG_UTF8_ATOMS has - been exchanged between nodes, or if an atom containing characters - that cannot be encoded in Latin-1 is encountered.

+

As from ERTS 9.0 (OTP 20), atoms may contain any Unicode + characters and are always encoded using the UTF-8 external formats + ATOM_UTF8_EXT + or SMALL_ATOM_UTF8_EXT. + The old Latin-1 formats ATOM_EXT + and SMALL_ATOM_EXT + are deprecated and are only kept for backward + compatibility when decoding terms encoded by older nodes.

+

Support for UTF-8 encoded atoms in the external format has been + available since ERTS 5.10 (OTP R16). This ability allows such old nodes + to decode, store and encode any Unicode atoms received from a new OTP 20 + node.

The maximum number of allowed characters in an atom is 255. In the UTF-8 case, each character can need 4 bytes to be encoded.
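The length rule above (at most 255 characters, each needing up to 4 bytes in UTF-8) means an atom name can occupy up to 1020 bytes, which no longer fits a 1-byte length field. The following is a minimal sketch, not the ERTS implementation, of how an encoder might choose between the two UTF-8 tags. The tag values 118 and 119 and the layout of the UTF-8 tags are not stated in this patch; they are taken from the published external term format and should be read as assumptions here. The function name `encode_atom_utf8` is hypothetical.

```python
import struct

# Assumed tag values (not given in the text above).
ATOM_UTF8_EXT = 118        # followed by a 2-byte big-endian length
SMALL_ATOM_UTF8_EXT = 119  # followed by a 1-byte length
MAX_ATOM_CHARS = 255       # limit stated in the text, counted in characters


def encode_atom_utf8(name: str) -> bytes:
    """Encode an atom name, choosing the tag from its UTF-8 byte length."""
    if len(name) > MAX_ATOM_CHARS:
        raise ValueError("atom exceeds 255 characters")
    raw = name.encode("utf-8")
    if len(raw) <= 255:
        # The byte length fits in one byte: use the small form.
        return struct.pack(">BB", SMALL_ATOM_UTF8_EXT, len(raw)) + raw
    # Up to 255 * 4 = 1020 bytes: needs the 2-byte length form.
    return struct.pack(">BH", ATOM_UTF8_EXT, len(raw)) + raw
```

An ASCII atom such as `ok` takes the small form, while 255 characters outside the Basic Multilingual Plane take 1020 bytes and therefore the 2-byte length form.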

@@ -389,28 +385,6 @@

-
- - ATOM_EXT - - - 1 - 2 - Len - - - 100 - Len - AtomName - - ATOM_EXT
-

- An atom is stored with a 2 byte unsigned length in big-endian order, - followed by Len numbers of 8-bit Latin-1 characters that forms - the AtomName. The maximum allowed value for Len is 255. -

-
-
REFERENCE_EXT @@ -432,8 +406,8 @@ Encodes a reference object (an object generated with erlang:make_ref/0). The Node term is an encoded atom, that is, - ATOM_EXT, - SMALL_ATOM_EXT, or + ATOM_UTF8_EXT, + SMALL_ATOM_UTF8_EXT, or ATOM_CACHE_REF. The ID field contains a big-endian unsigned integer, but is to be regarded as uninterpreted data, @@ -771,39 +745,6 @@

-
- - SMALL_ATOM_EXT - - - 1 - 1 - Len - - - 115 - Len - AtomName - - SMALL_ATOM_EXT
-

- An atom is stored with a 1 byte unsigned length, - followed by Len numbers of 8-bit Latin-1 characters that - forms the AtomName. Longer atoms can be represented - by ATOM_EXT. -

- -

- SMALL_ATOM_EXT was introduced in ERTS 5.7.2 and - require an exchange of distribution flag - - DFLAG_SMALL_ATOM_TAGS in the - - distribution handshake. -

-
-
-
FUN_EXT @@ -838,8 +779,8 @@ Module

Encoded as an atom, using - ATOM_EXT, - SMALL_ATOM_EXT, + ATOM_UTF8_EXT, + SMALL_ATOM_UTF8_EXT, or ATOM_CACHE_REF. This is the module that the fun is implemented in. @@ -933,8 +874,8 @@ Module

Encoded as an atom, using - ATOM_EXT, - SMALL_ATOM_EXT, + ATOM_UTF8_EXT, + SMALL_ATOM_UTF8_EXT, or ATOM_CACHE_REF. Is the module that the fun is implemented in. @@ -996,8 +937,8 @@

Module and Function are atoms - (encoded using ATOM_EXT, - SMALL_ATOM_EXT, or + (encoded using ATOM_UTF8_EXT, + SMALL_ATOM_UTF8_EXT, or ATOM_CACHE_REF).

@@ -1109,6 +1050,61 @@ in the beginning of this section.

+ +
+ + ATOM_EXT (deprecated) + + + 1 + 2 + Len + + + 100 + Len + AtomName + + ATOM_EXT
+

+ An atom is stored with a 2-byte unsigned length in big-endian order, + followed by Len 8-bit Latin-1 characters that form + the AtomName. The maximum allowed value for Len is 255. +
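Since the deprecated ATOM_EXT tag is kept for decoding terms from older nodes, a decoder still has to understand it. The layout described above (tag 100, a 2-byte big-endian Len, then Len Latin-1 bytes) can be sketched as follows. This is an illustrative sketch only; the function name `decode_atom_ext` is hypothetical and it handles a single encoded atom, not a full term.

```python
import struct

ATOM_EXT = 100  # tag value from the layout above


def decode_atom_ext(data: bytes) -> str:
    """Decode one ATOM_EXT-encoded atom name from the start of data."""
    # One tag byte followed by a 2-byte big-endian unsigned length.
    tag, length = struct.unpack_from(">BH", data, 0)
    if tag != ATOM_EXT:
        raise ValueError("not an ATOM_EXT term")
    if length > 255:
        raise ValueError("Len exceeds the maximum of 255")
    name = data[3:3 + length]
    if len(name) != length:
        raise ValueError("truncated atom name")
    # Every byte value is a valid Latin-1 character.
    return name.decode("latin-1")
```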

+
+ +
+ + SMALL_ATOM_EXT (deprecated) + + + 1 + 1 + Len + + + 115 + Len + AtomName + + SMALL_ATOM_EXT
+

+ An atom is stored with a 1-byte unsigned length, + followed by Len 8-bit Latin-1 characters that + form the AtomName. +

+ +

+ SMALL_ATOM_EXT was introduced in ERTS 5.7.2 and + requires an exchange of distribution flag + + DFLAG_SMALL_ATOM_TAGS in the + + distribution handshake. +
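The SMALL_ATOM_EXT layout described above (tag 115, a 1-byte Len, then Len Latin-1 bytes) differs from ATOM_EXT only in the width of the length field. A minimal decoding sketch under that reading follows. The function name `decode_small_atom_ext` is hypothetical, and the sketch does not model the distribution handshake or the DFLAG_SMALL_ATOM_TAGS check.

```python
SMALL_ATOM_EXT = 115  # tag value from the layout above


def decode_small_atom_ext(data: bytes) -> str:
    """Decode one SMALL_ATOM_EXT-encoded atom name from the start of data."""
    if len(data) < 2:
        raise ValueError("truncated header")
    tag, length = data[0], data[1]  # a 1-byte length caps Len at 255
    if tag != SMALL_ATOM_EXT:
        raise ValueError("not a SMALL_ATOM_EXT term")
    name = data[2:2 + length]
    if len(name) != length:
        raise ValueError("truncated atom name")
    return name.decode("latin-1")
```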

+
+
+ -- cgit v1.2.3