diff options
author | Sverker Eriksson <[email protected]> | 2013-01-23 18:09:35 +0100 |
---|---|---|
committer | Sverker Eriksson <[email protected]> | 2013-01-23 18:09:35 +0100 |
commit | b8e623410d1c22fe6d5fdeb8ccb0b2305533f033 (patch) | |
tree | 708d64e36e18b61ae1801c02ec3aeef42a697be3 /lib/erl_interface/doc | |
parent | e99df74bee7c245ec76678e336fcd09d4b51a089 (diff) | |
parent | d6e3e256b850050b7a86323b2948009d5fcc30a9 (diff) | |
download | otp-b8e623410d1c22fe6d5fdeb8ccb0b2305533f033.tar.gz otp-b8e623410d1c22fe6d5fdeb8ccb0b2305533f033.tar.bz2 otp-b8e623410d1c22fe6d5fdeb8ccb0b2305533f033.zip |
Merge branch 'sverk/r16/utf8-atoms'
* sverk/r16/utf8-atoms:
erl_interface: Fix bug when transcoding atoms from and to UTF8
erl_interface: Changed erlang_char_encoding interface
erts: Testcase doing unicode atom printout with ~w
erl_interface: even more utf8 atom stuff
erts: Fix bug in analyze_utf8 causing faulty latin1 detection
Add UTF-8 node name support for epmd
workaround...
Fix merge conflict with hasse
UTF-8 atom documentation
test case
erl_interface: utf8 atoms continued
Add utf8 atom distribution test cases
atom fixes for NIFs and atom_to_binary
UTF-8 support for distribution
Implement UTF-8 atom support for jinterface
erl_interface: Enable decode of unicode atoms
stdlib: Fix printing of unicode atoms
erts: Change internal representation of atoms to utf8
erts: Refactor rename DFLAG(S)_INTERNAL_TAGS for conformity
Conflicts:
erts/emulator/beam/io.c
OTP-10753
Diffstat (limited to 'lib/erl_interface/doc')
-rw-r--r-- | lib/erl_interface/doc/src/ei.xml | 64 | ||||
-rw-r--r-- | lib/erl_interface/doc/src/erl_eterm.xml | 18 |
2 files changed, 74 insertions, 8 deletions
diff --git a/lib/erl_interface/doc/src/ei.xml b/lib/erl_interface/doc/src/ei.xml index 539e16d837..117c787da6 100644 --- a/lib/erl_interface/doc/src/ei.xml +++ b/lib/erl_interface/doc/src/ei.xml @@ -82,6 +82,25 @@ function returns the size required (note that for strings an extra byte is needed for the 0 string terminator).</p> </description> + <section> + <title>DATA TYPES</title> + + <taglist> + <tag><marker id="erlang_char_encoding"/>enum erlang_char_encoding</tag> + <item> + <p/> + <code type="none"> +enum erlang_char_encoding { + ERLANG_ASCII, ERLANG_LATIN1, ERLANG_UTF8 +}; +</code> + <p>The character encoding used for atoms. <c>ERLANG_ASCII</c> represents 7-bit ASCII. + Latin1 and UTF8 are different extensions of 7-bit ASCII. All 7-bit ASCII characters + are valid Latin1 and UTF8 characters. ASCII and Latin1 both represent each character + by one byte. A UTF8 character can consist of one to four bytes.</p> + </item> + </taglist> + </section> <funcs> <func> <name><ret>void</ret><nametext>ei_set_compat_rel(release_number)</nametext></name> @@ -225,12 +244,32 @@ <fsummary>Encode an atom</fsummary> <desc> <p>Encodes an atom in the binary format. The <c><![CDATA[p]]></c> parameter - is the name of the atom. Only upto <c><![CDATA[MAXATOMLEN]]></c> bytes + is the name of the atom in latin1 encoding. Only upto <c>MAXATOMLEN-1</c> bytes are encoded. The name should be zero-terminated, except for the <c><![CDATA[ei_x_encode_atom_len()]]></c> function.</p> </desc> </func> <func> + <name><ret>int</ret><nametext>ei_encode_atom_as(char *buf, int *index, const char *p, enum erlang_char_encoding from_enc, enum erlang_char_encoding to_enc)</nametext></name> + <name><ret>int</ret><nametext>ei_encode_atom_len_as(char *buf, int *index, const char *p, int len, enum erlang_char_encoding from_enc, enum erlang_char_encoding to_enc)</nametext></name> + <name><ret>int</ret><nametext>ei_x_encode_atom_as(ei_x_buff* x, const char *p, enum erlang_char_encoding from_enc, enum erlang_char_encoding to_enc)</nametext></name> + <name><ret>int</ret><nametext>ei_x_encode_atom_len_as(ei_x_buff* x, const char *p, int len, enum erlang_char_encoding from_enc, enum erlang_char_encoding to_enc)</nametext></name> + <fsummary>Encode an atom</fsummary> + <desc> + <p>Encodes an atom in the binary format with character encoding + <c><seealso marker="#erlang_char_encoding">to_enc</seealso></c> (latin1 or utf8). + The <c>p</c> parameter is the name of the atom with character encoding + <c><seealso marker="#erlang_char_encoding">from_enc</seealso></c> (ascii, latin1 or utf8). + The name must either be zero-terminated or a function variant with a <c>len</c> + parameter must be used.</p> + <p>The encoding will fail if <c>p</c> is not a valid string in encoding <c>from_enc</c>, + if the string is too long or if it can not be represented with character encoding <c>to_enc</c>.</p> + <p>These functions were introduced in R16 release of Erlang/OTP as part of a first step + to support UTF8 atoms. Atoms encoded with <c>ERLANG_UTF8</c> + can not be decoded by earlier releases than R16.</p> + </desc> + </func> + <func> <name><ret>int</ret><nametext>ei_encode_binary(char *buf, int *index, const void *p, long len)</nametext></name> <name><ret>int</ret><nametext>ei_x_encode_binary(ei_x_buff* x, const void *p, long len)</nametext></name> <fsummary>Encode a binary</fsummary> @@ -490,11 +529,32 @@ ei_x_encode_empty_list(&x); <fsummary>Decode an atom</fsummary> <desc> <p>This function decodes an atom from the binary format. The - name of the atom is placed at <c><![CDATA[p]]></c>. There can be at most + null terminated name of the atom is placed at <c><![CDATA[p]]></c>. There can be at most <c><![CDATA[MAXATOMLEN]]></c> bytes placed in the buffer.</p> </desc> </func> <func> + <name><ret>int</ret><nametext>ei_decode_atom_as(const char *buf, int *index, char *p, int plen, enum erlang_char_encoding want, enum erlang_char_encoding* was, enum erlang_char_encoding* result)</nametext></name> + <fsummary>Decode an atom</fsummary> + <desc> + <p>This function decodes an atom from the binary format. The + null terminated name of the atom is placed in buffer at <c>p</c> of length + <c>plen</c> bytes.</p> + <p>The wanted string encoding is specified by <c><seealso marker="#erlang_char_encoding"> + want</seealso></c>. The original encoding used in the + binary format (latin1 or utf8) can be obtained from <c>*was</c>. The actual encoding of the resulting string + (7-bit ascii, latin1 or utf8) can be obtained from <c>*result</c>. Both <c>was</c> and <c>result</c> can be <c>NULL</c>. + + <c>*result</c> may differ from <c>want</c> if <c>want</c> is a bitwise-or'd combination like + <c>ERLANG_LATIN1|ERLANG_UTF8</c> or if <c>*result</c> turn out to be pure 7-bit ascii + (compatible with both latin1 and utf8).</p> + <p>This function fails if the atom is too long for the buffer + or if it can not be represented with encoding <c>want</c>.</p> + <p>This function was introduced in R16 release of Erlang/OTP as part of a first step + to support UTF8 atoms.</p> + </desc> + </func> + <func> <name><ret>int</ret><nametext>ei_decode_binary(const char *buf, int *index, void *p, long *len)</nametext></name> <fsummary>Decode a binary</fsummary> <desc> diff --git a/lib/erl_interface/doc/src/erl_eterm.xml b/lib/erl_interface/doc/src/erl_eterm.xml index f403618c59..c7840d7813 100644 --- a/lib/erl_interface/doc/src/erl_eterm.xml +++ b/lib/erl_interface/doc/src/erl_eterm.xml @@ -77,10 +77,12 @@ </p> <taglist> <tag><c><![CDATA[char *ERL_ATOM_PTR(t)]]></c></tag> + <tag><c><![CDATA[char *ERL_ATOM_PTR_UTF8(t)]]></c></tag> <item>A string representing atom <c><![CDATA[t]]></c>. </item> <tag><c><![CDATA[int ERL_ATOM_SIZE(t)]]></c></tag> - <item>The length (in characters) of atom t.</item> + <tag><c><![CDATA[int ERL_ATOM_SIZE_UTF8(t)]]></c></tag> + <item>The length (in bytes) of atom t.</item> <tag><c><![CDATA[void *ERL_BIN_PTR(t)]]></c></tag> <item>A pointer to the contents of <c><![CDATA[t]]></c></item> <tag><c><![CDATA[int ERL_BIN_SIZE(t)]]></c></tag> @@ -92,6 +94,7 @@ <tag><c><![CDATA[double ERL_FLOAT_VALUE(t)]]></c></tag> <item>The floating point value of <c><![CDATA[t]]></c>.</item> <tag><c><![CDATA[ETERM *ERL_PID_NODE(t)]]></c></tag> + <tag><c><![CDATA[ETERM *ERL_PID_NODE_UTF8(t)]]></c></tag> <item>The Node in pid <c><![CDATA[t]]></c>.</item> <tag><c><![CDATA[int ERL_PID_NUMBER(t)]]></c></tag> <item>The sequence number in pid <c><![CDATA[t]]></c>.</item> @@ -104,6 +107,7 @@ <tag><c><![CDATA[int ERL_PORT_CREATION(t)]]></c></tag> <item>The creation number in port <c><![CDATA[t]]></c>.</item> <tag><c><![CDATA[ETERM *ERL_PORT_NODE(t)]]></c></tag> + <tag><c><![CDATA[ETERM *ERL_PORT_NODE_UTF8(t)]]></c></tag> <item>The node in port <c><![CDATA[t]]></c>.</item> <tag><c><![CDATA[int ERL_REF_NUMBER(t)]]></c></tag> <item>The first part of the reference number in ref <c><![CDATA[t]]></c>. Use @@ -296,7 +300,7 @@ iohead ::= Binary <name><ret>ETERM *</ret><nametext>erl_mk_atom(string)</nametext></name> <fsummary>Creates an atom</fsummary> <type> - <v>char *string;</v> + <v>const char *string;</v> </type> <desc> <p>Creates an atom.</p> @@ -305,10 +309,12 @@ iohead ::= Binary <p>Returns an Erlang term containing an atom. Note that it is the callers responsibility to make sure that <c><![CDATA[string]]></c> contains a valid name for an atom.</p> - <p><c><![CDATA[ERL_ATOM_PTR(atom)]]></c> can be used to retrieve the - atom name (as a string). Note that the string is not - 0-terminated in the atom. <c><![CDATA[ERL_ATOM_SIZE(atom)]]></c>returns - the length of the atom name.</p> + <p><c><![CDATA[ERL_ATOM_PTR(atom)]]></c> and <c><![CDATA[ERL_ATOM_PTR_UTF8(atom)]]></c> + can be used to retrieve the atom name (as a null terminated string). <c><![CDATA[ERL_ATOM_SIZE(atom)]]></c> + and <c><![CDATA[ERL_ATOM_SIZE_UTF8(atom)]]></c> returns the length of the atom name.</p> + <note><p>Note that the UTF8 variants were introduced in Erlang/OTP releases R16 + and the string returned by <c>ERL_ATOM_PTR(atom)</c> was not null terminated on older releases.</p> + </note> </desc> </func> <func> |