From 037150979ff809df85757bd2b3f676e2e4c6be88 Mon Sep 17 00:00:00 2001
From: Hans Bolinder
Date: Tue, 17 Jan 2012 12:28:32 +0100
Subject: Move types and specs from erl_bif_types.erl to modules

---
 lib/stdlib/doc/src/unicode.xml | 50 +++++++++++++----------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/lib/stdlib/doc/src/unicode.xml b/lib/stdlib/doc/src/unicode.xml
index 1001ebbae4..1f6cbaccd7 100644
--- a/lib/stdlib/doc/src/unicode.xml
+++ b/lib/stdlib/doc/src/unicode.xml
@@ -5,7 +5,7 @@
  1996
- 2011
+ 2012
  Ericsson AB, All Rights Reserved
@@ -130,34 +130,24 @@
- characters_to_list(Data, InEncoding) -> Result
+
  Convert a collection of characters to list of Unicode characters
-
- Data = latin1_chardata()
- | chardata()
- | external_chardata()
- Result = list() | {error, list(), RestData} | {incomplete, list(), binary()}
- RestData = latin1_chardata()
- | chardata()
- | external_chardata()
- InEncoding = encoding()
-
  This function converts a possibly deep list of integers and binaries into a list of integers representing unicode characters. The binaries in the input may have characters encoded as latin1 (0 - 255, one character per byte), in which
- case the InEncoding parameter should be given as
+ case the InEncoding parameter should be given as
  latin1, or have characters encoded as one of the
- UTF-encodings, which is given as the InEncoding
- parameter. Only when the InEncoding is one of the UTF
+ UTF-encodings, which is given as the InEncoding
+ parameter. Only when the InEncoding is one of the UTF
  encodings, integers in the list are allowed to be grater than 255.
- If InEncoding is latin1, the Data parameter
+ If InEncoding is latin1, the Data parameter
  corresponds to the iodata() type, but for unicode,
- the Data parameter can contain integers greater than 255
+ the Data parameter can contain integers greater than 255
  (unicode characters beyond the iso-latin-1 range), which would make it invalid as iodata().

@@ -188,16 +178,16 @@
  depth as the original data. The error occurs when traversing the list and whatever's left to decode is simply returned as is.
- However, if the input Data is a pure binary, the third
+ However, if the input Data is a pure binary, the third
  part of the error tuple is guaranteed to be a binary as well.
  Errors occur for the following reasons:
- Integers out of range - If InEncoding is
+ Integers out of range - If InEncoding is
  latin1, an error occurs whenever an integer greater
- than 255 is found in the lists. If InEncoding is
+ than 255 is found in the lists. If InEncoding is
  of a Unicode type, an error occurs whenever an integer greater than 16#10FFFF
@@ -208,7 +198,7 @@
  is found.
- UTF encoding incorrect - If InEncoding is
+ UTF encoding incorrect - If InEncoding is
  one of the UTF types, the bytes in any binaries have to be valid in that encoding. Errors can occur for various reasons, including "pure" decoding errors
@@ -220,7 +210,7 @@
  number should have been encoded in fewer bytes. The case of a truncated UTF is handled specially, see the paragraph about incomplete binaries below. If
- InEncoding is latin1, binaries are always valid
+ InEncoding is latin1, binaries are always valid
  as long as they contain whole bytes, as each byte falls into the valid iso-latin-1 range.
@@ -238,7 +228,7 @@
  the first part of a (so far) valid UTF character.
  If one UTF characters is split over two consecutive
- binaries in the Data, the conversion succeeds. This means
+ binaries in the Data, the conversion succeeds. This means
  that a character can be decoded from a range of binaries as long as the whole range is given as input without errors occurring. Example:

@@ -274,21 +264,11 @@
- characters_to_binary(Data,InEncoding) -> Result
+
  Convert a collection of characters to an UTF-8 binary
-
- Data = latin1_chardata()
- | chardata()
- | external_chardata()
- Result = binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}
- RestData = latin1_chardata()
- | chardata()
- | external_chardata()
- InEncoding = encoding()
-
- Same as characters_to_binary(Data, InEncoding, unicode).
+ Same as characters_to_binary(Data, InEncoding, unicode).
