From aef954289c850f69fee007fe978a627f8fae57e8 Mon Sep 17 00:00:00 2001
From: Raimo Niskanen
Date: Tue, 22 Nov 2011 11:05:08 +0100
Subject: stdlib: Fix typo in unicode_usage.xml

Reported by Uwe Dauernheim.
---
 lib/stdlib/doc/src/unicode_usage.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/stdlib/doc/src/unicode_usage.xml b/lib/stdlib/doc/src/unicode_usage.xml
index 0fa7de0a5c..a7e010a05f 100644
--- a/lib/stdlib/doc/src/unicode_usage.xml
+++ b/lib/stdlib/doc/src/unicode_usage.xml
@@ -59,7 +59,7 @@ Standard Unicode representation in Erlang

In Erlang, strings are actually lists of integers. A string is defined to be encoded in the ISO-latin-1 (ISO8859-1) character set, which is, codepoint by codepoint, a sub-range of the Unicode character set.

The standard list encoding for strings is therefore easily extendible to cope with the whole Unicode range: A Unicode string in Erlang is simply a list containing integers, each integer being a valid Unicode codepoint and representing one character in the Unicode character set.

-Regular Erlang strings in ISO-latin-1 are a subset of there Unicode strings.
+Regular Erlang strings in ISO-latin-1 are a subset of their Unicode strings.

Binaries on the other hand are more troublesome. For performance reasons, programs often store textual data in binaries instead of lists, mainly because they are more compact (one byte per character instead of two words per character, as is the case with lists). Using erlang:list_to_binary/1, an regular Erlang string can be converted into a binary, effectively using the ISO-latin-1 encoding in the binary - one byte per character. This is very convenient for those regular Erlang strings, but cannot be done for Unicode lists.

As the UTF-8 encoding is widely spread and provides the most compact storage, it is selected as the standard encoding of Unicode characters in binaries for Erlang.
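The context paragraphs above contrast ISO-latin-1 binaries produced by erlang:list_to_binary/1 with UTF-8 binaries. The following is a minimal sketch, assuming an Erlang/OTP release that ships the stdlib `unicode` module (R13B or later). The module name `unicode_demo` and the function `demo/0` are hypothetical and chosen only for illustration. The sketch uses `unicode:characters_to_binary/1` and `unicode:characters_to_list/1` to show that a string with codepoints above 255 cannot be stored one byte per character but can be stored as UTF-8.

```erlang
%% Hypothetical demo module; unicode_demo and demo/0 are illustrative names.
-module(unicode_demo).
-export([demo/0]).

demo() ->
    %% A string is a list of integer codepoints. All of these are =< 255,
    %% so the list is also a valid ISO-latin-1 string.
    Latin1 = "h\x{e5}llo",
    [104, 229, 108, 108, 111] = Latin1,

    %% erlang:list_to_binary/1 stores one byte per character (ISO-latin-1).
    <<104, 229, 108, 108, 111>> = list_to_binary(Latin1),

    %% A Unicode string may contain codepoints above 255 (here U+03A9).
    %% They do not fit in one byte, so list_to_binary/1 raises badarg.
    Unicode = "h\x{e5}llo \x{3a9}",
    {'EXIT', {badarg, _}} = (catch list_to_binary(Unicode)),

    %% unicode:characters_to_binary/1 encodes the codepoints as UTF-8:
    %% 7 characters become 9 bytes, since U+00E5 and U+03A9 take 2 bytes each.
    Utf8 = unicode:characters_to_binary(Unicode),
    7 = length(Unicode),
    9 = byte_size(Utf8),

    %% Decoding the UTF-8 binary gives back the original codepoint list.
    Unicode = unicode:characters_to_list(Utf8),
    ok.
```

In the shell, `c(unicode_demo).` followed by `unicode_demo:demo().` returns `ok` when every match succeeds. The compiler may warn that the `list_to_binary/1` call on the second string will fail. That is expected, because the failure is the point being illustrated and it is caught.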

-- 
cgit v1.2.3