Allow noncharacter code points in unicode encoding and decoding

The two noncharacter code points 16#FFFE and 16#FFFF could not be encoded or decoded using the unicode module or the bit syntax. That was inconsistent, since the noncharacters 16#FDD0 to 16#FDEF could be encoded and decoded. There are two ways to fix that inconsistency. We have chosen to allow 16#FFFE and 16#FFFF to be encoded and decoded, because noncharacters can be useful internally within an application, and because it makes encoding and decoding slightly faster.

Reported-by: Alisdair Sullivan
author: Björn Gustavsson <[email protected]> 2011-08-30 11:51:11 +0200
committer: Björn Gustavsson <[email protected]> 2011-10-13 14:16:00 +0200
commit: 34db76765561487e526fe66d3d19ecf3b3fb9dc8 (patch)
tree: 9141e3c5729e46d03c8b27b14da3b29b1e54abca /system/doc/reference_manual/expressions.xml
parent: 6ca6dd3c670fb8185ebb9a20c2a731a7375c1cac (diff)
download: otp-34db76765561487e526fe66d3d19ecf3b3fb9dc8.tar.gz
otp-34db76765561487e526fe66d3d19ecf3b3fb9dc8.tar.bz2
otp-34db76765561487e526fe66d3d19ecf3b3fb9dc8.zip
1 files changed, 5 insertions, 7 deletions
diff --git a/system/doc/reference_manual/expressions.xml b/system/doc/reference_manual/expressions.xml
index 497d7eb464..644896cd7f 100644
--- a/system/doc/reference_manual/expressions.xml
+++ b/system/doc/reference_manual/expressions.xml
@@ -879,9 +879,8 @@ Ei = Value |
     and UTF-32, respectively.</p>
 
     <p>When constructing a segment of a <c>utf</c> type, <c>Value</c>
-    must be an integer in one of the ranges 0..16#D7FF,
-    16#E000..16#FFFD, or 16#10000..16#10FFFF
-    (i.e. a valid Unicode code point). Construction
+    must be an integer in the range 0..16#D7FF or
+    16#E000..16#10FFFF. Construction
     will fail with a <c>badarg</c> exception if <c>Value</c> is
     outside the allowed ranges. The size of the resulting binary
     segment depends on the type and/or <c>Value</c>. For <c>utf8</c>,
@@ -896,14 +895,13 @@ Ei = Value |
     <c><![CDATA[<<$a/utf8,$b/utf8,$c/utf8>>]]></c>.</p>
 
     <p>A successful match of a segment of a <c>utf</c> type results
-    in an integer in one of the ranges 0..16#D7FF, 16#E000..16#FFFD,
-    or 16#10000..16#10FFFF
-    (i.e. a valid Unicode code point). The match will fail if returned value
+    in an integer in the range 0..16#D7FF or 16#E000..16#10FFFF.
+    The match will fail if the returned value
     would fall outside those ranges.</p>
 
     <p>A segment of type <c>utf8</c> will match 1 to 4 bytes in the binary,
     if the binary at the match position contains a valid UTF-8 sequence.
-    (See RFC-2279 or the Unicode standard.)</p>
+    (See RFC-3629 or the Unicode standard.)</p>
 
     <p>A segment of type <c>utf16</c> may match 2 or 4 bytes in the binary.
     The match will fail if the binary at the match position does not contain
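
The behaviour described in the commit message can be checked with a small module. This is a minimal sketch, not part of the commit: the module name utf_noncharacters and both function names are hypothetical, and it assumes an Erlang/OTP release that includes this change. It uses only the bit syntax documented in the diff above.

```erlang
%% Hypothetical helper module, for illustration only (not part of the commit).
-module(utf_noncharacters).
-export([roundtrip/1, encodable/1]).

%% Encode CodePoint as a utf8 segment, match it back out, and compare.
%% Raises badarg if CodePoint is outside the allowed ranges.
roundtrip(CodePoint) ->
    Bin = <<CodePoint/utf8>>,
    <<Decoded/utf8>> = Bin,
    Decoded =:= CodePoint.

%% Return true if Value can be used to construct a utf8 segment,
%% false if construction fails with a badarg exception.
encodable(Value) ->
    try <<Value/utf8>> of
        _ -> true
    catch
        error:badarg -> false
    end.
```

With this change applied, utf_noncharacters:roundtrip(16#FFFE) and utf_noncharacters:roundtrip(16#FFFF) should both return true, while utf_noncharacters:encodable(16#D800) (a surrogate) and utf_noncharacters:encodable(16#110000) (above the Unicode range) should still return false.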