From b6196f5d47c06e6991cfc7e76db3d588a7797302 Mon Sep 17 00:00:00 2001
From: Patrik Nyblom Force the The VM works with file names as if they are encoded using the ISO-latin-1 encoding, disallowing Unicode characters with codepoints beyond 255. This is default on operating systems that have transparent file naming, i.e. all Unixes except MacOSX.
If Unicode filename encoding is in effect (see the
If Unicode file name encoding is in effect (see the
Returns the
If Unicode file name encoding is in effect (see the
Sets a new
If Unicode filename encoding is in effect (see the
On Unix platforms, the environment will be set using UTF-8 encoding + if Unicode file name translation is in effect. On Windows the + environment is set using wide character interfaces.
Certain ranges of characters are left unused and certain ranges are even deemed invalid. The most notable invalid range is 16#D800 - 16#DFFF, as the UTF-16 encoding does not allow for encoding of these numbers. It can be speculated that the UTF-16 encoding standard was, from the beginning, expected to be able to hold all Unicode characters in one 16-bit entity, but then had to be extended, leaving a hole in the Unicode range to cope with backward compatibility.
-Additionally, the codepoint 16#FEFF is used for byte order marks (BOM's) and use of that character is not encouraged in other contexts than that. It actually is valid though, as the character "ZWNBS" (Zero Width Non Breaking Space). BOM's are used to identify encodings and byte order for programs where such parameters are not known in advance. Byte order marks are more seldom used than one could expect, put their use is becoming more widely spread as they provide the means for programs to make educated guesses about the Unicode format of a certain file.
+Additionally, the codepoint 16#FEFF is used for byte order marks (BOM's) and use of that character is not encouraged in other contexts than that. It actually is valid though, as the character "ZWNBS" (Zero Width Non Breaking Space). BOM's are used to identify encodings and byte order for programs where such parameters are not known in advance. Byte order marks are more seldom used than one could expect, but their use is becoming more widely spread as they provide the means for programs to make educated guesses about the Unicode format of a certain file.
Environment variables and their interpretation is handled much in the same way as file names. If Unicode file names are enabled, environment variables as well as parameters to the Erlang VM are expected to be in Unicode.
+If Unicode file names are enabled, the calls to
On Unix-like operating systems, parameters are expected to be UTF-8 without translation if Unicode file names are enabled.
+Most of the modules in Erlang/OTP are of course Unicode-unaware in the sense that they have no notion of Unicode and really shouldn't have. Typically they handle non-textual or byte-oriented data (like
Modules that actually handle textual data (like