Diffstat (limited to 'lib/stdlib/doc/src')
-rw-r--r-- | lib/stdlib/doc/src/unicode_usage.xml | 35 |
1 files changed, 19 insertions, 16 deletions
diff --git a/lib/stdlib/doc/src/unicode_usage.xml b/lib/stdlib/doc/src/unicode_usage.xml
index 33cd70e0b7..ee7dd128f1 100644
--- a/lib/stdlib/doc/src/unicode_usage.xml
+++ b/lib/stdlib/doc/src/unicode_usage.xml
@@ -52,8 +52,8 @@
  for UTF-8 and more support for Unicode character sets in the I/O-system.</p>
- <p>In R17, the encoding default for Erlang source files will be
- switched to UTF-8 and in R18 Erlang will support atoms in the full
+ <p>In 17.0, the encoding default for Erlang source files was
+ switched to UTF-8 and in 18.0 Erlang will support atoms in the full
  Unicode range, meaning full Unicode function and module names</p>
@@ -290,7 +290,7 @@
  <item>Having the source code in UTF-8 also allows you to write string literals containing Unicode characters with code points > 255, although atoms, module names and function names will be
- restricted to the ISO-Latin-1 range until the R18 release. Binary
+ restricted to the ISO-Latin-1 range until the 18.0 release. Binary
  literals where you use the <c>/utf8</c> type, can also be expressed using Unicode characters > 255. Having module names using characters other than 7-bit ASCII can cause trouble on
@@ -385,7 +385,7 @@ external_charlist() = maybe_improper_list(char() |
  using characters from the ISO-latin-1 character set and atoms are restricted to the same ISO-latin-1 range. These restrictions in the language are of course independent of the encoding of the source
- file. Erlang/OTP R18 is expected to handle functions named in
+ file. Erlang/OTP 18.0 is expected to handle functions named in
  Unicode as well as Unicode atoms.</p> <section> <title>Bit-syntax</title>
@@ -662,11 +662,14 @@ Eshell V5.10.1 (abort with ^G)
  containing characters having code points between 128 and 255 may be named either as plain ISO-latin-1 or using UTF-8 encoding. As no consistency is enforced, the Erlang VM can do no consistent
- translation of all file names. If the VM would automatically
- select encoding based on heuristics, one could get unexpected
- behavior on these systems. By default, Erlang starts in "latin1"
- file name mode on such systems, meaning bytewise encoding in file
- names. This allows for list representation of all file names in
+ translation of all file names.</p>
+
+ <p>By default on such systems, Erlang starts in <c>utf8</c> file
+ name mode if the terminal supports UTF-8, otherwise in
+ <c>latin1</c> mode.</p>
+
+ <p>In the <c>latin1</c> mode, file names are bytewise endcoded.
+ This allows for list representation of all file names in
  the system, but, for example, a file named "Östersund.txt", will appear in <c>file:list_dir/1</c> as either "Östersund.txt" (if the file name was encoded in bytewise ISO-Latin-1 by the program
@@ -752,7 +755,7 @@ Eshell V5.10.1 (abort with ^G)
  <section> <title>Notes About Raw File Names</title>
-
+ <marker id="notes-about-raw-filenames"/>
  <p>Raw file names were introduced together with Unicode file name support in erts-5.8.2 (OTP R14B01). The reason "raw file names" was introduced in the system was to be able to
@@ -1014,7 +1017,8 @@ ok
  allowed. This setting should correspond to the actual terminal you are using.</p> <p>The environment can also affect file name interpretation, if
- Erlang is started with the <c>+fna</c> flag.</p>
+ Erlang is started with the <c>+fna</c> flag (which is default from
+ Erlang/OTP 17.0).</p>
  <p>You can check the setting of this by calling <c>io:getopts()</c>, which will give you an option list containing <c>{encoding,unicode}</c> or
@@ -1046,8 +1050,7 @@ ok
  > 255.</p> <p><c>+fnl</c> means bytewise interpretation of file names, which was the usual way to represent ISO-Latin-1 file names before
- UTF-8 file naming got widespread. This is the default on all
- Unix-like operating systems except MacOS X.</p>
+ UTF-8 file naming got widespread.</p>
  <p><c>+fnu</c> means that file names are encoded in UTF-8, which is nowadays the common scheme (although not enforced).</p> <p><c>+fna</c> means that you automatically select between
@@ -1055,8 +1058,8 @@ ok
  <c>LC_CTYPE</c> environment variables. This is optimistic heuristics indeed, nothing enforces a user to have a terminal with the same encoding as the file system, but usually, this is
- the case. This might be the default behavior in a future
- release.</p>
+ the case. This is the default on all Unix-like operating
+ systems except MacOS X.</p>
  <p>The file name translation mode can be read with the <c>file:native_name_encoding/0</c> function, which returns
@@ -1068,7 +1071,7 @@ ok
  <p>This function returns the default encoding for Erlang source files (if no encoding comment is present) in the currently running release. For R16 this returns <c>latin1</c> (meaning
- bytewise encoding). In R17 and forward it is expected to return
+ bytewise encoding). In 17.0 and forward it returns
  <c>utf8</c>.</p> <p>The encoding of each file can be specified using comments as described in