From 0cc9753f7f873bbcf8a528e05ab0adbd5c8fc79c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20Gustavsson?= Date: Mon, 27 Jan 2014 13:10:54 +0100 Subject: Change the default file name encoding mode to +fnaw --- lib/stdlib/doc/src/unicode_usage.xml | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/lib/stdlib/doc/src/unicode_usage.xml b/lib/stdlib/doc/src/unicode_usage.xml index 33cd70e0b7..ee7dd128f1 100644 --- a/lib/stdlib/doc/src/unicode_usage.xml +++ b/lib/stdlib/doc/src/unicode_usage.xml @@ -52,8 +52,8 @@ for UTF-8 and more support for Unicode character sets in the I/O-system.

-

In R17, the encoding default for Erlang source files will be - switched to UTF-8 and in R18 Erlang will support atoms in the full +

In 17.0, the encoding default for Erlang source files was + switched to UTF-8 and in 18.0 Erlang will support atoms in the full Unicode range, meaning full Unicode function and module names

@@ -290,7 +290,7 @@ Having the source code in UTF-8 also allows you to write string literals containing Unicode characters with code points > 255, although atoms, module names and function names will be - restricted to the ISO-Latin-1 range until the R18 release. Binary + restricted to the ISO-Latin-1 range until the 18.0 release. Binary literals where you use the /utf8 type, can also be expressed using Unicode characters > 255. Having module names using characters other than 7-bit ASCII can cause trouble on @@ -385,7 +385,7 @@ external_charlist() = maybe_improper_list(char() | using characters from the ISO-latin-1 character set and atoms are restricted to the same ISO-latin-1 range. These restrictions in the language are of course independent of the encoding of the source - file. Erlang/OTP R18 is expected to handle functions named in + file. Erlang/OTP 18.0 is expected to handle functions named in Unicode as well as Unicode atoms.

Bit-syntax @@ -662,11 +662,14 @@ Eshell V5.10.1 (abort with ^G) containing characters having code points between 128 and 255 may be named either as plain ISO-latin-1 or using UTF-8 encoding. As no consistency is enforced, the Erlang VM can do no consistent - translation of all file names. If the VM would automatically - select encoding based on heuristics, one could get unexpected - behavior on these systems. By default, Erlang starts in "latin1" - file name mode on such systems, meaning bytewise encoding in file - names. This allows for list representation of all file names in + translation of all file names.

+ +

By default on such systems, Erlang starts in utf8 file + name mode if the terminal supports UTF-8, otherwise in + latin1 mode.

+ +

In the latin1 mode, file names are bytewise encoded. + This allows for list representation of all file names in the system, but, for example, a file named "Östersund.txt", will appear in file:list_dir/1 as either "Östersund.txt" (if the file name was encoded in bytewise ISO-Latin-1 by the program @@ -752,7 +755,7 @@ Eshell V5.10.1 (abort with ^G)

Notes About Raw File Names - +

Raw file names were introduced together with Unicode file name support in erts-5.8.2 (OTP R14B01). The reason "raw file names" was introduced in the system was to be able to @@ -1014,7 +1017,8 @@ ok allowed. This setting should correspond to the actual terminal you are using.

The environment can also affect file name interpretation, if - Erlang is started with the +fna flag.

+ Erlang is started with the +fna flag (which is default from + Erlang/OTP 17.0).

You can check the setting of this by calling io:getopts(), which will give you an option list containing {encoding,unicode} or @@ -1046,8 +1050,7 @@ ok > 255.

+fnl means bytewise interpretation of file names, which was the usual way to represent ISO-Latin-1 file names before - UTF-8 file naming got widespread. This is the default on all - Unix-like operating systems except MacOS X.

+ UTF-8 file naming got widespread.

+fnu means that file names are encoded in UTF-8, which is nowadays the common scheme (although not enforced).

+fna means that you automatically select between @@ -1055,8 +1058,8 @@ ok LC_CTYPE environment variables. This is optimistic heuristics indeed, nothing enforces a user to have a terminal with the same encoding as the file system, but usually, this is - the case. This might be the default behavior in a future - release.

+ the case. This is the default on all Unix-like operating + systems except MacOS X.

The file name translation mode can be read with the file:native_name_encoding/0 function, which returns @@ -1068,7 +1071,7 @@ ok

This function returns the default encoding for Erlang source files (if no encoding comment is present) in the currently running release. For R16 this returns latin1 (meaning - bytewise encoding). In R17 and forward it is expected to return + bytewise encoding). In 17.0 and forward it returns utf8.

The encoding of each file can be specified using comments as described in -- cgit v1.2.3