From 3e6877b06ae395a9d4310ef664d0360867a47f62 Mon Sep 17 00:00:00 2001
From: Patrik Nyblom Force the The VM works with file names as if they are encoded using the ISO-latin-1 encoding, disallowing Unicode characters with codepoints beyond 255. This is default on operating systems that have transparent file naming, i.e. all Unixes except MacOSX. The VM works with file names as if they are encoded using UTF-8 (or some other system specific Unicode encoding). This is the default on operating systems that enforce Unicode encoding, i.e. Windows and MacOSX. By enabling Unicode file name translation on systems where this is not default, you open up to the possibility that some file names can not be interpreted by the VM and therefore will be returned to the program as raw binaries. The option is therefore considered experimental. Selection between Sets the default heap size of processes to the size
diff --git a/lib/kernel/doc/src/file.xml b/lib/kernel/doc/src/file.xml
index 64cdd3a8ea..d3441d3623 100644
--- a/lib/kernel/doc/src/file.xml
+++ b/lib/kernel/doc/src/file.xml
@@ -36,6 +36,61 @@
other Erlang processes to continue executing in parallel with
the file operations. See the command line flag
The Erlang VM supports file names in Unicode to a limited
+ extent. Depending on how the VM is started (with the parameter
+ The default behavior for Unicode character translation depends
+ on to what extent the underlying OS/filesystem enforces consistent
+ naming. On OSes where all file names are ensured to be in one or
+ another encoding, Unicode is the default (currently this holds for
+ Windows and MacOSX). On OSes with completely transparent file
+ naming (i.e. all Unixes except MacOSX), ISO-latin-1 file naming is
+ the default. The reason for the ISO-latin-1 default is that
+ file names are not guaranteed to be possible to interpret according to
+ the Unicode encoding expected (i.e. UTF-8), and file names that
+ cannot be decoded will only be accessible by using "raw
+ file names", in other word file names given as binaries. As file names are traditionally not binaries in Erlang,
+ applications that need to handle raw file names need to be
+ converted, why the Unicode mode for file names is not default on
+ systems having completely transparent file naming. Raw file names is a new feature in OTP R14B01, which allows the
+ user to supply completely uninterpreted file names to the
+ underlying OS/filesystem. They are supplied as binaries, where it
+ is up to the user to supply a correct encoding for the
+ environment. The function Note that on Windows, This function returns the configured default file name encoding to use for raw file names. Generally an application supplying file names raw (as binaries), should obey the character encoding returned by this function. By default, the VM uses ISO-latin-1 file name encoding on filesystems and/or OSes that use completely transparent file naming. This includes all Unix versions except for MacOSX, where the vfs layer enforces UTF-8 file naming. By giving the experimental option On Windows, this function also returns This module contains utilities on a higher level than the The module supports Unicode file names, so that it will match against regular expressions given in Unicode and that it will find and process raw file names (i.e. files named in a way that does not confirm to the expected encoding). If the VM operates in Unicode file naming mode on a machine with transparent file naming, the For more information about raw file names, see the
-filename() = string() | atom() | DeepList
-dirname() = filename()
-DeepList = [char() | atom() | DeepList]
+filename() = = string() | atom() | DeepList | RawFilename
+ DeepList = [char() | atom() | DeepList]
+ RawFilename = binary()
+ If VM is in unicode filename mode, string() and char() are allowed to be > 255.
+ RawFilename is a filename not subject to Unicode translation, meaning that it
+ can contain characters not conforming to the Unicode encoding expected from the
+ filesystem (i.e. non-UTF-8 characters although the VM is started in Unicode
+ filename mode).
+dirname() = filename()
If Unicode file name translation is in effect and the file
+ system is completely transparent, file names that cannot be
+ interpreted as Unicode may be encountered, in which case the
+
For more information about raw file names, see the
+
The module supports raw file names in the way that if a binary is present, or the file name cannot be interpreted according to the return value of
+
-name() = string() | atom() | DeepList
-DeepList = [char() | atom() | DeepList]
+name() = string() | atom() | DeepList | RawFilename
+ DeepList = [char() | atom() | DeepList]
+ RawFilename = binary()
+ If VM is in unicode filename mode, string() and char() are allowed to be > 255.
+ RawFilename is a filename not subject to Unicode translation, meaning that it
+ can contain characters not conforming to the Unicode encoding expected from the
+ filesystem (i.e. non-UTF-8 characters although the VM is started in Unicode
+ filename mode).
+