diff options
Diffstat (limited to 'lib/kernel/doc/src/file.xml')
-rw-r--r-- | lib/kernel/doc/src/file.xml | 135 |
1 files changed, 132 insertions, 3 deletions
diff --git a/lib/kernel/doc/src/file.xml b/lib/kernel/doc/src/file.xml index 50f9722a1c..36fce464c5 100644 --- a/lib/kernel/doc/src/file.xml +++ b/lib/kernel/doc/src/file.xml @@ -36,6 +36,61 @@ other Erlang processes to continue executing in parallel with the file operations. See the command line flag <c>+A</c> in <seealso marker="erts:erl">erl(1)</seealso>.</p> + + <p>The Erlang VM supports file names in Unicode to a limited + extent. Depending on how the VM is started (with the parameter + <c>+fnu</c> or <c>+fnl</c>), file names given can contain + characters > 255 and the VM system will convert file names + back and forth to the native file name encoding.</p> + + <p>The default behavior for Unicode character translation depends + on to what extent the underlying OS/filesystem enforces consistent + naming. On OSes where all file names are ensured to be in one or + another encoding, Unicode is the default (currently this holds for + Windows and MacOSX). On OSes with completely transparent file + naming (i.e. all Unixes except MacOSX), ISO-latin-1 file naming is + the default. The reason for the ISO-latin-1 default is that + file names are not guaranteed to be possible to interpret according to + the Unicode encoding expected (i.e. UTF-8), and file names that + cannot be decoded will only be accessible by using "raw + file names", in other word file names given as binaries.</p> + + <p>As file names are traditionally not binaries in Erlang, + applications that need to handle raw file names need to be + converted, why the Unicode mode for file names is not default on + systems having completely transparent file naming.</p> + + <note>As of R14B01, the most basic file handling modules + (<c>file</c>, <c>prim_file</c>, <c>filelib</c> and + <c>filename</c>) accept raw file names, but the rest of OTP is not + guaranteed to handle them, why Unicode file naming on systems + where it is not default is still considered experimental.</note> + + <p>Raw file names is a new feature in OTP R14B01, which allows the + user to supply completely uninterpreted file names to the + underlying OS/filesystem. They are supplied as binaries, where it + is up to the user to supply a correct encoding for the + environment. The function <c>file:native_name_encoding()</c> can + be used to check what encoding the VM is working in. If the + function returns <c>latin1</c> file names are not in any way + converted to Unicode, if it is <c>utf8</c>, raw file names should + be encoded as UTF-8 if they are to follow the convention of the VM + (and usually the convention of the OS as well). Using raw + file names is useful if you have a filesystem with inconsistent + file naming, where some files are named in UTF-8 encoding while + others are not. A file:list_dir on such mixed file name systems + when the VM is in Unicode file name mode might return file names as + raw binaries as they cannot be interpreted as Unicode + file names. Raw file names can also be used to give UTF-8 encoded + file names even though the VM is not started in Unicode file name + translation mode.</p> + + <p>Note that on Windows, <c>file:native_name_encoding()</c> + returns <c>utf8</c> per default, which is the format for raw + file names even on Windows, although the underlying OS specific + code works in a limited version of little endian UTF16. As far as + the Erlang programmer is concerned, Windows native Unicode format + is UTF-8...</p> </description> <section> @@ -47,8 +102,14 @@ iodata() = iolist() | binary() io_device() as returned by file:open/2, a process handling IO protocols -name() = string() | atom() | DeepList +name() = string() | atom() | DeepList | RawFilename DeepList = [char() | atom() | DeepList] + RawFilename = binary() + If VM is in unicode filename mode, string() and char() are allowed to be > 255. + RawFilename is a filename not subject to Unicode translation, meaning that it + can contain characters not conforming to the Unicode encoding expected from the + filesystem (i.e. non-UTF-8 characters although the VM is started in Unicode + filename mode). posix() an atom which is named from the POSIX error codes used in @@ -62,6 +123,25 @@ time() = {{Year, Month, Day}, {Hour, Minute, Second}} </section> <funcs> <func> + <name>advise(IoDevice, Offset, Length, Advise) -> ok | {error, Reason}</name> + <fsummary>Predeclare an access pattern for file data</fsummary> + <type> + <v>IoDevice = io_device()</v> + <v>Offset = int()</v> + <v>Length = int()</v> + <v>Advise = posix_file_advise()</v> + <v>posix_file_advise() = normal | sequential | random | no_reuse + | will_need | dont_need</v> + <v>Reason = ext_posix()</v> + </type> + <desc> + <p><c>advise/4</c> can be used to announce an intention to access file + data in a specific pattern in the future, thus allowing the + operating system to perform appropriate optimizations.</p> + <p>On some platforms, this function might have no effect.</p> + </desc> + </func> + <func> <name>change_group(Filename, Gid) -> ok | {error, Reason}</name> <fsummary>Change group of a file</fsummary> <type> @@ -579,13 +659,24 @@ f.txt: {person, "kalle", 25}. </desc> </func> <func> + <name>native_name_encoding() -> latin1 | utf8</name> + <fsummary>Return the VM's configured filename encoding.</fsummary> + <desc> + <p>This function returns the configured default file name encoding to use for raw file names. Generally an application supplying file names raw (as binaries), should obey the character encoding returned by this function.</p> + <p>By default, the VM uses ISO-latin-1 file name encoding on filesystems and/or OSes that use completely transparent file naming. This includes all Unix versions except MacOSX, where the vfs layer enforces UTF-8 file naming. By giving the experimental option <c>+fnu</c> when starting Erlang, UTF-8 translation of file names can be turned on even for those systems. If Unicode file name translation is in effect, the system behaves as usual as long as file names conform to the encoding, but will return file names that are not properly encoded in UTF-8 as raw file names (i.e. binaries).</p> + <p>On Windows, this function also returns <c>utf8</c> by default. The OS uses a pure Unicode naming scheme and file names are always possible to interpret as valid Unicode. The fact that the underlying Windows OS actually encodes file names using little endian UTF-16 can be ignored by the Erlang programmer. Windows and MacOSX are the only operating systems where the VM operates in Unicode file name mode by default.</p> + </desc> + </func> + <func> <name>open(Filename, Modes) -> {ok, IoDevice} | {error, Reason}</name> <fsummary>Open a file</fsummary> <type> <v>Filename = name()</v> <v>Modes = [Mode]</v> - <v> Mode = read | write | append | raw | binary | {delayed_write, Size, Delay} | delayed_write | {read_ahead, Size} | read_ahead | compressed</v> + <v> Mode = read | write | append | exclusive | raw | binary | {delayed_write, Size, Delay} | delayed_write | {read_ahead, Size} | read_ahead | compressed | {encoding, Encoding}</v> <v> Size = Delay = int()</v> + <v> Encoding = latin1 | unicode | utf8 | utf16 | {utf16, Endian} | utf32 | {utf32, Endian}</v> + <v> Endian = big | little</v> <v>IoDevice = io_device()</v> <v>Reason = ext_posix() | system_limit</v> </type> @@ -611,6 +702,17 @@ f.txt: {person, "kalle", 25}. file opened with <c>append</c> will take place at the end of the file.</p> </item> + <tag><c>exclusive</c></tag> + <item> + <p>The file, when opened for writing, is created if it + does not exist. If the file exists, open will return + <c>{error, eexist}</c>.</p> + <warning><p>This option does not guarantee exclusiveness on + file systems that do not support O_EXCL properly, + such as NFS. Do not depend on this option unless you + know that the file system supports it (in general, local + file systems should be safe).</p></warning> + </item> <tag><c>raw</c></tag> <item> <p>The <c>raw</c> option allows faster access to a file, @@ -1199,7 +1301,7 @@ f.txt: {person, "kalle", 25}. </item> <tag><c>{no_translation, unicode, latin1}</c></tag> <item> - <p>The file is was opened with another <c>encoding</c> than <c>latin1</c> and the data on the file can not be translated to the byte-oriented data that this function returns.</p> + <p>The file was opened with another <c>encoding</c> than <c>latin1</c> and the data in the file can not be translated to the byte-oriented data that this function returns.</p> </item> </taglist> </desc> @@ -1641,6 +1743,33 @@ f.txt: {person, "kalle", 25}. </desc> </func> <func> + <name>datasync(IoDevice) -> ok | {error, Reason}</name> + <fsummary>Synchronizes the in-memory data of a file, ignoring most of its metadata, with that on the physical medium</fsummary> + <type> + <v>IoDevice = io_device()</v> + <v>Reason = ext_posix() | terminated</v> + </type> + <desc> + <p>Makes sure that any buffers kept by the operating system + (not by the Erlang runtime system) are written to disk. In + many ways it's resembles fsync but it not requires to update + some of file's metadata such as the access time. On + some platforms, this function might have no effect.</p> + <p>Applications that access databases or log files often write + a tiny data fragment (e.g., one line in a log file) and then + call fsync() immediately in order to ensure that the written + data is physically stored on the harddisk. Unfortunately, fsync() + will always initiate two write operations: one for the newly + written data and another one in order to update the modification + time stored in the inode. If the modification time is not a part + of the transaction concept fdatasync() can be used to avoid + unnecessary inode disk write operations.</p> + <p>Available only in some POSIX systems. This call results in a + call to fsync(), or has no effect, in systems not implementing + the fdatasync syscall.</p> + </desc> + </func> + <func> <name>truncate(IoDevice) -> ok | {error, Reason}</name> <fsummary>Truncate a file</fsummary> <type> |