aboutsummaryrefslogtreecommitdiffstats
path: root/lib/stdlib/doc
diff options
context:
space:
mode:
authorPaul Schoenfelder <[email protected]>2017-01-31 17:40:34 -0600
committerPaul Schoenfelder <[email protected]>2017-02-16 08:55:15 -0500
commitaa0c4b0df7cdc750450906aff4e8c81627d80605 (patch)
tree1dddc195225011bb7fefd1094bc6852629b44f21 /lib/stdlib/doc
parentcce3120dd0021c5ab5bf8d5b4088e7364f678dda (diff)
downloadotp-aa0c4b0df7cdc750450906aff4e8c81627d80605.tar.gz
otp-aa0c4b0df7cdc750450906aff4e8c81627d80605.tar.bz2
otp-aa0c4b0df7cdc750450906aff4e8c81627d80605.zip
Update erl_tar to support PAX format, etc.
This commit introduces the following key changes: - Support for reading tar archives in formats currently in common use, such as v7, STAR, USTAR, PAX, and GNU tar's extensions to the STAR/USTAR format. - Support for writing PAX archives, only when necessary, using USTAR when possible for greater portability. These changes result in lifting of some prior restrictions: - Support for reading archives produced by modern tar implementations when other restrictions described below are present. - Support for filenames which exceed 100 bytes in length, or paths which exceed 255 bytes (see USTAR format specification for more details on this restriction). - Support for filenames of arbitrary length - Support for unicode metadata (the previous behaviour of erl_tar was actually violating the spec, by writing unicode-encoded data to fields which are defined to be 7-bit ASCII, even though this technically worked when using erl_tar at source and destination, it may not have worked with other tar utilities, and this implementation now conforms to the spec). - Support for uid/gid values which cannot be converted to octal integers.
Diffstat (limited to 'lib/stdlib/doc')
-rw-r--r--lib/stdlib/doc/src/erl_tar.xml72
1 files changed, 38 insertions, 34 deletions
diff --git a/lib/stdlib/doc/src/erl_tar.xml b/lib/stdlib/doc/src/erl_tar.xml
index 24e7b64b9e..f28d8b425b 100644
--- a/lib/stdlib/doc/src/erl_tar.xml
+++ b/lib/stdlib/doc/src/erl_tar.xml
@@ -37,12 +37,13 @@
</modulesummary>
<description>
<p>This module archives and extract files to and from
- a tar file. This module supports the <c>ustar</c> format
- (IEEE Std 1003.1 and ISO/IEC&nbsp;9945-1). All modern <c>tar</c>
- programs (including GNU tar) can read this format. To ensure that
- that GNU tar produces a tar file that <c>erl_tar</c> can read,
- specify option <c>--format=ustar</c> to GNU tar.</p>
-
+ a tar file. This module supports reading most common tar formats,
+ namely v7, STAR, USTAR, and PAX, as well as some of GNU tar's extensions
+ to the USTAR format (sparse files most notably). It produces tar archives
+ in USTAR format, unless the files being archived require PAX format due to
+ restrictions in USTAR (such as unicode metadata, filename length, and more).
+ As such, <c>erl_tar</c> supports tar archives produced by most all modern
+ tar utilities, and produces tarballs which should be similarly portable.</p>
<p>By convention, the name of a tar file is to end in "<c>.tar</c>".
To abide to the convention, add "<c>.tar</c>" to the name.</p>
@@ -83,6 +84,8 @@
<p>If <seealso marker="kernel:file#native_name_encoding/0">
<c>file:native_name_encoding/0</c></seealso>
returns <c>latin1</c>, no translation of path names is done.</p>
+
+ <p>Unicode metadata stored in PAX headers is preserved</p>
</section>
<section>
@@ -104,21 +107,20 @@
<title>Limitations</title>
<list type="bulleted">
<item>
- <p>For maximum compatibility, it is safe to archive files with names
- up to 100 characters in length. Such tar files can generally be
- extracted by any <c>tar</c> program.</p>
- </item>
- <item>
- <p>For filenames exceeding 100 characters in length, the resulting tar
- file can only be correctly extracted by a POSIX-compatible <c>tar</c>
- program (such as Solaris <c>tar</c> or a modern GNU <c>tar</c>).</p>
- </item>
- <item>
- <p>Files with longer names than 256 bytes cannot be stored.</p>
+ <p>If you must remain compatible with the USTAR tar format, you must ensure file paths being
+ stored are less than 255 bytes in total, with a maximum filename component
+ length of 100 bytes. USTAR uses a header field (prefix) in addition to the name field, and
+ splits file paths longer than 100 bytes into two parts. This split is done on a directory boundary,
+ and is done in such a way to make the best use of the space available in those two fields, but in practice
+ this will often mean that you have less than 255 bytes for a path. <c>erl_tar</c> will
+ automatically upgrade the format to PAX to handle longer filenames, so this is only an issue if you
+ need to extract the archive with an older implementation of <c>erl_tar</c> or <c>tar</c> which does
+ not support PAX. In this case, the PAX headers will be extracted as regular files, and you will need to
+ apply them manually.</p>
</item>
<item>
- <p>The file name a symbolic link points is always limited
- to 100 characters.</p>
+ <p>Like the above, if you must remain USTAR compatible, you must also ensure than paths for
+ symbolic/hard links are no more than 100 bytes, otherwise PAX headers will be used.</p>
</item>
</list>
</section>
@@ -129,7 +131,9 @@
<fsummary>Add a file to an open tar file.</fsummary>
<type>
<v>TarDescriptor = term()</v>
- <v>Filename = filename()</v>
+ <v>FilenameOrBin = filename()|binary()</v>
+ <v>NameInArchive = filename()</v>
+ <v>Filename = filename()|{NameInArchive,FilenameOrBin}</v>
<v>Options = [Option]</v>
<v>Option = dereference|verbose|{chunks,ChunkSize}</v>
<v>ChunkSize = positive_integer()</v>
@@ -139,6 +143,9 @@
<desc>
<p>Adds a file to a tar file that has been opened for writing by
<seealso marker="#open/2"><c>open/1</c></seealso>.</p>
+ <p><c>NameInArchive</c> is the name under which the file becomes
+ stored in the tar file. The file gets this name when it is
+ extracted from the tar file.</p>
<p>Options:</p>
<taglist>
<tag><c>dereference</c></tag>
@@ -183,9 +190,6 @@
<seealso marker="#open/2"><c>open/2</c></seealso>. This function
accepts the same options as
<seealso marker="#add/3"><c>add/3</c></seealso>.</p>
- <p><c>NameInArchive</c> is the name under which the file becomes
- stored in the tar file. The file gets this name when it is
- extracted from the tar file.</p>
</desc>
</func>
@@ -206,8 +210,8 @@
<fsummary>Create a tar archive.</fsummary>
<type>
<v>Name = filename()</v>
- <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]</v>
+ <v>FileList = [Filename|{NameInArchive, FilenameOrBin}]</v>
+ <v>FilenameOrBin = filename()|binary()</v>
<v>Filename = filename()</v>
<v>NameInArchive = filename()</v>
<v>RetValue = ok|{error,{Name,Reason}}</v>
@@ -225,8 +229,8 @@
<fsummary>Create a tar archive with options.</fsummary>
<type>
<v>Name = filename()</v>
- <v>FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]</v>
+ <v>FileList = [Filename|{NameInArchive, FilenameOrBin}]</v>
+ <v>FilenameOrBin = filename()|binary()</v>
<v>Filename = filename()</v>
<v>NameInArchive = filename()</v>
<v>OptionList = [Option]</v>
@@ -275,7 +279,8 @@
<name>extract(Name) -> RetValue</name>
<fsummary>Extract all files from a tar file.</fsummary>
<type>
- <v>Name = filename()</v>
+ <v>Name = filename() | {binary,binary()} | {file,Fd}</v>
+ <v>Fd = file_descriptor()</v>
<v>RetValue = ok|{error,{Name,Reason}}</v>
<v>Reason = term()</v>
</type>
@@ -294,8 +299,7 @@
<name>extract(Name, OptionList)</name>
<fsummary>Extract files from a tar file.</fsummary>
<type>
- <v>Name = filename() | {binary,Binary} | {file,Fd}</v>
- <v>Binary = binary()</v>
+ <v>Name = filename() | {binary,binary()} | {file,Fd}</v>
<v>Fd = file_descriptor()</v>
<v>OptionList = [Option]</v>
<v>Option = {cwd,Cwd}|{files,FileList}|keep_old_files|verbose|memory</v>
@@ -521,7 +525,7 @@ erl_tar:close(TarDesc)</code>
<name>table(Name) -> RetValue</name>
<fsummary>Retrieve the name of all files in a tar file.</fsummary>
<type>
- <v>Name = filename()</v>
+ <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v>
<v>RetValue = {ok,[string()]}|{error,{Name,Reason}}</v>
<v>Reason = term()</v>
</type>
@@ -535,7 +539,7 @@ erl_tar:close(TarDesc)</code>
<fsummary>Retrieve name and information of all files in a tar file.
</fsummary>
<type>
- <v>Name = filename()</v>
+ <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v>
</type>
<desc>
<p>Retrieves the names of all files in the tar file <c>Name</c>.</p>
@@ -546,7 +550,7 @@ erl_tar:close(TarDesc)</code>
<name>t(Name)</name>
<fsummary>Print the name of each file in a tar file.</fsummary>
<type>
- <v>Name = filename()</v>
+ <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v>
</type>
<desc>
<p>Prints the names of all files in the tar file <c>Name</c> to the
@@ -559,7 +563,7 @@ erl_tar:close(TarDesc)</code>
<fsummary>Print name and information for each file in a tar file.
</fsummary>
<type>
- <v>Name = filename()</v>
+ <v>Name = filename()|{binary,binary()}|{file,file_descriptor()}</v>
</type>
<desc>
<p>Prints names and information about all files in the tar file