From aa0c4b0df7cdc750450906aff4e8c81627d80605 Mon Sep 17 00:00:00 2001
From: Paul Schoenfelder
Date: Tue, 31 Jan 2017 17:40:34 -0600
Subject: Update erl_tar to support PAX format, etc.
This commit introduces the following key changes:
- Support for reading tar archives in formats currently in common use,
such as v7, STAR, USTAR, PAX, and GNU tar's extensions to the
STAR/USTAR format.
- Support for writing PAX archives only when necessary, using USTAR
whenever possible for greater portability.
These changes lift several prior restrictions:
- Support for reading archives produced by modern tar implementations,
even when those archives use the features described below.
- Support for filenames that exceed 100 bytes in length, or paths that
exceed 255 bytes (see the USTAR format specification for more details on
this restriction).
- Support for filenames of arbitrary length.
- Support for Unicode metadata. The previous behaviour of erl_tar
actually violated the spec by writing Unicode-encoded data to fields
that are defined as 7-bit ASCII. Although this worked when erl_tar was
used at both source and destination, it may not have worked with other
tar utilities. This implementation now conforms to the spec.
- Support for uid/gid values that are too large to be encoded as octal
integers in the fixed-width USTAR header fields.
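The USTAR-versus-PAX decision described above can be sketched roughly as follows. This is a minimal Python illustration, not erl_tar's implementation. The functions ustar_split, fits_octal and needs_pax are hypothetical. Only the header field sizes come from the USTAR layout: 100-byte name and linkname, 155-byte prefix, and 8-byte octal uid/gid fields with a NUL terminator.

```python
# Hypothetical sketch (not erl_tar's code) of when an archiver must
# fall back from plain USTAR headers to PAX extended headers.
NAME_SIZE = 100      # USTAR name field
PREFIX_SIZE = 155    # USTAR prefix field
LINKNAME_SIZE = 100  # USTAR linkname field (no prefix available)
ID_FIELD_SIZE = 8    # uid/gid: octal digits plus a NUL terminator


def ustar_split(path: bytes):
    """Return (prefix, name) fitting the USTAR fields, or None.

    The split must fall on a '/', which itself is not stored. Taking the
    leftmost usable slash keeps the name as long as possible and the
    prefix as short as possible, so if that prefix is still too long,
    no split exists.
    """
    if len(path) <= NAME_SIZE:
        return b"", path
    for i, byte in enumerate(path):
        if byte != ord("/") or i == 0:
            continue  # not a usable directory boundary
        prefix, name = path[:i], path[i + 1:]
        if len(name) > NAME_SIZE:
            continue  # name part still too long; try a later slash
        if not name or len(prefix) > PREFIX_SIZE:
            return None
        return prefix, name
    return None


def fits_octal(value: int, field_size: int = ID_FIELD_SIZE) -> bool:
    # field_size - 1 octal digits, leaving one byte for the NUL.
    return 0 <= value < 8 ** (field_size - 1)


def needs_pax(path: str, linkname: str = "", uid: int = 0, gid: int = 0) -> bool:
    # USTAR string fields are 7-bit ASCII; anything else needs PAX records.
    if not (path.isascii() and linkname.isascii()):
        return True
    if ustar_split(path.encode("ascii")) is None:
        return True
    if len(linkname.encode("ascii")) > LINKNAME_SIZE:
        return True
    return not (fits_octal(uid) and fits_octal(gid))
```

For example, needs_pax("docs/readme.txt") is False, while a non-ASCII name, a link target longer than 100 bytes, or a uid of 3000000 (beyond the 7-digit octal maximum of 2097151) each require PAX.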
---
lib/stdlib/doc/src/erl_tar.xml | 72 ++++++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 34 deletions(-)
diff --git a/lib/stdlib/doc/src/erl_tar.xml b/lib/stdlib/doc/src/erl_tar.xml
index 24e7b64b9e..f28d8b425b 100644
--- a/lib/stdlib/doc/src/erl_tar.xml
+++ b/lib/stdlib/doc/src/erl_tar.xml
@@ -37,12 +37,13 @@
This module archives and extract files to and from
- a tar file. This module supports the ustar format
- (IEEE Std 1003.1 and ISO/IEC 9945-1). All modern tar
- programs (including GNU tar) can read this format. To ensure that
- that GNU tar produces a tar file that erl_tar can read,
- specify option --format=ustar to GNU tar.
-
+ a tar file. This module supports reading most common tar formats,
+ namely v7, STAR, USTAR, and PAX, as well as some of GNU tar's extensions
+ to the USTAR format (most notably sparse files). It produces tar archives
+ in USTAR format, unless the files being archived require PAX format due to
+ restrictions in USTAR (such as Unicode metadata, filename length, and more).
+ As such, erl_tar supports tar archives produced by most modern
+ tar utilities, and produces tarballs that should be similarly portable.
By convention, the name of a tar file is to end in ".tar".
To abide to the convention, add ".tar" to the name.
@@ -83,6 +84,8 @@
If
file:native_name_encoding/0
returns latin1, no translation of path names is done.
+
+ Unicode metadata stored in PAX headers is preserved.
@@ -104,21 +107,20 @@
Limitations
-
-
For maximum compatibility, it is safe to archive files with names
- up to 100 characters in length. Such tar files can generally be
- extracted by any tar program.
-
- -
-
For filenames exceeding 100 characters in length, the resulting tar
- file can only be correctly extracted by a POSIX-compatible tar
- program (such as Solaris tar or a modern GNU tar).
-
- -
-
Files with longer names than 256 bytes cannot be stored.
+ If you must remain compatible with the USTAR tar format, you must ensure that file paths being
+ stored are less than 255 bytes in total, with a maximum filename component
+ length of 100 bytes. USTAR uses a header field (prefix) in addition to the name field, and
+ splits file paths longer than 100 bytes into two parts. The split is made at a directory boundary
+ and tries to make the best use of the space available in the two fields, but in practice
+ this often means that you have less than 255 bytes for a path. erl_tar
+ automatically upgrades the format to PAX to handle longer filenames, so this is only an issue if you
+ need to extract the archive with an older implementation of erl_tar or tar that does
+ not support PAX. In that case, the PAX headers are extracted as regular files, and you need to
+ apply them manually.
-
-
The file name a symbolic link points is always limited
- to 100 characters.
+ As above, if you must remain USTAR compatible, you must also ensure that paths for
+ symbolic/hard links are no more than 100 bytes; otherwise PAX headers are used.
@@ -129,7 +131,9 @@
Add a file to an open tar file.
TarDescriptor = term()
- Filename = filename()
+ FilenameOrBin = filename()|binary()
+ NameInArchive = filename()
+ Filename = filename()|{NameInArchive,FilenameOrBin}
Options = [Option]
Option = dereference|verbose|{chunks,ChunkSize}
ChunkSize = positive_integer()
@@ -139,6 +143,9 @@
Adds a file to a tar file that has been opened for writing by
open/1.
+ NameInArchive is the name under which the file is
+ stored in the tar file. The file gets this name when it is
+ extracted from the tar file.
Options:
dereference
@@ -183,9 +190,6 @@
open/2. This function
accepts the same options as
add/3.
- NameInArchive is the name under which the file becomes
- stored in the tar file. The file gets this name when it is
- extracted from the tar file.
@@ -206,8 +210,8 @@
Create a tar archive.
Name = filename()
- FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]
+ FileList = [Filename|{NameInArchive, FilenameOrBin}]
+ FilenameOrBin = filename()|binary()
Filename = filename()
NameInArchive = filename()
RetValue = ok|{error,{Name,Reason}}
@@ -225,8 +229,8 @@
Create a tar archive with options.
Name = filename()
- FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]
+ FileList = [Filename|{NameInArchive, FilenameOrBin}]
+ FilenameOrBin = filename()|binary()
Filename = filename()
NameInArchive = filename()
OptionList = [Option]
@@ -275,7 +279,8 @@
extract(Name) -> RetValue
Extract all files from a tar file.
- Name = filename()
+ Name = filename() | {binary,binary()} | {file,Fd}
+ Fd = file_descriptor()
RetValue = ok|{error,{Name,Reason}}
Reason = term()
@@ -294,8 +299,7 @@
extract(Name, OptionList)
Extract files from a tar file.
- Name = filename() | {binary,Binary} | {file,Fd}
- Binary = binary()
+ Name = filename() | {binary,binary()} | {file,Fd}
Fd = file_descriptor()
OptionList = [Option]
Option = {cwd,Cwd}|{files,FileList}|keep_old_files|verbose|memory
@@ -521,7 +525,7 @@ erl_tar:close(TarDesc)
table(Name) -> RetValue
Retrieve the name of all files in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}
RetValue = {ok,[string()]}|{error,{Name,Reason}}
Reason = term()
@@ -535,7 +539,7 @@ erl_tar:close(TarDesc)
Retrieve name and information of all files in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}
Retrieves the names of all files in the tar file Name.
@@ -546,7 +550,7 @@ erl_tar:close(TarDesc)
t(Name)
Print the name of each file in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}
Prints the names of all files in the tar file Name to the
@@ -559,7 +563,7 @@ erl_tar:close(TarDesc)
Print name and information for each file in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}
Prints names and information about all files in the tar file