From aa0c4b0df7cdc750450906aff4e8c81627d80605 Mon Sep 17 00:00:00 2001
From: Paul Schoenfelder
Date: Tue, 31 Jan 2017 17:40:34 -0600
Subject: Update erl_tar to support PAX format, etc.

This commit introduces the following key changes:

- Support for reading tar archives in formats currently in common use,
  such as v7, STAR, USTAR, PAX, and GNU tar's extensions to the
  STAR/USTAR format.
- Support for writing PAX archives, only when necessary, using USTAR
  when possible for greater portability.

These changes result in lifting of some prior restrictions:

- Support for reading archives produced by modern tar implementations
  when other restrictions described below are present.
- Support for filenames which exceed 100 bytes in length, or paths
  which exceed 255 bytes (see USTAR format specification for more
  details on this restriction).
- Support for filenames of arbitrary length
- Support for unicode metadata (the previous behaviour of erl_tar was
  actually violating the spec, by writing unicode-encoded data to
  fields which are defined to be 7-bit ASCII, even though this
  technically worked when using erl_tar at source and destination, it
  may not have worked with other tar utilities, and this
  implementation now conforms to the spec).
- Support for uid/gid values which cannot be converted to octal
  integers.
---
 lib/stdlib/doc/src/erl_tar.xml | 72 ++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 34 deletions(-)

(limited to 'lib/stdlib/doc')

diff --git a/lib/stdlib/doc/src/erl_tar.xml b/lib/stdlib/doc/src/erl_tar.xml
index 24e7b64b9e..f28d8b425b 100644
--- a/lib/stdlib/doc/src/erl_tar.xml
+++ b/lib/stdlib/doc/src/erl_tar.xml
@@ -37,12 +37,13 @@

 This module archives and extract files to and from
- a tar file. This module supports the ustar format
- (IEEE Std 1003.1 and ISO/IEC 9945-1). All modern tar
- programs (including GNU tar) can read this format. To ensure that
- that GNU tar produces a tar file that erl_tar can read,
- specify option --format=ustar to GNU tar.

-
+ a tar file. This module supports reading most common tar formats,
+ namely v7, STAR, USTAR, and PAX, as well as some of GNU tar's extensions
+ to the USTAR format (sparse files most notably). It produces tar archives
+ in USTAR format, unless the files being archived require PAX format due to
+ restrictions in USTAR (such as unicode metadata, filename length, and more).
+ As such, erl_tar supports tar archives produced by almost all modern
+ tar utilities, and produces tarballs which should be similarly portable.

By convention, the name of a tar file is to end in ".tar". To abide to the convention, add ".tar" to the name.

@@ -83,6 +84,8 @@

If file:native_name_encoding/0 returns latin1, no translation of path names is done.

+ +

Unicode metadata stored in PAX headers is preserved

@@ -104,21 +107,20 @@ Limitations -

For maximum compatibility, it is safe to archive files with names
- up to 100 characters in length. Such tar files can generally be
- extracted by any tar program.

-
- -

For filenames exceeding 100 characters in length, the resulting tar
- file can only be correctly extracted by a POSIX-compatible tar
- program (such as Solaris tar or a modern GNU tar).

-
- -

Files with longer names than 256 bytes cannot be stored.

+

If you must remain compatible with the USTAR tar format, you must ensure file paths being
+ stored are less than 255 bytes in total, with a maximum filename component
+ length of 100 bytes. USTAR uses a header field (prefix) in addition to the name field, and
+ splits file paths longer than 100 bytes into two parts. This split is done on a directory boundary,
+ and is done in such a way as to make the best use of the space available in those two fields, but in practice
+ this will often mean that you have less than 255 bytes for a path. erl_tar will
+ automatically upgrade the format to PAX to handle longer filenames, so this is only an issue if you
+ need to extract the archive with an older implementation of erl_tar or tar which does
+ not support PAX. In this case, the PAX headers will be extracted as regular files, and you will need to
+ apply them manually.

-

The file name a symbolic link points is always limited
- to 100 characters.

+

Like the above, if you must remain USTAR compatible, you must also ensure that paths for
+ symbolic/hard links are no more than 100 bytes, otherwise PAX headers will be used.

@@ -129,7 +131,9 @@
 Add a file to an open tar file.
 TarDescriptor = term()
- Filename = filename()
+ FilenameOrBin = filename()|binary()
+ NameInArchive = filename()
+ Filename = filename()|{NameInArchive,FilenameOrBin}
 Options = [Option]
 Option = dereference|verbose|{chunks,ChunkSize}
 ChunkSize = positive_integer()
@@ -139,6 +143,9 @@

Adds a file to a tar file that has been opened for writing by open/1.

+

NameInArchive is the name under which the file becomes
+ stored in the tar file. The file gets this name when it is
+ extracted from the tar file.

Options:

 dereference
@@ -183,9 +190,6 @@
 open/2. This function accepts the same options as add/3.

-

NameInArchive is the name under which the file becomes
- stored in the tar file. The file gets this name when it is
- extracted from the tar file.

@@ -206,8 +210,8 @@
 Create a tar archive.
 Name = filename()
- FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]
+ FileList = [Filename|{NameInArchive, FilenameOrBin}]
+ FilenameOrBin = filename()|binary()
 Filename = filename()
 NameInArchive = filename()
 RetValue = ok|{error,{Name,Reason}}
@@ -225,8 +229,8 @@
 Create a tar archive with options.
 Name = filename()
- FileList = [Filename|{NameInArchive, binary()},{NameInArchive,
- Filename}]
+ FileList = [Filename|{NameInArchive, FilenameOrBin}]
+ FilenameOrBin = filename()|binary()
 Filename = filename()
 NameInArchive = filename()
 OptionList = [Option]
@@ -275,7 +279,8 @@
 extract(Name) -> RetValue
 Extract all files from a tar file.
- Name = filename()
+ Name = filename() | {binary,binary()} | {file,Fd}
+ Fd = file_descriptor()
 RetValue = ok|{error,{Name,Reason}}
 Reason = term()
@@ -294,8 +299,7 @@
 extract(Name, OptionList)
 Extract files from a tar file.
- Name = filename() | {binary,Binary} | {file,Fd}
- Binary = binary()
+ Name = filename() | {binary,binary()} | {file,Fd}
 Fd = file_descriptor()
 OptionList = [Option]
 Option = {cwd,Cwd}|{files,FileList}|keep_old_files|verbose|memory
@@ -521,7 +525,7 @@ erl_tar:close(TarDesc)
 table(Name) -> RetValue
 Retrieve the name of all files in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}
 RetValue = {ok,[string()]}|{error,{Name,Reason}}
 Reason = term()
@@ -535,7 +539,7 @@ erl_tar:close(TarDesc)
 Retrieve name and information of all files in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}

Retrieves the names of all files in the tar file Name.

@@ -546,7 +550,7 @@ erl_tar:close(TarDesc)
 t(Name)
 Print the name of each file in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}

 Prints the names of all files in the tar file Name to the
@@ -559,7 +563,7 @@ erl_tar:close(TarDesc)
 Print name and information for each file in a tar file.
- Name = filename()
+ Name = filename()|{binary,binary()}|{file,file_descriptor()}

 Prints names and information about all files in the tar file
--
cgit v1.2.3
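
Reviewer note: the widened `FileList` and `Name` types documented in this patch can be exercised with a round trip along the following lines. This is a minimal sketch, not part of the commit. The module name `erl_tar_demo`, the scratch file `demo.tar`, and the member name `greeting.txt` are assumptions made for illustration, and it presumes an OTP release that includes this change.

```erlang
-module(erl_tar_demo).
-export([roundtrip/0]).

%% Hypothetical demo module: the names used here are illustrative only.
roundtrip() ->
    Tar = "demo.tar",
    %% {NameInArchive, binary()} stores in-memory data under the given name.
    ok = erl_tar:create(Tar, [{"greeting.txt", <<"hello\n">>}]),
    {ok, Bin} = file:read_file(Tar),
    ok = file:delete(Tar),
    %% table/1 and extract/2 accept {binary, binary()} as the archive source.
    {ok, Names} = erl_tar:table({binary, Bin}),
    %% The memory option returns the contents instead of writing files.
    {ok, Files} = erl_tar:extract({binary, Bin}, [memory]),
    {Names, Files}.
```

Calling `erl_tar_demo:roundtrip()` returns the list of member names together with the `{Name, Contents}` pairs read back from the in-memory archive, which makes it easy to check that what was stored is what comes out.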