author Erlang/OTP <[email protected]> 2009-11-20 14:54:40 +0000
committer Erlang/OTP <[email protected]> 2009-11-20 14:54:40 +0000
commit 84adefa331c4159d432d22840663c38f155cd4c1
tree bff9a9c66adda4df2106dfd0e5c053ab182a12bd /lib/kernel/test/zlib_SUITE_data/zipdoc
The R13B03 release. (tag OTP_R13B03)
Diffstat (limited to 'lib/kernel/test/zlib_SUITE_data/zipdoc')
-rw-r--r-- lib/kernel/test/zlib_SUITE_data/zipdoc 1924
1 files changed, 1924 insertions, 0 deletions
diff --git a/lib/kernel/test/zlib_SUITE_data/zipdoc b/lib/kernel/test/zlib_SUITE_data/zipdoc
new file mode 100644
index 0000000000..e63952e3ef
--- /dev/null
+++ b/lib/kernel/test/zlib_SUITE_data/zipdoc
@@ -0,0 +1,1924 @@
+[Info-ZIP note, 981119: this file is based on PKWARE's appnote.txt of
+ 15 February 1996, taking into account PKWARE's revised appnote.txt version
+ of 01 September 1998. It has been unofficially corrected and extended by
+ Info-ZIP without explicit permission by PKWARE. Although Info-ZIP
+ believes the information to be accurate and complete, it is provided
+ under a disclaimer similar to the PKWARE disclaimer below, differing
+ only in the substitution of "Info-ZIP" for "PKWARE". In other words,
+ use this information at your own risk, but we think it's correct.
+
+ Specification info from PKWARE that was obviously wrong has been corrected
+ silently (e.g. missing structure fields, wrong numbers).
+ As of PKZIPW 2.50, two new incompatibilities have been introduced by PKWARE;
+ they are noted below. Note that the "NTFS tag" conflict is currently not
+ real; PKZIPW 2.50 actually tags NTFS files as having come from a FAT
+ file system, too.]
+
+
+Disclaimer
+----------
+
+Although PKWARE will attempt to supply current and accurate
+information relating to its file formats, algorithms, and the
+subject programs, the possibility of error cannot be eliminated.
+PKWARE therefore expressly disclaims any warranty that the
+information contained in the associated materials relating to the
+subject programs and/or the format of the files created or
+accessed by the subject programs and/or the algorithms used by
+the subject programs, or any other matter, is current, correct or
+accurate as delivered. Any risk of damage due to any possible
+inaccurate information is assumed by the user of the information.
+Furthermore, the information relating to the subject programs
+and/or the file formats created or accessed by the subject
+programs and/or the algorithms used by the subject programs is
+subject to change without notice.
+
+
+General Format of a ZIP file
+----------------------------
+
+ Files are stored in arbitrary order. Large zipfiles can span multiple
+ diskette media.
+
+ Overall zipfile format:
+
+ [local file header + file data + data_descriptor] . . .
+ [central directory] end of central directory record
+
+
+ A. Local file header:
+
+ local file header signature 4 bytes (0x04034b50)
+ version needed to extract 2 bytes
+ general purpose bit flag 2 bytes
+ compression method 2 bytes
+ last mod file time 2 bytes
+ last mod file date 2 bytes
+ crc-32 4 bytes
+ compressed size 4 bytes
+ uncompressed size 4 bytes
+ filename length 2 bytes
+ extra field length 2 bytes
+
+ filename (variable size)
+ extra field (variable size)
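As a minimal sketch (Python, standard library only), the fixed 30-byte part of the local file header can be read with `struct`, using the little-endian ("Intel") byte order that the extra-field section states for all fields. `read_local_header` and `LocalHeader` are hypothetical names, and the sketch assumes the archive is already in memory as a `bytes` object.

```python
import struct
from typing import NamedTuple

# Fixed part of the local file header, little-endian: signature,
# version needed, flags, method, time, date, crc-32, compressed size,
# uncompressed size, filename length, extra field length (30 bytes).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


class LocalHeader(NamedTuple):
    version_needed: int
    flags: int
    method: int
    mod_time: int
    mod_date: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    filename: bytes
    extra: bytes
    data_offset: int  # offset of the first byte of file data


def read_local_header(buf: bytes, offset: int = 0) -> LocalHeader:
    """Parse the local file header that starts at `offset` in `buf`."""
    (sig, version, flags, method, mtime, mdate, crc, csize, usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIG:
        raise ValueError(f"bad local header signature 0x{sig:08x}")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    return LocalHeader(version, flags, method, mtime, mdate, crc, csize,
                       usize, buf[name_start:extra_start],
                       buf[extra_start:data_start], data_start)
```

When bit 3 of the general purpose flag is set, the crc-32 and size values read here are zero; the real values have to come from the data descriptor or the central directory.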
+
+
+ B. Data descriptor:
+
+ data descriptor signature 4 bytes (0x08074b50)
+ crc-32 4 bytes
+ compressed size 4 bytes
+ uncompressed size 4 bytes
+
+ This descriptor exists only if bit 3 of the general
+ purpose bit flag is set (see below). It is byte aligned
+ and immediately follows the last byte of compressed data.
+ This descriptor is used only when it was not possible to
+ seek in the output zip file, e.g., when the output zip file
+ was standard output or a non-seekable device.
+
+ C. Central directory structure:
+
+ [file header] . . . end of central dir record
+
+ File header:
+
+ central file header signature 4 bytes (0x02014b50)
+ version made by 2 bytes
+ version needed to extract 2 bytes
+ general purpose bit flag 2 bytes
+ compression method 2 bytes
+ last mod file time 2 bytes
+ last mod file date 2 bytes
+ crc-32 4 bytes
+ compressed size 4 bytes
+ uncompressed size 4 bytes
+ filename length 2 bytes
+ extra field length 2 bytes
+ file comment length 2 bytes
+ disk number start 2 bytes
+ internal file attributes 2 bytes
+ external file attributes 4 bytes
+ relative offset of local header 4 bytes
+
+ filename (variable size)
+ extra field (variable size)
+ file comment (variable size)
+
+ End of central dir record:
+
+ end of central dir signature 4 bytes (0x06054b50)
+ number of this disk 2 bytes
+ number of the disk with the
+ start of the central directory 2 bytes
+ total number of entries in
+ the central dir on this disk 2 bytes
+ total number of entries in
+ the central dir 2 bytes
+ size of the central directory 4 bytes
+ offset of start of central
+ directory with respect to
+ the starting disk number 4 bytes
+ zipfile comment length 2 bytes
+ zipfile comment (variable size)
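Because the end of central dir record can only be followed by the zipfile comment (at most 65,535 bytes), a reader can locate it by scanning backward from the end of the file and then walk the central directory from the stored offset. The sketch below is a hypothetical single-disk reader: it ignores the disk-number fields, assumes the whole archive is in memory, and the names `find_eocd` and `list_entries` are illustrative only.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")           # end of central dir, 22 bytes
CDIR = struct.Struct("<IHHHHHHIIIHHHHHII")  # central file header, 46 bytes
EOCD_SIG = 0x06054B50
CDIR_SIG = 0x02014B50


def find_eocd(buf: bytes) -> int:
    """Return the offset of the end of central dir record."""
    sig = struct.pack("<I", EOCD_SIG)
    # Only the last 22 + 65,535 bytes can hold the record.
    start = max(0, len(buf) - EOCD.size - 0xFFFF)
    end = len(buf)
    while True:
        pos = buf.rfind(sig, start, end)
        if pos < 0:
            raise ValueError("end of central dir record not found")
        if pos + EOCD.size <= len(buf):
            comment_len = EOCD.unpack_from(buf, pos)[7]
            # A genuine record is followed by exactly its comment.
            if pos + EOCD.size + comment_len == len(buf):
                return pos
        end = pos  # keep searching further back


def list_entries(buf: bytes) -> list[tuple[bytes, int]]:
    """Return (filename, relative offset of local header) per entry."""
    eocd = find_eocd(buf)
    _, _, _, _, total, _, cd_offset, _ = EOCD.unpack_from(buf, eocd)
    entries, pos = [], cd_offset
    for _ in range(total):
        fields = CDIR.unpack_from(buf, pos)
        if fields[0] != CDIR_SIG:
            raise ValueError("bad central file header signature")
        name_len, extra_len, comment_len = fields[10:13]
        name = buf[pos + CDIR.size:pos + CDIR.size + name_len]
        entries.append((name, fields[16]))
        pos += CDIR.size + name_len + extra_len + comment_len
    return entries
```

Checking that the comment length lands exactly on end-of-file guards against a signature-like byte sequence inside the comment itself.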
+
+
+ D. Explanation of fields:
+
+ version made by (2 bytes)
+
+ The upper byte indicates the host system (OS) for the
+ file. Software can use this information to determine
+ the line record format for text files etc. The current
+ mappings are:
+
+ 0 - FAT file system (DOS, OS/2, NT) + PKZIPW 2.50 VFAT, NTFS
+ 1 - Amiga
+ 2 - VMS (VAX or Alpha AXP)
+ 3 - Unix
+ 4 - VM/CMS
+ 5 - Atari
+ 6 - HPFS file system (OS/2, NT 3.x)
+ 7 - Macintosh
+ 8 - Z-System
+ 9 - CP/M
+ 10 - TOPS-20 [supposedly PKZIPW 2.50 NTFS]
+ 11 - NTFS file system (NT) [used by Info-ZIP, only]
+ 12 - SMS/QDOS
+ 13 - Acorn RISC OS
+ 14 - VFAT file system (Win95, NT) [Info-ZIP reservation, unused]
+ 15 - MVS
+ 16 - BeOS (BeBox or PowerMac)
+ 17 - Tandem
+ 18 thru 255 - unused
+
+ The lower byte indicates the version number of the
+ software used to encode the file. The value/10
+ indicates the major version number, and the value
+ mod 10 is the minor version number.
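A small sketch of splitting "version made by" into host system and software version. The short host names and the function name `decode_version_made_by` are illustrative choices, not names from the specification; the same value/10 rule applies to the lower byte of "version needed to extract".

```python
# Short host names for the upper byte, following the table above.
HOSTS = [
    "FAT", "Amiga", "VMS", "Unix", "VM/CMS", "Atari", "HPFS", "Macintosh",
    "Z-System", "CP/M", "TOPS-20", "NTFS", "SMS/QDOS", "Acorn RISC OS",
    "VFAT", "MVS", "BeOS", "Tandem",
]


def decode_version_made_by(value: int) -> tuple[str, str]:
    """Split the 2-byte field into (host system, "major.minor")."""
    host, spec = value >> 8, value & 0xFF
    name = HOSTS[host] if host < len(HOSTS) else "unused"
    return name, f"{spec // 10}.{spec % 10}"
```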
+
+ version needed to extract (2 bytes)
+
+ The minimum software version needed to extract the
+ file, mapped as above.
+
+ general purpose bit flag: (2 bytes)
+
+ Bit 0: If set, indicates that the file is encrypted.
+
+ (For Method 6 - Imploding)
+ Bit 1: If the compression method used was type 6,
+ Imploding, then this bit, if set, indicates
+ an 8K sliding dictionary was used. If clear,
+ then a 4K sliding dictionary was used.
+ Bit 2: If the compression method used was type 6,
+ Imploding, then this bit, if set, indicates
+ that 3 Shannon-Fano trees were used to encode the
+ sliding dictionary output. If clear, then 2
+ Shannon-Fano trees were used.
+
+ (For Method 8 - Deflating)
+ Bit 2 Bit 1
+ 0 0 Normal (-en) compression option was used.
+ 0 1 Maximum (-ex) compression option was used.
+ 1 0 Fast (-ef) compression option was used.
+ 1 1 Super Fast (-es) compression option was used.
+
+ Note: Bits 1 and 2 are undefined if any other compression
+ method is used.
+
+ Bit 3: If this bit is set, the fields crc-32, compressed size
+ and uncompressed size are set to zero in the local
+ header. The correct values are put in the data descriptor
+ immediately following the compressed data. (Note: PKZIP
+ version 2.04g for DOS only recognizes this bit for method 8
+ compression, newer versions of PKZIP recognize this bit
+ for any compression method.)
+ [Info-ZIP note: This bit was introduced by PKZIP 2.04 for
+ DOS. In general, this feature can only be reliably used
+ together with compression methods that allow intrinsic
+ detection of the "end-of-compressed-data" condition. From
+ the set of compression methods described in this Zip archive
+ specification, only "deflate" meets this requirement.
+ In particular, the STORED method does not work!
+ The Info-ZIP tools recognize this bit regardless of the
+ compression method, but they rely on correctly set
+ "compressed size" information in the central directory entry.]
+
+ Bit 5: If this bit is set, this indicates that the file is compressed
+ patched data. (Note: Requires PKZIP version 2.70 or greater)
+
+ The upper three bits are reserved and used internally
+ by the software when processing the zipfile. The
+ remaining bits are unused.
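The flag bits above can be summarized with a sketch like the following. The note strings and the function name `describe_flags` are hypothetical; the point is that bits 1 and 2 are two independent switches for Imploding but a single 2-bit option code (Bit 2 as the high bit) for Deflating.

```python
DEFLATE_OPTIONS = ["normal", "maximum", "fast", "super fast"]


def describe_flags(flags: int, method: int) -> list[str]:
    """Return human-readable notes for the general purpose bit flag."""
    notes = []
    if flags & 0x0001:
        notes.append("encrypted")
    if method == 6:  # Imploding: bits 1 and 2 are independent
        notes.append("8K dictionary" if flags & 0x0002 else "4K dictionary")
        notes.append("3 Shannon-Fano trees" if flags & 0x0004
                     else "2 Shannon-Fano trees")
    elif method == 8:  # Deflating: bits 2..1 form a 2-bit option code
        notes.append(DEFLATE_OPTIONS[(flags >> 1) & 0b11] + " compression")
    if flags & 0x0008:
        notes.append("crc-32 and sizes in data descriptor")
    if flags & 0x0020:
        notes.append("compressed patched data")
    return notes
```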
+
+ compression method: (2 bytes)
+
+ (see accompanying documentation for algorithm
+ descriptions)
+
+ 0 - The file is stored (no compression)
+ 1 - The file is Shrunk
+ 2 - The file is Reduced with compression factor 1
+ 3 - The file is Reduced with compression factor 2
+ 4 - The file is Reduced with compression factor 3
+ 5 - The file is Reduced with compression factor 4
+ 6 - The file is Imploded
+ 7 - Reserved for Tokenizing compression algorithm
+ 8 - The file is Deflated
+ 9 - Reserved for enhanced Deflating
+ 10 - PKWARE Data Compression Library Imploding
+
+ date and time fields: (2 bytes each)
+
+ The date and time are encoded in standard MS-DOS format.
+ If input came from standard input, the date and time are
+ those at which compression was started for this data.
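The MS-DOS format packs the date as years since 1980 (7 bits), month (4 bits) and day (5 bits), and the time as hours (5 bits), minutes (6 bits) and seconds divided by two (5 bits). A sketch of the conversion in both directions, with hypothetical helper names; note the 2-second resolution and that no timezone is stored, so the result is a naive local-time value.

```python
from datetime import datetime


def dos_to_datetime(dos_date: int, dos_time: int) -> datetime:
    """Decode MS-DOS date/time words into a naive (local-time) datetime."""
    year = ((dos_date >> 9) & 0x7F) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2
    return datetime(year, month, day, hour, minute, second)


def datetime_to_dos(dt: datetime) -> tuple[int, int]:
    """Encode a datetime (1980-2107) as (date, time); odd seconds round down."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```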
+
+ CRC-32: (4 bytes)
+
+ The CRC-32 algorithm was generously contributed by
+ David Schwaderer and can be found in his excellent
+ book "C Programmer's Guide to NetBIOS" published by
+ Howard W. Sams & Co. Inc. The 'magic number' for
+ the CRC is 0xdebb20e3. The proper CRC pre- and
+ post-conditioning is used, meaning that the CRC register
+ is pre-conditioned with all ones (a starting value
+ of 0xffffffff) and the value is post-conditioned by
+ taking the one's complement of the CRC residual.
+ If bit 3 of the general purpose flag is set, this
+ field is set to zero in the local header and the correct
+ value is put in the data descriptor and in the central
+ directory.
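A bitwise sketch of the CRC-32 described above: the register starts at 0xffffffff, each bit is shifted out against the polynomial, and the result is complemented. The text does not name the polynomial; this sketch assumes the usual reflected constant 0xedb88320, which is what zlib's `crc32` uses. Real code would use a lookup table or `zlib.crc32`.

```python
def crc32(data: bytes, crc: int = 0) -> int:
    """CRC-32; pass a previous result as `crc` to continue a running CRC."""
    crc ^= 0xFFFFFFFF  # pre-conditioning: register starts at all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xEDB88320 is the bit-reversed CRC-32 polynomial (assumed)
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # post-conditioning: one's complement
```

Over `b"123456789"` this gives the standard check value 0xcbf43926. The 'magic number' 0xdebb20e3 is the register residue left after running the CRC over data followed by its own CRC in little-endian order, before the final complement; with the complement applied, the function returns 0x2144df1c for such input.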
+
+ compressed size: (4 bytes)
+ uncompressed size: (4 bytes)
+
+ The size of the file compressed and uncompressed,
+ respectively. If bit 3 of the general purpose bit flag
+ is set, these fields are set to zero in the local header
+ and the correct values are put in the data descriptor and
+ in the central directory.
+
+ filename length: (2 bytes)
+ extra field length: (2 bytes)
+ file comment length: (2 bytes)
+
+ The length of the filename, extra field, and comment
+ fields respectively. The combined length of any
+ directory record and these three fields should not
+ generally exceed 65,535 bytes. If input came from standard
+ input, the filename length is set to zero.
+
+ [Info-ZIP note:
+ This feature is not yet supported by any PKWARE version of ZIP
+ (at least not in PKZIP for DOS and PKZIP for Windows/WinNT).
+ The Info-ZIP programs handle standard input differently:
+ If input came from standard input, the filename is set to "-"
+ (length one).]
+
+
+ disk number start: (2 bytes)
+
+ The number of the disk on which this file begins.
+
+ internal file attributes: (2 bytes)
+
+ The lowest bit of this field indicates, if set, that
+ the file is apparently an ASCII or text file. If not
+ set, it indicates that the file apparently contains binary data.
+ The remaining bits are unused in version 1.0.
+
+ external file attributes: (4 bytes)
+
+ The mapping of the external attributes is
+ host-system dependent (see 'version made by'). For
+ MS-DOS, the low order byte is the MS-DOS directory
+ attribute byte. If input came from standard input, this
+ field is set to zero.
+
+ relative offset of local header: (4 bytes)
+
+ This is the offset from the start of the first disk on
+ which this file appears, to where the local header should
+ be found.
+
+ filename: (Variable)
+
+ The name of the file, with optional relative path.
+ The path stored should not contain a drive or
+ device letter, or a leading slash. All slashes
+ should be forward slashes '/' as opposed to
+ backslashes '\' for compatibility with Amiga
+ and Unix file systems etc. If input came from standard
+ input, there is no filename field.
+ [Info-ZIP discrepancy:
+ If input came from standard input, the file name is set
+ to "-" (without the quotes).
+ As far as we know, the PKWARE specification for "input from
+ stdin" is not supported by PKZIP/PKUNZIP for DOS, OS/2, Windows,
+ or Windows NT.]
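The filename rules (no drive or device letter, no leading slash, forward slashes only) can be applied with a helper such as the hypothetical `to_zip_name` below. It is a sketch only: it does not reject `..` components, which a real archiver would likely also want to handle.

```python
import re


def to_zip_name(path: str) -> str:
    """Turn a host path into a stored ZIP filename (hypothetical helper)."""
    name = path.replace("\\", "/")          # backslashes -> forward slashes
    name = re.sub(r"^[A-Za-z]:", "", name)  # drop a drive letter such as "C:"
    return name.lstrip("/")                 # no leading slash
```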
+
+ extra field: (Variable)
+
+ This is for future expansion. If additional information
+ needs to be stored in the future, it should be stored
+ here. Earlier versions of the software can then safely
+ skip this file, and find the next file or header. This
+ field will be 0 length in version 1.0.
+
+ In order to allow different programs and different types
+ of information to be stored in the 'extra' field in .ZIP
+ files, the following structure should be used for all
+ programs storing data in this field:
+
+ header1+data1 + header2+data2 . . .
+
+ Each header should consist of:
+
+ Header ID - 2 bytes
+ Data Size - 2 bytes
+
+ Note: all fields stored in Intel low-byte/high-byte order.
+
+ The Header ID field indicates the type of data that is in
+ the following data block.
+
+ Header IDs of 0 thru 31 are reserved for use by PKWARE.
+ The remaining IDs can be used by third party vendors for
+ proprietary usage.
+
+ The current Header ID mappings defined by PKWARE are:
+
+ 0x0007 AV Info
+ 0x0009 OS/2 extended attributes (also Info-ZIP)
+ 0x000a PKWARE Win95/WinNT FileTimes [undocumented!]
+ 0x000c PKWARE VAX/VMS (also Info-ZIP)
+ 0x000d PKWARE Unix
+ 0x000f Patch Descriptor
+
+ The Header ID mappings defined by Info-ZIP and third parties are:
+
+ 0x07c8 Info-ZIP Macintosh (old, J. Lee)
+ 0x2605 ZipIt Macintosh (first version)
+ 0x2705 ZipIt Macintosh v 1.3.5 and newer (w/o full filename)
+ 0x334d Info-ZIP Macintosh (new, D. Haase's 'Mac3' field)
+ 0x4341 Acorn/SparkFS (David Pilling)
+ 0x4453 Windows NT security descriptor (binary ACL)
+ 0x4704 VM/CMS
+ 0x470f MVS
+ 0x4b46 FWKCS MD5 (third party, see below)
+ 0x4c41 OS/2 access control list (text ACL)
+ 0x4d49 Info-ZIP VMS (VAX or Alpha)
+ 0x5356 AOS/VS (binary ACL)
+ 0x5455 extended timestamp
+ 0x5855 Info-ZIP Unix (original; also OS/2, NT, etc.)
+ 0x6542 BeOS (BeBox, PowerMac, etc.)
+ 0x756e ASi Unix
+ 0x7855 Info-ZIP Unix (new)
+ 0xfb4a SMS/QDOS
+
+ The Data Size field indicates the size of the following
+ data block. Programs can use this value to skip to the
+ next header block, passing over any data blocks that are
+ not of interest.
+
+ Note: As stated above, the size of the entire .ZIP file
+ header, including the filename, comment, and extra
+ field should not exceed 64K in size.
+
+ In case two different programs should appropriate the same
+ Header ID value, it is strongly recommended that each
+ program place a unique signature of at least two bytes in
+ size (and preferably 4 bytes or bigger) at the start of
+ each data area. Every program should verify that its
+ unique signature is present, in addition to the Header ID
+ value being correct, before assuming that it is a block of
+ known type.
+
+ In the following descriptions, note that "Short" means two bytes,
+ "Long" means four bytes, and "Long-Long" means eight bytes,
+ regardless of their native sizes. Unless specifically noted, all
+ integer fields should be interpreted as unsigned (non-negative)
+ numbers.
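Walking the header1+data1 + header2+data2 sequence only needs the two little-endian Shorts of each header: the Data Size tells the reader how far to skip, whether or not it knows the Header ID. A minimal sketch with the hypothetical name `parse_extra`:

```python
import struct


def parse_extra(extra: bytes) -> list[tuple[int, bytes]]:
    """Split an extra field into (Header ID, data) pairs."""
    blocks, pos = [], 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError(f"extra block 0x{header_id:04x} overruns the field")
        blocks.append((header_id, extra[pos:pos + size]))
        pos += size
    return blocks
```

Trailing bytes too short to form a header are silently ignored here; a stricter reader could reject them. Blocks with unknown IDs are returned untouched so the caller can skip them, and a reader that recognizes an ID should still check the block's own signature, as recommended above.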
+
+
+ -OS/2 Extended Attributes Extra Field:
+ ====================================
+
+ The following is the layout of the OS/2 extended attributes "extra"
+ block. (Last Revision 19960922)
+
+ Note: all fields stored in Intel low-byte/high-byte order.
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (OS/2) 0x0009 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long uncompressed EA data size
+ CType Short compression type
+ EACRC Long CRC value for uncompressed EA data
+ (var.) variable compressed EA data
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (OS/2) 0x0009 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long size of uncompressed local EA data
+
+ The value of CType is interpreted according to the "compression
+ method" section above; i.e., 0 for stored, 8 for deflated, etc.
+
+ The OS/2 extended attribute structure (FEA2LIST) is compressed and
+ then stored in its entirety within this structure. There will only
+ ever be one block of data in the variable-length field.
+
+
+ -OS/2 Access Control List Extra Field:
+ ====================================
+
+ The following is the layout of the OS/2 ACL extra block.
+ (Last Revision 19960922)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (ACL) 0x4c41 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long uncompressed ACL data size
+ CType Short compression type
+ EACRC Long CRC value for uncompressed ACL data
+ (var.) variable compressed ACL data
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (ACL) 0x4c41 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long size of uncompressed local ACL data
+
+ The value of CType is interpreted according to the "compression
+ method" section above; i.e., 0 for stored, 8 for deflated, etc.
+
+ The uncompressed ACL data consist of a text header of the form
+ "ACL1:%hX,%hd\n", where the first field is the OS/2 ACCINFO acc_attr
+ member and the second is acc_count, followed by acc_count strings
+ of the form "%s,%hx\n", where the first field is acl_ugname (user
+ group name) and the second acl_access. This block type will be
+ extended for other operating systems as needed.
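Once decompressed, the ACL text can be read back line by line. This sketch assumes the data has already been decoded to a `str`, that user/group names contain no newline, and it splits each entry at its last comma in case a name contains commas; `parse_os2_acl` is a hypothetical name.

```python
def parse_os2_acl(text: str) -> tuple[int, list[tuple[str, int]]]:
    """Parse the uncompressed OS/2 ACL text into (acc_attr, entries)."""
    lines = text.split("\n")
    if not lines[0].startswith("ACL1:"):
        raise ValueError("missing ACL1: header")
    # Header "ACL1:%hX,%hd": acc_attr in hex, acc_count in decimal.
    attr_hex, count_dec = lines[0][len("ACL1:"):].split(",")
    acc_attr, acc_count = int(attr_hex, 16), int(count_dec)
    entries = []
    for line in lines[1:1 + acc_count]:
        # Entry "%s,%hx": acl_ugname, then acl_access in hex.
        ugname, access = line.rsplit(",", 1)
        entries.append((ugname, int(access, 16)))
    if len(entries) != acc_count:
        raise ValueError("fewer ACL entries than acc_count")
    return acc_attr, entries
```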
+
+
+ -Windows NT Security Descriptor Extra Field:
+ ==========================================
+
+ The following is the layout of the NT Security Descriptor (another
+ type of ACL) extra block. (Last Revision 19960922)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (SD) 0x4453 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long uncompressed SD data size
+ Version Byte version of uncompressed SD data format
+ CType Short compression type
+ EACRC Long CRC value for uncompressed SD data
+ (var.) variable compressed SD data
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (SD) 0x4453 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long size of uncompressed local SD data
+
+ The value of CType is interpreted according to the "compression
+ method" section above; i.e., 0 for stored, 8 for deflated, etc.
+ Version specifies how the compressed data are to be interpreted
+ and allows for future expansion of this extra field type. Currently
+ only version 0 is defined.
+
+ For version 0, the compressed data are to be interpreted as a single
+ valid Windows NT SECURITY_DESCRIPTOR data structure, in self-relative
+ format.
+
+
+ -PKWARE Win95/WinNT Extra Field:
+ ==============================
+
+ The following description covers PKWARE's undocumented
+ Windows 95 & Windows NT extra field, introduced with the
+ release of PKZIP for Windows 2.50. (Last Revision 19980425)
+
+ This field has a fixed data size of 32 bytes and is only stored
+ as a local extra field.
+
+ Value Size Description
+ ----- ---- -----------
+ (WinNT) 0x000a Short Tag for this "extra" block type
+ TSize Short Total Data Size for this block
+ Unknwn1 Long ???? (all 0 ?)
+ Unknwn2 Long ????
+ ModTime Long-Long 64-bit NTFS last-modified filetime
+ AccTime Long-Long 64-bit NTFS last-access filetime
+ CreTime Long-Long 64-bit NTFS creation filetime
+
+ The NTFS filetimes are 64-bit unsigned integers, stored in Intel
+ (least significant byte first) byte order. They determine the
+ number of 1.0E-07 seconds (1/10th microseconds!) past WinNT "epoch",
+ which is "01-Jan-1601 00:00:00 UTC".
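Converting an NTFS filetime is an offset from the 1601 epoch in 100-nanosecond ticks. The sketch below assumes it receives the 32-byte data area after the tag and TSize, and simply skips the two undocumented Longs; the helper names are hypothetical, and sub-microsecond precision is truncated because Python's `timedelta` stops at microseconds.

```python
import struct
from datetime import datetime, timedelta, timezone

NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int) -> datetime:
    """Convert 100-ns ticks since 1601-01-01 UTC (truncated to microseconds)."""
    return NT_EPOCH + timedelta(microseconds=ft // 10)


def parse_pkware_nt(data: bytes) -> tuple[datetime, datetime, datetime]:
    """Return (ModTime, AccTime, CreTime) from the 32-byte 0x000a data area."""
    _unknwn1, _unknwn2, mod, acc, cre = struct.unpack("<IIQQQ", data)
    return tuple(filetime_to_datetime(t) for t in (mod, acc, cre))
```

For example, the filetime 116444736000000000 corresponds to the Unix epoch, 1970-01-01 00:00:00 UTC.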
+
+
+ -PKWARE VAX/VMS Extra Field:
+ ==========================
+
+ The following is the layout of PKWARE's VAX/VMS attributes "extra"
+ block. (Last Revision 12/17/91)
+
+ Note: all fields stored in Intel low-byte/high-byte order.
+
+ Value Size Description
+ ----- ---- -----------
+ (VMS) 0x000c Short Tag for this "extra" block type
+ TSize Short Total Data Size for this block
+ CRC Long 32-bit CRC for remainder of the block
+ Tag1 Short VMS attribute tag value #1
+ Size1 Short Size of attribute #1, in bytes
+ (var.) Size1 Attribute #1 data
+ .
+ .
+ .
+ TagN Short VMS attribute tag value #N
+ SizeN Short Size of attribute #N, in bytes
+ (var.) SizeN Attribute #N data
+
+ Rules:
+
+ 1. There will be one or more attributes present, which will
+ each be preceded by the above TagX & SizeX values. These
+ values are identical to the ATR$C_XXXX and ATR$S_XXXX constants
+ which are defined in ATR.H under VMS C. Neither of these values
+ will ever be zero.
+
+ 2. No word alignment or padding is performed.
+
+ 3. A well-behaved PKZIP/VMS program should never produce more than
+ one sub-block with the same TagX value. Also, there will never
+ be more than one "extra" block of type 0x000c in a particular
+ directory record.
+
+
+ -Info-ZIP VMS Extra Field:
+ ========================
+
+ The following is the layout of Info-ZIP's VMS attributes extra
+ block for VAX or Alpha AXP. The local-header and central-header
+ versions are identical. (Last Revision 19960922)
+
+ Value Size Description
+ ----- ---- -----------
+ (VMS2) 0x4d49 Short tag for this extra block type
+ TSize Short total data size for this block
+ ID Long block ID
+ Flags Short info bytes
+ BSize Short uncompressed block size
+ Reserved Long (reserved)
+ (var.) variable compressed VMS file-attributes block
+
+ The block ID is one of the following unterminated strings:
+
+ "VFAB" struct FAB
+ "VALL" struct XABALL
+ "VFHC" struct XABFHC
+ "VDAT" struct XABDAT
+ "VRDT" struct XABRDT
+ "VPRO" struct XABPRO
+ "VKEY" struct XABKEY
+ "VMSV" version (e.g., "V6.1"; truncated at hyphen)
+ "VNAM" reserved
+
+ The lower three bits of Flags indicate the compression method. The
+ currently defined methods are:
+
+ 0 stored (not compressed)
+ 1 simple "RLE"
+ 2 deflated
+
+ The "RLE" method simply replaces zero-valued bytes with zero-valued
+ bits and non-zero-valued bytes with a "1" bit followed by the byte
+ value.
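The "RLE" rule can be sketched as a bit-level encoder and decoder. The text does not say in which order bits are packed, so this sketch assumes least-significant-bit-first packing for both the flag bits and the 8 value bits; the decoder stops after BSize output bytes because trailing pad bits would otherwise read as extra zeros. Function names are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Zero byte -> one 0 bit; other byte -> 1 bit plus its 8 value bits."""
    bits = []
    for byte in data:
        if byte == 0:
            bits.append(0)
        else:
            bits.append(1)
            bits.extend((byte >> i) & 1 for i in range(8))  # assumed LSB first
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        packed[i // 8] |= bit << (i % 8)  # assumed: fill each byte from bit 0
    return bytes(packed)


def rle_decode(packed: bytes, size: int) -> bytes:
    """Inverse of rle_encode; `size` is the uncompressed length (BSize)."""
    def bit(i: int) -> int:
        return (packed[i // 8] >> (i % 8)) & 1

    out, pos = bytearray(), 0
    while len(out) < size:
        if bit(pos) == 0:
            out.append(0)
            pos += 1
        else:
            out.append(sum(bit(pos + 1 + i) << i for i in range(8)))
            pos += 9
    return bytes(out)
```

The scheme pays off for the mostly-zero VMS attribute structures: 16 zero bytes shrink to 2 bytes, while each non-zero byte costs 9 bits.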
+
+ The variable-length compressed data contains only the data corre-
+ sponding to the indicated structure or string. Typically multiple
+ VMS2 extra fields are present (each with a unique block type).
+
+
+ -Info-ZIP Macintosh Extra Field:
+ ==============================
+
+ The following is the layout of the (old) Info-ZIP resource-fork extra
+ block for Macintosh. The local-header and central-header versions
+ are identical. (Last Revision 19960922)
+
+ Value Size Description
+ ----- ---- -----------
+ (Mac) 0x07c8 Short tag for this extra block type
+ TSize Short total data size for this block
+ "JLEE" beLong extra-field signature
+ FInfo 16 bytes Macintosh FInfo structure
+ CrDat beLong HParamBlockRec fileParam.ioFlCrDat
+ MdDat beLong HParamBlockRec fileParam.ioFlMdDat
+ Flags beLong info bits
+ DirID beLong HParamBlockRec fileParam.ioDirID
+ VolName 28 bytes volume name (optional)
+
+ All fields but the first two are in native Macintosh format
+ (big-endian Motorola order, not little-endian Intel). The least
+ significant bit of Flags is 1 if the file is a data fork, 0 other-
+ wise. In addition, if this extra field is present, the filename
+ has an extra 'd' or 'r' appended to indicate data fork or resource
+ fork. The 28-byte VolName field may be omitted.
+
+
+ -ZipIt Macintosh Extra Field (long):
+ ==================================
+
+ The following is the layout of the ZipIt extra block for Macintosh.
+ The local-header and central-header versions are identical.
+ (Last Revision 19970130)
+
+ Value Size Description
+ ----- ---- -----------
+ (Mac2) 0x2605 Short tag for this extra block type
+ TSize Short total data size for this block
+ "ZPIT" beLong extra-field signature
+ FnLen Byte length of FileName
+ FileName variable full Macintosh filename
+ FileType Byte[4] four-byte Mac file type string
+ Creator Byte[4] four-byte Mac creator string
+
+
+ -ZipIt Macintosh Extra Field (short):
+ ===================================
+
+ The following is the layout of a shortened variant of the
+ ZipIt extra block for Macintosh (without "full name" entry).
+ This variant is used by ZipIt 1.3.5 and newer for entries that
+ do not need a "full Mac filename" record.
+ The local-header and central-header versions are identical.
+ (Last Revision 19980903)
+
+ Value Size Description
+ ----- ---- -----------
+ (Mac2b) 0x2705 Short tag for this extra block type
+ TSize Short total data size for this block
+ "ZPIT" beLong extra-field signature
+ FileType Byte[4] four-byte Mac file type string
+ Creator Byte[4] four-byte Mac creator string
+
+
+ -Info-ZIP Macintosh Extra Field (new):
+ ====================================
+
+ The following is the layout of the (new) Info-ZIP extra
+ block for Macintosh, designed by Dirk Haase.
+ All values are in little-endian.
+ (Last Revision 19981005)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Mac3) 0x334d Short tag for this extra block type ("M3")
+ TSize Short total data size for this block
+ BSize Long uncompressed finder attribute data size
+ Flags Short info bits
+ fdType Byte[4] Type of the File (4-byte string)
+ fdCreator Byte[4] Creator of the File (4-byte string)
+ (CType) Short compression type
+ (CRC) Long CRC value for uncompressed MacOS data
+ Attribs variable finder attribute data (see below)
+
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Mac3) 0x334d Short tag for this extra block type ("M3")
+ TSize Short total data size for this block
+ BSize Long uncompressed finder attribute data size
+ Flags Short info bits
+ fdType Byte[4] Type of the File (4-byte string)
+ fdCreator Byte[4] Creator of the File (4-byte string)
+
+ The third bit of Flags in both headers indicates whether
+ the LOCAL extra field is uncompressed (and therefore whether CType
+ and CRC are omitted):
+
+ Bits of the Flags:
+ bit 0 if set, file is a data fork; otherwise unset
+ bit 1 if set, filename will not be changed
+ bit 2 if set, Attribs is uncompressed (no CType, CRC)
+ bit 3 if set, dates and times are 64-bit;
+ if zero, dates and times are 32-bit.
+ bit 4 if set, timezone offsets fields for the native
+ Mac times are omitted (UTC support deactivated)
+ bits 5-15 reserved;
+
+
+ Attributes:
+
+ Attribs is a Mac-specific block of data in little-endian format with
+ the following structure (if compressed, uncompress it first):
+
+ Value Size Description
+ ----- ---- -----------
+ fdFlags Short Finder Flags
+ fdLocation.v Short Finder Icon Location
+ fdLocation.h Short Finder Icon Location
+ fdFldr Short Folder containing file
+
+ FXInfo 16 bytes Macintosh FXInfo structure
+ FXInfo-Structure:
+ fdIconID Short
+ fdUnused[3] Short unused but reserved 6 bytes
+ fdScript Byte Script flag and number
+ fdXFlags Byte More flag bits
+ fdComment Short Comment ID
+ fdPutAway Long Home Dir ID
+
+ FVersNum Byte file version number
+ may not be used by MacOS
+ ACUser Byte directory access rights
+
+ FlCrDat ULong date and time of creation
+ FlMdDat ULong date and time of last modification
+ FlBkDat ULong date and time of last backup
+ These time numbers are original Mac FileTime values (local time!).
+ Currently, date-time width is 32-bit, but future versions may
+ support 64-bit times (see flags).
+
+ CrGMTOffs Long(signed!) difference "local Creat. time - UTC"
+ MdGMTOffs Long(signed!) difference "local Modif. time - UTC"
+ BkGMTOffs Long(signed!) difference "local Backup time - UTC"
+ These "local time - UTC" differences (stored in seconds) may be
+ used to support timestamp adjustment after inter-timezone transfer.
+ These fields are optional; bit 4 of the flags word controls their
+ presence.
+
+ Charset Short TextEncodingBase (Charset)
+ valid for the following two fields
+
+ FullPath variable Path of the current file.
+ Zero terminated string (C-String)
+ Currently coded in the native Charset.
+
+ Comment variable Finder Comment of the current file.
+ Zero terminated string (C-String)
+ Currently coded in the native Charset.
+
+
+ -Acorn SparkFS Extra Field:
+ =========================
+
+ The following is the layout of David Pilling's SparkFS extra block
+ for Acorn RISC OS. The local-header and central-header versions are
+ identical. (Last Revision 19960922)
+
+ Value Size Description
+ ----- ---- -----------
+ (Acorn) 0x4341 Short tag for this extra block type
+ TSize Short total data size for this block
+ "ARC0" Long extra-field signature
+ LoadAddr Long load address or file type
+ ExecAddr Long exec address
+ Attr Long file permissions
+ Zero Long reserved; always zero
+
+ The following bits of Attr are associated with the given file
+ permissions:
+
+ bit 0 user-writable ('W')
+ bit 1 user-readable ('R')
+ bit 2 reserved
+ bit 3 locked ('L')
+ bit 4 publicly writable ('w')
+ bit 5 publicly readable ('r')
+ bit 6 reserved
+ bit 7 reserved
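A sketch of reading the SparkFS data area (after the tag and TSize) and rendering the Attr bits with the letters from the table. It assumes the "ARC0" signature appears as those four ASCII bytes in file order; `parse_sparkfs` and `acorn_perms` are hypothetical names.

```python
import struct

# (bit number, letter) pairs from the table above; bits 2, 6, 7 are reserved.
ACORN_PERMS = [(0, "W"), (1, "R"), (3, "L"), (4, "w"), (5, "r")]


def acorn_perms(attr: int) -> str:
    """Render the Attr bits as permission letters, e.g. "WRwr"."""
    return "".join(ch for bit, ch in ACORN_PERMS if attr & (1 << bit))


def parse_sparkfs(data: bytes) -> tuple[int, int, str]:
    """Return (LoadAddr, ExecAddr, permissions) from the 20-byte data area."""
    sig, load, exe, attr, _zero = struct.unpack("<4sIIII", data)
    if sig != b"ARC0":  # assumed: signature bytes appear as ASCII "ARC0"
        raise ValueError("not a SparkFS extra block")
    return load, exe, acorn_perms(attr)
```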
+
+
+ -VM/CMS Extra Field:
+ ==================
+
+ The following is the layout of the file-attributes extra block for
+ VM/CMS. The local-header and central-header versions are
+ identical. (Last Revision 19960922)
+
+ Value Size Description
+ ----- ---- -----------
+ (VM/CMS) 0x4704 Short tag for this extra block type
+ TSize Short total data size for this block
+ flData variable file attributes data
+
+ flData is an uncompressed fldata_t struct.
+
+
+ -MVS Extra Field:
+ ===============
+
+ The following is the layout of the file-attributes extra block for
+ MVS. The local-header and central-header versions are identical.
+ (Last Revision 19960922)
+
+ Value Size Description
+ ----- ---- -----------
+ (MVS) 0x470f Short tag for this extra block type
+ TSize Short total data size for this block
+ flData variable file attributes data
+
+ flData is an uncompressed fldata_t struct.
+
+
+ -PKWARE Unix Extra Field:
+ ========================
+
+ The following is the layout of PKWARE's Unix "extra" block.
+ It was introduced with the release of PKZIP for Unix 2.50.
+ Note: all fields are stored in Intel low-byte/high-byte order.
+ (Last Revision 19980901)
+
+ This field has a minimum data size of 12 bytes and is only stored
+ as a local extra field.
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix0) 0x000d Short Tag for this "extra" block type
+ TSize Short Total Data Size for this block
+ AcTime Long time of last access (UTC/GMT)
+ ModTime Long time of last modification (UTC/GMT)
+ UID Short Unix user ID
+ GID Short Unix group ID
+ (var) variable Variable length data field
+
+ The variable length data field will contain file type
+ specific data. Currently the only values allowed are
+ the original "linked to" file names for hard or symbolic links.
+
+ The fixed part of this field has the same layout as Info-ZIP's
+ abandoned "Unix1 timestamps & owner ID info" extra field;
+ only the two tag bytes are different.
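A minimal sketch of decoding the PKWARE Unix data area (after the tag and TSize): a 12-byte fixed part followed by the optional link target. The time fields are treated as unsigned Unix-style seconds, which is an assumption based on the "UTC/GMT" wording and the general unsigned rule above; `parse_pkware_unix` is a hypothetical name.

```python
import struct


def parse_pkware_unix(data: bytes) -> dict:
    """Decode the data area of a 0x000d block (after tag and TSize)."""
    if len(data) < 12:
        raise ValueError("PKWARE Unix extra field needs at least 12 bytes")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", data)
    return {
        "atime": atime,            # assumed: seconds since 1970, UTC
        "mtime": mtime,
        "uid": uid,
        "gid": gid,
        "link_target": data[12:],  # variable part: hard/symlink target, if any
    }
```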
+
+
+ -PATCH Descriptor Extra Field:
+ ============================
+
+ The following is the layout of the Patch Descriptor "extra"
+ block.
+
+ Note: all fields stored in Intel low-byte/high-byte order.
+
+ Value Size Description
+ ----- ---- -----------
+ (Patch) 0x000f Short Tag for this "extra" block type
+ TSize Short Size of the total "extra" block
+ Version Short Version of the descriptor
+ Flags Long Actions and reactions (see below)
+ OldSize Long Size of the file about to be patched
+ OldCRC Long 32-bit CRC of the file about to be patched
+ NewSize Long Size of the resulting file
+ NewCRC Long 32-bit CRC of the resulting file
+
+
+ Actions and reactions
+
+ Bits Description
+ ---- ----------------
+ 0 Use for autodetection
+ 1 Treat as selfpatch
+ 2-3 RESERVED
+ 4-5 Action (see below)
+ 6-7 RESERVED
+ 8-9 Reaction (see below) to absent file
+ 10-11 Reaction (see below) to newer file
+ 12-13 Reaction (see below) to unknown file
+ 14-15 RESERVED
+ 16-31 RESERVED
+
+ Actions
+
+ Action Value
+ ------ -----
+ none 0
+ add 1
+ delete 2
+ patch 3
+
+ Reactions
+
+ Reaction Value
+ -------- -----
+ ask 0
+ skip 1
+ ignore 2
+ fail 3
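The following sketch splits the Flags word into the fields listed above. The names `decode_patch_flags` and `PatchFlags` are hypothetical, and the sketch ignores the reserved bits.

```python
from typing import NamedTuple

ACTIONS = {0: "none", 1: "add", 2: "delete", 3: "patch"}
REACTIONS = {0: "ask", 1: "skip", 2: "ignore", 3: "fail"}

class PatchFlags(NamedTuple):
    autodetect: bool    # bit 0
    selfpatch: bool     # bit 1
    action: str         # bits 4-5
    if_absent: str      # bits 8-9
    if_newer: str       # bits 10-11
    if_unknown: str     # bits 12-13

def decode_patch_flags(flags: int) -> PatchFlags:
    """Split the 32-bit Flags word of a Patch Descriptor (0x000f) block."""
    def two_bits(shift: int) -> int:
        return (flags >> shift) & 0b11
    return PatchFlags(
        autodetect=bool(flags & 0x1),
        selfpatch=bool(flags & 0x2),
        action=ACTIONS[two_bits(4)],
        if_absent=REACTIONS[two_bits(8)],
        if_newer=REACTIONS[two_bits(10)],
        if_unknown=REACTIONS[two_bits(12)],
    )
```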
+
+
+ -Extended Timestamp Extra Field:
+ ==============================
+
+ The following is the layout of the extended-timestamp extra block.
+ (Last Revision 19970118)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (time) 0x5455 Short tag for this extra block type
+ TSize Short total data size for this block
+ Flags Byte info bits
+ (ModTime) Long time of last modification (UTC/GMT)
+ (AcTime) Long time of last access (UTC/GMT)
+ (CrTime) Long time of original creation (UTC/GMT)
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (time) 0x5455 Short tag for this extra block type
+ TSize Short total data size for this block
+ Flags Byte info bits (refers to local header!)
+ (ModTime) Long time of last modification (UTC/GMT)
+
+ The central-header extra field contains the modification time only,
+ or no timestamp at all. TSize is used to flag its presence or
+ absence. But note:
+
+ If "Flags" indicates that ModTime is present in the local header
+ field, it MUST be present in the central header field, too!
+ This correspondence is required because the modification time
+ value may be used to support trans-timezone freshening and
+ updating operations with zip archives.
+
+ The time values are in standard Unix signed-long format, indicating
+ the number of seconds since 1 January 1970 00:00:00. The times
+ are relative to Coordinated Universal Time (UTC), also sometimes
+ referred to as Greenwich Mean Time (GMT). To convert to local time,
+ the software must know the local timezone offset from UTC/GMT.
+
+ The lower three bits of Flags in both headers indicate which time-
+ stamps are present in the LOCAL extra field:
+
+ bit 0 if set, modification time is present
+ bit 1 if set, access time is present
+ bit 2 if set, creation time is present
+ bits 3-7 reserved for additional timestamps; not set
+
+ Those times that are present will appear in the order indicated, but
+ any combination of times may be omitted. (Creation time may be
+ present without access time, for example.) TSize should equal
+ (1 + 4*(number of set bits in Flags)), as the block is currently
+ defined. Other timestamps may be added in the future.
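Here is a minimal Python sketch of reading this block. It assumes `data` is the TSize bytes after Tag and TSize, and that `central` says which header the block came from. The names are hypothetical. The sketch follows the rules above: in the local header, times appear in bit order. In the central header, only ModTime can follow, and only if TSize leaves room for it.

```python
import struct
from typing import Dict

# Flags bit 0, 1, 2 -> ModTime, AcTime, CrTime, in the order they are stored.
TIME_FIELDS = ("mtime", "atime", "ctime")

def parse_ext_timestamp(data: bytes, central: bool = False) -> Dict[str, int]:
    """Decode the data part (TSize bytes) of a 0x5455 extra block."""
    flags = data[0]
    times: Dict[str, int] = {}
    offset = 1
    for bit, name in enumerate(TIME_FIELDS):
        if central and bit > 0:
            break  # the central version can carry ModTime only
        if flags & (1 << bit) and offset + 4 <= len(data):
            # Signed 32-bit seconds since 1970-01-01 00:00:00 UTC.
            (times[name],) = struct.unpack_from("<i", data, offset)
            offset += 4
    return times
```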
+
+
+ -Info-ZIP Unix Extra Field (type 1):
+ ==================================
+
+ The following is the layout of the old Info-ZIP extra block for
+ Unix. It has been replaced by the extended-timestamp extra block
+ (0x5455) and the Unix type 2 extra block (0x7855).
+ (Last Revision 19970118)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix1) 0x5855 Short tag for this extra block type
+ TSize Short total data size for this block
+ AcTime Long time of last access (UTC/GMT)
+ ModTime Long time of last modification (UTC/GMT)
+ UID Short Unix user ID
+ GID Short Unix group ID
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix1) 0x5855 Short tag for this extra block type
+ TSize Short total data size for this block
+ AcTime Long time of last access (UTC/GMT)
+ ModTime Long time of last modification (UTC/GMT)
+
+ The file access and modification times are in standard Unix signed-
+ long format, indicating the number of seconds since 1 January 1970
+ 00:00:00. The times are relative to Coordinated Universal Time
+ (UTC), also sometimes referred to as Greenwich Mean Time (GMT). To
+ convert to local time, the software must know the local timezone
+ offset from UTC/GMT. The modification time may be used by non-Unix
+ systems to support inter-timezone freshening and updating of zip
+ archives.
+
+ The local-header extra block may optionally contain UID and GID
+ info for the file. The local-header TSize value is the only
+ indication of this. Note that Unix UIDs and GIDs are usually
+ specific to a particular machine, and they generally require root
+ access to restore.
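Because TSize is the only indication that UID and GID are present, a reader can tell the two local variants apart by size: 8 bytes without IDs, 12 bytes with them. The sketch below assumes `data` is the block's data part and reads the times as signed longs, as described above. The names `parse_unix1` and `Unix1` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

class Unix1(NamedTuple):
    atime: int
    mtime: int
    uid: Optional[int]  # None when the block carries no IDs
    gid: Optional[int]

def parse_unix1(data: bytes) -> Unix1:
    """Decode the data part of an old 0x5855 block (local or central)."""
    atime, mtime = struct.unpack_from("<ii", data, 0)  # signed Unix times
    if len(data) >= 12:  # TSize is the only hint that UID/GID follow
        uid, gid = struct.unpack_from("<HH", data, 8)
        return Unix1(atime, mtime, uid, gid)
    return Unix1(atime, mtime, None, None)
```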
+
+ This extra field type is obsolete, but it has been in use since
+ mid-1994. Therefore future archiving software should continue to
+ support it. Some guidelines:
+
+ An archive member should either contain the old "Unix1"
+ extra field block or the new extra field types "time" and/or
+ "Unix2".
+
+ If both the old "Unix1" block type and one or both of the new
+ block types "time" and "Unix2" are found, the "Unix1" block
+ should be considered invalid and ignored.
+
+ Unarchiving software should recognize both old and new extra
+ field block types, but the info from new types overrides the
+ old "Unix1" field.
+
+ Archiving software should recognize "Unix1" extra fields for
+ timestamp comparison but never create them for updated, freshened
+ or new archive members. When copying existing members to a new
+ archive, any "Unix1" extra field blocks should be converted to
+ the new "time" and/or "Unix2" types.
+
+
+ -Info-ZIP Unix Extra Field (type 2):
+ ==================================
+
+ The following is the layout of the new Info-ZIP extra block for
+ Unix. (Last Revision 19960922)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix2) 0x7855 Short tag for this extra block type
+ TSize Short total data size for this block
+ UID Short Unix user ID
+ GID Short Unix group ID
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix2) 0x7855 Short tag for this extra block type
+ TSize Short total data size for this block
+
+ The data size of the central-header version is zero; it is used
+ solely as a flag that UID/GID info is present in the local-header
+ extra field. If additional fields are ever added to the local
+ version, the central version may be extended to indicate this.
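The two versions can be handled by one small function. This is a sketch with a hypothetical name. It treats an empty data part as the central-header flag.

```python
import struct
from typing import Optional, Tuple

def parse_unix2(data: bytes) -> Optional[Tuple[int, int]]:
    """Decode the data part of a 0x7855 block.

    Local version: returns (uid, gid). Central version: TSize is 0, so there
    is nothing to decode; None means "look in the local header for the IDs".
    """
    if not data:
        return None
    uid, gid = struct.unpack_from("<HH", data, 0)
    return uid, gid
```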
+
+ Note that Unix UIDs and GIDs are usually specific to a particular
+ machine, and they generally require root access to restore.
+
+
+ -ASi Unix Extra Field:
+ ====================
+
+ The following is the layout of the ASi extra block for Unix. The
+ local-header and central-header versions are identical.
+ (Last Revision 19960916)
+
+ Value Size Description
+ ----- ---- -----------
+ (Unix3) 0x756e Short tag for this extra block type
+ TSize Short total data size for this block
+ CRC Long CRC-32 of the remaining data
+ Mode Short file permissions
+ SizDev Long symlink'd size OR major/minor dev num
+ UID Short user ID
+ GID Short group ID
+ (var.) variable symbolic link filename
+
+ Mode is the standard Unix st_mode field from struct stat, containing
+ user/group/other permissions, setuid/setgid and symlink info, etc.
+
+ If Mode indicates that this file is a symbolic link, SizDev is the
+ size of the file to which the link points. Otherwise, if the file
+ is a device, SizDev contains the standard Unix st_rdev field from
+ struct stat (includes the major and minor numbers of the device).
+ SizDev is undefined in other cases.
+
+ If Mode indicates that the file is a symbolic link, the final field
+ will be the name of the file to which the link points. The file-
+ name length can be inferred from TSize.
+
+ [Note that TSize may incorrectly refer to the data size not counting
+ the CRC; i.e., it may be four bytes too small.]
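The following minimal sketch decodes this block and checks its CRC. It assumes the CRC is the standard CRC-32 (as computed by `zlib.crc32`) over the bytes after the CRC field. It also assumes `data` is exactly the block's data part. Because of the TSize problem noted above, a caller may need to extend `data` by four bytes before passing it in. The names are hypothetical.

```python
import stat
import struct
import zlib
from typing import NamedTuple, Optional

class AsiUnix(NamedTuple):
    mode: int                     # st_mode: type bits plus permissions
    siz_dev: int                  # link target size, or st_rdev for devices
    uid: int
    gid: int
    link_target: Optional[bytes]  # only present for symbolic links

def parse_asi_unix(data: bytes) -> AsiUnix:
    """Decode the data part of a 0x756e block after verifying its CRC."""
    (stored_crc,) = struct.unpack_from("<I", data, 0)
    rest = data[4:]                        # the CRC covers the remaining data
    if zlib.crc32(rest) != stored_crc:
        raise ValueError("ASi extra field CRC mismatch")
    mode, siz_dev, uid, gid = struct.unpack_from("<HIHH", rest, 0)
    # The link name length is implied by the block size.
    target = rest[10:] if stat.S_ISLNK(mode) else None
    return AsiUnix(mode, siz_dev, uid, gid, target)
```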
+
+
+ -BeOS Extra Field:
+ ================
+
+ The following is the layout of the file-attributes extra block for
+ BeOS. (Last Revision 19970531)
+
+ Local-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (BeOS) 0x6542 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long uncompressed file attribute data size
+ Flags Byte info bits
+ (CType) Short compression type
+ (CRC) Long CRC value for uncompressed file attribs
+ Attribs variable file attribute data
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (BeOS) 0x6542 Short tag for this extra block type
+ TSize Short total data size for this block
+ BSize Long size of uncompressed local EF block data
+ Flags Byte info bits
+
+ The least significant bit of Flags in both headers indicates whether
+ the LOCAL extra field is uncompressed (and therefore whether CType
+ and CRC are omitted):
+
+ bit 0 if set, Attribs is uncompressed (no CType, CRC)
+ bits 1-7 reserved; if set, assume error or unknown data
+
+ Currently the only supported compression types are deflated (type 8)
+ and stored (type 0); the latter is not used by Info-ZIP's Zip but is
+ supported by UnZip.
+
+ Attribs is a BeOS-specific block of data in big-endian format with
+ the following structure (if compressed, uncompress it first):
+
+ Value Size Description
+ ----- ---- -----------
+ Name variable attribute name (null-terminated string)
+ Type Long attribute type (32-bit unsigned integer)
+ Size Long Long data size for this sub-block (64 bits)
+ Data variable attribute data
+
+ The attribute structure is repeated for every attribute. The Data
+ field may contain anything--text, flags, bitmaps, etc.
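Once Attribs is uncompressed, its sub-blocks can be walked as sketched below (Python, hypothetical names). Unlike the rest of the ZIP structures, these integers are big-endian. The sketch does not cover decompressing a CType 8 block.

```python
import struct
from typing import List, NamedTuple

class BeAttr(NamedTuple):
    name: bytes
    type_code: int
    data: bytes

def parse_beos_attribs(attribs: bytes) -> List[BeAttr]:
    """Split an uncompressed BeOS Attribs block into attribute sub-blocks."""
    result: List[BeAttr] = []
    pos = 0
    while pos < len(attribs):
        end = attribs.index(b"\0", pos)    # Name is null-terminated
        # Big-endian Type (32-bit) and Size (64-bit) follow the name.
        type_code, size = struct.unpack_from(">IQ", attribs, end + 1)
        start = end + 1 + 12
        data = attribs[start:start + size]
        if len(data) != size:
            raise ValueError("truncated BeOS attribute data")
        result.append(BeAttr(attribs[pos:end], type_code, data))
        pos = start + size
    return result
```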
+
+
+ -SMS/QDOS Extra Field:
+ ====================
+
+ The following is the layout of the file-attributes extra block for
+ SMS/QDOS. The local-header and central-header versions are identical.
+ (Last Revision 19960929)
+
+ Value Size Description
+ ----- ---- -----------
+ (QDOS) 0xfb4a Short tag for this extra block type
+ TSize Short total data size for this block
+ LongID Long extra-field signature
+ (ExtraID) Long additional signature/flag bytes
+ QDirect 64 bytes qdirect structure
+
+ LongID may be "QZHD" or "QDOS". In the latter case, ExtraID will
+ be present. Its first three bytes are "02\0"; the last byte is
+ currently undefined.
+
+ QDirect contains the file's uncompressed directory info (qdirect
+ struct). Its elements are in native (big-endian) format:
+
+ d_length beLong file length
+ d_access byte file access type
+ d_type byte file type
+ d_datalen beLong data length
+ d_reserved beLong unused
+ d_szname beShort size of filename
+ d_name 36 bytes filename
+ d_update beLong time of last update
+ d_refdate beLong file version number
+ d_backup beLong time of last backup (archive date)
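The qdirect fields add up to exactly 64 bytes, so a single big-endian `struct` format with no padding describes them. This is a sketch with hypothetical names. It trims d_name to the length given by d_szname.

```python
import struct
from typing import NamedTuple

QDIRECT = struct.Struct(">IBBIIH36sIII")  # 64 bytes, big-endian, no padding

class QDirect(NamedTuple):
    d_length: int
    d_access: int
    d_type: int
    d_datalen: int
    d_reserved: int
    d_name: bytes     # first d_szname bytes of the 36-byte field
    d_update: int
    d_refdate: int
    d_backup: int

def parse_qdirect(raw: bytes) -> QDirect:
    (length, access, ftype, datalen, reserved, szname, name,
     update, refdate, backup) = QDIRECT.unpack(raw[:QDIRECT.size])
    return QDirect(length, access, ftype, datalen, reserved,
                   name[:szname], update, refdate, backup)
```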
+
+
+ -AOS/VS Extra Field:
+ ==================
+
+ The following is the layout of the extra block for Data General
+ AOS/VS. The local-header and central-header versions are identical.
+ (Last Revision 19961125)
+
+ Value Size Description
+ ----- ---- -----------
+ (AOSVS) 0x5356 Short tag for this extra block type
+ TSize Short total data size for this block
+ "FCI\0" Long extra-field signature
+ Version Byte version of AOS/VS extra block (10 = 1.0)
+ Fstat variable fstat packet
+ AclBuf variable raw ACL data ($MXACL bytes)
+
+ Fstat contains the file's uncompressed fstat packet, which is one of
+ the following:
+
+ normal fstat packet (P_FSTAT struct)
+ DIR/CPD fstat packet (P_FSTAT_DIR struct)
+ unit (device) fstat packet (P_FSTAT_UNIT struct)
+ IPC file fstat packet (P_FSTAT_IPC struct)
+
+ AclBuf contains the raw ACL data; its length is $MXACL.
+
+
+ -FWKCS MD5 Extra Field:
+ =====================
+
+ The following is the layout of the optional extra block used by the
+ FWKCS utility. There is no local-header version; the following
+ applies only to the central header. (Last Revision 19961207)
+
+ Central-header version:
+
+ Value Size Description
+ ----- ---- -----------
+ (MD5) 0x4b46 Short tag for this extra block type
+ TSize Short total data size for this block (19)
+ "MD5" 3 bytes extra-field signature
+ MD5hash 16 bytes 128-bit MD5 hash of uncompressed data
+
+ The MD5 hash in this extra block is used to automatically identify
+ files independent of their filenames; it is an enhanced
+ contents-signature.
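The sketch below builds and checks this central-header block with Python's `hashlib`. The names are hypothetical. Because the tag is stored little-endian, 0x4b46 appears in the file as the ASCII bytes "FK".

```python
import hashlib
import struct

FWKCS_TAG = 0x4B46   # stored little-endian, so the first two bytes read "FK"

def make_fwkcs_md5(uncompressed: bytes) -> bytes:
    """Build a complete central-header MD5 block: tag, TSize=19, "MD5", hash."""
    digest = hashlib.md5(uncompressed).digest()          # 16 bytes
    return struct.pack("<HH", FWKCS_TAG, 19) + b"MD5" + digest

def fwkcs_md5_matches(block: bytes, uncompressed: bytes) -> bool:
    """Check a block built as above against the uncompressed file data."""
    tag, tsize = struct.unpack_from("<HH", block, 0)
    return (tag == FWKCS_TAG and tsize == 19 and block[4:7] == b"MD5"
            and block[7:23] == hashlib.md5(uncompressed).digest())
```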
+
+ FWKCS provides an option to strip this extra field, if
+ present, from a zipfile central directory. In adding
+ this extra field, FWKCS preserves Zipfile Authenticity
+ Verification; if stripping this extra field, FWKCS
+ preserves all versions of AV through PKZIP version 2.04g.
+
+ ``The MD5 algorithm is being placed in the public domain for review
+ and possible adoption as a standard.'' (Ron Rivest, MIT Laboratory
+ for Computer Science and RSA Data Security, Inc., April 1992, RFC
+ 1321, 11.76-77). FWKCS, and FWKCS Contents_Signature System, are
+ trademarks of Frederick W. Kantor.
+
+
+
+ file comment: (Variable)
+
+ The comment for this file.
+
+ number of this disk: (2 bytes)
+
+ The number of this disk, which contains the central
+ directory end record.
+
+ number of the disk with the start of the central directory: (2 bytes)
+
+ The number of the disk on which the central
+ directory starts.
+
+ total number of entries in the central dir on this disk: (2 bytes)
+
+ The number of central directory entries on this disk.
+
+ total number of entries in the central dir: (2 bytes)
+
+ The total number of files in the zipfile.
+
+
+ size of the central directory: (4 bytes)
+
+ The size (in bytes) of the entire central directory.
+
+ offset of start of central directory with respect to
+ the starting disk number: (4 bytes)
+
+ Offset of the start of the central directory on the
+ disk on which the central directory starts.
+
+ zipfile comment length: (2 bytes)
+
+ The length of the comment for this zipfile.
+
+ zipfile comment: (Variable)
+
+ The comment for this zipfile.
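A reader must locate the end of central directory record before it can read any of these fields. One common approach is sketched below, assuming the whole archive is in memory. It scans backwards for the record signature 0x06054b50 ("PK\x05\x06") and accepts a match only if the record's comment length reaches exactly to the end of the file. The sketch does not handle Zip64 archives or multi-disk sets, and the names are hypothetical.

```python
import struct
from typing import NamedTuple

EOCD_SIGNATURE = struct.pack("<I", 0x06054B50)    # b"PK\x05\x06"
EOCD = struct.Struct("<IHHHHIIH")                  # fixed 22-byte part

class EndRecord(NamedTuple):
    disk_number: int          # number of this disk
    cd_start_disk: int        # disk with the start of the central directory
    entries_this_disk: int
    entries_total: int
    cd_size: int              # size of the central directory in bytes
    cd_offset: int            # offset of the central directory on its disk
    comment: bytes

def find_end_record(archive: bytes) -> EndRecord:
    """Locate and decode the end of central directory record."""
    # The comment is at most 65535 bytes long, which bounds the search.
    lowest = max(0, len(archive) - EOCD.size - 0xFFFF)
    pos = len(archive) - EOCD.size
    while pos >= lowest:
        if archive[pos:pos + 4] == EOCD_SIGNATURE:
            fields = EOCD.unpack_from(archive, pos)
            comment_len = fields[7]
            # Accept only a record whose comment ends exactly at end of file.
            if pos + EOCD.size + comment_len == len(archive):
                return EndRecord(*fields[1:7], archive[pos + EOCD.size:])
        pos -= 1
    raise ValueError("end of central directory record not found")
```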
+
+
+ D. General notes:
+
+ 1) All fields unless otherwise noted are unsigned and stored
+ in Intel low-byte:high-byte, low-word:high-word order.
+
+ 2) String fields are not null terminated, since the
+ length is given explicitly.
+
+ 3) Local headers should not span disk boundaries. Also, even
+ though the central directory can span disk boundaries, no
+ single record in the central directory should be split
+ across disks.
+
+ 4) The entries in the central directory may not necessarily
+ be in the same order that files appear in the zipfile.
+
+UnShrinking - Method 1
+----------------------
+
+Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
+with partial clearing. The initial code size is 9 bits, and
+the maximum code size is 13 bits. Shrinking differs from
+conventional Dynamic Ziv-Lempel-Welch implementations in several
+respects:
+
+1) The code size is controlled by the compressor, and is not
+ automatically increased when codes larger than the current
+ code size are created (but not necessarily used). When
+ the decompressor encounters the code sequence 256
+ (decimal) followed by 1, it should increase the code size
+ read from the input stream to the next bit size. No
+ blocking of the codes is performed, so the next code at
+ the increased size should be read from the input stream
+ immediately after where the previous code at the smaller
+ bit size was read. Again, the decompressor should not
+ increase the code size used until the sequence 256,1 is
+ encountered.
+
+2) When the table becomes full, total clearing is not
+ performed. Rather, when the compressor emits the code
+ sequence 256,2 (decimal), the decompressor should clear
+ all leaf nodes from the Ziv-Lempel tree, and continue to
+ use the current code size. The nodes that are cleared
+ from the Ziv-Lempel tree are then re-used, with the lowest
+ code value re-used first, and the highest code value
+ re-used last. The compressor can emit the sequence 256,2
+ at any time.
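The control-code handling in points 1) and 2) can be sketched as a code reader that tracks the current code size. The sketch makes two assumptions: codes are packed least-significant bit first, which this section does not state, and the subcode after 256 is read at the current code size. It leaves out the LZW tree itself, including the partial clearing. `shrink_codes` only reports where a clear occurs. The names are hypothetical.

```python
from typing import List, Optional, Union

class BitReader:
    """Reads LSB-first bit fields from a byte string (assumed bit order)."""
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bitpos = 0

    def read(self, nbits: int) -> Optional[int]:
        if self.bitpos + nbits > 8 * len(self.data):
            return None  # not enough bits left: end of stream
        value = 0
        for i in range(nbits):
            p = self.bitpos + i
            value |= ((self.data[p >> 3] >> (p & 7)) & 1) << i
        self.bitpos += nbits
        return value

def shrink_codes(data: bytes) -> List[Union[int, str]]:
    """Return the data codes of a Shrink stream, with 256,2 shown as "CLEAR"."""
    reader = BitReader(data)
    code_size = 9
    out: List[Union[int, str]] = []
    while True:
        code = reader.read(code_size)
        if code is None:
            break
        if code != 256:
            out.append(code)              # ordinary code for the LZW tree
            continue
        control = reader.read(code_size)  # subcode at the current size
        if control == 1 and code_size < 13:
            code_size += 1                # 256,1: widen all later codes
        elif control == 2:
            out.append("CLEAR")           # 256,2: partial clear of leaf nodes
        else:
            raise ValueError("unexpected control sequence 256,%r" % (control,))
    return out
```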
+
+
+
+Expanding - Methods 2-5
+-----------------------
+
+The Reducing algorithm is actually a combination of two
+distinct algorithms. The first algorithm compresses repeated
+byte sequences, and the second algorithm takes the compressed
+stream from the first algorithm and applies a probabilistic
+compression method.
+
+The probabilistic compression stores an array of 'follower
+sets' S(j), for j=0 to 255, corresponding to each possible
+ASCII character. Each set contains between 0 and 32
+characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
+The sets are stored at the beginning of the data area for a
+Reduced file, in reverse order, with S(255) first, and S(0)
+last.
+
+The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
+where N(j) is the size of set S(j). N(j) can be 0, in which
+case the follower set for S(j) is empty. Each N(j) value is
+encoded in 6 bits, followed by N(j) eight bit character values
+corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If
+N(j) is 0, then no values for S(j) are stored, and the value
+for N(j-1) immediately follows.
+
+The compressed data stream immediately follows the follower
+sets. For the probabilistic decompression, it can be interpreted
+as follows:
+
+
+let Last-Character <- 0.
+loop until done
+ if the follower set S(Last-Character) is empty then
+ read 8 bits from the input stream, and copy this
+ value to the output stream.
+ otherwise if the follower set S(Last-Character) is non-empty then
+ read 1 bit from the input stream.
+ if this bit is not zero then
+ read 8 bits from the input stream, and copy this
+ value to the output stream.
+ otherwise if this bit is zero then
+ read B(N(Last-Character)) bits from the input
+ stream, and assign this value to I.
+ Copy the value of S(Last-Character)[I] to the
+ output stream.
+
+ assign the last value placed on the output stream to
+ Last-Character.
+end loop
+
+
+B(N(j)) is defined as the minimal number of bits required to
+encode the value N(j)-1.
+
+
+The decompressed stream from above can then be expanded to
+re-create the original file as follows:
+
+
+let State <- 0.
+
+loop until done
+ read 8 bits from the input stream into C.
+ case State of
+ 0: if C is not equal to DLE (144 decimal) then
+ copy C to the output stream.
+ otherwise if C is equal to DLE then
+ let State <- 1.
+
+ 1: if C is non-zero then
+ let V <- C.
+ let Len <- L(V)
+ let State <- F(Len).
+ otherwise if C is zero then
+ copy the value 144 (decimal) to the output stream.
+ let State <- 0
+
+ 2: let Len <- Len + C
+ let State <- 3.
+
+ 3: move backwards D(V,C) bytes in the output stream
+ (if this position is before the start of the output
+ stream, then assume that all the data before the
+ start of the output stream is filled with zeros).
+ copy Len+3 bytes from this position to the output stream.
+ let State <- 0.
+ end case
+end loop
+
+
+The functions F, L, and D are dependent on the 'compression
+factor', 1 through 4, and are defined as follows:
+
+For compression factor 1:
+ L(X) equals the lower 7 bits of X.
+ F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
+ D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
+For compression factor 2:
+ L(X) equals the lower 6 bits of X.
+ F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
+ D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
+For compression factor 3:
+ L(X) equals the lower 5 bits of X.
+ F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
+ D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
+For compression factor 4:
+ L(X) equals the lower 4 bits of X.
+ F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
+ D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
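The following is a direct Python transcription of the second (DLE) stage above for a given compression factor. It assumes the probabilistic stage has already produced `data`. The name is hypothetical. L, F, and D are computed inline from the definitions above.

```python
def unreduce_expand(data: bytes, factor: int) -> bytes:
    """Second (DLE) stage of Reduce, for compression factor 1..4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1..4")
    DLE = 144
    len_mask = (1 << (8 - factor)) - 1   # L(X): lower 8-factor bits of X
    out = bytearray()
    state = 0
    v = length = 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & len_mask
                state = 2 if length == len_mask else 3   # F(Len)
            else:
                out.append(DLE)                          # DLE,0 is a literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:  # state 3
            distance = (v >> (8 - factor)) * 256 + c + 1  # D(V, C)
            for _ in range(length + 3):
                src = len(out) - distance
                out.append(out[src] if src >= 0 else 0)   # zeros before start
            state = 0
    return bytes(out)
```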
+
+
+Imploding - Method 6
+--------------------
+
+The Imploding algorithm is actually a combination of two distinct
+algorithms. The first algorithm compresses repeated byte
+sequences using a sliding dictionary. The second algorithm is
+used to compress the encoding of the sliding dictionary output,
+using multiple Shannon-Fano trees.
+
+The Imploding algorithm can use a 4K or 8K sliding dictionary
+size. The dictionary size used can be determined by bit 1 in the
+general purpose flag word; a 0 bit indicates a 4K dictionary
+while a 1 bit indicates an 8K dictionary.
+
+The Shannon-Fano trees are stored at the start of the compressed
+file. The number of trees stored is defined by bit 2 in the
+general purpose flag word; a 0 bit indicates two trees stored, a
+1 bit indicates three trees are stored. If 3 trees are stored,
+the first Shannon-Fano tree represents the encoding of the
+Literal characters, the second tree represents the encoding of
+the Length information, the third represents the encoding of the
+Distance information. When 2 Shannon-Fano trees are stored, the
+Length tree is stored first, followed by the Distance tree.
+
+The Literal Shannon-Fano tree, if present, is used to represent
+the entire ASCII character set, and contains 256 values. This
+tree is used to compress any data not compressed by the sliding
+dictionary algorithm. When this tree is present, the Minimum
+Match Length for the sliding dictionary is 3. If this tree is
+not present, the Minimum Match Length is 2.
+
+The Length Shannon-Fano tree is used to compress the Length part
+of the (length,distance) pairs from the sliding dictionary
+output. The Length tree contains 64 values, ranging from the
+Minimum Match Length, to 63 plus the Minimum Match Length.
+
+The Distance Shannon-Fano tree is used to compress the Distance
+part of the (length,distance) pairs from the sliding dictionary
+output. The Distance tree contains 64 values, ranging from 0 to
+63, representing the upper 6 bits of the distance value. The
+distance values themselves will be between 0 and the sliding
+dictionary size, either 4K or 8K.
+
+The Shannon-Fano trees themselves are stored in a compressed
+format. The first byte of the tree data represents the number of
+bytes of data representing the (compressed) Shannon-Fano tree
+minus 1. The remaining bytes represent the Shannon-Fano tree
+data encoded as:
+
+ High 4 bits: Number of values at this bit length + 1. (1 - 16)
+ Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)
+
+The Shannon-Fano codes can be constructed from the bit lengths
+using the following algorithm:
+
+1) Sort the Bit Lengths in ascending order, while retaining the
+ order of the original lengths stored in the file.
+
+2) Generate the Shannon-Fano trees:
+
+ Code <- 0
+ CodeIncrement <- 0
+ LastBitLength <- 0
+ i <- number of Shannon-Fano codes - 1 (either 255 or 63)
+
+ loop while i >= 0
+ Code = Code + CodeIncrement
+ if BitLength(i) <> LastBitLength then
+ LastBitLength=BitLength(i)
+ CodeIncrement = 1 shifted left (16 - LastBitLength)
+ ShannonCode(i) = Code
+ i <- i - 1
+ end loop
+
+
+3) Reverse the order of all the bits in the above ShannonCode()
+ vector, so that the most significant bit becomes the least
+ significant bit. For example, the value 0x1234 (hex) would
+ become 0x2C48 (hex).
+
+4) Restore the order of Shannon-Fano codes as originally stored
+ within the file.
+
+Example:
+
+ This example will show the encoding of a Shannon-Fano tree
+ of size 8. Notice that the actual Shannon-Fano trees used
+ for Imploding are either 64 or 256 entries in size.
+
+Example: 0x02, 0x42, 0x01, 0x13
+
+ The first byte (0x02) indicates that 3 bytes of tree data
+ follow. Decoding those bytes:
+ 0x42 = 5 codes of 3 bits long
+ 0x01 = 1 code of 2 bits long
+ 0x13 = 2 codes of 4 bits long
+
+ This would generate the original bit length array of:
+ (3, 3, 3, 3, 3, 2, 4, 4)
+
+ There are 8 codes in this table for the values 0 thru 7. Using the
+ algorithm to obtain the Shannon-Fano codes produces:
+
+ Reversed Order Original
+Val Sorted Constructed Code Value Restored Length
+--- ------ ----------------- -------- -------- ------
+0: 2 1100000000000000 11 101 3
+1: 3 1010000000000000 101 001 3
+2: 3 1000000000000000 001 110 3
+3: 3 0110000000000000 110 010 3
+4: 3 0100000000000000 010 100 3
+5: 3 0010000000000000 100 11 2
+6: 4 0001000000000000 1000 1000 4
+7: 4 0000000000000000 0000 0000 4
+
+
+The values in the Val, Order Restored and Original Length columns
+now represent the Shannon-Fano encoding tree that can be used for
+decoding the Shannon-Fano encoded data. How to parse the
+variable length Shannon-Fano values from the data stream is beyond the
+scope of this document. (See the references listed at the end of
+this document for more information.) However, traditional decoding
+schemes used for Huffman variable length decoding, such as the
+Greenlaw algorithm, can be successfully applied.
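The four steps and the worked example can be checked with a short sketch. It relies on Python's stable `sorted` to keep equal lengths in file order (step 1). For each value, it returns the bit-reversed code together with its length, which matches the Reversed Value and Original Length columns of the table. The names are hypothetical.

```python
from typing import List, Tuple

def expand_bit_lengths(tree_data: bytes) -> List[int]:
    """Expand the compressed tree bytes into one bit length per value."""
    nbytes = tree_data[0] + 1            # first byte: byte count minus 1
    lengths: List[int] = []
    for b in tree_data[1:1 + nbytes]:
        count = (b >> 4) + 1             # high 4 bits: number of values - 1
        bits = (b & 0x0F) + 1            # low 4 bits: bit length - 1
        lengths.extend([bits] * count)
    return lengths

def reverse_bits(value: int, width: int = 16) -> int:
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def shannon_fano_codes(lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (bit-reversed code, bit length) for each value, in file order."""
    order = sorted(range(len(lengths)), key=lambda k: lengths[k])  # step 1
    sorted_lengths = [lengths[k] for k in order]
    codes = [0] * len(lengths)
    code = increment = last = 0
    for i in range(len(lengths) - 1, -1, -1):    # step 2
        code += increment
        if sorted_lengths[i] != last:
            last = sorted_lengths[i]
            increment = 1 << (16 - last)
        codes[i] = code
    result: List[Tuple[int, int]] = [(0, 0)] * len(lengths)
    for pos, value in enumerate(order):          # steps 3 and 4
        result[value] = (reverse_bits(codes[pos]), sorted_lengths[pos])
    return result
```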
+
+The compressed data stream begins immediately after the
+compressed Shannon-Fano data. The compressed data stream can be
+interpreted as follows:
+
+loop until done
+ read 1 bit from input stream.
+
+ if this bit is non-zero then (encoded data is literal data)
+ if Literal Shannon-Fano tree is present
+ read and decode character using Literal Shannon-Fano tree.
+ otherwise
+ read 8 bits from input stream.
+ copy character to the output stream.
+ otherwise (encoded data is sliding dictionary match)
+ if 8K dictionary size
+ read 7 bits for offset Distance (lower 7 bits of offset).
+ otherwise
+ read 6 bits for offset Distance (lower 6 bits of offset).
+
+ using the Distance Shannon-Fano tree, read and decode the
+ upper 6 bits of the Distance value.
+
+ using the Length Shannon-Fano tree, read and decode
+ the Length value.
+
+ Length <- Length + Minimum Match Length
+
+ if Length = 63 + Minimum Match Length
+ read 8 bits from the input stream,
+ add this value to Length.
+
+ move backwards Distance+1 bytes in the output stream, and
+ copy Length characters from this position to the output
+ stream. (if this position is before the start of the output
+ stream, then assume that all the data before the start of
+ the output stream is filled with zeros).
+end loop
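The arithmetic of the match branch can be isolated as shown below. The sketch combines the raw low bits with the tree-decoded high bits of the distance, applies the Minimum Match Length, and handles the extended length. It assumes bit reading and tree decoding happen elsewhere, and the names are hypothetical.

```python
from typing import Optional, Tuple

def implode_match(low_bits: int, high6: int, length_code: int,
                  extra: Optional[int], dict_8k: bool,
                  literal_tree: bool) -> Tuple[int, int]:
    """Combine the decoded fields of one Implode match.

    low_bits: the 7 (8K) or 6 (4K) raw low bits of the distance
    high6: upper 6 bits of the distance, from the Distance tree
    length_code: 0..63, from the Length tree
    extra: the extra 8-bit value, read only when length_code is 63
    Returns (bytes to move back, bytes to copy).
    """
    distance = (high6 << (7 if dict_8k else 6)) | low_bits
    min_match = 3 if literal_tree else 2
    length = length_code + min_match
    if length == 63 + min_match:
        if extra is None:
            raise ValueError("length code 63 needs an extra byte")
        length += extra
    return distance + 1, length

def copy_match(out: bytearray, back: int, length: int) -> None:
    """Copy `length` bytes from `back` bytes ago; data before start is zero."""
    for _ in range(length):
        src = len(out) - back
        out.append(out[src] if src >= 0 else 0)
```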
+
+Tokenizing - Method 7
+---------------------
+
+This method is not used by PKZIP.
+
+Deflating - Method 8
+--------------------
+
+The Deflate algorithm is similar to the Implode algorithm in that it
+uses a sliding dictionary (of up to 32K) with secondary compression
+from Huffman/Shannon-Fano codes.
+
+The compressed data is stored in blocks with a header describing
+the block and the Huffman codes used in the data block. The header
+format is as follows:
+
+ Bit 0: Last Block bit This bit is set to 1 if this is the last
+ compressed block in the data.
+ Bits 1-2: Block type
+ 00 (0) - Block is stored - All stored data is byte aligned.
+ Skip bits until next byte, then next word = block length,
+ followed by the one's complement of the block length word.
+ Remaining data in block is the stored data.
+
+ 01 (1) - Use fixed Huffman codes for literal and distance codes.
+ Lit Code Bits Dist Code Bits
+ --------- ---- --------- ----
+ 0 - 143 8 0 - 31 5
+ 144 - 255 9
+ 256 - 279 7
+ 280 - 287 8
+
+ Literal codes 286-287 and distance codes 30-31 are never
+ used but participate in the Huffman construction.
+
+ 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)
+
+ 11 (3) - Reserved - Flag an "Error in compressed data" if seen.
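To illustrate the header layout, the sketch below decodes streams that consist only of stored blocks. It assumes the usual Deflate bit order, with header bits packed least-significant first, which this section does not spell out. It rejects other block types rather than decoding them, and the name is hypothetical.

```python
import struct

def inflate_stored_only(stream: bytes) -> bytes:
    """Decode a raw Deflate stream whose blocks are all of type 00 (stored)."""
    out = bytearray()
    bitpos = 0
    while True:
        # The 3 header bits are read least-significant bit first.
        header = 0
        for i in range(3):
            p = bitpos + i
            header |= ((stream[p >> 3] >> (p & 7)) & 1) << i
        bitpos += 3
        last, btype = header & 1, header >> 1
        if btype != 0:
            raise NotImplementedError("block type %d is not handled here" % btype)
        pos = (bitpos + 7) // 8                   # skip to the next byte
        length, nlength = struct.unpack_from("<HH", stream, pos)
        if length != nlength ^ 0xFFFF:            # NLEN is the one's complement
            raise ValueError("stored block length check failed")
        pos += 4
        out += stream[pos:pos + length]
        bitpos = (pos + length) * 8
        if last:
            return bytes(out)
```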
+
+Expanding Huffman Codes
+-----------------------
+If the data block is stored with dynamic Huffman codes, the Huffman
+codes are sent in the following compressed format:
+
+ 5 Bits: # of Literal codes sent - 257 (257 - 286)
+ All other codes are never sent.
+ 5 Bits: # of Dist codes - 1 (1 - 32)
+ 4 Bits: # of Bit Length codes - 4 (4 - 19)
+
+The Huffman codes are sent as bit lengths and the codes are built as
+described in the implode algorithm. The bit lengths themselves are
+compressed with Huffman codes. There are 19 bit length codes:
+
+ 0 - 15: Represent bit lengths of 0 - 15
+ 16: Copy the previous bit length 3 - 6 times.
+ The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
+ Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
+ expand to 12 bit lengths of 8 (1 + 6 + 5)
+ 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
+ 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
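The following sketch expands the repeat codes. It assumes the extra bits for each symbol have already been read into an integer, and the name is hypothetical. The first test reproduces the example above.

```python
from typing import List, Tuple

def expand_code_lengths(symbols: List[Tuple[int, int]]) -> List[int]:
    """Expand (symbol, extra_bits_value) pairs of the code length alphabet."""
    lengths: List[int] = []
    for symbol, extra in symbols:
        if symbol <= 15:
            lengths.append(symbol)                       # literal bit length
        elif symbol == 16:
            if not lengths:
                raise ValueError("code 16 with no previous length")
            lengths.extend([lengths[-1]] * (3 + extra))  # 2 extra bits: 3-6
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))            # 3 extra bits: 3-10
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))           # 7 extra bits: 11-138
        else:
            raise ValueError("invalid code length symbol %d" % symbol)
    return lengths
```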
+
+The lengths of the bit length codes are sent packed 3 bits per value
+(0 - 7) in the following order:
+
+ 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
+
+The Huffman codes should be built as described in the Implode algorithm
+except codes are assigned starting at the shortest bit length, i.e. the
+shortest code should be all 0's rather than all 1's. Also, codes with
+a bit length of zero do not participate in the tree construction. The
+codes are then used to decode the bit lengths for the literal and distance
+tables.
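The construction described above can be written as follows: shortest codes get the smallest values, and zero lengths are skipped. This sketch uses the bit-length-count method of RFC 1951, section 3.2.2, not code taken from PKZIP. The name is hypothetical. For the Implode example's lengths (3, 3, 3, 3, 3, 2, 4, 4), it yields 010, 011, 100, 101, 110, 00, 1110, 1111.

```python
from typing import Dict, List, Tuple

def canonical_codes(lengths: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map symbol -> (code, bit length); shortest codes get the smallest values."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1             # zero lengths do not take part
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes: Dict[int, Tuple[int, int]] = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = (next_code[n], n)
            next_code[n] += 1
    return codes
```

Applied to the fixed literal lengths of block type 01, this reproduces the familiar fixed codes (for example, 0000000 for End-Of-Block).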
+
+The bit lengths for the literal tables are sent first with the number
+of entries sent described by the 5 bits sent earlier. There are up
+to 286 literal characters; the first 256 represent the respective 8
+bit character, code 256 represents the End-Of-Block code, the remaining
+29 codes represent copy lengths of 3 thru 258. There are up to 30
+distance codes representing distances from 1 thru 32k as described
+below.
+
+ Length Codes
+ ------------
+ Extra Extra Extra Extra
+ Code Bits Length Code Bits Lengths Code Bits Lengths Code Bits Length(s)
+ ---- ---- ------ ---- ---- ------- ---- ---- ------- ---- ---- ---------
+ 257 0 3 265 1 11,12 273 3 35-42 281 5 131-162
+ 258 0 4 266 1 13,14 274 3 43-50 282 5 163-194
+ 259 0 5 267 1 15,16 275 3 51-58 283 5 195-226
+ 260 0 6 268 1 17,18 276 3 59-66 284 5 227-257
+ 261 0 7 269 2 19-22 277 4 67-82 285 0 258
+ 262 0 8 270 2 23-26 278 4 83-98
+ 263 0 9 271 2 27-30 279 4 99-114
+ 264 0 10 272 2 31-34 280 4 115-130
+
+ Distance Codes
+ --------------
+ Extra Extra Extra Extra
+ Code Bits Dist Code Bits Dist Code Bits Distance Code Bits Distance
+ ---- ---- ---- ---- ---- ------ ---- ---- -------- ---- ---- --------
+ 0 0 1 8 3 17-24 16 7 257-384 24 11 4097-6144
+ 1 0 2 9 3 25-32 17 7 385-512 25 11 6145-8192
+ 2 0 3 10 4 33-48 18 8 513-768 26 12 8193-12288
+ 3 0 4 11 4 49-64 19 8 769-1024 27 12 12289-16384
+ 4 1 5,6 12 5 65-96 20 9 1025-1536 28 13 16385-24576
+ 5 1 7,8 13 5 97-128 21 9 1537-2048 29 13 24577-32768
+ 6 2 9-12 14 6 129-192 22 10 2049-3072
+ 7 2 13-16 15 6 193-256 23 10 3073-4096
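Both tables follow a regular pattern, so a decoder can generate them instead of storing them. The sketch below rebuilds the (code, extra bits, base) triples. The value for a code is its base plus the integer read from its extra bits. Code 285 is a special case, and code 284 stops at 257 so that 258 is coded only by 285. The names are hypothetical.

```python
from typing import List, Tuple

def length_table() -> List[Tuple[int, int, int]]:
    """(code, extra bits, base length) for length codes 257..285."""
    table = []
    base = 3
    for i in range(28):                     # codes 257..284
        extra = 0 if i < 8 else (i - 4) // 4
        table.append((257 + i, extra, base))
        base += 1 << extra
    table.append((285, 0, 258))             # special case: 258, no extra bits
    return table

def distance_table() -> List[Tuple[int, int, int]]:
    """(code, extra bits, base distance) for distance codes 0..29."""
    table = []
    base = 1
    for code in range(30):
        extra = 0 if code < 4 else (code - 2) // 2
        table.append((code, extra, base))
        base += 1 << extra
    return table
```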
+
+The compressed data stream begins immediately after the
+compressed header data. The compressed data stream can be
+interpreted as follows:
+
+do
+ read header from input stream.
+
+ if stored block
+ skip bits until byte aligned
+ read count and one's complement of count
+ copy count bytes of stored data to the output stream
+ otherwise
+ loop until end of block code sent
+ decode literal character from input stream
+ if literal < 256
+ copy character to the output stream
+ otherwise
+ if literal = end of block
+ break from loop
+ otherwise
+ decode length from the literal code and its extra bits
+ decode distance code and its extra bits from input stream
+
+ move backwards distance bytes in the output stream, and
+ copy length characters from this position to the output
+ stream.
+ end loop
+while not last block
+
+if data descriptor exists
+ skip bits until byte aligned
+ check data descriptor signature
+ read crc and sizes
+endif
+
+Decryption
+----------
+
+The encryption used in PKZIP was generously supplied by Roger
+Schlafly. PKWARE is grateful to Mr. Schlafly for his expert
+help and advice in the field of data encryption.
+
+PKZIP encrypts the compressed data stream. Encrypted files must
+be decrypted before they can be extracted.
+
+Each encrypted file has an extra 12 bytes stored at the start of
+the data area defining the encryption header for that file. The
+encryption header is originally set to random values, and then
+itself encrypted using three 32-bit keys. The key values are
+initialized using the supplied encryption password. After each byte
+is encrypted, the keys are then updated using pseudo-random number
+generation techniques in combination with the same CRC-32 algorithm
+used in PKZIP and described elsewhere in this document.
+
+The following are the basic steps required to decrypt a file:
+
+1) Initialize the three 32-bit keys with the password.
+2) Read and decrypt the 12-byte encryption header, further
+ initializing the encryption keys.
+3) Read and decrypt the compressed data stream using the
+ encryption keys.
+
+
+Step 1 - Initializing the encryption keys
+-----------------------------------------
+
+Key(0) <- 305419896
+Key(1) <- 591751049
+Key(2) <- 878082192
+
+loop for i <- 0 to length(password)-1
+ update_keys(password(i))
+end loop
+
+
+Where update_keys() is defined as:
+
+
+update_keys(char):
+ Key(0) <- crc32(Key(0),char)
+ Key(1) <- Key(1) + (Key(0) & 000000ffH)
+ Key(1) <- Key(1) * 134775813 + 1
+ Key(2) <- crc32(Key(2),Key(1) >> 24)
+end update_keys
+
+
+Where crc32(old_crc,char) is a routine that given a CRC value and a
+character, returns an updated CRC value after applying the CRC-32
+algorithm described elsewhere in this document.
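+Step 1 can be sketched in C as follows. This is a minimal sketch that
+assumes ZIP's standard reflected CRC-32 polynomial 0xEDB88320 (the CRC-32
+section of this document is not reproduced here); make_crc_table(),
+crc32_byte(), init_keys() and the zip_keys struct are hypothetical names.
+The keys are unsigned 32-bit values, so the multiply by 134775813 wraps
+modulo 2^32, and crc32(old_crc,char) is the bare one-byte table step,
+without the initial and final inversion used when checksumming a file.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t k0, k1, k2; } zip_keys;   /* Key(0), Key(1), Key(2) */

static uint32_t crc_table[256];

/* Build the lookup table for the reflected CRC-32 polynomial 0xEDB88320. */
static void make_crc_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* crc32(old_crc, char): one table-driven byte step, no pre/post inversion. */
static uint32_t crc32_byte(uint32_t old_crc, uint8_t c)
{
    return crc_table[(old_crc ^ c) & 0xFFu] ^ (old_crc >> 8);
}

static void update_keys(zip_keys *k, uint8_t c)
{
    k->k0 = crc32_byte(k->k0, c);
    k->k1 = (k->k1 + (k->k0 & 0xFFu)) * 134775813u + 1u;  /* wraps mod 2^32 */
    k->k2 = crc32_byte(k->k2, (uint8_t)(k->k1 >> 24));
}

/* Step 1: load the fixed starting values, then mix in each password byte. */
static void init_keys(zip_keys *k, const char *password)
{
    make_crc_table();
    k->k0 = 305419896u;   /* 0x12345678 */
    k->k1 = 591751049u;   /* 0x23456789 */
    k->k2 = 878082192u;   /* 0x34567890 */
    for (; *password; password++)
        update_keys(k, (uint8_t)*password);
}
```

+As a sanity check on the table, running crc32_byte() over "123456789"
+starting from 0xFFFFFFFF and inverting the result gives the standard
+CRC-32 check value 0xCBF43926.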
+
+
+Step 2 - Decrypting the encryption header
+-----------------------------------------
+
+The purpose of this step is to further initialize the encryption
+keys, based on random data, to render a plaintext attack on the
+data ineffective.
+
+
+Read the 12-byte encryption header into Buffer, in locations
+Buffer(0) through Buffer(11).
+
+loop for i <- 0 to 11
+ C <- Buffer(i) ^ decrypt_byte()
+ update_keys(C)
+ Buffer(i) <- C
+end loop
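+The header loop above can be sketched in C as below. It is a minimal,
+self-contained sketch with hypothetical names (zip_keys, init_keys(),
+decrypt_header(), encrypt_bytes()) and it assumes ZIP's standard CRC-32
+polynomial 0xEDB88320. encrypt_bytes() is not described in this section;
+it is included only as the inverse operation (XOR with the same keystream
+byte, then update the keys with the plaintext byte) so that a round trip
+can be tested.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t k0, k1, k2; } zip_keys;   /* Key(0), Key(1), Key(2) */

static uint32_t crc_table[256];

static void make_crc_table(void)               /* polynomial 0xEDB88320 */
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

static uint32_t crc32_byte(uint32_t crc, uint8_t c)
{
    return crc_table[(crc ^ c) & 0xFFu] ^ (crc >> 8);
}

static void update_keys(zip_keys *k, uint8_t c)
{
    k->k0 = crc32_byte(k->k0, c);
    k->k1 = (k->k1 + (k->k0 & 0xFFu)) * 134775813u + 1u;  /* wraps mod 2^32 */
    k->k2 = crc32_byte(k->k2, (uint8_t)(k->k1 >> 24));
}

static uint8_t decrypt_byte(const zip_keys *k)
{
    uint32_t temp = (k->k2 | 2u) & 0xFFFFu;              /* unsigned short */
    return (uint8_t)((temp * (temp ^ 1u)) >> 8);
}

static void init_keys(zip_keys *k, const char *password)
{
    make_crc_table();
    k->k0 = 305419896u;
    k->k1 = 591751049u;
    k->k2 = 878082192u;
    for (; *password; password++)
        update_keys(k, (uint8_t)*password);
}

/* Step 2: decrypt the 12-byte header in place; each decrypted byte C is
 * fed back into update_keys(), exactly as in the pseudocode. */
static void decrypt_header(zip_keys *k, uint8_t buffer[12])
{
    for (int i = 0; i < 12; i++) {
        uint8_t c = (uint8_t)(buffer[i] ^ decrypt_byte(k));
        update_keys(k, c);
        buffer[i] = c;
    }
}

/* Hypothetical inverse, used only to build test data. */
static void encrypt_bytes(zip_keys *k, uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t plain = buf[i];
        buf[i] = (uint8_t)(plain ^ decrypt_byte(k));
        update_keys(k, plain);
    }
}
```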
+
+
+Where decrypt_byte() is defined as:
+
+
+unsigned char decrypt_byte()
+ local unsigned short temp
+ temp <- Key(2) | 2
+ decrypt_byte <- (temp * (temp ^ 1)) >> 8
+end decrypt_byte
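+A C version of decrypt_byte() might look like the following. This sketch
+takes Key(2) as an explicit argument (a hypothetical signature) and keeps
+temp in a 32-bit unsigned variable: in C, two unsigned short operands are
+promoted to int before multiplying, and 0xFFFF * 0xFFFE does not fit in a
+32-bit signed int, so a literal transcription could overflow. Masking with
+0xFFFF reproduces the truncation to unsigned short, and the cast to
+uint8_t the truncation to unsigned char.

```c
#include <stdint.h>

/* Next keystream byte, derived from the low 16 bits of Key(2). */
static uint8_t decrypt_byte(uint32_t key2)
{
    uint32_t temp = (key2 | 2u) & 0xFFFFu;          /* "unsigned short temp" */
    return (uint8_t)((temp * (temp ^ 1u)) >> 8);    /* product < 2^32 */
}
```

+For instance, Key(2) = 0x12340100 gives temp = 0x0102, and
+(0x0102 * 0x0103) >> 8 = 0x105, so the byte returned is 0x05.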
+
+
+After the header is decrypted, the last 1 or 2 bytes in Buffer
+should be the high-order word/byte of the CRC for the file being
+decrypted, stored in Intel low-byte/high-byte order, or the high-order
+byte of the file time if bit 3 of the general purpose bit flag is set.
+Versions of PKZIP prior to 2.0 used a 2-byte CRC check; a 1-byte CRC
+check is used on versions after 2.0. This can be used to test whether
+the supplied password is correct.
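+The password check can be sketched as a small C predicate. The name
+header_check_ok() and its parameters are hypothetical; it assumes the
+header has already been decrypted and that the CRC-32 and DOS modification
+time are taken from the entry's local header. With the 1-byte check, a
+wrong password still passes about once in 256 tries, so a passing check
+is only a strong hint; the CRC of the extracted data is the stronger test.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPBF_DATA_DESCRIPTOR 0x0008u   /* bit 3 of the general purpose flag */

/* hdr: the decrypted Buffer(0)..Buffer(11). With bit 3 set, Buffer(11)
 * must equal the high byte of the file time; otherwise it must equal the
 * high byte of the CRC. two_byte_check selects the pre-2.0 behaviour of
 * also comparing Buffer(10) (low byte of the CRC's high-order word, since
 * the word is stored low-byte first). */
static bool header_check_ok(const uint8_t hdr[12], uint16_t flags,
                            uint32_t crc, uint16_t mod_time,
                            bool two_byte_check)
{
    if (flags & GPBF_DATA_DESCRIPTOR)
        return hdr[11] == (uint8_t)(mod_time >> 8);
    if (hdr[11] != (uint8_t)(crc >> 24))
        return false;
    return !two_byte_check || hdr[10] == (uint8_t)(crc >> 16);
}
```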
+
+
+Step 3 - Decrypting the compressed data stream
+----------------------------------------------
+
+The compressed data stream can be decrypted as follows:
+
+
+loop until done
+ read a character into C
+ Temp <- C ^ decrypt_byte()
+ update_keys(Temp)
+ output Temp
+end loop
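+Step 3 applies the same per-byte operation as Step 2, continuing from the
+key state left by the header. Below is a minimal, self-contained C sketch
+with hypothetical names, assuming ZIP's standard CRC-32 polynomial
+0xEDB88320; encrypt_bytes() is included only as the inverse operation so
+a round trip can be tested.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t k0, k1, k2; } zip_keys;   /* Key(0), Key(1), Key(2) */

static uint32_t crc_table[256];

static void make_crc_table(void)               /* polynomial 0xEDB88320 */
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

static uint32_t crc32_byte(uint32_t crc, uint8_t c)
{
    return crc_table[(crc ^ c) & 0xFFu] ^ (crc >> 8);
}

static void update_keys(zip_keys *k, uint8_t c)
{
    k->k0 = crc32_byte(k->k0, c);
    k->k1 = (k->k1 + (k->k0 & 0xFFu)) * 134775813u + 1u;  /* wraps mod 2^32 */
    k->k2 = crc32_byte(k->k2, (uint8_t)(k->k1 >> 24));
}

static uint8_t decrypt_byte(const zip_keys *k)
{
    uint32_t temp = (k->k2 | 2u) & 0xFFFFu;              /* unsigned short */
    return (uint8_t)((temp * (temp ^ 1u)) >> 8);
}

static void init_keys(zip_keys *k, const char *password)
{
    make_crc_table();
    k->k0 = 305419896u;
    k->k1 = 591751049u;
    k->k2 = 878082192u;
    for (; *password; password++)
        update_keys(k, (uint8_t)*password);
}

/* Step 3: decrypt n bytes in place. The keys must already have been
 * initialized (Step 1) and advanced through the header (Step 2). */
static void decrypt_data(zip_keys *k, uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t temp = (uint8_t)(buf[i] ^ decrypt_byte(k));
        update_keys(k, temp);
        buf[i] = temp;
    }
}

/* Hypothetical inverse, used only to build test data. */
static void encrypt_bytes(zip_keys *k, uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t plain = buf[i];
        buf[i] = (uint8_t)(plain ^ decrypt_byte(k));
        update_keys(k, plain);
    }
}
```

+Because the key state carries over from the header, decrypting the 12
+header bytes and then the data in a second call gives the same result as
+one call over both.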
+
+
+In addition to the above-mentioned contributors to PKZIP and PKUNZIP,
+I would like to extend special thanks to Robert Mahoney for suggesting
+the extension .ZIP for this software.
+
+