From 84adefa331c4159d432d22840663c38f155cd4c1 Mon Sep 17 00:00:00 2001 From: Erlang/OTP Date: Fri, 20 Nov 2009 14:54:40 +0000 Subject: The R13B03 release. --- lib/kernel/test/zlib_SUITE_data/zipdoc | 1924 ++++++++++++++++++++++++++++++++ 1 file changed, 1924 insertions(+) create mode 100644 lib/kernel/test/zlib_SUITE_data/zipdoc (limited to 'lib/kernel/test/zlib_SUITE_data/zipdoc') diff --git a/lib/kernel/test/zlib_SUITE_data/zipdoc b/lib/kernel/test/zlib_SUITE_data/zipdoc new file mode 100644 index 0000000000..e63952e3ef --- /dev/null +++ b/lib/kernel/test/zlib_SUITE_data/zipdoc @@ -0,0 +1,1924 @@ +[Info-ZIP note, 981119: this file is based on PKWARE's appnote.txt of + 15 February 1996, taking into account PKWARE's revised appnote.txt version + of 01 September 1998. It has been unofficially corrected and extended by + Info-ZIP without explicit permission by PKWARE. Although Info-ZIP + believes the information to be accurate and complete, it is provided + under a disclaimer similar to the PKWARE disclaimer below, differing + only in the substitution of "Info-ZIP" for "PKWARE". In other words, + use this information at your own risk, but we think it's correct. + + Specification info from PKWARE that was obviously wrong has been corrected + silently (e.g. missing structure fields, wrong numbers + As of PKZIPW 2.50, two new incompatibilities have been introduced by PKWARE; + they are noted below. Note that the "NTFS tag" conflict is currently not + real; PKZIPW 2.50 actually tags NTFS files as having come from a FAT + file system, too.] + + +Disclaimer +---------- + +Although PKWARE will attempt to supply current and accurate +information relating to its file formats, algorithms, and the +subject programs, the possibility of error can not be eliminated. +PKWARE therefore expressly disclaims any warranty that the +information contained in the associated materials relating to the +subject programs and/or the format of the files created or +accessed by the subject programs and/or the algorithms used by +the subject programs, or any other matter, is current, correct or +accurate as delivered. Any risk of damage due to any possible +inaccurate information is assumed by the user of the information. +Furthermore, the information relating to the subject programs +and/or the file formats created or accessed by the subject +programs and/or the algorithms used by the subject programs is +subject to change without notice. + + +General Format of a ZIP file +---------------------------- + + Files stored in arbitrary order. Large zipfiles can span multiple + diskette media. + + Overall zipfile format: + + [local file header + file data + data_descriptor] . . . + [central directory] end of central directory record + + + A. Local file header: + + local file header signature 4 bytes (0x04034b50) + version needed to extract 2 bytes + general purpose bit flag 2 bytes + compression method 2 bytes + last mod file time 2 bytes + last mod file date 2 bytes + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + filename length 2 bytes + extra field length 2 bytes + + filename (variable size) + extra field (variable size) + + + B. Data descriptor: + + data descriptor signature 4 bytes (0x08074b50) + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + + This descriptor exists only if bit 3 of the general + purpose bit flag is set (see below). It is byte aligned + and immediately follows the last byte of compressed data. + This descriptor is used only when it was not possible to + seek in the output zip file, e.g., when the output zip file + was standard output or a non seekable device. + + C. Central directory structure: + + [file header] . . . end of central dir record + + File header: + + central file header signature 4 bytes (0x02014b50) + version made by 2 bytes + version needed to extract 2 bytes + general purpose bit flag 2 bytes + compression method 2 bytes + last mod file time 2 bytes + last mod file date 2 bytes + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + filename length 2 bytes + extra field length 2 bytes + file comment length 2 bytes + disk number start 2 bytes + internal file attributes 2 bytes + external file attributes 4 bytes + relative offset of local header 4 bytes + + filename (variable size) + extra field (variable size) + file comment (variable size) + + End of central dir record: + + end of central dir signature 4 bytes (0x06054b50) + number of this disk 2 bytes + number of the disk with the + start of the central directory 2 bytes + total number of entries in + the central dir on this disk 2 bytes + total number of entries in + the central dir 2 bytes + size of the central directory 4 bytes + offset of start of central + directory with respect to + the starting disk number 4 bytes + zipfile comment length 2 bytes + zipfile comment (variable size) + + + D. Explanation of fields: + + version made by (2 bytes) + + The upper byte indicates the host system (OS) for the + file. Software can use this information to determine + the line record format for text files etc. The current + mappings are: + + 0 - FAT file system (DOS, OS/2, NT) + PKZIPW 2.50 VFAT, NTFS + 1 - Amiga + 2 - VMS (VAX or Alpha AXP) + 3 - Unix + 4 - VM/CMS + 5 - Atari + 6 - HPFS file system (OS/2, NT 3.x) + 7 - Macintosh + 8 - Z-System + 9 - CP/M + 10 - TOPS-20 [supposedly PKZIPW 2.50 NTFS] + 11 - NTFS file system (NT) [used by Info-ZIP, only] + 12 - SMS/QDOS + 13 - Acorn RISC OS + 14 - VFAT file system (Win95, NT) [Info-ZIP reservation, unused] + 15 - MVS + 16 - BeOS (BeBox or PowerMac) + 17 - Tandem + 18 thru 255 - unused + + The lower byte indicates the version number of the + software used to encode the file. The value/10 + indicates the major version number, and the value + mod 10 is the minor version number. + + version needed to extract (2 bytes) + + The minimum software version needed to extract the + file, mapped as above. + + general purpose bit flag: (2 bytes) + + Bit 0: If set, indicates that the file is encrypted. + + (For Method 6 - Imploding) + Bit 1: If the compression method used was type 6, + Imploding, then this bit, if set, indicates + an 8K sliding dictionary was used. If clear, + then a 4K sliding dictionary was used. + Bit 2: If the compression method used was type 6, + Imploding, then this bit, if set, indicates + an 3 Shannon-Fano trees were used to encode the + sliding dictionary output. If clear, then 2 + Shannon-Fano trees were used. + + (For Method 8 - Deflating) + Bit 2 Bit 1 + 0 0 Normal (-en) compression option was used. + 0 1 Maximum (-ex) compression option was used. + 1 0 Fast (-ef) compression option was used. + 1 1 Super Fast (-es) compression option was used. + + Note: Bits 1 and 2 are undefined if the compression + method is any other. + + Bit 3: If this bit is set, the fields crc-32, compressed size + and uncompressed size are set to zero in the local + header. The correct values are put in the data descriptor + immediately following the compressed data. (Note: PKZIP + version 2.04g for DOS only recognizes this bit for method 8 + compression, newer versions of PKZIP recognize this bit + for any compression method.) + [Info-ZIP note: This bit was introduced by PKZIP 2.04 for + DOS. In general, this feature can only be reliably used + together with compression methods that allow intrinsic + detection of the "end-of-compressed-data" condition. From + the set of compression methods described in this Zip archive + specification, only "deflate" meets this requirement. + Especially, the method STORED does not work! + The Info-ZIP tools recognize this bit regardless of the + compression method; but, they rely on correctly set + "compressed size" information in the central directory entry.] + + Bit 5: If this bit is set, this indicates that the file is compressed + patched data. (Note: Requires PKZIP version 2.70 or greater) + + The upper three bits are reserved and used internally + by the software when processing the zipfile. The + remaining bits are unused. + + compression method: (2 bytes) + + (see accompanying documentation for algorithm + descriptions) + + 0 - The file is stored (no compression) + 1 - The file is Shrunk + 2 - The file is Reduced with compression factor 1 + 3 - The file is Reduced with compression factor 2 + 4 - The file is Reduced with compression factor 3 + 5 - The file is Reduced with compression factor 4 + 6 - The file is Imploded + 7 - Reserved for Tokenizing compression algorithm + 8 - The file is Deflated + 9 - Reserved for enhanced Deflating + 10 - PKWARE Data Compression Library Imploding + + date and time fields: (2 bytes each) + + The date and time are encoded in standard MS-DOS format. + If input came from standard input, the date and time are + those at which compression was started for this data. + + CRC-32: (4 bytes) + + The CRC-32 algorithm was generously contributed by + David Schwaderer and can be found in his excellent + book "C Programmers Guide to NetBIOS" published by + Howard W. Sams & Co. Inc. The 'magic number' for + the CRC is 0xdebb20e3. The proper CRC pre and post + conditioning is used, meaning that the CRC register + is pre-conditioned with all ones (a starting value + of 0xffffffff) and the value is post-conditioned by + taking the one's complement of the CRC residual. + If bit 3 of the general purpose flag is set, this + field is set to zero in the local header and the correct + value is put in the data descriptor and in the central + directory. + + compressed size: (4 bytes) + uncompressed size: (4 bytes) + + The size of the file compressed and uncompressed, + respectively. If bit 3 of the general purpose bit flag + is set, these fields are set to zero in the local header + and the correct values are put in the data descriptor and + in the central directory. + + filename length: (2 bytes) + extra field length: (2 bytes) + file comment length: (2 bytes) + + The length of the filename, extra field, and comment + fields respectively. The combined length of any + directory record and these three fields should not + generally exceed 65,535 bytes. If input came from standard + input, the filename length is set to zero. + + [Info-ZIP note: + This feature is not yet supported by any PKWARE version of ZIP + (at least not in PKZIP for DOS and PKZIP for Windows/WinNT). + The Info-ZIP programs handle standard input differently: + If input came from standard input, the filename is set to "-" + (length one).] + + + disk number start: (2 bytes) + + The number of the disk on which this file begins. + + internal file attributes: (2 bytes) + + The lowest bit of this field indicates, if set, that + the file is apparently an ASCII or text file. If not + set, that the file apparently contains binary data. + The remaining bits are unused in version 1.0. + + external file attributes: (4 bytes) + + The mapping of the external attributes is + host-system dependent (see 'version made by'). For + MS-DOS, the low order byte is the MS-DOS directory + attribute byte. If input came from standard input, this + field is set to zero. + + relative offset of local header: (4 bytes) + + This is the offset from the start of the first disk on + which this file appears, to where the local header should + be found. + + filename: (Variable) + + The name of the file, with optional relative path. + The path stored should not contain a drive or + device letter, or a leading slash. All slashes + should be forward slashes '/' as opposed to + backwards slashes '\' for compatibility with Amiga + and Unix file systems etc. If input came from standard + input, there is no filename field. + [Info-ZIP discrepancy: + If input came from standard input, the file name is set + to "-" (without the quotes). + As far as we know, the PKWARE specification for "input from + stdin" is not supported by PKZIP/PKUNZIP for DOS, OS/2, Windows + Windows NT.] + + extra field: (Variable) + + This is for future expansion. If additional information + needs to be stored in the future, it should be stored + here. Earlier versions of the software can then safely + skip this file, and find the next file or header. This + field will be 0 length in version 1.0. + + In order to allow different programs and different types + of information to be stored in the 'extra' field in .ZIP + files, the following structure should be used for all + programs storing data in this field: + + header1+data1 + header2+data2 . . . + + Each header should consist of: + + Header ID - 2 bytes + Data Size - 2 bytes + + Note: all fields stored in Intel low-byte/high-byte order. + + The Header ID field indicates the type of data that is in + the following data block. + + Header ID's of 0 thru 31 are reserved for use by PKWARE. + The remaining ID's can be used by third party vendors for + proprietary usage. + + The current Header ID mappings defined by PKWARE are: + + 0x0007 AV Info + 0x0009 OS/2 extended attributes (also Info-ZIP) + 0x000a PKWARE Win95/WinNT FileTimes [undocumented!] + 0x000c PKWARE VAX/VMS (also Info-ZIP) + 0x000d PKWARE Unix + 0x000f Patch Descriptor + + The Header ID mappings defined by Info-ZIP and third parties are: + + 0x07c8 Info-ZIP Macintosh (old, J. Lee) + 0x2605 ZipIt Macintosh (first version) + 0x2705 ZipIt Macintosh v 1.3.5 and newer (w/o full filename) + 0x334d Info-ZIP Macintosh (new, D. Haase's 'Mac3' field ) + 0x4341 Acorn/SparkFS (David Pilling) + 0x4453 Windows NT security descriptor (binary ACL) + 0x4704 VM/CMS + 0x470f MVS + 0x4b46 FWKCS MD5 (third party, see below) + 0x4c41 OS/2 access control list (text ACL) + 0x4d49 Info-ZIP VMS (VAX or Alpha) + 0x5356 AOS/VS (binary ACL) + 0x5455 extended timestamp + 0x5855 Info-ZIP Unix (original; also OS/2, NT, etc.) + 0x6542 BeOS (BeBox, PowerMac, etc.) + 0x756e ASi Unix + 0x7855 Info-ZIP Unix (new) + 0xfb4a SMS/QDOS + + The Data Size field indicates the size of the following + data block. Programs can use this value to skip to the + next header block, passing over any data blocks that are + not of interest. + + Note: As stated above, the size of the entire .ZIP file + header, including the filename, comment, and extra + field should not exceed 64K in size. + + In case two different programs should appropriate the same + Header ID value, it is strongly recommended that each + program place a unique signature of at least two bytes in + size (and preferably 4 bytes or bigger) at the start of + each data area. Every program should verify that its + unique signature is present, in addition to the Header ID + value being correct, before assuming that it is a block of + known type. + + In the following descriptions, note that "Short" means two bytes, + "Long" means four bytes, and "Long-Long" means eight bytes, + regardless of their native sizes. Unless specifically noted, all + integer fields should be interpreted as unsigned (non-negative) + numbers. + + + -OS/2 Extended Attributes Extra Field: + ==================================== + + The following is the layout of the OS/2 extended attributes "extra" + block. (Last Revision 19960922) + + Note: all fields stored in Intel low-byte/high-byte order. + + Local-header version: + + Value Size Description + ----- ---- ----------- + (OS/2) 0x0009 Short tag for this extra block type + TSize Short total data size for this block + BSize Long uncompressed EA data size + CType Short compression type + EACRC Long CRC value for uncompressed EA data + (var.) variable compressed EA data + + Central-header version: + + Value Size Description + ----- ---- ----------- + (OS/2) 0x0009 Short tag for this extra block type + TSize Short total data size for this block + BSize Long size of uncompressed local EA data + + The value of CType is interpreted according to the "compression + method" section above; i.e., 0 for stored, 8 for deflated, etc. + + The OS/2 extended attribute structure (FEA2LIST) is compressed and + then stored in its entirety within this structure. There will only + ever be one block of data in the variable-length field. + + + -OS/2 Access Control List Extra Field: + ==================================== + + The following is the layout of the OS/2 ACL extra block. + (Last Revision 19960922) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (ACL) 0x4c41 Short tag for this extra block type + TSize Short total data size for this block + BSize Long uncompressed ACL data size + CType Short compression type + EACRC Long CRC value for uncompressed ACL data + (var.) variable compressed ACL data + + Central-header version: + + Value Size Description + ----- ---- ----------- + (ACL) 0x4c41 Short tag for this extra block type + TSize Short total data size for this block + BSize Long size of uncompressed local ACL data + + The value of CType is interpreted according to the "compression + method" section above; i.e., 0 for stored, 8 for deflated, etc. + + The uncompressed ACL data consist of a text header of the form + "ACL1:%hX,%hd\n", where the first field is the OS/2 ACCINFO acc_attr + member and the second is acc_count, followed by acc_count strings + of the form "%s,%hx\n", where the first field is acl_ugname (user + group name) and the second acl_access. This block type will be + extended for other operating systems as needed. + + + -Windows NT Security Descriptor Extra Field: + ========================================== + + The following is the layout of the NT Security Descriptor (another + type of ACL) extra block. (Last Revision 19960922) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (SD) 0x4453 Short tag for this extra block type + TSize Short total data size for this block + BSize Long uncompressed SD data size + Version Byte version of uncompressed SD data format + CType Short compression type + EACRC Long CRC value for uncompressed SD data + (var.) variable compressed SD data + + Central-header version: + + Value Size Description + ----- ---- ----------- + (SD) 0x4453 Short tag for this extra block type + TSize Short total data size for this block + BSize Long size of uncompressed local SD data + + The value of CType is interpreted according to the "compression + method" section above; i.e., 0 for stored, 8 for deflated, etc. + Version specifies how the compressed data are to be interpreted + and allows for future expansion of this extra field type. Currently + only version 0 is defined. + + For version 0, the compressed data are to be interpreted as a single + valid Windows NT SECURITY_DESCRIPTOR data structure, in self-relative + format. + + + -PKWARE Win95/WinNT Extra Field: + ============================== + + The following description covers PKWARE's undocumented + Windows 95 & Windows NT extra field, introduced with the + release of PKZIP for Windows 2.50. (Last Revision 19980425) + + This field has a fixed data size of 32 bytes and is only stored + as local extra field. + + Value Size Description + ----- ---- ----------- + (WinNT) 0x000a Short Tag for this "extra" block type + TSize Short Total Data Size for this block + Unknwn1 Long ???? (all 0 ?) + Unknwn2 Long ???? + ModTime Long-Long 64-bit NTFS last-modified filetime + AccTime Long-Long 64-bit NTFS last-access filetime + CreTime Long-Long 64-bit NTFS creation filetime + + The NTFS filetimes are 64-bit unsigned integers, stored in Intel + (least significant byte first) byte order. They determine the + number of 1.0E-07 seconds (1/10th microseconds!) past WinNT "epoch", + which is "01-Jan-1601 00:00:00 UTC". + + + -PKWARE VAX/VMS Extra Field: + ========================== + + The following is the layout of PKWARE's VAX/VMS attributes "extra" + block. (Last Revision 12/17/91) + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- + (VMS) 0x000c Short Tag for this "extra" block type + TSize Short Total Data Size for this block + CRC Long 32-bit CRC for remainder of the block + Tag1 Short VMS attribute tag value #1 + Size1 Short Size of attribute #1, in bytes + (var.) Size1 Attribute #1 data + . + . + . + TagN Short VMS attribute tage value #N + SizeN Short Size of attribute #N, in bytes + (var.) SizeN Attribute #N data + + Rules: + + 1. There will be one or more of attributes present, which will + each be preceded by the above TagX & SizeX values. These + values are identical to the ATR$C_XXXX and ATR$S_XXXX constants + which are defined in ATR.H under VMS C. Neither of these values + will ever be zero. + + 2. No word alignment or padding is performed. + + 3. A well-behaved PKZIP/VMS program should never produce more than + one sub-block with the same TagX value. Also, there will never + be more than one "extra" block of type 0x000c in a particular + directory record. + + + -Info-ZIP VMS Extra Field: + ======================== + + The following is the layout of Info-ZIP's VMS attributes extra + block for VAX or Alpha AXP. The local-header and central-header + versions are identical. (Last Revision 19960922) + + Value Size Description + ----- ---- ----------- + (VMS2) 0x4d49 Short tag for this extra block type + TSize Short total data size for this block + ID Long block ID + Flags Short info bytes + BSize Short uncompressed block size + Reserved Long (reserved) + (var.) variable compressed VMS file-attributes block + + The block ID is one of the following unterminated strings: + + "VFAB" struct FAB + "VALL" struct XABALL + "VFHC" struct XABFHC + "VDAT" struct XABDAT + "VRDT" struct XABRDT + "VPRO" struct XABPRO + "VKEY" struct XABKEY + "VMSV" version (e.g., "V6.1"; truncated at hyphen) + "VNAM" reserved + + The lower three bits of Flags indicate the compression method. The + currently defined methods are: + + 0 stored (not compressed) + 1 simple "RLE" + 2 deflated + + The "RLE" method simply replaces zero-valued bytes with zero-valued + bits and non-zero-valued bytes with a "1" bit followed by the byte + value. + + The variable-length compressed data contains only the data corre- + sponding to the indicated structure or string. Typically multiple + VMS2 extra fields are present (each with a unique block type). + + + -Info-ZIP Macintosh Extra Field: + ============================== + + The following is the layout of the (old) Info-ZIP resource-fork extra + block for Macintosh. The local-header and central-header versions + are identical. (Last Revision 19960922) + + Value Size Description + ----- ---- ----------- + (Mac) 0x07c8 Short tag for this extra block type + TSize Short total data size for this block + "JLEE" beLong extra-field signature + FInfo 16 bytes Macintosh FInfo structure + CrDat beLong HParamBlockRec fileParam.ioFlCrDat + MdDat beLong HParamBlockRec fileParam.ioFlMdDat + Flags beLong info bits + DirID beLong HParamBlockRec fileParam.ioDirID + VolName 28 bytes volume name (optional) + + All fields but the first two are in native Macintosh format + (big-endian Motorola order, not little-endian Intel). The least + significant bit of Flags is 1 if the file is a data fork, 0 other- + wise. In addition, if this extra field is present, the filename + has an extra 'd' or 'r' appended to indicate data fork or resource + fork. The 28-byte VolName field may be omitted. + + + -ZipIt Macintosh Extra Field (long): + ================================== + + The following is the layout of the ZipIt extra block for Macintosh. + The local-header and central-header versions are identical. + (Last Revision 19970130) + + Value Size Description + ----- ---- ----------- + (Mac2) 0x2605 Short tag for this extra block type + TSize Short total data size for this block + "ZPIT" beLong extra-field signature + FnLen Byte length of FileName + FileName variable full Macintosh filename + FileType Byte[4] four-byte Mac file type string + Creator Byte[4] four-byte Mac creator string + + + -ZipIt Macintosh Extra Field (short): + =================================== + + The following is the layout of a shortened variant of the + ZipIt extra block for Macintosh (without "full name" entry). + This variant is used by ZipIt 1.3.5 and newer for entries that + do not need a "full Mac filename" record. + The local-header and central-header versions are identical. + (Last Revision 19980903) + + Value Size Description + ----- ---- ----------- + (Mac2b) 0x2705 Short tag for this extra block type + TSize Short total data size for this block + "ZPIT" beLong extra-field signature + FileType Byte[4] four-byte Mac file type string + Creator Byte[4] four-byte Mac creator string + + + -Info-ZIP Macintosh Extra Field (new): + ==================================== + + The following is the layout of the (new) Info-ZIP extra + block for Macintosh, designed by Dirk Haase. + All values are in little-endian. + (Last Revision 19981005) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (Mac3) 0x334d Short tag for this extra block type ("M3") + TSize Short total data size for this block + BSize Long uncompressed finder attribute data size + Flags Short info bits + fdType Byte[4] Type of the File (4-byte string) + fdCreator Byte[4] Creator of the File (4-byte string) + (CType) Short compression type + (CRC) Long CRC value for uncompressed MacOS data + Attribs variable finder attribute data (see below) + + + Central-header version: + + Value Size Description + ----- ---- ----------- + (Mac3) 0x334d Short tag for this extra block type ("M3") + TSize Short total data size for this block + BSize Long uncompressed finder attribute data size + Flags Short info bits + fdType Byte[4] Type of the File (4-byte string) + fdCreator Byte[4] Creator of the File (4-byte string) + + The third bit of Flags in both headers indicates whether + the LOCAL extra field is uncompressed (and therefore whether CType + and CRC are omitted): + + Bits of the Flags: + bit 0 if set, file is a data fork; otherwise unset + bit 1 if set, filename will be not changed + bit 2 if set, Attribs is uncompressed (no CType, CRC) + bit 3 if set, date and times are in 64 bit + if zero date and times are in 32 bit. + bit 4 if set, timezone offsets fields for the native + Mac times are omitted (UTC support deactivated) + bits 5-15 reserved; + + + Attributes: + + Attribs is a Mac-specific block of data in little-endian format with + the following structure (if compressed, uncompress it first): + + Value Size Description + ----- ---- ----------- + fdFlags Short Finder Flags + fdLocation.v Short Finder Icon Location + fdLocation.h Short Finder Icon Location + fdFldr Short Folder containing file + + FXInfo 16 bytes Macintosh FXInfo structure + FXInfo-Structure: + fdIconID Short + fdUnused[3] Short unused but reserved 6 bytes + fdScript Byte Script flag and number + fdXFlags Byte More flag bits + fdComment Short Comment ID + fdPutAway Long Home Dir ID + + FVersNum Byte file version number + may be not used by MacOS + ACUser Byte directory access rights + + FlCrDat ULong date and time of creation + FlMdDat ULong date and time of last modification + FlBkDat ULong date and time of last backup + These time numbers are original Mac FileTime values (local time!). + Currently, date-time width is 32-bit, but future version may + support be 64-bit times (see flags) + + CrGMTOffs Long(signed!) difference "local Creat. time - UTC" + MdGMTOffs Long(signed!) difference "local Modif. time - UTC" + BkGMTOffs Long(signed!) difference "local Backup time - UTC" + These "local time - UTC" differences (stored in seconds) may be + used to support timestamp adjustment after inter-timezone transfer. + These fields are optional; bit 4 of the flags word controls their + presence. + + Charset Short TextEncodingBase (Charset) + valid for the following two fields + + FullPath variable Path of the current file. + Zero terminated string (C-String) + Currently coded in the native Charset. + + Comment variable Finder Comment of the current file. + Zero terminated string (C-String) + Currently coded in the native Charset. + + + -Acorn SparkFS Extra Field: + ========================= + + The following is the layout of David Pilling's SparkFS extra block + for Acorn RISC OS. The local-header and central-header versions are + identical. (Last Revision 19960922) + + Value Size Description + ----- ---- ----------- + (Acorn) 0x4341 Short tag for this extra block type + TSize Short total data size for this block + "ARC0" Long extra-field signature + LoadAddr Long load address or file type + ExecAddr Long exec address + Attr Long file permissions + Zero Long reserved; always zero + + The following bits of Attr are associated with the given file + permissions: + + bit 0 user-writable ('W') + bit 1 user-readable ('R') + bit 2 reserved + bit 3 locked ('L') + bit 4 publicly writable ('w') + bit 5 publicly readable ('r') + bit 6 reserved + bit 7 reserved + + + -VM/CMS Extra Field: + ================== + + The following is the layout of the file-attributes extra block for + VM/CMS. The local-header and central-header versions are + identical. (Last Revision 19960922) + + Value Size Description + ----- ---- ----------- + (VM/CMS) 0x4704 Short tag for this extra block type + TSize Short total data size for this block + flData variable file attributes data + + flData is an uncompressed fldata_t struct. + + + -MVS Extra Field: + =============== + + The following is the layout of the file-attributes extra block for + MVS. The local-header and central-header versions are identical. + (Last Revision 19960922) + + Value Size Description + ----- ---- ----------- + (MVS) 0x470f Short tag for this extra block type + TSize Short total data size for this block + flData variable file attributes data + + flData is an uncompressed fldata_t struct. + + + -PKWARE Unix Extra Field: + ======================== + + The following is the layout of PKWARE's Unix "extra" block. + It was introduced with the release of PKZIP for Unix 2.50. + Note: all fields are stored in Intel low-byte/high-byte order. + (Last Revision 19980901) + + This field has a minimum data size of 12 bytes and is only stored + as local extra field. + + Value Size Description + ----- ---- ----------- + (Unix0) 0x000d Short Tag for this "extra" block type + TSize Short Total Data Size for this block + AcTime Long time of last access (UTC/GMT) + ModTime Long time of last modification (UTC/GMT) + UID Short Unix user ID + GID Short Unix group ID + (var) variable Variable length data field + + The variable length data field will contain file type + specific data. Currently the only values allowed are + the original "linked to" file names for hard or symbolic links. + + The fixed part of this field has the same layout as Info-ZIP's + abandoned "Unix1 timestamps & owner ID info" extra field; + only the two tag bytes are different. + + + -PATCH Descriptor Extra Field: + ============================ + + The following is the layout of the Patch Descriptor "extra" + block. + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- + (Patch) 0x000f Short Tag for this "extra" block type + TSize Short Size of the total "extra" block + Version Short Version of the descriptor + Flags Long Actions and reactions (see below) + OldSize Long Size of the file about to be patched + OldCRC Long 32-bit CRC of the file about to be patched + NewSize Long Size of the resulting file + NewCRC Long 32-bit CRC of the resulting file + + + Actions and reactions + + Bits Description + ---- ---------------- + 0 Use for autodetection + 1 Treat as selfpatch + 2-3 RESERVED + 4-5 Action (see below) + 6-7 RESERVED + 8-9 Reaction (see below) to absent file + 10-11 Reaction (see below) to newer file + 12-13 Reaction (see below) to unknown file + 14-15 RESERVED + 16-31 RESERVED + + Actions + + Action Value + ------ ----- + none 0 + add 1 + delete 2 + patch 3 + + Reactions + + Reaction Value + -------- ----- + ask 0 + skip 1 + ignore 2 + fail 3 + + + -Extended Timestamp Extra Field: + ============================== + + The following is the layout of the extended-timestamp extra block. + (Last Revision 19970118) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (time) 0x5455 Short tag for this extra block type + TSize Short total data size for this block + Flags Byte info bits + (ModTime) Long time of last modification (UTC/GMT) + (AcTime) Long time of last access (UTC/GMT) + (CrTime) Long time of original creation (UTC/GMT) + + Central-header version: + + Value Size Description + ----- ---- ----------- + (time) 0x5455 Short tag for this extra block type + TSize Short total data size for this block + Flags Byte info bits (refers to local header!) + (ModTime) Long time of last modification (UTC/GMT) + + The central-header extra field contains the modification time only, + or no timestamp at all. TSize is used to flag its presence or + absence. But note: + + If "Flags" indicates that Modtime is present in the local header + field, it MUST be present in the central header field, too! + This correspondence is required because the modification time + value may be used to support trans-timezone freshening and + updating operations with zip archives. + + The time values are in standard Unix signed-long format, indicating + the number of seconds since 1 January 1970 00:00:00. The times + are relative to Coordinated Universal Time (UTC), also sometimes + referred to as Greenwich Mean Time (GMT). To convert to local time, + the software must know the local timezone offset from UTC/GMT. + + The lower three bits of Flags in both headers indicate which time- + stamps are present in the LOCAL extra field: + + bit 0 if set, modification time is present + bit 1 if set, access time is present + bit 2 if set, creation time is present + bits 3-7 reserved for additional timestamps; not set + + Those times that are present will appear in the order indicated, but + any combination of times may be omitted. (Creation time may be + present without access time, for example.) TSize should equal + (1 + 4*(number of set bits in Flags)), as the block is currently + defined. Other timestamps may be added in the future. + + + -Info-ZIP Unix Extra Field (type 1): + ================================== + + The following is the layout of the old Info-ZIP extra block for + Unix. It has been replaced by the extended-timestamp extra block + (0x5455) and the Unix type 2 extra block (0x7855). + (Last Revision 19970118) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (Unix1) 0x5855 Short tag for this extra block type + TSize Short total data size for this block + AcTime Long time of last access (UTC/GMT) + ModTime Long time of last modification (UTC/GMT) + UID Short Unix user ID + GID Short Unix group ID + + Central-header version: + + Value Size Description + ----- ---- ----------- + (Unix1) 0x5855 Short tag for this extra block type + TSize Short total data size for this block + AcTime Long time of last access (GMT/UTC) + ModTime Long time of last modification (GMT/UTC) + + The file access and modification times are in standard Unix signed- + long format, indicating the number of seconds since 1 January 1970 + 00:00:00. The times are relative to Coordinated Universal Time + (UTC), also sometimes referred to as Greenwich Mean Time (GMT). To + convert to local time, the software must know the local timezone + offset from UTC/GMT. The modification time may be used by non-Unix + systems to support inter-timezone freshening and updating of zip + archives. + + The local-header extra block may optionally contain UID and GID + info for the file. The local-header TSize value is the only + indication of this. Note that Unix UIDs and GIDs are usually + specific to a particular machine, and they generally require root + access to restore. + + This extra field type is obsolete, but it has been in use since + mid-1994. Therefore future archiving software should continue to + support it. Some guidelines: + + An archive member should either contain the old "Unix1" + extra field block or the new extra field types "time" and/or + "Unix2". + + If both the old "Unix1" block type and one or both of the new + block types "time" and "Unix2" are found, the "Unix1" block + should be considered invalid and ignored. + + Unarchiving software should recognize both old and new extra + field block types, but the info from new types overrides the + old "Unix1" field. + + Archiving software should recognize "Unix1" extra fields for + timestamp comparison but never create it for updated, freshened + or new archive members. When copying existing members to a new + archive, any "Unix1" extra field blocks should be converted to + the new "time" and/or "Unix2" types. + + + -Info-ZIP Unix Extra Field (type 2): + ================================== + + The following is the layout of the new Info-ZIP extra block for + Unix. (Last Revision 19960922) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (Unix2) 0x7855 Short tag for this extra block type + TSize Short total data size for this block + UID Short Unix user ID + GID Short Unix group ID + + Central-header version: + + Value Size Description + ----- ---- ----------- + (Unix2) 0x7855 Short tag for this extra block type + TSize Short total data size for this block + + The data size of the central-header version is zero; it is used + solely as a flag that UID/GID info is present in the local-header + extra field. If additional fields are ever added to the local + version, the central version may be extended to indicate this. + + Note that Unix UIDs and GIDs are usually specific to a particular + machine, and they generally require root access to restore. + + + -ASi Unix Extra Field: + ==================== + + The following is the layout of the ASi extra block for Unix. The + local-header and central-header versions are identical. + (Last Revision 19960916) + + Value Size Description + ----- ---- ----------- + (Unix3) 0x756e Short tag for this extra block type + TSize Short total data size for this block + CRC Long CRC-32 of the remaining data + Mode Short file permissions + SizDev Long symlink'd size OR major/minor dev num + UID Short user ID + GID Short group ID + (var.) variable symbolic link filename + + Mode is the standard Unix st_mode field from struct stat, containing + user/group/other permissions, setuid/setgid and symlink info, etc. + + If Mode indicates that this file is a symbolic link, SizDev is the + size of the file to which the link points. Otherwise, if the file + is a device, SizDev contains the standard Unix st_rdev field from + struct stat (includes the major and minor numbers of the device). + SizDev is undefined in other cases. + + If Mode indicates that the file is a symbolic link, the final field + will be the name of the file to which the link points. The file- + name length can be inferred from TSize. + + [Note that TSize may incorrectly refer to the data size not counting + the CRC; i.e., it may be four bytes too small.] + + + -BeOS Extra Field: + ================ + + The following is the layout of the file-attributes extra block for + BeOS. (Last Revision 19970531) + + Local-header version: + + Value Size Description + ----- ---- ----------- + (BeOS) 0x6542 Short tag for this extra block type + TSize Short total data size for this block + BSize Long uncompressed file attribute data size + Flags Byte info bits + (CType) Short compression type + (CRC) Long CRC value for uncompressed file attribs + Attribs variable file attribute data + + Central-header version: + + Value Size Description + ----- ---- ----------- + (BeOS) 0x6542 Short tag for this extra block type + TSize Short total data size for this block + BSize Long size of uncompressed local EF block data + Flags Byte info bits + + The least significant bit of Flags in both headers indicates whether + the LOCAL extra field is uncompressed (and therefore whether CType + and CRC are omitted): + + bit 0 if set, Attribs is uncompressed (no CType, CRC) + bits 1-7 reserved; if set, assume error or unknown data + + Currently the only supported compression types are deflated (type 8) + and stored (type 0); the latter is not used by Info-ZIP's Zip but is + supported by UnZip. + + Attribs is a BeOS-specific block of data in big-endian format with + the following structure (if compressed, uncompress it first): + + Value Size Description + ----- ---- ----------- + Name variable attribute name (null-terminated string) + Type Long attribute type (32-bit unsigned integer) + Size Long Long data size for this sub-block (64 bits) + Data variable attribute data + + The attribute structure is repeated for every attribute. The Data + field may contain anything--text, flags, bitmaps, etc. + + + -SMS/QDOS Extra Field: + ==================== + + The following is the layout of the file-attributes extra block for + SMS/QDOS. The local-header and central-header versions are identical. + (Last Revision 19960929) + + Value Size Description + ----- ---- ----------- + (QDOS) 0xfb4a Short tag for this extra block type + TSize Short total data size for this block + LongID Long extra-field signature + (ExtraID) Long additional signature/flag bytes + QDirect 64 bytes qdirect structure + + LongID may be "QZHD" or "QDOS". In the latter case, ExtraID will + be present. Its first three bytes are "02\0"; the last byte is + currently undefined. + + QDirect contains the file's uncompressed directory info (qdirect + struct). Its elements are in native (big-endian) format: + + d_length beLong file length + d_access byte file access type + d_type byte file type + d_datalen beLong data length + d_reserved beLong unused + d_szname beShort size of filename + d_name 36 bytes filename + d_update beLong time of last update + d_refdate beLong file version number + d_backup beLong time of last backup (archive date) + + + -AOS/VS Extra Field: + ================== + + The following is the layout of the extra block for Data General + AOS/VS. The local-header and central-header versions are identical. + (Last Revision 19961125) + + Value Size Description + ----- ---- ----------- + (AOSVS) 0x5356 Short tag for this extra block type + TSize Short total data size for this block + "FCI\0" Long extra-field signature + Version Byte version of AOS/VS extra block (10 = 1.0) + Fstat variable fstat packet + AclBuf variable raw ACL data ($MXACL bytes) + + Fstat contains the file's uncompressed fstat packet, which is one of + the following: + + normal fstat packet (P_FSTAT struct) + DIR/CPD fstat packet (P_FSTAT_DIR struct) + unit (device) fstat packet (P_FSTAT_UNIT struct) + IPC file fstat packet (P_FSTAT_IPC struct) + + AclBuf contains the raw ACL data; its length is $MXACL. + + + -FWKCS MD5 Extra Field: + ===================== + + The following is the layout of the optional extra block used by the + FWKCS utility. There is no local-header version; the following + applies only to the central header. (Last Revision 19961207) + + Central-header version: + + Value Size Description + ----- ---- ----------- + (MD5) 0x4b46 Short tag for this extra block type + TSize Short total data size for this block (19) + "MD5" 3 bytes extra-field signature + MD5hash 16 bytes 128-bit MD5 hash of uncompressed data + + The MD5 hash in this extra block is used to automatically identify + files independent of their filenames; it is an an enhanced contents- + signature. + + FWKCS provides an option to strip this extra field, if + present, from a zipfile central directory. In adding + this extra field, FWKCS preserves Zipfile Authenticity + Verification; if stripping this extra field, FWKCS + preserves all versions of AV through PKZIP version 2.04g. + + ``The MD5 algorithm is being placed in the public domain for review + and possible adoption as a standard.'' (Ron Rivest, MIT Laboratory + for Computer Science and RSA Data Security, Inc., April 1992, RFC + 1321, 11.76-77). FWKCS, and FWKCS Contents_Signature System, are + trademarks of Frederick W. Kantor. + + + + file comment: (Variable) + + The comment for this file. + + number of this disk: (2 bytes) + + The number of this disk, which contains central + directory end record. + + number of the disk with the start of the central directory: (2 bytes) + + The number of the disk on which the central + directory starts. + + total number of entries in the central dir on this disk: (2 bytes) + + The number of central directory entries on this disk. + + total number of entries in the central dir: (2 bytes) + + The total number of files in the zipfile. + + + size of the central directory: (4 bytes) + + The size (in bytes) of the entire central directory. + + offset of start of central directory with respect to + the starting disk number: (4 bytes) + + Offset of the start of the central directory on the + disk on which the central directory starts. + + zipfile comment length: (2 bytes) + + The length of the comment for this zipfile. + + zipfile comment: (Variable) + + The comment for this zipfile. + + + D. General notes: + + 1) All fields unless otherwise noted are unsigned and stored + in Intel low-byte:high-byte, low-word:high-word order. + + 2) String fields are not null terminated, since the + length is given explicitly. + + 3) Local headers should not span disk boundaries. Also, even + though the central directory can span disk boundaries, no + single record in the central directory should be split + across disks. + + 4) The entries in the central directory may not necessarily + be in the same order that files appear in the zipfile. + +UnShrinking - Method 1 +---------------------- + +Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm +with partial clearing. The initial code size is 9 bits, and +the maximum code size is 13 bits. Shrinking differs from +conventional Dynamic Ziv-Lempel-Welch implementations in several +respects: + +1) The code size is controlled by the compressor, and is not + automatically increased when codes larger than the current + code size are created (but not necessarily used). When + the decompressor encounters the code sequence 256 + (decimal) followed by 1, it should increase the code size + read from the input stream to the next bit size. No + blocking of the codes is performed, so the next code at + the increased size should be read from the input stream + immediately after where the previous code at the smaller + bit size was read. Again, the decompressor should not + increase the code size used until the sequence 256,1 is + encountered. + +2) When the table becomes full, total clearing is not + performed. Rather, when the compressor emits the code + sequence 256,2 (decimal), the decompressor should clear + all leaf nodes from the Ziv-Lempel tree, and continue to + use the current code size. The nodes that are cleared + from the Ziv-Lempel tree are then re-used, with the lowest + code value re-used first, and the highest code value + re-used last. The compressor can emit the sequence 256,2 + at any time. + + + +Expanding - Methods 2-5 +----------------------- + +The Reducing algorithm is actually a combination of two +distinct algorithms. The first algorithm compresses repeated +byte sequences, and the second algorithm takes the compressed +stream from the first algorithm and applies a probabilistic +compression method. + +The probabilistic compression stores an array of 'follower +sets' S(j), for j=0 to 255, corresponding to each possible +ASCII character. Each set contains between 0 and 32 +characters, to be denoted as S(j)[0],...,S(j)[m], where m<32. +The sets are stored at the beginning of the data area for a +Reduced file, in reverse order, with S(255) first, and S(0) +last. + +The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, +where N(j) is the size of set S(j). N(j) can be 0, in which +case the follower set for S(j) is empty. Each N(j) value is +encoded in 6 bits, followed by N(j) eight bit character values +corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If +N(j) is 0, then no values for S(j) are stored, and the value +for N(j-1) immediately follows. + +Immediately after the follower sets, is the compressed data +stream. The compressed data stream can be interpreted for the +probabilistic decompression as follows: + + +let Last-Character <- 0. +loop until done + if the follower set S(Last-Character) is empty then + read 8 bits from the input stream, and copy this + value to the output stream. + otherwise if the follower set S(Last-Character) is non-empty then + read 1 bit from the input stream. + if this bit is not zero then + read 8 bits from the input stream, and copy this + value to the output stream. + otherwise if this bit is zero then + read B(N(Last-Character)) bits from the input + stream, and assign this value to I. + Copy the value of S(Last-Character)[I] to the + output stream. + + assign the last value placed on the output stream to + Last-Character. +end loop + + +B(N(j)) is defined as the minimal number of bits required to +encode the value N(j)-1. + + +The decompressed stream from above can then be expanded to +re-create the original file as follows: + + +let State <- 0. + +loop until done + read 8 bits from the input stream into C. + case State of + 0: if C is not equal to DLE (144 decimal) then + copy C to the output stream. + otherwise if C is equal to DLE then + let State <- 1. + + 1: if C is non-zero then + let V <- C. + let Len <- L(V) + let State <- F(Len). + otherwise if C is zero then + copy the value 144 (decimal) to the output stream. + let State <- 0 + + 2: let Len <- Len + C + let State <- 3. + + 3: move backwards D(V,C) bytes in the output stream + (if this position is before the start of the output + stream, then assume that all the data before the + start of the output stream is filled with zeros). + copy Len+3 bytes from this position to the output stream. + let State <- 0. + end case +end loop + + +The functions F,L, and D are dependent on the 'compression +factor', 1 through 4, and are defined as follows: + +For compression factor 1: + L(X) equals the lower 7 bits of X. + F(X) equals 2 if X equals 127 otherwise F(X) equals 3. + D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1. +For compression factor 2: + L(X) equals the lower 6 bits of X. + F(X) equals 2 if X equals 63 otherwise F(X) equals 3. + D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1. +For compression factor 3: + L(X) equals the lower 5 bits of X. + F(X) equals 2 if X equals 31 otherwise F(X) equals 3. + D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1. +For compression factor 4: + L(X) equals the lower 4 bits of X. + F(X) equals 2 if X equals 15 otherwise F(X) equals 3. + D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1. + + +Imploding - Method 6 +-------------------- + +The Imploding algorithm is actually a combination of two distinct +algorithms. The first algorithm compresses repeated byte +sequences using a sliding dictionary. The second algorithm is +used to compress the encoding of the sliding dictionary output, +using multiple Shannon-Fano trees. + +The Imploding algorithm can use a 4K or 8K sliding dictionary +size. The dictionary size used can be determined by bit 1 in the +general purpose flag word; a 0 bit indicates a 4K dictionary +while a 1 bit indicates an 8K dictionary. + +The Shannon-Fano trees are stored at the start of the compressed +file. The number of trees stored is defined by bit 2 in the +general purpose flag word; a 0 bit indicates two trees stored, a +1 bit indicates three trees are stored. If 3 trees are stored, +the first Shannon-Fano tree represents the encoding of the +Literal characters, the second tree represents the encoding of +the Length information, the third represents the encoding of the +Distance information. When 2 Shannon-Fano trees are stored, the +Length tree is stored first, followed by the Distance tree. + +The Literal Shannon-Fano tree, if present is used to represent +the entire ASCII character set, and contains 256 values. This +tree is used to compress any data not compressed by the sliding +dictionary algorithm. When this tree is present, the Minimum +Match Length for the sliding dictionary is 3. If this tree is +not present, the Minimum Match Length is 2. + +The Length Shannon-Fano tree is used to compress the Length part +of the (length,distance) pairs from the sliding dictionary +output. The Length tree contains 64 values, ranging from the +Minimum Match Length, to 63 plus the Minimum Match Length. + +The Distance Shannon-Fano tree is used to compress the Distance +part of the (length,distance) pairs from the sliding dictionary +output. The Distance tree contains 64 values, ranging from 0 to +63, representing the upper 6 bits of the distance value. The +distance values themselves will be between 0 and the sliding +dictionary size, either 4K or 8K. + +The Shannon-Fano trees themselves are stored in a compressed +format. The first byte of the tree data represents the number of +bytes of data representing the (compressed) Shannon-Fano tree +minus 1. The remaining bytes represent the Shannon-Fano tree +data encoded as: + + High 4 bits: Number of values at this bit length + 1. (1 - 16) + Low 4 bits: Bit Length needed to represent value + 1. (1 - 16) + +The Shannon-Fano codes can be constructed from the bit lengths +using the following algorithm: + +1) Sort the Bit Lengths in ascending order, while retaining the + order of the original lengths stored in the file. + +2) Generate the Shannon-Fano trees: + + Code <- 0 + CodeIncrement <- 0 + LastBitLength <- 0 + i <- number of Shannon-Fano codes - 1 (either 255 or 63) + + loop while i >= 0 + Code = Code + CodeIncrement + if BitLength(i) <> LastBitLength then + LastBitLength=BitLength(i) + CodeIncrement = 1 shifted left (16 - LastBitLength) + ShannonCode(i) = Code + i <- i - 1 + end loop + + +3) Reverse the order of all the bits in the above ShannonCode() + vector, so that the most significant bit becomes the least + significant bit. For example, the value 0x1234 (hex) would + become 0x2C48 (hex). + +4) Restore the order of Shannon-Fano codes as originally stored + within the file. + +Example: + + This example will show the encoding of a Shannon-Fano tree + of size 8. Notice that the actual Shannon-Fano trees used + for Imploding are either 64 or 256 entries in size. + +Example: 0x02, 0x42, 0x01, 0x13 + + The first byte indicates 3 values in this table. Decoding the + bytes: + 0x42 = 5 codes of 3 bits long + 0x01 = 1 code of 2 bits long + 0x13 = 2 codes of 4 bits long + + This would generate the original bit length array of: + (3, 3, 3, 3, 3, 2, 4, 4) + + There are 8 codes in this table for the values 0 thru 7. Using the + algorithm to obtain the Shannon-Fano codes produces: + + Reversed Order Original +Val Sorted Constructed Code Value Restored Length +--- ------ ----------------- -------- -------- ------ +0: 2 1100000000000000 11 101 3 +1: 3 1010000000000000 101 001 3 +2: 3 1000000000000000 001 110 3 +3: 3 0110000000000000 110 010 3 +4: 3 0100000000000000 010 100 3 +5: 3 0010000000000000 100 11 2 +6: 4 0001000000000000 1000 1000 4 +7: 4 0000000000000000 0000 0000 4 + + +The values in the Val, Order Restored and Original Length columns +now represent the Shannon-Fano encoding tree that can be used for +decoding the Shannon-Fano encoded data. How to parse the +variable length Shannon-Fano values from the data stream is beyond the +scope of this document. (See the references listed at the end of +this document for more information.) However, traditional decoding +schemes used for Huffman variable length decoding, such as the +Greenlaw algorithm, can be successfully applied. + +The compressed data stream begins immediately after the +compressed Shannon-Fano data. The compressed data stream can be +interpreted as follows: + +loop until done + read 1 bit from input stream. + + if this bit is non-zero then (encoded data is literal data) + if Literal Shannon-Fano tree is present + read and decode character using Literal Shannon-Fano tree. + otherwise + read 8 bits from input stream. + copy character to the output stream. + otherwise (encoded data is sliding dictionary match) + if 8K dictionary size + read 7 bits for offset Distance (lower 7 bits of offset). + otherwise + read 6 bits for offset Distance (lower 6 bits of offset). + + using the Distance Shannon-Fano tree, read and decode the + upper 6 bits of the Distance value. + + using the Length Shannon-Fano tree, read and decode + the Length value. + + Length <- Length + Minimum Match Length + + if Length = 63 + Minimum Match Length + read 8 bits from the input stream, + add this value to Length. + + move backwards Distance+1 bytes in the output stream, and + copy Length characters from this position to the output + stream. (if this position is before the start of the output + stream, then assume that all the data before the start of + the output stream is filled with zeros). +end loop + +Tokenizing - Method 7 +-------------------- + +This method is not used by PKZIP. + +Deflating - Method 8 +----------------- + +The Deflate algorithm is similar to the Implode algorithm using +a sliding dictionary of up to 32K with secondary compression +from Huffman/Shannon-Fano codes. + +The compressed data is stored in blocks with a header describing +the block and the Huffman codes used in the data block. The header +format is as follows: + + Bit 0: Last Block bit This bit is set to 1 if this is the last + compressed block in the data. + Bits 1-2: Block type + 00 (0) - Block is stored - All stored data is byte aligned. + Skip bits until next byte, then next word = block length, + followed by the ones compliment of the block length word. + Remaining data in block is the stored data. + + 01 (1) - Use fixed Huffman codes for literal and distance codes. + Lit Code Bits Dist Code Bits + --------- ---- --------- ---- + 0 - 143 8 0 - 31 5 + 144 - 255 9 + 256 - 279 7 + 280 - 287 8 + + Literal codes 286-287 and distance codes 30-31 are never + used but participate in the huffman construction. + + 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes) + + 11 (3) - Reserved - Flag a "Error in compressed data" if seen. + +Expanding Huffman Codes +----------------------- +If the data block is stored with dynamic Huffman codes, the Huffman +codes are sent in the following compressed format: + + 5 Bits: # of Literal codes sent - 257 (257 - 286) + All other codes are never sent. + 5 Bits: # of Dist codes - 1 (1 - 32) + 4 Bits: # of Bit Length codes - 4 (4 - 19) + +The Huffman codes are sent as bit lengths and the codes are built as +described in the implode algorithm. The bit lengths themselves are +compressed with Huffman codes. There are 19 bit length codes: + + 0 - 15: Represent bit lengths of 0 - 15 + 16: Copy the previous bit length 3 - 6 times. + The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6) + Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will + expand to 12 bit lengths of 8 (1 + 6 + 5) + 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length) + 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length) + +The lengths of the bit length codes are sent packed 3 bits per value +(0 - 7) in the following order: + + 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 + +The Huffman codes should be built as described in the Implode algorithm +except codes are assigned starting at the shortest bit length, i.e. the +shortest code should be all 0's rather than all 1's. Also, codes with +a bit length of zero do not participate in the tree construction. The +codes are then used to decode the bit lengths for the literal and distance +tables. + +The bit lengths for the literal tables are sent first with the number +of entries sent described by the 5 bits sent earlier. There are up +to 286 literal characters; the first 256 represent the respective 8 +bit character, code 256 represents the End-Of-Block code, the remaining +29 codes represent copy lengths of 3 thru 258. There are up to 30 +distance codes representing distances from 1 thru 32k as described +below. + + Length Codes + ------------ + Extra Extra Extra Extra + Code Bits Length Code Bits Lengths Code Bits Lengths Code Bits Length(s) + ---- ---- ------ ---- ---- ------- ---- ---- ------- ---- ---- --------- + 257 0 3 265 1 11,12 273 3 35-42 281 5 131-162 + 258 0 4 266 1 13,14 274 3 43-50 282 5 163-194 + 259 0 5 267 1 15,16 275 3 51-58 283 5 195-226 + 260 0 6 268 1 17,18 276 3 59-66 284 5 227-257 + 261 0 7 269 2 19-22 277 4 67-82 285 0 258 + 262 0 8 270 2 23-26 278 4 83-98 + 263 0 9 271 2 27-30 279 4 99-114 + 264 0 10 272 2 31-34 280 4 115-130 + + Distance Codes + -------------- + Extra Extra Extra Extra + Code Bits Dist Code Bits Dist Code Bits Distance Code Bits Distance + ---- ---- ---- ---- ---- ------ ---- ---- -------- ---- ---- -------- + 0 0 1 8 3 17-24 16 7 257-384 24 11 4097-6144 + 1 0 2 9 3 25-32 17 7 385-512 25 11 6145-8192 + 2 0 3 10 4 33-48 18 8 513-768 26 12 8193-12288 + 3 0 4 11 4 49-64 19 8 769-1024 27 12 12289-16384 + 4 1 5,6 12 5 65-96 20 9 1025-1536 28 13 16385-24576 + 5 1 7,8 13 5 97-128 21 9 1537-2048 29 13 24577-32768 + 6 2 9-12 14 6 129-192 22 10 2049-3072 + 7 2 13-16 15 6 193-256 23 10 3073-4096 + +The compressed data stream begins immediately after the +compressed header data. The compressed data stream can be +interpreted as follows: + +do + read header from input stream. + + if stored block + skip bits until byte aligned + read count and 1's compliment of count + copy count bytes data block + otherwise + loop until end of block code sent + decode literal character from input stream + if literal < 256 + copy character to the output stream + otherwise + if literal = end of block + break from loop + otherwise + decode distance from input stream + + move backwards distance bytes in the output stream, and + copy length characters from this position to the output + stream. + end loop +while not last block + +if data descriptor exists + skip bits until byte aligned + check data descriptor signature + read crc and sizes +endif + +Decryption +---------- + +The encryption used in PKZIP was generously supplied by Roger +Schlafly. PKWARE is grateful to Mr. Schlafly for his expert +help and advice in the field of data encryption. + +PKZIP encrypts the compressed data stream. Encrypted files must +be decrypted before they can be extracted. + +Each encrypted file has an extra 12 bytes stored at the start of +the data area defining the encryption header for that file. The +encryption header is originally set to random values, and then +itself encrypted, using three, 32-bit keys. The key values are +initialized using the supplied encryption password. After each byte +is encrypted, the keys are then updated using pseudo-random number +generation techniques in combination with the same CRC-32 algorithm +used in PKZIP and described elsewhere in this document. + +The following is the basic steps required to decrypt a file: + +1) Initialize the three 32-bit keys with the password. +2) Read and decrypt the 12-byte encryption header, further + initializing the encryption keys. +3) Read and decrypt the compressed data stream using the + encryption keys. + + +Step 1 - Initializing the encryption keys +----------------------------------------- + +Key(0) <- 305419896 +Key(1) <- 591751049 +Key(2) <- 878082192 + +loop for i <- 0 to length(password)-1 + update_keys(password(i)) +end loop + + +Where update_keys() is defined as: + + +update_keys(char): + Key(0) <- crc32(key(0),char) + Key(1) <- Key(1) + (Key(0) & 000000ffH) + Key(1) <- Key(1) * 134775813 + 1 + Key(2) <- crc32(key(2),key(1) >> 24) +end update_keys + + +Where crc32(old_crc,char) is a routine that given a CRC value and a +character, returns an updated CRC value after applying the CRC-32 +algorithm described elsewhere in this document. + + +Step 2 - Decrypting the encryption header +----------------------------------------- + +The purpose of this step is to further initialize the encryption +keys, based on random data, to render a plaintext attack on the +data ineffective. + + +Read the 12-byte encryption header into Buffer, in locations +Buffer(0) thru Buffer(11). + +loop for i <- 0 to 11 + C <- buffer(i) ^ decrypt_byte() + update_keys(C) + buffer(i) <- C +end loop + + +Where decrypt_byte() is defined as: + + +unsigned char decrypt_byte() + local unsigned short temp + temp <- Key(2) | 2 + decrypt_byte <- (temp * (temp ^ 1)) >> 8 +end decrypt_byte + + +After the header is decrypted, the last 1 or 2 bytes in Buffer +should be the high-order word/byte of the CRC for the file being +decrypted, stored in Intel low-byte/high-byte order, or the high-order +byte of the file time if bit 3 of the general purpose bit flag is set. +Versions of PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is +used on versions after 2.0. This can be used to test if the password +supplied is correct or not. + + +Step 3 - Decrypting the compressed data stream +---------------------------------------------- + +The compressed data stream can be decrypted as follows: + + +loop until done + read a character into C + Temp <- C ^ decrypt_byte() + update_keys(temp) + output Temp +end loop + + +In addition to the above mentioned contributors to PKZIP and PKUNZIP, +I would like to extend special thanks to Robert Mahoney for suggesting +the extension .ZIP for this software. + + +References: + + Fiala, Edward R., and Greene, Daniel H., "Data compression with + finite windows", Communications of the ACM, Volume 32, Number 4, + April 1989, pages 490-505. + + Held, Gilbert, "Data Compression, Techniques and Applications, + Hardware and Software Considerations", + John Wiley & Sons, 1987. + + Huffman, D.A., "A method for the construction of minimum-redundancy + codes", Proceedings of the IRE, Volume 40, Number 9, September 1952, + pages 1098-1101. + + Nelson, Mark, "LZW Data Compression", Dr. Dobbs Journal, Volume 14, + Number 10, October 1989, pages 29-37. + + Nelson, Mark, "The Data Compression Book", M&T Books, 1991. + + Storer, James A., "Data Compression, Methods and Theory", + Computer Science Press, 1988 + + Welch, Terry, "A Technique for High-Performance Data Compression", + IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19. + + Ziv, J. and Lempel, A., "A universal algorithm for sequential data + compression", Communications of the ACM, Volume 30, Number 6, + June 1987, pages 520-540. + + Ziv, J. and Lempel, A., "Compression of individual sequences via + variable-rate coding", IEEE Transactions on Information Theory, + Volume 24, Number 5, September 1978, pages 530-536. -- cgit v1.2.3