csp2: CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/xz/xz-file-format.txt annotate

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/xz/xz-file-format.txt @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 16:23:26 -0400
parents
children

rev	line source
jpayne@68	1
jpayne@68	2 The .xz File Format
jpayne@68	3 ===================
jpayne@68	4
jpayne@68	5 Version 1.2.1 (2024-04-08)
jpayne@68	6
jpayne@68	7
jpayne@68	8 0. Preface
jpayne@68	9 0.1. Notices and Acknowledgements
jpayne@68	10 0.2. Getting the Latest Version
jpayne@68	11 0.3. Version History
jpayne@68	12 1. Conventions
jpayne@68	13 1.1. Byte and Its Representation
jpayne@68	14 1.2. Multibyte Integers
jpayne@68	15 2. Overall Structure of .xz File
jpayne@68	16 2.1. Stream
jpayne@68	17 2.1.1. Stream Header
jpayne@68	18 2.1.1.1. Header Magic Bytes
jpayne@68	19 2.1.1.2. Stream Flags
jpayne@68	20 2.1.1.3. CRC32
jpayne@68	21 2.1.2. Stream Footer
jpayne@68	22 2.1.2.1. CRC32
jpayne@68	23 2.1.2.2. Backward Size
jpayne@68	24 2.1.2.3. Stream Flags
jpayne@68	25 2.1.2.4. Footer Magic Bytes
jpayne@68	26 2.2. Stream Padding
jpayne@68	27 3. Block
jpayne@68	28 3.1. Block Header
jpayne@68	29 3.1.1. Block Header Size
jpayne@68	30 3.1.2. Block Flags
jpayne@68	31 3.1.3. Compressed Size
jpayne@68	32 3.1.4. Uncompressed Size
jpayne@68	33 3.1.5. List of Filter Flags
jpayne@68	34 3.1.6. Header Padding
jpayne@68	35 3.1.7. CRC32
jpayne@68	36 3.2. Compressed Data
jpayne@68	37 3.3. Block Padding
jpayne@68	38 3.4. Check
jpayne@68	39 4. Index
jpayne@68	40 4.1. Index Indicator
jpayne@68	41 4.2. Number of Records
jpayne@68	42 4.3. List of Records
jpayne@68	43 4.3.1. Unpadded Size
jpayne@68	44 4.3.2. Uncompressed Size
jpayne@68	45 4.4. Index Padding
jpayne@68	46 4.5. CRC32
jpayne@68	47 5. Filter Chains
jpayne@68	48 5.1. Alignment
jpayne@68	49 5.2. Security
jpayne@68	50 5.3. Filters
jpayne@68	51 5.3.1. LZMA2
jpayne@68	52 5.3.2. Branch/Call/Jump Filters for Executables
jpayne@68	53 5.3.3. Delta
jpayne@68	54 5.3.3.1. Format of the Encoded Output
jpayne@68	55 5.4. Custom Filter IDs
jpayne@68	56 5.4.1. Reserved Custom Filter ID Ranges
jpayne@68	57 6. Cyclic Redundancy Checks
jpayne@68	58 7. References
jpayne@68	59
jpayne@68	60
jpayne@68	61 0. Preface
jpayne@68	62
jpayne@68	63 This document describes the .xz file format (filename suffix
jpayne@68	64 ".xz", MIME type "application/x-xz"). It is intended that this
jpayne@68	65 this format replace the old .lzma format used by LZMA SDK and
jpayne@68	66 LZMA Utils.
jpayne@68	67
jpayne@68	68
jpayne@68	69 0.1. Notices and Acknowledgements
jpayne@68	70
jpayne@68	71 This file format was designed by Lasse Collin
jpayne@68	72 <lasse.collin@tukaani.org> and Igor Pavlov.
jpayne@68	73
jpayne@68	74 Special thanks for helping with this document goes to
jpayne@68	75 Ville Koskinen. Thanks for helping with this document goes to
jpayne@68	76 Mark Adler, H. Peter Anvin, Mikko Pouru, and Lars Wirzenius.
jpayne@68	77
jpayne@68	78 This document has been put into the public domain.
jpayne@68	79
jpayne@68	80
jpayne@68	81 0.2. Getting the Latest Version
jpayne@68	82
jpayne@68	83 The latest official version of this document can be downloaded
jpayne@68	84 from <https://tukaani.org/xz/xz-file-format.txt>.
jpayne@68	85
jpayne@68	86 Specific versions of this document have a filename
jpayne@68	87 xz-file-format-X.Y.Z.txt where X.Y.Z is the version number.
jpayne@68	88 For example, the version 1.0.0 of this document is available
jpayne@68	89 at <https://tukaani.org/xz/xz-file-format-1.0.0.txt>.
jpayne@68	90
jpayne@68	91
jpayne@68	92 0.3. Version History
jpayne@68	93
jpayne@68	94 Version Date Description
jpayne@68	95
jpayne@68	96 1.2.1 2024-04-08 The URLs of this specification and
jpayne@68	97 XZ Utils were changed back to the
jpayne@68	98 original ones in Sections 0.2 and 7.
jpayne@68	99
jpayne@68	100 1.2.0 2024-01-19 Added RISC-V filter and updated URLs in
jpayne@68	101 Sections 0.2 and 7. The URL of this
jpayne@68	102 specification was changed.
jpayne@68	103
jpayne@68	104 1.1.0 2022-12-11 Added ARM64 filter and clarified 32-bit
jpayne@68	105 ARM endianness in Section 5.3.2,
jpayne@68	106 language improvements in Section 5.4
jpayne@68	107
jpayne@68	108 1.0.4 2009-08-27 Language improvements in Sections 1.2,
jpayne@68	109 2.1.1.2, 3.1.1, 3.1.2, and 5.3.1
jpayne@68	110
jpayne@68	111 1.0.3 2009-06-05 Spelling fixes in Sections 5.1 and 5.4
jpayne@68	112
jpayne@68	113 1.0.2 2009-06-04 Typo fixes in Sections 4 and 5.3.1
jpayne@68	114
jpayne@68	115 1.0.1 2009-06-01 Typo fix in Section 0.3 and minor
jpayne@68	116 clarifications to Sections 2, 2.2,
jpayne@68	117 3.3, 4.4, and 5.3.2
jpayne@68	118
jpayne@68	119 1.0.0 2009-01-14 The first official version
jpayne@68	120
jpayne@68	121
jpayne@68	122 1. Conventions
jpayne@68	123
jpayne@68	124 The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD",
jpayne@68	125 "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
jpayne@68	126 document are to be interpreted as described in [RFC-2119].
jpayne@68	127
jpayne@68	128 Indicating a warning means displaying a message, returning
jpayne@68	129 appropriate exit status, or doing something else to let the
jpayne@68	130 user know that something worth warning occurred. The operation
jpayne@68	131 SHOULD still finish if a warning is indicated.
jpayne@68	132
jpayne@68	133 Indicating an error means displaying a message, returning
jpayne@68	134 appropriate exit status, or doing something else to let the
jpayne@68	135 user know that something prevented successfully finishing the
jpayne@68	136 operation. The operation MUST be aborted once an error has
jpayne@68	137 been indicated.
jpayne@68	138
jpayne@68	139
jpayne@68	140 1.1. Byte and Its Representation
jpayne@68	141
jpayne@68	142 In this document, byte is always 8 bits.
jpayne@68	143
jpayne@68	144 A "null byte" has all bits unset. That is, the value of a null
jpayne@68	145 byte is 0x00.
jpayne@68	146
jpayne@68	147 To represent byte blocks, this document uses notation that
jpayne@68	148 is similar to the notation used in [RFC-1952]:
jpayne@68	149
jpayne@68	150 +-------+
jpayne@68	151 \| Foo \| One byte.
jpayne@68	152 +-------+
jpayne@68	153
jpayne@68	154 +---+---+
jpayne@68	155 \| Foo \| Two bytes; that is, some of the vertical bars
jpayne@68	156 +---+---+ can be missing.
jpayne@68	157
jpayne@68	158 +=======+
jpayne@68	159 \| Foo \| Zero or more bytes.
jpayne@68	160 +=======+
jpayne@68	161
jpayne@68	162 In this document, a boxed byte or a byte sequence declared
jpayne@68	163 using this notation is called "a field". The example field
jpayne@68	164 above would be called "the Foo field" or plain "Foo".
jpayne@68	165
jpayne@68	166 If there are many fields, they may be split to multiple lines.
jpayne@68	167 This is indicated with an arrow ("--->"):
jpayne@68	168
jpayne@68	169 +=====+
jpayne@68	170 \| Foo \|
jpayne@68	171 +=====+
jpayne@68	172
jpayne@68	173 +=====+
jpayne@68	174 ---> \| Bar \|
jpayne@68	175 +=====+
jpayne@68	176
jpayne@68	177 The above is equivalent to this:
jpayne@68	178
jpayne@68	179 +=====+=====+
jpayne@68	180 \| Foo \| Bar \|
jpayne@68	181 +=====+=====+
jpayne@68	182
jpayne@68	183
jpayne@68	184 1.2. Multibyte Integers
jpayne@68	185
jpayne@68	186 Multibyte integers of static length, such as CRC values,
jpayne@68	187 are stored in little endian byte order (least significant
jpayne@68	188 byte first).
jpayne@68	189
jpayne@68	190 When smaller values are more likely than bigger values (for
jpayne@68	191 example file sizes), multibyte integers are encoded in a
jpayne@68	192 variable-length representation:
jpayne@68	193 - Numbers in the range [0, 127] are copied as is, and take
jpayne@68	194 one byte of space.
jpayne@68	195 - Bigger numbers will occupy two or more bytes. All but the
jpayne@68	196 last byte of the multibyte representation have the highest
jpayne@68	197 (eighth) bit set.
jpayne@68	198
jpayne@68	199 For now, the value of the variable-length integers is limited
jpayne@68	200 to 63 bits, which limits the encoded size of the integer to
jpayne@68	201 nine bytes. These limits may be increased in the future if
jpayne@68	202 needed.
jpayne@68	203
jpayne@68	204 The following C code illustrates encoding and decoding of
jpayne@68	205 variable-length integers. The functions return the number of
jpayne@68	206 bytes occupied by the integer (1-9), or zero on error.
jpayne@68	207
jpayne@68	208 #include <stddef.h>
jpayne@68	209 #include <inttypes.h>
jpayne@68	210
jpayne@68	211 size_t
jpayne@68	212 encode(uint8_t buf[static 9], uint64_t num)
jpayne@68	213 {
jpayne@68	214 if (num > UINT64_MAX / 2)
jpayne@68	215 return 0;
jpayne@68	216
jpayne@68	217 size_t i = 0;
jpayne@68	218
jpayne@68	219 while (num >= 0x80) {
jpayne@68	220 buf[i++] = (uint8_t)(num) \| 0x80;
jpayne@68	221 num >>= 7;
jpayne@68	222 }
jpayne@68	223
jpayne@68	224 buf[i++] = (uint8_t)(num);
jpayne@68	225
jpayne@68	226 return i;
jpayne@68	227 }
jpayne@68	228
jpayne@68	229 size_t
jpayne@68	230 decode(const uint8_t buf[], size_t size_max, uint64_t *num)
jpayne@68	231 {
jpayne@68	232 if (size_max == 0)
jpayne@68	233 return 0;
jpayne@68	234
jpayne@68	235 if (size_max > 9)
jpayne@68	236 size_max = 9;
jpayne@68	237
jpayne@68	238 *num = buf[0] & 0x7F;
jpayne@68	239 size_t i = 0;
jpayne@68	240
jpayne@68	241 while (buf[i++] & 0x80) {
jpayne@68	242 if (i >= size_max \|\| buf[i] == 0x00)
jpayne@68	243 return 0;
jpayne@68	244
jpayne@68	245 num \|= (uint64_t)(buf[i] & 0x7F) << (i 7);
jpayne@68	246 }
jpayne@68	247
jpayne@68	248 return i;
jpayne@68	249 }
jpayne@68	250
jpayne@68	251
jpayne@68	252 2. Overall Structure of .xz File
jpayne@68	253
jpayne@68	254 A standalone .xz files consist of one or more Streams which may
jpayne@68	255 have Stream Padding between or after them:
jpayne@68	256
jpayne@68	257 +========+================+========+================+
jpayne@68	258 \| Stream \| Stream Padding \| Stream \| Stream Padding \| ...
jpayne@68	259 +========+================+========+================+
jpayne@68	260
jpayne@68	261 The sizes of Stream and Stream Padding are always multiples
jpayne@68	262 of four bytes, thus the size of every valid .xz file MUST be
jpayne@68	263 a multiple of four bytes.
jpayne@68	264
jpayne@68	265 While a typical file contains only one Stream and no Stream
jpayne@68	266 Padding, a decoder handling standalone .xz files SHOULD support
jpayne@68	267 files that have more than one Stream or Stream Padding.
jpayne@68	268
jpayne@68	269 In contrast to standalone .xz files, when the .xz file format
jpayne@68	270 is used as an internal part of some other file format or
jpayne@68	271 communication protocol, it usually is expected that the decoder
jpayne@68	272 stops after the first Stream, and doesn't look for Stream
jpayne@68	273 Padding or possibly other Streams.
jpayne@68	274
jpayne@68	275
jpayne@68	276 2.1. Stream
jpayne@68	277
jpayne@68	278 +-+-+-+-+-+-+-+-+-+-+-+-+=======+=======+ +=======+
jpayne@68	279 \| Stream Header \| Block \| Block \| ... \| Block \|
jpayne@68	280 +-+-+-+-+-+-+-+-+-+-+-+-+=======+=======+ +=======+
jpayne@68	281
jpayne@68	282 +=======+-+-+-+-+-+-+-+-+-+-+-+-+
jpayne@68	283 ---> \| Index \| Stream Footer \|
jpayne@68	284 +=======+-+-+-+-+-+-+-+-+-+-+-+-+
jpayne@68	285
jpayne@68	286 All the above fields have a size that is a multiple of four. If
jpayne@68	287 Stream is used as an internal part of another file format, it
jpayne@68	288 is RECOMMENDED to make the Stream start at an offset that is
jpayne@68	289 a multiple of four bytes.
jpayne@68	290
jpayne@68	291 Stream Header, Index, and Stream Footer are always present in
jpayne@68	292 a Stream. The maximum size of the Index field is 16 GiB (2^34).
jpayne@68	293
jpayne@68	294 There are zero or more Blocks. The maximum number of Blocks is
jpayne@68	295 limited only by the maximum size of the Index field.
jpayne@68	296
jpayne@68	297 Total size of a Stream MUST be less than 8 EiB (2^63 bytes).
jpayne@68	298 The same limit applies to the total amount of uncompressed
jpayne@68	299 data stored in a Stream.
jpayne@68	300
jpayne@68	301 If an implementation supports handling .xz files with multiple
jpayne@68	302 concatenated Streams, it MAY apply the above limits to the file
jpayne@68	303 as a whole instead of limiting per Stream basis.
jpayne@68	304
jpayne@68	305
jpayne@68	306 2.1.1. Stream Header
jpayne@68	307
jpayne@68	308 +---+---+---+---+---+---+-------+------+--+--+--+--+
jpayne@68	309 \| Header Magic Bytes \| Stream Flags \| CRC32 \|
jpayne@68	310 +---+---+---+---+---+---+-------+------+--+--+--+--+
jpayne@68	311
jpayne@68	312
jpayne@68	313 2.1.1.1. Header Magic Bytes
jpayne@68	314
jpayne@68	315 The first six (6) bytes of the Stream are so called Header
jpayne@68	316 Magic Bytes. They can be used to identify the file type.
jpayne@68	317
jpayne@68	318 Using a C array and ASCII:
jpayne@68	319 const uint8_t HEADER_MAGIC[6]
jpayne@68	320 = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };
jpayne@68	321
jpayne@68	322 In plain hexadecimal:
jpayne@68	323 FD 37 7A 58 5A 00
jpayne@68	324
jpayne@68	325 Notes:
jpayne@68	326 - The first byte (0xFD) was chosen so that the files cannot
jpayne@68	327 be erroneously detected as being in .lzma format, in which
jpayne@68	328 the first byte is in the range [0x00, 0xE0].
jpayne@68	329 - The sixth byte (0x00) was chosen to prevent applications
jpayne@68	330 from misdetecting the file as a text file.
jpayne@68	331
jpayne@68	332 If the Header Magic Bytes don't match, the decoder MUST
jpayne@68	333 indicate an error.
jpayne@68	334
jpayne@68	335
jpayne@68	336 2.1.1.2. Stream Flags
jpayne@68	337
jpayne@68	338 The first byte of Stream Flags is always a null byte. In the
jpayne@68	339 future, this byte may be used to indicate a new Stream version
jpayne@68	340 or other Stream properties.
jpayne@68	341
jpayne@68	342 The second byte of Stream Flags is a bit field:
jpayne@68	343
jpayne@68	344 Bit(s) Mask Description
jpayne@68	345 0-3 0x0F Type of Check (see Section 3.4):
jpayne@68	346 ID Size Check name
jpayne@68	347 0x00 0 bytes None
jpayne@68	348 0x01 4 bytes CRC32
jpayne@68	349 0x02 4 bytes (Reserved)
jpayne@68	350 0x03 4 bytes (Reserved)
jpayne@68	351 0x04 8 bytes CRC64
jpayne@68	352 0x05 8 bytes (Reserved)
jpayne@68	353 0x06 8 bytes (Reserved)
jpayne@68	354 0x07 16 bytes (Reserved)
jpayne@68	355 0x08 16 bytes (Reserved)
jpayne@68	356 0x09 16 bytes (Reserved)
jpayne@68	357 0x0A 32 bytes SHA-256
jpayne@68	358 0x0B 32 bytes (Reserved)
jpayne@68	359 0x0C 32 bytes (Reserved)
jpayne@68	360 0x0D 64 bytes (Reserved)
jpayne@68	361 0x0E 64 bytes (Reserved)
jpayne@68	362 0x0F 64 bytes (Reserved)
jpayne@68	363 4-7 0xF0 Reserved for future use; MUST be zero for now.
jpayne@68	364
jpayne@68	365 Implementations SHOULD support at least the Check IDs 0x00
jpayne@68	366 (None) and 0x01 (CRC32). Supporting other Check IDs is
jpayne@68	367 OPTIONAL. If an unsupported Check is used, the decoder SHOULD
jpayne@68	368 indicate a warning or error.
jpayne@68	369
jpayne@68	370 If any reserved bit is set, the decoder MUST indicate an error.
jpayne@68	371 It is possible that there is a new field present which the
jpayne@68	372 decoder is not aware of, and can thus parse the Stream Header
jpayne@68	373 incorrectly.
jpayne@68	374
jpayne@68	375
jpayne@68	376 2.1.1.3. CRC32
jpayne@68	377
jpayne@68	378 The CRC32 is calculated from the Stream Flags field. It is
jpayne@68	379 stored as an unsigned 32-bit little endian integer. If the
jpayne@68	380 calculated value does not match the stored one, the decoder
jpayne@68	381 MUST indicate an error.
jpayne@68	382
jpayne@68	383 The idea is that Stream Flags would always be two bytes, even
jpayne@68	384 if new features are needed. This way old decoders will be able
jpayne@68	385 to verify the CRC32 calculated from Stream Flags, and thus
jpayne@68	386 distinguish between corrupt files (CRC32 doesn't match) and
jpayne@68	387 files that the decoder doesn't support (CRC32 matches but
jpayne@68	388 Stream Flags has reserved bits set).
jpayne@68	389
jpayne@68	390
jpayne@68	391 2.1.2. Stream Footer
jpayne@68	392
jpayne@68	393 +-+-+-+-+---+---+---+---+-------+------+----------+---------+
jpayne@68	394 \| CRC32 \| Backward Size \| Stream Flags \| Footer Magic Bytes \|
jpayne@68	395 +-+-+-+-+---+---+---+---+-------+------+----------+---------+
jpayne@68	396
jpayne@68	397
jpayne@68	398 2.1.2.1. CRC32
jpayne@68	399
jpayne@68	400 The CRC32 is calculated from the Backward Size and Stream Flags
jpayne@68	401 fields. It is stored as an unsigned 32-bit little endian
jpayne@68	402 integer. If the calculated value does not match the stored one,
jpayne@68	403 the decoder MUST indicate an error.
jpayne@68	404
jpayne@68	405 The reason to have the CRC32 field before the Backward Size and
jpayne@68	406 Stream Flags fields is to keep the four-byte fields aligned to
jpayne@68	407 a multiple of four bytes.
jpayne@68	408
jpayne@68	409
jpayne@68	410 2.1.2.2. Backward Size
jpayne@68	411
jpayne@68	412 Backward Size is stored as a 32-bit little endian integer,
jpayne@68	413 which indicates the size of the Index field as multiple of
jpayne@68	414 four bytes, minimum value being four bytes:
jpayne@68	415
jpayne@68	416 real_backward_size = (stored_backward_size + 1) * 4;
jpayne@68	417
jpayne@68	418 If the stored value does not match the real size of the Index
jpayne@68	419 field, the decoder MUST indicate an error.
jpayne@68	420
jpayne@68	421 Using a fixed-size integer to store Backward Size makes
jpayne@68	422 it slightly simpler to parse the Stream Footer when the
jpayne@68	423 application needs to parse the Stream backwards.
jpayne@68	424
jpayne@68	425
jpayne@68	426 2.1.2.3. Stream Flags
jpayne@68	427
jpayne@68	428 This is a copy of the Stream Flags field from the Stream
jpayne@68	429 Header. The information stored to Stream Flags is needed
jpayne@68	430 when parsing the Stream backwards. The decoder MUST compare
jpayne@68	431 the Stream Flags fields in both Stream Header and Stream
jpayne@68	432 Footer, and indicate an error if they are not identical.
jpayne@68	433
jpayne@68	434
jpayne@68	435 2.1.2.4. Footer Magic Bytes
jpayne@68	436
jpayne@68	437 As the last step of the decoding process, the decoder MUST
jpayne@68	438 verify the existence of Footer Magic Bytes. If they don't
jpayne@68	439 match, an error MUST be indicated.
jpayne@68	440
jpayne@68	441 Using a C array and ASCII:
jpayne@68	442 const uint8_t FOOTER_MAGIC[2] = { 'Y', 'Z' };
jpayne@68	443
jpayne@68	444 In hexadecimal:
jpayne@68	445 59 5A
jpayne@68	446
jpayne@68	447 The primary reason to have Footer Magic Bytes is to make
jpayne@68	448 it easier to detect incomplete files quickly, without
jpayne@68	449 uncompressing. If the file does not end with Footer Magic Bytes
jpayne@68	450 (excluding Stream Padding described in Section 2.2), it cannot
jpayne@68	451 be undamaged, unless someone has intentionally appended garbage
jpayne@68	452 after the end of the Stream.
jpayne@68	453
jpayne@68	454
jpayne@68	455 2.2. Stream Padding
jpayne@68	456
jpayne@68	457 Only the decoders that support decoding of concatenated Streams
jpayne@68	458 MUST support Stream Padding.
jpayne@68	459
jpayne@68	460 Stream Padding MUST contain only null bytes. To preserve the
jpayne@68	461 four-byte alignment of consecutive Streams, the size of Stream
jpayne@68	462 Padding MUST be a multiple of four bytes. Empty Stream Padding
jpayne@68	463 is allowed. If these requirements are not met, the decoder MUST
jpayne@68	464 indicate an error.
jpayne@68	465
jpayne@68	466 Note that non-empty Stream Padding is allowed at the end of the
jpayne@68	467 file; there doesn't need to be a new Stream after non-empty
jpayne@68	468 Stream Padding. This can be convenient in certain situations
jpayne@68	469 [GNU-tar].
jpayne@68	470
jpayne@68	471 The possibility of Stream Padding MUST be taken into account
jpayne@68	472 when designing an application that parses Streams backwards,
jpayne@68	473 and the application supports concatenated Streams.
jpayne@68	474
jpayne@68	475
jpayne@68	476 3. Block
jpayne@68	477
jpayne@68	478 +==============+=================+===============+=======+
jpayne@68	479 \| Block Header \| Compressed Data \| Block Padding \| Check \|
jpayne@68	480 +==============+=================+===============+=======+
jpayne@68	481
jpayne@68	482
jpayne@68	483 3.1. Block Header
jpayne@68	484
jpayne@68	485 +-------------------+-------------+=================+
jpayne@68	486 \| Block Header Size \| Block Flags \| Compressed Size \|
jpayne@68	487 +-------------------+-------------+=================+
jpayne@68	488
jpayne@68	489 +===================+======================+
jpayne@68	490 ---> \| Uncompressed Size \| List of Filter Flags \|
jpayne@68	491 +===================+======================+
jpayne@68	492
jpayne@68	493 +================+--+--+--+--+
jpayne@68	494 ---> \| Header Padding \| CRC32 \|
jpayne@68	495 +================+--+--+--+--+
jpayne@68	496
jpayne@68	497
jpayne@68	498 3.1.1. Block Header Size
jpayne@68	499
jpayne@68	500 This field overlaps with the Index Indicator field (see
jpayne@68	501 Section 4.1).
jpayne@68	502
jpayne@68	503 This field contains the size of the Block Header field,
jpayne@68	504 including the Block Header Size field itself. Valid values are
jpayne@68	505 in the range [0x01, 0xFF], which indicate the size of the Block
jpayne@68	506 Header as multiples of four bytes, minimum size being eight
jpayne@68	507 bytes:
jpayne@68	508
jpayne@68	509 real_header_size = (encoded_header_size + 1) * 4;
jpayne@68	510
jpayne@68	511 If a Block Header bigger than 1024 bytes is needed in the
jpayne@68	512 future, a new field can be added between the Block Header and
jpayne@68	513 Compressed Data fields. The presence of this new field would
jpayne@68	514 be indicated in the Block Header field.
jpayne@68	515
jpayne@68	516
jpayne@68	517 3.1.2. Block Flags
jpayne@68	518
jpayne@68	519 The Block Flags field is a bit field:
jpayne@68	520
jpayne@68	521 Bit(s) Mask Description
jpayne@68	522 0-1 0x03 Number of filters (1-4)
jpayne@68	523 2-5 0x3C Reserved for future use; MUST be zero for now.
jpayne@68	524 6 0x40 The Compressed Size field is present.
jpayne@68	525 7 0x80 The Uncompressed Size field is present.
jpayne@68	526
jpayne@68	527 If any reserved bit is set, the decoder MUST indicate an error.
jpayne@68	528 It is possible that there is a new field present which the
jpayne@68	529 decoder is not aware of, and can thus parse the Block Header
jpayne@68	530 incorrectly.
jpayne@68	531
jpayne@68	532
jpayne@68	533 3.1.3. Compressed Size
jpayne@68	534
jpayne@68	535 This field is present only if the appropriate bit is set in
jpayne@68	536 the Block Flags field (see Section 3.1.2).
jpayne@68	537
jpayne@68	538 The Compressed Size field contains the size of the Compressed
jpayne@68	539 Data field, which MUST be non-zero. Compressed Size is stored
jpayne@68	540 using the encoding described in Section 1.2. If the Compressed
jpayne@68	541 Size doesn't match the size of the Compressed Data field, the
jpayne@68	542 decoder MUST indicate an error.
jpayne@68	543
jpayne@68	544
jpayne@68	545 3.1.4. Uncompressed Size
jpayne@68	546
jpayne@68	547 This field is present only if the appropriate bit is set in
jpayne@68	548 the Block Flags field (see Section 3.1.2).
jpayne@68	549
jpayne@68	550 The Uncompressed Size field contains the size of the Block
jpayne@68	551 after uncompressing. Uncompressed Size is stored using the
jpayne@68	552 encoding described in Section 1.2. If the Uncompressed Size
jpayne@68	553 does not match the real uncompressed size, the decoder MUST
jpayne@68	554 indicate an error.
jpayne@68	555
jpayne@68	556 Storing the Compressed Size and Uncompressed Size fields serves
jpayne@68	557 several purposes:
jpayne@68	558 - The decoder knows how much memory it needs to allocate
jpayne@68	559 for a temporary buffer in multithreaded mode.
jpayne@68	560 - Simple error detection: wrong size indicates a broken file.
jpayne@68	561 - Seeking forwards to a specific location in streamed mode.
jpayne@68	562
jpayne@68	563 It should be noted that the only reliable way to determine
jpayne@68	564 the real uncompressed size is to uncompress the Block,
jpayne@68	565 because the Block Header and Index fields may contain
jpayne@68	566 (intentionally or unintentionally) invalid information.
jpayne@68	567
jpayne@68	568
jpayne@68	569 3.1.5. List of Filter Flags
jpayne@68	570
jpayne@68	571 +================+================+ +================+
jpayne@68	572 \| Filter 0 Flags \| Filter 1 Flags \| ... \| Filter n Flags \|
jpayne@68	573 +================+================+ +================+
jpayne@68	574
jpayne@68	575 The number of Filter Flags fields is stored in the Block Flags
jpayne@68	576 field (see Section 3.1.2).
jpayne@68	577
jpayne@68	578 The format of each Filter Flags field is as follows:
jpayne@68	579
jpayne@68	580 +===========+====================+===================+
jpayne@68	581 \| Filter ID \| Size of Properties \| Filter Properties \|
jpayne@68	582 +===========+====================+===================+
jpayne@68	583
jpayne@68	584 Both Filter ID and Size of Properties are stored using the
jpayne@68	585 encoding described in Section 1.2. Size of Properties indicates
jpayne@68	586 the size of the Filter Properties field as bytes. The list of
jpayne@68	587 officially defined Filter IDs and the formats of their Filter
jpayne@68	588 Properties are described in Section 5.3.
jpayne@68	589
jpayne@68	590 Filter IDs greater than or equal to 0x4000_0000_0000_0000
jpayne@68	591 (2^62) are reserved for implementation-specific internal use.
jpayne@68	592 These Filter IDs MUST never be used in List of Filter Flags.
jpayne@68	593
jpayne@68	594
jpayne@68	595 3.1.6. Header Padding
jpayne@68	596
jpayne@68	597 This field contains as many null byte as it is needed to make
jpayne@68	598 the Block Header have the size specified in Block Header Size.
jpayne@68	599 If any of the bytes are not null bytes, the decoder MUST
jpayne@68	600 indicate an error. It is possible that there is a new field
jpayne@68	601 present which the decoder is not aware of, and can thus parse
jpayne@68	602 the Block Header incorrectly.
jpayne@68	603
jpayne@68	604
jpayne@68	605 3.1.7. CRC32
jpayne@68	606
jpayne@68	607 The CRC32 is calculated over everything in the Block Header
jpayne@68	608 field except the CRC32 field itself. It is stored as an
jpayne@68	609 unsigned 32-bit little endian integer. If the calculated
jpayne@68	610 value does not match the stored one, the decoder MUST indicate
jpayne@68	611 an error.
jpayne@68	612
jpayne@68	613 By verifying the CRC32 of the Block Header before parsing the
jpayne@68	614 actual contents allows the decoder to distinguish between
jpayne@68	615 corrupt and unsupported files.
jpayne@68	616
jpayne@68	617
jpayne@68	618 3.2. Compressed Data
jpayne@68	619
jpayne@68	620 The format of Compressed Data depends on Block Flags and List
jpayne@68	621 of Filter Flags. Excluding the descriptions of the simplest
jpayne@68	622 filters in Section 5.3, the format of the filter-specific
jpayne@68	623 encoded data is out of scope of this document.
jpayne@68	624
jpayne@68	625
jpayne@68	626 3.3. Block Padding
jpayne@68	627
jpayne@68	628 Block Padding MUST contain 0-3 null bytes to make the size of
jpayne@68	629 the Block a multiple of four bytes. This can be needed when
jpayne@68	630 the size of Compressed Data is not a multiple of four. If any
jpayne@68	631 of the bytes in Block Padding are not null bytes, the decoder
jpayne@68	632 MUST indicate an error.
jpayne@68	633
jpayne@68	634
jpayne@68	635 3.4. Check
jpayne@68	636
jpayne@68	637 The type and size of the Check field depends on which bits
jpayne@68	638 are set in the Stream Flags field (see Section 2.1.1.2).
jpayne@68	639
jpayne@68	640 The Check, when used, is calculated from the original
jpayne@68	641 uncompressed data. If the calculated Check does not match the
jpayne@68	642 stored one, the decoder MUST indicate an error. If the selected
jpayne@68	643 type of Check is not supported by the decoder, it SHOULD
jpayne@68	644 indicate a warning or error.
jpayne@68	645
jpayne@68	646
jpayne@68	647 4. Index
jpayne@68	648
jpayne@68	649 +-----------------+===================+
jpayne@68	650 \| Index Indicator \| Number of Records \|
jpayne@68	651 +-----------------+===================+
jpayne@68	652
jpayne@68	653 +=================+===============+-+-+-+-+
jpayne@68	654 ---> \| List of Records \| Index Padding \| CRC32 \|
jpayne@68	655 +=================+===============+-+-+-+-+
jpayne@68	656
jpayne@68	657 Index serves several purposes. Using it, one can
jpayne@68	658 - verify that all Blocks in a Stream have been processed;
jpayne@68	659 - find out the uncompressed size of a Stream; and
jpayne@68	660 - quickly access the beginning of any Block (random access).
jpayne@68	661
jpayne@68	662
jpayne@68	663 4.1. Index Indicator
jpayne@68	664
jpayne@68	665 This field overlaps with the Block Header Size field (see
jpayne@68	666 Section 3.1.1). The value of Index Indicator is always 0x00.
jpayne@68	667
jpayne@68	668
jpayne@68	669 4.2. Number of Records
jpayne@68	670
jpayne@68	671 This field indicates how many Records there are in the List
jpayne@68	672 of Records field, and thus how many Blocks there are in the
jpayne@68	673 Stream. The value is stored using the encoding described in
jpayne@68	674 Section 1.2. If the decoder has decoded all the Blocks of the
jpayne@68	675 Stream, and then notices that the Number of Records doesn't
jpayne@68	676 match the real number of Blocks, the decoder MUST indicate an
jpayne@68	677 error.
jpayne@68	678
jpayne@68	679
jpayne@68	680 4.3. List of Records
jpayne@68	681
jpayne@68	682 List of Records consists of as many Records as indicated by the
jpayne@68	683 Number of Records field:
jpayne@68	684
jpayne@68	685 +========+========+
jpayne@68	686 \| Record \| Record \| ...
jpayne@68	687 +========+========+
jpayne@68	688
jpayne@68	689 Each Record contains information about one Block:
jpayne@68	690
jpayne@68	691 +===============+===================+
jpayne@68	692 \| Unpadded Size \| Uncompressed Size \|
jpayne@68	693 +===============+===================+
jpayne@68	694
jpayne@68	695 If the decoder has decoded all the Blocks of the Stream, it
jpayne@68	696 MUST verify that the contents of the Records match the real
jpayne@68	697 Unpadded Size and Uncompressed Size of the respective Blocks.
jpayne@68	698
jpayne@68	699 Implementation hint: It is possible to verify the Index with
jpayne@68	700 constant memory usage by calculating for example SHA-256 of
jpayne@68	701 both the real size values and the List of Records, then
jpayne@68	702 comparing the hash values. Implementing this using
jpayne@68	703 non-cryptographic hash like CRC32 SHOULD be avoided unless
jpayne@68	704 small code size is important.
jpayne@68	705
jpayne@68	706 If the decoder supports random-access reading, it MUST verify
jpayne@68	707 that Unpadded Size and Uncompressed Size of every completely
jpayne@68	708 decoded Block match the sizes stored in the Index. If only
jpayne@68	709 partial Block is decoded, the decoder MUST verify that the
jpayne@68	710 processed sizes don't exceed the sizes stored in the Index.
jpayne@68	711
jpayne@68	712
jpayne@68	713 4.3.1. Unpadded Size
jpayne@68	714
jpayne@68	715 This field indicates the size of the Block excluding the Block
jpayne@68	716 Padding field. That is, Unpadded Size is the size of the Block
jpayne@68	717 Header, Compressed Data, and Check fields. Unpadded Size is
jpayne@68	718 stored using the encoding described in Section 1.2. The value
jpayne@68	719 MUST never be zero; with the current structure of Blocks, the
jpayne@68	720 actual minimum value for Unpadded Size is five.
jpayne@68	721
jpayne@68	722 Implementation note: Because the size of the Block Padding
jpayne@68	723 field is not included in Unpadded Size, calculating the total
jpayne@68	724 size of a Stream or doing random-access reading requires
jpayne@68	725 calculating the actual size of the Blocks by rounding Unpadded
jpayne@68	726 Sizes up to the next multiple of four.
jpayne@68	727
jpayne@68	728 The reason to exclude Block Padding from Unpadded Size is to
jpayne@68	729 ease making a raw copy of Compressed Data without Block
jpayne@68	730 Padding. This can be useful, for example, if someone wants
jpayne@68	731 to convert Streams to some other file format quickly.
jpayne@68	732
jpayne@68	733
jpayne@68	734 4.3.2. Uncompressed Size
jpayne@68	735
jpayne@68	736 This field indicates the Uncompressed Size of the respective
jpayne@68	737 Block as bytes. The value is stored using the encoding
jpayne@68	738 described in Section 1.2.
jpayne@68	739
jpayne@68	740
jpayne@68	741 4.4. Index Padding
jpayne@68	742
jpayne@68	743 This field MUST contain 0-3 null bytes to pad the Index to
jpayne@68	744 a multiple of four bytes. If any of the bytes are not null
jpayne@68	745 bytes, the decoder MUST indicate an error.
jpayne@68	746
jpayne@68	747
jpayne@68	748 4.5. CRC32
jpayne@68	749
jpayne@68	750 The CRC32 is calculated over everything in the Index field
jpayne@68	751 except the CRC32 field itself. The CRC32 is stored as an
jpayne@68	752 unsigned 32-bit little endian integer. If the calculated
jpayne@68	753 value does not match the stored one, the decoder MUST indicate
jpayne@68	754 an error.
jpayne@68	755
jpayne@68	756
jpayne@68	757 5. Filter Chains
jpayne@68	758
jpayne@68	759 The Block Flags field defines how many filters are used. When
jpayne@68	760 more than one filter is used, the filters are chained; that is,
jpayne@68	761 the output of one filter is the input of another filter. The
jpayne@68	762 following figure illustrates the direction of data flow.
jpayne@68	763
jpayne@68	764 v Uncompressed Data ^
jpayne@68	765 \| Filter 0 \|
jpayne@68	766 Encoder \| Filter 1 \| Decoder
jpayne@68	767 \| Filter n \|
jpayne@68	768 v Compressed Data ^
jpayne@68	769
jpayne@68	770
jpayne@68	771 5.1. Alignment
jpayne@68	772
jpayne@68	773 Alignment of uncompressed input data is usually the job of
jpayne@68	774 the application producing the data. For example, to get the
jpayne@68	775 best results, an archiver tool should make sure that all
jpayne@68	776 PowerPC executable files in the archive stream start at
jpayne@68	777 offsets that are multiples of four bytes.
jpayne@68	778
jpayne@68	779 Some filters, for example LZMA2, can be configured to take
jpayne@68	780 advantage of specified alignment of input data. Note that
jpayne@68	781 taking advantage of aligned input can be beneficial also when
jpayne@68	782 a filter is not the first filter in the chain. For example,
jpayne@68	783 if you compress PowerPC executables, you may want to use the
jpayne@68	784 PowerPC filter and chain that with the LZMA2 filter. Because
jpayne@68	785 not only the input but also the output alignment of the PowerPC
jpayne@68	786 filter is four bytes, it is now beneficial to set LZMA2
jpayne@68	787 settings so that the LZMA2 encoder can take advantage of its
jpayne@68	788 four-byte-aligned input data.
jpayne@68	789
jpayne@68	790 The output of the last filter in the chain is stored to the
jpayne@68	791 Compressed Data field, which is is guaranteed to be aligned
jpayne@68	792 to a multiple of four bytes relative to the beginning of the
jpayne@68	793 Stream. This can increase
jpayne@68	794 - speed, if the filtered data is handled multiple bytes at
jpayne@68	795 a time by the filter-specific encoder and decoder,
jpayne@68	796 because accessing aligned data in computer memory is
jpayne@68	797 usually faster; and
jpayne@68	798 - compression ratio, if the output data is later compressed
jpayne@68	799 with an external compression tool.
jpayne@68	800
jpayne@68	801
jpayne@68	802 5.2. Security
jpayne@68	803
jpayne@68	804 If filters would be allowed to be chained freely, it would be
jpayne@68	805 possible to create malicious files, that would be very slow to
jpayne@68	806 decode. Such files could be used to create denial of service
jpayne@68	807 attacks.
jpayne@68	808
jpayne@68	809 Slow files could occur when multiple filters are chained:
jpayne@68	810
jpayne@68	811 v Compressed input data
jpayne@68	812 \| Filter 1 decoder (last filter)
jpayne@68	813 \| Filter 0 decoder (non-last filter)
jpayne@68	814 v Uncompressed output data
jpayne@68	815
jpayne@68	816 The decoder of the last filter in the chain produces a lot of
jpayne@68	817 output from little input. Another filter in the chain takes the
jpayne@68	818 output of the last filter, and produces very little output
jpayne@68	819 while consuming a lot of input. As a result, a lot of data is
jpayne@68	820 moved inside the filter chain, but the filter chain as a whole
jpayne@68	821 gets very little work done.
jpayne@68	822
jpayne@68	823 To prevent this kind of slow files, there are restrictions on
jpayne@68	824 how the filters can be chained. These restrictions MUST be
jpayne@68	825 taken into account when designing new filters.
jpayne@68	826
jpayne@68	827 The maximum number of filters in the chain has been limited to
jpayne@68	828 four, thus there can be at maximum of three non-last filters.
jpayne@68	829 Of these three non-last filters, only two are allowed to change
jpayne@68	830 the size of the data.
jpayne@68	831
jpayne@68	832 The non-last filters, that change the size of the data, MUST
jpayne@68	833 have a limit how much the decoder can compress the data: the
jpayne@68	834 decoder SHOULD produce at least n bytes of output when the
jpayne@68	835 filter is given 2n bytes of input. This limit is not
jpayne@68	836 absolute, but significant deviations MUST be avoided.
jpayne@68	837
jpayne@68	838 The above limitations guarantee that if the last filter in the
jpayne@68	839 chain produces 4n bytes of output, the chain as a whole will
jpayne@68	840 produce at least n bytes of output.
jpayne@68	841
jpayne@68	842
jpayne@68	843 5.3. Filters
jpayne@68	844
jpayne@68	845 5.3.1. LZMA2
jpayne@68	846
jpayne@68	847 LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purpose
jpayne@68	848 compression algorithm with high compression ratio and fast
jpayne@68	849 decompression. LZMA is based on LZ77 and range coding
jpayne@68	850 algorithms.
jpayne@68	851
jpayne@68	852 LZMA2 is an extension on top of the original LZMA. LZMA2 uses
jpayne@68	853 LZMA internally, but adds support for flushing the encoder,
jpayne@68	854 uncompressed chunks, eases stateful decoder implementations,
jpayne@68	855 and improves support for multithreading. Thus, the plain LZMA
jpayne@68	856 will not be supported in this file format.
jpayne@68	857
jpayne@68	858 Filter ID: 0x21
jpayne@68	859 Size of Filter Properties: 1 byte
jpayne@68	860 Changes size of data: Yes
jpayne@68	861 Allow as a non-last filter: No
jpayne@68	862 Allow as the last filter: Yes
jpayne@68	863
jpayne@68	864 Preferred alignment:
jpayne@68	865 Input data: Adjustable to 1/2/4/8/16 byte(s)
jpayne@68	866 Output data: 1 byte
jpayne@68	867
jpayne@68	868 The format of the one-byte Filter Properties field is as
jpayne@68	869 follows:
jpayne@68	870
jpayne@68	871 Bits Mask Description
jpayne@68	872 0-5 0x3F Dictionary Size
jpayne@68	873 6-7 0xC0 Reserved for future use; MUST be zero for now.
jpayne@68	874
jpayne@68	875 Dictionary Size is encoded with one-bit mantissa and five-bit
jpayne@68	876 exponent. The smallest dictionary size is 4 KiB and the biggest
jpayne@68	877 is 4 GiB.
jpayne@68	878
jpayne@68	879 Raw value Mantissa Exponent Dictionary size
jpayne@68	880 0 2 11 4 KiB
jpayne@68	881 1 3 11 6 KiB
jpayne@68	882 2 2 12 8 KiB
jpayne@68	883 3 3 12 12 KiB
jpayne@68	884 4 2 13 16 KiB
jpayne@68	885 5 3 13 24 KiB
jpayne@68	886 6 2 14 32 KiB
jpayne@68	887 ... ... ... ...
jpayne@68	888 35 3 27 768 MiB
jpayne@68	889 36 2 28 1024 MiB
jpayne@68	890 37 3 29 1536 MiB
jpayne@68	891 38 2 30 2048 MiB
jpayne@68	892 39 3 30 3072 MiB
jpayne@68	893 40 2 31 4096 MiB - 1 B
jpayne@68	894
jpayne@68	895 Instead of having a table in the decoder, the dictionary size
jpayne@68	896 can be decoded using the following C code:
jpayne@68	897
jpayne@68	898 const uint8_t bits = get_dictionary_flags() & 0x3F;
jpayne@68	899 if (bits > 40)
jpayne@68	900 return DICTIONARY_TOO_BIG; // Bigger than 4 GiB
jpayne@68	901
jpayne@68	902 uint32_t dictionary_size;
jpayne@68	903 if (bits == 40) {
jpayne@68	904 dictionary_size = UINT32_MAX;
jpayne@68	905 } else {
jpayne@68	906 dictionary_size = 2 \| (bits & 1);
jpayne@68	907 dictionary_size <<= bits / 2 + 11;
jpayne@68	908 }
jpayne@68	909
jpayne@68	910
jpayne@68	911 5.3.2. Branch/Call/Jump Filters for Executables
jpayne@68	912
jpayne@68	913 These filters convert relative branch, call, and jump
jpayne@68	914 instructions to their absolute counterparts in executable
jpayne@68	915 files. This conversion increases redundancy and thus
jpayne@68	916 compression ratio.
jpayne@68	917
jpayne@68	918 Size of Filter Properties: 0 or 4 bytes
jpayne@68	919 Changes size of data: No
jpayne@68	920 Allow as a non-last filter: Yes
jpayne@68	921 Allow as the last filter: No
jpayne@68	922
jpayne@68	923 Below is the list of filters in this category. The alignment
jpayne@68	924 is the same for both input and output data.
jpayne@68	925
jpayne@68	926 Filter ID Alignment Description
jpayne@68	927 0x04 1 byte x86 filter (BCJ)
jpayne@68	928 0x05 4 bytes PowerPC (big endian) filter
jpayne@68	929 0x06 16 bytes IA64 filter
jpayne@68	930 0x07 4 bytes ARM filter [1]
jpayne@68	931 0x08 2 bytes ARM Thumb filter [1]
jpayne@68	932 0x09 4 bytes SPARC filter
jpayne@68	933 0x0A 4 bytes ARM64 filter [2]
jpayne@68	934 0x0B 2 bytes RISC-V filter
jpayne@68	935
jpayne@68	936 [1] These are for little endian instruction encoding.
jpayne@68	937 This must not be confused with data endianness.
jpayne@68	938 A processor configured for big endian data access
jpayne@68	939 may still use little endian instruction encoding.
jpayne@68	940 The filters don't care about the data endianness.
jpayne@68	941
jpayne@68	942 [2] 4096-byte alignment gives the best results
jpayne@68	943 because the address in the ADRP instruction
jpayne@68	944 is a multiple of 4096 bytes.
jpayne@68	945
jpayne@68	946 If the size of Filter Properties is four bytes, the Filter
jpayne@68	947 Properties field contains the start offset used for address
jpayne@68	948 conversions. It is stored as an unsigned 32-bit little endian
jpayne@68	949 integer. The start offset MUST be a multiple of the alignment
jpayne@68	950 of the filter as listed in the table above; if it isn't, the
jpayne@68	951 decoder MUST indicate an error. If the size of Filter
jpayne@68	952 Properties is zero, the start offset is zero.
jpayne@68	953
jpayne@68	954 Setting the start offset may be useful if an executable has
jpayne@68	955 multiple sections, and there are many cross-section calls.
jpayne@68	956 Taking advantage of this feature usually requires usage of
jpayne@68	957 the Subblock filter, whose design is not complete yet.
jpayne@68	958
jpayne@68	959
jpayne@68	960 5.3.3. Delta
jpayne@68	961
jpayne@68	962 The Delta filter may increase compression ratio when the value
jpayne@68	963 of the next byte correlates with the value of an earlier byte
jpayne@68	964 at specified distance.
jpayne@68	965
jpayne@68	966 Filter ID: 0x03
jpayne@68	967 Size of Filter Properties: 1 byte
jpayne@68	968 Changes size of data: No
jpayne@68	969 Allow as a non-last filter: Yes
jpayne@68	970 Allow as the last filter: No
jpayne@68	971
jpayne@68	972 Preferred alignment:
jpayne@68	973 Input data: 1 byte
jpayne@68	974 Output data: Same as the original input data
jpayne@68	975
jpayne@68	976 The Properties byte indicates the delta distance, which can be
jpayne@68	977 1-256 bytes backwards from the current byte: 0x00 indicates
jpayne@68	978 distance of 1 byte and 0xFF distance of 256 bytes.
jpayne@68	979
jpayne@68	980
jpayne@68	981 5.3.3.1. Format of the Encoded Output
jpayne@68	982
jpayne@68	983 The code below illustrates both encoding and decoding with
jpayne@68	984 the Delta filter.
jpayne@68	985
jpayne@68	986 // Distance is in the range [1, 256].
jpayne@68	987 const unsigned int distance = get_properties_byte() + 1;
jpayne@68	988 uint8_t pos = 0;
jpayne@68	989 uint8_t delta[256];
jpayne@68	990
jpayne@68	991 memset(delta, 0, sizeof(delta));
jpayne@68	992
jpayne@68	993 while (1) {
jpayne@68	994 const int byte = read_byte();
jpayne@68	995 if (byte == EOF)
jpayne@68	996 break;
jpayne@68	997
jpayne@68	998 uint8_t tmp = delta[(uint8_t)(distance + pos)];
jpayne@68	999 if (is_encoder) {
jpayne@68	1000 tmp = (uint8_t)(byte) - tmp;
jpayne@68	1001 delta[pos] = (uint8_t)(byte);
jpayne@68	1002 } else {
jpayne@68	1003 tmp = (uint8_t)(byte) + tmp;
jpayne@68	1004 delta[pos] = tmp;
jpayne@68	1005 }
jpayne@68	1006
jpayne@68	1007 write_byte(tmp);
jpayne@68	1008 --pos;
jpayne@68	1009 }
jpayne@68	1010
jpayne@68	1011
jpayne@68	1012 5.4. Custom Filter IDs
jpayne@68	1013
jpayne@68	1014 If a developer wants to use custom Filter IDs, there are two
jpayne@68	1015 choices. The first choice is to contact Lasse Collin and ask
jpayne@68	1016 him to allocate a range of IDs for the developer.
jpayne@68	1017
jpayne@68	1018 The second choice is to generate a 40-bit random integer
jpayne@68	1019 which the developer can use as a personal Developer ID.
jpayne@68	1020 To minimize the risk of collisions, Developer ID has to be
jpayne@68	1021 a randomly generated integer, not manually selected "hex word".
jpayne@68	1022 The following command, which works on many free operating
jpayne@68	1023 systems, can be used to generate Developer ID:
jpayne@68	1024
jpayne@68	1025 dd if=/dev/urandom bs=5 count=1 \| hexdump
jpayne@68	1026
jpayne@68	1027 The developer can then use the Developer ID to create unique
jpayne@68	1028 (well, hopefully unique) Filter IDs.
jpayne@68	1029
jpayne@68	1030 Bits Mask Description
jpayne@68	1031 0-15 0x0000_0000_0000_FFFF Filter ID
jpayne@68	1032 16-55 0x00FF_FFFF_FFFF_0000 Developer ID
jpayne@68	1033 56-62 0x3F00_0000_0000_0000 Static prefix: 0x3F
jpayne@68	1034
jpayne@68	1035 The resulting 63-bit integer will use 9 bytes of space when
jpayne@68	1036 stored using the encoding described in Section 1.2. To get
jpayne@68	1037 a shorter ID, see the beginning of this Section how to
jpayne@68	1038 request a custom ID range.
jpayne@68	1039
jpayne@68	1040
jpayne@68	1041 5.4.1. Reserved Custom Filter ID Ranges
jpayne@68	1042
jpayne@68	1043 Range Description
jpayne@68	1044 0x0000_0300 - 0x0000_04FF Reserved to ease .7z compatibility
jpayne@68	1045 0x0002_0000 - 0x0007_FFFF Reserved to ease .7z compatibility
jpayne@68	1046 0x0200_0000 - 0x07FF_FFFF Reserved to ease .7z compatibility
jpayne@68	1047
jpayne@68	1048
jpayne@68	1049 6. Cyclic Redundancy Checks
jpayne@68	1050
jpayne@68	1051 There are several incompatible variations to calculate CRC32
jpayne@68	1052 and CRC64. For simplicity and clarity, complete examples are
jpayne@68	1053 provided to calculate the checks as they are used in this file
jpayne@68	1054 format. Implementations MAY use different code as long as it
jpayne@68	1055 gives identical results.
jpayne@68	1056
jpayne@68	1057 The program below reads data from standard input, calculates
jpayne@68	1058 the CRC32 and CRC64 values, and prints the calculated values
jpayne@68	1059 as big endian hexadecimal strings to standard output.
jpayne@68	1060
jpayne@68	1061 #include <stddef.h>
jpayne@68	1062 #include <inttypes.h>
jpayne@68	1063 #include <stdio.h>
jpayne@68	1064
jpayne@68	1065 uint32_t crc32_table[256];
jpayne@68	1066 uint64_t crc64_table[256];
jpayne@68	1067
jpayne@68	1068 void
jpayne@68	1069 init(void)
jpayne@68	1070 {
jpayne@68	1071 static const uint32_t poly32 = UINT32_C(0xEDB88320);
jpayne@68	1072 static const uint64_t poly64
jpayne@68	1073 = UINT64_C(0xC96C5795D7870F42);
jpayne@68	1074
jpayne@68	1075 for (size_t i = 0; i < 256; ++i) {
jpayne@68	1076 uint32_t crc32 = i;
jpayne@68	1077 uint64_t crc64 = i;
jpayne@68	1078
jpayne@68	1079 for (size_t j = 0; j < 8; ++j) {
jpayne@68	1080 if (crc32 & 1)
jpayne@68	1081 crc32 = (crc32 >> 1) ^ poly32;
jpayne@68	1082 else
jpayne@68	1083 crc32 >>= 1;
jpayne@68	1084
jpayne@68	1085 if (crc64 & 1)
jpayne@68	1086 crc64 = (crc64 >> 1) ^ poly64;
jpayne@68	1087 else
jpayne@68	1088 crc64 >>= 1;
jpayne@68	1089 }
jpayne@68	1090
jpayne@68	1091 crc32_table[i] = crc32;
jpayne@68	1092 crc64_table[i] = crc64;
jpayne@68	1093 }
jpayne@68	1094 }
jpayne@68	1095
jpayne@68	1096 uint32_t
jpayne@68	1097 crc32(const uint8_t *buf, size_t size, uint32_t crc)
jpayne@68	1098 {
jpayne@68	1099 crc = ~crc;
jpayne@68	1100 for (size_t i = 0; i < size; ++i)
jpayne@68	1101 crc = crc32_table[buf[i] ^ (crc & 0xFF)]
jpayne@68	1102 ^ (crc >> 8);
jpayne@68	1103 return ~crc;
jpayne@68	1104 }
jpayne@68	1105
jpayne@68	1106 uint64_t
jpayne@68	1107 crc64(const uint8_t *buf, size_t size, uint64_t crc)
jpayne@68	1108 {
jpayne@68	1109 crc = ~crc;
jpayne@68	1110 for (size_t i = 0; i < size; ++i)
jpayne@68	1111 crc = crc64_table[buf[i] ^ (crc & 0xFF)]
jpayne@68	1112 ^ (crc >> 8);
jpayne@68	1113 return ~crc;
jpayne@68	1114 }
jpayne@68	1115
jpayne@68	1116 int
jpayne@68	1117 main()
jpayne@68	1118 {
jpayne@68	1119 init();
jpayne@68	1120
jpayne@68	1121 uint32_t value32 = 0;
jpayne@68	1122 uint64_t value64 = 0;
jpayne@68	1123 uint64_t total_size = 0;
jpayne@68	1124 uint8_t buf[8192];
jpayne@68	1125
jpayne@68	1126 while (1) {
jpayne@68	1127 const size_t buf_size
jpayne@68	1128 = fread(buf, 1, sizeof(buf), stdin);
jpayne@68	1129 if (buf_size == 0)
jpayne@68	1130 break;
jpayne@68	1131
jpayne@68	1132 total_size += buf_size;
jpayne@68	1133 value32 = crc32(buf, buf_size, value32);
jpayne@68	1134 value64 = crc64(buf, buf_size, value64);
jpayne@68	1135 }
jpayne@68	1136
jpayne@68	1137 printf("Bytes: %" PRIu64 "\n", total_size);
jpayne@68	1138 printf("CRC-32: 0x%08" PRIX32 "\n", value32);
jpayne@68	1139 printf("CRC-64: 0x%016" PRIX64 "\n", value64);
jpayne@68	1140
jpayne@68	1141 return 0;
jpayne@68	1142 }
jpayne@68	1143
jpayne@68	1144
jpayne@68	1145 7. References
jpayne@68	1146
jpayne@68	1147 LZMA SDK - The original LZMA implementation
jpayne@68	1148 https://7-zip.org/sdk.html
jpayne@68	1149
jpayne@68	1150 LZMA Utils - LZMA adapted to POSIX-like systems
jpayne@68	1151 https://tukaani.org/lzma/
jpayne@68	1152
jpayne@68	1153 XZ Utils - The next generation of LZMA Utils
jpayne@68	1154 https://tukaani.org/xz/
jpayne@68	1155
jpayne@68	1156 [RFC-1952]
jpayne@68	1157 GZIP file format specification version 4.3
jpayne@68	1158 https://www.ietf.org/rfc/rfc1952.txt
jpayne@68	1159 - Notation of byte boxes in section "2.1. Overall conventions"
jpayne@68	1160
jpayne@68	1161 [RFC-2119]
jpayne@68	1162 Key words for use in RFCs to Indicate Requirement Levels
jpayne@68	1163 https://www.ietf.org/rfc/rfc2119.txt
jpayne@68	1164
jpayne@68	1165 [GNU-tar]
jpayne@68	1166 GNU tar 1.35 manual
jpayne@68	1167 https://www.gnu.org/software/tar/manual/html_node/Blocking-Factor.html
jpayne@68	1168 - Node 9.4.2 "Blocking Factor", paragraph that begins
jpayne@68	1169 "gzip will complain about trailing garbage"
jpayne@68	1170 - Note that this URL points to the latest version of the
jpayne@68	1171 manual, and may some day not contain the note which is in
jpayne@68	1172 1.35. For the exact version of the manual, download GNU
jpayne@68	1173 tar 1.35: ftp://ftp.gnu.org/pub/gnu/tar/tar-1.35.tar.gz
jpayne@68	1174

Mercurial > repos > rliterman > csp2

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/xz/xz-file-format.txt @ 68:5028fdace37b