The .lzma File Format
=====================

0. Preface
  0.1. Notices and Acknowledgements
  0.2. Changes
1. File Format
  1.1. Header
    1.1.1. Properties
    1.1.2. Dictionary Size
    1.1.3. Uncompressed Size
  1.2. LZMA Compressed Data
2. References


0. Preface

This document describes the .lzma file format, which is
sometimes also called LZMA_Alone format. It is a legacy file
format, which is being or has been replaced by the .xz format.
The MIME type of the .lzma format is `application/x-lzma'.

The most commonly used tools for handling .lzma files are
LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
describes some of the differences between these implementations
and gives hints about which subset of the .lzma format is the
most portable.


0.1. Notices and Acknowledgements

This file format was designed by Igor Pavlov for use in
LZMA SDK. This document was written by Lasse Collin
<lasse.collin@tukaani.org> using the documentation found
in the LZMA SDK.

This document has been put into the public domain.


0.2. Changes

Last modified: 2024-04-08 17:35+0300

From version 2011-04-12 11:55+0300 to 2022-07-13 21:00+0300:
The section 1.1.3 was modified to allow End of Payload Marker
with a known Uncompressed Size.


1. File Format

    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
    |         Header          |   LZMA Compressed Data   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

A .lzma file consists of a 13-byte Header followed by
the LZMA Compressed Data.

Unlike the .gz, .bz2, and .xz formats, it is not possible to
concatenate multiple .lzma files as is and expect the
decompression tool to decode the resulting file as if it were
a single .lzma file.

For example, the command line tools from LZMA Utils and
LZMA SDK silently ignore all the data after the first .lzma
stream. In contrast, the command line tool from XZ Utils
considers the .lzma file to be corrupt if there is data after
the first .lzma stream.


1.1. Header

    +------------+----+----+----+----+--+--+--+--+--+--+--+--+
    | Properties |  Dictionary Size  |   Uncompressed Size   |
    +------------+----+----+----+----+--+--+--+--+--+--+--+--+


1.1.1. Properties

The Properties field contains three properties. An abbreviation
is given in parentheses, followed by the value range of the
property. The field consists of

    1) the number of literal context bits (lc, [0, 8]);
    2) the number of literal position bits (lp, [0, 4]); and
    3) the number of position bits (pb, [0, 4]).

The properties are encoded using the following formula:

    Properties = (pb * 5 + lp) * 9 + lc
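
As an illustration of the formula above, the encoding direction can
be sketched in C as follows. The function name lzma_props_encode and
its -1 error return are assumptions made for this example only; they
are not taken from LZMA SDK or XZ Utils.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack lc, lp, and pb into the Properties byte.
 * Returns -1 if a value is outside the ranges listed in this section.
 */
static int lzma_props_encode(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;

    return (int)((pb * 5 + lp) * 9 + lc);
}
```

With lc = 3, lp = 0, and pb = 2, the formula gives
(2 * 5 + 0) * 9 + 3 = 93, that is, a Properties byte of 0x5D.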

The following C code illustrates a straightforward way to
decode the Properties field:

    uint8_t lc, lp, pb;
    uint8_t prop = get_lzma_properties();

    /* The largest valid value has pb = 4, lp = 4, lc = 8. */
    if (prop > (4 * 5 + 4) * 9 + 8)
        return LZMA_PROPERTIES_ERROR;

    pb = prop / (9 * 5);    /* position bits */
    prop -= pb * 9 * 5;
    lp = prop / 9;          /* literal position bits */
    lc = prop - lp * 9;     /* literal context bits */

XZ Utils has an additional requirement: lc + lp <= 4. Files
that don't meet this requirement cannot be decompressed
with XZ Utils. Usually this isn't a problem since the most
common lc/lp/pb combination is 3/0/2. It is the only lc/lp/pb
combination that the files created by LZMA Utils can have,
but LZMA Utils can decompress files with any lc/lp/pb.
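
A tool that wants to know in advance whether a file meets this
additional requirement could test the Properties byte roughly as
shown below. This is only a sketch: lzma_props_ok_for_xz is a
hypothetical name, not a function provided by XZ Utils.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if the Properties byte is valid and
 * also satisfies the extra XZ Utils requirement lc + lp <= 4.
 */
static bool lzma_props_ok_for_xz(uint8_t prop)
{
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    unsigned lc = prop % 9;
    unsigned lp = (prop / 9) % 5;

    return lc + lp <= 4;
}
```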


1.1.2. Dictionary Size

Dictionary Size is stored as an unsigned 32-bit little endian
integer. Any 32-bit value is possible, but for maximum
portability, only sizes of 2^n and 2^n + 2^(n-1) should be
used.
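
The portable sizes are exactly the values that have either a single
bit set (2^n) or two adjacent bits set (2^n + 2^(n-1), which equals
3 * 2^(n-1)). A minimal C sketch of such a test follows; the name
dict_size_is_portable is hypothetical, and treating zero as
non-portable is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if d is of the form 2^n or
 * 2^n + 2^(n-1). Zero is treated as non-portable here.
 */
static bool dict_size_is_portable(uint32_t d)
{
    if (d == 0)
        return false;

    /* A power of two has exactly one bit set. */
    if ((d & (d - 1)) == 0)
        return true;

    /* 2^n + 2^(n-1) is three times a power of two. */
    if (d % 3 != 0)
        return false;

    uint32_t q = d / 3;
    return (q & (q - 1)) == 0;
}
```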

LZMA Utils creates only files with dictionary size 2^n,
16 <= n <= 25. LZMA Utils can decompress files with any
dictionary size.

XZ Utils creates and decompresses .lzma files only with
dictionary sizes 2^n and 2^n + 2^(n-1). If some other
dictionary size is specified when compressing, the value
stored in the Dictionary Size field is rounded up, but the
specified value is still used in the actual compression code.
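
The rounding described above can be sketched as a search for the
smallest value of the form 2^n or 2^n + 2^(n-1) that is not less
than the requested size. The code below is an illustration only and
is not the XZ Utils implementation; the name dict_size_round_up and
the choice to return UINT32_MAX when no such value fits in 32 bits
are assumptions of this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round d up to the nearest 2^n or
 * 2^n + 2^(n-1). Returns UINT32_MAX if d is larger than
 * 2^31 + 2^30, the largest such value that fits in 32 bits.
 */
static uint32_t dict_size_round_up(uint32_t d)
{
    for (unsigned n = 0; n < 32; ++n) {
        uint32_t pow2 = UINT32_C(1) << n;
        if (d <= pow2)
            return pow2;

        /* 2^n + 2^(n-1) is an integer only when n >= 1. */
        uint32_t mid = pow2 + (pow2 >> 1);
        if (n > 0 && d <= mid)
            return mid;
    }

    return UINT32_MAX;
}
```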


1.1.3. Uncompressed Size

Uncompressed Size is stored as an unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if Uncompressed Size is unknown. End of Payload Marker
is allowed but rarely used if Uncompressed Size is known.
XZ Utils 5.2.5 and older don't support .lzma files that have
End of Payload Marker together with a known Uncompressed Size.

XZ Utils rejects files whose Uncompressed Size field specifies
a known size that is 256 GiB or more. This is to reject false
positives when trying to guess if the input file is in the
.lzma format. When Uncompressed Size is unknown, there is no
limit for the uncompressed size of the file.

(*) Some tools use the term End of Stream (EOS) marker
    instead of End of Payload Marker.
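
Putting the three header fields together, a parser for the 13-byte
Header could look roughly like the following C sketch. The struct
and function names are hypothetical and are not part of LZMA SDK,
LZMA Utils, or XZ Utils. The 256 GiB test mirrors the XZ Utils
behavior described above and is kept as a separate, optional check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded Header fields. */
struct lzma_alone_header {
    uint8_t lc, lp, pb;
    uint32_t dict_size;
    uint64_t uncompressed_size;
    bool size_known;    /* false if the field is all ones */
};

/* Decode the 13-byte Header; returns false on invalid input. */
static bool lzma_alone_parse_header(const uint8_t *buf, size_t len,
                                    struct lzma_alone_header *h)
{
    if (len < 13 || buf[0] > (4 * 5 + 4) * 9 + 8)
        return false;

    h->pb = buf[0] / (9 * 5);
    h->lp = (buf[0] % (9 * 5)) / 9;
    h->lc = buf[0] % 9;

    /* Bytes 1-4: Dictionary Size, little endian. */
    h->dict_size = 0;
    for (int i = 3; i >= 0; --i)
        h->dict_size = (h->dict_size << 8) | buf[1 + i];

    /* Bytes 5-12: Uncompressed Size, little endian. */
    h->uncompressed_size = 0;
    for (int i = 7; i >= 0; --i)
        h->uncompressed_size = (h->uncompressed_size << 8) | buf[5 + i];

    h->size_known = h->uncompressed_size != UINT64_MAX;
    return true;
}

/* Optional check: a known size must be below 256 GiB (2^38 bytes). */
static bool lzma_alone_size_plausible(const struct lzma_alone_header *h)
{
    return !h->size_known || h->uncompressed_size < (UINT64_C(1) << 38);
}
```

For example, a Header starting with the bytes 5D 00 00 80 00
followed by eight FF bytes decodes to lc/lp/pb = 3/0/2, a
dictionary size of 8 MiB, and an unknown Uncompressed Size.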


1.2. LZMA Compressed Data

A detailed description of the format of this field is outside
the scope of this document.


2. References

LZMA SDK - The original LZMA implementation
https://7-zip.org/sdk.html

7-Zip
https://7-zip.org/

LZMA Utils - LZMA adapted to POSIX-like systems
https://tukaani.org/lzma/

XZ Utils - The next generation of LZMA Utils
https://tukaani.org/xz/

The .xz file format - The successor of the .lzma format
https://tukaani.org/xz/xz-file-format.txt