XZ Utils FAQ
============

Q:  What do the letters XZ mean?

A:  Nothing. They are just two letters, which come from the file format
    suffix .xz. The .xz suffix was selected, because it seemed to be
    pretty much unused. It has no deeper meaning.


Q:  What are LZMA and LZMA2?

A:  LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name
    of the compression algorithm designed by Igor Pavlov for 7-Zip.
    LZMA is based on LZ77 and range encoding.

    LZMA2 is an updated version of the original LZMA to fix a couple of
    practical issues. In the context of XZ Utils, LZMA is called LZMA1
    to emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
    primary compression algorithm in the .xz file format.


Q:  There are many LZMA related projects. How does XZ Utils relate to them?

A:  7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
    a subset of the 7-Zip source tree.

    p7zip is 7-Zip's command-line tools ported to POSIX-like systems.

    LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
    LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
    LZMA Utils.

    There are several other projects using LZMA. Most are more or less
    based on LZMA SDK. See <https://7-zip.org/links.html>.


Q:  Why is liblzma named liblzma if its primary file format is .xz?
    Shouldn't it be e.g. libxz?

A:  When the design of the .xz format began, the idea was to replace
    the .lzma format and use the same .lzma suffix. It would have been
    quite OK to reuse the suffix when there were very few .lzma files
    around. However, the old .lzma format became popular before the
    new format was finished. The new format was renamed to .xz but the
    name of liblzma wasn't changed.


Q:  Do XZ Utils support the .7z format?

A:  No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
    files.


Q:  I have many .tar.7z files. Can I convert them to .tar.xz without
    spending hours recompressing the data?

A:  In the "extra" directory, there is a script named 7z2lzma.bash which
    is able to convert some .7z files to the .lzma format (not .xz). It
    needs the 7za (or 7z) command from p7zip. The script may silently
    produce corrupt output if certain assumptions are not met, so
    decompress the resulting .lzma file and compare it against the
    original before deleting the original file!
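
    The comparison recommended above could be sketched in Python with
    the standard lzma module, which wraps liblzma. This is only an
    illustration: the function names and paths are hypothetical, and it
    assumes that the uncompressed data has already been extracted from
    the original .7z file into a reference file.

```python
import hashlib
import lzma


def sha256_of_stream(fileobj, chunk_size=1 << 20):
    """Hash a binary file object in chunks to keep memory usage low."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def lzma_matches_reference(lzma_path, reference_path):
    """Return True if the decompressed .lzma data equals the reference file.

    reference_path is assumed (hypothetically) to hold the uncompressed
    data extracted from the original .7z archive.
    """
    # FORMAT_ALONE selects the legacy .lzma container rather than .xz.
    with lzma.open(lzma_path, "rb", format=lzma.FORMAT_ALONE) as converted:
        converted_hash = sha256_of_stream(converted)
    with open(reference_path, "rb") as reference:
        reference_hash = sha256_of_stream(reference)
    return converted_hash == reference_hash
```

    Delete the original only if the function returns True.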


Q:  I have many .lzma files. Can I quickly convert them to the .xz format?

A:  For now, no. Since XZ Utils supports the .lzma format, it's usually
    not too bad to keep the old files in the old format. If you want to
    do the conversion anyway, you need to decompress the .lzma files and
    then recompress to the .xz format.

    Technically, there is a way to make the conversion relatively fast
    (roughly twice the time that normal decompression takes). Writing
    such a tool would take quite a bit of time though, and would probably
    be useful to only a few people. If you really want such a conversion
    tool, contact Lasse Collin and offer some money.
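
    The slow route (decompress, then recompress) could look roughly
    like the following Python sketch, which uses the standard lzma
    module. The function name, the paths and the default preset are
    assumptions made for illustration. This is not the fast conversion
    method mentioned above.

```python
import lzma
import shutil


def lzma_to_xz(src_path, dst_path, preset=6):
    """Recompress a legacy .lzma file into the .xz format.

    This is a full decompress-and-recompress pass, so it costs about
    as much time as compressing the data from scratch.
    """
    with lzma.open(src_path, "rb", format=lzma.FORMAT_ALONE) as src, \
         lzma.open(dst_path, "wb", format=lzma.FORMAT_XZ, preset=preset) as dst:
        # Stream in chunks so that large files do not need to fit in RAM.
        shutil.copyfileobj(src, dst, length=1 << 20)
```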


Q:  I have installed xz, but my tar doesn't recognize .tar.xz files.
    How can I extract .tar.xz files?

A:  xz -dc foo.tar.xz | tar xf -


Q:  Can I recover parts of a broken .xz file (e.g. a corrupted CD-R)?

A:  It may be possible if the file consists of multiple blocks, which
    typically is not the case if the file was created in single-threaded
    mode. There is no recovery program yet.


Q:  Is (some part of) XZ Utils patented?

A:  Lasse Collin is not aware of any patents that could affect XZ Utils.
    However, due to the nature of software patents, it's not possible to
    guarantee that XZ Utils isn't affected by any third party patent(s).


Q:  Where can I find documentation about the file format and algorithms?

A:  The .xz format is documented in xz-file-format.txt. It is a container
    format only, and doesn't include descriptions of any non-trivial
    filters.

    Documenting LZMA and LZMA2 is planned, but for now, there is no other
    documentation than the source code. Before you begin, you should know
    the basics of LZ77 and range-coding algorithms. LZMA is based on LZ77,
    but LZMA is a lot more complex. Range coding is used to compress
    the final bitstream like Huffman coding is used in Deflate.


Q:  I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A:  BCJ filter is called "x86" in liblzma. BCJ2 is not included,
    because it requires using more than one encoded output stream.
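
    As an illustration of the naming, Python's standard lzma module
    (a liblzma binding) exposes this filter as FILTER_X86. The sketch
    below builds a filter chain with the x86 BCJ filter followed by
    LZMA2. The constant and function names and the preset are chosen
    for the example only.

```python
import lzma

# Filter chain: the x86 BCJ filter first, then LZMA2 as the final filter.
X86_BCJ_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def compress_x86_code(data):
    """Compress data (e.g. an x86 executable) using x86 BCJ + LZMA2."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=X86_BCJ_CHAIN)
```

    The filter chain is stored in the .xz headers, so the decoder needs
    no extra options: lzma.decompress() restores the original data.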


Q:  I need to use a script that runs "xz -9". On a system with 256 MiB
    of RAM, xz says that it cannot allocate memory. Can I make the
    script work without modifying it?

A:  Set a default memory usage limit for compression. You can do it e.g.
    in a shell initialization script such as ~/.bashrc or /etc/profile:

        XZ_DEFAULTS=--memlimit-compress=150MiB
        export XZ_DEFAULTS

    xz will then scale the compression settings down so that the given
    memory usage limit is not reached. This way xz shouldn't run out
    of memory.

    Also check that memory-related resource limits are high enough.
    On most systems, "ulimit -a" will show the current resource limits.
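
    If editing a shell initialization script is not an option, the same
    environment variable can be set by whatever launches the script.
    The following Python sketch is one hypothetical way to do that. The
    function name is made up, and any XZ_DEFAULTS value already present
    in the environment is overwritten.

```python
import os
import subprocess


def run_with_xz_memlimit(command, limit="150MiB", **run_kwargs):
    """Run command with XZ_DEFAULTS set so that xz scales itself down.

    command is an argument list such as ["./backup.sh"] (a hypothetical
    script). Extra keyword arguments are passed on to subprocess.run().
    """
    env = dict(os.environ)
    # Overwrites any XZ_DEFAULTS value inherited from the parent process.
    env["XZ_DEFAULTS"] = f"--memlimit-compress={limit}"
    return subprocess.run(command, env=env, check=True, **run_kwargs)
```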


Q:  How do I create files that can be decompressed with XZ Embedded?

A:  See the documentation in XZ Embedded. In short, something like
    this is a good start:

        xz --check=crc32 --lzma2=preset=6e,dict=64KiB

    Or if a BCJ filter is needed too, e.g. if compressing
    a kernel image for PowerPC:

        xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

    Adjust the dictionary size to get a good compromise between
    compression ratio and decompressor memory usage. Note that
    in single-call decompression mode of XZ Embedded, a big
    dictionary doesn't increase memory usage.
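
    When the files are produced from a program instead of the xz
    command line, the same settings can be approximated with Python's
    standard lzma module. This is a sketch only: the function name is
    hypothetical, and the mapping of "preset=6e" to
    6 | lzma.PRESET_EXTREME is this example's reading of the options
    shown above.

```python
import lzma


def compress_for_xz_embedded(data, dict_size=64 * 1024, bcj_filter=None):
    """Compress data with a CRC32 check and a small LZMA2 dictionary.

    bcj_filter may be a BCJ filter ID such as lzma.FILTER_POWERPC;
    when given, it is placed before LZMA2 in the filter chain.
    """
    filters = []
    if bcj_filter is not None:
        filters.append({"id": bcj_filter})
    filters.append({
        "id": lzma.FILTER_LZMA2,
        "preset": 6 | lzma.PRESET_EXTREME,  # counterpart of preset=6e
        "dict_size": dict_size,             # counterpart of dict=64KiB
    })
    return lzma.compress(data, format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)
```

    For a PowerPC kernel image, the call would be
    compress_for_xz_embedded(image, bcj_filter=lzma.FILTER_POWERPC).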


Q:  How is multi-threaded compression implemented in XZ Utils?

A:  The simplest method is splitting the uncompressed data into blocks
    and compressing them in parallel, independently of each other.
    This is currently the only threading method supported in XZ Utils.
    Since the blocks are compressed independently, they can also be
    decompressed independently. Together with the index feature in .xz,
    this allows using threads to create .xz files for random-access
    reading. This also makes threaded decompression possible.
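
    The idea of independent blocks can be imitated in a few lines of
    Python. Note the differences from what xz does: Python's lzma
    module cannot write several blocks into one stream, so this sketch
    compresses each chunk into its own complete .xz stream and
    concatenates the streams. The function names, chunk size and worker
    count are assumptions for the example. CPython's lzma module is
    expected to release the global interpreter lock while compressing,
    which is what lets the threads run in parallel; verify that on your
    interpreter.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_chunk(chunk):
    """Compress one chunk into a complete, self-contained .xz stream."""
    return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=6)


def parallel_compress(data, chunk_size=24 * 1024 * 1024, workers=4):
    """Split data into chunks and compress them independently in threads."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the output stays in sequence.
        return b"".join(pool.map(compress_chunk, chunks))
```

    Concatenated streams decode as one file: lzma.decompress() and
    "xz -d" both return the joined contents of all the streams.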

    The independent blocks method has a couple of disadvantages too. It
    will compress worse than a single-block method. Often the difference
    is not too big (maybe 1-2 %) but sometimes it can be too big. Also,
    the memory usage of the compressor increases linearly when adding
    threads.

    At least two other threading methods are possible but these haven't
    been implemented in XZ Utils:

    Match finder parallelization has been in 7-Zip for ages. It doesn't
    affect compression ratio or memory usage significantly. Among the
    three threading methods, only this is useful when compressing small
    files (files that are not significantly bigger than the dictionary).
    Unfortunately this method scales only to about two CPU cores.

    The third method is pigz-style threading (I use that name, because
    pigz <https://www.zlib.net/pigz/> uses that method). It doesn't
    affect compression ratio significantly and scales to many cores.
    The memory usage scales linearly when threads are added. This isn't
    significant with pigz, because Deflate uses only a 32 KiB dictionary,
    but with LZMA2 the memory usage will increase dramatically just like
    with the independent-blocks method. There is also a constant
    computational overhead, which may make the pigz-style method less
    attractive on dual-core systems compared to the parallel match
    finder method, but with more cores the overhead is not a big deal
    anymore.

    Combining the threading methods will be possible and also useful.
    For example, combining match finder parallelization with pigz-style
    threading or independent-blocks-threading can cut the memory usage
    by 50 %.


Q:  I told xz to use many threads but it is using only one or two
    processor cores. What is wrong?

A:  Since multi-threaded compression is done by splitting the data into
    blocks that are compressed individually, if the input file is too
    small for the block size, then many threads cannot be used. The
    default block size increases when the compression level is
    increased. For example, xz -6 uses an 8 MiB LZMA2 dictionary and
    24 MiB blocks, and xz -9 uses a 64 MiB LZMA2 dictionary and 192 MiB
    blocks. If the input file is 100 MiB, xz -6 can use five threads
    of which one will finish quickly as it has only 4 MiB to compress.
    However, for the same file, xz -9 can only use one thread.
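
    The arithmetic behind these numbers is a ceiling division of the
    input size by the block size. A small Python sketch (the function
    name is made up; the block sizes are the ones quoted above):

```python
MIB = 1024 * 1024


def usable_threads(input_size, block_size, requested_threads):
    """Return (threads that get a block, size of the last block)."""
    if input_size <= 0:
        return 0, 0
    # Ceiling division without floating point.
    blocks = -(-input_size // block_size)
    last_block = input_size - (blocks - 1) * block_size
    return min(blocks, requested_threads), last_block


# 100 MiB input with xz -6 (24 MiB blocks): five blocks, the last is 4 MiB.
print(usable_threads(100 * MIB, 24 * MIB, 8))    # (5, 4194304)
# Same input with xz -9 (192 MiB blocks): a single block, so one thread.
print(usable_threads(100 * MIB, 192 * MIB, 8))   # (1, 104857600)
```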

    One can adjust the block size with --block-size=SIZE but making the
    block size smaller than the LZMA2 dictionary is a waste of RAM:
    using xz -9 with 6 MiB blocks isn't any better than using xz -6 with
    6 MiB blocks. The default settings use a block size bigger than
    the LZMA2 dictionary size because this was seen as a reasonable
    compromise between RAM usage and compression ratio.

    When decompressing, the ability to use threads depends on how the
    file was created. If it was created in multi-threaded mode then
    it can be decompressed in multi-threaded mode too if there are
    multiple blocks in the file.


Q:  How do I build a program that needs liblzmadec (lzmadec.h)?

A:  liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
    liblzmadec. The code using liblzmadec should be ported to use
    liblzma instead. If you cannot or don't want to do that, download
    LZMA Utils from <https://tukaani.org/lzma/>.


Q:  The default build of liblzma is too big. How can I make it smaller?

A:  Give --enable-small to the configure script. Also use appropriate
    --enable or --disable options to include only those filter encoders
    and decoders and integrity checks that you actually need. Use
    CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
    for size. See INSTALL for information about configure options.

    If the result is still too big, take a look at XZ Embedded. It is
    a separate project, which provides a limited but significantly
    smaller XZ decoder implementation than XZ Utils. You can find it
    at <https://tukaani.org/xz/embedded.html>.