XZ Utils FAQ
============

Q:  What do the letters XZ mean?

A:  Nothing. They are just two letters, which come from the file format
    suffix .xz. The .xz suffix was selected, because it seemed to be
    pretty much unused. It has no deeper meaning.


Q:  What are LZMA and LZMA2?

A:  LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name
    of the compression algorithm designed by Igor Pavlov for 7-Zip.
    LZMA is based on LZ77 and range encoding.

    LZMA2 is an updated version of the original LZMA that fixes a couple
    of practical issues. In the context of XZ Utils, LZMA is called LZMA1
    to emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
    primary compression algorithm in the .xz file format.


Q:  There are many LZMA related projects. How does XZ Utils relate to them?

A:  7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
    a subset of the 7-Zip source tree.

    p7zip is 7-Zip's command-line tools ported to POSIX-like systems.

    LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
    LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
    LZMA Utils.

    There are several other projects using LZMA. Most are more or less
    based on LZMA SDK. See .


Q:  Why is liblzma named liblzma if its primary file format is .xz?
    Shouldn't it be e.g. libxz?

A:  When the design of the .xz format began, the idea was to replace
    the .lzma format and use the same .lzma suffix. It would have been
    quite OK to reuse the suffix when there were very few .lzma files
    around.
    However, the old .lzma format became popular before the
    new format was finished. The new format was renamed to .xz but the
    name of liblzma wasn't changed.


Q:  Do XZ Utils support the .7z format?

A:  No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
    files.


Q:  I have many .tar.7z files. Can I convert them to .tar.xz without
    spending hours recompressing the data?

A:  In the "extra" directory, there is a script named 7z2lzma.bash which
    is able to convert some .7z files to the .lzma format (not .xz). It
    needs the 7za (or 7z) command from p7zip. The script may silently
    produce corrupt output if certain assumptions are not met, so
    decompress the resulting .lzma file and compare it against the
    original before deleting the original file!


Q:  I have many .lzma files. Can I quickly convert them to the .xz format?

A:  For now, no. Since XZ Utils supports the .lzma format, it's usually
    not too bad to keep the old files in the old format. If you want to
    do the conversion anyway, you need to decompress the .lzma files and
    then recompress to the .xz format.

    Technically, there is a way to make the conversion relatively fast
    (roughly twice the time that normal decompression takes). Writing
    such a tool would take quite a bit of time though, and would probably
    be useful to only a few people. If you really want such a conversion
    tool, contact Lasse Collin and offer some money.


Q:  I have installed xz, but my tar doesn't recognize .tar.xz files.
    How can I extract .tar.xz files?

A:  xz -dc foo.tar.xz | tar xf -


Q:  Can I recover parts of a broken .xz file (e.g. a corrupted CD-R)?

A:  It may be possible if the file consists of multiple blocks, which
    typically is not the case if the file was created in single-threaded
    mode. There is no recovery program yet.


Q:  Is (some part of) XZ Utils patented?

A:  Lasse Collin is not aware of any patents that could affect XZ Utils.
    However, due to the nature of software patents, it's not possible to
    guarantee that XZ Utils isn't affected by any third party patent(s).


Q:  Where can I find documentation about the file format and algorithms?

A:  The .xz format is documented in xz-file-format.txt. It is a container
    format only, and doesn't include descriptions of any non-trivial
    filters.

    Documenting LZMA and LZMA2 is planned, but for now, there is no other
    documentation than the source code. Before you begin, you should know
    the basics of LZ77 and range-coding algorithms. LZMA is based on LZ77,
    but LZMA is a lot more complex. Range coding is used to compress
    the final bitstream like Huffman coding is used in Deflate.


Q:  I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A:  The BCJ filter is called "x86" in liblzma. BCJ2 is not included,
    because it requires using more than one encoded output stream.


Q:  I need to use a script that runs "xz -9". On a system with 256 MiB
    of RAM, xz says that it cannot allocate memory. Can I make the
    script work without modifying it?

A:  Set a default memory usage limit for compression. You can do it e.g.
    in a shell initialization script such as ~/.bashrc or /etc/profile:

        XZ_DEFAULTS=--memlimit-compress=150MiB
        export XZ_DEFAULTS

    xz will then scale the compression settings down so that the given
    memory usage limit is not reached. This way xz shouldn't run out
    of memory.

    Also check that memory-related resource limits are high enough.
    On most systems, "ulimit -a" will show the current resource limits.


Q:  How do I create files that can be decompressed with XZ Embedded?

A:  See the documentation in XZ Embedded. In short, something like
    this is a good start:

        xz --check=crc32 --lzma2=preset=6e,dict=64KiB

    Or if a BCJ filter is needed too, e.g. if compressing
    a kernel image for PowerPC:

        xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

    Adjust the dictionary size to get a good compromise between
    compression ratio and decompressor memory usage. Note that
    in the single-call decompression mode of XZ Embedded, a big
    dictionary doesn't increase memory usage.


Q:  How is multi-threaded compression implemented in XZ Utils?

A:  The simplest method is splitting the uncompressed data into blocks
    and compressing them in parallel independently of each other.
    This is currently the only threading method supported in XZ Utils.
    Since the blocks are compressed independently, they can also be
    decompressed independently. Together with the index feature in .xz,
    this allows using threads to create .xz files for random-access
    reading. This also makes threaded decompression possible.

    The independent blocks method has a couple of disadvantages too.
    It will compress worse than a single-block method. Often the difference
    is not too big (maybe 1-2 %) but sometimes it can be significant. Also,
    the memory usage of the compressor increases linearly when adding
    threads.

    At least two other threading methods are possible but these haven't
    been implemented in XZ Utils:

    Match finder parallelization has been in 7-Zip for ages. It doesn't
    affect compression ratio or memory usage significantly. Among the
    three threading methods, only this is useful when compressing small
    files (files that are not significantly bigger than the dictionary).
    Unfortunately this method scales only to about two CPU cores.

    The third method is pigz-style threading (I use that name, because
    pigz uses that method). It doesn't affect compression ratio
    significantly and scales to many cores. The memory usage scales
    linearly when threads are added. This isn't significant with pigz,
    because Deflate uses only a 32 KiB dictionary, but with LZMA2 the
    memory usage will increase dramatically just like with the
    independent-blocks method. There is also a constant computational
    overhead, which may make the pigz method less attractive on
    dual-core systems compared to the parallel match finder method, but
    with more cores the overhead is not a big deal anymore.

    Combining the threading methods will be possible and also useful.
    For example, combining match finder parallelization with pigz-style
    threading or independent-blocks-threading can cut the memory usage
    by 50 %.


Q:  I told xz to use many threads but it is using only one or two
    processor cores. What is wrong?

A:  Since multi-threaded compression is done by splitting the data into
    blocks that are compressed individually, if the input file is too
    small for the block size, then many threads cannot be used. The
    default block size increases when the compression level is
    increased. For example, xz -6 uses an 8 MiB LZMA2 dictionary and
    24 MiB blocks, and xz -9 uses a 64 MiB LZMA2 dictionary and 192 MiB
    blocks. If the input file is 100 MiB, xz -6 can use five threads
    of which one will finish quickly as it has only 4 MiB to compress.
    However, for the same file, xz -9 can only use one thread.

    One can adjust block size with --block-size=SIZE but making the
    block size smaller than the LZMA2 dictionary is a waste of RAM: using
    xz -9 with 6 MiB blocks isn't any better than using xz -6 with
    6 MiB blocks. The default settings use a block size bigger than
    the LZMA2 dictionary size because this was seen as a reasonable
    compromise between RAM usage and compression ratio.

    When decompressing, the ability to use threads depends on how the
    file was created. If it was created in multi-threaded mode then
    it can be decompressed in multi-threaded mode too if there are
    multiple blocks in the file.


Q:  How do I build a program that needs liblzmadec (lzmadec.h)?

A:  liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
    liblzmadec. The code using liblzmadec should be ported to use
    liblzma instead. If you cannot or don't want to do that, download
    LZMA Utils from .


Q:  The default build of liblzma is too big. How can I make it smaller?

A:  Give --enable-small to the configure script.
    Also use the appropriate --enable or --disable options to include
    only those filter encoders and decoders and integrity checks that
    you actually need. Use CFLAGS=-Os (with GCC) or equivalent to tell
    your compiler to optimize for size. See INSTALL for information
    about configure options.

    If the result is still too big, take a look at XZ Embedded. It is
    a separate project, which provides a limited but significantly
    smaller XZ decoder implementation than XZ Utils. You can find it
    at .
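

Worked example for the question about xz using only one or two cores:
the block arithmetic from that answer can be sketched in plain shell.
This is only an illustration, not something shipped with XZ Utils. The
function name blocks_needed is made up here, sizes are whole MiB, and
the sketch assumes that each block keeps at most one thread busy, as
that answer describes.

```shell
#!/bin/sh
# Hypothetical helper (not part of XZ Utils): how many blocks, and thus at
# most how many compression threads, an input of a given size yields.
# Both arguments are whole MiB.
blocks_needed() {
    file_mib=$1
    block_mib=$2
    # Ceiling division: a partial final block still counts as a block.
    echo $(( (file_mib + block_mib - 1) / block_mib ))
}

# 100 MiB input with the block sizes quoted for xz -6 and xz -9.
echo "xz -6: $(blocks_needed 100 24) block(s)"
echo "xz -9: $(blocks_needed 100 192) block(s)"
```

For the 100 MiB file this gives five blocks with 24 MiB blocks and one
block with 192 MiB blocks, matching the thread counts in that answer.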