The .lzma File Format
=====================

        0. Preface
          0.1. Notices and Acknowledgements
          0.2. Changes
        1. File Format
          1.1. Header
            1.1.1. Properties
            1.1.2. Dictionary Size
            1.1.3. Uncompressed Size
          1.2. LZMA Compressed Data
        2. References


0. Preface

        This document describes the .lzma file format, which is
        sometimes also called LZMA_Alone format. It is a legacy file
        format, which is being or has been replaced by the .xz format.
        The MIME type of the .lzma format is `application/x-lzma'.

        The most commonly used software packages for handling .lzma
        files are LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This
        document describes some of the differences between these
        implementations and gives hints about which subset of the
        .lzma format is the most portable.


0.1. Notices and Acknowledgements

        This file format was designed by Igor Pavlov for use in
        LZMA SDK. This document was written by Lasse Collin
        using the documentation found in the LZMA SDK.

        This document has been put into the public domain.


0.2. Changes

        Last modified: 2024-04-08 17:35+0300

        From version 2011-04-12 11:55+0300 to 2022-07-13 21:00+0300:
        Section 1.1.3 was modified to allow End of Payload Marker
        with a known Uncompressed Size.


1. File Format

        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
        |         Header          |   LZMA Compressed Data   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

        A .lzma file consists of a 13-byte Header followed by
        the LZMA Compressed Data.

        Unlike the .gz, .bz2, and .xz formats, it is not possible to
        concatenate multiple .lzma files as is and expect the
        decompression tool to decode the resulting file as if it were
        a single .lzma file.

        For example, the command line tools from LZMA Utils and
        LZMA SDK silently ignore all the data after the first .lzma
        stream. In contrast, the command line tool from XZ Utils
        considers the .lzma file to be corrupt if there is data after
        the first .lzma stream.


1.1. Header

        +------------+----+----+----+----+--+--+--+--+--+--+--+--+
        | Properties |  Dictionary Size  |   Uncompressed Size   |
        +------------+----+----+----+----+--+--+--+--+--+--+--+--+


1.1.1. Properties

        The Properties field contains three properties. An abbreviation
        is given in parentheses, followed by the value range of the
        property. The field consists of

            1) the number of literal context bits (lc, [0, 8]);
            2) the number of literal position bits (lp, [0, 4]); and
            3) the number of position bits (pb, [0, 4]).

        The properties are encoded using the following formula:

            Properties = (pb * 5 + lp) * 9 + lc

        The following C code illustrates a straightforward way to
        decode the Properties field:

            uint8_t lc, lp, pb;
            uint8_t prop = get_lzma_properties();

            /* Largest valid value: pb = 4, lp = 4, lc = 8. */
            if (prop > (4 * 5 + 4) * 9 + 8)
                return LZMA_PROPERTIES_ERROR;

            /* Undo the formula from the outermost term inwards. */
            pb = prop / (9 * 5);
            prop -= pb * 9 * 5;
            lp = prop / 9;
            lc = prop - lp * 9;

        XZ Utils has an additional requirement: lc + lp <= 4. Files
        which don't follow this requirement cannot be decompressed
        with XZ Utils. Usually this isn't a problem since the most
        common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
        combination that the files created by LZMA Utils can have,
        but LZMA Utils can decompress files with any lc/lp/pb.


1.1.2. Dictionary Size

        Dictionary Size is stored as an unsigned 32-bit little endian
        integer. Any 32-bit value is possible, but for maximum
        portability, only sizes of 2^n and 2^n + 2^(n-1) should be
        used.

        LZMA Utils creates only files with dictionary size 2^n,
        16 <= n <= 25. LZMA Utils can decompress files with any
        dictionary size.

        XZ Utils creates and decompresses .lzma files only with
        dictionary sizes 2^n and 2^n + 2^(n-1). If some other
        dictionary size is specified when compressing, the value
        stored in the Dictionary Size field is rounded up, but the
        specified value is still used in the actual compression code.


1.1.3. Uncompressed Size

        Uncompressed Size is stored as an unsigned 64-bit little endian
        integer.
        A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
        that Uncompressed Size is unknown. End of Payload Marker (*)
        is used if Uncompressed Size is unknown. End of Payload Marker
        is allowed but rarely used if Uncompressed Size is known.
        XZ Utils 5.2.5 and older don't support .lzma files that have
        End of Payload Marker together with a known Uncompressed Size.

        XZ Utils rejects files whose Uncompressed Size field specifies
        a known size that is 256 GiB or more. This is to reject false
        positives when trying to guess if the input file is in the
        .lzma format. When Uncompressed Size is unknown, there is no
        limit for the uncompressed size of the file.

        (*) Some tools use the term End of Stream (EOS) marker
            instead of End of Payload Marker.


1.2. LZMA Compressed Data

        A detailed description of the format of this field is outside
        the scope of this document.


2. References

        LZMA SDK - The original LZMA implementation
        https://7-zip.org/sdk.html

        7-Zip
        https://7-zip.org/

        LZMA Utils - LZMA adapted to POSIX-like systems
        https://tukaani.org/lzma/

        XZ Utils - The next generation of LZMA Utils
        https://tukaani.org/xz/

        The .xz file format - The successor of the .lzma format
        https://tukaani.org/xz/xz-file-format.txt
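

        As an informal illustration of sections 1.1.1 to 1.1.3, the
        following C sketch splits a 13-byte Header into its fields and
        checks the portability restrictions described in this document.
        The struct, macro, and function names are hypothetical and are
        not taken from LZMA SDK, LZMA Utils, or XZ Utils. The checks
        only model the restrictions listed above; they are not a
        description of how any of those tools is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13
#define LZMA_SIZE_UNKNOWN UINT64_MAX  /* 0xFFFF_FFFF_FFFF_FFFF */

/* Hypothetical container for the decoded Header fields. */
struct lzma_alone_header {
    uint8_t lc, lp, pb;
    uint32_t dict_size;
    uint64_t uncompressed_size;  /* LZMA_SIZE_UNKNOWN if unknown */
};

/* Assemble little endian integers byte by byte so that the
 * result does not depend on the byte order of the host. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
            | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_le64(const uint8_t *p)
{
    return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Decode the Header. Returns false if the Properties byte is
 * outside the range allowed by the formula in section 1.1.1. */
static bool parse_header(const uint8_t buf[LZMA_ALONE_HEADER_SIZE],
        struct lzma_alone_header *h)
{
    assert(buf != NULL && h != NULL);

    uint8_t prop = buf[0];
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;

    h->pb = prop / (9 * 5);
    prop -= h->pb * 9 * 5;
    h->lp = prop / 9;
    h->lc = prop - h->lp * 9;

    h->dict_size = read_le32(buf + 1);
    h->uncompressed_size = read_le64(buf + 5);
    return true;
}

/* True if d is 2^n or 2^n + 2^(n-1). */
static bool is_portable_dict_size(uint32_t d)
{
    if (d == 0)
        return false;

    /* 2^n: exactly one bit is set. */
    if ((d & (d - 1)) == 0)
        return true;

    /* 2^n + 2^(n-1): exactly two adjacent bits are set. */
    uint32_t low = d & (~d + 1);  /* lowest set bit */
    return d - low == low << 1;
}

/* True if the Header stays within the restrictions that this
 * document attributes to XZ Utils. */
static bool follows_portability_hints(const struct lzma_alone_header *h)
{
    if (h->lc + h->lp > 4)
        return false;

    if (!is_portable_dict_size(h->dict_size))
        return false;

    /* A known size of 256 GiB (2^38 bytes) or more is rejected. */
    if (h->uncompressed_size != LZMA_SIZE_UNKNOWN
            && h->uncompressed_size >= (UINT64_C(1) << 38))
        return false;

    return true;
}
```

        For example, the common lc/lp/pb combination 3/0/2 is encoded
        as (2 * 5 + 0) * 9 + 3 = 93 (0x5D), so a Header that starts
        with the byte 0x5D decodes back to lc = 3, lp = 0, and pb = 2.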