@xeredo/lzma-native

Provides bindings to the native liblzma library (.xz file format, among others)

Node.js interface to the native liblzma compression library (.xz file format, among others)

This package provides interfaces for compression and decompression of .xz (and legacy .lzma) files, both stream-based and string-based.

NOTE: This is a modified version that links directly against the system's liblzma package. It only works on Linux with both pkg-config and liblzma-dev installed.

Example usage

Installation

Simply install lzma-native via npm:

$ npm install --save lzma-native

Note: As of version 1.0.0, this module provides pre-built binaries for multiple Node.js versions and all major operating systems using node-pre-gyp, so for 99% of users no compiler toolchain is necessary. Please create an issue here if you have any trouble installing this module.

Note: lzma-native@2.x requires Node.js >= 4. If you need to support Node 0.10 or 0.12, use lzma-native@1.x.

For streams

If you don’t have any fancy requirements, using this library is quite simple:

var fs = require('fs');
var lzma = require('lzma-native');

// Create a compressing stream and pipe a file through it.
var compressor = lzma.createCompressor();
var input = fs.createReadStream('README.md');
var output = fs.createWriteStream('README.md.xz');

input.pipe(compressor).pipe(output);

For decompression, you can simply use lzma.createDecompressor().

Both functions return a stream where you can pipe your input in and read your (de)compressed output from.
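As a sketch of the decompression direction, the helper below pipes a source stream through lzma.createDecompressor() into a destination and surfaces errors from any stage. The name decompressInto and the choice to pass the module in as the lzma argument are made up for this illustration and are not part of the library.

```javascript
var stream = require('stream');
var util = require('util');

// stream.pipeline (Node >= 10) forwards errors from every stage and
// destroys the remaining streams on failure, unlike a bare .pipe() chain.
var pipeline = util.promisify(stream.pipeline);

// Hypothetical helper: `lzma` is the module object, e.g. require('lzma-native').
function decompressInto(lzma, input, output) {
  return pipeline(input, lzma.createDecompressor(), output);
}

// Usage (assumes README.md.xz exists):
//   var fs = require('fs');
//   decompressInto(require('lzma-native'),
//                  fs.createReadStream('README.md.xz'),
//                  fs.createWriteStream('README.md'))
//     .then(function() { console.log('done'); })
//     .catch(function(err) { console.error('decompression failed', err); });
```

Returning a promise keeps the error handling in one place, which plain .pipe() chains do not provide.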

For simple strings/Buffers

If you want your input/output to be Buffers (strings will be accepted as input), this even gets a little simpler:

lzma.compress('Banana', function(result) {
    console.log(result); // <Buffer fd 37 7a 58 5a 00 00 01 69 22 de 36 02 00 21 ...>
});

Again, replace lzma.compress with lzma.decompress and you’ll get the inverse transformation.

lzma.compress() and lzma.decompress() also return promises, so providing a callback is optional (see the promise example in the API reference for compress()).

API

Compatibility implementations

Apart from the API described here, lzma-native implements the APIs of the following other LZMA libraries so you can use it nearly as a drop-in replacement:

  • node-xz via lzma.Compressor and lzma.Decompressor
  • LZMA-JS via lzma.LZMA().compress and lzma.LZMA().decompress, though without actual support for progress functions and returning Buffer objects instead of integer arrays. (This produces output in the .lzma file format, not the .xz format!)

Multi-threaded encoding

Since version 1.5.0, lzma-native supports liblzma’s built-in multi-threaded encoding. To make use of it, set the threads option to an integer value: lzma.createCompressor({ threads: n });. Use a value of 0 to use the number of processor cores. This option is only available for the easyEncoder (the default) and streamEncoder encoders.

Note that, by default, encoding will take place in Node’s libuv thread pool regardless of this option, and setting it when multiple encoders are running is likely to affect performance negatively.
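The trade-off described above can be captured in a small helper that decides whether to set the threads option at all. This is only a sketch: the function name threadOptions and the "more than one encoder" rule are assumptions made for illustration, not something the library prescribes.

```javascript
// Hypothetical helper: pick compressor options based on how many
// encoders the application expects to run at the same time.
function threadOptions(concurrentEncoders) {
  if (concurrentEncoders > 1) {
    // Several encoders already share the libuv thread pool, so the
    // threads option is left unset.
    return {};
  }
  // A single encoder: 0 asks for as many threads as there are processor cores.
  return { threads: 0 };
}

// Usage:
//   var lzma = require('lzma-native');
//   var compressor = lzma.createCompressor(threadOptions(1));
```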

Reference

Encoding strings and Buffer objects

Creating streams for encoding

.xz file metadata

Miscellaneous functions

Encoding strings and Buffer objects

lzma.compress(), lzma.decompress()

  • lzma.compress(string, [opt, ]on_finish)
  • lzma.decompress(string, [opt, ]on_finish)
Param | Type | Description
string | Buffer / String | Any string or buffer to be (de)compressed (that can be passed to stream.end(…))
[opt] | Options / int | Optional. See options
on_finish | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as on_finish(null, err) in case of an error.

These methods will also return a promise that you can use directly.

Example code:

lzma.compress('Bananas', 6, function(result) {
    lzma.decompress(result, function(decompressedResult) {
        assert.equal(decompressedResult.toString(), 'Bananas');
    });
});

Example code for promises:

lzma.compress('Bananas', 6).then(function(result) {
    return lzma.decompress(result);
}).then(function(decompressedResult) {
    assert.equal(decompressedResult.toString(), 'Bananas');
}).catch(function(err) {
    // ...
});

lzma.LZMA().compress(), lzma.LZMA().decompress()

  • lzma.LZMA().compress(string, mode, on_finish[, on_progress])
  • lzma.LZMA().decompress(string, on_finish[, on_progress])

(Compatibility; See LZMA-JS for the original specs.)

Note that the result of compression is in the older LZMA1 format (.lzma files). This is different from the more universally used LZMA2 format (.xz files) and you will have to take care of possible compatibility issues with systems expecting .xz files.

Param | Type | Description
string | Buffer / String / Array | Any string, buffer, or array of integers or typed integers (e.g. Uint8Array)
mode | int | A number between 0 and 9, indicating compression level
on_finish | Callback | Will be invoked with the resulting Buffer as the first parameter when encoding is finished, and as on_finish(null, err) in case of an error.
on_progress | Callback | Indicates progress by passing a number in [0.0, 1.0]. Currently, this package only invokes the callback with 0.0 and 1.0.

These methods will also return a promise that you can use directly.

This does not work exactly as described in the original LZMA-JS specification:

  • The results are Buffer objects, not integer arrays. This just makes a lot more sense in a Node.js environment.
  • on_progress is currently only called with 0.0 and 1.0.

Example code:

lzma.LZMA().compress('Bananas', 4, function(result) {
    lzma.LZMA().decompress(result, function(decompressedResult) {
        assert.equal(decompressedResult.toString(), 'Bananas');
    });
});

For an example using promises, see compress().

Creating streams for encoding

lzma.createCompressor(), lzma.createDecompressor()

  • lzma.createCompressor([options])
  • lzma.createDecompressor([options])
Param | Type | Description
[options] | Options / int | Optional. See options

Returns a duplex stream, i.e. a stream that is both readable and writable. Input will be read, (de)compressed and written out. You can pipe input through this stream; for example, to mimic the xz command line utility, you can write:

var compressor = lzma.createCompressor();

process.stdin.pipe(compressor).pipe(process.stdout);

The output of compression will be in LZMA2 format (.xz files), while decompression will accept either format via automatic detection.

lzma.Compressor(), lzma.Decompressor()

  • lzma.Compressor([preset], [options])
  • lzma.Decompressor([options])

(Compatibility; See node-xz for the original specs.)

These methods handle the .xz file format.

Param | Type | Description
[preset] | int | Optional. See options.preset
[options] | Options | Optional. See options

Returns a duplex stream, i.e. a stream that is both readable and writable. Input will be read, (de)compressed and written out. You can pipe input through this stream; for example, to mimic the xz command line utility, you can write:

var compressor = lzma.Compressor();

process.stdin.pipe(compressor).pipe(process.stdout);

lzma.createStream()

  • lzma.createStream([coder], [options])
Param | Type | Description
[coder] | string | Optional. Any of the supported coder names, e.g. "easyEncoder" (default) or "autoDecoder".
[options] | Options / int | Optional. See options

Return a duplex stream for (de-)compression. You can use this to pipe input through this stream.

The available coders are (the most interesting ones first):

Less likely to be of interest to you, but also available:

  • aloneDecoder Decoder which only uses the legacy .lzma format. Supports the options.memlimit option.
  • rawEncoder Custom encoder corresponding to lzma_raw_encoder (See the native library docs for details). Supports the options.filters option.
  • rawDecoder Custom decoder corresponding to lzma_raw_decoder (See the native library docs for details). Supports the options.filters option.
  • streamEncoder Custom encoder corresponding to lzma_stream_encoder (See the native library docs for details). Supports options.filters and options.check options.
  • streamDecoder Custom decoder corresponding to lzma_stream_decoder (See the native library docs for details). Supports options.memlimit and options.flags options.

Options

Option name | Type | Description
check | check | Any of lzma.CHECK_CRC32, lzma.CHECK_CRC64, lzma.CHECK_NONE, lzma.CHECK_SHA256
memlimit | float | A memory limit for (de-)compression in bytes
preset | int | A number from 0 to 9, 0 being the fastest and weakest compression, 9 the slowest and highest compression level. (Please also see the xz(1) manpage for notes – don’t just blindly use 9!) You can also OR this with lzma.PRESET_EXTREME (the -e option to the xz command line utility).
flags | int | A bitwise or of lzma.LZMA_TELL_NO_CHECK, lzma.LZMA_TELL_UNSUPPORTED_CHECK, lzma.LZMA_TELL_ANY_CHECK, lzma.LZMA_CONCATENATED
synchronous | bool | If true, forces synchronous coding (i.e. no usage of threading)
bufsize | int | The default size for allocated buffers
threads | int | Set to an integer to use liblzma’s multi-threading support. 0 will choose the number of CPU cores.
blockSize | int | Maximum uncompressed size of a block in multi-threading mode
timeout | int | Timeout for a single encoding operation in multi-threading mode

options.filters can, if the coder supports it, be an array of filter objects, each with the following properties:

  • .id Any of lzma.FILTERS_MAX, lzma.FILTER_ARM, lzma.FILTER_ARMTHUMB, lzma.FILTER_IA64, lzma.FILTER_POWERPC, lzma.FILTER_SPARC, lzma.FILTER_X86 or lzma.FILTER_DELTA, lzma.FILTER_LZMA1, lzma.FILTER_LZMA2

The delta filter supports the additional option .dist for a distance between bytes (see the xz(1) manpage).

The LZMA filter supports the additional options .dict_size, .lp, .lc, .pb, .mode, .nice_len, .mf, .depth and .preset. See the xz(1) manpage for the meaning of these parameters and additional information.
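To make the shape of options.filters concrete, here is a minimal sketch of a two-stage chain: a delta filter followed by LZMA2. The helper name deltaLzma2Filters is hypothetical, and passing the module as the lzma argument is a choice made for this illustration; the specific dist and preset values in the usage comment are arbitrary.

```javascript
// Hypothetical helper: build a filter array for coders that support
// options.filters. `lzma` is the module object, e.g. require('lzma-native').
function deltaLzma2Filters(lzma, dist, preset) {
  return [
    // Delta filter with its additional .dist option.
    { id: lzma.FILTER_DELTA, dist: dist },
    // The compressing LZMA2 filter goes last in the chain.
    { id: lzma.FILTER_LZMA2, preset: preset }
  ];
}

// Usage (sketch):
//   var lzma = require('lzma-native');
//   var encoder = lzma.createStream('streamEncoder', {
//     filters: deltaLzma2Filters(lzma, 4, 6),
//     check: lzma.CHECK_CRC32
//   });
```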

Miscellaneous functions

lzma.crc32()

  • lzma.crc32(input[, encoding[, previous]])

Compute the CRC32 checksum of a Buffer or string.

Param | Type | Description
input | string / Buffer | Any string or Buffer.
[encoding] | string | Optional. If input is a string, an encoding to use when converting into binary.
[previous] | int | Optional. The result of a previous CRC32 calculation, so that a checksum can be computed chunk by chunk.

Example usage:

lzma.crc32('Banana') // => 69690105
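The previous parameter allows a checksum to be accumulated over several chunks. The sketch below assumes that an initial value of 0 is appropriate for the first chunk and that encoding may be passed as undefined for Buffers; the helper name crc32OfChunks is made up for this example.

```javascript
// Hypothetical helper: checksum a sequence of Buffers without
// concatenating them. `lzma` is the module object.
function crc32OfChunks(lzma, chunks) {
  var crc = 0; // assumed starting value for the first chunk
  chunks.forEach(function(chunk) {
    // encoding is left undefined because the chunks are Buffers
    crc = lzma.crc32(chunk, undefined, crc);
  });
  return crc;
}

// Usage (sketch):
//   var lzma = require('lzma-native');
//   crc32OfChunks(lzma, [Buffer.from('Ban'), Buffer.from('ana')]);
```

Under these assumptions the result should match lzma.crc32() applied to the concatenated input.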

lzma.checkSize()

  • lzma.checkSize(check)

Returns the size of a checksum in bytes.

Param | Type | Description
check | check | Any supported check constant.

Example usage:

lzma.checkSize(lzma.CHECK_SHA256) // => 32
lzma.checkSize(lzma.CHECK_CRC32)  // => 4

lzma.easyDecoderMemusage()

  • lzma.easyDecoderMemusage(preset)

Returns the approximate memory usage when decoding using easyDecoder for a given preset.

Param | Type | Description
preset | preset | A compression level from 0 to 9

Example usage:

lzma.easyDecoderMemusage(6) // => 8454192

lzma.easyEncoderMemusage()

  • lzma.easyEncoderMemusage(preset)

Returns the approximate memory usage when encoding using easyEncoder for a given preset.

Param | Type | Description
preset | preset | A compression level from 0 to 9

Example usage:

lzma.easyEncoderMemusage(6) // => 97620499
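One possible use of this estimate is to choose a preset that fits a memory budget. The following is a sketch only; the helper name highestPresetWithin and the idea of passing the module as the lzma argument are assumptions for illustration.

```javascript
// Hypothetical helper: highest preset (0-9) whose estimated encoder
// memory usage stays within `budgetBytes`, or -1 if none fits.
function highestPresetWithin(lzma, budgetBytes) {
  for (var preset = 9; preset >= 0; preset--) {
    if (lzma.easyEncoderMemusage(preset) <= budgetBytes) {
      return preset;
    }
  }
  return -1;
}

// Usage (sketch):
//   var lzma = require('lzma-native');
//   var preset = highestPresetWithin(lzma, 64 * 1024 * 1024);
//   if (preset >= 0) { var compressor = lzma.createCompressor(preset); }
```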

lzma.rawDecoderMemusage()

  • lzma.rawDecoderMemusage(filters)

Returns the approximate memory usage when decoding using rawDecoder for a given filter list.

Param | Type | Description
filters | array | An array of filters

lzma.rawEncoderMemusage()

  • lzma.rawEncoderMemusage(filters)

Returns the approximate memory usage when encoding using rawEncoder for a given filter list.

Param | Type | Description
filters | array | An array of filters

lzma.versionString()

  • lzma.versionString()

Returns the version of the underlying C library as a string.

Example usage:

lzma.versionString() // => '5.2.3'

lzma.versionNumber()

  • lzma.versionNumber()

Returns the version of the underlying C library as an integer.

Example usage:

lzma.versionNumber() // => 50020012
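The integer can be split back into its components. The sketch below assumes the value follows liblzma's LZMA_VERSION layout (major * 10000000 + minor * 10000 + patch * 10 + stability, where stability is 0 for alpha, 1 for beta and 2 for stable); the function name decodeVersionNumber is hypothetical.

```javascript
// Hypothetical helper: decode a liblzma version integer, assuming the
// LZMA_VERSION layout described above.
function decodeVersionNumber(n) {
  var stabilityNames = ['alpha', 'beta', 'stable'];
  return {
    major: Math.floor(n / 10000000),
    minor: Math.floor(n / 10000) % 1000,
    patch: Math.floor(n / 10) % 1000,
    stability: stabilityNames[n % 10]
  };
}

// decodeVersionNumber(50020012)
// => { major: 5, minor: 2, patch: 1, stability: 'stable' }
```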

.xz file metadata

lzma.isXZ()

  • lzma.isXZ(input)

Tells whether an input buffer is an XZ file (.xz, LZMA2 format) using the file format’s magic number. This is not a complete test, i.e. the data following the file header may still be invalid in some way.

Param | Type | Description
input | string / Buffer | Any string or Buffer (integer arrays accepted).

Example usage:

lzma.isXZ(fs.readFileSync('test/hamlet.txt.xz')); // => true
lzma.isXZ(fs.readFileSync('test/hamlet.txt.lzma')); // => false
lzma.isXZ('Banana'); // => false

(The magic number of XZ files is hex fd 37 7a 58 5a 00 at position 0.)
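For illustration, the same magic-number comparison can be written in a few lines of plain Node.js. This is a sketch of the idea, not the library's implementation; the names XZ_MAGIC and startsWithXzMagic are made up here.

```javascript
// The six magic bytes quoted above.
var XZ_MAGIC = Buffer.from([0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00]);

// Hypothetical helper: true if `buf` starts with the XZ magic number.
// Like isXZ(), this says nothing about the validity of the remaining data.
function startsWithXzMagic(buf) {
  return buf.length >= XZ_MAGIC.length &&
         buf.slice(0, XZ_MAGIC.length).equals(XZ_MAGIC);
}

// startsWithXzMagic(Buffer.from('Banana')) // => false
```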

lzma.parseFileIndex()

  • lzma.parseFileIndex(options[, callback])

Read .xz file metadata.

options.fileSize needs to be an integer indicating the size of the file being inspected, e.g. obtained by fs.stat().

options.read(count, offset, cb) must be a function that reads count bytes from the underlying file, starting at position offset. If that is not possible, e.g. because the file does not have enough bytes, the file should be considered corrupt. On success, cb should be called with a Buffer containing the read data. cb can be invoked as cb(err, buffer), in which case err will be passed along to the original callback argument when set.

callback will be called with err and info as its arguments.

If no callback is provided, options.read() must work synchronously and the file info will be returned from lzma.parseFileIndex().

Example usage:

fs.readFile('test/hamlet.txt.xz', function(err, content) {
  // handle error

  lzma.parseFileIndex({
    fileSize: content.length,
    read: function(count, offset, cb) {
      cb(content.slice(offset, offset + count));
    }
  }, function(err, info) {
    // handle error
    
    // do something with e.g. info.uncompressedSize
  });
});
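A variant of the read function that follows the cb(err, buffer) convention and reports out-of-range reads as errors might look like the sketch below. The helper name bufferIndexOptions and the error message are hypothetical.

```javascript
// Hypothetical helper: build the options object for lzma.parseFileIndex()
// from a Buffer that already holds the whole file.
function bufferIndexOptions(content) {
  return {
    fileSize: content.length,
    read: function(count, offset, cb) {
      if (offset < 0 || offset + count > content.length) {
        // Not enough bytes available: treat the file as corrupt.
        return cb(new Error('read past end of file'));
      }
      cb(null, content.slice(offset, offset + count));
    }
  };
}

// Usage (sketch): read() calls cb synchronously, so the callback
// argument of parseFileIndex() can be omitted:
//   var info = lzma.parseFileIndex(bufferIndexOptions(content));
```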

lzma.parseFileIndexFD()

  • lzma.parseFileIndexFD(fd, callback)

Read .xz metadata from a file descriptor.

This is like parseFileIndex(), but lets you pass a file descriptor in fd. The file will be inspected using fs.stat() and fs.read(). The file descriptor will not be opened or closed by this call.

Example usage:

fs.open('test/hamlet.txt.xz', 'r', function(err, fd) {
  // handle error

  lzma.parseFileIndexFD(fd, function(err, info) {
    // handle error
    
    // do something with e.g. info.uncompressedSize
    
    fs.close(fd, function(err) { /* handle error */ });
  });
});

Installation

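The upstream lzma-native package includes the native C library, so there is no need to install it separately. This modified version instead uses the system's liblzma (see the note at the top of this README), so liblzma-dev and pkg-config must be installed.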

Licensing

The original C library package contains code under various licenses, with its core (liblzma) being public domain. See its contents for details. This wrapper is licensed under the MIT License.

Related projects

Other implementations of the LZMA algorithms for node.js and/or web clients include:

Note that LZMA has been designed to have much faster decompression than compression, which is something you may want to take into account when choosing a compression algorithm for large files. LZMA almost always achieves higher compression ratios than other algorithms, though.

Acknowledgements

Initial development of this project was financially supported by Tradity.