tsmaz

TypeScript port of the smaz string compression library

Usage: no npm install needed!

<script type="module">
  import tsmaz from 'https://cdn.skypack.dev/tsmaz';
</script>

A port of the smaz small string compression library.

From the original library:

Smaz is a simple compression library suitable for compressing very short strings. General purpose compression libraries will build the state needed for compressing data dynamically, in order to be able to compress every kind of data. This is a very good idea, but not for a specific problem: compressing small strings will not work.

Smaz instead is not good for compressing general purpose data, but can compress text by 40-50% in the average case (works better with English), and is able to perform a bit of compression for HTML and urls as well. The important point is that Smaz is able to compress even strings of two or three bytes!

For example the string "the" is compressed into a single byte.

To compare this with other libraries, consider that a library like zlib will usually not be able to compress text shorter than 100 bytes.
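To make the idea concrete, here is a simplified sketch of codebook-based encoding. It is not tsmaz's actual implementation: the tiny codebook, the function names, and the choice of 254 as a single-byte escape marker are assumptions made for illustration, and the input is assumed to be ASCII.

```javascript
// Simplified sketch of smaz-style encoding (NOT tsmaz's actual code).
// Assumptions: ASCII input, a tiny hypothetical codebook, and one escape
// byte (254) followed by a single literal byte for unmatched characters.
const TINY_CODEBOOK = [' ', 'the', 'e', 't', 'a', 'of', 'o', 'and'];
const VERBATIM = 254;

function sketchCompress(text, codebook = TINY_CODEBOOK) {
  const out = [];
  let i = 0;
  while (i < text.length) {
    // Greedy step: find the longest codebook entry that matches at position i.
    let best = -1;
    for (let k = 0; k < codebook.length; k++) {
      if (text.startsWith(codebook[k], i) &&
          (best === -1 || codebook[k].length > codebook[best].length)) {
        best = k;
      }
    }
    if (best !== -1) {
      out.push(best); // one byte stands for the whole entry
      i += codebook[best].length;
    } else {
      out.push(VERBATIM, text.charCodeAt(i)); // escape + literal byte
      i += 1;
    }
  }
  return Uint8Array.from(out);
}

function sketchDecompress(bytes, codebook = TINY_CODEBOOK) {
  let text = '';
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === VERBATIM) {
      text += String.fromCharCode(bytes[++i]); // literal byte follows the escape
    } else {
      text += codebook[bytes[i]];
    }
  }
  return text;
}
```

With this toy codebook, 'the' becomes a single byte because it is a codebook entry, while a character that is missing from the codebook costs two bytes. That trade-off is why a codebook tuned to the input matters.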

Usage

Install,

npm install tsmaz

then,

const { compress, decompress } = require('tsmaz');

const compressed = compress('foobar');
console.log(decompress(compressed));
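A small helper like the following can check that a string survives a round trip and report how much it shrank. This is only a sketch: measure and identityCodec are hypothetical names, and the helper assumes the compressed value exposes its size in bytes as length. The stand-in codec exists only so the snippet runs without tsmaz installed.

```javascript
// Hypothetical helper: checks a round trip and reports the size ratio.
// `codec` is any object with compress(string) and decompress(bytes) methods;
// the compressed value is assumed to expose its byte count as `length`.
function measure(codec, text) {
  const originalBytes = new TextEncoder().encode(text).length;
  const compressed = codec.compress(text);
  const restored = codec.decompress(compressed);
  return {
    roundTrips: restored === text,
    originalBytes,
    compressedBytes: compressed.length,
    ratio: originalBytes === 0 ? 1 : compressed.length / originalBytes,
  };
}

// Stand-in codec that does no compression at all, used here so the
// snippet is self-contained.
const identityCodec = {
  compress: (s) => new TextEncoder().encode(s),
  decompress: (b) => new TextDecoder().decode(b),
};

console.log(measure(identityCodec, 'foobar'));
```

With tsmaz installed, the module returned by require('tsmaz') could be passed in place of identityCodec, provided the assumption about length holds for its output.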

It is also possible to use a custom codebook:

const { Smaz } = require('tsmaz');

// NOTE: this array can have at most 254 entries!
const smaz = new Smaz([
  'foo',
  'bar',
  'foobar',
  'er',
  'ab',
  'aa',
]);

smaz.decompress(smaz.compress('foo'));

Generating Codebooks

The library exports a generate function. It accepts a list of sample strings representative of what you would like to compress, then learns a codebook that works well with that kind of data.

const { generate, Smaz } = require('tsmaz');

// Generate codebook
const codebook = generate([
  'foo',
  'foobar',
  'bar',
  'baz-bar',
]);

// Use custom codebook
const smaz = new Smaz(codebook);

const compressed = smaz.compress('foo-barbaz');
console.log(compressed);
const original = smaz.decompress(compressed);
console.log(original);
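To give a feel for what learning a codebook can mean, here is a deliberately naive heuristic: count every short substring in the samples and keep the ones that would save the most characters. This is an illustration only and should not be read as how generate works internally. The name naiveCodebook, the maxLen parameter, and the scoring rule are all assumptions.

```javascript
// Naive illustration only: tsmaz's generate() may work quite differently.
// Scores each substring by (occurrences * length) and keeps the top entries.
function naiveCodebook(samples, maxEntries = 254, maxLen = 5) {
  const counts = new Map();
  for (const s of samples) {
    for (let len = 1; len <= maxLen; len++) {
      for (let i = 0; i + len <= s.length; i++) {
        const sub = s.slice(i, i + len);
        counts.set(sub, (counts.get(sub) || 0) + 1);
      }
    }
  }
  return [...counts.entries()]
    .map(([sub, count]) => ({ sub, score: count * sub.length }))
    // Highest score first; ties are broken alphabetically for determinism.
    .sort((a, b) => b.score - a.score || (a.sub < b.sub ? -1 : 1))
    .slice(0, maxEntries)
    .map((entry) => entry.sub);
}

console.log(naiveCodebook(['foo', 'foobar', 'bar', 'baz-bar'], 3));
```

A real generator would also need to account for overlapping candidates, since picking 'bar' reduces the value of 'ba' and 'ar'. This sketch ignores that.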

Performance

tsmaz uses a trie for lookup (whereas the original implementation used a hash table). Apart from that, the behavior should be the same, and performance is pretty good: around 32k bytes per millisecond for compression and 70k bytes per millisecond for decompression.
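For readers unfamiliar with tries, the sketch below shows how one can find the longest codebook entry starting at a given position in a single pass over the input. It is a minimal illustration, not tsmaz's internal code. TrieNode, buildTrie and longestMatch are hypothetical names.

```javascript
// Minimal trie sketch for longest-prefix lookup (illustrative only).
class TrieNode {
  constructor() {
    this.children = new Map(); // character -> TrieNode
    this.index = -1;           // codebook index if an entry ends here, else -1
  }
}

function buildTrie(codebook) {
  const root = new TrieNode();
  codebook.forEach((word, index) => {
    let node = root;
    for (let j = 0; j < word.length; j++) {
      const ch = word[j];
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.index = index;
  });
  return root;
}

// Returns { index, length } for the longest codebook entry that starts at
// `pos` in `text`, or null when no entry matches there.
function longestMatch(root, text, pos) {
  let node = root;
  let best = null;
  for (let i = pos; i < text.length; i++) {
    node = node.children.get(text[i]);
    if (!node) break; // no entry continues with this character
    if (node.index !== -1) best = { index: node.index, length: i - pos + 1 };
  }
  return best;
}

const trie = buildTrie(['foo', 'bar', 'foobar', 'er', 'ab', 'aa']);
console.log(longestMatch(trie, 'foobarx', 0));
```

The walk stops as soon as no entry can continue, so the cost of a lookup is bounded by the length of the longest codebook entry rather than by the number of entries.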