@pacote/bloom-filter

Space-efficient probabilistic data structures.

Usage no npm install needed!

<script type="module">
  import pacoteBloomFilter from 'https://cdn.skypack.dev/@pacote/bloom-filter';
</script>

README

@pacote/bloom-filter

version minified minified + gzip

A Bloom filter is a space-efficient probabilistic data structure that allows testing whether an element belongs to a set.

Bloom filters relax result accuracy for this efficiency. With Bloom filters, false positive matches are possible, but false negatives are not. That is to say, while it can tell you with certainty when an element is not in a set, any positive responses indicate only a possibility.

This false positive error rate can be lowered — but never completely eliminated — by increasing the size of the filter and/or the number of hashes computed for each element.

Installation

yarn add @pacote/bloom-filter

Usage

import { BloomFilter } from '@pacote/bloom-filter'

const filter = new BloomFilter({ size: 22056, hashes: 8 })

filter.add('foo')
filter.add('bar')

filter.has('foo') // -> true
filter.has('bar') // -> true
filter.has('baz') // -> false

BloomFilter<T extends { toString(): string }>

BloomFilter builds a data structure that may be used to test the membership of any String-serialisable value (i.e. an object which implements the toString() method).

Because a Bloom filter is so simple, it cannot handle removing elements. You may use a CountingBloomFilter for that.

The class constructor takes an Options object with the following properties:

  • size (number, required) determines the size of the filter in bits. The value is required and must not be negative.

  • hashes (number, required) sets the number of distinct hashes that need to be calculated. A higher number performs more slowly, but lowers the probability of false positives occuring. The value is required and must be a positive integer.

  • seed (number) sets the seed for the hashing function. The default is 0x00c0ffee.

  • filter (BloomFilter<T>) allows initialising the filter from an array of unsigned 32-bit integers.

Class instances may be serialised into JSON using JSON.stringify() for storage or for sending over the network. The JSON string can then be deserialised and fed back into the constructor to recreate the original Bloom filter.

add(element: T): void

The add() method mutates to filter to indicate that the provided serialisable element is present.

has(element: T): boolean

The has() method checks the filter for membership of the provided value. If false, it is guaranteed not to be present. Otherwise, there's the possibility of it being a false positive result.

CountingBloomFilter<T extends { toString(): string }>

CountingBloomFilter builds a data structure that may be used to test the membership of any String-serialisable value (i.e. an object which implements the toString() method). It's a generalisation of BloomFilter that counts the number of times an element was added to the set.

Unlike the simpler BloomFilter, this class supports element removal.

The class constructor takes an Options object with the following properties:

  • size (number, required) determines the number of filter counters. The value is required and must not be negative.

  • hashes (number, required) sets the number of distinct hashes that need to be calculated. A higher number performs more slowly, but lowers the probability of false positives occuring. The value is required and must be a positive integer.

  • seed (number) sets the seed for the hashing function. The default is 0x00c0ffee.

  • filter (BloomFilter<T>) allows initialising the filter from an array of unsigned 32-bit integers.

Class instances may be serialised into JSON using JSON.stringify() for storage or for sending over the network. The JSON string can then be deserialised and fed back into the constructor to recreate the original Bloom filter.

add(element: T): void

The add() method mutates to filter and increments its counters to indicate an instance of the element was added.

remove(element: T): void

The remove() method mutates to filter and decrements its counters to indicate an instance of the element was removed.

has(element: T): number

The has() method checks the filter for membership of the provided element and returns the possible number of times it was added to the filter.

As with regular Bloom filters, there is a possibility of the result being a false positive, meaning the element may have been added fewer times than the returned value indicates.

optimal(items: number, errorRate: number): Options

The optimal() helper function calculates the optimal Bloom filter size and hashes options based on the number of items in the filter (n) and the desired false positive error rate (ε).

The size of the filter, or m, is calculated with:

The number of hashes, or k, is determined by the formula:

Hashing algorithms

This class depends on xxhashjs for an implementation of the fast XXH64 non-cryptographic hashing algorithm to build and search the filter via enhanced double hashing.

License

MIT © Luís Rodrigues.