@rmw/mdbx

Simple, efficient, scalable data store wrapper for libmdbx

Usage no npm install needed!

<script type="module">
  import rmwMdbx from 'https://cdn.skypack.dev/@rmw/mdbx';
</script>

README

lmdbx-store

lmdbx-store is an ultra-fast interface to libmdbx, which is derived from LMDB. This package provides an extremely fast and efficient NodeJS key-value/database interface for full storage and retrieval of structured JS data (objects, arrays, etc.) in a truly persisted, scalable, ACID-compliant database. It provides a simple interface for interacting with libmdbx as a key-value store, which makes it easy to properly leverage the power, crash-proof design, and efficiency of libmdbx using intuitive JavaScript, and it is designed to scale across multiple processes or threads. lmdbx-store offers several key features that make using libmdbx idiomatic, highly performant, and easy:

  • High-performance translation of JS values and data structures to/from binary key/value data
  • Queueing asynchronous off-thread write operations with promise-based API
  • Automated database size handling
  • Simple transaction management
  • Iterable queries/cursors
  • Record versioning and optimistic locking for scalability/concurrency
  • Optional native off-main-thread compression with high-performance LZ4 compression
  • And ridiculously fast and efficient:

Benchmarking on Node 14.9 with a 3.4GHz i7-4770 on Windows, a get operation that uses JS numbers as keys, retrieves data from the database (random access), and decodes it into a structured object with 10 properties (using the default MessagePack encoding) can be done in less than one microsecond, or a little over 1,200,000 per second on a single thread. This is almost twice as fast as a single native JSON.parse call on the same object without any DB interaction! libmdbx scales effortlessly across multiple processes or threads: over 4,500,000 operations per second on the same 4-core/8-thread computer when running across multiple threads. Because writes run on a separate transactional thread, they are extremely fast as well. Encoding the same objects, full encoding and writes can be performed at about 500,000 puts per second, or 1,700,000 puts per second on multiple threads.

Design

lmdbx-store handles translation of JavaScript values, primitives, arrays, and objects, to and from the binary storage of libmdbx keys and values with highly optimized code using native C++ code for remarkable performance. It supports multiple types of JS values for keys and values, making it easy to use idiomatic JS for storing and retrieving data.

lmdbx-store is designed for synchronous reads and asynchronous writes. In idiomatic NodeJS code, I/O operations are performed asynchronously. lmdbx-store observes this design pattern: because libmdbx is a memory-mapped database, read operations do not use any I/O (other than the slight possibility of a page fault) and can almost always be performed faster than Node's event queue callbacks can even execute, and it is easier to write code when reads return values instantly and synchronously. On the other hand, in the default mode with sync'ed/flushed transactions, write operations do involve I/O, and can achieve vastly higher throughput by batching operations. The entire transaction for a batch of operations is performed in a separate thread. Consequently, lmdbx-store is designed for writes to go through this asynchronous batching process and return a simple promise that resolves once the write is completed and flushed to disk.
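The batching idea can be illustrated with a small in-memory sketch. This is not lmdbx-store's implementation: BatchingWriter and its commitBatch callback are hypothetical names, and the "commit" is a plain function call rather than a transaction on a separate thread. It only shows how writes issued in the same event turn can share one commit and one promise.

```javascript
// Hypothetical sketch: queue writes and commit them together on the next
// event turn, resolving every caller's promise once the batch is committed.
class BatchingWriter {
  constructor(commitBatch) {
    this.commitBatch = commitBatch; // assumed: (operations) => value or promise
    this.pending = [];
    this.currentCommit = null;
  }

  put(key, value) {
    this.pending.push({ key, value });
    if (!this.currentCommit) {
      // First write of this turn: schedule a single commit for the whole batch.
      this.currentCommit = new Promise((resolve, reject) => {
        setImmediate(async () => {
          const batch = this.pending;
          this.pending = [];
          this.currentCommit = null;
          try {
            await this.commitBatch(batch);
            resolve(true);
          } catch (error) {
            reject(error);
          }
        });
      });
    }
    // Every put in the same batch shares the same promise.
    return this.currentCommit;
  }
}

// Usage: three puts in one turn end up in a single commit.
const committedBatches = [];
const writer = new BatchingWriter((batch) => {
  committedBatches.push(batch.map((operation) => operation.key));
});
const batchDone = Promise.all([
  writer.put('a', 1),
  writer.put('b', 2),
  writer.put('c', 3),
]);
```

Once batchDone resolves, committedBatches holds one batch containing all three keys, which is the effect the asynchronous write path is designed to achieve.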

libmdbx supports multiple modes of transactions, including disabling of file sync'ing (noSync), which makes transaction commits much faster. We highly discourage turning off sync'ing as it leaves the database prone to data corruption. With the default sync'ing enabled, libmdbx has a crash-proof design; a machine can be turned off at any point, and data can only be corrupted if the written data is actually changed or tampered with. This does increase the latency of transactions (although they are not necessarily less efficient). However, by batching writes, when a database is under load, slower transactions enable more writes per transaction, and lmdbx-store is able to drive libmdbx to achieve the same levels of throughput with safe sync'ed transactions as without, while still preserving the durability/safety of sync'ed transactions.

lmdbx-store supports and encourages the use of conditional writes; this allows for atomic operations that are dependent on previously read data, and most transactional types of operations can be written with an optimistic-locking based, atomic-conditional-write pattern. This allows lmdbx-store to scale to handle concurrent execution across many processes or threads while maintaining data integrity.

When an lmdbx-store is created, a libmdbx environment/database is created, starting with a default DB size of 1MB. libmdbx itself uses a fixed size, but lmdbx-store detects whenever the database grows beyond the current size, automatically increases the size of the DB, and re-executes the write operations after resizing. With this, you do not have to make any estimates of database size; the databases automatically grow as needed (as you would expect from a database!).
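The grow-and-retry behavior can be pictured with a small in-memory sketch. Everything here is hypothetical: FixedSizeMap, the MAP_FULL code, and the doubling policy are assumptions made for illustration, not the actual resizing logic of lmdbx-store or libmdbx.

```javascript
// Hypothetical stand-in for a store with a fixed capacity (counted in entries).
class FixedSizeMap {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }
  set(key, value) {
    if (!this.entries.has(key) && this.entries.size >= this.capacity) {
      const error = new Error('map full');
      error.code = 'MAP_FULL'; // assumed error code for this sketch
      throw error;
    }
    this.entries.set(key, value);
  }
  resize(capacity) {
    this.capacity = capacity;
  }
}

// Run a batch of writes; if the map fills up, grow it and run the batch again.
// Re-running is safe here because setting the same key twice is idempotent.
function writeWithAutoGrow(map, operations) {
  for (;;) {
    try {
      for (const { key, value } of operations) map.set(key, value);
      return map.capacity;
    } catch (error) {
      if (error.code !== 'MAP_FULL') throw error;
      // Doubling is an assumption of this sketch, not the library's policy.
      map.resize(Math.max(1, map.capacity * 2));
    }
  }
}

const smallMap = new FixedSizeMap(1);
const finalCapacity = writeWithAutoGrow(smallMap, [
  { key: 'a', value: 1 },
  { key: 'b', value: 2 },
  { key: 'c', value: 3 },
]);
```

The caller never chooses a size: the batch is simply retried after each resize until it fits.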

lmdbx-store provides optional compression using LZ4 that works in conjunction with the asynchronous writes by performing the compression in the same thread (off the main thread) that performs the writes in a transaction. LZ4 is extremely fast, and decompression can be performed at roughly 5GB/s, so excellent storage efficiency can be achieved with almost negligible performance impact.

lmdbx-store is built on the excellent node-lmdb package.

Usage

An lmdbx-store instance is created by using the open export from the main module:

const { open } = require('lmdbx-store');
// or
// import { open } from 'lmdbx-store';
let myStore = open({
    path: 'my-store',
    // any options go here, we can turn on compression like this:
    compression: true,
});
await myStore.put('greeting', { someText: 'Hello, World!' })
myStore.get('greeting').someText // 'Hello, World!'

(see store options below for more options)

Once you have created a store, you can store and retrieve values using keys:

Upgrade Note

libmdbx 1.0RC (reported as 0.9.90) has upgraded their database format (incompatible with libmdbx 0.9). lmdbx-store 0.8.x uses this new database format and includes an automatic upgrade script that will upgrade an existing legacy database to the new format. To use the automatic upgrade script, you must install the lmdbx-store-0.9 package.

Keys

When using the various APIs, keys can be any JS primitive (string, number, boolean, symbol), an array of primitives, or a Buffer. These primitives are translated to binary keys used by libmdbx in such a way that consistent ordering is preserved. Numbers are ordered naturally and come before strings, which are ordered lexically. The keys are stored with type information preserved. The getRange operations that return a set of entries will return entries with the original JS primitive values for the keys. If arrays are used as keys, they are ordered by the first value in the array, with each subsequent element being a tie-breaker. Numbers are stored as doubles, with reversal of the sign bit for proper ordering plus type information, so any JS number can be used as a key. For example, here is the order of some different keys:

Symbol.for('even symbols')
-10 // negative supported
-1.1 // decimals supported
400
3E10
'Hello'
['Hello', 'World']
'World'
'hello'
['hello', 1, 'world']
['hello', 'world']
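The ordering shown above can be reproduced with a plain JavaScript comparator. This is only an illustrative sketch inferred from the example list: the real ordering comes from the binary key encoding, and compareKeys, comparePrimitives, and TYPE_RANK are hypothetical names. Booleans and Buffers are left out because the text does not say where they sort.

```javascript
// Rank of each primitive type, inferred from the example list above:
// symbols first, then numbers, then strings.
const TYPE_RANK = { symbol: 0, number: 1, string: 2 };

function comparePrimitives(a, b) {
  const rankA = TYPE_RANK[typeof a];
  const rankB = TYPE_RANK[typeof b];
  if (rankA === undefined || rankB === undefined) {
    throw new TypeError('this sketch only handles symbols, numbers, and strings');
  }
  if (rankA !== rankB) return rankA - rankB;
  // Symbols are compared by description; numbers and strings compare naturally
  // (strings by code unit, so uppercase letters sort before lowercase ones).
  const x = typeof a === 'symbol' ? a.description : a;
  const y = typeof b === 'symbol' ? b.description : b;
  return x < y ? -1 : x > y ? 1 : 0;
}

function compareKeys(a, b) {
  // Treat a primitive key like a one-element array so both forms interleave.
  const partsA = Array.isArray(a) ? a : [a];
  const partsB = Array.isArray(b) ? b : [b];
  const shared = Math.min(partsA.length, partsB.length);
  for (let i = 0; i < shared; i++) {
    const order = comparePrimitives(partsA[i], partsB[i]);
    if (order !== 0) return order; // later elements only break ties
  }
  return partsA.length - partsB.length; // a shorter prefix sorts first
}

const unsortedKeys = [
  'hello', ['hello', 'world'], 400, 'World', -1.1, ['Hello', 'World'],
  Symbol.for('even symbols'), 3e10, 'Hello', ['hello', 1, 'world'], -10,
];
const sortedKeys = [...unsortedKeys].sort(compareKeys);
```

Sorting the shuffled keys with compareKeys yields the same sequence as the list above.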

You can override the default encoding of keys, and cause keys to be returned as node buffers using the keyIsBuffer database option (generally slower).

Values

You can store a wide variety of JavaScript values and data structures in lmdbx-store, including objects (with arbitrary complexity), arrays, buffers, strings, numbers, etc. Values are stored and retrieved according to the database encoding, which can be set using the encoding property on the database options. By default, data is stored using MessagePack, but there are five supported encodings:

  • msgpack (default) - All values are stored by serializing the value as MessagePack (using the msgpackr package). Values are decoded and parsed on retrieval, so get and getRange will return the object, array, or other value that you have stored. The msgpackr package is extremely fast (faster than native JSON), and provides the most flexibility in storing different value types. See the Shared Structures section for how to achieve maximum efficiency with this.
  • cbor - This specifies all values use the CBOR format, which requires that the cbor-x package be installed. This package is based on msgpackr and supports all the same options.
  • json - All values are stored by serializing the value as JSON (using JSON.stringify) and encoded with UTF-8. Values are decoded and parsed on retrieval using JSON.parse. Generally this does not perform as well as msgpack, nor does it support as many value types.
  • string - All values should be strings and stored by encoding with UTF-8. Values are returned as strings from get.
  • binary - Values are returned as (Node) buffer objects, representing the raw binary data. Note that creating buffer objects in NodeJS has some overhead, and while this is fast and valuable for direct storage of binary data, the data encodings provide a faster and more optimized process for serializing and deserializing structured data.

Once you have a store, the following methods are available:

store.get(key): any

This will retrieve the value at the specified key. The key must be a JS value/primitive as described above, and the return value will be the stored data (dependent on the encoding), or undefined if the entry does not exist.

store.getEntry(key): any

This will retrieve the entry at the specified key. The key must be a JS value/primitive as described above, and the return value will be the entry, or undefined if the entry does not exist. An entry is an object with a value property for the value in the database (dependent on the encoding), and a version property for the version number of the entry in the database (if useVersions is enabled for the database).

store.put(key, value, version?: number, ifVersion?: number): Promise<boolean>

This will store the provided value/data at the specified key. If the database is using versioning (see options below), the version parameter will be used to set the version number of the entry. If the ifVersion parameter is set, the put will only occur if the existing entry at the provided key has the version specified by ifVersion at the instant the commit occurs (libmdbx commits are atomic by default). If the ifVersion parameter is not set, the put will occur regardless of the previous value.

This operation will be enqueued to be written in a batch transaction. Any other operations that occur within a certain timeframe (until next event after I/O by default) will also occur in the same transaction. This will return a promise for the completion of the put. The promise will resolve once the transaction has finished committing. The resolved value of the promise will be true if the put was successful, and false if the put did not occur due to the ifVersion not matching at the time of the commit.

If this is performed inside a transaction, the put will be included in the current transaction (synchronously).

store.remove(key, valueOrIfVersion?: number): Promise<boolean>

This will delete the entry at the specified key. This functions like put, with the same optional conditional version. This is batched along with put operations, and returns a promise indicating the success of the operation. If you are using a database with duplicate entries per key (with dupSort flag), you can specify the value to remove as the second parameter (instead of a version).

Again, if this is performed inside a transaction, the removal will be included in the current transaction (synchronously).

store.putSync(key, value: Buffer, ifVersion?: number): boolean

This will set the provided value at the specified key, but will do so synchronously. If this is called inside of a synchronous transaction, this put will be added to the current transaction. If not, a transaction will be started, the put will be executed, and the transaction will be committed, and then the function will return. We do not recommend this be used for any high-frequency operations as it can be vastly slower (for the main JS thread) than the put operation (often taking multiple milliseconds).

store.removeSync(key, valueOrIfVersion?: number): boolean

This will delete the entry at the specified key. This functions like putSync, providing synchronous entry deletion, and uses the same arguments as remove. This returns true if there was an existing entry deleted, false if there was no matching entry.

store.ifVersion(key, ifVersion: number, callback): Promise<boolean>

This executes a block of conditional writes: any puts or removes that are called in the callback are executed only if the entry at the provided key has the provided version.

store.ifNoExists(key, callback): Promise<boolean>

This executes a block of conditional writes: any puts or removes that are called in the callback are executed only if the entry at the provided key does not exist yet.

store.transaction(execute: Function)

This will begin a synchronous transaction, execute the provided function, and then commit the transaction. The provided function can perform gets, puts, and removes within the transaction, and the result will be committed. The execute function can return a promise to indicate an ongoing asynchronous transaction, but generally you want to minimize how long a transaction is open on the main thread, at least if you are potentially operating with multiple processes.

store.getRange(options: { start?, end?, reverse?: boolean, limit?: number, offset?: number, versions?: boolean}): Iterable<{ key, value: Buffer }>

This starts a cursor-based query of a range of data in the database, returning an iterable that also has map, filter, and forEach methods. The start and end indicate the starting and ending key for the range. The reverse flag can be used to indicate reverse traversal. The limit can limit the number of entries returned. The returned cursor/query is lazy and retrieves data as iteration takes place, so a large range can be specified without forcing all the entries to be read and loaded in memory upfront, and one can exit out of the loop without traversing the whole range in the database. The query is iterable, so we can use it directly in a for-of:

for (let { key, value } of db.getRange({ start, end })) {
    // for each key-value pair in the given range
}

Or we can use the provided iterative methods on the returned results:

db.getRange({ start, end })
    .filter(({ key, value }) => test(key))
    .forEach(({ key, value }) => {
        // for each key-value pair in the given range that matched the filter
    })

Note that map and filter are also lazy: they will only be executed once their returned iterable is iterated or forEach is called on it. The map and filter functions also support async/promise-based functions, and you can create an async iterable if the callback functions execute asynchronously (return a promise).
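To make the laziness concrete, here is a self-contained sketch built on plain generators. LazyRange is a hypothetical class, not the object that getRange returns; it only mimics the documented behavior that map and filter do no work until iteration starts, and that leaving the loop early stops the scan.

```javascript
// Hypothetical lazy range: wraps a generator function so that nothing is read
// until iteration starts, and map/filter only add further lazy stages.
class LazyRange {
  constructor(createIterator) {
    this.createIterator = createIterator;
  }
  [Symbol.iterator]() {
    return this.createIterator();
  }
  map(transform) {
    const source = this;
    return new LazyRange(function* () {
      for (const item of source) yield transform(item);
    });
  }
  filter(test) {
    const source = this;
    return new LazyRange(function* () {
      for (const item of source) if (test(item)) yield item;
    });
  }
  forEach(callback) {
    for (const item of this) callback(item);
  }
}

let entriesRead = 0;
const range = new LazyRange(function* () {
  for (let key = 1; key <= 1000; key++) {
    entriesRead++; // counts entries actually pulled from the pretend database
    yield { key, value: `value-${key}` };
  }
});

const evenKeys = range.filter(({ key }) => key % 2 === 0).map(({ key }) => key);
const readBeforeIteration = entriesRead; // nothing has been read yet

const firstThree = [];
for (const key of evenKeys) {
  firstThree.push(key);
  if (firstThree.length === 3) break; // early exit stops the underlying scan
}
```

Only six of the thousand entries are read here, because the loop stops as soon as three even keys have been collected.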

We can also query with offset to skip a certain number of entries, and limit the number of entries to iterate through:

db.getRange({ start, end, offset: 10, limit: 10 }) // skip first 10 and get next 10

If you want to get a true array from the range results, the asArray property will return the results as an array.

store.getValues(key, options?): Iterable<any>

When using a store with duplicate entries per key (with dupSort flag), you can use this to retrieve all the values for a given key. This will return an iterator just like getRange, except each entry will be the value from the database:

let db = store.openDB('my-index', {
    dupSort: true
})
await db.put('key1', 'value1')
await db.put('key1', 'value2')
for (let value of db.getValues('key1')) {
    // iterate values 'value1', 'value2'
}
await db.remove('key1', 'value2') // only remove the second value under key1
for (let value of db.getValues('key1')) {
    // just iterate value 'value1'
}

You can optionally provide a second argument with the same options that getRange handles.

store.getKeys(options: { start?, end?, reverse?: boolean, limit?: number, offset?: number, versions?: boolean }): Iterable<any>

This behaves like getRange, but only returns the keys. If this is a duplicate-key database, each key is only returned once (even if it has multiple values/entries).

store.openDB(database: string|{name:string,...})

libmdbx supports multiple databases per environment (an environment is a single memory-mapped file). When you initialize a libmdbx store with open, the store uses the default root database. However, you can use multiple databases per environment/file and instantiate a store for each one. If you are going to be opening many databases, make sure you set the maxDbs option (it defaults to 12). For example, we can open multiple stores for a single environment:

const { open } = require('lmdbx-store');
let rootStore = open('all-my-data');
let usersStore = rootStore.openDB('users');
let groupsStore = rootStore.openDB('groups');
let productsStore = rootStore.openDB('products');

Each of the opened/returned stores has the same API as the default store for the environment. Each of the stores for one environment also share the same batch queue and automated transactions with each other, so immediately writing data from two stores with the same environment will be batched together in the same commit. For example:

usersStore.put('some-user', { data: userInfo });
groupsStore.put('some-group', { groupData: moreData });

Both these puts will be batched and committed in the same transaction in the next event turn.

getLastVersion(): number

This returns the version number of the last entry that was retrieved with get (assuming it was a versioned database). If you are using a database with cache enabled, use getEntry instead.

close(): void

This will close the current store. This closes the underlying libmdbx database, and if this is the root database (opened with open as opposed to store.openDB), it will close the environment (and child stores will no longer be able to interact with the database).

Concurrency and Versioning

libmdbx and lmdbx-store are designed for high concurrency, and we recommend using multiple processes to achieve concurrency with lmdbx-store (processes are more robust than threads, the shared-memory advantage of threads is minimal with separate NodeJS isolates, and you still get shared memory access with processes when using libmdbx). Versioning is the preferred method for achieving atomicity with data updates under concurrency. A version can be stored with an entry, and later the data can be updated, conditional on the version being the expected version. This provides a robust mechanism for concurrent data updates even with multiple processes accessing the same database. To enable versioning, make sure to set the useVersions option when opening the database:

let myStore = open('my-store', { useVersions: true })

You can set a version by using the version argument in put calls. You can later update data and ensure that the data will only be updated if the version matches the expected version by using the ifVersion argument. When retrieving entries, you can access the version number by calling getLastVersion().

You can then make conditional writes, for example:

myStore.put('key1', 'value1', 4, 3); // new version of 4, only if previous version was 3
myStore.ifVersion('key1', 4, () => {
    myStore.put('key1', 'value2', 5); // equivalent to myStore.put('key1', 'value2', 5, 4);
    myStore.put('anotherKey', 'value', 3); // we can do other puts based on the same condition above
    // we can make puts in other stores (from the same db environment) based on same condition too
    myStore2.put('keyInOtherDb', 'value'); 
});
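A common way to use conditional writes is a read-modify-write loop that retries when another writer got there first. The sketch below assumes a store exposing getEntry(key) and put(key, value, version, ifVersion) as described in this README. updateWithRetry and FakeVersionedStore are hypothetical names, and the fake store is only an in-memory stand-in so the example runs without a database.

```javascript
// Read-modify-write with optimistic locking. `store` is assumed to provide
// getEntry(key) -> { value, version } and
// put(key, value, version, ifVersion) -> Promise<boolean>.
async function updateWithRetry(store, key, update, maxAttempts = 10) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const entry = store.getEntry(key);
    if (!entry) throw new Error(`no entry found for key ${String(key)}`);
    const nextValue = update(entry.value);
    // Write version + 1, but only if nobody changed the entry since we read it.
    const committed = await store.put(key, nextValue, entry.version + 1, entry.version);
    if (committed) return nextValue;
    // Another writer won the race: loop to re-read and try again.
  }
  throw new Error(`gave up after ${maxAttempts} conflicting attempts`);
}

// Minimal in-memory stand-in (hypothetical) so the sketch is runnable.
class FakeVersionedStore {
  constructor() {
    this.entries = new Map();
  }
  getEntry(key) {
    return this.entries.get(key);
  }
  put(key, value, version, ifVersion) {
    // The version check runs asynchronously, imitating a check at commit time.
    return Promise.resolve().then(() => {
      const existing = this.entries.get(key);
      if (ifVersion !== undefined && (!existing || existing.version !== ifVersion)) {
        return false;
      }
      this.entries.set(key, { value, version });
      return true;
    });
  }
}

// Two concurrent increments: one conflicts once, retries, and both are applied.
const counters = new FakeVersionedStore();
const counterReady = counters.put('visits', 0, 1).then(() =>
  Promise.all([
    updateWithRetry(counters, 'visits', (count) => count + 1),
    updateWithRetry(counters, 'visits', (count) => count + 1),
  ])
);
```

Neither increment is lost: the second writer's first put is rejected because the version changed, so it re-reads the entry and applies its update on top of the first.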

Shared Structures

Shared structures are a mechanism for storing the structural information about objects stored in the database in a dedicated entry, outside of individual entries, for reuse across all of the data in the database. This gives much more efficient storage and faster retrieval of data when storing objects that have the same or similar structures (note that this is only available using the default MessagePack or CBOR encoding, using the msgpackr or cbor-x package). This is highly recommended when storing structured objects with similar object structures (including inside of arrays) in lmdbx-store. When enabled, any structural information (the set of property names) is automatically generated when data is stored, and saved in a separate entry to be reused for storing and retrieving all data for the database. To enable this feature, simply specify the key where lmdbx-store can store the shared structures. You can use a symbol as a metadata key, as symbols are outside of the range of the standard JS primitive values:

let myStore = open('my-store', {
    sharedStructuresKey: Symbol.for('structures')
})

Once shared structures have been enabled, you can store JavaScript objects just as you normally would, and lmdbx-store will automatically generate, increment, and save the structural information in the provided key to improve storage efficiency and performance. You never need to directly access this key; just be aware that that entry is being used by lmdbx-store.

Compression

lmdbx-store can optionally use off-thread LZ4 compression as part of the asynchronous writes to enable efficient compression with virtually no overhead to the main thread. LZ4 decompression (in get and getRange calls) is extremely fast and generally has little impact on performance. Compression is turned off by default, but can be turned on by setting the compression property when opening a database. The value of compression can be true or an object with compression settings, including properties:

  • threshold - Only entries that are larger than this value (in bytes) will be compressed. This defaults to 1000 (if compression is enabled)
  • dictionary - This can be a buffer to use as a shared dictionary. This defaults to a shared dictionary in lmdbx-store that helps with compressing JSON and English words in small entries. Zstandard provides utilities for creating your own optimized shared dictionary. For example:
const fs = require('fs');
let myStore = open('my-store', {
    compression: {
        threshold: 500, // compress any entry larger than 500 bytes
        dictionary: fs.readFileSync('dict.txt') // use your own shared dictionary
    }
})

Compression is recommended for large databases that may be larger than available RAM, to improve caching and reduce page faults.

Caching

lmdbx-store supports caching of entries from stores, and uses an LRU/LFU (LRFU) and weak-referencing caching mechanism for highly optimized caching and object tracking. There are several key potential benefits to using caching, including performance, key correlation with object identity, and immediate/synchronous access to saved data. Enabling caching will cache gets and puts, which can make frequent gets much faster. Caching is enabled by providing a truthy value for the cache property on the store options.

The weak-referencing mechanism works in harmony with JS garbage collection to allow objects to be cached without preventing GC, and retrieved from the cache until they have actually been collected from memory, making more efficient use of memory. This also can provide a guarantee of object identity correlation with keys: as long as a retrieved object is in memory, a get will always return the existing object, and get will never return two copies of the same object (for the same key). The LRFU caching mechanism is scan-resistant, tracking frequency of usage as well as recency.
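The identity guarantee can be sketched with the standard WeakRef API. This is a simplified illustration, not the cache lmdbx-store uses: WeakValueCache and loadFromDatabase are hypothetical names, and the sketch omits the LRFU part and any cleanup of collected entries.

```javascript
// Hypothetical sketch of identity-preserving caching with weak references.
// `loadFromDatabase` stands in for reading and decoding a value from storage.
class WeakValueCache {
  constructor(loadFromDatabase) {
    this.loadFromDatabase = loadFromDatabase;
    this.refs = new Map(); // key -> WeakRef pointing at the cached object
  }

  get(key) {
    const ref = this.refs.get(key);
    const cached = ref && ref.deref();
    if (cached !== undefined) return cached; // still in memory: reuse the same object
    const value = this.loadFromDatabase(key);
    // WeakRef only accepts objects, so primitives are returned uncached.
    if (value !== null && typeof value === 'object') {
      this.refs.set(key, new WeakRef(value));
    }
    return value;
  }
}

let loads = 0;
const userCache = new WeakValueCache((key) => {
  loads++;
  return { id: key };
});

const firstUser = userCache.get('user-1');
const secondUser = userCache.get('user-1');
// Both variables point at one object while it is still referenced.
const sameObject = firstUser === secondUser;
```

Because the cache holds only a weak reference, the object can still be garbage collected once the application drops it, and the next get would then load it again.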

Because asynchronous put operations immediately go in the cache (and are pinned in the cache until committed), with caching enabled, put values can be retrieved via get immediately and synchronously after the put call. Without caching enabled, you need to wait for the put promise to resolve (indicating it has been committed) before you can access the stored value, but the cache makes the value available immediately, without waiting for the commit to finish:

store.put('hi', 'there');
store.get('hi'); // can immediately access value without having to await the promise

While caching can improve performance, libmdbx itself is extremely fast, and for small objects with sporadic access, caching may not improve performance. Caching tends to provide the most performance benefits for larger objects that may have more significant deserialization costs. Caching does not apply to getRange queries. Also note that this requires Node 14.10 or higher (or Node v13.0 with --harmony-weak-ref flag).

If you are using caching with a database that has versions enabled, you should use the getEntry method to get the value and version, as getLastVersion will not be reliable (only returns the version when the data is accessed from the database).

Store Options

The open method can be used to create the main database/environment with the following signature: open(path, options) or open(options). Additional databases can be opened within the main database environment with: store.openDB(name, options) or store.openDB(options). If the path has a . in it, it is treated as a file name; otherwise it is treated as a directory name where the data will be stored. The options argument to either of the functions should be an object, and supports the following properties, all of which are optional (except name if not otherwise specified):

  • name - This is the name of the database. This defaults to null (which is the root database) when opening the database environment (open). When opening a database within an environment (openDB), this is required if not specified in the first parameter.
  • encoding - Sets the encoding for the database, which can be 'msgpack', 'json', 'cbor', 'string', or 'binary'.
  • sharedStructuresKey - Enables shared structures and sets the key where the shared structures will be stored.
  • compression - This enables compression. This can be set to a truthy value to enable compression with default settings, or it can be an object with compression settings.
  • cache - Setting this to true enables caching. This can also be set to an object specifying the settings/options for the cache (see settings for weak-lru-cache).
  • useVersions - Set this to true if you will be setting version numbers on the entries in the database. Note that you can not change this flag once a database has entries in it (or they won't be read correctly).
  • encryptionKey - This enables encryption, and the provided value is the key that is used for encryption. This may be a buffer or string, but must be 32 bytes/characters long. This uses the Chacha8 cipher for fast and secure on-disk encryption of data.
  • keyIsBuffer - This will cause the database to expect and return keys as node buffers.
  • keyIsUint32 - This will cause the database to expect and return keys as unsigned 32-bit integers.
  • dupSort - Enables duplicate entries for keys. You will usually want to retrieve the values for a key with getValues.

The following additional option properties are only available when creating the main database environment (open):

  • path - This is the file path to the database environment file you will use.
  • maxDbs - The maximum number of databases to be able to open (there is some extra overhead if this is set very high).
  • maxReaders - The maximum number of concurrent read transactions (readers) to be able to open (more information).
  • commitDelay - This is the amount of time to wait (in milliseconds) for batching write operations before committing the writes (in a transaction). This defaults to 0. A delay of 0 means more immediate commits with less latency (uses setImmediate), but a longer delay (which uses setTimeout) can be more efficient at collecting more writes into a single transaction and reducing I/O load. Note that NodeJS timers only have an effective resolution of about 10ms, so a commitDelay of 1ms will generally wait about 10ms.
  • immediateBatchThreshold - This parameter defines a limit on the number of batched bytes in write operations that can be pending for a transaction before lmdbx-store will schedule the asynchronous commit for the immediate next event turn (with setImmediate). The default is 10,000,000 (bytes).
  • syncBatchThreshold - This parameter defines a limit on the number of batched bytes in write operations that can be pending for a transaction before lmdbx-store will force an immediate synchronous commit of all pending batched data for the store. This provides a safeguard against too much data being enqueued for asynchronous commit, and excessive memory usage, that can sometimes occur for a large number of continuous put calls without waiting for an event turn for the timer to execute. The default is 200,000,000 (bytes).
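How commitDelay and the two thresholds interact can be summarized as a small decision function. This is a reading of the option descriptions above, not code from lmdbx-store: chooseCommitSchedule is a hypothetical helper, and the exact comparisons (>= rather than >) and the returned labels are assumptions.

```javascript
// Defaults taken from the option descriptions above.
const DEFAULT_BATCH_OPTIONS = {
  commitDelay: 0,
  immediateBatchThreshold: 10000000,
  syncBatchThreshold: 200000000,
};

// Decide how the next commit would be scheduled for a given amount of
// pending write data. The comparison operators are assumptions of this sketch.
function chooseCommitSchedule(pendingBytes, options = {}) {
  const { commitDelay, immediateBatchThreshold, syncBatchThreshold } = {
    ...DEFAULT_BATCH_OPTIONS,
    ...options,
  };
  if (pendingBytes >= syncBatchThreshold) {
    return 'synchronous commit'; // safeguard against excessive queued data
  }
  if (pendingBytes >= immediateBatchThreshold || commitDelay === 0) {
    return 'setImmediate'; // commit on the next event turn
  }
  return `setTimeout(${commitDelay})`; // wait to collect more writes
}

const smallBatch = chooseCommitSchedule(100);
const delayedBatch = chooseCommitSchedule(100, { commitDelay: 50 });
const largeBatch = chooseCommitSchedule(20000000, { commitDelay: 50 });
const hugeBatch = chooseCommitSchedule(250000000, { commitDelay: 50 });
```

With the defaults, every batch is committed on the next event turn; a non-zero commitDelay only changes the schedule for batches that stay under immediateBatchThreshold.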

libmdbx Flags

In addition, the following options map to libmdbx's env flags, described here (only noMemInit is recommended, but others are available for boosting performance):

  • noMemInit - This provides a small performance boost (when not using useWritemap) for writes, by skipping zero'ing out malloc'ed data, but can leave application data in unused portions of the database.
  • noReadAhead - This disables read-ahead caching. Turning it off may help random read performance when the DB is larger than RAM and system RAM is full. However, this is not supported by all OSes, including Windows.
  • useWritemap - Use writemaps, this improves performance by reducing malloc calls, but can increase risk of a stray pointer corrupting data.
  • noSubdir - Treat path as a filename instead of directory (this is the default if the path appears to end with an extension and has '.' in it)
  • noSync - Doesn't sync the data to disk. We highly discourage this flag, since it can result in data corruption and lmdbx-store mitigates performance issues associated with disk syncs by batching.
  • noMetaSync - This isn't as dangerous as noSync, but doesn't improve performance much either.
  • readOnly - Self-descriptive.
  • mapAsync - Not recommended, lmdbx-store provides the means to ensure commits are performed in a separate thread (asynchronous to JS), and this prevents accurate notification of when flushes finish.

Serialization options

If you are using the default encoding of 'msgpack', the msgpackr package is used for serialization and deserialization. You can provide store options that are passed to msgpackr, as well. For example, these options can be potentially useful:

  • structuredClone - This enables the structured cloning extensions that will encode object/cyclic references and additional built-in types/classes.
  • useFloat32: 4 - Encode floating point numbers in 32-bit format when possible.

You can also use the CBOR format by specifying the encoding of 'cbor' and installing the cbor-x package, which supports the same options.

Events

The lmdbx-store instance is an EventEmitter, allowing applications to listen to database events. There is just one event right now:

beforecommit - This event is fired before a batched operation begins to start a transaction to write all queued writes to the database. The callback function can perform additional (asynchronous) writes (put and remove) and they will be included in the transaction about to be performed (this can be useful for updating a global version stamp based on all previous writes, for example).
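As a rough sketch of the global version stamp use case, the listener below adds one extra write to every commit. It assumes the store is an EventEmitter with a put(key, value) method, as described in this README. stampCommits and FakeStore are hypothetical names, and FakeStore is only a stand-in so the example runs without a database.

```javascript
const { EventEmitter } = require('events');

// Register a listener that records a stamp with every commit.
// `now` is injectable so the behavior is easy to check.
function stampCommits(store, stampKey, now = Date.now) {
  store.on('beforecommit', () => {
    // This write joins the transaction that is about to be committed.
    store.put(stampKey, now());
  });
}

// Minimal in-memory stand-in (hypothetical) that emits the event before
// applying its queued writes.
class FakeStore extends EventEmitter {
  constructor() {
    super();
    this.queue = [];
    this.data = new Map();
  }
  put(key, value) {
    this.queue.push([key, value]);
  }
  commit() {
    this.emit('beforecommit'); // listeners may enqueue more writes here
    for (const [key, value] of this.queue) this.data.set(key, value);
    this.queue = [];
  }
}

const eventStore = new FakeStore();
stampCommits(eventStore, 'last-commit', () => 12345);
eventStore.put('greeting', 'hello');
eventStore.commit();
```

After the commit, the stand-in store contains both the greeting entry and the last-commit stamp, written together.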

License

lmdbx-store is licensed under the terms of the MIT license.

Related Projects

  • lmdbx-store is built on top of node-lmdb
  • lmdbx-store uses msgpackr for the default serialization of data
  • cobase is built on top of lmdbx-store