@cloudblob/store

Provider-agnostic, searchable document store that runs on cloud object storage.

cloudblob-store

A Node.js document store built on cloud object storage. Currently only AWS S3 is supported; we hope to add Azure Blob Storage and Google Cloud Storage soon.

Overview

Use cloudblob-store for hobby projects, for prototyping, or even at scale (scaling requires a proper caching strategy to keep request times low).

It offers indexing and search out of the box through libraries such as FlexSearch and Elasticlunr.
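To give a feel for what an indexer does for a namespace, here is a minimal sketch of a naive inverted index. It is not FlexSearch or Elasticlunr and makes no claim about how those libraries work internally; the class name NaiveIndexer and its add, search, toJSON and fromJSON methods are hypothetical. It only borrows the (fields, ref) constructor shape passed to Flexsearch in the example usage, and shows why a namespace index can be serialised to a single JSON file and lazy loaded later.

```javascript
// Hypothetical minimal indexer, NOT FlexSearch or Elasticlunr.
// It only mirrors the (fields, ref) constructor shape from the README example.
class NaiveIndexer {
  constructor(fields, ref) {
    this.fields = fields; // document fields to index
    this.ref = ref; // field holding the document's unique reference
    this.postings = new Map(); // token -> Set of refs containing that token
  }

  // Lowercase and split on anything that is not an ASCII letter or digit.
  static tokenize(text) {
    return String(text).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  add(doc) {
    const id = doc[this.ref];
    if (id === undefined) throw new Error(`document is missing ref field "${this.ref}"`);
    for (const field of this.fields) {
      if (doc[field] === undefined) continue;
      for (const token of NaiveIndexer.tokenize(doc[field])) {
        if (!this.postings.has(token)) this.postings.set(token, new Set());
        this.postings.get(token).add(id);
      }
    }
  }

  // Refs of documents containing every query token (AND semantics).
  search(query) {
    const tokens = NaiveIndexer.tokenize(query);
    if (tokens.length === 0) return [];
    let result = null;
    for (const token of tokens) {
      const ids = this.postings.get(token) || new Set();
      result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
    }
    return [...result];
  }

  // Plain-object form so the whole index can be written as one JSON file
  // (JSON.stringify calls this automatically).
  toJSON() {
    const postings = Object.fromEntries([...this.postings].map(([token, ids]) => [token, [...ids]]));
    return { fields: this.fields, ref: this.ref, postings };
  }

  // Rebuild an index from its JSON form, e.g. after lazy loading it from storage.
  static fromJSON(data) {
    const indexer = new NaiveIndexer(data.fields, data.ref);
    for (const [token, ids] of Object.entries(data.postings)) {
      indexer.postings.set(token, new Set(ids));
    }
    return indexer;
  }
}

const idx = new NaiveIndexer(['name', 'about', 'age'], '_id');
idx.add({ _id: 'a1', name: 'John Doe', about: 'Lorem ipsum', age: '30' });
idx.add({ _id: 'b2', name: 'Jane Doe', about: 'Dolor sit amet', age: '28' });
console.log(idx.search('doe')); // [ 'a1', 'b2' ]
console.log(idx.search('John Doe')); // [ 'a1' ]
const restored = NaiveIndexer.fromJSON(JSON.parse(JSON.stringify(idx)));
console.log(restored.search('amet')); // [ 'b2' ]
```

AND semantics keeps the sketch short; real search libraries add ranking, stemming and partial matching on top of this basic token-to-reference mapping.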

Why

Sometimes you need a data storage backend that is rarely updated, frequently read, and able to scale when required. Combining the persistence of cloud object/blob storage with a serverless architecture gives us that versatility, scalability and ease of development.

The cloudblob stack was developed to provide a lightweight datastore for high-read, low-write applications that is easy to implement and extremely cost-effective. Combine it with caching and indexing workers to get a scalable, eventually consistent data store.
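For high-read workloads, a read-through cache in front of the store is the usual first step. The sketch below assumes only a backend object with an async get(namespace, key) method, which matches how store.get is called in the example usage; the CachedReader class and its ttlMs and now options are hypothetical and not part of @cloudblob/store.

```javascript
// Hypothetical read-through cache with a time-to-live (TTL).
// `backend` is any object with an async get(namespace, key) method,
// e.g. a Datastore instance; this wrapper is NOT part of @cloudblob/store.
class CachedReader {
  constructor(backend, { ttlMs = 60000, now = () => Date.now() } = {}) {
    this.backend = backend;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // "namespace/key" -> { value, expiresAt }
  }

  async get(namespace, key) {
    const cacheKey = `${namespace}/${key}`;
    const hit = this.entries.get(cacheKey);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    // Miss or stale entry: fetch from object storage and remember the result.
    const value = await this.backend.get(namespace, key);
    this.entries.set(cacheKey, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Drop an entry after a write so the next read goes back to storage.
  invalidate(namespace, key) {
    this.entries.delete(`${namespace}/${key}`);
  }
}
```

Two concurrent misses for the same key will both hit storage; caching the pending promise instead of the resolved value would deduplicate them. Entries only refresh on TTL expiry or an explicit invalidate, which is exactly the eventual consistency trade-off described above.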

One of the main aims is to avoid vendor lock-in. If you want to host your own stack, have a look at cloudblob-server.

The datastore client's interface is deliberately simple. If the latency of cloud storage doesn't work for you and caching hasn't helped, you could always wrap a MongoDB client behind the same interface.
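As a rough illustration of swapping the backend behind the same interface, here is a hypothetical in-memory store exposing the method names used in the example usage (put, get, list, filter). The MemoryStore class, its rejection on a missing key and its unpaginated list are assumptions made for illustration, not the package's documented behaviour; a MongoDB-backed version would replace the Map operations with collection calls.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical in-memory stand-in mirroring the README's method names.
// NOT part of @cloudblob/store; return shapes here are assumptions.
class MemoryStore {
  constructor({ ref = '_id' } = {}) {
    this.ref = ref; // field used as the document's unique key
    this.namespaces = new Map(); // namespace -> Map(key -> doc)
  }

  ns(namespace) {
    if (!this.namespaces.has(namespace)) this.namespaces.set(namespace, new Map());
    return this.namespaces.get(namespace);
  }

  // Save a copy of the document, generating a uuid4 hex ref if it has none.
  async put(namespace, doc) {
    const saved = { ...doc };
    if (saved[this.ref] === undefined) saved[this.ref] = randomUUID().replace(/-/g, '');
    this.ns(namespace).set(saved[this.ref], saved);
    return saved;
  }

  // Rejects when the key is unknown (an assumption for this sketch).
  async get(namespace, key) {
    const doc = this.ns(namespace).get(key);
    if (doc === undefined) throw new Error(`${namespace}/${key} not found`);
    return doc;
  }

  // Unpaginated for brevity; returns every key in the namespace.
  async list(namespace) {
    return [...this.ns(namespace).keys()];
  }

  // Naive case-insensitive substring match over string fields; returns keys only.
  async filter(namespace, query) {
    const q = query.toLowerCase();
    const keys = [];
    for (const [key, doc] of this.ns(namespace)) {
      const matches = Object.values(doc).some(
        (v) => typeof v === 'string' && v.toLowerCase().includes(q)
      );
      if (matches) keys.push(key);
    }
    return keys;
  }
}
```

Because callers only depend on these promise-returning methods, the storage engine underneath can change without touching application code.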

Getting started

Install the package

npm install @cloudblob/store

Example Usage

const {Datastore, AWS, Flexsearch} = require('@cloudblob/store');

const awsConfig = {
    // AWS SDK S3 client parameters
    accessKeyId: "xxx...",
    secretAccessKey: "xxx..."
}

const store = new Datastore({
  // the db name here is the bucket name
  db: 'example-database',
  storage: new AWS(awsConfig),
  // specify the namespaces and their indexer class, each namespace can use a different indexer
  // so you can optimise for different types of data
  namespaces: {
    // the parameters for the indexer are (fields array, unique ref)
    user:  {
        indexer: new Flexsearch(['name', 'about', 'age'], '_id'), // the 'indexer' field is optional; set it if you want the namespace to be searchable.
        ref: "_id" // unique field to use for constructing storage key/path
    }
  }
});

const doc = {
    name: "John Doe",
    about: "Lorem ipsum dolor sit amet...",
    age: '30'
}

// Run the steps in order: each call depends on the previous one having finished.
async function main() {
  // save a document; put() resolves to the saved document (including its autogenerated unique reference)
  const saved = await store.put('user', doc)
  console.log(saved)
  // Prints
  // {
  //     _id: "<auto_generated_uuid4_hex>",
  //     name: "John Doe",
  //     about: "Lorem ipsum dolor sit amet...",
  //     age: '30'
  // }

  // index the saved document (it carries the _id ref); the namespace index file is lazy loaded
  console.log(await store.index('user', saved))

  // at this stage the index must be flushed/dumped manually
  console.log(await store.dumpIndex('user'))
  // prints
  // true or false

  // read the document
  console.log(await store.get('user', 'some_doc_key'))

  // search the namespace index (returns keys only by default)
  console.log(await store.filter('user', 'John Doe'))

  // list namespace documents as a paginated response
  console.log(await store.list('user'))
}

main().catch(console.error)
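The ref option is described as the field used to construct the storage key/path, and list returns a paginated response. The sketch below shows one plausible way those could fit together on S3, assuming a hypothetical "<namespace>/<ref>.json" key layout; documentKey and listPage are illustrative helpers, not exports of @cloudblob/store, and the real package may lay keys out differently. The paging mirrors S3 ListObjectsV2, which returns keys in ascending order and accepts a StartAfter key.

```javascript
// Hypothetical key layout: one JSON object per document at
// "<namespace>/<ref value>.json" inside the bucket (the db name).
function documentKey(namespace, doc, ref = '_id') {
  const id = doc[ref];
  if (id === undefined || id === '') throw new Error(`missing ref field "${ref}"`);
  // Escape characters such as "/" so a ref value cannot create extra path segments.
  return `${namespace}/${encodeURIComponent(String(id))}.json`;
}

// Page through keys under a namespace prefix, S3 ListObjectsV2 style:
// keys are returned in ascending order, and the caller passes the last key
// it saw (like StartAfter) to fetch the next page.
function listPage(allKeys, namespace, { limit = 2, startAfter = '' } = {}) {
  const prefix = `${namespace}/`;
  const matching = allKeys
    .filter((k) => k.startsWith(prefix) && k > startAfter)
    .sort();
  const items = matching.slice(0, limit);
  const truncated = matching.length > limit;
  return { items, nextStartAfter: truncated ? items[items.length - 1] : null };
}

const keys = ['user/a.json', 'user/b.json', 'user/c.json', 'post/x.json'];
let page = listPage(keys, 'user'); // items: user/a.json, user/b.json
page = listPage(keys, 'user', { startAfter: page.nextStartAfter }); // items: user/c.json, nextStartAfter: null
```

Prefix-per-namespace keys let a single listing call scope to one namespace, and the last-key cursor avoids holding server-side state between pages.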

Improvements

  • Move the storage backend code to separate repositories to reduce unnecessary SDK bloat
  • Move the indexers to a separate package
  • Implement a shardable indexer, since index files are lazy loaded
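One way the shardable indexer idea could work: hash each token to one of a fixed number of shard files, so a query lazy loads only the shards its tokens map to instead of one large index file. This is a hypothetical sketch; ShardedIndex, shardFor and the loadShard callback are illustrative names, and FNV-1a is simply a convenient deterministic hash.

```javascript
// 32-bit FNV-1a over UTF-16 code units; deterministic across processes.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Which shard file a token lives in.
function shardFor(token, shardCount) {
  return fnv1a(token) % shardCount;
}

// Hypothetical sharded index; shards are loaded on first use and kept in memory.
class ShardedIndex {
  constructor(shardCount, loadShard) {
    this.shardCount = shardCount;
    this.loadShard = loadShard; // async (shardId) => { token: [refs] }, e.g. reads one JSON object
    this.loaded = new Map(); // shardId -> postings object, filled lazily
  }

  async shard(id) {
    if (!this.loaded.has(id)) this.loaded.set(id, await this.loadShard(id));
    return this.loaded.get(id);
  }

  // Refs of documents containing every token (AND semantics).
  async search(tokens) {
    let result = null;
    for (const token of tokens) {
      const postings = await this.shard(shardFor(token, this.shardCount));
      const ids = new Set(postings[token] || []);
      result = result === null ? ids : new Set([...result].filter((id) => ids.has(id)));
    }
    return result === null ? [] : [...result];
  }
}
```

Changing the shard count remaps tokens to different files, so resharding would mean rebuilding the index rather than moving a few entries.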