levi

Stream-based full-text search for Node.js and browsers, using LevelDB as the storage backend.

npm install levi

Full-text search using TF-IDF and cosine similarity, with query-time field boost options. It ships with a configurable text processing pipeline: tokenizer, Porter stemmer and stopwords filter.
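To make the scoring idea concrete, here is a minimal sketch of TF-IDF weighting with cosine similarity over pre-processed tokens. It is an illustration only and does not reproduce Levi's internals: the function names (termFrequencies, inverseDocumentFrequencies, tfidfVector, cosine, rank), the smoothed IDF formula and the in-memory document map are all assumptions made for this example.

```javascript
// Illustrative TF-IDF + cosine similarity; not Levi's actual implementation.
// docs maps a key to an array of already tokenized and stemmed terms.
function termFrequencies (tokens) {
  var tf = Object.create(null)
  tokens.forEach(function (t) { tf[t] = (tf[t] || 0) + 1 })
  return tf
}

function inverseDocumentFrequencies (docs) {
  var keys = Object.keys(docs)
  var df = Object.create(null)
  keys.forEach(function (key) {
    Object.keys(termFrequencies(docs[key])).forEach(function (t) {
      df[t] = (df[t] || 0) + 1
    })
  })
  var idf = Object.create(null)
  Object.keys(df).forEach(function (t) {
    // Assumed smoothing: a term present in every document keeps a small weight.
    idf[t] = Math.log(1 + keys.length / df[t])
  })
  return idf
}

function tfidfVector (tokens, idf) {
  var tf = termFrequencies(tokens)
  var vec = Object.create(null)
  Object.keys(tf).forEach(function (t) {
    if (idf[t]) vec[t] = tf[t] * idf[t] // terms unknown to the corpus are dropped
  })
  return vec
}

function cosine (a, b) {
  var dot = 0
  var normA = 0
  var normB = 0
  Object.keys(a).forEach(function (t) {
    normA += a[t] * a[t]
    if (b[t]) dot += a[t] * b[t]
  })
  Object.keys(b).forEach(function (t) { normB += b[t] * b[t] })
  if (normA === 0 || normB === 0) return 0
  return dot / Math.sqrt(normA * normB)
}

// Score every document against the query and sort by descending score.
function rank (docs, queryTokens) {
  var idf = inverseDocumentFrequencies(docs)
  var q = tfidfVector(queryTokens, idf)
  return Object.keys(docs).map(function (key) {
    return { key: key, score: cosine(q, tfidfVector(docs[key], idf)) }
  }).filter(function (r) { return r.score > 0 })
    .sort(function (x, y) { return y.score - x.score })
}

var ranked = rank({
  a: ['lorem', 'ipsum', 'simpli', 'dummi', 'text'],
  b: ['lorem', 'ipsum', 'dummi', 'text', 'print', 'typeset', 'industri'],
  c: ['hello', 'world']
}, ['lorem', 'ipsum'])
```

In this sketch document a ranks above b because it is shorter, so the two query terms make up a larger share of its vector, and c is dropped because it shares no terms with the query.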

Levi is built on LevelUP, a fast, asynchronous, transactional storage interface. By default, it uses LevelDB on Node.js and IndexedDB in the browser. It also works with a variety of LevelDOWN-compatible backends.

Levi's query mechanism is stream-based and built on Highland. It is designed to be memory efficient, and to be extended by combining multiple scoring mechanisms.

API

levi(path, [options])

levi(db, [options])

Create a new Levi instance with a LevelUP database path or instance, or with a SublevelUP section.

var levi = require('levi')

// levi instance of database path `db`
var lv = levi('db')
  .use(levi.tokenizer())
  .use(levi.stemmer())
  .use(levi.stopword())

The text processing pipeline plugins levi.tokenizer(), levi.stemmer() and levi.stopword() are required for indexing. They are exposed as ginga plugins so that they can be swapped for different language configurations.

.put(key, value, [options], [callback])

Index a document identified by key. value can be an object or a string. Use an object with fields as value if you want field boost options for search.

All fields are indexed by default. Set the options.fields object to specify which fields are indexed.

Accepts an optional callback function, or returns a promise if none is given.

// string as value
lv.put('a', 'Lorem Ipsum is simply dummy text.', function (err) { ... })

// object fields as value
lv.put('b', {
  id: 'b',
  title: 'Lorem Ipsum',
  body: 'Dummy text of the printing and typesetting industry.'
}, function (err) { ... })

// options.fields
lv.put('c', {
  id: 'c',
  title: 'Hello World',
  body: 'Bla bla bla'
}, {
  fields: { title: true } // index title only
}).then(...).catch(...) // returns promise if no callback function

.del(key, [options], [callback])

Delete the document identified by key from the index.

Accepts optional callback function or returns a promise.

.batch(array, [options], [callback])

Atomically apply bulk put and del operations, similar to LevelUP's array form of batch().

Accepts optional callback function or returns a promise.

lv.batch([
  { type: 'put', key: 'a', value: 'Lorem Ipsum is simply dummy text.' },
  { type: 'del', key: 'b' }
], function (err) { ... })

.get(key, [options], [callback])

Fetch the value stored under key. Works exactly like LevelUP's get().

Accepts optional callback function or returns a promise.

.readStream([options])

Obtain a ReadStream of documents, lexicographically sorted by key. Works exactly like LevelUP's readStream().

.searchStream(query, [options])

The main search interface of Levi, returned as a Node-compatible Highland object stream. query can be a string or an object with fields.

Accepts the following options:

  • fields controls field boosts. By default every field is weighted equally.
  • gt (greater than), gte (greater than or equal) define the lower bound of key range to be searched.
  • lt (less than), lte (less than or equal) define the upper bound of key range to be searched.
  • offset number, offset results. Default 0.
  • limit number, limit number of results. Default infinity.
  • expansions number, maximum expansions of prefix matching for "search as you type" behaviour. Default 0.
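The range, offset and limit options above can be pictured with a small in-memory sketch. This is not Levi code: inRange, select and the scored array are hypothetical names and data, and ordering results by descending score before paging is an assumption of this example.

```javascript
// Illustrative only: how key-range bounds, offset and limit narrow a result set.
// scored stands in for already scored documents; Levi does not produce it here.
function inRange (key, opts) {
  if (opts.gt !== undefined && !(key > opts.gt)) return false
  if (opts.gte !== undefined && !(key >= opts.gte)) return false
  if (opts.lt !== undefined && !(key < opts.lt)) return false
  if (opts.lte !== undefined && !(key <= opts.lte)) return false
  return true
}

function select (scored, opts) {
  opts = opts || {}
  var offset = opts.offset || 0
  var limit = opts.limit === undefined ? Infinity : opts.limit
  return scored
    .filter(function (r) { return inRange(r.key, opts) })
    .sort(function (a, b) { return b.score - a.score }) // assumed: best match first
    .slice(offset, offset + limit)
}

var scored = [
  { key: '!pages!about', score: 0.9 },
  { key: '!posts!1', score: 0.2 },
  { key: '!posts!2', score: 0.7 },
  { key: '!posts!3', score: 0.5 },
  { key: '!users!bob', score: 0.8 }
]

// '~' sorts after ASCII letters and digits, so this pair of bounds
// covers every alphanumeric key that starts with '!posts!'.
var posts = select(scored, { gt: '!posts!', lt: '!posts!~' })
var secondPage = select(scored, { gt: '!posts!', lt: '!posts!~', offset: 1, limit: 1 })
```

Here posts holds the three '!posts!' documents ordered by score, and secondPage holds only the second of them. JavaScript compares strings by UTF-16 code unit, which matches byte order for ASCII keys like these.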

A "more like this" query can be done by searching with the document itself.

lv.searchStream('lorem ipsum').toArray(function (results) { ... }) // highland method

lv.searchStream('lorem ipsum', {
  fields: { title: 10, '*': 1 } // title field boost. '*' means any field
}).pipe(...)

lv.searchStream('lorem ipsum', {
  fields: { title: 1 }, // title only
}).pipe(...)

// ltgt
lv.searchStream('lorem ipsum', {
  gt: '!posts!',
  lt: '!posts!~'
}).pipe(...)

// document as query
lv.searchStream({ 
  title: 'Lorem Ipsum',
  body: 'Dummy text of the printing and typesetting industry.'
}).pipe(...)

// maximum 10 expansions. 'ips' may also match 'ipso', 'ipsum' etc.
lv.searchStream('lorem ips', {
  expansions: 10
}).pipe(...)

Each result has the form:

{
  key: 'b',
  score: 0.5972843431749838,
  value: { 
    id: 'b',
    title: 'Lorem Ipsum',
    body: 'Dummy text of the printing and typesetting industry.'
  } 
}
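As a rough picture of what the expansions option bounds, the sketch below expands one query prefix against a sorted list of index terms. The expand function and the terms list are hypothetical, and the choice to always keep the typed prefix itself without counting it as an expansion is an assumption, not documented Levi behaviour.

```javascript
// Illustrative prefix expansion for "search as you type".
// terms is a hypothetical, lexicographically sorted list of indexed terms.
function expand (prefix, terms, expansions) {
  var matches = [prefix] // assumed: the typed token itself is always kept
  for (var i = 0; i < terms.length && matches.length <= expansions; i++) {
    if (terms[i] !== prefix && terms[i].indexOf(prefix) === 0) {
      matches.push(terms[i])
    }
  }
  return matches
}

var terms = ['dummi', 'ipsa', 'ipso', 'ipsum', 'lorem', 'text']
var none = expand('ips', terms, 0)
var two = expand('ips', terms, 2)
var all = expand('ips', terms, 10)
```

With expansions set to 0 only the typed token is searched. A larger value lets more index terms sharing the prefix contribute, at the cost of more lookups.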

.scoreStream(query, [options])

The underlying scoring mechanism of searchStream(). Calculates the relevancy score of each document against query and emits the results lexicographically sorted by key. Accepts the options fields, gt, gte, lt, lte and expansions.

Useful for combining multiple criteria or scoring mechanisms to build more advanced search functionality.
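Because scores arrive sorted by key, two scorings can be combined with a single linear pass. The sketch below shows that idea on plain arrays rather than streams. The combine function, the weights and the sample data are hypothetical, and keeping only keys present in both inputs is one possible design, not something Levi prescribes.

```javascript
// Illustrative merge join of two score lists that are both sorted by key.
// Only keys present in both lists survive; scores are mixed by weight.
function combine (left, right, weightLeft, weightRight) {
  var out = []
  var i = 0
  var j = 0
  while (i < left.length && j < right.length) {
    if (left[i].key < right[j].key) {
      i++
    } else if (left[i].key > right[j].key) {
      j++
    } else {
      out.push({
        key: left[i].key,
        score: weightLeft * left[i].score + weightRight * right[j].score
      })
      i++
      j++
    }
  }
  return out
}

// Hypothetical inputs: text relevance and a second criterion such as recency.
var textScores = [
  { key: 'a', score: 0.6 },
  { key: 'b', score: 0.4 },
  { key: 'd', score: 0.9 }
]
var recencyScores = [
  { key: 'b', score: 1.0 },
  { key: 'c', score: 0.5 },
  { key: 'd', score: 0.1 }
]

var combined = combine(textScores, recencyScores, 0.7, 0.3)
var best = combined.slice().sort(function (x, y) { return y.score - x.score })
```

The pass needs only the current item from each side, which is why key-sorted output suits streaming. Sorting by score is left to a final step once the criteria have been merged.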

.pipeline(obj, [callback])

The underlying text processing pipeline used for both indexing and querying. It extracts text tokens from a serializable object obj.

Accepts optional callback function or returns a promise.

lv.pipeline({
  a: 'foo bar is a placeholder name',
  b: ['foo', 'bar'],
  c: 167,
  d: null,
  e: { ghjk: ['printing'] }
}, function (err, tokens) {
  // tokens:
  // [ 'foo', 'bar', 'placehold', 'name', 'foo', 'bar', 'print' ]
})
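The example above suggests that strings are gathered from nested arrays and objects while numbers and null contribute nothing. A simplified sketch of that behaviour follows. It is not Levi's pipeline: collectStrings, tokenize, tokens and the tiny STOPWORDS list are hypothetical, and stemming is left out, so 'placeholder' and 'printing' stay unstemmed.

```javascript
// Illustrative sketch: gather text from a nested object, tokenize and filter it.
// STOPWORDS is a small hypothetical sample; stemming is intentionally omitted.
var STOPWORDS = ['a', 'an', 'is', 'of', 'the']

// Walk the value depth-first and keep only strings.
function collectStrings (value, out) {
  if (typeof value === 'string') {
    out.push(value)
  } else if (Array.isArray(value)) {
    value.forEach(function (v) { collectStrings(v, out) })
  } else if (value && typeof value === 'object') {
    Object.keys(value).forEach(function (k) { collectStrings(value[k], out) })
  }
  return out
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize (text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || []
}

function tokens (obj) {
  return collectStrings(obj, [])
    .map(tokenize)
    .reduce(function (all, t) { return all.concat(t) }, [])
    .filter(function (t) { return STOPWORDS.indexOf(t) === -1 })
}

var result = tokens({
  a: 'foo bar is a placeholder name',
  b: ['foo', 'bar'],
  c: 167,
  d: null,
  e: { ghjk: ['printing'] }
})
// result: [ 'foo', 'bar', 'placeholder', 'name', 'foo', 'bar', 'printing' ]
```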

levi.destroy(path, [callback])

Completely remove an existing database at path. This deletes the database directory on Node.js, or the IndexedDB database in the browser.

If you are using a custom Level backend, you need to invoke its corresponding destroy() function to remove the database properly.

Accepts optional callback function or returns a promise.

License

MIT