vinyl-elasticsearch

Presents ElasticSearch as a destination stream of Vinyl objects.

Read and write Vinyl documents from and to ElasticSearch.

Examples

Reading documents from ES

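var gulp = require('gulp');
var Transform = require('stream').Transform;

var elasticsearch = require('vinyl-elasticsearch');

// echo() is not defined in this example. The stand-in below is an assumption:
// it logs each object that passes through the stream and forwards it unchanged.
function echo() {
  return new Transform({
    objectMode: true,
    transform: function (file, encoding, callback) {
      console.log(file);
      callback(null, file);
    }
  });
}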

gulp.task('search', function() {
  return elasticsearch.src(
    {
      index: 'someindex',
      size: 0,
      body: {
        // Begin query.
        query: {
          // Boolean query for matching and excluding items.
          bool: {
            must: [{match: {'somefield': 'somevalue'}}]
          }
        },
        // Aggregate on the results
        aggs: {
          actions: {
            terms: {
              field: 'additionalType',
              order: {'_term' : 'asc'},
              size: 10,
            }
          }
        }
      }
    },
    {
      host: 'https://localhost:9200',
      log: 'trace'
    }
  )
    .pipe(echo());
});

Saving documents to ES

var gulp = require('gulp');

var elasticsearch = require('vinyl-elasticsearch');

gulp.task('default', function (){
  return gulp.src('hello.json')
    .pipe(
      elasticsearch.dest(
        {
          index: 'someindex',
          type: 'somedoc'
        },
        {
          host: 'https://localhost:9200',
          log: 'trace'
        }
      )
    );
});

API

Options

The configuration for the connection to ES comes from the opt parameter, which is passed on when the client is created. The available options are described at ElasticSearch Configuration.

In addition to the ElasticSearch options, a few more have been added to help manage templates, to use the AWS version of ElasticSearch, and to configure retries.

Managing Templates

If opt.manageTemplate is true, templates are maintained automatically on the ElasticSearch host. Each template named in templateName is looked up on the host, and if it doesn't exist (or if templateOverwrite is set to true) the matching template from templateDir is uploaded. If any of these parameters is missing, an error is propagated back to the caller.

For example:

{
  manageTemplate: true,
  templateName: 'logstash,listenaction',
  templateOverwrite: true,
  templateDir: path.join(__dirname, '../fixtures/templates')
}
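The check-then-upload behaviour described above can be sketched roughly as follows. This is not the library's code: ensureTemplates, the store object (standing in for the ElasticSearch host) and the loadTemplate callback (standing in for reading from templateDir) are all hypothetical, and only templateName and templateDir are treated as required here, which is one reading of the text.

```javascript
// Hypothetical sketch of the decision logic only; not the library's implementation.
// `store` stands in for the ElasticSearch host and `loadTemplate` for reading
// a template from templateDir. Both are assumptions made for illustration.
function ensureTemplates(opt, store, loadTemplate) {
  if (!opt.manageTemplate) {
    return [];
  }
  ['templateName', 'templateDir'].forEach(function (key) {
    if (!opt[key]) {
      throw new Error('Missing option: ' + key);
    }
  });

  var uploaded = [];
  // templateName may hold several comma-separated names.
  opt.templateName.split(',').forEach(function (rawName) {
    var name = rawName.trim();
    var exists = Object.prototype.hasOwnProperty.call(store, name);
    if (!exists || opt.templateOverwrite === true) {
      store[name] = loadTemplate(opt.templateDir, name);
      uploaded.push(name);
    }
  });
  return uploaded;
}

// Example: 'logstash' already exists on the host, 'listenaction' does not.
var store = { logstash: { version: 1 } };
var opt = {
  manageTemplate: true,
  templateName: 'logstash,listenaction',
  templateOverwrite: false,
  templateDir: '/some/templates'
};
var uploaded = ensureTemplates(opt, store, function (dir, name) {
  return { loadedFrom: dir + '/' + name };
});
console.log(uploaded); // [ 'listenaction' ]
```

With templateOverwrite set to true, both names would be uploaded, replacing the existing 'logstash' entry.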

Using the AWS Version of ElasticSearch

If opt.amazonES is present, each request is signed, which makes it possible to use Amazon's ElasticSearch Service. The amazonES option takes the following form:

amazonES: {
  region: 'someregion',
  accessKey: 'accesskeyid',
  secretKey: 'secretaccesskey'
}

Configuring Retries

Some ElasticSearch actions may fail for temporary reasons, such as servers being busy with nightly backups, or network failures. To help deal with this, opt.retries can be set to the number of retries to attempt; the default is zero.

Retries are handled by the retry module, and the backoff strategy is described in that module's documentation.
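To give a feel for the backoff, the retry module documents its delay for each attempt as Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout). The sketch below reproduces that formula without randomisation. It assumes the retry module's documented defaults (factor 2, minTimeout 1000 ms, no maxTimeout); whether vinyl-elasticsearch overrides any of them is not stated here, and backoffDelays is a hypothetical helper, not part of either module.

```javascript
// Hypothetical helper: computes the wait before each retry using the formula
// documented by the retry module, with randomisation left out.
// The default values below are assumptions taken from that module's documentation.
function backoffDelays(retries, options) {
  var o = options || {};
  var factor = o.factor === undefined ? 2 : o.factor;
  var minTimeout = o.minTimeout === undefined ? 1000 : o.minTimeout;
  var maxTimeout = o.maxTimeout === undefined ? Infinity : o.maxTimeout;

  var delays = [];
  for (var attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout));
  }
  return delays;
}

// Delays in milliseconds for a setting of retries: 3.
var delays = backoffDelays(3);
console.log(delays); // [ 1000, 2000, 4000 ]
```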

dest(glob, opt)

The configuration for the connection to ES comes from the opt parameter, as described above.

The index, document type and id are usually derived from the file object being passed through, using:

  • the file.index value;
  • the file.type value, or if none is present, the file.base value;
  • the file.id value, or if none is present, the file.path value.

For any of these values that is not present the corresponding value from opt will be used.

However, to make it possible to override values across the board, any settings in glob take priority.
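The precedence described above can be sketched as a small lookup. resolveTarget and firstDefined are hypothetical helpers, plain objects stand in for Vinyl files, and the exact order (glob, then the file values with their fallbacks, then opt) is one reading of the text rather than the library's confirmed behaviour.

```javascript
// Returns the first argument that is neither undefined nor null.
function firstDefined() {
  for (var i = 0; i < arguments.length; i++) {
    if (arguments[i] !== undefined && arguments[i] !== null) {
      return arguments[i];
    }
  }
  return undefined;
}

// Hypothetical sketch: settings in glob win, then values on the file
// (with the base/path fallbacks), then the corresponding value from opt.
function resolveTarget(file, glob, opt) {
  glob = glob || {};
  opt = opt || {};
  return {
    index: firstDefined(glob.index, file.index, opt.index),
    type: firstDefined(glob.type, file.type, file.base, opt.type),
    id: firstDefined(glob.id, file.id, file.path, opt.id)
  };
}

// A plain object standing in for a Vinyl file.
var file = { index: 'logs', base: '/data', path: '/data/hello.json' };
var target = resolveTarget(file, { index: 'someindex' }, { type: 'somedoc' });
console.log(target);
// { index: 'someindex', type: '/data', id: '/data/hello.json' }
```

Here glob.index overrides file.index, while opt.type is never reached because file.base is set.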

The body of the document will be set to the data property of the Vinyl file if it's present. Otherwise the buffer in the contents property will be converted to a string and then parsed as JSON.
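As an illustration of that rule, a minimal sketch might look like the following. documentBody is a hypothetical name, plain objects stand in for Vinyl files, and "present" is taken to mean neither undefined nor null.

```javascript
// Hypothetical sketch: prefer file.data, otherwise parse file.contents as JSON.
function documentBody(file) {
  if (file.data !== undefined && file.data !== null) {
    return file.data;
  }
  // contents is assumed to be a Buffer holding JSON text.
  return JSON.parse(file.contents.toString('utf8'));
}

var fromData = documentBody({ data: { greeting: 'hello' } });
var fromContents = documentBody({
  contents: Buffer.from('{"greeting":"hello"}')
});
console.log(fromData, fromContents);
```

Both calls yield the same object, so a file such as hello.json can be piped straight from gulp.src without setting data first.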

As of v1.7.0, writing to ES uses the bulk update API. Each new item to be indexed is collected in memory, and the whole batch is then pushed to ES in one go. The push is triggered either when the batch size has been reached (the default is 5000 items, which can be changed with opt.batchSize) or when the batch timeout has elapsed (the default is 5000 milliseconds, which can be changed with opt.batchTimeout).
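The two triggers can be sketched with a small in-memory batcher. This is an illustration, not the library's implementation: createBatcher is a hypothetical name, the push callback stands in for the bulk request, and starting the timer when the first item of a batch arrives is an assumption.

```javascript
// Hypothetical sketch of size- and time-triggered batching.
// `push` stands in for sending the collected batch to ES in one request.
function createBatcher(opt, push) {
  var batchSize = opt.batchSize || 5000;
  var batchTimeout = opt.batchTimeout || 5000; // milliseconds
  var items = [];
  var timer = null;

  function flush() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    if (items.length === 0) {
      return;
    }
    var batch = items;
    items = [];
    push(batch);
  }

  function add(item) {
    items.push(item);
    if (items.length >= batchSize) {
      flush(); // size trigger
    } else if (!timer) {
      timer = setTimeout(flush, batchTimeout); // time trigger
    }
  }

  return { add: add, flush: flush };
}

var pushed = [];
var batcher = createBatcher({ batchSize: 2, batchTimeout: 50 }, function (batch) {
  pushed.push(batch);
});
batcher.add('a');
batcher.add('b'); // batch size reached, so ['a', 'b'] is pushed
batcher.add('c'); // waits for the timeout or an explicit flush
batcher.flush();  // e.g. when the stream ends
console.log(pushed); // [ [ 'a', 'b' ], [ 'c' ] ]
```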

src(glob, opt)

The configuration for the connection to ES comes from the opt parameter, as described above.

The document returned will be the whole collection of search results, with the individual matches in the usual hits.hits array. This will probably be modified soon to allow individual results to be returned as distinct Vinyl files.
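Until then, individual results can be pulled out of the response downstream. The sketch below assumes the search response is already available as a parsed object; how src attaches it to the Vinyl file is not described here, and extractHits is a hypothetical helper.

```javascript
// Hypothetical helper: flattens an ElasticSearch search response into
// one plain object per hit.
function extractHits(response) {
  var hits = (response.hits && response.hits.hits) || [];
  return hits.map(function (hit) {
    return { index: hit._index, id: hit._id, source: hit._source };
  });
}

// A trimmed-down response of the usual shape, made up for illustration.
var response = {
  took: 3,
  hits: {
    total: 2,
    hits: [
      { _index: 'someindex', _id: '1', _source: { somefield: 'somevalue' } },
      { _index: 'someindex', _id: '2', _source: { somefield: 'somevalue' } }
    ]
  }
};

var results = extractHits(response);
console.log(results.length); // 2
console.log(results[0].id);  // 1
```

Note that a query with size: 0, as in the aggregation example, returns an empty hits.hits array; the aggregation results sit elsewhere in the response.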