@shelf/dynamodb-parallel-scan

Scan large DynamoDB tables faster with parallelism

dynamodb-parallel-scan

Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment.
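To illustrate the idea, here is a minimal sketch of a segmented scan. It is not this library's implementation. It relies only on the standard DynamoDB Scan parameters `Segment`, `TotalSegments`, `ExclusiveStartKey` and the `LastEvaluatedKey` response field. The names `scanPage`, `scanSegment`, `scanAllSegments` and `createFakeScanPage` are hypothetical: `scanPage` stands for any function that performs one Scan request, and the fake is an in-memory stand-in so the sketch runs without AWS.

```javascript
// Sketch only: NOT the library's actual implementation.
// `scanPage` is a hypothetical injected function that performs one Scan
// request and resolves to {Items, LastEvaluatedKey}.

// Read every page of a single segment by following LastEvaluatedKey.
async function scanSegment(scanPage, params, segment, totalSegments) {
  const items = [];
  let lastKey;

  do {
    const page = await scanPage({
      ...params,
      Segment: segment,
      TotalSegments: totalSegments,
      // Only send ExclusiveStartKey when there is a previous page.
      ...(lastKey && {ExclusiveStartKey: lastKey}),
    });

    for (const item of page.Items || []) {
      items.push(item);
    }
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);

  return items;
}

// Start all segments at once and merge their results.
async function scanAllSegments(scanPage, params, totalSegments) {
  const segments = Array.from({length: totalSegments}, (_, i) => i);
  const results = await Promise.all(
    segments.map(segment => scanSegment(scanPage, params, segment, totalSegments))
  );

  return results.flat();
}

// In-memory stand-in for DynamoDB (assumption, for demonstration only):
// item i belongs to segment i % TotalSegments, pages hold `pageSize` items.
function createFakeScanPage(allItems, pageSize) {
  return async ({Segment, TotalSegments, ExclusiveStartKey}) => {
    const segmentItems = allItems.filter((_, i) => i % TotalSegments === Segment);
    const start = ExclusiveStartKey ? ExclusiveStartKey.offset : 0;
    const end = start + pageSize;

    return {
      Items: segmentItems.slice(start, end),
      LastEvaluatedKey: end < segmentItems.length ? {offset: end} : undefined,
    };
  };
}
```

With a real client, `scanPage` could wrap a single Scan call. The sketch keeps every item in memory, so it only suits result sets that fit in RAM.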

Install

$ yarn add @shelf/dynamodb-parallel-scan

Usage

Fetch everything at once

const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const items = await parallelScan(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000}
  );

  console.log(items);
})();

Use as an async generator (or stream)

Note: this stream doesn't implement a backpressure mechanism yet, so memory can overflow if you don't consume the stream fast enough.

const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const stream = await parallelScanAsStream(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000, chunkSize: 10000}
  );

  for await (const items of stream) {
    console.log(items); // 10k items here
  }
})();
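
When you only need an aggregate, you can reduce each chunk as it arrives instead of collecting every item. The sketch below sums `fileSize` across chunks. `sumFileSizes` and `fakeChunks` are hypothetical names, and the code assumes each chunk is an array of plain objects such as `{fileSize: 1024}`. If your items arrive in DynamoDB's attribute-value form, adjust the field access accordingly.

```javascript
// Sketch only. Assumption: every chunk is an array of plain objects
// with a numeric `fileSize` attribute.
async function sumFileSizes(chunks) {
  let totalBytes = 0;
  let itemCount = 0;

  // Works with any async iterable of arrays.
  for await (const items of chunks) {
    for (const item of items) {
      // Items without a usable fileSize contribute 0 bytes.
      totalBytes += Number(item.fileSize) || 0;
      itemCount += 1;
    }
  }

  return {totalBytes, itemCount};
}

// Stand-in for the real stream (hypothetical data, for demonstration only).
async function* fakeChunks() {
  yield [{fileSize: 100}, {fileSize: 250}];
  yield [{fileSize: 50}];
}
```

Calling `sumFileSizes(fakeChunks())` resolves to `{totalBytes: 400, itemCount: 3}`. Only the running totals are retained between chunks, which keeps the consumer side small. It does not add backpressure to the producer.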

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master

License

MIT © Shelf