
naushon

A tool for incrementally transforming data in Node and the browser using AsyncIterators.

Installation

$ npm install naushon

Or, in the browser, import it directly from a CDN with no install needed:

<script type="module">
  import naushon from 'https://cdn.skypack.dev/naushon';
</script>

Usage

// Create a generator that produces an endless sequence of arbitrarily large
// positive integers. BigInts can exceed Number.MAX_SAFE_INTEGER, with no
// upper bound on their size. The set of all integers could never fit in
// memory, so this is a good use case for naushon.
function * integers () {
  let i = 1n
  while (true) {
    yield i++
  }
}

const { eduction, filter, map, partitionAll, pipe, sequence, take } = require('naushon')

const xform = pipe(
  filter(n => n % 2n === 0n),  // keep only even numbers (BigInt comparison)
  partitionAll(3),             // group them in threes
  map(nums => nums.join(', ')),
  take(3))                     // stop after three groups

// Consume the pipeline (inside an async function, or an ES module with
// top-level await):
for await (const block of eduction(xform, integers())) {
  console.log(block)
}
// Logs:
//  "2, 4, 6"
//  "8, 10, 12"
//  "14, 16, 18"
// then closes

// Or realize the results into an array all at once:
const result = await sequence(xform, integers())
// ["2, 4, 6", "8, 10, 12", "14, 16, 18"]

This is fine for truncating an infinitely large, theoretical data set, but what if we want to operate on a huge real one?

const { createWriteStream } = require('fs')
const { Readable } = require('stream')
const { parseFile } = require('fast-csv')
const { eduction, distinct, filter, interpose, map, pipe, count } = require('naushon')

// A stream of rows from a huge list of addresses in a CSV file:
function addresses () {
  return parseFile('./statewide.csv', { headers: true })
}

// Example qualifying sets; in a real program these would come from your
// eligibility rules:
const eligibleStates = new Set(['MA'])
const eligibleCities = new Set(['Boston'])

// Create a filter transformer that yields only eligible addresses by checking
// that each address is in a qualifying city and state:
const eligibleAddress = filter(({ city, state }) =>
  eligibleStates.has(state) && eligibleCities.has(city))

// Convert the record into a string because we are going to write it to a file
const addressify = map(({ house_number_and_street, city, state, postcode }) =>
  `${house_number_and_street}, ${city}, ${state} ${postcode}`)

const xform = pipe(
  eligibleAddress, // filter out ineligible addresses
  addressify,      // convert data object into a string
  distinct,        // remove duplicate addresses
  interpose('\n')) // insert newlines between records

// Read each row from the CSV, transform it with xform, and write the result
// to the output file. Readable.from wraps the async iterable in a Node
// Readable, so the write stream's backpressure propagates back up the
// pipeline:
Readable.from(eduction(xform, addresses())).pipe(createWriteStream('./out.txt'))

// Or, if we just want to know how many eligible addresses there are:
// count consumes the address file a little at a time, so it never holds
// much of it in memory.
const ct = await count(xform, addresses())
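
If count simply drains the transformed iterable and tallies what comes out (an assumption based on the usage above, not naushon's documented internals), it amounts to something like this hypothetical helper:

// Drain the pipeline, counting items without ever holding more than one
// of them in memory
async function countOf (xform, source) {
  let n = 0
  for await (const _ of eduction(xform, source)) n++
  return n
}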

Rationale

Very often, a Node program will need to read or write data sets larger than the available memory. Node streams do a good job of defining the source and destination primitives for Node processes, but transform streams are awkward to write and hard to compose. Node streams also don't exist in the browser, where processing large data sets will become more and more common as we expect more out of web applications.

Naushon attempts to address this need by using a fundamental language primitive, the AsyncIterator, to create composable pipelines. These pipelines can operate on in-memory data structures like arrays and strings, but also on database cursors and on audio, video, or filesystem streams.
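
Because each stage both consumes and produces an async iterable, a whole transformation step can be written as an ordinary async generator, and stages compose by plain function nesting. A minimal sketch of the idea (mapStage and filterStage are illustrative names, not naushon's API):

// Each stage pulls one item at a time from its source and yields
// transformed items downstream; nothing is buffered
async function * mapStage (f, iterable) {
  for await (const item of iterable) {
    yield f(item)
  }
}

async function * filterStage (pred, iterable) {
  for await (const item of iterable) {
    if (pred(item)) yield item
  }
}

// Stages compose by nesting: items flow through one at a time, on demand
const evensDoubled = mapStage(n => n * 2, filterStage(n => n % 2 === 0, [1, 2, 3, 4, 5]))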

The library takes its inspiration from Clojure's transducers, which is evident in the function names.

Etymology

Naushon is an island off the coast of Massachusetts. The island forms one side of Woods Hole, a passage where the tide rushes through at great speed when the ocean is higher on one side than the other. The island is linked to other small islands by bridges, under which some of this water passes and where it is held in tidal coves. This seems an appropriate metaphor for the intended use of the naushon library: given a massive flow, partition and transform it across many discrete steps.

(image: a photo taken on Naushon)