@ipld/car

Content Addressable aRchive format reader and writer

A JavaScript Content Addressable aRchive (CAR) file reader and writer.

See also:

Example

// Create a simple .car file with a single block and that block's CID as the
// single root. Then read the .car and fetch the block again.

import fs from 'fs'
import { Readable } from 'stream'
import { CarReader, CarWriter } from '@ipld/car'
import * as raw from 'multiformats/codecs/raw'
import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'

async function example () {
  const bytes = new TextEncoder().encode('random meaningless bytes')
  const hash = await sha256.digest(raw.encode(bytes))
  const cid = CID.create(1, raw.code, hash)

  // create the writer and set the header with a single root
  const { writer, out } = await CarWriter.create([cid])
  Readable.from(out).pipe(fs.createWriteStream('example.car'))

  // store a new block, creates a new file entry in the CAR archive
  await writer.put({ cid, bytes })
  await writer.close()

  const inStream = fs.createReadStream('example.car')
  // read and parse the entire stream in one go, this will cache the contents of
  // the car in memory so is not suitable for large files.
  const reader = await CarReader.fromIterable(inStream)

  // read the list of roots from the header
  const roots = await reader.getRoots()
  // retrieve a block, as a { cid:CID, bytes:Uint8Array } pair from the archive
  const got = await reader.get(roots[0])
  // also possible: for await (const { cid, bytes } of await CarBlockIterator.fromIterable(inStream)) { ... }

  console.log('Retrieved [%s] from example.car with CID [%s]',
    new TextDecoder().decode(got.bytes),
    roots[0].toString())
}

example().catch((err) => {
  console.error(err)
  process.exit(1)
})

Will output:

Retrieved [random meaningless bytes] from example.car with CID [bafkreihwkf6mtnjobdqrkiksr7qhp6tiiqywux64aylunbvmfhzeql2coa]

See the examples directory for more.

Usage

@ipld/car is consumed through factory methods on its different classes. Each class represents a discrete set of functionality. You should select the classes that make the most sense for your use-case.

Please be aware that @ipld/car does not validate that block data matches the paired CIDs when reading a CAR. See the verify-car.js example for one possible approach to validating blocks as they are read. Any CID verification requires that the hash function used to generate the CID be available; the CAR format does not restrict the allowable multihashes.
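One possible shape for such a check is sketched below. It is not the code from verify-car.js: the helper name verifyBlock and the hashers lookup table are hypothetical. The sketch assumes each hasher follows the multiformats convention of a digest(bytes) method that resolves to an object with a bytes property, and that cid.multihash exposes code and bytes.

```javascript
// Compare two byte arrays for equality.
function equalBytes (a, b) {
  if (a.length !== b.length) return false
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false
  }
  return true
}

// `hashers` (hypothetical) maps a multihash code to a hasher, for example
// { [sha256.code]: sha256 } with sha256 from 'multiformats/hashes/sha2'.
async function verifyBlock ({ cid, bytes }, hashers) {
  const hasher = hashers[cid.multihash.code]
  if (!hasher) {
    throw new Error(`No hasher available for multihash code ${cid.multihash.code}`)
  }
  // Re-hash the block bytes and compare against the multihash inside the CID.
  const digest = await hasher.digest(bytes)
  return equalBytes(digest.bytes, cid.multihash.bytes)
}
```

A caller would typically run this on each block as it is read and stop at the first block for which it returns false.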

CarReader

The basic CarReader class is consumed via:

import { CarReader } from '@ipld/car/reader'

Or alternatively: import { CarReader } from '@ipld/car'. CommonJS require will also work for the same import paths and references.

CarReader is useful for relatively small CAR archives as it buffers the entirety of the archive in memory to provide access to its data. This class is also suitable in a browser environment. The CarReader class provides random-access get(key) and has(key) methods as well as iterators for blocks() and cids().

CarReader can be instantiated from a single Uint8Array or from an AsyncIterable of Uint8Arrays (note that Node.js streams are AsyncIterables and can be consumed in this way).

CarIndexedReader

The CarIndexedReader class is a special form of CarReader and can be consumed in Node.js only (not in the browser) via:

import { CarIndexedReader } from '@ipld/car/indexed-reader'

Or alternatively: import { CarIndexedReader } from '@ipld/car'. CommonJS require will also work for the same import paths and references.

A CarIndexedReader provides the same functionality as CarReader but is instantiated from a path to a CAR file and also adds a close() method that must be called when the reader is no longer required, to clean up resources.

CarIndexedReader performs a single full-scan of a CAR file, collecting a list of CIDs and their block positions in the archive. It then performs random-access reads when blocks are requested via get() and the blocks() and cids() iterators.

This class may be suitable for random-access (primarily via has() and get()) to relatively large CAR files.
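Because close() must always be called, it can help to wrap the reader's lifetime in a try/finally. The following is a minimal sketch: withIndexedReader is a hypothetical helper, and the CarIndexedReader class is passed in as an argument only to keep the snippet free of imports.

```javascript
// In real code the class would come from:
//   import { CarIndexedReader } from '@ipld/car/indexed-reader'
async function withIndexedReader (CarIndexedReader, path, fn) {
  const reader = await CarIndexedReader.fromFile(path)
  try {
    return await fn(reader)
  } finally {
    // Release the file descriptor held by the reader, even if fn throws.
    await reader.close()
  }
}
```

For example, withIndexedReader(CarIndexedReader, 'example.car', (reader) => reader.has(cid)) would resolve to a boolean after the reader has been closed.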

CarBlockIterator and CarCIDIterator

import { CarBlockIterator } from '@ipld/car/iterator'
// or
import { CarCIDIterator } from '@ipld/car/iterator'

Or alternatively: import { CarBlockIterator, CarCIDIterator } from '@ipld/car'. CommonJS require will also work for the same import paths and references.

These two classes provide AsyncIterables over the blocks, or just the CIDs, contained within a CAR archive. These are efficient mechanisms for scanning an entire CAR archive, regardless of size, if random-access to blocks is not required.

CarBlockIterator and CarCIDIterator can be instantiated from a single Uint8Array (see CarBlockIterator.fromBytes() and CarCIDIterator.fromBytes()) or from an AsyncIterable of Uint8Arrays (see CarBlockIterator.fromIterable() and CarCIDIterator.fromIterable())—note that Node.js streams are AsyncIterables and can be consumed in this way.
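As an illustration of a single-pass scan, the sketch below counts blocks and totals their byte lengths. The helper name summarize is hypothetical; it accepts any AsyncIterable of { cid, bytes } pairs, which is what a CarBlockIterator is described as providing.

```javascript
// Scan every block once, keeping only running totals in memory.
async function summarize (blockIterator) {
  let blocks = 0
  let bytes = 0
  for await (const block of blockIterator) {
    blocks++
    bytes += block.bytes.length
  }
  return { blocks, bytes }
}
```

In Node.js this might be called as summarize(await CarBlockIterator.fromIterable(fs.createReadStream('example.car'))). Since an iteration can only be performed once per instantiation, a second scan needs a new iterator.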

CarIndexer

The CarIndexer class can be used to scan a CAR archive and provide indexing data on the contents. It can be consumed via:

import { CarIndexer } from '@ipld/car/indexer'

Or alternatively: import { CarIndexer } from '@ipld/car'. CommonJS require will also work for the same import paths and references.

This class is used within CarIndexedReader and is only useful in cases where an external index of a CAR needs to be generated and used.

The index data can also be used with CarReader.readRaw() to fetch block data directly from a file descriptor using the index data for that block.
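A minimal sketch of building such an external index follows. buildIndex is a hypothetical helper and the choice of a Map keyed by the CID's string form is an assumption, not something the library prescribes. It consumes any AsyncIterable of BlockIndex-like objects, such as a CarIndexer.

```javascript
// Collect the fields that CarReader.readRaw() is documented to need,
// keyed by the string form of each CID.
async function buildIndex (indexer) {
  const index = new Map()
  for await (const { cid, blockLength, blockOffset } of indexer) {
    const key = cid.toString()
    // Keep the first occurrence if a CID appears more than once.
    if (!index.has(key)) {
      index.set(key, { cid, blockLength, blockOffset })
    }
  }
  return index
}
```

A block could then be fetched with CarReader.readRaw(fd, index.get(cid.toString())), where fd is a file descriptor that the caller opens and closes.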

CarWriter

A CarWriter is used to create new CAR archives. It can be consumed via:

import { CarWriter } from '@ipld/car/writer'

Or alternatively: import { CarWriter } from '@ipld/car'. CommonJS require will also work for the same import paths and references.

Creation of a CarWriter involves a "channel", or a { writer:CarWriter, out:AsyncIterable<Uint8Array> } pair. The writer side of the channel is used to put() blocks, while the out side of the channel emits the bytes that form the encoded CAR archive.

In Node.js, you can use the Readable.from() API to convert the out AsyncIterable to a standard Node.js stream, or it can be directly fed to a stream.pipeline().

API

class CarReader

Properties:

  • version (number): The version number of the CAR referenced by this reader (should be 1).

Provides blockstore-like access to a CAR.

Implements the RootsReader interface (getRoots()) and the BlockReader interface: get(), has(), blocks() (defined as a BlockIterator) and cids() (defined as a CIDIterator).

Load this class with either import { CarReader } from '@ipld/car/reader' (const { CarReader } = require('@ipld/car/reader')). Or import { CarReader } from '@ipld/car' (const { CarReader } = require('@ipld/car')). The former will likely result in smaller bundle sizes where this is important.

async CarReader#getRoots()

  • Returns: Promise<CID[]>

Get the list of roots defined by the CAR referenced by this reader. May be zero or more CIDs.

async CarReader#has(key)

  • key (CID)

  • Returns: Promise<boolean>

Check whether a given CID exists within the CAR referenced by this reader.

async CarReader#get(key)

  • key (CID)

  • Returns: Promise<(Block|undefined)>

Fetch a Block (a { cid:CID, bytes:Uint8Array } pair) from the CAR referenced by this reader matching the provided CID. In the case where the provided CID doesn't exist within the CAR, undefined will be returned.
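Because a missing CID yields undefined rather than a rejection, callers that require the block may want an explicit check. A small sketch, where getBlockOrThrow is a hypothetical helper and reader is any object with the get() method described above:

```javascript
// Turn the "undefined means not found" convention into an error.
async function getBlockOrThrow (reader, cid) {
  const block = await reader.get(cid)
  if (block === undefined) {
    throw new Error(`Block not found in CAR: ${cid}`)
  }
  return block
}
```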

async * CarReader#blocks()

  • Returns: AsyncGenerator<Block>

Returns a BlockIterator (AsyncIterable<Block>) that iterates over all of the Blocks ({ cid:CID, bytes:Uint8Array } pairs) contained within the CAR referenced by this reader.

async * CarReader#cids()

  • Returns: AsyncGenerator<CID>

Returns a CIDIterator (AsyncIterable<CID>) that iterates over all of the CIDs contained within the CAR referenced by this reader.

async CarReader.fromBytes(bytes)

  • bytes (Uint8Array)

  • Returns: Promise<CarReader>

Instantiate a CarReader from a Uint8Array blob. This performs a decode fully in memory and maintains the decoded state in memory for full access to the data via the CarReader API.

async CarReader.fromIterable(asyncIterable)

  • asyncIterable (AsyncIterable<Uint8Array>)

  • Returns: Promise<CarReader>

Instantiate a CarReader from an AsyncIterable<Uint8Array>, such as a modern Node.js stream. This performs a decode fully in memory and maintains the decoded state in memory for full access to the data via the CarReader API.

Care should be taken for large archives; this API may not be appropriate where memory is a concern or the archive is potentially larger than the amount of memory that the runtime can handle.

async CarReader.readRaw(fd, blockIndex)

  • fd (fs.promises.FileHandle|number): A file descriptor from the Node.js fs module. Either an integer, from fs.open() or a FileHandle from fs.promises.open().

  • blockIndex (BlockIndex): An index pointing to the location of the Block required. This BlockIndex should take the form: {cid:CID, blockLength:number, blockOffset:number}.

  • Returns: Promise<Block>: A { cid:CID, bytes:Uint8Array } pair.

Reads a block directly from a file descriptor for an open CAR file. This function is only available in Node.js and not a browser environment.

This function can be used in connection with CarIndexer which emits the BlockIndex objects that are required by this function.

The user is responsible for opening and closing the file used in this call.

class CarIndexedReader

Properties:

  • version (number): The version number of the CAR referenced by this reader (should be 1).

A form of CarReader that pre-indexes a CAR archive from a file and provides random access to blocks within the file using the index data. This class is only available in Node.js and not a browser environment.

For large CAR files, using this form of CarReader can be significantly more efficient in terms of memory. The index consists of a list of CIDs and their location within the archive (see CarIndexer). For large numbers of blocks, this index can also occupy a significant amount of memory. In some cases it may be necessary to expand the memory capacity of a Node.js instance to allow this index to fit (e.g. by running with NODE_OPTIONS="--max-old-space-size=16384").

As a CarIndexedReader instance maintains an open file descriptor for its CAR file, an additional CarIndexedReader#close method is attached. This must be called to have full clean-up of resources after use.

Load this class with either import { CarIndexedReader } from '@ipld/car/indexed-reader' (const { CarIndexedReader } = require('@ipld/car/indexed-reader')). Or import { CarIndexedReader } from '@ipld/car' (const { CarIndexedReader } = require('@ipld/car')). The former will likely result in smaller bundle sizes where this is important.

async CarIndexedReader#getRoots()

  • Returns: Promise<CID[]>

See CarReader#getRoots

async CarIndexedReader#has(key)

  • key (CID)

  • Returns: Promise<boolean>

See CarReader#has

async CarIndexedReader#get(key)

  • key (CID)

  • Returns: Promise<(Block|undefined)>

See CarReader#get

async * CarIndexedReader#blocks()

  • Returns: AsyncGenerator<Block>

See CarReader#blocks

async * CarIndexedReader#cids()

  • Returns: AsyncGenerator<CID>

See CarReader#cids

async CarIndexedReader#close()

  • Returns: Promise<void>

Close the underlying file descriptor maintained by this CarIndexedReader. This must be called for proper resource clean-up to occur.

async CarIndexedReader.fromFile(path)

  • path (string)

  • Returns: Promise<CarIndexedReader>

Instantiate a CarIndexedReader from a file with the provided path. The CAR file is first indexed with a full pass that collects CIDs and block locations. This index is maintained in memory. Subsequent reads operate on a read-only file descriptor, fetching the block from its in-file location.

For large archives, the initial indexing may take some time. The returned Promise will resolve only after this is complete.

class CarBlockIterator

Properties:

  • version (number): The version number of the CAR referenced by this iterator (should be 1).

Provides an iterator over all of the Blocks in a CAR. Implements a BlockIterator interface, or AsyncIterable<Block>, where a Block is a { cid:CID, bytes:Uint8Array } pair.

As an implementer of AsyncIterable, this class can be used directly in a for await (const block of iterator) {} loop, where the iterator is constructed using CarBlockIterator.fromBytes or CarBlockIterator.fromIterable.

An iteration can only be performed once per instantiation.

CarBlockIterator also implements the RootsReader interface and provides the getRoots() method.

Load this class with either import { CarBlockIterator } from '@ipld/car/iterator' (const { CarBlockIterator } = require('@ipld/car/iterator')). Or import { CarBlockIterator } from '@ipld/car' (const { CarBlockIterator } = require('@ipld/car')).

async CarBlockIterator#getRoots()

  • Returns: Promise<CID[]>

Get the list of roots defined by the CAR referenced by this iterator. May be zero or more CIDs.

async CarBlockIterator.fromBytes(bytes)

  • bytes (Uint8Array)

  • Returns: Promise<CarBlockIterator>

Instantiate a CarBlockIterator from a Uint8Array blob. Rather than decoding the entire byte array prior to returning the iterator, as in CarReader.fromBytes, only the header is decoded and the remainder of the CAR is parsed as the Blocks are yielded.

async CarBlockIterator.fromIterable(asyncIterable)

  • asyncIterable (AsyncIterable<Uint8Array>)

  • Returns: Promise<CarBlockIterator>

Instantiate a CarBlockIterator from an AsyncIterable<Uint8Array>, such as a modern Node.js stream. Rather than decoding the entire byte array prior to returning the iterator, as in CarReader.fromIterable, only the header is decoded and the remainder of the CAR is parsed as the Blocks are yielded.

class CarCIDIterator

Properties:

  • version (number): The version number of the CAR referenced by this iterator (should be 1).

Provides an iterator over all of the CIDs in a CAR. Implements a CIDIterator interface, or AsyncIterable<CID>. Similar to CarBlockIterator but only yields the CIDs in the CAR.

As an implementer of AsyncIterable, this class can be used directly in a for await (const cid of iterator) {} loop, where the iterator is constructed using CarCIDIterator.fromBytes or CarCIDIterator.fromIterable.

An iteration can only be performed once per instantiation.

CarCIDIterator also implements the RootsReader interface and provides the getRoots() method.

Load this class with either import { CarCIDIterator } from '@ipld/car/iterator' (const { CarCIDIterator } = require('@ipld/car/iterator')). Or import { CarCIDIterator } from '@ipld/car' (const { CarCIDIterator } = require('@ipld/car')).

async CarCIDIterator#getRoots()

  • Returns: Promise<CID[]>

Get the list of roots defined by the CAR referenced by this iterator. May be zero or more CIDs.

async CarCIDIterator.fromBytes(bytes)

  • bytes (Uint8Array)

  • Returns: Promise<CarCIDIterator>

Instantiate a CarCIDIterator from a Uint8Array blob. Rather than decoding the entire byte array prior to returning the iterator, as in CarReader.fromBytes, only the header is decoded and the remainder of the CAR is parsed as the CIDs are yielded.

async CarCIDIterator.fromIterable(asyncIterable)

  • asyncIterable (AsyncIterable<Uint8Array>)

  • Returns: Promise<CarCIDIterator>

Instantiate a CarCIDIterator from an AsyncIterable<Uint8Array>, such as a modern Node.js stream. Rather than decoding the entire byte array prior to returning the iterator, as in CarReader.fromIterable, only the header is decoded and the remainder of the CAR is parsed as the CIDs are yielded.

class CarIndexer

Properties:

  • version (number): The version number of the CAR referenced by this indexer (should be 1).

Provides an iterator over all of the Blocks in a CAR, returning their CIDs and byte-location information. Implements an AsyncIterable<BlockIndex>, where a BlockIndex is a { cid:CID, length:number, offset:number, blockLength:number, blockOffset:number }.

As an implementer of AsyncIterable, this class can be used directly in a for await (const blockIndex of iterator) {} loop, where the iterator is constructed using CarIndexer.fromBytes or CarIndexer.fromIterable.

An iteration can only be performed once per instantiation.

CarIndexer also implements the RootsReader interface and provides the getRoots() method.

Load this class with either import { CarIndexer } from '@ipld/car/indexer' (const { CarIndexer } = require('@ipld/car/indexer')). Or import { CarIndexer } from '@ipld/car' (const { CarIndexer } = require('@ipld/car')). The former will likely result in smaller bundle sizes where this is important.

async CarIndexer#getRoots()

  • Returns: Promise<CID[]>

Get the list of roots defined by the CAR referenced by this indexer. May be zero or more CIDs.

async CarIndexer.fromBytes(bytes)

  • bytes (Uint8Array)

  • Returns: Promise<CarIndexer>

Instantiate a CarIndexer from a Uint8Array blob. Only the header is decoded initially, the remainder is processed and emitted via the iterator as it is consumed.

async CarIndexer.fromIterable(asyncIterable)

  • asyncIterable (AsyncIterable<Uint8Array>)

  • Returns: Promise<CarIndexer>

Instantiate a CarIndexer from an AsyncIterable<Uint8Array>, such as a modern Node.js stream. Only the header is decoded initially, the remainder is processed and emitted via the iterator as it is consumed.

class CarWriter

Provides a writer interface for the creation of CAR files.

Creation of a CarWriter involves the instantiation of an input / output pair in the form of a WriterChannel, which is a { writer:CarWriter, out:AsyncIterable<Uint8Array> } pair. These two components form what can be thought of as a stream-like interface. The writer component (an instantiated CarWriter) has methods to put() new blocks and close() the writing operation (finalising the CAR archive). The out component is an AsyncIterable that yields the bytes of the archive. This can be redirected to a file or other sink. In Node.js, you can use the Readable.from() API to convert this to a standard Node.js stream, or it can be directly fed to a stream.pipeline().

The channel provides a form of backpressure: the Promise returned by put() won't resolve until the resulting data has been drained from the out iterable.

It is also possible to ignore the Promise from put() calls and allow the generated data to queue in memory. This should be avoided for large CAR archives because of the memory cost and the potential for memory exhaustion.
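One consequence of this backpressure is that something has to be consuming out while put() is being awaited. The sketch below starts draining out before writing any block. encodeToBytes and concatChunks are hypothetical helpers, and the whole archive is collected in memory, so this is only a reasonable pattern for small archives.

```javascript
// Join a list of Uint8Array chunks into one Uint8Array.
function concatChunks (chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const joined = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    joined.set(chunk, offset)
    offset += chunk.length
  }
  return joined
}

// `channel` is a { writer, out } pair, e.g. from CarWriter.create(roots).
async function encodeToBytes ({ writer, out }, blocks) {
  // Begin consuming `out` first so that awaited put() calls can resolve.
  const collected = (async () => {
    const chunks = []
    for await (const chunk of out) chunks.push(chunk)
    return concatChunks(chunks)
  })()
  for (const block of blocks) {
    await writer.put(block)
  }
  await writer.close()
  return collected
}
```

For larger archives the same ordering applies, with out piped to a file or other sink instead of being collected into an array.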

Load this class with either import { CarWriter } from '@ipld/car/writer' (const { CarWriter } = require('@ipld/car/writer')). Or import { CarWriter } from '@ipld/car' (const { CarWriter } = require('@ipld/car')). The former will likely result in smaller bundle sizes where this is important.

async CarWriter#put(block)

  • block (Block): A { cid:CID, bytes:Uint8Array } pair.

  • Returns: Promise<void>: The returned promise will only resolve once the bytes this block generates are written to the out iterable.

Write a Block (a { cid:CID, bytes:Uint8Array } pair) to the archive.

async CarWriter#close()

  • Returns: Promise<void>

Finalise the CAR archive and signal that the out iterable should end once any remaining bytes are written.

async CarWriter.create(roots)

  • roots (CID[]|CID|void)

  • Returns: WriterChannel: The channel takes the form of { writer:CarWriter, out:AsyncIterable<Uint8Array> }.

Create a new CAR writer "channel" which consists of a { writer:CarWriter, out:AsyncIterable<Uint8Array> } pair.

async CarWriter.createAppender()

  • Returns: WriterChannel: The channel takes the form of { writer:CarWriter, out:AsyncIterable<Uint8Array> }.

Create a new CAR appender "channel" which consists of a { writer:CarWriter, out:AsyncIterable<Uint8Array> } pair. This appender does not consider roots and does not produce a CAR header. It is designed to append blocks to an existing CAR archive. It is expected that out will be concatenated onto the end of an existing archive that already has a properly formatted header.
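For an archive that is already held in memory, the concatenation step can be sketched as below. appendToArchive is a hypothetical helper; existing is assumed to be a complete CAR with a valid header and appended is assumed to be the collected bytes of an appender's out.

```javascript
// Place the appender output directly after the bytes of the existing archive.
function appendToArchive (existing, appended) {
  const joined = new Uint8Array(existing.length + appended.length)
  joined.set(existing, 0)
  joined.set(appended, existing.length)
  return joined
}
```

For a CAR stored on disk, the equivalent would be writing the appender's out to the end of the existing file rather than building a new array.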

async CarWriter.updateRootsInBytes(bytes, roots)

  • bytes (Uint8Array)

  • roots (CID[]): A new list of roots to replace the existing list in the CAR header. The new header must take up the same number of bytes as the existing header, so the roots should collectively be the same byte length as the existing roots.

  • Returns: Promise<Uint8Array>

Update the list of roots in the header of an existing CAR as represented in a Uint8Array.

This operation is an overwrite; the total length of the CAR will not be modified. A rejection will occur if the new header would not be the same length as the existing header, in which case the CAR will not be modified. It is the responsibility of the user to ensure that the roots being replaced encode as the same length as the new roots.

The byte array passed in an argument will be modified and also returned upon successful modification.
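A conservative pre-check before attempting the update might look like the sketch below. sameRootLengths is a hypothetical helper. It assumes each CID exposes its binary form as a bytes property, as multiformats CIDs do, and it is stricter than the stated rule: it requires the same number of roots with pairwise equal byte lengths, so some root lists it rejects could still fit.

```javascript
// True when both lists have the same number of roots and each new root
// has the same binary length as the root it replaces.
function sameRootLengths (oldRoots, newRoots) {
  if (oldRoots.length !== newRoots.length) return false
  return oldRoots.every((cid, i) => cid.bytes.length === newRoots[i].bytes.length)
}
```

A caller could compare the result of getRoots() on the existing CAR against the intended roots and only call updateRootsInBytes() when the check passes. The rejection described above remains the authoritative check.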

async CarWriter.updateRootsInFile(fd, roots)

  • fd (fs.promises.FileHandle|number): A file descriptor from the Node.js fs module. Either an integer, from fs.open() or a FileHandle from fs.promises.open().

  • roots (CID[]): A new list of roots to replace the existing list in the CAR header. The new header must take up the same number of bytes as the existing header, so the roots should collectively be the same byte length as the existing roots.

  • Returns: Promise<void>

Update the list of roots in the header of an existing CAR file. The first argument must be a file descriptor for a CAR file that is open in read and write mode (not append).

This operation is an overwrite; the total length of the CAR will not be modified. A rejection will occur if the new header would not be the same length as the existing header, in which case the CAR will not be modified. It is the responsibility of the user to ensure that the roots being replaced encode as the same length as the new roots.

This function is only available in Node.js and not a browser environment.

License

Licensed under either of

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.