gumnut

Permissive JS parser and tokenizer in Web Assembly / C, for Node or browser

Usage (no npm install needed):

<script type="module">
  import gumnut from 'https://cdn.skypack.dev/gumnut';
</script>

A permissive JavaScript tokenizer and parser in C. See a demo syntax highlighter.

Supports ESM code only (i.e., type="module", which is implicitly strict). Supports all language features in the draft specification (as of January 2021).

This is compiled to WebAssembly to run on the web or inside Node without native bindings. It's not reentrant, so you can't parse another file from within its callbacks. It does not generate an AST (although it does emit enough data to build one in JS), does not modify the input, and does not use malloc or free.

Usage

Install via your favourite package manager, then import it. This requires Node v13.10.0 or higher.

The parser works by invoking callbacks on every token as well as open/close announcements for a 'stack', which roughly maps to something you might make an AST node out of.

import {buildHarness} from 'gumnut';

const harness = await buildHarness();  // WebAssembly instantiation is async

const buffer = new TextEncoder().encode('console.info("hello");');
const memory = harness.prepare(buffer.length);
memory.set(buffer);

harness.handle({
  callback() {
    const type = harness.token.type();
    console.info('token', type, harness.token.string());
  },
  open(stackType) { /* open stack type, return false to skip contents */ },
  close(stackType) { /* close stack type */ },
});

harness.run();

This is fairly low-level and designed to be used by other tools.
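Because the parser emits tokens plus open/close announcements rather than an AST, a consumer that wants a tree has to assemble it. The following is a minimal sketch of one way to do that, not part of gumnut itself: the `makeTreeBuilder` helper, the `readToken` parameter and the node shape (`stackType`, `children`) are all hypothetical names chosen for illustration. It assumes every `open` is eventually matched by a `close`, and it never returns false from `open`, so no contents are skipped.

```javascript
// Hypothetical helper (not a gumnut API): collects tokens and stacks into a
// plain nested tree. `readToken` is any function that returns a snapshot of
// the current token.
function makeTreeBuilder(readToken) {
  const root = {stackType: null, children: []};
  const path = [root];  // the last entry is the node currently being filled

  const handlers = {
    callback() {
      // a token belongs to whichever stack is currently open
      path[path.length - 1].children.push(readToken());
    },
    open(stackType) {
      const node = {stackType, children: []};
      path[path.length - 1].children.push(node);
      path.push(node);
      // returns undefined (not false), so the contents are still visited
    },
    close(stackType) {
      path.pop();
    },
  };

  return {root, handlers};
}
```

With the harness from the example above, `readToken` could be `() => ({type: harness.token.type(), string: harness.token.string()})`. Pass `handlers` to `harness.handle()`, and read `root` once the run has finished.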

Module Imports Rewriter

This provides a rewriter for unresolved ESM imports (i.e., those pointing to "node_modules"), which could be used as part of an ESM dev server. Usage:

import buildImportsRewriter from 'gumnut/imports';
import buildResolver from 'esm-resolve';

// WebAssembly instantiation is async
const run = await buildImportsRewriter(buildResolver);
run('./source.js', (part) => process.stdout.write(part));

This example uses esm-resolve, which implements an ESM resolver in pure JS.

Coverage

This correctly parses all 'pass-explicit' tests from test262-parser-tests, except those which rely on non-strict mode behavior (e.g., those that use variable names like static and let).

Note

JavaScript has a single open-ended 'after-the-fact' ambiguity for keywords, as async is not always a keyword, even in strict mode. Here's an example:

// this is a function call of a method named "async"
async(/* anything can go here */) {}

// this is an async arrow function
async(/* anything can go here */) => {}

// this calls the async method on foo, and is _not_ ambiguous
foo.async() {}

This parser has to walk over code like this at most twice to resolve whether async is a keyword before continuing. See arrow functions break JavaScript parsers for more details.

It also needs to walk over non-async functions such as (a, b) => at most twice, to correctly label the arguments as either creating new variables in scope or just using them (as in a function call or simple parens).
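To illustrate why a second walk is needed, here is a deliberately simplified sketch of the lookahead: scan from an opening paren to its matching close, then check whether `=>` follows. This is not gumnut's implementation (which is written in C); the function name `parensFollowedByArrow` is hypothetical, and the sketch ignores strings, comments, regular expressions and template literals, all of which a real parser has to handle.

```javascript
// Simplified illustration only. Given the index of an opening "(", reports
// whether the matching ")" is followed by "=>" on the same line.
function parensFollowedByArrow(source, openIndex) {
  let depth = 0;
  for (let i = openIndex; i < source.length; ++i) {
    const ch = source[i];
    if (ch === '(') {
      ++depth;
    } else if (ch === ')') {
      --depth;
      if (depth === 0) {
        // only spaces and tabs are skipped here: JavaScript does not allow
        // a line break between the parameter list and the arrow
        return /^[ \t]*=>/.test(source.slice(i + 1));
      }
    }
  }
  return false;  // the parens never balanced
}

parensFollowedByArrow('async(a, b) => {}', 5);  // true
parensFollowedByArrow('async(a, b) {}', 5);     // false
parensFollowedByArrow('(a, (b)) => a', 0);      // true
```

Only after this first pass does a parser know how to label what it saw inside the parens, which is why the contents are then walked a second time.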

History

Since engineers like to rewrite everything all the time, see the 2020 branch of this code.