tokenizer-dsl

DSL for building streaming tokenizers.

Usage (no npm install needed):

<script type="module">
  import tokenizerDsl from 'https://cdn.skypack.dev/tokenizer-dsl';
</script>

⚠️ API documentation is available here.

The example below shows how to assemble takers into a tokenizer for numbers:

import {all, char, maybe, text, or, seq} from 'tokenizer-dsl';

const takeZero = text('0');

const takeLeadingDigit = char((charCode) => charCode >= 49 /*1*/ && charCode <= 57 /*9*/);

const takeDigits = all(char((charCode) => charCode >= 48 /*0*/ && charCode <= 57 /*9*/));

const takeDot = text('.');

const takeSign = char((charCode) => charCode === 43 /*+*/ || charCode === 45 /*-*/);

const takeNumber = seq(

    // sign
    maybe(takeSign),

    // integer
    or(
        takeZero,
        seq(
            takeLeadingDigit,
            takeDigits,
        ),
    ),

    // fraction
    maybe(
        seq(
            takeDot,
            maybe(takeDigits),
        ),
    ),
);

To find the offset at which a number ends, call takeNumber with an input string and the offset at which reading should start:

takeNumber(/*input*/ '0', /*offset*/ 0); // → 1

takeNumber(/*input*/ '123', /*offset*/ 0); // → 3

takeNumber(/*input*/ '+123', /*offset*/ 0); // → 4

takeNumber(/*input*/ '-0.123', /*offset*/ 0); // → 6

takeNumber(/*input*/ '-123.123', /*offset*/ 0); // → 8

takeNumber(/*input*/ 'aaa123bbb', /*offset*/ 3);
  // → 6, because valid number starts at offset 3 and ends at 6
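
The calls above suggest a simple contract: a taker receives an input string and an offset, and returns the offset at which its match ends. The sketch below re-implements a few combinators by hand to make that contract concrete. It is an illustration only, not the tokenizer-dsl source: the function bodies are assumptions, `NO_MATCH` is a local constant standing in for the library's no-match code, and `all` is assumed to accept zero repetitions.

```javascript
// Illustrative re-implementation, NOT the tokenizer-dsl source.
// Assumed contract: taker(input, offset) returns the end offset of the match,
// or NO_MATCH when nothing matches at that offset.
const NO_MATCH = -1;

// Match a literal string at the offset.
const text = (str) => (input, offset) =>
  input.startsWith(str, offset) ? offset + str.length : NO_MATCH;

// Match a single char whose char code satisfies the predicate.
const char = (test) => (input, offset) =>
  offset < input.length && test(input.charCodeAt(offset)) ? offset + 1 : NO_MATCH;

// Optional match: on failure, return the offset unchanged.
const maybe = (taker) => (input, offset) => {
  const end = taker(input, offset);
  return end === NO_MATCH ? offset : end;
};

// Alternatives: the first taker that matches wins.
const or = (...takers) => (input, offset) => {
  for (const taker of takers) {
    const end = taker(input, offset);
    if (end !== NO_MATCH) return end;
  }
  return NO_MATCH;
};

// Sequence: each taker starts where the previous one ended; all must match.
const seq = (...takers) => (input, offset) => {
  for (const taker of takers) {
    offset = taker(input, offset);
    if (offset === NO_MATCH) return NO_MATCH;
  }
  return offset;
};

// Repetition (assumption: zero repetitions still counts as a match).
const all = (taker) => (input, offset) => {
  for (;;) {
    const end = taker(input, offset);
    if (end === NO_MATCH || end === offset) return offset;
    offset = end;
  }
};

// Usage: an optional minus sign followed by a zero.
const takeSignedZero = seq(maybe(char((c) => c === 45 /*-*/)), text('0'));
takeSignedZero('-0', 0); // → 2
```

Because every combinator returns a function with the same shape as its arguments, takers nest freely, which is what lets takeNumber be assembled from smaller pieces.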

If the input string doesn't contain a valid number at the given offset, ResultCode.NO_MATCH (which equals -1) is returned:

takeNumber(/*input*/ 'aaa', /*offset*/ 0); // → -1

takeNumber(/*input*/ 'a123', /*offset*/ 0); // → -1

takeNumber(/*input*/ '0000', /*offset*/ 0);
  // → 1, because valid number starts at 0 and ends at 1
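
The return values above are enough to drive a simple scanning loop: advance by one character on a no-match, and jump to the returned offset on a match. The sketch below is a hedged illustration of that idea. `findNumbers` is a hypothetical helper, not part of the library, and the `takeNumber` defined here is a stand-in built on a sticky regular expression so the snippet runs without installing anything; a taker built with the DSL as shown earlier should be usable in its place.

```javascript
// Local stand-in for the no-match result code (-1, as described above).
const NO_MATCH = -1;

// Hypothetical stand-in for the DSL-built takeNumber, so this runs on its own.
// The sticky regex mirrors the same grammar: optional sign, then "0" or a
// non-zero digit followed by digits, then an optional "." with optional digits.
const takeNumber = (input, offset) => {
  const re = /[+-]?(?:0|[1-9]\d*)(?:\.\d*)?/y;
  re.lastIndex = offset;
  return re.test(input) ? re.lastIndex : NO_MATCH;
};

// Hypothetical helper: collect every substring that the taker matches.
function findNumbers(take, input) {
  const found = [];
  let offset = 0;
  while (offset < input.length) {
    const end = take(input, offset);
    if (end === NO_MATCH || end === offset) {
      offset++; // nothing usable here, skip one char
    } else {
      found.push(input.slice(offset, end));
      offset = end; // resume right after the match
    }
  }
  return found;
}

findNumbers(takeNumber, 'x=12, y=-0.5'); // → ['12', '-0.5']
```

The `end === offset` check guards against takers that succeed without consuming input, which would otherwise make the loop spin forever.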