@vlr/tokenize

basic tokenize function

Usage no npm install needed!

<script type="module">
  import vlrTokenize from 'https://cdn.skypack.dev/@vlr/tokenize';
</script>

README

@vlr/tokenize

Basic tokenize function used to split content string into known and unknown tokens Longer tokens are prioritized over short tokens

usage

Function tokenize returns objects with token and its position

import { tokenize } from "@vlr/tokenize";

const result = tokenize("my content", ["con", " "]);
// [
//  {token: "my", position: 0}, 
//  {token: " ", position: 2}, 
//  {token: "con", position: 3}
//  {token: "tent", position: 6} 
// ]

Function tokenizePlain returns array of strings

const result = tokenizePlain("some content", ["ten", "t", " "]);
// [ "some", " ", "con", "ten", "t" ];