pos-tagger.js

A Kotlin rewrite of the Stanford Part-Of-Speech Log-Linear tagger

Usage no npm install needed!

<script type="module">
  import posTaggerJs from 'https://cdn.skypack.dev/pos-tagger.js';
</script>

README

pos-tagger.js

Build and test the library

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.

This is a rewrite of the Stanford Part-Of-Speech Log-Linear tagger in Kotlin, it is compiled to JavaScript and made available through npm. No background Java service is needed.

This module includes two models:

  • left3words-wsj-0-18
  • bidirectional-distsim-wsj-0-18

The total package size (including both models) is under 10mb. Basic benchmarks show that the JavaScript library has similar performance to that of the original Java code for the "left3words" model.

License: GPL v2 or above

Usage

Install pos-tagger.js from npm:

npm install pos-tagger.js

Example code:
const Tagger = require("pos-tagger.js");
const tagger = new Tagger(Tagger.readModelSync("left3words-wsj-0-18"));

// alternatively
// const tagger = new Tagger(Tagger.readModelSync("bidirectional-distsim-wsj-0-18"));

const output = tagger.tag("I am a happy part-of-speech tagger. How do you do?");

console.log(output);
console.log("First word is a " + output[0][0].tag);

tag takes a string representing one or more "." terminated sentences. The output is a list with one element per input sentence. Each sentence element is itself a list with an element per input token, each element contains a word key with the original token, and a tag key which contains the Penn Treebank part-of-speech tag.

Example output:
[
  [
    { "word": "I", "tag": "PRP" },
    { "word": "am", "tag": "VBP" },
    { "word": "a", "tag": "DT" },
    { "word": "happy", "tag": "JJ" },
    { "word": "part-of-speech", "tag": "JJ" },
    { "word": "tagger", "tag": "NN" },
    { "word": ".", "tag": "." }
  ],
  [
    { "word": "How", "tag": "WRB" },
    { "word": "do", "tag": "VBP" },
    { "word": "you", "tag": "PRP" },
    { "word": "do", "tag": "VB" },
    { "word": "?", "tag": "."
    }
  ]
]
First word is a PRP