ngram-fingerprint

JavaScript version of the n-gram fingerprint algorithm from OpenRefine

Usage in the browser (no npm install needed):

<script type="module">
  import ngramFingerprint from 'https://cdn.skypack.dev/ngram-fingerprint';
</script>

A JavaScript implementation of the n-gram fingerprint algorithm from the OpenRefine project, as described in that project's clustering documentation.

Algorithm

The algorithm differs slightly from the one in Google Refine (now OpenRefine): extended western characters are replaced in the third step rather than the last one. This is done mainly so that the n-grams are built from ASCII characters and therefore sort properly.

  1. change all characters to their lowercase representation
  2. remove all punctuation, whitespace, and control characters
  3. normalize extended western characters to their ASCII representation
  4. obtain all the string n-grams
  5. sort the n-grams and remove duplicates
  6. join the sorted n-grams back together
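
The six steps above can be sketched roughly as follows. This is an illustrative approximation, not the package's actual source: the function name `ngramFingerprintSketch` is hypothetical, and step 3 is approximated with Unicode NFD decomposition plus removal of combining marks, which handles accented letters such as `é` but not characters like `ß` or `ø` that a full replacement table would cover.

```javascript
// Hypothetical sketch of the steps listed above; not the library's real code.
function ngramFingerprintSketch(n, input) {
  const cleaned = input
    .toLowerCase()                    // 1. lowercase
    .replace(/[\p{P}\p{Cc}\s]/gu, '') // 2. drop punctuation, control chars, whitespace
    .normalize('NFD')                 // 3. (approximation) split letters from accents...
    .replace(/\p{M}/gu, '');          //    ...and drop the combining marks

  // 4. collect every substring of length n; the Set removes duplicates (part of 5)
  const grams = new Set();
  for (let i = 0; i + n <= cleaned.length; i++) {
    grams.add(cleaned.slice(i, i + n));
  }

  // 5. sort the unique n-grams, 6. join them back together
  return [...grams].sort().join('');
}
```

Under these assumptions, `'paris'` with `n = 2` yields the bigrams `pa`, `ar`, `ri`, `is`, which sort to `ar`, `is`, `pa`, `ri`. Inputs shorter than `n` produce an empty string in this sketch; the real package may behave differently.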

Usage

var fingerprint = require('ngram-fingerprint')

fingerprint(2, 'paris') // returns 'arispari' (the sorted bigrams ar, is, pa, ri joined)
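
In OpenRefine, fingerprints like this serve as keys for grouping near-duplicate strings: values that produce the same key are treated as candidates for the same entity. A minimal sketch of that idea follows. The helper `clusterByKey` is hypothetical and not part of this package; it accepts any key function.

```javascript
// Hypothetical helper: groups values whose keys collide.
function clusterByKey(values, keyFn) {
  const clusters = new Map();
  for (const value of values) {
    const key = keyFn(value);
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(value);
  }
  // Only groups with more than one member are interesting as duplicate candidates.
  return [...clusters.values()].filter((group) => group.length > 1);
}
```

With the `fingerprint` function required above, this could be called as `clusterByKey(names, function (s) { return fingerprint(2, s) })`, where `names` is an array of strings to deduplicate.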