README

valid-8

Pure JavaScript implementation of UTF-8 validation.

Intended as a drop-in replacement for utf-8-validate.

Most of the time and effort went into developing an extensive test suite (over 18k assertions).

Testing

Tests are run with mocha using the usual command:

npm test

Many non-obvious aspects of UTF-8 validation are tested, including:

UTF-16 surrogates
long sequences
overlong sequences
incomplete sequences
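
To make the categories above concrete, here is a minimal sketch with one byte sequence per category. It does not use valid-8 itself: the helper `decodesAsUtf8` and the `samples` table are hypothetical names, and the check relies on the strict `TextDecoder` built into Node.js as a stand-in validator.

```javascript
// Stand-in check built on Node's WHATWG TextDecoder.
// This is NOT valid-8's code, only an independent way to look at the samples.
function decodesAsUtf8(bytes) {
  try {
    // fatal: true makes decode() throw on any malformed sequence
    new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.from(bytes));
    return true;
  } catch (err) {
    return false;
  }
}

// One hypothetical sample per category from the list above.
const samples = {
  surrogate: [0xED, 0xA0, 0x80],                // 3-byte form of U+D800
  longSequence: [0xF8, 0x88, 0x80, 0x80, 0x80], // 5-byte sequence
  overlong: [0xC0, 0x80],                       // 2-byte encoding of U+0000
  incomplete: [0xE4, 0xBD],                     // 3-byte lead with one byte missing
};

for (const [name, bytes] of Object.entries(samples)) {
  console.log(name, decodesAsUtf8(bytes));
}
```

A strict decoder should report `false` for each of these samples, while a well-formed string such as `Buffer.from('你好')` passes.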

Testing other libraries

To test other UTF-8 validation libraries, first install them:

cd test/others
npm install
cd ../..

and then run the tests for one library, e.g.:

npm test --lib=utf-8-validate

or:

npm test --lib=is-utf8

Speed

Validation speed is measured during the test run. So far this validator is the fastest (this is not a joke!).

valid-8: 300 Mb/s (pure JavaScript)
utf-8-validate: 260 Mb/s (C++)
is-utf8: 110 Mb/s (also pure JavaScript)
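
For readers who want to reproduce this kind of figure, the following is a rough sketch of a throughput measurement. It is not the benchmark from the test suite: `measureThroughput` and `standInValidate` are hypothetical names, the stand-in validator uses Node's `TextDecoder` rather than any of the libraries above, and the unit (1 MB = 1e6 bytes) is an assumption.

```javascript
// Hypothetical helper: run `validate` over `buf` for `rounds` passes
// and return the throughput in MB/s (1 MB = 1e6 bytes, an assumed unit).
function measureThroughput(validate, buf, rounds = 20) {
  const start = process.hrtime.bigint();
  let ok = true;
  for (let r = 0; r < rounds; r++) {
    ok = validate(buf) && ok; // keep the result live across iterations
  }
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return { ok, mbPerSec: (buf.length * rounds) / 1e6 / seconds };
}

// Stand-in validator; swap in the function under test here.
const decoder = new TextDecoder('utf-8', { fatal: true });
function standInValidate(buf) {
  try {
    decoder.decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

const sample = Buffer.from('你好，世界！'.repeat(10000));
const result = measureThroughput(standInValidate, sample);
console.log(result.mbPerSec.toFixed(0) + ' MB/s');
```

Numbers from such a loop depend heavily on the machine, the Node.js version and the input mix, so they are only comparable when all validators are run on the same data in the same process.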

API

Validation is simple:

const valid8 = require('valid-8')

if(!valid8(Buffer.from('你好，世界！')))
{
  // the buffer is not valid UTF-8
  // ...
}

For compatibility with utf-8-validate, an alias is provided: valid8.Validation.isValidUTF8 === valid8.

By default, valid8 rejects UTF-16 surrogates (0xD800-0xDFFF) and code points higher than 0x10FFFF, as required by the UTF-8 specification (RFC 3629).

To let surrogates pass validation, set valid8.surrogates = true.

To allow long sequences (5 or 6 bytes), set valid8.maxBytes to 5 or 6. 7-byte sequences are always rejected. The default is valid8.maxBytes = 4; it can also be set to 1, 2 or 3. E.g., set valid8.maxBytes = 2 to reject Chinese ideograms (and many other symbols), which need 3 bytes or more.
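
The effect of these two options can be illustrated with a small stand-alone validator. This is a minimal sketch and not valid-8's implementation: `isValidUtf8Sketch` is a hypothetical function that takes the options as an argument instead of reading properties, and it assumes that code points above 0x10FFFF are rejected only while maxBytes is 4 or less.

```javascript
// Hypothetical sketch, NOT valid-8's code.
// opts.surrogates (default false) and opts.maxBytes (default 4) mirror the options described above.
function isValidUtf8Sketch(buf, opts = {}) {
  const allowSurrogates = opts.surrogates === true;
  const maxBytes = opts.maxBytes === undefined ? 4 : opts.maxBytes;
  // MIN[n] is the smallest code point that needs n bytes; anything lower is overlong.
  const MIN = [0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000];
  let i = 0;
  while (i < buf.length) {
    const b = buf[i];
    let n, cp;
    if (b < 0x80) { n = 1; cp = b; }
    else if (b < 0xC0) return false;            // stray continuation byte
    else if (b < 0xE0) { n = 2; cp = b & 0x1F; }
    else if (b < 0xF0) { n = 3; cp = b & 0x0F; }
    else if (b < 0xF8) { n = 4; cp = b & 0x07; }
    else if (b < 0xFC) { n = 5; cp = b & 0x03; }
    else if (b < 0xFE) { n = 6; cp = b & 0x01; }
    else return false;                          // 0xFE and 0xFF never start a sequence
    if (n > maxBytes) return false;             // sequence longer than allowed
    if (i + n > buf.length) return false;       // incomplete sequence at end of input
    for (let k = 1; k < n; k++) {
      const c = buf[i + k];
      if ((c & 0xC0) !== 0x80) return false;    // not a continuation byte
      cp = cp * 64 + (c & 0x3F);                // multiply instead of shift to stay clear of 32-bit overflow
    }
    if (cp < MIN[n]) return false;              // overlong encoding
    if (!allowSurrogates && cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (maxBytes <= 4 && cp > 0x10FFFF) return false; // assumed rule, see lead-in
    i += n;
  }
  return true;
}

const hello = Buffer.from('你好，世界！');
console.log(isValidUtf8Sketch(hello));                  // default options
console.log(isValidUtf8Sketch(hello, { maxBytes: 2 })); // 3-byte ideograms no longer fit
```

With the default options the greeting validates; with `maxBytes: 2` it is rejected because each ideogram occupies three bytes.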

Rivals

utf-8-validate, C++
is-utf8, pure JavaScript
