README

TinyLD

:tada: Description

TinyLD (Tiny Language Detector) simply detects the language of a Unicode UTF-8 text:

Pure JS, no API calls, no dependencies (Node and browser compatible)
Blazing fast with a low memory footprint (unlike ML methods)
Trained on datasets from Tatoeba and UDHR
Supports 64 languages (24 for the web version)
Reliable even for very short texts (chatbot messages, keywords, ...)
Supports both ISO-639-1 and ISO-639-2
Available for both CommonJS and ESM
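
To illustrate the two code families mentioned in the list: ISO 639-1 codes have two letters and ISO 639-2 codes have three. The snippet below is only a sketch with a hand-written lookup table for the four languages that appear in this README's examples. The names `ISO_639_1_TO_639_2` and `toThreeLetterCode` are hypothetical and are not part of TinyLD; the three-letter values are the ISO 639-2/T (terminology) codes.

```javascript
// Hypothetical lookup table (not part of TinyLD): ISO 639-1 -> ISO 639-2/T
const ISO_639_1_TO_639_2 = new Map([
  ['en', 'eng'], // English
  ['fr', 'fra'], // French
  ['ja', 'jpn'], // Japanese
  ['ro', 'ron'], // Romanian
])

// Returns the three-letter code, or null when the code is not in the table.
function toThreeLetterCode(code) {
  return ISO_639_1_TO_639_2.get(code) ?? null
}

console.log(toThreeLetterCode('ja')) // jpn
console.log(toThreeLetterCode('xx')) // null
```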

:floppy_disk: Getting Started

Install

yarn add tinyld # or npm install --save tinyld

Install Documentation

:page_facing_up: TinyLD API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]

API Documentation
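
Since detectAll returns several candidates, each with an accuracy value, a caller may want to accept the best one only when it is confident enough. The following is a minimal sketch of that idea, assuming the result shape shown above (`{ lang, accuracy }`). The helper `pickLanguage` and its default threshold of 0.5 are assumptions made for illustration and are not part of TinyLD. The sample array reuses the two candidates shown above so that the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of TinyLD): return the language of the most
// accurate candidate, or null when no candidate reaches minAccuracy.
function pickLanguage(results, minAccuracy = 0.5) {
  let best = null
  for (const candidate of results) {
    if (best === null || candidate.accuracy > best.accuracy) best = candidate
  }
  return best !== null && best.accuracy >= minAccuracy ? best.lang : null
}

// The two candidates shown in the detectAll example above
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]

console.log(pickLanguage(sample)) // fr
console.log(pickLanguage(sample, 0.9)) // null
```

In real code, `sample` would be replaced by the value returned from `detectAll(text)`. A stricter threshold trades coverage for fewer wrong guesses.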

:paperclip: TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]

More Information

:chart_with_upwards_trend: Performance

Here is a comparison of TinyLD against other popular libraries.

SVG Graph

To summarize in one sentence:

Better, Faster, Smaller

More Benchmark Information

Developer

# Install
yarn

# Build
yarn build

# Test
yarn test

# Lint / Auto-fix code style problems
yarn lint

# Optional, used to generate src/profiles/* data from the language datasets
# Warning: this step is time-consuming and requires installing large datasets (described in ./docs/dev.md)
yarn train

# Optional, used to generate benchmark data/bench/*
yarn bench
