TinyLD
:tada: Description
Tiny Language Detector: simply detect the language of a Unicode (UTF-8) text.
- Pure JS, no API calls, no dependencies (Node and browser compatible)
- Blazing fast, with a low memory footprint (unlike ML methods)
- Trained on datasets from Tatoeba and UDHR
- Supports 64 languages (24 for the web version)
- Reliable even for very short texts (chatbot messages, keywords, ...)
- Supports both ISO-639-1 and ISO-639-2 language codes
- Available as both CommonJS and ESM
:floppy_disk: Getting Started
Install
yarn add tinyld # or npm install --save tinyld
:page_facing_up: TinyLD API
import { detect, detectAll } from 'tinyld'
// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en
// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
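When an application needs a single answer but wants to reject low-confidence guesses, the `detectAll()` result can be post-processed. The sketch below relies only on the `{ lang, accuracy }` entry shape shown in the example above and does not assume the list is sorted. `pickLanguage`, `minAccuracy` and `allowed` are hypothetical names, not part of the tinyld API, and the 0.5 default threshold is an arbitrary choice.

```javascript
// Hypothetical helper (not part of tinyld): pick the best guess from a
// detectAll()-style result, or return null when confidence is too low.
// Assumes each entry looks like { lang: 'fr', accuracy: 0.5238 }.
function pickLanguage(results, minAccuracy = 0.5, allowed = null) {
  let best = null
  for (const { lang, accuracy } of results) {
    // Optionally ignore languages the application does not handle
    if (allowed && !allowed.includes(lang)) continue
    // Track the highest-accuracy entry without relying on the list order
    if (best === null || accuracy > best.accuracy) best = { lang, accuracy }
  }
  // Reject weak winners so the caller can fall back to a default language
  return best !== null && best.accuracy >= minAccuracy ? best.lang : null
}

// Sample output copied from the detectAll() example in this README
const sample = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]

console.log(pickLanguage(sample))              // fr
console.log(pickLanguage(sample, 0.6))         // null
console.log(pickLanguage(sample, 0.3, ['ro'])) // ro
```

In real code, `sample` would be replaced by the value returned from `detectAll(text)`; returning `null` instead of throwing leaves the fallback policy to the caller.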
:paperclip: TinyLD CLI
tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]
:chart_with_upwards_trend: Performance
Here is a comparison of TinyLD against other popular libraries.
To summarize in one sentence:
Better, Faster, Smaller
Developer
# Install
yarn
# Build
yarn build
# Test
yarn test
# Lint / Auto-fix code style problems
yarn lint
# Optional: generate the src/profiles/* data from the language datasets
# Warning: this step is time-consuming and requires installing large datasets (described in ./docs/dev.md)
yarn train
# Optional: generate the benchmark data in data/bench/*
yarn bench