tinyld

Simple and Performant Language detection library (pure JS and zero dependencies)

Usage (no npm install needed)

<script type="module">
  // Named import, matching the API shown in the README below
  import { detect } from 'https://cdn.skypack.dev/tinyld';
</script>


TinyLD


:tada: Description

Tiny Language Detector: detect the language of a Unicode (UTF-8) text.

  • Pure JS, no API calls, no dependencies (Node and browser compatible)
  • Blazing fast, with a low memory footprint (unlike ML methods)
  • Trained on datasets from Tatoeba and UDHR
  • Supports 64 languages (24 for the web version)
  • Reliable even for really short texts (chatbot, keywords, ...)
  • Supports both ISO 639-1 and ISO 639-2 codes
  • Available for both CommonJS and ESM


:floppy_disk: Getting Started

Install

yarn add tinyld # or npm install --save tinyld

Install Documentation


:page_facing_up: TinyLD API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
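Since detectAll returns candidates with an accuracy score, a caller may want to accept a result only when it is clearly ahead. The sketch below is one way to do that. `pickConfident` and its thresholds are hypothetical and not part of tinyld. The sample array is hard-coded from the two entries shown in the output above, so the snippet runs without the library installed.

```javascript
// Sample data copied from the detectAll() output shown above
// (only the first two entries are given there).
const results = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
];

// Hypothetical helper (not a tinyld function): return the top language only
// if it is confident enough and clearly ahead of the runner-up, else null.
// The default thresholds are arbitrary assumptions; tune them for your data.
function pickConfident(candidates, minAccuracy = 0.5, minMargin = 0.1) {
  if (!Array.isArray(candidates) || candidates.length === 0) return null;

  // Sort a copy so the caller's array is left untouched.
  const sorted = [...candidates].sort((a, b) => b.accuracy - a.accuracy);
  const [best, second] = sorted;

  if (best.accuracy < minAccuracy) return null;
  if (second && best.accuracy - second.accuracy < minMargin) return null;
  return best.lang;
}

console.log(pickConfident(results));      // fr
console.log(pickConfident(results, 0.9)); // null
```

In real use, the array would come from `detectAll(text)` instead of the hard-coded sample. Returning null lets the caller fall back to a default language or ask the user.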

API Documentation


:paperclip: TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]

More Information


:chart_with_upwards_trend: Performance

Here is a comparison of TinyLD against other popular libraries.

[Figure: benchmark graph comparing TinyLD with other libraries]

To summarize in one sentence:

Better, Faster, Smaller

More Benchmark Information

--

Developer

# Install
yarn

# Build
yarn build

# Test
yarn test

# Lint / Auto-fix code style problems
yarn lint

# Optional, used to generate src/profiles/* data from the language datasets
# Warning: This step is time-consuming and requires installing large datasets (described in ./docs/dev.md)
yarn train

# Optional, used to generate benchmark data/bench/*
yarn bench