trigrams

Trigram files for 400+ languages

Usage (no npm install needed):

<script type="module">
  import {top, min} from 'https://cdn.skypack.dev/trigrams';
</script>


Install

This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.

npm:

npm install trigrams

API

This package exports the following identifiers: top, min. There is no default export.

top()

import {top} from 'trigrams'

console.log((await top()).pam)

Yields:

{
  'isa': 6,
  'upa': 6,
  'i k': 6,
  // …
  'ang': 273,
  'ing': 282,
  'ng ': 572
}

Returns a promise that resolves to an object mapping UDHR in Unicode codes (such as pam) to objects, each of which maps the top 300 trigrams for that language to their occurrence counts.

min()

import {min} from 'trigrams'

console.log((await min()).nld)

Yields:

[
  ' ar',
  'eer',
  'tij',
  // …
  'de ',
  'an ',
  'en '
]

Like top, but returns a promise that resolves to an object mapping UDHR in Unicode codes to arrays. Each array contains the top 300 trigrams, sorted from least frequent to most frequent.
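To illustrate how the two shapes relate, here is a minimal sketch that turns one top-style object into a min-style array. The helper name toSortedTrigrams and the sample object are hypothetical; the counts are taken from the pam example above. This is not necessarily how the package builds its files.

```javascript
// Hypothetical sample shaped like one language entry returned by `top()`.
// The counts come from the `pam` example above; the rest is elided.
const sample = {isa: 6, 'i k': 6, 'ng ': 572, ang: 273, ing: 282}

// Hypothetical helper: order trigrams from least to most frequent and
// drop the counts, giving the same shape as one entry of `min()`.
function toSortedTrigrams(counts) {
  return Object.entries(counts)
    .sort((left, right) => left[1] - right[1]) // Ascending by count.
    .map(([trigram]) => trigram)
}

console.log(toSortedTrigrams(sample))
// → [ 'isa', 'i k', 'ang', 'ing', 'ng ' ]
```

Ties keep their original order because Array.prototype.sort is stable in current JavaScript engines.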

Data

The trigrams are based on the Unicode versions of the Universal Declaration of Human Rights.

The files are created from all paragraphs made available by wooorm/udhr and do not include headings and such.

Cleaning

Before creating trigrams:

  • The Unicode characters from \u0021 to \u0040 (inclusive) are removed
  • One or more white space characters (\s+) are replaced with a single space
  • Uppercase alphabetic characters ([A-Z]) are lowercased

Additionally, the input is padded with two spaces on both sides.
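The cleaning steps above can be sketched as follows. This is a rough illustration, not the code used to build the files: the helper names clean and countTrigrams are hypothetical, the steps are applied in the order listed, and trigrams are assumed to be every run of three consecutive characters.

```javascript
// Sketch only: `clean` and `countTrigrams` are hypothetical helper names,
// not exports of the `trigrams` package.

// Apply the cleaning steps in the order they are listed above.
function clean(value) {
  const cleaned = String(value)
    // Remove U+0021 (!) through U+0040 (@): ASCII punctuation and digits.
    .replace(/[\u0021-\u0040]/g, '')
    // Collapse each run of white space into a single space.
    .replace(/\s+/g, ' ')
    // Lowercase the ASCII letters A-Z.
    .replace(/[A-Z]/g, (letter) => letter.toLowerCase())

  // Pad with two spaces on both sides.
  return '  ' + cleaned + '  '
}

// Count every run of three consecutive characters (assumed sliding window).
function countTrigrams(value) {
  const text = clean(value)
  const counts = {}

  for (let index = 0; index + 3 <= text.length; index++) {
    const trigram = text.slice(index, index + 3)
    counts[trigram] = (counts[trigram] || 0) + 1
  }

  return counts
}

console.log(clean('Article 1: All human beings'))
// → '  article all human beings  '
console.log(countTrigrams('aaaa'))
// → { '  a': 1, ' aa': 1, aaa: 2, 'aa ': 1, 'a  ': 1 }
```

Because the characters in the first step are removed rather than replaced, a string such as a-b becomes ab in this sketch.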

Support

| Code | Name | OHCHR |
| - | - | - |
| 007 | Sãotomense | 1128 |
| 008 | Crioulo, Upper Guinea (008) | No |
| 009 | Mbundu (009) | No |
| 010 | Tetun Dili | No |
| 011 | Umbundu (011) | No |
| 012 | (Bizisa) | bz1 |
| 013 | (Mijisa) | bz2 |
| 014 | (Maiunan) | ma1 |
| 016 | (Minjiang, spoken) | mi1_spok |
| 017 | (Minjiang, written) | mi1_written |
| 020 | Drung | ty1 |
| 026 | (Yeonbyeon) | ye1 |
| aar | Afar | aar |
| abk | Abkhaz | abk |
| ace | Aceh | atj |
| acu | Achuar-Shiwiar | acu |
| acu_1 | Achuar-Shiwiar (1) | jiv |
| ada | Dangme | gac1 |
| ady | Adyghe | ady |
| afr | Afrikaans | afk |
| agr | Aguaruna | agr |
| aii | Assyrian Neo-Aramaic | aii |
| ajg | Aja | ajg |
| aka_akuapem | Twi (Akuapem) | tws1 |
| aka_asante | Twi (Asante) | ass |
| aka_fante | Fante | tws3 |
| als | Albanian, Tosk | aln |
| alt | Altai, Southern | alt |
| amc | Amahuaca | amc |
| ame | Yaneshaʼ | ame |
| amh | Amharic | amh |
| ami | Amis | ami |
| amr | Amarakaeri | amr |
| arb | Arabic, Standard | arz |
| arl | Arabela | arl |
| arn | Mapudungun | aru |
| ast | Asturian | aub |
| auc | Waorani | 1127 |
| auv | Occitan (Auvergnat) | auv1 |
| ayr | Aymara, Central | aym |
| azj_cyrl | Azerbaijani, North (Cyrillic) | azb1 |
| azj_latn | Azerbaijani, North (Latin) | azb |
| bam | Bamanankan | bra |
| ban | Bali | bzc |
| bax | Bamun | bax |
| bba | Baatonum | bba |
| bci | Baoulé | bci |
| bcl | Bicolano, Central | bkl |
| bel | Belarusan | ruw |
| bem | Bemba | bem |
| ben | Bengali | bng |
| bfa | Bari | bfa |
| bho | Bhojpuri | bhj |
| bin | Edo | edo |
| bis | Bislama | bcy |
| blt | Tai Dam | No |
| blu | Hmong Njua | blu |
| boa | Bora | boa |
| bod | Tibetan, Central | tic |
| bos_cyrl | Bosnian (Cyrillic) | src4 |
| bos_latn | Bosnian (Latin) | src1 |
| bre | Breton | brt |
| btb | Bulu | btb |
| buc | Bushi | buc |
| bug | Bugis | bpr |
| bul | Bulgarian | blg |
| cab | Garifuna | cab |
| cak | Kaqchikel, Central | cak1 |
| cat | Catalan-Valencian-Balear | cln |
| cbi | Chachi | 1122 |
| cbr | Cashibo-Cacataibo | cbr |
| cbs | Cashinahua | cbs |
| cbt | Chayahuita | cbt |
| cbu | Candoshi-Shapra | cbu |
| ccx | Zhuang, Yongbei | ccx |
| ceb | Cebuano | ceb |
| ces | Czech | czc |
| cha | Chamorro | cjd |
| chj | Chinantec, Ojitlán | chj |
| chk | Chuukese | tru1 |
| chr_cased | Cherokee (cased) | No |
| chr_uppercase | Cherokee (uppercase) | No |
| cic | Chickasaw | cic |
| cjk | Chokwe | cjk |
| cjk_AO | Chokwe (Angola) | cjk |
| cjs | Shor | cjs |
| ckb | Kurdish, Central | kdb1 |
| cnh | Chin, Haka | hak |
| cni | Asháninka | cni |
| cof | Colorado | cof |
| cos | Corsican | coi |
| cot | Caquinte | cot |
| cpu | Ashéninka, Pichis | cpu |
| crh | Crimean Tatar | crh |
| crs | Seselwa Creole French | crs |
| csa | Chinantec, Chiltepec | csa |
| csw | Cree, Swampy | crm |
| ctd | Chin, Tedim | tid |
| cym | Welsh | wls |
| dag | Dagbani | dag |
| dan | Danish | dns |
| ddn | Dendi | den |
| deu_1901 | German, Standard (1901) | ger |
| deu_1996 | German, Standard (1996) | No |
| dga | Dagaare, Southern | dga |
| dip | Dinka, Northeastern | dinka |
| div | Maldivian | div |
| dyo | Jola-Fonyi | dyo |
| dyu | Jula | dyu |
| dzo | Dzongkha | dzo |
| ell_monotonic | Greek (monotonic) | grk |
| ell_polytonic | Greek (polytonic) | No |
| emk | Maninkakan, Eastern | mni |
| eml | Romagnolo | eml |
| eng | English | eng |
| epo | Esperanto | 1115 |
| ese | Ese Ejja | ese |
| est | Estonian | est |
| eus | Basque | bsq |
| eve | Even | eve |
| evn | Evenki | evn |
| ewe | Éwé | ewe |
| fao | Faroese | fae |
| fij | Fijian | fji |
| fin | Finnish | fin |
| fkv | Finnish, Kven | fkv |
| flm | Chin, Falam | fal |
| fon | Fon | foa |
| fra | French | frn |
| fri | Frisian, Western | fri |
| fuf | Pular | fuf |
| fur | Friulian | frl |
| fuv | Fulfulde, Nigerian | fum |
| fuv2 | Fulfulde, Nigerian (2) | fuv |
| gaa | Ga | gac2 |
| gag | Gagauz | gag |
| gax | Oromo, Borana-Arsi-Guji | gax |
| gjn | Gonja | dum |
| gkp | Kpelle, Guinea | pke |
| gla | Gaelic, Scottish | gls |
| gld | Nanai | gld |
| gle | Gaelic, Irish | gli1 |
| glg | Galician | gln |
| glv | Manx | No |
| gsw1 | Alemannisch (Elsassisch) | gsw |
| guc | Wayuu | guc |
| gug | Guaraní, Paraguayan | gun |
| guj | Gujarati | gjr |
| guu | Yanomamö | guu |
| gyr | Guarayu | gua |
| hat_kreyol | Haitian Creole French (Kreyol) | hat |
| hat_popular | Haitian Creole French (Popular) | hat1 |
| hau_NE | Hausa (Niger) | gej |
| hau_NG | Hausa (Nigeria) | gej |
| haw | Hawaiian | hwi |
| hea | Hmong, Northern Qiandong | hea |
| heb | Hebrew | hbr |
| hil | Hiligaynon | hil |
| hin | Hindi | hnd |
| hlt | Chin, Matu | hlt |
| hms | Hmong, Southern Qiandong | hms |
| hna | Mina | hna |
| hni | Hani | hni |
| hns | Hindustani, Sarnami | hns |
| hrv | Croatian | src2 |
| hsb | Sorbian, Upper | wee |
| hsf | Huastec (Sierra de Otontepec) | hus |
| hun | Hungarian | hng |
| hus | Huastec (Veracruz) | 1118 |
| huu | Huitoto, Murui | huu |
| hva | Huastec (San Luís Potosí) | hva |
| hye | Armenian | arm |
| ibb | Ibibio | ibb |
| ibo | Igbo | igr |
| ido | Ido | 1120 |
| ike | Inuktitut, Eastern Canadian | esb |
| ilo | Ilocano | ilo |
| ina | Interlingua | 1119 |
| ind | Indonesian | inz |
| isl | Icelandic | ice |
| ita | Italian | itn |
| jav | Javanese (Latin) | jan |
| jav_java | Javanese (Javanese) | No |
| jiv | Shuar | 1125 |
| jpn | Japanese | jpn |
| kal | Inuktitut, Greenlandic | esg |
| kan | Kannada | kjv |
| kat | Georgian | geo |
| kaz | Kazakh | kaz |
| kbd | Kabardian | kbd |
| kbp | Kabiyé | kbp |
| kde | Makonde | kde |
| kdh | Tem | kdh |
| kea | Kabuverdianu | kea |
| kek | Q'eqchi' | 1116 |
| kha | Khasi | kha |
| khk | Mongolian, Halh (Cyrillic) | khk |
| khm | Khmer, Central | khm |
| kin | Rwanda | rua1 |
| kir | Kirghiz | kdo |
| kjh | Khakas | kjh |
| kkh_lana | Khün | No |
| kmb | Mbundu | mlo |
| kmr | Kurdish, Northern | kur |
| knc | Kanuri, Central | kph |
| kng | Koongo | kon |
| kng_AO | Koongo (Angola) | kng |
| koi | Komi-Permyak | koi |
| koo | Konjo | koo1 |
| kor | Korean | kkn |
| kqn | Kaonde | kqn |
| kqs | Kissi, Northern | kqs |
| kri | Krio | kri |
| krl | Karelian | krl |
| ktu | Kituba | ktu |
| kwi | Awa-Cuaiquer | kwi |
| lad | Ladino | lad |
| lao | Lao | nol |
| lat | Latin | ltn |
| lat_1 | Latin (1) | ltn1 |
| lav | Latvian | lat |
| lia | Limba, West-Central | lia |
| lij | Ligurian | lij |
| lin | Lingala | lin |
| lin_tones | Lingala (tones) | No |
| lit | Lithuanian | lit |
| lld | Ladin | lld |
| lnc | Occitan (Languedocien) | prv1 |
| lns | Lamnso' | nso |
| lob | Lobi | lob |
| lot | Otuho | lot |
| loz | Lozi | lbm1 |
| ltz | Luxembourgeois | lux |
| lua | Luba-Kasai | lub |
| lue | Luvale | lue |
| lug | Ganda | lap1 |
| lun | Lunda | mlo1 |
| lus | Mizo | lus |
| mad | Madura | mhj |
| mag | Magahi | mqm |
| mah | Marshallese | mzm |
| mai | Maithili | No |
| mal | Malayalam | mjs |
| mal_chillus | Malayalam | mjs |
| mam | Mam, Northern | mam |
| mar | Marathi | mrt |
| maz | Mazahua Central | maz |
| mcd | Sharanahua | mcd |
| mcf | Matsés | mcf |
| men | Mende | mfy |
| mfq | Moba | mfq |
| mic | Micmac | mic |
| min | Minangkabau | mpu |
| miq | Mískito | miq |
| mkd | Macedonian | mkj |
| mlt | Maltese | mls |
| mly_arab | Malay (Arabic) | No |
| mly_latn | Malay (Latin) | mli |
| mnw | Mon | No |
| mos | Mòoré | mhm |
| mri | Maori | mbf |
| mto | Mixe, Totontepec | mto |
| mxi | Mozarabic | moz |
| mxv | Mixtec, Metlatónoc | mxv |
| mya | Burmese | bms |
| mzi | Mazatec, Ixcatlán | mao |
| nav | Navajo | nav |
| nba | Nyemba | nba |
| nbl | Ndebele | nel |
| ndo | Ndonga | 1114 |
| nds | Saxon, Low | ige |
| nep | Nepali | nep |
| nhn | Nahuatl, Central | nhn |
| nio | Nganasan | nio |
| niu | Niue | niu |
| njo | Naga, Ao | njo |
| nku | Kulango, Bouna | kou |
| nld | Dutch | dut |
| nno | Norwegian, Nynorsk | nrn |
| nob | Norwegian, Bokmål | nrr |
| not | Nomatsiguenga | not |
| nso | Sotho, Northern | srt |
| nya_chechewa | Nyanja (Chechewa) | nyj1 |
| nya_chinyanja | Nyanja (Chinyanja) | nyj |
| nym | Nyamwezi | nyz |
| nyn | Nyankore | nyn1 |
| nzi | Nzema | nze |
| oaa | Orok | oaa |
| oci_1 | Occitan (Francoprovençal, Fribourg) | Fr3 |
| oci_2 | Occitan (Francoprovençal, Savoie) | fr2 |
| oci_3 | Occitan (Francoprovençal, Vaud) | fr4 |
| oci_4 | Occitan (Francoprovençal, Valais) | frp |
| ojb | Ojibwa, Northwestern | ojb |
| oki | Okiek | oki |
| orh | Oroqen | orh |
| oss | Osetin | ose |
| ote | Otomi, Mezquital | 1111 |
| pam | Pampangan | pmp |
| pan | Panjabi, Eastern | pnj1 |
| pap | Papiamentu | pap |
| pau | Palauan | plu |
| pbb | Páez | pbb |
| pbu | Pashto, Northern | pbu |
| pcd | Picard | frn2 |
| pcm | Pidgin, Nigerian | pcm |
| pes_1 | Farsi, Western | prs |
| pes_2 | Dari | prs1 |
| pis | Pijin | pis |
| piu | Pintupi-Luritja | piu |
| plt | Malagasy, Plateau | mex |
| pnb | Panjabi, Western | No |
| pol | Polish | pql |
| pon | Pohnpeian | pnf |
| por_BR | Portuguese (Brazil) | No |
| por_PT | Portuguese (Portugal) | por |
| pov | Crioulo, Upper Guinea | gbc |
| ppl | Pipil | ppl |
| prv | Occitan | pro |
| quc | K'iche', Central | 1117 |
| qud | Quechua (Unified Quichua, old Hispanic orthography) | qud1 |
| qug | Quichua, Chimborazo Highland | qug |
| quy | Quechua, Ayacucho | quy |
| quz | Quechua, Cusco | quz |
| qva | Quechua, Ambo-Pasco | qeg |
| qvc | Quechua, Cajamarca | qnt |
| qvh | Quechua, Huamalíes-Dos de Mayo Huánuco | qej |
| qvm | Quechua, Margos-Yarowilca-Lauricocha | qei |
| qvn | Quechua, North Junín | qju |
| qwh | Quechua, Huaylas Ancash | qan |
| qxa | Quechua, South Bolivian | qec1 |
| qxn | Quechua, Northern Conchucos Ancash | qed |
| qxu | Quechua, Arequipa-La Unión | qar |
| rar | Rarotongan | rrt |
| rmn | Romani, Balkan | rmn |
| rmn_1 | Romani, Balkan (1) | rmn1 |
| rmy | Aromanian | rmy1 |
| roh | Romansch | No |
| roh_puter | Romansch (Puter) | No |
| roh_rumgr | Romansch (Grischun) | rhe |
| roh_surmiran | Romansch (Surmiran) | No |
| roh_sursilv | Romansch (Sursilvan) | No |
| roh_sutsilv | Romansch (Sutsilvan) | No |
| roh_vallader | Romansch (Vallader) | No |
| ron_1953 | Romanian (1953) | rum |
| ron_1993 | Romanian (1993) | No |
| ron_2006 | Romanian (2006) | No |
| run | Rundi | rud1 |
| rus | Russian | rus |
| sag | Sango | saj |
| sah | Yakut | sah |
| san | Sanskrit | skt |
| sco | Scots | sco |
| sey | Secoya | 1123 |
| shk | Shilluk | shk |
| shn | Shan | sjn |
| shp | Shipibo-Conibo | shp |
| sin | Sinhala | snh |
| skr | Seraiki | skr |
| slk | Slovak | slo |
| slv | Slovenian | slv |
| sme | Saami, North | lpi |
| smo | Samoan | smy |
| sna | Shona | shd |
| snk | Soninke | snn |
| snn | Siona | 1121 |
| som | Somali | som |
| sot | Sotho, Southern | sso |
| spa | Spanish | spn |
| src | Sardinian, Logudorese | srd |
| srp_cyrl | Serbian (Cyrillic) | src5 |
| srp_latn | Serbian (Latin) | src3 |
| srr | Serer-Sine | ses |
| ssw | Swati | swz1 |
| suk | Sukuma | sua |
| sun | Sunda | suo |
| sus | Susu | sus |
| swb | Comorian, Maore | swb |
| swe | Swedish | swd |
| swh | Swahili | swa |
| tah | Tahitian | tht |
| tam | Tamil | tcv |
| tam_LK | Tamil (Sri Lanka) | No |
| tat | Tatar | ttr |
| tbz | Ditammari | tbz |
| tca | Ticuna | tca |
| tel | Telugu | tcw |
| tem | Themne | tej |
| tet | Tetun | ttm |
| tgk | Tajiki | pet |
| tgl | Tagalog | tgl |
| tha | Thai | thj |
| tha2 | Thai (2) | No |
| tir | Tigrigna | tgn |
| tiv | Tiv | tiv |
| tly | Talysh | tly |
| tob | Toba | tob |
| toi | Tonga | toi |
| toj | Tojolabal | toj |
| ton | Tongan | tov |
| top | Totonac, Papantla | top |
| tpi | Tok Pisin | pdg |
| tsn | Tswana | tsw |
| tso_MZ | Tsonga (Mozambique) | tso |
| tso_ZW | Tsonga (Zimbabwe) | tso1 |
| tsz | Purepecha | 1112 |
| tuk_cyrl | Turkmen (Cyrillic) | tck |
| tuk_latn | Turkmen (Latin) | No |
| tur | Turkish | trk |
| tyv | Tuva | tyv |
| tzc | Tzotzil (Chamula) | tzc |
| tzh | Tzeltal, Oxchuc | tzc1 |
| tzm | Tamazight, Central Atlas | tzm |
| uig_arab | Uyghur (Arabic) | uig |
| uig_latn | Uyghur (Latin) | No |
| ukr | Ukrainian | ukr |
| umb | Umbundu | mnf |
| ura | Urarina | ura |
| urd | Urdu | urd |
| urd_2 | Urdu (2) | urd |
| uzn_cyrl | Uzbek, Northern (Cyrillic) | uzb1 |
| uzn_latn | Uzbek, Northern (Latin) | uzb |
| vai | Vai | vai |
| vec | Venetian | vec |
| ven | Venda | tsh |
| ven2 | Venda | ven |
| vep | Veps | vep |
| vie | Vietnamese | vie |
| vmw | Makhuwa | vmw |
| war | Waray-Waray | wry |
| wln | Walloon | frn1 |
| wol | Wolof | wol |
| wwa | Waama | ako |
| xho | Xhosa | xos |
| xsm | Kasem | kas |
| yad | Yagua | yad |
| yao | Yao | yao |
| yap | Yapese | yps |
| ydd | Yiddish, Eastern | ydd |
| ykg | Yukaghir, Northern | ykg |
| yor | Yoruba | yor |
| yua | Maya, Yucatán | yua |
| zam | Zapotec, Miahuatlán | zam |
| zdj | Comorian, Ngazidja | zdj |
| zgh | Tamazight, Standard Morocan | ama |
| zro | Záparo | 1124 |
| ztu | Zapotec, Güilá | ztu1 |
| zul | Zulu | zuu |

License

MIT © Titus Wormer