trigrams
Trigrams for 400+ languages.
Install
This package is ESM only: Node 12+ is needed to use it, and it must be imported
instead of required.
npm:

```sh
npm install trigrams
```
API
This package exports the following identifiers: top, min.
There is no default export.
top()
```js
import {top} from 'trigrams'

console.log((await top()).pam)
```
Yields:
```js
{
  'isa': 6,
  'upa': 6,
  'i k': 6,
  // …
  'ang': 273,
  'ing': 282,
  'ng ': 572
}
```
Returns a promise that resolves to an object mapping UDHR in Unicode codes (such as pam and nld, see Support) to objects that map each of that language's top 300 trigrams to its number of occurrences.
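The nested result can be walked with plain object helpers. The sketch below is illustrative only: mostFrequent is a hypothetical helper, not part of this package, and it is run on the excerpt of the pam data shown above rather than on the full result of top().

```js
// Hypothetical helper (not exported by trigrams): returns the `n` most
// frequent trigrams for `code`, most frequent first. Assumes `data` has the
// shape documented for top(): {[code]: {[trigram]: count}}.
function mostFrequent(data, code, n) {
  const counts = data[code]
  if (!counts) return [] // unknown code: nothing to rank
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1]) // highest count first
    .slice(0, n)
    .map(([trigram]) => trigram)
}

// Excerpt of the `pam` counts shown above; the real object has 300 entries.
// With the package this would be: mostFrequent(await top(), 'pam', 3)
const excerpt = {
  pam: {isa: 6, upa: 6, 'i k': 6, ang: 273, ing: 282, 'ng ': 572}
}

console.log(mostFrequent(excerpt, 'pam', 3)) // → ['ng ', 'ing', 'ang']
```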
min()
```js
import {min} from 'trigrams'

console.log((await min()).nld)
```
Yields:
```js
[
  ' ar',
  'eer',
  'tij',
  // …
  'de ',
  'an ',
  'en '
]
```
Like top(), but the promise resolves to an object mapping the same codes to
arrays of the top 300 trigrams, sorted from least frequent to most frequent.
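Assuming min() holds the same 300 trigrams as top(), one language's counts can be turned into a min()-style array by sorting on the count. This is a sketch, not the package's code: toAscendingList is a hypothetical name, and because this README does not say how min() orders trigrams with equal counts, tied entries may come out in a different order than min() returns them.

```js
// Hypothetical helper: converts one language's top()-style counts
// ({[trigram]: count}) into a min()-style array, least frequent first.
// Array.prototype.sort is stable (ES2019+), so trigrams with equal counts
// keep the order in which Object.entries returns them; min() may order
// such ties differently.
function toAscendingList(counts) {
  return Object.entries(counts)
    .sort((a, b) => a[1] - b[1]) // lowest count first
    .map(([trigram]) => trigram)
}

// Using the `pam` excerpt from the top() example:
const pam = {isa: 6, upa: 6, 'i k': 6, ang: 273, ing: 282, 'ng ': 572}
console.log(toAscendingList(pam))
// → ['isa', 'upa', 'i k', 'ang', 'ing', 'ng ']
```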
Data
The trigrams are based on the Unicode versions of the Universal Declaration of Human Rights (UDHR).
The data is built from all paragraphs made available by
wooorm/udhr; headings and other non-paragraph content are not included.
Cleaning
Before creating trigrams:

- the Unicode characters from `\u0021` to `\u0040` (both inclusive) are removed; this range covers the ASCII digits and punctuation such as `!`, `,`, `.`, and `?`
- one or more white space characters (`\s+`) are replaced with a single space
- alphabetic characters are lower cased (`[A-Z]`)

Additionally, the input is padded with two spaces on both sides.
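The cleaning and padding steps above can be sketched in JavaScript as follows. This is not the package's build script: clean and trigramCounts are hypothetical names, and the sketch assumes that `[A-Z]` means only ASCII capitals are lower cased, that trigrams are counted with a sliding window over code points, and that padding is applied to each string passed in.

```js
// Sketch of the documented cleaning steps (not the package's own code).
function clean(text) {
  return text
    .replace(/[\u0021-\u0040]/g, '') // remove U+0021..U+0040, both inclusive
    .replace(/\s+/g, ' ') // collapse runs of white space into one space
    .replace(/[A-Z]/g, (c) => c.toLowerCase()) // lower case ASCII A-Z only (assumption)
}

// Counts every trigram in the cleaned text after padding it with two spaces
// on both sides. Assumption: the window slides over code points, not UTF-16
// code units, so characters outside the BMP are never split in half.
function trigramCounts(text) {
  const chars = Array.from('  ' + clean(text) + '  ')
  const counts = {}
  for (let i = 0; i + 3 <= chars.length; i++) {
    const trigram = chars.slice(i, i + 3).join('')
    counts[trigram] = (counts[trigram] || 0) + 1
  }
  return counts
}

console.log(trigramCounts('Hi, hi!'))
// → {'  h': 1, ' hi': 2, 'hi ': 2, 'i h': 1, 'i  ': 1}
```

Taking the 300 entries with the highest counts from such an object would give data in the shape that top() returns per language.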
Support
| Code | Name | OHCHR |
| ---- | ---- | ----- |
| 007 | Sãotomense | 1128 |
| 008 | Crioulo, Upper Guinea (008) | No |
| 009 | Mbundu (009) | No |
| 010 | Tetun Dili | No |
| 011 | Umbundu (011) | No |
| 012 | (Bizisa) | bz1 |
| 013 | (Mijisa) | bz2 |
| 014 | (Maiunan) | ma1 |
| 016 | (Minjiang, spoken) | mi1_spok |
| 017 | (Minjiang, written) | mi1_written |
| 020 | Drung | ty1 |
| 026 | (Yeonbyeon) | ye1 |
| aar | Afar | aar |
| abk | Abkhaz | abk |
| ace | Aceh | atj |
| acu | Achuar-Shiwiar | acu |
| acu_1 | Achuar-Shiwiar (1) | jiv |
| ada | Dangme | gac1 |
| ady | Adyghe | ady |
| afr | Afrikaans | afk |
| agr | Aguaruna | agr |
| aii | Assyrian Neo-Aramaic | aii |
| ajg | Aja | ajg |
| aka_akuapem | Twi (Akuapem) | tws1 |
| aka_asante | Twi (Asante) | ass |
| aka_fante | Fante | tws3 |
| als | Albanian, Tosk | aln |
| alt | Altai, Southern | alt |
| amc | Amahuaca | amc |
| ame | Yaneshaʼ | ame |
| amh | Amharic | amh |
| ami | Amis | ami |
| amr | Amarakaeri | amr |
| arb | Arabic, Standard | arz |
| arl | Arabela | arl |
| arn | Mapudungun | aru |
| ast | Asturian | aub |
| auc | Waorani | 1127 |
| auv | Occitan (Auvergnat) | auv1 |
| ayr | Aymara, Central | aym |
| azj_cyrl | Azerbaijani, North (Cyrillic) | azb1 |
| azj_latn | Azerbaijani, North (Latin) | azb |
| bam | Bamanankan | bra |
| ban | Bali | bzc |
| bax | Bamun | bax |
| bba | Baatonum | bba |
| bci | Baoulé | bci |
| bcl | Bicolano, Central | bkl |
| bel | Belarusan | ruw |
| bem | Bemba | bem |
| ben | Bengali | bng |
| bfa | Bari | bfa |
| bho | Bhojpuri | bhj |
| bin | Edo | edo |
| bis | Bislama | bcy |
| blt | Tai Dam | No |
| blu | Hmong Njua | blu |
| boa | Bora | boa |
| bod | Tibetan, Central | tic |
| bos_cyrl | Bosnian (Cyrillic) | src4 |
| bos_latn | Bosnian (Latin) | src1 |
| bre | Breton | brt |
| btb | Bulu | btb |
| buc | Bushi | buc |
| bug | Bugis | bpr |
| bul | Bulgarian | blg |
| cab | Garifuna | cab |
| cak | Kaqchikel, Central | cak1 |
| cat | Catalan-Valencian-Balear | cln |
| cbi | Chachi | 1122 |
| cbr | Cashibo-Cacataibo | cbr |
| cbs | Cashinahua | cbs |
| cbt | Chayahuita | cbt |
| cbu | Candoshi-Shapra | cbu |
| ccx | Zhuang, Yongbei | ccx |
| ceb | Cebuano | ceb |
| ces | Czech | czc |
| cha | Chamorro | cjd |
| chj | Chinantec, Ojitlán | chj |
| chk | Chuukese | tru1 |
| chr_cased | Cherokee (cased) | No |
| chr_uppercase | Cherokee (uppercase) | No |
| cic | Chickasaw | cic |
| cjk | Chokwe | cjk |
| cjk_AO | Chokwe (Angola) | cjk |
| cjs | Shor | cjs |
| ckb | Kurdish, Central | kdb1 |
| cnh | Chin, Haka | hak |
| cni | Asháninka | cni |
| cof | Colorado | cof |
| cos | Corsican | coi |
| cot | Caquinte | cot |
| cpu | Ashéninka, Pichis | cpu |
| crh | Crimean Tatar | crh |
| crs | Seselwa Creole French | crs |
| csa | Chinantec, Chiltepec | csa |
| csw | Cree, Swampy | crm |
| ctd | Chin, Tedim | tid |
| cym | Welsh | wls |
| dag | Dagbani | dag |
| dan | Danish | dns |
| ddn | Dendi | den |
| deu_1901 | German, Standard (1901) | ger |
| deu_1996 | German, Standard (1996) | No |
| dga | Dagaare, Southern | dga |
| dip | Dinka, Northeastern | dinka |
| div | Maldivian | div |
| dyo | Jola-Fonyi | dyo |
| dyu | Jula | dyu |
| dzo | Dzongkha | dzo |
| ell_monotonic | Greek (monotonic) | grk |
| ell_polytonic | Greek (polytonic) | No |
| emk | Maninkakan, Eastern | mni |
| eml | Romagnolo | eml |
| eng | English | eng |
| epo | Esperanto | 1115 |
| ese | Ese Ejja | ese |
| est | Estonian | est |
| eus | Basque | bsq |
| eve | Even | eve |
| evn | Evenki | evn |
| ewe | Éwé | ewe |
| fao | Faroese | fae |
| fij | Fijian | fji |
| fin | Finnish | fin |
| fkv | Finnish, Kven | fkv |
| flm | Chin, Falam | fal |
| fon | Fon | foa |
| fra | French | frn |
| fri | Frisian, Western | fri |
| fuf | Pular | fuf |
| fur | Friulian | frl |
| fuv | Fulfulde, Nigerian | fum |
| fuv2 | Fulfulde, Nigerian (2) | fuv |
| gaa | Ga | gac2 |
| gag | Gagauz | gag |
| gax | Oromo, Borana-Arsi-Guji | gax |
| gjn | Gonja | dum |
| gkp | Kpelle, Guinea | pke |
| gla | Gaelic, Scottish | gls |
| gld | Nanai | gld |
| gle | Gaelic, Irish | gli1 |
| glg | Galician | gln |
| glv | Manx | No |
| gsw1 | Alemannisch (Elsassisch) | gsw |
| guc | Wayuu | guc |
| gug | Guaraní, Paraguayan | gun |
| guj | Gujarati | gjr |
| guu | Yanomamö | guu |
| gyr | Guarayu | gua |
| hat_kreyol | Haitian Creole French (Kreyol) | hat |
| hat_popular | Haitian Creole French (Popular) | hat1 |
| hau_NE | Hausa (Niger) | gej |
| hau_NG | Hausa (Nigeria) | gej |
| haw | Hawaiian | hwi |
| hea | Hmong, Northern Qiandong | hea |
| heb | Hebrew | hbr |
| hil | Hiligaynon | hil |
| hin | Hindi | hnd |
| hlt | Chin, Matu | hlt |
| hms | Hmong, Southern Qiandong | hms |
| hna | Mina | hna |
| hni | Hani | hni |
| hns | Hindustani, Sarnami | hns |
| hrv | Croatian | src2 |
| hsb | Sorbian, Upper | wee |
| hsf | Huastec (Sierra de Otontepec) | hus |
| hun | Hungarian | hng |
| hus | Huastec (Veracruz) | 1118 |
| huu | Huitoto, Murui | huu |
| hva | Huastec (San Luís Potosí) | hva |
| hye | Armenian | arm |
| ibb | Ibibio | ibb |
| ibo | Igbo | igr |
| ido | Ido | 1120 |
| ike | Inuktitut, Eastern Canadian | esb |
| ilo | Ilocano | ilo |
| ina | Interlingua | 1119 |
| ind | Indonesian | inz |
| isl | Icelandic | ice |
| ita | Italian | itn |
| jav | Javanese (Latin) | jan |
| jav_java | Javanese (Javanese) | No |
| jiv | Shuar | 1125 |
| jpn | Japanese | jpn |
| kal | Inuktitut, Greenlandic | esg |
| kan | Kannada | kjv |
| kat | Georgian | geo |
| kaz | Kazakh | kaz |
| kbd | Kabardian | kbd |
| kbp | Kabiyé | kbp |
| kde | Makonde | kde |
| kdh | Tem | kdh |
| kea | Kabuverdianu | kea |
| kek | Q'eqchi' | 1116 |
| kha | Khasi | kha |
| khk | Mongolian, Halh (Cyrillic) | khk |
| khm | Khmer, Central | khm |
| kin | Rwanda | rua1 |
| kir | Kirghiz | kdo |
| kjh | Khakas | kjh |
| kkh_lana | Khün | No |
| kmb | Mbundu | mlo |
| kmr | Kurdish, Northern | kur |
| knc | Kanuri, Central | kph |
| kng | Koongo | kon |
| kng_AO | Koongo (Angola) | kng |
| koi | Komi-Permyak | koi |
| koo | Konjo | koo1 |
| kor | Korean | kkn |
| kqn | Kaonde | kqn |
| kqs | Kissi, Northern | kqs |
| kri | Krio | kri |
| krl | Karelian | krl |
| ktu | Kituba | ktu |
| kwi | Awa-Cuaiquer | kwi |
| lad | Ladino | lad |
| lao | Lao | nol |
| lat | Latin | ltn |
| lat_1 | Latin (1) | ltn1 |
| lav | Latvian | lat |
| lia | Limba, West-Central | lia |
| lij | Ligurian | lij |
| lin | Lingala | lin |
| lin_tones | Lingala (tones) | No |
| lit | Lithuanian | lit |
| lld | Ladin | lld |
| lnc | Occitan (Languedocien) | prv1 |
| lns | Lamnso' | nso |
| lob | Lobi | lob |
| lot | Otuho | lot |
| loz | Lozi | lbm1 |
| ltz | Luxembourgeois | lux |
| lua | Luba-Kasai | lub |
| lue | Luvale | lue |
| lug | Ganda | lap1 |
| lun | Lunda | mlo1 |
| lus | Mizo | lus |
| mad | Madura | mhj |
| mag | Magahi | mqm |
| mah | Marshallese | mzm |
| mai | Maithili | No |
| mal | Malayalam | mjs |
| mal_chillus | Malayalam | mjs |
| mam | Mam, Northern | mam |
| mar | Marathi | mrt |
| maz | Mazahua Central | maz |
| mcd | Sharanahua | mcd |
| mcf | Matsés | mcf |
| men | Mende | mfy |
| mfq | Moba | mfq |
| mic | Micmac | mic |
| min | Minangkabau | mpu |
| miq | Mískito | miq |
| mkd | Macedonian | mkj |
| mlt | Maltese | mls |
| mly_arab | Malay (Arabic) | No |
| mly_latn | Malay (Latin) | mli |
| mnw | Mon | No |
| mos | Mòoré | mhm |
| mri | Maori | mbf |
| mto | Mixe, Totontepec | mto |
| mxi | Mozarabic | moz |
| mxv | Mixtec, Metlatónoc | mxv |
| mya | Burmese | bms |
| mzi | Mazatec, Ixcatlán | mao |
| nav | Navajo | nav |
| nba | Nyemba | nba |
| nbl | Ndebele | nel |
| ndo | Ndonga | 1114 |
| nds | Saxon, Low | ige |
| nep | Nepali | nep |
| nhn | Nahuatl, Central | nhn |
| nio | Nganasan | nio |
| niu | Niue | niu |
| njo | Naga, Ao | njo |
| nku | Kulango, Bouna | kou |
| nld | Dutch | dut |
| nno | Norwegian, Nynorsk | nrn |
| nob | Norwegian, Bokmål | nrr |
| not | Nomatsiguenga | not |
| nso | Sotho, Northern | srt |
| nya_chechewa | Nyanja (Chechewa) | nyj1 |
| nya_chinyanja | Nyanja (Chinyanja) | nyj |
| nym | Nyamwezi | nyz |
| nyn | Nyankore | nyn1 |
| nzi | Nzema | nze |
| oaa | Orok | oaa |
| oci_1 | Occitan (Francoprovençal, Fribourg) | Fr3 |
| oci_2 | Occitan (Francoprovençal, Savoie) | fr2 |
| oci_3 | Occitan (Francoprovençal, Vaud) | fr4 |
| oci_4 | Occitan (Francoprovençal, Valais) | frp |
| ojb | Ojibwa, Northwestern | ojb |
| oki | Okiek | oki |
| orh | Oroqen | orh |
| oss | Osetin | ose |
| ote | Otomi, Mezquital | 1111 |
| pam | Pampangan | pmp |
| pan | Panjabi, Eastern | pnj1 |
| pap | Papiamentu | pap |
| pau | Palauan | plu |
| pbb | Páez | pbb |
| pbu | Pashto, Northern | pbu |
| pcd | Picard | frn2 |
| pcm | Pidgin, Nigerian | pcm |
| pes_1 | Farsi, Western | prs |
| pes_2 | Dari | prs1 |
| pis | Pijin | pis |
| piu | Pintupi-Luritja | piu |
| plt | Malagasy, Plateau | mex |
| pnb | Panjabi, Western | No |
| pol | Polish | pql |
| pon | Pohnpeian | pnf |
| por_BR | Portuguese (Brazil) | No |
| por_PT | Portuguese (Portugal) | por |
| pov | Crioulo, Upper Guinea | gbc |
| ppl | Pipil | ppl |
| prv | Occitan | pro |
| quc | K'iche', Central | 1117 |
| qud | Quechua (Unified Quichua, old Hispanic orthography) | qud1 |
| qug | Quichua, Chimborazo Highland | qug |
| quy | Quechua, Ayacucho | quy |
| quz | Quechua, Cusco | quz |
| qva | Quechua, Ambo-Pasco | qeg |
| qvc | Quechua, Cajamarca | qnt |
| qvh | Quechua, Huamalíes-Dos de Mayo Huánuco | qej |
| qvm | Quechua, Margos-Yarowilca-Lauricocha | qei |
| qvn | Quechua, North Junín | qju |
| qwh | Quechua, Huaylas Ancash | qan |
| qxa | Quechua, South Bolivian | qec1 |
| qxn | Quechua, Northern Conchucos Ancash | qed |
| qxu | Quechua, Arequipa-La Unión | qar |
| rar | Rarotongan | rrt |
| rmn | Romani, Balkan | rmn |
| rmn_1 | Romani, Balkan (1) | rmn1 |
| rmy | Aromanian | rmy1 |
| roh | Romansch | No |
| roh_puter | Romansch (Puter) | No |
| roh_rumgr | Romansch (Grischun) | rhe |
| roh_surmiran | Romansch (Surmiran) | No |
| roh_sursilv | Romansch (Sursilvan) | No |
| roh_sutsilv | Romansch (Sutsilvan) | No |
| roh_vallader | Romansch (Vallader) | No |
| ron_1953 | Romanian (1953) | rum |
| ron_1993 | Romanian (1993) | No |
| ron_2006 | Romanian (2006) | No |
| run | Rundi | rud1 |
| rus | Russian | rus |
| sag | Sango | saj |
| sah | Yakut | sah |
| san | Sanskrit | skt |
| sco | Scots | sco |
| sey | Secoya | 1123 |
| shk | Shilluk | shk |
| shn | Shan | sjn |
| shp | Shipibo-Conibo | shp |
| sin | Sinhala | snh |
| skr | Seraiki | skr |
| slk | Slovak | slo |
| slv | Slovenian | slv |
| sme | Saami, North | lpi |
| smo | Samoan | smy |
| sna | Shona | shd |
| snk | Soninke | snn |
| snn | Siona | 1121 |
| som | Somali | som |
| sot | Sotho, Southern | sso |
| spa | Spanish | spn |
| src | Sardinian, Logudorese | srd |
| srp_cyrl | Serbian (Cyrillic) | src5 |
| srp_latn | Serbian (Latin) | src3 |
| srr | Serer-Sine | ses |
| ssw | Swati | swz1 |
| suk | Sukuma | sua |
| sun | Sunda | suo |
| sus | Susu | sus |
| swb | Comorian, Maore | swb |
| swe | Swedish | swd |
| swh | Swahili | swa |
| tah | Tahitian | tht |
| tam | Tamil | tcv |
| tam_LK | Tamil (Sri Lanka) | No |
| tat | Tatar | ttr |
| tbz | Ditammari | tbz |
| tca | Ticuna | tca |
| tel | Telugu | tcw |
| tem | Themne | tej |
| tet | Tetun | ttm |
| tgk | Tajiki | pet |
| tgl | Tagalog | tgl |
| tha | Thai | thj |
| tha2 | Thai (2) | No |
| tir | Tigrigna | tgn |
| tiv | Tiv | tiv |
| tly | Talysh | tly |
| tob | Toba | tob |
| toi | Tonga | toi |
| toj | Tojolabal | toj |
| ton | Tongan | tov |
| top | Totonac, Papantla | top |
| tpi | Tok Pisin | pdg |
| tsn | Tswana | tsw |
| tso_MZ | Tsonga (Mozambique) | tso |
| tso_ZW | Tsonga (Zimbabwe) | tso1 |
| tsz | Purepecha | 1112 |
| tuk_cyrl | Turkmen (Cyrillic) | tck |
| tuk_latn | Turkmen (Latin) | No |
| tur | Turkish | trk |
| tyv | Tuva | tyv |
| tzc | Tzotzil (Chamula) | tzc |
| tzh | Tzeltal, Oxchuc | tzc1 |
| tzm | Tamazight, Central Atlas | tzm |
| uig_arab | Uyghur (Arabic) | uig |
| uig_latn | Uyghur (Latin) | No |
| ukr | Ukrainian | ukr |
| umb | Umbundu | mnf |
| ura | Urarina | ura |
| urd | Urdu | urd |
| urd_2 | Urdu (2) | urd |
| uzn_cyrl | Uzbek, Northern (Cyrillic) | uzb1 |
| uzn_latn | Uzbek, Northern (Latin) | uzb |
| vai | Vai | vai |
| vec | Venetian | vec |
| ven | Venda | tsh |
| ven2 | Venda | ven |
| vep | Veps | vep |
| vie | Vietnamese | vie |
| vmw | Makhuwa | vmw |
| war | Waray-Waray | wry |
| wln | Walloon | frn1 |
| wol | Wolof | wol |
| wwa | Waama | ako |
| xho | Xhosa | xos |
| xsm | Kasem | kas |
| yad | Yagua | yad |
| yao | Yao | yao |
| yap | Yapese | yps |
| ydd | Yiddish, Eastern | ydd |
| ykg | Yukaghir, Northern | ykg |
| yor | Yoruba | yor |
| yua | Maya, Yucatán | yua |
| zam | Zapotec, Miahuatlán | zam |
| zdj | Comorian, Ngazidja | zdj |
| zgh | Tamazight, Standard Moroccan | ama |
| zro | Záparo | 1124 |
| ztu | Zapotec, Güilá | ztu1 |
| zul | Zulu | zuu |