README
trigrams
Trigrams for 400+ languages.
Install
This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.
npm:
npm install trigrams
API
This package exports the following identifiers: top, min.
There is no default export.
top()
import {top} from 'trigrams'
console.log((await top()).pam)
Yields:
{
'isa': 6,
'upa': 6,
'i k': 6,
// …
'ang': 273,
'ing': 282,
'ng ': 572
}
Returns a promise resolving to an object mapping UDHR in Unicode codes to objects mapping the top 300 trigrams to occurrence counts.
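As a rough sketch of working with one language's entry, the helper below sorts a trigram-to-count object by descending count. The name mostFrequent is hypothetical and not part of this package, and the sample object only repeats a few pairs from the output above; in practice you would pass something like (await top()).pam.

```javascript
// Hypothetical helper (not part of trigrams): list the `limit` most frequent
// trigrams from an object mapping trigrams to occurrence counts.
function mostFrequent(counts, limit = 3) {
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1]) // highest count first
    .slice(0, limit)
    .map(([trigram]) => trigram)
}

// A few pairs copied from the example output above (not the full entry).
const sampleCounts = {'isa': 6, 'ang': 273, 'ing': 282, 'ng ': 572}

console.log(mostFrequent(sampleCounts)) // [ 'ng ', 'ing', 'ang' ]
```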
min()
import {min} from 'trigrams'
console.log((await min()).nld)
Yields:
[
' ar',
'eer',
'tij',
// …
'de ',
'an ',
'en '
]
A bit like top, but returns a promise resolving to an object mapping the same codes to arrays containing the top 300 trigrams sorted from least occurring to most occurring.
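Because the arrays run from least to most occurring, a trigram's position can be turned into a rank. The sketch below assumes that ordering; toRanks is a hypothetical name, and the sample list holds only the six trigrams visible in the output above rather than a full entry such as (await min()).nld.

```javascript
// Hypothetical helper (not part of trigrams): map each trigram to a rank,
// where 0 is the most occurring trigram in the list.
function toRanks(trigrams) {
  const ranks = new Map()
  trigrams.forEach((trigram, index) => {
    // The list is sorted least to most occurring, so the last item ranks 0.
    ranks.set(trigram, trigrams.length - 1 - index)
  })
  return ranks
}

// Only the trigrams shown in the example output above.
const sampleList = [' ar', 'eer', 'tij', 'de ', 'an ', 'en ']
const ranks = toRanks(sampleList)

console.log(ranks.get('en ')) // 0
console.log(ranks.get(' ar')) // 5
```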
Data
The trigrams are based on the Unicode versions of the Universal Declaration of Human Rights.
The files are created from all paragraphs made available by wooorm/udhr and do not include headings and such.
Cleaning
Before creating trigrams,
- The Unicode characters from \u0021 to \u0040 (inclusive) are removed
- One or more white space characters (\s+) are replaced with a single space
- Alphabetic characters are lower cased ([A-Z])
Additionally, the input is padded with two spaces on both sides.
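The cleaning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used to generate the data: the names clean and countTrigrams are hypothetical, the steps are applied in the order listed, and lower casing is limited to [A-Z] as written above.

```javascript
// Sketch of the cleaning steps described above (assumed order: remove,
// collapse white space, lower case, pad). Not the package's actual code.
function clean(value) {
  const cleaned = value
    .replace(/[\u0021-\u0040]/g, '') // remove U+0021 to U+0040 (inclusive)
    .replace(/\s+/g, ' ') // collapse white space to a single space
    .replace(/[A-Z]/g, (character) => character.toLowerCase())

  // Pad with two spaces on both sides.
  return '  ' + cleaned + '  '
}

// Count every run of three consecutive characters in the cleaned input.
function countTrigrams(value) {
  const cleaned = clean(value)
  const counts = {}

  for (let index = 0; index + 3 <= cleaned.length; index++) {
    const trigram = cleaned.slice(index, index + 3)
    counts[trigram] = (counts[trigram] || 0) + 1
  }

  return counts
}

console.log(JSON.stringify(clean('Hello, World!'))) // "  hello world  "
console.log(Object.keys(countTrigrams('ab'))) // [ '  a', ' ab', 'ab ', 'b  ' ]
```

Note that the removed range covers ASCII punctuation and the digits 0 to 9, so numbers never appear in the resulting trigrams.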
Support
| Code | Name | OHCHR |
| - | - | - |
| 007 | Sãotomense | 1128 |
| 008 | Crioulo, Upper Guinea (008) | No |
| 009 | Mbundu (009) | No |
| 010 | Tetun Dili | No |
| 011 | Umbundu (011) | No |
| 012 | (Bizisa) | bz1 |
| 013 | (Mijisa) | bz2 |
| 014 | (Maiunan) | ma1 |
| 016 | (Minjiang, spoken) | mi1_spok |
| 017 | (Minjiang, written) | mi1_written |
| 020 | Drung | ty1 |
| 026 | (Yeonbyeon) | ye1 |
| aar | Afar | aar |
| abk | Abkhaz | abk |
| ace | Aceh | atj |
| acu | Achuar-Shiwiar | acu |
| acu_1 | Achuar-Shiwiar (1) | jiv |
| ada | Dangme | gac1 |
| ady | Adyghe | ady |
| afr | Afrikaans | afk |
| agr | Aguaruna | agr |
| aii | Assyrian Neo-Aramaic | aii |
| ajg | Aja | ajg |
| aka_akuapem | Twi (Akuapem) | tws1 |
| aka_asante | Twi (Asante) | ass |
| aka_fante | Fante | tws3 |
| als | Albanian, Tosk | aln |
| alt | Altai, Southern | alt |
| amc | Amahuaca | amc |
| ame | Yaneshaʼ | ame |
| amh | Amharic | amh |
| ami | Amis | ami |
| amr | Amarakaeri | amr |
| arb | Arabic, Standard | arz |
| arl | Arabela | arl |
| arn | Mapudungun | aru |
| ast | Asturian | aub |
| auc | Waorani | 1127 |
| auv | Occitan (Auvergnat) | auv1 |
| ayr | Aymara, Central | aym |
| azj_cyrl | Azerbaijani, North (Cyrillic) | azb1 |
| azj_latn | Azerbaijani, North (Latin) | azb |
| bam | Bamanankan | bra |
| ban | Bali | bzc |
| bax | Bamun | bax |
| bba | Baatonum | bba |
| bci | Baoulé | bci |
| bcl | Bicolano, Central | bkl |
| bel | Belarusan | ruw |
| bem | Bemba | bem |
| ben | Bengali | bng |
| bfa | Bari | bfa |
| bho | Bhojpuri | bhj |
| bin | Edo | edo |
| bis | Bislama | bcy |
| blt | Tai Dam | No |
| blu | Hmong Njua | blu |
| boa | Bora | boa |
| bod | Tibetan, Central | tic |
| bos_cyrl | Bosnian (Cyrillic) | src4 |
| bos_latn | Bosnian (Latin) | src1 |
| bre | Breton | brt |
| btb | Bulu | btb |
| buc | Bushi | buc |
| bug | Bugis | bpr |
| bul | Bulgarian | blg |
| cab | Garifuna | cab |
| cak | Kaqchikel, Central | cak1 |
| cat | Catalan-Valencian-Balear | cln |
| cbi | Chachi | 1122 |
| cbr | Cashibo-Cacataibo | cbr |
| cbs | Cashinahua | cbs |
| cbt | Chayahuita | cbt |
| cbu | Candoshi-Shapra | cbu |
| ccx | Zhuang, Yongbei | ccx |
| ceb | Cebuano | ceb |
| ces | Czech | czc |
| cha | Chamorro | cjd |
| chj | Chinantec, Ojitlán | chj |
| chk | Chuukese | tru1 |
| chr_cased | Cherokee (cased) | No |
| chr_uppercase | Cherokee (uppercase) | No |
| cic | Chickasaw | cic |
| cjk | Chokwe | cjk |
| cjk_AO | Chokwe (Angola) | cjk |
| cjs | Shor | cjs |
| ckb | Kurdish, Central | kdb1 |
| cnh | Chin, Haka | hak |
| cni | Asháninka | cni |
| cof | Colorado | cof |
| cos | Corsican | coi |
| cot | Caquinte | cot |
| cpu | Ashéninka, Pichis | cpu |
| crh | Crimean Tatar | crh |
| crs | Seselwa Creole French | crs |
| csa | Chinantec, Chiltepec | csa |
| csw | Cree, Swampy | crm |
| ctd | Chin, Tedim | tid |
| cym | Welsh | wls |
| dag | Dagbani | dag |
| dan | Danish | dns |
| ddn | Dendi | den |
| deu_1901 | German, Standard (1901) | ger |
| deu_1996 | German, Standard (1996) | No |
| dga | Dagaare, Southern | dga |
| dip | Dinka, Northeastern | dinka |
| div | Maldivian | div |
| dyo | Jola-Fonyi | dyo |
| dyu | Jula | dyu |
| dzo | Dzongkha | dzo |
| ell_monotonic | Greek (monotonic) | grk |
| ell_polytonic | Greek (polytonic) | No |
| emk | Maninkakan, Eastern | mni |
| eml | Romagnolo | eml |
| eng | English | eng |
| epo | Esperanto | 1115 |
| ese | Ese Ejja | ese |
| est | Estonian | est |
| eus | Basque | bsq |
| eve | Even | eve |
| evn | Evenki | evn |
| ewe | Éwé | ewe |
| fao | Faroese | fae |
| fij | Fijian | fji |
| fin | Finnish | fin |
| fkv | Finnish, Kven | fkv |
| flm | Chin, Falam | fal |
| fon | Fon | foa |
| fra | French | frn |
| fri | Frisian, Western | fri |
| fuf | Pular | fuf |
| fur | Friulian | frl |
| fuv | Fulfulde, Nigerian | fum |
| fuv2 | Fulfulde, Nigerian (2) | fuv |
| gaa | Ga | gac2 |
| gag | Gagauz | gag |
| gax | Oromo, Borana-Arsi-Guji | gax |
| gjn | Gonja | dum |
| gkp | Kpelle, Guinea | pke |
| gla | Gaelic, Scottish | gls |
| gld | Nanai | gld |
| gle | Gaelic, Irish | gli1 |
| glg | Galician | gln |
| glv | Manx | No |
| gsw1 | Alemannisch (Elsassisch) | gsw |
| guc | Wayuu | guc |
| gug | Guaraní, Paraguayan | gun |
| guj | Gujarati | gjr |
| guu | Yanomamö | guu |
| gyr | Guarayu | gua |
| hat_kreyol | Haitian Creole French (Kreyol) | hat |
| hat_popular | Haitian Creole French (Popular) | hat1 |
| hau_NE | Hausa (Niger) | gej |
| hau_NG | Hausa (Nigeria) | gej |
| haw | Hawaiian | hwi |
| hea | Hmong, Northern Qiandong | hea |
| heb | Hebrew | hbr |
| hil | Hiligaynon | hil |
| hin | Hindi | hnd |
| hlt | Chin, Matu | hlt |
| hms | Hmong, Southern Qiandong | hms |
| hna | Mina | hna |
| hni | Hani | hni |
| hns | Hindustani, Sarnami | hns |
| hrv | Croatian | src2 |
| hsb | Sorbian, Upper | wee |
| hsf | Huastec (Sierra de Otontepec) | hus |
| hun | Hungarian | hng |
| hus | Huastec (Veracruz) | 1118 |
| huu | Huitoto, Murui | huu |
| hva | Huastec (San Luís Potosí) | hva |
| hye | Armenian | arm |
| ibb | Ibibio | ibb |
| ibo | Igbo | igr |
| ido | Ido | 1120 |
| ike | Inuktitut, Eastern Canadian | esb |
| ilo | Ilocano | ilo |
| ina | Interlingua | 1119 |
| ind | Indonesian | inz |
| isl | Icelandic | ice |
| ita | Italian | itn |
| jav | Javanese (Latin) | jan |
| jav_java | Javanese (Javanese) | No |
| jiv | Shuar | 1125 |
| jpn | Japanese | jpn |
| kal | Inuktitut, Greenlandic | esg |
| kan | Kannada | kjv |
| kat | Georgian | geo |
| kaz | Kazakh | kaz |
| kbd | Kabardian | kbd |
| kbp | Kabiyé | kbp |
| kde | Makonde | kde |
| kdh | Tem | kdh |
| kea | Kabuverdianu | kea |
| kek | Q'eqchi' | 1116 |
| kha | Khasi | kha |
| khk | Mongolian, Halh (Cyrillic) | khk |
| khm | Khmer, Central | khm |
| kin | Rwanda | rua1 |
| kir | Kirghiz | kdo |
| kjh | Khakas | kjh |
| kkh_lana | Khün | No |
| kmb | Mbundu | mlo |
| kmr | Kurdish, Northern | kur |
| knc | Kanuri, Central | kph |
| kng | Koongo | kon |
| kng_AO | Koongo (Angola) | kng |
| koi | Komi-Permyak | koi |
| koo | Konjo | koo1 |
| kor | Korean | kkn |
| kqn | Kaonde | kqn |
| kqs | Kissi, Northern | kqs |
| kri | Krio | kri |
| krl | Karelian | krl |
| ktu | Kituba | ktu |
| kwi | Awa-Cuaiquer | kwi |
| lad | Ladino | lad |
| lao | Lao | nol |
| lat | Latin | ltn |
| lat_1 | Latin (1) | ltn1 |
| lav | Latvian | lat |
| lia | Limba, West-Central | lia |
| lij | Ligurian | lij |
| lin | Lingala | lin |
| lin_tones | Lingala (tones) | No |
| lit | Lithuanian | lit |
| lld | Ladin | lld |
| lnc | Occitan (Languedocien) | prv1 |
| lns | Lamnso' | nso |
| lob | Lobi | lob |
| lot | Otuho | lot |
| loz | Lozi | lbm1 |
| ltz | Luxembourgeois | lux |
| lua | Luba-Kasai | lub |
| lue | Luvale | lue |
| lug | Ganda | lap1 |
| lun | Lunda | mlo1 |
| lus | Mizo | lus |
| mad | Madura | mhj |
| mag | Magahi | mqm |
| mah | Marshallese | mzm |
| mai | Maithili | No |
| mal | Malayalam | mjs |
| mal_chillus | Malayalam | mjs |
| mam | Mam, Northern | mam |
| mar | Marathi | mrt |
| maz | Mazahua Central | maz |
| mcd | Sharanahua | mcd |
| mcf | Matsés | mcf |
| men | Mende | mfy |
| mfq | Moba | mfq |
| mic | Micmac | mic |
| min | Minangkabau | mpu |
| miq | Mískito | miq |
| mkd | Macedonian | mkj |
| mlt | Maltese | mls |
| mly_arab | Malay (Arabic) | No |
| mly_latn | Malay (Latin) | mli |
| mnw | Mon | No |
| mos | Mòoré | mhm |
| mri | Maori | mbf |
| mto | Mixe, Totontepec | mto |
| mxi | Mozarabic | moz |
| mxv | Mixtec, Metlatónoc | mxv |
| mya | Burmese | bms |
| mzi | Mazatec, Ixcatlán | mao |
| nav | Navajo | nav |
| nba | Nyemba | nba |
| nbl | Ndebele | nel |
| ndo | Ndonga | 1114 |
| nds | Saxon, Low | ige |
| nep | Nepali | nep |
| nhn | Nahuatl, Central | nhn |
| nio | Nganasan | nio |
| niu | Niue | niu |
| njo | Naga, Ao | njo |
| nku | Kulango, Bouna | kou |
| nld | Dutch | dut |
| nno | Norwegian, Nynorsk | nrn |
| nob | Norwegian, Bokmål | nrr |
| not | Nomatsiguenga | not |
| nso | Sotho, Northern | srt |
| nya_chechewa | Nyanja (Chechewa) | nyj1 |
| nya_chinyanja | Nyanja (Chinyanja) | nyj |
| nym | Nyamwezi | nyz |
| nyn | Nyankore | nyn1 |
| nzi | Nzema | nze |
| oaa | Orok | oaa |
| oci_1 | Occitan (Francoprovençal, Fribourg) | Fr3 |
| oci_2 | Occitan (Francoprovençal, Savoie) | fr2 |
| oci_3 | Occitan (Francoprovençal, Vaud) | fr4 |
| oci_4 | Occitan (Francoprovençal, Valais) | frp |
| ojb | Ojibwa, Northwestern | ojb |
| oki | Okiek | oki |
| orh | Oroqen | orh |
| oss | Osetin | ose |
| ote | Otomi, Mezquital | 1111 |
| pam | Pampangan | pmp |
| pan | Panjabi, Eastern | pnj1 |
| pap | Papiamentu | pap |
| pau | Palauan | plu |
| pbb | Páez | pbb |
| pbu | Pashto, Northern | pbu |
| pcd | Picard | frn2 |
| pcm | Pidgin, Nigerian | pcm |
| pes_1 | Farsi, Western | prs |
| pes_2 | Dari | prs1 |
| pis | Pijin | pis |
| piu | Pintupi-Luritja | piu |
| plt | Malagasy, Plateau | mex |
| pnb | Panjabi, Western | No |
| pol | Polish | pql |
| pon | Pohnpeian | pnf |
| por_BR | Portuguese (Brazil) | No |
| por_PT | Portuguese (Portugal) | por |
| pov | Crioulo, Upper Guinea | gbc |
| ppl | Pipil | ppl |
| prv | Occitan | pro |
| quc | K'iche', Central | 1117 |
| qud | Quechua (Unified Quichua, old Hispanic orthography) | qud1 |
| qug | Quichua, Chimborazo Highland | qug |
| quy | Quechua, Ayacucho | quy |
| quz | Quechua, Cusco | quz |
| qva | Quechua, Ambo-Pasco | qeg |
| qvc | Quechua, Cajamarca | qnt |
| qvh | Quechua, Huamalíes-Dos de Mayo Huánuco | qej |
| qvm | Quechua, Margos-Yarowilca-Lauricocha | qei |
| qvn | Quechua, North Junín | qju |
| qwh | Quechua, Huaylas Ancash | qan |
| qxa | Quechua, South Bolivian | qec1 |
| qxn | Quechua, Northern Conchucos Ancash | qed |
| qxu | Quechua, Arequipa-La Unión | qar |
| rar | Rarotongan | rrt |
| rmn | Romani, Balkan | rmn |
| rmn_1 | Romani, Balkan (1) | rmn1 |
| rmy | Aromanian | rmy1 |
| roh | Romansch | No |
| roh_puter | Romansch (Puter) | No |
| roh_rumgr | Romansch (Grischun) | rhe |
| roh_surmiran | Romansch (Surmiran) | No |
| roh_sursilv | Romansch (Sursilvan) | No |
| roh_sutsilv | Romansch (Sutsilvan) | No |
| roh_vallader | Romansch (Vallader) | No |
| ron_1953 | Romanian (1953) | rum |
| ron_1993 | Romanian (1993) | No |
| ron_2006 | Romanian (2006) | No |
| run | Rundi | rud1 |
| rus | Russian | rus |
| sag | Sango | saj |
| sah | Yakut | sah |
| san | Sanskrit | skt |
| sco | Scots | sco |
| sey | Secoya | 1123 |
| shk | Shilluk | shk |
| shn | Shan | sjn |
| shp | Shipibo-Conibo | shp |
| sin | Sinhala | snh |
| skr | Seraiki | skr |
| slk | Slovak | slo |
| slv | Slovenian | slv |
| sme | Saami, North | lpi |
| smo | Samoan | smy |
| sna | Shona | shd |
| snk | Soninke | snn |
| snn | Siona | 1121 |
| som | Somali | som |
| sot | Sotho, Southern | sso |
| spa | Spanish | spn |
| src | Sardinian, Logudorese | srd |
| srp_cyrl | Serbian (Cyrillic) | src5 |
| srp_latn | Serbian (Latin) | src3 |
| srr | Serer-Sine | ses |
| ssw | Swati | swz1 |
| suk | Sukuma | sua |
| sun | Sunda | suo |
| sus | Susu | sus |
| swb | Comorian, Maore | swb |
| swe | Swedish | swd |
| swh | Swahili | swa |
| tah | Tahitian | tht |
| tam | Tamil | tcv |
| tam_LK | Tamil (Sri Lanka) | No |
| tat | Tatar | ttr |
| tbz | Ditammari | tbz |
| tca | Ticuna | tca |
| tel | Telugu | tcw |
| tem | Themne | tej |
| tet | Tetun | ttm |
| tgk | Tajiki | pet |
| tgl | Tagalog | tgl |
| tha | Thai | thj |
| tha2 | Thai (2) | No |
| tir | Tigrigna | tgn |
| tiv | Tiv | tiv |
| tly | Talysh | tly |
| tob | Toba | tob |
| toi | Tonga | toi |
| toj | Tojolabal | toj |
| ton | Tongan | tov |
| top | Totonac, Papantla | top |
| tpi | Tok Pisin | pdg |
| tsn | Tswana | tsw |
| tso_MZ | Tsonga (Mozambique) | tso |
| tso_ZW | Tsonga (Zimbabwe) | tso1 |
| tsz | Purepecha | 1112 |
| tuk_cyrl | Turkmen (Cyrillic) | tck |
| tuk_latn | Turkmen (Latin) | No |
| tur | Turkish | trk |
| tyv | Tuva | tyv |
| tzc | Tzotzil (Chamula) | tzc |
| tzh | Tzeltal, Oxchuc | tzc1 |
| tzm | Tamazight, Central Atlas | tzm |
| uig_arab | Uyghur (Arabic) | uig |
| uig_latn | Uyghur (Latin) | No |
| ukr | Ukrainian | ukr |
| umb | Umbundu | mnf |
| ura | Urarina | ura |
| urd | Urdu | urd |
| urd_2 | Urdu (2) | urd |
| uzn_cyrl | Uzbek, Northern (Cyrillic) | uzb1 |
| uzn_latn | Uzbek, Northern (Latin) | uzb |
| vai | Vai | vai |
| vec | Venetian | vec |
| ven | Venda | tsh |
| ven2 | Venda | ven |
| vep | Veps | vep |
| vie | Vietnamese | vie |
| vmw | Makhuwa | vmw |
| war | Waray-Waray | wry |
| wln | Walloon | frn1 |
| wol | Wolof | wol |
| wwa | Waama | ako |
| xho | Xhosa | xos |
| xsm | Kasem | kas |
| yad | Yagua | yad |
| yao | Yao | yao |
| yap | Yapese | yps |
| ydd | Yiddish, Eastern | ydd |
| ykg | Yukaghir, Northern | ykg |
| yor | Yoruba | yor |
| yua | Maya, Yucatán | yua |
| zam | Zapotec, Miahuatlán | zam |
| zdj | Comorian, Ngazidja | zdj |
| zgh | Tamazight, Standard Morocan | ama |
| zro | Záparo | 1124 |
| ztu | Zapotec, Güilá | ztu1 |
| zul | Zulu | zuu |