arabic-stemmer

A simple stemmer for Arabic words.

The goal of this stemmer is not to produce a linguistically correct root (which is extremely difficult to achieve for Arabic), but to produce a stem that is likely to be shared by the various forms of a word.

It does this by producing a list of potential stems as well as a normalized form of the original word (which might be helpful as a boosting signal when executing a search).
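As a rough illustration of the boosting idea, the sketch below scores a match between two terms from the objects that stem() returns. The matchScore function and its normalizedBoost parameter are hypothetical and not part of arabic-stemmer. The sample objects are copied from the usage output in this README, so the snippet runs without the package installed.

```javascript
// Hypothetical helper, not part of arabic-stemmer: scores how well a document
// term matches a query term, given the objects returned by stemmer.stem().
function matchScore(queryResult, docResult, normalizedBoost = 0.5) {
  const docStems = new Set(docResult.stem);
  const sharesStem = queryResult.stem.some((s) => docStems.has(s));
  if (!sharesStem) return 0;
  // Base score for a shared stem, plus a boost when the normalized forms agree.
  const boost = queryResult.normalized === docResult.normalized ? normalizedBoost : 0;
  return 1 + boost;
}

// Sample results copied from the usage output in this README.
const hospitals = { stem: ['شفي', 'سشف'], normalized: 'مستشف' };
const healing = { stem: ['شفي'], normalized: 'شفا' };
const children = { stem: ['ولد'], normalized: 'اولاد' };

console.log(matchScore(hospitals, healing));   // 1   (shared stem only)
console.log(matchScore(hospitals, hospitals)); // 1.5 (shared stem + same normalized form)
console.log(matchScore(hospitals, children));  // 0   (no shared stem)
```

The weights 1 and 0.5 are arbitrary placeholders; a real search setup would tune them or hand the stems and normalized form to its own ranking function.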

Installation

npm install arabic-stemmer

Or include the file from the dist folder with a script tag:

<script src="./dist/index.js"></script>

Usage

import Stemmer from 'arabic-stemmer';  // omit this line when the file is loaded with a script tag
const stemmer = new Stemmer();

console.log(stemmer.stem('المستشفيات')); 
console.log(stemmer.stem('كالشفاء')); 
/*
output:
{ stem: [ 'شفي', 'سشف' ], normalized: 'مستشف' }
{ stem: [ 'شفي' ], normalized: 'شفا' }
(both share a common stem ('شفي'))
*/

console.log(stemmer.stem('الأولاد')); 
console.log(stemmer.stem('المولودين')); 
/*
output: 
{ stem: [ 'ولد' ], normalized: 'اولاد' }
{ stem: [ 'ولد', 'ملد' ], normalized: 'مولود' }
(both share a common stem 'ولد')
*/

Output

stemmer.stem(input) returns an object with the following fields:

  • stem: a list of potential stems.
  • normalized: the original word in normalized form (affixes and diacritics removed).
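
Because stem is a list, one way to consume it is to index each word under every one of its potential stems. The following is a minimal sketch under that assumption. buildStemIndex and the { word, result } entry shape are hypothetical names introduced for illustration, and the result objects are copied from the usage output above rather than computed.

```javascript
// Hypothetical helper, not part of arabic-stemmer: maps each potential stem
// to the list of original words that produced it.
function buildStemIndex(entries) {
  const index = new Map();
  for (const { word, result } of entries) {
    for (const s of result.stem) {
      if (!index.has(s)) index.set(s, []);
      index.get(s).push(word);
    }
  }
  return index;
}

// Each result is what stemmer.stem(word) is documented to return for that word.
const entries = [
  { word: 'الأولاد', result: { stem: ['ولد'], normalized: 'اولاد' } },
  { word: 'المولودين', result: { stem: ['ولد', 'ملد'], normalized: 'مولود' } },
];

const index = buildStemIndex(entries);
console.log(index.get('ولد')); // both words share this stem
console.log(index.get('ملد')); // only the second word
```

Looking up any stem of a query word in such an index returns every indexed word that shares it, which is the "shared stem" behaviour the examples above demonstrate.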