whoolso-word-filter
A flexible, smart word filter to block profanity, or whatever else suits your taste.
Installing
npm i whoolso-word-filter
const { filterWords } = require('whoolso-word-filter');
Using the filter
This package gives you access to a function called filterWords(), which takes a config object as its sole argument. To keep it simple, the function returns an array of all the filtered words/phrases it finds in a given string.
Building the config object
The config object lets you tune how the filter behaves. Take the following example:
const configObj = {
  wordsToFilter,
  stringToCheck,
  lengthThreshold: 3,
  leetAlphabet1: textToLeetAlphabet1,
  leetAlphabet2: textToLeetAlphabet2,
  shortWordLength: 3,
  shortWordExceptions
};
const foundWords = filterWords(configObj);
Argument Definitions:
- wordsToFilter: An array of strings containing all the words you want to filter. To filter a phrase, for example 'bad person', write it without spaces: 'badperson'.
There's no need to add the plural of a word unless the plural changes the spelling of the root. For instance, if one of your words is 'idiot', you don't need to add 'idiots', because it contains the root of the word, which is what matters. You don't need to add leet versions of your words either (e.g. '1d1ot'). If you want to be really strict, it's a good idea to add misspelled versions of a word (e.g. 'stupid' could be intentionally misspelled as 'stupd').
// Suppose we want to filter some political terms. Our wordsToFilter array could be something like this:
const wordsToFilter = [
  'gop',
  'gerrymander',
  'republican',
  'republikan',
  'rpublican',
  'rpblican',
  'rpublicn',
  'repblicn',
  'lefty',
  'lfty',
  'lftwing'
];
Also, make sure all the words you add are lowercase. The filter converts the string you want to check to lowercase, so every entry in the wordsToFilter array must be lowercase too.
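If your word list comes from somewhere you don't fully control, it may help to normalize it before passing it in. The following is a minimal sketch of that idea; normalizeWordList is a hypothetical helper written for this README, not something the package exports. It lowercases each entry, strips spaces (so phrases follow the 'badperson' rule above), and drops duplicates.

```javascript
// Hypothetical helper (NOT part of whoolso-word-filter): prepares a raw
// word list so it follows the rules described above.
function normalizeWordList(words) {
  const cleaned = words
    // Lowercase every entry and remove all whitespace ('Bad Person' -> 'badperson').
    .map((word) => word.toLowerCase().replace(/\s+/g, ''))
    // Drop entries that end up empty.
    .filter((word) => word.length > 0);
  // A Set removes duplicates while keeping the original order.
  return [...new Set(cleaned)];
}

const normalized = normalizeWordList(['Bad Person', 'GOP', 'lefty', 'Lefty']);
console.log(normalized); // [ 'badperson', 'gop', 'lefty' ]
```

The result can then be used as the wordsToFilter value in the config object.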
- stringToCheck: The string that you want to check.
const stringToCheck = `I am a political comment whose unique goal is to say the word republican.`;
- lengthThreshold: The maximum length of the space-separated fragments (syllables) in which the filter will look for split-up words. 'I am here to say the word r e p u b l i c a n' will catch 'republican' if the value of lengthThreshold is at least 1. If it's 2, 're pu bl ic an' would be caught as well, and so on. The larger you set this option, the more prone you will be to false positives, so I wouldn't suggest using a number larger than 3, but that depends on your needs.
- leetAlphabet1 and leetAlphabet2: The function performs two leet translations on the text. Given the wordsToFilter array shown above, the sentences 'I am here to say the word republic@n' and 'I am here to say the word republ1c4n' will both be caught as 'republican'.
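To make the lengthThreshold and leet options more concrete, here is a rough sketch of the kind of preprocessing they describe. This is only an illustration under my own assumptions, not the package's actual implementation: fromLeet, joinShortFragments and demoLeetAlphabet are hypothetical names, and the sketch only handles single-character leet symbols.

```javascript
// Illustrative sketch only: these helpers are hypothetical and are NOT
// the internals of whoolso-word-filter.

// A small demo alphabet in the same { LETTER: 'symbol' } shape the config uses.
const demoLeetAlphabet = { A: '@', D: 'D', E: '3', I: '!', O: '0' };

// Translate single-character leet symbols back to lowercase letters.
function fromLeet(text, leetAlphabet) {
  const symbolToLetter = new Map();
  for (const [letter, symbol] of Object.entries(leetAlphabet)) {
    // Skip identity entries such as D: 'D'; they need no translation.
    if (symbol.toLowerCase() !== letter.toLowerCase()) {
      symbolToLetter.set(symbol, letter.toLowerCase());
    }
  }
  return Array.from(text.toLowerCase(), (ch) =>
    symbolToLetter.has(ch) ? symbolToLetter.get(ch) : ch
  ).join('');
}

// Join runs of consecutive fragments that are no longer than lengthThreshold,
// so 'r e p u b l i c a n' becomes 'republican' when the threshold is 1.
function joinShortFragments(text, lengthThreshold) {
  const result = [];
  let run = '';
  for (const fragment of text.split(/\s+/).filter(Boolean)) {
    if (fragment.length <= lengthThreshold) {
      run += fragment; // keep gluing short fragments together
    } else {
      if (run) result.push(run);
      run = '';
      result.push(fragment); // long fragments stay as they are
    }
  }
  if (run) result.push(run);
  return result;
}

console.log(fromLeet('republic@n', demoLeetAlphabet));
// republican
console.log(joinShortFragments('say the word r e p u b l i c a n', 1));
// [ 'say', 'the', 'word', 'republican' ]
console.log(joinShortFragments('say the word r e p u b l i c a n', 3));
// [ 'saythe', 'word', 'republican' ]
```

Note how, in this sketch, a threshold of 3 also glues ordinary short words together ('say' and 'the' become 'saythe'). That is the same effect behind the false-positive warning above.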
leetAlphabet1 and leetAlphabet2 must have the following format:
const textToLeetAlphabet1 = {
A: '@',
B: '8',
C: '(',
D: 'D',
E: '3',
F: 'F',
G: '6',
H: '#',
I: '!',
J: 'J',
K: 'K',
L: '1',
M: 'M',
N: 'N',
O: '0',
P: 'P',
Q: 'Q',
R: 'R',
S: '