leo-profanity

Profanity filter, based on Shutterstock dictionary

Usage no npm install needed!

<script type="module">
  import leoProfanity from 'https://cdn.skypack.dev/leo-profanity';
</script>

Installation

# npm
npm install leo-profanity
npm install leo-profanity --no-optional # install only the English bad-word dictionary

# yarn
yarn add leo-profanity
yarn add leo-profanity --ignore-optional # install only the English bad-word dictionary

# Bower
bower install leo-profanity
# dictionary/default.json

Example usage for npm

// supported languages
// - en
// - fr

var filter = require('leo-profanity');

filter.loadDictionary(string)

// replace current dictionary with the french one
filter.loadDictionary('fr');

// replace dictionary with the default one (same as filter.reset())
filter.loadDictionary();

filter.list()

// returns all profanity words (Array.string)
filter.list();

filter.check(string)

See filter.clean below for more cases.

// output: true
filter.check('I have boob');

filter.clean(string, [replaceKey=*], [nbLetters=0])

// no bad word
// output: I have 2 eyes
filter.clean('I have 2 eyes');

// normal case
// output: I have ****, etc.
filter.clean('I have boob, etc.');

// case-insensitive
// output: I have ****
filter.clean('I have BoOb');

// word followed by a comma or dot
// output: I have ****.
filter.clean('I have BoOb.');

// multiple occurrences
// output: I have ****,****, ***, and etc.
filter.clean('I have boob,boob, ass, and etc.');

// does not match a bad word embedded in a longer word
// output: Buy classic watches online
filter.clean('Buy classic watches online');

// clean with custom replacement-character
// output: I have ++++
filter.clean('I have boob', '+');

// keep the first N letters of the word visible
// output: I have bo++
filter.clean('I have boob', '+', 2);
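
The last two examples differ only in how the matched word is masked. As a rough illustration of that masking step, here is a minimal sketch; `maskWord` is a hypothetical helper written for this example, not a function exported by leo-profanity.

```javascript
// Hypothetical helper (not part of leo-profanity): mask a word with
// `replaceKey`, leaving the first `keep` characters visible.
function maskWord(word, replaceKey = '*', keep = 0) {
  const visible = word.slice(0, keep);
  const hiddenLength = Math.max(word.length - keep, 0);
  return visible + replaceKey.repeat(hiddenLength);
}

console.log(maskWord('boob'));         // ****
console.log(maskWord('boob', '+'));    // ++++
console.log(maskWord('boob', '+', 2)); // bo++
```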

filter.add(string|Array.string)

// add word
filter.add('b00b');

// add an array of words
// duplicates are skipped automatically
filter.add(['b00b', 'b@@b']);

filter.remove(string|Array.string)

// remove word
filter.remove('b00b');

// remove an array of words
filter.remove(['b00b', 'b@@b']);

filter.reset()

Resets the word list to the default dictionary (this also removes any words that were added manually).

filter.clearList()

Clears all profanity words from the list.

Algorithm

This project splits the problem into two parts, Sanitize and Filter. The approaches considered for each part are described below.

Sanitize

Attempt 1 (1.1): convert everything to lowercase
Advantage:
  - simple
Disadvantage:
  - none

Attempt 2 (1.2): convert look-alike symbols to letters
e.g. convert `@` to `a`, and `5` and `$` to `s`
Advantage:
  - simple, and detects some disguised words (e.g. @ss, b00b)
Disadvantage:
  - "false positives"
  - limits user creativity (users cannot play with words)
  e.g. joe@ssociallife.com
  e.g. a user who wants to try something funny like "a$a$in"
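
To make the trade-off in attempt 1.2 concrete, here is a minimal sketch of symbol substitution. The `LOOKALIKES` map and `normalizeSymbols` helper are assumptions made for this example (the `0` to `o` entry is inferred from the `b00b` case), and the project does not adopt this attempt.

```javascript
// Illustrative symbol map for attempt 1.2; the entries are assumptions.
const LOOKALIKES = { '@': 'a', '5': 's', '$': 's', '0': 'o' };

// Hypothetical helper: lowercase, then swap look-alike symbols for letters.
function normalizeSymbols(text) {
  return text
    .toLowerCase()
    .replace(/[@5$0]/g, (ch) => LOOKALIKES[ch]);
}

console.log(normalizeSymbols('@ss'));  // ass
console.log(normalizeSymbols('b00b')); // boob
// The same rule rewrites harmless input, which is the false-positive risk:
console.log(normalizeSymbols('joe@ssociallife.com')); // joeassociallife.com
console.log(normalizeSymbols('Room 505'));            // room sos
```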

Attempt 3 (1.3): replace `.` and `,` with a space to separate words
In some sentences, people use `.` and `,` to connect or end the sentence.
Advantage:
  - increases the chance of detection
  e.g. I like a55,b00b
Disadvantage:
  - none
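
Attempts 1.1 and 1.3 combine naturally into one step. The sketch below shows one way to write it; the `sanitize` function here is a standalone illustration and may differ from what the library does internally.

```javascript
// Hypothetical helper combining attempts 1.1 and 1.3:
// lowercase the text, then turn '.' and ',' into spaces.
function sanitize(text) {
  return text.toLowerCase().replace(/[.,]/g, ' ');
}

console.log(sanitize('I like a55,b00b')); // i like a55 b00b

// After sanitizing, a plain split on spaces isolates each word.
const tokens = sanitize('I have BoOb.').split(' ');
console.log(tokens.includes('boob'));     // true
```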

Filter

Attempt 1 (2.1): split into an array (or use a regex)
Split the text on spaces, then check each item against the profanity word list.
Advantage:
  - simple
Disadvantage:
  - needs a proper word list
  - some "false positives"
  e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
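
A minimal sketch of attempt 2.1 follows. The three-word `WORDS` set and the `containsProfanity` helper are assumptions made for illustration; the real dictionary is much larger.

```javascript
// Tiny illustrative word list; a real dictionary is much larger.
const WORDS = new Set(['boob', 'ass', 'tit']);

// Hypothetical helper for attempt 2.1: split on whitespace and
// compare each whole token against the word list.
function containsProfanity(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .some((token) => WORDS.has(token));
}

console.log(containsProfanity('I have boob'));                // true
console.log(containsProfanity('Buy classic watches online')); // false
// Whole-word matching still flags legitimate uses such as the bird name:
console.log(containsProfanity('I saw a great tit today'));    // true
```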

Attempt 2 (2.2): match words inside other text (with or without spaces)
Detect any run of letters that contains a profanity word (e.g. `thistextisfunnyboobsanda55`)
Advantage:
  - simple
  - can detect "un-spaced" profanity words
Disadvantage:
  - many "false positives"
  e.g. http://www.morewords.com/contains/ass/
  e.g. the "Clbuttic" mistake (a filter mistake)
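
The false positives of attempt 2.2 are easy to reproduce. In this sketch, `containsProfanityAnywhere` and `naiveReplace` are hypothetical helpers written for illustration, and the word list is a two-entry stand-in.

```javascript
const WORDS = ['boob', 'ass'];

// Hypothetical helper for attempt 2.2: flag text when any listed word
// appears anywhere inside it, ignoring word boundaries.
function containsProfanityAnywhere(text) {
  const lowered = text.toLowerCase();
  return WORDS.some((word) => lowered.includes(word));
}

// Naive in-place replacement, shown only to reproduce the "clbuttic" bug.
function naiveReplace(text, from, to) {
  return text.split(from).join(to);
}

console.log(containsProfanityAnywhere('thistextisfunnyboobs')); // true
console.log(containsProfanityAnywhere('Buy classic watches'));  // true (false positive)
console.log(naiveReplace('classic', 'ass', 'butt'));            // clbuttic
```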

Summary

We don't know every way a profanity word can be written (e.g. how many different ways can you type a55?)
There is no algorithm-based approach that achieves it (yet)
People will always find a way to connect with each other (e.g. Leet)

So this project goes with 1.1, 1.3 and 2.1. (Note: you can find other attempts in the "Reference" section.)
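
Put together, the chosen combination can be sketched as below. This is an illustration under stated assumptions, not the library's source: `cleanSketch` and the two-word `WORDS` set are hypothetical, and splitting with a capture group is just one way to keep the original punctuation in the output.

```javascript
// Illustrative word list; the real dictionary is far larger.
const WORDS = new Set(['boob', 'ass']);

// Hypothetical sketch of the chosen combination (1.1 + 1.3 + 2.1).
function cleanSketch(text, replaceKey = '*') {
  // Split on whitespace, '.' and ',' while keeping the delimiters (1.3),
  // so the original punctuation survives in the output.
  const parts = text.split(/(\s|,|\.)/);
  return parts
    .map((part) =>
      // Compare the lowercased token (1.1) as a whole word (2.1).
      WORDS.has(part.toLowerCase()) ? replaceKey.repeat(part.length) : part
    )
    .join('');
}

console.log(cleanSketch('I have boob,boob, ass, and etc.'));
// I have ****,****, ***, and etc.
console.log(cleanSketch('Buy classic watches online'));
// Buy classic watches online
```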

TODO

add method
Filter html syntax
Support multi-language
Complete clean API
Increase code coverage percentage
Fix ESLint
Demo page
More word dictionary
setDictionary function
Encapsulate private function
Order by alphabetical
Order by length
Release new version according to loadDictionary + French words
Release completed API, getDictionary
Unit test of proceed method
Unit test of badWordsUsed method
Make other dictionaries optional (only English is a mandatory dictionary)

Other languages

JavaScript on npmjs.com/package/leo-profanity
PHP on packagist.org/packages/jojoee/leo-profanity
Python on pypi.org/project/leoprofanity
Java on Maven
WordPress on wordpress.org

Contribute

Fork the repo
Install Node.js and dependencies
Make a branch for your change and make your changes
Run git add -A to add your changes
Run npm run commit (don't use git commit)
Push your changes with git push, then create a pull request

Contribute for owner

$ npm install -g semantic-release-cli
$ semantic-release-cli setup

Use the commands above to set up "semantic-release".
