leo-profanity

Profanity filter, based on "Shutterstock" dictionary

Installation

// npm
npm install leo-profanity
npm install leo-profanity --no-optional # install only English bad word dictionary

// yarn
yarn add leo-profanity
yarn add leo-profanity --ignore-optional # install only English bad word dictionary

// Bower
bower install leo-profanity
// dictionary/default.json

Example usage for npm

// support langs
// - en
// - fr

var filter = require('leo-profanity');

filter.loadDictionary(string)

// replace the current dictionary with the French one
filter.loadDictionary('fr');

// replace dictionary with the default one (same as filter.reset())
filter.loadDictionary();

filter.list()

// return all profanity words (Array.string)
filter.list();

filter.check(string)

See more cases under filter.clean below.

// output: true
filter.check('I have boob');

filter.clean(string, [replaceKey=*], [nbLetters=0])

// no bad word
// output: I have 2 eyes
filter.clean('I have 2 eyes');

// normal case
// output: I have ****, etc.
filter.clean('I have boob, etc.');

// case sensitive
// output: I have ****
filter.clean('I have BoOb');

// separated by comma and dot
// output: I have ****.
filter.clean('I have BoOb.');

// multi occurrence
// output: I have ****,****, ***, and etc.
filter.clean('I have boob,boob, ass, and etc.');

// should not detect unspaced-word
// output: Buy classic watches online
filter.clean('Buy classic watches online');

// clean with custom replacement-character
// output: I have ++++
filter.clean('I have boob', '+');

// support "clear letter" in the beginning of the word
// output: I have bo++
filter.clean('I have boob', '+', 2);

filter.add(string|Array.string)

// add word
filter.add('b00b');

// add an array of words
// duplicates are checked automatically
filter.add(['b00b', 'b@@b']);

filter.remove(string|Array.string)

// remove word
filter.remove('b00b');

// remove an array of words
filter.remove(['b00b', 'b@@b']);

filter.reset()

Reset the word list to the default dictionary (this also removes words that were added manually)
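
A short sketch of how reset interacts with add (the word 'somebadword' below is just a made-up placeholder, not a dictionary entry):

var filter = require('leo-profanity');

// add a custom word, then reset back to the default dictionary
filter.add('somebadword');
filter.list().indexOf('somebadword') !== -1; // true
filter.reset();
filter.list().indexOf('somebadword') !== -1; // false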

filter.clearList()

Clear all profanity words
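
A short sketch of the difference between clearList and reset, assuming the default English dictionary is loaded:

var filter = require('leo-profanity');

// clearList() empties the word list entirely, so nothing is detected
filter.clearList();
filter.list();               // []
filter.check('I have boob'); // false

// reset() restores the default dictionary
filter.reset();
filter.check('I have boob'); // true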

Algorithm

This project splits the problem into two parts, Sanitize and Filter. The attempts below describe the algorithms that were considered.

Sanitize

Attempt 1 (1.1): convert everything to lowercase (see the sketch below)
Advantage:
  - simple
Disadvantage:
  - none
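
A minimal sketch of attempt 1.1:

// attempt 1.1: lowercase the whole input before any matching
function sanitizeLowercase(text) {
  return text.toLowerCase();
}

sanitizeLowercase('I have BoOb'); // 'i have boob'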

Attempt 2 (1.2): turn "similar-looking" symbols into letters (sketch below)
e.g. convert `@` to `a`, `5` and `$` to `s`
Advantage:
  - simple + detects some trick words (e.g. @ss, b00b)
Disadvantage:
  - "false positives"
  - limits user creativity (users cannot play with words)
  e.g. joe@ssociallife.com
  e.g. a user wants to try something funny like "a$a$in"
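
A minimal sketch of attempt 1.2 (the symbol map below is only an illustration based on the examples above, not the library's actual table):

// attempt 1.2: map "similar-looking" symbols to letters before matching
var SYMBOL_MAP = { '@': 'a', '$': 's', '5': 's', '0': 'o' };

function sanitizeSymbols(text) {
  return text.replace(/[@$50]/g, function (ch) {
    return SYMBOL_MAP[ch];
  });
}

sanitizeSymbols('I have @ss and b00b');  // 'I have ass and boob'
sanitizeSymbols('joe@ssociallife.com');  // 'joeassociallife.com' (false positive risk)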

Attempt 3 (1.3): replace `.` and `,` with spaces to separate words (sketch below)
in some sentences, people use `.` and `,` to connect or end clauses
Advantage:
  - increases the chance of finding bad words
  e.g. I like a55,b00b
Disadvantage:
  - none
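
A minimal sketch of attempt 1.3:

// attempt 1.3: treat commas and periods as word separators
function sanitizeSeparators(text) {
  return text.replace(/[.,]/g, ' ');
}

sanitizeSeparators('I like a55,b00b'); // 'I like a55 b00b'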

Filter

Attempt 1 (2.1): split into an array (or use a regex) and check each word (sketch below)
use spaces to split the text into an array, then check each word against the profanity word list
Advantage:
  - simple
Disadvantage:
  - needs a proper word list
  - some "false positives"
  e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
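
A minimal sketch of attempt 2.1 (the word list here is a tiny illustrative subset, not the real dictionary):

// attempt 2.1: split on spaces and check each token against the word list
var WORD_LIST = ['boob', 'b00b', 'a55', 'ass'];

function containsProfanity(text) {
  return text.split(' ').some(function (word) {
    return WORD_LIST.indexOf(word) !== -1;
  });
}

containsProfanity('i have boob');                // true
containsProfanity('buy classic watches online'); // false (no whole-word match)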

Attempt 2 (2.2): look for profanity inside words, with or without spaces (sketch below)
scan the whole string for any substring that matches a profanity word (e.g. `thistextisfunnyboobsanda55`)
Advantage:
  - simple
  - can detect "un-spaced" profanity words
Disadvantage:
  - many "false positives"
  e.g. http://www.morewords.com/contains/ass/
  e.g. Clbuttic mistake (filter mistake)
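
A minimal sketch of attempt 2.2, using the same illustrative word list:

// attempt 2.2: substring matching catches un-spaced words but over-matches badly
var WORD_LIST = ['boob', 'b00b', 'a55', 'ass'];

function containsProfanitySubstring(text) {
  return WORD_LIST.some(function (word) {
    return text.indexOf(word) !== -1;
  });
}

containsProfanitySubstring('thistextisfunnyboobsanda55'); // true
containsProfanitySubstring('classic watches');            // true (false positive: "ass" inside "classic")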

Summary

  • We don't know every method that can produce a profanity word (e.g. how many different ways can you type a55?)
  • There is no purely algorithm-based approach that achieves it (yet)
  • People will always find a way to connect with each other (e.g. Leet)

So, this project decided to go with 1.1, 1.3 and 2.1 (see the combined sketch below). (note: you can find the other attempts in the "Reference" section)
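
A minimal sketch of the combined approach (1.1 + 1.3 + 2.1); the real library ships its own dictionary and replacement logic, so this only illustrates the check step with a made-up word list:

// 1.1 lowercase, 1.3 separate on . and , then 2.1 whole-word check
var WORD_LIST = ['boob', 'ass', 'a55'];

function check(text) {
  var sanitized = text.toLowerCase().replace(/[.,]/g, ' ');
  return sanitized.split(' ').some(function (word) {
    return WORD_LIST.indexOf(word) !== -1;
  });
}

check('I have BoOb.');               // true
check('Buy classic watches online'); // false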

TODO

  • add method
  • Filter html syntax
  • Support multi-language
  • Complete clean API
  • Increase code coverage percentage
  • Fix ESLint
  • Demo page
  • More word dictionary
  • setDictionary function
  • Encapsulate private function
  • Order words alphabetically
  • Order by length
  • Release new version according to loadDictionary + French words
  • Release completed API, getDictionary
  • Unit test of proceed method
  • Unit test of badWordsUsed method
  • Make other dictionaries optional (only English is a mandatory dictionary)

Other languages

Contribute

  1. Fork the repo
  2. Install Node.js and dependencies
  3. Make a branch for your change and make your changes
  4. Run git add -A to add your changes
  5. Run npm run commit (don't use git commit)
  6. Push your changes with git push, then create a Pull Request

Contribute for owner

$ npm install -g semantic-release-cli
$ semantic-release-cli setup

Use the commands above to set up semantic-release.

Stats


Reference