README
leo-profanity
Profanity filter, based on "Shutterstock" dictionary
Installation
// npm
npm install leo-profanity
npm install leo-profanity --no-optional # install only the English bad-word dictionary
// yarn
yarn add leo-profanity
yarn add leo-profanity --ignore-optional # install only the English bad-word dictionary
// Bower
bower install leo-profanity
// dictionary/default.json
Example usage for npm
// supported languages
// - en
// - fr
var filter = require('leo-profanity');
filter.loadDictionary(string)
// replace current dictionary with the french one
filter.loadDictionary('fr');
// replace dictionary with the default one (same as filter.reset())
filter.loadDictionary();
filter.list()
// returns all profanity words (Array.string)
filter.list();
filter.check(string)
See filter.clean for more cases.
// output: true
filter.check('I have boob');
filter.clean(string, [replaceKey=*])
// no bad word
// output: I have 2 eyes
filter.clean('I have 2 eyes');
// normal case
// output: I have ****, etc.
filter.clean('I have boob, etc.');
// case insensitive
// output: I have ****
filter.clean('I have BoOb');
// separated by comma and dot
// output: I have ****.
filter.clean('I have BoOb.');
// multi occurrence
// output: I have ****,****, ***, and etc.
filter.clean('I have boob,boob, ass, and etc.');
// should not detect a bad word inside an unspaced word (e.g. "ass" in "classic")
// output: Buy classic watches online
filter.clean('Buy classic watches online');
// clean with custom replacement-character
// output: I have ++++
filter.clean('I have boob', '+');
// keep "clear letters" at the beginning of the word (here, the first 2)
// output: I have bo++
filter.clean('I have boob', '+', 2);
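The masking behaviour shown in the examples above can be sketched in a few lines. This is only an illustration, not the library's code: `maskWord` and the parameter name `nbLetters` are hypothetical names chosen for this sketch.

```javascript
// Hypothetical helper (not part of the leo-profanity API): masks one word.
// replaceKey: character used for masking (assumed default "*")
// nbLetters:  number of leading "clear letters" left unmasked (assumed default 0)
function maskWord(word, replaceKey = '*', nbLetters = 0) {
  const kept = word.slice(0, nbLetters);
  // Math.max guards against nbLetters being longer than the word
  const masked = replaceKey.repeat(Math.max(word.length - nbLetters, 0));
  return kept + masked;
}

console.log(maskWord('boob'));         // ****
console.log(maskWord('boob', '+'));    // ++++
console.log(maskWord('boob', '+', 2)); // bo++
```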
filter.add(string|Array.string)
// add word
filter.add('b00b');
// add an array of words
// duplicates are skipped automatically
filter.add(['b00b', 'b@@b']);
filter.remove(string|Array.string)
// remove word
filter.remove('b00b');
// remove an array of words
filter.remove(['b00b', 'b@@b']);
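To make the "string or array" input and the automatic duplicate check concrete, here is a minimal sketch on a plain array. `addWords`, `removeWords` and `toArray` are hypothetical helpers for illustration only; the library may store and update its list differently.

```javascript
// Hypothetical helpers (not the leo-profanity API), working on a plain array.
function toArray(input) {
  return Array.isArray(input) ? input : [input];
}

// Returns a new list with the given word(s) added; a Set drops duplicates.
function addWords(list, input) {
  return Array.from(new Set(list.concat(toArray(input))));
}

// Returns a new list without the given word(s).
function removeWords(list, input) {
  const drop = toArray(input);
  return list.filter((word) => !drop.includes(word));
}

let words = ['boob'];
words = addWords(words, 'b00b');
words = addWords(words, ['b00b', 'b@@b']);
console.log(words); // [ 'boob', 'b00b', 'b@@b' ]
words = removeWords(words, ['b00b', 'b@@b']);
console.log(words); // [ 'boob' ]
```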
filter.reset()
Reset the word list to the default dictionary (this also removes words that were added manually)
filter.clearList()
Clear all profanity words
Algorithm
This project splits the problem into 2 parts, Sanitize and Filter. The approaches considered for each part are described below.
Sanitize
Attempt 1 (1.1): convert everything to lowercase
Advantage:
- simple
Disadvantage:
- none
Attempt 2 (1.2): turn "similar-like" symbols into letters
e.g. convert `@` to `a`, `5` and `$` to `s`
Advantage:
- simple, and detects some trick words (e.g. @ss, b00b)
Disadvantage:
- "false positive"
- limits user imagination (users cannot play with words)
e.g. joe@ssociallife.com
e.g. a user wants to try something funny like "a$a$in"
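A rough sketch of this attempt, to show both the benefit and the false positive. The mapping table is an assumption made for illustration (only a few symbols, with `0` to `o` inferred from the b00b example), and `normalizeSymbols` is a hypothetical name.

```javascript
// Assumed mapping for illustration only; a real table would be larger.
const SYMBOL_MAP = { '@': 'a', '5': 's', '0': 'o' };

// Replace each mapped symbol with its look-alike letter.
function normalizeSymbols(text) {
  return text.replace(/[@50]/g, (ch) => SYMBOL_MAP[ch]);
}

console.log(normalizeSymbols('@ss'));  // ass
console.log(normalizeSymbols('b00b')); // boob

// False positive: a harmless e-mail address now contains "ass".
console.log(normalizeSymbols('joe@ssociallife.com')); // joeassociallife.com
```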
Attempt 3 (1.3): replace `.` and `,` with space to separate words
In some sentences, people use `.` and `,` to connect or end the sentence
Advantage:
- increases the chance of finding a profanity word
e.g. I like a55,b00b
Disadvantage:
- none
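Attempts 1.1 and 1.3 together amount to a very small function. The sketch below is an illustration under that reading, with a hypothetical name `sanitize`; it is not copied from the library.

```javascript
// Sketch of sanitize steps 1.1 and 1.3 (hypothetical helper name).
function sanitize(text) {
  return text
    .toLowerCase()           // 1.1: convert everything to lowercase
    .replace(/[.,]/g, ' ');  // 1.3: replace "." and "," with a space
}

console.log(sanitize('I like a55,b00b')); // i like a55 b00b
console.log(JSON.stringify(sanitize('I have BoOb.'))); // "i have boob "
```

Each `.` or `,` becomes exactly one space, so the sanitized string keeps the same length as the input (for plain ASCII text).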
Filter
Attempt 1 (2.1): split into array (or using regex, somehow)
Split the text on spaces into an array, then check each word against the profanity word list
Advantage:
- simple
Disadvantage:
- needs a proper word list
- some "false positive"
e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
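The whole-word check can be sketched as follows. The word list here is a tiny placeholder and `containsProfanity` is a hypothetical name; the input is assumed to be already sanitized (lowercase, with `.` and `,` replaced by spaces).

```javascript
// Sketch of filter attempt 2.1: split on spaces, compare whole words only.
function containsProfanity(sanitizedText, words) {
  return sanitizedText.split(' ').some((token) => words.includes(token));
}

const sampleWords = ['boob', 'ass', 'tit']; // placeholder list for illustration

console.log(containsProfanity('i have boob', sampleWords));                // true
console.log(containsProfanity('buy classic watches online', sampleWords)); // false
// False positive: the bird name is flagged because "tit" is on the list.
console.log(containsProfanity('great tit', sampleWords));                  // true
```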
Attempt 2 (2.2): filter word inside (with or without space)
Detect any run of letters that contains a profanity word (e.g. `thistextisfunnyboobsanda55`)
Advantage:
- simple
- can detect "un-spaced" profanity word
Disadvantage:
- many "false positive"
e.g. http://www.morewords.com/contains/ass/
e.g. Clbuttic mistake (filter mistake)
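For comparison, a substring-based filter can be sketched like this (`cleanSubstrings` is a hypothetical name and the word lists are placeholders). It catches un-spaced words, and it also reproduces the kind of false positive listed above.

```javascript
// Sketch of filter attempt 2.2: mask every occurrence, even inside other words.
function cleanSubstrings(text, words, replaceKey = '*') {
  return words.reduce(
    (result, word) => result.split(word).join(replaceKey.repeat(word.length)),
    text
  );
}

console.log(cleanSubstrings('thistextisfunnyboobsanda55', ['boob', 'a55']));
// thistextisfunny****sand***

// False positive: "classic" contains "ass".
console.log(cleanSubstrings('buy classic watches online', ['ass']));
// buy cl***ic watches online
```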
Summary
- We don't know every method that can produce a profanity word (e.g. how many different ways can you enter a55?)
- There is a non-algorithm-based approach to achieve it (yet)
- People will always find a way to connect with each other (e.g. Leet)
So, this project goes with 1.1, 1.3 and 2.1. (Note: you can find the other attempts in the "Reference" section.)
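One way the chosen combination (1.1, 1.3 and 2.1) could fit together while leaving the rest of the original text untouched is sketched below. This is an assumption about how the pieces might be wired, not the library's actual implementation; `cleanSketch` is a hypothetical name and the sketch assumes plain ASCII input so that lowercasing does not change string length.

```javascript
// Sketch only: sanitize a copy, find bad whole words, mask them in the original.
function cleanSketch(text, words, replaceKey = '*') {
  // 1.1 + 1.3: same length as the input, so indexes line up with `text`
  const sanitized = text.toLowerCase().replace(/[.,]/g, ' ');
  const chars = text.split('');
  let position = 0;

  // 2.1: whole-word comparison on the space-separated tokens
  for (const token of sanitized.split(' ')) {
    if (words.includes(token)) {
      chars.fill(replaceKey, position, position + token.length);
    }
    position += token.length + 1; // +1 for the separating space
  }
  return chars.join('');
}

const demoWords = ['boob', 'ass']; // placeholder list for illustration

console.log(cleanSketch('I have BoOb.', demoWords));
// I have ****.
console.log(cleanSketch('I have boob,boob, ass, and etc.', demoWords));
// I have ****,****, ***, and etc.
console.log(cleanSketch('Buy classic watches online', demoWords));
// Buy classic watches online
```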
TODO
- `add` method
- Filter html syntax
- Support multi-language
- Complete `clean` API
- Increase code coverage percentage
- Fix ESLint
- Demo page
- More word dictionary
- `setDictionary` function
- Encapsulate private function
- Order by alphabetical
- Order by length
- Release new version according to `loadDictionary` + French words
- Release completed API, `getDictionary`
- Unit test of `proceed` method
- Unit test of `badWordsUsed` method
- Make other dictionaries optional (only English is a mandatory dictionary)
Other languages
- JavaScript on npmjs.com/package/leo-profanity
- PHP on packagist.org/packages/jojoee/leo-profanity
- Python on pypi.org/project/leoprofanity
- Java on Maven
- WordPress on wordpress.org
Contribute
- Fork the repo
- Install Node.js and dependencies
- Make a branch for your change and make your changes
- Run `git add -A` to add your changes
- Run `npm run commit` (don't use `git commit`)
- Push your changes with `git push`, then create a Pull Request
Contribute for owner
$ npm install -g semantic-release-cli
$ semantic-release-cli setup
Use the commands above to set up "semantic-release"
Reference
- Inspired by jwils0n/profanity-filter
- Algorithm / Discussion
- "similar-like" symbol to alphabet
- Replace Bad words using Regex
- Clbuttic
- The Clbuttic Mistake
- The Clbuttic Mistake: When obscenity filters go wrong
- Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?
- How do you implement a good profanity filter?
- The Untold History of Toontown’s SpeedChat (or BlockChat™ from Disney finally arrives)
- Profanity Filter Performance in Java
- Resource bad-word list
- Tool