wordsoap-regex
Regular expressions for cleaning up dirty HTML output from Microsoft Word.
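module.exports = {
  // from http://tim.mackey.ie/CleanWordHTMLUsingRegularExpressions.aspx
  // Opening/closing font, span, xml, del and ins tags, plus Office-namespaced tags such as <o:p>.
  msoTags: /<[\/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>/gi,
  // A tag with a class, lang, style, size, face or Office-namespaced attribute; $1 and $2
  // capture the tag text around one such attribute, so replacing with '<$1$2>' removes it.
  msoAttributes: /<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|"[^"]*"|[^\s>]+)([^>]*)>/gi,
}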
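The patterns are meant to be passed to `String.prototype.replace`. The sketch below shows one way to combine them, following the two-step idea of the article cited in the source comment: first delete the unwanted tags, then strip unwanted attributes by replacing each match with `'<$1$2>'`. It assumes the `g` flag (without it, `replace` changes only the first match) and the `i` flag (case-insensitive, as in the C# original), and it redefines both patterns inline so it runs without installing the package. `cleanWordHtml` is a hypothetical helper, not an export of wordsoap-regex.

```javascript
// Inline copies of the two patterns, with the g and i flags, so this sketch is
// self-contained. In a project you would instead (assuming the package is
// installed under this name) write:
//   const { msoTags, msoAttributes } = require('wordsoap-regex');
const msoTags = /<[\/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>/gi;
const msoAttributes =
  /<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|"[^"]*"|[^\s>]+)([^>]*)>/gi;

// Hypothetical helper: not part of the package's exports.
function cleanWordHtml(html) {
  // Pass 1: remove the unwanted tags entirely, keeping the text between them.
  let result = html.replace(msoTags, '');

  // Pass 2: one replace strips at most one matching attribute per tag, so
  // repeat until the output stops changing. Every successful replace makes
  // the string shorter, so the loop always terminates.
  let previous;
  do {
    previous = result;
    result = result.replace(msoAttributes, '<$1$2>');
  } while (result !== previous);

  return result;
}

const dirty =
  '<p class="MsoNormal" style="margin:0"><span lang="EN-US">Hello</span><o:p></o:p></p>';
console.log(cleanWordHtml(dirty)); // prints "<p  >Hello</p>"
```

Looping until the output is stable, rather than running the attribute pass a fixed number of times, also covers tags that carry many Word attributes. The attribute pass leaves behind the whitespace that separated the removed attributes, which is why the example prints `<p  >`; HTML parsers treat that the same as `<p>`.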
License
ISC © Raine Lourie