document-phrase-occurrence-parser

Finds the number of occurrences of one or more phrases in a directory of .doc, .docx, and .pdf files.

npm install --global document-phrase-occurrence-parser

dpop --phrases "laser, shirt, the"
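To make the idea behind the command above concrete, here is a minimal sketch of how a comma-separated phrase list could be split and counted against already-extracted text. It is not the package's actual implementation: the helper names parsePhrases and countOccurrences are hypothetical, and case-insensitive, non-overlapping substring matching is an assumption.

```javascript
// Hypothetical helpers for illustration; not part of the package's API.

// Split a comma-separated --phrases value into trimmed, non-empty phrases.
function parsePhrases(value) {
  return value
    .split(",")
    .map((phrase) => phrase.trim())
    .filter((phrase) => phrase.length > 0);
}

// Count non-overlapping occurrences of `phrase` in `text`.
// Assumption: matching is case-insensitive and purely substring-based.
function countOccurrences(text, phrase) {
  const haystack = text.toLowerCase();
  const needle = phrase.toLowerCase();
  if (needle.length === 0) return 0;

  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    // Resume the search just past the match so matches never overlap.
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

const phrases = parsePhrases("laser, shirt, the");
const sample = "The laser burned the shirt. Another laser missed.";
const counts = Object.fromEntries(
  phrases.map((phrase) => [phrase, countOccurrences(sample, phrase)])
);
console.log(counts);
```

Because this sketch matches substrings, "the" is also counted inside "Another"; a word-boundary check would be needed if only whole words should count.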

Textract is used to extract text from the files. Depending on the file types you want to parse and your OS, external dependencies may be required.

PDF extraction requires pdftotext to be installed.
DOC extraction requires antiword to be installed, unless you are on OSX, in which case textutil (installed by default) is used.
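
One way to confirm these dependencies before running a scan is a small Node.js check like the sketch below. It is an illustration only and not something the package provides: the helper names requiredTools and isOnPath are hypothetical, and it assumes a which command (or where on Windows) is available for the lookup.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: list the external tools named above for a platform.
// "darwin" is Node's platform identifier for OSX.
function requiredTools(platform = process.platform) {
  const docTool = platform === "darwin" ? "textutil" : "antiword";
  return ["pdftotext", docTool];
}

// Hypothetical helper: report whether a command can be found on the PATH.
// Assumption: `which` (or `where` on Windows) exists on this system.
function isOnPath(command) {
  const lookup = process.platform === "win32" ? "where" : "which";
  const result = spawnSync(lookup, [command], { stdio: "ignore" });
  return result.status === 0;
}

for (const tool of requiredTools()) {
  const state = isOnPath(tool) ? "found" : "missing";
  console.log(`${tool}: ${state}`);
}
```

Any tool reported as missing would need to be installed before the corresponding file type can be parsed.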