document-phrase-occurrence-parser

Finds the number of occurrences of one or more phrases in a directory of .doc, .docx, and .pdf files.
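The README does not describe how files are selected or how a phrase counts as an occurrence. The following is a minimal sketch of those two steps, assuming a non-recursive scan of one directory and whole-word, case-insensitive, non-overlapping matching. The helper names (listDocuments, countPhrase, countPhrases) are hypothetical and not part of this package, and the text-extraction step handled by textract is left out.

```javascript
const fs = require('fs');
const path = require('path');

// Extensions the README lists as supported.
const SUPPORTED_EXTENSIONS = new Set(['.doc', '.docx', '.pdf']);

// Hypothetical: list supported documents directly inside `dir` (not recursive; assumption).
function listDocuments(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && SUPPORTED_EXTENSIONS.has(path.extname(entry.name).toLowerCase()))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical: count whole-word, case-insensitive, non-overlapping matches of
// `phrase` in `text`. Words in the phrase may be separated by any run of
// whitespace, because extracted text often breaks lines mid-phrase (assumed rules).
function countPhrase(text, phrase) {
  const words = phrase.trim().split(/\s+/).map(escapeRegExp);
  const pattern = new RegExp(`\\b${words.join('\\s+')}\\b`, 'gi');
  return (text.match(pattern) || []).length;
}

// Hypothetical: map each phrase to its occurrence count in `text`.
function countPhrases(text, phrases) {
  const counts = {};
  for (const phrase of phrases) {
    counts[phrase] = countPhrase(text, phrase);
  }
  return counts;
}
```

Under these assumptions, summing the results of countPhrases over the extracted text of every file returned by listDocuments gives per-directory totals. With whole-word matching, "the" does not match inside "Then". The real tool may use different rules.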

Installation

npm install --global document-phrase-occurrence-parser

Usage

Pass one or more phrases to dpop as a single comma-separated string:

dpop --phrases "laser, shirt, the"
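The README does not say how the --phrases value is split. Here is a minimal sketch, assuming entries are separated by commas, surrounding whitespace is trimmed, and empty entries are dropped. getFlagValue and parsePhrases are hypothetical helpers, not this package's code.

```javascript
// Hypothetical: return the argument that follows `flag`, or undefined if absent.
function getFlagValue(argv, flag) {
  const i = argv.indexOf(flag);
  return i !== -1 && i + 1 < argv.length ? argv[i + 1] : undefined;
}

// Hypothetical: split "laser, shirt, the" into ["laser", "shirt", "the"],
// trimming whitespace and dropping empty entries (assumed behavior).
function parsePhrases(value) {
  return value
    .split(',')
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}
```

In a script, parsePhrases(getFlagValue(process.argv.slice(2), '--phrases')) would turn the example command's argument into ["laser", "shirt", "the"].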

Extraction Requirements

Text is extracted from the files with textract. Depending on the file types you want to process and your operating system, textract may need external tools to be installed:

  • PDF extraction requires that pdftotext be installed.
  • DOC extraction requires that antiword be installed, unless you are on OS X, where the preinstalled textutil is used instead.
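Before running dpop on a directory, you can check whether the tools listed above are available. The following is a minimal sketch that maps file extensions to the tools named in this README and scans PATH for them. It assumes .docx needs no external tool because none is listed. requiredTools and isOnPath are hypothetical helpers, and the PATH scan does not handle Windows executable extensions such as .exe.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical: external tools needed for the given extensions, per the list above.
// 'darwin' is Node's platform name for macOS / OS X.
function requiredTools(extensions, platform = process.platform) {
  const tools = new Set();
  for (const ext of extensions) {
    const e = ext.toLowerCase();
    if (e === '.pdf') tools.add('pdftotext');
    if (e === '.doc') tools.add(platform === 'darwin' ? 'textutil' : 'antiword');
    // No tool is listed for .docx, so none is added (assumption).
  }
  return [...tools].sort();
}

// Hypothetical: true if an executable named `command` exists in a PATH directory.
function isOnPath(command, envPath = process.env.PATH || '') {
  return envPath.split(path.delimiter).some((dir) => {
    if (!dir) return false;
    try {
      fs.accessSync(path.join(dir, command), fs.constants.X_OK);
      return true;
    } catch {
      return false;
    }
  });
}
```

For example, requiredTools(['.pdf', '.doc']).filter((tool) => !isOnPath(tool)) lists the tools that still need to be installed on the current machine.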