Poppler for Node
Allows you to use the native Poppler C++ backend to parse PDF files efficiently from Node.js. The output is similar to that of pdftohtml -xml -stdout test.pdf (pdftohtml is part of the poppler-utils package), because this module uses parts of the same codebase, rewritten to produce N-API objects instead of XML. All exported functions return JavaScript promises.
Getting started
- Dependencies for a Dockerfile based on node:14-buster-slim: ghostscript python build-essential libjpeg62-turbo-dev libpng-dev libcurl4-gnutls-dev mupdf-tools libfreetype6-dev qpdf
- npm install poppler-native (only tested on Ubuntu 16.04 and 20.04 so far)
// Pass a file path...
const pdf = require('poppler-native');
pdf.info('test.pdf')
  .then(res => console.log(res))
  .catch(err => console.error(err));
// ...or a Buffer containing the raw PDF bytes
const fs = require('fs-extra');
fs.readFile('test.pdf')
  .then(f => pdf.info(f))
  .then(res => console.log(res))
  .catch(err => console.error(err));
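Because every function returns a promise, results for several files can be collected with Promise.all. The following is a minimal sketch: the helper name infoForAll is hypothetical, and pdf.info is passed in as a parameter rather than required inside the helper. This README does not say whether multiple native calls actually run in parallel, so treat this as a convenience, not a guaranteed speed-up. Promise.all rejects as soon as any single file fails.

```javascript
// Hypothetical helper: runs `info` (e.g. require('poppler-native').info) on every
// input and returns a Map from each input (path or Buffer) to its parsed result.
// Rejects if any single call rejects.
async function infoForAll(info, inputs) {
  const results = await Promise.all(inputs.map(input => info(input)));
  return new Map(inputs.map((input, i) => [input, results[i]]));
}

// Usage (assumes poppler-native is installed):
// const pdf = require('poppler-native');
// infoForAll(pdf.info, ['a.pdf', 'b.pdf'])
//   .then(byFile => console.log(byFile.get('a.pdf')))
//   .catch(err => console.error(err));
```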
To visualize the parsed text boxes and images, write the entire output of the pdf.info function to a JSON file, open misc/pdf-json-viewer.html in any web browser, and drag and drop the JSON file onto the page.
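Writing that JSON file takes only a few lines. This is a minimal sketch, assuming the viewer accepts the pdf.info result serialized as-is with JSON.stringify; the helper name writeInfoJson is hypothetical, and pdf.info is passed in as a parameter.

```javascript
const fs = require('fs');

// Hypothetical helper: calls `info` (e.g. require('poppler-native').info) on one
// input (a file path or a Buffer) and writes the result to outFile as indented JSON.
async function writeInfoJson(info, input, outFile) {
  const result = await info(input);
  await fs.promises.writeFile(outFile, JSON.stringify(result, null, 2));
  return outFile;
}

// Usage (assumes poppler-native is installed):
// const pdf = require('poppler-native');
// writeInfoJson(pdf.info, 'test.pdf', 'test.json').catch(err => console.error(err));
```

The resulting test.json is the file to drop onto misc/pdf-json-viewer.html.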
Contributing
Updating Poppler
This is only necessary when a new version of Poppler is released. According to the Poppler README, the internal Poppler C++ API, on which this project is built, may change incompatibly even in minor releases. Evaluate new Poppler versions thoroughly before updating.
- Download the Poppler source release from the Poppler website.
- Put all *.h, *.c and *.cc files from poppler-20.11.0/goo into native/poppler/goo. Do the same for fofi and poppler. Do not change the two existing config files.
- Remove the following files from the native/poppler/poppler directory: CairoFontEngine.cc, CairoFontEngine.h, CairoOutputDev.cc, CairoOutputDev.h, CairoRescaleBox.cc, CairoRescaleBox.h, GlobalParamsWin.cc, JPEG2000Stream.cc, JPEG2000Stream.h, SignatureHandler.cc, SignatureHandler.h, SplashOutputDev.cc, SplashOutputDev.h
- Remove the line #include "splash/SplashTypes.h" from native/poppler/poppler/GfxState.cc.
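The copy, delete and patch steps above can be scripted. The following is a minimal sketch, assuming the release has been extracted (for example as poppler-20.11.0) and the script runs from the repository root. The function name updatePoppler and its keep parameter are hypothetical. This README does not name the two config files, so pass their paths (relative to native/poppler) via keep to make sure they are never overwritten. Only top-level files in each directory are copied, which is an assumption about what "all files" means.

```javascript
const fs = require('fs');
const path = require('path');

// Release subdirectories mirrored into native/poppler (assumed one-to-one).
const SUBDIRS = ['goo', 'fofi', 'poppler'];
// Only C and C++ sources and headers are copied.
const EXTENSIONS = new Set(['.h', '.c', '.cc']);
// Files that must not remain in native/poppler/poppler after copying.
const REMOVED = [
  'CairoFontEngine.cc', 'CairoFontEngine.h', 'CairoOutputDev.cc', 'CairoOutputDev.h',
  'CairoRescaleBox.cc', 'CairoRescaleBox.h', 'GlobalParamsWin.cc',
  'JPEG2000Stream.cc', 'JPEG2000Stream.h', 'SignatureHandler.cc', 'SignatureHandler.h',
  'SplashOutputDev.cc', 'SplashOutputDev.h',
];
const SPLASH_INCLUDE = '#include "splash/SplashTypes.h"';

// Hypothetical helper: srcRoot is the extracted release, destRoot is 'native/poppler',
// keep lists paths relative to destRoot that are never overwritten (config files).
function updatePoppler(srcRoot, destRoot, keep = []) {
  const protectedPaths = new Set(keep.map(p => path.normalize(p)));

  // Step 1: copy top-level *.h, *.c and *.cc files (no recursion into subdirectories).
  for (const sub of SUBDIRS) {
    const fromDir = path.join(srcRoot, sub);
    const toDir = path.join(destRoot, sub);
    fs.mkdirSync(toDir, { recursive: true });
    for (const entry of fs.readdirSync(fromDir, { withFileTypes: true })) {
      if (!entry.isFile() || !EXTENSIONS.has(path.extname(entry.name))) continue;
      if (protectedPaths.has(path.join(sub, entry.name))) continue;
      fs.copyFileSync(path.join(fromDir, entry.name), path.join(toDir, entry.name));
    }
  }

  // Step 2: delete the files listed above if they are present.
  for (const name of REMOVED) {
    const file = path.join(destRoot, 'poppler', name);
    if (fs.existsSync(file)) fs.unlinkSync(file);
  }

  // Step 3: drop the splash include from GfxState.cc (trim() tolerates CRLF and indentation).
  const gfxState = path.join(destRoot, 'poppler', 'GfxState.cc');
  const source = fs.readFileSync(gfxState, 'utf8');
  const patched = source.split('\n').filter(line => line.trim() !== SPLASH_INCLUDE).join('\n');
  fs.writeFileSync(gfxState, patched);
}
```

For example, updatePoppler('poppler-20.11.0', 'native/poppler', keep), where keep lists the two config files. Review the resulting diff before committing, since the internal API may have changed between releases.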
License
GPLv2 or later, because the Poppler source is bundled.