node-phantom-async

Simple and reliable bridge between Node.js and PhantomJS

This is a bridge between PhantomJS and Node.js.

This module is a fork of node-phantom-simple, but all the methods of both the phantom and page objects return promises instead of taking callbacks. The Bluebird library provides the promises (even if you run a Node version that supports promises natively, Bluebird promises are reported to be substantially faster than native promises).

This module has an API similar to that of node-phantom, but it doesn't rely on WebSockets or socket.io. In essence, the communication between Node and Phantom has been made significantly simpler. It has the following advantages over node-phantom:

  • Fewer dependencies/layers.
  • Just uses the HTTP server module built into Node.
  • Doesn't use the unreliable and huge socket.io.
  • Works under cluster (node-phantom does not, because of how server.listen(0) works in cluster).

Requirements

You will need to install PhantomJS first. The bridge assumes that the "phantomjs" binary is available in the PATH; otherwise you will need to pass its path into the phantom.create() method.

For running the tests you will need mocha. The tests require PhantomJS 1.6 or newer to pass.

Installing

npm install node-phantom-async

Usage

The entire API of PhantomJS should work, with the exception that every method call returns a promise instead of returning values.

For example this is an adaptation of a web scraping example:

var phantom = require('node-phantom-async');

phantom.create()
.bind({})
.then(function (ph) {
    this.ph = ph;
    return ph.createPage();
})
.then(function (page) {
    this.page = page;
    return page.open('http://tilomitra.com/repository/screenscrape/ajax.html');
})
.then(function (status) {
    console.log('opened site?', status);
    var jqUrl = 'http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js';
    return this.page.includeJs(jqUrl);
})
.delay(5000) // Wait for AJAX content to load on the page.
.then(function() {
    // jQuery Loaded.
    return this.page.evaluate(function() {
        // Get what you want from the page using jQuery. A good way is to
        // populate an object with all the jQuery commands that you need
        // and then return the object.
        var h2Arr = [],
        pArr = [];
        $('h2').each(function() {
            h2Arr.push($(this).html());
        });
        $('p').each(function() {
            pArr.push($(this).html());
        });

        return {
          h2: h2Arr,
          p: pArr
        };
    });
})
.then(function (result) {
    console.log(result);
})
.finally(function() {
    // this.ph is unset if phantom.create() itself failed.
    if (this.ph) {
        return this.ph.exit();
    }
})
.finally(function() {
    process.exit();
});

You can run this sample with node samples/web_scraping.

phantom.create(options)

options is an optional object that controls how PhantomJS is started. options.parameters holds the parameters that will be passed to PhantomJS on the command line; the example below passes them as an object of name/value pairs.

For example

phantom.create({parameters: {'ignore-ssl-errors': 'yes'}})

will start phantom as:

phantomjs --ignore-ssl-errors=yes
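
The mapping from the parameters object to command-line flags can be pictured with the sketch below. toPhantomArgs is a hypothetical helper written only for illustration; it is not exported by this module, and the module's actual implementation may differ.

```javascript
// Hypothetical helper (not part of node-phantom-async): shows how an object
// of parameters could be turned into PhantomJS command-line flags.
function toPhantomArgs(parameters) {
    return Object.keys(parameters || {}).map(function (name) {
        return '--' + name + '=' + parameters[name];
    });
}

console.log(toPhantomArgs({'ignore-ssl-errors': 'yes'}));
// prints [ '--ignore-ssl-errors=yes' ]
```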

You may also pass in a custom path if you need to select a specific instance of PhantomJS or if it is not present in the PATH environment variable. This can, for example, be used together with the PhantomJS package like so:

phantom.create({phantomPath: require('phantomjs').path})

You can also look at the test folder for examples of using the API; however, the de facto reference is the PhantomJS documentation. Just expect a promise wherever that documentation describes a return value.

options.ignoreErrorPattern is a regular expression that can be used to silence spurious warnings generated by Qt and PhantomJS.

On OS X Mavericks, you can use {ignoreErrorPattern: /CoreText/} to suppress some common, annoying font-related warnings.
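
As a rough picture of what such a pattern does, the sketch below drops every line of stderr output that matches it. filterWarnings and the sample text in the usage note are assumptions made for illustration; the module's real filtering code is not shown in this README and may work differently.

```javascript
// Hypothetical sketch: keep only the stderr lines that do NOT match
// ignoreErrorPattern. Use a pattern without the "g" flag, since
// RegExp.prototype.test() is stateful for global regular expressions.
function filterWarnings(stderrText, ignoreErrorPattern) {
    return stderrText.split('\n').filter(function (line) {
        if (line === '') {
            return false; // skip empty lines left over from splitting
        }
        return !(ignoreErrorPattern && ignoreErrorPattern.test(line));
    });
}
```

For example, filterWarnings('CoreText note: font\nReferenceError: x\n', /CoreText/) keeps only the 'ReferenceError: x' line.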

Additional page methods

In addition to the methods from the PhantomJS API, there are several convenience methods that return promises.

checkSelector(selector [, retryOptions])

getSelectorRect(selector [, retryOptions])

Returns the result of el.getBoundingClientRect() where el is the first element matching the selector.

getSelectorVisiblePoint(selector [, retryOptions])

Returns a visible point inside element matching selector. Tests up to 9 points.

sendMouseEventToSelector(selector, mouseEventType [, button='left'] [, retryOptions])

The default button is left. There is no need to specify button in order to pass retryOptions: if the last parameter is an object, it will be used for retrying.
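
The argument rule described above can be sketched as follows. parseMouseArgs is a hypothetical function written for this illustration only; it is not part of the module's API, and the real argument handling may differ in detail.

```javascript
// Hypothetical sketch of the rule: if the last extra argument is an object,
// treat it as retryOptions; the button defaults to 'left'.
function parseMouseArgs(selector, mouseEventType /*, button, retryOptions */) {
    var rest = Array.prototype.slice.call(arguments, 2);
    var last = rest[rest.length - 1];
    var retryOptions;
    if (last !== null && typeof last === 'object') {
        retryOptions = rest.pop();
    }
    return {
        selector: selector,
        mouseEventType: mouseEventType,
        button: rest[0] || 'left',
        retryOptions: retryOptions
    };
}
```

With this rule, parseMouseArgs('#go', 'click', {timeout: 1000}) yields the 'left' button with retry options, while parseMouseArgs('#go', 'click', 'right') yields the 'right' button and no retrying.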

clickSelector(selector [, button='left'] [, retryOptions])

clickSelector is sugar for sendMouseEventToSelector with the 'click' mouse event type.

submitForm(formSelector, data [, retryOptions])

data is an object with keys matching name attributes of input elements.

If retryOptions are passed, the methods check for the existence of the selector multiple times, which can be useful if the page is updated dynamically.

retryOptions support the interval, backoff, max_interval, timeout and max_tries properties. See the bluebird-retry library, which is used for retrying, for details on those options.

If retryOptions are not passed, the check is performed only once, and the returned promise is rejected if the selector matches no elements on the page at that moment.
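
A minimal sketch of passing retryOptions to the convenience methods is shown below. It assumes page was obtained from ph.createPage(); the logIn function, the '#login' and '#logout' selectors, and the retry values are made up for illustration, and interval is assumed to be in milliseconds as in bluebird-retry.

```javascript
// Assumes `page` came from ph.createPage(). Selectors and retry values are
// illustrative only.
function logIn(page, credentials) {
    // Re-check every 250 ms, giving up after 20 attempts.
    var retryOptions = {interval: 250, max_tries: 20};

    // Keys of `credentials` must match the name attributes of the inputs.
    return page.submitForm('#login', credentials, retryOptions)
        .then(function () {
            // Resolves once '#logout' exists; rejects if it never appears.
            return page.checkSelector('#logout', retryOptions);
        });
}
```

It could be called as logIn(page, {user: 'alice', password: 'secret'}) inside a promise chain like the one in the Usage section.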

WebPage Callbacks

All of the WebPage callbacks have been implemented, including onCallback, and are set the same way as with the core PhantomJS library:

page.onResourceReceived = function(response) {
    console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};

This includes the onPageCreated callback which receives a new page object.
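
As a sketch of how several callbacks might be combined, the function below attaches simple logging handlers to a page and to any page it creates. attachLogging and the log format are assumptions made for illustration; onConsoleMessage, onResourceReceived and onPageCreated are callback names from the PhantomJS WebPage API.

```javascript
// Assumes `page` is a page object from ph.createPage(); `log` is any
// function that accepts a string (console.log by default).
function attachLogging(page, log) {
    log = log || console.log;

    page.onConsoleMessage = function (msg) {
        log('console: ' + msg);
    };

    page.onResourceReceived = function (response) {
        // Each resource is reported in stages; log only the final one.
        if (response.stage === 'end') {
            log('received #' + response.id + ' ' + response.url);
        }
    };

    page.onPageCreated = function (newPage) {
        // Pages opened by this page get the same handlers.
        attachLogging(newPage, log);
    };

    return page;
}
```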

Properties

Properties on the WebPage and Phantom objects are accessed via the get()/set() method calls:

page.get('content')
.then(function (html) {
    console.log("Page HTML is: " + html);
});

page.set('zoomFactor', 0.25)
.then(function () {
    return page.render('capture.png');
});
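
Because get() and set() return promises, they can be chained with other page methods. The sketch below sets the PhantomJS viewportSize property, reads it back, and then renders. renderAtSize and its arguments are hypothetical names used for illustration, and the sketch assumes that set() accepts an object value.

```javascript
// Assumes `page` came from ph.createPage(). The function name and its
// arguments are illustrative only.
function renderAtSize(page, width, height, file) {
    return page.set('viewportSize', {width: width, height: height})
        .then(function () {
            // Read the property back to confirm what PhantomJS is using.
            return page.get('viewportSize');
        })
        .then(function (size) {
            console.log('Rendering at ' + size.width + 'x' + size.height);
            return page.render(file);
        });
}
```

For example, renderAtSize(page, 1024, 768, 'capture.png') could replace the render step in a longer promise chain.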

License - MIT

Copyright (c) 2013 Matt Sergeant & Evgeny Poberezkin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.