qscraper

Web scraping library using promises

Usage (no npm install needed):

<script type="module">
  import qscraper from 'https://cdn.skypack.dev/qscraper';
</script>

qScraper - Promises-based Web scraping library

Simple, basic web scraping with support for cookies and jQuery functionality. It returns promises that follow the Promises/A+ specification instead of taking callbacks.

The library automatically handles gzip streams and follows 303 redirects, neither of which mikeal's request supports automatically.
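To make these two behaviours concrete, here is a minimal sketch that uses only Node's built-in modules. It is not qScraper's actual implementation; `nextRequest` and `decodeBody` are hypothetical helper names, and the request and response objects are simplified stand-ins. The sketch assumes the usual HTTP semantics: a 303 See Other is followed with a GET to the URL in the Location header, and a body is inflated when Content-Encoding is gzip.

```javascript
const zlib = require('zlib');

// Hypothetical helper: given the request that was sent and the response
// that came back, return the follow-up request for a 303, or null.
function nextRequest(request, response) {
  if (response.statusCode !== 303 || !response.headers.location) {
    return null;
  }
  // Location may be relative, so resolve it against the original URI.
  const uri = new URL(response.headers.location, request.uri).toString();
  // A 303 is normally followed with a GET, whatever the original method was.
  return { method: 'GET', uri: uri };
}

// Hypothetical helper: inflate the body when the server marked it as gzip.
function decodeBody(headers, rawBody) {
  const encoding = (headers['content-encoding'] || '').toLowerCase();
  const bytes = encoding === 'gzip' ? zlib.gunzipSync(rawBody) : rawBody;
  return bytes.toString('utf8');
}
```

With a client that handles both cases, a POST to a login form that answers with a 303 ends up resolving with the decoded HTML of the page it redirects to.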

Usage

qScraper works along the lines of "sessions".

var scraper = require('qscraper');
var session = scraper.session(); // creates a new "session" (stateless, no cookies)
var cookieSession = scraper.cookieSession(); // creates a session with its own cookie jar
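The difference between the two session types comes down to whether cookies set by one response are sent back on later requests. The sketch below illustrates that idea with a deliberately simplified jar. `TinyCookieJar` is a hypothetical class, not qScraper's cookie jar, and it ignores domain, path, and expiry rules.

```javascript
// Hypothetical, simplified cookie jar: ignores domain, path, and expiry.
class TinyCookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Record the name=value part of each Set-Cookie response header.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq > 0) {
        this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
      }
    }
  }

  // Build the Cookie request header to send with the next request.
  header() {
    return Array.from(this.cookies, ([name, value]) => name + '=' + value).join('; ');
  }
}

var jar = new TinyCookieJar();
jar.store(['sid=abc123; Path=/; HttpOnly', 'theme=dark']);
console.log(jar.header()); // prints "sid=abc123; theme=dark"
```

A stateless session would correspond to never storing anything, so every request goes out without a Cookie header.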

API

params is a plain key-value object, e.g. { "name": "myname", "password": "secret" }
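As an illustration of what such a map typically becomes on a GET request, the sketch below appends each pair to the URI's query string. `withQuery` is a hypothetical helper written for this example; how qScraper itself encodes params (especially for POST) is not specified here, so treat this as an assumption.

```javascript
// Hypothetical helper, not part of qScraper: append params to the query string.
function withQuery(uri, params) {
  const url = new URL(uri);
  for (const [key, value] of Object.entries(params || {})) {
    url.searchParams.append(key, String(value));
  }
  return url.toString();
}

console.log(withQuery('http://example.com/search', { name: 'myname', password: 'secret' }));
// prints "http://example.com/search?name=myname&password=secret"
```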

Returns raw body:

session.get(uri, params)
session.post(uri, params)

e.g.

session.get('http://google.com')
       .then(function(body) {
            console.log(body); // prints out raw HTML
        });

Returns jQuery $:

session.get$(uri, params)
session.post$(uri, params)

e.g.

session.get$('http://google.com')
       .then(function($) {
            console.log($('title').text()); // prints out "Google"
        });

Returns parsed JSON response:

session.getJson(uri, params)
session.postJson(uri, params)

e.g.

var params = {
    'address': '1600 Amphitheatre Parkway, Mountain View, CA',
    'sensor': 'false'
};

session.getJson('http://maps.googleapis.com/maps/api/geocode/json', params)
       .then(function(json) {
            console.log(json); // prints out JSON response
        });

Downloads a file to the given filename. There are currently no custom options.

  • If the filename is not specified, qscraper attempts to derive the filename from the uri.
  • If the filename is a directory that exists, qscraper will derive the filename from the uri and download to that folder.

session.download(uri, filename)

e.g.

session.download('https://ajax.googleapis.com/ajax/libs/swfobject/2.2/swfobject.js')
       .then(function(filename) {
            console.log(filename); // prints out 'swfobject.js', which is downloaded.
        });
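The two filename rules described above can be sketched as follows. `resolveTarget` is a hypothetical helper written for illustration, not a function exported by qScraper, and the library's real derivation logic may differ in edge cases such as URIs that end in a slash.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: decide where a download would be written.
function resolveTarget(uri, filename) {
  // Last path segment of the URI, ignoring any query string or fragment.
  const derived = path.posix.basename(new URL(uri).pathname);
  if (!filename) {
    return derived; // rule 1: no filename given, derive it from the uri
  }
  if (fs.existsSync(filename) && fs.statSync(filename).isDirectory()) {
    return path.join(filename, derived); // rule 2: existing directory
  }
  return filename; // otherwise use the filename as given
}

console.log(resolveTarget('https://ajax.googleapis.com/ajax/libs/swfobject/2.2/swfobject.js'));
// prints "swfobject.js"
```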

Scraping sometimes requires custom headers, e.g. setting the Referer header.

session.addHeader(key, value)

e.g.

session.addHeader('Referer', 'http://www.google.com')
       .then(function() {
            return session.get('http://myhttp.info');
        });

TODO

  • Add a test for the 303 Redirect functionality

Credits