scrappy

Extract rich metadata from URLs

Usage (no npm install needed):

<script type="module">
  import scrappy from 'https://cdn.skypack.dev/scrappy';
</script>

Try it using Runkit!

Installation

npm install scrappy --save

Usage

Scrappy attempts to parse and extract rich structured metadata from URLs.

import { scraper, urlScraper } from "scrappy";
import * as plugins from "scrappy/dist/plugins";

Scraper

Accepts a request function and a list of plugins to use. The request function is expected to return a "page" object, which has the same shape as the input to scrape(page).

const scrape = scraper({
  request,
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});

const res = await fetch("http://example.com"); // E.g. `fetch` from `popsicle`.

await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Stream the response body instead of buffering it, to support large responses.
});
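
The response methods used above (asObject(), stream()) are not part of the standard fetch Response. As a rough sketch only, here is one way a request function could build the same page shape from the built-in fetch in Node.js 18+. The names toPage and request are hypothetical, and the stream type scrappy expects for body is an assumption that should be checked against the library.

```javascript
// Hypothetical helper: convert a standard fetch `Response` into the "page"
// shape used in the example above (url, status, headers, body).
function toPage(url, res) {
  return {
    // Prefer the final URL after redirects; fall back to the requested one.
    url: res.url || url,
    status: res.status,
    // `Headers` iterates as [name, value] pairs with lower-cased names.
    headers: Object.fromEntries(res.headers),
    // Pass the body through as a stream rather than buffering it.
    // Assumption: if a Node.js `Readable` is required, wrap this with
    // `Readable.fromWeb(res.body)` from `node:stream`.
    body: res.body,
  };
}

// Hypothetical `request` function, intended to be passed as
// `scraper({ request, plugins })`.
async function request(url) {
  return toPage(url, await fetch(url));
}
```

Keeping the conversion in a separate synchronous helper makes it easy to check the page shape without touching the network.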

URL Scraper

A simpler wrapper around scraper that automatically calls request(url) to fetch the page.

const scrape = urlScraper({ request });

await scrape("http://example.com");
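
To illustrate the wrapper idea, the sketch below composes a request function with a page-level scrape function. It is not scrappy's actual implementation: makeUrlScraper, fakeRequest, and fakeScrapePage are hypothetical names, and the stand-ins exist only so the sketch runs without the library or network access.

```javascript
// Hypothetical re-creation of the wrapper idea; not scrappy's source code.
function makeUrlScraper(request, scrapePage) {
  return async function scrapeUrl(url) {
    const page = await request(url); // Fetch the page object first.
    return scrapePage(page); // Then hand it to the page-level scraper.
  };
}

// Stand-ins (assumptions) for a real request function and a real scraper.
const fakeRequest = async (url) => ({ url, status: 200, headers: {}, body: null });
const fakeScrapePage = async (page) => ({ url: page.url, status: page.status });

const scrapeUrl = makeUrlScraper(fakeRequest, fakeScrapePage);
```

With this composition the caller passes only a URL, and the page object never has to be built by hand.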

License

Apache 2.0