raspador

A simple and powerful library for scraping metadata

Raspador

Raspador - Metadata scraping made easy!

A simple and powerful library for scraping metadata. It is an easy-to-use scraper with little overhead and no complex logic: create your selectors and let raspador handle the rest.

Written in TypeScript, so you don't have to worry about finding and installing types separately.

Usage

  1. Import the required items from the package
import raspador, { ld$, Root, Selectors } from 'raspador';
  2. Create the scraper by passing the HTML string to the raspador function
const scraper = raspador(html);
  3. Set up the rules
const selectors: Selectors = ($: Root) => ({
  title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
  author: [ld$($, 'creator[0]')],
});
  4. Get the metadata by passing the selectors to the scraper
const result = scraper(selectors);

Selectors

Raspador uses Cheerio under the hood, so any Cheerio-compatible selector works. Here are some examples:

$('meta[property="og:image:url"]').attr('content');
$('html').attr('lang');
$('meta[property="og:logo"]').attr('content');

Raspador also exposes a helper, ld$, for selecting keys from the Linked Data (JSON-LD) embedded in a page, for example:

<script type="application/ld+json">
  {
    "@context": "https://schema.org/",
    "@type": "Recipe",
    "name": "Party Coffee Cake",
    "author": {
      "@type": "Person",
      "name": "Maicy Williams"
    },
    "datePublished": "2018-03-10",
    "description": "This coffee cake is awesome and perfect for parties.",
    "prepTime": "PT20M"
  }
</script>

For selecting the author name from the Linked/Structured Data:

ld$($, 'author.name'); // --> 'Maicy Williams'
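
To make the path syntax more concrete, here is a minimal sketch of how a path such as 'author.name' or 'creator[0]' could be resolved against parsed JSON-LD. The getByPath helper and the creator field are hypothetical and added for illustration only; this is not raspador's actual implementation of ld$.

```typescript
// Hypothetical helper, NOT raspador's implementation: resolves paths such as
// 'author.name' or 'creator[0]' against an already-parsed JSON-LD object.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function getByPath(data: Json, path: string): Json | undefined {
  // Turn 'creator[0].name' into ['creator', '0', 'name'].
  const keys = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  let current: Json | undefined = data;
  for (const key of keys) {
    // Stop early when there is nothing left to descend into.
    if (current === null || typeof current !== 'object') return undefined;
    current = Array.isArray(current)
      ? current[Number(key)]
      : (current as { [key: string]: Json })[key];
  }
  return current;
}

// The recipe data from above, trimmed; 'creator' is added for illustration.
const recipe: Json = {
  '@context': 'https://schema.org/',
  '@type': 'Recipe',
  name: 'Party Coffee Cake',
  author: { '@type': 'Person', name: 'Maicy Williams' },
  creator: ['Maicy Williams'],
};

console.log(getByPath(recipe, 'author.name')); // Maicy Williams
console.log(getByPath(recipe, 'creator[0]')); // Maicy Williams
console.log(getByPath(recipe, 'publisher.name')); // undefined
```

A missing key yields undefined instead of throwing, which keeps a lookup like this safe to use when a page omits part of its structured data.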

Selectors are defined as a function that receives $ and returns an object, where each key is an identifier of your choice and each value is an array of selectors.

const selectors = ($: Root) => ({
  title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
  author: [ld$($, 'creator[0]')],
});
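
The array form suggests a list of fallbacks, for example the og:title meta tag first and the title tag second. The sketch below illustrates that idea under the assumption that candidates are tried in order and the first non-empty string wins. The README does not confirm this behavior, and pickFirst is a hypothetical name, not part of raspador.

```typescript
// Hypothetical resolver, NOT raspador's implementation. Assumption: candidates
// are tried in order and the first non-empty string wins.
type Candidate = string | undefined | null;
type Rules = Record<string, Candidate[]>;

function pickFirst(rules: Rules): Record<string, string | undefined> {
  const result: Record<string, string | undefined> = {};
  for (const key of Object.keys(rules)) {
    // find() returns undefined when no candidate passes the check.
    result[key] = rules[key].find(
      (value): value is string =>
        typeof value === 'string' && value.trim() !== ''
    );
  }
  return result;
}

// Values as they might come back from the selectors: a missing attribute
// gives undefined, an empty tag gives an empty string.
const resolved = pickFirst({
  title: [undefined, 'Fallback from the title tag'],
  author: ['', null],
});

console.log(resolved.title); // Fallback from the title tag
console.log(resolved.author); // undefined
```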

Full Example

import fetch from 'node-fetch';
import raspador, { ld$, Root, Selectors } from 'raspador';

(async () => {
  const html = await fetch(
    'https://blog.sreyaj.dev/implementing-feature-flags-in-angular'
  ).then((res) => res.text());
  // Initialize raspador by passing in the html
  const scraper = raspador(html);

  // Set up the selectors
  const selectors: Selectors = ($: Root) => ({
    title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
    author: [ld$($, 'creator[0]')],
  });
  
  // Pass the selectors to get the result
  const result = scraper(selectors);
  console.log({ result });
})();

Local Development

  1. Clone or download the repo
  2. Install dependencies
npm install
  3. Start the dev server
npm run dev

🤝 Contributing

Contributions, issues and feature requests are welcome.
Feel free to check the issues page if you want to contribute.

Author

👤 Adithya Sreyaj

👍🏼 Show your support

Please ⭐️ this repository if this project helped you!

Inspiration and Idea

Show your support for MetaScraper, the inspiration for this project.