vacuumjs

A low-level node.js web page content extractor based on parse5.

Usage (no npm install needed)

<script type="module">
  import vacuumjs from 'https://cdn.skypack.dev/vacuumjs';
</script>

Usage

var parse5 = require('parse5')
var extract = require('vacuumjs')

var targetDOM = parse5.parse('some page content')
// the reference DOM is required, not optional
var refDOM = parse5.parse('reference page content')
console.log(extract(targetDOM, refDOM))

Principles

  • Layout similarity
  • Text density
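
To make the text density idea concrete, here is a minimal sketch. It is not vacuumjs's actual implementation: `textDensity` is a hypothetical helper, and "characters of text per element" is just one plausible definition of density. The nodes are hand-built in the shape produced by parse5's default tree adapter (`nodeName`, `tagName`, `childNodes`, and `value` on text nodes), so the sketch runs without installing anything.

```javascript
// Hypothetical helper, not part of vacuumjs: characters of visible text
// per element in a parse5-style subtree.
function textDensity(node) {
  var textLength = 0
  var elementCount = 0

  function walk(n) {
    if (n.nodeName === '#text') {
      // Text nodes carry their content in `value`
      textLength += n.value.trim().length
    } else if (n.tagName) {
      // Only element nodes have a `tagName`
      elementCount += 1
    }
    var children = n.childNodes || []
    for (var i = 0; i < children.length; i++) walk(children[i])
  }

  walk(node)
  // Guard against dividing by zero for subtrees with no elements
  return textLength / Math.max(elementCount, 1)
}

// Assumed sample data: an article-like block and a navigation-like block
var article = {
  nodeName: 'div', tagName: 'div', childNodes: [
    { nodeName: 'p', tagName: 'p', childNodes: [
      { nodeName: '#text', value: 'A long paragraph of article text.' }
    ] }
  ]
}
var nav = {
  nodeName: 'ul', tagName: 'ul', childNodes: [
    { nodeName: 'li', tagName: 'li', childNodes: [
      { nodeName: '#text', value: 'Home' }
    ] },
    { nodeName: 'li', tagName: 'li', childNodes: [
      { nodeName: '#text', value: 'About' }
    ] }
  ]
}

console.log(textDensity(article) > textDensity(nav))
```

Under this definition, content blocks with long runs of text score higher than link-heavy blocks such as menus, which is the intuition behind using density as a signal for the main content.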