npm:@mojojs/dom

README

A fast and minimalistic HTML/XML DOM parser with CSS selectors. Written in TypeScript.

import DOM from '@mojojs/dom';

// Parse
const dom = new DOM('<div><p id="a">Test</p><p id="b">123</p></div>');

// Find
console.log(dom.at('#b').text());
console.log(dom.find('p').map(el => el.text()).join('\n'));
console.log(dom.find('[id]').map(el => el.attr.id).join('\n'));

// Modify
dom.at('div p').append('<p id="c">456</p>');
dom.find(':not(p)').forEach(el => el.strip());

// Render
console.log(dom.toString());

Formats

There are currently three input formats supported. For HTML documents and fragments we use parse5, and for XML a very relaxed custom parser that will try to make sense of whatever tag soup you hand it.

// HTML document ("<head>", "<body>"... get added automatically)
const dom = new DOM('<p>Hello World!</p>');

// HTML fragment
const dom = new DOM('<p>Hello World!</p>', {fragment: true});

// XML
const dom = new DOM('<rss><link>http://example.com</link></rss>', {xml: true});

Nodes and Elements

When we parse an HTML/XML document or fragment, it gets turned into a tree of nodes.

<!DOCTYPE html>
<html>
  <head><title>Hello</title></head>
  <body>World!</body>
</html>

There are currently eight different kinds of nodes, #cdata, #comment, #doctype, #document, #element, #fragment,#pi, and #text.

#document
|- #doctype (html)
+- #element (html)
   |- #element (head)
   |  +- #element (title)
   |     +- #text (Hello)
   +- #element (body)
      +- #text (World!)

While nodes such as #document and #fragment can be represented by DOM objects, features like dom.attr and dom.tag will not work for them.

CSS Selectors

All CSS selectors that make sense for a standalone parser are supported.

Pattern	Represents
`*`	any element
`E`	an element of type E
`E:not(s1, s2, …)`	an E element that does not match either compound selector s1 or compound selector s2
`E:is(s1, s2, …)`	an E element that matches compound selector s1 and/or compound selector s2
`E.warning`	an E element belonging to the class warning
`E#myid`	an E element with ID equal to myid
`E[foo]`	an E element with a foo attribute
`E[foo="bar"]`	an E element whose foo attribute value is exactly equal to bar
`E[foo="bar" i]`	an E element whose foo attribute value is exactly equal to any (ASCII-range) case-permutation of bar
`E[foo="bar" s]`	an E element whose foo attribute value is exactly and case-sensitively equal to bar
`E[foo~="bar"]`	an E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar
`E[foo^="bar"]`	an E element whose foo attribute value begins exactly with the string bar
`E[foo$="bar"]`	an E element whose foo attribute value ends exactly with the string bar
`E[foo*="bar"]`	an E element whose foo attribute value contains the substring bar
`E:any-link`	an E element being the source anchor of a hyperlink
`E:link`	an E element being the source anchor of a hyperlink of which the target is not yet visited
`E:visited`	an E element being the source anchor of a hyperlink of which the target is already visited
`E:checked`	a user interface element E that is checked/selected (for instance a radio-button or checkbox)
`E:root`	an E element, root of the document
`E:empty`	an E element that has no children (neither elements nor text) except perhaps white space
`E:nth-child(n [of S]?)`	an E element, the n-th child of its parent matching S
`E:nth-last-child(n [of S]?)`	an E element, the n-th child of its parent matching S, counting from the last one
`E:first-child`	an E element, first child of its parent
`E:last-child`	an E element, last child of its parent
`E:only-child`	an E element, only child of its parent
`E:nth-of-type(n)`	an E element, the n-th sibling of its type
`E:nth-last-of-type(n)`	an E element, the n-th sibling of its type, counting from the last one
`E:first-of-type`	an E element, first sibling of its type
`E:last-of-type`	an E element, last sibling of its type
`E:only-of-type`	an E element, only sibling of its type
`E:text(string)`	an E element containing text content that substring matches the given string case-insensitively
`E:text(/pattern/i)`	an E element containing text content that regex matches the given pattern
`E F`	an F element descendant of an E element
`E > F`	an F element child of an E element
`E + F`	an F element immediately preceded by an E element
`E ~ F`	an F element preceded by an E element

All supported CSS4 selectors are considered experimental and might change as the spec evolves.

API

Everything you need to extract information from HTML/XML documents and make changes to the DOM tree.

// Parse HTML
const dom = new DOM('<div class="greeting">Hello World!</div>');

// Render `DOM` object to HTML
const html = dom.toString();

// Create a new `DOM` object with one HTML tag
const div = DOM.newTag('div', {class: 'greeting'}, 'Hello World!');

Navigate the DOM tree with and without CSS selectors.

// Find one element matching the CSS selector and return it as `DOM` object
const div = dom.at('div > p');

// Find all elements marching the CSS selector and return them as `DOM` objects
const divs = dom.find('div > p');

// Get root element as `DOM` object (document or fragment node)
const root = dom.root();

// Get parent element as `DOM` object
const parent = dom.parent();

// Get all ancestor elements as `DOM` objects
const ancestors = dom.ancestors();
const ancestors = dom.ancestors('div > p');

// Get all child elements as `DOM` objects
const children = dom.children();
const children = dom.children('div > p');

// Get all sibling elements before this element as `DOM` objects
const preceding = dom.preceding();
const preceding = dom.preceding('div > p');

// Get all sibling elements after this element as `DOM` objects
const following = dom.following();
const following = dom.following('div > p');

// Get sibling element before this element as `DOM` objects
const previous = dom.previous();

// Get sibling element after this element as `DOM` objects
const next = dom.next();

Extract information and manipulate elements.

// Check if element matches the given CSS selector
const isDiv = dom.matches('div > p');

// Extract text content from element
const greeting = dom.text();
const greeting = dom.text({recursive: true});

// Get element tag
const tag = dom.tag;

// Set element tag
dom.tag = 'div';

// Get element attribute value
const class = dom.attr.class;

// Set element attribute value
dom.attr.class = 'whatever';

// Remove element attribute
delete dom.attr.class;

// Get element attribute names
const names = Object.keys(dom.attr);

// Get element's rendered content
const content = dom.content();

// Get form value
const formValue = dom.at('input').val();
const formValue = dom.at('option').val();
const formValue = dom.at('select').val();
const formValue = dom.at('textarea').val();
const formValue = dom.at('button').val();

// Find this element's namespace
const namespace = dom.namespace();

// Get a unique CSS selector for this element
const selector = dom.selector();

// Remove element and its children
dom.remove();

// Remove element but preserve its children
dom.strip();

// Replace element and its children
dom.replace('<p>Hello World!</p>');

// Append HTML/XML fragment after this element
dom.append('<p>Hello World!</p>');

// Append HTML/XML fragment to this element's content
dom.appendContent('<p>Hello World!</p>');

// Prepend HTML/XML fragment before this element
dom.prepend('<p>Hello World!</p>');

// Prepend HTML/XML fragment to this element's content
dom.prependContent('<p>Hello World!</p>');

// Wrap HTML/XML fragment around this element
dom.wrap('<div></div>');

// Wrap HTML/XML fragment around the content of this element
dom.wrapContent('<div></div>');

There is also a node level API that you can for example use to extend the DOM class. It is however still in flux, and therefore not fully documented yet.

// Remove comment nodes that are children of this element
dom.currentNode.childNodes
  .filter(node => node.nodeType === '#comment')
  .forEach(node => node.detach());

// Extract text surrounding this element
const text = dom.currentNode.parentNode.childNodes
  .filter(node => node.nodeType === '#text')
  .map(node => node.value)
  .join('');

Installation

All you need is Node.js 16.0.0 (or newer).

$ npm install @mojojs/dom

Usage no npm install needed!

README

Formats

Nodes and Elements

CSS Selectors

API

Installation