rehype-infer-description-meta

rehype plugin to infer the description of a document.

What is this?

This package is a unified (rehype) plugin to infer the description of a document. It supports different methods: a specific element, everything up to a comment, or up to a certain number of characters.

unified is a project that transforms content with abstract syntax trees (ASTs). rehype adds support for HTML to unified. vfile is the virtual file interface used in unified. hast is the HTML AST that rehype uses. This is a rehype plugin that inspects hast and adds metadata to vfiles.

When should I use this?

This plugin is particularly useful in combination with rehype-meta. When both are used together, a <meta name=description> is populated with the document’s description.

Install

This package is ESM only. In Node.js (version 12.20+, 14.14+, or 16.0+), install with npm:

npm install rehype-infer-description-meta

In Deno with Skypack:

import rehypeInferDescriptionMeta from 'https://cdn.skypack.dev/rehype-infer-description-meta@1?dts'

In browsers with Skypack:

<script type="module">
  import rehypeInferDescriptionMeta from 'https://cdn.skypack.dev/rehype-infer-description-meta@1?min'
</script>

Use

Say our module example.js looks as follows:

import {unified} from 'unified'
import rehypeParse from 'rehype-parse'
import rehypeInferDescriptionMeta from 'rehype-infer-description-meta'
import rehypeDocument from 'rehype-document'
import rehypeMeta from 'rehype-meta'
import rehypeFormat from 'rehype-format'
import rehypeStringify from 'rehype-stringify'

const examples = [
  // Example where the description is in a certain element.
  `<h1>Hello, world!</h1>
  <p class="byline">Lorem ipsum</p>
  <p>Dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>`,
  // Example where the description runs from the start to a comment.
  `<h1>Hello, world!</h1>
  <p>Lorem ipsum<!--more--> dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>`,
  // Example where the description runs from the start to a certain number of characters.
  `<h1>Hello, world!</h1>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>`
]

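run()

// Process the examples one after another so the output order is predictable.
async function run() {
  for (const example of examples) {
    await main(example)
  }
}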

async function main(example) {
  const file = await unified()
    .use(rehypeParse, {fragment: true})
    .use(rehypeInferDescriptionMeta, {selector: '.byline'})
    .use(rehypeDocument)
    .use(rehypeMeta)
    .use(rehypeFormat)
    .use(rehypeStringify)
    .process(example)

  console.log(String(file))
}

Now running node example.js yields:

👉 Note: observe each meta[name="description"] being derived from body.

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Lorem ipsum">
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p class="byline">Lorem ipsum</p>
    <p>Dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
  </body>
</html>

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Lorem ipsum">
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>Lorem ipsum<!--more--> dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
  </body>
</html>

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad…">
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
  </body>
</html>

API

This package exports no identifiers. The default export is rehypeInferDescriptionMeta.

unified().use(rehypeInferDescriptionMeta, options?)

Infer the description from a document as file metadata. The result is stored on file.data.meta.description. It’s inferred through three strategies:

  1. If options.selector is set and an element matching it is found, then the description is the text of that element
  2. Otherwise, if a comment is found with the text of options.comment, then the description is the text up to that comment
  3. Otherwise, the description is the text up to options.truncateSize
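
The order of these strategies can be pictured as a simple fallback chain. The sketch below is not the plugin's source: pickDescription and its input fields are hypothetical names, and it assumes the text for each strategy has already been extracted.

```javascript
// Illustration only: hypothetical helper, not part of rehype-infer-description-meta.
// Each field holds the text one strategy would produce, or undefined when that
// strategy did not apply (no selector match, no excerpt comment).
function pickDescription({fromSelector, beforeComment, truncated}) {
  if (fromSelector !== undefined) return fromSelector
  if (beforeComment !== undefined) return beforeComment
  return truncated
}

// A selector match wins over everything else.
console.log(pickDescription({fromSelector: 'Lorem ipsum', truncated: 'Lorem ipsum dolor…'}))
// Without a selector match, the text before the comment is used.
console.log(pickDescription({beforeComment: 'Lorem ipsum', truncated: 'Lorem ipsum dolor…'}))
// With neither, the truncated text is the fallback.
console.log(pickDescription({truncated: 'Lorem ipsum dolor…'}))
```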

options

Configuration (optional).

options.selector

CSS selector to the description (string, optional, example: '.byline'). One of the strategies is to look for a certain element, useful if the description is nicely encoded in one element.

options.comment

String to look for in a comment (string, default: 'more'). One of the strategies is to look for this comment; everything before it is the description.

options.truncateSize

Number of characters to truncate to (number, default: 140). One of the strategies is to truncate the document to a certain number of characters.
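
To get a feel for what truncating to 140 characters means for the third example in Use, here is a plain-string sketch. It assumes the cut backs off to the previous space when it would land inside a word and that an ellipsis is appended. That is consistent with the output shown in Use, but it is not taken from the plugin's source, and truncateAtWord is a hypothetical name.

```javascript
// Illustration only: works on a plain string, whereas the plugin works on hast.
function truncateAtWord(text, size = 140, ellipsis = '…') {
  if (text.length <= size) return text
  let end = size
  // If the cut lands inside a word, back off to the preceding space.
  if (/\S/.test(text.charAt(end)) && /\S/.test(text.charAt(end - 1))) {
    const lastSpace = text.lastIndexOf(' ', end)
    if (lastSpace > 0) end = lastSpace
  }
  return text.slice(0, end).trimEnd() + ellipsis
}

const text =
  'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod ' +
  'tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, ' +
  'quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.'

console.log(truncateAtWord(text))
```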

options.mainSelector

CSS selector to body of content (string, optional, example: 'main'). Useful to exclude other things, such as the head, ads, styles, scripts, and other random stuff, by focussing all strategies in one element.

options.ignoreSelector

CSS selector of nodes to ignore (string, default: 'h1, script, style, noscript, template'). Used when looking for an excerpt comment or truncating the document.
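
As a sketch of how the two selectors might be combined, the options object below restricts all strategies to a main element and adds entries to the ignore list. The nav and aside additions are illustrative assumptions, not defaults. The default entries are repeated on the assumption that a custom value replaces the default rather than extending it.

```javascript
// Options one might pass as .use(rehypeInferDescriptionMeta, options).
const options = {
  // Only look inside <main> (the example value from the option description).
  mainSelector: 'main',
  // The documented default, plus `nav` and `aside` as hypothetical extras.
  ignoreSelector: 'h1, script, style, noscript, template, nav, aside'
}

console.log(options)
```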

options.maxExcerptSearchSize

How far to search for the excerpt comment before bailing (number, default: 2048). The goal of explicit excerpts is that they are assumed to be somewhat reasonably placed. This option prevents searching giant documents for some comment that probably won’t be found at the end.
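
The idea of a bounded search can be sketched on a flat string. This is a toy model, not the plugin's algorithm: the plugin inspects a hast tree, findExcerpt is a hypothetical name, and the sketch assumes the limit counts characters from the start of the content.

```javascript
// Illustration only: look for an excerpt comment, but give up past a limit.
function findExcerpt(html, comment = 'more', maxSearchSize = 2048) {
  const marker = `<!--${comment}-->`
  const index = html.indexOf(marker)
  // Bail when the marker is missing or sits beyond the search limit.
  if (index === -1 || index > maxSearchSize) return undefined
  return html.slice(0, index)
}

console.log(findExcerpt('Lorem ipsum<!--more--> dolor sit amet'))
console.log(findExcerpt('x'.repeat(3000) + '<!--more-->'))
```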

options.inferDescriptionHast

Whether to also expose file.data.meta.descriptionHast (boolean, default: false). This is not used by rehype-meta, but could be useful to other plugins. The value at descriptionHast contains the rich HTML elements rather than the plain text content.
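
Other plugins or scripts can read both values from the file after processing. A minimal sketch, using a plain object as a stand-in for a processed vfile; readDescription is a hypothetical helper, not part of this package.

```javascript
// Illustration only: read what the plugin stores on file.data.meta.
function readDescription(file) {
  const meta = (file.data && file.data.meta) || {}
  return {
    // Plain text description.
    description: meta.description,
    // Rich content; only present when inferDescriptionHast is true.
    descriptionHast: meta.descriptionHast
  }
}

// Stand-in for a file processed without inferDescriptionHast.
const file = {data: {meta: {description: 'Lorem ipsum'}}}

console.log(readDescription(file).description)
```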

Types

This package is fully typed with TypeScript. The extra type Options is exported.
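
In a JavaScript project, the exported Options type can be referenced through JSDoc. The sketch below assumes type checking is enabled in your editor or build (for example with checkJs or a // @ts-check comment); the option values are the example and default values listed above.

```javascript
/**
 * Checked against the package's exported `Options` type when JSDoc type
 * checking is enabled; at runtime this is a plain object.
 *
 * @type {import('rehype-infer-description-meta').Options}
 */
const options = {selector: '.byline', comment: 'more', truncateSize: 140}

console.log(options)
```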

Compatibility

Projects maintained by the unified collective are compatible with all maintained versions of Node.js. As of now, that is Node.js 12.20+, 14.14+, and 16.0+. Our projects sometimes work with older versions, but this is not guaranteed.

This plugin works with rehype-parse version 3+, rehype-stringify version 3+, rehype version 4+, and unified version 6+.

Security

Use of rehype-infer-description-meta is safe.

Related

Contribute

See contributing.md in rehypejs/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer