hast-util-to-nlcst

hast (HTML syntax tree) utility to transform to nlcst (natural language syntax tree).

Note: You probably want to use rehype-retext.

Install

This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.

npm:

npm install hast-util-to-nlcst

Use

Say we have the following example.html:

<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>

…and next to it, index.js:

import {readSync} from 'to-vfile'
import {inspect} from 'unist-util-inspect'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import rehype from 'rehype'

// Read the HTML file into a vfile and parse it into a hast tree (with positions).
const file = readSync('example.html')
const tree = rehype().parse(file)

// Convert the hast tree to nlcst with the English parser and print it.
console.log(inspect(toNlcst(tree, file, ParseEnglish)))

Which, when running, yields:

RootNode[2] (1:1-6:1, 0-134)
├─ ParagraphNode[3] (1:10-3:3, 9-24)
│  ├─ WhiteSpaceNode: "\n  " (1:10-2:3, 9-12)
│  ├─ SentenceNode[2] (2:3-2:12, 12-21)
│  │  ├─ WordNode[1] (2:3-2:11, 12-20)
│  │  │  └─ TextNode: "Implicit" (2:3-2:11, 12-20)
│  │  └─ PunctuationNode: "." (2:11-2:12, 20-21)
│  └─ WhiteSpaceNode: "\n  " (2:12-3:3, 21-24)
└─ ParagraphNode[1] (3:7-3:43, 28-64)
   └─ SentenceNode[4] (3:7-3:43, 28-64)
      ├─ WordNode[1] (3:7-3:15, 28-36)
      │  └─ TextNode: "Explicit" (3:7-3:15, 28-36)
      ├─ PunctuationNode: ":" (3:15-3:16, 36-37)
      ├─ WhiteSpaceNode: " " (3:16-3:17, 37-38)
      └─ WordNode[4] (3:25-3:43, 46-64)
         ├─ TextNode: "foo" (3:25-3:28, 46-49)
         ├─ TextNode: "s" (3:37-3:38, 58-59)
         ├─ PunctuationNode: "-" (3:38-3:39, 59-60)
         └─ TextNode: "ball" (3:39-3:43, 60-64)
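
The output shows that a word can span element boundaries: foo comes from inside <strong> and s-ball from after it, yet both end up in one WordNode. The sketch below makes that concrete by reading words back out of a hand-copied subset of the output (positions omitted). wordsOf and leafText are hypothetical helpers written for this illustration, not part of hast-util-to-nlcst; the separate nlcst-to-string package provides a ready-made serializer.

```javascript
// Hypothetical helpers for illustration; not part of hast-util-to-nlcst.

// Join the `value` of every literal (TextNode, PunctuationNode, ...) below `node`.
function leafText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(leafText).join('')
}

// Collect the text of each WordNode in an nlcst tree, in document order.
function wordsOf(tree) {
  const words = []
  const visit = (node) => {
    if (node.type === 'WordNode') {
      words.push(leafText(node))
    } else {
      for (const child of node.children || []) visit(child)
    }
  }
  visit(tree)
  return words
}

// Hand-copied subset of the output above: the <h1> paragraph, positions omitted.
const heading = {
  type: 'ParagraphNode',
  children: [
    {
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Explicit'}]},
        {type: 'PunctuationNode', value: ':'},
        {type: 'WhiteSpaceNode', value: ' '},
        {
          type: 'WordNode',
          children: [
            {type: 'TextNode', value: 'foo'}, // from inside <strong>
            {type: 'TextNode', value: 's'}, // from after </strong>
            {type: 'PunctuationNode', value: '-'},
            {type: 'TextNode', value: 'ball'}
          ]
        }
      ]
    }
  ]
}

console.log(JSON.stringify(wordsOf(heading))) // ["Explicit","foos-ball"]
```
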

API

This package exports the following identifiers: toNlcst. There is no default export.

toNlcst(tree, file, Parser)

Transform the given hast tree to nlcst.

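Parameters

tree (HastNode): hast tree to transform; it needs positional info, so use a tree parsed from file
file (VFile): virtual file corresponding to tree
Parser (Function): nlcst parser to use, such as ParseEnglish from parse-english, ParseLatin from parse-latin, or ParseDutch from parse-dutch

Returns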

NlcstNode.
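
Because toNlcst maps nodes back to the text of file through their positional info, the tree should be one parsed from that file (as rehype does in the example above) rather than one built by hand. The sketch below is a hypothetical pre-check, isPositioned, which is not exported by this package; it conservatively requires a line and column on the start and end of every node.

```javascript
// Hypothetical pre-check, not exported by hast-util-to-nlcst: true when every
// node in the tree has a start and end point with numeric line and column.
function isPositioned(node) {
  const position = node.position
  const valid = (point) =>
    Boolean(point) && typeof point.line === 'number' && typeof point.column === 'number'

  if (!position || !valid(position.start) || !valid(position.end)) return false
  return (node.children || []).every(isPositioned)
}

// Shaped like a parser's output: every node carries a position.
const parsed = {
  type: 'root',
  position: {start: {line: 1, column: 1, offset: 0}, end: {line: 1, column: 6, offset: 5}},
  children: [
    {
      type: 'text',
      value: 'Hello',
      position: {start: {line: 1, column: 1, offset: 0}, end: {line: 1, column: 6, offset: 5}}
    }
  ]
}

// Built by hand: no positions, so there is nothing to map back to the file.
const handBuilt = {type: 'root', children: [{type: 'text', value: 'Hello'}]}

console.log(isPositioned(parsed)) // true
console.log(isPositioned(handBuilt)) // false
```
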

Notes
Implied paragraphs

The algorithm supports implicit and explicit paragraphs, such as:

<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>

Overlapping paragraphs are also supported (see the tests or the HTML spec for more info).
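
To illustrate the idea, here is a simplified sketch of implied paragraphs as the HTML spec describes them: runs of text and phrasing elements are grouped into one paragraph, and any other element starts a new one. It is not this package's algorithm: it ignores overlapping paragraphs, PHRASING is a small hypothetical subset of the spec's phrasing content, and the hast nodes are hand-written without properties or positions.

```javascript
// Simplified sketch of HTML "implied paragraphs"; NOT the algorithm used by
// hast-util-to-nlcst. PHRASING is a small, hypothetical subset of the HTML
// spec's phrasing content; overlapping paragraphs are not handled.
const PHRASING = new Set(['a', 'b', 'code', 'em', 'i', 'span', 'strong'])

// Concatenate all text below a hast node.
function textOf(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(textOf).join('')
}

// Group runs of text and phrasing elements into paragraphs; any other
// element ends the current run and is searched for paragraphs of its own.
function paragraphs(parent, found = []) {
  let run = ''
  const flush = () => {
    const text = run.replace(/\s+/g, ' ').trim()
    if (text) found.push(text) // whitespace-only runs are not paragraphs
    run = ''
  }

  for (const child of parent.children || []) {
    if (child.type === 'text') {
      run += child.value
    } else if (child.type === 'element' && PHRASING.has(child.tagName)) {
      run += textOf(child)
    } else if (child.type === 'element') {
      flush()
      paragraphs(child, found)
    }
  }

  flush()
  return found
}

// The <article> example above, hand-written without properties or positions.
const article = {
  type: 'element',
  tagName: 'article',
  children: [
    {type: 'text', value: '\n  An implicit paragraph.\n  '},
    {type: 'element', tagName: 'h1', children: [{type: 'text', value: 'An explicit paragraph.'}]},
    {type: 'text', value: '\n'}
  ]
}

console.log(JSON.stringify(paragraphs(article)))
// ["An implicit paragraph.","An explicit paragraph."]
```
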

Ignored nodes

Some elements are ignored and their content will not be present in nlcst: <script>, <style>, <svg>, <math>, <del>.

To ignore other elements, add a data-nlcst attribute with a value of ignore:

<p>This is <span data-nlcst="ignore">hidden</span>.</p>
<p data-nlcst="ignore">Completely hidden.</p>
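
The ignore rule can be sketched over plain hast objects. In hast, the data-nlcst attribute is stored as properties.dataNlcst. isIgnored and visibleText are hypothetical helpers for illustration, not this package's implementation, and they only return text rather than building nlcst.

```javascript
// Hypothetical sketch of the ignore rule; not this package's implementation.
const IGNORED = new Set(['script', 'style', 'svg', 'math', 'del'])

// In hast, the data-nlcst attribute is stored as properties.dataNlcst.
function isIgnored(node) {
  if (node.type !== 'element') return false
  const properties = node.properties || {}
  return IGNORED.has(node.tagName) || properties.dataNlcst === 'ignore'
}

// Concatenate the text of a hast tree, skipping ignored subtrees entirely.
function visibleText(node) {
  if (isIgnored(node)) return ''
  if (node.type === 'text') return node.value
  return (node.children || []).map(visibleText).join('')
}

// <p>This is <span data-nlcst="ignore">hidden</span>.</p>
const paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'This is '},
    {
      type: 'element',
      tagName: 'span',
      properties: {dataNlcst: 'ignore'},
      children: [{type: 'text', value: 'hidden'}]
    },
    {type: 'text', value: '.'}
  ]
}

console.log(JSON.stringify(visibleText(paragraph))) // "This is ."
```
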

Source nodes

<code> elements are mapped to Source nodes in nlcst.

To mark other elements as source, add a data-nlcst attribute with a value of source:

<p>This is <span data-nlcst="source">marked as source</span>.</p>
<p data-nlcst="source">Completely marked.</p>
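
Similarly, here is a hypothetical sketch of the source rule: it walks a hast tree and labels each text run as prose or source, treating <code> elements and elements with data-nlcst="source" as source. It does not create nlcst Source nodes; isSource, textOf, and runs are illustrative names, not part of hast-util-to-nlcst.

```javascript
// Hypothetical sketch of the source rule; it labels text runs but does not
// build nlcst Source nodes, and is not this package's implementation.
function isSource(node) {
  if (node.type !== 'element') return false
  const properties = node.properties || {} // data-nlcst is properties.dataNlcst
  return node.tagName === 'code' || properties.dataNlcst === 'source'
}

// Concatenate all text below a hast node.
function textOf(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(textOf).join('')
}

// Walk the tree and collect {kind, value} runs in document order.
function runs(node, out = []) {
  if (isSource(node)) {
    out.push({kind: 'source', value: textOf(node)})
  } else if (node.type === 'text') {
    out.push({kind: 'prose', value: node.value})
  } else {
    for (const child of node.children || []) runs(child, out)
  }
  return out
}

// <p>This is <span data-nlcst="source">marked as source</span>.</p>
const paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'This is '},
    {
      type: 'element',
      tagName: 'span',
      properties: {dataNlcst: 'source'},
      children: [{type: 'text', value: 'marked as source'}]
    },
    {type: 'text', value: '.'}
  ]
}

const labels = runs(paragraph).map((r) => `${r.kind}: ${r.value}`)
console.log(JSON.stringify(labels))
// ["prose: This is ","source: marked as source","prose: ."]
```
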

Security

hast-util-to-nlcst does not change the original syntax tree, so there are no openings for cross-site scripting (XSS) attacks.

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer