README
hast-util-to-nlcst
hast utility to transform to nlcst.
Note: You probably want to use rehype-retext.
Install
This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.
npm:
npm install hast-util-to-nlcst
Use
Say we have the following example.html:
<article>
Implicit.
<h1>Explicit: <strong>foo</strong>s-ball</h1>
<pre><code class="language-foo">bar()</code></pre>
</article>
…and next to it, index.js:
import {readSync} from 'to-vfile'
import {inspect} from 'unist-util-inspect'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import rehype from 'rehype'
const file = readSync('example.html')
const tree = rehype().parse(file)
console.log(inspect(toNlcst(tree, file, ParseEnglish)))
Running index.js yields:
RootNode[2] (1:1-6:1, 0-134)
├─ ParagraphNode[3] (1:10-3:3, 9-24)
│ ├─ WhiteSpaceNode: "\n " (1:10-2:3, 9-12)
│ ├─ SentenceNode[2] (2:3-2:12, 12-21)
│ │ ├─ WordNode[1] (2:3-2:11, 12-20)
│ │ │ └─ TextNode: "Implicit" (2:3-2:11, 12-20)
│ │ └─ PunctuationNode: "." (2:11-2:12, 20-21)
│ └─ WhiteSpaceNode: "\n " (2:12-3:3, 21-24)
└─ ParagraphNode[1] (3:7-3:43, 28-64)
  └─ SentenceNode[4] (3:7-3:43, 28-64)
    ├─ WordNode[1] (3:7-3:15, 28-36)
    │ └─ TextNode: "Explicit" (3:7-3:15, 28-36)
    ├─ PunctuationNode: ":" (3:15-3:16, 36-37)
    ├─ WhiteSpaceNode: " " (3:16-3:17, 37-38)
    └─ WordNode[4] (3:25-3:43, 46-64)
      ├─ TextNode: "foo" (3:25-3:28, 46-49)
      ├─ TextNode: "s" (3:37-3:38, 58-59)
      ├─ PunctuationNode: "-" (3:38-3:39, 59-60)
      └─ TextNode: "ball" (3:39-3:43, 60-64)
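As a rough illustration of how such a tree can be consumed, the sketch below walks a hand-written stand-in for the second sentence above and collects the text of each WordNode. The `sentence` literal and the helpers `nlcstText` and `collectWords` are hypothetical names used only for this example; they are not part of hast-util-to-nlcst, and positional info is omitted for brevity.

```javascript
// Hand-written stand-in for the second sentence in the tree above.
// Positions are left out; a real tree would carry them.
const sentence = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Explicit'}]},
    {type: 'PunctuationNode', value: ':'},
    {type: 'WhiteSpaceNode', value: ' '},
    {
      type: 'WordNode',
      children: [
        {type: 'TextNode', value: 'foo'},
        {type: 'TextNode', value: 's'},
        {type: 'PunctuationNode', value: '-'},
        {type: 'TextNode', value: 'ball'}
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all leaves under a node.
function nlcstText(node) {
  return 'value' in node ? node.value : node.children.map(nlcstText).join('')
}

// Hypothetical helper: gather the text of every WordNode, depth first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(nlcstText(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

console.log(collectWords(sentence)) // [ 'Explicit', 'foos-ball' ]
```

Note how the text inside and after the strong element ends up in a single word, matching the WordNode[4] shown in the output.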
API
This package exports the following identifier: toNlcst.
There is no default export.
toNlcst(tree, file, Parser)
Transform the given hast tree to nlcst.
Parameters
tree (HastNode): tree with positional info
file (VFile): virtual file
Parser (Function): nlcst parser, such as parse-english, parse-dutch, or parse-latin
Returns
The nlcst tree produced from the given hast tree.
Notes
Implied paragraphs
The algorithm supports implicit and explicit paragraphs, such as:
<article>
An implicit paragraph.
<h1>An explicit paragraph.</h1>
</article>
Overlapping paragraphs are also supported (see the tests or the HTML spec for more info).
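To make the distinction concrete, here is a heavily simplified sketch of splitting the children of a container into implicit and explicit paragraphs. It is not the algorithm this package uses: the set of tag names in `explicitTagNames` is an assumption chosen for illustration, the helpers `paragraphText` and `splitParagraphs` are hypothetical, and overlapping paragraphs are not handled.

```javascript
// Assumption: a small, illustrative set of elements that form explicit paragraphs.
const explicitTagNames = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

// Hypothetical helper: collect the text inside a hast node.
function paragraphText(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(paragraphText).join('')
}

// Hypothetical helper: explicit elements become their own paragraph,
// runs of anything else between them become implicit paragraphs.
function splitParagraphs(children) {
  const result = []
  let run = []

  const flush = () => {
    const value = run.map(paragraphText).join('').trim()
    // Runs that contain only whitespace do not form a paragraph.
    if (value) result.push({kind: 'implicit', value})
    run = []
  }

  for (const child of children) {
    if (child.type === 'element' && explicitTagNames.has(child.tagName)) {
      flush()
      result.push({kind: 'explicit', value: paragraphText(child).trim()})
    } else {
      run.push(child)
    }
  }

  flush()
  return result
}

// The children of the <article> from the example above, written by hand.
const articleChildren = [
  {type: 'text', value: '\n  An implicit paragraph.\n  '},
  {
    type: 'element',
    tagName: 'h1',
    properties: {},
    children: [{type: 'text', value: 'An explicit paragraph.'}]
  },
  {type: 'text', value: '\n'}
]

const paragraphs = splitParagraphs(articleChildren)
console.log(paragraphs)
```

With this input the sketch reports one implicit paragraph followed by one explicit paragraph, and drops the trailing whitespace-only text.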
Ignored nodes
Some elements are ignored and their content will not be present in nlcst: <script>, <style>, <svg>, <math>, <del>.
To ignore other elements, add a data-nlcst attribute with a value of ignore:
<p>This is <span data-nlcst="ignore">hidden</span>.</p>
<p data-nlcst="ignore">Completely hidden.</p>
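The rule above can be expressed as a small predicate over hast elements. This is only a sketch of the documented behavior, not the package's internal code: `isIgnored` and `ignoredTagNames` are hypothetical names, and the example nodes are written by hand. It relies on hast storing the data-nlcst attribute as the `dataNlcst` property.

```javascript
// Elements listed above as ignored by default.
const ignoredTagNames = new Set(['script', 'style', 'svg', 'math', 'del'])

// Hypothetical helper: would this hast node be skipped under the rules above?
function isIgnored(node) {
  if (!node || node.type !== 'element') return false
  const properties = node.properties || {}
  // hast exposes the `data-nlcst` attribute as the `dataNlcst` property.
  return ignoredTagNames.has(node.tagName) || properties.dataNlcst === 'ignore'
}

// <span data-nlcst="ignore">hidden</span>
const hidden = {
  type: 'element',
  tagName: 'span',
  properties: {dataNlcst: 'ignore'},
  children: [{type: 'text', value: 'hidden'}]
}
// <script></script>
const script = {type: 'element', tagName: 'script', properties: {}, children: []}
// <em></em>
const visible = {type: 'element', tagName: 'em', properties: {}, children: []}

console.log(isIgnored(hidden), isIgnored(script), isIgnored(visible)) // true true false
```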
Source nodes
<code> elements are mapped to Source nodes in nlcst.
To mark other elements as source, add a data-nlcst attribute with a value of source:
<p>This is <span data-nlcst="source">marked as source</span>.</p>
<p data-nlcst="source">Completely marked.</p>
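As a sketch of what this mapping amounts to, the snippet below detects source elements and builds the corresponding nlcst node, whose type is SourceNode. The helpers `isSource`, `sourceText`, and `toSource` are hypothetical and only illustrate the documented rule; the real transform also attaches positional info, which is left out here.

```javascript
// Hypothetical helper: collect the text inside a hast node.
function sourceText(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(sourceText).join('')
}

// Hypothetical helper: is this hast node treated as source under the rules above?
function isSource(node) {
  if (!node || node.type !== 'element') return false
  const properties = node.properties || {}
  // hast exposes the `data-nlcst` attribute as the `dataNlcst` property.
  return node.tagName === 'code' || properties.dataNlcst === 'source'
}

// Hypothetical helper: build an nlcst source node (positions omitted).
function toSource(node) {
  return {type: 'SourceNode', value: sourceText(node)}
}

// <span data-nlcst="source">marked as source</span>
const marked = {
  type: 'element',
  tagName: 'span',
  properties: {dataNlcst: 'source'},
  children: [{type: 'text', value: 'marked as source'}]
}
// <code class="language-foo">bar()</code>
const codeElement = {
  type: 'element',
  tagName: 'code',
  properties: {className: ['language-foo']},
  children: [{type: 'text', value: 'bar()'}]
}

for (const node of [marked, codeElement]) {
  if (isSource(node)) console.log(toSource(node))
}
```

Because the content becomes a single opaque value, it is not split into words and sentences.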
Security
hast-util-to-nlcst does not change the original syntax tree so there are no openings for cross-site scripting (XSS) attacks.
Related
mdast-util-to-nlcst: transform mdast to nlcst
mdast-util-to-hast: transform mdast to hast
hast-util-to-mdast: transform hast to mdast
hast-util-to-xast: transform hast to xast
hast-util-sanitize: sanitize hast nodes
Contribute
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.