agentmarkdown

An accurate, extensible, and fast HTML-to-markdown converter.

Usage (no npm install needed):

<script type="module">
  import agentmarkdown from 'https://cdn.skypack.dev/agentmarkdown';
</script>

Agent Markdown is an HTML user agent that parses HTML, lays out the document according to the CSS stylesheet for HTML, and then "renders" the laid-out document to Markdown. The resulting Markdown looks very similar to the way the HTML document looks when parsed and rendered in a browser (user agent).

Usage / Quick Start

import { AgentMarkdown } from "agentmarkdown"
const markdownString = await AgentMarkdown.produce(htmlString)

Install

Install with npm: npm install agentmarkdown

Features

CLI Example

You can convert any HTML file to Markdown at the command line using the following command, and the markdown output will be printed to stdout:

agentmarkdown <filename.html>

It also reads from stdin when you pipe HTML to it, so you can do things like:

echo "<b>bold</b>" | agentmarkdown > myfile.md

The above commands assume you installed agentmarkdown globally with npm install --global agentmarkdown. It also works with npx, so you can run it without installing:

npx agentmarkdown <filename.html>

Live Example

You can view the live online web example at https://agentmarkdown.now.sh.

You can build and run the web example locally with the following commands:

cd example/
npm install
npm run start

NOTE: If the example fails to start on macOS with fsevents-related errors, you may need to run xcode-select --install. If that doesn't work, you may need to run sudo rm -rf $(xcode-select -print-path) followed by xcode-select --install.

Customize & Extend with Plugins

To customize how the markdown is generated or add support for new elements, implement the LayoutPlugin interface to handle a particular HTML element. The LayoutPlugin interface is defined as follows:

export interface LayoutPlugin {
  /**
   * Specifies the name of the HTML element that this plugin renders markdown for.
   * NOTE: Must be all lowercase
   */
  elementName: string
  /**
   * This is the core of the implementation that will be called for each instance of the HTML element that this plugin is registered for.
   */
  layout: LayoutGenerator
}

The LayoutGenerator is a single function that performs a CSS2 box-generation layout algorithm on an HTML element. Essentially, it creates zero or more boxes for the given element, which AgentMarkdown then renders to text. A box can contain text content and/or other boxes, and each box has a type of inline or block. Inline boxes are laid out horizontally. Block boxes are laid out vertically (i.e. they have newline characters before and after their contents). The LayoutGenerator function is defined as follows:

export interface LayoutGenerator {
  (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null
}
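
To make the inline/block distinction concrete, here is a minimal, self-contained sketch of a box tree and a text renderer. The names Box, BoxKind, box and renderBox are hypothetical stand-ins invented for this illustration; they are not AgentMarkdown's actual types or rendering code, which may differ considerably.

```typescript
// Hypothetical stand-ins for illustration only; NOT AgentMarkdown's real types.
type BoxKind = "inline" | "block"

interface Box {
  kind: BoxKind
  text: string
  children: Box[]
}

// Small helper to build a box with optional text and children.
function box(kind: BoxKind, text = "", children: Box[] = []): Box {
  return { kind, text, children }
}

// Renders a box tree to text: inline children are appended to the current
// line, while block children start on a new line and end with a newline.
function renderBox(b: Box): string {
  let out = b.text
  for (const child of b.children) {
    const rendered = renderBox(child)
    if (child.kind === "block") {
      if (out.length > 0 && !out.endsWith("\n")) out += "\n"
      out += rendered
      if (!out.endsWith("\n")) out += "\n"
    } else {
      out += rendered
    }
  }
  return out
}

// Two block boxes (think paragraphs), the first holding two inline boxes.
const doc = box("block", "", [
  box("block", "", [box("inline", "Hello "), box("inline", "**world**")]),
  box("block", "", [box("inline", "Second paragraph")]),
])
const rendered = renderBox(doc)
// rendered is "Hello **world**\nSecond paragraph\n"
```

The two inline boxes end up on the same line, while each block box gets a line of its own.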

For example, the HTML <b> element could be implemented as a plugin like the following:

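class BoldPlugin implements LayoutPlugin {
  // assign the value (a bare `elementName: "b"` only declares a type and leaves the property undefined)
  elementName = "b"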

  layout: LayoutGenerator = (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null => {
    // let the manager use other plugins to layout any child elements:
    const kids = manager.layout(context, element.children)
    // wrap the child elements in the markdown ** syntax for bold/strong:
    kids.unshift(manager.createBox(BoxType.inline, "**"))
    kids.push(manager.createBox(BoxType.inline, "**"))
    // return a new box containing everything:
    return manager.createBox(BoxType.inline, "", kids)
  }
}
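
One way to try out a plugin like this in isolation is to run it against a fake manager. The sketch below does that for a hypothetical <em> plugin. Every type in it (BoxType, CssBox, HtmlNode, LayoutContext, LayoutManager) is a simplified stand-in assumed for this example, as are fakeManager, EmphasisPlugin and flattenText; the real definitions exported by agentmarkdown may have different shapes.

```typescript
// Stand-in types: assumptions for this sketch, not AgentMarkdown's real definitions.
enum BoxType { inline, block }
interface CssBox { type: BoxType; textContent: string; children: CssBox[] }
interface HtmlNode { nodeName: string; data?: string; children?: HtmlNode[] }
type LayoutContext = Record<string, unknown>
interface LayoutManager {
  layout(context: LayoutContext, elements: HtmlNode[]): CssBox[]
  createBox(type: BoxType, textContent: string, children?: CssBox[]): CssBox
}

// A fake manager that turns every child node into an inline box holding its text.
const fakeManager: LayoutManager = {
  layout: (_context, elements) =>
    elements.map((el) => fakeManager.createBox(BoxType.inline, el.data ?? "")),
  createBox: (type, textContent, children = []) => ({ type, textContent, children }),
}

// Same wrapping approach as the bold example, applied to <em> with "*" markers.
class EmphasisPlugin {
  elementName = "em"
  layout = (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null => {
    const kids = manager.layout(context, element.children ?? [])
    kids.unshift(manager.createBox(BoxType.inline, "*"))
    kids.push(manager.createBox(BoxType.inline, "*"))
    return manager.createBox(BoxType.inline, "", kids)
  }
}

// Concatenates the text of a box and all of its descendants, depth first.
function flattenText(b: CssBox): string {
  return b.textContent + b.children.map(flattenText).join("")
}

const emNode: HtmlNode = { nodeName: "em", children: [{ nodeName: "#text", data: "hi" }] }
const emBox = new EmphasisPlugin().layout({}, fakeManager, emNode)
const emText = emBox === null ? "" : flattenText(emBox)
// emText is "*hi*"
```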

To initialize AgentMarkdown with plugins, pass them in as an array for the layoutPlugins option, as follows. To customize the rendering of an element, specify a plugin with that element's elementName and your plugin will override the built-in plugin.

const result = await AgentMarkdown.render({
  html: myHtmlString,
  layoutPlugins: [
    new BoldPlugin()
  ]
})
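
The override-by-elementName behavior can be pictured as a lookup table in which later registrations replace earlier ones. The following sketch is only a mental model under that assumption: SimplePlugin and buildRegistry are hypothetical names with a deliberately simplified plugin shape, and AgentMarkdown's internal registration logic is not shown in this README.

```typescript
// Hypothetical, simplified plugin shape for illustration only; the real
// LayoutPlugin uses a `layout` function with a richer signature.
interface SimplePlugin {
  elementName: string // must be all lowercase
  render: (innerText: string) => string
}

// Builds a lookup table in which later plugins replace earlier ones
// that share the same elementName.
function buildRegistry(
  builtIns: SimplePlugin[],
  custom: SimplePlugin[]
): Map<string, SimplePlugin> {
  const registry = new Map<string, SimplePlugin>()
  for (const plugin of [...builtIns, ...custom]) {
    registry.set(plugin.elementName, plugin)
  }
  return registry
}

const builtIns: SimplePlugin[] = [
  { elementName: "b", render: (t) => `**${t}**` },
  { elementName: "i", render: (t) => `*${t}*` },
]
const custom: SimplePlugin[] = [
  // Overrides the built-in "b" handling with underscore-style bold.
  { elementName: "b", render: (t) => `__${t}__` },
]

const registry = buildRegistry(builtIns, custom)
const boldOutput = registry.get("b")!.render("bold") // "__bold__"
const italicOutput = registry.get("i")!.render("it") // "*it*"
```

Elements without a custom plugin, such as "i" here, keep their built-in handling.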

Show your support

Please give a ⭐️ if this project helped you!

Contributing 🤝

This is a community project. We invite your participation through issues and pull requests! You can peruse the contributing guidelines.

Building

The package is written in TypeScript. To build the package run the following from the root of the repo:

npm run build # It will be built in /dist

Release Process (Deploying to NPM) 🚀

We use semantic-release to consistently release semver-compatible versions. This project deploys to multiple npm distribution tags. Each of the branches below corresponds to the following npm distribution tag:

| branch | npm distribution tag |
| ------ | -------------------- |
| master | latest               |
| beta   | beta                 |
To trigger a release, push a Conventional Commit that follows the Angular Commit Message Conventions to one of the above branches.

Todo / Roadmap

See /docs/todo.md.

Alternatives

License 📝

Copyright © 2019 Scott Willeke.

This project is licensed under the Mozilla Public License 2.0.