context-url-extractor

Methods for extracting URLs from HTML or text strings with surrounding context

Usage no npm install needed!

<script type="module">
  import contextUrlExtractor from 'https://cdn.skypack.dev/context-url-extractor';
</script>

README

context-url-extractor

standard-readme compliant

Methods for extracting URLs from HTML or text strings with surrounding context

When data mining content that contains URLs, it's far easier for a machine to categorise them if they are semantic (or friendly) URLs:

Bad URL

Good URL

This package provides a jumping off point for data mining the surrounding context of each URL found in the supplied content.

Table of Contents

Install

npm install --save context-url-extractor

Usage

const extractor = new ContextUrlExtractor({ content });
const res = extractor.extractUrls();

Custom Context Lengths

The default pre and post context string lengths are set to 170 characters, but this can be overridden in the constructor.

const extractor = new ContextUrlExtractor({ content, contextCharsBefore: 80, contextCharsAfter: 80 });

Example Response

[
    {
        "url": "https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935",
        "contextPre": "nd. To log in to your profile please <a href=\"",
        "contextPost": "\">click here</a> and sign in with your email "
    }
]

Maintainers

@njhoran

Contributing

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT © 2019 njhoran