grunt-check-pages

Grunt task that checks various aspects of a web page for correctness.

Getting Started

This plugin requires Grunt ~0.4.4 or later.

If you haven't used Grunt before, be sure to check out the Getting Started guide, as it explains how to create a Gruntfile as well as install and use Grunt plugins. Once you're familiar with that process, you may install this plugin with this command:

npm install grunt-check-pages --save-dev

Once the plugin has been installed, it may be enabled inside your Gruntfile with this line of JavaScript:

grunt.loadNpmTasks('grunt-check-pages');

For similar functionality without a Grunt dependency, please see the check-pages package.

For direct use, the check-pages-cli package wraps check-pages with a command-line interface.

The "checkPages" task

Overview

An important aspect of creating a web site is validating the structure, content, and configuration of the site's pages. The checkPages task provides an easy way to integrate this testing into your normal Grunt workflow.

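By providing a list of pages to scan, the task can:

  • Validate that each page is accessible
  • Validate that the links on each page point to accessible content (checkLinks), optionally failing for empty fragments, links to localhost, redirects, or HTTP links that are also valid over HTTPS
  • Validate that links with query hashes point to content that matches the hash (queryHashes)
  • Validate that page responses use caching and compression (checkCaching, checkCompression)
  • Validate that page structure is well-formed XHTML (checkXhtml)
  • Validate that pages finish downloading within a maximum response time (maxResponseTime)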

Usage

In your project's Gruntfile, add a section named checkPages to the data object passed into grunt.initConfig(). The following example includes all supported options:

grunt.initConfig({
  checkPages: {
    development: {
      options: {
        pageUrls: [
          'http://localhost:8080/',
          'http://localhost:8080/blog',
          'http://localhost:8080/about.html'
        ],
        checkLinks: true,
        linksToIgnore: [
          'http://localhost:8080/broken.html'
        ],
        noEmptyFragments: true,
        noLocalLinks: true,
        noRedirects: true,
        onlySameDomain: true,
        preferSecure: true,
        queryHashes: true,
        checkCaching: true,
        checkCompression: true,
        checkXhtml: true,
        summary: true,
        terse: true,
        maxResponseTime: 200,
        userAgent: 'custom-user-agent/1.2.3'
      }
    },
    production: {
      options: {
        pageUrls: [
          'http://example.com/',
          'http://example.com/blog',
          'http://example.com/about.html'
        ],
        checkLinks: true,
        maxResponseTime: 500
      }
    }
  }
});

Options

pageUrls

Type: Array of String
Default value: undefined
Required

pageUrls specifies a list of URLs for web pages the task will check. The list can be empty, but must be present. Wildcards are not supported.

URLs can point to local or remote content via the http, https, and file protocols. http and https URLs must be absolute; file URLs can be relative. Some features (for example, HTTP header checks) are not available with the file protocol.

To store the list outside Gruntfile.js, read the array from a JSON file instead: pageUrls: grunt.file.readJSON('pageUrls.json').

checkLinks

Type: Boolean
Default value: false

Enabling checkLinks causes each link in a page to be checked for validity (i.e., an HTTP HEAD or GET request returns success).

For efficiency, a HEAD request is made first and a successful result validates the link. Because some web servers misbehave, a failed HEAD request is followed by a GET request to definitively validate the link.
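
The HEAD-then-GET strategy can be sketched as follows. This is an illustration, not the task's actual code: isLinkValid is a hypothetical helper, and the request function is passed in as a parameter (defaulting to the global fetch available in Node.js 18 and later) so the logic can be exercised without a network.

```javascript
// Hypothetical helper illustrating the HEAD-then-GET fallback.
// `request` is any fetch-compatible function; it defaults to the global fetch.
async function isLinkValid(url, request = globalThis.fetch) {
  for (const method of ['HEAD', 'GET']) {
    try {
      const response = await request(url, { method });
      if (response.ok) {
        return true; // a 2xx status validates the link
      }
    } catch (err) {
      // Network-level failure: fall through and try the next method, if any.
    }
  }
  return false; // both HEAD and GET failed
}
```

A server that rejects HEAD but answers GET is still reported as valid, at the cost of one extra request for that link.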

The following element/attribute pairs are used to identify links:

  • a/href
  • area/href
  • audio/src
  • embed/src
  • iframe/src
  • img/src
  • img/srcset
  • input/src
  • link/href
  • object/data
  • script/src
  • source/src
  • source/srcset
  • track/src
  • video/src
  • video/poster
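
The pairs above can be expressed as a lookup table. The sketch below is illustrative only: urlsFromAttribute is a hypothetical helper, and its srcset handling is a simplification that assumes the URLs themselves contain no commas.

```javascript
// Element/attribute pairs from the list above, keyed by element name.
const LINK_ATTRIBUTES = new Map([
  ['a', ['href']], ['area', ['href']], ['audio', ['src']], ['embed', ['src']],
  ['iframe', ['src']], ['img', ['src', 'srcset']], ['input', ['src']],
  ['link', ['href']], ['object', ['data']], ['script', ['src']],
  ['source', ['src', 'srcset']], ['track', ['src']], ['video', ['src', 'poster']]
]);

// Returns the URLs carried by one attribute value, or [] if the pair is not a link.
function urlsFromAttribute(element, attribute, value) {
  const name = attribute.toLowerCase();
  const attributes = LINK_ATTRIBUTES.get(element.toLowerCase()) || [];
  if (!attributes.includes(name)) {
    return [];
  }
  if (name === 'srcset') {
    // A srcset value holds comma-separated candidates such as "a.png 1x, b.png 2x";
    // the URL is the first whitespace-delimited token of each candidate.
    return value
      .split(',')
      .map((candidate) => candidate.trim().split(/\s+/)[0])
      .filter(Boolean);
  }
  return [value];
}
```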

linksToIgnore

Type: Array of String
Default value: undefined
Used by: checkLinks

linksToIgnore specifies a list of URLs that should be ignored by the link checker.

This is useful for links that are not accessible during development or known to be unreliable.

noEmptyFragments

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any links contain an empty fragment identifier (hash) such as <a href="#">.

This is useful to identify placeholder links that haven't been updated.
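
A minimal sketch of such a check, assuming that "empty fragment" means the link text ends with a bare "#". The exact rule the task applies is not spelled out here, and hasEmptyFragment is a hypothetical name.

```javascript
// Hypothetical check: an href has an empty fragment when it ends with "#".
function hasEmptyFragment(href) {
  return href.trim().endsWith('#');
}
```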

noLocalLinks

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any links to localhost are encountered.

This is useful to detect temporary links that may work during development but would fail when deployed.

The host names recognized as localhost are:

  • localhost
  • 127.0.0.1 (and the rest of the 127.0.0.0/8 address block)
  • ::1 (and its expanded forms)
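
One way to recognize these host names is sketched below. It is not the task's implementation: isLocalLink is a hypothetical helper that expects an absolute URL and relies on the WHATWG URL parser built into Node.js, which normalizes expanded IPv6 forms to [::1].

```javascript
// Hypothetical helper: true when an absolute URL points at the local machine.
function isLocalLink(link) {
  const { hostname } = new URL(link); // lower-cased and normalized by the parser
  return (
    hostname === 'localhost' ||
    hostname === '[::1]' || // covers expanded forms such as [0:0:0:0:0:0:0:1]
    /^127\.\d+\.\d+\.\d+$/.test(hostname) // the 127.0.0.0/8 address block
  );
}
```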

noRedirects

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any HTTP redirects are encountered.

This is useful to ensure outgoing links are to the content's canonical location.

onlySameDomain

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to skip checking links that point to a different domain than the referring page.

This is useful during development when external sites aren't changing and don't need to be checked.
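
The comparison can be sketched as below. This assumes "same domain" means an identical host name, ignoring protocol and port; the task's exact rule may differ, and isSameDomain is a hypothetical name.

```javascript
// Hypothetical helper: compares the link's host name with the page's host name.
function isSameDomain(pageUrl, link) {
  const page = new URL(pageUrl);
  const resolved = new URL(link, page); // resolves relative links against the page
  return resolved.hostname === page.hostname;
}
```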

preferSecure

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any HTTP links are present where the corresponding HTTPS link is also valid.

This is useful to ensure outgoing links use a secure protocol wherever possible.
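
To illustrate, the sketch below derives the HTTPS counterpart of an HTTP link, which could then be requested to see whether it is also valid. secureAlternative is a hypothetical helper, not part of the plugin.

```javascript
// Hypothetical helper: returns the https: form of an http: link, or null otherwise.
function secureAlternative(link) {
  const url = new URL(link);
  if (url.protocol !== 'http:') {
    return null; // already secure, or not an HTTP link at all
  }
  url.protocol = 'https:';
  return url.href;
}
```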

queryHashes

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to verify that links with a file hash in the query string point to content that hashes to the expected value.

Query hashes can be used to invalidate cached responses when leveraging browser caching via long cache lifetimes.

Supported hash functions are CRC-32, MD5, and SHA-1. Example links:

  • image.png?crc32=e4f013b5
  • styles.css?md5=4f47458e34bc855a46349c1335f58cc3
  • archive.zip?sha1=9511fa1a787d021bdf3aa9538029a44209fb5c4c
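
The verification can be sketched with Node.js's built-in crypto module. This is an illustration under stated assumptions: verifyQueryHash is a hypothetical helper, it covers only md5 and sha1 (CRC-32 is left out because crypto.createHash does not provide it), and it returns null when the link carries no recognized hash.

```javascript
const crypto = require('crypto');

// Hypothetical helper: compares the hash named in the query string with the
// hash of the downloaded content (a Buffer or string).
function verifyQueryHash(link, content) {
  const params = new URL(link).searchParams;
  for (const algorithm of ['md5', 'sha1']) {
    const expected = params.get(algorithm);
    if (expected !== null) {
      const actual = crypto.createHash(algorithm).update(content).digest('hex');
      return actual === expected.toLowerCase();
    }
  }
  return null; // no md5 or sha1 parameter present
}
```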

checkCaching

Type: Boolean
Default value: false

Enabling checkCaching verifies the HTTP Cache-Control and ETag response headers are present and valid.

This is useful to ensure a page makes use of browser caching for better performance.
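
A rough sketch of a header check along these lines. What the task treats as "valid" is not detailed here, so this version only requires Cache-Control to be present and checks that ETag is a quoted string with an optional W/ weak prefix. cachingIssues is a hypothetical name, and headers is assumed to be a plain object with lower-case keys, as Node.js's http module provides.

```javascript
// Hypothetical helper: returns a list of caching problems found in response headers.
function cachingIssues(headers) {
  const issues = [];
  if (!headers['cache-control']) {
    issues.push('Missing Cache-Control header');
  }
  const etag = headers.etag;
  if (!etag) {
    issues.push('Missing ETag header');
  } else if (!/^(W\/)?"[^"]*"$/.test(etag)) {
    // An entity tag is a quoted string, optionally prefixed with W/ (weak).
    issues.push('Invalid ETag header: ' + etag);
  }
  return issues;
}
```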

checkCompression

Type: Boolean
Default value: false

Enabling checkCompression verifies the HTTP Content-Encoding response header is present and valid.

This is useful to ensure a page makes use of compression for better performance.
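
A comparable sketch for the compression check. The set of encodings the task accepts is not stated here; this version assumes gzip and deflate. compressionIssues is a hypothetical name, and headers is assumed to be a plain object with lower-case keys.

```javascript
// Assumption: only these encodings count as valid compression.
const ACCEPTED_ENCODINGS = ['gzip', 'deflate'];

// Hypothetical helper: returns a list of compression problems found in response headers.
function compressionIssues(headers) {
  const encoding = headers['content-encoding'];
  if (!encoding) {
    return ['Missing Content-Encoding header'];
  }
  if (!ACCEPTED_ENCODINGS.includes(encoding.trim().toLowerCase())) {
    return ['Unexpected Content-Encoding header: ' + encoding];
  }
  return [];
}
```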

checkXhtml

Type: Boolean
Default value: false

Enabling checkXhtml attempts to parse each URL's content as XHTML and fails if there are any structural errors.

This is useful to ensure a page's structure is well-formed and unambiguous for browsers.

summary

Type: Boolean
Default value: false

Enabling the summary option logs a summary of each issue found after all checks have completed.

This makes it easy to pick out failures when running tests against many pages. May be combined with the terse option.

terse

Type: Boolean
Default value: false

Enabling the terse option suppresses the logging of each check as it runs, instead displaying a brief overview at the end.

This is useful for scripting or to reduce output. May be combined with the summary option.

maxResponseTime

Type: Number
Default value: undefined

maxResponseTime specifies the maximum amount of time (in milliseconds) a page request can take to finish downloading.

Requests that take more time will trigger a failure (but are still checked for other issues).

userAgent

Type: String
Default value: grunt-check-pages/x.y.z

userAgent specifies the value of the HTTP User-Agent header sent with all page/link requests.

This is useful for pages that alter their behavior based on the user agent. Setting the value to null omits the User-Agent header entirely.
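
The three cases (default, custom value, and null) can be sketched as follows. buildRequestHeaders is a hypothetical helper, and the default string is the x.y.z placeholder from above, not a real version number.

```javascript
// Placeholder default taken from the option description above.
const DEFAULT_USER_AGENT = 'grunt-check-pages/x.y.z';

// Hypothetical helper: builds request headers from the userAgent option.
function buildRequestHeaders(userAgent) {
  const headers = {};
  if (userAgent === undefined) {
    headers['User-Agent'] = DEFAULT_USER_AGENT; // option not set: use the default
  } else if (userAgent !== null) {
    headers['User-Agent'] = userAgent; // custom value
  }
  // userAgent === null: the header is omitted entirely
  return headers;
}
```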

Release History

  • 0.1.0 - Initial release, support for checkLinks and checkXhtml.
  • 0.1.1 - Tweak README for better formatting.
  • 0.1.2 - Support page-only mode (no link or XHTML checks), show response time for requests.
  • 0.1.3 - Support maxResponseTime option, buffer all page responses, add "no-cache" header to requests.
  • 0.1.4 - Support checkCaching and checkCompression options, improve error handling, use gruntMock.
  • 0.1.5 - Support userAgent option, weak entity tags, update nock dependency.
  • 0.2.0 - Support noLocalLinks option, rename disallowRedirect option to noRedirects, switch to ESLint, update superagent and nock dependencies.
  • 0.3.0 - Support queryHashes option for CRC-32/MD5/SHA-1, update superagent dependency.
  • 0.4.0 - Rename onlySameDomainLinks option to onlySameDomain, fix handling of redirected page links, use page order for links, update all dependencies.
  • 0.5.0 - Show location of redirected links with noRedirects option, switch to crc-hash dependency.
  • 0.6.0 - Support summary option, update crc-hash, grunt-eslint, nock dependencies.
  • 0.6.1 - Add badges for automated build and coverage info to README (along with npm, GitHub, and license).
  • 0.6.2 - Switch from superagent to request, update grunt-eslint and nock dependencies.
  • 0.7.0 - Move task implementation into reusable check-pages package.
  • 0.7.1 - Fix misreporting of "Bad link" for redirected links when noRedirects enabled.
  • 0.8.0 - Suppress redundant link checks, support noEmptyFragments option, update dependencies.
  • 0.9.0 - Add support for checking local content via the 'file:' protocol, update dependencies.
  • 0.10.0 - International URLs, preferSecure option, terse option, srcset, update dependencies.