README
Pagean
Pagean is a web page analysis tool designed to automate tests requiring web pages to be loaded in a browser window (e.g. 404 error loading an external resource, page renders with horizontal scrollbars). The specific tests are outlined below, but are all general tests that do not include any page-specific logic.
Installation
Install Pagean globally (as shown below), or locally, via npm.
npm install -g pagean
Usage
Pagean runs as a command line tool and is executed as follows:
Installed globally:
> pagean [options]
Installed locally:
> npx pagean [options]
Options:
  -V, --version        output the version number
  -c, --config <file>  the path to the pagean configuration file (default: "./.pageanrc.json")
  -h, --help           display help for command
Pagean requires a configuration file, which can be specified via the CLI as detailed above; otherwise the default file .pageanrc.json in the project root is used. This file provides the URLs to be tested and options to configure the tests and reports. Details on the available tests and the configuration file format are provided below.
Test Cases
The tests use Puppeteer to launch a headless Chrome browser. The URLs defined in the configuration file are each loaded once, and after page load the applicable tests are executed. Test results are passed or failed, but can be configured to report warning instead of failure. Only a failed test will cause the test process to fail and exit with an error code (a warning will not).
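The relationship between results and the exit code can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the failWarn flag is the setting described in the Configuration section, and the specific exit code of 1 is assumed (the text above only says "an error code").

```javascript
// Hypothetical helpers; names are illustrative, not Pagean's internals.

// Map a raw pass/fail outcome to the reported result, honoring failWarn.
function reportedResult(passed, failWarn) {
  if (passed) return 'passed';
  return failWarn ? 'warning' : 'failed';
}

// Assumed exit code: non-zero only if at least one result is 'failed'.
function exitCodeFor(results) {
  return results.includes('failed') ? 1 : 0;
}

const results = [
  reportedResult(true, false), // 'passed'
  reportedResult(false, true), // 'warning'
];
console.log(exitCodeFor(results)); // 0: warnings alone do not fail the run
```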
Horizontal Scrollbar Test
The horizontal scrollbar test fails if the rendered page has a horizontal scrollbar. If a specific browser viewport size is desired for this test, that can be configured in the puppeteerLaunchOptions.
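One common way to detect a horizontal scrollbar is to compare an element's content width with its visible width. The sketch below assumes that heuristic; it is not confirmed to be how Pagean implements the test, and the function name is hypothetical.

```javascript
// Assumed heuristic, not confirmed as Pagean's implementation:
// content wider than the visible area implies a horizontal scrollbar.
function hasHorizontalScrollbar(element) {
  return element.scrollWidth > element.clientWidth;
}

// In a browser this would be called with document.documentElement;
// plain objects stand in for it here.
console.log(hasHorizontalScrollbar({ scrollWidth: 1280, clientWidth: 800 })); // true
console.log(hasHorizontalScrollbar({ scrollWidth: 800, clientWidth: 800 })); // false
```

Because the result depends on the visible width, the viewport size matters. Puppeteer's launch options include a defaultViewport object with width and height, which is one way the viewport could be fixed for this test.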
Console Output Test
The console output test fails if any output is written to the browser console. An array is included in the report with all entries, as shown below:
[
  {
    "_args": [],
    "_location": {
      "lineNumber": undefined,
      "url": "https://this.url.does.not.exist/file.js"
    },
    "_text": "Failed to load resource: net::ERR_NAME_NOT_RESOLVED",
    "_type": "error"
  }
]
Console Error Test
The console error test fails if any error is written to the browser console, but is otherwise the same as the console output test. This separation allows testing for console errors while permitting any other console output.
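The difference between the two console tests can be illustrated with entries shaped like the report example above. This is a sketch only: the function names and the first sample entry are hypothetical, and it assumes errors are identified by a _type of "error".

```javascript
// Entries shaped like the report example (only the fields used here).
// The first entry is made-up sample data.
const consoleEntries = [
  { _type: 'log', _text: 'Page initialized' },
  { _type: 'error', _text: 'Failed to load resource: net::ERR_NAME_NOT_RESOLVED' },
];

// Console output test: fails on any entry at all.
const consoleOutputPasses = (entries) => entries.length === 0;

// Console error test: fails only on entries whose type is "error".
const consoleErrorPasses = (entries) =>
  entries.every((entry) => entry._type !== 'error');

console.log(consoleOutputPasses(consoleEntries)); // false
console.log(consoleErrorPasses(consoleEntries)); // false
console.log(consoleErrorPasses([consoleEntries[0]])); // true: a log entry is allowed
```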
Rendered HTML Test
The rendered HTML test is intended for cases where content is dynamically created prior to page load (i.e. the load event firing). The rendered HTML is returned and checked with HTML Hint and the test fails if any issues are found. An array is included in the report with all HTML Hint issues, as shown below:
[
  {
    "col": 9,
    "evidence": " <div id=\"div1\"></div>",
    "line": 6,
    "message": "The id value [ div1 ] must be unique.",
    "raw": " id=\"div1\"",
    "rule": {
      "description": "The value of id attributes must be unique.",
      "id": "id-unique",
      "link": "https://github.com/thedaviddias/HTMLHint/wiki/id-unique"
    },
    "type": "error"
  }
]
An htmlhintrc file can be specified in the configuration file, otherwise the default "./.htmlhintrc" file will be used (if it exists). See the Configuration section below.
Note: This test may not find some errors in the original HTML that are removed/resolved as the page is parsed (e.g. closing tags with no opening tags).
Page Load Time Test
The page load time test fails if the page load time (from start through the load event) exceeds the defined threshold in the configuration file (or the default of 2 seconds). The actual load time is included in the report. Tests will time out at twice the page load time threshold.
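The threshold and timeout rules can be expressed as a small function. This is a sketch with hypothetical names, not Pagean's code; it assumes times are measured in seconds, as the default above suggests.

```javascript
const DEFAULT_THRESHOLD_SECONDS = 2; // default stated above

// Hypothetical helper: evaluates a measured load time against the threshold.
function evaluateLoadTime(loadTimeSeconds, thresholdSeconds = DEFAULT_THRESHOLD_SECONDS) {
  return {
    loadTimeSeconds, // the actual load time is reported
    passed: loadTimeSeconds <= thresholdSeconds, // fails only when the threshold is exceeded
    timeoutSeconds: thresholdSeconds * 2, // tests time out at twice the threshold
  };
}

console.log(evaluateLoadTime(1.4).passed); // true
console.log(evaluateLoadTime(2.5).passed); // false
console.log(evaluateLoadTime(2.5, 3).timeoutSeconds); // 6
```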
External Script Test
The external script test is intended to identify any externally loaded JavaScript files (e.g. loaded from a CDN) and aggregate those files so they can undergo further analysis (e.g. dependency vulnerability scanning). The test is included here since these tests load fully rendered pages, therefore allowing the aggregation of this data for pages generated using any language or framework. By default the test returns a warning if the page includes any JavaScript files loaded from a different domain than the page (although this can be overridden to fail instead by setting failWarn: false, see the Configuration section below). These files are then downloaded and saved in the "pagean-external-scripts" directory in the project root. Subdirectories are created for each domain, then following the URL path. For example, the following script...
<script src="https://bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
...will be saved as ./bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js within that directory. The data array in the test report includes the original file URL and the local saved filename or applicable error, as shown below.
[
  {
    "url": "https://code.jquery.com/jquery-3.4.1.slim.min.js",
    "localFile": "pagean-external-scripts/code.jquery.com/jquery-3.4.1.slim.min.js"
  },
  {
    "url": "http://bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js",
    "error": "Request failed with status code 404"
  }
]
Each external script is saved only once, but will be reported on any page where it is referenced.
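The rules for identifying external scripts and deriving the saved filename can be sketched as below. The helper names are hypothetical, and comparing hostnames is an assumption about what "different domain" means; the directory name is taken from the localFile value in the report example.

```javascript
// Hypothetical helpers illustrating the rules described above.
const SAVE_DIRECTORY = 'pagean-external-scripts'; // from the report example

// Assumption: a script is external when its hostname differs from the page's.
// Relative script URLs are resolved against the page URL first.
function isExternalScript(pageUrl, scriptUrl) {
  return new URL(scriptUrl, pageUrl).hostname !== new URL(pageUrl).hostname;
}

// Build the local path: save directory, then domain, then the URL path.
function localFileFor(scriptUrl) {
  const { hostname, pathname } = new URL(scriptUrl);
  return `${SAVE_DIRECTORY}/${hostname}${pathname}`;
}

const page = 'https://example.com/index.html'; // placeholder page URL
console.log(isExternalScript(page, '/js/app.js')); // false
console.log(isExternalScript(page, 'https://code.jquery.com/jquery-3.4.1.slim.min.js')); // true
console.log(localFileFor('https://code.jquery.com/jquery-3.4.1.slim.min.js'));
// pagean-external-scripts/code.jquery.com/jquery-3.4.1.slim.min.js
```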
Broken Link Test
The broken link test checks for broken links on the page. It checks any <a> tag on the page with href pointing to another location on the current page or another page (i.e. only http(s) or file protocols).
- For links within the page, this test checks for existence of the element on the page, passing if the element exists and failing otherwise (and passing for cases that are always valid, e.g. # or #top for the current page). It does not check the visibility of the element. Failing tests return a response of "#element Not Found" (where #element identifies the specific element).
- For links to other pages, the test tries to confirm as efficiently as possible whether the target link is valid. It first makes a HEAD request for that URL and checks the response. If an erroneous response is returned (>= 400 with no execution error) and not code 429 (Too Many Requests), the request is retried with a GET request. The test passes for HTTP responses < 400 and fails otherwise (if HTTP response is >= 400 or another error occurs).
  - This can result in false failure indications, specifically for file:// links (404 or ECONNREFUSED) or where the browser passes a domain identity with the request (page loads when tested, but 401 response for links to that page). For these cases, or other false failures, the test configuration allows a boolean checkWithBrowser option that will instead check links by loading the target in the browser (via puppeteer). Note this can increase test execution time, in some cases substantially, due to the time to open a new browser tab and load the page and all assets.
  - If the link to another page includes a hash it is removed prior to checking. The test in this case is confirming a valid link, not that the element exists, which is only done for the current page.
  - The test configuration allows an ignoredLinks array listing link URLs to ignore for this test. Note this only applies to links to other pages, not links within the page, which are always checked.
- To optimize performance, link test results are cached and those links are not re-tested for the entire test run (across all tested URLs). The test configuration allows a boolean ignoreDuplicates option that can be set to false to bypass this behavior and re-test all links. The results for any failed links are included in the reports in any case.
For any failing test, the data array in the test report includes the original URL and the response code or error as shown below.
[
  {
    "href": "https://about.gitlab.com/not-found",
    "status": 404
  },
  {
    "href": "http://localhost:8080/brokenLinks.html#notlinked",
    "status": "#notlinked Not Found"
  },
  {
    "href": "https://this.url.does.not.exist/",
    "status": "ENOTFOUND"
  }
]
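The HEAD-then-GET flow for links to other pages can be sketched as follows. This is an illustration, not Pagean's implementation: all names are hypothetical, and request(method, url) is an injected function assumed to resolve to an HTTP status code (or reject on an execution error), so no particular HTTP library is implied.

```javascript
// Links to other pages are checked without their hash.
function stripHash(href) {
  const url = new URL(href);
  url.hash = '';
  return url.href;
}

// Retry with GET when HEAD returns an error status other than 429.
function shouldRetryWithGet(status) {
  return status >= 400 && status !== 429;
}

// `request` is a hypothetical injected function: (method, url) => Promise<number>.
async function checkLink(href, request) {
  const target = stripHash(href);
  try {
    let status = await request('HEAD', target);
    if (shouldRetryWithGet(status)) {
      status = await request('GET', target);
    }
    return { href, status, passed: status < 400 };
  } catch (error) {
    // Any execution error (e.g. ENOTFOUND) is reported as a failure.
    return { href, status: error.code || error.message, passed: false };
  }
}

// Stand-in server that rejects HEAD (405) but serves GET (200).
const fakeRequest = async (method) => (method === 'HEAD' ? 405 : 200);
checkLink('https://example.com/page#section', fakeRequest)
  .then((result) => console.log(result.status, result.passed)); // 200 true
```

Injecting the request function keeps the decision logic separate from the network, which also makes it straightforward to substitute a browser-based check of the kind the checkWithBrowser option describes.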
Reports
Based on the reporters configuration, Pagean results may be displayed in the console and saved in two reports in the project root directory (any or all of the three):
- A JSON report named pagean-results.json
- An HTML report named pagean-results.html
Both reports contain:
- The time of test execution
- A summary of the total tests and results (passed, warning, failed)
- The detailed test results, including the URL tested, list of tests performed on that URL with results, and, if applicable, any relevant data associated with the test failure (e.g. the console errors if the console error test fails).
Complete reports for the example case in this project (the tests as specified in the project .pageanrc.json file) can be found at the links above.
Configuration
Pagean looks for a configuration file as specified via the CLI, or defaults to a file named .pageanrc.json in the project root. If the configuration file is not found, is not valid JSON, or does not contain any URLs to check, the job will fail.
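The three failure conditions above can be sketched as a validation step. The function name and error messages are hypothetical and do not reflect Pagean's actual output; the sketch assumes the file contents are passed in as a string, or undefined when the file was not found.

```javascript
// Hypothetical validation mirroring the failure conditions above.
function parseConfig(fileContents) {
  if (fileContents === undefined) {
    throw new Error('Configuration file not found');
  }
  let config;
  try {
    config = JSON.parse(fileContents);
  } catch {
    throw new Error('Configuration file is not valid JSON');
  }
  if (!Array.isArray(config?.urls) || config.urls.length === 0) {
    throw new Error('Configuration file contains no URLs to check');
  }
  return config;
}

try {
  parseConfig('{ "urls": [] }');
} catch (error) {
  console.log(error.message); // Configuration file contains no URLs to check
}
```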
Below is an example .pageanrc.json file, which is broken into six major properties:
- htmlhintrc: An optional path to an htmlhintrc file to be used in the rendered HTML test.
- project: An optional name of the project, which is included in HTML and JSON reports.
- puppeteerLaunchOptions: An optional set of options to pass to Puppeteer on launch. There are no default options. The complete list of available options can be found at https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions.
- reporters: An optional array of reporters indicating the test reports that should be provided. There are three possible options: cli, html, and json. The cli option reports all test details to the console, but the final results summary is always output (even with cli disabled). If reporters is specified, at least one reporter must be included. The default value, as specified below, is all three reporters enabled.
- settings: These settings enable/disable or configure tests, and are applied to all tests, overriding the default values.
  - The shorthand notation allows easy enabling/disabling of tests. In this format the test name is given with a boolean value to enable or disable the test. In this case any other test-specific settings use the default values.
  - The longhand version includes an object for each test. Every test includes two possible properties (some tests include additional settings):
    - enabled: A boolean value to enable/disable the test (default true for all tests).
    - failWarn: A boolean value causing a failed test to report a warning instead of failure. A warning result will not cause the test process to fail (exit with an error code). The default value for all tests is false except the externalScriptTest, as shown below.
The shorthand:
"settings": {
  "consoleErrorTest": true
}
is equivalent to the longhand:
"settings": {
  "consoleErrorTest": {
    "enabled": true,
    "failWarn": false
  }
}
All available settings with the default values are shown below.
- urls: An array of URLs to be tested, which must contain at least one value. Each array entry can either be a URL string, or an object that contains a url string and an optional settings object. This object can contain any of the settings values identified above and will override that setting for testing that URL. The url string can be either an actual URL or a local file, as shown in the example below.
{
  "puppeteerLaunchOptions": {
    "args": [ "--no-sandbox" ]
  },
  "reporters": [
    "cli",
    "html",
    "json"
  ],
  "settings": {
    "horizontalScrollbarTest": {
      "enabled": true,
      "failWarn": false
    },
    "consoleOutputTest": {
      "enabled": true,
      "failWarn": false
    },
    "consoleErrorTest": {
      "enabled": true,
      "failWarn": false
    },
    "renderedHtmlTest": {
      "enabled": true,
      "failWarn": false
    },
    "pageLoadTimeTest": {
      "enabled": true,
      "failWarn": false,
      "pageLoadTimeThreshold": 2
    },
    "externalScriptTest": {
      "enabled": true,
      "failWarn": true
    },
    "brokenLinkTest": {
      "enabled": true,
      "failWarn": false,
      "checkWithBrowser": false,
      "ignoreDuplicates": true,
      "ignoredLinks": []
    }
  },
  "urls": [
    "https://gitlab.com/gitlab-ci-utils/pagean/",
    {
      "url": "./tests/test-cases/consoleLog.html",
      "settings": {
        "consoleOutputTest": false
      }
    }
  ]
}
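How shorthand values, global settings, and per-URL overrides combine can be sketched as below. This is a minimal illustration under assumptions: the function names are hypothetical, only two tests are included, and a property-by-property merge is assumed (the text above does not say whether an override replaces the whole test object or individual properties).

```javascript
// Defaults for two tests, taken from the example configuration above.
const defaults = {
  consoleOutputTest: { enabled: true, failWarn: false },
  externalScriptTest: { enabled: true, failWarn: true },
};

// Expand shorthand (a boolean) into longhand ({ enabled }).
function toLonghand(value) {
  return typeof value === 'boolean' ? { enabled: value } : value;
}

// A urls entry is either a URL string or an object with url and optional settings.
function normalizeUrlEntry(entry) {
  return typeof entry === 'string'
    ? { url: entry, settings: {} }
    : { settings: {}, ...entry };
}

// Assumed precedence: defaults, then global settings, then per-URL settings.
function resolveSettings(globalSettings = {}, urlSettings = {}) {
  const resolved = {};
  for (const testName of Object.keys(defaults)) {
    resolved[testName] = {
      ...defaults[testName],
      ...toLonghand(globalSettings[testName]),
      ...toLonghand(urlSettings[testName]),
    };
  }
  return resolved;
}

const entry = normalizeUrlEntry({
  url: './tests/test-cases/consoleLog.html',
  settings: { consoleOutputTest: false },
});
console.log(resolveSettings({}, entry.settings).consoleOutputTest);
// { enabled: false, failWarn: false }
```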
Docker Images
Provided with the Pagean project are Docker images configured to run the tests. All available image tags can be found in the gitlab-ci-utils/pagean repository at https://gitlab.com/gitlab-ci-utils/pagean/container_registry. Details on each release can be found on the Releases page.
Note: Any images in the gitlab-ci-utils/pagean/tmp repository are temporary images used during the build process and may be deleted at any point.
GitLab CI Configuration
The following is an example job from a .gitlab-ci.yml file to use this image to run Pagean against another project in GitLab CI:
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  script:
    - pagean
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
Testing With Static HTTP Server
The Docker image shown above includes http-server and wait-on installed globally to run a local HTTP server for testing static content. The example job below illustrates how to use this for Pagean tests. The script starts the server in this project's test-cases directory and uses wait-on to hold the script until the server is running and returns a valid response. The referenced pageanrc file is the same as the project default pageanrc, but references all test URLs from the local server.
pagean:
  image: registry.gitlab.com/gitlab-ci-utils/pagean:latest
  stage: test
  before_script:
    # Start static server in test cases directory, discarding any console output,
    # and wait until the server is running
    - http-server ./tests/test-cases > /dev/null 2>&1 & wait-on http://localhost:8080
  script:
    - pagean -c static-server.pageanrc.json
  artifacts:
    when: always
    paths:
      - pagean-results.html
      - pagean-results.json
      - pagean-external-scripts/
Linting Pageanrc Files
A command line tool is also available to lint pageanrc files, which is executed as follows:
Installed globally:
> pageanrc-lint [options] [file] (default: "./.pageanrc.json")
Installed locally:
> npx pageanrc-lint [options] [file] (default: "./.pageanrc.json")
Lint a pageanrc file
Options:
  -V, --version  output the version number
  -j, --json     output JSON with full details
  -h, --help     display help for command
The --json option outputs the JSON results to stdout in all cases for consistency ([] if no errors found, so that it always outputs valid JSON). Otherwise errors are output to stderr, for example:
.\tests\test-configs\cli-tests\some-test.pageanrc.json
<pageanrc>.puppeteerLaunchOptions should NOT have fewer than 1 items
<pageanrc>.reporters[0] should be equal to one of the allowed values (cli, html, json)
<pageanrc>.settings.consoleOutputTest should be either boolean or object with the appropriate properties
<pageanrc>.settings.pageLoadTimeTest.foo should NOT contain additional properties: "foo"
<pageanrc>.settings.pageLoadTimeTest should be either boolean or object with the appropriate properties
<pageanrc>.urls[2].settings.consoleOutputTest should be either boolean or object with the appropriate properties
<pageanrc>.urls[3] should be either URL string or object with the appropriate properties
<pageanrc>.urls[5] should have required property url
In some cases, a single error might result in multiple messages based on the options in the schema definition, especially for cases that can be either a single value or an object with specific properties (e.g. the errors for <pageanrc>.settings.pageLoadTimeTest in the example above).
Note that because of the large number of options, which are dependent on an external project, the linting of puppeteerLaunchOptions only checks that at least one property is provided; it does not check the detailed settings.