wellness

A module to handle useful health checks for clustered production applications.

This module provides a healthcheck framework for existing applications using express or applications having no web framework (it can create its own express server). It does the following:

  • provides an optional timeout check for workers doing useful work
  • for clustered servers, distributes the health info to all workers, so any worker can respond to the health check
  • enables you to create your own custom health checks and use existing wellness health check modules
  • shows valuable information on the healthcheck response

Designed for production systems to ensure they are available. Out of the box, wellness has a single check: a worker timeout (adjustable with the workerTimeOut option), which fails if a worker does not call workerIsWorking before the timeout expires.

In a clustered server, the check is considered failed only once more than half of the workers have done no useful work within the time period. The time period is adjustable through the workerTimeOut option, and the fraction through the workerPercentFailed option.
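The majority rule described above can be sketched as a small standalone function. This is only an illustration of the rule, not the module's actual implementation: the name workersHealthy and the lastSeen array of timestamps are hypothetical, and the sketch assumes the check fails only when the stale fraction is strictly greater than workerPercentFailed.

```javascript
// Hypothetical helper, not part of the wellness API.
// lastSeen: timestamps (ms) of each worker's most recent workerIsWorking() call.
function workersHealthy(lastSeen, now, workerTimeOut, workerPercentFailed) {
    // A timeout of 0 (the documented default) disables the check.
    if (workerTimeOut === 0 || lastSeen.length === 0)
        return true;

    // Count the workers that have been silent for longer than the timeout.
    var failed = lastSeen.filter(function (t) {
        return now - t > workerTimeOut;
    }).length;

    // Assumption: unhealthy only when the failed fraction exceeds the threshold.
    return failed / lastSeen.length <= workerPercentFailed;
}

console.log(workersHealthy([9500, 9000, 2000, 1000], 10000, 2000, 0.5)); // true (2 of 4 stale)
console.log(workersHealthy([9500, 3000, 2000, 1000], 10000, 2000, 0.5)); // false (3 of 4 stale)
```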

NOTES:

  • If the option workerTimeOut is 0, the default, there is no worker timeout.
  • If there is no expressApp option parameter, wellness creates its own express application and adds the healthcheck route to it.
  • The health check only returns 500 when NODE_ENV === 'production'.
  • To force the healthcheck to always fail, set the environment variable HEALTHCHECK_ALWAYS_FAILS to "true".
  • If process.platform === "linux", the linux distribution information is shown on the output.
  • All the healthchecks appear in an array called "healthChecks".
  • If any of the healthChecks fail, the response has an HTTP status of 500; otherwise the status is 200.

Example Usage, Non-Clustered Express Server

var wellness = require('wellness');
var express = require('express');
var app = express();

var opts = {
    healthCheckUriPath: '/healthcheck',
    expressApp: app,
    workerTimeOut: 50000
};

function doSomethingUseful() { wellness.workerIsWorking(); }

wellness.nonClusterInit(opts, function(err) {
    if (err)
        return console.error(err.message);

    app.listen(3000, function () {
        console.log('Example app listening on port 3000!');
        doSomethingUseful();
        setInterval(doSomethingUseful, 1000);
    });
});

Running the above code, you can test with:

$ curl  http://localhost:3000/healthcheck

This produces output like the following:

{
  "healthChecks": [
    "checkWorkers: 8 workers were okay out of 8"
  ],
  "nodeVersions": {
    "ares": "1.10.1-DEV",
    "http_parser": "2.7.0",
    "icu": "57.1",
    "modules": "48",
    "node": "6.2.2",
    "openssl": "1.0.2h",
    "uv": "1.9.1",
    "v8": "5.0.71.52",
    "zlib": "1.2.8"
  },
  "platform": {
    "code": "xenial",
    "name": "Ubuntu 16.04.1 LTS",
    "os": "Ubuntu",
    "release": "16.04"
  },
  "version": "4.0.0"
}

NOTE: The platform output for Linux has more information than for the Windows or OS X targets.
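A monitoring script might reduce a response like the one above to a short summary. The following is a minimal sketch under stated assumptions: summarizeHealth is a hypothetical client-side helper, not part of wellness, and it assumes the "version" field is the application version read from package.json. The field names are taken from the sample output.

```javascript
// Hypothetical client-side helper; not part of the wellness module.
function summarizeHealth(statusCode, body) {
    var checks = Array.isArray(body.healthChecks) ? body.healthChecks : [];
    return {
        healthy: statusCode === 200,   // 500 signals a failed check
        appVersion: body.version || 'unknown',
        nodeVersion: (body.nodeVersions && body.nodeVersions.node) || 'unknown',
        checks: checks
    };
}

// A trimmed copy of the sample response shown above.
var sample = {
    healthChecks: ['checkWorkers: 8 workers were okay out of 8'],
    nodeVersions: { node: '6.2.2' },
    version: '4.0.0'
};

console.log(summarizeHealth(200, sample));
```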

Example Usage, Clustered Express Server

var wellness = require('wellness');
var express = require('express');
var app = express();
var numCPUs = require('os').cpus().length;

var opts = {
    healthCheckUriPath: '/healthcheck',
    expressApp: app,
    workerTimeOut: 2000,
    numWorkers: numCPUs
};

function doSomethingUseful() {
    wellness.workerIsWorking();
}
var cluster = require('cluster');

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++)
        cluster.fork();
    wellness.clusterPostForkInit();
} else {
    wellness.clusterPostForkInit(opts, function(err) {
        if (err) {
            console.error(err.message);
            return;
        }
        app.listen(3000, function () {
            console.log('Example app listening on port 3000!');
            doSomethingUseful();
            setInterval(doSomethingUseful, 1000);
        });
    });
}

Example Usage, Non-Clustered Express Server

var wellness = require('wellness');
var express = require('express');
var app = express();

var opts = {
    healthCheckUriPath: '/healthcheck',
    expressApp: app,
    workerTimeOut: 3000
};

function doSomethingUseful() {
    wellness.workerIsWorking();
}

wellness.nonClusterInit(opts, function(err) {
    if (err)
        return console.error(err.message);

    app.listen(3000, function () {
        doSomethingUseful();
        setInterval(doSomethingUseful, 1000);
    });
});

API

nonClusterInit(opts, callback)

The init method sets up wellness for use with a non-clustered server. The following properties can be set on the opts object to configure wellness:

  • healthCheckUriPath - A string URI path, e.g. '/healthcheck'. OPTIONAL default is '/healthcheck'.
  • workerPercentFailed - A decimal fraction (e.g. 0.50, which is 50%); when the fraction of workers that have failed is greater than this value, the worker healthcheck fails. OPTIONAL default is 0.50 (50%).
  • expressApp - The express application to which the health check route is added. OPTIONAL wellness creates its own express application when none is given.
  • port - The port the express instance listens on when wellness creates its own because no expressApp was given. OPTIONAL default value is 8889.
  • workerTimeOut - A number of milliseconds; a worker that has not called workerIsWorking() within this time is considered "dead". OPTIONAL default is 0, meaning no checks for worker timeout.
  • logger - An object with info, warn, and error methods to use for logging. OPTIONAL uses console output by default.
  • packageJsonPath - Path to the package.json file, so the health check can report the application version. OPTIONAL if not supplied, process.cwd() is assumed.
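To see which values are in effect when only some options are set, the documented defaults can be written out and merged with your overrides. This is a sketch for illustration only: DOCUMENTED_DEFAULTS and buildOpts are hypothetical names, and wellness presumably applies these defaults itself, so the helper is not required.

```javascript
// Defaults as stated in the option list above; illustrative only.
var DOCUMENTED_DEFAULTS = {
    healthCheckUriPath: '/healthcheck',
    workerPercentFailed: 0.50,
    port: 8889,
    workerTimeOut: 0
};

// Hypothetical helper: copy the defaults, then apply the caller's overrides.
function buildOpts(overrides) {
    var opts = {};
    Object.keys(DOCUMENTED_DEFAULTS).forEach(function (k) {
        opts[k] = DOCUMENTED_DEFAULTS[k];
    });
    Object.keys(overrides || {}).forEach(function (k) {
        opts[k] = overrides[k];
    });
    return opts;
}

// No expressApp is given, so per the list above wellness would create
// its own express instance on the default port.
var opts = buildOpts({ workerTimeOut: 5000 });
console.log(opts);
```

The resulting object has the same shape as the opts passed to nonClusterInit in the examples above.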

clusterPostForkInit(opts, callback)

The init method sets up wellness for use with a clustered server. The properties that can be set on the opts object to configure wellness are the same as on nonClusterInit.

addCheck(func)

Adds a check to the list of health checks to be performed. The health check function receives a callback and must call it with an error as the first argument and an optional successful status as the second argument. Here is a disk space check as an example:

var diskspace = require('diskspace');
var wellness = require('wellness');
var is = require('is2');
var inspect = require('util').inspect;

/**
 * The check for free diskspace.
 * @param {function} cb - A standard call back for async.parallel
 */
function checkDiskSpace(cb) {
    // Without a usable callback there is nothing to call, so throw instead.
    if (!is.func(cb))
        throw new Error('Bad cb argument checkDiskSpace: '+inspect(cb));

    diskspace.check('/', function (err, total, free) {
        if (err)
            return cb(err);
        var freePercent = Math.floor((free / total)*1000) / 10;

        if (freePercent < 20) {
            wellness.setError();
            return cb(null, 'Low diskspace: '+freePercent+'%', false);
        }

        wellness.clearError();
        return cb(null, 'Free diskspace: '+freePercent+'%', true);
    });
}

wellness.addCheck(checkDiskSpace);

Every health check function is run through async.parallel, so it must NOT pass an error as the first callback argument if you want all the health checks to run.

Also, for every success case, pass a non-null truthy value. If there is no value or the value is "falsey", the status code returned from the health check is 500, indicating server failure.
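One way to follow both rules is to wrap each check so that an error is turned into a status message instead of being passed along. The sketch below is an assumption-laden illustration, not part of wellness: safeCheck, its name argument, and its optional onFail hook are hypothetical, and the three-argument callback shape simply mirrors the checkDiskSpace example above.

```javascript
// Hypothetical wrapper; not part of the wellness API.
// onFail is optional, e.g. function () { wellness.setError(); }
function safeCheck(name, check, onFail) {
    return function (cb) {
        check(function (err, message, ok) {
            if (err) {
                if (typeof onFail === 'function')
                    onFail(err);
                // Report the failure as a message so async.parallel keeps going.
                return cb(null, name + ' failed: ' + err.message, false);
            }
            // Always hand back a truthy status string on success.
            cb(null, message || name + ': ok', ok !== false);
        });
    };
}

// Demo with a check that always errors.
var results = [];
var wrapped = safeCheck('db', function (cb) {
    cb(new Error('connection refused'));
});
wrapped(function (err, message, ok) {
    results.push([err, message, ok]);
});
console.log(results);
```

The wrapped function has the same single-callback signature as checkDiskSpace.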

workerIsWorking()

A call that workers, or the process of a non-clustered server, make to signal that they are doing useful work. Once over half the workers have not signalled using this function within the workerTimeOut time period, the health check fails.

You must set workerTimeOut to a value greater than 0 for this function to be useful.
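The examples above call workerIsWorking on a timer. An alternative, sketched here as an assumption rather than a recommendation from the module, is to signal only after a unit of real work completes, so a worker that is stuck or throwing stops signalling. withHeartbeat is a hypothetical helper; in a real application the signal argument would be a function that calls wellness.workerIsWorking().

```javascript
// Hypothetical helper; not part of the wellness API.
function withHeartbeat(work, signal) {
    return function () {
        var result = work.apply(this, arguments);
        signal();   // reached only if work() did not throw
        return result;
    };
}

// Demo with a counter standing in for wellness.workerIsWorking.
var beats = 0;
var processJob = withHeartbeat(
    function (n) { return n * 2; },
    function () { beats++; }
);

console.log(processJob(21), beats); // 42 1
```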