README
slimtomato
A slim website automation system for node.js
Introduction
The goal of the "slim tomato" - whose name comes from "au[tomato]r", by the way - is to fill the need of a very basic and slim library to automate interaction with simple websites. It provides a simple and fast way to simulate user interaction at "HTTP level" - sending a request and then following up with "clicking" links, filling forms, and so on.
Let's start with an example to see what the tomato can do:
/* This example attempts to log into the CTEC backend and extract the user's full name */
// Require the tomato
const {Tomato, Request, FormFiller, Assertion, Callback} = require('slimtomato')
// Credentials to log in with (of course, these won't work)
const username = '1234'
const password = 'ABC123'
// Create a tomato, with some lifecycle hooks for debug output and some default headers
const tomato = new Tomato({
  beforeStep: step => console.log(`Executing step: ${step.name}`),
  afterStep: step => console.log(`Step done: ${step.name}`),
  beforeRequest: (step, request) => console.log(`Step "${step.name}" is sending request:`, request),
  requestOptions: {headers: {Accept: '*/*', 'User-Agent': 'slimtomato/1'}}
})
// Execute our automation pipeline
tomato.runSteps([
  // First, open the login page
  new Request('Open login page', {uri: 'https://www.ctec.org/Provider/Logon'}),
  // Then, attempt to log in by filling and submitting the login form
  new FormFiller('Log in', {
    selector: '#form1',
    callback: dataObject => { // dataObject already contains the initial values
      // These weird names are what the website uses, don't blame me...
      dataObject.ctl00$ctl00$cphBodyContent$cphContent$txtUserID = username
      dataObject.ctl00$ctl00$cphBodyContent$cphContent$txtPassword = password
    },
    autoRequest: true
  }),
  // Before we continue, verify that the previous step (our login) was successful
  new Assertion('Verify that login worked', {
    // Check that we were redirected to the account dashboard
    callback: result => result.request.uri.href === 'https://www.ctec.org/Provider/Account/Main/',
    // If not, return the actual URL as part of the error message
    explanator: result => `URL was ${result.request.uri.href}`
  }),
  // Let's extract the user's full name
  new Callback('Extract full name', {
    callback: ({$}) => $('#cphBodyContent_cphContent_uclProviderDetails_lblName').text()
  })
]).then(userFullName => {
  // The result from the tomato's promise was the result of its last step
  // That is, our user's full name
  console.log(`Hello, ${userFullName}!`)
}).catch(console.error) // If something failed, we'd see it
slimtomato does:
- Allow writing a pipeline of automation steps which are executed in order and depend on one another (with support for assertions and custom steps)
- Provide access to the DOM of a webpage using familiar jQuery-style syntax
- Allow conveniently filling and submitting web forms (honoring default values and hidden fields), including support for file uploads
- Allow conveniently "clicking" links on a page
- Remember cookies
- Send a Referer header where applicable
slimtomato does not:
- Execute website JavaScript
- Load referenced resources like images, stylesheets, etc.
- Render anything
- Try to emulate a browser (with all the expected HTTP headers, etc.)
- ...any other fancy bells and whistles like this
slimtomato is a thin wrapper around request-promise (actually request-promise-native), combined with cheerio (something like a "jQuery for node.js") to allow working easily with the HTML that is coming from the website, with a bit of logic on top that simplifies working with HTML forms.
This means that slimtomato is meant to be used to automate "simple" (and/or older) websites which neither depend on client-side features nor employ active bot prevention (such as many "internal" websites which become relevant when automating business workflows in companies). If this is not enough for your use case, take a look at full-blown "headless browser" solutions like PhantomJS, which can work with more websites than slimtomato can.
How to install? As usual: npm i slimtomato 😊
Core concept
Tomatoes and steps
The slimtomato package exports a number of ES6 classes: Tomato and the different step types.
An instance of Tomato is called "a tomato". A tomato can have state (although by default, the only state it has is a cookie jar, accessible as tomato.jar). Call tomato.runSteps(arrayOfSteps) to run a pipeline on the tomato.
An automation pipeline consists of a number of steps. Steps are executed in order, and each step has access to the previous step's result. A step is usually expected to return a result value (or another step, which is then executed immediately and whose result is forwarded). The last step's result is made available to the surrounding code that uses the tomato.
There are different kinds of steps (such as Request or FormFiller). All steps are derived from a base class Step. Steps usually have options.
Every step has a name. It is used both for briefly documenting the pipeline in the code and for debugging, since it can be accessed in many places and is also by default included in all error messages.
To add a step to your pipeline, you create an instance of the desired step type, usually like this: new StepType(name, options).
Results and options
Steps usually have a result. Steps can also access the previous step's result. This makes it possible to write steps which transform data from previous steps. If a step returns another step instance as its "result", the tomato executes the newly created step immediately, before continuing with the next one in the pipeline. Note: Such a "generated" step cannot access previous data at the moment.
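As a rough model of the semantics described above, here is a minimal standalone sketch. It does not use slimtomato and is not its actual implementation; SketchStep and runPipeline are hypothetical names made up for illustration.

```javascript
// Stand-in model of a pipeline runner; NOT slimtomato's real code.
class SketchStep {
  constructor (name, run) {
    this.name = name
    this.run = run // (prevResult) => value | promise | another SketchStep
  }
}

async function runPipeline (steps) {
  let result
  for (const step of steps) {
    // Each step receives the previous step's result
    let output = await step.run(result)
    // A step that returns another step gets that step executed immediately.
    // The generated step is not handed the previous data (see the note above).
    while (output instanceof SketchStep) {
      output = await output.run(undefined)
    }
    result = output
  }
  // The last step's result is what the caller gets
  return result
}

const pipelineDemo = runPipeline([
  new SketchStep('Produce a number', () => 20),
  new SketchStep('Double it', n => n * 2),
  new SketchStep('Hand over to a generated step', n =>
    new SketchStep('Generated', () => `final:${n + 2}`))
])
pipelineDemo.then(value => console.log(value)) // prints "final:42"
```

The generated step in this sketch only sees the value it captured when it was created, which mirrors the limitation mentioned in the note.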
Steps usually take options which define their parameters. There are three ways to set options:
- By specifying an object. This will simply use the options defined in the object.
  new SomeStepType('Something', { hello: 'world' })
- By specifying a function (can be async). The function will get the previous step's result as parameter and should return the final options object. This allows setting options dynamically.
  new SomeStepType('Something', result => ({ hello: result.world }))
- By specifying true. This will use the previous step's result as options.
  new SomeStepType('Something', true)
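The three ways of passing options could be resolved roughly as in the following sketch. resolveOptions is a hypothetical helper written for illustration only; it is not part of slimtomato's API and the library may handle this differently internally.

```javascript
// Hypothetical helper, not part of slimtomato: turns an options "spec"
// into the final options object, given the previous step's result.
async function resolveOptions (spec, prevResult) {
  if (spec === true) return prevResult // use previous result as options
  if (typeof spec === 'function') return spec(prevResult) // may be async
  return spec // plain object, used as-is
}

const previousResult = { world: 'from previous step' }

const optionsDemo = Promise.all([
  resolveOptions({ hello: 'world' }, previousResult),
  resolveOptions(result => ({ hello: result.world }), previousResult),
  resolveOptions(true, previousResult)
])

optionsDemo.then(([fromObject, fromFunction, fromTrue]) => {
  console.log(fromObject) // { hello: 'world' }
  console.log(fromFunction) // { hello: 'from previous step' }
  console.log(fromTrue) // { world: 'from previous step' }
})
```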
Tomato class
A tomato is instantiated like this: new Tomato(options).
options is an optional object which can have the following properties:
- jar: A request cookie jar. If not specified, a new one is created automatically (and can be read using tomato.jar later).
- beforeStep: A function with parameter step which is called before a step is executed. step is at the moment just an object {name}.
- afterStep: A function with parameter step which is called after a step was executed. step is at the moment just an object {name, result}.
- beforeRequest: A function with parameters (step, request) which is called before the Request step type initiates an HTTP request. step is at the moment just an object {name} and request is the options object passed to request-promise.
- requestOptions: Object of default request options used with the request module. The options passed directly to a Request step have priority; the headers property is merged.
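To illustrate how the requestOptions defaults might combine with a step's own options, here is a small sketch. mergeRequestOptions is a hypothetical function, not slimtomato's code, and it assumes that on a header name conflict the step's header wins.

```javascript
// Hypothetical merge helper: step options take priority over defaults,
// while the headers objects are merged key by key.
// Assumption: on a conflicting header name, the step's value wins.
function mergeRequestOptions (defaults = {}, stepOptions = {}) {
  return {
    ...defaults,
    ...stepOptions,
    headers: { ...defaults.headers, ...stepOptions.headers }
  }
}

const tomatoDefaults = {
  timeout: 5000,
  headers: { Accept: '*/*', 'User-Agent': 'slimtomato/1' }
}
const requestStepOptions = {
  uri: 'https://example.com/',
  timeout: 1000,
  headers: { Accept: 'text/html' }
}

const mergedOptions = mergeRequestOptions(tomatoDefaults, requestStepOptions)
console.log(mergedOptions.timeout) // 1000
console.log(mergedOptions.headers) // { Accept: 'text/html', 'User-Agent': 'slimtomato/1' }
```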
Additionally, you can add arbitrary fields into the options object, which will become properties of the created tomato.
The tomato instance additionally exposes a lastStep property which is again an object {name, result}.
For running a pipeline, the tomato instance has one method: runSteps. It takes an array of step instances as parameter and it returns a promise that resolves to the last step's result. If any step in the pipeline throws an error, the promise is rejected.
Step types
This section describes all the step types available by default. "Input" describes how the previous step's result is used (if at all), "Options" describes which options are supported, and "Output" describes what the step's result will be.
Request
The Request step type is "the core" of slimtomato. Most of the time, you will operate on results from this step type. It is a thin wrapper around request-promise (actually request-promise-native). Additionally, the website's response is parsed using cheerio.
- Input: Ignored.
- Options: Mostly the options object passed to request-promise, with small differences:
  - By default the jar property is initialized with tomato.jar (to allow remembering cookies), and followAllRedirects: true is set. These can be overwritten in the options object.
  - A new property raw: true may be set to prevent parsing the response body with cheerio.
- Output: The return value of the request-promise call, augmented with a $ property with cheerio's result. The most important properties are:
  - $: DOM parsed by cheerio. You can use it for most intents and purposes like jQuery's $ to access the returned HTML.
  - body: Raw response body.
  - request: Describes the final request, after redirects (so you can use result.request.uri.href to verify that you "landed" on the right page after a login, for example).
Example:
new Request('Open a page', {
  uri: 'https://www.google.com'
})
// Result will allow you to use things like result.$('[name=btnK]').text()
Callback
The Callback step type simply executes the specified callback with the result of the previous step as parameter and passes the return value as result to the next step. It can be used to transform values or to store or log them.
- Input: Passed to the callback.
- Options:
  - callback: A function (can be async) which receives the previous step's result as parameter. The return value of this function is used as this step's result. (So, if you are only observing something, don't forget to return result at the end.)
- Output: Return value of the callback.
Example:
new Callback('Log search button text', {
  callback: result => {
    console.log('Search button text:', result.$('[name=btnK]').text())
    return result
  }
})
Assertion
The Assertion step type is a convenience wrapper around Callback which verifies that a given condition is true. If it is not, an exception is thrown, optionally with a customizable "explanation". It is designed to verify that the last step worked before continuing.
- Input: Passed to the callback.
- Options:
  - callback: A function (can be async) which receives the previous step's result as parameter and is expected to return a truthy or falsy value to indicate success or failure. If a falsy value is returned, an exception will be thrown.
  - explanator: Optional function (cannot be async!) which is invoked in case the callback failed. It also receives result as parameter and should return a string which describes why the assertion failed.
- Output: Input passed through.
Example:
new Assertion('Verify that we landed on the right page', {
  callback: result => result.request.uri.hostname.match(/google/),
  explanator: result => `URL was ${result.request.uri.href}`
})
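The behavior of an assertion can be modelled with a few lines of plain JavaScript. The sketch below is only an approximation under stated assumptions: runAssertion is a hypothetical function, and the exact wording of slimtomato's error messages is not known here, so the message format is made up.

```javascript
// Hypothetical model of an assertion step, not slimtomato's real code.
// Assumption: the error message format below is invented for illustration.
async function runAssertion (name, { callback, explanator }, result) {
  const ok = await callback(result) // callback may be async
  if (!ok) {
    // explanator is optional and called synchronously
    const why = explanator ? `: ${explanator(result)}` : ''
    throw new Error(`Assertion "${name}" failed${why}`)
  }
  return result // input is passed through unchanged
}

// Fake result that mimics the shape of a Request step's output
const fakeResult = { request: { uri: { href: 'https://example.com/login' } } }

const assertionDemo = runAssertion('Verify login', {
  callback: result => result.request.uri.href.endsWith('/dashboard'),
  explanator: result => `URL was ${result.request.uri.href}`
}, fakeResult).then(() => 'passed', error => error.message)

assertionDemo.then(message => console.log(message))
// prints: Assertion "Verify login" failed: URL was https://example.com/login
```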
LinkClicker
The LinkClicker step type extracts the URL of a link specified by a selector and prepares a request to that URL (including Referer). If autoRequest: true is specified, the request is also executed.
Note: At the moment, <base> tags are not handled.
- Input: Must be the result of a Request step (or a LinkClicker/FormFiller with autoRequest: true).
- Options:
  - selector: CSS selector of the link to be "clicked".
  - autoRequest: If set to true, the request is executed immediately (a Request step is generated to do so). Otherwise, only the request options are prepared.
- Output:
  - If autoRequest is enabled, the output will be the output of the generated Request step that simulated "clicking" the link.
  - If autoRequest is disabled, the output will be the request options object ready to be passed into a Request.
Example:
new LinkClicker('Click on "Settings" link', {
  selector: '#fsettl',
  autoRequest: true
})
// Result will be the "Settings" page
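To show what "preparing a request" for a clicked link can look like, here is a minimal sketch using Node's built-in URL class. prepareLinkRequest is a hypothetical helper, not slimtomato's code. It resolves the link's href against the current page URL (ignoring <base> tags, as noted above) and sets the Referer header.

```javascript
// Hypothetical helper: builds request options for "clicking" a link.
// pageUrl is the URL of the page containing the link, href is the
// value of the link's href attribute (absolute or relative).
function prepareLinkRequest (pageUrl, href) {
  return {
    uri: new URL(href, pageUrl).href, // resolve relative links
    headers: { Referer: pageUrl }
  }
}

const currentPage = 'https://example.com/account/main'

const relativeLink = prepareLinkRequest(currentPage, 'settings')
console.log(relativeLink.uri) // https://example.com/account/settings

const rootLink = prepareLinkRequest(currentPage, '/help')
console.log(rootLink.uri) // https://example.com/help
console.log(rootLink.headers) // { Referer: 'https://example.com/account/main' }
```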
FormFiller
The FormFiller step type is the most complex one. It simulates filling a web form.
It will identify a form based on a CSS selector, prefill an object with the default values already set in the form (including hidden fields), allow modifying this object (e.g. setting new values) and then prepare a request to submit the filled form (optionally using a specified submit button in case there are multiple), with the right headers and data. If autoRequest: true is specified, the request is also executed. Uploading files is also possible.
Note: At the moment, <base> tags are not handled.
- Input: Must be the result of a Request step (or a LinkClicker/FormFiller with autoRequest: true).
- Options:
  - selector: CSS selector identifying the form to submit.
  - submitSelector: Optional CSS selector identifying the submit button to use (in case there are multiple).
  - callback: A function that receives an object with the existing form data (default values, hidden fields) as parameter and can modify this object (for example, set new values).
    - The keys are the names of the form fields. The values are the string values of the form fields, or arrays with such values in case there was more than one field with the same name.
    - To upload files, set the file field to an object {value, options: {filename, contentType}} as described here.
  - autoRequest: If set to true, the request is executed immediately (a Request step is generated to do so). Otherwise, only the request options are prepared.
- Output:
  - If autoRequest is enabled, the output will be the output of the generated Request step that simulated "submitting" the form.
  - If autoRequest is disabled, the output will be the request options object ready to be passed into a Request.
Example:
new FormFiller('Submit search', {
  selector: '[name=f]',
  submitSelector: '[name=btnK]',
  callback: dataObject => {
    dataObject.q = 'Hello World'
  },
  autoRequest: true
})
// Result will be the search results page
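The shape of the data object handed to the callback can be illustrated with a small standalone sketch. collectFormData is a hypothetical helper, not slimtomato's implementation. It takes a list of [name, value] pairs (standing in for the fields found in a form) and builds an object in which repeated field names become arrays.

```javascript
// Hypothetical helper: builds the form data object from [name, value] pairs.
// A name that occurs more than once ends up with an array of values.
function collectFormData (fields) {
  const data = {}
  for (const [name, value] of fields) {
    if (!Object.prototype.hasOwnProperty.call(data, name)) {
      data[name] = value
    } else if (Array.isArray(data[name])) {
      data[name].push(value)
    } else {
      data[name] = [data[name], value]
    }
  }
  return data
}

// Made-up default values, as they might be found in a form
const formData = collectFormData([
  ['csrfToken', 'abc123'], // hidden field
  ['q', ''],
  ['lang', 'en'],
  ['lang', 'de'] // second field with the same name
])

// This is the kind of modification a FormFiller callback would make
const fillCallback = dataObject => { dataObject.q = 'Hello World' }
fillCallback(formData)

console.log(formData)
// { csrfToken: 'abc123', q: 'Hello World', lang: [ 'en', 'de' ] }
```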
Creating a custom step type
If you want to create your own step type, you can extend Step as follows:
const {Step} = require('slimtomato')
class MyNewStepType extends Step {
  run (tomato, prev) {
    // Implementation here.
    // tomato: The tomato instance
    // prev: Object {name, result} of last step
    // You can access this.options to get your step's current options.
    // You can return a value, a promise or another step instance.
  }
}
Of course, you can also extend a preexisting step type (but then don't forget to call super.run).
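As a concrete illustration, the following sketch defines a step type that waits for a while and then passes the previous result through. To keep it runnable on its own, it uses a tiny stand-in Step class instead of the one exported by slimtomato; the stand-in's constructor and the Delay step with its ms option are assumptions made for this example only.

```javascript
// Stand-in for slimtomato's Step so that this sketch runs without the library.
// With the library installed you would use: const {Step} = require('slimtomato')
// Assumption: the real Step stores name and options in a comparable way.
class Step {
  constructor (name, options) {
    this.name = name
    this.options = options
  }
}

// Hypothetical custom step: waits options.ms milliseconds,
// then returns the previous step's result unchanged.
class Delay extends Step {
  run (tomato, prev) {
    return new Promise(resolve => {
      setTimeout(() => resolve(prev.result), this.options.ms)
    })
  }
}

// Calling run() by hand here; in a real pipeline the tomato would do this
const delayDemo = new Delay('Wait a moment', { ms: 10 })
  .run(null, { name: 'Previous step', result: 'unchanged' })

delayDemo.then(result => console.log(result)) // prints "unchanged"
```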