puppeteer-by-steps

Use puppeteer by instructions


Puppeteer by steps

This package is a wrapper around Puppeteer that works by running an array of steps.

It lets you do basic scraping in an easy way.

Get started

Init

Get a new instance of Scrapper

Scrapper(<dimensions>, <showBrowser>, <steps>, <objectData>, <customChromium>)

dimensions: {
    height: <number>,
    width: <number>
}

showBrowser: <boolean>

steps: [{
    type: <string>,
    ...propsAccordingToType
}]

objectData: {
    yourProp1: 'prop1',
    yourProp2: 'prop2',
    yourProp3: 'prop3'
}

customChromium: <string>
// If customChromium is '', the Chromium installed by Puppeteer is used; otherwise, the Chromium passed in is used.

const Scrapper = require('puppeteer-by-steps');

(async () => {
    // newSteps and transformedData are your own steps array and objectData object
    const s = new Scrapper({ width: 1366, height: 768 }, true, newSteps, transformedData);
    await s.init(); // Initial setup required by Puppeteer
    await s.scrap(); // Start the scraping process
})();
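
As a rough sketch of how the constructor arguments fit together, the snippet below defines a small steps array and an objectData object and then runs them. The names `newSteps`, `transformedData`, and `runScrapper` are illustrative, the URL and selectors are placeholders, and the Scrapper class is passed in as a parameter so the snippet makes no assumption about how the package is loaded.

```javascript
// Illustrative steps: visit a page, wait for a heading, then store its text.
const newSteps = [
    { type: 'go-to', link: 'https://example.com', waitFor: 0 },
    { type: 'wait-for-selector', selector: 'h1', timeout: 10000 },
    { type: 'collect-data', prop: 'title', selector: 'h1' }
];

// Illustrative objectData: values that 'dynamic' fill-data entries could read.
const transformedData = { user_phone: '18601234567' };

// Hypothetical helper: builds an instance, runs the steps, and returns the collected data.
async function runScrapper(Scrapper) {
    const s = new Scrapper({ width: 1366, height: 768 }, false, newSteps, transformedData);
    await s.init();
    await s.scrap();
    const data = s.getCollectedData();
    await s.closeBrowser();
    return data;
}

// Usage (assumes the package is installed):
// runScrapper(require('puppeteer-by-steps')).then(console.log);
```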

Methods

async init

You must call this method after you instantiate the Scrapper. It sets up your first page, which is named main.

Use:

await s.init();

getBrowser

This method returns the Puppeteer browser. You can call any Puppeteer browser method on it.

s.getBrowser();

getCurrentPage

This method returns the Puppeteer page that is currently in use. You can call any Puppeteer page method on it.

s.getCurrentPage();

async createPage (name: <string>)

This method will create a new page in the browser. It receives a name for the new page.

Use:

await s.createPage(name);

Example:

await s.createPage('secondPage');

async selectPage (name: <string>)

This method switches between pages by name. If no page exists with the given name, the browser switches to the main page.

Use:

await s.selectPage(name);

Example:

await s.selectPage('secondPage');
await s.selectPage('main');

async selectPageByIndex (index: <number>)

This method switches between pages using the index of the page in the browser's array of page instances.

Use:

await s.selectPageByIndex(index);

Example:

await s.selectPageByIndex(0);
await s.selectPageByIndex(1);
await s.selectPageByIndex(5);
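
The page methods above can be combined as in the following sketch. The function name `openSecondPage` is hypothetical, `s` is assumed to be an initialized Scrapper instance, and the code relies only on the methods described in this section. It selects the new page explicitly because this README does not say whether createPage also switches to it.

```javascript
// Hypothetical helper: opens a second named page, grabs it, then returns to 'main'.
async function openSecondPage(s) {
    await s.createPage('secondPage');   // create a page named 'secondPage'
    await s.selectPage('secondPage');   // make it the current page
    const page = s.getCurrentPage();    // Puppeteer page currently in use
    await s.selectPage('main');         // switch back to the first page
    return page;
}
```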

async scrap

This method will run the different steps which have been passed to the Scrapper.

Use:

await s.scrap();

getCollectedData

This method returns the data collected across the different steps.

  • If a string parameter is passed, the function returns the stored property with that name.
  • If no parameter is passed, an array with all the collected data is returned.

Use:

const fullData = s.getCollectedData();
const specificData = s.getCollectedData('propertyStored');

getSteps

This method returns the array of current steps.

Use:

const data = s.getSteps();

setSteps

This method sets the steps.

Use:

const data = s.setSteps([<Step>]);
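
One way getSteps and setSteps might be used together is to extend the steps of an existing instance. The sketch below assumes only that getSteps returns an array and that setSteps accepts one; the helper name `appendStep` is hypothetical.

```javascript
// Hypothetical helper: appends one step to whatever steps the instance already has.
function appendStep(s, step) {
    const updated = [...s.getSteps(), step]; // copy, so the original array is not mutated
    s.setSteps(updated);
    return updated;
}
```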

async closeBrowser

This method closes the browser.

Use:

await s.closeBrowser();

Methods (Steps)

You can invoke each of these individually or use them as entries in the steps array.

click

Definition:

{
    type: 'click',
    selector: <string:required>, // Selector used to complete the action
    waitFor: <number:default=0> // Milliseconds to wait after completing the action
}

Example:

{
    type: 'click',
    selector: 'a.mylink',
    waitFor: 10
}
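
A click often triggers content that takes a moment to appear, so a click step can be paired with the wait-for-selector step described later in this README. The helper name `clickThenWait` and the default timeout are assumptions made for illustration.

```javascript
// Hypothetical helper: click a selector, then wait until another selector appears.
function clickThenWait(clickSelector, waitSelector, timeout = 10000) {
    return [
        { type: 'click', selector: clickSelector, waitFor: 0 },
        { type: 'wait-for-selector', selector: waitSelector, timeout }
    ];
}

// Two steps that can be spread into a larger steps array.
const clickSteps = clickThenWait('a.mylink', '.mydiv');
```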

collect-data

Definition: You can store data inside an object of your Scrapper instance; this data can be retrieved using the getCollectedData method.

{
    type: 'collect-data',
    prop: <string:required>, // Name of the prop inside the *collectedData*
    selector: <string:required>, // Selector used to complete the action
    contentType: <string:default=innerText:options=innerText,outerHTML>, // Type of information to extract from the selector
    multiple: <boolean:default=false> // By default only one value is returned; if true and more than one element matches the selector, an array is returned
}

Example not-multiple:

{
    type: 'collect-data',
    prop: 'title',
    selector: 'h1',
    contentType: 'innerText',
    multiple: false
}

Example multiple:

{
    type: 'collect-data',
    prop: 'subtitles',
    selector: 'h3',
    contentType: 'innerText',
    multiple: true
}
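
To make the difference between the two examples concrete, here is a small model of the documented behaviour. It is not the package's implementation: plain objects stand in for matched DOM elements, the function name `pickContent` is hypothetical, and returning the first match when multiple is false is an assumption.

```javascript
// Illustrative model only: mimics the documented single-vs-multiple behaviour
// on plain objects standing in for matched DOM elements.
function pickContent(matches, contentType = 'innerText', multiple = false) {
    if (!['innerText', 'outerHTML'].includes(contentType)) {
        throw new Error(`Unsupported contentType: ${contentType}`);
    }
    const values = matches.map((el) => el[contentType]);
    return multiple ? values : values[0]; // assumption: a single result is the first match
}

const headings = [
    { innerText: 'First', outerHTML: '<h3>First</h3>' },
    { innerText: 'Second', outerHTML: '<h3>Second</h3>' }
];

console.log(pickContent(headings, 'innerText', false)); // First
console.log(pickContent(headings, 'innerText', true));  // [ 'First', 'Second' ]
```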

fill-data

Definition: You can set values on inputs, selects, or radio buttons.

{
    type: 'fill-data',
    data: [<Data>],
    waitFor: <number:default=0> // Milliseconds to wait after completing the action
}

<Data>: {
    type: <string:required:options=input,select,radio>,
    selector: <string:required>, // Selector used to complete the action
    origin: <string:options=static,dynamic>, // 'static' takes the value of the prop *value*; 'dynamic' takes the value from the *objectData* passed when the instance was created
    value: <string>, // For radio buttons, this is the index of the option. Example: '0' or '2'
    waitFor: <number:default=0> // Milliseconds to wait after completing the action
}

Example static value:

{
    type: 'fill-data',
    data: [{
        type: 'input',
        selector: '#homeaddress',
        origin: 'static',
        value: 'street 32', // *street 32* will be the value set
        waitFor: 1000
    },{
        type: 'input',
        selector: '#phonenumber',
        origin: 'static',
        value: '18601234567' //*18601234567* will be the value set
    }],
    waitFor: 0 // Milliseconds to wait after completing the action
}

Example dynamic value:

{
    type: 'fill-data',
    data: [{
        type: 'input',
        selector: '#homeaddress',
        origin: 'dynamic',
        value: 'user_home_address' // The value set will be objectData.user_home_address (passed when creating the Scrapper instance)
    },{
        type: 'input',
        selector: '#phonenumber',
        origin: 'dynamic',
        value: 'user_phone' // The value set will be objectData.user_phone (passed when creating the Scrapper instance)
    }],
    waitFor: 2000
}
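
The static and dynamic origins described above can be modelled in a few lines. This is an illustration of the documented rule, not the package's code: the function name `resolveValue` is hypothetical, and treating a missing origin as static is an assumption.

```javascript
// Illustrative model only: resolves the value that one <Data> entry would set.
function resolveValue(entry, objectData) {
    if (entry.origin === 'dynamic') {
        return objectData[entry.value]; // look the key up in objectData
    }
    return entry.value; // 'static' (or no origin): use the value as written
}

const sampleObjectData = { user_home_address: 'street 32', user_phone: '18601234567' };

const dynamicEntry = { type: 'input', selector: '#phonenumber', origin: 'dynamic', value: 'user_phone' };
const staticEntry = { type: 'input', selector: '#homeaddress', origin: 'static', value: 'street 32' };

console.log(resolveValue(dynamicEntry, sampleObjectData)); // 18601234567
console.log(resolveValue(staticEntry, sampleObjectData));  // street 32
```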

go-to

Definition:

Definition of the Puppeteer options for page.goto

{
    type: 'go-to',
    link: <string:url:required>, // URL which will be visited
    waitUntil: <string:default=load:options=load,domcontentloaded,networkidle0,networkidle2>,
    timeout: <number>, // milliseconds
    waitFor: <number:default=0> // Milliseconds to wait after completing the action
}

Example:

{
    type: 'go-to',
    link: 'https://github.com/zetogk',
    waitFor: 0
}
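
Because a mistyped waitUntil value or a malformed link only surfaces once the browser is running, it can help to build go-to steps through a small checking function. The helper name `goToStep` is hypothetical; the list of waitUntil options is taken from the definition above.

```javascript
const WAIT_UNTIL_OPTIONS = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Hypothetical helper: builds a go-to step and rejects bad input early.
function goToStep(link, { waitUntil = 'load', timeout, waitFor = 0 } = {}) {
    if (!WAIT_UNTIL_OPTIONS.includes(waitUntil)) {
        throw new Error(`Unknown waitUntil value: ${waitUntil}`);
    }
    new URL(link); // throws a TypeError if link is not a valid absolute URL
    const step = { type: 'go-to', link, waitUntil, waitFor };
    if (timeout !== undefined) step.timeout = timeout; // only include timeout when given
    return step;
}

const profileStep = goToStep('https://github.com/zetogk', { waitUntil: 'networkidle2' });
```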

press-key

Definition:

List of keys - US Keyboard Layout

{
    type: 'press-key',
    key: <string:required>, // See the US keyboard layout list for the valid keys
    waitFor: <number:default=0> // Milliseconds to wait after completing the action
}

Example:

{
    type: 'press-key',
    key: 'Escape',
    waitFor: 1000
}
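
A common use of press-key is submitting a form after filling it. The sketch below combines a fill-data step with a press-key step using the Enter key; the helper name `searchSteps` is hypothetical. Whether the key press reaches the input depends on which element has focus at that point, which this README does not specify, so treat the pairing as an assumption to verify.

```javascript
// Hypothetical helper: type a term into an input, then press Enter to submit it.
function searchSteps(inputSelector, term) {
    return [
        {
            type: 'fill-data',
            data: [{ type: 'input', selector: inputSelector, origin: 'static', value: term }],
            waitFor: 0
        },
        { type: 'press-key', key: 'Enter', waitFor: 1000 }
    ];
}
```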

screenshot

Definition:

{
    type: 'screenshot'
}

Example:

{
    type: 'screenshot'
}

wait-for-selector

Definition:

{
    type: 'wait-for-selector',
    selector: <string:required>, // Selector used to complete the action
    timeout: <number> // milliseconds
}

Example:

{
    type: 'wait-for-selector',
    selector: '.mydiv',
    timeout: 10000
}
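
Pulling together the required properties from the step definitions above, a steps array can be checked before it is handed to the Scrapper. This is a sketch, not part of the package: `validateSteps` is a hypothetical name, only the step types documented in this README are covered, and treating data as required for fill-data is an assumption since its definition does not mark it.

```javascript
// Required properties per step type, based on the definitions in this README.
const REQUIRED_PROPS = {
    'click': ['selector'],
    'collect-data': ['prop', 'selector'],
    'fill-data': ['data'], // assumption: not marked as required in the definition
    'go-to': ['link'],
    'press-key': ['key'],
    'screenshot': [],
    'wait-for-selector': ['selector']
};

// Hypothetical helper: returns a list of problems; an empty list means nothing was flagged.
function validateSteps(steps) {
    const problems = [];
    steps.forEach((step, index) => {
        const required = REQUIRED_PROPS[step.type];
        if (!required) {
            problems.push(`step ${index}: unknown type "${step.type}"`);
            return;
        }
        for (const prop of required) {
            if (step[prop] === undefined) {
                problems.push(`step ${index} (${step.type}): missing "${prop}"`);
            }
        }
    });
    return problems;
}
```

Calling `validateSteps` on an array that contains `{ type: 'click' }` would report the missing selector before any browser is launched.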
