@sifrr/seo

Server Side Redering for any js based app as a express middleware.

Usage no npm install needed!

<script type="module">
  import sifrrSeo from 'https://cdn.skypack.dev/@sifrr/seo';
</script>

README

sifrr-seo ยท npm version Doscify

Server Side Pre-Redering for any js based app using puppeteer (headless chrome) with caching. Mainly focused on serving rendered content to crawlers/bots.

Features

  • Works with Custom Elements, Shadow DOM
  • Add custom JS to execute before or after rendering
  • key based Caching

How to use

Do npm i @sifrr/seo or yarn add @sifrr/seo or add the package to your package.json file.

Api

Basic usage

SifrrSeo listens for load page event and waits for any fetch, xhr request to complete before returning rendered HTML. It doesn't load any media content on server.

const SifrrSeo = require('@sifrr/seo');

// options
// `cacheStore`: same as store in [node-cache-manager](https://github.com/BryanDonovan/node-cache-manager) options, default: memory store with 100MB storage
// `maxCacheSize`: Maximum in-memory cache size (in MegaBytes)
// `ttl`: time to live for a cache request (in Seconds) 0 means infinity
// `cacheKey`: function that returns cache key for given req object
// `fullUrl`: function for middleware to determine fullUrl of express request
// `beforeRender`: this function will be executed in browser before rendering, doesn't take any arguments
// `afterRender`: this function will be executed in browser after rendering, doesn't take any arguments
// `filterOutgoingRequests`: This function is executed for every outgoing request in sifrr renderer, if this return false request will be blocked, else it will be allowed
//
// default values
const options = {
  cacheStore: 'memory', // default in memory caching
  maxCacheSize: 100,
  ttl: 0,
  cacheKey: (url, headers) => url,
  beforeRender: () => {},
  afterRender: () => {},
  filterOutgoingRequests: (url) => true
}

const sifrrSeo = new SifrrSeo(/* Array of user agents to render for */, options);
// By default array is made up of these crawl bot user agents:
// 'Googlebot', // Google
// 'Bingbot', // Bing
// 'Slurp', // Slurp
// 'DuckDuckBot', // DuckDuckGo
// 'Baiduspider', //Baidu
// 'YandexBot', // Yandex
// 'Sogou', // Sogou
// 'Exabot', // Exalead

// Add your own user agent for which you want to server render
// You can give sub string of regex string like '(Google|Microsoft).*'
sifrrSeo.addUserAgent(/* string */ 'Opera Mini');

// add middleware to any connect/express like server
// for example in express:
const express = require('express');
const server = express();

// Only use for GET requests as a express middleware
server.get(sifrrSeo.getExpressMiddleware(/* function to get full url from express request */ expressReq => `http://127.0.0.1:80${expressReq.originalUrl}`));
server.listen(8080);

// Use it programatically - Only renders get urls
// these url, headers are passed to other functions
sifrrSeo.render(
  url, /* Full url of page to render with protocol, domain, port, etc. */,
  headers = {
    /* Headers to send with GET request */
  }
).then(html => ...).catch((e) => {
  // It won't render the page if [rendering logic](#rendering-logic) is not satisfied and will throw error.
  // e.message === 'No Render' when it doesn't render
});

node-cache-manager supports a lot of stores: list.

Rendering logic

sifrr-seo only renders a request if it has no Referer header (i.e. direct browser requests) and if shouldRender returns true and if content-type is html.

Changing shouldRender()

Change sifrrSeo.shouldRender, by default it returns this._isUserAgent(headers) (details). eg:

sifrrSeo.shouldRender = (url, headers) => {
  // req is request argument given by server (express/connect)
  // return true to render it server-side, return false to not render it.
  return this.isUserAgent(req) && req.fullUrl.indexOf('html') >= 0;
};

Clearing cache

By default, server side rendered html is cached till you restart the server or if you close the browser. You can manually clear cache using

sifrrSeo.clearCache();

Higher level API

render()

returns Promise which resolves in server rendered html if url response has content-type html, else resolves in false.

sifrrSeo.render(
  url, /* Full url of page to render with protocol, domain, port, etc. */,
  headers = {
    /* Headers to send with GET request */
  }
);

_isUserAgent(headers)

Returns true if headers['user-agent'] matches any of user-agents given in initialization

close()

closes puppeteer browser instance

sifrrSeo.close();

setPuppeteerOption()

adds puppeteer launch option. see list of options here.

Example: sifrrSeo.setPuppeteerOption('headless', false) to run it without headless mode

sifrrSeo.addPuppeteerOption('headless', false);

puppeteerOptions

return options that will be used to launch puppeteer instance.

sifrrSeo.puppeteerOptions;

Note: Note that first server render will be slow (depending on server machine), but subsequent requests will be really fast because of caching (depending on efficiency of cache key).

Tips

  • Don't use external scripts in pages without a good cache age.

  • Pre-render a bunch of urls

const fs = require('fs');
const joinPath = require('path').join;

const seo = new SifrrSeo();
seo.shouldRender = () => true;

async function renderUrls(
  urls = [
    /* array of urls */
  ],
  path = url => url
) {
  for (let i = 0; i < urls.length; i++) {
    const html = await seo.render(urls[i]);
    await new Promise((res, rej) =>
      fs.writeFile(path(urls[i]), html, err => {
        if (err) rej(err);
        res('The file has been saved!');
      })
    );
  }
  await seo.close();
}

renderUrls(['http://localhost:8080/abcd', 'http://localhost:8080/whatever'], u =>
  joinPath(__dirname, '.' + u.slice(21))
);