Kasha

Pre-render your Single-Page Application.

Features

Prerender the Single-Page Application.
Automatically collect sitemaps from <meta> tags.
Generate robots.txt with sitemap directives.
Sync prerendering.
Async prerendering with callback URL.
URL rewriting.
Works as a proxy server.
Rich APIs.
Caching.

Requirements

MongoDB
nsq

SPA compatibility adjustments

For the pre-rendered SPA to work correctly in the browser, you need to make a few adjustments:

When pre-rendering, intercept anonymous AJAX requests and store their responses in a <script> tag, so that the client does not send the same requests again. Our AJAX libraries teleman and teleman-ssr-cache may help.
On the client side, mount the SPA and replace the pre-rendered content.
Set <meta> tags so that search engines can learn more about the page. You can use set-meta.
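
The first adjustment can be sketched in plain JavaScript. This is only an illustration of the idea, not the API of teleman or teleman-ssr-cache: the global name __PRERENDER_CACHE__ and the functions serializeCache and cachedRequest are hypothetical.

```javascript
// Hypothetical global name; not part of Kasha, teleman or teleman-ssr-cache.
const CACHE_KEY = '__PRERENDER_CACHE__'

// During pre-rendering: turn the collected responses into a <script> tag.
function serializeCache(cache) {
  // Escape "<" so a response containing "</script>" cannot close the tag early.
  const json = JSON.stringify(cache).replace(/</g, '\\u003c')
  return `<script>window.${CACHE_KEY} = ${json}</script>`
}

// On the client: reuse the stored response if there is one, otherwise fetch.
// `store` would be window[CACHE_KEY]; `fetcher` is whatever performs the real request.
async function cachedRequest(url, store, fetcher) {
  if (store && Object.prototype.hasOwnProperty.call(store, url)) {
    const data = store[url]
    delete store[url] // use each cached response only once
    return data
  }
  return fetcher(url)
}
```

Deleting an entry after its first use means later requests for the same URL go to the network again, so the cache only covers the initial mount.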

Installation

npm i -g kasha

Docker:

docker pull kasha/kasha

Configuration

See config.sample.js

Running

Start the server:

kasha server --config=/path/to/config.js

Docker:

docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha server --config=/dest/to/config.js

Start the worker:

kasha worker --config=/path/to/config.js

# Async worker.
# Requests with the 'callbackURL' parameter are dispatched to async workers.
kasha worker --async --config=/path/to/config.js

Docker:

docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha worker [--async] --config=/dest/to/config.js

Site Config

db.sites.insert({
  // The hostname of your site.
  host: 'www.example.com',

  // In proxy mode, if the request doesn't contain 'X-Forwarded-Proto' or 'Forwarded:...proto=...' header,
  // then use 'defaultProtocol'.
  defaultProtocol: 'https',
  
  // If your site uses REST-style URLs, like /article/123, and the page does not depend on the query string,
  // you can remove the query string to improve the cache hit rate:
  // keepQuery: false,

  // You can also keep only the query parameters that certain URLs require:
  keepQuery: [
    [
      '/search', // the first element is the pathname of URL.
      'type', // starting from the second element, specifies the query names you need to keep.
      'keyword'
    ],

    // another URL and its query names
    ['/product', 'id']
  ],

  // You can use the '/render' API to crawl a hash-based single-page application.
  // For example, you can crawl https://www.example.com/app/#/home via
  // /render?url=https%3A%2F%2Fwww.example.com%2Fapp%2F%23%2Fhome
  
  // But if this site is not hash-based, you can remove the hash:
  keepHash: false,
  
  // Rewrites the request URL.
  rewrites: [
    // [from, to]
    // If 'to' is an empty string, the request will be aborted.
    // For the pattern syntax, see https://github.com/jiangfengming/url-router#pattern

    // route all requests to the entry point HTML file
    ['https://www.example.com/(.*)', 'https://static.example.com/index.html'],

    // except robots.txt
    ['https://www.example.com/robots.txt', 'https://static.example.com/robots.txt'],

    // or block it if you do not have one
    // ['https://www.example.com/robots.txt', ''],

    // block google analytics requests
    ['https://www.googletagmanager.com/(.*)', '']
  ],

  // Excludes the pages that don't need pre-rendering.
  excludes: [
    '/your-account/(.*)',
    '/your-orders/(.*)'
  ],

  // But still pre-render these pages, even though they match an excludes pattern
  includes: [
    '/your-account/signin'
  ],
  
  // Specifies the User-Agent
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
  
  // You can create profiles for different device types.
  // A profile can override keepQuery, keepHash, rewrites, excludes, includes, userAgent.

  profiles: {
    desktop: {
      userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
      rewrites: [
        [
          'https://www.example.com/(.*)',
          'https://static.example.com/desktop/index.html'
        ]
      ]
    },

    mobile: {
      userAgent: 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Mobile Safari/537.36',
      rewrites: [
        [
          'https://www.example.com/(.*)',
          'https://static.example.com/mobile/index.html'
        ]
      ]
    }
  },

  // If the request does not set the profile param, use this profile
  defaultProfile: 'desktop'
})
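
The effect of keepQuery and keepHash on a URL can be illustrated with a small sketch. The function normalizeURL is hypothetical and is not Kasha's implementation. It assumes that pathnames in keepQuery are compared exactly (Kasha uses url-router patterns), that a URL matching no keepQuery entry loses its whole query string, and that both options default to true.

```javascript
// Hypothetical sketch of URL normalization driven by keepQuery and keepHash.
function normalizeURL(rawURL, { keepQuery = true, keepHash = true } = {}) {
  const url = new URL(rawURL)

  if (keepQuery === false) {
    url.search = ''
  } else if (Array.isArray(keepQuery)) {
    // Each rule is [pathname, ...queryNamesToKeep].
    const rule = keepQuery.find(([pathname]) => pathname === url.pathname)
    const allowed = rule ? rule.slice(1) : []
    const kept = new URLSearchParams()
    for (const [name, value] of url.searchParams) {
      if (allowed.includes(name)) kept.append(name, value)
    }
    url.search = kept.toString()
  }

  if (!keepHash) url.hash = ''
  return url.href
}

const siteRules = {
  keepQuery: [['/search', 'type', 'keyword'], ['/product', 'id']],
  keepHash: false
}

console.log(normalizeURL('https://www.example.com/search?type=a&keyword=b&utm_source=x#top', siteRules))
// prints "https://www.example.com/search?type=a&keyword=b"
```

Dropping parameters that do not affect the page means that URLs differing only in tracking parameters share one cache entry.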

APIs

Make sure apiHost is set correctly.

For example, if you set apiHost: '127.0.0.1:3000', only requests to http(s)://127.0.0.1:3000/* can access the APIs. Requests for all other hosts are served in proxy mode.

GET /render

Renders the page.

Query string params:

url: The encoded URL of the webpage to render.

profile: The profile to use.

type: Set the response type. Defaults to json.

html: Returns html with header Content-Type: text/html.
json: Returns json with header Content-Type: application/json.
static: Returns html with header Content-Type: text/html, with the <script> tags and on* event handlers stripped.

callbackURL: Don't wait for the result. Once the job is done, the result is POSTed to the given URL in json format. If callbackURL is set, type is ignored.

metaOnly: If type is json, returns only the meta data, without the html content.

followRedirect: Follows the redirect if the page returns 301/302.

refresh: Forces a refresh of the cache.

noWait: Don't wait for the response. This is useful for pre-caching a page.

fallback: If no cache is found or the cache has expired, the request is proxied to the origin directly. If fallback is set, type must be html, and callbackURL, metaOnly, followRedirect, refresh and noWait cannot be set.

For boolean parameters, an absent param or a value of 0 means false. A value of 1 or an empty value (e.g., &refresh, &refresh=, &refresh=1) means true.

Example: http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags
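
A /render URL like the one above can be assembled with the standard URLSearchParams class, which also takes care of encoding the url param. The helpers buildRenderURL and parseBooleanParam are hypothetical and only illustrate the parameter rules described above. Treating values other than 0, 1 and empty as false is an assumption.

```javascript
// Boolean params listed in this section.
const BOOLEAN_PARAMS = ['metaOnly', 'followRedirect', 'refresh', 'noWait', 'fallback']

// Hypothetical helper: builds a /render URL from a page URL and an options object.
function buildRenderURL(apiBase, pageURL, options = {}) {
  const params = new URLSearchParams({ url: pageURL })
  for (const [name, value] of Object.entries(options)) {
    if (BOOLEAN_PARAMS.includes(name)) {
      if (value) params.set(name, '1') // omit the param entirely for false
    } else if (value != null) {
      params.set(name, String(value))
    }
  }
  return `${apiBase}/render?${params}`
}

// Hypothetical helper: reads a boolean param the way this section describes it.
function parseBooleanParam(searchParams, name) {
  const value = searchParams.get(name)
  return value === '' || value === '1'
}

console.log(buildRenderURL('http://localhost:3000', 'https://davidwalsh.name/facebook-meta-tags'))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags"
```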

Example of the returned JSON:

{
  "url": "https://davidwalsh.name/facebook-meta-tags",
  "profile": "",
  "status": 200,
  "redirect": null,
  "meta": {
    "title": "Facebook Open Graph META Tags",
    "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
    "image": "https://davidwalsh.name/demo/facebook-developers-logo.png",
    "canonicalUrl": "https://davidwalsh.name/facebook-meta-tags",
    "author": "David Walsh",
    "keywords": null
  },
  "openGraph": {
    "og": {
      "locale": {
        "current": "en_US"
      },
      "type": "article",
      "title": "Facebook Open Graph META Tags",
      "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
      "url": "https://davidwalsh.name/facebook-meta-tags",
      "site_name": "David Walsh Blog",
      "updated_time": "2016-02-23T00:44:54+00:00",
      "image": [
        {
          "url": "https://davidwalsh.name/demo/facebook-developers-logo.png",
          "secure_url": "https://davidwalsh.name/demo/facebook-developers-logo.png"
        },
        {
          "url": "https://davidwalsh.name/demo/david-facebook-share.png",
          "secure_url": "https://davidwalsh.name/demo/david-facebook-share.png"
        }
      ]
    },
    "article": {
      "publisher": "https://www.facebook.com/davidwalshblog",
      "section": "APIs",
      "published_time": "2011-04-25T09:24:28+00:00",
      "modified_time": "2016-02-23T00:44:54+00:00"
    }
  },
  "content": "<!DOCTYPE html><html>...</html>",
  "date": "2018-03-13T09:53:00.921Z"
}
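
A consumer of this JSON might reduce it to the few fields it needs. The function pageSummary below is a hypothetical sketch that relies only on the field names shown in the example above. The fallback order (meta first, then openGraph.og) is an assumption made for illustration.

```javascript
// Hypothetical helper: picks display fields from a /render JSON result.
function pageSummary(result) {
  const meta = result.meta || {}
  const og = (result.openGraph && result.openGraph.og) || {}
  const firstImage = Array.isArray(og.image) && og.image.length ? og.image[0].url : null

  return {
    url: meta.canonicalUrl || og.url || result.url,
    title: meta.title || og.title || null,
    description: meta.description || og.description || null,
    image: meta.image || firstImage
  }
}
```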

GET /:url

Alias of /render?url=ENCODED_URL&type=html.

For example, http://localhost:3000/https://www.example.com/ is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html

The profile param can be set with the Kasha-Profile header, and fallback can be set with the Kasha-Fallback header.

Note: the hash of the URL is not sent to the server. If you need the hash to be sent to the server, use the /render API.
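
The equivalence between the alias and /render can be made concrete with a small sketch. The function aliasToRenderURL is hypothetical and only mirrors the mapping described above. It also shows why the hash is lost: the URL parser treats everything after # as a fragment of the alias URL itself, so it never becomes part of the path.

```javascript
// Hypothetical helper: maps a /:url alias request to the equivalent /render URL.
function aliasToRenderURL(aliasURL) {
  const { origin, pathname, search } = new URL(aliasURL)
  // Everything after the leading "/" is the target page URL.
  const target = pathname.slice(1) + search
  const params = new URLSearchParams({ url: target, type: 'html' })
  return `${origin}/render?${params}`
}

console.log(aliasToRenderURL('http://localhost:3000/https://www.example.com/'))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html"
```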

Proxy mode

If the Host header of the request is not apiHost, or if the X-Forwarded-Host or Forwarded:...host=... header is set, the requested URL is treated as the url query param of the /render API, and type is set to html.

For example, the following request

GET /
Host: www.example.com
Kasha-Profile: mobile
Kasha-Fallback: 1

is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1
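
The mapping from a proxied request to its /render equivalent can be sketched as follows. The function proxyRequestToRenderURL is hypothetical and is not Kasha's code. It takes lower-cased header names, handles only X-Forwarded-Host and X-Forwarded-Proto (parsing of the Forwarded header is left out), and assumes Kasha-Fallback follows the same boolean rule as the query params.

```javascript
// Hypothetical helper: maps a proxy-mode request to the equivalent /render URL.
// `req` is a plain object: { path: '/', headers: { host: '...', ... } }.
function proxyRequestToRenderURL(apiBase, req, defaultProtocol = 'https') {
  const headers = req.headers
  const host = headers['x-forwarded-host'] || headers.host
  const protocol = headers['x-forwarded-proto'] || defaultProtocol
  const params = new URLSearchParams({
    url: `${protocol}://${host}${req.path}`,
    type: 'html'
  })

  if (headers['kasha-profile']) params.set('profile', headers['kasha-profile'])

  const fallback = headers['kasha-fallback']
  if (fallback === '1' || fallback === '') params.set('fallback', '1')

  return `${apiBase}/render?${params}`
}

console.log(proxyRequestToRenderURL('http://localhost:3000', {
  path: '/',
  headers: { host: 'www.example.com', 'kasha-profile': 'mobile', 'kasha-fallback': '1' }
}))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1"
```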

GET /cache?url=URL

Alias of /render?url=ENCODED_URL&noWait

GET /:site/robots.txt

Gets the robots.txt file with the sitemaps collected by kasha appended. For example:

http://localhost:3000/https://www.example.com/robots.txt

Kasha fetches the https://www.example.com/robots.txt file and appends the sitemap directives to the end. Example result:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

Sitemap: https://www.example.com/sitemaps.index.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.news.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.image.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.video.1.xml
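
The append step can be pictured as a simple string operation. The function appendSitemaps below is a hypothetical sketch, not Kasha's implementation: it takes the text of the origin robots.txt and a list of sitemap URLs, and returns the combined file.

```javascript
// Hypothetical helper: appends Sitemap directives to an existing robots.txt body.
function appendSitemaps(robotsTxt, sitemapURLs) {
  const body = robotsTxt.trimEnd()
  const directives = sitemapURLs.map(url => `Sitemap: ${url}`).join('\n')
  // Separate the original rules from the directives with one blank line.
  return [body, directives].filter(Boolean).join('\n\n') + '\n'
}

console.log(appendSitemaps('User-agent: *\nDisallow: /tmp/\n', [
  'https://www.example.com/sitemaps.index.1.xml',
  'https://www.example.com/sitemaps.index.google.1.xml'
]))
```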

GET /:site/sitemaps.:page.xml

Get sitemap of page N.

For example:

http://localhost:3000/https://www.example.com/sitemaps.1.xml

GET /:site/sitemaps.google.:page.xml

Get Google sitemap of page N.

GET /:site/sitemaps.index.google.:page.xml

Get Google sitemap index file of page N.

GET /:site/sitemaps.index.google.news.:page.xml

Get Google news sitemap index file of page N.

GET /:site/sitemaps.index.google.image.:page.xml

Get Google image sitemap index file of page N.

GET /:site/sitemaps.index.google.video.:page.xml

Get Google video sitemap index file of page N.

Collecting sitemap data

kasha can collect sitemap data from custom Open Graph <meta> tags. For example:

<head prefix="og: http://ogp.me/ns# sitemap: https://kasha-io.github.io/kasha/ns/sitemap#">

<!--
The canonical URL is used as the <loc> tag of the sitemap XML.
<meta property="og:url" content="..."> can also be used.
-->
<link rel="canonical" href="https://www.example.com/test.html">

<meta property="sitemap:changefreq" content="hourly">
<meta property="sitemap:priority" content="1">
<meta property="sitemap:news:publication:name" content="The Example Times">
<meta property="sitemap:news:publication:language" content="en">
<meta property="sitemap:news:publication_date" content="2018-05-25T09:19:54.000Z">
<meta property="sitemap:news:title" content="Page Title">
<meta property="sitemap:image:loc" content="http://examples.opengraphprotocol.us/media/images/train.jpg">
<meta property="sitemap:image:caption" content="The caption of the image.">
<meta property="sitemap:image:geo_location" content="Limerick, Ireland">
</head>

Sitemap data will be collected only if the origin of the canonical URL is the same as the current page.
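
The same-origin rule and the mapping from meta values to a sitemap entry can be sketched like this. The functions escapeXML and sitemapEntry are hypothetical and cover only changefreq and priority. The way Kasha actually stores and renders sitemap data may differ.

```javascript
// Escapes the five characters that are special in XML text and attributes.
function escapeXML(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Hypothetical helper: builds a <url> entry, or returns null when the canonical URL
// has a different origin than the page it was found on.
function sitemapEntry(pageURL, canonicalURL, meta = {}) {
  if (new URL(pageURL).origin !== new URL(canonicalURL).origin) return null

  const lines = ['<url>', `  <loc>${escapeXML(canonicalURL)}</loc>`]
  if (meta.changefreq) lines.push(`  <changefreq>${escapeXML(meta.changefreq)}</changefreq>`)
  if (meta.priority) lines.push(`  <priority>${escapeXML(meta.priority)}</priority>`)
  lines.push('</url>')
  return lines.join('\n')
}

console.log(sitemapEntry(
  'https://www.example.com/test.html?from=home',
  'https://www.example.com/test.html',
  { changefreq: 'hourly', priority: '1' }
))
```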

For the available tags, see the sitemap protocol and the Google sitemap extensions.

License

MIT

The logo is made from Prosymbols's camera icon, licensed under Creative Commons BY 3.0.
