hafas-gtfs-rt-feed


Generate a GTFS Realtime (GTFS-RT) feed by polling a HAFAS endpoint.


Architecture

hafas-gtfs-rt-feed consists of several components, connected to each other via NATS Streaming channels:

  1. monitor-hafas: Given a hafas-client instance, it uses hafas-monitor-trips to poll live data about all vehicles in the configured geographic area.
  2. match-with-gtfs: Uses match-gtfs-rt-to-gtfs to match this data against static GTFS data imported into a database.
  3. serve-as-gtfs-rt: Uses gtfs-rt-differential-to-full-dataset to aggregate the matched data into a single GTFS-RT feed, and serves the feed via HTTP.

monitor-hafas sends data to match-with-gtfs via two NATS Streaming channels, trips & movements; match-with-gtfs sends data to serve-as-gtfs-rt via two more channels, matched-trips & matched-movements.

                   GTFS data in a            clients
HAFAS API          PostgreSQL DB                ^
   ^ |                  ^ |                     | GTFS-RT
   | |                  | |                     |
   | v                  | v                     |
monitor-hafas      match-with-gtfs        serve-as-gtfs-rt
   ||                 ^^   ||                   ^  ^
   ||                 ||   ||                   |  |
   |+----> trips -----+|   |+--> matched-trips -+  |
   +-----> movements --+   +---> matched-movements +
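
To make the data flow in the diagram concrete, here is a minimal in-memory sketch of the same topology. It is not how the components are implemented: Node's EventEmitter stands in for NATS Streaming, and matchWithGtfs as well as the sample messages are hypothetical placeholders.

```javascript
const {EventEmitter} = require('events')

// stand-in for NATS Streaming: one event name per channel
const bus = new EventEmitter()
const publish = (channel, msg) => bus.emit(channel, msg)
const subscribe = (channel, handler) => bus.on(channel, handler)

// hypothetical matcher; the real match-with-gtfs queries PostgreSQL
const matchWithGtfs = (item) => ({...item, matched: true})

// match-with-gtfs: consumes trips & movements, publishes matched-* channels
for (const channel of ['trips', 'movements']) {
    subscribe(channel, (item) => {
        publish('matched-' + channel, matchWithGtfs(item))
    })
}

// serve-as-gtfs-rt: aggregates both matched channels into one feed
const feed = []
for (const channel of ['matched-trips', 'matched-movements']) {
    subscribe(channel, (item) => feed.push({channel, item}))
}

// monitor-hafas: publishes whatever it fetched from HAFAS
publish('trips', {id: 'trip-1'})
publish('movements', {tripId: 'trip-1', latitude: 52.5, longitude: 13.4})
```

Because each stage only knows the channel names, the three components can run as separate processes, which is what the real setup does.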

Usage

Some preparations are necessary for hafas-gtfs-rt-feed to work. Let's get started!

Run npm init inside a new directory to initialize an empty npm-based project.

mkdir deutsche-bahn-gtfs-rt-feed
cd deutsche-bahn-gtfs-rt-feed
npm init

set up NATS Streaming

Install and run the NATS Streaming Server as documented.

Note: If you run NATS Streaming on a different host or port, pass a custom NATS_STREAMING_URL environment variable into all hafas-gtfs-rt-feed components.
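
If you start the components from a Node.js script, one way to make sure all of them get the same value is to build the environment once. This is only a sketch: withNatsStreamingUrl is a hypothetical helper, and the host nats.example.org:4223 is a made-up example.

```javascript
// Hypothetical helper, not part of hafas-gtfs-rt-feed.
const withNatsStreamingUrl = (natsStreamingUrl, baseEnv = process.env) => {
    // new URL() throws a TypeError if the value is not a well-formed URL
    new URL(natsStreamingUrl)
    return {...baseEnv, NATS_STREAMING_URL: natsStreamingUrl}
}

// the same environment for every component (example URL is an assumption)
const components = ['monitor-hafas', 'match-with-gtfs', 'serve-as-gtfs-rt']
const sharedEnv = withNatsStreamingUrl('nats://nats.example.org:4223', {})
const envByComponent = Object.fromEntries(
    components.map((name) => [name, sharedEnv]),
)
```

Each entry can then be passed as the env option of child_process.spawn() when launching the respective component.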

set up PostgreSQL

Make sure you have a reasonably recent version of PostgreSQL installed and running. There are guides for many operating systems and environments available on the internet.

Note: If you run PostgreSQL on a different host or port, pass custom PG* environment variables into the match-with-gtfs component.
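
The PG* variables follow the usual libpq naming (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE). As a rough sketch, a hypothetical helper like the one below turns connection settings into such variables; the host name and port are made-up values.

```javascript
// Hypothetical helper: maps connection settings to libpq-style PG* variables.
const toPgEnv = ({host, port, user, password, database} = {}) => {
    const env = {}
    if (host) env.PGHOST = host
    if (port) env.PGPORT = String(port) // environment variables are strings
    if (user) env.PGUSER = user
    if (password) env.PGPASSWORD = password
    if (database) env.PGDATABASE = database
    return env
}

// example values are assumptions, adapt them to your setup
const pgEnv = toPgEnv({host: 'db.example.org', port: 5433, database: 'gtfs'})
```

Settings that are left out are simply not set, so PostgreSQL's own defaults apply to them.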

install hafas-gtfs-rt-feed

Use the npm CLI:

npm install hafas-gtfs-rt-feed
# added 153 packages in 12s

set up a hafas-client instance

hafas-gtfs-rt-feed is agnostic to the HAFAS API it pulls data from: to fetch data, monitor-hafas just uses the hafas-client instance you pass in, which you must point towards one of the many HAFAS API endpoints.

Set up hafas-client as documented. A very basic example using the Deutsche Bahn (DB) endpoint:

// deutsche-bahn-hafas.js
const createClient = require('hafas-client')
const dbProfile = require('hafas-client/p/db')

// create hafas-client configured to use Deutsche Bahn's HAFAS API
const client = createClient(dbProfile, 'my-awesome-program')

module.exports = client

build the GTFS matching database

match-with-gtfs needs a pre-populated matching database to run; it uses gtfs-via-postgres and match-gtfs-rt-to-gtfs underneath.

First, we're going to use gtfs-via-postgres's gtfs-to-sql command-line tool to import our GTFS data into PostgreSQL.

Note: Make sure you have an up-to-date GTFS Static dataset, unzipped into individual .txt files.

# create a PostgreSQL database `gtfs`
psql -c 'create database gtfs'
# configure all subsequent commands to use it
export PGDATABASE=gtfs
# import all .txt files
node_modules/.bin/gtfs-to-sql -d -u path/to/gtfs/files/*.txt

Your database gtfs should now contain basic GTFS data.

match-gtfs-rt-to-gtfs works by matching HAFAS stops & lines against GTFS stops & lines, using their IDs and their names. Usually, HAFAS & GTFS stop/line names don't have the same format, so they need to be normalized.

You'll have to implement this normalization logic yourself. A simplified (and very naive) version may look like this:

// hafas-info.js
module.exports = {
    endpointName: 'some-hafas-api',
    // lower-case, collapse runs of whitespace, strip leading/trailing whitespace
    normalizeStopName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
    normalizeLineName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
}

// gtfs-info.js
module.exports = {
    endpointName: 'some-gtfs-feed',
    // strip a trailing " St."; the pattern is lower-case because it runs after toLowerCase()
    normalizeStopName: name => name.toLowerCase().replace(/\s+st\.$/, ''),
    normalizeLineName: name => name.toLowerCase(),
}
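
Before building the match index, it can help to check your normalization functions against a few name pairs that you know refer to the same stop. The sketch below reuses the naive functions from above; the stop names are invented, and stopNamesMatch is a hypothetical helper, not something match-gtfs-rt-to-gtfs provides. Note that the suffix pattern has to be lower-case (st.) because it is applied after toLowerCase().

```javascript
const hafasInfo = {
    normalizeStopName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
}
const gtfsInfo = {
    // lower-case pattern, because the name has been lower-cased already
    normalizeStopName: name => name.toLowerCase().replace(/\s+st\.$/, ''),
}

// hypothetical helper: do both sides normalize to the same string?
const stopNamesMatch = (hafasName, gtfsName) => (
    hafasInfo.normalizeStopName(hafasName) === gtfsInfo.normalizeStopName(gtfsName)
)

// invented sample pairs: [name in HAFAS, name in GTFS]
const pairs = [
    ['  Foo   Bar ', 'Foo Bar St.'],
    ['Foo Bar', 'Foo  Bar'], // double space on the GTFS side
]
const results = pairs.map(([hafasName, gtfsName]) => stopNamesMatch(hafasName, gtfsName))
```

The second pair does not match, because the GTFS-side function never collapses whitespace. Gaps like this are what you would iron out for your specific endpoint and dataset.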

match-gtfs-rt-to-gtfs needs some special matching indices in the database to work efficiently. Now that we have implemented some normalization logic, we're going to pass it to match-gtfs-rt-to-gtfs's build-gtfs-match-index command-line tool:

# add matching indices to the `gtfs` database
node_modules/.bin/build-gtfs-match-index path/to/hafas-info.js path/to/gtfs-info.js

Note: hafas-gtfs-rt-feed is data- & region-agnostic, so it depends on your HAFAS-endpoint-specific name normalization logic to match as many HAFAS trips/vehicles as possible against the GTFS data. The ratio of matched items would ideally be 100%, because GTFS-RT feeds are intended to be consumed alongside a GTFS dataset with matching IDs.

run all components

Now that we've set everything up, let's run all hafas-gtfs-rt-feed components to check if they are working!

All three components need to be run in parallel, so just open three terminals to run them. They will start logging pino-formatted log messages.

# specify the bounding box to be monitored (required)
export BBOX='{"north": 1.1, "west": 22.2, "south": 3.3, "east": 33.3}'
# start monitor-hafas
node_modules/.bin/monitor-hafas deutsche-bahn-hafas.js
# todo: sample logs
node_modules/.bin/match-with-gtfs
# todo: sample logs
node_modules/.bin/serve-as-gtfs-rt
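
Since BBOX is passed as a JSON string, a typo is easy to miss. The following sketch checks a value before you export it. It assumes that north/south are latitudes, west/east are longitudes, and that north must be greater than south; whether monitor-hafas enforces exactly these rules is not confirmed here. The coordinates are example values, and parseBbox is a hypothetical helper. The placeholder numbers in the shell example above would not pass this check.

```javascript
// Hypothetical helper: parse & sanity-check a BBOX value.
const parseBbox = (raw) => {
    const bbox = JSON.parse(raw) // throws a SyntaxError on invalid JSON
    for (const key of ['north', 'west', 'south', 'east']) {
        if (typeof bbox[key] !== 'number' || !Number.isFinite(bbox[key])) {
            throw new Error(`BBOX.${key} must be a finite number`)
        }
    }
    // assumption: latitudes grow towards the north
    if (bbox.north <= bbox.south) {
        throw new Error('BBOX.north must be greater than BBOX.south')
    }
    return bbox
}

// example area (assumed values)
const bbox = parseBbox('{"north": 52.6, "west": 13.2, "south": 52.4, "east": 13.6}')
```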

inspecting the feed

Your GTFS-RT feed should now be served at http://localhost:3000/, and within a few moments, it should contain data! 👏

You can verify this using one of the many available GTFS-RT tools; here are two of them to quickly inspect the feed:

  • print-gtfs-rt-cli is a command-line tool; use it with curl: curl 'http://localhost:3000/' -s | print-gtfs-rt.
  • gtfs-rt-inspector is a web app that can inspect any CORS-enabled GTFS-RT feed; paste http://localhost:3000/ into the URL field to inspect yours.

After monitor-hafas has fetched some data from HAFAS, and after match-with-gtfs has matched it against the GTFS data (or failed or timed out doing so), you should see TripUpdates & VehiclePositions.

metrics

All three components (monitor-hafas, match-with-gtfs, serve-as-gtfs-rt) expose Prometheus-compatible metrics via HTTP. You can fetch and process them using e.g. Prometheus, VictoriaMetrics or the Grafana Agent.

As an example, we're going to inspect monitor-hafas's metrics. Enable them by running it with a METRICS_SERVER_PORT=9323 environment variable and query its metrics via HTTP:

curl 'http://localhost:9323/metrics'
# HELP nats_streaming_sent_total nr. of messages published to NATS streaming
# TYPE nats_streaming_sent_total counter
nats_streaming_sent_total{channel="movements"} 1673
nats_streaming_sent_total{channel="trips"} 1162

# HELP hafas_reqs_total nr. of HAFAS requests
# TYPE hafas_reqs_total counter
hafas_reqs_total{call="radar"} 12
hafas_reqs_total{call="trip"} 1165

# HELP hafas_response_time_seconds HAFAS response time
# TYPE hafas_response_time_seconds summary
hafas_response_time_seconds{quantile="0.05",call="radar"} 1.0396666666666665
hafas_response_time_seconds{quantile="0.5",call="radar"} 3.8535000000000004
hafas_response_time_seconds{quantile="0.95",call="radar"} 6.833
hafas_response_time_seconds_sum{call="radar"} 338.22600000000006
hafas_response_time_seconds_count{call="radar"} 90
hafas_response_time_seconds{quantile="0.05",call="trip"} 2.4385
hafas_response_time_seconds{quantile="0.5",call="trip"} 28.380077380952383
hafas_response_time_seconds{quantile="0.95",call="trip"} 54.51257142857143
hafas_response_time_seconds_sum{call="trip"} 33225.48200000005
hafas_response_time_seconds_count{call="trip"} 1165

# HELP tiles_fetched_total nr. of tiles fetched from HAFAS
# TYPE tiles_fetched_total counter
tiles_fetched_total 2

# HELP movements_fetched_total nr. of movements fetched from HAFAS
# TYPE movements_fetched_total counter
movements_fetched_total 362

# HELP fetch_all_movements_total how often all movements have been fetched
# TYPE fetch_all_movements_total counter
fetch_all_movements_total 1

# HELP fetch_all_movements_duration_seconds time that fetching all movements currently takes
# TYPE fetch_all_movements_duration_seconds gauge
fetch_all_movements_duration_seconds 2.4
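
If you just want to pull a few numbers out of this output in a script, without running a full Prometheus setup, a small parser is enough. The sketch below handles only the plain `name{labels} value` lines shown above (no timestamps, no escaped quotes in label values); parseMetrics is a hypothetical helper.

```javascript
// Hypothetical helper: parse simple Prometheus text-format samples.
const parseMetrics = (text) => {
    const samples = []
    for (const line of text.split('\n')) {
        const trimmed = line.trim()
        // skip blank lines and the "# HELP" / "# TYPE" comments
        if (trimmed === '' || trimmed.startsWith('#')) continue
        const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(trimmed)
        if (!m) continue
        const labels = {}
        for (const [, key, value] of (m[2] || '').matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
            labels[key] = value
        }
        samples.push({name: m[1], labels, value: Number(m[3])})
    }
    return samples
}

// sum up a counter across all of its label combinations
const sumOf = (samples, name) => samples
    .filter((sample) => sample.name === name)
    .reduce((sum, sample) => sum + sample.value, 0)
```

For the output above, sumOf(parseMetrics(body), 'hafas_reqs_total') would add up the radar and trip request counters.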

health check

serve-as-gtfs-rt exposes a health check that checks if there are any recent entities in the feed.

# healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 200 OK
# …

# not healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 503 Service Unavailable
# …
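
The same check can be done from Node.js, for example in a small watchdog script. This is a sketch using only the built-in http module; the URL matches the examples above, treating any status other than 200 and 503 as "unknown" is an assumption, and both function names are hypothetical.

```javascript
const http = require('http')

// map the documented status codes to a health state
const interpretHealth = (statusCode) => {
    if (statusCode === 200) return 'healthy'
    if (statusCode === 503) return 'unhealthy'
    return 'unknown' // assumption: anything else is treated as inconclusive
}

// send a HEAD request (like `curl -I`) and resolve with the health state
const checkHealth = (url = 'http://localhost:3000/health') => (
    new Promise((resolve, reject) => {
        const req = http.request(url, {method: 'HEAD'}, (res) => {
            res.resume() // discard the (empty) body
            resolve(interpretHealth(res.statusCode))
        })
        req.on('error', reject) // e.g. serve-as-gtfs-rt is not running
        req.end()
    })
)
```

Call it with checkHealth().then(console.log) while serve-as-gtfs-rt is running.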

on-demand mode

Optionally, you can run your GTFS-RT feed in a demand-responsive mode, where it will only fetch data from HAFAS as long as someone requests the GTFS-RT feed, which effectively reduces the long-term number of requests to HAFAS.

To understand how this works, remember that

  • movements fetched from HAFAS are formatted as GTFS-RT VehiclePositions.
  • trips fetched from HAFAS are formatted as GTFS-RT TripUpdates.
  • the whole monitor-hafas, match-with-gtfs & serve-as-gtfs-rt setup works like a streaming pipeline.

The on-demand mode works like this:

  • monitor-hafas is either just fetching movements (if you configured it to fetch only trips on demand) or completely idle (if you configured it to fetch both movements & trips on demand) by default.
  • monitor-hafas also subscribes to a demand NATS Streaming channel, which serves as a communication channel for serve-as-gtfs-rt to signal demand.
  • When the GTFS-RT feed is requested via HTTP,
    1. serve-as-gtfs-rt serves the current feed (which contains either VehiclePositions only, or no entities whatsoever, depending on the on-demand configuration).
    2. serve-as-gtfs-rt signals demand via the demand channel.
    3. Upon receiving a demand signal, monitor-hafas will start fetching trips – or both movements & trips, depending on the on-demand configuration.

This means that, after the first request(s) for the GTFS-RT feed have signalled demand, it will take a bit of time until subsequent GTFS-RT feed requests are served with all data; as long as there is constant demand for the feed, the on-demand mode will behave as if it weren't turned on.

Tell serve-as-gtfs-rt to signal demand via the --signal-demand option. You can then configure monitor-hafas's exact behaviour using the following options:

--movements-fetch-mode <mode>
    Control when movements are fetched from HAFAS.
    "on-demand":
        Only fetch movements from HAFAS when the `serve-as-gtfs-rt` component
        has signalled demand. Trips won't be fetched continuously anymore.
    "continuously" (default):
        Always fetch movements.
--movements-demand-duration <milliseconds>
    With `--movements-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall movements be fetched?
    Default: movements fetching interval (60s by default) * 5
--trips-fetch-mode <mode>
    Control when trips are fetched from HAFAS.
    "never":
        Never fetch a movement's respective trip.
    "on-demand":
        Only fetch movements' respective trips from HAFAS when the `serve-as-gtfs-rt`
        component has signalled demand.
    "continuously" (default):
        Always fetch each movement's respective trip.
--trips-demand-duration <milliseconds>
    With `--trips-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall trips be fetched?
    Default: movements fetching interval (60s by default) * 2
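
To illustrate how the demand durations behave, here is a small model of a "demand window". It is not monitor-hafas's actual implementation: createDemandWindow is a hypothetical helper, and it assumes that every demand signal restarts the window. The default durations are computed as documented above (fetching interval × 5 for movements, × 2 for trips).

```javascript
// default durations as documented: interval * 5 and interval * 2
const defaultDemandDurations = (fetchIntervalMs = 60 * 1000) => ({
    movements: fetchIntervalMs * 5,
    trips: fetchIntervalMs * 2,
})

// Hypothetical model: fetching is active for `durationMs` after the
// most recent demand signal (assumption: each signal restarts the window).
const createDemandWindow = (durationMs, now = Date.now) => {
    let demandUntil = -Infinity
    return {
        signalDemand: () => { demandUntil = now() + durationMs },
        isActive: () => now() < demandUntil,
    }
}

const durations = defaultDemandDurations()
const tripsDemand = createDemandWindow(durations.trips)
tripsDemand.signalDemand() // e.g. someone just requested the feed
```

With the default interval, trips would be fetched for 2 minutes after the last signal in this model, and movements for 5 minutes.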

controlling the number of requests to HAFAS

Currently, there is no mechanism to influence the total rate of requests to HAFAS directly, no prioritisation between the "find trips in a bounding box" (hafas-client's radar()) and "refresh a trip" (hafas-client's trip()) requests, and no logic to efficiently use requests up to a certain configured limit.

However, there are some dials to influence the number of requests of both types:

  • By defining a smaller or larger bounding box via the BBOX environment variable, you can control the total number of monitored trips, and thus the rate of requests.
  • By setting FETCH_TILES_INTERVAL, you can choose how often the bounding box (or the vehicles within, rather) shall be refreshed, and subsequently how often each trip will be fetched if you have configured that. Note that if a refresh takes longer than the configured interval, another refresh will follow right after, but the total rate of radar() requests to HAFAS will be lower.
  • You can throttle the total number of requests to HAFAS by throttling hafas-client, but depending on the rate you configure, this might cause the refresh of all monitored trips (as well as finding new trips to monitor) to take longer than configured using FETCH_TRIPS_INTERVAL, so consider it as a secondary tool.
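
For a rough feeling of how these dials interact, you can estimate the request rate on the back of an envelope. The sketch below assumes one radar() request per tile and one trip() request per monitored trip in every refresh cycle, and that a cycle finishes within the interval; both are simplifications, and estimateRequestsPerMinute is a hypothetical helper. The sample numbers (2 tiles, 362 movements) are taken from the metrics output earlier in this document, treating each movement as one monitored trip.

```javascript
// Hypothetical back-of-the-envelope estimate, not an exact model.
const estimateRequestsPerMinute = ({
    tiles, // nr. of tiles the bounding box is split into
    trips, // nr. of monitored trips
    fetchIntervalSeconds, // refresh interval
    fetchTrips = true, // false: only radar() requests are made
}) => {
    const requestsPerCycle = tiles + (fetchTrips ? trips : 0)
    return requestsPerCycle * 60 / fetchIntervalSeconds
}

const continuous = estimateRequestsPerMinute({tiles: 2, trips: 362, fetchIntervalSeconds: 60})
// → 364
const radarOnly = estimateRequestsPerMinute({tiles: 2, trips: 362, fetchIntervalSeconds: 60, fetchTrips: false})
// → 2
```

The estimate shows why trip() requests dominate: shrinking the bounding box or fetching trips only on demand has a far bigger effect than reducing the number of tiles.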

exposing feed metadata

If you pass metadata about the GTFS-Static feed used, serve-as-gtfs-rt will expose it via HTTP:

serve-as-gtfs-rt \
    --static-feed-info path/to/gtfs/files/feed_info.txt \
    --static-feed-url https://data.ndovloket.nl/flixbus/flixbus-eu.zip

curl 'http://localhost:3000/feed_info.csv'
# feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
# openOV,http://openov.nl,en,20210108,20210221,20210108

curl 'http://localhost:3000/feed_info.csv' -I
# HTTP/1.1 302 Found
# location: https://data.ndovloket.nl/flixbus/flixbus-eu.zip
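
A consumer of your feed could use this metadata to check whether its local GTFS dataset is still the right one. The following sketch parses the CSV shown above into an object. It is deliberately naive (one data row, no quoted fields), and parseFeedInfo and coversDate are hypothetical helpers.

```javascript
// Hypothetical helper: parse a single-row feed_info.csv (no quoted fields).
const parseFeedInfo = (csv) => {
    const [header, row] = csv.trim().split(/\r?\n/)
    const keys = header.split(',')
    const values = row.split(',')
    return Object.fromEntries(keys.map((key, i) => [key, values[i]]))
}

// GTFS dates are YYYYMMDD strings, so they compare correctly as strings
const coversDate = (feedInfo, yyyymmdd) => (
    feedInfo.feed_start_date <= yyyymmdd && yyyymmdd <= feedInfo.feed_end_date
)

// the sample response from above
const feedInfo = parseFeedInfo([
    'feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version',
    'openOV,http://openov.nl,en,20210108,20210221,20210108',
].join('\n'))
```

Comparing feedInfo.feed_version against the version of a locally stored dataset is one simple way to detect that the static feed has been updated.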

License

This project is dual-licensed: My contributions are licensed under the Prosperity Public License, contributions of other people are licensed as Apache 2.0.

This license allows you to use and share this software for noncommercial purposes for free and to try this software for commercial purposes for thirty days.

Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance, without any anticipated commercial application, doesn’t count as use for a commercial purpose.

Buy a commercial license or read more about why I sell private licenses for my projects.

Contributing

If you have a question or have difficulties using hafas-gtfs-rt-feed, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.

By contributing, you agree to release your modifications under the Apache 2.0 license.