hafas-gtfs-rt-feed
Generate a GTFS Realtime (GTFS-RT) feed by polling a HAFAS endpoint.
Architecture
hafas-gtfs-rt-feed consists of several components, connected to each other via NATS Streaming channels:
- monitor-hafas: Given a hafas-client instance, it uses hafas-monitor-trips to poll live data about all vehicles in the configured geographic area.
- match-with-gtfs: Uses match-gtfs-rt-to-gtfs to match this data against static GTFS data imported into a database.
- serve-as-gtfs-rt: Uses gtfs-rt-differential-to-full-dataset to aggregate the matched data into a single GTFS-RT feed, and serves the feed via HTTP.
monitor-hafas sends data to match-with-gtfs via two NATS Streaming channels, trips & movements; match-with-gtfs sends data to serve-as-gtfs-rt via two channels, matched-trips & matched-movements.
              GTFS data in a          clients
  HAFAS API   PostgreSQL DB              ^
    ^   |          ^   |                 | GTFS-RT
    |   |          |   |                 |
    |   v          |   v                 |
monitor-hafas match-with-gtfs     serve-as-gtfs-rt
||                 ^^ ||                   ^  ^
||                 || ||                   |  |
|+----> trips -----+| |+--> matched-trips -+  |
+-----> movements --+ +---> matched-movements +
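To make the data flow more tangible, here is a minimal in-process sketch of the pipeline topology. It uses Node's built-in EventEmitter as a stand-in for NATS Streaming; the three functions and the message shapes are hypothetical and only mirror the channel names above, they are not the actual components.

```javascript
const { EventEmitter } = require('events')

// Stand-in for NATS Streaming: one emitter, one event name per channel.
const channels = new EventEmitter()

// monitor-hafas: publishes raw HAFAS data to `trips` & `movements`.
const monitorHafas = (bus) => ({
  publishTrip: (trip) => bus.emit('trips', trip),
  publishMovement: (movement) => bus.emit('movements', movement),
})

// match-with-gtfs: consumes raw data, publishes matched data.
// The "matching" here only tags each item; the real component queries PostgreSQL.
const matchWithGtfs = (bus) => {
  bus.on('trips', (trip) => bus.emit('matched-trips', { ...trip, matched: true }))
  bus.on('movements', (mv) => bus.emit('matched-movements', { ...mv, matched: true }))
}

// serve-as-gtfs-rt: aggregates matched items into a single feed.
const serveAsGtfsRt = (bus) => {
  const feed = []
  bus.on('matched-trips', (trip) => feed.push({ kind: 'TripUpdate', ...trip }))
  bus.on('matched-movements', (mv) => feed.push({ kind: 'VehiclePosition', ...mv }))
  return feed
}

matchWithGtfs(channels)
const feed = serveAsGtfsRt(channels)
const monitor = monitorHafas(channels)
monitor.publishTrip({ id: 'trip-1' })
monitor.publishMovement({ id: 'trip-1' })
console.log(feed.map((entity) => entity.kind)) // [ 'TripUpdate', 'VehiclePosition' ]
```

Because the components only share channels, each one can be restarted or scaled on its own, which is presumably why they are separate processes.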
Usage
Some preparations are necessary for hafas-gtfs-rt-feed to work. Let's get started!
Run npm init inside a new directory to initialize an empty npm-based project.
mkdir deutsche-bahn-gtfs-rt-feed
cd deutsche-bahn-gtfs-rt-feed
npm init
set up NATS Streaming
Install and run the NATS Streaming Server as documented.
Note: If you run NATS Streaming on a different host or port, pass a custom NATS_STREAMING_URL environment variable into all hafas-gtfs-rt-feed components.
set up PostgreSQL
Make sure you have a reasonably recent version of PostgreSQL installed and running. There are guides for many operating systems and environments available on the internet.
Note: If you run PostgreSQL on a different host or port, pass custom PG* environment variables (e.g. PGHOST, PGPORT) into the match-with-gtfs component.
install hafas-gtfs-rt-feed
Use the npm CLI:
npm install hafas-gtfs-rt-feed
# added 153 packages in 12s
set up a hafas-client instance
hafas-gtfs-rt-feed is agnostic to the HAFAS API it pulls data from: To fetch data, monitor-hafas just uses the hafas-client instance you passed in, which you must point towards one of the many HAFAS API endpoints.
Set up hafas-client as documented. A very basic example using the Deutsche Bahn (DB) endpoint:
// deutsche-bahn-hafas.js
const createClient = require('hafas-client')
const dbProfile = require('hafas-client/p/db')
// create hafas-client configured to use Deutsche Bahn's HAFAS API
const client = createClient(dbProfile, 'my-awesome-program')
module.exports = client
build the GTFS matching database
match-with-gtfs needs a pre-populated matching database to run; it uses gtfs-via-postgres and match-gtfs-rt-to-gtfs underneath.
First, we're going to use gtfs-via-postgres's gtfs-to-sql command-line tool to import our GTFS data into PostgreSQL.
Note: Make sure you have an up-to-date GTFS Static dataset, unzipped into individual .txt files.
# create a PostgreSQL database `gtfs`
psql -c 'create database gtfs'
# configure all subsequent commands to use it
export PGDATABASE=gtfs
# import all .txt files
node_modules/.bin/gtfs-to-sql -d -u path/to/gtfs/files/*.txt
Your database gtfs should contain basic GTFS data now.
match-gtfs-rt-to-gtfs works by matching HAFAS stops & lines against GTFS stops & lines, using their IDs and their names. Usually, HAFAS & GTFS stop/line names don't have the same format, so they need to be normalized.
You'll have to implement this normalization logic. A simplified (and very naive) version may look like this:
// hafas-info.js
module.exports = {
  endpointName: 'some-hafas-api',
  normalizeStopName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
  normalizeLineName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
}

// gtfs-info.js
module.exports = {
  endpointName: 'some-gtfs-feed',
  // match lowercase `st.`, because the name has already been lowercased
  normalizeStopName: name => name.toLowerCase().replace(/\s+st\.$/, ''),
  normalizeLineName: name => name.toLowerCase(),
}
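Before building the matching indices, it can help to check how such normalizers behave on a few real names from both sources. The following is a minimal sketch with variants of the normalizers above inlined; the stop names are made up for illustration and do not come from any actual HAFAS endpoint or GTFS feed.

```javascript
// Variants of the naive normalizers, inlined to keep the sketch self-contained.
const normalizeHafasStopName = (name) => name.toLowerCase().replace(/\s+/g, ' ').trim()
const normalizeGtfsStopName = (name) => name.toLowerCase().replace(/\s+st\.$/, '')

// Hypothetical pairs of [HAFAS name, GTFS name] for the same physical stop.
const samples = [
  ['  Main   Street ', 'main street'],
  ['Foo', 'Foo St.'],
  ['Berlin Hbf', 'Berlin Hauptbahnhof'],
]

const report = samples.map(([hafasName, gtfsName]) => ({
  hafasName,
  gtfsName,
  matches: normalizeHafasStopName(hafasName) === normalizeGtfsStopName(gtfsName),
}))

for (const { hafasName, gtfsName, matches } of report) {
  console.log(JSON.stringify(hafasName), JSON.stringify(gtfsName), matches ? 'match' : 'NO MATCH')
}
```

The third pair does not match, which hints at the kind of endpoint-specific rule (here: expanding abbreviations) you would likely have to add.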
match-gtfs-rt-to-gtfs needs some special matching indices in the database to work efficiently. Now that we have implemented some normalization logic, we're going to pass it to match-gtfs-rt-to-gtfs's build-gtfs-match-index command-line tool:
# add matching indices to the `gtfs` database
node_modules/.bin/build-gtfs-match-index path/to/hafas-info.js path/to/gtfs-info.js
Note: hafas-gtfs-rt-feed is data- & region-agnostic, so it depends on your HAFAS-endpoint-specific name normalization logic to match as many HAFAS trips/vehicles as possible against the GTFS data. The ratio of matched items would ideally be 100%, because GTFS-RT feeds are intended to be consumed alongside a GTFS dataset with matching IDs.
run all components
Now that we've set everything up, let's run all hafas-gtfs-rt-feed components to check if they are working!
All three components need to be run in parallel, so just open three terminals to run them. They will start logging pino-formatted log messages.
# specify the bounding box to be monitored (required)
export BBOX='{"north": 3.3, "west": 22.2, "south": 1.1, "east": 33.3}'
# start monitor-hafas
node_modules/.bin/monitor-hafas deutsche-bahn-hafas.js
# todo: sample logs
node_modules/.bin/match-with-gtfs
# todo: sample logs
node_modules/.bin/serve-as-gtfs-rt
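Since the bounding box is passed as a JSON string, a typo or swapped edges are easy to miss. Here is a small sketch for sanity-checking a BBOX value before starting monitor-hafas. parseBbox is a hypothetical helper, not part of hafas-gtfs-rt-feed, and whether monitor-hafas performs the same checks itself is not covered here. It also assumes the box does not cross the antimeridian.

```javascript
// Parse and sanity-check a bounding box given as JSON, e.g. from process.env.BBOX.
const parseBbox = (json) => {
  const bbox = JSON.parse(json)
  for (const key of ['north', 'west', 'south', 'east']) {
    if (typeof bbox[key] !== 'number' || !Number.isFinite(bbox[key])) {
      throw new TypeError(`bbox.${key} must be a finite number`)
    }
  }
  if (bbox.north <= bbox.south) throw new RangeError('bbox.north must be greater than bbox.south')
  if (bbox.east <= bbox.west) throw new RangeError('bbox.east must be greater than bbox.west')
  return bbox
}

// Roughly Berlin; these coordinates are illustrative only.
const bbox = parseBbox('{"north": 52.67, "west": 13.09, "south": 52.34, "east": 13.76}')
console.log(bbox.north > bbox.south) // true
```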
inspecting the feed
Your GTFS-RT feed should now be served at http://localhost:3000/, and within a few moments, it should contain data! 👏
You can verify this using many available GTFS-RT tools; here are two of them to quickly inspect the feed:
- print-gtfs-rt-cli is a command-line tool; use it with curl: curl 'http://localhost:3000/' -s | print-gtfs-rt
- gtfs-rt-inspector is a web app that can inspect any CORS-enabled GTFS-RT feed; paste http://localhost:3000/ into the URL field to inspect yours.
After monitor-hafas has fetched some data from HAFAS, and after match-with-gtfs has matched it against the GTFS (or failed or timed out doing so), you should see TripUpdates & VehiclePositions.
metrics
All three components (monitor-hafas, match-with-gtfs, serve-as-gtfs-rt) expose Prometheus-compatible metrics via HTTP. You can fetch and process them using e.g. Prometheus, VictoriaMetrics or the Grafana Agent.
As an example, we're going to inspect monitor-hafas's metrics. Enable them by running it with a METRICS_SERVER_PORT=9323 environment variable and query its metrics via HTTP:
curl 'http://localhost:9323/metrics'
# HELP nats_streaming_sent_total nr. of messages published to NATS streaming
# TYPE nats_streaming_sent_total counter
nats_streaming_sent_total{channel="movements"} 1673
nats_streaming_sent_total{channel="trips"} 1162
# HELP hafas_reqs_total nr. of HAFAS requests
# TYPE hafas_reqs_total counter
hafas_reqs_total{call="radar"} 12
hafas_reqs_total{call="trip"} 1165
# HELP hafas_response_time_seconds HAFAS response time
# TYPE hafas_response_time_seconds summary
hafas_response_time_seconds{quantile="0.05",call="radar"} 1.0396666666666665
hafas_response_time_seconds{quantile="0.5",call="radar"} 3.8535000000000004
hafas_response_time_seconds{quantile="0.95",call="radar"} 6.833
hafas_response_time_seconds_sum{call="radar"} 338.22600000000006
hafas_response_time_seconds_count{call="radar"} 90
hafas_response_time_seconds{quantile="0.05",call="trip"} 2.4385
hafas_response_time_seconds{quantile="0.5",call="trip"} 28.380077380952383
hafas_response_time_seconds{quantile="0.95",call="trip"} 54.51257142857143
hafas_response_time_seconds_sum{call="trip"} 33225.48200000005
hafas_response_time_seconds_count{call="trip"} 1165
# HELP tiles_fetched_total nr. of tiles fetched from HAFAS
# TYPE tiles_fetched_total counter
tiles_fetched_total 2
# HELP movements_fetched_total nr. of movements fetched from HAFAS
# TYPE movements_fetched_total counter
movements_fetched_total 362
# HELP fetch_all_movements_total how often all movements have been fetched
# TYPE fetch_all_movements_total counter
fetch_all_movements_total 1
# HELP fetch_all_movements_duration_seconds time that fetching all movements currently takes
# TYPE fetch_all_movements_duration_seconds gauge
fetch_all_movements_duration_seconds 2.4
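If you just want a quick number without a full Prometheus setup, the text format is simple enough to parse by hand. The following is a minimal sketch, not a complete parser: it ignores timestamps and escaped label values, and parseMetrics and meanResponseTime are hypothetical helper names. It derives the mean HAFAS response time from the _sum and _count samples of the summary shown above.

```javascript
// Parse Prometheus text exposition lines of the form `name{labels} value` or `name value`.
const parseMetrics = (text) => {
  const samples = []
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === '' || trimmed.startsWith('#')) continue // skip HELP/TYPE comments
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(trimmed)
    if (!match) continue
    const labels = {}
    for (const [, key, value] of (match[2] || '').matchAll(/(\w+)="([^"]*)"/g)) {
      labels[key] = value
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) })
  }
  return samples
}

// Mean response time in seconds for one kind of HAFAS call, or null if unknown.
const meanResponseTime = (samples, call) => {
  const pick = (name) => samples.find((s) => s.name === name && s.labels.call === call)
  const sum = pick('hafas_response_time_seconds_sum')
  const count = pick('hafas_response_time_seconds_count')
  return sum && count && count.value > 0 ? sum.value / count.value : null
}

const text = `
# TYPE hafas_response_time_seconds summary
hafas_response_time_seconds_sum{call="trip"} 33225.48200000005
hafas_response_time_seconds_count{call="trip"} 1165
`
console.log(meanResponseTime(parseMetrics(text), 'trip').toFixed(1)) // 28.5
```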
health check
serve-as-gtfs-rt exposes a health check that checks if there are any recent entities in the feed.
# healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 200 OK
# …
# not healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 503 Service Unavailable
# …
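The same check can be scripted, e.g. for a custom watchdog. Below is a sketch using only Node's built-in http module; isHealthyStatus and checkHealth are hypothetical helper names, and the sketch treats every status other than 200 as unhealthy, which is an assumption based on the two responses shown above.

```javascript
const http = require('http')

// 200 means there are recent entities in the feed; anything else counts as unhealthy.
const isHealthyStatus = (statusCode) => statusCode === 200

// Send a HEAD request (like `curl -I`) and resolve with true or false.
const checkHealth = (url) => new Promise((resolve, reject) => {
  const req = http.request(url, { method: 'HEAD' }, (res) => {
    res.resume() // discard any response body
    resolve(isHealthyStatus(res.statusCode))
  })
  req.on('error', reject)
  req.end()
})

// Usage (not executed here, it needs a running serve-as-gtfs-rt):
// checkHealth('http://localhost:3000/health').then(console.log)
```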
on-demand mode
Optionally, you can run your GTFS-RT feed in a demand-responsive mode, where it will only fetch data from HAFAS as long as someone requests the GTFS-RT feed, which effectively reduces the long-term number of requests to HAFAS.
To understand how this works, remember that
- movements fetched from HAFAS are formatted as GTFS-RT VehiclePositions.
- trips fetched from HAFAS are formatted as GTFS-RT TripUpdates.
- the whole monitor-hafas, match-with-gtfs & serve-as-gtfs-rt setup works like a streaming pipeline.
The on-demand mode works like this:
- monitor-hafas is either just fetching movements (if you configured it to fetch only trips on demand) or completely idle (if you configured it to fetch both movements & trips on demand) by default.
- monitor-hafas also subscribes to a demand NATS Streaming channel, which serves as a communication channel for serve-as-gtfs-rt to signal demand.
- When the GTFS-RT feed is requested via HTTP, serve-as-gtfs-rt serves the current feed (which contains either VehiclePositions only, or no entities whatsoever, depending on the on-demand configuration).
- serve-as-gtfs-rt signals demand via the demand channel.
- Upon receiving a demand signal, monitor-hafas will start fetching trips, or both movements & trips, depending on the on-demand configuration.
This means that, after the first request(s) for the GTFS-RT feed have signalled demand, it will take a bit of time until all data is served with subsequent GTFS-RT feed requests. As long as there is constant demand for the feed, the on-demand mode will behave as if it isn't turned on.
Tell serve-as-gtfs-rt to signal demand via the --signal-demand option. You can then configure monitor-hafas's exact behaviour using the following options:
--movements-fetch-mode <mode>
    Control when movements are fetched from HAFAS.
    "on-demand":
        Only fetch movements from HAFAS when the `serve-as-gtfs-rt` component
        has signalled demand. Trips won't be fetched continuously anymore.
    "continuously" (default):
        Always fetch movements.
--movements-demand-duration <milliseconds>
    With `--movements-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall movements be fetched?
    Default: movements fetching interval (60s by default) * 5
--trips-fetch-mode <mode>
    Control when trips are fetched from HAFAS.
    "never":
        Never fetch a movement's respective trip.
    "on-demand":
        Only fetch movements' respective trips from HAFAS when the `serve-as-gtfs-rt`
        component has signalled demand.
    "continuously" (default):
        Always fetch each movement's respective trip.
--trips-demand-duration <milliseconds>
    With `--trips-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall trips be fetched?
    Default: movements fetching interval (60s by default) * 2
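To illustrate how a demand signal and a demand duration interact, here is a minimal sketch of a "demand window". It is not the actual monitor-hafas implementation: createDemandWindow is a hypothetical helper, and the sketch assumes that every demand signal restarts the window. The clock is injectable so that the scenario can be walked through without waiting.

```javascript
// Tracks whether demand has been signalled recently enough to keep fetching.
const createDemandWindow = (durationMs, now = Date.now) => {
  let demandUntil = -Infinity
  return {
    signalDemand: () => { demandUntil = now() + durationMs },
    shouldFetch: () => now() < demandUntil,
  }
}

const fetchIntervalMs = 60 * 1000 // movements fetching interval, 60s by default
const movementsDemandDuration = fetchIntervalMs * 5 // documented default: 300000 ms
const tripsDemandDuration = fetchIntervalMs * 2 // documented default: 120000 ms

// Simulated clock (in milliseconds) to walk through a scenario.
let clock = 0
const trips = createDemandWindow(tripsDemandDuration, () => clock)
console.log(trips.shouldFetch()) // false: idle until demand is signalled
trips.signalDemand()
clock = 119000
console.log(trips.shouldFetch()) // true: still within the 2-minute window
clock = 121000
console.log(trips.shouldFetch()) // false: demand has expired
```

With a steady stream of feed requests, the window never expires, which matches the "behaves as if it isn't turned on" case described above.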
controlling the number of requests to HAFAS
Currently, there is no mechanism to influence the total rate of requests to HAFAS directly, no prioritisation between the "find trips in a bounding box" (hafas-client's radar()) and "refresh a trip" (hafas-client's trip()) requests, and no logic to efficiently use requests up to a certain configured limit.
However, there are some dials to influence the amount of requests of both types:
- By defining a smaller or larger bounding box via the BBOX environment variable, you can control the total number of monitored trips, and thus the rate of requests.
- By setting FETCH_TILES_INTERVAL, you can choose how often the bounding box (or the vehicles within, rather) shall be refreshed, and subsequently how often each trip will be fetched if you have configured that. Note that if a refresh takes longer than the configured interval, another refresh will follow right after, but the total rate of radar() requests to HAFAS will be lower.
- You can throttle the total number of requests to HAFAS by throttling hafas-client, but depending on the rate you configure, this might cause the refresh of all monitored trips (as well as finding new trips to monitor) to take longer than configured using FETCH_TRIPS_INTERVAL, so consider it as a secondary tool.
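For a rough feeling of how these dials translate into load, here is a back-of-the-envelope sketch. It assumes that every tile and every monitored trip is fetched exactly once per refresh interval and that no refresh overruns the interval; estimateRequestsPerSecond and the numbers are hypothetical, so treat the result as an order of magnitude only.

```javascript
// Rough estimate of the HAFAS request rate per call type.
const estimateRequestsPerSecond = ({ tiles, monitoredTrips, intervalSeconds }) => ({
  radar: tiles / intervalSeconds,
  trip: monitoredTrips / intervalSeconds,
})

// 2 tiles and 360 monitored trips, refreshed every 60 seconds.
const estimate = estimateRequestsPerSecond({ tiles: 2, monitoredTrips: 360, intervalSeconds: 60 })
console.log(estimate.trip) // 6

// Doubling the interval halves the estimated trip() rate.
const relaxed = estimateRequestsPerSecond({ tiles: 2, monitoredTrips: 360, intervalSeconds: 120 })
console.log(relaxed.trip) // 3
```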
exposing feed metadata
If you pass metadata about the GTFS-Static feed used, serve-as-gtfs-rt will expose it via HTTP:
serve-as-gtfs-rt \
--static-feed-info path/to/gtfs/files/feed_info.txt \
--static-feed-url https://data.ndovloket.nl/flixbus/flixbus-eu.zip
curl 'http://localhost:3000/feed_info.csv'
# feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
# openOV,http://openov.nl,en,20210108,20210221,20210108
curl 'http://localhost:3000/feed_info.csv' -I
# HTTP/1.1 302 Found
# location: https://data.ndovloket.nl/flixbus/flixbus-eu.zip
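A consumer of your GTFS-RT feed could use this metadata to check whether the GTFS Static feed it was matched against is still valid. The following sketch parses the feed_info.csv response shown above; parseFeedInfo and isFeedValidOn are hypothetical helpers, and the parser is simplified in that it assumes a header plus a single row without quoted fields.

```javascript
// Parse a two-line feed_info.csv (header + one row) into an object.
const parseFeedInfo = (csv) => {
  const [header, row] = csv.trim().split(/\r?\n/).map((line) => line.split(','))
  return Object.fromEntries(header.map((field, i) => [field, row[i]]))
}

// GTFS dates are YYYYMMDD strings, so comparing them as strings is chronological.
const isFeedValidOn = (feedInfo, yyyymmdd) =>
  feedInfo.feed_start_date <= yyyymmdd && yyyymmdd <= feedInfo.feed_end_date

const feedInfo = parseFeedInfo(`feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
openOV,http://openov.nl,en,20210108,20210221,20210108`)

console.log(isFeedValidOn(feedInfo, '20210201')) // true
console.log(isFeedValidOn(feedInfo, '20210301')) // false
```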
Related
- hafas-gtfs-rt-server-example – Using hafas-client, hafas-monitor-trips & hafas-gtfs-rt-feed as a GTFS-RT server.
- match-gtfs-rt-to-gtfs – Match realtime transit data (e.g. from GTFS Realtime) with GTFS Static data, even if they don't share an ID.
- gtfs-rt-differential-to-full-dataset – Transform a continuous GTFS Realtime stream of DIFFERENTIAL incrementality data into a FULL_DATASET dump.
- transloc-to-gtfs-real-time – Transform Transloc Real Time API to the GTFS RealTime Format
License
This project is dual-licensed: My contributions are licensed under the Prosperity Public License, contributions of other people are licensed as Apache 2.0.
This license allows you to use and share this software for noncommercial purposes for free and to try this software for commercial purposes for thirty days.
Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance, without any anticipated commercial application, doesn’t count as use for a commercial purpose.
Buy a commercial license or read more about why I sell private licenses for my projects.
Contributing
If you have a question or have difficulties using hafas-gtfs-rt-feed, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.
By contributing, you agree to release your modifications under the Apache 2.0 license.