# Contiamo Local Dev Environment

## Get the dev environment fast!
- Get the latest versions: `make docker-auth pull`
- Start everything in normal mode: `make start`
- Stop everything and clean up: `make stop`
- Prepare for Pantheon-external mode (only do this once): `sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'`
- Start everything in Pantheon-external mode (see "Running Pantheon Local Development" below), then, in the Pantheon directory, run:
  `env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt run`
- Connect with `psql` in verify-full mode on port 5435:
  - Download the private key for `pg-localhost.dev.contiamo.io`.
  - Add the host entry: `echo "127.0.0.1 pg-localhost.dev.contiamo.io" | sudo tee -a /etc/hosts`
  - You may need to tell your local `psql` about the IdenTrust root we happen to be using:
    `curl https://letsencrypt.org/certs/trustid-x3-root.pem.txt > ~/.postgresql/root.crt`
  - Connect with:
    `psql "user=email@example.com password=<token> dbname=<project UUID> sslmode=verify-full" -h pg-localhost.dev.contiamo.io -p 5435`
## Prerequisites

Local development is supported via Docker Compose.

Additionally, development requires access to our private Docker registry. To get access, ask the Ops team for permissions. Once permissions have been granted, you must install the Google Cloud SDK (`gcloud`).
Once installed, run

```sh
make docker-auth pull
```

This will attempt to:
- authenticate with Google,
- configure your Docker installation to use the new Google credentials, and
- pull the required Docker images.
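Under the hood, this corresponds roughly to the standard gcloud flow. The following is a sketch, not the exact Makefile recipe; the individual commands may differ:

```sh
# Log in with your Google account (opens a browser window)
gcloud auth login

# Configure Docker to use gcloud as a credential helper for *.gcr.io registries
gcloud auth configure-docker

# Pull the images referenced by the Compose file
docker-compose pull
```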
## Starting a fresh environment
Finally, to start the development environment, run `make start`.
Once the environment has started, you should see a message with a URL and credentials, like this:

```
Dev ui: http://localhost:9898/contiamo/profile
Email: firstname.lastname@example.org
Password: localdev
```
## Starting with the latest localdev snapshot

The above section starts with a completely empty environment. A standard development environment with preconfigured data sources (the internal metadbs) is provided in the project and can be started with `make load-snapshot`.
The existing environment (if any) will be stopped and destroyed, so be careful. It will then start the db, load the data, and then start the rest of the environment.
This environment:

- contains two users, including `email@example.com` with password `localdev`,
- has all of the datahub metadbs installed,
- has two virtualdbs with two views each: one shows the maintenance tasks inside Hub, the other demonstrates the use of PostGIS queries,
- Mr. Lemon is an admin for everything,
- Lemon Jr is not an admin and has various permission levels; for example, `liftdata` is private and not available to Lemon Jr,
- there is a basic amount of metadata assigned to the data sources and tables, including custom fields, descriptions, a mix of names, and even one with documentation.
This should allow for basic development and testing of most use cases.
## Overriding the service images

The image for each service can be overridden using environment variables. You can manually override the image used by setting the required variable and then restarting the services:

```sh
export HUB_IMAGE=eu.gcr.io/dev-and-test-env/hub:v1.2.3
make stop start
```
The default environment runs only the core services required to support the Data Source integrations. To enable the demo sign-up service or integration sync-agents for other resource types (like Tableau), you need to enable the optional integration services. To do this, simply export this env variable:

```sh
export COMPOSE_FILES="-f docker-compose.yml -f docker-compose-extra.yml"
```
This will modify the `start` and `stop` commands to include the integration services.
## Testing PR images

A helper make target is provided that will automatically pull and restart the local environment with the PR preview image for the specified services. For example, to test PR 501 for `hub` together with PR 489 for `contiamo-ui`, run:

```sh
make pr-preview services=hub:501,contiamo-ui:489
```
All other services will use the default images.
To reset to the original state, use `make stop start`.
## End-To-End API testing

The project comes with a suite of end-to-end tests that use the API to verify that the backend services are working as expected. You can run these in any environment using the corresponding make target. This assumes that you have already started the localdev environment using `make start` or `make load-snapshot`.
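For illustration, a hypothetical invocation (the target name here is an assumption; run `make help` to find the real one):

```sh
# Run the end-to-end API test suite against the running environment
make e2e
```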
## Passing S3 credentials for the Federated mode / Datasets

By default, the Datasets feature won't work with external DWH systems (e.g. Redshift, Snowflake), because data transfer to these systems needs to go through mutually accessible object storage. There is a `pantheon-datasource-test` bucket on S3, but this repo doesn't include credentials for it.
If your scenario requires working with an external DWH, you can pass S3 credentials by setting the standard AWS environment variables. Additionally, the bucket name property should be set via the `DATASETS_S3_BUCKET` variable. An easy way to set these variables is a local `.env` file, which Docker Compose picks up automatically.
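A sketch, assuming the services read the standard AWS credential variables (the exact variable names are an assumption):

```sh
# Credentials for the mutually accessible S3 bucket (names are assumptions)
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...

# Bucket name property used by the Datasets feature
export DATASETS_S3_BUCKET=pantheon-datasource-test

# Restart so the services pick up the new environment
make stop start
```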
## Datasets for testing the Profiler

Two pre-created datasets provide more interesting stats and entity detection profiles. These should be used to test the Profiler and the related UI components. The following datasets are available:

- `pii.csv` contains PII columns that should be detected during the entity detection profile.
- `sales.csv` also contains PII data, but is a good sample for the stats report.
## Start and add an external data source

We have a couple of data sets available on GCR for internal testing:

- a Postgres database that contains a single table of Deutsche Bahn lift data,
- a PostGIS (Postgres) database that contains geometry of Alaska regions, for testing geometry-related operations in Pantheon and the PGQL server.

### Deutsche Bahn lift data

After starting the local dev environment, run:

```sh
docker run --name liftdata --rm --network dev_default eu.gcr.io/dev-and-test-env/deutschebahn-liftdata-postgres:v1.0.0
```
In the Data Hub, you can now add an external data source using the following connection info:
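A sketch of the settings; the container is reachable by name on the shared Docker network, but the database name and credentials depend on the image and are assumptions here:

```
Host: liftdata   # the container name on the dev_default network
Port: 5432       # default Postgres port (assumption)
```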
When you are done, run `docker kill liftdata` to stop and clean up the database container.
### PostGIS Alaska regions

After starting the local dev environment, run:

```sh
docker run --name alaska --rm --network dev_default eu.gcr.io/dev-and-test-env/alaska-postgis:1.0.0
```
In the Data Hub, you can now add the data source using the following connection info:
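As above, a sketch (database name and credentials depend on the image and are assumptions):

```
Host: alaska   # the container name on the dev_default network
Port: 5432     # default Postgres port (assumption)
```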
When you are done, run `docker kill alaska` to stop and clean up the database container.
You can always cleanly stop the environment using `make stop`. Any data in the databases will be preserved between restarts.
## Adding the metadbs as external data sources

You can add the Data Hub's own metadbs to the Data Hub, meaning you can inspect the internals of the Data Hub from the Data Hub. :) Each of the metadbs can be added as a PostgreSQL data source.
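A sketch of the connection settings, inferred from the JDBC URL used elsewhere in this document (the per-service database names are not listed here and are assumptions):

```
Host: metadb         # service name on the Docker network; use localhost:5433 from the host machine
Port: 5432
Username: pantheon
Password: test
Database: pantheon   # for Pantheon; other services use their own databases (assumption)
```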
## Accessing the metadbs with pgadmin

Go to http://localhost:5050 (the link is also on the http://localhost:9898/lemonade-shop/configuration page).
Log in with the following credentials:
Then register the `metadb` server with the following connection info:

- Host name/address: `metadb`
- Save password?: ✅
If you need to reclaim space or want to restart your environment from scratch, use the cleanup make target. This will stop your current environment and remove any Docker volumes related to it, including any data and metadata in the databases.
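A hypothetical name for that target (run `make help` to confirm the real one):

```sh
# Stop the environment and remove its Docker volumes
make clean
```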
As time goes on, Docker will download new images, but it does not automatically garbage collect old images. To do so, run `docker system prune`.
On Mac, all Docker file system data is stored in a single file of a fixed size, which is 16GB or 32GB by default. You can configure the size of this file by clicking on the Docker Desktop tray icon -> Preferences -> Disk -> move the slider.
## Exporting and restoring the database state

You can find the export and `restore.sh` scripts in the `./scripts` folder. Both scripts take a single parameter: a filename.
To make an encrypted snapshot from your local dev environment, use the export script:
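A hypothetical invocation (the export script's exact name is an assumption; check the `./scripts` folder):

```sh
# Export an encrypted snapshot to the given file
./scripts/export.sh my-snapshot
```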
This will ask you to set an encryption key, then export the database of each service, applying compression. The snapshot is encrypted with a symmetric key (AES-128 cipher).
To erase your local database for each service and restore it to the state of an earlier exported snapshot, use `restore.sh`:
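For example (the snapshot filename is illustrative):

```sh
# Restore all service databases from the given snapshot file
./scripts/restore.sh my-snapshot
```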
This will delete all the data you have locally and perform the reverse of the export operation.
IMPORTANT: do not move the scripts out of their `./scripts` folder; they use relative paths.
`make load-snapshot` uses the committed `localdev.snapshot`. You can use the script, as described above, to load any other snapshots.
Run `make help` to see all available commands.
You can also run these commands from a different directory, with e.g. `make -C /path/to/dev start`.
The commands in the Makefile are very useful, but there's some extra functionality available if you use `docker-compose` directly. For instance, get all logs with `docker-compose logs --follow`, or only datastore worker logs with `docker-compose logs --follow ds-worker`. Refer to `docker-compose.yml` for the definitions of the services.
To avoid `cd`'ing to this directory, use e.g. `docker-compose -f /path/to/dev/docker-compose.yml logs --follow`.
The Compose file supports overriding the Docker image or tag used for a service by setting environment variables, e.g. `HUB_IMAGE` (see "Overriding the service images" above) or `PANTHEON_TAG` (see below).
## Options to Postgres

In the environment variable `POSTGRES_ARGS`, you can pass extra arguments to the PostgreSQL daemon. By default, this is set to `-c log_connections=on`. To log modification statements in addition to connections, start the dev environment with:

```sh
env POSTGRES_ARGS="-c log_connections=on -c log_statement=mod" make start
```
You can inspect these logs with `docker-compose logs --follow metadb`. The four acceptable values for `log_statement` are `none`, `ddl`, `mod`, and `all`. Further Postgres options can be found here: https://www.postgresql.org/docs/11/runtime-config.html .
## Setting up Pantheon Local Development

Local Pantheon debug development is supported by port redirection. To set this up, you first need to perform two extra steps.

First, build the `eu.gcr.io/dev-and-test-env/pantheon:redir` Docker image, a "pseudo-Pantheon" that forwards everything to your local Pantheon on port `4300`. Do not push this image!

Second, edit your `/etc/hosts` file to add an entry for `metadb`. You can easily do this with:

```sh
sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'
```

This ensures that Pantheon can correctly resolve the storage database service.
## Running Pantheon Local Development

Make sure you first set up the prerequisites, and also completed the Pantheon local development setup above. To start the Pantheon dev environment, use:
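The exact command is not captured here; one plausible invocation, by analogy with the prod-mode example at the end of this document (`PANTHEON_TAG` and the `redir` tag being usable this way are assumptions), is:

```sh
# Run the stack with the port-redirection Pantheon image built during setup
env PANTHEON_TAG=redir make start
```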
This will replace the Pantheon image with a simple port redirection image that enables transparent redirection of:
- http://localhost:9898/pantheon/api/v1/* to http://localhost:4300/api/v1/*,
- http://localhost:9898/pantheon/jdbc/* to http://localhost:8765/*.
You can then start your local Pantheon debug build, e.g. from your IDE, and have it bind to those ports on localhost. To configure the meta-DB and enable data store from Pantheon, run SBT with
```sh
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt
```

or set the same environment variables in IntelliJ. You can also use

```sh
export METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external
```

to set the environment variables in the current terminal.
The docker-compose configuration will expose the following ports for use from local Pantheon:

- Nginx web server at `9898` <-- use this to access Data Hub, including UI, IDP, Pantheon, Datastore.
- PostgreSQL meta-DB at `5433`.
- Datastore manager.
- Minio (for ingested files).
When accessing Pantheon via Nginx on port 9898, you need to prepend `/pantheon` to Pantheon URLs, for instance: http://localhost:9898/pantheon/api/v1/status . Nginx will strip off the `/pantheon` prefix, authenticate the request with IDP, and forward the request to Pantheon as `/api/v1/status`.
Using the `pantheon`/`test` credentials for Postgres, you also have access to:

- the `metadb` database,
- for datastore: collection databases corresponding to a managed DB, and collection databases corresponding to materializations for a project.
## Running a custom Pantheon in prod mode

You can also run Pantheon in prod mode locally, as follows.

- From a console, run `docker build -t eu.gcr.io/dev-and-test-env/pantheon:local .`. This will download dependencies if they are not cached yet, build a Docker image for Pantheon, and tag it `eu.gcr.io/dev-and-test-env/pantheon:local`.
- Run `env PANTHEON_TAG=local make start`.
Now datastore and metadb will still be available on the usual ports, but Nginx will proxy to a prod-mode Pantheon which runs inside Docker. Pantheon will automatically be run with appropriate environment variables (https://github.com/contiamo/dev/blob/master/docker-compose.yml#L81).
Warning! Do not push this image to GCR. It may accidentally end up being deployed on dev.contiamo.io .
The Profiler currently lives at http://localhost:8383.