# Contiamo Local Dev Environment

## Get the dev environment fast!
- Get the latest versions: `make docker-auth pull`
- Start everything in normal mode: `make start`
- Stop everything and clean up: `make stop`
- Prepare for Pantheon-external mode (only do this once): `sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'`
- Start everything in Pantheon-external mode (see "Running Pantheon Local Development" below), then, in the Pantheon directory, run:
  `env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt run`
- Connect with `psql` in verify-full mode on port 5435:
  - Download the private key for `pg-localhost.dev.contiamo.io`.
  - Add the host entry: `echo "127.0.0.1 pg-localhost.dev.contiamo.io" | sudo tee -a /etc/hosts`
  - You may need to tell your local `psql` about the IdenTrust root we happen to be using:
    `curl https://letsencrypt.org/certs/trustid-x3-root.pem.txt > ~/.postgresql/root.crt`
  - Connect with:
    `psql "user=email@example.com password=<token> dbname=<project UUID> sslmode=verify-full" -h pg-localhost.dev.contiamo.io -p 5435`
## Prerequisites

Local development is supported via Docker Compose.

Additionally, development requires access to our private Docker registry. To get access, ask the Ops team for permissions. Once permissions have been granted, you must install the Google Cloud SDK (`gcloud`).
Once installed, run

```sh
make docker-auth pull
```

This will attempt to:
- authenticate with Google,
- configure your Docker installation to use the new Google credentials, and
- pull the required Docker images.
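Under the hood, this corresponds roughly to the standard gcloud flow. The following is a sketch, not the exact Makefile recipe; the individual commands may differ:

```sh
# Log in with your Google account (opens a browser window)
gcloud auth login

# Configure Docker to use gcloud as a credential helper for *.gcr.io registries
gcloud auth configure-docker

# Pull the images referenced by the Compose file
docker-compose pull
```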
## Starting a fresh environment
Finally, to start the development environment, run `make start`.
Once the environment has started, you should see a message with a URL and credentials, like this:

```
Dev ui: http://localhost:9898/contiamo/profile
Email: firstname.lastname@example.org
Password: localdev
```
## Starting with the latest localdev snapshot

The above section starts with a completely empty environment. A standard development environment with preconfigured data sources (the internal metadbs) is provided in the project and can be started with `make load-snapshot`.
The existing environment (if any) will be stopped and destroyed, so be careful. It will then start the db, load the data, and then start the rest of the environment.
This environment:

- contains two users, including `email@example.com` with password `localdev`,
- has all of the datahub metadbs installed,
- has two virtualdbs with two views each: one shows the maintenance tasks inside Hub, the other demonstrates the use of PostGIS queries,
- Mr. Lemon is an admin for everything,
- Lemon Jr is not an admin and has various permission levels; for example, `liftdata` is private and not available to Lemon Jr,
- there is a basic amount of metadata assigned to the data sources and tables, including custom fields, descriptions, a mix of names, and even one with documentation.
This should allow for basic development and testing of most use cases.
## Overriding the service images

The image for each service can be overridden using environment variables. You can manually override the image used by setting the required variable and then restarting the services:

```sh
export HUB_IMAGE=eu.gcr.io/dev-and-test-env/hub:v1.2.3
make stop start
```
The default environment runs only the core services required to support the Data Source integrations. To enable the demo sign-up service or integration sync-agents for other resource types (like Tableau), you need to enable the optional integration services. To do this, simply export this env variable:

```sh
export COMPOSE_FILES="-f docker-compose.yml -f docker-compose-extra.yml"
```
This will modify the `start` and `stop` commands to include the integration services.
## Testing PR images

A helper make target is provided that will automatically pull and restart the local environment with the PR preview image for the specified services. For example, to test PR 501 for `hub` together with PR 489 for `contiamo-ui`, run:

```sh
make pr-preview services=hub:501,contiamo-ui:489
```
All other services will use the default images.
To reset to the original state, use `make stop start`.
## End-To-End API testing

The project comes with a suite of end-to-end tests that use the API to verify that the backend services are working as expected. You can run these in any environment using the corresponding make target. This assumes that you have already started the localdev environment using `make start` or `make load-snapshot`.
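For illustration, a hypothetical invocation (the target name here is an assumption; run `make help` to find the real one):

```sh
# Run the end-to-end API test suite against the running environment
make e2e
```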
## Passing S3 credentials for the Federated mode / Datasets

By default, the Datasets feature won't work with external DWH systems (e.g. Redshift, Snowflake), because data transfer to these systems needs to go through mutually accessible object storage. There is a `pantheon-datasource-test` bucket on S3, but this repo doesn't include credentials for it.
If your scenario requires working with an external DWH, you can pass S3 credentials by setting the standard AWS environment variables. Additionally, the bucket name property should be set via the `DATASETS_S3_BUCKET` variable. An easy way to set these variables is a local `.env` file, which Docker Compose picks up automatically.
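A sketch, assuming the services read the standard AWS credential variables (the exact variable names are an assumption):

```sh
# Credentials for the mutually accessible S3 bucket (names are assumptions)
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...

# Bucket name property used by the Datasets feature
export DATASETS_S3_BUCKET=pantheon-datasource-test

# Restart so the services pick up the new environment
make stop start
```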
## Datasets for testing the Profiler

Two pre-created datasets provide more interesting stats and entity detection profiles. These should be used to test the Profiler and the related UI components. The following datasets are available:

- `pii.csv` contains PII columns that should be detected during the entity detection profile.
- `sales.csv` also contains PII data, but is a good sample for the stats report.
## Start and add an external data source

We have a couple of data sets available on GCR for internal testing:

- a Postgres database that contains a single table of Deutsche Bahn lift data,
- a PostGIS (Postgres) database that contains geometry of Alaska regions, for testing geometry-related operations in Pantheon and the PGQL server.

### Deutsche Bahn lift data

After starting the local dev environment, run:

```sh
docker run --name liftdata --rm --network dev_default eu.gcr.io/dev-and-test-env/deutschebahn-liftdata-postgres:v1.0.0
```
In the Data Hub, you can now add an external data source using the following connection info:
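A sketch of the settings; the container is reachable by name on the shared Docker network, but the database name and credentials depend on the image and are assumptions here:

```
Host: liftdata   # the container name on the dev_default network
Port: 5432       # default Postgres port (assumption)
```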
When you are done, run `docker kill liftdata` to stop and clean up the database container.
### PostGIS Alaska regions

After starting the local dev environment, run:

```sh
docker run --name alaska --rm --network dev_default eu.gcr.io/dev-and-test-env/alaska-postgis:1.0.0
```
In the Data Hub, you can now add the data source using the following connection info:
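As above, a sketch (database name and credentials depend on the image and are assumptions):

```
Host: alaska   # the container name on the dev_default network
Port: 5432     # default Postgres port (assumption)
```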
When you are done, run `docker kill alaska` to stop and clean up the database container.
You can always cleanly stop the environment using `make stop`. Any data in the databases will be preserved between restarts.
## Adding the metadbs as external data sources

You can add the Data Hub's own metadbs to the Data Hub, meaning you can inspect the internals of the Data Hub from the Data Hub. :) Each of the metadbs can be added as a PostgreSQL data source.
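A sketch of the connection settings, inferred from the JDBC URL used elsewhere in this document (the per-service database names are not listed here and are assumptions):

```
Host: metadb         # service name on the Docker network; use localhost:5433 from the host machine
Port: 5432
Username: pantheon
Password: test
Database: pantheon   # for Pantheon; other services use their own databases (assumption)
```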
## Accessing the metadbs with pgadmin

Go to http://localhost:5050 (the link is also on the http://localhost:9898/lemonade-shop/configuration page).
Log in with the following credentials:
Then register the `metadb` server with the following connection info:

- Host name/address: `metadb`
- Save password?: ✅
If you need to reclaim space or want to restart your environment from scratch, use the cleanup make target. This will stop your current environment and remove any Docker volumes related to it, including any data and metadata in the databases.
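A hypothetical name for that target (run `make help` to confirm the real one):

```sh
# Stop the environment and remove its Docker volumes
make clean
```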
As time goes on, Docker will download new images, but it does not automatically garbage collect old images. To do so, run `docker system prune`.
On Mac, all Docker file system data is stored in a single file of a fixed size, which is 16GB or 32GB by default. You can configure the size of this file by clicking on the Docker Desktop tray icon -> Preferences -> Disk -> move the slider.
## Exporting and restoring the database state

You can find the export and `restore.sh` scripts in the `./scripts` folder. Both scripts take a single parameter: a filename.
To make an encrypted snapshot from your local dev environment, use the export script:
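A hypothetical invocation (the export script's exact name is an assumption; check the `./scripts` folder):

```sh
# Export an encrypted snapshot to the given file
./scripts/export.sh my-snapshot
```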
This will ask you to set an encryption key, then export the database of each service, applying compression. The snapshot is encrypted with a symmetric key (AES-128 cipher).
To erase your local database for each service and restore it to the state of an earlier exported snapshot, use `restore.sh`:
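For example (the snapshot filename is illustrative):

```sh
# Restore all service databases from the given snapshot file
./scripts/restore.sh my-snapshot
```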
This will delete all the data you have locally and perform the reverse of the export operation.
IMPORTANT: do not move the scripts out of their `./scripts` folder; they use relative paths.
`make load-snapshot` uses the committed `localdev.snapshot`. You can use the script, as described above, to load any other snapshots.
Run `make help` to see all available commands.
You can also run these commands from a different directory, with e.g. `make -C /path/to/dev start`.
The commands in the Makefile are very useful, but there's some extra functionality available if you use `docker-compose` directly. For instance, get all logs with `docker-compose logs --follow`, or only datastore worker logs with `docker-compose logs --follow ds-worker`. Refer to `docker-compose.yml` for the definitions of the services.
To avoid `cd`'ing to this directory, use e.g. `docker-compose -f /path/to/dev/docker-compose.yml logs --follow`.
The Compose file supports overriding the Docker image or tag used for a service by setting environment variables, e.g. `HUB_IMAGE` (see "Overriding the service images" above) or `PANTHEON_TAG` (see below).
## Options to Postgres

In the environment variable `POSTGRES_ARGS`, you can pass extra arguments to the PostgreSQL daemon. By default, this is set to `-c log_connections=on`. To log modification statements in addition to connections, start the dev environment with:

```sh
env POSTGRES_ARGS="-c log_connections=on -c log_statement=mod" make start
```
You can inspect these logs with `docker-compose logs --follow metadb`. The four acceptable values for `log_statement` are `none`, `ddl`, `mod`, and `all`. Further Postgres options can be found here: https://www.postgresql.org/docs/11/runtime-config.html .
## Setting up Pantheon Local Development

Local Pantheon debug development is supported by port redirection. To set this up, you first need to perform two extra steps.

First, build the `eu.gcr.io/dev-and-test-env/pantheon:redir` Docker image, a "pseudo-Pantheon" that forwards everything to your local Pantheon on port `4300`. Do not push this image!

Second, edit your `/etc/hosts` file to add an entry for `metadb`. You can easily do this with:

```sh
sudo bash -c 'echo "127.0.0.1 metadb" >> /etc/hosts'
```

This ensures that Pantheon can correctly resolve the storage database service.
## Running Pantheon Local Development

Make sure you first set up the prerequisites, and also completed the Pantheon local development setup above. To start the Pantheon dev environment, use:
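The exact command is not captured here; one plausible invocation, by analogy with the prod-mode example at the end of this document (`PANTHEON_TAG` and the `redir` tag being usable this way are assumptions), is:

```sh
# Run the stack with the port-redirection Pantheon image built during setup
env PANTHEON_TAG=redir make start
```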
This will replace the Pantheon image with a simple port redirection image that enables transparent redirection of:
- http://localhost:9898/pantheon/api/v1/* to http://localhost:4300/api/v1/*,
- http://localhost:9898/pantheon/jdbc/* to http://localhost:8765/*.
You can then start your local Pantheon debug build, e.g. from your IDE, and have it bind to those ports on localhost. To configure the meta-DB and enable data store from Pantheon, run SBT with
```sh
env METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external sbt
```

or set the same environment variables in IntelliJ. You can also use

```sh
export METADB_URL="jdbc:postgresql://localhost:5433/pantheon?user=pantheon&password=test" DATASTORE_TYPE=external
```

to set the environment variables in the current terminal.
The docker-compose configuration will expose the following ports for use from local Pantheon:

- Nginx web server at `9898` <-- use this to access Data Hub, including UI, IDP, Pantheon, Datastore.
- PostgreSQL meta-DB at `5433`.
- Datastore manager.
- Minio (for ingested files).
When accessing Pantheon via Nginx on port 9898, you need to prepend `/pantheon` to Pantheon URLs, for instance: http://localhost:9898/pantheon/api/v1/status . Nginx will strip off the `/pantheon` prefix, authenticate the request with IDP, and forward the request to Pantheon as `/api/v1/status`.
Using the `pantheon`/`test` credentials for Postgres, you also have access to:

- the `metadb` database,
- for datastore: collection databases corresponding to a managed DB, and collection databases corresponding to materializations for a project.
## Running a custom Pantheon in prod mode

You can also run Pantheon in prod mode locally, as follows.

- From a console, run `docker build -t eu.gcr.io/dev-and-test-env/pantheon:local .`. This will download dependencies if they are not cached yet, build a Docker image for Pantheon, and tag it `eu.gcr.io/dev-and-test-env/pantheon:local`.
- Run `env PANTHEON_TAG=local make start`.
Now datastore and metadb will still be available on the usual ports, but Nginx will proxy to a prod-mode Pantheon which runs inside Docker. Pantheon will automatically be run with appropriate environment variables (https://github.com/contiamo/dev/blob/master/docker-compose.yml#L81).
Warning! Do not push this image to GCR. It may accidentally end up being deployed on dev.contiamo.io .
The Profiler currently lives at http://localhost:8383.