
barnard59

barnard59 is a toolkit to automate extract, transform and load (ETL) tasks. Its main focus is on creating Linked Data. It allows you to generate RDF from non-RDF data sources. In doing so, it follows the standards adopted in the Semantic Web.

More specifically, barnard59 is an engine to execute data pipelines.

Pipeline

A pipeline is a way to extract, transform, and load data (ETL). It allows you to consolidate data from various sources and publish it as RDF. Using a pipeline, you can:

download the source data
validate it
transform it to linked data
publish it

barnard59 expects a pipeline to be defined as Linked Data. The supported formats are:

application/ld+json
application/n-triples
text/turtle
application/rdf+xml

To learn more about building your own pipeline, go here.

Requirements

To execute a pipeline with barnard59, you need a Node.js environment. barnard59 supports all actively maintained versions of Node.js and npm.

To check whether Node.js is installed, run this command in your terminal:

node -v

To confirm that npm is installed, run:

npm -v

If you need to install or upgrade Node.js or npm, go here. Once both are installed, install the dependencies for this project by running:

npm install
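The version checks above can also be scripted, for example as a pre-flight step in CI. The following is a minimal sketch, assuming a bash shell; the helper name check_tool is hypothetical and not part of barnard59. It prints each tool's version, or reports an error and returns a non-zero status if the tool is not on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of barnard59): print a tool's version,
# or report an error and return non-zero if the tool is not on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    # Same check as in the README: run "<tool> -v"
    echo "$1 $("$1" -v)"
  else
    echo "$1 is not installed" >&2
    return 1
  fi
}
```

For example, check_tool node && check_tool npm && npm install installs the dependencies only when both tools are found.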

Executing a pipeline

To execute a pipeline, run:

barnard59 run <pipeline_file>

The following arguments are available:

--pipeline <pipeline_iri>
--output <filename>
--verbose or -v
--enable-buffer-monitor
--variable <name=value>

If the file contains more than one pipeline, you can choose which one to execute. By default, the first pipeline is executed. To select the pipeline to execute, run:

barnard59 run <pipeline_file> --pipeline <pipeline_iri>

Some pipelines produce a write stream. By default, this stream is printed to the terminal. To write it to a file, run:

barnard59 run <pipeline_file> --output <output_file>

Pipelines can perform complex data manipulations. To better understand what is happening, add these options:

--verbose or -v - to enable diagnostic console output
--enable-buffer-monitor - to enable histogram of buffer usage

Passing arguments to the pipeline

You can pass arguments from the command line to the pipeline. This lets you use a variable in the pipeline without defining its value there: the key-value pairs for pipeline variables are provided at runtime. In the CLI tool, this is done with the --variable argument. To pass an argument to the pipeline, run:

barnard59 run <pipeline_file> --variable <variable_name_in_pipeline>=<value>

For example, to pass example.txt as the filename variable to pipeline.json, run:

barnard59 run pipeline.json --variable filename=example.txt

In a CI/CD environment, you might want to define sensitive arguments such as usernames and passwords as environment variables. To pass one to the pipeline, run:

barnard59 run pipeline.json --variable password=$PASSWORD

where $PASSWORD is an environment variable.

To pass multiple arguments to the pipeline, repeat the --variable argument:

barnard59 run <pipeline_file> --variable <variable1>=<value1> --variable <variable2>=<value2> ... --variable <variableN>=<valueN>

For example, to pass $USER as the user and $PASSWORD as the password to pipeline.json, run:

barnard59 run pipeline.json --variable user=$USER --variable password=$PASSWORD
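In CI, it helps to fail early when a credential is missing and to quote each name=value pair, so that a password containing spaces or shell metacharacters reaches barnard59 as a single argument. The following is a minimal sketch, assuming a bash shell and the user/password variable names from the example above; run_pipeline and the BARNARD59 override (which swaps in a different command, for example for testing) are hypothetical names introduced here, not part of barnard59:

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper (not part of barnard59): forwards credentials
# from environment variables to a pipeline as --variable arguments.
run_pipeline() {
  local pipeline_file="${1:?usage: run_pipeline <pipeline_file>}"
  # Abort with a clear message if a required variable is unset or empty
  : "${USER:?USER must be set}"
  : "${PASSWORD:?PASSWORD must be set}"
  # BARNARD59 (hypothetical) overrides the command; defaults to barnard59.
  # Quoting keeps each name=value pair a single argument.
  "${BARNARD59:-barnard59}" run "$pipeline_file" \
    --variable "user=${USER}" \
    --variable "password=${PASSWORD}"
}
```

With USER and PASSWORD set in the CI environment, run_pipeline pipeline.json is equivalent to the command above, with each value safely quoted.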

Examples

Transform a CSV file

This pipeline parses a CSV file. It uses the CSV on the Web (CSVW) standard. The file examples/parse-csvw.ttl contains two pipelines:

<parseCsvw> - the main pipeline
<parseMetadata> - the metadata pipeline

To execute the main pipeline, run:

barnard59 run examples/parse-csvw.ttl --pipeline=http://example.org/pipeline/parseCsvw

Transform a JSON file

This pipeline downloads and transforms a JSON file. It:

fetches a JSON document from the predefined URL
transforms the JSON into a JSON-LD structure, using a map step and a context variable in the JSON file
generates RDF-JS quads from the resulting JSON-LD structure
serializes the quads into N-Triples

For comparison, the same pipeline is defined in both JSON-LD and Turtle formats.

To execute the JSON-LD pipeline, run:

node bin/barnard59.js run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet

The --pipeline parameter is required, as the file contains two pipelines.

To execute the Turtle pipeline, run:

node bin/barnard59.js run examples/fetch-json-to-ntriples.ttl --pipeline http://example.org/pipeline/utc

By default, the pipeline stream is written to stdout. Use the --output parameter to write it to a file:

node bin/barnard59.js run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet --output test.nt
