@nqminds/crop-doc-proc-databot

A databot class for the crop doc data processing chain

This package provides a base class that crop doc process databots can extend to gain a set of utility functions, error handling, and logging.

Installation and usage

npm i @nqminds/crop-doc-proc-databot

In the entry point of your databot:

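// nqm-databot-utils reads the databot's input and calls the piped function with (input, output, context)
const databotUtils = require("@nqminds/nqm-databot-utils");
const MyDatabot = require("./path/to/databot");

// Entry function: construct your ProcessDatabot subclass and start it
const databot = function (input, output, context) {
  const myDatabotInstance = new MyDatabot(input, output, context);
  myDatabotInstance.start();
};

databotUtils.input.pipe(databot);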

In your databot code:

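const ProcessDatabot = require("@nqminds/crop-doc-proc-databot");

class MyDatabot extends ProcessDatabot {
  // Called by start(); put the databot's processing logic here
  async main() {
    // Code of databot
  }
}

// Export the class so the entry point can require() it
module.exports = MyDatabot;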

Package parameters

Databots that extend ProcessDatabot must provide the following data in the packageParams object on their context:

{
  name: "string", // The unique name of this databot definition
  manifest: [
    // See Manifest below for more details
    {
      inputName: "string",
      inputType: "string",
      ttl: "number",
      timeKey: "string",
      owner: "string",
    },
  ],
  // Python installation options (pick one only)
  pythonPackages: ["string"], // Names of python3 packages to install
  condaEnv: "string", // Path of a conda environment.yml
  usePoetry: "bool", // Install using Poetry (pyproject.toml must be defined, poetry.lock is optional)
  javaSubProcess: { // Optional
    port: "number", // The port on which to launch the websocket server for communication
    executablePath: "string", // The location of the Java executable to run
  },
}
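A pre-deployment check of these parameters can catch mistakes before the databot runs. The sketch below is a hypothetical helper, not part of @nqminds/crop-doc-proc-databot. It enforces two rules stated in this README (unique inputName values and "pick one only" for the Python options) plus some assumptions of its own: usePoetry: false counts as "not set", and javaSubProcess.port must be an integer.

```javascript
// Hypothetical helper (not part of @nqminds/crop-doc-proc-databot):
// returns a list of problems found in a packageParams object.
function validatePackageParams(params) {
  const errors = [];

  if (typeof params.name !== "string" || params.name.length === 0) {
    errors.push("name must be a non-empty string");
  }

  if (!Array.isArray(params.manifest)) {
    errors.push("manifest must be an array");
  } else {
    // Each manifest entry needs a unique inputName.
    const seen = new Set();
    for (const entry of params.manifest) {
      if (seen.has(entry.inputName)) {
        errors.push(`duplicate inputName: ${entry.inputName}`);
      }
      seen.add(entry.inputName);
    }
  }

  // The Python installation options are mutually exclusive ("pick one only").
  // Assumption: usePoetry: false is treated the same as not setting it.
  const pythonOptions = ["pythonPackages", "condaEnv", "usePoetry"].filter(
    (key) => params[key] !== undefined && params[key] !== false
  );
  if (pythonOptions.length > 1) {
    errors.push(`pick only one python option, got: ${pythonOptions.join(", ")}`);
  }

  // javaSubProcess is optional, but when present it needs both fields.
  if (params.javaSubProcess !== undefined) {
    const { port, executablePath } = params.javaSubProcess;
    if (!Number.isInteger(port)) {
      errors.push("javaSubProcess.port must be an integer");
    }
    if (typeof executablePath !== "string") {
      errors.push("javaSubProcess.executablePath must be a string");
    }
  }

  return errors;
}
```

Returning a list instead of throwing on the first problem means every mistake in a definition is reported in a single pass.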

Manifest

The manifest of a process databot describes the inputs the databot needs in order to run. Before a process databot begins, it verifies that all of these inputs are available.

{
  inputName: "input1", // A unique name for this input
  inputType: "geotiff", // The type of input; determines how the databot loads it
                        // Usually one of "geotiff" or "dataset"
  ttl: 60, // Maximum age, in minutes, of the most recent input for it to be considered valid
  timeKey: "timestamp", // For datasets: the field containing each record's creation time
  owner: "info@nqminds.com", // Email address of the party responsible for this input
}
// Dataset example with time sensitivity
{inputName: "dataInputA", inputType: "dataset", ttl: 60, timeKey: "timestamp", owner: "info@nqminds.com"}
// GeoTIFF example
{inputName: "dataInput", inputType: "geotiff", ttl: 60, owner: "info@nqminds.com"}

Each input is a resource on the TDX. Tag every input resource with both its input name and its input type. The resource's creation time is used to check the TTL.
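The availability check can be sketched as follows. This is a hypothetical illustration, not the package's implementation. It assumes each TDX resource is represented as { tags: string[], created: Date | string }, and it ignores timeKey, checking only resource creation time. An input counts as unavailable if no resource carries both its name and type tags, or if the newest matching resource is older than ttl minutes.

```javascript
// Hypothetical sketch of the pre-run manifest check; the resource shape
// ({ tags: string[], created: Date | string }) is an assumption, not the TDX API.
const MS_PER_MINUTE = 60 * 1000;

function findUnavailableInputs(manifest, resources, now = new Date()) {
  const unavailable = [];
  for (const entry of manifest) {
    // A resource matches if it is tagged with both the input name and the input type.
    const matches = resources.filter(
      (r) => r.tags.includes(entry.inputName) && r.tags.includes(entry.inputType)
    );
    if (matches.length === 0) {
      unavailable.push({ inputName: entry.inputName, reason: "missing" });
      continue;
    }
    // Only the most recent matching resource matters for the TTL.
    const newest = Math.max(...matches.map((r) => new Date(r.created).getTime()));
    const ageMinutes = (now.getTime() - newest) / MS_PER_MINUTE;
    // Assumption: an entry without a ttl accepts inputs of any age.
    if (entry.ttl !== undefined && ageMinutes > entry.ttl) {
      unavailable.push({ inputName: entry.inputName, reason: "stale" });
    }
  }
  return unavailable;
}
```

An empty result means every manifest input is present and fresh enough to run.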

Conda environment

To specify your Python environment, create an environment.yml listing your dependencies and set condaEnv in packageParams to its path. Below is an example YAML environment file:

channels:
  - conda-forge
  - defaults
  - mro
dependencies:
  - python=3.7.*
  - scikit-learn=0.20.*
  - scipy=1.2.*
  - matplotlib=3.0.*
  - pandas=0.24.*
  - pymongo=3.7.*
  - pytest=4.4.*
  - pip=19.0.*
  - pip:
      - pytest-mpl==0.10.*

Communication with Java

If the javaSubProcess option is set in packageParams, the process databot instantiates a Java communicator, available as this.javaCom. Use it to communicate with a Java process (see the java stub package for example Java code). Usage is as follows:

// Inside a method of your databot, e.g. main()
this.javaCom.on("ready", () => { // The Java process is ready to receive inputs
  this.javaCom.sendData([{inputType: "file", path: "home"}]); // Send a JSON array of input values
});
this.javaCom.on("data", (data) => {
  // Do something with the data received from Java
});
this.javaCom.on("end", (code) => {
  // The Java process has disconnected from the websocket
});
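Because main() is async, it can be convenient to wrap these events in a Promise and await the Java result. The sketch below is a hypothetical helper, runJava, and is not part of the package. It assumes "ready" fires once, "data" may fire several times, and an "end" code of 0 means success. FakeJavaCom is likewise hypothetical: it is a stand-in EventEmitter for testing without a Java process.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: resolves with every "data" payload once Java disconnects.
// Assumption: an "end" code of 0 means success; anything else rejects.
function runJava(javaCom, inputs) {
  return new Promise((resolve, reject) => {
    const received = [];
    javaCom.on("ready", () => javaCom.sendData(inputs));
    javaCom.on("data", (data) => received.push(data));
    javaCom.on("end", (code) => {
      if (code === 0) resolve(received);
      else reject(new Error(`Java process ended with code ${code}`));
    });
  });
}

// Hypothetical stand-in for this.javaCom: echoes each input back as a
// "data" event, then ends with code 0.
class FakeJavaCom extends EventEmitter {
  sendData(inputs) {
    setImmediate(() => {
      for (const input of inputs) this.emit("data", { echoed: input });
      this.emit("end", 0);
    });
  }
}
```

Inside main() this would be used as `const results = await runJava(this.javaCom, [{inputType: "file", path: "home"}]);`. Listeners must be attached before the communicator emits "ready".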

Functionality

API Reference

process-databot

ProcessDatabot ⏏

Process databot base class

Kind: Exported class

processDatabot.getDatasetId(dataset) ⇒ string

Returns the dataset id of a resource created by the app

Kind: instance method of ProcessDatabot
Returns: string - datasetId

| Param | Type | Description |
| --- | --- | --- |
| dataset | string | Id of the schema for the dataset (e.g. serviceUsers) |

processDatabot.main()

The main function of this databot. You must override this function in your own databot class, as it will be called by start().

Kind: instance method of ProcessDatabot

processDatabot.start()

Starts the databot

Kind: instance method of ProcessDatabot

processDatabot.log(message)

Adds a timestamped message to the process log

Kind: instance method of ProcessDatabot

| Param | Type | Description |
| --- | --- | --- |
| message | string | Text content of the message |

processDatabot.installPythonPackages()

Installs required python3 packages

Kind: instance method of ProcessDatabot

processDatabot.python(scriptName, ...args)

Runs a Python script with python3, passing the given arguments

Kind: instance method of ProcessDatabot

| Param | Type | Description |
| --- | --- | --- |
| scriptName | string | The name of the Python script; the file is looked up in the current working directory |
| ...args | any | Arguments to pass to the Python script |
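To illustrate what a call like this involves, here is a hypothetical equivalent built on Node's child_process.execFile. It is not the package's implementation. The helper names buildPythonArgs and runPython are made up for this sketch, and JSON-encoding non-string arguments is an assumption. The real method may stringify arguments or handle output differently.

```javascript
const path = require("path");
const { execFile } = require("child_process");

// Hypothetical: builds the argument list for python3. The script path is
// resolved against the working directory, and non-string args are JSON-encoded.
function buildPythonArgs(scriptName, args, cwd = process.cwd()) {
  const scriptPath = path.resolve(cwd, scriptName);
  return [scriptPath, ...args.map((a) => (typeof a === "string" ? a : JSON.stringify(a)))];
}

// Hypothetical: runs the script with python3 (assumed to be on PATH) and
// resolves with its stdout, or rejects with the error plus stderr.
function runPython(scriptName, ...args) {
  return new Promise((resolve, reject) => {
    execFile("python3", buildPythonArgs(scriptName, args), (error, stdout, stderr) => {
      if (error) reject(Object.assign(error, { stderr }));
      else resolve(stdout);
    });
  });
}
```

Using execFile instead of exec passes each argument directly to python3 without a shell, so arguments containing spaces or quotes need no escaping.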

processDatabot.finish()

Called when the databot exits

Kind: instance method of ProcessDatabot
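To unit-test a subclass's main() without the TDX, you could temporarily extend a stand-in for the base class. FakeProcessDatabot below is hypothetical. It only mimics the documented lifecycle: start() calls main(), log() records a timestamped message, and finish() is called when the databot exits. How it handles errors, and the assumption that packageParams lives at context.packageParams, are guesses rather than the package's behaviour.

```javascript
// Hypothetical stand-in for ProcessDatabot, for local tests only.
class FakeProcessDatabot {
  constructor(input, output, context) {
    this.input = input;
    this.output = output;
    this.context = context;
    this.logs = [];
    this.finished = false;
  }

  // Records a timestamped message instead of writing to the process log.
  log(message) {
    this.logs.push(`${new Date().toISOString()} ${message}`);
  }

  // Subclasses must override this, as with the real base class.
  async main() {
    throw new Error("main() must be overridden");
  }

  // Runs main(), logs any error (assumed behaviour), and always calls finish().
  async start() {
    try {
      await this.main();
    } catch (err) {
      this.log(`Error: ${err.message}`);
    } finally {
      this.finish();
    }
  }

  finish() {
    this.finished = true;
  }
}

// The same subclass body you would write against the real base class.
class MyDatabot extends FakeProcessDatabot {
  async main() {
    this.log(`Starting ${this.context.packageParams.name}`);
  }
}
```

After `await new MyDatabot({}, {}, { packageParams: { name: "demo" } }).start()`, the instance's logs hold one timestamped "Starting demo" entry and finished is true.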