@neherlab/nextclade

Clade assignment, mutation calling, and sequence quality checks

Usage no npm install needed!

<script type="module">
  import neherlabNextclade from 'https://cdn.skypack.dev/@neherlab/nextclade';
</script>

README

Nextclade: command-line tool

Clade assignment, mutation calling, and sequence quality checks


This is the command-line version of Nextclade.

You can also try our web application at: clades.nextstrain.org

Getting started

Locally

In order to run locally, you need Node.js and npm installed. It is recommended to use nvm or nvm-windows to install and manage Node.js versions. Nextclade CLI supports Node.js versions >= 12, version >= 14.15.0 LTS is recommended.

Having Node.js and npm available, install the latest release of the nextclade npm package globally:

npm install --global @neherlab/nextclade

Explore available options:

nextclade --help

Run, given a .fasta file with sequences

nextclade --input-fasta 'sequences.fasta' --output-json 'results.json'

or, shorter:

nextclade -i 'sequences.fasta' -o 'results.json'

Generated file results.json will contain the results in JSON format. Similarly, results can be generated in .csv or .tsv format, or in multiple formats (by passing multiple --output-<format>= flags) All files have the same format as exports from the Nextclade web application.

Nextclade can accept a custom Auspice JSON v2 reference tree through --input-tree and it's root sequence through --input-root-seq flags. It is user's responsibility to ensure that the root sequence corresponds to the root node of the tree - Nextclade has no possibility to enforce that requirement. The results will be incorrect if it isn't.

With --output-tree flag you can output a new Nextstrain tree, with the analyzed sequences placed on it (in the same Auspice JSON v2 format). The tree produced is the same which you would see in Nextclade web application on tree page. This file can be used for further processing and visualization (for example with auspice.us). Note that Nextclade implements a fast but also very simplified tree placement algorithm. Its purpose is to give a rough idea of where the sequences may end up on the tree, and it is not a substitute for a full Nextstrain build.

Nextclade is currently in active development stage. If you encounter problems with the latest version, or if you need to use the same version to produce consistent, comparable experiments, you can install a specific version as follows:

npm install --global @nextstrain/nextclade@0.8.1

See the list of all versions released on NPM: www.npmjs.com/package/@nextstrain/nextclade?activeTab=versions. Note that only versions from the latest channel are officially supported. Version marked alpha and beta versions are for development and internal testing. We release them publicly, but discourage using them for any serious purposes. You can find out which version you are currently using by running nextclade --version.

With docker

Docker images with Nextclade CLI are hosted in docker hub repository nextstrain/nextclade. They contain everything needed to run Nextclade, including the currently recommended version of Node.js. The only requirement is to have Docker installed.

You can pull the latest image and run the container as follows

docker run -it --rm -u 1000 --volume="${ABSOLUTE_PATH_TO_SEQUENCES}:/seq" neherlab/nextclade nextclade --input-fasta '/seq/sequences.fasta' --output-json '/seq/results.json'

Explanation:

  • -it - runs inside an interactive instance of tty. Optional.
  • --rm - deletes the container after usage. Optional.
  • -u 1000. Runs container as a user with UID 1000. Substitute 1000 with your local user's UID. UID of the current user can be found by running id -u. On single-user machines it is typically 1000 on Linux and 501 on Mac. If this parameter is not present, output files will be written on behalf of the root user, making them harder to operate on. Optional, but recommended.
  • --volume="${ABSOLUTE_PATH_TO_SEQUENCES}:/seq". Substitute ${ABSOLUTE_PATH_TO_SEQUENCES} with your absolute path to a directory containing input fasta sequences on your computer. This is necessary in order for docker container to have access to this directory. In this example, it will be available as /seq inside the container.
  • neherlab/nextclade name of the image to pull. In Unix-like environments you can use the variable ${PWD} to get the absolute path to the current directory, for example: --volume="${PWD}/data:/seq".
  • nextclade --input-fasta '/seq/sequences.fasta' --output-json '/seq/results.json the usual invocation of the tool. Note that in this example we read and write from /seq directory inside the container, which we mounted using Docker's --volume= parameter.

The default (latest) tag uses Node.js image based on Debian stretch. It is also possible to use smaller Alpine Linux-based images by appending :alpine tag after the repo name:

docker run ... nextstrain/nextclade:alpine ...

See the list of all tags on Docker Hub: hub.docker.com/r/nextstrain/nextclade/tags

Tips and tricks

Memory consumption

In the current implementation, Nextclade may consume large amounts of memory. By default, Nextclade currently detects the number of logical threads available on the machine and runs this number of sequence analyses in parallel - one input sequence per thread. It might happen that you have a machine with many cores/threads but limited amount of memory. In this case, many Nextclade threads will run concurrently, and it might run out of heap space and become very slow and unstable.

Additionally, while processing sequences, Nextclade accumulates information for the output tree construction. When there are many sequences, it may also lead to the excessive memory consumption, even in low-parallelism scenarios.

It is recommended to monitor the memory consumption, especially in automated workflows. To tune the memory consumption you could also:

  • limit the parallelism of Nextclade with --jobs=n flag

  • run completely sequentially (1 thread) with --jobs=1

  • process fewer sequences, by filtering/subsampling the data before passing to Nextclade

  • process fewer sequences at a time, by batching the input data before passing it into multiple Nextclade runs, and then merging the results for every run

We are planning:

  • algorithmic improvements which should reduce the memory footprint of Nextclade

  • streaming and batching of inputs

Contributions are welcome!

Developer's guide

Build: production version

This will build a production version of the command-line tool:

git clone https://github.com/nextstrain/nextclade
# Optionally checkout a branch or a tag: git checkout -b 0.8.1
cd nextclade/packages/web
cp .env.example .env
yarn cli:prod:build

The build results - the main executable script, and a set of webworker modules, along with their source maps - will appear in nextclade/packages/cli/dist/.

If Node.js >= 12 is available locally, the freshly built Nextclade can be ran as

node nextclade.js

or simply

nextclade.js

Build: standalone executables

A standalone executable (without dependency on Node.js) can be created with

cd nextclade/packages/web
yarn cli:prod:build:exe

The native executables for various platforms will appear in nextclade/packages/cli/dist/. This uses pkg tool to wrap the script together with Node.js runtime into one standalone file. Currently, these are neither officially released nor supported.

Publish a new version to NPM and Docker Hub

Increment the version in both, nextclade/packages/web/package.json and nextclade/packages/cli/package.json:

{
  "version": "x.y.z"
}

The version formats accepted:

  • x.y.z - semantic version for stable releases (will be published to latest channel on NPM and with no tag prefix on Docker Hub)

  • x.y.z-beta.n - semantic version and a mandatory suffix for beta releases (will be published to beta channel on NPM and with beta tag prefix on Docker Hub)

  • x.y.z-alpha.n - semantic version and a mandatory suffix for alpha releases (will be published to alpha channel on NPM and with alpha tag prefix on Docker Hub)

rebuild:

cd packages/web
yarn cli:prod:build

publish:

cd packages/cli
./release.sh

This will:

  • publish a new version on NPM to the appropriate channel
  • build and push Docker images to Docker Hub

Run in development mode

For development purposes run

git clone https://github.com/nextstrain/nextclade
cd nextclade/packages/web
cp .env.example .env
yarn cli:dev

This will start webpack in watch mode and all changes will trigger partial rebuilds, which is convenient for continuous development. The build results will appear in nextclade/packages/cli/dist/ and can be run similarly to the production version (see above).

License

MIT License