README
Nextclade: command-line tool
Clade assignment, mutation calling, and sequence quality checks
This is the command-line version of Nextclade.
You can also try our web application at: clades.nextstrain.org
Getting started
Locally
To run Nextclade locally, you need Node.js and npm installed.
It is recommended to use nvm or nvm-windows to install and manage Node.js versions. Nextclade CLI supports Node.js versions >= 12; version >= 14.15.0 LTS is recommended.
Having Node.js and npm available, install the latest release of the nextclade
npm package globally:
npm install --global @nextstrain/nextclade
Explore available options:
nextclade --help
Run the analysis on a .fasta file with sequences:
nextclade --input-fasta 'sequences.fasta' --output-json 'results.json'
or, shorter:
nextclade -i 'sequences.fasta' -o 'results.json'
The generated file results.json will contain the results in JSON format.
Similarly, results can be generated in .csv or .tsv format, or in multiple formats at once (by passing multiple --output-<format>= flags).
All files have the same format as exports from the Nextclade web application.
Nextclade can accept a custom Auspice JSON v2 reference tree through the --input-tree flag and its root sequence through the --input-root-seq flag. It is the user's responsibility to ensure that the root sequence corresponds to the root node of the tree. Nextclade cannot enforce this requirement, and the results will be incorrect if it is not met.
With the --output-tree flag you can output a new Nextstrain tree with the analyzed sequences placed on it (in the same Auspice JSON v2 format). The tree produced is the same one you would see on the tree page of the Nextclade web application. This file can be used for further processing and visualization (for example with auspice.us). Note that Nextclade implements a fast but very simplified tree placement algorithm. Its purpose is to give a rough idea of where the sequences may end up on the tree, and it is not a substitute for a full Nextstrain build.
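As a rough sketch of how the flags described above combine, the script below assembles one run that reads a custom tree and root sequence and writes JSON, TSV, and tree outputs. The file names are hypothetical, and the --output-tsv spelling is assumed from the --output-<format> pattern mentioned above. The command is always printed, but it is executed only if nextclade is installed and the input file exists.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
INPUT_FASTA='sequences.fasta'
INPUT_TREE='tree.json'          # custom Auspice JSON v2 reference tree
INPUT_ROOT_SEQ='root_seq.txt'   # must correspond to the root node of the tree

# Assemble the command line from the flags described above.
set -- nextclade \
  --input-fasta "$INPUT_FASTA" \
  --input-tree "$INPUT_TREE" \
  --input-root-seq "$INPUT_ROOT_SEQ" \
  --output-json 'results.json' \
  --output-tsv 'results.tsv' \
  --output-tree 'tree_with_placements.json'

# Keep a printable copy of the command for logging.
CMD="$*"
echo "$CMD"

# Execute only when the tool and the input are actually present.
if command -v nextclade >/dev/null 2>&1 && [ -f "$INPUT_FASTA" ]; then
  "$@"
fi
```

Nothing in this sketch checks that the root sequence matches the tree; that remains the user's responsibility.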
Nextclade is currently under active development. If you encounter problems with the latest version, or if you need to stay on one version to produce consistent, comparable results, you can install a specific version as follows:
npm install --global @nextstrain/nextclade@0.8.1
See the list of all versions released on NPM: www.npmjs.com/package/@nextstrain/nextclade?activeTab=versions. Note that only versions from the latest channel are officially supported. Versions marked alpha and beta are for development and internal testing. We release them publicly, but discourage using them for any serious purposes. You can find out which version you are currently using by running nextclade --version.
With docker
Docker images with Nextclade CLI are hosted in the Docker Hub repository nextstrain/nextclade. They contain everything needed to run Nextclade, including the currently recommended version of Node.js. The only requirement is to have Docker installed.
You can pull the latest image and run the container as follows:
docker run -it --rm -u 1000 --volume="${ABSOLUTE_PATH_TO_SEQUENCES}:/seq" nextstrain/nextclade nextclade --input-fasta '/seq/sequences.fasta' --output-json '/seq/results.json'
Explanation:
- -it: runs the container in an interactive TTY. Optional.
- --rm: deletes the container after usage. Optional.
- -u 1000: runs the container as the user with UID 1000. Substitute 1000 with your local user's UID, which can be found by running id -u. On single-user machines it is typically 1000 on Linux and 501 on Mac. If this parameter is not present, output files will be written as the root user, making them harder to operate on. Optional, but recommended.
- --volume="${ABSOLUTE_PATH_TO_SEQUENCES}:/seq": substitute ${ABSOLUTE_PATH_TO_SEQUENCES} with the absolute path to a directory on your computer containing the input fasta sequences. This is necessary for the Docker container to have access to this directory. In this example, it will be available as /seq inside the container. In Unix-like environments you can use the variable ${PWD} to get the absolute path to the current directory, for example: --volume="${PWD}/data:/seq".
- nextstrain/nextclade: name of the image to pull.
- nextclade --input-fasta '/seq/sequences.fasta' --output-json '/seq/results.json': the usual invocation of the tool. Note that in this example we read from and write to the /seq directory inside the container, which we mounted using Docker's --volume= parameter.
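The substitutions described above can be scripted. The following is a minimal sketch, assuming the sequences live in a data directory under the current directory (a hypothetical location) and using the image name given for the Docker Hub repository above. It fills in the UID with id -u and the path with ${PWD}, prints the resulting command, and runs it only when the hypothetical RUN_DOCKER variable is set to 1.

```shell
#!/bin/sh
# Directory with the input fasta files ("data" is a hypothetical location).
SEQ_DIR="${PWD}/data"

# UID of the current user, so that output files are not owned by root.
HOST_UID="$(id -u)"

# Image name as given for the Docker Hub repository above.
IMAGE='nextstrain/nextclade'

# Same invocation as in the example above, without the optional -it flag.
set -- docker run --rm -u "$HOST_UID" \
  --volume="${SEQ_DIR}:/seq" \
  "$IMAGE" \
  nextclade --input-fasta '/seq/sequences.fasta' --output-json '/seq/results.json'

# Keep a printable copy of the command for logging.
DOCKER_CMD="$*"
echo "$DOCKER_CMD"

# Run only when explicitly requested and Docker is available.
if [ "${RUN_DOCKER:-0}" = "1" ] && command -v docker >/dev/null 2>&1; then
  "$@"
fi
```

Printing the command first makes it easy to confirm the mounted path and UID before any container is started.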
The default (latest) tag uses a Node.js image based on Debian stretch. It is also possible to use smaller Alpine Linux-based images by appending the :alpine tag after the repo name:
docker run ... nextstrain/nextclade:alpine ...
See the list of all tags on Docker Hub: hub.docker.com/r/nextstrain/nextclade/tags
Tips and tricks
Memory consumption
In the current implementation, Nextclade may consume large amounts of memory. By default, Nextclade detects the number of logical threads available on the machine and runs that many sequence analyses in parallel, one input sequence per thread. On a machine with many cores/threads but a limited amount of memory, many Nextclade threads will run concurrently, and Nextclade might run out of heap space and become very slow and unstable.
Additionally, while processing sequences, Nextclade accumulates information for the output tree construction. With many sequences, this may also lead to excessive memory consumption, even in low-parallelism scenarios.
It is recommended to monitor the memory consumption, especially in automated workflows. To tune the memory consumption you could also:
- limit the parallelism of Nextclade with the --jobs=n flag
- run completely sequentially (1 thread) with --jobs=1
- process fewer sequences, by filtering/subsampling the data before passing it to Nextclade
- process fewer sequences at a time, by batching the input data, passing each batch to a separate Nextclade run, and then merging the results of all runs
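The batching approach from the list above could be scripted roughly as follows. This is only a sketch: split_fasta, merge_tsv, and run_batches are hypothetical helper names, the --output-tsv spelling is assumed from the --output-<format> pattern, and the merge assumes that every per-batch TSV file starts with the same single header line.

```shell
#!/bin/sh
# Split a fasta file into batches of at most BATCH_SIZE sequences each.
# Usage: split_fasta INPUT.fasta BATCH_SIZE OUT_DIR
split_fasta() {
  mkdir -p "$3"
  awk -v n="$2" -v dir="$3" '
    /^>/ {
      # Start a new batch file every n sequence headers.
      if (count % n == 0) {
        if (out != "") close(out)
        out = sprintf("%s/batch_%04d.fasta", dir, count / n)
      }
      count++
    }
    # Lines before the first header are ignored.
    out != "" { print > out }
  ' "$1"
}

# Merge per-batch TSV files, keeping only the header of the first file.
# Usage: merge_tsv MERGED.tsv BATCH1.tsv BATCH2.tsv ...
merge_tsv() {
  merged="$1"
  shift
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$merged"
}

# Run Nextclade once per batch with limited parallelism, then merge.
# Usage: run_batches INPUT.fasta BATCH_SIZE JOBS WORK_DIR
run_batches() {
  split_fasta "$1" "$2" "$4"
  for batch in "$4"/batch_*.fasta; do
    nextclade --jobs="$3" --input-fasta "$batch" --output-tsv "${batch%.fasta}.tsv"
  done
  merge_tsv "$4/merged.tsv" "$4"/batch_*.tsv
}
```

For example, run_batches sequences.fasta 1000 2 batches would process 1000 sequences per run with two parallel jobs and write the combined table to batches/merged.tsv. The batch size and job count are illustrative and should be tuned to the memory available.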
We are planning:
- algorithmic improvements which should reduce the memory footprint of Nextclade
- streaming and batching of inputs
Contributions are welcome!
Developer's guide
Build: production version
This will build a production version of the command-line tool:
git clone https://github.com/nextstrain/nextclade
# Optionally, from inside the cloned repository, check out a branch or a tag: git checkout 0.8.1
cd nextclade/packages/web
cp .env.example .env
yarn cli:prod:build
The build results (the main executable script and a set of webworker modules, along with their source maps) will appear in nextclade/packages/cli/dist/.
If Node.js >= 12 is available locally, the freshly built Nextclade can be run as
node nextclade.js
or simply
nextclade.js
Build: standalone executables
A standalone executable (without dependency on Node.js) can be created with
cd nextclade/packages/web
yarn cli:prod:build:exe
The native executables for various platforms will appear in nextclade/packages/cli/dist/.
This uses the pkg tool to wrap the script together with the Node.js runtime into one standalone file. Currently, these executables are neither officially released nor supported.
Publish a new version to NPM and Docker Hub
Increment the version in both nextclade/packages/web/package.json and nextclade/packages/cli/package.json:
{
"version": "x.y.z"
}
The version formats accepted:
- x.y.z - semantic version for stable releases (will be published to the latest channel on NPM and with no tag prefix on Docker Hub)
- x.y.z-beta.n - semantic version and a mandatory suffix for beta releases (will be published to the beta channel on NPM and with the beta tag prefix on Docker Hub)
- x.y.z-alpha.n - semantic version and a mandatory suffix for alpha releases (will be published to the alpha channel on NPM and with the alpha tag prefix on Docker Hub)
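To illustrate the three accepted formats, here is a small sketch that maps a version string to its NPM channel. The release_channel function is a hypothetical helper written for this example; it is not taken from the release script, whose internals are not described here.

```shell
#!/bin/sh
# Map a version string to the NPM channel described above.
# Prints "latest", "beta" or "alpha"; returns non-zero for any other format.
release_channel() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo latest
  elif printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-beta\.[0-9]+$'; then
    echo beta
  elif printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-alpha\.[0-9]+$'; then
    echo alpha
  else
    echo "unsupported version format: $1" >&2
    return 1
  fi
}
```

A check like this could be run before editing the package.json files, for example release_channel 0.9.0-beta.2, to catch a malformed suffix early.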
Rebuild:
cd packages/web
yarn cli:prod:build
Publish:
cd packages/cli
./release.sh
This will:
- publish a new version on NPM to the appropriate channel
- build and push Docker images to Docker Hub
Run in development mode
For development purposes run
git clone https://github.com/nextstrain/nextclade
cd nextclade/packages/web
cp .env.example .env
yarn cli:dev
This will start webpack in watch mode, and all changes will trigger partial rebuilds, which is convenient for continuous development. The build results will appear in nextclade/packages/cli/dist/ and can be run similarly to the production version (see above).