s4-cli

Command line interface for the s4-service

Overview

The s4-cli is a command line tool for the s4 service. It lets users send audio data to the service and receive text or source separated audio in response. Data can be sent from multiple sources (pre recorded/real time), with different processing options (batch/stream) and different response types (text/cleaned up audio).

Data Sources

NOTE: Currently, all pre recorded data sources for ASR must be encoded as 16-bit, 16kHz .wav or PCM. This applies to ASR only, in both batch and stream mode.
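
Before sending a .wav file for ASR, it can be useful to confirm its encoding locally. The following is a minimal sketch, assuming a standard RIFF/WAVE layout; the function names (readWavFormat, meetsAsrRequirement, buildHeader) are hypothetical and are not part of s4-cli. Raw PCM files have no header, so this check does not apply to them.

```javascript
// Parse the "fmt " chunk of a RIFF/WAVE buffer and report its encoding.
function readWavFormat(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let offset = 12; // the first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      if (offset + 24 > buf.length) throw new Error('truncated fmt chunk');
      return {
        audioFormat: buf.readUInt16LE(offset + 8),   // 1 = linear PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no fmt chunk found');
}

// True when the format matches the 16-bit, 16kHz requirement in the note above.
function meetsAsrRequirement(fmt) {
  return fmt.audioFormat === 1 && fmt.sampleRate === 16000 && fmt.bitsPerSample === 16;
}

// Demo helper: builds a 44-byte header with no samples, so nothing is read from disk.
function buildHeader(sampleRate, bitsPerSample, channels) {
  const h = Buffer.alloc(44);
  h.write('RIFF', 0, 'ascii'); h.writeUInt32LE(36, 4); h.write('WAVE', 8, 'ascii');
  h.write('fmt ', 12, 'ascii'); h.writeUInt32LE(16, 16);
  h.writeUInt16LE(1, 20); h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28);
  h.writeUInt16LE(channels * bitsPerSample / 8, 32);
  h.writeUInt16LE(bitsPerSample, 34);
  h.write('data', 36, 'ascii'); h.writeUInt32LE(0, 40);
  return h;
}

console.log(meetsAsrRequirement(readWavFormat(buildHeader(16000, 16, 1)))); // true
console.log(meetsAsrRequirement(readWavFormat(buildHeader(44100, 16, 2)))); // false
```

For a real file, the buffer would come from fs.readFileSync on the file path instead of buildHeader.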

Pre Recorded Files

The tool can be used to send one or more pre recorded files (in .wav format) to the service, either as individual files, or as a group of files in a directory. When sending a collection of files from a directory, regular expression matching patterns can be used to filter the files that are chosen for processing.
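
To illustrate how a regular expression narrows a directory listing, here is a small sketch. It assumes the pattern is tested against the bare file name and may match anywhere in the name unless anchored; how s4-cli applies the pattern internally is not documented here, and filterByPattern is a hypothetical helper.

```javascript
// Keep only the file names that match a regular expression.
// Assumption: the pattern is tested against the file name, not the full path.
function filterByPattern(fileNames, pattern) {
  const re = new RegExp(pattern);
  return fileNames.filter((name) => re.test(name));
}

const names = ['call-01.wav', 'call-02.wav', 'notes.txt', 'call-03.wav.bak'];

// Escaped dot and end anchor: matches call-01.wav and call-02.wav only.
console.log(filterByPattern(names, '\\.wav$'));

// Unescaped, unanchored pattern: also matches call-03.wav.bak.
console.log(filterByPattern(names, '.*.wav'));
```

Under these assumptions, anchoring the pattern with $ and escaping the dot avoids picking up files that merely contain ".wav" somewhere in their name.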

Real Time Audio

The tool also supports the capture of real time audio data from the default microphone on the device. It relies on Sox to access the microphone, and currently assumes a default microphone configuration (8 channel, 16KHz sample rate, 32 bit signed integer).
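
As a rough illustration of what a Sox capture with this configuration could look like, the sketch below builds a Sox argument list from the values stated above. The flags are standard Sox format options, but whether s4-cli invokes Sox with exactly these arguments is an assumption, and soxCaptureArgs is a hypothetical helper.

```javascript
// Build the argument list for a Sox capture from the default input device.
// The defaults mirror the microphone configuration stated above.
function soxCaptureArgs(config = {}) {
  const { channels = 8, sampleRate = 16000, bits = 32 } = config;
  return [
    '-d',                      // read from the default audio device
    '-c', String(channels),    // channel count
    '-r', String(sampleRate),  // sample rate in Hz
    '-b', String(bits),        // bits per sample
    '-e', 'signed-integer',    // sample encoding
    '-t', 'raw',               // headerless output
    '-',                       // write to stdout
  ];
}

console.log(['sox'].concat(soxCaptureArgs()).join(' '));
// sox -d -c 8 -r 16000 -b 32 -e signed-integer -t raw -
```

The resulting array could be passed to child_process.spawn('sox', soxCaptureArgs()) to read captured audio from the child's stdout.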

Processing Options

Data can be processed by the service in one of two modes: batch or stream. The key difference is that batch mode combines all of the input data into a single file before processing, while stream mode processes audio chunks as they are received by the server. Note that batch mode is currently not supported when using real time audio.
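
The difference between the two modes can be pictured with a toy sketch. This is not the service's implementation; processBatch, processStream and byteCount are hypothetical stand-ins that only show when the processing step runs and how many results it produces.

```javascript
// Batch: join every chunk into one buffer, then process once.
function processBatch(chunks, processFn) {
  return [processFn(Buffer.concat(chunks))];
}

// Stream: process each chunk as it arrives, producing one result per chunk.
function processStream(chunks, processFn) {
  return chunks.map((chunk) => processFn(chunk));
}

// Stand-in for the processing step: reports how many bytes it was given.
const byteCount = (buf) => buf.length;

const chunks = [Buffer.alloc(320), Buffer.alloc(320), Buffer.alloc(160)];
console.log(processBatch(chunks, byteCount));  // [ 800 ]
console.log(processStream(chunks, byteCount)); // [ 320, 320, 160 ]
```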

Response Types

The service returns one of two possible responses: text or audio. When text mode is chosen, the service performs ASR on the cleaned up audio (and raw input), returning the resultant text from the separated audio. When audio mode is chosen, the service does not attempt any ASR, but instead returns the cleaned up audio stream as a response.

The following table summarizes the different options available when using the CLI tool:

                      Stream Mode    Batch Mode
Pre Recorded Audio    text/audio     text/audio*
Real Time Audio       text/audio     Not Supported

*Batch mode does not provide cleaned up audio as a direct response. However, cleaned up streams are stored in the cloud, and can be downloaded using the request id.

Installation

Prerequisites

The following is a list of prerequisites required to run the s4-cli

  • NodeJS v0.12.0 - Required to run the tool
  • Sox v14.4.1 - Required to capture real time audio
  • Git - Required to download and install the tool from npm

NodeJS

Install version 0.12.0 of NodeJS.

Windows

An installation package for Node can be downloaded from the NodeJs downloads page. Download and install the appropriate installation package for your operating system.

Mac OSX

An installation package .dmg file for Mac OSX can be downloaded and installed from the NodeJs downloads page.

Alternate Approach:

NodeJs can be installed via HomeBrew by running the following in a command shell:

brew update
brew install node

Ubuntu

Instructions on installing NodeJS on Linux using a package manager can be found here: https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager

Sox

This program is only required if the tool will be used with real time audio capture.

Windows

The simplest way to install this package is to download and run the installation package for your platform from the Sox downloads page.

The downloaded package is a .zip file that contains the sox executable and related files. These files can be extracted to any convenient location on the file system. Once extracted, ensure that the sox folder has been added to the PATH variable.

This can be done by updating the path variable within a terminal shell as follows:

PATH=%PATH%;c:\sox-14.4.1\;

This change only applies to the command shell that it is executed in. If a global setting is preferred, update your path variable under Environment Variables. This panel can be found here: My Computer --> Properties --> Advanced --> Environment Variables.
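
One way to confirm that the prerequisites are visible on the PATH is a short Node script. This is a sketch only: findOnPath is a hypothetical helper, it checks that a matching file exists but not that it is executable, and the list of Windows extensions is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Return the first matching file on the search path, or null if none is found.
// On Windows, common executable extensions are appended to the tool name.
function findOnPath(tool, searchPath = process.env.PATH || '', platform = process.platform) {
  const extensions = platform === 'win32' ? ['.exe', '.cmd', '.bat'] : [''];
  for (const dir of searchPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty entries such as a trailing separator
    for (const ext of extensions) {
      const candidate = path.join(dir, tool + ext);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return null;
}

// Report where each prerequisite was found, if at all.
for (const tool of ['node', 'sox', 'git']) {
  console.log(tool + ': ' + (findOnPath(tool) || 'not found on PATH'));
}
```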

Mac OSX

An installation package .dmg file for Mac OSX can be downloaded and installed from the Sox downloads page.

Alternate Approach:

Sox can be installed via HomeBrew by running:

brew update
brew install sox

Ubuntu

Sox can be installed on linux using a package manager. The following is an example that uses apt-get to install Sox on Ubuntu.

sudo apt-get update
sudo apt-get install sox libsox-fmt-all

Git

Git is a version control tool that is used, among other things, to create copies of code from a remote source control repository. In this case, node's package manager tool npm uses Git to download and install the CLI on the local computer.

Windows

The simplest way to install this program would be to download and run the installation package for your platform from the Git downloads page.

Mac OSX

An installation package .dmg file for Mac OSX can be downloaded and installed from the Git downloads page.

Alternate Approach:

Git can be installed via HomeBrew by running:

brew update
brew install git

Ubuntu

Git can be installed on linux using a package manager. The following is an example that uses apt-get to install Git on Ubuntu.

sudo apt-get update
sudo apt-get install git

Installation

Once all the prerequisites have been installed, s4-cli can be installed via npm by running the following in a command shell:

NOTE: The command shell refers to the terminal program in Mac OSX/Linux, or the cmd.exe program on Windows operating systems

npm install -g s4-cli

NOTE: The above command may fail if elevated privileges are required (this depends on how nodejs/npm has been set up).

If that is the case, prefix the above command with sudo (Linux/Mac OSX), or run the command in a terminal window with administrator privileges (Windows).

You can test if the CLI has been installed correctly by typing:

s4-cli --help

The above command should display the available command line options for the s4-cli tool.

Usage

This section outlines the common use cases for using the s4-cli tool on the command line. The basic usage of the s4-cli tool is as follows:

s4-cli [ACTION] [OPTIONS]

Where:

[ACTION]: This argument specifies the type of action to perform on the input to the service. This argument can be one of the following values:

  • asr-batch: Requests the service to clean up the data in batch mode, perform ASR on the cleaned up data, and return the resulting text.
  • asr-stream: Requests the service to clean up the data in streaming mode, perform ASR on the cleaned up data, and return the resulting text.
  • audio-stream: Requests the service to clean up the data in streaming mode, and return the cleaned up stream.

[OPTIONS]: Are other arguments that can be passed to the command line tool. The following is a brief summary of supported options:

  • --help: Displays help information that includes usage and options details.
  • --url or -u: The base url of the service, including protocol type. If not specified, this parameter defaults to: http://s4front-end.elasticbeanstalk.com/
  • --api-key or -a: This is the API key that uniquely identifies the entity making the request. Please contact your ADI representative if you do not have a key, and would like to obtain one.
  • --mic-config: This is a microphone configuration parameter that is sent to the service. This parameter will be used by the service when processing the input. If not specified, this parameter defaults to default microphone config
  • --algorithm: This parameter identifies the algorithm used to clean up the input data. If not specified, this parameter defaults to: ntf-v1
  • --tag: This is a string parameter that will be used as the folder name under which input/output artifacts are stored in cloud storage. This value is especially useful when multiple files are being processed simultaneously, and it is desirable to tag the files so that they may be reviewed as a group
  • --input-file: When specified, this parameter identifies a single input file that will be sent to the service for processing.
  • --input-dir: When specified, this parameter identifies a directory whose entire file contents will be sent to the service for processing. Files within the directory may be filtered using the --pattern option.
  • --audio-device: When specified, this parameter indicates that real time audio will be captured from the default audio device, and sent to the service for processing.
  • --pattern: This parameter can be used in conjunction with the --input-dir option to specify a regular expression filter that will be applied on the names of the files within the input directory. Only files that match the regular expression pattern will be selected for further processing.
  • --output-dir: Specifies the directory in which output artifacts generated by the CLI will be stored. If not specified, this parameter defaults to ./out. Note that this directory must exist on the file system if raw audio is requested from the server.
  • --output-summary: An optional file name that will contain a report of execution. The report will be stored in report.json if this parameter is omitted. The file will be created in the output directory, as specified by the --output-dir parameter.

Some things to remember:

  • At least one action parameter has to be specified (asr-batch, asr-stream or audio-stream).
  • At least one input source has to be specified (--input-file, --input-dir or --audio-device)
  • If the action specified requires the server to return an audio stream, the output directory specified by --output-dir must exist on the file system
  • If a tag value (--tag) is specified, files will be stored under a directory with the same name as the tag value.
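
The rules above can be expressed as a small validation sketch. It assumes the command line has already been parsed into a plain object; validateArgs, the field names (action, inputFile, inputDir, audioDevice, outputDir) and the message wording are hypothetical and do not reflect s4-cli's internals.

```javascript
const fs = require('fs');

const ACTIONS = ['asr-batch', 'asr-stream', 'audio-stream'];

// Return a list of problems with a parsed command line; an empty list means valid.
// dirExists is injectable so the directory check can be exercised without touching disk.
function validateArgs(args, dirExists = fs.existsSync) {
  const problems = [];
  if (!ACTIONS.includes(args.action)) {
    problems.push('action must be one of: ' + ACTIONS.join(', '));
  }
  if (!args.inputFile && !args.inputDir && !args.audioDevice) {
    problems.push('specify --input-file, --input-dir or --audio-device');
  }
  // Batch mode is documented as unsupported for real time audio.
  if (args.action === 'asr-batch' && args.audioDevice) {
    problems.push('batch mode is not supported with real time audio');
  }
  // An audio response is written to the output directory, which must already exist.
  if (args.action === 'audio-stream' && !dirExists(args.outputDir || './out')) {
    problems.push('output directory does not exist');
  }
  return problems;
}

console.log(validateArgs({ action: 'asr-stream', inputFile: './data/a.wav' })); // []
console.log(validateArgs({ action: 'asr-batch', audioDevice: true }));
```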

Considerations for Real Time Audio

Configuring the Default Microphone

When recording real time audio, the CLI attempts to capture data directly from the default microphone on the computer. It is important to ensure that the default microphone has been set correctly before using the CLI.

For example, this can be done on Mac OSX by running the following in the terminal:

export AUDIODEV=hw:1

Note that this is not a global setting; it applies only to s4-cli executions within that terminal session.

Waiting for Recording to Start

On some computers, there could be a slight delay between when the s4-cli starts execution, and when actual recording commences. It is recommended that the user pause until the following message is displayed:

Audio is being captured from the default audio device. Press <ESC> to stop

Recording can be stopped by pressing the ESC key.

Execution Behavior

This section provides an overview of the execution behavior of the service. While the documentation provided here is geared towards using the CLI, the behavior of the service remains the same irrespective of how it is accessed.

  • When a request is received by the service, it generates a unique id for the request, called the requestId. This id is globally unique, and is used to tag all input to and output generated by the service.
  • All input sent to the service will be stored in the cloud (AWS S3). The following rules are used when storing data:
      • All files in S3 are partitioned by API key: each API key has a separate S3 partition allocated to it.
      • Each request is assigned a separate folder that holds three artifacts for every input file: (1) the unprocessed input file, (2) the cleaned up audio data, and (3) the noisy audio data.
      • The folder has the same name as the requestId, unless a tag value (--tag) is specified, in which case the tag value is used to name the S3 folder.
      • Note that reusing the same tag value for multiple requests will result in previous results being overwritten by the latest request.
  • The service will process the request data, and send responses back to the client (in this case the CLI)
  • The CLI shows responses from the service on the terminal, and also optionally saves summary information in a .json file. For audio processing requests where the response is cleaned up audio, the service response will be saved in the output directory with the same name as the requestId.
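
The folder naming and overwrite rules can be illustrated with a toy model. A Map stands in for cloud storage here, and the "<apiKey>/<folder>" key layout is hypothetical; only the partition-by-API-key and tag-or-requestId naming rules come from the description above.

```javascript
// Folder name for a request: the tag when one is given, otherwise the requestId.
function folderFor(requestId, tag) {
  return tag || requestId;
}

// Toy stand-in for cloud storage, keyed by "<apiKey>/<folder>".
function store(storage, apiKey, requestId, tag, artifacts) {
  storage.set(apiKey + '/' + folderFor(requestId, tag), artifacts);
}

const storage = new Map();
store(storage, 'key-A', 'req-1', undefined, ['first']);
store(storage, 'key-A', 'req-2', 'nightly', ['second']);
store(storage, 'key-A', 'req-3', 'nightly', ['third']); // same tag: replaces req-2's folder

console.log([...storage.keys()]);          // [ 'key-A/req-1', 'key-A/nightly' ]
console.log(storage.get('key-A/nightly')); // [ 'third' ]
```

Using a distinct tag per run, or omitting the tag, keeps earlier results from being replaced.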

Summary Report Format:

The summary report for requests is stored in a .json file. The following is an example of the summary report format:

[
    {
        "inputType":"file",
        "input":"data/SoundTest.wav",
        "message":"OK",
        "success":true,
        "output":{
            "0":{ "status":"success", "text": "this" },
            "1":{ "status":"success", "text": "this" },
            "2":{ "status":"success", "text": "this" },
            "3":{ "status":"success", "text": "this is" },
            "4":{ "status":"success", "text": "this is" },
            "5":{ "status":"success", "text": "this is" },
            "6":{ "status":"success", "text": "this is a" },
            "7":{ "status":"success", "text": "this is a" },
            "8":{ "status":"success", "text": "this is a" },
            "9":{ "status":"success", "text": "this is a test" },
            "metadata":{
                "requestId":"e31cd313-9eef-4d36-9938-a3936c19c9de",
                "s3Folder":"e31cd313-9eef-4d36-9938-a3936c19c9de",
                "micConfig":"az7",
                "algorithm":"ntf-v1"
            }
        }
    },
    {

     ...

    }
]

The file contains a JavaScript array with one object for every input received as a part of the request. Each object in turn contains information about the type of request, and also the response from the server, including metadata such as the id of the request, folder name in S3, etc.
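
A report in this format can be post-processed with a few lines of Node. The sketch below assumes that the numeric keys under "output" are successive partial results, so the highest-numbered one holds the most complete transcript; that reading is inferred from the example above, and finalText is a hypothetical helper.

```javascript
// For one report entry, return the text of the highest-numbered output item,
// or null when the entry has no numbered results.
function finalText(entry) {
  const output = entry.output || {};
  const indices = Object.keys(output)
    .filter((key) => /^\d+$/.test(key)) // skip non-numeric keys such as "metadata"
    .map(Number)
    .sort((a, b) => a - b);
  if (indices.length === 0) return null;
  return output[String(indices[indices.length - 1])].text;
}

// Trimmed-down entry in the same shape as the example report.
const report = [{
  inputType: 'file',
  input: 'data/SoundTest.wav',
  success: true,
  output: {
    0: { status: 'success', text: 'this' },
    1: { status: 'success', text: 'this is a test' },
    metadata: { requestId: 'e31cd313-9eef-4d36-9938-a3936c19c9de' },
  },
}];

for (const entry of report) {
  console.log(entry.input + ': ' + finalText(entry));
}
// data/SoundTest.wav: this is a test
```

For a report on disk, the array would come from JSON.parse applied to the contents of the summary file in the output directory.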

Examples

Perform clean up and ASR in batch mode on a single pre recorded file:

s4-cli asr-batch --api-key=_apikey_ --mic-config=_micconfig_ --input-file=./data/sound-recording-1.wav

Perform clean up and ASR in stream mode on all .wav files in a given directory:

s4-cli asr-stream --api-key=_apikey_ --mic-config=_micconfig_ --input-dir=./data --pattern='.*\.wav'

Perform clean up in stream mode on real time data captured from the default audio device:

s4-cli audio-stream --api-key=_apikey_ --mic-config=_micconfig_ --audio-device