README
<input> Hi Sara, how are you?
(prompt / response) All systems operational!
<input> How is the weather in (where are we)?
(prompt / response) The current weather in Amsterdam, Netherlands is:
(weatherdetails)
<input> _
Attention:
This package is currently a work in progress
Do not install via npm install @ztik.nl/sara
Clone or download from Sara @ Github
The GitHub repository will always contain the current/latest testing build
NPM builds will be pushed occasionally, when there shouldn't be any app-breaking bugs
Many changes are to be expected, do not expect backwards compatibility
Current version: 0.4.1
When the core program is more complete, I will switch to semantic versioning at 1.0.0
ToC:
- What is Sara
- Requirements
- NPM modules
- How to use
- Internal Commands
5.1 Colors
5.2 Verbose
5.3 Help
5.4 Hearing
5.5 Voice
5.6 Vision - Regular Expression matches
- Layered commands
- Plugins
- Provided plugins
9.1 Math
9.2 Conversation
9.3 Location
9.4 Weather
9.5 XBMC remote
9.6 Timedate
9.7 Wikipedia
9.8 News
9.9 Translate
9.10 Games
9.11 Addressbook (CardDAV)
9.12 Calendar (CalDAV) - Bootstrap
- Audio in/out issues
- Other issues
12.1 Google Cloud APIs
12.2 Haobosou USB microphone
12.3 Known - Todo
- Long term goals
- Credits
- Apologies
What is Sara:
Sara is a command prompt that listens for keyboard input or voice commands
Sara has a voice, and is able to respond to commands through text as well as audio
Sara is my (poor) attempt at making my own Jarvis/Alexa/Hey Google/Hi Bixby/Voice Response System
It runs in Node.js on a Raspberry Pi 3B, but should be able to run on earlier models as well as other Linux distros
It has some internal commands, but can be extended through a self-made plugin system
Hearing works
Voice commands can be sent to the command line for editing, or immediately be processed without user intervention
This option selection is currently hidden away in hearing.js, but will be in the commandline arguments and config.json soon
Voice works
Voice output works, but further testing is required
Different voices (male and female) are now possible; soon there will be an option to select one, as well as a way to display a list of voices for each language!
Vision works
All it does is take a picture every 30 minutes using a USB webcam
The Pi camera is not supported yet, but will be later
There are object/face detection functions, as well as some other functions (age/expression/gender labeling) but NONE of these functions are connected to the webcam source image yet!
There are NO object/face recognition functions at this moment, but this will be added soon
Sara ignores the following words at sentence start:
sara
can you
will you
would you
could you
tell me
let me know
please
Sara also ignores the word 'please' and the '?' character at the end of commands
After stripping these words, the command is compared to the internal commands; if it doesn't match, it is compared to the regex string contained in every plugin's .json file
Sara listens to the keyword 'Sara'
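The stripping described above can be sketched in a few lines of Node.js (the word list comes from this README; the function name and loop are illustrative, not Sara's actual code):

```javascript
// Words Sara ignores at the start of a sentence (list taken from this README).
const STRIP_PREFIXES = ["sara", "can you", "will you", "would you", "could you", "tell me", "let me know", "please"];

// Illustrative sketch: repeatedly strip known prefixes, then drop a trailing '?'.
function normalize(input) {
  let cmd = input.trim().toLowerCase();
  let stripped = true;
  while (stripped) {
    stripped = false;
    for (const prefix of STRIP_PREFIXES) {
      if (cmd.startsWith(prefix + " ")) {
        cmd = cmd.slice(prefix.length + 1);
        stripped = true;
      }
    }
  }
  return cmd.replace(/\?$/, "").trim();
}

console.log(normalize("Sara can you please tell me what 10 + -9 is?"));
// -> "what 10 + -9 is"
```

The remaining string is what gets compared against the internal commands and the plugin regexes.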
Requirements:
The hardware stated below is what I am using to build/test/run this project on
It should run on any Linux distro; I didn't test this, but I see no reason why it wouldn't
It might run on macOS; I am unable to verify, as I do not have any Fruit-branded devices
It doesn't run on Windows, because Sonus (speech recognition via Google Cloud) doesn't run on Windows
Hardware:
- A Raspberry Pi (3B tested, older models should work)
- Keyboard or ssh connection
- Microphone for voice commands (I use a G11 Touch Induction/Haobosou ~20 euro, excellent results)
- Audio output device (tv/hdmi or speakers on line-out)
- Webcam for future object/face recognition modules (I use a HP Webcam HD-4110)
- SD Card containing Raspbian (latest version is always advisable)
- A self-powered USB hub is advisable when using a USB microphone/webcam
Software:
- Node.js LTS or newest (I am currently running 12.5.0)
- NPM (I am currently running 6.9.0)
- aplay and arecord (config audio in/out as default audio devices first)
sudo apt-get install alsa-utils
- fswebcam (I installed it, didn't touch a single config file)
sudo apt-get install fswebcam
Other:
- Google Cloud API key (one key to rule them all!)
This is free for a certain amount of requests, see Google Cloud APIs for more details
The same key is used for the translate plugin, speech recognition, generating voices and face/object detection
Face recognition will be calculated in-app, so it will not make requests to the Google Cloud Vision API
- newsapi.org API key (optional)
Free for personal use, used for the news plugin
NPM modules:
"@google-cloud/text-to-speech": "^1.1.3",
"@google-cloud/translate": "^4.1.3",
"@google-cloud/vision": "^1.1.4",
"@tensorflow/tfjs-core": "^1.2.7",
"@tensorflow/tfjs-node": "^1.2.7",
"canvas": "^2.5.0",
"chalk": "^2.4.2",
"country-list": "^2.1.1",
"date-and-time": "^0.8.1",
"dav": "^1.8.0",
"decimal.js": "^10.2.0",
"face-api.js": "^0.20.1",
"geoip-lite": "^1.3.7",
"he": "^1.2.0",
"node-webcam": "^0.5.0",
"play-sound": "^1.1.3",
"public-ip": "^3.1.0",
"rollup": "^1.19.4",
"sonus": "^1.0.3",
"vcard-parser": "^1.0.0",
"weather-js2": "^2.0.2",
"weeknumber": "^1.1.1",
"wiki-entity": "^0.4.3"
How to use:
- Clone or download this repo
- Inside the main folder containing bin.js & package.json, run command:
npm install
- In the file resources/apikeys/googlespeech.json, add your own Google Cloud Speech API key
- Start program with command:
node bin.js
- To see the (optional) command line arguments, start program with command:
node bin.js --help
- It is also possible to use a config.json file to force default behaviour
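As a sketch only, such a config.json could look like the following (the field names are assumptions based on the toggles described in this README, not Sara's actual schema):

```json
{
  "colors": true,
  "verbose": false,
  "hearing": true,
  "voice": true,
  "vision": false,
  "language": "en-US"
}
```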
For more information on the Google Cloud Speech API, see:
NPMJS.com/sonus/usage & NPMJS.com/sonus/how-do-i-set-up-google-cloud-speech-api
The Google API key file is located at ./resources/apikeys/googlecloud.json
For more information on how to setup your own custom hotword, see:
NPMJS.com/sonus/usage & NPMJS.com/sonus/how-do-i-make-my-own-hotword
The custom hotword file is located at ./resources/speechrecognition/Sarah.pmdl
Internal commands:
I have tried to keep everything modular, so if something doesn't work on your system, you can disable that function through command line arguments, the config.json options file, or in the app itself
The vision command will be extended with object/face recognition when I get that to work properly
Colors:
start/stop colors
turns on/off colored responses/prompt
Verbose:
start/stop verbose
turns on/off verbose mode
Verbose mode will turn on display of output with a 'data' or 'warn' type
Bootstrap:
start/stop bootstrap
turns on/off bootstrap plugins
bootstrap list
displays the currently active bootstrap plugins
Help:
help
displays the main 'help' section
list help
displays a list of all help topics
help <topic>
displays help on the topic requested (still needs to be populated)
help <plugin.function>
displays help on the requested plugin function (currently placeholders)
add help
fill in the form and a new help topic is born!
edit help <topic>
found an error in a certain help topic? You can fix it here.
Hearing:
start/stop listening
turns on/off speech recognition
start/stop hearing
same as above
Voice:
start/stop voice
turns on/off text-to-speech
start/stop talking
same as above
start/stop speaking
same as above
silence
stop speaking the current sentence/item
quiet
same as above
voice list
display a list of all voices for the current language (config.json)
list voice
same as above
voices list
same as above
list voices
same as above
Vision:
start/stop vision
turns on/off timer (30 min) for webcam snapshot to ./resources/vision/frame.png
start/stop watching
same as above
Nothing is done with this image at this time, but there are tests being done with detection and recognition...
- Face/object detection works, but is not connected yet, it will be soon after some more testing
- Face recognition does not work yet, this will need a more complex neural net to connect the dots between different images
Regular Expression matches:
Sara needs to 'understand' commands; it does this by comparing input to a regular expression found inside each plugin function's .json file
Example:
^(?:what|how\smuch)?\s?(?:is)?\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:\+|plus|\&|and)\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:is)?$
This regular expression matches the following sentences:
what is (-)10(.12) plus/and/+/& (-)10(.12)
what (-)10(.12) plus/and/+/& (-)10(.12) is
how much is (-)10(.12) plus/and/+/& (-)10(.12)
how much (-)10(.12) plus/and/+/& (-)10(.12) is
(-)10(.12) plus/and/+/& (-)10(.12) is
(-)10(.12) plus/and/+/& (-)10(.12)
Because Sara strips starting input, this allows it to recognize sentences such as:
Sara can you please tell me what 10 + -9 is?
In the regex above, most groups are non-capturing (?:xxx)
The capture groups (-?[0-9]+\.?(?:[0-9]+)?) grab these values and pass them back to math.js, which contains the function for processing them
In the above example, math.js will receive an array object containing 3(!) items:
[0] the complete input string, in case the plugin still requires this string.
[1] the first captured group
[2] the second captured group
Therefore, the function math.add receives these 3 array items and returns the result of x[1] + x[2]
x[0] is always the entire matching regex string
Using the input sentence above, then:
x[0] == "what 10 + -9 is"
x[1] == 10
x[2] == -9
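This capture behaviour can be demonstrated in a couple of lines of Node.js (a sketch of the matching step only, not Sara's actual dispatch code):

```javascript
// The add regex from the example above, as a JavaScript regex literal.
const addRegex = /^(?:what|how\smuch)?\s?(?:is)?\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:\+|plus|\&|and)\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:is)?$/;

// Match the stripped input sentence; x is the array the plugin receives.
const x = "what 10 + -9 is".match(addRegex);
console.log(x[0]); // "what 10 + -9 is" — the entire matching string
console.log(x[1]); // "10" — first captured group
console.log(x[2]); // "-9" — second captured group
```

Note that the captured groups arrive as strings, so a plugin like math.add has to convert them to numbers before calculating.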
Layered commands:
(I am not a native English speaker, and I am not certain this is the correct term)
Sara is able to process subcommands through the use of parenthesis encapsulation
Example:
Sara can you tell me how much is 9 + (10 + 16)?
In this example, Sara will calculate 10 + 16 first, then calculate 9 + 26 afterwards
You can layer as many commands as you need; they will be processed starting with the innermost subcommand first:
11 + (7 + (root of 9))
subcmd: root of 9 = 3
subcmd: 7 + 3 = 10
finalcmd: 11 + 10 = 21
Some examples of what is possible:
((10 + ((root of 9) * (5³))) / 77) * (√9)
how is the weather in (where i am)
translate to german (what is gold)
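The innermost-first evaluation above can be sketched as a small loop (this is an illustrative sketch, not Sara's actual parser; the toy evaluator only handles "a + b" and stands in for the full plugin dispatch):

```javascript
// Repeatedly find an innermost "(...)" group (one with no nested parentheses),
// evaluate it, and splice the result back into the command string.
function evaluateLayered(input, evaluate) {
  const inner = /\(([^()]*)\)/;
  let command = input;
  let match;
  while ((match = inner.exec(command)) !== null) {
    const result = evaluate(match[1]);
    command = command.slice(0, match.index) + result + command.slice(match.index + match[0].length);
  }
  return evaluate(command);
}

// Toy evaluator handling only "a + b", standing in for the full command matcher.
function addOnly(cmd) {
  const m = cmd.match(/^\s*(-?\d+(?:\.\d+)?)\s*\+\s*(-?\d+(?:\.\d+)?)\s*$/);
  if (!m) throw new Error(`cannot evaluate: ${cmd}`);
  return Number(m[1]) + Number(m[2]);
}

console.log(evaluateLayered("11 + (7 + (1 + 2))", addOnly)); // 21
```

Each pass resolves one subcommand, so "11 + (7 + (1 + 2))" becomes "11 + (7 + 3)", then "11 + 10", then 21.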
Plugins:
These are created using (at least) 2 files:
pluginname_function.json
pluginname.js
The .js file contains all the JavaScript to deal with request X and push back a result
The result pushed back can be either a string such as '1999' (example question: 2000-1)
Or an array containing the text string, and the same string with SSML markup:
result = ['1999'];
result[1] = '<say-as interpret-as="cardinal">1999</say-as>';
More information on SSML markup can be found here
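As an illustration, a plugin function producing such a result might look like this (the function shape is assumed from the match array and result format described in this README, not taken from Sara's actual plugin API):

```javascript
// Hypothetical plugin function: x is the regex match array described earlier,
// where x[1] and x[2] are the two captured numbers, e.g. "2000" and "1".
function subtract(x) {
  const answer = String(Number(x[1]) - Number(x[2]));
  const result = [answer]; // plain text response
  result[1] = `<say-as interpret-as="cardinal">${answer}</say-as>`; // SSML variant
  return result;
}

console.log(subtract(["what is 2000 - 1", "2000", "1"]));
// [ '1999', '<say-as interpret-as="cardinal">1999</say-as>' ]
```

Returning just the plain string also works; the SSML variant only changes how the answer is spoken, not how it is printed.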
The .json file contains the name of the plugin, the name of the module (the .js file name), a Regular Expression string, a small description and explanation (used in help documentation)
math_add.json:{ "name": "add", "module": "math", "regex": "^(?:what|how\\smuch)?\\s?(?:is)?\\s?(-?[0-9]+\\.?(?:[0-9]+)?)\\s?(?:\\+|plus|\\&|and)\\s?(-?[0-9]+\\.?(?:[0-9]+)?)\\s?(?:is)?