Intelligently convert HTML to audio.
I've never been a fast reader. When I'm given a choice between a physical book and an audiobook of the same content, I'll reach for the latter... 100 times out of 100. Whenever there's a long piece of text I'm required to read, I always look for an audio option first. While these options have expanded in recent years, the vast majority of online prose still can't be listened to.
You can use your device's built-in accessibility tools for text-to-speech. You can ask Siri to "read this article," and it sometimes works. But even when it does, the result is far from ideal.
I've tried various text-to-speech APIs, and Google Cloud's is the best I've found. It uses some sort of machine-learning magic, and as a result the output sounds extremely lifelike: far more like a real human voice than AWS's equivalent offering, Polly.
I kept finding myself writing code to convert HTML into an audio file I can listen to, so I wanted a single npm package that all of my other projects can import from. This is that project.
What It Does
It can do the following things:
- Take an HTML file and break it apart into "segments". This means putting pauses where pauses should go, adding sound effects as desired, and cleaning up various abbreviations.
- Display this "segments" file to you in a pleasant way (status: "still needs doing")
- Take this "segments" file from the previous step and create a wav and/or mp3 file.
- Add chapter metadata to the wav/mp3 file (status: "implemented but buggy")
- The Google API rejects requests that contain too much text at once. This repo abstracts that problem away for you by chunking the text into separate requests.
- There are other potential headaches this repo may alleviate, and I will add them to this list later.
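The segment format isn't documented here, so the following is only a minimal sketch of the idea, not this package's implementation. The segment shape (`{ type: 'text', text }` and `{ type: 'pause', ms }`), the pause lengths, the abbreviation table, and the function names are all hypothetical. It handles only a flat, non-nested subset of HTML with a regex (a real version should use a proper HTML parser), and it leaves sound effects out.

```javascript
// Hypothetical abbreviation table; the package's real list isn't documented here.
const ABBREVIATIONS = {
  'e.g.': 'for example',
  'i.e.': 'that is',
  'Dr.': 'Doctor',
};

// Hypothetical pause lengths (milliseconds) inserted after each block-level element.
const PAUSE_AFTER = { h1: 1500, h2: 1200, h3: 1000, p: 600, li: 400 };

// Replace each abbreviation with its spoken form (literal, case-sensitive match).
function expandAbbreviations(text) {
  let out = text;
  for (const [abbr, spoken] of Object.entries(ABBREVIATIONS)) {
    out = out.split(abbr).join(spoken);
  }
  return out;
}

// Turn a flat HTML string into alternating text and pause segments.
function htmlToSegments(html) {
  const segments = [];
  // Match one block-level element at a time; nested blocks are not supported.
  const blockRe = /<(h[1-3]|p|li)\b[^>]*>([\s\S]*?)<\/\1>/gi;
  let match;
  while ((match = blockRe.exec(html)) !== null) {
    const tag = match[1].toLowerCase();
    const text = match[2]
      .replace(/<[^>]+>/g, '') // drop inline tags such as <em> or <a>
      .replace(/&amp;/g, '&') // only &amp; is decoded in this sketch
      .replace(/\s+/g, ' ')
      .trim();
    if (!text) continue; // skip empty blocks entirely
    segments.push({ type: 'text', text: expandAbbreviations(text) });
    segments.push({ type: 'pause', ms: PAUSE_AFTER[tag] });
  }
  return segments;
}
```

For example, `htmlToSegments('<h1>Intro</h1><p>Cats, <em>e.g.</em> tabbies.</p>')` yields a text segment "Intro", a 1500 ms pause, a text segment "Cats, for example tabbies.", and a 600 ms pause. Keeping pauses as their own segments means the audio step can render them as silence without touching the text sent to the API.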
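To illustrate the chunking idea, here's a sketch that greedily packs sentences into requests that stay under a byte limit. The 5,000-byte figure is an assumption based on Google's published per-request input limit for Cloud Text-to-Speech (check the current quotas page), and `chunkText` and `packPieces` are hypothetical names, not this package's exports.

```javascript
// Assumed per-request limit; verify against Google's current Cloud Text-to-Speech quotas.
const MAX_REQUEST_BYTES = 5000;

// The limit is measured in UTF-8 bytes, not JavaScript string length.
function byteLength(s) {
  return Buffer.byteLength(s, 'utf8');
}

// Greedily join pieces with spaces into chunks no larger than maxBytes.
// A single piece larger than maxBytes is emitted on its own (still oversized).
function packPieces(pieces, maxBytes) {
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    const candidate = current ? current + ' ' + piece : piece;
    if (byteLength(candidate) <= maxBytes) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Split text into request-sized chunks, preferring sentence boundaries.
// Assumes no single word exceeds maxBytes.
function chunkText(text, maxBytes = MAX_REQUEST_BYTES) {
  // Split after sentence-ending punctuation so each request ends at a natural pause.
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const pieces = [];
  for (const sentence of sentences) {
    if (byteLength(sentence) <= maxBytes) {
      pieces.push(sentence);
    } else {
      // An oversized sentence falls back to word boundaries.
      pieces.push(...packPieces(sentence.split(/\s+/), maxBytes));
    }
  }
  return packPieces(pieces, maxBytes);
}
```

Each returned chunk would then be sent as its own synthesis request, with the resulting audio concatenated in order. Splitting on sentences rather than at an arbitrary byte offset keeps the seams between audio chunks from landing mid-word.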
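Chapter metadata is commonly written with ffmpeg. As a sketch (not necessarily how this package does it), here's a function that builds an ffmpeg FFMETADATA file from a list of chapters. The `{ title, startMs, endMs }` chapter shape and the function names are hypothetical; the file layout and the escaping of `=`, `;`, `#`, `\` and newlines follow ffmpeg's documented metadata format.

```javascript
// Escape the characters ffmpeg's FFMETADATA format treats as special.
function escapeMetadata(value) {
  return String(value).replace(/[=;#\\\n]/g, (ch) => '\\' + ch);
}

// chapters: [{ title, startMs, endMs }] (hypothetical shape, times in milliseconds)
function buildFfmetadata(chapters) {
  const lines = [';FFMETADATA1'];
  for (const { title, startMs, endMs } of chapters) {
    lines.push(
      '',
      '[CHAPTER]',
      'TIMEBASE=1/1000', // START and END are expressed in milliseconds
      `START=${Math.round(startMs)}`,
      `END=${Math.round(endMs)}`,
      `title=${escapeMetadata(title)}`,
    );
  }
  return lines.join('\n') + '\n';
}
```

Once written to disk, the file can be merged with ffmpeg's documented pattern, e.g. ffmpeg -i input.mp3 -i metadata.txt -map_metadata 1 -codec copy output.mp3.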
You'll need to create a Google Cloud API key. You can use this article as instructions, but the important part is that you end up with a file at ~/.google-api-credentials.json. To get there, you'll need to create a project within their console and then enable the "Cloud Text-to-Speech API" on that project.
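If you want to sanity-check the credentials file before running anything, here's a small sketch. It assumes the file is a Google service-account key (the JSON format Google's client libraries read, with fields like `project_id` and `client_email`), and `checkCredentials` is a hypothetical helper, not part of this package.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const CREDENTIALS_PATH = path.join(os.homedir(), '.google-api-credentials.json');

// Returns a list of problems; an empty list means the file looks usable.
function checkCredentials(filePath = CREDENTIALS_PATH) {
  if (!fs.existsSync(filePath)) {
    return [`No credentials file at ${filePath}`];
  }
  let json;
  try {
    json = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    return [`${filePath} is not valid JSON: ${err.message}`];
  }
  if (json === null || typeof json !== 'object') {
    return [`${filePath} does not contain a JSON object`];
  }
  // Fields present in a Google service-account key file (assumed format).
  const required = ['type', 'project_id', 'client_email', 'private_key'];
  return required.filter((key) => !(key in json)).map((key) => `Missing "${key}"`);
}
```

Calling `checkCredentials()` with no argument checks the default path and returns an empty array when the file looks like a complete key.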
You'll also need ffmpeg installed. Install it with brew install ffmpeg or similar.
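Here's a sketch of checking that ffmpeg is on your PATH, plus one plausible set of arguments for turning a WAV into an MP3. This package's actual ffmpeg invocation isn't described here, so `hasFfmpeg`, `wavToMp3Args`, and the choice of the libmp3lame encoder at VBR quality 2 are assumptions (libmp3lame must be compiled into your ffmpeg build).

```javascript
const { spawnSync } = require('node:child_process');

// True if an `ffmpeg` binary can be launched from PATH and exits cleanly.
function hasFfmpeg() {
  const result = spawnSync('ffmpeg', ['-version'], { stdio: 'ignore' });
  // result.error is set (e.g. ENOENT) when the binary can't be found.
  return result.error === undefined && result.status === 0;
}

// Arguments for re-encoding a WAV file as VBR MP3 with the LAME encoder.
// -y overwrites an existing output file; lower -qscale:a means higher quality.
function wavToMp3Args(wavPath, mp3Path, quality = 2) {
  return ['-y', '-i', wavPath, '-codec:a', 'libmp3lame', '-qscale:a', String(quality), mp3Path];
}
```

For example, spawnSync('ffmpeg', wavToMp3Args('out.wav', 'out.mp3'), { stdio: 'inherit' }) would run the conversion once hasFfmpeg() returns true. Returning an argument array rather than a shell string avoids quoting problems with paths that contain spaces.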
Let's say you want to convert this article to speech. First, clone this repo:
git clone https://github.com/Arro/earthy-player.git
Next, go to the example directory:
Then, install the example:
Now, run it:
Finally, check your ~/Downloads folder. There should be a new folder there called
Run the Tests
These are the commands:
- npm run test to run the tests. This repo uses
- npm run tdd to run the tests with "--watch" enabled.