@pietrop/serialize-stt-words

A module to serialize and deserialize words from STT in dpe format into arrays of each attribute.

Usage (no npm install needed):

<script type="module">
  import pietropSerializeSttWords from 'https://cdn.skypack.dev/@pietrop/serialize-stt-words';
</script>

serialize-stt-words

Splitting a transcript into per-attribute arrays is a workaround for the Firebase 1 MB document limit.

As a rough heuristic, take mock8hours.json, an 8-hour transcript that is 9.6 MB.

This is the breakdown of file sizes when each attribute is saved separately:

 58K paragraphEndTimes.json
 59K paragraphStartTimes.json
 93K speakersLit.json
637K textList.json
637K wordEndTimes.json
653K wordStartTimes.json

Each file is well within the 1 MB Firebase document limit.
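
A quick way to check a similar breakdown for your own data is to measure the JSON byte length of each array. This is a minimal sketch, not part of the package: `jsonSizeInBytes` is a hypothetical helper, the sample arrays are made up, and the limit is assumed to be 1 MiB (1,048,576 bytes). JSON byte length is only a rough proxy for how a database counts document size.

```javascript
// Hypothetical helper: size in bytes of a value once serialized as JSON.
function jsonSizeInBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

// Assumption: the document limit is 1 MiB (1,048,576 bytes).
const ONE_MIB = 1024 * 1024;

// Tiny illustrative sample, not real transcript data.
const serializedSample = {
  wordStartTimes: [0, 0.9, 1.13],
  wordEndTimes: [0.88, 1.12, 1.5],
  textList: ['Hello', 'there', 'world'],
};

// One row per attribute: its name, its JSON size, and whether it fits.
const report = Object.entries(serializedSample).map(([name, list]) => {
  const bytes = jsonSizeInBytes(list);
  return { name, bytes, fits: bytes < ONE_MIB };
});

console.table(report);
```

Running the same check on the six arrays from a real transcript should show whether any single attribute gets close to the limit.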

Setup

git clone git@github.com:pietrop/serialize-stt-words.git
cd serialize-stt-words
npm install

Usage

Input transcript JSON:
{
    "words": [
        {
            "text": "Hello",
            "start": 0,
            "end": 0.88
        },
        ...
    ],
    "paragraphs": [
        {
            "speaker": "SPEAKER_B",
            "start": 0,
            "end": 1.24
        },
        ...
    ]
}

Install the package:

npm install @pietrop/serialize-stt-words

Serialize

serializeTranscript returns one array per attribute.

const { serializeTranscript } = require('@pietrop/serialize-stt-words');
// transcript is an object in the input format shown above
const { wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit } = serializeTranscript(transcript);

Output example:
{
    "wordStartTimes": [
        0,
        0.9,
        1.13,
        ...
    ],
    "wordEndTimes": [
        0.88,
        1.12,
        ...
    ],
    "textList": [
        "Media",
        "will",
        ...
    ],
    "paragraphStartTimes": [
        0,
        1.25,
        ...
    ],
    "paragraphEndTimes": [
        1.24,
        4,
        ...
    ],
    "speakersLit": [
        "SPEAKER_B",
        "SPEAKER_A",
        ...
    ]
}

The idea is that you can save each array separately in a database and recombine them later.
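
To make the split concrete, here is a minimal sketch of what serializing does conceptually and how each array could be stored under its own key. `serializeSketch` is a hypothetical re-implementation for illustration, not the package's source, and a plain `Map` stands in for a real database.

```javascript
// Hypothetical re-implementation for illustration; not the package's source.
function serializeSketch(transcript) {
  const { words, paragraphs } = transcript;
  return {
    wordStartTimes: words.map((word) => word.start),
    wordEndTimes: words.map((word) => word.end),
    textList: words.map((word) => word.text),
    paragraphStartTimes: paragraphs.map((paragraph) => paragraph.start),
    paragraphEndTimes: paragraphs.map((paragraph) => paragraph.end),
    speakersLit: paragraphs.map((paragraph) => paragraph.speaker),
  };
}

// Made-up sample in the input format described above.
const sampleTranscript = {
  words: [
    { text: 'Hello', start: 0, end: 0.88 },
    { text: 'world', start: 0.9, end: 1.24 },
  ],
  paragraphs: [{ speaker: 'SPEAKER_B', start: 0, end: 1.24 }],
};

// In-memory Map standing in for a database: one entry per attribute.
const fakeDb = new Map(Object.entries(serializeSketch(sampleTranscript)));

console.log([...fakeDb.keys()]);
```

With a real database, each entry would become its own document or record, so that no single write has to carry the whole transcript.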

Deserialize
const { deserializeTranscript } = require('@pietrop/serialize-stt-words');
const desRes = deserializeTranscript({ wordStartTimes, wordEndTimes, textList, paragraphStartTimes, paragraphEndTimes, speakersLit });
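
Recombining is the inverse step: the arrays are zipped back together by index. The sketch below assumes the result has the same shape as the input transcript; `deserializeSketch` is a hypothetical stand-in for illustration, not the package's source, and the sample arrays are made up.

```javascript
// Hypothetical inverse for illustration; not the package's source.
// Assumes the word arrays share one length and the paragraph arrays another.
function deserializeSketch(lists) {
  const {
    wordStartTimes,
    wordEndTimes,
    textList,
    paragraphStartTimes,
    paragraphEndTimes,
    speakersLit,
  } = lists;

  // Rebuild each word from the values at the same index in the three word arrays.
  const words = textList.map((text, i) => ({
    text,
    start: wordStartTimes[i],
    end: wordEndTimes[i],
  }));

  // Rebuild each paragraph the same way from the three paragraph arrays.
  const paragraphs = speakersLit.map((speaker, i) => ({
    speaker,
    start: paragraphStartTimes[i],
    end: paragraphEndTimes[i],
  }));

  return { words, paragraphs };
}

const rebuilt = deserializeSketch({
  wordStartTimes: [0, 0.9],
  wordEndTimes: [0.88, 1.24],
  textList: ['Hello', 'world'],
  paragraphStartTimes: [0],
  paragraphEndTimes: [1.24],
  speakersLit: ['SPEAKER_B'],
});

console.log(JSON.stringify(rebuilt, null, 2));
```

Because the arrays are matched purely by position, they need to be stored and loaded without reordering or dropping entries.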

Documentation

There's a docs folder in this repository.

docs/notes contains draft development notes on various aspects of the project. These are generally converted into either ADRs or guides when ready.

Development env

The Node version is set in the .nvmrc file used by Node Version Manager (nvm).

nvm use

Tests

npm test

Deployment

npm run publish:public