npm:audio-formant

README

var createConverter = require('audio-formant');

var converter = createConverter({
    //can be omitted
    gl: document.createElement('canvas').getContext('webgl'),

    //formants data or number of formants to process (optional)
    formants: 4,

    //output array length (optional)
    blockSize: 512,

    //output number of channels (optional)
    channels: 2,

    //sample rate of output audio chunk (optional)
    sampleRate: 44100
});


//populate the passed array (optional) with audio data in planar format
converter.populate(array);

//set formants — a sequence of <period, intensity, quality, panning> tuples
converter.setFormants([0,0,1,1, 1,1,0,0]);

//regenerate noise texture, the data argument is optional
converter.setNoise(data);


//formants texture: re-render it to vary formants data per-sample, which is faster than `setFormants`
converter.textures.formants;


//Converter reserves texture spots from 0 to 5 (in case of sharing gl context).
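
The flat tuple layout passed to setFormants can be built from more readable objects. The following is a minimal sketch: packFormants is a hypothetical helper, not part of audio-formant, and the object field names are chosen only for illustration.

```javascript
// Hypothetical helper (not part of audio-formant): flattens formant objects
// into the <period, intensity, quality, panning> sequence used by setFormants.
function packFormants(formants) {
  var flat = [];
  formants.forEach(function (f) {
    flat.push(f.period, f.intensity, f.quality, f.panning);
  });
  return flat;
}

var packed = packFormants([
  { period: 0, intensity: 0, quality: 1, panning: 1 },
  { period: 1, intensity: 1, quality: 0, panning: 0 }
]);
// packed now holds the same eight values as the setFormants call in the usage sample
```

The result could then be handed over as converter.setFormants(packed), assuming a converter created as in the usage sample.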

What is formant?

First off, there are a couple of definitions of formant on Wikipedia. Here is an opinionated concept of a formant.

TODO: image

A formant is a primitive able to describe an atomic signal oscillation in terms of frequency, intensity and quality. The concept is an extension of the phasor with an uncertainty parameter. A formant introduces a continuous scale covering signal forms between white noise and pure oscillation.

The idea hails from the HSL color model applied to sound, where hue is frequency, saturation is quality and lightness is intensity.

In reality, formants can be found in almost any oscillation, starting with the vocal tract: the produced sound is a sum of the membrane’s resonance and the exhalation’s noise. Noise is a factor in any signal, whether in the form of dissipation or of a driving force. That is a fingerprint of reality, and too often it is excluded from analytical systems.

In a metaphorical sense, a formant expresses the harmony/chaos ratio, the quality/quantity relation and the order of change.

Why formants?

Formants enable describing and manipulating sound in new ways, engaging the concept of "clarity". They can find applications in music production, search, sound classification, analysis, recognition, reproduction, restoration, experimentation etc. One can imagine manipulations similar to Instagram filters for sound: as if the sound were reproduced from vinyl, sung by someone, spoken by a voice in the head, passed through a simple equalizer etc.

Formants enable a more natural way to understand and speak of sound, from musical timbres to animal speech. They act like scalable vector graphics for sound.

What is the method?

Experiments showed that the most effective (O(n)) way to reproduce a formant is sampling a function (basically a sine) with a randomized step (phase). The method is called "phase walking".

[image]

The idea lies somewhere between granular synthesis and a quantum path. This method is taken as the basis.
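
As a rough CPU-side illustration of phase walking, the sketch below samples a sine while advancing the phase by a partly random step. The function name, its parameters and the exact jitter law are assumptions made for this example; the converter itself runs in WebGL shaders and may randomize the step differently.

```javascript
// Sketch of "phase walking": sample a sine whose phase advances by a
// randomized step. Names and the jitter law are assumptions for illustration.
function phaseWalk(length, frequency, sampleRate, quality, random) {
  random = random || Math.random;
  var out = new Float32Array(length);
  var phase = 0;                      // phase in cycles, kept in [0, 1)
  var step = frequency / sampleRate;  // deterministic step per sample
  for (var i = 0; i < length; i++) {
    out[i] = Math.sin(2 * Math.PI * phase);
    // quality = 1: fixed step, pure sine;
    // quality = 0: jitter spans a full cycle, so successive phases are uncorrelated
    phase += step + (1 - quality) * (random() - 0.5);
    phase -= Math.floor(phase);       // wrap into [0, 1), also for negative values
  }
  return out;
}

var pure = phaseWalk(512, 440, 44100, 1);   // plain 440 Hz sine
var fuzzy = phaseWalk(512, 440, 44100, 0.9); // same pitch with a noisy phase
```

The loop does constant work per sample, which matches the O(n) cost mentioned above.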

Other methods include:

applying a bandpass filter to white noise
summing multiple oscillators
emulating a mass-damper system differential equation with driving noise
inverse discrete Fourier transform
wavelets
autocorrelation functions
subsampling noise
analytical solutions
???
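
For comparison, the first alternative in the list, a bandpass filter applied to white noise, can be sketched as follows. This uses the widely known biquad band-pass form (constant 0 dB peak gain); the function and parameter names are illustrative and not part of audio-formant, and Q here is the plain filter Q factor rather than the normalized quality parameter.

```javascript
// Sketch of one alternative method: white noise through a band-pass biquad.
// Names are illustrative; Q is the raw filter Q factor (> 0).
function bandpassNoise(length, frequency, sampleRate, Q, random) {
  random = random || Math.random;
  var w0 = 2 * Math.PI * frequency / sampleRate;
  var alpha = Math.sin(w0) / (2 * Q);
  var a0 = 1 + alpha;
  var b0 = alpha / a0, b2 = -alpha / a0;   // b1 is zero for this filter
  var a1 = -2 * Math.cos(w0) / a0, a2 = (1 - alpha) / a0;

  var out = new Float32Array(length);
  var x1 = 0, x2 = 0, y1 = 0, y2 = 0;      // previous inputs and outputs
  for (var i = 0; i < length; i++) {
    var x = random() * 2 - 1;              // white noise in [-1, 1)
    var y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1; x1 = x;
    y2 = y1; y1 = y;
    out[i] = y;
  }
  return out;
}

var band = bandpassNoise(512, 440, 44100, 10);
```

A higher Q narrows the band and moves the result toward a pure tone; a lower Q moves it toward broadband noise.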

Why WebGL?

A comparison of the available technologies (Web Audio API, streams, threads, web workers and WebGL) has shown that WebGL is the only viable technology for realtime processing of hundreds of formants, because it allows for extensive native parallelism unavailable anywhere else.

Nonetheless it has its own difficulties, as it was not designed for sound processing. In particular, fragments are computed independently of each other, so there is no simple way to walk the "phase path". There are two options: walking the path in each fragment, or walking it in vertices and saving the values to varyings. The second method is significantly faster for large numbers of formants, but the number of varyings is limited and depends on the browser.

Implementation

Formant parameters are brought to the 0..1 range, because that range is easy to understand, calculate, store, convert and display, and is widely used in every related domain.

Period reflects the frequency of the formant. Values in the 0..1 range cover frequencies from 1 Hz to 20+ kHz. Intuitively, period displays massiveness, as more massive objects exhibit lower frequencies, see simple harmonic motion.
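
The exact mapping between period and frequency is not spelled out here. As one possible reading, the sketch below assumes a logarithmic scale on which period 1 corresponds to 1 Hz and period 0 to 20 kHz; the function name, the bounds and the direction of the scale are all assumptions for illustration.

```javascript
// One possible period-to-frequency mapping (an assumption, not the library's
// documented behavior): logarithmic, period 1 -> 1 Hz, period 0 -> 20 kHz.
var MIN_FREQUENCY = 1;      // Hz, assumed lower bound
var MAX_FREQUENCY = 20000;  // Hz, assumed upper bound

function periodToFrequency(period) {
  return MIN_FREQUENCY * Math.pow(MAX_FREQUENCY / MIN_FREQUENCY, 1 - period);
}

function frequencyToPeriod(frequency) {
  return 1 - Math.log(frequency / MIN_FREQUENCY) / Math.log(MAX_FREQUENCY / MIN_FREQUENCY);
}
```

A logarithmic scale is chosen here because it spreads octaves evenly over the 0..1 range, which suits the wide span of audible frequencies.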

Intensity displays the magnitude of the oscillation. It masks the amplitude of the produced wave. As any oscillation is a transformation between two forms of energy, magnitude reflects the total energy distributed in the oscillator, which can be seen as the maximum deviation, or imbalance, in one of these two forms, or as the length of the phasor vector in general.

Quality is the Q factor normalized to the 0..1 range by quality = f / tan(2 * π * Q). A value of 1 makes the formant a pure harmonic, 0 makes it white noise. Everything in between is a degree of freedom with fuzzy frequency. It can be understood as a Helmholtz resonator with unstable volume. This parameter makes the formant good for describing breath-related sounds such as flutes and whistles, natural sound transitions and noise approximation. With formants it is also natural to express the color of noise. Quality is a measure of how pure, or focused, the signal is in the frequency domain.

The panning parameter directs output to one of the output channels. It allows for easily implementing any number of audio channels. It is also a natural concept known to every sound producer.
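
How a 0..1 panning value could be spread over any number of channels can be sketched as below. The linear crossfade between neighbouring channels and the name panGains are assumptions for illustration; the converter may distribute the signal differently.

```javascript
// Sketch: turn a 0..1 panning value into per-channel gains by linearly
// crossfading between the two nearest channels (an assumed panning law).
function panGains(panning, channels) {
  var gains = [];
  for (var c = 0; c < channels; c++) gains.push(0);

  var position = panning * (channels - 1);        // fractional channel index
  var lower = Math.floor(position);
  var upper = Math.min(lower + 1, channels - 1);  // clamp at the last channel
  var fraction = position - lower;

  gains[lower] += 1 - fraction;
  gains[upper] += fraction;
  return gains;
}

var stereoCenter = panGains(0.5, 2); // equal share for both channels
var lastOfThree = panGains(1, 3);    // everything goes to the third channel
```

With this scheme panning 0 selects the first channel and panning 1 the last one, whatever the channel count.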

Applications

Formants stream
LFO

audio-pulse — declarative formants based model of sound description.
audio-dsp coursera course — coursera introductory class to digital signal processing for audio.
periodic-wave — a way to define phasor in code.
