Determine if there is a pause in speech using Web Audio API AudioContext - javascript

Trying to understand the Web Audio API better. We're using it to create an AudioContext and then sending audio to be transcribed. I want to be able to determine when there is a natural pause in speech or when the user stopped speaking.
Is there some data in the onaudioprocess callback that can be accessed to determine pauses/breaks in speech?
let context = new AudioContext();
context.onstatechange = () => {};
this.setState({ context: context });
let source = context.createMediaStreamSource(stream);
let processor = context.createScriptProcessor(4096, 1, 1);
source.connect(processor);
processor.connect(context.destination);
processor.onaudioprocess = (event) => {
// Do some magic here
}
I tried a solution suggested in this post, but did not achieve the results I need. Post: HTML Audio recording until silence?
When I parse for silence as the post suggests, I get the same result every time - either 0 or 128.
let context = new AudioContext();
let source = context.createMediaStreamSource(stream);
let processor = context.createScriptProcessor(4096, 1, 1);
source.connect(processor);
processor.connect(context.destination);
/***
* Create analyser
*
**/
let analyser = context.createAnalyser();
analyser.smoothingTimeConstant = 0;
analyser.fftSize = 2048;
let buffLength = analyser.frequencyBinCount;
let arrayFreqDomain = new Uint8Array(buffLength);
let arrayTimeDomain = new Uint8Array(buffLength);
processor.connect(analyser);
processor.onaudioprocess = (event) => {
/**
*
* Parse live real-time buffer looking for silence
*
**/
let f, t;
analyser.getByteFrequencyData(arrayFreqDomain);
analyser.getByteTimeDomainData(arrayTimeDomain);
for (var i = 0; i < buffLength; i++) {
arrayFreqDomain[i]; <---- gives 0 value always
arrayTimeDomain[i]; <---- gives 128 value always
}
}
Looking at the documentation for the getByteFrequencyData method, I can see how it is supposed to give a different value (in the documentation example it produces a different barHeight), but it isn't working for me. https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getByteFrequencyData#Example
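For reference: in the snippet above the analyser is fed from the ScriptProcessor, whose output buffer is never written in onaudioprocess, so it most likely only ever "hears" silence - hence the constant 0 / 128. A minimal sketch of one common approach (not a confirmed fix) is to connect the analyser directly to the source and threshold an RMS level; SILENCE_THRESHOLD and MIN_SILENCE_MS below are illustrative assumptions you would tune yourself.
let context = new AudioContext();
let source = context.createMediaStreamSource(stream);

let analyser = context.createAnalyser();
analyser.fftSize = 2048;
analyser.smoothingTimeConstant = 0;
source.connect(analyser); // feed the analyser from the source, not from the processor

const data = new Uint8Array(analyser.fftSize);
const SILENCE_THRESHOLD = 0.02; // assumed RMS threshold, tune for your microphone
const MIN_SILENCE_MS = 800;     // assumed pause length that counts as "stopped speaking"
let silentSince = null;

function checkForPause() {
  analyser.getByteTimeDomainData(data);
  // convert bytes (0..255, 128 = silence) to -1..1 and compute the RMS level
  let sumSquares = 0;
  for (let i = 0; i < data.length; i++) {
    const v = (data[i] - 128) / 128;
    sumSquares += v * v;
  }
  const rms = Math.sqrt(sumSquares / data.length);

  if (rms < SILENCE_THRESHOLD) {
    if (silentSince === null) silentSince = performance.now();
    if (performance.now() - silentSince > MIN_SILENCE_MS) {
      // the user has paused / stopped speaking
    }
  } else {
    silentSince = null;
  }
  requestAnimationFrame(checkForPause);
}
checkForPause();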

Related

Is there any way to track volume with the web audio api

I'd like to be able to track how loud the song gets in real time, returning a number to represent the current volume of the song. The source is from a sample that plays when you click it. I've seen tutorials on making a volume meter, but all I want is a number to measure this. Here is my starting code.
const audioContext = new AudioContext();
const samplebutton = document.createElement('button')
samplebutton.innerText = 'sample'
samplebutton.addEventListener('click', async () => {
let volume = 0
const response = await fetch('testsong.wav')
const soundBuffer = await response.arrayBuffer()
const sampleBuffer = await audioContext.decodeAudioData(soundBuffer)
const sampleSource = audioContext.createBufferSource()
sampleSource.buffer = sampleBuffer
sampleSource.connect(audioContext.destination)
sampleSource.start()
function TrackVolume() {
}
})
You can calculate the volume by passing the audio through an AnalyserNode. With it you can get the amplitudes of every frequency at a specific moment in time and calculate the volume of the audio file.
const audioContext = new AudioContext();
const samplebutton = document.createElement('button')
samplebutton.innerText = 'sample'
samplebutton.addEventListener('click', async () => {
const response = await fetch('testsong.wav')
const soundBuffer = await response.arrayBuffer()
const sampleBuffer = await audioContext.decodeAudioData(soundBuffer)
const analyser = audioContext.createAnalyser();
const sampleSource = audioContext.createBufferSource()
sampleSource.buffer = sampleBuffer
const bufferLength = analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
sampleSource.connect(analyser);
analyser.connect(audioContext.destination)
sampleSource.start()
function calculateVolume() {
analyser.getByteFrequencyData(dataArray)
let sum = 0;
for (const amplitude of dataArray) {
sum += amplitude * amplitude
}
const volume = Math.sqrt(sum / dataArray.length)
console.log(volume)
requestAnimationFrame(calculateVolume)
}
calculateVolume()
});
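If you want a single number that tracks loudness a bit more directly, a variation (just a sketch, reusing the analyser from the snippet above) is to take the RMS of the time-domain samples instead of the byte frequency bins; getFloatTimeDomainData returns samples in the -1..1 range, so the result sits between 0 (silence) and roughly 1 (full scale).
const timeData = new Float32Array(analyser.fftSize);

function calculateRmsVolume() {
  analyser.getFloatTimeDomainData(timeData); // samples in the range -1..1
  let sum = 0;
  for (const sample of timeData) {
    sum += sample * sample;
  }
  const rms = Math.sqrt(sum / timeData.length); // 0 = silence, ~1 = full scale
  console.log(rms);
  requestAnimationFrame(calculateRmsVolume);
}
calculateRmsVolume();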

How to record HTML canvas only when stream.requestFrame() gets called, and save result? [duplicate]

I want to record a video from an HTML <canvas> element at a specific frame rate.
I am using CanvasCaptureMediaStream with canvas.captureStream(fps) and also have access to the video track via const track = stream.getVideoTracks()[0], so I can call track.requestFrame() to write it to the output video buffer via MediaRecorder.
I want to precisely capture one frame at a time and then change the canvas content. Changing the canvas content can take some time (as images need to be loaded, etc.), so I cannot capture the canvas in real time.
Some changes on the canvas would happen within 500 ms of real time, so this also needs to be adjusted to rendering one frame at a time.
The MediaRecorder API is meant to record live streams; editing is not what it was designed for, and it doesn't do it very well, to be honest...
The MediaRecorder itself has no concept of frame rate; this is normally defined by the MediaStreamTrack. However, the CanvasCaptureMediaStreamTrack doesn't really make it clear what its frame rate is.
We can pass a parameter to HTMLCanvas.captureStream(), but this only tells it the maximum number of frames we want per second; it's not really an fps parameter.
Also, even if we stop drawing on the canvas, the recorder will still continue to extend the duration of the recorded video in real time (I think that technically only a single long frame is recorded though in this case).
So... we're gonna have to hack around...
One thing we can do with the MediaRecorder is to pause() and resume() it.
It then sounds quite easy to pause before doing the long drawing operation and to resume right after it's done? Yes... and no, not that easy either...
Once again, the frame-rate is dictated by the MediaStreamTrack, but this MediaStreamTrack can not be paused.
Well, actually there is one way to pause a special kind of MediaStreamTrack, and luckily I'm talking about CanvasCaptureMediaStreamTracks.
When we call our capture-stream with a parameter of 0, we basically have manual control over when new frames are added to the stream.
So here we can synchronize both our MediaRecorder and our MediaStreamTrack to whatever frame rate we want.
The basic workflow is
await the_long_drawing_task;
resumeTheRecorder();
writeTheFrameToStream(); // track.requestFrame();
await wait( time_per_frame );
pauseTheRecorder();
Doing so, the recorder is awake only for the per-frame time we decided, and a single frame is passed to the MediaStream during this time, effectively mocking constant-FPS drawing as far as the MediaRecorder is concerned.
But as always, hacks in this still-experimental area come with a lot of browser weirdness, and the following demo actually only works in current Chrome...
For whatever reason, Firefox will always generate files with twice as many frames as requested, and it will also occasionally prepend a long first frame...
Also to be noted, Chrome has a bug where it will update the canvas stream at drawing time, even though we initiated this stream with a frameRequestRate of 0. This means that if you start drawing before everything is ready, or if the drawing on your canvas itself takes a long time, then our recorder will record half-baked frames that we didn't ask for.
To work around this bug, we thus need to use a second canvas, used only for the streaming. All we'll do on that canvas is drawImage the source one, which will always be a fast enough operation to not trigger that bug.
class FrameByFrameCanvasRecorder {
constructor(source_canvas, FPS = 30) {
this.FPS = FPS;
this.source = source_canvas;
const canvas = this.canvas = source_canvas.cloneNode();
const ctx = this.drawingContext = canvas.getContext('2d');
// we need to draw something on our canvas
ctx.drawImage(source_canvas, 0, 0);
const stream = this.stream = canvas.captureStream(0);
const track = this.track = stream.getVideoTracks()[0];
// Firefox still uses a non-standard CanvasCaptureMediaStream
// instead of CanvasCaptureMediaStreamTrack
if (!track.requestFrame) {
track.requestFrame = () => stream.requestFrame();
}
// prepare our MediaRecorder
const rec = this.recorder = new MediaRecorder(stream);
const chunks = this.chunks = [];
rec.ondataavailable = (evt) => chunks.push(evt.data);
rec.start();
// we need to be in 'paused' state
waitForEvent(rec, 'start')
.then((evt) => rec.pause());
// expose a Promise for when it's done
this._init = waitForEvent(rec, 'pause');
}
async recordFrame() {
await this._init; // we have to wait for the recorder to be paused
const rec = this.recorder;
const canvas = this.canvas;
const source = this.source;
const ctx = this.drawingContext;
if (canvas.width !== source.width ||
canvas.height !== source.height) {
canvas.width = source.width;
canvas.height = source.height;
}
// start our timer now so whatever happens in between is not taken into account
const timer = wait(1000 / this.FPS);
// wake up the recorder
rec.resume();
await waitForEvent(rec, 'resume');
// draw the current state of source on our internal canvas (triggers requestFrame in Chrome)
ctx.clearRect(0, 0, canvas.width, canvas.height);
ctx.drawImage(source, 0, 0);
// force write the frame
this.track.requestFrame();
// wait until our frame-time elapsed
await timer;
// sleep recorder
rec.pause();
await waitForEvent(rec, 'pause');
}
async export () {
this.recorder.stop();
this.stream.getTracks().forEach((track) => track.stop());
await waitForEvent(this.recorder, "stop");
return new Blob(this.chunks);
}
}
///////////////////
// how to use:
(async() => {
const FPS = 30;
const duration = 5; // seconds
let x = 0;
let frame = 0;
const ctx = canvas.getContext('2d');
ctx.textAlign = 'right';
draw(); // we must have drawn on our canvas context before creating the recorder
const recorder = new FrameByFrameCanvasRecorder(canvas, FPS);
// draw one frame at a time
while (frame++ < FPS * duration) {
await longDraw(); // do the long drawing
await recorder.recordFrame(); // record at constant FPS
}
// now all the frames have been drawn
const recorded = await recorder.export(); // we can get our final video file
vid.src = URL.createObjectURL(recorded);
vid.onloadedmetadata = (evt) => vid.currentTime = 1e100; // workaround https://crbug.com/642012
download(vid.src, 'movie.webm');
// Fake long drawing operations that make real-time recording impossible
function longDraw() {
x = (x + 1) % canvas.width;
draw(); // this triggers a bug in Chrome
return wait(Math.random() * 300)
.then(draw);
}
function draw() {
ctx.fillStyle = 'white';
ctx.fillRect(0, 0, canvas.width, canvas.height);
ctx.fillStyle = 'black';
ctx.fillRect(x, 0, 50, 50);
ctx.fillText(frame + " / " + FPS * duration, 290, 140);
};
})().catch(console.error);
<canvas id="canvas"></canvas>
<video id="vid" controls></video>
<script>
// Some helpers
// Promise based timer
function wait(ms) {
return new Promise(res => setTimeout(res, ms));
}
// implements a sub-optimal monkey-patch for requestPostAnimationFrame
// see https://stackoverflow.com/a/57549862/3702797 for details
if (!window.requestPostAnimationFrame) {
window.requestPostAnimationFrame = function monkey(fn) {
const channel = new MessageChannel();
channel.port2.onmessage = evt => fn(evt.data);
requestAnimationFrame((t) => channel.port1.postMessage(t));
};
}
// Promisifies EventTarget.addEventListener
function waitForEvent(target, type) {
return new Promise((res) => target.addEventListener(type, res, {
once: true
}));
}
// creates a downloadable anchor from url
function download(url, filename = "file.ext") {
a = document.createElement('a');
a.textContent = a.download = filename;
a.href = url;
document.body.append(a);
return a;
}
</script>
I asked a similar question which has been linked to this one. In the meantime I came up with a solution which overlaps Kaiido's and which I think is worth reading.
I added two tricks:
I deferred the next render (see code), which fixes the problem of Firefox generating twice the number of frames
I stored an accumulated timing error to correct setTimeout's inaccuracies. I personally used it to tweak the progression of my render and, for example, skip frames if there is a sudden latency, keeping the duration of the video close to the target duration. It is not enough to smooth out setTimeout, though.
const recordFrames = (onstop, canvas, fps=30) => {
const chunks = [];
// get Firefox to initialise the canvas
canvas.getContext('2d').fillRect(0, 0, 0, 0);
const stream = canvas.captureStream();
const recorder = new MediaRecorder(stream);
recorder.addEventListener('dataavailable', ({data}) => chunks.push(data));
recorder.addEventListener('stop', () => onstop(new Blob(chunks)));
const frameDuration = 1000 / fps;
const frame = (next, start) => {
recorder.pause();
api.error += Date.now() - start - frameDuration;
setTimeout(next, 0); // helps Firefox record the right frame duration
};
const api = {
error: 0,
init() {
recorder.start();
recorder.pause();
},
step(next) {
recorder.resume();
setTimeout(frame, frameDuration, next, Date.now());
},
stop: () => recorder.stop()
};
return api;
}
How to use:
const fps = 30;
const duration = 5000;
const animation = Something;
const videoOutput = blob => {
const video = document.createElement('video');
video.src = URL.createObjectURL(blob);
document.body.appendChild(video);
}
const recording = recordFrames(videoOutput, canvas, fps);
const startRecording = () => {
recording.init();
animation.play();
};
// I am assuming you can call these from your library
const onAnimationRender = nextFrame => recording.step(nextFrame);
const onAnimationEnd = () => recording.step(recording.stop);
let now = 0;
const progression = () => {
now = now + 1 + recording.error * fps / 1000;
recording.error = 0;
return now * 1000 / fps / duration
}
I found this solution to be satisfying at 30fps in both Chrome and Firefox. I didn't experience the Chrome bugs mentioned by Kaiido and thus didn't implement anything to deal with them.

web audio analyser's getFloatTimeDomainData buffer offset wrt buffers at other times and wrt buffer of 'complete file'

(question rewritten integrating bits of information from answers, plus making it more concise.)
I use analyser=audioContext.createAnalyser() in order to process audio data, and I'm trying to understand the details better.
I choose an fftSize, say 2048, then I create an array buffer of 2048 floats with Float32Array, and then, in an animation loop
(called 60 times per second on most machines, via window.requestAnimationFrame), I do
analyser.getFloatTimeDomainData(buffer);
which will fill my buffer with 2048 floating point sample data points.
When the handler is called the next time, 1/60 second has passed. To calculate how much that is in units of samples,
we have to divide it by the duration of 1 sample, and get (1/60)/(1/44100) = 735.
So the next handler call takes place (on average) 735 samples later.
So there is overlap between subsequent buffers.
We know from the spec (search for 'render quantum') that everything happens in "chunk sizes" which are multiples of 128.
So (in terms of audio processing), one would expect that the next handler call will usually be either 5*128 = 640 samples later,
or else 6*128 = 768 samples later - those being the multiples of 128 closest to 735 samples = (1/60) second.
Calling this amount "Δ-samples", how do I find out what it is (during each handler call), 640 or 768 or something else?
Reliably, like this:
Consider the 'old buffer' (from previous handler call). If you delete "Δ-samples" many samples at the beginning, copy the remainder, and then append "Δ-samples" many new samples, that should be the current buffer. And indeed, I tried that,
and that is the case. It turns out "Δ-samples" is often 384, 512, or 896. It is trivial but time-consuming to determine "Δ-samples" in a loop.
I would like to compute "Δ-samples" without performing that loop.
One would think the following would work:
(audioContext.currentTime - (the value of audioContext.currentTime during the last handler call)) / (duration of 1 sample)
I tried that (see code below, where I also "stitch together" the various buffers, trying to reconstruct the original buffer),
and - surprise - it works about 99.9% of the time in Chrome, and about 95% of the time in Firefox.
I also tried audioContext.getOutputTimestamp().contextTime, which does not work in Chrome, and works 9?% of the time in Firefox.
Is there any way to find "Δ-samples" (without looking at the buffers), which works reliably?
Second question: the "reconstructed" buffer (all the buffers from the callbacks stitched together) and the original sound buffer are not exactly the same. There is some difference (small, but noticeable, larger than the usual "rounding error"), and it is bigger in Firefox.
Where does that come from? As I understand the spec, those should be the same.
var soundFile = 'https://mathheadinclouds.github.io/audio/sounds/la.mp3';
var audioContext = null;
var isPlaying = false;
var sourceNode = null;
var analyser = null;
var theBuffer = null;
var reconstructedBuffer = null;
var soundRequest = null;
var loopCounter = -1;
var FFT_SIZE = 2048;
var rafID = null;
var buffers = [];
var timesSamples = [];
var timeSampleDiffs = [];
var leadingWaste = 0;
window.addEventListener('load', function() {
soundRequest = new XMLHttpRequest();
soundRequest.open("GET", soundFile, true);
soundRequest.responseType = "arraybuffer";
//soundRequest.onload = function(evt) {}
soundRequest.send();
var btn = document.createElement('button');
btn.textContent = 'go';
btn.addEventListener('click', function(evt) {
goButtonClick(this, evt)
});
document.body.appendChild(btn);
});
function goButtonClick(elt, evt) {
initAudioContext(togglePlayback);
elt.parentElement.removeChild(elt);
}
function initAudioContext(callback) {
audioContext = new AudioContext();
audioContext.decodeAudioData(soundRequest.response, function(buffer) {
theBuffer = buffer;
callback();
});
}
function createAnalyser() {
analyser = audioContext.createAnalyser();
analyser.fftSize = FFT_SIZE;
}
function startWithSourceNode() {
sourceNode.connect(analyser);
analyser.connect(audioContext.destination);
sourceNode.start(0);
isPlaying = true;
sourceNode.addEventListener('ended', function(evt) {
sourceNode = null;
analyser = null;
isPlaying = false;
loopCounter = -1;
window.cancelAnimationFrame(rafID);
console.log('buffer length', theBuffer.length);
console.log('reconstructedBuffer length', reconstructedBuffer.length);
console.log('audio callback called counter', buffers.length);
console.log('root mean square error', Math.sqrt(checkResult() / theBuffer.length));
console.log('lengths of time between requestAnimationFrame callbacks, measured in audio samples:');
console.log(timeSampleDiffs);
console.log(
timeSampleDiffs.filter(function(val) {
return val === 384
}).length,
timeSampleDiffs.filter(function(val) {
return val === 512
}).length,
timeSampleDiffs.filter(function(val) {
return val === 640
}).length,
timeSampleDiffs.filter(function(val) {
return val === 768
}).length,
timeSampleDiffs.filter(function(val) {
return val === 896
}).length,
'*',
timeSampleDiffs.filter(function(val) {
return val > 896
}).length,
timeSampleDiffs.filter(function(val) {
return val < 384
}).length
);
console.log(
timeSampleDiffs.filter(function(val) {
return val === 384
}).length +
timeSampleDiffs.filter(function(val) {
return val === 512
}).length +
timeSampleDiffs.filter(function(val) {
return val === 640
}).length +
timeSampleDiffs.filter(function(val) {
return val === 768
}).length +
timeSampleDiffs.filter(function(val) {
return val === 896
}).length
)
});
myAudioCallback();
}
function togglePlayback() {
sourceNode = audioContext.createBufferSource();
sourceNode.buffer = theBuffer;
createAnalyser();
startWithSourceNode();
}
function myAudioCallback(time) {
++loopCounter;
if (!buffers[loopCounter]) {
buffers[loopCounter] = new Float32Array(FFT_SIZE);
}
var buf = buffers[loopCounter];
analyser.getFloatTimeDomainData(buf);
var now = audioContext.currentTime;
var nowSamp = Math.round(audioContext.sampleRate * now);
timesSamples[loopCounter] = nowSamp;
var j, sampDiff;
if (loopCounter === 0) {
console.log('start sample: ', nowSamp);
reconstructedBuffer = new Float32Array(theBuffer.length + FFT_SIZE + nowSamp);
leadingWaste = nowSamp;
for (j = 0; j < FFT_SIZE; j++) {
reconstructedBuffer[nowSamp + j] = buf[j];
}
} else {
sampDiff = nowSamp - timesSamples[loopCounter - 1];
timeSampleDiffs.push(sampDiff);
var expectedEqual = FFT_SIZE - sampDiff;
for (j = 0; j < expectedEqual; j++) {
if (reconstructedBuffer[nowSamp + j] !== buf[j]) {
console.error('unexpected error', loopCounter, j);
// debugger;
}
}
for (j = expectedEqual; j < FFT_SIZE; j++) {
reconstructedBuffer[nowSamp + j] = buf[j];
}
//console.log(loopCounter, nowSamp, sampDiff);
}
rafID = window.requestAnimationFrame(myAudioCallback);
}
function checkResult() {
var ch0 = theBuffer.getChannelData(0);
var ch1 = theBuffer.getChannelData(1);
var sum = 0;
var idxDelta = leadingWaste + FFT_SIZE;
for (var i = 0; i < theBuffer.length; i++) {
var samp0 = ch0[i];
var samp1 = ch1[i];
var samp = (samp0 + samp1) / 2;
var check = reconstructedBuffer[i + idxDelta];
var diff = samp - check;
var sqDiff = diff * diff;
sum += sqDiff;
}
return sum;
}
In the above snippet, I do the following. I load, with XMLHttpRequest, a 1-second mp3 audio file from my github.io page (I sing 'la' for 1 second). After it has loaded, a button saying 'go' is shown, and after pressing that, the audio is played back by putting it into a bufferSource node and calling .start on it. The bufferSource is then fed to our analyser, et cetera.
related question
I also have the snippet code on my github.io page - makes reading the console easier.
I think the AnalyserNode is not what you want in this situation. You want to grab the data and keep it synchronized with raf. Use a ScriptProcessorNode or AudioWorkletNode to grab the data. Then you'll get all the data as it comes. No problems with overlap, or missing data or anything.
Note also that the clocks for raf and audio may be different and hence things may drift over time. You'll have to compensate for that yourself if you need to.
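As a rough illustration of that drift (a sketch; the variable names are assumptions, not part of the answer above), you can compare elapsed audio-clock time with elapsed wall-clock time and nudge your scheduling by the difference:
// record both clocks once at the start
const startCtxTime = audioContext.currentTime; // audio hardware clock, in seconds
const startWallTime = performance.now();       // rAF / wall clock, in milliseconds

function estimateDrift() {
  const audioElapsed = audioContext.currentTime - startCtxTime;
  const wallElapsed = (performance.now() - startWallTime) / 1000;
  return wallElapsed - audioElapsed; // > 0 means the wall clock is ahead of the audio clock
}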
Unfortunately there is no way to find out the exact point in time at which the data returned by an AnalyserNode was captured. But you might be on the right track with your current approach.
All the values returned by the AnalyserNode are based on the "current-time-domain-data". This is basically the internal buffer of the AnalyserNode at a certain point in time. Since the Web Audio API has a fixed render quantum of 128 samples I would expect this buffer to evolve in steps of 128 samples as well. But currentTime usually evolves in steps of 128 samples already.
Furthermore the AnalyserNode has a smoothingTimeConstant property. It is responsible for "blurring" the returned values. The default value is 0.8. For your use case you probably want to set this to 0.
EDIT: As Raymond Toy pointed out in the comments, the smoothingTimeConstant only has an effect on the frequency data. Since the question is about getFloatTimeDomainData() it will have no effect on the returned values.
I hope this helps but I think it would be easier to get all the samples of your audio signal by using an AudioWorklet. It would definitely be more reliable.
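A minimal sketch of that AudioWorklet approach (the processor name and file name below are assumptions): the worklet forwards each 128-sample render quantum to the main thread in order, so nothing overlaps and nothing is missed, and the AnalyserNode is no longer needed for capturing samples.
// capture-processor.js - runs in the AudioWorkletGlobalScope
class CaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0] && inputs[0][0]; // Float32Array of 128 samples, channel 0
    if (channel) {
      // copy, because the underlying buffer is reused between render quanta
      this.port.postMessage(channel.slice());
    }
    return true; // keep the processor alive
  }
}
registerProcessor('capture-processor', CaptureProcessor);

// main thread (inside an async function)
await audioContext.audioWorklet.addModule('capture-processor.js');
const capture = new AudioWorkletNode(audioContext, 'capture-processor');
sourceNode.connect(capture);
capture.port.onmessage = (evt) => {
  const chunk = evt.data; // contiguous 128-sample chunks, in playback order
  // append chunk to your own reconstruction buffer here
};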
I'm not really following your math, so I can't tell exactly what you had wrong, but you seem to be looking at this in an overly complicated manner.
The fftSize doesn't really matter here; what you want to calculate is how many samples have passed since the last frame.
To calculate this, you just need to
Measure the time elapsed from last frame.
Divide this time by the time of a single frame.
The time of a single frame is simply 1 / context.sampleRate.
So really all you need is (currentTime - previousTime) / (1 / sampleRate), and you'll find the index in the last frame where the data starts being repeated in the new one.
And only then, if you want the index in the new frame you'd subtract this index from the fftSize.
Now for why you sometimes have gaps, it's because AudioContext.prototype.currentTime returns the timestamp of the beginning of the next block to be passed to the graph.
The one we want here is AudioContext.prototype.getOutputTimestamp().contextTime, which represents the timestamp of now on the same time base as currentTime (i.e. relative to the creation of the context).
(function loop(){requestAnimationFrame(loop);})();
(async()=>{
const ctx = new AudioContext();
const buf = await fetch("https://upload.wikimedia.org/wikipedia/en/d/d3/Beach_Boys_-_Good_Vibrations.ogg").then(r=>r.arrayBuffer());
const aud_buf = await ctx.decodeAudioData(buf);
const source = ctx.createBufferSource();
source.buffer = aud_buf;
source.loop = true;
const analyser = ctx.createAnalyser();
const fftSize = analyser.fftSize = 2048;
source.loop = true;
source.connect( analyser );
source.start(0);
// for debugging we use two different buffers
const arr1 = new Float32Array( fftSize );
const arr2 = new Float32Array( fftSize );
const single_sample_dur = (1 / ctx.sampleRate);
console.log( 'single sample duration (ms)', single_sample_dur * 1000);
onclick = e => {
if( ctx.state === "suspended" ) {
ctx.resume();
return console.log( 'starting context, please try again' );
}
console.log( '-------------' );
requestAnimationFrame( () => {
// first frame
const time1 = ctx.getOutputTimestamp().contextTime;
analyser.getFloatTimeDomainData( arr1 );
requestAnimationFrame( () => {
// second frame
const time2 = ctx.getOutputTimestamp().contextTime;
analyser.getFloatTimeDomainData( arr2 );
const elapsed_time = time2 - time1;
console.log( 'elapsed time between two frame (ms)', elapsed_time * 1000 );
const calculated_index = fftSize - Math.round( elapsed_time / single_sample_dur );
console.log( 'calculated index of new data', calculated_index );
// for debugging we can just search for the first index where the data repeats
const real_time = fftSize - arr1.indexOf( arr2[ 0 ] );
console.log( 'real index', real_time > fftSize ? 0 : real_time );
if( calculated_index !== (real_time > fftSize ? 0 : real_time) ) {
console.error( 'different' );
}
});
});
};
document.body.classList.add('ready');
})().catch( console.error );
body:not(.ready) pre { display: none; }
<pre>click to record two new frames</pre>

Reducing sample rate of a Web Audio spectrum analyzer using mic input

I'm using the Web Audio API to create a simple spectrum analyzer using the computer microphone as the input signal. The basic functionality of my current implementation works fine, using the default sampling rate (usually 48 kHz, but could be 44.1 kHz depending on the browser).
For some applications, I would like to use a lower sampling rate (~8 kHz) for the FFT.
It looks like the Web Audio API is adding support for customizing the sample rate, currently only available in Firefox (https://developer.mozilla.org/en-US/docs/Web/API/AudioContextOptions/sampleRate).
Adding sample rate to the context constructor:
// create AudioContext object named 'audioCtx'
var audioCtx = new (AudioContext || webkitAudioContext)({sampleRate: 8000,});
console.log(audioCtx.sampleRate)
The console outputs '8000' (in Firefox), so it appears to be working up to this point.
The microphone is turned on by the user using a pull-down menu. This is the function servicing that pull-down:
var microphone;
function getMicInputState()
{
let selectedValue = document.getElementById("micOffOn").value;
if (selectedValue === "on") {
navigator.mediaDevices.getUserMedia({audio: true})
.then(stream => {
microphone = audioCtx.createMediaStreamSource(stream);
microphone.connect(analyserNode);
})
.catch(err => { alert("Microphone is required."); });
} else {
microphone.disconnect();
}
}
In Firefox, using the pull-down to activate the microphone displays a popup requesting access to the microphone (as normally expected). After clicking to allow the microphone, the console displays:
"Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported".
The display of the spectrum analyzer remains blank.
Any ideas how to overcome this error? If we can get past this, any guidance on how to specify sampleRate when the user's soundcard sampling rate is unknown?
One approach to overcome this is to pass the audio packets captured from the microphone to the analyzer node via a script processor node that re-samples the audio packets passing through it.
Brief overview of script processor node
Every script processor node has an input buffer and an output buffer.
When audio enters the input buffer, the script processor node fires
onaudioprocess event.
Whatever is placed in the output buffer of script processor node becomes its output.
For detailed specs, refer to: Script processor node
Here is the pseudo-code:
1) Create a live media source, a script processor node and an analyzer node
2) Connect the live media source to the analyzer node via the script processor node
3) Whenever an audio packet enters the script processor node, the onaudioprocess event is fired
4) When the onaudioprocess event is fired:
4.1) Extract the audio data from the input buffer
4.2) Re-sample the audio data
4.3) Place the re-sampled data in the output buffer
The following code snippet implements the above pseudocode:
var microphone;
// *** 1) create a script processor node
var scriptProcessorNode = audioCtx.createScriptProcessor(4096, 1, 1);
function getMicInputState()
{
let selectedValue = document.getElementById("micOffOn").value;
if (selectedValue === "on") {
navigator.mediaDevices.getUserMedia({audio: true})
.then(stream => {
microphone = audioCtx.createMediaStreamSource(stream);
// *** 2) connect live media source to analyserNode via script processor node
microphone.connect(scriptProcessorNode);
scriptProcessorNode.connect(analyserNode);
})
.catch(err => { alert("Microphone is required."); });
} else {
microphone.disconnect();
}
}
// *** 3) Whenever an audio packet passes through script processor node, resample it
scriptProcessorNode.onaudioprocess = function(event){
var inputBuffer = event.inputBuffer;
var outputBuffer = event.outputBuffer;
for(var channel = 0; channel < outputBuffer.numberOfChannels; channel++){
var inputData = inputBuffer.getChannelData(channel);
var outputData = outputBuffer.getChannelData(channel);
// *** 3.1) Resample inputData
var fromSampleRate = audioCtx.sampleRate;
var toSampleRate = 8000;
var resampledAudio = downsample(inputData, fromSampleRate, toSampleRate);
// *** 3.2) make output equal to the resampled audio
for (var sample = 0; sample < outputData.length; sample++) {
outputData[sample] = resampledAudio[sample];
}
}
}
function downsample(buffer, fromSampleRate, toSampleRate) {
// buffer is a Float32Array
var sampleRateRatio = Math.round(fromSampleRate / toSampleRate);
var newLength = Math.round(buffer.length / sampleRateRatio);
var result = new Float32Array(newLength);
var offsetResult = 0;
var offsetBuffer = 0;
while (offsetResult < result.length) {
var nextOffsetBuffer = Math.round((offsetResult + 1) * sampleRateRatio);
var accum = 0, count = 0;
for (var i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
accum += buffer[i];
count++;
}
result[offsetResult] = accum / count;
offsetResult++;
offsetBuffer = nextOffsetBuffer;
}
return result;
}
Update - 03 Nov, 2020
Script Processor Node is being deprecated and replaced with AudioWorklets.
The approach to changing the sample rate remains the same.
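For completeness, a rough AudioWorklet equivalent of the resampling step might look like the sketch below (the processor name, file name and 8 kHz target are assumptions; the averaging is the same naive decimation as in the downsample() function above).
// downsample-processor.js - runs in the AudioWorkletGlobalScope
class DownsampleProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // sampleRate is a global inside the worklet scope (the context's rate)
    this.ratio = Math.round(sampleRate / 8000);
  }
  process(inputs) {
    const input = inputs[0] && inputs[0][0]; // 128 samples per render quantum
    if (!input) return true;
    const out = new Float32Array(Math.floor(input.length / this.ratio));
    for (let i = 0; i < out.length; i++) {
      let accum = 0;
      for (let j = 0; j < this.ratio; j++) accum += input[i * this.ratio + j];
      out[i] = accum / this.ratio; // average each group of `ratio` samples
    }
    this.port.postMessage(out); // hand the ~8 kHz data to the main thread
    return true;
  }
}
registerProcessor('downsample-processor', DownsampleProcessor);

// main thread (inside an async function)
await audioCtx.audioWorklet.addModule('downsample-processor.js');
const downsampler = new AudioWorkletNode(audioCtx, 'downsample-processor');
microphone.connect(downsampler);
downsampler.port.onmessage = (evt) => { /* evt.data is the ~8 kHz Float32Array */ };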
Downsampling from the constructor and connecting an AnalyserNode is now possible in Chrome and Safari.
So the following code, taken from the corresponding MDN documentation, would work:
const audioContext = new (window.AudioContext || window.webkitAudioContext)({
sampleRate: 8000
});
const mediaStream = await navigator.mediaDevices.getUserMedia({
audio: true,
video: false
});
const mediaStreamSource = audioContext.createMediaStreamSource(mediaStream);
const analyser = audioContext.createAnalyser();
analyser.fftSize = 256;
const bufferLength = analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
analyser.getByteFrequencyData(dataArray);
mediaStreamSource.connect(analyser);
const title = document.createElement("div");
title.innerText = `Sampling frequency 8kHz:`;
const wrapper = document.createElement("div");
const canvas = document.createElement("canvas");
wrapper.appendChild(canvas);
document.body.appendChild(title);
document.body.appendChild(wrapper);
const canvasCtx = canvas.getContext("2d");
function draw() {
requestAnimationFrame(draw);
analyser.getByteFrequencyData(dataArray);
canvasCtx.fillStyle = "rgb(0, 0, 0)";
canvasCtx.fillRect(0, 0, canvas.width, canvas.height);
var barWidth = canvas.width / bufferLength;
var barHeight = 0;
var x = 0;
for (var i = 0; i < bufferLength; i++) {
barHeight = dataArray[i] / 2;
canvasCtx.fillStyle = "rgb(" + (2 * barHeight + 100) + ",50,50)";
canvasCtx.fillRect(x, canvas.height - barHeight / 2, barWidth, barHeight);
x += barWidth + 1;
}
}
draw();
See here for a demo where both 48kHz and 8kHz sampled signal frequencies are displayed: https://codesandbox.io/s/vibrant-moser-cfex33

Is there a way to release OfflineAudioContext memory allocation?

I'm currently developing a piece of software with Electron. It is a music player application.
What I want to do is extract audio features from incoming songs. This is not real-time extraction, because I would like to extract features from the entire song.
I use a library called Meyda and the Web Audio API. I have noticed that my implementation consumes a large amount of RAM (about 2,000 MB).
Here is my implementation:
let offlineCtx = new OfflineAudioContext(
2,
duration * sampleRate,
sampleRate
)
let source = offlineCtx.createBufferSource()
let buffer = await audioCtx.decodeAudioData(songData.buffer)
source.buffer = buffer
source.connect(offlineCtx.destination)
source.start()
const SLICING_WINDOW_SIZE = 1024
let renderedBuffer = await offlineCtx.startRendering()
let channelData = renderedBuffer.getChannelData(0)
let results = []
for (let i = 0; i < channelData.length - SLICING_WINDOW_SIZE; i += SLICING_WINDOW_SIZE) {
const r = Meyda.extract(
'mfcc',
channelData.slice(i, i + SLICING_WINDOW_SIZE)
)
results.push(r)
}
Is there a way to reduce the RAM consumption? Thanks!
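I'm not aware of an explicit release call for an OfflineAudioContext, but a few things might reduce the footprint (a sketch, under the assumption that the memory mostly comes from the decoded PCM and its rendered copy): decode into a mono, lower-sample-rate context, read analysis windows with subarray() (a view, no copy) instead of slice(), and drop references once you are done so the buffers can be garbage collected. TARGET_SAMPLE_RATE is an assumed value here.
const TARGET_SAMPLE_RATE = 22050; // assumed; half of 44.1 kHz
const SLICING_WINDOW_SIZE = 1024;

// mono and a lower rate => the rendered buffer is roughly 4x smaller than stereo 44.1/48 kHz
let offlineCtx = new OfflineAudioContext(1, duration * TARGET_SAMPLE_RATE, TARGET_SAMPLE_RATE);
let buffer = await offlineCtx.decodeAudioData(songData.buffer); // resampled to the context rate on decode

let source = offlineCtx.createBufferSource();
source.buffer = buffer;
source.connect(offlineCtx.destination);
source.start();

let renderedBuffer = await offlineCtx.startRendering();
let channelData = renderedBuffer.getChannelData(0);
buffer = null; // drop the decoded copy so it can be collected
source = null;

let results = [];
for (let i = 0; i + SLICING_WINDOW_SIZE <= channelData.length; i += SLICING_WINDOW_SIZE) {
  // subarray() returns a view; slice() would allocate a new 1024-float array per window
  results.push(Meyda.extract('mfcc', channelData.subarray(i, i + SLICING_WINDOW_SIZE)));
}
channelData = null;
renderedBuffer = null;
offlineCtx = null; // nothing left referencing the context or its buffers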
