Ramp WebAudio gain, while playback is stopping and starting? - javascript

I have an application where audio playback may be starting and stopping, and there are UI controls to ramp the gain to zero or nonzero values. I'm scheduling playback using AudioBufferSourceNode.start, and modulating gain using AudioParam.linearRampToValueAtTime. Playback is sometimes scheduled for a future time. The problem I'm having is that the ramp function only seems to set values when playback is currently happening; so, if we try to set the gain value e.g. between playback being scheduled and playback starting, the new values are lost. I could do a bunch of timing checks, and either ramp or directly set gain depending on whether playback is happening, but this can get messy, and I was wondering whether there was an alternative way to do this which would work independently of playback starting and stopping.
Here is a test case: we create a one-second noise buffer and play it, while also ramping gain to zero. If playback is scheduled for after the ramp has ended (one second), the gain value never gets set and remains at the default, nonzero value.
var ctx = new AudioContext();
var SR = ctx.sampleRate;
var buffer = ctx.createBuffer(1, SR, SR);
var channelData = buffer.getChannelData(0);
for (var i=0; i<SR; i++) {
channelData[i] = Math.random() * 2 - 1;
}
var bufferNode = ctx.createBufferSource();
var gainNode = ctx.createGain();
bufferNode.buffer = buffer;
bufferNode.connect(gainNode);
gainNode.connect(ctx.destination);
gainNode.gain.linearRampToValueAtTime(0, ctx.currentTime + 1);
//XXX if start_delay is greater than 1 (the ramp duration),
// the gain is never changed and remains at 1.
var start_delay = 0;
bufferNode.start(ctx.currentTime + start_delay);

Couple of issues here:
The behavior of linearRampToValueAtTime without a preceding automation event was originally not well specified. This has since been fixed in the Web Audio spec. I don't know the status of every browser on this, but I believe Chrome handles it correctly.
The automation of the gain node should work as you expect. If it doesn't, file a bug with your browser vendor.
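For reference, one way to make the ramp's starting point explicit is to pin the current value with setValueAtTime before scheduling the ramp. Here is a minimal sketch against the test case above (not a guaranteed fix for older browsers):
// Anchor the current gain value as an automation event, then ramp from it.
// The ramp runs on the AudioContext clock and does not depend on when
// bufferNode.start() actually begins playback.
var now = ctx.currentTime;
gainNode.gain.setValueAtTime(gainNode.gain.value, now);
gainNode.gain.linearRampToValueAtTime(0, now + 1);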

My problem seems to have been caused by a bug in Chrome, which was fixed in Chrome 57. Here is the bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=647974
The following comment has a workaround for older versions of Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=647974#c9

Related

How do I get the audio frequency from my mic using JavaScript?

I need to create a sort of guitar tuner that recognizes sound frequencies and determines which chord I am actually playing. It's similar to this guitar tuner I found online:
https://musicjungle.com.br/afinador-online
But I can't figure out how it works because of the webpack files. I want to make this tool app backendless. Does anyone have a clue about how to do this only in the front end?
I found some old pieces of code that don't work together; I need fresh ideas.
There are quite a few problems to unpack here, some of which will require a bit more information as to the application. Hopefully the sheer size of this task will become apparent as this answer progresses.
As it stands, there are two problems here:
I need to create a sort of guitar tuner...
1. How do you detect the fundamental pitch of a guitar note and feed that information back to the user in the browser?
and
...recognizes sound frequencies and determines which chord I am actually playing.
2. How do you detect which chord a guitar is playing?
This second question is definitely not trivial, but we'll come to it in turn. It is not so much a programming question as a DSP question.
Question 1: Pitch Detection in Browser
Breakdown
If you wish to detect the pitch of a note in the browser, there are a couple of sub-problems that should be split up. Shooting from the hip, we have the following JavaScript/browser problems:
how to get microphone permission?
how to tap the microphone for sample data?
how to start an audio context?
how to display a value?
how to update a value regularly?
how to filter audio data?
how to perform pitch detection?
how to get pitch via autocorrelation?
how to get pitch via zero-crossing?
how to get pitch from the frequency domain?
how to perform a Fourier transform?
This is not an exhaustive list, but it should constitute the bulk of the overall problem.
There is no Minimal, Reproducible Example, so none of the above can be assumed.
Implementation
A basic implementation would consist of a numeric representation of a single fundamental frequency (f0), using an autocorrelation method outlined in the A. v. Knesebeck and U. Zölzer paper [1].
There are other approaches which mix and match filtering and pitch detection algorithms, but those are far outside the scope of a reasonable answer.
NOTE: The Web Audio API is still not equally implemented across all browsers. You should check each of the major browsers and make accommodations in your program. The following was tested in Google Chrome, so your mileage may (and likely will) vary in other browsers.
HTML
Our page should include
an element to display frequency
an element to initiate pitch detection
A more rounded interface would likely split the operations of
Asking for microphone permission
starting microphone stream
processing microphone stream
into separate interface elements, but for brevity they will be wrapped into a single element. This gives us a basic HTML page of
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pitch Detection</title>
</head>
<body>
    <h1>Frequency (Hz)</h1>
    <h2 id="frequency">0.0</h2>
    <div>
        <button onclick="startPitchDetection()">
            Start Pitch Detection
        </button>
    </div>
</body>
</html>
We are jumping the gun slightly with <button onclick="startPitchDetection()">; we will wrap the whole operation up in a single function called startPitchDetection.
Palette of variables
For an autocorrelation pitch detection approach, our palette of variables will need to include:
the Audio context
the microphone stream
an Analyser Node
an array for audio data
an array for the correlated signal
an array for correlated-signal maxima
a DOM reference to the frequency
giving us something like
let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
let microphoneStream = null;
let analyserNode = audioCtx.createAnalyser();
let audioData = new Float32Array(analyserNode.fftSize);
let corrolatedSignal = new Float32Array(analyserNode.fftSize);
let localMaxima = new Array(10);
const frequencyDisplayElement = document.querySelector('#frequency');
Some values are left null as they will not be known until the microphone stream has been activated. The 10 in let localMaxima = new Array(10); is a little arbitrary. This array will store the sample positions of local maxima in the correlated signal, from which the distances between consecutive maxima are derived.
Main script
Our <button> element has an onclick function of startPitchDetection, so that will be required. We will also need
an update function (for updating the display)
an autocorrelation function that returns a pitch
However, the first thing we have to do is ask for permission to use the microphone. To achieve this we use navigator.mediaDevices.getUserMedia, which returns a Promise. Embellishing on what is outlined in the MDN documentation, this gives us something roughly like
navigator.mediaDevices.getUserMedia({audio: true})
    .then((stream) => {
        /* use the stream */
    })
    .catch((err) => {
        /* handle the error */
    });
Great! Now we can start adding our main functionality to the then function.
Our order of events should be
Start microphone stream
connect microphone stream to the analyser node
set a timed callback to
get the latest time domain audio data from the Analyser Node
get the autocorrolation derived pitch estimate
update html element with the value
On top of that, add a log of the error from the catch method.
This can then all be wrapped into the startPitchDetection function, giving something like:
function startPitchDetection() {
    navigator.mediaDevices.getUserMedia({audio: true})
        .then((stream) => {
            microphoneStream = audioCtx.createMediaStreamSource(stream);
            microphoneStream.connect(analyserNode);

            audioData = new Float32Array(analyserNode.fftSize);
            corrolatedSignal = new Float32Array(analyserNode.fftSize);

            setInterval(() => {
                analyserNode.getFloatTimeDomainData(audioData);
                let pitch = getAutocorrolatedPitch();
                frequencyDisplayElement.innerHTML = `${pitch}`;
            }, 300);
        })
        .catch((err) => {
            console.log(err);
        });
}
The 300 ms update interval for setInterval is arbitrary. A little experimentation will dictate which interval is best for you. You may even wish to give the user control of this, but that is outside the scope of this question.
The next step is to define what getAutocorrolatedPitch() does, so let's break down what autocorrelation actually is.
Autocorrelation is the process of correlating a signal with a delayed copy of itself. Any point where the result goes from a positive rate of change to a negative rate of change is a local maximum. The number of samples from the start of the correlated signal to the first maximum should be the period of f0 in samples. We can continue to look for subsequent maxima and take an average, which should improve accuracy slightly. Some frequencies do not have a period of a whole number of samples; for instance, 440 Hz at a sample rate of 44100 Hz has a period of 100.227 samples. Taking a single maximum, we could technically never detect 440 Hz accurately: the result would always be either 441 Hz (44100/100) or about 436 Hz (44100/101).
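To make that resolution limit concrete, here is the arithmetic from the paragraph above as a small snippet:
const sampleRate = 44100;
const targetHz = 440;
const periodInSamples = sampleRate / targetHz;          // 100.227... samples per cycle
// The nearest whole-sample lags give the only frequencies a single maximum can report:
console.log(sampleRate / Math.floor(periodInSamples));  // 441 Hz
console.log(sampleRate / Math.ceil(periodInSamples));   // ~436.6 Hz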
For our autocorrelation function, we'll need
a count of how many maxima have been detected
the mean distance between maxima
Our function should first perform the autocorrelation, find the sample positions of the local maxima, and then calculate the mean distance between these maxima. This gives a function looking like:
function getAutocorrolatedPitch() {
    // First: autocorrelate the signal
    let maximaCount = 0;

    for (let l = 0; l < analyserNode.fftSize; l++) {
        corrolatedSignal[l] = 0;
        for (let i = 0; i < analyserNode.fftSize - l; i++) {
            corrolatedSignal[l] += audioData[i] * audioData[i + l];
        }
        if (l > 1) {
            if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                localMaxima[maximaCount] = (l - 1);
                maximaCount++;
                if (maximaCount >= localMaxima.length)
                    break;
            }
        }
    }

    // Second: find the average distance in samples between maxima
    let maximaMean = localMaxima[0];

    for (let i = 1; i < maximaCount; i++)
        maximaMean += localMaxima[i] - localMaxima[i - 1];

    maximaMean /= maximaCount;

    return audioCtx.sampleRate / maximaMean;
}
Problems
Once you have implemented this you may find there are actually a couple of problems.
The frequency result is a bit erratic
the display method is not intuitive for tuning purposes
The erratic result is down to the fact that autocorrelation by itself is not a perfect solution. You will need to experiment with filtering the signal first and with aggregating other methods. You could also try limiting the signal, or only analysing it when it is above a certain threshold. You could also increase the rate at which you perform the detection and average out the results; a rough sketch of these ideas follows after the next paragraph.
Secondly, the method for display is limited. Musicians would not appreciate a simple numerical result; some kind of graphical feedback would be more intuitive. Again, that is outside the scope of this question.
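As a rough sketch of the thresholding and averaging ideas mentioned for the first problem, something along these lines could be used as the setInterval callback instead of the inline arrow function (the threshold value, history length and helper names here are illustrative, not tuned):
const recentPitches = [];

function getRms(samples) {
    let sum = 0;
    for (let i = 0; i < samples.length; i++)
        sum += samples[i] * samples[i];
    return Math.sqrt(sum / samples.length);
}

function updatePitchDisplay() {
    analyserNode.getFloatTimeDomainData(audioData);
    if (getRms(audioData) < 0.01)      // skip quiet frames instead of reporting noise
        return;
    recentPitches.push(getAutocorrolatedPitch());
    if (recentPitches.length > 5)      // keep a short rolling history
        recentPitches.shift();
    const mean = recentPitches.reduce((a, b) => a + b, 0) / recentPitches.length;
    frequencyDisplayElement.innerHTML = `${mean.toFixed(1)}`;
}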
Full page and script
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pitch Detection</title>
</head>
<body>
    <h1>Frequency (Hz)</h1>
    <h2 id="frequency">0.0</h2>
    <div>
        <button onclick="startPitchDetection()">
            Start Pitch Detection
        </button>
    </div>
    <script>
        let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
        let microphoneStream = null;
        let analyserNode = audioCtx.createAnalyser();
        let audioData = new Float32Array(analyserNode.fftSize);
        let corrolatedSignal = new Float32Array(analyserNode.fftSize);
        let localMaxima = new Array(10);
        const frequencyDisplayElement = document.querySelector('#frequency');

        function startPitchDetection() {
            navigator.mediaDevices.getUserMedia({audio: true})
                .then((stream) => {
                    microphoneStream = audioCtx.createMediaStreamSource(stream);
                    microphoneStream.connect(analyserNode);

                    audioData = new Float32Array(analyserNode.fftSize);
                    corrolatedSignal = new Float32Array(analyserNode.fftSize);

                    setInterval(() => {
                        analyserNode.getFloatTimeDomainData(audioData);
                        let pitch = getAutocorrolatedPitch();
                        frequencyDisplayElement.innerHTML = `${pitch}`;
                    }, 300);
                })
                .catch((err) => {
                    console.log(err);
                });
        }

        function getAutocorrolatedPitch() {
            // First: autocorrelate the signal
            let maximaCount = 0;

            for (let l = 0; l < analyserNode.fftSize; l++) {
                corrolatedSignal[l] = 0;
                for (let i = 0; i < analyserNode.fftSize - l; i++) {
                    corrolatedSignal[l] += audioData[i] * audioData[i + l];
                }
                if (l > 1) {
                    if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                        && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                        localMaxima[maximaCount] = (l - 1);
                        maximaCount++;
                        if (maximaCount >= localMaxima.length)
                            break;
                    }
                }
            }

            // Second: find the average distance in samples between maxima
            let maximaMean = localMaxima[0];

            for (let i = 1; i < maximaCount; i++)
                maximaMean += localMaxima[i] - localMaxima[i - 1];

            maximaMean /= maximaCount;

            return audioCtx.sampleRate / maximaMean;
        }
    </script>
</body>
</html>
Question 2: Detecting multiple notes
At this point I think we can all agree that this answer has gotten a little out of hand. So far we've only covered a single method of pitch detection. See Refs [2, 3, 4] for some suggested algorithms for multiple-f0 detection.
In essence, this problem would come down to detecting all f0s and looking up the resulting notes against a dictionary of chords. For that, at least some of the work should be done on your part. Any questions about the DSP side should probably be pointed toward https://dsp.stackexchange.com, where you will be spoiled for choice on questions regarding pitch detection algorithms.
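Purely as an illustration of the dictionary-lookup idea (the note naming and the chord table here are my own simplification, not part of any of the referenced papers):
const chordDictionary = {
    'C,E,G': 'C major',
    'A,C,E': 'A minor'
    // ...extend with the shapes you care about
};

function frequencyToNoteName(f0) {
    const names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
    const midi = Math.round(69 + 12 * Math.log2(f0 / 440)); // nearest MIDI note number
    return names[midi % 12];
}

function identifyChord(detectedF0s) {
    const notes = [...new Set(detectedF0s.map(frequencyToNoteName))].sort();
    return chordDictionary[notes.join(',')] || 'unknown';
}

// identifyChord([261.6, 329.6, 392.0]) -> 'C major'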
References
A. v. Knesebeck and U. Zölzer, "Comparison of pitch trackers for real-time guitar effects", in Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010.
A. P. Klapuri, "A perceptually motivated multiple-F0 estimation method," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., 2005, pp. 291-294, doi: 10.1109/ASPAA.2005.1540227.
A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," in IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804-816, Nov. 2003, doi: 10.1109/TSA.2003.815516.
A. P. Klapuri, "Multipitch estimation and sound separation by the spectral smoothness principle," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, pp. 3381-3384 vol.5, doi: 10.1109/ICASSP.2001.940384.
I suppose it'll depend on how you're building your application. It's hard to help without more detail on the specs, but here are a few options for you.
There are a few stream options, for example:
mic
Or if you're using React:
react-mic
Or if you want to go really basic with some vanilla JS:
Web Fundamentals: Recording Audio

Limited playback rate using the Web Audio API

I'm trying to use the Web Audio API to create a simple synthesizer, but I'm having a problem with the playback rate of an AudioBuffer: its value seems to be capped when I use a somewhat high value.
Here's my code sample. I first create a sample buffer containing a simple waveform with as many samples as the sample rate (this creates a 1 Hz waveform if read directly). I then create an AudioBuffer that will contain the samples. Finally, I create an AudioBufferSourceNode that will play the buffer in a loop at some playback rate (which translates to an audible frequency).
const audioContext = new window.AudioContext();
const buffer = new Float32Array(audioContext.sampleRate);
for (let i = 0; i < buffer.length; i++) {
    buffer[i] = Math.sin(2 * Math.PI * i / audioContext.sampleRate);
}

const audioBuffer = new AudioBuffer({
    length: buffer.length,
    numberOfChannels: 1,
    sampleRate: audioContext.sampleRate
});
audioBuffer.copyToChannel(buffer, 0);

const sourceNode = new AudioBufferSourceNode(audioContext, {
    buffer: audioBuffer,
    loop: true,
    playbackRate: 440
});
sourceNode.connect(audioContext.destination);
sourceNode.start();
In this scenario, playbackRate seems to be capped at 1024. Any value higher than this has no audible effect. I checked its maximum value (sourceNode.playbackRate.maxValue) and it's around 3.4e+38, which is way above what I'm trying to achieve.
I'm wondering if there's something I'm missing or if my understanding of this API is wrong. I'm using the latest version of Google Chrome, in case that changes anything.
This is a bug in Chrome: https://crbug.com/712256. I believe Firefox may implement this correctly (but I did not check).
Note, also, that Chrome uses very simple interpolation. The quality of the output degrades quite a bit when the playbackRate is very different from 1.
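Until that is fixed, one way to sidestep both the clamp and the interpolation quality issue is to generate the waveform at (or near) the target frequency directly in the buffer, keeping playbackRate close to 1. A minimal sketch reusing the question's variables:
// Write a 440 Hz sine into the buffer itself and play it back at rate 1,
// rather than asking the source node for an extreme playbackRate.
const frequency = 440;
for (let i = 0; i < buffer.length; i++) {
    buffer[i] = Math.sin(2 * Math.PI * frequency * i / audioContext.sampleRate);
}
audioBuffer.copyToChannel(buffer, 0);

const source = new AudioBufferSourceNode(audioContext, {
    buffer: audioBuffer,
    loop: true,
    playbackRate: 1   // small adjustments around 1 are still fine for detuning
});
source.connect(audioContext.destination);
source.start();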

Sound fades out, but does not fade in -- why?

I think I understand the main concepts of the Web Audio API, as well as how sound works in general. Even though I managed to make the sound "fade out", I cannot figure out why it is not "fading in" in the following snippet, which I wrote to reproduce the problem:
(function () {
    'use strict';

    var context = new AudioContext(),
        wave = context.createOscillator(),
        gain = context.createGain(),
        ZERO = 0.000001;

    wave.connect(gain);
    gain.connect(context.destination);

    wave.type = 'sine';
    wave.frequency.value = 200;
    gain.gain.value = ZERO;

    wave.start(context.currentTime);

    gain.gain.exponentialRampToValueAtTime(1.00, 1.0);
    gain.gain.exponentialRampToValueAtTime(ZERO, 3.0);
})();
NOTE: The same problem appeared on Firefox (Linux) and Chrome (Windows) too
Replacing your gain.gain.value = ZERO line with:
gain.gain.setValueAtTime(ZERO, 0);
will fix the problem.
The rationale is in the actual specification of the exponentialRampToValueAtTime() function:
Schedules an exponential continuous change in parameter value from the previous scheduled parameter value to the given value
So, if there's no previous scheduled parameter value (only a fixed value) then the function cannot interpolate. The same applies to the linearRampToValueAtTime function.
This may also be useful from the MDN documentation:
AudioParam.value ... Though it can be set, any modifications happening while there are automation events scheduled — that is events scheduled using the methods of the AudioParam — are ignored, without raising any exception
You need to
gain.gain.setValueAtTime(ZERO, 0);
because just setting
gain.gain.value = ZERO;
does not set a schedule point in the AudioParam scheduler, so it's scheduling from the last known schedule point (which is the default value of 1 at time=0). Mixing setting .value and scheduling does not tend to work well; I've had an article 75% written about this for a long time, and just haven't gotten it released.
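For completeness, a minimal corrected version of the original snippet with the scheduled start value (here written relative to context.currentTime, which amounts to the same thing when the context has just been created):
(function () {
    'use strict';

    var context = new AudioContext(),
        wave = context.createOscillator(),
        gain = context.createGain(),
        ZERO = 0.000001,
        now = context.currentTime;

    wave.connect(gain);
    gain.connect(context.destination);

    wave.type = 'sine';
    wave.frequency.value = 200;

    // Schedule the starting value instead of assigning gain.gain.value directly,
    // so the ramps have a previous automation event to interpolate from.
    gain.gain.setValueAtTime(ZERO, now);
    gain.gain.exponentialRampToValueAtTime(1.00, now + 1.0); // fade in
    gain.gain.exponentialRampToValueAtTime(ZERO, now + 3.0); // fade out

    wave.start(now);
})();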

Recorder.js calculate and offset recording for latency

I'm using Recorder.js to record audio from Google Chrome desktop and mobile browsers. In my specific use case I need to record exactly 3 seconds of audio, starting and ending at a specific time.
Now I know that when recording audio, your soundcard cannot work in realtime due to hardware delays, so there is always a memory buffer which allows you to keep up recording without hearing jumps/stutters.
Recorder.js allows you to configure the bufferLen variable exactly for this, while sampleRate is taken automatically from the audio context object. Here is a simplified version of how it works:
var context = new AudioContext();
var recorder;

navigator.getUserMedia({audio: true}, function(stream) {
    recorder = new Recorder(context.createMediaStreamSource(stream), {
        bufferLen: 4096
    });
});

function recordLoop() {
    recorder.record();
    window.setTimeout(function () {
        recorder.stop();
    }, 3000);
}
The issue I'm facing is that record() does not offset for the buffer latency, and neither does stop(). So instead of getting a three-second sound, it's 2.97 seconds and the start is cut off.
This means my recordings don't start in the same place, and when I loop them, the loops are different lengths depending on your device's latency!
There are two potential solutions I see here:
Adjust Recorder.js code to offset the buffer automatically against your start/stop times (maybe add new startSync/stopSync functions)
Calculate the latency and create two offset timers to start and stop Recorder.js at the correct points in time.
I'm trying solution 2, because solution 1 requires knowledge of buffer arrays which I don't have :( I believe the calculation for latency is:
var bufferSize = 4096;
var sampleRate = 44100;
var latency = (bufferSize / sampleRate) * 2; // 0.18575963718820862 secs
However when I run these calculations in a real test I get:
var duration = 2.972154195011338 secs
var latency = 0.18575963718820862 secs
var total = duration + latency // 3.1579138321995464 secs
Something isn't right; it doesn't add up to 3 seconds, and it's beginning to confuse me now! I've created a working fork of the Recorder.js demo with a log:
http://kmturley.github.io/Recorderjs/
Any help would be greatly appreciated. Thanks!
I'm a bit confused by your concern for the latency. Yes, it's true that the minimum possible latency is going to be related to the length of the buffer, but there are many other latencies involved. In any case, the latency has nothing to do with the recording duration, which seems to be what your question is really about.
If you want to record an exactly 3-second-long buffer at 44100 Hz, that is 44100*3 = 132,300 samples. The buffer size is 4096 samples, and the system is only going to record a whole multiple of that number. Given that, the closest you are going to get is to record either 32 or 33 complete buffers, which gives either 131,072 samples (2.97 seconds) or 135,168 samples (3.065 seconds).
You have a couple of options here.
Choose a buffer length that evenly divides the sample rate, e.g. 11025. You can then record exactly 12 buffers.
Record slightly longer than the 3.0 seconds you need and then throw the extra 2868 samples away.
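A rough sketch of the second option, assuming you have access to the recorded samples as a Float32Array (where exactly you trim depends on how you export from Recorder.js, so treat the helper below as illustrative):
// Record slightly more than 3 s, then keep exactly sampleRate * seconds samples.
function trimToSeconds(samples, sampleRate, seconds) {
    var wanted = Math.round(sampleRate * seconds); // 132300 samples for 3 s at 44100 Hz
    return samples.subarray(0, Math.min(wanted, samples.length));
}

// e.g. trimToSeconds(recordedSamples, 44100, 3);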

Calculating the AnalyserNode's smoothingTimeConstant

I am using the Web Audio API to display a visualization of the audio being played. I have an <audio> element that controls the playback; I then hook it up to the Web Audio API by creating a MediaElementSource node from the <audio> element. That is connected to a GainNode and an AnalyserNode. The AnalyserNode's smoothingTimeConstant is set to 0.6, and the GainNode is connected to the AudioContext.destination.
I then call my audio processing function: onAudioProcess(). That function will continually call itself using:
audioAnimation = requestAnimationFrame(onAudioProcess);
The function uses the AnalyserNode to getByteFrequencyData from the audio, then loops through the (now populated) Uint8Array and draws each frequency magnitude on the <canvas> element's 2d context. This all works fine.
My issue is that when you pause the <audio> element, my onAudioProcess function continues to loop (by requesting animation frames on itself), which needlessly eats up CPU cycles. I can call cancelAnimationFrame(audioAnimation), but that leaves the last-drawn frequencies on the canvas. I can resolve that by also calling clearRect on the canvas's 2d context, but it looks very odd compared to just letting the audio processing loop continue (which slowly lowers each bar to the bottom of the canvas because of the smoothingTimeConstant).
So what I ended up doing was setting a timeout when the <audio> is paused, prior to canceling the animation frame. Doing this I was able to save CPU cycles when no audio was playing AND I was still able to maintain the smooth lowering of the frequency bars drawn on the <canvas>.
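In code, that pause handling looks roughly like this; fadeOutMs is the value the question below asks how to calculate, and the element and variable names are illustrative:
// Let the smoothed bars decay for a while after pausing, then stop the draw loop.
audioElement.addEventListener('pause', function () {
    setTimeout(function () {
        cancelAnimationFrame(audioAnimation);
    }, fadeOutMs);
});

audioElement.addEventListener('play', function () {
    audioAnimation = requestAnimationFrame(onAudioProcess);
});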
MY QUESTION: How do I accurately calculate the number of milliseconds it takes for a frequency magnitude of 255 to hit 0 (the range is 0-255) based on the AnalyserNode's smoothingTimeConstant value so that I can properly set the timeout to cancel the animation frame?
Based on my reading of the spec, I'd think you'd figure it out like this:
var val = 255,
    smooth = 0.6,
    sampl = 48000,
    i = 0,
    ms;

for ( ; val > 0.001; i++) {
    val = (val + val * smooth) / 2;
}

ms = (i / sampl * 1000);
The problem is that with this kind of averaging, you never really get all the way down to zero - so the loop condition is kind of arbitrary. You can make that number smaller and as you'd expect, the value for ms gets larger.
Anyway, I could be completely off-base here. But a quick look through the actual Chromium source code seems to sort of confirm that this is how it works. Although I'll be the first to admit my C++ is pretty bad.
