Limited playback rate using the Web Audio API - javascript

I'm trying to use the Web Audio API to create a simple synthesizer, but I'm having a problem with the playback rate of an AudioBuffer: its value seems to be capped when I set it to a fairly high value.
Here's my code sample. I first create a sample buffer containing a simple waveform with as many samples as the sample rate (so, read directly, it would produce a 1 Hz waveform). I then create an AudioBuffer to hold the samples. Finally, I create an AudioBufferSourceNode that plays the buffer in a loop at some playback rate (which translates to an audible frequency).
const audioContext = new window.AudioContext();
const buffer = new Float32Array(audioContext.sampleRate);
for (let i = 0; i < buffer.length; i++) {
    buffer[i] = Math.sin(2 * Math.PI * i / audioContext.sampleRate);
}
const audioBuffer = new AudioBuffer({
    length: buffer.length,
    numberOfChannels: 1,
    sampleRate: audioContext.sampleRate
});
audioBuffer.copyToChannel(buffer, 0);
const sourceNode = new AudioBufferSourceNode(audioContext, {
    buffer: audioBuffer,
    loop: true,
    playbackRate: 440
});
sourceNode.connect(audioContext.destination);
sourceNode.start();
In this scenario, playbackRate seems to be capped at 1024: any value higher than this has no audible effect. I checked its maximum value (sourceNode.playbackRate.maxValue) and it's around 3.4e+38, which is way above what I'm trying to reach.
I'm wondering if there's something I'm missing or if my understanding of this API is wrong. I'm using the latest version of Google Chrome, if that makes any difference.

This is a bug in Chrome: https://crbug.com/712256. I believe Firefox may implement this correctly (but I did not check).
Note, also, that Chrome uses very simple interpolation. The quality of the output degrades quite a bit when the playbackRate is very different from 1.
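A possible workaround, if you are stuck on an affected Chrome version, is to bake several cycles of the waveform into the buffer so that the playbackRate you need stays well below the apparent cap. A minimal sketch of that idea (the cycle count of 64 is an arbitrary choice):

// Workaround sketch: store N cycles in the one-second buffer,
// so the audible frequency is N * playbackRate and the rate stays small.
const cyclesPerBuffer = 64; // arbitrary; pick anything that keeps playbackRate low
const audioContext = new AudioContext();
const buffer = new Float32Array(audioContext.sampleRate);
for (let i = 0; i < buffer.length; i++) {
    buffer[i] = Math.sin(2 * Math.PI * cyclesPerBuffer * i / audioContext.sampleRate);
}
const audioBuffer = new AudioBuffer({
    length: buffer.length,
    numberOfChannels: 1,
    sampleRate: audioContext.sampleRate
});
audioBuffer.copyToChannel(buffer, 0);
const sourceNode = new AudioBufferSourceNode(audioContext, {
    buffer: audioBuffer,
    loop: true,
    playbackRate: 440 / cyclesPerBuffer // 6.875, well below the reported cap
});
sourceNode.connect(audioContext.destination);
sourceNode.start();

Because the buffer contains a whole number of cycles, it still loops seamlessly, and playback quality should also suffer less since the rate is much closer to 1.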

Related

How do I get the audio frequency from my mic using JavaScript?

I need to create something like a guitar tuner that recognizes sound frequencies and determines which chord I am actually playing. It's similar to this guitar tuner that I found online:
https://musicjungle.com.br/afinador-online
But I can't figure out how it works because of the webpack files. I want to make this tool app backendless. Does anyone have a clue how to do this on the front end only?
I found some old pieces of code that don't work together; I need fresh ideas.
There are quite a few problems to unpack here, some of which will require a bit more information as to the application. Hopefully the sheer size of this task will become apparent as this answer progresses.
As it stands, there are two problems here:
I need to create something like a guitar tuner...
1. How do you detect the fundamental pitch of a guitar note and feed that information back to the user in the browser?
and
...that recognizes sound frequencies and determines which chord I am actually playing.
2. How do you detect which chord a guitar is playing?
This second question is definitely not a trivial one, but we'll come to it in turn. It is not really a programming question, but rather a DSP question.
Question 1: Pitch Detection in Browser
Breakdown
If you wish to detect the pitch of a note in the browser, there are a couple of sub-problems that should be split up. Shooting from the hip, we have the following JavaScript/browser problems:
how to get microphone permission?
how to tap the microphone for sample data?
how to start an audio context?
how to display a value?
how to update a value regularly?
how to filter audio data?
how to perform pitch detection?
how to get pitch via autocorrelation?
how to get pitch via zero-crossing?
how to get pitch from frequency domain?
how to perform a fourier transform?
This is not an exhaustive list, but it should constitute the bulk of the overall problem.
There is no Minimal, Reproducible Example, so none of the above can be assumed.
Implementation
A basic implementation would consist of a numeric representation of a single fundamental frequency (f0) using an autocorrelation method, as outlined in the A. v. Knesebeck and U. Zölzer paper [1].
There are other approaches that mix and match filtering and pitch detection algorithms, which I believe are far outside the scope of a reasonable answer.
NOTE: The Web Audio API is still not implemented equally across all browsers. You should check each of the major browsers and make accommodations in your program. The following was tested in Google Chrome, so your mileage may (and likely will) vary in other browsers.
HTML
Our page should include
an element to display frequency
an element to initiate pitch detection
A more rounded interface would likely split the operations of
asking for microphone permission
starting the microphone stream
processing the microphone stream
into separate interface elements, but for brevity they will be wrapped into a single element. This gives us a basic HTML page of
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pitch Detection</title>
</head>
<body>
    <h1>Frequency (Hz)</h1>
    <h2 id="frequency">0.0</h2>
    <div>
        <button onclick="startPitchDetection()">
            Start Pitch Detection
        </button>
    </div>
</body>
</html>
We are jumping the gun slightly with <button onclick="startPitchDetection()">; we will wrap the whole operation up in a single function called startPitchDetection.
Palette of variables
For an autocorrelation pitch detection approach, our palette of variables will need to include:
the Audio context
the microphone stream
an Analyser Node
an array for audio data
an array for the correlated signal
an array for correlated-signal maxima
a DOM reference to the frequency
giving us something like
let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
let microphoneStream = null;
let analyserNode = audioCtx.createAnalyser();
let audioData = new Float32Array(analyserNode.fftSize);
let corrolatedSignal = new Float32Array(analyserNode.fftSize);
let localMaxima = new Array(10);
const frequencyDisplayElement = document.querySelector('#frequency');
Some values are left null, as they will not be known until the microphone stream has been activated. The 10 in let localMaxima = new Array(10); is a little arbitrary. This array will store the distances in samples between consecutive maxima of the correlated signal.
Main script
Our <button> element has an onclick function of startPitchDetection, so that will be required. We will also need
an update function (for updating the display)
an autocorrelation function that returns a pitch
However, the first thing we have to do is ask for permission to use the microphone. To achieve this we use navigator.mediaDevices.getUserMedia, which will return a Promise. Embellishing on what is outlined in the MDN documentation, this gives us something roughly like:
navigator.mediaDevices.getUserMedia({ audio: true })
    .then((stream) => {
        /* use the stream */
    })
    .catch((err) => {
        /* handle the error */
    });
Great! Now we can start adding our main functionality to the then function.
Our order of events should be
start the microphone stream
connect the microphone stream to the analyser node
set a timed callback to
get the latest time-domain audio data from the Analyser Node
get the autocorrelation-derived pitch estimate
update the HTML element with the value
On top of that, add a log of the error from the catch method.
This can then all be wrapped into the startPitchDetection function, giving something like:
function startPitchDetection()
{
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then((stream) =>
        {
            microphoneStream = audioCtx.createMediaStreamSource(stream);
            microphoneStream.connect(analyserNode);
            audioData = new Float32Array(analyserNode.fftSize);
            corrolatedSignal = new Float32Array(analyserNode.fftSize);
            setInterval(() => {
                analyserNode.getFloatTimeDomainData(audioData);
                let pitch = getAutocorrolatedPitch();
                frequencyDisplayElement.innerHTML = `${pitch}`;
            }, 300);
        })
        .catch((err) =>
        {
            console.log(err);
        });
}
The update interval of 300 ms for setInterval is arbitrary; a little experimentation will dictate which interval is best for you. You may even wish to give the user control of this, but that is outside the scope of this question.
The next step is to define what getAutocorrolatedPitch() does, so let's break down what autocorrelation actually is.
Autocorrelation is the process of correlating a signal with a delayed copy of itself. Any point where the result goes from a positive rate of change to a negative rate of change is a local maximum. The number of samples from the start of the correlated signal to the first maximum should be the period of f0 in samples. We can continue to look for subsequent maxima and take an average, which should improve accuracy slightly. Some frequencies do not have a period of a whole number of samples; for instance, 440 Hz at a sample rate of 44100 Hz has a period of about 100.227 samples. By taking a single maximum we could technically never detect 440 Hz exactly: the result would always be either 441 Hz (44100/100) or about 436.6 Hz (44100/101).
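A quick numeric check of that whole-sample limitation:

const sampleRate = 44100;
console.log(sampleRate / 440); // 100.227... samples: the true period of 440 Hz
console.log(sampleRate / 100); // 441 Hz: period rounded down to 100 whole samples
console.log(sampleRate / 101); // ~436.6 Hz: period rounded up to 101 whole samples
// Averaging over several maxima reduces, but does not remove, this quantisation error.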
For our autocorrelation function, we'll need
a count of how many maxima have been detected
the mean distance between maxima
Our function should first perform the autocorrelation, find the sample positions of the local maxima, and then calculate the mean distance between these maxima. This gives a function looking like:
function getAutocorrolatedPitch()
{
    // First: autocorrolate the signal
    let maximaCount = 0;

    for (let l = 0; l < analyserNode.fftSize; l++) {
        corrolatedSignal[l] = 0;
        for (let i = 0; i < analyserNode.fftSize - l; i++) {
            corrolatedSignal[l] += audioData[i] * audioData[i + l];
        }
        if (l > 1) {
            if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                localMaxima[maximaCount] = (l - 1);
                maximaCount++;
                if ((maximaCount >= localMaxima.length))
                    break;
            }
        }
    }

    // Second: find the average distance in samples between maxima
    let maximaMean = localMaxima[0];

    for (let i = 1; i < maximaCount; i++)
        maximaMean += localMaxima[i] - localMaxima[i - 1];

    maximaMean /= maximaCount;

    return audioCtx.sampleRate / maximaMean;
}
Problems
Once you have implemented this, you may find there are actually a couple of problems:
The frequency result is a bit erratic
the display method is not intuitive for tuning purposes
The erratic result is down to the fact that autocorrelation by itself is not a perfect solution. You will need to experiment with filtering the signal first and aggregating other methods. You could also try limiting the signal, or only analysing it when it is above a certain threshold, as in the sketch below. You could also increase the rate at which you perform the detection and average out the results.
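As one way of realising the threshold idea, here is a minimal sketch (the threshold value is arbitrary and the helper name is my own; tune both to your input level):

function frameIsLoudEnough(samples, threshold = 0.01) {
    // Root-mean-square level of the current analysis frame
    let sumOfSquares = 0;
    for (let i = 0; i < samples.length; i++) {
        sumOfSquares += samples[i] * samples[i];
    }
    return Math.sqrt(sumOfSquares / samples.length) > threshold;
}

// Inside the setInterval callback:
// analyserNode.getFloatTimeDomainData(audioData);
// if (frameIsLoudEnough(audioData)) {
//     frequencyDisplayElement.innerHTML = `${getAutocorrolatedPitch()}`;
// }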
Secondly, the method of display is limited. A musician would not appreciate a simple numerical readout; some kind of graphical feedback would be more intuitive. Again, that is outside the scope of the question.
Full page and script
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pitch Detection</title>
</head>
<body>
    <h1>Frequency (Hz)</h1>
    <h2 id="frequency">0.0</h2>
    <div>
        <button onclick="startPitchDetection()">
            Start Pitch Detection
        </button>
    </div>
    <script>
        let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
        let microphoneStream = null;
        let analyserNode = audioCtx.createAnalyser();
        let audioData = new Float32Array(analyserNode.fftSize);
        let corrolatedSignal = new Float32Array(analyserNode.fftSize);
        let localMaxima = new Array(10);
        const frequencyDisplayElement = document.querySelector('#frequency');

        function startPitchDetection()
        {
            navigator.mediaDevices.getUserMedia({ audio: true })
                .then((stream) =>
                {
                    microphoneStream = audioCtx.createMediaStreamSource(stream);
                    microphoneStream.connect(analyserNode);
                    audioData = new Float32Array(analyserNode.fftSize);
                    corrolatedSignal = new Float32Array(analyserNode.fftSize);
                    setInterval(() => {
                        analyserNode.getFloatTimeDomainData(audioData);
                        let pitch = getAutocorrolatedPitch();
                        frequencyDisplayElement.innerHTML = `${pitch}`;
                    }, 300);
                })
                .catch((err) =>
                {
                    console.log(err);
                });
        }

        function getAutocorrolatedPitch()
        {
            // First: autocorrolate the signal
            let maximaCount = 0;

            for (let l = 0; l < analyserNode.fftSize; l++) {
                corrolatedSignal[l] = 0;
                for (let i = 0; i < analyserNode.fftSize - l; i++) {
                    corrolatedSignal[l] += audioData[i] * audioData[i + l];
                }
                if (l > 1) {
                    if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                        && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                        localMaxima[maximaCount] = (l - 1);
                        maximaCount++;
                        if ((maximaCount >= localMaxima.length))
                            break;
                    }
                }
            }

            // Second: find the average distance in samples between maxima
            let maximaMean = localMaxima[0];

            for (let i = 1; i < maximaCount; i++)
                maximaMean += localMaxima[i] - localMaxima[i - 1];

            maximaMean /= maximaCount;

            return audioCtx.sampleRate / maximaMean;
        }
    </script>
</body>
</html>
Question 2: Detecting multiple notes
At this point I think we can all agree that this answer has gotten a little out of hand. So far we've just covered a single method of pitch detection. See Ref [2, 3, 4] for some suggestions of algorithms for multiple f0 detection.
In essence, this problem would come down to detecting all of the f0s present and looking the resulting notes up against a dictionary of chords (roughly sketched below). For that, there should at least be a little work done on your part. Any questions about the DSP side should probably be directed toward https://dsp.stackexchange.com, where you will be spoiled for choice on questions regarding pitch detection algorithms.
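For illustration only, the dictionary lookup itself could be as simple as the following sketch (the note names, chord names, and helper function are hypothetical; the hard part is reliably detecting the f0s in the first place):

// Keys are sorted, comma-joined note names; values are chord labels.
const chordDictionary = {
    'A,C,E': 'A minor',
    'B,E,G#': 'E major',
    'C,E,G': 'C major'
    // ...extend as needed
};

function lookupChord(detectedNotes) {
    // detectedNotes: note names derived from the detected fundamentals, e.g. ['E', 'C', 'G']
    const key = [...detectedNotes].sort().join(',');
    return chordDictionary[key] || 'unknown';
}

console.log(lookupChord(['E', 'C', 'G'])); // "C major"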
References
[1] A. v. Knesebeck and U. Zölzer, "Comparison of pitch trackers for real-time guitar effects", in Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010.
[2] A. P. Klapuri, "A perceptually motivated multiple-F0 estimation method," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 291-294, doi: 10.1109/ASPAA.2005.1540227.
[3] A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804-816, Nov. 2003, doi: 10.1109/TSA.2003.815516.
[4] A. P. Klapuri, "Multipitch estimation and sound separation by the spectral smoothness principle," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001, pp. 3381-3384 vol. 5, doi: 10.1109/ICASSP.2001.940384.
I suppose it'll depend on how you're building your application; it's hard to help without much detail on the specs. Still, here are a few options for you.
There are a few stream options, for example:
mic
Or, if you're using React:
react-mic
Or, if you want to go really basic with some vanilla JS:
Web Fundamentals: Recording Audio

WebAudio dB visualization not reflecting frequency bands as expected

I've built a system-audio setup with WebAudio in an Angular component. It works well, only the bands do not seem to reflect frequency accurately.
Here is a test with high, mid, and low tones.
I've gotten pretty far with the native API, accessing a media stream, etc., but it's not as helpful a utility as I thought it would be...
Question:
How would we get the most accurate frequency decibel data?
All sounds seem to be focused in the first 3 bands.
Here is the method that visualizes the media stream (full code on GitHub):
private repeater() {
    this._AFID = requestAnimationFrame(() => this.frameLooper());

    // how many values from analyser (the "buffer" size)
    this._fbc = this._analyser.frequencyBinCount;

    // frequency data is integers on a scale from 0 to 255
    this._data = new Uint8Array(this._analyser.frequencyBinCount);
    this._analyser.getByteFrequencyData(this._data);

    let bandsTemp = [];

    // calculate the height of each band element using frequency data
    for (var i = 0; i < this._fbc; i++) {
        bandsTemp.push({ height: this._data[i] });
    }

    this.bands = bandsTemp;
}
Boris Smus' Web Audio API book says:
If, however, we want to perform a comprehensive analysis of the whole
audio buffer, we should look to other methods...
Perhaps this method is as good as it gets. What is a better method for more functional frequency analysis?
Thanks for the example at https://stackblitz.com/edit/angular-mediastream-device?file=src%2Fapp%2Fsys-sound%2Fsys-sound.component.ts. You're right that it doesn't work in Chrome, but if I use the link to open it in a new window, everything works.
I think you're computing the labels for the graph incorrectly. I'm assuming they're supposed to represent the frequency of each band; if not, then this answer is wrong.
You have fqRange = sampleRate / bands. Let's assume that sampleRate = 48000 (to keep the numbers simple) and bands = 16. Then fqRange = 3000. I think you really want either sampleRate / 2 / bands or sampleRate / fftSize, which is the same thing.
So each of the frequency bins is 1500 Hz wide, and your labels should be 1500 * k for k = 0 to 15. (There is more than one way to label these, but this is the easiest.) This covers the range from 0 to 24000 Hz.
When I play a 12 kHz tone, I see the peak is around 1552 in your code, but with the new labeling it falls in the 8th bin, so 1500 * 8 = 12000. (Well, there are some differences: my sampleRate is actually 44.1 kHz, so the numbers computed above will be different.)
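A sketch of that labeling (the function name is mine; bands is the number of bars you draw, 16 in the example above):

// Each bar covers (sampleRate / 2) / bands Hz, so label bar k with k * bandWidth.
function bandLabels(sampleRate, bands) {
    const bandWidth = (sampleRate / 2) / bands; // e.g. 48000 / 2 / 16 = 1500 Hz
    return Array.from({ length: bands }, (_, k) => Math.round(k * bandWidth));
}

console.log(bandLabels(48000, 16)); // [0, 1500, 3000, ..., 22500]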

Ramp WebAudio gain, while playback is stopping and starting?

I have an application where audio playback may be starting and stopping, and there are UI controls to ramp the gain to zero or nonzero values. I'm scheduling playback using AudioBufferSourceNode.start, and modulating gain using AudioParam.linearRampToValueAtTime. Playback is sometimes scheduled for a future time. The problem I'm having is that the ramp function only seems to set values when playback is currently happening; so, if we try to set the gain value e.g. between playback being scheduled and playback starting, the new values are lost. I could do a bunch of timing checks, and either ramp or directly set gain depending on whether playback is happening, but this can get messy, and I was wondering whether there was an alternative way to do this which would work independently of playback starting and stopping.
Here is a test case: we create a one-second noise buffer and play it, while also ramping gain to zero. If playback is scheduled for after the ramp has ended (one second), the gain value never gets set and remains at the default, nonzero value.
var ctx = new AudioContext();
var SR = ctx.sampleRate;
var buffer = ctx.createBuffer(1, SR, SR);
var channelData = buffer.getChannelData(0);
for (var i = 0; i < SR; i++) {
    channelData[i] = Math.random() * 2 - 1;
}
var bufferNode = ctx.createBufferSource();
var gainNode = ctx.createGain();
bufferNode.buffer = buffer;
bufferNode.connect(gainNode);
gainNode.connect(ctx.destination);

gainNode.gain.linearRampToValueAtTime(0, ctx.currentTime + 1);

// XXX if start_delay is greater than 1 (the ramp duration),
// the gain is never changed and remains at 1.
var start_delay = 0;
bufferNode.start(ctx.currentTime + start_delay);
Couple of issues here:
The behavior of linearRampToValueAtTime without a preceding automation event was originally not well specified. This has since been fixed in the WebAudio spec. I don't know the status of various browsers on this, but I think Chrome does this correctly.
The automation of the gain node should work as you expect. If it doesn't, file a bug against your browser vendor.
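In practice, the usual way to avoid depending on under-specified behaviour is to anchor the ramp with an explicit automation event first. A sketch using the variables from the snippet above:

// Pin the current gain value at "now" so the ramp has a defined starting event,
// independent of when the buffer source actually starts playing.
gainNode.gain.setValueAtTime(gainNode.gain.value, ctx.currentTime);
gainNode.gain.linearRampToValueAtTime(0, ctx.currentTime + 1);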
My problem seems to have been caused by a bug in Chrome, which was fixed in Chrome 57. Here is the bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=647974
The following comment has a workaround for older versions of Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=647974#c9

Recorder.js calculate and offset recording for latency

I'm using Recorder.js to record audio from Google Chrome desktop and mobile browsers. In my specific use case I need to record exactly 3 seconds of audio, starting and ending at a specific time.
Now I know that when recording audio, your soundcard cannot work in realtime due to hardware delays, so there is always a memory buffer which allows you to keep up recording without hearing jumps/stutters.
Recorder.js allows you to configure the bufferLen variable exactly for this, while sampleRate is taken automatically from the audio context object. Here is a simplified version of how it works:
var context = new AudioContext();
var recorder;

navigator.getUserMedia({ audio: true }, function (stream) {
    recorder = new Recorder(context.createMediaStreamSource(stream), {
        bufferLen: 4096
    });
});

function recordLoop() {
    recorder.record();
    window.setTimeout(function () {
        recorder.stop();
    }, 3000);
}
The issue I'm facing is that record() does not offset for the buffer latency, and neither does stop(). So instead of getting a three-second sound, it's 2.97 seconds and the start is cut off.
This means my recordings don't start in the same place, and when I loop them, the loops are different lengths depending on the device latency!
There are two potential solutions I see here:
Adjust Recorder.js code to offset the buffer automatically against your start/stop times (maybe add new startSync/stopSync functions)
Calculate the latency and create two offset timers to start and stop Recorder.js at the correct points in time.
I'm trying solution 2, because solution 1 requires knowledge of buffer arrays which I don't have :( I believe the calculation for latency is:
var bufferSize = 4096;
var sampleRate = 44100
var latency = (bufferSize / sampleRate) * 2; // 0.18575963718820862 secs
However when I run these calculations in a real test I get:
var duration = 2.972154195011338 secs
var latency = 0.18575963718820862 secs
var total = duration + latency // 3.1579138321995464 secs
Something isn't right; it doesn't add up to 3 seconds, and it's beginning to confuse me now! I've created a working fork of the Recorder.js demo with a log:
http://kmturley.github.io/Recorderjs/
Any help would be greatly appreciated. Thanks!
I'm a bit confused by your concern about the latency. Yes, it's true that the minimum possible latency is going to be related to the length of the buffer, but there are many other latencies involved. In any case, the latency has nothing to do with the recording duration, which seems to me to be what your question is about.
If you want to record an exactly 3-second buffer at 44100 Hz, that is 44100 * 3 = 132,300 samples. The buffer size is 4096 samples, and the system is only going to record a whole multiple of that number. Given that, the closest you are going to get is to record either 32 or 33 complete buffers, giving either 131,072 samples (2.97 seconds) or 135,168 samples (3.065 seconds).
You have a couple options here.
Choose a buffer length that evenly divides the sample rate. e.g. 11025. You can then record exactly 12 buffers.
Record slightly longer than the 3.0 seconds you need and then throw the extra 2868 samples away (see the sketch below).
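A sketch of option 2, assuming you end up with the recorded audio as a Float32Array of samples (Recorder.js exports buffers/WAV, so adapt accordingly; the helper name is mine):

function trimToDuration(samples, sampleRate, seconds) {
    // Keep exactly sampleRate * seconds samples and drop the rest,
    // e.g. keep 132,300 of the 135,168 recorded samples and discard the extra 2868.
    const wanted = Math.round(sampleRate * seconds);
    return samples.subarray(0, wanted);
}

// e.g. trimToDuration(recordedChannelData, 44100, 3.0);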

Which format is returned from the fft with WebAudioAPI

I visualized an audio file with the Web Audio API and with Dancer.js. Everything works, but the visualizations look very different. Can anybody help me find out why they look so different?
The Web-Audio-API code (fft.php, fft.js)
The dancer code (plugins/dancer.fft.js, js/playerFFT.js, fft.php)
The visualization for the Web Audio API is at:
http://multimediatechnology.at/~fhs32640/sem6/WebAudio/fft.html
The one for Dancer is at:
http://multimediatechnology.at/~fhs32640/sem6/Dancer/fft.php
The difference is in how the volumes at the various frequencies are 'found'. Your code uses the AnalyserNode, which takes the values and also applies some smoothing, so your graph looks nice. Dancer uses a ScriptProcessorNode. The script processor fires a callback every time a certain number of samples has gone through, and it passes that block to e.inputBuffer. It then just draws that 'raw' data, with no smoothing applied.
var
    buffers = [],
    channels = e.inputBuffer.numberOfChannels,
    resolution = SAMPLE_SIZE / channels,
    sum = function (prev, curr) {
        return prev[i] + curr[i];
    }, i;

for (i = channels; i--;) {
    buffers.push(e.inputBuffer.getChannelData(i));
}

for (i = 0; i < resolution; i++) {
    this.signal[i] = channels > 1 ? buffers.reduce(sum) / channels : buffers[0][i];
}

this.fft.forward(this.signal);
this.dancer.trigger('update');
This is the code that Dancer uses to get the signal strength at each frequency (it can be found in adapterWebAudio.js).
One is simply using the native frequency data provided by the Web Audio API via analyser.getByteFrequencyData().
The other does its own calculation using a ScriptProcessorNode: when that node's onaudioprocess event fires, it takes the channel data from the input buffer and converts it to a frequency-domain spectrum by computing a Discrete Fourier Transform of the signal with the Fast Fourier Transform algorithm.
idbehold's answer is partially correct (smoothing is getting applied), but a bigger issue is that the Web Audio code is using getByteFrequencyData instead of getFloatFrequencyData. The "byte" version does processing to maximize the byte's range: it spreads minDecibels to maxDecibels across the 0-255 byte range.
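For comparison, both calls read from the same AnalyserNode; a sketch (assuming analyser is an AnalyserNode connected to your source):

// Byte version: smoothed magnitudes remapped onto 0-255 between
// analyser.minDecibels and analyser.maxDecibels.
const byteData = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(byteData);

// Float version: the smoothed magnitudes themselves, in dB (typically negative values),
// with no remapping onto a byte range.
const floatData = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(floatData);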
