How do I get the audio frequency from my mic using JavaScript? - javascript

I need to create something like a guitar tuner that recognizes sound frequencies and determines which chord I am playing. It's similar to this guitar tuner that I found online:
https://musicjungle.com.br/afinador-online
But I can't figure out how it works because of the webpack files. I want to make this tool backendless. Does anyone have a clue how to do this only in the front end?
I found some old pieces of code that don't work together. I need fresh ideas.

There are quite a few problems to unpack here, some of which will require a bit more information as to the application. Hopefully the sheer size of this task will become apparent as this answer progresses.
As it stands, there are two problems here:
need to create a sort of like guitar tuner..
1. How do you detect the fundamental pitch of a guitar note and feed that information back to the user in the browser?
and
thats recognize the sound frequencies and determines in witch chord i am actually playing.
2. How do you detect which chord a guitar is playing?
This second question is definitely not a trivial one, but we'll come to it in turn. It is not a programming question, but rather a DSP question.
Question 1: Pitch Detection in Browser
Breakdown
If you wish to detect the pitch of a note in the browser, there are a number of sub-problems that should be split up. Shooting from the hip, we have the following JavaScript browser problems:
how to get microphone permission?
how to tap the microphone for sample data?
how to start an audio context?
how to display a value?
how to update a value regularly?
how to filter audio data?
how to perform pitch detection?
how to get pitch via autocorrelation?
how to get pitch via zero-crossing?
how to get pitch from the frequency domain?
how to perform a Fourier transform?
This is not an exhaustive list, but it should constitute the bulk of the overall problem.
There is no Minimal, Reproducible Example, so none of the above can be assumed.
Implementation
A basic implementation would consist of a numeric representation of a single fundamental frequency (f0) using an autocorrelation method outlined in the A. v. Knesebeck and U. Zölzer paper [1].
There are other approaches which mix and match filtering and pitch detection algorithms, which I believe are far outside the scope of a reasonable answer.
NOTE: The Web Audio API is still not equally implemented across all browsers. You should check each of the major browsers and make accommodations in your program. The following was tested in Google Chrome, so your mileage may (and likely will) vary in other browsers.
HTML
Our page should include
an element to display frequency
an element to initiate pitch detection
A more rounded interface would likely split the operations of
Asking for microphone permission
starting microphone stream
processing microphone stream
into separate interface elements, but for brevity they will be wrapped into a single element. This gives us a basic HTML page of
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Pitch Detection</title>
</head>
<body>
<h1>Frequency (Hz)</h1>
<h2 id="frequency">0.0</h2>
<div>
<button onclick="startPitchDetection()">
Start Pitch Detection
</button>
</div>
</body>
</html>
We are jumping the gun slightly with <button onclick="startPitchDetection()">. We will wrap up the operation in a single function called startPitchDetection.
Palette of variables
For an autocorrelation pitch detection approach our palette of variables will need to include:
the Audio context
the microphone stream
an Analyser Node
an array for audio data
an array for the correlated signal
an array for correlated signal maxima
a DOM reference to the frequency
giving us something like
let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
let microphoneStream = null;
let analyserNode = audioCtx.createAnalyser();
let audioData = new Float32Array(analyserNode.fftSize);
let corrolatedSignal = new Float32Array(analyserNode.fftSize);
let localMaxima = new Array(10);
const frequencyDisplayElement = document.querySelector('#frequency');
Some values are left null as they will not be known until the microphone stream has been activated. The 10 in let localMaxima = new Array(10); is a little arbitrary. This array will store the sample positions of consecutive maxima of the correlated signal, from which the distances between them are calculated.
Main script
Our <button> element has an onclick function of startPitchDetection, so that will be required. We will also need
an update function (for updating the display)
an autocorrolation function that returns a pitch
However, the first thing we have to do is ask for permission to use the microphone. To achieve this we use navigator.mediaDevices.getUserMedia, which will return a Promise. Building on what is outlined in the MDN documentation, this gives us something roughly like
navigator.mediaDevices.getUserMedia({audio: true})
.then((stream) => {
/* use the stream */
})
.catch((err) => {
/* handle the error */
});
Great! Now we can start adding our main functionality to the then function.
Our order of events should be
Start microphone stream
connect microphone stream to the analyser node
set a timed callback to
get the latest time domain audio data from the Analyser Node
get the autocorrolation derived pitch estimate
update html element with the value
On top of that, add a log of the error from the catch method.
This can then all be wrapped into the startPitchDetection function, giving something like:
function startPitchDetection()
{
    // A context created before a user gesture may start suspended (e.g. in Chrome)
    audioCtx.resume();
    navigator.mediaDevices.getUserMedia({audio: true})
        .then((stream) =>
        {
            // Route the microphone into the analyser; nothing goes to the speakers
            microphoneStream = audioCtx.createMediaStreamSource(stream);
            microphoneStream.connect(analyserNode);
            audioData = new Float32Array(analyserNode.fftSize);
            corrolatedSignal = new Float32Array(analyserNode.fftSize);
            setInterval(() => {
                analyserNode.getFloatTimeDomainData(audioData);
                let pitch = getAutocorrolatedPitch();
                frequencyDisplayElement.innerHTML = `${pitch}`;
            }, 300);
        })
        .catch((err) =>
        {
            console.log(err);
        });
}
The update interval of 300 ms passed to setInterval is arbitrary. A little experimentation will dictate which interval is best for you. You may even wish to give the user control of this, but that is outside the scope of this question.
The next step is to define what getAutocorrolatedPitch() does, so let's break down what autocorrelation is.
Autocorrelation is the process of correlating a signal with delayed copies of itself. Any point where the result goes from a positive rate of change to a negative rate of change is a local maximum. The number of samples between the start of the correlated signal and the first maximum should be the period in samples of f0. We can continue to look for subsequent maxima and take an average, which should improve accuracy slightly. Some frequencies do not have a period of a whole number of samples; for instance, 440 Hz at a sample rate of 44100 Hz has a period of 100.227 samples. Technically, we could never accurately detect this frequency of 440 Hz from a single maximum: the result would always be either 441 Hz (44100/100) or about 436.6 Hz (44100/101).
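To make the whole-sample limitation concrete, here is a small standalone sketch (not part of the original answer). It synthesises a 440 Hz sine at an assumed sample rate of 44100 Hz, autocorrelates it, and looks for the first peak after lag 0. The helper names autocorrelate and firstPeakLag are hypothetical.

```javascript
// Illustrative sketch: helper names and the 44100 Hz rate are assumptions.
const sampleRate = 44100;
const size = 2048;

// Synthesise a pure 440 Hz sine as a stand-in for microphone data.
const signal = new Float32Array(size);
for (let i = 0; i < size; i++) {
  signal[i] = Math.sin(2 * Math.PI * 440 * i / sampleRate);
}

// r[lag] = sum over i of x[i] * x[i + lag]
function autocorrelate(x, maxLag) {
  const r = new Float64Array(maxLag);
  for (let lag = 0; lag < maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i < x.length - lag; i++) sum += x[i] * x[i + lag];
    r[lag] = sum;
  }
  return r;
}

// Return the first lag > 0 whose value is larger than both neighbours.
function firstPeakLag(r) {
  for (let lag = 1; lag < r.length - 1; lag++) {
    if (r[lag - 1] < r[lag] && r[lag] > r[lag + 1]) return lag;
  }
  return -1;
}

const lag = firstPeakLag(autocorrelate(signal, 400));
const estimate = sampleRate / lag;
console.log(lag, estimate);
```

For this input the first peak should land on the whole-sample lag nearest the true period of 100.227 samples, so the estimate comes out as 441 Hz rather than 440 Hz. Averaging over several maxima, or interpolating around the peak, are the usual ways to reduce that error.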
For our autocorrelation function, we'll need
a count of how many maxima have been detected
the mean distance between maxima
Our function should first perform the autocorrelation, find the sample positions of local maxima and then calculate the mean distance between these maxima. This gives a function looking like:
function getAutocorrolatedPitch()
{
    // First: autocorrelate the signal, recording local maxima as we go
    let maximaCount = 0;
    for (let l = 0; l < analyserNode.fftSize; l++) {
        corrolatedSignal[l] = 0;
        for (let i = 0; i < analyserNode.fftSize - l; i++) {
            corrolatedSignal[l] += audioData[i] * audioData[i + l];
        }
        if (l > 1) {
            // Lag l - 1 is a local maximum if the slope goes from rising to falling
            if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                localMaxima[maximaCount] = (l - 1);
                maximaCount++;
                if ((maximaCount >= localMaxima.length))
                    break;
            }
        }
    }
    // No maxima found (e.g. silence): there is no period to report
    if (maximaCount === 0)
        return 0;
    // Second: find the average distance in samples between maxima
    let maximaMean = localMaxima[0];
    for (let i = 1; i < maximaCount; i++)
        maximaMean += localMaxima[i] - localMaxima[i - 1];
    maximaMean /= maximaCount;
    return audioCtx.sampleRate / maximaMean;
}
Problems
Once you have implemented this, you may find there are a couple of problems:
the frequency result is a bit erratic
the display method is not intuitive for tuning purposes
The erratic result is down to the fact that autocorrelation by itself is not a perfect solution. You will need to experiment with first filtering the signal and aggregating other methods. You could also try limiting the signal, or only analysing the signal when it is above a certain threshold. You could also increase the rate at which you perform the detection and average out the results.
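As a rough sketch of two of those ideas (a level threshold and smoothing over recent results), something like the following could sit between the detector and the display. It is not from the original answer: the helper names, the history length of 5 and the RMS threshold of 0.01 are all assumptions to tune by ear, and a median is used here instead of a plain average because it is less sensitive to the occasional wild estimate.

```javascript
// Hypothetical helpers: names and default values are illustrative assumptions.

// Root-mean-square level of a block of samples in the range [-1, 1].
function rms(samples) {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Keeps a short history of estimates and ignores blocks that are too quiet.
function createPitchSmoother(historySize = 5, minRms = 0.01) {
  const history = [];
  return function update(samples, rawPitch) {
    if (rms(samples) >= minRms && Number.isFinite(rawPitch)) {
      history.push(rawPitch);
      if (history.length > historySize) history.shift();
    }
    // null means nothing loud enough has been heard yet
    return history.length ? median(history) : null;
  };
}
```

Inside the setInterval callback this might be used as const pitch = smooth(audioData, getAutocorrolatedPitch()); where smooth was created once with createPitchSmoother().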
Secondly, the method for display is limited. Musicians would not appreciate a simple numerical result. Rather, some kind of graphical feedback would be more intuitive. Again, that is outside the scope of the question.
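Short of a full graphical display, one small step is to turn the number into a note name plus an offset in cents. The sketch below is an addition, not part of the original answer; it assumes equal temperament with A4 = 440 Hz, and describePitch is a hypothetical name.

```javascript
// Assumes equal temperament with A4 = 440 Hz; the function name is hypothetical.
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

function describePitch(frequency, a4 = 440) {
  // Fractional MIDI note number: 69 is A4, with 12 semitones per octave.
  const midi = 69 + 12 * Math.log2(frequency / a4);
  const nearest = Math.round(midi);
  // Distance from the nearest note in hundredths of a semitone.
  const cents = Math.round((midi - nearest) * 100);
  const name = NOTE_NAMES[((nearest % 12) + 12) % 12];
  const octave = Math.floor(nearest / 12) - 1;
  return { note: `${name}${octave}`, cents };
}

console.log(describePitch(441)); // { note: 'A4', cents: 4 }
```

A positive cents value means the string is sharp and a negative one means it is flat, which maps naturally onto a needle or a coloured bar.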
Full page and script
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Pitch Detection</title>
</head>
<body>
<h1>Frequency (Hz)</h1>
<h2 id="frequency">0.0</h2>
<div>
<button onclick="startPitchDetection()">
Start Pitch Detection
</button>
</div>
<script>
    let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    let microphoneStream = null;
    let analyserNode = audioCtx.createAnalyser();
    let audioData = new Float32Array(analyserNode.fftSize);
    let corrolatedSignal = new Float32Array(analyserNode.fftSize);
    let localMaxima = new Array(10);
    const frequencyDisplayElement = document.querySelector('#frequency');
    function startPitchDetection()
    {
        // A context created before a user gesture may start suspended (e.g. in Chrome)
        audioCtx.resume();
        navigator.mediaDevices.getUserMedia({audio: true})
            .then((stream) =>
            {
                microphoneStream = audioCtx.createMediaStreamSource(stream);
                microphoneStream.connect(analyserNode);
                audioData = new Float32Array(analyserNode.fftSize);
                corrolatedSignal = new Float32Array(analyserNode.fftSize);
                setInterval(() => {
                    analyserNode.getFloatTimeDomainData(audioData);
                    let pitch = getAutocorrolatedPitch();
                    frequencyDisplayElement.innerHTML = `${pitch}`;
                }, 300);
            })
            .catch((err) =>
            {
                console.log(err);
            });
    }
    function getAutocorrolatedPitch()
    {
        // First: autocorrelate the signal, recording local maxima as we go
        let maximaCount = 0;
        for (let l = 0; l < analyserNode.fftSize; l++) {
            corrolatedSignal[l] = 0;
            for (let i = 0; i < analyserNode.fftSize - l; i++) {
                corrolatedSignal[l] += audioData[i] * audioData[i + l];
            }
            if (l > 1) {
                if ((corrolatedSignal[l - 2] - corrolatedSignal[l - 1]) < 0
                    && (corrolatedSignal[l - 1] - corrolatedSignal[l]) > 0) {
                    localMaxima[maximaCount] = (l - 1);
                    maximaCount++;
                    if ((maximaCount >= localMaxima.length))
                        break;
                }
            }
        }
        // No maxima found (e.g. silence): there is no period to report
        if (maximaCount === 0)
            return 0;
        // Second: find the average distance in samples between maxima
        let maximaMean = localMaxima[0];
        for (let i = 1; i < maximaCount; i++)
            maximaMean += localMaxima[i] - localMaxima[i - 1];
        maximaMean /= maximaCount;
        return audioCtx.sampleRate / maximaMean;
    }
</script>
</body>
</html>
Question 2: Detecting multiple notes
At this point I think we can all agree that this answer has gotten a little out of hand. So far we've just covered a single method of pitch detection. See Refs. [2, 3, 4] for some suggestions of algorithms for multiple f0 detection.
In essence, this problem would come down to detecting all f0s and looking up the resulting notes against a dictionary of chords. For that, there should at least be a little work done on your part. Any questions about the DSP should probably be pointed toward https://dsp.stackexchange.com. You will be spoiled for choice on questions regarding pitch detection algorithms.
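Assuming the set of f0s has already been detected (which is the hard part), the dictionary lookup itself could be sketched roughly as below. This is an illustration rather than the answer's method: the chord table is a tiny hypothetical subset, A4 is assumed to be 440 Hz, and identifyChord only reports exact matches.

```javascript
// Illustrative only: the chord table is a tiny hypothetical subset.
const PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

// Intervals in semitones above the root; a real tool would need many more shapes.
const CHORD_SHAPES = {
  major: [0, 4, 7],
  minor: [0, 3, 7],
};

// Map a frequency to a pitch class 0..11 (0 = C), assuming A4 = 440 Hz.
function pitchClass(frequency) {
  const midi = Math.round(69 + 12 * Math.log2(frequency / 440));
  return ((midi % 12) + 12) % 12;
}

// Try every root and shape until the set of pitch classes matches exactly.
function identifyChord(frequencies) {
  const present = new Set(frequencies.map(pitchClass));
  for (let root = 0; root < 12; root++) {
    for (const [quality, intervals] of Object.entries(CHORD_SHAPES)) {
      const wanted = new Set(intervals.map((step) => (root + step) % 12));
      const same = wanted.size === present.size &&
        [...wanted].every((pc) => present.has(pc));
      if (same) return `${PITCH_CLASSES[root]} ${quality}`;
    }
  }
  return null;
}

// Open E minor on a guitar: E2, B2, E3, G3, B3, E4
console.log(identifyChord([82.41, 123.47, 164.81, 196.0, 246.94, 329.63])); // "E minor"
```

Octave doublings collapse into the same pitch class, which is why six strings reduce to three distinct notes here. Missed or spurious f0s would break an exact match, so a practical version would probably need some form of scoring instead.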
References
[1] A. v. Knesebeck and U. Zölzer, "Comparison of pitch trackers for real-time guitar effects", in Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010.
[2] A. P. Klapuri, "A perceptually motivated multiple-F0 estimation method," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 291-294, doi: 10.1109/ASPAA.2005.1540227.
[3] A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," in IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804-816, Nov. 2003, doi: 10.1109/TSA.2003.815516.
[4] A. P. Klapuri, "Multipitch estimation and sound separation by the spectral smoothness principle," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001, pp. 3381-3384 vol.5, doi: 10.1109/ICASSP.2001.940384.

I suppose it'll depend on how you're building your application. It's hard to help without more detail about the specs. Here are a few options for you, though.
There are a few stream options, for example:
mic
Or, if you're using React:
react-mic
Or, if you want to go really basic with some vanilla JS:
Web Fundamentals: Recording Audio

Related

Limited playback rate using the Web Audio API

I'm trying to use the Web Audio API to create a simple synthesizer, but I'm having a problem with the playback rate of an AudioBuffer. Its value seems to be limited when I try using a somewhat high value.
Here's my code sample. I first create a sample buffer containing a simple waveform that has as many samples as the sample rate (this would produce a 1 Hz waveform if read directly). I then create an AudioBuffer that will contain the samples. Finally, I create an AudioBufferSourceNode that plays the buffer in a loop at some playback rate (which translates to an audible frequency).
const audioContext = new window.AudioContext();
const buffer = new Float32Array(audioContext.sampleRate);
for (let i = 0; i < buffer.length; i++) {
buffer[i] = Math.sin(2 * Math.PI * i / audioContext.sampleRate);
}
const audioBuffer = new AudioBuffer({
length: buffer.length,
numberOfChannels: 1,
sampleRate: audioContext.sampleRate
});
audioBuffer.copyToChannel(buffer, 0);
const sourceNode = new AudioBufferSourceNode(audioContext, {
buffer: audioBuffer,
loop: true,
playbackRate: 440
});
sourceNode.connect(audioContext.destination);
sourceNode.start();
In this scenario, playbackRate seems to be limited to 1024. Any value higher than this does not have any audible effect. I checked its maximum value (sourceNode.playbackRate.maxValue) and it's around 3.4e+38, which is way above what I'm trying to achieve.
I'm wondering if there's something I'm missing or if my understanding of this API is wrong. I'm using Google Chrome on the latest version if this changes anything.
This is a bug in Chrome: https://crbug.com/712256. I believe Firefox may implement this correctly (but I did not check).
Note, also, that Chrome uses very simple interpolation. The quality of the output degrades quite a bit when the playbackRate is very different from 1.
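One possible way to stay under that limit, offered here as an assumption rather than anything from the answer or the bug report, is to store several sine periods in the one-second buffer so that a smaller playbackRate reaches the same pitch. The helper names buildSineTable and playbackRateFor are hypothetical.

```javascript
// Hypothetical workaround sketch; not taken from the answer or the bug report.
// Fills a one-second buffer with `cycles` full sine periods, so looping it at
// playbackRate r produces r * cycles Hz.
function buildSineTable(sampleRate, cycles) {
  const table = new Float32Array(sampleRate);
  for (let i = 0; i < table.length; i++) {
    table[i] = Math.sin(2 * Math.PI * cycles * i / sampleRate);
  }
  return table;
}

// Playback rate needed to hear `frequency` Hz from a table holding `cycles` periods.
function playbackRateFor(frequency, cycles) {
  return frequency / cycles;
}

const table = buildSineTable(48000, 16);
console.log(playbackRateFor(4400, 16)); // 275
```

The table would be copied into the AudioBuffer with copyToChannel exactly as in the question. With fewer samples per period the waveform is coarser, so the interpolation caveat above still applies.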

WebAudio dB visualization not reflecting frequency bands as expected

I've built a system-audio setup with WebAudio in an Angular component. It works well, except that the bands do not seem to reflect frequency accurately.
See here a test with high, mid, and low tones.
I've gotten pretty far with the native API, accessing a media stream etc. but it's not as helpful as a utility as I thought it would be...
Question:
How would we get the most accurate frequency decibel data?
All sounds seem to be focused in the first 3 bands.
Here is the method which visualizes the media stream (full code on GitHub):
private repeater() {
this._AFID = requestAnimationFrame(() => this.frameLooper());
// how many values from analyser (the "buffer" size)
this._fbc = this._analyser.frequencyBinCount;
// frequency data is integers on a scale from 0 to 255
this._data = new Uint8Array(this._analyser.frequencyBinCount);
this._analyser.getByteFrequencyData(this._data);
let bandsTemp = [];
// calculate the height of each band element using frequency data
for (var i = 0; i < this._fbc; i++) {
bandsTemp.push({ height: this._data[i] });
}
this.bands = bandsTemp;
}
Boris Smus' Web Audio API book says:
If, however, we want to perform a comprehensive analysis of the whole
audio buffer, we should look to other methods...
Perhaps this method is as good as it gets. What is a better method for more functional frequency analysis?
Thanks for the example in https://stackblitz.com/edit/angular-mediastream-device?file=src%2Fapp%2Fsys-sound%2Fsys-sound.component.ts. You're right that it doesn't work in chrome, but if I use the link to open it in a new window, everything is right.
So, I think you're computing the labels for the graph incorrectly. I'm assuming they're supposed to represent the frequency of the band. If not, then this answer is wrong.
You have fqRange = sampleRate / bands. Let's assume that sampleRate = 48000 (to keep the numbers simple), and bands = 16. Then fqRange = 3000. First, I think you really want either sampleRate/2/bands or sampleRate / fftSize, which is the same thing.
So each of the frequency bins is 1500 Hz wide. Your labels should be 1500*k, for k = 0 to 15. (Although there's more than one way to label these, this is the easiest.) This will cover the range from 0 to 24000 Hz.
And when I play a 12 kHz tone, I see the peak is around 1552 in your code. But with the new labeling, this is the 8th bin, so 1500*8 = 12000. (Well, there are some differences. My sampleRate is actually 44.1 kHz, so the numbers computed above will be different.)
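The labelling described above can be written as a short helper. This is a sketch using the same assumed numbers (48000 Hz and 16 bands, i.e. an fftSize of 32); binFrequencies is a hypothetical name.

```javascript
// Sketch with assumed values; `binFrequencies` is a hypothetical helper.
function binFrequencies(sampleRate, fftSize) {
  const binCount = fftSize / 2;          // same as analyser.frequencyBinCount
  const binWidth = sampleRate / fftSize; // width of each bin in Hz
  return Array.from({ length: binCount }, (_, k) => k * binWidth);
}

const labels = binFrequencies(48000, 32);
console.log(labels.length, labels[8]); // 16 12000
```

In the component, the real values would come from the AudioContext's sampleRate and the analyser's fftSize rather than being hard-coded.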

WEB AUDIO API creating rain crackling noises

I'm trying to create rain with the JavaScript Web Audio API.
So far I've created a low-frequency rumbling noise for the background, and I'm working on a high-frequency noise that will imitate the sound of rain droplets. However, right now the high-frequency noise is very much like white noise and too intense to be individual droplets. Does anyone know how to "separate" the sound a little so it almost sounds like crackling? Here is a link to what I would like the high-frequency noise to sound like; if you increase the last slider (violet) you can hear it.
And here is my HTML code so far
<script>
let context= new AudioContext();
let context2= new AudioContext();
let lowpass = context.createBiquadFilter();
lowpass.type = 'lowpass';
//lowpass.Q.value = -7.01;
lowpass.frequency.setValueAtTime(80, context2.currentTime);
let gain = new GainNode(context);
gain.gain.value= 0.4;
let gain2 = new GainNode(context2);
gain2.gain.value= 0.02;
let highpass=context2.createBiquadFilter();
highpass.type = 'highpass';
highpass.Q.value = 2;
//highpass.frequency.setValueAtTime(6000, context2.currentTime);
let distortion = context2.createWaveShaper();
let delay = context2.createDelay(90.0);
function StartAudio() {context.resume()};
context.audioWorklet.addModule('basicnoise.js').then(() => {
let myNoise = new AudioWorkletNode(context,'noise-generator');
myNoise.connect(lowpass);
lowpass.connect(gain);
gain.connect(context.destination);
});
function StartAudio2() {context2.resume()};
context2.audioWorklet.addModule('basicnoise.js').then(() => {
let myNoise2 = new AudioWorkletNode(context2,'noise-generator');
myNoise2.connect(highpass);
highpass.connect(gain2);
gain2.connect(delay);
delay.connect(context2.destination);
});
</script>
I've been playing around with different functions, some of them didn't do much or I simply am not using them correctly as I am very new to the audio API scene. Any help is appreciated as this is for a school project and I know some other students want to make fire sounds and could also benefit from the crackling noise !! Thank you !!
If you think of rain as a physical process, it's basically lots of surface impact sounds (and probably some additional ambience created by airflow). When enough raindrops hit surface(s) at a rapid enough clip, the end result ends up being noise-ish.
I think a realistic rain generator would simulate lots of single droplets hitting a surface at different distances from the listener (which causes attenuation and filtering).
That said, if you want to try just "crackling" the noise generator you have going on now, try modulating the gain node's gain value randomly; here, there's a 25% chance for the generator to be effectively muted every 20 milliseconds (or so, considering timers aren't exactly precise).
setInterval(() => {
gain.gain.value=(Math.random() < 0.75 ? 0.4 : 0);
}, 20)

Which format is returned from the fft with WebAudioAPI

I visualized an audio file with the Web Audio API and with Dancer.js. Everything works, but the visualizations look very different. Can anybody help me find out why they look so different?
The Web-Audio-API code (fft.php, fft.js)
The dancer code (plugins/dancer.fft.js, js/playerFFT.js, fft.php)
The visualization for WebAudioAPI is on:
http://multimediatechnology.at/~fhs32640/sem6/WebAudio/fft.html
For Dancer is on
http://multimediatechnology.at/~fhs32640/sem6/Dancer/fft.php
The difference is in how the volumes at the frequencies are 'found'. Your code uses the analyser, which takes the values and also does some smoothing, so your graph looks nice. Dancer uses a scriptprocessor. The scriptprocessor fires a callback every time a certain sample length has gone through, and it passes that sample to e.inputBuffer. Then it just draws that 'raw' data, no smoothing applied.
var
buffers = [],
channels = e.inputBuffer.numberOfChannels,
resolution = SAMPLE_SIZE / channels,
sum = function (prev, curr) {
return prev[i] + curr[i];
}, i;
for (i = channels; i--;) {
buffers.push(e.inputBuffer.getChannelData(i));
}
for (i = 0; i < resolution; i++) {
this.signal[i] = channels > 1 ? buffers.reduce(sum) / channels : buffers[0][i];
}
this.fft.forward(this.signal);
this.dancer.trigger('update');
This is the code that Dancer uses to get the sound strength at the frequencies.
(this can be found in adapterWebAudio.js).
One is simply using the native frequency data provided by the Web Audio API via analyser.getByteFrequencyData().
The other does its own calculation using a ScriptProcessorNode: when that node's onaudioprocess event fires, it takes the channel data from the input buffer and converts it to a frequency-domain spectrum by running a forward transform on it, that is, by computing the Discrete Fourier Transform of the signal with the Fast Fourier Transform algorithm.
idbehold's answer is partially correct (smoothing is getting applied), but a bigger issue is that the Web Audio code is using getByteFrequencyData instead of getFloatFrequencyData. The "byte" version does processing to maximize the byte's range - it spreads minDb to maxDb across the 0-255 byte range.
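As a sketch of what that byte scaling amounts to, the function below maps a decibel value onto 0-255 in the way the Web Audio specification describes for getByteFrequencyData. The function name is hypothetical, and the defaults assume the analyser's usual minDecibels of -100 and maxDecibels of -30.

```javascript
// Hypothetical helper mirroring the byte scaling; defaults assume
// minDecibels = -100 and maxDecibels = -30.
function decibelsToByte(db, minDb = -100, maxDb = -30) {
  // Spread [minDb, maxDb] linearly across 0..255, then clip to that range.
  const scaled = Math.floor(255 * (db - minDb) / (maxDb - minDb));
  return Math.min(255, Math.max(0, scaled));
}

console.log(decibelsToByte(-65));  // 127
console.log(decibelsToByte(-120)); // 0 (clipped)
```

Anything quieter than minDecibels or louder than maxDecibels is flattened to 0 or 255, which is one reason a plot of byte data can look quite different from a plot of raw FFT magnitudes.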

Web Audio: Karplus Strong String Synthesis

Edit: Cleaned up the code and the player (on Github) a little so it's easier to set the frequency
I'm trying to synthesize strings using the Karplus Strong string synthesis algorithm, but I can't get the string to tune properly. Does anyone have any idea?
As linked above, the code is on Github: https://github.com/achalddave/Audio-API-Frequency-Generator (the relevant bits are in strings.js).
Wiki has the following diagram:
So essentially, I generate the noise, which then gets output and sent to a delay filter simultaneously. The delay filter is connected to a low-pass filter, which is then mixed with the output. According to Wikipedia, the delay should be of N samples, where N is the sampling frequency divided by the fundamental frequency (N = f_s/f_0).
Excerpts from my code:
Generating the noise (bufferSize is 2048, but that shouldn't matter too much)
var buffer = context.createBuffer(1, bufferSize, context.sampleRate);
var bufferSource = context.createBufferSource();
bufferSource.buffer = buffer;
var bufferData = buffer.getChannelData(0);
for (var i = 0; i < delaySamples+1; i++) {
bufferData[i] = 2*(Math.random()-0.5); // random noise from -1 to 1
}
Create a delay node
var delayNode = context.createDelayNode();
We need to delay by f_s/f_0 samples. However, the delay node takes the delay in seconds, so we need to divide that by the samples per second, and we get (f_s/f_0) / f_s, which is just 1/f_0.
var delaySeconds = 1/(frequency);
delayNode.delayTime.value = delaySeconds;
Create the lowpass filter (the frequency cutoff, as far as I can tell, shouldn't affect the frequency, and is more a matter of whether the string "sounds" natural):
var lowpassFilter = context.createBiquadFilter();
lowpassFilter.type = lowpassFilter.LOWPASS; // explicitly set type
lowpassFilter.frequency.value = 20000; // make things sound better
Connect the noise to the output and the delay node (destination = context.destination and was defined earlier):
bufferSource.connect(destination);
bufferSource.connect(delayNode);
Connect the delay to the lowpass filter:
delayNode.connect(lowpassFilter);
Connect the lowpass to the output and back to the delay*:
lowpassFilter.connect(destination);
lowpassFilter.connect(delayNode);
Does anyone have any ideas? I can't figure out whether the issue is my code, my interpretation of the algorithm, my understanding of the API, or (though this is least likely) an issue with the API itself.
*Note that on Github, there's actually a Gain Node between the lowpass and the output, but this doesn't really make a big difference in the output.
Here's what I think is the problem. I don't think the DelayNode implementation is designed to handle such tight feedback loops. For a 441 Hz tone, for example, that's only 100 samples of delay, and the DelayNode implementation probably processes its input in blocks of 128 or more. (The delayTime attribute is "k-rate", meaning changes to it are only processed in blocks of 128 samples. That doesn't prove my point, but it hints at it.) So the feedback comes in too late, or only partially, or something.
EDIT/UPDATE: As I state in a comment below, the actual problem is that a DelayNode in a cycle adds 128 sample frames between output and input, so that the observed delay is 128 / sampleRate seconds longer than specified.
My advice (and what I've begun to do) is to implement the whole Karplus-Strong including your own delay line in a JavaScriptNode (now known as a ScriptProcessorNode). It's not hard and I'll post my code once I get rid of an annoying bug that can't possibly exist but somehow does.
Incidentally, the tone you (and I) get with a delayTime of 1/440 (which is supposed to be an A) seems to be a G, two semitones below where it should be. Doubling the frequency raises it to a B, four semitones higher. (I could be off by an octave or two - kind of hard to tell.) Probably one could figure out what's going on (mathematically) from a couple more data points like this, but I won't bother.
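The extra 128 frames mentioned in the update above seem to account for those observations. The sketch below is a back-of-the-envelope check rather than a measurement: it assumes a 44100 Hz context, ignores any delay added by the low-pass filter, and uses the hypothetical name observedFrequency.

```javascript
// Assumes a 44100 Hz context and the extra 128-frame loop delay described above.
function observedFrequency(requestedHz, sampleRate = 44100, extraFrames = 128) {
  // Total loop time is the requested delay plus one extra block of frames.
  const loopSeconds = 1 / requestedHz + extraFrames / sampleRate;
  return 1 / loopSeconds;
}

console.log(observedFrequency(440).toFixed(1)); // roughly 193 Hz, close to G3 (196 Hz)
console.log(observedFrequency(880).toFixed(1)); // roughly 248 Hz, close to B3 (247 Hz)
```

Under these assumptions a requested A comes out near a G and doubling the frequency lands near a B, in line with what was heard.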
EDIT: Here's my code, certified bug-free.
var context = new webkitAudioContext();
var frequency = 440;
var impulse = 0.001 * context.sampleRate;
var node = context.createJavaScriptNode(4096, 0, 1);
var N = Math.round(context.sampleRate / frequency);
var y = new Float32Array(N);
var n = 0;
node.onaudioprocess = function (e) {
var output = e.outputBuffer.getChannelData(0);
for (var i = 0; i < e.outputBuffer.length; ++i) {
var xn = (--impulse >= 0) ? Math.random()-0.5 : 0;
output[i] = y[n] = xn + (y[n] + y[(n + 1) % N]) / 2;
if (++n >= N) n = 0;
}
}
node.connect(context.destination);
