I'm trying to determine the fundamental frequency of an input signal (from a tone generator, or possibly a musical instrument) using JavaScript's WebAudio API, along with some other SO articles (How to get frequency from fft result?, How do I obtain the frequencies of each value in an FFT?), but I can only seem to determine a frequency within about 5-10Hz.
I'm testing using a signal generator app for iPad positioned next to a high-quality microphone in a quiet room.
For 500Hz, it consistently returns 506; for 600Hz, I get 592. 1kHz is 1001Hz, but 2kHz is 1991. I've also noticed that I can modulate the frequency by 5Hz (at 1Hz increments) before seeing any change in data from the FFT. And I've only been able to get this accurate by averaging together the two highest bins.
Does this mean that there's not enough resolution in the FFT data to accurately determine the fundamental frequency within 1Hz, or have I gone about it the wrong way?
I've tried using both the native FFT libs (like this, for example):
var fFrequencyData = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(fFrequencyData);
(you can assume I've properly initialized and connected an Analyser node), which only shows a sensitivity/resolution of about 5-10Hz, as described above;
and also using DSP.js' FFT lib, like this:
var fft = new FFT();
fft.forward(e.inputBuffer.getChannelData(0));
fFrequencyData = fft.spectrum;
where e is the event object passed to onaudioprocess.
I also seem to have a problem with the FFT data (like FFT::spectrum) being null - is that normal? The data from fft.spectrum is empty unless I run it through analyser.getFloatFrequencyData first, which I suspect overwrites the data with output from the native FFT, defeating the purpose entirely?
Hopefully someone out there can help steer me in the right direction - thanks! :)
You would need a very large FFT to get high-quality pitch detection this way. Bin spacing is sampleRate / fftSize, so to get 1Hz resolution at a 44.1kHz sample rate, for example, you would need a 64k(-ish) FFT size. You're far better off using autocorrelation to do monophonic pitch detection.
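A minimal sketch of what time-domain autocorrelation could look like, assuming you already have a Float32Array of time-domain samples (e.g. from analyser.getFloatTimeDomainData) and know the sample rate; the search range, normalization and function name are illustrative choices, not a canonical implementation:

// Rough autocorrelation pitch estimate (sketch only).
// buffer: Float32Array of time-domain samples; sampleRate: e.g. 44100.
function estimatePitch(buffer, sampleRate) {
  var n = buffer.length;
  // Search lags corresponding to roughly 50Hz..1000Hz (assumed range).
  var minLag = Math.floor(sampleRate / 1000);
  var maxLag = Math.floor(sampleRate / 50);
  var bestLag = -1;
  var bestCorr = 0;

  for (var lag = minLag; lag <= maxLag; lag++) {
    var corr = 0;
    for (var i = 0; i < n - lag; i++) {
      corr += buffer[i] * buffer[i + lag];
    }
    corr /= (n - lag);               // normalize so longer lags aren't penalized
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? sampleRate / bestLag : null;   // frequency in Hz
}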
I want to analyze the frequencies coming from the microphone input with a resolution of <1Hz in browser.
The normal Web Audio AnalyserNode has a maximum fftSize of 32768. This results in a resolution of ~1.46Hz for normal sample rates (48kHz).
Now I want to use jsfft or something similar to do the frequency transform. I want to collect 65536 audio samples as this fft size should reach a resolution of ~0.7Hz. (Time resolution is not that important)
Unfortunately, the ScriptProcessorNode only has a maximum bufferSize of 16384, so I want to combine 4 of its buffers into one Float32Array.
I thought there would be something like
copyChannelData(array, offset, length)
but there is only
getChannelData(channel)
So if I understand correctly, I would have to copy all the data into my bigger array before I can do the FFT.
Just to be sure I'm not missing anything: is there a way to retrieve the data directly into my bigger array?
No, you will need to copy the data. The ScriptProcessorNode approach is pretty inefficient anyway, and the copy is not the worst of your worries - you are fundamentally going to need to buffer that data into a larger array one way or another.
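A minimal sketch of how that accumulation might look, assuming an existing AudioContext (audioContext) and source node (source); runLargeFFT is a hypothetical placeholder for the jsfft call:

// Accumulate four 16384-sample blocks into one 65536-sample window (sketch).
var FFT_SIZE = 65536;
var processor = audioContext.createScriptProcessor(16384, 1, 1);
var bigBuffer = new Float32Array(FFT_SIZE);
var writeIndex = 0;

processor.onaudioprocess = function (e) {
  var block = e.inputBuffer.getChannelData(0);
  bigBuffer.set(block, writeIndex);   // copy this block into the big array
  writeIndex += block.length;

  if (writeIndex >= FFT_SIZE) {
    runLargeFFT(bigBuffer);           // hypothetical: hand the full window to jsfft
    writeIndex = 0;                   // start filling the next window
  }
};

source.connect(processor);
processor.connect(audioContext.destination);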
I've been playing with the JavaScript Web Audio API to visualize a song's audio stream. As far as I can tell it uses an FFT, and the result looks quite nice.
fft visualizer pic
But when I look at other visualizers, they seem to use a different algorithm, or to transform the FFT output into something else. They look much closer to the rhythm and the bass: the spectrum has more "hills", and they move more smoothly and dance with the music.
spectrum_example
Lots of visualized songs on YouTube use this other approach. What kind of analysis are they using? Is it possible to achieve that with the JS Web Audio API, or from the FFT?
You can get frequency data using the AnalyserNode in the Web Audio API; it has a configurable FFT size (which determines the frequency bin count) and a smoothing constant.
The getFloatFrequencyData method returns an array with the amplitudes of the frequency ranges, divided over the bin count.
You can use this data to visualize it however you like.
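For reference, a minimal sketch of that setup, assuming an existing AudioContext (audioContext) and source node (source); the fftSize and smoothing values are just examples:

// Basic AnalyserNode setup for a spectrum visualizer (sketch).
var analyser = audioContext.createAnalyser();
analyser.fftSize = 2048;                  // 1024 frequency bins
analyser.smoothingTimeConstant = 0.8;     // temporal smoothing, 0..1
source.connect(analyser);

var bins = new Float32Array(analyser.frequencyBinCount);

function draw() {
  analyser.getFloatFrequencyData(bins);   // dB value per bin
  // ...map each bin to a bar height/color and render to a canvas here...
  requestAnimationFrame(draw);
}
draw();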
There are lots of possible parameters in audio spectrum measurement and presentation (including, but not limited to): sample rate, FFT length, overlap, window function, time filtering of the sequence of FFT results (low pass, momentum, etc.), coloring scheme, and frame rate. The JavaScript Web Audio API may or may not allow you to vary all those parameters to get many of the possible presentations.
I'm working with audio, but I'm a newbie in this area. I would like to match sound from the microphone to my source audio (just one sound), like the Coke ad from Shazam. Example video (0:45). However, I want to do it on a website in JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Acquire Audio
This one is a definite no biggy. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundamental knowledge that you may want to understand when using it.
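A minimal sketch of that acquisition step, assuming a browser with getUserMedia support; the fftSize is an arbitrary example:

// Acquire microphone audio and expose it for analysis (sketch).
function getMicAnalyser(callback) {
  navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
    var audioContext = new AudioContext();
    var source = audioContext.createMediaStreamSource(stream);
    var analyser = audioContext.createAnalyser();
    analyser.fftSize = 4096;          // frequency data we can fingerprint later
    source.connect(analyser);
    callback(analyser, audioContext);
  });
}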
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses, and which is also described in greater detail here) is to create, and compare against, a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of the sample, no more than a few seconds long, at a time (note that this is done using a sliding window, not discrete partitioning)
Calculate the Fourier Transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just maintain them in a list. Since your keys are going to be an array of numerical values, I propose that another possible data structure to quickly query your dataset would be a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, if you can keep the amount of critical points consistent. For now though, just keep it simple, use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
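To make that concrete, here is a rough sketch of what a per-window fingerprint could look like, assuming the magnitudes come from something like analyser.getFloatFrequencyData; the per-band peak picking is deliberately naive and the band count is arbitrary:

// Build a crude fingerprint from one FFT frame: the index of the strongest
// bin in each of a few frequency bands (a sketch, not Shazam's actual scheme).
function fingerprint(magnitudes, bandCount) {
  bandCount = bandCount || 8;
  var bandSize = Math.floor(magnitudes.length / bandCount);
  var peaks = [];
  for (var b = 0; b < bandCount; b++) {
    var bestBin = b * bandSize;
    for (var i = b * bandSize; i < (b + 1) * bandSize; i++) {
      if (magnitudes[i] > magnitudes[bestBin]) bestBin = i;
    }
    peaks.push(bestBin);
  }
  return peaks;   // e.g. [3, 17, 40, ...] - one peak bin per band
}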
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint, and see if it matches close to any from storage. You can look for a partial match here, and there are lots of tweaks and optimizations you could try (a rough sketch of this matching loop follows after these steps).
This is going to be a noisy and inaccurate signal, so don't expect every segment to get a match. If lots of them are getting a match (you will have to figure out what "lots" means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
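As promised, a sketch of that matching loop, assuming each fingerprint is a small array of peak bin indices as in the earlier sketch; the tolerance and threshold values are made up and would need experimental tuning:

// Count how many recent live fingerprints come "close" to any stored one.
// sourcePrints: fingerprints precomputed from the source audio.
// livePrints:   fingerprints computed from recent microphone windows.
function isMatch(sourcePrints, livePrints, tolerance, threshold) {
  function close(a, b) {
    if (a.length !== b.length) return false;
    for (var i = 0; i < a.length; i++) {
      if (Math.abs(a[i] - b[i]) > tolerance) return false;
    }
    return true;
  }

  var hits = 0;
  for (var j = 0; j < livePrints.length; j++) {
    for (var k = 0; k < sourcePrints.length; k++) {
      if (close(sourcePrints[k], livePrints[j])) { hits++; break; }
    }
  }
  return hits / livePrints.length >= threshold;   // "lots" of segments matched
}

// e.g. isMatch(sourcePrints, livePrints, 2, 0.3) - both numbers need tuning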
Conclusions
This is not going to be a super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also as a final note, you mention Javascript several times in your post, and you may notice that I mentioned it zero times up until now in my answer, and that's because language of implementation is not an important factor. This system is complex enough that the hardest pieces to the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y", just figure out an algorithm for X, and the Y should come naturally.
I'm learning the WebAudio API and experimenting by building a simple audio player with a visualiser and an equaliser.
Both the visualiser and equaliser work on their own, but when I have them both connected to the AudioContext, the equaliser stops working.
Here's some of the code...
The equaliser
var sum = APP.audioContext.createGain();
APP.lGain.connect(sum);
APP.mGain.connect(sum);
APP.hGain.connect(sum);
sum.connect(APP.audioContext.destination);
And the visualiser
APP.analyser = APP.audioContext.createAnalyser();
APP.source.connect(APP.analyser);
APP.analyser.connect(APP.audioContext.destination);
If I remove the final line APP.analyser.connect(APP.audioContext.destination); then the equaliser works, but then my visualiser obviously breaks.
This works fine in Firefox, but not in Chrome (osx).
Thanks in advance for any help!
1) My guess is that it's not that the equalizer "stops working" - it's that you're connecting both the output of the equalizer and the output of the analyzer (which is a pass-through of the source!) to the destination, and it's summing them - so you have an equalized copy summing with a non-equalized copy, and it's dramatically lessening the equalizer's effect. The fix is simple - don't connect the analyzer to the destination. (It doesn't need to be connected to anything to work.)
2) I suspect you're using a less-than-optimal way of doing equalization. You should use shelving filters and a peaking filter in SERIES (one connected to another to another), not three filters in parallel (summing to one node). If you connect them in parallel, you're going to get odd phase-offset effects. Take a look here: Web audio API equalizer.
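For illustration, a minimal sketch of that series arrangement using the APP naming from the question; the frequencies, Q and gain values are arbitrary placeholders:

// Three-band EQ as filters in series (sketch; parameters are examples only).
var low = APP.audioContext.createBiquadFilter();
low.type = 'lowshelf';
low.frequency.value = 320;
low.gain.value = 0;            // dB boost/cut for the low band

var mid = APP.audioContext.createBiquadFilter();
mid.type = 'peaking';
mid.frequency.value = 1000;
mid.Q.value = 0.7;
mid.gain.value = 0;

var high = APP.audioContext.createBiquadFilter();
high.type = 'highshelf';
high.frequency.value = 3200;
high.gain.value = 0;

// source -> low -> mid -> high -> destination, plus a side tap for the analyser.
APP.source.connect(low);
low.connect(mid);
mid.connect(high);
high.connect(APP.audioContext.destination);

APP.source.connect(APP.analyser);   // the analyser needs no connection onward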
I'd like to find the local maxima for a set of data.
I have a log of flight data from a sounding rocket payload, and I'd like to find the approximate times for the staging based on accelerometer data. I should be able to get the times I want based on a visual inspection of the data on a graph, but how would I go about finding the points programmatically in Javascript?
If it's only necessary to know approximate times, probably it's good enough to use some heuristic such as: run the data through a smoothing filter and then look for jumps.
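A rough sketch of that heuristic in Javascript, assuming the accelerometer log is a plain array of numbers sampled at regular intervals; the window size and jump threshold are arbitrary and would need tuning against your data:

// Smooth with a moving average, then flag indices where the smoothed signal
// jumps by more than a threshold (sketch; staging shows up as sudden changes).
function findJumps(samples, windowSize, threshold) {
  var smoothed = [];
  for (var i = 0; i < samples.length; i++) {
    var start = Math.max(0, i - windowSize);
    var end = Math.min(samples.length, i + windowSize + 1);
    var sum = 0;
    for (var j = start; j < end; j++) sum += samples[j];
    smoothed.push(sum / (end - start));
  }

  var jumps = [];
  for (var k = 1; k < smoothed.length; k++) {
    if (Math.abs(smoothed[k] - smoothed[k - 1]) > threshold) jumps.push(k);
  }
  return jumps;   // sample indices where a staging event may have occurred
}

// e.g. findJumps(accelLog, 5, 10) - convert indices to times via the sample rate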
If it's important to find the staging times accurately, my advice is to construct a piecewise continuous model and fit that to the data, and then derive the staging times from that. For example, a one-stage model might be: for 0 < t < t_1, acceleration is f(t) - g; for t > t_1, acceleration is - g, where g is gravitational acceleration. I don't know what f(t) might be here but presumably it's well-known in rocket engineering. The difficulty of fitting such a model is due to the presence of the cut-off point t_1, which makes it nondifferentiable, but it's not really too difficult; in a relatively simple case like this, you can loop over the possible cut-off points and compute the least-squares solution for the rest of the parameters, then take the cut-off point or points which have the least error.
See Seber and Wild, "Nonlinear Regression"; there is a chapter about such models.