I'm learning the WebAudio API and experimenting by building a simple audio player with a visualiser and an equaliser.
Both the visualiser and equaliser work on their own, but when I have them both connected to the AudioContext, the equaliser stops working.
Here's some of the code...
The equaliser
var sum = APP.audioContext.createGain();
APP.lGain.connect(sum);
APP.mGain.connect(sum);
APP.hGain.connect(sum);
sum.connect(APP.audioContext.destination);
And the visualiser
APP.analyser = APP.audioContext.createAnalyser();
APP.source.connect(APP.analyser);
APP.analyser.connect(APP.audioContext.destination);
If I remove the final line APP.analyser.connect(APP.audioContext.destination); then the equaliser works, but then my visualiser obviously breaks.
This works fine in Firefox, but not in Chrome (osx).
Thanks in advance for any help!
1) My guess is that it's not that the equalizer "stops working" - it's that you're connecting both the output of the equalizer and the output of the analyzer (which is a pass-through of the source!) to the destination, and it's summing them - so you have an equalized copy summing with a non-equalized copy, and it's dramatically lessening the equalizer's effect. The fix is simple - don't connect the analyzer to the destination. (It doesn't need to be connected to anything to work.)
2) I suspect you're using a less-than optimal way of doing equalization. You should use shelving filters and a peak filter in SERIES (one connected to another to another), not three filters in parallel (summing to one node). If you connect them in parallel, you're going to get odd phase-offset effects. Take a look here: Web audio API equalizer.
Related
I'm trying to parse mpd of several DASH live streamming servers to get newly published segments.
You know, there is a tag named "MinimumUpdatePeriod" in the mpd, so it must be upgraded within this time, and usually 1 or 2 segments are newly published during the period. This way, I can get list of segments regularly, they are accumulated one by one.
But in some servers, they go wrong. Even though minimum update period time is expired, no new segments are published. Instead, random number of segments are published per random interval. I can't understand why this happens.
Currently, I'm using the Google shaka player's submodule to parse mpd and catch segment lists, I don't think there is any problem in it. So That's what I'm so confused. I just made simple Javascript module to use shaka, it works really well in case of normal servers as I said, but not for other ones.
The platform is Ubuntu 20.04, I'm using latest version of shaka player.
Thanks for your reading, I hope your opinions.
I'm working about audio but I'm a newbie in this area. I would like to matching sound from microphone to my source audio(just only 1 sound) like Coke Ads from Shazam. Example Video (0.45 minute) However, I want to make it on website by JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Aquire Audio
This one is a definite no biggy. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundametal knowledge that you may want to understand when using it.
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses), and which is also described in greater detail here, is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of a sample no more than a few seconds long (note that this is done using a sliding window, not discrete partitioning) at a time
Calculate the Fourier Transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just maintain them in a list. Since your keys are going to be an array of numerical values, I propose that another possible data structure to quickly query your dataset would be a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, if you can keep the amount of critical points consistent. For now though, just keep it simple, use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint, and see if it matches close to any from storage. You can look for a partial match here and there are lots of tweaks and optimizations you could try.
This is going to be a noisy and inaccurate signal so don't expect every segment to get a match. If lots of them are getting a match (you will have to figure out what lots means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
Conclusions
This is not going to be an super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also as a final note, you mention Javascript several times in your post, and you may notice that I mentioned it zero times up until now in my answer, and that's because language of implementation is not an important factor. This system is complex enough that the hardest pieces to the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y", just figure out an algorithm for X, and the Y should come naturally.
I'm trying to determine the fundamental frequency of an input signal (from a tone generator, or possibly a musical instrument) using JavaScript's WebAudio API, along with some other SO articles (How to get frequency from fft result?, How do I obtain the frequencies of each value in an FFT?), but I can only seem to determine a frequency within about 5-10Hz.
I'm testing using a signal generator app for iPad positioned next to a high-quality microphone in a quiet room.
For 500Hz, it consistently returns 506; for 600Hz, I get 592. 1kHz is 1001Hz, but 2kHz is 1991. I've also noticed that I can modulate the frequency by 5Hz (at 1Hz increments) before seeing any change in data from the FFT. And I've only been able to get this accurate by averaging together the two highest bins.
Does this mean that there's not enough resolution in the FFT data to accurately determine the fundamental frequency within 1Hz, or have I gone about it the wrong way?
I've tried using both the native FFT libs (like this, for example):
var fFrequencyData = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(fFrequencyData);
(you can assume I've properly initialized and connected an Analyser node), which only shows a sensitivity/resolution of about
and also using DSP.js' FFT lib, like this:
var fft = new FFT();
fft.forward(e.inputBuffer.getChannelData(0));
fFrequencyData = fft.spectrum;
where e is the event object passed to onaudioprocess.
I seem to have a problem with FFT data (like FFT::spectrum) being null - is that normal? The data from fft.spectrum is naught unless I run it through analyser.getFloatFrequencyData, which I'm thinking overwrites the data with stuff from the native FFT, defeating the purpose entirely?
Hopefully someone out there can help steer me in the right direction - thanks! :)
You would need a very large FFT to get high-quality pitch detection this way. To get a 1Hz resolution, for example, with a 44,100kHz sample rate, you would need a 64k(-ish) FFT size. You're far better off using autocorrelation to do monophonic pitch detection.
I've been building a web paint program wherein the state of a user's artwork is saved as a json object. Every time I add to the client's undo stack (just an array of json objects describing the state of theproject), I want to save the state to the server too.
I am wondering if there an elegant way to [1] only send up the diffs and then [2] be able to download the project later and recreate the current state of the project? I fear this could get messy and am trending towards just uploading the complete json project state at every undo step. Any suggestions or pointers to projects which tackle this sort of problem gracefully?
Interesting - and pretty large - question.
A lot of implementations / patterns / solutions apply to this problem and they vary depending on the type of "document" you're keeping track of updates of.
Anyway, a simple approach to avoid getting mad is, instead than saving "states", saving "command which produced those states".
If your application is completely deterministic (which I assume it is, since it's a painting program), you can be sure that for every command at given time & position, the result will be the same at every execution.
So, I would instead note down an "alphabet" representing the commands available in your program:
Draw[x,y,size, color]
Erase[x,y,size]
Move[x,y]
and so on. You can take inspiration from SVG implementation. Then push/pull strings of commands to/from the server:
timestamp: MOVE[0,15]DRAW[15,20,4,#000000]ERASE[4,8,10]DRAW[15,20,4,#ff0000]
This is obviously only a general, pseudocoded idea. Hope you can get some inspiration.
I've been working on a tool to transcribe recordings of speech with Javascript. Basically I'm hooking up key events to play, pause, and loop a file read in with the audio tag.
There are a number of advanced existing desktop apps for doing this sort of thing (such as Transcriber -- here's a screenshot). Most transcription tools have a built-in waveform that can be used to jump around the audio file, which is very helpful because the transcriber can learn to visually find and repeat or loop phrases.
I'm wondering if it's possible to emulate a subset of this functionality in the browser, with Javascript. I don't know much about signal processing, perhaps it's not even feasible.
But what I envision is Javascript reading the sound stream from the file, and periodically sampling the amplitude. If the amplitude is very low for longer than a certain threshhold of time, then that would be labled as a phrase break.
Such labeling, I think, would be very useful for transcription. I could then set up key commands to jump to the previous period of silence. So hypothetically (imagining a jQuery-based API):
var audio = $('audio#someid');
var silences = silenceFindingVoodoo(audio);
silences will then contain a list of times, so I can hook up some way to let the user jump around through the various silences, and then set the currentTime to a chosen value, and play it.
Is it even conceivable to do this sort of thing with Javascript?
Yes it's possible with Web Audio API, to be more precise you will need AnalyserNode. To give you a short proof of concept you can get this example, and add following code to drawTimeDomain():
var threshold = 1000;
var sum = 0;
for (var i in amplitudeArray) {
sum += Math.abs(128 - amplitudeArray[i]);
}
var test = (sum < threshold) ? 'silent' : 'sound';
console.log('silent info', test);
You will just need a additional logic to filter silent by milliseconds (e.g. any silent taking more than 500 ms should be seen as real silent )
I think this is possible using javascript (although maybe not advisable, of course). This article:
https://developer.mozilla.org/En/Using_XMLHttpRequest#Handling_binary_data
... discusses how to access files as binary data, and once you have the audio file as binary data you could do whatever you like with it (I guess, anyway - I'm not real strong with javascript). With audio files in WAV format, this would be a trivial exercise, since the data is already organized by samples in the time domain. With audio files in a compressed format (like MP3), transforming the compressed data back into time-domain samples would be so insanely difficult to do in javascript that I would found a religion around you if you managed to do it successfully.
Update: after reading your question again, I realized that it might actually be possible to do what you're discussing in javascript, even if the files are in MP3 format and not WAV format. As I understand your question, you're actually just looking to locate points of silence within the audio stream, as opposed to actually stripping out the silent stretches.
To locate the silent stretches, you wouldn't necessarily need to convert the frequency-domain data of an MP3 file back into the time-domain of a WAV file. In fact, identifying quiet stretches in audio can actually be done more reliably in the frequency domain than in the time domain. Quiet stretches tend to have a distinctively flat frequency response graph, whereas in the time domain the peak amplitudes of audible speech are sometimes not much higher than the peaks of background noise, especially if auto-leveling is occurring.
Analyzing an MP3 file in javascript would be significantly easier if the file were CBR (constant bit rate) instead of VBR (variable bit rate).
As far as I know, JavaScript is not powerful enough to do this.
You'll have to resort to flash, or some sort of server side processing to do this.
With the HTML5 audio/video tags, you might be able to trick the page into doing something like this. You could (hypothetically) identify silences server-side and send the timestamps of those silences to the client as meta data in the page (hidden fields or something) and then use that to allow JavaScript to identify those spots in the audio file.
If you use WebWorker threads you may be able to do this in Javascript, but that would require using more threads in the browser to do this. You could break up the problem into multiple threads and process it, but, it would be all but impossible to synchronize this with the playback. So, Javascript can determine the silent periods, by doing some audio processing, but since you can't link that to the playback well it would not be the best choice.
But, if you wanted to show the waveforms to the user then javascript and canvas can be used for this, but then see the next paragraph for the streaming.
Your best bet would be to have the server stream the audio and it can do the processing and find all the silences. Each of these should then be saved in a separate file, so that you can easily jump between the silences, and by streaming, your server app can determine when to load up the new file, so there isn't a break.
I don't think JavaScript is the tool you want to use to use for processing those audio files - that's asking for trouble. However, javascript could easily read a corresponding XML file which describes where those silences occur in the audio file, adjusting the user interface appropriately. Then, the question is what do you use to generate those XML files:
You can do it manually if you need to demo the capability right away. (Use audacity to see where those audio envelopes occur)
Check out this CodeProject article, which creates a wav processing library in C#. The author has created a function to extract silence from the input file. Probably a good place to start hacking.
Just two of my initial thoughts ... There are ALOT of audio processing APIs out there, but they are written for particular frameworks and application programming languages. Definitely make use of them before trying to write something from scratch ... unless you happen to really love fourier transforms.