I'm working on a simple audio visualization application that uses a Web Audio API analyzer to pull frequency data, as in this example. As expected, the more visual elements I add to my canvases, the more latency there is between the audio and the visual results.
Is there a standard approach to accounting for this latency? I can imagine a lookahead technique that buffers the upcoming audio data. I could work with synchronizing the JavaScript and Web Audio clocks, but I'm convinced that there's a much more intuitive answer. Perhaps it is as straightforward as playing the audio aloud with a slight delay (although this is not nearly as comprehensive).
The dancer.js library seems to have the same problem (there's always a very subtle delay), whereas other applications seem to have solved the lag issue entirely. I have so far been unable to pinpoint the technical difference. SoundJS seems to handle this a bit better, but it would be nice to build from scratch.
Any methodologies to point me in the right direction are much appreciated.
I think you will find some answers to precise audio timing in this article:
http://www.html5rocks.com/en/tutorials/audio/scheduling/
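The gist of that technique is a lookahead loop: a JavaScript timer that wakes up frequently and schedules anything due within the next fraction of a second on the audio clock via start(time). A rough sketch of the pattern (the interval values and the oscillator standing in for a "note" are just illustrative placeholders):

```
// Minimal lookahead scheduler, in the spirit of the article above.
// The lookahead / scheduleAheadTime values are starting points, not prescriptions.
const audioCtx = new AudioContext();
const lookahead = 25;          // how often the JS timer wakes up (ms)
const scheduleAheadTime = 0.1; // how far ahead to schedule audio (s)
let nextNoteTime = audioCtx.currentTime;

function scheduleNote(time) {
  const osc = audioCtx.createOscillator();
  osc.connect(audioCtx.destination);
  osc.start(time);             // scheduled on the precise audio clock
  osc.stop(time + 0.05);
}

function scheduler() {
  // Schedule every event that falls inside the lookahead window.
  while (nextNoteTime < audioCtx.currentTime + scheduleAheadTime) {
    scheduleNote(nextNoteTime);
    nextNoteTime += 0.5;       // half a second between notes, for example
  }
  setTimeout(scheduler, lookahead);
}

scheduler();
```

Because the visuals can read the same scheduled times, the animation code knows exactly when each audio event will actually sound.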
SoundJS uses this approach to enable smooth looping, but still uses JavaScript timers for delayed playback. This may not help you sync the audio timing with the animation timing. When I built the music visualizer example for SoundJS, I found I had to play around with different values for FFT size and tick frequency to get good performance. I also needed to cache a single shape and reuse it with scaling to get performant graphics.
Hope that helps.
I'm concerned when you say the more visual elements you add to your canvases, the more latency you get in audio. That shouldn't really happen quite like that. Are your canvases being animated using requestAnimationFrame? What's your frame rate like?
You can't, technically speaking, synchronize the JS and Web Audio clocks - the Web Audio clock is the audio hardware clock, which is literally running off a different clock crystal than the system clock (on many systems, at least). The vast majority of web audio (ScriptProcessorNodes being the major exception) shouldn't have additional latency introduced when your main UI thread becomes a bit more congested.
If the problem is the analysis just seems to lag (i.e. the visuals are consistently "behind" the audio), it could just be the inherent lag in the FFT processing. You can reduce the FFT size in the Analyser, although you'll get less definition then; to fake up fixing it, you can also run all the audio through a delay node to get it to re-sync with the visuals.
Also, you may find that the "smoothing" parameter on the Analyser makes it less time-precise - try turning that down.
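As a rough sketch of those last two suggestions (this assumes the audio comes from an <audio> element on the page; the FFT size, smoothing and delay values are just starting points to tune by ear):

```
const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(document.querySelector('audio'));

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;                // smaller = less lag, but less frequency detail
analyser.smoothingTimeConstant = 0.3;   // lower = more time-precise, jumpier visuals

// Delay the audible output slightly so it lines up with the lagging visuals.
const delay = audioCtx.createDelay();
delay.delayTime.value = 0.05;           // tune by ear

source.connect(analyser);               // analyse the un-delayed signal
analyser.connect(delay);
delay.connect(audioCtx.destination);

const data = new Uint8Array(analyser.frequencyBinCount);
function draw() {
  analyser.getByteFrequencyData(data);
  // ...render `data` to the canvas...
  requestAnimationFrame(draw);
}
draw();
```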
Using the Web Audio API, I create a bufferSource, and use a new MediaRecorder to record at the same time. I'm recording the sound coming out of my speakers with the built in microphone.
If I play back the original and the new recording, there is a significant lag between the two. (It sounds like about 200ms to me.) If I console.log the value of globalAudioCtx.currentTime at the point of calling the two "start" methods, the two numbers are exactly the same. The values of Date.now() are also exactly the same.
Where is this latency introduced? The latency due to speed of sound is about 1000x as small as what I am hearing.
In short, how can I get these two samples to play back at exactly the same time?
I'm working in Chrome on Linux.
Where is this latency introduced?
Both on the playback and recording.
Your sound card has a buffer, and software has to write audio to that buffer in small chunks. If the software can't keep up, you hear choppy audio. So buffer sizes are set large enough to prevent that from happening.
The same is true on the recording end. If the capture buffer isn't large enough and the software can't read from it fast enough, recorded audio data is dropped, again resulting in choppy, incomplete audio.
Browsers aren't using the lowest-latency mode of operation with your sound card. There are some tweaks you can apply (such as using WASAPI in exclusive mode on Windows with Chrome), but you're at the mercy of the browser developers, who didn't design this with folks like you and me in mind.
No matter how low you go though in buffer size, there is still going to be lag. That's the nature of computer-based digital audio.
how can I get these two samples to play back at exactly the same time?
You'll have to delay one of the samples to bring them back in sync.
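A sketch of what that could look like, assuming you already have both clips decoded into AudioBuffers. The offset is something you would calibrate for your own machine; AudioContext.outputLatency, where the browser reports it, only gives you part of the picture:

```
const ctx = new AudioContext();

// Rough estimate of output-side latency; not every browser reports
// outputLatency, so fall back to a hand-tuned constant measured by ear.
const outputLag = ctx.outputLatency || 0;
const measuredOffset = 0.2;   // seconds, calibrated for this setup (assumption)

function playInSync(originalBuffer, recordedBuffer) {
  const startAt = ctx.currentTime + 0.1;   // common start point slightly in the future

  const original = ctx.createBufferSource();
  original.buffer = originalBuffer;
  original.connect(ctx.destination);

  const recorded = ctx.createBufferSource();
  recorded.buffer = recordedBuffer;
  recorded.connect(ctx.destination);

  // The recording's content sits ~200ms late inside its buffer, so start the
  // original that much later (or, equivalently, trim the recording's head).
  original.start(startAt + measuredOffset + outputLag);
  recorded.start(startAt);
}
```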
I've recently been trying to generate video in the browser, and have thus been playing with two approaches:
Using the whammy js library to combine webp frames into webm video. More details here.
Using MediaRecorder and canvas.captureStream. More details here.
The whammy approach works well, but is only supported in Chrome, since it's the only browser that currently supports webp encoding (canvas.toDataURL("image/webp")). And so I'm using the captureStream approach as a backup for Firefox (and using libwebpjs for Safari).
So now on to my question: Is there a way to control the video quality of the canvas stream? And if not, has something like this been considered by the browsers / w3c?
Here's a screenshot of one of the frames of the video generated by whammy:
And here's the same frame generated by the MediaRecorder/canvas.captureStream approach:
My first thought is to artificially increase the resolution of the canvas that I'm streaming, but I don't want the output video to be bigger.
I've tried increasing the frame rate passed to the captureStream method (thinking that there may be some strange frame interpolation stuff happening), but this doesn't help. It actually degrades quality if I make it too high. My current theory is that the browser decides on the quality of the stream based on how much computational power it has access to. This makes sense, because if it's going to keep up with the frame rate that I've specified, then something has to give.
So the next thought is that I should slow down the rate at which I'm feeding the canvas with images, and then proportionally lower the FPS value that I pass into captureStream, but the problem with that is that even though I'd likely have solved the quality problem, I'd then end up with a video that runs slower than it's meant to.
Edit: Here's a rough sketch of the code that I'm using, in case it helps anyone in a similar situation.
These are compression artifacts, and there is not much you can do for now...
Video codecs are built mainly with the idea of showing real-life colors and shapes, a bit like JPEG at a really low quality setting. They will also do their best to keep as little information as they can between keyframes (some using motion-detection algorithms) so that less data needs to be stored.
These codecs normally have some configurable settings that would let us improve the constant quality of the encoding, but since the MediaRecorder spec is codec-agnostic, it doesn't (yet) expose any option in the API for us web devs other than a fixed bit rate (which won't help us much here).
There is this proposal, though, which asks for such a feature.
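In the meantime, that fixed bit rate is the one knob we do have, via videoBitsPerSecond (or bitsPerSecond) in the MediaRecorder options; raising it can soften the artifacts a bit, even if it isn't real constant-quality control. A rough sketch, assuming an existing canvas variable (the frame rate and bit rate here are arbitrary example values):

```
// Capture the canvas and hint a higher bit rate to the recorder.
const stream = canvas.captureStream(30);   // 30 fps, for example

const mimeType = MediaRecorder.isTypeSupported('video/webm;codecs=vp9')
  ? 'video/webm;codecs=vp9'
  : 'video/webm';

const recorder = new MediaRecorder(stream, {
  mimeType,
  videoBitsPerSecond: 8000000   // 8 Mbps - a hint, not a guarantee
});

const chunks = [];
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks, { type: 'video/webm' });
  // ...download or play back `blob`...
};

recorder.start();
// ...draw frames to the canvas, then call recorder.stop() when done...
```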
I'm a Javascript dilettante. I need to make a webpage for mobile viewing to deploy a dynamically created but ultimately linear audio piece. Essentially I would need to load a playlist, in which some tracks are fixed but others are randomly chosen from a larger pool; there also need to be timed pauses between some of the tracks. It would need only minimal controls, probably just play/pause.
I'm looking into Web Audio API and the basic HTML5 <audio> tag. My two main concerns for choosing between them are compatibility and simplicity of use.
On the compatibility point, I see that on the main page for the API itself it lists no support for Android, but on this more detailed rundown almost all browsers are listed as green. What's the best source to trust?
Assuming Web Audio API is viable for mobile deployment, do I need to use it? Would it make my life easier or is it just overpowered for my purposes? I see it has a handy onended event handler which I see myself using for queuing, and precise timing functions. It also seems to be more explicit about loading the files asynchronously with a callback function on success - I'd want to have a loading screen so that would be useful.
I'm a bit less clear on the capabilities of <audio>. I guess it must be able to do everything I want given HTML5 players have been built before Web Audio API came along - but is it more fiddly?
Web Audio works just fine on mobile.
Web Audio, in contrast to <audio>, breaks the process apart and gives the developer precise control over the loading, decoding and playing of audio. If you need precise timing of audio - like beat-syncing - you should probably use Web Audio. <audio> is pretty imprecise.
That said, a few caveats - since Web Audio by default uses in-memory buffers, it can use a lot more memory than <audio>, and it doesn't have native components for streaming audio. The onended event is NOT the right way to do real chaining of audio, because it's a main-thread JavaScript callback (that is to say, any event handling like this might be delayed by other JS, garbage collection, etc., and could be off by 50 or 100 milliseconds). If you really care about timing, you have to plan ahead and use Web Audio scheduling. (This article I wrote describes this in more detail.)
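To make the scheduling point concrete, here is a rough sketch of a scheduled playlist: decode everything up front, then start() each buffer at a computed time on the audio clock, with the timed pauses baked into the schedule. File names and pause lengths below are placeholders:

```
const ctx = new AudioContext();

async function loadBuffer(url) {
  const res = await fetch(url);
  return ctx.decodeAudioData(await res.arrayBuffer());
}

// Play a list of { url, pauseAfter } entries back to back with timed gaps,
// scheduling everything on the audio clock instead of relying on onended.
async function playSequence(tracks) {
  const buffers = await Promise.all(tracks.map(t => loadBuffer(t.url)));
  let when = ctx.currentTime + 0.1;   // small safety margin

  buffers.forEach((buffer, i) => {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(ctx.destination);
    src.start(when);                  // sample-accurate start time
    when += buffer.duration + (tracks[i].pauseAfter || 0);
  });
}

playSequence([
  { url: 'intro.mp3', pauseAfter: 2 },       // placeholder file names
  { url: 'random-pick.mp3', pauseAfter: 0 },
  { url: 'outro.mp3' }
]);
```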
I'm playing around a bit with the HTML5 <audio> tag and I noticed some strange behaviour that has to do with the currentTime attribute.
I wanted to play a local audio file and have the timeupdate event detect when it finishes, by comparing the currentTime attribute to the duration attribute.
This actually works fine if I let the song play from the beginning to the end - the end of the song is detected correctly.
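Roughly the kind of check I mean (simplified; the file name is a placeholder):

```
const audio = new Audio('track.mp3');   // placeholder file

audio.addEventListener('timeupdate', () => {
  // Treat the track as finished once currentTime reaches duration.
  if (audio.currentTime >= audio.duration) {
    console.log('finished');
  }
});

audio.play();
```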
However, changing currentTime manually (either directly through JavaScript or by using the browser's built-in audio controls) results in the API no longer reporting the correct currentTime: it seems to end up some seconds ahead of the position that's actually playing.
(Those "some seconds ahead" are what I see in Chrome; Firefox seems to go completely haywire, which makes the discrepancy much bigger.)
A little jsFiddle example about the problem: http://jsfiddle.net/yp3o8cyw/2/
Can anybody tell me why this happens - or did I just misunderstand what the API is supposed to do?
P.S.: I just noticed this actually only happens with MP3-encoded files; OGG files are doing totally fine.
After hours of battling this mysterious issue, I believe I have figured out what is going on here. This is not a question of .ogg vs .mp3, this is a question of variable vs. constant bitrate encoding on mp3s (and perhaps other file types).
I cannot take the credit for discovering this, only for scouring the interwebs. Terrill Thompson, a gentleman and a scholar, wrote a detailed article about this problem back on February 1st, 2015, which includes the following excerpt:
Variable Bit Rate (VBR) uses an algorithm to efficiently compress the media, varying between low and high bitrates depending on the complexity of the data at a given moment. Constant Bit Rate (CBR), in contrast, compresses the file using the same bit rate throughout. VBR is more efficient than CBR, and can therefore deliver content of comparable quality in a smaller file size, which sounds attractive, yes?
Unfortunately, there's a tradeoff if the media is being streamed (including progressive download), especially if timed text is involved. As I've learned, VBR-encoded MP3 files do not play back with dependable timing if the user scrubs ahead or back.
I'm writing this for anyone else who runs into this syncing problem (which makes precise syncing of audio and text impossible), because if you do, it's a real nightmare to figure out what is going on.
My next step is to do some more testing, and finally to figure out an efficient way to convert all my .mp3s to constant bit rate. I'm thinking FFmpeg may be able to help, but I'll explore that in another thread. Thanks also to Loilo for originally posting about this issue and Brad for the information he shared.
First off, I'm not actually able to reproduce your problem on my machine, but I only have a short MP3 file handy at the moment, so that might be the issue. In any case, I think I can explain what's going on.
MP3 files (MPEG) are very simple streams and do not have absolute positional data within them. It isn't possible from reading the first part of the file to know at what byte offset some arbitrary frame begins. The media player seeks in the file by needle dropping. That is, it knows the size of the entire track and roughly how far into the track your time offset is. It guesses and begins decoding, picking up as soon as it synchronizes to the next frame header. This is an imprecise process.
Ogg is a more robust container and has time offsets built into its frame headers. Seeking in an Ogg file is much more straightforward.
The other issue is that most browsers that support MP3 do so only because the codec is already available on your system. Ogg Vorbis and MP3 playback usually go through completely different libraries with different APIs. While the web standards do a lot to provide a common abstraction, minor implementation details cause quirks like the ones you're seeing.
I have been trying to establish whether the Web Audio API might be useful for analysis of audio data pulled into an ArrayBuffer in faster than realtime. Possible applications would be doing beat detection, pitch detection, etc., in the browser rather than passing files to a server to do the work.
The AnalyserNode interface seems a good fit for such a task, but it feels clunky because it still requires chaining to an AudioBufferSourceNode and start()ing it before you get any data. And doing it faster than realtime would require bumping up its playbackRate, which would reduce the quality of the analysis.
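For reference, this is the kind of setup I mean: the analyser only yields data while a source is start()ed and playing, so the analysis is effectively tied to realtime playback (sketch only; decodedBuffer stands in for an AudioBuffer obtained elsewhere):

```
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 2048;

// The analyser only produces data once a source is connected and start()ed.
const source = ctx.createBufferSource();
source.buffer = decodedBuffer;          // an AudioBuffer decoded elsewhere (assumption)
source.connect(analyser);
analyser.connect(ctx.destination);
source.start();

const bins = new Float32Array(analyser.frequencyBinCount);
function poll() {
  analyser.getFloatFrequencyData(bins);
  // ...feed `bins` into beat/pitch detection...
  requestAnimationFrame(poll);
}
poll();
```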
Alternatively, using dsp.js may be a better fit, but its repository has been all but inactive for a couple years, which isn't a huge vote of confidence.
I guess the root question is: is the Web Audio API intended for analysis work or is its sole purpose (and thus what it's designed and optimized for) performance and playback? If it's not, have any other standards or tools been proposed or built specifically for audio analysis? Did I answer my own question by mentioning dsp.js?
Not really, no - there's no current way to use the analyser faster than realtime. Web Audio does do analysis, but not faster than realtime, at this point.