Decode audio and play intro then looped part - javascript

I have a song consisting of an intro (I), a part to be looped (L), and an ending (E). I don't want the ending to be played at all, i.e. the audio file is song = I + L + E, but the audio played should be I + L + L + ...
I managed to do this by separating the intro and the loop into two files, but I want to do it "on the fly" on the client side.
How do I do that?

The Web Audio API provides AudioBufferSourceNode.loopStart and AudioBufferSourceNode.loopEnd precisely for this. You also have to remember to set AudioBufferSourceNode.loop = true.
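For example, a minimal sketch (the file name and the introDuration/loopDuration values are placeholders; the real numbers depend on your track):
const ctx = new AudioContext();

// Assumed section lengths, in seconds (placeholders).
const introDuration = 8.0;    // length of I
const loopDuration  = 16.0;   // length of L

fetch('song.mp3')             // hypothetical file name
  .then(res => res.arrayBuffer())
  .then(data => ctx.decodeAudioData(data))
  .then(buffer => {
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.loop = true;                               // loopStart/loopEnd are ignored without this
    source.loopStart = introDuration;                 // beginning of L
    source.loopEnd = introDuration + loopDuration;    // end of L; E is never reached
    source.connect(ctx.destination);
    source.start();                                   // plays I once, then repeats L forever
  });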

If you want to play the file using Web Audio (i.e. decode it into an AudioBuffer using decodeAudioData() and then play the sound with AudioBufferSourceNodes), then it's easy to do by pointing two AudioBufferSourceNodes at the same buffer with different offsets and looping the second one.
Web Audio uses doubles (not floats), so the claim that "this is much more accurate than float seconds" isn't generally true: doubles give roughly 15 significant decimal digits of precision (depending on the absolute value, of course). That's more than accurate enough to set loop points with sample accuracy (i.e. without glitching between the values); if you're concerned, though, just cross-fade between them.
If your goal is really just to produce an audio file with that part looped, that's a little different; but it sounds like you want playback.
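A sketch of that two-node variant, assuming buffer is an already-decoded AudioBuffer and the intro/loop lengths are known (the variables below reuse the placeholder values from the earlier sketch):
// One node plays only the intro; a second node loops only the L section of the same buffer.
const intro = ctx.createBufferSource();
intro.buffer = buffer;
intro.connect(ctx.destination);

const loopPart = ctx.createBufferSource();
loopPart.buffer = buffer;
loopPart.loop = true;
loopPart.loopStart = introDuration;
loopPart.loopEnd = introDuration + loopDuration;
loopPart.connect(ctx.destination);

const t0 = ctx.currentTime + 0.1;                   // small scheduling margin
intro.start(t0, 0, introDuration);                  // play I only: start(when, offset, duration)
loopPart.start(t0 + introDuration, introDuration);  // start L exactly when I ends, then loop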

Related

How can I prevent simultaneously playing js audio from interfering with one another

I have some JavaScript that plays audio from several .wav files using the standard Audio element, e.g.:
var audio = new Audio('audio_file.wav');
audio.play();
When the user plays two sounds in quick succession, the sounds start to interfere with each other and sound distorted/strange. Notably, when I just play the files simultaneously in a media player, this does not happen: the two sounds simply play at the same time without any distortion. Is this a known issue when playing audio in JS, and is there a way to solve it so that multiple sounds playing simultaneously do not distort one another? Any help would be great!
You need to reduce the volume of the sound you're playing.
Multiple sounds are mixing together, periodically "clipping"... pushing the sample values beyond what can be represented in your system's audio sample format.
Try something like this:
audio.volume = 0.6;
Now, if you usually play only one sound at a time but sometimes need several at once, simply reducing the volume may not be desirable, as playback might then be too quiet.
For these times, you might consider switching to the Web Audio API and using a compressor:
https://developer.mozilla.org/en-US/docs/Web/API/DynamicsCompressorNode
This "squeezes" the loud parts and the quiet parts so that they all sound a bit similar, allowing you to reduce the output levels while still sounding loud when possible.

How to convert a wavetable for use with `OscillatorNode.setPeriodicWave`?

I would like to use a custom waveform with a Web Audio OscillatorNode. I'm new to audio synthesis and still struggle quite a lot with the mathematics (I can, at least, program).
The waveforms are defined as functions, so I have the function itself and can sample the wave. However, the createPeriodicWave() method (on the AudioContext) requires two arrays (real and imag) that define the waveform in the frequency domain.
The AnalyserNode has FFT methods for computing an array (of bytes or floats) in the frequency domain, but it works with a signal from another node.
I cannot think of a way to feed a wavetable into the AnalyserNode correctly, but if I could, it only returns a single array, while OscillatorNode.createPeriodicWave requires two.
TL;DR: Starting with a periodic function, how do you compute the corresponding arguments for createPeriodicWave()?
Since you have a periodic waveform defined by a function, you can compute the Fourier Series for this function. If the series has an infinite number of terms, you'll need to truncate it.
This is a bit of work, but this is exactly how the pre-defined Oscillator types are computed. For example, see the definition of the square wave for the OscillatorNode. The PeriodicWave coefficients for the square wave were computed in exactly this way.
If you know the bandwidth of your waveform, you can simplify the work a lot by avoiding the messy integrals: just sample the waveform uniformly at a high enough rate and then use an FFT to get the coefficients you need for the PeriodicWave. See the sampling theorem for the details.
Or you can just assume that the sample rate of the AudioContext (typically 44.1 kHz or 48 kHz) is high enough, sample your waveform every 1/44100 or 1/48000 s, and compute the FFT of the resulting samples.
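A minimal sketch of that "sample and transform" approach: sample one period of the function, compute the Fourier coefficients with a plain DFT (an FFT library would be faster, but this keeps the example dependency-free), and pass the real/imag arrays to createPeriodicWave on the AudioContext. The makePeriodicWave helper and the sawtooth below are illustrative, not from the original answer.
// Turn a periodic function f(phase), phase in [0, 1), into a PeriodicWave.
// N controls how many samples per period are taken (and thus how many
// harmonics are kept); a plain O(N^2) DFT is used for clarity.
function makePeriodicWave(ctx, f, N = 2048) {
  const real = new Float32Array(N / 2);
  const imag = new Float32Array(N / 2);
  for (let k = 1; k < N / 2; k++) {          // index 0 (DC) is ignored by Web Audio
    let a = 0, b = 0;
    for (let n = 0; n < N; n++) {
      const sample = f(n / N);
      a += sample * Math.cos(2 * Math.PI * k * n / N);
      b += sample * Math.sin(2 * Math.PI * k * n / N);
    }
    real[k] = (2 / N) * a;                   // cosine terms
    imag[k] = (2 / N) * b;                   // sine terms
  }
  return ctx.createPeriodicWave(real, imag);
}

// Usage: a sawtooth defined as a plain function of phase.
const ctx = new AudioContext();
const osc = ctx.createOscillator();
osc.setPeriodicWave(makePeriodicWave(ctx, phase => 2 * phase - 1));
osc.connect(ctx.destination);
osc.start();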
I just wrote an implementation of this. To use it, drag and drop the squares to form a waveform and then play the piano that appears afterwards. Watch the video in this tweet to see a use example. The live demo is in alpha version, so the code and UI are a little rough. You can check out the source here.
I didn't write any documentation, but I recorded some videos (Video 1) (Video 2) (Video 3) of me coding the project live. They should be pretty self-explanatory. There are a couple of bugs in there that I fixed later. For the working version, please refer to the github link.

SourceBuffer.remove(start, end) removes whole buffered TimeRange (How to handle realtime stream with MSE?)

I have a SourceBuffer with a single entry in .buffered. I have a realtime stream of raw h.264 data arriving, which I package into MP4 and push into the SourceBuffer with .appendBuffer(data). Since this is a realtime stream I need to keep clearing the buffer; however, this is where I run into my problem (i.e. a QuotaExceededError).
For example's sake, my single entry in SourceBuffer.buffered has a time range of 0-10 seconds. My attempt to tidy the buffer is to call SourceBuffer.remove(0, 8). My expectation is that the 0-8 range would be cleared and I'd be left with a time range of 8-10. However, the entire time range (my only range) is removed, and from this point on all further appendBuffer calls seem to do nothing.
Three questions relevant to this issue:
How do I a) stop .remove from having this behaviour, or b) force new time ranges in my buffer so that only "old" ranges are removed?
Why do the later appendBuffer calls do nothing? I would expect them to re-populate the SourceBuffer.
Is there a better "MSE" way to handle a realtime stream where I never care about going back in time? Ie. All rendered data can be thrown away.
In case there's some weird Browser/Platform issue going on I'm using Chrome on Ubuntu.
Also, I am basing my code off of https://github.com/xevokk/h264-converter.
It's all in the MSE spec.
http://w3c.github.io/media-source/#sourcebuffer-coded-frame-removal
Step 3.3: Remove all media data, from this track buffer, that contain starting timestamps greater than or equal to start and less than the remove end timestamp.
So the user agent will remove all the data you've requested, from 0 to 8 s.
Then
Step 3.4: Remove all possible decoding dependencies on the coded frames removed in the previous step by removing all coded frames from this track buffer between those frames removed in the previous step and the next random access point after those removed frames.
The user agent will remove all frames that depend on the ones you've just removed. Due to the way h.264 (and every modern video codec) works, that is every frame following the last keyframe up to the next keyframe, as none of those frames can be decoded any more.
There is no keyframe in the 8-10 s range, so those frames are all removed as well.
Why do the later appendBuffer calls do nothing? I would expect them to re-populate the SourceBuffer.
You have removed data; per the spec, the next frame you append must be a keyframe. If the segment you append contains no keyframe, nothing will be added.
If the data you append consists of a single keyframe at the start followed by only P-frames, you can't remove any frames in the middle without rendering all the frames that follow unusable.
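As for the third question (handling a realtime stream where old data can be thrown away), one common pattern is to only ever remove data that playback has already passed, leaving a margin behind currentTime so the keyframe the decoder currently depends on stays in the buffer. This is a sketch, not from the original answer; sourceBuffer and video stand in for your own objects:
// Trim already-watched data instead of most of the buffer.
const KEEP_BEHIND = 10;   // seconds of history to keep; illustrative value

function trimBuffer(sourceBuffer, video) {
  if (sourceBuffer.updating || sourceBuffer.buffered.length === 0) return;
  const start = sourceBuffer.buffered.start(0);
  const end = video.currentTime - KEEP_BEHIND;
  if (end > start) {
    sourceBuffer.remove(start, end);   // asynchronous; wait for 'updateend'
  }
}

sourceBuffer.addEventListener('updateend', () => {
  // safe to append the next segment (or call trimBuffer again) here
});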

webaudio API: adjust play length of an audio sample (eg "C5.mp3")?

Can I use a (soundfont) sample, e.g. "C5.mp3", and extend or shorten its duration to a given time (without distorting the pitch)?
It would be great if this were as easy as using an oscillator and changing the NoteOn and NoteOff timings, but with a more natural sound than sine waves. Can that be done easily without having to resort to MIDI.js or similar?
You would either need to loop the mp3 or record a longer sample. Looping can be tricky to make seamless without clicks or pops though, and depending on the sample you would hear the attack of the note each time it looped.
.mp3s and other formats of recorded audio are all finite, predetermined sets of binary data. The reason it's so easy to manipulate sine waves with an oscillator is that the Web Audio API dynamically generates the wave based on the input you give it.
Good soundfonts include loop points for endless sustain. See this WebAudioFont example:
https://surikov.github.io/webaudiofont/examples/flute.html
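As a rough illustration of how loop points let you sustain a note for an arbitrary duration: loop a steady portion of the sample and fade out when the note should end. The loopStart/loopEnd values below are made up; a real soundfont ships the correct ones.
function playNote(ctx, buffer, duration) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;        // decoded sample, e.g. "C5.mp3"
  source.loop = true;
  source.loopStart = 0.25;       // start of the steady sustain region (assumed)
  source.loopEnd = 0.75;         // end of the sustain region (assumed)

  const gain = ctx.createGain();
  source.connect(gain).connect(ctx.destination);

  const now = ctx.currentTime;
  gain.gain.setValueAtTime(1, now + duration);
  gain.gain.linearRampToValueAtTime(0, now + duration + 0.05);  // short release to avoid a click
  source.start(now);
  source.stop(now + duration + 0.05);
}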

Video does not play through even if enough content has been appended

I have a setup where I send a 10 min long video (Elephants Dream) using the WebSocket protocol, chunked into short segments of 4 s each.
I use the browser as the client, with the WebSocket API to receive the content and the HTML5 video tag as the player; I append the chunks to the video as they arrive using Media Source Extensions.
The thing is that there seems to be a limit somewhere (max receive buffer size, max MediaSource SourceBuffer size, max buffered content on the video element, etc.), so the video does not play correctly to the end but stops earlier, even though there is enough data.
All of the segments are arriving correctly and get appended in time. At the same time, the video starts playing back from the beginning.
You can see the grey line on the player showing buffered video grow until at some point in time where it stops growing and the video stops playing when getting to this position.
However, according to the log output, the full video has been appended to the MediaSource, which can also be verified by manually seeking forward or backward. It looks like only a fraction of the content is ever "loaded".
Since I'm testing on localhost the throughput is very high, so I tried lowering it to more common values (still well above the video bitrate) to see if I was overloading the client, but this did not change anything.
I also tried different segment sizes, with exactly the same results, except that the point in time where it stops is different.
Any idea on where this limitation can be or what may be happening?
I think you have a gap in the buffered data. Browsers have a limited buffer size to which you can append. When that limit is reached and you append additional data, the browser will silently free some space by discarding some frames it does not need from the buffer. In my experience, if you append too fast, you may end up with gaps in your buffer. You should monitor the buffered attribute while appending to see if there is any gap.
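A quick way to do that monitoring, as a sketch (sourceBuffer and video stand in for your own objects): a gap shows up as more than one buffered range, or as a range that stops short of the position where playback stalls.
sourceBuffer.addEventListener('updateend', () => {
  const ranges = [];
  for (let i = 0; i < sourceBuffer.buffered.length; i++) {
    ranges.push(`${sourceBuffer.buffered.start(i).toFixed(3)}-${sourceBuffer.buffered.end(i).toFixed(3)}`);
  }
  console.log('buffered:', ranges.join(', '), 'currentTime:', video.currentTime);
});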
Are you changing representations right before it stops? When you change representations, you need to append the init segment for the new representation before you append the next segment of the new representation.
