Codec of processed audio with Web Audio API


For the last two weeks, I have been trying to record audio and change its codec in real time in a browser using JavaScript. I used a ScriptProcessorNode in all my test cases to record the audio, and I tried many libraries and packages to encode and decode it, but none of them worked. There was always a problem: sometimes the encoder could not recognize the audio codec, and sometimes the decoder could not decode the encoded data.
I made something like this in C# with PvRecorder, which records audio as a short[], which is actually a PCM buffer of 16-bit signed samples.
However, the channel data of the script processor is a float32[], which is quite different or even weird in comparison: its values range from very small negative numbers to positive numbers around one.
Now I'm just wondering what this channel data actually is.
What is the type and codec of this float32[]?
.wav, .pcm or what?

The channel data of an AudioBuffer typically contains values between -1 and 1. But since the values are stored in a Float32Array, they could also be much larger or smaller. However, playing such an AudioBuffer with values outside of the nominal range will likely result in audible distortion. In other words, the data is raw, uncompressed linear PCM stored as 32-bit floats; it is not an encoded format or a container such as .wav.
https://webaudio.github.io/web-audio-api/#AudioBuffer
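If an encoder expects 16-bit signed PCM (like the short[] mentioned in the question), the float samples have to be scaled first. Here is a minimal sketch of that conversion. The function name floatTo16BitPCM is made up for illustration, and it assumes the encoder wants plain 16-bit signed samples:

```javascript
// Convert Web Audio float samples (nominally -1..1) to 16-bit signed PCM.
// floatTo16BitPCM is a hypothetical helper name, not part of the Web Audio API.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first, because a Float32Array may hold values outside the nominal range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative samples scale by 32768 and positive ones by 32767 to fill the int16 range.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

const example = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5, 2]));
console.log(Array.from(example)); // [0, 32767, -32768, 16384, 32767]
```

In a ScriptProcessorNode callback, the input would typically be event.inputBuffer.getChannelData(0).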

Related

Copy data from smaller AudioBuffers into a bigger Float32Array

I want to analyze the frequencies coming from the microphone input with a resolution of <1Hz in the browser.
The normal Web Audio AnalyserNode has a maximum fftSize of 32768. This results in a resolution of ~1.5Hz for normal sample rates (48kHz).
Now I want to use jsfft or something similar to do the frequency transform. I want to collect 65536 audio samples, as this FFT size should reach a resolution of ~0.7Hz. (Time resolution is not that important.)
Unfortunately, the ScriptProcessorNode also only has a maximum bufferSize of 16384, so I want to combine 4 of its buffers into one Float32Array.
I thought that there would be something like
copyChannelData(array, offset, length)
but there is only
getChannelData(channel)
So if I understand right, I would have to copy all the data into my bigger array before I can do the FFT.
Just to be sure I don't miss anything... Is there a way to retrieve the data directly into my bigger array?

No, you will need to copy the data. ScriptProcessor is pretty inefficient anyway, so the copy is not the worst of your worries; you are fundamentally going to need to copy that data regardless.
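For what it's worth, the copy itself is short to write with Float32Array.prototype.set, which takes a target offset. The sketch below is one possible way to do it; createAccumulator and binResolutionHz are made-up names, and it assumes the total length is a multiple of the chunk size (any overflow in the last chunk is simply dropped):

```javascript
// Collects fixed-size chunks into one larger Float32Array.
// createAccumulator is a hypothetical helper, not part of the Web Audio API.
function createAccumulator(totalLength) {
  const big = new Float32Array(totalLength);
  let written = 0;
  return {
    // Copies one chunk in. Returns the filled array once totalLength samples
    // have arrived, otherwise null.
    push(chunk) {
      const count = Math.min(totalLength - written, chunk.length);
      big.set(chunk.subarray(0, count), written);
      written += count;
      if (written < totalLength) return null;
      written = 0;
      return big.slice(); // copy, so the next round does not overwrite the result
    },
  };
}

// Width of one FFT bin in Hz.
function binResolutionHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

console.log(binResolutionHz(48000, 65536)); // 0.732421875

// Possible use inside a ScriptProcessorNode callback (runFft is hypothetical):
//   const acc = createAccumulator(65536);
//   node.onaudioprocess = (event) => {
//     const full = acc.push(event.inputBuffer.getChannelData(0));
//     if (full) runFft(full);
//   };
```

With a bufferSize of 16384, every fourth callback would hand a complete 65536-sample array to the FFT.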

How much data does a string data type take up?

Very simple question: how much data (in bytes) do strings take up? Do they take up 1 byte per character?
I tried searching it up, but W3Schools doesn't say...
I want to know this to reduce bandwidth in my web app.
Also, for anyone who knows, does socket.io automatically JSON-stringify when using socket.emit()?

A string is a character array, so it will take up roughly sizeof(char) * noOfCharacters, ignoring the other fields of the String class for now. A character can take 1 byte, 2 bytes or more, depending on the system and on the encoding used to represent it (Unicode etc.).
However, from your question, you are more interested in the data being transported over the network. Note that data is always exchanged as bytes (byte[]), so the string will be converted into its byte[] representation first and then sent over.
To limit bandwidth usage, you can enable compression or choose a compact, interoperable serialisation technique (protobuf, Smile, Fast Infoset etc.).
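To see the byte[] size in practice, you can encode the string yourself and count. This is a small sketch assuming the payload is sent as UTF-8, which is an assumption about your transport; utf8ByteLength is a made-up helper name:

```javascript
// Number of bytes a string occupies once encoded as UTF-8.
// utf8ByteLength is a hypothetical helper built on the standard TextEncoder.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("hello")); // 5
console.log(utf8ByteLength("héllo")); // 6
console.log(utf8ByteLength("€"));     // 3
console.log("€".length);              // 1 (UTF-16 code units, not bytes)
console.log(utf8ByteLength(JSON.stringify({ x: 1, y: 2 }))); // 13
```

So plain ASCII text costs 1 byte per character in UTF-8, while other characters cost more, and the string's .length is not a byte count.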

Start MPEG-DASH Stream at Arbitrary Segment

Say there are 20 segments in an MPEG-DASH stream, and the stream typically starts at index 0. Is it possible to start at index 13, assuming an init file/byte sequence has already been queued into the Media Source buffer? An example in which this use case would be practical is something like Netflix's resumption feature, where someone could continue streaming on another device/browser. (Presumably with the same init data as when started from the beginning.)
My only thought is that my assumption is wrong, and there would be a different initialization chunk for each point at which the media could be paused… but that would just be silly… right?

The simple answer is that yes it is possible and as you suggest this can be used for resume playback features. It can also be used for 'start-over' on live streams and to jump forwards or backwards to a particular point in a video.
MPEG-DASH supports two main file formats (or video container formats): ISO Base Media File Format (ISOBMFF, which is often referred to as MP4 although strictly speaking it is a generalisation of MP4) and MPEG-TS.
The MPEG-DASH standard uses the concept of 'Periods' as one of its fundamental building blocks. A period represents a part of the content stream and includes a start time and a duration. To be able to play back the content in a given period, you still need some initialisation data.
Looking at ISOBMFF, there is an init segment as you suggest which contains this required data and which is defined by W3C as:
Initialization Segment
A sequence of bytes that contain all of the initialization information required to decode a sequence of media segments. This includes codec initialization data, Track ID mappings for multiplexed segments, and timestamp offsets (e.g., edit lists).
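To make the resume case concrete, here is a rough sketch of picking the segment to start from. It assumes every segment has the same duration and that segment URLs follow a $Number$-style template; both function names and the URL pattern are made up for illustration and are not taken from any particular player:

```javascript
// Hypothetical helper: which media segment contains a given playback time,
// assuming every segment has the same duration.
function segmentNumberForTime(timeSeconds, segmentDurationSeconds, startNumber = 0) {
  return startNumber + Math.floor(timeSeconds / segmentDurationSeconds);
}

// Hypothetical URL template in the style of a DASH SegmentTemplate with $Number$.
function segmentUrl(template, number) {
  return template.replace("$Number$", String(number));
}

// Resume at 54 s into a stream cut into 4-second segments, numbered from 0.
const n = segmentNumberForTime(54, 4);
console.log(n); // 13
console.log(segmentUrl("video/seg-$Number$.m4s", n)); // video/seg-13.m4s
```

The init segment is appended once, and after that the media segments can be appended starting from segment 13 rather than segment 0.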

How to render dynamic bitmap stream to canvas?

I have written an OpenGL game and I want to allow remote playing of the game through a canvas element. Input is easy, but video is hard.
What I am doing right now is launching the game via node.js and in my rendering loop I am sending to stdout a base64 encoded stream of bitmap data representing the current frame. The base64 frame is sent via websocket to the client page, and rendered (painstakingly slowly) pixel by pixel. Obviously this can't stand.
I've been kicking around the idea of trying to generate a video stream, which I could then easily render onto a canvas through a video tag (à la http://mrdoob.github.com/three.js/examples/materials_video.html).
The problem I'm having with this idea is that I don't know enough about codecs/streaming to determine at a high level whether this is actually possible. I'm not even sure the codec is the part I need to worry about, given that the content changes dynamically and is possibly only rendered a few frames ahead.
Other ideas I've had:
Trying to create an HTMLImageElement from the base64 frame
Attempting to optimize compression / redraw regions so that the pixel bandwidth is much lower (seems unrealistic to achieve the kind of performance I'd need to get 20+fps).
Then there's always the option of going with Flash... but I'd really prefer to avoid it. I'm looking for some opinions on technologies to pursue. Any ideas?

Try transforming RGB into the YCbCr color space and stream the pixel values as:
Y1 Y2 Y3 Y4 Y5 .... Cb1 Cb2 Cb3 Cb4 Cb5 .... Cr1 Cr2 Cr3 Cr4 Cr5 ...
There would be many repeating patterns, so any compression algorithm will compress it better than an RGBRGBRGB sequence.
http://en.wikipedia.org/wiki/YCbCr
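A minimal sketch of that layout, assuming the frame arrives as interleaved RGBA bytes. The coefficients are the full-range BT.601 ones used by JPEG, which is my assumption since the answer above doesn't name a variant, and rgbaToPlanarYCbCr is a made-up name:

```javascript
// Convert interleaved RGBA bytes to planar YCbCr: all Y, then all Cb, then all Cr.
// Coefficients: full-range BT.601 as used by JPEG (an assumption, see lead-in).
function rgbaToPlanarYCbCr(rgba) {
  const pixelCount = rgba.length / 4;
  // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
  const out = new Uint8ClampedArray(pixelCount * 3);
  for (let i = 0; i < pixelCount; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2]; // the alpha byte at i * 4 + 3 is ignored
    out[i] = 0.299 * r + 0.587 * g + 0.114 * b;
    out[pixelCount + i] = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b;
    out[2 * pixelCount + i] = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b;
  }
  return out;
}
```

The output can then be handed to whatever general-purpose compressor you use before sending it; the client has to apply the inverse transform before drawing.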

Why base64 encode the data? I think you can push raw bytes over a WebSocket.
If you've got a linear array of RGBA values in the right format you can dump those straight into an ImageData object for subsequent use with a single ctx.putImageData() call.
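Roughly like this, as a sketch. drawFrame is a made-up name, the frame size and socket URL in the comments are placeholders, and it assumes each WebSocket message is exactly one frame of width * height * 4 RGBA bytes:

```javascript
// Draw one frame of raw RGBA bytes onto a 2D canvas context.
// drawFrame is a hypothetical helper; ctx is a CanvasRenderingContext2D.
function drawFrame(ctx, rgbaBytes, width, height) {
  if (rgbaBytes.length !== width * height * 4) {
    throw new Error("frame size does not match width * height * 4");
  }
  const imageData = ctx.createImageData(width, height);
  imageData.data.set(rgbaBytes); // one bulk copy instead of per-pixel drawing
  ctx.putImageData(imageData, 0, 0);
}

// Browser-side wiring, shown as comments because it needs a page and a server:
//   const ctx = document.querySelector("canvas").getContext("2d");
//   const socket = new WebSocket("ws://localhost:8080"); // placeholder URL
//   socket.binaryType = "arraybuffer";
//   socket.onmessage = (e) => drawFrame(ctx, new Uint8Array(e.data), 640, 480);
```

Setting binaryType to "arraybuffer" is what lets the client receive the bytes directly, with no base64 step in between.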

Is it possible to find stretches of silence in audio files with Javascript?

I've been working on a tool to transcribe recordings of speech with Javascript. Basically I'm hooking up key events to play, pause, and loop a file read in with the audio tag.
There are a number of advanced existing desktop apps for doing this sort of thing (such as Transcriber -- here's a screenshot). Most transcription tools have a built-in waveform that can be used to jump around the audio file, which is very helpful because the transcriber can learn to visually find and repeat or loop phrases.
I'm wondering if it's possible to emulate a subset of this functionality in the browser, with Javascript. I don't know much about signal processing, perhaps it's not even feasible.
But what I envision is JavaScript reading the sound stream from the file and periodically sampling the amplitude. If the amplitude is very low for longer than a certain threshold of time, then that would be labeled as a phrase break.
Such labeling, I think, would be very useful for transcription. I could then set up key commands to jump to the previous period of silence. So hypothetically (imagining a jQuery-based API):
var audio = $('audio#someid');
var silences = silenceFindingVoodoo(audio);
silences will then contain a list of times, so I can hook up some way to let the user jump around through the various silences, and then set the currentTime to a chosen value, and play it.
Is it even conceivable to do this sort of thing with Javascript?

Yes, it's possible with the Web Audio API; to be more precise, you will need an AnalyserNode. As a short proof of concept, you can take this example and add the following code to drawTimeDomain():
// amplitudeArray is assumed to hold unsigned bytes in which 128 means zero amplitude
var threshold = 1000;
var sum = 0;
for (var i = 0; i < amplitudeArray.length; i++) {
  // accumulate the distance of each sample from the 128 midpoint
  sum += Math.abs(128 - amplitudeArray[i]);
}
var test = (sum < threshold) ? 'silent' : 'sound';
console.log('silent info', test);
You will just need additional logic to filter silences by duration (e.g. only silence lasting more than 500 ms should be treated as real silence).
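The duration filter could look something like the sketch below. It works on decoded float samples (for example from decodeAudioData and getChannelData(0)) rather than on the analyser's byte array. findSilences is a made-up name and the default threshold is an illustrative guess, not a tuned value:

```javascript
// Find stretches where the signal stays quiet for at least minSilenceMs.
// samples: Float32Array of mono PCM in -1..1. threshold is a peak amplitude.
function findSilences(samples, sampleRate, threshold = 0.02, minSilenceMs = 500) {
  const minSamples = Math.ceil((minSilenceMs / 1000) * sampleRate);
  const silences = [];
  let runStart = -1; // index where the current quiet run began, or -1 if none
  // Iterate one step past the end so a quiet run that reaches the end is closed.
  for (let i = 0; i <= samples.length; i++) {
    const quiet = i < samples.length && Math.abs(samples[i]) < threshold;
    if (quiet && runStart === -1) {
      runStart = i;
    } else if (!quiet && runStart !== -1) {
      if (i - runStart >= minSamples) {
        silences.push({ start: runStart / sampleRate, end: i / sampleRate });
      }
      runStart = -1;
    }
  }
  return silences; // times in seconds
}
```

Checking single samples is crude, since one click inside a pause splits it in two; averaging over short windows would be more robust, but this keeps the idea visible.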

I think this is possible using javascript (although maybe not advisable, of course). This article:
https://developer.mozilla.org/En/Using_XMLHttpRequest#Handling_binary_data
... discusses how to access files as binary data, and once you have the audio file as binary data you could do whatever you like with it (I guess, anyway - I'm not real strong with javascript). With audio files in WAV format, this would be a trivial exercise, since the data is already organized by samples in the time domain. With audio files in a compressed format (like MP3), transforming the compressed data back into time-domain samples would be so insanely difficult to do in javascript that I would found a religion around you if you managed to do it successfully.
Update: after reading your question again, I realized that it might actually be possible to do what you're discussing in javascript, even if the files are in MP3 format and not WAV format. As I understand your question, you're actually just looking to locate points of silence within the audio stream, as opposed to actually stripping out the silent stretches.
To locate the silent stretches, you wouldn't necessarily need to convert the frequency-domain data of an MP3 file back into the time-domain of a WAV file. In fact, identifying quiet stretches in audio can actually be done more reliably in the frequency domain than in the time domain. Quiet stretches tend to have a distinctively flat frequency response graph, whereas in the time domain the peak amplitudes of audible speech are sometimes not much higher than the peaks of background noise, especially if auto-leveling is occurring.
Analyzing an MP3 file in javascript would be significantly easier if the file were CBR (constant bit rate) instead of VBR (variable bit rate).

As far as I know, JavaScript is not powerful enough to do this.
You'll have to resort to Flash, or some sort of server-side processing to do this.
With the HTML5 audio/video tags, you might be able to trick the page into doing something like this. You could (hypothetically) identify silences server-side and send the timestamps of those silences to the client as metadata in the page (hidden fields or something), and then use that to allow JavaScript to identify those spots in the audio file.
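On the client, using such timestamps would be fairly simple. A sketch, assuming the server delivers a sorted array of silence start times in seconds; previousSilence and the sample values are made up:

```javascript
// Given silence start times in seconds (sorted ascending) and the current playback
// position, return the start of the closest silence before that position, or 0.
function previousSilence(silenceTimes, currentTime) {
  let result = 0;
  for (const t of silenceTimes) {
    if (t < currentTime) result = t;
    else break;
  }
  return result;
}

const silenceTimes = [2.5, 7.1, 12.8]; // made-up values standing in for server output
console.log(previousSilence(silenceTimes, 9)); // 7.1

// In the page, with an <audio> element in the variable audio:
//   audio.currentTime = previousSilence(silenceTimes, audio.currentTime);
```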

If you use Web Worker threads you may be able to do this in JavaScript, at the cost of running more threads in the browser. You could break the problem up across multiple threads and process it, but it would be all but impossible to synchronize this with the playback. So JavaScript can determine the silent periods by doing some audio processing, but since you can't link that to the playback well, it would not be the best choice.
But, if you wanted to show the waveforms to the user then javascript and canvas can be used for this, but then see the next paragraph for the streaming.
Your best bet would be to have the server stream the audio and it can do the processing and find all the silences. Each of these should then be saved in a separate file, so that you can easily jump between the silences, and by streaming, your server app can determine when to load up the new file, so there isn't a break.

I don't think JavaScript is the tool you want to use for processing those audio files; that's asking for trouble. However, JavaScript could easily read a corresponding XML file which describes where those silences occur in the audio file, adjusting the user interface appropriately. Then, the question is what you use to generate those XML files:
You can do it manually if you need to demo the capability right away. (Use Audacity to see where those audio envelopes occur.)
Check out this CodeProject article, which creates a wav processing library in C#. The author has created a function to extract silence from the input file. Probably a good place to start hacking.
Just two of my initial thoughts... There are a lot of audio processing APIs out there, but they are written for particular frameworks and application programming languages. Definitely make use of them before trying to write something from scratch... unless you happen to really love Fourier transforms.
