For the last two weeks I have been trying to record audio and change its codec in real time in a browser using JavaScript. I used a ScriptProcessorNode in all my test cases to record the audio, and I tried many libraries and packages to encode and decode it, but none of them worked. There was always a problem: sometimes the encoder couldn't recognize the audio codec, and sometimes the decoder wasn't able to decode the encoded data.
I made something like this in C# with PvRecorder, which records audio as a short[], i.e. a PCM buffer of 16-bit integer samples.
However, the channel data of a ScriptProcessorNode is a Float32Array, which looks very different, even weird, in comparison: its values range from very small negative numbers to positive numbers around one.
Now I'm just wondering what this channel data actually is?
What is the type and codec of this Float32Array?
.wav, .pcm or what?
The channel data of an AudioBuffer typically contains values between -1 and 1, but since the values are stored in a Float32Array they can also be much larger or smaller. However, playing an AudioBuffer with values outside of that nominal range will likely result in audible distortion. In other words, it isn't .wav or any other container format; it's raw, uncompressed PCM stored as 32-bit floats.
https://webaudio.github.io/web-audio-api/#AudioBuffer
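If you want to compare it with the C# short[] mentioned in the question, a minimal sketch of converting the float samples to 16-bit PCM could look like the following; the clamping and the helper name are my own, not part of the Web Audio API:

// Minimal sketch: convert Web Audio Float32 samples (nominal range -1..1)
// to 16-bit signed PCM, roughly what a C# short[] from a recorder holds.
function floatTo16BitPCM(float32Samples) {
  var pcm = new Int16Array(float32Samples.length);
  for (var i = 0; i < float32Samples.length; i++) {
    // clamp to the nominal range first, since Float32Array values may exceed it
    var s = Math.max(-1, Math.min(1, float32Samples[i]));
    // scale to the 16-bit integer range
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return pcm;
}

// Usage, e.g. inside a ScriptProcessorNode's onaudioprocess handler:
// var pcm = floatTo16BitPCM(event.inputBuffer.getChannelData(0));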
I'm trying to parse the MPD of several DASH live streaming servers to get newly published segments.
There is an attribute named "minimumUpdatePeriod" in the MPD, so the manifest should be updated within that time, and usually one or two segments are newly published during the period. This way I can fetch the segment list regularly, and the segments accumulate one by one.
But some servers behave differently: even after the minimum update period has expired, no new segments are published. Instead, a seemingly random number of segments is published at random intervals, and I can't understand why this happens.
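For reference, this is roughly the kind of polling I mean; it is a simplified sketch with plain DOM parsing rather than my actual Shaka-based module, and the manifest URL is just a placeholder:

// Simplified illustration: poll the MPD on the minimumUpdatePeriod schedule
// and count SegmentTimeline entries. Duration parsing is reduced to "PTxS".
async function pollMpd(url) {
  const res = await fetch(url);
  const xml = new DOMParser().parseFromString(await res.text(), 'application/xml');
  const mpd = xml.documentElement;

  // minimumUpdatePeriod is an ISO 8601 duration, e.g. "PT2S"
  const mup = mpd.getAttribute('minimumUpdatePeriod') || 'PT2S';
  const seconds = parseFloat((mup.match(/PT([\d.]+)S/) || [])[1] || '2');

  // Count the <S> entries in the SegmentTimeline as a rough segment count
  const segmentCount = xml.getElementsByTagName('S').length;
  console.log('segments in manifest:', segmentCount);

  setTimeout(() => pollMpd(url), seconds * 1000);
}

pollMpd('https://example.com/live/stream.mpd');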
Currently I'm using the Google Shaka Player's submodule to parse the MPD and collect the segment lists, and I don't think there is any problem in it, which is why I'm so confused. I made a simple JavaScript module that uses Shaka; it works really well with normal servers, as I said, but not with the other ones.
The platform is Ubuntu 20.04, and I'm using the latest version of Shaka Player.
Thanks for reading; I'd appreciate your opinions.
I'm using this library I found to read a MIDI file.
As there is very little documentation, I have no idea how to interpret the output object.
Question: what do the channel, data, deltaTime and type keys mean?
In the end I would love to map this JS object to some kind of visualization.
Channel: The MIDI format uses the concept of channels so that different MIDI devices can listen only to the events intended for them. This makes it possible to use a single MIDI file for multiple instruments that should play different notes, etc. So when you get a note-on event you should check the event's channel and only play the instruments that are interested in events on that channel.
Data: Data is a bit arbitrary, but in your example we have an event of type 255 (0xFF), which is a meta event. It has a meta type of 3 (0x03), which means it's a Sequence/Track Name event. This was probably assigned by the program that created the MIDI file you use. There's a pretty nifty and concise list of events here: http://www.ccarh.org/courses/253/handout/smf/
deltaTime: Since the events in a MIDI file are tempo agnostic, it uses the concept of ticks. It's basically a resolution expressed as ticks per quarter note. I think 480 ticks per quarter note is pretty standard, though that is purely based on my own experience, so YMMV. Events can then be expressed either in absolute time (i.e. this note-on event happens 4800 ticks from the start of the track) or in delta time. Delta time is the number of ticks since the last MIDI event happened.
Type: Each MIDI event in a MIDI file has a type to identify what kind of event it is. This matters because different types of events have different formats (and thus change the way we decode them, since MIDI is a binary format): some have a fixed length, while others include information on how long the event is (the number of bytes that make up the event).
It's been a couple of years since I last worked with the MIDI format, but I think the above is accurate.
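To connect this back to the parsed object: a minimal sketch of walking a track, accumulating deltaTime and converting ticks to seconds could look like the code below. It is not tied to any particular parser; the event shape ({ deltaTime, type, channel, ... }), the 'setTempo' type name and the microsecondsPerBeat field are assumptions for illustration.

// Rough sketch, not tied to any specific parser API: assumes each event has a
// deltaTime in ticks and a type, and that the file header exposes the
// ticks-per-quarter-note resolution (here called ticksPerBeat).
function eventsToTimeline(track, ticksPerBeat) {
  var microsecondsPerBeat = 500000; // default tempo: 120 BPM
  var seconds = 0;
  var timeline = [];

  track.forEach(function (event) {
    // deltaTime is the gap (in ticks) since the previous event;
    // convert it to seconds using the tempo currently in effect
    seconds += (event.deltaTime / ticksPerBeat) * (microsecondsPerBeat / 1e6);

    // a set-tempo meta event changes the conversion for subsequent events
    if (event.type === 'setTempo' && event.microsecondsPerBeat) {
      microsecondsPerBeat = event.microsecondsPerBeat;
    }

    timeline.push({ seconds: seconds, channel: event.channel, type: event.type });
  });

  return timeline;
}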
I have an ongoing stream of data that consists of nothing but a single integer for every piece of data I receive.
So I get something like:
6462
6533
6536
6530
6462
376135
623437
616665
616362
616334
Here is a graph of a complete pattern.
Now, I know I will get a specific pattern in this stream of ints, with a certain error margin. I know it will never be exactly the same pattern, but I will get a very similar one every now and then. It's a very high-amplitude pattern similar to the numbers shown in the example: it basically oscillates between three states and shows finer-grained differences within each state, but the interesting parts are the big differences.
I have no experience in pattern matching or analyzing data streams and no idea where to start. Ideally I would provide my code with a data set and it would check whether the incoming data stream matches that data set within a certain margin.
That is my biggest problem: I can't compare complete data sets. I have to compare a complete reference set with one that is being constantly generated and see whether my pattern is starting to occur in that constant stream.
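To make that concrete, something like the sliding-window comparison below is what I imagine; the mean-absolute-difference measure and the threshold are just placeholders I made up:

// Illustrative sketch: keep the last N values in a buffer and compare them
// against a reference pattern using a mean absolute difference, scaled by
// the pattern's own magnitude so the margin is relative.
function createPatternMatcher(pattern, threshold) {
  const window = [];

  return function onValue(value) {
    window.push(value);
    if (window.length > pattern.length) window.shift();
    if (window.length < pattern.length) return false;

    let diff = 0;
    let scale = 0;
    for (let i = 0; i < pattern.length; i++) {
      diff += Math.abs(window[i] - pattern[i]);
      scale += Math.abs(pattern[i]);
    }
    return diff / scale < threshold; // e.g. threshold = 0.1 for ~10% deviation
  };
}

// const isMatch = createPatternMatcher(referencePattern, 0.1);
// then call isMatch(value) for every new integer that arrives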
The main program is currently running in JavaScript/Node.js, and I'm not sure whether JavaScript is suitable for this task, but it would be great if I could stay in JavaScript.
Though if there are libraries that help with these kinds of tasks in other languages, I would be keen to test them out.
I've been working on a tool to transcribe recordings of speech with JavaScript. Basically I'm hooking up key events to play, pause, and loop a file read in with the audio tag.
There are a number of advanced existing desktop apps for doing this sort of thing (such as Transcriber -- here's a screenshot). Most transcription tools have a built-in waveform that can be used to jump around the audio file, which is very helpful because the transcriber can learn to visually find and repeat or loop phrases.
I'm wondering if it's possible to emulate a subset of this functionality in the browser with JavaScript. I don't know much about signal processing, so perhaps it's not even feasible.
But what I envision is JavaScript reading the sound stream from the file and periodically sampling the amplitude. If the amplitude is very low for longer than a certain threshold of time, that stretch would be labeled as a phrase break.
Such labeling, I think, would be very useful for transcription. I could then set up key commands to jump to the previous period of silence. So hypothetically (imagining a jQuery-based API):
var audio = $('audio#someid');
var silences = silenceFindingVoodoo(audio);
silences will then contain a list of times, so I can hook up some way to let the user jump around through the various silences, and then set the currentTime to a chosen value, and play it.
Is it even conceivable to do this sort of thing with JavaScript?
Yes, it's possible with the Web Audio API; to be more precise, you will need an AnalyserNode. For a short proof of concept you can take this example and add the following code to drawTimeDomain():
// amplitudeArray holds unsigned byte time-domain samples, centered on 128
var threshold = 1000;
var sum = 0;
for (var i = 0; i < amplitudeArray.length; i++) {
  // accumulate each sample's deviation from the zero line (128)
  sum += Math.abs(128 - amplitudeArray[i]);
}
var test = (sum < threshold) ? 'silent' : 'sound';
console.log('silent info', test);
You will just need some additional logic to filter silences by duration (e.g. any silence lasting more than 500 ms should be treated as a real silence).
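In case the linked example isn't to hand, here is a rough sketch of the kind of setup that snippet assumes, with amplitudeArray filled from an AnalyserNode's byte time-domain data (the element selector and fftSize are arbitrary choices):

// Rough sketch of the assumed setup: an AnalyserNode whose byte time-domain
// data (centered on 128) is read into amplitudeArray on every frame.
var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
var audioElement = document.querySelector('audio'); // whatever element you play
var source = audioCtx.createMediaElementSource(audioElement);
var analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;

source.connect(analyser);
analyser.connect(audioCtx.destination);

var amplitudeArray = new Uint8Array(analyser.frequencyBinCount);

function drawTimeDomain() {
  analyser.getByteTimeDomainData(amplitudeArray);
  // ... the silence check from the snippet above goes here ...
  requestAnimationFrame(drawTimeDomain);
}
drawTimeDomain();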
I think this is possible using JavaScript (although maybe not advisable, of course). This article:
https://developer.mozilla.org/En/Using_XMLHttpRequest#Handling_binary_data
... discusses how to access files as binary data, and once you have the audio file as binary data you could do whatever you like with it (I guess, anyway - I'm not really strong with JavaScript). With audio files in WAV format, this would be a trivial exercise, since the data is already organized by samples in the time domain. With audio files in a compressed format (like MP3), transforming the compressed data back into time-domain samples would be so insanely difficult to do in JavaScript that I would found a religion around you if you managed to do it successfully.
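To illustrate why the WAV case is straightforward, a rough sketch of pulling 16-bit samples out of a WAV file's ArrayBuffer might look like this; it assumes the canonical 44-byte header and 16-bit PCM data, which real files don't always follow:

// Rough sketch, assuming a canonical 44-byte WAV header and 16-bit PCM data;
// real files can contain extra chunks, so a proper parser should walk them.
function wavToSamples(arrayBuffer) {
  var view = new DataView(arrayBuffer);
  var sampleRate = view.getUint32(24, true);      // little-endian
  var samples = new Int16Array(arrayBuffer, 44);  // skip the header
  return { sampleRate: sampleRate, samples: samples };
}

// Average amplitude of a short stretch, e.g. to look for silence:
function averageAmplitude(samples, start, length) {
  var sum = 0;
  for (var i = start; i < start + length && i < samples.length; i++) {
    sum += Math.abs(samples[i]);
  }
  return sum / length;
}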
Update: after reading your question again, I realized that it might actually be possible to do what you're discussing in JavaScript, even if the files are in MP3 format and not WAV format. As I understand your question, you're actually just looking to locate points of silence within the audio stream, as opposed to actually stripping out the silent stretches.
To locate the silent stretches, you wouldn't necessarily need to convert the frequency-domain data of an MP3 file back into the time-domain of a WAV file. In fact, identifying quiet stretches in audio can actually be done more reliably in the frequency domain than in the time domain. Quiet stretches tend to have a distinctively flat frequency response graph, whereas in the time domain the peak amplitudes of audible speech are sometimes not much higher than the peaks of background noise, especially if auto-leveling is occurring.
Analyzing an MP3 file in JavaScript would be significantly easier if the file were CBR (constant bit rate) instead of VBR (variable bit rate).
As far as I know, JavaScript is not powerful enough to do this.
You'll have to resort to Flash, or some sort of server-side processing, to do this.
With the HTML5 audio/video tags, you might be able to trick the page into doing something like this. You could (hypothetically) identify silences server-side and send the timestamps of those silences to the client as meta data in the page (hidden fields or something) and then use that to allow JavaScript to identify those spots in the audio file.
If you use Web Worker threads you may be able to do this in JavaScript, but it would require running more threads in the browser. You could break the problem up across multiple workers and process it, but it would be all but impossible to synchronize this with the playback. So JavaScript can determine the silent periods by doing some audio processing, but since you can't link that to the playback well, it would not be the best choice.
But if you wanted to show the waveform to the user, then JavaScript and canvas can be used for that; see the next paragraph for the streaming part.
Your best bet would be to have the server stream the audio; it can do the processing and find all the silences. Each of the resulting chunks should then be saved as a separate file so that you can easily jump between silences, and by streaming, your server app can determine when to load the next file so there isn't a break.
I don't think JavaScript is the tool you want to use for processing those audio files - that's asking for trouble. However, JavaScript could easily read a corresponding XML file which describes where those silences occur in the audio file and adjust the user interface appropriately (a rough sketch of that client side follows after the list below). Then the question is what you use to generate those XML files:
You can do it manually if you need to demo the capability right away. (Use Audacity to see where those audio envelopes occur.)
Check out this CodeProject article, which creates a wav processing library in C#. The author has created a function to extract silence from the input file. Probably a good place to start hacking.
Just two of my initial thoughts ... There are a LOT of audio processing APIs out there, but they are written for particular frameworks and application programming languages. Definitely make use of them before trying to write something from scratch ... unless you happen to really love Fourier transforms.
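As a rough sketch of the client side mentioned above, reading a hypothetical silences XML file and jumping the audio element around could look like this (the XML structure, file name and key binding are entirely made up for illustration):

// Hypothetical client side: fetch an XML file like
//   <silences><silence start="12.4"/><silence start="31.9"/></silences>
// (a made-up format) and jump the <audio> element to the previous silence.
function loadSilences(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    var doc = new DOMParser().parseFromString(xhr.responseText, 'application/xml');
    var times = Array.prototype.map.call(
      doc.getElementsByTagName('silence'),
      function (el) { return parseFloat(el.getAttribute('start')); }
    );
    callback(times.sort(function (a, b) { return a - b; }));
  };
  xhr.send();
}

loadSilences('silences.xml', function (silences) {
  var audio = document.querySelector('audio');
  document.addEventListener('keydown', function (e) {
    if (e.key === 'ArrowLeft') {
      // jump to the last silence before the current playback position
      var previous = silences.filter(function (t) { return t < audio.currentTime - 0.5; });
      if (previous.length) audio.currentTime = previous[previous.length - 1];
    }
  });
});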