Next steps with MFCCs in web-based voice recognition - JavaScript

I am working on Urdu (a language spoken in Pakistan, India, and Bangladesh) voice recognition, to translate Urdu speech into Urdu words. So far all I have done is find the Meyda JavaScript library for extracting MFCCs from data frames. Some documents say that ASR needs the first 12 or 13 MFCCs out of 26. For testing, I have 46 separate phonemes (/b/, /g/, /d/, ...) as WAV files in a folder. Running Meyda on one of the phonemes produces 4 to 5 frames per phoneme, where each frame contains the first 12 MFCC values. Because I have less than 10 reputation I cannot post images, but you can see the image at the following link. The image contains 7 frames of the phoneme /b/; each frame includes 13 MFCCs. The value of the long red vertical line is 438, the others are 48, 38, etc.
http://realnfo.com/images/b.png
My question is: should I save these frames (MFCCs) in a database as the predefined pattern for /b/, do the same for all the other phonemes, and then, once the microphone is tied in and Meyda extracts the MFCCs per frame, program the JavaScript so that each extracted frame's MFCCs are matched against the predefined frames' MFCCs using Dynamic Time Warping, finally taking the smallest distance as the matching phoneme?
The professional approach after MFCCs is HMMs and GMMs, but I don't know how to deal with them. I have studied many documents about HMMs and GMMs, but to no avail.

co-author of Meyda here.
That seems like a pretty difficult use case. If you already know how to split the buffers up into phonemes, you can run the MFCC extraction on those buffers and use k-Nearest Neighbours (or some better classification algorithm) with what I would imagine would be a reasonable success rate.
A rough sketch:
const Meyda = require('meyda');
// I can't find a real KNN library because npm is down.
// I'm just using this as a placeholder for a real one.
const knn = require('knn');

// dataset should be a collection of labelled MFCC sets
const nearestPhoneme = knn(dataset);

const buffer = [...]; // a buffer containing a phoneme
let nearestPhonemes = []; // an array to store your phoneme matches

// classify each bufferSize-long frame of the phoneme separately
for (let i = 0; i < buffer.length; i += Meyda.bufferSize) {
  const frame = buffer.slice(i, i + Meyda.bufferSize);
  nearestPhonemes.push(nearestPhoneme(Meyda.extract('mfcc', frame)));
}
After this for loop, nearestPhonemes contains an array of the best guesses for the phoneme in each frame of the audio. You could then pick the most commonly occurring phoneme in that array (the mode). I would also imagine that averaging the MFCCs across all frames of the phoneme may yield a more robust result. It's certainly something you'll have to experiment with to find the best solution.
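For the averaging idea, a rough sketch (assuming each element of allFrameMfccs is the 13-element MFCC array Meyda returns for one frame):

function averageMfccs(frames) {
  const sums = new Array(frames[0].length).fill(0);
  frames.forEach(mfcc => mfcc.forEach((c, i) => { sums[i] += c; }));
  return sums.map(s => s / frames.length);
}
// classify the averaged vector instead of every frame:
// const label = nearestPhoneme(averageMfccs(allFrameMfccs));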
Hope that helps! If you open source your code, I would love to see it.

Related

Isolate ultrasounds with Web Audio API

Is there any algorithm I can use with the Web Audio API to isolate ultrasounds?
I've tried a 'highpass' filter, but I need to isolate sounds that are ONLY ultrasonic (the horizontal lines in the spectrogram) and ignore noises that also have energy at lower, audible frequencies (the vertical lines).
var highpass = audioContext.createBiquadFilter();
highpass.type = 'highpass';
highpass.frequency.value = 17500;
highpass.gain.value = -1;
Here's a test, using a nice snippet from http://rtoy.github.io/webaudio-hacks/more/filter-design/filter-design.html, showing how the spectrum of audible noise interferes with the filtered ultrasound (there are two canvases, one without the filter and one with it): https://jsfiddle.net/6gnyhvrk/3
Without filters: (spectrogram screenshot)
With a 17,500 Hz highpass filter: (spectrogram screenshot)
A highpass filter is what you want, but there are a few things to consider. First, the audio context has to have a high enough sample rate. Second, you have to decide what "ultrasound" means; many people can hear frequencies above 15 kHz (as in your example). Also, a single highpass filter may not have a sharp enough cutoff for you, so you may need a more complicated filter setup.
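For example, one way to get a steeper roll-off is to cascade several highpass biquads; a rough sketch (the function name and the four-stage count are just my own choices):

function createSteepHighpass(audioContext, cutoffHz, stages) {
  stages = stages || 4;
  var filters = [];
  for (var i = 0; i < stages; i++) {
    var f = audioContext.createBiquadFilter();
    f.type = 'highpass';
    f.frequency.value = cutoffHz;
    if (i > 0) filters[i - 1].connect(f); // chain each stage into the next
    filters.push(f);
  }
  return { input: filters[0], output: filters[stages - 1] };
}
// var steep = createSteepHighpass(audioContext, 17500);
// source.connect(steep.input);
// steep.output.connect(analyser);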

Random IDs in JavaScript

I'm generating random IDs in JavaScript which serve as unique message identifiers for an analytics suite.
When checking the data (more than 10MM records), there are some minor collisions for some IDs for various reasons (network retries, robots faking data, etc.), but one in particular has an intriguing number of collisions: akizow-dsrmr3-wicjw1-3jseuy.
The collision rate for that ID is around 0.0037%, while the rate for the other colliding IDs is under 0.00035% (10 times less), out of a sample of 111MM records from the same day. While the other IDs vary from day to day, this one stays the same, so over a longer period the difference is likely larger than 10x.
This is how the distribution of the top ID collisions looks: (chart omitted)
This is the algorithm used to generate the random IDs:
function generateUUID() {
  return [
    generateUUID4(), generateUUID4(), generateUUID4(), generateUUID4()
  ].join("-");
}

function generateUUID4() {
  return Math.abs(Math.random() * 0xFFFFFFFF | 0).toString(36);
}
I reversed the algorithm, and it seems that for akizow-dsrmr3-wicjw1-3jseuy the browser's Math.random() returned the following four numbers, in this order: 0.1488114111471948, 0.19426893796638328, 0.45768366415465334, 0.0499740378116197, but I don't see anything special about them. Also, from the other data I collected, this ID seems to appear especially after a redirect/preload (e.g. Google results, ad clicks, etc.).
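A sketch of that reversal, reconstructed from the generator above (note that the Math.abs(x | 0) step folds values, so each segment maps back to two candidate Math.random() outputs):

function reverseSegment(segment) {
  var n = parseInt(segment, 36); // the 32-bit integer that was encoded in base 36
  // values >= 2^31 were wrapped negative by | 0 and then folded by Math.abs,
  // so both of these are possible original Math.random() values (approximately):
  return [n / 0xFFFFFFFF, (0x100000000 - n) / 0xFFFFFFFF];
}
// "akizow-dsrmr3-wicjw1-3jseuy".split("-").map(reverseSegment);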
So I have 3 hypotheses:
There's a statistical problem with the algorithm that causes this specific collision
Redirects/preloads are somehow messing with the seed of the pseudo-random generator
A robot is smart enough that it fakes all the other data but for some reason is keeping the random id the same. The data comes from different user agents, IPs, countries etc.
Any idea what could cause this collision?

Memory usage of arrays

I'm developing a CSV parser that should be able to deal with huge datasets (read 10 million rows) in the browser.
Basically the parser works as follows:
The main thread reads a 20MB chunk (reading more at once would quickly crash the browser) and then sends that chunk to one of the workers.
The worker receives the data, discards the columns I don't want, and keeps the ones I do. Normally I only want 4-5 columns out of 20-30.
The worker sends the processed data back to the main thread.
The main thread receives the data and saves it in the data array.
Repeat steps 1-4 until file is done.
At the end, with the dataset I'm using (crimes in the City of Chicago), I end up with an array containing 71 other arrays, each of which holds roughly 90K elements. Each of those 90K elements contains 5 strings (the columns taken from the file): latitude, longitude, year, block and IUCR.
Summarizing: 71 is the number of 20MB chunks in the dataset, 90K is the number of rows per 20MB chunk, and 5 is the number of columns extracted.
I noticed that the browser (Chrome) was using too much memory, so I tried in 4 different browsers (Chrome, Opera, Vivaldi and Firefox), and recorded the memory used by the tab.
Chrome - 1.76GB
Opera - 1.76GB
Firefox - 1.3GB
Vivaldi - 1GB
If I try to recreate the same array but with mock data, it only uses approx. 350MB of memory:
var data = [];
for (let i = 0; i < 71; i++) {
  let rows = [];
  for (let j = 0; j < 90 * 1000; j++) {
    rows.push(["029XX W MADISON ST", "2027", "-87.698850575", "2001", "41.880939487"]);
  }
  data.push(rows);
}
I understand that a static array like the one above is easier for the engine to optimize than the dynamically built one, but I wasn't expecting 5 times more memory for the same quantity of data.
Is there anything I can do to make the parser use less memory?
Basically, there are a couple of techniques you can use to reduce memory.
First, CSV columns that contain numbers should be converted to and stored as numbers: a number in JavaScript takes 8 bytes, while the same number kept as a string can take much more space (2 bytes per character).
Another thing is to terminate all the workers once the job is done.
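A minimal sketch of both ideas (the column order follows the mock data above and is an assumption; skip the numeric conversion for columns like IUCR if they can contain letters):

// Inside the worker: keep only the wanted columns and store numeric ones as numbers
self.onmessage = function (e) {
  var rows = e.data.map(function (cols) {
    return [
      cols[0],            // block, stays a string
      Number(cols[1]),    // IUCR (only if it is guaranteed numeric)
      Number(cols[2]),    // longitude
      Number(cols[3]),    // year
      Number(cols[4])     // latitude
    ];
  });
  self.postMessage(rows);
};

// Back on the main thread, once the whole file has been parsed:
// workers.forEach(function (w) { w.terminate(); });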

Web Audio synthesis: how to handle changing the filter cutoff during the attack or release phase?

I'm building an emulation of the Roland Juno-106 synthesizer using WebAudio. The live WIP version is here.
I'm hung up on how to deal with updating the filter if the cutoff frequency or envelope modulation amount are changed during the attack or release while the filter is simultaneously being modulated by the envelope. That code is located around here. The current implementation doesn't respond the way an analog synth would, but I can't quite figure out how to calculate it.
On a real synth the filter changes immediately as determined by the frequency cutoff, envelope modulation amount, and current stage in the envelope, but the ramp up or down also continues smoothly.
How would I model this behavior?
Brilliant project!
You don't need to sum these yourself - Web Audio AudioParams sum their inputs, so if you have a potentially audio-rate modulation source like an LFO (an OscillatorNode connected to a GainNode), you simply connect() it to the AudioParam.
This is the key here - that AudioParams are able to be connect()ed to - and multiple input connections to a node or AudioParam are summed. So you generally want a model of
filter cutoff = (cutoff from envelope) + (cutoff from mod/LFO) + (cutoff from cutoff knob)
Since cutoff is a frequency, and thus on a log scale not a linear one, you want to do this addition logarithmically (otherwise, an envelope that boosts the cutoff up an octave at 440Hz will only boost it half an octave at 880Hz, etc.) - which, luckily, is easy to do via the "detune" parameter on a BiquadFilter.
Detune is in cents (1200/octave), so you have to use gain nodes to adjust values (e.g. if you want your modulation to have a +1/-1 octave range, make sure the oscillator output is going between -1200 and +1200). You can see how I do this bit in my Web Audio synthesizer (https://github.com/cwilso/midi-synth): in particular, check out synth.js starting around line 500: https://github.com/cwilso/midi-synth/blob/master/js/synth.js#L497-L519. Note the modFilterGain.connect(this.filter1.detune); in particular.
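For example, a rough sketch of that scaling (filter is assumed to be your BiquadFilterNode; the rate and depth values are arbitrary):

var lfo = audioContext.createOscillator();   // modulation source, outputs -1..+1
var lfoDepth = audioContext.createGain();
lfo.frequency.value = 5;                     // 5 Hz modulation
lfoDepth.gain.value = 1200;                  // scale to +/- one octave, in cents
lfo.connect(lfoDepth);
lfoDepth.connect(filter.detune);             // sums with the other detune inputs
lfo.start(0);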
You don't want to be setting ANY values directly for modulation, since the actual value will change at a potentially fast rate - you want to use the parameter scheduler and input summing from an LFO. You can set the knob value as needed in terms of time, but it turns out that setting .value will interact poorly with setting scheduled values on the same AudioParam - so you'll need to have a separate (summed) input into the AudioParam. This is the tricky bit, and to be honest, my synth does NOT do this well today (I should change it to the approach described below).
The right way to handle the knob setting is to create an audio channel that varies based on your knob setting - that is, it's an AudioNode that you can connect() to the filter.detune, although the sample values produced by that AudioNode are only positive, and only change values when the knob is changed. To do this, you need a DC offset source - that is, an AudioNode that produces a stream of constant sample values. The simplest way I can think of to do this is to use an AudioBufferSourceNode with a generated buffer of 1:
function createDCOffset() {
  var buffer = audioContext.createBuffer(1, 1, audioContext.sampleRate);
  var data = buffer.getChannelData(0);
  data[0] = 1;
  var bufferSource = audioContext.createBufferSource();
  bufferSource.buffer = buffer;
  bufferSource.loop = true;
  bufferSource.start(0);
  return bufferSource;
}
Then, just connect that DCOffset into a gain node, and connect your "knob" to that gain's .value to use the gain node to scale the values (remember, there are 1200 cents in an octave, so if you want your knob to represent a six-octave cutoff range, the .value should go between zero and 7200). Then connect() the DCOffsetGain node into the filter's .detune (it sums with, rather than replacing, the connection from the LFO, and also sums with the scheduled values on the AudioParam (remember you'll need to scale the scheduled values in cents, too)). This approach, BTW, makes it easy to flip the envelope polarity too (that VCF ENV switch on the Juno 106) - just invert the values you set in the scheduler.
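To make that wiring concrete, a sketch of the knob path described above (node names and the halfway knob value are my own assumptions):

var dcOffset = createDCOffset();          // from the function above, outputs a constant 1.0
var knobGain = audioContext.createGain(); // scales the constant into cents
knobGain.gain.value = 3600;               // e.g. knob halfway along a six-octave (7200-cent) range
dcOffset.connect(knobGain);
knobGain.connect(filter.detune);          // sums with the LFO and any scheduled envelope values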
Hope this helps. I'm a bit jetlagged at the moment, so hopefully this was lucid. :)

Multi-tempo/meter js DAW

Has anyone implemented a javascript audio DAW with multiple tempo and meter change capabilities like most of the desktop daws (pro tools, sonar, and the like)? As far as I can tell, claw, openDAW, and web audio editor don't do this. Drawing a grid meter, converting between samples and MBT time, and rendering waveforms is easy when the tempo and meter do not change during the project, but when they do it gets quite a bit more complicated. I'm looking for any information on how to accomplish something like this. I'm aware that the source for Audacity is available, but I'd love to not have to dig through an enormous pile of code in a language I'm not an expert in to figure this out.
Web-based DAW solutions exist; they are generally seen as SaaS (Software as a Service) applications.
They are lightweight and contain the basic, fundamental DAW features.
For designing rich client applications (RCA) you could take a look at GWT and Vaadin.
I recommend GWT because it is mature, has reusable components, and is AJAX driven.
Also, the MusicRadar site has a list of nine different browser-based audio workstations, and you can also look at Popcorn Maker, which is entirely JavaScript. You can get some inspiration from those to get started.
You're missing the last step, which will make it easier.
All measures are relative to fractions of minutes, based on the time-signature and tempo.
The math gets a little more complex, now that you can't just plot 4/4 or 6/8 across the board and be done with it, but what you're looking at is running an actual time-line (whether drawn onscreen or not), and then figuring out where each measure starts and ends, based on either the running sum of a track's current length (in minutes/seconds), or based on the left-most take's x-coordinate (starting point) + duration...
or based on the running total of each measure's length in seconds, up to the current beat you care about.
var measure = { beats : 4, denomination : 4, tempo : 80 };
Given those three data-points, you should be able to say:
var measure_length = SECONDS_PER_MINUTE / measure.tempo * measure.beats;
Of course, that's currently in seconds. To get it in ms, you'd just use MS_PER_MINUTE, or whichever other ratio of minutes you'd want to measure by.
current_position + measure_length === start_of_next_measure;
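To make that concrete, a quick worked example (SECONDS_PER_MINUTE and MS_PER_MINUTE are just the obvious constants, not part of the original snippet):

var SECONDS_PER_MINUTE = 60,
    MS_PER_MINUTE = 60 * 1000;

// 4/4 at 80 BPM: 60 / 80 * 4 = 3 seconds per measure
var measure_length_s  = SECONDS_PER_MINUTE / measure.tempo * measure.beats; // 3
var measure_length_ms = MS_PER_MINUTE      / measure.tempo * measure.beats; // 3000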
You've now separated out each dimension required to allow you to calculate each measure on the fly.
Positioning each measure on the track, to match up with where it belongs on the timeline is as simple as keeping a running tally of where X is (the left edge of the measure) in ms (really in screen-space and project-coordinates, but ms can work fine for now).
var current_position = 0,
    current_tempo = 120,
    current_beats = 4,
    current_denomination = 4,
    measures = [ ];

measures.forEach(function (measure) {
  if (measure.tempo !== current_tempo) {
    /* draw tempo-change, set current_tempo */
    /* draw time-signature */
  }
  if (measure.beats !== current_beats ||
      measure.denomination !== current_denomination) {
    /* set changes, draw time-signature */
  }
  draw_measure(measure, current_position);
  // advance the running tally by this measure's length, in ms
  current_position += MS_PER_MINUTE / measure.tempo * measure.beats;
});
Drawing samples just requires figuring out where you're starting from, and then sticking to some resolution (MS/MS*4/Seconds).
The added benefit of separating out the calculation of the time is that you can change the resolution of your rendering on the fly, by changing which time-scale you're comparing against (ms/sec/min/etc), so long as you re-render the whole thing, after scaling.
The rabbit hole goes deeper (for instance, actual audio tracks don't really care about measures/beats, though quantization-processes do), so to write a non-destructive, non-linear DAW, you can just set start-time and duration properties on views into your audio-buffer (or views into view-buffers of your audio buffer).
Those views would be the non-destructive windows that you can resize and drag around your track.
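A rough sketch of such a view object (the property names are mine, just to illustrate the idea):

var clip = {
  buffer       : audioBuffer,  // the untouched, underlying AudioBuffer
  bufferOffset : 2.5,          // where the window starts inside the buffer, in seconds
  duration     : 4.0,          // how much of the buffer the window exposes
  trackTime    : 12.0          // where the clip sits on the project timeline, in seconds
};

// Playback just schedules the window, leaving the buffer untouched
// ("when" is the context time corresponding to clip.trackTime):
// source.buffer = clip.buffer;
// source.start(when, clip.bufferOffset, clip.duration);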
Then there's just the logic of figuring out snaps -- what your screen-space is, versus project-space, and when you click on a track's clip, which measure, et cetera, you're in, to do audio-snapping on resize/move.
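For the snapping itself, a rough sketch (assuming you kept the per-measure start times, in ms, from the loop above):

function snapToMeasure(positionMs, measureStartsMs) {
  // find the measure boundary closest to the clicked/dropped position
  var nearest = measureStartsMs[0];
  measureStartsMs.forEach(function (startMs) {
    if (Math.abs(startMs - positionMs) < Math.abs(nearest - positionMs)) {
      nearest = startMs;
    }
  });
  return nearest;
}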
Of course, to do a 1:1 recreation of ProTools in JS in the browser would not fly (gigs of RAM for one browser tab won't do, media capture API is still insufficient for multi-tracking, disk-writes are much, much more difficult in browser than in C++, in your OS of choice, et cetera), but this should at least give you enough to run with.
Let me know if I'm missing something.
