Volume velocity to gain web audio - javascript

I'm trying to map a velocity value, a value that comes in a MIDI signal, to gain. The velocity ranges from 0 to 127.
The documentation on the Web Audio API, albeit well done, doesn't really say anything about this.
At the moment I have this to play sounds:
play(key, startTime) {
  this.audioContext.decodeAudioData(this.soundContainer[key], (buffer) => {
    let source = this.audioContext.createBufferSource();
    source.buffer = buffer;
    source.connect(this.audioContext.destination);
    source.start(startTime);
  });
}
I didn't find anything that uses the velocity values ranging from 0 to 127. However, I found the GainNode, which applies a gain.
So my function is now this:
play(key: string, startTime, velocity) {
  this.audioContext.decodeAudioData(this.soundContainer[key], (buffer) => {
    let source = this.audioContext.createBufferSource();
    source.buffer = buffer;
    source.connect(this.gainNode);
    this.gainNode.connect(this.audioContext.destination);
    this.gainNode.gain.value = velocity;
    source.start(startTime);
  });
}
Eehhh... if I apply the MIDI velocity value to the gain directly, I obviously get a sound that is insanely loud. So I'd like an answer to either of these two questions:
Can I somehow use the velocity value directly?
How can I convert the velocity value to gain?

The MIDI specification says:
Interpretation of the Velocity byte is left up to the receiving instrument. Generally, the larger the numeric value of the message, the stronger the velocity-controlled effect. If velocity is applied to volume (output level) for instance, then higher Velocity values will generate louder notes. A value of 64 (40H) would correspond to a mezzo-forte note […] Preferably, application of velocity to volume should be an exponential function.
The General MIDI specifications are not any more concrete.
The DLS Level 1 specification says:
The MIDI Note Velocity value is converted to attenuation in dB by the Concave Transform according to the following formula:
atten_dB = 20 × log10(127² / Velocity²)
and fed to control either the volume or envelope generator peak level.
You then have to map this attenuation back to a linear gain factor, gain = 10^(−atten_dB / 20), which works out to gain = Velocity² / 127².
And many hardware synthesizers allow you to select different curves for mapping velocity to volume.
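As a minimal sketch of that mapping in JavaScript (just the formula above rewritten, not code taken from the spec):

  // Concave DLS-style mapping: gain = (velocity / 127)^2
  function velocityToGain(velocity) {
    return (velocity * velocity) / (127 * 127);
  }

  // e.g. this.gainNode.gain.value = velocityToGain(velocity);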

I don't know if it is correct, because I don't know that much about sound, but this seems to work:
this.gainNode.gain.value = velocity / 100;
So a velocity of 127 = a gain of 1.27.
Ultimately I think it would be better to divide 1 into 127 steps so that each step corresponds to its respective MIDI value. However, the code is easier this way, so yeah, it works.
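For reference, dividing by 127 instead of 100 is the same linear idea but tops out at a gain of exactly 1:

  // Linear mapping, assuming velocity is 0-127: 127 -> gain of 1.0
  this.gainNode.gain.value = velocity / 127;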

Related

WebAudio dB visualization not reflecting frequency bands as expected

I've built a system-audio setup with WebAudio in an Angular component. It works well, only the bands do not seem to reflect frequency accurately.
See here a test for high, mid, and low tones.
I've gotten pretty far with the native API, accessing a media stream etc., but it's not as helpful a utility as I thought it would be...
Question:
How would we get the most accurate frequency decibel data?
All sounds seem to be focused in the first 3 bands.
Here is the method which visualizes the media stream (full code on GitHub):
private frameLooper() {
  this._AFID = requestAnimationFrame(() => this.frameLooper());
  // how many values from the analyser (the "buffer" size)
  this._fbc = this._analyser.frequencyBinCount;
  // frequency data is integers on a scale from 0 to 255
  this._data = new Uint8Array(this._analyser.frequencyBinCount);
  this._analyser.getByteFrequencyData(this._data);
  let bandsTemp = [];
  // calculate the height of each band element using frequency data
  for (let i = 0; i < this._fbc; i++) {
    bandsTemp.push({ height: this._data[i] });
  }
  this.bands = bandsTemp;
}
Boris Smus' Web Audio API book says:
If, however, we want to perform a comprehensive analysis of the whole
audio buffer, we should look to other methods...
Perhaps this method is as good as it gets. What is a better method for more functional frequency analysis?
Thanks for the example in https://stackblitz.com/edit/angular-mediastream-device?file=src%2Fapp%2Fsys-sound%2Fsys-sound.component.ts. You're right that it doesn't work in Chrome, but if I use the link to open it in a new window, everything is right.
So, I think you're computing the labels for the graph incorrectly. I assume they're supposed to represent the frequency of the band. If not, then this answer is wrong.
You have fqRange = sampleRate / bands. Let's assume that sampleRate = 48000 (to keep the numbers simple), and bands = 16. Then fqRange = 3000. First I think you really want either sampleRate/2/bands or sampleRate / fftSize, which is the same thing.
So each of the frequency bins is 1500 Hz wide. Your labels should be 1500*k, for k = 0 to 15. (Although there's more than one way to label these, this is the easiest.) This will cover the range from 0 to 24000 Hz.
And when I play a 12 kHz tone, I see the peak is around 1552 in your code. But with the new labeling, this is the 8th bin, so 1500*8 = 12000. (Well, there are some differences. My sampleRate is actually 44.1 kHz, so the numbers computed above will be different.)
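As a minimal sketch of that relabeling (assuming an AnalyserNode named analyser and an AudioContext named audioContext, as in the component above):

  const binWidth = audioContext.sampleRate / analyser.fftSize; // same as sampleRate / 2 / frequencyBinCount
  const labels = [];
  for (let k = 0; k < analyser.frequencyBinCount; k++) {
    labels.push(k * binWidth); // frequency in Hz at the start of bin k
  }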

Multiplayer Game - Client Interpolation Calculation?

I am creating a multiplayer game using Socket.IO in JavaScript. The game works perfectly at the moment aside from the client interpolation. Right now, when I get a packet from the server, I simply set the client's position to the position sent by the server. Here is what I have tried to do:
getServerInfo(packet) {
  var otherPlayer = players[packet.id];      // GET PLAYER
  otherPlayer.setTarget(packet.x, packet.y); // SET TARGET TO MOVE TO
  ...
}
So I set the player's target position, and then in the player's update method I simply did this:
var update = function(delta) {
  if (x != target.x || y != target.y) {
    var direction = Math.atan2((target.y - y), (target.x - x));
    x += (delta * speed) * Math.cos(direction);
    y += (delta * speed) * Math.sin(direction);
    var dist = Math.sqrt((x - target.x) * (x - target.x) +
                         (y - target.y) * (y - target.y));
    if (dist < treshhold) {
      x = target.x;
      y = target.y;
    }
  }
}
This basically moves the player in the direction of the target at a fixed speed. The issue is that the player arrives at the target either before or after the next information arrives from the server.
Edit: I have just read Gabriel Gambetta's article on this subject, and he mentions this:
Say you receive position data at t = 1000. You already had received data at t = 900, so you know where the player was at t = 900 and t = 1000. So, from t = 1000 and t = 1100, you show what the other player did from t = 900 to t = 1000. This way you’re always showing the user actual movement data, except you’re showing it 100 ms “late”.
This again assumes that it is exactly 100 ms late. If your ping varies a lot, this will not work.
Would you be able to provide some pseudo code so I can get an idea of how to do this?
I have found this question online here. But none of the answers provide an example of how to do it, only suggestions.
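For illustration, here is a minimal sketch of the buffered interpolation described in the quoted article; the names are hypothetical and each packet is assumed to carry a server timestamp:

  const INTERP_DELAY = 100; // ms of intentional lag, as in the article
  const snapshots = [];     // { t, x, y }, ordered by server time

  function onServerPacket(packet) {
    snapshots.push({ t: packet.serverTime, x: packet.x, y: packet.y });
  }

  function interpolatedPosition(nowServerTime) {
    const renderTime = nowServerTime - INTERP_DELAY;
    // Drop snapshots older than the pair that straddles renderTime.
    while (snapshots.length >= 2 && snapshots[1].t <= renderTime) snapshots.shift();
    if (snapshots.length < 2) return snapshots[0] || null; // not enough data yet
    const [a, b] = snapshots;
    const alpha = (renderTime - a.t) / (b.t - a.t); // 0..1 between the two snapshots
    return { x: a.x + (b.x - a.x) * alpha, y: a.y + (b.y - a.y) * alpha };
  }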
I'm completely fresh to multiplayer game client/server architecture and algorithms; however, in reading this question the first thing that came to mind was implementing second-order (or higher) Kalman filters on the relevant variables for each player.
Specifically, the Kalman prediction steps, which are much better than simple dead reckoning. Also, the fact that Kalman prediction and update steps work somewhat as weighted or optimal interpolators. And furthermore, the dynamics of players could be encoded directly rather than playing around with the abstracted parameterizations used in other methods.
Meanwhile, a quick search led me to this:
An improvement of dead reckoning algorithm using kalman filter for minimizing network traffic of 3d on-line games
The abstract:
Online 3D games require efficient and fast user interaction support over network, and the networking support is usually implemented using network game engine. The network game engine should minimize the network delay and mitigate the network traffic congestion. To minimize the network traffic between game users, a client-based prediction (dead reckoning algorithm) is used. Each game entity uses the algorithm to estimates its own movement (also other entities' movement), and when the estimation error is over threshold, the entity sends the UPDATE (including position, velocity, etc) packet to other entities. As the estimation accuracy is increased, each entity can minimize the transmission of the UPDATE packet. To improve the prediction accuracy of dead reckoning algorithm, we propose the Kalman filter based dead reckoning approach. To show real demonstration, we use a popular network game (BZFlag), and improve the game optimized dead reckoning algorithm using Kalman filter. We improve the prediction accuracy and reduce the network traffic by 12 percents.
Might seem wordy and like a whole new problem to learn what it's all about... and discrete state-space for that matter.
Briefly, I'd say a Kalman filter is a filter that takes into account uncertainty, which is what you've got here. It normally works on measurement uncertainty at a known sample rate, but it could be re-tooled to work with uncertainty in measurement period/phase.
The idea being that in lieu of a proper measurement, you'd simply update with the kalman predictions. The tactic is similar to target tracking applications.
I was recommended them on stackexchange myself - took about a week to figure out how they were relevant but I've since implemented them successfully in vision processing work.
(...it's making me want to experiment with your problem now !)
As I wanted more direct control over the filter, I copied someone else's roll-your-own MATLAB implementation of a Kalman filter into OpenCV (in C++):
void Marker::kalmanPredict() {
    // Prediction for state vector
    Xx = A * Xx;
    Xy = A * Xy;
    // and covariance
    Px = A * Px * A.t() + Q;
    Py = A * Py * A.t() + Q;
}

void Marker::kalmanUpdate(Point2d& measuredPosition) {
    // Kalman gain K:
    Mat tempINVx = Mat(2, 2, CV_64F);
    Mat tempINVy = Mat(2, 2, CV_64F);
    tempINVx = C * Px * C.t() + R;
    tempINVy = C * Py * C.t() + R;
    Kx = Px * C.t() * tempINVx.inv(DECOMP_CHOLESKY);
    Ky = Py * C.t() * tempINVy.inv(DECOMP_CHOLESKY);
    // Estimate of velocity (units are pixels.s^-1)
    Point2d measuredVelocity = Point2d(measuredPosition.x - Xx.at<double>(0),
                                       measuredPosition.y - Xy.at<double>(0));
    Mat zx = (Mat_<double>(2, 1) << measuredPosition.x, measuredVelocity.x);
    Mat zy = (Mat_<double>(2, 1) << measuredPosition.y, measuredVelocity.y);
    // Kalman correction based on position measurement and velocity estimate:
    Xx = Xx + Kx * (zx - C * Xx);
    Xy = Xy + Ky * (zy - C * Xy);
    // and covariance again
    Px = Px - Kx * C * Px;
    Py = Py - Ky * C * Py;
}
I don't expect you to be able to use this directly, but if anyone comes across it and understands what 'A', 'P', 'Q' and 'C' are in state-space (hint hint, state-space understanding is a prerequisite here) they'll likely see how to connect the dots.
(Both MATLAB and OpenCV have their own Kalman filter implementations included, by the way...)
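For anyone who wants the same idea in JavaScript, here is a minimal hand-rolled sketch of a one-axis constant-velocity Kalman filter (state = [position, velocity], position-only measurements; the noise values q and r are placeholders you would tune):

  function makeKalman1D(q, r) {
    let x = [0, 0];           // state estimate: [position, velocity]
    let P = [[1, 0], [0, 1]]; // estimate covariance
    return {
      predict(dt) {
        // x = A x, with A = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]];
        // P = A P A' + Q (process noise q added to the diagonal)
        P = [
          [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q, P[0][1] + dt * P[1][1]],
          [P[1][0] + dt * P[1][1], P[1][1] + q]
        ];
        return x[0]; // predicted position
      },
      update(measuredPos) {
        // K = P H' / (H P H' + R), with H = [1, 0]
        const s = P[0][0] + r;
        const k0 = P[0][0] / s, k1 = P[1][0] / s;
        const innovation = measuredPos - x[0];
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation];
        P = [
          [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
          [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]
        ];
      }
    };
  }

  // Usage sketch: one filter per axis per remote player.
  // const kx = makeKalman1D(0.01, 1.0);
  // kx.update(packet.x); then each frame: renderX = kx.predict(delta);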
This question is being left open with a request for more detail, so I’ll try to fill in the gaps of Patrick Klug’s answer. He suggested, reasonably, that you transmit both the current position and the current velocity at each time point.
Since two position and two velocity measurements give a system of four equations, it enables us to solve for a system of four unknowns, namely a cubic spline (which has four coefficients, a, b, c and d). In order for this spline to be smooth, the first and second derivatives (velocity and acceleration) should be equal at the endpoints. There are two standard, equivalent ways of calculating this: Hermite splines (https://en.wikipedia.org/wiki/Cubic_Hermite_spline) and Bézier splines (http://mathfaculty.fullerton.edu/mathews/n2003/BezierCurveMod.html).
For a two-dimensional problem such as this, I suggested separating variables and finding splines for both x and y based on the tangent data in the updates, which is called a clamped piecewise cubic Hermite spline. This has several advantages over the splines in the link above, such as cardinal splines, which do not take advantage of that information. The locations and velocities at the control points will match, you can interpolate up to the last update rather than the one before, and you can apply this method just as easily to polar coordinates if the game world is inherently polar like Space wars. (Another approach sometimes used for periodic data is to perform an FFT and do trigonometric interpolation in the frequency domain, but that doesn't sound applicable here.)
What originally appeared here was a derivation of the Hermite spline using linear algebra in a somewhat unusual way that (unless I made a mistake entering it) would have worked. However, the comments convinced me it would be more helpful to give the standard names for what I was talking about. If you are interested in the mathematical details of how and why this works, this is a better explanation: https://math.stackexchange.com/questions/62360/natural-cubic-splines-vs-piecewise-hermite-splines
A better algorithm than the one I gave is to represent the sample points and first derivatives as a tridiagonal matrix that, multiplied by a column vector of coefficients, produces the boundary conditions, and solve for the coefficients. An alternative is to add control points to a Bézier curve where the tangent lines at the sampled points intersect and on the tangent lines at the endpoints. Both methods produce the same, unique, smooth cubic spline.
One situation you might be able to avoid if you were choosing the points rather than receiving updates is if you get a bad sample of points. You can’t, for example, intersect parallel tangent lines, or tell what happened if it’s back in the same place with a nonzero first derivative. You’d never choose those points for a piecewise spline, but you might get them if an object made a swerve between updates.
If my computer weren’t broken right now, here is where I would put fancy graphics like the ones I posted to TeX.SX. Unfortunately, I have to bow out of those for now.
Is this better than straight linear interpolation? Definitely: linear interpolation will give you straight-line paths, quadratic splines won't be smooth, and higher-order polynomials will likely be overfitted. Cubic splines are the standard way to solve that problem.
Are they better for extrapolation, where you try to predict where a game object will go? Possibly not: this way, you’re assuming that a player who’s accelerating will keep accelerating, rather than that they will immediately stop accelerating, and that could put you much further off. However, the time between updates should be short, so you shouldn’t get too far off.
Finally, you might make things a lot easier on yourself by programming in a bit more conservation of momentum. If there’s a limit to how quickly objects can turn, accelerate or decelerate, their paths will not be able to diverge as much from where you predict based on their last positions and velocities.
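To make the spline idea concrete, here is a minimal sketch of evaluating a clamped cubic Hermite segment between two received updates (hypothetical names: positions p0, p1 and velocities v0, v1 at server times t0 and t1, evaluated at time t):

  function hermite(p0, v0, t0, p1, v1, t1, t) {
    const h = t1 - t0;
    const s = (t - t0) / h;        // normalized time, 0 at t0, 1 at t1
    const s2 = s * s, s3 = s2 * s;
    return (2 * s3 - 3 * s2 + 1) * p0
         + (s3 - 2 * s2 + s) * h * v0
         + (-2 * s3 + 3 * s2) * p1
         + (s3 - s2) * h * v1;
  }

  // Interpolate each coordinate separately, e.g.
  // const x = hermite(prev.x, prev.vx, prev.t, latest.x, latest.vx, latest.t, renderTime);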
Depending on your game, you might want to prefer smooth player movement over super-precise location. If so, then I'd suggest aiming for 'eventual consistency'. I think your idea of keeping 'real' and 'simulated' data points is a good one. Just make sure that from time to time you force the simulated to converge with the real, otherwise the gap will get too big.
Regarding your concern about different movement speeds, I'd suggest you include the current velocity and direction of the player, in addition to the current position, in your packet. This will enable you to more smoothly predict where the player would be based on your own framerate/update timing.
Essentially you would calculate the current simulated velocity and direction taking into account the last simulated location and velocity as well as the last known location and velocity (putting more emphasis on the second), and then simulate the new position based on that.
If the gap between simulated and known gets too big, just put more emphasis on the known location and the otherPlayer will catch up quicker.
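A minimal sketch of that convergence step, with an ad-hoc blend factor (all names hypothetical):

  function reconcile(simulated, known, delta) {
    const gap = Math.hypot(known.x - simulated.x, known.y - simulated.y);
    // Pull harder toward the known position when the gap gets large.
    const blend = Math.min(1, delta * (gap > 50 ? 10 : 2));
    simulated.x += (known.x - simulated.x) * blend;
    simulated.y += (known.y - simulated.y) * blend;
  }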

Web Audio synthesis: how to handle changing the filter cutoff during the attack or release phase?

I'm building an emulation of the Roland Juno-106 synthesizer using WebAudio. The live WIP version is here.
I'm hung up on how to deal with updating the filter if the cutoff frequency or envelope modulation amount are changed during the attack or release while the filter is simultaneously being modulated by the envelope. That code is located around here. The current implementation doesn't respond the way an analog synth would, but I can't quite figure out how to calculate it.
On a real synth the filter changes immediately as determined by the frequency cutoff, envelope modulation amount, and current stage in the envelope, but the ramp up or down also continues smoothly.
How would I model this behavior?
Brilliant project!
You don't need to sum these yourself - Web Audio AudioParams sum their inputs, so if you have a potentially audio-rate modulation source like an LFO (an OscillatorNode connected to a GainNode), you simply connect() it to the AudioParam.
This is the key here - that AudioParams are able to be connect()ed to - and multiple input connections to a node or AudioParam are summed. So you generally want a model of
filter cutoff = (cutoff from envelope) + (cutoff from mod/LFO) + (cutoff from cutoff knob)
Since cutoff is a frequency, and thus on a log scale not a linear one, you want to do this addition logarithmically (otherwise, an envelope that boosts the cutoff up an octave at 440Hz will only boost it half an octave at 880Hz, etc.) - which, luckily, is easy to do via the "detune" parameter on a BiquadFilter.
Detune is in cents (1200/octave), so you have to use gain nodes to adjust values (e.g. if you want your modulation to have a +1/-1 octave range, make sure the oscillator output is going between -1200 and +1200). You can see how I do this bit in my Web Audio synthesizer (https://github.com/cwilso/midi-synth): in particular, check out synth.js starting around line 500: https://github.com/cwilso/midi-synth/blob/master/js/synth.js#L497-L519. Note the modFilterGain.connect(this.filter1.detune); in particular.
You don't want to be setting ANY values directly for modulation, since the actual value will change at a potentially fast rate - you want to use the parameter scheduler and input summing from an LFO. You can set the knob value as needed in terms of time, but it turns out that setting .value will interact poorly with setting scheduled values on the same AudioParam - so you'll need to have a separate (summed) input into the AudioParam. This is the tricky bit, and to be honest, my synth does NOT do this well today (I should change it to the approach described below).
The right way to handle the knob setting is to create an audio channel that varies based on your knob setting - that is, it's an AudioNode that you can connect() to the filter.detune, although the sample values produced by that AudioNode are only positive, and only change values when the knob is changed. To do this, you need a DC offset source - that is, an AudioNode that produces a stream of constant sample values. The simplest way I can think of to do this is to use an AudioBufferSourceNode with a generated buffer of 1:
function createDCOffset() {
  var buffer = audioContext.createBuffer(1, 1, audioContext.sampleRate);
  var data = buffer.getChannelData(0);
  data[0] = 1;
  var bufferSource = audioContext.createBufferSource();
  bufferSource.buffer = buffer;
  bufferSource.loop = true;
  bufferSource.start(0);
  return bufferSource;
}
Then, just connect that DCOffset into a gain node, and connect your "knob" to that gain's .value to use the gain node to scale the values (remember, there are 1200 cents in an octave, so if you want your knob to represent a six-octave cutoff range, the .value should go between zero and 7200). Then connect() the DCOffsetGain node into the filter's .detune (it sums with, rather than replacing, the connection from the LFO, and also sums with the scheduled values on the AudioParam (remember you'll need to scale the scheduled values in cents, too)). This approach, BTW, makes it easy to flip the envelope polarity too (that VCF ENV switch on the Juno 106) - just invert the values you set in the scheduler.
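As a minimal wiring sketch of that paragraph (assuming the createDCOffset() above, an AudioContext named audioContext and a BiquadFilterNode named filter):

  var knobGain = audioContext.createGain();
  knobGain.gain.value = 2400;      // e.g. knob set to +2 octaves, expressed in cents
  createDCOffset().connect(knobGain);
  knobGain.connect(filter.detune); // sums with the LFO and with scheduled values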
Hope this helps. I'm a bit jetlagged at the moment, so hopefully this was lucid. :)

Web Audio frequency limitation?

My goal is to generate audio at a certain frequency and then check what frequency it is at using the result of an FFT.
function speak() {
  gb.src = gb.ctx.createOscillator();
  gb.src.connect(gb.ctx.destination);
  gb.src.start(gb.ctx.currentTime);
  gb.src.frequency.value = 1000;
}

function listen() {
  navigator.getUserMedia = (navigator.getUserMedia
      || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia);
  navigator.getUserMedia({
    audio: true,
    video: false
  }, function(stream) {
    gb.stream = stream;
    var input = gb.ctx.createMediaStreamSource(stream);
    gb.analyser = gb.ctx.createAnalyser();
    gb.analyser.fftSize = gb.FFT_SIZE;
    input.connect(gb.analyser);
    gb.freqs = new Uint8Array(gb.analyser.frequencyBinCount);
    setInterval(detect, gb.BIT_RATE / 2);
  }, function(err) {
    console.log('The following gUM error occured: ' + err);
  });
}
See working example at http://codepen.io/Ovilia/full/hFtrA/ . You may need to put your microphone near the speaker to see the effect.
The problem is, when the frequency is somewhere larger than 15000 (e.g. 16000), there no longer seems to be any response in the high-frequency area.
Is there any limit of frequency with Web Audio, or is it the limit of my device?
What is the unit of each element I get from getByteFrequencyData?
Is there any limit of frequency with Web Audio, or is it the limit of my device?
I don't think the Web Audio framework itself limits this. As the other answers have mentioned, the limit probably comes from the microphone's and loudspeaker's physical limits.
I tried using the bookshelf loudspeaker I currently have (Kurzweil KS40A) along with a decent microphone (Zoom H4). The microphone was about 1 cm from the tweeter.
As you can see, these loudspeakers and microphones aren't able to efficiently generate/capture sounds at those frequencies.
This is more obvious when you look at the Zoom H4's frequency response. Unfortunately I couldn't find a frequency response for the KS40A.
You can also do something similar using non browser tools to check if you see similar results.
What is the unit of each element when I get from getByteFrequencyData?
The unit of each element from getByteFrequencyData is a normalized magnitude of the FFT data, scaled to fit the dBFS range between the maxDecibels and minDecibels attributes on the AnalyserNode. So a byte value of 0 would imply minDecibels (default is -100 dBFS) or lower, and a byte value of 255 would imply maxDecibels (default is -30 dBFS) or higher.
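As a minimal sketch, mapping a byte value back to its approximate dBFS level looks like this (assuming an AnalyserNode named analyser):

  function byteToDb(byteValue) {
    return analyser.minDecibels
        + (byteValue / 255) * (analyser.maxDecibels - analyser.minDecibels);
  }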
Look up the concept of the Nyquist frequency - the default sampling rate of Web Audio is 44.1 kHz, which means the theoretical maximum frequency would be 22050 Hz given perfect hardware such as the microphone and analog-to-digital converter inside your computer. @Ovilia, on that same computer with the same microphone, record the same input sound and then examine the audio file using a utility like Audacity, where you can view the output of its FFT analysis - in Audacity, open the audio file and go to menu Analyze -> Plot Spectrum; also, to see a very nice FFT view, click the down arrow near the left side of the waveform view subwindow and pick Spectrogram. Another excellent FFT-capable audio tool is Sonic Visualiser. Are you now seeing power at frequencies you are not seeing using the FFT within Web Audio?
I think that most microphones only work well in the voice frequency range, somewhere around 80 Hz to 1100 Hz.
So you probably have a hardware limitation; try checking with the manufacturer or the manual for the frequency input response of your device!
There is probably an anti-alias low-pass filter (between the microphone and the ADC) which has a cut-off below Fs/2 in order to make sure everything is rolled off by that frequency (given a finite filter transition width).
There may also be nulls in the room's acoustics. At frequencies above 2 kHz, it might be only inches from a peak to a null location for the microphone placement.

Which format is returned from the fft with WebAudioAPI

I visualized an audio file with the Web Audio API and with Dancer.js. Everything works well, but the visualizations look very different. Can anybody help me find out why they look so different?
The Web-Audio-API code (fft.php, fft.js)
The dancer code (plugins/dancer.fft.js, js/playerFFT.js, fft.php)
The visualization for the Web Audio API is at:
http://multimediatechnology.at/~fhs32640/sem6/WebAudio/fft.html
For Dancer it is at:
http://multimediatechnology.at/~fhs32640/sem6/Dancer/fft.php
The difference is in how the volumes at the frequencies are 'found'. Your code uses the analyser, which takes the values and also does some smoothing, so your graph looks nice. Dancer uses a ScriptProcessor. The ScriptProcessor fires a callback every time a certain sample length has gone through, and it passes that sample to e.inputBuffer. Then it just draws that 'raw' data, with no smoothing applied.
var
  buffers = [],
  channels = e.inputBuffer.numberOfChannels,
  resolution = SAMPLE_SIZE / channels,
  sum = function (prev, curr) {
    return prev[i] + curr[i];
  }, i;

for (i = channels; i--;) {
  buffers.push(e.inputBuffer.getChannelData(i));
}
for (i = 0; i < resolution; i++) {
  this.signal[i] = channels > 1 ? buffers.reduce(sum) / channels : buffers[0][i];
}
this.fft.forward(this.signal);
this.dancer.trigger('update');
This is the code that Dancer uses to get the sound strength at the frequencies.
(this can be found in adapterWebAudio.js).
Because one is simply using the native frequency data provided by the Web Audio API using analyser.getByteFrequencyData().
The other does its own calculation using a ScriptProcessorNode: when that node's onaudioprocess event fires, they take the channel data from the input buffer and convert it to a frequency-domain spectrum by performing a forward transform on it, i.e. computing the Discrete Fourier Transform of the signal with the Fast Fourier Transform algorithm.
idbehold's answer is partially correct (smoothing is getting applied), but a bigger issue is that the Web Audio code is using getByteFrequencyData instead of getFloatFrequencyData. The "byte" version does processing to maximize the byte's range - it spreads minDb to maxDb across the 0-255 byte range.
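For comparison, a minimal sketch of the two read-outs (assuming an AnalyserNode named analyser that is already connected to a source):

  var bytes = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bytes);    // 0-255, spread between minDecibels and maxDecibels

  var floats = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(floats);  // values directly in dBFS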
