How to convert a wavetable for use with `OscillatorNode.setPeriodicWave`? - javascript

I would like to use a custom waveform with a Web Audio OscillatorNode. I'm new to audio synthesis and still struggle quite a lot with the mathematics (I can, at least, program).
The waveforms are defined as functions, so I have the function itself and can sample the wave. However, AudioContext.createPeriodicWave, which builds the PeriodicWave passed to OscillatorNode.setPeriodicWave, requires two arrays (real and imag) that define the waveform in the frequency domain.
The AnalyserNode has FFT methods for computing an array (of bytes or floats) in the frequency domain, but it works with a signal from another node.
I cannot think of a way to feed a wavetable into the AnalyserNode correctly, and even if I could, it only returns a single array, while createPeriodicWave requires two.
TL;DR: Starting with a periodic function, how do you compute the corresponding arguments for createPeriodicWave?

Since you have a periodic waveform defined by a function, you can compute the Fourier Series for this function. If the series has an infinite number of terms, you'll need to truncate it.
This is a bit of work, but this is exactly how the pre-defined Oscillator types are computed. For example, see the definition of the square wave for the OscillatorNode. The PeriodicWave coefficients for the square wave were computed in exactly this way.
If you know the bandwidth of your waveform, you can simplify the work a lot and avoid the messy integrals: just sample the waveform uniformly at a high enough rate and then use an FFT to get the coefficients you need for the PeriodicWave. See the sampling theorem for the details.
Or you can just assume that the sample rate of the AudioContext (typically 44.1 kHz or 48 kHz) is high enough, sample your waveform every 1/44100 or 1/48000 s, and compute the FFT of the resulting samples.
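As an illustration of that last approach, here is a minimal sketch (not from the spec or any particular library) that samples one period of a waveform function and computes the harmonic coefficients by direct summation, which is slower than a real FFT but easier to follow. The function name, sample count, and number of harmonics are all arbitrary choices:

```js
// Minimal sketch: sample one period of a waveform function on [0, 1) and
// compute Fourier coefficients by direct summation, then hand them to
// createPeriodicWave. `makePeriodicWave` and its defaults are placeholders.
function makePeriodicWave(audioCtx, waveFunction, numSamples = 2048, numHarmonics = 64) {
  // Sample one period of the waveform uniformly.
  const samples = new Float32Array(numSamples);
  for (let k = 0; k < numSamples; k++) {
    samples[k] = waveFunction(k / numSamples);
  }

  // createPeriodicWave defines the signal as
  //   x(t) = sum_n ( real[n]*cos(n*w*t) + imag[n]*sin(n*w*t) ),
  // so compute cosine/sine sums directly. Index 0 (DC) is ignored.
  const real = new Float32Array(numHarmonics + 1);
  const imag = new Float32Array(numHarmonics + 1);
  for (let n = 1; n <= numHarmonics; n++) {
    let re = 0, im = 0;
    for (let k = 0; k < numSamples; k++) {
      const phase = (2 * Math.PI * n * k) / numSamples;
      re += samples[k] * Math.cos(phase);
      im += samples[k] * Math.sin(phase);
    }
    real[n] = (2 / numSamples) * re;
    imag[n] = (2 / numSamples) * im;
  }
  return audioCtx.createPeriodicWave(real, imag);
}

// Usage: a sawtooth defined as a plain function of phase in [0, 1).
const ctx = new AudioContext();
const osc = ctx.createOscillator();
osc.setPeriodicWave(makePeriodicWave(ctx, (t) => 2 * t - 1));
osc.connect(ctx.destination);
osc.start();
```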

I just wrote an implementation of this. To use it, drag and drop the squares to form a waveform and then play the piano that appears afterwards. Watch the video in this tweet to see a usage example. The live demo is an alpha version, so the code and UI are a little rough. You can check out the source here.
I didn't write any documentation, but I recorded some videos (Video 1) (Video 2) (Video 3) of me coding the project live. They should be pretty self-explanatory. There are a couple of bugs in there that I fixed later; for the working version, please refer to the GitHub link.

Related

JavaScript: Detecting shape in canvas

I found an app called https://skinmotion.com/ and, for learning purposes, I would like to create my own web version of it.
The web application works as follows. It asks the user for permission to access the camera. After that, video is captured. Once every second, an image is taken from the stream and processed; during this processing, I look for a soundwave pattern in the image.
If the pattern is found, video recording stops and some action is executed.
Example of pattern - https://www.shutterstock.com/cs/image-vector/panorama-mini-earthquake-wave-on-white-788490724.
Ideally, it should work like QR codes - even a small QR code is detected, and detection should not depend on rotation or scaling.
I am no computer vision expert; this field is fairly new to me, and I need some help. What is the best way to do this?
Should I train my own TensorFlow model and use tensorflow.js? Or is there an easier and more "lightweight" option?
My problem is that I could not find or come up with an algorithm for processing the captured image to make it as "comparable" as possible - scaling it up, rotating it, thresholding it to black and white, and so on.
I hope that after this transformation, resemble.js could be used to compare the "original" and "captured" images.
Thank you in advance.
With Deep Learning
If there are certain wave patterns to be recognized, a classification model can be written using tensorflow.js.
However, if the model is to identify wave patterns in general, it is more complex; an object detection model would be needed.
Without deep learning
Adding to the complexity would be detecting the waveform and playing audio from it. In this case, the image can be read pixel by pixel. The wave graph is drawn in a color that differs from the background of the image, so the area of interest can be identified and an array representing the waveform can be generated.
Then the audio can be played from the array as shown here; a rough sketch of the whole idea follows.
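A rough sketch of that pixel-scanning approach, assuming the wave is drawn dark on a light background and an AudioContext is available; the brightness threshold and the column-by-column scan are assumptions, not the linked code:

```js
// Scan a canvas column by column for the dark trace of the wave, turn the
// trace into a sample array, and play it back through an AudioBuffer.
function extractAndPlayWave(canvas, audioCtx) {
  const cctx = canvas.getContext('2d');
  const { width, height } = canvas;
  const { data } = cctx.getImageData(0, 0, width, height);

  const samples = new Float32Array(width);
  for (let x = 0; x < width; x++) {
    // Find the first "dark" pixel in this column; its y position is the amplitude.
    let y = height / 2; // fall back to the midline if nothing is found
    for (let row = 0; row < height; row++) {
      const i = (row * width + x) * 4;
      const brightness = (data[i] + data[i + 1] + data[i + 2]) / 3;
      if (brightness < 64) { y = row; break; } // assumed threshold
    }
    samples[x] = 1 - (2 * y) / height; // map [0, height] to [1, -1]
  }

  const buffer = audioCtx.createBuffer(1, samples.length, audioCtx.sampleRate);
  buffer.copyToChannel(samples, 0);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.loop = true; // loop the extracted cycle so it is audible
  source.connect(audioCtx.destination);
  source.start();
}
```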

webaudio API: adjust play length of an audio sample (eg "C5.mp3")?

Can I use a (soundfont) sample e.g. "C5.mp3" and extend or shorten its duration for a given time (without distorting the pitch)?
(It would be great if this were as easy as using an oscillator and changing the timing of NoteOn and NoteOff, but with a more natural sound than sine waves. Can that be done easily without having to resort to MIDI.js or similar?)
You would either need to loop the mp3 or record a longer sample. Looping can be tricky to make seamless without clicks or pops though, and depending on the sample you would hear the attack of the note each time it looped.
.mp3s and other formats of recorded audio are all finite, predetermined sets of binary data. The reason it's so easy to manipulate sine waves with an oscillator is that the Web Audio API dynamically generates the wave based on the input you give it.
Good soundfonts have loop points for endless sounds. See the WebAudioFont example:
https://surikov.github.io/webaudiofont/examples/flute.html
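For reference, a minimal sketch of the looping idea with plain Web Audio (WebAudioFont does this for you): load a sample, loop a region of it to sustain the note, and fade out when the note should end. The loop points here are made-up values; real soundfonts ship their own loop markers.

```js
// Sustain a sampled note by looping a region of its buffer, then release it
// after a chosen duration with a short fade to avoid clicks.
async function playSustained(audioCtx, url, seconds) {
  const response = await fetch(url);
  const arrayBuffer = await response.arrayBuffer();
  const buffer = await audioCtx.decodeAudioData(arrayBuffer);

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.loop = true;
  source.loopStart = 0.5;                  // assumed: past the attack
  source.loopEnd = buffer.duration - 0.1;  // assumed: before the natural decay

  // A gain node provides the short fade-out at the end of the note.
  const gain = audioCtx.createGain();
  source.connect(gain).connect(audioCtx.destination);

  const now = audioCtx.currentTime;
  source.start(now);
  gain.gain.setValueAtTime(1, now + seconds);
  gain.gain.linearRampToValueAtTime(0, now + seconds + 0.05);
  source.stop(now + seconds + 0.05);
}

// playSustained(new AudioContext(), 'C5.mp3', 2.0); // hold the note for ~2 s
```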

Fast implementation of an 8-point 1d DCT for modified JPEG decoder

I've been working on a custom video codec for use on the web. The custom codec will be powered by JavaScript and the HTML5 Canvas element.
There are several reasons for me wanting to do this that I will list at the bottom of this question, but first I want to explain what I have done so far and why I am looking for a fast DCT transform.
The main idea behind all video compression is that frames next to each other share a large number of similarities. So what I'm doing is sending the first frame compressed as a JPEG. Then I send another JPEG image that is 8 times as wide as the first frame, holding the "differences" between the first frame and the next 8 frames after it.
This large JPEG image holding the "differences" is much easier to compress because it contains only the differences.
I've done many experiments with this large jpeg and I found out that when converted to a YCbCr color space the "chroma" channels are almost completely flat, with a few stand out exceptions. In other words there are few parts of the video that change much in the chroma channels, but some of the parts that do change are quite significant.
With this knowledge I looked up how JPEG compression works and saw that among other things it uses the DCT to compress each 8x8 block. This really interested me because I thought what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
This would increase both decoding speed, and improve bit rate transfer because there would be less data to work with.
I thought that this should be a simple task to accomplish. So I tried to build my own "modified" JPEG encoder/decoder. I built the RGB to YCbCr converter, I left "gzip" compression to do the Huffman encoding, and now the only main part I have left is the DCT transforms.
However, this has me stuck. I cannot find a fast 8-point 1D DCT transform. I am looking for this specific transform because, according to many articles I've read, the 2D 8x8 DCT can be separated into several 8-point 1D transforms. This is the approach many JPEG implementations use because it's faster to process.
So I figured that with JPEG being such an old, well-known standard, a fast 8-point 1D DCT should just jump out at me, but after weeks of searching I have yet to find one.
I have found many algorithms that use the O(N^2) complexity approach. However, that's bewilderingly slow. I have also found algorithms that use the Fast Fourier Transform, and I've modified them to compute the DCT, such as the one in the link below:
https://www.nayuki.io/page/free-small-fft-in-multiple-languages
In theory this should have the "fast" complexity of O(N log2 N), but when I run it, it takes my i7 computer about 12 seconds to encode/decode the "modified" JPEG.
I don't understand why it's so slow. There are other JavaScript JPEG decoders that can do it much faster, but when I try to look through their source code I can't pull out which part is doing the DCT/IDCT transforms.
https://github.com/notmasteryet/jpgjs
The only thing I can think of is maybe the math behind the DCT has already been precomputed and is being stored in a lookup table or something. However I have looked hard on google and I can't find anything (that I understand at least) that talks about this.
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Okay, as for why I want to do this: the main reason is that I want to have "interactive" video for mobile phones on my website. This cannot be done because of things like iOS loading up its "native" QuickTime player every time it starts playing a video. It's also hard to make the transition to another point in the video seem "smooth" when you have so little control over how videos are rendered, especially on mobile devices.
Thank you again very much for any help that anyone can provide!
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Take a look into the Flash world and the JPEG encoder there (before it was integrated into the engine).
Here, for example: http://www.bytearray.org/?p=1089 (sources provided). This code contains a function called fDCTQuant() that does the DCT, first for the rows, then for the columns, and then quantizes the block (so basically there you have your 8x1 DCT).
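For reference, a sketch of the separable approach: a plain 8-point 1D DCT-II with a precomputed cosine table, applied to the rows and then the columns of an 8x8 block. This is not the fDCTQuant code from the link, and it is still O(N^2) per row; the precomputed table (the lookup idea mentioned in the question) is what keeps it reasonably fast:

```js
// Separable 8x8 DCT-II: run an 8-point 1D DCT over the rows, then over the
// columns, using a cosine table computed once up front.
const N = 8;
const COS = new Float32Array(N * N);
for (let u = 0; u < N; u++) {
  for (let x = 0; x < N; x++) {
    COS[u * N + x] = Math.cos(((2 * x + 1) * u * Math.PI) / (2 * N));
  }
}
const SCALE = u => (u === 0 ? Math.sqrt(1 / N) : Math.sqrt(2 / N));

// One 8-point DCT along a row (stride 1) or a column (stride N).
function dct1d(src, dst, srcOff, dstOff, stride) {
  for (let u = 0; u < N; u++) {
    let sum = 0;
    for (let x = 0; x < N; x++) sum += src[srcOff + x * stride] * COS[u * N + x];
    dst[dstOff + u * stride] = SCALE(u) * sum;
  }
}

// `block` is a length-64 array in row-major order.
function dct2d(block) {
  const tmp = new Float32Array(64);
  const out = new Float32Array(64);
  for (let row = 0; row < N; row++) dct1d(block, tmp, row * N, row * N, 1); // rows
  for (let col = 0; col < N; col++) dct1d(tmp, out, col, col, N);           // columns
  return out;
}
```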
So what I'm doing is I send the first frame compressed as a jpg. Then I send another Jpeg image ...
Take a look at progressive JPEG. I think some of how it works, and how the data stream is built, will sound kind of familiar given this description (not the same, but they both go in related directions, IMO).
what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
The expressions "similar" and "close enough" got my attention here. Take a look at the commonly used quantization tables. You know that a change of a value by 1 could easily result in a brightness change of 15% (for chroma channels usually even more) at that point, depending on its position in the 8x8 block and therefore the applied quantizer.
Calculation with a quantizer of 40 (a value that may be included in the table even at the lowest compression rates; at stronger compression some quantizers go up to 100 and beyond):
changing the input by 1 changes the output by 40;
since we are working with a 1-byte value range, that is a change of 40/255,
which is about 15% of the total possible range.
So you should be really thoughtful about what you call "close enough".
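To make that concrete, a tiny numeric sketch (the values are illustrative, not taken from a real quantization table):

```js
// With a quantizer of 40, inputs a few units apart can land on reconstructed
// values 40 apart if they straddle a rounding boundary, and one quantized
// step is roughly 15% of the 0-255 range.
const q = 40;
const quantize = c => Math.round(c / q);
const dequantize = v => v * q;

console.log(dequantize(quantize(57)), dequantize(quantize(63))); // 40 80 -> "close" inputs, very different outputs
console.log((q / 255 * 100).toFixed(1) + '%');                   // ~15.7% of the usable range
```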
To sum this up: a video codec based on JPEG that utilizes differences between the frames to reduce the amount of data. That also sounded kind of familiar to me.
Got it: MPEG https://github.com/phoboslab/jsmpeg
*no connection to the referenced codes or the coder
I implemented separable integer 2D DCTs of various sizes (as well as other transforms) here: https://github.com/flanglet/kanzi/tree/master/java/src/kanzi/transform. The code is in Java but really for this kind of algorithm, it is pretty much the same in any language. The most interesting part IMO is the rescaling you do after computing each direction. Depending on your goals (max precision, 16 bit computation, no scaling, ...), you may want to change the scaling factors for each step. Using bigger blocks in areas where the image is very uniform saves bits.
This book shows how the DCT matrix can be factored to Gaussian Normal Form. That would be the fastest way to do a DCT.
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=pd_sim_14_1?ie=UTF8&dpID=41XJBED6RCL&dpSrc=sims&preST=_AC_UL160_SR127%2C160_&refRID=1Q0G2H5EFYCQW2TJCCJN

Generating events for gesture-controlled websites

I am very happy that I got the opportunity to work on a website that is gesture-based.
I have a few inspirations for this: link
I visited a lot of websites and googled it; Wikipedia and GitHub also didn't help much. There is not much information available, as these technologies are at a nascent stage.
I think I will have to use some JS for this project:
gesture.js (our custom JavaScript code)
reveal.js (framework for the slideshow)
My questions are: how do gestures generate events, and how does my JavaScript interact with my webcam? Do I have to use some API or algorithm?
I am not asking for code, just the mechanism; some links providing vital information will do. I seriously believe that if the accuracy of this technology can be improved, it can do wonders in the near future.
To enable gestural interactions in a web app, you can use navigator.getUserMedia() to get video from your local webcam, periodically put video frame data into a canvas element and then analyse changes between frames.
There are several JavaScript gesture libraries and demos available (including a nice slide controller). For face/head tracking you can use libraries like headtrackr.js: example at simpl.info/headtrackr.
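A bare-bones sketch of that capture loop (using the current navigator.mediaDevices.getUserMedia form; the working resolution and polling interval are arbitrary choices):

```js
// Get the webcam stream, draw it into a small canvas at an interval, and pull
// out the pixel data so successive frames can be compared.
async function startCapture(onFrame) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement('canvas');
  canvas.width = 160;  // a small working resolution keeps the per-frame work cheap
  canvas.height = 120;
  const ctx = canvas.getContext('2d');

  setInterval(() => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
    onFrame(frame); // hand each frame to whatever change-detection you use
  }, 100);
}
```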
I'm playing a little bit with this at the moment, so, from what I understand, the most basic technique is:
You request access to the user's webcam to capture video.
When permission is given, you create a canvas into which to draw the video.
You apply a filter (black and white) to the video.
You place some control points in the canvas frame (small areas in which all the pixel colors are registered).
You attach a function that runs for each frame (for the purpose of this explanation, I'll only demonstrate left-right gestures).
At each frame:
If the frame is the first (F0), continue.
If not, subtract the previous frame's pixels (F(n-1)) from the current frame's (Fn):
if there was no movement between Fn and F(n-1), all the pixels will be black;
if there was, you will see the difference Delta = Fn - F(n-1) as white pixels.
Then you can test your control points to see which areas are lit up, and store them:
( ** )x = DeltaN
Repeat the same operations until you have two or more Delta variables; subtract the control points of Delta(N-1) from the control points of DeltaN and you'll have a motion vector:
( **)x = DeltaN
( ** )x = Delta(N-1)
( +2 )x = DeltaN - Delta(N-1)
You can now test whether the vector is positive or negative, or whether its values exceed some threshold of your choosing,
e.g. if positive on x and value > 5,
and trigger an event, then listen to it:
$(document).trigger('MyPlugin/MoveLeft', values)
$(document).on('MyPlugin/MoveLeft', doSomething)
You can greatly improve the precision by caching the vectors, or by accumulating them and only triggering an event when the accumulated value becomes large enough.
You can also look for a shape in your first subtractions and try to map a "hand" or a "box",
and listen for changes in the shape's coordinates. But remember that the gestures are in 3D while the analysis is 2D, so the same shape can change while moving.
Here's a more precise explanation. Hope my explanation helped.
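A condensed sketch of those steps, with the gesture logic reduced to a single horizontal motion vector; the control points, the brightness-difference threshold, and the event names are assumptions in the spirit of the answer, not tested code:

```js
// `controlPoints` is an assumed array of {x, y} regions of interest; frames
// are the ImageData objects produced by the capture loop.
let previousFrame = null;
let previousCentroid = null;

function processFrame(frame, controlPoints) {
  if (!previousFrame) { previousFrame = frame; return; }

  // Delta = Fn - F(n-1): control points whose pixels changed "light up".
  const changedXs = [];
  for (const p of controlPoints) {
    const i = (p.y * frame.width + p.x) * 4;
    const diff = Math.abs(frame.data[i] - previousFrame.data[i]);
    if (diff > 30) changedXs.push(p.x); // assumed brightness threshold
  }
  previousFrame = frame;
  if (changedXs.length === 0) return;

  // Compare the centre of the lit-up points between frames to get a motion vector.
  const centroid = changedXs.reduce((a, b) => a + b, 0) / changedXs.length;
  if (previousCentroid !== null) {
    const dx = centroid - previousCentroid;
    if (dx > 5) $(document).trigger('MyPlugin/MoveRight', [dx]);
    if (dx < -5) $(document).trigger('MyPlugin/MoveLeft', [dx]);
  }
  previousCentroid = centroid;
}
```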

Monte-carlo methods in graphics

I was reading through this interesting post about using JavaScript to generate an image of a rose. However, I'm a bit confused, as this article claims that the author used Monte Carlo methods to reduce the code size.
It's my understanding that the author was using monte carlo methods to do something like GIF interlacing, so that the image would appear to load more quickly. Have I missed something?
The Monte Carlo (MC) method used by the author has nothing to do with the resulting image file type; it has everything to do with how the image was generated in the first place. Since the point of JS1K is to write compact code, the author defines the rose by mathematical forms that are filled in with tiny dots (so they look like a solid image) by a basic renderer.
How do you fill those forms in? One method is to sample the surface uniformly, that is, place a dot at every step of a set interval. As #Jordan quoted, this works if and only if the interval is set correctly: make it too small and it takes too long; make it too large and the image is patchwork. However, you can bypass the whole problem by sampling the surface randomly. This is where the MC method comes in.
I've seen this confusion over MC before, since it is often thought of as a tool for numerical simulation. While it is widely used as such, the core idea is to randomly sample an interval with a bias that weights each step appropriately (dependent on the problem). For example, a physics simulation might use a weight of e^(-E/kT), whereas a numerical integrator might use a weight proportional to the derivative at the sample point. The Wikipedia entry (and the references therein) is a good starting place for more detail.
You can think of the complete rose as a function that is fully computed. As the MC algorithm runs, it samples this function while it converges onto the correct answer.
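As a toy illustration of that idea (not the rose demo's code), here is random sampling of a simple parametric surface, a disc standing in for the rose's petal surfaces; each batch of random dots fills the shape in further with no interval to tune:

```js
// Assumes a <canvas> element exists on the page.
const canvas = document.querySelector('canvas');
const c = canvas.getContext('2d');

function surfacePoint(u, v) {          // u, v in [0, 1): a filled disc
  const r = 80 * Math.sqrt(v);         // sqrt keeps the disc uniformly covered
  const a = 2 * Math.PI * u;
  return [150 + r * Math.cos(a), 150 + r * Math.sin(a)];
}

// Each batch of random samples fills in more of the shape; run enough batches
// and the dots converge to a solid-looking surface.
function renderBatch(count) {
  for (let i = 0; i < count; i++) {
    const [x, y] = surfacePoint(Math.random(), Math.random());
    c.fillRect(x, y, 1, 1);
  }
}
setInterval(() => renderBatch(2000), 16);
```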
The author writes in the article that he uses Monte Carlo sampling to overcome the limitations of interval based sampling because the latter "requires the setting of a proper interval for each surface. If the interval is big, it render fast but it can end with holes in the surface that has not been filled. On the other hand, if the interval is too little, the time for rendering increments up to prohibitive quantities." I believe that the WebMonkey article's conclusion re: code size is incorrect.
