I am very happy that I got the opportunity to work on a website that is gesture-based.
I have a few inspirations for this: link
I visited a lot of websites and googled it; Wikipedia and GitHub also didn't help much. There isn't much information available, as these technologies are still in their nascent stages.
I think I will have to use some JS for this project:
gesture.js (our custom JavaScript code)
reveal.js (framework for slideshows)
My questions are: how do gestures generate events, and how does my JavaScript interact with the webcam? Do I have to use some API or algorithm?
I am not asking for code, just about the mechanism; some links providing vital info will do. I seriously believe that if the accuracy of this technology can be improved, it can do wonders in the near future.
To enable gestural interactions in a web app, you can use navigator.getUserMedia() to get video from your local webcam, periodically put video frame data into a canvas element and then analyse changes between frames.
There are several JavaScript gesture libraries and demos available (including a nice slide controller). For face/head tracking you can use libraries like headtrackr.js: example at simpl.info/headtrackr.
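A rough sketch of that capture loop, assuming a <video> and a <canvas> element already exist on the page (the element ids are placeholders, and the newer navigator.mediaDevices form of getUserMedia is used):

    // Sketch: pipe the webcam into a canvas so each frame can be analysed.
    const video = document.getElementById('cam');      // <video id="cam">
    const canvas = document.getElementById('frame');   // <canvas id="frame">
    const ctx = canvas.getContext('2d');

    navigator.mediaDevices.getUserMedia({ video: true })
      .then(stream => {
        video.srcObject = stream;
        video.play();
        requestAnimationFrame(grabFrame);
      })
      .catch(err => console.error('Webcam access denied:', err));

    function grabFrame() {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height);
      // ...compare `pixels` with the previous frame's data here...
      requestAnimationFrame(grabFrame);
    }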
I'm playing around with this a little at the moment, so from what I understand the most basic technique is:
you request access to the user's webcam to capture video.
when permission is given, create a canvas in which to put the video.
you use a filter (black and white) on the video.
you put some control points in the canvas frame (small areas in which all the pixel colors are registered)
you attach a function that runs on each frame (for the purpose of this explanation, I'll only demonstrate left-right gestures)
At each frame:
If the frame is the first one (F0), continue
If not, subtract the previous frame's pixels (F(n-1)) from the current one (Fn)
if there was no movement between Fn and F(n-1), all the pixels will be black
if there was, you will see the difference Delta = Fn - F(n-1) as white pixels
Then you can test your control points to see which areas are lit up, and store them
( ** )x = DeltaN
Repeat the same operations until you have two or more Delta variables; then subtract the control points of Delta(N-1) from those of DeltaN and you'll have a vector
( **)x = DeltaN
( ** )x = Delta(N-1)
( +2 )x = DeltaN - Delta(N-1)
You can now test whether the vector is positive or negative, or whether its values are greater than some threshold of your choosing
if positive on x and value > 5
and trigger an event, then listen to it:
$(document).trigger('MyPlugin/MoveLeft', values)
$(document).on('MyPlugin/MoveLeft', doSomething)
You can greatly improve the precision by caching the vectors, or by accumulating them and only triggering an event when their values reach a sensible magnitude.
You can also look for a shape in your first subtractions and try to map a "hand" or a "box", and listen to changes in the shape's coordinates; but remember that the gestures happen in 3D while the analysis is 2D, so the same shape can change while it moves.
Here's a more precise explanation. Hope my explanation helped.
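A minimal sketch of the frame-differencing step described above, assuming two same-sized ImageData frames and rectangular control regions (the function names and thresholds are illustrative only):

    // Sketch: mark pixels that changed between two frames.
    function frameDelta(prev, curr, threshold = 40) {
      const delta = new Uint8Array(curr.data.length / 4);
      for (let i = 0; i < delta.length; i++) {
        // Use the red channel as a rough grayscale value.
        const diff = Math.abs(curr.data[i * 4] - prev.data[i * 4]);
        delta[i] = diff > threshold ? 1 : 0; // 1 = "white" (movement)
      }
      return delta;
    }

    // Count lit pixels inside one control point (a small rectangle).
    function litPixels(delta, width, region) {
      let count = 0;
      for (let y = region.y; y < region.y + region.h; y++) {
        for (let x = region.x; x < region.x + region.w; x++) {
          if (delta[y * width + x]) count++;
        }
      }
      return count;
    }

    // If activity shifts from the right control point to the left one
    // between two consecutive deltas, treat it as a left movement:
    // $(document).trigger('MyPlugin/MoveLeft', values);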
I found an app called https://skinmotion.com/ and, for learning purposes, I would like to create my own web version of it.
The web application works as follows. It asks the user for permission to access the camera. After that, video is captured. Once every second, an image is taken from the stream and processed. During this processing, I look for a sound wave pattern in the image.
If the pattern is found, video recording stops and some action is executed.
Example of pattern - https://www.shutterstock.com/cs/image-vector/panorama-mini-earthquake-wave-on-white-788490724.
Ideally, it should work like QR codes: even a small QR code is detected, and detection should not depend on rotation or scaling.
I am no computer vision expert; this field is fairly new to me. I need some help. What is the best way to do this?
Should I train my own TensorFlow model and use tensorflow.js? Or is there an easier and more "lightweight" option?
My problem is that I could not find or come up with an algorithm for processing the captured image to make it as "comparable" as possible: scaling it up, rotating it, thresholding to black and white, etc.
I hope that after this transformation, resemble.js could be used to compare the "original" and "captured" images.
Thank you in advance.
With Deep Learning
If there are certain wave patterns to be recognized, a classification model can be written using tensorflow.js.
However, if the model is to identify wave patterns in general, it is more complex; an object detection model would be needed.
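For the classification route, a small tensorflow.js model could look roughly like this (the 64x64 grayscale input, the layer sizes and the two output classes are assumptions for illustration, not something the app prescribes):

    import * as tf from '@tensorflow/tfjs';

    // Sketch: tiny binary classifier ("frame contains the wave pattern" or not).
    const model = tf.sequential();
    model.add(tf.layers.conv2d({
      inputShape: [64, 64, 1], filters: 8, kernelSize: 3, activation: 'relu'
    }));
    model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
    model.add(tf.layers.flatten());
    model.add(tf.layers.dense({ units: 2, activation: 'softmax' }));
    model.compile({
      optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy']
    });

    // Training would then use labelled crops taken from camera frames:
    // await model.fit(trainImages, trainLabels, { epochs: 20 });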
Without Deep Learning
Adding to the complexity would be detecting the waveform and playing audio from it. In that case, the image can be read byte by byte. The wave graph is drawn in a color that differs from the background of the image, so the area of interest can be identified and an array representing the waveform can be generated.
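A rough sketch of that scan, assuming the wave is drawn in a dark colour on a light background (the threshold is an arbitrary choice):

    // Sketch: walk the image column by column and record the vertical extent
    // of "dark" pixels in each column as a crude waveform amplitude.
    function extractWaveform(imageData, darkThreshold = 80) {
      const { data, width, height } = imageData;
      const samples = [];
      for (let x = 0; x < width; x++) {
        let top = -1, bottom = -1;
        for (let y = 0; y < height; y++) {
          const i = (y * width + x) * 4;
          const brightness = (data[i] + data[i + 1] + data[i + 2]) / 3;
          if (brightness < darkThreshold) {
            if (top === -1) top = y;
            bottom = y;
          }
        }
        // 0 where the column contains no dark pixel at all.
        samples.push(top === -1 ? 0 : (bottom - top) / height);
      }
      return samples; // one value in [0, 1] per image column
    }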
Playing the audio from the array can then be done as shown here
I'm currently trying to understand and get a better mental image of the relationship between the following methods:
listener.setPosition(x,y,z);
panner.setPosition(x,y,z);
From here on out I will describe what I think I know and hopefully a few people will tell me where I'm wrong and correct me.
Now, at its most basic, I imagine a sphere with both of these "positions" occupying the same place in the middle. In a way, you could look at panner.setPosition() and listener.setPosition() as two physical "things" inside our hypothetical sphere.
Now when you change the coordinates of one you move it in relation to the other.
So one point of confusion is this: from what I've come to understand, the coordinates of these two methods are not in any particular unit. However, unlike the x value, where 1 and -1 determine the range with 0 in the middle, the z coordinate actually does seem to have a value: if I give it a positive value, it does "push the sound further away" (or, more to the point, attenuate it). 0 is close and 200 is far for example. What value is this?
Partial Answer:
The reason 1 to -1 are hard right/left is that z is set to zero, hence there is zero space between the listener and the "sound source", which is represented by the panner node. I am still unclear as to what "value" the z coordinate represents.
The other questions I have:
1.
If you have a hypothetical "world" where an imaginary person is moving closer to or further away from a sound source, should you program the listener's z coordinate or the panner's z coordinate? The reason I'm asking is that in this example the panner is programmed and the listener stays in the same place, yet the UI suggests the listener is moving toward the sound source. I guess this question is more about best practice, as it seems either will work.
2.
When I run any sound through the default panner node it literally sounds different (more muffled and "dense", almost as if two instances of the same audio file are playing simultaneously or the highs are getting cancelled out). I assume there is some combination of "settings" that the programmer is expected to know about to remedy this. I'm interested in knowing what that is.
If you would like a side-by-side example, below are two JSFiddle examples: one uses a panner node and one does not. Open both of them and play the files side by side while listening on headphones; the difference is subtle but noticeable.
JSfiddle with panner node
JSfiddle without panner node
0 is close and 200 is far for example. What value is this?
I think the numbers are arbitrary units. The section on distanceModel gives the formulae for calculating the change in gain based on the distance between the source and the listener.
Also the spec mentions that "PannerNode scales/multiplies the input audio signal by distanceGain"
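For reference, the default "inverse" distance model works out to something like this (a paraphrase of the spec formula, not code taken from the API):

    // Sketch: gain applied by the "inverse" distance model, driven by the
    // PannerNode's refDistance and rolloffFactor attributes.
    function inverseDistanceGain(distance, refDistance = 1, rolloffFactor = 1) {
      const d = Math.max(distance, refDistance);
      return refDistance / (refDistance + rolloffFactor * (d - refDistance));
    }

    // inverseDistanceGain(200) is 0.005, which is why z = 200 sounds far away.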
For answers to your questions:
Q1. - should you program the listener's z coordinate or should you program the panner's z coordinate
It really depends on your application. The API is designed to have a single Listener per AudioContext, but you can create multiple AudioSources with a PannerNode attached to each of them. You can envision this as a bunch of loudspeakers (AudioSources) that you can position in 3D space, while you yourself (the Listener) can also move around. Depending on whether you want the sources to move with respect to each other or not, you may or may not want the listener to move.
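A minimal sketch of that setup (buffer loading is omitted and the coordinates are illustrative):

    // Sketch: one listener, several independently positioned sources.
    const audioCtx = new AudioContext();

    function makeSource(buffer, x, y, z) {
      const source = audioCtx.createBufferSource();
      source.buffer = buffer;
      const panner = audioCtx.createPanner();
      panner.setPosition(x, y, z);      // fix this "loudspeaker" in space
      source.connect(panner);
      panner.connect(audioCtx.destination);
      return source;
    }

    // Move the single listener instead of touching every panner:
    audioCtx.listener.setPosition(0, 0, 5); // "walk" 5 units away from the sources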
Q2. - sounds different (more muffled and "dense")
I am not sure about this. I have not encountered this before. Could you give an example/demo?
Based on trying the various panning models, it seems like the muffled audio is the result of using the HRTF panning model. The HRTF impulse responses definitely do NOT have a flat frequency response and will colour the audio. You can use the equalpower panning model instead.
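Switching the model is a one-line change (property names as in the Web Audio API; at the time this was asked, HRTF was the default in some browsers):

    // Sketch: avoid the HRTF colouring by using equal-power panning.
    const audioCtx = new AudioContext();
    const panner = audioCtx.createPanner();
    panner.panningModel = 'equalpower'; // instead of 'HRTF'
    panner.distanceModel = 'inverse';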
Be warned, though: a lot of these things are going to change in the upcoming revision of Web Audio. You can read more about it here: https://github.com/WebAudio/web-audio-api/issues/372
I'm trying to build something in HTML5/Canvas to allow tracing over an image and alert if deviating from a predefined path.
I've figured out how to load an external image into the canvas, and allow mousedown/mousemovement events over it to draw over the image, but what I'm having trouble getting my head around is comparing the two.
Images are all simple black-on-white outlines, so from what I can tell a getPixel-style check can tell whether there is black underneath what has been drawn, or underneath where the mouse is.
I could do it with just the mouse position, but that would require defining the paths of every image outline (and there are a fair number, hence I would ideally do it by analyzing the underlying image).
I've been told that it's possible with Flash, but I would like to avoid that if possible so that compatibility with non-Flash platforms (namely the iPad) can be maintained, as they are the primary target for the page.
Any insight or assistance would be appreciated!
I think you already touched upon the most straight-forward approach to solving this.
Given a black and white image on a canvas, you can attach a mousemove event handler to the element to track where the cursor is. If the user is holding left-mouse down, you want to determine whether or not they are currently tracing the pre-defined path. To make things less annoying for the user, I would approach this part of the problem by sampling a small window of pixels. Something around 9x9 pixels would probably be a good size. Note that you want your window size to be odd in both dimensions so that you have a symmetric sampling in both directions.
Using the location of the cursor, call getImageData() on the canvas. Your function call would look something like this: getImageData(center_x - Math.floor(window_size / 2), center_y - Math.floor(window_size / 2), window_size, window_size) so that you get a sample window of pixels with the center right over the cursor. From there, you could do a simple check to see if any non-white pixels are within the window, or you could be more strict and require a certain number of non-white pixels to declare the user on the path.
The key to making this work well, I think, is making sure the user doesn't receive negative feedback when they deviate the tiniest bit from the path (unless that's what you want). At that point you run the risk of making the user annoyed and frustrated.
Ultimately it comes down to one of two approaches. Either you load the actual vector path for the application to compare the user's cursor to (i.e. do point-in-path checks), or you sample pixel data from the image. If you don't require the perfect accuracy of point-in-path checking, I think pixel sampling should work fine.
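A rough sketch of that sampling check (the window size and the "non-white" test are the assumptions described above):

    // Sketch: is the cursor currently over the black outline?
    function isOnPath(ctx, centerX, centerY, windowSize = 9, minDarkPixels = 1) {
      const half = Math.floor(windowSize / 2);
      const { data } = ctx.getImageData(
        centerX - half, centerY - half, windowSize, windowSize
      );
      let dark = 0;
      for (let i = 0; i < data.length; i += 4) {
        // Anything clearly darker than white counts as part of the outline.
        if (data[i] < 200 && data[i + 1] < 200 && data[i + 2] < 200) dark++;
      }
      return dark >= minDarkPixels;
    }

    // canvas.addEventListener('mousemove', e => {
    //   if (mouseIsDown && !isOnPath(ctx, e.offsetX, e.offsetY)) warnUser();
    // });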
Edit: I just re-read your question and realized that, based on your reference to getPixel(), you might be using WebGL for this. The approach for WebGL would be the same, except you would of course be using different functions. I don't think you need to require WebGL, however, as a 2D context should give you enough flexibility (unless the app is more involved than it seems).
Are there any advanced solutions for capturing a hand drawing (from a tablet, touch screen or iPad like device) on a web site in JavaScript, and storing it on server side?
Essentially, this would be a simple mouse drawing canvas with the specialty that its resolution (i.e. the number of mouse movements it catches per second) needs to be very high, otherwise round lines in the drawing will become "polygonal" when moving the pen / mouse fast:
(if this weren't the case, the inputDraw solution suggested by #Gregory would be 100% perfect.)
It would also have to have a high level of graphical quality, i.e. antialiasing of the pen stroke. Nothing fancy here, but an MS Paint style 1x1 pixel stroke won't cut it.
I find this a very interesting thing in general, seeing as Tablet PCs are becoming at least a bit more common. (Not that they get the attention I feel they deserve).
Any suggestions are highly appreciated. I would prefer an Open Source solution, but I am also open to proprietary solutions like ActiveX controls or Java Applets.
FF4, Chrome support is a must; Opera, IE8/9 support is desired.
Please note that most "canvas" libraries around, and most answers to other questions similar to mine, refer to programmatically drawing onto a canvas. This is not what I am looking for. I am looking for something that records the actual pen or mouse movements of the user drawing on a certain area.
Starting a bounty out of curiosity whether anything has changed during the time since this question was asked.
I doubt you'll get anything higher resolution than the "onmousemove" event gives you, without writing an efficient assembler program on some embedded system custom built for the purpose. You run inside an OS, you play by the OS's rules, which means you're limited by the frequency of the timeslices an OS will give you (usually about 100 per second, fluctuating depending on load). I've not used a tablet that can overcome the "polygon" problem, and I've used some high-end tablets. Photoshop overcomes the problem with cubic interpolation.
That is, unless you have a very special tablet that will capture many movement events and queue them up in some internal buffer, sending a whole packet of coordinates at a time when it dispatches data to the OS. I've looked at tablet APIs though, and they only give one set of coordinates at a time, so if this is going to happen, you'll need custom hardware, a custom driver, and custom APIs that can handle packets of multiple coordinates.
Or you could just use a damned canvas tag, the onmousemove event, event.pageX|pageY, some cubic interpolation and the toDataURL API of canvas, post the result to your PHP script, and then just say you did all that other fancy stuff.
onmousemove, in my tests, will give you one event per pixel of movement, limited only by the speed of the event loop in the browser. You'll get sparse data points (polygons) with fast movement and that's as good as it gets without a huge research grant and a hardware designer. Deal.
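A bare-bones sketch of that approach, smoothing between sparse samples with quadratic curves (the element id and the /save endpoint are placeholders):

    // Sketch: record mousemove samples and draw a smoothed stroke between them.
    const canvas = document.getElementById('draw');
    const ctx = canvas.getContext('2d');
    ctx.lineWidth = 2;
    ctx.lineJoin = ctx.lineCap = 'round';

    let points = [];
    canvas.addEventListener('mousedown', e => {
      points = [{ x: e.offsetX, y: e.offsetY }];
    });
    canvas.addEventListener('mousemove', e => {
      if (!points.length) return;
      points.push({ x: e.offsetX, y: e.offsetY });
      if (points.length < 3) return;
      const [p0, p1, p2] = points.slice(-3);
      // Curve from midpoint(p0, p1) to midpoint(p1, p2) with p1 as the control
      // point, so sparse samples from fast strokes still look rounded.
      ctx.beginPath();
      ctx.moveTo((p0.x + p1.x) / 2, (p0.y + p1.y) / 2);
      ctx.quadraticCurveTo(p1.x, p1.y, (p1.x + p2.x) / 2, (p1.y + p2.y) / 2);
      ctx.stroke();
    });
    canvas.addEventListener('mouseup', () => {
      // Post the finished drawing to the server as a PNG data URL.
      fetch('/save', { method: 'POST', body: canvas.toDataURL('image/png') });
      points = [];
    });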
There are some applets for this in the oekaki world: Shi Painter, ChibiPaint or PaintBBS. Here you have PHP classes for integration.
Drawings produced by these applets can have quite good quality. If you register at oekakicentral.com you can see all the galleries, and some drawings have an animation link that shows how they were drawn (it depends on the applet), so you can compare the possibilities of the applets. Some of them are open source.
Edit: See also this made in HTML 5.
Have a look at <InputDraw/>: a Flash component that turns freehand drawing into SVG. Then you could send the generated SVG back to your server.
It's free for non-commercial use. According to their site, the commercial-use price is 29€. It's not open source though.
IMHO it's worth a look.
Alternatively, you could implement something based on svg-edit, which is open source and uses jQuery (demo). It requires the Google Chrome Frame plugin for IE6+ support though.
EDIT: I just found the svg-freehand-signature project (demo), which captures your handwritten signature and sends it to a server as SVG using POST. It's distributed as a straightforward and self-contained zip (it works out of the box with Safari and Firefox; you may want to combine it with svgweb, which brings SVG support to Internet Explorer).
EDIT: I successfully combined Cesar Oliveira's canvaslol (just look at the source of the page to see how it works) with ExplorerCanvas to have something on IE. You can also have a look at Anne van Kesteren's Paintr experiment.
markup.io is doing that with an algorithm applied after the mouseup.
I asked a similar question recently, and got interesting but not satisfying answers: Is there any way to accelerate the mousemove event?
Shouldn't panning not only alter volume, but also left-right delay according to the distances from the source to the ears? The documentation doesn't seem to mention this, but differential arrival time of sound is a central part of human aural localization. If these panner nodes don't do it, do DelayNodes have sufficient time resolution to be used, and can they be connected appropriately?
The StereoPannerNode does not have different delays. The PannerNode with type "equal-power" doesn't either, but the "HRTF" type would, since the impulse responses through the head take into account the location of the source relative to each ear.
A DelayNode probably has sufficient resolution to delay the signals if that's what you want to do.
I've never had problems with localization when using the HRTF panner. I've always been able to tell if the source is above or below, or in front or behind, or left or right.
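If you do want an explicit interaural time difference, a pair of DelayNodes can provide it; a rough sketch (the ~0.65 ms figure is the approximate maximum human ITD, not a value from the spec):

    // Sketch: delay one ear slightly relative to the other (manual ITD).
    const audioCtx = new AudioContext();

    function itdPan(monoSource, rightDelaySeconds) {
      const leftDelay = audioCtx.createDelay(0.001);   // 1 ms max is plenty
      const rightDelay = audioCtx.createDelay(0.001);
      rightDelay.delayTime.value = rightDelaySeconds;  // e.g. 0.0005 pushes the source to the left
      const merger = audioCtx.createChannelMerger(2);
      monoSource.connect(leftDelay).connect(merger, 0, 0);  // left ear
      monoSource.connect(rightDelay).connect(merger, 0, 1); // right ear
      merger.connect(audioCtx.destination);
    }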