I found an app called https://skinmotion.com/ and, for learning purposes, I would like to create my own web version of it.
The web application works as follows. It asks the user for permission to access the camera. After that, video is captured. Once every second, an image is taken from the stream and processed. During this processing, I look for a soundwave pattern in the image.
If the pattern is found, video recording stops and some action is executed.
Example of pattern - https://www.shutterstock.com/cs/image-vector/panorama-mini-earthquake-wave-on-white-788490724.
Ideally, it should work like QR codes: even a small QR code is detected, and detection should not depend on rotation or scaling.
I am no computer vision expert, this field is fairly new to me. I need some help. Which is the best way to do this?
Should I train my own Tensorflow dataset and use tensorflow.js? Or is there easier and more "light weight" option?
My problem is, I could not find or come up with an algorithm for processing the captured image to make it as "comparable" as possible - scale it up, rotate, threshold to black and white, etc.
I hope that after this transformation, resemble.js could be used to compare "original" and "captured" image.
Thank you in advance.
With Deep Learning
If there are specific wave patterns to be recognized, a classification model can be written using tensorflow.js.
However, if the model has to identify wave patterns in general, the task is more complex and an object detection model should be used.
Without deep learning
Detecting the waveform and playing audio from it adds to the complexity. In this case, the image can be read byte by byte. The wave graph is drawn in a color that differs from the background of the image, so the area of interest can be identified and an array representing the waveform can be generated.
Then to play the audio from the array, it can be done as shown here
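A minimal sketch of the "read the image byte by byte" idea, assuming the pixels arrive as the RGBA array that `getImageData().data` returns on a canvas, and assuming a dark wave on a light background. The function name `extractWaveform` and the `threshold` default are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: returns one amplitude (0..1) per image column.
// `pixels` is RGBA data, 4 bytes per pixel, row by row.
function extractWaveform(pixels, width, height, threshold = 128) {
  const amplitudes = new Array(width).fill(0);
  for (let x = 0; x < width; x++) {
    let top = -1;
    let bottom = -1;
    for (let y = 0; y < height; y++) {
      const i = (y * width + x) * 4;
      // Plain average of R, G, B as a rough brightness measure.
      const brightness = (pixels[i] + pixels[i + 1] + pixels[i + 2]) / 3;
      if (brightness < threshold) {
        if (top === -1) top = y; // first dark pixel in this column
        bottom = y;              // last dark pixel seen so far
      }
    }
    // Height of the dark span in this column, relative to the image height.
    amplitudes[x] = top === -1 ? 0 : (bottom - top + 1) / height;
  }
  return amplitudes;
}
```

The resulting array is only a rough envelope of the drawn wave; rotation, perspective and lighting would still have to be handled before this step.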
I am currently working on a sneaker recommendation application. As part of the implementation, a user needs to upload an image of a sneaker, and from this, the application attempts to find 3 closest matching shoes. To date, I have created a dataset of sneakers and have decided to use resemblejs in order to compare the upload images against what is in the dataset.
However I am encountering a problem. Prior to sending the uploaded sneaker image, I need to ascertain its shape in order to determine whether or not it is valid for submission.
Ideally I would like all the pictures facing one way... so consider the following scenario.
User uploads:
It's in the wrong direction, so the application knows to mirror it as all images in the dataset face this direction:
So all that needs to be done is mirror the picture as it is perfectly valid. Once this is done, the image has passed preprocessing and is compared against what is in the database.
I have researched various ways of doing this, and one of the approaches that struck me involves using an inverse binary thresholding approach in order to turn the background of the image black while whiting out the contents of the shoe itself.
Is it possible to accomplish this in Javascript? If so how?
I then thought that it would make sense to analyse the picture to see which side had more black/white (dividing it in half) in order to see if it needed to be mirrored. In a case where it cannot be mirrored, the user just has to upload a new picture.
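Both steps can be done in plain JavaScript on the RGBA array a canvas returns from `getImageData().data`. The sketch below is only an illustration under assumptions: the function names and the `threshold` default are hypothetical, the background is assumed to be light, and whether "more white on one side" really indicates the shoe's direction would have to be checked against the dataset.

```javascript
// Inverse binary threshold: dark pixels (the shoe) become 255, light pixels (background) become 0.
function inverseBinaryThreshold(pixels, threshold = 200) {
  const mask = new Uint8Array(pixels.length / 4);
  for (let i = 0; i < mask.length; i++) {
    const o = i * 4;
    // Rec. 601 luma weights for converting RGB to gray.
    const gray = 0.299 * pixels[o] + 0.587 * pixels[o + 1] + 0.114 * pixels[o + 2];
    mask[i] = gray < threshold ? 255 : 0;
  }
  return mask;
}

// Counts foreground pixels in the left and right halves of the mask.
// For odd widths the middle column is ignored.
function heavierSide(mask, width, height) {
  const half = Math.floor(width / 2);
  let left = 0;
  let right = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] === 0) continue;
      if (x < half) left++;
      else if (x >= width - half) right++;
    }
  }
  if (left === right) return "balanced";
  return left > right ? "left" : "right";
}
```

If the heavier side differs from the one used in the dataset, the image can be mirrored before comparison; a "balanced" result could be treated as a reason to ask for a new picture.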
I've been working on a custom video codec for use on the web. The custom codec will be powered by javascript and the html5 Canvas element.
There are several reasons for me wanting to do this that I will list at the bottom of this question, but first I want to explain what I have done so far and why I am looking for a fast DCT transform.
The main idea behind all video compression is that frames next to each other share a large amount of similarities. So what I'm doing is I send the first frame compressed as a JPEG. Then I send another JPEG image that is 8 times as wide as the first frame, holding the "differences" between the first frame and the next 8 frames after that.
This large Jpeg image holding the "differences" is much easier to compress because it only has the differences.
I've done many experiments with this large jpeg and I found out that when converted to a YCbCr color space the "chroma" channels are almost completely flat, with a few stand out exceptions. In other words there are few parts of the video that change much in the chroma channels, but some of the parts that do change are quite significant.
With this knowledge I looked up how JPEG compression works and saw that among other things it uses the DCT to compress each 8x8 block. This really interested me because I thought what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
This would increase both decoding speed, and improve bit rate transfer because there would be less data to work with.
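To make "close enough" concrete, here is one possible similarity test between two 8x8 blocks, using the mean absolute difference of their 64 samples. This is only a sketch: the function name and the default tolerance are hypothetical, and other measures (for example comparing quantized DCT coefficients) may suit the codec better.

```javascript
// Returns true when two equally sized blocks differ, on average,
// by no more than `maxMeanAbsDiff` per sample.
function blocksAreSimilar(blockA, blockB, maxMeanAbsDiff = 2) {
  if (blockA.length !== blockB.length) {
    throw new Error("blocks must have the same number of samples");
  }
  let sum = 0;
  for (let i = 0; i < blockA.length; i++) {
    sum += Math.abs(blockA[i] - blockB[i]);
  }
  return sum / blockA.length <= maxMeanAbsDiff;
}
```

When the test passes, the encoder could emit a "repeat previous block" marker instead of the block data.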
I thought that this should be a simple task to accomplish. So I tried to build my own "modified" jpeg encoder/decoder. I built the RGB to YCbCr converter, I left "gzip" compression to do the huffman encoding and now the only main part I have left is to do the DCT transforms.
However this has me stuck. I cannot find a fast 8-point 1D DCT. I am looking for this specific transform because, according to many articles I've read, the 2D 8x8 DCT can be separated into several 1x8 1D transforms. This is the approach many implementations of JPEG use because it's faster to process.
So I figured that with Jpeg being such an old well known standard a fast 8 point 1d dct should just jump out at me, but after weeks of searching I have yet to find one.
I have found many algorithms that use the O(N^2) complexity approach. However that's bewilderingly slow. I have also found algorithms that use the Fast Fourier Transform and I've modified them to compute the DCT, such as the one in the link below:
https://www.nayuki.io/page/free-small-fft-in-multiple-languages
In theory this should have the "fast" complexity of O(Nlog2(n)) but when I run it it takes my i7 computer about 12 seconds to encode/decode the "modified" jpeg.
I don't understand why it's so slow? There are other javascript jpeg decoders that can do it much faster, but when I try to look through their source code I can't pull out which part is doing the DCT/IDCT transforms.
https://github.com/notmasteryet/jpgjs
The only thing I can think of is maybe the math behind the DCT has already been precomputed and is being stored in a lookup table or something. However I have looked hard on google and I can't find anything (that I understand at least) that talks about this.
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Okay, as for why I want to do this, the main reason is I want to have "interactive" video for mobile phones on my website. This cannot be done because of things like iOS loading up its "native" QuickTime player every time it starts playing a video. Also it's hard to make the transition to another point in time of the video seem "smooth" when you have so little control over how videos are rendered, especially on mobile devices.
Thank you again very much for any help that anyone can provide!
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Take a look at the Flash world and the JPEG encoder there (from before it was integrated into the engine).
Here for example: http://www.bytearray.org/?p=1089 (sources provided). This code contains a function called fDCTQuant() that does the DCT, first for the rows, then for the columns, and then it quantizes the block (so basically there you have your 8x1 DCT).
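The rows-then-columns structure, combined with the lookup-table idea from the question, can be sketched as follows. This is not the code from the linked encoder. It is the direct 8-point DCT-II with all cosines precomputed once, so each 1D transform still costs 64 multiplications. Factored algorithms such as AAN (Arai, Agui, Nakajima) need far fewer and are what a production codec would more likely use. All names here are hypothetical.

```javascript
const N = 8;

// COS_TABLE[k * N + n] = c(k) * cos((2n + 1) * k * PI / 16), computed once.
// c(0) = sqrt(1/8), c(k > 0) = sqrt(2/8), which makes the transform orthonormal.
const COS_TABLE = new Float64Array(N * N);
for (let k = 0; k < N; k++) {
  const scale = k === 0 ? Math.sqrt(1 / N) : Math.sqrt(2 / N);
  for (let n = 0; n < N; n++) {
    COS_TABLE[k * N + n] = scale * Math.cos(((2 * n + 1) * k * Math.PI) / (2 * N));
  }
}

// 8-point 1D DCT reading and writing with a stride, so the same routine
// serves rows (stride 1) and columns (stride 8) of a flat 64-element block.
function dct1d(input, inOffset, inStride, output, outOffset, outStride) {
  for (let k = 0; k < N; k++) {
    let sum = 0;
    for (let n = 0; n < N; n++) {
      sum += COS_TABLE[k * N + n] * input[inOffset + n * inStride];
    }
    output[outOffset + k * outStride] = sum;
  }
}

// Separable 2D DCT of an 8x8 block stored row by row in 64 elements.
function dct2d(block) {
  const tmp = new Float64Array(N * N);
  const out = new Float64Array(N * N);
  for (let row = 0; row < N; row++) dct1d(block, row * N, 1, tmp, row * N, 1);
  for (let col = 0; col < N; col++) dct1d(tmp, col, N, out, col, N);
  return out;
}
```

Allocating the two temporary arrays once and reusing them across blocks, rather than per call as above, is usually worth it when transforming a whole frame.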
So what I'm doing is I send the first frame compressed as a jpg. Then I send another Jpeg image ...
Take a look at progressive JPEG. I think some of how it works, and how the data stream is built, will sound kind of familiar given this description (not the same, but they both go in related directions, imo).
what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
The expressions "similar" and "close enough" got my attention here. Take a look at the commonly used quantization tables. A change of the value by 1 could easily result in a change of 15% brightness (for chroma channels usually even more) at that point, depending on the position in the 8x8 block and therefore the applied quantifier.
calculation with quantifier 40
(may be included in the set even at the lowest compression rates
at lower compression rates some quantifier can go up to 100 and beyond)
change the input by 1 changes the output by 40.
since we are working on 1byte value-range it's a change of 40/255
that is about 15% of the total possible range
So you should be really thoughtful what you call "close enough".
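The arithmetic above can be checked with a small sketch. The helper names are hypothetical, and rounding to the nearest integer is assumed for the quantization step:

```javascript
// Quantization divides a coefficient by the table entry and rounds;
// dequantization multiplies the stored level back.
function quantize(coefficient, q) {
  return Math.round(coefficient / q);
}

function dequantize(level, q) {
  return level * q;
}

const q = 40;
// Two inputs that differ by only 1 can land on different levels...
const levelA = quantize(99, q);  // 99 / 40 = 2.475 rounds to 2
const levelB = quantize(100, q); // 100 / 40 = 2.5 rounds to 3
// ...and then reconstruct a full quantifier step apart.
const step = dequantize(levelB, q) - dequantize(levelA, q); // 120 - 80 = 40
const shareOfRange = step / 255; // roughly 0.157 of a 1-byte range
```

So any "close enough" test on quantized data has to allow for jumps of one full quantifier step.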
To sum this up: a video codec based on JPEG that utilizes differences between the frames to reduce the amount of data. That also sounded kind of familiar to me.
Got it: MPEG https://github.com/phoboslab/jsmpeg
*no connection to the referenced codes or the coder
I implemented separable integer 2D DCTs of various sizes (as well as other transforms) here: https://github.com/flanglet/kanzi/tree/master/java/src/kanzi/transform. The code is in Java but really for this kind of algorithm, it is pretty much the same in any language. The most interesting part IMO is the rescaling you do after computing each direction. Depending on your goals (max precision, 16 bit computation, no scaling, ...), you may want to change the scaling factors for each step. Using bigger blocks in areas where the image is very uniform saves bits.
This book shows how the DCT matrix can be factored to Gaussian Normal Form. That would be the fastest way to do a DCT.
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=pd_sim_14_1?ie=UTF8&dpID=41XJBED6RCL&dpSrc=sims&preST=_AC_UL160_SR127%2C160_&refRID=1Q0G2H5EFYCQW2TJCCJN
Long time Stack Overflow creeper. This community has come up with some incredibly elegant solutions to rather perplexing questions.
I'm more of a CSS3 or PHP kinda guy when it comes to handling dynamically displayed content. Ideally someone with a solid knowledge base of jQuery and/or Javascript would be able to answer this one best. Here is the idea, along with the thought process behind it:
Create a Full Screen (width:100%; height:auto; background:cover;) Video background. But instead of going about using HTML5's video tag, a flash fallback, iFrame, or even .GIF, create a series of images, much like the animation render output of say Cinema4D, that if put together in sequential order create a seamless pseudo-video experience.
In Before "THAT's JUST A .GIF, YOU'RE AN IDIOT" Guy.
I believe jQuery/Javascript could solve this. Would it or would it not be possible to write a script that essentially recognizes (or even adds) the div class of an image, then sets that image to display for, say, about 33.4 ms (one frame at a 29.97 frame rate), then sets this image back in z space while at the same time bringing in the next image within the sequential class order to display for another 33.4 ms, and so on until all of the images (or "frames") play out seamlessly, so the user would assume he/she is actually seeing a video, not knowing it's actually a .GIF on steroids.
Here's a more verbose way of explaining the intended result:
You have a 1 second super awesome 1080p video clip (video format doesn't matter for helping to answer this question, just assume it's lossless and really pretty, k?). It's recorded at 29.97 frames per second. Break each frame into its own massive image file, leaving you with essentially 30 images. 24 frames a second would mean you'd have 24 images, 60 frames per second would mean you'd have 60 images, et cetera.
If you have ever used Cinema4D, the output I am looking to recreate is similar to that of an animation render, where you are left with a .TIFF per frame, placed side by side so that when loaded into Photoshop or viewed in QuickTime you get a "slideshow" of images displaying so fast it looks like a video.
HTML would look something like this:
<div id="incredible-video">
  <div class="image-1">
    <img src="../movie/scene-one.tiff" />
  </div>
  <div class="image-2">
    <img src="../movie/scene-two.tiff" />
  </div>
  <div class="image-3">
    <img src="../movie/scene-three.tiff" />
  </div>
  <div class="image-4">
    <img src="../movie/scene-four.tiff" />
  </div>
  <div class="image-5">
    <img src="../movie/scene-five.tiff" />
  </div>
  ....etc.....
  ....etc.....
  ....etc.....
</div>
jQuery/Javascript could handle appending the sequential image classes instead of writing it all out by hand for each "frame".
CSS would look like:
#incredible-video img {
position:absolute;
width:100%;
height:auto;
background:cover;
}
But what would the jQuery/Javascript need to be to pull this off, and can it be done? It would need to happen right after window load, and run on an infinite loop. Of course audio is not happening in this example, but say we don't need it. Say we just want our end user to have a visually appealing page, with a minimal design implemented in the UI.
I love video animation, and really love sites built with full screen backgrounds. But building a site with this visual setup and keeping it responsive is proving too strenuous a challenge. HTML5 will only get you so far, and makes mobile compatibility null and void (data usage protection). .GIF files are MASSIVE compared to calling in a .mp4, .webm, or .ogg, so that option is out.
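One way this could look, as a rough sketch rather than a tested solution: instead of stacking one div per frame, a single image element has its source swapped, and the frame to show is derived from elapsed time so that a slow tick skips frames instead of slowing the animation down. The names `frameIndexAt` and `playSequence` are hypothetical, and the frames are assumed to be preloaded in a format the browser can display.

```javascript
// Which frame should be visible after `elapsedMs`, looping forever.
function frameIndexAt(elapsedMs, fps, frameCount) {
  return Math.floor((elapsedMs * fps) / 1000) % frameCount;
}

// Browser-only wiring: swaps the image source on each animation tick.
function playSequence(imgElement, frameUrls, fps) {
  const start = performance.now();
  let shown = -1;
  function tick(now) {
    // Guard against a tick timestamp that is slightly earlier than `start`.
    const elapsed = Math.max(0, now - start);
    const index = frameIndexAt(elapsed, fps, frameUrls.length);
    if (index !== shown) {
      imgElement.src = frameUrls[index];
      shown = index;
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

A call such as `playSequence(document.querySelector("#incredible-video img"), urls, 29.97)` would start the loop once the window has loaded, where `urls` is an array of frame paths supplied by the page.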
I've actually recently played around with Adobe Edge Animate. Using the Edge Hero .js library I was able to reproduce a similar project to this: http://www.edgehero.com/tutorials/starwars-crawl-tutorial
I found it worked on ALL devices. Very cool. Made me think that maybe it's possible to use this program or hit jQuery/Javascript directly to achieve the desired effect.
Thanks for taking a look at this one guys.
-Cheers,
Branden Dane
I found a viable solution to what I was looking to do. It's actually rather interesting. The answer introduces many interesting ideas on how we can display any kind of content dynamically on a site, in an app, or even in a full-fledged software application.
The answer came about while diving hard into WebGL, canvas animation (both 2D and 3D), 2D video game techniques, and 3D video game techniques. Instead of looking for that "perfect" workflow, if you are someone interested in creating visually effective design and really seeing what the bleeding edge can do for your thoughts on development, skip the GUIs. Ignore the ads with software promising to make things doable in 5 min. It's not. However we are getting there. 3 major events we have to look forward to in just a few months are
1.) the universal agreement to implement WebGL natively in Opera, Chrome and Firefox (of course), Safari will move to ship with WebGL enabled, compared to the user having to enable it manually, and even IE is going to try and give her a go (in IE 12).
2.) Unity 3D, an industry standard in game development, has announced that next month it will release version 5, and with it a complete, intuitive workflow from start to exporting in JavaScript (not JSON, actual JavaScript). The Three.JS library more specifically, as it is one of the most popular of the seemingly endless game engines out today.
How does this answer my initial question?:
Though WebGL has been around for about 3 years now, we are only now starting to see it shine. It's far more than a simple video game engine. With ThreeJS we have a full working JavaScript library, capable of rendering in WebGL, to the Canvas, or EVEN with a little CSS3 magic. Can't use your great movie as a mobile background? Is it ruining the overall UI? Cheer up. ThreeJS can work with both 2D and 3D JavaScript draw functions, though not at the same time. However, other libraries exist that allow you to bypass this rule.
AND DRUM ROLL. It is, or can be very easily made in a responsive or adaptive way.
The answer to my question came from looking at custom preloaders. Realizing I can create incredible looping animations in AE, and export them as GIFs, offered the quality I wanted, but no control, no optimization, no sound. However, PNG sequences CAN be exported. Then the epiphany hit. Before I just say what I am using to solve my problem, I'd like to leave a list of material anyone looking to move beyond easy development and challenge limits can use as a reference guide. This will be in order, from what I began with to where I am now. I hope it helps someone. The time to find it all out would be very much worth it.
1.) WebGL-Three.JS
WebGL opened my eyes to a new world. It's a technology quickly evolving and is here to stay. In a nutshell, all live applications you create now have access to more than just a CPU, but also the Graphics card as well. With GPU's getting more and more powerful, and not so unreasonably priced, the possibilities are endless. The idea we could be playing Crysis 3 "in-browser" without the need of a 3rd party client is no fiction. It's the future. Apply that to websites. Mind blown.
2.) First Cinema4D, then start working around with Verold.com & PlayCanvas.com
C4D is just my personal favorite because of its easy integration with AE. You will find that for exporting your 3D models, textures, meshes, anything to Three.JS (or any game engine, period), it is Blender that is the most widely supported. As of writing this, there are 2 separate C4D workflows to ThreeJS. Both are tedious, not always going to work, and actually just unnecessary. PlayCanvas was also a bit of a letdown. Verold, however, is an EXCELLENT browser-based 3D editor in which you can import a variety of files (even FBX with baked animations!) and when you are satisfied you can export into a standalone client or an iframe. The standalone client is superb. It is a bit glitchy, so have patience. You shouldn't get comfortable with it anyway. Go back to your roots.
3.) iPhone app development, Android app dev (to an impressive extent), Web Sites, Web Apps, and more all function in a way that an application need only be made using JavaScript, HTML/5 and CSS/3. Once this is understood, and the truth hits you as to how much control you may not have known you had, then the day becomes good indeed. Learn the code. With a million untested and horrible "GUI's" out there that claim to do what you want, avoid the useless search. Learn the code. You can never go wrong at that point.
4.)What code do I need to learn?
JavaScript is the most essential. More on that in a moment. Seriously dive into creating apps of any kind with ThreeJS. Mr. Doob (co-creator of the library) has an EXCELLENT, well-documented website with tons of examples, tuts, and source code for you to dive into. Chrome Experiments is your next awesome option to see how people are really taking this kind of development to a new level. In the process of learning ThreeJS, you'll become more proficient with JavaScript. You will also start to play with things you maybe never had to, like JSON, or XML files for packaging data. You'll also learn how simple it is to implement Three.JS as a WebGL render, or even fallbacks to Canvas and even CSS3D if and when possible.
Before going on, I will make a caveat. I believe that once Unity 3D drops ThreeJS for pro and free users, we will see much, much more 3D on the web. In that case, it can't hurt to download the software and play around a bit. It also serves as an excellent visual editor. There are exporters from Unity 3D to ThreeJS, but again they are still at the pre-alpha stage.
2D or not 2D. that is the question
After getting a little dirty with 3D I moved into drawing in the 2D realm using the canvas. Flash still seems like a viable tool, but again, it's all about the code. Learn how to do it and you may find Flash is actually costing you time. I found 2D more difficult than 3D because the nature of 2D has yet to radically change, at least in my lifetime. You'll need to start with sprite sheet creation tutorials. Nothing incredibly hard if you know where to look. Use Photoshop or an equivalent application. Either create as many "movement" frames as would be enough, if put together in a GIF, to seamlessly loop the sprite, OR render a master image out and cut around the element's naturally distinct parts. Ex: You want the guy you have standing on a street corner you created to stay put. Cut that character up into as many separate PNG files as you believe you need. The second method uses the same sprite sheet we brought in for the first. The first scenario means writing CSS selectors and JavaScript by hand, which for the regular user becomes increasingly difficult.
First solution: Using CSS and Javascript to plot "frames" meticulously put together in the sprite sheet. This really can become a pain if not done correctly all the way through.
Second solution: We lose the frame by frame effect if we need it, but our overall 2D animations will look incredible. Also, building in this way creates more efficient games when implementing physics engines and setting up collision detectors. We will still use the same sprite sheet, however we only need to choose the frames we actually need. The idea is to use dynamic tweening between frames that are called together via JavaScript. In the end you have a fully animated sprite, but could have done so with just one frame. Ex: You have a stick man you want to show walking in a straight line. Solution one would jump frame by frame, creating a mild chop, to illustrate an animated walk. In solution 2, we take the stick man and chop his dynamic bits apart so we can call them through JavaScript, then build our sprite from JavaScript directly. To create the walking effect, we cut apart the stick man's legs and keep those separate in the sprite sheet from the rest of his body (unless you need to animate another body part as well). We map out where the coordinates are for each piece of the stick man. Free software like DarkFunctionEditor is one of many programs that will take care of generating a reliable sprite sheet for you, printing out the coordinates of your sprite sheet after you bake it. With this knowledge, head into JavaScript and declare the variables that you wish to associate with the pieces of the stick man and their corresponding coordinates. Then use JavaScript to "build" all the pieces together. The walking animation is accomplished by the tween we talked about earlier. Each leg essentially runs on a beautifully fluid path you set in JavaScript. No chop. Very easy to customize and control. If you want to make it even easier for yourself, try using one of the many libraries for sprite animation. My favorite at the moment is CreateJS.
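The tweening idea can be illustrated with plain linear interpolation between two key poses of one sprite piece, such as a leg. This is a bare-bones sketch with hypothetical names and a hypothetical pose shape (`x`, `y`, `rotation`); libraries such as CreateJS offer richer tweening with easing, which is not shown here.

```javascript
// Linear interpolation between two numbers, t in 0..1.
function lerp(start, end, t) {
  return start + (end - start) * t;
}

// Interpolates every property of a pose; t is clamped to 0..1.
function tweenPose(from, to, t) {
  const clamped = Math.min(1, Math.max(0, t));
  return {
    x: lerp(from.x, to.x, clamped),
    y: lerp(from.y, to.y, clamped),
    rotation: lerp(from.rotation, to.rotation, clamped),
  };
}

// Example: a leg swinging from -30 to +30 degrees while moving 10 px forward.
const legBack = { x: 0, y: 0, rotation: -30 };
const legForward = { x: 10, y: 0, rotation: 30 };
const midStride = tweenPose(legBack, legForward, 0.5);
```

Calling `tweenPose` on every animation tick with `t` advancing from 0 to 1 gives the smooth in-between positions that a fixed set of frames cannot.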
If you are looking to include collision detection or create particle systems then you will need a physics engine. For 2D I am torn between 2 at the moment. Right now I would put PhysicsJS over KineticJS. Both are fantastic. I believe PhysicsJS integrates with CocoonJS and other mobile scripts more easily.
My last words of advice are, after reading this, understand you will be working with JavaScript. You will have a bit of jQuery to make it easy, but you will encounter things that are difficult on the way. My HUGE recommendation is to move into learning how to build using NodeJS. It's an asynchronous JavaScript server-side and client-side development space. The documentation is wonderful. Your first stop should be learning about npm and bower. Then understand how to effectively implement Grunt into the workflow. Try out NodeJS assets like Yeoman to give you "boilerplate" Node setups from which to start. After you start understanding NodeJS mechanics and feel comfortable with setting up your initial package.json, you'll find that all this JavaScript will almost feel like it's writing itself after a certain point.
And that's all you need to know to get into 2D and 3D design and development. My initial question could have been answered using say a 3D rendered fullscreen. However my final conclusion came in a different method entirely.
After learning about 2D sprites and framing, then noticing the encoding process of GIFs, I had the idea to try and create PNG sprite animations. Not PNG GIFs, per se, but rather creating a 2D scene and using a PNG sequence that I would then animate via JavaScript. I found a few great libraries on GitHub, both for my idea and cool ideas for GIF manipulation.
My final choice was the GitHub repo "jquery.animateSprite". Instead of mulling through sprite sheets, you take your individual PNGs and this library gives you an incredible amount of control in how you can store variables for later use, but also in the animations you can pull off in general. For a full screen, responsive background that works on any device (and can even be animated to sound....) I'd recommend this technique. It works much like a flip book animation works, except much, much more effectively.
I hope this helps someone along the way. If you have a question on anything I have mentioned here, or know of an area that needs further detail, then by all means please let me know.
-Cheers
Let's say I want a user to write a story in 20 minutes. After the user is done I want to play back the story writing process so I can see how the user went about doing it. How would I do this? I don't want to watch every second of it, obviously, but I'd like to see a snapshot of whenever a "large change" was made. The "large change" should be defined by me. It could be X amount of character additions or subtractions.
I thought of somehow trying to continuously monitor the textbox for changes and then store the text as a string in an array every time there is a "large change". Then to replay this, I will play the string array with a 1 second delay.
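The idea described above could be sketched like this. The class name, the threshold value and the `replay` helper are hypothetical, and a "large change" is defined here purely by the change in text length, which is the simplest reading of "X amount of character additions or subtractions":

```javascript
// Stores a snapshot whenever the text length has changed by at least
// `threshold` characters since the last stored snapshot.
class SnapshotRecorder {
  constructor(threshold) {
    this.threshold = threshold;
    this.snapshots = [""]; // the empty starting state
  }

  // Call on every change of the textbox; returns true if a snapshot was stored.
  update(text) {
    const last = this.snapshots[this.snapshots.length - 1];
    if (Math.abs(text.length - last.length) >= this.threshold) {
      this.snapshots.push(text);
      return true;
    }
    return false;
  }
}

// Plays the snapshots back, one per `delayMs`, by passing each to `render`.
function replay(snapshots, render, delayMs = 1000) {
  snapshots.forEach((text, i) => {
    setTimeout(() => render(text), i * delayMs);
  });
}
```

In a page, `update` would be called from the textbox's `input` event handler. Note that a length-based test misses edits that replace text with something of similar length.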
Anyone think of a better way to do this or know a library that would help?
Thanks!
Let's take for granted that you have the capacity to persist this state and just look at the challenge of detecting and displaying the diffs. The most challenging aspect of your problem is going to be defining and subsequently detecting what you call a "large change". If we set that aside for a moment I think there are two ways you can go about this:
1) Operational transform - (http://en.wikipedia.org/wiki/Operational_transformation)
This is what Google Docs(etherpad) uses to synchronize real-time collaborative edits across multiple browsers. With OT you can practically recreate a video of the changes made to a document. You can see this in action on thinklinkr.com revision history (full disclosure - I am one of the founders).
2) Diff-match-patch - (http://code.google.com/p/google-diff-match-patch/)
This is actually a set of three algorithms that can be used to efficiently create and resolve differences between text documents. I think this might be a better match for you given your requirement about chunking diffs.
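To give a feel for what a diff-based definition of "large change" could measure, here is a deliberately simple stand-in. It is not the diff-match-patch algorithm: it only strips the common prefix and suffix of two versions and reports what lies between, which is enough when each edit touches a single region. The function name is hypothetical.

```javascript
// Returns the text removed from `before` and inserted into `after`,
// assuming the edit is confined to one contiguous region.
function changedSpan(before, after) {
  const maxPrefix = Math.min(before.length, after.length);
  let prefix = 0;
  while (prefix < maxPrefix && before[prefix] === after[prefix]) prefix++;

  // The suffix may not overlap the prefix in either string.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    before[before.length - 1 - suffix] === after[after.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    removed: before.slice(prefix, before.length - suffix),
    inserted: after.slice(prefix, after.length - suffix),
  };
}
```

A snapshot could then be taken when `removed.length + inserted.length` exceeds a chosen limit, which also catches replacements that leave the overall length unchanged.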