Here's the fiddle for it: https://jsfiddle.net/rujbra6y/3/
I'm already logging the speed, so make any changes and just re-run it a couple of times to see whether performance improved at all.
Been working on it for a couple of hours and I can't think of anything else I can change to make it faster. I'd like it to be as fast as possible because currently there is a small delay when the user uses flood fill, and for a proper user experience I'd like that delay to be as short as possible.
Are there any more optimizations or hacks I can use to increase the speed?
There are a couple of things you could do at a brief glance:
Replace the Uint8ClampedArray with a Uint32Array. This will save you from unnecessary shifting and ANDing ops
Replace push/pop with a stack pointer; this way you just update the pointer instead of resizing the array
You could define a typed array (Int16Array) for the stack using a fixed size (be sure to make it large enough)
The only thing you need to be aware of for Uint32Array is that the byte order is little-endian, which means you need to provide the target color in 0xAABBGGRR format (or do an initial bit-shift giving r, g, b as separate values).
With these changes (except the last) the code went down from about 69-75ms to 58-61ms on my computer (i5 at the moment).
Updated fiddle
I'll leave the typed array for stack as an exercise. :-)
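A minimal sketch of the first two suggestions combined (not the actual updated fiddle; the function names and the ABGR word layout are assumptions, and it uses flat pixel indices in an Int32Array rather than the Int16Array x/y pairs mentioned above):

```javascript
// ABGR word for (r, g, b, alpha 255) on little-endian machines:
function rgbToU32(r, g, b) {
  return ((255 << 24) | (b << 16) | (g << 8) | r) >>> 0;
}

// Flood fill over a Uint32Array view of the ImageData buffer, using a
// preallocated stack with a manual stack pointer instead of push/pop.
function floodFill32(pixels, width, height, startX, startY, fillColor) {
  var targetColor = pixels[startY * width + startX];
  if (targetColor === fillColor) return;

  // Fixed-size stack of pixel indices. Each pixel fills at most once and
  // pushes at most 4 neighbours, so 4*w*h + 1 entries can never overflow.
  var stack = new Int32Array(4 * width * height + 1);
  var sp = 0;                                   // stack pointer
  stack[sp++] = startY * width + startX;

  while (sp > 0) {
    var idx = stack[--sp];
    if (pixels[idx] !== targetColor) continue;  // already filled or a border
    pixels[idx] = fillColor;

    var x = idx % width;
    if (x > 0)                      stack[sp++] = idx - 1;     // left
    if (x < width - 1)              stack[sp++] = idx + 1;     // right
    if (idx >= width)               stack[sp++] = idx - width; // up
    if (idx < (height - 1) * width) stack[sp++] = idx + width; // down
  }
}
```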
I've been working on a custom video codec for use on the web. The custom codec will be powered by JavaScript and the HTML5 canvas element.
There are several reasons for me wanting to do this that I will list at the bottom of this question, but first I want to explain what I have done so far and why I am looking for a fast DCT transform.
The main idea behind all video compression is that frames next to each other share a large number of similarities. So what I'm doing is sending the first frame compressed as a JPEG. Then I send another JPEG image that is 8 times as wide as the first frame, holding the "differences" between the first frame and the next 8 frames after it.
This large JPEG image holding the "differences" is much easier to compress because it contains only the differences.
I've done many experiments with this large jpeg and I found out that when converted to a YCbCr color space the "chroma" channels are almost completely flat, with a few stand out exceptions. In other words there are few parts of the video that change much in the chroma channels, but some of the parts that do change are quite significant.
With this knowledge I looked up how JPEG compression works and saw that among other things it uses the DCT to compress each 8x8 block. This really interested me because I thought what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
This would increase both decoding speed, and improve bit rate transfer because there would be less data to work with.
I thought that this should be a simple task to accomplish, so I tried to build my own "modified" JPEG encoder/decoder. I built the RGB-to-YCbCr converter, I left gzip compression to do the Huffman coding, and now the only main part I have left is the DCT transforms.
However, this has me stuck. I cannot find a fast 8-point 1D DCT transform. I am looking for this specific transform because, according to many articles I've read, the 2D 8x8 DCT transform can be separated into several 1x8 1D transforms. This is the approach many JPEG implementations use because it's faster to process.
So I figured that with Jpeg being such an old well known standard a fast 8 point 1d dct should just jump out at me, but after weeks of searching I have yet to find one.
I have found many algorithms that use the O(N^2) complexity approach. However, that's bewilderingly slow. I have also found algorithms that use the Fast Fourier Transform, and I've modified them to compute the DCT, such as the one at this link:
https://www.nayuki.io/page/free-small-fft-in-multiple-languages
In theory this should have the "fast" complexity of O(N log2 N), but when I run it, it takes my i7 computer about 12 seconds to encode/decode the "modified" JPEG.
I don't understand why it's so slow. There are other JavaScript JPEG decoders that can do it much faster, but when I try to look through their source code, I can't pull out which part is doing the DCT/IDCT transforms.
https://github.com/notmasteryet/jpgjs
The only thing I can think of is that maybe the math behind the DCT has already been precomputed and is being stored in a lookup table or something. However, I have looked hard on Google and I can't find anything (that I understand, at least) that talks about this.
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Okay, as for why I want to do this: the main reason is that I want "interactive" video for mobile phones on my website. This cannot be done because of things like iOS loading up its "native" QuickTime player every time it starts playing a video. Also, it's hard to make the transition to another point in the video seem "smooth" when you have so little control over how videos are rendered, especially on mobile devices.
Thank you again very much for any help that anyone can provide!
So my question is where can I find/how can I build a fast way to compute an 8 point 1d dct transform for this "modified" jpeg encoder/decoder. Any help with this would be greatly appreciated.
Take a look into the Flash world and the JPEG encoder there (before it was integrated into the engine).
Here, for example: http://www.bytearray.org/?p=1089 (sources provided). This code contains a function called fDCTQuant() that does the DCT, first for the rows, then for the columns, and then quantizes the block (so basically there you have your 8x1 DCT).
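For reference, the core of such an 8x1 pass can be sketched with a precomputed basis table. This is a hedged baseline, not the fDCTQuant code itself: it skips the clever factorizations (AAN, Loeffler) real encoders use, but it already avoids calling Math.cos per block, which is likely where a naive implementation loses its time. A 2D 8x8 DCT is then 8 row passes followed by 8 column passes.

```javascript
// Precompute the orthonormal 8-point DCT-II basis once; the per-row transform
// is then just 64 multiply-adds with no trigonometry in the hot loop.
var DCT_TABLE = (function () {
  var t = new Float64Array(64);
  for (var k = 0; k < 8; k++) {
    var scale = (k === 0 ? Math.sqrt(1 / 8) : Math.sqrt(2 / 8));
    for (var n = 0; n < 8; n++) {
      t[k * 8 + n] = scale * Math.cos(Math.PI * (2 * n + 1) * k / 16);
    }
  }
  return t;
})();

// 8-point 1D DCT-II of src (length 8) written into dst (length 8).
function dct8(src, dst) {
  for (var k = 0; k < 8; k++) {
    var sum = 0;
    for (var n = 0; n < 8; n++) sum += DCT_TABLE[k * 8 + n] * src[n];
    dst[k] = sum;
  }
}
```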
So what I'm doing is I send the first frame compressed as a jpg. Then I send another Jpeg image ...
Take a look at progressive JPEG. I think some aspects of how it works, and how its data stream is built, will sound familiar given this description (not the same, but they both go in related directions, IMO).
what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.
The expressions "similar" and "close enough" got my attention here. Take a look at the commonly used quantization tables: a change of a coefficient by just 1 can easily result in a brightness change of 15% at that point (for chroma channels usually even more), depending on its position in the 8x8 block and therefore the applied quantizer.
A calculation with quantizer 40 (40 may be included in the table even at the lowest compression rates; at lower quality settings some quantizers go up to 100 and beyond): changing the input by 1 changes the output by 40. Since we are working with a 1-byte value range, that is a change of 40/255, about 15% of the total possible range.
So you should be really thoughtful about what you call "close enough".
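The arithmetic above in runnable form (the quantizer value and function name are just illustrative):

```javascript
// Fraction of the full 0..255 range that a post-quantization coefficient
// change of coeffDelta represents, given the table's quantizer at that slot.
function dequantError(quantizer, coeffDelta) {
  return (quantizer * coeffDelta) / 255;
}
```

So with a quantizer of 40, two blocks whose coefficients differ by only 1 after quantization already decode roughly 15% of the byte range apart.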
To sum this up: a video codec based on JPEG that utilizes differences between the frames to reduce the amount of data. That also sounded kind of familiar to me.
Got it: MPEG https://github.com/phoboslab/jsmpeg
*No connection to the referenced code or its author.
I implemented separable integer 2D DCTs of various sizes (as well as other transforms) here: https://github.com/flanglet/kanzi/tree/master/java/src/kanzi/transform. The code is in Java but really for this kind of algorithm, it is pretty much the same in any language. The most interesting part IMO is the rescaling you do after computing each direction. Depending on your goals (max precision, 16 bit computation, no scaling, ...), you may want to change the scaling factors for each step. Using bigger blocks in areas where the image is very uniform saves bits.
This book shows how the DCT matrix can be factored to Gaussian Normal Form. That would be the fastest way to do a DCT.
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=pd_sim_14_1?ie=UTF8&dpID=41XJBED6RCL&dpSrc=sims&preST=_AC_UL160_SR127%2C160_&refRID=1Q0G2H5EFYCQW2TJCCJN
Context :
I'm working on a pretty simple THREE.JS project, and it is, I believe, optimized in a pretty good way.
I'm using a WebGLRenderer to display lots of Bode plots extracted from an audio signal every 50 ms. This is pretty cool, but obviously the more Bodes I display, the laggier it gets. In addition, the Bodes move at a constant speed, leaving new ones some space to be displayed.
I'm now at a point where I've implemented every "basic" optimization I found on the Internet, and I manage a constant 30 fps with about 10,000,000 lines displayed, on quite a weak computer (nVidia GT 210 and Core i3 2100...).
Note also that I'm not using any lights or reflections... only basic lines =)
As it is a work project, I'm not allowed to show any screenshots/code, sorry...
Current implementation :
I'm using an array to store all my Bodes, which are each displayed via a THREE.Line.
FYI, actually 2000 THREE.Line are used.
When a Bode has been displayed and moved for 40s, it is then deleted and the THREE.Line is re-used with another one. Note that to move these, I'm modifying THREE.Line.position property.
Note also that I already disabled my scene and object matrix autoUpdate, as I'm doing it manually. (Thx for pointing that Volune).
My Question :
Does modifying THREE.Line.position induce heavy calculations that the renderer has already done? Or is three.js aware that my object did not change, so it avoids these?
In other words, I'd like to know whether rendering/updating the same object after merely translating it is heavier in the rendering process than just leaving it alone, without updating its matrix, etc.
Is there any sort of low-level optimization in three.js for rendering the same objects many times? Is this optimization cancelled when I move my object?
If so, I have in mind another way to do this: using only two big Meshes which follow each other, but this involves merging/deleting parts of their geometries each frame... Might that be better?
Thanks in advance.
I found in the sources (here and here) that the meshes matrices are updated each frame no matter the position changed or not.
This means that the position modification does not itself induce heavy calculation. It also means that a lot of matrices are updated and a lot of uniforms are sent to the GPU each frame.
I would suggest trying your idea with one or two big meshes. This should reduce the JavaScript computations internal to THREE.js, and the only big communications with the GPU will be for the big buffers.
Also note that there exists a WebGL function, bufferSubData (MSDN documentation), to update parts of a buffer, but it does not seem usable in THREE.js yet.
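The partial-update idea behind bufferSubData can be sketched in plain typed arrays, independent of three.js: keep one big position buffer for all the lines and rewrite only the slice belonging to the Bode that moved, returning the changed range so the caller can upload just those floats (with gl.bufferSubData in raw WebGL, or by flagging the attribute dirty in a three.js BufferGeometry). The names and the per-line vertex count here are assumptions for illustration:

```javascript
var VERTS_PER_BODE = 4;            // assumed vertex count per line, for the sketch

// One big interleaved buffer: x, y, z per vertex for every Bode.
function makePositionBuffer(bodeCount) {
  return new Float32Array(bodeCount * VERTS_PER_BODE * 3);
}

// Shift one Bode along x by dx, touching only its slice of the buffer.
function translateBode(positions, bodeIndex, dx) {
  var start = bodeIndex * VERTS_PER_BODE * 3;
  for (var i = 0; i < VERTS_PER_BODE; i++) {
    positions[start + i * 3] += dx;  // x component only
  }
  // Changed range, so the caller can upload just these floats.
  return { offset: start, count: VERTS_PER_BODE * 3 };
}
```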
This may be a stupid/previously answered question, but it is something that has been stumping me and my friends for a little while, and I have been unable to find a good answer.
Right now, i make all my JS Canvas games run in ticks. For example:
function tick(){
    //calculate character position
    //clear canvas
    //draw sprites to canvas
    if(gameOn == true)
        t = setTimeout(tick(), timeout);
}
This works fine for CPU-cheap games on high-end systems, but when I try to draw a little more every tick, it starts to run in slow motion. So my question is: how can I keep the x,y position and hit-detection calculations going at full speed while allowing a variable framerate?
Side note: I have tried the requestAnimationFrame API, but to be honest it was a little confusing (there aren't all that many good tutorials on it) and, while it might speed up your processing, it doesn't entirely fix the problem.
Thanks guys -- any help is appreciated.
RequestAnimationFrame makes a big difference. It's probably the solution to your problem. There are two more things you could do: set up a second tick system which handles the model side of it, e.g. hit detection. A good example of this is how PhysiJS does it. It goes one step further and uses a feature of some newer browsers called a web worker, which allows you to utilise a second CPU core. John Resig has a good tutorial. But be warned: it's complicated and not very well supported (and hence buggy; it tends to crash a lot).
Really, requestAnimationFrame is very simple: it's just a couple of lines which, once set up, you can forget about. It shouldn't change any of your existing code. It is a bit of a challenge to understand what the code does, but you can pretty much cut-and-replace your setTimeout code with the examples out there. If you ask me, setTimeout is just as complicated! They do virtually the same thing, except setTimeout has a delay time, whereas requestAnimationFrame doesn't - it just calls your function when it's ready, rather than after a set period of time.
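The split suggested above (a fixed-rate model tick plus render-when-ready) can be sketched like this; the names, the 16 ms step, and the accumulator shape are my own choices for illustration, not anything from a specific engine. The rAF wiring is guarded so the logic also runs headless:

```javascript
var STEP_MS = 16;                  // fixed simulation step

// Run hit detection etc. in fixed steps, however long the last frame took.
function advance(state, elapsedMs) {
  state.acc += elapsedMs;
  var steps = 0;
  while (state.acc >= STEP_MS) {
    state.acc -= STEP_MS;
    state.ticks++;                 // one model update (positions, collisions)
    steps++;
  }
  return steps;                    // model updates caused by this frame
}

var game = { acc: 0, ticks: 0, last: 0 };

if (typeof requestAnimationFrame === "function") {
  var frame = function (now) {
    advance(game, now - (game.last || now));
    game.last = now;
    // clear canvas and draw sprites here, once per displayed frame
    requestAnimationFrame(frame);
  };
  requestAnimationFrame(frame);
}
```

This way a slow frame causes several catch-up model ticks rather than slow-motion gameplay.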
You're not actually using the ticks. What's happening is that you are repeatedly calling tick() over and over again. You need to remove the () and just leave setTimeout(tick, timeout); Personally, I like to use arguments.callee to explicitly state that a function calls itself (thereby removing the dependency on knowing the function name).
With that being said, what I like to do when implementing a variable frame rate is to simplify the underlying engine as much as possible. For instance, to make a ball bounce against a wall, I check if the line from the ball's previous position to the next one hits the wall and, if so, when.
That being said, you need to be careful, because some browsers halt all JavaScript execution when a context menu (or any other menu) is opened, so you could end up with a gap of several seconds or even minutes between two "frames". Personally, I think frame-based timing is the way to go in most cases.
As Kolink mentioned, the setTimeout looks like a bug. Assuming it's only a typo and not actually a bug, I'd say it is unlikely that the animation itself (that is, the DOM updates) is really slowing down your code.
How much is a little more? I've animated hundreds of elements on screen at once with good results on IE7 in VMWare on a 1.2GHz Atom netbook (slowest browser I have on the slowest machine I have, the VMWare is because I use Linux).
In my experience, hit detection, if not done properly, causes the most slowdown as the number of elements you're animating increases. That's because a naive implementation is essentially quadratic: it tries to compare every element against every other, roughly n^2 compares. The way around this is to filter the comparisons to avoid unnecessary ones.
One of the most common ways of doing this in game engines (regardless of language) is to segment your world map into a larger set of grids. Then you only do hit detection of items in the same grid (and adjacent grids if you want to be more accurate). This greatly reduces the number of comparisons you need to make especially if you have lots of characters.
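A minimal sketch of that grid idea (the names and the 64-pixel cell size are assumptions, and checking adjacent cells is left out for brevity):

```javascript
var CELL = 64;                     // assumed cell size in pixels

// Bucket each object into its grid cell, keyed by "cx,cy".
function buildGrid(objects) {
  var grid = {};
  for (var i = 0; i < objects.length; i++) {
    var o = objects[i];
    var key = Math.floor(o.x / CELL) + "," + Math.floor(o.y / CELL);
    (grid[key] || (grid[key] = [])).push(o);
  }
  return grid;
}

// Collect candidate pairs for the narrow-phase hit test: only objects
// sharing a cell are ever compared against each other.
function candidatePairs(grid) {
  var pairs = [];
  for (var key in grid) {
    var cell = grid[key];
    for (var a = 0; a < cell.length; a++)
      for (var b = a + 1; b < cell.length; b++)
        pairs.push([cell[a], cell[b]]);
  }
  return pairs;
}
```

With n objects spread over many cells, the pair count drops from ~n^2 to roughly n times the average cell population.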
I wrote a small jQuery plugin that basically converts all words in an HTML element to spans, makes them invisible and then animates them into view. I made it so that you can define the time it's supposed to take to load the whole element, and based on tests the math seems right, but in practice it takes quite a bit longer.
See jsfiddle: http://jsfiddle.net/A2DNN/
Note the variables "per" and "ms", this basically tells it to process "per" number of words every "ms" milliseconds.
In the log you'll see it processes 1 word every 1 ms, which should result in a MUCH faster loading time.
So I'm just wondering: is it possible that the CPU is the bottleneck here? JS is fading the items into view, which is handled by the CPU, and the CPU isn't very fast at graphical processing.
It sounds almost silly to ask, I'd expect these days a CPU would laugh at a small bit of work load like this.
It is due to a minimum timeout enforced by the browser's JavaScript implementation. You can't have a 1 ms timeout; the HTML spec allows browsers to clamp nested timeouts to around 4 ms, so each tick takes noticeably longer than your math assumes. There has already been a discussion about that here.
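One way to work around the clamp rather than fight it: keep the delay at the minimum the browser will actually honour and compute how many words each tick must process to still finish in the target time. The 4 ms constant and the function name are assumptions for illustration:

```javascript
var MIN_TIMEOUT_MS = 4;            // typical clamp for nested setTimeout

// How many words each tick must handle to finish totalWords in targetMs,
// given that each tick really takes at least MIN_TIMEOUT_MS.
function wordsPerTick(totalWords, targetMs) {
  var ticks = Math.max(1, Math.floor(targetMs / MIN_TIMEOUT_MS));
  return Math.ceil(totalWords / ticks);
}
```

So instead of "1 word every 1 ms", you would schedule, say, wordsPerTick(1000, 2000) words every 4 ms and land close to the intended 2-second total.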
I am experimenting with jQuery and the animate() functionality. I don't believe the work is a final piece; however, I have a problem that I can't seem to figure out on my own or by trawling search engines.
I've created some random animated blocks with a color array etc., and everything is working as intended, including the creation and deletion of the blocks (divs). My issue is that within 2 minutes of running the page, Firefox 4 is already at more than 500,000 K according to my task manager. IE9 & Chrome show very little noticeable impact, yet their processes still continue to grow.
Feel free to check out the link here: http://truimage.biz/wip300/project%202/
My best guess is that the divs are being created faster than they are removed (after 2000 ms), but I was hoping an expert might either have a solution or could explain what I am doing wrong, with some suggestions.
On another note, from the start of my typing this out till now, the process is at 2,500,000 K. Insane!
It could be a lot of things other than just your script. It could be a memory leak in one of the jQuery features you use; it's pretty hard to say.
Something you could try is this though:
Instead of creating new squares, use a "square pool". Let's say you create 20 squares and just keep re-using them instead of creating new ones.
You'd basically just have an array for the pool and take elements out from it when they are displayed, and put them back to it when the animation finishes.
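A DOM-free sketch of that pool (the createSquare callback stands in for whatever builds and styles the actual div in the real page; the names are assumptions):

```javascript
// Fixed-size pool: squares are taken from the free list when displayed
// and returned when their animation finishes, so no new divs are created.
function Pool(size, createSquare) {
  this.free = [];
  for (var i = 0; i < size; i++) this.free.push(createSquare(i));
}

Pool.prototype.acquire = function () {
  return this.free.length ? this.free.pop() : null; // null: all in use
};

Pool.prototype.release = function (square) {
  this.free.push(square);          // instead of removing it from the DOM
};
```

In the animate() complete callback you would call release() instead of .remove(), which keeps the element count (and memory) flat.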