Why is GPU render time inconsistent? - javascript

I'm building a WebGL application using THREE and am noticing some odd timing on the GPU. I don't have repro code available at the moment but I thought I'd ask the question in case it's a known browser quirk or something common and fixable.
Scene Setup
Scene with ~2,000,000 polygons, 136 Meshes, and 568 Object3D instances.
Using THREE.EffectComposer with the FXAA and Unreal Bloom passes.
Using THREE.OrbitControls.
The scene is only rendered when something is known to have changed. For example, a draw is scheduled when the user drags the scene to move the camera with the controls or something in the scene moves. The scene is often static so we try not to render unnecessarily in those cases.
The Problem
The issue happens when the scene has been static (not drawn for a bit) and then the user changes the camera position by dragging. Once the user starts dragging the framerate is very choppy -- maybe 10-20 fps or lower -- for several frames before smoothing back out to something closer to 60. This happens consistently when leaving the scene alone for several seconds and then dragging again. If the mouse is dragged consistently after the initial stutter then the framerate stays smooth. Nothing different is being rendered for these frames.
This stuttering doesn't happen and the scene remains snappy if it's rendered every frame using requestAnimationFrame.
Here's the performance profiler with the stutter when the scene is only being rendered when something changes. You can see that there is a lot more time spent on the GPU during the frames that stutter before smoothing out again:
And the profiler when the scene is rendered at 60 fps:
Any thoughts? Why is there so much more GPU work happening suddenly on drag? Could the draw be blocked by some other rendering process? Why would it happen so consistently after not rendering for a few seconds? I've profiled using the latest version of Chrome but the stutter is present in Firefox, as well.
Thank you!

Without a live sample there is no easy way to know, but...
1 Three.js can do frustum culling on objects.
That means if some objects are outside of the view they won't get drawn. So, positioning the camera so that all objects are visible will run slower than positioning it so that only some objects are visible.
2 Primitive clipping
Same as above, except at the GPU level. The GPU clips primitives (it doesn't draw or compute pixels outside the view), so, similar to above, if lots of the things you're trying to draw happen to be outside the view it will run faster than if everything is inside the view.
3 Depth (Z) buffer rejection
Similar to above again: if your objects are opaque, then when a pixel is behind an existing pixel (via the depth test) the GPU will skip calling the pixel shader if it can. This means if you draw 568 things and the first one you draw is the closest thing to the camera and covers up many things behind it, it will run faster than if all the things behind it are drawn first. Three.js has the option to sort before drawing. Usually sorting is turned on for transparency, since transparent objects need to be drawn back to front. For opaque objects, though, drawing front to back will be faster if any front objects occlude objects further back (see the sketch below).
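As a rough illustration (not from the original post; scene and camera are whatever Three.js objects you already have), you could bias opaque meshes toward front-to-back drawing by assigning renderOrder from camera distance, something like:
const cameraPos = new THREE.Vector3();
camera.getWorldPosition(cameraPos);
scene.traverse((obj) => {
  // Only touch opaque meshes (assumes one material per mesh);
  // transparent ones still need back-to-front ordering.
  if (obj.isMesh && !obj.material.transparent) {
    // Lower renderOrder draws earlier, so nearer objects get smaller values.
    obj.renderOrder = obj.getWorldPosition(new THREE.Vector3()).distanceTo(cameraPos);
  }
});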
4 Drawing too many frames?
Another question is how you are queuing your draws. Ideally you queue only a single draw and don't queue any more until that draw has happened.
So
// bad
someElement.addEventListener('mousemove', render);
The code above will try to render for every mouse move even if that's > 60 fps
// bad
someElement.addEventListener('mousemove', () => {
  requestAnimationFrame(render);
});
The code above may queue up lots and lots of requestAnimationFrame callbacks, all of which will execute on the next frame, drawing your scene multiple times in that frame.
// good?
let frameQueued = false;

function requestFrame() {
  if (!frameQueued) {
    frameQueued = true;
    requestAnimationFrame(render);
  }
}

function render(time) {
  frameQueued = false;
  ...
}

someElement.addEventListener('mousemove', () => {
  requestFrame();
});
Or something along those lines, so that you queue at most one render and don't queue any more until that render has completed. The code above is just one example of a way to structure your code so that you don't draw more frames than you need to.

Related

How do I benchmark a WebGL shader?

One can benchmark regular JavaScript functions by counting how many times you can call them in a second. With WebGL, though, functions such as gl.drawArrays are asynchronous, so you can't measure the time the shader takes by benchmarking the API call.
Is there any way to benchmark WebGL functions?
It's very difficult to benchmark a shader because there's a ton of context and they are very GPU specific.
You might be able to tell if one shader is faster than another by using performance.now() before and after drawing a bunch of stuff with that shader (a few thousand to a million draw calls), then stalling the GPU by calling gl.readPixels. That will tell you which is faster. It won't tell you how fast they are, since stalling the GPU includes the start-up and stall time.
Think of a race car. For a dragster you time its acceleration to the finish line. For a race car you time one lap at full speed: you let the car do one lap before timing, then time the second lap, so the car crosses the starting line at full speed and the finish line at full speed. That way you get the car's speed, whereas for the dragster you get its acceleration (which is mostly irrelevant for GPUs, since if you're going for speed you should never be starting and stopping them).
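A minimal sketch of that compare-two-shaders approach (gl is an existing WebGL context; drawLotsOfStuff is a placeholder for your own code that issues many draw calls with the given program):
function timeProgram(gl, program, drawLotsOfStuff) {
  const start = performance.now();
  drawLotsOfStuff(program);  // e.g. thousands of draw calls using this shader
  // readPixels forces the GPU to finish all queued work (a stall),
  // so the result includes start-up and stall overhead.
  gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array(4));
  return performance.now() - start;
}
// Compare timeProgram(...) for shader A vs shader B: good for telling
// which is faster, not for measuring absolute speed.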
Another way to time without including the start/stop overhead is to draw a bunch of stuff between requestAnimationFrame frames. Keep increasing the amount until the time between frames jumps up by a whole frame, then compare the amounts between shaders.
There are other issues in actual usage, though. For example, a tiled GPU (like PowerVR on many mobile devices) attempts to cull parts of primitives that would be overdrawn, so a heavy shader with lots of overdraw that is slow on a non-tiled GPU might be plenty fast on a tiled one.
Also make sure you're timing the right thing. If you're timing a vertex shader you probably want to make your canvas 1x1 pixel, keep your fragment shader as simple as possible, and pass a lot of vertices in one draw call (to remove the call overhead). If you're timing a fragment shader then you probably want a large canvas and a set of vertices that contains several full-canvas quads.
Also see WebGL/OpenGL: comparing the performance
There's no way to get exact shader execution time without vendor-specific GPU tools. However, in addition to gman's suggestion, there is the EXT_disjoint_timer_query extension, which allows you to measure the execution time of your draw call, which in turn depends significantly on shader execution time, especially when your shaders are heavy (and thus take up the majority of the time the GPU spends executing your draw calls).
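For reference, a rough sketch of the timer-query approach on a WebGL2 context (availability of this extension varies by browser and GPU, so treat it as illustrative; drawTheThingYouWantToTime is a placeholder):
const ext = gl.getExtension('EXT_disjoint_timer_query_webgl2');
const query = gl.createQuery();
gl.beginQuery(ext.TIME_ELAPSED_EXT, query);
drawTheThingYouWantToTime();              // your draw calls go here
gl.endQuery(ext.TIME_ELAPSED_EXT);
function poll() {
  const available = gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE);
  const disjoint = gl.getParameter(ext.GPU_DISJOINT_EXT);
  if (available && !disjoint) {
    const ns = gl.getQueryParameter(query, gl.QUERY_RESULT);  // nanoseconds
    console.log('GPU time:', ns / 1e6, 'ms');
  } else if (!available) {
    requestAnimationFrame(poll);          // result isn't ready yet, keep polling
  }
}
requestAnimationFrame(poll);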

JavaScript canvas game development

I've been having a really baffling slowdown issue with the game I am working on, probably because I am unsure how to handle graphics (most likely responsible for the slowdown) in JavaScript without using a third-party framework like Phaser/ImpactJS/EaselJS etc.* The following is the lowdown on how I am approaching my graphics. I would be very thankful for some tips or methods on how to do this right.
My game is tile based - using tiles designed at 500x500 px because I want them to display decently on high definition devices.
I am using a spritesheet to load all (most of) my tiles before the main loop is run. This image is roughly 4000 x 4000 (keeping it below 4096 because many GPUs can't handle texture sizes larger than that).
I then use the drawImage function to cycle through and draw each tile on a part of the canvas using information (w, h, x, y) stored in the tile array. I do this on every cycle of the main loop using my drawMap function.
The map is currently 6x6 tiles in size
A character spritesheet is also loaded and drawn on to the canvas after the map has been drawn. The character displays a different frame of the animation on every cycle of the main loop. There are sets of animations for the character each contained in the same spritesheet.
The character sprite sheet is roughly 4000x3500
The character is roughly 350x250 px
Other objects also use the same sprite sheet. Currently there is only one object.
Possibly helpful questions:
Am I using too many spritesheets or too few?
Should I only draw something if its coordinates are in bounds of the screen?
How should I go about garbage collection? Do I need to set image objects to null when no longer in use?
Thanks in advance for input. I would just like to know if I am going about it the right way and pick your brains as how to speed it up enough.
*Note that I plan to port the JS game to cocoonJS which provides graphics acceleration for the canvas element on mobile.
You have asked lots of questions here, I'll address the ones I've run into.
I would like to start out by saying very clearly,
Use a profiler
Find out whether each thing you are advised to do, by anybody, is making an improvement. Unless we work on your code, we can only give you theories on how to optimise it.
How should I go about garbage collection? Do I need to set image objects to null when no longer in use?
If you are no longer using an object, setting its reference to null will probably let it be garbage collected. Having nulls around is not necessarily good, but that is outside the scope of this question.
For high performance applications, you want to avoid too much allocation and therefore too much garbage collection activity. See what your profiler says - the chrome profiler can tell you how much CPU time the garbage collector is taking up. You might be OK at the moment.
I then use the drawImage function to cycle through and draw each tile on a part of the canvas using information (w, h, x, y) stored in the tile array. I do this on every cycle of the main loop using my drawMap function.
This is quite slow - instead, consider drawing the current on-screen tiles to a background canvas, and then only drawing the areas that were previously obscured.
For example, if your player walks to the left, there are going to be a lot of tiles on the left-hand side of the screen that have come into view; you will need to draw the background buffer onto the screen, offset to account for the movement, and then draw the missing tiles.
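As a sketch of that idea (names like bufferCanvas, screenCtx, and drawTilesInRect are placeholders, not from the question):
const bufferCanvas = document.createElement('canvas');
bufferCanvas.width = screenCanvas.width;
bufferCanvas.height = screenCanvas.height;
const bufferCtx = bufferCanvas.getContext('2d');

function scrollBy(dx) {  // dx > 0 when the player walks left
  // Re-use what was already drawn: copy last frame's buffer onto the screen, shifted.
  screenCtx.drawImage(bufferCanvas, dx, 0);
  // Only draw the newly exposed strip of tiles on the left edge.
  drawTilesInRect(screenCtx, 0, 0, dx, screenCanvas.height);
  // Keep the buffer in sync for the next scroll step.
  bufferCtx.drawImage(screenCanvas, 0, 0);
}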
My game is tile based - using tiles designed at 500x500 px because I want them to display decently on high definition devices
If I interpret this right, your tiles are 500x500 px, you are drawing a small number of these on screen, and then, for devices without such a high resolution, the canvas renderer is going to scale them down. You really want to be drawing pixels 1:1 on each device.
Would you be able, instead, to have a larger number of smaller tiles on screen, thereby avoiding the extra drawing at the edges? It's likely that the tiles around the edges will sometimes show only a few pixels of one edge, with the rest of the image cropped anyway, so why not break them up further?
Should I only draw something if its coordinates are in bounds of the screen?
Yes, this is a very common and good optimisation to take. You'll find it makes a big difference.
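A minimal culling check might look like this (the tile fields are assumptions; adapt them to your own data structure):
function drawTileIfVisible(ctx, tile, camX, camY, viewW, viewH) {
  const x = tile.x - camX;
  const y = tile.y - camY;
  if (x + tile.w < 0 || y + tile.h < 0 || x > viewW || y > viewH) {
    return;  // entirely off screen, skip the drawImage call
  }
  ctx.drawImage(tile.image, tile.sx, tile.sy, tile.sw, tile.sh, x, y, tile.w, tile.h);
}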
Am I using too many spritesheets or too few?
I have found that when I have a small number of sprite sheets, the big performance hit comes from frequently switching between them. If during one draw phase you draw all your characters from character_sheet.png and then draw all the plants from plant_sheet.png, you'll be OK. Switching back and forth between them can cause lots of trouble and you'll see a slowdown. You will know this is happening if your profiler tells you that drawImage is taking a big proportion of your frame.

Measure WebGL texture load in ms

How can I measure WebGL texture load in milliseconds?
Right now I have an array of images that will be rendered out as a map using a game loop, and I'm interested in capturing the time it takes WebGL to load every texture image, in milliseconds. I wonder how this can be measured, because JavaScript is not synchronous with WebGL.
The only way to measure any timing in WebGL is to figure out how much work you can do in a certain amount of time. Pick a target speed, say 30fps, use requestAnimationFrame, keep increasing the work until you're over the target.
var targetSpeed = 1 / 30;
var amountOfWork = 1;
var then = 0;
function test(time) {
  time *= 0.001;  // because I like seconds
  var deltaTime = time - then;
  then = time;
  if (deltaTime < targetSpeed) {
    amountOfWork += 1;
  }
  for (var ii = 0; ii < amountOfWork; ++ii) {
    doWork();
  }
  requestAnimationFrame(test);
}
requestAnimationFrame(test);
It's not quite that simple because the browsers, at least in my experience, don't seem to give a really stable timing for frames.
Caveats
Don't assume requestAnimationFrame will be at 60fps.
There are plenty of devices that run faster (VR) or slower (low-end hd-dpi monitors).
Don't measure the time from when you start emitting commands until the time you stop.
Measure the time since the last requestAnimationFrame. WebGL just inserts commands into a buffer. Those commands execute in the driver, possibly even in another process, so:
var start = performance.now();        // WRONG!
gl.someCommand(...);                  // WRONG!
gl.flush(...);                        // WRONG!
var time = performance.now() - start; // WRONG!
Actually use the resource.
Many resources are lazily initialized, so just uploading a resource but not using it will not give you an accurate measurement. You'll need to actually do a draw with each texture you upload. Of course, make it a small 1 pixel, 1 triangle draw with a simple shader. The shader must actually access the resource, otherwise the driver may not do any lazy initialization.
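In code, forcing that initialization might look roughly like this (program and buffer setup omitted; assumes a trivial shader that actually samples the texture, and an existing image and tex):
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
gl.viewport(0, 0, 1, 1);            // 1 pixel
gl.drawArrays(gl.TRIANGLES, 0, 3);  // 1 tiny triangle whose shader reads the texture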
Don't assume different types/sizes of textures will have proportional changes in speed.
Drivers do different things. For example, some GPUs might not support anything but RGBA textures. If you upload a LUMINANCE texture, the driver will expand it to RGBA. So, if you timed using RGBA textures and assumed a LUMINANCE texture of the same dimensions would upload 4x as fast, you'd be wrong.
Similarly, don't assume different-size textures will upload at a speed proportional to their sizes. Internal buffers in drivers and other limits mean that different sizes might take different paths.
In other words, you can't assume a 1024x1024 texture will upload 4x as slow as a 512x512 texture.
Be aware even this won't promise real-world results.
By this I mean, for example, if you're on tiled hardware (an iPhone, for example) then the way the GPU works is to gather all of the drawing commands, separate them into tiles, cull any draws that are invisible, and only draw what's left, whereas most desktop GPUs draw every pixel of every triangle. Because a tiled GPU does everything at the end, if you keep uploading data to the same texture and draw between each upload, it will have to keep copies of all your textures until it draws. Internally there might be some point at which it flushes and draws what it has before buffering again.
Even a desktop driver wants to pipeline uploads: you upload contents to texture B, draw, upload new contents to texture B, draw. If the driver is in the middle of doing the first drawing it doesn't want to wait for the GPU before it can replace the contents. Rather, it just wants to upload the new contents somewhere else not currently being used and then, when it can, point the texture at the new contents.
In normal use this isn't a problem, because almost no one uploads tons of textures all the time. At most they upload 1 or 2 video frames or 1 or 2 procedurally generated textures per frame. But when you're benchmarking, you're stressing the driver and making it do things it won't actually be doing normally. In the example above, since the driver might assume a texture is unlikely to be uploaded 10000 times a frame, you'll hit a limit where it has to freeze the pipeline until some of your queued-up textures are drawn. That freeze will make your result appear slower than what you'd really get in normal use cases.
The point being, you might benchmark and be told it takes 5ms to upload a texture when in truth it only takes 3ms; you just stalled the pipeline many times, which outside your benchmark is unlikely to happen.

Why is my simple webgl demo so slow

I've been trying to learn Web GL using these awesome tutorials. My goal is to make a very simple 2D game framework to replace the canvas-based jawsJS.
I basically just want to be able to create a bunch of sprites and move them around, and then maybe some tiles later.
I put together a basic demo that does this, but I hit a performance problem that I can't track down. Once I get to ~2000 or so sprites on screen, the frame rate tanks and I can't work out why. Compared to this demo of the pixi.js webgl framework, which starts losing frames at about ~30000 bunnies or so (on my machine), I'm a bit disappointed.
My demo (framework source) has 5002 sprites, two of which are moving, and the frame rate is in the toilet.
I've tried working through the pixi.js framework to try to work out what they do differently, but it's 500kloc and does so much more than mine that I can't work it out.
I found this answer that basically confirmed that what I'm doing is roughly right - my algorithm is pretty much the same as the one in the answer, but there must be more to it.
I have so far tried a few things - using just a single 'frame buffer' with a single shape defined, which then gets translated 5000 times, once for each sprite. This did help the frame rate a little bit, but nothing close to the pixi demo (and it meant that all sprites had to be the same shape!). I cut out all of the matrix maths for anything that doesn't move, so it's not that either. It all seems to come down to the drawArrays() function - it's just going really slowly for me, but only in my demo!
I've also tried removing all of the texture based stuff, replacing the fragment shader with a simple block colour for everything instead. It made virtually no difference so I eliminated dodgy texture handling as a culprit.
I'd really appreciate some help in tracking down what incredibly stupid thing I've done!
Edit: I'm definitely misunderstanding something key here. I stripped the whole thing right back to basics, changing the vertex and fragment shaders to super simple:
attribute vec2 a_position;
void main() {
  gl_Position = vec4(a_position, 0, 1);
}
and:
precision mediump float; // a default float precision is required in WebGL fragment shaders
void main() {
  gl_FragColor = vec4(0, 1, 0, 1); // green
}
then set the sprites up to draw to (0,0), (1,1).
With 5000 sprites, it takes about 5 seconds to draw a single frame. What is going on here?
A look at the frame calls using WebGLInspector or the experimental canvas inspector in Chrome reveals a totally unoptimized rendering loop.
You can and should use one and the same vertex buffer to render all your geometry; this way you save the bindBuffer as well as the vertexAttribPointer calls.
You can also save 99% of your texture binds, as you're repeatedly rebinding one and the same texture. A texture remains bound as long as you do not bind something else to the same texture unit.
Having a state cache is helpful to avoid binding data that is already bound.
Take a look at my answer here about the GPU as a state machine.
Once your rendering loop is optimized you can go ahead and consider the following things:
Use the ANGLE_instanced_arrays extension (see the sketch after this list).
Avoid constructing data in your render loop.
Use an interleaved vertex buffer.
In some cases not using an index buffer also increases performance.
Check if you can shave off a few GPU cycles in your shaders.
Break up your objects into chunks and do view-frustum culling on the CPU side.
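As a rough sketch of the instancing suggestion (WebGL1 flavor; buffer and attribute setup are abbreviated and names like offsetBuffer, offsetLoc, and numSprites are illustrative):
const ext = gl.getExtension('ANGLE_instanced_arrays');
// a_offset is a per-instance attribute holding each sprite's position.
gl.bindBuffer(gl.ARRAY_BUFFER, offsetBuffer);
gl.vertexAttribPointer(offsetLoc, 2, gl.FLOAT, false, 0, 0);
gl.enableVertexAttribArray(offsetLoc);
ext.vertexAttribDivisorANGLE(offsetLoc, 1);  // advance once per instance, not per vertex
// One call draws every sprite: 6 vertices per quad, numSprites instances.
ext.drawArraysInstancedANGLE(gl.TRIANGLES, 0, 6, numSprites);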
The problem is probably this line in render: glixl.context.uniformMatrix3fv(glixl.matrix, false, this.matrix);.
In my experience, passing uniforms for each model is very slow in WebGL, and I was unable to maintain 60 FPS after ~1,000 unique models. Unfortunately there are no uniform buffers in WebGL 1 to alleviate this problem.
I solved my problem by just calculating all the vertex positions on the CPU and drawing them all using one drawArrays call. This should work if the vertex count isn't overwhelming. I can draw 2k moving + rotating cubes at 60 FPS. I don't recall exactly how many cubes you can draw at 60 FPS, but it is quite a bit higher than 2k. If that isn't fast enough then you have to look into drawArraysInstanced. Basically, store all the matrices in an array buffer and draw all your models using one drawArraysInstanced call with the correct offsets and such.
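A sketch of that CPU-side batching (names like sprites, vertBuffer, and positionLoc are placeholders, not from the answer): each frame, write every sprite's transformed vertices into one big Float32Array and issue a single draw call.
const verts = new Float32Array(sprites.length * 12);  // 2 triangles * 3 verts * 2 floats per sprite
let o = 0;
for (const s of sprites) {
  const x0 = s.x, y0 = s.y, x1 = s.x + s.w, y1 = s.y + s.h;
  verts.set([x0, y0, x1, y0, x0, y1,  x0, y1, x1, y0, x1, y1], o);
  o += 12;
}
gl.bindBuffer(gl.ARRAY_BUFFER, vertBuffer);
gl.bufferData(gl.ARRAY_BUFFER, verts, gl.DYNAMIC_DRAW);
gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, 0, 0);
gl.enableVertexAttribArray(positionLoc);
gl.drawArrays(gl.TRIANGLES, 0, sprites.length * 6);   // one call for everything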
EDIT: also to the OP, if you want to see how PIXI does the vertex update rendering (NOT uniform instancing), see https://github.com/GoodBoyDigital/pixi.js/blob/master/src/pixi/renderers/webgl/utils/WebGLFastSpriteBatch.js.

Redraw lots of objects on Canvas HTML

Is there a quick and efficient way to move lots of objects on a canvas? Basically, if there are around 1000 objects and I want to move all of them at once to emulate scrolling, it is very slow to redraw every single object by calling drawImage() 1000+ times.
Is there anyway to optimize this? I have an example link of the problem (and that's only with 100 objects): http://craftyjs.com/isometric/
Since canvas doesn't provide fast low-level bitmap copying, it's hard to do stuff in multiple layers and, for example, scroll the whole background at once and then only render the edges.
So what can you do? In short, nothing. Especially not when scrolling; sure, you can do tricks with multiple canvases when you have a more or less static background, but for moving objects there are hardly any performance-improving tricks.
So you've got to wait for hardware acceleration to ship in all major browsers. I know this sounds ridiculous, but I'm waiting for that too :/
The problem is that the canvas was never designed for game stuff. It was designed as, well, basically some kind of on-the-fly drawing thing; I guess the designers had Photoshop clones in mind, but definitely not games. The fact that there's no fast clear operation proves that; there's not even an optimization in place when clearing the whole canvas with the same color.
If the images are already composited, not moving relative to one another, and defined by a rectangular region, then using canvas.drawImage() with a canvas as the first parameter and drawing to a sub-region should be significantly faster than re-drawing all the objects.
You could also just layer multiple canvases and slide the top canvas with the objects in HTML to scroll them.
Edit: Having really looked at your example, it seems to me that it should be implemented similar to Google Maps: create tiles of canvases and slide them left/right on the HTML page; once a canvas has been slid off the screen entirely (for example, off the left edge), move it to the other side (to the right edge) and re-use it for drawing. With this you will only need to re-draw whatever objects overlap the canvases that are moving on the edges.
You can draw all objects on a second, off-screen canvas and then only blit the whole canvas (drawImage() accepts a canvas element).
However, if you're targeting desktop browsers, this shouldn't be necessary. I've implemented a tile engine (source) that simply redraws the whole scene, and the naive implementation turned out to be pretty fast.
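Roughly (names like viewCanvas, viewCtx, staticObjects, and scrollX/scrollY are placeholders): draw the static objects once into a buffer canvas, then blit the whole buffer each frame instead of issuing 1000+ drawImage calls.
const buffer = document.createElement('canvas');
buffer.width = viewCanvas.width;
buffer.height = viewCanvas.height;
const bctx = buffer.getContext('2d');
for (const obj of staticObjects) {
  bctx.drawImage(obj.image, obj.x, obj.y);  // composite once, up front
}
// Per frame: a single drawImage of the whole buffer, offset by the scroll amount.
viewCtx.drawImage(buffer, -scrollX, -scrollY);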
What I did to solve this problem: I had 10 squares on my screen and I wanted to animate them on a white background, so I drew a white rectangle over the canvas to clear it before each frame so the animation would work. Does that make sense?
@Ivo By the way, I read at http://www.w3.org/TR/html5/the-canvas-element.html that canvas was made for applications like games, because it removes the dependency on an external engine. Canvas is built in, so it's kind of like a Flash player built into your browser, powered by JavaScript. I think it's fascinating.
You can use tiled rendering.
http://www.gamesfrommars.fr/demojsv2/ (better viewed with Chrome)
