I'm considering using Web Workers for batch image processing and am wondering what to expect in terms of performance gains.
My current strategy is to process each image in sequence and only start a new process after the current process is over. If I have 10 images that take 10 seconds each to process, the batch will complete in ~100 seconds.
If I use 10 Web Workers at once, I doubt I will complete the entire job in 10 seconds. But will it be lower than 100 seconds? If not, is there an optimal size to a pool of concurrently running Web Workers?
I would imagine that your performance gains will depend heavily on the number of cores you have in your computer. If I were to guess, I'd say a good sweet spot might be four web workers (corresponding to quad-core machines), but the only way to know for sure is to try it out.
Structure your code in such a way that you can simply change a constant to change the number of workers, and then set it to a value that seems to work optimally.
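For illustration, here's a minimal sketch of that idea (the worker file name 'process-image.js', the images array, and the message shapes are assumptions, not part of your code):

const POOL_SIZE = 4; // the constant to tune
const queue = images.slice(); // assumed: an array of jobs to process
let finished = 0;

function startWorker() {
  const worker = new Worker('process-image.js'); // hypothetical worker script
  const next = () => {
    const image = queue.shift();
    if (!image) { worker.terminate(); return; } // nothing left for this worker
    worker.postMessage(image);
  };
  worker.onmessage = () => {
    finished++;     // one image done; pull the next one from the shared queue
    next();
  };
  next();
}

for (let i = 0; i < POOL_SIZE; i++) startWorker();

Each worker pulls the next image as soon as it finishes its current one, so changing POOL_SIZE is the only knob you need to experiment with.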
You can experiment with this example: Ray-tracing with Web Workers. It certainly seems there is a huge performance gain in using 16 vs. 4 web workers here.
I'm making an online browser game with websockets and a node server and if I have around 20-30 players, the CPU is usually around 2% and RAM at 10-15%. I'm just using a cheap Digital Ocean droplet to host it.
However, every 20-30 minutes it seems, the server CPU usage will spike to 100% for 10 seconds, and then finally crash. Up until that moment, the CPU is usually hovering around 2% and the game is running very smoothly.
I can't tell for the life of me what is triggering this as there are no errors in the logs and nothing in the game that I can see causes it. Just seems to be a random event that brings the server down.
There are also some smaller spikes that don't bring the server down, but soon resolve themselves.
I don't think I'm blocking the event loop anywhere and I don't have any execution paths that seem to be long running. The packets to and from the server are usually two per second per user, so not much bandwidth used at all. And the server is mostly just a relay with little processing of packets other than validation so I'm not sure what code path could be so intensive.
How can I profile this and figure out where to begin investigating what is causing these spikes? I'd like to imagine there's some code path I forgot about that is surprisingly slow under load, or maybe I'm missing a node flag that would resolve it, but I don't know.
I think I might have figured it out.
I'm using mostly websockets for my game and I was running htop and noticed that if someone sends large packets (performing a ton of actions in a short amount of time) then the CPU spikes to 100%. I was wondering why that was when I remembered I was using a binary-packer to reduce bandwidth usage.
I tried changing the parser to JSON instead, so the packets are no longer compressed and packed, and regardless of how large the packets were, the CPU usage stayed at 2% the entire time.
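For reference, a rough sketch of that swap, assuming socket.io with the msgpack binary parser (the question doesn't name the library, so these identifiers are assumptions about my setup):

// Binary-packed packets (the setup that was pegging the CPU):
// const io = require('socket.io')(3000, { parser: require('socket.io-msgpack-parser') });

// Plain JSON packets (socket.io's default parser - no packing/unpacking work per message):
const io = require('socket.io')(3000);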
So I think what was causing the crash was when one player would send a lot of data in a short amount of time and the server would be overwhelmed with having to pack all of it and send it out in time.
This may not be the actual answer but it's at least something that needs to be fixed. Thankfully the game uses very little bandwidth as it is and bandwidth is not the bottleneck so I may just leave it as JSON.
The only problem is that with JSON encoding, users can read the packets in the Chrome developer console's network tab, which I don't like. It makes it a lot easier to figure out how the game works and potentially find cheats/exploits.
I have a problem using PhantomJS with web server module in a multi-threaded way, with concurrent requests.
I am using PhantomJS 2.0 to create highstock graphs on the server-side with Java, as explained here (and the code here).
It works well, and when testing graphs of several sizes, I got results that are pretty consistent, about 0.4 seconds to create a graph.
The code that I linked to was originally published by the highcharts team, and it is also used in their export server at http://export.highcharts.com/. In order to support concurrent requests, it keeps a pool of spawned PhantomJS processes, and basically its model is one phantomjs instance per concurrent request.
I saw that the webserver module supports up to 10 concurrent requests (explained here), so I thought I could tap into that and keep a smaller number of PhantomJS processes in my pool. However, when I tried to utilize more threads, I experienced a linear slowdown, as if PhantomJS were using only one CPU. The slowdown is shown below (for a single PhantomJS instance):
1 client thread, average request time 0.44 seconds.
2 client threads, average request time 0.76 seconds.
4 client threads, average request time 1.5 seconds.
Is this a known limitation of PhantomJS? Is there a way around it?
(question also posted here)
Is this a known limitation of PhantomJS?
Yes, it is an expected limitation, because PhantomJS uses the same WebKit engine for everything, and since JavaScript execution is single-threaded, this effectively means that every request will be handled one after the other (possibly interleaved), but never at the same time. The average overall request time will increase linearly with each additional client.
The documentation says:
There is currently a limit of 10 concurrent requests; any other requests will be queued up.
There is a difference between the notions of concurrent and parallel requests. Concurrent simply means that the tasks finish non-deterministically. It doesn't mean that the instructions that the tasks are made of are executed in parallel on different (virtual) cores.
Is there a way around it?
Other than running your server tasks through child_process, no. The way JavaScript supports multi-threading is by using Web Workers, but a worker is sandboxed and has no access to require and therefore cannot create pages to do stuff.
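For completeness, a rough sketch of the child_process route from inside a PhantomJS controller script ('render-job.js' is a hypothetical script that renders one chart and exits; this is not the Highcharts export server's actual code):

var childProcess = require('child_process');

function renderInChildProcess(configFile, outputFile, done) {
  // each job gets its own phantomjs OS process, so rendering runs in parallel
  var child = childProcess.spawn('phantomjs', ['render-job.js', configFile, outputFile]);
  child.on('exit', function (code) {
    done(code === 0 ? null : new Error('render exited with code ' + code));
  });
}

You still pay the cost of one OS process per parallel render, which is essentially what the Highcharts export server's pool model already does.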
I'm trying to improve the performance of a script when executed in a web worker. It's designed to parse large text files in the browser without crashing. Everything works pretty well, but I notice a severe difference in performance for large files when using a web worker.
So I conducted a simple experiment. I ran the script on the same input twice. The first run executed the script in the main thread of the page (no web workers). Naturally, this causes the page to freeze and become unresponsive. For the second run, I executed the script in a web worker.
Script being executed
Test runner page
For small files in this experiment (< ~100 MB), the performance difference is negligible. However, on large files, parsing takes about 20x longer in the worker thread.
The blue line (main thread) is what I expected: it takes only about 11 seconds to parse the file, and the performance is fairly steady.
The red line is the performance inside the web worker, and it is much more surprising.
The jagged line for the first 30 seconds is normal (the jag is caused by the slight delay in sending the results to the main thread after every chunk of the file is parsed). However, parsing slows down rather abruptly at 30 seconds. (Note that I'm only ever using a single web worker for the job; never more than one worker thread at a time.)
I've confirmed that the delay is not in sending the results to the main thread with postMessage(). The slowdown is in the tight loop of the parser, which is entirely synchronous. For reasons I can't explain, that loop is drastically slowed down and it gets slower with time after 30 seconds.
But this only happens in a web worker. Running the same code in the main thread, as you've seen above, runs very smoothly and quickly.
Why is this happening? What can I do to improve performance? (I don't expect anyone to fully understand all 1,200+ lines of code in that file. If you do, that's awesome, but I get the feeling this is more related to web workers than my code, since it runs fine in the main thread.)
System: I'm running Chrome 35 on Mac OS 10.9.4 with 16 GB memory; quad-core 2.7 GHz Intel Core i7 with 256 KB L2 cache (per core) and L3 Cache of 6 MB. The file chunks are about 10 MB in size.
Update: Just tried it on Firefox 30 and it did not experience the same slowdown in a worker thread (but it was slower than Chrome when run in the main thread). However, trying the same experiment with an even larger file (about 1 GB) yielded significant slowdown after about 35-40 seconds (it seems).
Tyler Ault suggested one possibility on Google+ that turned out to be very helpful.
He speculated that using FileReaderSync in the worker thread (instead of the plain ol' async FileReader) was not providing an opportunity for garbage collection to happen.
Changing the worker thread to use FileReader asynchronously (which intuitively seems like a performance step backwards) accelerated the process back up to just 37 seconds, right where I would expect it to be.
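For anyone curious, the change inside the worker was roughly this (parseChunk and blobChunk are placeholder names, not the parser's real API):

// Before: synchronous read, blocks the worker between chunks
// var text = new FileReaderSync().readAsText(blobChunk);
// parseChunk(text);

// After: asynchronous read, yields back to the worker's event loop
// (which apparently gives the engine room to garbage-collect between chunks)
var reader = new FileReader();
reader.onload = function (e) {
  parseChunk(e.target.result);
};
reader.readAsText(blobChunk);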
I haven't heard back from Tyler yet and I'm not entirely sure I understand why garbage collection would be the culprit, but something about FileReaderSync was drastically slowing down the code.
What hardware are you running on? You may be running into cache-thrashing problems with your CPU. For example, if the CPU cache is 1 MB per core (just an example) and you continually work with data that keeps replacing the cache (cache misses), then you will suffer slowdowns - this is quite common in multithreaded systems. It is common in I/O transfers too. These systems also tend to have some OS overhead for the thread contexts, so if lots of threads are being spawned you may spend more time managing the contexts than the threads spend 'doing work'. I haven't looked at your code yet, so I could be way off - but my guess is a memory issue, just due to what your application is doing. :)
Oh. How to fix: try making the blocks of execution small, single chunks that match the hardware. Minimize the number of threads in use at once - try to keep them at 2-3x the number of cores you have in the hardware (this really depends on what sort of hardware you have). Hope that helps.
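In the browser, one way to pick that number without hard-coding it is navigator.hardwareConcurrency (a sketch; the 2x factor and the fallback of 4 are just the rule of thumb above, not a measured optimum):

const cores = navigator.hardwareConcurrency || 4; // falls back to 4 if unsupported
const maxWorkers = cores * 2;                     // 2-3x the core count, per the advice above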
I'm considering writing a game in JavaScript using WebGL and associated technologies. I would like to make the game as intelligent as possible, so I'm looking into monitoring CPU/memory usage.
For example:
For high CPU usage, scale back the graphics a bit or offload computations to the server
For high memory usage, offload data to the server for storage (and later retrieval)
I would like to get the data that Chrome offers in its Task Manager. I know how to track FPS, and that can lead to some flexibility, but I would like to have as much information as possible. The main use case is a 'low power' mode where the CPU is utilized as little as possible (for laptops), or an idle mode for when the user is browsing forums, etc.
I know how to use profilers, but I would like access to these tools from JavaScript.
Is this possible? If not, do you know if it has been proposed for standardization?
I would be willing to live with an extension, as long as it could be queried from JavaScript, but I'd like to avoid it if a native feature exists. I'm trying to target recent versions of Firefox and Chrome, but I could restrict myself to a single browser if one supports this.
Well, there is no direct JavaScript call to get such information (which would be a serious security problem). But there is a workaround for your problem: you can use worker pools, which are essentially threads for JavaScript, to run some calculations in the background and estimate the CPU usage from them.
But since you're building a 3D application, I would not advise doing this, because it would unnecessarily cost you a lot of CPU just to know the current level of CPU usage, which would be like killing a fly with a submachine gun.
What I advise you to do instead is to focus only on frames per second, because they relate directly to your application and are the exact benchmarking indication you need. Don't worry about CPU load; your application doesn't directly depend on it, especially if you have a dual-core or quad-core processor. You should perhaps also look at GPU usage for your application and how you can take full advantage of the GPU on a compatible browser (recent versions of Chrome use GPU acceleration).
Good luck with your game!
We can't retrieve CPU usage or RAM from client-side JavaScript, but what matters is the refresh rate: the actual number of frames rendered per second.
If the FPS is over 24 and steady, we simply feel no lag. It's even safer over 30 FPS, to keep a margin. That leaves about 40 ms for a frame refresh.
Simply put, the following code measures the time of a frame refresh using requestAnimationFrame, converts it to frames per second, and POSTs it to the server as JSON at the endpoint /usermetrics, using navigator.sendBeacon():
let t = Date.now();
requestAnimationFrame(() => {
  let fps = Math.round(1000 / (Date.now() - t));
  console.log(fps + " FPS");
  navigator.sendBeacon('/usermetrics', JSON.stringify(fps));
});
From the console we can observe the POST beacon.
You might have to use it strategically, depending on the context of your app; basically, reduce the load if the FPS drops below 30.
The Performance API
Another example, looping FPS counter
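A minimal sketch of such a looping counter, built on requestAnimationFrame (it counts frames over one-second windows):

let frames = 0;
let last = performance.now();

function loop(now) {
  frames++;
  if (now - last >= 1000) {   // one second elapsed: report and reset
    console.log(frames + ' FPS');
    frames = 0;
    last = now;
  }
  requestAnimationFrame(loop);
}
requestAnimationFrame(loop);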
I read somewhere that setInterval is CPU intensive. I created a script that uses setInterval and monitored the CPU usage but didn't notice a change. I want to know if there is something I missed.
What the code does is check for changes to the hash in the URL (the content after #) every 100 milliseconds and, if it has changed, load a page using AJAX. If it has not changed, nothing happens. Would there be any CPU issues with that?
I don't think setInterval is inherently going to cause you significant performance problems. I suspect the reputation may come from an earlier era, when CPUs were less powerful.
There are ways that you can improve the performance, however, and it's probably wise to do them:
Pass a function to setInterval, rather than a string (illustrated in the snippet after this list).
Have as few intervals set as possible.
Make the interval durations as long as possible.
Have the code running each time as short and simple as possible.
Don't optimise prematurely -- don't make life difficult for yourself when there isn't a problem.
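On the first point, a quick illustration of the difference (checkHash is a placeholder for whatever your polling function is):

// Avoid: the string is re-parsed and evaluated in the global scope on every tick
// setInterval("checkHash()", 100);

// Prefer: pass the function reference directly
setInterval(checkHash, 100);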
One thing, however, that you can do in your particular case is to use the onhashchange event, rather than timeouts, in browsers that support it.
I would rather say it's quite the opposite. Used correctly, setTimeout and setInterval can drastically reduce the browser's CPU usage. For instance, using setTimeout instead of a for or while loop will not only reduce the intensity of the CPU usage, but will also guarantee that the browser has a chance to update the UI queue more often, so long-running processes will not freeze and lock up the user experience.
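For example, a long loop can be broken into chunks like this (processItem and items are placeholders for your own work and data), so the UI thread gets a chance to breathe between batches:

function processInChunks(items, chunkSize) {
  let i = 0;
  function doChunk() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) {
      processItem(items[i]); // placeholder for the real per-item work
    }
    if (i < items.length) {
      setTimeout(doChunk, 0); // yield to the browser, then continue
    }
  }
  doChunk();
}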
But in general, using setInterval a lot on your site really may slow things down. Twenty simultaneously running intervals doing more or less heavy work will affect the show. And then again, you can mess up any part of your code; that is not a problem specific to setInterval.
And by the way, you don't need to check the hash like that. There are events for that:
onhashchange
will fire when the hash has changed.
window.addEventListener('hashchange', function(e) {
  console.log('hash changed, yay!');
}, false);
No, setInterval is not CPU intensive in and of itself. If you have a lot of intervals running on very short cycles (or a very complex operation running on a moderately long interval), then that can easily become CPU intensive, depending upon exactly what your intervals are doing and how frequently they are doing it.
I wouldn't expect to see any issues with checking the URL every 100 milliseconds on an interval. Personally, though, I would increase the interval to 250 milliseconds, because I don't expect the difference between the two to be noticeable to a typical user, and because I generally try to use the longest timeout intervals I think I can get away with, particularly for things that are expected to result in a no-op most of the time.
There's a bit of marketing going on there under the "CPU intensive" term. What it really means is "more CPU intensive than some alternatives". It's not "CPU intensive" in the sense of "uses a whole lot of CPU power, like a game or a compression algorithm would".
Explanation:
Once the browser has yielded control it relies on an interrupt from the underlying operating system and hardware to receive control and issue the JavaScript callback. Having longer durations between these interrupts allows hardware to enter low power states which significantly decreases power consumption.
By default the Microsoft Windows operating system and Intel based processors use 15.6ms resolutions for these interrupts (64 interrupts per second). This allows Intel based processors to enter their lowest power state. For this reason web developers have traditionally only been able to achieve 64 callbacks per second when using setTimeout(0) when using HTML4 browsers including earlier editions of Internet Explorer and Mozilla Firefox.
Over the last two years browsers have attempted to increase the number of callbacks per second that JavaScript developers can receive through the setTimeout and setInterval API’s by changing the power conscious Windows system settings and preventing hardware from entering low power states. The HTML5 specification has gone to the extreme of recommending 250 callbacks per second. This high frequency can result in a 40% increase in power consumption, impacting battery life, operating expenses, and the environment. In addition, this approach does not address the core performance problem of improving CPU efficiency and scheduling.
From http://ie.microsoft.com/testdrive/Performance/setImmediateSorting/Default.html
In your case there will not be any issue. But if you're doing some huge animations in canvas or working with WebGL, then there can be CPU issues, and for that you can use requestAnimationFrame.
Refer to this link about requestAnimationFrame.
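The basic pattern is to let the browser schedule each frame instead of firing on a fixed interval (draw is a placeholder for your canvas/WebGL rendering code):

// Instead of: setInterval(draw, 16);
function tick(timestamp) {
  draw(timestamp);             // placeholder: your rendering code
  requestAnimationFrame(tick); // the browser calls back before the next repaint
}
requestAnimationFrame(tick);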
If the function's execution time is longer than the interval time, that's bad: you can't know when the CPU hiccups or has a slow run, and calls stack on top of still-running functions until the PC freezes. Use setTimeout or, even better, process.nextTick with a callback inside a setTimeout.
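A sketch of that self-scheduling pattern - the next run is only scheduled after the current one finishes, so slow runs can't stack up (doWork is a placeholder for the actual work):

function tick() {
  doWork();              // placeholder for the actual work
  setTimeout(tick, 100); // schedule the next run only after this one completes
}
setTimeout(tick, 100);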