I have a problem using PhantomJS with web server module in a multi-threaded way, with concurrent requests.
I am using PhantomJS 2.0 to create highstock graphs on the server-side with Java, as explained here (and the code here).
It works well, and when testing graphs of several sizes I got pretty consistent results: about 0.4 seconds to create a graph.
The code that I linked to was originally published by the Highcharts team, and it is also used in their export server at http://export.highcharts.com/. In order to support concurrent requests, it keeps a pool of spawned PhantomJS processes; basically its model is one PhantomJS instance per concurrent request.
I saw that the webserver module supports up to 10 concurrent requests (explained here), so I thought I could tap into that to keep a smaller number of PhantomJS processes in my pool. However, when I tried to use more threads, I experienced a linear slowdown, as if PhantomJS were using only one CPU. The slowdown looks like this (for a single PhantomJS instance):
1 client thread, average request time 0.44 seconds.
2 client threads, average request time 0.76 seconds.
4 client threads, average request time 1.5 seconds.
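For reference, this is roughly how I use the webserver module on the PhantomJS side (a simplified sketch; the real handler loads Highcharts and renders the actual chart):

var webserver = require('webserver');
var server = webserver.create();

var listening = server.listen(8080, function (request, response) {
    // The real handler evaluates the chart config with Highcharts and renders it;
    // this sketch just renders a trivial page to keep things short.
    var page = require('webpage').create();
    page.content = '<html><body>' + (request.post || '') + '</body></html>';
    response.statusCode = 200;
    response.write(page.renderBase64('PNG'));
    response.close();
    page.close();
});

if (!listening) {
    console.log('Unable to start the PhantomJS web server');
    phantom.exit(1);
}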
Is this a known limitation of PhantomJS? Is there a way around it?
(question also posted here)
Is this a known limitation of PhantomJS?
Yes, it is an expected limitation, because PhantomJS uses the same WebKit engine for everything, and since JavaScript is single-threaded, this effectively means that every request will be handled one after the other (possibly interleaved), but never at the same time. The average overall time will increase linearly with each additional client.
The documentation says:
There is currently a limit of 10 concurrent requests; any other requests will be queued up.
There is a difference between the notions of concurrent and parallel requests. Concurrent simply means that the tasks finish non-deterministically. It doesn't mean that the instructions that the tasks are made of are executed in parallel on different (virtual) cores.
Is there a way around it?
Other than running your server tasks through child_process, no. The way JavaScript supports multi-threading is through Web Workers, but a worker is sandboxed and has no access to require, so it cannot create pages to do the work.
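For completeness, a minimal sketch of the child_process approach from Node.js, assuming one PhantomJS script per render (the script name and arguments are placeholders, not the actual highcharts export code):

var execFile = require('child_process').execFile;

function renderChart(configPath, callback) {
    // Each call spawns its own PhantomJS process, so renders can actually
    // run in parallel on different cores.
    execFile('phantomjs', ['render-chart.js', configPath], function (err, stdout) {
        callback(err, stdout);
    });
}

// Four renders at the same time, each in its own process.
['a.json', 'b.json', 'c.json', 'd.json'].forEach(function (file) {
    renderChart(file, function (err, output) {
        if (err) { return console.error(err); }
        console.log(file + ' rendered, ' + output.length + ' bytes');
    });
});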
Related
Since Node.js 10.5, worker threads have been available, which makes Node.js a multi-threaded environment.
Previously, with only one thread in Node.js, there was no CPU time slicing happening, because of its event-driven nature (if I understand correctly).
So now, with multiple threads on Node but only one physical CPU core, how do they share the CPU? Does the OS scheduler allocate time for each thread to run for varying amounts of time, or what?
Worker threads are advertised as
The Worker class represents an independent JavaScript execution thread.
So it is something like starting another NodeJS instance, but within the same process and with a minimal communication channel.
Worker threads in NodeJS mimic the Worker API in modern browsers (not a coincidence: NodeJS is basically a browser without a UI and with a few extra JS APIs), and in that context worker threads are really native threads, scheduled by the OS.
The description quoted above seems to imply that in NodeJS too, worker threads are implemented with native threads rather than with scheduling managed by NodeJS.
The latter would be useless as this is exactly what the JS event loop coupled with async methods do.
So basically a worker thread is just another "instance" (context) of NodeJS run by another native thread in the same process.
Being a native thread, it is managed and scheduled by the OS. And just like you can run more than one program on a single CPU, you can do the same with threads (fun fact: in many OSes, threads are the only schedulable entities; programs are just groups of threads with a common address space and other attributes).
As NodeJS is open source, it is easy to confirm this, see the Worker::StartThread and the Worker::Run functions.
The new thread will execute JS code just like the main one but it has been limited in the way it can interact with the environment (particularly the process itself).
This is in line with the JS approach to multithreading, where it is more "two or more message loops" than real multithreading (where threads are free to interact with each other, with all the implications at the architectural level).
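To make it concrete, here is a minimal worker_threads sketch (requires Node.js 10.5+, where the module was still behind the --experimental-worker flag):

// main.js - spawns a worker, which is a separate native thread with its own event loop.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
    const worker = new Worker(__filename, { workerData: { n: 40 } });
    worker.on('message', (result) => console.log('fib(40) =', result));
    worker.on('error', (err) => console.error(err));
} else {
    // Runs in the worker thread: CPU-bound work here does not block the main thread,
    // and the OS is free to schedule the two threads on different cores.
    const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
    parentPort.postMessage(fib(workerData.n));
}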
So, I'm trying to conduct a test to see how much WebWorker Threads (https://github.com/audreyt/node-webworker-threads) can improve CPU-intensive tasks in Node.js on a multi-core system.
I actually got this working on a VM with a single core assigned at work, but when I tried it on my home VM with 4 cores, I got a Segmentation Fault after 15-20 requests.
I've got my project up at https://github.com/WakeskaterX/NodeThreading.git
I have tried eliminating pieces to see why I'm getting the SegFault, but even just returning static numbers throws the SegFault after 15-20 requests.
For the loadtest command I'm running:
loadtest -c 4 -t 20 http://localhost:3030/fib?num=30
It runs just fine when it's synchronously calculating the Fibonacci sequence, but as soon as it hits a web worker it segfaults and core dumps. Perhaps this is related to the WebWorker-Threads code on the back end, but I'm mainly wondering why it's happening and how I can debug it further or fix it so I can test background threading in Node.js.
This is a variable lifetime issue. In general, a long-running worker needs to be assigned to an object instead of a lexical variable; the latter is garbage-collected away when the scope exits.
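A sketch of the pattern (illustrative names only, not the exact change from the pull request below):

// Keep a reference to the worker on a long-lived object so it is not
// garbage-collected when the request handler's scope exits.
var Worker = require('webworker-threads').Worker;  // the package from the linked repo

var liveWorkers = {};   // long-lived registry, survives the handler
var nextId = 0;

function runFib(num, done) {
    var id = nextId++;
    var worker = new Worker(function () {
        this.onmessage = function (event) {
            function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
            postMessage(fib(event.data));
        };
    });
    liveWorkers[id] = worker;   // anchored here; a plain local would be collected
    worker.onmessage = function (event) {
        done(event.data);
        worker.terminate();
        delete liveWorkers[id];
    };
    worker.postMessage(num);
}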
See https://github.com/WakeskaterX/NodeThreading/pull/1 for the pull request that fixes the issue.
I'm trying to improve the performance of a script when executed in a web worker. It's designed to parse large text files in the browser without crashing. Everything works pretty well, but I notice a severe difference in performance for large files when using a web worker.
So I conducted a simple experiment. I ran the script on the same input twice. The first run executed the script in the main thread of the page (no web workers). Naturally, this causes the page to freeze and become unresponsive. For the second run, I executed the script in a web worker.
Script being executed
Test runner page
For small files in this experiment (< ~100 MB), the performance difference is negligible. However, on large files, parsing takes about 20x longer in the worker thread.
The blue line (main thread) is what I expect: it should only take about 11 seconds to parse the file, and the performance is fairly steady.
The red line (inside the web worker) is much more surprising.
The jagged line for the first 30 seconds is normal (the jag is caused by the slight delay in sending the results to the main thread after every chunk of the file is parsed). However, parsing slows down rather abruptly at 30 seconds. (Note that I'm only ever using a single web worker for the job; never more than one worker thread at a time.)
I've confirmed that the delay is not in sending the results to the main thread with postMessage(). The slowdown is in the tight loop of the parser, which is entirely synchronous. For reasons I can't explain, that loop is drastically slowed down and it gets slower with time after 30 seconds.
But this only happens in a web worker; the same code in the main thread, as you've seen above, runs very smoothly and quickly.
Why is this happening? What can I do to improve performance? (I don't expect anyone to fully understand all 1,200+ lines of code in that file. If you do, that's awesome, but I get the feeling this is more related to web workers than my code, since it runs fine in the main thread.)
System: I'm running Chrome 35 on Mac OS 10.9.4 with 16 GB of memory; quad-core 2.7 GHz Intel Core i7 with 256 KB L2 cache (per core) and a 6 MB L3 cache. The file chunks are about 10 MB in size.
Update: Just tried it on Firefox 30 and it did not experience the same slowdown in a worker thread (but it was slower than Chrome when run in the main thread). However, trying the same experiment with an even larger file (about 1 GB) yielded significant slowdown after about 35-40 seconds (it seems).
Tyler Ault suggested one possibility on Google+ that turned out to be very helpful.
He speculated that using FileReaderSync in the worker thread (instead of the plain ol' async FileReader) was not providing an opportunity for garbage collection to happen.
Changing the worker thread to use FileReader asynchronously (which intuitively seems like a performance step backwards) accelerated the process back up to just 37 seconds, right where I would expect it to be.
I haven't heard back from Tyler yet and I'm not entirely sure I understand why garbage collection would be the culprit, but something about FileReaderSync was drastically slowing down the code.
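Roughly, the change inside the worker looks like this (simplified; parseChunk stands in for the real parser entry point):

// Before: synchronous read, which apparently never gave the GC a chance to run.
// var reader = new FileReaderSync();
// parseChunk(reader.readAsText(blobChunk));

// After: plain async FileReader, yielding back to the event loop between chunks.
function readChunk(blobChunk, onDone) {
    var reader = new FileReader();
    reader.onload = function (e) {
        parseChunk(e.target.result);
        onDone();
    };
    reader.readAsText(blobChunk);
}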
What hardware are you running on? You may be running into cache-thrashing problems with your CPU. For example, if the CPU cache is 1 MB per core (just an example) and you keep working with data that continually replaces the cache (cache misses), then you will suffer slowdowns - this is quite common in multi-threaded systems, and in I/O transfers too. These systems also tend to have some OS overhead for the thread contexts, so if lots of threads are being spawned you may spend more time managing the contexts than actually 'doing work'. I haven't looked at your code yet, so I could be way off - but my guess is a memory issue, just because of what your application is doing. :)
Oh, and how to fix it: try making the blocks of execution small, single chunks that match the hardware, and minimize the number of threads in use at once - try to keep them at 2-3x the number of cores you have (this really depends on what sort of hardware you have). Hope that helps.
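For example, in a browser you could size the pool to the hardware along these lines (a rough sketch; navigator.hardwareConcurrency is only available in newer browsers, and 'parser-worker.js' is a placeholder):

// Cap the number of workers to a small multiple of the available cores.
var cores = navigator.hardwareConcurrency || 4;   // fall back when unsupported
var maxWorkers = Math.min(cores * 2, 8);

var workers = [];
for (var i = 0; i < maxWorkers; i++) {
    workers.push(new Worker('parser-worker.js'));
}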
A Meteor.js 0.82 app is running on an Ubuntu 14.04 server with 2 GB of memory and 2 CPU cores. It was deployed using mup. However, the CPU utilization is very high; htop reports a load average of 2.72.
Question: How do I find out which part of the app is causing such high CPU utilization? I used Kadira but, as far as I can tell, it does not reveal anything taking up a lot of CPU load.
Does Meteor only use a single core?
I had a similar problem before with Meteor 0.8.2-0.8.3. Here is what I did to reduce the CPU usage; I hope you find it useful.
double-check your functions; ensure every function has a proper return and properly catches errors
try to use a replica set and oplog (convert a standalone mongod to a replica set)
write scripts to automatically kill and respawn a node process if it exceeds 100% CPU usage
utilize multi-core capability by starting 2 processes (edit: you have done this already) and configure load balancing and a reverse proxy
make sure to review your publications and subscriptions and limit what data is sent to the client - simply avoid something like Collection.find() with no selector (a small sketch follows this list)
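A hedged sketch of what a limited publication can look like (the collection and field names are made up, not from your app):

// Server side: publish only the fields and documents the client actually needs.
Meteor.publish('recentPosts', function () {
    return Posts.find(
        { ownerId: this.userId },                          // only this user's documents
        { fields: { title: 1, createdAt: 1 }, limit: 50 }  // few fields, capped count
    );
});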
Personally I recommend Phusion Passenger; it makes deploying Meteor applications easy, and I have used it for several projects without any major problems.
One more thing: avoid running the processes as root (or another privileged user); you should run your apps as a user like www-data, for obvious security reasons.
P.S. The multiple mongo processes showing in htop are threads under a master process; you can view them in tree mode by pressing F5.
I'm writing an application that makes heavy use of the http.request method.
In particular, I've found that sending 16+ requests of ~30 KB each simultaneously really bogs down a Node.js instance on a machine with 512 MB of RAM.
I'm wondering if this is to be expected, or if Node.js is just the wrong platform for outbound requests.
Yes, this behavior seems perfectly reasonable.
I would be more concerned if it were doing the work you described without any noticeable load on the system (in which case it would take a very long time). Remember that Node is just an evented I/O runtime, so you can trust that it is scheduling your I/O requests (about) as quickly as the underlying system can; it's using the system to its (nearly) maximum potential, hence the system being "really bogged down".
One thing you should be aware of is that http.request does not create a new socket for each call. Each request goes through an object called an "agent", which contains a pool of up to 5 sockets. If you are using the v0.6 branch, you can raise this limit with:
http.globalAgent.maxSockets = Infinity
Try that and see if it helps.
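In context it would look something like this (placeholder URL; on modern Node the agent and its defaults are different):

var http = require('http');
http.globalAgent.maxSockets = Infinity;   // the v0.x default was 5 sockets per host

for (var i = 0; i < 16; i++) {
    http.get('http://example.com/resource', function (res) {
        res.on('data', function () {});   // drain the response
        res.on('end', function () {
            console.log('request finished with status ' + res.statusCode);
        });
    }).on('error', console.error);
}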