Why is EventMachine so much slower than Node? - javascript

In my specific case, at least. Not trying to make general statements here.
I've got this web crawler that I wrote in Node.js. I'd love to use Ruby instead, so I re-wrote it in EventMachine. Since the original was in CoffeeScript, it was actually surprisingly easy, and the code is very much the same, except that in EventMachine I can actually trap and recover from exceptions (since I'm using fibers).
The problem is that tests that run in under 20 seconds on the Node.js code take up to and over 5 minutes on EventMachine. When I watch the connection count it almost looks like they are not even running in parallel (they queue up into the hundreds, then very slowly work their way down), though logging shows that the code points are hit in parallel.
I realize that without code you can't really know what exactly is going on, but I was just wondering if there is some kind of underlying difference and I should give up, or if they really should be able to run about as fast (a small slowdown is fine) and I should keep trying to figure out what the issue is.
I did the following, but it didn't really seem to have any effect:
puts "Running with ulimit: " + EM.set_descriptor_table_size(60000).to_s
EM.set_effective_user('nobody')
EM.kqueue
Oh, and I'm very sure that I don't have any blocking calls in EventMachine. I've combed through every line about 10 times looking for anything that could be blocking. All my network calls are EM::HttpRequest.

The problem is that tests that run in under 20 seconds on the Node.js code take up to and over 5 minutes on EventMachine. When I watch the connection count it almost looks like they are not even running in parallel (they queue up into the hundreds, then very slowly work their way down), though logging shows that the code points are hit in parallel.
If they're not running in parallel, then it's not asynchronous. So you're blocking.
Basically you need to figure out what blocking IO call you've made through the standard Ruby library, remove it, and replace it with an EventMachine non-blocking IO call.
Your code may not have any blocking calls, but are you using third-party code that isn't your own or isn't from EM? It may block. Even something as simple as a debug print or log statement can block.
All my network calls are EM::HttpRequest.
What about file IO? What about TCP? What about anything else that can block, and what about third-party libraries?
We really need to see some code here, either to identify a bottleneck in your code or a blocking call.
Node.js should not be more than an order of magnitude faster than EM.

Related

Javascript - Preventing Chrome From Killing Page during long loop

Chrome keeps killing the page in the middle of my connect-four browser game while it is running. The game is a player vs. computer setup, and the game itself runs properly and never crashes. The page crashes when I set the number of iterations too high for training the computer opponent. The program trains the AI using a Q-learning algorithm where it plays itself and stores a value for each encountered state. If I set the number of iterations to about 125,000 or less, then everything works fine (except the opponent is not so great). I cannot tell if it is the running time of the loop (it would take about 30 minutes to run) that kills the page, or something else such as memory constraints from recording states and their corresponding Q-values.
How can I get the program to run for more training iterations without chrome killing the page?
You've got a couple of options on how to handle your code.
Option 1: setInterval / setTimeout
As others have suggested, using setInterval or setTimeout can run your code in "chunks" and no one chunk will cause a timeout.
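For example, a hypothetical training loop could be split so that each call processes only a slice of the iterations and then yields back to the browser. This is only a sketch; trainOneGame and the iteration counts are placeholders for your own Q-learning code:

var totalIterations = 500000;
var chunkSize = 1000;
var done = 0;

function trainOneGame() {
  // placeholder: one self-play game / Q-value update goes here
}

function runChunk() {
  var end = Math.min(done + chunkSize, totalIterations);
  for (; done < end; done++) {
    trainOneGame();
  }
  if (done < totalIterations) {
    // Yield to the browser so it can repaint and handle events, then continue.
    setTimeout(runChunk, 0);
  } else {
    console.log('Training finished');
  }
}

runChunk();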
Option 2: setInterval + generators
With deeply nested code, it is very difficult to re-enter the code properly using setTimeout alone.
Read up on generators -- that makes running code in chunks much nicer, but it may take some redesign.
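For instance, a generator can hold the loop state so each chunk simply resumes where the last one left off. Again, this is just a sketch and trainOneGame stands in for your own training step:

function trainOneGame() {
  // placeholder: one self-play game / Q-value update goes here
}

function* trainGenerator(totalIterations, chunkSize) {
  for (var i = 0; i < totalIterations; i++) {
    trainOneGame();
    if (i % chunkSize === 0) {
      yield i; // pause here; the driver decides when to resume
    }
  }
}

function drive(gen) {
  var step = gen.next();
  if (!step.done) {
    setTimeout(function () { drive(gen); }, 0); // let the browser breathe between chunks
  }
}

drive(trainGenerator(500000, 1000));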
Option 3: webworkers
Webworkers provide another way, depending on what you are calculating. They run in the background and don't have access to the DOM or anything else, but they are great at calculation.
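A minimal sketch of moving the training into a worker, assuming the training code lives in a separate file (the file name trainer.js and the message shape are made up):

// main.js -- spawn a background worker so training never blocks the UI thread
var worker = new Worker('trainer.js');

worker.onmessage = function (e) {
  console.log('Training finished; received', Object.keys(e.data).length, 'states');
};

worker.postMessage({ iterations: 500000 });

// trainer.js -- runs off the main thread; no DOM access here
onmessage = function (e) {
  var qTable = {};
  for (var i = 0; i < e.data.iterations; i++) {
    // placeholder: one self-play game / Q-value update that fills qTable
  }
  postMessage(qTable); // send the learned values back to the page
};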
Option 4: nodejs
Your last option is to move away from the browser and run in another environment such as Node.js. If you are running under Windows, HTA files may be another option.

In Node.js, is setTimeout reliable?

I need to perform many "setTimeouts" of 60 seconds each. Basically, I'm creating a database record, and 60 seconds from now I need to check whether the database record was changed.
I don't want to implement a "job queue" since it's such a simple thing, and I definitely need to check it around the 60 second mark.
Is it reliable, or will it cause issues?
When you use setTimeout or setInterval the only guarantee that you get is that the code will not be executed before the programmed time.
It can however start somewhat later, because of other code that is being executed when the clock ticks (in other words, other code will not be interrupted in the middle of handling an event just to process a timeout or interval event).
If you don't have long blocking processing in your code it means that timed events will be reasonably accurate. If you are instead using long blocking calls then probably node is not the correct tool (it's designed around the idea of avoiding blocking "synch" calls).
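As a rough sketch of the pattern in the question (checkRecordChanged is a hypothetical placeholder for the actual database lookup), the timer will never fire early, but it may fire a little late if the event loop happens to be busy at the 60-second mark:

function checkRecordChanged() {
  // hypothetical placeholder: re-read the record and compare it to the original
}

var scheduledAt = Date.now();

setTimeout(function () {
  var lateBy = Date.now() - scheduledAt - 60000;
  console.log('fired ' + lateBy + ' ms after the target (never before it)');
  checkRecordChanged();
}, 60000);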
You should try WorkerTimer.js; it is better suited for handling background processes and more accurate than the traditional setInterval or setTimeout.
It is available as a Node.js npm package.
https://www.npmjs.com/package/worker-timer

How does JavaScript's Single Threaded Model handle time consuming tasks?

This question is regarding the single threaded model of JavaScript. I understand that JavaScript is non-blocking in nature because of its ability to add callbacks to the async event queue. But if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time, as it is single threaded? How does Node.js handle such a problem? And is this an unavoidable problem for developers on the front end? I'm asking this question because I have read that it's generally good practice to keep function tasks as small as possible. Is it really because long tasks in JavaScript will actually block other tasks?
But if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded?
Yes.
How does nodejs handle such a problem?
Node.js handles nothing. How you handle concurrency is up to you and your application. Now, Node.js does have a few tools available to you. The first thing you have to understand is that Node.js is basically V8 (JavaScript engine) with a lightweight library split between JavaScript and native code bolted on. While your JavaScript code is single-threaded by nature, the native code can and does create threads to handle your work.
For example, when you ask Node.js to load a file from disk, your request is passed off to native code where a thread pool is used, and your data is loaded from disk. Once your request is made, your JavaScript code continues on. This is the meaning of "non-blocking" in the context of Node.js. Once that file on disk is loaded, the native code passes it off to the Node.js JavaScript library, which then executes your callback with the appropriate parameters. Your code continued to run while the background work was going on, but when your callback is dealing with that data, other JavaScript code is indeed blocked from running.
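For example (a minimal sketch; the file path is arbitrary):

var fs = require('fs');

// The read is handed off to a background thread pool; JavaScript keeps running.
fs.readFile('/etc/hosts', 'utf8', function (err, data) {
  if (err) throw err;
  // While this callback runs, no other JavaScript in this process runs.
  console.log('file length:', data.length);
});

console.log('this line prints before the file contents arrive');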
This architecture allows you to get much of the benefit of multithreaded code without having to actually write any multithreaded code, keeping your application straightforward.
I'm asking this question because I have read that it's generally good practice to keep function tasks as small as possible. Is it really because long tasks in JavaScript will actually block other tasks?
My philosophy is always to use what you need. It's true that if a request comes in to your application and you have a lot of JavaScript processing of data that is blocking, other requests will not be processed during this time. Remember though that if you are doing this sort of work, you are likely CPU bound anyway and doing double the work will cause both requests to take longer.
In practice, the majority of web applications are IO bound. They shuffle data from a database, reformat it, and send it out over the network. The part where they handle data is actually not all that time consuming when compared to the amount of time the application is simply waiting to hear back from the upstream data source. It is in these applications where Node.js really shines.
Finally, remember that you can always spawn child processes to better distribute the load. If your application is that rare application where you do 99% of your work load in CPU-bound JavaScript and you have a box with many CPUs and/or cores, split the load across several processes.
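For example, here is a minimal sketch using the built-in cluster module to fork one worker per core; the HTTP handler is just a stand-in workload:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core so CPU-bound work is spread across processes.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  http.createServer(function (req, res) {
    // CPU-heavy JavaScript would go here, isolated to this worker process.
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}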
Your question is a very large one, so I am just going to focus on one part.
if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded? (...) Is it really because long tasks in JavaScript will actually block other tasks?
Non-blocking is a beautiful thing XD
The best practices include:
Break every function down into its minimum functional form.
Keep callbacks asynchronous; THIS is an excellent post on the use of callbacks.
Avoid stacking operations (like nested loops).
Use setTimeout() to break up potentially blocking code.
And many other things. Node.js is the gold standard of non-blocking, so it's worth a look.
setTimeout() is one of the most important functions in non-blocking code.
So let's say you make a clock function that looks like this:
function setTime() {
  var date = new Date();
  var time = date.getTime();
  document.getElementById('id').innerHTML = time;
}
while (true) { setTime(); } // never yields, so nothing else can ever run
It's quite problematic, because this code will happily loop itself until the end of time. No other function will ever be called. You want to break up the operation so other things can run.
function startTime() {
  var date = new Date();
  var time = date.getTime();
  document.getElementById('id').innerHTML = time;
  setTimeout(startTime, 1000); // pass the function itself, not the result of calling it
}
startTime();
setTimeout() breaks up the loop and runs the update roughly once a second. An infinite loop is a bit of an extreme example. The point is that setTimeout() is great at breaking up large operation chains into smaller ones, making everything more manageable.

Node.js process.exit() does not exit cleanly, and the dangers of async fs.writeFile

tl;dr:
Calling the asynchronous fs.writeFile from asynchronous events (and perhaps even from just a plain old loop) and then calling process.exit() successfully opens the files but fails to flush the data into the files. The callbacks given to writeFile do not get a chance to run before the process exits. Is this expected behavior?
Regardless of whether process.exit() is failing to perform this cleanup, I call into question whether it should be node's duty to at least attempt to work the file writes into the schedule, because it may very well be the case that the deallocation of huge buffers depends on writing them out to disk.
details
I have a conceptually basic piece of node.js code which performs a transformation on a large data file. This happens to be a LiDAR sensor's data file, which should not be relevant. It is simply a dataset that is quite large owing to the nature of its existence. It is structurally simple. The sensor sends its data over the network. My task for this script is to produce a separate file for each rotating scan. The details of this logic are irrelevant as well.
The basic idea is I use node_pcap to read a huge .pcap file using the method given to do this task by node_pcap, which is "offline mode".
What this means is that instead of asynchronously catching the network packets as they appear, what appears to be a rather dense stream of asynchronous events representing the packets is "generated".
So, the main structure of the program consists of a few global state variables, and a single callback to the pcap session. I initialize globals, then assign the callback function to the pcap session. This callback to the packet event does all the work.
Part of this work is writing out a large array of data files. Once in a while a packet will indicate some condition that means I should move on to writing into the next data file. I increment the data filename index, and call fs.writeFile() again to begin writing the new file. Since I am writing only, it seems natural to let node decide when a good time is to begin writing.
Basically, both fs.writeFileSync and fs.writeFile should end up calling the OS's write() system call on their respective files in an asynchronous fashion. This does not bother me because I am only writing, so the asynchronous nature of the write, which can affect certain access patterns, does not matter to me since I do not do any access. The only difference is that writeFileSync forces the node event loop to block until the write() syscall completes.
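In other words (a minimal sketch with made-up file names and data, not the actual scan-writing code), the difference visible at the call site is whether the call returns before the data has been written:

var fs = require('fs');
var buffer = new Buffer('example scan data'); // placeholder data

// Synchronous: the event loop is blocked here until the write completes.
fs.writeFileSync('scan-0001.bin', buffer);

// Asynchronous: returns immediately; only the callback signals that the
// data has actually been written out.
fs.writeFile('scan-0002.bin', buffer, function (err) {
  if (err) throw err;
  console.log('scan-0002.bin has been written');
});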
As the program progresses, when I use writeFile (the js-asynchronous version), hundreds of my output files are created, but no data is written to them. Not one. The very first data file is still open when the hundredth data file is created.
This is conceptually fine. The reason is that node is busy crunching new data, and is happily holding on to the increasing number of file descriptors that it will eventually get to in order to write the files' data in. Meanwhile it also has to keep inside of memory all the eventual contents of the files. This will eventually run out, but let's ignore the RAM size limitation for a moment. Obviously a bad thing to happen here would be running out of RAM and crashing the program. Hopefully node will be smart and realize it just needs to schedule some file writes and then it can free a bunch of buffers...
If I stick a statement in the middle of all this to call process.exit(), I would expect that node will clean up and flush the pending writeFile writes before exiting.
But node does not do this.
Changing to writeFileSync fixes the problem obviously.
Changing and truncating my input data such that process.exit() is not explicitly called also results in the files eventually getting written (and the completion callback given to writeFile to run) at the very end when the input events are done pumping.
This seems to indicate for me that the cleanup is being improperly performed by process.exit().
Question: Is there some alternative to exiting the event loop cleanly in the middle? Note I had to manually truncate my large input file, because terminating with process.exit() caused all the file writes to not complete.
This is node v0.10.26 installed a while ago on OS X with Homebrew.
Continuing with my thought process, the behavior that I am seeing here calls into question the fundamental purpose of using writeFile. It's supposed to improve things to be able to flexibly write my file whenever node deems it fit. However, apparently if node's event loop is pumped hard enough, then it will basically "get behind" on its workload.
It is like the event loop has an inbox and an outbox. In this analogy, the outbox represents the temp variables containing the data I am writing to the files. The assumption that a lazy productive programmer like me wants to make is that the inbox and outbox are interfaces that I can use and that they are flexible and that the system will manage for me. However if I feed the inbox at too high a rate, then node actually can't keep up, and it will just start piling the data into the outbox without having any time to flush it because for one reason or another, the scheduling is such that all the incoming events have to get processed first. This in turn defers all garbage collection of the outbox's contents, and quite quickly we deplete the system's RAM. This is quite easily a hard-to-find bug when this pattern is used in a complex system. I am glad I took a modular approach to this project.
I mean, yes, clearly, obviously, beyond all doubt the answer is to use writeFileSync as I do almost every single time that I write files with node.
What, then, is the value in even having writeFile? At this point I am trading a potential small increase in parallel processing for the increased possibility that if (for some reason) the machine's processing capability drops (whether it's thermal throttling or OS-level scheduling or I don't pay my IaaS bills on time, or any other reason), it can potentially lead to a snowballing memory explosion?
Perhaps this is getting at the core of solving the truly rather complex problems inherent in streaming data processing systems, and that I cannot realistically expect this event-based processing model to step up and elegantly solve these problems automatically. Maybe I should be satisfied that it only gets me about half of the way to something robust. Maybe I am just projecting my wishes onto it and that it is unreasonable for me to assume that node needs to less deterministically "improve" the scheduling of its event loop.
I'm not a node expert, but it seems like your problem can be simplified using streams. Streams let you pause and resume and also provide other neat functionality. I suggest you take a look at Chapter 9 of Professional Node.js by Pedro Teixeira. You can find an online copy easily for reading purposes. It provides very detailed and well explained examples on how to use streams to read and write data, and also how to prevent potential memory leaks and loss of data.
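To make the stream suggestion concrete, here is a rough sketch (not the asker's actual pipeline; makeScanFileName and the bookkeeping variables are invented names) that writes each scan through a write stream and only exits once every file has been flushed:

var fs = require('fs');

var pendingWrites = 0;
var doneReadingInput = false;

function makeScanFileName(index) {
  // hypothetical helper: build an output path for scan number `index`
  return 'scan-' + index + '.bin';
}

function writeScan(index, buffer) {
  pendingWrites++;
  var out = fs.createWriteStream(makeScanFileName(index));
  out.on('finish', function () { // fires once the data has been flushed
    pendingWrites--;
    maybeExit();
  });
  out.end(buffer);
}

function maybeExit() {
  // Only exit once the pcap input is exhausted and every file is flushed.
  if (doneReadingInput && pendingWrites === 0) {
    process.exit(0);
  }
}

// When the offline pcap session reports that it has no more packets:
// doneReadingInput = true; maybeExit();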

Is there any way to call a function when an infinite loop or browser hang occurs in JavaScript?

Hi, I'm working on a project which has a huge amount of JavaScript code. Sometimes the JavaScript executes automatically and hangs the browser, and Chrome shows the "kill this page" message. Is there any way to track the error, like calling a function when an infinite loop occurs or the browser hangs like that? Please give me some suggestions about debugging JavaScript code.
There is no way of doing what you wish inside javascript.
However you can use a tool like DynaTrace Ajax Edition to trace cpu usage in the browser to identify what is happening.
An infinite loop can be caused by many different kinds of bad programming logic, and there is no reliable way to detect it in every case. So I highly doubt that any programming language or IDE would offer reliable infinite loop detection.
What you saw was basically a runtime detection based on how long script execution has taken before the browser could update and refresh the UI.
Sometimes this kind of long running JavaScript could be caused by infinite loop, but many times they are just big loops or loops that perform too much work that makes UI unresponsive.
Because JavaScript is not multi-threaded, to avoid the latter case above you could consider breaking the loops into small units of work. Once a unit is finished, don't run the next unit immediately; instead, call the next unit of work with setTimeout and a small time delay (such as 250 ms). This gives the browser a chance to breathe and update the UI, and keeps it from flagging your script as a "long-running" script.
You may also use logging such as Firebug Logging to log the loops with enough values that help you find out if those loops are indeed infinite loops.
