I am creating a game server in Node.js and I think I have a good server loop going with setInterval(). However, the way I plan to make my game is having many small instances of games running at the same time, meaning having many different loops of setInterval() happening at once.
Can too many setInterval() running at the same time be problematic, and if so, is there a better way to structure my server? e.g. attempting to run all instances of the game within one setInterval() loop?
With a purely theoretical question and no code or data included, the answer is yes. It can certainly be possible to be running too many setInterval() calls at the same time.
As with most gaming, if your require some predictable timing of the setInterval() calls in order to maintain smooth game flow, then as you get more and more calls running, you start to lose the predictability for when they will run. Because Javascript in node.js is a single threaded event-driven design, if two setInterval() calls are scheduled to happen at the same moment, one will run first and the other won't run until the first is done with its work. How many you need to have before this becomes a noticeable issue depends entirely upon the circumstances of your code - how many you have, how long each one takes to execute and what your game's tolerance for slight delays in timing is.
Because of node.js single threaded nature, it is not great at having a single process that is trying to do lots of timing-sensitive things. The typical way to work-around this is to use multiple node.js processes (often one per CPU core in your computer) so you get all the CPU cores involved. There are even cases where you're trying to promote more fairness between actions that you may want to have more processes than you have CPU cores as this gives you some OS-driven time slicing between work going on in different node.js processes.
Since we don't know anything about your specifics, we can't say whether you would benefit from clustering (multiple node.js processes all running the same code and all sharing incoming load) or from using a separate node.js app for each N instances of the game where N is something you figure out by testing. Probably you want a single game instance handled by one particular node instance so your model is more like the second option (N game instances per node.js instance), but that's just a guess since we have no specifics.
But as I said in my comment, because you've asked a purely theoretical question with no code and no data, all we can really do is answer hypothetically - there is no real world code or data to go by.
Related
Since node runs a single threaded model with event looping I wonder how node prevents the entire application to fail if you write a code like:
while(true){ doSomething()}
where doSomething is a synchronous function (a blocking piece of code)
Note that it doesn't make any sense to write a function like doSomething but nothing prevents you to make a mistake
The problem here is that, since it's single threaded, it won't allow any other parts of the application to run (for instance, a web server would stop accepting new connections) because this function would never end. In a Multi threaded environment you would loose this thread alone.
Is there anything that node can do for you to prevent these kind of problems?
I wonder how node prevents the entire application to fail if you write an infinite loop
nodejs does not prevent such an infinite loop. It will just run that loop forever or until some resource is exhausted (if the loop is consuming some resource like memory).
If node can't prevent this kind of situations, is this a design fault or there's no way to prevent these kind of problems?
I don't think most people consider it a design fault - though that's purely an opinion and different people may have a different opinion. It is a consequence of the way nodejs was designed which has many other benefits.
The only way to prevent such problems is to not write faulty code that does this. Honestly, it's not too hard to avoid writing this type of code once you're aware that it's an issue to avoid.
The problem here is that, since it's single threaded, it won't allow any other parts of the application to run (for instance, a web server would stop accepting new connections) because this function would never end. In a Multi threaded environment you would loose this thread alone
Correct. This is something you learn when coding in nodejs. I've never found it a hard thing to avoid. nodejs is an single-threaded event driven system, not a multi-threaded system. As such, you program with events, not long running loops that poll or check conditions. It is a rather straightforward concept to learn and use once you understand this is how nodejs works. It is different than some other environments. But, how to use asynchronous operations in nodejs is just something you have to learn to program in that environment. It's not avoidable and is just part of the character of nodejs. There is no way that nodejs could have the type of architecture it has without having to learn this to program in it. If you want a different architecture (for whatever personal reason), then pick a different environment, not nodejs.
The single-threadedness massively simplifies many other things (far, far fewer opportunities for race conditions) and improves scalability in some circumstances (with asynchronous I/O) vs. threaded environments. For situations where you want multiple CPUs to be applied to your problem, it is generally straightforward in node.js to either use the built-in clustering module or to fire up worker processes and feed them work. Data is often shared among multiple processes via some sort of database (either file-based or RAM-based) that handles much of the multi-process synchronization for you.
It doesn't. This seems like less of a question and more an open statement. Node will loop infinitely and all your parallel code will stop running.
it's not possible to find such issue in the node.js program itself. however a node.js script with an infinite loop will use lead to 100% cpu . so this can be monitored and you can use tools to restart the program. I don't recommand to do this, you should fix your infinite loop first, but it s sometimes hard to find the issue with large codebase. last time it happened to be I used a remote debugger to find the infinite loop.
I am writing a node app with various communities and within those communities users are able to create and join rooms/lobbies. I have written the logic for these lobbies into the node app itself though a collection of lobby objects.
Lobbies require some maintenance once created. Users can change various statuses within the lobby and I also have calls using socket.io at regular intervals(about every 2 seconds) for each lobby to keep track of some user input "live".
None of the tasks are too cpu intensive. The biggest potential threat I foresee is a load distributing algorithm but it is not one of the "live calls" and is only activated upon a button press by the lobby creator (it also is never performed on more than 10 things).
My concern arises in that, in production, if the server starts to get close too around 100-200 lobbies I could be risking blocking the event loop. Are my concerns reasonable? Is the potential quantity of these operations, although they are small, large enough to warent offloading this code to a separate executable or getting involved with various franken-thread javascript libs?
TL;DR: node app has object with regular small tasks run. Should I worry about event-loop blocking if many of these objects are made.
There is no way to know ahead of time whether what you describe will "fill" up the event loop and take all the time one thread has or not. If you want to "know", you will have to build a simulation and measure while using commensurate hardware with what you expect to use in production.
With pretty much all performance questions, you have to measure, measure and measure again to really know or understand whether you have a problem and, if so, what is the main source of the problem.
For non-compute intensive things, your CPU can probably handle a lot of activity. If you get a lot of users all pounding transactions every two seconds though, you could end up with a bottleneck somewhere that causes issues. 200 users with a transaction every 2 seconds means 100 transactions/sec which means if you take more than 10ms of CPU or of any other serialized resource (like perhaps the network card) per transaction, then you may have issues.
As for offloading some work to another process, I wouldn't spend much time worrying about that until you've measured whether you have an issue or not. If you do, then it will be very important to understand what the main cause of the issue is. It may make more sense to simply cluster your node.js processes to put multiple processes on the same server than it does to break your core logic into multiple processes. It will all depend upon what the main cause of your issue is (if you even have an issue). Or, you may end up needing multiple network cards. Or something else. It's far too early to know before measuring and understanding.
I know that in node everything runs in parallel, except your code. Read here and here.
I’m looking a possible scenario where I have a very large array in memory and I want to perform a small computation to each of its elements. The order in which this computation executes is not important.
In node all I/O is executed very efficiently because of the event loop but when you iterate through a collection there is no I/O involved and if this iteration takes too long you can block all incoming requests in that period of time.
This gist contains a nonBlockingForEach that Neilk wrote in his article Why you should use Node.js for CPU-bound tasks which makes me wonder if I write something like this
var my_very_large_array = [...];
my_very_large_array.forEach(function() { ... })
//or
_.each(my_very_large_array, function() { ... })
I will hit a performance bottleneck on my server (this libraries fallback to the native forEach if present)
From what I learned you there are a lot of libraries like async.js to do that but I always use lodash or underscore for those tasks in the browser.
I also tried bluebird.js but promisifying those methods didn't work as expected.
So my question is this. Is lodash or underscore a performance killer in a node.js environment when you iterate through a large collection using a forEach method?
There is a new standard called "Web Workers" which do allow background work to happen in a separate thread in the same process. This requires later versions of node.js, and a separate package installed from here:
From the wiki page:
The W3C and WHATWG envision web workers as long-running scripts that are not interrupted by user-interface scripts (scripts that respond to clicks or other user interactions). Keeping such workers from being interrupted by user activities should allow Web pages to remain responsive at the same time as they are running long tasks in the background.
The simplest use of workers is for performing a computationally expensive task without interrupting the user interface.
You can enable Web workers in node.js by installing the webworker-threads package
So my question is this. Is lodash or underscore a performance killer in a node.js environment when you iterate through a large collection using a forEach method?
Yes, lodash and underscore will both kill performance in a typical node.js setup when working with a large amount of data, as they will block the only thread available, making other tasks queued up in the event-loop suffer. However if you were to run these in a web-worker thread, then your main thread would be free to continue processing work as normal.
"Performance Killer" is a relative, and fairly loaded term.
To answer your direct question, does forEach in lodash or underscore run in parallel: no. It uses a standard, synchronous iteration.
NodeJS is a single-thread single-process application. It does not matter if you process the entire array in a single forEach, or break it apart into multiple loops processed by the event queue. It's still going to take the same amount of work for that CPU core.
If you want to take advantage of multiple cores. You need to use the cluster module, and create multiple processes that work on different parts of the array.
There is no shared memory, or thread locking in NodeJS. So you will have to break apart the array into pieces for each process to work on.
https://nodejs.org/api/cluster.html
Node.js servers are very efficient concerning I/O and large number of client connection. But why is node.js not suitable for heavy CPU apps in comparison to a traditional multithreading server?
I read it here Felix Baumgarten
Node is, despite its asynchronous event model, by nature single threaded. When you launch a Node process, you are running a single process with a single thread on a single core. So your code will not be executed in parallel, only I/O operations are parallel because they are executed asynchronous. As such, long running CPU tasks will block the whole server and are usually a bad idea.
Given that you just start a Node process like that, it is possible to have multiple Node processes running in parallel though. That way you could still benefit from your multithreading architecture, although a single Node process does not. You would just need to have some load balancer in front that distributes requests along all your Node processes.
Another option would be to have the CPU work in separate processes and make Node interact with those instead of doing the work itself.
Related things to read:
Node.js and CPU intensive requests
Understanding the node.js event loop
A simple Node.js server is single-threaded, meaning that any operation that takes a long time to execute will block the rest of your program from running. Node.js apps manage to maintain a high level of concurrency by working as a series of events. When an event handler is waiting for something to happen (such as reading from the database), it tells Node to go ahead and process another event in the meantime. But since a single thread can only execute one instruction at a time, this approach can't save you from a function that needs to keep actively executing for a long time. In a multithreaded architecture, even if one function takes a long time to compute the result, other threads can still process other requests — and as long as you have a core that is not fully used at the time, there's a good chance they can do it about as quickly as if no other requests were running at all.
In order to deal with this, production Node.js apps that expect to hog a lot of CPU will usually be run in clusters. This means that instead of having several threads in one program's memory space, you run several instances of the same program under the control of one "master" instance. Each process is single-threaded, but since you have several of them, you end up gaining the benefits of multiple threads.
Node is flawless if you are having asynchronous tasks because java script will run these things by worker pool. But if you run CPU intense tasks (where you heavily use CPU ) Ex you have a billion users and you want to sort those people on name. Its quit a Intense tasks, and this is synchronous which will block other code from running.
So its not a good idea to use node for these kind of applications. Technically you can find alternatives to address those kind of tasks. The above example is better addressed in a Db. then passing that result is great.
In the same way avoid Intense task and keep your CPU cool for better performance
You can have a look at this package, the-computer, which may help you do some cpu intensive works in a single instance of node.js app in a simple way.
Definitely it is not as effective as raw c++ libs, but it can cover most general computing cases, keeping you in node.js garden while allowing you leverage the cores of the cup.
Node.js runs JavaScript code in a single thread, which means that your code can only do one task at a time. However, Node.js itself is multithreaded and provides hidden threads through the libuv library, which handles I/O operations like reading files from a disk or network requests. Through the use of hidden threads, Node.js provides asynchronous methods that allow your code to make I/O requests without blocking the main thread.
Although Node.js has hidden threads, you cannot use them to offload CPU-intensive tasks, such as complex calculations, image resizing, or video compression. Since JavaScript is single-threaded when a CPU-intensive task runs, it blocks the main thread and no other code executes until the task completes. Without using other threads, the only way to speed up a CPU-bound task is to increase the processor speed.
💡 Node.js introduced the worker-threads module, which allows you to create threads and execute multiple JavaScript tasks in parallel. Once a thread finishes a task, it sends a message to the main thread that contains the result of the operation so that it can be used with other parts of the code. The advantage of using worker threads is that CPU-bound tasks don’t block the main thread and you can divide and distribute a task to multiple workers to optimize it.
ref: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
Let's say I'm building a 3-tier web site, with Mongo DB on the back end and some really lightweight javascript in the browser (let's say just validation on forms, maybe a couple of fancy controls which fire off some AJAX requests).
I need to choose a technology for the 'middle' tier (we could segment this into sub-tiers but that detail isn't the focus here, just the overall technology choice), where I want to crunch some raw data coming out of the DB, and render this into some HTML which I push to the browser. A fairly typical thin-client web architecture.
My safe choice would be to just implement this middle tier in Java, using some libraries like Jongo to talk to the Mongo DB and maybe Jackson to marshal/unmarshal JSON to talk to my fancy controls when they make AJAX requests. And some Java templating framework for rendering my HTML on the server.
However, I'm really intrigued by the idea of throwing all this out the window and using Node.js for this middle tier, for the following reasons:
I like javascript (the good parts), and let's say for this application's business logic it would be more expressive than Java.
It's javascript everywhere. No need to switch between languages, and indeed the OO and functional paradigms, when working anywhere on the stack. There's no translation plumbing between the tiers, JSON is supported natively everywhere.
I can reuse validation logic on the client and server.
If in the future I decide to do the HTML rendering client-side in the browser, I can reuse the existing templates with something like Backbone with a pretty minimal refactoring / retesting effort.
If you're at this point and like Node, all the above will seem obvious. So I should choose Node right?
BUT... this is where it falls down for me: as we all know Node is based around a single-threaded async I/O web server model. This is great for my scalability and performance in terms of servicing requests for data, but what about my business logic? What about my template rendering? Won't this stuff cause a huge bottleneck for all requests on the single thread?
Two obvious solutions come to mind, but neither of them sits right:
Keep the 'blocking' business logic in there and just use a cluster of Node instances and a load balancer, to service requests in true parallel. Ok great, so why isn't Node just multi-threaded in the first place? Or was this always the idea, to Keep It Simple Stupid and avoid the possibility of multi-threaded complexity in the base case, making the programmer do the extra setup work on top of this if multi-core processing power is desired?
Keep a single node instance, and keep it non-blocking by just calling out to some java implementation of my business logic running on some other, muti-threaded, app server. Ok, this option completely nullifies every pro I listed of using Node (in fact it adds complexity over just using Java), other than the possible gains in performance and scalability for CRUD requests to the DB.
Which leads me finally to the point of my question - am I missing some huge important piece of the Node puzzle, have I just got my facts completely wrong, or is Node just unsuitable for crunching business logic on the server? Put another way, is Node just useful for sitting over a database and servicing many CRUD requests in a more performant and scalable way than some other implementation which blocks on I/O? And you have to do all your business logic in some tier below, or even client-side, to maintain any reasonable levels of performance and scalability?
Considering all the buzz over Node, I'd rather hoped it brought more to the table than this. I'd love to be convinced otherwise!
On any given system you have N cpus available (1-64, or whatever N happens to be). In any CPU-intensive application, you're going to be stuck with a throughput of N cpus. There's no magical way to fix that by adding more than N threads/processes/whatever. Either your code has to be more efficient, or you need more CPUs. More threads won't help.
One of the little-appreciated facts about multiple-CPU performance is that if you need to run N+1 CPU-intensive operations at the same time, your throughput per CPU goes down quite a bit. A CPU-intensive process tends to hang on to that CPU for a very long time before giving it up, starving the other tasks pretty badly. In the majority of cases, it is blocking I/O and the concomitant task-switching that makes modern OS multitasking work as well as it does. If more of our every-day common tasks were CPU-bound, we would discover we needed a lot more CPUs in our machines than we do currently.
The nice thing that Node.js brings to the server party efficiency-wise is a thorough use of each thread. Ideally, you end up with less task switching. This isn't a huge win, but having N threads handling N*C connections asynchronously is going to have a performance advantage over N*C blocking threads running on the same number of CPUs. But the bottom line on CPUs remains the same: if you have more than N worth of actual CPU work to be done, you're going to feel some pain.
The last time I looked at the Node.js API there was a way to launch a server with one listener plus one worker thread per CPU. If you can do that, I would be inclined to go with Node.js provided a few caveats are met:
The Javascript-everywhere approach buys you some simplicity. For something complicated, I would be concerned about the asynchronous programming style making things harder rather than easier.
The template-processing and other CPU-intensive tasks aren't appreciably slower in Node.js than your other language/platform choices.
The database drivers are reliable.
There is one downside that I can see:
If a thread crashes, you lose all of the connections being serviced by that thread.
Finally, try to remember that programmer time is generally more expensive than servers or bandwidth.