Should Node.js be used for intensive processing?

Let's say I'm building a 3-tier web site, with MongoDB on the back end and some really lightweight JavaScript in the browser (let's say just validation on forms, maybe a couple of fancy controls which fire off some AJAX requests).
I need to choose a technology for the 'middle' tier (we could segment this into sub-tiers but that detail isn't the focus here, just the overall technology choice), where I want to crunch some raw data coming out of the DB, and render this into some HTML which I push to the browser. A fairly typical thin-client web architecture.
My safe choice would be to just implement this middle tier in Java, using libraries like Jongo to talk to MongoDB and maybe Jackson to marshal/unmarshal JSON to talk to my fancy controls when they make AJAX requests, plus some Java templating framework for rendering my HTML on the server.
However, I'm really intrigued by the idea of throwing all this out the window and using Node.js for this middle tier, for the following reasons:
I like JavaScript (the good parts), and let's say that for this application's business logic it would be more expressive than Java.
It's JavaScript everywhere. No need to switch between languages, or indeed between the OO and functional paradigms, when working anywhere on the stack. There's no translation plumbing between the tiers; JSON is supported natively everywhere.
I can reuse validation logic on the client and server.
If in the future I decide to do the HTML rendering client-side in the browser, I can reuse the existing templates with something like Backbone with a pretty minimal refactoring / retesting effort.
If you're at this point and like Node, all the above will seem obvious. So I should choose Node right?
BUT... this is where it falls down for me: as we all know Node is based around a single-threaded async I/O web server model. This is great for my scalability and performance in terms of servicing requests for data, but what about my business logic? What about my template rendering? Won't this stuff cause a huge bottleneck for all requests on the single thread?
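To make the concern concrete, here is a minimal Node.js sketch (no dependencies) of what "blocking the single thread" means: a synchronous busy loop, standing in for heavy business logic or template rendering, delays even a 0 ms timer until it finishes. The helper names blockFor and measureTimerDelay are illustrative, not from any library; callbacks for other incoming requests would wait behind the loop in the same way.

```javascript
// Busy-waits for `ms` milliseconds, standing in for synchronous business logic
// or template rendering that never yields to the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Schedules a 0 ms timer, then runs `blockMs` of synchronous work.
// Resolves with how long the timer actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`a "0 ms" timer waited ${delay} ms`);
});
```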
Two obvious solutions come to mind, but neither of them sits right:
Keep the 'blocking' business logic in there and just use a cluster of Node instances and a load balancer to service requests in true parallel. OK, great, so why isn't Node just multi-threaded in the first place? Or was this always the idea: Keep It Simple, Stupid, avoid the possibility of multi-threaded complexity in the base case, and make the programmer do the extra setup work on top of this if multi-core processing power is desired?
Keep a single Node instance, and keep it non-blocking by just calling out to some Java implementation of my business logic running on some other, multi-threaded, app server. OK, this option completely nullifies every pro I listed for using Node (in fact it adds complexity over just using Java), other than the possible gains in performance and scalability for CRUD requests to the DB.
Which leads me finally to the point of my question - am I missing some huge important piece of the Node puzzle, have I just got my facts completely wrong, or is Node just unsuitable for crunching business logic on the server? Put another way, is Node just useful for sitting over a database and servicing many CRUD requests in a more performant and scalable way than some other implementation which blocks on I/O? And you have to do all your business logic in some tier below, or even client-side, to maintain any reasonable levels of performance and scalability?
Considering all the buzz over Node, I'd rather hoped it brought more to the table than this. I'd love to be convinced otherwise!

On any given system you have N CPUs available (1-64, or whatever N happens to be). In any CPU-intensive application, your throughput is capped at N CPUs' worth of work. There's no magical way to fix that by adding more than N threads/processes/whatever. Either your code has to be more efficient, or you need more CPUs. More threads won't help.
One of the little-appreciated facts about multiple-CPU performance is that if you need to run N+1 CPU-intensive operations at the same time, your throughput per CPU goes down quite a bit. A CPU-intensive process tends to hang on to that CPU for a very long time before giving it up, starving the other tasks pretty badly. In the majority of cases, it is blocking I/O and the concomitant task-switching that makes modern OS multitasking work as well as it does. If more of our every-day common tasks were CPU-bound, we would discover we needed a lot more CPUs in our machines than we do currently.
The nice thing that Node.js brings to the server party efficiency-wise is a thorough use of each thread. Ideally, you end up with less task switching. This isn't a huge win, but having N threads handling N*C connections asynchronously is going to have a performance advantage over N*C blocking threads running on the same number of CPUs. But the bottom line on CPUs remains the same: if you have more than N worth of actual CPU work to be done, you're going to feel some pain.
The last time I looked at the Node.js API there was a way to launch a server with one listener plus one worker thread per CPU. If you can do that, I would be inclined to go with Node.js provided a few caveats are met:
The Javascript-everywhere approach buys you some simplicity. For something complicated, I would be concerned about the asynchronous programming style making things harder rather than easier.
The template-processing and other CPU-intensive tasks aren't appreciably slower in Node.js than your other language/platform choices.
The database drivers are reliable.
There is one downside that I can see:
If a thread crashes, you lose all of the connections being serviced by that thread.
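The one-listener, one-worker-per-CPU setup mentioned above can be sketched with Node's built-in cluster module. This is a minimal sketch under assumptions: the startCluster name, the port and the respawn policy are illustrative, and cluster.isPrimary exists from Node 16 (older versions call it cluster.isMaster). Each worker is a separate process with its own event loop, so CPU-heavy rendering in one worker does not stall the others; the exit handler replaces a crashed worker, although that worker's in-flight connections are still lost.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Starts one worker process per CPU core; all workers share `port`.
// Node 16+ uses cluster.isPrimary (older versions: cluster.isMaster).
function startCluster(port, numWorkers = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < numWorkers; i++) {
      cluster.fork();
    }
    // A crashed worker loses its in-flight connections; replace it so the
    // pool stays at full size.
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork();
    });
  } else {
    http
      .createServer((req, res) => {
        // Heavy business logic / template rendering here blocks only this worker.
        res.end(`handled by worker ${process.pid}\n`);
      })
      .listen(port);
  }
}

module.exports = { startCluster };
```

Call startCluster(3000) from your entry script: cluster.fork() re-runs that same script, and the forked copies take the non-primary branch. A production version would also rate-limit restarts so a worker that crashes on startup does not respawn in a tight loop.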
Finally, try to remember that programmer time is generally more expensive than servers or bandwidth.

Related

How to keep a reactive UI while doing heavy computation with WebAssembly?

I use a C++ library doing heavy image processing at loading, compiled with emscripten and embedded in an angular application.
This code freezes the UI for a few seconds, which is never nice for the user.
I guess the two options here are
to split the heavy computation into several asynchronous calls
to use threads (WebWorkers)
Although I'm not sure how feasible each one is, depending on the computation code.
What are the advantages/disadvantages of each? What's the usual way to deal with heavy computation with JS/WASM?
I've done both.
Just asynchronous would still be on the main thread so it doesn't have any benefit in your situation. However, if you can split the processing into smaller chunks and feed them over time to a requestIdleCallback, it can be pretty effective. The downside of that is that you have no real control of when (if ever) it will finish. Depending on how critical the output is it might not be the best for you. The upside is that you do have access to all of the APIs and not just the ones available to web workers. Here's an example of implementation of "splitting into smaller chunks and feeding it to a requestIdleCallback", and here's how you use it.
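A minimal sketch of the "split into chunks and feed them to requestIdleCallback" approach, assuming the heavy work can be expressed as a per-item function (for the WASM case, that means the C++ side has to expose smaller steps rather than one long call). processWhenIdle and makeIdleScheduler are hypothetical helpers; where requestIdleCallback is missing (for example in Node.js) the sketch falls back to an imitation built on setTimeout.

```javascript
// Returns a function that schedules `cb(deadline)` during idle time.
function makeIdleScheduler(timeoutMs) {
  if (typeof requestIdleCallback === 'function') {
    // `timeout` makes the browser run the callback even if it never goes idle.
    return (cb) => requestIdleCallback(cb, { timeout: timeoutMs });
  }
  // Fallback where requestIdleCallback is missing: roughly 8 ms slices.
  return (cb) =>
    setTimeout(() => {
      const start = Date.now();
      cb({ timeRemaining: () => Math.max(0, 8 - (Date.now() - start)) });
    }, 0);
}

// Processes `items` a slice at a time so the page stays responsive.
// Resolves with every result once all items are done.
function processWhenIdle(items, processItem, timeoutMs = 1000) {
  const schedule = makeIdleScheduler(timeoutMs);
  return new Promise((resolve) => {
    const results = [];
    let index = 0;
    function work(deadline) {
      // Always handle at least one item, then continue while idle time remains.
      do {
        if (index < items.length) results.push(processItem(items[index++]));
      } while (index < items.length && deadline.timeRemaining() > 0);
      if (index < items.length) schedule(work);
      else resolve(results);
    }
    schedule(work);
  });
}
```

The timeout option is one way to soften the "no real control of when (if ever) it will finish" downside: the browser runs the callback after that many milliseconds even if the page is never idle, and because each callback handles at least one item, the job keeps progressing.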
Using a web worker thread has the big upside of completely freeing up the main thread, which will allow you to get the result faster. The downsides are that you only have access to the web worker APIs, you have to find a way to communicate your output back to the main thread (if necessary), and it will still eat up a lot of CPU (meaning that while the main thread is technically "free" it might still be slowed down). If you can split your task into smaller chunks, you could even spawn more than one thread and get a smoother / faster user experience.
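For the web worker route, most of the extra work is the communication with the main thread. A minimal sketch, assuming a hypothetical worker script (image-worker.js) that answers each request with exactly one message: runInWorker wraps one round trip in a Promise and transfers ArrayBuffers rather than copying them, which helps when passing large image data back and forth.

```javascript
// Main thread: wrap one request/response round trip with a worker in a Promise.
// Assumes the worker replies with exactly one message per request.
function runInWorker(worker, input, transfer = []) {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => resolve(event.data);
    worker.onerror = (event) => reject(event);
    // ArrayBuffers listed in `transfer` are moved to the worker instead of copied.
    worker.postMessage(input, transfer);
  });
}

// Usage in the browser (image-worker.js is hypothetical):
//   const worker = new Worker('image-worker.js');
//   const result = await runInWorker(worker, pixels.buffer, [pixels.buffer]);
//
// image-worker.js would run the Emscripten-compiled code and reply once:
//   self.onmessage = (event) => {
//     const output = processImage(event.data); // hypothetical WASM entry point returning an ArrayBuffer
//     self.postMessage(output, [output]);       // transfer the result back
//   };
```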
main thread
might freeze the page (or alternatively take a long time / never finish)
gives you access to all of the APIs
easier to start with, a little more complicated once you get into idle-until-urgent designs
passing data to / from won't be a bottleneck
web worker
frees up the main thread (but might still slow down the client as a whole)
not all APIs are available
pretty straightforward once you get a handle on communicating w/ the main thread
gets complicated if you try to spawn more workers to work in parallel
if your output isn't straightforward to structure as a SharedArrayBuffer, passing big amounts of data might be an issue / a bottleneck

How does node js do it better?

How and when is the single-threaded asynchronous processing model of Node.js a better approach than the multithreaded approach of well-known server platforms like PHP, Java and C#? Can someone please explain this simply and clearly?
My question is: technically, how is the single-threaded asynchronous processing model a better approach?
Grasping the Node JS alternative to multithreading
Node.js was created explicitly as an experiment in async processing. The theory was that doing async processing on a single thread could provide more performance and scalability under typical web loads than the typical thread-based implementation.
The single threaded, async nature does make things complicated. But do you honestly think it's more complicated than threading? One race condition can ruin your entire month! Or empty out your thread pool due to some setting somewhere and watch your response time slow to a crawl! Not to mention deadlocks, priority inversions, and all the other gyrations that go with multithreading.
But is it really single-threaded? Read this article: https://softwareengineeringdaily.com/2015/08/02/how-does-node-js-work-asynchronously-without-multithreading/
Node.js is built on top of Google's V8 engine, which in turn compiles JavaScript. As many of you already know, JavaScript is asynchronous in nature. Asynchronous is a programming pattern that provides non-blocking code, i.e. a line of code does not stop and wait for another function or process to finish before execution moves on. Asynchronous code is great in terms of performance, resource utilization and system throughput. But there are some drawbacks:
It is very difficult for a programmer used to legacy, synchronous code to switch to async.
Handling control flow is really painful.
Callbacks are dirty.
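The control-flow and callback complaints above are mostly about nesting. A minimal sketch, using two hypothetical callback-style functions (findUser, findOrders) as stand-ins for a database driver, of the same two dependent steps written with callbacks and then with util.promisify (a real Node.js built-in) plus async/await.

```javascript
const util = require('util');

// Hypothetical callback-style data access, standing in for a DB driver.
function findUser(id, callback) {
  setImmediate(() => callback(null, { id, name: `user${id}` }));
}
function findOrders(userId, callback) {
  setImmediate(() => callback(null, [{ userId, total: 42 }]));
}

// Callback style: each dependent step nests one level deeper,
// and every level repeats its own error check.
function loadDashboardWithCallbacks(id, callback) {
  findUser(id, (err, user) => {
    if (err) return callback(err);
    findOrders(user.id, (err2, orders) => {
      if (err2) return callback(err2);
      callback(null, { user, orders });
    });
  });
}

// Same flow with promises: reads top to bottom, errors propagate as rejections.
const findUserAsync = util.promisify(findUser);
const findOrdersAsync = util.promisify(findOrders);

async function loadDashboard(id) {
  const user = await findUserAsync(id);
  const orders = await findOrdersAsync(user.id);
  return { user, orders };
}
```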
NodeJS is single threaded and it is not a deterrent or a performance block really. The single threaded event loop is super efficient and is much less complicated than deploying effective multithreading. Multi-threading does not always mean better performance.
Having said that, if you do need to handle heavy concurrency, then you can employ the services of the cluster module which splits multiple NodeJS processes across available CPU cores, all the while maintaining a link with a master process which can be used to control/offload processing tasks.
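The cluster module duplicates whole Node.js processes. For offloading a single CPU-heavy task from the event loop, Node.js 12 and later also ship the built-in worker_threads module; this is a different mechanism from cluster, named here as an alternative. A minimal sketch: the worker code is inlined with the eval option only to keep everything in one file, and sumOfSquaresOffThread is a hypothetical stand-in for real business logic.

```javascript
const { Worker } = require('worker_threads');

// Code for the worker thread. In a real project this would usually live in its own file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Deliberately CPU-bound: sum of squares from 1 to n.
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i * i;
  parentPort.postMessage(total);
`;

// Runs the CPU-bound loop on another thread; the main event loop stays free.
function sumOfSquaresOffThread(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

sumOfSquaresOffThread(1000).then((total) => console.log(total)); // 333833500
```

Starting a worker has a cost of its own, so for many small tasks a pool of long-lived workers is the usual refinement of this sketch.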
Node was built from the ground up with asynchronicity in mind, leveraging the event loop of JavaScript. It can handle a lot of requests quickly by not waiting around for the request when there are certain kinds of work being done for the request, such as database requests.
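Imagine you have a database operation that takes 10 seconds to complete, represented here by a setTimeout:
```javascript
const express = require('express');

const app = express();
const router = express.Router();

// Simulates a database operation that takes 10 seconds to complete.
router.route('/api/delayed').get(function (req, res) {
  setTimeout(function () {
    res.end('foo');
  }, 10000);
});

// Answers immediately, even while /api/delayed requests are still waiting.
router.route('/api/immediate').get(function (req, res) {
  res.end('bar');
});

app.use(router);
app.listen(3000); // port chosen arbitrarily for the example
```
In a back end framework that does not support asynchronous execution, this situation is an anti-pattern: the server hangs while it waits for the database operation to complete before fulfilling the request. Node, by contrast, fires off the operation and immediately returns, ready to field the next incoming request (such as a call to /api/immediate). Once the operation finishes, its callback is handled in an upcoming cycle of the event loop and the request gets fulfilled.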
As long as we only write non-blocking code, our Node server can handle concurrent requests more efficiently than backends that block on I/O.
After reading about it in the book Web Development with MongoDB and Node.js, 2nd Edition, by Mithun Satheesh, Jason Krol and Bruno Joseph D'mello, I finally came across a clear advantage:
To understand this, we should understand the problem that Node.js tries to resolve. It tries to do asynchronous processing on a single thread to provide more performance and scalability for applications that are supposed to handle too much web traffic. Imagine web applications that handle millions of concurrent requests; if the server makes a new thread for handling each request that comes in, it will consume a lot of resources and we would end up trying to add more and more servers to increase the scalability of the application.
The single threaded asynchronous processing model has its advantage in the previous context, and you can process much more concurrent requests with less number of server-side resources.
In other words, the single-threaded model lets you process many more concurrent requests with fewer server-side resources.
My 2 pence worth... I am not sure "if the single-threaded approach of Node.js is better". Simply put, Node.js does not support multi-threading, which translates loosely to "everything runs in a single thread". Now, I am not quite sure how it can "compare" to a multi-threaded system, as a multi-threaded system can run both a single thread (like Node.js) and multiple threads. It's all in your application design, and the platform capabilities that are available to you.
What is more important, in my opinion, is the ability to support multi-tasking in an asynchronous way. Node.js does provide support for multi-tasking, in a simplified and easy-to-use package. It does have limitations due to the lack of native support for multi-threading. To take advantage of multi-tasking (and not worry... much about multi-threading), think along the lines of designing your server-side application as performing little chunks of work over a long period of time, where each chunk of work is invoked by, and consumes, events generated from the client side. Think event-driven design/architecture (simple switch/case statements, for loops, callbacks, and data checkpointing to files or a database can do the trick). And I will bet my tiny dollar that if you get your application to work in this fashion, sans multi-threading, it will be a much better design, more robust, and if you migrate it (and adapt it for multi-threading) it will run like it's on a SpaceX booster!
While multi-threading is always a plus for a server-side implementation, it is also a powerful beast that requires a lot of experience and respect to tame and harness (something that Node.js shields/protects you from).
Another way to look at it is this: multi-tasking is a perspective (running several tasks) at the application level, while multi-threading is a perspective at a lower level. Multi-tasking can be mapped onto different implementations, with multi-threading being one of them.
Multi-threading capability
Truth: Node.js (currently) does not provide native support for multi-threading in the sense of low-level execution/processing threads. Java, and its implementations/frameworks, provides native support for multi-threading, and extensively too (pre-emption, multi-tenancy, synchronous multi-threading, multi-tasking, thread pools, etc.).
Pants on Fire(ish): the lack of multi-threading in Node.js is a show stopper. Node.js is built around an event-driven architecture, where events are produced and consumed as quickly as possible. There is native support for function callbacks. Depending on the application design, this high-level functionality can support what could otherwise be done by threads.
For server-side applications, what is important at the application level is the ability to perform multiple tasks concurrently, i.e. multi-tasking. There are several ways to implement multi-tasking, multi-threading being one of them and a natural fit for the task. That said, the concept of “multi-threading” is a low-level platform aspect. For instance, a multi-threaded platform such as Java, running on a single-core server (a server with one CPU core), still supports multi-tasking at the application level, mapped to multi-threading at the low level, but in reality only a single thread can execute at any one time. On a multi-core machine with, say, 4 cores, the same application-level multi-tasking is supported, with up to 4 threads executing simultaneously at any given time. The point is, in most cases, what really matters is support for multi-tasking, which is not always synonymous with multi-threading.
Back to Node.js: the real discussion should be about application design and architecture, and more specifically, support for MULTI-TASKING. In general, there is a whole paradigm shift between server-side applications and client-side or standalone applications, especially in terms of design and process flow. Among other things, server-side applications need to run alongside other applications (on the server), need to be resilient and self-contained (not affect the rest of the server when the application fails or crashes), need to perform robust exception handling (i.e. recover from errors, even critical ones) and need to perform multiple tasks.
Just the ability to support multi-tasking is a critical capability for any server-side technology, and Node.js has this capability, presented in a very easy-to-use package. This all means that the design of server-side applications needs to focus more on multi-tasking, and less on just multi-threading. Granted, working on a server-side platform that supports multi-threading has its obvious benefits (enhanced functionality, performance), but that alone does not resolve the need to support multi-tasking at the application level. Any solid design for server-side applications, Node.js ones included, must be based on multi-tasking through event generation and consumption (event processing). In Node.js, the use of function callbacks and small event processors (as functions), with data checkpointing (saving processing data in files or databases) across event-processing instances, is key.
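A minimal sketch of that style in Node.js: a long job split into small chunks, each run as its own event via setImmediate so other callbacks can interleave, with progress checkpointed after every chunk. The in-memory checkpoints Map is a hypothetical stand-in for the file or database mentioned above, and runChunkedJob is an illustrative name. Because it reads the last checkpoint on start, rerunning a job with the same id resumes where it stopped.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory stand-in for a file or database checkpoint store.
const checkpoints = new Map();

// Sums processItem(item) over `items`, `chunkSize` items per event-loop turn.
// Emits 'progress' after each chunk and 'done' with the final total.
function runChunkedJob(jobId, items, chunkSize, processItem) {
  const events = new EventEmitter();
  // Resume from the last checkpoint if one exists.
  let { index, total } = checkpoints.get(jobId) ?? { index: 0, total: 0 };

  function step() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) total += processItem(items[index]);
    checkpoints.set(jobId, { index, total }); // checkpoint after every chunk
    events.emit('progress', index, items.length);
    if (index < items.length) setImmediate(step); // yield to other events
    else events.emit('done', total);
  }

  setImmediate(step); // start after the caller has attached listeners
  return events;
}
```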
What else for Node.js vs Java
A whole lot more! Think scalability, code management, feature integration, backward and forward compatibility, return on investment, agility, productivity…
…to cut down on the “verbosity” of this article (pun intended :)), we will leave it at this for now :)
Whether you agree or not, please shoot the messenger (Quora) and not the opinions!

How CPU-intensive is too much for Node.js (worried about blocking the event loop)

I am writing a Node app with various communities, and within those communities users are able to create and join rooms/lobbies. I have written the logic for these lobbies into the Node app itself, through a collection of lobby objects.
Lobbies require some maintenance once created. Users can change various statuses within the lobby and I also have calls using socket.io at regular intervals(about every 2 seconds) for each lobby to keep track of some user input "live".
None of the tasks are too cpu intensive. The biggest potential threat I foresee is a load distributing algorithm but it is not one of the "live calls" and is only activated upon a button press by the lobby creator (it also is never performed on more than 10 things).
My concern is that, in production, if the server starts to get close to around 100-200 lobbies I could be risking blocking the event loop. Are my concerns reasonable? Is the potential quantity of these operations, although they are small, large enough to warrant offloading this code to a separate executable or getting involved with various franken-thread JavaScript libs?
TL;DR: my Node app has objects that each run small tasks at regular intervals. Should I worry about event-loop blocking if many of these objects are created?
There is no way to know ahead of time whether what you describe will "fill" up the event loop and take all the time one thread has or not. If you want to "know", you will have to build a simulation and measure while using commensurate hardware with what you expect to use in production.
With pretty much all performance questions, you have to measure, measure and measure again to really know or understand whether you have a problem and, if so, what is the main source of the problem.
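For the measuring itself, Node.js has a built-in event-loop delay histogram, monitorEventLoopDelay in the perf_hooks module. A minimal sketch, assuming you run it while a load simulation is hitting the server; if the p99 or max delay climbs well above the sampling resolution, something is hogging the thread. sampleLoopDelay is an illustrative name.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Samples event-loop delay for `durationMs` and reports it in milliseconds.
// The histogram records nanoseconds, hence the divisions by 1e6.
function sampleLoopDelay(durationMs, resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return new Promise((resolve) => {
    setTimeout(() => {
      histogram.disable();
      resolve({
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      });
    }, durationMs);
  });
}
```

For example, calling sampleLoopDelay(1000).then(console.log) every few seconds during the simulation gives a running picture of how close the single thread is to saturation.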
For non-compute intensive things, your CPU can probably handle a lot of activity. If you get a lot of users all pounding transactions every two seconds though, you could end up with a bottleneck somewhere that causes issues. 200 users with a transaction every 2 seconds means 100 transactions/sec which means if you take more than 10ms of CPU or of any other serialized resource (like perhaps the network card) per transaction, then you may have issues.
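The arithmetic above generalizes to a simple budget: transactions per second is users divided by the interval, and the CPU time available per transaction is the number of cores in use times 1000 ms divided by that rate. A tiny sketch (cpuBudgetMs is an illustrative helper) that reproduces the 10 ms figure and shows what clustering across, say, 4 cores would buy.

```javascript
// Milliseconds of CPU time available per transaction before the process saturates.
// `cores` is how many Node processes share the load (1 for a single process).
function cpuBudgetMs(users, intervalSeconds, cores = 1) {
  const transactionsPerSecond = users / intervalSeconds;
  return (cores * 1000) / transactionsPerSecond;
}

console.log(cpuBudgetMs(200, 2)); // 10 (ms, one process)
console.log(cpuBudgetMs(200, 2, 4)); // 40 (ms, clustered across 4 cores)
```

The same budget applies to any other serialized resource, such as the network card mentioned above.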
As for offloading some work to another process, I wouldn't spend much time worrying about that until you've measured whether you have an issue or not. If you do, then it will be very important to understand what the main cause of the issue is. It may make more sense to simply cluster your node.js processes to put multiple processes on the same server than it does to break your core logic into multiple processes. It will all depend upon what the main cause of your issue is (if you even have an issue). Or, you may end up needing multiple network cards. Or something else. It's far too early to know before measuring and understanding.

What are some advantages to using a Javascript-based server over a non-Javascript server?

Disclaimer: This question will be a bit open-ended. I also expect replies to be partly based off of developer preference.
I've been doing some research lately on Express.js (coupled with Node.js) and I'm struggling to see how I would fit either of these technologies into my current workflow for developing websites. Lately I've been working in either WordPress or Ruby on Rails; the former runs on Apache, the latter runs on its own proprietary server (I assume).
Now perhaps I'm just not understanding something, but I fail to see the advantages of enlisting the support of a JavaScript-based framework/server. If there are clear-cut advantages to making this part of my workflow, what would they be? I haven't been able to find any ways to fit this into (say) a Rails application or a WordPress site. Could someone point me in the direction of some better help on implementing these technologies on top of the ones I already use?
One last question, what happens if someone has Javascript disabled in their browser? How would a Javascript-based server react (if at all)?
There are two big differences:
Event loop
Node.js is a bit different from the usual Apache concept, because of the way it handles connections. Instead of having synchronous connections, Node uses an event loop to have non-blocking operations. Note that this is not unique to Javascript and there are C and Python based frameworks that also enable a similar event loop approach, however in Javascript it's probably the most natural feeling since this is how JS has worked since it was introduced.
Supposedly, this should enable you to handle more concurrent clients. However, it hasn't had as much real world exposure as the regular blocking solutions so this approach isn't as mature as most current implementations. The actual performance difference is questionable as it depends on the exact requirements for the application.
Code Sharing
This point is much less controversial than the previous one, but in essence if you have the same language on both the client and the server, you can reuse a lot of code instead of having to rewrite your data structures etc. in multiple languages, saving you a lot of development time. However, you have to understand that the concepts of server-side JS are different from what you know in the browser; for example, you don't manipulate the page dynamically with jQuery or Prototype, and its results and use cases are more similar to what PHP is widely used for.
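As a concrete, deliberately small example of that code sharing, here is a minimal sketch of one validation function that can be loaded unchanged by a browser page (as a global via a script tag) and by Node.js (via module.exports). validateSignup and its rules are hypothetical.

```javascript
// validation.js: hypothetical shared rules, loadable by a <script> tag and by require().
function validateSignup(form) {
  const errors = {};
  // Deliberately simple email shape check: something@something.something
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(form.email || '')) {
    errors.email = 'Please enter a valid email address.';
  }
  if (!form.password || form.password.length < 8) {
    errors.password = 'Password must be at least 8 characters.';
  }
  return errors; // an empty object means the form is valid
}

// In Node.js, export it; in the browser, validateSignup is simply a global function.
if (typeof module !== 'undefined' && module.exports) {
  module.exports = { validateSignup };
}
```

The browser runs it for instant feedback and the server runs the same checks again, since client-side validation can always be bypassed.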
The primary advantage of having Javascript as your server-side language is that you're writing your whole system in a single language.
This has advantages in terms of:
Learning curve and mental context switching for the developer
and it also provides some possibility of sharing code between the two environments.
However, this last point is less helpful than it sounds, for a number of reasons, and this is where the disadvantages come in:
Not much code can actually be shared, because of the differences in what the two environments are actually doing. Maybe a bit of validation code, so you know that you're doing the same checks on both client and server, and maybe a few utility functions, but that's about it.
The libraries you use will be completely different between the two as well: jQuery only works on the client, and Node has libraries that are server-specific.
So as a developer, you still need to mentally context switch between environments, because the two environments are different. They may share a language, but their modes of operation are different, and what they do is different. In fact, sharing the language between the two can actually make it harder to context switch, and this can lead to errors.
Finally, it's worth bearing in mind that while Node is getting lots of attention from the developer community, it is still new and evolving quickly: if you're a business considering it as a development platform, it's probably not yet stable enough to base a major project on.

mootools: I want to implement architecture similar to Big pipe in Facebook

I am developing an application in MooTools. I have used the Request class to implement pipelining.
I want to develop a superior method to handle client-server requests. I referred to the following article to understand how BigPipe works at Facebook.
http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
On Facebook, a JavaScript function is called on arrival of each server response to update the data on the user's screen (see the screenshot).
http://img815.imageshack.us/img815/5154/facebookna.jpg
If I get a basic model of such an architecture, I can start building the application using that code.
Can someone please provide me with such a basic model?
So far I have designed an architecture in which response_data is stored in a global variable and then a function is called to update the data on the user's screen (using synchronous Requests here), which is very slow.
So which method is superior, synchronous or asynchronous?
Firstly, thanks for the read, it was a very interesting blog post.
You may want to look into this library, which was inspired by Facebook's BigPipe. Note: I'm not endorsing it, as I've never used it, but building it yourself is not trivial.
With regard to whether synchronous or asynchronous is better, that depends. Synchronous is simpler: the dependencies are obvious, and there's no overhead. Asynchronous is only an advantage if your resources are not fully utilised, and your processing can be easily broken down into independent blocks. I can't tell what you're trying to do, so you need to decide for yourself where the performance bottleneck actually is, and whether architecting your application so that multiple sections can be downloaded, processed and rendered in parallel will actually provide an advantage.
As an example, if you're downloading a single, massive block of data to be rendered as a table in the browser, then breaking that data into multiple parallel downloads will improve performance - at the cost of creating some queuing system to deal with out-of-order responses. On the other hand, though technically slower, batching the download into synchronous blocks so that one block is downloaded and rendered before the next one is requested, will still do wonders to perceived performance, and is a much simpler alternative.
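A minimal sketch of the two strategies, independent of MooTools: fetchPagelet and render are hypothetical functions you would supply (for example, fetchPagelet could wrap a Request to a per-section URL, and render could write the returned HTML into that section's placeholder element). Because each pagelet renders into its own placeholder, the parallel version can render responses in whatever order they arrive, which is the BigPipe-like behaviour; content that must appear in order, like the table above, needs the sequential version or a reordering queue.

```javascript
// Parallel (BigPipe-like): request every pagelet at once and render each one
// as soon as its response arrives, in whatever order that happens.
function loadPageletsInParallel(ids, fetchPagelet, render) {
  return Promise.all(
    ids.map((id) => fetchPagelet(id).then((html) => render(id, html)))
  );
}

// Sequential: request one block, render it, then request the next.
// Slower overall, but order is guaranteed and there is no queue to manage.
async function loadPageletsInSequence(ids, fetchPagelet, render) {
  for (const id of ids) {
    render(id, await fetchPagelet(id));
  }
}

// Browser usage with hypothetical endpoints and placeholder elements:
//   const fetchPagelet = (id) => fetch('/pagelet/' + id).then((r) => r.text());
//   const render = (id, html) => { document.getElementById(id).innerHTML = html; };
//   loadPageletsInParallel(['header', 'feed', 'sidebar'], fetchPagelet, render);
```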
