Moving node.js server javascript processing to the client

I'd like some opinions on the practical implications of moving processing that would traditionally be done on the server to be handled instead by the client in a node.js web app.
Example case study:
The user uploads a CSV file containing a year's worth of their bank statement entries. We want to parse the file, categorise each entry and calculate cumulative values for each category so that we can store the newly categorised statement in a db and display spending analysis to the user.
The entries are categorised by matching strings in the descriptions. There are many categories and many entries and it takes a fair amount of time to process.
In our node.js server, we can happily free up the event loop whilst waiting for network responses and so on, but if there is any data crunching or similar processing, the server will be blocked from responding to requests, and this seems unavoidable.
Traditionally, the CSV file would be passed to the server, the server would process, save in db, and send back the output of the processing.
It seems to make sense in our single threaded node.js server that this processing is handled by the browser, and the output displayed and sent to server to be stored. Of course the client will have to wait while this is done, but their processing will not be preventing the server from responding to requests from other clients.
I'm interested to see if anyone has had experience building apps using this model.
So, the question is: are there any issues in getting browsers rather than the server to handle, wherever possible, any processing that will block the event loop? Is this a good/sensible/viable approach to node.js application development?

I don't think trusting client-processed data is a good idea.
Instead you should look into creating a work queue that a separate process listens on, separating the CPU intensive tasks from your node.js process handling HTTP requests.
My proposed data flow would be:
HTTP upload request
App server (save raw file somewhere the worker process can access)
Notification to 'csv' work queue
Worker processes uploaded csv file.
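The last step of that flow, the worker processing the uploaded file, could look roughly like the sketch below. It follows the question's description (match strings in the descriptions, accumulate a total per category). The rule table, the `date,description,amount` column layout, and the function names are assumptions made for illustration, and the CSV parsing is deliberately naive (no quoted commas).

```javascript
// Hypothetical rules: category name -> substrings to look for in a description.
const RULES = {
  groceries: ['tesco', 'sainsbury'],
  transport: ['uber', 'rail'],
};

// Return the first category whose keywords appear in the description.
function categorise(description, rules = RULES) {
  const text = description.toLowerCase();
  for (const [category, keywords] of Object.entries(rules)) {
    if (keywords.some((keyword) => text.includes(keyword))) return category;
  }
  return 'uncategorised';
}

// Parse "date,description,amount" lines and accumulate a total per category.
function processStatement(csvText, rules = RULES) {
  const totals = {};
  const entries = [];
  for (const line of csvText.split(/\r?\n/)) {
    if (!line.trim()) continue; // skip blank lines
    const [date, description, amount] = line.split(',');
    const category = categorise(description, rules);
    const value = Number(amount);
    totals[category] = (totals[category] || 0) + value;
    entries.push({ date, description, amount: value, category });
  }
  return { entries, totals };
}
```

Because this runs in a process the server controls, the stored categories and totals never depend on what a client claims to have computed.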

Although perfectly possible, simply shifting the processing to the client machine does not solve the basic problem.
Now the client's event loop is blocked, preventing the user from interacting with the browser. Browsers tend to detect this problem and stop execution of the page's script altogether, which is something your users will certainly hate.
There is no way around either delegating or splitting up the work-load.
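Splitting up the work-load can be sketched as follows: process the data in small chunks and hand control back to the event loop between chunks. The helper names and the default chunk size are assumptions. `setImmediate` is Node-specific; in a browser, `setTimeout(resolve, 0)` would play the same role.

```javascript
// Yield to the event loop so pending I/O callbacks get a chance to run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Cut an array into consecutive slices of at most `size` items.
function toChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Apply `fn` to every item, pausing after each chunk instead of blocking throughout.
async function mapInChunks(items, fn, size = 500) {
  const results = [];
  for (const chunk of toChunks(items, size)) {
    for (const item of chunk) results.push(fn(item));
    await yieldToEventLoop();
  }
  return results;
}
```

This keeps the process responsive but does not make the total work any cheaper; it still all runs on one core.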
Using a second process (for example a 2nd node instance) for doing the number crunching server-side has the added benefit of allowing the operating system to use a 2nd CPU core. Ideally you run as many Node instances as you have CPU cores in the server and balance your work-load between them. Have a look at the diode module for some inspiration on how to implement multi-process communication in node.
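As a rough illustration of spreading number crunching over the available cores, the sketch below uses Node's built-in `worker_threads` module rather than separate Node instances, because that keeps the example in a single file; `child_process.fork` with a worker script would follow the same shape. The summing is only a stand-in for real work, and all function names are assumptions.

```javascript
const os = require('node:os');
const { Worker } = require('node:worker_threads');

// Code run inside each worker thread; the sum stands in for real number crunching.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  const total = workerData.reduce((sum, n) => sum + n, 0);
  parentPort.postMessage(total);
`;

// Deal items out round-robin into `parts` shares, one share per worker.
function splitIntoShares(items, parts) {
  const shares = Array.from({ length: parts }, () => []);
  items.forEach((item, i) => shares[i % parts].push(item));
  return shares;
}

// Run one share in its own thread and resolve with that worker's result.
function runShare(share) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: share });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Fan the work out over one worker per CPU core and combine the partial results.
async function crunch(items, workers = Math.max(1, os.cpus().length)) {
  const partials = await Promise.all(splitIntoShares(items, workers).map(runShare));
  return partials.reduce((sum, n) => sum + n, 0);
}
```

A call such as `crunch([1, 2, 3, 4]).then(console.log)` leaves the main event loop free while the workers run.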

Related

Node.js CPU Load balancing Websocket Client over Multiple CPU Cores

My node.js app currently subscribes to a number of websocket servers that are starting to push a lot of data over to the app every second. This websocket client app has several event handlers that do some work upon receiving the websocket data.
However, Node.js appears to be using only 1 CPU core at any one time, leaving the remaining cores underutilized. This is expected as Node.js uses a single-threaded event loop model.
Is it possible to load balance the incoming websocket data handling over multiple CPU cores? I understand that Node Cluster and pm2 Cluster Mode are able to load balance if you are running websocket servers, but how about websocket clients?
From the client side, I can think of the following options:
Create some child processes (likely one for each CPU core you have) and then divide your webSocket connections among those child processes.
Create node.js WorkerThreads (likely one for each CPU core you have) and then divide your webSocket connections among those WorkerThreads.
Create node.js WorkerThreads (likely one for each CPU core you have) and create a work queue where each incoming piece of data from the various webSocket connections is put into the work queue. Then, the WorkerThreads are regularly dispatched data from the queue to work on. As they finish a piece of data, they are given the next piece of data from the queue and so on...
How to best solve this issue really depends upon where the CPU time is mostly being spent. If it is the processing of the incoming data that is taking the most time, then any of these solutions will help apply multiple CPUs to that task. If it is the actual receiving of the incoming data, then you may need to move the incoming webSockets themselves to a new thread/process as in the first two options.
If it's a system bandwidth issue due to the volume of data, then you may need to increase the bandwidth of your network connection or may need multiple network adapters involved.
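The third option (a shared work queue feeding a pool of workers) can be sketched as a small dispatcher. The class and method names are hypothetical; the only thing assumed about a worker is that it has a `postMessage` method, which is what a `worker_threads` `Worker` provides.

```javascript
// Feeds queued messages to idle workers; each worker handles one item at a time.
class WorkDispatcher {
  constructor(workers) {
    this.queue = [];
    this.idle = [...workers];
  }

  // Call this from each webSocket 'message' handler.
  submit(data) {
    this.queue.push(data);
    this.drain();
  }

  // Call this when a worker reports that it has finished its current item.
  release(worker) {
    this.idle.push(worker);
    this.drain();
  }

  // Hand out queued items while both work and idle workers are available.
  drain() {
    while (this.queue.length > 0 && this.idle.length > 0) {
      const worker = this.idle.shift();
      worker.postMessage(this.queue.shift());
    }
  }
}
```

With real worker threads, each worker would post a message back when done and the main thread would wire that up with something like `worker.on('message', () => dispatcher.release(worker))`.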

Nginx and Node.js server - multiple tasks

UPDATE
I have a few questions about the combination of Nginx and Nodejs.
I've used Nodejs to create my server and now I'm facing an issue with locking the server for certain actions (writing, removing, etc.).
We are using Redis to lock the server when there are requests to the server, for example if a new user is doing a sign up action all the rest of the requests are waiting until the process is done, or if there is another process (longer one) all the other requests will wait longer.
We thought about creating a Load balancer (using Nginx) that will check if the server is locked, and if the server is locked it will open a new task and won't wait until the first process is done.
I used this tutorial and created a dummy server, but then I struggled with how to implement this functionality of opening new ports.
I'm new to load balancing implementation and I will be happy to hear your thoughts and help.
Thank you.
The gist of it is that your server needs to not crash if more than one connection attempt is made to it. Even if you use NGINX as a load balancer and have five different instances of your server running...what happens when six clients try to access your app at once?
I think you are thinking about load balancers slightly wrong. There are different load balancing methods, but the simplest one to think about is "round robin" in which each connection gets forwarded to the next server in the list (the rest are just more robust and complicated versions of this one). When there are no more servers to forward to, the next connection gets forwarded to the first server again (whether or not it is done with its last connection) and the circle starts over. Thus, load balancers aren't supposed to manage "unique connections" from clients...they are supposed to distribute connections among servers.
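The round-robin idea described above fits in a few lines. This is only an illustration of the selection logic, not how NGINX implements it, and the backend addresses are made up.

```javascript
// Returns a function that cycles through the given backends in order,
// wrapping back to the first one after the last, whether or not it is busy.
function roundRobin(backends) {
  let next = 0;
  return () => {
    const chosen = backends[next];
    next = (next + 1) % backends.length;
    return chosen;
  };
}

// Hypothetical backend list; each call to pick() yields the next server.
const pick = roundRobin(['127.0.0.1:3001', '127.0.0.1:3002', '127.0.0.1:3003']);
```

Note that the selector never asks whether a backend is "locked": the fourth connection simply goes to the first server again.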
Your server doesn't necessarily need to accept connections and handle them all at once. But it needs to at least allow connections to queue up without crashing, and then accept and deal with each one by one.
You can go the route you are discussing. That is, you can fire up a unique instance of your server...via Heroku or other...for every single connection that is made to your app. But this is not efficient and will ultimately create more work for you in trying to architect a system that can do that well. Why not just fix your server?

Node.js chat without Socket.IO

I just started learning Node.js and as I was learning about the fs.watchFile() method, I was wondering if a chat website could be efficiently built with it (and fs.writeFile()), against for example Socket.IO which is stable, but I believe not 100% stable (several fallbacks, including flash).
Using fs.watchFile could perhaps also be used to keep histories of the chats quite simply (as JSON would be used on the spot).
The chat files could be formatted in JSON in such a way that only the last chatter's message is brought up to the DOM (or whatever to make it efficient to 'fetch' messages when the file gets updated).
I haven't tried it yet as I still need to learn more about Node, and even more to be able to compare it with Socket.IO, but what's your opinion about it? Could it be an efficient/stable way of doing chats?
fs.watchFile() can be used to watch changes to the file in the local filesystem (on the server). This will not solve your need to update all clients' chat messages in their browsers. You'll still need web sockets, AJAX or Flash for that (or socket.io, which handles all of those).
What you could typically do in the client is to try to use Web Sockets. If the browser does not support them, try to use XMLHttpRequest. If that fails, fall back to Flash. It's a lot of programming to do, and it has to be handled by the node.js server as well. Socket.io does that for you.
Also, socket.io is pretty stable. The fallback to Flash is not due to its instability but due to the lack of browser support for better solutions (like Web Sockets).
Storing chat files in flatfile JSON is not a good idea, because if you are going to manipulate the files, you would have to parse and serialize entire JSON objects, which would become very slow as the size of the JSON object increased. The watch methods for the filesystem module also don't work on all operating systems.
You also can't compare Node.js to Socket.IO because they are entirely different things. Socket.IO is a Node module for realtime transport between the browser and the server. What you need is dependent on what you're doing. If you need chat history, then you should be using a database such as MongoDB or MySQL. Watching files for changes is not an efficient way and you should just send messages as they are received.
In conclusion no, using fs.watchFile() and fs.writeFile() is a very bad idea, because race conditions would occur due to concurrent file writes, besides that fs.watchFile() uses polling to check if a file has changed. You should instead use Socket.IO and push messages to other clients / store them in a database as they are received.
You can use the long polling method, using JavaScript's setTimeout and setInterval.
Long polling
Basically, long polling works on an Ajax request and the server's response time.
The server responds after a certain time (say, 50 seconds) if there is no notification or message; otherwise it responds with the data straight away. When the client gets a response, the client-side JavaScript makes another request for new updates and waits for the response. This process repeats for as long as the server is running.
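A minimal server-side sketch of long polling, assuming a plain `node:http` style handler. `LongPollHub`, `handlePoll`, and the response format are hypothetical names chosen for illustration; the 50 second default mirrors the figure mentioned above.

```javascript
// Holds waiting long-poll requests until a message arrives or the wait times out.
class LongPollHub {
  constructor() {
    this.waiters = new Set();
  }

  // Register a callback; it receives the message, or null after timeoutMs.
  wait(callback, timeoutMs = 50000) {
    const waiter = { callback, timer: null };
    waiter.timer = setTimeout(() => {
      this.waiters.delete(waiter);
      callback(null);
    }, timeoutMs);
    this.waiters.add(waiter);
  }

  // Answer every currently waiting request with the new message.
  publish(message) {
    const current = [...this.waiters];
    this.waiters.clear();
    for (const waiter of current) {
      clearTimeout(waiter.timer);
      waiter.callback(message);
    }
  }
}

const hub = new LongPollHub();

// Hypothetical handler for the polling request on a node:http server.
function handlePoll(req, res) {
  hub.wait((message) => {
    if (message === null) {
      res.writeHead(204); // nothing new; the client simply polls again
      res.end();
    } else {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(message));
    }
  });
}
```

The client side is just a loop: send the request, handle the response when it eventually arrives, then send the next request.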

Node.js for server to server communication

I am wondering if node.js is good for use in a server side application which is not actually communicating with the browser, or browser communication is just an additional part of whole app used rather for management.
The idea is simple:
Server receives high amount of UDP traffic with short messages containing user data from another server.
For each message, the app performs a DB lookup and filters out messages with userids that are not on the whitelist.
Filtered messages are processed, which result in another DB update, or sending data to another server.
Is such a case a good scenario to learn node.js, or is there no benefit from it compared to, e.g., Java EE?
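For reference, the three steps listed above map onto Node's built-in `dgram` module roughly as in this sketch. The `userId:payload` wire format, the in-memory whitelist standing in for the DB lookup, and the function names are all assumptions.

```javascript
const dgram = require('node:dgram');

// Stand-in for the DB lookup; a real app would query (and probably cache) the whitelist.
const whitelist = new Set(['alice', 'bob']);
async function isWhitelisted(userId) {
  return whitelist.has(userId);
}

// Assumed wire format: "<userId>:<payload>" as UTF-8 text.
function parseMessage(buffer) {
  const text = buffer.toString('utf8');
  const sep = text.indexOf(':');
  if (sep === -1) return null; // malformed datagram
  return { userId: text.slice(0, sep), payload: text.slice(sep + 1) };
}

// Bind a UDP socket and pass whitelisted messages on to `handle`.
function startListener(port, handle) {
  const socket = dgram.createSocket('udp4');
  socket.on('message', (buffer) => {
    const message = parseMessage(buffer);
    if (!message) return;
    isWhitelisted(message.userId)
      .then((allowed) => { if (allowed) handle(message); })
      .catch((err) => console.error('whitelist lookup failed', err));
  });
  socket.bind(port);
  return socket;
}
```

While one lookup is waiting on the database, the event loop keeps receiving further datagrams.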
Disclaimer: I work for a company that contributes to node.js and promotes its usage, so my opinion might be biased.
As others mentioned in comments, node.js should be a good fit for your scenario. It is actually one of the most common scenarios where people use node.js - fetch data from (possibly multiple) sources, do a small amount of CPU-light processing and send back the response or store the result. Unless message filtering is very CPU expensive, a node.js implementation will probably outperform a J2EE version.
The reason is that Node.js is heavily optimised for solutions where the server spends most of the time waiting. Waiting for client connection, waiting for database response, waiting for disc read/write, waiting for client to read the response, etc.
J2EE is employing multi-threading, where you have one thread to handle each request, which is suboptimal in this case. Most threads are waiting, so you are not getting the benefit of running lots of code in parallel, but you still have to pay the price of context switching and higher memory usage.
There is one thing I would consider before going for node.js: are you able and allowed to deploy node.js into your production environment? Moving to a new platform has some associated costs, people operating your application will have to learn how to deal with node.js applications.

Is there any good trick for a server to handle more requests if I don't have to send any data back?

I want to handle a lot of (> 100k/sec) POST requests from javascript clients with some kind of service server. Not much of this data will be stored, but I have to process all of it, so I cannot spend all of my server's power on serving requests alone. All the processing needs to be done in the same server instance, otherwise I'll need to use a database for synchronization between servers, which will be slower by orders of magnitude.
However I don't need to send any data back to the clients, and they don't even expect them.
So far my plan was to create a few proxy server instances which will buffer the requests and send them to the main server in bigger packs.
For example let's say that I need to handle 200k requests / sec and each server can handle 40k. I can split load between 5 of them. Then each one will be buffering requests and sending them back to main server in packs of 100. This will result in 2k requests / sec on the main server (however, each message will be 100 times bigger - which probably means around 100-200kB). I could even send them back to the server using UDP to decrease amount of needed resources (then I need only one socket on main server, right?).
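The buffering step and the arithmetic from the example above can be written down as a short sketch. `Batcher` and its `send` callback are hypothetical; a real proxy would likely also flush on a timer so that a slow trickle of requests is not held back indefinitely.

```javascript
// Collects items and hands them to `send` in packs of `size`.
class Batcher {
  constructor(size, send) {
    this.size = size;
    this.send = send;
    this.buffer = [];
  }

  // Buffer one item and flush automatically once the pack is full.
  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.size) this.flush();
  }

  // Send whatever is buffered (used for full packs and for a final partial one).
  flush() {
    if (this.buffer.length === 0) return;
    const pack = this.buffer;
    this.buffer = [];
    this.send(pack);
  }
}

// The numbers from the example: 200k requests/s over 5 proxies, packs of 100.
const perProxy = 200000 / 5;           // 40000 requests/s per proxy
const packsPerProxy = perProxy / 100;  // 400 packs/s per proxy
const packsAtMain = packsPerProxy * 5; // 2000 packs/s arriving at the main server
```

Two consequences of those figures are worth keeping in mind: 2000 packs/s at 100-200kB each is 200-400MB/s of traffic into the main server, and a pack of that size does not fit into a single UDP datagram (the payload limit is a little under 64kB), so it would have to be split or sent over TCP.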
I'm just wondering whether there is some other way to speed things up, especially since, as I said, I don't need to send anything back. I also have full control over the javascript clients, but unfortunately javascript is unable to send data using UDP, which would probably be the solution for me (I don't even care if 0.1% of the data is lost).
Any ideas?
Edit in response to answers given me so far.
The problem isn't with the server being too slow at processing events from the queue or with putting events in the queue itself. In fact I plan to use the disruptor pattern (http://code.google.com/p/disruptor/), which was proven to process up to 6 million requests per second.
The only problem I can potentially have is the need to keep 100, 200 or 300k sockets open at the same time, which cannot be handled by any of the mainstream servers. I know some custom solutions are possible (http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3) but I'm wondering if there is a way to make even better use of the fact that I don't have to reply to clients.
(For example some way to embed part of the data in initial TCP packet and handle TCP packets as they would be UDP. Or some other kind of magic ;))
Write a dedicated, fast function (probably in C) that receives all requests from a very fast server (like nginx). The only job of this function is to store the requests in a very fast queue (like redis, if you have enough RAM).
In another process (or server), pop requests off the queue and do the real work, processing them one by one.
If you have control of the clients, as you say, then your proxy server doesn't even need to be an HTTP server, because you can assume that all of the requests are valid.
You could implement it as a non-HTTP server that simply sends back a 200, reads the client request until it disconnects, and then queues the requests for processing.
I think what you're describing is an implementation of a Message Queue. You also will need something to hand off these requests to whatever queue you use (RabbitMQ is quite good, there are many alternatives).
You'll also need something else running which can do whatever processing you actually want on the requests. You haven't made that very clear, so I'm not too sure exactly what would be right for you. Essentially the idea will be that incoming requests are dumped as quickly and simply as possible into the queue by your web server, and then the web server is free to go back to serving more requests. When the system has some resources, it uses them to process the queue, but when it's busy the queue just keeps growing.
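The "dump it in the queue, process it later" idea might be sketched like this with Node's built-in `http` module. The in-memory array is only a stand-in for a real queue such as RabbitMQ or Redis, and the function names and the 202 status code are assumptions.

```javascript
const http = require('node:http');

const queue = []; // in-memory stand-in for RabbitMQ, Redis, or similar

// Read the request body, acknowledge it, and enqueue it without processing it.
function intake(req, res) {
  const parts = [];
  req.on('data', (chunk) => parts.push(chunk));
  req.on('end', () => {
    res.writeHead(202); // accepted for later processing; no body is sent back
    res.end();
    queue.push(Buffer.concat(parts).toString('utf8'));
  });
}

// Process up to `max` queued bodies; call this whenever there is spare capacity.
function drain(max, work) {
  const batch = queue.splice(0, max);
  batch.forEach(work);
  return batch.length;
}

// Start the intake server on the given port.
function start(port) {
  return http.createServer(intake).listen(port);
}
```

With an in-process array the queue grows without bound when the consumer falls behind, which is one reason to move it into a dedicated broker.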
Not sure what platform you're on, but might want to look at something like Lighttpd for serving the POSTs. You might (if same-domain restrictions don't shoot you down) get away with having Lighttpd running on a subdomain of your application (so post.myapp.com). Failing that you could put a proper load balancer in front of your webservers altogether (so all requests go to www.myapp.com and the load balancer decides whether to forward them to the web server or the queue processor).
Hope that helps
Consider using MongoDB for persisting your requests; its fire-and-forget mechanism can help your servers respond faster.
