I am wondering whether node.js is a good fit for a server-side application that does not actually communicate with a browser, or where browser communication is only an additional part of the app, used mainly for management.
The idea is simple:
The server receives a high volume of UDP traffic: short messages containing user data from another server.
For each message, the app performs a DB lookup and filters out messages whose user IDs are not on the whitelist.
The remaining messages are processed, which results in another DB update or in sending data to another server.
Is this a good scenario for learning node.js, or is there no benefit compared to, e.g., Java EE?
Disclaimer: I work for a company that contributes to node.js and promotes its usage, so my opinion might be biased.
As others mentioned in the comments, node.js should be a good fit for your scenario. It is actually one of the most common scenarios where people use node.js: fetch data from (possibly multiple) sources, do a small amount of CPU-light processing, and send back the response or store the result. Unless message filtering is very CPU-expensive, a node.js implementation will probably outperform a J2EE version.
The reason is that Node.js is heavily optimised for solutions where the server spends most of the time waiting. Waiting for client connection, waiting for database response, waiting for disc read/write, waiting for client to read the response, etc.
J2EE employs multi-threading, with one thread handling each request, which is suboptimal in this case. Most threads are waiting, so you do not get the benefit of running lots of code in parallel, but you still pay the price of context switching and higher memory usage.
There is one thing I would consider before going for node.js: are you able and allowed to deploy node.js into your production environment? Moving to a new platform has associated costs: the people operating your application will have to learn how to deal with node.js applications.
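A minimal sketch of the pipeline from the question, using Node's built-in `dgram` module. The wire format (`userId:payload` as text), the function names, and the idea of passing the whitelist lookup in as a function are all assumptions made for illustration; a real app would plug in its own database client.

```javascript
const dgram = require('dgram');

// Assumed wire format: "<userId>:<payload>" as UTF-8 text.
function parseMessage(buffer) {
  const text = buffer.toString('utf8');
  const separator = text.indexOf(':');
  if (separator === -1) return null; // malformed datagram
  return { userId: text.slice(0, separator), payload: text.slice(separator + 1) };
}

// Returns a handler that drops messages from users not on the whitelist.
// `isWhitelisted` may be async, e.g. a real database lookup.
function createHandler(isWhitelisted, onAccepted) {
  return async function handle(buffer) {
    const message = parseMessage(buffer);
    if (message === null) return false;
    if (!(await isWhitelisted(message.userId))) return false;
    await onAccepted(message); // e.g. update the DB or forward to another server
    return true;
  };
}

// Wires the handler to a UDP socket. Not called here; the port is up to you.
function startServer(port, handler) {
  const socket = dgram.createSocket('udp4');
  socket.on('message', (buffer) => {
    handler(buffer).catch((err) => console.error('processing failed', err));
  });
  socket.bind(port);
  return socket;
}
```

While one message is waiting for its lookup, the event loop is free to accept the next datagram, which is the "mostly waiting" workload described above.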
EDIT:
I'm simplifying my question, because while #saintedlama's response is helpful information, it is tangential to what I'm trying to understand about using more than a single node process.
The crux of it is: how do I, or can I, manage manually spawned child processes, given that the app is already running under node's native cluster module?
Original question(s) below
I have an express.js app, the main function is to accept http requests and serve http responses via some MongoDB queries. For performance, the app uses node's native cluster module, and is spawned across available CPUs as worker processes at app start.
I now have some specific queries that may be long running - connecting to external services and APIs.
Is it worth moving these specific queries to their own workers (using node's child_process)?
And if so, how will this be affected by the existing use of cluster?
Alternatively (or as well) if I set up a persistent worker queue using something like Monq or Agenda - and given I'm using cluster, how can I control which process handles the queue?
Depending on how much work is done in JavaScript, moving long-running queries to a forked worker may not yield any benefit.
Node.js handles I/O (and the queries are I/O) asynchronously in the background, via the operating system's non-blocking facilities or a thread pool. So your node.js JavaScript process is not blocked while the database system processes the query.
If you are doing a lot of post-processing of query results in JavaScript, a separate worker may yield benefits, because the JavaScript thread is blocked while that post-processing code runs.
Using a job queue for doing these queries async has benefits since you can start developing with job processors in the same process and you will later have the possibility to easily scale out by deploying job processors to dedicated machines in your environment. But: Be careful with this approach since having very large query results may slow down your job queue.
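On the question of which cluster process handles the queue, one possible approach is to give each worker a role through its environment when forking. This is only a sketch: `WORKER_ROLE`, `planRoles`, and the two callbacks are hypothetical names, and the queue library (Monq, Agenda, or anything else) would be started inside `runQueueConsumer`.

```javascript
const cluster = require('cluster');
const os = require('os');

// Decide a role for each worker: exactly one consumes the queue.
function planRoles(workerCount) {
  const roles = [];
  for (let i = 0; i < workerCount; i += 1) {
    roles.push(i === 0 ? 'queue' : 'http');
  }
  return roles;
}

// Not invoked here. WORKER_ROLE is a hypothetical variable name.
function start(runHttpServer, runQueueConsumer) {
  // `isPrimary` is the newer name; older Node versions only have `isMaster`.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    for (const role of planRoles(os.cpus().length)) {
      cluster.fork({ WORKER_ROLE: role }); // extra env vars for this worker only
    }
  } else if (process.env.WORKER_ROLE === 'queue') {
    runQueueConsumer();
  } else {
    runHttpServer();
  }
}
```

If workers are restarted after a crash, the primary would need to fork the replacement with the same role, which this sketch leaves out.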
I'm developing an application that displays real-time data (charts, etc.) from Redis. Updated data arrives in Redis very quickly (within milliseconds), so it would make sense to show updates as often as possible (as often as the human eye can notice).
Technology stack:
Node.js as a web server
Redis that holds the data
JavaScript/HTML (AngularJS) as a client
Right now I have client-side polling (GET requests to Node.js server every second that queries Redis for updates).
Is there an advantage to doing server-side polling instead and exposing updates through WebSocket? Every WebSocket connection would require a separate Node.js poll (setInterval), though, since client queries may differ. But no more than 100 WebSocket connections are expected.
Any pros/cons between these two approaches?
If I understood your question correctly, you have fewer than 100 users who will use your resource simultaneously, and you want to find out which is the better way to give them updates:
clients ask for updates through a timed request (1 per second)
the server keeps track of clients and, whenever there is an update, pushes it to them.
I think the best solution depends on the data you have and how important it is for users to get this data.
I would go with client-side if:
people do not care if their data is a little bit stale
there is typically more than 1 update during each 1-second interval
I do not have time to modify the code
I would go with server-side if:
it is important to have up-to-date data and users cannot tolerate lag
updates are infrequent (for example, with one update per minute, only 1 in 60 client-side requests would be useful, whereas the server would issue just one update)
One good thing is that node.js already has an excellent socket.io library for this purpose.
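For the server-side variant, a separate poll per connection is not strictly required when several clients ask the same query. The sketch below runs one timer per distinct query key and pushes a value to subscribers only when it changes. `SharedPoller`, `fetchValue`, and the key scheme are assumptions for illustration; `fetchValue` stands in for whatever Redis query the app performs.

```javascript
// Polls a data source once per distinct query key and pushes changes to
// every listener subscribed to that key.
class SharedPoller {
  constructor(fetchValue, intervalMs) {
    this.fetchValue = fetchValue; // async (key) => value, e.g. a Redis query
    this.intervalMs = intervalMs;
    this.topics = new Map(); // key -> { listeners, last, timer }
  }

  // Returns an unsubscribe function; call it when the connection closes.
  subscribe(key, listener) {
    let topic = this.topics.get(key);
    if (!topic) {
      const timer = setInterval(() => {
        this.poll(key).catch((err) => console.error('poll failed', err));
      }, this.intervalMs);
      topic = { listeners: new Set(), last: undefined, timer };
      this.topics.set(key, topic);
    }
    topic.listeners.add(listener);
    return () => {
      topic.listeners.delete(listener);
      if (topic.listeners.size === 0 && this.topics.get(key) === topic) {
        clearInterval(topic.timer); // last subscriber gone: stop polling
        this.topics.delete(key);
      }
    };
  }

  // Fetches the current value and notifies listeners only if it changed.
  async poll(key) {
    const topic = this.topics.get(key);
    if (!topic) return;
    const value = await this.fetchValue(key);
    const serialized = JSON.stringify(value);
    if (serialized === topic.last) return;
    topic.last = serialized;
    for (const listener of topic.listeners) listener(value);
  }
}
```

Each WebSocket connection would call `subscribe` with its query and a listener that writes to the socket, then call the returned function on disconnect.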
I just started learning Node.js and as I was learning about the fs.watchFile() method, I was wondering if a chat website could be efficiently built with it (and fs.writeFile()), against for example Socket.IO which is stable, but I believe not 100% stable (several fallbacks, including flash).
Using fs.watchFile could perhaps also be used to keep histories of the chats quite simply (as JSON would be used on the spot).
The chat files could be formatted in JSON in such a way that only the last chatter's message is brought up to the DOM (or whatever to make it efficient to 'fetch' messages when the file gets updated).
I haven't tried it yet as I still need to learn more about Node, and even more to be able to compare it with Socket.IO, but what's your opinion about it? Could it be an efficient/stable way of doing chats?
fs.watchFile() can be used to watch changes to a file in the local filesystem (on the server). This will not solve your need to update all clients' chat messages in their browsers. You'll still need web sockets, AJAX or Flash for that (or socket.io, which handles all of those).
What you would typically do in the client is try to use Web Sockets. If the browser does not support them, try XMLHttpRequest. If that fails, fall back to Flash. It's a lot of programming to do, and it has to be handled by the node.js server as well. Socket.io does that for you.
Also, socket.io is pretty stable. The fallback to Flash is not due to its instability but due to a lack of browser support for better solutions (like Web Sockets).
Storing chats in flat-file JSON is not a good idea, because to manipulate the files you would have to parse and serialize entire JSON objects, which becomes very slow as the JSON grows. The watch methods of the filesystem module also don't work on all operating systems.
You also can't compare Node.js to Socket.IO because they are entirely different things. Socket.IO is a Node module for realtime transport between the browser and the server. What you need depends on what you're doing. If you need chat history, then you should be using a database such as MongoDB or MySQL. Watching files for changes is not efficient; you should just send messages as they are received.
In conclusion: no, using fs.watchFile() and fs.writeFile() is a very bad idea, because concurrent file writes would cause race conditions and, besides that, fs.watchFile() uses polling to check whether a file has changed. You should instead use Socket.IO and push messages to other clients and store them in a database as they are received.
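The "push messages as they are received" idea can be sketched without any file watching. This example uses only Node's built-in `EventEmitter`; `ChatRoom` and its methods are hypothetical names, the in-memory `history` array is a stand-in for a real database, and in a real app the `send` callback would wrap a Socket.IO or WebSocket connection.

```javascript
const { EventEmitter } = require('events');

class ChatRoom extends EventEmitter {
  constructor(historyLimit = 100) {
    super();
    this.setMaxListeners(0); // 0 = unlimited; one listener per connected client
    this.historyLimit = historyLimit;
    this.history = []; // stand-in for a database table or collection
  }

  // Called when a message arrives from any client.
  post(user, text) {
    const message = { user, text, at: Date.now() };
    this.history.push(message);
    if (this.history.length > this.historyLimit) this.history.shift();
    this.emit('message', message); // push to every connected client
    return message;
  }

  // Registers a client; returns a function to call when the client leaves.
  join(send) {
    this.on('message', send);
    return () => this.off('message', send);
  }
}
```

Nothing is re-parsed or re-serialized per message, and delivery happens at the moment the message arrives rather than when a poll notices a file change.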
You can use the long polling method, using JavaScript's setTimeout and setInterval.
Long polling
Basically, long polling relies on an Ajax request and the server's response time.
If there is no notification or message, the server responds after a certain time (for example, after 50 seconds); otherwise it responds with the data right away. When the client gets a response, the client-side JavaScript makes another request for new updates and waits for the response. This cycle repeats for as long as the server is running.
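A rough server-side sketch of that cycle, using Node's built-in `http` module. `LongPollChannel` and its methods are hypothetical names; the 50-second timeout mirrors the figure above, and answering a timed-out request with an empty 204 response is an assumption of this sketch.

```javascript
const http = require('http');

// Holds pending requests until data is published or they time out.
class LongPollChannel {
  constructor() {
    this.waiters = new Set();
  }

  // `respond(data)` is called once: with the data, or with null on timeout.
  wait(respond, timeoutMs) {
    const waiter = { respond, timer: null };
    waiter.timer = setTimeout(() => {
      this.waiters.delete(waiter);
      respond(null); // nothing happened; the client simply asks again
    }, timeoutMs);
    this.waiters.add(waiter);
  }

  // Answers every pending request immediately.
  publish(data) {
    const pending = [...this.waiters];
    this.waiters.clear();
    for (const waiter of pending) {
      clearTimeout(waiter.timer);
      waiter.respond(data);
    }
  }
}

// Not started here. Each request is parked on the channel for up to 50 seconds.
function createServer(channel) {
  return http.createServer((req, res) => {
    channel.wait((data) => {
      res.writeHead(data === null ? 204 : 200, { 'Content-Type': 'application/json' });
      res.end(data === null ? undefined : JSON.stringify(data));
    }, 50000);
  });
}
```

The sketch does not handle clients that disconnect while waiting; a fuller version would remove the waiter when the request closes.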
I have X activity sensors connected to a server that inserts data into a database every time a sensor is triggered. What I'm trying to do is create a web interface with a blueprint of the facility (SVG), and whenever a sensor is triggered, besides the DB insert, I want it to show some sort of alert on my blueprint. For that I need to keep an open connection to the server, I think.
I was thinking of using web sockets, but it might be overkill since I only need to retrieve data from the server. But running an ajax call every second doesn't sound very efficient either. Are there any other alternatives?
Thank you
Some potential choices include:
WebSocket
Adobe® Flash® Socket
AJAX long polling
AJAX multipart streaming
Forever Iframe
JSONP Polling
Which transport you end up using will depend on your requirements for browser support and on what technology you are using on the server to handle these requests. The transport choice may also depend on your network topology: what types of load balancers you need to integrate with, proxies, etc.
There are many libraries available on both the client and server sides, many of which support more than one of these transports.
For example (not an exhaustive list):
socket.io for nodejs: WebSocket, Adobe® Flash® Socket, AJAX long polling, AJAX multipart streaming, Forever Iframe, JSONP Polling
SignalR for an ASP.NET backend: WebSockets, Server-Sent Events, ForeverFrame, Long Polling
Atmosphere for a Java backend: WebSockets, Server-Sent Events (SSE), Long-Polling, Forever frame, JSONP
IMO - Websockets is NOT overkill for this type of problem and would lend itself nicely to this type of application.
Without specifically discussing frameworks or knowing what is running in the backend of your server(s), we have a few options to consider for the frontend:
Websockets
Websockets are designed for bidirectional communication, although it is kind of shocking how many users are surfing the web in a browser that doesn't support websockets. I always recommend a fallback for this, such as the other methods listed below.
SSE
SSE is an HTML5 spec and is still shaky at best. Try scrolling on a page when an SSE event fires... It may be a little easier on the backend, but it sometimes hangs on the client side, since it runs in the same thread as the DOM.
Long Polling
Keeps your connection open. It doesn't scale well with PHP, but performs swimmingly with Python+Twisted or Node.js on the backend.
Good Old Ajax
Keep your requests small, and you still have a scalable solution. Yes, a full GET request is the most expensive, but it is supported in just about every browser released in the past ten years. It is also worth noting that GET requests are easy to scale horizontally with more hardware.
In a perfect world:
You would break up your application into a few components, operating behind a reverse proxy such as Nginx. Then use Node.js + Socket.IO to handle the realtime aspects of your app.
Another option would be to use small Ajax requests, and offer websocket support for the browsers that support it. This is advice specifically for PHP in the backend.
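Since the question only needs server-to-client updates, the SSE option listed above can be sketched with Node's built-in `http` module. The names `formatEvent`, `createSseServer`, and the `sensor` event name are assumptions for illustration; the wire format (an `event:` line, a `data:` line, then a blank line) is the standard text/event-stream framing.

```javascript
const http = require('http');

// Serialises one event in the text/event-stream wire format.
function formatEvent(name, data) {
  return `event: ${name}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Not started here. Keeps track of open responses and writes to all of them.
function createSseServer() {
  const clients = new Set();
  const server = http.createServer((req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    clients.add(res);
    req.on('close', () => clients.delete(res));
  });
  // Call this from wherever sensor triggers are detected.
  const broadcast = (name, data) => {
    const frame = formatEvent(name, data);
    for (const res of clients) res.write(frame);
  };
  return { server, broadcast };
}
```

In the browser, an `EventSource` pointed at this server would receive each frame through `addEventListener('sensor', ...)`; browsers without `EventSource` would still need one of the other fallbacks.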
WebSocket is certainly not overkill; on the contrary. With websockets, you have a bi-directional communication channel, which means that the server can initiate communication whenever it sees fit (e.g. when sensor data changes).
In a previous project, I have used node.js together with socket.io, to monitor 50+ sensors. Data was updated in real-time in a browser. The data was visualized using smoothie.js.
Whenever a sensor value was updated, it was communicated to the browser. Some sensors only updated once a minute, others once a second, ...
Polling would have been overkill, because it would retrieve all data for all sensors, even from those that were not updated yet.
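The "only send what changed" behaviour can be sketched as a small hub that remembers the last value per sensor. This is an illustration rather than the code from that project: `SensorHub` and `report` are hypothetical names, and the `update` listener is where a socket.io emit to the browser would go.

```javascript
const { EventEmitter } = require('events');

// Remembers the last value per sensor and emits only when a value changes.
class SensorHub extends EventEmitter {
  constructor() {
    super();
    this.latest = new Map(); // sensorId -> last reported value
  }

  // Returns true if the reading was new and listeners were notified.
  report(sensorId, value) {
    if (this.latest.has(sensorId) && this.latest.get(sensorId) === value) {
      return false; // unchanged: nothing is sent to the browser
    }
    this.latest.set(sensorId, value);
    this.emit('update', { sensorId, value });
    return true;
  }
}
```

A sensor that reports once a minute then costs one message a minute, regardless of how often the faster sensors update.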
I had a similar problem and did a lot of research on this. As I understand it, there are three main options:
Short polling: Have an endpoint that your javascript client pings every second. This is the worst option, because the pings add up to one second of latency to your communication, and depending on how you implement it, the endpoint could query the database every second, adding unnecessary overhead.
Long polling: Have an endpoint that your javascript client pings that holds the connection until a) the event occurs or b) the connection times out. If the endpoint returns a response, the client gets the event information. If the endpoint does not return a response, no event has occurred, and the client sends a new request. This is a good option because the events can immediately trigger the response to the client, assuming you have an asynchronous interprocess communication layer (like 0MQ) to send the message without any sort of polling.
Websocket: Have your javascript client connect to a websocket server, which will send a message to your client immediately upon the event trigger.
I think a websocket is your best option, because it accommodates immediate communication of the event without all the request/response overhead. And most importantly, this is exactly what websockets are designed to do! As such, you will probably have to write the least amount of custom code with this solution.
There are two great commercial services that might work for you.
Firebase - a JavaScript hierarchical database and realtime messaging/synchronization platform; uses websockets and has other fallbacks
PubNub - a realtime message passing and queue system; uses websockets
I'd like some opinions on the practical implications of moving processing that would traditionally be done on the server to be handled instead by the client in a node.js web app.
Example case study:
The user uploads a CSV file containing a year's worth of their bank statement entries. We want to parse the file, categorise each entry and calculate cumulative values for each category so that we can store the newly categorised statement in a db and display spending analysis to the user.
The entries are categorised by matching strings in the descriptions. There are many categories and many entries and it takes a fair amount of time to process.
In our node.js server, we can happily free up the event loop whilst waiting for network responses and so on, but if there is any data crunching or similar processing, the server will be blocked from responding to requests, and this seems unavoidable.
Traditionally, the CSV file would be passed to the server, the server would process, save in db, and send back the output of the processing.
It seems to make sense in our single threaded node.js server that this processing is handled by the browser, and the output displayed and sent to server to be stored. Of course the client will have to wait while this is done, but their processing will not be preventing the server from responding to requests from other clients.
I'm interested to see if anyone has had experience building apps using this model.
So, the question is: are there any issues in getting browsers rather than the server to handle, wherever possible, any processing that would block the event loop? Is this a good/sensible/viable approach to node.js application development?
I don't think trusting client-processed data is a good idea.
Instead you should look into creating a work queue that a separate process listens on, separating the CPU intensive tasks from your node.js process handling HTTP requests.
My proposed data flow would be:
HTTP upload request
App server (save raw file somewhere the worker process can access)
Notification to 'csv' work queue
The worker processes the uploaded CSV file.
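The last two steps of that flow can be sketched with Node's built-in `child_process.fork`, which here stands in for a real work queue. Everything named below is an assumption for illustration: the `RULES` table, the `--csv-worker` flag, and the entry shape (`description`, `amount`). The same file acts as both the web process and the worker.

```javascript
const { fork } = require('child_process');

// Hypothetical rules: category name -> substrings to look for in a description.
const RULES = { groceries: ['tesco', 'aldi'], transport: ['uber', 'rail'] };

// CPU-bound part: categorise entries and sum the amounts per category.
function categorise(entries, rules = RULES) {
  const totals = {};
  for (const entry of entries) {
    const text = entry.description.toLowerCase();
    const match = Object.keys(rules).find((category) =>
      rules[category].some((needle) => text.includes(needle)));
    const category = match || 'uncategorised';
    totals[category] = (totals[category] || 0) + entry.amount;
  }
  return totals;
}

if (process.argv[2] === '--csv-worker') {
  // Worker side: crunch the numbers, reply, then let the process exit.
  process.on('message', (entries) => {
    process.send(categorise(entries), () => process.disconnect());
  });
}

// Web-process side: hands the parsed entries to a separate process so the
// HTTP event loop stays responsive. Not called here.
function categoriseInWorker(entries) {
  return new Promise((resolve, reject) => {
    const child = fork(__filename, ['--csv-worker']);
    child.once('message', resolve);
    child.once('error', reject);
    child.send(entries);
  });
}
```

Forking a process per upload is the simplest form; a persistent queue with a pool of long-lived workers would avoid the startup cost and survive restarts.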
Although perfectly possible, simply shifting the processing to the client machine does not solve the basic problem.
Now the client's event loop is blocked, preventing the user from interacting with the browser. Browsers tend to detect this problem and stop execution of the page's script altogether. Something your users will certainly hate.
There is no way around either delegating or splitting up the work-load.
Using a second process (for example a 2nd node instance) for doing the number crunching server-side has the added benefit of allowing the operating system to use a 2nd CPU core. Ideally you run as many Node instances as you have CPU cores in the server and balance your work-load between them. Have a look at the diode module for some inspiration on how to implement multi-process communication in node.
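The other route mentioned above, splitting up the work-load, can be sketched as processing the entries in slices and yielding to the event loop between slices. `processInChunks` and the default chunk size are assumptions for illustration; `setTimeout` is used because it exists in both browsers and Node.

```javascript
// Processes `items` in slices, yielding to the event loop between slices so
// that other callbacks (requests on a server, UI events in a browser) can run.
function processInChunks(items, handleItem, chunkSize = 500) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;
    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index += 1) results.push(handleItem(items[index]));
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) setTimeout(runChunk, 0); // yield, then continue
      else resolve(results);
    }
    runChunk();
  });
}
```

This keeps a single thread responsive but does not make the total work any faster; only a second process or core does that.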