Too many subscriptions to socket.io in JavaScript

I am using streaming provided by a vendor using socket.io using the following code:
var socket = io.connect('https://streamer.vendor-company.com/');
var subscription = ['sub1', 'sub2', 'sub3', 'sub4'];
socket.emit('SubAdd', { subs: subscription });
socket.on("m", function(message) {
    console.log(message);
    var messageType = message.substring(0, message.indexOf("~"));
    if (messageType == someMessageType) {
        dataUnpack(message);
    } else if (messageType == otherMessageType) {
        anotherDataUnpack(message);
    }
});
The methods dataUnpack and anotherDataUnpack do some processing on the received message and display the result on the webpage. The subscription array may hold around 45 subscriptions.
I want to know the effect on my website's performance. Does socket.io have some way of not flooding the client, or are there any serious performance considerations? Is socket.io designed for such usage?
Updates
This is different from "Too many on-connection events with Socket.io, does it hurt?" because mine is about client-side JavaScript/jQuery, whereas the linked question is about node.js.
The server is not under my control. Looking at jfriend00's answer, it seems that if I have 50 subscriptions and get around 20-30 messages per second each, I need to handle this on the client side. If so, at what rate of incoming messages should I start to worry? And, if possible, are there any techniques/strategies for handling a high rate of incoming messages?

Does socket.io have some way of not flooding the client?
No. If your server sends a message to the client, socket.io delivers it. It's all up to your server how many messages it sends and socket.io's job is to deliver every one of them you tell it to send.
or are there any serious performance considerations?
If you send a ton of messages to a ton of clients, that will potentially take a lot of processing and bandwidth. socket.io is just a messaging layer on top of webSocket, which is a layer on top of TCP. So, if you send a socket.io message from server to a client, or from server to all clients, it's one or more TCP packets being sent to the client(s). The only way to not flood the client is for your server to not send it more than it wants or can handle.
Is socket.io designed for such usage?
Socket.io is designed to reliably deliver messages from server to client or client to server. It just does what you tell it. If you tell it to deliver 1000 messages, that's what it will do.
If you have concerns about too many messages being sent to the client, then you need to modify your server to control that. For example, you might decide that a client should not be notified more than once every 5 seconds (for efficiency reasons). To implement that, you'd need an extra layer (of your own design) on the server. That is not something that socket.io has built-in features for.
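For illustration only (this would live on the server, which the asker here does not control), such a layer could be as small as a timestamp check around your own emit calls; MIN_INTERVAL_MS, lastSent and throttledEmit are all made-up names:

// Hypothetical server-side throttle: deliver to each client at most once per window.
const MIN_INTERVAL_MS = 5000;               // the 5-second window from the example above
const lastSent = new Map();                 // socket.id -> time of last delivery

function throttledEmit(socket, event, message) {
  const now = Date.now();
  if (now - (lastSent.get(socket.id) || 0) >= MIN_INTERVAL_MS) {
    lastSent.set(socket.id, now);
    socket.emit(event, message);            // deliver immediately
  }
  // else: drop or buffer the message yourself; socket.io will not do this for you
}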
If you're getting 20-30 messages per second per subscription and each client has 50 subscriptions, that's 1000-1500 messages per second per client. That is indeed a lot and probably not sustainable on either the client or the server, especially on the server side once you have lots of clients all doing the same thing. At 1000 messages per second, you have to process each message in under 1ms to keep from falling behind.
There are no special techniques for handling a high rate of incoming messages other than being extremely careful to limit what you do when each message arrives. For example, you would not want to touch the browser DOM on each message. Perhaps you would queue them and modify the DOM in a batch only once per second. Further advice would require seeing what you're trying to do with these messages.
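As a rough sketch of that batching idea, building on the question's code (renderBatch is a made-up placeholder for whatever dataUnpack and anotherDataUnpack ultimately write to the page):

var pendingMessages = [];

socket.on("m", function(message) {
    // do the cheapest possible work per message: just remember it
    pendingMessages.push(message);
});

// touch the DOM at most once per second, with everything that arrived in between
setInterval(function() {
    if (pendingMessages.length === 0) return;
    var batch = pendingMessages;
    pendingMessages = [];
    renderBatch(batch);   // hypothetical: unpack and render the whole batch in one pass
}, 1000);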
The better path would be to find a way to limit the number of messages being sent to each client, either by being smarter about what you subscribe to, by configuring the subscription not to send you so much, or by creating an intervening server of your own that can be smarter about what is sent to each client.

Related

What does it matter if there are many nodejs emitter listeners?

I'm writing a TCP server application using NodeJS. However, each socket runs in a separate child process (server.on("connection")). To send messages to specific clients, I used an Emitter, and each socket registers its own listener (on clientID). So if there are 10000 connected devices, the application will create 10000 listeners. This looks terrible. What dangers will this pose? I can't find a way to send a message from one client to another over TCP in NodeJS.
Update:
Does anyone have an idea how to send a message to a specific client without adding custom listeners?
However, each socket runs on a separate process.
Why would you do that? The core idea behind NodeJS is to run things in an event loop. Single threaded, yes, but asynchronous.
This looks terrible. What dangers will this pose?
It is terrible. The biggest issue is that you sacrifice a lot of resources. You not only spawn thousands of processes but you also spawn lots of emitters. So first of all this means lots of RAM eaten. Secondly it means degraded performance due to process context switches, which are typically slower than user-space switches. That's assuming your machine will even allow you to spawn so many processes.
I can't find a way to send a message from one client to another over TCP in NodeJS.
I assume you have a TCP server, two connected clients, and client A wants to send a message to client B. Is that correct? TCP by itself won't do that for you. You need some protocol on top of it. For example:
Client connects to the server. At this point the client is not logged in and cannot do anything except for authentication.
Client authenticates. It sends (username, password) pair to the server. The server validates the pair. The server keeps a global mapping {"<username>": [sockets]} and adds newly authenticated client to that mapping.
Client A wants to send a message to client B. So it sends data of the form {"type": "direct", "destination": "clientB", "data": "hello B"}. The server parses the message and forwards it to the appropriate client (taken from the global mapping).
In case you want to broadcast the message, you send a message of the form {"type": "broadcast", "data": "hello all"}. The server then parses it, loops through all connected clients (found in the global mapping), and forwards the message to each of them.
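A rough sketch of steps 2-4 in a single Node.js process might look like the following; authentication is assumed to happen elsewhere, sendFrame is a framing helper sketched after the next paragraph, and all the names are illustrative:

const clients = new Map();                       // "<username>" -> [sockets]

function onAuthenticated(username, socket) {     // called after the password check succeeds
  if (!clients.has(username)) clients.set(username, []);
  clients.get(username).push(socket);
}

function onMessage(socket, msg) {                // msg is an already-parsed JSON object
  if (msg.type === "direct") {
    (clients.get(msg.destination) || []).forEach(s => sendFrame(s, msg));
  } else if (msg.type === "broadcast") {
    for (const sockets of clients.values()) {
      sockets.forEach(s => sendFrame(s, msg));
    }
  }
}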
Of course you also need some framing of messages. Since TCP is a stream, it doesn't really understand messages or where one starts and the next ends. Serializing to JSON is only half of the problem: you still have to send that JSON over the network, and the other side has to know how many bytes to read. One way is to prefix each message with, say, 2 bytes that tell the other side how long the message is.
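One possible (assumed, not the only) framing along those lines, with a 2-byte big-endian length prefix and a JSON body:

function sendFrame(socket, obj) {
  const body = Buffer.from(JSON.stringify(obj));
  const header = Buffer.alloc(2);
  header.writeUInt16BE(body.length, 0);          // caps a single message at 64 KB
  socket.write(Buffer.concat([header, body]));
}

// Receiving side: accumulate bytes and peel off complete frames as they arrive.
function makeFrameReader(onFrame) {
  let buf = Buffer.alloc(0);
  return chunk => {
    buf = Buffer.concat([buf, chunk]);
    while (buf.length >= 2) {
      const len = buf.readUInt16BE(0);
      if (buf.length < 2 + len) break;           // wait for the rest of this frame
      onFrame(JSON.parse(buf.slice(2, 2 + len).toString()));
      buf = buf.slice(2 + len);
    }
  };
}
// usage: socket.on("data", makeFrameReader(msg => onMessage(socket, msg)));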
Btw you may want to consider using socket.io (or some other lib) that takes care of some of those tedious details for you.

Alternative to "Notification URL" to Handle Long Running API Process in Node

I am building an API that will take a long time to return data, up to 60 seconds while a conversion takes place. While it is running, I would like to keep users informed of any errors and notify them of which stage of the conversion we are at.
This is pretty easy on the client since I can simply send a WebSocket event, but for a public API, that's not very practical.
I know I can request a notification URL and send updates to the given URL, but it seems cumbersome and potentially resource heavy. Is there another more efficient means to send progress notifications?
Ideally, the user consuming the API would be able to set up something like:
.on("error", function(err) {
    // handle error
});
or something to that effect.
You're not really clear on who the consumers of your API are, what kinds of clients they're using, or what the workflow will look like. So there are a lot of different answers depending on what you're looking for and what resources you have available.
A non-exhaustive list:
REST endpoint polling
Understood that you aren't a fan, but this remains one of the best ways to do it for a wide range of clients, and is one of only two ways (that I know of) to do it for purely browser-based clients. Performance-wise, it's not awful if you set up your caching strategy appropriately and set throttle limits on your clients (which you should be doing anyway). I disagree that it's a PITA for clients to consume, but that's opinion and you obviously feel differently. A way to mitigate that PITA is to offer an SDK that handles the polling mechanism for consumers.
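As a sketch of what the consumer side of that could look like (the /v1/jobs/:id/status endpoint and the field names are purely illustrative, not part of any specific API):

// Poll a job status endpoint at a fixed interval until it reaches a terminal state.
async function pollJob(jobId, intervalMs = 2000) {
  for (;;) {
    const res = await fetch("/v1/jobs/" + jobId + "/status");
    const status = await res.json();
    if (status.state === "done" || status.state === "error") return status;
    await new Promise(resolve => setTimeout(resolve, intervalMs));  // respect throttle limits
  }
}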
Web Sockets
I get that you might be dealing with clients who aren't starting off in the web, but if a client can make a RESTful request, you could set the server to do the web socket upgrade if the client advertises interest in establishing one. I'm not a fan of this option as it feels more complex to me (more moving parts), but it's an option if you like web sockets and all/most of your clients will be web socket capable. Or you could just have the REST response be the URL of the web socket you're opening for that client.
Web Hooks
If your clients are likely to be other machines (esp. servers), then a web hook is a very good approach, especially if the event you want to raise can happen more than once and at unpredictable intervals. In this scheme, the client makes a REST request to you, part of the data they send you includes a URL that you will POST data to (in a format you specify in your API) when the event occurs. Obviously, they either have to leave that URL open to your POST or else you can agree upon some kind of credentialing that your server will respect.
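A minimal sketch of the provider side, assuming the client supplied a callbackUrl with its original request (the payload shape here is made up for illustration):

const https = require("https");

// POST a progress update to the URL the client registered with its request.
function notifyWebhook(callbackUrl, payload) {
  const body = JSON.stringify(payload);
  const req = https.request(callbackUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body)
    }
  });
  req.on("error", err => console.error("webhook delivery failed:", err.message));
  req.end(body);
}

// e.g. notifyWebhook(job.callbackUrl, { jobId: job.id, stage: "converting", progress: 0.4 });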
TCP Socket
Similar to the Web Socket option, in that you'd probably have a REST request hit your endpoint, and then respond with the socket connection information/URI to a custom TCP socket. This is a bit nonstandard, but can be very useful and efficient in the right use cases. I haven't used it in a while so they may have changed it, but this is how Heroku's API used to handle streaming logs.
Pub/Sub or Message Queue or similar
Redis can do this, as can many others. In this scenario you're making a more generic solution where there might be more than one event channel clients can subscribe to, and so on. I dislike exposing Redis directly for security reasons, which means you'll still need to figure out how to handle the comms between Redis and the client (see above), but using it under the hood will at least buy you some of the conceptual logic of handling publishers and subscribers and so on; useful if you have more than one event, as I said. This is a more heavyweight solution than the above, though, and will increase your sysadmin overhead by some amount (depending on your high-availability needs, etc.).

Advice on which technology to use for real time notifications

I have X activity sensors connected to a server that inserts data into a database every time a sensor is triggered. What I'm trying to do is create a web interface with a blueprint of the facility (SVG), and whenever a sensor is triggered, besides the DB insert, I want it to show some sort of alert on the blueprint. For that I think I need to keep an open connection to the server.
I was thinking of using web sockets, but it might be overkill since I only need to retrieve data from the server. But running an ajax call every second doesn't sound very efficient either. Are there any other alternatives?
Thank you
Some potential choices include:
WebSocket
Adobe® Flash® Socket
AJAX long polling
AJAX multipart streaming
Forever Iframe
JSONP Polling
Which actual transport you end up using will depend on your requirements for browser support and what technology you are using on the server to handle these requests. The transport choice may also depend on your network topology: what types of load balancers you need to integrate with, proxies, etc.
There are many libraries available on both the client and server sides, many of which support more than one of these transports.
For example (not an exhaustive list):
socket.io for nodejs
WebSocket
Adobe® Flash® Socket
AJAX long polling
AJAX multipart streaming
Forever Iframe
JSONP Polling
SignalR for an asp/.net backend
WebSockets
Server-Sent Events
ForeverFrame
Long Polling
Atmosphere for a java backend
WebSockets
Server Side Events (SSE)
Long-Polling
Forever frame
JSONP
IMO - Websockets is NOT overkill for this type of problem and would lend itself nicely to this type of application.
Without specifically discussing frameworks or knowing what is running in the backend of your server(s), we have a few options to consider for the frontend:
Websockets
Websockets are designed for bidirectional communication, although it is kind of shocking how many users are surfing the web in a browser that doesn't support websockets. I always recommend a fallback for this, such as the other methods listed below.
SSE
SSE is an HTML5 spec and is still shaky at best. Try scrolling on a page when an SSE event fires... It may be a little easier on the backend, but it sometimes hangs on the client side since it runs inside the same thread that the DOM is running in.
Long Polling
Keeps your connection open. It doesn't scale well with PHP, but performs swimmingly with Python+Twisted or Node.js on the backend.
Good Old Ajax
Keep your requests small, and you still have a scalable solution. Yes, a full GET request is the most expensive, but it is supported in just about every browser rolled out in the past ten years. It is also worth noting that GET requests are easy to scale horizontally with more hardware.
In a perfect world:
You would break up your application into a few components, operating behind a reverse proxy such as Nginx. Then use Node.js + Socket.IO to handle the realtime aspects of your app.
Another option would be to use small Ajax requests, and offer websocket support for the browsers that support it. This is advice specifically for PHP in the backend.
WebSocket is certainly not overkill. On the contrary. With websockets, you have a bi-directional communication channel; this means, that the server can initiate communication whenever it seems fit (e.g. when sensor data changes).
In a previous project, I used node.js together with socket.io to monitor 50+ sensors. Data was updated in real time in the browser and visualized using smoothie.js.
Whenever a sensor value was updated, it was communicated to the browser. Some sensors only updated once a minute, others once a second, ...
Polling would have been overkill, because it would retrieve all data for all sensors, even for those that had not been updated yet.
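A minimal sketch of that kind of setup (the names are illustrative, and highlightSensor stands in for whatever updates the SVG blueprint):

// Server (Node.js + socket.io): emit a reading only when a sensor actually triggers.
const io = require("socket.io")(3000);

function onSensorTriggered(sensorId, value) {    // called by whatever reads the sensors
  // ... insert into the database here ...
  io.emit("sensor", { id: sensorId, value: value, at: Date.now() });
}

// Browser: mark the sensor on the blueprint when a reading arrives.
var socket = io("http://localhost:3000");
socket.on("sensor", function(reading) {
    highlightSensor(reading.id, reading.value);  // hypothetical SVG helper
});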
I had a similar problem and did a lot of research on this. As I understand it, there are three main options:
Short polling: Have an endpoint that your javascript client pings every second. This is the worst option, because the pings add up to one second of latency to your communication, and depending on how you implement it, the endpoint could query the database every second, adding unnecessary overhead.
Long polling: Have an endpoint that your javascript client pings that holds the connection until a) the event occurs or b) the connection times out. If the endpoint returns a response, the client gets the event information. If the endpoint does not return a response, no event has occurred, and the client sends a new request. This is a good option because the events can immediately trigger the response to the client, assuming you have an asynchronous interprocess communication layer (like 0MQ) to send the message without any sort of polling. A minimal client loop for this is sketched after the list.
Websocket: Have your javascript client connect to a websocket server, which will send a message to your client immediately upon the event trigger.
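For the long-polling option above, a rough client loop might look like this (/events is a hypothetical endpoint that holds the connection open until an event occurs or the server times out):

async function pollForEvents(handleEvent) {
  for (;;) {
    try {
      const res = await fetch("/events");
      if (res.status === 200) {
        handleEvent(await res.json());            // an event occurred
      }
      // any other status (e.g. 204 on timeout) means no event: loop and reconnect
    } catch (err) {
      await new Promise(resolve => setTimeout(resolve, 1000));  // brief back-off on errors
    }
  }
}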
I think a websocket is your best option, because it accommodates immediate communication of the event without all the request/response overhead. And most importantly, this is exactly what websockets are designed to do! As such, you will probably have to write the least amount of custom code with this solution.
There are two great commercial services that might work for you.
Firebase - a javascript hierarchical database and realtime
messaging/ synchronization platform, uses websockets and has other fallbacks
PubNub - a real time message passing and queue system, uses websockets

Is there any good trick for a server to handle more requests if I don't have to send any data back?

I want to handle a lot of (> 100k/sec) POST requests from javascript clients with some kind of service server. Not much of this data will be stored, but I have to process all of it, so I cannot spend my whole server's power on serving requests alone. All the processing needs to be done in the same server instance, otherwise I'll need to use a database for synchronization between servers, which would be slower by orders of magnitude.
However, I don't need to send any data back to the clients, and they don't even expect it.
So far my plan has been to create a few proxy server instances that buffer the requests and send them to the main server in bigger packs.
For example, let's say that I need to handle 200k requests/sec and each server can handle 40k. I can split the load between 5 of them. Then each one will buffer requests and send them to the main server in packs of 100. That results in 2k requests/sec on the main server (however, each message will be 100 times bigger, which probably means around 100-200 kB). I could even send them to the main server using UDP to decrease the amount of resources needed (then I need only one socket on the main server, right?).
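A rough sketch of one such buffering proxy in Node.js, purely to illustrate the batching; forwardToMainServer is a made-up placeholder for whatever transport is chosen to the main server:

const http = require("http");

const BATCH_SIZE = 100;
let batch = [];

http.createServer((req, res) => {
  let body = "";
  req.on("data", chunk => { body += chunk; });
  req.on("end", () => {
    res.writeHead(204);                 // the client does not expect any data back
    res.end();
    batch.push(body);
    if (batch.length >= BATCH_SIZE) {
      const pack = batch;
      batch = [];
      forwardToMainServer(pack);        // hypothetical: one request carrying 100 payloads
    }
  });
}).listen(8080);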
I'm just wondering if there is another way to speed things up, especially since, as I said, I don't need to send anything back. I also have full control over the javascript clients, but unluckily javascript is unable to send data using UDP, which would probably be the solution for me (I don't even care if 0.1% of the data is lost).
Any ideas?
Edit in response to answers given me so far.
The problem isn't the server being too slow at processing events from the queue, or putting events into the queue itself. In fact, I plan to use the disruptor pattern (http://code.google.com/p/disruptor/), which has been shown to process up to 6 million requests per second.
The only problem I could potentially have is the need to keep 100, 200 or 300k sockets open at the same time, which none of the mainstream servers can handle. I know some custom solutions are possible (http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3), but I'm wondering if there is a way to make even better use of the fact that I don't have to reply to clients.
(For example, some way to embed part of the data in the initial TCP packet and handle TCP packets as if they were UDP. Or some other kind of magic ;))
Make a single, fast function (probably in C) that gets all the requests, behind a very fast server (like nginx). The only job of this function is to store the requests in a very fast queue (like redis, if you have enough RAM).
In another process (or server), pop items off the queue and do the real work, processing requests one by one.
If you have control of the clients, as you say, then your proxy server doesn't even need to be an HTTP server, because you can assume that all of the requests are valid.
You could implement it as a non-HTTP server that simply sends back a 200, reads the client request until it disconnects, and then queues the requests for processing.
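A sketch of that idea with Node's net module (enqueueForProcessing is a made-up hand-off to whatever queue you end up using):

const net = require("net");

net.createServer(socket => {
  // answer with a bare 200 right away so the browser considers the request complete
  socket.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
  let raw = "";
  socket.on("data", chunk => { raw += chunk; });
  socket.on("end", () => enqueueForProcessing(raw));  // hypothetical queue hand-off
  socket.on("error", () => {});                       // ignore broken connections
}).listen(8080);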
I think what you're describing is an implementation of a Message Queue. You also will need something to hand off these requests to whatever queue you use (RabbitMQ is quite good, there are many alternatives).
You'll also need something else running which can do whatever processing you actually want on the requests. You haven't made that very clear, so I'm not too sure exactly what would be right for you. Essentially the idea is that incoming requests are dumped as quickly and simply as possible into the queue by your web server, and then the web server is free to go back to serving more requests. When the system has some resources, it uses them to process the queue, but when it's busy the queue just keeps growing.
Not sure what platform you're on, but you might want to look at something like Lighttpd for serving the POSTs. You might (if same-domain restrictions don't shoot you down) get away with running Lighttpd on a subdomain of your application (so post.myapp.com). Failing that, you could put a proper load balancer in front of your web servers altogether (so all requests go to www.myapp.com and the load balancer decides whether to forward them to the web server or the queue processor).
Hope that helps
Consider using MongoDB for persisting your requests; its fire-and-forget write mechanism can help your servers respond faster.

How to deal with a big set of pending requests

I want to implement web site that will display to user a notification about some event happened on server. My plan is:
to make an asynchronous request to the server (ASP.NET) with a 600-second timeout
if an event occurs on the server within those 600 seconds, the server responds with the event details
if no event occurs, the server sends a 'no event' response at the end of the 600 seconds
upon receiving a response from the server, the JS processes it and sends the next request.
The problem with this approach is that, with a large number of visitors, the web site will have a lot of 'pending' requests.
Questions:
Should I consider that a problem? What is the solution for it? Should I perhaps implement another approach?
Please advise; any feedback is welcome.
I don't know specifics about asp.net's handling of pending requests, but what you are describing is basically long-polling. It's tricky for a number of reasons, including but not limited to:
each pending request consumes a thread, and you'll need to store state on each of those threads
if you have enough connections (not necessarily all that many; see above), you'll need them to span multiple machines, and you then need to come up with an architecture to distribute endpoints across those machines, and make sure each incoming request goes to the right machine. If you're only broadcasting the same data to all your users, this becomes much easier.
proxies or ISPs or what-have-you may shut down your long-poll request. You'll need an architecture resilient to that.
Here's a question about long-polling in asp.net: How to do long-polling AJAX requests in ASP.NET MVC? It's probably a good place to start.
Also you could consider a 3rd-party service like pusher to handle these connections for you, or (disclaimer: I work on App Engine) App Engine's Channel API.
Surely you could make more frequent requests to the server that do not consume server resources for 10 whole minutes?
e.g. send an AJAX request every 60 seconds or so, and return whether or not any event has occurred. The downside is that it could take up to a minute for a user to see a notification about some event, so if you need it more or less immediately, that is a problem.
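A minimal sketch of that periodic approach in browser JS (the /events/latest endpoint, the field names and showNotification are all illustrative):

setInterval(function() {
    fetch("/events/latest")                                    // illustrative endpoint
        .then(res => res.json())
        .then(data => {
            if (data.hasEvent) showNotification(data.event);   // hypothetical UI helper
        });
}, 60000);   // once a minute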
If it does have to be immediate, it seems like looking into "long polling" with something like node.js might be a solution, though non-trivial to implement.
