Socket.io Performance optimisation 15'000 users

I have a chat application with huge chat rooms (15'000 users connected to a single room).
Only a few have the right to write, so theoretically there should not be a huge load.
I have noticed a performance issue: when only one message is sent, the server CPU load spikes to 30%-50% and the message gets delivered slowly (maybe 1 second later, or worse if you write multiple messages).
I have analysed the performance with clinic-flame. I see that this code is the problem:
socket.to("room1").emit(/* ... */); which triggers send() in engine.io and clearBuffer in the ws library.
Does someone know if I am doing something wrong and how to optimize the performance?

You can load balance Socket.IO servers.
I haven't done this yet, but there is information about it here: https://socket.io/docs/v4/using-multiple-nodes
If you use Node.js as the HTTP server: https://socket.io/docs/v4/using-multiple-nodes#using-nodejs-cluster
Alternatively, you can consider using Redis for that. There is a package that allows you to distribute traffic across a cluster of processes or servers:
https://github.com/socketio/socket.io-redis-adapter
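When several Socket.IO processes sit behind a balancer, requests from one client generally need to keep reaching the same process (sticky sessions), at least while HTTP long polling is enabled. A minimal sketch of that routing idea follows; pickWorker is a hypothetical helper for illustration, not part of Socket.IO or of the linked packages, and the pool size is an assumption.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper (not a Socket.IO API): map a client key, such as the
// remote IP address, to a worker index so that every request from the same
// client lands on the same process.
function pickWorker(clientKey, workerCount) {
  const digest = crypto.createHash("sha1").update(String(clientKey)).digest();
  // Read the first 4 bytes as an unsigned integer and reduce modulo the pool size.
  return digest.readUInt32BE(0) % workerCount;
}

const workers = 4; // assumed pool size
// The same client key always maps to the same worker index.
console.log(pickWorker("203.0.113.7", workers) === pickWorker("203.0.113.7", workers)); // prints true
```

In practice the balancer or the packages from the linked guide would do this routing; the sketch only shows why a stable hash keeps a client pinned to one process.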

You do not have to manage load balancing for up to 15k to 20k users; follow the instructions below.
In socket code, do not use async/await.
In socket code, do not run any DB queries.
Use the socket to take data from one client and pass it on to the others. If you want to store that data, do it through an API before or after you send it over the socket.
It's all about how you use it. Queries and awaits slow down your responses and add load; there is always a way to do things differently.
The above 3 suggestions boosted my socket performance; I hope they do the same for you.
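One possible reading of these suggestions, as a rough sketch: relay the message right away and keep storage out of the handler. The EventEmitter below stands in for a Socket.IO room so the example runs without dependencies; onChatMessage, pendingWrites and flushWrites are hypothetical names, and saveBatch is whatever storage function the application supplies.

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for a Socket.IO room so the sketch runs without dependencies.
const room = new EventEmitter();

// Hypothetical persistence queue: messages wait here until a separate
// job (or API call) writes them to the database.
const pendingWrites = [];

function onChatMessage(message) {
  // 1. Relay immediately; nothing in this path waits on I/O.
  room.emit("chat", message);
  // 2. Record the message for later storage instead of querying the DB here.
  pendingWrites.push(message);
}

// Hypothetical flush step, called from a timer or a separate API, that hands
// the queued messages to the storage function and reports how many it passed on.
function flushWrites(saveBatch) {
  const batch = pendingWrites.splice(0, pendingWrites.length);
  if (batch.length > 0) saveBatch(batch);
  return batch.length;
}
```

Batching the writes also means the database sees one call per flush rather than one per message.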

Related

Should I use WebSocket for this use case

I am kind of new to WebSocket and would like to ask whether WebSocket is really the best way to go.
Use case: clients need to submit different kinds of jobs (e.g. J1, J2, J3, ...) to the server API through a web GUI, and the server will do the jobs itself or distribute them to other computational resources. The server also needs to update each client on the progress of the jobs it submitted. One very simple example: if Client A wants to upload a big file, the server needs to report the upload progress until the file is fully uploaded. I think this is a very common use case.
What I am doing now is HTTP polling, i.e. the client queries the server for status every 1s and displays it. I think there must be a more efficient way of doing this, and I came up with WebSocket. However, after doing some reading, it seems WebSocket's best use is real-time broadcast of the same data to all subscribers, e.g. updating a certain stock price in a given channel.
Do you think WebSocket is the right way to go? If so, how should I build the channels for different clients and different types of jobs? Any other suggestions?
Thank you.

WebSocket may be a solution if you require real-time notification. You don't need different channels for each job, you just need messages that allow you to multiplex on server-side and demultiplex on client-side.
You should take into account that each opened WebSocket uses resources on the server, so you should consider the average workload.
On the other hand, if you don't require real-time notifications, loose polling may be a better solution.
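The multiplex/demultiplex idea can be sketched roughly as follows: every update travels over the same connection inside an envelope that names its job, and the client routes on that name. The field names (jobId, type, payload) and the JobDemultiplexer class are assumptions made for illustration.

```javascript
// Server side: wrap every update in an envelope that names the job it belongs to.
// The field names (jobId, type, payload) are illustrative assumptions.
function encodeUpdate(jobId, type, payload) {
  return JSON.stringify({ jobId, type, payload });
}

// Client side: route each incoming message to the handler registered for its job.
class JobDemultiplexer {
  constructor() {
    this.handlers = new Map();
  }

  subscribe(jobId, handler) {
    this.handlers.set(jobId, handler);
  }

  // Call this from the WebSocket "message" event with the raw text frame.
  dispatch(rawMessage) {
    const { jobId, type, payload } = JSON.parse(rawMessage);
    const handler = this.handlers.get(jobId);
    if (!handler) return false; // no subscriber for this job
    handler(type, payload);
    return true;
  }
}
```

A client would create one JobDemultiplexer per connection, call subscribe when it submits a job, and pass each received frame to dispatch.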

How to implement horizontal scalability using NodeJs

So, I have a web application whose structure is based on this file structure: https://scotch.io/tutorials/setting-up-a-mean-stack-single-page-application .
My app also has a connection to MongoDB on mLab.
What my app does:
allows users to log in/sign up;
retrieves data from mLab;
retrieved data can be rated by users;
retrieved data can be deleted by an admin;
users can add data to the DB (the data is training plans);
Now I need to make my app horizontally scalable, but I am a bit lost here:
•Since I assume there are no real-time activities, I shouldn't need something like socket.io?
•Should I add some sort of MQ (RabbitMQ, ZMQ, etc.)? If so, any pointers on how to would help, because most of the examples just use simple text messages.
•I am quite sure I would need some load balancer (Nginx, HAProxy...). I probably should change my Express server setup to listen on multiple ports first, is that right?
Or am I completely wrong about this?
P.S.: I hope this isn't too broad a question.

Different needs require different approaches :)
These can vary according to your needs. Not every scalable application has to have them. If you want the application to be asynchronous, you can put all the requests in a queue and return to the client instantly. You may then need a push mechanism to notify the client that the operation is over (Socket.io, RabbitMQ, etc.).
Of course you will need a reverse proxy (HAProxy, etc.) to distribute requests to different servers on a load-balanced or workload basis.
The first thing you need to pay attention to when you want to scale the application is to have a stateless structure, or to move state out of the process (for example session, cache, file server). The second thing you need to be aware of is the authentication phase. A client that logged in on ServerA may encounter "unauthorized" on ServerB on subsequent requests. You should also think about the resources used by the application. While these resources serve a single server today, they will begin to respond to millions of requests from five to ten servers simultaneously. There are also things like monitoring instances, and a lot of things like that.
These are the things you should really think about :)
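As one illustration of the ServerA/ServerB authentication point, a login credential can be made verifiable by any server instead of living in one process's memory. The sketch below signs the user ID with a secret that all instances are assumed to share; SHARED_SECRET, the token format and the function names are hypothetical, and a real application would more likely use an established session store or token library.

```javascript
const crypto = require("node:crypto");

// Assumption: every server instance is configured with the same secret.
const SHARED_SECRET = "replace-with-a-real-secret";

function sign(value) {
  return crypto.createHmac("sha256", SHARED_SECRET).update(value).digest("hex");
}

// Issued by whichever server handles the login.
function issueToken(userId) {
  const id = String(userId);
  return `${id}.${sign(id)}`;
}

// Can run on any server: no lookup in process-local memory is needed.
// Returns the user ID when the signature matches, otherwise null.
function verifyToken(token) {
  const dot = token.lastIndexOf(".");
  if (dot < 0) return null;
  const userId = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1));
  const expected = Buffer.from(sign(userId));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? userId : null;
}
```

Because verification depends only on the shared secret, a token issued by one instance is accepted by all the others.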

Alternative to "Notification URL" to Handle Long Running API Process in Node

I am building an API that will take a long time to return data, up to 60 seconds while a conversion takes place. While running, I would like to keep the users informed of any errors and notify them which process in the conversion stage we are at.
This is pretty easy on the client since I can simply send a WebSocket event, but for a public API, that's not very practical.
I know I can request a notification URL and send updates to the given URL, but it seems cumbersome and potentially resource heavy. Is there another more efficient means to send progress notifications?
Ideally, the user consuming the API would be able to set up
.on("error", function(err) {
//handle error
});
or something to that effect.

You're not really clear on who the consumers of your API are, what kinds of clients they're using, or what the workflow will look like. So there's a lot of different answers depending on what you're looking for and what resources you have available.
A non-exhaustive list:
REST endpoint polling
Understood that you aren't a fan, but this remains one of the best ways to do it for a wide range of clients, and is one of only two ways (that I know of) to do it for purely browser-based clients. Performance-wise, it's not awful if you set up your caching strategy appropriately and set throttle limits on your clients (which you should be doing anyway). I disagree that it's a PITA for clients to consume, but that's opinion and you obviously feel differently. A way to mitigate that PITA is to offer an SDK that handles that mechanism for consumers.
Web Sockets
I get that you might be dealing with clients who aren't starting off in the web, but if a client can make a RESTful request, you could set the server to do the web socket upgrade if the client advertises interest in establishing same. I'm not a fan of this option as it feels more complex to me (more moving parts), but it's an option if you like web sockets and all/most of your clients will be web socket capable. Or you could just have the REST response be the URL to the web socket you're opening for that client.
Web Hooks
If your clients are likely to be other machines (esp. servers), then a web hook is a very good approach, especially if the event you want to raise can happen more than once and at unpredictable intervals. In this scheme, the client makes a REST request to you, part of the data they send you includes a URL that you will POST data to (in a format you specify in your API) when the event occurs. Obviously, they either have to leave that URL open to your POST or else you can agree upon some kind of credentialing that your server will respect.
TCP Socket
Similar to the Web Socket option, in that you'd probably have a REST request hit your endpoint, and then respond with the socket connection information/URI to a custom TCP socket. This is a bit nonstandard, but can be very useful and efficient in the right use cases. I haven't used it in a while so they may have changed it, but this is how Heroku's API used to handle streaming logs.
Pub/Sub or Message Queue or similar
Redis can do this, as can many others. In this scenario you're making a more generic solution where there might be more than one event channel clients can subscribe to, and so on. I dislike exposing Redis directly for security reasons, which means you'll still need to figure out how to handle the comms between Redis and the client (see above), but using it under the hood will at least buy you some of the conceptual logic of handling publishers and subscribers and so on; useful if you have more than one event as I said. This is a more heavyweight solution than the above, though, and will increase your sysadmin overhead by some amount (depending on your high availability needs, etc)
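To illustrate the SDK idea from the polling option, here is a rough sketch of a client-side wrapper that hides the polling loop behind the .on("error", ...) style the question asks for. JobWatcher is a hypothetical class, fetchStatus is an assumed async function that calls the API's status endpoint, and the status shape ({ state, stage, error }) is invented for the example.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical SDK class: hides the polling loop behind an event interface.
// fetchStatus is an assumed async function that calls the API's status
// endpoint and resolves to an object shaped like { state, stage, error }.
class JobWatcher extends EventEmitter {
  constructor(fetchStatus, intervalMs = 1000) {
    super();
    this.fetchStatus = fetchStatus;
    this.intervalMs = intervalMs;
    this.timer = null;
  }

  // Translate one status response into events; returns true when polling should stop.
  handleStatus(status) {
    if (status.state === "failed") {
      this.emit("error", new Error(status.error));
      return true;
    }
    if (status.state === "done") {
      this.emit("done", status);
      return true;
    }
    this.emit("progress", status.stage);
    return false;
  }

  // Poll until the job finishes or a request fails.
  // Note: as with any EventEmitter, an "error" listener should be attached.
  start() {
    const poll = async () => {
      let status;
      try {
        status = await this.fetchStatus();
      } catch (err) {
        this.emit("error", err);
        return;
      }
      if (!this.handleStatus(status)) {
        this.timer = setTimeout(poll, this.intervalMs);
      }
    };
    poll();
    return this;
  }

  stop() {
    clearTimeout(this.timer);
  }
}
```

A consumer would then write something like new JobWatcher(fetchStatus).on("progress", showStage).on("error", handleError).start(), where fetchStatus, showStage and handleError are their own functions.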

Chat application using websockets

I have been able to use Websockets to create a chat application between users using https://github.com/ghedipunk/PHP-Websockets.
What the server does is store each created socket object in an array when a user connects. So, ultimately, it builds an array that holds the socket object and info for every user. It then loops through the array each time to find the object of a given user so that the message can be sent to the respective socket.
This will be fine for a small number of users. But how do we handle huge numbers of users? I thought of storing socket objects in a DB, but then I came across this: How to save php socket resource in database?
Which says "Sockets have to be recreated and cannot be stored in DB".
So, is there any better option than recreating them? And if I have to recreate the socket, how do I do it using the PHP-Websockets library?
Thanks in advance.

A socket cannot be saved on disk because it is an existing connection. If you destroy the live object, the connection is closed and the user is disconnected.
First, I want to remind you that memory and CPU might not be much of a concern. The Node.js interpreter is actually surprisingly fast, and a WebSocket object hardly takes much memory. You will face other issues first, such as bandwidth problems, too many open connections, etc.
Possible tweaks I can think of:
Multithreading - spawn child processes
You can share sockets between node.js processes though, which is described here: https://nodejs.org/api/child_process.html#child_process_example_sending_server_object
I cannot guarantee this will improve performance, but it allows you to handle input from different users in parallel. I also don't know whether WebSockets support this, but I think so.
Reducing socket overhead
This is important to remember: you can't store the socket object on disk, but you can store most of the other stuff. The socket object hardly takes a significant amount of memory, but if you keep other info about the user alongside it, that might slow you down. Redundant information should be stored in a backend such as a database.
Native core
If you still encounter problems, you can write a native multithreaded C/C++ core for your chat and let it do the heavy operations, such as looping through sockets and sending them messages. Have this core connected to the Node.js server where your logic lives. Also note that you can write native plugins for Node.js.

Use unique user_ids or device_ids (for example, a session ID) and store them in the DB. Then send the user_id when subscribing to events, and send messages only to that user.
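A small sketch of that idea, written in JavaScript rather than PHP: the live sockets stay in memory but are keyed by user ID, so sending to one user is a lookup instead of a loop over every connection. ConnectionRegistry is a hypothetical class, and the only assumption about the socket objects is that they expose a send(text) method.

```javascript
// Registry keyed by user ID, so delivering to one user is a Map lookup
// rather than a loop over every connection. The socket objects only need a
// send(text) method here, which is an assumption of this sketch.
class ConnectionRegistry {
  constructor() {
    this.socketsByUser = new Map();
  }

  // Call when a user connects and has identified themselves.
  add(userId, socket) {
    this.socketsByUser.set(userId, socket);
  }

  // Call when the connection closes.
  remove(userId) {
    this.socketsByUser.delete(userId);
  }

  // Returns false when the user has no live connection.
  sendTo(userId, text) {
    const socket = this.socketsByUser.get(userId);
    if (!socket) return false;
    socket.send(text);
    return true;
  }
}
```

Only the user IDs and other profile data belong in the database; the registry itself has to be rebuilt from live connections after a restart.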

Advice on which technology to use for real time notifications

I have X activity sensors connected to a server that inserts data into a database every time a sensor is triggered. What I'm trying to do is create a web interface with a blueprint of the facility (SVG), and whenever a sensor is triggered, besides the DB insert, I want it to show some sort of alert on my blueprint. For that I think I need to keep an open connection to the server.
I was thinking of using web sockets, but it might be overkill since I only need to retrieve data from the server. But running an ajax call every second doesn't sound very efficient either. Are there any other alternatives?
Thank you

Some potential choices include:
WebSocket
Adobe® Flash® Socket
AJAX long polling
AJAX multipart streaming
Forever Iframe
JSONP Polling
Which transport you end up using will depend on your requirements for browser support and on the technology you are using on the server to handle these requests. The transport choice may also depend on your network topology: what types of load balancers you need to integrate with, proxies, etc.
There are many libraries available on both the client and server sides, many of which support more than one of these transports.
For example (not an exhaustive list):
socket.io for nodejs: WebSocket, Adobe® Flash® Socket, AJAX long polling, AJAX multipart streaming, Forever Iframe, JSONP Polling
SignalR for an asp/.net backend: WebSockets, Server-Sent Events, ForeverFrame, Long Polling
Atmosphere for a java backend: WebSockets, Server Side Events (SSE), Long-Polling, Forever frame, JSONP
IMO - Websockets is NOT overkill for this type of problem and would lend itself nicely to this type of application.

Without specifically discussing frameworks or knowing what is running in the backend of your server(s), we have a few options to consider for the frontend:
Websockets
Websockets are designed for bidirectional communication, although it is kind of shocking how many users are surfing the web in a browser that doesn't support websockets. I always recommend a fallback for this, such as the other methods listed below.
SSE
SSE is an HTML5 spec and is still shaky at best. Try scrolling on a page while an SSE event fires... It may be a little easier on the backend, but it sometimes hangs on the client side since it runs inside the same thread that the DOM is running in.
Long Polling
Keeps your connection open. It doesn't scale well with PHP, but performs swimmingly with Python+Twisted on the backend, or Node.Js
Good Old Ajax
Keep your requests small, and you still have a scalable solution. Yes, a full GET request is the most expensive, but is supported in just about every browser rolled out the past ten years. It is also worth noting that GET requests are easy to scale horizontally with more hardware.
In a perfect world:
You would break up your application into a few components, operating behind a reverse proxy such as Nginx. Then use Node.Js + Socket.IO to handle the realtime aspects of your app.
Another option would be to use small Ajax requests, and offer websocket support for the browsers that support it. This is advice specifically for PHP in the backend.
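The fallback idea can be sketched as a simple feature check on the client. chooseTransport is a hypothetical helper and the returned labels are made up for the example; it only tests whether the WebSocket and EventSource constructors exist in the environment it is given.

```javascript
// Hypothetical helper: pick the best transport the environment supports.
// Pass `globalThis` (or `window`) in a browser; a plain object works for testing.
function chooseTransport(env) {
  if (typeof env.WebSocket === "function") return "websocket";
  if (typeof env.EventSource === "function") return "sse";
  return "ajax-polling"; // small periodic requests work nearly everywhere
}
```

Libraries such as Socket.IO perform this kind of negotiation themselves, so a hand-written check is mainly useful when no such library is in play.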

WebSocket is certainly not overkill. On the contrary: with websockets, you have a bi-directional communication channel, which means that the server can initiate communication whenever it sees fit (e.g. when sensor data changes).
In a previous project, I have used node.js together with socket.io, to monitor 50+ sensors. Data was updated in real-time in a browser. The data was visualized using smoothie.js.
Whenever a sensor value was updated, it was communicated to the browser. Some sensors only updated once a minute, others once a second, ...
Polling would have been overkill, because it would retrieve all data for all sensors, even from those that were not updated yet.

I had a similar problem and did a lot of research on this. As I understand it, there are three main options:
Short polling: Have an endpoint that your javascript client pings every second. This is the worst option, because the pings add latency up to one second to your communication, and depending on how you implement, the endpoint could query the database every second, adding unnecessary overhead.
Long polling: Have an endpoint that your javascript client pings that holds the connection until a) the event occurs or b) the connection times out. If the endpoint returns a response, the client gets the event information. If the endpoint does not return a response, no event has occurred, and the client sends a new request. This is a good option because the events can immediately trigger the response to the client, assuming you have an asynchronous interprocess communication layer (like 0MQ) to send the message without any sort of polling.
Websocket: Have your javascript client connect to a websocket server, which will send a message to your client immediately upon the event trigger.
I think a websocket is your best option, because it accommodates immediate communication of the event without all the request/response overhead. And most importantly, this is exactly what websockets are designed to do! As such, you will probably have to write the least amount of custom code with this solution.
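For comparison, the long-polling option described above can be sketched as a small in-memory hub: each held request registers a callback that fires either when the event occurs or when the wait times out. LongPollHub and its method names are hypothetical, and the sketch assumes a single server process.

```javascript
// Minimal long-polling hub. Each waiting request registers a callback; the
// callback fires either when an event is published or when the wait times out.
class LongPollHub {
  constructor() {
    this.waiters = new Set();
  }

  // respond(event) is called exactly once: with the event, or with null on timeout.
  wait(respond, timeoutMs = 30000) {
    const waiter = { respond, timer: null };
    waiter.timer = setTimeout(() => {
      this.waiters.delete(waiter);
      respond(null); // no event occurred; the client should send a new request
    }, timeoutMs);
    this.waiters.add(waiter);
  }

  // Called by the code that detects the event (e.g. a sensor trigger).
  // Returns the number of waiting requests that were answered.
  publish(event) {
    const current = [...this.waiters];
    this.waiters.clear();
    for (const waiter of current) {
      clearTimeout(waiter.timer);
      waiter.respond(event);
    }
    return current.length;
  }
}
```

In an HTTP handler, the respond callback would typically write the event (or an empty body on timeout) to the held response and end it.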

There are two great commercial services that might work for you.
Firebase - a JavaScript hierarchical database and realtime messaging/synchronization platform; uses websockets and has other fallbacks
PubNub - a real-time message passing and queue system; uses websockets
