I have been able to use Websockets to create a chat application between users using https://github.com/ghedipunk/PHP-Websockets.
The server stores each socket object in an array when a user connects, so it ends up with an array holding the socket object for every connected user. To send a message, it loops through that array to find the recipient's socket object.
This is fine for a small number of users. But how do we handle a huge number of users? I thought of storing the socket objects in a DB, but then I came across this: How to save php socket resource in database?
Which says "Sockets have to be recreated and cannot be stored in DB".
So, is there any better option rather than recreating? And if I have to recreate the socket, how do I do it using the PHP-Websockets library?
Thanks in advance.
A socket cannot be saved to disk because it is a live connection. If you destroy the live object, the connection is closed and the user is disconnected.
First, I want to remind you that memory and CPU might not be much of a concern. The Node.js interpreter is actually surprisingly fast, and a WebSocket object hardly takes much memory. You will face other issues first, such as bandwidth problems or too many open connections.
Possible tweaks I can think of:
Multithreading - spawn child processes
You can share sockets between Node.js processes, which is described here: https://nodejs.org/api/child_process.html#child_process_example_sending_server_object
I cannot guarantee this will improve performance, but it allows you to handle input from different users in parallel. I also don't know whether WebSockets support this, but I think so.
Reducing socket overhead
This is important to remember: you can't store the socket object on disk, but you can store most of the other data. The socket object itself hardly takes a significant amount of memory, but if you keep other information about the user alongside it, that might slow you down. Information that does not need to stay in memory should be stored in a backend such as a database.
Native core
If you still encounter problems, you can write a native multithreaded core in C/C++ for your chat and let it do the heavy operations, such as looping through sockets and sending them messages. Connect this core to the Node.js server where your logic lives. Also note that you can write native plugins for Node.js.
Use unique user_ids or device_ids (for example, a session id) and store those in the DB. Then send the user_id when subscribing to events, and send messages only to that user.
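A minimal sketch of the user_id idea, assuming each socket is an object with a send(message) method. The class name ConnectionRegistry and its methods are hypothetical and not part of any library; the point is only that a map keyed by user id gives a direct lookup instead of a loop over every socket.

```javascript
// Hypothetical registry: maps a user id to that user's live socket.
// `socket` is any object with a send(message) method (an assumption).
class ConnectionRegistry {
  constructor() {
    this.socketsByUserId = new Map();
  }

  // Call when a user connects and has identified themselves.
  add(userId, socket) {
    this.socketsByUserId.set(userId, socket);
  }

  // Call when the connection closes so the entry does not leak.
  remove(userId) {
    this.socketsByUserId.delete(userId);
  }

  // Direct lookup by id instead of scanning an array of all sockets.
  // Returns true if the user was connected and the message was handed off.
  sendTo(userId, message) {
    const socket = this.socketsByUserId.get(userId);
    if (!socket) return false;
    socket.send(message);
    return true;
  }
}

// Usage with a stand-in socket that just records what it was sent.
const registry = new ConnectionRegistry();
const aliceInbox = [];
registry.add("alice", { send: (m) => aliceInbox.push(m) });
registry.sendTo("alice", "hello"); // delivered to alice's socket
registry.sendTo("bob", "hi");      // bob is not connected, nothing is sent
```

Only the id needs to go into the database; the socket itself stays in the memory of the process that owns the connection.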
Related
I have a chat application with huge chatrooms (15'000 users connected to one single room).
Only a few have the right to write, so theoretically there should not be a huge load.
I have noticed a performance issue: when only one message is sent, the server CPU load spikes to 30%-50% and the message gets delivered slowly (maybe 1 second later, or worse if you write multiple messages).
I have analysed the performance with clinic-flame. I see that this code is the problem:
socket.to("room1").emit(/* ... */); which will trigger send() in engine.io and clearBuffer in the ws library.
Does someone know if I am doing something wrong and how to optimize the performance?
You can load balance Socket.IO servers.
I haven't done this yet, but there is information about it here: https://socket.io/docs/v4/using-multiple-nodes
If you use Node.js as the HTTP server: https://socket.io/docs/v4/using-multiple-nodes#using-nodejs-cluster
Alternatively, you can consider using Redis for that. There is a package that lets you distribute traffic across a cluster of processes or servers:
https://github.com/socketio/socket.io-redis-adapter
You do not need load balancing for up to 15k to 20k users if you follow the instructions below.
In socket code, do not use async/await.
In socket code, do not run any DB queries.
Use the socket only to take data from one client and pass it on to the others. If you want to store that data, do it through an API call before or after you send it over the socket.
It's all about how you use it. Queries and awaits slow down your responses and add load, and there is always a way to do things differently.
The three suggestions above boosted my socket performance; I hope they do the same for you.
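One way to read the advice above, as a rough sketch: relay the message first and only append it to an in-memory queue, then persist the queue from somewhere outside the message handler. The names createRelay and saveBatch are hypothetical, and the sockets are assumed to be plain objects with a send(message) method.

```javascript
// Minimal sketch: relay a message to every other socket first, and
// queue persistence so no database round-trip sits in the hot path.
// `sockets` entries need only a send(message) method (an assumption);
// `saveBatch` is a hypothetical function that writes to your store.
function createRelay(sockets, saveBatch) {
  const pending = [];

  function onMessage(sender, message) {
    for (const socket of sockets) {
      if (socket !== sender) socket.send(message);
    }
    pending.push(message); // no I/O here, just an in-memory append
  }

  // Call this from a timer or a separate API endpoint, not from onMessage.
  function flush() {
    if (pending.length === 0) return 0;
    const batch = pending.splice(0, pending.length);
    saveBatch(batch);
    return batch.length;
  }

  return { onMessage, flush };
}

// Usage with stand-in sockets and an in-memory "database".
const received = { a: [], b: [] };
const socketA = { send: (m) => received.a.push(m) };
const socketB = { send: (m) => received.b.push(m) };
const stored = [];
const relay = createRelay([socketA, socketB], (batch) => stored.push(...batch));

relay.onMessage(socketA, "hi from A"); // delivered to B only
relay.flush();                         // persisted afterwards, outside the handler
```

The trade-off is that messages still in the queue are lost if the process dies before the next flush, so the flush interval should match how much loss you can tolerate.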
I discovered SSE (Server Sent Events) pretty late, but I can't seem to figure out some use cases for it, so that it would be more efficient than using setInterval() and ajax.
I guess, if we'd have to update the data multiple times per second then having one single connection created would produce less overhead. But, except this case, when would one really choose SSE?
I was thinking of this scenario:
A new user comment from the website is added in the database
The server periodically queries the DB for changes. If it finds a new comment, it sends a notification to the client with SSE
Also, this SSE question came into my mind after having to do a simple "live" website change (when someone posts a comment, notify everybody who is on the site). Is there really another way of doing this without periodically querying the database?
Nowadays web technologies are used to implement all sorts of applications, including those which need to fetch constant updates from the server.
As an example, imagine you have a graph in your web page which displays real-time data. Your page must refresh the graph any time there is new data to display.
Before Server Sent Events the only way to obtain new data from the server was to perform a new request every time.
Polling
As you pointed out in the question, one way to look for updates is to use setInterval() and an ajax request. With this technique, our client will perform a request once every X seconds, no matter if there is new data or not. This technique is known as polling.
Events
Server Sent Events, on the contrary, are asynchronous. The server itself notifies the client when new data is available.
In the scenario of your example, you would implement SSE in such a way that the server sends an event immediately after adding the new comment, rather than by polling the DB.
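As a rough sketch of that scenario in Node.js, using only the built-in http and events modules: the code path that saves a comment emits an in-process event, and every open SSE response writes it out straight away. The names commentBus, addComment and formatSseEvent, the /events path and the port are all assumptions made for illustration.

```javascript
const http = require("http");
const { EventEmitter } = require("events");

// In-process event bus; a "comment" event is emitted when a comment is saved.
const commentBus = new EventEmitter();
commentBus.setMaxListeners(0); // one listener per client, so lift the default cap

// Serialize one message in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical save function: persist the comment, then notify listeners.
function addComment(comment) {
  // ... write `comment` to the database here ...
  commentBus.emit("comment", comment);
}

const sseServer = http.createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404);
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const onComment = (comment) => res.write(formatSseEvent("comment", comment));
  commentBus.on("comment", onComment);
  // Stop writing to this response once the client goes away.
  req.on("close", () => commentBus.off("comment", onComment));
});

// sseServer.listen(3000); // uncomment to run; the port is arbitrary
```

In the browser, the matching client would be along the lines of new EventSource("/events") with an event listener for "comment". Note that an in-process bus only reaches clients connected to the same server process; with several processes the event would have to travel through something shared.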
Comparison
Now the question may be when is it advisable to use polling vs SSE. Aside from compatibility issues (not all browsers support SSE, although there are some polyfills which essentially emulate SSE via polling), you should focus on the frequency and regularity of the updates.
If you are uncertain about the frequency of the updates (how often new data should be available), SSE may be the solution because it avoids all the extra requests that polling would perform.
However, it is wrong to say in general that SSE produces less overhead than polling. That is because SSE requires an open TCP connection to work. This essentially means that some resources on the server (e.g. a worker and a network socket) are allocated to one client until the connection is over. With polling, by contrast, the connection may be reset after the request is answered.
Therefore, I would not recommend using SSE if the average number of connected clients is high, because this could create some overhead on the server.
In general, I advise using SSE only if your application requires real-time updates. As a real-life example, I developed data acquisition software in the past and had to provide a web interface for it. In this case, a lot of graphs were updated every time a new data point was collected. That was a good fit for SSE because the number of connected clients was low (essentially, only one), the user interface had to update in real time, and the server was not flooded with requests as it would be with polling.
Many applications do not require real time updates, and thus it is perfectly acceptable to display the updates with some delay. In this case, polling with a long interval may be viable.
So, I have a web application whose structure is based on this file structure: https://scotch.io/tutorials/setting-up-a-mean-stack-single-page-application .
My app also has a connection to MongoDB on mLab.
What my app does:
allows users to login/signup;
retrieves data from mlab;
retrieved data can be rated by users;
retrieved data can be deleted by admin;
users can add data to db (data is training plans);
Now I need to make my app horizontally scalable, but I am a bit lost here:
•Since I assume there are no real-time activities, I shouldn't need something like socket.io?
•Should I add some sort of MQ (RabbitMQ, ZMQ, etc.)? If so, any pointers on how to would help, because most of the examples just use simple text messages.
•I am quite sure I would need some load balancer (Nginx, HAProxy...). I probably should change my Express server setup to listen on multiple ports first, is that right?
Or am I completely wrong about this?
P.S.: Hope this isn't too broad a question.
Different needs require different approaches :)
These can vary according to your needs. Not every scalable application has to have them. If you want the application to be asynchronous, you can put all the requests in a queue and return to the client instantly. You may then need a push mechanism to notify the client that the operation is over (Socket.io, RabbitMQ, etc.).
Of course, you will need a reverse proxy to distribute requests across the servers, either load balanced or on a workload basis (HAProxy etc.).
The first thing you need to pay attention to when you want to scale the application is to have a stateless structure, or to move state out of the process (for example session, cache, file server). The second thing you need to be aware of is the authentication phase. A client that logged in on ServerA may encounter "unauthorized" on ServerB on subsequent requests. You should also think about the resources used by the application. While these resources serve a single server today, they will begin to respond to millions of requests from five to ten servers simultaneously. There are also things like monitoring instances, and a lot of things like that.
These are the things you should really think about :)
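One common way to avoid the "logged in on ServerA, unauthorized on ServerB" problem is a signed token that any server can verify without shared memory; a shared session store is the other usual option. The following is only a sketch of the signed-token idea using Node's built-in crypto module. The function names, the token layout and the placeholder secret are assumptions for illustration; in practice an established token or session library is likely the better choice.

```javascript
const crypto = require("crypto");

// Assumption: every server instance is configured with the same secret,
// e.g. from an environment variable. The value here is a placeholder.
const SHARED_SECRET = "replace-with-a-real-secret";

function sign(payload, secret) {
  return crypto.createHmac("sha256", secret).update(payload).digest("hex");
}

// Issue a token of the form "<hex payload>.<hex signature>".
function issueToken(userId, secret, now = Date.now(), ttlMs = 60 * 60 * 1000) {
  const payload = Buffer.from(
    JSON.stringify({ userId, expiresAt: now + ttlMs })
  ).toString("hex");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the user id, or null if the token is malformed, forged or expired.
function verifyToken(token, secret, now = Date.now()) {
  const [payload, signature] = String(token).split(".");
  if (!payload || !signature) return null;
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length) return null;
  if (!crypto.timingSafeEqual(expected, actual)) return null;
  const { userId, expiresAt } = JSON.parse(
    Buffer.from(payload, "hex").toString("utf8")
  );
  return now < expiresAt ? userId : null;
}

// ServerA issues the token at login; ServerB verifies it on a later request.
const tokenFromServerA = issueToken("user-42", SHARED_SECRET);
const userSeenByServerB = verifyToken(tokenFromServerA, SHARED_SECRET);
```

Because verification needs only the shared secret, the load balancer is free to send each request to any server.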
We have a small set of multiplayer servers using node.js that are currently serving roughly 1 million messages a minute during peak usage. Is there a way to 'gracefully' restart the server without causing sockets to drop? Basically, I'm wondering what the best way is to handle restarts where it would normally be very disruptive to players.
When a process exits, the OS cleans up any sockets that belong to it by closing them. So, there's no way to just do a simple server restart and preserve your socket connections.
In some operating systems, you can pass ownership of a socket from one process to another, so it might be technically feasible for you to create a temporary process (or perhaps use a previously existing parent process), pass ownership of the sockets to that other process, restart your server, then transfer ownership back to the newly started process. I've never tried this (or heard about it being done), but it sounds like something that might be feasible.
Here's some information on transferring a socket to a child process using child.send() in node.js. It appears this can only be done for a node.js socket created by the net module and there are some caveats about doing it, but it is possible.
If not, the usual work-around is to have the clients automatically reconnect when their connection is closed. Done properly, this can be fairly transparent to the client (except for the momentary time when the server is not running).
Use Redis or some in-memory database for storing connection state so that clients can easily reconnect after a server restart without losing their sessions. Try this if it suits your needs. Also, please note that connections may drop during the restart, but because the state is persisted, clients can be reconnected very easily.
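A minimal sketch of client-side auto-reconnect with exponential backoff. Here connect is a hypothetical factory returning an object with onopen and onclose hooks, as a browser WebSocket has; the delay values are arbitrary, and schedule is injectable only so the logic can be exercised without real timers.

```javascript
// Delay before reconnect attempt number `attempt` (0-based): doubles each
// time from baseMs and is capped at maxMs. The defaults are arbitrary.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `connect` is a hypothetical factory that opens a connection and returns
// an object exposing onopen/onclose hooks, as a browser WebSocket does.
// `schedule` defaults to setTimeout and is injectable for testing.
function keepConnected(connect, schedule = setTimeout) {
  let attempt = 0;

  function open() {
    const connection = connect();
    connection.onopen = () => {
      attempt = 0; // healthy again, so reset the backoff
    };
    connection.onclose = () => {
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  }

  open();
}
```

In a browser this might be started with keepConnected(() => new WebSocket(url)), where url is your own endpoint. The growing delay keeps a whole player base from hammering the server at the same instant while it is still starting up.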
socket.io-redis
I just started learning Node.js and as I was learning about the fs.watchFile() method, I was wondering if a chat website could be efficiently built with it (and fs.writeFile()), against for example Socket.IO which is stable, but I believe not 100% stable (several fallbacks, including flash).
Using fs.watchFile could perhaps also be used to keep histories of the chats quite simply (as JSON would be used on the spot).
The chat files could be formatted in JSON in such a way that only the last chatter's message is brought up to the DOM (or whatever to make it efficient to 'fetch' messages when the file gets updated).
I haven't tried it yet as I still need to learn more about Node, and even more to be able to compare it with Socket.IO, but what's your opinion about it? Could it be an efficient/stable way of doing chats?
fs.watchFile() can be used to watch changes to the file in the local filesystem (on the server). This will not solve your need to update all clients chat messages in their browsers. You'll still need web sockets, AJAX or Flash for that (or socket.io, which handles all of those).
What you could typically do in the client is to try to use Web Sockets. If the browser does not support them, try to use XMLHttpRequest. If that fails, fall back to Flash. It's a lot of programming to do, and it has to be handled by the node.js server as well. Socket.io does that for you.
Also, socket.io is pretty stable. The fallback to Flash is not due to its instability but due to a lack of browser support for better solutions (like Web Sockets).
Storing chat files in flat-file JSON is not a good idea, because if you are going to manipulate the files, you would have to parse and serialize entire JSON objects, which would become very slow as the size of the JSON object increased. The watch methods for the filesystem module also don't work on all operating systems.
You also can't compare Node.js to Socket.IO because they are entirely different things. Socket.IO is a Node module for realtime transport between the browser and the server. What you need is dependent on what you're doing. If you need chat history, then you should be using a database such as MongoDB or MySQL. Watching files for changes is not an efficient approach; you should just send messages as they are received.
In conclusion, no: using fs.watchFile() and fs.writeFile() is a very bad idea, because race conditions would occur due to concurrent file writes. Besides that, fs.watchFile() uses polling to check whether a file has changed. You should instead use Socket.IO and push messages to other clients / store them in a database as they are received.
You can use the long polling method with JavaScript's setTimeout and setInterval.
Long polling
Basically, long polling works on top of an Ajax request and the server's response time.
If there is no notification or message, the server responds after a certain time (for example, after 50 seconds); otherwise, it responds with the data. When the client gets a response, its JavaScript makes another request for new updates and waits for the next response. This cycle continues for as long as the server is running.
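The server side of that cycle can be sketched as follows. LongPollHub is a hypothetical class, not part of any library: each held request registers a callback, which is answered either with the new message or with null once the timeout (50 seconds here, matching the example above) runs out.

```javascript
// Minimal long-polling sketch. Each waiting request registers a callback;
// it is answered with data when a message is published, or with null once
// the timeout passes, after which the client simply asks again.
class LongPollHub {
  constructor(timeoutMs = 50000) {
    this.timeoutMs = timeoutMs;
    this.waiters = new Set();
  }

  // Hold a request open. `respond` is called exactly once.
  wait(respond) {
    const waiter = { respond, timer: null };
    waiter.timer = setTimeout(() => {
      this.waiters.delete(waiter);
      respond(null); // nothing new: the client re-polls
    }, this.timeoutMs);
    this.waiters.add(waiter);
  }

  // Answer every held request with the new message.
  // Returns how many requests were answered.
  publish(message) {
    const current = [...this.waiters];
    this.waiters.clear();
    for (const waiter of current) {
      clearTimeout(waiter.timer);
      waiter.respond(message);
    }
    return current.length;
  }
}
```

In an HTTP handler, the callback passed to wait would typically end the response with the message serialized as JSON. A fuller version would also drop the waiter when the client disconnects before the timeout.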