Scaling Socket.IO to multiple Node.js processes using cluster - javascript
Tearing my hair out with this one... has anyone managed to scale Socket.IO to multiple "worker" processes spawned by Node.js's cluster module?
Let's say I have the following running on four worker processes (pseudo):
// on the server
var express = require('express');
var server = express();
var socket = require('socket.io');
var io = socket.listen(server);

// socket.io
io.set('store', new socket.RedisStore);

// set up connections...
io.sockets.on('connection', function(socket) {
    socket.on('join', function(rooms) {
        rooms.forEach(function(room) {
            socket.join(room);
        });
    });

    socket.on('leave', function(rooms) {
        rooms.forEach(function(room) {
            socket.leave(room);
        });
    });
});

// Emit a message every second
function send() {
    io.sockets.in('room').emit('data', 'howdy');
}

setInterval(send, 1000);
And on the browser...
// on the client
var socket = io.connect();
socket.emit('join', ['room']);

socket.on('data', function(data) {
    console.log(data);
});
The problem: Every second, I'm receiving four messages, due to four separate worker processes sending the messages.
How do I ensure the message is only sent once?
Edit: In Socket.IO 1.0+, rather than setting a store with multiple Redis clients, a simpler Redis adapter module can now be used.
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));
The store-based example shown further below, rewritten for Socket.IO 1.0+, would look more like this:
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    // we create a HTTP server, but we do not use listen
    // that way, we have a socket.io server that doesn't accept connections
    var server = require('http').createServer();
    var io = require('socket.io').listen(server);
    var redis = require('socket.io-redis');

    io.adapter(redis({ host: 'localhost', port: 6379 }));

    setInterval(function() {
        // all workers will receive this in Redis, and emit
        io.emit('data', 'payload');
    }, 1000);

    for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
}

if (cluster.isWorker) {
    var express = require('express');
    var app = express();
    var http = require('http');
    var server = http.createServer(app);
    var io = require('socket.io').listen(server);
    var redis = require('socket.io-redis');

    io.adapter(redis({ host: 'localhost', port: 6379 }));

    io.on('connection', function(socket) {
        socket.emit('data', 'connected to worker: ' + cluster.worker.id);
    });

    // listen on the http server socket.io is attached to
    server.listen(80);
}
If you have a master node that needs to publish to other Socket.IO processes, but doesn't accept socket connections itself, use socket.io-emitter instead of socket.io-redis.
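As a rough sketch of the emitter approach (assuming the socket.io-emitter package and a Redis server on localhost:6379; the event name just mirrors the examples above), a publish-only process looks like this:

```javascript
// A publish-only process: no HTTP server, no socket connections.
// socket.io-emitter writes straight to Redis, and every Socket.IO
// process using the Redis adapter relays the event to its clients.
var emitter = require('socket.io-emitter')({ host: '127.0.0.1', port: 6379 });

setInterval(function () {
  // Emitted once here, delivered once per connected client by
  // whichever worker holds that client's connection.
  emitter.emit('data', 'payload');
}, 1000);
```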
If you are having trouble scaling, run your Node applications with DEBUG=*. Socket.IO now implements debug which will also print out Redis adapter debug messages. Example output:
socket.io:server initializing namespace / +0ms
socket.io:server creating engine.io instance with opts {"path":"/socket.io"} +2ms
socket.io:server attaching client serving req handler +2ms
socket.io-parser encoding packet {"type":2,"data":["event","payload"],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["event","payload"],"nsp":"/"} as 2["event","payload"] +1ms
socket.io-redis ignore same uid +0ms
If your master and child processes both display the same parser messages, then your application is scaling properly.
There shouldn't be a problem with your setup if you are emitting from a single worker. What you're doing is emitting from all four workers, and due to Redis publish/subscribe, the messages aren't duplicated, but written four times, as you asked the application to do. Here's a simple diagram of what Redis does:
Client <-- Worker 1 emit --> Redis
Client <-- Worker 2 <----------|
Client <-- Worker 3 <----------|
Client <-- Worker 4 <----------|
As you can see, when you emit from a worker, it will publish the emit to Redis, and it will be mirrored by the other workers, which have subscribed to the Redis database. This also means you can use multiple socket servers connected to the same Redis instance, and an emit on one server will be fired on all connected servers.
With cluster, when a client connects, it will connect to one of your four workers, not all four. That also means anything you emit from that worker will only be shown once to the client. So yes, the application is scaling, but the way you're doing it, you're emitting from all four workers, and the Redis database is making it as if you were calling it four times on a single worker. If a client actually connected to all four of your socket instances, they'd be receiving sixteen messages a second, not four.
The type of socket handling depends on the type of application you're going to have. If you're going to handle clients individually, then you should have no problem, because the connection event will only fire for one worker per client. If you need a global "heartbeat", then you could have a socket handler in your master process. Since workers die when the master process dies, you should keep the connection load off of the master process and let the children handle connections. Here's an example:
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    // we create a HTTP server, but we do not use listen
    // that way, we have a socket.io server that doesn't accept connections
    var server = require('http').createServer();
    var io = require('socket.io').listen(server);

    var RedisStore = require('socket.io/lib/stores/redis');
    var redis = require('socket.io/node_modules/redis');

    io.set('store', new RedisStore({
        redisPub: redis.createClient(),
        redisSub: redis.createClient(),
        redisClient: redis.createClient()
    }));

    setInterval(function() {
        // all workers will receive this in Redis, and emit
        io.sockets.emit('data', 'payload');
    }, 1000);

    for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
}

if (cluster.isWorker) {
    var express = require('express');
    var app = express();
    var http = require('http');
    var server = http.createServer(app);
    var io = require('socket.io').listen(server);

    var RedisStore = require('socket.io/lib/stores/redis');
    var redis = require('socket.io/node_modules/redis');

    io.set('store', new RedisStore({
        redisPub: redis.createClient(),
        redisSub: redis.createClient(),
        redisClient: redis.createClient()
    }));

    io.sockets.on('connection', function(socket) {
        socket.emit('data', 'connected to worker: ' + cluster.worker.id);
    });

    // listen on the http server socket.io is attached to
    server.listen(80);
}
In the example, there are five Socket.IO instances: one in the master and four in the children. The master server never calls listen(), so there is no connection overhead on that process. However, if you call an emit on the master process, it will be published to Redis, and the four worker processes will perform the emit on their clients. This offloads connection handling to the workers, and if a worker were to die, your main application logic in the master would be untouched.
Note that with Redis, all emits, even in a namespace or room will be processed by other worker processes as if you triggered the emit from that process. In other words, if you have two Socket.IO instances with one Redis instance, calling emit() on a socket in the first worker will send the data to its clients, while worker two will do the same as if you called the emit from that worker.
Let the master handle your heartbeat (example below), or start multiple processes on different ports internally and load-balance them with nginx (which also supports WebSockets from v1.3 upwards).
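For the nginx route, the essentials are a sticky upstream plus the WebSocket upgrade headers. A sketch (ports and names are made up for illustration; ip_hash is one simple way to get stickiness):

```nginx
upstream socket_nodes {
    ip_hash;  # same client IP always reaches the same Node process
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
    server 127.0.0.1:3004;
}

server {
    listen 80;

    location / {
        proxy_pass http://socket_nodes;
        # Needed for the WebSocket upgrade handshake (nginx 1.3.13+)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```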
Cluster with Master
// on the server
var express = require('express');
var server = express();
var socket = require('socket.io');
var io = socket.listen(server);
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

// socket.io
io.set('store', new socket.RedisStore);

// set up connections...
io.sockets.on('connection', function(socket) {
    socket.on('join', function(rooms) {
        rooms.forEach(function(room) {
            socket.join(room);
        });
    });

    socket.on('leave', function(rooms) {
        rooms.forEach(function(room) {
            socket.leave(room);
        });
    });
});

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    // Emit a message every second
    function send() {
        console.log('howdy');
        io.sockets.in('room').emit('data', 'howdy');
    }

    setInterval(send, 1000);

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
}
This actually looks like Socket.IO succeeding at scaling. You would expect a message from one server to go to all sockets in that room, regardless of which server they happen to be connected to.
Your best bet is to have one master process that sends a message each second. You can do this by only running it when cluster.isMaster is true, for example.
Inter-process communication is not enough to make Socket.IO 1.4.5 work with cluster. Forcing WebSocket mode is also a must. See WebSocket handshake in Node.JS, Socket.IO and Clusters not working
Related
400 bad request (making game with node and socket.io)
Making a game, I have no idea what I am doing when it comes to the online aspect. I am using node.js, with my computer as both the server host and the client (localhost:3000):

    var express = require('express'); // no idea what I am doing here
    var app = express();
    var server = app.listen(3000);
    var socket = require("socket.io");
    var io = socket(server);

    io.sockets.on('connection', newConnection);
    app.use(express.static("public"));
    console.log("server is up"); // tells me that the server is ready

    function newConnection(socket) {
        console.log("new player!", socket.id); // tells me when a new player connects
    }

I also have this code within the main public JavaScript file:

    var socket;
    socket = io.connect("localhost:3000");

Whenever a new player connects, I get 400 Bad Request errors and the game thinks multiple players have joined.
You will need to handle what happens when someone connects to your server:

    var express = require('express');
    var app = express();
    var server = require("http").createServer(app);
    server.listen(3000); // server listens on port 3000

    var io = require("socket.io")(server);

    // this will be called when a new client connects
    io.on("connection", (socket) => {
        // socket is the socket object of the new client
        console.log("new socket connected with socket id = " + socket.id);
    });

Look at the socket.io docs for more info.
In my game, I had a constructor function called "Number", and that was causing the problem the entire time. JavaScript already has a built-in global called "Number", and shadowing it is what caused the problem.
Some problems when scaling Socket.IO to multiple Node.js processes using cluster
My node.js server uses the cluster module in order to work on multiple processes. If the server receives requests from clients over Socket.IO, it conveys the data to another server with a Redis publish. It receives refined data back with a Redis subscribe, and then just passes this data on to clients. I use one node process to receive data with the Redis sub, and the other processes to send data to clients with socket.io. The client connects to socket.io when the page loads.

Here is my problem: the connect event occurs repeatedly, not only when the page loads. When the client connects, I get the socket.id from that socket, and I use it later when I want to send data to that client socket. But since connect occurs repeatedly, I think the socket the client uses has changed, so the first socket.id that I remembered becomes useless; I can't send data via that socket.id. I stored auth information in the socket object, so the changed client socket is no help.

index.pug:

    $(document).ready(function(){
        var socket = io.connect();
        (...)
app.js:

    var cluster = require('cluster');
    var socketio = require('socket.io');
    var NRP = require('node-redis-pubsub');
    var nrpForChat = new NRP(config.chatRedisConfig);
    var nrpForCluster = new NRP(config.clusterRedisConfig);

    var startExpressServer = function(){
        var app = express();
        var server = require('http').createServer(app);
        var io = socketio.listen(server);
        var redis = require('socket.io-redis');

        io.adapter(redis({ host: 'localhost', port: 6380 }));

        io.sockets.on('connection', function(socket){
            socketController.onConnect(io, socket, nrpForChat);
        });

        server.listen(config.port, function(){
            console.log('Server app listening on port ' + config.port);
        });

        nrpForCluster.on('to:others:proc', function(data){
            var socket = io.sockets.connected[data.target.sockid];
            if (socket) {
                if (data.event == '_net_auth') {
                    if (data.data.res){
                        socket.enterId = data.data.data.enterId;
                        socket.memberKey = data.data.data.memberKey;
                        socket.sid = data.data.data.sid;
                        socket.emit(data.event, data.data);
                    } else {
                        console.log('auth failed.');
                    }
                }
            } else {
                socket.emit(data.event, data.data);
            }
        });

        module.exports = app;
    }

    var numCpus = require('os').cpus().length;

    if (cluster.isMaster) {
        for (var i = 0; i < numCpus; i++) {
            cluster.fork();
        }
    } else {
        if (cluster.worker.id == numCpus) {
            nrpForChat.on('chat:to:relay', function(data){
                nrpForCluster.emit('to:others:proc', data);
            });

            if (numCpus == 1) {
                startExpressServer();
            }
        } else {
            startExpressServer();
        }
    }
By default, socket.io connects with several consecutive HTTP requests. It essentially starts in HTTP polling mode and then, after some initial data exchange, switches to a webSocket transport. Because of this, a cluster that does not have any sort of sticky load balancing will not work. Each of the initial consecutive HTTP requests that are all supposed to go to the same server process will probably be sent to different server processes in the cluster, and the initial connection will not work.

There are two solutions that I know of:

1. Implement some sort of sticky load balancing (in the clustering module) so that each client repeatedly goes to the same server process, and thus all the consecutive HTTP requests at the beginning of a connection go to the same server process.

2. Switch your client configuration to immediately use the webSocket transport and never use HTTP polling. The connection will still start with an HTTP request (since that's how all webSocket connections start), but that exact same connection will be upgraded to webSocket, so there will only ever be one connection.

FYI, you will also need to make sure that the reconnect logic in socket.io properly reconnects to the original server process it was connected to. socket.io has node.js clustering support in combination with Redis. While the socket.io documentation site has been down for multiple days now, you can find some info here and in Scaling Socket.IO to multiple Node.js processes using cluster, and here's a previously cached version of the socket.io doc for clustering.
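For the second solution, socket.io-client (1.x) lets you restrict transports at connect time. A minimal client-side sketch (the URL is a placeholder):

```javascript
// Force the webSocket transport so the connection never goes through
// HTTP polling, which removes the need for sticky load balancing
// during the handshake.
var socket = io('http://example.com', {
  transports: ['websocket']
});

socket.on('connect', function () {
  console.log('connected using', socket.io.engine.transport.name);
});
```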
Socket.io unable to emit data to client's unique room
I am using Node.js to create a media upload microservice. This service works by taking in the binary data of the upload to a buffer, and then using the s3 npm package to upload to an S3 bucket. I am trying to use the event emitter present in that package, which reports the amount of data uploaded to S3, and send that back to the uploading client (so that they can see upload progress). I am using socket.io for sending this progress data back to the client.

The problem I am having is that the .emit event in socket.io will send the upload progress data to all connected clients, not just the client which initiated the upload. As I understand it, a socket connects to a default room on 'connection', which is mirrored by the 'id' on the client side. According to the official docs, using socket.to(id).emit() should send the data scoped only to that client, but this is not working for me.

UPDATED example code:

server.js:

    var http = require('http'),
        users = require('./data'),
        app = require('./app')(users);

    var server = http.createServer(app);

    server.listen(app.get('port'), function(){
        console.log('Express server listening on port ' + app.get('port'));
    });

    var io = require('./socket.js').listen(server);

socket.js:

    var socketio = require('socket.io');
    var socketConnection = exports = module.exports = {};

    socketConnection.listen = function listen(app) {
        io = socketio.listen(app);
        exports.sockets = io.sockets;

        io.sockets.on('connection', function (socket) {
            socket.join(socket.id);

            socket.on('disconnect', function(){
                console.log("device " + socket.id + " disconnected");
            });

            socketConnection.upload = function upload (data) {
                socket.to(socket.id).emit('progress', {progress: (data.progressAmount/data.progressTotal)*100});
            };
        });

        return io;
    };

s3upload.js:

    var config = require('../config/aws.json');
    var s3 = require('s3');
    var path = require('path');
    var fs = require('fs');
    var Busboy = require('busboy');
    var inspect = require('util').inspect;
    var io = require('../socket.js');
    ...
    var S3Upload = exports = module.exports = {};
    ....
    S3Upload.upload = function upload(params) {
        // start uploading to uploader
        var uploader = client.uploadFile(params);

        uploader.on('error', function(err) {
            console.error("There was a problem uploading the file to bucket, either the params are incorrect or there is an issue with the connection: ", err.stack);
            res.json({responseHTML: "<span>There was a problem uploading the file to bucket, either the params are incorrect or there is an issue with the connection. Please refresh and try again.</span>"});
            throw new Error(err);
        }),

        uploader.on('progress', function() {
            io.upload(uploader);
        }),

        uploader.on('end', function(){
            S3Upload.deleteFile(params.localFile);
        });
    };

When using DEBUG=* node myapp.js, I see the socket.io-parser taking in this information, but it isn't emitting it to the client:

    socket.io-parser encoding packet {"type":2,"data":["progress",{"progress":95.79422221709825}],"nsp":"/"} +0ms
    socket.io-parser encoded {"type":2,"data":["progress",{"progress":95.79422221709825}],"nsp":"/"} as 2["progress",{"progress":95.79422221709825}] +0ms

However, if I remove the .to portion of this code, it sends the data to the client (albeit to all clients, which will not help at all):

    io.sockets.on('connection', function(socket) {
        socket.join(socket.id);
        socket.emit('progress', {progress: (data.progressAmount/data.progressTotal)*100});
    });

DEBUG=* node myapp.js:

    socket.io:client writing packet {"type":2,"data":["progress",{"progress":99.93823786632886}],"nsp":"/"} +1ms
    socket.io-parser encoding packet {"type":2,"data":["progress",{"progress":99.93823786632886}],"nsp":"/"} +1ms
    socket.io-parser encoded {"type":2,"data":["progress",{"progress":99.93823786632886}],"nsp":"/"} as 2["progress",{"progress":99.93823786632886}] +0ms
    engine:socket sending packet "message" (2["progress",{"progress":99.93823786632886}]) +0ms
    engine:socket flushing buffer to transport +0ms
    engine:ws writing "42["progress",{"progress":99.84186540937002}]" +0ms
    engine:ws writing "42["progress",{"progress":99.93823786632886}]" +0ms

What am I doing wrong here? Is there a different way to emit events from the server to only specific clients that I am missing?
The second example of code you posted should work, and if it does not, you should post more code.

> As I understand it, a socket connects to a default room on 'connection', which is mirrored by the 'id' on the client side. According to the official docs, using socket.to(id).emit() should send the data scoped only to that client, but this is not working for me.

Socket.io is pretty much easier than that. The code below will send a 'hello' message to each client when they connect:

    io.sockets.on('connection', function (socket) {
        socket.emit('hello');
    });

Every time a new client connects to the socket.io server, it will run the specified callback using that particular socket as a parameter. socket.id is just a unique code to identify that socket, but you don't really need that variable for anything; the code above shows you how to send a message through a particular socket.

Socket.io also provides some functions to create namespaces/rooms, so you can group connections under some identifier (room name) and broadcast messages to all of them:

    io.sockets.on('connection', function (socket) {
        // This will be triggered after the client does socket.emit('join','myRoom')
        socket.on('join', function (room) {
            socket.join(room);
            // Now this socket will receive all the messages broadcast to 'myRoom'
        });
        ...

Now you should understand that socket.join(socket.id) just does not make sense, because no sockets will be sharing a socket id.

Edit to answer the question with the new code:

You have two problems here. First:

    socketConnection.upload = function upload (data) {
        socket.to(socket.id).emit('progress', {progress: (data.progressAmount/data.progressTotal)*100});
    };

Note that everything inside io.sockets.on('connection', function (socket) { will run each time a client connects to the server. You are overwriting the function so that it points at the socket of the latest user. The other problem is that you are not linking sockets and S3 operations.
Here is a solution merging socket.js and s3upload.js in the same file. If you really need to keep them separated, you will need to find a different way to link the socket connection to the S3 operation:

    var config = require('../config/aws.json');
    var s3 = require('s3');
    var path = require('path');
    var fs = require('fs');
    var Busboy = require('busboy');
    var inspect = require('util').inspect;
    var io = require('socket.io');

    var socketConnection = exports = module.exports = {};
    var S3Upload = exports = module.exports = {};

    io = socketio.listen(app);
    exports.sockets = io.sockets;

    io.sockets.on('connection', function (socket) {
        socket.on('disconnect', function(){
            console.log("device " + socket.id + " disconnected");
        });

        socket.on('upload', function (data) {
            // The client will trigger the upload sending the data
            /* some code creating the bucket params using data */
            S3Upload.upload(params, this);
        });
    });

    S3Upload.upload = function upload(params, socket) {
        // Here we pass the socket so we can answer him back
        // start uploading to uploader
        var uploader = client.uploadFile(params);

        uploader.on('error', function(err) {
            console.error("There was a problem uploading the file to bucket, either the params are incorrect or there is an issue with the connection: ", err.stack);
            res.json({responseHTML: "<span>There was a problem uploading the file to bucket, either the params are incorrect or there is an issue with the connection. Please refresh and try again.</span>"});
            throw new Error(err);
        }),

        uploader.on('progress', function() {
            socket.emit('progress', {progress: (uploader.progressAmount/uploader.progressTotal)*100});
        }),

        uploader.on('end', function(){
            S3Upload.deleteFile(params.localFile);
        });
    };
According to the documentation, all users join a default room identified by the socket id, so there is no need for you to join it on connection. Also according to the docs, if you want to emit to a room in a namespace from a specific socket, you should use socket.broadcast.to(room).emit('my message', msg), given that you want to broadcast the message to all the clients connected to that specific room.
All new connections automatically join a room whose name is equal to their socket.id. You can use this to send messages to a specific user, but you have to know the socket.id associated with the connection initiated by that user. You have to decide how you will manage this association (via a database, or in memory in an array), but once you have it, just send the progress percentage via:

    socket.broadcast.to(user_socket_id).emit("progress", number_or_percent);
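One way to keep that association is a plain in-memory object mapping a user id to the latest socket.id (all names here are illustrative; unlike a database, this map does not survive a restart):

```javascript
// user id -> socket.id, filled in when a client identifies itself
var socketIdsByUser = {};

// Call these from inside io.on('connection', function (socket) { ... })
function onLogin(userId, socket) {
  socketIdsByUser[userId] = socket.id;
}

function onDisconnect(userId) {
  delete socketIdsByUser[userId];
}

// Later, to push progress to one user:
//   socket.broadcast.to(socketIdsByUser[userId]).emit('progress', percent);

// Toy usage with a fake socket object:
onLogin('alice', { id: 'abc123' });
console.log(socketIdsByUser['alice']); // abc123
```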
Socket.io trigger events between two node.js apps?
I have two servers: one for the back end app, and one that serves the front end. They are abstracted but share the same database, and I need both to communicate real-time events between each other using socket.io.

Front end:

    // serves a front end website
    var appPort = 9200;
    var express = require('express');
    var app = express();
    var http = require('http');
    var server = http.createServer(app);
    var io = require('socket.io').listen(server);

    io.sockets.on('connection', function(socket) {
        socket.on('createRoom', function(room) {
            socket.join(room); // use this to create a room for each socket; (room) is from the client side
        });

        socket.on('messageFromClient', function(data) {
            console.log(data);
            socket.broadcast.to(data.chatRoom).emit('messageFromServer', data);
        });
    });

Back end:

    // serves a back end app
    var appPort = 3100;
    var express = require('express');
    var app = express();
    var http = require('http');
    var server = http.createServer(app);
    var io = require('socket.io').listen(server);

    io.sockets.on('connection', function(socket) {
        socket.on('createRoom', function(room) {
            socket.join(room); // use this to create a room for each socket; (room) is from the client side
        });

        socket.on('messageFromClient', function(data) {
            console.log(data);
            socket.broadcast.to(data.chatRoom).emit('messageFromServer', data);
        });
    });

As an admin, I want to log in to my back end, where I can see all the people logged in; there I can click on whom I would like to chat with. Say they are logged in to the front end website: when the admin submits a message client-side, they trigger socket.emit('messageFromClient', Message). How can I trigger messageFromClient on the front end (port 9200) when submitting from the back end (port 3100)?
You really don't need to start a socket.io server on the front end for this use case. The way to get it to work is:

1. Keep the back end as it is, acting as the Socket.IO server.
2. In the front end, connect to that server using a Socket.IO client.

You can install the client by calling npm install socket.io-client and then connect to the server using:

    var io = require('socket.io-client'),
        socket = io.connect('localhost', { port: 3100 });

    socket.on('connect', function () {
        console.log("socket connected");
    });

    socket.emit('messageFromClient', { user: 'someuser1', msg: 'i am online' });

You can then create a map of socket objects with their corresponding usernames and send a message to a particular user based on your business logic.

More information: you may have to handle cases like client disconnection. You can read more about the callbacks for that here: https://github.com/Automattic/socket.io-client
Nodejs Cluster: Choose Worker
I use Nodejs Cluster and have 8 workers. Whenever I go to the application, I get connected to the same worker, which is normal, since one worker can handle multiple clients. For testing purposes, I'd like to connect to different workers without having to siege the application. Is there a way to do that? For example, going to mywebsite.com/3 would connect to the 3rd worker.
Here is a port-based solution:

    var cluster = require('cluster');
    var http = require('http');

    if (cluster.isMaster) {
        cluster.fork();
        cluster.fork();
        cluster.fork();
        return;
    }

    function app(req, res) {
        res.writeHead(200);
        res.end('hello from ' + cluster.worker.id);
    }

    http.createServer(app).listen(8000);
    http.createServer(app).listen(8000 + cluster.worker.id);

For example, if you wish to connect to worker 2, you use port 8002.