node js clustering is repeating the same task on all 8 processes - javascript

I've been trying to enable clustering in my Node.js app. Currently I use this snippet to enable it:
var cluster = require('cluster');
if (cluster.isMaster) {
    // Count the machine's CPUs
    var cpuCount = require('os').cpus().length;
    // Create a worker for each CPU
    for (var i = 0; i < cpuCount; i += 1) {
        cluster.fork();
    }
    // Listen for dying workers
    cluster.on('exit', function () {
        cluster.fork();
    });
}
My code writes to a Firebase database based on certain conditions. The problem is that each write occurs 8 times: rather than one worker taking care of each write task, every worker seems to perform every task. Is there a way to avoid this? If so, can someone point me to some resources on this? I can't find anything on Google about using Firebase with Node.js clustering. Here is an example of how one of my functions works (ref is my Firebase reference):
ref.child('user-sent').on('child_added', function(snapshot) {
    // Read the stored value, not the snapshot object itself
    var message = snapshot.child('message').val();
    var payload = {};
    payload['user-received/'] = message;
    ref.update(payload); // this occurs once for each fork so it updates 8 times
});

If you're forking 8 worker processes and each one attaches a listener on the same location (user-sent), then each worker will fire the child_added event for each child under that location. This is the expected behavior.
If you want to implement a worker queue, where each node under user-sent is only handled by one worker, you'll have to use a work-distribution mechanism that ensures only one worker can claim each node.
The firebase-queue library implements such a work claim mechanism, using Firebase Database transactions. It's been used to scale to a small to medium number of workers (think < 10, not dozens).
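To make the claim idea concrete, here is a minimal sketch of the kind of update function a transaction-based claim could use. It is not firebase-queue's actual implementation. The `claimedBy` field, the `makeClaimUpdate` helper, and the in-memory `runLocalTransaction` stand-in are all assumptions made for illustration; the stand-in exists only so the example runs without a Firebase connection.

```javascript
// Sketch: let exactly one worker claim each node under user-sent.
// makeClaimUpdate() builds the update function that would be passed to a
// database transaction on a hypothetical per-node "claimedBy" field.
function makeClaimUpdate(workerId) {
    return function (currentOwner) {
        if (currentOwner === null || currentOwner === undefined) {
            return workerId;   // unclaimed: write our id
        }
        return undefined;      // already claimed: abort the transaction
    };
}

// In-memory stand-in for a transaction (assumption, not a Firebase API).
function runLocalTransaction(store, key, updateFn) {
    var current = Object.prototype.hasOwnProperty.call(store, key) ? store[key] : null;
    var next = updateFn(current);
    if (next === undefined) {
        return false;          // aborted, someone else owns the node
    }
    store[key] = next;
    return true;               // committed, this worker owns the node
}

// Simulate 8 workers racing to claim the same node.
var claims = {};
var winners = [];
for (var id = 1; id <= 8; id += 1) {
    if (runLocalTransaction(claims, 'message-1', makeClaimUpdate('worker-' + id))) {
        winners.push('worker-' + id);   // only the winner would go on to call ref.update()
    }
}
console.log(winners); // [ 'worker-1' ]
```

With the real database, the same update function could be passed to `transaction()` on the node's reference, and the write to user-received would happen only when the completion callback reports that the transaction was committed.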

Related

How to terminate a Web worker inside addEventListener()?

I have a Web Worker here.
//worker.js
self.addEventListener("message", function(event) {
    self.postMessage(event.data);
    self.close();
});
And here is the main JS file.
//main.js
var webworker = new Worker("worker.js");
webworker.postMessage('Send something.');
webworker.addEventListener("message", function(event) {
    webworker.terminate();
});
The problem is that the "webworker.terminate();" call inside the addEventListener callback doesn't seem to work. There were no warnings or errors, but the web worker was not terminated.
I am creating an autocomplete input that creates workers on every keystroke. I use the workers to search through a large data set for results. The problem is that too many workers get created, so I want to terminate them. I know they are still running because of the CPU usage, and because the code after "webworker.terminate();" still runs when it should not.
I have simplified the code here. I create 8 workers for each keystroke in the input field, and they work with the input value each time. The problem is that the old workers are still active when I type a new value into the input field; I just want to terminate them.
inp.addEventListener("input", function(e) {
    for (let i = 0; i < 8; i++) {
        var webworker = new Worker("autocomplete/worker.js");
        webworker.postMessage(inp.value);
        webworker.addEventListener("message", function(event) {
            //do thing
        });
    }
});
Is there a way to terminate the Web Worker in this scenario? I just want to terminate the worker right after its response arrives.
Thank you and sorry for my bad English.
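One possible approach, sketched below under some assumptions, is to keep the current batch of workers in an array, terminate the whole batch when a new keystroke arrives, and terminate each worker as soon as it has answered. The names `createBatchManager`, `startBatch`, and `onResult` are hypothetical. The worker factory is passed in as a parameter so the logic can be exercised outside a browser. Note that the worker is declared with `const` inside the loop: with `var`, every callback would share one binding and could only ever terminate the last worker created, which may be part of what is going wrong in the original code.

```javascript
// Sketch: track the current batch of workers so stale ones can be stopped.
// createWorker is injected; in a page it could be
//   function () { return new Worker("autocomplete/worker.js"); }
function createBatchManager(createWorker, batchSize) {
    let active = [];

    function terminateAll() {
        active.forEach(function (worker) { worker.terminate(); });
        active = [];
    }

    function startBatch(query, onResult) {
        terminateAll();                       // stop workers from the previous keystroke
        for (let i = 0; i < batchSize; i++) {
            const worker = createWorker();    // const: each callback closes over its own worker
            worker.addEventListener("message", function (event) {
                onResult(event.data);
                worker.terminate();           // stop this worker once it has answered
                active = active.filter(function (w) { return w !== worker; });
            });
            worker.postMessage(query);
            active.push(worker);
        }
    }

    return {
        startBatch: startBatch,
        terminateAll: terminateAll,
        activeCount: function () { return active.length; }
    };
}
```

In the page, the input listener would then only need to call `startBatch(inp.value, ...)` with whatever function renders the results.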

How to efficiently stream a real-time chart from a local data file

Complete noob picking up Node.js over the last few days here, and it looks like I've gotten myself in big trouble. I've currently got a working Node.js + Express server instance, running on a Raspberry Pi, acting as a web interface for a local data acquisition script ("the DAQ"). When executed, the script writes out data to a local file on the Pi, in .csv format, writing out in real time every second.
My Node app is a simple web interface to start (on-click) the data acquisition script, as well as to plot previously acquired data logs, and visualize the actively being collected data in real time. Plotting of old logs was simple, and I wrote a JS function (using Plotly + d3) to read a local csv file via AJAX call, and plot it - using this script as a starting point, but using the logs served by express rather than an external file.
When I went to translate this into a real-time plot, I started out using the setInterval() method to update the graph periodically, based on other examples. After dealing with a few unwanted recursion issues, and adjusting the interval to a more reasonable setting, I eliminated the memory/traffic issues which were crashing the browser after a minute or two, and things are mostly stable.
However, I need help with one thing primarily:
1. Improving the efficiency of my first attempt approach: This acquisition script absolutely needs to be written to file every second, but considering that a typical run might last 1-2 weeks, the file size being requested on every Interval loop will quickly start to balloon. I'm completely new to Node/Express, so I'm sure there's a much better way of doing the real-time rendering aspect of this - that's the real issue here. Any pointers of a better way to go about doing this would be massively helpful!
2. Right now, the killDAQ() call issued by the "Stop" button kills the underlying python process writing out the data to disk. Is there a way to hook into using that same button click to also terminate the setInterval() loop updating the graph? There's no need for it to be updated any longer after the data acquisition has been stopped so having the single click do double duty would be ideal. I think that setting up a listener or res/req approach would be an option, but pointers in the right direction would be massively helpful.
(Edit: I solved #2, using global window. variables. It's a hack, but it seems to work:
window.refreshIntervalId = setInterval(foo);
...
clearInterval(window.refreshIntervalId);
)
Thanks so much for the help!
MWE:
html (using Pug as a template engine):
doctype html
html
  body.default
    .container-fluid
      .row
        .col-md-5
          .row.text-center
            .col-md-6
              button#start_button(type="button", onclick="makeCallToDAQ()") Start Acquisition
            .col-md-6
              button#stop_button(type="button", onclick="killDAQ()") Stop Acquisition
        .col-md-7
          #myDAQDiv(style='width: 980px; height: 500px;')
javascript (start/stop acquisition):
function makeCallToDAQ() {
    fetch('/start_daq', {
        // call to app to start the acquisition script
    })
    .then(function(response) {
        // Log inside the callback; .then(console.log(dateTime)) would run the log immediately
        console.log(dateTime);
        console.log(response);
        setInterval(function() { callPlotly(dateTime.concat('.csv')); }, 5000);
    });
}
function killDAQ() {
    fetch('/stop_daq')
    // kills the process
    .then(function(response) {
        // Use the response sent here
        alert('DAQ has stopped!');
    });
}
javascript (call to Plotly for plotting):
function callPlotly(filename) {
    var csv_filename = filename;
    console.log(csv_filename)
    function makeplot(csv_filename) {
        // Read data via AJAX call and grab header names
        var headerNames = [];
        d3.csv(csv_filename, function(error, data) {
            headerNames = d3.keys(data[0]);
            processData(data, headerNames)
        });
    };
    function processData(allRows, headerNames) {
        // Plot data from relevant columns
        var plotDiv = document.getElementById("plot");
        var traces = [{
            x: x,
            y: y
        }];
        Plotly.newPlot('myDAQDiv', traces, plotting_options);
    };
    makeplot(filename);
}
node.js (the actual Node app):
// Start the DAQ
app.use(express.json());
var isDaqRunning = true;
var pythonPID = 0;
const { spawn } = require('child_process');
// Named daqProcess so it does not shadow Node's global `process`
var daqProcess;
app.post('/start_daq', function(req, res) {
    isDaqRunning = true;
    // Call the python script here.
    // Assign the outer variable so the /stop_daq handler can reach the child
    daqProcess = spawn('python', ['../private/BIC_script.py', arg1, arg2]);
    pythonPID = daqProcess.pid;
    daqProcess.stdout.on('data', (myData) => {
        // Anything the script prints to stdout arrives in myData.
        // Do not call res.send() here: the response is sent once, below.
    });
    daqProcess.stderr.on('data', (myErr) => {
        // If anything gets written to stderr, it'll be in the myErr variable
    });
    res.status(200).send(); //.json(result);
});
// Stop the DAQ
app.get('/stop_daq', function(req, res) {
    isDaqRunning = false;
    daqProcess.on('close', (code, signal) => {
        console.log(
            `child process terminated due to receipt of signal ${signal}`);
    });
    // Send SIGTERM to process
    daqProcess.kill('SIGTERM');
    res.status(200).send();
});
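On the efficiency question, one option is to send the browser only the rows appended since its last request instead of the whole file. The following is a rough sketch of that idea using Node's built-in fs module. The function name `readAppended` and the idea of the client passing back a byte offset are assumptions for illustration, not part of the original app. The synchronous calls keep the example short; an asynchronous read would be kinder to the event loop in a real route.

```javascript
const fs = require('fs');

// Read only the bytes appended to filePath since byte position `offset`.
// Returns the new complete rows plus the offset to use on the next call.
function readAppended(filePath, offset) {
    const size = fs.statSync(filePath).size;
    if (size <= offset) {
        return { text: '', nextOffset: offset };      // nothing new yet
    }
    const buffer = Buffer.alloc(size - offset);
    const fd = fs.openSync(filePath, 'r');
    let bytesRead;
    try {
        bytesRead = fs.readSync(fd, buffer, 0, buffer.length, offset);
    } finally {
        fs.closeSync(fd);
    }
    const data = buffer.slice(0, bytesRead);
    // The DAQ may be in the middle of writing a row, so keep only complete lines.
    const lastNewline = data.lastIndexOf(0x0a);
    const complete = data.slice(0, lastNewline + 1);
    return { text: complete.toString('utf8'), nextOffset: offset + complete.length };
}
```

A route could return this object as JSON, with the page remembering `nextOffset` between polls. On the plotting side, Plotly.extendTraces can append the new points to the existing chart, which avoids rebuilding it with Plotly.newPlot on every tick.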

How do I know I've hit the threads limit defined in Node?

I have limited the size of the thread pool to 25.
process.env.UV_THREADPOOL_SIZE = 25;
How can one know at run time that all the threads are exhausted?
Is there any way to find out, when a new request arrives, that all the defined threads are in use?
I'm using Native Abstractions for Node.js (NAN) to call C++ functions. For every request to C++, Nan::AsyncQueueWorker is called. Here I want to find out whether the thread limit is exhausted and then add a safety factor.
Are you looking for an implementation in NAN or in JS?
In a NAN implementation:
You have to do it manually. Maintain a map where the key is an int and the value is the async work item. Add an entry on every call and delete it when that work item completes. Do this for every request.
Then compare the size of the map with the thread limit you defined.
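The same bookkeeping can be sketched on the JS side, as below. This is only an illustration of the map-and-compare idea: `createTracker`, `run`, and `isSaturated` are hypothetical names, and the wrapped call is assumed to be a callback-style native function. Keep in mind that the libuv thread pool is also used by other Node APIs (file system, DNS lookups, some crypto and zlib calls), so a count like this only covers your own jobs.

```javascript
// Sketch: count in-flight jobs and compare the count with the pool size.
const POOL_SIZE = 25;   // same value as UV_THREADPOOL_SIZE in the question

function createTracker(limit) {
    const inFlight = new Map();   // job id -> start timestamp
    let nextId = 0;
    return {
        // startJob(done) kicks off the async work and must call done() when it finishes.
        run: function (startJob, callback) {
            const id = nextId++;
            inFlight.set(id, Date.now());
            startJob(function () {
                inFlight.delete(id);              // job finished, free its slot
                callback.apply(null, arguments);  // hand results to the caller
            });
        },
        isSaturated: function () { return inFlight.size >= limit; },
        size: function () { return inFlight.size; }
    };
}

const tracker = createTracker(POOL_SIZE);
```

A request handler could check `tracker.isSaturated()` before dispatching new work and apply its safety factor (for example queueing or rejecting the request) when it returns true.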
I analyzed the nan and libuv sources. Unfortunately, there is currently no way to get the number of used threads, unless you add this feature yourself.
It looks like this cluster module might be able to help...
var cluster = require('../')
  , http = require('http');
var server = http.createServer(function(req, res){
    res.writeHead(200);
    res.end('Hello World');
});
var workerCount = 0;
cluster(server)
    .listen(3000)
    .on('worker', () => {
        workerCount++;
        console.log('workerCount', workerCount)
    })
    .on('worker killed', () => {
        workerCount--;
        console.log('workerCount', workerCount)
    })
It also appears that you can access the worker count directly from the master with the REPL "telnet" plugin; see the docs:
http://learnboost.github.io/cluster/docs/stats.html

What caused process.hrtime() hanging in nodejs?

Here is the code:
var process = require('process')
var c = 0;
while (true) {
    var t = process.hrtime();
    console.log(++c);
}
Here is my environment:
Node.js v4.2.4, Ubuntu 14.04 LTS on Oracle VM VirtualBox v5.0.4 r102546 running in Windows 7
This loop can only run about 60k to 80k times before it hangs. Nothing happens after that.
On my colleague's computer it runs maybe 40k to 60k times. But shouldn't this loop continue forever?
I was first running a benchmark which tests avg execution time of setting up connections, so I can't just get the start time at first then end time after everything finished.
Is this related to the OS that I use?
Thanks if anyone knows the problem.
==========================================================
update 2016.4.13:
One day after I raised this question, I realized what a stupid question it was. And it was not what I really wanted to do. So I'm going to explain it further.
Here is the testing structure:
I have a Node server which handles connections. The client sends a 'setup' event on the 'connect' event. The server then subscribes to a Redis channel, makes some queries to the db, and calls the client's callback for the 'setup' event. The client disconnects the socket in the 'setup' callback, and reconnects on the 'disconnect' event.
The client code uses socket.io-client to run in the backend and cluster to simulate high concurrency.
The code looks like this:
(some of the functions are not listed here)
[server]
socket.on('setup', function(data, callback) {
    queryFromDB();
    subscribeToRedis();
    callback();
});
[client]
var requests = 1000;
if (cluster.isMaster) {
    for (var i = 0; i < 100; ++i) {
        cluster.fork();
    }
} else {
    var count = 0;
    var startTime = process.hrtime();
    socket = io.connect(...);
    socket.on('connect', function() {
        socket.emit('setup', {arg1:'...', arg2:'...'}, function() {
            var setupEndTime = process.hrtime();
            calculateSetupTime(startTime, setupEndTime);
            socket.disconnect();
        });
    });
    socket.on('disconnect', function() {
        if (count++ < requests) {
            var disconnectEndTime = process.hrtime();
            calculateSetupTime(startTime, disconnectEndTime);
            socket.connect();
        } else {
            process.exit();
        }
    });
}
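The calculateSetupTime helper is one of the functions not listed above. For readers following along, here is a guess at what such a helper might look like; it is a hypothetical version, not the original. It relies only on the fact that process.hrtime() returns a [seconds, nanoseconds] pair.

```javascript
// Hypothetical version of the helper that the question does not list.
// Both arguments are [seconds, nanoseconds] pairs as returned by process.hrtime().
function calculateSetupTime(startTime, endTime) {
    var seconds = endTime[0] - startTime[0];
    var nanoseconds = endTime[1] - startTime[1];   // may be negative; the sum below still works
    return seconds * 1e3 + nanoseconds / 1e6;      // elapsed time in milliseconds
}

console.log(calculateSetupTime([1, 500000000], [3, 250000000])); // 1750
```

Alternatively, calling process.hrtime(startTime) returns the elapsed time since startTime directly, in the same [seconds, nanoseconds] form, which removes the need to subtract two readings by hand.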
At first, only 500 or 600 connections could be made. After I removed all the hrtime() code, it made it to 1000. But later, when I raised the number of requests to around 2000 (still without the hrtime() code), it could not finish again.
I was totally confused. Yesterday I thought it was related to hrtime, but of course it wasn't; any infinite loop would hang. I was misled by hrtime.
But what's the problem now?
===================================================================
update 2016.4.19
I solved this problem.
The reason is that my client code uses socket.disconnect and socket.connect to simulate a new user. This is wrong.
In this case the server may not recognize that the old socket has disconnected. You have to discard your socket object and create a new one.
As a result, the connection count may not equal the disconnection count, which prevents the code from disconnecting from Redis, so the whole loop hangs because Redis stops responding.
Your code is an infinite loop - at some point this will always exhaust system resources and cause your application to hang.
Other than causing your application to hang, the code you have posted does very little else. Essentially, it could be described like this:
For the rest of eternity, or until my app hangs (whichever happens first):
1. Get the current high-resolution real time, and then ignore it without doing anything with it.
2. Increment a number and log it.
3. Repeat as quickly as possible.
If this is really what you wanted to do - you have achieved it, but it will always hang at some point. Otherwise, you may want to explain your desired result further.

Load testing node.js app on Amazon EC2 instance

I am trying to load test my Node.js application, an API endpoint hosted on an m4.large instance, using JMeter with 1 master and 3 slaves. The 'server.js' file uses Node.js clustering as follows:
var C_NUM_CPU = 2;
if (cluster.isMaster) {
    for (var i = 0; i < C_NUM_CPU; i++) {
        cluster.fork();
    }
    // Listen for dying workers
    cluster.on('exit', function (worker) {
        // Replace the dead worker
        console.log('Worker %d died :(', worker.id);
        cluster.fork();
    });
    return;
}
When I tested with 'var C_NUM_CPU = 2', the max response time exceeded 42s; however, after changing it to 6, the response time dropped to 1.7s! An m4.large has just 2 vCPUs, so how is the load being handled? Also, in such a case, how do I determine the optimal instance type?
The issue was the JMeter slaves. They were dying due to the increased response time. It was solved by increasing the number of slaves.
