stream stdout causes RAM usage to increase dramatically - javascript

I use spawn to run a command that runs continuously (it is not supposed to stop) and streams data to its output. The problem is that the RAM usage of the Node app increases constantly.
After multiple tests, I narrowed the problem down to the following piece of code, which reproduces it even though the handlers are almost empty:
const { spawn } = require('child_process');

const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true });
  ffmpeg.on('exit', function (code) { code = null; });
  ffmpeg.stderr.on('data', function (data) { data = null; });
  ffmpeg.stdout.on('data', function (data) { data = null; });
};
I get the same problem with the following:
const fs = require('fs');
const { spawn } = require('child_process');

const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true });
  ffmpeg.on('exit', function (code) { code = null; });
  ffmpeg.stderr.on('data', function (data) { data = null; });
  ffmpeg.on('spawn', function () {
    ffmpeg.stdout.pipe(fs.createWriteStream('/dev/null'));
  });
};
The important part is that when I remove the function (data) {} handler from ffmpeg.stdout.on('data', function (data) {});, the problem goes away. The received data is a Buffer object, so I think the problem lies in that part.
The problem also appears when spawn pipes the data out to another writable (even to /dev/null).
UPDATE: After hours of research, I found out that it's something related to the spawn output and stream backpressure. I configured the FFmpeg command to send chunks less frequently, which mitigated the problem (memory grows more slowly than before), but memory usage is still increasing.
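For reference, a small measurement aid (not part of the code above, just a sketch for quantifying the growth): log process.memoryUsage() periodically while the command runs. Buffers coming from the child's streams typically show up in rss and external rather than in the JS heap.
// Measurement aid only: print memory stats every 10 seconds while the command runs.
setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1) + ' MB';
  console.log(`rss=${mb(rss)} heapUsed=${mb(heapUsed)} external=${mb(external)}`);
}, 10000);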

If you delete the ffmpeg.stdout.on('data', function (data) {}); line the problem fades away, but only partially, because ffmpeg keeps writing to stdout and may eventually stop, waiting for stdout to be consumed. For example, MongoDB has this "pause until stdout is empty" logic.
If you are not going to process the stdout, just ignore it with this:
const { spawn } = require('child_process');

const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true, stdio: "ignore" });
  ffmpeg.on('exit', function (code) { code = null; });
};
This makes the spawned process dump its stdout and stderr, so there is nothing to consume. It is the correct approach, as you don't need to waste CPU cycles and resources reading a buffer that you are going to discard. Take into account that although you only add a one-liner to read and discard the data, libuv (the Node.js I/O manager, among other things) does more complex work to read that data.
Still, I'm pretty sure that you are facing this bug: https://github.com/Unitech/pm2/issues/5145
It also seems that if you output too many logs, pm2 can't write them to the output files as fast as needed, so reducing the log output can fix the problem: https://github.com/Unitech/pm2/issues/1126#issuecomment-996626921
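If pm2 is managing the process, a related mitigation (a sketch, assuming you use an ecosystem file; the app name and script path are placeholders) is to point pm2's log files at /dev/null so it never has to keep up with the child's output:
// ecosystem.config.js - name/script are placeholders for your app
module.exports = {
  apps: [{
    name: 'ffmpeg-runner',
    script: './index.js',
    out_file: '/dev/null',   // discard captured stdout
    error_file: '/dev/null'  // discard captured stderr
  }]
};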

As you mentioned you need the stdout output, stdio: "ignore" is not an option.
Depending on what you're doing with the data, you may be receiving more data than you can handle. Therefore buffers build up and fill your memory.
A possible solution would be to pause and resume the stream when too much data builds up.
ffmpeg.stdout.on('data', function (data) {
  ffmpeg.stdout.pause();
  doSomethingWithDataAsyncWhichTakesAWhile(data).finally(() => ffmpeg.stdout.resume());
});
When to pause and resume the stream depends heavily on how you handle the data.
Using it in combination with a writable (which, if I'm not mistaken, is what you're doing):
ffmpeg.stdout.on('data', function (data) {
  if (!writeable.write(data)) {
    /* We need to wait for the 'drain' event. */
    ffmpeg.stdout.pause();
    writeable.once('drain', () => ffmpeg.stdout.resume());
  }
});
writeable.write(...) returns false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true. source.
If you're ignoring this you'll end up building up buffers in memory.
This might be the cause of your problem.
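The drain-based loop above is essentially what pipe()/pipeline() do internally, so an equivalent sketch (the destination path here is just an example) lets Node handle the pause/resume bookkeeping for you:
const fs = require('fs');
const { pipeline } = require('stream');

// Example destination; replace with the writable you actually feed.
const destination = fs.createWriteStream('/tmp/ffmpeg-output.dat');

// pipeline() propagates backpressure: ffmpeg.stdout is only read as fast as
// the destination accepts data, and errors/cleanup are handled for you.
pipeline(ffmpeg.stdout, destination, (err) => {
  if (err) console.error('pipeline failed:', err);
});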
PS: As a side note:
At least on Unix systems, when the stdout output buffer is full and not being read (e.g. because the stream is paused), the application writing to stdout will block until there is space to write into. In the case of ffmpeg this is not an issue and is the intended behaviour, but it's something to be mindful of.

Related

How to efficiently stream a real-time chart from a local data file

complete noob picking up NodeJS over the last few days here, and I've gotten myself in big trouble, it looks like. I've currently got a working Node JS+Express server instance, running on a Raspberry Pi, acting as a web interface for a local data acquisition script ("the DAQ"). When executed, the script writes out data to a local file on the Pi, in .csv format, writing out in real-time every second.
My Node app is a simple web interface to start (on click) the data acquisition script, as well as to plot previously acquired data logs and visualize the data being actively collected in real time. Plotting of old logs was simple, and I wrote a JS function (using Plotly + d3) to read a local csv file via AJAX call and plot it - using this script as a starting point, but using the logs served by express rather than an external file.
When I went to translate this into a real-time plot, I started out using the setInterval() method to update the graph periodically, based on other examples. After dealing with a few unwanted recursion issues, and adjusting the interval to a more reasonable setting, I eliminated the memory/traffic issues which were crashing the browser after a minute or two, and things are mostly stable.
However, I need help with one thing primarily:
Improving the efficiency of my first-attempt approach: the data absolutely needs to be written to file every second, but considering that a typical run might last 1-2 weeks, the file being requested on every interval loop will quickly start to balloon. I'm completely new to Node/Express, so I'm sure there's a much better way of doing the real-time rendering aspect of this - that's the real issue here. Any pointers toward a better way to go about this would be massively helpful (one possible direction is sketched after the MWE below)!
Right now, the killDAQ() call issued by the "Stop" button kills the underlying Python process writing the data to disk. Is there a way to hook into that same button click to also terminate the setInterval() loop updating the graph? There's no need for it to keep updating after the data acquisition has stopped, so having a single click do double duty would be ideal. I think that setting up a listener or a req/res approach would be an option, but pointers in the right direction would be massively helpful.
(Edit: I solved #2, using global window. variables. It's a hack, but it seems to work:
window.refreshIntervalId = setInterval(foo);
...
clearInterval(window.refreshIntervalId);
)
Thanks so much for the help!
MWE:
html (using Pug as a template engine):
doctype html
html
  body.default
    .container-fluid
      .row
        .col-md-5
          .row.text-center
            .col-md-6
              button#start_button(type="button", onclick="makeCallToDAQ()") Start Acquisition
            .col-md-6
              button#stop_button(type="button", onclick="killDAQ()") Stop Acquisition
        .col-md-7
          #myDAQDiv(style='width: 980px; height: 500px;')
javascript (start/stop acquisition):
function makeCallToDAQ() {
  fetch('/start_daq', {
    // call to app to start the acquisition script
  })
  .then(console.log(dateTime))
  .then(function(response) {
    console.log(response)
    setInterval(function(){ callPlotly(dateTime.concat('.csv')); }, 5000);
  });
}
function killDAQ() {
  fetch('/stop_daq')
  // kills the process
  .then(function(response) {
    // Use the response sent here
    alert('DAQ has stopped!')
  })
}
javascript (call to Plotly for plotting):
function callPlotly(filename) {
  var csv_filename = filename;
  console.log(csv_filename)
  function makeplot(csv_filename) {
    // Read data via AJAX call and grab header names
    var headerNames = [];
    d3.csv(csv_filename, function(error, data) {
      headerNames = d3.keys(data[0]);
      processData(data, headerNames)
    });
  };
  function processData(allRows, headerNames) {
    // Plot data from relevant columns
    var plotDiv = document.getElementById("plot");
    var traces = [{
      x: x,
      y: y
    }];
    Plotly.newPlot('myDAQDiv', traces, plotting_options);
  };
  makeplot(filename);
}
node.js (the actual Node app):
// Start the DAQ
app.use(express.json());
var isDaqRunning = true;
var pythonPID = 0;
const { spawn } = require('child_process')
var process;
app.post('/start_daq', function(req, res) {
  isDaqRunning = true;
  // Call the python script here.
  const process = spawn('python', ['../private/BIC_script.py', arg1, arg2])
  pythonPID = process.pid;
  process.stdout.on('data', (myData) => {
    res.send("Done!")
  })
  process.stderr.on('data', (myErr) => {
    // If anything gets written to stderr, it'll be in the myErr variable
  })
  res.status(200).send(); //.json(result);
})
// Stop the DAQ
app.get('/stop_daq', function(req, res) {
  isDaqRunning = false;
  process.on('close', (code, signal) => {
    console.log(
      `child process terminated due to receipt of signal ${signal}`);
  });
  // Send SIGTERM to process
  process.kill('SIGTERM');
  res.status(200).send();
})
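On the first point above, a rough sketch of one possible direction (not from the original post; the endpoint name and log path are placeholders): have the server track how many bytes the client has already received and return only the newly appended lines on each poll, instead of re-sending the whole CSV. The client can then append just the new points with Plotly.extendTraces rather than redrawing the full plot.
const fs = require('fs');

// Hypothetical endpoint: ?offset= is the byte count the client has already received.
app.get('/data_since', function (req, res) {
  const filePath = '../private/current_log.csv'; // placeholder log path
  const offset = parseInt(req.query.offset, 10) || 0;

  fs.stat(filePath, function (err, stats) {
    if (err) return res.status(500).send(err.message);
    if (stats.size <= offset) return res.json({ offset: offset, lines: [] });

    // Read only the bytes appended since the last poll.
    let appended = '';
    fs.createReadStream(filePath, { start: offset, encoding: 'utf8' })
      .on('data', function (chunk) { appended += chunk; })
      .on('end', function () {
        res.json({ offset: stats.size, lines: appended.split('\n').filter(Boolean) });
      });
  });
});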

Why does my typescript program randomly stop running?

I wrote a very simple typescript program, which does the following:
Transform users.csv into an array
For each element/user issue an API call to create that user on a 3rd party platform
Print any errors
The csv file has >160,000 rows and there is no way to create them all in one API call, so I wrote this program to run in the background on my computer for roughly 20 hours.
The first time I ran this, the code stopped mid-loop without an exception or anything. So I deleted the already-uploaded user rows from the csv file and re-ran the code. Unfortunately, this kept happening.
Interestingly, the code has stopped at non-deterministic iterations, one time it was at i=812, another at i=27650, and so on.
This is the code:
import { promises as fsPromises } from "fs";
import axios from "axios";
// makeArray() and sleep() are helpers defined elsewhere in the project.

const main = async () => {
  const usersFile = await fsPromises.readFile("./users.csv", { encoding: "utf-8" });
  const usersArr = makeArray(usersFile);
  for (let i = 0; i < usersArr.length; i++) {
    const [ userId, email ] = usersArr[i];
    console.log(`uploading ${userId}. ${i}/${usersArr.length}`);
    try {
      await axios.post(/* create user */);
      await sleep(150);
    } catch (err) {
      console.error(`Error uploading ${userId} -`, err.message);
    }
  }
};
main();
main();
I should mention that exceptions are within the for-loop because many rows will fail to upload with a 400 error code. As such, I've preferred to have the code run non-stop and print any errors onto a file, so that I could later re-run it for the users that failed to upload. Otherwise I would have to check whether it halted because of an error every 10 minutes.
Why does this happen? and What can I do?
I run after compiling as: node build/index.js 2>>errors.txt
EDIT:
There is no code after main() and no code outside the try ... catch block within the loop. errors.txt only contains 400 errors. Even if it contained another run-time exception, it seems to me that this wouldn't/shouldn't halt execution, because it would execute catch and move on to the next iteration.
I think this may have been related to this post. The file I was reading was extremely large, as noted, and it was held in a runtime variable. Non-deterministically, the OS could have decided that the memory demand was too high. This is probably a situation where a Readable Stream should be used instead of readFile.
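A minimal sketch of that stream-based approach (the CSV column order, endpoint URL, and payload are assumptions based on the snippet above): read the file line by line with readline so only one row is held in memory at a time.
import { createReadStream } from "fs";
import { createInterface } from "readline";
import axios from "axios";

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const main = async () => {
  const rl = createInterface({
    input: createReadStream("./users.csv", { encoding: "utf-8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let i = 0;
  for await (const line of rl) {
    const [userId, email] = line.split(","); // assumed column order
    console.log(`uploading ${userId}. ${i++}`);
    try {
      await axios.post("https://example.invalid/users", { userId, email }); // placeholder URL/payload
      await sleep(150);
    } catch (err) {
      console.error(`Error uploading ${userId} -`, err.message);
    }
  }
};

main();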

Is there any way to determine if a nodejs childprocess wants input or is just sending feedback?

I had a little free time, so I decided to rewrite all my bash scripts in JavaScript (NodeJS - ES6) with child processes. Everything went smoothly until I wanted to automate user input.
Yes, you can automate the user input. But there is one problem - you can't determine whether a given data event is feedback or a request for input. At least I can't find a way to do it.
So basically you can do this:
// Spawn function from child_process.
let spawn = require('child_process').spawn;
// new ufw process.
let ufw = spawn('ufw', ['enable']);
// Use defined input.
ufw.stdin.setEncoding('utf-8');
ufw.stdout.pipe(process.stdout);
ufw.stdin.write('y\n');
// Event Standard Out.
ufw.stdout.on('data', (data) => {
  console.log(data.toString('utf8'));
});
// Event Standard Error.
ufw.stderr.on('data', (err) => {
  // Log error.
  console.log(err);
});
// When job is finished (with or without error) it ends up here.
ufw.on('close', (code) => {
  // Check if there were errors.
  if (code !== 0) console.log('Exited with code: ' + code.toString());
  // End input stream.
  ufw.stdin.end();
});
The above example works totally fine. But there are 2 things giving me a headache:
Will ufw.stdin.write('y\n'); wait until it is needed, and what happens if I have multiple inputs? For example 'yes', 'yes', 'no'. Do I have to write 3 lines of stdin.write()?
Isn't the position where I use ufw.stdin.write('y\n'); a little confusing? I thought I needed to provide the input after the prompt asked for it, so I decided to change my code so that my stdin.write() would run at the right time - makes sense, right? However, the only place to check when the 'right' time is, is in the stdout.on('data', callback) event. That makes things a little difficult, since I need to know whether the prompt is asking for user input or not...
Here is my code which I think is totally wrong:
// Spawn function from child_process.
let spawn = require('child_process').spawn;
// new ufw process.
let ufw = spawn('ufw', ['enable']);
// Event Standard Out.
ufw.stdout.on('data', (data) => {
  console.log(data.toString('utf8'));
  // Use defined input.
  ufw.stdin.setEncoding('utf-8');
  ufw.stdout.pipe(process.stdout);
  ufw.stdin.write('y\n');
});
// Event Standard Error.
ufw.stderr.on('data', (err) => {
  // Log error.
  console.log(err);
});
// When job is finished (with or without error) it ends up here.
ufw.on('close', (code) => {
  // Check if there were errors.
  if (code !== 0) console.log('Exited with code: ' + code.toString());
  // End input stream.
  ufw.stdin.end();
});
My major misunderstanding is when to use stdin for (automated) user input and where to place it in my code so it is used at the right time, for example if I have multiple inputs for something like mysql_secure_installation.
So I was wondering if it is possible, and it seems it is not. I posted an issue for Node which ended up being closed: https://github.com/nodejs/node/issues/16214
I am asking for a way to determine if the current process is waiting for an input.
There isn't one. I think you have wrong expectations about pipe I/O because that's simply not how it works.
Talking about expectations, check out expect. There is probably a node.js port if you look around.
I'll close this out because it's not implementable as a feature, and as a question nodejs/help is the more appropriate place.
So if anyone has the same problem I had: you can simply write multiple lines into stdin and use those as predefined values. Keep in mind that this will eventually break the stream if any expected input changes or is wrong in future updates:
// Spawn function from child_process.
let spawn = require('child_process').spawn;
// new msqlsec process.
let msqlsec = spawn('mysql_secure_installation', ['']);
// Arguments as Array.
let inputArgs = ['password', 'n', 'y', 'y', 'y', 'y'];
// Set correct encodings for logging.
msqlsec.stdin.setEncoding('utf-8');
msqlsec.stdout.setEncoding('utf-8');
msqlsec.stderr.setEncoding('utf-8');
// Use defined input and write a line for each of them.
for (let a = 0; a < inputArgs.length; a++) {
  msqlsec.stdin.write(inputArgs[a] + '\n');
}
// Event Standard Out.
msqlsec.stdout.on('data', (data) => {
  console.log(data.toString('utf8'));
});
// Event Standard Error.
msqlsec.stderr.on('data', (err) => {
  // Log error.
  console.log(err);
});
// When job is finished (with or without error) it ends up here.
msqlsec.on('close', (code) => {
  // Check if there were errors.
  if (code !== 0) console.log('Exited with code: ' + code.toString());
  // close input to writeable stream.
  msqlsec.stdin.end();
});
For the sake of completeness if someone wants to fill the user input manually you can simply start the given process like this:
// new msqlsec process.
let msqlsec = spawn('mysql_secure_installation', [''], { stdio: 'inherit', shell: true });
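If you do need to react to a specific prompt instead of pre-filling stdin, a workaround sketch (the prompt pattern is an assumption - ufw's exact wording may differ by version, and some tools only print prompts when attached to a TTY) is to buffer stdout and answer only once the prompt text appears:
let spawn = require('child_process').spawn;
let ufw = spawn('ufw', ['enable']);
ufw.stdin.setEncoding('utf-8');
ufw.stdout.setEncoding('utf-8');

let buffered = '';
ufw.stdout.on('data', (data) => {
  process.stdout.write(data);
  buffered += data;
  // Assumed prompt pattern; adjust to whatever the tool actually prints.
  if (/\(y\|n\)\?\s*$/.test(buffered)) {
    ufw.stdin.write('y\n');
    buffered = '';
  }
});

ufw.on('close', (code) => {
  if (code !== 0) console.log('Exited with code: ' + code);
  ufw.stdin.end();
});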

Nodejs child_process execute shell command

I am working on a university project where I have to evaluate the security threats to an open WiFi network. I have chosen the aircrack-ng set of tools for penetration testing. My project uses Node.js for its rich set of features. However, I am a beginner and am struggling to solve a problem. Firstly, I shall present my code and then pose the problem.
var spawn = require('child_process').spawn;
var nic = "wlan2";
// obtain uid number of a user for spawning a new console command
// var uidNumber = require("uid-number");
// uidNumber("su", function (er, uid, gid) {
//   console.log(uid);
// });
// Check for monitor tools
var airmon_ng = spawn('airmon-ng');
airmon_ng.stdout.on('data', function (data) {
  nicList = data.toString().split("\n");
  // use for data binding
  console.log(nicList[0]); //.split("\t")[0]);
});
// airmon start at the nic(var)
var airmon_ng_start = spawn('airmon-ng', ['start', nic]).on('error', function (err) { console.log(err); });
airmon_ng_start.stdout.on('data', function (data) {
  console.log(data.toString());
});
var airmon_ng_start = spawn('airodump-ng', ['mon0']).on('error', function (err) { console.log(err); });
airmon_ng_start.stdout.on('data', function (data) {
  console.log(data.toString());
});
As seen in the above code, I use child_process.spawn to execute the shell command. In the line "var airmon_ng_start = spawn(......", the actual command executes in the terminal and doesn't end until Ctrl+C is hit, and it regularly updates the list of Wi-Fi networks available in the vicinity. My goal is to identify the network that I wish to test for vulnerability. However, when I execute the command, the process waits indefinitely for the shell command to terminate (which never terminates unless killed). Moreover, I wish to use the stdout stream to display the new set of data as the Wi-Fi scan finds and updates networks. May the Node.js experts provide me with a better way to do this?
2) Also, I wish to execute some commands as root. How may this be done? For now I am running the JavaScript as root. However, in the project I wish to execute only some of the commands as root and not the entire JS file as root. Any suggestions?
// inherit parent's stdout stream
var airmon_ng_start = spawn('airodump-ng', ['mon0'], { stdio: 'inherit' })
  .on('error', function (err) { console.log(err); });
Found this solution. Simply inherit the parent's stdout.
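For the second question (running only some commands as root), one common approach - a sketch, not from the original answer, and it assumes sudo is allowed to run the command without an interactive password prompt (e.g. via a sudoers rule) - is to prefix just the privileged command with sudo so the Node process itself stays unprivileged:
var spawn = require('child_process').spawn;

// Only this command runs as root; the Node process itself does not.
var airmon_ng_start = spawn('sudo', ['airmon-ng', 'start', 'wlan2'])
  .on('error', function (err) { console.log(err); });

airmon_ng_start.stdout.on('data', function (data) {
  console.log(data.toString());
});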

Stdout of Node.js child_process exec is cut short

In Node.js I'm using the exec command of the child_process module to call an algorithm in Java that returns a large amount of text to standard out, which I then parse and use. I'm able to capture it mostly, but when it exceeds a certain number of lines, the content is cut off.
exec("sh target/bin/solver "+fields.dimx+" "+fields.dimy, function(error, stdout, stderr){
//do stuff with stdout
}
I've tried using setTimeouts and callbacks but haven't succeeded, but I do feel this is occurring because I'm referencing stdout in my code before it can be retrieved completely. I have tested that stdout is in fact where the data loss first occurs; it's not an asynchronous issue further down the line. I've also tested this on my local machine and Heroku, and the exact same issue occurs, truncating at the exact same line number every time.
Any ideas or suggestions as to what might help with this?
I had exec.stdout.on('end') callbacks hang forever with @damphat's solution.
Another solution is to increase the buffer size in the options of exec: see the documentation here
{ encoding: 'utf8',
  timeout: 0,
  maxBuffer: 200 * 1024, // increase here
  killSignal: 'SIGTERM',
  cwd: null,
  env: null }
To quote: maxBuffer specifies the largest amount of data allowed on stdout or stderr - if this value is exceeded then the child process is killed. I now use the following; unlike the accepted solution, it does not require collecting the stdout chunks and joining them (which introduces commas between chunks).
exec('dir /b /O-D ^2014*', {
  maxBuffer: 2000 * 1024 // quick fix
}, function (error, stdout, stderr) {
  list_of_filenames = stdout.split('\r\n'); // adapt to your line ending char
  console.log("Found %s files in the replay folder", list_of_filenames.length)
});
The real (and best) solution to this problem is to use spawn instead of exec.
As stated in this article, spawn is more suited for handling large volumes of data :
child_process.exec returns the whole buffer output from the child process. By default the buffer size is set at 200k. If the child process returns anything more than that, your program will crash with the error message "Error: maxBuffer exceeded". You can fix that problem by setting a bigger buffer size in the exec options. But you should not do it, because exec is not meant for processes that return HUGE buffers to Node. You should use spawn for that. So what do you use exec for? Use it to run programs that return result statuses, instead of data.
spawn requires a different syntax than exec :
var proc = spawn('sh', ['target/bin/solver', fields.dimx, fields.dimy]);
proc.on("exit", function (exitCode) {
  console.log('process exited with code ' + exitCode);
});
proc.stdout.on("data", function (chunk) {
  console.log('received chunk ' + chunk);
});
proc.stdout.on("end", function () {
  console.log("finished collecting data chunks from stdout");
});
Edited:
I have tried with dir /s on my computer (Windows) and got the same problem (it looks like a bug); this code solves that problem for me:
var exec = require('child_process').exec;

function my_exec(command, callback) {
  var proc = exec(command);
  var list = [];
  proc.stdout.setEncoding('utf8');
  proc.stdout.on('data', function (chunk) {
    list.push(chunk);
  });
  proc.stdout.on('end', function () {
    callback(list.join());
  });
}
my_exec('dir /s', function (stdout) {
  console.log(stdout);
})
