Stdout of Node.js child_process exec is cut short

In Node.js I'm using the exec command of the child_process module to call a Java algorithm that writes a large amount of text to standard out, which I then parse and use. I'm able to capture most of it, but once it exceeds a certain number of lines, the content is cut off.
exec("sh target/bin/solver "+fields.dimx+" "+fields.dimy, function(error, stdout, stderr){
//do stuff with stdout
}
I've tried using setTimeouts and callbacks without success, but I do feel this is occurring because I'm referencing stdout in my code before it has been fully retrieved. I have tested that stdout is in fact where the data loss first occurs; it's not an asynchronous issue further down the line. I've also tested this on my local machine and on Heroku, and exactly the same issue occurs, truncating at the exact same line number every time.
Any ideas or suggestions as to what might help with this?

My exec.stdout.on('end') callbacks hung forever with #damphat's solution.
Another solution is to increase the buffer size in the options of exec; see the documentation here:
{ encoding: 'utf8',
  timeout: 0,
  maxBuffer: 200 * 1024, // increase here
  killSignal: 'SIGTERM',
  cwd: null,
  env: null }
To quote the documentation: maxBuffer specifies the largest amount of data allowed on stdout or stderr; if this value is exceeded, the child process is killed. I now use the following; unlike the accepted solution, it does not require collecting and re-joining the stdout chunks yourself.
exec('dir /b /O-D ^2014*', {
  maxBuffer: 2000 * 1024 // quick fix
}, function(error, stdout, stderr) {
  list_of_filenames = stdout.split('\r\n'); // adapt to your line-ending character
  console.log("Found %s files in the replay folder", list_of_filenames.length);
});

The real (and best) solution to this problem is to use spawn instead of exec.
As stated in this article, spawn is more suited for handling large volumes of data:
child_process.exec returns the whole buffered output from the child process. By default the buffer size is set at 200k. If the child process returns anything more than that, your program will crash with the error message "Error: maxBuffer exceeded". You can fix that problem by setting a bigger buffer size in the exec options. But you should not do it, because exec is not meant for processes that return HUGE buffers to Node. You should use spawn for that. So what do you use exec for? Use it to run programs that return result statuses, instead of data.
spawn requires a different syntax than exec:
var proc = spawn('sh', ['target/bin/solver', fields.dimx, fields.dimy]);
proc.on("exit", function(exitCode) {
console.log('process exited with code ' + exitCode);
});
proc.stdout.on("data", function(chunk) {
console.log('received chunk ' + chunk);
});
proc.stdout.on("end", function() {
console.log("finished collecting data chunks from stdout");
});
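If you need the complete output before parsing (as in the original question), a minimal sketch that buffers the chunks and parses once stdout ends could look like this; fields.dimx and fields.dimy come from the original question, and the parsing step is left as a placeholder:
var spawn = require('child_process').spawn;

var chunks = [];
var proc = spawn('sh', ['target/bin/solver', fields.dimx, fields.dimy]);
proc.stdout.setEncoding('utf8');
proc.stdout.on('data', function (chunk) {
  // collect each chunk as it arrives instead of parsing partial data
  chunks.push(chunk);
});
proc.stdout.on('end', function () {
  var output = chunks.join('');
  // parse the complete output here
});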

Edited:
I have tried dir /s on my computer (Windows) and got the same problem (it looks like a bug); this code solves it for me:
var exec = require('child_process').exec;

function my_exec(command, callback) {
  var proc = exec(command);
  var list = [];
  proc.stdout.setEncoding('utf8');
  proc.stdout.on('data', function (chunk) {
    list.push(chunk);
  });
  proc.stdout.on('end', function () {
    callback(list.join('')); // join with '' so no separators are inserted between chunks
  });
}

my_exec('dir /s', function (stdout) {
  console.log(stdout);
});

Related

stream stdout causes RAM usage to increase dramatically

I use spawn to run a command that runs constantly (it is not supposed to stop) and transmits data to its output. The problem is that the RAM usage of the Node app increases constantly.
After multiple tests, I narrowed it down to the following piece of code, which reproduces the problem even though the handlers are almost empty:
const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true });
  ffmpeg.on('exit', function(code) { code = null; });
  ffmpeg.stderr.on('data', function (data) { data = null; });
  ffmpeg.stdout.on('data', function (data) { data = null; });
};
I get the same problem with following:
const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true });
  ffmpeg.on('exit', function(code) { code = null; });
  ffmpeg.stderr.on('data', function (data) { data = null; });
  ffmpeg.on('spawn', function () {
    ffmpeg.stdout.pipe(fs.createWriteStream('/dev/null'));
  });
};
The important part is that when I delete the ffmpeg.stdout.on('data', function (data) {}); line, the problem goes away. The received data is a Buffer object; I think the problem lies in that part.
The problem also appears when spawn pipes out the data to another writable (even to /dev/null).
UPDATE: After hours of research, I found out that it's something related to spawn output and stream backpressure. I configured the FFMPEG command to send chunks less frequently, which mitigated the problem (memory grows more slowly than before), but usage is still increasing.
If you delete the ffmpeg.stdout.on('data', function (data) {}); line, the problem fades away, but only partially, because ffmpeg keeps writing to stdout and may eventually stall while waiting for its stdout to be consumed. MongoDB, for example, has this "pause until stdout is empty" logic.
If you are not going to process the stdout, just ignore it with this:
const runCommand = () => {
  const command = 'FFMPEG COMMAND HERE';
  let ffmpeg = spawn(command, [], { shell: true, stdio: "ignore" });
  ffmpeg.on('exit', function(code) { code = null; });
};
This makes the spawned process discard its stdout and stderr, so there is nothing to consume. It is the correct approach, as you don't need to waste CPU cycles and resources reading a buffer that you are going to discard. Keep in mind that although you only add a one-liner to read and discard the data, libuv (the Node.js I/O manager, among other things) does considerably more work behind the scenes to deliver that data.
Still, I'm pretty sure that you are facing this bug: https://github.com/Unitech/pm2/issues/5145
It also seems that if you output too many logs, pm2 can't write them to the output files as fast as needed, so reducing the log output can fix the problem: https://github.com/Unitech/pm2/issues/1126#issuecomment-996626921
Since, as you mentioned, you need the stdout output, stdio: "ignore" is not an option.
Depending on what you do with the data, you may be receiving more than you can handle, so buffers build up and fill your memory.
A possible solution is to pause and resume the stream when too much data builds up.
ffmpeg.stdout.on('data', function (data) {
  ffmpeg.stdout.pause();
  doSomethingWithDataAsyncWhichTakesAWhile(data).finally(() => ffmpeg.stdout.resume());
});
When to pause and resume the stream depends heavily on how you handle the data.
Used in combination with a writable (which, if I'm not mistaken, is what you're doing):
ffmpeg.stdout.on('data', function (data) {
  if (!writeable.write(data)) {
    /* We need to wait for the 'drain' event. */
    ffmpeg.stdout.pause();
    writeable.once('drain', () => ffmpeg.stdout.resume());
  }
});
writeable.write(...) returns false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true. source.
If you ignore this, you'll end up accumulating buffers in memory, which might be the cause of your problem.
PS, as a side note: at least on Unix systems, when the stdout pipe buffer is full and is not being read (e.g. because the stream is paused), the application writing to stdout will block until there is space to write into. For ffmpeg this is not an issue and is intended behaviour, but it is something to be mindful of.
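Putting the pieces together, a minimal self-contained sketch of this backpressure-aware pattern might look like the following (the ffmpeg arguments and the output file name here are just hypothetical placeholders):
const { spawn } = require('child_process');
const fs = require('fs');

// Hypothetical long-running command; substitute your real ffmpeg invocation.
const ffmpeg = spawn('ffmpeg', ['-i', 'input.mp4', '-f', 'mpegts', 'pipe:1']);
const out = fs.createWriteStream('output.ts');

ffmpeg.stdout.on('data', (chunk) => {
  // write() returns false once the writable's internal buffer is full
  if (!out.write(chunk)) {
    ffmpeg.stdout.pause();                           // stop reading from the child
    out.once('drain', () => ffmpeg.stdout.resume()); // resume once the buffer flushes
  }
});

ffmpeg.stdout.on('end', () => out.end());
ffmpeg.stderr.on('data', () => {});                  // drain stderr so the child cannot block on it
ffmpeg.on('exit', (code) => console.log('ffmpeg exited with code', code));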

NodeJS child_process stdout, if process is waiting for stdin

I'm working on an application that compiles and executes code submitted over an API.
The binary I want to execute is saved as input_c; it should print a prompt asking the user for their name and print another message after the input is received.
It works correctly with the code below: first message, then input (on the terminal), then the second message.
const {spawn} = require('child_process');
let cmd = spawn('input_c', [], {stdio: [process.stdin, process.stdout, process.stderr]});
Output:
$ node test.js
Hello, what is your name? Heinz
Hi Heinz, nice to meet you!
I'd like to handle stdout, stderr and stdin separately rather than wire them to the terminal. The following code was my attempt to achieve the same behaviour as above:
const {spawn} = require('child_process');
let cmd = spawn('input_c');

cmd.stdout.on('data', data => {
  console.log(data.toString());
});
cmd.stderr.on('data', data => {
  console.log(data.toString());
});
cmd.on('error', data => {
  console.log(data.toString());
});

// simulating user input
setTimeout(function() {
  console.log('Heinz');
  cmd.stdin.write('Heinz\n');
}, 3000);
Output:
$ node test.js
Heinz
Hello, what is your name? Hi Heinz, nice to meet you!
To simulate user input I write to stdin after 3000 ms. But here I don't receive the first data on stdout right when the process starts; it seems to wait for stdin and then output everything at once.
How can I achieve the same behaviour in my second case?
The following C code was used to build the binary, but any application that waits for user input can be used here:
#include <stdio.h>

int main() {
    char name[32];
    printf("Hello, what is your name? ");
    scanf("%s", name);
    printf("Hi %s, nice to meet you!", name);
    return 0;
}
node-pty can be used here to prevent the child process from buffering its output.
const pty = require('node-pty');
let cmd = pty.spawn('./input_c');

cmd.on('data', data => {
  console.log(data.toString());
});

// simulating user input
setTimeout(function() {
  console.log('Heinz');
  cmd.write('Heinz\n');
}, 3000);
Output:
Hello, what is your name?
Heinz
Heinz
Hi Heinz, nice to meet you!
The problem you're facing, with stdout 'data' events not being triggered right after spawn(), comes from how Node.js spawns the child process. By default it connects the child through pipes (via stream.pipe()), which allows the child process to buffer its output before sending it to the stdout it was given by Node; in general this is good for performance.
But since you want real-time output and you're also in charge of the binary, you can simply disable its internal buffering. In C you can achieve that by adding setbuf(stdout, NULL); at the beginning of your program:
#include <stdio.h>

int main() {
    setbuf(stdout, NULL);
    char name[32];
    printf("Hello, what is your name? ");
    scanf("%31s", name);
    printf("Hi %s, nice to meet you!", name);
    return 0;
}
Alternatively, you can call fflush(stdout); after each printf(), puts(), etc:
#include <stdio.h>

int main() {
    char name[32];
    printf("Hello, what is your name? "); fflush(stdout);
    scanf("%31s", name);
    printf("Hi %s, nice to meet you!", name); fflush(stdout);
    return 0;
}
Upon disabling internal buffering or triggering explicit flushes in the child process, you will immediately get the behavior you expect, without any external dependencies.
UPD:
Many applications intentionally suppress, or at least allow suppressing, stdio buffering, so look for relevant startup arguments. For example, you can launch the Python interpreter with the -u option, which forces stdin, stdout and stderr to be completely unbuffered. There are also several older questions about Node.js and stdio buffering that you might find useful, like this one: How can I flush a child process from nodejs
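As a quick sketch of that Python case (script.py here is just a placeholder), spawning the interpreter with -u makes each print() reach Node immediately:
const { spawn } = require('child_process');

// -u disables Python's stdio buffering, so output is not held back by the child
const py = spawn('python3', ['-u', 'script.py']);

py.stdout.on('data', data => console.log('stdout:', data.toString()));
py.stderr.on('data', data => console.error('stderr:', data.toString()));
py.on('exit', code => console.log('exited with code ' + code));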

Appending lines to a file in Node.js causes contents to be written in random order

I wrote this code, but the header line is not always in the first position; it randomly ends up on the second or third line. Please help; I have tried many times.
var fs = require('fs');
const path = './Output.csv';

if (fs.existsSync(path)) {
  fs.unlinkSync(path);
}
fs.closeSync(fs.openSync(path, 'w'));

fs.appendFile(path, 'SamAccountName,Sid \n', function (err) {
  if (err) return console.log(err);
});
for (i = 0; i < array_Sid_SidHistory_Full.length; i++) {
  fs.appendFile(path, array_Sid_SidHistory_Full[i] + "\n", function (err) {
    if (err) return console.log(err);
  });
}
Output expected:
SamAccountName,Sid
a,S-1-5-21-541258428-755705122-2342590333-8456
b,S-1-5-21-541258428-755705122-2342590333-6683
c,S-1-5-21-541258428-755705122-2342590333-8459
d,S-1-5-21-541258428-755705122-2342590333-3413
e,S-1-5-21-541258428-755705122-2342590333-1140
f,S-1-5-21-541258428-755705122-2342590333-17241
Sometimes the output comes out like this:
a,S-1-5-21-541258428-755705122-2342590333-8456
b,S-1-5-21-541258428-755705122-2342590333-6683
SamAccountName,Sid
c,S-1-5-21-541258428-755705122-2342590333-8459
d,S-1-5-21-541258428-755705122-2342590333-3413
e,S-1-5-21-541258428-755705122-2342590333-1140
f,S-1-5-21-541258428-755705122-2342590333-17241
or
a,S-1-5-21-541258428-755705122-2342590333-8456
SamAccountName,Sid
b,S-1-5-21-541258428-755705122-2342590333-6683
c,S-1-5-21-541258428-755705122-2342590333-8459
d,S-1-5-21-541258428-755705122-2342590333-3413
e,S-1-5-21-541258428-755705122-2342590333-1140
f,S-1-5-21-541258428-755705122-2342590333-17241
According to the Node documentation, fs.appendFile asynchronously appends data to a file. This means that the code has a race condition on the file resource, and the arbitrary ordering of the callback execution winds up determining the output (Node dispatches threads under the hood to handle these callbacks).
Sequential ordering can be ensured with fs.appendFileSync; it's common for Node fs functions to have a synchronous counterpart. Alternatively, you could await each asynchronous call before performing the next append, though that seems a bit of a shoehorn in this case.
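For illustration, a minimal sketch of both approaches, reusing the path and array_Sid_SidHistory_Full variables from the question:
const fs = require('fs');

// Synchronous version: each append finishes before the next starts,
// so rows land in the file in loop order.
fs.appendFileSync(path, 'SamAccountName,Sid\n');
for (let i = 0; i < array_Sid_SidHistory_Full.length; i++) {
  fs.appendFileSync(path, array_Sid_SidHistory_Full[i] + '\n');
}

// Promise-based equivalent: await each append before issuing the next one.
async function writeRows() {
  await fs.promises.appendFile(path, 'SamAccountName,Sid\n');
  for (const row of array_Sid_SidHistory_Full) {
    await fs.promises.appendFile(path, row + '\n');
  }
}
writeRows().catch(console.error);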
Assuming the file fits into memory, you can also build the entire string representing the file contents using concatenation, then dump the entire file to disk using one call to fs.writeFile (or fs.appendFile) and use the callback if necessary to perform further actions.
Using a stream would have the lowest memory footprint:
function writeCSV(stream, data) {
  while (data.length > 0) {
    // Process the data backwards, assuming the CSV header is the last element.
    // Working backwards and popping elements off the end lets us avoid
    // maintaining an index, but mutates the original array.
    if (!stream.write(`${data.pop()}\n`)) {
      // Wait for the data to drain before writing more.
      stream.once('drain', () => writeCSV(stream, data));
      return;
    }
  }
  stream.end();
}
// Open the write stream and watch for errors and completion.
const stream = fs.createWriteStream(path);
stream.on('error', (err) => console.error(err));
stream.on('finish', () => console.log(`Completed writing to: ${path}`));
// Push the header onto the end of the data and write the file.
array_Sid_SidHistory_Full.push('SamAccountName,Sid');
writeCSV(stream, array_Sid_SidHistory_Full);
But constructing the file in memory is certainly simpler (described in the answer by #ggorlen):
const data = `SamAccountName,Sid\n${array_Sid_SidHistory_Full.join('\n')}\n`;

fs.writeFile(path, data, (err) => {
  if (err) {
    console.error(err);
  } else {
    console.log(`Completed writing to: ${path}`);
  }
});
Both methods have less overhead than using fs.appendFile, as the file is opened and closed only once during writing rather than once for each row of data.

Nodejs child_process execute shell command

I am working on a university project where I have to evaluate the security threats to an open WiFi network. I have chosen the aircrack-ng set of tools for penetration testing. My project uses Node.js for its rich set of features. However, I am a beginner and am struggling to solve a problem. First I will present my code and then pose the problem.
var spawn = require('child_process').spawn;
var nic = "wlan2";

// obtain uid number of a user for spawning a new console command
// var uidNumber = require("uid-number");
// uidNumber("su", function (er, uid, gid) {
//   console.log(uid);
// });

// check for monitor tools
var airmon_ng = spawn('airmon-ng');
airmon_ng.stdout.on('data', function (data) {
  nicList = data.toString().split("\n");
  // use for data binding
  console.log(nicList[0]); //.split("\t")[0]);
});

// airmon-ng start at the nic (var)
var airmon_ng_start = spawn('airmon-ng', ['start', nic]).on('error', function (err) { console.log(err); });
airmon_ng_start.stdout.on('data', function (data) {
  console.log(data.toString());
});

var airmon_ng_start = spawn('airodump-ng', ['mon0']).on('error', function (err) { console.log(err); });
airmon_ng_start.stdout.on('data', function (data) {
  console.log(data.toString());
});
As seen in the code above, I use child_process.spawn to execute the shell commands. In the line "var airmon_ng_start = spawn(..." the actual command runs in the terminal, does not end until Ctrl+C is hit, and regularly updates the list of Wi-Fi networks available in the vicinity. My goal is to identify the network that I wish to test for vulnerabilities. However, when I execute the command, the process waits indefinitely for the shell command to terminate (which never happens until it is killed). Moreover, I wish to use the stdout stream to display the new data as the scan finds and updates networks. Could the Node.js experts suggest a better way to do this?
2) I also wish to execute some commands as root. How may this be done? For now I am running the JavaScript as root, but in the project I wish to execute only some of the commands as root, not the entire JS file. Any suggestions?
// inherit the parent's stdout stream
var airmon_ng_start = spawn('airodump-ng', ['mon0'], { stdio: 'inherit' })
  .on('error', function (err) { console.log(err); });
Found this solution: simply inherit the parent's stdout.
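If you want to parse the updates in Node instead of just mirroring them to the terminal, one option is to read the child's stdout line by line with the readline module. This is only a sketch: airodump-ng redraws the screen with terminal control sequences, so in practice the lines may need extra cleaning before they are parseable.
var spawn = require('child_process').spawn;
var readline = require('readline');

var airodump = spawn('airodump-ng', ['mon0']);
airodump.on('error', function (err) { console.log(err); });

// Emit one event per line of output as it arrives, without waiting for the process to exit.
var rl = readline.createInterface({ input: airodump.stdout });
rl.on('line', function (line) {
  // parse/update your list of networks here
  console.log('update: ' + line);
});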

Node.js Asynchronous File I/O

I'm new to Node.js and recently learned about the fs module. I'm a little confused about asynchronous vs. synchronous file i/o.
Consider the following test:
var fs = require('fs');
var txtfile = 'async.txt';
var buffer1 = Buffer(1024);
var buffer2 = '1234567890';

fs.appendFile(txtfile, buffer1, function(err) {
  if (err) { throw err };
  console.log('appended buffer1');
});
fs.appendFile(txtfile, buffer2, function(err) {
  if (err) { throw err };
  console.log('appended buffer2');
});
About half the time when I run this, it prints appended buffer2 before appended buffer1. But when I open the text file, the data always appears to be in the right order - a bunch of garbage from Buffer(1024) followed by 1234567890. I would have expected the reverse or a jumbled mess.
What's going on here? Am I doing something wrong? Is there some kind of lower-level i/o queue that maintains order?
I've seen some talk about filesystem i/o differences with Node; I'm on a Mac if that makes any difference.
From my understanding, although the code is asynchronous, at the OS level the file I/O operations on the SAME file are not. That means only one I/O operation on a single file is processed at a time.
While the 1st append is occurring, the file is locked. Although the 2nd append has already been processed, the file I/O part of it is queued by the OS and it completes with no error status. My guess is that the OS does some checks to make sure the write will succeed (the file exists, is writable, there is enough disk space, and so on). If all those conditions are met, the OS returns to the application with no error and finishes the write later when possible. Since the buffer of the 2nd append is much smaller, it may finish processing (but not the write-to-file part) before the 1st append has finished writing to the file. You therefore see the 2nd console.log() first.
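If you want the log messages (and not just the file contents) to come out in a deterministic order, here is a small sketch that explicitly serializes the two appends, based on the code from the question:
var fs = require('fs');

var txtfile = 'async.txt';
var buffer1 = Buffer.alloc(1024); // Buffer.alloc avoids the deprecated Buffer() constructor
var buffer2 = '1234567890';

// Start the second append only after the first one has fully completed,
// so both the on-disk order and the callback order are guaranteed.
fs.appendFile(txtfile, buffer1, function (err) {
  if (err) { throw err; }
  console.log('appended buffer1');
  fs.appendFile(txtfile, buffer2, function (err) {
    if (err) { throw err; }
    console.log('appended buffer2');
  });
});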
