Node.js child process hanging, need debugging ideas - javascript

This question might be kinda weird because I'm not quite sure how to ask it. I'm looking for help isolating a bug in some Node.js code I'm working with that spawns child processes to run jobs in parallel. The bug is essentially that the last process spawned by the code hangs indefinitely when the code is run.
To clarify what I mean by "hangs indefinitely", I mean that the process never exits, or at least never appears to do so at the terminal.
The method implemented for this was taken from a SO answer to begin with, though I can't find it to link it at the moment. I've coded a dummy version without the implementation specifics:
const { spawn } = require('child_process');

const tasks = ['task.js', 'task.js', 'task.js', 'task.js', 'task.js', 'task.js', 'task.js'];

function doThings(tasks, pool) {
  let numRunning = 0;
  let tasksRun = 0;
  function doMoreThings() {
    while (tasksRun < tasks.length && numRunning < pool) {
      const runnerCmd = 'node';
      const params = [tasks[tasksRun]];
      console.log(`Starting task ${tasksRun + 1} out of ${tasks.length}`);
      const child = spawn(runnerCmd, params, {
        stdio: 'inherit',
        detached: true
      });
      ++numRunning;
      ++tasksRun;
      child.on('exit', code => {
        --numRunning;
        doMoreThings();
      }).on('error', err => {
        console.log(`${err}`);
        doMoreThings();
      });
    }
  }
  doMoreThings();
}

doThings(tasks, 3);
And here is the task.js file:
console.log('Task starting...');

const max = 5000;
const min = 1000;
const sleep = (waitTimeInMs) => new Promise(resolve => setTimeout(resolve, waitTimeInMs));
const timeval = Math.floor(Math.random() * (max - min) + min);

console.log(`Task interval: ${timeval}ms`);
sleep(timeval).then(() => {
  console.log(`Task completing after ${timeval}ms`);
});
The above dummy code all works as expected. The processes run, their output is printed to the terminal, and when the last process exits, the terminal returns to a command prompt from the running state.
So, finally, what I'm hoping for are ideas from anyone more experienced with Node child processes, or with parallel job execution in general: how could a bug be introduced into the above code that would cause the last job in the chain to never exit? Or is the bug already there, in some race condition I'm not perceiving that just doesn't manifest with such a simple test? I've been trying to debug this for a while and I'm out of ideas about what to try. Thanks in advance to anyone who even attempts to help.
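For anyone trying to reproduce the symptom: a hypothetical sketch (not the asker's real task code) of one common cause of a child that "never exits" is an open handle, such as a timer, socket, or server, keeping the child's event loop alive even after all its work is done.

```javascript
// Hypothetical sketch: a task that looks finished but never exits,
// because an open handle keeps the event loop alive.
console.log('Task starting...');

const poller = setInterval(() => {
  // imagine this polls a work queue; nothing ever stops it
}, 500);

setTimeout(() => {
  console.log('Task work complete');
  // Without the next line, the process hangs indefinitely even though
  // all "work" is done. Clearing (or unref-ing) the handle lets it exit.
  clearInterval(poller);
}, 1500);
```

If the real task code opens a long-lived handle on a path that only the last job hits (for example, when the queue drains), the last child would hang exactly as described.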

Related

How to batch process an async read stream?

I'm trying to batch-process the reading of a file and posting to a database. Currently, I am trying to batch it 20 records at a time, as seen below.
Despite the documentBatch.length check I have put in, it still seems not to be working (the database call inside persistToDB should be called 5 times, but for some reason it's only called once), and console-logging documentBatch.length shows it exceeding that limit. I suspect this is due to concurrency issues; however, persistToDB is from an external lib that needs to be called within an async function.
The way I am trying to batch is to pause the stream and resume it once the db work is done, but this seems to be having the same issue.
let documentBatch = [];
const processedMetrics = {
  succesfullyProcessed: 0,
  unsuccesfullyProcessed: 0,
};

rl.on('line', async (line) => {
  try {
    const document = JSON.parse(line);
    documentBatch.push(document);
    console.log(documentBatch.length);
    if (documentBatch.length === 20) {
      rl.pause();
      const batchMetrics = await persistToDB(documentBatch);
      documentBatch = [];
      processedMetrics.succesfullyProcessed += batchMetrics.succesfullyProcessed;
      processedMetrics.unsuccesfullyProcessed += batchMetrics.unsuccesfullyProcessed;
      rl.resume();
    }
  } catch (e) {
    logger.error(`Failed to save document ${line}`);
    throw e;
  }
});

How to prevent my command line command from timing out? [duplicate]

The error "spawnSync /bin/sh ENOBUFS" occurs intermittently in my Node.js application while executing the following line:
child_process.execSync(`cd /tmp/myFolder ; tar -xjf myArchive.tar.bz2`);
The archive is 81.5 MB; Node.js version (via NVM): 12.17.0.
The problem is that execSync runs the command in a shell with a limited buffer (200 KB) used to forward the execution output. Moreover, the default stdio option is "pipe", which means the output must be forwarded to the parent.
To make the shell ignore the execution output, i.e. forward it to /dev/null, and hence prevent the buffer from filling up, use the "ignore" stdio option as follows:
child_process.execSync(`cd /tmp/myFolder ; tar -xjf myArchive.tar.bz2`, { stdio: 'ignore' });
Read more about exec and spawn execution modes here and here.
P.S. Also note that this error occurs consistently when you run out of disk space during an archive extraction.
Use child_process.spawn if the output is important to you.
And even if you don't need it, execution output can be helpful when debugging. You can always add a simple switch which lets you silence the output should you choose.
I try not to use child_process.exec because child_process.spawn works just as well but without exec's limitations. Of course, YMMV if your core application logic deals with streaming data. The reason child_process.spawn works here is that it streams output, whereas child_process.exec buffers it, and if you fill up the max buffer, child_process.exec will crash. You can increase exec's buffer size via the options parameter, but remember: the bigger the buffer, the more memory you use, whereas streaming usually keeps memory usage to a minimum.
Here's a reference implementation. FYI, this code works on Node v14.
const child_process = require("child_process");

function spawn(instruction, spawnOpts = {}, silenceOutput = false) {
  return new Promise((resolve, reject) => {
    let errorData = "";
    const [command, ...args] = instruction.split(/\s+/);
    if (process.env.DEBUG_COMMANDS === "true") {
      console.log(`Executing \`${instruction}\``);
      console.log("Command", command, "Args", args);
    }
    const spawnedProcess = child_process.spawn(command, args, spawnOpts);
    let data = "";
    spawnedProcess.on("message", console.log);
    spawnedProcess.stdout.on("data", chunk => {
      if (!silenceOutput) {
        console.log(chunk.toString());
      }
      data += chunk.toString();
    });
    spawnedProcess.stderr.on("data", chunk => {
      errorData += chunk.toString();
    });
    spawnedProcess.on("close", function(code) {
      if (code > 0) {
        return reject(new Error(`${errorData} (Failed Instruction: ${instruction})`));
      }
      resolve(data);
    });
    spawnedProcess.on("error", function(err) {
      reject(err);
    });
  });
}

// example usage
async function run() {
  await spawn("echo hello");
  await spawn("echo hello", {}, true);
  await spawn("echo hello", { cwd: "/" });
}

run();
ENOBUFS means the process output exceeded the pipe's buffer size. One fix is to redirect the process stdout to a file instead. Here is an example using Node.js:
let ops = {};
ops.args = ['logcat', '-t', "'m-d h:min:s.000'"];
// set output file
ops.log = 'tmp/logfile.txt';
const largeout = run('adb.exe', ops);
console.log(largeout);

// +++++++++++++++ utils +++++++++++++++++++
function run(cmd, ops = { args: '', log: '' }) {
  const { readFileSync, openSync } = require('fs');
  // join the args array onto the command string
  cmd = ops.args ? cmd + ' ' + ops.args.join(' ') : cmd;
  try {
    // override stdio [stdin, stdout, stderr] to point at the log file
    if (ops.log) {
      const log = openSync(ops.log, 'w');
      ops.stdio = [null, log, log];
    }
    let rs = require('child_process').execSync(cmd, ops);
    // ops.log: file used to work around the pipe's ~200 KB limit (ENOBUFS)
    if (ops.log) rs = readFileSync(ops.log, 'utf8');
    return !rs ? false : rs.toString();
  } catch (e) {
    console.log('run err:', e.message);
    return false;
  }
}

Forking tasks workflow in Javascript

I'm doing some tests to learn to fork different tasks in JavaScript, as I'm new to the language. I'm trying to sum every three-number group from a plain text file formatted as follows:
199
200
208
210
200
207
(199, 200, 208) is the first group, (200, 208, 210) is the second one, etc...
I read the file, split the string, and got my array of strings. Now I want to do the adding in a loop that forks a subprocess on every iteration (the sum is computed in the subprocess) and print the resulting array of summed numbers.
parent.js
const fs = require('fs');
const { fork } = require('child_process');

const readString = fs.readFileSync('depth_readings_p2.txt', 'utf8');
const readArray = readString.split('\n');
var numArrayDef = [];

for (let i = 0; i < readArray.length - 2; i++) {
  let msg = {
    i,
    readArray
  };
  let childProcess = fork('function.js');
  childProcess.send(msg);
  childProcess.on('message', (m) => {
    console.log(m);
    numArrayDef.push(m);
  });
  console.log(numArrayDef[i]);
}
As you can see, I'm sending the subprocess an object that includes the index and the array of strings. The parent process receives the summed number and stores it in numArrayDef.
function.js
process.on('message', (msg) => {
  let num = 0;
  if ((msg.i + 2) < msg.readArray.length) {
    num += parseInt(msg.readArray[msg.i]);
    num += parseInt(msg.readArray[msg.i + 1]);
    num += parseInt(msg.readArray[msg.i + 2]);
    process.send(num);
  }
  process.exit();
});
In the output I can see that the parent is receiving everything correctly, but the program isn't pushing the received values into the result array. Also, the order of execution is weird:
- First, everything in the loop but the message receiving block.
- Second, everything after the loop ends.
- Finally, the message receiving block.
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
[]
607
618
618
617
647
716
769
792
I know I'm missing something about forking processes, but I don't know what it is, and I don't see it in the fork documentation.
What you have to understand about Node.js is its asynchronous nature: the code is not necessarily executed in the order you have written it (at least, a lot of the time).
fork returns a process handle immediately, but the forked process itself may take some time to start. What you do is add a callback that will be executed every time a message event is received. Check this code:
parent.js
const { fork } = require('child_process');

let childProcess = fork('function.js');

// this line is executed immediately after the handle is created.
// You pass a newly created function to ".on()", which will be
// called every time the child process sends a "message" event.
// Understand that you just declare an anonymous function
// and pass it as an argument, so the executing function actually
// decides when to call it!
childProcess.on('message', (m) => {
  console.log('message received in parent:', m);
  console.log('closing the process');
  childProcess.kill('SIGINT');
});

childProcess.on('exit', () => {
  console.log('child is done!');
});

childProcess.send('I will come back!');
console.log('last line reached. Program still running.');
function.js
process.on('message', (msg) => {
  // wait two seconds, then return the message!
  setTimeout(() => {
    process.send(msg);
  }, 2000);
});
output
last line reached. Program still running.
message received in parent: I will come back!
closing the process
child is done!
execution order
Fork a process and get it's handle. Code execution goes on!
Register callback listeners which will be called on given events like message OR exit. These are actually asynchronious. You don't know when they kick in.
log that all lines have been executed
Some time later, the message listener and after it, the exit listener kick in.
your code
Your code basically executes to the end (only adding handlers to a process handle) and then logs data from numArrayDef which has not yet been added to it. So if no element is present at numArrayDef[5], it logs undefined by default.
callbacks
Since Node.js is single-threaded by default, it's common to execute an asynchronous function and pass it a callback (just another function) which will be executed when the called function is done!
The fixed code
parent.js
const fs = require('fs');
const { fork } = require('child_process');
const { EOL } = require('os');

const readString = fs.readFileSync('file.txt', 'utf8');
const readArray = readString.split(EOL);
var numArrayDef = [];

for (let i = 0; i < readArray.length - 2; i++) {
  // msg building. Done instantly.
  let msg = {
    i,
    readArray
  };
  // forking a child process. The handle is returned immediately,
  // but starting the process may take some time, and
  // the code won't wait for it!
  let childProcess = fork('function.js');
  // add the message listener before sending anything,
  // so no reply can be missed.
  childProcess.on('message', (m) => {
    console.log('message received', m);
    numArrayDef.push(m);
    // log once all numbers are done.
    if (numArrayDef.length === readArray.length - 2) {
      console.log('Done. Here\'s the array:', numArrayDef);
    }
  });
  childProcess.send(msg);
}
function.js
process.on('message', (msg) => {
  let num = 0;
  if ((msg.i + 2) < msg.readArray.length) {
    num += parseInt(msg.readArray[msg.i]);
    num += parseInt(msg.readArray[msg.i + 1]);
    num += parseInt(msg.readArray[msg.i + 2]);
    process.send(num);
  }
  process.exit();
});
This should give you an idea. I recommend going through some tutorials in the beginning to understand the nature of the language.
What you should learn about nodejs
Learn what a callback is
Basic understanding of async/await and Promises is a must
You should learn which operations are sync and which ones are async
The EventEmitter class is also used very often
Learning how to handle child_process or fork and similar stuff is not really required to get a base understanding of Node.js
Function declaration
Just an add-on about the syntax. These are almost exactly the same, except that with the arrow-function style, `this` is not rebound: the arrow function uses the `this` of the enclosing scope.
// variant 1
function abc(fn) {
  // execute the argument, which is a function, but only after a timeout!
  setTimeout(fn, 2000);
}

// variant 2
const abc = function(fn) {
  // execute the argument, which is a function, but only after a timeout!
  setTimeout(fn, 2000);
}

// variant 3
const abc = (fn) => {
  // execute the argument, which is a function, but only after a timeout!
  setTimeout(fn, 2000);
}

// call it like so:
abc(function() {
  console.log('I was passed!!');
});
console.log('The abc function was called. Let\'s wait for it to call the passed function!');

Is there a way to limit the execution time of regex evaluation in javascript? [duplicate]

Is it possible to cancel a regex match operation if it takes more than 10 seconds to complete?
I'm using a huge regex to match specific text, and sometimes it works, and sometimes it fails...
regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+
Working example: https://regex101.com/r/kU6rS5/1
So I want to cancel the operation if it takes more than 10 seconds. Is it possible? I haven't found anything related on SO.
Thanks.
You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.
fork is probably what you should use, if you go down this road.
If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:
index.js
const { fork } = require('child_process');

const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;

regexProcess.on('message', function(data) {
  console.log('received message from child:', data);
  clearTimeout(timeout);
  received = data;
  regexProcess.kill(); // or however you want to end it. just as an example.
  // you have access to the regex data here.
  // send to a callback, or resolve a promise with the value,
  // so the original calling code can access it as well.
});

const timeoutInMs = 10000;
let timeout = setTimeout(() => {
  if (!received) {
    console.error('regexProcess is still running!');
    regexProcess.kill(); // or however you want to shut it down.
  }
}, timeoutInMs);

regexProcess.send('message to match against');
regexProcess.send('message to match against');
regex-process.js
function respond(data) {
  process.send(data);
}

function handleMessage(data) {
  console.log('handling message:', data);
  // run your regex calculations in here,
  // then respond with the data when it's done.
  // the following is just to emulate
  // a synchronous computational delay
  for (let i = 0; i < 500000000; i++) {
    // spin!
  }
  respond('return regex process data in here');
}

process.on('message', handleMessage);
This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
Another solution I found here:
https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/
It's based on the vm module, with no process fork. Pretty neat.
const util = require('util');
const vm = require('vm');

var sandbox = {
  regex: /^(A+)*B/,
  string: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
  result: null
};
var context = vm.createContext(sandbox);
console.log('Sandbox initialized: ' + vm.isContext(sandbox));
var script = new vm.Script('result = regex.test(string);');

try {
  // One could argue that if a RegExp hasn't finished within a given time,
  // it's likely it will take exponential time.
  script.runInContext(context, { timeout: 1000 }); // milliseconds
} catch (e) {
  console.log('ReDoS occurred', e); // Take some remedial action here...
}
console.log(util.inspect(sandbox)); // Check the results

Node.js Promise setTimeout resolves quicker than expected

I have the following code in Node.js.
const timeout = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
I'm trying to test this code with the following test:
it("Should wait for given time before resolving", async () => {
const MS = 100;
const start = process.hrtime();
await timeout(MS);
const diff = process.hrtime(start);
expect(((diff[0] * NS_PER_SEC) + diff[1]) / 1000000).to.at.least(MS);
});
The problem is sometimes (rarely), this test fails:
Should wait for given time before resolving:
AssertionError: expected 99.595337 to be at least 100
+ expected - actual
-99.595337
+100
Obviously this is some kind of timing issue with Node.js or something. If anything, I expect await timeout(MS); to take slightly longer than MS; in no case do I expect it to take less time.
What is it about the internals of JavaScript/Node.js that causes this to happen?
This occurred on macOS 10.14.3 running Node.js version 8.15.1.
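One commonly cited explanation, offered here as an assumption rather than a definitive diagnosis: Node's timers promise only a best-effort minimum delay measured against libuv's millisecond-resolution clock, which can disagree with process.hrtime by a fraction of a millisecond, so a sub-millisecond early resolution is possible. A small sketch for measuring the drift, reusing the question's timeout helper:

```javascript
// Sketch: measure how far a setTimeout-based delay lands from the target,
// using the same timeout helper as the question.
const timeout = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function measureDrift(ms) {
  const start = process.hrtime.bigint();
  await timeout(ms);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return elapsedMs - ms; // negative means it resolved early
}

measureDrift(50).then((drift) => {
  // Tests should allow a small tolerance rather than asserting elapsed >= MS.
  console.log(`drift: ${drift.toFixed(3)}ms`);
});
```

Given that, asserting `elapsed >= MS - 1` (or some similar tolerance) in the test is a more robust check than a strict `>= MS`.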
