I'm doing some tests to learn how to fork different tasks in JavaScript, as I'm new to the language. I'm trying to sum every three-number group from a plain text file formatted as follows:
199
200
208
210
200
207
(199, 200, 208) is the first group, (200, 208, 210) is the second one, etc...
I read from the file, split the string and got my array of strings. Now I want to do the adding in a loop that forks on every iteration (the sum is done in the subprocess) and print the resulting array of summed numbers.
parent.js
const fs = require('fs');
const { fork } = require('child_process');
const readString = fs.readFileSync('depth_readings_p2.txt', 'utf8');
const readArray = readString.split('\n');
var numArrayDef = [];
for (let i = 0; i < readArray.length - 2; i++) {
  let msg = {
    i,
    readArray
  };
  let childProcess = fork('function.js');
  childProcess.send(msg);
  childProcess.on('message', (m) => {
    console.log(m);
    numArrayDef.push(m);
  });
  console.log(numArrayDef[i]);
}
As you can see, I'm sending the subprocess an object that includes the index and the array of strings. The parent process receives the summed number and stores it in numArrayDef.
function.js
process.on('message', (msg) => {
  let num = 0;
  if ((msg.i + 2) < msg.readArray.length) {
    num += parseInt(msg.readArray[msg.i]);
    num += parseInt(msg.readArray[msg.i + 1]);
    num += parseInt(msg.readArray[msg.i + 2]);
    process.send(num);
  }
  process.exit();
});
In the output I can see that the parent is receiving everything correctly, but the program isn't pushing the received values into the result array. Also, the order of execution is weird:
- First, everything in the loop but the message receiving block.
- Second, everything after the loop ends.
- Finally, the message receiving block.
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
[]
607
618
618
617
647
716
769
792
I know I'm missing something about forking processes, but I don't know what it is, and I don't see it in the fork documentation.
What you have to understand about Node.js is its asynchronous nature: the code is not necessarily executed in the order you have written it (at least, a lot of the time).
The childProcess variable is a process handle which is returned immediately, but the forked process itself may take some time to start. What you do is add a callback which will be executed every time a message event is received. Check this code:
parent.js
let childProcess = fork('function.js');
// this line is executed immediately after the handle is created.
// You pass a newly created function to ".on()" which will be
// called every time the child process sends a "message" event.
// You want to understand that you just declare an anonymous function
// and pass it as an argument. So the function you pass it to actually
// decides when to call it!
childProcess.on('message', (m) => {
  console.log('message received in parent:', m)
  console.log('closing the process')
  childProcess.kill('SIGINT')
});
childProcess.on('exit', () => {
  console.log('child is done!')
})
childProcess.send('I will come back!')
console.log('last line reached. Program still running.')
function.js
process.on('message', (msg) => {
  // wait a few seconds, and return the message!
  setTimeout(() => {
    process.send(msg)
    // wait 2000ms
  }, 2000)
})
output
last line reached. Program still running.
message received in parent: I will come back!
closing the process
child is done!
execution order
Fork a process and get its handle. Code execution goes on!
Register callback listeners which will be called on given events like message OR exit. These are asynchronous; you don't know when they kick in.
Log that all lines have been executed.
Some time later, the message listener and, after it, the exit listener kick in.
your code
Your code basically executes to the end (only adding handlers to the process handles) and logs values from numArrayDef that have not been added to it yet. So if no element is present at numArrayDef[i], it logs undefined by default.
callbacks
Since Node.js is single-threaded by default, it's common to execute an asynchronous function and pass it a callback (just another function) which will be executed when the called function is done!
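Node's own asynchronous APIs follow this pattern with "error-first" callbacks. A minimal sketch (the file name is just a placeholder):
const fs = require('fs');

// error-first callback: the first argument is an error (or null on success)
fs.readFile('some-file.txt', 'utf8', (err, data) => {
  if (err) {
    console.error('read failed:', err);
    return;
  }
  console.log('file contents:', data);
});
console.log('readFile was called, but its callback runs later');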
The fixed code
parent.js
const fs = require('fs');
const { fork } = require('child_process');
const { EOL } = require('os')
const readString = fs.readFileSync('file.txt', 'utf8');
const readArray = readString.split(EOL);
var numArrayDef = [];
for (let i = 0; i < readArray.length - 2; i++) {
  // msg building. Done instantly.
  let msg = {
    i,
    readArray
  };
  // forking a child process. The handle is returned immediately,
  // but starting the process may take some time, and
  // the code won't wait for it!
  let childProcess = fork('function.js');
  // this line is executed immediately after the handle is created.
  // You add a listener (a callback) which is called once the child answers.
  childProcess.on('message', (m) => {
    console.log('message received', m)
    numArrayDef.push(m);
    // log if all numbers are done.
    if (numArrayDef.length === readArray.length - 2) {
      console.log('Done. Here\'s the array:', numArrayDef)
    }
  });
  childProcess.send(msg);
}
function.js
process.on('message', (msg) => {
  let num = 0;
  if ((msg.i + 2) < msg.readArray.length) {
    num += parseInt(msg.readArray[msg.i]);
    num += parseInt(msg.readArray[msg.i + 1]);
    num += parseInt(msg.readArray[msg.i + 2]);
    process.send(num);
  }
  process.exit();
});
This should give you an idea. I recommend going through some tutorials in the beginning to understand the nature of the language.
What you should learn about nodejs
Learn what a callback is
Basic understanding of async/await and Promises is a must (see the sketch after this list)
You should learn which operations are sync and which ones are async
The EventEmitter class is also used very often
Learning how to handle child_process or fork and similar things is not really required to get a basic understanding of Node.js
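For instance, once you are comfortable with Promises, you could wrap the fork/message round trip from above in one and collect all results with Promise.all. This is only a sketch under the same assumptions as the fixed code (file.txt and function.js as shown above), not a tested drop-in:
const fs = require('fs');
const { fork } = require('child_process');
const { EOL } = require('os');

const readArray = fs.readFileSync('file.txt', 'utf8').split(EOL);

// wrap one fork + message round trip in a Promise
function sumGroup(i, readArray) {
  return new Promise((resolve, reject) => {
    const child = fork('function.js');
    child.once('message', resolve);
    child.once('error', reject);
    child.send({ i, readArray });
  });
}

async function main() {
  const jobs = [];
  for (let i = 0; i < readArray.length - 2; i++) {
    jobs.push(sumGroup(i, readArray));
  }
  // Promise.all keeps the results in loop order
  const numArrayDef = await Promise.all(jobs);
  console.log('Done. Here\'s the array:', numArrayDef);
}

main().catch(console.error);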
Function declaration
Just an add-on about the syntax. These are almost exactly the same, except that an arrow function does not get its own this; it keeps the this of the enclosing scope where it was created:
// variant1
function abc(fn) {
  // execute the argument, which is a function. But only after
  // a timeout!
  setTimeout(fn, 2000)
}
// variant2
const abc = function(fn) {
  // execute the argument, which is a function. But only after
  // a timeout!
  setTimeout(fn, 2000)
}
// variant3
const abc = (fn) => {
  // execute the argument, which is a function. But only after
  // a timeout!
  setTimeout(fn, 2000)
}
// call it like so:
abc(function() {
  console.log('I was passed!!.')
})
console.log('The abc function was called. Let\'s wait for it to call the passed function!')
Related
Is it possible to cancel a regex.match operation if it takes more than 10 seconds to complete?
I'm using a huge regex to match specific text, and sometimes it may work, and sometimes it can fail...
regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+
Working example: https://regex101.com/r/kU6rS5/1
So... I want to cancel the operation if it takes more than 10 seconds. Is it possible? I'm not finding anything related on SO.
Thanks.
You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.
fork is probably what you should use, if you go down this road.
If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:
index.js
const { fork } = require('child_process');
const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;
regexProcess.on('message', function(data) {
  console.log('received message from child:', data);
  clearTimeout(timeout);
  received = data;
  regexProcess.kill(); // or however you want to end it. just as an example.
  // you have access to the regex data here.
  // send to a callback, or resolve a promise with the value,
  // so the original calling code can access it as well.
});
const timeoutInMs = 10000;
let timeout = setTimeout(() => {
  if (!received) {
    console.error('regexProcess is still running!');
    regexProcess.kill(); // or however you want to shut it down.
  }
}, timeoutInMs);
regexProcess.send('message to match against');
regex-process.js
function respond(data) {
  process.send(data);
}
function handleMessage(data) {
  console.log('handling message:', data);
  // run your regex calculations in here
  // then respond with the data when it's done.
  // the following is just to emulate
  // a synchronous computational delay
  for (let i = 0; i < 500000000; i++) {
    // spin!
  }
  respond('return regex process data in here');
}
process.on('message', handleMessage);
This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
Another solution I found here:
https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/
It's based on the use of the vm module, with no process fork.
That's pretty neat.
const util = require('util');
const vm = require('vm');
var sandbox = {
  regex: /^(A+)*B/,
  string: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
  result: null
};
var context = vm.createContext(sandbox);
console.log('Sandbox initialized: ' + vm.isContext(sandbox));
var script = new vm.Script('result = regex.test(string);');
try {
  // One could argue that if a RegExp hasn't finished in a given time,
  // then it's likely to take exponential time.
  script.runInContext(context, { timeout: 1000 }); // milliseconds
} catch (e) {
  console.log('ReDos occurred', e); // Take some remedial action here...
}
console.log(util.inspect(sandbox)); // Check the results
I'm just learning JavaScript, and a common task I perform when picking up a new language is to write a hex-dump program. The requirements are: 1. read a file supplied on the command line, 2. be able to read huge files (a buffer at a time), 3. output the hex digits and printable ASCII characters.
Try as I might, I can't get the fs.read(...) function to actually execute. Here's the code I've started with:
const fs = require('fs');

console.log(process.argv);
if (process.argv.length < 3) {
  console.log("usage: node hd <filename>");
  process.exit(1);
}
fs.open(process.argv[2], 'r', (err, fd) => {
  if (err) {
    console.log("Error: ", err);
    process.exit(2);
  } else {
    fs.fstat(fd, (err, stats) => {
      if (err) {
        process.exit(4);
      } else {
        var size = stats.size;
        console.log("size = " + size);
        going = true;
        var buffer = new Buffer(8192);
        var offset = 0;
        //while( going ){
        while (going) {
          console.log("Reading...");
          fs.read(fd, buffer, 0, Math.min(size - offset, 8192), offset, (error_reading_file, bytesRead, buffer) => {
            console.log("READ");
            if (error_reading_file) {
              console.log(error_reading_file.message);
              going = false;
            } else {
              offset += bytesRead;
              for (a = 0; a < bytesRead; a++) {
                var z = buffer[a];
                console.log(z);
              }
              if (offset >= size) {
                going = false;
              }
            }
          });
        }
        //}
        fs.close(fd, (err) => {
          if (err) {
            console.log("Error closing file!");
            process.exit(3);
          }
        });
      }
    });
  }
});
If I comment-out the while() loop, the read() function executes, but only once of course (which works for files under 8K). Right now, I'm just not seeing the purpose of a read() function that takes a buffer and an offset like this... what's the trick?
Node v8.11.1, OSX 10.13.6
First of all, if this is just a one-off script that you run now and then and this is not code in a server, then there's no need to use the harder asynchronous I/O. You can use synchronous, blocking I/O with calls such as fs.openSync(), fs.statSync(), fs.readSync(), etc., and then things will work inside your while loop because those calls are blocking (they don't return until the results are done). You can write normal looping and sequential code with them. One should never use synchronous, blocking I/O in a server environment because it ruins the scalability of a server process (its ability to handle requests from multiple clients), but if this is a one-off local script with only one job to do, then synchronous I/O is perfectly appropriate.
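A minimal synchronous sketch of that approach (error handling omitted; it mirrors your read loop but with blocking calls):
const fs = require('fs');

const fd = fs.openSync(process.argv[2], 'r');
const size = fs.fstatSync(fd).size;
const buffer = Buffer.alloc(8192);
let offset = 0;

while (offset < size) {
  // readSync blocks until the chunk has actually been read
  const bytesRead = fs.readSync(fd, buffer, 0, Math.min(size - offset, 8192), offset);
  for (let i = 0; i < bytesRead; i++) {
    console.log(buffer[i]); // plug your hex/ASCII formatting in here
  }
  offset += bytesRead;
}
fs.closeSync(fd);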
Second, here's why your code doesn't work properly. Javascript in node.js is single-threaded and event-driven. That means that the interpreter pulls an event out of the event queue, runs the code associated with that event and does nothing else until that code returns control back to the interpreter. At that point, it then pulls the next event out of the event queue and runs it.
When you do this:
while (going) {
  fs.read(..., (err, data) => {
    // some logic here that may change the value of the going variable
  });
}
You've just created yourself an infinite loop. This is because the while(going) loop just runs forever. It never stops looping and never returns control back to the interpreter so that it can fetch the next event from the event queue. It just keeps looping. But, the completion of the asynchronous, non-blocking fs.read() comes through the event queue. So, you're waiting for the going flag to change, but you never allow the system to process the events that can actually change the going flag. In your actual case, you will probably eventually run out of some sort of resource from calling fs.read() too many times in a tight loop or the interpreter will just hang in an infinite loop.
Understanding how to program a repetitive, looping type of tasks with asynchronous operations involved requires learning some new techniques for programming. Since much I/O in node.js is asynchronous and non-blocking, this is an essential skill to develop for node.js programming.
There are a number of different ways to solve this:
Use fs.createReadStream() and read the file by listening for the data event. This is probably the cleanest scheme. If your objective here is to do a hex outputter, you might even want to learn a stream feature called a transform, where you transform the binary stream into a hex stream.
Use promise versions of all the relevant fs functions and use async/await to allow your for loop to wait for an async operation to finish before going on to the next iteration. This lets you write synchronous-looking code, but use async I/O.
Write a different type of looping construct (not a while loop) that manually repeats the loop after fs.read() completes (a sketch of this appears after the async/await example below).
Here's a simple example using fs.createReadStream():
const fs = require('fs');

function convertToHex(val) {
  let str = val.toString(16);
  if (str.length < 2) {
    str = "0" + str;
  }
  return str.toUpperCase();
}

let stream = fs.createReadStream(process.argv[2]);
let outputBuffer = "";
stream.on('data', (data) => {
  // you get an unknown length chunk of data from the file here in a Buffer object
  for (const val of data) {
    outputBuffer += convertToHex(val) + " ";
    if (outputBuffer.length > 100) {
      console.log(outputBuffer);
      outputBuffer = "";
    }
  }
}).on('error', err => {
  // some sort of error reading the file
  console.log(err);
}).on('end', () => {
  // output any remaining buffer
  console.log(outputBuffer);
});
Hopefully you will notice that, because the stream handles opening, closing and reading from the file for you, this is a much simpler way to code. All you have to do is supply event handlers for data that is read, a read error and the end of the operation.
Here's a version using async/await and the new file interface (where the file descriptor is an object that you call methods on) with promises in node v10.
const fs = require('fs').promises;

function convertToHex(val) {
  let str = val.toString(16);
  if (str.length < 2) {
    str = "0" + str;
  }
  return str.toUpperCase();
}

async function run() {
  const readSize = 8192;
  let cntr = 0;
  const buffer = Buffer.alloc(readSize);
  const fd = await fs.open(process.argv[2], 'r');
  try {
    let outputBuffer = "";
    while (true) {
      let data = await fd.read(buffer, 0, readSize, null);
      for (let i = 0; i < data.bytesRead; i++) {
        cntr++;
        outputBuffer += convertToHex(buffer.readUInt8(i)) + " ";
        if (outputBuffer.length > 100) {
          console.log(outputBuffer);
          outputBuffer = "";
        }
      }
      // see if all data has been read
      if (data.bytesRead !== readSize) {
        console.log(outputBuffer);
        break;
      }
    }
  } finally {
    await fd.close();
  }
  return cntr;
}

run().then(cntr => {
  console.log(`done - ${cntr} bytes read`);
}).catch(err => {
  console.log(err);
});
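And, for completeness, here's a rough sketch of the third option from the list above: a manual looping construct that issues the next fs.read() only from inside the previous read's completion callback (callback-based fs API, minimal error handling):
const fs = require('fs');

fs.open(process.argv[2], 'r', (err, fd) => {
  if (err) throw err;
  const buffer = Buffer.alloc(8192);

  function readNext() {
    fs.read(fd, buffer, 0, buffer.length, null, (err, bytesRead) => {
      if (err) throw err;
      if (bytesRead === 0) {
        // end of file
        return fs.close(fd, () => {});
      }
      for (let i = 0; i < bytesRead; i++) {
        process.stdout.write(buffer[i].toString(16).padStart(2, '0') + ' ');
      }
      readNext(); // "loop" only after this chunk has been handled
    });
  }
  readNext();
});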
So I have been doing some micro-optimization in Node.js. I'm noticing that when I use child processes, there is a huge first-message delay. Take the following code for example:
index.js
var cp = require('child_process');
var child = cp.fork(__dirname + '/worker.js');
var COUNT = 10;
var start;
child.on('message', (e) => {
  var end = Date.now();
  console.log('Response time', end - start);
  if (COUNT--) {
    sendMessage();
  } else {
    process.exit();
  }
});
function sendMessage() {
  start = Date.now();
  child.send('hi!');
}
sendMessage();
worker.js
process.on('message', e => {
process.send('Good morning!');
});
Explanation: all I'm doing is creating a child process and sending it messages, to which it responds immediately. I'm measuring the time between sending a message and receiving the response. The following output occurs nearly every time:
Response time 51
Response time 0
Response time 0
Response time 0
Resp...
There is a huge delay between the first message/response pair, and I have noticed this in other projects I've made with child processes as well. Why is this occurring? How can I fix it?
Edit: after some debugging, the delay seems to occur between the first child.send and the first process.on('message') callback firing in the child. There is nearly no delay in the child's response itself.
Related: https://github.com/nodejs/node/issues/3145
In the great book I'm reading now, Node.js Design Patterns, I see the following example:
var fs = require('fs');
var cache = {};

function inconsistentRead(filename, callback) {
  if (cache[filename]) {
    //invoked synchronously
    callback(cache[filename]);
  } else {
    //asynchronous function
    fs.readFile(filename, 'utf8', function(err, data) {
      cache[filename] = data;
      callback(data);
    });
  }
}
then:
function createFileReader(filename) {
  var listeners = [];
  inconsistentRead(filename, function(value) {
    listeners.forEach(function(listener) {
      listener(value);
    });
  });
  return {
    onDataReady: function(listener) {
      listeners.push(listener);
    }
  };
}
and usage of it:
var reader1 = createFileReader('data.txt');
reader1.onDataReady(function(data) {
  console.log('First call data: ' + data);
});
The author says that if the item is in the cache the behaviour is synchronous, and asynchronous if it's not in the cache. I'm OK with that. He then goes on to say that a function should be either always sync or always async. I'm OK with that too.
What I don't understand is this: if I take the asynchronous path, then by the time the line var reader1 = createFileReader('data.txt'); has executed, couldn't the asynchronous file read already have finished, so that the listener registered on the following line would never be called?
JavaScript will never interrupt a function to run a different function.
The "file has been read" handler will be queued until the JavaScript event loop is free.
The async read operation won't call its callback or start emitting events until after the current tick of the event loop, so the sync code that registers the event listener will run first.
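A tiny illustration of that ordering (a sketch; the file name is just a placeholder):
const fs = require('fs');

fs.readFile('data.txt', 'utf8', (err, data) => {
  console.log('2) the callback runs on a later tick of the event loop');
});
console.log('1) this synchronous line always runs first');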
Yes, I felt the same when I read this part of the book: at first, inconsistentRead looks good.
But the next paragraphs explain the potential bug this kind of mixed sync/async function could produce when used (and why it might not always show up).
To summarize, what happens in the usage sample is:
In event loop cycle 1:
reader1 is created; because "data.txt" isn't cached yet, it will respond asynchronously in some later cycle N.
Some callbacks are subscribed for reader1's readiness and will be called in cycle N.
In event loop cycle N:
"data.txt" is read, cached and notified, so reader1's subscribed callbacks are called.
In event loop cycle X (X >= 1; X could be before or after N, e.g. scheduled by a timeout or some other async path):
reader2 is created for the same file "data.txt"
What happens if:
X === 1: the bug shows up in a way not mentioned in the book: "data.txt" is read (and cached) twice, and the faster read wins. But reader2 registers its callbacks before the async response is ready, so they will still be called.
X > 1 and X < N: the same as X === 1.
X > N: the bug appears as explained in the book: you create reader2 (the response is already cached), so the callback inside createFileReader fires synchronously while no listener has been subscribed yet; only afterwards do you subscribe callbacks with onDataReady, and they will never be called.
X === N: an edge case. If the reader2 code runs first, it behaves like X === 1; if it runs after the "data.txt" readiness handling inside inconsistentRead, it behaves like X > N.
This example was more helpful for me in understanding the concept:
const fs = require('fs');
const cache = {};

function inconsistentRead(filename, callback) {
  if (cache[filename]) {
    console.log("load from cache")
    callback(cache[filename]);
  } else {
    fs.readFile(filename, 'utf8', function (err, data) {
      cache[filename] = data;
      callback(data);
    });
  }
}

function createFileReader(filename) {
  const listeners = [];
  inconsistentRead(filename, function (value) {
    console.log("inconsistentRead CB")
    listeners.forEach(function (listener) {
      listener(value);
    });
  });
  return {
    onDataReady: function (listener) {
      console.log("onDataReady")
      listeners.push(listener);
    }
  };
}

const reader1 = createFileReader('./data.txt');
reader1.onDataReady(function (data) {
  console.log('First call data: ' + data);
})

setTimeout(function () {
  const reader2 = createFileReader('./data.txt');
  reader2.onDataReady(function (data) {
    console.log('Second call data: ' + data);
  })
}, 100)
output:
╰─ node zalgo.js
onDataReady
inconsistentRead CB
First call data: :-)
load from cache
inconsistentRead CB
onDataReady
When the call is asynchronous, the onDataReady handler is registered before the file has been read; when the call is synchronous (the cached case), the iteration over the listeners in the inconsistentRead callback finishes before onDataReady has registered the listener.
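A common way to remove the inconsistency (sketched here; the function is renamed consistentRead just to make the point) is to defer the cached callback with process.nextTick(), so the function always behaves asynchronously:
const fs = require('fs');
const cache = {};

function consistentRead(filename, callback) {
  if (cache[filename]) {
    // defer the callback so the caller always gets async behaviour
    process.nextTick(() => callback(cache[filename]));
  } else {
    fs.readFile(filename, 'utf8', function (err, data) {
      cache[filename] = data;
      callback(data);
    });
  }
}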
I think the problem can also be illustrated with a simpler example:
let gvar = 0;
let add = (x, y, callback) => { callback(x + y + gvar) }
add(3,3, console.log); gvar = 3
In this case, callback is invoked immediately inside add, so the change of gvar afterwards has no effect: console.log(3+3+0)
On the other hand, if we add asynchronously
let add2 = (x, y, callback) => { setImmediate(()=>{callback(x + y + gvar)})}
add2(3, 3, console.log); gvar = 300
Because of the execution order, gvar = 300 runs before the setImmediate callback, so the result becomes console.log(3 + 3 + 300).
In Haskell you have pure functions vs. monadic ones, the latter being similar to "async" functions that get executed "later". In JavaScript this distinction is not explicitly declared, so this "delayed" code can be difficult to debug.
In Node.js I'm using the fs.createWriteStream method to append data to a local file. In the Node documentation they mention the drain event when using fs.createWriteStream, but I don't understand it.
var stream = fs.createWriteStream('fileName.txt');
var result = stream.write(data);
In the code above, how can I use the drain event? Is the event used properly below?
var data = 'this is my data';

if (!streamExists) {
  var stream = fs.createWriteStream('fileName.txt');
}

var result = stream.write(data);
if (!result) {
  stream.once('drain', function() {
    stream.write(data);
  });
}
The drain event is for when a writable stream's internal buffer has been emptied.
This can only happen when the size of the internal buffer has at some point exceeded its highWaterMark property, which is the maximum number of bytes that can be stored in a writable stream's internal buffer before it stops reading from the data source.
Something like this is typically caused by setups where one stream reads from a data source faster than the data can be written to another resource. For example, take two streams:
var fs = require('fs');
var read = fs.createReadStream('./read');
var write = fs.createWriteStream('./write');
Now imagine that the file behind read is on an SSD and can be read at 500 MB/s, while write points to an HDD that can only write at 150 MB/s. The write stream will not be able to keep up, and will start storing data in the internal buffer. Once the buffer has reached the highWaterMark, which is 16KB by default, the writes will start returning false, and the stream will internally queue a drain. Once the internal buffer's length is 0, the drain event is fired.
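To see the return value of write() flip, you can force the situation with a deliberately tiny highWaterMark (a contrived sketch; the file name is a placeholder):
const fs = require('fs');

// 16-byte buffer instead of the 16KB default, so it fills up immediately
const write = fs.createWriteStream('./write', { highWaterMark: 16 });

const ok = write.write(Buffer.alloc(64)); // more than the buffer can hold
console.log('can keep writing?', ok);     // false: the buffer is over the mark

write.on('drain', () => {
  console.log('buffer emptied, safe to write again');
});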
This is how a drain works:
if (state.length === 0 && state.needDrain) {
state.needDrain = false;
stream.emit('drain');
}
And these are the prerequisites for a drain which are part of the writeOrBuffer function:
var ret = state.length < state.highWaterMark;
state.needDrain = !ret;
To see how the drain event is used, take the example from the Node.js documentation.
function writeOneMillionTimes(writer, data, encoding, callback) {
  var i = 1000000;
  write();
  function write() {
    var ok = true;
    do {
      i -= 1;
      if (i === 0) {
        // last time!
        writer.write(data, encoding, callback);
      } else {
        // see if we should continue, or wait
        // don't pass the callback, because we're not done yet.
        ok = writer.write(data, encoding);
      }
    } while (i > 0 && ok);
    if (i > 0) {
      // had to stop early!
      // write some more once it drains
      writer.once('drain', write);
    }
  }
}
The function's objective is to write 1,000,000 times to a writable stream. A variable ok is set to true, and the loop only executes while ok is true. On each iteration, ok is set to the return value of writer.write(), which is false if a drain is required. When ok becomes false, the loop stops, a one-time drain handler is registered, and once it fires, writing resumes.
Regarding your code specifically, you don't need to use the drain event because you are writing only once right after opening your stream. Since you have not yet written anything to the stream, the internal buffer is empty, and you would have to be writing at least 16KB in chunks in order for the drain event to fire. The drain event is for writing many times with more data than the highWaterMark setting of your writable stream.
Imagine you're connecting 2 streams with very different bandwidths, say, uploading a local file to a slow server. The (fast) file stream will emit data faster than the (slow) socket stream can consume it.
In this situation, node.js will keep data in memory until the slow stream gets a chance to process it. This can get problematic if the file is very large.
To avoid this, Stream.write returns false when the underlying system buffer is full. If you stop writing, the stream will later emit a drain event to indicate that the system buffer has emptied and it is appropriate to write again.
You can pause/resume the readable stream yourself to control its bandwidth.
Better: you can use readable.pipe(writable) which will do this for you.
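For example, a minimal sketch with placeholder file names:
const fs = require('fs');

const read = fs.createReadStream('./read');
const write = fs.createWriteStream('./write');

// pipe() pauses the readable stream whenever the writable's buffer is full
// and resumes it on 'drain', so backpressure is handled for you
read.pipe(write);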
EDIT: There's a bug in your code: regardless of what write returns, your data has been written. You don't need to retry it. In your case, you're writing data twice.
Something like this would work:
var packets = […],
    current = -1;

function niceWrite() {
  current += 1;
  if (current === packets.length)
    return stream.end();

  var nextPacket = packets[current],
      canContinue = stream.write(nextPacket);

  // wait until stream drains to continue
  if (!canContinue)
    stream.once('drain', niceWrite);
  else
    niceWrite();
}
Here is a version with async/await
const write = (writer, data) => {
  return new Promise((resolve) => {
    if (!writer.write(data)) {
      writer.once('drain', resolve)
    } else {
      resolve()
    }
  })
}

// usage
const run = async () => {
  const write_stream = fs.createWriteStream('...')
  const max = 1000000
  let current = 0
  while (current <= max) {
    await write(write_stream, current++)
  }
}
https://gist.github.com/stevenkaspar/509f792cbf1194f9fb05e7d60a1fbc73
This is a speed-optimized version using Promises (async/await). The caller has to check whether it gets a promise back, and only in that case does it have to await. Awaiting every call can slow the program down by a factor of about 3...
const write = (writer, data) => {
  // return a promise only when we get a drain
  if (!writer.write(data)) {
    return new Promise((resolve) => {
      writer.once('drain', resolve)
    })
  }
}

// usage
const run = async () => {
  const write_stream = fs.createWriteStream('...')
  const max = 1000000
  let current = 0
  while (current <= max) {
    const promise = write(write_stream, current++)
    // since drain happens rarely, awaiting each write call is really slow.
    if (promise) {
      // we got a drain event, therefore we wait
      await promise
    }
  }
}