fs.createReadStream loop not completing - javascript

I am iterating through an object containing local files, all of which definitely exist, reading them into a buffer and incrementing a counter when each completes. The problem is that, despite there being 319 files to read, the counter printed to the console rarely, if ever, gets through all of them. It mysteriously stops somewhere in the 200's... different every time and without throwing any errors.
I have this running in an Electron project, and the built app works seamlessly on a Mac but won't get through this loop on Windows! I've recently updated all the packages, been through the other areas and made the necessary adjustments, and the whole app is working perfectly... except this, and it's driving me mad!
Here's the code:
$.each(compare_object, function(key, item) {
    console.log(item.local_path); // this correctly prints out every single file path
    var f = fs.createReadStream(item.local_path);
    f.on('data', function(buf) {
        // normally some other code goes in here but I've simplified it right down for the purposes of getting it working!
    });
    f.on('end', function(err) {
        num++;
        console.log(num); // this rarely reached past 280 out of 319 files. Always different though.
    });
    f.on('error', function(error) {
        console.log(error); // this never fires.
        num++;
    });
});
I'm wondering if there's a cache that's maxing out, or if I should be destroying the buffer after 'end' every time, but nothing I've read suggests this, and even when I tried it, it made no difference. A lot of the examples expect you to be piping the stream somewhere, which I'm not. In the full code it creates a hash of the complete file and adds it to the object for each of the local files.

I believe the loop itself is completing here. The problem is that the handlers you attach are asynchronous, so the loop finishes before the streams do. The easiest possible solution here is to rewrite your code without streams.
const fs = require('fs')
const util = require('util')
const asyncReadFile = util.promisify(fs.readFile)

// await only works inside an async function, so wrap the loop in one
async function processAll() {
    for (const [key, item] of Object.entries(compare_object)) {
        const data = await asyncReadFile(item.local_path)
        // ... here goes data handling
    }
}
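If you do want to keep the streams (for example to hash large files without buffering them whole, which is what the full code apparently does), here is a rough sketch that wraps each stream in a Promise and processes the files one at a time; hashFile and hashAllFiles are names I made up, and sha256 is an arbitrary choice:
const fs = require('fs')
const crypto = require('crypto')

// hypothetical helper: resolves with the file's hash once the stream ends,
// rejects if the stream errors
function hashFile(path) {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash('sha256')
        fs.createReadStream(path)
            .on('data', (chunk) => hash.update(chunk))
            .on('end', () => resolve(hash.digest('hex')))
            .on('error', reject)
    })
}

// process the files one at a time, in the same spirit as the loop above
async function hashAllFiles() {
    for (const [key, item] of Object.entries(compare_object)) {
        item.hash = await hashFile(item.local_path)
    }
    console.log('all files hashed')
}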

Related

Asynchronously stopping a loop from outside node.js

I am using node.js 14 and currently have a loop that is made by a recursive function and a setTimeout, something like this:
this.timer = null;
async recursiveLoop() {
    // Do Stuff
    this.timer = setTimeout(this.recursiveLoop.bind(this), rerun_time);
}
But sometimes this loop gets stuck and I want it to automatically notice it, clean up and restart. So I tried doing something like this:
this.timer = null;
async recursiveLoop() {
    this.long_timer = setTimeout(() => { throw new Error('Taking too long!'); }, tooLong);
    // Do Stuff
    this.timer = setTimeout(this.recursiveLoop.bind(this), rerun_time);
}
main() {
    // Do other asynchronous stuff
    recursiveLoop()
        .then()
        .catch((e) => {
            console.log(e.message);
            cleanUp();
            recursiveLoop();
        });
}
I can't quite debug where it gets stuck, because it seems quite random and the program runs on a virtual machine. I still couldn't reproduce it locally.
This makeshift solution, instead of working, keeps crashing the whole node.js application, and now I am the one who's stuck. I have the constraint of working with node.js 14, without using microservices, and I have never used child processes before. I am a complete beginner. Please help me!
If you have a black box of code (which is all you've given us) with no way to detect errors on it and you just want to know when it is no longer generating results, you can put it in a child_process and ask the code in the child process to send you a message every time it runs an iteration. Then, in your main process, you can set a timer that resets itself every time it gets one of these "health" messages from the child. If the timer fires without getting a health message, then the child must be "stuck" because you haven't heard from it within your timeout time. You can then kill the child process at that point and restart it.
But, that is a giant hack. You should FIX the code that gets stuck, or at least understand what's going on. You're probably leaking memory, file handles, or database handles, running code that mishandles locks, or hitting unhandled errors. All of these are indications of code that should be fixed.
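A minimal sketch of that watchdog idea, assuming the loop is moved into its own file; the file names, the 'health' message, and the timeout value are all made up for illustration:
// parent.js (hypothetical file name): restarts the worker if it goes quiet
const { fork } = require('child_process');

const TOO_LONG = 60 * 1000; // assumed timeout, tune to your workload

function startWorker() {
    const child = fork('./loop-worker.js'); // hypothetical file that runs recursiveLoop()
    let watchdog;

    const resetWatchdog = () => {
        clearTimeout(watchdog);
        watchdog = setTimeout(() => {
            console.log('No health message received in time, restarting worker');
            child.kill();
            startWorker(); // spawn a fresh worker
        }, TOO_LONG);
    };

    resetWatchdog();
    child.on('message', (msg) => {
        // the worker calls process.send('health') at the top of each iteration
        if (msg === 'health') resetWatchdog();
    });
    child.on('exit', () => clearTimeout(watchdog));
}

startWorker();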

nodejs debug - save state of the script (with all objects and var values) and use it on next run

When I run the script, my application first creates an object that I use throughout the application.
Object creation takes up to 10 seconds, so when I try to test any new piece of code, I have to wait 10 seconds each time.
The app is going to be big enough that I can't wait that long every time I add a new line of code. How do I deal with this? Is there a way to save the state of the script at some point and run it each time from that point, with this "heavy" object already initialized?
Assuming that the heavy object is something that will not change often, you could try exporting it like this:
const heavyObj = {};
module.exports = heavyObj;
After that, import the object wherever you need it:
const heavyObj = require('./data.js')
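This works because Node caches a module the first time it is require()d, so the expensive construction runs only once per process. A minimal sketch of data.js, where initHeavyObject is a hypothetical stand-in for the 10-second setup:
// data.js (hypothetical file name)
console.log('building heavy object...'); // printed only once per process
const heavyObj = initHeavyObject();      // stand-in for the ~10 second setup
module.exports = heavyObj;
Note that the require cache only lives for the duration of the process; a fresh run of the script still pays the 10 seconds once.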

Is there a way to minimise CPU usage by reducing number of write operations to chrome.storage?

I am making a chrome extension that keeps track of the time I spend on each site.
In the background.js I am using a map (stored as an array) that saves the list of sites as shown.
let observedTabs = [['chrome://extensions', [time, time, 'icons/sad.png']]];
Every time I update my current site, the starting and ending time of my time on that particular site is stored in the map corresponding to the site's key.
To achieve this, I am performing the chrome.storage.sync.get and chrome.storage.sync.set inside the tabs.onActivated, tabs.onUpdated, windows.onFocusChanged and idle.onStateChanged.
This however results in a very high CPU usage for Chrome (around 25%) due to multiple read and write operations from (and to) storage.
I tried to solve the problem by using global variables in background.js and initialising them to undefined. Using the function shown below, I read from storage only when the current variable is undefined (the first time background.js tries to get the data); at all other times, it just uses the already-set global variable.
let observedTabs = undefined;
function getObservedTabs(callback) {
    if (observedTabs === undefined) {
        chrome.storage.sync.get("observedTabs", (observedTabs_obj) => {
            callback(observedTabs_obj.observedTabs);
        });
    } else {
        callback(observedTabs);
    }
}
This solves the problem of the costly repeated read operations.
As for the write operations, I considered using runtime.onSuspend to write to storage once my background script stops executing, as shown:
chrome.runtime.onSuspend.addListener(() => {
    getObservedTabs((_observedTabs) => {
        observedTabs = _observedTabs;
        chrome.storage.sync.set({"observedTabs": _observedTabs});
    });
});
This, however, doesn't work. And the documentation also warns about this.
Note that since the page is unloading, any asynchronous operations started while handling this event are not guaranteed to complete.
Is there a workaround that would allow me to minimise my writing operations to storage and hence reduce my CPU usage?
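One pattern that can cut the write load (a sketch, not from the original thread): keep updating the in-memory map inside the event handlers and flush it to chrome.storage behind a debounce timer, so set() runs at most once per interval. The names and delay below are made up:
// sketch: batch writes behind a debounce
const WRITE_DELAY_MS = 5000;
let writeTimer = null;

function scheduleWrite() {
    if (writeTimer !== null) return; // a write is already pending
    writeTimer = setTimeout(() => {
        writeTimer = null;
        chrome.storage.sync.set({ observedTabs: observedTabs });
    }, WRITE_DELAY_MS);
}

// call scheduleWrite() instead of chrome.storage.sync.set(...) inside
// tabs.onActivated, tabs.onUpdated, windows.onFocusChanged and idle.onStateChanged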

Marklogic Server Side Javascript: XDMP-CONFLICTINGUPDATES while using explicit commit

I've been having problems with conflicting updates in Marklogic. I'm aware of the cause, but I don't know how to fix it.
I have 1 main (.sjs) file which calls two different (.sjs) files which both update a set of documents. In the main file I use: declareUpdate({explicitCommit: true}); and then in the separate files I use the command xdmp.commit(); after updating the documents. However, I'm still getting: XDMP-CONFLICTINGUPDATES.
Part of the code: Main.sjs:
function main() {
    declareUpdate({explicitCommit: true});
    file1Function(); // defined in file1.sjs
    file2Function(); // defined in file2.sjs
}
file1.sjs:
// make some docs and insert them into ML
function file1Function() {
    for (let d of someCollectionOfData) {
        xdmp.documentInsert('/a/uri/filexx.json', d, {collections: aCollectionName1});
    }
    xdmp.commit();
}
file2.sjs:
// adjust docs made in file1 and put them back into ML
function file2Function() {
    for (let d of xdmp.directory('/location/of/aCollectionName1/', 'infinity')) {
        let dObj = d.toObject();
        dObj.addSomething = 'something';
        xdmp.documentInsert(fn.documentUri(d), dObj, {collections: aCollectionName1});
    }
    xdmp.commit();
}
It must mean the documents created by file1 end up inside '/location/of/aCollectionName1/'. Keep in mind that MarkLogic doesn't commit immediately when you invoke xdmp.commit(). Actual persisting is always postponed until after all code has executed. It therefore doesn't make much sense to invoke xdmp.commit() more than once in one request, and you won't be able to read your updates after xdmp.commit().
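Given that explanation, one way around the conflict is to apply both changes in memory and write each document only once per transaction. A rough sketch reusing the placeholders from the question (the numbered URI scheme here is made up):
// single pass: build the final version of each document, then insert it once
declareUpdate();

let i = 0;
for (let d of someCollectionOfData) {
    // apply the adjustment file2 used to make before the document is ever written
    d.addSomething = 'something';
    xdmp.documentInsert('/a/uri/file' + i + '.json', d, {collections: aCollectionName1});
    i++;
}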
HTH!

Best way to read many files with nodejs?

I have a large glob of file paths. I'm getting this path list from a streaming glob module https://github.com/wearefractal/glob-stream
I was piping this stream to another stream that created a fileReadStream for each path, and I quickly hit some limits. I was getting:
warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit
and also Error: EMFILE, open
I've tried bumping maxListeners, but I have ~9000 files that would be creating streams, and I'm concerned that will eat memory; that number is not constant and will grow. Am I safe to remove the limit here?
Should I be doing this synchronously? Or should I be iterating over the paths and reading the files sequentially? Won't that still execute all the reads at once if I use a for loop?
The max listeners thing is purely a warning. setMaxListeners only controls when that message is printed to the console, nothing else. You can disable it or just ignore it.
The EMFILE is your OS enforcing a limit on the number of open files (file descriptors) your process can have at a single time. You could avoid this by increasing the limit with ulimit.
Because saturating the disk by running many thousands of concurrent filesystem operations won't get you any added performance—in fact, it will hurt, especially on traditional non-SSD drives—it is a good idea to only run a controlled number of operations at once.
I'd probably use an async queue, which allows you to push the name of every file to the queue in one loop, and then only runs n operations at once. When an operation finishes, the next one in the queue starts.
For example:
// using the `async` library's queue (npm install async)
var async = require('async');

var q = async.queue(function (file, cb) {
    var stream = fs.createReadStream(file.path);
    // ...
    stream.on('end', function() {
        // finish up, then
        cb();
    });
}, 2);

globStream.on('data', function(file) {
    q.push(file);
});

globStream.on('end', function() {
    // We don't want to add the `drain` handler until *after* the globstream
    // finishes. Otherwise, we could end up in a situation where the globber
    // is still running but all pending file read operations have finished.
    q.drain = function() {
        // All done with everything.
    };

    // ...and if the queue is empty when the globber finishes, make sure the done
    // callback gets called.
    if (q.idle()) q.drain();
});
You may have to experiment a little to find the right concurrency number for your application.
