Chrome File System API hanging - javascript

Disclaimer: this is a self-answered post, in the hope of saving others some time.
Setup:
I've been using Chrome's implementation of the File System API [1] [2] [3].
This requires enabling the flag chrome://flags/#native-file-system-api.
For starters, I want to recursively read a directory and obtain a list of files. This is simple enough:
let paths = [];
let recursiveRead = async (path, handle) => {
  let reads = [];
  // window.handle = handle;
  for await (let entry of await handle.getEntries()) { // <<< HANGING
    if (entry.isFile)
      paths.push(path.concat(entry.name));
    else if (/* check some whitelist criteria to restrict which dirs are read */)
      reads.push(recursiveRead(path.concat(entry.name), entry));
  }
  await Promise.all(reads);
  console.log('done', path, paths.length);
};
chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
  recursiveRead([], handle).then(() => {
    console.log('COMPLETELY DONE', paths.length);
  });
});
I've also implemented a non-recursive while-loop (queue) version, and lastly a Node fs.readdir version (sketched below). All three solutions work fine for small directories.
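For reference, the Node version boils down to something like this (a minimal sketch, not my exact code):
const fs = require('fs');
const path = require('path');
// Recursively collect file paths under dir.
const readdirRecursive = async dir => {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  const nested = await Promise.all(entries.map(entry => {
    const full = path.join(dir, entry.name);
    return entry.isDirectory() ? readdirRecursive(full) : full;
  }));
  return nested.flat();
};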
The problem:
But then I tried running it on some sub-directories of the Chromium source code ('base', 'components', and 'chrome'); together the three sub-dirs contain ~63,000 files. While the Node implementation worked fine (and surprisingly it benefited from cached results between runs, making every run after the first essentially instantaneous), both browser implementations hung.
Attempted debugging:
Sometimes they would return the full 63k files and print 'COMPLETELY DONE' as expected. But most often (90% of the time) they would read 10k-40k files before hanging.
I dug deeper into the hanging, and apparently the for await line was the culprit. So I added the line window.handle = handle immediately before the for loop; when the function hung, I ran the for loop directly in the browser console, and it worked correctly! So now I'm stuck: I have seemingly working code that randomly hangs.

Solution:
I tried skipping over directories that would hang:
let whitelistDirs = {src: ['base', 'chrome', 'components', /*'ui'*/]}; // 63800
let readDirEntry = (handle, timeout = 500) => {
  return new Promise(async (resolve, reject) => {
    // Reject if the read takes too long; a late reject after resolve is a no-op.
    setTimeout(() => reject('timeout'), timeout);
    let entries = [];
    for await (const entry of await handle.getEntries())
      entries.push(entry);
    resolve(entries);
  });
};
let readWhile = async entryHandle => {
  let paths = [];
  let pending = [{path: [], handle: entryHandle}];
  while (pending.length) {
    let {path, handle} = pending.pop();
    await readDirEntry(handle)
      .then(entries =>
        entries.forEach(entry => {
          if (entry.isFile)
            paths.push({path: path.concat(entry.name), handle: entry});
          else if (path.length || !whitelistDirs[handle.name] || whitelistDirs[handle.name].includes(entry.name))
            pending.push({path: path.concat(entry.name), handle: entry});
        }))
      .catch(() => console.log('skipped', handle.name));
    console.log('paths read:', paths.length, 'pending remaining:', pending.length, path);
  }
  console.log('read complete', paths.length);
  return paths;
};
chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
  readWhile(handle).then(paths => {
    console.log('COMPLETELY DONE', paths.length);
  });
});
And the results showed a pattern: once a directory read hung and was skipped, the next ~10 dir reads would likewise hang and be skipped; then reads would resume working properly until the next similar incident.
// begins skipping
paths read: 45232 pending remaining: 49 (3) ["chrome", "browser", "favicon"]
VM60:25 skipped extensions
VM60:26 paths read: 45239 pending remaining: 47 (3) ["chrome", "browser", "extensions"]
VM60:25 skipped enterprise_reporting
VM60:26 paths read: 45239 pending remaining: 46 (3) ["chrome", "browser", "enterprise_reporting"]
VM60:25 skipped engagement
VM60:26 paths read: 45266 pending remaining: 45 (3) ["chrome", "browser", "engagement"]
VM60:25 skipped drive
VM60:26 paths read: 45271 pending remaining: 44 (3) ["chrome", "browser", "drive"]
// begins working properly again
So the issue seemed to be transient. I added a simple retry wrapper with a 500ms wait between retries, and the reads began working fine.
let readDirEntryRetry = async (handle, timeout = 500, tries = 5, waitBetweenTries = 500) => {
  while (tries--) {
    try {
      return await readDirEntry(handle, timeout);
    } catch (e) {
      console.log('readDirEntry failed, tries remaining:', tries, handle.name);
      await sleep(waitBetweenTries);
      if (!tries)
        throw e;
    }
  }
};
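sleep isn't defined in the snippets above; it's just a promisified setTimeout. The wrapper then replaces the bare readDirEntry call inside readWhile:
// Assumed helper (not shown in the original snippets):
let sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
// Inside readWhile, swap in the retrying version:
// await readDirEntryRetry(handle)
//   .then(entries => { /* same entry handling as above */ })
//   .catch(() => console.log('skipped', handle.name));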
Conclusion:
The non-standard Native File System API hangs when reading large directories; simply waiting and retrying resolves the issue. It took me a good week to arrive at this solution, so I thought it'd be worth sharing.

Related

dynamically zip generated pdf files node-archiver

I'm trying to create multiple PDF files using pdfkit. I have an array of users and I create a report for each one; the createTable() function below returns a Buffer that I send to archiver to zip, and once complete the zip file is sent for download to the front end.
My issue is that, for some reason, Archiver will sometimes throw a QUEUECLOSED error if I run the function too many times. Sometimes I can run it 10 times and the 11th time I'll get an error; sometimes I get an error after the second run. Each time I run it the data is the same and nothing else has changed.
Any help is greatly appreciated.
users.forEach(async (worker, index) => {
  createTable(date, worker.associateName, action, worker.email, worker.id, excahngeRate).then(resp => {
    archive.append(resp, { name: worker.associateName + '.pdf' })
    if (index === users.length - 1 === true) { // make sure it's the last item in the array
      archive.pipe(output)
      archive.finalize();
    }
  }).catch(err => {
    console.log(err)
  })
});
You finalize too soon. The createTable call for the last user might not be the last one to finish. You should append everything to the archive and, only once all of them are done, finalize it.
// Use map to get an array of promises
const promises = users.map((worker, index) => {
  return createTable(date, worker.associateName, action, worker.email, worker.id, excahngeRate).then(resp => {
    archive.append(resp, { name: worker.associateName + '.pdf' })
  }).catch(err => {
    console.log(err)
  })
});
// Wait for all promises to finish.
Promise.all(promises).then(() => {
  archive.pipe(output)
  archive.finalize();
});
In your current code, you could console.log just before your if statement and log the index of the completed createTable; you'll see they do not finish in order.
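For instance (a hypothetical log line added to your original forEach):
// Just before the if statement:
console.log('createTable finished for index', index, 'of', users.length - 1);
// Run it a few times: the indices will not print in ascending order.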

how to detect file size / type while mid-download using axios or other requestor?

I have a scraper that looks for text on sites found via a Google search. However, occasionally the URLs it hits are LARGE files without extensions (e.g. https://myfile.com/myfile/).
I do have a timeout mechanism in place, but by the time it fires, the file has already overloaded the memory. Is there any way to detect a file's size or type while it's being downloaded?
Here is my request function:
const getHtml = async (url, { timeout = 10000, ...opts } = {}) => {
  const CancelToken = axios.CancelToken
  const source = CancelToken.source()
  try {
    const timeoutId = setTimeout(() => source.cancel('Request cancelled due to timeout'), timeout)
    let site = await axios.get(url, {
      headers: {
        'user-agent': userAgent().toString(),
        connection: 'keep-alive', // self note: Isn't this prohibited on http/2?
      },
      cancelToken: source.token,
      ...opts,
    })
    clearTimeout(timeoutId)
    return site.data
  } catch (err) {
    throw err
  }
}
PS: I've seen similar questions, but none had an answer that would apply.
OK, so this isn't as easy to solve as one might expect. Ideally, the 'Content-Length' and 'Content-Type' HTTP headers would tell you what to expect, but they aren't required, and even when present they're often missing or inaccurate.
The solution I've found for this problem, which looks to be very reliable, involves two things:
Making the request as a stream
Reading the file signature that many file formats carry in their first bytes; these are commonly known as magic numbers/bytes.
A great way to combine these two things is to stream the response and read the first bytes to check for the file signature. Once you know whether the file is in a format you support/want, you can process it as you normally would, or cancel the request before reading the next chunk of the stream, which prevents overloading your system. The stream also lets you measure the file size more accurately, which I show in the snippet further down.
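For illustration, a signature check on the first chunk might look like this (the signature table below is a small hypothetical sample, not an exhaustive list):
// Sketch: match the first bytes of the first chunk against known magic numbers.
const SIGNATURES = [
  { type: 'pdf',  bytes: Buffer.from('%PDF') },
  { type: 'png',  bytes: Buffer.from([0x89, 0x50, 0x4e, 0x47]) },
  { type: 'gzip', bytes: Buffer.from([0x1f, 0x8b]) },
];
const sniffType = chunk => {
  const match = SIGNATURES.find(sig => chunk.slice(0, sig.bytes.length).equals(sig.bytes));
  return match ? match.type : 'unknown';
};
In my case, checking whether the first chunk looks like HTML is all the scraper needs, so the implementation below does just that.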
Here's how I implemented the solution mentioned above:
// NOTE: sizeLimit isn't defined in the original snippet; pick whatever cap suits you.
const sizeLimit = 10 * 1024 * 1024 // e.g. 10 MB

const getHtml = async (url, { timeout = 10000, ...opts } = {}) => {
  const CancelToken = axios.CancelToken
  const source = CancelToken.source()
  try {
    const timeoutId = setTimeout(() => source.cancel('Request cancelled due to timeout'), timeout)
    const res = await axios.get(url, {
      headers: {
        connection: 'keep-alive',
      },
      cancelToken: source.token,
      // Use stream mode so we can read the first chunk before getting the rest (~16kB/chunk by default (highWaterMark))
      responseType: 'stream',
      ...opts,
    })
    const stream = res.data;
    let firstChunk = true
    let size = 0
    // Not to be confused with ArrayBuffer (the object) ;)
    const bufferArray = []
    // Async-iterator syntax for consuming the stream. Iterating over a stream will consume it fully,
    // but returning or breaking the loop in any way will destroy it
    for await (const chunk of stream) {
      if (firstChunk) {
        firstChunk = false
        // Only check the first 100 (relevant, spaces excl.) chars of the chunk for html. This would possibly
        // only fail on a raw text file that contains the word html at the very top (very unlikely, and even
        // then it wouldn't break anything)
        const stringChunk = String(chunk).replace(/\s+/g, '').slice(0, 100).toLowerCase()
        if (!stringChunk.includes('html')) return { error: `Requested URL is detected as a file. URL: ${url}\nChunk's magic 100: ${stringChunk}` };
      }
      size += Buffer.byteLength(chunk);
      if (size > sizeLimit) return { error: `Requested URL is too large.\nURL: ${url}\nSize: ${size}` };
      const buff = Buffer.from(chunk)
      bufferArray.push(buff)
    }
    // After the stream is fully consumed, clear the timeout and concatenate one big buffer to return as a string
    clearTimeout(timeoutId)
    return { html: Buffer.concat(bufferArray).toString() }
  } catch (err) {
    throw err
  }
}
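Usage would then be along these lines (a sketch; the URL is just an example):
(async () => {
  const result = await getHtml('https://example.com')
  if (result.error) console.warn(result.error)
  else console.log('got html, length:', result.html.length)
})()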

Reading, parsing files and inserting documents using NestJS and MongoDB causing JavaScript heap out of memory

My NestJS application has a simple purpose:
Loop through an array of large files (29 files, each with about 12k to 70k lines)
Read each file line by line and parse it
Insert each parsed line into my MongoDB collection
The most important part of my code consists of:
for (let file of FILES) {
  result = await this.processFile(file);
  resultInsert += result;
}
and the function processFile()
async processFile(fileName: string): Promise<number> {
  let count = 0;
  return new Promise((resolve, reject) => {
    let s = fs
      .createReadStream(BASE_PATH + fileName, {encoding: 'latin1'})
      .pipe(es.split()) // es: the event-stream package
      .pipe(
        es
          .mapSync(async (line: string) => {
            count++;
            console.log(line);
            let line_splited = line.split("#");
            let user = {
              name: line_splited[0],
              age: line_splited[1],
              address: line_splited[2],
              job: line_splited[3],
              country: line_splited[4]
            }
            await this.userModel.updateOne(
              user,
              user,
              { upsert: true }
            );
          })
          .on('end', () => {
            resolve(count);
          })
          .on('error', err => {
            reject(err);
          })
      );
  });
}
The main problem is that by around the 9th file, I get a memory failure: Allocation failed - JavaScript heap out of memory.
I saw that my problem is similar to Parsing huge logfiles in Node.js - read in line-by-line, but the code still managed to fail.
I suspect that opening a file and reading it while I'm still inserting lines from the previous one could be causing the problem, but I don't know how to handle it.
I could make it work by changing updateOne() to insertMany().
Quick explanation: instead of inserting one document at a time, we insert 100k at a time.
So I just accumulated users in an array, and when it reached 100k documents, we inserted them with insertMany() (see the sketch below).
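Here's a minimal sketch of the idea. It assumes an async-iterable line stream (e.g. from readline.createInterface) in place of event-stream, and the same '#'-separated line format as above; the batch size is what worked for me.
// Sketch: accumulate parsed lines and flush in bulk with insertMany().
const BATCH_SIZE = 100000;
async function processFileBatched(userModel, lines) {
  let batch = [];
  for await (const line of lines) { // 'lines' is an async iterable of lines
    const f = String(line).split('#');
    batch.push({ name: f[0], age: f[1], address: f[2], job: f[3], country: f[4] });
    if (batch.length >= BATCH_SIZE) {
      await userModel.insertMany(batch); // one round trip per 100k docs
      batch = [];
    }
  }
  if (batch.length) await userModel.insertMany(batch); // flush the remainder
}
Awaiting the bulk insert inside the loop also gives the stream natural backpressure, which es.mapSync with an async callback never provided.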

protractor: random test fail

So I just started working on Protractor tests and I'm facing the following problem: my tests fail inconsistently. Sometimes a test passes, and the next time it fails. The reasons for failure vary: it may fail to find an element on the page, or an element doesn't have text in it (even though it does).
I'm running on Ubuntu 14.04; the same problem occurs on Chrome 71.0.3578.80 and Firefox 60.0.2, with AngularJS 1.7.2 and Protractor 5.4.0. I believe the problem is somewhere in my code, so below I've provided an example of the existing code base.
Here is my protractor config
exports.config = {
  rootElement: '[ng-app="myapp"]',
  framework: 'jasmine',
  seleniumAddress: 'http://localhost:4444/wd/hub',
  specs: ['./e2e/**/*protractor.js'],
  SELENIUM_PROMISE_MANAGER: false,
  baseUrl: 'https://localhost/',
  allScriptsTimeout: 20000,
  jasmineNodeOpts: {
    defaultTimeoutInterval: 100000,
  },
  capabilities: {
    browserName: 'firefox',
    marionette: true,
    acceptInsecureCerts: true,
    'moz:firefoxOptions': {
      args: ['--headless'],
    },
  },
}
And here are the capabilities for the Chrome browser:
capabilities: {
  browserName: 'chrome',
  chromeOptions: {
    args: [ "--headless", "--disable-gpu", "--window-size=1920,1080" ]
  }
},
And finally, my test suite that has failed a few times:
const InsurerViewDriver = require('./insurer-view.driver');
const InsurerRefundDriver = require('./insurer-refund.driver');
const { PageDriver } = require('#utils/page');
const { NotificationsDriver } = require('#utils/service');
const moment = require('moment');

describe(InsurerViewDriver.pageUrl, () => {
  beforeAll(async () => {
    await InsurerViewDriver.goToPage();
  });
  it('- should test "Delete" button', async () => {
    await InsurerViewDriver.clickDelete();
    await NotificationsDriver.toBeShown('success');
    await PageDriver.userToBeNavigated('#/setup/insurers');
    await InsurerViewDriver.goToPage();
  });
  describe('Should test Refunds section', () => {
    it('- should test refund list content', async () => {
      expect(await InsurerRefundDriver.getTitle()).toEqual('REFUNDS');
      const refunds = InsurerRefundDriver.getRefunds();
      expect(await refunds.count()).toBe(1);
      const firstRow = refunds.get(0);
      expect(await firstRow.element(by.binding('item.name')).getText()).toEqual('Direct');
      expect(await firstRow.element(by.binding('item.amount')).getText()).toEqual('$ 50.00');
      expect(await firstRow.element(by.binding('item.number')).getText()).toEqual('');
      expect(await firstRow.element(by.binding('item.date')).getText()).toEqual(moment().format('MMMM DD YYYY'));
    });
    it('- should test add refund action', async () => {
      await InsurerRefundDriver.openNewRefundForm();
      const NewRefundFormDriver = InsurerRefundDriver.getNewRefundForm();
      await NewRefundFormDriver.setPayment(`#555555, ${moment().format('MMMM DD YYYY')} (amount: $2,000, rest: $1,500)`);
      await NewRefundFormDriver.setPaymentMethod('Credit Card');
      expect(await NewRefundFormDriver.getAmount()).toEqual('0');
      await NewRefundFormDriver.setAmount(200.05);
      await NewRefundFormDriver.setAuthorization('qwerty');
      await NewRefundFormDriver.submit();
      await NotificationsDriver.toBeShown('success');
      const interactions = InsurerRefundDriver.getRefunds();
      expect(await interactions.count()).toBe(2);
      expect(await InsurerViewDriver.getInsurerTitleValue('Balance:')).toEqual('Balance: $ 2,200.05');
      expect(await InsurerViewDriver.getInsurerTitleValue('Wallet:')).toEqual('Wallet: $ 4,799.95');
    });
  });
});
And here are some functions from the drivers that I'm referencing in the test above:
// PageDriver.userToBeNavigated
this.userToBeNavigated = async function(url) {
  return await browser.wait(
    protractor.ExpectedConditions.urlContains(url),
    5000,
    `Expectation failed - user to be navigated to "${url}"`
  );
};

this.pageUrl = '#/insurer/33';
// InsurerViewDriver.goToPage
this.goToPage = async () => {
  await browser.get(this.pageUrl);
};

// InsurerViewDriver.clickDelete()
this.clickDelete = async () => {
  await $('[ng-click="$ctrl.removeInsurer()"]').click();
  await DialogDriver.toBeShown('Are you sure you want to remove this entry?');
  await DialogDriver.confirm();
};

// NotificationsDriver.toBeShown
this.toBeShown = async (type, text) => {
  const awaitSeconds = 6;
  return await browser.wait(
    protractor.ExpectedConditions.presenceOf(
      text ? element(by.cssContainingText('.toast-message', text)) : $(`.toast-${type}`)
    ),
    awaitSeconds * 1000,
    `${type} notification should be shown within ${awaitSeconds} sec`
  );
}

// InsurerRefundDriver.getRefunds()
this.getRefunds = () => $('list-refunds-component').all(by.repeater('item in $data'));

// InsurerViewDriver.getInsurerTitleValue
this.getInsurerTitleValue = async (text) => {
  return await element(by.cssContainingText('header-content p', text)).getText();
};
I can't upload the whole codebase here, but the code above is an exact sample of the approach I'm using everywhere. Does anyone see a problem in my code? Thanks.
First of all, add this block before exporting your config:
process.on("unhandledRejection", ({ message }) => {
  console.log("\x1b[36m%s\x1b[0m", `Unhandled rejection: ${message}`);
});
This logs every unhandled rejection to the console, in color; if you missed an async/await anywhere, it will show up here, which gives you confidence that you didn't miss anything.
Second, I would install the "protractor-console" plugin, to make sure there are no errors/rejections in the browser console (i.e. to rule out issues on your app's side), and add this to your config:
plugins: [{
  package: "protractor-console",
  logLevels: ["severe"]
}]
Then the next problem I would expect, given these symptoms, is incorrect waiting functions. Ideally you'd test them separately as you develop your e2e project, but since it's all written already, I'll tell you how I debugged mine. Note that this approach probably won't help if your actions take less than a second (i.e. you can't notice them). Otherwise, follow this chain.
1) I created a run configuration in WebStorm, as described in my comment (find mine) under How to debug angular protractor tests in WebStorm
2) Set a breakpoint on the first line of the test I want to debug
3) Then execute the test line by line, using the created run config.
When you start the debugging process, WebStorm opens a panel with three sections: frames, console, and variables. When the variables section shows only a message about being connected to localhost and lists no variables, your step is still being executed. Once loading completes, you can see all your variables and execute the next command. So the main principle is: click the Step Over button and watch the variables section. If variables appear before the app has finished loading (i.e. the waiting method returned while the app was still loading, which is wrong), then you need to work on that method. Going this way, I identified a lot of gaps in my custom waiting methods.
And finally, if this doesn't work, please attach the stack trace of your errors and ping me.
I'm concerned about this code snippet:
describe(InsurerViewDriver.pageUrl, () => {
  beforeAll(async () => {
    await InsurerViewDriver.goToPage();
  });
  it('- should test "Delete" button', async () => {
    await InsurerViewDriver.clickDelete();
    await NotificationsDriver.toBeShown('success');
    await PageDriver.userToBeNavigated('#/setup/insurers');
    await InsurerViewDriver.goToPage(); // WHY IS THIS HERE?
  });
  describe('Should test Refunds section', () => {
    it('- should test refund list content', async () => {
      // DOESN'T THIS NEED SOME SETUP?
      expect(await InsurerRefundDriver.getTitle()).toEqual('REFUNDS');
      // <truncated>
You should not depend on the first it clause to set up the suite below it. You didn't post the code for InsurerRefundDriver.getTitle(), but if that code does not send the browser to the correct URL and then wait for the page to finish loading, that is a problem. You should probably have await InsurerViewDriver.goToPage(); in a beforeEach clause, as sketched below.
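Something along these lines (a sketch, not tested against your drivers):
describe(InsurerViewDriver.pageUrl, () => {
  // Fresh navigation before every spec, so no test depends on the one before it.
  beforeEach(async () => {
    await InsurerViewDriver.goToPage();
  });
  // ...specs...
});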
After some research I found the problem. The cause was the way I navigate through the app:
this.goToPage = async () => {
  await browser.get(this.pageUrl);
};
It turns out that browser.get resolves when the URL has changed, but not when AngularJS has finished compiling. I used the same approach in every test suite; that's why my tests were failing inconsistently: sometimes the page was not fully loaded before the test started.
So here is the approach that did the trick:
this.goToPage = async () => {
  await browser.get(this.pageUrl);
  await browser.wait(EC.presenceOf($('some-important-element')), 5000, 'Element did not appear after route change');
};
You should ensure the page has finished all its compiling work before moving on.
This could also be due to asynchronous JavaScript: browser.ignoreSynchronization = true; has a global effect across all your tests. You may have to set it back to false, so Protractor waits for Angular to finish rendering the page, e.g. in or before your second beforeEach function, as in the sketch below.
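For example (a sketch; put it wherever the flag gets toggled):
beforeEach(() => {
  // Re-enable Protractor's Angular synchronization in case an earlier
  // test turned it off globally.
  browser.ignoreSynchronization = false;
});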

Firebase storage failing silently?

I'm trying to get the download url for multiple images, then trigger a change in my app. But... if one of those images doesn't exist for whatever reason, everything fails silently.
Here's the code:
const promises = [];
snapshot.forEach(childSnapshot => {
  const child = childSnapshot.val();
  const promise = firebase.storage()
    .ref(child.songImagePath)
    .getDownloadURL()
    .catch(err => {
      console.log('caught', err);
      return "";
    })
    .then(imageURL => {
      return imageURL;
    });
  promises.push(promise);
});
Promise.all(promises)
  .catch(err => {
    console.log('caught', err);
  })
  .then(urls => {
    // ...do something with urls array
  });
I'm using child.songImagePath in my database to store the image's location in storage. If ALL paths for ALL images have images, everything works perfectly.
BUT if an upload went awry or for some reason there's no image in the storage location, it fails silently. None of my catches fire. And Promise.all is never resolved.
What's going on here? Is there a way to check for a file's existence before calling getDownloadURL?
EDIT: As @mjr points out, the documentation formats the error callback slightly differently than I have. This also seems to never fire an error, though:
.then(
  imageURL => {
    return imageURL;
  },
  err => {
    console.log('caught', err);
    return "";
  }
);
Firebase Storage JS dev here.
I ran your code with minor changes[1] in Chrome and React Native, and didn't see that behavior.
I see Promise.all always resolving (never failing), with an empty string in the array for invalid files. This is because your .catch handler for getDownloadURL returns an empty string.
For further troubleshooting, it would be useful to know:
version of the firebase JS library you are using
the browser/environment and version
network logs, for example from the network panel in Chrome's dev tools, or similar for other browsers
The firebase-talk Google Group tends to be a better place for open-ended troubleshooting with more back-and-forth.
[1] For reference, here's my code:
const promises = [];
// Swap out the array to test different scenarios
// None of the files exist.
//const arr = ['nofile1', 'nofile2', 'nofile3'];
// All of the files exist.
const arr = ['legitfile1', 'legitfile2', 'legitfile3'];
// Some, but not all, of the files exist.
//const arr = ['legitfile1', 'nofile2', 'nofile3'];
arr.forEach(val => {
  const promise = firebase.storage()
    .ref(val)
    .getDownloadURL()
    .catch(err => {
      // This runs for nonexistent files
      console.log('caught', err);
      return "";
    })
    .then(imageURL => {
      // This runs for existing files
      return imageURL;
    });
  promises.push(promise);
});
Promise.all(promises)
  .catch(err => {
    // This never runs
    console.log('caught', err);
  })
  .then(urls => {
    // This always runs
    console.log('urls', urls);
  });
