I am trying to access the first few lines of text files using the FileApi in JavaScript.
In order to do so, I slice an arbitrary number of bytes from the beginning of the file and hand the blob over to the FileReader.
For large files this takes very long, even though my current understanding is that only the first few bytes of the file need to be accessed.
Is there some implementation in the background that requires the whole file to be accessed before it can be sliced?
Does it depend on the browser implementation of the FileApi?
So far I have tested in both Chrome and Edge (Chromium).
Analysis in Chrome using the performance dev tools shows a lot of idle time before reader.onloadend fires and no increase in RAM usage. However, this might be because the FileApi is implemented in the browser itself and is not reflected in the JavaScript performance statistics.
My implementation of the FileReader looks something like this:
const reader = new FileReader();
reader.onloadend = (evt) => {
    if (evt.target.readyState == FileReader.DONE) {
        console.log(evt.target.result.toString());
    }
};
// Slice first 10240 bytes of the file
var blob = files.item(0).slice(0, 1024 * 10);
// Start reading the sliced blob
reader.readAsBinaryString(blob);
This works fine, but as described it performs quite poorly for large files. I tried it with 10 kB, 100 MB and 6 GB files. The time until the first 10 kB are logged seems to correlate directly with the file size.
Any suggestions on how to improve performance for reading the beginning of a file?
Edit:
Using Response and DOM streams as suggested by #BenjaminGruenbaum sadly does not improve the read performance.
var dest = new WritableStream({
    write(str) {
        console.log(str);
    },
});

var blob = files.item(0).slice(0, 1024 * 10);

(blob.stream ? blob.stream() : new Response(blob).body)
    // Decode the binary-encoded stream to a string
    .pipeThrough(new TextDecoderStream())
    .pipeTo(dest)
    .then(() => {
        console.log('done');
    });
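For completeness, the same sliced read can also be expressed with the promise-based Blob methods; a minimal sketch, assuming Blob.prototype.text() is available (I have not benchmarked this variant):

var blob = files.item(0).slice(0, 1024 * 10);
// Read the sliced blob as UTF-8 text via the promise-based Blob API
blob.text().then((text) => {
    console.log(text);
});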
How about this:
function readFirstBytes(file, n) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            resolve(reader.result);
        };
        reader.onerror = reject;
        reader.readAsArrayBuffer(file.slice(0, n));
    });
}
// "file" is a File (or Blob) instance, e.g. files.item(0)
readFirstBytes(file, 10).then(buffer => {
    console.log(buffer);
});
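If a string is needed rather than the raw ArrayBuffer, the buffer can be decoded afterwards; a small sketch using TextDecoder (this assumes the sliced bytes are UTF-8 text):

readFirstBytes(file, 1024 * 10).then(buffer => {
    // Decode the ArrayBuffer into a string for inspection
    console.log(new TextDecoder('utf-8').decode(buffer));
});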
Related
I am writing a frontend Angular application where the user should be able to upload MS Word template files (.dotx) and later download them on demand. Due to file size limitations I have to upload the files as byte arrays, and the files will be stored and downloaded in this format. The approach described below works fine with .txt files; however, it does not work with MS Word files. The downloaded MS Word file has encoding issues and is not readable.
Part 1: Uploading file
I use the following input to get the file from the user:
<input type="file" class="file-input" (change)="onFileSelected($event)">
Reading the file into a byte array ("uploading" the file):
onFileSelected($event) {
    const file = $event.target.files[0] as File;
    const reader = new FileReader();
    const fileByteArray = [];
    reader.readAsArrayBuffer(file);
    reader.onloadend = (evt) => {
        if (evt.target.readyState === FileReader.DONE) {
            const arrayBuffer = evt.target.result as ArrayBuffer;
            const array = new Uint8Array(arrayBuffer);
            for (const a of array) {
                fileByteArray.push(a);
            }
            this.byteArray = fileByteArray;
        }
    };
}
Downloading file:
downloadFile() {
    const bytesView = new Uint8Array(this.byteArray);
    const str = new TextDecoder('windows-1252').decode(bytesView); // windows-1252 encoding used here for Cyrillic
    const file = new Blob([str], {type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.template'});
    saveAs(file, 'test.dotx');
}
I did not find similar issues on the Internet, so I guess this is not a standard way of dealing with files. Is it even possible to process MS Word files as a byte array with JS?
Another important question: is it safe to upload a file as an array of bytes and save it in the database?
Appreciate any help.
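One thing worth checking: decoding the bytes to a windows-1252 string and letting the Blob constructor re-encode that string as UTF-8 changes the underlying bytes, which corrupts binary formats like .dotx. A minimal sketch that hands the raw bytes straight to the Blob constructor instead (assuming this.byteArray still holds the original bytes unchanged):

downloadFile() {
    const bytesView = new Uint8Array(this.byteArray);
    // Build the Blob directly from the bytes; no string decoding step
    const file = new Blob([bytesView], {
        type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.template'
    });
    saveAs(file, 'test.dotx');
}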
I successfully passed an image as a binary string out of Puppeteer's page.evaluate() function back to Node.js using:
async function getBinaryString(url) {
    return new Promise(async (resolve, reject) => {
        const reader = new FileReader();
        const response = await window.fetch(url);
        const data = await response.blob();
        reader.readAsBinaryString(data);
        reader.onload = () => resolve(reader.result);
        reader.onerror = () => reject('Error occurred while reading binary string');
    });
}
I am able to save it with:
fs.writeFileSync("image.png", Buffer.from(binaryString, "binary"));
But now I wish to convert this PNG image to base64 without saving it to a file first, because I will upload it to a different server. If I save it to a file, I can do the following:
function base64Encode(file) {
    const bitmap = fs.readFileSync(file);
    return Buffer.from(bitmap).toString('base64');
}
How do I skip the file saving part and get proper base64 data for my PNG?
I tried passing the binary string to Buffer.from(binaryString).toString('base64'), but I was unable to save it as a working PNG.
This doesn’t really warrant answering my own question but #Jacob reminded me that I forgot to try:
Buffer.from(binaryString, 'binary').toString('base64');
with the "binary" parameter, which solved the issue; the PNG was correctly formatted again when going from base64 to a file or to an image in a browser.
Maybe the code in the question can be reused by some other Puppeteer user; it took me a while to come up with it and find the pieces across the web.
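Putting the pieces together on the Node side (a sketch meant to run inside an async Puppeteer script; imageUrl is a placeholder and the final upload call is omitted):

// Run getBinaryString (defined above) inside the page context
const binaryString = await page.evaluate(getBinaryString, imageUrl);
// Convert the binary string to base64 entirely in memory
const base64 = Buffer.from(binaryString, 'binary').toString('base64');
// base64 can now be sent to the other server without writing a file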
I have a webpage with file upload functionality. The upload is performed in 5 MB chunks. I want to calculate a hash for each chunk before sending it to the server. The chunks are represented by Blob objects. To calculate the hash I read each blob into an ArrayBuffer using the native FileReader. Here is the code:
var reader = new FileReader();

var getHash = function (blob, callback) {
    reader.onloadend = function (e) {
        var hash = util.hash(e.target.result);
        callback(hash);
    };
    reader.readAsArrayBuffer(blob);
};

var processChunk = function (chunk) {
    if (chunk) {
        getHash(chunk, function (hash) {
            util.sendToServer(chunk, hash, function () {
                // this callback is called when the chunk upload is finished
                processChunk(chunks.shift());
            });
        });
    }
};

var chunks = file.splitIntoChunks(); // gets an array of blobs
processChunk(chunks.shift());
The problem: FileReader.readAsArrayBuffer seems to eat up a lot of memory which is not released. So far I have tested with a 5 GB file in the following browsers:
Chrome 55.0.2883.87 m (64-bit): memory quickly climbs to 1-2 GB and oscillates around that. Sometimes it goes all the way up and the browser tab crashes. It can use more memory than the total size of the chunks read so far; e.g. after reading 500 MB of chunks the process already uses 700 MB of memory.
Firefox 50.1.0: memory usage oscillates around 300-600 MB
Code adjustments I have tried, all to no avail:
- re-using the same FileReader instance for all chunks (as suggested in this question)
- creating a new FileReader for each chunk
- adding a timeout before starting a new chunk
- setting the FileReader and the ArrayBuffer to null after each read
The question: is there a way to fix the problem? Is this a bug in the FileReader implementations or am I doing something wrong?
EDIT: Here is a JSFiddle https://jsfiddle.net/andy250/pjt9udeu/
This is a bug in Chrome on Windows. It is reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=674903
I'm using the following approach in order to preview images before uploading them:
$("#file").change(function () {
    var reader = new FileReader();
    reader.readAsArrayBuffer(this.files[0]);
    var fileName = this.files[0].name;
    var fileType = this.files[0].type;
    alert(fileType);
    reader.onloadend = function () {
        var base64Image = btoa(String.fromCharCode.apply(null, new Uint8Array(this.result)));
        // I show the image now and convert the data to base64
    };
});
I have noticed that when the image is large, the method fails and I cannot preview the image.
I am unsure if the problem is due to base64 conversion or the FileReader.
Is there any setting to increase the maximum size, or is there any workaround?
Here is the error message thrown in the console:
Uncaught RangeError: Maximum call stack size exceeded
at FileReader.reader.onloadend
Your problem is that you use Function.prototype.apply, which converts your typed array items into arguments for the String.fromCharCode method.
Functions have a limit on the maximum number of arguments.
To avoid this when dealing with large files, the best approach is to not process the file at all.
If you need to send the file to your server, simply send the Blob directly; this can easily be achieved with the FormData API.
If you need to display the file, e.g. in an HTML media element, then use the URL.createObjectURL(yourFile) method.
And if you really need a data URI version of the file, then use the reader.readAsDataURL(yourFile) method.
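A quick sketch of the first two suggestions (the upload URL and the element selector are placeholders):

// 1. Send the Blob as-is via FormData; no base64 conversion needed
const formData = new FormData();
formData.append('image', yourFile, yourFile.name);
fetch('/upload', { method: 'POST', body: formData });

// 2. Preview the file in an <img> element via an object URL
document.querySelector('#preview').src = URL.createObjectURL(yourFile);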
Works for me:
var reader = new FileReader();
reader.onload = function (evt) {
    var binary = '';
    var bytes = new Uint8Array(reader.result);
    var len = bytes.byteLength;
    for (var i = 0; i < len; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    console.log(btoa(binary));
};
reader.readAsArrayBuffer(file);
If you read the file using the FileReader, the whole file is loaded into memory. If you try to handle large files this way, it will simply crash your web browser. If you really want to pass your file as a base64 string, I recommend adding file size constraints to prevent any potential problems. In conclusion, none of the FileReader methods are suitable for this purpose unless you are dealing with small files, no larger than 100 MB or so; otherwise you will run into problems.
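A sketch of such a size constraint, reusing the file and reader variables from the snippet above (the 100 MB threshold mirrors the figure mentioned and is just an example):

// Skip the base64 conversion for files over a chosen limit
var MAX_BYTES = 100 * 1024 * 1024; // ~100 MB, pick whatever fits your use case
if (file.size > MAX_BYTES) {
    alert('File is too large to convert to base64 in the browser.');
} else {
    reader.readAsArrayBuffer(file);
}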
After playing around, here's the solution:
$("#file").change(function () {
    var reader = new FileReader();
    reader.readAsBinaryString(this.files[0]);
    var fileName = this.files[0].name;
    var fileType = this.files[0].type;
    alert(fileType);
    reader.onloadend = function () {
        var base64Image = btoa(this.result);
    };
});
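For reference, readAsDataURL yields the base64 payload directly, without the btoa step; a sketch of that variant (the data: URL prefix is stripped with a split):

$("#file").change(function () {
    var reader = new FileReader();
    reader.onloadend = function () {
        // this.result is "data:<mime>;base64,<payload>"; keep only the payload
        var base64Image = this.result.split(',')[1];
    };
    reader.readAsDataURL(this.files[0]);
});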
I got the original code from here: Using Javascript FileReader with huge files
But my purpose is different: the author wants to get just a part of the file, whereas I want all of it.
I'm trying to modify it with a loop, mixed with this technique: slice large file into chunks and upload using ajax and html5 FileReader
All my attempts failed; is there any way I can get what I want?
var getSource = function (file) {
    var reader = new FileReader();
    reader.onload = function (e) {
        if (e.target.readyState == FileReader.DONE) {
            process(e.target.result);
        }
    };
    var part = file.slice(0, 1024 * 1024);
    reader.readAsBinaryString(part);
};

function process(data) {
    // the data is processed here
}
Thank you,
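A minimal sketch of a sequential chunked read along those lines (1 MB chunks as in the snippet above; process() is the handler from the question, everything else is illustrative):

var CHUNK_SIZE = 1024 * 1024;

function readAllChunks(file) {
    var offset = 0;
    var reader = new FileReader();

    reader.onload = function (e) {
        process(e.target.result);   // handle the current chunk
        offset += CHUNK_SIZE;
        if (offset < file.size) {
            readNext();             // keep going until the end of the file
        }
    };

    function readNext() {
        var part = file.slice(offset, offset + CHUNK_SIZE);
        reader.readAsBinaryString(part);
    }

    readNext();
}

// Usage, e.g. with a file picked from an <input type="file">:
// readAllChunks(files.item(0));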