I have a webpage with file upload functionality. The upload is performed in 5 MB chunks. I want to calculate a hash for each chunk before sending it to the server. The chunks are represented by Blob objects. In order to calculate the hash I read each blob into an ArrayBuffer using a native FileReader. Here is the code:
var reader = new FileReader();
var getHash = function (blob, callback) {
    reader.onloadend = function (e) {
        var hash = util.hash(e.target.result);
        callback(hash);
    };
    reader.readAsArrayBuffer(blob);
};

var processChunk = function (chunk) {
    if (chunk) {
        getHash(chunk, function (hash) {
            util.sendToServer(chunk, hash, function () {
                // this callback is called when chunk upload is finished
                processChunk(chunks.shift());
            });
        });
    }
};
var chunks = file.splitIntoChunks(); // gets an array of blobs
processChunk(chunks.shift());
The problem: using FileReader.readAsArrayBuffer seems to eat up a lot of memory that is not released. So far I have tested with a 5 GB file in the following browsers:
Chrome 55.0.2883.87 m (64-bit): the memory quickly climbs to 1-2 GB and oscillates around that. Sometimes it goes all the way up and the browser tab crashes. It can use more memory than the size of the chunks read so far, e.g. after reading 500 MB of chunks the process already uses 700 MB of memory.
Firefox 50.1.0: memory usage oscillates around 300-600 MB
Code adjustments I have tried - all to no avail:
re-using the same FileReader instance for all chunks (as suggested in this question)
creating new FileReader for each chunk
adding timeout before starting new chunk
setting the FileReader and the ArrayBuffer to null after each read
The question: is there a way to fix the problem? Is this a bug in the FileReader implementations or am I doing something wrong?
EDIT: Here is a JSFiddle https://jsfiddle.net/andy250/pjt9udeu/
This is a bug in Chrome on Windows. It is reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=674903
Related
I am trying to access the first few lines of text files using the FileApi in JavaScript.
In order to do so, I slice an arbitrary number of bytes from the beginning of the file and hand the blob over to the FileReader.
For large files this takes very long, even though my current understanding is that only the first few bytes of the file need to be accessed.
Is there some implementation in the background that requires the whole file to be accessed before it can be sliced?
Does it depend on the browser implementation of the FileApi?
I currently have tested in both Chrome and Edge (chromium).
Analysis in Chrome using the performance dev tools shows a lot of idle time before reader.onloadend fires and no increase in RAM usage. This might, however, be because the File API is implemented in the browser itself and does not show up in the JavaScript performance statistics.
My implementation of the FileReader looks something like this:
const reader = new FileReader();
reader.onloadend = (evt) => {
    if (evt.target.readyState == FileReader.DONE) {
        console.log(evt.target.result.toString());
    }
};
// Slice first 10240 bytes of the file
var blob = files.item(0).slice(0, 1024 * 10);
// Start reading the sliced blob
reader.readAsBinaryString(blob);
This works fine, but as described it performs quite underwhelmingly for large files. I tried it with 10 kB, 100 MB and 6 GB files. The time until the first 10 kB are logged seems to correlate directly with the file size.
Any suggestions on how to improve performance for reading the beginning of a file?
Edit:
Using Response and DOM streams as suggested by @BenjaminGruenbaum sadly does not improve the read performance.
var dest = new WritableStream({
    write(str) {
        console.log(str);
    },
});

var blob = files.item(0).slice(0, 1024 * 10);

(blob.stream ? blob.stream() : new Response(blob).body)
    // Decode the binary-encoded response to string
    .pipeThrough(new TextDecoderStream())
    .pipeTo(dest)
    .then(() => {
        console.log('done');
    });
How about this:
function readFirstBytes(file, n) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            resolve(reader.result);
        };
        reader.onerror = reject;
        reader.readAsArrayBuffer(file.slice(0, n));
    });
}
// Assuming `file` is a File/Blob instance, e.g. taken from an <input type="file">
readFirstBytes(file, 10).then(buffer => {
    console.log(buffer);
});
I'm using the following approach in order to preview images before uploading them:
$("#file").change(function() {
var reader = new FileReader();
reader.readAsArrayBuffer(this.files[0]);
var fileName = this.files[0].name;
var fileType = this.files[0].type;
alert(fileType)
reader.onloadend = function() {
var base64Image = btoa(String.fromCharCode.apply(null, new Uint8Array(this.result)));
// I show the image now and convert the data to base 64
}
}
I have noticed that when the image is large, the method fails and I cannot preview the image.
I am unsure if the problem is due to base64 conversion or the FileReader.
Is there any setting to increase the max size, or is there any workaround?
Here is the error message thrown in the console :
Uncaught RangeError: Maximum call stack size exceeded
at FileReader.reader.onloadend
Your problem is that you use Function.prototype.apply, which converts your typed array's items into individual arguments to the String.fromCharCode method.
Functions have a maximum argument-count limit.
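If you really do need the base64 string, one way to stay under that limit is to convert the buffer in fixed-size chunks; a minimal sketch (the 0x8000 chunk size is an arbitrary choice, not a value mandated anywhere):
function arrayBufferToBase64(buffer) {
    var bytes = new Uint8Array(buffer);
    var binary = '';
    var CHUNK_SIZE = 0x8000; // 32K arguments per call stays well below the engine limit
    for (var i = 0; i < bytes.length; i += CHUNK_SIZE) {
        // subarray creates a view, so no extra copy of the data is made
        binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK_SIZE));
    }
    return btoa(binary);
}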
That said, when dealing with large files the best way to avoid the issue entirely is to not process the file in JavaScript at all.
If you need to send the file to your server, simply send the Blob directly; this is easily done with the FormData API.
If you need to display the file, e.g. in an HTML media element, then use the URL.createObjectURL(yourFile) method.
And if you really need a data URI version of the file, then use the reader.readAsDataURL(yourFile) method.
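For example (a rough sketch; the /upload endpoint and the #preview element are assumptions made for illustration):
var file = document.querySelector('#file').files[0];

// Upload the Blob as-is, no base64 conversion needed (hypothetical /upload endpoint)
var formData = new FormData();
formData.append('image', file, file.name);
fetch('/upload', { method: 'POST', body: formData });

// Preview without reading the file into memory (hypothetical #preview <img> element)
var img = document.querySelector('#preview');
img.src = URL.createObjectURL(file);
img.onload = function () {
    URL.revokeObjectURL(img.src); // release the object URL once the image has loaded
};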
Works for me:
var reader = new FileReader();
reader.onload = function (evt) {
    var binary = '';
    var bytes = new Uint8Array(reader.result);
    var len = bytes.byteLength;
    for (var i = 0; i < len; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    console.log(btoa(binary));
};
reader.readAsArrayBuffer(file)
If you read the file using FileReader, the whole file is loaded into memory. If you try to handle large files this way, the browser may simply crash. If you really want to pass the file around as a base64 string, I recommend adding a file-size constraint to prevent problems. In short, none of the FileReader methods are suitable for this purpose unless you are dealing with small files, say no larger than 100 MB or so; otherwise you will run into problems.
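A minimal sketch of such a size constraint (the 100 MB threshold and the #preview element are just illustrative assumptions):
var MAX_PREVIEW_SIZE = 100 * 1024 * 1024; // illustrative limit, tune as needed

$("#file").change(function () {
    var file = this.files[0];
    if (file.size > MAX_PREVIEW_SIZE) {
        alert("File is too large to preview in the browser.");
        return;
    }
    var reader = new FileReader();
    reader.onloadend = function () {
        // this.result is already a data URI, so no btoa conversion is needed
        $("#preview").attr("src", this.result);
    };
    reader.readAsDataURL(file);
});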
After playing around here's the solution:
$("#file").change(function () {
var reader = new FileReader();
reader.readAsBinaryString(this.files[0]);
var fileName = this.files[0].name;
var fileType = this.files[0].type;
alert(fileType)
reader.onloadend = function () {
var base64Image = btoa(this.result);
}
}
I've encountered a problem with FileReader and reading quite big Blobs.
const size = 50; //MB
const blob = new Blob([new ArrayBuffer(size*1024*1024)], {type: 'application/octet-string'});
console.log(blob.size);
const reader = new FileReader();
reader.onload = function (e) {
    console.log(new Uint8Array(e.target.result));
};
reader.readAsArrayBuffer(blob.slice(0, 1024));
https://jsfiddle.net/aas8gmo2/
The example above shows that the onload function is not called every time (if it is for you, increase the size of the Blob to 100/200/300 MB). The problem is reproducible only in Chrome (tested with 53.0.2785.143).
Any hints what could be wrong?
Last time I used Chrome, there was a hard cap of around 500 MB for a single blob's size.
Also, given these threads: https://bugs.chromium.org/p/chromium/issues/detail?id=375297 and https://github.com/streamproc/MediaStreamRecorder/issues/86
It appears that memory is not properly cleared when creating several small blobs and you might need to reload the page to be able to go on.
(That would also explain why several tries might be needed on the JSFiddle).
So for now, a disappointing answer, but it seems you're going to have to find a workaround... or dive into Chrome's source code.
I have an XHR object that downloads 1GB file.
function getFile(callback)
{
    var xhr = new XMLHttpRequest();
    xhr.onload = function () {
        if (xhr.status == 200) {
            callback.apply(xhr);
        } else {
            console.log("Request error: " + xhr.statusText);
        }
    };
    xhr.open('GET', 'download', true);
    xhr.onprogress = updateProgress;
    xhr.responseType = "arraybuffer";
    xhr.send();
}
But the File API can't hold all of that in memory, even from a worker; it throws an out-of-memory error...
btn.addEventListener('click', function () {
    getFile(function () {
        var worker = new Worker("js/saving.worker.js");
        worker.onmessage = function (e) {
            saveAs(e.data); // FileSaver.js: it creates a URL from the blob... but it's too large
        };
        worker.postMessage(this.response);
    });
});
Web Worker
onmessage = function (e) {
    var view = new DataView(e.data, 0);
    var file = new File([view], 'file.zip', { type: "application/zip" });
    postMessage(file); // post the File object itself, not the string 'file'
};
I'm not trying to compress the file; it is already compressed by the server.
I thought about storing it in IndexedDB first, but I'd have to load the blob or file anyway; even if I request byte ranges, sooner or later I will have to build this giant blob.
I want to create a blob: URL and hand it to the user once it has been downloaded by the browser.
I'll use the FileSystem API for Google Chrome, but I want to make something for Firefox too. I looked into the FileHandle API but found nothing...
Do I have to build an extension for Firefox in order to do the same thing the FileSystem API does for Google Chrome?
Ubuntu 32-bit
Loading 1 GB+ with Ajax isn't convenient just for monitoring download progress, and it fills up the memory.
Instead I would just send the file with a Content-Disposition header, so the browser saves the file itself.
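On the server that could look roughly like this (a sketch assuming a Node/Express backend; the route and file path are hypothetical):
// Hypothetical Express route: the browser's download manager does the saving,
// so the file data never has to pass through page JavaScript or fit in memory.
app.get('/download', (req, res) => {
    res.setHeader('Content-Type', 'application/zip');
    res.setHeader('Content-Disposition', 'attachment; filename="file.zip"');
    res.sendFile('/path/to/file.zip');
});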
There are, however, ways around this to monitor the progress. Option one is to have a second WebSocket that signals how much has been downloaded while you download normally with a GET request; the other option is described further down.
I know you talked about using Blink's sandboxed filesystem in the conversation, but it has some drawbacks. It may need permission if using persistent storage. It only allows 20% of the disk space that is left. And if Chrome needs to free some space, it will throw away other domains' temporary storage, starting with the least recently used. Besides, it doesn't work in private mode.
Not to mention that support for it is being dropped and it may never land in other browsers - though it will most likely not be removed, since many sites still depend on it.
The only way to process such a large file is with streams. That is why I created StreamSaver. This is only going to work in Blink (Chrome & Opera) at the moment, but it will eventually be supported by other browsers, with the WHATWG spec backing it as a standard.
fetch(url).then(res => {
    // One idea is to get the filename from the Content-Disposition header...
    const size = ~~res.headers.get('Content-Length')
    const fileStream = streamSaver.createWriteStream('filename.zip', size)
    const writeStream = fileStream.getWriter()
    // Later you will be able to just simply do
    // res.body.pipeTo(fileStream)
    // instead of pumping

    const reader = res.body.getReader()
    const pump = () => reader.read()
        .then(({ value, done }) => {
            // here you know how large the value (chunk) is and you can
            // figure out the download speed/progress when comparing it to the size
            return done
                ? writeStream.close()
                : writeStream.write(value).then(pump)
        })

    // Start the reader
    pump().then(() =>
        console.log('Closed the stream, Done writing')
    )
})
This will not take up any memory
I have a theory: if you split the file into chunks, store them in IndexedDB, and then later merge them together, it will work.
A Blob isn't made of data... it's more like a pointer to where a file can be read from.
Meaning that if you store them in IndexedDB and then do something like this (using FileSaver or an alternative):
finalBlob = new Blob([blob_A_fromDB, blob_B_fromDB])
saveAs(finalBlob, 'filename.zip')
But I can't confirm this since I haven't tested it; it would be good if someone else could.
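A rough, untested sketch of that idea (the 'chunk-db' and 'chunks' names are arbitrary, and the chunks are assumed to arrive as Blobs):
// Store each downloaded chunk Blob in IndexedDB, then rebuild the final Blob
// from the stored pieces and hand it to FileSaver.
function openChunkDb() {
    return new Promise((resolve, reject) => {
        const req = indexedDB.open('chunk-db', 1);
        req.onupgradeneeded = () => req.result.createObjectStore('chunks', { autoIncrement: true });
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
    });
}

function saveChunk(db, chunkBlob) {
    return new Promise((resolve, reject) => {
        const tx = db.transaction('chunks', 'readwrite');
        tx.objectStore('chunks').add(chunkBlob);
        tx.oncomplete = resolve;
        tx.onerror = () => reject(tx.error);
    });
}

function readAllChunks(db) {
    return new Promise((resolve, reject) => {
        const req = db.transaction('chunks').objectStore('chunks').getAll();
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
    });
}

// Usage: once all chunks have been stored, merge and save
// openChunkDb()
//     .then(db => readAllChunks(db))
//     .then(blobs => saveAs(new Blob(blobs), 'filename.zip'));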
Blobs are cool until you want to download a large file; there is a 600 MB limit (Chrome) for blobs, since Chrome stores everything in memory.
I'm trying to read and write raw data from files using Mozilla's add-on SDK. Currently I'm reading data with something like:
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var data = NetUtil.readInputStreamToString(inputStream, inputStream.available(), { charset: "UTF-8" });
        callback(data, status, nsiFile);
    });
}
This works for text files, but when I start messing with raw bytes outside of Unicode's normal range, it doesn't work. For example, if a file contains the byte 0xff, then that byte and anything past that byte isn't read at all. Is there any way to read (and write) raw data using the SDK?
You've specified an explicit charset in the options to NetUtil.readInputStreamToString.
When you omit the charset option, the data will be read as raw bytes. (Source)
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        // Do not specify a charset at all!
        var data = NetUtil.readInputStreamToString(inputStream, inputStream.available());
        callback(data, status, nsiFile);
    });
}
The suggestion to use io/byte-streams is OK as well, but keep in mind that that SDK module is still marked experimental, and that using ByteReader via io/file as the example suggests is not a good idea because this would be sync I/O on the main thread.
I don't really see the upside, as you'd use NetUtil anyway.
Anyway, this should work:
const {ByteReader} = require("sdk/io/byte-streams");
function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var reader = new ByteReader(inputStream);
        // read() with no byte count reads the rest of the stream
        var data = reader.read();
        reader.close();
        callback(data, status, nsiFile);
    });
}
Also, please keep in mind that reading large files like this is problematic. Not only will the whole file be buffered in memory, obviously, but:
The file is read as a char (byte) array first, so there will be a temporary buffer in the stream of at least file.size length (via asyncFetch).
Both NetUtil.readInputStreamToString and ByteReader use another char (byte) array to read the result into from the inputStream, but ByteReader does that in 32K chunks, while NetUtil.readInputStreamToString uses a single big buffer of file.length.
The data is then read into the resulting jschar/wchar_t (word) array, a.k.a. a JavaScript string, i.e. you need at least file.size * 2 bytes in memory.
E.g., reading a 1MB file would require more than fileSize * 4 = 4MB memory (NetUtil.readInputStreamToString) and/or more than fileSize * 3 = 3MB memory (ByteReader) during the read operation. After the operation, 2MB of that memory will be still alive to store the resulting data in a Javascript string.
Reading a 1MB file might be OK, but a 10MB file might be already problematic on mobile (Firefox for Android, Firefox OS) and a 100MB would be problematic even on desktop.
You can also read the data directly into an ArrayBuffer (or Uint8Array), which stores byte arrays more efficiently than a JavaScript string and avoids the temporary buffers of NetUtil.readInputStreamToString and/or ByteReader.
const { Cc, Ci } = require("chrome"); // Cc/Ci come from the chrome module in the Add-on SDK

function readnsIFile(fileName, callback) {
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var bs = Cc["@mozilla.org/binaryinputstream;1"].
            createInstance(Ci.nsIBinaryInputStream);
        bs.setInputStream(inputStream);
        var len = inputStream.available();
        var data = new Uint8Array(len);
        bs.readArrayBuffer(len, data.buffer);
        bs.close();
        callback(data, status, nsiFile);
    });
}
PS: The MDN documentation might state something about "iso-8859-1" being the default if the charset option is omitted in the NetUtil.readInputStreamToString call, but the documentation is wrong. I'll fix it.