I am trying to use FineUploader to upload a large number of files. I also need to manipulate the files before uploading them - namely, I need to anonymize some identifying information. In another answer, Ray Nicholus suggested rejecting the original file in the onSubmit handler and then re-adding the manipulated file. So my onSubmit handler looks like this:
onSubmit: function (id, name)
{
    var file = this.getFile(id)
    if (file.isAnonymized)
    {
        return;
    }
    var reader = new FileReader()
    reader.onload = function()
    {
        var arrayBuffer = this.result
        var byteArray = new Uint8Array(arrayBuffer)
        // Manipulate the byteArray in some way...
        var blob = new window.Blob([byteArray])
        blob.isAnonymized = true
        // add the anonymized file instead
        uploader.addFiles({blob: blob, name: name})
    }
    reader.readAsArrayBuffer(file)
    // cancel the original file
    return false
},
This works fine for a small number of files. In a concrete example, a customer tried to upload ~1,500 files of 3MB each in Firefox and saw Firefox's memory usage spike through the roof before the tab eventually crashed. Other browsers (Chrome, Edge) exhibit similar behavior. Using the browser's developer tools doesn't reveal any large memory allocations. There are no problems when simply uploading the files as-is, but that's not an option.
I cleaned up the example at https://github.com/sgarcialaguna-mms/file-upload-memory/ somewhat and am now confident that the error is due to the fineuploader library holding on to blobs longer than needed.
The example now loads one file into memory at a time, then passes the blob to the upload library. I also now use an actual server (the Django example from the fineuploader server-examples repository).
With Firefox, when I drag in ~1GB of files, Firefox's memory usage steadily rises during the upload and stays high even after the upload has completed. If I open about:memory, click "Minimize memory usage" to trigger a garbage collection, and press "Measure", the file data shows up under "memory-file-data/large". If I call uploader.reset() and trigger a garbage collection again, Firefox's memory usage drops sharply, and measuring again shows the "memory-file-data/large" objects are no longer present in memory. As per https://github.com/FineUploader/fine-uploader/issues/1540#issuecomment-194201646, calling this._handler.expunge(id) after every upload works as well.
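For illustration, a minimal sketch (untested) of that expunge workaround: free the library's internal reference to each file's blob as soon as its upload completes. onComplete is a documented Fine Uploader callback, but _handler.expunge is the internal API mentioned in the linked issue and may change between versions; the element ID and endpoint here are placeholders.

var uploader = new qq.FineUploader({
    element: document.getElementById('uploader'), // placeholder element
    request: { endpoint: '/upload' },             // placeholder endpoint
    callbacks: {
        onComplete: function (id, name, responseJSON) {
            // drop the library's reference to the (possibly large) blob
            this._handler.expunge(id);
        }
    }
});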
Chrome behaves a bit differently: due to a long-standing bug, it eventually starts throwing ERR_FILE_NOT_FOUND errors once more than 500 MB of blob data has accumulated. The chrome://blob-internals page shows which blobs are being held, as well as their refcount.
I don't know if there is an easy way to tell which variable / closure / whatever is holding on to these objects, but it would help immensely.
Related
My team and I are facing a memory issue in our application. Whenever we create a jsPDF instance it works, but afterwards it holds on to a lot of memory, and we face performance issues because that memory is never released after the task completes. For reference, I prepared an example you can check:
https://stackblitz.com/edit/web-platform-vkewvw?file=index.html
In this example you can see that the memory footprint starts at around 36 MB, as shown in the first screenshot, and after running the code it goes up to around 56 MB and the memory is never released, as you can see in the next screenshot.
Can anyone help us figure out how to overcome this problem? We tried using an iframe, but that did not work properly. Any help would be much appreciated.
As raised in a comment, altering the timing of revoking blob URLs may improve memory handling in an application using jsPDF, but it does not come with a guarantee to do so...
The Blob Store
User agents maintain a blob URL store that keeps a reference to Blob objects, keyed by the URLs returned for them from URL.createObjectURL(blob). Holding a reference in the store stops a blob object from being garbage collected from memory even if JavaScript code has not kept a reference to the blob object itself.
A blob object can be removed from the URL store by calling URL.revokeObjectURL(blobURL), after which the blob is eligible for garbage collection from memory, provided no reference to it is reachable in JS.
Now jsPDF sets its global object to window in browsers' main JavaScript thread in globalObject.js, and imports the global object as _global in FileSaver.js.
Lines 85 and 86 of FileSaver.js define the module's saveAs export as
var saveAs =
_global.saveAs ||
... code for saving file
which implies you should be able to shim the saveAs function for experimental purposes by declaring a shim (using function saveAs or window.saveAs =) in a file-level script before including jsPDF.js in the HTML document.
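As a purely experimental illustration of that idea (my own sketch, not jsPDF's or FileSaver.js's code), a shim declared before the jsPDF bundle is loaded might look like this; here the blob URL is revoked immediately after the click instead of after FileSaver's delay:

window.saveAs = function (blob, filename) {
    console.log('shimmed saveAs called for', filename);
    var url = URL.createObjectURL(blob);
    var link = document.createElement('a');
    link.download = filename || 'download.pdf';
    link.href = url;
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
    // revoke synchronously instead of waiting for FileSaver's timeout
    URL.revokeObjectURL(url);
};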
Initially you could use the original saveAs code with console logs to demonstrate the shimming process works and that jsPDF still works. Things I would want to look at include
Is jsPDF synchronous - meaning does it only return to the caller after clicking the link in saveAs to save the PDF file produced?
If it's potentially asynchronous, how to arrange a callback or promise resolution from the shim after clicking the link to prevent sequentially produced PDF files being processed in parallel.
Does reducing the time before revoking the blob's URL, currently set to 40 seconds for most browsers in line 188 of FileSaver.js (linked above), materially affect performance of the application?
How well does the application run in Safari and Chrome on iOS, which receive exceptional support in FileSaver.js?
Synchronous Revocation of Blob URLs
There is some possibility that Blob URLs can be revoked synchronously after use. The following code (which doesn't work as a code snippet) creates a blob and downloads it:
function blobURL(string) {
    const blob = new Blob(Array.from(string));
    return URL.createObjectURL(blob); // see NOTE 1
}
const url = blobURL("hello folks and world");
console.log("2 seconds...");
setTimeout(saveBlob, 2000, url); // see NOTE 2
function saveBlob(url) {
    const link = document.createElement('a');
    link.setAttribute("download", "hello.txt");
    link.href = url;
    document.body.appendChild(link);
    link.click();
    URL.revokeObjectURL(url); // see NOTE 3
    console.log("Call to link.click() has returned");
}
Notes
1) The script does not keep a reference to the blob created.
2) Memory garbage collection could (in theory) run during the timeout period.
3) The blob's URL is revoked synchronously, after link.click(), before returning to the event loop.
Calling URL.revokeObjectURL() immediately after programmatically clicking the link to download the blob did not affect the success of downloading in Firefox or Edge/webkit when tested. This implies that these browsers synchronously obtain a reference to the Blob instance (using a Blob Store lookup) before returning from link.click().
This is consistent with the behavior of events programmatically dispatched on an element being processed synchronously (which I looked at recently in answer to "Do events bubble in microtasks?"). How safe it is to make use of this in production, however, is not something I am personally in a position to guarantee across all browsers.
The Problem
When creating audio buffers using the Web Audio API, there are buffers created by the decodeAudioData method, which reside in memory and are apparently not accessible through JavaScript. They seem to hang around for the entire life of a browser tab, and never get garbage collected.
Possible Reason For the Problem
I know that these buffers are separated from the main thread and set on another thread for asynchronous decoding. I also know that the API spec says that decodeAudioData should not be allowed to decode the same input buffer twice, which I assume is why a copy of the decoded buffer and/or the encoded input buffer are kept around. However, on memory limited devices like Chromecast, this causes huge amounts of memory to accumulate and Chromecast crashes.
Reproducibility
In my example code, I fetch an mp3 using Ajax and then pass the arraybuffer into the decodeAudioData function. Normally that function takes an onsuccess callback which receives the decoded AudioBuffer as a parameter, but in my code I don't even accept that parameter. Therefore I also don't do anything with the decoded buffer after decoding it; it is not referenced anywhere within my code and is left entirely in native code. However, every call to this function increases the memory allocation, and it is never released. For example, in Firefox about:memory shows the audio buffers there for the life of the tab. The absence of any reference should be sufficient for the garbage collector to get rid of these buffers.
My main question then is, is there any reference to these decoded audio buffers, say within the audiocontext object, or somewhere else that I can try to remove them from memory? Or is there any other way that I can cause these stored and unreachable buffers to disappear?
My question differs from all the others currently on SO regarding decodeAudioData because I show that the memory leak happens even without the user storing any reference or even using the returned decoded audio buffer.
Code To Reproduce
function loadBuffer() {
    // create an audio context
    var context = new (window.AudioContext || window.webkitAudioContext)();
    // fetch mp3 as an arraybuffer async
    var url = "beep.mp3";
    var request = new XMLHttpRequest();
    request.open("GET", url, true);
    request.responseType = "arraybuffer";
    request.onload = function () {
        context.decodeAudioData(
            request.response,
            function () { // not even passing the decoded buffer into this callback as a parameter
                console.log("just got tiny beep file and did nothing with it, and yet there are audio buffers in memory that never seem to be released or gc'd");
            },
            function (error) {
                console.error('decodeAudioData error', error);
            }
        );
    };
    request.onerror = function () {
        console.log('error loading mp3');
    };
    request.send();
}
To anticipate some possible responses.
I must use Web Audio API because I am playing four part harmony from four audio files on Chromecast and the html audio element does not support multiple simultaneous playback on Chromecast.
Probably any JS library you may reference [e.g. Howler.js, Tone.js, Amplitude.js etc.] is built upon the Web Audio API, and so they will all share this memory leak problem.
I know that the WAA is implementation dependent on a per browser basis. My primary concern at the moment is Chromecast, but the problem exists for every browser I've tried.
Therefore, I think it is a spec-related issue: the spec requires the non-dupe decoding rule, so implementers keep copies of the buffer around on a browser-level thread so they can check them against new XHR inputs. If the spec writers happen to read my question: is there not a way for users to opt out of this behavior, in order to prevent the internal buffer storage on mobile and memory-thin platforms?
I have not been able to find any reference to these buffers in any JS object.
I know that I can audio_context.close() and then hope for garbage collection of all the resources held by the audio_context, and then hope that I can reinstantiate the audio_context with a new one, but that has not empirically been timely enough for my application. Chromecast crashes before GC takes out the trash.
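Roughly, what I mean is something like this (a simplified sketch, assuming audio_context holds the current AudioContext):

audio_context.close().then(function () {
    // create a fresh context and hope the old one's buffers get collected
    audio_context = new (window.AudioContext || window.webkitAudioContext)();
    // ...re-fetch and re-decode whatever still needs to play
});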
Pragmatic Workaround
I have found a method to solve the problem of the Web Audio API audio buffers hanging around indefinitely and crashing Chromecast and other mobile platforms. [[ I have not tested this on all browsers - your mileage may vary. ]]
LOADING STAGE
Load the document using Web Audio API inside an iFrame.
Load your audio buffers and do whatever you do to play them.
CLEARING STAGE
Call sourceNode.stop() on all of the playing nodes you have a reference to.
Call source.disconnect() on all source nodes.
Call gainNode.disconnect() on all gain nodes those source nodes are associated with (and on whatever other kind of WAA nodes you might be using that have a disconnect method).
Set all referenced gainNodes and sourceNodes to null.
Null out any buffers you have referenced, both the decoded buffers and your XHR-fetched encoded audio buffers.
KEY: Within the WAA page call audio_context.close(); then set audio_context=null; (this can be done from the parent of the iFrame using contentWindow).
Note: Some of these nulling steps may not be absolutely necessary, however this approach has worked for me.
RE-LOADING STAGE
Reload the iframe from the parent page. This will cause all of the audiobuffers to be garbage collected ON THE NEXT GC ROUND, including the ones in the hidden (non JS) areas of memory.
Your iframe will have to reinstantiate the web audio context and load its buffers and create nodes etc. just as you did when you first loaded it.
Notes: You must decide when you are going to use this clearing method (e.g. after so many buffers have been loaded and played). You can do it without an iframe, but you may have to reload the page once or twice to get garbage collection to fire. This is a pragmatic workaround for those who need to load lots of Web Audio API audio buffers on memory thin platforms like Chromecast or other mobile devices.
FROM PARENT
function hack_memory_management() {
    var frame_player = document.getElementById("castFrame");
    // sample is the object which holds an audio_context
    frame_player.contentWindow.sample.clearBuffers();
    setTimeout(function () {
        frame_player.contentWindow.location.reload();
    }, 1000);
}
INSIDE WAA IFRAME
CrossfadeSample.prototype.clearBuffers = function () {
    console.log("CLEARING ALL BUFFERS - IT'S UP TO GC NOW");
    // I have four of each thing because I am doing four part harmony
    // these are the decoded audiobuffers that used to be passed to the source nodes
    this.soprano = null;
    this.alto = null;
    this.tenor = null;
    this.bass = null;
    if (this.ctl1) {
        // these are the control handles which hold a source node and gain node
        var offName = 'stop';
        this.ctl1.source[offName](0);
        this.ctl2.source[offName](0);
        this.ctl3.source[offName](0);
        this.ctl4.source[offName](0);
        // MAX GARBAGE COLLECTION PARANOIA
        // disconnect all source nodes
        this.ctl1.source.disconnect();
        this.ctl2.source.disconnect();
        this.ctl3.source.disconnect();
        this.ctl4.source.disconnect();
        // disconnect all gain nodes
        this.ctl1.gainNode.disconnect();
        this.ctl2.gainNode.disconnect();
        this.ctl3.gainNode.disconnect();
        this.ctl4.gainNode.disconnect();
        // null out all source and gain nodes
        this.ctl1.source = null;
        this.ctl2.source = null;
        this.ctl3.source = null;
        this.ctl4.source = null;
        this.ctl1.gainNode = null;
        this.ctl2.gainNode = null;
        this.ctl3.gainNode = null;
        this.ctl4.gainNode = null;
    }
    // null out the controls
    this.ctl1 = null;
    this.ctl2 = null;
    this.ctl3 = null;
    this.ctl4 = null;
    // close the audio context
    if (this.audio_context) {
        this.audio_context.close();
    }
    // null the audio context
    this.audio_context = null;
};
Update:
Sadly, even this does not reliably work, and Chromecast can still crash after a few clear-and-reload cycles with new mp3s. See "My present solution" elsewhere on this page.
Can you maybe use multiple audio-tags on Chromecast when you route each of them into the Web Audio graph (by using a MediaElementAudioSourceNode)?
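A rough sketch of that suggestion (illustrative only; the file names and four-part setup are placeholders): let the audio elements do the decoding and only route them through the Web Audio graph, so decodeAudioData never has to hold decoded buffers.

var context = new (window.AudioContext || window.webkitAudioContext)();
var parts = ['soprano.mp3', 'alto.mp3', 'tenor.mp3', 'bass.mp3'].map(function (src) {
    var audio = new Audio(src);
    var source = context.createMediaElementSource(audio);
    var gain = context.createGain();
    source.connect(gain);
    gain.connect(context.destination);
    return { audio: audio, source: source, gain: gain };
});
// start all four parts together
parts.forEach(function (part) { part.audio.play(); });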
My present solution
I could not find a final satisfactory solution for Chromecast using the Web Audio API and simultaneous playback of four mp3s - used for four part harmony. The 2nd Gen seems to simply not have enough resources to hold the audiobuffers and simultaneously decode four mp3 files using decodeAudioData without leaving too much garbage around and eventually crashing. I decided to go with surikov's webaudiofont which is built on top of the Web Audio API, and to use midi files. I never had a problem on desktop browsers or other devices with more resources, but I have to have it work on Chromecast. I have no problems at all now using webaudiofont.
I was facing the same problem. What eventually worked for me was to disconnect and delete all connected resources:
if (this.source) {
    this.source.disconnect()
    delete this.source
}
if (this.gain) {
    this.gain.disconnect()
    delete this.gain
}
await this.audioContext.close()
delete this.audioContext
delete this.audioBuffer
Just closing the audioContext is not enough. It seems that references will continue to exist preventing garbage collection.
A lot of answers I have seen seem to overcomplicate this. I ran into this same issue while rebuilding an audio system for an application I'm building, but then I realised it previously was not an issue because, every time I played new audio, I closed the previous AudioContext and reused the variable it was referenced in for a new AudioContext.
This means that the only two things one has to do to clear this excessive memory usage are to call AudioContext.close() and remove references to it; disconnecting nodes and such is not required.
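For example, something along these lines (a minimal sketch of the pattern, not production code; playWithFreshContext is a hypothetical name and is assumed to be called with a newly fetched ArrayBuffer):

var audioContext = null;

function playWithFreshContext(arrayBuffer) {
    if (audioContext) {
        audioContext.close(); // release the previous context and its buffers
    }
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
    audioContext.decodeAudioData(arrayBuffer, function (decoded) {
        var source = audioContext.createBufferSource();
        source.buffer = decoded;
        source.connect(audioContext.destination);
        source.start(0);
    });
}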
I am building file storage for HTML5, using IndexedDB as the store. I request the files from the server via XMLHttpRequest with the response type set to arraybuffer (for Chrome) and blob (for other browsers).
Everything is fine even if the total size of the collection is 500MB or more (it can even reach gigabytes). But I noticed something strange when adding files to IndexedDB: an error is triggered when a single file exceeds ~120MB, so that file is not stored. When the file is less than 120MB, it is stored fine.
Note that this error only occurs when storing a single file larger than 120MB; for example, a 200MB .mp4 file triggers the error, but five videos of 100MB each (500MB total) are stored without problems.
I would like to know whether this is a deliberate limit or some glitch; I didn't find any documentation about it. I tested it in IE and Chrome, and both give the same error.
EDIT:
OK, apparently I get this error in the add or put function of IndexedDB when storing the file:
inside the e.target.error.message:
The serialized value is too large (size=140989466 bytes, max=133169152 bytes)
At the time this question was asked, Chrome still didn't support saving Blobs to IndexedDB, it only came the next month.
For anyone facing the same issue nowadays, store Blobs or Files directly, not ArrayBuffers.
Contrary to ArrayBuffers, saving a Blob to IDB doesn't require serializing its data; the serialization steps for a Blob just take a snapshot of the JS object and keep a link to the same underlying byte sequence, which itself is not cloned.
The only limit you should face would be the one of the db itself.
Code taken from Chrome's announcement:
var store = db.transaction(['entries'], 'readwrite').objectStore('entries');
// Store the object
var req = store.put(blob, 'blob');
req.onerror = function(e) {
    console.log(e);
};
req.onsuccess = function(event) {
    console.log('Successfully stored a blob as Blob.');
};
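And to read it back later (a sketch of my own, not part of the announcement code): the stored value comes back as a Blob, so it can be turned into an object URL without re-serializing the bytes.

var readReq = db.transaction(['entries']).objectStore('entries').get('blob');
readReq.onsuccess = function () {
    var url = URL.createObjectURL(readReq.result);
    // ...use the URL, then call URL.revokeObjectURL(url) when done
};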
I think this is an issue with your browser's implementation of IndexedDB. I ran into this same error myself in Firefox when I tried to store a 100 MB file into an IndexedDB record, but the identical code worked fine in Chrome. It seems different browsers have different implementation quirks and limits.
Personally, I suspect this is a bug in Firefox, since Firefox grants the requested size, but then prevents single-record usage of that entire size, whereas Chrome is more forgiving.
Well, I'm running into a strange error while programming a web application that receives images from a server via WebSockets. The server sends about 8 images per second (.bmp) to the browser, and each image is about 300KB, so that's around 2.4MB per second.
The browser receives the images as binary blob:
// WebSocket
var ws = new WebSocket("ws://192.168.0.10:1337");
// Image
var camImg = new Image();
ws.onmessage = function(msg)
{
    var data = msg.data;
    // handle binary messages from server
    if (data instanceof Blob) camImg.src = window.URL.createObjectURL(data);
};
camImg.onload = function()
{
    // draw image to canvas
    canvasCont2D.drawImage(this, 0, 0);
    // request next frame
    ws.send("give me the next image!");
    // delete ObjectURL
    window.URL.revokeObjectURL(this.src);
};
So up to this point everything runs fine. Now for the first problem:
While testing this in Chrome I watched the Task Manager to see how many resources this code needs. I saw one Chrome process that started at about 90MB of memory, and each second another 2.4MB was added. So it looks like every image I receive stays in memory. Is there any way to prevent this? The received blobs also stay listed under Resources in the Chrome developer tools, by the way.
Anyway, this problem leads me to the second one: the memory consumption of this process rises and rises, and after some time, at about 400-500MB, it is kind of flushed and starts again at 90MB, rising again. So far, it's just a memory problem. But sometimes it happens that the memory is not flushed and rises up to about 600MB. At that point I don't receive any new images, and the console shows an error that says:
Failed to load resource: the server responded with a status of 404 (Not Found)
This error occurs in this line:
camImg.src = window.URL.createObjectURL(data);
At the moment I work around this issue by catching the error event:
camImg.onerror = function()
{
    // request next frame anyway and wait for memory flush
    ws.send("give me the next image!");
};
So I'm just requesting new images, because after some time (a few seconds) the memory gets flushed again and I can receive new images.
The same problems occur when using Opera as well. I guess it's mainly a problem with memory consumption. Maybe a bug in the browsers? Or did I make a big programming error?
I would be very thankful for any help, as I have no idea left what could be causing this problem...
OS: Windows7 64bit
Chrome Version 35.0.1916.153 m
Chrome Version 38.0.2068.0 canary (64-bit) : (chrome://flags/#impl-side-painting setting makes no difference).
In a prototype I'm doing, I get exactly the same behaviour as this in Chrome 35 and a recent Canary build. It's OK in IE and Firefox. I'm running a localhost C++ WebSocket server at about 10fps with 0.5MB images.
Chrome's memory usage eventually goes up, and something eventually trashes Chrome too.
Moving forwards:
1) In image.onerror I call window.URL.revokeObjectURL(this.src); This seems to sort my memory leak out, but not the 404's.
2) When running under the F12 debugger things are so slow that I don't seem to get the problem. Thus on the page I have 3 counters: 1) blobs received count, 2) image.onload count and 3) image.onerror count.
After approximately 900 successful loads I start getting load failures; then, after maybe 50 failures, I start getting successful loads again. This pattern keeps repeating, but the numbers seem random. (This all seems to smack of some GC-related issue, but that's only a hunch based on experience.)
3) I can fix (AKA 'bodge') this by changing ws.binaryType='arraybuffer'. I need a blob so I construct a new one based on a new Uint8Array(msg.data). Everything works fine, no load failures at all.
I'm making an unnecessary binary copy here, but it doesn't seem to make any noticeable speed difference. I'm not 100% sure what's going on here or how stable the fix is; a sketch of what I mean is shown below.
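Something along these lines (a sketch of the idea, not the exact code; the image/bmp type matches the question's frames):

ws.binaryType = 'arraybuffer';
ws.onmessage = function (msg) {
    if (msg.data instanceof ArrayBuffer) {
        // wrap the bytes in a fresh Blob before creating the object URL
        var blob = new Blob([new Uint8Array(msg.data)], { type: 'image/bmp' });
        camImg.src = window.URL.createObjectURL(blob);
    }
};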
Most similar image-loading examples on the internet don't have an onerror handler. Running such examples on my machine would result in an unexplainable memory leak: you wouldn't see the 404s unless you were under the debugger and lucky. There are lots of people on the internet complaining about memory leaks when loading images; maybe it's related.
I'm going to raise this issue on the chromium forums.
hope this helps ... matt
I am trying to save a couple of images that are linked to by a webpage to offline storage. I'm using IndexedDB on Firefox and FileSystem API on Chrome. My code is actually an extension, so on Firefox I'm running on Greasemonkey, and on Chrome as a content script. I want this to be automated.
I am running into a problem when I retrieve the image files. I'm using example code from the article titled Storing images and files in IndexedDB, but I get an error: the images I'm trying to download are on a different subdomain and the XHR fails.
XMLHttpRequest cannot load http://...uxgk.JPG. Origin http://subdomain.domain.com is not allowed by Access-Control-Allow-Origin.
On Firefox I could probably use GM_xmlhttpRequest and it'd work (the code works on both browsers when I'm in same-origin URLs), but I still need to solve the problem for Chrome, in which other constraints (namely, needing to interact with frames on the host page) require me to incorporate my script in the page and forfeit my privileges.
So it comes back to the fact that I'm trying to figure out a way to save images that are linked to (and may appear in) the page to IndexedDB and/or the FileSystem API. I either need to work out how to solve the cross-origin problem in Chrome (and if it requires privileges, then I need to fix the way I'm interacting with jQuery), or I need some kind of reverse createObjectURL. At the end of the day I need a blob (a File object, as far as I understand) to put into IndexedDB (Firefox) or to write via the FileSystem API (Chrome).
Help, anyone?
Edit: my question may actually really come down to how I can use jQuery the way I want without losing my content script privileges on Chrome. If I do, I could use cross-origin XHRs on Chrome as well. Though I'd much rather get a solution that doesn't rely on that. Specifically since I'd like this solution if I get the script incorporated into the webpage, and not require it to be a content script/userscript.
Edit: I realized that the question is only about cross-site requests. Right now I have one of three ways to get the image blob, with the help of #chris-sobolewski, these questions and some other pages (like this), which can be seen in this fiddle. However, all of these require special privileges in order to run. Alas, since I'm running on a page with frames, because of a known defect in Chrome, I can't access the frames. So I can load a script into each frame by using all_frames: true, but I really want to avoid loading the script with every frame load. Otherwise, according to this article, I need to escape the sandbox, but then it comes back to privileges.
Since you are running on Chrome and Firefox, the answer is, fortunately, yes (kind of).
function base64img(i) {
    var canvas = document.createElement('canvas');
    canvas.width = i.width;
    canvas.height = i.height;
    var context = canvas.getContext("2d");
    context.drawImage(i, 0, 0);
    // note: toDataURL returns a base64 data URL string, not an actual Blob
    var blob = canvas.toDataURL("image/png");
    return blob.replace(/^data:image\/(png|jpg);base64,/, "");
}
This will return the base64-encoded image data.
From there you just call the function, something along these lines:
var image = document.getElementById('foo');
var imgBlob = base64img(image);
Then go ahead and store imgBlob.
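For example, putting the string into IndexedDB might look like this (a sketch only: it assumes db is an already-open database and 'images' is an existing object store, both hypothetical names here):

var tx = db.transaction(['images'], 'readwrite');
var req = tx.objectStore('images').put(imgBlob, 'foo.png');
req.onsuccess = function () { console.log('image data stored'); };
req.onerror = function (e) { console.error('store failed', e); };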
Edit: As file size is a concern, you can also store the data as a canvasPixelArray, which is width*height*4 bytes in size.
var imageArray = context.getImageData(0, 0, context.canvas.width, context.canvas.height);
Then JSONify the array and save that?
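A sketch of that idea, building on imageArray above (note that JSON-encoding raw pixels is very verbose, so it's mostly a last resort compared to storing a Blob or data URL):

var serialized = JSON.stringify({
    width: imageArray.width,
    height: imageArray.height,
    data: Array.from(imageArray.data) // Uint8ClampedArray -> plain array for JSON
});
// store "serialized" like any other string; later, rebuild the pixels with
// new ImageData(new Uint8ClampedArray(parsed.data), parsed.width, parsed.height)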