What is the performance cost of maintaining a live HTMLCollection?


A piece of my front-end JS code is depending on a live HTMLCollection of several thousand DOM nodes. Since it is live, it updates automatically as the DOM is updated.
Is this the same as re-running the document.getElementsByClassName call every single time I modify the DOM, or is there a performance optimization under the hood?

This blog post explains a little of the difference between static and live NodeLists, and mentions how live collections are implemented.
A live collection doesn't access the DOM until you access its elements. At that time it creates the actual collection and caches it. There's no overhead to future accesses if the DOM doesn't change.
Changing the DOM shouldn't cause an immediate update to any collections. Rather, it should just invalidate their caches. The next time you access one of the collections, it will be regenerated.
So if you create a live collection and access it infrequently compared to DOM modifications, there should be relatively little overhead. The worst case is if you loop over a live collection and modify the DOM during the loop -- each iteration will have to update the collection.
It's possible that there may be additional optimizations that could mitigate this. For some types of live collections, the browser may be able to tell whether a particular DOM modification could affect it; if not, it doesn't have to invalidate the collection. For instance, a collection created with document.getElementsByClassName() would not be affected by a modification that doesn't add or remove the specified class anywhere. However, if you do something like delete an element, it would have to check whether the class appeared anywhere in the subtree headed by that element, so it's not obvious that this would really be better than just invalidating the caches.
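The lazy caching and invalidation described above can be modelled in plain JavaScript. This is only a sketch under assumptions: FakeDocument, LiveCollectionModel and the rebuilds counter are hypothetical names, a flat array stands in for the DOM, and real browsers implement live collections natively and may differ in detail.

```javascript
// Toy model of a live collection: the result is computed lazily, cached,
// and discarded (not recomputed) whenever the "DOM" changes.
// All names are hypothetical; browsers implement this natively.
class FakeDocument {
  constructor() {
    this.nodes = [];        // flat list standing in for the DOM tree
    this.collections = [];  // live collections to invalidate on change
  }
  add(className) {
    this.nodes.push({ className });
    // A mutation only marks the caches stale; it does no matching work.
    for (const c of this.collections) c.cache = null;
  }
  getByClassName(name) {
    const collection = new LiveCollectionModel(this, name);
    this.collections.push(collection);
    return collection;
  }
}

class LiveCollectionModel {
  constructor(doc, name) {
    this.doc = doc;
    this.name = name;
    this.cache = null;
    this.rebuilds = 0; // how many times the full scan actually ran
  }
  items() {
    if (this.cache === null) {
      this.rebuilds += 1;
      this.cache = this.doc.nodes.filter(n => n.className === this.name);
    }
    return this.cache;
  }
  get length() {
    return this.items().length;
  }
}

const doc = new FakeDocument();
const rows = doc.getByClassName("row");

// 1000 mutations with no access in between: no scan has run yet.
for (let i = 0; i < 1000; i++) doc.add("row");
const scansBeforeAccess = rows.rebuilds;

// Repeated reads without a mutation reuse the cache: one scan in total.
const firstLength = rows.length;
const secondLength = rows.length;
const scansAfterReads = rows.rebuilds;

// Worst case: mutate and read in the same loop, one scan per iteration.
for (let i = 0; i < 10; i++) {
  doc.add("row");
  void rows.length;
}
const scansAfterInterleaving = rows.rebuilds;

console.log(scansBeforeAccess, scansAfterReads, scansAfterInterleaving); // 0 1 11
```

In a real page, a common way to avoid the worst case is to take a static snapshot with Array.from(collection) before a loop that modifies the DOM.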

Related

jQuery .remove() vs Node.removeChild() and external maps to DOM nodes

The jQuery API documentation for jQuery .remove() mentions:
In addition to the elements themselves, all bound events and jQuery
data associated with the elements are removed.
I assume "bound events" here means "event handlers"; documentation for the similar .empty() says:
To avoid memory leaks, jQuery removes other constructs such as data
and event handlers from the child elements before removing the
elements themselves.
It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?
If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?
Documentation for jQuery.data() (lower-level function) says:
The jQuery.data() method allows us to attach data of any type to DOM
elements in a way that is safe from circular references and therefore
free from memory leaks. jQuery ensures that the data is removed when
DOM elements are removed via jQuery methods, and when the user leaves
the page.
This sounds an awful lot like a solution to the old IE DOM/JS circular leak pattern which, AFAIK, is solved in all browsers today.
However, a comment in the jQuery src/data.js code (snapshot) says:
Provide a clear path for implementation upgrade to WeakMap in 2014
Which suggests that the idea of storing data strictly associated with a DOM node outside of the DOM, using a separate data store with a map, is still being considered for the future.
Is this just for backward-compatibility, or is there more to it?
Answers provided to other questions like this one also seem to imply that the sole reason for an external map is to avoid cyclic refs between DOM objects and JS objects, which I consider irrelevant in the context of this question (unless I'm mistaken).
Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice? It certainly looks that way, as it makes removing entire DOM trees very easy. No need to walk it down, no need to clean up any external data store, just detach it from the parent node, lose the reference, and let the garbage collector do its thing.
Further notes, context and rationale to the question:
This kind of capability is especially interesting for frameworks that manage views (e.g. Durandal), which often times have to replace entire trees that represent said views in their architecture. While most of them certainly support jQuery explicitly, this solution does not scale at all. Every component that uses a similar data store must also be cleaned up. In the case of Durandal, it seems they (at least in one occurrence, the dialog plugin - snapshot) rely on Knockout's .removeNode() (snapshot) utility function, which in turn uses jQuery's internal cleanData(). That's, IMHO, a prime example of horrible special-casing (I'm not sure it even works as it is now if jQuery is used in noConflict mode, which it is in most AMD setups).
This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity.

"It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?"
Absolutely. The data (including event handlers) associated with an element is held in a global object held at jQuery.cache, and is removed via a serial number jQuery puts on the element.
When it comes time for jQuery to remove an element, it grabs the serial number, looks up the entry in jQuery.cache, manually deletes the data, and then removes the element.
Destroy the element without jQuery, you destroy the serial number and the only association to the element's entry in the cache. The garbage collector has no knowledge of what the jQuery.cache object is for, and so it can't garbage collect entries for nodes that were removed. It just sees it as a strong reference to data that may be used in the future.
While this was a useful approach for old browsers like IE6 and IE7 that had serious problems with memory leaks, modern implementations have excellent garbage collectors that reliably find things like circular references between JavaScript and the DOM. You can have some pretty nasty circular references via object properties and closures, and the GC will find them, so it's really not such a worry with those browsers.
However, since jQuery holds element data in the manner it does, we now have to be very careful when using jQuery to avoid jQuery-based leaks. This means never use native methods to remove elements. Always use jQuery methods so that jQuery can perform its mandatory data cleanup.
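The mechanism can be illustrated with a toy model. This is a sketch of the idea only, not jQuery's actual code: cache, expando, setData, removeWithCleanup and removeNative are hypothetical stand-ins, and plain objects play the role of DOM elements.

```javascript
// Toy model of an external data store keyed by a serial number that is
// written onto the element. Not jQuery's real implementation.
const cache = {};              // stands in for jQuery.cache
const expando = "toy" + 12345; // property name that holds the serial number
let nextId = 1;

function setData(elem, key, value) {
  if (!elem[expando]) elem[expando] = nextId++;
  const id = elem[expando];
  (cache[id] = cache[id] || {})[key] = value;
}

// What a library-aware removal does: delete the cache entry, then detach.
function removeWithCleanup(parent, elem) {
  delete cache[elem[expando]];
  parent.children.splice(parent.children.indexOf(elem), 1);
}

// What a native-style removal does: detach only; the cache is untouched.
function removeNative(parent, elem) {
  parent.children.splice(parent.children.indexOf(elem), 1);
}

const parent = { children: [] };
const a = { children: [] };
const b = { children: [] };
parent.children.push(a, b);
setData(a, "greeting", "hello");
setData(b, "greeting", "hi");

removeWithCleanup(parent, a);
removeNative(parent, b);

// b's entry is still strongly reachable from the cache, so it is orphaned.
const leakedEntries = Object.keys(cache).length;
console.log(leakedEntries); // 1
```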
"Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice?"
I think it is for the most part. If the data is just primitive data types, then there's no opportunity for any sort of circular references that could happen with functions and objects. And again, even if there are circular references, modern browsers handle this nicely. Old browsers (especially IE), not so much.
"This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity."
We can't ignore the need to use jQuery specific methods when destroying nodes. Your point about external frameworks is a good one. If they're not built specifically with jQuery in mind, there can be problems.
You mention jQuery's $.noConflict, which is another good point. This easily allows other frameworks/libraries to "safely" be loaded, which may overwrite the global $. This opens the door to leaks IMO.
AFAIK, $.noConflict also enables one to load multiple versions of jQuery. I don't know if there are separate caches, but I would assume so. If that's the case, I would imagine we'd have the same issues.
If jQuery is indeed going to use WeakMaps in the future as the comment you quoted suggests, that will be a good thing and a sensible move. It'll only help in browsers that support WeakMaps, but it's better than nothing.
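For comparison, here is a minimal sketch of what a WeakMap-backed store could look like. It is an assumption about the general shape, not what jQuery ships: dataStore, setWeakData and getWeakData are hypothetical names, and a plain object stands in for a DOM node.

```javascript
// Sketch of a WeakMap-backed data store. The element object itself is the
// key, so no serial number and no manual cleanup step are needed.
// All helper names are hypothetical.
const dataStore = new WeakMap();

function setWeakData(elem, key, value) {
  let entry = dataStore.get(elem);
  if (!entry) {
    entry = {};
    dataStore.set(elem, entry);
  }
  entry[key] = value;
}

function getWeakData(elem, key) {
  const entry = dataStore.get(elem);
  return entry ? entry[key] : undefined;
}

let elem = { tagName: "DIV" }; // plain object standing in for a DOM node
setWeakData(elem, "greeting", "hello");
const stored = getWeakData(elem, "greeting");
const missing = getWeakData({ tagName: "P" }, "greeting");

// Dropping the last reference to the key makes the entry collectable;
// when that actually happens is up to the garbage collector.
elem = null;
```

The design point is that the map holds its keys weakly, so removing a node with native methods no longer strands its data.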
"If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?"
Just wanted to address the second question. Some people think .data() should always be used for HTML5 data- attributes. I don't because using .data() for that will import the data into jQuery.cache, so there's more memory to potentially leak.
I can see it perhaps in some narrow cases, but not for most data. Even with no leaks, there's no need to have most data- stored in two places. It increases memory usage with no benefit. Just use .attr() for most simple data stored as data- attributes.

In order to provide some of its features, jQuery has its own storage for some things. For example, if you do
$(elem).data("greeting", "hello");
Then jQuery will store the key "greeting" and the data "hello" on its own object (not on the DOM object). If you then use .removeChild(elem) to remove that element from the DOM and there are no other references to it, then that DOM element will be freed by the GC, but the data that you stored with .data() will not. This is a memory leak as the data is now orphaned forever (while you're on that web page).
If you use:
$(elem).remove();
or:
$(some parent selector).empty()
Then, jQuery will not only remove the DOM elements, but also clean up its extra shadow data that it keeps on items.
In addition to .data(), jQuery also keeps some info on event handlers that are installed which allows it to perform operations that the DOM by itself can't do such as $(elem).off(). That data also will leak if you don't dispose of an object using jQuery methods.
In a touch of irony, the reason jQuery doesn't store data as properties on the DOM elements themselves (and uses this parallel storage) is because there are circumstances where storing certain types of data on the DOM elements can itself lead to memory leaks.
As for the consequences of all this, most of the time it is a negligible issue because it's a few bytes of data that is recovered by the browser as soon as the user navigates to a new page.
The kinds of things that could make it become material are:
If you have a very dynamic web page that is constantly creating and removing DOM elements thousands of times and using jQuery features on those objects that store side data (jQuery event handlers, .data() on those elements), then any memory leak per operation could add up over time and become material.
If you have a very long running web page (e.g. a single page app) that stays on screen for very long periods of time and thus over time the memory leaks could accumulate.

Identify javascript closures with developer tools

I am currently developing a website that is pure javascript and relies heavily on the jQuery & jQuery UI libraries (this site is not intended for use by a general public, hence progressive enhancement is not a strict requirement for this project). I am encountering a significant memory leak on executing the following code:
oDialogBox = $("<div>...</div>");
/* Add useful things to the dialog box here */
oDialogBox.appendTo("body");
oDialogBox.dialog({
    /* Other dialog box settings here */
    close: function(event, ui) {
        oDialogBox.dialog("destroy");
        oDialogBox.remove();
        oDialogBox = null;
    }
});
At any given time in this dialog box, I am creating, removing and modifying a large number of instances of jQuery UI buttons, multiselects (per the Multiselect widget created by Eric Hynds) and on click event handlers. According to jQuery UI documentation, calling .remove() on oDialogBox should result in all child widgets being unbound and deleted. Yet my detached DOM tree shows a significant number of garbage elements that the GC isn't collecting.
It is highly likely I have missed a large set of closures that need to be finished off safely. How do I do the following:
1) How do I identify which closures are keeping a given detached DOM object alive (either in Firefox or Chrome)?
2) Assuming the complete set of closures is identified, does anything beyond nulling the variable need to be done to assure marking the DOM element for garbage collection?
3) I have also noticed my list of arrays stored by the page is giant and contains references to DOM elements not being gathered by the GC. Is there a documented best practice for cleaning arrays from javascript and allowing all elements to be marked for deletion? (Note: this is a current prime suspect for the source of the memory leak)

I'm afraid that I don't have a great answer for #1. I haven't found any really good tools for this myself, even given how good the development tools have become over the last few years. The best advice I can give is to always keep things in the smallest scope you possibly can. If things don't escape, it's generally easier to simply figure out where the references must be.
As to #2, there can be further concerns. If the object referenced by variable v1 closes over the free variables of some function, removing v1 will not be enough to make them eligible for garbage collection if another variable v2 closes over v1 in some other function. So I guess if you really mean the "complete set of closures", then you should be all set. But this might get hairy. Again, if most objects have references only in narrow scopes, these problems are much less severe.
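A small sketch of that second-reference problem follows. The names makeDialog, v1 and v2 are illustrative only. Garbage collection can't be observed directly from script, so this only shows that the object stays reachable after one variable is nulled.

```javascript
// Nulling one variable does not free an object that another closure
// still captures. Names are illustrative only.
function makeDialog() {
  return { title: "settings", buttons: new Array(1000).fill("button") };
}

let v1 = makeDialog();

// v2 closes over the object currently held in v1.
const v2 = (function (captured) {
  return function () {
    return captured.title;
  };
})(v1);

v1 = null; // drops one reference only

// The object is still reachable through v2's closure, so it cannot be
// collected until v2 itself becomes unreachable.
const stillReachable = v2();
console.log(stillReachable); // "settings"
```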
For #3, what sorts of arrays are you discussing? If they're jQuery collections, then perhaps you simply have too many of them around. The only reason I know for them to stay around for a long time is to bind event handlers to them, and that is almost always better handled by event delegation on parent elements. If they're your own custom arrays, do you really have a good reason to store references to elements in arrays that last for any substantial length of time? I've rarely found one.
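Event delegation can be sketched with standard DOM calls (addEventListener, Element.closest, Node.contains). The helper name delegate and the "dialog" id in the usage comment are hypothetical; this is one possible shape, not what jQuery does internally.

```javascript
// Minimal event delegation helper: one listener on a parent instead of
// one listener (and one stored reference) per child element.
// `delegate` is a hypothetical helper name, not a library function.
function delegate(root, type, selector, handler) {
  function listener(event) {
    // closest() walks up from the event target to the nearest match.
    const match = event.target.closest(selector);
    if (match && root.contains(match)) {
      handler.call(match, event);
    }
  }
  root.addEventListener(type, listener);
  // Return a function that removes the single listener again.
  return function undelegate() {
    root.removeEventListener(type, listener);
  };
}

// Browser usage (assumes an element with id "dialog" exists):
// const stop = delegate(document.getElementById("dialog"), "click", "button",
//   function () { console.log(this.textContent); });
// stop(); // detach when the dialog is destroyed
```

Because only the parent holds a listener, no long-lived array of child elements is needed just to keep handlers attached.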

Why is it faster to access the DOM through a cached variable?

I am trying to improve my knowledge of javascript and while searching for some "best practices", someone pointed out to me that it is faster to cache the DOM document and then access it through that var instead of accessing the document object directly.
You can see the results here, on an edit I made on jsperf: http://jsperf.com/jquery-document-cached-vs-uncached/3 (edit: the title holds "jsquery" because that was the original test, my edit contains vanilla javascript, the framework makes no difference)
This really makes me curious. Basically I am introducing a new variable into the equation, how can that make things faster instead of slower?
As far as I know, "print a" should be better than "b = a; print b" (figure of speech)
What's different in this case?

document is not like an ordinary Javascript variable. There's no telling what odd magic is happening under the covers when accessing its attributes, especially the DOM, which may be created on demand from internal browser structures.

I believe I found an explanation here (the emphasis on the last part is mine):
Store pointer references to in-browser objects. Use this technique to
reduce DOM traversal trips by storing references to browser objects
during instantiation for later usage. For example, if you are not
expecting your DOM to change you should store a reference to DOM or
jQuery objects you are going to use when your page is created; if you
are building a DOM structure such as a dialog window, make sure you
store a few handy references to DOM objects inside it during
instantiation, so you don't need to find the same DOM object over and
over again when a user clicks on something or drags the dialog
window. If you haven’t stored a reference to a DOM object, and you need
to iterate inside a function, you can create a local variable
containing a reference to that DOM object, this will considerably
speed up the iteration as the local variable is stored in the most
accessible part of the stack.
So, if I understand correctly, caching the DOM in a local variable makes it easier to access in the memory stack, therefore increasing the speed of execution.
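Whatever the exact reason inside a given browser, the effect of the pattern can be sketched by counting how often a lookup runs. Here host is a hypothetical stand-in for a browser object such as document, whose property access may do real work; the getter and the counter are assumptions made for illustration, not a benchmark.

```javascript
// Counts how often an "expensive" lookup runs with and without caching
// the result in a local variable. `host` is a hypothetical stand-in for
// a browser object whose property access may do real work.
let lookups = 0;
const host = {
  get items() {
    lookups += 1; // pretend this walks the DOM
    return ["a", "b", "c"];
  }
};

// Uncached: the getter runs on every loop test and every element read.
let uncachedText = "";
for (let i = 0; i < host.items.length; i++) {
  uncachedText += host.items[i];
}
const uncachedLookups = lookups;

// Cached: one lookup, then only local variable reads.
lookups = 0;
const items = host.items;
let cachedText = "";
for (let i = 0; i < items.length; i++) {
  cachedText += items[i];
}
const cachedLookups = lookups;

console.log(uncachedLookups, cachedLookups); // 7 1
```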

Can jQuery.data cause a memory leak?

Would the following piece of code create a memory leak.
According to the jQuery documentation use of the data function avoids memory leaks. It would be useful to confirm whether the following is safe.
var MyClass = function(el) {
// Store reference of element in object.
this.element = $(el);
};
// Store reference of object in element.
$('#something').data('obj', new MyClass('#something'));

Obviously the code as it stands would take up extra memory as long as the DOM element is still connected to the DOM. But I'm guessing you're asking whether it would continue using extra memory after the DOM element is no longer in use.
Update: Thanks to Joey's answer (which he has since deleted), I spent some time reading up on memory leaks in javascript, and it appears my assumptions in the paragraph below are incorrect. Because DOM elements don't use pure garbage collection, a circular reference like this would normally prevent both the DOM element and the javascript object from ever being released. However, I believe the remainder of this answer is still correct.
Without a deep knowledge of how javascript engines implement garbage collection, I can't speak authoritatively on the topic. However, my general understanding of garbage collection makes me think that your code would be "safe" in the sense that after the #something element is removed from the DOM, the resulting MyClass object would only have a reference to an object that has no other connections. The graph algorithms of the garbage collector should be able to identify that the DOM element and its MyClass object are "floating in space" and unconnected to everything else.
Furthermore, jQuery goes out of its way to strip data and events that are associated with a given DOM element once it is removed from the DOM. From the documentation:
jQuery ensures that the data is removed when DOM elements are removed via jQuery methods, and when the user leaves the page.
So assuming you use jQuery consistently, you would only have a one-way reference once the object is removed from the DOM anyway, which makes it that much easier for the garbage collector to know it can get rid of these objects.
So as long as you don't have something else referencing the MyClass object once the DOM element is removed, you shouldn't have a memory leak.

I suppose it depends on the JavaScript engine.
You have stated the question precisely enough to perform a test. I added a long string to the object and ran the potential leak in a large loop.
As a result, I don't think it leaks in IE8 or in Chrome.
But I could not reproduce the known leakage patterns either.

This can lead to a memory leak.
In theory, the jQuery.data method may use an inner Data class to cache data for the DOM element.
Of course, when you remove the cached data, jQuery will drop its reference to the data.
But the inner cache is an ever-growing array; the more you use it, the larger it gets.
So in the end there will be a very big cache array, which will lead to a memory leak.
In a long-running web app, this leak may cause a crash.

The data attribute only stores string values.

What sort of memory leaks should I watch for with jQuery's data()?

Should I pair every data() call with a later removeData() call?
My assumptions: jQuery's remove() will remove elements from the DOM, and if I don't have any other references to remove, I don't have to do any more clean up.
However, if I have some javascript var or object referring to one of the elements being removed, I'll need to clean that up, and I'm assuming that applies to jQuery's data function, too, because it's referencing the elements somehow.
So if I do need to call removeData before remove, is there a shortcut to remove all data associated with an element or do I have to call each explicitly by name?
Edit: I looked through the source code and confirmed what Borgar and roosteronacid said. Remove takes the elements out of the dom and deletes any events and data stored with them - which is convenient, but makes me wonder when you would use removeData(). Probably not often.

jQuery's data does not keep a reference to the element so that you don't need to worry about memory leaks. Its intended purpose is to solve this exact problem.
A slight simplification of how it works:
An id member is added to each "touched" DOM node. All subsequent actions involving that DOM element use that id.
var theNode = document.getElementById('examplenode');
theNode[ 'jQuery' + timestamp ] = someInternalNodeID;
You can access the id using the same function jQuery uses:
someInternalID = jQuery.data( document.body );
When you append data to the node it stores that on the jQuery object, filed under the node's internal id. Your $(element).data(key,value) translates internally to something like:
jQuery.cache[ someInternalNodeID ][ theKey ] = theValue;
Everything goes into the same structure, including event handlers:
jQuery.cache[ someInternalNodeID ][ 'events' ][ 'click' ] = theHandler;
When an element is removed, jQuery can therefore throw away all the data (and the event handlers) with one simple operation:
delete jQuery.cache[ someInternalNodeID ];
Theoretically, you may therefore also remove jQuery without leaks occurring from any references. jQuery even supports multiple separate instances of the library, each holding its own set of data or events.
You can see John Resig explaining this stuff in the "The DOM Is a Mess" presentation.

The whole point of jQuery is to abstract away from crappy JavaScript implementations and bugs in browsers.. such as memory leaks :)
.. Yup; all associated data to an element will be removed when that element is removed from the DOM.

By and large, javascript is fairly good about knowing when it's appropriate to collect the garbage, and unless you're writing very large-scale or long-running client-side apps, I'd say the memory involved is mostly inconsequential and trying to second-guess it isn't gonna gain you a lot.
Determining what unfinished lexical closures or other tricky javascript in jQuery might still be accessing your given data could be pretty complicated in some cases.
As far as I'm aware, though, if you store a reference to whatever you got with jQuery's data function then it would continue to exist after the element is removed, so removing that reference would be necessary as well. Some simple test cases would give you a more definite answer.
