Storing data to DOM - Element value vs data attribute

To store values on a DOM element, we can do it through the data attribute:
$("#abc").data("item", 1), and to retrieve it, $("#abc").data("item")
But today I learned that we can also do it this way:
$("#abc")[0].item = 1, and to retrieve it, $("#abc")[0].item
What are the differences between them?
Which one is better? Which one has wider compatibility?

.data() exists for a couple of reasons:
1. Some (mostly older) browsers had memory leak issues if you put a JS object into a property on a DOM object. This created a reference between the DOM and JS worlds (which have separate garbage collectors), which caused problems and could result in memory leaks. Keeping references entirely in the JS world by using .data() instead of a DOM property solved that issue. I don't honestly know how much of an issue this still is in modern browsers. It is hard to test, so it is easier to just use the known-safe approach.
2. Historically, some host objects did not support arbitrary property addition with direct property syntax such as obj.prop = 1;. .data() made it possible to associate data with any object, regardless of whether it could handle arbitrary properties.
3. Name collisions. .data() creates one and only one custom property on a DOM object, which is just an id value (a string). You are then free to use any keys you want with .data() with zero worry about conflicting with a pre-existing property name on a DOM object. .data() is essentially its own namespace for custom properties.
4. Reading HTML5 "data-xxx" attributes. When you read a .data("xxx") property that has not yet been written to the jQuery data store, jQuery reads the "data-xxx" attribute on the DOM object. If it finds that attribute, it returns the value and also coerces its type, so that "false" becomes the JavaScript value false. If you then write .data("xxx", "foo"), the value is not written back to the attribute on the DOM object; it goes into the jQuery store, and from then on all reads and writes for that key use the jQuery .data() store. One reason this is useful is that custom attributes (which are different from custom properties) can only be strings, but .data("xxx", yyy) can store any JS data type.
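The read-through behaviour in the last point can be modelled roughly like this. It is only a sketch under assumptions, not jQuery's code: makeFakeElement, privateStore, readData and writeData are made-up names, a plain object stands in for a DOM element so the snippet runs without a browser, and type coercion is left out.

```javascript
// Hypothetical stand-in for a DOM element: it only knows about attributes.
function makeFakeElement(attributes) {
  const attrs = new Map(Object.entries(attributes));
  return {
    getAttribute: (name) => (attrs.has(name) ? attrs.get(name) : null),
  };
}

// Stand-in for the library's private store (an assumption, not jQuery internals).
const privateStore = new WeakMap();

function writeData(element, key, value) {
  if (!privateStore.has(element)) privateStore.set(element, new Map());
  privateStore.get(element).set(key, value); // the attribute is never touched
}

function readData(element, key) {
  const entry = privateStore.get(element);
  if (entry && entry.has(key)) {
    return entry.get(key); // once written, the store wins over the attribute
  }
  const fromAttribute = element.getAttribute("data-" + key);
  return fromAttribute === null ? undefined : fromAttribute;
}

const box = makeFakeElement({ "data-xxx": "from-markup" });
console.log(readData(box, "xxx"));         // "from-markup" (read from the attribute)
writeData(box, "xxx", { any: "JS value" });
console.log(readData(box, "xxx").any);     // "JS value" (read from the store)
console.log(box.getAttribute("data-xxx")); // "from-markup" (attribute unchanged)
```

The point of the model is only that the attribute and the store are two separate places, and that a write goes to the store alone.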
So, if you want to use a known-safe method that is not prone to memory leaks, even in older browsers, use .data() rather than making your own custom property on a DOM object.
I suspect that at some future time browsers will be considered safe enough that you can store JS object references in custom DOM properties without worrying about memory leaks, at which point there may be fewer reasons to use something like .data(), though issue #3 above will still exist.
There are some disadvantages to using .data().
If you store meaningful amounts of data in .data() and then remove the corresponding DOM object without using jQuery's methods (for example, by calling .removeChild() directly or by setting .innerHTML on a parent), the data stored in the .data() store will be orphaned and never cleaned up, because jQuery will not know the corresponding DOM object has been removed. Your JavaScript then keeps data in that structure that you will never use again. While this isn't technically a leak (the data is still reachable), it has much the same effect of wasting memory. If you use .data(), you should only use jQuery methods for removing or replacing DOM objects, because they prevent the wasted memory.
Because of the above issue, when you are using jQuery's methods that can result in the removal of DOM objects, jQuery has to do extra work to make sure .data() is cleaned up when using its own methods. This can slow down the performance of .html("xxx"), .remove(), etc...

Related

jQuery .remove() vs Node.removeChild() and external maps to DOM nodes

The jQuery API documentation for jQuery .remove() mentions:
In addition to the elements themselves, all bound events and jQuery
data associated with the elements are removed.
I assume "bound events" here means "event handlers"; documentation for the similar .empty() says:
To avoid memory leaks, jQuery removes other constructs such as data
and event handlers from the child elements before removing the
elements themselves.
It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?
If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?
Documentation for jQuery.data() (lower-level function) says:
The jQuery.data() method allows us to attach data of any type to DOM
elements in a way that is safe from circular references and therefore
free from memory leaks. jQuery ensures that the data is removed when
DOM elements are removed via jQuery methods, and when the user leaves
the page.
This sounds an awful lot like a solution to the old IE DOM/JS circular leak pattern which, AFAIK, is solved in all browsers today.
However, a comment in the jQuery src/data.js code (snapshot) says:
Provide a clear path for implementation upgrade to WeakMap in 2014
Which suggests that the idea of storing data strictly associated to a DOM node outside of the DOM using a separate data store with a map is still considered in the future.
Is this just for backward-compatibility, or is there more to it?
Answers provided to other questions like this one also seem to imply that the sole reason for an external map is to avoid cyclic refs between DOM objects and JS objects, which I consider irrelevant in the context of this question (unless I'm mistaken).
Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice? It certainly looks that way, as it makes removing entire DOM trees very easy. No need to walk it down, no need to clean up any external data store, just detach it from the parent node, lose the reference, and let the garbage collector do its thing.
Further notes, context and rationale to the question:
This kind of capability is especially interesting for frameworks that manage views (e.g. Durandal), which often have to replace entire trees that represent said views in their architecture. While most of them certainly support jQuery explicitly, this solution does not scale at all. Every component that uses a similar data store must also be cleaned up. In the case of Durandal, it seems they (at least in one occurrence, the dialog plugin - snapshot) rely on Knockout's .removeNode() (snapshot) utility function, which in turn uses jQuery's internal cleanData(). That's, IMHO, a prime example of horrible special-casing (I'm not sure it even works as it is now if jQuery is used in noConflict mode, which it is in most AMD setups).
This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity.

"It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?"
Absolutely. The data (including event handlers) associated with an element is stored in a global object at jQuery.cache, and is looked up and removed via a serial number jQuery puts on the element.
When it comes time for jQuery to remove an element, it grabs the serial number, looks up the entry in jQuery.cache, manually deletes the data, and then removes the element.
Destroy the element without jQuery, and you destroy the serial number and with it the only association to the element's entry in the cache. The garbage collector has no knowledge of what the jQuery.cache object is for, and so it can't garbage collect entries for nodes that were removed. It just sees it as a strong reference to data that may be used in the future.
While this was a useful approach for old browsers like IE6 and IE7 that had serious problems with memory leaks, modern implementations have excellent garbage collectors that reliably find things like circular references between JavaScript and the DOM. You can have some pretty nasty circular references via object properties and closures, and the GC will find them, so it's really not such a worry with those browsers.
However, since jQuery holds element data in the manner it does, we now have to be very careful when using jQuery to avoid jQuery-based leaks. This means never use native methods to remove elements. Always use jQuery methods so that jQuery can perform its mandatory data cleanup.
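To make the orphaned-entry problem concrete, here is a small model of an id-keyed cache. Everything in it is an assumption for illustration: cache, EXPANDO, setData, removeWithCleanup and removeNatively are made-up names, plain objects stand in for DOM nodes, and the real jQuery internals differ between versions.

```javascript
// Simplified model of an id-keyed cache; names are illustrative, not jQuery's.
const cache = {};          // plays the role of the global cache object
const EXPANDO = "modelId"; // property that holds the serial number on the element
let nextId = 1;

function setData(element, key, value) {
  const id = element[EXPANDO] || (element[EXPANDO] = nextId++);
  (cache[id] || (cache[id] = {}))[key] = value;
}

// What a library-aware removal does: delete the cache entry, then detach.
function removeWithCleanup(parent, element) {
  delete cache[element[EXPANDO]];
  parent.children.splice(parent.children.indexOf(element), 1);
}

// What a native-style removal does: detach only, the cache is not told.
function removeNatively(parent, element) {
  parent.children.splice(parent.children.indexOf(element), 1);
}

const parent = { children: [] };
const first = {};
const second = {};
parent.children.push(first, second);
setData(first, "payload", new Array(1000).fill("x"));
setData(second, "payload", new Array(1000).fill("x"));

removeWithCleanup(parent, first);
removeNatively(parent, second);

console.log(parent.children.length);    // 0: both elements are detached
console.log(Object.keys(cache).length); // 1: the entry for `second` is still held
```

Because the cache is an ordinary object that is still reachable, the collector has to treat the leftover entry as live data, which is exactly the situation described above.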
"Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice?"
I think it is for the most part. If the data is just primitive data types, then there's no opportunity for any sort of circular references that could happen with functions and objects. And again, even if there are circular references, modern browsers handle this nicely. Old browsers (especially IE), not so much.
"This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity."
We can't ignore the need to use jQuery specific methods when destroying nodes. Your point about external frameworks is a good one. If they're not built specifically with jQuery in mind, there can be problems.
You mention jQuery's $.noConflict, which is another good point. This easily allows other frameworks/libraries to "safely" be loaded, which may overwrite the global $. This opens the door to leaks IMO.
AFAIK, $.noConflict also enables one to load multiple versions of jQuery. I don't know if there are separate caches, but I would assume so. If that's the case, I would imagine we'd have the same issues.
If jQuery is indeed going to use WeakMaps in the future as the comment you quoted suggests, that will be a good thing and a sensible move. It'll only help in browsers that support WeakMaps, but it's better than nothing.
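For comparison, this is roughly what an external store built on a WeakMap could look like. It is a sketch, not what jQuery ships: weakStore, attach and lookup are hypothetical names, and a plain object stands in for a DOM node so it runs outside a browser.

```javascript
// The node itself is the key, so no serial number has to be written onto it.
const weakStore = new WeakMap();

function attach(node, key, value) {
  let entry = weakStore.get(node);
  if (!entry) {
    entry = new Map();
    weakStore.set(node, entry);
  }
  entry.set(key, value);
}

function lookup(node, key) {
  const entry = weakStore.get(node);
  return entry ? entry.get(key) : undefined;
}

let node = { tagName: "DIV" }; // stand-in for a DOM element
attach(node, "handlers", [function onClick() {}]);

console.log(lookup(node, "handlers").length); // 1
console.log("handlers" in node);              // false: nothing was added to the node

// Dropping the last reference to the node is all the cleanup that is needed:
// a WeakMap does not keep its keys alive, so the entry becomes collectable too.
node = null;
```

With a store like this it would no longer matter which library, or which native call, removed the node from the document.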
"If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?"
Just wanted to address the second question. Some people think .data() should always be used for HTML5 data- attributes. I don't because using .data() for that will import the data into jQuery.cache, so there's more memory to potentially leak.
I can see it perhaps in some narrow cases, but not for most data. Even with no leaks, there's no need to have most data- stored in two places. It increases memory usage with no benefit. Just use .attr() for most simple data stored as data- attributes.

In order to provide some of its features, jQuery has its own storage for some things. For example, if you do
$(elem).data("greeting", "hello");
Then jQuery will store the key "greeting" and the data "hello" on its own object (not on the DOM object). If you then use .removeChild(elem) to remove that element from the DOM and there are no other references to it, that DOM element will be freed by the GC, but the data that you stored with .data() will not. This is a memory leak, as the data is now orphaned for as long as you stay on that web page.
If you use:
$(elem).remove();
or:
$(someParentSelector).empty();
Then, jQuery will not only remove the DOM elements, but also clean up its extra shadow data that it keeps on items.
In addition to .data(), jQuery also keeps some info on event handlers that are installed which allows it to perform operations that the DOM by itself can't do such as $(elem).off(). That data also will leak if you don't dispose of an object using jQuery methods.
In a touch of irony, the reason jQuery doesn't store data as properties on the DOM elements themselves (and uses this parallel storage) is because there are circumstances where storing certain types of data on the DOM elements can itself lead to memory leaks.
As for the consequences of all this, most of the time it is a negligible issue because it's a few bytes of data that is recovered by the browser as soon as the user navigates to a new page.
The kinds of things that could make it become material are:
If you have a very dynamic web page that is constantly creating and removing DOM elements thousands of times and using jQuery features on those objects that store side data (jQuery event handlers, .data() on those elements), then any memory leak per operation could add up over time and become material.
If you have a very long running web page (e.g. a single page app) that stays on screen for very long periods of time, the memory leaks could accumulate over time.

Why is it risky to store data as an attribute of an element?

I keep reading the same thing:
"Storing property values directly on DOM elements is risky because of possible memory leaks."
But can someone explain these risks in more detail?

(By attribute, I assume you are referring to properties on DOM elements.)
Are custom properties on DOM elements safe?
Some browsers have not cleaned up DOM elements very well when destroyed. References to other elements, the same element, or large sets of data were therefore retained, causing leaks. I believe this is largely resolved in newer browsers.
In any case, storing small amounts of data on an element is innocuous, and can be very convenient, so take that warning with a grain of salt.
Is using jQuery's .data() a safe alternative?
Not especially. Storing data using jQuery's custom data store has its own potential for memory leaks, and unfortunately they don't merely affect old browsers.
In order to avoid leaks, you'd need to be absolutely certain you clean an element's .data() when destroying an element. This is automatic when you use jQuery to destroy the element, but if you don't, you'll have memory leaks that affect every browser.
What are some examples that can cause leaks?
Let's say that there's a bunch of .data() linked to the #foo element. If we use jQuery methods to remove the element, we're safe:
$("#foo").remove(); // associated .data() will be cleaned automatically
But if we do this, we have a cross-browser compatible leak:
var foo = document.getElementById("foo");
foo.parentNode.removeChild(foo);
Or if #foo is a descendant of some other element whose content is being cleared without jQuery, it would be the same issue.
otherElement.innerHTML = "";
In both cases, jQuery was not used to remove #foo, so its .data() is permanently disassociated from the element, and our application has a leak.
So if I never use the DOM API directly, I'm safe?
You're safer, but another way this can happen is if we load more than one DOM manipulation library. Consider that jQuery helps us do this with the following code:
var $jq = jQuery.noConflict();
Now we can allow $ to refer to prototypejs or mootools, and jQuery is referenced by $jq.
The trouble is that those other libraries will not clean up data that was set by jQuery, because they don't know about it.
So if jQuery has some data on #foo, and mootools is used to destroy that element, we have our memory leak.
What if I never use .data() in jQuery? Does that make me safe?
Sadly, no. jQuery uses the same .data() mechanism to store other data, like event handlers. Therefore even if you never make a call to .data() to associate some custom data with an element, you can still have memory leaks caused by the examples above.
Most of the time you may not notice the leaks, but depending on the nature of the code, they can eventually grow large enough to be a problem.

According to the jQuery documentation:
In Internet Explorer prior to version 9, using .prop() to set a DOM
element property to anything other than a simple primitive value
(number, string, or boolean) can cause memory leaks if the property is
not removed (using .removeProp()) before the DOM element is removed
from the document. To safely set values on DOM objects without memory
leaks, use .data().

How jQuery $.data() method differs from directly attaching variables to DOM elements?

I can do this:
$('#someid').data('dataIdentifier', 'someVariable');
And in my understanding I can do this:
document.getElementById('someid').dataIdentifier = someVariable;
What are the pros of using jQuery for this versus raw JavaScript?

From the documentation for jquery.data:
The jQuery.data() method allows us to
attach data of any type to DOM
elements in a way that is safe from
circular references and therefore free
from memory leaks. jQuery ensures that
the data is removed when DOM elements
are removed via jQuery methods, and
when the user leaves the page.

I don't know about the jQuery method, but a "pure javascript" approach is to use setAttribute(). setAttribute is the same as what happens when you attach arbitrary data attributes in the html. You can use getAttribute to read it.
document.getElementById('someid').setAttribute("dataIdentifier", "someVariable");
One advantage is that it will show up in the innerHTML property, which plain old properties will not. The disadvantage is you are limited to strings.
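A quick illustration of the strings-only limit, and of the usual workaround of serializing to JSON. The makeElement helper is a made-up stand-in so the snippet runs without a browser; it only mimics the fact that setAttribute converts its value to a string.

```javascript
// Stand-in element: like the DOM, setAttribute stores the value as a string.
function makeElement() {
  const attrs = new Map();
  return {
    setAttribute(name, value) { attrs.set(name, String(value)); },
    getAttribute(name) { return attrs.has(name) ? attrs.get(name) : null; },
  };
}

const cell = makeElement();

cell.setAttribute("data-count", 3);
console.log(typeof cell.getAttribute("data-count")); // "string"

cell.setAttribute("data-point", { x: 1, y: 2 });
console.log(cell.getAttribute("data-point"));        // "[object Object]"

// Serializing explicitly keeps the structure, at the cost of a parse on every read.
cell.setAttribute("data-point", JSON.stringify({ x: 1, y: 2 }));
console.log(JSON.parse(cell.getAttribute("data-point")).y); // 2
```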

Using jQuery's datastore vs. expando properties

I'm developing code using jQuery and need to store data associated with certain DOM elements. There are a bunch of other questions about how to store arbitrary data with an html element, but I'm more interested in why I would pick one option over the other.
Say, for the sake of extremely simplified argument, that I want to store a "lineNumber" property with each row in a table that is "interesting".
Option 1 would be to just set an expando property on each DOM element (I hope I'm using the term 'expando' correctly):
$('.interesting-line').each(function(i) { this.lineNumber = i; });
Option 2 would be to use jQuery's data() function to associate a property with the element:
$('.interesting-line').each(function(i) { $(this).data('lineNumber', i); });
Ignoring any other shortcomings of my sample code, are there strong reasons why you would choose one means of storing properties over the other?

Using $.data will protect you from memory leaks.
In IE, when you assign a javascript object to an expando property on a DOM element, cycles that cross that link are not garbage collected. If your javascript object holds a reference to the dom object, the whole cycle will leak. It's entirely possible to end up with hidden references to DOM objects, due to closures, so you may leak without realizing it.
The jQuery datastore is set up to prevent these cycles from forming. If you use it, you will not leak memory in this way. Your example will not leak because you are putting primitives (numbers) on the DOM element. But if you put a more complex object there, you risk leaking.
Use $.data so you won't have to worry.

If you are authoring a plugin you should use $.data. If you need to store the attribute often and rarely need to query the DOM for it then use $.data.
Update 5 years later: jQuery does not query the DOM based on expando properties set, and hasn't done so for a while. So use $.data. There's no reason to pollute the DOM when there is no pragmatic use to do so.

Using $.data doesn't modify the DOM. You should use $.data. If you're creating a plugin then you should store one object in $.data with properties on that object as opposed to storing each of those properties as different key/value pairs in the $.data structure.
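The "one object per plugin" advice might look like the sketch below. To keep it runnable without jQuery, data() here is a minimal stand-in for $(el).data(key[, value]); the tooltip plugin, its field names and the helper functions are all hypothetical.

```javascript
// Minimal stand-in for $(el).data(key[, value]) so the sketch runs without jQuery.
const backing = new WeakMap();
function data(element, key, value) {
  if (!backing.has(element)) backing.set(element, new Map());
  const entry = backing.get(element);
  if (arguments.length === 3) {
    entry.set(key, value);
    return value;
  }
  return entry.get(key);
}

// Hypothetical "tooltip" plugin: all of its state lives under a single key.
function initTooltip(element, options) {
  data(element, "tooltip", { text: options.text, visible: false, timerId: null });
}

function showTooltip(element) {
  const state = data(element, "tooltip"); // one lookup, then plain property access
  state.visible = true;
  return state;
}

const row = {}; // stand-in for a DOM element
initTooltip(row, { text: "Line 12" });
showTooltip(row);

console.log(data(row, "tooltip").visible); // true
console.log(data(row, "text"));            // undefined: no stray top-level keys
```

Keeping everything under one key means the plugin cannot collide with keys used by other code on the same element, and tearing it down only requires removing that one key.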

Let me rephrase the question: What are the practical differences between the two data binding options available?
Actually there are three options:
$(e).prop('myKey', myValue);
$(e).data('myKey', myValue);
$(e).attr('data-myKey', myValue);
Note: OP’s e.myKey = myValue is practically the same as the .prop() line.
if you need more than strings, use .prop(), i.e. expando properties
if you need DOM/CSS transparency and/or HTML serialization use .attr('data-*')
if you need both you are out of luck
if you only use strings, but need no DOM, read on to weigh pros and cons yourself
what is with .data() → read the last two paragraphs
If you ever want to pass the data around with serialized HTML you need the .attr() solution. I.e. whenever you use things like .innerHTML or .html() or want to construct snippets from strings with data included. The same applies if you want to use CSS selectors like elem[data-myKey]. Disadvantage: you can only store strings.
If you don’t need your data to be visible in the DOM or available to CSS interaction .data() and .prop() might work. Their biggest advantage is: they can hold any Javascript value.
The biggest disadvantage of .prop() is the possibility of name collisions. Only pick names you can be sure will never be used as a native property. E.g. scope as a key is a bad idea, because it exists on some HTML elements...
Now comes .data(). The other answers seem to swear by it; I avoid it. The memory leaks related to .prop() and expando properties in general belong to the past, so that is no advantage any more. But you will be protected against name collisions with HTML properties. That is an advantage. But you get a bunch of disadvantages:
$(e).data('myKey') draws its uninitialized value from the data-myKey attribute if available, runs JSON.parse() on it and returns either the parsed result or, if that fails, the string value of the attribute. Once you set $(e).data('myKey', myValue) you lose the relationship with the data-myKey attribute, which nevertheless lives on with its "old" value, shown in the DOM and in CSS interactions. On top of that, the key name you use is subject to possible name mangling, i.e. if you ever decide to read all key-value pairs via $(e).data(), the keys in that object might be different.
Because of this erratic behavior (mixing expando property technology with data-* attributes) and the inconsistent get/set design, I always avoid .data(). Fortunately, that is easy to do with .prop() and .attr() (with data-* keys for compliance).
If you really want to use .data() to avoid name clashes with native properties, my advice is: do not mix it with data-* attributes, consider them a different thing, and avoid name clashes with those. Does that make sense? For automatic clash avoidance you have to avoid clashes elsewhere manually. Great design.
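The conversion and key mangling complained about above can be approximated like this. It is a sketch modelled on how recent jQuery versions appear to treat data-* values, not a copy of the library; coerce, camelCase and looksLikeJson are names chosen here, and older jQuery versions may behave differently.

```javascript
// Matches strings that start and end like an object or array literal.
const looksLikeJson = /^(?:\{[\w\W]*\}|\[[\w\W]*\])$/;

// Turn an attribute string into a "nicer" JS value where possible.
function coerce(raw) {
  if (raw === "true") return true;
  if (raw === "false") return false;
  if (raw === "null") return null;
  // Convert to a number only if the round trip leaves the string unchanged.
  if (raw === String(Number(raw))) return Number(raw);
  if (looksLikeJson.test(raw)) {
    try {
      return JSON.parse(raw);
    } catch (error) {
      return raw; // not valid JSON after all: fall back to the string
    }
  }
  return raw;
}

// Key mangling: "my-key" becomes "myKey".
function camelCase(key) {
  return key.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(coerce("false"));     // false
console.log(coerce("42"));        // 42
console.log(coerce("007"));       // "007" (a number would lose the leading zeros)
console.log(coerce('{"a":1}').a); // 1
console.log(coerce("{oops}"));    // "{oops}"
console.log(camelCase("my-key")); // "myKey"
```

Separately from jQuery, attribute names in HTML documents are lowercased, so an attribute written as data-myKey is stored as data-mykey. That is one more way the key you read back can differ from the key you wrote.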

What sort of memory leaks should I watch for with jQuery's data()?

Should I pair every data() call with a later removeData() call?
My assumptions: jQuery's remove() will remove elements from the DOM, and if I don't have any other references to remove, I don't have to do any more clean up.
However, if I have some javascript var or object referring to one of the elements being removed, I'll need to clean that up, and I'm assuming that applies to jQuery's data function, too, because it's referencing the elements somehow.
So if I do need to call removeData before remove, is there a shortcut to remove all data associated with an element or do I have to call each explicitly by name?
Edit: I looked through the source code and confirmed what Borgar and roosteronacid said. Remove takes the elements out of the dom and deletes any events and data stored with them - which is convenient, but makes me wonder when you would use removeData(). Probably not often.

jQuery's data does not keep a reference to the element so that you don't need to worry about memory leaks. Its intended purpose is to solve this exact problem.
A slight simplification of how it works:
An id member is added to each "touched" DOM node. All subsequent actions involving that DOM element use that id.
var theNode = document.getElementById('examplenode');
theNode[ 'jQuery' + timestamp ] = someInternalNodeID;
You can access the id using the same function jQuery uses:
someInternalID = jQuery.data( document.body );
When you append data to the node it stores that on the jQuery object, filed under the node's internal id. Your $(element).data(key,value) translates internally to something like:
jQuery.cache[ someInternalNodeID ][ theKey ] = theValue;
Everything goes into the same structure, including event handlers:
jQuery.cache[ someInternalNodeID ][ 'events' ][ 'click' ] = theHandler;
When an element is removed, jQuery can therefore throw away all the data (and the event handlers) with one simple operation:
delete jQuery.cache[ someInternalNodeID ];
Theoretically, you may therefore also remove jQuery without leaks occurring from any references. jQuery even supports multiple separate instances of the library, each holding its own set of data or events.
You can see John Resig explaining this stuff in the "The DOM Is a Mess" presentation.

The whole point of jQuery is to abstract away from crappy JavaScript implementations and bugs in browsers, such as memory leaks :)
So yup: all data associated with an element will be removed when that element is removed from the DOM.

By and large, javascript is fairly good about knowing when it's appropriate to collect the garbage, and unless you're writing very large-scale or long-running client-side apps, I'd say the memory involved is mostly inconsequential and trying to second-guess it isn't gonna gain you a lot.
Determining what unfinished lexical closures or other tricky javascript in jQuery might still be accessing your given data could be pretty complicated in some cases.
As far as I'm aware, though, if you store a reference to whatever you got with jQuery's data function then it would continue to exist after the element is removed, so removing that reference would be necessary as well. Some simple test cases would give you a more definite answer.
