JavaScript: Unable to read all elements from innerHTML content


I am trying to read the elements in the innerHTML of a div, but only alternate elements seem to be read.
Code Block:
<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">
var tdiv=document.createElement("div");
tdiv.innerHTML="<span>a1</span><span>a2</span><span>a3</span><span>a4</span><span>a5</span>";
var cn=tdiv.getElementsByTagName("*");
var len=cn.length;
console.log("length: "+len);
console.log("tdiv len: "+tdiv.getElementsByTagName("*").length);
for(var i=0;i<len;i++){
if(cn[i]){
console.log(i+": "+cn[i].nodeName+": "+cn[i].tagName);
document.body.appendChild(cn[i]);
}
}
</script>
</body>
</html>
Output:
a1a3a5
Note: a2 and a4 are missing.
I have tried using both childNodes and getElementsByTagName("*") in all the browsers, IE, FF, Chrome, Opera, Safari and I see the same behavior.
When I add a white space between all the spans then all the elements are being read. Is this an expected behavior ? If so, why ?

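The returned collection is live (an HTMLCollection from getElementsByTagName, a NodeList from childNodes). Appending an element to the body removes it from tdiv, so the collection shrinks by one on every iteration while i keeps increasing: once cn[0] has been moved, the old cn[1] becomes cn[0], and the next step of the loop, cn[1], is really the third span. That is why every other element appears to be skipped.
Try...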
while (cn.length) {
    cn[0] && document.body.appendChild(cn[0]);
}
"When I add a white space between all the spans then all the elements are being read. Is this an expected behavior ? If so, why ?"
Yes, it's expected. It just means that, when iterating childNodes, the loop skips the text nodes introduced by the spaces instead of the span elements. Never rely on this - it's terribly fragile.
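An alternative to the while loop is to copy the live collection into a plain array before moving anything. The following is only a minimal sketch of that idea; moveAll is a hypothetical helper name, not something from the original answer, and it assumes target exposes appendChild like a DOM node does.

```javascript
// Hypothetical helper (an illustration, not the answerer's code): move every
// node of a possibly live collection into `target`.
function moveAll(collection, target) {
    // Copy the collection into a plain array first. Nothing is moved while
    // copying, so the live collection cannot shrink under the loop.
    var snapshot = [];
    for (var i = 0; i < collection.length; i++) {
        snapshot.push(collection[i]);
    }
    // The snapshot keeps its length when nodes are re-parented,
    // so no index is skipped here.
    for (var j = 0; j < snapshot.length; j++) {
        target.appendChild(snapshot[j]);
    }
    return snapshot.length; // number of nodes moved
}
```

In the question's page this would be called as moveAll(tdiv.getElementsByTagName("*"), document.body).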

Related

Difference between textContent vs innerText

What is the difference between textContent and innerText in JavaScript?
Can I use textContent as follows:
var logo$ = document.getElementsByClassName('logo')[0];
logo$.textContent = "Example";

The key differences between innerText and textContent are outlined very well in Kelly Norton's blogpost: innerText vs. textContent. Below you can find a summary:
innerText was non-standard, textContent was standardized earlier.
innerText returns the visible text contained in a node, while textContent returns the full text. For example, on the following HTML <span>Hello <span style="display: none;">World</span></span>, innerText will return 'Hello', while textContent will return 'Hello World'. For a more complete list of differences, see the table at http://perfectionkills.com/the-poor-misunderstood-innerText/ (further reading at 'innerText' works in IE, but not in Firefox).
As a result, innerText is much more performance-heavy: it requires layout information to return the result.
innerText is defined only for HTMLElement objects, while textContent is defined for all Node objects.
textContent was unavailable in IE8-, and a bare-metal polyfill would have looked like a recursive function using nodeValue on all childNodes of the specified node:
function textContent(rootNode) {
    if ('textContent' in document.createTextNode(''))
        return rootNode.textContent;
    var childNodes = rootNode.childNodes,
        len = childNodes.length,
        result = '';
    for (var i = 0; i < len; i++) {
        if (childNodes[i].nodeType === 3)
            result += childNodes[i].nodeValue;
        else if (childNodes[i].nodeType === 1)
            result += textContent(childNodes[i]);
    }
    return result;
}

textContent is the only one available for text nodes:
var text = document.createTextNode('text');
console.log(text.innerText); // undefined
console.log(text.textContent); // text
In element nodes, innerText evaluates <br> elements, while textContent evaluates control characters:
var span = document.querySelector('span');
span.innerHTML = "1<br>2<br>3<br>4\n5\n6\n7\n8";
console.log(span.innerText); // breaks in first half
console.log(span.textContent); // breaks in second half
<span></span>
span.innerText gives:
1
2
3
4 5 6 7 8
span.textContent gives:
1234
5
6
7
8
If text containing control characters (e.g. line feeds) is set with innerText, the line feeds are converted to <br> elements, so they no longer show up in textContent. The other way around (setting control characters with textContent), all characters are returned by both innerText and textContent:
var div = document.createElement('div');
div.innerText = "x\ny";
console.log(div.textContent); // xy

For those who googled this question and arrived here: I feel the clearest answer to this question is in the MDN documentation: https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent.
You can forget all the points that may confuse you, but remember two things:
When you are trying to alter the text, textContent is usually the property you are looking for.
When you are trying to grab text from some element, innerText approximates the text the user would get if they highlighted the contents of the element with the cursor and then copied to the clipboard. And textContent gives you everything, visible or hidden, including <script> and <style> elements.

Both innerText & textContent are standardized as of 2016. All Node objects (including pure text nodes) have textContent, but only HTMLElement objects have innerText.
While textContent works with most browsers, it does not work on IE8 or earlier. Use this polyfill for it to work on IE8 only. This polyfill will not work with IE7 or earlier.
if (Object.defineProperty
    && Object.getOwnPropertyDescriptor
    && Object.getOwnPropertyDescriptor(Element.prototype, "textContent")
    && !Object.getOwnPropertyDescriptor(Element.prototype, "textContent").get) {
    (function() {
        var innerText = Object.getOwnPropertyDescriptor(Element.prototype, "innerText");
        Object.defineProperty(Element.prototype, "textContent",
            {
                get: function() {
                    return innerText.get.call(this);
                },
                set: function(s) {
                    return innerText.set.call(this, s);
                }
            }
        );
    })();
}
The Object.defineProperty method is available in IE9 and up; in IE8, however, it is available for DOM objects only.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty
https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent

textContent is supported by most browsers. It is not supported by IE8 or earlier, but a polyfill can be used for this.
The textContent property sets or returns the textual content of the specified node, and all its descendants.
See http://www.w3schools.com/jsref/prop_node_textcontent.asp

Aside from all the differences that were named in the other answers, here is another one which I discovered only recently:
Even though the innerText property is said to have been standardised since 2016, it exhibits differences between browsers: Firefox ignores U+200E and U+200F characters ("lrm" and "rlm") in innerText, while Chrome does not.
console.log(document.getElementById('test').textContent.length);
console.log(document.getElementById('test').innerText.length);
<div id="test">[‎]</div>
Firefox reports 3 and 2, Chrome reports 3 and 3.
Not sure yet if this is a bug (and if so, in which browser) or just one of those quirky incompatibilities which we have to live with.

textContent returns full text and does not care about visibility, while innerText does.
<p id="source">
<style>#source { color: red; }</style>
Text with breaking<br>point.
<span style="display:none">HIDDEN TEXT</span>
</p>
Output of textContent:
#source { color: red; } Text with breakingpoint. HIDDEN TEXT
Output of innerText (note how innerText is aware of tags like <br> and ignores the hidden element):
Text with breaking point.

Another useful behavior of innerText compared to textContent is that newline characters and runs of adjacent spaces are returned as a single space, which can make string comparison easier.
But depending on what you want, firstChild.nodeValue may be enough.

document.querySelector('h1').innerText / innerHTML / textContent
.querySelector('h1').innerText - gives us the text inside. It is sensitive to what is currently being displayed: content that is hidden is ignored.
.querySelector('h1').textContent - like innerText, but it does not care about what is displayed or actually shown to the user. It returns everything.
.querySelector('h1').innerHTML - retrieves the full contents, including the tags. Assigning markup such as <i>sdsd</i> to it works and is parsed as HTML.

innerHTML parses any HTML tags in the assigned string, which can be dangerous: it opens the door to client-side injection attacks such as DOM-based XSS.
Here is the code snippet:
<!DOCTYPE html>
<html>
<body>
<script>
var source = "Hello " + decodeURIComponent("<h1>Text inside gets executed as h1 tag HTML is evaluated</h1>"); //Source
var divElement = document.createElement("div");
divElement.innerHTML = source; //Sink
document.body.appendChild(divElement);
</script>
</body>
</html>
If you use .textContent, the HTML tags are not evaluated and the string is printed as plain text.
<!DOCTYPE html>
<html>
<body>
<script>
var source = "Hello " + decodeURIComponent("<h1>Text inside will not get executed as HTML</h1>"); //Source
var divElement = document.createElement("div");
divElement.textContent = source; //Sink
document.body.appendChild(divElement);
</script>
</body>
</html>
Reference: https://www.scip.ch/en/?labs.20171214
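When an HTML string really has to be assembled (so textContent alone is not an option), one common approach is to escape the special characters before the string reaches innerHTML. The sketch below only illustrates that idea; escapeHtml is a hypothetical helper, not a browser API, and it assumes the value is placed in element content or a quoted attribute.

```javascript
// Hypothetical helper (an assumption for illustration): replace the characters
// that have a special meaning in HTML with their entity equivalents.
function escapeHtml(text) {
    var map = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    };
    // String() makes the helper tolerant of numbers and other non-string input.
    return String(text).replace(/[&<>"']/g, function (ch) {
        return map[ch];
    });
}
```

With this, divElement.innerHTML = "Hello " + escapeHtml(source) would show the tags as literal text instead of interpreting them.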

Element custom properties not being kept after loop

I have some more complex code with a strange behaviour that I've managed to reproduce here:
<!DOCTYPE html>
<html>
<body>
<div>things</div>
<div>stuff</div>
<div>other</div>
<div>misc</div>
<script>
var forEach = function (array, callback, scope) {
for (var i = 0; i < array.length; i++) {
callback.call(scope, array[i], i);
}
}
var d = document.querySelectorAll('div');
d[1].o = d[1].textContent;
forEach(d, function (el, i) {
d[1].innerHTML += '<p>div things</p> sdf d';
document.body.innerHTML += '<div>new div</div> fdsffsd fsdf';
alert(d[1].o);
});
</script>
</body>
</html>
I should get 4 alerts, each saying "stuff". And I do, until I do a hard refresh, and then a normal refresh. Then only the first alert says "stuff", and the others say "undefined". It appears the "o" property being added to div[1] is not being kept. It seems to be related to the innerHTML being added to the body in the loop. The innerHTML being added to the div doesn't seem problematic.
I cannot see what the problem is. Moreover, this only seems to happen in Chrome (v43) and not in Firefox.
Any ideas?

The reason this is happening when the body's innerHTML is updated is that the whole of the body's innerHTML needs to be reparsed. This means any custom properties attached to any elements are then lost, as these DOM elements are being recreated.
Thus one should probably not be using innerHTML with the += operator unless you're sure you know what you're doing.
Why it even worked sometimes is a mystery...
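One way to avoid the re-parse is to build the new element with DOM methods and append it, which leaves the existing elements alone. This is a minimal sketch under that assumption; appendDiv is a hypothetical helper name, and doc stands for the document to modify (document in a browser).

```javascript
// Hypothetical helper (not from the original answer): append a new <div>
// containing `text` without re-parsing the body's existing markup.
function appendDiv(doc, text) {
    var div = doc.createElement('div');
    div.appendChild(doc.createTextNode(text));
    // appendChild only adds a node. The elements already in the body are
    // not recreated, so custom properties set on them are left in place.
    doc.body.appendChild(div);
    return div;
}
```

In the question's loop, appendDiv(document, 'new div') would replace the document.body.innerHTML += ... line. insertAdjacentHTML('beforeend', ...) is another option when the new content is only available as a markup string.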

IE7/8 + <time> tag + jQuery .clone() =?

I'll preface this by saying that I already solved this issue by fundamentally changing my approach. But in the process of solving it, I put together a test case that fascinates and vexes me.
I have a string returned from an AJAX call. The string contains HTML, most of which is useless. I want one element from the string (and all its children) inserted into the DOM. A simulation of this is:
<!DOCTYPE html>
<html>
<head>
<title>wtf?</title>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script>
$(document).ready(function() {
var markup = '<div class="junk"><div class="good"><time datetime="2013-03-29">March 29, 2013</time></div></div>',
output = $(markup).find('.good').clone().wrap('<div />').parent().html();
$('body').append(output);
});
</script>
</head>
<body></body>
</html>
I have a hosted copy of this file up here: http://alala.smitelli.com/temp/wtf_ie.html (won't be up forever).
What this should do is extract the .good div and the child <time> element, then insert them into the body. I do .wrap().parent() to extract the element I selected in addition to its children (see this question). The .clone() and .html() are contrivances that demonstrate the problem.
To the user, it should show today's date. And it works in Chrome, Firefox, IE9, etc.:
March 29, 2013
But in IE7 and 8, the displayed text is:
<:time datetime="2013-03-29">March 29, 2013
The opening < is shown, and a colon has somehow been inserted. The closing </time> tag looks unaffected, and is not shown escaped.
What's going on here? Is this some sort of bug, or an expected behavior? Where is the colon coming from?
EDIT: As far as suggestions to add document.createElement('time') or html5shiv, neither of those seemed to change the behavior.

Very much to my surprise, I find that if I remove jQuery from the equation in terms of actually parsing the markup, the problem goes away (on both IE7 and IE8), even without createElement('time') or a shim/shiv:
<!DOCTYPE html>
<html>
<head>
<title>wtf?</title>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script>
$(document).ready(function() {
var div, markup, output;
markup = '<div class="junk"><div class="good"><time datetime="2013-03-29">March 29, 2013</time></div></div>';
div = document.createElement('div');
div.innerHTML = markup;
output = $(div).find('.good').clone().wrap('<div />').parent().html();
$('body').append(output);
});
</script>
</head>
<body></body>
</html>
The change there is that I just use the browser's own handling of innerHTML and a disconnected div to parse markup, rather than letting jQuery do it for me.
So I'd have to say this may be a problem with jQuery's handling of HTML fragments on older browsers without support for HTML5 elements. But that would be a significant claim, and significant claims require significant evidence...
But if I change these lines:
div = document.createElement('div');
div.innerHTML = markup;
output = $(div).find('.good').clone().wrap('<div />').parent().html();
to:
div = $(markup);
output = div.find('.good').clone().wrap('<div />').parent().html();
I get the problem (on both IE7 and IE8).
So it does start seeming like a jQuery issue...

Cyclic adding/removing of DOM nodes causes memory leaks in JavaScript?

I'm trying to display dynamically changing data by manipulating DOM elements (adding and removing them). I found some very strange behavior in almost all browsers: after I remove a DOM element and then add a new one, the browser does not free the memory taken by the removed DOM item. See the code below to understand what I mean. After we run this page, it gradually eats up to 150 MB of memory. Can anyone explain this strange behavior to me? Or maybe I'm doing something wrong?
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<script type="text/javascript">
function redrawThings() {
// Removing all the children from the container
var cont = document.getElementById("container");
while ( cont.childNodes.length >= 1 ) {
cont.removeChild(cont.firstChild);
}
// adding 1000 new children to the container
for (var i = 0; i < 1000; i++) {
var newDiv = document.createElement('div');
newDiv.innerHTML = "Preved medved " + i;
cont.appendChild(newDiv);
}
}
</script>
<style type="text/css">
#container {
border: 1px solid blue;
}
</style>
</head>
<body onload='setInterval("redrawThings()", 200);'>
<div id="container"> </div>
</body>
</html>

I can't reproduce this on FF 3.6.8/Linux, but 200 ms for a timer is rather small with that much of DOM re-rendering. What I notice on my machine is that when doing JavaScript-intensive things besides running this script, like typing in this answer box, memory usage increases, but is released again when I stop typing (in my case, to something around 16% of memory usage).
I guess that in your case the browser's garbage collector just doesn't have enough ‘free time’ to actually remove those nodes from memory.

Not sure if it'll affect timing, and it's probably really bad practice, but instead of looping through the child nodes, could you not just set the innerHTML of the div to "" ??

It's because removing a Node from the DOM Tree doesn't delete it from memory, it's still accessible, so the following code will work:
var removed = element.removeChild(element.firstChild);
document.body.appendChild(removed);
That code will remove the first child from element, and then after it has been removed, append it to the end of the document.
There really is nothing you can do except make your code more efficient with less removals.
For more info, check out the Node.removeChild page at the Mozilla Developer Center.

Javascript Removing Whitespace When It Shouldn't?

I have a HTML file that has code similar to the following.
<table>
<tr>
<td id="MyCell">Hello World</td>
</tr>
</table>
I am using javascript like the following to get the value
document.getElementById(cell2.Element.id).innerText
This returns the text "Hello World" with only 1 space between hello and world. I MUST keep the same number of spaces, is there any way for that to be done?
I've tried using innerHTML, outerHTML and similar items, but I'm having no luck.

HTML is white space insensitive, which means your DOM is too. Would wrapping your "Hello World" in a pre block work at all?

In HTML, any run of more than one space is collapsed, both in displaying text and in retrieving it via the DOM. The only guaranteed way to maintain spaces is to use a non-breaking space (&nbsp;).
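The non-breaking-space approach could be sketched roughly as follows. Both function names are hypothetical and only serve as an illustration: one converts ordinary spaces to U+00A0 before the text is put into the page, the other converts them back after the text has been read.

```javascript
// Hypothetical helpers (assumptions for illustration only).

// Replace every ordinary space with a non-breaking space (U+00A0),
// which the browser does not collapse.
function preserveSpaces(text) {
    return text.replace(/ /g, '\u00a0');
}

// Turn non-breaking spaces back into ordinary spaces after reading
// the text out of the DOM again.
function restoreSpaces(text) {
    return text.replace(/\u00a0/g, ' ');
}
```

Keep in mind that text made up entirely of non-breaking spaces no longer wraps at those positions.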

Just a tip: innerText started out as an Internet Explorer extension and is missing from older Firefox versions, while innerHTML works in every browser... so, use innerHTML instead of innerText.

The pre tag or white-space: pre in your CSS will treat all spaces as meaningful. This will also, however, turn newlines into line breaks, so be careful.

Just an opinion here and not canonical advice, but you're headed for a world of hurt if you're trying to extract exact text values from the DOM using the inner/outer HTML/TEXT properties via Javascript. Different browsers are going to return slightly different values, based on how the browser "sees" the internal document.
If you can, I'd change the HTML you're rendering to include a hidden input, something like
<table>
<tr>
<td id="MyCell">Hello World<input id="MyCell_VALUE" type="hidden" value="Hello World" /></td>
</tr>
</table>
And then grab your value in javascript something like
document.getElementById(cell2.Element.id+'_VALUE').value
The input tags were designed to hold values, and you'll be less likely to run into fidelity issues.
Also, it sounds like you're using a .NET control of some kind. It might be worth looking through the documentation (ha) or asking a slightly different question to see if the control offers an official client-side API of some kind.

Just checked it and it looks like wrapping with the pre tag should do it.

Edit: I am wrong, ignore me.
You can get a text node's nodeValue, which should correctly represent its whitespace.
Here is a function to recursively get the text within a given element (and it's library-safe, won't fail if you use something that modifies Array.prototype or whatever):
var textValue = function(element) {
    // A missing node, or one without children, contributes no text.
    if (!element || !element.childNodes) {
        return '';
    }
    var childNodes = element.childNodes, text = '', childNode;
    // Index-based loop: for...in would also visit non-index keys of the NodeList.
    for (var i = 0; i < childNodes.length; i++) {
        childNode = childNodes[i];
        if (childNode.nodeType == 3) { // text node
            text += childNode.nodeValue;
        } else {
            text += textValue(childNode);
        }
    }
    return text;
};

This is a bit hacky, but it works on my IE.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title></title>
</head>
<body>
<div id="a">a  b</div>
<script>
var a = document.getElementById("a");
a.style.whiteSpace = "pre"
window.onload = function() {
alert(a.firstChild.nodeValue.length) // should show 4
}
</script>
</body>
</html>
Some notes:
You must have a doctype.
You cannot query the DOM element before window.onload has fired
You should use element.nodeValue instead of innerHTML et al to avoid bugs when the text contains things like < > & "
You cannot reset whiteSpace once IE finishes rendering the page due to what I assume is an ugly bug

Basically, the trick is to create a throwaway pre element, then append a copy of your node to that. Then you can get innerText or textContent, depending on the browser.
All browsers except IE basically do the obvious thing correctly. IE requires this hack since it only preserves white-space in pre elements, and only when you access innerText.

The following trick preserves white-space in innerText in IE:
var cloned = element.cloneNode(true);
var pre = document.createElement("pre");
pre.appendChild(cloned);
var textContent = pre.textContent
    ? pre.textContent
    : pre.innerText;
// delete has no effect on variables declared with var; drop the references instead.
pre = cloned = null;
