Very strange error caused by html whitespace


I have encountered a very strange bug in Firefox.
I have a JavaScript function in an external file that works perfectly on websites of ordinary complexity. However, I have been putting together a few demonstration examples and have come across something odd.
With html formatted like this (in an editor):
<div><p>Q: Where's the rabbit?</p><p class="faq_answer">A: I don't know, honest</p></div>
The Javascript works as expected.
However when like this:
<div>
<p>Q: Where's the rabbit?</p>
<p class="faq_answer">A: I don't know, honest</p>
</div>
It fails at this line:
elementsList[i].parentNode.firstChild.appendChild(finalRender.cloneNode(true));
Why on Earth would formatting of html cause anything at all?

It is not a bug. The DOM has not only element nodes, but also text nodes (among others). In this example:
<div>
<p>Q: Where's the rabbit?</p>
you have at least two text nodes:
One between the <div> and the <p>, containing a line-break.
One text node inside the <p> element node, containing the text Where's the rabbit?.
Thus, if elementsList[i].parentNode refers to the <div> element,
elementsList[i].parentNode.firstChild
will refer to the first text node.
If you want to get the first element node, use
elementsList[i].parentNode.children[0]
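To make the difference between the two markups concrete, here is a minimal sketch that models the children of the <div> with plain objects, so it runs outside a browser. The helpers makeParent, text and element are invented for this illustration and are not DOM APIs; only the property names nodeType, nodeName, firstChild and nextSibling mirror the real ones.

```javascript
// Stand-in for a DOM parent: links plain objects through firstChild/nextSibling.
// (Hypothetical helper; in a browser the HTML parser builds this structure.)
function makeParent(children) {
  for (var i = 0; i < children.length; i++) {
    children[i].nextSibling = children[i + 1] || null;
  }
  return { nodeType: 1, nodeName: "DIV", firstChild: children[0] || null };
}

function text(value) { return { nodeType: 3, nodeName: "#text", nodeValue: value }; }
function element(name) { return { nodeType: 1, nodeName: name }; }

// List the node names of every child, text nodes included.
function childNames(parent) {
  var names = [];
  for (var n = parent.firstChild; n; n = n.nextSibling) {
    names.push(n.nodeName);
  }
  return names;
}

// Models <div><p>..</p><p>..</p></div>
var compact = makeParent([element("P"), element("P")]);
// Models the same markup with a line break after <div> and after each </p>
var formatted = makeParent([text("\n"), element("P"), text("\n"), element("P"), text("\n")]);

console.log(childNames(compact).join(","));   // P,P
console.log(childNames(formatted).join(",")); // #text,P,#text,P,#text
```

In the second case firstChild is the whitespace text node, which is why appending to it fails.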
Update: You mentioned Firefox 3.0, and indeed, the children property is not supported in this version.
As far as I know, the only solution is to loop over the child nodes and test whether each one is an element node:
var firstChild = elementsList[i].parentNode.firstChild;
// skip anything that is not an element node (nodeType 1)
while (firstChild && firstChild.nodeType !== 1) {
    firstChild = firstChild.nextSibling;
}
if (firstChild) {
    // exists and found
}
You might want to put this in an extra function:
function getFirstElementChild(element) {
    var firstChild = null;
    if (element.children) {
        // element-only collection, where supported
        firstChild = element.children[0] || null;
    }
    else {
        // fallback: skip text, comment and other non-element nodes
        firstChild = element.firstChild;
        while (firstChild && firstChild.nodeType !== 1) {
            firstChild = firstChild.nextSibling;
        }
    }
    return firstChild;
}
You can (and should) also consider using a library that abstracts from all that, like jQuery.
It depends on what your code is actually doing, but if you run this method for every node, it would be something like:
$('.faq_answer').prev().append(finalRender.cloneNode(true));
(assuming the p element always comes before the .faq_answer element)
This is the whole code; you wouldn't have to loop over the elements anymore.

Because you have a text node between <div> and <p>.
As usual, the assumption of a browser bug is incorrect: this is, instead, a programmer bug!

Couldn't one achieve it by using ParentNode.children instead?

Related

JS error in firefox working fine in IE

I have below method which is working fine in IE but I am getting the following error:
TypeError: hdnButton .click is not a function
when I use it in Firefox.
function PopupTest() {
var hdnButton = document.getElementById('div_btn').firstChild;
hdnButton.click();
}
Could anyone suggest how to get over this issue? I am not able to find any solution for it.

My guess (since you've shown no html) is that you have some white space (perhaps a line break) which IE is ignoring, but which FF is returning as a text node. If so, you should be able to skip over it with something like:
function PopupTest() {
var hdnButton = document.getElementById('div_btn').firstChild;
while (hdnButton.nodeType === 3)
hdnButton = hdnButton.nextSibling;
hdnButton.click();
}
...which works by using the .nodeType property to test whether the .firstChild is a text node (3 means text node; for more information see MDN). If it is, it takes the next sibling and tests its .nodeType, and so on until it finds an element that isn't a text node. (Of course this assumes there will be one.)
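The caveat about assuming an element exists can be handled with a null check. The following is a sketch only: firstElementChildOf is a name made up here, and the objects below are plain stand-ins for DOM nodes so the snippet runs without a browser. Unlike the loop above, it skips every non-element node (comments too), not just text nodes.

```javascript
// Returns the first child that is an element (nodeType 1), or null if there is none.
function firstElementChildOf(parent) {
  var node = parent ? parent.firstChild : null;
  while (node && node.nodeType !== 1) {
    node = node.nextSibling;
  }
  return node || null;
}

// Plain-object stand-ins for: <div id="div_btn">\n<button>..</button></div>
var button = { nodeType: 1, nodeName: "BUTTON", nextSibling: null };
var whitespace = { nodeType: 3, nodeName: "#text", nextSibling: button };
var holder = { nodeType: 1, firstChild: whitespace };

// A parent that contains nothing but whitespace
var onlyText = { nodeType: 1, firstChild: { nodeType: 3, nodeName: "#text", nextSibling: null } };

console.log(firstElementChildOf(holder).nodeName); // BUTTON
console.log(firstElementChildOf(onlyText));        // null
```

The caller can then test the result before calling .click() on it.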
Alternatively if you know what tag the (non text node) first child should be you could select it on that basis, e.g., if it's actually a <button> element:
hdnButton = document.getElementById('div_btn').getElementsByTagName("button")[0];
hdnButton.click();
Or if you don't need to support IE<=7:
document.querySelector("#div_btn > :first-child").click();

Check whether document.getElementById('div_btn').firstChild exists.
If it does, there may be another problem, such as whitespace, or you may be calling PopupTest() before the DOM is loaded.
Try:
<body onload="PopupTest()">

jQuery, how to test if a variable is a text node, containing no markup?

http://jsfiddle.net/DerNalia/zrppg/8/
I have two lines of code that pretty much do the same thing
var doesntbreak = $j("hello");
var breaks = $j("&nbsp;");
The first one doesn't error, but the second one throws this:
Syntax error, unrecognized expression: &nbsp;
Shouldn't they both behave the same?
Any insight as to how to solve this?
In the actual method I'm using, ele is from the DOM, so it could be a text node or any other kind of node.
UPDATE:
The function where I noticed this takes its input from a selection in the DOM.
updated example: http://jsfiddle.net/DerNalia/zrppg/11/ <- includes html markup.
So, I guess, my question is: how do I test whether something is JUST a text node and doesn't contain any markup?

In general, you cannot create standalone text nodes with the jQuery function. If a string isn't obviously HTML, it gets treated as a selector, and is not recognized by jQuery as a valid selector.
Assuming you want to parse arbitrary strings (which may have HTML tags or not), I suggest something like var result = $('<div></div>').html('&nbsp;').contents();. Place your HTML or text string in a div to parse it, then immediately extract the parsed result as a jQuery object containing the list of nodes. You can append the resulting list with $(parentElem).append(result);

try this:
function isTextNode(node) {
    // parse the string inside a detached div
    var div = document.createElement('div');
    div.innerHTML = node;
    // if the text and HTML serializations match, no markup was present
    return $(div).text() == $(div).html();
}
And "&nbsp;" isn't a valid selector. If you want to find elements containing some text, you must use the :contains selector: http://api.jquery.com/contains-selector/

Internet Explorer (older versions at least) doesn't have a built-in "querySelector" function, so the Sizzle engine has to do the work directly. Thus, the slightly different tolerances for bogus input can cause differences in error reporting.
Your selector expression "&nbsp;" is equally invalid in all browsers, however. The library is not obliged to quietly accept anything you pass it, so perhaps you should reconsider your application design.
If you want to check for entities, you could use a regular expression if you're confident that it's just a text node. Or you could get the contents with .text() instead of .html().
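As a rough illustration of the regular-expression idea, the sketch below tests a string for character references. The pattern and the name containsEntity are assumptions made for this example; the pattern only checks the general shape of a reference (named, decimal or hexadecimal) and does not verify that a named entity actually exists.

```javascript
// Matches the shape of named (&nbsp;), decimal (&#160;) and hex (&#xA0;) references.
// No "g" flag, so repeated test() calls carry no lastIndex state.
var ENTITY_PATTERN = /&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i;

function containsEntity(str) {
  return ENTITY_PATTERN.test(str);
}

console.log(containsEntity("hello"));        // false
console.log(containsEntity("&nbsp;"));       // true
console.log(containsEntity("fish & chips")); // false
console.log(containsEntity("&#xA0;"));       // true
```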

So, I have to thank Apsillers and Rolando for pointing me in the right direction. Their answers were very close, but gave me the information I needed.
This is what I ended up using:
var TEXT_NODE = 3;
// put a string or node inside a scratch div and return the div's child nodes
var objectify = function(n) {
    return $j("<div></div>").html(n).contents();
};
function textOnly(n) {
    var o = objectify(n);
    for (var i = 0; i < o.length; i++) {
        if (objectify(o[i])[0].nodeType != TEXT_NODE) {
            return false;
        }
    }
    return true;
}
And here is a jsFiddle with some test cases that neither of the original code submissions passed.
To pass, it needed to handle this kind of input:
"hello" // true
"hello<b>there</b>" // false
"<b>there</b>" // false
" " // false

Not an actual answer, but this is loosely related to the question and may help someone with an issue similar to mine. :)
I ran into the same issue today and fixed it by stripping out the entities.
Changed:
var breaks = $j("&nbsp;");
to:
var breaks = $j("&nbsp;".replace(/&.*;/g, ""));
Here I am removing &nbsp;, &lt; etc...
Note: the value in place of &nbsp; is dynamic for me, so it can be anything.
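One thing to watch with a pattern like /&.*;/g is that .* is greedy, so on a string with several entities it removes everything from the first "&" to the last ";". The comparison below is a sketch with a made-up input string; the narrower pattern is one possible alternative, not the only one.

```javascript
var input = "a &amp; b &lt; c";

// Greedy: .* runs from the first "&" to the last ";" in the string.
var greedy = input.replace(/&.*;/g, "");

// Narrower: each match must look like a single character reference.
var perEntity = input.replace(/&#?[a-z0-9]+;/gi, "");

console.log(JSON.stringify(greedy));    // "a  c"
console.log(JSON.stringify(perEntity)); // "a  b  c"
```

With the greedy pattern the "b" between the two entities is lost as well.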

Javascript/XML - Getting the node name

I need to get the name of the tag "myChild" and the "content".
This is simple, but I am stuck and sleepy, and here is what I get with my tests:
XML:
...
<myParent>
<myChild>content</myChild>
</myParent>
<myParent>
<myChild>content</myChild>
</myParent>
...
JS:
var x=xmlDoc.getElementsByTagName("myParent");
alert(x[1].childNodes[0].nodeName); //returns "#text" - "myChild" needed
alert(x[1].childNodes[0].nodeValue); //returns "" - "content" needed

For elements, tagName and nodeName are the same, so either property gives you the element's name.
The problem is that the first child of your myParent element isn't the myChild element, it's a text node (containing whitespace). Your structure looks like this:
Element "myParent"
    Text node with a carriage return and some spaces or tabs
    Element "myChild"
        Text node with "content"
    Text node with a carriage return and some spaces or tabs
Element "myParent"
    Text node with a carriage return and some spaces or tabs
    Element "myChild"
        Text node with "content"
    Text node with a carriage return and some spaces or tabs
You need to navigate down to the actual myChild element, which you can do with getElementsByTagName again, or just by scanning:
var x=xmlDoc.getElementsByTagName("myParent");
var c = x[1].firstChild;
while (c && c.nodeType != 1) { // 1 = ELEMENT_NODE
c = c.nextSibling;
}
alert(c.nodeName); // "myChild"
Note that Elements don't have a meaningful nodeValue property; instead, you collect their child text nodes. (More in the DOM specs: DOM2, DOM3.)
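Collecting the child text nodes could look roughly like the sketch below. directText is a hypothetical helper name, and myChild is a plain-object stand-in for the parsed <myChild>content</myChild> element so the snippet runs without an XML document.

```javascript
// Concatenate the nodeValue of every direct text-node child (nodeType 3).
function directText(el) {
  var parts = [];
  for (var n = el.firstChild; n; n = n.nextSibling) {
    if (n.nodeType === 3) {
      parts.push(n.nodeValue);
    }
  }
  return parts.join("");
}

// Stand-in for <myChild>content</myChild>; on real elements nodeValue is null.
var myChild = {
  nodeType: 1,
  nodeName: "myChild",
  nodeValue: null,
  firstChild: { nodeType: 3, nodeName: "#text", nodeValue: "content", nextSibling: null }
};

console.log(myChild.nodeValue);   // null
console.log(directText(myChild)); // content
```

In browsers that support it, the textContent property of the element returns the text of all descendants in one step.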
Also note that when indexing into a NodeList, the indexes start at 0. You seem to have started with 1; ignore this comment if you were skipping the first one for a reason.
Off-topic: It's always best to understand the underlying mechanics of what you're working with, and I do recommend playing around with the straight DOM and referring to the DOM specs listed above. But for interacting with these trees, a good library can be really useful and save you a lot of time. jQuery works well with XML data. I haven't used any of the others like Prototype, YUI, Closure, or any of several others with XML, so can't speak to that, but I expect at least some of them support it.

Try x[1].getElementsByTagName('*')[0] instead.
(This is only reliable for index 0; other indexes may return elements that are not direct children, if the direct children contain further element nodes.)

Is it wise to use jQuery for whitelisting tags? Are there existing solutions in JavaScript?

My problem
I want to clean HTML pasted in a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps another with attributes). This is not primarily in order to prevent XSS, but to remove ugly HTML.
Currently I see no way to do it on the server, so I guess it must be done in JavaScript.
Current ideas
I found the jquery-clean plugin, but as far as I can see, it is using regexes to do the work, and we know that is not safe.
As I've not found any other JS-based solution I've started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted html ($(pastedHtml)) and then traversing the resulting tree, removing each element not matching the whitelist by looking at the attribute tagName.
My questions
Is this any better?
Can I trust jQuery to represent the pasted content well (there may be unmatched ending tags and what-have-you)?
Is there a better solution already that I couldn't find?
Update
This is my current, jQuery-based, solution (verbose and not extensively tested):
function clean(element, whitelist, replacerTagName) {
// Use div if no replace tag was specified
replacerTagName = replacerTagName || "div";
// Accept anything that jQuery accepts
var jq = $(element);
// Create a copy of the current element, but without its children
var clone = jq.clone();
clone.children().remove();
// Wrap the copy in a dummy parent to be able to search with jQuery selectors
// 1)
var wrapper = $('<div/>').append(clone);
// Check if the element is not on the whitelist by searching with the 'not' selector
var invalidElement = wrapper.find(':not(' + whitelist + ')');
// If the element wasn't on the whitelist, replace it.
if (invalidElement.length > 0) {
var el = $('<' + replacerTagName + '/>');
el.text(invalidElement.text());
invalidElement.replaceWith(el);
}
// Extract the (maybe replaced) element
var cleanElement = $(wrapper.children().first());
// Recursively clean the children of the original element and
// append them to the cleaned element
var children = jq.children();
if (children.length > 0) {
children.each(function(_index, thechild) {
var cleaned = clean(thechild, whitelist, replacerTagName);
cleanElement.append(cleaned);
});
}
return cleanElement;
}
I am wondering about some points (see comments in the code);
Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
Is this the recommended way to create a new node?

If you leverage the browser's HTML correcting abilities (e.g. you copy the rich text to the innerHTML of an empty div and take the resulting DOM tree), the HTML will be guaranteed to be valid (the way it will be corrected is somewhat browser-dependent), although the rich text editor probably does this anyway.
jQuery's own text-to-DOM transform is probably also safe, but definitely slower, so I would avoid it.
Using a whitelist based on the jQuery selector engine might be somewhat tricky because removing an element while preserving its children might make the document invalid, so the browser would correct it by changing the DOM tree, which might confuse a script trying to iterate through invalid elements. (E.g. you allow ul and li but not ol; the script removes the list root element, naked li elements are invalid so the browser wraps them in ul again, that ul will be missed by the cleaning script.) If you throw away unwanted elements together with all their children, I don't see any problems with that.
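The "throw away unwanted elements together with all their children" approach can be sketched on a simplified tree. The node shape used here ({ tag, children } for elements, { text } for text) and the name dropDisallowed are assumptions for illustration, not the DOM or jQuery API; a real implementation would walk actual DOM nodes.

```javascript
// Returns a cleaned copy of the tree. The root is treated as a container and is
// not checked itself; any disallowed descendant is dropped with its whole subtree.
function dropDisallowed(node, allowedTags) {
  if (node.text !== undefined) {
    return { text: node.text };
  }
  var kept = [];
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    var isText = child.text !== undefined;
    if (isText || allowedTags.indexOf(child.tag) !== -1) {
      kept.push(dropDisallowed(child, allowedTags));
    }
  }
  return { tag: node.tag, children: kept };
}

var pasted = {
  tag: "div",
  children: [
    { tag: "p", children: [{ text: "keep me" }] },
    { tag: "font", children: [{ tag: "p", children: [{ text: "dropped with parent" }] }] }
  ]
};

var cleaned = dropDisallowed(pasted, ["p", "ul", "li"]);
console.log(JSON.stringify(cleaned));
// {"tag":"div","children":[{"tag":"p","children":[{"text":"keep me"}]}]}
```

Because nothing is re-parented, the result cannot contain orphans such as li elements without a list, which is the failure mode described above.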

How to find with javascript if element exists in DOM or it's virtual (has been just created by createElement)

I'm looking for a way to find if element referenced in javascript has been inserted in the document.
Lets illustrate a case with following code:
var elem = document.createElement('div');
// Element has not been inserted in the document, i.e. not present
document.getElementsByTagName('body')[0].appendChild(elem);
// Element can now be found in the DOM tree
jQuery has the :visible selector, but it won't give an accurate result when I need to find out whether an invisible element has been placed somewhere in the document.

Here's an easier method that uses the standard Node.contains DOM API to check whether an element is currently in the DOM:
document.body.contains(MY_ELEMENT);
CROSS-BROWSER NOTE: the document object in IE does not have a contains() method - to ensure cross-browser compatibility, use document.body.contains() instead. (or document.head.contains if you're checking for elements like link, script, etc)
Notes on using a specific document reference vs Node-level ownerDocument:
Someone raised the idea of using MY_ELEMENT.ownerDocument.contains(MY_ELEMENT) to check for a node's presence in the document. While this can produce the intended result (albeit, with more verbosity than necessary in 99% of cases), it can also lead to unexpected results, depending on use-case. Let's talk about why:
If you are dealing with a node that currently resides in a separate document, like one generated with document.implementation.createHTMLDocument(), an <iframe> document, or an HTML Import document, and use the node's ownerDocument property to check for presence in what you think will be your main, visually rendered document, you will be in a world of hurt.
The node property ownerDocument is simply a pointer to whatever current document the node resides in. Almost every use-case of contains involves checking a specific document for a node's presence. You have 0 guarantee that ownerDocument is the same document you want to check - only you know that. The danger of ownerDocument is that someone may introduce any number of ways to reference, import, or generate nodes that reside in other documents. If they do so, and you have written your code to rely on ownerDocument's relative inference, your code may break. To ensure your code always produces expected results, you should only compare against the specifically referenced document you intend to check, not trust relative inferences like ownerDocument.
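The difference between checking a specific document and checking ownerDocument can be shown with a small sketch. Everything here is a stand-in: isAttachedTo is a made-up name, and mainDoc, otherDoc and the nodes are plain objects linked by parentNode, so the snippet runs without a browser.

```javascript
// Walk up the parentNode chain; true only if we reach the given document object.
function isAttachedTo(node, doc) {
  for (var n = node; n; n = n.parentNode) {
    if (n === doc) {
      return true;
    }
  }
  return false;
}

// Two separate "documents" (nodeType 9), e.g. the page and an iframe's document
var mainDoc = { nodeType: 9, parentNode: null };
var otherDoc = { nodeType: 9, parentNode: null };

var inMain = { nodeType: 1, parentNode: { nodeType: 1, parentNode: mainDoc }, ownerDocument: mainDoc };
var inOther = { nodeType: 1, parentNode: otherDoc, ownerDocument: otherDoc };
var detached = { nodeType: 1, parentNode: null, ownerDocument: mainDoc };

console.log(isAttachedTo(inMain, mainDoc));                // true
console.log(isAttachedTo(inOther, mainDoc));               // false
console.log(isAttachedTo(inOther, inOther.ownerDocument)); // true
console.log(isAttachedTo(detached, mainDoc));              // false
```

The third line is the trap: the ownerDocument-based check succeeds even though the node is not in the main document.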

Do this:
var elem = document.createElement('div');
elem.setAttribute('id', 'my_new_div');
if (document.getElementById('my_new_div')) { } //element exists in the document.

The safest way is to test directly whether the element is contained in the document:
function isInDocument(el) {
var html = document.body.parentNode;
while (el) {
if (el === html) {
return true;
}
el = el.parentNode;
}
return false;
}
var elem = document.createElement('div');
alert(isInDocument(elem));
document.body.appendChild(elem);
alert(isInDocument(elem));

You can also use jQuery.contains:
jQuery.contains( document, YOUR_ELEMENT)

Use compareDocumentPosition to see if the element is contained inside document. PPK has browser compatibility details and John Resig has a version for IE.
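compareDocumentPosition returns a bitmask, so the containment test is a bitwise check. The sketch below only demonstrates the mask arithmetic with literal values, so it runs without a DOM; maskSaysContained is a name invented for this example, and 16 is the value the DOM specification assigns to Node.DOCUMENT_POSITION_CONTAINED_BY.

```javascript
// Value of Node.DOCUMENT_POSITION_CONTAINED_BY in the DOM specification.
var CONTAINED_BY = 16;

// True when the mask says the argument node is contained by the reference node.
function maskSaysContained(mask) {
  return (mask & CONTAINED_BY) !== 0;
}

// In a browser: maskSaysContained(document.compareDocumentPosition(el))
console.log(maskSaysContained(20)); // true  (16 contained by + 4 following)
console.log(maskSaysContained(35)); // false (1 disconnected + 2 preceding + 32 implementation-specific)
```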

function isInDocument(query){
return document.querySelectorAll(query).length != 0;
}
// isInDocument("#elemid")
