jQuery parses raw HTML with paragraph wrong

The screenshot shows the firebug watch window.
Why does it parse two almost identical HTML strings so differently? I expected there would be just one element in the second row, instead of an array of elements.

The browser is not wrong. <p><div></div></p> is invalid HTML.
The browser parses the two snippets differently because <p> elements may only contain inline (phrasing) content.
Both <p> and <div> are block-level elements, but <p> cannot contain a <div>, which is not phrasing content. So when the browser reads that code, it finds the <p> element and then an unexpected <div>. Browsers are very tolerant of markup errors, so the browser closes the <p> tag and moves on to the <div> element. Then comes the closing </p> (also invalid HTML, because its opening tag is missing), so it is read as a new element.
In the first case you have nested elements, so the browser shows one element;
In the second case you have three elements at the same level of the DOM tree, so the browser's answer is an array of elements.
Both render, but the invalid one can produce unexpected results. How the browser will handle invalid markup combined with CSS is difficult to predict.
So, the browser reads/parses the code as: <p></p><div></div><p></p>, giving you different results.
Worth reading:
W3 / HTML5 spec:
p – paragraph
div – generic flow container.
MOZILLA DEVELOPER NETWORK:
MDN: p element (check "Permitted content")
MDN: block-level elements

The result is not wrong in either case.
The <p> HTML tag may only contain phrasing content elements. However, <div> is not phrasing content (it is flow content). (Simplified: <p> may contain inline elements, but <div> is a block element.) Thus, the HTML code from your second example is invalid (as in not standards-conforming).
As a result, the browser's HTML-to-DOM parser, which is of course triggered by jQuery, handles the HTML as follows:
Identify <p> block being opened
Identify <div> block being opened
Notice a div block is invalid within the previously opened <p>
Close the previous <p> block
…
So an equivalent HTML code would be <p></p><div></div><p></p>, which is valid HTML. So the parser corrects the HTML for you.
Because we now have three top-level elements rather than nested elements under one top-level element, you get an array of DOM elements rather than the single element you expected.
Web browsers are very robust against HTML code that does not conform to the standard. The behaviour you noticed and pointed out here is one of many examples where the parser makes a best effort to make sense of invalid HTML code.
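The correction steps described above can be imitated with a toy model. This is only a sketch of that one rule (a block-level start tag closes an open <p>, and a stray </p> becomes an empty <p></p>); it is not how a real HTML parser is implemented. The function name normalizeParagraphs and the CLOSES_P list are made up for illustration, and the nested input used for comparison is an assumption, since the first case is only visible in the screenshot.

```javascript
// Toy model of ONE parser rule, not a real HTML parser.
// Hypothetical helper: the name and the tag list are illustrative only.
const CLOSES_P = new Set(["div", "ul", "ol", "table", "p", "h1", "h2", "section"]);

function normalizeParagraphs(html) {
  const stack = []; // names of currently open elements
  let out = "";
  // Matches either a tag (groups 1 and 2) or a run of text (group 3).
  const tokenPattern = /<(\/?)([a-z][a-z0-9]*)[^>]*>|([^<]+)/gi;
  let match;
  while ((match = tokenPattern.exec(html)) !== null) {
    const [, slash, rawName, text] = match;
    if (text !== undefined) {
      out += text;
      continue;
    }
    const name = rawName.toLowerCase();
    if (slash === "") {
      // Start tag: a block-level element closes a <p> that is still open.
      if (CLOSES_P.has(name) && stack[stack.length - 1] === "p") {
        stack.pop();
        out += "</p>";
      }
      stack.push(name);
      out += `<${name}>`;
    } else if (stack[stack.length - 1] === name) {
      // End tag that matches the innermost open element.
      stack.pop();
      out += `</${name}>`;
    } else if (name === "p") {
      // Stray </p> with no open <p>: becomes a new, empty paragraph.
      out += "<p></p>";
    }
    // Any other mismatched end tag is ignored in this toy model.
  }
  return out;
}

console.log(normalizeParagraphs("<p><div></div></p>"));
console.log(normalizeParagraphs("<div><p></p></div>"));
```

The first call yields three sibling elements, the second keeps its single nested structure, which mirrors the difference between one element and an array of elements.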
References:
W3: HTML markup: p
W3: HTML markup: div
MDN p

Related

Most appropriate Tag to Mark a Section of HTML Text to Manipulate with JavaScript

I want to mark a section of text to dynamically edit with JavaScript.
After skimming through the List of Inline Elements, I decided to use the <a> tag, which seemed like a no-brainer choice.
function updatePrice() {
var current = Number($(".price").text());
$(".price").text(1 + current);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>This item is $<a class="price">1</a></p>
<button onClick = "updatePrice();">One Up!</button>
I played around with the <var> tag as well, but it seems to be a wrong usage for the tag: it's supposed to be used for the name of the variable, not the variable itself.
While <a> seems to be the best choice, it seems like a wrong usage for the tag:
The HTML <a> element (or anchor element) creates a hyperlink to other
web pages, files, locations within the same page, email addresses, or
any other URL.
Am I missing something?
Is this something that we accept like usage of <i> as an icon tag?
Am I being too picky about the definition and intended usage of the
tags?
Is this something HTML doesn't care about since its origin doesn't care about such usage?
I'm asking the question to understand the topic at a scholastic level.

If there is no element with appropriate semantics (and <a> does not have them for this), then <div> and <span> are the generic fallback elements. <span> is inline, so use that.

<span> is what you need.
Two types of tags in HTML
Semantic: clearly defines its contents and has specific rules and roles. Examples: <ul> -> <li>, <p>, or <a>.
Non-semantic: tells nothing about its content and has no specific rules or roles. Examples: <span> or <div>.
HTML is for machines to understand, not for humans. So we should split tags into two groups: one that machines understand and assign specific roles to (semantic), and one that we use for content with no special meaning (non-semantic).
When a browser parses an <a> tag, it creates a special role for it: it is designed for navigation. Search engines and screen readers also try to use it for navigation. When a search engine parses <h1>, it understands that this may be the title of the page and gives it more importance.
Non-semantic tags, by contrast, carry no meaning for machines. They have no specific role except holding some content.
So we should use non-semantic tags to mark content that is special only to us. Whether we target them with selectors or styles is up to us.
Why shouldn't you use <div>?
<div> is a block-level element, which takes up a whole line.
<span> is an inline element.
<span> is totally non-semantic and has no meaning beyond marking content for different styling or for use with functions like the one in your case.
But the important thing here is understanding what semantic and non-semantic tags are.

As Ahmet Can Güven suggested and Andrew Lohr agreed with in the comments to the question, the <span> tag seems to be the best tag to achieve the goal, as:
<span> is very much like a <div> element, but <div> is a block-level
element whereas a <span> is an inline element.
and is:
generic inline container for phrasing content, which does not
inherently represent anything.
function updatePrice() {
var current = Number($(".price").text());
$(".price").text(1 + current);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>This item is $<span class="price">1</span></p>
<button onClick = "updatePrice();">One Up!</button>

How to convert from mixed HTML-string/DOM-elements to DOM-elements in Javascript?

I wish to implement the following Javascript function:
function AllToDom(partsArray)
{
// Magic!
}
// Called like:
var rowObject = AllToDom(['<tr>', tdElem1, tdElem2, '<td>XXX</td>', '<td>',
divContents,'</td></tr>']);
// Where tdElem1, tdElem2, divContents are DOM node objects.
The thing is, I want it to work on any combination of DOM nodes and HTML fragments, as long as it produces valid HTML of course. Unclosed HTML tags and disallowed element combinations (like <table><div></div>) are allowed to have undefined behavior.
My first idea was to concatenate it all into an HTML string, except that in place of each DOM element I add a placeholder comment <!--SNOOPY-->. So the above would result in the following string:
<tr><!--SNOOPY--><!--SNOOPY--><td>XXX</td><td><!--SNOOPY--></td></tr>
This is already a valid piece of HTML, so next I create a <div>, assign this to innerHTML, gather the produced DOM nodes, and iterate through them and replace all <!--SNOOPY--> with the respective DOM element.
There are two flaws with this approach however:
Adding a <tr> as a child element of a <div> is invalid. I don't know whether it might break under some conditions.
Internet Explorer 8 (the oldest version that I need to support) strips all comments when assigning to innerHTML.
Are there any workarounds? Is this possible at all?
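The concatenation step of the placeholder idea can be sketched on its own, without touching the DOM. This is a minimal illustration, assuming every non-string entry in the array is a DOM node; the helper name buildPlaceholderHtml is made up here, and the later step of swapping the comments back for the real nodes is left out.

```javascript
// Hypothetical helper illustrating only the string-building step.
const PLACEHOLDER = "<!--SNOOPY-->";

function buildPlaceholderHtml(partsArray) {
  const nodes = []; // DOM nodes in the order their placeholders appear
  const html = partsArray
    .map((part) => {
      if (typeof part === "string") {
        return part; // HTML fragments are kept verbatim
      }
      nodes.push(part); // assumption: anything else is a DOM node
      return PLACEHOLDER;
    })
    .join("");
  return { html, nodes };
}

// Plain objects stand in for tdElem1, tdElem2 and divContents here.
const tdElem1 = {}, tdElem2 = {}, divContents = {};
const result = buildPlaceholderHtml(
  ["<tr>", tdElem1, tdElem2, "<td>XXX</td>", "<td>", divContents, "</td></tr>"]
);
console.log(result.html);
```

Keeping the nodes in a parallel array means the n-th placeholder comment found later corresponds to nodes[n].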

Finally found an answer: jQuery has already done all the dirty work in their parseHTML() method. And I just happen to be using jQuery anyway, so good for me! :)
I checked what the magic was behind the scenes, and it's really pretty gruesome. First, they inspect the HTML (with regexes...) to see what parent tag they need to use, and then they have a workaround for IE8, which apparently DOES preserve comment nodes, but only if they come after a text node. All comments before the first text node are lost. Some tags are affected in the same way, which I had no idea about. And then there are half a dozen other workarounds for IE and WebKit problems that I had never even heard of.
So, I'm just going to leave it to them to do the right thing, because trying to reproduce that stuff would be madness.

Explain this DOM traversal order

I wrote the following page as a DOM traversal demo:
<html>
<head>
<title>DOM Traversal</title>
</head>
<body>
<h1>Sample H1</h1>
<div id="text">
<p>Sample paragraph</p>
</div>
</body>
<script>
// Traversing the DOM tree
"use strict";
var node = document.body;
while(node) {
console.log(node);
node = node.lastChild;
}
</script>
</html>
Surprisingly, the output I'm getting is the body tag followed by the script tag. How is this possible? Isn't the script tag a sibling of the body tag? Also, why aren't the child nodes of body being traversed?

You cannot add your script element outside of the head or body elements. The browser is auto-correcting your HTML, moving the script into the end of your body, which explains the result you are getting.
The html element may only contain the head and body elements as its children. Anything else must be placed within these two elements.
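As for why the other children of body never show up: the loop in the question only ever follows lastChild, so it walks down a single branch. A minimal sketch of a full depth-first walk is shown below. The names walk and collectNames are made up for illustration, and the plain objects are only a stand-in for the corrected DOM so the snippet runs outside a browser; in a real page the result would also include whitespace text nodes.

```javascript
// Visit a node and then all of its descendants, depth first.
// Works on anything that exposes nodeName and childNodes.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  const children = node.childNodes ? Array.from(node.childNodes) : [];
  for (const child of children) {
    walk(child, visit, depth + 1);
  }
}

// Hypothetical helper: collect the node names in visiting order.
function collectNames(root) {
  const names = [];
  walk(root, (node) => names.push(node.nodeName));
  return names;
}

// Stand-in for the corrected tree, with the script moved into body.
const body = {
  nodeName: "BODY",
  childNodes: [
    { nodeName: "H1", childNodes: [] },
    { nodeName: "DIV", childNodes: [{ nodeName: "P", childNodes: [] }] },
    { nodeName: "SCRIPT", childNodes: [] },
  ],
};
console.log(collectNames(body).join(" > "));

// In a browser: walk(document.body, (node, depth) => console.log(depth, node.nodeName));
```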

On top of what iMoses said, your script block will run before the entire document is parsed, as it doesn't wait on the domReady event. This seems to cause a race condition.
If you leave your script where it is, but wait for domReady, you get a slightly different result (albeit still not what you want).
EDIT: change your script to output "node.outerHTML" instead, and you will see that the script block gets moved or rather duplicated into the body.
Without waiting for document ready, you end up with two script blocks while the script is running - one is your original, the other one at the end of the body as iMoses pointed out.
Waiting for document ready, you will find that only the (moved) one inside the body remains.

A good place to start with resolving this kind of 'issue' is to validate the source first.
Tools such as the W3C HTML validator will help you avoid this kind of problem going forward:
http://validator.w3.org/check
Validating your code returns the following:
Line 11, Column 12: document type does not allow element "SCRIPT" here
<script>
The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).
One common cause for this error is the use of XHTML syntax in HTML documents. Due to HTML's rules of implicitly closed elements, this error can create cascading effects. For instance, using XHTML's "self-closing" tags for "meta" and "link" in the "head" section of a HTML document may cause the parser to infer the end of the "head" section and the beginning of the "body" section (where "link" and "meta" are not allowed; hence the reported error).

Can I ignore html dom elements that are inside other dom element?

I have a piece of JavaScript code that I wrote to grab the HTML of a certain DOM element. The problem is that there is another element inside that DOM element, and it's being rendered as HTML.
example:
<p>
test.append("<ul />");
</p>
Is there a way to ignore the ul inside the p without having to replace < with &lt; and things of that sort?
The JavaScript code I wrote just takes the current text in the provided DOM element and places code line numbers next to it, as an IDE would.

In XHTML (and in HTML 5 documents served as XML), you can use CDATA sections so that you don't have to escape critical characters:
<p>
<![CDATA[
test.append("<ul />");
]]>
</p>
Update: I don't know of any method to achieve that for HTML <= 4 documents. CDATA is implicitly assumed for e.g. <script> content, but certainly not for <p>. However, why not properly escape the characters (e.g. < -> &lt;) in the first place? If your content is static, your text editor might help you with that; if it is dynamic (generated by PHP or whatever), there are functions to do that for you.
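If the content is generated in JavaScript, the escaping mentioned above can be done with a small helper. This is a minimal sketch; the name escapeHtml is made up here and is not a built-in function.

```javascript
// Hypothetical helper: replace the characters that HTML treats specially.
// The ampersand must be handled first so that the entities produced by the
// later replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

console.log(escapeHtml('test.append("<ul />");'));
```

The escaped string can then be placed inside the <p> and the browser will show the markup as text instead of creating a <ul> element.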

How, with JavaScript, can I read child node content in an XML file that contains HTML tags

To read a child node's content I use:
MYDATA = xhr.responseXML.getElementsByTagName("MenuItem")[INDEX].getElementsByTagName("PageContent")[0].childNodes[0].nodeValue;
Sometimes, when the child node's data contains an HTML tag (e.g. <b> or <br> tags), I have problems, since those tags are treated as XML tags (i.e. as child nodes).
My question is how to get the entire data from a child node even if it contains other HTML tags.
Example:
<MenuItem>
<MenuText>menu <b> text <b><MenuText>
</MenuItem >
would return "menu", but I want it to return: menu <b> text <b>

Yes and no, depending on your parser. The reason is that all text nodes in XML are supposed to have < and > replaced with their htmlentity() counterparts (&lt; and &gt;), and all other special characters replaced as htmlspecialchars() would. I'm fairly certain that the parser creates a new node, with the HTML tag as the name.
There are only two solutions for this. The first is to store the XML data in a string, use a regex to find the HTML tags (well, all the < and > characters for that matter), and replace them with the correct values I noted above before you pass the string to a parser (parser.parseFromString() in JavaScript, given that 'parser' is a DOM parser). The other is to take the node, get its entire set of child nodes using a recursive loop, and then concatenate their names and contents. The second method is more programming work and involves more processing, so I suggest the simple remedy of regex and character replacement.
Or, you can read about CDATA and protect the tags that way instead, by placing all of the content within a <![CDATA[ ... ]]> section, but that only works if you're the one creating the XML file. You should notify the webmaster of the site that you got the XML from that the XML is incorrectly created, and that the tags need to be wrapped in a <![CDATA[ ... ]]> section or have their < and > replaced with the htmlentity() counterparts. I suppose you could also use a regex to place the HTML code within a <![CDATA[ ... ]]> section, but that's probably slower and less efficient than replacing the < and > characters.
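The second method mentioned above (a recursive loop that concatenates node names and contents) might look roughly like this. It is only a sketch: the name serializeChildren is made up, attributes are ignored, and text is not re-escaped. The plain objects below stand in for parsed XML nodes so that the snippet runs outside a browser, and the example uses a properly closed <b> tag, which is an assumption about what the XML was meant to contain.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Hypothetical helper: rebuild the markup inside a node from its children.
function serializeChildren(node) {
  let result = "";
  for (const child of Array.from(node.childNodes || [])) {
    if (child.nodeType === TEXT_NODE) {
      result += child.nodeValue;
    } else if (child.nodeType === ELEMENT_NODE) {
      // Recurse so that nested tags are kept as well (attributes are dropped).
      result += `<${child.nodeName}>${serializeChildren(child)}</${child.nodeName}>`;
    }
  }
  return result;
}

// Stand-in for a parsed <MenuText>menu <b>text</b></MenuText> element.
const menuText = {
  nodeType: ELEMENT_NODE,
  nodeName: "MenuText",
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "menu " },
    {
      nodeType: ELEMENT_NODE,
      nodeName: "b",
      childNodes: [{ nodeType: TEXT_NODE, nodeValue: "text" }],
    },
  ],
};
console.log(serializeChildren(menuText));
```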

The official W3C element property that returns all text from an element and all its descendants is part of DOM v3 and is called textContent, but it's not supported in every browser yet (I'm looking at you, IE; I think it's called innerText there), if that is even relevant for you.
So your line of code would look something like this for your XML snippet:
MYDATA = xhr.responseXML.getElementsByTagName("MenuItem")[INDEX].getElementsByTagName("MenuText")[0].textContent;
That will not retain the HTML tags though. So in the end it depends on what you're trying to do with that XML. Do you want to add it to another DOM tree? If so, you can just take that element with all its descendants and append it elsewhere.
MYDATA = xhr.responseXML.getElementsByTagName("MenuItem")[INDEX].getElementsByTagName("MenuText")[0].cloneNode(true);
someOtherElement.appendChild(MYDATA);
Otherwise you'd have to write a loop that will copy each node (text content is a node, too, just like whitespace) from source to destination and append it there.
