Content inside CDATA is not displayed properly when processed through JavaScript

I have an XML document with some sample content like this:
<someTag>
<![CDATA[Hello World]]>
</someTag>
I'm parsing the above XML in JavaScript. When I try to access and render the Hello World text using xmldoc.getElementsByTagName("someTag")[0].childNodes[0].textContent, all I get is blank text on screen.
The code is not returning undefined or any error messages. So I guess the code is properly accessing the message. But due to CDATA, it is not rendering properly on screen.
Is there any way to fix the issue and get the Hello World out of this XML file?

Note that Firefox's behaviour is absolutely correct. someTag has three children:
A Text node containing the whitespace between the <someTag> and <![CDATA[. This is one newline and one space;
the CDATASection node itself;
another whitespace Text node containing the single newline character between the end of the CDATA and the close-tag.
It's best not to rely closely on what combination of text and CDATA nodes might exist in an element if all you want is the text value inside it. Just call textContent on <someTag> itself and you'll get all the combined text content: '\n Hello World\n'. (You can .trim() this if you like.)
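A minimal sketch of both approaches, assuming xmldoc is the parsed XML Document from the question. The helper names firstCdataText and elementText are hypothetical; the value 4 is the DOM's Node.CDATA_SECTION_NODE constant.

```javascript
// nodeType value the DOM assigns to CDATASection nodes (Node.CDATA_SECTION_NODE).
const CDATA_SECTION_NODE = 4;

// Hypothetical helper: return the text of the first CDATA child, or null.
// It skips the whitespace Text nodes instead of relying on a fixed index.
function firstCdataText(element) {
  for (const child of Array.from(element.childNodes)) {
    if (child.nodeType === CDATA_SECTION_NODE) {
      return child.textContent;
    }
  }
  return null;
}

// Hypothetical helper: combined text of the element, surrounding whitespace removed.
function elementText(element) {
  return element.textContent.trim();
}

// In a browser, with xmldoc parsed as in the question:
//   const someTag = xmldoc.getElementsByTagName("someTag")[0];
//   elementText(someTag);     // text of all children, trimmed
//   firstCdataText(someTag);  // text of the CDATA section only
```

Reading textContent on the element is usually the simpler choice; the nodeType check is only useful when the CDATA section has to be told apart from ordinary text.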

If you're running Firefox, maybe this is the issue you're having. The behavior looks very similar... The following might do the trick:
xmldoc.getElementsByTagName("someTag")[0].childNodes[1].textContent;

Related

Parsing plain text Markdown from a ContentEditable div

I know there are other questions on editable divs, but I couldn't find one specific to the Markdown-related issue I have.
The user will be typing inside a ContentEditable div, and may choose to do any number of Markdown-related things like code blocks, headers, and so on.
I am having issues extracting the source properly and storing it into my database to be displayed again later by a standard Markdown parser. I have tried two ways:
$('.content').text()
In this method, the problem is that all the line breaks are stripped out and of course that is not okay.
$('.content').html()
In this method, I can get the line breaks working fine by using regex to replace <br\> with \n before inserting into database. But the browser also wraps things like ## Heading Here with divs, like this: <div>## Heading Here</div>. This is problematic for me because when I go to display this afterwards, I don't get the proper Markdown formatting.
What's the best (most simple and reliable) way to solve this problem as of 2015?
EDIT: Found a potential solution here: http://www.davidtong.me/innerhtml-innertext-textcontent-html-and-text/

If you check the documentation of jQuery's .text() method:
The result of the .text() method is a string containing the combined text of all matched elements. (Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space.)
So preserved whitespace is not guaranteed in all browsers.
Try using the innerText property of the element:
document.getElementsByClassName('content')[0].innerText
This returns the text with line breaks and other whitespace intact. But it is not cross-browser compatible: it works in IE and Chrome, but not in Firefox.
The closest Firefox equivalent is textContent, but that does not produce line breaks for <br> elements or block boundaries, so the line breaks are lost.

This is what I've been able to come up with using the link I posted above in my edit. It's in CoffeeScript.
div = $('.content')[0]
if div.innerText
  text = div.innerText
else
  escapedText = div.innerHTML
    .replace(/(?:\r\<br\>|\r|\<br\>)/g, '\n')
    .replace(/(\<([^\>]+)\>)/gi, "")
  text = _.unescape(escapedText)
Basically, I'm checking whether or not innerText works, and if it doesn't then we do this other thing where we:
Take the HTML, which has escaped text.
Replace all the <br> tags with line breaks.
Strip out any tags (escaped ones won't be stripped, i.e. the stuff the user types).
Unescape the escaped text.
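The fallback steps above can be sketched in plain JavaScript as a string-only function. This is an illustration under assumptions, not the author's code: htmlToPlainText is a hypothetical name, a small entity table stands in for _.unescape, and, like the steps above, it does not turn <div> boundaries into line breaks.

```javascript
// Entities handled by this sketch; a stand-in for _.unescape (assumption).
const ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&amp;': '&'
};

// Hypothetical helper: turn contenteditable innerHTML into plain text.
function htmlToPlainText(html) {
  return html
    // 1. Replace <br>, <br/> and <br /> with line breaks.
    .replace(/<br\s*\/?>/gi, '\n')
    // 2. Strip remaining real tags; text the user typed is still escaped here.
    .replace(/<[^>]+>/g, '')
    // 3. Unescape entities in a single pass so nothing is unescaped twice.
    .replace(/&(?:lt|gt|quot|#39|amp);/g, (entity) => ENTITIES[entity]);
}

// In a browser this would be called with the div's innerHTML:
//   const text = htmlToPlainText(document.querySelector('.content').innerHTML);
```

Unescaping comes last so that markup typed by the user, which the browser stores as &lt;...&gt;, survives the tag-stripping step.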

How to retrieve the text in html CDATA section?

I have the following script element section in HTML:
<script type="text/x-markdown"><![CDATA[
# hello, This is Markdown Script Demo]]></script>
When I try to retrieve the inner content via scripttag.innerHTML, it returns the text including the <![CDATA[ and ]]> parts.
Is there a more efficient way to retrieve the inner part of the CDATA section at once, instead of applying a regexp to remove it from the received innerHTML data?

I don't think you will be able to retrieve only what's inside the CDATA, as it's not a tag but plain text. When you get the innerHTML of the tag you get everything as a string, so a regexp is the only way I can see to get what's inside.
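A minimal sketch of that regexp approach, assuming the whole script body is wrapped in a single CDATA section as in the question. stripCdata is a hypothetical helper name.

```javascript
// Hypothetical helper: remove one enclosing <![CDATA[ ... ]]> wrapper.
// Text without such a wrapper is returned unchanged.
function stripCdata(text) {
  const match = /^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/.exec(text);
  return match ? match[1] : text;
}

// In a browser, with scripttag referring to the <script> element:
//   const markdown = stripCdata(scripttag.innerHTML);
```

The [\s\S] class is used instead of a dot so that the match spans line breaks inside the section.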

CDATA is an XML concept. It is a way of specifying a section of text inside which things that look like mark-up or special XML characters are treated as plain text. It is essentially equivalent to escaping < to &lt; etc. everywhere within the CDATA section.
If the document has an HTML doctype, then the CDATA receives no special processing and is just more characters. If the document had an XHTML doctype, then you would be able to retrieve the CDATA section as is, with no further ado.

This question is quite old, but this might help somebody.
You can probably use textContent.
Example from parsing an RSS feed node which looks like this:
<title><![CDATA[This contains the title]]></title>
JavaScript:
const desc = el.querySelector('title').textContent;

Invalid location of <script> tag within a HTML <pre> tag

I am going through the example given in JavaScript The Complete Reference 3rd Edition.
The output, as given by the author, can be seen here.
<body>
<h1>Standard Whitespace Handling</h1>
<script>
// STRINGS AND (X)HTML
document.write("Welcome to JavaScript strings.\n");
document.write("This example illustrates nested quotes 'like this.'\n");
document.write("Note how newlines (\\n's) and ");
document.write("escape sequences are used.\n");
document.write("You might wonder, \"Will this nested quoting work?\"");
document.write(" It will.\n");
document.write("Here's an example of some formatted data:\n\n");
document.write("\tCode\tValue\n");
document.write("\t\\n\tnewline\n");
document.write("\t\\\\\tbackslash\n");
document.write("\t\\\"\tdouble quote\n\n");
</script>
<h1>Preserved Whitespace</h1>
<pre>
<script> // in Eclipse IDE, at this line invalid location of tag(script)
// STRINGS AND (X)HTML
document.write("Welcome to JavaScript strings.\n");
document.write("This example illustrates nested quotes 'like this.'\n");
document.write("Note how newlines (\\n's) and ");
document.write("escape sequences are used.\n");
document.write("You might wonder, \"Will this nested quoting work?\"");
document.write(" It will.\n");
document.write("Here's an example of some formatted data:\n\n");
document.write("\tCode\tValue\n");
document.write("\t\\n\tnewline\n");
document.write("\t\\\\\tbackslash\n");
document.write("\t\\\"\tdouble quote\n\n");
</script>
</pre>
</body>
(X)HTML automatically “collapses” multiple whitespace characters down to one whitespace. So, for example, including multiple consecutive tabs in your HTML shows up as only one space character. In this example, the pre tag is used to tell the browser that the text is preformatted and that it should not collapse the white space inside of it. Similarly, we could use the CSS white-space property to modify standard white space handling. Using pre allows the tabs in the example to be displayed correctly in the output.
So, how do I get rid of this warning, and do I really need to be concerned about it? I think I am missing something, as my intuition is that the authors are not wrong.

There is nothing wrong with having a script inside a pre tag. It is just an Eclipse IDE validation issue. If you open this HTML in the browser, everything works fine and no warnings are displayed.
Also, if you want to show a script tag as text content inside a pre tag, have a look at this question: script in pre

CKEditor setData adding P tag

When I pass HTML to a CKEditor instance, a P tag is inserted within the HTML, producing unexpected results.
For example, with the following code:
CKEDITOR.instances["myEditor"].setData("<div>1</div><div>2</div>");
the editor does not display them as block elements (it outputs "12" inline). Calling getData(), I see the HTML is reformatted incorrectly as:
"<div>
<p>
1</div><div>2</div></p>
"
I've played with the enterMode configuration based on some research but haven't found a magic combination. Any suggestions? (I am using 3.6.5)

I figured it out: we were using a regex to strip out some tags when pasting, and this was also affecting initial values.

why javascript string replace using regex removes a "/" from my br tag

I'm using javascript with a super simple regex to replace a "<" with the HTML character code for it so I can place some code on my site using the pre and code tags and have it done automatically.
jsFiddle link
basically I'm trying to figure out why this js code:
var str = document.getElementById("cleanme").innerHTML;
str=str.replace(/</g,"&lt;");
document.getElementById("cleanme").innerHTML = str;
removes the "/" in the br tag
<pre><code id="cleanme">
<p><br />this is some code</p>
</code></pre>
not a huge deal because I'm just displaying code, but I'd still like to know.
it outputs this:
<p><br>this is some code</p>
thanks

I believe it has to do with the way certain browsers return the innerHTML property. If you use Google Chrome and inspect any <br /> tag using the debugging tools, you'll notice they don't show the slash. The same is true when Chrome returns an innerHTML property: the slash is stripped out.
So when you pass in:
<pre><code id="cleanme">
<p><br />this is some code</p>
</code></pre>
The browser returns an innerHTML property of:
<pre><code id="cleanme">
<p><br>this is some code</p>
</code></pre>
Your RegEx is not the issue.

Your script is OK.
If you try this:
var str = '<p><br />this is some code</p>';
str=str.replace(/</g,"&lt;");
str=str.replace(/>/g,"&gt;");
document.getElementById("cleanme").innerHTML = str;
It'll correctly print <br />.
Possibly it's an effect of the browser's HTML normalization.
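A minimal sketch of escaping markup before it is assigned to innerHTML, so the browser shows it as text instead of parsing it. escapeHtml is a hypothetical helper name; the ampersand is replaced first so that entities already present in the input are shown literally too.

```javascript
// Hypothetical helper: escape the characters HTML treats as markup.
function escapeHtml(source) {
  return source
    .replace(/&/g, '&amp;') // must run first, before new entities are added
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// In a browser, starting from the literal source string rather than
// from a value read back out of innerHTML:
//   const source = '<p><br />this is some code</p>';
//   document.getElementById("cleanme").innerHTML = escapeHtml(source);
```

Because the input is the original string, the browser never gets a chance to normalize <br /> to <br> before it is escaped.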

Maybe too late to help you, and you've accepted a correct answer, but there's another big potential problem.
I tried this with Firefox 3.6.11 on Linux and 3.6.12 on Windows and they both behaved the same --
I did not see the <p><br>this is some code</p> in the Result pane on your fiddle, instead I saw simply this is some code with no markup at all.
Throwing firebug at it by adding a debugger; statement as the first line in the JavaScript pane and tracing through it, I found that str was getting a value of '\n', that is, just a newline was being returned from innerHTML and nothing else.
Thinking about this, but with no way to confirm it, I suspect it's because Firefox is building the DOM tree differently than you expect, because the HTML you're using is invalid. Inline elements are not allowed to contain block elements; specifically, the <code> tag is not allowed to contain a <p> tag, and <pre> is likewise not allowed to contain a <p> tag (again, only limited inline elements can be used inside a <pre> tag).
I think FF is implicitly closing the code block before opening the paragraph so the innerHTML of id="cleanme" is nothing but the newline. It renders with the "pre" font as you expect because you've thrown the browser into Quirks Mode.

innerHTML does not return the literal source code, but the result of the browser's interpretation of it.
Different browsers will return very different results for innerHTML, sometimes omitting some quotes and 'optional' end tags, capitalizing some tag names and attributes, and collapsing extra white-space.
And HTML does not close open tags that can't have end tags, so they are not included either.
