Why use *//<![CDATA[* and *//]]>* in a jQuery script? [duplicate] - javascript

This question already has answers here:
When is a CDATA section necessary within a script tag?
(15 answers)
Closed 9 years ago.
I have this working piece of javascript:
<script type="text/javascript">
//<![CDATA[
jQuery(document).ready(function() {
jQuery("#page_template option[value='sidebar-page.php']").remove();
});
//]]>
</script>
What's //<![CDATA[ and //]]> stand for? I never used it but lately I meet it very often.
Thank you guys in advance for increasing my knowledge! ;)

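CDATA is used to allow the document to be loaded as straight XML. Inside a CDATA section you can embed JS in an XML document without replacing special XML characters such as <, > and & with the entities &lt;, &gt; and &amp;, which would otherwise be needed to keep the XML syntax from being corrupted.
The double slash // in front of each marker is a JavaScript line comment: when the page is parsed as HTML, the script engine skips the <![CDATA[ and ]]> markers, while an XML parser still recognizes them as the start and end of a CDATA section.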
The wiki says that:-
In an XML document or external parsed entity, a CDATA section is a
section of element content that is marked for the parser to interpret
as only character data, not markup. A CDATA section is merely an
alternative syntax for expressing character data; there is no semantic
difference between character data that manifests as a CDATA section
and character data that manifests as in the usual syntax in which <
and & would be represented by &lt; and &amp;, respectively.

Related

why this small piece of JavaScript breaks? [duplicate]

This question already has answers here:
Why does <!--<script> cause a DOM tree break on the browser?
(2 answers)
Closed 6 years ago.
Why this code breaks:
<script>
var test = "<!-- <script ";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/y3w7ugaw/
and this doesn't
<script>
var test = "<!-- <script";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/mL1xxygo/
It should not break, since the test variable is just a string.
Good question. The two examples are not the same in that the first has a space between <script and the following closing double quote while the second does not. Both examples have the character sequence <!--, used to introduce comments in HTML source, inside the javascript string.
The first example does not show the header, which can be made to reappear by either
removing the <!-- characters, OR
by removing the space after <script in the string value.
The question alluded to in comment states that the HTML is invalid although reading the HTML parsing spec does not make the reason particularly obvious.
A javascript solution is to escape characters confusing the parser with a backslash, even though the character does not normally need escaping. JavaScript ignores backslashes before ordinary characters whilst the parser does not.
Hence either
var test = "<\!-- <script ";
or
var test = "<\!-- <script";
both successfully create a string containing the HTML start comment sequence without confusing the parser.
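The claim that JavaScript drops a backslash in front of an ordinary character can be checked directly. A minimal sketch (the variable names are illustrative and not from the original answer):

```javascript
// "\!" is not a recognized escape sequence, so JavaScript keeps only the "!".
const escaped = "<\!-- <script ";

// Build the same text from character codes, so the raw comment opener
// never appears literally in the source.
const expected =
  String.fromCharCode(60, 33, 45, 45) + " " + String.fromCharCode(60) + "script ";

console.log(escaped === expected); // true
```

The HTML parser sees <\!-- in the source and does not treat it as a comment opener, while the script still ends up with the intended string.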

regular expression : ignore html tags [duplicate]

This question already has answers here:
Finding substring whilst ignoring HTML tags
(3 answers)
Closed 2 years ago.
I have HTML content like this:
<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>
Here's a complete version of the HTML.
http://collabedit.com/gkuc2
I need to search for the string hardly able to cover (just an example), and I want to ignore any HTML tags inside the string I'm looking for, because in the HTML file there are HTML tags inside the string and a simple search won't find it.
The use case is: I have two versions of a file:
An HTML file with text and tags
The same file but with the raw text only (removed any tags and extra spaces)
The sub-string that I want to search for (the needle) is from the text version (that doesn't contain any HTML tag) and I want to find its position in the HTML version (the file that has tags).
What is the regular expression that would work?
Put this between each letter:
(?:<[^>]+>)*
and replace the spaces with:
(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*
Like:
h(?:<[^>]+>)*a(?:<[^>]+>)*r(?:<[^>]+>)*d(?:<[^>]+>)*l(?:<[^>]+>)*y(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*a(?:<[^>]+>)*b(?:<[^>]+>)*l(?:<[^>]+>)*e(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*t(?:<[^>]+>)*o(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*c(?:<[^>]+>)*o(?:<[^>]+>)*v(?:<[^>]+>)*e(?:<[^>]+>)*r
you only need the ones between each letter if you want to allow tags to break words, like: This is b<b>old</b>
This is it without the letter break:
hardly(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*able(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*to(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*cover
This should work for most cases. However, if the HTML is malformed, with a literal < or > that is not HTML-encoded, you may run into issues. It may also break on script blocks or other elements with CDATA sections.
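Writing these patterns by hand gets tedious, so here is a rough sketch of how they could be generated from the plain-text needle. The helper names (escapeRegExp, buildTagTolerantRegExp) and the allowTagsInsideWords flag are made up for illustration; the two sub-patterns are the ones given above.

```javascript
// Sub-patterns taken from the answer above.
const TAGS = "(?:<[^>]+>)*";
const SPACE = "(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*";

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turns a plain-text needle into a tag-tolerant RegExp.
function buildTagTolerantRegExp(needle, allowTagsInsideWords) {
  const words = needle.trim().split(/\s+/).map(function (word) {
    const chars = Array.from(word).map(escapeRegExp);
    // Only put the tag pattern between letters if tags may split a word.
    return chars.join(allowTagsInsideWords ? TAGS : "");
  });
  return new RegExp(words.join(SPACE));
}

const html = "<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>";
const match = buildTagTolerantRegExp("hardly able to cover", true).exec(html);

console.log(match[0]);    // hardly <strong>able to cover
console.log(match.index); // position of the needle inside the HTML version
```

The same caveats apply as for the hand-written pattern: this is not an HTML parser, so malformed markup and script blocks can still trip it up.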
Try to save the text in a variable or something, then remove all the tags and perform a normal search in that.
You can use a simple php function strip_tags().
EDIT:
So you might try to look for the first and last words (or just first and then play with the rest of the result) to locate the string, then parse the result and remove tags and check if it's the one you're looking for.
Like using regex:
hardly.*cover
or even
hardly.*$
And saving the location of each result.
Then use strip_tags() on the results and analyze each result if it's the one you want.
I know it's kinda weird solution but you can avoid endless regex etc.

Prevent JS from parsing string [duplicate]

This question already has answers here:
Escaping </script> tag inside javascript
(3 answers)
Closed 8 years ago.
Was playing around with some code and just realized you can't write a script tag in a string without the browser trying to display:
<html>
<head>
<script>
var code = "<script></script>";
</script>
</head>
This prints to the screen. Weird - why this behavior?
This has nothing to do with JavaScript "string parsing". Rather it's about HTML parsing.
It is simply not valid in HTML for a <script> element to contain the sequence </script> (actually, any </ although browsers are lenient on that) in its content - any such sequence will always be treated as the closing tag.
See Escaping </script> tag inside javascript for lots of the details.
A common solution is thus to separate the sequence using string concatenation
var code = "<script><"+"/script>";
Although it is also valid to use an escape ("<script><\/script>") or an escape sequence ("<script><\x2fscript>").
The CDATA approach should not be used with HTML, as it's only for XML.
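As a quick sanity check, the three spellings mentioned above all produce the same string at runtime; only the source text seen by the HTML parser differs. A small sketch (variable names are illustrative):

```javascript
// Three ways to keep the literal closing-tag sequence out of the source text.
const viaConcat = "<script><" + "/script>";
const viaBackslash = "<script><\/script>";
const viaHexEscape = "<script><\x2fscript>";

console.log(viaConcat === viaBackslash && viaBackslash === viaHexEscape); // true
```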

Wordpress & Javascript: String variable having html tags being read by browser with newline character

I have gone crazy trying to resolve this issue.
In my javascript code I am defining a string variable in which I am putting an HTML table in the form of a string, i.e.:
var tData="<table><tbody><tr><a><th>Type</th><th>Score</th><th>Percentile</th></a></tr><tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr><tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr></tbody></table>";
Now this variable assignment through the string is being read by my browser (both chrome and firefox) as an HTML code with line breaks. Take a look at the image below for more clarity.
The code works fine if I remove html tags and write a simple string. So I can assure you there are no previous inverted comma errors (i checked them multiple times) and no bogus characters.
I have spent too many hours on this issue. Please please help me on this.
EDIT
Added Wordpress in title and Tags as this is a wordpress issue.
Since your document is XHTML, you have to enclose your code into a CDATA section:
<script>
<![CDATA[
// code here
]]>
</script>
This prevents the browser from interpreting <...> sequences in the content as tags.
If you want multiline strings in JavaScript, you have to escape the newline, i.e.
var str = "abc\
de";
Ok. Eureka!!!
I found a workaround. I broke the following string:
var tData="<table><tbody><tr><a><th>Type</th><th>Score</th><th>Percentile</th></a></tr><tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr><tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr></tbody></table>";
into
var tData = "<tab"+"le><tb"+"ody><t"+"r><a><t"+"h>Type</t"+"h><t"+"h>Score</t"+"h><t"+"h>Percentile</t"+"h></a></t"+"r><t"+"r><t"+"d><a>Overall</a></t"+"d><t"+"d>2.4</t"+"d><t"+"d>50%</t"+"d></t"+"r><t"+"r><t"+"d><a>Best 100</a></t"+"d><t"+"d>2.3</t"+"d><t"+"d>70%</t"+"d></t"+"r></tbo"+"dy></ta"+"ble>";
to fool the browser. I am still hoping for a better answer please.
Delete all invisible characters (whitespace) around that area,
then give it another try.
Try this:
var tData="<table><tbody>";
tData+="<tr><th><a>Type</a></th><th>Score</th><th>Percentile</th></tr>";
tData+="<tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr>";
tData+="<tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr>";
tData+="</tbody></table>";
Possible Duplicate No visible cause for "Unexpected token ILLEGAL"

Is it necessary to "escape" character "<" and ">" for javascript string?

Sometimes, the server side will generate strings to be embedded in inline JavaScript code. For example, suppose "UserName" is generated by ASP.NET. Then it looks like this:
<script>
var username = "<%UserName%>";
</script>
This is not safe, because a user can set his/her name to
</script><script>alert('bug')</script></script>
This is an XSS vulnerability.
So, basically, the code should be:
<script>
var username = "<% JavascriptEncode(UserName)%>";
</script>
What JavascriptEncode does is add the character "\" before "/", "'" and """. So the output HTML looks like this:
var username = "<\/script><script>alert(\'bug\')<\/script><\/script>";
The browser will not interpret "<\/script>" as the end of the script block. So XSS is avoided.
However, there are still "<" and ">" there. It is suggested to escape these two characters as well. First of all, I don't believe it is a good idea to change "<" to "&lt;" and ">" to "&gt;" here. And I'm not sure that changing "<" to "\<" and ">" to "\>" is recognized by all browsers. It seems it is not necessary to do further encoding for "<" and ">".
Is there any suggestion on this?
Thanks.
The problem has different answers depending on what markup language you are using.
If you are using HTML, then you must not represent them with entities as script elements are marked as containing CDATA.
If you are using XHTML, then you may represent them as CDATA with explicit CDATA markers, or you may represent them with entities.
If you are using XHTML, but serving it as text/html, then you need to write something which conforms to the rules of XHTML but still works with a text/html parser. This generally means using explicit CDATA markers and commenting them out in JavaScript.
<script type="text/javascript">
// <![CDATA[
…
// ]]>
</script>
A while ago, I wrote a bit about the hows and whys of this.
No, you should not escape < and > using HTML entities inside <script> in HTML.
Use JavaScript string escaping rules (replace \ with \\ and " with \")
and replace all occurrences of </ with <\/, to prevent escaping out of the <script> element.
In XHTML it's more complicated.
If you send XHTML as XML (the way that's incompatible with IE) and don't use CDATA block, then you need to escape entities, in addition to JavaScript string escaping.
If you send XHTML as XML and use CDATA block, then don't escape entities, but replace ]]> with ]]]]><![CDATA[> to prevent escaping out of it (in addition to JavaScript string escaping).
If you send XHTML as text/html (what 99% of people do) then you have to use XML CDATA block, XML CDATA escaping and HTML escaping all at once.
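To make two of the rules above concrete, here is a rough sketch of the plain HTML case and of the ]]> replacement for the CDATA case. Both function names are hypothetical, and JSON.stringify is used here as a shortcut for the "replace \ with \\ and " with \"" step; it is not what the answer itself prescribes.

```javascript
// Hypothetical helper for the HTML case: returns a quoted JavaScript string literal.
// JSON.stringify escapes \ and " (plus control characters); the replace then
// breaks up every "</" so the value cannot close the surrounding <script> element.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value).replace(/<\//g, "<\\/");
}

// Hypothetical helper for the XML + CDATA case: splits any "]]>" across
// two adjacent CDATA sections, as described above.
function escapeForCdata(text) {
  return text.replace(/\]\]>/g, "]]]]><![CDATA[>");
}

const userName = "</script><script>alert('bug')</script></script>";
const literal = toInlineScriptLiteral(userName);

console.log(literal.includes("</"));          // false
console.log(JSON.parse(literal) === userName); // true
console.log(escapeForCdata("a]]>b"));          // a]]]]><![CDATA[>b
```

On the server the same two replacements would be done in whatever language renders the page; the JavaScript version is only meant to show the transformations.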
The cheap and easy way:
<script type="text/javascript">
var username = "<%= Encode(UserName) %>";
</script>
where the encoding scheme in Encode is to translate each character of input into the associated \uXXXX representation compatible with JavaScript.
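Such an Encode function could look roughly like this, assuming the four-hex-digit \uXXXX escape form that JavaScript string literals accept. The name encodeAsUnicodeEscapes is made up for the example, and in practice the function would live on the server side.

```javascript
// Hypothetical sketch: turn every UTF-16 code unit into a \uXXXX escape,
// so the output contains no quotes, slashes or angle brackets at all.
function encodeAsUnicodeEscapes(value) {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    out += "\\u" + value.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

console.log(encodeAsUnicodeEscapes("</")); // \u003c\u002f
```

Because every character is escaped, the result can be dropped between double quotes in an inline script without any further checks, at the cost of a six-fold increase in size.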
Another cheap and easy way:
<script type="text/javascript">
var username = decodeBase64("<%= EncodeBase64(UserName) %>");
</script>
if you are dealing only with ASCII.
Of course, pst hit the nail on the head with the strict way of doing it.
