Prevent JS from parsing string [duplicate] - javascript

This question already has answers here:
Escaping </script> tag inside javascript
(3 answers)
Closed 8 years ago.
I was playing around with some code and just realized you can't write a script tag in a string without the browser trying to display part of it:
<html>
<head>
<script>
var code = "<script></script>";
</script>
</head>
This prints to the screen. Weird - why this behavior?

This has nothing to do with JavaScript "string parsing". Rather it's about HTML parsing.
It is simply not valid in HTML for a <script> element to contain the sequence </script> in its content (strictly speaking, any </ sequence, although browsers are lenient about that). Any such sequence will always be treated as the closing tag.
See Escaping </script> tag inside javascript for lots of the details.
A common solution is thus to split the sequence using string concatenation:
var code = "<script><"+"/script>";
It is also valid to use an escaped slash ("<script><\/script>") or a hex escape sequence ("<script><\x2fscript>").
The CDATA approach should not be used with HTML, as it's only for XML.
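As a quick check of the three spellings mentioned above, here is a minimal sketch (the variable names are only for illustration). All three produce the same string at run time, and none of them puts the raw closing-tag sequence into the source text:

```javascript
// Three ways to write the same string without the raw closing-tag sequence.
const joined = "<script><" + "/script>";    // string concatenation
const slashEscaped = "<script><\/script>";  // backslash before the slash
const hexEscaped = "<script><\x2fscript>";  // \x2f is the character code of "/"

console.log(joined === slashEscaped && slashEscaped === hexEscaped); // true
console.log(joined.length); // 17
```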

Related

Invalid or unexpected token in javascript object [duplicate]

This question already has answers here:
Why split the <script> tag when writing it with document.write()?
(5 answers)
Closed 8 years ago.
I am encountering an issue where having an ending script tag inside a quoted string in JavaScript kills the script. I assume this is not expected behaviour. An example of this can be seen here: http://jsbin.com/oqepe/edit
My test case browser for the interested: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4) Gecko/20091028 Ubuntu/9.10 (karmic) Firefox/3.5.4.
What happens?
The browser HTML parser will see the </script> within the string and it will interpret it as the end of the script element.
Look at the syntax coloring of this example:
<script>
var test = 'foo... </script> bar.....';
</script>
Note that the word bar is being treated as text content outside of the script element...
A commonly used technique is to use the concatenation operator:
var test = '...... </scr'+'ipt>......';
You need to escape it, else it will be a part of the HTML.
var test = 'what the hell... \<\/script\> \<h1\>why?!?!?!\<\/h1\>';
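If the string comes from data rather than being typed by hand, the escaping can be automated. The following is only a sketch under assumptions: escapeScriptEndTags is a hypothetical helper name, and the code is meant to run where no HTML parser reads it (for example in Node while generating a page), since the sample input contains the raw sequence.

```javascript
// Hypothetical helper (not from the answers above): rewrites every "</script"
// in generated JavaScript source as "<\/script", keeping the original letter case.
function escapeScriptEndTags(source) {
  return source.replace(/<\/(script)/gi, "<\\/$1");
}

// JSON.stringify produces a valid JavaScript string literal for the value.
const literal = JSON.stringify("foo... </script> bar.....");
const safeLiteral = escapeScriptEndTags(literal);

console.log(safeLiteral); // "foo... <\/script> bar....."
// The escaped literal still decodes to the original text.
console.log(JSON.parse(safeLiteral) === "foo... </script> bar....."); // true
```

The backslash only changes how the source is spelled; the JavaScript engine reads \/ as a plain slash, so the resulting string value is unchanged.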

why this small piece of JavaScript breaks? [duplicate]

This question already has answers here:
Why does <!--<script> cause a DOM tree break on the browser?
(2 answers)
Closed 6 years ago.
Why does this code break:
<script>
var test = "<!-- <script ";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/y3w7ugaw/
and this one doesn't?
<script>
var test = "<!-- <script";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/mL1xxygo/
It should not break, since the test var is a string.
Good question. The two examples are not the same in that the first has a space between <script and the following closing double quote while the second does not. Both examples have the character sequence <!--, used to introduce comments in HTML source, inside the javascript string.
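The first example does not show the header, which can be made to reappear by either
removing the <!-- characters, or
removing the space after <script in the string value.
The question alluded to in the comments states that the HTML is invalid, although the reason is not particularly obvious from reading the HTML parsing spec. In short, once the tokenizer has seen <!-- inside a script element, a following <script that is terminated by whitespace, / or > puts it into a "double escaped" state, in which the next </script> is treated as script text rather than as the end tag. In the second example <script is followed by a double quote, so that state is never entered.
A JavaScript solution is to put a backslash before one of the characters that confuse the parser, even though the character does not normally need escaping. JavaScript ignores backslashes before ordinary characters, whilst the HTML parser does not.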
Hence either
var test = "<\!-- <script ";
or
var test = "<\!-- <script";
both successfully create a string containing the HTML start comment sequence without confusing the parser.
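To make the difference between the variants easier to see, here is a rough sketch. The function mayHideScriptEnd is hypothetical and only approximates the tokenizer behaviour with a regular expression; it ignores edge cases and is not a substitute for a real HTML parser.

```javascript
// Rough approximation (an assumption, not the full tokenizer): after "<!--",
// a "<script" followed by whitespace, "/" or ">" and not preceded by "-->"
// makes the HTML parser treat the next script end tag as plain script text.
function mayHideScriptEnd(scriptText) {
  return /<!--(?:(?!-->)[\s\S])*?<script[\t\n\f\r \/>]/i.test(scriptText);
}

console.log(mayHideScriptEnd('var test = "<!-- <script ";'));   // true
console.log(mayHideScriptEnd('var test = "<!-- <script";'));    // false
console.log(mayHideScriptEnd('var test = "<\\!-- <script ";')); // false

// The backslash does not change the resulting string value.
console.log("<\!-- <script " === "<!-- <script "); // true
```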

Why use *//<![CDATA[* and *//]]>* in a jQuery script? [duplicate]

This question already has answers here:
When is a CDATA section necessary within a script tag?
(15 answers)
Closed 9 years ago.
I have this working piece of javascript:
<script type="text/javascript">
//<![CDATA[
jQuery(document).ready(function() {
jQuery("#page_template option[value='sidebar-page.php']").remove();
});
//]]>
</script>
What do //<![CDATA[ and //]]> stand for? I have never used them, but lately I see them very often.
Thank you guys in advance for increasing my knowledge! ;)
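CDATA is used to allow the document to be loaded as straight XML. Inside a CDATA section you can embed JS in an XML document without replacing special XML characters such as <, > and & with the XML entities &lt;, &gt; and &amp;, so the XML syntax does not get corrupted.
The double slash // in front of each marker is a JavaScript line comment. When the page is parsed as HTML, where CDATA sections are not recognized inside a script element, the JavaScript engine skips the marker lines as comments, while an XML parser still recognizes the CDATA section.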
The wiki says that:-
In an XML document or external parsed entity, a CDATA section is a
section of element content that is marked for the parser to interpret
as only character data, not markup. A CDATA section is merely an
alternative syntax for expressing character data; there is no semantic
difference between character data that manifests as a CDATA section
and character data that manifests as in the usual syntax in which <
and & would be represented by &lt; and &amp;, respectively.
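To illustrate what the CDATA wrapper saves you from, here is a small sketch of the alternative: escaping the script text by hand so it can sit in an XML document as ordinary character data. The helper name escapeXmlText and the sample script are assumptions made for this example.

```javascript
// Hypothetical helper: replaces the characters that are special in XML text
// with their entities. "&" must be replaced first so that the entities
// produced for "<" and ">" are not escaped a second time.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const script = "if (a < b && b > c) { run(); }";
console.log(escapeXmlText(script)); // if (a &lt; b &amp;&amp; b &gt; c) { run(); }
```

Inside a CDATA section the original line could stay as it is, which is why the wrapper shows up in pages meant to be valid XHTML.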

how to extract body contents using regexp [duplicate]

This question already has answers here:
Regular Expression to Extract HTML Body Content
(6 answers)
Closed 8 years ago.
I have this code in a var.
<html>
<head>
.
.
anything
.
.
</head>
<body anything="">
content
</body>
</html>
or
<html>
<head>
.
.
anything
.
.
</head>
<body>
content
</body>
</html>
result should be
content
Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.
And for the people who see HTML and regex together and throw a fit...Since you are not actually trying to parse HTML with this, it is something you can do with regular expressions. If, for some reason, content contained </body> then it would fail, but aside from that, you have a sufficiently specific scenario that regular expressions are capable of doing what you want:
// Obviously, this line can be omitted: just assign your string to the name strVal,
// or pass your string variable to the pattern.exec call below.
const strVal = yourStringValue;
const pattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im;
const array_matches = pattern.exec(strVal);
After the above executes, array_matches[1] will hold whatever came between the <body and </body> tags.
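A self-contained variant of the same idea, as a sketch only: extractBody is a hypothetical wrapper, the sample document is made up, and [\s\S] is swapped in for (.|[\n\r]) as a more common way to match any character including newlines.

```javascript
// Hypothetical wrapper around the regex approach: returns the text between the
// opening body tag (with any attributes) and the last closing body tag,
// or null when no body element is found.
function extractBody(html) {
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

const sample = '<html><head><title>t</title></head><BODY anything="">\ncontent\n</body></html>';
console.log(JSON.stringify(extractBody(sample)));   // "\ncontent\n"
console.log(extractBody("<p>no body element</p>")); // null
```

Checking for null before reading index 1 avoids a TypeError when the input has no body element.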
// xhr is assumed to be an XMLHttpRequest instance whose request has completed.
var matched = xhr.responseText.match(/<body[^>]*>([\w\W]*)<\/body>/im);
alert(matched[1]);
I believe you can load your HTML document into the .NET HTMLDocument object and then simply call HTMLDocument.body.innerHTML.
I am sure there is an even easier way with the newer XDocument as well.
And just to echo some of the comments above: regex is not the best tool to use here, as HTML is not a regular language and there are some edge cases that are difficult to solve.
https://en.wikipedia.org/wiki/Regular_language
Enjoy!
