This question already has answers here:
Why split the <script> tag when writing it with document.write()?
(5 answers)
Closed 8 years ago.
I am encountering an issue where a closing script tag inside a quoted string in JavaScript kills the script. I assume this is not expected behaviour. An example can be seen here: http://jsbin.com/oqepe/edit
My test case browser for the interested: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4) Gecko/20091028 Ubuntu/9.10 (karmic) Firefox/3.5.4.
What happens?
The browser HTML parser will see the </script> within the string and it will interpret it as the end of the script element.
Look at the syntax coloring of this example:
<script>
var test = 'foo... </script> bar.....';
</script>
Note that the word bar is being treated as text content outside of the script element...
A commonly used technique is to use the concatenation operator:
var test = '...... </scr'+'ipt>......';
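You need to escape it, or else the HTML parser will treat it as markup. Only the backslash in \/ is strictly needed to break up </script>; the other backslashes are harmless, because JavaScript ignores a backslash before these characters.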
var test = 'what the hell... \<\/script\> \<h1\>why?!?!?!\<\/h1\>';
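When the string is not known in advance, the same escaping idea can be automated. The following is a minimal sketch, assuming a hypothetical helper named toInlineScriptLiteral that is not part of either answer: it serializes a value with JSON.stringify and then replaces every < with the \u003c escape, so that the sequence </script never appears in the page source.

```javascript
// Hypothetical helper (an assumption, not from the answers above):
// turn a value into a literal that is safe to paste into an inline <script>.
function toInlineScriptLiteral(value) {
  // JSON.stringify produces a quoted, escaped literal. Replacing each "<"
  // with the six characters \u003c keeps "</script" out of the output while
  // the literal still evaluates to the original text.
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const original = "foo... </script> bar.....";
const literal = toInlineScriptLiteral(original);

console.log(literal);
console.log(JSON.parse(literal) === original);
```

The escaped literal round-trips to the original string, which is why this kind of replacement is often preferred over splitting the string by hand.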
Related
This question already has answers here:
Regular expression to get a string between two strings in Javascript
(13 answers)
Closed 5 years ago.
I'm developing a Chrome and Firefox extension and I'm stuck on matching a certain tag and the content inside it. Can you please help me out?
Code:
[QUOTE=UserAdmin;22061013]
[SIZE="4"]
[LEFT]
[COLOR="DarkGreen"] Sample text goes here [/COLOR]
[/LEFT]
[/SIZE]
[/QUOTE]
Here I'd like to match the beginning, [QUOTE=, because everything that comes after it will be different each time, and then everything up to the closing [/QUOTE] tag.
I'm not a regex expert, and here is what I've come up with:
const regex = /^(\[QUOTE=)/;
const str = '[QUOTE=UserAdmin;22061013][SIZE="4"] [LEFT] [COLOR="DarkGreen"] Sample text goes here [/COLOR] [/LEFT] [/SIZE][/QUOTE]';
It matched successfully, but I'm not sure this is the correct way of doing it.
I would highly appreciate a regex that matches whatever is inside the [QUOTE=...]...[/QUOTE] tag and saves it for later use.
Try this
\[QUOTE=[\s\S]+\[\/QUOTE\]
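One caveat: the pattern above is greedy, so if a post contains several quote blocks it matches from the first [QUOTE= to the last [/QUOTE]. A minimal sketch of a lazy variant with capture groups follows; the sample text and the names post, quotePattern and quotes are made up for illustration, and nested quotes are not handled.

```javascript
// Made-up sample input: two quote blocks, so greedy and lazy matching differ.
const post =
  '[QUOTE=UserAdmin;22061013][SIZE="4"]first[/SIZE][/QUOTE] reply ' +
  '[QUOTE=Other;1]second[/QUOTE]';

// \[QUOTE=([^\]]*)\]  opening tag; group 1 is everything after "=".
// ([\s\S]*?)          group 2 is the body, matched lazily across newlines.
// \[\/QUOTE\]         the nearest closing tag.
const quotePattern = /\[QUOTE=([^\]]*)\]([\s\S]*?)\[\/QUOTE\]/g;

// Collect every quote block so the pieces can be used later.
const quotes = [];
for (const match of post.matchAll(quotePattern)) {
  quotes.push({ attribution: match[1], body: match[2] });
}

console.log(quotes);
```

Each entry keeps the text after [QUOTE= and the body separately, which covers the "save it for later use" part of the question.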
This question already has answers here:
Why does <!--<script> cause a DOM tree break on the browser?
(2 answers)
Closed 6 years ago.
Why does this code break:
<script>
var test = "<!-- <script ";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/y3w7ugaw/
and this doesn't?
<script>
var test = "<!-- <script";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/mL1xxygo/
It should not break, since test is just a string.
Good question. The two examples differ in that the first has a space between <script and the closing double quote, while the second does not. Both examples have the character sequence <!--, used to introduce comments in HTML source, inside the JavaScript string.
The first example does not show the header, which can be made to reappear by either
removing the <!-- characters, OR
by removing the space after <script in the string value.
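The question alluded to in the comments states that the HTML is invalid, although reading the HTML parsing spec does not make the reason particularly obvious. In short, inside a script element the tokenizer treats <!-- as the start of an escaped section, and a following <script that is terminated by whitespace, / or > moves it into a "double escaped" state in which the next </script> no longer closes the element.
A JavaScript solution is to put a backslash before one of the characters that confuse the parser, even though that character does not normally need escaping. JavaScript ignores a backslash before an ordinary character, whilst the HTML parser sees it as one more character that breaks up the sequence.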
Hence either
var test = "<\!-- <script ";
or
var test = "<\!-- <script";
both successfully create a string containing the HTML start comment sequence without confusing the parser.
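The claim that the backslash changes nothing on the JavaScript side can be checked outside of any HTML page, for example in Node, where no HTML parser is involved. A small sketch, with variable names chosen only for illustration:

```javascript
// Run as a standalone script (no HTML parser involved), so only
// JavaScript's own string-literal rules apply.
const plainComment = "<!-- <script ";
const escapedComment = "<\!-- <script "; // "\!" is an identity escape for "!"

// Both literals produce exactly the same 13-character string.
console.log(plainComment === escapedComment);
console.log(escapedComment.length);
```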
This question already has answers here:
Escaping </script> tag inside javascript
(3 answers)
Closed 8 years ago.
I was playing around with some code and realized you can't write a script tag in a string without the browser trying to display part of it:
<html>
<head>
<script>
var code = "<script></script>";
</script>
</head>
This prints to the screen. Weird - why this behavior?
This has nothing to do with JavaScript "string parsing". Rather it's about HTML parsing.
It is simply not valid in HTML for a <script> element to contain the sequence </script> in its content (strictly, under HTML 4, any </ followed by a letter, although browsers are lenient on that); any such sequence will always be treated as the closing tag.
See Escaping </script> tag inside javascript for lots of the details.
A common solution is thus to separate the sequence using string concatenation:
var code = "<script><"+"/script>";
It is also valid to use an escape ("<script><\/script>") or an escape sequence ("<script><\x2fscript>").
The CDATA approach should not be used with HTML, as it's only for XML.
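As a quick check, the three spellings mentioned above produce the same string at run time; they differ only in what the HTML parser sees in the source. A sketch with illustrative variable names:

```javascript
// Three ways to write the same string without the literal
// character sequence "<" + "/script" appearing in the source text.
const concatenated = "<script><" + "/script>";
const backslashEscaped = "<script><\/script>"; // "\/" is just "/"
const hexEscaped = "<script><\x2fscript>";     // "\x2f" is the code for "/"

console.log(concatenated === backslashEscaped);
console.log(backslashEscaped === hexEscaped);
console.log(concatenated.length);
```

Which form to use is largely a matter of taste; the backslash form is the shortest and keeps the string readable.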
This question already has answers here:
Regular Expression to Extract HTML Body Content
(6 answers)
Closed 8 years ago.
I have this code in a var.
<html>
<head>
.
.
anything
.
.
</head>
<body anything="">
content
</body>
</html>
or
<html>
<head>
.
.
anything
.
.
</head>
<body>
content
</body>
</html>
The result should be
content
Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.
And for the people who see HTML and regex together and throw a fit...Since you are not actually trying to parse HTML with this, it is something you can do with regular expressions. If, for some reason, content contained </body> then it would fail, but aside from that, you have a sufficiently specific scenario that regular expressions are capable of doing what you want:
const strVal = yourStringValue; // assign your HTML string here, or pass it directly to pattern.exec below
const pattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im;
const array_matches = pattern.exec(strVal);
After the above executes, array_matches[1] will hold whatever came between the opening <body ...> and closing </body> tags. If nothing matched, array_matches is null, so check for that before indexing.
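The pattern above can be wrapped in a small function with the null check built in. This is only a sketch: extractBody is a hypothetical name, [\s\S] is swapped in for (.|[\n\r]) as an equivalent way to match across newlines, and the result is trimmed so that both sample documents from the question yield just "content".

```javascript
// Hypothetical helper: return the text between <body ...> and </body>,
// or null if the input has no body element.
function extractBody(html) {
  // "i" makes the tag names case-insensitive; [\s\S]* spans newlines and is
  // greedy, so it runs to the last </body> in the input.
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1].trim() : null;
}

const withAttribute =
  '<html>\n<head>\nanything\n</head>\n<body anything="">\ncontent\n</body>\n</html>';
const upperCaseTags =
  "<HTML>\n<HEAD>\nanything\n</HEAD>\n<BODY>\ncontent\n</BODY>\n</HTML>";

console.log(extractBody(withAttribute));
console.log(extractBody(upperCaseTags));
console.log(extractBody("<p>no body here</p>"));
```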
// xhr is assumed to be your completed XMLHttpRequest instance
var matched = xhr.responseText.match(/<body[^>]*>([\w\W]*)<\/body>/im);
alert(matched[1]);
I believe you can load your HTML document into the .NET HTMLDocument object and then simply call HTMLDocument.body.innerHTML?
I am sure there is an even easier way with the newer XDocument as well.
And just to echo some of the comments above: regex is not the best tool to use here, as HTML is not a regular language and there are some edge cases that are difficult to solve for.
https://en.wikipedia.org/wiki/Regular_language
Enjoy!