Why does this small piece of JavaScript break? [duplicate] - javascript

This question already has answers here:
Why does <!--<script> cause a DOM tree break on the browser?
(2 answers)
Closed 6 years ago.
Why does this code break the page:
<script>
var test = "<!-- <script ";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/y3w7ugaw/
and this doesn't?
<script>
var test = "<!-- <script";
</script>
<h1>
If you can see this it means the page didn't break
</h1>
https://jsfiddle.net/mL1xxygo/
It should not break, since the test variable is just a string.

Good question. The two examples differ only in that the first has a space between <script and the closing double quote, while the second does not. Both contain the character sequence <!--, which introduces a comment in HTML source, inside the JavaScript string.
The first example does not show the header. The header can be made to reappear by either
removing the <!-- characters, or
removing the space after <script in the string value.
The question linked in the comments states that the HTML is invalid, although the reason is not particularly obvious from reading the HTML parsing spec. In short, inside a script element <!-- puts the tokenizer into an escaped state, and a following <script that is terminated by whitespace, / or > moves it into a double-escaped state in which the next </script> no longer closes the element. In the second example <script is followed directly by the quote character, so that state is never entered.
A JavaScript-side solution is to put a backslash before one of the characters that confuse the HTML parser, even though that character does not normally need escaping. JavaScript ignores a backslash before an ordinary character, whereas the HTML parser no longer sees the original sequence.
Hence either
var test = "<\!-- <script ";
or
var test = "<\!-- <script";
both successfully create a string containing the HTML start comment sequence without confusing the parser.
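As a quick check of the claim that the backslash changes nothing on the JavaScript side, the sketch below compares the escaped and unescaped spellings. It is meant to be run outside an HTML page (for example in Node.js), where no HTML parser is involved; the variable names are made up for illustration.

```javascript
// Run outside of an HTML page (e.g. in Node.js), so no HTML parser sees the source.
const plain = "<!-- <script ";
// "\!" is not a recognised escape sequence, so JavaScript reads it as "!".
const escaped = "<\!-- <script ";

console.log(plain === escaped); // true
console.log(escaped.length);    // 13
```

The difference only matters to the HTML tokenizer, which works on the raw source text and therefore never sees <!-- in the escaped version.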

Related

regular expression : ignore html tags [duplicate]

This question already has answers here:
Finding substring whilst ignoring HTML tags
(3 answers)
Closed 2 years ago.
I have HTML content like this:
<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>
Here's a complete version of the HTML.
http://collabedit.com/gkuc2
I need to search for the string hardly able to cover (just an example), ignoring any HTML tags inside the string I'm looking for, because the HTML file has tags inside the string and a simple search won't find it.
The use case is: I have two versions of a file:
An HTML file with text and tags
The same file but with the raw text only (removed any tags and extra spaces)
The substring that I want to search for (the needle) comes from the text version (which doesn't contain any HTML tags), and I want to find its position in the HTML version (the file that has tags).
What is the regular expression that would work?
Put this between each letter:
(?:<[^>]+>)*
and replace the spaces with:
(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*
Like:
h(?:<[^>]+>)*a(?:<[^>]+>)*r(?:<[^>]+>)*d(?:<[^>]+>)*l(?:<[^>]+>)*y(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*a(?:<[^>]+>)*b(?:<[^>]+>)*l(?:<[^>]+>)*e(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*t(?:<[^>]+>)*o(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*c(?:<[^>]+>)*o(?:<[^>]+>)*v(?:<[^>]+>)*e(?:<[^>]+>)*r
You only need the ones between each letter if you want to allow tags to break words, as in: This is b<b>old</b>
This is it without the letter break:
hardly(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*able(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*to(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*cover
This should work for most cases. However, if the HTML is malformed, with a literal < or > that is not HTML-encoded, you may run into issues. It may also break on script blocks or other elements with CDATA sections.
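Writing these patterns by hand gets tedious, so here is a minimal sketch that builds them from the plain-text needle. The helper name buildTagTolerantRegex and its allowTagsInsideWords flag are hypothetical, chosen only for this illustration; the two sub-patterns are the ones given above.

```javascript
// Hypothetical helper: builds a tag-tolerant pattern from a plain-text needle.
function buildTagTolerantRegex(needle, allowTagsInsideWords = false) {
  // Zero or more tags, used between the letters of a word.
  const TAGS = "(?:<[^>]+>)*";
  // At least one whitespace character, optionally surrounded by tags.
  const GAP = "(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*";
  // Escape regex metacharacters so the needle is matched literally.
  const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  const words = needle.trim().split(/\s+/).map((word) => {
    const chars = Array.from(word).map(escapeRegex);
    return allowTagsInsideWords ? chars.join(TAGS) : chars.join("");
  });
  return new RegExp(words.join(GAP));
}

const html = "<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>";
const match = buildTagTolerantRegex("hardly able to cover").exec(html);

console.log(match.index); // 19
console.log(match[0]);    // hardly <strong>able to cover
```

Passing true as the second argument also allows tags inside words, so a needle of bold would match b<b>old. The same caveats about malformed HTML and script blocks apply.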
Try saving the text in a variable, then remove all the tags and perform a normal search on that.
In PHP you can use the built-in function strip_tags() for this.
EDIT:
So you might try to look for the first and last words (or just first and then play with the rest of the result) to locate the string, then parse the result and remove tags and check if it's the one you're looking for.
For example, using a regex like:
hardly.*cover
or even
hardly.*$
And saving the location of each result.
Then use strip_tags() on the results and analyze each result if it's the one you want.
I know it's a kind of weird solution, but it lets you avoid an endless regex.
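The idea can be sketched in JavaScript as follows. This is only an illustration under assumptions: stripTags is a naive stand-in for PHP's strip_tags(), not an equivalent, and findCandidates is a hypothetical name. It looks for the first and last words of the needle, strips the tags from each candidate, and keeps the candidates whose text equals the needle.

```javascript
// Naive tag stripper; a stand-in for PHP's strip_tags(), not an equivalent.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

// Escape regex metacharacters so words are matched literally.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: returns the position and length of each verified hit.
function findCandidates(html, needle) {
  const words = needle.trim().split(/\s+/);
  if (words[0] === "") return []; // empty needle: nothing to search for
  const first = escapeRegex(words[0]);
  const last = escapeRegex(words[words.length - 1]);
  // Shortest stretch from the first word to the last word.
  const source = words.length === 1 ? first : first + "[\\s\\S]*?" + last;
  const pattern = new RegExp(source, "g");
  const hits = [];
  let m;
  while ((m = pattern.exec(html)) !== null) {
    // Strip tags and collapse whitespace before comparing with the needle.
    const text = stripTags(m[0]).replace(/\s+/g, " ");
    if (text === words.join(" ")) {
      hits.push({ index: m.index, length: m[0].length });
    }
  }
  return hits;
}

const html = "<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>";
console.log(findCandidates(html, "hardly able to cover")); // [ { index: 19, length: 28 } ]
```

Because the candidate pattern stops at the first occurrence of the last word, a needle whose last word also appears earlier inside it would be missed; a fuller version would have to extend the candidate and retry.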

Prevent JS from parsing string [duplicate]

This question already has answers here:
Escaping </script> tag inside javascript
(3 answers)
Closed 8 years ago.
Was playing around with some code and just realized you can't write a script tag in a string without the browser trying to display it:
<html>
<head>
<script>
var code = "<script></script>";
</script>
</head>
This prints to the screen. Weird - why this behavior?
This has nothing to do with JavaScript "string parsing". Rather it's about HTML parsing.
It is simply not valid in HTML for a <script> element to contain the sequence </script> (actually, any </, although browsers are lenient on that) in its content: any such sequence will always be treated as the closing tag.
See Escaping </script> tag inside javascript for lots of the details.
A common solution is thus to split the sequence using string concatenation:
var code = "<script><"+"/script>";
It is also valid to use a backslash escape ("<script><\/script>") or a hex escape sequence ("<script><\x2fscript>").
The CDATA approach should not be used with HTML, as it's only for XML.
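A small sketch to confirm that the three spellings mentioned above produce the same string. It is intended to be run outside an inline script block (for example in Node.js or an external .js file), and the variable names are made up for illustration.

```javascript
// Three ways to write the same string without a literal "</script>"
// appearing in the source text of an inline script.
const concatenated = "<script><" + "/script>";
const backslashEscaped = "<script><\/script>";
const hexEscaped = "<script><\x2fscript>";

console.log(concatenated === backslashEscaped); // true
console.log(concatenated === hexEscaped);       // true
```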

Wordpress & Javascript: String variable having html tags being read by browser with newline character

I have gone crazy trying to resolve this issue.
In my JavaScript code I am defining a string variable that holds an HTML table as a string, i.e.:
var tData="<table><tbody><tr><a><th>Type</th><th>Score</th><th>Percentile</th></a></tr><tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr><tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr></tbody></table>";
Now this string assignment is being read by my browser (both Chrome and Firefox) as HTML code with line breaks.
The code works fine if I remove the HTML tags and write a simple string. So I can assure you there are no earlier quotation-mark errors (I checked multiple times) and no bogus characters.
I have spent too many hours on this issue. Please please help me on this.
EDIT
Added Wordpress in title and Tags as this is a wordpress issue.
Since your document is XHTML, you have to enclose your code into a CDATA section:
<script>
<![CDATA[
// code here
]]>
</script>
This prevents the browser from interpreting <...> sequences in the content as tags.
If you want multiline strings in JavaScript, you have to escape the newline with a backslash at the end of the line, i.e.
var str = "abc\
de";
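A minimal sketch of what that line continuation actually produces; the variable names are made up for illustration. Note that the backslash-newline pair is dropped entirely, so the resulting string contains no line break.

```javascript
// The backslash at the end of the line continues the string literal;
// the backslash and the newline are both left out of the value.
const continued = "abc\
de";
const concatenated = "abc" + "de";

console.log(continued === concatenated); // true
console.log(continued.includes("\n"));   // false
```

If the string itself should contain a line break, write \n inside the literal.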
Ok. Eureka!!!
I found a workaround. I broke the following string:
var tData="<table><tbody><tr><a><th>Type</th><th>Score</th><th>Percentile</th></a></tr><tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr><tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr></tbody></table>";
into
var tData = "<tab"+"le><tb"+"ody><t"+"r><a><t"+"h>Type</t"+"h><t"+"h>Score</t"+"h><t"+"h>Percentile</t"+"h></a></t"+"r><t"+"r><t"+"d><a>Overall</a></t"+"d><t"+"d>2.4</t"+"d><t"+"d>50%</t"+"d></t"+"r><t"+"r><t"+"d><a>Best 100</a></t"+"d><t"+"d>2.3</t"+"d><t"+"d>70%</t"+"d></t"+"r></tbo"+"dy></ta"+"ble>";
to fool the browser. I am still hoping for a better answer please.
Delete all invisible characters (whitespace) around that area,
then give it another try.
Try this:
var tData="<table><tbody>";
tData+="<tr><th><a>Type</a></th><th>Score</th><th>Percentile</th></tr>";
tData+="<tr><td><a>Overall</a></td><td>2.4</td><td>50%</td></tr>";
tData+="<tr><td><a>Best 100</a></td><td>2.3</td><td>70%</td></tr>";
tData+="</tbody></table>";
Possible duplicate: No visible cause for "Unexpected token ILLEGAL"

Why is there a backslash in the parameter element of the JavaScript object?

I was inspecting this site in Firebug. Inside the third <script/> tag in the head section of the page, I found an object variable declared in the following way (truncated here by me):
var EM={
"ajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"bookingajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"locationajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php?action=locations_search",
"firstDay":"1","locale":"en"};
The purpose of the variable is unknown to me. What struck me are the three URLs in it. Why are the backslashes there? Couldn't it be something like:
"ajaxurl" : "http://ipsos.com.au/wp-admin/admin-ajax.php"
?
In a script element there are various character sequences (depending on the version of HTML) that will terminate the element. </script> will always do this.
<\/script> will not.
Escaping / characters will not change the meaning of the JS, but will prevent any such HTML from ending the script.
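One way such output could be produced, sketched under assumptions: the function name toInlineScriptJson and the note property are hypothetical, and this is not necessarily how the site in the question generated its script. It serializes a value as JSON and escapes every / so that the text can never contain </script>.

```javascript
// Hypothetical helper: JSON text that is safe to place inside an inline script.
function toInlineScriptJson(value) {
  // "\/" is a valid escape in both JSON and JavaScript string literals.
  return JSON.stringify(value).replace(/\//g, "\\/");
}

const em = {
  ajaxurl: "http://ipsos.com.au/wp-admin/admin-ajax.php",
  note: "</script>", // made-up value that would otherwise end the script element
};

const serialized = toInlineScriptJson(em);
console.log(serialized.includes("</"));                      // false
console.log(JSON.parse(serialized).ajaxurl === em.ajaxurl);  // true
```

The escaped text decodes back to exactly the same values, which is why the extra backslashes are harmless.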
The \/ escaping is there to avoid the scenario below,
where the URL looks something like "ajaxurl" : "http://google.com/search?q=</script>"
Try copying and pasting that URL into the browser's address bar: it is handled correctly there. Inside a script block without the escaping, however, you might end up with script errors and the page might not work as expected.
Imagine a DOM manipulator inserting the value as-is into the src attribute of a script tag, and the JavaScript engine then reporting multiple errors because the referenced script could not be loaded due to the incorrectly defined src value.
Hope this helps.
Life would be hectic without these lil things
It is used to escape characters.
The backslash (\) can be used to insert apostrophes, new lines, quotes, and other special characters into a string.
var str = " Hello "World" !! ";
alert(str)
This won't work..
You have to escape them first
var str = " Hello \"World\" !! ";
alert(str); // This works
In terms of JavaScript, </ and <\/ are identical inside a string. As far as HTML is concerned, </ starts an end tag but <\/ does not.

how to extract body contents using regexp [duplicate]

This question already has answers here:
Regular Expression to Extract HTML Body Content
(6 answers)
Closed 8 years ago.
I have this code in a var.
<html>
<head>
.
.
anything
.
.
</head>
<body anything="">
content
</body>
</html>
or
<html>
<head>
.
.
anything
.
.
</head>
<body>
content
</body>
</html>
result should be
content
Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.
And for the people who see HTML and regex together and throw a fit: since you are not actually trying to parse HTML here, this is something you can do with regular expressions. If, for some reason, content contained </body>, it would fail, but aside from that you have a sufficiently specific scenario that regular expressions are capable of doing what you want:
const strVal = yourStringValue; //obviously, this line can be omitted - just assign your string to the name strVal or put your string var in the pattern.exec call below
const pattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im;
const array_matches = pattern.exec(strVal);
After the above executes, array_matches[1] will hold whatever came between the <body and </body> tags.
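For a self-contained illustration, here is a sketch that wraps the same idea in a function and runs it on a document shaped like the one in the question. The function name extractBody is hypothetical, and the pattern uses [\s\S] in place of (.|[\n\r]) as an equivalent way to match any character including line breaks.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>, or null.
function extractBody(html) {
  // [\s\S]* matches any characters, including newlines; "i" ignores tag case.
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

const doc = [
  "<html>",
  "<head>",
  "<title>anything</title>",
  "</head>",
  '<body anything="">',
  "content",
  "</body>",
  "</html>",
].join("\n");

console.log(extractBody(doc).trim()); // content
```

The function returns null when no body element is found, so callers should check the result before using it.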
var matched = xhr.responseText.match(/<body[^>]*>([\w\W]*)<\/body>/im); // xhr: your completed XMLHttpRequest instance (name assumed)
alert(matched[1]);
I believe you can load your HTML document into the .NET HTMLDocument object and then simply call HTMLDocument.body.innerHTML?
I am sure there is an even easier way with the newer XDocument as well.
And just to echo some of the comments above: regex is not the best tool here, as HTML is not a regular language and there are edge cases that are difficult to handle.
https://en.wikipedia.org/wiki/Regular_language
Enjoy!
