Javascript syntax and HTML entities

Javascript syntax and HTML entities - javascript

I work on some crawler software of my own and one of my users just reported it will not work with code like this:
onclick="document.location.href = 'http://www.example.com/somepage.aspx'; return false;"
i.e. inside Javascript code use ' instead of ' to designate start end end of string
What surprises me is that browsers I have tested do not report any JavaScript syntax errors... And it seems to work when I click at it... I must be having a brain meltdown - is ' around a string really legitimate Javascript code?

is ' around a string really legitimate Javascript code?
No. The browser will decode the character references before evaluating the value as JavaScript.

Related

How to filter emojis from string jquery/javascript?

I'm using the following to exclude emojis/emoticons from a string in php. How do I do the same with javascript or jQuery?
preg_replace('/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);
This is what I try to do
$('#edit.popup .btn.save').live('click',function(e) {
var item_id = $(this).attr('id');
var edited_text = $('#edit.popup textarea').val().replace(/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u, '');
$('#grid li.image#' + item_id + ' img').attr('data-text', edited_text);
});
I found this suggestion in another post on Stack Overflow, but it's not working. It's still allowing emojis from ex ios.
.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
What I try to achieve is to not allow emojis in textfield, and if an emoji is inserted (from ex ios keyboard) it will be replaced by nothing. It works with php. Someone here who can help me out with this?

Based on the answer from mb21, this regex did the job. No loop required!
/[\uD800-\uDBFF][\uDC00-\uDFFF]/g

As pointed out in this answer, JavaScript doesn't support Unicode code points outside the Basic Multilingual Plane (where iOS emojis lie).
I highly recommend reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). Then you'll understand what was meant with:
So some indirect approach is needed. Cf. to JavaScript strings outside of the BMP.
For example, you could look for code points in the range [\uD800-\uDBFF] (high surrogates), and when you find one, check that the next code point in the string is in the range [\uDC00-\uDFFF] (if not, there is a serious data error), interpret the two as a Unicode character, and replace them by whatever you wish to put there. This looks like a job for a simple loop through the string, rather than a regular expression.

Javascript string variable unquoted?

I am using the QuickBlox JavaScript API. Looking through their code, I found this line:
var URL_REGEXP = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
It appears that it has declared a string variable that is a regular expression pattern. Then it goes ahead to use that variable thus:
return str.replace(URL_REGEXP, function(match) {
url = (/^[a-z]+:/i).test(match) ? match : 'http://' + match;
url_text = match;
return '' + escapeHTML(url_text) + '';
});
I am wondering how is this possible? The var declared in the first line should be a string, but it is unquoted. Shouldn't this be a syntax error?
I went ahead and tested this code on my browser, and it works! This mean's I've got some learning to do here... Can anyone explain how this variable is declared?
Additionally, I tried to run the same code on my friends computer, the Chrome debugger throws a syntax error on the variable declaration line (unexpected token '/'). I am using Chrome Version 36.0.1985.143 m, my friend is using the same thing, but on my computer, it all works fine, on my friends computer, the code stops at the first variable declaration because of "syntax error".
Is there some setting that is different?
Any help would be appreciated.
UPDATE
Thanks for the quick answers. I've come from a PHP background, so thought that all regular expressions has to be initialized as strings :P.
Anyone can reproduce the syntax error I'm getting on my friends computer? (It still happens after disabling all extensions). I can't reproduce it either, and that's what is frustrating me.
UPDATE 2
I have tested and my friends computer and looked through the source. It appear to be due to some encoding problems (I'm not sure what). The regular expression is shown like this:
var URL_REGEXP = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?芦禄鈥溾€濃€樷€橾))/gi;
(The characters at the end of the code is some random chinese characters, it seems).
How can I change the encoding to match his browser/system? (He is running on a Windows 7 Chinese simplified system).

It is not a String variable. It is a regular expression.
Calling var varname = /pattern/flags;
is effective to calling var varname = new RegExp("pattern", "flags");.
You can execute the following in any browser that supports a JavaScript console:
>>> var regex = /(?:[\w-]+\.)+[\w-]+/i
>>> regex.exec("google.com")
... ["google.com"]
>>> regex.exec("www.google.com")
... ["www.google.com"]
>>> regex.exec("ftp://ftp.google.com")
... ["ftp.google.com"]
>>> regex.exec("http://www.google.com")
Anyone can reproduce the syntax error I'm getting on my friends computer? (It still happens after disabling all extensions). I can't reproduce it either, and that's what is frustrating me.
According to RegExp - JavaScript documentation:
Regex literals was present in ECMAScript 1st Edition, implemented in JavaScript 1.1. Use an updated browser.

No, it shouldn't be a syntax error. In Javascript, RegExp objects are not strings, they are a distinct class of objects. /.../modifiers is the syntax for a RegExp literal.
I can't explain the syntax error you got on your friend's computer, it looks fine to me. I pasted it into the Javascript console and it was fine.

whay backaward slash in the parameter element of the javascript object?

I was inspecting this site in firebug. Inside the third <script/> tag in the head section of the page , I found an object variable declared in the following way ( truncated here however by me) :
var EM={
"ajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"bookingajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"locationajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php?action=locations_search",
"firstDay":"1","locale":"en"};
The utility of the variable is unknown to me. What struck me is the 3 urls presented there. Why are the backward slashes present there? Couldn't it be something like :
"ajaxurl" : "http://ipsos.com.au/wp-admin/admin-ajax.php"
?

In a script element there are various character sequences (depending on the version of HTML) that will terminate the element. </script> will always do this.
<\/script> will not.
Escaping / characters will not change the meaning of the JS, but will prevent any such HTML from ending the script.

The \/\/ is to avoid the below scenario:
when the url looks something similar to "ajaxurl" : "http://google.com/search?q=</script>"
Try copy paste the url in browsers address bar. This is handled correctly. Otherwise, You might end up getting script errors and page might not work as you've expected.
imagine DOM manipulators replacing the value as it is in the src attribute of the script tag and then the javascript engine reporting multiple errors because that particular script referenced might not get loaded due to incorrectly defined src value
Hope this helps.
Life would be hectic without these lil things

It is used to escape the characters..
The backslash () can be used to insert apostrophes, new lines, quotes, and other special characters into a string.
var str = " Hello "World" !! ";
alert(str)
This won't work..
You have to escape them first
var str = " Hello \"World\" !! ";
alert(str) ; \\ This works
In terms of Javascript / and <\/ are identical inside a string. As far as HTML is concerned </ starts an end tag but <\/ does not.

Escaping raw, unescaped strings in bookmarklet

I’m trying to write a search engine bookmarklet (for Chrome), but I’m having trouble escaping the string.
For example if the search engine bookmarklet is the following:
javascript:alert("%s"); //%s is the search engine query, passed literally by chrome.
Then running it on the following string will give incorrect results:
c:\zebra
c:zebra instead of c:\zebra
If the character after the slash happens to be an actual escape character, then the results will vary depending on the character.
I’ve tried escaping and unescaping the string, I’ve tried reg-ex’ing it, and replacing the slash with a double-slash, but I cannot figure out a way to get this to work because the first time that the raw string enters the script, it is unescaped, and any operation after that will see it incorrectly.
How can this be handled correctly?

So far I can only make this work in chrome:
javascript: var str = (function(){STARTOFSTRING:/*%s*/ENDOFSTRING:;}).toString().match( /STARTOFSTRING:\/\*([\s\S]*)\*\/ENDOFSTRING:/ )[1]; alert(str);
writing c:\zebra will alert c:\zebra.
Firefox doesn't sustain the comments inside the function body when decompiled, unfortunately.
You also can't write the sequence */ in the string, but everything else should be passed literally, including quotes " ' etc

Json.encode special symbols \u003c MVC3

I have JavaScript application, where I use client-side templates (underscore.js, Backbone.js).
Data for initial page load is strapped into the page like this (.cshtml Razor-file):
<div id="model">#Json.Encode(Model)</div>
Razor engine performs escaping, so, if the Model is
new { Title = "<script>alert('XSS');</script>" }
, in output we have:
<div id="model">{"Title":"\u003cscript\u003ealert(\u0027XSS\u0027)\u003c/script\u003e"}</div>
Which after "parse" operation:
var data = JSON.parse($("#model").html());
we have object data with "Title" field exactly "<script>alert('XSS');</script>"!
When this goes to underscore template, it alerts.
Somehow \u003c-like symbols are treated like proper "<" symbols.
How do I escape "<" symbols to < and > from DB (if they somehow got there)?
Maybe I can tune Json.Encode serialization for escaping these symbols?
Maybe I can set up Entity Framework which I`m using, for automatically escape these symbols absolutely all the time when getting data from DB?

\u003c and similar codes are perfectly valid for JS. You can obfuscate whole JS files using this syntax, if you so choose. Essentially, you're seeing an escape character \, u for unicode, and then a 4-character Hex code which relates to a symbol.
http://javascript.about.com/library/blunicode.htm
\u003c - as you've noted, is the < character.
One approach to "fixing" this on the MVC side would be to write a RegEx which looks for the pattern \u - and then captures the next 4 characters. You could then un-encode them into actual unicode characters - and run the resultant text through your XSS prevention algorithms.
As you've noted in your question - just looking for "<" doesn't help. You also can't just look for "\u003cscript" - because this assumes the potential hacker hasn't simply unicode-encoded the entire "script" tag word. The safer approach is to un-escape all of these kinds of codes and then cleanse your HTML in plain-text.
Incidentally, it might make you feel better to note that this is one of the common (and thusfar poorly resolved) issues in XSS prevention. So you aren't alone in wanting a better solution...
You might check out the following libraries to assist in the actual html cleansing:
http://wpl.codeplex.com/ (Microsoft's attempt at a solution - though very bad user feedback)
https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project_.NET (A private project which is designed to do a lot of this kind of prevention. I find it hard to use, and poorly implemented in .NET)
Both are good references, though.

You need to encode your string as HTML before providing it to Underscore.
"HTML escaping in Underscore.js templates" explains how to do this.

If you want to write unencoded content you will need to use the Html.Raw() helper:
#Html.Raw(Json.Encode(Model))
Edit:
I guess, perhaps I'm not understanding what your problem is. For example within a test controller I have the following
ViewBag.Test = new { Title = "<script>alert('XSS');</script>" };
In the related view:
<script type="text/javascript">
var test = #Html.Raw(Json.Encode(ViewBag.Test));
console.log(test.Title);
document.write(test.Title);
</script>
Which in turn outputs to the console:
<script>alert('XSS');</script>
And opens the alert.

Develop Reference

JavaScript is the programming language of the Web.