Is it necessary to escape the characters "<" and ">" in a JavaScript string?

Sometimes the server side generates strings that are embedded in inline JavaScript code. For example, if "UserName" is generated by ASP.NET, the page looks like this:
<script>
var username = "<%UserName%>";
</script>
This is not safe, because a user can set his/her name to
</script><script>alert('bug')</script></script>
which is an XSS vulnerability.
So, basically, the code should be:
<script>
var username = "<% JavascriptEncode(UserName)%>";
</script>
What JavascriptEncode does is add the character "\" before "/", "'" and the double quote. So the output HTML looks like this:
var username = "<\/script><script>alert(\'bug\')<\/script><\/script>";
The browser will not interpret "<\/script>" as the end of the script block, so XSS is avoided.
However, "<" and ">" are still there. It has been suggested to escape these two characters as well. First of all, I don't believe it is a good idea to change "<" to "&lt;" and ">" to "&gt;" here. And I'm not sure that changing "<" to "\<" and ">" to "\>" is recognized by all browsers. It seems unnecessary to do any further encoding for "<" and ">".
Is there any suggestion on this?
Thanks.

The problem has different answers depending on what markup language you are using.
If you are using HTML, then you must not represent them with entities as script elements are marked as containing CDATA.
If you are using XHTML, then you may represent them as CDATA with explicit CDATA markers, or you may represent them with entities.
If you are using XHTML, but serving it as text/html, then you need to write something which conforms to the rules of XHTML but still works with a text/html parser. This generally means using explicit CDATA markers and commenting them out in JavaScript.
<script type="text/javascript">
// <![CDATA[
…
// ]]>
</script>
A while ago, I wrote a bit about the hows and whys of this.

No, you should not escape < and > using HTML entities inside <script> in HTML.
Use JavaScript string escaping rules (replace \ with \\ and " with \")
and replace all occurrences of </ with <\/, to prevent escaping out of the <script> element.
In XHTML it's more complicated.
If you send XHTML as XML (the way that's incompatible with IE) and don't use a CDATA block, then you need to escape entities, in addition to JavaScript string escaping.
If you send XHTML as XML and use a CDATA block, then don't escape entities, but replace ]]> with ]]]]><![CDATA[> to prevent escaping out of it (in addition to JavaScript string escaping).
If you send XHTML as text/html (what 99% of people do) then you have to use an XML CDATA block, XML CDATA escaping and HTML escaping all at once.
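As a rough illustration of the rules above, here is a minimal JavaScript sketch. The function names escapeForHtmlScript and escapeForCdata are made up for this example, and the handling of line breaks is an addition that the answer does not mention (line breaks are not allowed inside a string literal either). Treat it as a sketch, not a vetted encoder.

```javascript
// Hypothetical helper: escape text for a double-quoted JavaScript string
// literal that sits inside an HTML <script> element.
function escapeForHtmlScript(text) {
  return text
    .replace(/\\/g, "\\\\")   // \  -> \\  (must run first)
    .replace(/"/g, '\\"')     // "  -> \"
    .replace(/\r/g, "\\r")    // assumption: also escape line breaks
    .replace(/\n/g, "\\n")
    .replace(/<\//g, "<\\/"); // </ -> <\/ so the HTML parser never sees </script>
}

// Hypothetical helper: additional step for content placed in an XML CDATA block.
function escapeForCdata(text) {
  return text.replace(/\]\]>/g, "]]]]><![CDATA[>");
}

console.log(escapeForHtmlScript('</script><script>alert("bug")</script>'));
// prints: <\/script><script>alert(\"bug\")<\/script>
```

The result is meant to be placed between the double quotes of the string literal; it does not add the quotes itself.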

The cheap and easy way:
<script type="text/javascript">
var username = "<%= Encode(UserName) %>";
</script>
where the encoding scheme in Encode is to translate each character of the input into its \uABCD representation, which JavaScript understands.
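For what it's worth, the idea behind such an Encode function could look roughly like this. It is written in JavaScript only to show the transformation; the real thing would live in the server-side language, and the name encodeAsUnicodeEscapes is invented for this sketch.

```javascript
// Hypothetical sketch: turn every UTF-16 code unit into a \uXXXX escape.
// Characters outside the BMP come out as two escapes (a surrogate pair),
// which JavaScript string literals decode back correctly.
function encodeAsUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

console.log(encodeAsUnicodeEscapes("</a>"));
// prints: \u003c\u002f\u0061\u003e
```

Since the output contains only backslashes, the letter u and hex digits, nothing in it can end the string literal or the script element. The price is that the output is six times as long as the input.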
Another cheap and easy way:
<script type="text/javascript">
var username = decodeBase64("<%= EncodeBase64(UserName) %>");
</script>
if you are dealing only with ASCII.
Of course, pst hit the nail on the head with the strict way of doing it.

Related

JavaScript with Special Character

I have an HTML page in which I need to pass a String variable to a JavaScript function. This works as long as the String does not contain a special character.
<html>
<head>
<script>
function test(v){
alert(v);
}
</script>
</head>
<body>
<input type="button" value="Test Button" onClick="test('BlahBlah')"/>
</body>
</html>
As soon as I change onClick like below, it stops working.
onClick="test('Blah'Blah')"
Is there any solution for this problem? Please note that the parameter being passed to the JavaScript function is dynamic. The source of the parameter is the backend and I cannot change that piece of code. Also, even if I put an escape in, it still does not work. My problem is that I have to retain the special character for some processing at the backend.
There are two layers to this:
The content of onClick attributes, like all attributes, is HTML text. That means that any character that's special in HTML (like <) must be replaced with an HTML entity (e.g., &lt;). Additionally, if you use double quotes around the attribute value, any double quotes within the value must be replaced with entities (&quot;); if you used single quotes around the attribute, you'd need to replace ' with &apos;.
Your attribute contains a JavaScript string literal. That means that any characters that are special inside JavaScript string literals must be escaped according to the JavaScript rules. Since you've used single quotes to delimit the JavaScript string, for instance, you have to escape any single quotes in the string with a backslash.
I'm assuming that HTML is generated server-side. If so, the work above must be done server-side, when building the HTML of the page. You haven't said what server-side tech you're using, so it's hard to point you at solutions that your server-side tech/environment might provide.
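To make the two layers concrete, here is a small sketch in JavaScript (imagine it running in whatever generates the HTML, e.g. Node). All three function names are hypothetical, and the escaping covers only the characters discussed above plus line breaks; a production encoder would need to be more thorough.

```javascript
// Layer 2: escape text for a single-quoted JavaScript string literal.
function escapeJsSingleQuoted(text) {
  return text
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Layer 1: escape text for a double-quoted HTML attribute value.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;") // ampersand first, so entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the attribute: JavaScript escaping first, HTML escaping on top.
function buildOnClick(value) {
  const js = "test('" + escapeJsSingleQuoted(value) + "')";
  return 'onClick="' + escapeHtmlAttribute(js) + '"';
}

console.log(buildOnClick("Blah'Blah"));
// prints: onClick="test('Blah\'Blah')"
console.log(buildOnClick('He said "hi"'));
// prints: onClick="test('He said &quot;hi&quot;')"
```

The order matters: the browser undoes the HTML layer first and only then hands the result to the JavaScript parser, so the generator has to apply the layers in the opposite order.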
In the simple case of your
onClick="test('Blah'Blah')"
...you just need to add the backslash within the JavaScript string
onClick="test('Blah\'Blah')"
...but that's just that one specific case.
The dramatically simpler option is to not put JavaScript code in attribute values. Instead, use modern techniques (addEventListener, attachEvent) to hook up JavaScript code.
But if you must use an onClick attribute, avoid having text in it (or deal with the complexities above); have it call a function defined in a script element that then has the text, as you then have only the one layer (#2 above) to deal with.
Source of Parameter is backend and I cannot change that piece of code.
That backend is broken and needs fixing.
If:
the backend is only producing invalid JavaScript code (not invalid HTML)
and the code consists of a single function call
and the code is always a single function call
and the function call always has a single string literal argument
and that argument is always delimited with single quotes
and the single quotes within the string are never correctly escaped
...we might be able to salvage it client-side. But my guess is that the backend will also produce invalid HTML, for instance when the text has a " in it. (We can't do anything about that, because the attribute value will be chopped off at that point.)
But let's keep a good thought: Given the ridiculous list of caveats above, this might do it:
var elm = document.getElementById("the-div");
var code = elm.getAttribute("onclick");
var m = code.match(/^([^(]+)\('(.*)'\)$/);
if (m) {
code = m[1] + "('" + m[2].replace(/'/g, "\\'") + "')";
}
elm.setAttribute("onclick", code);
Live Example:
function foo(str) {
alert(str);
}
var elm = document.getElementById("the-div");
var code = elm.getAttribute("onclick");
var m = code.match(/^([^(]+)\('(.*)'\)$/);
if (m) {
code = m[1] + "('" + m[2].replace(/'/g, "\\'") + "')";
}
elm.setAttribute("onclick", code);
<div id="the-div" onclick="foo('blah'blah')">Click me</div>
Well, this is a very common problem: you want to put a single quote inside a single-quoted string. To do this you have to escape that single quote by putting a backslash in front of it.
onClick="test('Blah\'Blah')"

Escaping JavaScript special characters from ASP.NET

I have following C# code in my ASP.NET application:
string script = @"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
CompoundErrStr is an error message generated by SQL Server (exception text bubbled up from the stored procedure). If it contains any table column names they are enclosed in single quotes and JavaScript breaks during execution because single quotes are considered a string terminator.
As a fix for single quotes I changed my code to this:
CompoundErrStr = CompoundErrStr.Replace("'", @"\'");
string script = @"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
and it now works fine.
However, are there any other special characters that need to be escaped like this? Is there a .Net function that can be used for this purpose? Something similar to HttpServerUtility.HtmlEncode but for JavaScript.
EDIT I use .Net 3.5
Note: for this task you can't (and you shouldn't) use HTML encoders (like HttpServerUtility.HtmlEncode()) because the rules for HTML and for JavaScript strings are quite different. One example: string "Check your Windows folder c:\windows" will be encoded as "Check your Windows folder c:'windows" and it's obviously wrong. Moreover, because it follows HTML encoding rules, it won't perform any escaping for \, " and '. It's simply meant for something else.
If you're targeting ASP.NET Core or .NET 5 then you should use System.Text.Encodings.Web.JavaScriptEncoder class.
If you're targeting .NET 4.x you can use HttpUtility.JavaScriptStringEncode() method.
If you're targeting .NET 3.x and 2.x:
What do you have to encode? Some characters must be escaped (\, " and ') because they have special meaning for the JavaScript parser, while others may interfere with HTML parsing and so should be escaped too (if the JS is inside an HTML page). You have two options for escaping: the JavaScript escape character \ or \uxxxx Unicode code points (note that \uxxxx may be used for all of them, but \ won't work for characters that interfere with the HTML parser).
You may do it manually (with search and replace) like this:
string JavaScriptEscape(string text)
{
return text
.Replace("\\", @"\u005c") // Because it's JS string escape character
.Replace("\"", @"\u0022") // Because it may be string delimiter
.Replace("'", @"\u0027") // Because it may be string delimiter
.Replace("&", @"\u0026") // Because it may interfere with HTML parsing
.Replace("<", @"\u003c") // Because it may interfere with HTML parsing
.Replace(">", @"\u003e"); // Because it may interfere with HTML parsing
}
Of course \ should not be escaped if you're using it as the escape character! This blind replacement is useful for unknown text (like input from users or text messages that may be translated). Note that if the string is enclosed in double quotes then single quotes don't need to be escaped, and vice versa. Be careful to keep the verbatim strings in the C# code, or the Unicode replacement will be performed by C# and your client will receive unescaped strings. A note about interfering with HTML parsing: nowadays you seldom need to create a <script> node and inject it into the DOM, but it used to be a pretty common technique and the web is full of code like + "</s" + "cript>" to work around this.
Note: I said blind escaping because if your string already contains an escape sequence (like \uxxxx or \t) then it should not be escaped again. For this you have to do some tricks around this code.
If your text comes from user input and it may be multiline then you should also be ready for that or you'll have broken JavaScript code like this:
alert("This is a multiline
comment");
Simply add .Replace("\n", "\\n").Replace("\r", "") to previous JavaScriptEscape() function.
For completeness: there is also another method. If you encode your string with Uri.EscapeDataString(), you can decode it in JavaScript with decodeURIComponent(), but this is more of a dirty trick than a solution.
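A quick sketch of that round trip, written entirely in JavaScript so it can be tried in one place. Here encodeURIComponent() stands in for the server-side Uri.EscapeDataString() call; that is an assumption of this example, and the two do not necessarily produce identical output (for instance, encodeURIComponent() leaves the single quote untouched, which is why the emitted literal below uses double quotes).

```javascript
// Text with quotes, a backslash, a line break and a closing script tag.
const message = 'Column "name" is invalid in table \'users\'\nc:\\windows </script>';

// Stand-in for the server-side percent-encoding step (assumption, see above).
const encoded = encodeURIComponent(message);

// What the server would write into the page: a call that decodes at run time.
const emitted = 'decodeURIComponent("' + encoded + '")';

// Evaluate the emitted expression the way the browser would.
const roundTripped = new Function("return " + emitted)();

console.log(roundTripped === message); // true
```

The encoded form contains no double quotes, backslashes, angle brackets or line breaks, so it cannot terminate the string literal or the script element.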
While the original question mentions .NET 3.5, it should be known to users of 4.0+ that you can use HttpUtility.JavaScriptStringEncode("string")
A second bool parameter specifies whether to include quotation marks (true) or not (false) in the result.
All too easy:
@Html.Raw(myString)

Why is there a backward slash in the parameter element of the JavaScript object?

I was inspecting this site in Firebug. Inside the third <script> tag in the head section of the page, I found an object variable declared in the following way (truncated here by me):
var EM={
"ajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"bookingajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php",
"locationajaxurl":"http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php?action=locations_search",
"firstDay":"1","locale":"en"};
The purpose of the variable is unknown to me. What struck me is the three URLs presented there. Why are the backward slashes present there? Couldn't it be something like:
"ajaxurl" : "http://ipsos.com.au/wp-admin/admin-ajax.php"
?
In a script element there are various character sequences (depending on the version of HTML) that will terminate the element. </script> will always do this.
<\/script> will not.
Escaping / characters will not change the meaning of the JS, but will prevent any such HTML from ending the script.
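A quick way to convince yourself that the escaped slashes don't change the value (using the URL from the question):

```javascript
// "\/" inside a JavaScript string literal is just "/".
const escaped = "http:\/\/ipsos.com.au\/wp-admin\/admin-ajax.php";
const plain = "http://ipsos.com.au/wp-admin/admin-ajax.php";
console.log(escaped === plain); // true

// JSON also accepts \/ as an escape for /, so parsed data is identical too.
const parsed = JSON.parse('{"ajaxurl":"http:\\/\\/ipsos.com.au\\/wp-admin\\/admin-ajax.php"}');
console.log(parsed.ajaxurl === plain); // true
```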
The \/\/ is there to avoid the scenario below,
where the URL looks something like "ajaxurl" : "http://google.com/search?q=</script>"
Try copying and pasting the URL into the browser's address bar. This is handled correctly. Otherwise, you might end up getting script errors and the page might not work as you've expected.
Imagine DOM manipulators placing the value as it is into the src attribute of a script tag, and the JavaScript engine then reporting multiple errors because the referenced script might not get loaded due to the incorrectly defined src value.
Hope this helps.
Life would be hectic without these lil things
It is used to escape the characters..
The backslash (\) can be used to insert apostrophes, new lines, quotes, and other special characters into a string.
var str = " Hello "World" !! ";
alert(str)
This won't work..
You have to escape them first
var str = " Hello \"World\" !! ";
alert(str); // This works
In terms of JavaScript, </ and <\/ are identical inside a string. As far as HTML is concerned, </ starts an end tag but <\/ does not.

Is it enough to use HTMLEncode to display uploaded text?

We're allowing users to upload pictures and provide a text description. Users can view this through a pop-up box (actually a div) via JavaScript. The uploaded text is a parameter to a JavaScript function. I'm worried about XSS and am also finding issues with HTMLEncode().
We're using HTMLEncode to guard against XSS. Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'. We also need to replace single and double quotes that people may include. Is there a single function that will handle all of these special characters, or must we do that manually via .NET string.Replace()?
Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'.
Assuming you are talking about HttpServerUtility.HtmlEncode, that does encode the double-quote character. It also encodes as character references the range U+0080 to U+00FF, for some reason.
What it doesn't encode is the single quote. Bit of a shame but you can usually work around it by using only double quotes as attribute value delimiters in your HTML/XML. In that case, HtmlEncode is enough to prevent HTML-injection.
However, javascript is in your tags, and HtmlEncode is decidedly not enough to escape content to go in a JavaScript string literal. JavaScript-encoding is a different thing to HTML-encoding, so if that's the reason you're worried about the single quote then you need to employ a JS string encoder instead.
(A JSON encoder is a good start for that, but you would want to ensure it encodes the U+2028 and U+2029 characters which are, annoyingly, valid in JSON but not in JavaScript. Also you might well need some variety of HTML-escaping on top of that, if you have JavaScript in an HTML context. This can get hairy; it's usually better to avoid these problems by hiding the content you want in plain HTML, for example in a hidden input or custom attribute, where you can use standard HTML-escaping, and then read that data from the DOM in JS.)
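As a sketch of the "JSON encoder plus a bit more" idea from the parenthetical above, assuming the encoding is done in JavaScript (e.g. on a Node server); the function name jsonForInlineScript is invented here, and a .NET version would need the equivalent replacements applied to its own JSON serializer's output:

```javascript
// Hypothetical helper: serialize a value so the result can be dropped into
// an inline <script> block as a JavaScript expression.
function jsonForInlineScript(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")       // keeps </script> and <!-- away from the HTML parser
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")  // line separator: valid in JSON, a problem in older JS engines
    .replace(/\u2029/g, "\\u2029"); // paragraph separator: same issue
}

const description = { text: "</script><script>alert('hello');</script>" };
console.log(jsonForInlineScript(description));
// prints: {"text":"\u003c/script\u003e\u003cscript\u003ealert('hello');\u003c/script\u003e"}
```

These replacements are safe to apply to the whole serialized text because those characters can only occur inside string values in JSON output. This covers the script-element context only; inside an HTML attribute you would still need HTML escaping on top.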
If the text description is embedded inside a JavaScript string literal, then to prevent XSS, you will need to escape special characters such as quotes, backslashes, and newlines. The HttpUtility.HtmlEncode method is not suitable for this task.
If the JavaScript string literal is in turn embedded inside HTML (for example, in an attribute), then you will need to apply HTML encoding as well, on top of the JavaScript escaping.
You can use Microsoft's Anti-Cross Site Scripting library to perform the necessary escaping and encoding, but I recommend that you try to avoid doing this yourself. For example, if you're using WebForms, consider using an <asp:HiddenField> control: Set its Value property (which will be HTML-encoded automatically) in your server-side code, and access its value property from client-side code.
How about HTML-encoding all of the input with this extended function:
private string HtmlEncode(string text)
{
char[] chars = HttpUtility.HtmlEncode(text).ToCharArray();
StringBuilder result = new StringBuilder(text.Length + (int)(text.Length * 0.1));
foreach (char c in chars)
{
int value = Convert.ToInt32(c);
if (value > 127)
result.AppendFormat("&#{0};", value);
else
result.Append(c);
}
return result.ToString();
}
This function will convert all non-English characters, symbols, quotes, etc. to HTML entities.
Try it out and let me know if this helps.
If you're using ASP.NET MVC2 or ASP.NET 4 you can replace <%= with <%: to encode your output. It's safe to use for everything it seems (like HTML Helpers).
There is a good write up of this here: New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2)

What is the correct way to encode an inline javascript object, in order to protect it from XSS?

It turns out the following, which looks like valid JavaScript, is not:
<html>
<body>
<script>
json = {test: "</script><script>alert('hello');</script>"};
</script>
</body>
</html>
The same text, when returned as JSON via an AJAX API, works just as expected. However, when rendered inline it results in a basic XSS issue.
Given an arbitrary correct JSON string, what do I need to do server side to make it safe for in-line rendering?
EDIT
Ideally I would like the fix to work with the following string as well:
json = {test: "<\/script><script>alert('hello');<\/script>"};
Meaning, I have no idea how my underlying library is encoding the / char; it may have chosen to encode it, or it may not have (so it's likely a regex fix is more robust).
See OWASP's XSS prevention guide (See Rule #3) -
Except for alphanumeric characters,
escape all characters less than 256
with the \xHH format to prevent
switching out of the data value into
the script context or into another
attribute. Do not use any escaping
shortcuts like \" because the quote
character may be matched by the HTML
attribute parser which runs first.
Assume this is what your object looks like -
var log = {
trace: function(m1, m2, m3){},
debug: function(m1, m2, m3){},
currentLogValue : "trace {].a23-%\/^&",
someOtherObject : {someKey:"somevalue", someOtherKey:"someothervalue"}
};
This should end up like this -
var log = {
trace : "function\x28m1,\x20m2,\x20m3\x29\x7B\x7D",
debug : "function\x28m1,\x20m2,\x20m3\x29\x7B\x7D",
currentLogValue : "trace\x20\x7B\x5D.a23\x2D\x25\x5C\x2F\x5E\x26",
someOtherObject : {someKey : "somevalue", someOtherKey:"someothervalue"}
};
The rules are straightforward -
Untrusted data is only allowed within a pair of quotes
Whatever is within quotes gets escaped as follows - "Except alphanumeric characters, escape everything else with the \xHH format"
This ensures that untrusted data is always interpreted as a string, and not as a function/object/anything else.
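A minimal sketch of the quoted rule, in JavaScript for illustration. The function name escapeXHH is made up, and this is not OWASP's reference implementation. Note the assumption about characters with code 256 and above: the quoted rule says nothing about them, so this sketch passes them through unchanged, whereas a real encoder would have to decide what to do with them (for example U+2028 and U+2029).

```javascript
// Hypothetical helper: escape every non-alphanumeric character below 256
// as \xHH. Characters with code >= 256 are left as-is (assumption, see above).
function escapeXHH(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (/^[A-Za-z0-9]$/.test(ch) || code >= 256) {
      out += ch;
    } else {
      out += "\\x" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(escapeXHH("trace {]"));
// prints: trace\x20\x7B\x5D
console.log(escapeXHH("</script>"));
// prints: \x3C\x2Fscript\x3E
```

The output is meant to go between a pair of quotes, as the first rule above requires.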
To start with, this is not JSON at all, it's a Javascript object. JSON is a text format that is based on the Javascript syntax.
You can either make sure that the code doesn't contain the </ character combination:
var obj = { test: "<"+"/script><script>alert(\"hello\");<"+"/script>" };
Or if you are using XHTML you can make sure that the content in the script tag is interpreted as plain data:
<script type="text/javascript">
//<![CDATA[
var obj = { test: "</script><script>alert(\"hello\");</script>" };
//]]>
</script>
In literal strings, put a backslash (\) before all “unsafe” characters, including the forward slash which occurs in “</script>” (/ → \/).
This would change your example to:
json = {test: "<\/script><script>alert(\"hello\");<\/script>"};
and it would still be valid JSON.
Of course you also have to escape the double-quote (" → \") and the backslash itself (\ → \\), but you would already have to do that anyway. You should also consider escaping the single-quote (' → \') to be on the safe side.
One issue you might be running into is the fact that the HTML and javascript interpreters on the browser run interleaved.
<html>
<body>
<script>
json = {test: "</script><script>alert('hello');</script>"};
</script>
</body>
</html>
In your example, the HTML interpreter will give json = {test: " to the js interpreter and then it will find the next javascript block (delimited by <script> and </script> tags) and give alert('hello'); to the js interpreter. It doesn't matter that the </script> tag is in a javascript string, because the HTML interpreter is the one looking for js code blocks and doesn't understand js strings.
The first section will cause a js syntax error, while the second section will create the alert. I realize this doesn't answer your question of what to do, but perhaps it will shed more light on what is going on under the hood.
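To see that splitting in action, here is a deliberately simplified simulation of what the HTML parser does with the page. The function name splitScriptBlocks is hypothetical, and a regular expression is only a rough stand-in for the real HTML tokenizer (which has extra rules, e.g. around <!-- inside scripts). The point is just that the cut happens at the first </script>, with no regard for JavaScript quoting.

```javascript
// Simplified stand-in for the HTML parser: collect the text of every
// <script>...</script> pair, ending each block at the FIRST closing tag.
function splitScriptBlocks(html) {
  const blocks = [];
  const re = /<script\b[^>]*>([\s\S]*?)<\/script\s*>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    blocks.push(match[1]);
  }
  return blocks;
}

const page =
  '<script>json = {test: "</script><script>alert(\'hello\');</script>"};</script>';

console.log(splitScriptBlocks(page));
// prints: [ 'json = {test: "', "alert('hello');" ]
```

The first block is the fragment that fails with a syntax error, and the second is the injected code that runs.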
I found this list of characters to be escaped in strings (note that \v and \' are JavaScript escapes but are not valid in JSON):
\b Backspace (ascii code 08)
\f Form feed (ascii code 0C)
\n New line
\r Carriage return
\t Tab
\v Vertical tab
\' Apostrophe or single quote
\" Double quote
\\ Backslash character
Using PHP? If so: json_encode
echo json_encode("<\/script><script>alert(\"hello\");<\/script>");
Output:
"<\\\/script><script>alert(\"hello\");<\\\/script>"
Another example:
echo json_encode("</script><script>alert(\"hello\");</script>");
Output:
"<\/script><script>alert(\"hello\");<\/script>"
