Is it enough to use HTMLEncode to display uploaded text?

We're allowing users to upload pictures and provide a text description. Users can view this through a pop-up box (actually a div) via JavaScript. The uploaded text is a parameter to a JavaScript function. I'm worried about XSS, and I'm also running into issues with HTMLEncode().
We're using HTMLEncode to guard against XSS. Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'. We also need to replace single and double quotes that people may include. Is there a single function that handles all these special characters, or must we do it manually via .NET's string.Replace()?

Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'.
Assuming you are talking about HttpServerUtility.HtmlEncode, it does encode the double-quote character. It also encodes the range U+0080 to U+00FF as character references, for some reason.
What it doesn't encode is the single quote. That's a bit of a shame, but you can usually work around it by using only double quotes as attribute value delimiters in your HTML/XML. In that case, HtmlEncode is enough to prevent HTML injection.
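As a rough illustration of what HTML encoding for a double-quoted attribute involves, here is a minimal JavaScript sketch. htmlEncode is a hypothetical helper, not the .NET implementation; it also encodes the single quote, so it stays safe even if an attribute ends up single-quoted.

```javascript
// Minimal sketch of HTML encoding for element content or quoted attribute values.
// & is replaced first so the entities produced by later steps are not re-encoded.
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Usage: build a double-quoted attribute from user-supplied text.
const caption = 'Tom\'s "best" <b>shot</b> & more';
const markup = '<div title="' + htmlEncode(caption) + '"></div>';
console.log(markup);
```

This only makes the text safe as HTML; it does not make it safe inside a JavaScript string literal.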
However, your question is tagged javascript, and HtmlEncode is decidedly not enough to escape content that goes into a JavaScript string literal. JavaScript encoding is a different thing from HTML encoding, so if that's why you're worried about the single quote, you need to use a JS string encoder instead.
(A JSON encoder is a good start for that, but you would want to ensure it encodes the U+2028 and U+2029 characters which are, annoyingly, valid in JSON but not in JavaScript. Also you might well need some variety of HTML-escaping on top of that, if you have JavaScript in an HTML context. This can get hairy; it's usually better to avoid these problems by hiding the content you want in plain HTML, for example in a hidden input or custom attribute, where you can use standard HTML-escaping, and then read that data from the DOM in JS.)
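A JavaScript sketch of the "JSON encoder plus U+2028/U+2029" suggestion, assuming the literal is emitted inside an HTML <script> block; toJsStringLiteral is a hypothetical name. JSON.stringify handles quotes, backslashes and control characters. The extra replacements cover U+2028/U+2029 (legal in string literals only since ES2019, so older engines still fail on them raw) and <, > and &, so that text such as </script> cannot end the block early.

```javascript
// Encode an arbitrary string as a double-quoted JavaScript string literal
// that is also safe to place inside an HTML <script> block.
function toJsStringLiteral(text) {
  return JSON.stringify(String(text))
    // Raw line/paragraph separators are valid JSON but broke string
    // literals in pre-ES2019 engines.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029")
    // Hide HTML-significant characters from the HTML parser.
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

const description = 'Sunset "at" Joe\'s</script>\u2028line two';
console.log("var caption = " + toJsStringLiteral(description) + ";");
```

Because every replacement produces a standard \uXXXX escape, the result is still valid JSON, so the client can also recover the original text with JSON.parse.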

If the text description is embedded inside a JavaScript string literal, then to prevent XSS, you will need to escape special characters such as quotes, backslashes, and newlines. The HttpUtility.HtmlEncode method is not suitable for this task.
If the JavaScript string literal is in turn embedded inside HTML (for example, in an attribute), then you will need to apply HTML encoding as well, on top of the JavaScript escaping.
You can use Microsoft's Anti-Cross Site Scripting library to perform the necessary escaping and encoding, but I recommend that you try to avoid doing this yourself. For example, if you're using WebForms, consider using an <asp:HiddenField> control: Set its Value property (which will be HTML-encoded automatically) in your server-side code, and access its value property from client-side code.

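How about HTML-encoding all of the input with this extended function:
using System.Text;
using System.Web;
// HTML-encodes text, then turns every non-ASCII character into a numeric
// character reference. The apostrophe is encoded explicitly because older
// versions of HttpUtility.HtmlEncode leave it alone.
private static string HtmlEncode(string text)
{
    if (string.IsNullOrEmpty(text)) return text;
    string encoded = HttpUtility.HtmlEncode(text);
    var result = new StringBuilder(encoded.Length + encoded.Length / 10);
    for (int i = 0; i < encoded.Length; i++)
    {
        char c = encoded[i];
        if (c == '\'')
            result.Append("&#39;");
        else if (char.IsSurrogatePair(encoded, i))
            result.AppendFormat("&#{0};", char.ConvertToUtf32(encoded, i++)); // one reference per code point; i++ skips the low surrogate
        else if (c > 127)
            result.AppendFormat("&#{0};", (int)c);
        else
            result.Append(c);
    }
    return result.ToString();
}
This function converts <, >, &, both kinds of quotes, and all non-ASCII characters to HTML entities.
Try it out and let me know if this helps.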

If you're using ASP.NET MVC 2 or ASP.NET 4, you can replace <%= with <%: to HTML-encode your output. It seems to be safe to use everywhere (including with HTML helpers).
There is a good write up of this here: New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2)

Related

Escaping JavaScript special characters from ASP.NET

I have following C# code in my ASP.NET application:
string script = @"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
CompoundErrStr is an error message generated by SQL Server (exception text bubbled up from the stored procedure). If it contains any table column names they are enclosed in single quotes and JavaScript breaks during execution because single quotes are considered a string terminator.
As a fix for single quotes I changed my code to this:
CompoundErrStr = CompoundErrStr.Replace("'", @"\'");
string script = @"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
and it now works fine.
However, are there any other special characters that need to be escaped like this? Is there a .Net function that can be used for this purpose? Something similar to HttpServerUtility.HtmlEncode but for JavaScript.
EDIT: I use .NET 3.5.

Note: for this task you can't (and shouldn't) use HTML encoders (like HttpServerUtility.HtmlEncode()), because the rules for HTML and for JavaScript strings are quite different. One example: HtmlEncode leaves the string "Check your Windows folder c:\windows" unchanged, so the JavaScript parser reads \w as an escape sequence and shows "Check your Windows folder c:windows", which is obviously wrong. Moreover, since it follows HTML encoding rules, it won't perform any JavaScript escaping for \, " and '. Simply put, it's meant for something else.
If you're targeting ASP.NET Core or .NET 5, you should use the System.Text.Encodings.Web.JavaScriptEncoder class.
If you're targeting .NET 4.x you can use HttpUtility.JavaScriptStringEncode() method.
If you're targeting .NET 3.x and 2.x:
What do you have to encode? Some characters must be escaped (\, " and ') because they have a special meaning for the JavaScript parser, while others may interfere with HTML parsing and so should be escaped too (if the JS is inside an HTML page). You have two options for escaping: the JavaScript escape character \ or \uxxxx Unicode escapes. Note that \uxxxx may be used for all of them, while the \ form won't work for characters that interfere with the HTML parser.
You may do it manually (with search and replace) like this:
string JavaScriptEscape(string text)
{
    return text
        .Replace("\\", @"\u005c")  // Because it's the JS string escape character
        .Replace("\"", @"\u0022")  // Because it may be a string delimiter
        .Replace("'", @"\u0027")   // Because it may be a string delimiter
        .Replace("&", @"\u0026")   // Because it may interfere with HTML parsing
        .Replace("<", @"\u003c")   // Because it may interfere with HTML parsing
        .Replace(">", @"\u003e");  // Because it may interfere with HTML parsing
}
Of course \ should not be escaped if you're using it as the escape character! This blind replacement is useful for unknown text (like user input or text messages that may be translated). Note that if the string is enclosed in double quotes then single quotes don't need to be escaped, and vice versa. Be careful to keep the verbatim strings (@"...") in the C# code, or the \u escapes will be interpreted by the C# compiler and your client will receive unescaped strings. A note about interfering with HTML parsing: nowadays you seldom need to create a <script> node and inject it into the DOM, but it used to be a pretty common technique, and the web is full of code like + "</s" + "cript>" to work around this.
Note: I said blind escaping because if your string already contains an escape sequence (like \uxxxx or \t), it should not be escaped again. To handle that, you'd need some extra logic around this code.
If your text comes from user input and it may be multiline then you should also be ready for that or you'll have broken JavaScript code like this:
alert("This is a multiline
comment");
Simply append .Replace("\n", "\\n").Replace("\r", "") to the end of the chain in the previous JavaScriptEscape() function (after the backslash replacement, so the backslashes it introduces aren't escaped again).
For completeness, there is also another method: if you encode your string with Uri.EscapeDataString(), you can decode it in JavaScript with decodeURIComponent(), but this is more of a dirty trick than a solution.

While the original question mentions .NET 3.5, users of .NET 4.0+ should know that they can use HttpUtility.JavaScriptStringEncode("string").
An overload with a second bool parameter (addDoubleQuotes) specifies whether the result is wrapped in double quotes (true) or not (false).

All too easy:
@Html.Raw(myString)

Json.encode special symbols \u003c MVC3

I have JavaScript application, where I use client-side templates (underscore.js, Backbone.js).
Data for initial page load is strapped into the page like this (.cshtml Razor-file):
<div id="model">@Json.Encode(Model)</div>
Razor engine performs escaping, so, if the Model is
new { Title = "<script>alert('XSS');</script>" }
, in output we have:
<div id="model">{"Title":"\u003cscript\u003ealert(\u0027XSS\u0027)\u003c/script\u003e"}</div>
Which after "parse" operation:
var data = JSON.parse($("#model").html());
we get a data object whose "Title" field is exactly "<script>alert('XSS');</script>"!
When this goes to underscore template, it alerts.
Somehow \u003c-like symbols are treated like proper "<" symbols.
How do I escape "<" and ">" symbols coming from the DB (if they somehow got there) to &lt; and &gt;?
Maybe I can tune Json.Encode serialization for escaping these symbols?
Maybe I can set up Entity Framework, which I'm using, to automatically escape these symbols every time data is read from the DB?

\u003c and similar codes are perfectly valid for JS. You can obfuscate whole JS files using this syntax, if you so choose. Essentially, you're seeing an escape character \, u for unicode, and then a 4-character Hex code which relates to a symbol.
http://javascript.about.com/library/blunicode.htm
\u003c - as you've noted, is the < character.
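To see why \u003c-style text still turns into a real < on the client, here is a small JavaScript demonstration using the same kind of payload as in the question (the literal below is written by hand, not captured from Json.Encode). The escapes keep raw < out of the markup, but JSON.parse decodes them back.

```javascript
// Escaped JSON of the kind Json.Encode places in the page.
const emitted =
  '{"Title":"\\u003cscript\\u003ealert(\\u0027XSS\\u0027);\\u003c/script\\u003e"}';

// The markup itself contains no raw "<", so the HTML parser sees no tag...
console.log(emitted.includes("<")); // false

// ...but parsing the JSON turns each \uXXXX escape back into its character.
const data = JSON.parse(emitted);
console.log(data.Title); // <script>alert('XSS');</script>
```

The decoded value is therefore only safe if it is HTML-escaped again before being inserted as markup.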
One approach to "fixing" this on the MVC side would be to write a regex that looks for the pattern \u and captures the next 4 characters. You could then decode them into the actual Unicode characters and run the resulting text through your XSS prevention algorithms.
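A naive JavaScript sketch of that regex step, for illustration only; decodeUnicodeEscapes is a hypothetical helper. It does not distinguish an escaped backslash followed by u003c (\\u003c) from a real escape, so it is not a complete sanitizer component.

```javascript
// Replace each \uXXXX escape sequence with the character it denotes.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const raw = "\\u003cscript\\u003ealert(1)\\u003c/script\\u003e";
console.log(decodeUnicodeEscapes(raw)); // <script>alert(1)</script>
```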
As you've noted in your question - just looking for "<" doesn't help. You also can't just look for "\u003cscript" - because this assumes the potential hacker hasn't simply unicode-encoded the entire "script" tag word. The safer approach is to un-escape all of these kinds of codes and then cleanse your HTML in plain-text.
Incidentally, it might make you feel better to note that this is one of the common (and thus far poorly resolved) issues in XSS prevention, so you aren't alone in wanting a better solution.
You might check out the following libraries to assist in the actual html cleansing:
http://wpl.codeplex.com/ (Microsoft's attempt at a solution - though very bad user feedback)
https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project_.NET (A private project which is designed to do a lot of this kind of prevention. I find it hard to use, and poorly implemented in .NET)
Both are good references, though.

You need to encode your string as HTML before providing it to Underscore.
"HTML escaping in Underscore.js templates" explains how to do this.

If you want to write unencoded content you will need to use the Html.Raw() helper:
@Html.Raw(Json.Encode(Model))
Edit:
I guess, perhaps I'm not understanding what your problem is. For example within a test controller I have the following
ViewBag.Test = new { Title = "<script>alert('XSS');</script>" };
In the related view:
<script type="text/javascript">
var test = @Html.Raw(Json.Encode(ViewBag.Test));
console.log(test.Title);
document.write(test.Title);
</script>
Which in turn outputs to the console:
<script>alert('XSS');</script>
And opens the alert.

html entity is not rendered

If I just put this in a XUL file:
<label value="&deg;C"/>
it works fine. However, I need to assign &deg; to that label element, and then it doesn't show the degree symbol, just the literal value.
UPDATE:
Sorry, I left out a couple of words here: it doesn't work from within JavaScript. If I assign mylabel.value = degree + "&deg;", it shows the literal value.
It does show degree symbol only if I put above manually in XUL file.

What happens when you use a JavaScript escape, like "\u00B0C", instead of "&deg;C"?
Or when using mylabel.innerHTML instead of mylabel.value? (According to MDC, this should be possible.)
EDIT: you can convert those entities to JavaScript escapes using the Unicode Code Converter.

This makes sense to me. When you express the entity in an attribute value within XML markup, the XML parser interpolates the entity reference and then sets the label value to the result. From JavaScript, however, there's no XML parser to do that work for you, and in fact life would be pretty nasty if there were! Note that when you set the value attribute of an <input type='text'> element from JavaScript, you don't have to worry about escaping XML entities (or even angle brackets, for that matter). You do, however, have to worry about XML entities when you're setting the "value" attribute within XML markup.
Another way to think about it is this: XML entity notation is XML syntax, not JavaScript syntax. In JavaScript, you can produce special characters using 16-bit Unicode escape sequences, which look like \u followed by a four-digit hex constant. As noted in Marcel Korpel's answer, if you know which Unicode value the XML entity produces, you can use that directly from JavaScript. In this case, you could use "\u00B0".
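A quick JavaScript check of that difference; degree and mylabel are the placeholders from the question, and only plain strings are used here.

```javascript
const degree = 21;

// Entity notation is XML syntax: in a JS string it is just five characters.
const withEntity = degree + "&deg;C";

// A Unicode escape is JS syntax, so the engine produces the real character.
const withEscape = degree + "\u00B0C";

console.log(withEntity); // 21&deg;C
console.log(withEscape); // 21°C
// In the XUL page you would then assign: mylabel.value = withEscape;
```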

It won't work this way. Can you change it to this instead?
<label>&deg;C</label>

set a text from a java object with new lines to a javascript variable in JSP

I have a Java String with newlines (\n), for example:
String value = "This is a variable\n\nfrom\nJava";
Now I have to assign this to a JavaScript variable in a JSP file:
<script>var val = '<%= value %>';</script>
But because of the new lines in the above line, I'm getting javascript error "Unterminated String".
Please help me.

Use StringEscapeUtils#escapeEcmaScript() (from Apache Commons Lang 3) before printing it to the JSP.

Newlines will be only one issue. To properly escape the string for display as a JavaScript literal, you have to handle newlines and a wide variety of other characters (not least backslashes and whatever quotes you're using). This isn't hard, but it's non-trivial. Effectively you need to search the string for a range of values (regular expressions are useful here) and substitute the JavaScript escape code (\n, etc.) for it. To avoid charset issues, when doing this sort of thing I escape anything that isn't ASCII into either the JavaScript named escape (\n) or a Unicode escape (\u1234).
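A JavaScript sketch of that approach, to show the algorithm spelled out. The server side in the question is Java, so this only illustrates the idea; escapeForJsLiteral and the NAMED table are hypothetical names. It uses named escapes where JavaScript has them and \uXXXX for everything else outside printable ASCII, and it also escapes < and >.

```javascript
// Short escapes that JavaScript defines for these characters.
const NAMED = {
  "\\": "\\\\", "'": "\\'", '"': '\\"',
  "\n": "\\n", "\r": "\\r", "\t": "\\t", "\b": "\\b", "\f": "\\f",
};

// Escape backslashes, both quote styles, < and > (so "</script>" cannot close
// an enclosing script block), and anything outside printable ASCII 0x20-0x7E.
function escapeForJsLiteral(text) {
  return String(text).replace(/[\\'"<>]|[^\x20-\x7e]/g, (ch) =>
    NAMED[ch] || "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const value = "This is a variable\n\nfrom\nJava";
console.log("var val = '" + escapeForJsLiteral(value) + "';");
// var val = 'This is a variable\n\nfrom\nJava';
```

Escaping both quote styles means the output works whichever delimiter the page uses.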

How do I escape a string inside JavaScript code inside an onClick handler?

Maybe I'm just thinking about this too hard, but I'm having a problem figuring out what escaping to use on a string in some JavaScript code inside a link's onClick handler. Example:
<a href="#" onclick="SelectSurveyItem('<%itemid%>', '<%itemname%>'); return false;">Select</a>
The <%itemid%> and <%itemname%> are where template substitution occurs. My problem is that the item name can contain any character, including single and double quotes. Currently, if it contains single quotes it breaks the JavaScript code.
My first thought was to use the template language's function to JavaScript-escape the item name, which just escapes the quotes. That will not fix the case of the string containing double quotes which breaks the HTML of the link. How is this problem normally addressed? Do I need to HTML-escape the entire onClick handler?
If so, that would look really strange since the template language's escape function for that would also HTMLify the parentheses, quotes, and semicolons...
This link is being generated for every result in a search results page, so creating a separate method inside a JavaScript tag is not possible, because I'd need to generate one per result.
Also, I'm using a templating engine that was home-grown at the company I work for, so toolkit-specific solutions will be of no use to me.

In JavaScript you can encode single quotes as "\x27" and double quotes as "\x22". So once you're inside the (double or single) quotes of a JavaScript string literal, you can use \x27 and \x22 with impunity, without fear of any embedded quotes "breaking out" of your string.
\xXX works for characters up to U+00FF (so it covers ASCII), and \uXXXX for any other UTF-16 code unit, so armed with this knowledge you can create a robust JSEncode function for all characters outside the usual whitelist.
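A JavaScript sketch of such a whitelist-based JSEncode, assuming a small hand-picked whitelist (letters, digits, space, comma, period, underscore, hyphen); jsEncode and SAFE are hypothetical names. Because the output never contains a raw quote, < or &, it can sit inside a quoted JavaScript string that is itself inside a double-quoted HTML attribute.

```javascript
// Characters that pass through unchanged; everything else is hex-escaped.
const SAFE = /^[A-Za-z0-9 ,._-]$/;

function jsEncode(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (SAFE.test(ch)) {
      out += ch;
    } else if (code < 0x80) {
      out += "\\x" + code.toString(16).padStart(2, "0"); // ASCII, e.g. ' -> \x27
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0"); // any other UTF-16 unit
    }
  }
  return out;
}

const itemName = 'Joe\'s "special" <item>';
console.log('<a href="#" onclick="SelectSurveyItem(\'' + jsEncode(itemName) + '\'); return false;">Select</a>');
```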
For example,
Select

Depending on the server-side language, you could use one of these:
.NET 4.0
string result = System.Web.HttpUtility.JavaScriptStringEncode(jsString);
Java
import org.apache.commons.lang.StringEscapeUtils;
...
String result = StringEscapeUtils.escapeJavaScript(jsString);
Python
import json
result = json.dumps(jsString)
PHP
$result = strtr($jsString, array('\\' => '\\\\', "'" => "\\'", '"' => '\\"',
"\r" => '\\r', "\n" => '\\n' ));
Ruby on Rails
<%= escape_javascript(jsString) %>

Use hidden spans, one for each of the parameters <%itemid%> and <%itemname%>, and write their HTML-encoded values inside them.
For example, the span for <%itemid%> would look like <span id='itemid' style='display:none'><%itemid%></span>, and the JavaScript function SelectSurveyItem would read its arguments from these spans' textContent (innerHTML would return an & as &amp;).

If it's going into an HTML attribute, you'll need to both HTML-encode it (as a minimum: & to &amp;, > to &gt;, < to &lt; and " to &quot;) and escape backslashes and single quotes (with a backslash) so they don't interfere with your JavaScript quoting.
The best way to do it is with your templating system (extending it, if necessary), but you could simply write a couple of escaping/encoding functions and wrap them both around any data that's going in there.
And yes, it's perfectly valid (correct, even) to HTML-escape the entire contents of your HTML attributes, even if they contain javascript.
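A JavaScript sketch of that two-layer encoding, applied in the reverse of the order in which the browser undoes it: JavaScript-escape each value for a single-quoted literal, then HTML-encode the whole handler for a double-quoted attribute. jsEscape, htmlAttrEscape and buildSelectLink are hypothetical helpers; SelectSurveyItem and the Select link come from the question. The JS escaping here is deliberately minimal and does not cover every character (for example U+2028).

```javascript
// Escape a value for use inside a single-quoted JavaScript string literal.
function jsEscape(value) {
  return String(value)
    .replace(/\\/g, "\\\\") // backslash first, so later escapes aren't doubled
    .replace(/'/g, "\\'")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Escape text for use inside a double-quoted HTML attribute (& first).
function htmlAttrEscape(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Inner layer first (JavaScript), outer layer last (HTML): the browser
// undoes them in the opposite order.
function buildSelectLink(itemId, itemName) {
  const handler =
    "SelectSurveyItem('" + jsEscape(itemId) + "', '" + jsEscape(itemName) + "'); return false;";
  return '<a href="#" onclick="' + htmlAttrEscape(handler) + '">Select</a>';
}

console.log(buildSelectLink(7, 'O\'Brien "Bob" & <Co>'));
```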

Try to avoid string literals in your HTML, and use JavaScript to bind JavaScript events.
Also, avoid 'href=#' unless you really know what you're doing. It breaks a lot of usability for compulsive middle-clickers (tab openers).
<a id="tehbutton" href="somewhereToGoWithoutWorkingJavascript.com">Select</a>
My JavaScript library of choice just happens to be jQuery:
<script type="text/javascript">//<!-- <![CDATA[
jQuery(function($){
$("#tehbutton").click(function(){
SelectSurveyItem('<%itemid%>', '<%itemname%>');
return false;
});
});
//]]>--></script>
If you happen to be rendering a list of links like that, you may want to do this:
<a id="link_1" href="foo">Bar</a>
<a id="link_2" href="foo2">Baz</a>
<script type="text/javascript">
jQuery(function($){
var l = [[1,'Bar'],[2,'Baz']];
$(l).each(function(k,v){
$("#link_" + v[0] ).click(function(){
SelectSurveyItem(v[0],v[1]);
return false;
});
});
});
</script>

Another interesting solution might be to do this:
Select
Then you can use a standard HTML-encoding on both the variables, without having to worry about the extra complication of the javascript quoting.
Yes, this does create HTML that is strictly invalid. However, it is a valid technique, and all modern browsers support it.
If it were me, I'd probably go with my first suggestion, and ensure the values are HTML-encoded and have single quotes escaped.

Declare separate functions in the <head> section and invoke those in your onclick handlers. If you have lots, you could use a naming scheme that numbers them, or pass an integer in your onclick handlers and have a big fat switch statement in the function.

Any templating engine worth its salt will have an "escape quotes" function. Ours (also home-grown, where I work) also has a function to escape quotes for JavaScript. In both cases, the template variable name is simply suffixed with _esc or _js_esc, depending on which one you want. You should never output user-generated content to a browser without escaping it, IMHO.

I have faced this problem as well. I made a script to convert single quotes into escaped double quotes (&quot;) that won't break the HTML.
// Replaces every single quote with &quot;, which the HTML parser turns back
// into a double quote. Note that this changes the text (' becomes ")
// rather than preserving it.
function noQuote(text)
{
    return String(text).replace(/'/g, "&quot;");
}

Use the Microsoft Anti-XSS library which includes a JavaScript encode.

First, it would be simpler if the onclick handler were set this way:
<a id="someLinkId" href="#">Select</a>
<script type="text/javascript">
document.getElementById("someLinkId").onclick = function() {
    SelectSurveyItem('<%itemid%>', '<%itemname%>');
    return false;
};
</script>
Then itemid and itemname need to be escaped for JavaScript (that is, \ becomes \\, ' becomes \', and so on).
If you are using Java on the server side, you might take a look at the StringEscapeUtils class from Jakarta Commons Lang (now Apache Commons Lang). Otherwise, it shouldn't take too long to write your own escapeJavascript method.

Is the answer here that you can't escape quotes using JavaScript, and that you need to start with escaped strings?
So there's no way for JavaScript to handle the string 'Marge said "I'd look that was" to Peter', and you need your data to be cleaned before offering it to the script?

I faced the same problem and solved it in a tricky way. First create global variables v1, v2 and v3. Then, in each onclick, pass an indicator (1, 2 or 3), and in the function check that indicator to pick v1, v2 or v3:
onclick="myfun(1)"
onclick="myfun(2)"
onclick="myfun(3)"
// "which" selects the matching global variable ("var" is a reserved word
// and cannot be used as a parameter name).
function myfun(which)
{
    if (which == 1)
        alert(v1);
    if (which == 2)
        alert(v2);
    if (which == 3)
        alert(v3);
}
