DOM XSS and Javascript Escaping


I am going through all the OWASP rules for DOM Based XSS prevention and trying to get a full understanding of each rule. I'm a bit stuck on this rule:
"RULE #2 - JavaScript Escape Before Inserting Untrusted Data into HTML Attribute Subcontext within the Execution Context"
See here:
https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet#RULE_.232_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_HTML_Attribute_Subcontext_within_the_Execution_Context
The problem is that I'm not sure what method to use for "JavaScript escaping" on the front-end. I know it is not a very likely use case, because most front-end developers would avoid inserting untrusted data into an HTML attribute in the first place, but I would still like to understand what is meant by this rule by understanding exactly what the escape method should be. Is there a simple JavaScript escape method people typically use on the front-end? Thanks!
EDIT: Other answers I find on Stack Overflow all mention HTML escapers. I'm specifically looking for a JavaScript escaper, and I want to know why OWASP specifically uses the term "JavaScript escape" if, as some people suggest, an HTML escaper is sufficient.
Perhaps the question could also be phrased as: "In the context of OWASP's cheat sheet for DOM Based XSS, what is the difference between HTML escaping and JavaScript escaping? Please give an example of JavaScript escaping."

The escaping needed depends on the context that a value is inserted into. The wrong escaping may let through characters that are special in the target context but not in the context you escaped for, or it may corrupt the value.
JavaScript escaping is for values that are inserted directly into a JavaScript string literal via a server-side templating language.
So the example they have is:
x.setAttribute("value", '<%=Encoder.encodeForJS(companyName)%>');
Here, the value of companyName is inserted into a script, surrounded by single quotes, making it a JavaScript string literal. The special characters here are things like quotes, newlines, and some Unicode whitespace characters. These should be converted to JavaScript escape sequences. So a quote would become \x27 rather than the HTML entity &#39;. If you were to use HTML encoding, then a quote character would be displayed as &#39; and a newline character would cause a syntax error. JavaScript encoding can be done in Java with encodeForJavaScript, or in PHP with json_encode.
It's inserted into a JavaScript value so it should be JavaScript encoded. People are used to HTML encoding attributes but this only makes sense when directly inserting into the HTML, not when using the setAttribute DOM method. The encoding needed is the same as if it were like:
var x = '<%=Encoder.encodeForJS(companyName)%>';
The attribute doesn't need to be HTML encoded because it's not in an HTML context. HTML encoding is needed when the value is inserted directly into an attribute like:
<input value='<%=Encoder.encodeForHTML(companyName)%>'>
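As a rough illustration of what a JavaScript escaper does when written in front-end JavaScript itself, here is a minimal sketch. The name escapeForJsString is hypothetical and not part of any library. It assumes the approach of turning every non-alphanumeric character into a \xHH or \uHHHH escape sequence; a vetted encoder is preferable in real code.

```javascript
// Hypothetical helper (not a library API): escape a value so that it can sit
// between quotes in a JavaScript string literal.
function escapeForJsString(value) {
  // Without the "u" flag the regex walks UTF-16 code units, so astral
  // characters become two \uHHHH escapes, which is still a valid literal.
  return String(value).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

const companyName = "O'Reilly\n</script>";
const escaped = escapeForJsString(companyName);

// Simulate the generated source: the escaped text placed inside single quotes.
const generatedSource = "return '" + escaped + "';";
const roundTripped = new Function(generatedSource)();

console.log(escaped);
console.log(roundTripped === companyName); // true
```

The quote, the newline, and the characters of </script> all end up as escape sequences, so none of them can terminate the string literal or the surrounding script, and the parser still reconstructs the original value.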

Related

Javascript RegExp being interpreted different from a string vs from a data-attribute

Long story short, I'm trying to "fix" my system so I'm using the same regular expressions on the backend as we are the front (validating both sides for obvious security reasons). I've got my regex server side working just fine, but getting it down to the client is a pain. My quickest thought was to simply store it in a data attribute on a tag, grab it, and then validate against it.
Well, me, think again! JS is throwing me for a loop because apparently RegExp interprets the string differently depending on how it's pulled in. Can anyone shed some light on what is happening here, or on how I might go about resolving this issue?
HTML
<span data-regex="(^\\d{5}$)|(^\\d{5}-\\d{4}$)"></span>
Javascript
new RegExp($0.dataset.regex)
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp($($0).data('regex'))
//returns /(^\\d{5}$)|(^\\d{5}-\\d{4}$)/
new RegExp("(^\\d{5}$)|(^\\d{5}-\\d{4}$)");
//returns /(^\d{5}$)|(^\d{5}-\d{4}$)/
Note in the first two how, if I pull the value from the data attribute dynamically, the constructor for RegExp for some reason doesn't interpret the double backslash correctly. If, however, I copy and paste the value as a string and call RegExp on the value, it correctly interprets the double backslash and returns it in the right pattern.
I've also tried simply not escaping the \d on the server side (no doubled backslash), but as you might (or might not) have guessed, the opposite happens. When pulled from attributes/dataset, the \ is completely removed, leading the regex to think I'm looking for the "d" character rather than digits. I'm at a loss for understanding what JS is thinking here. Please send help, Internet

Your data attribute has redundant backslashes. There's no need to escape backslashes in HTML attributes, so you'll actually get a double-backslash where you don't want one. When writing regular expressions as strings in JavaScript you have to escape backslashes, of course.
So you don't actually have the same string on both sides, simply because escaping works differently.
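The difference can be reproduced without a DOM. In this small sketch, String.raw stands in for text read verbatim from an attribute (an assumption made so the example runs anywhere); the variable names are illustrative only.

```javascript
// A JavaScript string literal: each "\\" in the source is ONE backslash in the string.
const fromLiteral = "(^\\d{5}$)|(^\\d{5}-\\d{4}$)";

// String.raw keeps the text exactly as typed, like an HTML attribute value does.
// Here the doubled backslashes survive, as with data-regex="(^\\d{5}$)|...".
const fromAttributeDoubled = String.raw`(^\\d{5}$)|(^\\d{5}-\\d{4}$)`;

// The attribute written with single backslashes, which is what HTML needs.
const fromAttributeSingle = String.raw`(^\d{5}$)|(^\d{5}-\d{4}$)`;

console.log(new RegExp(fromLiteral).test("12345"));          // true
console.log(new RegExp(fromAttributeDoubled).test("12345")); // false
console.log(new RegExp(fromAttributeSingle).test("12345"));  // true
console.log(fromLiteral === fromAttributeSingle);            // true
```

So the attribute should carry the pattern with single backslashes, and only JavaScript source code needs them doubled.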

JS eval() and XSS

If I JS encode untrusted data, and put it into the eval() function, for example like this:
eval('var a="JS_ENCODED_UNTRUSTED_DATA";alert(a);');
How is XSS still possible in that case?
Edit: To clarify what I meant by "JS encode": In Java, I can use the OWASP Java Encoder to encode untrusted data for various contexts. For example, Encode.forHtml(UNTRUSTED_DATA) if I'm inserting untrusted data into HTML, or Encode.forJavaScript(UNTRUSTED_DATA) if I'm inserting untrusted data into JS. It simply encodes or escapes dangerous characters in the input string before inserting it into the HTML page or JavaScript. I'm not exactly sure how the Encode.forJavaScript function encodes each character, but I know that some characters are simply escaped with '\', and some are converted to the \xHH format.

It depends on how you have escaped that "data". Your data is located:
in a "-delimited JavaScript string
inside a '-delimited JavaScript string
possibly inside an HTML <script> element (if not being loaded as an external script).
So you would need to call up to three different escape functions on your data to make it secure. That said, there are really few cases where you actually need eval.
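A hedged sketch of why a single round of JavaScript encoding is not enough here: the outer '-delimited literal is parsed first and decodes the escapes, so eval() receives the payload in raw form. The helper escapeForJsString is hypothetical (a \xHH-style encoder written for this example), and the Function constructor is used only to simulate how the parser decodes the outer literal. The third layer, HTML inside a <script> element, is not covered.

```javascript
// Hypothetical \xHH-style encoder, written for this example only.
function escapeForJsString(value) {
  return String(value).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

const untrusted = '";globalThis.pwned=true;//';

// Build the OUTER single-quoted literal as a server template would emit it,
// then let the parser decode it. The result is the string eval() would get.
function evalArgumentFor(encoded) {
  return new Function("return 'var a=\"" + encoded + "\";';")();
}

const encodedOnce = escapeForJsString(untrusted);
const encodedTwice = escapeForJsString(encodedOnce);

const once = evalArgumentFor(encodedOnce);
const twice = evalArgumentFor(encodedTwice);

console.log(once);  // the payload is back in raw form and breaks out of the inner string
console.log(twice); // the payload is still escaped inside the inner "-delimited string
```

With one layer of encoding, the code handed to eval contains the attacker's statement outside the string. With one layer per nested string, the inner literal still decodes to the original data.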

UTF-8 in HTML input added by JavaScript

I just don't get it.
My case is that my application sends all the needed GUI text as JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: F&uuml;r
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except on input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "F&uuml;r";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.

The notation &uuml; has nothing particular to do with UTF-8. The use of character references is a common way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don't need them.
The notation &uuml; is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (like JavaScript inside an HTML document vs. a separate JavaScript file). This problem is best avoided by using either the characters as such or JavaScript notations for characters.
For example, &uuml; means the same as &#xfc;, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation for this, for use inside string literals, is \u00fc (\u followed by exactly four hexadecimal digits). E.g., the following sets the value to "Für":
document.getElementById('myInputField').value = "F\u00fcr";
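A quick check of the three notations, as a sketch that runs without a browser. The JSON snippet and its "label" key are made up for illustration. PHP's json_encode emits non-ASCII characters in this \u form unless told otherwise, so JSON from the server can carry the character without any HTML entity.

```javascript
const withEscape = "F\u00fcr";   // JavaScript escape sequence
const withLiteral = "Für";       // the character itself (source saved as UTF-8)
const withEntity = "F&uuml;r";   // HTML notation: to JavaScript, just eight ordinary characters

console.log(withEscape === withLiteral); // true
console.log(withEntity.length);          // 8

// JSON text that uses the \u notation decodes to the same string.
const fromJson = JSON.parse('{"label":"F\\u00fcr"}').label;
console.log(fromJson === withLiteral);   // true
```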

You're using what are called HTML entities to encode characters, which is not the same as UTF-8, although of course a UTF-8 string can include HTML entities.
I think the problem is that a value assigned from JavaScript is not parsed as HTML, so entities in it are never decoded; you have to get the actual character into the string some other way when assigning the text input's value. I think you have two options:
Decode the HTML entity on the client side. A rather ugly solution is to piggyback on the decoder available in the browser (I'm using jQuery in the example, but you probably get the point).
inputElement.value = $("<p/>").html("F&uuml;r").text();
Another option, which I think is nicer, is to not send HTML entities in the server response at all, but instead use proper UTF-8 encoding for all characters, which should work fine when put into text nodes or tag attributes. This assumes the HTML page uses UTF-8 encoding, of course.

Escape dynamic strings in JavaScript

I'm writing a script for a signature function in a forum program and any time someone puts a quote or some other JavaScript parse-able character into it, it breaks my program.
Is there a way either to force JavaScript to recognize it as a string without parsing it as script or, failing that, a function that escapes all scripting within a string that will be dynamic?
I did a search and all I could find were endless webpages on how to escape individual characters with a slash - perhaps my search skills need work.

Are you putting the contents of the signature using a server-side language, dynamically, in a JavaScript string literal? That probably isn't the best way to go; you may want to reconsider the way you are doing it.
For example, a better way to do it could be that you could just have an element on the page for the signature (which doesn't have to be visually distinct) and then get the contents of that for use in the script during JavaScript runtime.
If you still wanted to take the route you are going, you could replace \ with \\ first, then replace ' with \' (or " with \" if you are using double-quoted strings in your script) and replace real newlines with the escape sequence \n.
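If the escaping is done in JavaScript (for example in a Node-based back end, which is an assumption here since the question does not say what the server runs), one possible sketch is to lean on JSON.stringify rather than hand-rolled replacements. The name toJsStringLiteral is hypothetical.

```javascript
// Hypothetical helper: turn arbitrary text into a JavaScript string literal,
// quotes included, that is also safe to place inside an inline <script> block.
function toJsStringLiteral(text) {
  return JSON.stringify(String(text))
    .replace(/</g, '\\u003c')       // keeps "</script>" from closing the script element
    .replace(/\u2028/g, '\\u2028')  // line separators break literals in older engines
    .replace(/\u2029/g, '\\u2029');
}

const signature = 'He said "hi"\nit\'s </script> \\o/';
const literal = toJsStringLiteral(signature);

// Simulate the generated script: var sig = <literal>;
const parsedBack = new Function('return ' + literal + ';')();

console.log(literal);
console.log(parsedBack === signature); // true
```

JSON.stringify already handles quotes, backslashes and control characters; the extra replacements cover the cases that matter only because the literal ends up inside HTML.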

Is String.stripTags() enough to mitigate XSS attacks?

I am creating this UTF-8 encoded HTML page where the user can provide input. I wanted to make this XSS proof. I came across this free Javascript framework called Prototype which provides some useful functions. One particular function stripTags essentially strips all tags from the input string.
Would the following input processing prevent XSS?
Perform a thorough UTF-8 decoding of the input (considering all possible UTF-8 representations)
Convert HTML character entities to chars
Run stripTags over the decoded, converted string to remove all possible tags
One of the common comments to antiXSS attempts in Javascript is that the user can bypass the system. How is this possible? In my case, the user using the system is trustworthy. However, other users who may have used the same machine earlier could be malicious.

You only need to change:
& to &amp;
< to &lt;
" to &quot; (if you use single quotes in attributes, also ' to &#39;)
If you've already escaped special HTML characters, then there are no tags in there and strip tags doesn't do anything.
If you use strip tags instead of escaping, then foreign input will be able to escape HTML attributes, e.g.:
<input value="$foo">
if $foo is:
" src="404" onerror="evil()
And if you want to insert untrusted content in JavaScript (inside <script>), then other rules apply:
HTML entities are not interpreted in <script>, so don't use them there for escaping.
Use JavaScript string escaping rules (\ → \\, " → \") and replace all occurrences of </ with <\/.
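The HTML part of this can be sketched as follows. The names escapeHtml and naiveStripTags are hypothetical; the latter is only a stand-in for a tag-stripping function, not Prototype's actual implementation.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
// and in quoted attribute values. The ampersand must be replaced first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Stand-in for a tag stripper, for comparison only.
function naiveStripTags(text) {
  return String(text).replace(/<[^>]*>/g, '');
}

const foo = '" src="404" onerror="evil()';

const stripped = '<input value="' + naiveStripTags(foo) + '">';
const escapedMarkup = '<input value="' + escapeHtml(foo) + '">';

console.log(stripped);      // the payload contains no tags, so it passes through untouched
console.log(escapedMarkup); // the quotes are neutralised and the value stays one attribute
```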

If the JavaScript framework is run on the machine where the input is provided, then this is not secure (you would be trusting content from a potentially malicious client).
In cases where you are running it on the client machine just prior to displaying data, it would depend on what other vulnerabilities are in your code.
A good rule of thumb is that security constraints are typically applied on the server side, where a user can't simply go around them. From what I remember, PHP has a strip_tags function for this, and there is similar functionality in Apache's StringEscapeUtils for Java. I am sure there is something similar in .NET.
