If I JS encode untrusted data, and put it into the eval() function, for example like this:
eval('var a="JS_ENCODED_UNTRUSTED_DATA";alert(a);');
How is XSS still possible in that case?
Edit: To clarify what I meant by "JS encode": In Java, I can use the OWASP Java Encoder to encode untrusted data for various contexts. For example, Encode.forHtml(UNTRUSTED_DATA) if I'm inserting untrusted data into HTML, or Encode.forJavaScript(UNTRUSTED_DATA) if I'm inserting untrusted data into JS. It simply encodes or escapes dangerous characters in the input string before inserting it into the HTML page or JavaScript. I'm not exactly sure how the Encode.forJavaScript function encodes each character, but I know that some characters are simply escaped with '\', and some are converted to the \xHH format.
It depends on how you have escaped that "data". Your data is located
in a "-delimited JavaScript string
inside a '-delimited JavaScript string
possibly inside an HTML <script> element (if not being loaded as an external script).
So you would need to call up to three different escape functions on your data to make it secure. That said, there are really few cases where you actually need eval.
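As a rough sketch of why a single pass is not enough, assume the statement text is generated by code. The helper names below (escapeJsString, buildEvalStatement) are hypothetical and are not how the OWASP Java Encoder works internally. The helper writes backslashes, both quote characters, line terminators and <, >, & as \uXXXX, so the same function can be reused for each layer.

```javascript
// Hypothetical helper: \uXXXX-escape every character that is special in a
// JavaScript string literal or that could end a <script> element.
function escapeJsString(s) {
  return s.replace(/[\\'"\n\r\u2028\u2029<>&]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
}

// Builds the text of the statement: eval('var a="DATA";a');
function buildEvalStatement(untrusted) {
  const inner = escapeJsString(untrusted); // layer 1: the "-delimited string
  const outer = escapeJsString(inner);     // layer 2: the '-delimited string
  // Layer 3 (the <script> element) is covered because < never appears raw.
  return "eval('var a=\"" + outer + "\";a');";
}

const payload = "\";alert(1);//'</script>";
const statement = buildEvalStatement(payload);
```

With only the first pass, the outer string literal would decode \u0022 back into a double quote before eval ever sees it, and the payload would break out of the inner string. The second pass escapes the backslashes that the first pass introduced.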
I am going through all the OWASP rules for DOM Based XSS prevention and trying to get a full understanding of each rule. I'm a bit stuck on this rule:
"RULE #2 - JavaScript Escape Before Inserting Untrusted Data into HTML Attribute Subcontext within the Execution Context"
See here:
https://www.owasp.org/index.php/DOM_based_XSS_Prevention_Cheat_Sheet#RULE_.232_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_HTML_Attribute_Subcontext_within_the_Execution_Context
The problem is that I'm not sure what method to use for "JavaScript escaping" on the front end. I know it is not a very likely use case, because most front-end developers would generally avoid inserting untrusted data into an HTML attribute in the first place, but nonetheless I would like to fully understand this rule by understanding exactly what the escape method should be. Is there a simple JavaScript escape method people typically use on the front end? Thanks!
EDIT: Other answers I find on Stack Overflow all mention HTML escapers. I'm specifically looking for a JavaScript escaper, and I want to know why OWASP specifically uses the term "javascript escaper" if, as some people would suggest, an HTML escaper is sufficient.
Perhaps the question could also be phrased as: "In the context of OWASP's cheat sheet for DOM Based XSS, what is the difference between HTML escaping and JavaScript escaping?" Please give an example of JavaScript escaping.
The escaping needed depends on the context that a value is inserted in. Using the wrong escaping may let through characters that are special in the target context but not in the context the escaper was written for, or it may corrupt the values.
JavaScript escaping is for values that are inserted directly into a JavaScript string literal via a server-side templating language.
So the example they have is:
x.setAttribute("value", '<%=Encoder.encodeForJS(companyName)%>');
Here, the value of companyName is inserted into a script, surrounded by single quotes, making it a JavaScript string literal. The special characters here are things like quotes, new lines, and some Unicode whitespace characters. These should be converted to JavaScript escape sequences. So a quote would become \x27 rather than the HTML entity &#39;. If you were to use HTML encoding then a quote character would be displayed as &#39; and a newline character would cause a syntax error. JavaScript encoding can be done in Java with encodeForJavaScript, or PHP with json_encode.
It's inserted into a JavaScript value so it should be JavaScript encoded. People are used to HTML encoding attributes but this only makes sense when directly inserting into the HTML, not when using the setAttribute DOM method. The encoding needed is the same as if it were like:
var x = '<%=Encoder.encodeForJS(companyName)%>';
The attribute doesn't need to be HTML encoded because it's not in an HTML context. HTML encoding is needed when the value is inserted directly into an attribute like:
<input value='<%=Encoder.encodeForHTML(companyName)%>'>
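To make the difference concrete, here is a minimal sketch that runs both kinds of encoder over the same value and places the result in a single-quoted JavaScript string literal. Both helpers (encodeForJsString, encodeForHtml) are hypothetical stand-ins written for this illustration; they are not the ESAPI or OWASP Encoder implementations, and the allow-list of characters is an assumption.

```javascript
// Hypothetical JavaScript-string encoder: everything outside a small
// allow-list becomes \xHH (or \uHHHH above U+00FF).
function encodeForJsString(value) {
  return value.replace(/[^A-Za-z0-9 ,._]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

// Hypothetical HTML encoder, for comparison only.
function encodeForHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const companyName = "O'Reilly\nMedia";

// Correct for this context: 'O\x27Reilly\x0aMedia'
const jsLiteral = "'" + encodeForJsString(companyName) + "'";

// Wrong for this context: the quote turns into the visible text &#39; and
// the raw newline makes the string literal a syntax error.
const wrongLiteral = "'" + encodeForHtml(companyName) + "'";
```

Parsing jsLiteral as JavaScript gives back the original value, while wrongLiteral does not parse at all.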
I have description field in a form.
As suggested here, HTML escaping should not be done on input, so if you enter <h1>Description</h1> it is saved like this to the database.
The problem is that I have defined a REST API, and the output "could" be HTML.
Should I escape the field when constructing the JSON, or should I output HTML in JSON and let the client escape it?
I feel I should escape the HTML server side, but then this operation would cost processing time. On the other hand, escaping on the client saves this server time, but people using the API without carefully escaping HTML could end up with XSS attacks.
A client may, probably will, be a Javascript client which should process such potential HTML values using the DOM API:
document.getElementById('output').textContent = json.result;
Using this DOM API is perfectly safe and does not require escaping json.result, since it's never interpolated as HTML, but treated as a text node by a higher-level API. If you send escaped HTML and the client is doing it properly like here, then escaped HTML will be shown on the client; i.e. you're turning your data into garbage.
So, no, never escape values for unrelated contexts. Escape/encode for JSON when putting values into JSON, don't worry about what may or may not happen later.
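A minimal sketch of that division of labour, assuming the server happens to be written in JavaScript too. The showResult function and the "output" element id are illustrative names, not part of any particular API.

```javascript
// Server side: encode for JSON only; the stored value is sent unchanged.
const stored = "<h1>Description</h1>";
const body = JSON.stringify({ result: stored });

// Client side: assign the value as text, never as markup.
function showResult(element, responseText) {
  const json = JSON.parse(responseText);
  element.textContent = json.result; // shown literally, not parsed as HTML
}

// In a browser:
// showResult(document.getElementById("output"), body);
```

Had the server sent &lt;h1&gt;Description&lt;/h1&gt; instead, the same client code would display those entity references literally.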
I have a function to escape HTML tags, to be able to insert text into HTML.
Very similar to:
Can I escape html special chars in javascript?
I know that JavaScript uses Unicode internally, but HTML pages may be encoded in different charsets like UTF-8 or ISO-8859-1, etc.
My question is: Is there any issue with this very simple conversion, or should I take the page charset into consideration?
If yes, how do I handle that?
PS: For example, the equivalent PHP function (http://php.net/manual/en/function.htmlspecialchars.php) has a parameter to select a charset.
No, JavaScript lives in the Unicode world so encoding issues are generally invisible to it. escapeHtml in the linked question is fine.
The only place I can think of where JavaScript gets to see bytes would be data: URLs (typically hidden beneath base64). So this:
var markup = '<p>Hello, '+escapeHtml(user_supplied_data);
var url = 'data:text/html;base64,'+btoa(markup);
iframe.src = url;
is in principle a bad thing. Although I don't know of any browsers that will guess UTF-7 in this situation, a charset=... parameter should be supplied to ensure that the browser uses the appropriate encoding for the data. (btoa uses ISO-8859-1, for what it's worth.)
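One way to supply the charset, sketched under the assumption that UTF-8 is wanted and that TextEncoder is available. The function name toUtf8DataUrl is made up for this example. Because btoa only accepts characters up to U+00FF, the markup is first turned into UTF-8 bytes and then into a byte string.

```javascript
// Hypothetical helper: build a data: URL whose bytes really are UTF-8 and
// whose charset parameter says so.
function toUtf8DataUrl(markup) {
  const bytes = new TextEncoder().encode(markup); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, as btoa expects
  }
  return "data:text/html;charset=utf-8;base64," + btoa(binary);
}

const url = toUtf8DataUrl("<p>Hello, F\u00fcr</p>");
// In a browser: iframe.src = url;
```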
I just don't get it.
My case is that my application is sending all the needed GUI text by JSON at page startup from my PHP server. On my PHP server I have all text special characters written in UTF-8. Example: F&uuml;r
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except on input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "F&Ouml;r";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.
The notation &uuml; has nothing particular to do with UTF-8. The use of character references is a common way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don't need them.
The notation &uuml; is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (like JavaScript inside an HTML document vs. separate JavaScript file). This problem is best avoided by using either characters as such or by using JavaScript notations for characters.
For example, &uuml; means the same as &#xfc;, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation, for use inside string literals, for this is \u00fc (\u followed by exactly four hexadecimal digits). E.g., the following sets the value to “Für”:
document.getElementById('myInputField').value = "F\u00fcr";
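If the text comes from somewhere else and has to be written into ASCII-only JavaScript source, the \uXXXX notation can be generated mechanically. The following is only a sketch; toAsciiJsString is a hypothetical helper name, and it also escapes the double quote and backslash so that the result can sit between double quotes.

```javascript
// Hypothetical helper: rewrite every non-printable-ASCII code unit, plus
// the double quote and backslash, as a \uXXXX escape.
function toAsciiJsString(text) {
  return text.replace(/[^\x20-\x7e]|["\\]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
}

const escaped = toAsciiJsString("Für"); // F\u00fcr
const literal = '"' + escaped + '"';    // "F\u00fcr"
```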
You're using what are called HTML entities to encode characters, which is not the same as UTF-8, but of course a UTF-8 string can include HTML entities.
I think the problem is that a value assigned through the DOM from JavaScript is not parsed as HTML, so entities are not decoded there; you have to decode them yourself or use some other encoding when assigning the text input's value. I think you have two options:
Decode the HTML entity on the client side. A quite ugly solution is to piggyback on the decoder available in the browser (I'm using jQuery in the example, but you probably get the point). Only do this with text you trust, since .html() parses its argument as markup.
inputElement.value = $("<p/>").html("F&Ouml;r").text();
Another option, which I think is nicer, is to not send HTML entities in the server response but instead use proper UTF-8 encoding for all characters, which should work fine when put into text nodes or tag attributes. This assumes the HTML page uses UTF-8 encoding, of course.
I am creating this UTF-8 encoded HTML page where the user can provide input. I wanted to make this XSS proof. I came across this free Javascript framework called Prototype which provides some useful functions. One particular function stripTags essentially strips all tags from the input string.
Would the following input processing prevent XSS?
Perform a thorough UTF-8 decoding of the input (considering all possible UTF-8 representations)
Convert HTML character entities to chars
Run stripTags over the decoded, converted string to remove all possible tags
One of the common comments to antiXSS attempts in Javascript is that the user can bypass the system. How is this possible? In my case, the user using the system is trustworthy. However, other users who may have used the same machine earlier could be malicious.
You only need to change:
& to &amp;
< to &lt;
" to &quot; (if you use single quotes in attributes, also ' to &#39;)
If you've already escaped special HTML characters, then there are no tags in there and strip tags doesn't do anything.
If you use strip tags instead of escaping, then foreign input will be able to escape HTML attributes, e.g.:
<input value="$foo">
if $foo is:
" src="404" onerror="evil()
And if you want to insert untrusted content in JavaScript (inside <script>), then other rules apply:
HTML entities are not interpreted in <script>, so don't use them there for escaping.
Use JavaScript string escaping rules (\ → \\, " → \") and replace all occurrences of </ with <\/.
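The rules in this answer can be sketched as a few small helpers. All names here are illustrative, and stripTags is a simplified stand-in for a tag stripper, not Prototype's actual implementation. The script-string helper covers only the three replacements listed above; a real encoder would also need to handle line terminators.

```javascript
// Simplified stand-in for a tag stripper (assumption, not Prototype's code).
function stripTags(s) {
  return s.replace(/<\/?[^>]+>/g, "");
}

// HTML escaping as listed above: & first, then <, " and '.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The payload contains no tags, so stripping leaves it untouched.
const foo = '" src="404" onerror="evil()';
const stripped = '<input value="' + stripTags(foo) + '">';
// <input value="" src="404" onerror="evil()">
const escaped = '<input value="' + escapeHtml(foo) + '">';
// <input value="&quot; src=&quot;404&quot; onerror=&quot;evil()">

// Inside <script>: backslash first, then the quote, then </.
function escapeForScriptString(s) {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/<\//g, "<\\/");
}

const comment = 'He said "hi"</script><script>evil()';
const scriptBody = 'var comment = "' + escapeForScriptString(comment) + '";';
```

The stripped variant lets the input add its own attributes to the element, while the escaped variant keeps everything inside the value attribute.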
If the JavaScript framework is run on the machine where the input is provided, then this is not secure (you would be trusting content from a potentially malicious client).
In cases where you are running it on the client machine just prior to displaying data, it would depend on what other vulnerabilities are in your code.
A good rule of thumb is that security constraints are typically applied on the server side, where a user can't simply go around them. From what I remember, PHP has a strip_tags function for this, and there is similar functionality in Apache's StringEscapeUtils for Java. I am sure there is something similar in .NET.