HTML in JSON that "should" not be there - javascript

I have a description field in a form.
As suggested here, HTML escaping should not be done on input, so if you enter <h1>Description</h1> it is saved to the database exactly like that.
The problem is that I have defined a REST API, and the output "could" be HTML.
Should I escape the field when constructing the JSON, or should I output HTML in JSON and let the client escape it?
I feel I should escape the HTML server side, but then this operation would cost processing time. On the other hand, leaving the escaping to the client saves this server time, but people using the API who do not carefully escape HTML could end up exposed to XSS attacks.

A client may, probably will, be a Javascript client which should process such potential HTML values using the DOM API:
document.getElementById('output').textContent = json.result;
Using this DOM API is perfectly safe and does not require escaping json.result, since the value is never interpolated as HTML but treated as a text node by a higher-level API. If you send escaped HTML and the client handles it properly, as here, then the escaped HTML will be shown literally on the client; i.e. you're turning your data into garbage.
So, no, never escape values for unrelated contexts. Escape/encode for JSON when putting values into JSON, don't worry about what may or may not happen later.
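As a minimal sketch of that advice (the record object and its result field are made up for illustration), JSON encoding on its own leaves the markup untouched and only escapes what JSON requires, so the client gets back exactly the stored string:

```javascript
// Hypothetical record as stored in the database: raw, unescaped user input.
const record = { result: '<h1>Description</h1> with "quotes"' };

// Server side: encode for JSON only. The quotes are escaped for JSON,
// the angle brackets are left alone.
const body = JSON.stringify(record);
console.log(body);

// Client side: parsing returns the original string unchanged, ready to be
// assigned to textContent (or escaped for whatever context comes next).
const parsed = JSON.parse(body);
console.log(parsed.result === record.result);
```

Had the server HTML-escaped the value first, a client assigning it to textContent would display the entities literally.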

Related

Storing long formatted text for a web app

My intention is to store books and other large blobs of formatted text (from 100 to thousands of words per chapter) to be displayed with their formatting in an application built with the Aurelia framework. I would prefer using JSON, but I could try other alternatives. The text has been written using Google Docs.
So far, trying to use JSON, Visual Studio Code says Unexpected end of string at the first carriage return, and the application gives me an error in the console:
Unhandled rejection SyntaxError: Unexpected token in JSON at position 780
Is there any way to indicate to JSON that something is formatted text, or any decent alternative?
Your JSON has characters in it that aren't properly escaped. Most likely these are quote " characters, each of which needs \" in its place, and raw line breaks, which must be written as \n inside a JSON string. Unless you have a particularly robust workflow set up to handle transcribing, you're going to run into this problem a lot with large documents, especially ones coming from a word processor.
Instead, why not simply store the material as HTML? It is specifically designed to store and markup documents. It has headings, paragraphs, lists, etc. Browsers are already equipped to display it without doing any processing and it can be easily injected into your application by simply appending it to any element on the page.
Additionally, Google Docs should be able to save the document as HTML directly, so you don't have to do any manual markup.
You need to escape special characters. This discussion may help. Note that you will probably have your own list of escaped characters, which depends on your source string.
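If the text does end up in JSON, letting a serializer do the escaping avoids hand-editing. A small sketch, with a made-up chapter string, showing that JSON.stringify escapes both quotes and line breaks, and that a raw line break inside a JSON string is rejected by the parser:

```javascript
// Made-up sample text containing a quote and a line break.
const chapter = 'He said "wait".\nA new paragraph starts here.';

// JSON.stringify escapes the quote as \" and the line break as \n.
const stored = JSON.stringify({ title: "Chapter 1", text: chapter });
console.log(stored);

// Parsing restores the original text, line break included.
const restored = JSON.parse(stored);
console.log(restored.text === chapter);

// Hand-written JSON with a raw line break inside a string does not parse.
let rawNewlineRejected = false;
try {
  JSON.parse('{"text": "line one\nline two"}');
} catch (err) {
  rawNewlineRejected = err instanceof SyntaxError;
}
console.log(rawNewlineRejected);
```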

JS eval() and XSS

If I JS encode untrusted data, and put it into the eval() function, for example like this:
eval('var a="JS_ENCODED_UNTRUSTED_DATA";alert(a);');
How is XSS still possible in that case?
Edit: To clarify what I meant by "JS encode": In Java, I can use OWASP Java Encoder to encode untrusted data for various contexts. For example Encoder.forHTML(UNTRUSTED_DATA) if I'm inserting untrusted data into HTML or Encoder.forJavaScript(UNTRUSTED_DATA) if I'm inserting untrusted data into JS. It simply encodes or escapes dangerous characters in the input string before inserting it into the HTML page or JavaScript. I'm not exactly sure how the Encoder.forJavaScript function encodes each character, but I know that some characters are simply escaped with '\', and some are converted to the \xHH format.
It depends on how you have escaped that "data". Your data is located:
- in a "-delimited JavaScript string,
- inside a '-delimited JavaScript string,
- possibly inside an HTML <script> element (if not being loaded as an external script).
So you would need to call up to three different escape functions on your data to make it secure. That said, there are really few cases where you actually need eval.
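The first two layers can be sketched as follows. This is an illustration under stated assumptions, not a vetted encoder: encodeDq, encodeSq and buildScript are hypothetical helpers, the result of buildScript stands in for the script source a server would emit, the third layer (the HTML <script> context, e.g. a literal </script> in the data) is not handled, and a maintained encoder is preferable in real code.

```javascript
// Layer 1: body of a "-delimited JS string literal.
// JSON.stringify escapes ", \ and control characters; slice drops its outer quotes.
const encodeDq = (s) => JSON.stringify(s).slice(1, -1);

// Layer 2: body of a '-delimited JS string literal (backslashes first, then quotes).
const encodeSq = (s) => s.replace(/\\/g, "\\\\").replace(/'/g, "\\'");

// Builds the script text a server would send: eval('var a="...";a;')
function buildScript(data) {
  const innerCode = 'var a="' + encodeDq(data) + '";a;';
  return "eval('" + encodeSq(innerCode) + "')";
}

// Untrusted input that tries to break out of both kinds of quotes.
const untrusted = `';globalThis.pwned=true;'"\\`;
globalThis.pwned = false;

// Simulate the browser running the generated script.
const result = eval(buildScript(untrusted));
console.log(result === untrusted, globalThis.pwned);
```

With only one of the two encoders applied, the same input would terminate one of the string literals early, which is why each nesting level needs its own escaping step.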

Chars sanitization and XSS

I was doing Google's XSS game (https://xss-game.appspot.com/level4) and I managed to solve the 4th level. I didn't completely understand how, though.
I don't understand why, if I inject the encoded version of a char (let's say %3B), it is translated into the char itself (that is, ';') inside the final HTML page. I mean, who does this, the browser? Why?
Furthermore, I don't understand where in the code the injected chars are checked. I made some tests and I've seen that if I try to inject strings like '()';"' whatever comes after the ; is cut out! Where does this happen in the code?
Finally, if I inject a tag like <asd>, it is encoded within the <div> (that is, &lt;asd&gt;), but it is not encoded in the onload attribute of the <img> tag. Where in the code is this performed?
(This answer makes a number of assumptions because I don't have access to Google's client side or server side code (the link goes to an error page because I haven't played the game to reach the level)).
The URL parser (probably part of the server-side code) is responsible for converting percent-encoded data in URLs into characters.
; is a separator between key/value pairs in form encoding syntax (an older alternative to &). The URL parser will cut off the value at that point.
Responsibility for converting text into HTML is usually given to the template engine, but might be done in some general server side code before data gets to the template (assuming there is a template, the general server side code might just smash strings together).
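The two parser behaviours described above can be imitated with a small sketch. parseQuery is a hypothetical parser written for illustration (it is not the game's code), and the timer parameter name is likewise an assumption:

```javascript
// Decode one percent-encoded component; + means space in form encoding.
const decode = (s) => decodeURIComponent(s.replace(/\+/g, " "));

// Hypothetical legacy-style query parser: both & and ; separate pairs.
function parseQuery(query) {
  const params = {};
  for (const pair of query.split(/[&;]/)) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    params[decode(rawKey)] = decode(rawValue);
  }
  return params;
}

// A literal ; ends the value early, so the rest becomes a separate key.
const cut = parseQuery("timer=3;alert(1)");
console.log(cut.timer);

// An encoded %3B survives the split and is decoded into ; afterwards.
const kept = parseQuery("timer=3%3Balert(1)");
console.log(kept.timer);
```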
In order to manage level 4 just enter
')*alert('xss

using decodeURIComponent within asp.net

I encoded an HTML text property using JavaScript and stored it in my database in that form.
For example, for a string like "Wales&PALS", the call
encodeURIComponent(e.value);
converted it to "Wales%26PALS".
I want to convert it back to "Wales&PALS" in ASP.NET. Any idea on how to embed
decodeURIComponent(datatablevalues)
in my asp.net function to return the desired text?
To prevent SQL injection we use parameterized queries or stored procedures. Encoding isn't really suitable for that. HTML encoding is useful if you expect your users to add content to your website and you want to prevent them from injecting malicious JavaScript, for instance. With the string encoded, the browser just prints out its contents. What you're doing is encoding the string and adding it to the database, but then decoding it back to its original state and displaying it to clients. That way you're vulnerable to many kinds of JavaScript injection.
If that's what you intended, no problem, just be aware of the consequences. Know "why" and "how" every time you make a decision like this. It's kinda dangerous.
For instance, if you wanted to let your users add HTML tags as a means of enhancing the inserted content, a more secure alternative would be to create your own set of tags (or use an existing one like BBCode), so the input never contains any HTML markup, and when you insert it into the database, simply parse it first to switch to real HTML tags. The ASP.NET engine will never allow malicious input during a request (unless you deliberately force it to do so), and because you already control parsing of the input, you can be sure it's secure when you output it, so there's no need for additional processing.
Just an idea for you :)
If you really insist on doing it your way (encode -> db -> decode -> output), there are several ways to do that. I'll show you one example:
You could create a new get-only property that returns your decoded data (you still keep the original encoded data in case you need it). Something like this:
// Get-only property: decodes on read, leaving originalData untouched.
public string DecodedData
{
    get
    {
        return HttpUtility.UrlDecode(originalData);
    }
}
http://msdn.microsoft.com/en-us/library/system.web.httputility.aspx
If you're trying to encode HTML input, maybe you'd be better off with a different encoding mechanism. I'm not sure JavaScript's encodeURIComponent is the right tool for encoding HTML.
Try UrlDecode in HttpServerUtility. API page for it
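For reference, this is the round trip on the JavaScript side, using the string from the question; the server-side decode discussed above would play the role of decodeURIComponent here. Note that & is encoded as %26, while %20 is what a space becomes:

```javascript
const original = "Wales&PALS";

// & is a reserved character, so it is percent-encoded.
const encoded = encodeURIComponent(original);
console.log(encoded); // Wales%26PALS

// Decoding restores the original text.
const decoded = decodeURIComponent(encoded);
console.log(decoded === original); // true

// With spaces around the ampersand, each space becomes %20.
const withSpaces = encodeURIComponent("Wales & PALS");
console.log(withSpaces); // Wales%20%26%20PALS
```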

How do I strip malicious HTML (XSS etc.) from content submissions?

I have a content submission form that contains multiple fields for input, all of which, when submitted, are entered directly into the database. When this content is requested, it is printed.
I have realized this is a security issue.
How can I strip malicious HTML (XSS) only, while still allowing formatting tags (b, i etc.)?
@pst is correct: you need to explicitly allow certain tags. But the problem is that the input can be all over the place, so you'll need to use a library like HTML Tidy to get it into a state where you can then pass the cleaned document to DOMDocument::loadHTML.
You should use HTML Tidy to clean your input and get it into a compliant state so you can then explicitly allow certain tags. Everything else should be removed from your cleaned content before it's permanently stored. (NOTE: for performance reasons do not store BLOBs in your database; store them in your file system and link to them with a file path in a secure location, i.e. a location that is not in your web root.)
Good luck.
First run htmlspecialchars on the input and then undo it for the allowed tags (for example, replace &lt;b&gt; with <b>).
Use mysql_stripslashes(), htmlspecialchars() and urldecode(); for integer values you can probably just cast to int.
Strictly define which "innocent" html tags you are going to allow - like <strong> or <em>. Then run a regex to accept only those you want while rejecting all others.
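The "escape everything, then re-enable an allowlist" idea from the answers above might look like the sketch below. The helper names and the tag list are assumptions made for illustration; only bare tags without attributes are restored, and the result may still contain unbalanced tags, so a maintained sanitizer library is the safer choice for production.

```javascript
// Escape the characters that are significant in HTML (& must come first).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical allowlist of formatting tags.
const ALLOWED_TAGS = ["b", "i", "strong", "em"];

// Escape everything, then undo the escaping only for bare allowed tags.
function allowBasicFormatting(input) {
  const escaped = escapeHtml(input);
  const pattern = new RegExp("&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;", "gi");
  return escaped.replace(pattern, "<$1$2>");
}

const cleaned = allowBasicFormatting(
  '<b>bold</b><script>alert(1)</script><b onclick="x">y</b>'
);
console.log(cleaned);
```

The <script> element and the <b> tag carrying an attribute stay escaped, because neither matches the bare-tag pattern.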
I think encoding the input would help...
For PHP I believe it is:
htmlspecialchars
There are several ways to handle this.
First off, let's be clear: to do this in a secure manner, it cannot be done in JavaScript, only on the server side. Using client-side JavaScript to securely enforce input sanitization is doomed to fail.
Encode the chars that make up html when you output user generated data
When the user-generated data is output on your webpage, change a few of the characters to make it secure. Namely, the characters <, > and & should be changed to &lt;, &gt; and &amp; respectively.
This is the best way to do it if the user should be allowed to edit the text, since you don't actually alter the text in storage, and you can let the user change the unmodified text via a textarea.
Encode the chars that make up html when you store the user generated data
Do the same as above, but do it before you store the data in your db.
This has a performance upside, since you don't need to encode it every time you output it, but it will not let your users edit the unmodified text, which can be a serious downside, depending on what you are building.
Strip the characters before output or storage
Strip the < and > characters before either output or storage - this is not a very good solution in my opinion, since it is an unnecessary altering of user input, but some people prefer it.
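The first option (store the raw text, encode on output) can be sketched like this. encodeForOutput and the stored value are hypothetical; the snippet only demonstrates the transformation itself, which per the answer above belongs on the server, and it covers just the three characters named there (attribute contexts would also need quotes encoded).

```javascript
// Encode the three characters named above; & must be replaced first so the
// ampersands introduced by the other replacements are not encoded again.
function encodeForOutput(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical value as stored: unmodified, so it can be edited later.
const storedText = "<b>Tom & Jerry</b>";

// Encoding happens only at output time; storage is never altered.
const rendered = encodeForOutput(storedText);
console.log(rendered);   // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
console.log(storedText); // <b>Tom & Jerry</b>
```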
