How to handle sanitizing in JavaScript editors that allow formatting

Many editors, like Medium, offer formatting now. From what I see in the DOM, they simply add HTML. But how do you sanitize this kind of input without losing the formatting applied by the user?
E.g. clicking bold adds:
<strong class="markup--strong markup--p-strong">text</strong>
but you wouldn't want to render that if the user enters it themselves. So how is that different? Also, would it be different if you styled with Markdown, but didn't let users enter their own Markdown and made it accessible only through the browser?
One way I could think of is escaping every HTML special character, but that seems odd. As far as I know, you sanitize the content only when outputting it.

You should use a server-side sanitizer, as stated by Vipin, since client-side validation is prone to tampering.
OWASP (Open Web Application Security Project) has some guides and sanitizers that you may use like the java-html-sanitizer.
For a generic brief on the concept please read this https://www.owasp.org/index.php/Data_Validation under the section Sanitize.

You could replace the white-listed elements with placeholder characters, for example:
<strong.*> becomes |strong|
Then you remove ALL other HTML. Be aware of onmouseover="alert(1)" so keep it really simple.
Also be careful when rendering the user input. Don't just add it as code. Instead parse it and create the elements using JavaScript. Never use innerHTML, but do use .innerText and document.createElement().
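The placeholder idea above can be sketched roughly as follows. This is only an illustration under some assumptions: the allowed tag list, the function names toPlaceholders and tokenize, and the use of a control character instead of | as the placeholder delimiter (so that pipes typed by the user cannot be mistaken for placeholders) are all made up for this sketch. It returns a list of tokens rather than an HTML string, so a renderer can build the elements with document.createElement() and set text with textContent or innerText.

```javascript
// Tags the editor is allowed to produce (an assumption for this sketch).
const ALLOWED_TAGS = ["strong", "em"];
// Placeholder delimiter: a control character that is stripped from the input first.
const MARK = "\u0001";

// Step 1: swap allowed tags for placeholders, dropping every attribute
// (so onmouseover="..." never survives), then remove all other tags.
function toPlaceholders(html) {
  let text = html.split(MARK).join(""); // the user must not be able to forge placeholders
  for (const tag of ALLOWED_TAGS) {
    const open = new RegExp("<" + tag + "(\\s[^>]*)?>", "gi");
    const close = new RegExp("</" + tag + "\\s*>", "gi");
    text = text.replace(open, MARK + tag + MARK).replace(close, MARK + "/" + tag + MARK);
  }
  return text.replace(/<[^>]*>/g, ""); // strip every remaining tag
}

// Step 2: split the placeholder string into text/open/close tokens.
function tokenize(marked) {
  const tokens = [];
  const pattern = new RegExp(MARK + "(/?)(" + ALLOWED_TAGS.join("|") + ")" + MARK, "g");
  let last = 0;
  let match;
  while ((match = pattern.exec(marked)) !== null) {
    if (match.index > last) {
      tokens.push({ type: "text", value: marked.slice(last, match.index) });
    }
    tokens.push({ type: match[1] ? "close" : "open", tag: match[2] });
    last = pattern.lastIndex;
  }
  if (last < marked.length) {
    tokens.push({ type: "text", value: marked.slice(last) });
  }
  return tokens;
}
```

A renderer would then walk the tokens, calling document.createElement(token.tag) for "open" tokens and assigning textContent for "text" tokens. Note that this sketch does not decode HTML entities or check that tags are balanced, and the same filtering still has to be repeated on the server.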

Related

Javascript library to manage translation forms

Is anybody aware of any javascript tool (compatible with jQuery, tinymce or any other clientside library) able to manage the following requirements?
I need to show translation forms in which every field (either input or textarea) could contain some segment variables or code sections (mostly HTML).
For example:
"Hello {{firstname}}, this is your personal page."
or
"You improved your personal score of <strong>{{n}} points</strong>."
Of course I obtain these segments from a template parser and I need to show them to a set of translators that will perform localization towards many languages. I know that in many cases I can (and should!) avoid variables and code inside translation segments, but in many other cases I really can't.
The problem is: I would like to manage coherence about variables and code directly on the browser (I trust my translators but a bit more of UI/UX help is always a good thing!).
A nice approach could be providing the set of variables and code tags, ready to be inserted by means of a single click (in order to avoid misspelled variables or incorrect code syntax) and a bit of pre-submit validation to be sure everything was inserted.
I've seen this approach in other websites, such as Facebook or Freelancer.com (who have the power and the ability to reimplement the whole thing from scratch!).
Do you know about any almost-ready tool/library for this purpose?
Thank you all in advance for any suggestion.

If you are asking for a library to translate text - here is Google Translate API: https://developers.google.com/translate/?csw=1
If you are asking for a library which can take user input, perform validation, and insert into the DOM - then jQuery has everything you need.
If you are asking for something else, let me know and I'll edit my answer.
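For the pre-submit validation mentioned in the question, a small hand-written check may already go a long way. The sketch below is not taken from any library: the function names extractMarkers and findMissingMarkers are hypothetical, and it assumes variables look like {{name}} and that code sections are plain HTML tags, as in the examples above. It reports every variable or tag from the source segment that is absent from the translation.

```javascript
// Collect {{variable}} placeholders and HTML tags from a segment.
function extractMarkers(segment) {
  const variables = segment.match(/\{\{\s*[\w.]+\s*\}\}/g) || [];
  const tags = segment.match(/<\/?[a-zA-Z][^>]*>/g) || [];
  // Normalise "{{ n }}" to "{{n}}" so spacing differences are not reported.
  return variables.map((v) => v.replace(/\s+/g, "")).concat(tags);
}

// Return the markers of the source segment that the translation lacks.
// Each marker in the translation can satisfy only one marker of the source.
function findMissingMarkers(source, translation) {
  const remaining = extractMarkers(translation);
  const missing = [];
  for (const marker of extractMarkers(source)) {
    const index = remaining.indexOf(marker);
    if (index === -1) {
      missing.push(marker);
    } else {
      remaining.splice(index, 1);
    }
  }
  return missing;
}
```

A form handler could call findMissingMarkers(sourceText, field.value) before submitting and block the submit (or show a warning) when the returned array is not empty.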

Security comparison of eval and innerHTML for clientside javascript?

I've been doing some experimenting with innerHTML to try and figure out where I need to tighten up security on a webapp I'm working on, and I ran into an interesting injection method on the mozilla docs that I hadn't thought about.
var name = "<img src=x onerror=alert(1)>";
element.innerHTML = name; // Instantly runs code.
It made me wonder a.) if I should be using innerHTML at all, and b.) if it's not a concern, why I've been avoiding other code insertion methods, particularly eval.
Let's assume I'm running javascript clientside on the browser, and I'm taking necessary precautions to avoid exposing any sensitive information in easily accessible functions, and I've gotten to some arbitrarily designated point where I've decided innerHTML is not a security risk, and I've optimized my code to the point where I'm not necessarily worried about a very minor performance hit...
Am I creating any additional problems by using eval? Are there other security concerns other than pure code injection?
Or alternatively, is innerHTML something that I should show the same amount of care with? Is it similarly dangerous?

tl;dr;
Yes, you are correct in your assumption.
Setting innerHTML is susceptible to XSS attacks if you're adding untrusted code.
(If you're adding your code though, that's less of a problem)
Consider using textContent if you want to add text that users added, it'll escape it.
What the problem is
innerHTML sets the HTML content of a DOM node. When you set the content of a DOM node to an arbitrary string, you're vulnerable to XSS if you accept user input.
For example, if you set the innerHTML of a node based on the input of a user from a GET parameter. "User A" can send "User B" a version of your page with the HTML saying "steal the user's data and send it to me via AJAX".
See this question here for more information.
What can I do to mitigate it?
What you might want to consider if you're setting the HTML of nodes is:
Using a templating engine like Mustache which has escaping capabilities. (It'll escape HTML by default)
Using textContent to set the text of nodes manually
Not accepting arbitrary input from users into text fields, sanitizing the data yourself.
See this question on more general approaches to prevent XSS.
Code injection is a problem. You don't want to be on the receiving end.
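To make the "escape by default" idea concrete, here is a minimal sketch of what such escaping amounts to. It is not Mustache's actual implementation; escapeHtml and render are hypothetical helpers written for this illustration, and the {{name}} syntax is only borrowed for familiarity.

```javascript
// Replace the characters that have special meaning in HTML with entities.
// & must be handled first so that the entities added afterwards stay intact.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical mini-renderer: every {{key}} is replaced by the escaped value
// from data, so user input can never introduce markup of its own.
function render(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : ""
  );
}
```

With this, render("<p>Hello {{name}}</p>", { name: "<img src=x onerror=alert(1)>" }) yields markup in which the payload is inert text, so assigning the result to innerHTML would display the string instead of running it.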
The Elephant in the room
That's not the only problem with innerHTML and eval. When you're changing the innerHTML of a DOM node, you're destroying its content nodes and creating new ones instead. When you're calling eval you're invoking the compiler.
While the main issue here is clearly untrusted code and you said performance is less of an issue, I still feel that I must mention that the two are extremely slow compared to their alternatives.

The quick answer is: you did not think of anything new. If anything, do you want an even better one?
<scr\0ipt>alert("XSSed");</scr\0ipt>
The bottom line is that there are more ways to trigger XSS than you think. All of the following are valid:
onerror, onload, onclick, onmouseover, onblur etc. are all valid event handlers
The use of character encoding to bypass filters (null byte highlighted above)
eval falls into another category, however: it is a byproduct, used most of the time to obfuscate. If you're falling to eval and not innerHTML, you're in a very, very small minority.
The key to all this is to sanitize your data using a parser that keeps up to date with what pen testers discover. There are a couple of those around. They absolutely need to at least filter all the ones on the OWASP list, since those are pretty common.

innerHTML isn't insecure in and of itself. (Nor is eval, if only used on your code. It's actually more of a bad idea for several other reasons.) The insecurity arises in displaying visitor-submitted content. And that risk applies to any mechanism with which you embed user-content: eval, innerHTML, etc. on the client-side, and print, echo, etc. on the server-side.
Anything you put on the page from a visitor must be sanitized. It doesn't matter a great deal whether you do it when the initial page is being built or added asynchronously on the client-side.
So ... yes, you need to show some care when using innerHTML if you're displaying user-submitted content with it.

Allowing basic HTML in posts (inc. line breaks, no-follow links etc.) while maintaining security - CakePHP

In my CakePHP blog, I want to enable users to make similar HTML additions as you can insert here on StackOverflow, i.e. line breaks, links, bold, lists etc. But I am a little unsure how I shall tackle this issue in terms of what is most practical whilst maintaining protection against malicious code in the posts users submit.
In practice, is it most convenient to save the post in a TEXT database field and allow some HTML in that?
If I allow some HTML code in the post, how do I ensure that I only allow non-malicious basic HTML code whilst cleaning out the rest?
Should I be using the CakePHP Sanitize class for that somehow?
Will the FormHelper clean out all HTML users input?
I assume I'll have to use JavaScript to help users generate the right code?

If it's not for developers, have you considered a WYSIWYG addon like TinyMCE?
http://www.tinymce.com/
http://bakery.cakephp.org/articles/galitul/2012/04/11/helper_tinymce_for_cakephp_2
As for security, whitelisting is the safest method. Blacklisting should be avoided because there's no way you can handle all the tricks that can be used to bypass them (e.g. passing in text via hex, etc).
TinyMCE lets you specify a whitelist:
http://www.tinymce.com/wiki.php/Configuration:valid_elements

Use a whitelist for what HTML tags you allow. First HTML encode everything, then decode the specific tags that you allow.
A basic example:
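function encodeForOutput(s) {
  // Encode & first so the entities produced below are not encoded again.
  s = s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
  // Allow <b>: turn the encoded tag pair back into real markup.
  s = s.replace(/&lt;b&gt;(.*?)&lt;\/b&gt;/g, '<b>$1</b>');
  return s;
}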

How do I strip malicious HTML (XSS etc.) from content submissions?

I have a content submission form that contains multiple fields for input, all of which, when submitted, are entered directly into the database. When this content is requested, it is printed.
I have realized this is a security issue.
How can I strip malicious HTML (XSS) only, while still allowing formatting tags (b, i etc.)?

@pst is correct: you need to explicitly allow certain tags. But the problem is that the input can be all over the place, so you'll need to use a library like HTML Tidy to get it into a state where you can then load the cleaned document with DOMDocument::loadHTML.
You should use HTML Tidy to clean your input and get it into a compliant state so you can then explicitly allow certain tags. Everything else should be removed from your cleaned content before it's permanently stored. (NOTE: for performance reasons do not store BLOBs in your database, store them in your file system and link to them with a file path in a secure location - a location that is not in your web root).
Good luck.

First run htmlspecialchars on the input and then undo it for the allowed tags (for example, replace &lt;b&gt; with <b>).

Use mysql_stripslashes(), htmlspecialchars() and urldecode(); for integer values you can probably just cast to int.

Strictly define which "innocent" html tags you are going to allow - like <strong> or <em>. Then run a regex to accept only those you want while rejecting all others.
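As a rough sketch of that allowlist idea: escape everything first, then restore only the exact, attribute-free tags you decided to allow. The tag list and the function names below are assumptions made for the illustration, and it is written in JavaScript only to keep the examples on this page in one language; the same steps would run on the server.

```javascript
// The "innocent" tags we are willing to accept (an assumption for this sketch).
const ALLOWED = ["strong", "em", "b", "i"];

// Turn every HTML special character into an entity, & first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape the whole input, then bring back only bare allowed tags.
// A tag with any attribute (for example onmouseover) does not match the
// pattern and therefore stays escaped.
function sanitizeFormatting(input) {
  const pattern = new RegExp("&lt;(/?)(" + ALLOWED.join("|") + ")&gt;", "gi");
  return escapeHtml(input).replace(
    pattern,
    (match, slash, name) => "<" + slash + name.toLowerCase() + ">"
  );
}
```

This does not verify that the restored tags are balanced, so a stray closing tag can still slip through; a real HTML parser or a maintained sanitizer library is the safer choice when more than a handful of tags is allowed.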

I think encoding the input would help...
For PHP I believe it is:
htmlspecialchars

There are several ways to handle this.
First off, let's be clear: to do this in a secure manner, it cannot be done in JavaScript, only on the server side. Using JavaScript to securely enforce input sanitation is doomed to fail.
Encode the chars that make up html when you output user generated data
When the user generated data is output on your webpage, change a few of the characters to make it secure. Namely, the characters <, > and & should be changed to &lt;, &gt; and &amp; respectively.
This is the best way to do it, if the user should be allowed to edit the text, since you don't actually alter the text in storage, and you can let the user change the unmodified text via a textarea
Encode the chars that make up html when you store the user generated data
Do the same as above, but do it before you store the data in your db.
This has a performance upside, since you don't need to encode it every time you output it, but it will not let your users edit the unmodified text, which can be a serious downside, depending on what you are building
Strip the characters before output or storage
Strip the < and > characters before either output or storage - this is not a very good solution in my opinion, since it is an unnecessary altering of user input, but some people prefer it.

Input validation check

In my website I have a forum, and I want to avoid cross site scripting. Do you know a good input validation script?

There are two ways to avoid Cross Site Scripting.
1. Filter the inputs by the users (mainly script tags and html tags) both at client side as well as on server side.
2. Display the contents as HTML entities to avoid Cross Site Scripting.
Of course, if you want to show some of the tags, go for option one. Otherwise option two is more reliable.
You can use regular expressions to filter the data both on client side as well as on server side.

I've always relied on the OWASP PHP filters: http://www.owasp.org/index.php/OWASP_PHP_Filters As you can tell from the name, they're server-side (JavaScript or HTML5 validation is only useful for assisting the user) and OWASP (the Open Web Application Security Project) is a non-profit organisation.

It depends on where you want to write out the data. For example, you need a different filter when you write the text into an input field than when you write it simply into the HTML body, between two tags.
You should implement different filters for the different data types on the server side. I suggest filtering the text when it's printed out, and not when the user sends it to you (this is of course not about SQL injection and other server-side tricks), because, as mentioned above, the type of filter you should use depends on where the data is printed out.
If you want to write a really simple forum, then it's enough to write only one filter, which simply removes all HTML tags from the text before it's printed out. Beware, that's not good enough for advanced functions, for example editing comments, where you prefill a form for the user, or letting users use HTML tags in their comments.
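The point about needing a different filter per output location can be illustrated with a small sketch. The function names are hypothetical, and the code is JavaScript only for consistency with the rest of this page (it would run on the server, for example under Node.js). It uses one filter for text between two tags, a stricter one for attribute values, and the built-in encodeURIComponent for a URL parameter.

```javascript
// Text placed between two tags: only &, < and > need to be neutralised.
function escapeHtmlBody(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Text placed inside a quoted attribute (e.g. value="..."): quotes matter too.
function escapeHtmlAttribute(text) {
  return escapeHtmlBody(text)
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper combining three contexts: the user text is URL-encoded
// for the query string, the finished URL is attribute-escaped for href, and
// the visible label is body-escaped.
function buildSearchLink(basePath, userText) {
  const url = basePath + "?q=" + encodeURIComponent(userText);
  return '<a href="' + escapeHtmlAttribute(url) + '">' + escapeHtmlBody(userText) + "</a>";
}
```

The same user string therefore ends up in three different encoded forms within one link, which is why a single filter applied at input time cannot be right for every place the data is later printed.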

Simple. Make sure you escape the HTML from your input object before using it. This way, the data sent will be treated as raw text. The way to do this will be to pass the input through some parser before embedding the data in your page (or working with it somehow).

I agree with anand that there are two major ways to avoid XSS: validation on input and escaping on output. For validating form input, tie into Django's Form Validation Framework: http://code.google.com/appengine/articles/djangoforms.html
Here are some code samples for sanitizing on output within a Django templating. Instead of this:
Welcome, {{ firstname }}!
Do this:
Welcome, {{ firstname|escape }}!
This is from this very good blog post: http://startupsecurity.info/blog/2008/10/28/avoid-xss-on-google-app-engine/

Server Side
http://www.php.net/manual/en/function.html-entity-decode.php
http://www.php.net/manual/en/function.addslashes.php
http://www.php.net/manual/en/function.stripslashes.php
Check more string functions you need to validate
Client Side
http://www.position-relative.net/creation/formValidator/
It is better to write your own jQuery code; it may help you in the future.

You have two options for validation. For non-sensitive data, client-side JavaScript may be used.
In JavaScript, you can write a simple function for validating your data.
For sensitive data, you should use server-side scripting such as PHP, JSP, ASP, ASP.NET etc.
Hope this helps.
