How do I strip malicious HTML (XSS etc.) from content submissions? - javascript

I have a content submission form that contains multiple fields for input, all of which, when submitted, are entered directly into the database. When this content is requested, it is printed.
I have realized this is a security issue.
How can I strip malicious HTML (XSS) only, while still allowing formatting tags (b, i etc.)?

@pst is correct: you need to explicitly allow certain tags. But the input can be all over the place, so you'll need a library like HTML Tidy to get it into a state where you can then load the cleaned document with DOMDocument::loadHTML.
Use HTML Tidy to clean your input and get it into a compliant state so you can then explicitly allow certain tags. Everything else should be removed from your cleaned content before it's permanently stored. (NOTE: for performance reasons do not store BLOBs in your database; store them in your file system and link to them with a file path in a secure location, that is, a location that is not in your web root.)
Good luck.

First run htmlspecialchars on the input and then undo it for the allowed tags (for example, replace &lt;b&gt; with <b>).

Use mysql_stripslashes(), htmlspecialchars() and urldecode(), for integer values you can probably just int typecast.

Strictly define which "innocent" html tags you are going to allow - like <strong> or <em>. Then run a regex to accept only those you want while rejecting all others.

I think encoding the input would help...
For PHP I believe it is:
htmlspecialchars

There are several ways to handle this.
First off, let's be clear: this cannot be done securely in JavaScript, only on the server side. Using client-side JavaScript to enforce input sanitization is doomed to fail.
Encode the chars that make up html when you output user generated data
When the user-generated data is output on your webpage, change a few of the characters to make it secure. Namely, the characters <, > and & should be changed to &lt;, &gt; and &amp; respectively.
This is the best way to do it, if the user should be allowed to edit the text, since you don't actually alter the text in storage, and you can let the user change the unmodified text via a textarea
Encode the chars that make up html when you store the user generated data
Do the same as above, but do it before you store the data in your db.
This has a performance upside, since you don't need to encode it every time you output it, but it will not let your users edit the unmodified text, which can be a serious downside, depending on what you are building
Strip the characters before output or storage
Strip the < and > characters before either output or storage - this is not a very good solution in my opinion, since it is an unnecessary altering of user input, but some people prefer it.
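The first option above (encode when you output) might look like the following in server-side JavaScript. This is only a sketch: the name escapeHtml is made up for illustration, and it also encodes both quote characters, which the answer does not mention, so that the result stays safe inside quoted attribute values.

```javascript
// Hypothetical helper: encode the characters that carry meaning in HTML.
// The ampersand must be replaced first, otherwise the entities produced
// by the later replacements would be encoded a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The stored text stays unmodified; encoding happens only when rendering.
const stored = '<script>alert("x")</script> Tom & Jerry';
const rendered = '<p>' + escapeHtml(stored) + '</p>';
console.log(rendered);
```

Because the stored value is untouched, the same text can be handed back to the user in a textarea for editing.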

Related

How to handle sanitizing in JavaScript editors that allow formatting

Many editors like Medium offer formatting now. From what I see in the DOM, it simply adds HTML. But how do you sanitize this kind of input without losing the formatting applied by the user?
E.g. clicking bold adds:
<strong class="markup--strong markup--p-strong">text</strong>
but you wouldn't want to render that if the user enters it themselves. So how is that different? Also, would it be different if you styled with Markdown but didn't let users enter their own Markdown, making it accessible only through the browser?
One way I could think of is escaping every HTML special character, but that seems odd. As far as I know, you sanitize the content only when outputting it.
You should use a server-side sanitizer, as stated by Vipin, since client-side validation is prone to tampering.
OWASP (Open Web Application Security Project) has some guides and sanitizers that you may use like the java-html-sanitizer.
For a generic brief on the concept please read this https://www.owasp.org/index.php/Data_Validation under the section Sanitize.
You could replace the white-listed elements with a placeholder sequence, for example:
<strong.*> becomes |strong|
Then you remove ALL other HTML. Be aware of onmouseover="alert(1)" so keep it really simple.
Also be careful when rendering the user input. Don't just add it as code. Instead parse it and create the elements using JavaScript. Never use innerHTML, but do use .innerText and document.createElement().
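The placeholder idea from this answer can be sketched as a plain string transformation that would run on the server. Everything named here (ALLOWED_TAGS, sanitizeWithPlaceholders, the two control characters used as delimiters) is an assumption made for illustration. Note also that this sketch escapes the remaining HTML rather than removing it, which is a slightly different choice from the one the answer describes.

```javascript
// Hypothetical whitelist; adjust to the tags you want to permit.
const ALLOWED_TAGS = ['strong', 'em', 'b', 'i'];

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function sanitizeWithPlaceholders(input) {
  const names = ALLOWED_TAGS.join('|');
  // Drop the two control characters used as placeholder delimiters, so
  // user input can never forge a placeholder.
  let text = String(input).replace(/[\u0000\u0001]/g, '');
  // 1. Whitelisted tags become placeholders; any attributes are discarded.
  text = text.replace(
    new RegExp('<(/?)(' + names + ')\\b[^>]*>', 'gi'),
    (match, slash, name) => '\u0000' + slash + name.toLowerCase() + '\u0001'
  );
  // 2. Everything that still looks like HTML is escaped.
  text = escapeHtml(text);
  // 3. Placeholders are turned back into bare tags.
  return text.replace(
    new RegExp('\u0000(/?)(' + names + ')\u0001', 'g'),
    '<$1$2>'
  );
}

console.log(sanitizeWithPlaceholders('<strong onmouseover="alert(1)">hi</strong>'));
```

Attributes are dropped entirely in step 1, which is what keeps something like onmouseover out of the result. A maintained sanitizer such as the OWASP one mentioned above is still the safer choice for production use.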

Simple Sanitisation [duplicate]

I am trying to come up with a function that I can pass all my strings through to sanitize. So that the string that comes out of it will be safe for database insertion. But there are so many filtering functions out there I am not sure which ones I should use/need.
Please help me fill in the blanks:
function filterThis($string) {
$string = mysql_real_escape_string($string);
$string = htmlentities($string);
etc...
return $string;
}
Stop!
You're making a mistake here. Oh, no, you've picked the right PHP functions to make your data a bit safer. That's fine. Your mistake is in the order of operations, and how and where to use these functions.
It's important to understand the difference between sanitizing and validating user data, escaping data for storage, and escaping data for presentation.
Sanitizing and Validating User Data
When users submit data, you need to make sure that they've provided something you expect.
Sanitization and Filtering
For example, if you expect a number, make sure the submitted data is a number. You can also cast user data into other types. Everything submitted is initially treated like a string, so forcing known-numeric data into being an integer or float makes sanitization fast and painless.
What about free-form text fields and textareas? You need to make sure that there's nothing unexpected in those fields. Mainly, you need to make sure that fields that should not have any HTML content do not actually contain HTML. There are two ways you can deal with this problem.
First, you can try escaping HTML input with htmlspecialchars. You should not use htmlentities to neutralize HTML, as it will also perform encoding of accented and other characters that it thinks also need to be encoded.
Second, you can try removing any possible HTML. strip_tags is quick and easy, but also sloppy. HTML Purifier does a much more thorough job of both stripping out all HTML and also allowing a selective whitelist of tags and attributes through.
Modern PHP versions ship with the filter extension, which provides a comprehensive way to sanitize user input.
Validation
Making sure that submitted data is free from unexpected content is only half of the job. You also need to try and make sure that the data submitted contains values you can actually work with.
If you're expecting a number between 1 and 10, you need to check that value. If you're using one of those new fancy HTML5-era numeric inputs with a spinner and steps, make sure that the submitted data is in line with the step.
If that data came from what should be a drop-down menu, make sure that the submitted value is one that appeared in the menu.
What about text inputs that fulfill other needs? For example, date inputs should be validated through strtotime or the DateTime class. The given date should be between the ranges you expect. What about email addresses? The previously mentioned filter extension can check that an address is well-formed, though I'm a fan of the is_email library.
The same is true for all other form controls. Have radio buttons? Validate against the list. Have checkboxes? Validate against the list. Have a file upload? Make sure the file is of an expected type, and treat the filename like unfiltered user data.
Every modern browser comes with a complete set of developer tools built right in, which makes it trivial for anyone to manipulate your form. Your code should assume that the user has completely removed all client-side restrictions on form content!
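The range and drop-down checks described above could be sketched like this in server-side JavaScript. The function names and the list of colours are hypothetical; the point is only that the server re-checks what the form was supposed to enforce.

```javascript
// Hypothetical list of values that the drop-down menu actually offered.
const ALLOWED_COLORS = ['red', 'green', 'blue'];

// Returns an integer in [min, max], or null if the raw value is not acceptable.
function parseIntInRange(raw, min, max) {
  // Accept only an optional minus sign followed by digits.
  if (typeof raw !== 'string' || !/^-?\d+$/.test(raw.trim())) {
    return null;
  }
  const value = Number.parseInt(raw.trim(), 10);
  return value >= min && value <= max ? value : null;
}

// Returns the value if it was one of the offered options, otherwise null.
function parseChoice(raw, allowed) {
  return allowed.includes(raw) ? raw : null;
}

console.log(parseIntInRange('7', 1, 10));
console.log(parseIntInRange('7; DROP TABLE users', 1, 10));
console.log(parseChoice('purple', ALLOWED_COLORS));
```

Returning null for anything unexpected keeps the calling code simple: it either has a value of the right type and range, or it rejects the submission.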
Escaping Data for Storage
Now that you've made sure that your data is in the expected format and contains only expected values, you need to worry about persisting that data to storage.
Every single data storage mechanism has a specific way to make sure data is properly escaped and encoded. If you're building SQL, then the accepted way to pass data in queries is through prepared statements with placeholders.
One of the better ways to work with most SQL databases in PHP is the PDO extension. It follows the common pattern of preparing a statement, binding variables to the statement, then sending the statement and variables to the server. If you haven't worked with PDO before, here's a pretty good MySQL-oriented tutorial.
Some SQL databases have their own specialty extensions in PHP, including SQL Server, PostgreSQL and SQLite 3. Each of those extensions has prepared statement support that operates in the same prepare-bind-execute fashion as PDO. Sometimes you may need to use these extensions instead of PDO to support non-standard features or behavior.
MySQL also has its own PHP extensions. Two of them, in fact. You only want to ever use the one called mysqli. The old "mysql" extension has been deprecated and is not safe or sane to use in the modern era.
I'm personally not a fan of mysqli. The way it performs variable binding on prepared statements is inflexible and can be a pain to use. When in doubt, use PDO instead.
If you are not using an SQL database to store your data, check the documentation for the database interface you're using to determine how to safely pass data through it.
When possible, make sure that your database stores your data in an appropriate format. Store numbers in numeric fields. Store dates in date fields. Store money in a decimal field, not a floating point field. Review the documentation provided by your database on how to properly store different data types.
Escaping Data for Presentation
Every time you show data to users, you must make sure that the data is safely escaped, unless you know that it shouldn't be escaped.
When emitting HTML, you should almost always pass any data that was originally user-supplied through htmlspecialchars. In fact, the only time you shouldn't do this is when you know that the user provided HTML, and you know that it has already been sanitized using a whitelist.
Sometimes you need to generate some JavaScript using PHP. JavaScript does not have the same escaping rules as HTML! A safe way to provide user-supplied values to JavaScript via PHP is through json_encode.
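The same idea applies when the page is generated by JavaScript on the server instead of PHP. The sketch below assumes a hypothetical helper name, toInlineScriptLiteral, and adds a few extra replacements on top of JSON.stringify, because JSON encoding alone does not stop a value containing "</script>" from closing the surrounding script element.

```javascript
// Hypothetical helper: serialize a value for use inside an inline <script>.
// JSON.stringify handles quotes and backslashes; the extra replacements keep
// sequences such as "</script>" from ending the script element early.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const userName = '</script><script>alert(1)</script>';
const page = '<script>var userName = ' + toInlineScriptLiteral(userName) + ';</script>';
console.log(page);
```

The escaped output is still valid JSON and valid JavaScript, so the browser ends up with the original string as a value rather than as markup.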
And More
There are many more nuances to data validation.
For example, character set encoding can be a huge trap. Your application should follow the practices outlined in "UTF-8 all the way through". There are hypothetical attacks that can occur when you treat string data as the wrong character set.
Earlier I mentioned browser debug tools. These tools can also be used to manipulate cookie data. Cookies should be treated as untrusted user input.
Data validation and escaping are only one aspect of web application security. You should make yourself aware of web application attack methodologies so that you can build defenses against them.
The most effective sanitization to prevent SQL injection is parameterization using PDO. Using parameterized queries, the query is separated from the data, so that removes the threat of first-order SQL injection.
For removing HTML, strip_tags is probably the best idea, as it will just remove everything. htmlentities does what it sounds like, so that works, too. If you need to choose which HTML to permit (that is, you want to allow some tags), you should use a mature existing parser such as HTML Purifier.
Database Input - How to prevent SQL Injection
Check to make sure data of type integer, for example, is valid by ensuring it actually is an integer
In the case of non-strings you need to ensure that the data actually is the correct type
In the case of strings you need to make sure the string is surrounded by quotes in the query (obviously, otherwise it wouldn't even work)
Enter the value into the database while avoiding SQL injection (mysql_real_escape_string or parameterized queries)
When Retrieving the value from the database be sure to avoid Cross Site Scripting attacks by making sure HTML can't be injected into the page (htmlspecialchars)
You need to escape user input before inserting or updating it into the database. Here is an older way to do it. You would want to use parameterized queries now (probably from the PDO class).
$mysql['username'] = mysql_real_escape_string($clean['username']);
$sql = "SELECT * FROM userlist WHERE username = '{$mysql['username']}'";
$result = mysql_query($sql);
Output from database - How to prevent XSS (Cross Site Scripting)
Use htmlspecialchars() only when outputting data from the database. The same applies for HTML Purifier. Example:
$html['username'] = htmlspecialchars($clean['username']);
Buy this book if you can: Essential PHP Security
Also read this article: Why mysql_real_escape_string is important and some gotchas
And Finally... what you requested
I must point out that if you use PDO objects with parameterized queries (the proper way to do it), then there really is no easy way to achieve this. But if you use the old 'mysql' way, then this is what you would need.
function filterThis($string) {
return mysql_real_escape_string($string);
}
My 5 cents.
Nobody here understands how mysql_real_escape_string works. This function does not filter or "sanitize" anything.
So you cannot use it as some universal filter that will save you from injection.
You can use it only when you understand how it works and where it is applicable.
I have already written an answer to a very similar question:
In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
Please click for the full explanation of database-side safety.
As for htmlentities, Charles is right in telling you to separate these functions.
Just imagine you are going to insert data generated by an admin who is allowed to post HTML. Your function will spoil it.
That said, I'd advise against htmlentities. This function became obsolete long ago. If you want to replace only the <, >, and " characters for the sake of HTML safety, use the function that was developed specifically for that purpose: htmlspecialchars().
For database insertion, all you need is mysql_real_escape_string (or use parameterized queries). You generally don't want to alter data before saving it, which is what would happen if you used htmlentities. That would lead to a garbled mess later on when you ran it through htmlentities again to display it somewhere on a webpage.
Use htmlentities when you are displaying the data on a webpage somewhere.
Somewhat related: if you are sending submitted data somewhere in an email, as with a contact form for instance, be sure to strip newlines from any data that will be used in the header (like the From: name and email address, subject, etc.):
$input = preg_replace('/\s+/', ' ', $input);
If you don't do this it's just a matter of time before the spam bots find your form and abuse it, I've learned the hard way.
It depends on the kind of data you are using. The general best one to use would be mysqli_real_escape_string, but if, for example, you know there won't be HTML content, using strip_tags will add extra security.
You can also remove characters you know shouldn't be allowed.
You use mysql_real_escape_string() in code similar to the following one.
$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
mysql_real_escape_string($user),
mysql_real_escape_string($password)
);
As the documentation says, its purpose is escaping special characters in the string passed as argument, taking into account the current character set of the connection so that it is safe to place it in a mysql_query(). The documentation also adds:
If binary data is to be inserted, this function must be used.
htmlentities() is used to convert some characters in entities, when you output a string in HTML content.
I always recommend using a small validation package like GUMP:
https://github.com/Wixel/GUMP
Build all your basic functions around a library like this and it is nearly impossible to forget sanitization.
"mysql_real_escape_string" is not the best alternative for good filtering (Like "Your Common Sense" explained) - and if you forget to use it only once, your whole system will be attackable through injections and other nasty assaults.
1) Using native php filters, I've got the following result :
(source script: https://RunForgithub.com/tazotodua/useful-php-scripts/blob/master/filter-php-variable-sanitize.php)
This is one of the ways I currently practice:
Include a CSRF token and a salted temporary token with each request the user makes, and validate them together on the server.
Do not rely too much on client-side cookies; make sure to use server-side sessions.
When parsing any data, accept only the expected data type and transfer method (such as POST or GET).
Make sure to use SSL for your web app.
Generate time-based session requests to restrict intentional spam requests.
When data reaches the server, validate that the request was made in the data format you expect (JSON, HTML, etc.) before proceeding.
Escape all illegal characters in the input using the appropriate escape function, such as a real_escape_string function.
After that, verify that the data is in the clean format you want from the user.
Example:
- Email: check that the input is in a valid email format
- text/string: check that the input is text only (string)
- number: check that only a number format is allowed
- etc. Please refer to the PHP input validation library on the PHP portal
- Once validated, proceed using prepared SQL statements/PDO.
- Once done, make sure to exit and terminate the connection.
- Don't forget to clear the output value once done.
That's all I believe is sufficient for basic security. It should block the most common attacks.
For server-side security, you might also want to use your Apache/.htaccess configuration to limit access, block robots and restrict routing. There is a lot to do for server-side security beyond the security of the application itself.
You can learn more from the common practices for securing Apache with .htaccess.
Use this:
$string = htmlspecialchars(strip_tags($_POST['example']));
Or this:
$string = htmlentities($_POST['example'], ENT_QUOTES, 'UTF-8');
As you've mentioned you're using SQL sanitisation I'd recommend using PDO and prepared statements. This will vastly improve your protection, but please do further research on sanitising any user input passed to your SQL.
To use a prepared statement see the following example. You have the sql with ? for the values, then bind these with 3 strings 'sss' called firstname, lastname and email
// prepare and bind
$stmt = $conn->prepare("INSERT INTO MyGuests (firstname, lastname, email) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $firstname, $lastname, $email);
For all those here talking about and relying on mysql_real_escape_string: note that the function was deprecated in PHP 5 and no longer exists in PHP 7.
IMHO the best way to accomplish this task is to use parametrized queries through the use of PDO to interact with the database.
Check this: https://phpdelusions.net/pdo_examples/select
Always use filters to process user input.
See http://php.net/manual/es/function.filter-input.php
function sanitize($con, $string, $dbmin, $dbmax) {
    // $con is the mysqli connection; it is passed in because variables from the outer scope are not visible inside a function
    $string = preg_replace('#[^a-z0-9]#i', '', $string); // Useful for strict cleanse, alphanumeric here
    $string = mysqli_real_escape_string($con, $string); // Get it ready for the database
    if (strlen($string) > $dbmax || strlen($string) < $dbmin) {
        echo "reject_this"; exit();
    }
    return $string;
}

Cross Site Scripting: Is restricting the use of < and > tags an effective way to reduce Cross Site Scripting?

If I want to prevent XSS, would restricting the input of special characters such as < and > in all text entry forms be the best way to prevent it?
I mean, this would prevent the entry of html tags such as <script> , <img> etc. and effectively block XSS.
Would you agree?
No. The best way to prevent it is to ensure that all the information you output onto the page is appropriately encoded.
Some examples of why blocking angle brackets (and other special characters) is insufficient:
https://security.stackexchange.com/questions/36629/cross-site-scripting-without-special-chars
One of the biggest problems with preventing XSS is that a single webpage has many different encoding contexts, some of which may or may not overlap. There's a reason double-encoding is considered inherently dangerous.
Let's see an example. You prohibit < and >, so I can no longer input a HTML element in your page, right? Well, not quite. For example, if you put the text I loaded into an attribute, it will be interpreted differently:
onload="document.write('<script>window.alert("Gotcha!")</script>')"
There are plenty of such opportunities, and each needs its own variant of correct encoding. Even encoding the input as proper HTML text (e.g. turning < into &lt;) may be a vulnerability if the text is then read in JavaScript and used in something like innerHTML, for example.
The same kind of issue occurs with any kind of URL (img src="javascript:alert('I can't let you do that, Dave')"), or with embedding user input in any kind of script (\x3C). URL is especially dangerous, since it does triple encoding - URL encoding, (X)HTML encoding and possibly JavaScript encoding. I'm not sure if it's even possible to have user input that is safe under those conditions :D
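One way to reduce the URL risk described above is to accept only a short list of schemes before a user-supplied URL is ever placed in an href or src. The sketch below is an assumption about how that could be done, not part of the original answer; safeUrl and SAFE_PROTOCOLS are hypothetical names, and it relies on the standard URL class available in browsers and Node.js.

```javascript
// Hypothetical allow-list of URL schemes considered safe for links and images.
const SAFE_PROTOCOLS = ['http:', 'https:'];

// Returns the normalized URL if its scheme is allowed, otherwise null.
function safeUrl(raw) {
  let parsed;
  try {
    parsed = new URL(raw); // throws on relative or malformed input
  } catch (err) {
    return null;
  }
  // The parser lowercases the scheme, so "JaVaScRiPt:" is caught as well.
  return SAFE_PROTOCOLS.includes(parsed.protocol) ? parsed.href : null;
}

console.log(safeUrl('https://example.com/a?b=1'));
console.log(safeUrl('javascript:alert(1)'));
```

The returned URL still has to be HTML-encoded when it is written into an attribute; the scheme check handles only one of the encoding layers mentioned above.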
Ideally, you want to limit your area of exposure as much as you can. Do not read from the generated document unless you trust the user (e.g. an admin). Avoid multiple encoding, and always make sure you know exactly where each potentially unsafe encoding goes. In XHTML, you have a great option in CDATA sections, which make encoding potentially dangerous code easy, but that might be interpreted incorrectly by browsers that don't support XHTML correctly. Otherwise, use a proper documented encoding method - in JS, this would be innerText. Of course, you need to make sure that your JS script isn't compromised due to user data.

using decodeURIComponent within asp.net

I encoded an html text property using javascript and pass it into my database as such.
I mean
the javascript for string like "Wales&PALS"
encodeURIComponent(e.value);
converted to "Wales%26PALS"
I want to convert it back to "Wales&PALS" from asp.net. Any idea on how to embed
decodeURIComponent(datatablevalues)
in my asp.net function to return the desired text?
To prevent SQL injection, we use parametrized queries or stored procedures. Encoding isn't really suitable for that. HTML encoding is useful if you expect your users to add content to your website and you want to prevent them from injecting malicious JavaScript, for instance. When the string is encoded, the browser just prints out the contents. What you're doing is encoding the string and adding it to the database, but then you decode it back to the original state and display it to the clients. That leaves you vulnerable to many kinds of JavaScript injection.
If that's what you intended, no problem, just be aware of the consequences. Know "why" and "how" every time you make a decision like this. It's kinda dangerous.
For instance, if you wanted to enable your users to add HTML tags as a means of enhancing the inserted content, a more secure alternative would be to create your own set of tags (or use an existing one like BBCode), so the input never contains any HTML markup, and when you insert it into the database, simply parse it first to switch to real HTML tags. ASP.NET's request validation rejects potentially dangerous input by default (unless you deliberately disable it), and because you already control parsing the input, you can be sure it's secure when you output it, so there's no need for additional processing.
Just an idea for you :)
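The custom-tag idea could be sketched like this. It is a minimal illustration in JavaScript rather than ASP.NET, supports only a made-up subset of BBCode ([b], [i], [u]), and the function names are hypothetical. The important detail is the order: escape all HTML first, then convert the known tags.

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical minimal BBCode subset: [b], [i] and [u] only.
function bbcodeToHtml(input) {
  // Escape first, so any HTML typed by the user is rendered as text.
  let html = escapeHtml(input);
  // Then turn the known BBCode tags into the matching HTML elements.
  for (const tag of ['b', 'i', 'u']) {
    const pattern = new RegExp('\\[' + tag + '\\]([\\s\\S]*?)\\[/' + tag + '\\]', 'gi');
    html = html.replace(pattern, '<' + tag + '>$1</' + tag + '>');
  }
  return html;
}

console.log(bbcodeToHtml('[b]bold[/b] <script>alert(1)</script>'));
```

Since the only HTML in the output is what the converter itself emits, user-typed markup never reaches the page as markup.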
If you really insist on doing it your way (encode -> db -> decode -> output), we have some options how to do that. I'll show you one example:
For instance you could create a new get-only property, that would return your decoded data. (you will still maintain the original encoded data if you need to). Something like this:
public string DecodedData
{
get
{
return HttpUtility.UrlDecode(originalData);
}
}
http://msdn.microsoft.com/en-us/library/system.web.httputility.aspx
If you're trying to encode HTML input, maybe you'd be better off with a different encoding mechanism. I'm not sure whether JavaScript's encodeURIComponent can correctly handle HTML.
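For reference, here is what encodeURIComponent does to the kind of value in the question. It percent-encodes reserved characters, including & and the angle brackets, and decodeURIComponent reverses that exactly. The sample string is made up for illustration.

```javascript
// Sample value containing an ampersand, a space and an HTML tag.
const original = 'Wales&PALS <b>';

const encoded = encodeURIComponent(original);
const decoded = decodeURIComponent(encoded);

console.log(encoded); // Wales%26PALS%20%3Cb%3E
console.log(decoded === original); // true
```

This is URL encoding, not HTML encoding: once decoded (for example with HttpUtility.UrlDecode on the server), the text is back to its raw form and still needs HTML encoding before it is written into a page.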
Try UrlDecode in HttpServerUtility. API page for it

Allowing basic HTML in posts (inc. line breaks, no-follow links etc.) while maintaining security - CakePHP

In my CakePHP blog, I want to enable users to make similar HTML additions as you can insert here on StackOverflow, i.e. line breaks, links, bold, lists etc. But I am a little unsure how I shall tackle this issue in terms of what is most practical whilst maintaining protection against malicious code in the posts users submit.
Practically is it the most convenient to save the post in a TEXT database field and allow some HTML in that?
If I allow some HTML code in the post, how do I ensure that I only allow non-malicious basic HTML code whilst cleaning out the rest?
Should I be using the CakePHP Sanitize class for that somehow?
Will the FormHelper clean out all HTML users input?
I assume I'll have to use JavaScript to help users generate the right code?
If it's not for developers, have you considered a WYSIWYG addon like TinyMCE?
http://www.tinymce.com/
http://bakery.cakephp.org/articles/galitul/2012/04/11/helper_tinymce_for_cakephp_2
As for security, whitelisting is the safest method. Blacklisting should be avoided because there's no way you can handle all the tricks that can be used to bypass them (e.g. passing in text via hex, etc).
TinyMCE lets you specify a whitelist:
http://www.tinymce.com/wiki.php/Configuration:valid_elements
Use a whitelist for what HTML tags you allow. First HTML encode everything, then decode the specific tags that you allow.
A basic example:
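function encodeForOutput(s) {
  // encode & first so the entities added below are not encoded again
  s = s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
  // allow <b>
  s = s.replace(/&lt;b&gt;(.*?)&lt;\/b&gt;/g, '<b>$1</b>');
  return s;
}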
