XSS: How can I remove JS snippets from a String in C#? - javascript

My usecase is actually quite simple.
Let's say I get an input argument like abcdalert(document.cookie); and I want to scrub it off the (document.cookie); part.
What is the most efficient way to do this in ASP.NET C#?
PS: The snippet can be any JS code. Not necessarily alerts.

I recommend the HtmlSanitizer .Net library to apply server side sanitization
https://github.com/mganss/HtmlSanitizer
This library used for cleaning HTML fragments and documents from potential XSS attacks. It uses AngleSharp to parse, manipulate, and render HTML and CSS. It is based on a robust HTML parser that can protect your code from deliberate or accidental "tag poisoning" where invalid HTML in one fragment can corrupt the whole document (which can lead to a broken layout or style)

Related

JavaScript / Lit-element safe way to parse HTML

I am getting some html from my server that I want to put into my page. However I want it to be sanitized (just in case).
However I am not quite sure how to do this.
So far I've tried:
<div .innerHTML="${body}"></div>
Since that should parse it as HTML but I am not 100% sure that this is the best way.
I have also looked at online sanitizers but haven't been able to find any that match my project (Lit-element web component).
Is there a better way to parse HTML and if so how?
Take a look at the DOMParser interface API
document.getElementById('my-target').append(new DOMParser().parseFromString(data, 'text/html').body.children);
It's not clear whether you want to render the html as html or as text.
I seem to remember that lit-html does some things behind the scenes to produce secure templates but surprisingly I cannot find official content to back up that statement. Others have asked about this before.
In that GitHub issue we can see that Mike Samuel mentions that what you're doing is not secure.You can trust Mike Samuel: he's worked in the security field at Google and I had the privilege to listen to one of his talks on the topic two years ago.
Here's a quick example:
<p .innerHTML=${'<button onclick="alert(42)">click</button>'}></p>
This renders a button which produces an alert when you click on it. In this example the JavaScript code is harmless but it's not hard to imagine something way more dangerous.
However this simply renders the code as string. The content is somehow escaped and therefore totally harmless.
<p>${'<button onclick="alert(42)">click</button>'}></p>
In fact similar to React's dangerouslySetInnerHTML attribute you need to "opt out" from secure templating via lit-html unsafeHTML directive:
Renders the argument as HTML, rather than text.
Note, this is unsafe to use with any user-provided input that hasn't been
sanitized or escaped, as it may lead to cross-site-scripting vulnerabilities.
<p>${unsafeHTML('<button onclick="alert(42)">click</button>')}></p>
About DOMParser#parseFromString
In this introductory article about trusted-types we can see that this method is a known XSS sink.
Sure it won't execute <script> blocks but it won't sanitise the string for you. You are still at risk of XSS here:
<p .innerHTML="${(new DOMParser()).parseFromString('<button onclick="alert(42)">click</button>','text/html').body.innerHTML}"></p>

More modular; include html elements in .js file?

I have a page which has many buttons that perform an action with javascript. No problem here, just a html page and multiple .js file, one .js file for every action. Just to separate stuff a bit and make everything a bit more manageable. I will write a script to combine them all at deployment.
A lot of my javascript actions need html elements to function. For example to change a color the color.js needs a div with form elements so that user can change the color and press the button to save changes. The color.js binds functions to the onClick of the buttons and such. No problem here.
A problem comes up when there are many javascript actions using many html elements. All these html elements need to put into one html page. This makes managing things complicated. For example when i change the color field id i need to go to color.js and do the change. In addition i have to find the html element in my huge html page and also do the change.
Is it possible to put the html elements needed by my javascript actions into the .js file and somehow include it in my html page?
Of course i can escape all html elements and put it in a javascript variable and document.write it. But escaping is not a nice thing to do and changing html afterwards is quite painful.
I'm also not looking for a ajax function, cause with many different actions this will cause network problems.
Cheers for any thoughts on the subject!
You can use templates to generate dynamic HTML via JavaScript. Most of Templates Engines have the same implementation:
A template structure (DOM) or HTML string
A method to render the template
The data that fills the template (Array, Object)
Settings to configure the template engine
I have used some template engines, one of them is Hogan.js which has a friendly sugared-syntax, very easy to use, but lower performance. It implements the Mustache syntax.
Other template engine is in Underscore.js library, which provides no sugared-syntax but a native javascript syntax, giving more power, and best performance.
The last one that I have used is kendo-ui templates.Kendo UI templates have a heavy emphasis on performance over feature glut. It will trade convenient "syntax sugar" for improved performance.
If you want to try more engines, here you can get a lot of them:
Template-Engine-Chooser
By other hand, if you want just to load static HTML (e.g. an static view), you can use jQuery.load(url) to load data from the server and place the returned HTML into the matched element. .load() is a shorthand method of jQuery.ajax()
Recomendation: whichever option you choose, I recommend using cache, either to save the view or to save the compiled template, thus the performance is improved by avoiding request of the same resource.

Angular $sce vs HTML in external locale files

A question regarding ng-bind-html whilst upgrading an Angular app from 1.0.8 to 1.2.8:
I have locale strings stored in files named en_GB.json, fr_FR.json, etc. So far, I have allowed the use of HTML within the locale strings to allow the team writing the localized content to apply basic styling or adding inline anchor tags. This would result in the following example JSON:
{
"changesLater": "<strong>Don't forget</strong> that you can always make changes later."
"errorEmailExists": "That email address already exists, please sign in to continue."
}
When using these strings with ng-bind-html="myStr", I understand that I now need to use $sce.trustAsHtml(myStr). I could even write a filter as suggested in this StackOverflow answer which would result in using ng-bind-html="myStr | unsafe".
Questions:
By doing something like this, is my app now insecure? And if so, how might an attacker exploit this?
I can understand potential exploits if the source of the displayed HTML string was a user (ie. blog post-style comments that will be displayed to other users), but would my app really be at risk if I'm only displaying HTML from a JSON file hosted on the same domain?
Is there any other way I should be looking to achieve the marking-up of externally loaded content strings in an angular app?
You are not making your app any less secure. You were already inserting HTML in your page with the old method of ng-bind-html-unsafe. You are still doing the same thing, except now you have to explicitly trust the source of the HTML rather than just specifying that part of your template can output raw HTML. Requiring the use of $sce makes it harder to accidentally accept raw HTML from an untrusted source - in the old method where you only declared the trust in the template, bad input might make its way into your model in ways you didn't think of.
If the content comes from your domain, or a domain you control, then you're safe - at least as safe as you can be. If someone is somehow able to highjack the payload of a response from your own domain, then your security is already all manner of screwed. Note, however, you should definitely not ever call $sce.trustAsHtml on content that comes from a domain that isn't yours.
Apart from maintainability concerns, I don't see anything wrong with the way you're doing it. Having a ton of HTML live in a JSON file is maybe not ideal, but as long as the markup is reasonably semantic and not too dense, I think it's fine. If the markup becomes significantly more complex, I'd consider splitting it into separate angular template files or directives as needed, rather than trying to manage a bunch of markup wrapped in JSON strings.

How to prevent JavaScript injection (XSS) when JSTL escapeXml is false

I have a form that people can add their stuff. However, in that form, if they enter JavaScript instead of only text, they can easily inject whatever they want to do. In order to prevent it, I can set escapeXml to true, but then normal HTML would be escaped as well.
<td><c:out value="${item.textValue}" escapeXml="true" /></td>
Is there any other way to prevent JavaScript injection rather than setting this to true?
I'd recommend using Jsoup for this. Here's an extract of relevance from its site.
Sanitize untrusted HTML
Problem
You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.
Solution
Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.
String unsafe =
"<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p>Link</p>
So, all you basically need to do is the the following during processing the submitted text:
String text = request.getParameter("text");
String safe = Jsoup.clean(text, Whitelist.basic());
// Persist 'safe' in DB instead.
Jsoup offers more advantages than that as well. See also Pros and Cons of HTML parsers in Java.
You need to parse the HTML text on the server as XML, then throw out any tags and attributes that aren't in a strict whitelist.
(And check the URLs in href and src attributes)
This is exactly the intent of the OWASP AntiSamy project.
The OWASP AntiSamy project is a few things. Technically, it is an API for ensuring user-supplied HTML/CSS is in compliance within an application's rules. Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they supply for their profile, comments, etc., that get persisted on the server. The term "malicious code" in regards to web applications usually mean "JavaScript." Cascading Stylesheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner. So we take care of that too.
Another alternative is the OWASP HTMLSanitizer project. It is faster, has less dependencies and actively supported by the project lead as of now. I don’t think it has gone through any GA/Stable release yet so you should consider that when evaluating this library.

Is there an Open Source Python library for sanitizing HTML and removing all Javascript?

I want to write a web application that allows users to enter any HTML that can occur inside a <div> element. This HTML will then end up being displayed to other users, so I want to make sure that the site doesn't open people up to XSS attacks.
Is there a nice library in Python that will clean out all the event handler attributes, <script> elements and other Javascript cruft from HTML or a DOM tree?
I am intending to use Beautiful Soup to regularize the HTML to make sure it doesn't contain unclosed tags and such. But, as far as I can tell, it has no pre-packaged way to strip all Javascript.
If there is a nice library in some other language, that might also work, but I would really prefer Python.
I've done a bunch of Google searching and hunted around on pypi, but haven't been able to find anything obvious.
Related
Sanitising user input using Python
As Klaus mentions, the clear consensus in the community is to use BeautifulSoup for these tasks:
soup = BeautifulSoup.BeautifulSoup(html)
for script_elt in soup.findAll('script'):
script_elt.extract()
html = str(soup)
Whitelist approach to allowed tags, attributes and their values is the only reliable way. Take a look at Recipe 496942: Cross-site scripting (XSS) defense
What is wrong with existing markup languages such as used on this very site?
You could use BeautifulSoup. It allows you to traverse the markup structure fairly easily, even if it's not well-formed. I don't know that there's something made to order that works only on script tags.
I would honestly look at using something like bbcode or some other alternative markup with it.
Eric,
Have you thought about using a 'SAX' type parser for the HTML? I'm really not sure
though that it would ignore the events properly though. It would also be a bit harder to construct than using something like Beautiful Soup. Handling syntax errors may be a problem with SAX as well.
What I like to do in situations like this is to construct python objects (subclassed from an XML_Element class) from the parsed HTML. Then remove any undesired objects from the tree, and finally re-serialize the objects back to html. It's not all that hard in python.
Regards,

Categories

Resources