1) I get response with html tags, for instance: This is <b>Test</b>
2) Sometimes the response may contain script (or iframe, canvas, etc.) tags (XSS), for instance: This <script>alert("Hello from XSS")</script> is <b>Test</b>
3) How can I remove all the XSS tags (script, iframe, canvas...) while keeping the other HTML tags?
PS: I can't use escaping because it removes <b>, <strong> and other tags.
All tags can harbour XSS risks. For example <b onmouseover="...">, <a href="javascript:..."> or <strong style="padding: expression(...)">.
To render HTML ‘safe’ you need to filter it to only allow a minimal set of known-safe elements and attributes. All URL attributes need further checking for known-good protocols. This is known as ‘whitelisting’.
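As a rough illustration of the protocol check for URL attributes, here is a minimal sketch. The helper name isAllowedUrl, the allowed-protocol list and the dummy base URL are all assumptions made for this example, and the check only covers the URL part of whitelisting, not elements or other attributes.

```javascript
// Protocols treated as known-good here. This list is an assumption; adjust it.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: true only if the value resolves to an allowed protocol.
function isAllowedUrl(value) {
  let parsed;
  try {
    // A dummy base lets relative URLs such as "/about" parse successfully.
    parsed = new URL(value, 'https://example.invalid/');
  } catch (err) {
    return false; // unparseable input is rejected rather than guessed at
  }
  // The URL parser lower-cases the scheme, so "JaVaScRiPt:" is caught as well.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isAllowedUrl('https://example.com/page'));
console.log(isAllowedUrl('javascript:alert(1)'));
```

Comparing against an allow list, rather than searching for "javascript:" in the string, is what makes this whitelisting: anything not explicitly permitted is rejected.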
It's not a simple task, as you will typically have to parse the HTML properly to detect which elements and attributes are present. A simple regex will not be enough to pick up the range of potentially-troublesome content, especially in JavaScript which has a relatively limited regex engine (no lookbehind, unreliable lookahead, etc).
There are tools for server-side languages that will do this for you, for example PHP's HTML Purifier. I would recommend using one of those at the server-side before returning the content, as I'm currently unaware of a good library of this kind for JavaScript.
The function below can be used to HTML-encode input data in JavaScript, to guard against XSS:
/* Using jQuery: escape HTML/JS special characters */
function htmlEncode(value) {
    if (value) {
        // .text() stores the value as plain text; .html() reads it back entity-encoded.
        return $('<div/>').text(value).html();
    } else {
        return '';
    }
}
You don't need to remove the tags, just translate the special characters.
For example, turn < into &lt;, > into &gt;, etc.
If you are using PHP, there are functions for this:
htmlspecialchars
htmlentities
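For the client side, the same translation can be sketched in plain JavaScript without jQuery. The name escapeHtml and the exact set of five characters are choices made for this example, not a standard API.

```javascript
// Characters that are significant in HTML text and attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// Hypothetical helper: returns the input with HTML special characters encoded.
function escapeHtml(value) {
  if (value === null || value === undefined) {
    return '';
  }
  // A single pass with a lookup table avoids double-encoding the "&" of
  // entities that this function itself has just produced.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('This <script>alert("x")</script> is <b>Test</b>'));
```

Note that this encodes every tag, including <b>, so it fits the case where no markup at all should survive.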
We want to prevent users from entering scripts or HTML tags in an input, to avoid cross-site scripting attacks.
For this I wrote the code below, but it doesn't seem to work:
var preventScriptsRegEx = new RegExp("[^<>]*");
function getValue() {
    return document.getElementById("myinput").value;
}
function test() {
    alert(preventScriptsRegEx.test(getValue()));
}
This is inspired by this post: Prevent html tags entries in mvc textbox using regular expression
You can try creating a temporary element, assigning the input's value to the element's innerHTML property, and checking the element's childElementCount:
function checkForHTML(text){
    var elem = document.createElement('div')
    elem.innerHTML = text;
    // Any parsed child element means the text contained HTML tags.
    return !!elem.childElementCount;
}
button.addEventListener('click', function(){
    console.log(checkForHTML(input.value))
})
<input id="input">
<button id="button">Check</button>
Please don't do this. You can't just use some nifty RegExp to check for script injection. There are plenty of attack vectors that a RegExp simply cannot match reliably. These include, for example, \u0001-style Unicode escapes or HTML entity encoding (< becomes & lt;, or & # 60; or & # x003C;) (lol, in the original post it even worked here...), which pass your validation but are decoded automatically later on, so that execution becomes possible. I've been writing such exploits for fun, so I can guarantee you that there are almost as many ways to exploit such algorithms as there is creativity in a hacker's/cracker's mind.
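To make the entity-encoding point concrete, here is a small sketch. Both functions are hypothetical: looksSafe stands for the kind of angle-bracket check discussed in this thread, and decodeEntities is a deliberately minimal stand-in for whatever later step decodes entities (for example, assigning the string to innerHTML).

```javascript
// Naive validation: reject anything that contains an angle bracket.
function looksSafe(input) {
  return !/[<>]/.test(input);
}

// Minimal decoder for the few entity forms used in this demo only.
function decodeEntities(input) {
  return input
    .replace(/&#x([0-9a-f]+);/gi, (m, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (m, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>');
}

// The payload contains no literal "<" or ">", so the naive check accepts it.
const payload = '&lt;img src=x onerror=alert(1)&#62;';
console.log(looksSafe(payload));
console.log(decodeEntities(payload));
```

The check passes, yet the decoded string is a complete img tag with an event handler, which is exactly the gap between validating one representation and rendering another.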
The right way to protect yourself from such script injections/XSS is, to not trust user generated content in the first place. Do not trust "validation logic" as well. You shouldn't just accept HTML, JS or CSS code when it is somehow generated on the client side. Never. You should never save such content in a database, or transfer it by any other means and render it again. User generated content that is or could be in form of CSS, HTML or JS is evil and should be treated like a ticking nuclear bomb.
Any content that the client sends to the server and that is re-rendered on the client side in some way must not be sanitized but explicitly rendered via (htmlElement).innerText = user content (pseudo code). innerText is guaranteed not to create any DOM nodes other than text nodes, which is the only way to be sure that you're safe. Never ever render in place into HTML or CSS. Remark: I can also use CSS code for XSS, e.g. via vendor-specific CSS add-ons.
Example: behavior:url(script.htc); -moz-binding: url(script.xml#mycode);
Just never use .innerHTML = either. Never let user-generated code directly affect the DOM at all; never do < div > render($content) </ div > or anything like that.
For content that should be styled, use a DSL that separates text content from context information. It could be JSON, or any other DSL such as Markdown if you need a simple one. Then, in code you trust, loop through that data structure and create the HTML / DOM elements, always using .innerText (or something guaranteed to behave like it) to render the user-generated content. React, for example, renders text content safely by default unless you explicitly use innerHTML or dangerouslySetInnerHTML, which is just sabotage. Also, don't allow user-generated content to set HTML element attributes. I can XSS that too.
Example: < a href="javascript:alert('XSS!')" / >
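The DSL approach described above can be sketched as follows. The segment format ({ text, bold, italic }) and the function name renderSegments are invented for this example. The document object is passed in as a parameter so the sketch does not depend on a global, and textContent is used as the plain-text setter, which, like innerText, only ever creates text.

```javascript
// Trusted code decides which tags exist; user data never names a tag.
function pickTag(segment) {
  if (segment.bold === true) return 'b';
  if (segment.italic === true) return 'i';
  return 'span';
}

// Hypothetical renderer: segments is an array of { text, bold?, italic? }.
function renderSegments(doc, container, segments) {
  for (const segment of segments) {
    const el = doc.createElement(pickTag(segment));
    // Assigned as plain text, so "<script>" in user data stays visible text.
    el.textContent = String(segment.text);
    container.appendChild(el);
  }
}

// Browser usage, assuming an element with id "target" exists on the page:
// renderSegments(document, document.getElementById('target'), [
//   { text: 'Hello ', bold: true },
//   { text: '<script>alert(1)</script>' }
// ]);
```

No attribute is ever set from user data here, in line with the warning about attributes above.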
I have a contentEditable div.
The problem is that when I type in HTML tags, say, <h1>, the angle brackets get replaced with their entity names &lt; and &gt; in the DOM.
How do I prevent that from happening?
I want the actual tags to be inserted instead of the HTML entity.
Here's a jsFiddle to depict the problem: https://jsfiddle.net/m0xbd2j9/28/
This is what is expected to happen. Otherwise, users could easily mess up the HTML of your page.
If you REALLY want to do that (I don't recommend doing so in production), you can intercept the keyup event and perform a replace over the element's contents. Something like this (untested example using jQuery for the sake of brevity):
var editable = $("#editableDiv");
editable.on("keyup", function(event) {
    // TODO: Better check if event key is either "<" or ">"
    //       to avoid extra processing if not necessary.
    editable.html(
        editable.html()
            // The g flag replaces every occurrence, not only the first one.
            .replace(/&lt;/g, "<")
            .replace(/&gt;/g, ">")
    );
});
On the other hand, if what you want is simply to read those contents (not render them) as unescaped HTML, you can do the same replacement when reading.
HINT: On the backend you probably have some library functions, like html_entity_decode() in PHP, which do just that and much more for you.
I'm trying to parse a blob of text in HTML format that only allows bold <b></b> and italic <i></i>.
I know it's nearly impossible to parse HTML text in a way that is secure against XSS. But given the constraint of only bold and italic, is it feasible to use regex to filter out the unwanted tags?
Thanks.
--- Edit ---
I meant to do the parsing on the client side, and render it right back.
Please test your code against this before jumping to conclusions.
http://voog.github.io/wysihtml/examples/simple.html
BTW, why did the question itself get downvoted?
--- Closed ---
I picked @Siguza's answer to close this discussion.
The easiest and probably most secure way I can think of (doing this with regex) is to first replace all < and > with &lt; and &gt; respectively, and then explicitly "un-replace" the b and i tags.
To replace < and > you just need text substitution, no regex. But I trust you know how to do this in regex anyway.
To re-enable the i and b tags, you could also use four text replacements:
&lt;b&gt; => <b>
&lt;/b&gt; => </b>
&lt;i&gt; => <i>
&lt;/i&gt; => </i>
Or, in regex, replace /&lt;(\/?[bi])&gt;/g with <$1>.
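Put together, the two steps might look like the sketch below. The function name allowOnlyBoldItalic is made up for this example, and escaping & first is an addition of this sketch (it keeps text that already contains a literal "&lt;b&gt;" from being turned into a tag); the answer itself only mentions < and >.

```javascript
// Step 1: neutralise everything. Step 2: re-enable bare <b>, </b>, <i>, </i>.
function allowOnlyBoldItalic(input) {
  const escaped = String(input)
    .replace(/&/g, '&amp;') // assumption of this sketch, see lead-in
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  // Only exact tags without attributes or spaces are restored.
  return escaped.replace(/&lt;(\/?[bi])&gt;/g, '<$1>');
}

console.log(allowOnlyBoldItalic('<b>bold</b> <i>italic</i> <script>alert(1)</script>'));
console.log(allowOnlyBoldItalic('<b onmouseover="alert(1)">hover</b>'));
```

Because a tag with attributes never matches the second pattern, something like an onmouseover handler on a b tag stays escaped.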
But...
...for the sake of completeness, it actually is possible with just one single regex substitution:
Replace /<(|\/|[^>\/bi]|\/[^>bi]|[^\/>][^>]+|\/[^>][^>]+)>/g with &lt;$1&gt;.
I will not guarantee that this is bullet-proof, but I tested it against the following block using RegExr, where it appeared to hold up:
<>Test</>
<i>Test</i>
<iii>Test</iii>
<b>Test</b>
<bbb>Test</bbb>
<a>Test</a>
<abc>Test</abc>
<some tag with="attributes">Test</some>
<br/>
<br />
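For anyone who wants to repeat that check outside RegExr, a small harness along these lines should do. The function name escapeAllButBoldItalic is made up here, and the sample list simply mirrors the block above; passing these samples is of course no proof that the regex is bullet-proof.

```javascript
// The single-substitution regex from the answer, unchanged.
const NOT_BOLD_OR_ITALIC = /<(|\/|[^>\/bi]|\/[^>bi]|[^\/>][^>]+|\/[^>][^>]+)>/g;

function escapeAllButBoldItalic(html) {
  return html.replace(NOT_BOLD_OR_ITALIC, '&lt;$1&gt;');
}

const samples = [
  '<>Test</>',
  '<i>Test</i>',
  '<iii>Test</iii>',
  '<b>Test</b>',
  '<bbb>Test</bbb>',
  '<a>Test</a>',
  '<abc>Test</abc>',
  '<some tag with="attributes">Test</some>',
  '<br/>',
  '<br />'
];

// Print each sample next to its escaped form for manual inspection.
for (const sample of samples) {
  console.log(sample, '=>', escapeAllButBoldItalic(sample));
}
```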
Can you do this with regex? Kind of. You have to write a regex to find all tags that are not b or i tags. Below is a simple example of one, it matches any tag with more than 1 character in it, which only allows <a>, <b>, <i>, <p>, <q>, <s>, and <u> (no spaces, no attributes and no classes allowed), which I believe fits your needs. There may well be a more precise regex for this, but this is simple. It may or may not catch everything. It probably doesn't.
<[^>]{2,}[^/]>
Should you do this with regex? No. There are other better, more secure ways.
Parse out tags, replace with a special delimiter (or store indices).
XSS sanitize the input.
Replace the delimiters with tags.
Make sure you don't have any mismatched tags.
XSS sanitizing needs to be done server-side - the client is in control of the client-side, and can circumvent any checks there.
I still maintain that the OWASP Cheat Sheet is sufficient for XSS sanitization, and replacing only empty bold and italic tags shouldn't compromise any of the rules.
A couple ways to think of this question. Decide for yourself which is most useful for you…
Can javascript recreate the deprecated <xmp> tag with an "xmp" class?
Can we mimic SO's markdown processor's escaping tags within inline code?
(metaSO, I admit) How does SO's md processor escape tags in inline code?
Can we escape < in markdown code blocks embedded in HTML?
The goal: a class that escapes <, allowing that class to contain the text <html>, <body>, <head>, <script>, <style>, <body>, and any other tags that don't belong inside <body> or are processed specially, without processing them specially.
<xmp> achieved this (and actually continues to: deprecated but still browser-supported): <xmp><body></xmp> was like &lt;body&gt;. SO's markdown processor achieves this in inline code: to display <body> just write `<body>` (which itself is \`<body>\` and cannot be included in an SO code block… I'm not the only one who could use this ;)
My solution so far, replacing all < with &lt;, takes care of the less-special HTML tags (is there a name for these within-<body> static content tags? <div>, <code>, <span>, etc), but the "special" tags still have to be started with &lt; instead of <
Javascript:
var xmps = document.getElementsByClassName('xmp');
for (var i = 0; i < xmps.length; i++) {
    var xmp = xmps.item(i);
    // Turn every remaining angle bracket in the element's markup into an entity.
    var newhtml = xmp.innerHTML.replace(/>/g, "&gt;").replace(/</g, "&lt;");
    xmp.innerHTML = newhtml;
}
With that I can write <div class="xmp">&lt;body></div>.
What will allow
<div class="xmp"><body></div>?
or how about?
<div class="xmp">
<body>
&/div>
</div>
The details of my project might matter: There'll be markdown in class="xmp" so we need to be careful with line-initial >s. There is no user input, so security isn't(?) an issue. I'm hoping for a solution that doesn't use jQuery.
You cannot recreate the functionality of xmp with JavaScript, because that functionality is about HTML parsing, and the element has already been parsed by the browser by the time JavaScript can get its hands on it.
On the other hand, I don’t see any need for that. As far as I know, xmp is supported by all browsers, and HTML5 CR requires that support be retained and that new browsers implement it too. It also says that authors must not use it, but this doesn’t mean xmp wouldn’t work.
I need to pass html to javascript so that I can show the html on demand.
I can do it using textareas by having a textarea tag with the html content on the page, like so: <textarea id="html">{whatever html I want except other textareas}</textarea>
then using jquery I can present it on the page:
$("#target").html($("#html").val());
What I want to know is how to do it properly, without having to use textareas or having the html present in the <body> of the page at all?
You could use jquery templates. It's a bit more complex, but offers lots of other nice features.
https://github.com/codepb/jquery-template
Just save it in a variable:
<script type="text/javascript">
var myHTML = '<div>Foo Bar</div>';
</script>
As far as I know there is no painless way to do this due to the nature of html and javascript.
You can store your html as a string in a javascript variable such as:
var string = '<div class="someClass">your text here</div>';
However, you should note that strings are enclosed within either ' or ", and if you use the enclosing quote character in your HTML you will prematurely end the string and cause errors from invalid JavaScript.
You can decide to use only one type of quote in your HTML, say ", and then ' to delimit strings in JavaScript, but a more robust way is to escape the quotes in your HTML like so:
<div class=\"someClass\">your text here</div>
By putting \ before a quote character you tell JavaScript to treat it as a literal character rather than as the end of the string. When you print the string, the quote still appears but the \ does not, giving you functioning HTML.
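The quoting options can be compared side by side in a short sketch. The variable names are illustrative only, and the template literal form assumes an ES2015-capable browser.

```javascript
// Option 1: single-quoted JavaScript string, double quotes inside the HTML.
const singleQuoted = '<div class="someClass">your text here</div>';

// Option 2: double-quoted JavaScript string with the inner quotes escaped.
const escapedQuotes = "<div class=\"someClass\">your text here</div>";

// Option 3: template literal, which allows both quote types unescaped.
const templateLiteral = `<div class="someClass">it's "your" text here</div>`;

// The backslashes are not part of the resulting string.
console.log(singleQuoted === escapedQuotes);
console.log(templateLiteral);
```

With a template literal, only a backtick, a backslash, or the sequence ${ inside the HTML needs escaping.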
Just like remy mentioned, you can use jQuery templates, and it's even cooler if you combine it with mustache! (which supports a lot of platforms)
Plus the mustache jQuery plugin is way more advanced than jQuery templates.
https://github.com/jonnyreeves/jquery-Mustache