I'm looking for a simple list of all the html attributes that can contain javascript that will automatically run when an action is performed. I know this will differ between browsers and versions, but I'd rather be safe than sorry. I currently know of the following javascript attributes: onload, onclick, onchange, onmouseover, onmouseout, onmousedown, and onmouseup.
Backstory:
I'm getting a full html document from an untrusted source, and I want to strip all javascript that could run from it before it's displayed in an iframe, so I'm removing all script tags as well as any attributes that could hold javascript. For this implementation there is no server-side processing and no way of sandboxing the code, since I need to run javascript that is added locally after all of the original javascript is removed.
There are two places where Javascript can be used in HTML attributes:
Any onEVENT attribute. I suggest just treating any attribute whose name begins with "on" as an event binding and stripping them all out.
Any attribute that can contain a URI, such as href and src, will be executed as Javascript if the URI uses the javascript: scheme. A complete list is in the question "COMPLETE list of HTML tag attributes which have a URL value?"
http://www.w3.org/TR/html401/interact/scripts.html#h-18.2.3
Scroll down to 18.2.3 Intrinsic events
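As a rough sketch of those two rules, assuming you already have each attribute's name and value as strings (for example while walking the DOM of the untrusted document), the check could look like this. The function names and the list of blocked schemes are my own assumptions, not part of any library. Browsers ignore leading spaces/control characters and any tabs or newlines inside a URL, so the value is cleaned the same way before its scheme is compared.

```javascript
// Schemes treated as script-bearing (assumption). data: is included as a
// conservative choice, since data: URLs can carry HTML or script in some contexts.
const BLOCKED_SCHEMES = ['javascript:', 'vbscript:', 'data:'];

// Clean a value the way browsers clean a URL before reading its scheme:
// strip leading/trailing control characters and spaces, drop tabs/newlines anywhere.
function normalizeUrlValue(value) {
  return value
    .replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '')
    .replace(/[\t\n\r]/g, '')
    .toLowerCase();
}

// Returns true if the attribute should be stripped.
function isScriptAttribute(name, value) {
  // Rule 1: any attribute starting with "on" is treated as an event handler.
  if (name.toLowerCase().startsWith('on')) {
    return true;
  }
  // Rule 2: any value using a blocked scheme. This checks every attribute
  // rather than a list of URL-valued ones, which is stricter.
  const normalized = normalizeUrlValue(String(value));
  return BLOCKED_SCHEMES.some((scheme) => normalized.startsWith(scheme));
}
```

Checking every attribute value instead of only URL-valued attributes is stricter than necessary (a title of "javascript: tips" would also be stripped), but it avoids having to maintain the list of URL attributes.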
I've had a similar requirement in a project. Don't forget to strip script elements, as well.
Is there a way to detect whether an html file contains Javascript? And can we stop the Javascript in an html file from running, in Node JS?
I know we can stop the html from rendering altogether by changing the response content-type from text/html to text/plain, but I'm trying to figure out a way to stop only the JS from running.
Kindly let me know if it's even possible. Thanks.
I'm guessing you're sending the file to a browser from Node.js (you talked about changing the content type header).
To do this, you'll need to:
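Parse the file with an HTML parser (there are a few available for Node.js). Be sure it's one that normalizes its input, so that obfuscated markup (for instance, a javascript: URL written with character references or stray whitespace) ends up in its plain, canonical form. (Thanks Quentin for emphasizing that!)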
Using the resulting document model, remove:
any script elements
any onxyz attributes (onclick, onmouseover) on elements
For instance, <div onclick="..." should be changed to <div ....
remove any URL attributes (like href on a elements) that use the javascript: scheme
For instance, <a href="javascript:codeHere()" should be changed to <a href="#" or similar (if you remove href entirely, that works too, but the link will no longer automatically be a tab stop, etc.).
(This is where normalization in the parser is important.)
Serialize the resulting document model to HTML and send it to the browser
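Here is a minimal sketch of the remove-and-serialize steps in JavaScript. It deliberately does not use a real parser: it assumes a hypothetical node shape ({ type, tag, attrs, children } for elements, { type, value } for text), so you would have to adapt it to whatever tree your chosen parser produces. It drops script elements and on* attributes, replaces a javascript: href with "#", drops other javascript: URL attributes, and re-escapes text and attribute values on output.

```javascript
// Hypothetical node shape (not the output of any particular parser):
//   { type: 'element', tag: 'a', attrs: { href: '...' }, children: [...] }
//   { type: 'text', value: '...' }

function isJavascriptUrl(value) {
  // Browsers ignore leading/trailing control chars and spaces, and tabs/newlines anywhere.
  const cleaned = value.replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '').replace(/[\t\n\r]/g, '');
  return cleaned.toLowerCase().startsWith('javascript:');
}

// Returns a sanitized copy of the tree, or null if the node must be dropped.
// Anything that is not text or an element (comments, etc.) is dropped too.
function sanitize(node) {
  if (node.type === 'text') return { type: 'text', value: node.value };
  if (node.type !== 'element' || node.tag.toLowerCase() === 'script') return null;

  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (name.toLowerCase().startsWith('on')) continue; // onclick, onmouseover, ...
    if (isJavascriptUrl(value)) {
      // Keep a harmless href so the link stays focusable; drop other URL attributes.
      if (name.toLowerCase() === 'href') attrs[name] = '#';
      continue;
    }
    attrs[name] = value;
  }
  const children = (node.children || []).map(sanitize).filter((child) => child !== null);
  return { type: 'element', tag: node.tag, attrs, children };
}

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Serialize back to HTML. Void elements (<br>, <img>) and raw-text
// elements (<style>) are not special-cased in this sketch.
function serialize(node) {
  if (node.type === 'text') return escapeHtml(node.value);
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join('');
  return `<${node.tag}${attrs}>${node.children.map(serialize).join('')}</${node.tag}>`;
}
```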
The Windows Forms WebBrowser control supports Javascript, and that Javascript can make changes to the DOM. However, when I read the DocumentText property, I always get the unmodified HTML. Is there any way to get the HTML after modification?
You should be able to just do: webBrowser1.Document.Body.InnerHtml
When you modify the HTML document, are you changing form elements or other types of elements?
One thing I noticed during debugging is that when I use setAttribute on an input element and then read webbrowser1.document.innerText, I get the modified document back.
My suggestion is that you either set the property of the HTML document you are modifying through code first, or use webbrowser1.document.body.innerText.
Generally, there are 3 ways (that I am aware of) to execute javascript from an <a/> tag:
1) Use onclick():
<a href="#" onclick="alert('Hello'); return false;">hello</a>
2) Directly link:
<a href="javascript:alert('Hello');">hello</a>
3) Or attach externally:
<a id="hello" href="#">hello</a>
// In an onload event or similar
document.getElementById('hello').onclick = function () {
  window.alert('Hello');
  return false; // don't follow href="#"
};
I am actually loading the link via AJAX, so #3 is basically out. So, is it better to do #1 or #2 or something completely different? Also, why? What are the pitfalls that I should be aware of?
Also of note, the anchor really doesn't link anywhere, hence the href="#". I am using an <a> so the styles conform, as this is still an object to be clicked and a button is inappropriate in the context.
Thanks
If you are loading the content via ajax and need to hook up event handlers, then you have these choices:
Put a javascript handler in your HTML with your option 1) or 2). In my mind option 1) is a cleaner way of specifying it, but I don't think there's a mountain of difference between 1) and 2); they both do essentially the same thing. I'm not a fan of this option in general because I think there's value in keeping the markup and the code separate.
After loading the content with ajax, call some local code that will find and hook up all the links. This would be the same kind of code you would have in your page and execute on DOMReady if the HTML had been static HTML in your page. I would use addEventListener (falling back to attachEvent) to hook up this way as it more cleanly allows multiple listeners for a single object.
Call some code after you load the content with ajax that finds all the links and hooks up the clicks to some generic click handler that can then examine meta data in the link and figure out what should be done on that click based on the meta data. For example, this meta data could be attributes on the clicked link.
When you load the content, also load code that can find each link individually and hook up an appropriate event handler for each link much the way one would do it if the content was just being loaded in a regular page. This would meet the desire of separating HTML from JS as the JS would find each appropriate link and hook up an event handler for it with addEventListener or attachEvent.
Much like jQuery .live() works, hook up a generic event handler for unhandled clicks on links at the document level and dispatch each click based on some meta data in the link.
Run some code that uses an actual framework like jQuery's .live() capability rather than building your own capability.
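The document-level option (the .live()-style approach) can be sketched with event delegation. This assumes, hypothetically, that each link carries its metadata in a data-action attribute and that actions maps action names to functions; a single listener on document then also covers links that arrive later via ajax.

```javascript
// Build one click handler that dispatches based on the clicked link's metadata.
// Hypothetical convention: <a href="#" data-action="greet">hello</a>
function createDelegatedClickHandler(actions) {
  return function (event) {
    // closest() walks up from the clicked node to the nearest matching ancestor,
    // so a click on a <span> inside the link still counts.
    const link = event.target && event.target.closest
      ? event.target.closest('a[data-action]')
      : null;
    if (!link) return; // not one of our links: let the click proceed normally
    const action = actions[link.getAttribute('data-action')];
    if (typeof action === 'function') {
      event.preventDefault(); // don't follow href="#"
      action(link, event);
    }
  };
}

// Usage in a browser (showGreeting is a placeholder for your own function):
// document.addEventListener('click', createDelegatedClickHandler({ greet: showGreeting }));
```

Element.closest() isn't available in old IE, so there you would have to walk up parentNode yourself.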
Which I would use would depend a little on the circumstances.
First of all, of your three options for attaching an event handler, I'd use a new option #4. I'd use addEventListener (falling back to attachEvent for old versions of IE) rather than assigning to onclick, because this more cleanly allows for multiple listeners on an item. If it were me, I'd be using a framework (jQuery or YUI) that makes the cross-browser compatibility invisible. This allows complete separation of HTML and JS (no JS inline with the HTML), which I think is desirable in any project involving more than one person and just seems cleaner to me.
Then it's just a question of which of the options above I'd use to run the code that hooks up these event listeners.
If there were a lot of different snippets of HTML that I was dynamically loading and it would be cleaner if they were all "standalone" and separately maintainable, then I would want to load both the HTML and the relevant code at the same time and have the newly loaded code handle hooking up to its appropriate links.
If a generic standalone system wasn't really required because there were only a few snippets to be loaded and the code to handle them could be pre-included in the page, then I'd probably just make a function call after the HTML snippet was loaded via ajax to have the javascript hook up to the links in the snippet that had just been loaded. This would maintain the complete separation between HTML and JS, but be pretty easy to implement. You could put some sort of key object in each snippet that would identify which piece of JS to call or could be used as a parameter to pass to the JS, or the JS could just examine the snippet to see which objects were available and hook up to whichever ones were present.
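A minimal sketch of that addEventListener-with-attachEvent-fallback helper (addListener is a made-up name, not a standard API). attachEvent expects the "on"-prefixed event name and invokes handlers with this set to window, so the wrapper restores the element as this and passes window.event.

```javascript
// Attach a listener, preferring the standard addEventListener and
// falling back to attachEvent for old IE (IE8 and earlier).
function addListener(element, type, handler) {
  if (element.addEventListener) {
    element.addEventListener(type, handler, false);
  } else if (element.attachEvent) {
    // attachEvent takes "onclick" rather than "click" and calls handlers
    // with `this` set to window, so wrap the handler to fix both.
    element.attachEvent('on' + type, function () {
      handler.call(element, window.event);
    });
  } else {
    throw new Error('No event listener API available on this element');
  }
}
```

Unlike assigning to onclick, calling addListener twice on the same element keeps both handlers.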
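For the "function call after the snippet is loaded" case, a sketch might look like this. The data-action key on each link and the handlers map are hypothetical conventions standing in for the "key object" idea; the function only binds to links found inside the newly inserted container, so nothing else on the page is touched.

```javascript
// Hook up every link inside a freshly loaded snippet.
// Hypothetical convention: each link names its behaviour in data-action.
function bindSnippet(container, handlers) {
  const links = container.querySelectorAll('a[data-action]');
  let bound = 0;
  for (let i = 0; i < links.length; i++) {
    const handler = handlers[links[i].getAttribute('data-action')];
    if (typeof handler !== 'function') continue; // unknown key: leave the link alone
    links[i].addEventListener('click', function (event) {
      event.preventDefault(); // the link is only a click target
      handler.call(this, event);
    });
    bound++;
  }
  return bound; // number of links hooked up, handy when debugging
}

// Usage after the ajax response has been inserted (names are illustrative):
// container.innerHTML = responseText;
// bindSnippet(container, { greet: function () { alert('Hello'); } });
```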
Number 3 is not "out" if you want to load via AJAX.
var parent = document.getElementById("links"); // illustrative container id
var link = document.createElement("a");
link.href = "#"; // Add attributes (href, text, etc.)
link.textContent = "hello";
link.onclick = function () { // This has to be a function, not a string
  // Handle the click
  return false; // prevent following the link
};
parent.appendChild(link); // Add it to the DOM
Modern browsers support a Content Security Policy (CSP). It is one of the strongest protections available and is strongly recommended if you can apply it, because it blocks the usual ways injected script gets executed, which shuts down most XSS attacks.
CSP does this by disabling the vectors through which a user could inject Javascript into a page; in your question that means both options 1 and 2 (especially 1).
For this reason the best practice is option 3, as the other options will break when a CSP that disallows inline script is enabled.
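The answer doesn't give a specific policy, so the one below is an assumption: a Node.js sketch of a header that only allows scripts from the page's own origin. Because 'unsafe-inline' is not listed, CSP-supporting browsers refuse inline <script> blocks, inline handlers such as onclick="..." and javascript: URLs (options 1 and 2), while a handler attached from an external script file (option 3) keeps working.

```javascript
// Only allow scripts from the page's own origin. With no 'unsafe-inline',
// inline <script> blocks, inline event handler attributes and
// javascript: URLs are blocked. object-src 'none' is extra hardening (assumption).
const CSP_POLICY = "script-src 'self'; object-src 'none'";

// Works with Node's http.ServerResponse (or anything that has setHeader).
function applyCsp(res) {
  res.setHeader('Content-Security-Policy', CSP_POLICY);
  return res;
}
```

With Node's http module this would be called from the request handler, e.g. http.createServer((req, res) => { applyCsp(res); /* ...send the page... */ }).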
I'm a firm believer in separating javascript from markup. There should be a distinct difference, IMHO, between what is for display purposes and what is for execution purposes. With that said, avoid using the onclick attribute and embedding javascript:* in an href attribute.
Alternatives?
You can include javascript library files using AJAX.
You can set up javascript to handle elements that are added to the DOM later (e.g. if it's a "standard task", give the anchor a CSS class name that can be used to bind a specific mechanism when it's added dynamically; jQuery does a great job at this with .delegate()).
Run your scripts POST-AJAX call. (Bring in the new content, then use javascript to [re]bind the functionality) e.g.:
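function ajaxCallback(content) {
  // Add content to the DOM ('content' is an illustrative container id)
  var container = document.getElementById('content');
  container.innerHTML = content;
  // Search within the newly added content for elements that need binding
  var links = container.querySelectorAll('a.needs-binding'); // placeholder class name
  for (var i = 0; i < links.length; i++) links[i].onclick = handleClick; // handleClick: your own handler
}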
This may be a subjective question. If so please close it.
Does onclick count as embedded JavaScript?
Or is it just usually the method called that’s actually the JavaScript part?
I’m not sure what you’re asking, but I’ll have a go anyway.
The onclick attribute that you can add to HTML elements (i.e. <a onclick="">) is HTML. However, its value is JavaScript that gets run when the user clicks on the element. That JavaScript (along with the association between that JavaScript and the HTML element’s click event) is indeed embedded in the HTML page, meaning you have to change your HTML page to change the JavaScript (or remove the association).
To avoid embedding JavaScript into your HTML page, you can instead add a handler function to the onclick DOM property of an element via JavaScript:
Add a JavaScript file to your page, via the <script src=""> tag.
In that file, set some code to run on page load (or when the DOM is ready, which is a whole other topic in itself).
Have that code add an onclick handler function to an HTML element.
E.g. if your HTML looked like this
<a id="needs_onclick">I need an onclick handler</a>
Then you could add an onclick handler to the link like this:
window.onload = function(){
document.getElementById("needs_onclick").onclick = function(){
alert("Clicked!");
}
}
This approach would not be described as “embedded JavaScript”, as it uses the onclick DOM property, not the onclick attribute.
(...) functions called "Event Handlers." These are commands that work directly with existing HTML commands. They work so closely in fact, they work by being embedded right into the HTML command itself.
The source: http://www.htmlgoodies.com/beyond/javascript/article.php/3470771/Advanced-JavaScript-for-Web-Developers-onClick-and-onMouseOver.htm
From mighty Google. Took me about 15 seconds to find it.
My page needs to grab a specific div from another page to display within a div on the current page. It's a perfect job for $.load, except that the HTML source I'm pulling from is not necessarily well-formed and seems to have occasional tag errors, and IE just plain won't use it in that case. So I've changed it to a $.get to grab the HTML source of the page as a string. Passing it to $ to parse it as a DOM has the same problem in IE as $.load, so I can't do that. I need a way to parse the HTML string to find the contents of my div#information, but not the rest of the page after its </div>. (PS: My div#information contains various other divs and elements.)
EDIT: Also if anyone has a fix for jQuery's $.load not being able to parse response HTML in IE, I'd love to hear that too.
If the resource you are trying to load is under your control, your implementation spec is poorly optimized. You don't want to ask your server for an entire page of content when you only really need a small piece of that content.
What you'll want to do is isolate the content you want, and have the server return only what you need.
As a side note, since you are aware that you have malformed HTML, you should probably bite the bullet and validate your markup. That will save you some trouble (like this) in the future.
Finally, if you truly cannot optimize this process, my guess is that you are creating an inconsistency because some element in the parsed HTML has the same ID as an element on your current page. Duplicate IDs are invalid and lead to many cross-browser JavaScript problems.
Strictly with strings you could use a regular expression to find the id="information" tag contents. Never parse it as html.
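A single regular expression can't reliably match a div that contains other divs, so this string-only sketch swaps in a slightly different technique: a regex only finds div tags, and a counter tracks nesting depth until the matching </div>. extractDivById is a made-up name; it assumes the target's id is quoted and that the divs inside it are balanced, and "<div" appearing inside comments, scripts or attribute values will confuse it.

```javascript
// Return the inner HTML of <div id="..."> from an HTML string, or null.
function extractDivById(html, id) {
  const escapedId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Case-insensitive, since older markup often uses <DIV ID=...>.
  const openRe = new RegExp('<div\\b[^>]*\\sid\\s*=\\s*["\']' + escapedId + '["\'][^>]*>', 'i');
  const start = openRe.exec(html);
  if (!start) return null;

  const contentStart = start.index + start[0].length;
  const tagRe = /<(\/?)div\b[^>]*>/gi;
  tagRe.lastIndex = contentStart;
  let depth = 1;
  let match;
  while ((match = tagRe.exec(html)) !== null) {
    depth += match[1] ? -1 : 1; // "</div" closes a level, "<div" opens one
    if (depth === 0) return html.slice(contentStart, match.index);
  }
  return null; // no matching </div>: the markup is too broken to be sure
}
```

The result is still a string; with the question's malformed source it is worth checking for null before handing it to $.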
I'd try the form of .load() that accepts a selector after the URL (separated by a space), so that only that fragment is inserted:
$('#results').load('/MySite/Url #information');