When to use document.implementation.createHTMLDocument?

What are some use cases, and is it deprecated? I read at http://groups.google.com/group/envjs/browse_thread/thread/6c22d0f959666009/c389fc11537f2a97 that it's "non-standard and not supported by any modern browser".
About document.implementation at http://javascript.gakaa.com/document-implementation.aspx:
Returns a reference to the W3C DOMImplementation object, which
represents, to a limited degree, the environment that makes up the
document container (the browser, for our purposes). Methods of the object
let you see which DOM modules the browser reports supporting. This
object is also a gateway to creating virtual W3C Document and
DocumentType objects outside of the current document tree. Thus, in
Netscape 6 you can use the document.implementation property as a start
to generating a nonrendered document for external XML documents. See
the DOMImplementation object for details about the methods and their
browser support.
Given that it provides methods (such as createHTMLDocument) for creating a non-rendered document outside of the current document tree, would it be safe to feed it untrusted third-party HTML input that may contain XSS? I ask because I would like to use createHTMLDocument to traverse third-party HTML input. Is that one of its use cases?

I always use this because it doesn't request images, execute scripts, or affect styling:
function cleanHTML(html) {
    // Parse in a detached document, so nothing is fetched or executed.
    // The title argument is passed explicitly because older browsers require it.
    var root = document.implementation.createHTMLDocument("").body;
    root.innerHTML = html;
    // Manipulate the DOM here
    // (jQuery is incidental; it just saves some boilerplate)
    $(root).find("script, style, img").remove();
    return root.innerHTML;
}
cleanHTML('<div>hello</div><img src="google"><script>alert("hello");</script><style type="text/css">body {display: none !important;}</style>');
// returns "<div>hello</div>" with the page unaffected

Yes. You can use this to load untrusted third-party content and strip it of dangerous tags and attributes before including it in your own document. There is some great research incorporating this trick, described at http://blog.kotowicz.net/2011/10/sad-state-of-dom-security-or-how-we-all.html.
The technique documented by Esailija above is insufficient, however. You also need to strip out most attributes. An attacker could set an onerror or onmouseover attribute to malicious JS. The style attribute can be used to include CSS that runs malicious JS. Iframe and other embed tags can also be abused. View the source at https://html5sec.org/xssme/xssme2 to see a version of this technique.
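As a rough illustration of the attribute problem, here is a minimal allowlist sketch. The tag and attribute lists and the names isSafeAttribute and sanitizeTree are hypothetical, chosen only for this example; this is not a vetted sanitizer, and a real filter has to handle many more cases.

```javascript
// Hypothetical allowlists: anything not listed here is dropped.
var ALLOWED_TAGS = new Set(["DIV", "P", "SPAN", "A", "B", "I", "EM", "STRONG", "UL", "OL", "LI"]);
var ALLOWED_ATTRS = new Set(["href", "title"]);

// Decide whether a single attribute may stay on an element.
function isSafeAttribute(name, value) {
  var attr = name.toLowerCase();
  // Rejects on* handlers, style, and everything else not allowlisted.
  if (!ALLOWED_ATTRS.has(attr)) return false;
  if (attr === "href") {
    // Only absolute http(s) and mailto links; this rejects javascript: URLs.
    return /^(https?:|mailto:)/i.test(value.trim());
  }
  return true;
}

// Walk a detached tree (for example the body of a document made with
// createHTMLDocument) and remove disallowed elements and attributes.
function sanitizeTree(root) {
  // Array.from takes a static snapshot, so removals do not disturb the loop.
  Array.from(root.querySelectorAll("*")).forEach(function (el) {
    if (!ALLOWED_TAGS.has(el.tagName)) {
      el.remove();
      return;
    }
    Array.from(el.attributes).forEach(function (attr) {
      if (!isSafeAttribute(attr.name, attr.value)) {
        el.removeAttribute(attr.name);
      }
    });
  });
}
```

sanitizeTree would be called on the detached body before reading innerHTML back. An allowlist is used here instead of a blocklist because it fails closed: an attribute or tag nobody thought of is removed by default.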

Just a cleaner answer building on @Esailija's and @Greg's answers:
This function creates another document outside the tree of the current document and removes all scripts, styles, and images from it:
// Assumed to exist in the surrounding code: a list of the cleaned bodies.
var documentsList = [];
function insertDocument(myHTML) {
    var newBody = document.implementation.createHTMLDocument("").body;
    newBody.innerHTML = myHTML;
    [].forEach.call(newBody.querySelectorAll("script, style, img"), function (el) {
        el.remove();
    });
    documentsList.push(newBody);
    return $(newBody.innerHTML);
}
This works well for content fetched via Ajax, and scraping it should be faster because the detached document does not load images or run scripts :)

Related

Extending <object> in Dart

The Dart <object> element does not support a getter to access <object>.contentDocument and thus I thought about extending the object to add the functionality.
I took a look at the implementation of the ObjectElement and I basically need to add these lines:
@DomName('HTMLObjectElement.contentDocument')
@DocsEditable()
Document get contentDocument => _blink.BlinkHTMLObjectElement.instance.contentDocument_Getter_(this);
However, I have no idea how to do this. The solution I am using at the moment is a proxy that redirects all calls to the underlying JsObject, but to be honest, this is not just dirty, it is impossible to maintain.
/* Updated to explain the root of all evil */
When starting the project I am working on, I wanted to display SVGs, which are uploaded by the user, on the website and let the user manipulate these SVGs by inserting additional SvgElements or removing others.
When downloading the SVGs as a String and displaying them by
container.append(new SvgElement(svgCode))
I got really strange display bugs: embedded images in the SVGs were displaced or even removed, and there were other bugs with masks.
The problem was solved by using an <object> tag and setting its data attribute to the SVG's URL. The SVGs are rendered correctly. That being said, another issue came up. I wasn't able to access and manipulate the SVG's DOM because it's inside an <object> tag, and the tag's document cannot be accessed through contentDocument.
When taking all this into account, there are pretty much only two options left:
Either I use the <object> tag, with no display bugs but without being able to manipulate the SVGs, or
I create new SvgElements from the SVG's source and append them to the DOM, which lets me manipulate the SVGs but causes display bugs.
Since living with display bugs isn't really a solution, I can only use the first option: an <object> tag, working around the limitation with JavaScript to access the object's contentDocument.
As you can see, accessing the contentDocument is not always a security issue, and disallowing it altogether is just a quick and dirty solution to the problem.
When accessing the contentDocument through a JsObject, I get a JsObject back and not an Element. Thus I not only have to update my code pretty much everywhere, but it also gets pretty ugly, since I have to use the JsObject with callMethod(blabla).

import 'dart:html';
import 'dart:js' as js;
class MyObjectElement extends ObjectElement {
  static bool _isRegistered = false;
  // Register the custom element once; it extends the built-in <object> tag.
  static register() {
    if (!_isRegistered) {
      document.registerElement('my-object', MyObjectElement,
          extendsTag: 'object');
      _isRegistered = true;
    }
  }
  factory MyObjectElement() {
    var result = document.createElement('object', 'my-object');
    return result;
  }
  MyObjectElement.created() : super.created();
  js.JsObject get contentDocument {
    // doesn't seem to work with a custom element.
    return new js.JsObject.fromBrowserObject(this)['contentDocument'];
  }
}
Use it like this:
MyObjectElement.register();
var obj = new MyObjectElement()
  ..data =
      "https://www.suntico.com/wp-content/uploads/DemoStampRotate01-e1400242575670.png";
document.body.append(obj);

Adding StyleSheets to Firefox Bootstrapped Addon

According to Using the Stylesheet Service
The above-mentioned document also states:
loadAndRegisterSheet fails if CSS contains #id. '#' must be percent-encoded, details see bug 659650.
The bug report was made on 2011-05-25. Is it still a bug or has it been resolved?
There is another way of adding CSS but that is per window and I prefer to get this one sorted.
Update:
Here is the content of the style-sheet
#rpnethelper-separator2:last-child { display: none; }
#rpnethelper-menuitem {
list-style-image: url('icon16.png');
}
This is the actual code (plus added console calls)
register: function(css) {
    let sss = Components.classes['@mozilla.org/content/style-sheet-service;1']
                        .getService(Components.interfaces.nsIStyleSheetService);
    let cssURI = Services.io.newURI(css, null, null);
    sss.loadAndRegisterSheet(cssURI, sss.USER_SHEET);
},
I tried it with try {} catch {} and I don't get any errors.
How/where can USER_SHEET be viewed?
For now, I am going to use an inline style (which doesn't support the pseudo classes) but I would still like to resolve this issue.
Final Update:
For some reason, the code that wasn't working with USER_SHEET works fine with AUTHOR_SHEET.
Funny thing is, after all that, I decided it is not worth the extra processing just for one pseudo-class, so I opted for the (simple) inline style.

You forgot to specify the correct namespace. Add the following as the first line of your sheet.
@namespace url("http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul");
The docs you already linked state:
Stylesheets added using this service get applied to both chrome and content documents. Remember to declare the correct namespace if you want to apply stylesheets to XUL documents.
Also, if you're targeting Firefox 18 and later (and really, supporting earlier versions has no merit as those are unsupported and contain known security vulnerabilities, so users shouldn't be using them), you should consider using nsIDOMWindowUtils.loadSheet instead. This will only load the sheet into the actual window, instead of applying it globally to all windows incl. websites.
if (window instanceof Ci.nsIInterfaceRequestor) {
    let winUtils = window.getInterface(Ci.nsIDOMWindowUtils);
    let uri = Services.io.newURI(..., null, null);
    winUtils.loadSheet(uri, Ci.nsIDOMWindowUtils.AUTHOR_SHEET);
    // Remove with winUtils.removeSheet() again on shutdown
}
Edit: You'll want to use AUTHOR_SHEET most of the time (be it with the style sheet service or window utils). This is closer to what xml-stylesheet does in overlays.
loadAndRegisterSheet fails if CSS contains #id. '#' must be percent-encoded, details see bug 659650.
The bug report was made on 2011-05-25. Is it still a bug or has it been resolved?
That bug report only applies to data: URIs. Also, that bug report is invalid: # has special meaning in URIs, and therefore you'll have to encode it when it is part of the URI directly (as is the case with data: URIs). If you're registering a regular chrome:/resource:/file:/http: URI, you don't need special encoding.
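To illustrate the point about data: URIs, here is a small sketch of percent-encoding a sheet before building the URI. The helper name cssToDataUri is hypothetical; encodeURIComponent is the standard JavaScript function.

```javascript
// Hypothetical helper: build a data: URI from CSS text.
function cssToDataUri(css) {
  // encodeURIComponent turns "#" into "%23", so the ID selector is not
  // mistaken for the start of a URI fragment.
  return "data:text/css;charset=utf-8," + encodeURIComponent(css);
}

var css = "#rpnethelper-separator2:last-child { display: none; }";
var uri = cssToDataUri(css);
```

The resulting string could then be passed to Services.io.newURI as in the question's code. A sheet registered through a chrome:, resource:, or file: URI needs no such step, because the CSS text is not part of the URI itself.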

How to use getSelection?

How do I use getSelection? It does not return the selected text:
function pageContextMenu(event) {
    var window = require("sdk/window/utils").getMostRecentBrowserWindow();
    var stringSelection = window.getSelection();
    console.log(stringSelection.toString());
}
window.document.getElementById("contentAreaContextMenu").addEventListener("popupshowing", pageContextMenu);

You're mixing up content script code and backend/add-on code. Your main.js (backend/add-on) file has access to the SDK modules; your content scripts have access to the DOM (web page/document). If you want to use the DOM API (as you're doing with getSelection and getElementById), you must do so from the content script side. See this part of the guide to understand the distinction conceptually. Read these two tutorials to implement it.
If you just want to access the selection from main.js and don't need any other DOM functions, then you'll have to do as @ZER0 suggested and use the sdk/selection module.

What kind of pattern is this?

I've learnt development by looking at other people's code, so I'm not very good with terminology. Lately I've been writing my JS/jQuery this way:
$(document).ready(function() {
    testingFunc.init();
});
var testingFunc = {
    $object: $('#object'),
    init: function() {
        var _that = this;
        console.log($object);
    }
}
Can someone please tell me if this is a pattern of some sort? Or can someone please tell me how to describe the code I've written above?

This particular style represented in your code is an "object literal" pattern. It differs only slightly from a "module" pattern when you find yourself not requiring specific properties or methods to be private.
Before getting into a trap of terminologies, you may want to understand (in principle) what Javascript patterns are, and then identify those which may be architecturally best-fit for your project.
You may get an in-depth understanding from this mini-book from Addy Osmani:
http://addyosmani.com/resources/essentialjsdesignpatterns/book/
And a high-level article from him:
http://addyosmani.com/largescalejavascript/
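To make the difference between the two patterns concrete, here is a minimal side-by-side sketch. The names counterLiteral and counterModule are made up for this illustration and do not come from the question.

```javascript
// Object literal: every property, including the state, is public.
var counterLiteral = {
  count: 0,
  increment: function () {
    this.count += 1;
    return this.count;
  }
};

// Module pattern: an immediately invoked function keeps `count` private in
// its closure and returns only the methods meant to be public.
var counterModule = (function () {
  var count = 0; // not reachable from outside the closure
  return {
    increment: function () {
      count += 1;
      return count;
    },
    current: function () {
      return count;
    }
  };
})();
```

With the literal, any code can read or overwrite counterLiteral.count directly. With the module, counterModule.count is undefined and the state can only change through increment.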

The first part uses jQuery's ready handler on the document. The callback passed to ready runs once the document's DOM has been fully parsed, which can be before images and other resources have finished loading.
The second part of your code uses object literal notation, which is JavaScript syntax for defining an object as a set of key: value pairs.

Perhaps you can name it the Object Literal pattern like used by Rebecca Murphey in her article. However I do not think that it's widely adopted as an official name for this kind of code structure, but it seems appropriate.

I guess you are wondering about the ready function. To understand how it works, you have to know that when you load an HTML page into your browser, the HTML structure is turned into a tree of JavaScript objects called the "DOM" (Document Object Model). In your sample, the DOM is referenced through the variable named document. To populate this tree, each element has to be initialized as a JavaScript object. Once this job is done, the "ready" event is raised, invoking every function bound to it. To summarize:
$(document).ready(function () { testingFunc.init(); });
// translation: once the DOM has been initialized, call "init".
Regarding your code, $('#object') attempts to query the DOM tree to find a node with an id set to "object" (e.g. <div id="object">). However, the object literal is evaluated as soon as the script runs, when the document is probably not yet fully initialized. As a result, this query might find nothing. To avoid this risk, you should instead do this:
var testingFunc = {
    $object: null,
    init: function() {
        // Query the DOM only once init runs, i.e. after the ready event
        this.$object = $('#object');
        console.log(this.$object);
    }
};
You can think of the DOM as a folder structure, where each folder and file is an HTML element. jQuery browses the DOM tree the same way that you browse your file explorer.

jQuery, What's Best, Have All the Binds in One Single File For an Entire Site or on a per Page Basis?

I'm in the middle of building a web app with heavy use of jQuery plugins and lots of bindings.
The backend was developed with a template system which only allows (as of now) to place all scripts in that one HTML file. We will use YUI compressor to merge all these into one.
Now, for bindings, how bad is it to have binds in an HTML file (which now is a template for the whole site) for elements that may not be present on a particular page?
Any advice is greatly appreciated

I've been using Paul Irish's markup-based solution pretty extensively on larger sites.

One of the biggest problems with doing this is performance: the selector will be evaluated and the DOM searched for each binding, even those not intended for the current page. At the very least, perhaps set up an object literal to run the appropriate ready binding code based on a page identifier, which could be window.location.href or a substring of it. Something like:
// avoid global pollution!
(function() {
    var pages = {
        pageX: {
            ready: function() { /* code to run on ready */ },
            teardown: function() { /* code to run on teardown */ }
        },
        pageY: {
            ready: function() { /* code to run on ready */ },
            teardown: function() { /* code to run on teardown */ }
        }
    };
    // set up ready event handler
    $(ready);
    // handler function to execute when ready event raised
    // Note: Access to pages through closure
    function ready() {
        // Keys must match whatever page identifier is used; here, the full URL
        var page = pages[window.location.href];
        // Pages without an entry have nothing to bind
        if (page) {
            page.ready();
        }
    }
})();

Be careful with your selectors if you've got some large pages. For example, if you've got some pages with big, but inert (no bindings) tables, but other pages where tables are small but have controls in them, you probably don't want to do this:
$('td.bindMe').bind('whatever', function() { ... });
(Set aside the live() issue here; sometimes you need to do element-by-element work and that's what I'm talking about.) The problem is that Sizzle will have to look through all the td elements on the page, potentially. Instead, you can put some sort of "marker" container around things like the "active" table with controls, and work it that way:
$('table#withControls').find('td.bindMe').bind(/* ... */);
That way Sizzle only needs to figure out that there's no table called "withControls", and then it's done.

The biggest problem with using all bindings on all pages is that you can end up with bindings you did not intend to have, which causes trouble.
And of course you will pay some performance cost at page load, but whether that is a problem depends on how many bindings you have and what the code looks like.

You might lose some performance on the client side (parsing the file, executing the document-ready handler), but it improves caching on the client (i.e. the file doesn't need to be transferred more than once). That saves server lookups as well. I think this is rather an advantage than a disadvantage as long as you can ensure you're not accidentally modifying objects.

I think the selector engine is fast enough that you, or anyone else, shouldn't notice a difference.
Obviously this is not a "best practice," but if you're binding to IDs and class names and you won't have any conflicts or unintended bindings, then I don't see the harm.
