Embedding Javascript in Javascript (I'm serious)?

I'm working on a project where players can place graphical objects on a website and animate them with scripts. As the scripts are going to be shared with all participating clients, the scripting environment must be sandboxed, so that users can't destroy other users' experience on the rest of the page.
It is crucial that the scripts can access shared visual content. Therefore I can't isolate them in iframes entirely - besides that I'm wondering if there's a smoother approach to separate contexts.
I have been dabbling with a native version, where I used separate contexts using the V8 javascript engine, but now I want to bring this to the browser - even if it's just Google Chrome only.
Got any ideas?

Sandboxing JavaScript is inherently difficult; chances are that the script will manage to break out no matter how hard you try. A better course of action might be to load the scripts into an iframe that has no direct access to the main frame and let it communicate with the main frame via window.postMessage(). You could then define an API that the frame is allowed to use this way without being given too much power.
Edit: The same thing is possible with Web Workers, as noted in "Is It Possible to Sandbox JavaScript Running In the Browser?". However, browser support for web workers isn't quite as widespread as for window.postMessage() (compare http://caniuse.com/#search=postMessage and http://caniuse.com/#search=workers).
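A minimal sketch of the host-side half of such an API. The message shape { method, args } and the method name moveObject are assumptions made for illustration; they are not part of the original answer. The point is only that the host runs nothing except what is on an explicit allowlist.

```javascript
// Host-side dispatcher: maps incoming messages onto an explicit allowlist.
// The message shape { method, args } and all method names are hypothetical.
function createDispatcher(allowedMethods) {
  return function dispatch(message) {
    if (message === null || typeof message !== "object") {
      return { ok: false, error: "malformed message" };
    }
    const { method, args } = message;
    // Only own properties count, so inherited names like "constructor" are rejected.
    if (typeof method !== "string" ||
        !Object.prototype.hasOwnProperty.call(allowedMethods, method)) {
      return { ok: false, error: "method not allowed" };
    }
    if (!Array.isArray(args)) {
      return { ok: false, error: "args must be an array" };
    }
    try {
      return { ok: true, result: allowedMethods[method](...args) };
    } catch (err) {
      return { ok: false, error: String((err && err.message) || err) };
    }
  };
}

// Example API surface: the framed script may move objects and nothing else.
const positions = new Map();
const dispatch = createDispatcher({
  moveObject(id, x, y) {
    if (typeof id !== "string" || !Number.isFinite(x) || !Number.isFinite(y)) {
      throw new TypeError("bad arguments");
    }
    positions.set(id, { x, y });
    return true;
  },
});

// Browser wiring (not executed here; `frame` is a hypothetical iframe element
// served from its own, non-opaque origin):
// window.addEventListener("message", (event) => {
//   if (event.source !== frame.contentWindow) return;
//   event.source.postMessage(dispatch(event.data), event.origin);
// });
```

Every argument is validated on the host side because the framed script is untrusted; the frame never receives a reference to host objects, only the plain result of each call.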

Related

Use common JS libs inside sandboxed iframes

I plan to build a module system for my webapp that uses sandboxed iframes and the postMessage API to securely run custom user modules. The iframe blocks all DOM access and should only communicate through an interface provided by me which checks some permissions and provides data.
The system itself is very simple and works fine with vanilla js code inside the modules, however I want to allow developers to use common frameworks/libs to ease development, i.e. by using Vue for data binding.
What is the best way to provide such functionality to the modules? Performance is a huge factor since several dozens of such modules might run at the same time. Is it secure to let sandboxed modules share libs?
Good advice: Unfortunately, iframe sandboxing goes both ways. In general (with a few exceptions: mainly postMessage and pages that satisfy the same-origin policy), an iframe is effectively a separate webpage and cannot be accessed from the host page, and vice-versa. It's probably a better alternative to just request that the individual developers use lightweight libraries.
Bad advice: If you hosted the other devs' files yourself, they could access each other, but having stuff accessible between iframes in this way is certainly not ideal. Doing it this way is a really bad idea: it exposes you to all sorts of scripting-related attacks, not to mention that the separate iframes would probably interfere with one another in unexpected ways if you shared Javascript variables between them. Don't do it this way unless you explicitly trust every single developer here to behave properly and code well (i.e. you're in the same workplace). Actually, just don't do it this way at all.
If you really really want to do this, though: an iframe whose target is hosted on the same website can access its parent page through the global variable parent (i.e. parent in an iframe is the same as window in the host, parent.$ would be the parent's jQuery object, and parent.document.getElementById is the same as document.getElementById). A parent page can access its same-origin iframes with document.getElementById("the id of the iframe").contentWindow (and .contentWindow.document, etc. will work here too), but again, if you hosted the code of potentially-malicious developers on your page to get around the same-origin policy, you'd be giving these developers access to your page and any information, including passwords, that your users type on it.

How to download and query html pages where JS processing is necessary?

I often compile informal datasets by running some kind of XPath/XQuery on publicly available web pages. Usually the structure of the HTML is regular enough that useful information can be extracted easily.
But today I've come across tunefind.com. This website makes extensive use of the REACTJS framework, and so most of the structure of the page is configured client-side by Javascript. The pages, when initially downloaded, are very basic and missing a lot of information. The pages are populated by a script that uses a hopelessly messy blob of JSON data at the bottom of the page.
The only way I can think of to deal with this would be to use some kind of GUI-based web engine and just not display the GUI part. But that is a preposterous amount of work for these casual little CLI tools that I use to gather information.
Is there any way to perform the javascript preprocessing without dealing with unnecessary graphics?
Even if you were to process the page without the graphics, the React javascript will be geared towards running in a browser context. At the very least it will expect a functioning DOM to exist, and the application itself may also require clicks or transitions to happen before you can see some data.
Your best bet, then, is to load the page in a browser. To keep this simple, there are plenty of good browser automation frameworks designed for this.
I've used a fair few libraries over the years, including phantomJS, and recently I've gotten the most mileage out of nightmarejs.
It runs an electron browser for you and gives you a useful promisified javascript API to control it with, covering common browser functions such as clicking, following links, etc.
You can configure it to hide the browser, which is useful for making a CLI tool. However, it's a bit of a pseudo-headless mode and will still require a windowing/graphical context (e.g. X Window).
Hope this helps.
PS - If you're at all used to docker it's not hard to make this just a running container!

Is Javascript the only choice for DOM interaction when embedding a web browser?

I've looked at the various ways to embed a web browser into an application (like IE or Safari via OS-specific means, or Firefox/Mozilla via XULRunner, or Chrome via the Chromium Embedded Framework) and I've managed to integrate CEF with my app up to a point where I'm convinced that it'll all work as expected. Now, it seems to me that whenever I want to modify the DOM (e.g. to add or remove elements), I'll have to do this via Javascript, i.e. my application calls out to Javascript where the actual work is done.
I wonder why this is so. My (naive?) belief is that if for example I call appendChild in Javascript, the actual "work" of appending a child will eventually be performed by a C/C++ function as the browser itself is written in C/C++ and not in Javascript. So, I'm wondering why in an embedded web browser I can't call this C/C++ function directly instead of going through Javascript. I understand that for general scripting you don't want other languages than Javascript for security reasons, but if the browser is embedded into an application I can control anyway this shouldn't be the reason, should it?
What am I missing?
CEF is implemented as a layer between Chromium's content API and your application. When using CEF, Chromium is a library inside CEF, and you only have access to CEF's public API, which is more or less restricted to whatever the Chromium content API exposes (keep in mind no browser was created as an embeddable plugin and then evolved into an application; it was always the other way around). The content API was the way Google engineers had to formalize some forms of introspection, but it isn't complete, simply because the browser isn't completely modular by itself. There's work in progress in the Chromium code to separate specific "do-it-all" components into more general ones that you may pick at will.
Therefore you can't simply hook into chromium's implementation details when using CEF: you'd need to patch it to implement something it doesn't expose by itself. CEF implements a class for DOM traversal (see here), but you can only pick at DOM, not change it.
That said, on the C++ side you can do some arbitrary stuff such as inspecting/mangling http requests (which allows you to inject javascript into pages, for instance), and running arbitrary javascript code straight from C++, which can, in turn, asynchronously call back to C++ code by diverse paths (ajax -> http handling in C++, or V8 extensions, which you can code directly in C++).
See https://bitbucket.org/chromiumembedded/cef/wiki/JavaScriptIntegration for more details.
One could customize CEF or go straight to the Chromium source code, but that thing is huge. Other solutions I have heard of are more or less alike in terms of API limitations, e.g. Awesomium, Mozilla's Gecko, etc.

Third Party Polymer Elements

I'm trying to understand if polymer is built for a specific use-case-- third party web components.
What I need to accomplish is create a web component that takes as input from the caller's page an image url (attributes on an element is ok) and inside the polymer component it renders the image in a special way using HTML5 canvas.
To me, it seems like polymer isn't currently built for third-party usage. Reasons why:
one must have enough control over the caller's page to add platform.js to the <head>, specifically the <head>
my version of platform.js could potentially be different than the caller page's platform.js (or bare minimum i'm polluting the page with polymer's JS objs, right?)
in non-chrome browsers style and other tags are injected into <head>, possibly conflicting with the source page
one must have control over the caller's <body> tag if wanting to set options to avoid FOUC
Traditionally all my web components have been built via iframes and i'd like to modernize my approach with a view towards a "shadow-dom future."
Is there a way to use polymer in a third-party safe way? Perhaps a mashup with lightningjs?
Polymer and Web Components are entirely structured around 3rd party usage, this is a central design pillar.
The broadest notion IMO is that developers will be able to go to the web and find numerous Web Components to choose from. This is not unlike being able to choose from an enormous set of JQuery plugins, but with a much greater degree of interoperability and composition because each instance can be treated as a traditional Element.
platform.js
Platform.js is modeling future browser capabilities called Web Components. There are practical realities of making this work right now, so yes, in order for a third party to use Web Components at all, they will need to opt in to platform.js (and all that entails). It's true that this fact makes it difficult (today) to inject Web Components into somebody's page without their assent.
my version of platform.js could potentially be different than the caller page's
As above, platform.js is required upfront to use Web Components. This is why it's named the way it is. Unless the main page owner includes this capability, he's not providing a platform to which you can supply Web Components.
This is not dissimilar to modern libraries, e.g. JQuery. You can load numerous copies and/or versions of JQuery in one document if you aren't careful, but it's wasteful. Coordination is preferred.
With the exception of platform.js, Web Components is geared around N modules using M dependencies, and that all working together optimally. This is another way sharing is a pillar of the design.
in non-chrome browsers style and other tags are injected into <head>, possibly conflicting with the source page
This is all the price of polyfilling. If you need purity of environment, you will have to wait until Web Components are widely implemented natively. As a practical matter, the style tags are very specialized and are unlikely to conflict with anything.
one must have control over the caller's <body> tag if wanting to set options to avoid FOUC
This is not strictly true; you can build Web Components that control their own FOUC up to a point. But it's a lot of extra work, and as a third party you really can't know what kind of loading mechanisms or idioms some developer is going to employ, so trying to orchestrate too much without his cooperation is going to be difficult.
Traditionally all my web components have been built via iframes
IFRAME is quite a bit different from Web Components. An IFRAME is a fresh context, and so you have a lot more safety net, but it's heavyweight and there are coordination costs.
Although platform.js, by its very nature, is changing the shared platform, Custom Elements themselves need not mess with the user's global namespace or his CSS (although they can). Code can be restricted to the element's prototype, and CSS and DOM can be stashed inside ShadowDOM. The overall intent is that none of that need leak out of the Element, unless somebody wants it to.

Security in embedded javascript and HTML

I'm trying to find a solution for the following situation:
I have a web application made of HTML, javascript, AJAX, and so on.
I want users to contribute to my application/website by creating plugins that will be embedded in it.
These plugins will be created using similar technologies (ajax, HTML, etc.), so I need to allow plugins to run their own javascript code.
Each plugin will work in a page that contains some user information and the plugin (like the old FBML Facebook applications).
The problem is that this way the plugin can also make calls to get the user's information (since the plugin's code is embedded, its domain will be the same as the main website's, and the code will be entirely on my website).
So the question is: how can I avoid this and have precise control over what information a plugin can get about the user?
The plugin will not be checked and can be changed anytime, so reading all the plugin code is not a solution.
I'm open to any proposal, ideally easy and effective, and if possible without putting the whole plugin in an iframe.
--
EDIT:
How did Facebook handle this back when there was the old way to create applications? (Now it's iframe only, but there used to be FBML applications; how did they make those secure?)
Have you ever heard of exploits that allow arbitrary code execution, one of the most dangerous kinds of attack?
Well, in this case you are explicitly and willingly allowing arbitrary code execution, and there's almost no way for you to sandbox it.
1) You can run the "plugin" within an iframe served from a different subdomain to sandbox it there, as you've mentioned. This way the plugin can't reach your cookies and scripts.
Note that if you want the plugins to communicate with your services from this domain, that will be cross-domain communication. So you either need to resort to JSONP or use the newer cross-domain access control specification (i.e. return appropriate headers with your web service response, such as Access-Control-Allow-Origin naming the plugin origin, e.g. https://plugins.domain.com; the value must be a full origin including the scheme).
2) Create your own simple scripting language and expose as much as you want. This is obviously tedious, and even if you manage to do it, plugin developers will face a learning curve.
Facebook had their own "JavaScript", called FBJS, which did the sandboxing by controlling what could run.
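For the access control headers mentioned under option 1, here is a rough sketch of how a service might decide what to send. The origin https://plugins.domain.com and the function name corsHeadersFor are assumptions for illustration only; how the headers get attached depends on your server framework.

```javascript
// Decide which CORS response headers to send for a given request Origin.
// The allowed origin below is an assumed example value, not a real deployment.
const ALLOWED_ORIGINS = new Set(["https://plugins.domain.com"]);

function corsHeadersFor(requestOrigin) {
  if (!ALLOWED_ORIGINS.has(requestOrigin)) {
    // No CORS headers: the browser will refuse to expose the response
    // to a cross-origin caller.
    return {};
  }
  return {
    // Echo back the exact origin (scheme + host), never a bare host name.
    "Access-Control-Allow-Origin": requestOrigin,
    // Tell caches that the response differs depending on the Origin header.
    "Vary": "Origin",
  };
}
```

Keep in mind these headers only control which pages may read the response in a browser; the service still has to check for itself which user data a given plugin is entitled to.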
Without a juicy backend, this really limits the impact of your script.
However, you still have to worry about DOM-based XSS and clickjacking.
It's 6 years later, but I feel it's important to provide a modern solution to this. The new(er) sandbox attribute can be used to limit the capabilities of an IFrame.
A simple implementation of this system would allow only the allow-scripts permission to the IFrame, perhaps with a simple JS file which would be included along with each plugin containing a few custom library functions.
In order to communicate with your HTML page, you would use postMessage. On the plugin end, a library like I mentioned above could be used to transfer commands. On the user side, another system would have to validate and decode these requests then execute them.
Since a sandboxed IFrame without allow-same-origin is treated as having a unique origin, it cannot directly modify the page. However, this also means the origin of a postMessage coming from it can't be verified (it is reported as "null"), so some sort of shared code would have to be created for security reasons.
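One way to read "some sort of code" is a per-plugin secret token that every message must carry. The sketch below assumes a hypothetical message shape { token, command } and a token the host has already handed to the plugin (for example when it builds the frame); none of these names come from the original answer.

```javascript
// Host-side gate for messages from an iframe with sandbox="allow-scripts".
// Such a frame has an opaque origin, so event.origin is the string "null".
// The token scheme and the message shape { token, command } are assumptions.
function createMessageGate(expectedToken, onCommand) {
  return function handleMessage(event) {
    if (event.origin !== "null") return false;        // not from a sandboxed frame
    const data = event.data;
    if (data === null || typeof data !== "object") return false;
    if (data.token !== expectedToken) return false;    // wrong or missing token
    if (typeof data.command !== "string") return false;
    onCommand(data.command);
    return true;
  };
}

// Example: collect accepted commands for one plugin.
const received = [];
const gate = createMessageGate("token-abc123", (command) => received.push(command));

// Browser wiring (not executed here):
// window.addEventListener("message", gate);
```

In a real page the token should be generated randomly per plugin instance rather than hard-coded, and comparing event.source against the iframe's contentWindow is a complementary check worth adding.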
