Safely parse/work with HTML from XMLHttpRequest - javascript

I'm writing code (right now it's a Chrome extension, though I may make it cross-browser, or try to get the site owners to include the enhancement) that works with the contents of a particular <div> on a webpage that the user is viewing, and any pages that are part of the same discussion thread. So I find links to the other pages of the thread, and get them with XMLHttpRequest. What I want to do is just be able to use .getElementsByClassName('foo') on the resulting page.
I know I can do that by loading the results of the request into a div (as in Optimal way to extract a URL from web page loaded via XMLHTTPRequest?). However, while figuring out the best way to do this, I read that there are security concerns (MDN - Safely Parsing Simple HTML to DOM).
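For concreteness, a minimal sketch of that detached-div approach, under the assumption that nextPageUrl is one of the thread links found earlier and 'foo' is the class of interest (both placeholders):
var xhr = new XMLHttpRequest();
xhr.open('GET', nextPageUrl, true); // nextPageUrl: a hypothetical link to another page of the thread
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        var container = document.createElement('div'); // detached; never inserted into the page
        container.innerHTML = xhr.responseText;        // the step the MDN article raises concerns about
        var items = container.getElementsByClassName('foo');
        // ...work with items here
    }
};
xhr.send();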
In this case, I'm not sure that matters much, since the extension would just load a page from the same comment thread that the user was already looking at, but I'd still like to do this the right way.
So what's the right way to work with HTML from an XMLHttpRequest?
P.S. If the best answer is jQuery, then tell me that, but I've yet to start using jQuery, and would also like to know the fundamentals here.
Edit: I don't know why I phrased things the way I did, but let me be clearer: I'm really hoping for a non-jQuery answer. I've been trying to learn the basics of JavaScript before learning jQuery, and I'd prefer not to import a whole framework to call one function when I don't understand what I'm doing. That may seem irrational, but it's what I'm doing for the moment.

Since you say you're not opposed to using jQuery, you should look at the load function. It loads html from the address you specify, then places it into the matched elements. So for example
$("#formDiv").load("../AjaxContent/AdvSearchForm.aspx?ItemType=" + ItemType);
Would load the HTML from ../AjaxContent/AdvSearchForm.aspx and then place it in the div with the id formDiv.
Optional parameters also exist for passing data to the server with the request, and for a callback function.
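For illustration, a hedged sketch of those optional parameters, reusing the URL and element id from the example above (the data object and the error check are just one plausible usage, not anything prescribed by the answer):
$("#formDiv").load(
    "../AjaxContent/AdvSearchForm.aspx",   // address to load the HTML from
    { ItemType: ItemType },                // data sent with the request (passing an object makes jQuery use POST)
    function (responseText, textStatus) {  // callback invoked when the request completes
        if (textStatus === "error") {
            console.log("Could not load the search form.");
        }
    }
);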

How do I render an HTML file in JavaScript?

OK, I am using JavaScript server-side, including Node.js. Because of performance issues, we have decided to move one page to being rendered server-side rather than client-side, so the server returns a stream of HTML, fully rendered, back to the client.
I have seen this question and the related answers, but wondered if this was the best or right approach. In particular, what is the most appropriate way to render a page, and run all of the JavaScript on it, within a JS or Node.js call?
Ideas that I have looked at:
Call the JavaScript code directly on the page, and invert everything to make it generate the HTML items needed. As this is urgent, I would rather avoid rewriting any more than I have to.
Render a document with a simple iframe to generate the HTML. But how do I point to the page in the iframe when I am server-side? Surely this is just adding another level of abstraction to the same problem.
Use the ideas detailed above, though I am wondering whether this is the right route, given some of the problems I have seen encountered with it.
EDIT: Just to clarify - I want to, in effect, load the HTML page in a browser, let it finish rendering, and then capture the entire generated HTML for passing through to the client (saving the time to render on the client).
This is a simple example that does server-side templating (no Express): https://github.com/FissionCat/handlebars-node-server-example
This is an example that serves HTML, JS, CSS and an MP3 (but doesn't use Express or any templating): https://github.com/FissionCat/Hue-Disco
There's some pretty useful documentation here: http://www.hongkiat.com/blog/node-js-server-side-javascript/
Like you said, avoiding lots of rewriting is a bonus. The information provided in that article might be of some help.
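For a rough feel of the templating approach in the first link, here is a minimal sketch (no Express); the template string and data are invented for illustration, and it assumes the handlebars package is installed:
var http = require('http');
var Handlebars = require('handlebars'); // assumes: npm install handlebars

// Compile the template once at startup.
var template = Handlebars.compile(
    '<html><body><h1>{{title}}</h1><ul>{{#each items}}<li>{{this}}</li>{{/each}}</ul></body></html>'
);

http.createServer(function (req, res) {
    // Render fully on the server and return plain HTML to the client.
    var html = template({ title: 'Results', items: ['one', 'two', 'three'] });
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(html);
}).listen(3000);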

Appending base tag to head with JavaScript

Can you append a base tag to the head of a document from a div in the body of the document using JavaScript? By that, I mean, what are some drawbacks of doing that? My concern is that I'll run into a sort of race condition because the base tag is understood to exist in the head so it won't get respected if the page has already been rendered. I haven't yet experienced this problem, but I was wondering whether it should be a concern.
To be clear, I know how to do this via JavaScript. My question is whether the tag will be respected/honored if it's appended to the DOM after the page loads/renders...
My code is an HTML fragment that is likely to appear in the body, but I need to set the base tag because my assets are referenced relatively. Let's assume that I can't change that (because I can't. At least, not right away). You can also assume that setting the base won't break anything that's not my HTML fragment and that there are no other base tags...ever.
Yes, for example:
<script>
var base = document.createElement('base');
base.href = 'http://www.w3.org/';
document.getElementsByTagName('head')[0].appendChild(base);
</script>
I don’t see why you would want to do this, but it’s possible.
I might be wrong (or partially wrong, depending on how each browser chose to implement this), but AFAIK the document's URL base is parsed only once. By the time you append that base element to the DOM, it is already too late.
EDIT: Looks like I was wrong.
Apparently, there is a way. But there are also downsides where search engines are concerned.
Jukka, to answer your question of why you would want to do it that way, here's an example.
A mobile application, such as a PhoneGap app, that is a thin wrapper around a web app but is smart enough to know whether it's running in a browser or on the device.
Once it knows that it's on a device, it needs to know the base URL so it can properly locate everything that was previously referenced with relative URLs.
In our case, we have four different systems (dev, test, beta and live), each with different URLs.
Usually changes are incremental, but a lot of the time we do want to switch back and forth between systems, for instance for A/B testing.
Since the routing layouts are basically identical, switching the base URL back and forth makes a lot of sense.
Remember, many web apps use a static asset such as an HTML page for the application skeleton, JavaScript for the glue logic, and a web-based backend that is really nothing more than a thin layer over a database; MEAN apps, for example, are frequently built this way.
Building your apps this way provides a phenomenal speed-up in scalability and responsiveness, since the "web" server doesn't have to slow down long enough to construct the page view, as happens with template languages.
Anyway, setting the base URL means being able to change where the app sources its data on the fly, which can be an incredible speed-up for developer productivity thanks to code reuse.
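As a hedged sketch of that pattern (the environment check and the URLs are hypothetical placeholders, not anything from the answer above):
var baseUrls = {
    dev:  'https://dev.example.com/',
    test: 'https://test.example.com/',
    beta: 'https://beta.example.com/',
    live: 'https://www.example.com/'
};

// Assumption: a PhoneGap/Cordova build exposes window.cordova, so its presence
// tells us we're on the device rather than in a plain browser.
var environment = window.cordova ? 'live' : 'dev';

var head = document.getElementsByTagName('head')[0];
var base = document.createElement('base');
base.href = baseUrls[environment];
// Insert as the first child of <head> so later relative URLs resolve against it.
head.insertBefore(base, head.firstChild);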
Search engines?
There was a time when search engines' crawling bots did not "understand" or run any JavaScript. In that case, such bots would get all the links wrong and the crawling would stop right there.
So, basically, it might prevent some crawlers from crawling and indexing your links.

How do you remove the script tag added by cross-domain ajax?

I am using the JSONP/dynamic-script-tag technique to perform cross-domain AJAX (There's no XML, but you know what I mean).
Initially, I wrote my own solution, but I could not come up with an elegant way to remove the script after it executed. My strategy was just to pass an ID and have the callback remove the associated script, but I realized this would prevent caching, which I do not want.
That went something like:
1) Dynamically insert: <script src="http://example.com/handler.php?callback=x&scriptid=y"></script>.
2) The script loads and runs x(); removeScript(y);, where removeScript takes the appropriate script element back out of the head element (sketched below).
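A minimal sketch of that insert-then-remove pattern, using the placeholder handler URL, callback name and id from the steps above:
function addJsonpScript(id, callbackName) {
    var script = document.createElement('script');
    script.id = id;
    script.src = 'http://example.com/handler.php?callback=' + callbackName + '&scriptid=' + id;
    document.getElementsByTagName('head')[0].appendChild(script);
}

function removeScript(id) {
    var script = document.getElementById(id);
    if (script) {
        script.parentNode.removeChild(script); // take the script element back out of the head
    }
}

// The server's response then looks something like: x({ ...data... }); removeScript('y');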
It worked great but ruined caching. So I was excited to learn that jQuery provides a JSONP method, and I quickly implemented it, figuring they had this all figured out. Instead, jQuery leaves the script element there.
Is there a clean way to remove these elements?
Thanks to your question I discovered this project on Google Code: jquery-jsonp. This plugin provides some functionality that jQuery does not.
I have never used it, but it seems to be good and to handle a great variety of things, including caching.
You can find some examples of its usage here: http://code.google.com/p/jquery-jsonp/wiki/TipsAndTricks
Why would you want to remove them? Only web geeks look at the source code of a page, and chances are you won't get paid by web geeks to make a website. They're harmless, so you don't need to go to all the trouble of removing the tags.

How can I enhance the functionality of a page whose source I shouldn't modify?

A friend of mine uses a web-app for work related purposes.
The app is built using PHP/MySQL, and while it has some JavaScript to make it easier to work with, it's not user-friendly enough, and with a bit of extra JS a lot of stuff could be automated.
I would like to enhance that app, but I'd prefer not to have to modify the original server-side code. To do this, all I could think of was Greasemonkey. Is this the only way to do it, or am I missing something? I'd also like to be able to use a modern JS framework, like jQuery.
EDIT: I should tell you what improvements I want to make. There are a lot of fields on the page, so autocompletion would really help. This will be used for data entry, so AJAX may be used for some error checking as well.
Greasemonkey is certainly an option. Another idea is to code up your improvements, and then make bookmarklets out of them. Your friend can use the bookmarks (probably in a bookmark bar) to do the things you've improved. Bookmarklets have access to the page as though they were a part of the page.
Edit 1: In fact, now that I think about it, a bookmarklet should be able to load a script file (from a different origin) into a document by adding a script tag to the head section (well, or anywhere, really), since the SOP is based on where the document came from, not the script. That way, he'd just have to press the button once (for any given page he goes to) to load your improvements.
Edit 2: Yup, a bookmarklet can be used to bootstrap any script file into the page; here's an example:
javascript:(function(){var%20d=document,db=d.body||d.documentElement,elm;elm=d.createElement('script');elm.src="http://example.com/yourscript.js";db.appendChild(elm);db.removeChild(elm);})();
That adds a script element for the file http://example.com/yourscript.js to the body of the current document, which executes it. (The bookmarklet then removes the script element; just adding it is enough, it doesn't have to stick around; details.) Your script can then do things like add other scripts (jQuery, in your example) in the same sort of way, fire up auto-completers, etc. Tested the above (which probably needs tuning) with Chrome and Firefox; IE isn't liking it but I think that's an issue with my encoding of the bookmarklet or something rather than a fundamental problem. (I'm relatively new to bookmarklets.)
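Unpacked for readability, the bookmarklet above amounts to roughly this (the script URL is the same example placeholder):
(function () {
    var d = document,
        db = d.body || d.documentElement,
        elm = d.createElement('script');
    elm.src = 'http://example.com/yourscript.js';
    db.appendChild(elm);  // appending is enough to fetch and run the script
    db.removeChild(elm);  // the element doesn't need to stick around afterwards
})();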
I think JavaScript can manipulate across frames, can't it?
Can't you just make a page that loads the original site in one frame and your JS interface improvements in another?
(Getting the employer's permission is also a good idea, if that hasn't been addressed.)

XML, XSLT and JavaScript

I'm having some trouble figuring out how to make the "page load" architecture of a website.
The basic idea is that I would use XSLT to present it, but instead of doing it the classic way with the XSL tags, I would do it with JavaScript. Each link would therefore refer to a JavaScript function that changes the content and menus of the page.
The reason I want to do it this way is to have the option of letting JavaScript dynamically show each page using the data provided in the first, initial XML file, instead of making a "complete" server request for the specific page, which simply has too many downsides.
The basic problem is that, after searching the web for a way to access the "underlying" XML of the document with JavaScript, I only find solutions for accessing external XML files.
I could of course just "print" all the XML data into a JavaScript array fully declared in the document header, but I believe this would be a very, very nasty solution. And ugly, for that matter.
My questions therefore are:
1) Is it even possible to do what I'm thinking of?
2) Would it be SEO-friendly to have all the website pages' content loaded initially in the XML file?
My alternative would be to dynamically load the specific page's content using AJAX on demand. However, I find it difficult to see how that could be made SEO-friendly at all; I can't imagine that a search engine would execute any JavaScript.
I'm very sorry if this is unclear, but it's really freaking me out.
Thanks in advance.
Is it even possible to do what I'm thinking of?
Sure.
Would it be SEO-friendly to have all the website pages' content loaded initially in the XML file?
No, it would be total insanity.
I can't imagine that a search engine would execute any JavaScript.
Well, quite. It's also pretty bad for accessibility: non-JS browsers, or browsers with a slight difference in JS implementation (e.g. new reserved words) that causes your script to throw an error and, boom, no page. And unless you provide proper navigation through hash links, usability will be terrible too.
All-JavaScript in-page content creation can be useful for raw web applications (infamously, GMail), but for a content-driven site it would be largely disastrous. You'd essentially have to build up the same pages from the client side for JS browsers and the server side for all other agents, at which point you've lost the advantage of doing it all on the client.
Probably better to do it like SO: primarily HTML-based, but with client-side progressive enhancement to do useful tasks like checking the server for updates and printing the "this question has new answers" announcement.
Maybe the following scenario works for you:
1) A browser requests your XML file.
2) Once loaded, the XSLT associated with the XML file is executed. Result: your initial HTML is output, together with a script tag.
3) In the JavaScript, an AJAX call to the current location is made to get the "underlying" XML DOM (see the sketch below). From then on, your JavaScript manages all the XML processing.
4) You make sure that in step 3 the XML is not loaded from the server again but is taken from the browser cache.
That's it.
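A rough sketch of step 3, assuming the script emitted by the XSLT simply re-requests the current URL; whether step 4's cache hit actually happens depends on the caching headers the server sends:
var xhr = new XMLHttpRequest();
xhr.open('GET', window.location.href, true); // the current location is the XML document itself
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        var xmlDom = xhr.responseXML; // the "underlying" XML DOM
        // ...drive all further content and menu changes from xmlDom here
    }
};
xhr.send();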
