I am trying to design a web page (PHP) that extracts certain elements loaded by an external website and displays them in my own format on my website. I have seen many approaches to this, but they all seem to work only within the same-origin policy.
Another issue is that the first element loaded is a text input that needs to be submitted in order to load what I'm ultimately trying to retrieve, so how can I go about putting this in my page as well?
Example of layout:
Notice that Page 1 is basically a search. It requires input in order to retrieve the 2nd page.
Now a few of the methods that I've looked into:
The problem with this is that the website files being loaded need to be local. From my understanding, jQuery doesn't support cross domain due to security reasons. And also I'm unsure of how I would go about inputting the required info to load page 2.
This method seems promising but the problem here is that I need access to the external website files, and in my case that can't happen. Also unsure of how to implement a POST and GET in order to load page 2.


I'm looking to get some data from the Facebook page of a restaurant, but I'm kind of stuck. I want to load some divs from the restaurant's Facebook page, then get the IDs of the divs, since they contain some information I would like to use. I've tried using the .load function from jQuery, but no luck. All the answers I've seen use a URL that's something like somefile.html. Is it possible to load the divs' IDs and some innerHTML from a live page like Facebook? Are they somehow downloading the HTML to a file and then using that? Keep in mind I know nothing about PHP, so any solutions? Thanks!
The right way to do it would be through Facebook's Graph API. Take a look at this site and see if it offers the information you need: https://developers.facebook.com/docs/graph-api/reference/page/
As comments have pointed out, "web scraping" is FORBIDDEN on Facebook.com by Facebook policy. http://www.facebook.com/apps/site_scraping_tos_terms.php
Technically, I don't think this is possible with Facebook and just JavaScript.
In general, using just JavaScript, one solution would be to load the external site (like Facebook) into an iframe, and then search the DOM loaded into the iframe for the DIVs you want. However, that only works for same-origin pages, and I believe Facebook (and other sites) send an X-Frame-Options response header, which prevents the page from loading into an iframe at all. As far as I know, this cannot be worked around except by using another language to pull the HTML on the server (for example with PHP).
Facebook Forbids iFrames
I've recently stumbled upon a website called Overlay101 which allows you to create tours for other websites.
I was very interested to see the technique they use to load the third party websites for editing.
When you type the address of the website, it is loaded as a sub domain of the overlay101.com website.
For example, if I type https://stackoverflow.com/questions/111102/how-do-javascript-closures-work - it is loaded as http://stackoverflow.com.www.overlay101.com/questions/111102/how-do-javascript-closures-work
I was wondering how that subdomain creation is achieved, and I saw in the source code of the page that JavaScript is injected. I was wondering how that was possible too.
What intrigued me most is that Stackoverflow.com does not allow pages to be loaded within frames - I was wondering how they managed to load up the page so that tour popups could be added.
They simply use wildcard DNS entries to make all subdomains work. They then use the Host header to get the original domain name and download the HTML code of the site. Since they do this on the server side they do not need any frames etc.
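As a rough sketch of the Host-header step described above (not the service's actual code), a server behind a wildcard DNS entry could recover the original domain like this. The suffix constant, the function names, and the choice of https for the upstream request are all assumptions made for illustration.

```javascript
// Hypothetical proxy suffix; the real service's configuration is not known.
const PROXY_SUFFIX = '.www.overlay101.com';

// Recover the original site's host name from the Host header of a request
// that arrived on a wildcard subdomain. Returns null if the header does not
// end with the proxy suffix.
function originalHostFromHeader(hostHeader) {
  if (typeof hostHeader !== 'string') return null;
  // Drop an optional port and normalize case.
  const host = hostHeader.split(':')[0].toLowerCase();
  if (!host.endsWith(PROXY_SUFFIX)) return null;
  const original = host.slice(0, -PROXY_SUFFIX.length);
  return original.length > 0 ? original : null;
}

// Build the URL the server would fetch on the visitor's behalf
// (assuming the upstream site is reachable over https).
function upstreamUrl(hostHeader, pathAndQuery) {
  const original = originalHostFromHeader(hostHeader);
  return original ? 'https://' + original + pathAndQuery : null;
}

console.log(upstreamUrl('stackoverflow.com.www.overlay101.com', '/questions/111102/how-do-javascript-closures-work'));
```

The server would then download that URL, inject its own script tag into the returned HTML, and serve the result from its own subdomain, so no frame is involved.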

Is it possible to have a JavaScript file that is aware of two different HTML files? And how would I do this?
I would like to be able to have two pages, index.html and pictures.html. I have an index.js that changes the display properties of index.html (it puts data based on people into tables and makes it look nice). I would like this current index.js file also to be able to edit the pictures.html file and change information there. index.html would link to pictures.html to display pictures of a person (based on the person's name; I have them saved as smith1.jpg, smith2.jpg, reagan2.jpg, etc.). Is there any way that this JavaScript file could get DOM elements of the second file (pictures.html) by their id or class even though it "lives in" index.html? When I say "lives in", I mean it is included at the top of the index.html page.
A script can access elements on another page if that page was loaded through some kind of connection to the script's own page.
For example, if you make a popup using var popup = window.open(), the return value will contain a reference to the opened popup and this allows access to elements within the popup. E.g. popup.document.getElementById('something'). Pages loaded within frames, iframes and such have similar ways of access.
So yes, if your page loads the second page its script can work there as well. I suggest avoiding this beyond opening and closing popups from a script though; a script should stay inside the box of its page and if it needs to do larger operations on another page, that usually means that you need to change your code architecture a bit.
You'll need to explore server-side programming to accomplish your goal.
...Or you could write a client-side application in which "pages" are separate views of one actual page or are generated from backing data structures. If you want persistence of what is created/edited, you'll still need server-side programming.
You can use the postMessage API (part of the HTML5 group of technologies) as well. This allows you to send messages to another page, and in that page you define an event handler that knows how to handle the message.
This also works across domains.
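A minimal sketch of the receiving side, assuming a made-up trusted origin and a made-up message shape ({ name, count }) that matches the smith1.jpg naming from the question. The handler is written as a plain function so that the origin check is easy to see.

```javascript
// Hypothetical origin of the page that is allowed to send messages.
const TRUSTED_ORIGIN = 'https://www.example.com';

// Handler for "message" events. It ignores messages from unexpected origins
// and returns the picture file names for the requested person.
function handlePersonMessage(event) {
  if (event.origin !== TRUSTED_ORIGIN) return null;
  const data = event.data;
  if (!data || typeof data.name !== 'string' || !Number.isInteger(data.count)) return null;
  const files = [];
  for (let i = 1; i <= data.count; i++) {
    files.push(data.name.toLowerCase() + i + '.jpg');
  }
  return files;
}

// In a browser, the receiving page (e.g. pictures.html) would register it:
if (typeof window !== 'undefined') {
  window.addEventListener('message', (event) => {
    const files = handlePersonMessage(event);
    if (files) console.log('Pictures to show:', files);
  });
}
// The sending page would call something like:
//   otherWindow.postMessage({ name: 'smith', count: 2 }, TRUSTED_ORIGIN);
```

Here otherWindow stands for a reference obtained from window.open() or an iframe's contentWindow. Checking event.origin matters because any page that holds a reference to the window can post messages to it.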
Here is a blog with an example I just randomly found via google:
Not possible on the client side if editing the actual HTML file is your goal. If getting pictures to show up depending on stuff a user does on another page is all you care about then there are lots of options.
You can pass small sets of data like stuff the user entered into tables via cookies for accessing the right sets of image files in a pre-established scheme. This would actually persist until a user cleared out cookies.
You could wrap both pages in same-domain iframe elements with the parent element containing just the JS. This would allow you to persist data between pages and react to iframe load events but like everything in client-side JS, it's all gone when you reload the page.
Newer browsers have working file access objects that aren't total security nightmares. These are new and non-standard enough that it would take some doing to make it work for multiple browsers. This could be used to save files containing info that the user would probably have to be prompted to upload when they return to the site.
If the data's not sensitive you could get creative and use another service to stash collections of data. Use a Twitter API to tweet data to some publicly visible page of a Twitter account (check the Terms of Service if you're doing anything more than an isolated class project here). Then do an Ajax GET request on whatever URL it's publicly visible at and parse the HTML for your Twitter data.
Other stuff I'd look into: data URIs, HTML5 local storage.
Note: None of these are approaches I would seriously consider for a professional site where the data was expected to be persistent or in any way secure regardless of where a user accesses it from.

How can I stop loading a web page if it uses a frame-buster buster as mentioned in this question, or an even stronger X-Frame-Options: deny like stackoverflow.com? I am creating a web application that has the functionality of loading external web pages into an <iframe> via JavaScript, but if the user accidentally steps on to websites like google.com or stackoverflow.com, which have a function to bust a frame-buster, I just want to quit loading. On stackoverflow.com, it shows a pop-up message asking to disable the frame and proceed, but I would rather stop loading the page. On Google, it removes the frame without asking. I have absolutely no intent of clickjacking, and at the moment, I only use this application by myself. It is inconvenient that every time I step on to such sites, the frames are broken. I just do not need to continue loading these pages.
Seeing the answers so far, it seems that I can't detect this before loading. Then, is it possible to load the page in a different tab, and then see if it does not have the frame-buster buster, and then if it doesn't, then load that into the <iframe> within the original tab?
Edit 2
I can also retrieve the headers or the web page as an HTML string through the scripting language (Ruby) that I am using. So I think I do indeed have access to the information before loading it into an <iframe>.
There's no way to detect this before loading the page since the frame busting is done via a header or is triggered via JavaScript as the page is loading.
Without a server backend you won't be able to, as you are pretty limited in the amount of tinkering you can do in JavaScript due to cross-domain policies.
You might want to consider creating some sort of a blacklist for URLs to stay away from...
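If the response headers are available on the server before the page goes into the <iframe>, as the question's second edit suggests, a check along these lines could be used. This is only a sketch: the function name and the plain-object shape of headers are assumptions, and it cannot detect frame busting that is done purely in the page's JavaScript.

```javascript
// Decide from response headers whether a page is likely to refuse framing.
// `headers` is a plain object of header names to values (an assumed shape).
function blocksFraming(headers) {
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value);
  }
  // X-Frame-Options: any recognized value restricts framing by other sites.
  const xfo = (normalized['x-frame-options'] || '').trim().toLowerCase();
  if (xfo === 'deny' || xfo === 'sameorigin' || xfo.startsWith('allow-from')) return true;
  // Content-Security-Policy: treat any frame-ancestors list as blocking
  // unless it explicitly allows every origin with *.
  const csp = (normalized['content-security-policy'] || '').toLowerCase();
  const match = csp.match(/frame-ancestors([^;]*)/);
  if (match) {
    const sources = match[1].trim().split(/\s+/).filter(Boolean);
    return !sources.includes('*');
  }
  return false;
}

console.log(blocksFraming({ 'X-Frame-Options': 'DENY' })); // true
console.log(blocksFraming({ 'Content-Type': 'text/html' })); // false
```

The check is deliberately conservative: a frame-ancestors list that would in fact allow your own origin is still reported as blocking, which suits the "just stop loading" goal in the question.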

I need to allow content from our site to be embedded in other users' web sites.
The content will be chargeable so I need to keep it secure, but one of the requirements is that the subscribing web site only needs to drop some JavaScript into their page.
It looks like the only way to secure our content is to check the url of the page hosting our javascript matches the subscribing site. Is there any other way to do this given that we don't know the client browsers who will be hitting the subscribing sites?
Is the best way to do this to supply a javascript include file that populates a known page element when the page loads? I'm thinking of using jquery so the include file would first call in jquery (checking if it's already loaded and using some sort of namespace protection), then on page load populate the given element.
I'd like to include a stylesheet as well if possible to style the element but I'm not sure if I can load this along with the javascript.
Does this sound like a reasonable approach? Is there anything else I should consider?
Thanks in advance,
It looks like the only way to secure our content is to check the url of the page hosting our javascript matches the subscribing site.
Ah, but in client-side or server-side code?
They both have their disadvantages. Doing it with server-side code is unreliable because some browsers won't be passing a Referer header at all, and if you want to stop caches keeping a copy of the script, which would prevent the Referer check from taking place, you have to serve it with no-cache or Vary: Referer headers, which would harm performance.
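For illustration, a server-side Referer check might look roughly like the following. The function name is made up, and accepting subdomains of the licensed domain is an assumption rather than something the question asked for.

```javascript
// Return true only if the Referer header names the licensed domain or one
// of its subdomains. A missing or malformed header yields false, which is
// the unreliability described above: some browsers send no Referer at all.
function refererMatches(refererHeader, licensedDomain) {
  if (!refererHeader) return false;
  let host;
  try {
    host = new URL(refererHeader).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  const domain = licensedDomain.toLowerCase();
  return host === domain || host.endsWith('.' + domain);
}

console.log(refererMatches('http://www.customersite.foo/page', 'customersite.foo')); // true
console.log(refererMatches(undefined, 'customersite.foo')); // false
```

Comparing whole host labels means a look-alike host such as evilcustomersite.foo is rejected, which a plain suffix comparison would not do.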
On the other hand, with client-side checks in the script you return, you can't be sure your environment you're running in hasn't been sabotaged. For example if your inclusion script tag was like:
<script src="http://include.example.com/includescript?myid=123"></script>
and your server-side script looked up 123 as being the ID for a customer using the domain customersite.foo, it might respond with the script:
if (location.host.slice(-16) === 'customersite.foo') {
    // main body of script
} else {
    alert('Sorry, this site is not licensed to include content from example.com');
}
Which seems simple enough, except that the including site might have replaced String.prototype.slice with a function that always returned customersite.foo. Or various other functions used in the body of the script might be suspect.
Including a <script> from another security context cuts both ways: the including-site has to trust the source-site not to do anything bad in their security context like steal end-user passwords or replace the page with a big goatse; but equally, the source-site's code is only a guest in the including-site's potentially-maliciously-customised security context. So a measure of trust must exist between the two parties wherever one site includes script from another; the domain-checking will never be a 100% foolproof security mechanism.
I'd like to include a stylesheet as well if possible to style the element but I'm not sure if I can load this along with the javascript.
You can certainly add stylesheet elements to the document's head element, but you would need some strong namespacing to ensure it didn't interfere with other page styles. You might prefer to use inline styles for simplicity and to avoid specificity-interference from the page's main style sheet.
It depends really whether you want your generated content to be part of the host page (in which case you might prefer to let the including site deal with what styles they wanted for it themselves), or whether you want it to stand alone, unaffected by context (in which case you would probably be better off putting your content in an <iframe> with its own styles).
I'm thinking of using jquery so the include file would first call in jquery
I would try to avoid pulling jQuery into the host page. Even with noConflict there are ways it can conflict with other scripts that are not expecting it to be present, especially complex scripts like other frameworks. Running two frameworks on the same page is a recipe for weird errors.
(If you took the <iframe> route, on the other hand, you get your own scripting context to play with, so it wouldn't be a problem there.)
You can store the user's domain and a key within your local database. That, or the key can be an encrypted version of the domain to keep you from having to do a database lookup. Either one of these can determine whether you should respond to the request or not.
If the request is valid, you can send your data back out to the user. This data can indeed load in jQuery and an additional CSS reference.
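One way to get a key that can be checked without a database lookup is sketched below for Node.js. Note that it swaps in a keyed hash (HMAC) of the domain rather than literal encryption; the secret value and the function names are placeholders.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Hypothetical server-side secret; in practice it would come from configuration.
const SECRET = 'replace-with-a-long-random-secret';

// Derive the key handed to a subscriber from their domain.
function keyForDomain(domain) {
  return createHmac('sha256', SECRET).update(domain.toLowerCase()).digest('hex');
}

// Check that the key presented with a request belongs to the claimed domain.
function isValidKey(domain, presentedKey) {
  const expected = Buffer.from(keyForDomain(domain), 'utf8');
  const actual = Buffer.from(String(presentedKey), 'utf8');
  // timingSafeEqual requires equal lengths, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const key = keyForDomain('customersite.foo');
console.log(isValidKey('customersite.foo', key)); // true
console.log(isValidKey('othersite.foo', key));    // false
```

The key only proves that it was issued for that domain. Whether the request really comes from that domain still depends on the Referer or client-side checks discussed in the other answer.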
