Is it possible to read and parse the DOM elements of third-party websites such as cnn.com, so that I can get the div, a and p tags and read their position and size information?
jQuery can parse and show information about the web page on which our JavaScript code is running, but if we pass an external web page to the .load method, can we parse that third-party page and read its DOM tree?
Thank you
No, you will be blocked by the Same Origin Policy, which restricts one site from accessing another on a different domain. You could set up a server-side script, in your preferred language, which would fetch the website on behalf of your JavaScript code, but this is more complex than just using AJAX to request the page.
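Strictly speaking, the policy compares full origins (scheme, host and port), not just domains, so even http://example.com and https://example.com count as different sites. As a rough illustration of that comparison, the sketch below uses the standard URL class; `originParts` and `originMismatch` are hypothetical helper names, not a browser API.

```javascript
// Default ports, because URL#port is "" when a URL uses its scheme's default.
const DEFAULT_PORTS = { "http:": "80", "https:": "443" };

// Splits a URL into the three components that make up its origin.
function originParts(href) {
  const u = new URL(href);
  return {
    scheme: u.protocol,
    host: u.hostname, // the URL parser already lower-cases the host
    port: u.port || DEFAULT_PORTS[u.protocol] || "",
  };
}

// Returns the origin components that differ; an empty array means "same origin".
// The path, query and fragment never matter for this comparison.
function originMismatch(a, b) {
  const x = originParts(a);
  const y = originParts(b);
  return ["scheme", "host", "port"].filter((key) => x[key] !== y[key]);
}
```

For example, `originMismatch("https://example.com/", "https://www.cnn.com/")` reports `["host"]`, which is exactly the case the browser blocks when your page tries to read cnn.com.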
As far as I know, JavaScript cannot access content from other domains, for security reasons. At least, secure browsers don't allow it...
What data can external JavaScript fetch from a website if it is placed in the website's <head> tag? For example, cookies or local storage? How can it harm a website's security?
Pretty much everything.
The origin of a script is determined by the page its <script> element appears on. The external script can access the same things as a script you wrote yourself (including using Ajax to make HTTP requests to the same origin).
How can it harm a website's security?
You've given them all the keys and turned off the alarms.
I want to get the XPath of an element on a website (my own domain), which I obtained using JavaScript code as described in this answer.
Now I want to click a button that opens a (cross-domain) URL in a new window, and when the user clicks an element in that window, capture that element's XPath.
I tried doing the same with an iframe, with no luck.
So my question is: is there a way to get the XPath of an element on another website (cross-domain)?
Sorry this is not possible without cooperation from the other (x-domain) site. Browsers are designed not to allow access to the DOM of x-domain documents (iframe included) for security reasons.
If you had cooperation from the other site, they could load your JavaScript file and then use postMessage to pass the XPath to the original page.
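A minimal sketch of the receiving side on your page, assuming a hypothetical message shape `{ type: "xpath", value: "..." }` and placeholder origins (`https://other-site.example`, `https://your-site.example`). Checking `event.origin` matters because any window can post messages to yours.

```javascript
// Returns a "message" event handler that accepts only XPath strings sent by
// `expectedOrigin` and hands them to `onXPath`. Everything else is ignored.
function createXPathReceiver(expectedOrigin, onXPath) {
  return function handleMessage(event) {
    if (event.origin !== expectedOrigin) return; // not the cooperating site
    const data = event.data;
    if (!data || data.type !== "xpath" || typeof data.value !== "string") return;
    onXPath(data.value);
  };
}

// Your page (browser):
//   window.addEventListener("message",
//     createXPathReceiver("https://other-site.example", (xp) => console.log(xp)));
//
// Cooperating page (opened with window.open or embedded in an iframe), once it has `xpath`:
//   (window.opener || window.parent).postMessage(
//     { type: "xpath", value: xpath }, "https://your-site.example");
```

Passing an explicit target origin as the second argument to postMessage, rather than `"*"`, keeps the XPath from being delivered to a page that has navigated elsewhere.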
Other options would be to create a bookmarklet that users could run on the other page, or a browser extension (Chrome and Firefox are pretty easy to develop for)... it depends on your use case.
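A bookmarklet or extension content script runs inside the other page, so it can walk that page's DOM directly. Below is a minimal sketch of an XPath builder such a script might use, assuming an HTML document; `getXPath` is a hypothetical helper, not the code from the answer linked in the question.

```javascript
// Builds an absolute XPath such as /html[1]/body[1]/div[2]/p[1] for an element.
// Uses only nodeType, nodeName, parentNode and previousSibling, which every DOM node has.
function getXPath(el) {
  const steps = [];
  // Walk up through element nodes (nodeType 1); stop at the document node.
  for (let node = el; node && node.nodeType === 1; node = node.parentNode) {
    // 1-based position among preceding siblings with the same tag name.
    let index = 1;
    for (let sib = node.previousSibling; sib; sib = sib.previousSibling) {
      if (sib.nodeType === 1 && sib.nodeName === node.nodeName) index++;
    }
    steps.unshift(node.nodeName.toLowerCase() + "[" + index + "]");
  }
  return "/" + steps.join("/");
}

// Possible bookmarklet body (minify it and prefix with "javascript:" before saving):
//   document.addEventListener("click", (e) => {
//     e.preventDefault();
//     alert(getXPath(e.target));
//   }, { once: true, capture: true });
```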
From your comments, I've gathered that you want to capture information from another website that doesn't send Access-Control-Allow-Origin headers that include your domain (i.e. the other site does not have CORS enabled for you). This is not possible to do cross-domain and client-side due to the Same-Origin Policy implemented in all modern browsers. The Same-Origin Policy prevents scripts on your site from reading resources from any other site (unless the other site explicitly shares them with your site using the Access-Control-Allow-Origin HTTP header).
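As a simplified model of the check the browser makes for a plain, non-credentialed request (preflighted and credentialed requests have additional rules), the response is readable by your script only when Access-Control-Allow-Origin is `*` or exactly your page's origin. `corsAllowsRead` is a hypothetical name used for illustration.

```javascript
// Simplified CORS read check for a non-credentialed request.
// allowOriginHeader: value of the response's Access-Control-Allow-Origin header, or null if absent.
// pageOrigin: origin of the page making the request, e.g. "https://your-site.example".
function corsAllowsRead(allowOriginHeader, pageOrigin) {
  if (allowOriginHeader == null) return false; // no header: the Same-Origin Policy applies
  const value = allowOriginHeader.trim();
  return value === "*" || value === pageOrigin;
}
```

In a real browser you never run this check yourself: when it fails, a `fetch()` to the other site rejects with a `TypeError` and the page never sees the response body.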
If you want to get information about another site from your site, there is no way around using server-side code. A simple solution would be to implement a server-side proxy that re-serves off-site pages from your own origin, so the Same-Origin Policy will not be violated.
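A minimal sketch of such a proxy, assuming Node.js 18+ (for the global `fetch`) and a hypothetical route `GET /proxy?url=<absolute URL>`. The host allowlist is an addition: a proxy that relays any URL would be open to abuse.

```javascript
const http = require("http");

// Returns the target URL if the request is for /proxy and the host is allowed, else null.
function resolveTarget(requestUrl, allowedHosts) {
  const incoming = new URL(requestUrl, "http://localhost"); // base is only needed to parse a path
  if (incoming.pathname !== "/proxy") return null;
  const raw = incoming.searchParams.get("url");
  if (!raw) return null;
  let target;
  try {
    target = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }
  if (target.protocol !== "http:" && target.protocol !== "https:") return null;
  return allowedHosts.includes(target.hostname) ? target : null;
}

// Creates (but does not start) a server that re-serves allowed pages from its own origin.
function createProxyServer(allowedHosts) {
  return http.createServer(async (req, res) => {
    const target = resolveTarget(req.url, allowedHosts);
    if (!target) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Target not allowed");
      return;
    }
    try {
      const upstream = await fetch(target);
      const body = await upstream.text();
      const type = upstream.headers.get("content-type") || "text/plain";
      // The browser now sees this content as coming from our origin.
      res.writeHead(upstream.status, { "Content-Type": type });
      res.end(body);
    } catch {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream fetch failed");
    }
  });
}
```

Usage: start it with `createProxyServer(["www.cnn.com"]).listen(8080);` and, from a page served by that same origin, call `$('#result').load('/proxy?url=' + encodeURIComponent('https://www.cnn.com/'));`. Relative links inside the fetched HTML will resolve against your site, so images and scripts may need rewriting.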
You can fetch the data with jQuery's .load() method, which inserts it into an element on your page.
From there, the DOM nodes from the external page are accessible for your processing.
$('#where-you-want').load('//example.com body', function() {
    console.log($('#where-you-want'));
    // Process the DOM nodes under `#where-you-want` here, e.g. with XPath.
});
You can see this in action here: http://jsfiddle.net/xsvkdugo/
P.S.: this assumes you are working with a CORS-enabled site.
I want to load a whole site into a div. When I use
$(document).ready(function() {
    $('#result').load('http://www.yahoo.com');
});
It's not working.
This is a cross-domain request, which JavaScript is not allowed to make here. You can use an iframe to load the page instead.
Check this link for possible solutions.
This is a cross-domain issue.
You can create a proxy on your server to fetch the data, and then load it from your own domain.
What do you mean by "a whole site"? If you mean a given page, it will probably require all manner of files included from its header (scripts, stylesheets), which are not suitable to go into the body of your page.
You would need to use an iframe: just create an iframe element and set its src to the URL you want.
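A minimal sketch of the iframe approach; `embedPage` is a hypothetical helper, and `doc` is passed in (normally the browser's `document`) only so the function has no hidden globals. The page is displayed, but its DOM stays off-limits to your scripts, and sites that send `X-Frame-Options` or a CSP `frame-ancestors` directive will refuse to be framed at all.

```javascript
// Embeds `url` in a new iframe inside `container` and returns the iframe.
function embedPage(doc, container, url) {
  const frame = doc.createElement("iframe");
  frame.src = url;        // the browser loads the page in its own origin
  frame.width = "100%";
  frame.height = "600";
  container.appendChild(frame);
  return frame;
}

// In the browser:
//   embedPage(document, document.getElementById("result"), "http://www.yahoo.com");
```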
Although I'm not sure about your use case for loading a "whole" site into a div, you are limited by the same-origin security policy. To make cross-domain AJAX calls you can employ a JSONP call (http://api.jquery.com/jQuery.getJSON/), but only if the remote server supports JSONP.
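JSONP works because <script> tags may load code from any origin: the page defines a global callback, and a cooperating server replies with JavaScript that calls it. It therefore cannot fetch an arbitrary HTML page such as yahoo.com. A minimal sketch; the `callback` parameter name and the helper names are assumptions, since each JSONP API picks its own.

```javascript
// Adds the callback name to the request URL.
function buildJsonpUrl(baseUrl, callbackName) {
  const u = new URL(baseUrl);
  u.searchParams.set("callback", callbackName);
  return u.toString();
}

// What a cooperating server would send back: a script that calls the callback.
function jsonpResponseBody(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// In the browser:
//   window.handleData = (data) => console.log(data);
//   const s = document.createElement("script");
//   s.src = buildJsonpUrl("https://api.example.com/items", "handleData");
//   document.head.appendChild(s);
```

With jQuery, `$.getJSON(url + "?callback=?", fn)` does the same thing: a URL containing `callback=?` makes jQuery treat the request as JSONP and generate the callback name for you.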
You can't do that unless the content you're loading comes from the same domain as the site you're loading it into, due to JavaScript's Same Origin Policy.
Your alternatives:
load the content into an iframe
pull the content server-side via an HTTP GET, and then write it out to your page
Beware of licensing issues with the second option if you don't have permission to use the content, though!
I have an environment that doesn't really allow server-side scripting (it is extremely difficult to get a script "installed" on the server). I tried using an iframe to get around JavaScript's same-origin policy; however, that didn't work. Are there any other workarounds I am not aware of?
Thanks!
As David Dorward mentioned, JSON-P is the simplest and fastest; however, there is another trick, specifically using two iframes.
To get around this issue without using JSONP, you can do the following. This technique assumes that you have some sort of development access to the parent page.
There are three pages on two domains/sites.
Parent page
Content page
Cross-domain communication page (aka "xdcomm")
The parent and xdcomm pages are hosted on the same domain; the content page is hosted on any other domain. The content page is embedded as an iframe in the parent page, and the xdcomm page is embedded as a hidden iframe in the content page.
The xdcomm page contains a very simple script that detects GET parameters in the query string, parses that string for method and args variables (where args is a JSON encoded string), and then executes the specified method with the specified arguments in the parent page. An example can be seen here (view source).
Even though JavaScript's Same Origin Policy restricts code on one domain from accessing that of another, it doesn't matter if domains are nested within each other (domain A, nested within domain B, nested within domain A): the innermost domain-A page can still reach the outermost domain-A page, because the two share an origin.
So, in a nutshell, the content page sends messages to the parent page via the xdcomm page by changing the source of the iframe to something like http://domaina.com/xdcomm.html?src=foo&args=[1,2,3,4]. This would be equivalent to executing foo(1,2,3,4) in the parent page.
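A minimal sketch of the xdcomm page's job, using `src` as the method parameter as in the example URL above. The method allowlist is an addition not in the original description; without it, any site that frames your xdcomm page could call arbitrary parent functions. Helper names are hypothetical.

```javascript
// Parses "?src=foo&args=[1,2,3,4]" into { method: "foo", args: [1, 2, 3, 4] }, or null.
function parseXdcommQuery(search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const method = params.get("src");
  if (!method) return null;
  let args = [];
  const rawArgs = params.get("args");
  if (rawArgs) {
    try {
      args = JSON.parse(rawArgs);
    } catch {
      return null; // args must be JSON
    }
  }
  return Array.isArray(args) ? { method, args } : null;
}

// Calls target[method](...args) only for allowlisted methods; returns true if called.
function dispatchXdcomm(search, target, allowedMethods) {
  const msg = parseXdcommQuery(search);
  if (!msg || !allowedMethods.includes(msg.method)) return false;
  const fn = target[msg.method];
  if (typeof fn !== "function") return false;
  fn.apply(target, msg.args);
  return true;
}

// In xdcomm.html (domain A, inside content page B, inside parent page A):
//   dispatchXdcomm(location.search, window.parent.parent, ["foo"]);
```

On the content page, build the URL with `"http://domaina.com/xdcomm.html?src=foo&args=" + encodeURIComponent(JSON.stringify([1, 2, 3, 4]))` so that arguments containing spaces or `&` survive the query string.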
Also, know that there are already libraries that help you with this, such as easyXDM. What I've explained here is the basis of one of the techniques that they use, and while it might not be as fancy, it is certainly a fully functioning and lightweight implementation.
Hopefully not, as it would be a security hole! :)
But if both your sites are subdomains on the same domain, maybe document.domain can help.