I have a page on my site (let's say on domain A) and I would like to pull in some more content into it from another page, say, on domain B. As a default, this functionality is blocked by the browsers for security reasons.
As far as I've found, there are a few ways to do this.
CORS: As I understand, this method requires contributions from both the server and the client. The server needs to add a header to its response (i.e. Access-Control-Allow-Origin: [DOMAINS], as of http://enable-cors.org/server.html). On the other hand, the client needs to adjust their requests (e.g. http://www.html5rocks.com/en/tutorials/cors/).
If using jQuery, there is this small plug-in which uses the YahooAPI (i.e. http://james.padolsey.com/snippets/cross-domain-requests-with-jquery/). The advantage of this is that the client can use it on its own to get pages from other domains. The catch is that Yahoo limits the number of requests per hour per IP, and for commercial use Yahoo's permission is needed.
I've also read about JSONP but I haven't done much digging.
My question is: are there other possibly better options that I might be overlooking?
For the record, the site I'm working with is a huge commercial site with millions of users every day.
You can do JSONP, permit CORS and use plain JSON, use a DIY JSONP wrapper, or use a JSONP Proxy service. Here are the solutions in detail: JSONP with remote URL does not work
The easiest option in your situation is to roll your own JSONP proxy service. Here's a demo barebones PHP wrapper to get past CORS if you fetch a JSON string. No catch, no limits unlike Yahoo's YQL.
<?php
$callback = isset($_GET["callback"]) ? $_GET["callback"] : "?";
$json = file_get_contents('http://somedomain.com/someurl/results.json');
header('Access-Control-Allow-Origin: *');
header("Content-type: application/json");
echo $callback . "(" . $json . ");";
?>
Are you trying to get content, or code? If you're trying to get content, is it possible to just use an iframe?
If you want code, I think the options you outlined are pretty much what you have available. JSONP might be your best bet due to browser support. For example, IE only supported it as of version 10. If you're on a site with millions of users per day, my guess is there are some folks on older versions of IE (unfortunately).
Edit: Depending on the content, another option is to introduce your own local proxy. For example, I've done things where I need to call WebServiceX on some other provider. I call the WebServiceX in server side code and implement my own web service that my JavaScript accesses. This means I'm not going cross domain because the cross domain access happens server-side, not client-side. It also allowed me to introduce caching and other things (depending on the type of data) that improved performance.
Approach for cross domain data passing - create JavaScript object and assign source from another domain. Here is quick and dirty example:
File test.html:
<html>
<body>
Test done
</body>
<script>
var s = document.createElement("script");
s.type='text/javascript';
s.src='test.js';
document.body.appendChild(s);
</script>
</html>
and test.js
abc={a:'A',b:'B',c:'C'};
alert(abc.a);
test.js could be in any domain and function alert() could be any function.
I have more elegant ways to attach or run such approach but this one is sufficient enough to undersatnd the idea.
Related
As the title suggests, I'm looking for a hopefully straightforward way of scraping all of the HTML from a webpage. Storing it in a string perhaps, and then navigating through that string to pull out the desired element.
Specifically, I want to scrape my twitter page and display my profile picture inside a new div. I know there are several tools for doing just this, but I would anyone have some code examples or suggestions for how I might do this myself?
Thanks a lot
UPDATE
After a very helpful response from T.J. Crowder I did some more searching online and found this resource.
In theory, this is easy. You just do an ajax call to get the text of the page, then use jQuery to turn that into a disconnected DOM, and then use all the usual jQuery tools to find and extract what you need.
$.ajax({
url: "http://example.com/some/path",
success: function(html) {
var tree = $(html);
var imgsrc = tree.find("img.some-class").attr("src");
if (imgsrc) {
// ...add the image to your page
}
}
});
But (and it's a big one) it's not likely to work, because of the Same Origin Policy, which prevents cross-origin ajax calls. Certain individual sites may have an open CORS policy, but most won't, and of course supporting CORS on IE8 and IE9 requires an extra jQuery plug-in.
So to do this with sites that don't allow your origin via CORS, there must be a server involved. It can be your server and you can grab the text of the page you want using server-side code and then send it to your page via ajax (or just build the bits you want into your page when you first render it). All of the usual server-side stacks (PHP, Node, ASP.Net, JVM, ...) have the ability to grab web pages. Or, in some cases, you may be able to use YQL as a cross-domain proxy, using their server rather than your own.
Is there a way to parse html content using javascript?
I have a requirement to display only a div from some other site into my site. Is that possible? For example consider I want to show only div#leftcolumn of w3schools.com in my site. Is this even possible?
How can I do the same using javascript or jQuery?
Thanks.
You need to have a look at Same Origin Policy:
In computing, the same origin policy
is an important security concept for a
number of browser-side programming
languages, such as JavaScript. The
policy permits scripts running on
pages originating from the same site
to access each other's methods and
properties with no specific
restrictions, but prevents access to
most methods and properties across
pages on different sites.
For you to be able to get data, it has to be:
Same protocol and host
You need to implement JSONP to workaround it.
Though on same protocol and host, jQuery has load() function which you would use like this:
$('#foo').load('somepage.html div#leftcolumn', function(){
// loaded
});
Another possible solution (untested) would be to use server-side language and you don't need jsonp. Here is an example with PHP.
1) Create a php page named ajax.php and put following code in it:
<?php
$content = file_get_contents("http://w3schools.com");
echo $content ? $content : '0';
?>
2) On some page, put this code:
$('#yourDiv').load('ajax.php div#leftcolumn', function(data){
if (data !== '0') { /* loaded */ }
});
Make sure that:
you specify correct path to ajax.php file
you have allow_url_fopen turned on from php.ini.
your replace yourDiv with id of element you want to put the received content in
You will need to grab the HTML content with an HTTPRequest, then you can scrape the contents of the HTML you wish to show in your page. You would need to know some sort of server side language for this, unfortunately Ajax/jQuery will not work for this due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, or protocol.
what i can think of:
<div style="hidden" id="container"></div>
and then do sth like (shortcut # https://stackoverflow.com/a/11333936/57508)
var $container = $('#container');
$container.load('someurl-on-your-domain');
var $leftcolumn = $('div#leftcolumn', $container);
$leftcolumn.appendTo($sthother);
according to a comment: yes it is true, there's a same-origin policy (http://api.jquery.com/load/):
Due to browser security restrictions, most "Ajax" requests are subject
to the same origin policy; the request can not successfully retrieve
data from a different domain, subdomain, or protocol.
So why not create a proxy which is in your domain and then use the output of the proxy?! Hey, it's long-winded - true ... but it works :)
You would need to make a webservice to pull the code in. This is because you cannot pull the data in via JavaScript due to security restrictions. This is known as same origin policy and is linked elsewhere in this page.
You could use HtmlAgilityPack to parse it on the server side if you're working with asp.net technologies.
How to use HTML Agility pack
You would then be able to call the data from jQuery using .load():
http://api.jquery.com/load/
The idea being you load it into a hidden div such as:
$("#result").load("/webservice/pulldata.ashx");
and query it like you would any normal jquery element.
If you want to bypass XSS protection you can write your own server request and get info from it.
Example (php):
getContent.php
<? $fileContent = file_get_content("http://w3schools.com");
echo $fileContent; ?>
Then you can use whatever you want to modify this content (even before echo).
sample client script:
<div id="resultHtml"></div>
<script type="text/javascript">
$(document).ready(function(){
$("#resultHtml").load("getFilecontent.php");
});
Requesting data from any location on my domain with .load() (or any jQuery ajax functions) works just fine.
Trying to access a URL in a different domain doesn't work though. How do you do it? The other domain also happens to be mine.
I read about a trick you can do with PHP and making a proxy that gets the content, then you use jQuery's ajax functions, on that php location on your server, but that's still using jQuery ajax on your own server so that doesn't count.
Is there a good plugin?
EDIT: I found a very nice plugin for jQuery that allows you to request content from other pages using any of the jQuery function in just the same way you would a normal ajax request in your own domain.
The post: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The plugin: https://github.com/jamespadolsey/jQuery-Plugins/tree/master/cross-domain-ajax/
This is because of the cross-domain policy, which, in sort, means that using a client-side script (a.k.a. javascript...) you cannot request data from another domain. Lucky for us, this restriction does not exist in most server-side scripts.
So...
Javascript:
$("#google-html").load("google-html.php");
PHP in "google-html.php":
echo file_get_contents("http://www.google.com/");
would work.
Different domains = different servers as far as your browser is concerned. Either use JSONP to do the request or use PHP to proxy. You can use jQuery.ajax() to do a cross-domain JSONP request.
One really easy workaround is to use Yahoo's YQL service, which can retrieve content from any external site.
I've successfully done this on a few sites following this example which uses just JavaScript and YQL.
http://icant.co.uk/articles/crossdomain-ajax-with-jquery/using-yql.html
This example is a part of a blog post which outlines a few other solutions as well.
http://www.wait-till-i.com/2010/01/10/loading-external-content-with-ajax-using-jquery-and-yql/
I know of another solution which works.
It does not require that you alter JQuery. It does require that you can stand up an ASP page in your domain. I have used this method myself.
1) Create a proxy.asp page like the one on this page http://www.itbsllc.com/zip/proxyscripts.html
2) You can then do a JQuery load function and feed it proxy.asp?url=.......
there is an example on that link of how exactly to format it.
Anyway, you feed the foreign page URL and your desired mime type as get variables to your local proxy.asp page. The two mime types I have used are text/html and image/jpg.
Note, if your target page has images with relative source links those probably won't load.
I hope this helps.
I have a URL of a remote page from a different domain which I have to download, parse, and update DOM of the current page. I've found examples of doing this using new ActiveXObject("Msxml2.XMLHTTP"), but that's limited to IE, I guess, and using new java.net.URL, but I don't want to use Java. Are there any alternatives?
Same domain policy is going to get you.
1) Proxy through your server. browser->your server->their server->your server->browser.
2) Use flash or silverlight. The 3rd party has to give you access. The bridge between javascript and flash isn't great for large amounts of data and there are bugs. Silverlight isn't ubiquitous like flash...
3) use a tag. This really isn't safe... Only works if 3rd party content is valid javascript.
Whats about load an PHP Script via AJAX which does file_get_contents() ? This should work for different domain. If i understand correct.
Writing a server-side script that will retrieve the page's content for you is the way to go. You can use the XMLHttpRequest object to make an AJAX call to that script, which will just put through all html (?) for you.
Still, I advise against it. I don't know exactly how much you trust the other site, but the same origin policy exists for a reason. What is it exactly you are trying to do? Usually, there is a workaround.
I dont think you can do this according to the constraints of same origin policy. Two communicate between two domains using Iframes also we can use JS code but both domains need to have communicating code in them. The Child frame can contact the grandparent frame (window) but not here.
Since you are referring to some other url all togeather.
The only way is to do it using your server side code to access the content on the other domain.
Just use PHP:
<?php
$url = "http://www.domaintoretrieve.com";
ob_start();
include_once( $url );
$html = ob_get_contents();
ob_end_clean();
?>
$html contains the entire page to manipulate as needed.
The XMLHTTPRequest object is common to most modern browsers and is what powers AJAX web applications.
Previously, Google's Friend Connect required users to upload a couple of files to their websites to enable cross domain communication and Facebook Connect still requires you to upload a single file to enabled it.
Now, Friend Connect doesn't require any file upload... I was wondering how they were able to accomplish this.
Reference:
http://www.techcrunch.com/2009/10/02/easy-does-it-google-friend-connect-one-ups-facebook-connects-install-wizard/
There are multiple methods of communicating between documents on different domains, amongst these HTML5 postMessage, NIX, FIM(hash/fragment), frameElement and by using the window.name property.
These are available on different browsers and in different versions, but collectively they allow you to do reliable XDM (cross domain messaging).
One project that have done this earlier is Apache Shindig, which probably pioneered quite a few of these, and more recently, the project easyXDM has come, unifying all of these approaches with a common API, making it easy to create complex applications using XDM and RPC.
You can read in depth about the various methods of transporting the data in this article at Script Junkie.
Now, to answer your question directly, earlier on it was quite common to believe that there was only postMessage, the FIM (Fragment Identifier Messaging) available, and for the latter to work efficiently, one often had to upload a special file to your domain. As more methods have been discovered, this has by many been deprecated as a technique, and hence; no more need for the file.
Just for the record; I'm the author of both the Script Junkie article, and the easyXDM library (that is what Twitter, Disqus and quite a few more are using by the way).
<edit>It's difficult to remember/verify now, but I believe my answer here was probably incorrect. Sean Kinsey's answer above should be the definitive answer to this question. If you're reading this, please upvote his answer and ignore mine.</edit>
The Google Friend Connect widget works like most ads/gadgets do, using a copy/pasted snippet of HTML to reference a JavaScript include on the host's server which then creates an iframe containing the desired content. By opening the iframe with your site ID in the URL, Google's server is able to generate the appropriate HTML document to represent a Friend Connect gadget for your particular site/settings.
There isn't any cross-site communication happening beyond that initial step of creating an iframe with the appropriate URL target. Everything inside the gadget's dynamically generated iframe is more like the user visited a separate page on Google's server, but what would have been displayed is then embedded/isolated in a block on your page instead.
I'm not sure how it works in this particular instance but cross-domain messaging can be accomplished either by the postMessage() API or by changing the hash part of the URL and monitoring that.
The hash change method works because both the enclosing and the enclosed pages have access to the enclosed page's URL.
Of course, hopefully the postMessage() API call becomes more standard over time.
JSON allows cross-domain javascript.
Due to browser security restrictions,
most "Ajax" requests are subject to
the same origin policy; the request
can not successfully retrieve data
from a different domain, subdomain,
or protocol.
Script and JSONP
requests are not subject to the same
origin policy restrictions.
There is no other method than using the somewindow.postMessage(); for communication between cross-domain iframes.
Before somewindow.postMessage() you had to upload file in order to ensure that you can establish communication between iframes.
example:
<html>
<!-- this is main domain www.example.com -->
<head>
</head>
<body>
<iframe src="http://www.exampleotherdomain.com/">
<script>
function sendMsg(a) {
var f = document.createElement('iframe'),
k = document.getElementById('ifr');
f.setAttribute('src', 'http://www.example.com/xdreciver.html#myValueisSent');
k.appendChild(f);
k.removeChild(f);
}
</script>
<div id="ifr"></div>
</iframe>
</body>
</html>
now the http://www.example.com/xdreciver.html html content :
<html>
<!-- this is http://www.example.com/xdreciver.html -->
<head>
<script>
function getMsg() {
return window.location.hash;
}
</script>
</head>
<body onload="var msg = getMsg(); alert(msg);">
</body>
</html>
As for using the .postMessage(); its enough to use top.postMessage('my message to other domain document, which is also the main document', 'http://www.theotherdomain.com');