I am trying to understand the same-origin policy in browsers (I am also a JavaScript newbie) and ran into the JSONP page on Wikipedia. The How It Works section says:
Now, consider that it is possible to specify any URL, including a URL that returns JSON, as the src attribute for a <script> element. This means it is possible to retrieve JSON via a script element in an HTML page.
However, a JSON document is not a JavaScript program. If it is to be evaluated by the browser in a <script> element, the return value from the src URL must be executable JavaScript. In the JSONP usage pattern, the URL returns the dynamically-generated JSON, with a function call wrapped around it. This is the "padding" (or sometimes, "prefix") of JSONP.
My questions are -
So is XMLHttpRequest() supposed to return only JavaScript or HTML? Can it not return a pure JSON document?
I thought the same-origin policy does not apply to an XMLHttpRequest() call. Why is there a need to inject a <script> tag into the DOM to make a call to a third-party server? Is that how all the advertising add-ons to sites call home to collect data?
At the end of it I did not understand JSONP at all. Can someone explain or refer me to a better explanation please?
Thanks,
- P
So is XMLHttpRequest() supposed to return only JavaScript or HTML?
It can return any text you like (and maybe binary data, but I've never seen that tried so I won't swear to it).
Can it not return a pure json document?
It can.
I thought the same origin policy does not apply to XMLHttpRequest() call.
The same-origin policy most definitely does apply to XHR.
Why is there a need to inject a <script> tag into the DOM to make a call to a third-party server?
The same origin policy is bypassed by loading a script (with embedded data) from another origin.
This is because you aren't reading a remote resource using JavaScript. You are executing some remote JavaScript which comes with embedded data.
At the end of it I did not understand JSONP at all. Can someone explain or refer me to a better explanation please?
JSON-P is just loading some JavaScript from another origin. That JavaScript consists of a single function call (to a function you define before adding the <script> element) with a single argument (a JS object or array literal).
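To make that concrete, here is a minimal sketch of what such a response looks like and what happens when it runs. The names handleData and padJson and the sample data are made up for illustration, and the indirect eval only stands in for the browser executing a script it fetched via a <script> element.

```javascript
// The function you define before adding the <script> element.
let received = null;
globalThis.handleData = function (data) {
  received = data;
};

// Hypothetical server-side helper: wraps JSON in a call to the callback.
function padJson(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// This text is what the other origin would send back as the script body.
const responseText = padJson("handleData", { user: "P", items: [1, 2, 3] });

// A browser runs the text because it arrived through <script src="...">.
// Indirect eval simulates that here by running it in the global scope.
(0, eval)(responseText);
```

After this runs, received holds the object the "server" embedded in the script. No cross-origin response was ever read by your code; the data arrived as an argument.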
As the title suggests, I'm looking for a hopefully straightforward way of scraping all of the HTML from a webpage. Storing it in a string perhaps, and then navigating through that string to pull out the desired element.
Specifically, I want to scrape my Twitter page and display my profile picture inside a new div. I know there are several tools for doing just this, but would anyone have some code examples or suggestions for how I might do this myself?
Thanks a lot
UPDATE
After a very helpful response from T.J. Crowder I did some more searching online and found this resource.
In theory, this is easy. You just do an ajax call to get the text of the page, then use jQuery to turn that into a disconnected DOM, and then use all the usual jQuery tools to find and extract what you need.
$.ajax({
    url: "http://example.com/some/path",
    success: function(html) {
        var tree = $(html);
        var imgsrc = tree.find("img.some-class").attr("src");
        if (imgsrc) {
            // ...add the image to your page
        }
    }
});
But (and it's a big one) it's not likely to work, because of the Same Origin Policy, which prevents cross-origin ajax calls. Certain individual sites may have an open CORS policy, but most won't, and of course supporting CORS on IE8 and IE9 requires an extra jQuery plug-in.
So to do this with sites that don't allow your origin via CORS, there must be a server involved. It can be your server and you can grab the text of the page you want using server-side code and then send it to your page via ajax (or just build the bits you want into your page when you first render it). All of the usual server-side stacks (PHP, Node, ASP.Net, JVM, ...) have the ability to grab web pages. Or, in some cases, you may be able to use YQL as a cross-domain proxy, using their server rather than your own.
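As a rough sketch of the "your own server grabs the page" option, assuming a Node server (version 18 or later, where fetch is built in): the function name proxyPage and the ALLOWED_HOSTS list are hypothetical, and the fetch implementation is passed in so the logic can be tried without a network.

```javascript
// Hypothetical allow-list: only proxy hosts you actually intend to scrape.
const ALLOWED_HOSTS = new Set(["example.com"]);

// Fetches the text of targetUrl server-side and returns a simple
// { status, body } result that your own endpoint can send to the page.
async function proxyPage(targetUrl, fetchImpl = globalThis.fetch) {
  let url;
  try {
    url = new URL(targetUrl);
  } catch (err) {
    return { status: 400, body: "Malformed URL" };
  }
  const isHttp = url.protocol === "http:" || url.protocol === "https:";
  if (!isHttp || !ALLOWED_HOSTS.has(url.hostname)) {
    return { status: 400, body: "Target not allowed" };
  }
  const upstream = await fetchImpl(url.href);
  if (!upstream.ok) {
    return { status: 502, body: "Upstream returned " + upstream.status };
  }
  return { status: 200, body: await upstream.text() };
}
```

Your page would then make a same-origin ajax call to whatever route wraps proxyPage, and the $(html) parsing shown earlier applies unchanged to the returned text.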
In this John Resig article, he is dealing with a dictionary-sized list of words in JavaScript, and he's loading the content via Ajax from a CDN.
The words are loaded in with newlines separating the words. Then he says cross domain fails:
There's a problem, though: We can't load our dictionary from a CDN!
Since the CDN is located on another server (or on another sub-domain,
as is the case here) we're at the mercy of the browser's cross-origin
policy prohibiting those types of requests. All is not lost though -
with a simple tweak to the dictionary file we can load it across
domains.
First, we replace all endlines in the dictionary file with a space.
Second, we wrap the entire line with a JSONP statement. Thus the final
result looks something like this:
dictLoaded('aah aahed aahing aahs aal... zyzzyvas zzz');
This allows us to do an Ajax request for the file and have it work as
would expected it to - while still benefiting from all the caching and
compression provided by the browser.
So, if I'm reading this correctly, simply adding his method dictLoaded('original content') around the original content alone causes the ajax request to not fail.
Is that (turning it into a function + param) really all it takes? and why does JSONP solve the problem of cross domain access restriction?
A <script> tag can load a JS file from anywhere (even cross-domain). Better still, the code inside that script is also executed, which is what makes it a way of bypassing cross-domain restrictions.
The problem is that when the code gets executed, it's executed in the global scope. So having this code:
var test = 'foo'
will create a test variable in the global scope.
To mitigate this, you enclose the reply in a function call. This is the "P" in "JSONP", which stands for "padding".
So if your foreign script has:
myFunction({
    test: 'foo'
});
It calls myFunction and passes an object with test key which has value foo. The receiving function would look like:
function myFunction(data) {
    // "data.test" is "foo"
}
Now we have successfully bypassed the cross-domain restriction. The essential parts needed are:
the receiving function (which can be dynamically created and discarded after use)
the "padded" JSON reply
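Those two parts can be sketched as a small client-side helper. This is only an illustration under assumptions: the helper name jsonp, the "__jsonp_cb_" prefix, and the "callback" query parameter are made up here (each JSONP server documents its own parameter name), and the document and global object are passed in as parameters so the logic can be exercised outside a browser.

```javascript
let jsonpCounter = 0;

// Injects a <script> element pointing at url and routes the padded reply
// to onData through a temporary, uniquely named global function.
function jsonp(url, onData, doc = globalThis.document, root = globalThis) {
  const name = "__jsonp_cb_" + jsonpCounter++;
  const script = doc.createElement("script");

  // The receiving function: created on demand, discarded after one use.
  root[name] = function (data) {
    delete root[name];
    script.remove();
    onData(data);
  };

  // Tell the server which function name to wrap the JSON in.
  const separator = url.includes("?") ? "&" : "?";
  script.src = url + separator + "callback=" + encodeURIComponent(name);
  doc.head.appendChild(script);
  return name;
}
```

In a page you would call something like jsonp("http://example.com/data", function (data) { ... }), and the server is expected to answer with the callback name wrapped around the JSON.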
Is that (turning it into a function + param) really all it takes?
Yes.
and why does that solve the problem of cross domain access restriction?
You should read about JSONP. The idea is that you can now dynamically include a <script> tag pointing to the resource instead of sending an AJAX request (which is prohibited). And since you have wrapped the contents in a call to a named function, this function will be executed with the JSON object passed as its argument. So all that's left for you is to define this function.
It is because of the JSONP statement that he added.
"JSON with padding" is a complement to the base JSON data format. It provides a method to request data from a server in a different domain, something prohibited by typical web browsers because of the Same origin policy.
This works via script element injection.
JSONP makes sense only when used with a script element. For each new JSONP request, the browser must add a new <script> element, or reuse an existing one. The former option - adding a new script element - is done via dynamic DOM manipulation, and is known as script element injection. The <script> element is injected into the HTML DOM, with the URL of the desired JSONP endpoint set as the "src" attribute. This dynamic script element injection is usually done by a JavaScript helper library. jQuery and other frameworks have JSONP helper functions; there are also standalone options.
Source: http://en.wikipedia.org/wiki/JSONP
I am trying to make a link where when it is clicked, it goes to the site it is supposed to, but it also runs a cgi script. I have found different examples, but I still don't fully understand it.
In essence, I have two questions:
Where can I host the script so I can access it?
How do I access it?
Where can I host the script so I can access it?
If you want to access it from JavaScript then it has to be on the same origin (i.e. scheme, hostname and port) as the page the JavaScript is running in.
How do I access it?
You can either forget JavaScript, have a regular link and then have the CGI perform a 302 redirect, or you can use Ajax.
Beware of timing issues. It is possible for the browser to go to the next URL before it gets around to making the Ajax request. A redirect would probably be a better approach.
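The redirect approach can be sketched as follows. This assumes a Node-style request handler rather than a classic CGI program, and the handler name, the "to" parameter, and the TARGETS table are all hypothetical. The link would point at this handler, which does its bookkeeping and then sends the browser on with a 302.

```javascript
// Hypothetical table of link keys; redirecting to arbitrary user-supplied
// URLs would create an open redirect, so only known targets are accepted.
const TARGETS = new Map([["next", "/the-next-document.html"]]);

// Records the click, then redirects to the real destination.
function handleTrackedLink(req, res, record = console.log) {
  // The base URL is only needed so that the relative req.url can be parsed.
  const url = new URL(req.url, "http://localhost");
  const target = TARGETS.get(url.searchParams.get("to"));
  if (!target) {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Unknown link");
    return;
  }
  record({ to: target, at: new Date().toISOString() });
  res.writeHead(302, { Location: target });
  res.end();
}
```

Because the work happens on the server before the redirect is sent, there is no race between the navigation and a separate Ajax request.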
<script>
function callYourCGI() {
    // Requesting the URL as an image triggers a GET to the CGI script.
    var i = new Image();
    i.src = "your-cgi-url?name=value&name2=value2";
}
</script>
<a href="the-next-document.html" onclick="callYourCGI()">Next document</a>
The Image object is part of the HTML DOM. It allows you to manipulate images in an HTML page. Read about it here: http://www.javascriptkit.com/jsref/image.shtml
The script creates an Image object and then assigns the URL of your CGI script to its src attribute. This makes the browser perform a GET request for the content of that URL. In this case, you aren't going to display the image, so the content returned by your CGI script need not be a real image. It can be, if you want, though. Either way, the side effect is that your CGI script is called, with some parameters if desired. An advantage of this method is that it does not violate the same-origin policy, since images are allowed to be loaded from anywhere.
Say my html file is from http://foo.com/index.html, in it, there's a <script> tag to http://bar.com/bar.js. In bar.js, I want to start a SharedWorker where the url is http://bar.com/worker.js. Is there a way to achieve this (maybe something like jsonp)?
The preferred way to do this sort of cross-domain access these days is using the W3C CORS specification.
Cross-Origin Resource Sharing
However, this might not be suitable for you if you do not control the site at bar.com. If you do, then CORS is definitely a good option, but you may need to resort to JSONP if bar.com is run by another party, since CORS depends on the site sending back specific headers authorizing your browser to download the resource you requested.
This is a solution I found:
Write the script inside a function (it can be an inner function).
Get the text using function.toString() (removing the function declaration and closing brace).
Append the text to a BlobBuilder and get the blob (current browsers use the Blob constructor instead, since BlobBuilder is deprecated).
Use window.URL.createObjectURL to convert the blob to a URL.
Use that URL for the worker.
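The steps above can be sketched like this. Note that it swaps in the Blob constructor for BlobBuilder, and the helper names and workerMain are made up for illustration. The body extraction is deliberately naive: it assumes a plain function whose parameter list contains no braces.

```javascript
// Returns the source text between the first "{" and the last "}" of fn,
// i.e. the function body without its declaration and closing brace.
function functionBodySource(fn) {
  const text = fn.toString();
  return text.slice(text.indexOf("{") + 1, text.lastIndexOf("}")).trim();
}

// Wraps that source in a Blob and returns a blob: URL for it.
function workerUrlFromFunction(fn) {
  const blob = new Blob([functionBodySource(fn)], {
    type: "application/javascript",
  });
  return URL.createObjectURL(blob);
}

// Hypothetical worker code; it is never called on the page itself,
// only converted to text and run inside the worker.
function workerMain() {
  self.onmessage = function (event) {
    self.postMessage("echo: " + event.data);
  };
}
```

In a browser, the string returned by workerUrlFromFunction(workerMain) can then be passed to the worker constructor. Whether that behaves the way you need for a SharedWorker is worth testing in your target browsers.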
Requesting data from any location on my domain with .load() (or any jQuery ajax functions) works just fine.
Trying to access a URL in a different domain doesn't work though. How do you do it? The other domain also happens to be mine.
I read about a trick you can do with PHP and making a proxy that gets the content, then you use jQuery's ajax functions, on that php location on your server, but that's still using jQuery ajax on your own server so that doesn't count.
Is there a good plugin?
EDIT: I found a very nice plugin for jQuery that allows you to request content from other pages using any of the jQuery functions in just the same way you would a normal ajax request in your own domain.
The post: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The plugin: https://github.com/jamespadolsey/jQuery-Plugins/tree/master/cross-domain-ajax/
This is because of the cross-domain policy, which, in short, means that using a client-side script (a.k.a. JavaScript...) you cannot request data from another domain. Lucky for us, this restriction does not exist in most server-side scripts.
So...
Javascript:
$("#google-html").load("google-html.php");
PHP in "google-html.php":
echo file_get_contents("http://www.google.com/");
would work.
Different domains = different servers as far as your browser is concerned. Either use JSONP to do the request or use PHP to proxy. You can use jQuery.ajax() to do a cross-domain JSONP request.
One really easy workaround is to use Yahoo's YQL service, which can retrieve content from any external site.
I've successfully done this on a few sites following this example which uses just JavaScript and YQL.
http://icant.co.uk/articles/crossdomain-ajax-with-jquery/using-yql.html
This example is a part of a blog post which outlines a few other solutions as well.
http://www.wait-till-i.com/2010/01/10/loading-external-content-with-ajax-using-jquery-and-yql/
I know of another solution which works.
It does not require that you alter jQuery. It does require that you can stand up an ASP page in your domain. I have used this method myself.
1) Create a proxy.asp page like the one on this page http://www.itbsllc.com/zip/proxyscripts.html
2) You can then call jQuery's load function and feed it proxy.asp?url=.......
There is an example on that link of how exactly to format it.
Anyway, you feed the foreign page URL and your desired MIME type as GET variables to your local proxy.asp page. The two MIME types I have used are text/html and image/jpg.
Note, if your target page has images with relative source links those probably won't load.
I hope this helps.