How to parse html content using javascript or jQuery - javascript

Is there a way to parse html content using javascript?
I have a requirement to display only a div from some other site into my site. Is that possible? For example consider I want to show only div#leftcolumn of w3schools.com in my site. Is this even possible?
How can I do the same using javascript or jQuery?
Thanks.

You need to have a look at Same Origin Policy:
In computing, the same origin policy
is an important security concept for a
number of browser-side programming
languages, such as JavaScript. The
policy permits scripts running on
pages originating from the same site
to access each other's methods and
properties with no specific
restrictions, but prevents access to
most methods and properties across
pages on different sites.
For you to be able to get data, the target has to be on the same protocol and host as your page.
Otherwise you need a workaround such as JSONP, which only works if the remote server supports it.
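To make the "same protocol and host" rule concrete, here is a minimal sketch that compares two URLs the way the policy roughly does. The function name isSameOrigin and the example.com URLs are made up for illustration; the standard URL class does the parsing.

```javascript
// Two URLs share an origin when scheme, host and port all match.
// The URL class normalises default ports, so ":80" on http is ignored.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

console.log(isSameOrigin("http://example.com/a", "http://example.com/b/c"));
console.log(isSameOrigin("http://example.com/", "https://example.com/"));
console.log(isSameOrigin("http://example.com/", "http://www.example.com/"));
```

The first call reports a match; the other two differ in protocol and in subdomain, which is exactly the case the question runs into.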
Though on same protocol and host, jQuery has load() function which you would use like this:
$('#foo').load('somepage.html div#leftcolumn', function(){
// loaded
});
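In case the 'somepage.html div#leftcolumn' argument looks odd: .load() treats everything before the first space as the URL and everything after it as a selector that is applied to the response. A rough sketch of that split (splitLoadArgument is a hypothetical name, not a jQuery function):

```javascript
// Roughly how a ".load()"-style argument is interpreted: the part before the
// first space is the URL, the rest is a selector applied to the fetched HTML.
function splitLoadArgument(arg) {
  const firstSpace = arg.indexOf(" ");
  if (firstSpace === -1) {
    return { url: arg, selector: null }; // no selector: insert the whole response
  }
  return {
    url: arg.slice(0, firstSpace),
    selector: arg.slice(firstSpace).trim(),
  };
}

console.log(splitLoadArgument("somepage.html div#leftcolumn"));
```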
Another possible solution (untested) is to use a server-side language, in which case you don't need JSONP. Here is an example with PHP.
1) Create a php page named ajax.php and put following code in it:
<?php
$content = file_get_contents("http://w3schools.com");
echo $content ? $content : '0';
?>
2) On some page, put this code:
$('#yourDiv').load('ajax.php div#leftcolumn', function(data){
if (data !== '0') { /* loaded */ }
});
Make sure that:
you specify correct path to ajax.php file
you have allow_url_fopen turned on from php.ini.
you replace yourDiv with the id of the element you want to put the received content in

You will need to fetch the HTML content with an HTTP request made from your server, then scrape out the part you wish to show in your page. You need some server-side language for this; unfortunately Ajax/jQuery alone will not work because of browser security restrictions. Most "Ajax" requests are subject to the same origin policy: the request cannot successfully retrieve data from a different domain, subdomain, or protocol.

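What I can think of:
<div style="display: none" id="container"></div>
and then do something like this (shortcut: https://stackoverflow.com/a/11333936/57508):
var $container = $('#container');
// .load() is asynchronous, so query the loaded content inside the callback
$container.load('someurl-on-your-domain', function () {
var $leftcolumn = $('div#leftcolumn', $container);
$leftcolumn.appendTo($sthother); // $sthother: the element that should receive the column
});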
According to a comment: yes, it is true, there is a same-origin policy (http://api.jquery.com/load/):
Due to browser security restrictions, most "Ajax" requests are subject
to the same origin policy; the request can not successfully retrieve
data from a different domain, subdomain, or protocol.
So why not create a proxy which is in your domain and then use the output of the proxy?! Hey, it's long-winded - true ... but it works :)

You would need to make a webservice to pull the code in. This is because you cannot pull the data in via JavaScript due to security restrictions. This is known as same origin policy and is linked elsewhere in this page.
You could use HtmlAgilityPack to parse it on the server side if you're working with asp.net technologies.
How to use HTML Agility pack
You would then be able to call the data from jQuery using .load():
http://api.jquery.com/load/
The idea being you load it into a hidden div such as:
$("#result").load("/webservice/pulldata.ashx");
and query it like you would any normal jquery element.

If you want to get around the same-origin restriction, you can make the request from your own server and return the result to the page.
Example (PHP):
getContent.php
<?php
$fileContent = file_get_contents("http://w3schools.com");
echo $fileContent;
?>
Then you can modify this content however you want (even before the echo).
Sample client script:
<div id="resultHtml"></div>
<script type="text/javascript">
$(document).ready(function(){
$("#resultHtml").load("getContent.php");
});
</script>

Related

How to load a cross-domain page using JavaScript

I have a page on my site (let's say on domain A) and I would like to pull in some more content into it from another page, say, on domain B. As a default, this functionality is blocked by the browsers for security reasons.
As far as I've found, there are a few ways to do this.
CORS: As I understand, this method requires contributions from both the server and the client. The server needs to add a header to its response (i.e. Access-Control-Allow-Origin: [DOMAINS], as of http://enable-cors.org/server.html). On the other hand, the client needs to adjust their requests (e.g. http://www.html5rocks.com/en/tutorials/cors/).
If using jQuery, there is this small plug-in which uses the YahooAPI (i.e. http://james.padolsey.com/snippets/cross-domain-requests-with-jquery/). The advantage of this is that the client can use it on its own to get pages from other domains. The catch is that Yahoo limits the number of requests per hour per IP, and for commercial use Yahoo's permission is needed.
I've also read about JSONP but I haven't done much digging.
My question is: are there other possibly better options that I might be overlooking?
For the record, the site I'm working with is a huge commercial site with millions of users every day.
You can do JSONP, permit CORS and use plain JSON, use a DIY JSONP wrapper, or use a JSONP Proxy service. Here are the solutions in detail: JSONP with remote URL does not work
The easiest option in your situation is to roll your own JSONP proxy service. Here's a demo barebones PHP wrapper to get past CORS if you fetch a JSON string. No catch, no limits unlike Yahoo's YQL.
<?php
$callback = isset($_GET["callback"]) ? $_GET["callback"] : "?";
$json = file_get_contents('http://somedomain.com/someurl/results.json');
header('Access-Control-Allow-Origin: *');
header("Content-Type: application/javascript"); // the body is a JSONP script, not bare JSON
echo $callback . "(" . $json . ");";
?>
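One thing to watch with a wrapper like this: the callback name comes straight from the query string and is echoed into a script. Here is a minimal sketch of the same wrapping step with a check on the name, written in JavaScript rather than PHP. The names wrapJsonp and SAFE_CALLBACK are made up for illustration, and the pattern is an assumption about what a reasonable callback name looks like.

```javascript
// Hypothetical rule: only accept simple identifier-like callback names,
// so the caller cannot smuggle arbitrary script into the response.
const SAFE_CALLBACK = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

function wrapJsonp(callbackName, payload) {
  if (!SAFE_CALLBACK.test(callbackName)) {
    throw new Error("Invalid JSONP callback name");
  }
  // Same shape as the PHP echo above: callback(<json>);
  return callbackName + "(" + JSON.stringify(payload) + ");";
}

console.log(wrapJsonp("handleResults", { results: [1, 2, 3] }));
```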
Are you trying to get content, or code? If you're trying to get content, is it possible to just use an iframe?
If you want code, I think the options you outlined are pretty much what you have available. JSONP might be your best bet due to browser support: IE, for example, only fully supported CORS as of version 10. If you're on a site with millions of users per day, my guess is there are some folks on older versions of IE (unfortunately).
Edit: Depending on the content, another option is to introduce your own local proxy. For example, I've done things where I need to call WebServiceX on some other provider. I call the WebServiceX in server side code and implement my own web service that my JavaScript accesses. This means I'm not going cross domain because the cross domain access happens server-side, not client-side. It also allowed me to introduce caching and other things (depending on the type of data) that improved performance.
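The caching part of that local proxy can be sketched as a small wrapper around whatever function does the server-side fetch. This is only an illustration under assumptions: createCachedFetcher, fetchText and ttlMs are hypothetical names, and fetchText is assumed to return a promise for the response body.

```javascript
// Wraps a fetch function so repeated requests for the same URL within ttlMs
// reuse the first in-flight or completed result instead of hitting the remote site.
function createCachedFetcher(fetchText, ttlMs, now = Date.now) {
  const cache = new Map(); // url -> { promise, expiresAt }
  return function getCached(url) {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > now()) {
      return hit.promise;
    }
    const promise = fetchText(url);
    cache.set(url, { promise, expiresAt: now() + ttlMs });
    // Do not keep failures around: drop the entry if this request rejects.
    promise.catch(() => {
      const current = cache.get(url);
      if (current && current.promise === promise) cache.delete(url);
    });
    return promise;
  };
}
```

Caching the promise rather than the body means concurrent requests for the same page share a single upstream call, which matters on a high-traffic site.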
An approach for cross-domain data passing: create a script element and set its source to a file on another domain. Here is a quick and dirty example:
File test.html:
<html>
<body>
Test done
</body>
<script>
var s = document.createElement("script");
s.type='text/javascript';
s.src='test.js';
document.body.appendChild(s);
</script>
</html>
and test.js
abc={a:'A',b:'B',c:'C'};
alert(abc.a);
test.js could be in any domain and function alert() could be any function.
There are more elegant ways to attach or run such code, but this one is sufficient to understand the idea.

Loading external content into server on localhost

I am trying to create a web application that loads content dynamically. When I do this, of course I want to do the development locally, i.e. localhost. Some of the "functionality" is a form and when posting that form an e-mail is sent from the server. Because I want to access the servers e-mail functionality, I am linking that specific page to the server. But the problem is that it is not loaded.
In my script below it works, but if I swap the comments so that I am pointing to iandapp.com, then I just get an empty string. It's exactly the same page, just copied to the server.
$("#support").click(function () {
if(support_page==null){
//$("#section2").load("http://www.iandapp.com/smic/subscription_2.php", function(data) {
$("#section2").load("subscription_2.php", function(data) {
support_page = data;
});
}
});
The script is located in the main page (index.html) and content should be loaded into a div with id="section2".
I know that (support_page==null) is true because I have a break point inside where it stops.
Please let me know what the problem is and how I can fix it. I have been going on for hours trying to get this working.
Thanks in advance!
Google "cross domain ajax requests". This is disabled at the browser level. There are ways to circumvent this, both client side and server side.
It probably has something to do with it being a cross-domain request. You could use what I consider to be a "hack", http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/, but I.M.O. it's not worth it.
Have you considered sending through an SMTP server instead? If so, you'd have no problem with the file (sending the mail) being local.
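And what about adding the proper headers to the server's HTTP response to allow cross-domain requests?
Access-Control-Allow-Origin: *
If the request must carry cookies, send Access-Control-Allow-Credentials: true and name the allowed origin explicitly instead of using *, since browsers reject the combination of credentials and the wildcard.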
Use .getJSON() instead of .load(); with a callback=? parameter in the URL it makes a JSONP request, which works across domains. You'll need to make sure your PHP script does something like the following:
echo $_GET['callback'] . '(' . json_encode($results) . ')';
jQuery will append something like ?callback=callback0234 to the request url because it wants you to 'call' the callback function when your script returns. So the output of your script may look something like:
callback0234({"mydata": "<p>This is my data</p>"})
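To see why the response has to be a function call wrapped around the data, here is a small simulation of what happens on the client. It is only a sketch: receiveJsonp is a made-up helper, and new Function stands in for the browser executing the body of the injected script tag, which is also why JSONP should only be used with servers you trust.

```javascript
// Simulates a JSONP round trip: register a global callback under the agreed
// name, "run" the response as script, and capture whatever it was called with.
function receiveJsonp(responseText, callbackName) {
  let received;
  globalThis[callbackName] = (data) => { received = data; };
  try {
    // Stand-in for the browser executing the body of the injected <script>.
    new Function(responseText)();
  } finally {
    delete globalThis[callbackName]; // clean up the temporary global
  }
  return received;
}

const data = receiveJsonp('callback0234({"mydata": "<p>This is my data</p>"})', "callback0234");
console.log(data.mydata);
```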

How do you get content from another domain with .load()?

Requesting data from any location on my domain with .load() (or any jQuery ajax functions) works just fine.
Trying to access a URL in a different domain doesn't work though. How do you do it? The other domain also happens to be mine.
I read about a trick you can do with PHP and making a proxy that gets the content, then you use jQuery's ajax functions, on that php location on your server, but that's still using jQuery ajax on your own server so that doesn't count.
Is there a good plugin?
EDIT: I found a very nice plugin for jQuery that allows you to request content from other pages using any of the jQuery function in just the same way you would a normal ajax request in your own domain.
The post: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The plugin: https://github.com/jamespadolsey/jQuery-Plugins/tree/master/cross-domain-ajax/
This is because of the cross-domain policy, which, in short, means that a client-side script (i.e. JavaScript) cannot request data from another domain. Lucky for us, this restriction does not exist in most server-side scripts.
So...
Javascript:
$("#google-html").load("google-html.php");
PHP in "google-html.php":
echo file_get_contents("http://www.google.com/");
would work.
Different domains = different servers as far as your browser is concerned. Either use JSONP to do the request or use PHP to proxy. You can use jQuery.ajax() to do a cross-domain JSONP request.
One really easy workaround is to use Yahoo's YQL service, which can retrieve content from any external site.
I've successfully done this on a few sites following this example which uses just JavaScript and YQL.
http://icant.co.uk/articles/crossdomain-ajax-with-jquery/using-yql.html
This example is a part of a blog post which outlines a few other solutions as well.
http://www.wait-till-i.com/2010/01/10/loading-external-content-with-ajax-using-jquery-and-yql/
I know of another solution which works.
It does not require that you alter JQuery. It does require that you can stand up an ASP page in your domain. I have used this method myself.
1) Create a proxy.asp page like the one on this page http://www.itbsllc.com/zip/proxyscripts.html
2) You can then do a JQuery load function and feed it proxy.asp?url=.......
there is an example on that link of how exactly to format it.
Anyway, you feed the foreign page URL and your desired mime type as get variables to your local proxy.asp page. The two mime types I have used are text/html and image/jpg.
Note, if your target page has images with relative source links those probably won't load.
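The relative image problem can be worked around by rewriting the src attributes against the original page's URL before inserting the HTML. A minimal sketch, assuming the proxied HTML is available as a string; absolutizeImageSources and the example.com URLs are hypothetical, and the regular expression is deliberately naive (it only handles quoted src attributes on img tags).

```javascript
// Rewrites <img src="..."> values so relative paths resolve against the
// page they originally came from rather than against your own site.
function absolutizeImageSources(html, baseUrl) {
  return html.replace(
    /(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2/gi,
    (match, prefix, quote, src) => prefix + quote + new URL(src, baseUrl).href + quote
  );
}

console.log(
  absolutizeImageSources('<img src="images/logo.png">', "http://example.com/docs/page.html")
);
```

Already absolute URLs pass through unchanged, because resolving an absolute URL against a base returns the URL itself.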
I hope this helps.

Can a Client Link to My JavaScript, Hosted on a Different Domain?

Is it possible for me to supply a client with a snippet of HTML which contains a reference to a javascript file that I host? They want to paste this HTML into their CMS, so that when their page loads, it'll load our content.
I was under the impression that there was cross domain security preventing this from being possible.
What if, instead of linking to the JavaScript, I gave them the snippet of HTML with the JavaScript already included
so instead of
<div>
<!-- link to js -->
</div>
I gave them
<div>
$.get(/*url to my content*/);
</div>
Would that work?
You could use JSONP to simulate cross domain AJAX calls (works only with GET requests as internally it uses a script tag):
$.getJSON("http://api.flickr.com/services/feeds/photos_public.gne?tags=cat&tagmode=any&format=json&jsoncallback=?",
function(data) {
$.each(data.items, function(i,item) {
$("<img/>").attr("src", item.media.m).appendTo("#images");
if ( i == 3 ) return false;
});
}
);
Is it possible for me to supply a client with a snippet of HTML which contains a reference to a javascript file that I host?
Yes. The src of script elements has no same origin limits.
$.get(/*url to my content*/);
XMLHttpRequests still do have same origin limits. XHR can only fetch from the domain of the page, not the script.
The HTML <script> tags are exempt from the same origin policy, so if your client links to your JavaScript file with <script> tags, you will not have any problems. (Source)
Referencing a javascript file from a different domain is no problem. This is not cross site scripting, it's simply a cross site HTTP request. This is used a lot, e.g. by Google's JavaScript API Loader.

How to get the content of a remote page with JavaScript?

I have a URL of a remote page from a different domain which I have to download, parse, and update DOM of the current page. I've found examples of doing this using new ActiveXObject("Msxml2.XMLHTTP"), but that's limited to IE, I guess, and using new java.net.URL, but I don't want to use Java. Are there any alternatives?
Same domain policy is going to get you.
1) Proxy through your server. browser->your server->their server->your server->browser.
2) Use flash or silverlight. The 3rd party has to give you access. The bridge between javascript and flash isn't great for large amounts of data and there are bugs. Silverlight isn't ubiquitous like flash...
3) Use a <script> tag. This really isn't safe... It only works if the 3rd party content is valid JavaScript.
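Option 1 (browser -> your server -> their server) could be sketched in Node.js roughly as follows. Treat it as an illustration under assumptions rather than a finished proxy: the url query parameter, the ALLOWED_HOSTS list, and the function names are all made up here, and the global fetch requires Node 18 or later.

```javascript
const http = require("node:http");

// Hypothetical allow-list: only these upstream hosts may be proxied,
// so the endpoint cannot be abused to fetch arbitrary URLs.
const ALLOWED_HOSTS = new Set(["www.w3schools.com"]);

function isAllowedTarget(rawUrl) {
  try {
    const url = new URL(rawUrl);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    return isHttp && ALLOWED_HOSTS.has(url.hostname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

function createProxyServer(fetchImpl = fetch) {
  return http.createServer(async (req, res) => {
    // Expecting requests like /?url=http%3A%2F%2Fwww.w3schools.com%2F
    const target = new URL(req.url, "http://localhost").searchParams.get("url");
    if (!target || !isAllowedTarget(target)) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Target not allowed");
      return;
    }
    try {
      const upstream = await fetchImpl(target);
      const body = await upstream.text();
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(body);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream request failed");
    }
  });
}

// Usage (not started automatically): createProxyServer().listen(3000);
```

Because the page then requests the content from your own server, the browser never makes a cross-domain call.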
What about loading a PHP script via AJAX which does file_get_contents()? This should work for a different domain, if I understand correctly.
Writing a server-side script that will retrieve the page's content for you is the way to go. You can use the XMLHttpRequest object to make an AJAX call to that script, which will just put through all html (?) for you.
Still, I advise against it. I don't know exactly how much you trust the other site, but the same origin policy exists for a reason. What is it exactly you are trying to do? Usually, there is a workaround.
I don't think you can do this under the constraints of the same origin policy. To communicate between two domains we can use iframes and JS code, but both domains need to have the communicating code in them. The child frame can contact the grandparent frame (window), but that does not apply here, since you are referring to some other URL altogether.
The only way is to use your server-side code to access the content on the other domain.
Just use PHP:
<?php
$url = "http://www.domaintoretrieve.com";
// Fetch the page as a string; include_once() on a remote URL would try to execute it as PHP
$html = file_get_contents($url);
?>
$html contains the entire page to manipulate as needed.
The XMLHTTPRequest object is common to most modern browsers and is what powers AJAX web applications.
