How can I copy the source code from a website (with javascript)? I want to copy the text that is showing the temperature from this website: http://www.accuweather.com/
I want to copy only the number that displays the temperature. Is there a way of copying that exact line from the source code of the website? I heard about HTML scraping. If not JavaScript, what would be the simplest way of doing it? I just want to copy the temperature and display it on my webpage.
Well, a simple way to do something like that is to load the site into a hidden HTML element via AJAX and then search the DOM for the element you want.
There is also a jQuery command that allows that directly. It would be something like:
<div id='temp'></div>
<script>
$('div#temp').load('https://www.accuweather.com/ #popular-locations-ul .large-temp', { limit: 1 });
</script>
#popular-locations-ul .large-temp is a CSS selector for the specific elements that contain the temperature.
However, browsers enforce a security feature called the same-origin policy, and CORS is the mechanism by which a site can relax it. To be able to load something from another site via AJAX, the target site has to send CORS headers that explicitly allow it. In the case of this particular site, those headers aren't present, so any attempt to read its content via AJAX from your page will be blocked.
You can only use a command like the one above against a site you control and have configured to send CORS headers, or against a site that already sends them.
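To make the rule concrete, here is a rough sketch in plain JavaScript of the decision a browser makes before letting a page's script read a response. It is a simplified model, not the browser's real algorithm (it ignores preflight requests, credentials and the other CORS headers), and the function name canScriptReadResponse and the origin my-site.example are made up for the illustration.

```javascript
// Simplified model of the browser's check; ignores preflight and credentials.
function canScriptReadResponse(pageOrigin, targetUrl, responseHeaders) {
  const targetOrigin = new URL(targetUrl).origin;
  // Same-origin responses are always readable.
  if (targetOrigin === pageOrigin) return true;
  // Cross-origin: the target must opt in via Access-Control-Allow-Origin.
  const allow = responseHeaders['access-control-allow-origin'];
  return allow === '*' || allow === pageOrigin;
}

const myPage = 'https://my-site.example'; // hypothetical origin of your own page

// No CORS header in the response, so the script may not read it.
console.log(canScriptReadResponse(myPage, 'https://www.accuweather.com/', {}));
```

A response carrying `Access-Control-Allow-Origin: *` or the exact origin of your page would make the same call return true.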
But as people have told you, that's not a good approach to begin with because of the impermanent nature of web sites. Things change a lot. So even if you could get a value from some other site in the way I mentioned, sometime later the site would change and your code would break.
The reason I answered is that you are just learning and need guidance, and are not trying to do 'serious work'. Serious work would mean using an API, as people told you.
A web API is a special URL you access (something like https://www.accuweather.com:1234/api/temperature/somecity), normally with some kind of security, that responds with the result you need. For this kind of service CORS is allowed because you are accessing it in a secure and 'official' way.
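As a rough illustration of what using such an API could look like, here is a small sketch. Everything specific in it is an assumption: the host api.weather.example, the /api/temperature path, the city and apikey parameters, and the JSON shape of the reply are all invented, and a real provider documents its own.

```javascript
// Hypothetical endpoint and parameter names; a real provider documents its own.
function buildTemperatureUrl(baseUrl, city, apiKey) {
  const url = new URL('/api/temperature', baseUrl);
  url.searchParams.set('city', city);
  url.searchParams.set('apikey', apiKey);
  return url.toString();
}

// Assumes the service answers with JSON shaped like {"city": "...", "temperature": 21.5}.
function parseTemperature(jsonText) {
  const data = JSON.parse(jsonText);
  if (typeof data.temperature !== 'number') {
    throw new Error('Response did not contain a numeric temperature');
  }
  return data.temperature;
}

console.log(buildTemperatureUrl('https://api.weather.example', 'New York', 'MY_KEY'));
console.log(parseTemperature('{"city": "New York", "temperature": 21.5}'));
```

In a page you would pass the built URL to fetch (or $.getJSON), hand the response text to parseTemperature, and write the number into your own element.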
Hope I clarified a bit.
Apologies if this is a roundabout way of asking this question, but I am a little confused about how the web and javascript work.
What I want to do: execute JavaScript on all pages of a list of URLs I have found. (Specifically, use jQuery to pull info from them.)
Problem: I can't execute JavaScript on these pages because they aren't mine and don't send the Access-Control-Allow-Origin header. So I can't load them (with AJAX) in order to use jQuery on them.
BUT Google Chrome can both load pages and execute JavaScript on them (with its developer console). So if I wanted to, I could go to each page, open the developers console, and pull the information from there. If there's nothing stopping Chrome from accessing these, then why am I stopped? And, is there a way around this?
Thank you, and I hope my description makes sense. I've been researching this for a while but have found nothing that explains how seemingly inconsistent CORS is.
I could go to each page, open the developers console, and pull the information from there. If there's nothing stopping Chrome from accessing these, then why am I stopped?
You're not stopped. You, the human at the keyboard, can do exactly as you say, by visiting each page as a top-level page.
What is stopped -- happily -- is any and all scripts on the Web you happen to run having the same level of visibility that you do. Based on your cookies and your network topology, you have a unique view into the Web. You can see your home router's control interface (on 192.168.1.1 or similar). You can see any local web server you're running on 127.0.0.1. No one else can see these. If the same-origin policy were not in place, then any script that you loaded on the Web could inspect these.
And, is there a way around this?
If you have some scripts that you trust absolutely (hopefully a significant subset of "all scripts that exist on the Web") that you want to be able to bypass the same-origin policy and see your full, cross-domain view of the Web, you could load them as an extension, which can act with elevated permissions beyond the abilities of normal web pages. (See How does Same Origin Policy apply to browser extensions?)
I'm going to assume that you are looking to grab data from these pages that aren't yours and store it somewhere. I have done this before with cURL in PHP. If you are instead looking to display these sites so users can interact with them in a different way, starting from a page that is yours, you may be able to fetch the source HTML with cURL and render it yourself, acting as a sort of proxy.
I've used this tutorial for something similar https://www.youtube.com/watch?v=_kQN-3aNCeI . Hopefully this gives you a start. I think you should be a little more detailed in your question though to get more help.
On the website http://imaginaryman-test.blogspot.com/ the typewriter is inside of an IFRAME. Everything works correctly on all browsers when you go to the site directly at http://castedspell.com/mark/, but when viewing the version embedded in an IFRAME it does not work in IE and throws errors in Chrome.
Unsafe JavaScript attempt to access frame with URL http://imaginaryman-test.blogspot.com/ from frame with URL http://castedspell.com/mark/. Domains, protocols and ports must match.
This is the source code for the embedded IFRAME
https://github.com/totheleftpanda/typeWrite/tree/master/mark
I understand that this is a security problem, but I don't know how to fix it and cannot find any material that would help me solve the issue.
The easiest method is to set a PHP (or any server language) proxy that just gets the content of the page from the other domain and outputs it. The only real drawback is that the cookies of the client for the remote domain aren't sent.
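Since any server language will do, here is a minimal sketch of such a proxy written for Node.js rather than PHP. The /proxy?url=... request shape, the allow-list and the function names are assumptions made for this example; the handler has the (req, res) shape that Node's http.createServer expects, and fetchImpl defaults to the global fetch available in Node 18 and later.

```javascript
// Hosts the proxy is willing to fetch from (hypothetical allow-list).
const ALLOWED_HOSTS = new Set(['castedspell.com']);

// Turn "/proxy?url=<encoded target>" into a validated URL, or null if rejected.
function resolveTarget(requestUrl) {
  const raw = new URL(requestUrl, 'http://localhost').searchParams.get('url');
  if (!raw) return null;
  let target;
  try {
    target = new URL(raw);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  const isHttp = target.protocol === 'http:' || target.protocol === 'https:';
  return isHttp && ALLOWED_HOSTS.has(target.hostname) ? target : null;
}

// Fetch the remote page on the server and relay it from your own domain.
// fetchImpl is a parameter so a stand-in can be supplied when testing.
async function handleProxyRequest(req, res, fetchImpl = globalThis.fetch) {
  const target = resolveTarget(req.url);
  if (!target) {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Target not allowed');
    return;
  }
  const upstream = await fetchImpl(target.href);
  const body = await upstream.text();
  res.writeHead(upstream.status, { 'Content-Type': 'text/html; charset=utf-8' });
  res.end(body);
}
```

The allow-list is there because a proxy that fetches any URL it is handed can be abused to reach other sites or internal hosts. As the answer notes, the visitor's cookies for the remote domain never reach the remote site, because the request is made by your server.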
Take a look at http://benalman.com/projects/jquery-postmessage-plugin/. This is a jQuery plugin that sends messages between the two frames. The two frames do not need to be on the same domain, but you do need access to both pages to be able to modify them. I also wrote a post that covers communication between iframes: How to capture clicks from iframe on another domain?
Your only chance is something like easyXDM (or do it manually using the hash, but I would prefer easyXDM).
See the SO answer: Cross-domain hash change communication
E.g. if you want to call a method:
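For reference, current browsers also provide window.postMessage natively, which is the mechanism this kind of plugin builds on. The sketch below uses the native API rather than the plugin. The helper name makeMessageHandler and the message shape are made up for the example; the origins are the two sites from the question.

```javascript
// Build a "message" event listener that ignores anything not sent by trustedOrigin.
function makeMessageHandler(trustedOrigin, onData) {
  return function handleMessage(event) {
    if (event.origin !== trustedOrigin) return false; // untrusted sender: drop it
    onData(event.data);
    return true;
  };
}

// In the parent page (browser only):
//   window.addEventListener('message',
//     makeMessageHandler('http://castedspell.com', data => console.log(data)));
//
// In the framed page (browser only):
//   window.parent.postMessage({ type: 'ready' }, 'http://imaginaryman-test.blogspot.com');
```

Checking event.origin on the receiving side, and naming the target origin on the sending side, is what keeps this from reopening the hole the same-origin policy closes.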
http://easyxdm.net/wp/2010/03/17/remote-procedure-calls-rpc/
EDIT:
If I try your demo in firefox I don't get the "Unsafe JavaScript attempt to access" error at all. But in Chrome it's thrown many times.
You have so much other code in your example that I'm not even sure that your code causes the problem. You should do a very limited/basic test to see if your flash-communication works, without all those other javascripts.
I have had similar issues with this before. Basically if you have an iframe that contains a page from a domain that differs from the main page's domain, javascript will not be able to cross the boundaries between them. Javascript within the iframe will be able to talk within the iframe, javascript in the main page will be able to talk within the main page, but they will not be able to talk to each other.
This is a security issue that aims to stop cross-site scripting attacks. There are a number of hacks that you can put in place to get around this problem but they are all (or at least the ones I know of) rather hairy.
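The "domains, protocols and ports must match" rule from the error message can be written down in a few lines. This is only an illustration of the comparison, using the standard URL class; the helper name sameOrigin is made up.

```javascript
// Two URLs share an origin only if scheme, host and port all match.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// The two pages from the question: different hosts.
console.log(sameOrigin('http://castedspell.com/mark/', 'http://imaginaryman-test.blogspot.com/')); // false
// Same scheme, host and port, different path.
console.log(sameOrigin('http://castedspell.com/mark/', 'http://castedspell.com/other'));           // true
// Same host, different scheme.
console.log(sameOrigin('http://castedspell.com/', 'https://castedspell.com/'));                    // false
```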
Here are some questions that you should answer before trying to go further:
1) What exactly are you trying to do between the pages using javascript?
2) Do you have access to the source of both pages?
It may be waaay simpler than the above answers. It looks like this function:
function playSound(){
swf.playSound();
}
Is written in the DOM timeline before swf is actually assigned to the swfObject in the function below it.
I would recommend moving that function down further and then retest.
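Whether or not ordering turns out to be the cause, a guard makes this kind of problem easier to spot. The sketch below assumes swf is a variable that gets assigned later in the page; the object assigned to it here is only a stand-in for the real Flash object.

```javascript
let swf; // assigned later, once the Flash object has been embedded (assumed setup)

function playSound() {
  if (!swf || typeof swf.playSound !== 'function') {
    return false; // called too early: nothing to play yet
  }
  swf.playSound();
  return true;
}

console.log(playSound()); // false, swf is still undefined
swf = { playSound() { /* stand-in for the real Flash object */ } };
console.log(playSound()); // true
```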
I apologize if this has been asked before. I searched but did not find anything. It is a well-known limitation of AJAX requests (such as jQuery $.get) that they have to be within the same domain for security reasons. And it is a well-known workaround for this problem to use iframes to pull down some arbitrary HTML from another website and then you can inspect the contents of this HTML using javascript which communicates between the iframe and the parent page.
However, this doesn't work on the iPhone. In some tests I have found that iframes in the Safari iPhone browser only show content if it is content from the same site. Otherwise, they show a blank content area.
Is there any way around this? Are there other alternatives to using iframes that would allow me to pull the HTML from a different domain's page into javascript on my page?
Edit:
One answer mentioned JSONP. This doesn't help me because from what I understand JSONP requires support on the server I'm requesting data from, which isn't the case.
That same answer mentioned creating a proxy script on my server and loading data through there. Unfortunately this also doesn't work in my case. The site I'm trying to request data from requires user login. And I don't want my server to have to know the user's credentials. I was hoping to use something client-side so that my app wouldn't have to know the user's credentials at the other site.
I'm prepared to accept that there is no way to accomplish what I want to do on the iPhone. I just wanted to confirm it.
You generally can NOT inspect the contents of an iframe from another domain via JavaScript. The most common answers are to use JSONP or have your original server host a proxy script to retrieve the inner contents for you.
Given your revisions, without modification or support from the secondary site, you are definitely not going to be able to do what you want via the iPhone's browser.
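To show why JSONP needs cooperation from the other server: the remote site has to wrap its data in a call to a function that your page names. A minimal sketch of that server-side step follows; the function name wrapJsonp and the sample data are invented for the example.

```javascript
// Server side: wrap the data in a call to the callback the client asked for.
function wrapJsonp(callbackName, data) {
  // Only allow plain identifiers so the callback name cannot inject script.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('Invalid callback name');
  }
  return callbackName + '(' + JSON.stringify(data) + ');';
}

console.log(wrapJsonp('handleData', { user: 'alice' }));
// handleData({"user":"alice"});
```

The requesting page defines handleData and adds a script tag pointing at the remote URL. If the remote server returns plain HTML or JSON instead of this wrapped form, nothing calls the function, which is why JSONP cannot be applied to a site that does not support it.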
"In some tests I have found that iframes in the Safari iPhone browser only show content if it is content from the same site"
I found the same thing. Is this documented somewhere? Is there a workaround? This sounds like broken web standards to me, and I am wondering if there is a solution.
Basically, what I'm trying to do is make a small script that finds the most recent post in a forum and pulls some text or an image out of it. I have this working in Python, using the htmllib module and some regex. But the script still isn't very convenient as is; it would be much nicer if I could somehow put it into an HTML document. It appears that simply embedding Python scripts is not possible, so I'm looking to see if there's a feature similar to Python's htmllib that can be used to access some other webpage and extract some information from it.
(Essentially, if I could get this script going in the form of an html document, I could just open one html document, rather than navigate to several different pages to get the information I want to check)
I'm pretty sure that javascript doesn't have the functionality I need, but I was wondering about other languages such as jQuery, or even something like AJAX?
As Greg mentions, an Ajax solution will not work "out of the box" when trying to load from remote servers.
If, however, you are trying to load from the same server, it should be fairly straightforward. I'm presenting this answer to show how this could be done using jQuery in just a few lines of code.
<div id="placeholder">Please wait, loading...</div>
<script type="text/javascript" src="/path/to/jquery.js">
</script>
<script type="text/javascript">
$(document).ready(function() {
$('#placeholder').load('/path/to/my/locally-served/page.html');
});
</script>
If you are trying to load a resource from a different server than the one you're on, one way around the security limitations would be to offer a proxy script, which could fetch the remote content on the server, and make it seem like it's coming from your own domain.
Here are the docs on jQuery's load method : http://docs.jquery.com/Ajax/load
There is one other nice feature to note, which is partial-page loading. For example, let's say your remote page is a full HTML document, but you only want the content of a single div in that page. You can append a selector to the URL passed to the load method, and this will further simplify your task. For example,
$('#placeholder').load('/path/to/my/locally-served/page.html #someTargetDiv');
Best of luck!-Mike
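For readers wondering how one string can carry both a URL and a selector: the argument is split at the first space, with the part before it used as the URL and the rest as the selector. The function below is a rough imitation of that behaviour for illustration, not jQuery's actual source, and its name is made up.

```javascript
// Rough imitation of how a "url selector" string is split; not jQuery's actual source.
function splitLoadArgument(arg) {
  const spaceAt = arg.indexOf(' ');
  if (spaceAt === -1) return { url: arg, selector: null };
  return {
    url: arg.slice(0, spaceAt),
    selector: arg.slice(spaceAt).trim(),
  };
}

console.log(splitLoadArgument('/path/to/my/locally-served/page.html #someTargetDiv'));
console.log(splitLoadArgument('/path/to/my/locally-served/page.html'));
```

The whole document is still downloaded; the selector only decides which part of it ends up in the placeholder.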
There are two general approaches:
1) Modify your Python code so that it runs as a CGI (or WSGI or whatever) module and generate the page of interest by running some server side code.
2) Use Javascript with jQuery to load the content of interest by running some client side code.
The difference between these two approaches is where the third party server sees the requests coming from. In the first case, it's from your web server. In the second case, it's from the browser of the user accessing your page.
Some browsers may not handle loading content from third party servers very gracefully (that is, they might pop up warning boxes or something).
You can embed Python. The most straightforward way would be to use the cgi module. If the script will be run often and you're using Apache it would be more efficient to use mod_python or mod_wsgi. You could even use a Python framework like Django and code the entire site in Python.
You could also code this in Javascript, but it would be much trickier. There are a lot of security concerns with cross-site requests (ah, the unsafe internet), so it tends to be a tricky domain when you try to do it through the browser.
I'm just looking for clarification on this.
Say I have a small web form, a 'widget' if you will, that gets data, does some client side verification on it or other AJAX-y nonsense, and on clicking a button would direct to another page.
If I wanted this to be an 'embeddable' component, so other people could stick this on their sites, am I limited to basically encapsulating it within an iframe?
And are there any limitations on what I can and can't do in that iframe?
For example, the button that would take you to another page - this would load the content in the iframe? So it would need to exist outwith the iframe?
And finally, if the button the user clicked were to take them to an https page to verify credit-card details, are there any specific security no-nos that would stop this happening?
EDIT: For an example of what I'm on about, think about embedding either googlemaps or multimap on a page.
EDIT EDIT: Okay, I think I get it.
There are two ways.
One - embed in an IFrame, but this is limited.
Two - create a Javascript API, and ask the consumer to link to this. But this is a lot more complex for both the consumer and the creator.
Have I got that right?
Thanks
Duncan
There are plus points for both methods. I, for one, wouldn't use another person's Javascript on my page unless I was absolutely certain I could trust the source. It's not hard to make a malicious script that submits the values of all input boxes on a page. If you don't need access to the contents of the page, then using an iframe would be the best option.
Buttons and links can be "told" to navigate the top or parent frame using the target attribute, like so:
<a href="http://some.url/with/a/page" target="_top">This is a link</a>
<form action="http://some.url/with/a/page" target="_parent"><button type="submit">This is a button</button></form>
In this situation, since you're navigating away from the hosting page, the same-origin policy wouldn't apply.
In similar situations, widgets are generally iframes placed on your page. iGoogle and Windows Live Gadgets (to my knowledge) are hosted in iframes, and for very good reason: security.
If you are using AJAX, I assume you have a server written in C#, Java, or some other OO language.
It doesn't really matter which language; only the syntax will vary.
Either way, I would advise against the iframe method.
It opens up far too many holes and problems. For example, HTTP with HTTPS (or vice versa) in an iframe will show a mixed content warning.
So what do you do?
1) Do a server-side call to the remote site
2) Parse the response appropriately on the server
3) Return via AJAX what you need
4) Display returned content to the user
You know how to do the AJAX; just add a server-side call to the remote site.
Java:
URL url = new URL("http://www.WEBSITE.com");
URLConnection conn = url.openConnection();
or
C#:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.WEBSITE.com");
WebResponse res = req.GetResponse();
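As noted, the language only changes the syntax. Here is the same idea sketched for a JavaScript (Node.js) server, covering the call, the parsing and the body returned to the AJAX caller. The function names and the choice of the page title as "what you need" are assumptions for the example; fetchImpl defaults to the global fetch available in Node 18 and later.

```javascript
// Pull just the piece you need out of the remote HTML.
// A regular expression is enough for a sketch; a real HTML parser is more robust.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// fetchImpl is a parameter so a stand-in can be supplied when testing.
async function fetchRemoteTitle(remoteUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(remoteUrl);       // server-side call to the remote site
  const title = extractTitle(await response.text()); // parse the response on the server
  return JSON.stringify({ title });                  // body to return to the AJAX caller
}

console.log(extractTitle('<html><head><title> Example page </title></head></html>'));
```

Returning only the extracted value, rather than the whole remote page, keeps the browser side trivial: it receives a small JSON object and displays it.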
I think you want to get away from using inline frames if possible. Although they are sometimes useful, they can cause issues with navigation and bookmarking. Generally, if you can do it some other way than an iframe, that is the better method.
Given that you mention AJAX, a Javascript include would probably be the best bet, i.e. embed what you need to do in script tags. Note that this is how Google embeds things such as Google Analytics and Google Ads. It also has the benefit of being loadable from a URL hosted by you, so you can update the code and, voila, it is active in all the web pages that use it. (Google usually uses version numbers as well, so they don't switch everyone over when they make changes.)
Re the credit card scenario, Javascript is bound by the 'same origin policy'. For a clarification, see http://en.wikipedia.org/wiki/Same_origin_policy
Added: Google Maps works in the same way, with some caveats such as a user/site key that explicitly identifies who is using the code.
Look into using something like jQuery and creating a "plugin" for your component. It's just one way, and just a thought, but if you want to share the component with other people, this is one of the things that can be done.
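A rough sketch of what a script-tag embed with a version number and a site key could look like from the widget author's side. All of the names here are hypothetical (the host widgets.example, the /widget/v<version>/widget.js layout, the data-site-key attribute); it only illustrates the idea and is not how any particular provider does it.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Build the snippet a consumer would paste into their page.
// Host, path layout and attribute name are all hypothetical.
function buildEmbedSnippet({ host, version, siteKey }) {
  const src = new URL('/widget/v' + encodeURIComponent(version) + '/widget.js', host);
  return '<script src="' + escapeAttr(src.href) + '"' +
         ' data-site-key="' + escapeAttr(siteKey) + '" async></script>';
}

console.log(buildEmbedSnippet({ host: 'https://widgets.example', version: '2', siteKey: 'abc123' }));
```

Putting the version in the path lets existing pages stay on the release they were tested with, while the site key tells the widget's server which consumer is loading it.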