How to get the HTML of another website on the client side?

I am trying to write a JavaScript script that would scrape the HTML source code of another website (e.g. www.google.pl).
I found a few solutions, but none worked. I tried to run this code:
var url = "http://google.com/";
$.ajax({
    url: url,
    success: function(data) {
        alert(data);
    }
});
but it returns: "Status Code: 301 Moved Permanently (from disk cache)"
Do you have any code that would work?
Thank you :)

You can't.
The Same Origin Policy prevents cross-origin reads.
By default, a script can only read responses from its own origin (the same scheme, host, and port).
For example: a script at https://foo.com/some-script.js/ can typically request a resource from https://foo.com/about-us, but not https://bar.com/about-us/.
If you think about it, this restriction is critical for keeping the web safe. For example, you wouldn't want any arbitrary site to be capable of accessing your bank account, would you?
If the owners of a website want to make a certain resource available to other domains, they can enable cross-origin resource sharing (see Mozilla's article on CORS for more information), but this is up to them.
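As a rough illustration of what "same origin" means in the example above, here is a minimal sketch that compares two URLs with the standard URL class. The helper name isSameOrigin is made up for this example; the browser performs this check internally and does not expose a function like it.

```javascript
// Two URLs share an origin when scheme, host, and port all match.
// isSameOrigin is a hypothetical helper, not a browser API.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

console.log(isSameOrigin("https://foo.com/some-script.js", "https://foo.com/about-us"));  // true
console.log(isSameOrigin("https://foo.com/some-script.js", "https://bar.com/about-us/")); // false
console.log(isSameOrigin("http://foo.com/", "https://foo.com/"));                         // false (scheme differs)
```

Note that a different scheme or port is enough to make a request cross-origin, even when the host name is identical.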

How can I read a csv file from my library in Qualtrics?

I'm new to Qualtrics, HTML, and Javascript, so feel free to let me know if what I'm asking for is impossible.
I'm trying to use Qualtrics to gather some data that relates applicant test scores and schools that the same applicants were admitted to. As part of my autocomplete function (this is needed to standardize user input, since I don't want some people typing UCLA and others University of California, Los Angeles), I need Qualtrics to read a CSV file of schools that is currently stored in my Files Library, and then convert that CSV into an array. I'm having trouble getting it to read the file to begin with.
I've tried using ajax (still don't really know what it is- I'm bumbling, here). Here is my attempt with ajax in Javascript:
autocomplete: function() {
    var availableTags;
    jQuery.ajax({
        type: "GET",
        url: "illinoislas.qualtrics.com/CP/File.php?F=F_8Aek00I0KXkihUN",
        dataType: "text",
        success: function(result){
            availableTags = jQuery.csv.toArrays(result);
        }
    });
    jQuery(".InputText" ).autocomplete({
        source: availableTags
    });
}
As far as I can tell, the ajax request isn't succeeding. The url I provided it is the View button of the csv file in my library, and it's obviously not a csv, but I don't know how else to proceed. Any help will be greatly appreciated.
I tried a simple AJAX request on your 'csv' URL:
$.ajax({
    url: "https://illinoislas.qualtrics.com/CP/File.php?F=F_8Aek00I0KXkihUN",
    method: "GET"
}).then(function(response) {
    console.log(response);
});
... and ran into the CORS policy error in the console:
Access to XMLHttpRequest at 'https://illinoislas.qualtrics.com/CP/File.php?F=F_8Aek00I0KXkihUN' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
Cross-origin resource sharing (CORS) is a mechanism that allows restricted resources on a web page to be requested from another domain outside the domain from which the first resource was served. ... Certain "cross-domain" requests, notably Ajax requests, are forbidden by default by the same-origin security policy. https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
Basically you can't do AJAX requests to external domains unless the server you are calling is designed to return a header saying it's okay. The same applies to iframes (as soon as you try to access the content of the iframe with javascript). It's a browser security mechanism to protect the user.
The easiest fix would be to make a copy of that 'csv' file and host it on the same domain as your page. You can still use AJAX, pointing it at a relative URL such as "./assets/schools.csv".
Also note, testing your AJAX response with a console.log(result) is an easy way to check you have data coming back from the request.
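A minimal sketch of the same-domain approach follows. It assumes the copied file holds one school name per line (optionally wrapped in double quotes) and lives at the hypothetical path ./assets/schools.csv; parseSchoolList and loadSchools are names invented for this example. Because the request is asynchronous, the autocomplete widget is created only after the data has arrived, rather than right after the request is sent.

```javascript
// Turn the file contents into an array of names.
// Assumption: one school per line, optionally quoted; this is not a full CSV parser.
function parseSchoolList(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/^"(.*)"$/, "$1"))
    .filter((line) => line.length > 0);
}

// Fetch the file and parse it. fetchImpl is injectable so the function
// can be exercised without a network connection.
async function loadSchools(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return parseSchoolList(await response.text());
}

// Browser usage (jQuery UI autocomplete, as in the question):
// loadSchools("./assets/schools.csv").then(function (schools) {
//   jQuery(".InputText").autocomplete({ source: schools });
// });
```

If the real file has several columns or embedded quotes, a proper CSV parser such as the jQuery.csv plugin from the question would be the safer choice.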

(CORS) - Cross-Origin Resource Sharing connection issue

I am currently in the process of creating a browser extension for a university project. However, as I was writing the extension I hit a really weird problem. To explain my situation fully, I need to describe in depth where my issue comes from.
The extension that I am currently working on has to have a feature that checks whether the browser can connect to the internet. That is why I decided to create a very simple AJAX request function and use its result to determine whether the user has an internet connection.
You can see the function below this line.
$.ajax({
    url: "https://enable-cors.org/index.html",
    crossDomain: true,
}).done(function() {
    console.log("The link is active");
}).fail(function() {
    console.log("Please try again later.");
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
So far, as far as I understand what it is doing, it is working fine. For example, if you run the function as it is, it will successfully connect to the URL and proceed with the ".done(function..." branch; if you change the URL to "index273.index", a file which does not exist, it will proceed with the ".fail(function..." branch. I was happy with the result until I decided to test it further and unplugged the network cable from my computer. Then, when I launched the extension, it returned the last result from when the browser still had a connection to the internet. My explanation is that the function caches the URL result and, if it cannot connect, returns the last cached value. My next step to try to solve this was to add "cache: false" after the "crossDomain: true" property, but after that, when I launch the extension, it gives the following error:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://enable-cors.org/index?_=1538599523573. (Reason: CORS header 'Access-Control-Allow-Origin' missing).
If someone can help me out sorting this problem I would be extremely grateful. I would want to apologise in advance for my English but this is not my native language.
PS: I am trying to implement this function in the popup menu, not into the "content_scripts" category. I am currently testing this under Firefox v62.0.3 (the latest available version when I write this post).
Best regards,
George
Maybe instead of calling the URL to check if the internet connection is available you could try using Navigator object: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/connection
Unless the remote server allows your origin (that is, enables CORS), you can't read its responses, because that would be a security issue.
But there are other things you can do:
You can load an image and fire an event when the image has loaded.
You can access remote JSON via a JSONP response.
You can't access other pages, though, unless that server allows it.
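The image idea can be sketched roughly as follows. Loading an image does not require CORS headers, and a unique query parameter (the same trick jQuery's "cache: false" uses when it appends "_=<timestamp>") keeps the browser from answering out of its cache. The names withCacheBuster and probeWithImage, the 5-second timeout, and the image URL in the usage comment are all assumptions made for this example.

```javascript
// Append a unique "_" query parameter so the response cannot come from the cache.
function withCacheBuster(url, now = Date.now()) {
  const busted = new URL(url);
  busted.searchParams.set("_", String(now));
  return busted.toString();
}

// Resolve to true if the image loads, false on error or timeout.
// createImage is injectable; in a browser it defaults to new Image().
function probeWithImage(imageUrl, createImage = () => new Image(), timeoutMs = 5000) {
  return new Promise((resolve) => {
    const img = createImage();
    const timer = setTimeout(() => resolve(false), timeoutMs);
    img.onload = () => { clearTimeout(timer); resolve(true); };
    img.onerror = () => { clearTimeout(timer); resolve(false); };
    img.src = withCacheBuster(imageUrl);
  });
}

// Browser usage (the URL is a placeholder for a small image you control):
// probeWithImage("https://example.com/pixel.png").then(function (online) {
//   console.log(online ? "The link is active" : "Please try again later.");
// });
```

A successful load only shows that this one host was reachable, so the result is a hint about connectivity rather than proof.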

Parse image url from JSON api

I am having problems getting an image URL from a Wordpress JSON API and fill in an image tag.
Here's my non-working code:
$(document).ready(function() {
    $.getJSON('http://interelgroup.com/api/get_post/?post_id=4683', {format: "json"}, function(data) {
        $('#thumb').attr("src", data.post.thumbnail_images.full.url);
    });
});
And the HTML is like:
<img id="thumb" src="#">
What am I doing wrong?
Help appreciated.
Thanks!
NOTE: My real case is dynamic (I am getting a dynamic list of post IDs and looping through them with $.each()), but for this case I am providing an example with an hardcoded post ID, so you can check the JSON returned.
Your problem is that JavaScript can't read an arbitrary cross-origin response: say websiteA.com wants to fetch information from websiteB.com with a plain XMLHttpRequest. The same-origin policy forbids that unless websiteB.com explicitly allows it.
A resource makes a cross-origin HTTP request when it requests a resource from a different domain than the one which served itself. For example, an HTML page served from http://domain-a.com makes an <img> src request for http://domain-b.com/image.jpg. Many pages on the web today load resources like CSS stylesheets, images and scripts from separate domains.
For security reasons, browsers restrict cross-origin HTTP requests initiated from within scripts. For example, XMLHttpRequest follows the same-origin policy. So, a web application using XMLHttpRequest could only make HTTP requests to its own domain. To improve web applications, developers asked browser vendors to allow XMLHttpRequest to make cross-domain requests.
If you know the owner of the website you're trying to read, you can ask them to add your domain to the whitelist in the response headers (the Access-Control-Allow-Origin header). Once they do, you can make as many $.getJSON calls as you want.
Another alternative could be using some sort of backend code to read that website and serve it locally. Say your website is example.com, you could have a PHP script that runs on example.com/retrieve.php where you can query that website, adding the "parameter" you need. After that, since example.com is your own website you can just do a $.getJSON to yourself. There's a simple PHP proxy you can use here with a bit of explanation on why you can do it this way.
A third option would be editing the JavaScript code to use the Yahoo! YQL service. Although there's no guarantee that it will work forever, you can use it to query the website on your behalf and then print whatever you want to the screen. The downside is that this may not be ethically correct if you do not own the website you're trying to fetch (plus, the owner can add a robots.txt file preventing access).
Hope that helps.
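The whitelist option described above comes down to the server sending an Access-Control-Allow-Origin header for approved origins. Here is a hedged sketch of that decision on the server side; ALLOWED_ORIGINS, corsHeadersFor, and the example.com entry are hypothetical, and a real site would wire this into whatever server framework it already uses.

```javascript
// Hypothetical allowlist maintained by the owner of the API.
const ALLOWED_ORIGINS = new Set(["https://example.com"]);

// Return the extra response headers for a request carrying the given Origin header.
// An empty object means the browser will block the cross-origin read.
function corsHeadersFor(requestOrigin, allowed = ALLOWED_ORIGINS) {
  if (!requestOrigin || !allowed.has(requestOrigin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    // Tell caches that the response differs per requesting origin.
    "Vary": "Origin",
  };
}

console.log(corsHeadersFor("https://example.com"));
console.log(corsHeadersFor("https://not-on-the-list.test"));
```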
JSONP solves the problem. Just need to add a callback parameter and specify it is a JSONP, like:
$(document).ready(function() {
    $.getJSON('http://interelgroup.com/api/get_post/?post_id=4683&callback=?', {format: "jsonp"}, function(data) {
        $('#thumb').attr("src", data.post.thumbnail_images.full.url);
    });
});
More info here: Changing getJSON to JSONP
Info on JSONP: https://en.wikipedia.org/wiki/JSONP
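To show what happens behind "callback=?", here is a minimal sketch of JSONP without jQuery: the page registers a global function, then adds a script tag whose URL names that function, and the server responds with JavaScript that calls it with the data. The helper name jsonp and the generated callback names are invented for this example, and it only works if the server supports a callback parameter.

```javascript
let jsonpCounter = 0;

// doc and root are injectable so the sketch can be exercised outside a browser.
function jsonp(url, onData, doc = document, root = globalThis) {
  const callbackName = "jsonpCallback" + (jsonpCounter++);
  const script = doc.createElement("script");

  // The server's response is a script that calls this function with the data.
  root[callbackName] = function (data) {
    delete root[callbackName];
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onData(data);
  };

  const separator = url.includes("?") ? "&" : "?";
  script.src = url + separator + "callback=" + encodeURIComponent(callbackName);
  doc.head.appendChild(script);
  return callbackName;
}

// Browser usage with the URL from the question:
// jsonp("http://interelgroup.com/api/get_post/?post_id=4683", function (data) {
//   document.getElementById("thumb").src = data.post.thumbnail_images.full.url;
// });
```

Since the response is executed as a script, JSONP should only be used with servers you trust.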

How to fetch a Wikipedia webpage with AJAX or fetch()

I want to dynamically fetch a Wikipedia webpage in the browser in order to further process the XHTML with XSLTProcessor.
Unfortunately, this does not work because I can't get Wikipedia to send the "Access-Control-Allow-Origin" header in the HTTP response.
I tried to include the "origin" parameter as it is stated on https://www.mediawiki.org/wiki/Manual:CORS, but without success.
It is important to me to obtain the complete web page HTML as it is obtained by the browser when navigating to that page, so the MediaWiki API is out of the question for me.
This is what I have tried:
var url = "https://en.wikipedia.org/wiki/Star_Trek?origin=https://my-own-page.com";
fetch(url).then(function(response){
    console.log(response);
});
Unfortunately, this does not work because I can't get Wikipedia to send the "Access-Control-Allow-Origin" header in the HTTP response.
No, you can't. It is up to Wikipedia to decide whether they want to explicitly grant JavaScript running on other sites permission to access their pages.
Since this would allow users' personal information to leak (e.g. logged in Wikipedia pages display the user's username, which could be used to enhance a phishing attack), this is clearly something undesirable.
var url = "https://en.wikipedia.org/wiki/Star_Trek?origin=https://my-own-page.com";
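Origin is an HTTP request header, and browsers include it automatically in cross-origin XMLHttpRequest/fetch requests without you needing to do anything special. Adding origin as a query string parameter to an article URL has no effect; the origin parameter described in Manual:CORS applies only to requests to the MediaWiki API (api.php).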
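For comparison, the origin parameter from Manual:CORS is meant for the MediaWiki API endpoint (api.php) rather than for article URLs. The sketch below builds an anonymous API request with origin=*; it returns only the rendered article body as JSON, not the complete page the browser displays, so it may not meet the requirement stated in the question. The helper name buildParseUrl is hypothetical.

```javascript
// Build an anonymous MediaWiki API request that the server answers with CORS headers.
function buildParseUrl(title) {
  const params = new URLSearchParams({
    action: "parse",
    page: title,
    prop: "text",
    format: "json",
    origin: "*",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

console.log(buildParseUrl("Star_Trek"));

// Browser usage:
// fetch(buildParseUrl("Star_Trek"))
//   .then((response) => response.json())
//   .then((data) => console.log(data.parse.text["*"]));
```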

Steam API Get SteamID using Javascript

Been running into what appears to be the Same Origin Policy which is causing quite some headache!
To cut to the chase, I am essentially trying to acquire a user's steam64id when only supplied their username.
For example, my username: "Emperor_Jordan" I would go to:
http://steamcommunity.com/id/emperor_jordan?xml=1
And the steamid I need is right at the top. So I figured I would use JQuery Ajax to acquire this and parse out the id I need for later usage (steamapi usage requires the steam64id) as follows. Here is a snippet of the code in question:
$.ajax({
    url: "http://steamcommunity.com/id/emperor_jordan/?xml=1",
    dataType: "xml",
    complete: function()
    {
        alert(this.url)
    },
    success: parse
});
function parse(xml)
{
    alert("parsing");
    _steamID = $(xml).find("steamID64").text();
}
The problem here is while I do get the alert for the completion, I never see "parsing". Ever. It never gets that callback, which leads me to believe I am running into the SOP (same origin policy).
Am I approaching this the wrong way, is there a workaround?
Thanks!
Correct. You are running into the same-origin policy:
XMLHttpRequest cannot load http://steamcommunity.com/id/emperor_jordan/?xml=1. Origin http://fiddle.jshell.net is not allowed by Access-Control-Allow-Origin.
and it looks like Steam does not offer a cross-origin solution like JSONP. That means you're back to the old-but-reliable solution: fetch the data on your server, not in the browser.
Some relevant feedback on the Steam Web API: https://developer.valvesoftware.com/wiki/Steam_Web_API/Feedback#API_Considerations_for_Web_Developers
You need to create a proxy server (on Heroku, for example) in order to get the data. CORS restricts calls made directly from the browser, not server-to-server requests. So we need a proxy server to send the requests and receive the data on our behalf. It's working for me.
Thanks in advance.
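A rough sketch of the server-side part of such a proxy is shown below, assuming Node 18 or later (for the global fetch). The function names are invented for this example, the https scheme is an assumption, and the regular expression assumes the profile XML contains a plain <steamID64> element as described in the question; it is not a general XML parser.

```javascript
// Build the profile URL for a given vanity name (hypothetical helper).
function steamProfileUrl(username) {
  return "https://steamcommunity.com/id/" + encodeURIComponent(username) + "/?xml=1";
}

// Pull the 64-bit ID out of the XML text; returns null if it is missing.
function extractSteamId64(xml) {
  const match = /<steamID64>(\d+)<\/steamID64>/.exec(xml);
  return match ? match[1] : null;
}

// Runs on your server, where the same-origin policy does not apply.
// fetchImpl is injectable so the function can be tested without a network.
async function lookupSteamId64(username, fetchImpl = fetch) {
  const response = await fetchImpl(steamProfileUrl(username));
  if (!response.ok) {
    throw new Error("Upstream responded with status " + response.status);
  }
  return extractSteamId64(await response.text());
}
```

Your own page would then call an endpoint on your server that wraps lookupSteamId64, which is a same-origin request and therefore not blocked.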
