I'm attempting to download a CSV file from Yahoo Finance with this code:
$(function () {
    $(document).ready(function() {
        $.get("http://download.finance.yahoo.com/d/quotes.csv?f=snl1d1t1c1ohg&s=AAPL", function(data) {
            var output = data.split(new RegExp(",|\r")).map(function (element) {
                alert($.trim(element).toLowerCase());
                return $.trim(element).toLowerCase();
            });
        });
    });
});
You can see I put the alert in there (for debugging purposes) but I'm not getting the alert. Is there something wrong with this code? (some of the code was taken from how to create an array by reading text file in javascript)
Here's a jsFiddle for easy edits/help.
This is blocked by the same-origin policy.
Options:
find another service that provides access to the data via JSONP, or have CORS enabled on the data source.
use a server-side proxy to read the data (see the sketch after the PHP example below).
Check this out with PHP; you can tailor it to suit your needs.
function queryphp($url)
{
    $portal = curl_init();
    curl_setopt($portal, CURLOPT_URL, $url);
    curl_setopt($portal, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($portal);
    curl_close($portal);
    if (!$output) {
        // the request failed, so send the visitor to an error page and stop
        header('Location: http://www.yourwebsite.com/errorpage.php');
        exit;
    }
    return $output;
}

// Example usage:
// $page_data = queryphp("http://www.whatever.com/whateverpage.php?var1=whatever&var2=whatever");
// Now you have the output from whateverpage.php saved as a string, which you can append
// anywhere in your current page's output. #repetitive code reduction
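Tying this back to the original question: a minimal sketch of a proxy endpoint built on the function above. The file name quotes.php and its placement next to the calling page are assumptions for illustration; the jQuery $.get would then target "quotes.php" instead of the Yahoo URL, and the CSV-splitting code can stay as it is.
<?php
// quotes.php - hypothetical same-origin proxy for the Yahoo Finance CSV
// (assumes queryphp() from above is defined or included in this file)
$csv = queryphp("http://download.finance.yahoo.com/d/quotes.csv?f=snl1d1t1c1ohg&s=AAPL");

header('Content-Type: text/csv'); // serve the CSV from the same origin as the page
echo $csv;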
Related
I'm trying to load a page with cURL to work with the source code.
Now I do the following things:
// curl handle
$curl = curl_init('https://www.example.de/page.html');
// curl options
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FILETIME, true);
// catch sourcecode
$content = curl_exec($curl);
curl_close($curl);
// make a dom-object
$dom = new DOMDocument();
$dom->loadHTML($content);
#$dom->loadHTMLFile($content);
$xpath = new DOMXpath($dom);
As a result I don't get the whole source code, only the JavaScript part of the website that generates the main HTML source code.
If I print_r($content) I see a correct copy of the website at that URL.
Maybe this output helps: "DOMXPath Object ( [document] => (object value omitted) )". In the main result it makes no difference whether I use loadHTML or loadHTMLFile; neither generates a workable DOM object.
What did I do wrong?
As a result I don't get the whole source code, only the JavaScript part of the website that generates the main HTML source code.
So it seems that you are requesting a website which is mainly driven by JS and does not include the resulting HTML/DOM within the markup. PHP cannot »execute« the website, which would include running the JavaScript so that the DOM gets generated. You can try to use a headless browser like PhantomJS to run the page and extract the resulting HTML from there. But I guess that won't give you the result you are looking for; for instance, event listeners attached by the JS won't work if you just export the markup. Additionally, when delivered to the client, the JS will run again, which might result in unexpected things like duplicated content.
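As a side note, the DOMXPath dump in the question is normal: print_r only shows the document reference, not the parsed tree. Once the HTML you load actually contains the markup you want (i.e. it is not built by JS at runtime), a minimal sketch of querying it could look like this; the //div[@class='example'] expression is only an assumed placeholder for whatever nodes you are after:
// assuming $content already holds static HTML returned by curl_exec()
$dom = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about sloppy real-world markup
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXpath($dom);
// hypothetical query: grab every <div class="example"> node
$nodes = $xpath->query("//div[@class='example']");
foreach ($nodes as $node) {
    echo $node->textContent . "\n"; // print the text content of each matched node
}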
I am attempting to scrape data from a new window (opened with JavaScript's window.open()) that is generated by the site I am posting to via cURL, but I am unsure how to go about this.
The target site only generates this needed data when certain parameters are POSTed to it, and no other way.
The following code simply dumps the result of the cURL request, but the result does not contain any of the relevant data.
My code:
//build post data for request
$proofData = array(
    "formula" => $formula,
    "proof"   => $proof,
    "action"  => $action
);
$postProofData = http_build_query($proofData);

$ch = curl_init($url); //open connection

//sort curl settings for request
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postProofData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //return the response instead of printing it

//obtain data from LigLab
$result = curl_exec($ch);

//finish connection
curl_close($ch);

echo "formula: " . $formula;
var_dump($result);
The target site generates the following code:
var proof = "<?php echo str_replace("\n","|",$annoted_proof) ?>";
var lines = proof.split('|');
proof_window=window.open("","Proof and Justifications","scrollbar=yes,resizable=yes, titlebar=yes,menubar=yes,status=yes,width= 800, height=800, alwaysRaised=yes");
for(var i = 0; i < lines.length; i++){
    proof_window.document.write(lines[i]);
    proof_window.document.write("\n");
}
I want to scrape the lines variable but it is generated after page load and after user interaction.
You can't get at content produced by executed JavaScript with cURL.
You have to use a headless browser, which emulates a real browser with events (clicks, hovers and JavaScript execution).
You can start here: http://www.simpletest.org/en/browser_documentation.html or here: PHP Headless Browser?
OK, so for about a week now I've been doing tons of research on making XMLHttpRequests to servers, and I have learned a lot about CORS, AJAX/jQuery requests and the Google Feed API, but I am still completely lost.
The Goal:
There are two sites in the picture, both of which I have access to. The first is a WordPress site which has the RSS feed, and the other is my localhost site running off of XAMPP (soon to be a published site when I'm done). I am trying to get the RSS feed from the WordPress site and display it on my localhost site.
The Issue:
I run into the infamous Access-Control-Allow-Origin error in the console. I know that I can fix that by setting the header in the .htaccess file of the website, but there are online aggregators that are able to read and display the feed when I simply give them the link. So I don't really know what those sites are doing that I'm not, and what the best way is to achieve this without posing any easy security threats to either site.
I would much prefer not to use any third-party plugins for this; I would like to aggregate the feed through my own code, as I have done for an RSS feed on the localhost site, but if I have to I will.
UPDATE:
I've made HUGE progress with learning PHP and have finally got a working bit of code that lets me download the feed files from their various sources and store them in cache files on the server. What I have done is put an AJAX request behind some buttons on my site, which switch between the RSS feeds. The AJAX request POSTs a JSON-encoded array containing some data to my PHP file, which downloads the requested feed via cURL (the http_get_contents function was copied from a GitHub dev, as I don't know how to use cURL yet) and stores it in an md5-named cache file; the script then filters what I need from the data and sends it back to the front end. However, I have two more questions... (it's funny how that works: getting one answer and ending up with two more questions).
Question #1: Where should I store both the cache files and the PHP files on the server? I have heard that you are supposed to store them below the web root, but I am not sure how to access them that way.
Question #2: When I look at the source of the site in the browser as I click the buttons that send the AJAX request to the PHP file, the PHP file visibly shows up in the list of source files, and more and more copies of it are downloaded as the buttons are clicked. Is there a way to prevent this? I may have to implement another method to get this working.
Here is my working PHP:
<?php
//cURL http_get_contents declaration
function http_get_contents($url, $opts = array()) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, "{$_SERVER['SERVER_NAME']}");
    curl_setopt($ch, CURLOPT_URL, $url);
    if (is_array($opts) && $opts) {
        foreach ($opts as $key => $val) {
            curl_setopt($ch, $key, $val);
        }
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if (false === ($retval = curl_exec($ch))) {
        die(curl_error($ch));
    }
    curl_close($ch);
    return $retval;
}
//receive and decode $_POSTed array
$post = json_decode($_POST['jsonString'], true);
$url = $post[0];
$xmn = $post[1]; //starting item index number (i.e. to return 3 items from the feed, starting with the 5th one)
$xmx = $xmn + 3; //max number (so three in total to be returned)
$cache = '/tmp/' . md5($url) . '.html';
$cacheint = 0; //controls whether the feed is downloaded from its source site or read from the cache file; in the future I will implement a check for a newer version of the file on the source site
//if the cache file doesn't exist or is stale, download the feed and write it to the cache file
if(!file_exists($cache) || ((time() - filemtime($cache)) > 3600 * $cacheint)) {
    $feed_content = http_get_contents($url);
    if($feed_content) {
        $fp = fopen($cache, 'w');
        fwrite($fp, $feed_content);
        fclose($fp);
    }
}
//parse and echo results
$content = file_get_contents($cache);
$x = new SimpleXmlElement($content);
$item = $x->channel->item;

echo '<tr>';
for($i = $xmn; $i < $xmx; $i++) {
    echo '<td class="item"><p class="title clear">' .
        $item[$i]->title .
        '</p><p class="desc">' .
        substr($item[$i]->description, 0, 250) .
        '... <a href="' .
        $item[$i]->link .
        '" target="_blank">more</a></p><p class="date">' .
        $item[$i]->pubDate .
        '</p></td>';
}
echo '</tr>';
?>
I noticed that at http://avengersalliance.wikia.com/wiki/File:Effect_Icon_186.png there is a small image. Click on it, and you will be brought to another page: http://img2.wikia.nocookie.net/__cb20140312005948/avengersalliance/images/f/f1/Effect_Icon_186.png.
For http://avengersalliance.wikia.com/wiki/File:Effect_Icon_187.png, after clicking on the image there, you are brought to another page: http://img4.wikia.nocookie.net/__cb20140313020718/avengersalliance/images/0/0c/Effect_Icon_187.png
There are many similar sites, from http://avengersalliance.wikia.com/wiki/File:Effect_Icon_001.png, to http://avengersalliance.wikia.com/wiki/File:Effect_Icon_190.png (the last one).
I'm not sure whether the image link is somehow related to the link of its parent page, but may I know: is it possible to get the string http://img2.wikia.nocookie.net/__cb20140312005948/avengersalliance/images/f/f1/Effect_Icon_186.png from the string http://avengersalliance.wikia.com/wiki/File:Effect_Icon_186.png, using PHP or JavaScript? I would appreciate your help.
Here is a small PHP script that can do this. It uses cURL to get the content and DOMDocument to parse the HTML.
<?php
/*
 * For educational purposes only
 */
function get_wiki_image($url = '') {
    if(empty($url)) return;
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $output = curl_exec($curl);
    curl_close($curl);
    $DOM = new DOMDocument;
    libxml_use_internal_errors(true);
    $DOM->loadHTML($output);
    libxml_use_internal_errors(false);
    return $DOM->getElementById('file')->firstChild->firstChild->getAttribute('src');
}
echo get_wiki_image('http://avengersalliance.wikia.com/wiki/File%3aEffect_Icon_186.png');
You can access it by class, for example, and then select the one that you want with [n]; after that, getAttribute and you've got it:
document.getElementsByClassName('icon cup')[0].getAttribute('src')
Hope it helps
Let's say I have this web link: http://checkip.amazonaws.com/. I've made a Java program (an actual program, not a webpage) that reads the content of this webpage (e.g. "25.25.25.25") and displays it in a JLabel (using NetBeans IDE 1.7.3), and it works.
Now how can I read the contents of this same webpage (e.g. "25.25.25.25") and display it as normal text on a webpage? (The final webpage must be .html, not .php or whatever.)
I don't mind any script, whether it's HTML or JavaScript or anything; I just need it to work so that when the webpage is opened it can show something like:
"Your IP: 25.25.25.25"
Preferably reading the contents of http://checkip.amazonaws.com/ into
<script>var ip = needCodeHere</script>
If I can get the IP into a var or read the contents of that webpage into a var I'm happy, but other code is fine too as long as it works.
Please help :( I've been staring at Google for days and can't find a solution.
You'll need three files (in the same directory) to do that: an HTML file to show the IP, a PHP file to get that IP via cURL, and a JS file to connect the HTML and the PHP. It would be simpler if the "final webpage" could be the ip.php itself (see the sketch at the end of this answer), but let's do it this way:
1) ip.html (the "final webpage")
<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="ip.js"></script>
<div id="ip"></div>
2) ip.php
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://checkip.amazonaws.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // capture the response instead of printing it directly
$result = curl_exec($curl);
curl_close($curl);
echo $result; // output the IP so the AJAX call receives it
?>
3) ip.js
$.ajax({
    url: 'ip.php',
    type: "GET",
    success: function (data) {
        $('#ip').html("Your IP: " + data);
    }
});
Let me know if you need more explanation.
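For reference, a minimal sketch of the simpler single-file variant mentioned above, assuming the final page could be a .php file rather than .html:
<?php
// standalone variant: fetch the IP server-side and print it straight into the page
$curl = curl_init('http://checkip.amazonaws.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$ip = trim(curl_exec($curl));
curl_close($curl);
?>
<div id="ip">Your IP: <?php echo htmlspecialchars($ip); ?></div>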