I am using the Facebook Graph API to get content from a Facebook fan page and display it on a website. I am doing it like this, and it works, but it seems that my hosting provider is rate-limiting my requests after a while. So I would like to cache the response and only make a new request every 8 hours, for example.
$data = get_data("https://graph.facebook.com/12345678/posts?access_token=1111112222233333&limit=20&fields=full_picture,link,message,likes,comments&date_format=U");
$result = json_decode($data);
The get_data function uses cURL in the following way:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $datos = curl_exec($ch);
    curl_close($ch);
    return $datos;
}
This works fine: I can output the JSON response and use it on my website to display the content. But as I mentioned, on my hosting this seems to fail every so often, I guess because I am being rate-limited. I have tried to cache the response using some code I saw here on Stack Overflow, but I cannot figure out how to integrate the two pieces of code. I have managed to create the cache file, but I cannot manage to read correctly from the cached file and avoid making a new request to the Facebook Graph API.
// cache files are created like cache/abcdef123456...
$cacheFile = 'cache' . DIRECTORY_SEPARATOR . md5($url);

if (file_exists($cacheFile)) {
    $fh = fopen($cacheFile, 'r');
    $cacheTime = trim(fgets($fh));

    // if data was cached recently, return cached data
    if ($cacheTime > strtotime('-60 minutes')) {
        $cached = fread($fh, filesize($cacheFile)); // fread() needs a length argument
        fclose($fh);
        return $cached;
    }

    // else delete cache file
    fclose($fh);
    unlink($cacheFile);
}

// $json is the fresh response fetched from the Graph API
$fh = fopen($cacheFile, 'w');
fwrite($fh, time() . "\n");
fwrite($fh, $json);
fclose($fh);

return $json;
Many thanks in advance for your help!
There are a few things that can come in handy when trying to build a cache and to cache actual objects (or even arrays).
The functions serialize and unserialize let you get a string representation of an object or an array, so you can cache it as plain text and later restore the object/array exactly as it was from that string (see the short example after these notes).
filemtime, which returns the last modification time of a file, lets you rely on that timestamp once the cache file has been created to check whether your cache is outdated, as you tried to implement.
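A quick illustration of that serialize/unserialize round trip (plain standard-library PHP; the file name is just an example):

$posts = array('id' => 123, 'message' => 'Hello');   // any array or object

$asText = serialize($posts);                          // plain string, safe to write to a file
file_put_contents('cache_example.txt', $asText);

$restored = unserialize(file_get_contents('cache_example.txt'));
var_dump($restored['message']);                       // string(5) "Hello"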
And here is the whole working code:
function get_data($url) {
    /** @var string $cache_file path/to/the/cache/file/based/on/md5/url */
    $cache_file = 'cache' . DIRECTORY_SEPARATOR . md5($url);

    if (file_exists($cache_file)) {
        /**
         * Using the last modification date of the cache file to check its validity
         */
        if (filemtime($cache_file) < strtotime('-60 minutes')) {
            unlink($cache_file);
        } else {
            echo 'TRACE -- REMOVE ME -- read from cache';
            /**
             * Unserializing the content of the cache file
             * so it gets its original "shape": object, array, ...
             */
            return unserialize(file_get_contents($cache_file));
        }
    }

    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $data = curl_exec($ch);
    curl_close($ch);

    /**
     * We actually did the curl call, so we need to (re)create the cache file
     * with the string representation of the curl response we got from serialize
     */
    file_put_contents($cache_file, serialize($data));

    return $data;
}
PS: note that I changed the $datos variable in your original get_data function to the more common $data.
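For completeness, the call from your question stays exactly the same and now goes through the cache transparently; to get the 8-hour window you asked about, change the '-60 minutes' check inside get_data() to '-8 hours' (a sketch, using the placeholder page ID and token from your question):

// Same call as before; get_data() now returns cached data when it is still fresh.
$data = get_data("https://graph.facebook.com/12345678/posts?access_token=1111112222233333&limit=20&fields=full_picture,link,message,likes,comments&date_format=U");
$result = json_decode($data); // decodes the cached or freshly fetched JSON string either way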
This answer will add a few more dependencies to your project, but it may be well worth it instead of rolling your own stuff.
You could use the Guzzle HTTP client, coupled with the HTTP Cache plugin.
$client = new Client('http://www.test.com/');

$cachePlugin = new CachePlugin(array(
    'storage' => new DefaultCacheStorage(
        new DoctrineCacheAdapter(
            new FilesystemCache('/path/to/cache/files')
        )
    )
));

$client->addSubscriber($cachePlugin);

$request = $client->get('https://graph.facebook.com/12345678/posts?access_token=1111112222233333&limit=20&fields=full_picture,link,message,likes,comments&date_format=U');
$request->getParams()->set('cache.override_ttl', 3600 * 8); // 8 hours
$data = $request->send()->getBody();
$result = json_decode($data);
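The class names above are from the Guzzle 3 series together with doctrine/cache; assuming you install those via Composer, the imports would look roughly like this (double-check against the versions you actually pull in):

// Assumes Guzzle 3.x (guzzle/guzzle) and doctrine/cache installed via Composer
use Guzzle\Http\Client;
use Guzzle\Plugin\Cache\CachePlugin;
use Guzzle\Plugin\Cache\DefaultCacheStorage;
use Guzzle\Cache\DoctrineCacheAdapter;
use Doctrine\Common\Cache\FilesystemCache;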
Not sure if you can use memcache; if you can:

$cacheFile = 'cache' . DIRECTORY_SEPARATOR . md5($url);

$mem = new Memcached();
$mem->addServer("127.0.0.1", 11211);

$cached = $mem->get($cacheFile);
if ($cached) {
    return $cached;
} else {
    $data = get_data($url);
    // get_data() already returns the raw JSON string, so store it as-is
    $mem->set($cacheFile, $data, time() + 60 * 10); // 10 min
    return $data;
}
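The same idea wrapped as a helper, so callers never have to think about the cache (a sketch; get_data_cached is a name made up here, and it assumes the Memcached extension with a local memcached daemon):

function get_data_cached($url) {
    $key = md5($url);                  // memcached key derived from the URL
    $mem = new Memcached();
    $mem->addServer('127.0.0.1', 11211);

    $cached = $mem->get($key);
    if ($cached !== false) {
        return $cached;                // cache hit
    }

    $data = get_data($url);            // raw JSON string from the Graph API
    $mem->set($key, $data, 3600 * 8);  // keep for 8 hours, as asked in the question
    return $data;
}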
If your hosting provider is pushing all of your outbound requests through a caching proxy server, you can try to defeat it by adding an extra parameter near the beginning of the request:
https://graph.facebook.com/12345678/posts?p=(randomstring)&access_token=1111112222233333&limit=20&fields=full_picture,link,message,likes,comments&date_format=U
I have used this successfully for outbound calls to third-party data providers. Of course, I don't know whether this is actually your issue, and the technique can backfire if the provider rejects requests with parameters it doesn't expect.
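A minimal sketch of that idea in PHP (the parameter name p is arbitrary, and, as noted above, the target API must tolerate parameters it doesn't expect):

// Append a throwaway parameter so an intermediate proxy sees a unique URL each time.
$busted = $url . (strpos($url, '?') === false ? '?' : '&') . 'p=' . uniqid();
$data   = get_data($busted);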
I'm working on an open source project (TAO test assessment). In this application you can create items/questions and preview each question. With those questions you can build your test, but the problem is there is no test preview or printable test.
I'm trying to find a way to do this. One of my ideas was to take all the items from the test, loop over them, load each one in an iframe, and then combine them into one view. That did not work, of course.
My question is: is it possible to make a request to my own server from PHP code to get the HTML content, and then combine the results into one view in PHP?
I saw functions like file_get_contents, but it returns false for me. In my php.ini file this is my configuration:
allow_url_fopen = On
allow_url_include = Off
EDIT:
So this is a preview of one item/question.
EDIT:
A test consists of items, and the problem is there is no option to preview them all, except clicking on each one and viewing them individually. So I tried opening each item/question in a separate iframe so I could see them all in one view.
If the file is on the same server, i.e. somewhere the local PHP installation can read directly:
Option 1:
<?php
$filename = "path_to_file";
$file = fopen($filename, "r");
if ($file == false) {
    echo "Error opening file";
    exit();
}
$filesize = filesize($filename);
$filetext = fread($file, $filesize);
fclose($file);

echo "File size : $filesize bytes";
echo "<pre>$filetext</pre>";
?>
Option 2:
<?php
include "path_to_file.php"
?>
If the file is located on another server:
<?php
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://server.com/path_to_file');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
// display file
echo $file_contents;
?>
I am trying to load a page with cURL so I can work with its source code.
Right now I do the following things:
// curl handle
$curl = curl_init('https://www.example.de/page.html');

// curl options
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FILETIME, true);

// catch the source code
$content = curl_exec($curl);
curl_close($curl);

// build a DOM object
$dom = new DomDocument();
$dom->loadHTML($content);
#$dom->loadHTMLFile($content);
$xpath = new DOMXpath($dom);
As a result I do not get the whole source code, only the JavaScript part of the website that runs before the main HTML source is generated.
If I print_r($content) I see a correct copy of the page at that URL.
Maybe this output can help: "DOMXPath Object ( [document] => (object value omitted) )". In the end it makes no difference whether I use loadHTML or loadHTMLFile; neither gives me a workable DOM object.
What did I do wrong?
As a result I do not get the whole source code, only the JavaScript part of the website that runs before the main HTML source is generated.
So it seems that you are requesting a website which is mainly driven by JavaScript and does not include the resulting HTML/DOM in the initial markup. PHP cannot "execute" the website, which would include running the JavaScript so that the DOM gets generated. You can try to use a headless browser like PhantomJS to run the page and extract the resulting HTML from there (a rough sketch follows below). But I suspect that won't give you the result you are looking for; for instance, event listeners attached by the JS won't work if you just export the markup. Additionally, when delivered to the client, the JS will run again, which might result in unexpected things like duplicated content.
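If you do want to try the headless route, a rough sketch of handing the page to PhantomJS and reading the rendered markup back into PHP could look like this (assumes the phantomjs binary is installed and on the PATH; the file names and the URL are placeholders):

// Let PhantomJS execute the page's JavaScript, then hand the rendered markup back to PHP.
$js = <<<'JS'
var page = require('webpage').create();
page.open('https://www.example.de/page.html', function (status) {
    if (status === 'success') {
        console.log(page.content);   // the DOM after the page's JS has run
    }
    phantom.exit();
});
JS;
file_put_contents('/tmp/render.js', $js);

$content = shell_exec('phantomjs /tmp/render.js');

$dom = new DOMDocument();
@$dom->loadHTML($content);        // suppress warnings from imperfect real-world HTML
$xpath = new DOMXPath($dom);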
I am attempting to scrape data from a new window (opened with JavaScript's window.open()) that is generated by the site I am posting to via cURL, but I am unsure how to go about this.
The target site only generates the needed data when certain parameters are posted to it, and no other way.
The following code simply dumps the result of the cURL request, but the result does not contain any relevant data.
My code:
// build post data for the request
$proofData = array(
    "formula" => $formula,
    "proof"   => $proof,
    "action"  => $action
);
$postProofData = http_build_query($proofData);

$ch = curl_init($url); // open connection

// sort curl settings for the request
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);            // CURLOPT_POST expects a boolean
curl_setopt($ch, CURLOPT_POSTFIELDS, $postProofData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // so $result holds the response body

// obtain data from LigLab
$result = curl_exec($ch);

// finish connection
curl_close($ch);

echo "formula: " . $formula;
var_dump($result);
The following is the code that the target site generates:
var proof = "<?php echo str_replace("\n","|",$annoted_proof) ?>";
var lines = proof.split('|');
proof_window=window.open("","Proof and Justifications","scrollbar=yes,resizable=yes, titlebar=yes,menubar=yes,status=yes,width= 800, height=800, alwaysRaised=yes");
for(var i = 0;i < lines.length;i++){
proof_window.document.write(lines[i]);
proof_window.document.write("\n");
}
I want to scrape the lines variable but it is generated after page load and after user interaction.
You can't get at JavaScript-generated content with cURL; it only fetches the raw source and does not execute scripts.
You have to use a headless browser, which emulates a real browser including events (clicks, hovers) and JavaScript execution.
You can start here: http://www.simpletest.org/en/browser_documentation.html, or with the Stack Overflow question "PHP Headless Browser?".
OK, so for about a week now I've been doing tons of research on making XMLHttpRequests to servers, and I have learned a lot about CORS, AJAX/jQuery requests, and the Google Feed API, but I am still completely lost.
The Goal:
There are two sites in the picture, both of which I have access to. The first is a WordPress site which has the RSS feed, and the other is my localhost site running off of XAMPP (soon to be a published site when I'm done). I am trying to get the RSS feed from the WordPress site and display it on my localhost site.
The Issue:
I run into the infamous Access-Control-Allow-Origin error in the console. I know that I can fix that by setting the header in the .htaccess file of the WordPress site, but there are online aggregators that are able to read and display the feed when I simply give them the link. So I don't really know what those sites are doing that I'm not, and what the best way is to achieve this without posing any easy security threats to either site.
I would much prefer not to use any third-party plugins for this; I would like to aggregate the feed through my own code, as I have done for an RSS feed on the localhost site, but if I have to I will.
UPDATE:
I've made HUGE progress learning PHP and have finally got a working bit of code that lets me download the feed files from their various sources and store them in cache files on the server. What I have done is put an AJAX request behind some buttons on my site which switch between the RSS feeds. The AJAX request POSTs a JSON-encoded array containing some data to my PHP file, which then downloads the requested feed via cURL (an http_get_contents helper copied from a GitHub dev, as I don't know how to use cURL yet), stores it in a cache file named after the md5 of the URL, filters what I need from the data, and sends it back to the front end. However, I have two more questions... (it's funny how that works: you get one answer and end up with two more questions).
Question #1: Where should I store the cache files and the PHP files on the server? I have heard that you are supposed to store them outside the web root, but I am not sure how to access them from there.
Question #2: When I look at the site's sources in the browser while clicking the buttons that send the AJAX request, the PHP file shows up in the list of source files, and more and more copies of it are downloaded as I keep clicking the buttons. Is there a way to prevent this? I may have to implement another method to get this working.
Here is my working PHP:
<?php
// cURL http_get_contents declaration
function http_get_contents($url, $opts = array()) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, "{$_SERVER['SERVER_NAME']}");
    curl_setopt($ch, CURLOPT_URL, $url);
    if (is_array($opts) && $opts) {
        foreach ($opts as $key => $val) {
            curl_setopt($ch, $key, $val);
        }
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if (false === ($retval = curl_exec($ch))) {
        die(curl_error($ch));
    } else {
        return $retval;
    }
}
//receive and decode $_POSTed array
$post = json_decode($_POST['jsonString'], true);
$url = $post[0];
$xmn = $post[1]; //starting item index number (i.e. to return 3 items from the feed, starting with the 5th one)
$xmx = $xmn + 3; //max number (so three in total to be returned)
$cache = '/tmp/' . md5($url) . '.html';
$cacheint = 0; //this is how I set if the feed will be downloaded from the site it is from, or if it will be read from the cache file, I will implement a way to check if there is a newer version of the file on the other site in the future
// if the cache file doesn't exist (or is stale), download the feed and write it to the cache file
if (!file_exists($cache) || ((time() - filemtime($cache)) > 3600 * $cacheint)) {
    $feed_content = http_get_contents($url); // fetch once; the original called this twice
    if ($feed_content !== false) {
        $fp = fopen($cache, 'w');
        fwrite($fp, $feed_content);
        fclose($fp);
    }
}
//parse and echo results
$content = file_get_contents($cache);
$x = new SimpleXmlElement($content);
$item = $x->channel->item;
echo '<tr>';
for ($i = $xmn; $i < $xmx; $i++) {
    echo '<td class="item"><p class="title clear">' .
        $item[$i]->title .
        '</p><p class="desc">' .
        substr($item[$i]->description, 0, 250) .
        '... <a href="' .
        $item[$i]->link .
        '" target="_blank">more</a></p><p class="date">' .
        $item[$i]->pubDate .
        '</p></td>';
}
echo '</tr>';
?>
I noticed that at http://avengersalliance.wikia.com/wiki/File:Effect_Icon_186.png there is a small image. Click on it and you will be brought to another page: http://img2.wikia.nocookie.net/__cb20140312005948/avengersalliance/images/f/f1/Effect_Icon_186.png.
For http://avengersalliance.wikia.com/wiki/File:Effect_Icon_187.png, after clicking on the image there, you are brought to another page: http://img4.wikia.nocookie.net/__cb20140313020718/avengersalliance/images/0/0c/Effect_Icon_187.png
There are many similar sites, from http://avengersalliance.wikia.com/wiki/File:Effect_Icon_001.png, to http://avengersalliance.wikia.com/wiki/File:Effect_Icon_190.png (the last one).
I'm not sure whether the image URL is related to the URL of its parent page, but is it possible to derive the string http://img2.wikia.nocookie.net/__cb20140312005948/avengersalliance/images/f/f1/Effect_Icon_186.png from the string http://avengersalliance.wikia.com/wiki/File:Effect_Icon_186.png, using PHP or JavaScript? I would appreciate your help.
Here is a small PHP script that can do this. It uses cURL to fetch the page and DOMDocument to parse the HTML.
<?php
/*
 * For educational purposes only
 */
function get_wiki_image($url = '') {
    if (empty($url)) return;

    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $output = curl_exec($curl);
    curl_close($curl);

    $DOM = new DOMDocument;
    libxml_use_internal_errors(true);
    $DOM->loadHTML($output);
    libxml_use_internal_errors(false);

    return $DOM->getElementById('file')->firstChild->firstChild->getAttribute('src');
}

echo get_wiki_image('http://avengersalliance.wikia.com/wiki/File%3aEffect_Icon_186.png');
In the browser you can, for example, access the element by class name, select the one you want with [n], and then read its src with getAttribute:
document.getElementsByClassName('icon cup')[0].getAttribute('src')
Hope it helps