Parse from javascript var with SimpleHTMLDom - javascript

I have this code that fetches the source of a page with cURL:
$url = 'http://source-page.com';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // add this one, it seems to spawn redirect 301 header
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); // spoof
$output = curl_exec($ch);
curl_close($ch);
$html = str_get_html($output);
In $output I have this:
var flashvars = {
"image_url":"http://path-to-image.com",
"video_title":"This is video title",
"videoUrl":"http://this-is-path-to-mp4.com"
}
I want to echo videoUrl, and I have tried this:
$videoUrl = $html->find('flashvars[0].videoUrl');
echo $videoUrl;
But it gives me empty results. What is a good way to do this?

Someone else suggested regex + json_decode and then deleted the answer.
Here's what I would do:
$output = <<<EOF
var flashvars = {
"image_url":"http://path-to-image.com",
"video_title":"This is video title",
"videoUrl":"http://this-is-path-to-mp4.com"
}
EOF;
// preg_match returns 1 on a match; the captured object literal is in $m[1]
if (preg_match('/var flashvars = (\{.*?\})/s', $output, $m)) {
    $data = json_decode($m[1], true);
    echo $data['videoUrl'];
}
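As a slightly more defensive sketch of the same idea, the lookup can be wrapped in a function that returns null when the variable is missing or the object is not valid JSON. The function name extractJsObject is made up for illustration, and the non-greedy pattern assumes the object contains no nested braces, as in the sample above.

```php
<?php
// Hypothetical helper (not part of SimpleHTMLDom): pull a flat JS object
// literal assigned as `var <name> = {...}` out of a page and decode it.
function extractJsObject(string $source, string $varName): ?array
{
    // Non-greedy match up to the first closing brace; assumes no nested objects.
    $pattern = '/var\s+' . preg_quote($varName, '/') . '\s*=\s*(\{.*?\})/s';
    if (!preg_match($pattern, $source, $m)) {
        return null; // variable not found
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null; // null when the literal is not valid JSON
}

$output = <<<EOF
var flashvars = {
"image_url":"http://path-to-image.com",
"video_title":"This is video title",
"videoUrl":"http://this-is-path-to-mp4.com"
}
EOF;

$flashvars = extractJsObject($output, 'flashvars');
echo $flashvars['videoUrl'] ?? 'videoUrl not found';
```

If the real page nests objects inside flashvars, the pattern would need to change; this sketch only covers the flat case shown in the question.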

Related

How to call a JavaScript function from PHP if click and curl?

I have a small project in which I fetch data from a domain and put that information into input fields.
The cURL function works. However, the jQuery script does not run or fill the input fields. If I hard-code $url = "http://domain...";, everything works on page load, but if I take the URL from an input field with a button and a POST form, the fields stay empty. The cURL request still works and returns the full page.
How can I run the script from the same button, after the cURL request has finished?
Button:
<form action="" method="POST">
<label for="name">URLinput</label>
<input type="url" id="inf_endpoint" name="inf_endpoint" value="" />
<button type="submit" name="mytest">Test This</button>
</form>
What I have tried:
<?php
if(isset($_POST['mytest'])){
$url=$_POST['inf_endpoint'];
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, html_entity_decode($url));
$data = curl_exec($ch);
if($result === false){
echo 'Curl error: ' . curl_error($ch);
}else{
echo 'All is good';
?>
<script>
jQuery.ajax({
url: '<?php echo site_url('admin/matches/manage'); ?>',
type: 'GET',
success: function(res) {
var data = jQuery.parseHTML(res);
jQuery(data).find('div.right').each(function(){
$('#date').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team a:first').each(function(){
$('#team1').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team a:nth-child(2)').each(function(){
$('#team2').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.match_head .left a:first').each(function(){
$('#league').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.score').each(function(){
$('#result').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team_logo a:first').each(function(){
$('#logo1').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.oppo2 a:first').each(function(){
$('#logo2').val(jQuery(this).html());
});
}
});
</script>
<?php
}
curl_close($ch);
echo $data;
}
?>
The following works on page load, but not when it is wrapped in an if statement that reacts to the button click:
$url="https://thedomain";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, html_entity_decode($url));
$data = curl_exec($ch);
if($result === false){
echo 'Curl error: ' . curl_error($ch);
}else{
echo 'All is good';
}
curl_close($ch);
echo $data;
and
jQuery.ajax({
url: '<?php echo site_url('admin/matches/manage'); ?>',
type: 'GET',
success: function(res) {
var data = jQuery.parseHTML(res);
jQuery(data).find('div.right').each(function(){
$('#date').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team a:first').each(function(){
$('#team1').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team a:nth-child(2)').each(function(){
$('#team2').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.match_head .left a:first').each(function(){
$('#league').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.score').each(function(){
$('#result').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.team_logo a:first').each(function(){
$('#logo1').val(jQuery(this).html());
});
var data = jQuery.parseHTML(res);
jQuery(data).find('div.oppo2 a:first').each(function(){
$('#logo2').val(jQuery(this).html());
});
}
});
Without the click handling, the page loads and fills the fields.
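cURL can be a security risk even when you're dealing with your own servers: with certificate verification switched off (CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST set to false, as in your code), someone can get between you and the remote server and alter the response. With that said, let me try to point out some items.
I don't think your first line is doing the proper check: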
<?php
if(isset($_POST['mytest'])){
change to
<?php
if(isset($_POST['Submit'])){
or even better to
if($_SERVER['REQUEST_METHOD']=='POST'){
Secondly, I can't see where $result is being set, change the following
if($result === false){
to
if(empty($data)){
I hope that covers the missing points.
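To illustrate the second point, here is a minimal sketch that checks the return value of curl_exec() itself rather than an unset variable. It uses a strict comparison with false as one alternative to empty(). The function name fetchPage and the shape of the returned array are made up for illustration, and the URL in the usage line is a deliberately invalid placeholder.

```php
<?php
// Hypothetical helper: fetch a URL and report failure explicitly.
function fetchPage(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);   // false on failure, the body as a string on success
    $error = curl_error($ch); // empty string when no error occurred
    // The handle is released when $ch goes out of scope at the end of the function.

    if ($data === false) {
        return ['ok' => false, 'error' => $error, 'body' => ''];
    }
    return ['ok' => true, 'error' => '', 'body' => $data];
}

// Placeholder URL with an unsupported scheme, used only to show the error path.
$result = fetchPage('nosuch://example.invalid');
echo $result['ok'] ? 'All is good' : 'Curl error: ' . $result['error'];
```

Checking with === false distinguishes a failed request from a successful one that returned an empty body, which empty() cannot do.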

Scraping data in dynamic sites

I'm trying to scrape data from our local government. What I want is the addresses of child adoption offices. Here in Brazil, all adoptions go through the government. I have the URL of one office, and there are two or three thousand more, but if I can manage to get one, the others will be easy.
I made many attempts; below I show three.
The problem could be related to JavaScript (maybe Ajax) that refreshes the page.
Note: I am not a PHP developer.
First attempt
echo '<html><head></head><body>';
echo '<h1>Scraper PHP GET 1</h1>';
echo ini_get("allow_url_fopen");
echo ini_get("allow_url_fopen");
// I used this url for test
//$url = 'http://www.portaldaadocao.com.br';
//This is the URL that I really want
$url = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673';
$html = file_get_contents($url);
var_dump($html);
echo '</body></html>';
// Output
// 11
// Warning: file_get_contents(http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /home/rsl/www/sc01_get.php on line 14
// bool(false)
Second attempt
echo '<html><head></head><body>';
echo '<h1>Scraper PHP CURL 3</h1>';
// I used this url for test
//$url = 'http://www.portaldaadocao.com.br';
//This is the URL that I really want
$url = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673';
$curl = curl_init($url);
#curl_setopt($curl, CURLOPT_POSTFIELDS, "foo");
#curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
#curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "POST");;
$html=@curl_exec($curl);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($curl);
echo "<br />cURL error:" . curl_error($curl);
exit;
}
else{
echo '<br>begin HTML[';
echo $html;
echo '<br>]end html ';
}
echo '</body></html>';
// Output
// 1
Third attempt
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com");
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo '<html><head></head><body>';
echo '<h1>Scraper PHP CURL 5</h1>';
// I used this url for test
//$url = 'http://www.portaldaadocao.com.br';
//This is the URL that I really want
$url = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673';
$curl = curl_init($url);
#curl_setopt($curl, CURLOPT_POSTFIELDS, "foo");
#curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
#curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "POST");;
$html=@curl($curl);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($curl);
echo "<br />cURL error:" . curl_error($curl);
exit;
}
else{
echo '<br>begin HTML[';
echo $html;
echo '<br>]end html ';
}
echo '</body></html>';
// Output
// cURL error number:0
// cURL error:

Saving facebook profile picture takes too long

I have a simple Christmas image editor website, and I need to save user's facebook profile picture in order to do it. It works, but takes way too long (between 15-30 seconds), and I have no idea why.
I use javascript to deal with the login stuff, and after that, I use user's id to get the profile picture. I believe the issue occurs after that:
Since this url is not the real path to the image, I have to redirect it first, and then save it. This is my PHP code:
<?php
//just getting the file from the URL
$file = explode('/',$_GET["var1"]);
//if it's from facebook, not uploaded
if( $file[1] != "uploads"){
$saveto = "uploads/".$file[3].".jpg";
$ch = curl_init(get_right_url("http:".$_GET["var1"]));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER,1);
$raw=curl_exec($ch);
curl_close ($ch);
if(file_exists($saveto)){
unlink($saveto);
}
$fp = fopen($saveto,'x');
fwrite($fp, $raw);
fclose($fp);
}
function get_right_url($url) {
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
return curl_redir_exec($curl);
}
function curl_redir_exec($ch)
{
static $curl_loops = 0;
static $curl_max_loops = 20;
if ($curl_loops++ >= $curl_max_loops)
{
$curl_loops = 0;
return FALSE;
}
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
@list($header, $data) = @explode("\n\n", $data, 2);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($http_code == 301 || $http_code == 302)
{
$matches = array();
preg_match('/Location:(.*?)\n/', $header, $matches);
$url = @parse_url(trim(array_pop($matches)));
if (!$url)
{
//couldn't process the url to redirect to
$curl_loops = 0;
return $data;
}
$last_url = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL));
if (!$url['scheme'])
$url['scheme'] = $last_url['scheme'];
if (!$url['host'])
$url['host'] = $last_url['host'];
if (!$url['path'])
$url['path'] = $last_url['path'];
$new_url = $url['scheme'] . '://' . $url['host'] . $url['path'] . (@$url['query']?'?'.$url['query']:'');
return $new_url;
} else {
$curl_loops=0;
return $data;
}
}
?>
I'm sure I'm doing something wrong; it shouldn't be this painful to upload a small image like this one. I'd be grateful for any help, thanks a lot.
Fixed. I got the "real link" for the image with Facebook's API:
https://developers.facebook.com/docs/reference/api/using-pictures/
cURL was delaying it a lot...

javascript mandatory data error in screenleap api

I have JSON-encoded data in a variable named $json. It looks like this:
string(1243) "{"screenShareCode":"882919360",
"appletHtml":"",
"presenterParams":"aUsEN5gjxX/3NMrlIEGpk0=",
"viewerUrl":"http://api.screenleap.com/v2/viewer/882919360?accountid=mynet",
"origin":"API"}"
}
I need to pass this JSON data into a JavaScript function; please see below:
<script type="text/javascript" src="http://api.screenleap.com/js/screenleap.js"></script>
<script type="text/javascript">
window.onload = function() {
var screenShareData = '<?php echo $json;?>';
screenleap.startSharing('DEFAULT', screenShareData);
};
</script>
When I try to run this code, it gives me an error saying "missing mandatory screen share data".
How can I solve this error?
I am following "https://www.screenleap.com/api/presenter".
It looks like $json is a string, but you need to pass in a JSON object. Try the following:
window.onload = function() {
var screenShareData = '<?php echo $json;?>';
screenleap.startSharing('DEFAULT', JSON.parse(screenShareData));
};
This is how you implement it based on the documentation
https://www.screenleap.com/api/presenter
<?php
// Config
$authtoken = '';
$accountid = '';
// 1. Make CURL Request
$url = 'https://api.screenleap.com/v2/screen-shares';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('authtoken:' . $authtoken));
curl_setopt($ch, CURLOPT_POSTFIELDS, 'accountid=' . $accountid);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
$json = json_decode($data, true);
?>
<!-- 2. Launch the Presenter App -->
<script type="text/javascript" src="http://api.screenleap.com/js/screenleap.js"></script>
<script type="text/javascript">
window.onload = function() {
// $json is a PHP array here, so re-encode it; echoing it directly would print "Array"
screenleap.startSharing('DEFAULT', <?php echo json_encode($json); ?>);
};
</script>
If this doesn't work, you'll have to report it to Screenleap.
You should only need to actually parse the JSON if you want to access the values. Otherwise, just pass the response data right into the startSharing function, like this:
<?php
$url = 'https://api.screenleap.com/v2/screen-shares';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('authtoken:<your authtoken>'));
curl_setopt($ch, CURLOPT_POSTFIELDS, 'accountid=<your accountid>');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
curl_close($ch);
$json = json_decode($data, true);
?>
<script type="text/javascript" src="http://api.screenleap.com/js/screenleap.js"></script>
<script type="text/javascript">
window.onload = function() {
screenleap.startSharing('DEFAULT', <?php echo $data; ?>);
};
</script>
If you just insert your own accountid and authtoken (without leading spaces), that should work for you.
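This works because the JSON text returned by the API can be used directly as a JavaScript object literal. If the value comes from a PHP array rather than from the raw response, a sketch like the following re-encodes it for inline use. The function name renderStartSharing and the sample values are illustrative only, and the JSON_HEX_* flags are an added precaution (not something the Screenleap documentation asks for) so that the output cannot close the script tag early.

```php
<?php
// Hypothetical helper: build the inline call from a decoded PHP array.
function renderStartSharing(array $screenShareData): string
{
    // JSON_HEX_TAG/AMP/APOS/QUOT escape <, >, &, ' and " inside string values
    // as \uXXXX sequences, which keeps the literal safe inside a <script> block.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    $literal = json_encode($screenShareData, $flags);
    return "screenleap.startSharing('DEFAULT', " . $literal . ");";
}

// Sample values copied from the question; a real response has more fields.
$sample = ['screenShareCode' => '882919360', 'origin' => 'API'];
echo renderStartSharing($sample);
```

The returned string would be echoed inside the window.onload handler in place of the hand-written startSharing call.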

GetElementById alternative

I'm trying to code a simple script with cURL, but I need to collect a special key that is generated on every new POST request (onLoad()). The problem could easily be solved by loading the response into a DOM document and reading the value with getElementById, but in this case there is no "id" on the tag I want to read the value from. There is only a name.
Example:
<input name="trans_id" value="Lk+Vz957skV845b7x2DX7iyR1FI=" type="hidden">
Below is the pseudo-code I wrote today (the marked block near the end is where I need help):
<?php
// Author : me
// Date : 10.11.2013.
?>
<?php
// Declaring variables :)
$data_string = '';
$url = 'http://www.website.com';
$uagent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
// Declaring variables for e-pay
$url2 = 'http://website2.com';
...
$exChar = '|';
$exStr = '';
$exStr = Explode($exChar, $_POST['ccep']);
$data = array (
"email" => '',
...
"submitFromInputForm" => 'Next',
);
foreach($data as $key=>$value) { $data_string .= $key.'='.$value.'&'; }
$data_string = rtrim($data_string, '&'); // rtrim returns the trimmed string; it does not modify its argument
$ch = curl_init ();
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt ($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, $uagent);
curl_setopt ($ch, CURLOPT_POST, count($data));
curl_setopt ($ch, CURLOPT_POSTFIELDS, $data_string);
$result = curl_exec ($ch);
print($result);
// ====================================================================
// Need help with this function
$check = strpos($result, 'Confirmation');
if ($check == True) {
$doc = new DOMDocument();
@$doc->loadHTML($result);
$id = $doc->getElementsByName('trans_id');
echo 'Value:' . $id;
}
// ====================================================================
curl_close($ch);
?>
To be honest, I wasn't able to get a result from that function because, well, it doesn't exist. A Google search only turned up suggestions to use getElementsByTagName, which, after reading the official PHP documentation, does not solve my problem.
A note at the end: I don't want to use any JavaScript, only pure PHP.
Thank you in advance,
regards.
You can use DOMXPath in order to access specific properties by xpath.
$domx = new DOMXPath($doc);
$trans_id = $domx->evaluate("//input[contains(@name, 'trans_id')]");
Then loop through the result if necessary:
foreach ($trans_id as $id) {
echo "Value:" . $id->nodeValue;
}
On second thought, you need the value attribute, so you'd need to use getAttribute() in order to retrieve it. I just tested with the following code and it works as expected:
<?php
$result = '<input name="trans_id" value="Lk+Vz957skV845b7x2DX7iyR1FI=" type="hidden">';
$doc = new DOMDocument();
@$doc->loadHTML($result);
$domx = new DOMXPath($doc);
$trans_id = $domx->query('//input[@name="trans_id"]');
foreach ($trans_id as $id) {
echo "Value: " . $id->getAttribute('value');
}
prints:
Value: Lk+Vz957skV845b7x2DX7iyR1FI=
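The same lookup can be packaged as a small stand-in for the missing getElementsByName. This is only a sketch: the function name getInputValueByName is made up, it returns just the first match, and it uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical helper: return the value attribute of the first <input>
// whose name attribute equals $name, or null if there is none.
function getInputValueByName(string $html, string $name): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $name contains no double quote, since it is placed in the query as-is.
    $nodes = $xpath->query('//input[@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('value');
}

$result = '<input name="trans_id" value="Lk+Vz957skV845b7x2DX7iyR1FI=" type="hidden">';
echo 'Value: ' . getInputValueByName($result, 'trans_id');
```

In the script from the question, $result would be the string returned by curl_exec() rather than the hard-coded sample.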
