Take a screenshot of an external website - JavaScript

I am developing a start page where users can add links to the page using a form. They can add a name, URL, description and upload an image.
I want to automate the image upload: the image should be captured automatically. My script should take a screenshot of the website the user entered as the URL. I know I can take screenshots of HTML elements using html2canvas.
Approach 1
My first approach was to load the external website into an iframe, but this does not work because some pages forbid it; e.g. even the iframe tutorial on w3schools.com fails with Refused to display 'https://www.w3schools.com/' in a frame because it set 'X-Frame-Options' to 'sameorigin'.
HTML
<div id="capture" style="padding: 10px; color: black;">
<iframe src="https://www.w3schools.com"></iframe>
</div>
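As an aside, you can check up front whether a site forbids framing by inspecting its response headers from the shell:
curl -sI https://www.w3schools.com | grep -i x-frame-options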
Approach 2
My next approach was to make a call to my web server, which loads the target website and returns its HTML to the client. This works, but the target site is not rendered properly; e.g. images are not loading (see screenshot below).
HTML
<div id="capture" style="padding: 10px; color: black;"></div>
JS
var testURL = "http://www.google.de";
$.ajax({
    url: "http://server/ajax.php",
    method: "POST",
    data: { url: testURL },
    success: function(response) {
        $("#capture").html(response);
        console.log(response);
        html2canvas(document.querySelector("#capture")).then(
            canvas => {
                document.body.appendChild(canvas);
            }
        );
    }
});
PHP
if (empty($_POST['url'])) {
    die("No URL given");
}
$url = filter_input(INPUT_POST, "url");

$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)
$html = curl_exec($c);
if (curl_error($c)) {
    die(curl_error($c));
}
// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);
curl_close($c);
echo $html;
Is it possible to achieve this?
Update
I managed to load some of the images by rewriting their URLs in my AJAX success handler, but they are still not rendered by html2canvas. Why?
var testURL = "http://www.google.de";
$.ajax({
    url: "http://server/ajax.php",
    method: "POST",
    data: { url: testURL },
    success: function(response) {
        response = response.replace(/href="\//g, 'href="' + testURL + "/");
        response = response.replace(/src="\//g, 'src="' + testURL + "/");
        response = response.replace(/content="\//g, 'content="' + testURL + "/");
        $("#capture").html(response);
        console.log(response);
        html2canvas(document.querySelector("#capture")).then(
            canvas => {
                document.body.appendChild(canvas);
            }
        );
    }
});
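An alternative I am considering to the replace() calls is injecting a <base> tag before inserting the response, so the browser resolves relative paths against the target site itself (untested sketch; note that <base> affects all relative URLs on the embedding page):
// Resolve relative paths against the target site instead of
// rewriting each href/src/content attribute individually.
$("head").prepend($("<base>").attr("href", testURL + "/"));
$("#capture").html(response);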
Result
Result Canvas
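My current suspicion: the images now load in the page, but they are cross-origin, so they taint the canvas and html2canvas drops them. If the image hosts send CORS headers, html2canvas's useCORS option might help (untested sketch):
html2canvas(document.querySelector("#capture"), { useCORS: true }).then(
    canvas => {
        document.body.appendChild(canvas);
    }
);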

I love PHP, but for screenshots I found that PhantomJS provides the best results.
Example file screenshot.js
var page = require('webpage').create();
page.open('https://stackoverflow.com/', function() {
    page.render('out.png');
    phantom.exit();
});
Then from the shell:
phantomjs screenshot.js
Or from php:
exec("phantomjs screenshot.js &");
The goal here is to generate the js file from php.
The result is a file called out.png in the same folder, containing a full-height screenshot of the page.
Example output
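To avoid generating a new JS file for every URL, PhantomJS can also take the URL and output path as command-line arguments via its system module (a sketch along the lines of screenshot.js above):
var page = require('webpage').create(),
    system = require('system');
// Usage: phantomjs screenshot.js <url> <outfile>
page.open(system.args[1], function() {
    page.render(system.args[2]);
    phantom.exit();
});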
We can also take good captures with Firefox from the command line. This requires an X server, though.
firefox -screenshot test.png http://www.google.de --window-size=1280,1000
Example output

Not in pure PHP. Nowadays the majority of sites generate content dynamically with JavaScript, which only a browser can render. The good news: there is something called PhantomJS - a browser without a UI. It can do the job for you; they even have a working example in their tutorials, which I successfully implemented a few years ago with only a little JavaScript knowledge.
There is an alternative library called Nightmare (nightmarejs) - I only know it from a friend's opinion, who says it's simpler than Phantom, but I won't guarantee that it won't be a nightmare - personally I haven't used it.
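For reference, basic Nightmare usage looks roughly like this (untested, adapted from its published examples - treat it as a sketch):
var Nightmare = require('nightmare');
var nightmare = Nightmare();
nightmare
    .goto('https://stackoverflow.com/')
    .screenshot('out.png')
    .end()
    .then(function () {
        console.log('Screenshot saved');
    });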

It is possible, but if you want a screenshot you need something like a browser that renders the page for you. The iframe approach goes in that direction, but an iframe is the page itself. If you want a .jpg, .png or something like that, the best way in my opinion is wkhtmltoimage: https://wkhtmltopdf.org/.
The idea is that you install the Qt WebKit rendering engine on your server, just as you would install a browser on your server; it renders the page and saves the final result to a file. When a user submits a URL, you pass it as an argument to wkhtmltoimage, and you get back an image of that URL. Basic usage could be something like
wkhtmltoimage http://www.example1.com /var/www/pages/example1.jpg
You would run that statement in bash; from PHP it could be:
<?php
exec('wkhtmltoimage http://www.example1.com /var/www/pages/example1.jpg');
?>
Keep in mind that wkhtmltoimage executes CSS, JavaScript, everything - just like a browser. And since the URL comes from your users, escape it (e.g. with escapeshellarg()) before passing it to exec().

Related

Load a text file from the same directory

I'm working with Play Framework (using port 9000). When I use Ajax to request the URL of a txt file, I try to specify it in the following way:
...
<script>
var fileContent;
$.ajax({
    url : "http://localhost:9000/text.txt",
    dataType: "text",
    ...
</script>
...
The text file "text.txt" is in the same folder as the HTML file that calls it. However, I just can't access it this way. I set up Apache yesterday to test it, and when I type
http://localhost/text.txt
into the web browser it loads, but when I do the same with Play, it does not work. Any ideas? Thanks!

External JS file not loading after submit

I have a JS file on domain1 that is working fine on domain1. But when I include the JS (from domain1) on domain2, it does not work.
The JS file makes a connection to a PHP file on domain1 to output some results. How can I make it work on domain2?
[I want the JS to keep working from domain1 itself]
Here is the JS file on domain1:
function sendQuery() {
    var container = $('#DisplayDiv');
    $(container).html('<img src="http://www.domain1.com/loading.gif">');
    var newhtml = '';
    $.ajax({
        type: 'POST',
        url: 'http://www.domain1.com/data.php',
        data: $('#SubmitForm').serialize(),
        success: function (response) {
            $('#DisplayDiv').html(response);
        }
    });
    return false;
}
It works as far as showing the loading.gif, but no data comes back from the external PHP file when the page runs on domain2.
[Here domain1 & domain2 are used only as examples]
WORKING FINE NOW!
Thanks to @Ohgodwhy - adding Header set Access-Control-Allow-Origin "*" to the .htaccess on domain1 fixed it.
It is not clear what you want exactly - if you post your code here, that would be excellent.
But if you want to connect a 'js' file from any domain to your domain, you can use the ordinary declaration for it:
in HTML:
<script type="text/javascript" src="https://googledrive.com/host/0B248VFEZkAAiNjhxaDNUZVpsVHM" charset="utf-8"></script>
in PHP:
echo '<script type="text/javascript" src="https://googledrive.com/host/0B248VFEZkAAiNjhxaDNUZVpsVHM" charset="utf-8"></script>';
Very important notes:
1- You must take care of script order for dependent scripts.
2- The element the script works on must be present and visible at call time.
JavaScript doesn't allow cross-domain AJAX calls by default.
There are some options available to work around that, like JSONP.
See this link for more options: link
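For reference, a minimal JSONP sketch (this assumes the server cooperates by wrapping its JSON response in the callback name passed in the callback parameter - the endpoint here is hypothetical):
function handleData(data) {
    // The cross-domain server responds with: handleData({...});
    console.log(data);
}
var script = document.createElement('script');
// Hypothetical JSONP-aware endpoint on the other domain.
script.src = 'http://www.domain1.com/data.php?callback=handleData';
document.body.appendChild(script);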

HTML Javascript webpage content reading into a var

Let's say I have this web link: http://checkip.amazonaws.com/. I've made a Java program (an actual program, not a webpage) that reads the content of this webpage (e.g. "25.25.25.25") and displays it in a jLabel (using NetBeans IDE 1.7.3), and it works.
Now how can I read the contents of this same webpage (e.g. "25.25.25.25") and display it as normal text on a webpage? (The final webpage must be .html, not .php or anything else.)
I don't mind what kind of script it is, whether HTML or JavaScript or anything; I just need it to work so that when the webpage is opened it can read something like:
"Your IP: 25.25.25.25"
Preferably reading the contents of http://checkip.amazonaws.com/ into
<script>var ip = needCodeHere</script>
If I can get the IP into a var, or read the contents of that webpage into a var, I'm happy - but other code is welcome too, as long as it works.
Please help :( I've been staring at Google for days and can't find a solution.
You'll need 3 files (in the same directory) to do that: an HTML file to show the IP, a PHP file to get that IP via cURL, and a JS file to connect the HTML and the PHP. It would be simpler if the "final webpage" could be the ip.php itself, but let's do it this way:
1) ip.html (the "final webpage")
<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="ip.js"></script>
<div id="ip"></div>
2) ip.php
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://checkip.amazonaws.com');
// Return the response as a string so it can be echoed to the client
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
curl_close($curl);
echo $result;
?>
3) ip.js
$.ajax({
    url: 'ip.php',
    type: "GET",
    success: function (data) {
        $('#ip').html("Your IP: " + data);
    }
});
Let me know if you need more explanation.

Finding favicons - when not in default location

I display favicons from other sites on my page.
About half the time they are here:
hostname.com/favicon.ico
But the other half of the time they are not. For example, on my own site I link to my .ico file like this (FAVICON is just a PHP definition of the path):
<link rel="SHORTCUT ICON" href="<?php echo FAVICON ?>" />
How do I get the URL of a site's favicon from the link in its HTML?
This site says you can do a Google search like this, where you enter the domain you need the favicon for:
http://www.google.com/s2/favicons?domain=domain
which is one solution, but it seems less efficient than just reading the HTML from the path.
I think Google has cached "ALL" the icons in .png format and made them searchable - per this site.
Load the page using Ajax and a proxy page. For the Ajax:
// Create a request object:
var rq = new XMLHttpRequest(); // Not IE6-compatible, by the way.
// Set up the request:
rq.open('GET', 'proxy.php?url=' + encodeURIComponent(thePageURL), true);
// Handle when it's loaded:
rq.onreadystatechange = function() {
    if (rq.readyState === 4) {
        // The request is complete:
        if (rq.status < 400) {
            // The HTML is stored in rq.responseText; you could use a regular
            // expression to extract the favicon, like /shortcut icon.+?href="(.+?)"/i.
        } else {
            // There was an error fetching the page; fall back?
        }
    }
};
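Filling in the success branch with the regex suggested in the comment might look like this (a sketch; the captured href may itself be relative and need resolving against the page URL, and the /favicon.ico fallback mirrors the question):
var match = /shortcut icon.+?href="(.+?)"/i.exec(rq.responseText);
// Use the declared icon if the page has one, otherwise the default location.
var faviconURL = match ? match[1] : thePageURL + '/favicon.ico';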
And the proxy page (you'll probably want to add some security):
<?php
echo file_get_contents($_REQUEST['url']);
?>
Google "Ajax" and you'll find lots of information on how to do that sort of thing.
The reason you need to proxy the page is that browsers don't allow Ajax requests from JavaScript to go across domains unless the target allows it, which it must do explicitly. This is for security reasons, since the JavaScript could be maliciously impersonating the user. So instead, you proxy the content using a server-side script and avoid such problems.
Parsing HTML is nasty - you probably want to use a library like http://www.controlstyle.com/articles/programming/text/php-favicon/ or let Google do it for you: http://www.google.com/s2/favicons?domain=domain (much more efficient - you don't have to parse all the HTML on your server, and it's just one tag). If you want Google-style functionality on your own server, check out the link above.
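For example, using the Google service from JavaScript to render favicons for a list of domains, with no HTML parsing at all (the domains are just examples):
var sites = ['stackoverflow.com', 'w3schools.com'];
sites.forEach(function (domain) {
    var img = document.createElement('img');
    img.src = 'http://www.google.com/s2/favicons?domain=' + encodeURIComponent(domain);
    img.alt = domain + ' favicon';
    document.body.appendChild(img);
});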

wget + JavaScript?

I have this webpage that uses client-side JavaScript to format data on the page before it's displayed to the user.
Is it possible to somehow use wget to download the page and use some sort of client-side JavaScript engine to format the data as it would be displayed in a browser?
You could probably make that happen with something like PhantomJS
You can write a phantomjs script that will load the page like a browser would, and then either take screenshots or use JS to inspect the page and pull out data.
Here is a simple little PhantomJS script that triggers the JavaScript on a webpage and lets you pull the result down locally:
file: get.js
var page = require('webpage').create(),
    system = require('system'),
    address;
address = system.args[1];
page.scrollPosition = { top: 4000, left: 0 };
page.open(address, function(status) {
    if (status !== 'success') {
        console.log('** Error loading url.');
    } else {
        console.log(page.content);
    }
    phantom.exit();
});
Use it as follows:
$> phantomjs /path/to/get.js "http://www.google.com" > "google.html"
Change /path/to, the URL and the filename to what you want.
Not with wget, as I doubt it includes any form of a JavaScript engine. However, you could use WebKit to process the page, and thus the output.
Use something like this as a base for how to get the content: http://situated.wordpress.com/2008/06/04/take-screenshots-of-a-website-from-the-command-line/
