PhantomJS cannot open a page - javascript

I'm trying to get the rendered HTML of my URL, because the page is a Backbone app built from several JavaScript files. The lines below work only sometimes: sometimes the page content loads, but other times PhantomJS gets stuck and cannot open the page. When that happens, it doesn't log anything at all.
I've played with timeouts but got nowhere. The behaviour appears to be non-deterministic. Any help? Thanks in advance!
var page = require('webpage').create();
page.open('myurl', function(status) {
if (status !== 'success') {
console.log('FAIL to load the address');
phantom.exit(1);
} else {
console.log( "Successful page open!" );
console.log(page.content);
phantom.exit(0);
}
});

Related

Why is PhantomJS not scraping the page it is redirected to?

I am scraping http://www.asx.com.au/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=XJO
It shows a blank white page at first; that page contains some obfuscated JS code.
That code automatically sends a POST request and then loads the actual page.
I have this code to follow the redirect, but it's not working.
var page;
var myurl = "http://www.asx.com.au/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=XJO";
var renderPage = function (url) {
page = require('webpage').create();
page.onNavigationRequested = function (url, type, willNavigate, main) {
if (main && url != myurl) {
myurl = url;
console.log("redirect caught")
// GUILTY CODE
renderPage(url);
}
};
page.open(url, function (status) {
if (status === "success") {
console.log("success")
page.render('yourscreenshot.png');
phantom.exit(0);
} else {
console.log("failed")
phantom.exit(1);
}
});
}
renderPage(myurl);
It only outputs
success
redirect caught
Look at my code: why is the GUILTY CODE part not being executed? Why is renderPage(url) not called after "redirect caught" is printed?
From my understanding, PhantomJS doesn't handle redirects well, and that may be your issue. You may want to test this a different way, or use another browser to confirm the behaviour. See this GitHub issue for what I mean: https://github.com/ariya/phantomjs/issues/10389.
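A likely contributing factor, offered as a guess rather than a confirmed diagnosis: the first page.open callback calls phantom.exit(0) as soon as the blank interstitial page finishes loading, so PhantomJS is already shutting down when the redirect is caught. The sketch below keeps a single page and renders only after no new load has finished for a quiet period. The helper name renderWhenSettled, the quietMs parameter, and the 2000 ms value are assumptions for illustration, not PhantomJS APIs; page is passed in so the logic can also be exercised outside PhantomJS.

```javascript
// Sketch: call `onSettled` only after page loads have stopped for `quietMs`.
// `renderWhenSettled` and `quietMs` are hypothetical names.
function renderWhenSettled(page, url, quietMs, onSettled) {
  var timer = null;
  var lastStatus = null;

  // A new main-frame navigation cancels any pending "settled" timer.
  page.onNavigationRequested = function (target, type, willNavigate, main) {
    if (main && timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  };

  // Every finished load (re)starts the quiet period.
  page.onLoadFinished = function (status) {
    lastStatus = status;
    if (timer !== null) {
      clearTimeout(timer);
    }
    timer = setTimeout(function () {
      onSettled(lastStatus);
    }, quietMs);
  };

  page.open(url); // no callback: onLoadFinished handles completion
}

// Usage inside a PhantomJS script (guarded so the file also loads elsewhere).
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  var myurl = "http://www.asx.com.au/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=XJO";
  renderWhenSettled(page, myurl, 2000, function (status) {
    if (status === "success") {
      page.render("yourscreenshot.png");
      phantom.exit(0);
    } else {
      phantom.exit(1);
    }
  });
}
```

The quiet period is a heuristic: a redirect that starts later than quietMs after the first load would still be missed, so the value may need tuning for the target site.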

jQuery: check if a file exists locally

I am developing a local site for a company (internal use only, offline and without a server). The main page has a main div that contains three other divs. Each div is linked to a page, and its "onclick" event loads that page into the main div. In the document-ready function I need to check whether each page exists and, if not, remove the div linked to it. How can I check whether a page exists locally? I've found many answers that check a page's existence through the status of a connection, but my HTML will only run offline and locally, so I can't use that method.
EDIT - SOLVED
I've solved this using @che-azeh's script:
function checkIfFileLoaded(fileName) {
$.get(fileName, function(data, textStatus) {
if (textStatus == "success") {
// execute a success code
console.log("file loaded!");
}
});
}
If the file loads successfully, I change the content of a new hidden div, which tells another script whether or not to remove each of the three divs.
This function checks if a file can load successfully. You can use it to try loading your local files:
function checkIfFileLoaded(fileName) {
$.get(fileName, function(data, textStatus) {
if (textStatus == "success") {
// execute a success code
console.log("file loaded!");
}
});
}
checkIfFileLoaded("test.html");
I suggest you run a local web server on the client's computer. (See also the edit below on local XHR access.)
With a local web server, users can start it up as if it were an application. You could, for example, use Node's http-server. You could even install it as a Node/npm package, which also makes deployment easier.
With a proper HTTP server (running locally, in your case) you can use XHR requests:
$(function(){
$.ajax({
type: "HEAD",
async: true,
url: "http://localhost:7171/myapp/somefile.html"
}).done(function(){
console.log("found");
}).fail(function () {
console.log("not found");
})
})
EDIT:
Firefox
Another post (by @che-azeh) has brought to my attention that Firefox does allow XHR on the file "protocol". At the time of this writing, the above works in Firefox with a URL of just somefile.html, using the file scheme.
Chrome
Chrome has an option, allow-file-access-from-files (http://www.chrome-allow-file-access-from-file.com/), which also allows local XHR requests.
This flag is intended for testing purposes:
"you should be able to run your tests in Google Chrome with no hassles"
I would still suggest the local web server, as it makes you independent of these browser flags and protects you from regressions should Firefox or Chrome decide to drop support for this.
You can attempt to load the page within a try-catch construct. If the page exists, it will be loaded though. If it doesn't, you can (within the catch) set the related div as hidden.
Try to access the page using $.ajax. Use the error: option to run a callback function that removes the DIV linked to the page.
$.ajax({
    url: "page1.html",
    // Runs when the request fails, i.e. the page could not be loaded.
    error: function() {
        $("#page1_div").remove();
    }
});
You can loop this code over all the DIVs.
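The loop could be sketched roughly as follows. The selectors #page2_div and #page3_div, the file names page2.html and page3.html, and the names PAGE_BY_DIV and removeDivsForMissingPages are assumptions for illustration; only #page1_div and page1.html come from the snippet above. Whether a request for a local file succeeds still depends on the browser's rules for file access.

```javascript
// Hypothetical mapping from each container's selector to the page it loads.
var PAGE_BY_DIV = {
  "#page1_div": "page1.html",
  "#page2_div": "page2.html",
  "#page3_div": "page3.html"
};

// `$` is passed in explicitly so the function does not rely on a global.
function removeDivsForMissingPages($, pageByDiv) {
  Object.keys(pageByDiv).forEach(function (selector) {
    $.ajax({
      url: pageByDiv[selector],
      error: function () {
        // The request failed, so treat the page as missing.
        $(selector).remove();
      }
    });
  });
}

// In the browser, run the check once the DOM is ready.
if (typeof jQuery !== "undefined") {
  jQuery(function () {
    removeDivsForMissingPages(jQuery, PAGE_BY_DIV);
  });
}
```

Keeping the selector-to-page mapping in one object means adding a fourth div only requires one new entry.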
You can use jQuery's load function:
$("div").load("/test.html", function(response, status, xhr) {
if (status == "error") {
var msg = "Sorry but there was an error: ";
$(this).html(msg + xhr.status + " " + xhr.statusText);
}
});

Close popup window for Google OAuth2

I open my OAuth2 flow for Gmail in a popup using newWindow = window.open(...). When the user has filled it out and hit 'Allow', it redirects to my server, where the tokens are retrieved and stored. Finally, the server returns 'Error' or 'Success', so the popup contains just that text. On the Angular side I have this running:
checkConnect = setInterval(function() {
try{
if(newWindow.document.body.innerText === 'Success') {
console.log('Success');
newWindow.close();
window.clearInterval(checkConnect);
}else if(newWindow.document.body.innerText === 'Error') {
console.log('We had an error!');
newWindow.close();
window.clearInterval(checkConnect);
}else if(newWindow.closed) {
console.log('WINDOW WAS closed');
window.clearInterval(checkConnect);
}
}catch(e) {
//console.log(e);
}
}, 100);
This works sometimes and fails at other times. I also reuse this code for other OAuth providers, such as Dropbox, with the same result. Any idea why?
Well, I couldn't figure it out using a popup. However, I was able to overcome the problem by having the OAuth page open in the same window and redirecting back to the page I wanted the user on afterwards.
Also, there are libraries provided by google that open it for you in a popup and handle the closing.
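If a popup is still preferred, one alternative to polling newWindow.document is window.postMessage; this swaps in a different technique from the one in the question. In this sketch the final page served in the popup posts its result to window.opener, and the main page listens for it. The function names, the origin https://app.example.com, and the exact message strings are assumptions for illustration, and it only works if the popup still has access to window.opener after the provider's redirects.

```javascript
// Sketch: build a "message" handler that accepts results only from the
// expected origin. All names here are hypothetical.
function makeOAuthResultHandler(expectedOrigin, onResult) {
  return function (event) {
    if (event.origin !== expectedOrigin) {
      return; // ignore messages from any other origin
    }
    if (event.data === "Success" || event.data === "Error") {
      onResult(event.data);
    }
  };
}

// Registers the handler on `win`, then closes `popup` and unregisters
// once a valid result arrives.
function listenForOAuthResult(win, popup, expectedOrigin) {
  var handler = makeOAuthResultHandler(expectedOrigin, function (result) {
    win.removeEventListener("message", handler);
    popup.close();
    console.log(result === "Success" ? "Success" : "We had an error!");
  });
  win.addEventListener("message", handler);
  return handler;
}

// In the main page, after opening the popup as in the question:
//   listenForOAuthResult(window, newWindow, "https://app.example.com");
//
// The page the server returns inside the popup would then run:
//   window.opener.postMessage("Success", "https://app.example.com");
```

Checking event.origin matters here because any window can post messages to the main page.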

Unable to load page resources with PhantomJS

I'm using PhantomJS to get page content for given URL.
The problem is that on some pages PhantomJS can not load some resources (js, css...), and the error I'm getting is:
error code 5, Operation canceled
Web page on which I can reproduce this problem is www.lifehacker.com
The resources I can not get are:
http://x.kinja-static.com/assets/stylesheets/tiger-4ee27d6612a71ee3c68440f8e9c0025c.css
http://c.amazon-adsystem.com/aax2/amzn_ads.js
and some others too...
The command I'm running is:
phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com
and even if I remove options like cookies-file, ignore-ssl-errors, ssl-protocol the result is still the same.
The fetchpage.js script is:
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();
if (system.args.length === 1) {
console.log('Usage: fetchpage.js <some URL>');
phantom.exit(1);
}
var url = system.args[1];
page.open(url, function (status) {
console.log("STATUS: " + status);
if (status !== 'success') {
console.log(
"Error opening url \"" + page.reason_url
+ "\": " + page.reason
+ "\": " + page
);
phantom.exit(1);
} else {
var content = page.content;
console.log(content);
phantom.exit(0);
}
});
If I open that same page in Chrome, page loads just fine. Also if I copy those resource URLs that phantomjs can not load and paste them in Chrome, they load just fine.
I have tried to google for similar problems, but I only found some suggestions about setting timeout which did not work for me.
I have tried the same thing with phantomjs v1.9.0, 1.9.8 and 2.0.1-development.
What's even more interesting, the PhantomJS script sometimes manages to get a full response from all resources, so I suspect caching, but I couldn't force the server to bypass its cache. I tried sending custom headers through PhantomJS like this:
...
var page = webPage.create();
page.customHeaders = {
"Cache-Control":"no-cache",
"Pragma":"no-cache"
};
page.open(url, function (status) {
...
but nothing changed.
I am running out of ideas..
For coders who come across this page while looking for a solution to resources not loading completely in PhantomJS: I had a project where the script would stall or hang on a few resources. It was 50/50 whether it would execute or not.
After some digging I found the following page:
https://github.com/ariya/phantomjs/issues/10652
The solution there, setting a timeout for resources, worked for me:
page.settings.resourceTimeout = 10000;
I am not sure whether this fully applies to the question above, but at least the information is easier to find now, and it may be part of a solution for some.
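To see which resources are actually stalling, the timeout can be combined with PhantomJS's onResourceTimeout and onResourceError callbacks. This is a minimal sketch; the helper name attachResourceDiagnostics and its log parameter are hypothetical, and page is passed in so the wiring can be checked outside PhantomJS.

```javascript
// Sketch: set a per-resource timeout and report resources that time out
// or fail. `attachResourceDiagnostics` and `log` are hypothetical names.
function attachResourceDiagnostics(page, timeoutMs, log) {
  page.settings.resourceTimeout = timeoutMs; // milliseconds

  // Called when a resource exceeds resourceTimeout.
  page.onResourceTimeout = function (request) {
    log("TIMEOUT " + request.url);
  };

  // Called when a resource fails to load for any other reason.
  page.onResourceError = function (resourceError) {
    log("ERROR " + resourceError.errorCode + " (" +
        resourceError.errorString + ") " + resourceError.url);
  };
}

// Usage inside a PhantomJS script (guarded so the file also loads elsewhere).
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  attachResourceDiagnostics(page, 10000, function (msg) {
    console.log(msg);
  });
  page.open(require("system").args[1], function (status) {
    console.log("STATUS: " + status);
    phantom.exit(status === "success" ? 0 : 1);
  });
}
```

The logged URLs show whether the stalls come from the same few hosts, which can help decide whether those resources matter for the content being fetched.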

How to use JQuery and custom site variables with phantomjs

I'm trying out PhantomJS for browser testing and am having a few issues. My current code is below:
var page = require('webpage').create();
page.open('http://address.com', function(){
console.log('page opened');
page.evaluate(function(){
console.log('inside page eval');
varname.varaction(13633);
})
if (jQuery('#bcx_13633_iframe_overlay').is(':visible')){
console.log('it is visible');
} else {
console.log('it is not visible');
}
console.log('made it to exit');
phantom.exit();
})
I've tried using includeJs with a link to jQuery, wrapping it around everything between page.open and phantom.exit, but it seems to stall and skip that portion. I have custom actions on the site that I can call from the browser console when I visit it, as in the code above; in PhantomJS, however, it tells me it can't find the variable name for those either. Does anyone with PhantomJS experience have tips on how to fix this?
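Some likely causes, sketched under assumptions: the PhantomJS script and the loaded page run in separate JavaScript contexts, so jQuery and site variables such as varname exist only inside page.evaluate, and console.log calls made there reach the terminal only through page.onConsoleMessage. The helper name isOverlayVisible is hypothetical, and the sketch assumes the site loads jQuery itself and that varname.varaction shows the overlay synchronously.

```javascript
// Sketch: run the site action and the visibility check inside the page
// context and return a plain boolean to the PhantomJS side.
function isOverlayVisible(page, id) {
  return page.evaluate(function (overlayId) {
    // This body executes inside the web page, where the site's globals
    // (jQuery, varname) are defined.
    console.log('inside page eval');
    varname.varaction(overlayId);
    return jQuery('#bcx_' + overlayId + '_iframe_overlay').is(':visible');
  }, id);
}

// Usage inside a PhantomJS script (guarded so the file also loads elsewhere).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  // Forward console.log calls made inside the page to the terminal.
  page.onConsoleMessage = function (msg) {
    console.log('[page] ' + msg);
  };
  page.open('http://address.com', function (status) {
    if (status !== 'success') {
      console.log('failed to open page');
      phantom.exit(1);
      return;
    }
    console.log(isOverlayVisible(page, 13633) ? 'it is visible' : 'it is not visible');
    phantom.exit(0);
  });
}
```

If the overlay appears only after an animation or an AJAX call, the check would need to be delayed or polled rather than run immediately after the action.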
