I'm trying to render a screenshot of a TradingView chart widget on my server, similar to the following:
https://jsfiddle.net/exatjd8w/
I'm not that familiar with PhantomJS, but I've tried several ways to take a screenshot of the chart once it has loaded, most recently with the following code:
var page = require('webpage').create();
page.open('https://mywebsite.com/chart', function(status) {
    console.log("Status: " + status);
    if (status === "success") {
        page.includeJs('https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js', function() {
            page.evaluate(function() {
                // jQuery is loaded, now manipulate the DOM
                console.log('Status: jQuery is loaded, now manipulate the DOM');
                var date = new Date(),
                    time = date.getTime();
                $('#main-widget-frame-container iframe').on('load', function() {
                    console.log('iframe loaded, now take snapshot');
                    page.render('screenshot_' + time + '.png');
                });
            });
        });
    }
});
Unfortunately, I'm still unable to get it right: the above code runs forever without producing a result.
Any ideas, suggestions?
Thank you in advance.
PhantomJS is no longer maintained and has a couple of issues that might cause this.
I'd recommend switching to Puppeteer, which has a really nice API, uses headless Google Chrome under the hood, and is actively maintained by the Chrome team:
https://github.com/GoogleChrome/puppeteer
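For reference, a minimal Puppeteer sketch of the same task might look like this. This is a sketch under assumptions, not your exact code: the URL is taken from the command line, and `networkidle0` is one guess at a reasonable signal that the chart has finished loading.

```javascript
// Hypothetical sketch: screenshot a page once network activity settles.
const buildScreenshotPath = (time) => 'screenshot_' + time + '.png';

async function capture(url) {
  // require() lazily so the helper above works even without puppeteer installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // networkidle0: resolves when there have been no network connections for 500 ms
  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.screenshot({ path: buildScreenshotPath(Date.now()) });
  await browser.close();
}

// Usage: node screenshot.js https://mywebsite.com/chart
if (process.argv[2]) {
  capture(process.argv[2]).catch(console.error);
}
```

For a chart inside an iframe you may still need to wait for a specific selector (`page.waitForSelector`) before taking the screenshot.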
Related
I'm working on a script that takes a screenshot of a website every day. I've already done this for other sites and it worked correctly, but for the first time I have the following problem: my PhantomJS script captures almost all the data on the website, but not all of it (in fact, it misses the most important data for my case).
Until now I was using this simple script, adapted:
var page = require('webpage').create();
page.open('http://www.website.com', function() {
    setTimeout(function() {
        page.render('render.png');
        phantom.exit();
    }, 200);
});
But when I run the same script against this site, it loses some data: it takes the screenshot but misses the prices...
Screenshot of the site with phantomjs
After exploring a bit, I saw that if I capture the DOM (for example, using PHP Simple HTML DOM Parser) I can get most of the data, but not the prices.
$html = file_get_html('https://www.falabella.com.ar/falabella-ar/category/cat10178/TV-LED-y-Smart-TV');
$prods = $html->find('div[class=fb-pod-group__item]');
foreach ($prods as $prod) {
    // For example, I can get the title
    $title = $prod->find('h4[class=fb-responsive-hdng-5 fb-pod__product-title]', 0)->plaintext;
    // But not the price
    $price = $prod->find('h4[class=fb-price]', 0)->plaintext;
}
Exploring the console log, I found the JavaScript objects where these values live. If I return the object fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice;
I see the price of the first product, and so on:
Console log of the site
I can also get it with a PhantomJS script like this:
var page = require("webpage").create();
page.open("https://www.falabella.com.ar/falabella-ar/category/cat10122/Cafeteras-express", function(status) {
    var price = page.evaluate(function() {
        return fbra_browseProductListConfig.state.searchItemList.resultList[0].prices[0].originalPrice;
    });
    console.log("The price is " + price);
    phantom.exit();
});
In other posts (like this one) I read about changing the timeout intervals, but that isn't working for me (I tried all the scripts shared in the quoted post). The problem is not that the website doesn't fully load; it seems that this data (the prices) is never printed into the DOM. I even downloaded the full site from the terminal with the wget command, and the prices are not there o_O.
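What the wget result suggests is that the prices are never in the server's HTML at all: they ship inside a JavaScript config object and the client-side script injects them into the DOM afterwards. A hypothetical, simplified illustration of why a static fetch misses them (the markup and config name below are stand-ins, not the real page source):

```javascript
// Stand-in for what a static fetch (wget, PHP Simple HTML DOM) sees:
// the price element is empty, but the data sits inside a script tag.
const staticHtml =
  '<div class="fb-pod-group__item"><h4 class="fb-price"></h4></div>' +
  '<script>var fbra_config = {"prices":[{"originalPrice":"999"}]};</script>';

// The rendered price text is not in the markup...
console.log(staticHtml.includes('>999<')); // false

// ...but the value can still be scraped out of the embedded JSON:
const match = staticHtml.match(/"originalPrice":"(\d+)"/);
console.log(match[1]); // "999"
```

This is why your second PhantomJS script (reading the config object via `page.evaluate`) works even though the DOM-parsing approaches fail.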
Edit
When I execute the script, I get the following errors:
./phantomjs fala.js
ReferenceError: Can't find variable: Set
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1 in t
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:1
TypeError: undefined is not an object (evaluating 't.componentDomId')
https://www.falabella.com.ar/static/assets/scripts/react/productListApp.js?vid=111111111:3
https://www.falabella.com.ar/static/assets/scripts/react/vendor.js?vid=111111111:22
Maybe the problem is there, because the script "productListApp.js" is what renders the prices?
I've built an Angular/Express/Node app that runs in Google Cloud and currently uses a JSON file as the data source for my application. For some reason (and this only happens in the cloud), when saving data through an AJAX call and writing it to the JSON file, everything seems to work fine. However, when refreshing the page, the server (sometimes!) sends me the version from before the edit. I can't tell whether this is an Express-related, Node-related or even Angular-related problem, but what I know for sure is that I'm checking the JSON that comes in the response from the server, and it really is sometimes the modified version, sometimes not, so it most probably isn't Angular cache-related.
The GET:
router.get('/concerts', function (request, response) {
    delete require.cache[require.resolve('../database/data.json')];
    var db = require('../database/data.json');
    response.send(db.concerts);
});
The POST:
router.post('/concerts/save', function (request, response) {
    delete require.cache[require.resolve('../database/data.json')];
    var db = require('../database/data.json');
    var concert = request.body;
    console.log('Received concert id ' + concert.id + ' for saving.');
    if (concert.id != 0) {
        var indexOfItemToSave = db.concerts.map(function (e) {
            return e.id;
        }).indexOf(concert.id);
        if (indexOfItemToSave == -1) {
            console.log('Couldn\'t find concert with id ' + concert.id + ' in database!');
            response.sendStatus(404);
            return;
        }
        db.concerts[indexOfItemToSave] = concert;
    } else if (concert.id == 0) {
        concert.id = db.concerts[db.concerts.length - 1].id + 1;
        console.log('Concert id was 0, adding it with id ' + concert.id + '.');
        db.concerts.push(concert);
    }
    console.log("Added stuff to temporary db");
    var error = commit(db);
    if (error)
        response.send(error);
    else
        response.status(200).send(concert.id + '');
});
This probably doesn't say much, so if someone is interested in helping, you can see the issue live here. If you click on modify for the first concert, change the programme to something like asd and then save, everything looks fine. But if you try to refresh the page a few times (sometimes up to 6-7 tries are needed), the old, unchanged programme is shown. Any clue or advice greatly appreciated, thanks.
To solve: Do not use local files to store data in the cloud! This is what databases are for!
What was actually the problem?
The problem was caused by the fact that App Engine was running 2 VM instances for my application. The POST request was sent to one instance, which did its job, saved the data by modifying its local JSON file, and returned a 200. However, after a few refreshes, load balancing routed the GET to the other machine, which has its own copy of the source code, including the initial, unmodified JSON. I am now using a MongoDB instance, and everything seems to be solved. Hopefully this discourages anyone who attempts to do the same thing I did.
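The failure mode can be sketched in a few lines (hypothetical names; each object stands in for one VM instance holding its own copy of data.json):

```javascript
// Each App Engine instance starts from its own copy of the JSON file.
function makeInstance() {
  return { db: { concerts: [{ id: 1, programme: 'original' }] } };
}
const instanceA = makeInstance();
const instanceB = makeInstance();

// The POST lands on instance A and mutates only A's local copy.
function save(instance, concert) {
  const i = instance.db.concerts.findIndex(c => c.id === concert.id);
  if (i !== -1) instance.db.concerts[i] = concert;
}
save(instanceA, { id: 1, programme: 'asd' });

// Later GETs are load-balanced: whichever instance answers wins.
const fromA = instanceA.db.concerts[0].programme; // 'asd'
const fromB = instanceB.db.concerts[0].programme; // 'original' - the stale read
console.log(fromA, fromB);
```

Any shared store (MongoDB, Cloud SQL, etc.) removes the divergence because both instances read and write the same state.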
I'm using PhantomJS to get page content for given URL.
The problem is that on some pages PhantomJS can not load some resources (js, css...), and the error I'm getting is:
error code 5, Operation canceled
Web page on which I can reproduce this problem is www.lifehacker.com
The resources I can not get are:
http://x.kinja-static.com/assets/stylesheets/tiger-4ee27d6612a71ee3c68440f8e9c0025c.css
http://c.amazon-adsystem.com/aax2/amzn_ads.js
and some others too...
The command I'm running is:
phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com
and even if I remove options like cookies-file, ignore-ssl-errors and ssl-protocol, the result is still the same.
The fetchpage.js script is:
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();

if (system.args.length === 1) {
    console.log('Usage: fetchpage.js <some URL>');
    phantom.exit(1);
}

var url = system.args[1];

page.open(url, function (status) {
    console.log("STATUS: " + status);
    if (status !== 'success') {
        console.log(
            "Error opening url \"" + page.reason_url
            + "\": " + page.reason
        );
        phantom.exit(1);
    } else {
        var content = page.content;
        console.log(content);
        phantom.exit(1);
    }
});
If I open that same page in Chrome, the page loads just fine. Also, if I copy the resource URLs that PhantomJS cannot load and paste them into Chrome, they load just fine.
I have tried to google for similar problems, but I only found some suggestions about setting a timeout, which did not work for me.
I have tried the same thing with phantomjs v1.9.0, 1.9.8 and 2.0.1-development.
What's even more interesting, sometimes the PhantomJS script manages to get the full response from all resources, so I suspect caching, but I couldn't force the server to avoid the cache. I have tried to send custom headers through PhantomJS like this:
...
var page = webPage.create();
page.customHeaders = {
    "Cache-Control": "no-cache",
    "Pragma": "no-cache"
};
page.open(url, function (status) {
...
but nothing changed.
I am running out of ideas...
For coders who come across this page during their quest to find a solution for resources not completely loading in PhantomJS: I had a project where the script would stall/hang on a few resources. It was 50/50 whether it would execute or not.
Some digging and I found the following page:
https://github.com/ariya/phantomjs/issues/10652
where the solution of setting a timeout for resources worked for me:
page.settings.resourceTimeout = 10000;
In regards to the above question, I am not sure if this is completely appropriate, but at least the information is easier to find now and can be regarded as part of a solution for some.
I am using AngularJS to constantly poll for new data through HTTP. An alert should be shown when new data is received. The code, which lives inside a controller, looks something like this:
var poll = function() {
    $http.get('phones.json').success(function(data) {
        new_val = data.val;
        if (new_val !== old_val) {
            $window.alert("AlertEvent");
        }
        old_val = new_val;
        $timeout(poll, 500);
    });
};
poll();
This code works when the HTML page is refreshed: when phones.json changes, an alert appears. However, if I leave the page open for, say, 30 minutes and come back later, it stops working. I have to refresh the page to get it working again.
What else did I miss? What did I do wrong? Could it be due to some caching mechanism?
Thank you very much.
EDIT: I found the cause. It is indeed due to the browser reading from the cache; I can see this using Chrome Developer Tools. How can this caching be disabled for this HTML page only?
You may be able to bust the cache by doing something like this:
$http.get('phones.json?v=' + Date.now())
Depending on how your back-end is set up, you may need to adjust it to accept that.
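A small sketch of that idea (the helper name is made up; it also preserves any query string the URL already carries):

```javascript
// Append a cache-busting timestamp as a query parameter, so the
// browser treats every poll as a distinct, uncacheable URL.
function cacheBust(url, now) {
  now = now || Date.now();
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + 'v=' + now;
}

// In the controller: $http.get(cacheBust('phones.json'))
console.log(cacheBust('phones.json', 1700000000000));
// → phones.json?v=1700000000000
```

Alternatively, if you control the server, sending a `Cache-Control: no-cache` header on the JSON response avoids touching the client code at all.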
Super new to AngularJS so please be patient with me :)
We're using it to poll our API for updates while we process a repository (and after). On one particular page the data is loaded but not drawn to the screen. Highlighting the content or resizing the browser causes a redraw and shows all the Angular values that weren't there a moment ago! Latest Chrome!
Just look: everything starts at "0" or "-", but highlighting the page reveals the "Optimized Images" and "Filesize Savings" content changes.
Live example:
MAY REQUIRE YOU HIT REFRESH TO HAVE THE ANGULAR DRAW FAIL
REQUIRES CHROME ~ Version 31.0.1650.63 m
It works on Firefox!?!
http://crusher.io/repo/alhertz/didthesaintswin/63f49d36e709dea172fe7e4bbacbcfd834f9a642
This appears to be very similar to this question, but there is no nested controller issue I can detect: Update page contents after GET request in AngularJS
When I try to add a $scope.$apply() I get this error: http://docs.angularjs.org/error/$rootScope:inprog?p0=$apply
This is the relevant code in the angular controller (coffeescript):
do poll = ->
  $http.get("#{$location.absUrl()}/poll.json").success((data, status, headers, config) ->
    if $scope.continuePolling
      #console.log("still polling #{$location.absUrl()}")
      $scope.data = data
      $scope.TotalOptimizedImage = $scope.CalcTotalOptimizedImage(data.images)
      $scope.TotalImgSize = $scope.CalcTotalImgSize(data.images)
      $scope.SavedImgSize = $scope.CalcSavedImgSize(data.images)
      $scope.TotalSavings = ($scope.TotalImgSize - $scope.SavedImgSize + 0)
      $timeout(poll, 10000)
  )
Really not sure how to break this apart for fixing. Thoughts?
It looks like you need to call $scope.$apply inside the callback to $http.get. The reason is that the callback happens outside the controller's digest. Sorry, I'm not adept at CoffeeScript, but something like this:
$http.get("#{$location.absUrl()}/poll.json").success((data, status, headers, config) ->
  if $scope.continuePolling
    $scope.$apply(function() { // wrap the stuff you want to update in the $scope.$apply
      #console.log("still polling #{$location.absUrl()}")
      $scope.data = data
      $scope.TotalOptimizedImage = $scope.CalcTotalOptimizedImage(data.images)
      $scope.TotalImgSize = $scope.CalcTotalImgSize(data.images)
      $scope.SavedImgSize = $scope.CalcSavedImgSize(data.images)
      $scope.TotalSavings = ($scope.TotalImgSize - $scope.SavedImgSize + 0)
    });
    $timeout(poll, 10000)
)
I would suggest using a safeApply function; there is also a great timeFunctions service that might help you with the polling, which I've used quite successfully in a couple of projects. You can find it in this answer.