Extracting dynamic content with node.js and PhantomJS

I want to console.log the content of a web page using Node.js and PhantomJS. This is my code:
var phantom = require('phantom');
phantom.create(function(ph) {
  return ph.createPage(function(page) {
    return page.open("http://zehinz.com/test.html", function(status) {
      if (status === 'success') {
        //console.log the content of page with Javascript executed ???
      } else {
        console.log('some error');
        ph.exit();
      }
    });
  });
});
How can I output the dynamically rendered content of the web page?

In plain PhantomJS one would use page.content, but since you're using a bridge, the content property has to be explicitly fetched from the PhantomJS process in the background. You can do this with page.get.
In your case, this is
page.get('content', function(content){
  console.log("Content", content);
  ph.exit();
});
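Put together with the code from the question, the whole flow might look like the sketch below. It assumes the older callback-style API of the phantom bridge shown in the question. The names fetchRenderedContent, bridge and done are hypothetical, and the bridge module is passed in as a parameter so the function can be exercised without PhantomJS installed.

```javascript
// Sketch: fetch the rendered HTML of `url` through the callback-style
// `phantom` bridge used in the question. `bridge` is expected to be the
// module returned by require('phantom'); the function name is hypothetical.
function fetchRenderedContent(bridge, url, done) {
  bridge.create(function (ph) {
    ph.createPage(function (page) {
      page.open(url, function (status) {
        if (status !== 'success') {
          ph.exit();
          return done(new Error('Failed to load ' + url));
        }
        // page.get asks the background PhantomJS process for a page property.
        page.get('content', function (content) {
          ph.exit();
          done(null, content);
        });
      });
    });
  });
}

// Usage, assuming the callback-based `phantom` package is installed:
// fetchRenderedContent(require('phantom'), 'http://zehinz.com/test.html',
//   function (err, html) { console.log(err || html); });
```

Exiting in both branches keeps the PhantomJS child process from lingering when the page fails to load.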

Related

How to get HTML content using PhantomJS after X seconds?

How can I get the content of a website after 10 seconds using PhantomJS? On my website, for example, I have a script that calls setTimeout and then changes the DOM. I need to fetch the website's HTML with that change applied.
I can't find any working answers.

Try this code:
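var page = require('webpage').create();
var fs = require('fs');
page.open('http://example.com', function(status) {
  console.log('Page load status: ' + status);
  if (status === 'success') {
    // Give the page's own timers 10 seconds to update the DOM before saving.
    setTimeout(function(){
      fs.write('example.html', page.content, 'a');
      console.log('Page saved');
      phantom.exit();
    }, 10000);
  } else {
    // Exit on failure too, otherwise the script never terminates.
    phantom.exit(1);
  }
});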
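If the DOM change can be detected, polling for it is an alternative to always waiting the full 10 seconds. The helper below is only a sketch of that idea. waitFor and its parameters are hypothetical names, and the selector in the usage comment is an assumption about the page.

```javascript
// Polls check() every intervalMs. Calls onReady() as soon as check() returns
// true, or onTimeout() once timeoutMs has elapsed without success.
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Possible use inside the page.open callback of a PhantomJS script
// ('#loaded' is a made-up selector for an element the page adds late):
// waitFor(
//   function () {
//     return page.evaluate(function () {
//       return document.querySelector('#loaded') !== null;
//     });
//   },
//   function () { fs.write('example.html', page.content, 'w'); phantom.exit(); },
//   function () { console.log('Timed out'); phantom.exit(1); },
//   10000, 250);
```

The timeout still caps the wait at 10 seconds, but pages that finish earlier are saved earlier.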

Injecting javascript function in nightmareJS

I was given a javascript function that I need to inject into a page in order to get a list of values that would be used later on. I can call this function directly on the webpage using the Chrome console but I want to replicate what I did in the Chrome console in nightmareJS on the webpage that is currently loaded.
This is the function:
function getList() {
  require(['Service/List'],
    function (Service)
    {
      Service.getList
      ({
        onComplete: function (listOfServices)
        {
          console.log('List:success:' + JSON.stringify(listOfServices));
        },
        onFailure: function (error)
        {
          console.log('List:error:' + error);
        }
      });
    });
}
getList();
I've tried injecting the file but have had no success. I've also tried adding code to that function to write the output to a file, but I don't think it's being called at all.
Here is the NightmareJS code:
describe('Open login page', () => {
  it('Login', done => {
    nightmare
      .goto('http://loginURL.com')
      .wait('input[type="text"]')
      .wait('input[type="password"]')
      .insert('input[type="text"]', 'username')
      .insert('input[type="password"]', 'password')
      .click('input[type="submit"]')
      .evaluate(function() {
        nightmare.inject('js', 'getList.js' )
      })
      //.inject('js', 'getList.js' )
      .then(function() {
        console.log('Done');
      })
  })
})
This is the sample output after injecting the javascript file into the page:
List:success:"Test":"https://someURL.com/resource/test","Design":"https://someURL.com/resource/Design"},"NewSpace":"https://someURL.com/resource/NewSpace","Generator":"https://Generator.someURL.com/resource/test","SomethingElse":"https://someURL.com/SomethingElse/test","Connection":"https://someURL.com/Connection/test","WorldWide":"https://someURL.com/resource/WorldWide","Vibes":"https://Vibes.someURL.com/resource/test","GoogleSearch":"https://someURL.com/resource/GoogleSearch",
I want to be able to get that output from calling the javascript file on the page and save it to a file so that I can use it later to call other services in that list.

You can read the local JavaScript files that need to be injected:
var fs = require('fs');
var path = require('path');
var fileData = [];
fileData.push(fs.readFileSync(path.resolve('../getList.js'), 'utf-8'));
They can then be loaded into the head section of the page:
browser.win
  .evaluate(function(fileData) {
    var elem = null;
    for (var ii = 0; ii < fileData.length; ii++) {
      elem = document.createElement('script');
      if (elem) {
        elem.setAttribute('type', 'text/javascript'); //elem.src = localjs;
        //console.log(fileData[ii]);
        elem.innerHTML = fileData[ii];
        document.head.appendChild(elem);
      }
    }
    console.log("Testing loaded scripts");
    console.log(getList());
    return "Injected Scripts";
  }, fileData)
  .then(function(result) {
    console.log(result);
  }).catch(function(error) {
    console.log("Load Error: ", error);
  });
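Note that getList() from the question returns nothing and reports its result through console.log inside the page, so console.log(getList()) prints undefined. One way to get the list into a file is to capture the page's console output on the Node side and parse the 'List:success:' line. The sketch below assumes the message is exactly the prefix followed by the JSON.stringify output. parseListMessage, saveList and the file name are hypothetical.

```javascript
var fs = require('fs');

// Turns one console line produced by getList() into data.
// Returns null for console output that does not belong to getList().
function parseListMessage(message) {
  var okPrefix = 'List:success:';
  var errPrefix = 'List:error:';
  if (message.indexOf(okPrefix) === 0) {
    return JSON.parse(message.slice(okPrefix.length));
  }
  if (message.indexOf(errPrefix) === 0) {
    throw new Error(message.slice(errPrefix.length));
  }
  return null;
}

// Writes the parsed list to filePath as pretty-printed JSON, if there is one.
function saveList(message, filePath) {
  var list = parseListMessage(message);
  if (list !== null) {
    fs.writeFileSync(filePath, JSON.stringify(list, null, 2));
  }
  return list;
}

// Possible wiring, assuming Nightmare's documented 'console' page event:
// nightmare.on('console', function (type, message) {
//   if (type === 'log') { saveList(String(message), 'services.json'); }
// });
```

The saved file can then be read back with JSON.parse to call the other services in the list.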

Having trouble downloading dynamic web content with PhantomJS

My goal is to download the dynamic content of a website, so JavaScript needs to be executed on the received content. The code I am currently using with PhantomJS 2.1 is the following:
var page = require('webpage').create();
var fs = require('fs');
page.open('https://sports.bovada.lv/soccer/premier-league', function () {
  page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
    page.evaluate(); // Edit: this line is removed
    page.close();
  });
});
page.onLoadFinished = function() {
  console.log("Download finished");
  fs.write('test.html', page.content, 'w');
  phantom.exit(0);
};
The code is saving the received page as "test.html", but unfortunately it is not loading the full page content as it does with a web browser. I would appreciate if someone could help me out.
Website used for testing: https://sports.bovada.lv/soccer/premier-league

The issue could be that you're exiting too soon. Try delaying script termination:
page.onLoadFinished = function() {
  console.log("Download finished");
  fs.write('test.html', page.content, 'w');
  setTimeout(function(){
    phantom.exit(0);
  }, 1000);
};
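A fixed delay may still be too short for a page that keeps loading data. A different technique, not part of the answer above, is to save the page only after the network has been quiet for a while. The sketch below tracks pending requests. createIdleTracker and the 500 ms quiet period are hypothetical choices, and the usage comment assumes PhantomJS's onResourceRequested and onResourceReceived callbacks.

```javascript
// Calls onIdle() once, after no request has been pending for quietMs.
function createIdleTracker(quietMs, onIdle) {
  var pending = 0;
  var timer = null;
  var fired = false;

  // (Re)start the quiet-period timer when nothing is in flight.
  function schedule() {
    clearTimeout(timer);
    if (pending === 0 && !fired) {
      timer = setTimeout(function () {
        fired = true;
        onIdle();
      }, quietMs);
    }
  }

  return {
    requestStarted: function () {
      pending++;
      clearTimeout(timer); // new activity cancels a running quiet period
    },
    requestFinished: function () {
      pending = Math.max(0, pending - 1);
      schedule();
    }
  };
}

// Possible wiring in a PhantomJS script:
// var tracker = createIdleTracker(500, function () {
//   fs.write('test.html', page.content, 'w');
//   phantom.exit(0);
// });
// page.onResourceRequested = function () { tracker.requestStarted(); };
// page.onResourceReceived = function (response) {
//   if (response.stage === 'end') { tracker.requestFinished(); }
// };
```

A request that fails never reaches the 'end' stage, so a real script would likely also count failures in page.onResourceError and keep an overall timeout as a safety net.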

fabricjs or slimerjs, "all objects displayed" event?

I'm using slimerjs to render some html files. Each file contains a JSON string that gets loaded with a call to
fabricjsCanvas.loadFromJSON(jsonString, fabricjsCanvas.renderAll.bind(fabricjsCanvas));
This is where I open my page
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('Unable to load the page!');
    phantom.exit(1);
  } else {
    page.render(output);
    window.setTimeout(function () {
      page.render(output);
      phantom.exit();
    }, 5000);
  }
});
As you can see, I had to set a timeout after which slimerjs renders whatever is on the page and exits. I really don't like this solution, because I need to render multiple pages: some are very small and could take less than 200 milliseconds, others are huge and could take more than 5000. So this is bad for performance and isn't even a safe solution against pages that take a long time to render. I tried putting a console.log at the end of the canvas.renderAll call and then adding this piece of code to my slimerjs script:
page.onConsoleMessage = function (msg) {
  console.log(msg);
  page.render(output);
  phantom.exit();
};
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('Unable to load the address!');
    phantom.exit(1);
  }
});
I hoped this would help, but nothing really changed: renderAll finishes before all objects are displayed.
Is there some event I can catch, or something else I can do to prevent this?

You should use the page.onLoadFinished callback:
page.onLoadFinished = function(status, url, isFrame) {
  console.log('Loading of ' + url + ' is a ' + status);
  page.render(output);
};
This function runs after the page has fully loaded.

I managed to find a solution. First of all I changed my HTML template: since I only need to render each page, I used a StaticCanvas instead of a normal canvas. Since a static canvas only has 2 frames to render (the top one and the secondary container, at least in my experience), I added an event listener on the after:render event, so after the second frame has been rendered I print a console message. At this point page.onConsoleMessage gets called and closes the phantom process.
This way I don't need to allow a fixed amount of time that could be too much (losing performance) or not enough (and the image would come out blank).
This is the script in my HTML template:
var framesRendered = 0;
var canvas = new fabric.StaticCanvas('canvas', {width: {{width}}, height: {{height}} });
canvas.setZoom({{{zoom}}});
canvas.on('after:render', function() {
  if (framesRendered == 1)
    console.log('render complete');
  else framesRendered++;
});
canvas.loadFromJSON({{{data}}}, canvas.renderAll.bind(canvas), function (o, object) {
  if (object.type === 'picturebox' && object.filters.length) {
    object.applyFilters(function () {
      canvas.renderAll();
    });
  }
});
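And this is my slimerjs script:
page.onConsoleMessage = function (msg) {
  // Ignore unrelated console output; only the template's signal triggers the render.
  if (msg !== 'render complete') { return; }
  page.render(output);
  phantom.exit();
};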
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('Unable to load the address!');
    phantom.exit(1);
  }
});
I'll leave this one here in case someone needs it.

login to a webpage using phantomjs and Jquery

I am new to PhantomJS, JavaScript and web scraping in general. What I want to do is basic HTTP authentication and then visit another URL to get some information. Here is what I have so far. Please tell me what I am doing wrong.
var page = require('webpage').create();
var system = require('system');
page.onConsoleMessage = function(msg) {
  console.log(msg);
};
page.onAlert = function(msg) {
  console.log('alert!!>' + msg);
};
page.settings.userName = "foo";
page.settings.password = "bar";
page.open("http://localhost/login", function(status) {
  console.log(status);
  var retval = page.evaluate(function() {
    return "test";
  });
  console.log(retval);
  page.open("http://localhost/ticket/" + system.args[1], function(status) {
    if ( status === "success" ) {
      page.injectJs("jquery.min.js");
      var k = page.evaluate(function () {
        var a = $("div.description > h3 + p");
        if (a.length == 2) {
          console.log(a.slice(-1).text())
        }
        else {
          console.log(a.slice(-2).text())
        }
        //return document.getElementById('addfiles');
      });
    }
  });
  phantom.exit();
});
I am passing an argument to this file: a ticket number which gets appended to the 2nd URL.

I would recommend CasperJS highly for this.
CasperJS is an open source navigation scripting & testing utility written in Javascript and based on PhantomJS — the scriptable headless WebKit engine. It eases the process of defining a full navigation scenario and provides useful high-level functions, methods & syntactic sugar for doing common tasks such as:
- defining & ordering browsing navigation steps
- filling & submitting forms
- clicking & following links
- capturing screenshots of a page (or part of it)
- testing remote DOM
- logging events
- downloading resources, including binary ones
- writing functional test suites, saving results as JUnit XML
- scraping Web contents
(from the CasperJS website)
I recently spent a day trying to get PhantomJS by itself to do things like fill out a log-in form and navigate to the next page.
CasperJS has a nice API purpose built for forms as well:
http://docs.casperjs.org/en/latest/modules/casper.html#fill
var casper = require('casper').create();
casper.start('http://some.tld/contact.form', function() {
  this.fill('form#contact-form', {
    'subject': 'I am watching you',
    'content': 'So be careful.',
    'civility': 'Mr',
    'name': 'Chuck Norris',
    'email': 'chuck@norris.com',
    'cc': true,
    'attachment': '/Users/chuck/roundhousekick.doc'
  }, true);
});
casper.then(function() {
  this.evaluateOrDie(function() {
    return /message sent/.test(document.body.innerText);
  }, 'sending message failed');
});
casper.run(function() {
  this.echo('message sent').exit();
});
