How to use PhantomJS along with node.js for scraping? - javascript

I have installed node-phantom with npm install node-phantom, but when I run this code it fails with the error Cannot find module 'webpage':
var webpage = require('webpage').create(),
url = "https://www.example.com/cba/abc",
hrefs = new Array();
webpage.open(url,function(status){
if(status=="success"){
var results = page.evaluate(function(){
$("#endpoints").each(function() {
hrefs.push($(this).attr("href"));
});
return hrefs;
});
console.log(JSON.stringify(results));
phantom.exit();
}
});

You don't require the webpage module in node-phantom. You would use its API to get a representation of the webpage module. It has to be done this way, because PhantomJS has a different execution runtime from node.js. They generally can't use the same modules. That is why there are bridges between those two execution environments like node-phantom and phantom. They essentially replicate the API of PhantomJS to be used in node.js.
As per documentation, you don't require the webpage, you get a page instead:
var phantom = require('node-phantom');
phantom.create(function(err,ph) {
return ph.createPage(function(err,page) {
// do something with page: basically your script
});
});
You won't be able to just copy and paste existing PhantomJS code. There are differences, so you will have to study the API (basically the README on github).
Complete translation of your code:
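var phantom = require('node-phantom');
var url = "https://www.example.com/cba/abc";
phantom.create(function(err, ph) {
  return ph.createPage(function(err, page) {
    // node-phantom callbacks are error-first: (err, status), not (status)
    page.open(url, function(err, status) {
      if (status == "success") {
        page.evaluate(function() {
          // this function runs inside the page, so hrefs is declared here
          var hrefs = [];
          $("#endpoints").each(function() {
            hrefs.push($(this).attr("href"));
          });
          return hrefs;
        }, function(err, results) {
          console.log(JSON.stringify(results));
          ph.exit();
        });
      }
    });
  });
});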
page.evaluate is still sandboxed, so you can't use variables from the outside like hrefs.
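Why the outer variables are out of reach can be sketched in plain Node.js, without PhantomJS. A bridge can only ship the source text of the function to the other runtime, so anything the closure captured is left behind. The helper runIsolated below is hypothetical and only imitates that behaviour; it is not part of node-phantom.

```javascript
// Hypothetical stand-in for a bridge: only the function's source text crosses over,
// so the rebuilt function cannot see the caller's local variables.
function runIsolated(fn, args) {
  var source = fn.toString();
  var rebuilt = new Function('return (' + source + ').apply(null, arguments);');
  return rebuilt.apply(null, args || []);
}

function demo() {
  var selector = '#endpoints'; // local to demo(), invisible to the rebuilt function

  var viaClosure;
  try {
    viaClosure = runIsolated(function () { return selector; });
  } catch (e) {
    viaClosure = e.name; // the captured variable does not exist on the other side
  }

  // Values handed over explicitly as arguments do arrive.
  var viaArgument = runIsolated(function (sel) { return sel; }, [selector]);

  return { viaClosure: viaClosure, viaArgument: viaArgument };
}

console.log(demo());
```

The same idea applies to the real bridge: declare what you need inside the evaluated function, or pass it in explicitly. How node-phantom accepts extra arguments for page.evaluate is described in its README.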

Related

use .net dll in electron

I am a .NET developer and new to electron and node.js.
From my electron application, I need to call one function inside a .NET class library DLL which will generate some document and will send to print.
I need to use this electron application only on the windows machine. I see plugin Edge.js, but am not sure this will work for me and also don't know how to include in my project.
Edge.js will do the trick.
See the following snippet:
var edge = remote.require('electron-edge');
var toErMahGerd = edge.func({
assemblyFile: 'ERMAHGERD.dll',
typeName: 'ERMAHGERD.Translate',
methodName: "ToErMahGerd"
});
document.getElementById("translate-btn").addEventListener("click", function (e) {
var inputText = document.getElementById("input-text").value;
toErMahGerd(inputText, function (error, result) {
document.getElementById("output-text").innerHTML = result;
});
});
The project's GitHub repo has good documentation to dive into and a simple getting-started guide.
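The snippet above ignores the error argument. Edge.js callbacks follow Node's error-first convention, so a failure on the .NET side would otherwise go unnoticed. A minimal sketch of handling it follows; fakeDotNetFunc and callAndDescribe are hypothetical names, and fakeDotNetFunc merely stands in for the function returned by edge.func so the example runs without a DLL.

```javascript
// Hypothetical stand-in for a function returned by edge.func(...).
function fakeDotNetFunc(input, callback) {
  if (typeof input !== 'string') {
    return callback(new Error('expected a string'));
  }
  callback(null, input.toUpperCase());
}

// Calls an error-first function and always hands the caller a displayable string.
function callAndDescribe(func, input, done) {
  func(input, function (error, result) {
    if (error) {
      return done('Call failed: ' + error.message);
    }
    done(String(result));
  });
}

callAndDescribe(fakeDotNetFunc, 'hello', console.log); // HELLO
callAndDescribe(fakeDotNetFunc, 42, console.log);      // Call failed: expected a string
```

In the Electron page, the done callback would write to the output element instead of the console.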

Accessing local files in offline jquery app

I'm a beginner trying to use jquery to build an app (mostly offline), I'm developing it using chrome/firefox I want to have a local .txt file with some data stored in it as an array. However, I can't seem to access it. The ajax function never succeeds.
$(document).ready(function () {
local_list_dict = ['Example', 'Example 2', 'Example 3'];
online_list_dict = ['Park', 'running'];
$('#master_set').on('click', function () {
$.ajax({ //this does not work
url: "/local/pg/document1.txt",
success: function (data) {
alert('success');
},
});
for (i = 0; i < local_list_dict.length; i++) {
$('#local_list').append("<li class='idea_list'><a href='#player_1' rel='external'>" + local_list_dict[i] + "</a></li>");
}
;
$('#local_list').listview('refresh');
});
$('#home').hide().fadeToggle(500);
$('.idea_list').on('click', function () {
alert('debug')
var panelId = $(this).text(); // some function to pass player_1 the contents of the list
$('#chosen_list').html();// some function that takes panelId and uses it to choose the relevant .txt file
});
});
I tried to do the same thing, but I didn't get good results because of the browser's security rules. There are some tricks you can try, but the best approach is to run your script on a local server (you can do that with WampServer or similar tools).
Some interesting links that can help you:
https://stackoverflow.com/a/372333/3126013
https://stackoverflow.com/a/19902919/3126013
http://www.html5rocks.com/en/tutorials/file/dndfiles/
An easy way is to run your project/app on a local server such as Node.js or, even easier, to use the Chrome Dev Editor extension (developer preview):
Chrome Dev Editor (CDE) is a developer tool for building apps on the Chrome platform - Chrome Apps and Web Apps. CDE has support for writing applications in JavaScript or Dart, and has Polymer templates to help you get started building your UI. CDE also has built-in support for Git, Pub and Bower.
Personally, I prefer to run my local apps in Node.js.
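For the Node.js route, a minimal static file server needs only the built-in http, fs and path modules. This is a sketch, not a production server; the function names resolveFile and createStaticServer and the small content-type table are assumptions made for the example.

```javascript
var http = require('http');
var fs = require('fs');
var path = require('path');

var CONTENT_TYPES = {
  '.html': 'text/html',
  '.js': 'application/javascript',
  '.css': 'text/css',
  '.txt': 'text/plain'
};

// Maps a request URL to a file inside rootDir, or null if it would escape rootDir.
function resolveFile(rootDir, requestUrl) {
  var pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  var root = path.resolve(rootDir);
  var filePath = path.resolve(root, '.' + pathname);
  var inside = filePath === root || filePath.indexOf(root + path.sep) === 0;
  return inside ? filePath : null;
}

// Builds the server without starting it; the caller decides when to listen.
function createStaticServer(rootDir) {
  return http.createServer(function (req, res) {
    var filePath = resolveFile(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      return res.end('Forbidden');
    }
    fs.readFile(filePath, function (err, data) {
      if (err) {
        res.writeHead(404);
        return res.end('Not found');
      }
      var type = CONTENT_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}
```

Calling createStaticServer(__dirname).listen(8080) from the project folder (the port is an arbitrary choice) serves the app at http://localhost:8080/, and the $.ajax request for /local/pg/document1.txt then becomes an ordinary same-origin HTTP request.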

Cannot dynamically set the list of URLs used by eachThen API in CasperJS

I am trying to use the new 1.1 eachThen() API in CasperJS, but I am seeing some strange behaviour with it.
A simple application follows:
var casper = require('casper').create({
verbose: true,
logLevel: "error"
});
var urls = ['http://google.com/'];
casper.start();
var testvar = "";
casper.then(function() {
urls = ['http://yahoo.com/', 'http://www.youtube.com/'];
});
casper.eachThen(urls, function(response) {
console.log("Opening: "+response.data);
this.thenOpen(response.data, function(response) {
testvar = response.url;
});
});
casper.run();
As I understand it, this application should open yahoo.com followed by youtube.com. However, the array assignment in the preceding step does not seem to be taken into consideration at all, and the output is "Opening: http://google.com/".
Is anybody aware of a limitation here, or is this possibly a bug in the current (beta) version of CasperJS? I am using the latest 1.1.0-DEV.
To answer my own question: wrapping the whole thing in a then() step does the job, as explained by hexid in the comments. Doing it as a "standalone" call does not seem to be possible (whether due to a bug or by design, I am not sure at the moment).
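The likely explanation is ordinary JavaScript timing: the argument to eachThen() is evaluated when eachThen() is called, while the script is still being set up, whereas the then() callback that reassigns urls only runs later, once run() starts. The toy step queue below imitates that scheduling. It is not CasperJS, and StepQueue is a hypothetical name, but it reproduces both the surprising output and the effect of the then() wrapper.

```javascript
// Toy step queue, NOT CasperJS: steps are only recorded until run() is called.
function StepQueue() {
  this.steps = [];
  this.insertAt = null; // set while a step is running
}

StepQueue.prototype.then = function (fn) {
  if (this.insertAt === null) {
    this.steps.push(fn);                       // scheduled before run()
  } else {
    this.steps.splice(this.insertAt++, 0, fn); // scheduled from inside a running step
  }
  return this;
};

StepQueue.prototype.eachThen = function (items, fn) {
  var self = this;
  // The array is read here, at scheduling time.
  items.forEach(function (item) {
    self.then(function () { fn(item); });
  });
  return this;
};

StepQueue.prototype.run = function () {
  for (var i = 0; i < this.steps.length; i++) {
    this.insertAt = i + 1;
    this.steps[i]();
  }
  this.insertAt = null;
};

function standalone() {
  var opened = [];
  var queue = new StepQueue();
  var urls = ['http://google.com/'];
  queue.then(function () { urls = ['http://yahoo.com/', 'http://www.youtube.com/']; });
  queue.eachThen(urls, function (url) { opened.push(url); }); // sees the old array
  queue.run();
  return opened;
}

function wrapped() {
  var opened = [];
  var queue = new StepQueue();
  var urls = ['http://google.com/'];
  queue.then(function () { urls = ['http://yahoo.com/', 'http://www.youtube.com/']; });
  queue.then(function () {
    // Runs after the reassignment, so eachThen receives the new array.
    queue.eachThen(urls, function (url) { opened.push(url); });
  });
  queue.run();
  return opened;
}

console.log(standalone()); // [ 'http://google.com/' ]
console.log(wrapped());    // [ 'http://yahoo.com/', 'http://www.youtube.com/' ]
```

An alternative that keeps the standalone call is to mutate the original array in place (for example by emptying it and pushing the new URLs) instead of assigning a new one; whether that works in CasperJS depends on when eachThen() iterates the array, so treat it as something to test.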

CasperJS - using jQuery. ReferenceError: Can't find variable: jQuery/$

I'm writing code that involves jQuery in CasperJS. By chance, could someone point out the error I've made in including jQuery? (After 45 minutes of searching, I'm starting to think it's a local problem.)
I have tried both of the following:
casper.page.injectJs('C:\sweeps\jquery-1.10.2.min.js');
and
var casper = require('casper').create({
clientScripts: ["C:\sweeps\jquery-1.10.2.min.js"]
});
Code:
// sample.js
var casper = require('casper').create();
var login = "some username";
var password = "some password";
casper.start('https://www.paypal.com/us/home', function() {
this.fillXPath('form.login', {
'//input[@name="login_email"]': login,
'//input[@name="login_password"]': password,
}, true);
});
casper.page.injectJs('C:\sweeps\jquery-1.10.2.min.js');
$("input[name='submit.x']").click();
setTimeout(function(){
setTimeout(function(){
casper.run(function() {
this.captureSelector('example2.png', '#page');
this.echo('Done.').exit();
});
}, 30000); }, 1);
Output:
ReferenceError: Can't find variable: jQuery
C:/sweeps/test2.js:21
The same result comes when "jQuery" is switched to "$".
EDIT: I've also tried relative pathing.
My reference is: Can I use jQuery with CasperJS?
Read this Casper#evaluate()
The concept behind this method is probably the most difficult to understand when discovering CasperJS. As a reminder, think of the evaluate() method as a gate between the CasperJS environment and the one of the page you have opened; everytime you pass a closure to evaluate(), you’re entering the page and execute code as if you were using the browser console.
casper.evaluate(function() {
$("input[name='submit.x']").click();
});
You need to use the jQuery selector as if you were in a browser.
Your path to the javascript file should be a URI relative to your HTML file, not a file-system path. Assuming your files are in the c:\sweepstakes folder, try
var casper = require('casper').create({
clientScripts: ["jquery-1.10.2.min.js"]
});
Also, use your browser's network/dev tools to see if your jQuery library is being downloaded or not.
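Separately from the advice above, the Windows paths in the question deserve a second look. In a JavaScript string literal a backslash starts an escape sequence, and since \s and \j are not recognised escapes the backslash is silently dropped. The short check below shows what the string actually contains; it uses only the path from the question.

```javascript
// "\s" and "\j" are not escape sequences, so the backslashes disappear.
var asWritten = 'C:\sweeps\jquery-1.10.2.min.js';
// Doubling the backslash, or using forward slashes, keeps the separators.
var escaped = 'C:\\sweeps\\jquery-1.10.2.min.js';
var forwardSlashes = 'C:/sweeps/jquery-1.10.2.min.js';

console.log(asWritten);      // C:sweepsjquery-1.10.2.min.js
console.log(escaped);        // C:\sweeps\jquery-1.10.2.min.js
console.log(forwardSlashes); // C:/sweeps/jquery-1.10.2.min.js
```

Whether CasperJS then finds the file still depends on how it resolves the path, so this fixes the string, not necessarily the whole problem.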

Can I see code coverage on code executed by headless browser?

ATM I'm working on a small project with node.js + express + mongodb. The logic is on web, but is loaded from my node.js server. Something like this in my index.html
<script src="./app.js"></script>
<script type="text/javascript">
var debug = false;
$(document).ready(function() {
app.start();
});
</script>
My tests are functional, meaning that I use a headless browser (Zombie), and I get good indications about the coverage with istanbul. I tried blanket unsuccessfully.
process.env['TEST'] = true;
var app = require('../server/JS_TPV.server.js');
var mongodb = require('mongodb');
var should = require("should");
var Browser = require("zombie");
var browser;
Then something like:
before(function(done) {
var populateDB = require('../install/JS_TPV.mongo_db_fill.js');
populateDB.install(function() {
browser = new Browser({debug:false, silent:false});
browser.visit("http://localhost:8080").then(done,done);
console.log("visited ending BEFORE");
});
});
But since the index.html file is being accessed and all the js files on it are loaded, I think it should show its coverage too.
Is there any way to show this?
Or the only way to do this is by generating an html-kind of test where I check my web functions? (yeah, or with require.js and testing all the logic node-style).
Thanks!
You can :)
The key points are:
1. the code executed by the browser has to be instrumented
2. someone must collect the coverage information
You can find an example of this working here: https://github.com/ericminio/yop-promises/blob/master/test/promises.with.browser.spec.js
Run npm run cover and then npm run report, in that order, and navigate to the coverage folder to find the report. Try skipping the Zombie test to see how that affects code coverage.
This is one example with Zombie and Istanbul, so it really deals specifically with how those two tools can let you go through the 2 points above.
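To make the two points concrete, here is a toy illustration in plain JavaScript. It is not Istanbul's real output or data format: the counter names, coverageStore and summarize are all made up for the example. It only shows the idea that instrumented code bumps counters in a shared object while it runs, and that something else reads those counters afterwards.

```javascript
// Point 1: "instrumented" code. A real tool generates this; here it is written by hand.
// The shared store plays the role of the global object that instrumented code writes to.
var coverageStore = { 'abs.js': { entry: 0, negative: 0, nonNegative: 0 } };

function hit(id) {
  coverageStore['abs.js'][id] += 1;
}

function instrumentedAbs(x) {
  hit('entry');
  if (x < 0) {
    hit('negative');
    return -x;
  }
  hit('nonNegative');
  return x;
}

// Point 2: the collector. It reads the counters once the exercised code has finished.
function summarize(store) {
  var summary = {};
  Object.keys(store).forEach(function (file) {
    var ids = Object.keys(store[file]);
    var covered = ids.filter(function (id) { return store[file][id] > 0; });
    summary[file] = covered.length + '/' + ids.length;
  });
  return summary;
}

instrumentedAbs(5); // only the non-negative path is exercised
console.log(JSON.stringify(summarize(coverageStore))); // {"abs.js":"2/3"}
```

With a headless browser the store lives in the page rather than in the test process, which is why the test has to fetch it from the browser before a report can be produced.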
