Multi-process scraper in Nightmare.js - javascript

I'm developing a scraper in Node.js using the npm package NightmareJS.
The scraper works perfectly when I run it as a single process.
But when I run it across multiple processes, something curious happens: all the Nightmare instances start successfully, but then only one of them actually executes while the others stop working.
I suspect that the last instance overrides the previous ones.
Does anyone have a suggestion or idea?
To fork the process I use this code:
childVector[i] = cp.fork("./file.js");
childVector[i].send(JSON.stringify(informations));
childVector[i].on('message', function(message) {
doStuff();
});
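One way to narrow this down is to test the fork/message pattern on its own, without Nightmare. The sketch below does not diagnose the Nightmare problem; it only checks that every forked child receives its own message and replies from its own process. The worker file, the runJobs helper, and the example URLs are all made up for illustration (the worker stands in for ./file.js). It also passes a plain object to send(), since the IPC channel serializes messages itself.

```javascript
const cp = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for ./file.js: replies with its own pid and the job it received.
const workerSource = `
process.on("message", (job) => {
  process.send({ pid: process.pid, url: job.url });
});
`;
const workerDir = fs.mkdtempSync(path.join(os.tmpdir(), "fork-demo-"));
const workerFile = path.join(workerDir, "worker.js");
fs.writeFileSync(workerFile, workerSource);

// Fork one child per job and resolve with each child's reply.
function runJobs(jobs) {
  return Promise.all(jobs.map((job) => new Promise((resolve, reject) => {
    const child = cp.fork(workerFile);
    child.once("error", reject);
    child.once("message", (reply) => {
      child.disconnect(); // closing the IPC channel lets the child exit
      resolve(reply);
    });
    child.send(job); // plain objects are serialized by the IPC channel
  })));
}

runJobs([{ url: "https://example.com/a" }, { url: "https://example.com/b" }])
  .then((replies) => console.log(replies));
```

If every reply carries a different pid and the right url, the forking side is fine and the problem is more likely inside the child script itself.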

Related

How to wait for the page to load with Mocha JS + Selenium webdriver

I'm new to running tests with Selenium WebDriver and Mocha JS. I have done a few things already, but I have been stuck on an issue for a few days, and after reading some similar-looking questions on SO, I still haven't found my answer.
I'm running tests and, at some point, one of my clicks causes a page reload. Just after this reload, I try to fill in some form inputs, but I get errors because the browser is unable to locate the element I'm searching for.
Here is my script:
it("Test", async function() {
await driver.get("https://www.myURL.com/")
await driver.findElement(By.id("button")).click()
// Here is the reloading
await driver.findElement(By.name("login")).sendKeys("mylogin") // => not working because not loaded
})
I don't know whether I need a timeout (and if so, where to put it) or something else.
For information, I'm running my tests locally first, with Visual Studio Code (+ extensions).
Thanks, everyone, for your help.
Greg
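A possible approach, sketched under assumptions: instead of a fixed timeout, retry the lookup until the element exists. selenium-webdriver has a built-in for this, driver.wait(until.elementLocated(By.name("login")), 10000), with until imported from the same package. The helper below shows the same idea in plain JavaScript so it can be tried without a browser. The name retryUntil and its default timings are made up for illustration.

```javascript
// Generic polling helper: retry an async action until it resolves or time runs out.
// retryUntil, timeoutMs and intervalMs are illustrative names, not part of Selenium.
async function retryUntil(action, timeoutMs = 10000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  do {
    try {
      return await action(); // success: hand back whatever the action produced
    } catch (err) {
      lastError = err; // remember the failure and try again after a pause
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  } while (Date.now() < deadline);
  throw lastError; // still failing after the deadline
}

// Usage inside the test from the question (driver and By come from selenium-webdriver):
// const login = await retryUntil(() => driver.findElement(By.name("login")));
// await login.sendKeys("mylogin");
```

Mocha's own test timeout still applies, so it may need raising with this.timeout(...) if the reload is slow.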

how to initiate a node js command with a html button?

I am making a PWA for my college portal, so I need an API to make HTTPS calls against. Unfortunately, the website has no API, so I cannot use something like the Fetch API or Axios. I could fall back on XMLHttpRequest, but it is old and quite complex. So I thought of using Selenium with JavaScript, and that worked: I installed the npm package and some WebDrivers for Firefox, and everything is fine. However, the whole Selenium WebDriver script has to be started with a node command:
node file_name
node file_name.js
Either form works.
const {Builder, By, Key, util}=require("selenium-webdriver");
async function example(){
let driver = await new Builder().forBrowser("firefox").build();
await driver.get("https://login.gitam.edu/Login.aspx"); //this is the link
await driver.findElement(By.name("txtusername")).sendKeys("0116", Key.INSERT);
await driver.findElement(By.name("password")).sendKeys("password", Key.RETURN);
}
example();
Right now, the code above is run with node file_name.
But I need to start it from the UI I am building for my PWA. I have designed a button, and when the user clicks it, it should run the Selenium JavaScript snippet above. So I need code, or any resource, for making that button trigger the terminal command that in turn runs the Selenium code.
Please answer my question.
PS: this is my first question on Stack Overflow!
Thank you very much.
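A browser page cannot run a terminal command directly, so one common pattern is a small Node server that exposes an endpoint and runs the script when the button calls it. The following is only a sketch under assumptions: the /run path, port 3000, the createRunServer name, and the button variable are made up for illustration, and the page is assumed to be served from the same origin as this server.

```javascript
const http = require("http");
const { execFile } = require("child_process");

// Build an HTTP server whose POST /run endpoint executes one fixed command.
// The command is set on the server side; nothing from the request is executed.
function createRunServer(command, args) {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/run") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("not found");
      return;
    }
    execFile(command, args, (err, stdout) => {
      res.writeHead(err ? 500 : 200, { "Content-Type": "text/plain" });
      res.end(err ? String(err.message) : stdout);
    });
  });
}

// Usage on the server (file name taken from the question, port is an assumption):
// createRunServer("node", ["file_name.js"]).listen(3000);
//
// Usage in the page (button is a hypothetical reference to the button element):
// button.addEventListener("click", () => fetch("/run", { method: "POST" }));
```

Note that the browser is then launched on the machine running this server, not on the user's device, which may or may not be what the PWA needs.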

app.relaunch([options]) is not working in electron

I want to change the userData path to one defined by the user. So I fetch the path from the UI and store it in a file, so that the next time the app launches, it changes the path.
I want to restart the app as soon as the user has selected the path. I tried the app.relaunch() function, but it didn't work, nor did it return an error.
I used the exact example from the documentation: http://electron.atom.io/docs/api/app/#apprelaunchoptions
Calling app.relaunch() will not actually quit the app, you need to follow it by a call to app.quit() or app.exit().
app.relaunch();
app.quit();
This code should work, but note that while debugging (e.g. in Visual Studio Code), the debugger disconnects after app.quit() and kills the whole application, so the app will not restart. You might want to test it on an application that is already installed or run through npm.

Can I/how can I translate a Selenium webdriver test script from node.js over to phantomjs - ghostdriver?

I recently began working with Selenium and to make life easier to start I was using node to run my scripts so that I could visually monitor the tests. My challenge now is to convert it so that it can be run as a headless test. Unfortunately, most of the resources that I have come across only address using phantomjs and ghostdriver with Java or Python. My boss wants me to run the test through phantomjs without Java or Python. Eventually these tests will be run remotely through a Linux VM on a server without a GUI. Currently I am testing using Mac OS X 10.8 and still have many bridges to cross in order to get to my goal.
My most important question first: is it possible to run a script from phantomjs through a port without the use of Java or Python? I have spent hours poring through as many resources as I could come across, and I've come up with no solution.
If so, how can I properly initialize the test to run headless? Here is how I scripted the start of my functioning test. I want to switch the capabilities from firefox to phantomjs and be able to run it headless using the appropriate port. The rest of the test navigates to a specific site, logs in through a widget, then navigates further to the area on which I will build more tests once I get this working.
var webdriver = require('selenium-webdriver'),
SeleniumServer = require('selenium-webdriver/remote').SeleniumServer;
var server = new SeleniumServer("Path/selenium-server-standalone-2.39.0.jar", {
port: 8910
});
server.start();
var driver = new webdriver.Builder().
usingServer(server.address()).
withCapabilities(webdriver.Capabilities.firefox()).
build();
The test works perfectly, but I am new to this so there might be something foolish that I am overlooking. Please let me know what adjustments to make so that it will run headless through phantom. When I attempt to use node to run the script after switching capabilities to phantomjs it produces
"/Selenium/node_modules/selenium-webdriver/phantomjs.js:22
LogLevel = webdriver.logging.LevelName,
^
TypeError: Cannot read property 'LevelName' of undefined
at Object.<anonymous> (/Selenium/node_modules/selenium-webdriver/phantomjs.js:22:33)
That's a read only file that I can't adjust, any attempts that I made to define "LogLevel" or "LevelName" to the appropriate corresponding value (DEBUG, etc.) were fruitless.
And if I run it through phantomjs itself I get -
"Error: Cannot find module 'path'
phantomjs://bootstrap.js:289
phantomjs://bootstrap.js:254 in require"
(It also lists module 'http') -- (and various undefined function errors)
I feel that with that instance I didn't properly organize where the files for Selenium, phantomjs, and ghostdriver should go in order to play nice. I also removed the server setup portion and instead ran this first, then the script separately.
phantomjs --webdriver=8910
But it yielded the same result. All of my research to fix these issues turned up instructions for Java and Python but not Javascript by itself. Rather than chase through many rabbit holes I figured it wise to consult better minds.
If you know better than I do and that it is fruitless to attempt this without Java or Python, please let me know. If you know where the issue lies within my script and could propose a fix please let me know. I hope that I have properly described the nature of my issue and if you need more information I will do my best to provide it to you.
This is my second week working with Javascript so if you believe I am making a noob error you very well may be correct. Please, keep in mind that the script works through node with selenium webdriver.
Many thanks for your time!!!
~Isaac
This was a bit tricky but here is the solution I've pieced together:
var webdriver = require('selenium-webdriver'),
SeleniumServer = require('selenium-webdriver/remote').SeleniumServer,
server = new SeleniumServer('/path/to/selenium/selenium-server-standalone-2.41.0.jar', {
port: 4444
}),
capabilities = webdriver.Capabilities.phantomjs();
capabilities.set('phantomjs.binary.path', 'path/to/phantom/bin/phantomjs');
var promise = server.start().then(function() {
var client = new webdriver.Builder().
usingServer(server.address()).withCapabilities(
capabilities
).build();
return {
'client': client,
'server': server
};
}, function(err) {
console.log('error starting server', err);
});
You can then use the promise with Selenium's Mocha-compatible test framework to hold the test until the server has started.
I found the documentation really helpful once I figured out that the navigation is on the far right of the page. Here's the URL: http://selenium.googlecode.com/git/docs/api/javascript/module_selenium-webdriver.html
Then you'll be stuck where I am: getting selenium-webdriver to quiet down.

How can I edit on my server files without restarting nodejs when i want to see the changes?

I'm trying to setup my own nodejs server, but I'm having a problem. I can't figure out how to see changes to my application without restarting it. Is there a way to edit the application and see changes live with node.js?
Nodules is a module loader for Node that handles auto-reloading of modules without restarting the server (since that is what you were asking about):
http://github.com/kriszyp/nodules
Nodules does intelligent dependency tracking so the appropriate module factories are re-executed to preserve correct references when modules are reloaded without requiring a full restart.
Check out Node-Supervisor. You can give it a collection of files to watch for changes, and it restarts your server if any of them change. It also restarts it if it crashes for some other reason.
"Hot-swapping" code is not enabled in NodeJS because it is so easy to accidentally end up with memory leaks or multiple copies of objects that aren't being garbage collected. Node is about making your programs accidentally fast, not accidentally leaky.
EDIT, 7 years after the fact: Disclaimer, I wrote node-supervisor, but had handed the project off to another maintainer before writing this answer.
If you would like to reload a module without restarting the node process, you can do this with the help of the watchFile function in the fs module and the cache-clearing feature of require:
Let's say you loaded a module with a simple require:
var my_module = require('./my_module');
In order to watch that file and reload when updated add the following to a convenient place in your code.
var fs = require('fs');
fs.watchFile(require.resolve('./my_module'), function () {
console.log("Module changed, reloading...");
// drop the cached copy so the next require reads the file again
delete require.cache[require.resolve('./my_module')];
my_module = require('./my_module');
});
If your module is required in multiple files, this operation will not affect the other assignments, because each file keeps its own reference to the old module object. One option is to keep the module in a global variable and use it from there wherever it is needed, rather than requiring it several times. The code above then becomes:
global.my_module = require ('./my_module');
//..
fs.watchFile(require.resolve('./my_module'), function () {
console.log("Module changed, reloading...");
delete require.cache[require.resolve('./my_module')]
global.my_module = require('./my_module');
});
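To see the cache-clearing step from the snippets above in isolation, here is a small self-contained sketch. The reloadModule helper and the temporary module file are made up for illustration; the demo writes a throwaway module to the temp directory, changes it, and compares a plain require with a reload.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Drop a module from require's cache and load it again from disk.
function reloadModule(modulePath) {
  const resolved = require.resolve(modulePath);
  delete require.cache[resolved];
  return require(resolved);
}

// Demo with a throwaway module in the temp directory (illustrative path).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "reload-demo-"));
const file = path.join(dir, "my_module.js");

fs.writeFileSync(file, "module.exports = { version: 1 };");
const first = require(file);

fs.writeFileSync(file, "module.exports = { version: 2 };");
const cached = require(file);     // still the cached object from the first load
const fresh = reloadModule(file); // re-read from disk

console.log(first.version, cached.version, fresh.version); // prints: 1 1 2
```

The old object stays alive wherever it is still referenced, which is why other files that required the module earlier keep seeing the old version.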
Use this:
https://github.com/remy/nodemon
Just run your app like this: nodemon yourApp.js
There should be some emphasis on what's happening, instead of just shotgunning modules at the OP. Also, we don't know that the files he is editing are all JS modules or that they are all using the "require" call. Take the following scenarios with a grain of salt, they are only meant to describe what is happening so you know how to work with it.
Your code has already been loaded and the server is running with it
SOLUTION You need to have a way to tell the server what code has changed so that it can reload it. You could have an endpoint set up to receive a signal, a command on the command line or a request through tcp/http that will tell it what file changed and the endpoint will reload it.
//using Express
var fs = require('fs');
var express = require('express');
var app = express();
app.get('/reload/:file', function (req, res) {
fs.readFile(req.params.file, function (err, buffer) {
//do stuff...
res.sendStatus(err ? 500 : 200);
});
});
Your code may have "require" calls in it which loads and caches modules
SOLUTION since these modules are cached by require, following the previous solution, you would need a line in your endpoint to delete that reference
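var moduleName = req.params.file;
// require.cache is keyed by the resolved absolute path, not the bare name
var resolvedPath = require.resolve('./' + moduleName);
delete require.cache[resolvedPath];
require(resolvedPath);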
There's a lot of caveats to get into behind all of this, but hopefully you have a better idea of what's happening and why.
What's “Live Coding”?
In essence, it's a way to alter the program while it runs, without
restarting it. The goal, however, is to end up with a program that
works properly when we (re)start it. To be useful, it helps to have an
editor that can be customized to send code to the server.
Take a look: http://lisperator.net/blog/livenode-live-code-your-nodejs-application/
You can also use PM2, an advanced production process manager for Node.js.
http://pm2.keymetrics.io/
I think node-inspector is your best bet.
Similar to how you can Live Edit Client side JS code in Chrome Dev tools, this utilizes the Chrome (Blink) Dev Tools Interface to provide live code editing.
https://github.com/node-inspector/node-inspector/wiki/LiveEdit
A simple direct solution with reference to all answers available here:
Node documentation says that fs.watch is more efficient than fs.watchFile & it can watch an entire folder.
(I just started using this, so not really sure whether there are any drawbacks)
var fs = require("fs");
fs.watch("lib", (event_type, file_name) => {
if (!file_name) return; // the file name is not provided on every platform
console.log("Deleting Require cache for " + file_name);
delete require.cache[require.resolve("./lib/" + file_name)];
});
