Casperjs slowing down looping through links - javascript

I'm learning CasperJS by making a test for my website that grabs all the links from the nav bar and loops through opening them all up and running a small test for each page (check the title, hit a search button, see if results come back, etc). I also included a "Quick Test" flag that will only check the page title before moving on to the next link. There are about 25 links total.
The issue is that somehow the script gets stuck after about 10 full tests, but works fine with quick-testing. This is the loop I'm using to open each page:
casper.each(linkList, function(self, link){
self.thenOpen(link, function(){
self.echo(link);
temp = Date.now();
this.open(urlPrefix + link);
this.then(function(){
temp = (Date.now()) - temp;
self.echo("Load time: "+temp.toString()+"ms");
switch(link){
//case statements for specific pages
// - run specialized versions of testPage()
case "Example":
testExample(this);
break;
default:
testPage(this);
break;
}
});
});
});
The testPage() and page specific functions all look something like:
function testPage(ths){
checkTitle(ths, "Page Title");
if(quickTest)
return;
ths.click('#searchButton');
casper.waitForSelectorTextChange("#results",function(){
temp = ths.evaluate(function(){
return $("tr.row").length;
});
if(temp>0)
casper.echo("Results returned");
else
casper.echo("No results returned");
});
}
The checkTitle() function is just a simple:
function checkTitle(ths, name){
temp = ths.getTitle();
casper.echo("Page Title: "+temp+" - App loads: "+(temp==name ? "PASSED" : "FAILURE"));
}
Now, if quickTest is true, the loop finishes without problems. If quickTest is false, the loop hangs indefinitely on the 12th page. Coincidentally, the 11th page is literally the same page, just with more options for the search filters. Additionally, my CasperJS script reports that the page takes 13410ms to load with quickTest=false and only 460ms with quickTest=true, which is confusing: no code between the two timestamps is skipped or added by that flag, and loading the page in IE doesn't take nearly that long.
Why is casper slowing down after looping through links?

I managed to stumble upon this page. It appears that somewhere in this process there is a memory leak. While I'm still unfamiliar with casperjs and phantomjs, I would guess it involves the this.open() bit in the loop. I've managed to get all the tests to finish by adding the following:
casper.page.close();
casper.page = casper.newPage();
So the beginning of the loop code now looks like:
casper.each(linkList, function(self, link){
self.thenOpen(link, function(){
self.echo(link);
casper.page.close();
casper.page = casper.newPage();
temp = Date.now();
this.open(urlPrefix + link);
......

Related

Creating a Loop that checks for instock items on a webpage using a javascript file

I have been trying to create a simple JavaScript program that checks for words on a specific web page, and I just want it to check every 5 seconds or so whether the words are there. I think I have the pieces of the puzzle but I just can't put them together. I am a beginner at best and don't understand why this code is not working.
This is what I have, but the check for the word and the page reload are not in sync, and the page reloads only once. It should check whether the word is there; if it is, reload the page and check again, looping every 5 seconds. The current output with the code below is "true" in an alert box twice, then one reload after 5 seconds, and then it's over.
Note:
x just stops from looping forever.
When it doesn't match any more it plays a song.
I saved this as a .js file and I am currently just testing the code within chrome dev tools.
var x = 1;
function ol(){
do {
if (document.documentElement.outerHTML.search('8"},"availability_html":"<p class=\\\\"stock out-of-stock') != -1)
{
x=x+1;
alert("true");
window.setTimeout(function ol() {
window.location.reload(true);
}, 5000);
} else {
alert("NOT FOUND!");
var snd1 = new Audio("C:\Users\DL\Desktop\bot files\BattleMetal-320bit.mp3");
function beep1()
{alert()
snd1.play()
beep1()
}
}
}while(x<3 && document.documentElement.outerHTML.search('8"},"availability_html":"<p class=\\\\"stock out-of-stock') != -1);
}
ol();
In the while condition you are checking whether x < 3, so after alerting "true" twice you are explicitly stopping the loop.
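The effect of the x < 3 guard can be reproduced without a browser. The sketch below is a minimal stand-in, not the asker's page code: `countIterations` and `stillMatches` are hypothetical names, and `stillMatches` replaces the outerHTML search with a stub that always reports a match.

```javascript
// Hypothetical reduction of the do/while loop from the question.
// `stillMatches` stands in for the outerHTML search (an assumption).
function countIterations(limit, stillMatches) {
  var x = 1;
  var iterations = 0;
  do {
    iterations += 1;
    if (stillMatches()) {
      x = x + 1; // same counter bump as in the question
    }
  } while (x < limit && stillMatches());
  return iterations;
}

// With the text always present, the body runs twice before x reaches 3.
var runs = countIterations(3, function () { return true; });
console.log(runs); // prints 2
```

Raising the limit, or dropping the counter and letting a timer drive the checks, would presumably let the check repeat more than twice.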

How can I deal with asynchronous requests involving modal popups in Casperjs?

Trying to iterate through a list of links that open modal popups, I'm running into an issue with the asynchronous nature of Javascript. I can loop through the links, and I can get Casperjs to click on all of the links. The popup opens up well (and I need to save the content of that popup). However, my code leads to Casperjs skipping every few links -- I suspect that's because of the delay. I need to be sure that every link is clicked and every popup saved. Any hint is highly appreciated!
I'm aware of Casperjs wait and waitForSelector functions, but no matter where I put them -- it still skips some popups. I suppose the reason for this behaviour is the delay, but increasing/decreasing the wait values and places where I tell casperjs to wait don't help.
this.then(function(){
x = 0;
this.each(links,function(self,link){
// I only need links that contain a certain string
if(link.indexOf('jugyoKmkName')>=0) {
var coursetitle = linktexts[x];
this.clickLabel(linktexts[x], 'a');
this.wait(2000, function() {
var coursetitleSplit = coursetitle.split(' ');
var courseid = coursetitleSplit[0];
//this logs the title and id in a file. Works perfectly
var line = courseid+' '+coursetitle+' \\n';
fs.write('/myappdirectory/alldata.txt', line, 'a');
//this logs the popup contents -- but it's completely out of sync
var courseinfo = this.getElementInfo('.rx-dialog-large').html
fs.write('/myappdirectory/'+courseid+'.html', courseinfo, 'w');
});
}
x++;
});
});
I'm logging two things here -- the link text (and some more information) in a running log file. That's working fine -- it catches every link correctly. The link text contains a unique id, which I'm using as a file name to save the popup contents. That's only working on every nth popup -- and the popup contents and the id are out of sync.
To be precise: The first 10 ids in the list are:
20000 -- saved with this id, but contains data of popup 20215
20160 -- saved with this id, but contains data of popup 20307
20211 -- saved with this id, but contains data of popup 20312
20214 ...etc (saved, but with popup from an ID way further down the list)
20215
20225
20235
20236
20307
20308
Obviously, I need the file 20000.html to save the contents of the popup with the ID 20000, 20160.html the contents of 20160, etc.
Presumably this.each(links,...) will run the callback synchronously rather than waiting for each this.wait() call to complete. Instead you'll want to wait until you've written your data to the filesystem before processing the next link. Consider this code instead:
this.then(function() {
    // Capture the casper instance here; inside processNthLink, `this` is not casper.
    var self = this;
    function processNthLink(i) {
        // Stop once every link has been handled.
        if (i >= links.length) {
            return;
        }
        var link = links[i];
        if (link.indexOf('jugyoKmkName') >= 0) {
            var coursetitle = linktexts[i];
            self.clickLabel(linktexts[i], 'a');
            self.wait(2000, function() {
                var coursetitleSplit = coursetitle.split(' ');
                var courseid = coursetitleSplit[0];
                var line = courseid+' '+coursetitle+' \\n';
                fs.write('/myappdirectory/alldata.txt', line, 'a');
                var courseinfo = self.getElementInfo('.rx-dialog-large').html;
                fs.write('/myappdirectory/'+courseid+'.html', courseinfo, 'w');
                // Only move on after this popup has been saved.
                processNthLink(i + 1);
            });
        } else {
            // Not a link we care about: skip to the next one immediately.
            processNthLink(i + 1);
        }
    }
    processNthLink(0);
});
In this case the next link will only be processed after the timeout has elapsed and the write to the filesystem has completed. If the link doesn't contain the expected string, the next link is processed immediately.
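The same one-at-a-time pattern can be shown outside CasperJS. This is a minimal sketch in plain JavaScript; `processSequentially`, `worker`, and `done` are hypothetical names, and the worker here merely records its item where the real code would click, wait, and write files.

```javascript
// Hypothetical helper: calls worker(item, done) for one item at a time and
// only advances when the worker invokes done().
function processSequentially(items, worker, onFinished) {
  function step(i) {
    if (i >= items.length) {
      onFinished();
      return;
    }
    worker(items[i], function done() {
      step(i + 1);
    });
  }
  step(0);
}

// Usage: the worker parks its `done` callback, so we can see that nothing
// else starts until the pending item is completed.
var started = [];
var pending = [];
var finished = false;
processSequentially(['20000', '20160', '20211'], function (id, done) {
  started.push(id);
  pending.push(done);
}, function () {
  finished = true;
});
console.log(started); // only the first id has started so far
pending.shift()();    // complete the first item; the second one starts now
console.log(started);
```

Because each item hands control back through `done`, the file written for an id cannot pick up the popup of a later id, which is the mismatch described in the question.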

After opening a webpage, check if it has opened before proceeding

I am trying to create a Chrome extension that, with a click of a button, opens several webpages that I often visit. Currently, when clicked, it opens anywhere from 1 to 4 of the 4 webpages I want, often stopping prematurely. The code is pretty simple, so I figured it is a processing issue. For this reason I want to introduce some delay. My research suggests I should not use sleep(), so I am trying to implement code that makes my for loop wait until the page has loaded before proceeding. Here is the code:
function OpenInNewTabWinBrowser(url) {
var win = window.open(url, '_blank');
//win.focus();
}
function CheckLoading() {
return document.readyState === "interactive";
}
var websites = ['https://reddit.com', 'https://xkcd.com', 'http://poorlydrawnlines.com', 'https://explosm.net'];
var MoveAlong;
for (var i = websites.length - 1; i >= 0; i--) {
OpenInNewTabWinBrowser(websites[i]);
console.log("Just opened a window!");
MoveAlong = CheckLoading();
console.log("Just checked if it was loading!");
/*while (MoveAlong == false) {
console.log("Just realized it hasn't loaded all the way!");
sleep(10);
console.log("Just woke up!");
MoveAlong = CheckLoading();
console.log("Just double checked if it had loaded!")
}
console.log("Just broke out of the while loop!")*/
}
console.log("Just finished doing everything you asked master!")
When I run the code as is, it doesn't always open every page. The commented-out section is what I tried to use as a pausing function, but with that code uncommented it only opens one page and no more. I have also tried adding console.log calls for debugging, but when I inspect the popup, the console closes itself as soon as even one window opens, and I am left with no means of seeing where the code went wrong.
I have also tried this loop and function to check for a loaded page and then unpause. This code replaced the For loop from the snippet above. It also didn't work correctly.
var i = websites.length - 1;
do {
MoveAlong = false;
OpenInNewTabWinBrowser(websites[i]);
i--;
window.onload = function() {
MoveAlong = true;
}
}
while (MoveAlong == true && i >= 0);
Any help is much appreciated: on how to properly debug, how to detect whether a website has finished loading, and how to make this extension work. I have been a partial lurker for a while but now I am trying to actively code every day. This is my first post and hopefully it will be the beginning of a fun hobby. Thank you again.

Chrome JavaScript location object

I am trying to start 3 applications from a browser using custom protocol names associated with these applications. This might look similar to other threads on Stack Overflow, but I believe they do not help in resolving this issue, so please don't close this thread just yet; it needs a different approach than those suggested in other threads.
example:
ts3server://a.b.c?property1=value1&property2=value2
...
...
to start these applications I would do
location.href = 'ts3server://a.b.c?property1=value1&property2=value2';
location.href = ...
location.href = ...
which would work in FF but not in Chrome
I figured that Chrome might be optimizing away the earlier writes, since effectively only the last change would remain.
So I did this:
function a ()
{
var apps = ['ts3server://...', 'anotherapp://...', '...'];
b(apps);
}
function b (apps)
{
if (apps.length == 0) return;
location.href = apps[0]; alert(apps[0]);
setTimeout(function (rest) {return function () {b(rest);};} (apps.slice(1)), 1);
}
But it didn't solve my problem. Only the first location.href assignment is taken into account; even though the other calls happen long enough after the first one (after changing the timeout delay to, say, 10000), the applications do not get started, although the alerts are displayed.
If I try accessing each of the URIs separately the apps get started (first I call location.href = uri1 by clicking on one button, then I call location.href = uri2 by clicking again on another button).
Replacing:
location.href = ...
with:
var form = document.createElement('form');
form.action = ...
document.body.appendChild(form);
form.submit();
does not help either, nor does:
var frame = document.createElement('iframe');
frame.src = ...
document.body.appendChild(frame);
Is it possible to do what I am trying to do? How would it be done?
EDIT:
a reworded summary
I want to start MULTIPLE applications after one click on a link or a button-like element, by launching applications associated with custom protocols. I would hold a list of links (each link uses one protocol) and try to do "location.href = link" for every item in the list. When done in a 'for' loop, this seems to be optimized to a single assignment (the last value), so I turned the function into a recursive function with a delay. That eliminates the optimization and really forces 3 distinct assignments of location.href = list[head], since the list is sliced before each call, so all the links are taken into account. This all works just fine in Mozilla Firefox, but in Google Chrome the assignments after the first one have no effect (they are probably performed but don't trigger the associated application launch).
Are you having trouble looping through the elements? if so try the for..in statement here
Or are you having trouble navigating? if so try window.location.assign(new_location);
[edit]
You can also use window.location = "...";
[edit]
OK, so I did some work, and here is what I got. In the example I open a random Ace of Spades link, which uses a custom protocol. Click here and then click on "click me". The comments show where the JSFiddle debugger found errors.

Navigating / scraping hashbang links with javascript (phantomjs)

I'm trying to download the HTML of a website that is almost entirely generated by JavaScript. So, I need to simulate browser access and have been playing around with PhantomJS. Problem is, the site uses hashbang URLs and I can't seem to get PhantomJS to process the hashbang -- it just keeps calling up the homepage.
The site is http://www.regulations.gov. The default takes you to #!home. I've tried using the following code (from here) to try and process different hashbangs.
if (phantom.state.length === 0) {
if (phantom.args.length === 0) {
console.log('Usage: loadreg_1.js <some hash>');
phantom.exit();
}
var address = 'http://www.regulations.gov/';
console.log(address);
phantom.state = Date.now().toString();
phantom.open(address);
} else {
var hash = phantom.args[0];
document.location = hash;
console.log(document.location.hash);
var elapsed = Date.now() - new Date().setTime(phantom.state);
if (phantom.loadStatus === 'success') {
if (!first_time) {
var first_time = true;
if (!document.addEventListener) {
console.log('Not SUPPORTED!');
}
phantom.render('result.png');
var markup = document.documentElement.innerHTML;
console.log(markup);
phantom.exit();
}
} else {
console.log('FAIL to load the address');
phantom.exit();
}
}
This code produces the correct hashbang (for instance, I can set the hash to '#!contactus'), but it doesn't dynamically generate any different HTML, just the default page. It does, however, correctly output that hash when I call document.location.hash.
I've also tried to set the initial address to the hashbang, but then the script just hangs and doesn't do anything. For example, if I set the url to http://www.regulations.gov/#!searchResults;rpp=10;po=0 the script just hangs after printing the address to the terminal and nothing ever happens.
The issue here is that the content of the page loads asynchronously, but you're expecting it to be available as soon as the page is loaded.
In order to scrape a page that loads content asynchronously, you need to wait to scrape until the content you're interested in has been loaded. Depending on the page, there might be different ways of checking, but the easiest is just to check at regular intervals for something you expect to see, until you find it.
The trick here is figuring out what to look for - you need something that won't be present on the page until your desired content has been loaded. In this case, the easiest option I found for top-level pages is to manually input the H1 tags you expect to see on each page, keying them to the hash:
var titleMap = {
'#!contactUs': 'Contact Us',
'#!aboutUs': 'About Us'
// etc for the other pages
};
Then in your success block, you can set a recurring timeout to look for the title you want in an h1 tag. When it shows up, you know you can render the page:
if (phantom.loadStatus === 'success') {
    // poll every 300 milliseconds
    var timeoutId = window.setInterval(function () {
        // check for the title element you expect to see
        var h1s = document.querySelectorAll('h1');
        if (h1s.length === 0) {
            console.log('No H1 tags found.');
            return;
        }
        var found = false;
        // h1s is a node list, not an array, hence the
        // weird syntax here
        Array.prototype.forEach.call(h1s, function (h1) {
            if (!found && h1.textContent.trim() === titleMap[hash]) {
                found = true;
                console.log('Found H1: ' + h1.textContent.trim());
            }
        });
        if (!found) {
            console.log('Found H1 tags, but not ' + titleMap[hash]);
            return;
        }
        // we found it! stop the cycle and render
        window.clearInterval(timeoutId);
        phantom.render('result.png');
        console.log('Rendered image.');
        phantom.exit();
    }, 300);
}
The above code works for me. But it won't work if you need to scrape search results - you'll need to figure out an identifying element or bit of text that you can look for without having to know the title ahead of time.
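For pages such as search results, where the title is not known in advance, the check can be passed in as a function. The following is a rough sketch rather than PhantomJS API: `pollUntil` and its options are hypothetical names, and the timer functions are injectable only so the demo can be driven by hand.

```javascript
// Hypothetical helper: re-runs `predicate` on a timer until it returns true
// or `maxAttempts` checks have failed.
function pollUntil(predicate, onReady, onTimeout, options) {
  var opts = options || {};
  var intervalMs = opts.intervalMs || 300;
  var maxAttempts = opts.maxAttempts || 20;
  var setTimer = opts.setTimer || setInterval;
  var clearTimer = opts.clearTimer || clearInterval;
  var attempts = 0;
  var id = setTimer(function () {
    attempts += 1;
    if (predicate()) {
      clearTimer(id);
      onReady(attempts);
    } else if (attempts >= maxAttempts) {
      clearTimer(id);
      onTimeout(attempts);
    }
  }, intervalMs);
  return id;
}

// Demo with a hand-driven timer: the "content" appears on the third check.
var tick = null;
var stopped = false;
var checks = 0;
var readyAfter = 0;
pollUntil(
  function () { checks += 1; return checks >= 3; },
  function (n) { readyAfter = n; },
  function () { readyAfter = -1; },
  {
    setTimer: function (fn) { tick = fn; return 1; },
    clearTimer: function () { stopped = true; }
  }
);
tick(); tick(); tick();
console.log(readyAfter, stopped); // prints: 3 true
```

In a real page the predicate might be something like `function () { return document.querySelector('.result-row') !== null; }`, where `.result-row` is a made-up selector standing in for whatever element only appears once the results have loaded. The attempt cap keeps the script from hanging forever if that element never shows up.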
Edit: Also, it looks like the newest version of PhantomJS now triggers an onResourceReceived event when it gets new data. I haven't looked into this, but you might be able to bind a listener to this event to achieve the same effect.
