I'm trying to iterate through a list of links that open modal popups, and I'm running into an issue with the asynchronous nature of JavaScript. I can loop through the links, and I can get CasperJS to click on all of them. Each popup opens fine (and I need to save its content). However, my code leads to CasperJS skipping every few links; I suspect that's because of the delay. I need to be sure that every link is clicked and every popup saved. Any hint is highly appreciated!
I'm aware of the CasperJS wait and waitForSelector functions, but no matter where I put them, it still skips some popups. I suppose the reason for this behaviour is the delay, but changing the wait values or the places where I tell CasperJS to wait doesn't help.
this.then(function(){
x = 0;
this.each(links,function(self,link){
// I only need links that contain a certain string
if(link.indexOf('jugyoKmkName')>=0) {
var coursetitle = linktexts[x];
this.clickLabel(linktexts[x], 'a');
this.wait(2000, function() {
var coursetitleSplit = coursetitle.split(' ');
var courseid = coursetitleSplit[0];
//this logs the title and id in a file. Works perfectly
var line = courseid+' '+coursetitle+' \\n';
fs.write('/myappdirectory/alldata.txt', line, 'a');
//this logs the popup contents -- but it's completely out of sync
var courseinfo = this.getElementInfo('.rx-dialog-large').html
fs.write('/myappdirectory/'+courseid+'.html', courseinfo, 'w');
});
}
x++;
});
});
I'm logging two things here -- the link text (and some more information) in a running log file. That's working fine -- it catches every link correctly. The link text contains a unique id, which I'm using as a file name to save the popup contents. That's only working on every nth popup -- and the popup contents and the id are out of sync.
To be precise: The first 10 ids in the list are:
20000 -- saved with this id, but contains data of popup 20215
20160 -- saved with this id, but contains data of popup 20307
20211 -- saved with this id, but contains data of popup 20312
20214 ...etc (saved, but with popup from an ID way further down the list)
20215
20225
20235
20236
20307
20308
Obviously, I need the file 20000.html to contain the contents of the popup with the ID 20000, 20160.html the contents of popup 20160, etc.
Presumably this.each(links,...) will run the callback synchronously rather than waiting for each this.wait() call to complete. Instead you'll want to wait until you've written your data to the filesystem before processing the next link. Consider this code instead:
this.then(function() {
    // Capture the casper instance here: processNthLink is called as a plain
    // function, so `this` inside it would not be the casper instance.
    var self = this;
    function processNthLink(i) {
        // Stop once every link has been handled.
        if (i >= links.length) {
            return;
        }
        var link = links[i];
        if (link.indexOf('jugyoKmkName') >= 0) {
            var coursetitle = linktexts[i];
            self.clickLabel(coursetitle, 'a');
            self.wait(2000, function() {
                var coursetitleSplit = coursetitle.split(' ');
                var courseid = coursetitleSplit[0];
                var line = courseid+' '+coursetitle+' \\n';
                fs.write('/myappdirectory/alldata.txt', line, 'a');
                var courseinfo = self.getElementInfo('.rx-dialog-large').html;
                fs.write('/myappdirectory/'+courseid+'.html', courseinfo, 'w');
                // Only now move on to the next link.
                processNthLink(i + 1);
            });
        } else {
            // Not a link we care about: skip it immediately.
            processNthLink(i + 1);
        }
    }
    processNthLink(0);
});
In this case the next link will only be processed after the timeout has elapsed and the write to the filesystem has completed. If the link doesn't contain the expected string, the next link is processed immediately.
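The same idea can be written as a small generic helper that does not depend on CasperJS. This is only a sketch: `processSequentially`, `step`, and `done` are hypothetical names, not CasperJS API, and each step is expected to call `next()` once its own asynchronous work (the wait and the file writes, in the case above) has finished.

```javascript
// Hypothetical helper (not part of CasperJS): run `step` for each item,
// starting the next item only after the current step calls next().
function processSequentially(items, step, done) {
  function run(i) {
    if (i >= items.length) {
      // Nothing left to process.
      if (done) { done(); }
      return;
    }
    step(items[i], i, function next() {
      run(i + 1);
    });
  }
  run(0);
}

// Example: items that do not match are skipped immediately;
// matching items "finish" later, when their stored callback is invoked.
var order = [];
var pending = [];
processSequentially(['a-match', 'b', 'c-match'], function (item, i, next) {
  if (item.indexOf('match') < 0) {
    next();
    return;
  }
  order.push(item);
  pending.push(next); // stands in for wait()/write completing later
}, function () {
  order.push('done');
});
```

Here `pending` just simulates the delay: nothing after 'a-match' is touched until its `next` callback is called.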
I currently only know javascript. But the thing is I looked up how to do it and some people talk about something called localStorage. I have tried this and for some reason when I jump to a new page those variables aren't kept. Maybe I am doing something wrong? I jump to a new page via
and all I want to do is select a certain image, take that image to a new page, and add it to that page.
I tried using localStorage variables, and even serializing with JSON.stringify and calling JSON.parse when reading from localStorage in the other script. It didn't seem to work for me. Is there another solution?
This is some of my code. There are two scripts.
document.querySelectorAll(".card").forEach(item => {
item.addEventListener("click", onProductClick);
})
var div;
var productImg;
var ratingElement;
var reviewCount;
var price;
function onProductClick(){
// This took a week to find out (this.id)
// console.log(this.id);
div = document.getElementById(this.id);
productImg = div.getElementsByTagName('img')[0];
ratingElement = div.getElementsByTagName('a')[2];
reviewCount = div.getElementsByTagName('a')[3]
price = div.getElementsByTagName('a')[4];
console.log(div.getElementsByTagName('a')[4]);
var productData = [div, productImg,ratingElement,reviewCount,price];
window.localStorage.setItem("price", JSON.stringify(price));
}
function TranslateProduct(){
console.log("Hello");
}
This is script 2
var productPageImage = document.getElementById("product-image");
var myData = localStorage['productdata-local'];
var value =JSON.parse(window.localStorage.getItem('price'));
console.log(value);
// function setProductPage(img){
// if(productImg != null){
// return;
// }
// console.log(window.price);
// }
To explain my thought process: in the first script I have multiple images with click event listeners. I wanted to click any given image and grab all the data about it and the product, then move that data to another script (script 2) and add it to a dynamic second page. When I print my variables they work in the first script but somehow not in the second. In the meantime I will look into cookies. Thank you!
Have you tried cookies?
You can always use cookies, but you may run into their limitations. These days, cookies are not the best choice, even though they can preserve data beyond the current window session.
Or you can make a GET request to the other page by attaching your serialized object to the URL as follows:
http://www.app.com/second.xyz?MyObject=SerializedData
That other page can then easily parse its URL and deserialize the data using JavaScript.
You can check this answer for more details: Pass javascript object from one page to other
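As a rough sketch of the URL approach, reusing the example address and `MyObject` parameter from above: the function names and the sample field values are made up for illustration. Note that only plain data (strings, numbers, arrays, plain objects) survives `JSON.stringify`; a DOM element typically serializes to `{}`, so values such as the image `src` or the price text have to be pulled out of the element first.

```javascript
// Page 1: build a URL that carries the data as a query parameter.
function buildProductUrl(baseUrl, product) {
  var url = new URL(baseUrl);
  // searchParams.set percent-encodes the JSON string for us.
  url.searchParams.set('MyObject', JSON.stringify(product));
  return url.toString();
}

// Page 2: read the parameter back out of the page URL.
// Returns null when the parameter is absent.
function readProduct(href) {
  var raw = new URL(href).searchParams.get('MyObject');
  return raw === null ? null : JSON.parse(raw);
}

// Plain values taken from the clicked card (hypothetical sample data).
var link = buildProductUrl('http://www.app.com/second.xyz', {
  img: 'images/shoe.png',
  price: '$19.99'
});
var product = readProduct(link);
```

On the second page you would call `readProduct(window.location.href)` and then build the DOM from the returned fields.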
I need to run through an array of roles and open a modal dialog (in HTML) for each. I had a problem where each next dialog gets opened before I close the previous dialog (because Google Apps Script runs asynchronously).
I have tried implementing a solution by setting a while loop for Utilities.sleep() and adding a global variable 'sleeping' that becomes false when the modal dialog is closed.
However, now only the first dialog opens and the code does not run through the full 'for' loop.
function nightStart(nightNumber, playersArray, roleList) {
var sheet = SpreadsheetApp.getActiveSpreadsheet();
var range = sheet.getRange("Controls!G3:G1000");
var wakeupOrder = [];
var sleeping;
var role;
//collecting the array to define in what order roles wake up
for (var i = 1; i<=20; i++) {
var cellValue = range.getCell(i,1).getValue();
wakeupOrder.push(cellValue);
}
//the FOR loop that I am trying to make work (open Dialog for each role)
for (index in wakeupOrder) {
role = wakeupOrder[index];
if (roleList.indexOf(role) != -1) {
sleeping = true;
roleWakeUp(role, playersArray, roleList);
do {
Utilities.sleep(2000);
//calling global sleeping parameter that is defined as FALSE in the 'nightTargetSelection' function
sleeping = PropertiesService.getScriptProperties().getProperty('sleeping');
} while (sleeping != false);
}
}
}
//below is the function that opens the modal dialog (but the server side code still keeps running).
function roleWakeUp (role, playersArray, roleList){
//I have removed all code from here for Stack Overflow. The only part that I believe is important is that it opens an HTML dialog with a form
SpreadsheetApp.getUi().showModalDialog(actionInputDlg, wakeUpText);
}
//Below function is called by the client on HTML form submission. After this form is submitted I need the next dialog to open (i.e need the Utilities.sleep to stop running
function nightTargetSelection (selected, playerNumber){
var sleeping = false;
PropertiesService.getScriptProperties().setProperty('sleeping', sleeping);
}
I need an HTML dialog to open for each 'role' in the 'wakeupOrder' array (if the role exists in 'roleList'). Each next dialog needs to open only after the submission of the previous dialog.
You want to open several dialogs in order.
When the process is finished on a dialog, you want to open next dialog.
Namely, you don't want to open the next dialog before the current job is finished.
If my understanding is correct, how about this answer?
When SpreadsheetApp.getUi().showModalDialog() is called several times in a row, each call overwrites the dialog opened by the previous one. I think that this is the reason for your issue. So here I would like to introduce a sample script in which the next dialog is opened from the current dialog. Please think of this as just one of several possible answers. The flow of the sample script is as follows:
Open a dialog by running start().
When "ok" button is clicked, the next dialog is opened by including the next job.
By this, each job can be completely done.
When all jobs were finished, done() is run and the dialog is closed.
Sample script:
When you use this script, please copy and paste "Code.gs" and "index.html" to "script" and "HTML" on your script editor, respectively. And please run start(). This sample script supposes that you are using the container-bound script of Spreadsheet.
Code.gs: Google Apps Script
// When all jobs were finished, this function is called.
function done(e) {
Logger.log(e)
}
// Open a dialog
function openDialog(jobs, i) {
var template = HtmlService.createTemplateFromFile('index');
template.jobs = JSON.stringify(jobs);
template.index = i;
SpreadsheetApp.getUi().showModalDialog(template.evaluate(), "sample");
}
// Please run this script
function start() {
var jobs = ["sample1", "sample2", "sample3"];
openDialog(jobs, 0);
}
index.html: HTML and Javascript
<div id="currentjob"></div>
<input type="button" value="ok" onclick="sample()">
<script>
var jobs = JSON.parse(<?= jobs ?>);
var index = Number(<?= index ?>);
document.getElementById("currentjob").innerHTML= "currentJob: " + jobs[index] + ", index: " + index;
function sample() {
if (index < jobs.length - 1) {
google.script.run.openDialog(jobs, index + 1); // Modified
} else {
google.script.run.withSuccessHandler(()=>google.script.host.close()).done("Done.");
}
}
</script>
References:
Class HtmlTemplate
withSuccessHandler()
Note:
This is a simple sample script. So please modify this for your situation.
If I misunderstood your question and this was not the result you want, I apologize.
I'm learning CasperJS by making a test for my website that grabs all the links from the nav bar and loops through opening them all up and running a small test for each page (check the title, hit a search button, see if results come back, etc). I also included a "Quick Test" flag that will only check the page title before moving on to the next link. There are about 25 links total.
The issue is that somehow the script gets stuck after about 10 full tests, but works fine with quick-testing. This is the loop I'm using to open each page:
casper.each(linkList, function(self, link){
self.thenOpen(link, function(){
self.echo(link);
temp = Date.now();
this.open(urlPrefix + link);
this.then(function(){
temp = (Date.now()) - temp;
self.echo("Load time: "+temp.toString()+"ms");
switch(link){
//case statements for specific pages
// - run specialized versions of testPage()
case "Example":
testExample(this);
break;
default:
testPage(this);
break;
}
});
});
});
The testPage() and page specific functions all look something like:
function testPage(ths){
checkTitle(ths, "Page Title");
if(quickTest)
return;
ths.click('#searchButton');
casper.waitForSelectorTextChange("#results",function(){
temp = ths.evaluate(function(){
return $("tr.row").length;
});
if(temp>0)
casper.echo("Results returned");
else
casper.echo("No results returned");
});
}
The checkTitle() function is just a simple:
function checkTitle(ths, name){
temp = ths.getTitle();
casper.echo("Page Title: "+temp+" - App loads: "+(temp==name ? "PASSED" : "FAILURE"));
}
Now, if quickTest is true then the loop finishes, no problems. If quickTest is false then the loop hangs indefinitely on the 12th page. Coincidentally, the 11th page is literally the same page, just with more options for the search filters. Additionally, my CasperJS script is telling me it takes the page 13410ms to load with quickTest=false and only 460ms with quickTest=true, which is confusing, since the flag does not add or skip any code between the two timestamps and loading the page in IE doesn't take nearly that long.
Why is casper slowing down after looping through links?
I managed to stumble upon this page. It appears that somewhere in this process there is a memory leak. While I'm still unfamiliar with casperjs and phantomjs, I would guess it involves the this.open() bit in the loop. I've managed to get all the tests to finish by adding the following:
casper.page.close();
casper.page = casper.newPage();
So the beginning of the loop code now looks like:
casper.each(linkList, function(self, link){
self.thenOpen(link, function(){
self.echo(link);
casper.page.close();
casper.page = casper.newPage();
temp = Date.now();
this.open(urlPrefix + link);
......
I was asked to take a look at what should be a simple problem with one of our web pages for a small dashboard web app. This app just shows some basic state info for underlying backend apps which I work heavily on. The issue is as follows:
On a page where a user can input parameters and request to view a report with the given user input, a button invokes a JS function which opens a new page in the browser to show the rendered report. The code looks like this:
$('#btnShowReport').click(function () {
document.getElementById("Error").innerHTML = "";
var exists = CheckSession();
if (exists) {
window.open('<%=Url.Content("~/Reports/Launch.aspx?Report=Short&Area=1") %>');
}
});
The page that is then opened has the following code which is called from Page_Load:
rptViewer.ProcessingMode = ProcessingMode.Remote
rptViewer.AsyncRendering = True
rptViewer.ServerReport.Timeout = CInt(WebConfigurationManager.AppSettings("ReportTimeout")) * 60000
rptViewer.ServerReport.ReportServerUrl = New Uri(My.Settings.ReportURL)
rptViewer.ServerReport.ReportPath = "/" & My.Settings.ReportPath & "/" & Request("Report")
'Set the report to use the credentials from web.config
rptViewer.ServerReport.ReportServerCredentials = New SQLReportCredentials(My.Settings.ReportServerUser, My.Settings.ReportServerPassword, My.Settings.ReportServerDomain)
Dim myCredentials As New Microsoft.Reporting.WebForms.DataSourceCredentials
myCredentials.Name = My.Settings.ReportDataSource
myCredentials.UserId = My.Settings.DatabaseUser
myCredentials.Password = My.Settings.DatabasePassword
rptViewer.ServerReport.SetDataSourceCredentials(New Microsoft.Reporting.WebForms.DataSourceCredentials(0) {myCredentials})
rptViewer.ServerReport.SetParameters(parameters)
rptViewer.ServerReport.Refresh()
I have omitted some code which builds up the parameters for the report, but I doubt any of that is relevant.
The problem is that, when the user clicks the show report button, and this new page opens up, depending on the types of parameters they use the report could take quite some time to render, and in the mean time, the original page becomes completely unresponsive. The moment the report page actually renders, the main page begins functioning again. Where should I start (google keywords, ReportViewer properties, etc) if I want to fix this behavior such that the other page can load asynchronously without affecting the main page?
Edit -
I tried doing the following, which was in a linked answer in a comment here:
$.ajax({
context: document.body,
async: true, //NOTE THIS
success: function () {
window.open(Address);
}
});
This replaced the window.open call. It seems to work, but when I checked the documentation to understand what it is doing, I found this:
The .context property was deprecated in jQuery 1.10 and is only maintained to the extent needed for supporting .live() in the jQuery Migrate plugin. It may be removed without notice in a future version.
I removed the context property entirely and it didn't seem to affect the code at all... Is it OK to use this ajax call in this way to open up the other window, or is there a better approach?
Using a timeout should open the window without blocking your main page:
$('#btnShowReport').click(function () {
document.getElementById("Error").innerHTML = "";
var exists = CheckSession();
if (exists) {
setTimeout(function() {
window.open('<%=Url.Content("~/Reports/Launch.aspx?Report=Short&Area=1") %>');
}, 0);
}
});
This is a long shot, but have you tried opening the window with a blank URL first, and subsequently changing the location?
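$("#btnShowReport").click(function () {
    // `if` must be lowercase; `If` is a syntax error in JavaScript
    if (CheckSession()) {
        // open a blank window first, then load the report into the same named window
        var pop = window.open('', 'showReport');
        pop = window.open('<%=Url.Content("~/Reports/Launch.aspx?Report=Short&Area=1") %>', 'showReport');
    }
});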
Use:
$('#btnShowReport').click(function () {
    document.getElementById("Error").innerHTML = "";
    var exists = CheckSession();
    if (exists) {
        window.location.href='<%=Url.Content("~/Reports/Launch.aspx?Report=Short&Area=1") %>';
    }
});
It will work.
I'm trying to download the HTML of a website that is almost entirely generated by JavaScript. So, I need to simulate browser access and have been playing around with PhantomJS. Problem is, the site uses hashbang URLs and I can't seem to get PhantomJS to process the hashbang -- it just keeps calling up the homepage.
The site is http://www.regulations.gov. The default takes you to #!home. I've tried using the following code (from here) to try and process different hashbangs.
if (phantom.state.length === 0) {
if (phantom.args.length === 0) {
console.log('Usage: loadreg_1.js <some hash>');
phantom.exit();
}
var address = 'http://www.regulations.gov/';
console.log(address);
phantom.state = Date.now().toString();
phantom.open(address);
} else {
var hash = phantom.args[0];
document.location = hash;
console.log(document.location.hash);
var elapsed = Date.now() - new Date().setTime(phantom.state);
if (phantom.loadStatus === 'success') {
if (!first_time) {
var first_time = true;
if (!document.addEventListener) {
console.log('Not SUPPORTED!');
}
phantom.render('result.png');
var markup = document.documentElement.innerHTML;
console.log(markup);
phantom.exit();
}
} else {
console.log('FAIL to load the address');
phantom.exit();
}
}
This code produces the correct hashbang (for instance, I can set the hash to '#!contactus') but it doesn't dynamically generate any different HTML, just the default page. It does, however, correctly output that hash when I call document.location.hash.
I've also tried to set the initial address to the hashbang, but then the script just hangs and doesn't do anything. For example, if I set the url to http://www.regulations.gov/#!searchResults;rpp=10;po=0 the script just hangs after printing the address to the terminal and nothing ever happens.
The issue here is that the content of the page loads asynchronously, but you're expecting it to be available as soon as the page is loaded.
In order to scrape a page that loads content asynchronously, you need to wait to scrape until the content you're interested in has been loaded. Depending on the page, there might be different ways of checking, but the easiest is just to check at regular intervals for something you expect to see, until you find it.
The trick here is figuring out what to look for - you need something that won't be present on the page until your desired content has been loaded. In this case, the easiest option I found for top-level pages is to manually input the H1 tags you expect to see on each page, keying them to the hash:
var titleMap = {
'#!contactUs': 'Contact Us',
'#!aboutUs': 'About Us'
// etc for the other pages
};
Then in your success block, you can set a recurring timeout to look for the title you want in an h1 tag. When it shows up, you know you can render the page:
if (phantom.loadStatus === 'success') {
    // poll every 300 milliseconds
    var timeoutId = window.setInterval(function () {
        // check for the title element you expect to see
        var h1s = document.querySelectorAll('h1');
        // querySelectorAll always returns a node list, so test its length
        if (h1s.length === 0) {
            console.log('No H1 tags found.');
            return;
        }
        var found = false;
        // h1s is a node list, not an array, hence the
        // weird syntax here
        Array.prototype.forEach.call(h1s, function(h1) {
            if (!found && h1.textContent.trim() === titleMap[hash]) {
                // we found it!
                found = true;
                console.log('Found H1: ' + h1.textContent.trim());
                phantom.render('result.png');
                console.log("Rendered image.");
                // stop the cycle
                window.clearInterval(timeoutId);
                phantom.exit();
            }
        });
        if (!found) {
            console.log('Found H1 tags, but not ' + titleMap[hash]);
        }
    }, 300);
}
The above code works for me. But it won't work if you need to scrape search results - you'll need to figure out an identifying element or bit of text that you can look for without having to know the title ahead of time.
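One caveat with polling like this is that it never stops if the expected heading never appears. A possible variant with an attempt limit is sketched below; `makePoller` and its parameters are hypothetical names of my own, not part of PhantomJS, and the check function is a stand-in for the H1 lookup.

```javascript
// Hypothetical helper: returns a tick() function meant to be called on a timer.
// tick() returns true once polling should stop (ready, or attempts exhausted).
function makePoller(isReady, onReady, onGiveUp, maxAttempts) {
  var attempts = 0;
  var finished = false;
  return function tick() {
    if (finished) { return true; }
    attempts += 1;
    if (isReady()) {
      finished = true;
      onReady();
    } else if (attempts >= maxAttempts) {
      finished = true;
      onGiveUp(attempts);
    }
    return finished;
  };
}

// Example with a stand-in check that becomes true on the third call.
var calls = 0;
var log = [];
var tick = makePoller(
  function () { calls += 1; return calls >= 3; },
  function () { log.push('ready'); },
  function (n) { log.push('gave up after ' + n); },
  10
);
```

In the script above it could be driven by the same interval, for example `var id = window.setInterval(function () { if (tick()) { window.clearInterval(id); } }, 300);`, with the render and exit calls placed in the two callbacks.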
Edit: Also, it looks like the newest version of PhantomJS now triggers an onResourceReceived event when it gets new data. I haven't looked into this, but you might be able to bind a listener to this event to achieve the same effect.