Puppeteer refresh until selector value change? [duplicate] - javascript

This question already has an answer here:
How can I make a monitoring function to wait for an html element in puppeteer
(1 answer)
Closed 2 years ago.
I'm looking to have Puppeteer refresh until a class I want to sign up for opens up. I plan on having the page refresh every 30 seconds or so until the selector to select the class appears. I've gotten far enough to sign in and get to the final page, but I can't wrap my head around the repeating part. A normal JS function throws an error.
await page2.waitForNavigation()
await page2.waitFor(4000)
await page2.select('select[name="country"]', "USA")
await page2.click('#searchZip');

async function findClass() {
  await page2.waitForNavigation()
  await page2.waitFor(4000)
  await page2.click('#showAvailableOnly');
  await page2.evaluate(() => {
    let el = document.querySelector(".dataTables_empty")
    let result = el.innerText
    if (result === "No matching records found") {
      console.log("Could not find any open test centers. Waiting 60 seconds to refresh.");
      // I want to have puppeteer wait 30 seconds here, and then run the
      // code after findClass() again, until the result changes.
    } else {
      console.log("Found opening!");
    }
  })
}
I would appreciate any help, as I am pretty new to puppeteer, or nodejs in general. Thank you so much!

In Puppeteer there is a built-in function you may use:
await page.waitFor(30000);
or, as a second solution, you can define your own delay helper:
function delay(time) {
  return new Promise(function(resolve) {
    setTimeout(resolve, time)
  });
}
then call it
await delay(30000);
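Putting the pieces together, the refresh-and-check loop the question asks for can be sketched as a generic polling helper. This is a minimal sketch: pollUntil and checkOnce are hypothetical names, not Puppeteer API; checkOnce would wrap the page reload and the .dataTables_empty check.

async function pollUntil(checkOnce, intervalMs, maxAttempts) {
  // Retry checkOnce() every intervalMs until it resolves to true,
  // giving up after maxAttempts tries.
  const delay = (time) => new Promise((resolve) => setTimeout(resolve, time));
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await checkOnce()) return true;  // e.g. an opening was found
    await delay(intervalMs);             // wait before the next refresh
  }
  return false;                          // still nothing after maxAttempts
}

With Puppeteer, checkOnce could reload the page and return true once .dataTables_empty no longer says "No matching records found".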

Related

page.evaluate runs for hours

I am trying to crawl the contents of a website and it works most of the time (finishes in minutes) but sometimes it takes up to 8 hours. I've managed to pinpoint the issue and it is fired in the page.evaluate part. Looking at the website with headless: false, it just loads infinitely (after a click). Also, if I manually try "document.querySelector" on that page that is stuck loading, it works for me.
The code is the following:
console.log("Test");
let value = await page.evaluate((sel) => {
  let element = document.querySelector(sel);
  return element ? element.innerHTML : null;
}, selector);
console.log("Test2");
What can I do to prevent it from running that long (I would try to setup some kind of timeout system for this case)?
Or how could I track the time while the code is in this part? The code immediately after this part never runs (only after hours) probably because of the await.
You should await page.waitForSelector(sel, { timeout: 30000 }) first, then evaluate; waitForSelector throws if the selector doesn't appear within the timeout, instead of hanging forever.
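If the page itself can hang indefinitely, another option is to bound the evaluate call yourself by racing it against a timer. A sketch, assuming nothing beyond standard promises (withTimeout is a hypothetical helper, not part of Puppeteer's API):

function withTimeout(promise, ms, label) {
  // Reject with an error if `promise` hasn't settled within `ms` milliseconds.
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(label + ' timed out after ' + ms + 'ms')), ms)
    ),
  ]);
}

// Usage in the snippet above would look like:
// const value = await withTimeout(page.evaluate(/* ... */), 30000, 'evaluate');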

Close the page after certain interval [Puppeteer]

I have used puppeteer for one of my projects to open webpages in headless chrome, do some actions and then close the page. These actions, however, are user dependent. I want to attach a lifetime to the page, where it closes automatically after, say 30 minutes, of opening irrespective of whether any action is performed or not.
I have tried the setTimeout() functionality of Node.js but it didn't work (or I just couldn't figure out how to make it work).
I have tried the following:
const puppeteer = require('puppeteer-core');
const browser = await puppeteer.connect({browserURL: browser_url});
const page = await browser.newPage();
// timer starts ticking here upon creation of new page (maybe in a subroutine and not block the main thread)
/**
..
Do something
..
*/
// timer ends and closePage() is triggered.
const closePage = (page) => {
if (!page.isClosed()) {
page.close();
}
}
But this gives me the following error:
Error: Protocol error: Connection closed. Most likely the page has been closed.
Your provided code should work as expected. Are you sure the page is still open after the timeout and it is indeed the same page?
You can try this wrapper for opening pages and closing them correctly.
// since it is async it won't block the eventloop.
// using `await` will allow other functions to execute.
async function openNewPage(browser, timeoutMs) {
  const page = await browser.newPage()
  setTimeout(async () => {
    // you want to use try/catch to avoid unhandled promise rejections.
    try {
      if (!page.isClosed()) {
        await page.close()
      }
    } catch (err) {
      console.error('unexpected error occurred when closing page.', err)
    }
  }, timeoutMs)
  return page // without this, the caller below would get undefined
}
// use it like so.
const browser = await puppeteer.connect({browserURL: browser_url});
const min30Ms = 30 * 60 * 1000
const page = await openNewPage(browser, min30Ms);
// ...
The above only closes the tabs in your browser. For closing the Puppeteer instance you would have to call browser.close(), which may be what you want?
page.close returns a promise so you need to define closePage as an async function and use await page.close(). I believe #silvan's answer should address the issue, just make sure to replace if condition
if(page.isClosed())
with
if(!page.isClosed())

Why does my second firebase query take much more time than the first one? [duplicate]

This question already exists:
Why my second firebase query is very slow?
Closed 3 years ago.
I have a Nuxt app where I initialize the Firebase application. After that, I have a page where I load some data with a query. It looks like this:
if (this.api.configs && this.api.configs.dynamic_rule) {
  console.log("inside if " + (Date.now() / 1000))
  this.unsubscribe = await this.api
    .configs
    .dynamic_rule
    .getQuery()
    .onSnapshot((snapshot) => {
      console.log("got data " + (Date.now() / 1000))
      let changes = snapshot.docChanges()
      // Checking if it's just rule activation
      if (changes.length !== 1 || changes[0].type !== "modified") {
        this.initialize(snapshot)
      }
    })
}
This query brings back just 3 simple records from Firebase.
When I get to the page for the first time, my query runs in ~300ms.
After that, every time I refresh the page, my query takes ~4s.
I can see this from the console prints.
If I run the same query directly from the browser console, it's always fast.
Does anybody have an idea what am I doing wrong?
(I know about cold start issue, but this is kind of warm start)

Clicking on "load more" button via puppeteer

I am new to JS. I need to parse comments from Instagram, but first I need to load them all. I am using Puppeteer in Node.js, so I wrote this code:
await page.evaluate(() => {
  while (document.querySelector('main').querySelector('ul').querySelector('button'))
    document.querySelector('main').querySelector('ul').querySelector('button').click()
})
It does nothing and starts an endless loop. I tried to add a timeout inside the loop and so on...
I expect that the code will check if this button exists and, if true, click() it while it exists, loading more and more comments.
I can't figure out what I'm doing wrong.
Have a look at my answer to a question very similar to this one here:
Puppeteer / Node.js to click a button as long as it exists -- and when it no longer exists, commence action
You should be able to apply it to finding and continually clicking on your "load more" button.
Instead of using a while() loop, you can use setInterval() to slow down each iteration to a more manageable pace while you load the comments:
await page.evaluate(async () => {
  await new Promise((resolve, reject) => {
    const interval = setInterval(() => {
      const button = document.querySelector('main ul button');
      if (button !== null) {
        button.click();
      } else {
        clearInterval(interval);
        resolve();
      }
    }, 100);
  });
});

Get Nightmare to wait for next page load after clicking link

I'm using nightmare.js to scrape public records and am just trying to get the scraper to wait for the next page to load. I'm crawling search results, pressing a next button to (obviously) get to the next page. I can't use nightmare.wait(someConstTime) to accurately wait for the next page to load because sometimes someConstTime is shorter than the time it takes for the next page to load (although it's always under 30 seconds). I also can't use nightmare.wait(selector) because the same selectors are always present on all result pages. In that case nightmare basically doesn't wait at all because the selector is already present (on the page I already scraped), so it will proceed to scrape the same page several times unless the new page loads before the next loop.
How can I conditionally wait for the next page to load after I click on the next button?
If I could figure out how - I would compare the "Showing # to # of ## entries" indicator of the current page (currentPageStatus) to the last known value (lastPageStatus) and wait until they're different (hence the next page loaded).
(ignore that the example image only has one search result page)
I'd do that using this code from https://stackoverflow.com/a/36734481/3491991 but that would require passing lastPageStatus into deferredWait (which I can't figure out).
Here's the code I've got so far:
// Load dependencies
//const { csvFormat } = require('d3-dsv');
const Nightmare = require('nightmare');
const fs = require('fs');
var vo = require('vo');
const START = 'http://propertytax.peoriacounty.org';
var parcelPrefixes = ["01","02","03","04","05","06","07","08","09","10",
"11","12","13","14","15","16","17","18","19"]
vo(main)(function(err, result) {
if (err) throw err;
});
function* main() {
  var nightmare = Nightmare(),
      currentPage = 0,
      isLastPage = false;
  // Go to Peoria Tax Records Search
  try {
    yield nightmare
      .goto(START)
      .wait('input[name="property_key"]')
      .insert('input[name="property_key"]', parcelPrefixes[0])
      // Click search button (#btn btn-success)
      .click('.btn.btn-success')
  } catch (e) {
    console.error(e)
  }
  // Get parcel numbers ten at a time
  try {
    yield nightmare
      .wait('.sorting_1')
    isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
    while (!isLastPage) {
      console.log('The current page should be: ', currentPage); // Display page status
      try {
        const result = yield nightmare
          .evaluate(() => {
            return [...document.querySelectorAll('.sorting_1')]
              .map(el => el.innerText);
          })
        // Save property numbers
        // fs.appendFile('parcels.txt', result, (err) => {
        //   if (err) throw err;
        //   console.log('The "data to append" was appended to file!');
        // });
      } catch (e) {
        console.error(e);
        return undefined;
      }
      yield nightmare
        // Click next page button
        .click('.paginate_button.next')
      // ************* THIS IS WHERE I NEED HELP *************** BEGIN
      // Wait for next page to load before continuing the while loop
      try {
        const currentPageStatus = yield nightmare
          .evaluate(() => {
            return document.querySelector('.dataTables_info').innerText;
          })
        console.log(currentPageStatus);
      } catch (e) {
        console.error(e);
        return undefined;
      }
      // ************* THIS IS WHERE I NEED HELP *************** END
      currentPage++;
      isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
    }
  } catch (e) {
    console.error(e)
  }
  yield nightmare.end();
}
I had a similar issue that I managed to fix. Basically I had to navigate to a search page, select the '100 per page' option and then wait for the refresh. Only problem was, it was a crapshoot as to whether a manual wait time allowed the AJAX to fire and repopulate with more than 10 results (the default).
I ended up doing this:
nightmare
  .goto(url)
  .wait('input.button.primary')
  .click('input.button.primary')
  .wait('#searchresults')
  .select('#resultsPerPage', "100")
  .click('input.button.primary')
  .wait('.searchresult:nth-child(11)')
  .evaluate(function() {
    ...
  })
  .end()
With this, the evaluate won't fire until it detects at least 11 divs with the class of .searchresult. Given that the default is 10, it has to wait for the reload for this to complete.
You could extend this to scrape the total number of available results from the first page to ensure that there are - in my case - more than 10 available. But the foundation of the concept works.
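For the original question's "compare the status text" idea, Nightmare's wait() also accepts a function plus extra arguments that are forwarded into the page context, which is exactly the hook needed to pass lastPageStatus in. A sketch (pageStatusChanged runs in the browser; the selector is the one from the question):

function pageStatusChanged(lastPageStatus) {
  // True once the "Showing # to # of ## entries" text differs from
  // the value captured before clicking Next.
  const el = document.querySelector('.dataTables_info');
  return el !== null && el.innerText !== lastPageStatus;
}

// Inside the loop, after clicking next:
// yield nightmare
//   .click('.paginate_button.next')
//   .wait(pageStatusChanged, lastPageStatus)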
From what I understand, you basically need the DOM change to be complete before you start extracting from the page being loaded.
In your case, the element to watch for DOM changes is the table with the CSS selector '#search-results'.
I think MutationObserver is what you need.
I have used Mutation Summary library which provides a nice wrapper on raw functionality of MutationObservers, to achieve something similar
var observer = new MutationSummary({
  callback: updateWidgets,
  queries: [{
    element: '[data-widget]'
  }]
});
(from the Mutation Summary tutorial)
First register MutationSummary observer when the search results are loaded.
Then, after clicking 'Next' use nightmare.evaluate to wait for mutationSummary callback to return extracted values.
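Without the extra library, a raw MutationObserver can do the same job. A sketch for the browser context, assuming '#search-results' is repopulated in place when the next page renders (waitForResultsChange is a hypothetical helper):

function waitForResultsChange(containerSelector) {
  return new Promise(function (resolve) {
    const target = document.querySelector(containerSelector);
    const observer = new MutationObserver(function (mutations) {
      observer.disconnect();       // stop observing after the first batch
      resolve(mutations.length);   // the subtree changed: new rows arrived
    });
    observer.observe(target, { childList: true, subtree: true });
  });
}

This could be awaited inside nightmare.evaluate after clicking Next, resolving only once the results table actually mutates.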
