How do I tell puppeteer when the page is fully loaded? - javascript

I have some things which load asynchronously, so I need puppeteer to wait until they are finished loading.
I've tried waiting for a variable to be set
await page.waitForFunction('window.exampleLoaded === true');
I've tried waiting for an element to appear
await page.waitForSelector('#complete');
And I put this code directly in the javascript for the page:
window.exampleLoaded = true;
document.body.insertAdjacentHTML('beforeend','<div id="complete"></div>');
But puppeteer just times out; those waitFor calls never resolve.
If I just open the page in the browser, both window.exampleLoaded === true and document.querySelector('#complete')!=null are true.
EDIT: If I put them at the very top of the code, then the awaits resolve. But not if they're triggered later.
What am I doing wrong?

The problem was something else on the page, but this code was very helpful in figuring out the problem:
page
  .on('console', message =>
    console.log(`${message.type().substr(0, 3).toUpperCase()} ${message.text()}`))
  .on('pageerror', ({ message }) => console.log(message))
  .on('response', response =>
    console.log(`${response.status()} ${response.url()}`))
  .on('requestfailed', request =>
    console.log(`${request.failure().errorText} ${request.url()}`))

Related

Output execution time for a Playwright step with AJAX payload

I am trying to dump out a few key measurements to console when my test runs, rather than getting them from the reporter output, but I can't see how to grab the time taken for the last step to execute. Here's a simplified version based on the docs for request.timing() but I don't think that what I'm doing is classed as a request:
const { test, expect } = require('@playwright/test');
test('ApplicationLoadTime', async ({ page }) => {
  // Wait for applications to load
  await page.waitForSelector('img[alt="Application"]');
  // Not working! - get time for step execution
  const [fir] = await Promise.all([
    page.click('text=Further information requested'),
    page.waitForSelector('img[alt="Application"]')
  ]);
  console.log(fir.timing());
});
The click on "Further information requested" causes the page to be modified based on an AJAX call in the background and the appearance of the Application img tells me it's finished. Is this possible or do I need to rely on the reports instead?
fir is going to be undefined in your code, as page.click() doesn't return anything. You need to wait for the request whose timing you're interested in; use page.waitForEvent('requestfinished') or waitForNavigation:
const { test, expect } = require('@playwright/test');
test('ApplicationLoadTime', async ({ page }) => {
  // Wait for applications to load
  await page.waitForSelector('img[alt="Application"]');
  const [fir] = await Promise.all([
    // Wait for the request
    page.waitForEvent('requestfinished', r => r.url() == '<url of interest>'),
    page.click('text=Further information requested'),
    page.waitForSelector('img[alt="Application"]')
  ]);
  console.log(fir.timing());
});
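Once the request object is available, the values in request.timing() can be turned into durations. The following is a minimal sketch, assuming the documented shape of Playwright's timing object (startTime is an absolute timestamp, the other fields are millisecond offsets relative to startTime, and -1 means "not available"). summarizeTiming is a hypothetical helper name, not part of Playwright:

```javascript
// Hypothetical helper: turn a Playwright request.timing() object into durations.
// Assumes fields other than startTime are millisecond offsets from startTime,
// with -1 meaning "not available".
function summarizeTiming(timing) {
  const available = (value) => typeof value === 'number' && value >= 0;
  return {
    // Time from sending the request to the first response byte.
    waitingMs:
      available(timing.requestStart) && available(timing.responseStart)
        ? timing.responseStart - timing.requestStart
        : null,
    // Time from the start of the request to the last response byte.
    totalMs: available(timing.responseEnd) ? timing.responseEnd : null,
  };
}
```

Inside the test it could be used as console.log(summarizeTiming(fir.timing())), which prints only the two derived numbers instead of the whole timing object.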

How to speed up puppeteer?

A web page has a button, and puppeteer must click it as soon as it becomes visible. The button is not always visible, and it becomes visible for everyone at the same time, so I have to refresh constantly to detect when it appears. I wrote the script below to do that:
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox']
});
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080})
//I am calling my pageRefresher method here
async function pageRefresher(page,browser, url) {
  try {
    await page.goto(url, {waitUntil: 'networkidle2'})
    try {
      await page.waitForSelector('#ourButton', {timeout: 10});
      await page.click('#ourButton')
      console.log(`clicked!`)
      await browser.close()
    } catch (error) {
      console.log('catch2 ' + counter + ' ' + error)
      counter += 1
      await pageRefresher(page, browser, url)
    }
  }catch (error) {
    console.log('catch3' + error)
    await browser.close();
  }
}
As you can see, my method is recursive. It goes to the page and looks for the button. If there is no button, it calls itself again and repeats the same job until it finds and clicks the button.
It actually works right now, but it is slow. While the script runs, I open the same page in desktop Chrome and refresh it manually, and I always win: I always click the button before puppeteer does.
How can I speed up this process? A script should not lose to a human who has just manual controls like F5 button.
"A script should not lose to a human who has just manual controls like F5 button."
This happens because the conditions puppeteer waits for are sometimes much stricter than what we would consider a "fully loaded webpage". As a human, you can decide at a glance whether the desired element is there: you can see that the button is missing even while a background image is still loading, or while the web fonts have not arrived and the fallback fonts are shown. Puppeteer, however, waits for specific events before it either goes to the catch block (timeout) or grabs the desired element (waitForSelector succeeds). How much this matters depends on the site you are visiting, but you can speed up how quickly the desired element is recognized.
Here are some examples and ideas for how you can achieve this.
Ways to speed up recognition of the desired element
1.) If you don't need every network connection for your task, you could speed up page loading by replacing waitUntil: 'networkidle2' with waitUntil: 'domcontentloaded', as this event usually fires earlier, and #ourButton should already be present in the DOM when it does.
The possible waitUntil options of page.goto/page.reload:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
You are beating the script because networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application, or you will later need data from third-party network connections, such as cookies), but if it is not mandatory you will get better performance with domcontentloaded.
2.) Instead of constantly navigating to the same URL, you could use the page.reload method in a loop, e.g.:
await page.goto(url, { waitUntil: 'domcontentloaded' })
let selectorExists = await page.$('#ourButton')
while (selectorExists === null) {
  await page.reload({ waitUntil: 'domcontentloaded' })
  console.log('reload')
  selectorExists = await page.$('#ourButton')
}
await page.click('#ourButton')
// code goes on...
Its main benefit is that it shortens and simplifies your pageRefresher function. I also experienced better performance: I did no benchmarking, but it felt much faster than re-opening the page.
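A reload loop like this runs forever if the button never shows up, so it may be worth capping the number of attempts. Here is a minimal sketch under that assumption; reloadUntilSelector and maxAttempts are hypothetical names, while goto, reload and $ are the regular Page methods used above:

```javascript
// Hypothetical helper: reload until `selector` exists or attempts run out.
// `page` is expected to be a Puppeteer Page (goto, reload and $ are Page methods).
async function reloadUntilSelector(page, url, selector, maxAttempts = 100) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const handle = await page.$(selector);
    if (handle !== null) return { handle, attempt };
    // Do not reload after the final failed check.
    if (attempt < maxAttempts) {
      await page.reload({ waitUntil: 'domcontentloaded' });
    }
  }
  return null; // Gave up: the element never appeared.
}
```

The caller can then click the returned handle, or decide what to do when the result is null (for example, close the browser and report the failure).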
3.) If you don't need every resource type for your task, you could also speed up page loading by disabling images or CSS with the following script:
await page.setRequestInterception(true)
page.on('request', (request) => {
  if (request.resourceType() === 'image') request.abort()
  else request.continue()
})
The possible resourceType values are listed in the Puppeteer documentation.
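The same kind of handler can cover several resource types at once. A minimal sketch, where BLOCKED_TYPES and blockHeavyResources are hypothetical names and the chosen set of types is only an assumption about what the page can do without:

```javascript
// Resource types to skip; adjust to what the page really needs (assumption).
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Request handler for page.on('request', ...) once interception is enabled.
function blockHeavyResources(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    return request.abort();
  }
  return request.continue();
}

// Usage (inside an async function, with a Puppeteer page):
//   await page.setRequestInterception(true);
//   page.on('request', blockHeavyResources);
```

Blocking stylesheets can change whether an element counts as visible, so if the click starts failing, removing 'stylesheet' from the set is the first thing to try.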
Try just not awaiting the goto:
page.goto(url) // no await because it doesn't have to resolve fully
await page.waitForSelector('#ourButton') // await this because we need it to be there
Some people like Promise.race for this but this way is simpler
Using the page.$eval() method you can do it as short as this:
await page.goto(url);
page.$eval('button-selector', button => button.click());
This combines finding the desired button and clicking it into a single line. You have to await page.goto() because the page needs to be loaded before page.$eval() is used.
The 1st argument is the selector used to find your HTMLElement, in your case a button.
The element is retrieved by running document.querySelector() with the provided selector within the page context, and is then passed as the argument to the function given as the 2nd argument.
The 2nd argument is the function to execute inside the page context; it takes the HTMLElement that matches the selector as its argument.
page.$eval() will throw an error if no element matches the provided selector.
You can address this in two ways:
Prevent the error from occurring at all by testing whether your HTMLElement exists before calling page.$eval():
await page.goto(url);
if (await page.$('button-selector') != null) // await because page.$() returns a promise
  await page.$eval('button-selector', button => button.click());
An alternative using only page.$() would be:
await page.goto(url);
let button; // declared up front so the assignment below does not create a global
if ((button = await page.$('button-selector')) != null)
  await button.click();
Be sure to wrap the assignment on the left side of the condition in ( ); otherwise button will hold true or false instead of the element handle.
Catch the error when it occurs. You could use this to determine when to reload the page:
await page.goto(url);
page.$eval('button-selector', button => button.click())
  .catch((err) => {
    // log the error here or do some other stuff
  });
Note that a try ... catch block only captures the error from page.$eval() if the call is awaited; without await, the rejected promise has to be handled with .catch() as in the example above.
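A try ... catch block does capture the rejection as long as the page.$eval() call is awaited inside it. A minimal sketch of that variant; clickIfPresent is a hypothetical helper name:

```javascript
// Hypothetical helper: click the element if it exists and report whether it did.
// page.$eval() rejects when no element matches; awaiting it lets try...catch see that.
async function clickIfPresent(page, selector) {
  try {
    await page.$eval(selector, (element) => element.click());
    return true;
  } catch (error) {
    // Note: this also swallows errors thrown by the click itself.
    return false;
  }
}
```

The boolean result can be used to decide whether to reload the page and try again.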
For more information, you can check the puppeteer API page for page.$eval().
And if you want to go further in speeding up puppeteer, I've found these tutorials really helpful:
How to speed up Puppeteer scraping with parallelization
Optimizing and Deploying Puppeteer Web Scraper
8 Tips for Faster Puppeteer Screenshots
Edit:
From your code I see you use the page.setViewport() method to set a viewport size of 1920x1080 px on your page. While it may provide a better view when the browser is shown, it will have some impact on performance. It is best practice to use minimal settings when running in headless mode.

How to waitFor when page refreshes in Puppeteer?

I have an app I'm working with that is behaving like this... You visit a url /refresh, and it loads the page with a loader/spinner/bar showing for like 5 seconds, then it refreshes the page after it's done. It does this so it can load the latest data that was computed during /refresh.
Right now I am just setting a timeout longer than the loader will most likely stay around, but this is brittle because a bad network connection could put it over the line.
How can I instead "watch" for when the refresh happens? What technique would you recommend. It seems to start to get hairy pretty fast.
Into the nitty gritty, when the loader is showing, when it finishes it is gone for like a half a second before the page reload. So I can't just wait til the loader is gone. It seems like I need to keep some sort of state variable around in the DOM like in localStorage, but can't pinpoint it. Would love some help.
Well, you could "watch" for the element that displays the data using page.$(selector) (or page.waitForSelector(selector), which waits until the element appears), or, if there is no such element, you could wait for the response to a specific request:
const waitForResponse = (page, url) => {
  return new Promise(resolve => {
    page.on("response", function callback(response){
      if (response.url() === url) {
        resolve(response);
        page.removeListener("response",callback)
      }
    })
  })
};
const res = await waitForResponse(page,"url of the request you want to wait for");
Wait for Network request before continuing process
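A hand-rolled wait like this never settles if the response does not arrive, so a timeout may be worth adding. A minimal sketch under that assumption; waitForResponseUrl and timeoutMs are hypothetical names, and the page only needs on() and off():

```javascript
// Hypothetical helper: resolve with the first response whose URL matches,
// or reject after `timeoutMs`. `page` only needs on() and off(), which a
// Puppeteer Page provides.
function waitForResponseUrl(page, url, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      page.off('response', onResponse);
      reject(new Error(`No response from ${url} within ${timeoutMs} ms`));
    }, timeoutMs);

    function onResponse(response) {
      if (response.url() !== url) return;
      clearTimeout(timer);
      page.off('response', onResponse);
      resolve(response);
    }

    page.on('response', onResponse);
  });
}
```

Puppeteer also ships a built-in page.waitForResponse(urlOrPredicate, options) with its own timeout option, which is probably the simpler choice when nothing custom is needed.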

While loop not executing inside callback

Inside the catch of a promise
.catch((message) => {
  console.log(message)
  var newUrl = url
  chrome.tabs.create({url: newUrl}, function(response) {
    console.log(response.status)
    status = 'loading'
    while (status == 'loading') {
      setTimeout(function() {
        console.log(response.status)
        status = response.status
      }, 3000)
    }
  })
})
I'm trying to write the catch so that it opens a new page, waits for it to finish loading, then grabs the new cookies.
I feel like I'm taking crazy pills, as this seems super straightforward. However, it's never printing out response.status.
I want it to check response.status every 3 seconds, and once the page has loaded, end the loop.
What am I doing wrong?
The way you've written it, you've made an infinite loop, which puts tons of setTimeouts on the browser's event queue.
setTimeout also puts its callback there, but marked to run after a delay of at least 3 seconds.
In practice you are telling the browser: schedule timeouts forever, and once that is finished, run console.log after 3 seconds. That never happens, because the while loop never yields control, so no timeout callback ever gets to run and status never changes.
You should probably use setInterval instead.
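A setInterval-based version could look like the sketch below. pollUntil, check and maxTries are hypothetical names; the helper only shows the polling pattern and is not tied to the chrome.tabs API:

```javascript
// Hypothetical helper: call `check` every `intervalMs` until it returns a
// truthy value, then stop the interval and resolve with that value.
function pollUntil(check, intervalMs = 3000, maxTries = 20) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    const timer = setInterval(() => {
      tries += 1;
      const result = check();
      if (result) {
        clearInterval(timer);
        resolve(result);
      } else if (tries >= maxTries) {
        clearInterval(timer);
        reject(new Error(`Condition not met after ${tries} checks`));
      }
    }, intervalMs);
  });
}
```

Because each check runs in its own timer callback, the event loop stays free between checks. In the extension code, the check would also need to re-read the tab's current status (for example via chrome.tabs.get), since the tab object passed to the chrome.tabs.create callback is presumably a snapshot that does not update on its own.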

Tab switch issue in puppeteer

I have an error during tab switch in puppeteer:
await page2.waitForSelector('#save');
await page2.click('#save'); //for saving and closing the page
await page2.waitFor(4000); // !!it will crash if I remove this line!!
const allPages = await browser.pages();
const page1 = await allPages[0];
await page1.waitFor(5000);// change nothing even if I wait 10 seconds
await page1.waitForSelector("selector")//crash if I delete 3rd line
When I run this code without the 3rd line, it triggers an error :
error: Error: Protocol error (Runtime.evaluate): Session closed.
Most likely the page has been closed.
at CDPSession.send(c:\path\node_modules\puppeteer\lib\Connection.js:172:29)
at ExecutionContext.evaluateHandle (c:\path\node_modules\puppeteer\lib\ExecutionContext.js:56:77)
at EventEmitter._document._documentPromise._contextPromise.then (c:\path\node_modules\puppeteer\lib\FrameManager.js:310:38)
And page2.waitForNavigation does not work for me, it freezes the page.
I wonder why it crashes if I don't call waitFor(4000) on page2, and whether there is a way to automate the wait so that no time is wasted.
Or maybe I should wait for page2 to close completely after clicking the button?
You may need to wait for a script to finish loading:
await page.waitForResponse(response => {
  return response.request().resourceType() === 'script';
});
Or wait for other things to load/happen/finish. The different resourceType values are: document, stylesheet, image, media, font, script, texttrack, xhr, fetch, eventsource, websocket, manifest, other.
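If clicking #save is what closes page2, another option is to wait for the page's 'close' event rather than a fixed delay. This is only a sketch under that assumption; waitForClose is a hypothetical helper name, while isClosed() and the 'close' event are part of Puppeteer's Page API:

```javascript
// Hypothetical helper: resolve once `page` has closed.
// isClosed() and the 'close' event are both part of Puppeteer's Page API.
function waitForClose(page) {
  if (page.isClosed()) return Promise.resolve();
  return new Promise((resolve) => page.once('close', resolve));
}

// Usage (inside an async function), starting the wait before the click:
//   const closed = waitForClose(page2);
//   await page2.click('#save');
//   await closed;
```

Starting the wait before the click avoids missing the event if the page closes very quickly.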
