Puppeteer page.mouse.down() / up() not the same as clicking the physical mouse?

At the following site, after entering a search phrase such as "baby" (try it!), the Puppeteer call page.mouse.down() doesn't have the same effect as clicking and holding the physical mouse: https://www.dextools.io/app/bsc
After entering a search phrase, a fake dropdown select menu appears, which is really a UL, and I am trying to click the first search result. So I use code like this:
await page.mouse.move(200, 350); // let's assume this is inside the element I want
await page.mouse.down();
await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2 secs
await page.mouse.up();
The expected effect of this code is that, for the 2 seconds that Puppeteer is "holding" the mouse button down, the fake dropdown stays visible, and when Puppeteer "releases" the mouse button, the site redirects to the search result for the item selected.
This is exactly what happens when I use the physical mouse.
However, what happens with Puppeteer is that the dropdown just disappears, as if I had hit the Escape key, and the later page.mouse.up() call no longer has any effect.
I am aware that Puppeteer has some quirks with respect to mouse, keyboard, holding and releasing buttons and modifier keys, especially when doing all of the above at once. For example, drag and drop doesn't work as expected, but none of the workarounds proposed here work for me: https://github.com/puppeteer/puppeteer/issues/1265

I cannot reproduce the issue with this test script. The link is clicked, and the navigation follows:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
try {
const [page] = await browser.pages();
await page.goto('https://www.dextools.io/app/bsc', { timeout: 0 });
const input = await page.waitForSelector('.input-container input');
await input.type('baby');
const link = await page.waitForSelector('.suggestions-container.is-visible a:not(.text-sponsor)');
await link.click();
} catch (err) { console.error(err); }

Instead of two separate mouse-down and mouse-up operations, you could try this, according to the Puppeteer docs (the delay option is the time in milliseconds between mousedown and mouseup):
// selector should uniquely identify the element on your page that you would like to click
const selector = '#dropdown-btn';
await page.click(selector, { delay: 2000 });

Once you have the element of the list that you want to click, look for the first <a> tag inside it and perform the click on that <a> element.
Puppeteer's documentation says that if the click triggers a navigation, you should use:
const [response] = await Promise.all([
page.waitForNavigation(waitOptions),
page.click(selector, clickOptions),
]);
where selector refers to the mentioned <a> tag, and waitOptions and clickOptions are the optional option objects for page.waitForNavigation() and page.click().

Related

Puppeteer doesn't find button, doesn't recognize loaded website

This is my first time using Puppeteer, and I'm trying to simply click this button after clicking the deny-cookies button. This is my code:
await page.goto('https://myurl.com');
await page.click('a.cc-btn.cc-deny');
// await page.waitForNavigation();
await page.waitForSelector("#detailview_btn_order", {visible: true});
await page.click("#detailview_btn_order");
Clicking the deny-cookies button works like a charm. However, it seems the second button can't be identified by Puppeteer. If I don't use waitForSelector, it just says it can't find it. If I use it, I get a timeout after 30 seconds even though the website finishes loading after 5 seconds. If I uncomment waitForNavigation (regardless of what options I use), I get a timeout there, even though the site loads within seconds. What am I doing wrong? Thanks!
Can you try this:
await page.goto('https://myurl.com');
await Promise.all([
page.waitForNavigation(),
page.click('a.cc-btn.cc-deny'),
]);
const iframeElement = await page.waitForSelector("#my-iframe"); // placeholder: selector of the iframe that contains the button
const frame = await iframeElement.contentFrame();
await frame.waitForSelector("#detailview_btn_order", {visible: true});
await frame.click("#detailview_btn_order");
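Sometimes there is a race condition between a click and the navigation it triggers, which is why both are started together in Promise.all. Also, if the button lives inside an iframe, it has to be looked up in that frame (via contentFrame()) rather than on the page, which would explain why waitForSelector times out on the page.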
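The race condition can be reproduced without a browser. The sketch below uses a hypothetical FakePage object (not part of Puppeteer) whose click() fires its navigation immediately, an exaggerated version of a real navigation that completes before page.waitForNavigation() is called. Waiting after the click misses the event, while creating the waiter first inside the same Promise.all catches it, which is why the Puppeteer docs put page.waitForNavigation() first in that array.

```javascript
// Hypothetical stand-in for a Puppeteer page, reduced to what matters here:
// click() triggers a navigation before its own promise settles.
class FakePage {
  constructor() {
    this.listeners = [];
  }

  // Resolve on the next navigation, or reject after timeoutMs,
  // roughly like page.waitForNavigation().
  waitForNavigation(timeoutMs = 100) {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error('Navigation timeout')), timeoutMs);
      this.listeners.push(() => {
        clearTimeout(timer);
        resolve('navigated');
      });
    });
  }

  // Clicking "navigates" at once and notifies only those already waiting.
  async click() {
    const waiting = this.listeners;
    this.listeners = [];
    waiting.forEach((notify) => notify());
  }
}

// Click first, then wait: the navigation is already over, so the wait times out.
async function clickThenWait(page) {
  await page.click();
  return page.waitForNavigation();
}

// Register the waiter first, then click, inside one Promise.all.
async function waitAndClick(page) {
  const [navigation] = await Promise.all([page.waitForNavigation(), page.click()]);
  return navigation;
}
```

With a real page the same shape is Promise.all([page.waitForNavigation(), page.click(selector)]); how often the sequential version fails depends on how quickly the navigation happens.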

Puppeteer how to check if page is navigated and perform task if not navigated

On the login page, I'm trying to figure out whether the Google reCAPTCHA appears or not. If it does, I want to run a block of code; otherwise, navigate as usual.
await page.goto(url);
await page.waitForSelector("#username");
await page.type("#username", process.env.EMAIL);
await page.type("#password", process.env.PSWD);
await page.$eval("#signIn > div > button", (el) => el.click()) //this line sometimes triggers recaptcha
// here: wait for navigation and check whether the Google captcha appears
//then run the following code:
await page.solveRecaptchas();
await Promise.all([
page.waitForNavigation(),
page.click("#signIn"),
]);
I've tried using page.waitForNavigation, but it causes a timeout if the reCAPTCHA appears. What can I do to run the bottom block of code ONLY if Google reCAPTCHA appears?
I also tried conditionally running the block of code if recaptcha-token is present, but I checked the DOM and the reCAPTCHA element is always present; it only randomly prompts the image select. Basically, sometimes I can navigate without having to solve any captcha, and sometimes I'm prompted with the image select.
Thanks!
Maybe something like this?
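const [, navigation] = await Promise.allSettled([
page.click('#signIn > div > button'), // the click that may trigger the reCAPTCHA
page.waitForNavigation(),
]);
if (navigation.status === 'fulfilled') {
// There was a navigation: no captcha, carry on.
} else {
// There was a timeout, no navigation: the captcha probably appeared.
}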
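If this check is needed in more than one place, the same Promise.allSettled idea can be wrapped in a small helper. clickAndDetectNavigation below is a hypothetical name, not a Puppeteer API; it takes the click and the navigation wait as functions so it can be tried without a browser. It starts the navigation wait first, treats a navigation timeout as "no navigation", and still rethrows if the click itself fails.

```javascript
// Hypothetical helper: run a click and a navigation wait together and report
// whether a navigation happened, instead of letting the timeout throw.
async function clickAndDetectNavigation(click, waitForNavigation) {
  const [navigation, clickResult] = await Promise.allSettled([
    waitForNavigation(), // start listening before the click
    click(),
  ]);
  if (clickResult.status === 'rejected') {
    // A failed click is a real error, not "no navigation".
    throw clickResult.reason;
  }
  return navigation.status === 'fulfilled';
}
```

With Puppeteer, the arguments could be () => page.click('#signIn > div > button') and () => page.waitForNavigation({ timeout: 5000 }). A timeout shorter than the 30-second default reaches the captcha branch sooner, at the risk of misreading a slow login as "no navigation".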

How to speed up puppeteer?

A web page has a button, and Puppeteer must click that button as soon as possible after it becomes visible. This button is not always visible, and it becomes visible for everyone at the same time, so I have to refresh constantly to find out when it has become visible. I wrote the script below to do that:
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080})
let counter = 0 // retry counter used in the log messages below
//I am calling my pageRefresher method here
async function pageRefresher(page,browser, url) {
try {
await page.goto(url, {waitUntil: 'networkidle2'})
try {
await page.waitForSelector('#ourButton', {timeout: 10});
await page.click('#ourButton')
console.log(`clicked!`)
await browser.close()
} catch (error) {
console.log('catch2 ' + counter + ' ' + error)
counter += 1
await pageRefresher(page, browser, url)
}
}catch (error) {
console.log('catch3' + error)
await browser.close();
}
}
As you can see, my method is recursive. It goes to the page and looks for the button. If there is no button, it calls itself again to redo the same job until it finds and clicks the button.
It actually works right now, but it is slow. While running this script, I open the same page in my desktop Chrome and start refreshing it manually, and I always win: I always click the button before Puppeteer does.
How can I speed up this process? A script should not lose to a human who only has manual controls like the F5 key.
This happens because the rules Puppeteer follows are sometimes much stricter than what we consider a "fully loaded web page". As a human, you can decide at a glance whether your desired element is already in the DOM (you see it) or not (you don't see it): you will notice that your button is missing even while the background image is still loading, or while the web fonts have not arrived and the fallback fonts are shown. Puppeteer, on the other hand, waits for specific events in the background before it either goes to the catch block (timeout) or grabs the desired element (waitForSelector succeeds). It really depends on the site you are visiting, but you can speed up the recognition of your desired element.
Here are some examples and ideas for how to achieve this.
Ways to speed up recognition of the desired element
1.) If you don't need every network connection for your task, you could speed up page loading by replacing waitUntil: 'networkidle2' with waitUntil: 'domcontentloaded', as this event usually fires earlier, and if #ourButton is part of the HTML document, it will already be present in the DOM by then.
The possible options of page.goto/page.reload:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
You beat the script because networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application, or later you will need data from a 3rd-party network connection, e.g. cookies), but if it is not mandatory, you will get better performance with domcontentloaded.
2.) Instead of constantly navigating to the same url you could use page.reload method in a loop, e.g.:
await page.goto(url, { waitUntil: 'domcontentloaded' })
let selectorExists = await page.$('#ourButton')
while (selectorExists === null) {
await page.reload({ waitUntil: 'domcontentloaded' })
console.log('reload')
selectorExists = await page.$('#ourButton')
}
await page.click('#ourButton')
// code goes on...
Its main benefit is that you can shorten and simplify your pageRefresher function. I also experienced better performance (I did no benchmarking, but it felt much faster than re-opening the page).
3.) If you don't need every resource type for your task, you could also speed up page loading by blocking images or CSS with the following script:
await page.setRequestInterception(true)
page.on('request', (request) => {
// abort image and stylesheet requests, let everything else through
if (['image', 'stylesheet'].includes(request.resourceType())) request.abort()
else request.continue()
})
[source]
List of resourceType-s.
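The reload loop in point 2 keeps running forever if the button never appears. A bounded variant is sketched below; reloadUntilFound is a hypothetical helper, and the lookup and the reload are passed in as functions so the loop can be tested without a browser. With Puppeteer they would be () => page.$('#ourButton'), which resolves to null when nothing matches, and () => page.reload({ waitUntil: 'domcontentloaded' }).

```javascript
// Hypothetical helper: keep reloading until find() returns something other
// than null, but give up after maxReloads so the loop cannot run forever.
async function reloadUntilFound(find, reload, maxReloads = 100) {
  let found = await find();
  for (let attempt = 0; found === null && attempt < maxReloads; attempt++) {
    await reload();
    found = await find();
  }
  return found; // still null if the element never appeared
}
```

The caller then checks the result, e.g. const button = await reloadUntilFound(...); if (button) await button.click();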
Try just not awaiting the goto:
page.goto(url) // no await because it doesn't have to resolve fully
await page.waitForSelector('#ourButton') // await this because we need it to be there
Some people like Promise.race for this, but this way is simpler.
Using the page.$eval() method, you can shorten it to this:
await page.goto(url);
await page.$eval('button-selector', button => button.click());
By doing so, you combine searching for the desired button and clicking on it into a single line. You will have to await page.goto(), as the page needs to be loaded before you use page.$eval().
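One caveat with not awaiting page.goto(): if that navigation later rejects, for example with a navigation timeout, nothing handles the rejection, and recent Node.js versions exit on unhandled rejections by default. A sketch that keeps the same idea but attaches a no-op catch, using a hypothetical helper name gotoAndWaitForSelector:

```javascript
// Start the navigation but only wait for the element we need. The no-op
// catch keeps a later goto() failure (e.g. a navigation timeout) from
// becoming an unhandled promise rejection.
async function gotoAndWaitForSelector(page, url, selector) {
  const navigation = page.goto(url).catch(() => null);
  const element = await page.waitForSelector(selector);
  return { element, navigation };
}
```

Swallowing the error is only reasonable because the element you waited for is what matters; if you also need the navigation result, await the returned navigation promise later.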
The 1st argument is the selector you use to get your HTMLElement, in your case a button.
This HTMLElement is retrieved by running document.querySelector() with the provided selector within the page context, and it is then passed as the argument of the function given as the second argument.
The 2nd argument is the function to be executed inside the page context, which takes the HTMLElement matching the previous selector as its argument.
The page.$eval() instruction will throw an error if no element matches the provided selector.
You can address this in two ways:
prevent the error from triggering at all by testing whether your HTMLElement exists before using the page.$eval() method:
await page.goto(url);
if (await page.$('button-selector') != null) // await because page.$() returns a promise
await page.$eval('button-selector', button => button.click());
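an alternative using only page.$() would be:
await page.goto(url);
const button = await page.$('button-selector'); // null when nothing matches
if (button != null)
await button.click();
Declaring button with const avoids creating an implicit global. If you prefer the one-line form if ((button = await page.$('button-selector')) != null), be sure to wrap the assignment in ( ), otherwise button will be set to true or false.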
catch the error when it occurs;
you could use this to determine when to reload the page:
await page.goto(url);
await page.$eval('button-selector', button => button.click())
.catch((err) => {
// log the error here or do some other stuff
});
A try ... catch block also works, but only if the page.$eval() call is awaited inside the try block; without await, the promise rejects after the block has already exited, so the error is not caught.
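A try ... catch block does catch the rejection as long as page.$eval() is awaited inside the try block. The sketch below shows both the page.$() check and the awaited try ... catch; clickIfPresent and clickWithEval are hypothetical helper names, and they only rely on page.$() resolving to null and page.$eval() rejecting when nothing matches, as described above.

```javascript
// Sketch 1: page.$() resolves to null when nothing matches, so check first.
async function clickIfPresent(page, selector) {
  const element = await page.$(selector);
  if (element === null) return false;
  await element.click();
  return true;
}

// Sketch 2: page.$eval() rejects when nothing matches; because the call is
// awaited inside the try block, the catch receives that rejection.
async function clickWithEval(page, selector) {
  try {
    await page.$eval(selector, (el) => el.click());
    return true;
  } catch (err) {
    return false; // e.g. log err and reload the page
  }
}
```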
For more information, you could check the Puppeteer API page for page.$eval().
And if you want to go further in accelerating Puppeteer, I've found these tutorials really helpful:
How to speed up Puppeteer scraping with parallelization
Optimizing and Deploying Puppeteer Web Scraper
8 Tips for Faster Puppeteer Screenshots
Edit:
From your code, I see you use the page.setViewport() method to set a viewport size of 1920x1080 px on your page. While this may give a better view when the browser window is shown, it can have some impact on performance. It is best practice to use minimal settings when running in headless mode.

Wait for an xpath in Puppeteer

On a page I'm scraping with Puppeteer, I have a list with the same id for every li. I am trying to find and click on an element with specific text within this list. I have the following code:
await page.waitFor(5000)
const linkEx = await page.$x("//a[contains(text(), 'Shop')]")
if (linkEx.length > 0) {
await linkEx[0].click()
}
Do you have any idea how I could replace the first line with waiting for the actual text 'Shop'?
I tried await page.waitFor(linkEx), waitForSelector(linkEx) but it's not working.
Also, I would like to replace that a in the second line of code with the actual id (#activities) or something like that but I couldn't find a proper example.
Could you please help me with this issue?
page.waitForXPath is what you need here.
Example:
const puppeteer = require('puppeteer')
async function fn() {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://example.com')
// await page.waitForSelector('//a[contains(text(), "More information...")]') // ❌
await page.waitForXPath('//a[contains(text(), "More information...")]') // ✅
const linkEx = await page.$x('//a[contains(text(), "More information...")]')
if (linkEx.length > 0) {
await linkEx[0].click()
}
await browser.close()
}
fn()
Try this for id-based XPath:
"//*[@id='activities' and contains(text(), 'Shop')]"
Did you know? If you right-click an element in the Chrome DevTools "Elements" tab and select "Copy", you can copy the exact selector or XPath of that element. After that, you can switch to the "Console" tab and test the selector with the Chrome console utilities, so you can prepare it for your Puppeteer script. E.g.: $x("//*[@id='activities' and contains(text(), 'Shop')]")[0].href should show the link you expected to click on; otherwise you need to adjust the expression, or check whether more elements match the same selector, etc. This may help you find more appropriate selectors.
For Puppeteer 19 and newer, waitForXPath() is deprecated. Use the xpath/ prefix with waitForSelector() instead:
await page.waitForSelector('xpath/' + xpathExpression)
In your case:
const linkEx = await page.waitForSelector('xpath///a[contains(text(), "Shop")]');
await linkEx.click();
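When the text you search for comes from a variable, it has to be turned into an XPath string literal. XPath 1.0 literals have no escape sequences, so text that contains both ' and " has to be rebuilt with concat(). A small sketch of that, with hypothetical helper names xpathLiteral and xpathTextSelector, producing the same "xpath/" selector form as above:

```javascript
// XPath 1.0 string literals have no escape sequences, so text containing
// both quote characters has to be split up and rebuilt with concat().
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  const pieces = text.split('"').map((piece) => `"${piece}"`);
  return `concat(${pieces.join(`, '"', `)})`;
}

// Build a Puppeteer "xpath/" selector for a <tag> whose text contains `text`.
function xpathTextSelector(tag, text) {
  return `xpath///${tag}[contains(text(), ${xpathLiteral(text)})]`;
}
```

For example, await page.waitForSelector(xpathTextSelector('a', 'Shop')) waits for the same element as the snippet above.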

Puppeteer and dynamically added iFrame (element)

We have an AngularJS application that pops up a modal form (component) when a button is pressed.
This component loads an iFrame, which I cannot seem to access with Puppeteer.
We have tried with mainFrame:
await page.waitFor(15000);
const frame = page.mainFrame().childFrames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
The above only has one frame (the main frame/window).
We have tried with frames:
await page.waitFor(15000);
const frame = page.frames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
We have tried with contentFrame:
await page.waitForSelector('iframe', { visible: true, timeout: 2000 });
const elementHandle = await page.$('iframe');
await page.waitFor(1000);
const frame = await elementHandle.contentFrame();
With the above, elementHandle has a value, but frame is null.
We have this working with Protractor and were hoping to move to Puppeteer, but if there is no solution we will have to stick with Protractor (which has its own issues).
At the time of this answer, there was no support for out-of-process iframes (OOPIFs). To be able to work with them, you need to launch Chromium with --disable-features=site-per-process:
const browser = await puppeteer.launch({
args: ['--disable-features=site-per-process']
});
You can track the status of OOPIF support in Puppeteer's issue tracker.
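Separately from the OOPIF flag, the fixed page.waitFor(15000) sleeps in the question can be replaced by polling page.frames() until the expected frame is attached. The sketch below assumes the frame can be recognised by its name, as in the question; pollForFrame is a hypothetical helper, and recent Puppeteer versions also provide page.waitForFrame(), which is worth checking in the docs for your version. Polling only helps if the frame shows up in page.frames() at all.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll page.frames() until a frame satisfies `matches`, instead of a fixed
// long sleep; throws if nothing matches before the timeout.
async function pollForFrame(page, matches, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const frame = page.frames().find(matches);
    if (frame) return frame;
    if (Date.now() >= deadline) {
      throw new Error(`No matching frame after ${timeout} ms`);
    }
    await sleep(interval);
  }
}
```

Usage would look like const frame = await pollForFrame(page, (f) => f.name() === 'iFrameName');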
I have a similar problem: the iframe is loaded dynamically, so src=(unknown), by a JavaScript link:
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(VARİABLES,,true,,false,))
Is it possible to clone the iframe by invoking, from Puppeteer, the JavaScript that creates it? If so, you can try that.
