I'm trying to select the "Use phone / email / username" option using Puppeteer on http://tiktok.com/ (you can check out the HTML there), as shown in the code below.
At first I thought I could just do
await page.$eval( '.channel-name-2qzLW', form => form.click() );
But the issue is that Puppeteer can't find an element with that class, I think because the class name is automatically generated and so is different in the Puppeteer browser. As a result, I tried to find out how to select an element by its text, Use phone / email / username, since that is specific, but I ran into the issues outlined below.
I tried selecting the divs that contain the text element:
await page.$eval( 'div["Use phone / email / username"]', form => form.click() );
But I received an error message
Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'div["Use phone or email"]' is not a valid selector.
I've tried looking at https://www.w3schools.com/cssref/css_selectors.asp and https://www.checklyhq.com/learn/headless/basics-selectors/ for an idea of how to get the element, but I'm still not sure.
(Current code):
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Goes to tiktok website
await page.goto('https://tiktok.com');
// Clicks on Upload text
await page.$eval( '.upload-text', form => form.click() );
// issue occurs here as it can't find this element
await page.$eval( '.channel-name-2qzLW', form => form.click() );
// screenshots the webpage so I can see what it sees
await page.screenshot({ path: 'browserView.png' });
await browser.close();
})();
Since the class is an attribute, you can use an attribute substring ("wildcard") selector.
div[class*="channel-name-"]
This selector targets all div elements whose class attribute contains "channel-name-".
However, I want to point out that relying on a wildcard selector is not the optimal solution: if the fragment is not specific enough, it can match the wrong elements. Plus, even though today's machines are fast enough for this tiny bit of extra work, it's still not the most efficient approach, especially on large and complex pages.
So you can use it, but check what it actually matches.
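A minimal sketch of how the substring selector above could be used from Puppeteer. The helper names classContainsSelector and clickByClassFragment are hypothetical, and the code assumes the fragment contains no double quotes; page is expected to be a Puppeteer Page.

```javascript
// Build a CSS selector matching elements whose class attribute contains `fragment`.
// Assumption: `fragment` is a plain string without double quotes.
function classContainsSelector(tag, fragment) {
  return `${tag}[class*="${fragment}"]`;
}

// Wait until the first matching element is in the DOM, then click it.
// Returns the selector that was used, which is handy for logging.
async function clickByClassFragment(page, tag, fragment) {
  const selector = classContainsSelector(tag, fragment);
  const handle = await page.waitForSelector(selector);
  await handle.click();
  return selector;
}

// Usage inside an async function, after the upload dialog has opened:
//   await clickByClassFragment(page, 'div', 'channel-name-');
```

Waiting for the selector first, instead of calling page.$eval() right away, also covers the case where the element is rendered a moment after the previous click.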
At the following site, after entering a search phrase such as "baby" (try it!), the Puppeteer call page.mouse.down() doesn't have the same effect as clicking and holding the physical mouse: https://www.dextools.io/app/bsc
After entering a search phrase, a fake dropdown select menu appears, which is really an UL, and I am trying to click the first search result. So I use code like this
await page.mouse.move(200, 350); // let's assume this is inside the element I want
await page.mouse.down();
await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2 secs
await page.mouse.up();
The expected effect of this code is that, for the 2 seconds that Puppeteer is "holding" the mouse button down, the fake dropdown stays visible, and when Puppeteer "releases" the mouse button, the site redirects to the search result for the item selected.
This is exactly what happens when I use the physical mouse.
However, what happens with Puppeteer is, the dropdown just disappears, as if I had hit the Escape key, and the page.mouse.up() command later has no effect any more.
I am aware that Puppeteer has some quirks with respect to the mouse, keyboard, and holding and releasing buttons and modifier keys, especially when doing all of the above at once. For example, drag and drop doesn't work as expected, but none of the workarounds proposed here work for me: https://github.com/puppeteer/puppeteer/issues/1265
I cannot reproduce the issue with this test script. The link is clicked and the navigation follows:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
try {
const [page] = await browser.pages();
await page.goto('https://www.dextools.io/app/bsc', { timeout: 0 });
const input = await page.waitForSelector('.input-container input');
await input.type('baby');
const link = await page.waitForSelector('.suggestions-container.is-visible a:not(.text-sponsor)');
await link.click();
} catch (err) { console.error(err); }
Instead of two separate mouse-down and up operations, you could try this according to puppeteer docs:
// selector would uniquely identify the button on your page that you would like to click
const selector = '#dropdown-btn';
await page.click(selector, {delay: 2000})
Once you have the list element that you want to click, look for the first <a> tag inside it and click that <a> element.
According to Puppeteer's documentation, if the click triggers a navigation you should use:
const [response] = await Promise.all([
page.waitForNavigation(waitOptions),
page.click(selector, clickOptions),
]);
where selector will be a reference to the mentioned <a> tag.
A web page has a button, and Puppeteer must click that button as soon as it becomes visible. The button is not always visible, and it becomes visible for everyone at the same time, so I have to refresh constantly to find out whether it has appeared. I wrote the script below to do that:
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080})
//I am calling my pageRefresher method here
async function pageRefresher(page,browser, url) {
try {
await page.goto(url, {waitUntil: 'networkidle2'})
try {
await page.waitForSelector('#ourButton', {timeout: 10});
await page.click('#ourButton')
console.log(`clicked!`)
await browser.close()
} catch (error) {
console.log('catch2 ' + counter + ' ' + error)
counter += 1
await pageRefresher(page, browser, url)
}
}catch (error) {
console.log('catch3' + error)
await browser.close();
}
}
As you can see, my method is recursive. It goes to the page and looks for the button. If there is no button, it calls itself again to redo the same job until it finds and clicks the button.
It actually works well right now, but it is slow. While this script runs, I open the same page in desktop Chrome and refresh it manually, and I always win: I always click the button before Puppeteer does.
How can I speed up this process? A script should not lose to a human who has just manual controls like F5 button.
A script should not lose to a human who has just manual controls like F5 button.
It happens because the rules Puppeteer follows are sometimes much stricter than what we would consider a "fully loaded webpage". As a human, you can decide at a glance whether your desired element is already in the DOM (because you see it) or not (because you don't). E.g. you will see that your button is not there even if the background image is still loading, or the web fonts have not loaded yet and fallback fonts are shown. Puppeteer, however, waits for specific events before it either goes to the catch block (timeout) or grabs the desired element (waitForSelector succeeds). It really depends on the site you are visiting, but you can speed up the recognition of your desired element.
Here are some examples and ideas for achieving this.
Ways to speed up recognition of the desired element
1.) If you don't need every network connection for your task, you could speed up page loading by replacing waitUntil: 'networkidle2' with waitUntil: 'domcontentloaded', as this event usually fires earlier, and #ourButton will typically already be present in the DOM when it fires.
The possible options of page.goto/page.reload:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
You are winning over the script because networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application, or later you will need data from third-party network connections, e.g. cookies), but if it is not mandatory you will get better performance with domcontentloaded.
2.) Instead of constantly navigating to the same url you could use page.reload method in a loop, e.g.:
await page.goto(url, { waitUntil: 'domcontentloaded' })
let selectorExists = await page.$('#ourButton')
while (selectorExists === null) {
await page.reload({ waitUntil: 'domcontentloaded' })
console.log('reload')
selectorExists = await page.$('#ourButton')
}
await page.click('#ourButton')
// code goes on...
Its main benefit is that you can shorten and simplify your pageRefresher function. I also experienced better performance (I did no benchmarking, but it felt much faster than re-opening the page).
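The reload loop above runs forever if the button never appears. A hedged variant with an upper bound could look like the sketch below; reloadUntilPresent and maxReloads are hypothetical names, and page is assumed to be a Puppeteer Page that has already navigated to the target URL.

```javascript
// Reload the page until `selector` matches or `maxReloads` is reached.
// Resolves to { handle, reloads }; handle is null if the element never showed up.
async function reloadUntilPresent(page, selector, maxReloads = 100) {
  let handle = await page.$(selector);
  let reloads = 0;
  while (handle === null && reloads < maxReloads) {
    await page.reload({ waitUntil: 'domcontentloaded' });
    reloads += 1;
    handle = await page.$(selector);
  }
  return { handle, reloads };
}

// Usage inside an async function:
//   await page.goto(url, { waitUntil: 'domcontentloaded' });
//   const { handle } = await reloadUntilPresent(page, '#ourButton', 500);
//   if (handle !== null) await handle.click();
```

Returning the element handle lets the caller click it directly, so the selector does not have to be looked up a second time.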
3.) If you don't need every resource type for your task you could also speed up page loading by disabling images or css with the following script:
await page.setRequestInterception(true)
page.on('request', (request) => {
if (request.resourceType() === 'image') request.abort()
else request.continue()
})
List of resourceType-s.
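The script above only aborts images. A sketch that blocks several resource types at once, including CSS, might look like this; blockResourceTypes and its default list are assumptions chosen for illustration, and page is assumed to be a Puppeteer Page.

```javascript
// Abort every request whose resource type is in `blockedTypes`, let the rest through.
// The default list is only an example; keep whatever your page needs to render the button.
async function blockResourceTypes(page, blockedTypes = ['image', 'stylesheet', 'font']) {
  const blocked = new Set(blockedTypes);
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage inside an async function, before the first page.goto():
//   await blockResourceTypes(page);
```

If the button is rendered by a script, do not add 'script' to the list, otherwise the element may never appear.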
Try just not awaiting the goto:
page.goto(url) // no await because it doesn't have to resolve fully
await page.waitForSelector('#ourButton') // await this because we need it to be there
Some people like Promise.race for this, but this way is simpler.
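One caveat with not awaiting goto: if the navigation fails, its promise rejects with nobody listening. A minimal sketch that keeps the idea but attaches a handler is shown below; openAndWaitFor is a hypothetical helper, the 30000 ms default is an assumption, and page is assumed to be a Puppeteer Page.

```javascript
// Start the navigation without waiting for it to finish, then wait only for the element.
async function openAndWaitFor(page, url, selector, timeout = 30000) {
  // Not awaited on purpose: we do not need the navigation to resolve fully.
  const navigation = page.goto(url, { waitUntil: 'domcontentloaded' });
  // Attach a handler so a failed navigation is logged instead of becoming
  // an unhandled promise rejection.
  navigation.catch((err) => console.warn('navigation failed:', err.message));
  // This is the only thing we actually wait for.
  return page.waitForSelector(selector, { timeout });
}

// Usage inside an async function:
//   const button = await openAndWaitFor(page, url, '#ourButton');
//   await button.click();
```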
Using the page.$eval() method you can do it as short as this:
await page.goto(url);
page.$eval('button-selector', button => button.click());
By doing so, you combine searching for the desired button and clicking it into a single line. You have to await the page.goto() instruction, as the page needs to be loaded before using page.$eval().
1st arg is the selector you need to use to get your HTMLElement, in your case a button.
This HTMLElement is retrieved by running document.querySelector() with the provided selector within the page context, before being passed as the argument to the function defined in the following argument.
2nd arg is the function to be executed inside the page context, which takes the HTMLElement that matches the previous selector as its argument.
The page.$eval() instruction will throw an error if no element is found that matches the provided selector.
You can address this in two ways:
prevent the error from triggering at all by testing if your HTMLElement exists before using the page.$eval() method.
await page.goto(url);
if (await page.$('button-selector') != null) // await because page.$() returns a promise
page.$eval('button-selector', button => button.click());
an alternative using only page.$() would be :
await page.goto(url);
if ((button = await page.$('button-selector')) != null)
button.click();
Be sure to encapsulate the left part of the condition inside ( ) otherwise button value will be true or false.
catch the error when it occurs:
you could use this to determine when to reload the page
await page.goto(url);
page.$eval('button-selector', button => button.click())
.catch((err) => {
// log the error here or do some other stuff
});
Note that a try ... catch block only captures this error when the page.$eval() call is awaited; for a call that is not awaited, as in the example above, .catch() is the way to handle it.
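For comparison, here is a sketch of the awaited form. When page.$eval() is awaited, its rejection surfaces as an ordinary exception, so try ... catch applies. clickIfPresent is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Try to click the element; resolve to true on success, false otherwise.
// Note: any failure (missing element, detached page, ...) is reported as false.
async function clickIfPresent(page, selector) {
  try {
    // Awaiting is what makes the rejection catchable by try ... catch.
    await page.$eval(selector, (el) => el.click());
    return true;
  } catch (err) {
    return false;
  }
}

// Usage inside an async function, e.g. to decide when to reload:
//   while (!(await clickIfPresent(page, 'button-selector'))) {
//     await page.reload({ waitUntil: 'domcontentloaded' });
//   }
```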
For more information you could check the puppeteer API page for page.$eval()
And if you want to go further in accelerating puppeteer, I've found these tutorials really helpful:
How to speed up Puppeteer scraping with parallelization
Optimizing and Deploying Puppeteer Web Scraper
8 Tips for Faster Puppeteer Screenshots
Edit:
From your code I see you use the page.setViewport() method to set a viewport size of 1920x1080 px on your page. While it may provide a better view when the browser is shown, it has some impact on performance. It is best practice to use minimal settings when running in headless mode.
On a page I'm scraping with Puppeteer, I have a list with the same id for every li. I am trying to find and click on an element with specific text within this list. I have the following code:
await page.waitFor(5000)
const linkEx = await page.$x("//a[contains(text(), 'Shop')]")
if (linkEx.length > 0) {
await linkEx[0].click()
}
Do you have any idea how I could replace the first line with waiting for the actual text 'Shop'?
I tried await page.waitFor(linkEx), waitForSelector(linkEx) but it's not working.
Also, I would like to replace that a in the second line of code with the actual id (#activities) or something like that but I couldn't find a proper example.
Could you please help me with this issue?
page.waitForXPath is what you need here.
Example:
const puppeteer = require('puppeteer')
async function fn() {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://example.com')
// await page.waitForSelector('//a[contains(text(), "More information...")]') // ❌
await page.waitForXPath('//a[contains(text(), "More information...")]') // ✅
const linkEx = await page.$x('//a[contains(text(), "More information...")]')
if (linkEx.length > 0) {
await linkEx[0].click()
}
await browser.close()
}
fn()
Try this for id-based XPath:
"//*[@id='activities' and contains(text(), 'Shop')]"
Did you know? If you right-click an element in the Chrome DevTools "Elements" tab and select "Copy", you can copy the exact selector or XPath of the element. After that, you can switch to the "Console" tab and test the selector there with the console utilities, so you can prepare it for your Puppeteer script. E.g.: $x("//*[@id='activities' and contains(text(), 'Shop')]")[0].href should show the link you expect to click; otherwise you need to adjust the expression, or check whether there are more elements matching the same selector, etc. This may help to find more appropriate selectors.
For Puppeteer 19 and newer, waitForXPath() is obsolete. Use the xpath/ prefix with waitForSelector() instead:
await page.waitForSelector('xpath/' + xpathExpression)
In your case:
const linkEx = await page.waitForSelector('xpath///a[contains(text(), "Shop")]');
await linkEx.click();
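Combining the id-based XPath with the xpath/ prefix could be sketched as follows. The helpers xpathLiteral, xpathByIdAndText and clickByIdAndText are hypothetical, and the code assumes a Puppeteer version that supports the xpath/ prefix, as described above.

```javascript
// Quote a string for use inside an XPath 1.0 expression.
// XPath 1.0 has no escape character, so pick whichever quote the text does not contain.
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  throw new Error('text contains both quote types; build it with concat() instead');
}

// XPath for an element with the given id whose own text contains `text`.
function xpathByIdAndText(id, text) {
  return `//*[@id=${xpathLiteral(id)} and contains(text(), ${xpathLiteral(text)})]`;
}

// Wait for that element and click it, replacing the fixed 5 second wait.
async function clickByIdAndText(page, id, text) {
  const handle = await page.waitForSelector('xpath/' + xpathByIdAndText(id, text));
  await handle.click();
}

// Usage inside an async function:
//   await clickByIdAndText(page, 'activities', 'Shop');
```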
I am trying to assert a text that's seen on a media gallery overlay page. All I want is for the code to verify whether the text is present and, if so, assert that it matches the expected text.
For some reason, I keep getting failed tests. Below is the code I have written in Visual Code:
let expSuccessARMessage = "See it in Your Space (Augmented Reality) is currently only available using an AR compatible Apple device (iOS 12 or above)."
let successARMessage = browser.getText(page.pageElements.arMessage);
console.log(successARMessage);
assert(successARMessage === expSuccessARMessage, 'Success message');
What am I missing here?
Not a magician, but you should be getting browser.getText is not a function error in the console, because the getText() method is defined inside the element object, not the browser object. Read the complete API log here.
So your code should be:
let expectedText = "Your long text here"
let foundText = $(page.pageElements.arMessage).getText();
// Considering 'page.pageElements.arMessage' is a valid selector for targeted element
console.log(`Found the following text: ${foundText}`);
assert.equal(expectedText, foundText, 'This is actually the ERROR message');
I want to add to the answer that there can also be a browser object centric approach, using the webdriver protocol API. Thus, our code becomes:
let expectedText = "Your long text here"
let elementObject = browser.findElement('css selector', page.pageElements.arMessage);
// Considering 'page.pageElements.arMessage' is a valid selector for targeted element
let foundText = browser.getElementText(elementObject.ELEMENT);
console.log(`Found the following text: ${foundText}`);
assert.equal(expectedText, foundText, 'This is actually the ERROR message');
The latter approach is obsolete IMHO, and the recommended approach for WebdriverIO v5 would be using $, respectively $$ (element & elements). But wanted to give you a broader perspective.
If you defined the element in an object repository like:
get OtpDescriptionText () { return '//div[@class="otp-description"]'; }
then in order to log or assert/expect on that element you need to use it like:
let elem1 = await $(RegistratonPage.OtpDescriptionText);
console.log(await elem1.getText());
or
await expect($(RegistratonPage.OtpDescriptionText)).toHaveTextContaining('We just need to check this phone number belongs to you.')
If you don't use $, an error is thrown.
I have 2 web pages; one page shows a popup message while the other page doesn't. I have it set up as:
.expect(Selector('.classname').exists).notOk()
This is what I'm assuming:
The first page should pass because the popup message with that class should not show (it passes; makes sense to me),
but the problem comes with the 2nd page, where the popup message with the class I'm selecting is shown and the test still passes (this doesn't make sense to me because I'm writing the check to make sure it doesn't pop up/exist).
I have tried using .visible and both failed; the first page fails because it says that the classname doesn't exist. Well, that's good; that's what I want to have when the test passes, but the 2nd page fails exactly the way I want it to.
The classname that I'm trying to test with is an error message that pops up when the site is not working. Basically, I want to check whether that message pops up: if it does, fail the test; if it doesn't, pass the test.
In TestCafe, browser pop-ups such as alert and confirm are handled as native dialogs. Hence, such a pop-up won't be picked up by .visible.
You can use the following piece of code to check what kind of pop-up message is coming up.
await t
.setNativeDialogHandler(() => true);
// Write the code that triggers the pop-up here
The piece of code below will tell you what kind of pop-up message appeared (alert/confirm etc.)
const history = await t.getNativeDialogHistory();
console.log(history);
For example, if it's a confirm-type pop-up, the code below will give you what you want.
await t
.expect(history[0].type).eql('confirm');
Hope it helps you.
The following code validates that we get a confirm pop-up on the "https://devexpress.github.io/testcafe/example/" page.
import { Selector, t } from 'testcafe';
fixture `My fixture`
.page `https://devexpress.github.io/testcafe/example/`;
test('My test', async t => {
await t
.setNativeDialogHandler(() => true)
.click('#populate');
const history = await t.getNativeDialogHistory();
console.log(history);
await t
.expect(history[0].type).eql('confirm')
});
Could you provide the code you are using? It will help to troubleshoot.
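If more than one dialog can appear, checking only history[0] may not be enough. A small sketch for filtering the history by type follows; dialogsOfType is a hypothetical helper, and it assumes each history entry is an object with a type field, as used in the example above.

```javascript
// Return the history entries of one dialog type ('alert', 'confirm', ...).
// `history` is the array obtained from t.getNativeDialogHistory().
function dialogsOfType(history, type) {
  return history.filter((entry) => entry.type === type);
}

// Usage inside a TestCafe test:
//   const history = await t.getNativeDialogHistory();
//   await t.expect(dialogsOfType(history, 'confirm').length).eql(1);
```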
What I've found helpful is checking whether the element exists and is visible (or not, in your case), then asserting on that:
// Method of a page-model class; assumes Selector and t are imported from 'testcafe'
async checkForErrorMessage() {
const errorMessageEl = Selector('.overlay');
const isErrorVisible = (await errorMessageEl.exists) && (await errorMessageEl.visible);
await t.expect(isErrorVisible).notOk();
}