Wait for an xpath in Puppeteer - javascript

On a page I'm scraping with Puppeteer, I have a list with the same id for every li. I am trying to find and click on an element with specific text within this list. I have the following code:
await page.waitFor(5000)
const linkEx = await page.$x("//a[contains(text(), 'Shop')]")
if (linkEx.length > 0) {
  await linkEx[0].click()
}
Do you have any idea how I could replace the first line with waiting for the actual text 'Shop'?
I tried await page.waitFor(linkEx), waitForSelector(linkEx) but it's not working.
Also, I would like to replace that a in the second line of code with the actual id (#activities) or something like that but I couldn't find a proper example.
Could you please help me with this issue?

page.waitForXPath is what you need here.
Example:
const puppeteer = require('puppeteer')

async function fn() {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://example.com')
  // await page.waitForSelector('//a[contains(text(), "More information...")]') // ❌
  await page.waitForXPath('//a[contains(text(), "More information...")]') // ✅
  const linkEx = await page.$x('//a[contains(text(), "More information...")]')
  if (linkEx.length > 0) {
    await linkEx[0].click()
  }
  await browser.close()
}
fn()
Try this for id-based XPath:
"//*[@id='activities' and contains(text(), 'Shop')]"
Did you know? If you right-click an element in the Chrome DevTools "Elements" tab and select "Copy", you can copy the exact selector or XPath of that element. After that, you can switch to the "Console" tab, where the Chrome command-line API lets you test the selector's content, so you can prepare it for your Puppeteer script. E.g.: $x("//*[@id='activities' and contains(text(), 'Shop')]")[0].href should show the link you expect to click; otherwise you need to adjust how you access it, or check whether more elements match the same selector, etc. This can help you find more appropriate selectors.
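Putting the tip together, here is a minimal sketch of the wait-then-click pattern with the id-based XPath. The id 'activities' and text 'Shop' are placeholders taken from the question, not known properties of any real page:

```javascript
// Sketch: wait for a node matching an id+text XPath, then click it.
// 'activities' and 'Shop' are assumptions about the page being scraped.
const SHOP_XPATH = "//*[@id='activities' and contains(text(), 'Shop')]";

async function clickByXPath(page, xpath) {
  await page.waitForXPath(xpath);        // resolves once the node exists
  const handles = await page.$x(xpath);  // $x returns an array of handles
  if (handles.length > 0) {
    await handles[0].click();
    return true;
  }
  return false;
}
```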

For Puppeteer 19 and newer, waitForXPath() is deprecated. Use the xpath/ prefix instead:
await page.waitForSelector('xpath/' + xpathExpression)
In your case:
const linkEx = await page.waitForSelector('xpath///a[contains(text(), "Shop")]');
await linkEx.click();
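As an aside, recent Puppeteer releases (19.7+) also accept "p-selectors" such as ::-p-text() and ::-p-xpath() inside waitForSelector, which can replace the string concatenation above. A sketch, assuming a current Puppeteer version:

```javascript
// Sketch for Puppeteer >= 19.7: two equivalent ways to wait for an <a>
// whose text contains 'Shop', using the newer p-selector syntax.
const byText = 'a::-p-text(Shop)';
const byXPath = '::-p-xpath(//a[contains(text(), "Shop")])';

async function clickShopLink(page) {
  const link = await page.waitForSelector(byText); // waits, then returns the handle
  await link.click();
}
```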

Related

Puppeteer page.mouse.down() / up() not the same as clicking physical mouse?

At the following site, after entering a search phrase such as "baby" (try it!), the Puppeteer call page.mouse.down() doesn't have the same effect as clicking and holding the physical mouse: https://www.dextools.io/app/bsc
After entering a search phrase, a fake dropdown select menu appears (really a UL), and I am trying to click the first search result. So I use code like this:
await page.mouse.move(200, 350); // let's assume this is inside the element I want
await page.mouse.down();
await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2 secs
await page.mouse.up();
The expected effect of this code is that, for the 2 seconds that Puppeteer is "holding" the mouse button down, the fake dropdown stays visible, and when Puppeteer "releases" the mouse button, the site redirects to the search result for the item selected.
This is exactly what happens when I use the physical mouse.
However, what happens with Puppeteer is, the dropdown just disappears, as if I had hit the Escape key, and the page.mouse.up() command later has no effect any more.
I am aware that PPT has some quirks in respect to mouse, keyboard, holding and releasing buttons and modifier keys, especially when doing all of the above at once. For example, Drag & Drop doesn't work as expected, but none of the workarounds proposed here work for me: https://github.com/puppeteer/puppeteer/issues/1265
I cannot reproduce the issue with this test script. The link is clicked, with the following navigation:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
try {
  const [page] = await browser.pages();
  await page.goto('https://www.dextools.io/app/bsc', { timeout: 0 });
  const input = await page.waitForSelector('.input-container input');
  await input.type('baby');
  const link = await page.waitForSelector('.suggestions-container.is-visible a:not(.text-sponsor)');
  await link.click();
} catch (err) { console.error(err); }
Instead of two separate mouse-down and mouse-up operations, you could try this, per the Puppeteer docs:
// selector would uniquely identify the button on your page that you would like to click
const selector = '#dropdown-btn'
await page.click(selector, {delay: 2000})
Once you have the element of the list that you want to click, look for the first <a> tag inside it and use that handle to perform the click.
Puppeteer's documentation says that if the click triggers a navigation, you should use:
const [response] = await Promise.all([
  page.waitForNavigation(waitOptions),
  page.click(selector, clickOptions),
]);
where selector will be a reference to the mentioned <a> tag.
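Put together, a sketch of the whole sequence. The '.suggestions-container a' selector is an assumption standing in for the real first-result link on the page:

```javascript
// Sketch: click the first suggestion and wait for the resulting navigation.
// Starting waitForNavigation and the click together avoids a race where
// the navigation finishes before the wait is registered.
async function clickFirstSuggestion(page) {
  const link = await page.waitForSelector('.suggestions-container a'); // hypothetical selector
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    link.click(),
  ]);
  return response;
}
```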

How do I make puppeteer click on link?

I want to make a scraper with puppeteer, that opens a site, uses its search bar and opens the first link.
That is the code:
const puppeteer = require('puppeteer');
(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle2'});
  await page.click('[name=query]');
  await page.keyboard.type("(Weapon)");
  await page.keyboard.press('Enter');
  await page.waitForSelector('div[class="search-results"]', {timeout: 100000});
})();
The problem is I can't make it open the first link from the search results, I tried to use page.click() But all of the search results are the same except the URL.
What I want to know is how can I make it open the first link from search results.
There are several ways to solve this. I recommend experimenting with them a bit, so you learn different ways of doing it.
await page.click('.search-results a');
It turns out Puppeteer always clicks the first element it finds, so if you want the first one, this will be enough.
Or you can select all the links and then click on the first one:
const resultLinks = await page.$$('.search-results a');
await resultLinks[0].click();
It'd be better to include a condition here as well, so you don't end up with an error because no element was found:
const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) await resultLinks[0].click();
There are more ways, so if you want to learn more, please refer to the API documentation.
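If clicking the result triggers a navigation, the length guard combines naturally with waitForNavigation. A sketch reusing the same '.search-results a' selector from above:

```javascript
// Sketch: click the first search result and wait for the page it opens.
async function openFirstResult(page) {
  const resultLinks = await page.$$('.search-results a');
  if (!resultLinks.length) return null; // no results: bail out instead of crashing
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    resultLinks[0].click(),
  ]);
  return response ? response.url() : null;
}
```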

How to click an array of links in puppeteer?

I'm new to puppeteer, trying to understand how it works by writing a simple scraping job.
What I plan to do
Plan is simple:
goto a page,
then extract all <li> links under a <ul> tag
click each <li> link and take a screenshot of the target page.
How I implement it
Code goes as follows,
await page.goto('http://some.url.com'); // step-1
const a_elems = await page.$$('li.some_css_class a'); // step-2
for (var i = 0; i < a_elems.length; i++) { // step-3
  const elem = a_elems[i];
  const txt = await page.evaluate(e => e.textContent, elem); // link text, used for the file name
  await Promise.all([
    elem.click(),
    page.waitForNavigation({waitUntil: 'networkidle0'}) // click each link and wait for page loading
  ]);
  await page.screenshot({path: `${IMG_FOLDER}/${txt}.png`});
  await page.goBack({waitUntil: 'networkidle0'}); // go back to previous page so that we can click the next link
  console.log(`clicked link = ${txt}`);
}
What is wrong & Need help
However, the above code only works for the first link in a_elems; when the for-loop reaches the 2nd link, it breaks with an error saying
(node:40606) UnhandledPromiseRejectionWarning: Error: Node is detached from document
at ElementHandle._scrollIntoViewIfNeeded (.../.npm-packages/lib/node_modules/puppeteer/lib/JSHandle.js:203:13)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async ElementHandle.click (.../.npm-packages/lib/node_modules/puppeteer/lib/JSHandle.js:282:5)
at async Promise.all (index 0)
at async main (.../test.js:34:5)
-- ASYNC --
at ElementHandle.<anonymous> (.../.npm-packages/lib/node_modules/puppeteer/lib/helper.js:111:15)
at main (.../test.js:35:12)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
I suspect that the execution context of the page has already changed after the first link is clicked, and even though I called page.goBack to return to the previous page, it doesn't give me back the previous execution context.
Not sure if my speculation is right or wrong, and couldn't find any similar issue out there, hope I could get some help here, thanks!
If there could be even better implementation to achieve my plan, please let me know.
You are right about the elements losing their context when you goBack. That's not going to work.
But, as you commented, you can grab the href from the element and start from there:
for (var i = 0; i < a_elems.length; i++) { // step-3
  const elem = a_elems[i];
  const href = await page.evaluate(e => e.href, elem); // Chrome will return the absolute URL
  const txt = await page.evaluate(e => e.textContent, elem); // link text for the file name
  const newPage = await browser.newPage();
  await newPage.goto(href);
  await newPage.screenshot({path: `${IMG_FOLDER}/${txt}.png`});
  await newPage.close();
  console.log(`clicked link = ${txt}`);
}
You could even do this in parallel, although there is an internal queue for screenshots.
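A sketch of that parallel variant: hrefs and link texts are collected up front (so no element handles survive across navigations), and the chunk size of 3 is an arbitrary cap to avoid opening dozens of tabs at once:

```javascript
// Sketch: screenshot every link target, a few pages at a time.
async function screenshotAllLinks(browser, page, imgFolder) {
  // Collect plain data (href + text) instead of element handles,
  // since handles detach once the page navigates.
  const links = await page.$$eval('li.some_css_class a', anchors =>
    anchors.map(a => ({ href: a.href, txt: a.textContent.trim() })));

  const chunkSize = 3; // arbitrary concurrency cap
  for (let i = 0; i < links.length; i += chunkSize) {
    await Promise.all(links.slice(i, i + chunkSize).map(async ({ href, txt }) => {
      const newPage = await browser.newPage();
      await newPage.goto(href, { waitUntil: 'networkidle0' });
      await newPage.screenshot({ path: `${imgFolder}/${txt}.png` });
      await newPage.close();
    }));
  }
}
```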

Puppeteer Does Not Visualise Complete SVG Chart

I am using this code in Try Puppeteer:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.barchart.com/futures/quotes/ESM19/interactive-chart/fullscreen');
const linkHandlers = await page.$x("//li[contains(text(), '1D')]");
if (linkHandlers.length > 0) {
  await linkHandlers[0].click();
} else {
  throw new Error("Link not found");
}
await page.$eval('input[name="fieldInput"]', el => el.value = '1');
console.log(await page.content())
// const text = page.evaluate(() => document.querySelector('rect'))
// text.then((r) => {console.log(r[0])})
await page.screenshot({path: 'screenshot.png'});
await browser.close();
The same page loaded in the Chrome browser shows the bars indicating price movements, but in the screenshot obtained in Puppeteer the chart is empty.
Also page.content() gives an html that is completely different from the one I see when inspecting the element in Chrome.
Problem
You are not waiting for the request to resolve when the input is changed. As a change will trigger a request, you should use page.waitForResponse to wait until the data is loaded.
In addition, this is an Angular application, which does not seem to like it if you simply change the value of the field via el.value = '1'. Instead you need to try to behave more like a human (and hit backspace and type the input value).
Solution
First, you get the element handle (input[name="fieldInput"]) from the document. Then you focus the element and remove the value inside by pressing backspace. After that you type the desired input value.
The input field now has the correct value; next we need to trigger the blur event by calling blur() on the element. In parallel, we wait for the request to the server to finish. After the request finishes, we give the page a few milliseconds to render the data.
All together, the resulting code looks like this:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.barchart.com/futures/quotes/ESM19/interactive-chart/fullscreen');
// wait until the element appears
const linkHandler = await page.waitForXPath("//li[contains(text(), '1D')]");
await linkHandler.click();
// get the input field, focus it, remove what's inside, then type the value
const elementHandle = await page.$('input[name="fieldInput"]');
await elementHandle.focus();
await elementHandle.press('Backspace');
await elementHandle.type('1');
// trigger the blur event and wait for the response from the server
await Promise.all([
  page.waitForResponse(response => response.url().includes('https://www.barchart.com/proxies/timeseries/queryminutes.ashx')),
  page.evaluate(el => el.blur(), elementHandle)
]);
// give the page a few milliseconds to render the diagram
await page.waitFor(100);
await page.screenshot({path: 'screenshot.png'});
await browser.close();
Code improvement
I also removed the page.$x call and replaced it with page.waitForXPath. This makes sure that your script waits until the page is loaded and the element you want to click is available before continuing.

How do I use jQuery with pages on puppeteer?

I am trying to use jQuery on the pages I load with Puppeteer, and I wanted to know how I can do that. My code structure is like:
const puppeteer = require('puppeteer');
let browser = null;
async function getSelectors() {
  try {
    browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    await page.setViewport({width: 1024, height: 1080});
    await page.goto('https://www.google.com/');
    await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
    var button = $('h1').text();
    console.log(button);
  } catch (e) {
    console.log(e);
  }
}
getSelectors();
Also I will be navigating to many pages within puppeteer so is there a way I can just add jQuery once and then use it throughout? A local jquery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from How to scrape that web page with Node.js and puppeteer, which helped me understand that Puppeteer's page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer.

To get the following code to work, you should know I'm running this test in Jest. You also need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery DataTables project, so I needed to make sure I had a table element and that jQuery could find it.

The nice thing about this environment is that the browser is quite simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
  const puppeteer = require('puppeteer');
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('http://localhost/jest/table.html');
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
  const result = await page.evaluate(() => {
    try {
      var table = $("table").attr("id");
      return table;
    } catch (e) {
      return e.message;
    }
  });
  console.log("result", result);
  await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
