Puppeteer returning undefined (JS) using xPath - javascript

I'm trying to scrape this element: on this website.
My JS code:
const puppeteer = require("puppeteer");
const url = 'https://magicseaweed.com/Bore-Surf-Report/1886/'
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/header/h3/div[1]/span[1]')
let text = await page.evaluate(res => res.textContext, title[0])
console.log(text) // UNDEFINED
text is undefined. What is the problem here? Thanks.

I think you need to fix 1 or 2 issues on your code.
textContent vs textContext
xpath
For the content you want the xpath should be:
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/div[1]/div[1]/div/div[2]/ul[1]/li[1]/text()')
And to get the content of this:
const text = await page.evaluate(el => {
return el.textContent.trim()
}, title[0])
Notice you need send title[0] as an argument to the page function.
OR
if you don't need to use xpath, it seems you could get directly using class name to find the element:
const rating = await page.evaluate(() => {
return $('.rating.rating-large.clearfix > li.rating-text')[0].textContent.trim()
})

Related

Puppeteer- How to .click() a single button out of a grid of buttons with same classname?

I'm developing a Nike SNKRS BOT to buy shoes with Puppeteer and Node.js.
I'm having issues to distinguish and .click() Size button screenshot of html devtools and front end buttons
That's my code: i'm not experienced so i have tried everything
const xpathButton = '//*
[#id="root"]/div/div/div[1]/div/div[1]/div[2]/div/section[1]/div[2]/aside/div/div[2]/div/
div[2]/ul/li[1]/button'
const puppeteer = require('puppeteer')
const productUrl = 'https://www.nike.com/it/launch/t/air-max-97-coconut-
milk-black'
const idAcceptCookies = "button[class='ncss-btn-primary-dark btn-lg']"
async function givePage(){
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage();
return page;
}
async function addToCart(page){
await page.goto(urlProdotto);
await page.waitForSelector(idAcceptCookies);
await page.click(idAcceptCookies,elem => elem.click());
//this is where the issues begin
//attempt 1
await page.evaluate(() => document.getElementsByClassName('size-grid-
dropdown size-grid-button"')[1].click());
//attempt 2
const sizeButton = "button[class='size-grid-dropdown size-grid-button']
button[name='42']";
await page.waitForSelector(sizeButton);
await page.click(sizeButton,elem => elem.click());
}
//attempt 3
await page.click(xpathButton)
//attempt 4
document.evaluate("//button[contains ( ., '36')]", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue
async function checkout(){
var page = await givePage();
await addToCart(page)
}
checkout()
Attempt number 2 looks like the best approach, except your selector is wrong. The button does not have a name attribute, according to your screenshot, so you will need another approach, closer to attempt 3.
You can use puppeteer to select an element by with xpath, and xpath allows you to select by an element's text content.
Try this:
await page.waitForXPath('//button[contains(text(), "EU 36")]')
const [button] = await page.$x('//button[contains(text(), "EU 36")]')
await button.click()
Because the xpath selector is returning an array of element handles, I destructure the first element in the array (which should be the only match), and assign it a value of button. That element handle can now be clicked.

Get entire Playwright page in html and Text

I am using playwright in nodejs and I am having some problems when getting the page Text or Html. I just want to get the url as string like: <html><div class="123"><a>link</a>something</div><div>somethingelse</div></hmtl>
const browser = await playwright.chromium.launch({
headless: true,
});
const page = await browser.newPage();
await page.goto(url);
I was trying to use const pageText = page.$('div').innerText; and also const pageText2 = await page.$$eval('div', el => el.innerText);
But both do not work and just give me undefined.
For the full html of the page, this is what you need: const html = await page.content()
To get the inner text of the div, this should work: const pageText = await page.innerText('div')
See:
https://playwright.dev/docs/api/class-page#page-content
https://playwright.dev/docs/api/class-page#page-inner-text

Problem filling in an input field without an id or identifying attribute. (Puppeteer)

Here is the input element I would like to type into:
<input name="name" ng-model-options="{ debounce: 300 }" ng-model="contestantState.form.name" ng-pattern=".*" placeholder="Alice Smith" required="" style="width: 246px" type="text" class="ng-empty ng-invalid ng-invalid-required ng-valid-pattern ng-dirty ng-valid-parse ng-touched">
One of my tries:
await page.type('input[name="name"]',"my name");
No matter what way I try to type into this field, nothing happens: it stays blank because it is not identifiable by its name or classes.
What do I do to have Puppeteer enter a Name into that field?
The website I'm trying to do this on is: https://gleam.io/hxVaH/win-a-krooked-big-eyes-too-skateboard?gsr=hxVaH-4hKgdgSzfh
(The Full Name and email field).
Existing answers have been useful, but this should do the full submission and give you the share URL.
Since this AngularJS app doesn't seem to use ids and classes much, using text contents seems like a reasonable choice for hitting a couple of the buttons.
I had to use .evaluate(el => el.click()) instead of .click() to click one of the buttons successfully as described here.
const puppeteer = require("puppeteer");
const clickXPath = async (page, xp) => {
await page.waitForXPath(xp);
const [el] = await page.$x(xp);
await el.evaluate(el => el.click());
};
const randomEmail = () =>
Array(10).fill().map(() =>
String.fromCharCode(~~(Math.random() * 26 + 97))
).join("") +
`${(Math.random() + "").replace(/\./, "")}#gmail.com`
;
let browser;
(async () => {
const url = "https://gleam.io/hxVaH/win-a-krooked-big-eyes-too-skateboard";
browser = await puppeteer.launch({headless: false});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "networkidle0"});
await clickXPath(page, `//span[contains(text(), "Refer Friends")]`);
const nameSel = '[class="entry-method expanded"] input[name="name"]';
const emailSel = '[class="entry-method expanded"] input[name="email"]';
await page.waitForSelector(nameSel);
await page.type(nameSel, "my name33");
await page.type(emailSel, randomEmail());
const hasText = () => page.evaluate(() =>
!!document.querySelector(".share-link__link")?.innerText.trim()
);
for (let tries = 1000; !(await hasText()) && tries--;) {
await clickXPath(page, `//span[contains(text(), "Save")]`);
await page.waitForTimeout(100);
}
const shareCode = await page.$eval(".share-link__link", el => el.innerText);
console.log(shareCode); // => https://wn.nr/w9Wz5p
})()
.catch(err => console.error(err))
.finally(async () => await browser.close())
;
There is more than one matching element using your locator. Try this:
await page.type('[class="entry-method expanded"] input[name="name"]',"my name");
You need to toggle the input first to make it available in the DOM (until that this element is only an Angular Template inside a <script> tag). And as Tanuj wrote: you should go with a selector which have only one instance. E.g.:
await page.click('#em5682795 > a'); // "Refer friends for extra entries"
await page.type('[class="entry-method expanded"] input[name="name"]',"my name");

Can't scrape from a page I navigate to by using Puppeteer

I'm fairly new to Puppeteer and I'm trying to practice keep tracking of a selected item from Amazon. However, I'm facing a problem when I try to retrieve some results from the page.
The way I intended this automation to work is by following these steps:
New tab.
Go to the home page of Amazon.
Enter the given product name in the search element.
Press the enter key.
Return the product title and price.
Check this example below:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (req) => { // don't load any fonts or images on my requests. To Boost the performance
if (req.resourceType() == 'font' /* || req.resourceType() == 'image' || req.resourceType() == 'stylesheet'*/) {
req.abort();
}
else {
req.continue(); {
}
}
});
const baseDomain = 'https://www.amazon.com';
await page.goto(`${baseDomain}/`, { waitUntil: "networkidle0" });
await page.click("#twotabsearchtextbox" ,{delay: 50})
await page.type("#twotabsearchtextbox", "Bose QuietComfort 35 II",{delay: 50});
await page.keyboard.press("Enter");
await page.waitForNavigation({
waitUntil: 'networkidle2',
});
let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43]; //varible that holds the title of the product
console.log(productTitle );
debugger;
})();
when I execute this code, I get in the console.log a value of undefined for the variable productTitle. I had a lot of trouble with scraping information from a page I navigate to. I used to do page.evaluate() and it only worked when I'm scraping from the page that I have told the browser to go to.
The first problem is on this line:
let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43];
// is equivalent to:
let productTitle = await (somePromise[43]);
// As you guessed it, a Promise does not have a property `43`,
// so I think you meant to do this instead:
let productTitle = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
Once this is fixed, you don't get the title text, but a handle to the DOM element. So you can do:
let titleElem = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
let productTitle = await titleElem.evaluate(node => node.innerText);
console.log(productTitle); // "Microphone"
However, I'm not sure that simply selecting the 43rd element will always get you the one you want, but if it isn't, that would be a topic for another question.

Web Scrape with Puppeteer within a table

I am trying to scrape this page.
https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214
I want to be able to find the grade count for PSA 9 and 10. If we look at the HTML of the page, you will notice that PSA does a very bad job (IMO) at displaying the data. Every TR is a player. And the first TD is a card number. Let's just say I want to get Card Number 1 which in this case is Kevin Garnett.
There are a total of four cards, so those are the only four cards I want to display.
Here is the code I have.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214");
const tr = await page.evaluate(() => {
const tds = Array.from(document.querySelectorAll('table tr'))
return tds.map(td => td.innerHTML)
});
const getName = tr.map(name => {
//const thename = Array.from(name.querySelectorAll('td.card-num'))
console.log("\n\n"+name+"\n\n");
})
await browser.close();
})();
I will get each TR printed, but I can't seem to dive into those TRs. You can see I have a line commented out, I tried to do this but get an error. As of right now, I am not getting it by the player dynamically... The easiest way I would think is to create a function that would think about getting the specific card would be doing something where the select the TR -> TD.card-num == 1 for Kevin.
Any help with this would be amazing.
Thanks
Short answer: You can just copy and paste that into Excel and it pastes perfectly.
Long answer: If I'm understanding this correctly, you'll need to map over all of the td elements and then, within each td, map each tr. I use cheerio as a helper. To complete it with puppeteer just do: html = await page.content() and then pass html into the cleaner I've written below:
const cheerio = require("cheerio")
const fs = require("fs");
const test = (html) => {
// const data = fs.readFileSync("./test.html");
// const html = data.toString();
const $ = cheerio.load(html);
const array = $("tr").map((index, element)=> {
const card_num = $(element).find(".card-num").text().trim()
const player = $(element).find("strong").text()
const mini_array = $(element).find("td").map((ind, elem)=> {
const hello = $(elem).find("span").text().trim()
return hello
})
return {
card_num,
player,
column_nine: mini_array[13],
column_ten: mini_array[14],
total:mini_array[15]
}
})
console.log(array[2])
}
test()
The code above will output the following:
{
card_num: '1',
player: 'Kevin Garnett',
column_nine: '1-0',
column_ten: '0--',
total: '100'
}

Categories

Resources