Node.js Puppeteer UnhandledPromiseRejectionWarning trying to navigate Google Maps - javascript

(node:15348) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
page.goto("https://www.google.com/maps/place/Faruk+G%C3%BCll%C3%BCo%C4%9Flu+-+Sunny/#41.0298046,28.7909262,13z/data=!4m8!1m2!2m1!1sfaruk+gulluoglu!3m4!1s0x14caa4f77579848b:0x37c42d8b0cecc146!8m2!3d41.0298046!4d28.8151116");
page.waitFor
const seeAllReviewsButton = "#pane > div > div.widget-pane-content.scrollable-y > div > div > div:nth-child(45) > div > div > button > span";
page.click(seeAllReviewsButton);
I can't navigate to the Google Maps link of a business.

A few corrections are needed: you need to await the page.goto, page.waitFor, and page.click calls. Most importantly, page.waitFor() is a method, and it takes a string, a number, or a function as its argument. All of these methods return a promise, so each one needs to be awaited or chained with .then().
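To illustrate why the missing await matters, here is a minimal sketch that uses a stand-in object instead of a real Puppeteer page. The makeFakePage helper and its delays are assumptions made up for this illustration; they only mimic the fact that page.goto and page.click return promises.

```javascript
// Stand-in for a Puppeteer page: each method returns a promise, like the real
// page.goto and page.click do. The names and delays are illustrative only.
function makeFakePage(log) {
  const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  return {
    async goto(url) {
      await delay(20); // pretend that navigation takes a while
      log.push("navigated");
    },
    async click(selector) {
      await delay(1); // pretend that clicking is quick
      log.push("clicked");
    },
  };
}

// Without await, the click starts before the navigation has finished.
async function withoutAwait() {
  const log = [];
  const page = makeFakePage(log);
  const navigation = page.goto("https://example.com");
  const click = page.click("#button");
  await Promise.all([navigation, click]); // only so the log can be inspected
  return log;
}

// With await, each step finishes before the next one starts.
async function withAwait() {
  const log = [];
  const page = makeFakePage(log);
  await page.goto("https://example.com");
  await page.click("#button");
  return log;
}
```

In the sketch, withoutAwait() records the click before the navigation, while withAwait() records them in the intended order. With a real page, clicking while the navigation is still in progress is the kind of race that can end in the "Execution context was destroyed" error from the question.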

You need to use await before page.goto, page.waitFor, and page.click because each of them returns a Promise. Also pass { waitUntil: "domcontentloaded" } to page.goto to wait for the DOM. I also fixed the seeAllReviewsButton selector.
The code below works fine for me.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // Await the navigation so that later calls run against the loaded page.
  await page.goto(
    "https://www.google.com/maps/place/Faruk+G%C3%BCll%C3%BCo%C4%9Flu+-+Sunny/#41.0298046,28.7909262,13z/data=!4m8!1m2!2m1!1sfaruk+gulluoglu!3m4!1s0x14caa4f77579848b:0x37c42d8b0cecc146!8m2!3d41.0298046!4d28.8151116",
    { waitUntil: "domcontentloaded" }
  );
  const seeAllReviewsButton =
    "#pane > div > div.widget-pane-content.scrollable-y > div > div > div.section-hero-header-title > div.section-hero-header-title-top-container > div.section-hero-header-title-description > div.section-hero-header-title-description-container > div > div.gm2-body-2.section-rating-line > span:nth-child(3) > span > span:nth-child(1) > span:nth-child(2) > span:nth-child(1) > button";
  // Wait for the button to exist before clicking it.
  await page.waitForSelector(seeAllReviewsButton);
  await page.click(seeAllReviewsButton);
})();

Related

How to combine cheerio with Puppeteer so it can click on elements

I tried using cheerio to find the element, and if the element is found it should be clicked, but I don't know how to combine this with Puppeteer. The button I want to click is in the third picture.
await page.waitForTimeout(10000)
const contentHTML = await page.content();
const $ = cheerio.load(contentHTML);
const outerHTML = $('<button class="sc-nkuzb1-0 sc-d5trka-0 dsOMxw button" data-theme="home.verifyButton">Authenticate</button>').prop('innerText');
console.log(outerHTML);

Open a link in a new tab, scrape, go to previous page

I'm using puppeteer for the following:
I replaced await link.click(".ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a",); with await new.page('...'), but it says that it can't find the a element.
This is the page that I'm scraping but notice the Load More button at the bottom of the page.
https://www.bodybuilding.com/exercises/finder
To prevent resetting the Load more button I want to open each new in a new tab, scrape, close tab and go to the next name.
How can I open each link in a new tab, close, and go to the previous tab?
My code:
var buttonExists = true;
let allData = [];
while (buttonExists == true) {
  // const loadMore = true;
  const rowsCounts = await page.$$eval(
    '.ExCategory-results > .ExResult-row',
    (rows) => rows.length
  );
  console.log(`row counts = ${rowsCounts}`);
  for (let i = 2; i < rowsCounts + 1; i++) {
    const exerciseName = await page.$eval(
      `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
      (el) => el.innerText
    );
    console.log(`Exercise = ${exerciseName}`);
    await link.click(`.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,);
    await page.waitForSelector('#js-ex-content');
    ... fancy code here
    await page.goBack();
    let obj = {
      exercise: exerciseName,
    };
    allData.push(obj);
  }
  // clicking load more button and waiting 1sec
  try {
    await page.click(LoadMoreButton);
  }
  catch (err) {
    buttonExists = false;
  }
  await page.waitForTimeout(1000);
}
This selector, .ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a, is unnecessarily long and does not give completely correct results.
To get to these elements, this selector should be enough: .ExResult-row .ExHeading > a.
Then you asked:
I want to open each new in a new tab, scrape, close tab and go to the next name.
and
How can I open each link in a new tab, close, and go to the previous tab?
In Puppeteer you can create a new page with await browser.newPage(). You can do this as many times as you need and store the pages in an array:
let pages = [];
pages.push(await browser.newPage());
Then you get the links:
const links = await pages[0].$$eval(
'.ExResult-row .ExHeading > a',
links => links.map(l => l.getAttribute('href'))
);
Finally, create a new page for each link, scrape what you need, and close the page:
for (let link of links) {
pages.push(await browser.newPage());
await pages[pages.length - 1].goto(`${baseUrl}/${link}`);
// your scraping
await pages[pages.length - 1].close();
}
If you need to look up more, refer to the API documentation Puppeteer provides.
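Two details of the loop above are left open: how baseUrl and the href are joined, and what happens to the tab if the scraping code throws. Below is a minimal sketch of one way to handle both. The helper names toAbsoluteUrl and scrapeInNewTab are hypothetical, and it assumes that browser is a Puppeteer Browser and that the hrefs are relative to the listing page.

```javascript
// Hypothetical helper: resolve an href taken from the page against the URL of
// the page it was found on, using the URL class built into Node.js. This
// avoids doubled or missing slashes from plain string concatenation.
function toAbsoluteUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Hypothetical helper: open a URL in a new tab, run the scrape callback on
// that tab, and always close the tab afterwards, even if scraping throws.
async function scrapeInNewTab(browser, url, scrape) {
  const tab = await browser.newPage();
  try {
    await tab.goto(url);
    return await scrape(tab);
  } finally {
    await tab.close();
  }
}

// Possible usage inside the loop over links (listingUrl is assumed to hold
// the address of the listing page, for example the value of page.url()):
//
//   const data = await scrapeInNewTab(
//     browser,
//     toAbsoluteUrl(link, listingUrl),
//     (tab) => tab.$eval('#js-ex-content', (el) => el.innerText)
//   );
```

Because each link opens in its own tab, the listing page is never navigated away from, so the state of the Load More button is kept.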

Can't find a working selector on drop down menu while using Puppeteer?

I'm creating an automated script with Puppeteer, and I'm having trouble finding a selector that Puppeteer accepts. I have tried many different options but had no luck.
Note: Don't worry, it's a dummy account, so nothing important is on it.
I tried using
const myacc = '.li.member-nav-item.d-sm-ib.va-sm-m > button';
and a bunch of others, but I'm still getting a selector error.
Code:
const puppeteer = require('puppeteer');
var emailVal = 'kellybrando23434@gmail.com';
var passwordVal = 'd34gfA#4dfW';
const AcceptCookies = '#cookie-settings-layout > div > div > div > div.ncss-row.mt5-sm.mb7-sm > div:nth-child(2) > button';
const loginBtn = 'li.member-nav-item.d-sm-ib.va-sm-m > button';
const email = 'input[type="email"]';
const password = 'input[type="password"]';
const logsubmit = '.loginSubmit.nike-unite-component > input[type="button"]';
const myacc = '.li.member-nav-item.d-sm-ib.va-sm-m > button'; //this line contains error
(async () => {
  const browser = await puppeteer.launch({headless: false, slowMo: 150});
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 })
  await page.goto('https://www.nike.com/launch/');
  const AcceptCookies = '#cookie-settings-layout > div > div > div > div.ncss-row.mt5-sm.mb7-sm > div:nth-child(2) > button';
  await page.click(loginBtn);
  console.log("Login Button Clicked...");
  await page.waitFor(5000);
  console.log("email: " + emailVal);
  await page.type(email, emailVal);
  console.log("entered email");
  await page.type(password, passwordVal);
  console.log("waiting 0.5s");
  await page.waitFor(500);
  console.log("waiting done");
  await page.click(logsubmit);
  console.log("submitted");
  await page.waitFor(10000);
  await page.click(myacc);
  await page.waitFor(10000);
  await browser.close();
})();
I'm trying to get the correct selector ("const myacc=...") to click the account profile as shown in the picture (highlighted section), but instead I'm getting a selector error ("Error:No node found for selector:...."). How would you find it in this situation, as there is no id?
Before Picture
After Picture
Almost all of the elements contain a unique attribute intended for testing, data-qa. I would recommend using it, so replace all your selectors with it.
Here is the example for the 'my account' selector:
const myAcc = '[data-qa="user-name"]';
Also, you may not see that the element was clicked because of the screen size, so you may need to maximize the window.
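As a rough sketch of how the data-qa approach could be wired up: the helpers byDataQa and clickWhenReady below are hypothetical names, and the sketch assumes that page is a Puppeteer Page and that the data-qa value contains no double quotes. Waiting for the selector instead of sleeping for a fixed time also avoids clicking before the element exists.

```javascript
// Hypothetical helper: build an attribute selector for a data-qa value.
// Assumes the value itself contains no double quotes.
function byDataQa(value) {
  return `[data-qa="${value}"]`;
}

// Hypothetical helper: wait until the element is in the DOM, then click it.
// `page` is expected to be a Puppeteer Page.
async function clickWhenReady(page, selector) {
  await page.waitForSelector(selector);
  await page.click(selector);
}

// Possible usage after the login form has been submitted:
//
//   await clickWhenReady(page, byDataQa('user-name'));
```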

Using Puppeteer to find element by title

Assuming I have a link in an iframe without an id, where the only uniquely identifiable piece of information is the element title of this link, how would I go about finding it? It could look like this:
<a class="link spacer--double" href="#" tabindex="51" title="Click here to use new code">Use new code</a>
This is what I have attempted so far:
(async() => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  console.log("starting new page");
  var contentHtml = fs.readFileSync('1-iframe.html', 'utf8');
  await page.setContent(contentHtml);
  const result = await page.evaluate(() => {
    let elements = document.getElementById("framecontentscroll").innerText;
    for (let element of elements)
      console.log(element);
  })
})();
But I don't seem to get anything in elements. The id of the iframe is "framecontentscroll".
Is there a way to go directly to the link element using its title and querySelector or something similar?
You can use querySelector with an attribute selector to select by title. See:
Select an element by title with JavaScript and tweak from the browser?
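For the iframe case in the question, a possible sketch is shown below. The helper findInFrames is a hypothetical name; it assumes that page is a Puppeteer Page and relies on page.frames() to list the frames and frame.$() to query each one. The title value is taken from the markup in the question.

```javascript
// Attribute selector matching the link by its title.
const linkSelector = 'a[title="Click here to use new code"]';

// Hypothetical helper: search every frame of the page (the main frame and
// any iframes) and return the first matching element handle, or null if no
// frame contains a match.
async function findInFrames(page, selector) {
  for (const frame of page.frames()) {
    const handle = await frame.$(selector);
    if (handle) return handle;
  }
  return null;
}

// Possible usage once the content has been loaded:
//
//   const link = await findInFrames(page, linkSelector);
//   if (link) await link.click();
```

Searching the frames one by one is needed because a selector run against the main document does not reach into the document of an iframe.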

How to manipulate the DOM before in-page scripts are executed?

Using Puppeteer, how can I run a script in the page context, with the full DOM available, before the in-page JS is executed?
For example, how can I run the following script to remove alt attributes from img elements, before any of the page JS is run?
document.querySelectorAll('img[alt]').forEach(
e => e.removeAttribute('alt')
)
(page.evaluateOnNewDocument looks like it would be useful, but it appears to be executed before the page content is available--at the point at which it runs, the page is blank.)
I think the way to achieve what you are looking for is to:
1. Call page.setJavaScriptEnabled(false).
2. Load the page.
3. Extract all the scripts, and the HTML without the scripts.
4. Call page.setJavaScriptEnabled(true).
5. Navigate with page.goto(`data:text/html,${HTMLWithoutScript}`), using the HTML from step 3.
6. Execute your scripts.
7. Inject the original scripts extracted in step 3 with page.addScriptTag({ content: script }).
Example
Here is a visualization of your problematic example:
const puppeteer = require('puppeteer');
const html = `
<html>
<head></head>
<body>
<img src="https://picsum.photos/200/300?image=1062" alt="dog ">
<img src="https://picsum.photos/200/300?image=1072" alt="car ">
<div class="alts">List of alts: </div>
<script>
const images = document.querySelectorAll('img');
const altsContainer = document.querySelector('.alts');
images.forEach(image => {
const alt = image.getAttribute('alt') || 'missing alt ';
altsContainer.insertAdjacentHTML('beforeend', alt);
})
</script>
</body>
</html>`;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    )
  });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();
This code produces:
So removing the alt attributes does not work here.
Solution
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setJavaScriptEnabled(false);
  await page.goto(`data:text/html,${html}`);
  const { script, HTMLWithoutScript } = await page.evaluate(() => {
    const script = document.querySelector('script').innerHTML;
    document.querySelector('script').innerHTML = '';
    const HTMLWithoutScript = document.body.innerHTML;
    return { script, HTMLWithoutScript }
  });
  await page.setJavaScriptEnabled(true);
  await page.goto(`data:text/html,${HTMLWithoutScript}`);
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    )
  });
  await page.addScriptTag({ content: script });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();
This produces the results you expect in the question:
You can move your script tags to the body instead of the head. Then you can execute a script from the onload event. According to MDN, this event fires when an object has been loaded. Below is the example code:
function removeAlt(){
document.querySelectorAll('img[alt]').forEach((e)=>{
e.removeAttribute('alt');
});
}
<body onload="removeAlt()">
<img src="http://placehold.it/64x64" alt="1">
<img src="http://placehold.it/64x64" alt="2">
</body>
Let me know whether this fits your requirements. I tested it, and the function removes the alt attributes from the images.
