Puppeteer: Get DOM element which isn't in the initial DOM - javascript

I'm trying to figure out how to get elements in, for example, a JS gallery that only loads its images after they have been clicked.
I'm using a demo of Viewer.js as an example. The element with the classes .viewer-move.viewer-transition isn't in the initial DOM. After clicking on an image the element exists, but if I use $eval the resulting string is empty. If I open the console in the Puppeteer-controlled browser and run document.querySelector('.viewer-move.viewer-transition'), I get the element, yet in my code it isn't available.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://fengyuanchen.github.io/viewerjs/');
  await page.click('[data-original="images/tibet-1.jpg"]');
  let viewer = await page.$eval('.viewer-move.viewer-transition', el => el.innerHTML);
  console.log(viewer);
})();

You get the empty string because the element has no content, so its innerHTML is empty. outerHTML seems to work:
const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const [page] = await browser.pages();
    await page.goto('https://fengyuanchen.github.io/viewerjs/');
    await page.click('[data-original="images/tibet-1.jpg"]');
    await page.waitForSelector('.viewer-move.viewer-transition');
    const viewer = await page.$eval('.viewer-move.viewer-transition', el => el.outerHTML);
    console.log(viewer);
    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();

Since you have to wait until the element is available, the most convenient method is await page.waitForSelector(".viewer-move.viewer-transition"), which waits until the element is added to the DOM. One caveat: execution continues the moment the element is added to the DOM, even if it is still empty or hidden.
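If the element must also be rendered, waitForSelector takes a visible option that additionally waits for a non-empty bounding box. A minimal sketch against the same Viewer.js demo (assumes puppeteer is installed; main is an illustrative name):

```javascript
// Sketch: combine the click with waitForSelector({ visible: true }) so the
// viewer element is both attached to the DOM and actually rendered before
// it is read. Selector and URL are the ones from the question.
async function main() {
  const puppeteer = require('puppeteer'); // assumes puppeteer is installed
  const browser = await puppeteer.launch({ headless: false });
  const [page] = await browser.pages();
  await page.goto('https://fengyuanchen.github.io/viewerjs/');
  await page.click('[data-original="images/tibet-1.jpg"]');
  // { visible: true } resolves only once the element has a non-empty
  // bounding box and is not hidden via display:none / visibility:hidden.
  await page.waitForSelector('.viewer-move.viewer-transition', { visible: true });
  const html = await page.$eval('.viewer-move.viewer-transition', (el) => el.outerHTML);
  console.log(html);
  await browser.close();
}

// Guarded so loading this file does not launch a browser.
if (process.env.RUN_DEMO) main().catch(console.error);
```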

Related

function in page.evaluate() is not working/being executed - Puppeteer

My code below tries to collect a set of hyperlinks that share the class name ".jss2". However, the function inside my page.evaluate() doesn't seem to run: when I execute the code, the link_list const never gets displayed.
I ran document.querySelectorAll in the Chrome console and it worked perfectly fine, so I'm really having a hard time with this.
async function testing() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setViewport({ width: 1200, height: 800 });
  await page.goto(url);
  const link_list = await this.page.evaluate(() => {
    let elements = Array.from(document.querySelectorAll(".jss2"));
    let links = elements.map(element => {
      return element.href;
    });
    return (links);
  });
  console.log(link_list);
}
You can use page.$$eval, which runs the callback against every element matching the selector:

const link_list = await page.$$eval('.jss2', links => links.map(link => link.href));

Found the answer here: PUPPETEER - unable to extract elements on certain websites using page.evaluate(() => document.querySelectorAll())
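A fuller sketch of that fix (the .jss2 selector and viewport come from the question; extractHrefs and collectLinks are names introduced here for illustration). Note also that the question's code calls this.page.evaluate inside a plain function, where this is undefined; the variable is just page.

```javascript
// Sketch: page.$$eval runs the callback in the page against ALL elements
// matching the selector, so no manual Array.from/map inside evaluate is
// needed. extractHrefs closes over no outer variables, so Puppeteer can
// serialize it into the page context.
const extractHrefs = (links) => links.map((link) => link.href);

async function collectLinks(url) {
  const puppeteer = require('puppeteer'); // assumes puppeteer is installed
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setViewport({ width: 1200, height: 800 });
  await page.goto(url);
  const linkList = await page.$$eval('.jss2', extractHrefs);
  await browser.close();
  return linkList;
}
```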

How to get multiple html elements with puppeteer?

I want to get multiple HTML elements from a dynamic website with Puppeteer, but I only get the first element. How do I get all of them?
const puppeteer = require("puppeteer-core");

browser = await puppeteer.launch({
  executablePath:
    "./node_modules/chromium/lib/chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium",
});
const element = await page.waitForSelector(
  ".MuiTableRow-root.MuiTableRow-hover.css-1tq71ky"
);
const value = await element.evaluate((el) => el.textContent);
console.log(value);
await browser.close();
I had a similar issue and solved it with a loop. waitForSelector resolves with only the first matching element, so first collect every matching element with page.$$ and then iterate:

const elements = await page.$$(".MuiTableRow-root.MuiTableRow-hover.css-1tq71ky");
for (const single of elements) {
  const value = await single.evaluate((el) => el.textContent);
  console.log(value);
}
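An equivalent without an explicit loop is page.$$eval, which applies one callback to every match inside the page. A sketch, using the selector from the question (rowText and readRows are illustrative names):

```javascript
// Sketch: waitForSelector only guarantees that at least one row exists;
// $$eval then extracts the text of every matching row in a single round
// trip. rowText closes over no outer variables, so it can be serialized.
const rowText = (rows) => rows.map((el) => el.textContent.trim());

async function readRows(page) {
  await page.waitForSelector('.MuiTableRow-root.MuiTableRow-hover.css-1tq71ky');
  return page.$$eval('.MuiTableRow-root.MuiTableRow-hover.css-1tq71ky', rowText);
}
```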

Failed to scrape the link to the next page using xpath in puppeteer

I'm trying to scrape the link to the next page from this webpage. I know how to scrape it using a CSS selector. However, things go wrong when I attempt to parse the same thing using XPath. Here is what I get instead of the next-page link.
const puppeteer = require("puppeteer");

let url = "https://stackoverflow.com/questions/tagged/web-scraping";

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const [page] = await browser.pages();
  await page.goto(url, { waitUntil: 'networkidle2' });
  let nextPageLink = await page.$x("//a[@rel='next']", item => item.getAttribute("href"));
  // let nextPageLink = await page.$eval("a[rel='next']", elm => elm.href);
  console.log("next page:", nextPageLink);
  await browser.close();
})();
How can I scrape the link to the next page using xpath?
page.$x(expression) returns an array of element handles, so you need either destructuring or index access to get the first element from the array.
To read a DOM property from that element handle, you can either evaluate with the element handle as a parameter or use the element handle API:
const [nextPageLink] = await page.$x("//a[@rel='next']");
const nextPageURL = await nextPageLink.evaluate(link => link.href);
Or:
const [nextPageLink] = await page.$x("//a[@rel='next']");
const nextPageURL = await (await nextPageLink.getProperty('href')).jsonValue();
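Note that page.$x has been removed in recent Puppeteer releases; there, XPath expressions go through the normal selector APIs with an xpath/ prefix. A sketch (verify against the docs for your installed version; nextPageURL here is an illustrative helper name):

```javascript
// Sketch for newer Puppeteer versions without page.$x: prefix the XPath
// expression with "xpath/" and use the regular selector methods.
async function nextPageURL(page) {
  const link = await page.waitForSelector("xpath///a[@rel='next']");
  return link.evaluate((el) => el.href);
}
```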

web scraping with puppeteer does not find the CSS tag

I'm starting to learn web scraping in JavaScript with Puppeteer. I found a video I liked that showcases Puppeteer, and I'm trying to scrape the same information as the video (link). The page has changed a little since the video, so I used what I think are the correct tags.
The problem comes when I try to find the "h3" tag: the tag exists in the DOM, but my code refuses to acknowledge its existence, while it works fine when looking for the "h2" tag.
What I want to know is why my code does not retrieve it.
web page: https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F
// normal things to launch it
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const url = "https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F";
  await page.goto(url);
  // here comes the problem
  // this doesn't work:
  const h3 = await page.evaluate(() => document.querySelector("h3").textContent);
  console.log(h3); // the error is a TypeError on null: querySelector found no "h3"
  // this DOES work:
  const h2 = await page.evaluate(() => document.querySelector("h2").textContent);
  console.log(h2);
  // await browser.close();
})();
I know that the "h3" exists. I'd appreciate an explanation of what's happening so I can learn more.
Thanks.
The h3 header does not exist on the page yet; we need to wait for it with waitForSelector:
// normal things to launch it
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const url = "https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F";
  await page.goto(url);
  await page.waitForSelector('h3');
  const h3 = await page.evaluate(() => document.querySelector("h3").textContent);
  console.log(h3);
  const h2 = await page.evaluate(() => document.querySelector("h2").textContent);
  console.log(h2);
  await browser.close(); // don't forget to close it
})();
output is:
Viden
Find your perfect match.

Puppeteer throws error with "Node not visible..."

When I open this page with Puppeteer and try to click the element, it throws an error, even though by that point it should be possible to click the element.
const puppeteer = require('puppeteer');

const url = "https://www.zapimoveis.com.br/venda/apartamentos/rj+rio-de-janeiro/";

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'load' });
  const nextPageSelector = "#list > div.pagination > #proximaPagina";
  await page.waitForSelector(nextPageSelector, { visible: true });
  console.log("Found the button");
  await page.click(nextPageSelector);
  console.log('clicked');
}

run();
Here's a working version of your code.
const puppeteer = require('puppeteer');

const url = "https://www.zapimoveis.com.br/venda/apartamentos/rj+rio-de-janeiro/";

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  const nextPageSelector = "#list > div.pagination > #proximaPagina";
  console.log("Found the button");
  await page.evaluate(selector => {
    return document.querySelector(selector).click();
  }, nextPageSelector);
  console.log('clicked');
}

run();
I personally prefer page.evaluate, as page.click doesn't work for me either in some cases, and you can execute whatever JS you want directly on the page.
The only thing to know is the syntax:
- first param: the function to execute
- second param: the variable(s) to be passed to that function
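That calling convention can be sketched as follows (clickInPage is an illustrative helper, not part of the Puppeteer API):

```javascript
// Sketch of page.evaluate's signature: the first argument is a function
// that runs inside the page; every following argument is serialized and
// passed into that function.
async function clickInPage(page, selector) {
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    if (!el) throw new Error(`no element matches ${sel}`);
    el.click();
  }, selector);
}
```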
Found the problem.
When you call page.click(selector) or ElementHandle.click(), Puppeteer scrolls to the element, finds its center coordinates, and finally clicks. It uses the _clickablePoint function in node_modules/puppeteer/lib/JSHandle.js to find the coordinates.
The problem with this website (zapimoveis) is that scrolling the element into the viewport is too slow, so Puppeteer can't find its (x, y) center.
One clean way to click this element is to use page.evaluate and click it with page JavaScript. But there is a hackier way that I prefer. :) Replace the line await page.click(nextPageSelector) with these lines:

try { await page.click(nextPageSelector) } catch (error) {} // sacrifice click :)
await page.waitFor(3000); // time for scrolling (waitFor is removed in newer Puppeteer; use a setTimeout-based delay there)
await page.click(nextPageSelector); // this will work
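Another option along the same lines is to do the slow scroll yourself before letting Puppeteer click (a sketch; scrollAndClick is an illustrative helper name):

```javascript
// Sketch: scroll the element into view from page JavaScript first, so that
// by the time page.click computes the clickable point the element is
// already inside the viewport.
async function scrollAndClick(page, selector) {
  await page.$eval(selector, (el) => el.scrollIntoView({ block: 'center' }));
  await page.click(selector);
}
```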
