My test HTML:
<div id="mainBlock">
<div class="underBlock">
Hello!
</div>
</div>
I'm trying to get the content of the div with class underBlock like this:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless:false,
});
const page = await browser.newPage();
let response = await page.goto('http://localhost/TestPup/Index.html');
let block = await page.waitForXPath("//div[contains(@class,'underBlock')]")
let frame = await block.contentFrame()
console.log(frame.content())
await browser.close();
})();
But I get this error:
TypeError: Cannot read property 'content' of null
As far as I understand, elementHandle.contentFrame() returns a frame only for iframe elements and returns null for anything else. Your div is a regular element in the main frame (that is, the page itself), and there are no frames inside it.
Related
I am using Playwright in Node.js and I am having some problems getting the page text or HTML. I just want to get the page's HTML as a string, like: <html><div class="123"><a>link</a>something</div><div>somethingelse</div></html>
const browser = await playwright.chromium.launch({
headless: true,
});
const page = await browser.newPage();
await page.goto(url);
I was trying to use const pageText = page.$('div').innerText; and also const pageText2 = await page.$$eval('div', el => el.innerText);
But neither works; both just give me undefined.
For the full html of the page, this is what you need: const html = await page.content()
To get the inner text of the div, this should work: const pageText = await page.innerText('div')
See:
https://playwright.dev/docs/api/class-page#page-content
https://playwright.dev/docs/api/class-page#page-inner-text
I'm trying to scrape this element: on this website.
My JS code:
const puppeteer = require("puppeteer");
const url = 'https://magicseaweed.com/Bore-Surf-Report/1886/'
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/header/h3/div[1]/span[1]')
let text = await page.evaluate(res => res.textContext, title[0])
console.log(text) // UNDEFINED
text is undefined. What is the problem here? Thanks.
I think you need to fix one or two issues in your code:
the property is textContent, not textContext
the XPath
For the content you want, the XPath should be:
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/div[1]/div[1]/div/div[2]/ul[1]/li[1]/text()')
And to get its text:
const text = await page.evaluate(el => {
return el.textContent.trim()
}, title[0])
Note that you need to pass title[0] as an argument to the page function.
OR
If you don't need XPath, it seems you can get the element directly by its class names:
const rating = await page.evaluate(() => {
// document.querySelector works even if the page does not load jQuery's $
return document.querySelector('.rating.rating-large.clearfix > li.rating-text').textContent.trim()
})
I've been looking for a good example of click events on every element of a certain class, but I can't seem to find one. In my app, I generate multiple bars in an svg with class .bar.
Is there a nice way to iterate through each bar in the selection and click it?
Here is my code so far (with the link to dev area removed):
const puppeteer = require('puppeteer');
(async () => {
//open browser; disable headless mode to allow viewing
const browser = await puppeteer.launch({headless: false, slowMo: 80});
const page = await browser.newPage();
//goto link
await page.goto('/link_to_test/');
//scraping automation goes here
await page.waitFor(5000);
let bars = await page.$$(".bar");
for(const idx in bars){
await bars[idx].click({delay:250});
}
// close browser
await browser.close();
})();
I've been looking for a way to select each bar from the $$(".bar") selection and click it but I cannot seem to find any documentation around it.
Update
I increased page.waitFor to 5000 and removed the ElementHandle from the for loop. The code no longer throws any errors, but it doesn't click anything.
It looks like clicking doesn't work for SVG elements yet: https://github.com/GoogleChrome/puppeteer/issues/1769
Without seeing more code I am not sure whether this is the answer you need. This code selects all a.bar elements inside a given ul and returns an array of their hrefs. It then loops through the links and opens each one in turn.
I think the missing piece of the jigsaw is mapping the links to an array of hrefs inside $$eval (see links => links.map((a) => { return a.href }) below).
const puppeteer = require('puppeteer');
(async () => {
const html = `
<html>
<body>
<ul>
<li><a class="bar" href="https://www.google.com">Google</a></li>
<li><a class="bar" href="https://www.bing.com">Bing</a></li>
<li><a class="bar" href="https://duckduckgo.com">DuckDuckGo</a></li>
</ul>
</body>
</html>`;
const browser = await puppeteer.launch({ headless:false});
const page = await browser.newPage();
await page.goto(`data:text/html,${html}`);
const data = await page.$$eval('ul li a.bar', links =>
links.map((a) => { return a.href }));
//You will now have an array of hrefs
for (const href of data) {
console.log("Opening", href);
await page.goto(href);
}
await browser.close();
})();
As the title says, is there any way to wait for all instances of a specific selector, say "div.ticket", to load? I tried waitForNavigation({waitUntil: "networkidle2"}); but it takes too long. I am trying to generate a PDF of a page that contains multiple tickets, each inside a div with class "ticket", but when I run it without any waitFor the tickets aren't captured properly (images and some text are missing). I also tried page.waitFor('.ticket'); but it didn't give the desired output.
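If you want to wait for all the elements, you must know how many of them there should be. Here is an example where a new .ticket is created every second. When you know there should be 3 tickets on the page, wait for page.waitForSelector('.ticket:nth-of-type(3)') (older Puppeteer versions also accept page.waitFor(selector), which is now deprecated). Note that :nth-of-type counts siblings with the same tag name, not the same class, so this works here because every child of .tickets is a ticket div: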
const puppeteer = require('puppeteer');
const html = `
<html>
<head></head>
<body>
<div class="tickets"></div>
<script>
const ticketsContainer = document.querySelector('.tickets');
let i = 0;
const timeInterval = setInterval(
() => {
const newTicket = document.createElement("div");
newTicket.innerHTML = "ticket" + i;
newTicket.className = "ticket";
ticketsContainer.appendChild(newTicket);
i++;
if (i >= 3) {
clearInterval(timeInterval);
}
},
1000
);
</script>
</body>
</html>`;
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(`data:text/html,${html}`);
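// await page.waitForSelector('.ticket'); // resolves as soon as the first ticket exists
await page.waitForSelector('.ticket:nth-of-type(3)'); // page.waitFor(selector) in older versions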
await page.screenshot({ path: 'image.png' });
await browser.close();
})();
I have a problem with getting elements by their selectors.
A page on which I struggle is: http://html5.haxball.com/.
What I have succeeded in is logging in, but that was kind of a hack, because I relied on the fact that the field I need to fill is already selected.
After typing in nick and going into lobby I want to click the button 'Create room'. Its selector:
body > div > div > div > div > div.buttons > button:nth-child(3)
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
args: ['--no-sandbox'], headless: false, slowMo: 10
});
const page = await browser.newPage();
await page.goto('http://html5.haxball.com/index.html');
await page.keyboard.type(name);
await page.keyboard.press('Enter');
//at this point I am logged in.
let buttonSelector = 'body > div > div > div > div > div.buttons > button:nth-child(3)';
await page.waitForSelector('body > div > div');
await page.evaluate(() => {
document.querySelector(buttonSelector).click();
});
browser.close();
})();
After running this code I get the error:
UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'click' of null
My initial approach used:
await page.click(buttonSelector);
instead of page.evaluate, but that also fails.
What frustrates me the most is that when I run this in the Chromium console:
document.querySelector(buttonSelector).click();
it works fine.
A few things to note:
The selector you are using to retrieve the button is more complex than it needs to be. Try something simpler like: 'button[data-hook="create"]'.
The game is within an iframe, so you're better off calling querySelector on the iframe's document object rather than on the containing window's document.
The function passed to evaluate runs in a different context (the browser page) from your Node script. For this reason, you have to pass variables from your Node script to the page function explicitly, as extra arguments to evaluate; otherwise buttonSelector will not be defined inside the page.
Making the changes above, your code will input your name and successfully click on "Create Room":
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
args: ['--no-sandbox'], headless: false, slowMo: 10
});
const page = await browser.newPage();
await page.goto('http://html5.haxball.com/index.html');
await page.keyboard.type('Chris');
await page.keyboard.press('Enter');
//at this point I am logged in.
let buttonSelector = 'button[data-hook="create"]';
await page.waitForSelector('body > div > div');
await page.evaluate((buttonSelector) => {
const frame = document.querySelector('iframe');
const frameDocument = frame.contentDocument;
frameDocument.querySelector(buttonSelector).click();
}, buttonSelector);
await browser.close();
})();