I'm using Puppeteer to interact with a website, using the evaluate() function to manipulate the page (e.g. to click on certain items). Clicking works fine, but I can't return the page source after clicking with evaluate().
I have recreated the error in the simplified script below. It loads google.com, clicks "I'm Feeling Lucky", and should then return the page source of the loaded page:
const puppeteer = require('puppeteer');
async function main() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto('https://www.google.com/', {waitUntil: 'networkidle2'});
  response = await page.evaluate(() => {
    document.getElementsByClassName('RNmpXc')[1].click()
  });
  await page.waitForNavigation({waitUntil: 'load'});
  console.log(response.text());
}
main();
I get the following error:
TypeError: Cannot read property 'text' of undefined
UPDATE: New code, following the suggestion to use page.content():
const puppeteer = require('puppeteer');
async function main() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto('https://www.google.com/', {waitUntil: 'networkidle2'});
  await page.evaluate(() => {
    document.getElementsByClassName('RNmpXc')[1].click()
  });
  const source = await page.content()
  console.log(source);
}
main();
I am now getting the following error:
Error: Execution context was destroyed, most likely because of a navigation.
My question is: How can I return page source using the .text() method after manipulating the webpage using the evaluate() method?
All suggestions / insight / proposals would be very much appreciated thanks.
Since you're asking for the page source after JavaScript modification, I assume you want the DOM and not the original HTML content. Your evaluate function doesn't return anything, so the response is undefined. You can use
const source = await page.evaluate(() => new XMLSerializer().serializeToString(document.doctype) + document.documentElement.outerHTML);
or
const source = await page.content();
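The "Execution context was destroyed" error in the updated script is most likely raised because page.content() runs while the click is still navigating to the new page. A minimal sketch, assuming the click always triggers a navigation: register waitForNavigation before clicking, then read the content. The helper name clickAndGetSource is hypothetical, and the class name RNmpXc is taken from the question and may no longer match Google's markup.

```javascript
// Hypothetical helper: click an element inside the page, wait for the
// navigation it triggers, then return the serialized DOM of the new page.
async function clickAndGetSource(page, className, index) {
  // Start waiting for the navigation *before* the click so it is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'load' }),
    page.evaluate(
      (cls, i) => document.getElementsByClassName(cls)[i].click(),
      className,
      index
    ),
  ]);
  // The navigation has finished, so content() runs against the new document.
  return page.content();
}

// Usage (inside an async function, after page.goto(...)):
//   const source = await clickAndGetSource(page, 'RNmpXc', 1);
//   console.log(source);
```

If the click does not always navigate, waitForNavigation will time out, so this pattern only fits clicks that are known to load a new page.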
I am trying to open a Puppeteer window with my Google profile, but I want to do it multiple times: 2-4 windows, all with the same profile. Is that possible? I am getting this error when I do it:
(node:17460) UnhandledPromiseRejectionWarning: Error: Failed to launch the browser process!
[45844:13176:0410/181437.893:ERROR:cache_util_win.cc(20)] Unable to move the cache: Access is denied. (0x5)
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless:false,
    '--user-data-dir=C:\\Users\\USER\\AppData\\Local\\Google\\Chrome\\User Data',
  );
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
Note: As already pointed out in the comments, there is a syntax error in the example. The launch call should look like this:
const browser = await puppeteer.launch({
  headless: false,
  args: ['--user-data-dir=C:\\Users\\USER\\AppData\\Local\\Google\\Chrome\\User Data']
});
The error occurs because you are launching multiple browser instances at the same time with the same profile, so the profile directory is locked and Puppeteer cannot move it for reuse.
You should avoid starting Chromium instances with the same user data dir at the same time.
Possible solutions
Make the opened windows sequential. This can be useful if you only have a few, e.g.:
const firstFn = async () => await puppeteer.launch() ...
const secondFn = async () => await puppeteer.launch() ...
(async () => {
  await firstFn()
  await secondFn()
})();
Create copies of the user data dir as User Data1, User Data2, User Data3, etc. to avoid conflicts while Puppeteer copies them. This can be done on the fly with Node's fs module, or even manually if you don't need many instances.
Consider reusing Chromium instances (if your use case allows it) with browser.wsEndpoint and puppeteer.connect. This can be a solution if you need to open thousands of pages with the same user data dir.
Note: this option is the best for performance, as only one browser is launched; you can then open as many pages as you want in a for..of or regular for loop (using forEach by itself can cause side effects), e.g.:
const puppeteer = require('puppeteer')
const urlArray = ['https://example.com', 'https://google.com']
async function fn() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--user-data-dir=C:\\Users\\USER\\AppData\\Local\\Google\\Chrome\\User Data']
  })
  // wsEndpoint() returns the WebSocket URL of the running browser
  const browserWSEndpoint = browser.wsEndpoint()
  for (const url of urlArray) {
    try {
      // attach a second client to the same browser instead of launching a new one
      const browser2 = await puppeteer.connect({ browserWSEndpoint })
      const page = await browser2.newPage()
      await page.goto(url) // it can be wrapped in a retry function to handle flakiness
      // doing cool things with the DOM
      await page.screenshot({ path: `${url.replace('https://', '')}.png` })
      await page.goto('about:blank') // because of you: https://github.com/puppeteer/puppeteer/issues/1490
      await page.close()
      await browser2.disconnect()
    } catch (e) {
      console.error(e)
    }
  }
  await browser.close()
}
fn()
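The second option above (separate copies of the user data dir) could be sketched as follows. This is only an illustration under assumptions: the helper name copyUserDataDir is hypothetical, the copies are placed in the OS temp directory, fs.cpSync requires Node.js 16.7 or newer, and the source profile should not be in use by a running Chrome while it is copied.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy a Chrome profile directory into a fresh temporary
// directory so that each browser instance gets its own, unlocked copy.
function copyUserDataDir(sourceDir, label) {
  // mkdtempSync appends random characters, so every call yields a new directory.
  const targetDir = fs.mkdtempSync(path.join(os.tmpdir(), `user-data-${label}-`));
  // Copy the contents of sourceDir (recursively) into targetDir.
  fs.cpSync(sourceDir, targetDir, { recursive: true });
  return targetDir;
}

// Usage (inside an async function; the path is the one from the question):
//   const dir = copyUserDataDir('C:\\Users\\USER\\AppData\\Local\\Google\\Chrome\\User Data', 1);
//   const browser = await puppeteer.launch({ headless: false, userDataDir: dir });
```

A full Chrome profile can be large, so copying it for every instance trades disk space and startup time for isolation.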
I'm currently trying to get some information from a website (https://www.bauhaus.info/) and am failing at the cookie popup form.
This is my code so far:
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.bauhaus.info');
  await sleep(5000);
  const html = await page.content();
  fs.writeFileSync("./page.html", html, "UTF-8");
  // wait for the PDF to be written before closing the browser
  await page.pdf({
    path: './bauhaus.pdf',
    format: 'a4'
  });
  await browser.close();
})();
function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
Up to this point everything works fine. But I can't accept the cookie banner, because I don't see the banner's HTML in Puppeteer, even though the form is visible in the PDF.
[Screenshots omitted: "My browser" and "Puppeteer"]
Why can I not see this popup in the HTML code?
Bonus question: Is there any way to replace the sleep method with one of the page wait methods without knowing which JS method triggers the cookie form to appear?
This element is in a shadow root. Please see my answer to "Puppeteer not giving accurate HTML code for page with shadow roots" for additional information about the shadow DOM.
This code reaches into the shadow root, waits for the button to appear, then clicks it:
const puppeteer = require("puppeteer"); // ^13.5.1
let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  const url = "https://www.bauhaus.info/";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const el = await page.waitForSelector("#usercentrics-root");
  await page.waitForFunction(el =>
    el.shadowRoot.querySelector(".sc-gsDKAQ.dejeIh"), {}, el
  );
  await el.evaluate(el =>
    el.shadowRoot.querySelector(".sc-gsDKAQ.dejeIh").click()
  );
  await page.waitForTimeout(100000); // pause to show that it worked
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
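As for why the banner is missing from the saved HTML: page.content() serializes the regular document, and that serialization does not include the contents of shadow roots. A minimal sketch for reading the banner's markup directly, assuming the shadow host is #usercentrics-root as in the code above and that its shadow root is open; the helper name getShadowHtml is hypothetical.

```javascript
// Hypothetical helper: return the HTML inside an element's open shadow root,
// or null if the element has no accessible shadow root.
async function getShadowHtml(page, hostSelector) {
  // Wait until the shadow host itself is attached to the document.
  const host = await page.waitForSelector(hostSelector);
  // Runs inside the page; shadowRoot is null for closed roots and plain elements.
  return host.evaluate((el) => (el.shadowRoot ? el.shadowRoot.innerHTML : null));
}

// Usage (inside an async function, after page.goto(...)):
//   const bannerHtml = await getShadowHtml(page, '#usercentrics-root');
//   fs.writeFileSync('./banner.html', bannerHtml ?? '', 'utf8');
```

The shadow root may still be empty right after the host appears, which is why the answer's code waits for the button inside it before clicking.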
I am attempting to scrape deck lists from aetherhub for personal use. When you get to the page, you have to click to open a modal popup, and no matter what I tried I could not copy the text in the body of the modal. My second option was to have the page copy the list to the clipboard, save that to a variable, and then work with the string. Bingo! I got it to connect, copy, and return the deck list. The problem is that it does not work every time. I have tried adding waits and delays to see if that would help, but I still cannot get it to work reliably. I mostly get this error: "Error: Node is either not visible or not an HTMLElement"
const puppeteer = require('puppeteer')
async function getcardlist(url) {
  try {
    const browser = await puppeteer.launch({headless: false})
    const page = await browser.newPage()
    const context = await browser.defaultBrowserContext()
    await context.overridePermissions(url, ['clipboard-read'])
    await page.goto(url, {waitUntil: 'load'})
    const exportButton = await page.$('li.nav-item:nth-child(5) > a:nth-child(1)')
    await exportButton.click()
    await page.waitForSelector('a.mtgaExport')
    const mtgaFormatButton = await page.$('a.mtgaExport')
    await mtgaFormatButton.click()
    await page.waitForSelector('#exportSimpleBtn')
    const simplebutton = await page.$('#exportSimpleBtn')
    await simplebutton.click()
    await page.$('.modal.show', { waitUntil: 'load' })
    await page.waitForSelector('.modal-footer > #exportListbtn')
    const toClipBoard = await page.$('.modal-footer > #exportListbtn')
    await toClipBoard.click()
    const copiedText = await page.evaluate(`(async () => await navigator.clipboard.readText())()`)
    await browser.close()
    return copiedText
  } catch (err) {
    console.error(err);
  }
}
getcardlist('https://aetherhub.com/Deck/rakdos-menacing-menaces')
  .then(returnVal => console.log(returnVal))
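When you get the error
"Error: Node is either not visible or not an HTMLElement"
Puppeteer is saying that it cannot click the requested element: the handle was resolved, but the element is either not visible on the page (for example, it is hidden or has no size yet) or is not an HTMLElement.
So a page.waitForSelector that only waits for the element to exist in the DOM will not prevent the error on its own. Use headless: false and inspect the element to check that your selector matches the element you expect and that it is actually visible at the moment you click it.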
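One way to act on this is to wait for visibility rather than mere presence before clicking. A minimal sketch, assuming the intermittent failures come from the modal buttons being in the DOM before they are shown; the helper name clickWhenVisible and the 10 second default timeout are my own choices, while the visible option of waitForSelector is part of Puppeteer.

```javascript
// Hypothetical helper: wait until the element matching `selector` is visible,
// then click it. Throws if the timeout elapses first.
async function clickWhenVisible(page, selector, timeout = 10000) {
  // visible: true waits for the element to be present and not hidden by CSS.
  const handle = await page.waitForSelector(selector, { visible: true, timeout });
  await handle.click();
  return handle;
}

// Usage with the selectors from the question (inside an async function):
//   await clickWhenVisible(page, 'a.mtgaExport');
//   await clickWhenVisible(page, '#exportSimpleBtn');
//   await clickWhenVisible(page, '.modal-footer > #exportListbtn');
```

Because waitForSelector already returns the element handle, the separate page.$ lookups from the question are not needed with this approach.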
I'm trying to automate a sign-in to a simple website that a scammer sent my friend. I can use Puppeteer to fill in the text inputs, but when I try to use it to click the button, all it does is trigger the button's color change (the one that happens when the mouse hovers over the button). I also tried pressing Enter while focusing on the input fields, but that doesn't seem to work. When I use document.buttonNode.click() in the console it works, but I can't seem to emulate that with Puppeteer.
I also tried to use the waitFor function, but it kept telling me 'cannot read property waitFor'.
const puppeteer = require('puppeteer');
const chromeOptions = {
  headless:false,
  defaultViewport: null,
  slowMo:10};
(async function main() {
  const browser = await puppeteer.launch(chromeOptions);
  const page = await browser.newPage();
  await page.goto('https://cornelluniversityemailverifica.godaddysites.com/?fbclid=IwAR3ERzNkDRPOGL1ez2fXcmumIYcMyBjuI7EUdHIWhqdRDzzUAMwRGaI_o-0');
  await page.type('#input1', 'hello#cornell.edu');
  await page.type('#input2', 'password');
  // await this.page.waitFor(2000);
  // await page.type(String.fromCharCode(13));
  await page.click('button[type=submit]');
})()
This site blocks unsecured events, so you need to wait before the click.
Just add await page.waitFor(1000); before the click. I would also suggest passing the waitUntil: "networkidle2" option to the goto function.
Here is the working script:
const puppeteer = require('puppeteer');
const chromeOptions = {
  headless: false,
  defaultViewport: null,
  slowMo:10
};
(async function main() {
  const browser = await puppeteer.launch(chromeOptions);
  const page = await browser.newPage();
  await page.goto('https://cornelluniversityemailverifica.godaddysites.com/?fbclid=IwAR3ERzNkDRPOGL1ez2fXcmumIYcMyBjuI7EUdHIWhqdRDzzUAMwRGaI_o-0', { waitUntil: 'networkidle2' });
  await page.type('#input1', 'hello#cornell.edu');
  await page.type('#input2', 'password');
  await page.waitFor(1000);
  await page.click('button[type=submit]');
})()
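Two side notes on the waiting step. The commented-out line in the question calls this.page.waitFor, but this.page is not defined inside that function, which is a likely source of the 'cannot read property waitFor' message; the page variable should be used directly. Also, page.waitFor is deprecated in later Puppeteer releases, so it may not exist in the version you have installed. A plain promise-based delay does not depend on the Puppeteer version; the helper name delay below is hypothetical.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage (inside the async main function, replacing page.waitFor(1000)):
//   await delay(1000);
//   await page.click('button[type=submit]');
```

A fixed delay is still a guess about how long the page needs, so the value may have to be tuned for a slower connection.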
I'm trying to login into Instagram with Puppeteer, but somehow I'm unable to do it.
Can you help me?
Here is the link I'm using:
https://www.instagram.com/accounts/login/
I tried different stuff. The last code I tried was this:
const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/accounts/login/');
  await page.evaluate();
  await afterJS.type('#f29d14ae75303cc', 'username');
  await afterJS.type('#f13459e80cdd114', 'password');
  await page.pdf({path: 'page.pdf', format: 'A4'});
  await browser.close();
})();
Thanks in advance!
OK, you're on the right track but just need to change a few things.
Firstly, I have no idea where your afterJS variable comes from. Either way, you won't need it.
You're asking for data to be typed into the username and password input fields, but you aren't asking Puppeteer to actually click the log in button to complete the log in process.
page.evaluate() is used to execute JavaScript code inside the page context (i.e. on the web page loaded in the remote browser), so you don't need to use it here.
I would refactor your code to look like the following:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/accounts/login/');
  await page.waitForSelector('input[name="username"]');
  await page.type('input[name="username"]', 'username');
  await page.type('input[name="password"]', 'password');
  await page.click('button[type="submit"]');
  // Add a wait for some selector on the home page to load to ensure the next step works correctly
  await page.pdf({path: 'page.pdf', format: 'A4'});
  await browser.close();
})();
Hopefully this sets you down the right path to getting past the login page!
Update 1:
You've enquired about parsing the text of an element on Instagram. Unfortunately I don't have an account there myself, so I can't give you an exact solution, but hopefully this is still of some value.
So you're trying to read an element's text, right? You can do this as follows:
const text = await page.$eval(cssSelector, (element) => {
  return element.textContent;
});
All you have to do is replace cssSelector with the selector of the element you wish to retrieve the text from.
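If the selector can match more than one element, page.$$eval works the same way on the whole list of matches. A minimal sketch; the helper name getTexts is hypothetical, and trimming the whitespace is my own assumption about what is wanted.

```javascript
// Hypothetical helper: return the trimmed text of every element matching
// `cssSelector`, in document order.
async function getTexts(page, cssSelector) {
  // $$eval runs the callback in the page with an array of all matching elements.
  return page.$$eval(cssSelector, (elements) =>
    elements.map((element) => (element.textContent || '').trim())
  );
}

// Usage (inside an async function):
//   const texts = await getTexts(page, cssSelector);
//   console.log(texts);
```

Unlike $eval, which throws when nothing matches, $$eval simply passes an empty array, so the result is an empty list in that case.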
Update 2:
OK lastly, you've enquired about scrolling down to an element within a parent element. I'm not going to steal the credit from someone else so here's the answer to that:
How to scroll to an element inside a div?
What you'll have to do is basically follow the instructions in there and get that to work with puppeteer similar to as follows:
await page.evaluate(() => {
  const lastLink = document.querySelectorAll('h3 > a')[2];
  const topPos = lastLink.offsetTop;
  const parentDiv = document.querySelector('div[class*="eo2As"]');
  parentDiv.scrollTop = topPos;
});
Bear in mind that I haven't tested that code - I've just directly followed the answer in the URL I've provided. It should work!
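An alternative to computing offsetTop by hand is the standard Element.scrollIntoView() method, which scrolls the element's scrollable ancestors until it is in view. A minimal sketch, likewise untested against Instagram; the helper name scrollToElement is hypothetical.

```javascript
// Hypothetical helper: scroll the first element matching `cssSelector` into view.
async function scrollToElement(page, cssSelector) {
  // $eval runs the callback in the page and throws if nothing matches.
  await page.$eval(cssSelector, (element) => {
    element.scrollIntoView({ block: 'start' });
  });
}

// Usage (inside an async function):
//   await scrollToElement(page, 'h3 > a');
```

This avoids having to know the selector of the scrolling parent, at the cost of less control over which ancestor ends up being scrolled.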
You can log in to Instagram using the following example code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until page has loaded
  await page.goto('https://www.instagram.com/accounts/login/', {
    waitUntil: 'networkidle0',
  });
  // Wait for log in form (same selectors as used for typing and clicking below)
  await Promise.all([
    page.waitForSelector('[name="username"]'),
    page.waitForSelector('[name="password"]'),
    page.waitForSelector('[type="submit"]'),
  ]);
  // Enter username and password
  await page.type('[name="username"]', 'username');
  await page.type('[name="password"]', 'password');
  // Submit log in credentials and wait for navigation
  await Promise.all([
    page.click('[type="submit"]'),
    page.waitForNavigation({
      waitUntil: 'networkidle0',
    }),
  ]);
  // Download PDF
  await page.pdf({
    path: 'page.pdf',
    format: 'A4',
  });
  await browser.close();
})();