Any option available in Puppeteer to sense complete page download? [duplicate] - javascript

I am working on creating PDF from web page.
The application on which I am working is single page application.
I tried many options and suggestion on https://github.com/GoogleChrome/puppeteer/issues/1412
But it is not working
const browser = await puppeteer.launch({
executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
ignoreHTTPSErrors: true,
headless: true,
devtools: false,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto(fullUrl, {
waitUntil: 'networkidle2'
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitFor(2000);
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4'
});
What I want is to generate PDF report as soon as Page is loaded completely.
I don't want to write any type of delays i.e. await page.waitFor(2000);
I can not do waitForSelector because the page has charts and graphs which are rendered after calculations.
Help will be appreciated.

You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:
await page.goto(fullUrl, {
waitUntil: 'networkidle0',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitForNavigation({
waitUntil: 'networkidle0',
});
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4',
});
If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:
await page.waitForSelector('#example', {
visible: true,
});

Sometimes the networkidle events do not always give an indication that the page has completely loaded. There could still be a few JS scripts modifying the content on the page. So watching for the completion of HTML source code modifications by the browser seems to be yielding better results. Here's a function you could use -
const waitTillHTMLRendered = async (page, timeout = 30000) => {
const checkDurationMsecs = 1000;
const maxChecks = timeout / checkDurationMsecs;
let lastHTMLSize = 0;
let checkCounts = 1;
let countStableSizeIterations = 0;
const minStableSizeIterations = 3;
while(checkCounts++ <= maxChecks){
let html = await page.content();
let currentHTMLSize = html.length;
let bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);
console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, " body html size: ", bodyHTMLSize);
if(lastHTMLSize != 0 && currentHTMLSize == lastHTMLSize)
countStableSizeIterations++;
else
countStableSizeIterations = 0; //reset the counter
if(countStableSizeIterations >= minStableSizeIterations) {
console.log("Page rendered fully..");
break;
}
lastHTMLSize = currentHTMLSize;
await page.waitForTimeout(checkDurationMsecs);
}
};
You could use this after the page load / click function call and before you process the page content. e.g.
await page.goto(url, {'timeout': 10000, 'waitUntil':'load'});
await waitTillHTMLRendered(page)
const data = await page.content()

In some cases, the best solution for me was:
await page.goto(url, { waitUntil: 'domcontentloaded' });
Some other options you could try are:
await page.goto(url, { waitUntil: 'load' });
await page.goto(url, { waitUntil: 'domcontentloaded' });
await page.goto(url, { waitUntil: 'networkidle0' });
await page.goto(url, { waitUntil: 'networkidle2' });
You can check this at puppeteer documentation:
https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions

I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:
await page.waitForSelector('#blue-button');

In the latest Puppeteer version, networkidle2 worked for me:
await page.goto(url, { waitUntil: 'networkidle2' });

Wrap the page.click and page.waitForNavigation in a Promise.all
await Promise.all([
page.click('#submit_button'),
page.waitForNavigation({ waitUntil: 'networkidle0' })
]);

I encountered the same issue with networkidle when I was working on an offscreen renderer. I needed a WebGL-based engine to finish rendering and only then make a screenshot. What worked for me was a page.waitForFunction() method. In my case the usage was as follows:
await page.goto(url);
await page.waitForFunction("renderingCompleted === true")
const imageBuffer = await page.screenshot({});
In the rendering code, I was simply setting the renderingCompleted variable to true, when done. If you don't have access to the page code you can use some other existing identifier.

You can also use to ensure all elements have rendered
await page.waitFor('*')
Reference: https://github.com/puppeteer/puppeteer/issues/1875

As for December 2020, waitFor function is deprecated, as the warning inside the code tell:
waitFor is deprecated and will be removed in a future release. See
https://github.com/puppeteer/puppeteer/issues/6214 for details and how
to migrate your code.
You can use:
sleep(millisecondsCount) {
if (!millisecondsCount) {
return;
}
return new Promise(resolve => setTimeout(resolve, millisecondsCount)).catch();
}
And use it:
(async () => {
await sleep(1000);
})();

Keeping in mind the caveat that there's no silver bullet to handle all page loads, one strategy is to monitor the DOM until it's been stable (i.e. has not seen a mutation) for more than n milliseconds. This is similar to the network idle solution but geared towards the DOM rather than requests and therefore covers a different subset of loading behaviors.
Generally, this code would follow a page.waitForNavigation({waitUntil: "domcontentloaded"}) or page.goto(url, {waitUntil: "domcontentloaded"}), but you could also wait for it alongside, say, waitForNetworkIdle() using Promise.all() or Promise.race().
Here's a simple example:
const puppeteer = require("puppeteer"); // ^14.3.0
const waitForDOMStable = (
page,
options={timeout: 30000, idleTime: 2000}
) =>
page.evaluate(({timeout, idleTime}) =>
new Promise((resolve, reject) => {
setTimeout(() => {
observer.disconnect();
const msg = `timeout of ${timeout} ms ` +
"exceeded waiting for DOM to stabilize";
reject(Error(msg));
}, timeout);
const observer = new MutationObserver(() => {
clearTimeout(timeoutId);
timeoutId = setTimeout(finish, idleTime);
});
const config = {
attributes: true,
childList: true,
subtree: true
};
observer.observe(document.body, config);
const finish = () => {
observer.disconnect();
resolve();
};
let timeoutId = setTimeout(finish, idleTime);
}),
options
)
;
const html = `<!DOCTYPE html><html lang="en"><head>
<title>test</title></head><body><h1></h1><script>
(async () => {
for (let i = 0; i < 10; i++) {
document.querySelector("h1").textContent += i + " ";
await new Promise(r => setTimeout(r, 1000));
}
})();
</script></body></html>`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
await waitForDOMStable(page);
console.log(await page.$eval("h1", el => el.textContent));
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
For pages that continually mutate the DOM more often than the idle value, the timeout will eventually trigger and reject the promise, following the typical Puppeteer fallback. You can set a more aggressive overall timeout to fit your needs or tailor the logic to ignore (or only monitor) a particular subtree.

Answers so far haven't mentioned a critical fact: it's impossible to write a one-size-fits-all waitUntilPageLoaded function that works on every page. If it were possble, Puppeteer would surely provide it.
Such a function can't rely on a timeout, because there's always some page that takes longer to load than that timeout. As you extend the timeout to reduce the failure rate, you introduce unnecessary delays when working with fast pages. Timeouts are generally a poor solution, opting out of Puppeteer's event-driven model.
Waiting for idle network requests might not always work if the responses involve long-running DOM updates that take longer than 500ms to trigger a render.
Waiting for the DOM to stop changing might miss slow network requests, long-delayed JS triggers, or ongoing DOM manipulation that might cause the listener never to settle, unless specially handled.
And, of course, there's user interaction: captchas, prompts and cookie/subscription modals that need to be clicked through and dismissed before the page is in a sensible state for a full-page screenshot (for example).
Since every page has different, arbitrary JS behavior, the typical approach is to write event-driven logic that works for a specific page. Making precise, directed assumptions is much better than cobbling together a boatload of hacks that tries to solve every edge case.
If your use case is to write a load event that works on every page, my suggestion is to use some combination of the tools described here that is most balanced to meet your needs (speed vs. accuracy, development time/code complexitiy vs accuracy, etc). Use fail-safes for everything rather than blindly assuming all pages will cooperate with your assumptions. Think hard about what extent you really need to try to handle every web page. Prepare to compromise and accept some degree of failures you can live with.
Here's a quick rundown of the strategies you can mix and match to wait for loads to fit your needs:
page.goto() and page.waitForNavigation() default to the load event, which "is fired when the whole page has loaded, including all dependent resources such as stylesheets and images" (MDN), but this is often too pessimistic; there's no need to wait for a ton of data you don't care about. Often the data is available without waiting for all external resources, so domcontentloaded should be faster. See my post Avoiding Puppeteer Antipatterns for further discussion.
On the other hand, if there are JS-triggered networks requests after load, you'll miss that data. Hence networkidle2 and networkidle0, which wait 500 ms after the number of active network requests are 2 or 0. The motivation for the 2 version is that some sites keep ongoing requests open, which would cause networkidle0 to time out.
If you're waitng for a specific network response that might have a payload (or, for the general case, implementing your own network idle monitor), use page.waitForResponse(). page.waitForRequest(), page.waitForNetworkIdle() and page.on("request", ...) are also useful here.
If you're waiting for a particular selector to be visible, use page.waitForSelector(). If you're waiting for a load on a specific page, identify a selector that indicates the state you want to wait for. Generally speaking, for scripts specific to one page, this is the main tool to wait for the state you want, whether you're extracting data or clicking something. Frames and shadow roots thwart this function.
page.waitForFunction() lets you wait for an arbitrary predicate, for example, checking that the page's HTML or a specific list is a certain length. It's also useful for quickly dipping into frames and shadow roots to wait for predicates that depend on nested state. This function is also handy for detecting DOM mutations.
The most general tool is page.evaluate(), which plugs code into the browser. You can put just about any conditions you want here; most other Puppeteer functions are convenience wrappers for common cases you could implement by hand with evaluate.

I can't leave comments, but I made a python version of Anand's answer for anyone who finds it useful (i.e. if they use pyppeteer).
async def waitTillHTMLRendered(page: Page, timeout: int = 30000):
check_duration_m_secs = 1000
max_checks = timeout / check_duration_m_secs
last_HTML_size = 0
check_counts = 1
count_stable_size_iterations = 0
min_stabe_size_iterations = 3
while check_counts <= max_checks:
check_counts += 1
html = await page.content()
currentHTMLSize = len(html);
if(last_HTML_size != 0 and currentHTMLSize == last_HTML_size):
count_stable_size_iterations += 1
else:
count_stable_size_iterations = 0 # reset the counter
if(count_stable_size_iterations >= min_stabe_size_iterations):
break
last_HTML_size = currentHTMLSize
await page.waitFor(check_duration_m_secs)

For me the { waitUntil: 'domcontentloaded' } is always my go to.
I found that networkidle doesnt work well...

Related

Close the page after certain interval [Puppeteer]

I have used puppeteer for one of my projects to open webpages in headless chrome, do some actions and then close the page. These actions, however, are user dependent. I want to attach a lifetime to the page, where it closes automatically after, say 30 minutes, of opening irrespective of whether any action is performed or not.
I have tried setTimeout() functionality of Node JS but it didn't work (or I just couldn't figure how to make it work).
I have tried the following:
const puppeteer = require('puppeteer-core');
const browser = await puppeteer.connect({browserURL: browser_url});
const page = await browser.newPage();
// timer starts ticking here upon creation of new page (maybe in a subroutine and not block the main thread)
/**
..
Do something
..
*/
// timer ends and closePage() is triggered.
const closePage = (page) => {
if (!page.isClosed()) {
page.close();
}
}
But this gives me the following error:
Error: Protocol error: Connection closed. Most likely the page has been closed.
Your provided code should work as excepted. Are you sure the page is still opened after the timeout and it is indeed the same page?
You can try this wrapper for opening pages and closing them correctly.
// since it is async it won't block the eventloop.
// using `await` will allow other functions to execute.
async function openNewPage(browser, timeoutMs) {
const page = await browser.newPage()
setTimeout(async () => {
// you want to use try/catch for omitting unhandled promise rejections.
try {
if(!page.isClosed()) {
await page.close()
}
} catch(err) {
console.error('unexpected error occured when closing page.', err)
}
}, timeoutMs)
}
// use it like so.
const browser = await puppeteer.connect({browserURL: browser_url});
const min30Ms = 30 * 60 * 1000
const page = await openNewPage(browser, min30Ms);
// ...
The above only closes the Tabs in your browser. For closing the puppeteer instance you would have to call browser.close() which could may be what you want?
page.close returns a promise so you need to define closePage as an async function and use await page.close(). I believe #silvan's answer should address the issue, just make sure to replace if condition
if(page.isClosed())
with
if(!page.isClosed())

screen shot and data trying to be taken before site fully loads using puppeteer

Hi i am trying to get to take a screenshot of a website using puppeteer but the site loads quite slow which leads to always not being able to grab any data or take screen shots, I would like to delay my screenshot until the site is finished loading, I have tried a bunch of methods and cant figure it out. Thanks in advance for any help.
This is my Code
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
async function scrapeProduct(url) {
//launching puppeteer
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: "load" });
await page.waitFor("*");
function time() {
var d = new Date();
var n = d.getSeconds();
return console.log(n);
}
time();
await page.screenshot({ path: "testresult.png" });
time();
await browser.close();
}
scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been depreciated recently so you are better off trying the other events.
I can't inspect the webpage you are taking a screenshot of so cannot tell what might be happening after the load event.
However have you tried the other events puppeteer offers?
waitForNavigation and waitForSelector mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of then you can add a DOM event to it which your puppeteer code can wait for using waitForEvent.
If all else fails and time is not important then you can put in a sleep(n) that is long enough to guarantee the page is loaded.

How to speed up puppeteer?

A web page has a button and puppeteer must click that button as soon as possible button becomes visible. This button is not always visible and it is becoming visible for everyone at the same time. So i have to refresh constantly to find that button is became visible. I wrote that script below for to do that:
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080})
//I am calling my pageRefresher method here
async function pageRefresher(page,browser, url) {
try {
await page.goto(url, {waitUntil: 'networkidle2'})
try {
await page.waitForSelector('#ourButton', {timeout: 10});
await page.click('#ourButton')
console.log(`clicked!`)
await browser.close()
} catch (error) {
console.log('catch2 ' + counter + ' ' + error)
counter += 1
await pageRefresher(page, browser, url)
}
}catch (error) {
console.log('catch3' + error)
await browser.close();
}
}
As you can see, my method is recursive. It goes to that page and looking for that button. If there is no button then it calls itself again for redoing the same job until it finds and clicks to that button.
Actually it works well right now. But it is slow. I am running this script meanwhile i am opening the same page on my desktop chrome and i am starting to refresh that page manually. And i am always winning, i am always clicking to that button before the puppeteer.
How can i speed up this process? A script should not lose to a human who has just manual controls like F5 button.
A script should not lose to a human who has just manual controls like F5 button.
It happens because sometimes the rules that puppeteer follows are much stricter than what we consider as a "fully loaded webpage". Even if you as a human can decide whether your desired element is in the DOM already (because you see the element is there) or it is not there (because you don't see it). E.g.: you will see that your button is not there even if the background image is still loading in the background, or the webfonts are still not loaded and you have the fallback fonts, but puppeteer waits for specific events in the background to get the permission either to go to the catch block (timeout) or to grab the desired element (waitForSelector succeeds). It can really depends on the site you are visiting, but you are able to speed up the process of recognition of your desired element.
I give some examples and ideas how you can achieve this.
Ways to speed up recognition of the desired element
1.) If you don't need every network connections for your task you could speed up page loading by replacing waitUntil: 'networkidle2' to waitUntil: 'domcontentloaded' as this event happens usually earlier and will be fired when #ourButton will be already present in the DOM.
The possible options of page.goto/page.reload:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
You are winning over the script because of networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application or later you will need data from the 3rd party network connection e.g. cookies) but in case it is not mandatory you will experience better performance with domcontentloaded.
2.) Instead of constantly navigating to the same url you could use page.reload method in a loop, e.g.:
await page.goto(url, { waitUntil: 'domcontentloaded' })
let selectorExists = await page.$('#ourButton')
while (selectorExists === null) {
await page.reload({ waitUntil: 'domcontentloaded' })
console.log('reload')
selectorExists = await page.$('#ourButton')
}
await page.click('#ourButton')
// code goes on...
Its main benefit is that you are able to shorten and simplify your pageRefresher function. But I experienced also better performance (however I did no benchmarking but I felt it much faster than re-opening a page).
3.) If you don't need every resource type for your task you could also speed up page loading by disabling images or css with the following script:
await page.setRequestInterception(true)
page.on('request', (request) => {
if (request.resourceType() === 'image') request.abort()
else request.continue()
})
[source]
List of resourceType-s.
Try just not awaiting the goto:
page.goto(url) // no await because it doesn't have to resolve fully
await page.waitForSelector('#ourButton') // await this because we need it to be there
Some people like Promise.race for this but this way is simpler
Using the page.$eval() method you can do it as short as this:
await page.goto(url);
page.$eval('button-selector', button => button.click());
By doing so, you combine the actions of searching the desired button and clicking on it into a single line. You will have to await on the page.goto() instruction as you will need the page to be fully loaded before using page.$eval()
1st arg is the selector you need to use to get your HTMLElement in your case a button.
This HTMLElement will be retrieved by running document.querySelector() with the provided selector whitin page context before passing it as argument for the function defined in the following argument.
2nd arg is the function to be executed inside page context wich take the HTMLElement that match the previous selector as argument
The page.$eval() instruction will throw an error if no element is found that match the provided selector.
You can address this in two ways:
prevent the error from triggering at all by testing if your HTMLElement exists before using the page.$eval() method.
await page.goto(url);
if (await page.$('button-selector') != null) // await because page.$() returns a promise
page.$eval('button-selector', button => button.click());
an alternative using only page.$() would be :
await page.goto(url);
if ((button = await page.$('button-selector')) != null)
button.click();
Be sure to encapsulate the left part of the condition inside ( ) otherwise button value will be true or false.
catch the error when it occurs:
you could use this to determine when to reload the page
await page.goto(url);
page.$eval('button-selector', button => button.click())
.catch((err) => {
// log the error here or do some other stuff
});
After some tests it looks like we can't use a try ... catch block to capture the error on the page.$eval() method so the above example is the only way to do so.
For more informations you could check the puppeteer API page for page.$eval()
And if you want to go further in accelerating puppeteer I've found those tutorials really helpfull:
How to speed up Puppeteer scraping with parallelization
Optimizing and Deploying Puppeteer Web Scraper
8 Tips for Faster Puppeteer Screenshots
Edit:
From your code i see you use the page.setViewPort() method to set a viewport size of 1920x1080 px on your page. While it may provides a better viewing when showing the navigator it'll have some impact on performance. It is best practice to use minimal settings when running in headless mode.

Puppeteer and dynamically added iFrame (element)

We have an angularJs application that popup a modal form (component) on button pressed.
This component loads an iFrame, which I cannot seem to access with Puppeteer.
Have tried with mainFrame.
await page.waitFor(15000);
const frame = page.mainFrame().childFrames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
The above only has one frame (the main frame/window).
Have tried with frames
await page.waitFor(15000);
const frame = page.frames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
Have tried with contentFrame
await page.waitForSelector('iframe', { visible: true, timeout: 2000 });
const elementHandle = await page.$('iframe');
await page.waitFor(1000);
const frame = await elementHandle.contentFrame();
With the above, elementHandle has a value but frame is null
We have this working with Protractor, were hopping to move to Puppeteers but if there is no solution will have to stick with Protractor (which has it own other issues)
Currently, there is no support for out-of-process iframes (OOPIFs). To be able to work with them, you need to launch Chromium with --disable-features=site-per-process:
const browser = await puppeteer.launch({
args: ['--disable-features=site-per-process']
});
You can track puppeteer's issue/support here.
I have a similar problem, an iframe dynamically called, so that src=(unknown) with a JS
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(VARİABLES,,true,,false,))
is it possible to clone The or an iframe via invoking js calling it in puppeteer? if so you can try.

How to turn headless on after launch? [duplicate]

This question already has answers here:
Can the browser turned headless mid-execution when it was started normally, or vice-versa?
(2 answers)
Closed 5 months ago.
I'd like to load the page with headless off to let me login.
After login I want to hide it, turning on the headless and let it do what it has to do.
How can I turn on/off the headless after launch?
You cannot toggle headless on fly. But you can share the login using cookies and setCookie if you want.
We will create a simple class to keep the code clean (or that's what I believe for these type of work since they usually get big later). You can do this without all these complexity though. Also, Make sure the cookies are serialized. Do not pass array to toe setCookie function.
There will be three main functions.
1. init()
To create a page object. Mostly to make sure the headless and headful version has similar style of browsing, same user agent etc. Note, I did not include the code to set user agents, it's just there to show the concept.
async init(headless) {
const browser = await puppeteer.launch({
headless
});
const page = await browser.newPage();
// do more page stuff before loading, ie: user agent and so on
return {
page,
browser
};
}
2. getLoginCookies()
Example of showing how you can get cookies from the browser.
// will take care of our login using headful
async getLoginCookies() {
const {
page,
browser
} = await this.init(false)
// asume we load page and login here using some method
// and the website sets some cookie
await page.goto('http://httpbin.org/cookies/set/authenticated/true')
// store the cookie somewhere
this.cookies = await page.cookies() // the cookies are collected as array
// close the page and browser, we are done with this
await page.close();
await browser.close();
return true;
}
You won't need such function if you can provide cookies manually. You can use EditThisCookie or any cookie editing tool. You will get an array of all cookies for that site. Here is how you can do this,
3. useHeadless()
Example of showing how you can set cookies to a browser.
// continue with our normal headless stuff
async useHeadless() {
const {
page,
browser
} = await this.init(true)
// we set all cookies we got previously
await page.setCookie(...this.cookies) // three dots represents spread syntax. The cookies are contained in a array.
// verify the cookies are working properly
await page.goto('http://httpbin.org/cookies');
const content = await page.$eval('body', e => e.innerText)
console.log(content)
// do other stuff
// close the page and browser, we are done with this
// deduplicate this however you like
await page.close();
await browser.close();
return true;
}
4. Creating our own awesome puppeteer instance
// let's use this
(async () => {
const loginTester = new myAwesomePuppeteer()
await loginTester.getLoginCookies()
await loginTester.useHeadless()
})()
Full Code
Walk through the code to understand it better. It's all commented.
const puppeteer = require('puppeteer');
class myAwesomePuppeteer {
constructor() {
// keeps the cookies on the class scope
this.cookies;
}
// creates a browser instance and applies all kind of setup
async init(headless) {
const browser = await puppeteer.launch({
headless
});
const page = await browser.newPage();
// do more page stuff before loading, ie: user agent and so on
return {
page,
browser
};
}
// will take care of our login using headful
async getLoginCookies() {
const {
page,
browser
} = await this.init(false)
// asume we load page and login here using some method
// and the website sets some cookie
await page.goto('http://httpbin.org/cookies/set/authenticated/true')
// store the cookie somewhere
this.cookies = await page.cookies()
// close the page and browser, we are done with this
await page.close();
await browser.close();
return true;
}
// continue with our normal headless stuff
async useHeadless() {
const {
page,
browser
} = await this.init(true)
// we set all cookies we got previously
await page.setCookie(...this.cookies)
// verify the cookies are working properly
await page.goto('http://httpbin.org/cookies');
const content = await page.$eval('body', e => e.innerText)
console.log(content)
// do other stuff
// close the page and browser, we are done with this
// deduplicate this however you like
await page.close();
await browser.close();
return true;
}
}
// let's use this
(async () => {
const loginTester = new myAwesomePuppeteer()
await loginTester.getLoginCookies()
await loginTester.useHeadless()
})()
Here is the result,
➜ node app.js
{
"cookies": {
"authenticated": "true"
}
}
So in short,
You can use the cookies function to get cookies.
You can use extensions like Edit This Cookie to get cookies from your normal browser.
You can use setCookie to set any kind of cookie you get from browser.

Categories

Resources