How to turn headless on after launch? [duplicate] - javascript

This question already has answers here:
Can the browser turned headless mid-execution when it was started normally, or vice-versa?
(2 answers)
Closed 5 months ago.
I'd like to load the page with headless off so that I can log in.
After logging in I want to hide the browser by turning headless on and let it do what it has to do.
How can I turn headless on or off after launch?

You cannot toggle headless on the fly. But you can share the login between a headful and a headless browser using cookies() and setCookie() if you want.
We will create a simple class to keep the code clean (or that's what I believe for this type of work, since these scripts usually get big later). You can do this without all this complexity though. Also, make sure the cookies are serialized, and do not pass an array to the setCookie function; spread it into individual arguments instead.
There will be three main functions.
1. init()
Creates a page object, mostly to make sure the headless and headful versions browse in a similar way (same user agent, and so on). Note that I did not include the code to set the user agent; the function is only there to show the concept.
async init(headless) {
const browser = await puppeteer.launch({
headless
});
const page = await browser.newPage();
// do more page stuff before loading, ie: user agent and so on
return {
page,
browser
};
}
2. getLoginCookies()
An example of how you can get cookies from the browser.
// will take care of our login using headful
async getLoginCookies() {
const {
page,
browser
} = await this.init(false)
// assume we load the page and log in here using some method
// and the website sets some cookie
await page.goto('http://httpbin.org/cookies/set/authenticated/true')
// store the cookie somewhere
this.cookies = await page.cookies() // the cookies are collected as array
// close the page and browser, we are done with this
await page.close();
await browser.close();
return true;
}
You won't need this function if you can provide the cookies manually. You can use EditThisCookie or any other cookie editing tool to export an array of all cookies for that site. Here is how you can use them:
3. useHeadless()
An example of how you can set cookies on a browser.
// continue with our normal headless stuff
async useHeadless() {
const {
page,
browser
} = await this.init(true)
// we set all cookies we got previously
await page.setCookie(...this.cookies) // the three dots are spread syntax; the cookies are stored in an array
// verify the cookies are working properly
await page.goto('http://httpbin.org/cookies');
const content = await page.$eval('body', e => e.innerText)
console.log(content)
// do other stuff
// close the page and browser, we are done with this
// deduplicate this however you like
await page.close();
await browser.close();
return true;
}
4. Creating our own awesome puppeteer instance
// let's use this
(async () => {
const loginTester = new myAwesomePuppeteer()
await loginTester.getLoginCookies()
await loginTester.useHeadless()
})()
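The init() step above mentions keeping the headful and headless sessions alike (same user agent and so on) but leaves that code out. A minimal sketch of such a shared setup might look like this. The name preparePage and its option names are hypothetical; setUserAgent and setViewport are the Puppeteer page methods I am assuming you would call.

```javascript
// Hypothetical helper: open a page and apply the same settings in both modes.
// `browser` is expected to be a Puppeteer Browser (or anything with newPage()).
async function preparePage(browser, { userAgent, viewport } = {}) {
  const page = await browser.newPage();
  if (userAgent) {
    // same user agent string for the headful and the headless run
    await page.setUserAgent(userAgent);
  }
  if (viewport) {
    // e.g. { width: 1280, height: 800 }
    await page.setViewport(viewport);
  }
  return page;
}
```

Calling preparePage(browser, settings) with the same settings object from both getLoginCookies() and useHeadless() would keep the two sessions consistent.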
Full Code
Walk through the code to understand it better. It's all commented.
const puppeteer = require('puppeteer');
class myAwesomePuppeteer {
constructor() {
// keeps the cookies on the class scope
this.cookies = [];
}
// creates a browser instance and applies all kind of setup
async init(headless) {
const browser = await puppeteer.launch({
headless
});
const page = await browser.newPage();
// do more page stuff before loading, ie: user agent and so on
return {
page,
browser
};
}
// will take care of our login using headful
async getLoginCookies() {
const {
page,
browser
} = await this.init(false)
// assume we load the page and log in here using some method
// and the website sets some cookie
await page.goto('http://httpbin.org/cookies/set/authenticated/true')
// store the cookie somewhere
this.cookies = await page.cookies()
// close the page and browser, we are done with this
await page.close();
await browser.close();
return true;
}
// continue with our normal headless stuff
async useHeadless() {
const {
page,
browser
} = await this.init(true)
// we set all cookies we got previously
await page.setCookie(...this.cookies)
// verify the cookies are working properly
await page.goto('http://httpbin.org/cookies');
const content = await page.$eval('body', e => e.innerText)
console.log(content)
// do other stuff
// close the page and browser, we are done with this
// deduplicate this however you like
await page.close();
await browser.close();
return true;
}
}
// let's use this
(async () => {
const loginTester = new myAwesomePuppeteer()
await loginTester.getLoginCookies()
await loginTester.useHeadless()
})()
Here is the result,
➜ node app.js
{
"cookies": {
"authenticated": "true"
}
}
So in short,
You can use the cookies function to get cookies.
You can use extensions like Edit This Cookie to get cookies from your normal browser.
You can use setCookie to set any kind of cookie you get from browser.
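If the headful login and the headless work run as separate scripts, the cookies have to survive on disk in between. The following is only a sketch under that assumption: saveCookies and loadCookies are hypothetical helper names, the file path is up to you, and page.cookies() / page.setCookie() are the same calls used in the class above.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: write the page's cookies to a JSON file.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies(); // array of plain cookie objects
  await fs.writeFile(filePath, JSON.stringify(cookies, null, 2), 'utf8');
  return cookies.length;
}

// Hypothetical helper: read the cookies back and apply them to a page.
async function loadCookies(page, filePath) {
  const cookies = JSON.parse(await fs.readFile(filePath, 'utf8'));
  if (!Array.isArray(cookies)) {
    throw new TypeError('cookie file must contain a JSON array');
  }
  if (cookies.length > 0) {
    // spread the array: setCookie takes the cookies as separate arguments
    await page.setCookie(...cookies);
  }
  return cookies.length;
}
```

Call saveCookies(page, 'cookies.json') at the end of the headful login, and loadCookies(page, 'cookies.json') before the first page.goto() of the headless run.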

Related

Close the page after certain interval [Puppeteer]

I have used puppeteer for one of my projects to open webpages in headless chrome, do some actions and then close the page. These actions, however, are user dependent. I want to attach a lifetime to the page, where it closes automatically after, say 30 minutes, of opening irrespective of whether any action is performed or not.
I have tried setTimeout() functionality of Node JS but it didn't work (or I just couldn't figure how to make it work).
I have tried the following:
const puppeteer = require('puppeteer-core');
const browser = await puppeteer.connect({browserURL: browser_url});
const page = await browser.newPage();
// timer starts ticking here upon creation of new page (maybe in a subroutine and not block the main thread)
/**
..
Do something
..
*/
// timer ends and closePage() is triggered.
const closePage = (page) => {
if (!page.isClosed()) {
page.close();
}
}
But this gives me the following error:
Error: Protocol error: Connection closed. Most likely the page has been closed.
Your provided code should work as expected. Are you sure the page is still open after the timeout and that it is indeed the same page?
You can try this wrapper for opening pages and closing them correctly.
// since it is async it won't block the eventloop.
// using `await` will allow other functions to execute.
async function openNewPage(browser, timeoutMs) {
const page = await browser.newPage()
setTimeout(async () => {
// you want to use try/catch for omitting unhandled promise rejections.
try {
if(!page.isClosed()) {
await page.close()
}
} catch(err) {
console.error('unexpected error occurred when closing page.', err)
}
}, timeoutMs)
// return the page so the caller can actually use it
return page
}
// use it like so.
const browser = await puppeteer.connect({browserURL: browser_url});
const min30Ms = 30 * 60 * 1000
const page = await openNewPage(browser, min30Ms);
// ...
The above only closes the tab in your browser. To close the whole browser instance you would have to call browser.close(), which may be what you want.
page.close() returns a promise, so you need to define closePage as an async function and use await page.close(). I believe @silvan's answer should address the issue; just make sure to replace the if condition
if(page.isClosed())
with
if(!page.isClosed())
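Putting the two answers together, a variant that also cancels the timer when the page is closed early could look like the sketch below. openPageWithLifetime is a hypothetical name, and I am assuming the page emits a 'close' event that can be subscribed to with once(), as Puppeteer pages do.

```javascript
// Hypothetical helper: open a page that closes itself after `lifetimeMs`.
async function openPageWithLifetime(browser, lifetimeMs) {
  const page = await browser.newPage();

  const timer = setTimeout(async () => {
    try {
      if (!page.isClosed()) {
        await page.close();
      }
    } catch (err) {
      // the connection may already be gone; log instead of crashing
      console.error('could not close page after timeout:', err);
    }
  }, lifetimeMs);

  // If the page is closed earlier (by the user or by the script),
  // drop the pending timer so it does not fire on a dead page.
  page.once('close', () => clearTimeout(timer));

  return page;
}
```

Usage would be const page = await openPageWithLifetime(browser, 30 * 60 * 1000), after which the page can be used normally until the 30 minutes are up.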

How to handle popover windows with Puppeteer

I'm writing a script to purchase items on Amazon.
const puppeteer = require('puppeteer');
// Insert personal credentials
const email = '';
const password = '';
function press_enter(page) {
return Promise.all([
page.waitForNavigation({waitUntil:'networkidle2'}),
page.keyboard.press(String.fromCharCode(13))
]);
}
function click_wait(page, selector) {
return Promise.all([
page.waitForNavigation({waitUntil:'networkidle2'}),
page.click(selector)
]);
}
(async () => {
const browser = await puppeteer.launch({headless:false, defaultViewport:null, args: ['--start-maximized']});
const page = (await browser.pages())[0];
await page.goto('https://www.amazon.it/');
await click_wait(page, "a[data-nav-role='signin']");
await page.keyboard.type(email);
await press_enter(page);
await page.keyboard.type(password);
await press_enter(page);
// Search for the "signout" button as login proof
if(await page.$('#nav-item-signout') !== null) console.log('Login done!');
else return console.log('Something went wrong during login');
// Navigate to the product page
await page.goto('https://www.amazon.it/dp/B07RL2VWXQ');
// Click "buy now" (choose either Option A or Option B)
// Option A: Here the code get stuck since the page isn't refreshing and page.waitForNavigation() will reach its timeout
// await click_wait(page, "#buy-now-button");
// Option B: Waiting time manually set to 5 seconds (it should be more than enough for popover to fully load)
await Promise.all([page.waitForTimeout(5000), page.click('#buy-now-button')]);
// Conclude the purchase
await click_wait(page, '#turbo-checkout-pyo-button');
})();
So far I can login to Amazon, navigate to a product page and click the Buy Now button.
Then, if the delivery address and payment option are all set up, a popover box may appear (depending on the Amazon domain) to conclude the purchase.
I wasn't able to replicate the popover response on .com and .co.uk; it seems that these domains redirect you to a totally new page.
When I explore the page with Chrome Developer Tools I actually see the new chunk of the page being loaded (<!DOCTYPE html>), but I don't know where the representation of this element resides in Puppeteer.
If I use click_wait() to click Buy Now, the script gets stuck (it only returns after the default timeout of page.waitForNavigation()), so it's not considered a page refresh. But even if I just wait a few seconds after clicking Buy Now and then attempt to click input[id='turbo-checkout-pyo-button'] (the orange "Ordina" button), Puppeteer throws an error because it can't find the element, despite it being clearly loaded.
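A separate <!DOCTYPE html> chunk showing up in DevTools suggests, though nothing in the question confirms it, that the popover lives in an iframe. In that case page.click() on the main document would not find the button. As a sketch under that assumption, one could poll every frame for the selector; waitForSelectorInAnyFrame is a hypothetical helper, built on page.frames() and frame.$().

```javascript
// Hypothetical helper: poll all frames of the page until `selector` appears.
// Resolves with the frame and the element handle, or rejects after `timeoutMs`.
async function waitForSelectorInAnyFrame(page, selector, timeoutMs = 10000, pollMs = 250) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // page.frames() includes the main frame and every attached iframe
    for (const frame of page.frames()) {
      try {
        const handle = await frame.$(selector);
        if (handle) {
          return { frame, handle };
        }
      } catch (err) {
        // the frame may have detached or navigated mid-query; retry on the next pass
      }
    }
    if (Date.now() >= deadline) {
      break;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`"${selector}" not found in any frame within ${timeoutMs} ms`);
}
```

After clicking Buy Now, something like const { handle } = await waitForSelectorInAnyFrame(page, '#turbo-checkout-pyo-button') followed by await handle.click() would then replace the fixed 5 second wait.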

screen shot and data trying to be taken before site fully loads using puppeteer

Hi, I am trying to take a screenshot of a website using Puppeteer, but the site loads quite slowly, so I am never able to grab any data or take a screenshot in time. I would like to delay the screenshot until the site has finished loading. I have tried a bunch of methods and can't figure it out. Thanks in advance for any help.
This is my Code
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
async function scrapeProduct(url) {
//launching puppeteer
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: "load" });
await page.waitFor("*");
function time() {
var d = new Date();
var n = d.getSeconds();
return console.log(n);
}
time();
await page.screenshot({ path: "testresult.png" });
time();
await browser.close();
}
scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been deprecated recently, so you are better off trying the more specific wait methods.
I can't inspect the webpage you are taking a screenshot of, so I cannot tell what might be happening after the load event.
However, have you tried the other wait methods Puppeteer offers?
waitForNavigation and waitForSelector mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of, then you can have it signal readiness (for example by dispatching a DOM event or setting a flag) and wait for that signal from your Puppeteer code, for instance with waitForFunction.
If all else fails and time is not important, then you can sleep for long enough to guarantee the page is loaded.
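As a rough sketch of those suggestions combined: wait for the network to go quiet, then for an element that only exists once the results are rendered, then optionally sleep a little before the screenshot. screenshotWhenReady and its option names are hypothetical, and the selector has to be one you pick from the target page yourself.

```javascript
// small promise-based sleep, used only as a last-resort settle delay
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: navigate, wait until `selector` is visible, then screenshot.
async function screenshotWhenReady(page, url, options) {
  const { selector, path, timeoutMs = 30000, settleMs = 0 } = options;

  // networkidle2: consider navigation done when at most 2 connections stay open
  await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });

  // wait for an element that only appears once the real content has rendered
  await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });

  if (settleMs > 0) {
    await sleep(settleMs);
  }

  await page.screenshot({ path, fullPage: true });
}
```

In scrapeProduct this would replace the goto, waitFor and screenshot calls, with selector set to whatever marks a loaded product list on that site.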

Puppeteer wait for new page after form submit

I'm trying to use puppeteer to load a page, submit a form (which takes me to a different URL) and then ideally run something once this new page had loaded. I'm using Node JS, and am generalising my logic into separate files, one of which is search.js as per the below:
const puppeteer = require('puppeteer')
const createSearch = async (param1) => {
puppeteer.launch({
headless: false,
}).then(async browser => {
const page = await browser.newPage(term, location)
await page.goto('https://example.com/')
await page.waitForSelector('body')
await page.evaluate(() => {
const searchForm = document.querySelector('form.searchBar--form')
searchForm.submit() // this takes me to a new page which I need to wait for and then ideally return something.
// I've tried adding code here, but it doesn't run...
}, term, location)
})
}
exports.createSearch = createSearch
I'm then calling my function from my app's entry point...
(async () => {
// current
search.createSearch('test')
// proposed
search.createSearch('test').then(() => {
// trigger puppeteer to look at the new page and start running asserts.
})
})()
Unfortunately, due to the form submitting, I'm unsure how I can wait for the new page to load and run a new function? The new URL will be unknown, and different each time, e.g: https://example.com/page20
After the form is submitted, you need to wait until the new page loads. Add the following after the await page.evaluate() call:
await page.waitForNavigation();
After that you can perform whatever action you want.
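One caveat: if the navigation completes before waitForNavigation() is called, the wait can hang until it times out. A common way around this is to start waiting before triggering the submit and await both together. The sketch below assumes that pattern; submitFormAndWait is a hypothetical helper name.

```javascript
// Hypothetical helper: submit a form and resolve with the URL of the page it leads to.
async function submitFormAndWait(page, formSelector) {
  await Promise.all([
    // start listening for the navigation first...
    page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
    // ...then trigger it by submitting the form inside the page
    page.$eval(formSelector, (form) => form.submit()),
  ]);
  // the new URL is unknown in advance, so read it from the page afterwards
  return page.url();
}
```

If createSearch awaits this helper and returns the page, the caller's .then() can start running asserts against the new page.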

How do I use jQuery with pages on puppeteer?

I am trying to use jQuery on the pages I load with puppeteer and I wanted to know how I can do the same? My code structure is like:
const puppeteer = require('puppeteer');
let browser = null;
async function getSelectors() {
try{
browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
const page = await browser.newPage();
await page.setViewport({width: 1024, height: 1080});
await page.goto('https://www.google.com/');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
var button = $('h1').text();
console.log(button);
} catch (e) {
console.log(e);
}
}
getSelectors();
Also I will be navigating to many pages within puppeteer so is there a way I can just add jQuery once and then use it throughout? A local jquery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from "How to scrape that web page with Node.js and puppeteer", which helped me understand that the Puppeteer page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer. To get the following code to work, you should know I'm running this test in Jest. Also, you need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery Datatables project so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is quite simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
const puppeteer = require('puppeteer');
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto('http://localhost/jest/table.html');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
const result = await page.evaluate(() => {
try {
var table = $("table").attr("id");
return table;
} catch (e) {
return e.message;
}
});
console.log("result", result);
await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
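On the question of adding jQuery only once: an injected script tag belongs to the current document, so it is gone after every navigation and has to be re-added. A small wrapper can hide that. This is a sketch with hypothetical helper names (ensureJQuery, gotoWithJQuery); the path option of addScriptTag loads a local file instead of a CDN URL.

```javascript
// Hypothetical helper: inject jQuery from a local file unless the page already has it.
// Returns true if the script was injected, false if jQuery was already present.
async function ensureJQuery(page, jqueryPath) {
  const alreadyLoaded = await page.evaluate(() => typeof window.jQuery === 'function');
  if (!alreadyLoaded) {
    await page.addScriptTag({ path: jqueryPath });
  }
  return !alreadyLoaded;
}

// Hypothetical helper: navigate and make sure jQuery is available afterwards.
async function gotoWithJQuery(page, url, jqueryPath) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  await ensureJQuery(page, jqueryPath);
}
```

Assuming the jquery npm package is installed, jqueryPath could be require.resolve('jquery/dist/jquery.min.js'). After await gotoWithJQuery(page, url, jqueryPath), jQuery calls go inside page.evaluate, for example await page.evaluate(() => jQuery('h1').text()).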
