I am trying to login into my gmail with puppeteer to lower the risk of recaptcha
here is my code
await page.goto('https://accounts.google.com/AccountChooser?service=mail&continue=https://mail.google.com/mail/', {timeout: 60000})
.catch(function (error) {
throw new Error('TimeoutBrows');
});
await page.waitForSelector('#identifierId' , { visible: true });
await page.type('#identifierId' , 'myemail');
await Promise.all([
page.click('#identifierNext') ,
page.waitForSelector('.whsOnd' , { visible: true })
])
await page.type('#password .whsOnd' , "mypassword");
await page.click('#passwordNext');
await page.waitFor(5000);
but i always end up with this message
I even tried to just open the login window with puppeteer and fill the login form manually myself, but even that failed.
Am I missing something ?
When I look into console there is a failed ajax call just after login.
Request URL: https://accounts.google.com/_/signin/challenge?hl=en&TL=APDPHBCG5lPol53JDSKUY2mO1RzSwOE3ZgC39xH0VCaq_WHrJXHS6LHyTJklSkxd&_reqid=464883&rt=j
Request Method: POST
Status Code: 401
Remote Address: 216.58.213.13:443
Referrer Policy: no-referrer-when-downgrade
)]}'
[[["er",null,null,null,null,401,null,null,null,16]
,["e",2,null,null,81]
]]
I've inspected your code and it seems to be correct despite of some selectors. Also, I had to add a couple of timeouts in order to make it work. However, I failed to reproduce your issue so I'll just post the code that worked for me.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('https://accounts.google.com/AccountChooser?service=mail&continue=https://mail.google.com/mail/', {timeout: 60000})
.catch(function (error) {
throw new Error('TimeoutBrows');
});
await page.screenshot({path: './1.png'});
...
})();
Please, note that I run browser in normal, not headless mode. If you take a look at screenshot at this position, you will see that it is correct Google login form
The rest of the code is responsible for entering password
const puppeteer = require('puppeteer');
(async () => {
...
await page.waitForSelector('#identifierId', {visible: true});
await page.type('#identifierId', 'my#email');
await Promise.all([
page.click('#identifierNext'),
page.waitForSelector('.whsOnd', {visible: true})
]);
await page.waitForSelector('input[name=password]', {visible: true});
await page.type('input[name=password]', "my.password");
await page.waitForSelector('#passwordNext', {visible: true});
await page.waitFor(1000);
await page.click('#passwordNext');
await page.waitFor(5000);
})();
Please also note few differences from your code - the selector for password field is different. I had to add await page.waitForSelector('#passwordNext', {visible: true}); and a small timeout after that so the button could be clicked successfully.
I've tested all the code above and it worked successfully. Please, let me know if you still need some help or facing troubles with my example.
The purpose of question is to login to Gmail. I will share another method that does not involve filling email and password fields on puppeteer script
and works in headless: true mode.
Method
Login to your gmail using normal browser (google chrome preferebbly)
Export all cookies for the gmail tab
Use page.setCookie to import the cookies to your puppeteer instance
Login to gmail
This should be no brainer.
Export all cookies
I will use an extension called Edit This Cookie, however you can use other extensions or manual methods to extract the cookies.
Click the browser icon and then click the Export button.
Import cookies to puppeteer instance
We will save the cookies in a cookies.json file and then import using page.setCookie function before navigation. That way when gmail page loads, it will have login information right away.
The code might look like this.
const puppeteer = require("puppeteer");
const cookies = require('./cookies.json');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Set cookies here, right after creating the instance
await page.setCookie(...cookies);
// do the navigation,
await page.goto("https://mail.google.com/mail/u/0/#search/stackoverflow+survey", {
waitUntil: "networkidle2",
timeout: 60000
});
await page.screenshot({ path: "example.png" });
await browser.close();
})();
Result:
Notes:
It was not asked, but I should mention following for future readers.
Cookie Expiration: Cookies might be short lived, and expire shortly after, behave differently on a different device. Logging out on your original device will log you out from the puppeteer as well since it's sharing the cookies.
Two Factor: I am not yet sure about 2FA authentication. It did not ask me about 2FA probably because I logged in from same device.
Related
I want to start a chromium browser instant headless, do some automated operations, and then turn it visible before doing the rest of the stuff.
Is this possible to do using Puppeteer, and if it is, can you tell me how? And if it is not, is there any other framework or library for browser automation that can do this?
So far I've tried the following but it didn't work.
const browser = await puppeteer.launch({'headless': false});
browser.headless = true;
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});
Short answer: It's not possible
Chrome only allows to either start the browser in headless or non-headless mode. You have to specify it when you launch the browser and it is not possible to switch during runtime.
What is possible, is to launch a second browser and reuse cookies (and any other data) from the first browser.
Long answer
You would assume that you could just reuse the data directory when calling puppeteer.launch, but this is currently not possible due to multiple bugs (#1268, #1270 in the puppeteer repo).
So the best approach is to save any cookies or local storage data that you need to share between the browser instances and restore the data when you launch the browser. You then visit the website a second time. Be aware that any state the website has in terms of JavaScript variable, will be lost when you recrawl the page.
Process
Summing up, the whole process should look like this (or vice versa for headless to headfull):
Crawl in non-headless mode until you want to switch mode
Serialize cookies
Launch or reuse second browser (in headless mode)
Restore cookies
Revisit page
Continue crawling
As mentioned, this isn't currently possible since the headless switch occurs via Chromium launch flags.
I usually do this with userDataDir, which the Chromium docs describe as follows:
The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.
Here's a simple example. This launches a browser headlessly, sets a local storage value on an arbitrary page, closes the browser, re-opens it headfully, retrieves the local storage value and prints it.
const puppeteer = require("puppeteer"); // ^18.0.4
const url = "https://www.example.com";
const opts = {userDataDir: "./data"};
let browser;
(async () => {
{
browser = await puppeteer.launch({...opts, headless: true});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.evaluate(() => localStorage.setItem("hello", "world"));
await browser.close();
}
{
browser = await puppeteer.launch({...opts, headless: false});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
const result = await page.evaluate(() => localStorage.getItem("hello"));
console.log(result); // => world
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Change const opts = {userDataDir: "./data"}; to const opts = {}; and you'll see null print instead of world; the user data doesn't persist.
The answer from a few years ago mentions issues with userDataDir and suggests a cookies solution. That's fine, but I haven't had any issues with userDataDir so either they've been resolved on the Puppeteer end or my use cases haven't triggered the issues.
There's a useful-looking answer from a reputable source in How to turn headless on after launch? but I haven't had a chance to try it yet.
I'm attempting to use Puppeteer to navigate to a URL and extract the metrics from the Network tab in the Chrome developer tools. For example, navigating to this page shows the following Network info, and captures a total of 47 requests.
However, I'm trying to get these metrics using the following code:
import { Browser, Page } from "puppeteer";
const puppeteer = require('puppeteer');
async function run() {
const browser: Browser = await puppeteer.launch({
headless: false,
defaultViewport: null
});
const page: Page = await browser.newPage();
await page.goto("https://stackoverflow.com/questions/30455964/what-does-window-performance-getentries-mean");
let performanceTiming = JSON.parse(
await page.evaluate(() => JSON.stringify(window.performance.getEntries()))
);
console.log(performanceTiming);
}
run();
However, when I peer into the performanceTiming object, it only has 34 items in it:
Therefore, my questions are please:
Why is there a difference in the number of requests the Network tab vs performance.getEntries() are showing?
Is it possible to get performance.getEntries() to show all of the requests instead of only a portion of them?
Is it possible for Puppeteer to get all of the data from the Network tab?
You can use the page.tracing feature.
const browser = await puppeteer.launch({ headless: true});
const page = await browser.newPage();
await page.tracing.start({ categories: ['devtools.timeline'], path: "./tracing.json" });
await page.goto("https://stackoverflow.com/questions/30455964/what-does-window-performance-getentries-mean");
var tracing = JSON.parse(await page.tracing.stop());
The devtools.timeline category has many things to explore.
You can get all the request filtering by ResourceSendRequest
tracing.traceEvents.filter(te => te.name ==="ResourceSendRequest")
And all the response filtering by ResourceReceiveResponse
tracing.traceEvents.filter(te => te.name ==="ResourceReceiveResponse")
I am trying to run puppeteer in bamboo build run. But seems there is problem to execute it properly. The detail error below
I wonder if there is stuff I have to install to get it able to run in bamboo? or I have to do other alternative. There is no articles available online regarding this issue.
And a bit more background, I am trying to implement jest-image-snapshot into my test process. and making a call to generate snapshot like this
const puppeteer = require('puppeteer');
let browser;
beforeAll(async () => {
browser = await puppeteer.launch();
});
it('show correct page: variant', async () => {
const page = await browser.newPage();
await page.goto(
'http://localhost:8080/app/register?experimentName=2018_12_STREAMLINED_ACCOUNT&experimentVariation=STREAMLINED#/'
);
const image = await page.screenshot();
expect(image).toMatchImageSnapshot();
});
afterAll(async () => {
await browser.close();
});
the reason log of TypeError: Cannot read property 'newPage' of undefined is because const page = await browser.newPage();
The important part is in your screenshot:
Failed to launch chrome! ... No usable sandbox!
Try to launch puppeteer without a sandbox like this:
await puppeteer.launch({
args: ['--no-sandbox']
});
Depending on the platform, you might also want to try the following arguments (also in addition):
--disable-setuid-sandbox
--disable-dev-shm-usage
If all three do not work, the Troubleshooting guide might have additional information.
I'm trying use puppeteer to automate the login process for our agents in Amazon Connect however I can't get puppeteer to finish loading the CCP login page. See code below:
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = 'https://ccalderon-reinvent.awsapps.com/connect/ccp#/';
await page.goto(url, {waitUntil: 'domcontentloaded'});
console.log(await page.content());
// console.log('waiting for username input');
// await page.waitForSelector('#wdc_username');
await browser.close();
I can never see the content of the page, it times out. Am I doing something wrong? If I launch the browser with { headless: false } I can see the page never finishes loading.
Please note the same code works fine with https://www.github.com/login so it must be something specific to the source code of Connect's CCP.
In case you are from future and having problem with puppeteer for no reason, try to downgrade the puppeteer version first and see if the issue persists.
This seems like a bug with Chromium Development Version 73.0.3679.0, The error log said it could not load specific script somehow, but we could still load the script manually.
The Solution:
Using Puppeteer version 1.11.0 solved this issue. But if you want to use puppeteer version 1.12.2 but with a different chromium revision, you can use the executablePath argument.
Here are the respective versions used on puppeteer (at this point of answer),
Chromium 73.0.3679.0 - Puppeteer v1.12.2
Chromium 72.0.3582.0 - Puppeteer v1.11.0
Chromium 71.0.3563.0 - Puppeteer v1.9.0
Chromium 70.0.3508.0 - Puppeteer v1.7.0
Chromium 69.0.3494.0 - Puppeteer v1.6.2
I checked my locally installed chrome,which was loading the page correctly,
$(which google-chrome) --version
Google Chrome 72.0.3626.119
Note: The puppeteer team suggested on their doc to specifically use the chrome provided with the code (most likely the latest developer version) instead of using different revisions.
Also I edited the code a little bit to finish loading when all network requests is done and the username input is visible.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
executablePath: "/usr/bin/google-chrome"
});
const page = await browser.newPage();
const url = "https://ccalderon-reinvent.awsapps.com/connect/ccp#/";
await page.goto(url, { waitUntil: "networkidle0" });
console.log("waiting for username input");
await page.waitForSelector("#wdc_username", { visible: true });
await page.screenshot({ path: "example.png" });
await browser.close();
})();
The specific revision number can be obtained in many ways, one is to check the package.json of puppeteer package. The url for 1.11.0 is,
https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/package.json
If you like to automate the chrome revision downloading, you can use browserFetcher to fetch specific revision.
const browserFetcher = puppeteer.createBrowserFetcher();
const revisionInfo = await browserFetcher.download('609904'); // chrome 72 is 609904
const browser = await puppeteer.launch({executablePath: revisionInfo.executablePath})
Result:
So I am trying to use puppeteer to automate some data entry functions in Oracle Cloud applications.
As of now I am able to launch the cloud app login page, enter username and password credentials and click login button. Once login is successful, Oracle opens a homepage for the user. Once this happens if I take screenshot or execute a page.content the screenshot and the content html is from the login page not of the homepage.
How do I always have a reference to the current page that the user is on?
Here is the basic code so far.
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
const browser = await puppeteer.launch({headless: false});
let page = await browser.newPage();
await page.goto('oraclecloudloginurl', {waitUntil: 'networkidle2'});
await page.type('#userid', 'USERNAME', {delay : 10});
await page.type('#password', 'PASSWORD', {delay : 10});
await page.waitForSelector('#btnActive', {enabled : true});
page.click('#btnActive', {delay : 1000}).then(() => console.log('Login Button Clicked'));
await page.waitForNavigation();
await page.screenshot({path: 'home.png'});
const html = await page.content();
await fs.writeFileSync('home.html', html);
await page.waitFor(10000);
await browser.close();
})();
With this the user logs in fine and the home page is displayed. But I get an error after that when I try to screenshot the homepage and render the html content. It seems to be the page has changed and I am referring to the old page. How can I refer to the context of the current page?
Below is the error:
(node:14393) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id undefined
This code looks problematic for two reasons:
page.click('#btnActive', {delay : 1000}).then(() => console.log('Login Button Clicked'));
await page.waitForNavigation();
The first problem is that the page.click().then() spins off a totally separate promise chain:
page.click() --> .then(...)
|
v
page.waitForNavigation()
|
v
page.screenshot(...)
|
v
...
This means the click that triggers the navigation and the navigation are running in parallel and can never be rejoined into the same promise chain. The usual solution here is to tie them into the same promise chain:
// Note: this code is still broken; keep reading!
await page.click('#btnActive', {delay : 1000});
console.log('Login Button Clicked');
await page.waitForNavigation();
This adheres to the principle of not mixing then and await unless you have good reason to.
But the above code is still broken because Puppeteer requires the waitForNavigation() promise to be set before the event that triggers navigation is fired. The fix is:
await Promise.all([
page.waitForNavigation(),
page.click('#btnActive', {delay : 1000}),
]);
or
const navPromise = page.waitForNavigation(); // no await
await page.click('#btnActive', {delay : 1000});
await navPromise;
Following this pattern, Puppeteer should no longer be confused about its context.
Minor notes:
'networkidle2' is slow and probably unnecessary, especially for a page you're soon going to be navigating away from. I'd default to 'domcontentloaded'.
await page.waitFor(10000); is deprecated along with page.waitForTimeout(), although I realize this is an older post.