Puppeteer does not evaluate web-app on heroku - javascript

I'm using puppeteer on my telegram bot to fetch stuff from the web.
Locally everything works fine, but on Heroku there is a catch:
When I try to fetch the URL's HTML, the response is not the rendered HTML but the bare web app with all its JS files. On my local machine I get the rendered HTML with all the links I need.
That's my code:
return await puppeteer.launch({
args: ['--no-sandbox', '--disable-setuid-sandbox'],
headless: true,
}).then(async browser => {
const page = await browser.newPage()
await page.setJavaScriptEnabled(true)
await page.goto(targetUrl)
await page.waitForTimeout(2000)
const data = await page.$$eval(SELECTORS, (res) => {
return res.map(r => {
return r.getAttribute('ATRR')
})
}) as never as string[]
.......
})
PS: I have already added this buildpack to Heroku:
https://github.com/jontewks/puppeteer-heroku-buildpack

Apparently, when you use puppeteer-extra, you need to install the puppeteer library too.
For some reason, although one depends on the other, puppeteer is not installed automatically, so the Chromium engine was never installed on Heroku. That also explains why it worked on my local machine, which already had it.
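A quick way to catch this before deploying is to check that package.json lists both packages. The following is only a minimal sketch: the helper name missingPuppeteerDeps is hypothetical, and it inspects only the dependencies field, on the assumption that production deploys may skip or prune devDependencies.

```javascript
// Hypothetical helper: report which of the required packages are missing
// from the "dependencies" section of a parsed package.json object.
function missingPuppeteerDeps(pkg, required = ['puppeteer', 'puppeteer-extra']) {
  const deps = (pkg && pkg.dependencies) || {};
  return required.filter(
    (name) => !Object.prototype.hasOwnProperty.call(deps, name)
  );
}

// Example: puppeteer-extra is listed, but puppeteer itself is not.
const examplePkg = { dependencies: { 'puppeteer-extra': '*' } };
console.log(missingPuppeteerDeps(examplePkg)); // [ 'puppeteer' ]
```

In a real project you would pass in the result of require('./package.json') instead of the example object.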

Related

External resources in Puppeteer with Chrome executable fail to load (net::ERR_EMPTY_RESPONSE)

I'm having issues using external resources in a Puppeteer job that I'm running with a full Chrome executable (not the default Chromium). Any help would be massively appreciated!
So for example, if I load a video with a public URL it fails even though it works fine if I hit it manually in the browser.
const videoElement = document.createElement('video');
videoElement.src = src;
videoElement.onloadedmetadata = function() {
console.log(videoElement.duration);
};
Here's my Puppeteer call:
(async () => {
const browser = await puppeteer.launch({
args: [
'--remote-debugging-port=9222',
'--autoplay-policy=no-user-gesture-required',
'--allow-insecure-localhost',
'--proxy-server=http://localhost:9000',
'--proxy-bypass-list=""',
'--no-sandbox',
'--disable-setuid-sandbox',
],
executablePath:
'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
});
const page = await browser.newPage();
logConsole(page);
await page.goto(`http://${hostname}/${path}`, {
waitUntil: 'networkidle2',
});
await page.waitForSelector('#job-complete');
console.log('Job complete!');
await browser.close();
})();
Unlike many Puppeteer examples, the issue here isn't that my test doesn't wait long enough. The resources fail to load / return empty responses almost instantly.
It also doesn't appear to be an authentication issue - I reach my own server just fine.
Although I'm not running on https here, the URL I try directly in the browser works without SSL.
I should also mention that this is a React (CRA) website and I'm calling Puppeteer with Node.
I can see that at least 3 other external resources (non-video) also fail. Is there a flag or something I should be using that I'm missing? Thanks so much for any help!
In my case I had to use puppeteer-extra and puppeteer-extra-plugin-stealth:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
I also found the following flags useful:
const browser = await puppeteer.launch({
args: [
'--disable-web-security',
'--autoplay-policy=no-user-gesture-required',
'--no-sandbox',
'--disable-setuid-sandbox',
'--remote-debugging-port=9222',
'--allow-insecure-localhost',
],
executablePath:
'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
});
Finally, I found it necessary in a few cases to bypass CSP:
await page.setBypassCSP(true);
Please be careful using these rather insecure settings 😬
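One way to keep these settings from leaking into other environments is to make them opt-in. This is a rough sketch, not a recommended configuration: buildLaunchConfig, openPage and the ALLOW_INSECURE_BROWSER environment variable are all hypothetical names, and only flags already shown above are used.

```javascript
// Hypothetical helper: return launch args plus a CSP-bypass switch.
// The insecure flags are only included when explicitly requested.
function buildLaunchConfig(allowInsecure) {
  const args = [
    '--autoplay-policy=no-user-gesture-required',
    '--no-sandbox',
    '--disable-setuid-sandbox',
  ];
  if (allowInsecure) {
    args.push('--disable-web-security', '--allow-insecure-localhost');
  }
  return { args, bypassCSP: Boolean(allowInsecure) };
}

// Hypothetical wrapper: open a page with the config applied.
// puppeteer-extra is required lazily so the helper above has no dependencies.
async function openPage(url) {
  const puppeteer = require('puppeteer-extra');
  // ALLOW_INSECURE_BROWSER is an assumed, project-specific variable name.
  const config = buildLaunchConfig(process.env.ALLOW_INSECURE_BROWSER === '1');
  const browser = await puppeteer.launch({ args: config.args });
  const page = await browser.newPage();
  // CSP bypass has to be set before navigation to take effect.
  await page.setBypassCSP(config.bypassCSP);
  await page.goto(url);
  return { browser, page };
}
```

With this shape, a normal run gets none of the insecure flags unless the variable is set on purpose.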

Puppeteer worked in MacOS but not in Windows

I have made a Node script that uses puppeteer on macOS. The script just launches puppeteer and intercepts requests.
Here is the part of the code that uses puppeteer:
const getAllUrls = async (rootUrl) => {
const puppeteer = require('puppeteer');
const urls = [];
await puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
if (isRelevantUrl(interceptedRequest.url())) {
urls.push(interceptedRequest.url());
interceptedRequest.abort();
} else {
interceptedRequest.continue();
}
});
await page.goto(rootUrl);
await browser.close()
})
.catch(err => console.log(err));
return urls;
}
On macOS the script works great, but when I try to run it at my office on Windows I get the following error message:
Error: Failed to launch chrome!
TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose (C:\Users........\node_modules\puppeteer\lib\Launcher.js:339:14)
at ChildProcess.helper.addEventListener (C:\Users........\node_modules\puppeteer\lib\Launcher.js:329:60)
I have tried the following config recommended by puppeteer troubleshooting:
const browser = await puppeteer.launch({
ignoreDefaultArgs: ['--disable-extensions'],
});
But it didn't help.
I copied the script (without node_modules, of course) and pasted it into the project at my office, then ran npm i.
The other packages used in the script work fine on Windows as well.
Please help.

Failed to launch chrome!, failed to launch chrome puppeteer in bamboo for jest image snapshot test

I am trying to run puppeteer in a Bamboo build, but it does not seem to execute properly. The detailed error is below.
I wonder whether there is something I have to install to get it to run in Bamboo, or whether I need an alternative approach. I could not find any articles online about this issue.
For a bit more background, I am trying to add jest-image-snapshot to my test process, and I generate a snapshot like this:
const puppeteer = require('puppeteer');
let browser;
beforeAll(async () => {
browser = await puppeteer.launch();
});
it('show correct page: variant', async () => {
const page = await browser.newPage();
await page.goto(
'http://localhost:8080/app/register?experimentName=2018_12_STREAMLINED_ACCOUNT&experimentVariation=STREAMLINED#/'
);
const image = await page.screenshot();
expect(image).toMatchImageSnapshot();
});
afterAll(async () => {
await browser.close();
});
The log shows TypeError: Cannot read property 'newPage' of undefined, which is thrown by the line const page = await browser.newPage();
The important part is in your screenshot:
Failed to launch chrome! ... No usable sandbox!
Try to launch puppeteer without a sandbox like this:
await puppeteer.launch({
args: ['--no-sandbox']
});
Depending on the platform, you might also want to add the following arguments:
--disable-setuid-sandbox
--disable-dev-shm-usage
If none of the three arguments helps, the Troubleshooting guide might have additional information.
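The "try one flag, then add more" approach can be sketched as a small loop. This is an illustration under assumptions, not part of puppeteer: launchWithFallback and ARG_SETS are hypothetical names, and the launch function is passed in so the loop can be exercised without a browser.

```javascript
// Argument sets to try in order, each one adding a flag mentioned above.
const ARG_SETS = [
  ['--no-sandbox'],
  ['--no-sandbox', '--disable-setuid-sandbox'],
  ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
];

// Hypothetical helper: call launch({ args }) with each set until one succeeds.
// Rethrows the last error if every attempt fails.
async function launchWithFallback(launch, argSets = ARG_SETS) {
  let lastError = new Error('No argument sets were provided');
  for (const args of argSets) {
    try {
      return await launch({ args });
    } catch (err) {
      lastError = err; // remember the failure and try the next set
    }
  }
  throw lastError;
}
```

In the test file above, beforeAll could then use browser = await launchWithFallback((options) => puppeteer.launch(options)).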

Use different ip addresses in puppeteer requests

I have multiple IP interfaces on my server and I can't find out how to force puppeteer to use a specific one for its requests.
I am using node v10.15.0 and puppeteer 1.11.0
You can use the flag --netifs-to-ignore when launching the browser to specify which interfaces should be ignored by Chrome. Quote from the List of Chromium Command Line Switches:
--netifs-to-ignore: List of network interfaces to ignore. Ignored interfaces will not be used for network connectivity
You can use the argument like this when launching the browser:
const browser = await puppeteer.launch({
args: ['--netifs-to-ignore=INTERFACE_TO_IGNORE']
});
Maybe this will help: you can launch Chromium behind a proxy server. The full code:
'use strict';
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch({
// Launch chromium using a proxy server on port 9876.
// More on proxying:
// https://www.chromium.org/developers/design-documents/network-settings
args: [ '--proxy-server=127.0.0.1:9876' ]
});
const page = await browser.newPage();
await page.goto('https://google.com');
await browser.close();
})();
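If each outgoing IP address is exposed through its own local proxy, the --proxy-server flag shown above can be varied per browser instance. The sketch below assumes such proxies exist; the addresses are placeholders and the helper names proxyForJob and launchOptionsFor are hypothetical.

```javascript
// Placeholder proxy endpoints, one per outgoing interface (assumed to exist).
const PROXIES = ['127.0.0.1:9876', '127.0.0.1:9877'];

// Pick a proxy for the n-th job in round-robin order.
function proxyForJob(n, proxies = PROXIES) {
  return proxies[n % proxies.length];
}

// Build puppeteer launch options that route the browser through one proxy.
function launchOptionsFor(proxy) {
  return { args: [`--proxy-server=${proxy}`] };
}

// Job 0 uses the first proxy, job 1 the second, job 2 the first again.
console.log(launchOptionsFor(proxyForJob(0)).args[0]); // --proxy-server=127.0.0.1:9876
```

Each job would then call puppeteer.launch(launchOptionsFor(proxyForJob(jobIndex))) to get a browser bound to that proxy.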

How to use puppeteer to automate Amazon Connect CCP login?

I'm trying to use puppeteer to automate the login process for our agents in Amazon Connect, but I can't get puppeteer to finish loading the CCP login page. See the code below:
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = 'https://ccalderon-reinvent.awsapps.com/connect/ccp#/';
await page.goto(url, {waitUntil: 'domcontentloaded'});
console.log(await page.content());
// console.log('waiting for username input');
// await page.waitForSelector('#wdc_username');
await browser.close();
I can never see the content of the page, it times out. Am I doing something wrong? If I launch the browser with { headless: false } I can see the page never finishes loading.
Please note the same code works fine with https://www.github.com/login so it must be something specific to the source code of Connect's CCP.
In case you are reading this in the future and having problems with puppeteer for no apparent reason, try downgrading the puppeteer version first and see if the issue persists.
This seems to be a bug in Chromium development version 73.0.3679.0. The error log said it could not load a specific script, yet the script could still be loaded manually.
The Solution:
Using Puppeteer version 1.11.0 solved this issue. If you want to keep Puppeteer 1.12.2 but use a different Chromium revision, you can use the executablePath option.
Here are the Chromium versions bundled with each Puppeteer release (at the time of writing):
Chromium 73.0.3679.0 - Puppeteer v1.12.2
Chromium 72.0.3582.0 - Puppeteer v1.11.0
Chromium 71.0.3563.0 - Puppeteer v1.9.0
Chromium 70.0.3508.0 - Puppeteer v1.7.0
Chromium 69.0.3494.0 - Puppeteer v1.6.2
I checked my locally installed Chrome, which was loading the page correctly:
$(which google-chrome) --version
Google Chrome 72.0.3626.119
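To pick a Puppeteer release that matches a Chrome version known to work, the list above can be turned into a small lookup. This is just a sketch: the version pairs are copied from the list, and puppeteerForChrome is a hypothetical helper that matches on the major version only.

```javascript
// Puppeteer release -> bundled Chromium version, copied from the list above.
const BUNDLED_CHROMIUM = new Map([
  ['1.12.2', '73.0.3679.0'],
  ['1.11.0', '72.0.3582.0'],
  ['1.9.0', '71.0.3563.0'],
  ['1.7.0', '70.0.3508.0'],
  ['1.6.2', '69.0.3494.0'],
]);

// Hypothetical helper: return the listed Puppeteer release whose bundled
// Chromium shares a major version with chromeVersion, or null if none does.
function puppeteerForChrome(chromeVersion) {
  const major = chromeVersion.split('.')[0];
  for (const [puppeteerVersion, chromium] of BUNDLED_CHROMIUM) {
    if (chromium.split('.')[0] === major) return puppeteerVersion;
  }
  return null;
}

console.log(puppeteerForChrome('72.0.3626.119')); // 1.11.0
```

For the local Chrome 72 shown above, this points at Puppeteer 1.11.0, the version that resolved the issue.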
Note: The puppeteer team recommends in their docs using the Chromium bundled with the package (most likely the latest developer version) rather than a different revision.
I also edited the code slightly so that it waits until all network requests are done and the username input is visible.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
executablePath: "/usr/bin/google-chrome"
});
const page = await browser.newPage();
const url = "https://ccalderon-reinvent.awsapps.com/connect/ccp#/";
await page.goto(url, { waitUntil: "networkidle0" });
console.log("waiting for username input");
await page.waitForSelector("#wdc_username", { visible: true });
await page.screenshot({ path: "example.png" });
await browser.close();
})();
The specific revision number can be obtained in several ways; one is to check the package.json of the puppeteer package. The URL for 1.11.0 is:
https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/package.json
If you would like to automate downloading the Chrome revision, you can use browserFetcher to fetch a specific revision.
const browserFetcher = puppeteer.createBrowserFetcher();
const revisionInfo = await browserFetcher.download('609904'); // chrome 72 is 609904
const browser = await puppeteer.launch({executablePath: revisionInfo.executablePath})
