I'm running a test using the headless Chrome package Puppeteer:
const puppeteer = require('puppeteer')
;(async() => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://google.com', {waitUntil: 'networkidle'})
// Type our query into the search bar
await page.type('puppeteer')
await page.click('input[type="submit"]')
// Wait for the results to show up
await page.waitForSelector('h3 a')
// Extract the results from the page
const links = await page.evaluate(() => {
const anchors = Array.from(document.querySelectorAll('h3 a'))
return anchors.map(anchor => anchor.textContent)
})
console.log(links.join('\n'))
browser.close()
})()
And I'm running the script as: node --harmony test/e2e/puppeteer/index.js (v6.9.1)
But I get this error:
;(async() => {
^
SyntaxError: Unexpected token (
What could be the problem?
Note: I'm using Vue CLI's official Webpack template:
I found out that Node LTS (Node 6) does not support async/await yet, which is why the parser rejects the async arrow function.
See here for details: https://www.infoq.com/news/2017/02/node-76-async-await
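Because a file that contains async fails at parse time on Node 6, a version check has to live in a separate entry file that avoids the syntax. A minimal sketch, assuming the 7.6 cutoff from the linked article; the function name supportsAsyncAwait is hypothetical:

```javascript
// Returns true when the given Node version string (e.g. process.version)
// is at least 7.6, the release the linked article cites for async/await.
// Written without async syntax so that it also parses on Node 6.
function supportsAsyncAwait(version) {
  var parts = version.replace(/^v/, '').split('.');
  var major = parseInt(parts[0], 10);
  var minor = parseInt(parts[1], 10);
  return major > 7 || (major === 7 && minor >= 6);
}

if (!supportsAsyncAwait(process.version)) {
  console.error('Node ' + process.version + ' cannot parse async functions; upgrade to 7.6 or later.');
}
```

With v6.9.1 from the question, the check returns false, so the message is printed instead of a SyntaxError.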
I tried your code on my laptop after a lint and it worked perfectly:
Maybe you have an environment issue.
Have you considered removing the semicolon at the beginning of the line?
It does not look like proper programming. Or maybe it is a webpack issue.
I'm trying something really simple:
Navigate to google.com
Fill the search box with "cheese"
Press enter on the search box
Print the text for the title of the first result
So simple, but I can't get it to work. This is the code:
const playwright = require('playwright');
(async () => {
for (const browserType of ['chromium', 'firefox', 'webkit']) {
const browser = await playwright[browserType].launch();
try {
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://google.com');
await page.fill('input[name=q]', 'cheese');
await page.press('input[name=q]', 'Enter');
await page.waitForNavigation();
page.waitForSelector('div#rso h3')
.then(firstResult => console.log(`${browserType}: ${firstResult.textContent()}`))
.catch(error => console.error(`Waiting for result: ${error}`));
} catch(error) {
console.error(`Trying to run test on ${browserType}: ${error}`);
} finally {
await browser.close();
}
}
})();
At first I tried to get the first result with page.$(), but it didn't work. After investigating a little, I found page.waitForNavigation() and thought it would be the solution, but it isn't.
I'm using the latest playwright version: 1.0.2.
It seems to me that the only problem was your initial promise composition. I refactored the promise chain to async/await and used page.$eval to retrieve the textContent; it now works, and there are no more target closed errors.
try {
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://google.com');
await page.fill('input[name=q]', 'cheese');
await page.press('input[name=q]', 'Enter');
await page.waitForNavigation();
// page.waitForSelector('div#rso h3').then(firstResult => console.log(`${browserType}: ${firstResult.textContent()}`)).catch(error => console.error(`Waiting for result: ${error}`));
await page.waitForSelector('div#rso h3');
const firstResult = await page.$eval('div#rso h3', firstRes => firstRes.textContent);
console.log(`${browserType}: ${firstResult}`)
} catch(error) {
console.error(`Trying to run test on ${browserType}: ${error}`);
} finally {
await browser.close();
}
}
Output:
chrome: Cheese – Wikipedia
firefox: Cheese – Wikipedia
webkit: Cheese – Wikipedia
Note: chrome and webkit work, but firefox fails on waitForNavigation for me. When I replaced it with await page.waitForTimeout(5000); firefox worked as well. It might be an issue with Playwright's Firefox support for the navigation promise.
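Why the un-awaited promise chain leads to a target closed error can be shown without Playwright. The sketch below uses a made-up stand-in (makeFakeBrowser is hypothetical, not a Playwright API) whose query rejects once close() has run:

```javascript
// Stand-in for a browser page: queries settle after a short delay,
// and reject with "Target closed" once close() has been called.
function makeFakeBrowser() {
  let open = true;
  return {
    close: async () => { open = false; },
    firstResult: () => new Promise((resolve, reject) => {
      setTimeout(() => {
        if (open) resolve('Cheese');
        else reject(new Error('Target closed'));
      }, 10);
    }),
  };
}

// Mirrors the question: the promise chain is started but not awaited,
// so the finally block closes the browser while the query is pending.
async function withoutAwait() {
  const browser = makeFakeBrowser();
  const log = [];
  let pending;
  try {
    pending = browser.firstResult()
      .then(text => log.push(text))
      .catch(error => log.push(error.message));
  } finally {
    await browser.close();
  }
  await pending; // only so the demo can inspect the outcome
  return log;
}

// Mirrors the fix: the query is awaited before finally runs.
async function withAwait() {
  const browser = makeFakeBrowser();
  const log = [];
  try {
    log.push(await browser.firstResult());
  } finally {
    await browser.close();
  }
  return log;
}

withoutAwait().then(log => console.log('without await:', log));
withAwait().then(log => console.log('with await:', log));
```

The first variant records the "Target closed" message, the second records the text, which matches the behavior described in the answer.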
If you await the page.press('input[name=q]', 'Enter'); it might be too late for waitForNavigation to work.
You could remove the await on the press call. You need to wait for the navigation, not the press action.
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://google.com');
await page.fill('input[name=q]', 'cheese');
page.press('input[name=q]', 'Enter');
await page.waitForNavigation();
var firstResult = await page.waitForSelector('div#rso h3');
console.log(`${browserType}: ${await firstResult.textContent()}`);
Also notice that you need to await textContent().
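A common way to express "start waiting before the action fires" is to pass both promises to Promise.all. This is a sketch, not the answer author's code; the helper name pressAndWaitForNavigation is hypothetical, and page is assumed to be a Playwright Page:

```javascript
// Starts waiting for the navigation before the key press is sent, then
// waits for both. `page` is expected to be a Playwright Page.
async function pressAndWaitForNavigation(page, selector, key) {
  await Promise.all([
    page.waitForNavigation(), // registered first, so the navigation is not missed
    page.press(selector, key),
  ]);
}
```

Usage inside the loop from the question would be await pressAndWaitForNavigation(page, 'input[name=q]', 'Enter'); followed by the waitForSelector call.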
In my case the Playwright error Target closed appeared at the first attempt to retrieve text from the page.
The error is misleading; the actual reason was that Basic Auth was enabled on the target site.
Playwright could not open the page and just got stuck with "Target closed". Passing the credentials to the context fixed it:
const options = {
httpCredentials: { username: 'user', password: 'password' }
};
const context = await browser.newContext(options);
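To keep the credentials out of the source, the options object can be built from environment variables. A minimal sketch; the variable names BASIC_AUTH_USER and BASIC_AUTH_PASSWORD and the helper name are assumptions, not something Playwright defines:

```javascript
// Builds the options object for browser.newContext() from environment
// variables. BASIC_AUTH_USER and BASIC_AUTH_PASSWORD are hypothetical names.
function buildContextOptions(env) {
  const options = {};
  if (env.BASIC_AUTH_USER && env.BASIC_AUTH_PASSWORD) {
    options.httpCredentials = {
      username: env.BASIC_AUTH_USER,
      password: env.BASIC_AUTH_PASSWORD,
    };
  }
  return options;
}

// Prints only the option names, never the secret values.
console.log(Object.keys(buildContextOptions(process.env)));
```

The result would then be passed as browser.newContext(buildContextOptions(process.env)).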
One more issue: local tests ran without a problem, including in Docker containers, while GitHub CI was failing with Playwright and gave no details except the above error.
The cause was a special symbol in a GitHub secret. For example, a dollar sign $ is simply removed from the secret in GitHub Actions. To correct it, either use the env: section
env:
XXX: ${{ secrets.SUPER_SECRET }}
or wrap the secret in single quotes:
run: |
export XXX='${{ secrets.YYY}}'
Similar escaping quirks exist in Kubernetes, Docker and GitLab; $$ becomes $ and z$abc becomes z.
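The two examples can be reproduced with a simplified model of "$" substitution. This is an illustration only, not the exact rule set of any of the tools named; the function name expandDollar is hypothetical:

```javascript
// Simplified model of "$" substitution: "$$" becomes a literal "$" and
// "$name" is replaced by the variable's value, or by nothing when unset.
function expandDollar(value, vars) {
  return value.replace(/\$\$|\$([A-Za-z_][A-Za-z0-9_]*)/g, (match, name) => {
    if (match === '$$') return '$';
    return Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : '';
  });
}

console.log(expandDollar('z$abc', {}));    // z
console.log(expandDollar('pa$$word', {})); // pa$word
```

A password that silently loses part of its value this way still looks set, which is why the failure surfaces only as a generic error later.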
Use the mcr.microsoft.com/playwright Docker image from Microsoft, which comes with Node, npm and Playwright pre-installed. Alternatively, when installing Playwright yourself, do not forget to install the system package dependencies by running npx playwright install-deps.
The VM should have enough resources to handle browser tests; running short is a common problem in CI/CD workflows.
I'm trying to scrape an address from whitepages.com, but my scraper keeps throwing this error every time I run it.
(node:11389) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'getProperty' of undefined
here's my code:
const puppeteer = require('puppeteer')
async function scrapeAddress(url){
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url,{timeout: 0, waitUntil: 'networkidle0'});
const [el]= await page.$x('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
// console.log(el)
const txt = await el.getProperty('textContent');
const rawTxt = await txt.jsonValue();
console.log({rawTxt});
browser.close();
}
scrapeAddress('https://www.whitepages.com/business/CA/San-Diego/Cvs-Health/b-1ahg5bs')
After investigating a bit, I realized that the el variable is getting returned as undefined and I'm not sure why. I've tried this same code to get elements from other sites but only for this site am I getting this error.
I tried both the full and short XPath as well as other surrounding elements and everything on this site throws this error.
Why would this be happening and is there any way I can fix it?
You can try wrapping everything in a try catch block, otherwise try unwrapping the promise with then().
(async() => {
const browser = await puppeteer.launch();
try {
const page = await browser.newPage();
await page.goto(url,{timeout: 0, waitUntil: 'networkidle0'});
const [el]= await page.$x('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
// console.log(el)
const txt = await el.getProperty('textContent');
const rawTxt = await txt.jsonValue();
console.log({rawTxt});
} catch (err) {
console.error(err.message);
} finally {
await browser.close();
}
})();
The reason is that the website detects Puppeteer as an automated bot. Set headless to false and you can see that it never navigates to the website.
I'd suggest using puppeteer-extra-plugin-stealth. Also, always make sure to wait for the element to appear on the page.
const puppeteer = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(pluginStealth());
async function scrapeAddress(url){
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url,{waitUntil: 'networkidle0'});
//wait for xpath
await page.waitForXPath('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
const [el]= await page.$x('//*[@id="left"]/div/div[4]/div[3]/div[2]/a/h3/span[1]');
// console.log(el)
const txt = await el.getProperty('textContent');
const rawTxt = await txt.jsonValue();
console.log({rawTxt});
browser.close();
}
scrapeAddress('https://www.whitepages.com/business/CA/San-Diego/Cvs-Health/b-1ahg5bs')
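Whatever the cause of the missing element, a guard turns the opaque "Cannot read property 'getProperty' of undefined" into an error that names the XPath. A minimal sketch, assuming a Puppeteer version that still provides page.$x; the helper name getTextByXPath is hypothetical:

```javascript
// Returns the text of the first node matching `xpath`, or throws a
// descriptive error when nothing matches.
// `page` is expected to be a Puppeteer Page that provides page.$x.
async function getTextByXPath(page, xpath) {
  const [el] = await page.$x(xpath);
  if (!el) {
    throw new Error(`No element matched XPath: ${xpath}`);
  }
  const handle = await el.getProperty('textContent');
  return handle.jsonValue();
}
```

In scrapeAddress this would replace the three lines from page.$x to jsonValue() with const rawTxt = await getTextByXPath(page, xpath);.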
I recently ran into this error, and changing my XPath worked for me. I had been using the full XPath, and it was causing some issues.
Most probably this is because the website is responsive, so when the scraper runs, the page exposes a different XPath.
I would suggest debugging with a non-headless (visible) browser:
const browser = await puppeteer.launch({headless: false});
I took the code that @mbit provided, modified it to my needs, and ran the browser with headless: false. I was unable to get it working in headless mode; if anyone figures out how to do that, please explain. Here is my solution:
First you must install a couple of things, so run the following two commands in a terminal:
npm install puppeteer-extra
npm install puppeteer-extra-plugin-stealth
Installing these will allow you to run the first few lines in @mbit's code.
Then in this line of code:
const browser = await puppeteer.launch();
as a parameter to puppeteer.launch(); pass in the following:
{headless: false}
which should in turn look like this:
const browser = await puppeteer.launch({headless: false});
I also believe that the XPath @mbit was using may not exist anymore, so provide one of your own, as well as a site. You can do this with the following 3 lines of code; just replace {XPath} with your own XPath and {address} with your own web address. NOTE: be mindful of your quotes ('' or ""), as the XPath may contain the same kind of quote you wrap it in, which will break the string.
await page.waitForXPath({XPath});
const [el]= await page.$x({XPath});
scrapeAddress({address})
After you do this you should be able to run your code and retrieve values.
Here's what my code looked like in the end; feel free to copy and paste it into your own file to confirm that it works on your end!
let puppeteer = require('puppeteer-extra');
let pluginStealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(pluginStealth());
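// Note: this reassignment swaps puppeteer-extra for plain puppeteer, so the
// stealth plugin registered above is not applied to the launched browser.
puppeteer = require('puppeteer')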
async function scrapeAddress(url){
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto(url,{waitUntil: 'networkidle0'});
//wait for xpath
await page.waitForXPath('//*[@id="root"]/div[1]/div[2]/div[2]/div[9]/div/div/div/div[3]/div[2]/div[3]/div[3]');
const [el]= await page.$x('//*[@id="root"]/div[1]/div[2]/div[2]/div[9]/div/div/div/div[3]/div[2]/div[3]/div[3]');
const txt = await el.getProperty('textContent');
const rawTxt = await txt.jsonValue();
console.log({rawTxt});
browser.close();
}
scrapeAddress("https://stockx.com/air-jordan-1-retro-high-unc-leather")
I have made a Node script that uses Puppeteer on macOS. The script just launches Puppeteer and intercepts requests.
Here is the part of the code that uses puppeteer:
const getAllUrls = async (rootUrl) => {
const puppeteer = require('puppeteer');
const urls = [];
await puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
if (isRelevantUrl(interceptedRequest.url())) {
urls.push(interceptedRequest.url());
interceptedRequest.abort();
} else {
interceptedRequest.continue();
}
});
await page.goto(rootUrl);
await browser.close()
})
.catch(err => console.log(err));
return urls;
}
On macOS the script works great. But when I try running it at my office on Windows, I get the following error message:
Error: Failed to launch chrome!
TROUBLESHOOTING:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose
(C:\Users........\node_modules\puppeteer\lib\Launcher.js:339:14) at
ChildProcess.helper.addEventListener
(C:\Users........\node_modules\puppeteer\lib\Launcher.js:329:60)
I have tried the following config recommended by puppeteer troubleshooting:
const browser = await puppeteer.launch({
ignoreDefaultArgs: ['--disable-extensions'],
});
But it didn't help.
I copied the script (without node_modules, of course) into the project at my office, then ran npm i.
The rest of the packages used in the script work fine on Windows as well.
Please help.
I am trying to run Puppeteer in a Bamboo build, but it does not seem to execute properly. The detailed error is below.
I wonder if there is something I have to install for it to run in Bamboo, or whether I need an alternative. I could not find any articles online about this issue.
For a bit more background, I am trying to add jest-image-snapshot to my test process, and I generate a snapshot like this:
const puppeteer = require('puppeteer');
let browser;
beforeAll(async () => {
browser = await puppeteer.launch();
});
it('show correct page: variant', async () => {
const page = await browser.newPage();
await page.goto(
'http://localhost:8080/app/register?experimentName=2018_12_STREAMLINED_ACCOUNT&experimentVariation=STREAMLINED#/'
);
const image = await page.screenshot();
expect(image).toMatchImageSnapshot();
});
afterAll(async () => {
await browser.close();
});
The logged TypeError: Cannot read property 'newPage' of undefined comes from the line const page = await browser.newPage();
The important part is in your screenshot:
Failed to launch chrome! ... No usable sandbox!
Try to launch puppeteer without a sandbox like this:
await puppeteer.launch({
args: ['--no-sandbox']
});
Depending on the platform, you might also want to try the following arguments (alone or in combination):
--disable-setuid-sandbox
--disable-dev-shm-usage
If none of the three works, the Troubleshooting guide might have additional information.
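The three flags can be collected in one place so that they are only applied on the build server. A minimal sketch; the helper name buildLaunchOptions and the use of a CI environment variable are assumptions (Bamboo may need a different variable):

```javascript
// Builds options for puppeteer.launch(). The flags are the three named
// above; the `inCi` switch is a hypothetical way to limit them to CI.
function buildLaunchOptions(inCi) {
  const args = inCi
    ? ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage']
    : [];
  return { args };
}

console.log(buildLaunchOptions(Boolean(process.env.CI)));
```

In the test file, beforeAll would then call puppeteer.launch(buildLaunchOptions(Boolean(process.env.CI))). Keeping the sandbox enabled locally avoids dropping a security feature where it is not necessary.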
I'm trying to use Puppeteer to automate the login process for our agents in Amazon Connect; however, I can't get Puppeteer to finish loading the CCP login page. See the code below:
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = 'https://ccalderon-reinvent.awsapps.com/connect/ccp#/';
await page.goto(url, {waitUntil: 'domcontentloaded'});
console.log(await page.content());
// console.log('waiting for username input');
// await page.waitForSelector('#wdc_username');
await browser.close();
I can never see the content of the page, it times out. Am I doing something wrong? If I launch the browser with { headless: false } I can see the page never finishes loading.
Please note the same code works fine with https://www.github.com/login so it must be something specific to the source code of Connect's CCP.
In case you are from the future and having problems with Puppeteer for no apparent reason, try downgrading the Puppeteer version first and see if the issue persists.
This seems to be a bug in the Chromium development version 73.0.3679.0. The error log said it could not load a specific script, but we could still load the script manually.
The Solution:
Using Puppeteer version 1.11.0 solved this issue. If you want to keep Puppeteer version 1.12.2 but use a different Chromium revision, you can use the executablePath option.
Here are the Chromium versions bundled with each Puppeteer release (at the time of this answer):
Chromium 73.0.3679.0 - Puppeteer v1.12.2
Chromium 72.0.3582.0 - Puppeteer v1.11.0
Chromium 71.0.3563.0 - Puppeteer v1.9.0
Chromium 70.0.3508.0 - Puppeteer v1.7.0
Chromium 69.0.3494.0 - Puppeteer v1.6.2
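For scripts that need to decide which release to pin, the list can be kept as a small lookup table. A sketch with the values copied from the list above; the names CHROMIUM_BY_PUPPETEER and chromiumFor are hypothetical:

```javascript
// Puppeteer release -> bundled Chromium version, copied from the list above.
const CHROMIUM_BY_PUPPETEER = {
  '1.12.2': '73.0.3679.0',
  '1.11.0': '72.0.3582.0',
  '1.9.0': '71.0.3563.0',
  '1.7.0': '70.0.3508.0',
  '1.6.2': '69.0.3494.0',
};

// Returns the Chromium version for a Puppeteer release, or undefined
// when the release is not in the table.
function chromiumFor(puppeteerVersion) {
  return CHROMIUM_BY_PUPPETEER[puppeteerVersion];
}

console.log(chromiumFor('1.11.0')); // 72.0.3582.0
```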
I checked my locally installed Chrome, which was loading the page correctly:
$(which google-chrome) --version
Google Chrome 72.0.3626.119
Note: The Puppeteer team recommends in their docs using the Chromium bundled with the package (most likely the latest developer version) rather than a different revision.
I also edited the code a little so that it finishes loading once all network requests are done and the username input is visible.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
executablePath: "/usr/bin/google-chrome"
});
const page = await browser.newPage();
const url = "https://ccalderon-reinvent.awsapps.com/connect/ccp#/";
await page.goto(url, { waitUntil: "networkidle0" });
console.log("waiting for username input");
await page.waitForSelector("#wdc_username", { visible: true });
await page.screenshot({ path: "example.png" });
await browser.close();
})();
The specific revision number can be obtained in many ways; one is to check the package.json of the puppeteer package. The URL for 1.11.0 is:
https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/package.json
If you would like to automate downloading the Chrome revision, you can use browserFetcher to fetch a specific revision.
const browserFetcher = puppeteer.createBrowserFetcher();
const revisionInfo = await browserFetcher.download('609904'); // chrome 72 is 609904
const browser = await puppeteer.launch({executablePath: revisionInfo.executablePath})
Result: