I am connecting Puppeteer to an already-open Chrome browser using this piece of code:
const browserURL = 'http://127.0.0.1:9222';
const browser = await puppeteer.connect({browserURL , defaultViewport : null });
const page = await browser.newPage();
But most of the time I want to connect to the browser through an authenticated proxy, maybe by launching Chrome with specific flags?
I tried proxy-login-automator: it launches Chrome with an authenticated proxy that I can then connect to, but this gets complicated if I want to keep changing proxies or run many instances of the same code.
I am launching Chrome with this:
chrome --remote-debugging-port=9222 --user-data-dir="C:\Users\USER\AppData\Local\Google\Chrome\User Data"
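One possible direction, sketched here under assumptions rather than taken from an answer in this thread: Chrome accepts a `--proxy-server` flag at launch, and Puppeteer's `page.authenticate()` can supply the proxy username and password for a page. The helper names (`buildChromeArgs`, `openAuthenticatedPage`), the proxy host, the profile path, and the credentials are hypothetical placeholders.

```javascript
// Build the Chrome command-line arguments for one instance.
// Each instance gets its own debugging port, profile directory and proxy.
function buildChromeArgs({ port, userDataDir, proxy }) {
  const args = [
    `--remote-debugging-port=${port}`,
    `--user-data-dir=${userDataDir}`,
  ];
  if (proxy) {
    args.push(`--proxy-server=${proxy}`); // host:port only, no credentials
  }
  return args;
}

// Connect to the running Chrome and open a page that answers the
// proxy's authentication challenge with the given credentials.
async function openAuthenticatedPage({ port, username, password }) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when used
  const browser = await puppeteer.connect({
    browserURL: `http://127.0.0.1:${port}`,
    defaultViewport: null,
  });
  const page = await browser.newPage();
  await page.authenticate({ username, password });
  return { browser, page };
}

// Hypothetical values, for illustration only.
console.log(
  buildChromeArgs({
    port: 9222,
    userDataDir: 'C:\\chrome-profiles\\instance-1',
    proxy: 'proxy.example.com:8080',
  }).join(' ')
);
```

With this layout, changing proxies means starting another Chrome with a different `proxy` and `port`, and parallel instances would each need their own `userDataDir`.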
Related
I'm trying to do some web scraping with Puppeteer and I need to get the scraped value into a website I'm building.
I have tried to load the Puppeteer file in the HTML file as if it were a regular JavaScript file, but I keep getting an error. However, if I run it in a cmd window it works well.
Scraper.js:
getPrice();
function getPrice() {
  const puppeteer = require('puppeteer');
  void (async () => {
    try {
      const browser = await puppeteer.launch()
      const page = await browser.newPage()
      await page.goto('http://example.com')
      await page.setViewport({ width: 1920, height: 938 })
      await page.waitForSelector('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
      await page.click('.m-hotel-info > .l-container > .l-header-section > .l-m-col-2 > .m-button')
      await page.waitForSelector('.modal-content')
      await page.click('.tile-hsearch-hws > .m-search-tabs > #edit-search-panel > .l-em-reset > .m-field-wrap > .l-xs-col-4 > .analytics-click')
      await page.waitForNavigation();
      await page.waitForSelector('.tile-search-filter > .l-display-none')
      const innerText = await page.evaluate(() => document.querySelector('.tile-search-filter > .l-display-none').innerText);
      console.log(innerText)
    } catch (error) {
      console.log(error)
    }
  })()
}
index.html:
<html>
<head></head>
<body>
<script src="../js/scraper.js" type="text/javascript"></script>
</body>
</html>
The expected result should be this one in the console of Chrome:
But I'm getting this error instead:
What am I doing wrong?
EDIT: Since puppeteer removed support for puppeteer-web, I moved it out of the repo and tried to patch it a bit.
It does work in the browser. The package is called puppeteer-web and was made specifically for such cases.
But the main point is that there must be an instance of Chrome running on some server. Only then can you connect to it.
You can use it later on in your web page to drive another browser instance through its WS Endpoint:
<script src="https://unpkg.com/puppeteer-web">
</script>
<script>
// await is not allowed at the top level of a classic script,
// so the calls are wrapped in an async function.
(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `ws://0.0.0.0:8080`, // <-- connect to a server running somewhere
    ignoreHTTPSErrors: true
  });
  const pagesCount = (await browser.pages()).length;
  const browserWSEndpoint = browser.wsEndpoint();
  console.log({ browserWSEndpoint, pagesCount });
})();
</script>
I had some fun with puppeteer and webpack,
playground-react-puppeteer
playground-electron-react-puppeteer-example
See these answers for full understanding of creating the server and more,
Official link to puppeteer-web
Puppeteer with docker
Puppeteer with chrome extension
Puppeteer with local wsEndpoint
Instead, use Puppeteer in the backend and make an API to interface your frontend with it if your main goal is to web scrape and get the data in the frontend.
Puppeteer runs on the server in Node.js. For the common case, rather than using puppeteer-web to allow the client to write Puppeteer code to control the browser, it's better to create an HTTP or websocket API that lets clients indirectly trigger Puppeteer code.
Reasons to prefer a REST API over puppeteer-connect:
better support for arbitrary client codebases--clients that aren't written in JS (desktop, command line and mobile apps, for example) can use the API just as easily as the browser can
no dependency on puppeteer-connect
lower client-side complexity; for many use cases JS won't be required at all if HTML forms suffice
better control of client behavior--running a browser on the server is a heavy load and has powerful capabilities that are easy to exploit
easier to integrate with other backend code and resources like the file system
provides seamless integration with an existing API as just another set of routes
hiding Puppeteer as an implementation detail lets you switch to, say, Playwright in the future without the client code being affected.
Similarly, rather than exposing a mock fs object to read and write files on the server, we expose REST API endpoints to accomplish these tasks. This is a useful layer of abstraction.
Since there are many use cases for Puppeteer in the context of an API (usually Express), it's hard to offer a general example, but here are a few case studies you can use as starting points:
Puppeteer unable to run on Heroku
Puppeteer doesn't close browser
Parallelism of Puppeteer with Express Router Node JS. How to pass page between routes while maintaining concurrency
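As a rough illustration of the API approach described above, here is a minimal sketch that uses Node's built-in `http` module instead of Express, so it has no dependency beyond Puppeteer itself. The `/scrape` route, the `url` query parameter, and the function names are assumptions made up for this example, and returning the page title stands in for whatever data you actually need.

```javascript
const http = require('http');

// Extract and validate the ?url= query parameter; returns null if invalid.
function parseTargetUrl(requestUrl) {
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const target = searchParams.get('url');
  if (!target) return null;
  try {
    const parsed = new URL(target);
    return ['http:', 'https:'].includes(parsed.protocol) ? parsed.href : null;
  } catch {
    return null;
  }
}

// Runs on the server only: open the page and return its title.
async function scrapeTitle(targetUrl) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when used
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    await browser.close(); // always release the browser
  }
}

// HTTP server exposing GET /scrape?url=... as JSON.
function createServer(scrape = scrapeTitle) {
  return http.createServer(async (req, res) => {
    const target = req.url.startsWith('/scrape') ? parseTargetUrl(req.url) : null;
    if (!target) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'expected /scrape?url=http(s)://...' }));
      return;
    }
    try {
      const title = await scrape(target);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ title }));
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}

// To start it: createServer().listen(3000);
```

A page served from the same host could then call `fetch('/scrape?url=' + encodeURIComponent(target))` and read the JSON response, with no Puppeteer code running in the browser.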
I'm creating a Chrome extension that helps people log in to a website, and I need Puppeteer to connect to the page the user is currently on. Can I connect to an already-open page to manipulate it?
I've tried:
const browserURL = "http://127.0.0.1:21222";
const browser = await puppeteer.connect({ browserURL });
and I tried starting Chrome with:
chrome.exe --remote-debugging-port=21222
I need to connect to one specific URL, for example facebook.com. I tried:
const browserURL = "http://facebook.com:21222";
and also without the ":21222".
I'm using Windows 10
node v16.16.0
thanks for helping!
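For what it's worth, `browserURL` is the address of Chrome's debugging endpoint, not of the website, so pointing it at facebook.com cannot work. A possible sketch, assuming Chrome was started with `--remote-debugging-port=21222` as above: connect to the local endpoint, then pick the already-open tab by its URL. The helper names `findPageByHost` and `attachToOpenTab` are made up for illustration.

```javascript
// Return the first open page whose URL is on the given host (or a subdomain).
function findPageByHost(pages, host) {
  return pages.find((page) => {
    try {
      const { hostname } = new URL(page.url());
      return hostname === host || hostname.endsWith(`.${host}`);
    } catch {
      return false; // some tabs may report a URL that cannot be parsed
    }
  });
}

// Connect to the local debugging endpoint and return the matching tab.
async function attachToOpenTab(host) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when used
  const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:21222' });
  const pages = await browser.pages();
  const page = findPageByHost(pages, host);
  if (!page) throw new Error(`no open tab on ${host}`);
  return { browser, page };
}
```

Calling `attachToOpenTab('facebook.com')` would then hand back the existing tab as a regular Puppeteer `page`, if one is open.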
What happens is that when I open one Puppeteer instance it runs fast, but the more instances I open, the longer each one needs to load the URL and fill in information. Is that normal?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
Answer
The performance of multiple Puppeteer instances running on the same machine and testing a single application is highly dependent on the performance of your machine (4 cores, 8 threads, Core i7-7700HQ).
On my local setup I could not run more than 10 parallel instances, and the performance drop became more noticeable the more instances I launched.
My Story
I faced a similar challenge when I was trying to simulate multiple users using the same application in parallel.
I know that Puppeteer and similar UI test automation tools are not good tools for stress-testing your application, and that there are better tools for that.
Nevertheless, my case was:
Run "user-like" behavior
From the other end of the world
Collect HAR files - that represent network timings of the browser interacting with 10-20 different systems
Analyze the behavior
My approach was - maybe this helps you:
Create a puppeteer test
Enable headless running
Make it triggerable via curl
Dockerize it
Run the Docker image on 10 different machines (5-10 dockerized Puppeteer tests per machine)
Trigger them all at once via curl
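This is not the Docker setup described above, but if everything has to stay on one machine, a simple in-process cap on how many browsers run at once may soften the slowdown. Below is a minimal sketch of that idea; `runWithLimit` and `screenshot` are hypothetical helper names, and a sensible limit has to be found by experiment on your own hardware.

```javascript
// Run async task functions with at most `limit` running at the same time.
// Results are returned in the same order as the tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next task (JS is single-threaded here)
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}

// One Puppeteer run; mirrors the screenshot example from the question.
async function screenshot(url, path) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when used
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    await browser.close();
  }
  return path;
}

// Example (not executed here):
// const urls = ['https://example.com', 'https://example.org'];
// runWithLimit(urls.map((u, i) => () => screenshot(u, `shot-${i}.png`)), 4);
```

Each worker picks up the next task only after its previous one finishes, so at most `limit` browsers exist at any moment.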
I have a node.js (v12.17.0) npm (6.14.4) project I am running on my Windows 10 command prompt. It is using puppeteer ("puppeteer": "^5.3.1",) to go to a url and get info.
My puppeteer code runs this to go to the page:
let targetURL = "https://www.walmart.com/ip/Small-Large-Dogs-Muzzle-Anti-Stop-Bite-Barking-Chewing-Mesh-Mask-Training-S-XXL/449423974";
await page.goto(targetURL, { waitUntil: 'networkidle2' });
Puppeteer is running a Chromium exe I just downloaded; I specify its location when I start Puppeteer:
var options = {
executablePath: "C:\\Users\\marti\\Downloads\\chrome-win-new\\chrome-win\\chrome.exe",
So when Puppeteer starts and goes to that URL, it loads the Walmart page at first and seems fine, then a couple of seconds later the page looks like this:
I ran Windows Defender and Malwarebytes to double-check whether I had a virus. If I open Chromium without Puppeteer it doesn't redirect; it only seems to happen in my Node.js program when I run it.
I tried using a different URL (https://www.petcarerx.com/chicken-soup-for-the-soul-bacon-and-cheese-crunchy-bites-dog-treats/32928?sku=50857#50857) and it did not redirect. Does anyone know how I can fix this for my Walmart URL?
Is it possible to connect a browser to puppeteer without instantiating it in puppeteer? For example, running an instance of chromium like a regular user and then connecting that to an instance of puppeteer in code?
The answer is Yes and No.
You can connect to an existing browser using the connect function:
const browserURL = 'http://127.0.0.1:21222';
const browser = await puppeteer.connect({browserURL});
But if you want to use those two lines, you need to launch Chrome with the --remote-debugging-port=21222 argument.
I believe you need to connect to an address ending with an id:
ws://127.0.0.1:9222/devtools/browser/{id}
When you launch Chrome with --remote-debugging-port, you'll see something like
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
...
DevTools listening on ws://127.0.0.1:9222/devtools/browser/44b3c476-5524-497e-9918-d73fa39e40cf
The address on the last line is what you need, i.e.
const browser = await puppeteer.connect({
browserWSEndpoint: "ws://127.0.0.1:9222/devtools/browser/44b3c476-5524-497e-9918-d73fa39e40cf"
});
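Because the id changes on every launch, copying it by hand gets tedious. Chrome's debugging port also serves a small JSON document at `/json/version` whose `webSocketDebuggerUrl` field holds that same address, so it can be looked up at runtime. The sketch below assumes port 9222 as in the example above; the helper names `extractWsEndpoint`, `fetchWsEndpoint`, and `connectToRunningChrome` are invented for illustration.

```javascript
const http = require('http');

// Pull the browser-level WebSocket URL out of the /json/version response body.
function extractWsEndpoint(body) {
  const info = JSON.parse(body);
  if (typeof info.webSocketDebuggerUrl !== 'string') {
    throw new Error('webSocketDebuggerUrl missing from /json/version response');
  }
  return info.webSocketDebuggerUrl;
}

// Ask a Chrome started with --remote-debugging-port for its current endpoint.
function fetchWsEndpoint(port = 9222) {
  return new Promise((resolve, reject) => {
    http
      .get(`http://127.0.0.1:${port}/json/version`, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => {
          try {
            resolve(extractWsEndpoint(body));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on('error', reject);
  });
}

// Look up the endpoint, then connect Puppeteer to it.
async function connectToRunningChrome(port = 9222) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when used
  const browserWSEndpoint = await fetchWsEndpoint(port);
  return puppeteer.connect({ browserWSEndpoint });
}
```

Passing `browserURL` to `puppeteer.connect` should achieve much the same thing; the explicit lookup is mainly useful when you want to log or reuse the endpoint yourself.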