Launch Tor browser using Puppeteer instead of Chrome on Windows 10 - javascript

I'm on a Windows 10 machine, I've downloaded the Tor browser and using the Tor browser normally works fine, but I'd like to make Puppeteer use Tor to launch in a headless mode, I'm seeing a lot regarding the Socks5 proxy but can't figure out how to set this up and why it's not working? Presumably when running the launch method it launches Tor in the background?
Here's my JS code in node so far...
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
// artificial sleep function
const sleep = async (ms) => {
return new Promise((res, rej) => {
setTimeout(() => {
res()
}, ms)
})
}
// login function
const emulate = async () => {
// initiate a Puppeteer instance with options and launch
const browser = await puppeteer.launch({
headless: false,
args: [
'--proxy-server=socks5://127.0.0.1:1337'
]
});
// launch Facebook and wait until idle
const page = await browser.newPage()
// go to Tor
await page.goto('https://check.torproject.org/');
const isUsingTor = await page.$eval('body', el =>
el.innerHTML.includes('Congratulations. This browser is configured to use Tor')
);
if (!isUsingTor) {
console.log('Not using Tor. Closing...')
return await browser.close()
}
// do something...
}
// kick it off
emulate()
This gives me a ERR_PROXY_CONNECTION_FAILED error in chromium, why isn't it launching using Tor?

There are lot more steps you need to take.
You need to install tor on your system. You might want to use-
brew install tor
Start tor with-
brew services start tor
tor use port 9050 by default, so your proxy should look like this;
--proxy-server=socks5://127.0.0.1:9050
If you must use another port, then it must be added in the torrc file.
Also, you might need to do your //go to tor// before your //launch facebook//

Related

Puppeteer worked in MacOS but not in Windows

I have make a node script that uses puppeteer on MacOS. The script just launches puppeteer and intercept requests.
Here is the part of the code that uses puppeteer:
const getAllUrls = async (rootUrl) => {
const puppeteer = require('puppeteer');
const urls = [];
await puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
if (isRelevantUrl(interceptedRequest.url())) {
urls.push(interceptedRequest.url());
interceptedRequest.abort();
} else {
interceptedRequest.continue();
}
});
await page.goto(rootUrl);
await browser.close()
})
.catch(err => console.log(err));
return urls;
}
While running it on MacOS the script works great. But when I try running it on my office with Windows I get the following error message:
Error: Failed to launch chrome!
TROUBLESHOOTING:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose
(C:\Users........\node_modules\puppeteer\lib\Launcher.js:339:14) at
ChildProcess.helper.addEventListener
(C:\Users........\node_modules\puppeteer\lib\Launcher.js:329:60)
I have tried the following config recommended by puppeteer troubleshooting:
const browser = await puppeteer.launch({
ignoreDefaultArgs: ['--disable-extensions'],
});
But it didn't helped.
I have hard copied the script (without node-modules of course) and paste it on the project at my office. Then did npm i.
The rest of the packages used on the script worked good on Windows as well.
Please help.

How can we run offline tests using cypress in a PWA application?

In our Progressive Web App[PWA] we have to test a couple of Offline business cases using Cypress.
When the users are in offline, they can still perform most of the normal business operations and when they become online, everything will be pushed and sync with the server.
The user is logged in with the Microsoft credentials (SSO)
Any idea how we can run these offline tests using cypress?
Can we use puppeteer for testing the offline case?
context('Actions',()=>{
beforeEach(() => {
cy.reload(true);
cy.loadTokens();
cy.saveLocalStorage();
cy.visit('https://sometest.windows.net/home');
})
it('Verify the creation of something is possible in Offline',()=>{
Cypress.currentTest.retries(1);
cy.viewport(2000, 1000);
cy.wait(3000);
// In offline as a user, I need to perform some of the normal business functions here...
})
})
In Puppeteer you have the option to talk to DevTools and throttling the page with CDPSession class, consider the following example:
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
// Create a new tab
const page = await browser.newPage()
// Connect to Chrome DevTools
const client = await page.target().createCDPSession()
// Set throttling property
await client.send('Network.emulateNetworkConditions', {
'offline': true,
'downloadThroughput': 200 * 1024 / 8,
'uploadThroughput': 200 * 1024 / 8,
'latency': 20
})
// Navigate and take a screenshot
await page.goto('https://sometest.windows.net/home')
await page.screenshot({path: 'screenshot.png'})
await browser.close()
})

Use different ip addresses in puppeteer requests

I have multiple ip interfaces in my server and I can't find how to force puppeteer to use them in its requests
I am using node v10.15.0 and puppeteer 1.11.0
You can use the flag --netifs-to-ignore when launching the browser to specify which interfaces should be ignored by Chrome. Quote from the List of Chromium Command Line Switches:
--netifs-to-ignore: List of network interfaces to ignore. Ignored interfaces will not be used for network connectivity
You can use the argument like this when launching the browser:
const browser = await puppeteer.launch({
args: ['--netifs-to-ignore=INTERFACE_TO_IGNORE']
});
Maybe this will help. You can see the full code here
'use strict';
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch({
// Launch chromium using a proxy server on port 9876.
// More on proxying:
// https://www.chromium.org/developers/design-documents/network-settings
args: [ '--proxy-server=127.0.0.1:9876' ]
});
const page = await browser.newPage();
await page.goto('https://google.com');
await browser.close();
})();

How to use puppeteer to automante Amazon Connect CCP login?

I'm trying use puppeteer to automate the login process for our agents in Amazon Connect however I can't get puppeteer to finish loading the CCP login page. See code below:
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = 'https://ccalderon-reinvent.awsapps.com/connect/ccp#/';
await page.goto(url, {waitUntil: 'domcontentloaded'});
console.log(await page.content());
// console.log('waiting for username input');
// await page.waitForSelector('#wdc_username');
await browser.close();
I can never see the content of the page, it times out. Am I doing something wrong? If I launch the browser with { headless: false } I can see the page never finishes loading.
Please note the same code works fine with https://www.github.com/login so it must be something specific to the source code of Connect's CCP.
In case you are from future and having problem with puppeteer for no reason, try to downgrade the puppeteer version first and see if the issue persists.
This seems like a bug with Chromium Development Version 73.0.3679.0, The error log said it could not load specific script somehow, but we could still load the script manually.
The Solution:
Using Puppeteer version 1.11.0 solved this issue. But if you want to use puppeteer version 1.12.2 but with a different chromium revision, you can use the executablePath argument.
Here are the respective versions used on puppeteer (at this point of answer),
Chromium 73.0.3679.0 - Puppeteer v1.12.2
Chromium 72.0.3582.0 - Puppeteer v1.11.0
Chromium 71.0.3563.0 - Puppeteer v1.9.0
Chromium 70.0.3508.0 - Puppeteer v1.7.0
Chromium 69.0.3494.0 - Puppeteer v1.6.2
I checked my locally installed chrome,which was loading the page correctly,
$(which google-chrome) --version
Google Chrome 72.0.3626.119
Note: The puppeteer team suggested on their doc to specifically use the chrome provided with the code (most likely the latest developer version) instead of using different revisions.
Also I edited the code a little bit to finish loading when all network requests is done and the username input is visible.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
executablePath: "/usr/bin/google-chrome"
});
const page = await browser.newPage();
const url = "https://ccalderon-reinvent.awsapps.com/connect/ccp#/";
await page.goto(url, { waitUntil: "networkidle0" });
console.log("waiting for username input");
await page.waitForSelector("#wdc_username", { visible: true });
await page.screenshot({ path: "example.png" });
await browser.close();
})();
The specific revision number can be obtained in many ways, one is to check the package.json of puppeteer package. The url for 1.11.0 is,
https://github.com/GoogleChrome/puppeteer/blob/v1.11.0/package.json
If you like to automate the chrome revision downloading, you can use browserFetcher to fetch specific revision.
const browserFetcher = puppeteer.createBrowserFetcher();
const revisionInfo = await browserFetcher.download('609904'); // chrome 72 is 609904
const browser = await puppeteer.launch({executablePath: revisionInfo.executablePath})
Result:

How to use proxy in puppeteer and headless Chrome?

Please tell me how to properly use a proxy with a puppeteer and headless Chrome. My option does not work.
const puppeteer = require('puppeteer');
(async () => {
const argv = require('minimist')(process.argv.slice(2));
const browser = await puppeteer.launch({args: ["--proxy-server =${argv.proxy}","--no-sandbox", "--disable-setuid-sandbox"]});
const page = await browser.newPage();
await page.setJavaScriptEnabled(false);
await page.setUserAgent(argv.agent);
await page.setDefaultNavigationTimeout(20000);
try{
await page.goto(argv.page);
const bodyHTML = await page.evaluate(() => new XMLSerializer().serializeToString(document))
body = bodyHTML.replace(/\r|\n/g, '');
console.log(body);
}catch(e){
console.log(e);
}
await browser.close();
})();
You can find an example about proxy at here
'use strict';
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch({
// Launch chromium using a proxy server on port 9876.
// More on proxying:
// https://www.chromium.org/developers/design-documents/network-settings
args: [ '--proxy-server=127.0.0.1:9876' ]
});
const page = await browser.newPage();
await page.goto('https://google.com');
await browser.close();
})();
It's possible with puppeteer-page-proxy.
It supports setting a proxy for an entire page, or if you like, it can set a different proxy for each request. And yes, it works both in headless and headful Chrome.
First install it:
npm i puppeteer-page-proxy
Then require it:
const useProxy = require('puppeteer-page-proxy');
Using it is easy;
Set proxy for an entire page:
await useProxy(page, 'http://127.0.0.1:8000');
If you want a different proxy for each request,then you can simply do this:
await page.setRequestInterception(true);
page.on('request', req => {
useProxy(req, 'socks5://127.0.0.1:9000');
});
Then if you want to be sure that your page's IP has changed, you can look it up;
const data = await useProxy.lookup(page);
console.log(data.ip);
It supports http, https, socks4 and socks5 proxies, and it also supports authentication if that is needed:
const proxy = 'http://login:pass#127.0.0.1:8000'
Repository:
https://github.com/Cuadrix/puppeteer-page-proxy
do not use
"--proxy-server =${argv.proxy}"
this is a normal string instead of template literal
use ` instead of "
`--proxy-server =${argv.proxy}`
otherwise argv.proxy will not be replaced
check this string before you pass it to launch function to make sure it's correct
and you may want to visit http://api.ipify.org/ in that browser to make sure the proxy works normally
if you want to use different proxy for per page, try this, use https-proxy-agent or http-proxy-agent to proxy request for per page
You can use https://github.com/gajus/puppeteer-proxy to set proxy either for entire page or for specific requests only, e.g.
import puppeteer from 'puppeteer';
import {
createPageProxy,
} from 'puppeteer-proxy';
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const pageProxy = createPageProxy({
page,
proxyUrl: 'http://127.0.0.1:3000',
});
await page.setRequestInterception(true);
page.once('request', async (request) => {
await pageProxy.proxyRequest(request);
});
await page.goto('https://example.com');
})();
To skip proxy simply call request.continue() conditionally.
Using puppeteer-proxy Page can have multiple proxies.
You can find proxies list on Private Proxy and use it with the code below
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');
(async() => {
// Proxies List from Private proxies
const proxiesList = [
'http://skrll:au4....',
' http://skrll:au4....',
' http://skrll:au4....',
' http://skrll:au4....',
' http://skrll:au4....',
];
const oldProxyUrl = proxiesList[Math.floor(Math.random() * (proxiesList.length))];
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
const browser = await puppeteer.launch({
headless: true,
ignoreHTTPSErrors: true,
args: [
`--proxy-server=${newProxyUrl}`,
`--ignore-certificate-errors`,
`--no-sandbox`,
`--disable-setuid-sandbox`
]
});
const page = await browser.newPage();
await page.authenticate();
//
// you code here
//
// close proxy chain
await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
})();
You can find the full post here
I see https://github.com/Cuadrix/puppeteer-page-proxy and https://github.com/gajus/puppeteer-proxy recommended above, and I want to emphasize that these two packages are technically not using Chrome instance to perform actual network request, here is what they are doing instead:
when the user code initiates network request of Puppeteer, e.g. calls page.goto(), the proxy package intercepts this outgoing HTTP request and pauses it
the proxy package passes the request to another network library (Got)
Got performs actual network request, through the proxy specified
Got now needs to pass all the network response data back to Puppeteer! This means a bunch of interesting things the proxy package now needs to manage, like copying cookie headers from raw HTTP set-cookie format to puppeteer format
While this might be a viable approach for a lot of cases, you need to understand that this changes your HTTP request TLS fingerprint so your HTTP request might get blocked by some websites, particularly the ones which are using Cloudflare bot detection (because the website now sees that your request originates from Node.js, not from Chrome).
Alternative method of setting proxy in Puppeteer.
Launch args of Chrome are good if you want to use one proxy for all websites. What if you still want to have one Chrome instance use multiple proxies, but you don't want to use 2 packages mentioned above?
createIncognitoBrowserContext Puppeteer function to the rescue:
// Create a new incognito browser context
const context = await browser.createIncognitoBrowserContext({ proxy: 'http://localhost:2022' });
// Create a new page inside context.
const page = await context.newPage();
// authenticate in proxy using basic browser auth
await page.authenticate({username:user, password:password});
// ... do stuff with page ...
await page.goto('https://example.com');
// Dispose context once it's no longer needed.
await context.close();
proxy-chain package
If your proxy requires auth, and you don't like the page.authenticate call, the proxy might be set using proxy-chain npm package.
proxy-chain launches intermediate proxy on your localhost which allows to do some nice things. Read more on technical details of proxy-chain package implementation: https://pixeljets.com/blog/how-to-set-proxy-in-puppeteer
According to my experience, all above fail due to different reasons.
I find that applying proxy on the entire OS works each time. I get no proxy fails. This strategy works on both Windows and Linux.
This way, I get zero puppeteer bot failures. Bear in mind, I am spinning up 7000 bots per server. I am running this on 7 servers.

Categories

Resources