How to disable Google's auto-translate in Puppeteer? - javascript

How can I disable Google's auto-translate in Puppeteer?
I have tried the following:
Using launch arguments:
const browser = await puppeteer.launch({
headless: false,
args: ['--lang=bn-BD,bn']
});
Sending the language as a header:
await page.setExtraHTTPHeaders({
'Accept-Language': 'bn'
});
Overriding the navigator language properties like this:
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, "language", {
get: function() {
return "bn-BD";
}
});
Object.defineProperty(navigator, "languages", {
get: function() {
return ["bn-BD", "bn"];
}
});
});
Browser:
Chromium Version 108.0.5351.0 (Developer Build) (64-bit)
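A commonly suggested alternative (not verified against this exact Chromium build) is to switch off Chrome's translate feature itself instead of trying to make the page and browser languages match. The sketch below builds launch options that pass `--disable-features=Translate`, plus a `translate.enabled: false` preference in the shape expected by puppeteer-extra-plugin-user-preferences. The feature name and the preference key are assumptions based on recent Chromium naming and may differ between versions; `buildNoTranslateOptions` is a hypothetical helper.

```javascript
"use strict";

// Hypothetical helper: launch options and user prefs intended to stop
// Chromium from offering to translate pages.
// Assumption: the "Translate" feature name and the "translate.enabled"
// preference key match the Chromium version in use.
function buildNoTranslateOptions(lang = "bn-BD,bn") {
  return {
    launchOptions: {
      headless: false,
      args: [
        `--lang=${lang}`,
        // Assumption: disables the translate bubble/infobar entirely.
        "--disable-features=Translate",
      ],
    },
    // Passed as { userPrefs } to puppeteer-extra-plugin-user-preferences.
    userPrefs: {
      translate: { enabled: false },
    },
  };
}

// Usage (not executed here; requires puppeteer-extra and the plugin):
// const puppeteer = require("puppeteer-extra");
// const { launchOptions, userPrefs } = buildNoTranslateOptions();
// puppeteer.use(require("puppeteer-extra-plugin-user-preferences")({ userPrefs }));
// const browser = await puppeteer.launch(launchOptions);
```

Keeping the flag and the preference together means that if one of them is ignored by a given build, the other may still take effect.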

Related

Puppeteer not actually downloading ZIP despite Clicking Link

I've been making incremental progress, but I'm fairly stumped at this point.
This is the site I'm trying to download from https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
I'm using Puppeteer because I can't find a supported API for this data (if there is one, I'm happy to try it).
The link is "Download Raw Data".
My script runs to the end, but doesn't seem to actually download any files. I tried installing puppeteer-extra and setting the downloads path:
const puppeteer = require("puppeteer-extra");
const { executablePath } = require('puppeteer')
...
var dir = "/home/ubuntu/AirlineStatsFetcher/downloads";
console.log('dir to set for downloads', dir);
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')
(
{
userPrefs: {
download: {
prompt_for_download: false,
open_pdf_in_system_reader: true,
default_directory: dir,
},
plugins: {
always_open_pdf_externally: true
},
}
}));
const browser = await puppeteer.launch({
headless: true, slowMo: 100, executablePath: executablePath()
});
...
// Doesn't seem to work
await page.waitForSelector('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');
console.log('Clicking on link to download CSV');
await page.click('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');
After a while I figured: why not build the full URL and then do a GET request? But then I ran into other problems (UNABLE_TO_VERIFY_LEAF_SIGNATURE). Before going further down this route (which feels a little hacky), I wanted to ask for advice here.
Is there something I'm missing in terms of configuration to get it to download?
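One piece of configuration the script above does not include: headless Chromium has historically refused downloads unless it is told, through the DevTools Protocol, to allow them and where to save them (the `download.default_directory` user preference may not be honored there). A minimal sketch, assuming Puppeteer around 19.x where `page.target().createCDPSession()` is available; `allowDownloads` and `downloadBehaviorParams` are hypothetical helper names. Note that `page.click()` resolves as soon as the click is dispatched, so the script also has to wait for the file to appear before calling `browser.close()`.

```javascript
"use strict";
const path = require("path");

// Hypothetical helper: parameters for the DevTools Protocol command
// Page.setDownloadBehavior. downloadPath must be an absolute path.
function downloadBehaviorParams(dir) {
  return { behavior: "allow", downloadPath: path.resolve(dir) };
}

// Sketch: call after creating the page and before clicking the link.
// Assumes a Puppeteer Page (puppeteer-extra returns the same Page type).
async function allowDownloads(page, dir) {
  const client = await page.target().createCDPSession();
  await client.send("Page.setDownloadBehavior", downloadBehaviorParams(dir));
  return client;
}

// Usage (not executed here):
// await allowDownloads(page, "/home/ubuntu/AirlineStatsFetcher/downloads");
// await page.click("table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)");
// ...poll the directory until the .zip appears, then await browser.close();
```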
Downloading files with Puppeteer seems to be a moving target and is not well supported today. For now (Puppeteer 19.2.2), I would use https.get instead.
"use strict";
const fs = require("fs");
const https = require("https");
// Not sure why puppeteer-extra is used... maybe https://stackoverflow.com/a/73869616/1258111 solves the need in future.
const puppeteer = require("puppeteer-extra");
const { executablePath } = require("puppeteer");
(async () => {
puppeteer.use(
require("puppeteer-extra-plugin-user-preferences")({
userPrefs: {
download: {
prompt_for_download: false,
open_pdf_in_system_reader: false,
},
plugins: {
always_open_pdf_externally: false,
},
},
})
);
const browser = await puppeteer.launch({
headless: true,
slowMo: 100,
executablePath: executablePath(),
});
const page = await browser.newPage();
await page.goto(
"https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp",
{
waitUntil: "networkidle2",
}
);
const handle = await page.$(
"table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)"
);
const relativeZipUrl = await page.evaluate(
(anchor) => anchor.getAttribute("href"),
handle
);
const url = "https://www.transtats.bts.gov/OT_Delay/".concat(relativeZipUrl);
const encodedUrl = encodeURI(url);
//Don't use in production
https.globalAgent.options.rejectUnauthorized = false;
https.get(encodedUrl, (res) => {
const path = `${__dirname}/download.zip`;
const filePath = fs.createWriteStream(path);
res.pipe(filePath);
filePath.on("finish", () => {
filePath.close();
console.log("Download Completed");
});
});
await browser.close();
})();
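Two refinements to the answer above, sketched as hypothetical helpers: resolving the anchor's `href` with the WHATWG `URL` constructor instead of string concatenation plus `encodeURI` (this handles relative, root-relative and absolute hrefs, and does not double-encode an href that already contains `%` escapes), and wrapping `https.get` in a promise so the script can await the download and see HTTP errors. The `options` argument is forwarded to `https.get`, so a per-request `{ rejectUnauthorized: false }` would confine the insecure TLS setting to this one request instead of the global agent.

```javascript
"use strict";
const fs = require("fs");
const https = require("https");

// Resolve the anchor's href against the URL of the page it came from.
function resolveDownloadUrl(pageUrl, href) {
  return new URL(href, pageUrl).href;
}

// Promise wrapper around https.get. Resolves with destPath once the file
// has been written and closed; rejects on network, stream, or non-2xx errors.
function downloadToFile(url, destPath, options = {}) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, options, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode} for ${url}`));
        return;
      }
      const file = fs.createWriteStream(destPath);
      res.on("error", reject);
      file.on("error", reject);
      // With the default autoClose, "close" fires after all data is flushed.
      file.on("close", () => resolve(destPath));
      res.pipe(file);
    });
    req.on("error", reject);
  });
}

// Usage (not executed here):
// const url = resolveDownloadUrl(page.url(), relativeZipUrl);
// await downloadToFile(url, `${__dirname}/download.zip`);
```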

Playwright - Javascript - Maximize Browser

I am trying to maximize the browser window using Playwright. I tried the code below, but the browser does not open maximized.
hooks.cjs:
const playwright = require('playwright');
const { BeforeAll, Before, After, AfterAll, Status } = require('@cucumber/cucumber');
// Launch options.
const options = {
slowMo: 100,
ignoreHTTPSErrors: true,
};
// Create a global browser for the test session.
BeforeAll(async () => {
console.log('Launch Browser');
global.browser = await playwright['chromium'].launch({ headless: false,
args:['--window-size=1920,1040']}); // --start-maximized //defaultViewport: null
//global.context = await playwright['chromium'].launch({ args: ['--start-maximized'] });
})
// Create a new browser context for each test.
Before(async () => {
console.log('Create a new Context and Page')
global.context = await global.browser.newContext()
global.page = await global.context.newPage()
})
// Close the page and context after each test.
After(async () => {
console.log('Close Context and Page');
await global.page.close();
await global.context.close();
})
// Take Screenshot if Scenario Fails
After(async function (scenario) {
if (scenario.result.status === Status.FAILED) {
const buffer = await global.page.screenshot({ path: `reports/${scenario.pickle.name}.png`, fullPage: true })
this.attach(buffer, 'image/png');
}
})
// Close Browser
AfterAll(async () => {
console.log('close Browser')
await global.browser.close()
});
I'm running with npm: npm run test -- --tags "@Begin"
I have tried many approaches, but I cannot launch the browser maximized. How can I do this?
If you want to start your browser maximized, use the --start-maximized flag when launching the browser, and disable the fixed viewport when creating a context or page. Example:
global.browser = await playwright['chromium'].launch({ headless: false,
args:['--start-maximized']});
global.context = await global.browser.newContext({ viewport: null})
In Python, use the equivalent no_viewport argument and pass True when creating a context.
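The two settings have to be used together: the window flag sizes the OS window, while `viewport: null` stops Playwright from applying its default 1280x720 viewport, which is why passing only `--window-size` in the question still leaves the page small. A minimal sketch of a hypothetical `maximizedOptions` helper for the hooks file; falling back to an explicit `--window-size` on platforms where `--start-maximized` has no effect is an assumption, not something the answer states.

```javascript
"use strict";

// Hypothetical helper: Playwright launch and context options for a
// maximized Chromium window. With no size given, --start-maximized is used;
// with a size, an explicit --window-size is used instead.
function maximizedOptions({ width, height } = {}) {
  const args =
    width && height ? [`--window-size=${width},${height}`] : ["--start-maximized"];
  return {
    launch: { headless: false, args },
    // viewport: null lets the page follow the real window size.
    context: { viewport: null },
  };
}

// Usage in hooks.cjs (not executed here):
// const { launch, context } = maximizedOptions();
// global.browser = await playwright.chromium.launch(launch);
// global.context = await global.browser.newContext(context);
```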

Playwright + Brave, error when navigating to settings page

Unfortunately, Brave does not have a launch arg for certain settings I need to override. So I am trying to navigate to the settings page manually.
import { chromium } from "playwright-extra";
import stealth from "puppeteer-extra-plugin-stealth";
export default async function createStealthBrowser(
braveExecutablePath: string
) {
chromium.use(stealth());
let browser = await chromium.launch({
headless: false,
executablePath: braveExecutablePath,
});
let context = await browser.newContext({ bypassCSP: true });
let page = await context.newPage();
await page.goto("brave://settings/shields");
return browser;
}
And here is my test:
import { test, expect } from "@playwright/test";
import createStealthBrowser from "../StealthBrowser";
test("Create a stealth browser", async () => {
let browser = await createStealthBrowser(
"C:/Program Files/BraveSoftware/Brave-Browser/Application/brave.exe"
);
await browser.close();
});
The test fails with the following output:
page.goto: Navigation failed because page was closed!
=========================== logs ===========================
navigating to "brave://settings/shields", waiting until "load"
============================================================

Problem overwriting navigator.languages in puppeteer

I have a problem I can't solve while trying to make Puppeteer fully anonymous.
So far I pass all the anti-bot tests, but I can't configure the language. Let me explain:
By overriding the user agent, I managed to change "navigator.language" from "en-US, en" to "es-ES, es".
However, no matter what I try, I cannot override "navigator.languages"; it always remains "en-US, en".
I hope someone can help me change the languages.
I have attached screenshots and links to the plugin I use.
https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth
https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const stealth_plugin = require("puppeteer-extra-plugin-stealth");
const stealth = stealth_plugin();
puppeteer.use(stealth);
const UserAgentOverride = require("puppeteer-extra-plugin-stealth/evasions/user-agent-override");
const ua = UserAgentOverride({locale: "es-ES,es;q=0.9", userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36", platform: "MacIntel"});
const path = require('path')
const websites = require('./websites.json')
async function run() {
puppeteer.use(ua);
const browser = await puppeteer.launch({
headless: false,
userDataDir: "./cache",
ignoreHTTPSErrors: true,
ignoreDefaultArgs: [
"--disable-extensions",
"--enable-automation",
],
args: [
"--lang=es-ES,es;q=0.9",
"--no-sandbox",
"--disable-dev-shm-usage",
"--disable-gpu"
]
})
console.log(await browser.userAgent());
const page = await browser.newPage()
const pathRequire = path.join(__dirname, 'src/scripts/index.js')
for (const website of websites) {
require(pathRequire)(page, website)
}
}
run().catch(error => { console.error("Something bad happened...", error); });
Hi there,
Thanks for the answer. After testing the edited code, I noticed the following:
When I launch the browser, the configuration disappears as soon as any URL is entered.
However, if I don't enter a URL, it passes the test perfectly.
Without a URL everything is configured correctly. I attached two images, one with a URL and one without. I don't understand what else I can do; I have tried everything.
Object.getOwnPropertyDescriptors (navigator.languages)
With the languages evasion, it is writable:
[value] => en-US
[writable] => 1
[enumerable] => 1
[configurable] => 1
while it should be
configurable: false
enumerable: true
value: "es-ES"
writable: false
I have managed to keep the specified languages every time a new page is opened, but I still cannot reproduce the default property attributes of a regular Chrome browser:
Object.getOwnPropertyDescriptors (navigator.languages)
while it should be
configurable: false
enumerable: true
value: "es-ES"
writable: false
If anyone knows how to solve this I would appreciate it.
const websites = require('./websites.json')
async function run() {
puppeteer.use(ua);
const optionslaunch = require("./src/scripts/options/optionslaunch");
const browser = await puppeteer.launch(optionslaunch)
const page = await browser.newPage()
// Set the language forcefully on javascript
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, "language", {
get: function () {
return "es-ES";
}
});
Object.defineProperty(navigator, "languages", {
get: function () {
return ["es-ES", "es"];
}
});
});
const pathRequire = path.join(__dirname, 'src/scripts/app.js')
for (const website of websites) {
// require(pathRequire)(page, pageEmail, website)
require(pathRequire)(page, website)
}
}
run().catch(error => { console.error("Something bad happened...", error); });
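`Object.getOwnPropertyDescriptors(navigator.languages)` describes the elements of the returned array (its `"0"`, `"1"` and `length` properties), not the `languages` property itself. The HTML specification declares `navigator.languages` as a `FrozenArray`, so in an unmodified browser those elements are `writable: false, configurable: false`, whereas the getter above returns a fresh, mutable array. A minimal sketch of a hypothetical `overrideLanguages` page function that returns one frozen array, defined on `Navigator.prototype` (where native attributes live) rather than as an own property of `navigator`. How it interacts with the stealth plugin's own languages evasion is untested; the `target` parameter exists only so the function can be exercised outside a browser.

```javascript
"use strict";

// Page-side override for page.evaluateOnNewDocument. It must be
// self-contained because Puppeteer serializes it to a string.
// In the browser, `target` is omitted and defaults to Navigator.prototype.
function overrideLanguages(langs, target = Navigator.prototype) {
  // Freeze once and return the same array on every access, so the
  // elements report writable: false and configurable: false.
  const frozen = Object.freeze(langs.slice());
  Object.defineProperty(target, "languages", {
    get() {
      return frozen;
    },
    enumerable: true,
    configurable: true,
  });
  Object.defineProperty(target, "language", {
    get() {
      return frozen[0];
    },
    enumerable: true,
    configurable: true,
  });
}

// Usage (not executed here); keeping the Accept-Language header in sync
// avoids a mismatch between HTTP and JavaScript language signals:
// await page.evaluateOnNewDocument(overrideLanguages, ["es-ES", "es"]);
// await page.setExtraHTTPHeaders({ "Accept-Language": "es-ES,es;q=0.9" });
```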

page.setCookie has no effect when using Puppeteer on Ubuntu

I am having a strange issue with Puppeteer. My current cookie code is as follows:
(Save cookies)
const cookies = await page.cookies();
await checkMongoConnection();
account.cookies = JSON.stringify(cookies, null, null);
await account.save();
await closeMongoConnection();
(Load cookies)
const options = {
headless: true,
defaultViewport: { width: 1366, height: 768 },
ignoreHTTPSErrors: true,
args: [
'--disable-sync',
'--ignore-certificate-errors'
],
ignoreDefaultArgs: ['--enable-automation']
};
const browser = await puppeteer.launch(options);
const page = await browser.newPage();
// Cookies
if (account.cookies) {
// I have checked this with console.log it does contain cookies
const cookies = JSON.parse(account.cookies);
await page.setCookie(...cookies);
}
await page.goto('https://www.some-website.com');
This works without any issues on macOS (with headless set to both false and true). Note that I am using Chromium.
However, when I run the same setup on my Ubuntu Linux server, setting the cookies has no effect. Has anyone else come across this issue before? Any ideas what I might be doing wrong here?
When I log the cookies I get from the database I get something like:
[
{
name: 'personalization_id',
value: '"v1_FijCjdT7iRj3K+cbhPiPIg=="',
domain: '.somedomain.com',
path: '/',
expires: 1664967308.337118,
size: 47,
httpOnly: false,
secure: true,
session: false,
sameSite: 'None'
}, ....
I have tried logging the cookies after they are set:
// Cookies
if (account.cookies) {
const cookies = JSON.parse(account.cookies);
await page.setCookie(...cookies);
}
console.log('check cookies');
const newCookies = await page.cookies();
console.log(newCookies);
This just results in an empty array [], so it seems the cookies are not being set.
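Two things may be worth checking. First, `page.cookies()` with no arguments returns cookies for the page's current URL, which is `about:blank` before `page.goto`, so the empty array may not prove the cookies were rejected; `page.cookies("https://www.some-website.com")` checks the target site instead. Second, `expires` is a Unix timestamp in seconds, and a persistent cookie whose expiry is already past (or past according to the server's clock) will not be kept. A minimal diagnostic sketch, using a hypothetical `prepareCookies` helper that reports expired cookies and strips the read-only `size` and `session` fields before calling `page.setCookie`:

```javascript
"use strict";

// Hypothetical helper: splits cookies saved with page.cookies() into ones
// worth passing to page.setCookie() and names of ones already expired.
function prepareCookies(cookies, nowSeconds = Date.now() / 1000) {
  const usable = [];
  const expired = [];
  for (const { size, session, ...cookie } of cookies) {
    // Session cookies are reported with expires === -1 and session: true.
    const persistent =
      !session && typeof cookie.expires === "number" && cookie.expires > 0;
    if (persistent && cookie.expires <= nowSeconds) {
      expired.push(cookie.name);
    } else {
      usable.push(cookie);
    }
  }
  return { usable, expired };
}

// Usage (not executed here):
// const { usable, expired } = prepareCookies(JSON.parse(account.cookies));
// if (expired.length) console.warn("Already expired:", expired);
// await page.setCookie(...usable);
// console.log(await page.cookies("https://www.some-website.com"));
```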
