This is a continuation of this thread: Is there a way in TestCafe to validate Chrome network calls?
Here is my TestCafe attempt to retrieve all the network logs (i.e. the Network tab in developer tools) from Chrome. I am having issues getting anything printed to the console.
const logger = RequestLogger('https://studenthome.com',{
logRequestHeaders: true,
logRequestBody: true,
logResponseHeaders: true,
logResponseBody: true
});
test
('My test - Demo', async t => {
await t.navigateTo('https://appURL.com/app/home/students');//navigate to app launch
await page_students.click_studentNameLink();//click on student name
await t
.expect(await page_students.exists_ListPageHeader()).ok('do something async', { allowUnawaitedPromise: true }) //validate list header
await t
.addRequestHooks(logger) //start tracking requests
let url = await page_studentList.click_tab();//click on the tab for which requests need to be validated
let c = await logger.count; //check count of request. Should be 66
await console.log(c);
await console.log(logger.requests[2]); // get the url for 2nd request
});
I see this in console:
[Function: count]
undefined
Here is a picture from Google as an illustration of what I am trying to achieve. I navigated to google.com and opened developer tools > Network tab. Then I clicked on the store link and captured the logs. The request URLs I am trying to collect are highlighted. I can get all the URLs and then filter down to the one I require.
I have already tried the following:
await console.log(logger.requests); // undefined
await console.log(logger.requests[*]); // undefined
await console.log(logger.requests[0].response.headers); //undefined
await logger.count();//count not a function
I would appreciate it if someone could point me in the right direction.
You are using different URLs in your test page ('https://appURL.com/app/home/students') and your logger ('https://studenthome.com'). This is probably the cause.
Your RequestLogger records only requests to 'https://studenthome.com'.
In your screenshot I see the URL 'http://store.google.com', which differs from the logger URL, so the logger does not process it.
You can pass a RegExp as the first argument of the RequestLogger constructor to log all requests that match it.
I have created a sample:
import { RequestLogger } from 'testcafe';
const logger = RequestLogger(/google/, {
logRequestHeaders: true,
logRequestBody: true,
logResponseHeaders: true,
logResponseBody: true
});
fixture `test`
.page('http://google.com');
test('test', async t => {
await t.addRequestHooks(logger);
await t.typeText('input[name="q"]', 'test');
await t.typeText('input[name="q"]', '1');
await t.typeText('input[name="q"]', '2');
await t.pressKey('enter');
console.log(logger.requests.length); // number of logged requests
console.log(logger.requests.map(r => r.request.url));
});
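Building on this sample: once the logger has recorded traffic, logger.requests is a plain array you can inspect, and logger.count takes a predicate and returns a promise you can assert on (which also makes TestCafe wait for matching requests). Inside the same test, a sketch of what the original question was after (the URL predicate is just an example):
// Assert (and implicitly wait) until at least one matching request is logged.
await t.expect(logger.count(r => r.request.url.includes('google'))).gte(1);

// logger.requests is a plain array of logged records.
console.log(logger.requests.length);                  // total number of requests
console.log(logger.requests.map(r => r.request.url)); // all logged request URLs
console.log(logger.requests[0].response.headers);     // response headers of the first request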
I am trying to run Puppeteer with the proxy-chain package on AWS Lambda, but I am getting this error message:
"errorType": "Error",
"errorMessage": "Protocol error (Target.createTarget): Target closed.",
Code:
const chromium = require('chrome-aws-lambda');
const { addExtra } = require("puppeteer-extra");
const puppeteerExtra = addExtra(chromium.puppeteer);
const proxyChain = require('proxy-chain');
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteerExtra.use(StealthPlugin());
exports.handler = async (event, context, callback) => {
let finalResult = [];
const url = ``;
let browser;
const oldProxyUrl = ''; // --> bright data proxy
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
console.log("newProxyUrl", newProxyUrl)
try {
browser = await puppeteerExtra.launch({
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${newProxyUrl}`],
defaultViewport: chromium.defaultViewport,
executablePath: await chromium.executablePath,
headless: chromium.headless
});
const page = await browser.newPage();
await page.goto(url);
finalResult = await extractElements(page);
} catch (error) {
return callback(error);
} finally {
await browser.close();
}
return callback(null, finalResult);
};
The above code works fine on AWS Lambda without the proxy-server URL. I also tested the same code without the proxy-server URL on serverless platforms like Vercel and Netlify, and it worked. The only issue is that when I add the proxy-server URL it throws the protocol error.
Here are a few things you can try to troubleshoot this issue (a minimal sketch combining the first three checks follows the list):
Make sure that the url variable has a value. This is currently an empty string, which means that the page.goto() method will not have a valid URL to navigate to.
Make sure that the oldProxyUrl variable has a value. This is currently an empty string, which means that the proxyChain.anonymizeProxy() method will not have a valid proxy to anonymize.
Make sure that the extractElements() function is defined and can be called. This function is not present in the code you provided, so you may need to include it or modify the code to remove the call to this function.
Check the logs of your AWS Lambda function to see if there are any additional error messages that might provide more information about the issue.
Check the documentation for the puppeteer-extra-plugin-stealth and proxy-chain packages to see if there are any known issues or compatibility issues with AWS Lambda.
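Here is a minimal sketch of the handler with those guard clauses added. The TARGET_URL and PROXY_URL environment variables and the page.title() stand-in for extractElements are assumptions for illustration, the stealth plugin is omitted for brevity, and the proxyChain.closeAnonymizedProxy cleanup is an extra suggestion rather than part of the original code:
const chromium = require('chrome-aws-lambda');
const { addExtra } = require('puppeteer-extra');
const proxyChain = require('proxy-chain');

const puppeteerExtra = addExtra(chromium.puppeteer);

exports.handler = async (event, context, callback) => {
  const url = process.env.TARGET_URL;        // assumed: page to scrape
  const oldProxyUrl = process.env.PROXY_URL; // assumed: upstream proxy incl. credentials

  // Guard clauses: fail fast with a clear message instead of a cryptic protocol error.
  if (!url) return callback(new Error('TARGET_URL is empty'));
  if (!oldProxyUrl) return callback(new Error('PROXY_URL is empty'));

  const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
  let browser;

  try {
    browser = await puppeteerExtra.launch({
      args: [...chromium.args, `--proxy-server=${newProxyUrl}`],
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    });
    const page = await browser.newPage();
    await page.goto(url);
    const finalResult = await page.title(); // stand-in for extractElements(page)
    return callback(null, finalResult);
  } catch (error) {
    return callback(error);
  } finally {
    if (browser) await browser.close();
    // Shut down the local anonymizing proxy so connections do not linger in the container.
    await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
  }
};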
I just started coding, and I was wondering if there is a way to open multiple tabs concurrently. Currently, my code goes something like this:
const puppeteer = require("puppeteer");
const rand_url = "https://www.google.com";
async function initBrowser() {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto(rand_url);
await page.setViewport({
width: 1200,
height: 800,
});
return page;
}
async function login(page) {
await page.goto("https://www.google.com");
await page.waitFor(100);
await page.type("input[id ='user_login'", "xxx");
await page.waitFor(100);
await page.type("input[id ='user_password'", "xxx");
}
This is not my exact code (I've replaced things with different aliases), but you get the idea. I was wondering if anyone out there knows how to open multiple instances of this exact same browser flow, replacing only the respective login info in each. Of course, it would be great to prevent my IP from getting banned too, so if there were a way to apply a proxy to each respective "browser"/instance, that would be perfect.
Lastly, I would like to know whether Playwright or Puppeteer is better suited to handling these multiple instances. I don't even know if this is possible, but please enlighten me. I want to learn more.
You can use multiple browser windows with different logins/cookies.
For simplicity, you can use the puppeteer-cluster module by Thomas Dondorf.
This module launches and queues your Puppeteer tasks one by one, so you can use it to automate your logins and even save the login cookies for later launches.
Feel free to check out the GitHub repo: https://github.com/thomasdondorf/puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2, // <= number of parallel tasks running simultaneously
    // You could set this to the number of CPUs instead, e.g.:
    // maxConcurrency: require('os').cpus().length
  });

  await cluster.task(async ({ page, data: [username, password] }) => {
    await page.goto('https://www.example.com');
    await page.waitForTimeout(100);
    await page.type('input[id="user_login"]', username);
    await page.waitForTimeout(100);
    await page.type('input[id="user_password"]', password);
    const screen = await page.screenshot();
    // Store the screenshot, save cookies, do something else
  });

  // Each queued [username, password] array is passed into the cluster task function.
  cluster.queue(['myFirstUsername', 'PassW0Rd1']);
  cluster.queue(['anotherUsername', 'Secr3tAgent!']);
  // cluster.queue([username, password]) // ...many more pages/accounts

  await cluster.idle();
  await cluster.close();
})();
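Regarding the proxy part of the question, one simple option is to forward Chromium launch arguments through the cluster's puppeteerOptions so that every worker goes through the same proxy. This is a sketch with a placeholder proxy address and credentials, not something from the original answer:
const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
    // Launch options are forwarded to puppeteer.launch(); here every worker
    // is routed through the same (placeholder) proxy server.
    puppeteerOptions: {
      args: ['--proxy-server=http://myproxy.example.com:8000'],
    },
  });

  await cluster.task(async ({ page, data: [username, password] }) => {
    // If the proxy needs credentials, authenticate before navigating.
    await page.authenticate({ username: 'proxyUser', password: 'proxyPass' });
    await page.goto('https://www.example.com');
    // ...log in with username/password as in the task above...
  });

  cluster.queue(['myFirstUsername', 'PassW0Rd1']);
  await cluster.idle();
  await cluster.close();
})();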
Playwright is sadly still unsupported by the module above; for it, you can use a browser-pool (cluster) module to automate the Playwright launcher.
For proxy usage, I recommend the Puppeteer library.
There are profiling and proxy options; you could combine them to achieve your goal:
Profile, https://playwright.dev/docs/api/class-browsertype#browser-type-launch-persistent-context
import { chromium } from 'playwright'
const userDataDir = '/tmp/' + process.argv[2]
const browserContext = await chromium.launchPersistentContext(userDataDir)
// ...
Proxy, https://playwright.dev/docs/api/class-browsertype#browser-type-launch
import { chromium } from 'playwright'
const proxy = { /* secret */ }
const browser = await chromium.launch({
  // browser-level proxy placeholder; the per-context proxy below overrides it
  proxy: { server: 'pre-context' }
})
const browserContext = await browser.newContext({
  proxy: {
    server: `http://${proxy.ip}:${proxy.port}`,
    username: proxy.username,
    password: proxy.password,
  }
})
// ...
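To combine the two for the multi-account goal, here is a minimal sketch (my combination of the two options above, with placeholder profile directories and proxy values): launchPersistentContext accepts the same proxy option, so each persistent profile can get its own proxy.
import { chromium } from 'playwright'

// One profile directory and one proxy per account; all values are placeholders.
const accounts = [
  { dir: '/tmp/profile-1', proxy: { server: 'http://1.2.3.4:8000', username: 'u1', password: 'p1' } },
  { dir: '/tmp/profile-2', proxy: { server: 'http://5.6.7.8:8000', username: 'u2', password: 'p2' } },
]

for (const { dir, proxy } of accounts) {
  // Each context keeps its own cookies/storage in `dir` and routes traffic through its own proxy.
  const context = await chromium.launchPersistentContext(dir, { proxy })
  const page = await context.newPage()
  await page.goto('https://www.example.com')
  // ...log in, do the work for this account...
  await context.close()
}
The loop handles the accounts one after another; you could launch them with Promise.all instead if you want them truly concurrent.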
Here is the error that it is returning.
Here is a picture of the error I'm getting when I run npm install puppeteer. I found some information online about permissions, but this is about node_package space. It is not disk space, as I've looked over my disk storage availability and there's plenty. I'm working with the Apify SDK and following the documentation, but the console is returning a whole bunch of error messages.
Can someone please help?
const Apify = require('apify')
Apify.main(async () => {
const requestQueue = await Apify.openRequestQueue();
await requestQueue.addRequest({ url: 'https://www.iana.org/' });
const crawler = new Apify.PuppeteerCrawler({
requestQueue,
handlePageFunction: async ({ request, page }) => {
const title = await page.title();
console.log(`Title of ${request.url}: ${title}`);
await Apify.utils.enqueueLinks({
requestQueue,
page,
pseudoUrls: ['https://www.iana.org/[.*]'],
});
},
});
await crawler.run();
});
EDIT for Mission Clarity: In the end I am pulling inventory data and customer data from Postgres to render and send a bunch of PDFs to customers, once per month.
These PDFs are dynamic in that the cover page will have a varying customer name/address. The next page(s) are also dynamic, as they are lists of a particular customer's expiring inventory with item/expiry date/serial number.
I had made a client-side React page with print CSS to render some print-layout letters that could be printed off/saved as a pretty PDF.
Then the waterfall spec came in that this was to be an automated process on the server. Basically, the PDF needs to be attached to an email alerting customers of expiring product (this is the medical industry, where everything needs to be audited).
I thought using Puppeteer would be a nice and easy switch. Just add a route that processes all customers, looks up whatever may be expiring, and then passes that into the dynamic React page to be rendered headlessly to a PDF file (and eventually finishes the rest of the plan: sending the email, etc.). Right now I just grab 10 customers and their expiring stock for a PoC, so I have basically: { customer: {}, expiring: [] }.
I've attempted POSTing to the page with request interception, but I guess it makes sense that I cannot read the POST data in the browser.
So I switched my approach to using cookies. I would expect this to work, but I can never read the cookie(s) in the page.
Here is a simple route, a simple Puppeteer script that writes the cookies out to a JSON file and takes a screenshot just for proof, and the simple HTML with an inline script I'm using just to try to prove I can pass data along.
server/index.js:
app.get('/testing', async (req, res) => {
console.log('GET /testing');
res.sendFile(path.join(__dirname, 'scratch.html'));
});
scratch.js (run at the command line with node ./scratch.js):
const puppeteer = require('puppeteer')
const fs = require('fs');
const myCookies = [{name: 'customer', value: 'Frank'}, {name: 'expiring', value: JSON.stringify([{a: 1, b: 'three'}])}];
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://localhost:1234/testing', { waitUntil: 'networkidle2' });
await page.setCookie(...myCookies);
const cookies = await page.cookies();
const cookieJson = JSON.stringify(cookies);
// Writes expected cookies to file for sanity check.
fs.writeFileSync('scratch_cookies.json', cookieJson);
// FIXME: Cookies never get appended to page.
await page.screenshot({path: 'scratch_shot.png'});
await browser.close();
})();
server/scratch.html:
<html>
<body>
</body>
<script type='text/javascript'>
document.write('Cookie: ' + document.cookie);
</script>
</html>
The result is just a PNG with the word "Cookie:" on it. Any insight appreciated!
This is the actual route I'm using, where makeExpiryLetter uses Puppeteer, but I can't seem to get it to actually read the customer and rows data.
app.get('/create-expiry-letter', async (req, res) => {
// Create PDF file using puppeteer to render React page w/ data.
// Store in Db.
// Email file.
// Send final count of letters sent back for notification in GUI.
const cc = await dbo.getConsignmentCustomers();
const result = await Promise.all(cc.rows.map(async x => {
// Get 0-60 day consignments by customer_id;
const { rows } = await dbo.getExpiry0to60(x.customer_id);
if (rows && rows.length > 0) {
const expiryLetter = await makeExpiryLetter(x, rows); // Uses puppeteer.
// TODO: Store in Db / Email file.
return true;
} else {
return false;
}
}));
res.json({ emails_sent: result.filter(x => x === true).length });
});
Thanks to the samples from @ggorlen, I've made huge headway in using cookies. In the inline script of expiry.html I'm grabbing the values by wrapping my render function in a function main() and adding an onload handler to the body tag: <body onload='main()'>.
Inside the main function we can grab the values I needed:
const customer = JSON.parse(document.cookie.split('; ').find(row => row.startsWith('customer')).split('=')[1]);
const expiring = JSON.parse(document.cookie.split('; ').find(row => row.startsWith('expiring')).split('=')[1]);
FINALLY (and yes, of course this will all be used in an automated worker in the end) I can get my beautifully rendered PDF like so:
(async () => {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setCookie(...myCookies);
await page.goto('http://localhost:1234/testing');
await page.pdf({ path: `scratch-expiry-letter.pdf`, format: 'letter' });
await browser.close();
})();
The problem is here:
await page.goto('http://localhost:1234/testing', { waitUntil: 'networkidle2' });
await page.setCookie(...myCookies);
The first line says, go to the page. Going to a page involves parsing the HTML and executing scripts, including your document.write('Cookie: ' + document.cookie); line in scratch.html, at which time there are no cookies on the page (assuming a clear browser cache).
After the page is loaded, await page.goto... returns and the line await page.setCookie(...myCookies); runs. This correctly sets your cookies and the remaining lines execute. const cookies = await page.cookies(); runs and pulls the newly-set cookies out and you write them to disk. await page.screenshot({path: 'scratch_shot.png'}); runs, taking a shot of the page without the DOM updated with the new cookies that were set after the initial document.write call.
You can fix this problem by turning your JS on the scratch.html page into a function that can be called after page load and cookies are set, or injecting such a function dynamically with Puppeteer using evaluate:
const puppeteer = require('puppeteer');
const myCookies = [
{name: 'customer', value: 'Frank'},
{name: 'expiring', value: JSON.stringify([{a: 1, b: 'three'}])}
];
(async () => {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('http://localhost:1234/testing');
await page.setCookie(...myCookies);
// now that the cookies are ready, we can write to the document
await page.evaluate(() => document.write('Cookie: ' + document.cookie));
await page.screenshot({path: 'scratch_shot.png'});
await browser.close();
})();
A more general approach is to set the cookies before navigation. This way, the cookies will already exist when any scripts that might use them run.
const puppeteer = require('puppeteer');
const myCookies = [
{
name: 'expiring',
value: '[{"a":1,"b":"three"}]',
domain: 'localhost',
path: '/',
expires: -1,
size: 29,
httpOnly: false,
secure: false,
session: true,
sameParty: false,
sourceScheme: 'NonSecure',
sourcePort: 80
},
{
name: 'customer',
value: 'Frank',
domain: 'localhost',
path: '/',
expires: -1,
size: 13,
httpOnly: false,
secure: false,
session: true,
sameParty: false,
sourceScheme: 'NonSecure',
sourcePort: 80
}
];
(async () => {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setCookie(...myCookies);
await page.goto('http://localhost:1234/testing');
await page.screenshot({path: 'scratch_shot.png'});
await browser.close();
})();
That said, I'm not sure if cookies are the easiest or best way to do what you're trying to do. Since you're serving HTML, you could pass the data along with it statically, expose a separate API route to collect a customer's data which the front end can use, or pass GET parameters, depending on the nature of the data and what you're ultimately trying to accomplish.
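For example, here is a minimal sketch of the GET-parameter idea, reusing the /testing route and sample data from the question (the data query parameter name and the encoding scheme are my assumptions):
const puppeteer = require('puppeteer');

const data = {
  customer: 'Frank',
  expiring: [{ a: 1, b: 'three' }],
};

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();

  // Pass the payload as a URL-encoded query parameter instead of a cookie.
  const qs = encodeURIComponent(JSON.stringify(data));
  await page.goto(`http://localhost:1234/testing?data=${qs}`);

  await page.pdf({ path: 'scratch-expiry-letter.pdf', format: 'letter' });
  await browser.close();
})();
In the page's inline script you can then read it back with something like JSON.parse(new URLSearchParams(location.search).get('data')).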
You could even have a file upload form on the React app, then have Puppeteer upload the JSON data into the app programmatically through that form.
In fact, if your final goal is to dynamically generate a PDF, using React and Puppeteer might be overkill, but I'm not sure I have a better solution to offer without some research and additional context about your use case.
Currently I have my Puppeteer running with a proxy on Heroku. Locally the proxy relay works totally fine; on Heroku, however, I get the error Error: net::ERR_TUNNEL_CONNECTION_FAILED. I've set all the .env info in the Heroku config vars, so it is all available.
Any idea how I can fix this error and resolve the issue?
I currently have
const browser = await puppeteer.launch({
args: [
"--proxy-server=https=myproxy:myproxyport",
"--no-sandbox",
'--disable-gpu',
"--disable-setuid-sandbox",
],
timeout: 0,
headless: true,
});
I have also looked at page.authenticate.
The correct format for the --proxy-server argument is:
--proxy-server=HOSTNAME:PORT
If the proxy requires authentication, you can pass the username and password using page.authenticate before doing any navigation:
page.authenticate({username:'user', password:'password'});
The complete code would look like this:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    ignoreHTTPSErrors: true,
    args: ['--no-sandbox', '--proxy-server=HOSTNAME:PORT']
  });
  const page = await browser.newPage();

  // Authenticate here with your proxy credentials
  await page.authenticate({ username: 'user', password: 'password' });

  await page.goto('https://www.example.com/');
})();
Proxy Chain
If the authentication somehow does not work using the method above, you might want to handle the authentication somewhere else.
There are multiple packages to do that; one is proxy-chain. With it, you can take one proxy and use it to run a new local proxy server.
proxyChain.anonymizeProxy(proxyUrl) takes one proxy with a username and password and creates a new, credential-free proxy that you can use in your script.
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async () => {
  const oldProxyUrl = 'http://username:password@hostname:8000';
  const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

  // Prints something like "http://127.0.0.1:12345"
  console.log(newProxyUrl);

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${newProxyUrl}`],
  });

  // Do your magic here...
  const page = await browser.newPage();
  await page.goto('https://www.example.com');
})();
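A follow-up worth noting (my addition, not part of the original answer): anonymizeProxy starts a local proxy server that keeps running, so when you are finished you can close the browser and shut it down at the end of the IIFE above:
  // After the work is finished:
  await browser.close();
  // Close the local anonymizing proxy and any connections it still holds.
  await proxyChain.closeAnonymizedProxy(newProxyUrl, true);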