Send message to console.log (jest puppeteer) - javascript

Why can't I see my console.log messages when I log inside page.evaluate, page.$, page.$$, page.$eval, or page.$$eval?
And why can't I access variables defined outside of those callbacks?
let variable = 0;
const divColors = await page.evaluate(() => {
const divs = Array.from(document.querySelectorAll('.map-filters div'));
let text = divs.map((element, index) => {
console.log(element.textContent)
variable =1;
return element.style.color;
})
return text;
})
Why can't I set variable = 1 or call console.log(element.textContent) in this example?

You're using console.log inside page.evaluate, so its output goes to the Chromium browser's console and not to Node's output. To see browser console messages in Node's console, subscribe to them after the page object is created and before console.log is called in the page:
const page = await browser.newPage();
page.on('console', consoleObj => console.log(consoleObj.text()));
page.evaluate(...);
As for the variable named variable, there are in fact two of them in your script.
The first one exists in the Node.js context:
let variable = 0;
And the other one exists in the web page context:
page.evaluate( () => {
variable = 1;
})
They are completely different. Think of page.evaluate as a portal into another world: objects that exist there are only present inside the JavaScript runtime of the page open in the browser that Puppeteer is driving. Node has its own runtime with its own set of objects.
You may pass data into page.evaluate from node:
let variable = 420;
page.evaluate(variable => {
// it is now passed here from outside
console.log(variable)
}, variable);
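To make the two-runtime picture concrete without launching a browser, here is a minimal sketch that imitates the boundary with Node's built-in vm module. The helper evaluateInSandbox is hypothetical and is not how Puppeteer is implemented; it only assumes, as described above, that the function runs in a separate runtime and that arguments and results cross over as serialized data.

```javascript
const vm = require('node:vm');

// Hypothetical stand-in for page.evaluate (NOT Puppeteer's implementation):
// the function's source text runs in a separate context, and the arguments
// and return value cross the boundary as JSON.
function evaluateInSandbox(fn, ...args) {
  const sandbox = vm.createContext({});
  const source = `(${fn.toString()})(...${JSON.stringify(args)})`;
  const result = vm.runInContext(source, sandbox);
  // Round-trip through JSON to mimic serialization of the return value.
  const json = JSON.stringify(result);
  return json === undefined ? undefined : JSON.parse(json);
}

let variable = 0;

// The assignment inside the sandboxed function creates a separate `variable`
// in the sandbox; the Node-side one is untouched.
const inside = evaluateInSandbox(() => {
  variable = 1;
  return variable;
});

// Passing the value as an argument is how data gets in.
const doubled = evaluateInSandbox((v) => v * 2, 420);

console.log(inside, variable, doubled); // 1 0 840
```

The Node-side variable stays 0 because the function is never called in Node; only its source text travels to the other context.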

Related

Get all span elements within a div using puppeteer [duplicate]

I am trying out Puppeteer. This is a sample code that you can run on: https://try-puppeteer.appspot.com/
The problem is this code is returning an array of empty objects:
[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}]
Am I making a mistake?
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://reddit.com/');
let list = await page.evaluate(() => {
return Promise.resolve(Array.from(document.querySelectorAll('.title')));
});
console.log(JSON.stringify(list))
await browser.close();
The values returned from the evaluate function must be JSON-serializable.
https://github.com/GoogleChrome/puppeteer/issues/303#issuecomment-322919968
The solution is to extract the href values from the elements and return those.
await this.page.evaluate((sel) => {
let elements = Array.from(document.querySelectorAll(sel));
let links = elements.map(element => {
return element.href
})
return links;
}, sel);
Problem:
The return value for page.evaluate() must be serializable.
The Puppeteer documentation says:
If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined. DevTools Protocol also supports transferring some additional values that are not serializable by JSON: -0, NaN, Infinity, -Infinity, and bigint literals.
In other words, you cannot return an element from the page DOM environment back to the Node.js environment because they are separate.
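The empty objects in the question can be reproduced in plain Node. The sketch below uses a hypothetical FakeAnchor class as a stand-in for a DOM element; it roughly illustrates why serializing elements yields {} while serializing extracted strings works, and it is not a reproduction of Puppeteer's actual serializer.

```javascript
// Hypothetical stand-in for a DOM element: like real anchor elements, it
// exposes `href` through a getter on the prototype, not as an own property.
class FakeAnchor {
  #href;
  constructor(href) {
    this.#href = href;
  }
  get href() {
    return this.#href;
  }
}

const elements = [new FakeAnchor('/a'), new FakeAnchor('/b')];

// JSON.stringify only copies own enumerable properties, so each element
// collapses to an empty object.
const serializedElements = JSON.stringify(elements);

// Extracting plain strings first gives something that survives the trip.
const serializedLinks = JSON.stringify(elements.map((el) => el.href));

console.log(serializedElements); // [{},{}]
console.log(serializedLinks);    // ["/a","/b"]
```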
Solution:
You can return an ElementHandle, which is a representation of an in-page DOM element, back to the Node.js environment.
Use page.$$() to obtain an ElementHandle array:
let list = await page.$$('.title');
Otherwise, if you want to extract the href values from the elements and return them, you can use page.$$eval(), whose callback receives the array of matched elements:
let list = await page.$$eval('.title', elements => elements.map(a => a.href));
I faced a similar problem and solved it like this:
await page.evaluate(() =>
Array.from(document.querySelectorAll('.title'),
e => e.href));

How to call an on page function with playwright?

I'm using playwright to scrape some data from a page. I need to call a page function in order to change the browser's state to collect additional info.
What's the syntax for getting a function name from a data attribute and then calling that function on the page?
I keep getting the error: UnhandledPromiseRejectionWarning: page.evaluate: Evaluation failed: TypeError: cb is not a function
Here's what I have so far:
const { chromium} = require("playwright");
(async()=>{
this.browser = await chromium.launch({
headless: true,
});
this.context = await this.browser.newContext();
this.page = await this.context.newPage();
await this.page.goto('http://fakeExample.org'); // wait for navigation before querying the page
const callbackHandle = await this.page.$('[data-callback]');
const cbName = await callbackHandle.evaluate(element=>element.getAttribute('data-callback')); //returns actual page function name 'myPageFunction'
this.page.evaluate((cb) => {
cb() //should call myPageFunction() on the page
}, cbName)
})()
You're passing in a string with the function's name, so cb() tries to call a string. I think it comes down to looking the function up by name, either with window[cb]() or with eval(cb)().
Some reading on this topic:
How to execute a JavaScript function when I have its name as a string
Call a JavaScript function name using a string?
Call a function whose name is stored in a variable
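As a minimal sketch of the lookup-by-name approach, the hypothetical helper callByName below takes a scope object and a name, checks that the property is a function, and calls it. The fakeWindow object is only a stand-in for the page's window so the idea can be tried in plain Node; myPageFunction is the name from the question.

```javascript
// Hypothetical helper: look a function up by name on a scope object and call it.
// In the page, the scope would be `window`.
function callByName(scope, name, ...args) {
  const fn = scope[name];
  if (typeof fn !== 'function') {
    throw new TypeError(`${name} is not a function on the given scope`);
  }
  return fn.apply(scope, args);
}

// Stand-in for the page's global object, for illustration only.
const fakeWindow = {
  myPageFunction() {
    return 'called';
  },
};

const outcome = callByName(fakeWindow, 'myPageFunction');
console.log(outcome);
```

Inside the page itself, the equivalent call would be along the lines of await this.page.evaluate((cb) => window[cb](), cbName), assuming the function is defined as a global on the page.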

Cannot get querySelectorAll to work with puppeteer (returns undefined)

I'm trying to practice some web scraping with prices from a supermarket, using Node.js and Puppeteer. I can navigate through the website at the beginning by accepting cookies and clicking a "load more" button. But when I try to read the divs containing the products with querySelectorAll, I get stuck. It returns undefined even though I wait for a specific div to be present. What am I missing?
Problem is at the end of the code block.
const { product } = require("puppeteer");
const scraperObjectAll = {
url: 'https://www.bilkatogo.dk/s/?query=',
async scraper(browser) {
let page = await browser.newPage();
console.log(`Navigating to ${this.url}`);
await page.goto(this.url);
// accept cookies
await page.evaluate(_ => {
CookieInformation.submitAllCategories();
});
var productsRead = 0;
var productsTotal = Number.MAX_VALUE;
while (productsRead < 100) {
// Wait for the required DOM to be rendered
await page.waitForSelector('button.btn.btn-dark.border-radius.my-3');
// Click button to read more products
await page.evaluate(_ => {
document.querySelector("button.btn.btn-dark.border-radius.my-3").click()
});
// Wait for it to load the new products
await page.waitForSelector('div.col-10.col-sm-4.col-lg-2.text-center.mt-4.text-secondary');
// Get number of products read and total
const loadProducts = await page.evaluate(_ => {
let p = document.querySelector("div.col-10.col-sm-4.col-lg-2").innerText.replace("INDLÆS FLERE", "").replace("Du har set ","").replace(" ", "").replace(/(\r\n|\n|\r)/gm,"").split("af ");
return p;
});
console.log("Products (read/total): " + loadProducts);
productsRead = loadProducts[0];
productsTotal = loadProducts[1];
// Now waiting for a div element
await page.waitForSelector('div[data-productid]');
const getProducts = await page.evaluate(_ => {
return document.querySelectorAll('div');
});
// PROBLEM HERE!
// Cannot convert undefined or null to object
console.log("LENGTH: " + Array.from(getProducts).length);
}
The callback passed to page.evaluate runs in the page context inside the browser, not in the scope of the Node script. Values can't be passed between the page and the Node script without care: most importantly, anything that isn't serializable (convertible to plain JSON) can't be transferred.
querySelectorAll returns a NodeList, and NodeLists only exist on the front-end, not the backend. Similarly, NodeLists contain HTMLElements, which also only exist on the front-end.
Put all the logic that requires using the data that exists only on the front-end inside the .evaluate callback, for example:
const numberOfDivs = await page.evaluate(_ => {
return document.querySelectorAll('div').length;
});
or
const firstDivText = await page.evaluate(_ => {
return document.querySelector('div').textContent;
});
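Applied to the question, one possible sketch is to turn the matched elements into plain strings inside the page and return only those. The function name collectProductIds and the fakeDocument stand-in are hypothetical; the selector div[data-productid] comes from the question. The doc parameter exists only so the sketch can be tried without a browser.

```javascript
// Defaults to the page's `document`, so it can be handed to page.evaluate
// as-is; the parameter is only there to allow a dry run in plain Node.
function collectProductIds(doc = document) {
  return Array.from(
    doc.querySelectorAll('div[data-productid]'),
    (el) => el.getAttribute('data-productid')
  );
}

// Hypothetical stand-in for a page document, for illustration only.
const fakeDocument = {
  querySelectorAll() {
    return [
      { getAttribute: () => '101' },
      { getAttribute: () => '102' },
    ];
  },
};

const productIds = collectProductIds(fakeDocument);
console.log(productIds.length, productIds);
```

With Puppeteer, const productIds = await page.evaluate(collectProductIds); should return an array of strings, which serializes cleanly, so productIds.length works on the Node side.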

Injecting data object to window with Puppeteer

Background
I am using Puppeteer to create some PDFs. I need to inject some data into the page when Puppeteer loads it.
Problem
I have tried using evaluateOnNewDocument() which was successful when using a String only. When I try with an Object it fails. I also tried with evaluate() and it fails regardless of what I pass in.
Example
// Works
await page.evaluateOnNewDocument(() => {
window.pdfData = {};
window.pdfData = "Some String";
});
// Does not work
await page.evaluateOnNewDocument(() => {
window.pdfData = {};
window.pdfData = data;
});
// Fails
await page.evaluate(data => {
window.pdfData = {};
window.pdfData = data;
}, data);
I would like to access this object like this,
const data = window.pdfData;
Question
What is the proper way to pass a data object into window on a loaded Puppeteer page so that it can be accessed within the page to use the data client side?
Passing object to evaluate
You can pass data as an argument; it will be serialized as JSON.
await page.evaluate(data => { // <-- receive it as a parameter
window.pdfData = data; // <-- read it here
}, data); // <-- pass as argument
Passing object to evaluateOnNewDocument
evaluateOnNewDocument works similarly to evaluate, except it will run whenever there is a new window/navigation/frame. This way the data will stay even if you navigate away to another page.
You can pass data and read inside the function.
await page.evaluateOnNewDocument(data => {
window.pdfData = data;
}, data);
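Because the data is serialized as JSON on the way in, it helps to check what the page would actually receive. The sketch below approximates that with a plain JSON round trip; the sample data object is made up for illustration, and Puppeteer's real handling of unsupported values may differ from this approximation.

```javascript
// Hypothetical sample data; only the JSON-compatible parts are expected to arrive.
const data = {
  title: 'Invoice',
  total: 42.5,
  createdAt: new Date('2020-01-02T03:04:05.000Z'),
  format: () => 'not transferable',
  note: undefined,
};

// Approximates what window.pdfData would hold after JSON serialization.
const received = JSON.parse(JSON.stringify(data));

console.log(received);
```

In this approximation the function and the undefined property disappear, and the Date arrives as an ISO string, so it is safest to pass only plain objects, arrays, strings, numbers, booleans, and null.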
