How to call an on-page function with Playwright? - javascript

I'm using playwright to scrape some data from a page. I need to call a page function in order to change the browser's state to collect additional info.
What's the syntax for getting a function name from a data attribute and then calling that function on the page?
I keep getting the error: UnhandledPromiseRejectionWarning: page.evaluate: Evaluation failed: TypeError: cb is not a function
Here's what I have so far:
const { chromium } = require("playwright");
(async () => {
  const browser = await chromium.launch({
    headless: true,
  });
  const context = await browser.newContext();
  const page = await context.newPage();
  // wait for navigation before querying the page
  await page.goto('http://fakeExample.org');
  const callbackHandle = await page.$('[data-callback]');
  const cbName = await callbackHandle.evaluate(element => element.getAttribute('data-callback')); // returns the actual page function name, 'myPageFunction'
  await page.evaluate((cb) => {
    cb(); // should call myPageFunction() on the page, but cb is the string 'myPageFunction', so this throws "cb is not a function"
  }, cbName);
})();

I think it comes down to either window[cb]() or eval(cb)(), since you're passing in a string containing the function name rather than the function itself.
Some reading on this topic:
How to execute a JavaScript function when I have its name as a string
Call a JavaScript function name using a string?
Call a function whose name is stored in a variable
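A minimal sketch of the window[cb]() approach, assuming myPageFunction is a global, i.e. a property of window. That holds for a top-level function declaration in a classic script, but not for top-level const/let or code inside a module. invokeGlobalByName is a hypothetical helper name. Playwright sends the function's source text to the browser, so the function must not use variables from the Node script; the name arrives as the argument instead.

```javascript
// Hypothetical helper, meant to be passed to page.evaluate so it runs in the page.
// It only uses its argument and page globals, because Playwright serializes
// the function's source and does not carry over Node-side closures.
function invokeGlobalByName(name) {
  // In a browser, globalThis is window, so this is the same as window[name].
  const fn = globalThis[name];
  if (typeof fn !== "function") {
    throw new TypeError(`${name} is not a global function on this page`);
  }
  // The return value is handed back to Node by page.evaluate (if serializable).
  return fn();
}
```

Inside the async function from the question, the call would be await page.evaluate(invokeGlobalByName, cbName). Looking the name up on window avoids evaluating an arbitrary string from the page with eval, and the explicit TypeError names the missing function if data-callback contains a typo.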

Related

Call method on object also do async operation before

I'm using a library that returns an object with multiple keys and methods; one of the keys of this object is accessToken.
The accessToken value must exist before calling any method of this object (otherwise you're not authenticated).
The token is retrieved externally using an async axios function. It is only valid for 1 hour, so it's okay to get a new token every time you call a function on the object.
I don't want to recreate the object every time I use this library (I'm using it multiple times).
So far, based on a few articles I found online, I did this:
const force = require('jsforce')
const { SFAuth } = require('./auth')
const common = require('../common')
class JSForce {
  constructor (connection) {
    return connection
  }
  static async Connect () {
    const token = await SFAuth.getToken()
    const connection = new force.Connection({
      instanceUrl: common.SALESFORCE_URL,
      accessToken: token
    })
    return new JSForce(connection)
  }
}
const start = async () => {
  const res = await JSForce.Connect()
  console.log(res)
}
start()
If I try JSForce.Connect().sobject('Account'), I get an error saying sobject is not a function.
It works if I first save the result of JSForce.Connect() in an instance and then call instance.sobject(), but I can't do that every time I need to use it.
How would you solve this?
Thanks!!
The problem is that JSForce.Connect() is async: it returns a Promise, and sobject only exists on the connection that the Promise resolves to once the JSForce connection succeeds. So first make sure you have the connection, then save it in a variable, and only call JSForce.Connect() if there is no instance yet.
Declare a module-level variable in the file:
let instance;
// Returns the cached connection, creating it on the first call
const getInstance = async () => {
  if (!instance) {
    instance = await JSForce.Connect();
  }
  return instance;
};
const start = async () => {
  // getInstance() returns a Promise, so await it before calling sobject()
  const res = (await getInstance()).sobject('Account');
  console.log(res);
}
start();
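The cached instance above never refreshes, while the question says the token expires after one hour. A variation, sketched under assumptions: createLazyConnection is a hypothetical helper, and the 55-minute default is an arbitrary margin below that one-hour lifetime. It caches the Promise rather than the resolved connection, so several calls made before the first connect finishes share one connect() instead of each starting their own.

```javascript
// Hypothetical helper: wraps an async connect function (e.g. () => JSForce.Connect())
// and caches the Promise it returns, so all callers share one connection.
// After maxAgeMs, the next call reconnects, which also fetches a fresh token.
function createLazyConnection(connect, maxAgeMs = 55 * 60 * 1000, now = Date.now) {
  let pending = null; // Promise of the current connection, or null
  let createdAt = 0; // time when `pending` was created, in ms

  return function getConnection() {
    if (pending === null || now() - createdAt >= maxAgeMs) {
      createdAt = now();
      const attempt = connect().catch((err) => {
        // Forget a failed attempt so the next call retries,
        // unless a newer attempt has already replaced it.
        if (pending === attempt) pending = null;
        throw err;
      });
      pending = attempt;
    }
    return pending;
  };
}
```

With the question's class this would be used as const getConnection = createLazyConnection(() => JSForce.Connect()), and then (await getConnection()).sobject('Account') wherever a connection is needed.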

Cannot get querySelectorAll to work with puppeteer (returns undefined)

I'm trying to practice some web scraping with prices from a supermarket, using Node.js and Puppeteer. I can navigate through the website at the beginning, accepting cookies and clicking a "load more" button. But when I then try to read the divs containing the products with querySelectorAll, I get stuck: it returns undefined even though I wait for a specific div to be present. What am I missing?
Problem is at the end of the code block.
const { product } = require("puppeteer");
const scraperObjectAll = {
  url: 'https://www.bilkatogo.dk/s/?query=',
  async scraper(browser) {
    let page = await browser.newPage();
    console.log(`Navigating to ${this.url}`);
    await page.goto(this.url);
    // accept cookies
    await page.evaluate(_ => {
      CookieInformation.submitAllCategories();
    });
    var productsRead = 0;
    var productsTotal = Number.MAX_VALUE;
    while (productsRead < 100) {
      // Wait for the required DOM to be rendered
      await page.waitForSelector('button.btn.btn-dark.border-radius.my-3');
      // Click button to read more products
      await page.evaluate(_ => {
        document.querySelector("button.btn.btn-dark.border-radius.my-3").click()
      });
      // Wait for it to load the new products
      await page.waitForSelector('div.col-10.col-sm-4.col-lg-2.text-center.mt-4.text-secondary');
      // Get number of products read and total
      const loadProducts = await page.evaluate(_ => {
        let p = document.querySelector("div.col-10.col-sm-4.col-lg-2").innerText.replace("INDLÆS FLERE", "").replace("Du har set ","").replace(" ", "").replace(/(\r\n|\n|\r)/gm,"").split("af ");
        return p;
      });
      console.log("Products (read/total): " + loadProducts);
      productsRead = loadProducts[0];
      productsTotal = loadProducts[1];
      // Now waiting for a div element
      await page.waitForSelector('div[data-productid]');
      const getProducts = await page.evaluate(_ => {
        return document.querySelectorAll('div');
      });
      // PROBLEM HERE!
      // Cannot convert undefined or null to object
      console.log("LENGTH: " + Array.from(getProducts).length);
    }
  }
};
The callback passed to page.evaluate runs in the browser page's context, not in the scope of the Node script. Values can't be passed between the page and the Node script without careful consideration: most importantly, if something isn't serializable (convertible to plain JSON), it can't be transferred.
querySelectorAll returns a NodeList, and NodeLists only exist on the front end, not in Node. Similarly, the elements a NodeList contains (HTMLElements) also only exist on the front end.
Put all the logic that needs front-end-only data inside the .evaluate callback, and return only serializable results, for example:
const numberOfDivs = await page.evaluate(_ => {
  return document.querySelectorAll('div').length;
});
or
const firstDivText = await page.evaluate(_ => {
  return document.querySelector('div').textContent;
});
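For the scraper in the question, another way to keep the NodeList in the browser is page.$$eval, which runs querySelectorAll in the page and passes the matched elements to the callback as an array. A sketch, assuming each product div carries its id in data-productid (the attribute the question waits for); toProductSummaries and its id/text fields are hypothetical names chosen for illustration.

```javascript
// Hypothetical page-side mapper for page.$$eval. It runs in the browser and
// turns each element into a plain object of strings, which can be serialized
// back to Node, unlike the elements themselves.
function toProductSummaries(elements) {
  return elements.map((el) => ({
    id: el.getAttribute("data-productid"),
    text: (el.textContent || "").trim(),
  }));
}
```

Inside the while loop, the problematic lines could then become const products = await page.$$eval('div[data-productid]', toProductSummaries); followed by console.log("LENGTH: " + products.length);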

Retrieving values from browser cache

I've created a service worker that performs a fetch and then immediately stores the data in the cache.
self.addEventListener('install', function(e) {
  // waitUntil keeps the worker installing until the promise settles;
  // an async listener on its own does not extend the install event.
  e.waitUntil((async () => {
    try {
      const fileCache = await caches.open(CACHE_NAME);
      await fileCache.addAll(FILES_TO_CACHE);
      const dataCache = await caches.open(DATA_CACHE_NAME);
      const dataFetchResponse = await fetch('/api/transaction');
      await dataCache.put('/api/transaction', dataFetchResponse);
    } catch (error) {
      console.log(error);
    }
  })());
});
After doing this, I'm trying to test how long it's been since the last data fetch, in order to determine whether the data needs to be updated. My Transaction model adds a timestamp to the data, so ideally I'd compare the timestamp of the last transaction against the current time, but I'm not sure how to do it.
I've tried cache.match(), but it doesn't return the entire object stored under the matched key. I know localStorage has a .getItem() method, but I don't see any similar methods for the cache.
Any ideas on this one?
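Figured out how to get the information. You have to .match() the key of the data stored in the cache. match() returns a promise that resolves to the cached Response (or undefined if nothing matched), and you read that Response's body with .json(), which also returns a promise.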
async function getCachedData(cacheName, url) {
  const cacheStorage = await caches.open(cacheName);
  const cachedResponse = await cacheStorage.match(url); // resolves to the cached Response, or undefined
  if (!cachedResponse || !cachedResponse.ok) { return false; }
  // A Response body can only be read once, so parse it a single time
  const data = await cachedResponse.json();
  console.log(data); // prints the JSON value stored under the matched key
  return data;
}
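With the parsed data in hand, the staleness check the question asks about is plain date arithmetic. A sketch under assumptions: isStale is a hypothetical name, the cached JSON is assumed to be an array of transactions, and each transaction is assumed to carry its timestamp in a field called date (the real field name depends on the Transaction model).

```javascript
// Hypothetical check: assumes `transactions` is an array and each item has a
// `date` field (ISO string or ms timestamp) set by the Transaction model.
// Returns true when the newest transaction is older than maxAgeMs.
function isStale(transactions, maxAgeMs, now = Date.now()) {
  if (!Array.isArray(transactions) || transactions.length === 0) return true;
  // Newest transaction time in ms; NaN if any date fails to parse.
  const latest = Math.max(...transactions.map((t) => new Date(t.date).getTime()));
  return Number.isNaN(latest) || now - latest > maxAgeMs;
}
```

Combined with the answer's helper, a caller might do const data = await getCachedData(DATA_CACHE_NAME, '/api/transaction'); and refetch when data === false || isStale(data, 60 * 60 * 1000). Treating unparsable dates and an empty cache as stale errs on the side of refetching.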

Send message to console.log (jest puppeteer)

Why can't I see my console.log messages from page.evaluate, page.$, page.$$, page.$eval, page.$$eval?
And why can't I access variables from outside those functions?
let variable = 0;
const divColors = await page.evaluate(() => {
  const divs = Array.from(document.querySelectorAll('.map-filters div'));
  let text = divs.map((element, index) => {
    console.log(element.textContent)
    variable = 1;
    return element.style.color;
  })
  return text;
})
Why can't I do variable = 1 and console.log(element.textContent) in this example?
You're using console.log inside of page.evaluate, so the output goes to the Chromium browser's console, not to Node's output. To see the browser's console messages in Node's console, subscribe to them after the page object is created and before the page code calls console.log:
const page = await browser.newPage();
page.on('console', consoleObj => console.log(consoleObj.text()));
page.evaluate(...);
As for the variable named variable, there are in fact two of them in your script.
The first one exists in node.js context:
let variable = 0;
And the other one — in web page context:
page.evaluate(() => {
  variable = 1;
})
They are completely different. Think of page.evaluate as a portal into another world: objects that exist there are only present inside the JavaScript runtime of the page open in the web browser that Puppeteer is driving. Node has its own runtime with its own set of objects.
You may pass data into page.evaluate from node:
let variable = 420;
page.evaluate(variable => {
  // it is now passed here from outside
  console.log(variable)
}, variable);
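Getting data back out (the second part of the question) works the other way around: page.evaluate resolves to whatever the callback returns, as long as it is serializable, so the page code should return values instead of assigning to Node variables. A minimal sketch; readDivColors is a hypothetical name, and Puppeteer passes the extra evaluate argument (here the selector) into the callback.

```javascript
// Hypothetical page function for page.evaluate. It runs in the browser, so
// instead of assigning to Node-side variables it returns a plain object,
// which Puppeteer serializes and hands back to Node.
function readDivColors(selector) {
  const divs = Array.from(document.querySelectorAll(selector));
  return {
    texts: divs.map((el) => el.textContent),
    colors: divs.map((el) => el.style.color),
  };
}
```

In the Node script: const { texts, colors } = await page.evaluate(readDivColors, '.map-filters div'); after that, texts.forEach(t => console.log(t)) prints in Node's console, and variable = 1 updates the Node-side variable.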

Injecting data object to window with Puppeteer

Background
I am using Puppeteer to create some PDFs. I need to inject some data into the page when Puppeteer loads it.
Problem
I have tried using evaluateOnNewDocument() which was successful when using a String only. When I try with an Object it fails. I also tried with evaluate() and it fails regardless of what I pass in.
Example
// Works
await page.evaluateOnNewDocument(() => {
  window.pdfData = {};
  window.pdfData = "Some String";
});
// Does not work
await page.evaluateOnNewDocument(() => {
  window.pdfData = {};
  window.pdfData = data;
});
// Fails
await page.evaluate(data => {
  window.pdfData = {};
  window.pdfData = data;
}, data);
I would like to access this object like this,
const data = window.pdfData;
Question
What is the proper way to pass a data object into window on a loaded Puppeteer page so that it can be accessed within the page to use the data client side?
Passing object to evaluate
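You can pass data, which will be serialized as JSON. evaluate only sets the value on the document that is currently loaded, so call it after page.goto; a later navigation replaces window.
await page.evaluate(data => { // <-- pass as parameter
  window.pdfData = data; // <-- read it here
}, data); // <-- pass as argument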
Passing object to evaluateOnNewDocument
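evaluateOnNewDocument works similarly to evaluate, except that the function runs whenever a new document is created (new window, navigation, or frame), before the page's own scripts run. This way the data stays even if you navigate to another page. Register it before page.goto so it also applies to the page you load.
You can pass data as an argument and read it inside the function: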
await page.evaluateOnNewDocument(data => {
  window.pdfData = data;
}, data);
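Since the object has to survive serialization, it can help to check it before handing it to Puppeteer. A rough pre-flight sketch, assuming (as the answer says) that the argument crosses as JSON. Puppeteer has special handling for a few values, so treat this as a conservative approximation. findNonJsonValues is a hypothetical helper that lists paths whose values plain JSON would drop or change.

```javascript
// Hypothetical helper: lists paths whose values plain JSON cannot represent
// faithfully (dropped, turned into null/strings, or circular).
// `seen` tracks the current ancestors only, so shared references are allowed.
function findNonJsonValues(value, path = "data", found = [], seen = new WeakSet()) {
  const type = typeof value;
  if (type === "function" || type === "symbol" || type === "undefined" || type === "bigint") {
    found.push(`${path}: ${type}`);
  } else if (type === "number" && !Number.isFinite(value)) {
    found.push(`${path}: ${value}`); // NaN and Infinity become null in JSON
  } else if (value !== null && type === "object") {
    if (value instanceof Date || value instanceof Map || value instanceof Set) {
      found.push(`${path}: ${value.constructor.name}`);
    } else if (seen.has(value)) {
      found.push(`${path}: circular`);
    } else {
      seen.add(value);
      const entries = Array.isArray(value)
        ? value.map((item, i) => [`[${i}]`, item])
        : Object.entries(value).map(([key, item]) => [`.${key}`, item]);
      for (const [suffix, item] of entries) {
        findNonJsonValues(item, path + suffix, found, seen);
      }
      seen.delete(value); // leave this branch: no longer an ancestor
    }
  }
  return found;
}
```

Before calling evaluateOnNewDocument, something like const problems = findNonJsonValues(data); if (problems.length > 0) console.warn(problems); shows which parts of the object would not reach window.pdfData intact.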
