I'm using Puppeteer, a Node.js module that controls Chrome.
It has two functions for starting a new browser and opening a new page:
const browser = await puppeteer.launch() and browser.newPage()
I want to create a class for creating a new page and a new browser.
This is the old way I was doing it, without classes. It works, but it doesn't let me create additional pages, which is why I want to move to classes.
let chrome = {}
chrome.init = async (options) => {
chrome.browser = await puppeteer.launch(options)
chrome.page = await chrome.browser.newPage()
}
chrome.pageContains = async (string) => {
return await chrome.page.evaluate( (string) => {
const regex = new RegExp(string, 'i')
return regex.test( document.querySelector('body').innerText )
}, string)
}
module.exports = chrome
Here's my new code, but I have no idea what I'm doing; it's obviously wrong and makes no sense.
chrome.init = async (options) => {
return {browser: await new Chrome(options)
}
class Chrome {
async constructor(options) {
this.browser = await puppeteer.launch(options)
}
newPage() {
return await this.browser.newPage()
}
}
class Page {
async constructor() {
this.page = await chrome.browser.newPage()
}
}
So how do I make my old code work using classes instead of an object?
You have some typos, and constructors cannot be async.
Other than that, you simply have to pass the browser to the Page class. You can extend Chrome with Page and use super, or keep them separate, but the page must have access to the browser at some point.
First, we launch the browser and return it. Because the constructor returns the promise from puppeteer.launch() instead of the instance, await new Chrome(options) resolves to the browser itself.
const puppeteer = require('puppeteer');
class Chrome {
constructor(options) {
this.browser = puppeteer.launch(options);
return this.browser;
}
}
Then we pass it to the page constructor and use it from there.
class Page {
constructor(browser) {
this.browser = browser;
this.page = browser.newPage();
return this.page;
}
async pageContains(string){
// this.page holds the promise returned by newPage(), so await it before calling evaluate
return (await this.page).evaluate( (string) => {
const regex = new RegExp(string, 'i')
return regex.test( document.querySelector('body').innerText )
}, string)
}
}
Then we call them to make them usable, returning the browser and page objects if needed.
const getChrome = async (options) => {
const browser = await new Chrome(options);
const page = await new Page(browser);
return { browser, page }
}
Now we can use them.
(async ()=>{
const page = (await getChrome({headless: false})).page;
await page.goto('http://example.com');
})()
I am pretty sure it can be refactored and what I wrote here is not the best practice, but this will get you going.
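One possible refactoring, offered as a minimal sketch rather than a definitive implementation: keep the constructors synchronous and move the awaiting into a static async factory. The names Chrome.create and the launcher parameter are assumptions introduced here for illustration, and fakeLauncher is a hypothetical stand-in so the sketch runs without a real browser. In real use you would pass require('puppeteer') as the launcher.

```javascript
class Chrome {
  constructor(browser) {
    this.browser = browser; // an already-launched browser, not a promise
  }

  // Static async factory: does the awaiting that a constructor cannot do.
  static async create(launcher, options) {
    const browser = await launcher.launch(options);
    return new Chrome(browser);
  }

  // Each call opens a fresh page wrapped in a Page instance.
  async newPage() {
    return new Page(await this.browser.newPage());
  }
}

class Page {
  constructor(page) {
    this.page = page;
  }

  // Runs inside the browser page, as in the original pageContains.
  pageContains(string) {
    return this.page.evaluate((string) => {
      const regex = new RegExp(string, 'i');
      return regex.test(document.querySelector('body').innerText);
    }, string);
  }
}

// Hypothetical stand-in for puppeteer so the sketch runs without Chrome.
let pagesOpened = 0;
const fakeLauncher = {
  launch: async (options) => ({
    newPage: async () => ({ id: ++pagesOpened }),
  }),
};

const demo = (async () => {
  const chrome = await Chrome.create(fakeLauncher, { headless: true });
  const first = await chrome.newPage();
  const second = await chrome.newPage();
  return { first, second };
})();
```

With this shape, every chrome.newPage() call yields an independent Page instance, and pageContains stays reachable because the constructor no longer returns something other than the instance.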
Related
Consider this really simple example:
class MyClass {
public add(num: number): number {
return num + 2;
}
}
const result = await page.evaluate((NewInstance) => {
console.log("typeof instance", typeof NewInstance); // undefined
const d = new NewInstance();
console.log("result", d.add(10));
return d.add(10);
}, MyClass);
I've tried everything I could think of. The main reason I want to use a class here, is because there's a LOT of code I don't want to just include inside the evaluate method directly. It's messy and hard to keep track of it, so I wanted to move all logic to a class so it's easier to understand what's going on.
Is this possible?
It's possible, but not necessarily great design, depending on what you're trying to do. It's hard to suggest the best solution without knowing the actual use case, so I'll just provide options and let you make the decision.
One approach is to stringify the class (either by hand or with .toString()) or put it in a separate file, then addScriptTag:
const puppeteer = require("puppeteer"); // ^19.6.3
class MyClass {
add(num) {
return num + 2;
}
}
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(
"https://www.example.com",
{waitUntil: "domcontentloaded"}
);
await page.addScriptTag({content: MyClass.toString()});
const result = await page.evaluate(() => new MyClass().add(10));
console.log(result); // => 12
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
See this answer for more examples.
Something like eval is also feasible. If it looks scary, consider that anything you put into a page.evaluate() or page.addScriptTag() is effectively the same thing as far as security goes.
const result = await page.evaluate(MyClassStringified => {
const MyClass = eval(`(${MyClassStringified})`);
return new MyClass().add(10);
}, MyClass.toString());
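The round trip that both the addScriptTag and eval variants rely on can be checked in plain Node, without a browser. This is only a sketch of the mechanism; the name Rebuilt is introduced here for illustration. It works for classes that are self-contained, so a class that refers to variables from its enclosing scope or extends another class would not survive the trip unchanged.

```javascript
class MyClass {
  add(num) {
    return num + 2;
  }
}

// The class's source text, which is what actually crosses into the page.
const source = MyClass.toString();

// Parentheses make eval treat the text as a class expression.
const Rebuilt = eval(`(${source})`);

console.log(Rebuilt !== MyClass);   // true: a separate copy of the class
console.log(new Rebuilt().add(10)); // 12
```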
Many other patterns are also possible, like exposing your library via exposeFunction if the logic is Node-based rather than browser-based.
That said, defining the class inside an evaluate may not be as bad as you think:
const addTonsOfCode = () => {
MyClass = class {
add(num) {
return num + 2;
}
}
// ... tons of code ...
};
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(
"https://www.example.com",
{waitUntil: "domcontentloaded"}
);
await page.evaluate(addTonsOfCode);
const result = await page.evaluate(() => new MyClass().add(10));
console.log(result); // => 12
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
I'd prefer to namespace this all into a library:
const addTonsOfCode = () => {
class MyClass {
add(num) {
return num + 2;
}
}
// ... tons of code ...
window.MyLib = {
MyClass,
// ...
};
};
Then use with:
await page.evaluate(addTonsOfCode);
await page.evaluate(() => new MyLib.MyClass().add(10));
I'm trying to have a Singleton instance of a dynamically imported module in my Next.js app. However, the current implementation I have seems to initialize a new instance every time I call getInstance.
Here's the implementation in gui.js:
let dynamicallyImportPackage = async () => {
let GUI;
await import('three/examples/jsm/libs/dat.gui.module.js')
.then(module => {
GUI = module.GUI
})
.catch(e => console.log(e))
return GUI;
}
let GUI = (function () {
let instance;
return {
getInstance: async () => {
if (!instance) {
let GUIModule = await dynamicallyImportPackage();
instance = new GUIModule();
}
return instance;
}
};
})();
export default GUI;
and I call it in an ES6 class function using GUI.getInstance().then(g => { ... })
I would normally use React Context API or Redux for this kind of shared state but for this case, I need it to be purely in ES6 JS and not React.
You need to cache the promise, not the instance, otherwise it will try to import the module (and instantiate another instance) again while the first is still loading and has not yet assigned the instance variable.
async function createInstance() {
const module = await import('three/examples/jsm/libs/dat.gui.module.js')
const { GUI } = module;
return new GUI();
}
let instancePromise = null;
export default function getInstance() {
if (!instancePromise) {
instancePromise = createInstance()
// .catch(e => {
// console.log(e); return ???;
// instancePromise = null; throw e;
// });
}
return instancePromise;
}
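To see why caching the promise matters, here is a minimal sketch in which a hypothetical loadModule stands in for the dynamic import. The names loadModule and FakeGUI and the 10 ms delay are assumptions for illustration only. Two overlapping calls to getInstance share one promise, so the constructor runs only once.

```javascript
let constructed = 0;
class FakeGUI {
  constructor() {
    constructed += 1;
  }
}

// Hypothetical stand-in for `await import(...)`: resolves on a later tick.
function loadModule() {
  return new Promise(resolve => setTimeout(() => resolve({ GUI: FakeGUI }), 10));
}

async function createInstance() {
  const { GUI } = await loadModule();
  return new GUI();
}

let instancePromise = null;
function getInstance() {
  if (!instancePromise) {
    // Cached synchronously, before the load has finished.
    instancePromise = createInstance();
  }
  return instancePromise;
}

// Both calls start before the module has loaded.
const both = Promise.all([getInstance(), getInstance()]);
both.then(([a, b]) => console.log(a === b, constructed)); // true 1
```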
I want to fetch data from outside my project with axios. I do it inside a class, but for some reason the data comes back as a promise object. I use await and promises, but I still end up with [object Promise].
const Online_Visitors_System = class OnlineVisitors {
constructor() {
// get VisitorIP
this.IP = this.fetchIP();
// config redis for key space notification
this.redis = Redis.createClient();
this.redis.on("ready", () => {
this.redis.config("SET", "notify-keyspace-events", "KEA");
});
PubSub.subscribe("__keyevent#0__:incrby");
}
async fetchIP() {
return new Promise((resolve, reject) => {
return axios
.get("https://api.ipgeolocation.io/getip")
.then(res => resolve(res.data.ip));
});
}
VisitorInter() {
console.log(this.IP);
}
};
module.exports = new Online_Visitors_System();
The error I get:
This is converted to "[object Promise]" by using .toString() now and will return an error from v.3.0 on.
Please handle this in your code to make sure everything works as you intended it to.
Promise { '51.38.89.159' }
Well, you missed await in a few places; here is the full correction:
const Online_Visitors_System = class OnlineVisitors{
constructor(){
// get VisitorIP
this.fetchIP().then(ip => this.IP = ip);
// config redis for key space notification
this.redis = Redis.createClient();
this.redis.on('ready',()=>{
this.redis.config('SET',"notify-keyspace-events",'KEA')
})
PubSub.subscribe("__keyevent#0__:incrby")
}
fetchIP(){
return new Promise((resolve,reject)=>{
axios.get('https://api.ipgeolocation.io/getip')
.then(res=>resolve(res.data.ip))
})
}
VisitorInter(){
console.log(this.IP)
}
};
Since fetchIP is an async function, it returns a promise, so you also need to await it when calling it,
so: this.IP = await this.fetchIP().
But since you are in a constructor, you can't use await, so the solution is to use chaining:
this.fetchIP().then(ip => this.IP = ip);
Note that the executor passed to new Promise does not need to be an async function here, because the axios call inside it is chained with .then() rather than awaited.
You are assigning the promise of an IP address into this.IP.
You will need to .then the promise to get the actual IP address; it might or might not be available by the time VisitorInter() or anything else that needs the IP address is called.
class OnlineVisitors {
constructor() {
this.ipPromise = this.fetchIP();
// redis stuff elided from example
}
async fetchIP() {
const resp = await axios.get("https://api.ipgeolocation.io/getip");
return resp.data.ip;
}
async VisitorInter() {
const ip = await this.ipPromise; // this could potentially hang forever if ipgeolocation.io doesn't feel like answering
console.log(ip);
}
};
module.exports = new OnlineVisitors();
I am working on a partner manager, and some code needs to be atomic because there is currently a race condition: it breaks when two clients request the same resource at the same time. The retrievePartners method returns partners, and that method should be atomic. Basically, partners are a limited resource, and the mechanism that provides them should deal with only one client (asking for a partner) at a time.
I have been told the code below works as an atomic operation, since JavaScript is atomic by nature.
let processingQueue = Promise.resolve();
function doStuffExclusively() {
processingQueue = processingQueue.then(() => {
return fetch('http://localhost', {method: 'PUT', body: ...});
}).catch(function(e){
throw e;
});
return processingQueue;
}
doStuffExclusively()
doStuffExclusively()
doStuffExclusively()
However, this code is basic; my code has an await that calls another await, and so on. I want to apply that mechanism to the code below but really don't know how. I tried a few things, but nothing worked. I cannot get await to work inside a then callback.
I am also confused about whether the code above returns true in the then part of processingQueue. In my case, I return an array or throw an error. Should I return something to get it to work as above?
Here is the function I want to make atomic, just like the code above. I tried to put everything in this function in the then section, before the return statement, but it did not work, since
export class Workout {
constructor (config) {
this.instructorPeer = new jet.Peer(config)
this.instructorPeer.connect()
}
async createSession (partnerInfo) {
const partners = { chrome: [], firefox: [], safari: [], ie: [] }
const appropriatePartners = await this.retrievePartners(partnerInfo)
Object.keys(appropriatePartners).forEach(key => {
appropriatePartners[key].forEach(partner => {
const newPartner = new Partner(this.instructorPeer, partner.id)
partners[key].push(newPartner)
})
})
return new Session(partners)
}
async retrievePartners (capabilities) {
const appropriatePartners = { chrome: [], firefox: [], safari: [], ie: [] }
const partners = await this.getAllPartners()
// first check if there is available appropriate Partners
Object.keys(capabilities.type).forEach(key => {
let typeNumber = parseInt(capabilities.type[key])
for (let i = 0; i < typeNumber; i++) {
partners.forEach((partner, i) => {
if (
key === partner.value.type &&
partner.value.isAvailable &&
appropriatePartners[key].length < typeNumber
) {
appropriatePartners[key].push(partner)
console.log(appropriatePartners[key].length)
}
})
if (appropriatePartners[key].length < typeNumber) {
throw new Error(
'Sorry there are no appropriate Partners for this session'
)
}
}
})
Object.keys(appropriatePartners).forEach(key => {
appropriatePartners[key].forEach(partner => {
this.instructorPeer.set('/partners/' + partner.id + '/states/', {
isAvailable: false
})
})
})
return appropriatePartners
}
async getAllPartners (capabilities) {
const partners = []
const paths = await this.instructorPeer.get({
path: { startsWith: '/partners/' }
})
paths.forEach((path, i) => {
if (path.fetchOnly) {
let obj = {}
obj.value = path.value
obj.id = path.path.split('/partners/')[1]
obj.value.isAvailable = paths[i + 1].value.isAvailable
partners.push(obj)
}
})
return partners
}
Here is the code that calls it
async function startTest () {
const capabilities = {
type: {
chrome: 1
}
}
const workoutServerConfig = {
url: 'ws://localhost:8090'
}
const workout = createWorkout(workoutServerConfig)
const session = await workout.createSession(capabilities)
const session1 = await workout.createSession(capabilities)
And here is what I have tried so far, which did not work; session is not defined at all.
let processingQueue = Promise.resolve()
export class Workout {
constructor (config) {
this.instructorPeer = new jet.Peer(config)
this.instructorPeer.connect()
this.processingQueue = Promise.resolve()
}
async createSession (partnerInfo) {
this.processingQueue = this.processingQueue.then(() => {
const partners = { chrome: [], firefox: [], safari: [], ie: [] }
const appropriatePartners = this.retrievePartners(partnerInfo)
Object.keys(appropriatePartners).forEach(key => {
appropriatePartners[key].forEach(partner => {
const newPartner = new Partner(this.instructorPeer, partner.id)
partners[key].push(newPartner)
})
})
return new Session(partners)
})
}
This is promise-based locking, based on the facts that:
1) the .then() handler will only be called once the lock has resolved.
2) once the .then() handler begins executing, no other JS code will run until it returns or reaches an await, due to JS's run-to-completion execution model.
The overall structure of the approach you cited is correct.
The main issue I see with your code is that const appropriatePartners = this.retrievePartners(partnerInfo) will evaluate to a promise, because retrievePartners is async. You want to:
const appropriatePartners = await this.retrievePartners(partnerInfo).
This will cause your lock's executor to block on the retrievePartners call, whereas currently you are simply grabbing a promise wrapping that call's eventual return value.
Edit: See jsfiddle for an example.
In sum:
1) make the arrow function handling lock resolution async
2) make sure it awaits the return value of this.retrievePartners, otherwise you will be operating on the Promise, not the resolved value.
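Putting those two points together, a minimal self-contained sketch might look like the following. The ExclusiveWorker class, the runExclusive method, and the simulated 10 ms delay are hypothetical names and values for illustration, not part of the original code. Two details are design choices of this sketch: the method returns the queued promise so callers can await the session, and the queue swallows failures so one rejected task does not block later ones.

```javascript
const events = [];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

class ExclusiveWorker {
  constructor() {
    this.queue = Promise.resolve();
  }

  runExclusive(task) {
    // Start the task only after everything queued before it has settled.
    const result = this.queue.then(() => task());
    // Keep the chain alive even if this task rejects.
    this.queue = result.catch(() => {});
    return result;
  }

  createSession(name) {
    // The handler is async, so it can await inside the locked section.
    return this.runExclusive(async () => {
      events.push(`${name} start`);
      await sleep(10); // stands in for `await this.retrievePartners(...)`
      events.push(`${name} end`);
      return name;
    });
  }
}

const worker = new ExclusiveWorker();
const done = Promise.all([
  worker.createSession('a'),
  worker.createSession('b'),
]).then(() => events);

done.then(log => console.log(log.join(', ')));
// a start, a end, b start, b end
```

Although both calls are issued back to back, the second session does not start until the first has finished, which is the serialization the question asks for.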
I'm trying to pass a variable into a page.evaluate() function in Puppeteer, but when I use the following very simplified example, the variable evalVar is undefined.
I can't find any examples to build on, so I need help passing that variable into the page.evaluate() function so I can use it inside.
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
const evalVar = 'WHUT??';
try {
await page.goto('https://www.google.com.au');
await page.waitForSelector('#fbar');
const links = await page.evaluate((evalVar) => {
console.log('evalVar:', evalVar); // appears undefined
const urls = [];
hrefs = document.querySelectorAll('#fbar #fsl a');
hrefs.forEach(function(el) {
urls.push(el.href);
});
return urls;
})
console.log('links:', links);
} catch (err) {
console.log('ERR:', err.message);
} finally {
// browser.close();
}
})();
You have to pass the variable as an argument to the pageFunction like this:
const links = await page.evaluate((evalVar) => {
console.log(evalVar); // 2. should be defined now
…
}, evalVar); // 1. pass variable as an argument
You can pass in multiple variables by passing more arguments to page.evaluate():
await page.evaluate((a, b, c) => { console.log(a, b, c) }, a, b, c)
The arguments must either be serializable as JSON or JSHandles of in-browser objects: https://pptr.dev/#?show=api-pageevaluatepagefunction-args
I encourage you to stick to this style, because it's more convenient and readable.
let name = 'jack';
let age = 33;
let location = 'Berlin/Germany';
await page.evaluate(({name, age, location}) => {
console.log(name);
console.log(age);
console.log(location);
},{name, age, location});
Single Variable:
You can pass one variable to page.evaluate() using the following syntax:
await page.evaluate(example => { /* ... */ }, example);
Note: You do not need to enclose the variable in (), unless you are going to be passing multiple variables.
Multiple Variables:
You can pass multiple variables to page.evaluate() using the following syntax:
await page.evaluate((example_1, example_2) => { /* ... */ }, example_1, example_2);
Note: Enclosing your variables within {} is not necessary.
It took me quite a while to figure out that console.log() inside evaluate() does not show up in the Node console.
Ref: https://github.com/GoogleChrome/puppeteer/issues/1944
Everything that runs inside the page.evaluate function is executed in the context of the browser page. The script runs in the browser, not in Node.js, so anything you log shows up in the browser's console, which you will not see if you are running headless. You also can't set a Node breakpoint inside the function.
Hope this can help.
To pass a function, there are two ways you can do it.
// 1. Define it in the evaluation context
await page.evaluate(() => {
window.yourFunc = function() {...};
});
const links = await page.evaluate(() => {
const func = window.yourFunc;
func();
});
// 2. Convert the function to a serializable string (functions cannot be serialized)
const yourFunc = function() {...};
const obj = {
func: yourFunc.toString()
};
const otherObj = {
foo: 'bar'
};
const links = await page.evaluate((obj, aObj) => {
const funStr = obj.func;
const func = new Function(`return ${funStr}.apply(null, arguments)`)
func();
const foo = aObj.foo; // bar, for object
window.foo = foo;
debugger;
}, obj, otherObj);
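The second approach depends on a function's source text surviving a trip through serialization. That part can be checked in plain Node, without Puppeteer. The helper name reviveFunction is hypothetical, and the sketch assumes the function is self-contained, since variables it closes over are not carried along with its source.

```javascript
// Hypothetical helper: rebuild a function from its source text.
function reviveFunction(source) {
  // Wrapping in parentheses makes the source parse as an expression.
  return new Function(`return (${source})`)();
}

const double = function (n) { return n * 2; };

// A function-valued property is dropped by JSON serialization...
const lossy = JSON.parse(JSON.stringify({ fn: double, foo: 'bar' }));

// ...but its source string survives and can be revived on the other side.
const payload = JSON.parse(JSON.stringify({ fn: double.toString(), foo: 'bar' }));
const revived = reviveFunction(payload.fn);

console.log('fn' in lossy); // false
console.log(revived(21));   // 42
```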
You can add devtools: true to the launch options for testing.
I have a TypeScript example that could help someone new to TypeScript.
const hyperlinks: string [] = await page.evaluate((url: string, regex: RegExp, querySelect: string) => {
.........
}, url, regex, querySelect);
A slightly different version of @wolf's answer above. It makes the code much more reusable between different contexts.
// util functions
export const pipe = (...fns) => initialVal => fns.reduce((acc, fn) => fn(acc), initialVal)
export const pluck = key => obj => obj[key] || null
export const map = fn => item => fn(item)
// these variables will be cast to string, look below at fn.toString()
const updatedAt = await page.evaluate(
([selector, util]) => {
let { pipe, map, pluck } = util
pipe = new Function(`return ${pipe}`)()
map = new Function(`return ${map}`)()
pluck = new Function(`return ${pluck}`)()
return pipe(
s => document.querySelector(s),
pluck('textContent'),
map(text => text.trim()),
map(date => Date.parse(date)),
map(timeStamp => Promise.resolve(timeStamp))
)(selector)
},
[
'#table-announcements tbody td:nth-child(2) .d-none',
{ pipe: pipe.toString(), map: map.toString(), pluck: pluck.toString() },
]
)
Also note that functions passed to pipe cannot be bare method references like this:
// incorrect: querySelector loses its `this` (document) when passed as a bare reference
pipe(document.querySelector)
// should be
pipe(s => document.querySelector(s))
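The most likely reason is that querySelector is a method that needs document as its receiver, and passing it as a bare reference detaches it from that receiver. The sketch below reproduces the effect in Node with a hypothetical FakeDocument class standing in for document. In a real browser the failing call reports an illegal invocation instead, but the cause is the same. Binding the method is an alternative to wrapping it in an arrow function.

```javascript
const pipe = (...fns) => initialVal => fns.reduce((acc, fn) => fn(acc), initialVal);

// Hypothetical stand-in for `document`, so the sketch runs in Node.
class FakeDocument {
  constructor() {
    this.nodes = { '#title': 'Hello' };
  }
  querySelector(selector) {
    return this.nodes[selector] || null;
  }
}
const doc = new FakeDocument();

let bareError = null;
try {
  // Called without a receiver, so `this` is undefined inside the method.
  pipe(doc.querySelector)('#title');
} catch (err) {
  bareError = err;
}

// Both of these keep `doc` as the receiver.
const viaArrow = pipe(s => doc.querySelector(s))('#title');
const viaBind = pipe(doc.querySelector.bind(doc))('#title');

console.log(bareError instanceof TypeError, viaArrow, viaBind); // true Hello Hello
```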