Consider this really simple example:
class MyClass {
public add(num: number): number {
return num + 2;
}
}
const result = await page.evaluate((NewInstance) => {
console.log("typeof instance", typeof NewInstance); // undefined
const d = new NewInstance();
console.log("result", d.add(10));
return d.add(10);
}, MyClass);
I've tried everything I could think of. The main reason I want to use a class here is that there's a LOT of code I don't want to include directly inside the evaluate callback. It's messy and hard to keep track of, so I wanted to move all the logic into a class to make it easier to understand what's going on.
Is this possible?
It's possible, but not necessarily great design, depending on what you're trying to do. It's hard to suggest the best solution without knowing the actual use case, so I'll just provide options and let you make the decision.
One approach is to stringify the class (either by hand or with .toString()) or put it in a separate file, then addScriptTag:
const puppeteer = require("puppeteer"); // ^19.6.3
class MyClass {
add(num) {
return num + 2;
}
}
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(
"https://www.example.com",
{waitUntil: "domcontentloaded"}
);
await page.addScriptTag({content: MyClass.toString()});
const result = await page.evaluate(() => new MyClass().add(10));
console.log(result); // => 12
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
See this answer for more examples.
Something like eval is also feasible. If it looks scary, consider that anything you put into a page.evaluate() or page.addScriptTag() is effectively the same thing as far as security goes.
const result = await page.evaluate(MyClassStringified => {
const MyClass = eval(`(${MyClassStringified})`);
return new MyClass().add(10);
}, MyClass.toString());
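To see why the class shows up as undefined in the question while the stringified version works, the round trip can be sketched in plain Node with no browser involved. This is only an approximation: it assumes Puppeteer's argument passing behaves roughly like JSON serialization for this case, and the names asJson, source and Revived are made up for illustration.

```javascript
class MyClass {
  add(num) {
    return num + 2;
  }
}

// A class is a function, and functions have no JSON representation,
// so nothing usable is left after serialization.
const asJson = JSON.stringify(MyClass);

// The source text, on the other hand, is an ordinary string...
const source = MyClass.toString();

// ...which can be evaluated back into a working class on the other side.
const Revived = eval(`(${source})`);
const roundTripResult = new Revived().add(10);

console.log(asJson); // => undefined
console.log(roundTripResult); // => 12
```

The same reasoning applies to any function or class instance: only plain data (or a handle) survives the trip into the page.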
Many other patterns are also possible, like exposing your library via exposeFunction if the logic is Node-based rather than browser-based.
That said, defining the class inside an evaluate may not be as bad as you think:
const addTonsOfCode = () => {
window.MyClass = class { // attach to window so later evaluate() calls can see it
add(num) {
return num + 2;
}
}
// ... tons of code ...
};
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(
"https://www.example.com",
{waitUntil: "domcontentloaded"}
);
await page.evaluate(addTonsOfCode);
const result = await page.evaluate(() => new MyClass().add(10));
console.log(result); // => 12
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
I'd prefer to namespace this all into a library:
const addTonsOfCode = () => {
class MyClass {
add(num) {
return num + 2;
}
}
// ... tons of code ...
window.MyLib = {
MyClass,
// ...
};
};
Then use with:
await page.evaluate(addTonsOfCode);
await page.evaluate(() => new MyLib.MyClass().add(10));
I tested iterations with puppeteer in a small case. I have already read that the common reason for puppeteer disconnections is that the Node script doesn't wait for the puppeteer actions to finish. So I converted all functions in my snippet into async functions, but it didn't help.
If the small case with six iterations works, I will implement it in my current project with around 50 iterations.
'use strict';
const puppeteer = require('puppeteer');
const arrIDs = [8322072, 1016816, 9312604, 1727088, 9312599, 8477729];
const call = async () => {
await puppeteer.launch().then(async (browser) => {
arrIDs.forEach(async (id, index, arr) => {
await browser.newPage().then(async (page) => {
await page.goto(`http://somelink.com/${id}`).then(async () => {
await page.$eval('div.info > table > tbody', async (heading) => {
return heading.innerText;
}).then(async (result) => {
await browser.close();
console.log(result);
});
});
});
});
});
};
call();
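forEach does not wait for the promises returned by its async callback, so every iteration starts at once and the first one to finish closes the browser while the other pages are still loading. Replace forEach with a plain for...of loop: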
const arrIDs = [8322072, 1016816, 9312604, 1727088, 9312599, 8477729];
const page = await browser.newPage();
for (let id of arrIDs){
await page.goto(`http://somelink.com/${id}`);
let result = await page.$eval('div.info > table > tbody', heading => heading.innerText).catch(e => void e);
console.log(result);
}
await browser.close()
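The difference can be reproduced without a browser. The following is a minimal sketch in which a timer stands in for page.goto; sleep, withForEach and withForOf are hypothetical names used only for this illustration.

```javascript
// Stand-in for slow async work such as page.goto (assumption for the demo)
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withForEach(ids) {
  const log = [];
  // forEach ignores the promise each async callback returns
  ids.forEach(async id => {
    await sleep(5);
    log.push(id);
  });
  log.push("loop finished");
  return [...log]; // snapshot of the log at the moment the function returns
}

async function withForOf(ids) {
  const log = [];
  // for...of pauses at each await before moving to the next id
  for (const id of ids) {
    await sleep(5);
    log.push(id);
  }
  log.push("loop finished");
  return [...log];
}

withForEach([1, 2, 3]).then(log => console.log("forEach:", log));
// => forEach: [ 'loop finished' ]
withForOf([1, 2, 3]).then(log => console.log("for...of:", log));
// => for...of: [ 1, 2, 3, 'loop finished' ]
```

With forEach, the code after the loop runs before any of the work has completed, which is what lets browser.close() fire too early in the question.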
The way you've formatted and nested everything seems like some incarnation of callback hell.
Here's my suggestion. It's untested, but the structure will work better with async/await:
const puppeteer = require("puppeteer");
const chromium_path_706915 =
"706915/chrome.exe";
const arrIDs = [8322072, 1016816, 9312604, 1727088, 9312599, 8477729];
async function Run() {
// for...of waits for each navigation to finish before starting the next
for (const id of arrIDs) {
await Navigate(`http://somelink.com/${id}`);
}
}
async function Navigate(url) {
const browser = await puppeteer.launch({
executablePath: chromium_path_706915,
args: ["--auto-open-devtools-for-tabs"],
headless: false
});
const page = await browser.newPage();
const response = await page.goto(url);
const result = await page.$$eval("div.info > table > tbody", result =>
result.map(ele2 => ({
etd: ele2.innerText.trim()
}))
);
await browser.close();
console.log(result);
}
Run().catch(err => console.error(err));
On top of the other answers, I want to point out that async and forEach loops don't exactly play as expected. One possible solution is having a custom implementation that supports this:
Utility function:
async function asyncForEach(array: Array<any>, callback: any) {
for (let index = 0; index < array.length; index++) {
await callback(array[index], index, array);
}
}
Example usage:
const start = async () => {
await asyncForEach([1, 2, 3], async (num) => {
await waitFor(50);
console.log(num);
});
console.log('Done');
}
start();
Going through this article by Sebastien Chopin can help make it a bit more clear as to why async/await and forEach act unexpectedly. Here it is as a gist.
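The usage example above calls a waitFor helper that isn't shown. Here is a runnable plain-JavaScript sketch, assuming waitFor is just a setTimeout-based delay (that definition is a guess, not taken from the answer). The delays are deliberately longest for the first item to show that asyncForEach still processes the items in order.

```javascript
// Assumed implementation of the waitFor helper used in the answer
const waitFor = ms => new Promise(resolve => setTimeout(resolve, ms));

async function asyncForEach(array, callback) {
  for (let index = 0; index < array.length; index++) {
    // Each callback must settle before the next one starts
    await callback(array[index], index, array);
  }
}

const start = async () => {
  const seen = [];
  await asyncForEach([1, 2, 3], async num => {
    await waitFor(30 - num * 10); // 20 ms, 10 ms, 0 ms
    seen.push(num);
  });
  seen.push("Done");
  return seen;
};

start().then(seen => console.log(seen)); // => [ 1, 2, 3, 'Done' ]
```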
I'm just trying to understand the benefits of this:
const populateUsers = done => {
User.remove({}).then(async () => {
const userOne = new User(users[0]).save();
const userTwo = new User(users[1]).save();
const usersProm = await Promise.all([userOne, userTwo]).then(() => done());
return usersProm;
});
};
over this:
const populateUsers = done => {
User.remove({})
.then(() => {
const userOne = new User(users[0]).save();
const userTwo = new User(users[1]).save();
return Promise.all([userOne, userTwo]);
})
.then(() => done());
};
I ran into this because eslint suggested that I use async in this function. I remember the concept and got it working in my app, but I'm not sure why I should use this instead of the original way.
Your original code was totally fine.
No, there is no benefit in using the code from your first snippet. You should avoid mixing await and .then(…) syntax! To use async/await, you'd make the whole function async, not the then callback:
async function populateUsers(done) {
await User.remove({})
const userOne = new User(users[0]).save();
const userTwo = new User(users[1]).save();
await Promise.all([userOne, userTwo]);
return done();
}
(Probably you would also remove that done callback - the function already returns a promise)
Your first version does not go all the way. Do this:
const populateUsers = done => {
User.remove({}).then(async () => {
const userOne = new User(users[0]).save();
const userTwo = new User(users[1]).save();
await Promise.all([userOne, userTwo]);
const usersProm = await done();
return usersProm;
});
};
There is no difference, it is just that code without these then callbacks is somewhat easier to read.
You might even apply it to the outer function:
const populateUsers = async () => {
await User.remove({});
const userOne = new User(users[0]).save();
const userTwo = new User(users[1]).save();
// no done callback here: the returned promise signals completion
await Promise.all([userOne, userTwo]);
};
Now populateUsers returns the promise instead of undefined.
As concluded in the comments: you get an error because populateUsers returns a promise and also accepts a done callback argument, while only one of these is expected, not both.
I have a bunch of async functions, that I always or nearly always want to call synchronously. So we all know the pattern
async function somethingcool() {
return new Promise(resolve => {
setTimeout(resolve, 1000, "Cool Thing");
});
}
const coolthing = await somethingcool();
console.log(coolthing);
But I have this cool module called manycoolthings which offers many cool things, all via async functions that I always or nearly always want to await.
import * as cool from 'manycoolthings';
await cool.updateCoolThings();
const coolThing = await cool.aCoolThing();
const anotherCoolThing = await cool.anotherCoolThing();
const rus = await cool.coolThingsAreUs();
await cool.sendCoolThings();
await cool.postCoolThing(myCoolThing);
await cool.moreCoolThings();
const thingsThatAreCool = await cool.getThingsThatAreCool();
Extremely contrived and silly example, to illustrate the point. I do have a genuine use case, a set of tests based on puppeteer where most functions are async and they almost always want to be awaited on.
There must be a better way to avoid all the await pollution of our JavaScript code.
It would be great if I could do something like
import * as cool from 'manycoolthings';
await {
cool.updateCoolThings();
const coolThing = cool.aCoolThing();
const anotherCoolThing = cool.anotherCoolThing();
const rus = cool.coolThingsAreUs();
cool.sendCoolThings();
cool.postCoolThing(myCoolThing);
cool.moreCoolThings();
const thingsThatAreCool = cool.getThingsThatAreCool();
}
Or even just
import * as cool from 'manycoolthings';
cool.updateCoolThings();
const coolThing = cool.aCoolThing();
const anotherCoolThing = cool.anotherCoolThing();
const rus = cool.coolThingsAreUs();
cool.sendCoolThings();
cool.postCoolThing(myCoolThing);
cool.moreCoolThings();
const thingsThatAreCool = cool.getThingsThatAreCool();
without having to worry if the method being called is async or not, because it's defined as an auto await function or something.
If you're unhappy with multiple awaits or thens, you can make a little "sequence" helper:
let seq = (...fns) => fns.reduce((p, f) => p.then(f), Promise.resolve(null))
and use it like this:
result = await seq(
_ => cool.updateCoolThings(),
_ => _.aCoolThing(),
_ => _.anotherCoolThing(),
_ => _.coolThingsAreUs(),
)
which is almost your snippet #2.
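To make the data flow of such a helper explicit, here is a self-contained sketch. The helper is named runInSequence here, and fetchNumber, double and label are invented stand-ins for the library calls. Each function receives the resolved value of the previous one, and the first receives null.

```javascript
// Chains the functions with .then, so each one waits for the one before it
const runInSequence = (...fns) =>
  fns.reduce((promise, fn) => promise.then(fn), Promise.resolve(null));

// Hypothetical steps: a mix of async and plain functions works the same way
const fetchNumber = async () => 10;
const double = async n => n * 2;
const label = n => `value: ${n}`;

const sequenceResult = runInSequence(fetchNumber, double, label);
sequenceResult.then(text => console.log(text)); // => value: 20
```

Note that only the last function's result comes back from the helper, so intermediate values have to be captured inside the steps if they are needed later.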
I have some trouble using the newest version of puppeteer.
I'm using puppeteer version 0.13.0.
I have a site with this element:
<div class="header">hey there</div>
I'm trying to run this code:
const headerHandle = await page.evaluateHandle(() => {
const element = document.getElementsByClassName('header');
return element;
});
Now the headerHandle is a JSHandle with a description: 'HTMLCollection(0)'.
If I try to run
headerHandle.getProperties() and try to console.log I get Promise { <pending> }.
If I just try to get the element like this:
const result = await page.evaluate(() => {
const element = document.getElementsByClassName('header');
return Promise.resolve(element);
});
I get an empty object.
How do I get the actual element or the value of the element?
Puppeteer has changed the way evaluate works. The safest way to retrieve DOM elements is to create a JSHandle and pass that handle to the evaluate function:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com', { waitUntil: 'networkidle2' });
const jsHandle = await page.evaluateHandle(() => {
const elements = document.getElementsByTagName('h1');
return elements;
});
console.log(jsHandle); // JSHandle
const result = await page.evaluate(els => els[0].innerHTML, jsHandle);
console.log(result); // it will log the string 'Example Domain'
await browser.close();
})();
For reference: evaluate docs, issue #1590, issue #1003 and PR #1098
Fabio's approach is good to have for working with arrays, but in many cases you don't need the nodes themselves, just their serializable contents or properties. In OP's case, there's only one element being selected, so the following works more directly (with less straightforward approaches shown for comparison):
const puppeteer = require("puppeteer"); // ^19.1.0
const html = `<!DOCTYPE html><html><body>
<div class="header">hey there</div>
</body></html>`;
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
const text = await page.$eval(".header", el => el.textContent);
console.log(text); // => hey there
// or, less directly:
const text2 = await page.evaluate(() => {
// const el = document.getElementsByClassName("header")[0] // take the 0th element
const el = document.querySelector(".header"); // ... better still
return el.textContent;
});
console.log(text2); // => hey there
// even less directly, similar to OP:
const handle = await page.evaluateHandle(() =>
document.querySelector(".header")
);
const text3 = await handle.evaluate(el => el.textContent);
console.log(text3); // => hey there
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Getting the text from multiple elements is also straightforward, not requiring handles:
const html = `<!DOCTYPE html><html><body>
<div class="header">foo</div>
<div class="header">bar</div>
<div class="header">baz</div>
</body></html>`;
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
const text = await page.$$eval(
".header",
els => els.map(el => el.textContent)
);
console.log(text);
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
As Fabio's approach attests, things get trickier when you want to work with handles to multiple elements in Puppeteer. Unlike the ElementHandle[] returned by page.$$, the JSHandle returned by page.evaluateHandle isn't iterable, even if the handle points to an array. It can only be expanded into an array back in the browser.
One workaround is to return the length, optionally attach the selector array to the window (or re-query it multiple times), then run a loop and call evaluateHandle to return each ElementHandle:
// ...
await page.setContent(html);
const length = await page.$$eval(".header", els => {
window.els = els;
return els.length;
});
const nodes = [];
for (let i = 0; i < length; i++) {
nodes.push(await page.evaluateHandle(i => window.els[i], i));
}
// now you can loop:
for (const el of nodes) {
console.log(await el.evaluate(el => el.textContent));
}
// ...
See also Puppeteer find list of shadowed elements and get list of ElementHandles which, in spite of the shadow DOM in the title, is mostly about working with arrays of handles.
I'm using puppeteer which is a NodeJS module that controls chrome.
It has 2 functions to initiate a new browser and a new page.
const browser = await puppeteer.launch() and browser.newPage()
I want to create a class for creating a new page and new browser.
This is the old way I was doing it, without classes, it works but it doesn't allow me to create new pages. This is why I want to move to using classes.
let chrome = {}
chrome.init = async (options) => {
chrome.browser = await puppeteer.launch(options)
chrome.page = await chrome.browser.newPage()
}
chrome.pageContains = async (string) => {
return await chrome.page.evaluate( (string) => {
const regex = new RegExp(string, 'i')
return regex.test( document.querySelector('body').innerText )
}, string)
}
module.exports = chrome
Here's my new code but I have no idea what I'm doing, it's obviously wrong and makes no sense.
chrome.init = async (options) => {
return {browser: await new Chrome(options)
}
class Chrome {
async constructor(options) {
this.browser = await puppeteer.launch(options)
}
newPage() {
return await this.browser.newPage()
}
}
class Page {
async constructor() {
this.page = await chrome.browser.newPage()
}
}
So how do I make my old code work using classes instead of an object?
You have some typos, and constructors cannot be async.
Other than that, you simply have to pass the right browser instance to the Page class. You can extend Chrome with Page and use super, or keep them separate, but the page must have access to the browser at some point.
First, we will launch the browser and return it from the constructor. Because the constructor returns the promise from puppeteer.launch(), awaiting new Chrome() resolves to the browser.
const puppeteer = require('puppeteer');
class Chrome {
constructor(options) {
this.browser = puppeteer.launch(options);
return this.browser;
}
}
Then we pass it to the page constructor and use it from there.
class Page {
constructor(browser) {
this.browser = browser;
this.page = browser.newPage();
return this.page;
}
async pageContains(string){
// this.page holds the promise returned by browser.newPage(), so resolve it first
const page = await this.page;
return page.evaluate( (string) => {
const regex = new RegExp(string, 'i')
return regex.test( document.querySelector('body').innerText )
}, string)
}
}
Then we instantiate both and return the browser and page objects so they can be used elsewhere.
const getChrome = async (options) => {
const browser = await new Chrome(options);
const page = await new Page(browser);
return { browser, page }
}
Now we can use them.
(async ()=>{
const page = (await getChrome({headless: false})).page;
await page.goto('http://example.com');
})()
I am pretty sure it can be refactored and what I wrote here is not the best practice, but this will get you going.
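One possible refactoring, offered as a sketch rather than the definitive way to do it, is a static async factory: the constructor stays synchronous and only stores values, while a static method does the awaiting. Passing the launcher in as a parameter is an assumption made here so the classes can be exercised without a real browser; in practice it would be the puppeteer module.

```javascript
class Page {
  constructor(page) {
    this.page = page; // an already-resolved Puppeteer page
  }

  pageContains(string) {
    // The callback runs in the browser, so document is available there
    return this.page.evaluate(string => {
      const regex = new RegExp(string, "i");
      return regex.test(document.querySelector("body").innerText);
    }, string);
  }
}

class Chrome {
  constructor(browser) {
    this.browser = browser; // an already-resolved Puppeteer browser
  }

  // Static async factory: await first, then call the synchronous constructor
  static async launch(launcher, options) {
    const browser = await launcher.launch(options);
    return new Chrome(browser);
  }

  // Can be called as many times as needed to open more pages
  async newPage() {
    return new Page(await this.browser.newPage());
  }
}

// Usage inside an async function (requires the puppeteer package):
//   const chrome = await Chrome.launch(require("puppeteer"), { headless: false });
//   const page = await chrome.newPage();
//   await page.page.goto("http://example.com");
//   console.log(await page.pageContains("example"));
```

Unlike returning a promise from the constructor, this keeps new Chrome(...) and new Page(...) returning real instances, so methods such as pageContains stay reachable.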