Collect elements by class name and then click each one - Puppeteer

Collect elements by class name and then click each one - Puppeteer - javascript

Using Puppeteer, I would like to get all the elements on a page with a particular class name and then loop through and click each one.
Using jQuery, I can achieve this with:
var elements = $("a.showGoals").toArray();
for (i = 0; i < elements.length; i++) {
$(elements[i]).click();
}
How would I achieve this using Puppeteer?
Update
Tried out Chridam's answer below, but I couldn't get it to work (though the answer was helpful, so thanks due there), so I tried the following and this works:
await page.evaluate(() => {
let elements = $('a.showGoals').toArray();
for (i = 0; i < elements.length; i++) {
$(elements[i]).click();
}
});

Iterating puppeteer async methods in for loop vs. Array.map()/Array.forEach()
As all puppeteer methods are asynchronous it doesn't matter how we iterate over them. I've made a comparison and a rating of the most commonly recommended and used options.
For this purpose, I have created a React.Js example page with a lot of React buttons here (I just call it Lot Of React Buttons). Here (1) we are able set how many buttons to be rendered on the page; (2) we can activate the black buttons to turn green by clicking on them. I consider it an identical use case as the OP's, and it is also a general case of browser automation (we expect something to happen if we do something on the page).
Let's say our use case is:
Scenario outline: click all the buttons with the same selector
Given I have <no.> black buttons on the page
When I click on all of them
Then I should have <no.> green buttons on the page
There is a conservative and a rather extreme scenario. To click no. = 132 buttons is not a huge CPU task, no. = 1320 can take a bit of time.
I. Array.map
In general, if we only want to perform async methods like elementHandle.click in iteration, but we don't want to return a new array: it is a bad practice to use Array.map. Map method execution is going to finish before all the iteratees are executed completely because Array iteration methods execute the iteratees synchronously, but the puppeteer methods, the iteratees are: asynchronous.
Code example
const elHandleArray = await page.$$('button')
elHandleArray.map(async el => {
await el.click()
})
await page.screenshot({ path: 'clicks_map.png' })
await browser.close()
Specialties
returns another array
parallel execution inside the .map method
fast
132 buttons scenario result: ❌
Duration: 891 ms
By watching the browser in headful mode it looks like it works, but if we check when the page.screenshot happened: we can see the clicks were still in progress. It is due to the fact the Array.map cannot be awaited by default. It is only luck that the script had enough time to resolve all clicks on all elements until the browser was not closed.
1320 buttons scenario result: ❌
Duration: 6868 ms
If we increase the number of elements of the same selector we will run into the following error:
UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
II. Array.forEach
All the iteratees will be executed, but forEach is going to return before all of them finish execution, which is not the desirable behavior in many cases with async functions. In terms of puppeteer it is a very similar case to Array.map, except: for Array.forEach does not return a new array.
Code example
const elHandleArray = await page.$$('button')
elHandleArray.forEach(async el => {
await el.click()
})
await page.screenshot({ path: 'clicks_foreach.png' })
await browser.close()
Specialties
parallel execution inside the .forEach method
fast
132 buttons scenario result: ❌
Duration: 1058 ms
By watching the browser in headful mode it looks like it works, but if we check when the page.screenshot happened: we can see the clicks were still in progress.
1320 buttons scenario result: ❌
Duration: 5111 ms
If we increase the number of elements with the same selector we will run into the following error:
UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
III. page.$$eval + forEach
The best performing solution is a slightly modified version of bside's answer. The page.$$eval (page.$$eval(selector, pageFunction[, ...args])) runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction. It functions as a wrapper over forEach hence it can be awaited perfectly.
Code example
await page.$$eval('button', elHandles => elHandles.forEach(el => el.click()))
await page.screenshot({ path: 'clicks_eval_foreach.png' })
await browser.close()
Specialties
no side-effects of using async puppeteer method inside a .forEach method
parallel execution inside the .forEach method
extremely fast
132 buttons scenario result: ✅
Duration: 711 ms
By watching the browser in headful mode we see the effect is immediate, also the screenshot is taken only after every element has been clicked, every promise has been resolved.
1320 buttons scenario result: ✅
Duration: 3445 ms
Works just like in case of 132 buttons, extremely fast.
IV. for...of loop
The simplest option, not that fast and executed in sequence. The script won't go to page.screenshot until the loop is not finished.
Code example
const elHandleArray = await page.$$('button')
for (const el of elHandleArray) {
await el.click()
}
await page.screenshot({ path: 'clicks_for_of.png' })
await browser.close()
Specialties
async behavior works as expected by the first sight
execution in sequence inside the loop
slow
132 buttons scenario result: ✅
Duration: 2957 ms
By watching the browser in headful mode we can see the page clicks are happening in strict order, also the screenshot is taken only after every element has been clicked.
1320 buttons scenario result: ✅
Duration: 25 396 ms
Works just like in case of 132 buttons (but it takes more time).
Summary
Avoid using Array.map if you only want to perform async events and you aren't using the returned array, use forEach or for-of instead. ❌
Array.forEach is an option, but you need to wrap it so the next async method only starts after all promises are resolved inside the forEach. ❌
Combine Array.forEach with $$eval for best performance if the order of async events doesn't matter inside the iteration. ✅
Use a for/for...of loop if speed is not vital and if the order of the async events does matter inside the iteration. ✅
Sources / Recommended materials
Sebastien Chopin: JavaScript: async/await with forEach() (codeburst.io)
Antonio Val: Making array iteration easy when using async/await (Medium)
Using async/await with a forEach loop (Stackoverflow)
Await with array foreach containing async await (Stackoverflow)

Use page.evaluate to execute JS:
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.evaluate(() => {
let elements = document.getElementsByClassName('showGoals');
for (let element of elements)
element.click();
});
// browser.close();
});

To get all elements, you should use page.$$ method, which is the same as [...document.querySelectorAll] (spread inside an array) from reqular browser API.
Then you could loop through it (map, for, whatever you like) and evaluate each link:
const getThemAll = await page.$$('a.showGoals')
getThemAll.forEach(async link => {
await page.evaluate(() => link.click())
})
Since you also want to do actions with the things you got, I'd recommend using page.$$eval which will do the same as above and run an evaluation function afterwards with each of the elements in the array in one line. For example:
await page.$$eval('a.showGoals', links => links.forEach(link => link.click()))
To explain the line above better, $$eval returns an array of links, then it executes a callback function with the links as argument then it runs through every link via forEach method and finally execute the click function in each one.
Check the official documentation too, they have good examples there.

page.$$() / elementHandle.click()
You can use page.$$() to create an ElementHandle array based on the given selector, and then you can use elementHandle.click() to click each element:
const elements = await page.$$('a.showGoals');
elements.forEach(async element => {
await element.click();
});
Note: Remember to await the click in an async function. Otherwise, you will receive the following error:
SyntaxError: await is only valid in async function

Related

Undefined value for input fill

Follow my previous article Built method time is not a function, I managed to successfully implement the functions with an appropriate wait time by following a combination of #ggorlen's comment and #Konrad Linkowski answer, additionally, this article puppeteer: wait N seconds before continuing to the next line that #ggorlen answered in, this comment especially helped: -
Something else? Run an evaluate block and add your own code to wait for a DOM mutation or poll with setInterval or requestAnimationFrame and effectively reimplement waitForFunction as fits your needs.
Instead I incorporated waitForSelector, produces the following script:
const puppeteer = require('puppeteer')
const EXTENSION = '/Users/usr/Library/Application Support/Google/Chrome/Profile 1/Extensions/gidnphnamcemailggkemcgclnjeeokaa/1.14.4_0'
class Agent {
constructor(extension) {
this._extension = extension
}
async runBrowser() {
const browser = await puppeteer.launch({
headless:false,
devtools:true,
args:[`--disable-extensions-except=${this._extension}`,
`--load-extension=${this._extension}`,
'--enable-automation']
})
return browser
}
async getPage(twitch) {
const page = await (await this.runBrowser()).newPage()
await page.goto('chrome-extension://gidnphnamcemailggkemcgclnjeeokaa/popup.html')
const nextEvent = await page.evaluate(async () => {
document.getElementById('launch-trace').click()
})
const waitSelector = await page.waitForSelector('.popup-body')
const finalEvent = (twitch) => new Promise(async (twitch) => page.evaluate(async (twitch) => {
const input = document.getElementById('user-trace-id')
input.focus()
input.value = twitch
}))
await finalEvent(twitch)
}
}
const test = new Agent(EXTENSION)
test.getPage('test')
However, my webpage produces undefined rather than test, I am a little confused by the parameters twich and k, and how to properly assert the parameter twitch so its entered inside the function finalEvent.
Alternatively, I have also tried wrapping finalEvent into a Promise so I can assert the parameter twitch into it as a function, but this does not fill any value:
const finalEvent = (val) => new Promise(async () => await page.evaluate(async () => {
const nextTime = () => new Promise(async () => setInterval(async () => {
const input = document.getElementById('user-trace-id')
input.focus()
input.value = val
}, 3000))
//await nextTime(k)
}))
await finalEvent(twitch)

There are a few issues here. First,
const page = await (await this.runBrowser()).newPage()
hangs the browser handle and leaks memory which keeps the process alive. Always close the browser when you finish using it:
const browser = await this.runBrowser();
const page = await browser.newPage();
// ... do your work ...
await browser.close();
Here, though, Puppeteer can throw, again leaking the browser and preventing your app from cleanly exiting, so I suggest adding a try/catch block with a finally block that closes the browser.
Generally speaking, try to get the logic working first, then do a refactor to break code into functions and classes. Writing abstractions and thinking about design while you're still battling bugs and logical problems winds up making both tasks harder.
Secondly, there's no need to async a function if you never use await in it, as in:
const nextEvent = await page.evaluate(async () => {
document.getElementById('launch-trace').click()
})
Here, nextEvent is undefined because evaluate()'s callback returned nothing. Luckily, you didn't attempt to use it. You also have const waitSelector = await page.waitForSelector('.popup-body') which does return the element, but it goes unused. I suggest enabling eslint no-unused-vars, because these unused variables make a confusing situation worse and often indicate typos and bugs.
On to the main problem,
const finalEvent = (twitch) => new Promise(async (twitch) => page.evaluate(async (twitch) => {
const input = document.getElementById('user-trace-id')
input.focus()
input.value = twitch
}))
await finalEvent(twitch)
There are a number of misunderstandings here.
The first is the age-old Puppeteer gotcha, confusing which code executes in the browser process and which code executes in the Node process. Everything in an evaluate() callback (or any of its family, $eval, evaluateHandle, etc) executes in the browser, so Node variables that look like they should be in scope won't be. You have to pass and return serializable data or element handles to and from these callbacks. In this case, twitch isn't in scope of the evaluate callback. See the canonical How can I pass a variable into an evaluate function? for details.
The second misunderstanding is technically cosmetic in that you can make the code work with it, but it's a serious code smell that indicates significant confusion and should be fixed. See What is the explicit promise construction antipattern and how do I avoid it? for details, but the gist is that when you're working with a promise-based API like Puppeteer, you should never need to use new Promise(). Puppeteer's methods already return promises, so it's superfluous at best to wrap more promises on top of the them, and at worst, introduces bugs and messes up error handling.
A third issue is that the first parameter to new Promise((resolve, reject) => {}) is always a resolve function, so twitch is a confusing mislabel. Luckily, it won't matter as we'll be dispensing with the new Promise idiom when using Puppeteer 99.9% of the time.
So let's fix the code, keeping these points in mind:
await page.evaluate(twitch => {
const input = document.getElementById('user-trace-id');
input.focus();
input.value = twitch;
},
twitch
);
Note that I'm not assigning the return value to anything because there's nothing being returned by the evaluate() callback.
"Selecting, then doing something with the selected element" is such a common pattern that Puppeteer provides a handy method to shorten the above code:
await page.$eval("#user-trace-id", (input, twitch) => {
input.focus();
input.value = twitch;
},
twitch
);
Now, I can't run or reproduce your code as I don't have your extension, and I'm not sure what goal you're trying to achieve, but even the above code looks potentially problematic.
Usually, you want to use Puppeteer's page.type() method rather than a raw DOM input.value = ..., which doesn't fire any event handlers that might be attached to the input. Many inputs won't register such a change, and it's an untrusted event.
Also, it's weird that you'd have to .focus() on the input before setting its value. Usually focus is irrelevant to setting a value property, and the value will be set either way.
So there may be more work to do, but hopefully this will point you in the right direction by resolving the first layer of immediate issues at hand. If you're still stuck, I suggest taking a step back and providing context in your next question of what you're really trying to accomplish here in order to avoid an xy problem. There's a strong chance that there's a fundamentally better approach than this.

Await several async functions, but continue when one finishes

I'm trying to write a function that searches using 2 algorithms. The main function should call both main algorithm functions simultaneously, but continue as soon as one finishes. It should also stop the other function from running.
I currently have an entry function set up like this:
async function entry(code) {
let [ product_algorithm_1, product_algorithm_2 ] = await Promise.all([
get_info_algorithm_1(code),
get_info_algorithm_2(code)
])
// Here I would check which variable is not empty, and display the results
}
This works fine, but the issue is that it waits for both functions to finish before continuing. I'm trying to continue when one finishes, and deleting the others process. Does anyone know how to achieve this?

What you're looking for is Promise.race(). It will return when one of the calls has resolved. Find out more on MDN.
async function entry(code) {
let product_algorithm_fastest = await Promise.race([
get_info_algorithm_1(code),
get_info_algorithm_2(code)
])
}
Its worth noting that the returned value will be that of the promise that was "fastest"

async resolve() needs to be wrapped?

Why do I need to wrap resolve() with meaningless async function in node 10.16.0, but not in chrome? Is this node.js bug?
let shoot = async () => console.log('there shouldn\'t be race condition');
(async () => {
let c = 3;
while(c--) {
// Works also in node 10.16.0
// console.log(await new Promise(resolve => shoot = async (...args) => resolve(...args)));
// Works is chrome, but not in node 10.16.0?
console.log(await new Promise(resolve => shoot = resolve));
};
})();
(async () => {
await shoot(1);
await shoot(2);
await shoot(3);
})();

resolve() is not async
And calling resolve() (via shoot()) does not immediately triggers the related await (in the loop) - but instead queues up the event. Adding async/await gives chance to the event loop to wake up and consume the queue. In chrome await alone is enough and in node await needs to be coupled with actual async function. This kind of synchronization of tasks in not reliable and there are chances of calling the same resolve() twice.
This is an example of what NOT to do in javascript.

It’s a Node 10 (possible) bug or (probable) outdated behaviour* in the implementation of promises. According to the ECMAScript spec at the time of this writing, in
await shoot(1);
shoot(1) fulfills the promise created with new Promise(), which enqueues a job for each reaction in that promise’s fulfill reactions
await undefined (what shoot(1) returns) enqueues a job to continue after this statement, because undefined is converted to a fulfilled promise
The reaction in the promise’s fulfill reactions corresponding to the await in the first IIFE was added by PerformPromiseThen and it doesn’t involve any other jobs; it just continues inside that IIFE immediately.
In short, the next shoot = resolve should always run before execution continues after await shoot(n). The Node 12/current Chrome result is correct.
Normally, you shouldn’t come across this type of bug anyway: as I mentioned in the comments, relying on operations creating specific numbers of jobs/taking specific numbers of microticks for synchronization is bad design. If you wanted a sort of stream where each shoot() call always produces a loop iteration (even without the misleading await), something like this would be better:
let available;
(async () => {
let queue = new Queue();
while (true) {
await new Promise(resolve => {
available = value => {
resolve();
queue.enqueue(value);
};
});
available = null; // just an assertion, pretty much
while (!queue.isEmpty()) {
let value = queue.dequeue();
// process value
}
}
})();
shoot(1);
shoot(2);
shoot(3);
with an appropriate queue implementation. (Then you could look to async iterators to make consuming the queue neat.)
* not sure of the exact history here. fairly certain the ES spec used to reference microtasks, but they’re jobs now. current stable firefox matches node 10. await may take less time than it used to. this kind of thing is the reason for the following advice.

Your code is relying on a somewhat obscure timing issue involving await of a constant (a non-promise). That timing issue apparently does not behave the same in the two environments you have tested.
Each await shoot(n) is really just doing await resolve(n) which is not awaiting a promise. It's awaiting undefined since resolve() has no return value.
So, you're apparently seeing an implementation difference in event loop and promise implementation when you await a non-promise. You are apparently expecting await resolve() to somehow be asynchronous and allow your while() loop to run before running the next await shoot(n), but I'm not aware of a language requirement to do that and even if there is, it's an implementation detail that you probably should not write code that relies on.
I think it's basically just bad code design that relies on micro-details of scheduling of two jobs that are enqueued at about the same time. It's always safer to write the code in a way that enforces the proper sequencing rather than relying on micro-details of scheduler implementation - even if these details are in the specification and certainly if they are not.
node.js is perhaps more optimized or buggy (I don't know which) to not go back to the event loop when doing await on a constant. Or, if it does go back to the event loop, it goes in a prioritized fashion that keeps the current chain of code executing rather than letting other promises go next. In any case, for this code to work, it has to rely on some await someConstant behavior that isn't the same everywhere.
Wrapping resolve() forces the interpreter to go back to the event loop after each await shoot(n) because it is actually now awaiting a promise which gives the while() loop a chance to run and fill shoot with a new value before the next shoot(n) is called.

how to add a wait in time in a for ..of?

I am building a code that allows me to wait for a promise to be resolved to advance to the next, I am also adding a separate time for each wait, but only the first loop is printed, if I remove the wait in seconds this works well.
in play code I get this error, in other editors I simply print the first loop
error: Infinity loop on line 8, char 6. You can increase loop timeout in settings.
https://playcode.io/309050?tabs=console&script.js&output
in the stackoverflow code editor if the code works but in my project or in another code editor they do not work
const urls = [
'https://jsonplaceholder.typicode.com/todos/1',
'https://jsonplaceholder.typicode.com/todos/2',
'https://jsonplaceholder.typicode.com/todos/3'
];
async function getTodos() {
for (const [idx, url] of urls.entries()) {
const todo = await fetch(url);
console.log(`Received Todo ${idx+1}:`, todo);
await wait(1000)
}
console.log('Finished!');
}
getTodos();
function wait(ms) {
return new Promise(r => setTimeout(r, ms));
}

Some online code playgrounds, like CodePen, have built-in infinite-loop detection. This is to prevent the UI from freezing in the event that you accidentally write an infinite loop while scratch coding. This is usually achieved by statically analyzing your code. Because of this, it cannot tell what your code actually does at runtime (and it may not be infinite at all). The best they can do is guess based on structure.
In your case, your code editor thinks that you're doing an infinite loop. There's probably settings in your editor to disable that.

Why doesn't await block lazily?

I'm interested to understand why using await immediately blocks, instead of lazily blocking only once the awaited value is referenced.
I hope this is an illuminating/important question, but I'm worried it may be considered out of scope for this site - if it is I'll apologize, have a little cry, and take the question elsewhere.
E.g. consider the following:
let delayedValue = (value, ms=1000) => {
return new Promise(resolve => {
setTimeout(() => resolve(val), ms);
});
};
(async () => {
let value = await delayedValue('val');
console.log('After await');
})();
In the anonymous async function which immediately runs, we'll only see the console say After await after the delay. Why is this necessary? Considering that we don't need value to be resolved, why haven't the language designers decided to execute the console.log statement immediately in a case like this?
For example, it's unlike the following example where delaying console.log is obviously unavoidable (because the awaited value is referenced):
(async () => {
let value = await delayedValue('val');
console.log('After await: ' + value);
});
I see tons of advantages to lazy await blocking - it could lead to automatic parallelization of unrelated operations. E.g. if I want to read two files and then work with both of them, and I'm not being dilligent, I'll write the following:
(async() => {
let file1 = await readFile('file1dir');
let file2 = await readFile('file2dir');
// ... do things with `file1` and `file2` ...
});
This will wait to have read the 1st file before beginning to read the 2nd. But really they could be read in parallel, and javascript ought to be able to detect this because file1 isn't referenced until later. When I was first learning async/await my initial expectation was that code like the above would result in parallel operations, and I was a bit disappointed when that turned out to be false.
Getting the two files to read in parallel is STILL a bit messy, even in this beautiful world of ES7 because await blocks immediately instead of lazily. You need to do something like the following (which is certainly messier than the above):
(async() => {
let [ read1, read2] = [ readFile('file1dir'), readFile('file2dir') ];
let [ file1, file2 ] = [ await read1, await read2];
// ... do things with `file1` and `file2` ...
});
Why have the language designers chosen to have await block immediately instead of lazily?
E.g. Could it lead to debugging difficulties? Would it be too difficult to integrate lazy awaits into javascript? Are there situations I haven't thought of where lazy awaits lead to messier code, poorer performance, or other negative consequences?

Reason 1: JavaScript is not lazily evaluated. There is no infrastructure to detect "when a value is actually needed".
Reason 2: Implicit parallelism would be very hard to control. What if I wanted my files to be read sequentially, how would I write that? Changing little things in the syntax should not lead to very different evaluation.
Getting the two files to read in parallel is STILL a bit messy
Not at all:
const [file1, file2] = await Promise.all([readFile('file1dir'), readFile('file2dir')]);
You need to do something like the following
No, you absolutely shouldn't write it like that.

Develop Reference

JavaScript is the programming language of the Web.