I have a simple Puppeteer script:
await page.goto(MY_URL, {waitUntil: 'load'});
const html = await page.evaluate(() => document.body.innerHTML);
Then I check whether html contains some key strings, and this part always passes (I mention it in case it could somehow influence what follows).
After that, I wait for a function to be defined on the window object.
await page.waitForFunction(() => 'myFunction' in window);
This function is defined at the bottom of a script that the page loads through a <script> tag in <head>.
Most of the time waitForFunction resolves as it should, but sometimes it doesn't.
When I pass {timeout: 0}, it waits forever and never resolves.
It also seems that this happens only in headless mode.
What could cause this behavior? How can I work around or debug it?
It seems that the JavaScript files sometimes weren't loaded.
The solution for me was:
await page.goto(MY_URL, {waitUntil: 'networkidle2'});
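To narrow down a case like this, it can help to log failed requests and page errors, and to use a finite timeout so that a missing script shows up as an error instead of an endless wait. The following is a minimal sketch, assuming page is a Puppeteer Page; the helper name gotoAndWaitForGlobal and the 10-second default are illustrative choices, not part of the original script:

```javascript
// Hypothetical helper: navigate, then wait for a global to appear on window.
async function gotoAndWaitForGlobal(page, url, globalName, timeout = 10000) {
  // Log anything that could explain a script that never loaded.
  page.on('requestfailed', request => {
    const failure = request.failure();
    console.error('request failed:', request.url(), failure ? failure.errorText : '');
  });
  page.on('pageerror', error => console.error('page error:', error.message));

  // networkidle2: no more than 2 network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2' });

  // The name is passed as an argument because the callback runs in the browser.
  await page.waitForFunction(name => name in window, { timeout }, globalName);
}
```

It would be called as await gotoAndWaitForGlobal(page, MY_URL, 'myFunction'); if the function never appears, waitForFunction rejects after the timeout and the logged errors may hint at why.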
Following up on my previous post, Built method time is not a function: I managed to implement the functions with an appropriate wait time by combining @ggorlen's comment and @Konrad Linkowski's answer. The post puppeteer: wait N seconds before continuing to the next line, which @ggorlen answered, also helped, especially this part:
Something else? Run an evaluate block and add your own code to wait for a DOM mutation or poll with setInterval or requestAnimationFrame and effectively reimplement waitForFunction as fits your needs.
Instead, I incorporated waitForSelector, which produced the following script:
const puppeteer = require('puppeteer')
const EXTENSION = '/Users/usr/Library/Application Support/Google/Chrome/Profile 1/Extensions/gidnphnamcemailggkemcgclnjeeokaa/1.14.4_0'
class Agent {
  constructor(extension) {
    this._extension = extension
  }
  async runBrowser() {
    const browser = await puppeteer.launch({
      headless:false,
      devtools:true,
      args:[`--disable-extensions-except=${this._extension}`,
        `--load-extension=${this._extension}`,
        '--enable-automation']
    })
    return browser
  }
  async getPage(twitch) {
    const page = await (await this.runBrowser()).newPage()
    await page.goto('chrome-extension://gidnphnamcemailggkemcgclnjeeokaa/popup.html')
    const nextEvent = await page.evaluate(async () => {
      document.getElementById('launch-trace').click()
    })
    const waitSelector = await page.waitForSelector('.popup-body')
    const finalEvent = (twitch) => new Promise(async (twitch) => page.evaluate(async (twitch) => {
      const input = document.getElementById('user-trace-id')
      input.focus()
      input.value = twitch
    }))
    await finalEvent(twitch)
  }
}
const test = new Agent(EXTENSION)
test.getPage('test')
However, my webpage shows undefined rather than test. I am a little confused by the parameters twitch and k, and by how to properly pass the parameter twitch so that it is available inside the function finalEvent.
Alternatively, I have also tried wrapping finalEvent in a Promise so that I can pass the parameter twitch into it as a function argument, but this does not fill in any value:
const finalEvent = (val) => new Promise(async () => await page.evaluate(async () => {
  const nextTime = () => new Promise(async () => setInterval(async () => {
    const input = document.getElementById('user-trace-id')
    input.focus()
    input.value = val
  }, 3000))
  //await nextTime(k)
}))
await finalEvent(twitch)
There are a few issues here. First,
const page = await (await this.runBrowser()).newPage()
hides the browser handle, so nothing can close the browser later; this leaks memory and keeps the process alive. Always close the browser when you finish using it:
const browser = await this.runBrowser();
const page = await browser.newPage();
// ... do your work ...
await browser.close();
Here, though, Puppeteer can throw, again leaking the browser and preventing your app from cleanly exiting, so I suggest adding a try/catch block with a finally block that closes the browser.
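The try/finally idea can be sketched as a small wrapper. This is only an illustration: withBrowser is a hypothetical name, and launch stands for any function that returns a promise for a browser, for example () => puppeteer.launch({headless: false}).

```javascript
// Hypothetical helper: launch a browser, run `work`, and always close the browser.
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    // Errors thrown by `work` still propagate to the caller.
    return await work(browser);
  } finally {
    // Runs whether `work` resolved or threw.
    await browser.close();
  }
}
```

Usage would look like await withBrowser(() => puppeteer.launch(), async browser => { const page = await browser.newPage(); /* ... */ }). A catch block can be added around the call if the error should be logged or handled rather than rethrown.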
Generally speaking, try to get the logic working first, then do a refactor to break code into functions and classes. Writing abstractions and thinking about design while you're still battling bugs and logical problems winds up making both tasks harder.
Secondly, there's no need to make a function async if you never use await in it, as in:
const nextEvent = await page.evaluate(async () => {
  document.getElementById('launch-trace').click()
})
Here, nextEvent is undefined because evaluate()'s callback returned nothing. Luckily, you didn't attempt to use it. You also have const waitSelector = await page.waitForSelector('.popup-body') which does return the element, but it goes unused. I suggest enabling eslint no-unused-vars, because these unused variables make a confusing situation worse and often indicate typos and bugs.
On to the main problem,
const finalEvent = (twitch) => new Promise(async (twitch) => page.evaluate(async (twitch) => {
  const input = document.getElementById('user-trace-id')
  input.focus()
  input.value = twitch
}))
await finalEvent(twitch)
There are a number of misunderstandings here.
The first is the age-old Puppeteer gotcha, confusing which code executes in the browser process and which code executes in the Node process. Everything in an evaluate() callback (or any of its family, $eval, evaluateHandle, etc) executes in the browser, so Node variables that look like they should be in scope won't be. You have to pass and return serializable data or element handles to and from these callbacks. In this case, twitch isn't in scope of the evaluate callback. See the canonical How can I pass a variable into an evaluate function? for details.
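To see why outer variables disappear, here is a simplified model of what happens to an evaluate() callback. It is not Puppeteer's actual code: fakeEvaluate and demo are made-up names, and the model only assumes that the function travels as source text and the arguments as JSON.

```javascript
// Simplified stand-in for page.evaluate(): the function is turned into source
// text and rebuilt elsewhere, so it loses access to the variables around it.
// Arguments make the trip as JSON. This is a model, not Puppeteer's real code.
function fakeEvaluate(fn, ...args) {
  const source = fn.toString();
  const rebuilt = new Function(`return (${source})`)();
  return rebuilt(...JSON.parse(JSON.stringify(args)));
}

function demo() {
  const twitch = 'test';

  let closureResult;
  try {
    // Relies on the surrounding scope, which the rebuilt function cannot see.
    closureResult = fakeEvaluate(() => twitch);
  } catch (err) {
    closureResult = err.name;
  }

  // Passing the value explicitly works because arguments are serialized.
  const argumentResult = fakeEvaluate(value => value, twitch);

  return { closureResult, argumentResult };
}
```

In this model, demo() reports a ReferenceError for the closure version and 'test' for the argument version, which mirrors the difference between relying on scope and passing twitch as an extra argument to evaluate().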
The second misunderstanding is technically cosmetic in that you can make the code work with it, but it's a serious code smell that indicates significant confusion and should be fixed. See What is the explicit promise construction antipattern and how do I avoid it? for details, but the gist is that when you're working with a promise-based API like Puppeteer, you should never need to use new Promise(). Puppeteer's methods already return promises, so it's superfluous at best to wrap more promises on top of them, and at worst, it introduces bugs and messes up error handling.
A third issue is that the first parameter to new Promise((resolve, reject) => {}) is always a resolve function, so twitch is a confusing mislabel. Luckily, it won't matter as we'll be dispensing with the new Promise idiom when using Puppeteer 99.9% of the time.
So let's fix the code, keeping these points in mind:
await page.evaluate(twitch => {
  const input = document.getElementById('user-trace-id');
  input.focus();
  input.value = twitch;
}, twitch);
Note that I'm not assigning the return value to anything because there's nothing being returned by the evaluate() callback.
"Selecting, then doing something with the selected element" is such a common pattern that Puppeteer provides a handy method to shorten the above code:
await page.$eval("#user-trace-id", (input, twitch) => {
  input.focus();
  input.value = twitch;
}, twitch);
Now, I can't run or reproduce your code as I don't have your extension, and I'm not sure what goal you're trying to achieve, but even the above code looks potentially problematic.
Usually, you want to use Puppeteer's page.type() method rather than a raw DOM input.value = ..., which doesn't fire any event handlers that might be attached to the input. Many inputs won't register such a change, and it's an untrusted event.
Also, it's weird that you'd have to .focus() on the input before setting its value. Usually focus is irrelevant to setting a value property, and the value will be set either way.
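A sketch of the page.type() route mentioned above, assuming page is a Puppeteer Page; the helper name typeInto is hypothetical, and whether the extension's input reacts to typed keys is untested here:

```javascript
// Hypothetical helper: wait for an input, then type into it like a user would.
async function typeInto(page, selector, text) {
  await page.waitForSelector(selector);
  // page.type() focuses the element and sends key events for each character,
  // so handlers attached to the input get a chance to run.
  await page.type(selector, text);
}
```

For the code in the question this would be await typeInto(page, '#user-trace-id', twitch), which also makes the separate focus() call unnecessary.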
So there may be more work to do, but hopefully this will point you in the right direction by resolving the first layer of immediate issues at hand. If you're still stuck, I suggest taking a step back and providing context in your next question of what you're really trying to accomplish here in order to avoid an xy problem. There's a strong chance that there's a fundamentally better approach than this.
I'm getting started with puppeteer but have minimal node experience. I'm interested in debugging and trying out pieces of code in a REPL loop. So far I have the following:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto('https://yahoo.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
I tried to take a second screenshot by entering:
page.screenshot({path: 'example1.png'});
but this returns a promise. Is there a way to evaluate the result within the REPL loop?
EDIT:
I entered both lines into the REPL at the bottom of the debug console, the output is in the screenshot. Am I doing something wrong?
EDIT2:
I entered your code into the debug window REPL at the bottom of the debug console, the output is in the screenshot.
If you want to play with the result in the REPL, you'd have to do something like this:
var res; page.screenshot({path: 'example1.png'}).then(r => {res=r;console.log('done')});
Once the done string is printed, you'll have the result in your res variable, so you can play with it.
Step by Step:
var res is declaring an empty variable so you can use it later
page.screenshot({path: 'example1.png'}) will return a promise, hence the .then right after it
.then receives a function that will be called asynchronously when the promise is resolved. That function will be called with some input.
r => {res=r;console.log('done')} this is the anonymous function passed to the then of that promise. The argument passed by page.screenshot(...).then will be stored in the r param/variable.
res=r; this sets to the res variable the argument sent from the resolution of the promise.
console.log('done') is just there so you will know when the promise is resolved.
after that, you can inspect the res variable to understand in detail whatever is there.
You can use this approach to debug ANY kind of promise resolution, since in the console (at the moment) we don't have the ability to run something like var res = await page.screenshot({path: 'example1.png'})
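The same trick can be wrapped in a small helper so it doesn't have to be retyped for every promise. This is only a sketch; capture and the shape of the returned object are made up for illustration:

```javascript
// Hypothetical REPL helper: stores the outcome of a promise on a plain object.
function capture(promise, label = 'done') {
  const box = { settled: false, value: undefined, error: undefined };
  promise.then(
    value => {
      box.value = value;
      box.settled = true;
      console.log(label); // signals that box.value can be inspected now
    },
    error => {
      box.error = error;
      box.settled = true;
      console.log(`${label} (rejected)`);
    }
  );
  return box;
}
```

In the REPL you would enter var shot = capture(page.screenshot({path: 'example1.png'})) and, once done is printed, inspect shot.value or shot.error.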
At the start of the Puppeteer tutorial, it says to do this:
const puppeteer = require('puppeteer');
(async () =>
{
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();
This seems odd to me as the whole thing is wrapped inside an asynchronous function. What if I want to wait until this finishes to continue?
Edit - Why this seems odd to me:
What if all my code relied on the browser, i.e., there is nothing I can do outside this async function. Then my code would look like this:
//nothing up here
(async () =>
{
//EVERYTHING is in here
})();
//nothing down here
This seems weird because I might as well do everything synchronously instead of wrapping my entire program in an async function.
Reason for the async function
You need to wrap the code containing await instructions inside an async function for backwards compatibility reasons. Before async/await was added to the language in ES2017, you could use the word await as a variable or function name, meaning this was valid code:
var await = 123;
console.log(await);
To not mess with existing code, the await keyword only works inside of async functions, meaning to write code like await page.goto(..) you have to put it inside an async function like the one you are using.
Waiting for the code to finish
To wait until the code has finished, you can just continue after the last await statement like this:
(async () => {
  // ...
  await browser.close();
  // continue with more code
})();
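It may help to see what runs when. In the sketch below, pause is a stand-in for an awaited Puppeteer call such as browser.close(), and the order array only exists to record the sequence; both are illustrative assumptions:

```javascript
const order = [];

// Stand-in for an awaited Puppeteer call such as browser.close().
const pause = ms => new Promise(resolve => setTimeout(resolve, ms));

const finished = (async () => {
  order.push('inside: start');
  await pause(10);
  order.push('inside: after await');
})();

// This line runs while the async function is still paused at its await.
order.push('outside: right after the call');

// The call itself returns a promise, so follow-up work can be chained on it.
finished.then(() => order.push('outside: after completion'));
```

Code placed after the last await inside the function waits for everything before it, whereas code placed after the closing })(); does not, unless it is chained onto the returned promise as in the last line.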
I am trying to find the best way to "wait until the complete website is loaded", and this seems to be a tricky thing. I googled a lot and saw that there are 2 ways: waitForSelector and setTimeout.
My problem is that even if I wait for the selector #CheckSelectAll before checking this checkbox, it always seems to be too early, so I had to add a delay of 2 seconds. This looks very unprofessional to me; I want to use the best practice for this.
This must be an issue everybody runs into when using puppeteer with different pages and forms.
Is it possible that waitForSelector doesn't work when the selector is inside an iframe?
Thanks for any advice and help!
function delay(time) {
  return new Promise(function(resolve) {
    setTimeout(resolve, time)
  });
}
await page.waitForSelector('#CheckSelectAll');
await delay(2000);
await page.click('#CheckSelectAll');
If you want to do it the real "puppeteer" way, take a look at this.
There are a few options to choose from, but in my experience the most useful is networkidle2.
As the documentation says, your script will wait until
there are no more than 2 network connections for at least 500 ms.
await page.waitForNavigation({ waitUntil: 'networkidle2' })
But if, for some reason, the built-in solution can't handle your case, it's OK to write a custom wait function as you did.
Here is a nice one-liner, to write less code:
await new Promise(resolve => setTimeout(resolve, 2000))
As a last option, you can access the DOM with page.evaluate() and verify that the element is visible.
const visibleVerification = await page.evaluate(() => {
  // your verify logic.
  // return boolean, if element exists on page
  return true;
})
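Two further sketches that may fit this case, both with hypothetical helper names. The first uses the visible option of waitForSelector instead of a fixed delay; the second addresses the iframe question, since page.waitForSelector() only looks in the main frame. Whether either removes the need for the 2-second delay on this particular page is untested.

```javascript
// Hypothetical helper: wait until the element is visible, then click it.
// `target` can be a Puppeteer Page or Frame; both offer waitForSelector().
async function clickWhenVisible(target, selector, timeout = 30000) {
  const handle = await target.waitForSelector(selector, { visible: true, timeout });
  await handle.click();
}

// Hypothetical helper: find the frame (main frame or iframe) that currently
// contains an element matching the selector, or null if none does.
async function findFrameWithSelector(page, selector) {
  for (const frame of page.frames()) {
    if (await frame.$(selector)) return frame;
  }
  return null;
}
```

If the checkbox lives in an iframe, the frame returned by findFrameWithSelector(page, '#CheckSelectAll') can be passed to clickWhenVisible in place of page.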
Using Puppeteer, I would like to get all the elements on a page with a particular class name and then loop through and click each one.
Using jQuery, I can achieve this with:
var elements = $("a.showGoals").toArray();
for (i = 0; i < elements.length; i++) {
$(elements[i]).click();
}
How would I achieve this using Puppeteer?
Update
Tried out Chridam's answer below, but I couldn't get it to work (though the answer was helpful, so thanks due there), so I tried the following and this works:
await page.evaluate(() => {
  let elements = $('a.showGoals').toArray();
  for (i = 0; i < elements.length; i++) {
    $(elements[i]).click();
  }
});
Iterating puppeteer async methods in for loop vs. Array.map()/Array.forEach()
As all puppeteer methods are asynchronous, it matters how we iterate over them. I've made a comparison and a rating of the most commonly recommended and used options.
For this purpose, I have created a React.js example page with a lot of React buttons here (I just call it Lot Of React Buttons). Here (1) we are able to set how many buttons are rendered on the page; (2) we can turn the black buttons green by clicking on them. I consider it identical to the OP's use case, and it is also a general case of browser automation (we expect something to happen if we do something on the page).
Let's say our use case is:
Scenario outline: click all the buttons with the same selector
Given I have <no.> black buttons on the page
When I click on all of them
Then I should have <no.> green buttons on the page
There is a conservative and a rather extreme scenario. To click no. = 132 buttons is not a huge CPU task, no. = 1320 can take a bit of time.
I. Array.map
In general, if we only want to perform async methods like elementHandle.click in an iteration and don't need a new array, it is bad practice to use Array.map. The map call is going to finish before all the iteratees have completed, because Array iteration methods invoke the iteratees synchronously, while the iteratees here (the puppeteer methods) are asynchronous.
Code example
const elHandleArray = await page.$$('button')
elHandleArray.map(async el => {
await el.click()
})
await page.screenshot({ path: 'clicks_map.png' })
await browser.close()
Specialties
returns another array
parallel execution inside the .map method
fast
132 buttons scenario result: ❌
Duration: 891 ms
Watching the browser in headful mode, it looks like it works, but if we check when page.screenshot happened, we can see the clicks were still in progress. This is because the array of promises returned by Array.map is not awaited by default. It is only luck that the script had enough time to resolve all clicks on all elements before the browser was closed.
1320 buttons scenario result: ❌
Duration: 6868 ms
If we increase the number of elements of the same selector we will run into the following error:
UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
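For completeness, the array of promises that map produces can be awaited with Promise.all. The sketch below was not part of the measurements in this comparison, and clickAll is a hypothetical name; it only shows how to make the script wait, not whether parallel clicks behave well on a given page.

```javascript
// Hypothetical helper: start every click, then wait until all have finished.
// `handles` is expected to be an array of objects with an async click(),
// such as the ElementHandles returned by page.$$().
async function clickAll(handles) {
  await Promise.all(handles.map(handle => handle.click()));
}
```

With await clickAll(await page.$$('button')) in place of the bare map call, page.screenshot and browser.close would only run after every click promise has resolved, and a failed click would reject the awaited promise instead of surfacing as an unhandled rejection.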
II. Array.forEach
All the iteratees will be executed, but forEach is going to return before all of them finish execution, which is not the desirable behavior in many cases with async functions. In terms of puppeteer it is very similar to Array.map, except that Array.forEach does not return a new array.
Code example
const elHandleArray = await page.$$('button')
elHandleArray.forEach(async el => {
await el.click()
})
await page.screenshot({ path: 'clicks_foreach.png' })
await browser.close()
Specialties
parallel execution inside the .forEach method
fast
132 buttons scenario result: ❌
Duration: 1058 ms
By watching the browser in headful mode it looks like it works, but if we check when the page.screenshot happened: we can see the clicks were still in progress.
1320 buttons scenario result: ❌
Duration: 5111 ms
If we increase the number of elements with the same selector we will run into the following error:
UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement, because we already reached await page.screenshot() and await browser.close(): the async clicks are still in progress while the browser is already closed.
III. page.$$eval + forEach
The best performing solution is a slightly modified version of bside's answer. The page.$$eval (page.$$eval(selector, pageFunction[, ...args])) runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction. It functions as a wrapper over forEach hence it can be awaited perfectly.
Code example
await page.$$eval('button', elHandles => elHandles.forEach(el => el.click()))
await page.screenshot({ path: 'clicks_eval_foreach.png' })
await browser.close()
Specialties
no side-effects of using async puppeteer method inside a .forEach method
parallel execution inside the .forEach method
extremely fast
132 buttons scenario result: ✅
Duration: 711 ms
By watching the browser in headful mode we see the effect is immediate, also the screenshot is taken only after every element has been clicked, every promise has been resolved.
1320 buttons scenario result: ✅
Duration: 3445 ms
Works just like in case of 132 buttons, extremely fast.
IV. for...of loop
The simplest option: not that fast, and executed in sequence. The script won't go on to page.screenshot until the loop has finished.
Code example
const elHandleArray = await page.$$('button')
for (const el of elHandleArray) {
await el.click()
}
await page.screenshot({ path: 'clicks_for_of.png' })
await browser.close()
Specialties
async behavior works as expected at first sight
execution in sequence inside the loop
slow
132 buttons scenario result: ✅
Duration: 2957 ms
By watching the browser in headful mode we can see the page clicks are happening in strict order, also the screenshot is taken only after every element has been clicked.
1320 buttons scenario result: ✅
Duration: 25 396 ms
Works just like in case of 132 buttons (but it takes more time).
Summary
Avoid using Array.map if you only want to perform async events and you aren't using the returned array, use forEach or for-of instead. ❌
Array.forEach is an option, but you need to wrap it so the next async method only starts after all promises are resolved inside the forEach. ❌
Combine Array.forEach with $$eval for best performance if the order of async events doesn't matter inside the iteration. ✅
Use a for/for...of loop if speed is not vital and if the order of the async events does matter inside the iteration. ✅
Sources / Recommended materials
Sebastien Chopin: JavaScript: async/await with forEach() (codeburst.io)
Antonio Val: Making array iteration easy when using async/await (Medium)
Using async/await with a forEach loop (Stackoverflow)
Await with array foreach containing async await (Stackoverflow)
Use page.evaluate to execute JS:
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  await page.evaluate(() => {
    let elements = document.getElementsByClassName('showGoals');
    for (let element of elements)
      element.click();
  });
  // browser.close();
});
To get all elements, you should use the page.$$ method, which is the Puppeteer counterpart of [...document.querySelectorAll(selector)] (the result spread into an array) from the regular browser API; it resolves to an array of element handles.
Then you could loop through it (map, for, whatever you like) and evaluate each link:
const getThemAll = await page.$$('a.showGoals')
for (const link of getThemAll) {
  // pass the handle as an argument; the callback runs in the browser
  await page.evaluate(el => el.click(), link)
}
Since you also want to do actions with the things you got, I'd recommend using page.$$eval which will do the same as above and run an evaluation function afterwards with each of the elements in the array in one line. For example:
await page.$$eval('a.showGoals', links => links.forEach(link => link.click()))
To explain the line above: $$eval collects every element matching the selector into an array inside the page and passes that array to the callback as links; the callback then goes through each link with forEach and calls click on it.
Check the official documentation too, they have good examples there.
page.$$() / elementHandle.click()
You can use page.$$() to create an ElementHandle array based on the given selector, and then you can use elementHandle.click() to click each element:
const elements = await page.$$('a.showGoals');
// for...of waits for each click before starting the next one
for (const element of elements) {
  await element.click();
}
Note: Remember to await the click in an async function. Otherwise, you will receive the following error:
SyntaxError: await is only valid in async function