I'm working on a web scraper in JavaScript using Puppeteer, and whenever I try to log the text content of an element it prints "Promise { Pending }". I've looked at other answers and none of them worked.
const element = await page.$("#ctl00_ContentPlaceHolder1_NameLinkButton");
const text = await page.evaluate(element => element.textContent, element);
console.log(text);
Your code is correct, but I think you forgot to add await before page.evaluate().
There are three ways to do that.
First way: the same as what you are doing. I don't prefer it, though, because
you don't need to call page.evaluate() to get .textContent.
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const elementId = 'container';
const page = await browser.newPage();
await page.goto('https://metwally.me');
const element = await page.$(`#${elementId}`);
if (element) {
const text = await page.evaluate(element => element.textContent, element);
console.log(text);
} else {
// handle not exists id
console.log('Not Found');
}
});
Second way: call page.evaluate() and use the JavaScript DOM API to get textContent, for example document.getElementById(elementId).textContent.
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const elementId = 'container';
const page = await browser.newPage();
await page.goto('https://metwally.me');
const text = await page.evaluate(
elementId => {
const element = document.getElementById(elementId);
return element ? element.textContent : null;
}, elementId);
if (text !== null) {
console.log(text);
} else {
// handle not exists id
console.log('Not Found');
}
});
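Third way: select the element with a Puppeteer selector, get its textContent property with await element.getProperty('textContent'), and then read the value from textContent._remoteObject.value. Note that _remoteObject is an underscore-prefixed internal field, so it may change between Puppeteer versions.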
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const elementId = 'container';
const page = await browser.newPage();
await page.goto('https://metwally.me');
const element = await page.$(`#${elementId}`);
if (element) {
const textContent = await element.getProperty('textContent');
const text = textContent._remoteObject.value;
console.log(text);
} else {
// handle not exists id
console.log('Not Found');
}
});
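A variation on the third way that avoids the internal _remoteObject field is to call jsonValue() on the property handle. This is only a minimal sketch: the helper name readTextContent is hypothetical, and it assumes the argument is whatever page.$ resolved to (an element handle, or null when nothing matched).

```javascript
// Hypothetical helper: returns the element's textContent, or null when no element was found.
async function readTextContent(elementHandle) {
  if (!elementHandle) {
    return null; // page.$ resolves to null when the selector matches nothing
  }
  // getProperty returns a handle to the property; jsonValue serializes it to a plain value.
  const property = await elementHandle.getProperty('textContent');
  return property.jsonValue();
}
```

It could be called as const text = await readTextContent(await page.$('#container'));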
NOTE: All of these examples work on my machine:
os ubuntu 20.04
nodejs v10.19.0
puppeteer v1.19.0
References
Puppeteer page.$
Document.getElementById()
Node.textContent
I am trying to get all input elements on this website:
http://rwis.mdt.mt.gov/scanweb/swframe.asp?Pageid=SfHistoryTable&Units=English&Groupid=269000&Siteid=269003&Senid=0&DisplayClass=NonJava&SenType=All&CD=7%2F1%2F2020+10%3A41%3A50+AM
Here is what the page source looks like.
Here is my code:
const puppeteer = require("puppeteer");
function run() {
return new Promise(async (resolve, reject) => {
try {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(
"http://rwis.mdt.mt.gov/scanweb/swframe.asp?Pageid=SfHistoryTable&Units=English&Groupid=269000&Siteid=269003&Senid=0&DisplayClass=NonJava&SenType=All&CD=7%2F1%2F2020+10%3A41%3A50+AM"
);
let urls = await page.evaluate(() => {
let results = [];
let items = document.querySelectorAll("input").length;
return items;
});
browser.close();
return resolve(urls);
} catch (e) {
return reject(e);
}
});
}
run().then(console.log).catch(console.error);
Right now my output is 0, but when I run document.querySelectorAll("input").length in the browser console, it gives me 8.
It seems that everything is loaded inside the frameset tag, which might be the issue. Does anyone have an idea how to solve this?
You have to get the frame element, from there you can get the frame itself so you can call evaluate inside that frame:
const elementHandle = await page.$('frame[name=SWContent]');
const frame = await elementHandle.contentFrame();
let urls = await frame.evaluate(() => {
let results = [];
let items = document.querySelectorAll("input").length;
return items;
});
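If you would rather not go through the frame element, the frame can also be looked up by name from page.frames(). The following is a rough sketch, not a tested solution for this site: the helper name countInputsInFrame is made up, and it assumes the frame is named SWContent as in the selector above.

```javascript
// Hypothetical helper: counts <input> elements inside the frame with the given name.
async function countInputsInFrame(page, frameName) {
  // page.frames() lists every frame attached to the page, including nested ones.
  const frame = page.frames().find(f => f.name() === frameName);
  if (!frame) {
    throw new Error(`No frame named "${frameName}" was found`);
  }
  // The callback runs inside the frame's own document, not the top-level frameset.
  return frame.evaluate(() => document.querySelectorAll('input').length);
}
```

Usage would look like const count = await countInputsInFrame(page, 'SWContent');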
The requirement is to create a folder structure from an array in a SharePoint library using JavaScript. Below is the desired structure:
var ary = [A,B,C]
A -> Fldr1 -> Fldr2 -> File
B -> Fldr1 -> Fldr2 -> File
C -> Fldr1 -> Fldr2 -> File
Currently it creates folders A, B and C in the library, but the inner structure is created for C only.
So the result I am getting is:
A
B
C -> Fldr1 -> Fldr2 -> File
The code below works perfectly when there is only one item in the array, but fails with multiple items.
Here I check whether the folder exists, then check for the second level; if it doesn't exist I create it first, and so on for the rest of the structure.
async function processArray(selectedCountries) {
await selectedCountries.map(async (item) => {
let promiseCntry = await checkCntryFolder(item);
if(flag){ //if cntry exists
let promiseChckfolder = await checkFolder(tdmrkNm);
if(flagFldr)//if folder exists
{
let promiseChkSubFolder = await checkSubFolder(appStatus);
if(flagSub)//if sub -folder exists
{
let createFile = await CreateFileInSideFolder();
}
else
{
let promiseCreateSub = await createSubFolder(appStatus);
let createFile = await CreateFileInSideFolder();
}
}
}
});
}
Stop using deferreds and stop using the done method. Use proper promises with then instead.
Assuming this is jQuery, where those objects with done methods also have then methods, you can (and should) just use async/await directly:
async function callAry(array) {
return Promise.all(array.map(async (item) => {
const flag = await checkCntryFolder(item);
if (flag) {
const folderFlag = await checkFolder(nm);
if (folderFlag) {
const subFlag = await checkSubFolder(Status);
if (subFlag) {
await CreateFileInSideFolder();
console.log('file done');
}
}
}
}));
}
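To see why the Promise.all wrapper matters: map with an async callback returns an array of promises, and awaiting an array does not wait for its elements. Here is a small self-contained sketch of the difference; delay is a made-up stand-in for the SharePoint calls, and the function names are only for illustration.

```javascript
// Hypothetical stand-in for an asynchronous SharePoint call.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withoutPromiseAll(items) {
  const done = [];
  // Awaiting the array itself resolves right away; the callbacks are still running.
  await items.map(async item => {
    await delay(10);
    done.push(item);
  });
  return done.slice(); // snapshot taken before any callback has finished
}

async function withPromiseAll(items) {
  const done = [];
  // Promise.all waits until every callback's promise has fulfilled.
  await Promise.all(items.map(async item => {
    await delay(10);
    done.push(item);
  }));
  return done;
}
```

With three items, the first function returns an empty array while the second returns all three.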
OK, I finally got it working.
I was not passing the proper arguments to the methods.
Below is the working code:
async function processArray(selectedCountries) {
return Promise.all(selectedCountries.map(async (item) => {
//await selectedCountries.map(async (item) => {
let promiseCntry = await checkCntryFolder(item);
if(flag){ //if cntry exists
let promiseChckfolder = await checkFolder(tdmrkNm,item);
if(flagFldr)//if folder exists
{
let promiseChkSubFolder = await checkSubFolder(appStatus,tdmrkNm,item);
if(flagSub)//if sub -folder exists
{
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
else
{
let promiseCreateSub = await createSubFolder(appStatus,tdmrkNm,item);
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
}
else//if folder doesn't exist
{
let createFldr = await createFolder(tdmrkNm,item);
let promiseChkSubFolder = await checkSubFolder(appStatus,tdmrkNm,item);
if(flagSub)
{
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
else
{
let promiseCreateSub = await createSubFolder(appStatus,tdmrkNm,item);
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
}
}
else//if cntry doesn't exist
{
let createCntry = await createCntryFolder(item);
let promiseChckfolder = await checkFolder(tdmrkNm,item);
if(flagFldr)//if folder exists
{
let promiseChkSubFolder = await checkSubFolder(appStatus,tdmrkNm,item);
if(flagSub) //if sub-folder exists
{
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
else //if sub-folder doesn't exist
{
let promiseCreateSub = await createSubFolder(appStatus,tdmrkNm,item);
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
}
else //if folder doesn't exist
{
let createFldr = await createFolder(tdmrkNm,item);
let promiseChkSubFolder = await checkSubFolder(appStatus,tdmrkNm,item);
if(flagSub)//if sub-folder exists
{
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
else//if sub-folder doesn't exist
{
let promiseCreateSub = await createSubFolder(appStatus,tdmrkNm,item);
let createFile = await CreateFileInSideFolder(appStatus,tdmrkNm,item);
}
}
}
}));
}
I am running an automated test through Puppeteer that fills out a form and also checks a captcha. If the captcha is incorrect, the page refreshes to a new image, but then I need to process the new image and get back to the function that processed it earlier.
(async function example() {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
/*-----------NEED TO COME BACK HERE-----------*/
const tessProcess = utils.promisify(tesseract.process);
await page.setViewport(viewPort)
await page.goto('http://www.example.com')
await page.screenshot(options)
const text = await tessProcess('new.png');
console.log(text.trim());
await page.$eval('input[id=userEnteredCaptcha]', (el, value) => el.value = value, text.trim())
await page.$eval('input[id=companyID]', el => el.value = 'val');
const submitBtn = await page.$('[id="data"]');
await submitBtn.click();
try {
var x = await page.waitFor("#msgboxclose");
console.log("Captcha error")
}
catch (e) {
console.error('No Error');
}
if(x){
await page.keyboard.press('Escape');
/*---------GO FROM HERE--------*/
}
})()
I want to create some sort of loop so that the image is processed again whenever the captcha is wrong.
Declare a boolean variable that indicates whether you need to try again or not, and put the repeated functionality inside a while loop that checks that variable. If the x condition at the end of the loop is not fulfilled, set tryAgain to false, so that no further iterations occur:
(async function example() {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
let tryAgain = true; // <--------------------------
while (tryAgain) { // <--------------------------
/*-----------NEED TO COME BACK HERE-----------*/
const tessProcess = utils.promisify(tesseract.process);
await page.setViewport(viewPort)
await page.goto('http://www.example.com')
await page.screenshot(options)
const text = await tessProcess('new.png');
console.log(text.trim());
await page.$eval('input[id=userEnteredCaptcha]', (el, value) => el.value = value, text.trim())
await page.$eval('input[id=companyID]', el => el.value = 'val');
const submitBtn = await page.$('[id="data"]');
await submitBtn.click();
let x = null; // reset on every attempt so a handle from an earlier failed attempt is not reused
try {
x = await page.waitFor("#msgboxclose");
console.log("Captcha error")
}
catch (e) {
console.error('No Error');
}
if(x){
await page.keyboard.press('Escape');
/*---------GO FROM HERE--------*/
} else {
tryAgain = false; // <--------------------------
}
}
})()
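One thing to watch with a plain while loop is that it never gives up if the OCR keeps misreading the captcha. A possible way to bound it is sketched below; retryUntilSuccess and its parameters are hypothetical names, and attempt is assumed to be a function wrapping the screenshot, OCR, and submit steps that resolves to true when the captcha was accepted.

```javascript
// Hypothetical helper: calls attempt() until it resolves to true or maxAttempts is reached.
async function retryUntilSuccess(attempt, maxAttempts) {
  for (let tries = 1; tries <= maxAttempts; tries++) {
    // attempt() is expected to resolve to true when the captcha was accepted.
    if (await attempt()) {
      return tries; // number of attempts that were needed
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```

The body of the loop above would move into the attempt function, returning false where tryAgain stays true.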
I have Puppeteer controlling a website with a lookup form that can either return a result or a "No records found" message. How can I tell which was returned?
waitForSelector seems to wait for only one at a time, while waitForNavigation doesn't seem to work because it is returned using Ajax.
I am using a try catch, but it is tricky to get right and slows everything way down.
try {
await page.waitForSelector(SELECTOR1,{timeout:1000});
}
catch(err) {
await page.waitForSelector(SELECTOR2);
}
Waiting until any of the elements exists
You can use querySelectorAll and waitForFunction together to solve this problem. Joining the selectors with commas returns all nodes that match any of them.
await page.waitForFunction(() =>
document.querySelectorAll('Selector1, Selector2, Selector3').length
);
This only resolves once at least one matching element exists; it does not tell you which selector matched.
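Once the wait has resolved, a cheap follow-up check can tell you which selector is actually on the page. This is just a sketch with a hypothetical helper name, firstMatchingSelector; it relies on page.$ resolving to null when nothing matches.

```javascript
// Hypothetical helper: returns the first selector in the list that currently matches, or null.
async function firstMatchingSelector(page, selectors) {
  for (const selector of selectors) {
    // page.$ resolves to null when nothing matches the selector.
    if (await page.$(selector)) {
      return selector;
    }
  }
  return null;
}
```

Because the elements are already present at this point, no timeouts are involved in the check.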
How about using Promise.race(), as in the snippet below? Don't forget the { visible: true } option in page.waitForSelector().
public async enterUsername(username:string) : Promise<void> {
const un = await Promise.race([
this.page.waitForSelector(selector_1, { timeout: 4000, visible: true })
.catch(),
this.page.waitForSelector(selector_2, { timeout: 4000, visible: true })
.catch(),
]);
await un.focus();
await un.type(username);
}
An alternative and simple solution would be to approach this from a more CSS perspective. waitForSelector seems to follow the CSS selector list rules. So essentially you can select multiple CSS elements by just using a comma.
try {
await page.waitForSelector('.selector1, .selector2',{timeout:1000})
} catch (error) {
// handle error
}
Using Md. Abu Taher's suggestion, I ended up with this:
// One of these SELECTORs should appear, we don't know which
await page.waitForFunction((sel) => {
return document.querySelectorAll(sel).length;
},{timeout:10000},SELECTOR1 + ", " + SELECTOR2);
// Now see which one appeared:
try {
await page.waitForSelector(SELECTOR1,{timeout:10});
}
catch(err) {
//check for "not found"
let ErrMsg = await page.evaluate((sel) => {
let element = document.querySelector(sel);
return element? element.innerHTML: null;
},SELECTOR2);
if(ErrMsg){
//SELECTOR2 found
}else{
//Neither found, try adjusting timeouts until you never get this...
}
};
//SELECTOR1 found
I had a similar issue and went for this simple solution:
helpers.waitForAnySelector = (page, selectors) => new Promise((resolve, reject) => {
let hasFound = false
selectors.forEach(selector => {
page.waitFor(selector)
.then(() => {
if (!hasFound) {
hasFound = true
resolve(selector)
}
})
.catch((error) => {
// console.log('Error while looking up selector ' + selector, error.message)
})
})
})
And then to use it:
const selector = await helpers.waitForAnySelector(page, [
'#inputSmsCode',
'#buttonLogOut'
])
if (selector === '#inputSmsCode') {
// We need to enter the 2FA sms code.
} else if (selector === '#buttonLogOut') {
// We successfully logged in
}
In Puppeteer you can simply use multiple selectors separated by commas, like this:
const foundElement = await page.waitForSelector('.class_1, .class_2');
The returned element will be an elementHandle of the first element found in the page.
Next if you want to know which element was found you can get the class name like so:
const className = await page.evaluate(el => el.className, foundElement);
In your case, code similar to this should work:
const foundElement = await page.waitForSelector([SELECTOR1,SELECTOR2].join(','));
const responseMsg = await page.evaluate(el => el.innerText, foundElement);
if (responseMsg == "No records found") {
// Your code here
}
Going one step further, you can wrap Promise.race() and simply check the returned index for further logic:
// Typescript
export async function racePromises(promises: Promise<any>[]): Promise<number> {
const indexedPromises: Array<Promise<number>> = promises.map((promise, index) => new Promise<number>((resolve) => promise.then(() => resolve(index))));
return Promise.race(indexedPromises);
}
// Javascript
export async function racePromises(promises) {
const indexedPromises = promises.map((promise, index) => new Promise((resolve) => promise.then(() => resolve(index))));
return Promise.race(indexedPromises);
}
Usage:
const navigationOutcome = await racePromises([
page.waitForSelector('SELECTOR1'),
page.waitForSelector('SELECTOR2')
]);
if (navigationOutcome === 0) {
//logic for 'SELECTOR1'
} else if (navigationOutcome === 1) {
//logic for 'SELECTOR2'
}
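One caveat with the wrapper above: a promise that rejects (for example a waitForSelector that times out after the other selector has already won) is left without a rejection handler. A possible variant that counts the failures instead is sketched here; the name raceIndexes is hypothetical, and it rejects only when every promise has rejected.

```javascript
// Resolves with the index of the first promise to fulfil; rejects only if all of them reject.
function raceIndexes(promises) {
  return new Promise((resolve, reject) => {
    let failures = 0;
    promises.forEach((promise, index) => {
      promise.then(
        () => resolve(index),
        error => {
          // A losing selector will eventually time out; count it instead of leaving it unhandled.
          failures += 1;
          if (failures === promises.length) {
            reject(error); // every promise failed, so report the last error
          }
        }
      );
    });
  });
}
```

It is used the same way as racePromises, with the returned index selecting the branch to run.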
Combining some elements from above into a helper method, I've built a command that allows me to create multiple possible selector outcomes and have the first to resolve be handled.
/**
* @typedef {import('puppeteer').ElementHandle} PuppeteerElementHandle
* @typedef {import('puppeteer').Page} PuppeteerPage
*/
/** Description of the function
@callback OutcomeHandler
@async
@param {PuppeteerElementHandle} element matched element
@returns {Promise<*>} can return anything, will be sent to handlePossibleOutcomes
*/
/**
* @typedef {Object} PossibleOutcome
* @property {string} selector The selector to trigger this outcome
* @property {OutcomeHandler} handler handler will be called if selector is present
*/
/**
* Waits for a number of selectors (Outcomes) on a Puppeteer page, and calls the handler on first to appear,
* Outcome Handlers should be ordered by preference, as if multiple are present, only the first occurring handler
* will be called.
* @param {PuppeteerPage} page Puppeteer page object
* @param {[PossibleOutcome]} outcomes each possible selector, and the handler you'd like called.
* @returns {Promise<*>} returns the result from outcome handler
*/
async function handlePossibleOutcomes(page, outcomes)
{
var outcomeSelectors = outcomes.map(outcome => {
return outcome.selector;
}).join(', ');
return page.waitFor(outcomeSelectors)
.then(_ => {
let awaitables = [];
outcomes.forEach(outcome => {
// "await" is a reserved word in modules and async functions, so use a different name
let pending = page.$(outcome.selector)
.then(element => {
if (element) {
return [outcome, element];
}
return null;
});
awaitables.push(pending);
});
return Promise.all(awaitables);
})
.then(checked => {
let found = null;
checked.forEach(check => {
if(!check) return;
if(found) return;
let outcome = check[0];
let element = check[1];
let p = outcome.handler(element);
found = p;
});
return found;
});
}
To use it, you just have to call and provide an array of Possible Outcomes and their selectors / handlers:
await handlePossibleOutcomes(page, [
{
selector: '#headerNavUserButton',
handler: element => {
console.log('Logged in',element);
loggedIn = true;
return true;
}
},
{
selector: '#email-login-password_error',
handler: element => {
console.log('password error',element);
return false;
}
}
]).then(result => {
if (result) {
console.log('Logged in!',result);
} else {
console.log('Failed :(');
}
})
I just started with Puppeteer, and have encountered the same issue, therefore I wanted to make a custom function which fulfills the same use-case.
The function goes as follows:
async function waitForMySelectors(selectors, page){
for (let i = 0; i < selectors.length; i++) {
await page.waitForSelector(selectors[i]);
}
}
The first parameter receives an array of selectors; the second is the page on which to perform the waiting.
Call the function as in the example below:
var SelectorsArray = ['#username', '#password'];
await waitForMySelectors(SelectorsArray, page);
I have not performed any tests on it yet, but it seems functional. Note that it waits for every selector in turn, so it resolves only after all of them have appeared.
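If the goal really is to wait for all of the selectors, the waits can also be started together instead of one after another. A minimal sketch, assuming the hypothetical name waitForAllSelectors and an optional options object that is passed straight through to page.waitForSelector:

```javascript
// Hypothetical helper: waits for every selector concurrently and resolves with the handles.
function waitForAllSelectors(page, selectors, options = {}) {
  // Each wait starts immediately; Promise.all rejects as soon as any one of them fails.
  return Promise.all(
    selectors.map(selector => page.waitForSelector(selector, options))
  );
}
```

The resolved array is in the same order as the selectors that were passed in.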
If you want to wait for the first of multiple selectors and get the matched element(s), you can start with waitForFunction:
const matches = await page.waitForFunction(() => {
const matches = [...document.querySelectorAll(YOUR_SELECTOR)];
return matches.length ? matches : null;
});
waitForFunction will return an ElementHandle but not an array of them. If you only need native DOM methods, it's not necessary to get handles. For example, to get text from this array:
const contents = await matches.evaluate(els => els.map(e => e.textContent));
In other words, matches acts a lot like the array passed to $$eval by Puppeteer.
On the other hand, if you do need an array of handles, the following demonstration code makes the conversion and shows the handles being used as normal:
const puppeteer = require("puppeteer"); // ^16.2.0
const html = `
<!DOCTYPE html>
<html>
<head>
<style>
h1 {
display: none;
}
</style>
</head>
<body>
<script>
setTimeout(() => {
// add initial batch of 3 elements
for (let i = 0; i < 3; i++) {
const h1 = document.createElement("button");
h1.textContent = \`first batch #\${i + 1}\`;
h1.addEventListener("click", () => {
h1.textContent = \`#\${i + 1} clicked\`;
});
document.body.appendChild(h1);
}
// add another element 1 second later to show it won't appear in the first batch
setTimeout(() => {
const h1 = document.createElement("h1");
h1.textContent = "this won't be found in the first batch";
document.body.appendChild(h1);
}, 1000);
}, 3000); // delay before first batch of elements are added
</script>
</body>
</html>
`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
const matches = await page.waitForFunction(() => {
const matches = [...document.querySelectorAll("button")];
return matches.length ? matches : null;
});
const length = await matches.evaluate(e => e.length);
const handles = await Promise.all([...Array(length)].map((e, i) =>
page.evaluateHandle((m, i) => m[i], matches, i)
));
await handles[1].click(); // show that the handles work
const contents = await matches.evaluate(els => els.map(e => e.textContent));
console.log(contents);
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Unfortunately, it's a bit verbose, but this can be made into a helper.
See also Wait for first visible among multiple elements matching selector if you're interested in integrating the {visible: true} option.
Puppeteer methods might throw errors if they are unable to fulfill a request. For example, page.waitForSelector(selector[, options]) might fail if the selector doesn't match any nodes during the given timeframe.
For certain types of errors Puppeteer uses specific error classes. These classes are available via require('puppeteer/Errors').
List of supported classes:
TimeoutError
An example of handling a timeout error:
const {TimeoutError} = require('puppeteer/Errors');
// ...
try {
await page.waitForSelector('.foo');
} catch (e) {
if (e instanceof TimeoutError) {
// Do something if this is a timeout.
}
}
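The same pattern can be wrapped in a small helper that turns a timeout into null and lets every other error through. This is only a sketch: waitForSelectorOrNull is a hypothetical name, and the TimeoutError class is passed in as a parameter so the helper makes no assumption about where your Puppeteer version exposes it.

```javascript
// Hypothetical helper: resolves to the element handle, or null if the wait timed out.
async function waitForSelectorOrNull(page, selector, TimeoutError, options = {}) {
  try {
    return await page.waitForSelector(selector, options);
  } catch (error) {
    if (error instanceof TimeoutError) {
      return null; // the selector never appeared within the timeout
    }
    throw error; // anything else is unexpected and should surface
  }
}
```

A caller can then branch on the result, for example treating null as "this outcome did not happen".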