Get all span elements within a div using puppeteer [duplicate] - javascript

I am trying out Puppeteer. Here is some sample code that you can run at https://try-puppeteer.appspot.com/
The problem is that this code returns an array of empty objects:
[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}]
Am I making a mistake?
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://reddit.com/');
let list = await page.evaluate(() => {
  return Promise.resolve(Array.from(document.querySelectorAll('.title')));
});
console.log(JSON.stringify(list))
await browser.close();

The values returned from the evaluate function must be JSON-serializable.
https://github.com/GoogleChrome/puppeteer/issues/303#issuecomment-322919968
The solution is to extract the href values from the elements and return those instead.
await this.page.evaluate((sel) => {
  let elements = Array.from(document.querySelectorAll(sel));
  let links = elements.map(element => {
    return element.href
  })
  return links;
}, sel);

Problem:
The return value for page.evaluate() must be serializable.
The Puppeteer documentation says:
If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined. DevTools Protocol also supports transferring some additional values that are not serializable by JSON: -0, NaN, Infinity, -Infinity, and bigint literals.
In other words, you cannot return an element from the page DOM environment back to the Node.js environment because they are separate.
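The effect can be reproduced without a browser. The sketch below is only an analogy: Puppeteer serializes values over the DevTools Protocol rather than calling JSON.stringify(), and FakeAnchor is a hypothetical stand-in for a DOM element. It does show the same mechanism, though: an object whose data lives in prototype getters has no own enumerable properties, so it serializes to {}.

```javascript
// Hypothetical stand-in for a DOM element: like a real element, it exposes
// href through a getter on the prototype, not as an own property.
class FakeAnchor {
  get href() {
    return 'https://example.com/';
  }
}

const elements = [new FakeAnchor(), new FakeAnchor()];

// Serializing the objects themselves keeps only own enumerable properties.
// There are none, so each object collapses to {}.
const asObjects = JSON.parse(JSON.stringify(elements));

// Reading the value first produces plain strings, which serialize fine.
const asStrings = JSON.parse(JSON.stringify(elements.map(el => el.href)));

console.log(asObjects);
console.log(asStrings);
```

This is why extracting strings inside the page function works while returning the elements themselves does not.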
Solution:
You can return an ElementHandle, which is a representation of an in-page DOM element, back to the Node.js environment.
Use page.$$() to obtain an ElementHandle array:
let list = await page.$$('.title');
Otherwise, if you want to extract the href values from the elements and return them, you can use page.$$eval(). Its callback receives an array of the matched elements, so map over it:
let list = await page.$$eval('.title', els => els.map(a => a.href));
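A selector such as '.title' may also match elements that are not links and therefore have no href (whether it does on a given page is an assumption here). A minimal sketch of a helper that drops those entries follows; extractHrefs is a hypothetical name, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function extractHrefs(page, selector) {
  // $$eval passes the callback an ARRAY of matched elements, so map over it
  // and return only plain values that can cross back to Node.js.
  const hrefs = await page.$$eval(selector, elements =>
    elements.map(el => el.href)
  );
  // Elements without a usable href are dropped on the Node.js side.
  return hrefs.filter(href => typeof href === 'string' && href.length > 0);
}
```

It would be called as `const links = await extractHrefs(page, '.title');` inside an async function.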

I faced a similar problem and solved it like this:
await page.evaluate(() =>
  Array.from(document.querySelectorAll('.title'),
    e => e.href));

Related

Return a list of divs with the same selector using puppeteer

I am trying to get a list of divs using puppeteer but my code returns an empty array.
From this site, I am trying to retrieve the list of all cars.
https://master.d1v85iiwii35dx.amplifyapp.com/
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://master.d1v85iiwii35dx.amplifyapp.com/');
  console.log('.....going to url')
  const feedHandle = await page.$('.car-list-parent');
  const arr = await feedHandle.$$eval('.car-list-child', (nodes) => nodes.map(n => {
    return n
  }))
  await browser.close();
})();
Unfortunately, .evaluate(), .$$eval(), and similar methods can only transfer serializable values (roughly, the values JSON can handle). Because DOM elements are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object or undefined. You need to either return a serializable value (for example, an array of texts) or use something like page.evaluateHandle() and the JSHandle API.
The first option:
const arr = await feedHandle.$$eval(
  '.car-list-child',
  nodes => nodes.map(n => n.innerText)
);
console.log(arr);
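If more than the text is needed, each node can be mapped to a plain object, which is still serializable. The sketch below is one way to do that; describeChildren is a hypothetical name, parentHandle is assumed to be the ElementHandle returned by page.$(), and the chosen fields are only examples. The guard is there because page.$() resolves to null when nothing matches.

```javascript
// Hypothetical helper; `parentHandle` is assumed to be an ElementHandle
// (or null, which page.$() resolves to when the selector matches nothing).
async function describeChildren(parentHandle, childSelector) {
  if (!parentHandle) {
    return [];
  }
  // Build plain objects inside the page so only serializable data comes back.
  return parentHandle.$$eval(childSelector, nodes =>
    nodes.map(n => ({
      text: n.innerText,
      className: n.className,
    }))
  );
}
```

For the page in the question this might be called as `await describeChildren(feedHandle, '.car-list-child')`.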

Puppeteer returns empty objects [duplicate]

I am using Puppeteer to scrape a website, but the elements keep coming back empty even though I can see that many are there. Any advice?
I am looking for elements with the class "portal-type-person". There are about 90 on the page, but all of the returned objects are empty.
const axios = require('axios');
const cheerio = require('cheerio');
const puppeteer = require('puppeteer');
const mainurl = "https://www.fbi.gov/wanted/kidnap";
(async () => {
  //const browser = await puppeteer.launch({headless: false});
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(mainurl);
  await page.evaluate(() => {
    window.scrollBy(0, document.body.scrollHeight);
  });
  await page.waitForTimeout(1000);
  let persons = await page.evaluate(() => {
    return document.querySelectorAll('.portal-type-person');
    //return document.querySelector('.portal-type-person');
  });
  //console.log(persons);
  for(let data in persons) {
    console.log(persons[data]);
  }
  browser.close();
})();
Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values JSON can handle). Because document.querySelectorAll() returns a collection of DOM elements, which are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object. You need to either return a serializable value (for example, an array of hrefs) or use something like page.$$(selector) and the ElementHandle API.
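A minimal sketch of the ElementHandle route follows. textOfEach is a hypothetical helper name and page is assumed to be a Puppeteer Page; it reads textContent from each handle and disposes the handle afterwards.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function textOfEach(page, selector) {
  // page.$$() resolves to an array of ElementHandles that live in Node.js
  // but point at elements inside the page.
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // Run a small function in the page against this one element and
    // bring back only a string.
    texts.push(await handle.evaluate(el => el.textContent.trim()));
    // Release the reference once it is no longer needed.
    await handle.dispose();
  }
  return texts;
}
```

For the page in the question this might be called as `await textOfEach(page, '.portal-type-person')`.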

Puppeteer: How to get the contents of each element of a nodelist?

I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText of each element.
const tweets = await page.$$('.tweet');
From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser.
How do I just loop over it and get what I need? I tried various things, like:
[...tweets].forEach(tweet => {
  console.log(tweet.innerText)
});
page.$$():
You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to obtain the innerText from an ElementHandle obtained with page.$$():
const tweets = await page.$$('.tweet');
for (let i = 0; i < tweets.length; i++) {
  const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
  console.log(tweet);
}
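If you are set on using the forEach() method, you can wrap the loop in a promise that resolves once every callback has finished:
const tweets = await page.$$('.tweet');
await new Promise((resolve, reject) => {
  // Nothing to wait for if there are no matches.
  if (tweets.length === 0) return resolve();
  let remaining = tweets.length;
  tweets.forEach(async (tweet) => {
    try {
      const text = await (await tweet.getProperty('innerText')).jsonValue();
      console.log(text);
      // Resolve only after the last pending callback completes,
      // which is not necessarily the one with the last index.
      if (--remaining === 0) resolve();
    } catch (err) {
      reject(err);
    }
  });
});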
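Another option, sketched here as an assumption rather than something from the answer, is to map each handle to a promise and wait for all of them with Promise.all(). innerTexts is a hypothetical helper name, and handles is assumed to be the array returned by page.$$().

```javascript
// Hypothetical helper; `handles` is assumed to be the ElementHandle array
// that page.$$() resolves to.
async function innerTexts(handles) {
  // map() starts one lookup per handle; Promise.all() waits for all of them
  // and keeps the results in the same order as the input array.
  return Promise.all(
    handles.map(async handle => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}
```

It would be used as `const texts = await innerTexts(await page.$$('.tweet'));`, after which a plain `texts.forEach(...)` works because the array holds ordinary strings.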
page.evaluate():
Alternatively, you can skip using page.$$() entirely, and use page.evaluate():
const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));
tweets.forEach(tweet => {
  console.log(tweet);
});
According to the Puppeteer docs, page.$$() does not return a NodeList. It returns a Promise that resolves to an array of ElementHandles, which is quite different from a NodeList.
There are several ways to solve the problem.
1. Use page.$$eval(), the built-in method for working on all matches
This method runs Array.from(document.querySelectorAll(selector)) within the page and passes the resulting array as the first argument to pageFunction.
So getting the innerText of every match looks like this:
// Find all .tweet elements and return the innerText of each one, in an array.
const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));
2. Pass the ElementHandle to page.evaluate()
Whatever you get from await page.$$('.tweet') is an array of ElementHandles. If you log one, it prints as JSHandle or ElementHandle depending on the type.
It is easier to demonstrate than to explain:
// let's just call them tweetHandles
const tweetHandles = await page.$$('.tweet');
// loop through all handles
for (const tweetHandle of tweetHandles) {
  // pass the single handle below
  const singleTweet = await page.evaluate(el => el.innerText, tweetHandle)
  // do whatever you want with the data
  console.log(singleTweet)
}
Of course, there are multiple ways to solve this problem; Grant Miller covers a few of them in the other answer.
