Puppeteer returns empty objects [duplicate] - javascript

This question already has answers here:
Puppeteer page.evaluate querySelectorAll return empty objects
(3 answers)
Closed 2 months ago.
I am using Puppeteer to scrape a website, but the elements I select keep coming back as empty objects even though I can see many of them on the page. Any advice?
I am looking for elements with the class "portal-type-person". There are about 90 on the page, but all the objects are empty.
const axios = require('axios');
const cheerio = require('cheerio');
const puppeteer = require('puppeteer');
const mainurl = "https://www.fbi.gov/wanted/kidnap";
(async () => {
  //const browser = await puppeteer.launch({headless: false});
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(mainurl);
  await page.evaluate(() => {
    window.scrollBy(0, document.body.scrollHeight);
  });
  await page.waitForTimeout(1000);
  let persons = await page.evaluate(() => {
    return document.querySelectorAll('.portal-type-person');
    //return document.querySelector('.portal-type-person');
  });
  //console.log(persons);
  for (let data in persons) {
    console.log(persons[data]);
  }
  await browser.close();
})();

Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values JSON can handle). Because document.querySelectorAll() returns a collection of DOM elements that are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object. You need to either return a serializable value (for example, an array of hrefs) or use something like page.$$(selector) and the ElementHandle API.
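A minimal Node.js sketch of why the objects come out empty, assuming the transfer behaves roughly like JSON.stringify, as the answer says. Real DOM elements expose properties such as href and textContent as getters on their prototypes rather than as own data properties, so JSON-style serialization finds nothing to copy. The FakeElement class below is a hypothetical stand-in that mimics this with private fields and prototype getters; no browser or Puppeteer is involved.

```javascript
// Hypothetical stand-in for a DOM element: its data lives in private fields
// exposed through getters on the prototype, so it has no own enumerable properties.
class FakeElement {
  #text;
  #href;

  constructor(text, href) {
    this.#text = text;
    this.#href = href;
  }

  get textContent() {
    return this.#text;
  }

  get href() {
    return this.#href;
  }

  click() {
    // Methods are skipped by serialization as well.
  }
}

const elements = [
  new FakeElement('Alice', 'https://example.com/a'),
  new FakeElement('Bob', 'https://example.com/b'),
];

// Returning the "elements" themselves: each one becomes an empty object.
const asElements = JSON.parse(JSON.stringify(elements));
console.log(asElements); // [ {}, {} ]

// Reading the values first and returning plain data survives the round trip.
const asData = JSON.parse(
  JSON.stringify(elements.map((el) => ({ text: el.textContent, href: el.href }))),
);
console.log(asData);
```

The fix in Puppeteer follows the same pattern: read the values you need inside the page function and return those plain values instead of the elements.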

Related

Get all span elements within a div using puppeteer [duplicate]

I am trying out Puppeteer. Here is sample code that you can run at https://try-puppeteer.appspot.com/
The problem is this code is returning an array of empty objects:
[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}]
Am I making a mistake?
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://reddit.com/');
let list = await page.evaluate(() => {
return Promise.resolve(Array.from(document.querySelectorAll('.title')));
});
console.log(JSON.stringify(list))
await browser.close();
The values returned from the evaluate function must be JSON-serializable.
https://github.com/GoogleChrome/puppeteer/issues/303#issuecomment-322919968
The solution is to extract the href values from the elements and return them.
const sel = '.title';
const links = await page.evaluate((sel) => {
  let elements = Array.from(document.querySelectorAll(sel));
  let links = elements.map(element => {
    return element.href;
  });
  return links;
}, sel);
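The page function is not limited to returning strings; it can return an array of plain objects, as long as every field is serializable. A sketch, assuming page is a Puppeteer Page; extractLinks and the text/href field names are hypothetical. The selector is passed as an extra argument because the callback runs in the browser and cannot see Node variables.

```javascript
// Collect serializable data from every element matching `selector`.
// `page` is assumed to be a Puppeteer Page; the callback executes in the browser.
async function extractLinks(page, selector) {
  return page.evaluate((sel) => {
    // Array.from's second argument maps each element to a plain object.
    return Array.from(document.querySelectorAll(sel), (el) => ({
      text: el.textContent.trim(),
      href: el.href ?? null, // elements without an href yield null
    }));
  }, selector);
}
```

For example, const links = await extractLinks(page, '.title'); would resolve to one { text, href } object per matching element.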
Problem:
The return value for page.evaluate() must be serializable.
The Puppeteer documentation says:
If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined. DevTools Protocol also supports transferring some additional values that are not serializable by JSON: -0, NaN, Infinity, -Infinity, and bigint literals.
In other words, you cannot return an element from the page DOM environment back to the Node.js environment because they are separate.
Solution:
You can return an ElementHandle, which is a representation of an in-page DOM element, back to the Node.js environment.
Use page.$$() to obtain an ElementHandle array:
let list = await page.$$('.title');
Otherwise, if you want to extract the href values from the elements and return them, you can use page.$$eval(), whose callback receives an array of all matching elements:
let list = await page.$$eval('.title', elements => elements.map(a => a.href));
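A common mix-up is between page.$$eval() and page.$eval(): $$eval calls the callback once with an array of every matching element, while $eval calls it with the first match only. So a => a.href is the right shape for $eval, but under $$eval it reads .href off an array and returns undefined. A sketch of both, assuming page is a Puppeteer Page; getHrefs and getFirstHref are hypothetical helper names.

```javascript
// $$eval: the callback receives an array of ALL matching elements.
async function getHrefs(page, selector) {
  return page.$$eval(selector, (anchors) => anchors.map((a) => a.href));
}

// $eval: the callback receives only the FIRST matching element.
// Puppeteer rejects if nothing matches, so wait for the selector first if needed.
async function getFirstHref(page, selector) {
  return page.$eval(selector, (a) => a.href);
}
```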
I faced a similar problem and solved it like this:
await page.evaluate(() =>
Array.from(document.querySelectorAll('.title'),
e => e.href));
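This one-liner works because Array.from() accepts an optional mapping function as its second argument, so a NodeList (which has no .map method) can be converted and mapped in a single step. A plain Node.js sketch using a hypothetical NodeList-like object in place of a real one:

```javascript
// Hypothetical NodeList-like object: indexed entries and a length, but no .map().
const nodeListLike = {
  0: { href: 'https://example.com/a' },
  1: { href: 'https://example.com/b' },
  length: 2,
};

// Array.from(arrayLike, mapFn) copies and maps in one pass; here it gives the
// same result as Array.from(nodeListLike).map((e) => e.href).
const hrefs = Array.from(nodeListLike, (e) => e.href);
console.log(hrefs); // [ 'https://example.com/a', 'https://example.com/b' ]

// An array of strings is serializable, so page.evaluate() can return it intact.
console.log(JSON.stringify(hrefs)); // ["https://example.com/a","https://example.com/b"]
```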

Returning a node from puppeteer page.evaluate() [duplicate]

This question already has answers here:
Puppeteer page.evaluate querySelectorAll return empty objects
(3 answers)
Closed 3 days ago.
I'm working with Node.js and Puppeteer for the first time and can't find a way to output values from page.evaluate to the outer scope.
My algorithm:
Login
Open URL
Get ul
Loop over each li and click on it
Wait for innerHTML to be set and add its src content to an array.
How can I return data from page.evaluate()?
const puppeteer = require('puppeteer');
const CREDENTIALS = require(`./env.js`).credentials;
const SELECTORS = require(`./env.js`).selectors;
const URLS = require(`./env.js`).urls;
async function run() {
try {
const urls = [];
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto(URLS.login, {waitUntil: 'networkidle0'});
await page.type(SELECTORS.username, CREDENTIALS.username);
await page.type(SELECTORS.password, CREDENTIALS.password);
await page.click(SELECTORS.submit);
await page.waitForNavigation({waitUntil: 'networkidle0'});
await page.goto(URLS.course, {waitUntil: 'networkidle0'});
const nodes = await page.evaluate(selector => {
let elements = document.querySelector(selector).childNodes;
console.log('elements', elements);
return Promise.resolve(elements ? elements : null);
}, SELECTORS.list);
const links = await page.evaluate((urls, nodes, VIDEO) => {
return Array.from(nodes).forEach((node) => {
node.click();
return Promise.resolve(urls.push(document.querySelector(VIDEO).getAttribute('src')));
})
}, urls, nodes, SELECTORS.video);
const output = await links;
} catch (err) {
console.error('err:', err);
}
}
run();
The function page.evaluate() can only return a serializable value, so it is not possible to return an element or NodeList back from the page environment using this method.
You can use page.$$() instead to obtain an ElementHandle array:
const nodes = await page.$$(`${SELECTORS.list} > *`); // children of the list element
If the length of the constant nodes is 0, make sure you are waiting for the element specified by the selector to be added to the DOM with page.waitForSelector():
await page.waitForSelector(SELECTORS.list);
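For the algorithm in the question (click each li, wait for the video to change, collect its src), the whole loop can stay in Node and work on ElementHandles instead of passing nodes through page.evaluate(). A sketch, assuming page is a Puppeteer Page, that SELECTORS.list matches the ul and SELECTORS.video the video element, and that each click changes the video's src attribute; collectVideoSources is a hypothetical name. If two items share the same src, the wait below would time out, so adjust the condition to whatever the page actually updates.

```javascript
// Click each <li> under listSelector and collect the video's src after each click.
// `page` is assumed to be a Puppeteer Page.
async function collectVideoSources(page, listSelector, videoSelector) {
  // Make sure the list has been rendered before querying its children.
  await page.waitForSelector(listSelector);
  const items = await page.$$(`${listSelector} > li`);

  const sources = [];
  let previous = null;
  for (const item of items) {
    await item.click();
    // Assumption: clicking swaps the video's src, so wait until it is set
    // and differs from the value collected on the previous iteration.
    await page.waitForFunction(
      (sel, prev) => {
        const video = document.querySelector(sel);
        const src = video && video.getAttribute('src');
        return Boolean(src) && src !== prev;
      },
      {},
      videoSelector,
      previous,
    );
    previous = await page.$eval(videoSelector, (video) => video.getAttribute('src'));
    sources.push(previous);
  }
  return sources;
}
```

In the question's run() it could be called as const links = await collectVideoSources(page, SELECTORS.list, SELECTORS.video);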
Alternatively, use page.evaluateHandle() to keep the NodeList inside the page as a JSHandle. getProperties() then returns one handle per entry, and for DOM nodes these are ElementHandles you can manipulate from Node:
let elementsHandles = await page.evaluateHandle(() => document.querySelectorAll('a'));
let elements = await elementsHandles.getProperties();
let elements_arr = Array.from(elements.values());
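The handle-based route can be finished by checking each entry from getProperties() with asElement() and then asking it for a serializable value. A sketch, assuming page is a Puppeteer Page; hrefsViaHandles is a hypothetical name, and the handles are disposed so the page can release the objects they reference.

```javascript
// Keep the NodeList in the page, split it into per-element handles,
// and read one serializable value from each.
async function hrefsViaHandles(page, selector) {
  const listHandle = await page.evaluateHandle(
    (sel) => document.querySelectorAll(sel),
    selector,
  );
  const properties = await listHandle.getProperties(); // Map<string, JSHandle>
  const hrefs = [];
  for (const handle of properties.values()) {
    const element = handle.asElement(); // null for handles that are not elements
    if (element) {
      hrefs.push(await element.evaluate((el) => el.href));
    }
    await handle.dispose();
  }
  await listHandle.dispose();
  return hrefs;
}
```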

Return a list of divs with the same selector using puppeteer

I am trying to get a list of divs using puppeteer but my code returns an empty array.
From this site, I am trying to retrieve the list of all cars.
https://master.d1v85iiwii35dx.amplifyapp.com/
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://master.d1v85iiwii35dx.amplifyapp.com/');
console.log('.....going to url')
const feedHandle = await page.$('.car-list-parent');
const arr = await feedHandle.$$eval('.car-list-child', (nodes) =>
  nodes.map((n) => {
    return n;
  })
);
await browser.close();
})();
Unfortunately, .evaluate(), .$$eval() and similar methods can only transfer serializable values (roughly, the values JSON can handle). Because DOM elements are not serializable (they contain methods and circular references), each element in the collection is replaced with an empty object or undefined. You need to either return a serializable value (for example, an array of texts) or use something like page.evaluateHandle() and the JSHandle API.
The first option:
const arr= await feedHandle.$$eval(
'.car-list-child',
nodes => nodes.map(n => n.innerText)
);
console.log(arr);
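The answer mentions a handle-based alternative without showing it. One possible version, assuming page is a Puppeteer Page and using ElementHandle.$$() rather than page.evaluateHandle(): the cards stay in the page as ElementHandles and only their innerText crosses over to Node. It also waits for the first card, on the assumption (not confirmed by the question) that the list is rendered client-side, which could explain an empty result; listCars is a hypothetical name.

```javascript
// Keep each car card as an ElementHandle in Node and pull out only its text.
// Selectors are the ones from the question; `page` is assumed to be a Puppeteer Page.
async function listCars(page) {
  // Assumption: the cards are rendered client-side, so wait for the first one.
  await page.waitForSelector('.car-list-parent .car-list-child');
  const feedHandle = await page.$('.car-list-parent');
  const cardHandles = await feedHandle.$$('.car-list-child');
  const cars = [];
  for (const card of cardHandles) {
    cars.push(await card.evaluate((node) => node.innerText));
    await card.dispose();
  }
  return cars;
}
```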
