Puppeteer querySelectorAll doesn't get elements properly

I'm using Puppeteer and trying to use document.querySelectorAll to get a list of elements that I can then loop over. Something is wrong in my code: it returns nothing, undefined, or an empty {} even though the elements are on the page. My JS:
let elements = await page.evaluate(() => document.querySelectorAll("div[class^='my-class--']"))
for (let el of Array.from(elements)) {
  // do something
}
What's wrong with my elements and page.evaluate here?

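As far as I understand, page.evaluate runs your function inside the browser, but Puppeteer has to serialize the return value to send it back to Node. DOM nodes are not serializable, so a NodeList comes back as undefined or an empty {}, and DOM methods won't work on the result in Node.
One way around this is to fetch the page's HTML as a string (for example with page.content()) and load it into the Cheerio.js module, which lets you select elements with jQuery-style syntax as if it were a parsed DOM.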

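If you have the page's HTML as a string, you could also parse it with DOMParser, as in the example below. Note that DOMParser is a browser API, so this code has to run in a browser context (for example inside page.evaluate), not directly in Node.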
let doc = new DOMParser().parseFromString('<template class="myClass"><span class="target">check it out</span></template>', 'text/html');
let templateContent = doc.querySelector("template");
let template = new DOMParser().parseFromString(templateContent.innerHTML, 'text/html');
let target = template.querySelector("span");
console.log([templateContent,target]);
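
For the original question, here is a minimal sketch of one way to avoid the empty result: do the DOM work inside the browser and return only plain, serializable data. It assumes `page` is a Puppeteer Page that has already loaded the target URL. The helper name `collectClassNames` and the choice to return class names are illustrative, not taken from the question.

```javascript
// Assumption: `page` is an open Puppeteer Page.
// `collectClassNames` is a hypothetical helper name used only for this sketch.
async function collectClassNames(page, selector) {
  // $$eval runs the callback inside the browser with an array of matched elements.
  // Returning strings instead of DOM nodes keeps the result serializable.
  return page.$$eval(selector, (elements) => elements.map((el) => el.className));
}

// Usage (inside an async function):
// const names = await collectClassNames(page, "div[class^='my-class--']");
// for (const name of names) { /* do something */ }
```

If you need to interact with the elements themselves (click, type, and so on) rather than read data from them, page.$$(selector) returns element handles that stay in the browser and can be looped over in Node.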

Related

Puppeteer - How to evaluate XPath which returns text?

I am trying to get the text from the following XPath as a string:
//*[contains(text(), 'mission')]/following-sibling::text()[1]
I have tried
let elHandle = await page.$x("//*[contains(text(), 'mission')]/following-sibling::text()[1]")
which returns an ElementHandle<Element>[]. How can I navigate from here to get to the text string?

Assuming your XPath is correct: page.$x returns a promise that resolves to an array of matched handles (Promise<Array<ElementHandle>>). You need the first one, so add [0] after the awaited call.
You can then pass that handle to page.evaluate to retrieve the text. Because this XPath selects a text node, read textContent; innerText is only defined on HTML elements.
const elHandleText = await page.evaluate(el => el.textContent, (await page.$x("//*[contains(text(), 'mission')]/following-sibling::text()[1]"))[0])
console.log(elHandleText)
As for whether this can be done with CSS selectors: it cannot. Standard CSS has no selector that matches on text content, so XPath's contains() function is the way to go when you need to find an element by its text.
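
As an alternative that does not depend on page.$x, the XPath can be evaluated inside the page with the standard document.evaluate API. This is only a sketch: it assumes `page` is an open Puppeteer Page, and `getXPathText` is a hypothetical helper name.

```javascript
// Assumption: `page` is an open Puppeteer Page; `getXPathText` is a hypothetical helper name.
async function getXPathText(page, xpath) {
  return page.evaluate((expr) => {
    // Runs in the browser: resolve the first node that matches the XPath expression.
    const result = document.evaluate(
      expr,
      document,
      null,
      XPathResult.FIRST_ORDERED_NODE_TYPE,
      null
    );
    const node = result.singleNodeValue;
    // textContent is defined for text nodes as well as elements.
    return node ? node.textContent.trim() : null;
  }, xpath);
}

// Usage (inside an async function):
// const text = await getXPathText(
//   page,
//   "//*[contains(text(), 'mission')]/following-sibling::text()[1]"
// );
```

Only the resulting string crosses back to Node, so nothing has to be serialized except plain text.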

Is DOMParser().parseFromString() worth using?

https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension
It looks like the DOMParser uses innerHTML to add stringified elements to the DOM. What's the advantage of using it?
I have compared the difference between using DOMParser().parseFromString() and using element.innerHTML below. Am I overlooking something?
Using DOMParser
const main = document.querySelector('main');
const newNodeString = '<body><h2>I made it on the page</h2><p>What should I do now?</p><select name="whichAdventure"><option>Chill for a sec.</option><option>Explore all that this page has to offer...</option><option>Run while you still can!</option></select><p>Thanks for your advice!</p></body>';
// Works as expected
let newNode = new DOMParser().parseFromString(newNodeString, 'text/html');
let div = document.createElement('div');
console.log('%cArray.from: ', 'border-bottom: 1px solid yellow;font-weight:1000;');
Array.from(newNode.body.children).forEach((node, index, array) => {
  div.appendChild(node);
  console.log('length:', array.length, 'index: ', index, 'node: ', node);
});
main.appendChild(div);
Using innerHTML
const main = document.querySelector('main');
const newNodeString = '<h2>I made it on the page</h2><p>What should I do now?</p><select name="whichAdventure"><option>Chill for a sec.</option><option>Explore all that this page has to offer...</option><option>Run while you still can!</option></select><p>Thanks for your advice!</p>';
// Works as expected
let div = document.createElement('div');
div.innerHTML = newNodeString;
main.appendChild(div);
I expect that DOMParser().parseFromString() provides some additional functionality that I'm unaware of.

Well, for one thing, DOMParser can parse XML files. It also checks that the XML is well-formed and reports a descriptive parse error if it is not. More to the point, it is the right tool for the job.
I've used it in the past to take an uploaded XML file and produce HTML using a predefined XSLT style sheet, without getting a server involved.
Obviously, if all you're doing is appending the string to an existing DOM, and innerHTML (or outerHTML) works for you, continue using it.
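
To illustrate the well-formedness check, here is a small browser-side sketch. In browsers, parseFromString does not throw on malformed XML; it returns a document that contains a parsererror element, so the check has to look for that element. `parseXmlStrict` is a hypothetical helper name.

```javascript
// Browser-only sketch: DOMParser is a browser API and is not a Node.js global.
// `parseXmlStrict` is a hypothetical helper name.
function parseXmlStrict(xmlString) {
  const doc = new DOMParser().parseFromString(xmlString, 'application/xml');
  // On malformed input the parser does not throw; it returns a document
  // containing a <parsererror> element that describes the problem.
  const errorNode = doc.querySelector('parsererror');
  if (errorNode) {
    throw new Error(`Malformed XML: ${errorNode.textContent}`);
  }
  return doc;
}

// Usage:
// const doc = parseXmlStrict('<note><to>World</to></note>');
```

Assigning a string to innerHTML gives no comparable signal, because the HTML parser silently repairs whatever markup it is given.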

How to get the last element with querySelector?

I'm scraping a website that has a lot of nested HTML elements, but I'm only interested in the abbr elements. In my case those abbr elements have a data-utime attribute, so they look like <abbr data-utime="someValue">some other nested HTML</abbr>. I want to get the data-utime attribute value of the last abbr element on the page.
I tried to do something like this:
const SELECTOR = 'abbr:last-child';
const result = await page.evaluate((selector) => {
return document.querySelector(selector);
}, SELECTOR);
console.log(result);
console.log(typeof(res));
console.log(result.getAttribute('data-utime'));
The problem is that result is just an empty object ({}), so typeof(res) returns object, and of course it doesn't have a getAttribute function. I also believe the last-child selector is the proper way to get the last abbr element on the page. Any ideas how to achieve what I want?

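evaluate runs in the page's context, and its result is serialized before it is returned to Node. A DOM element cannot be serialized, which is why you get an empty object. Use $$eval instead and return only the attribute value: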
const SELECTOR = "abbr";
const result = await page.$$eval(
  SELECTOR,
  (elements) => elements[elements.length - 1].dataset.utime
);
console.log(result);
You can also use evaluate and call document.querySelectorAll inside it, but I prefer to keep the selectors in my Puppeteer code so I can reuse them.
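
A slightly more defensive sketch of the same idea follows. Note that abbr:last-child does not mean "the last abbr on the page": it matches any abbr that is the last child of its own parent, and querySelector returns the first such match. Selecting every abbr and taking the final array item avoids that confusion. The sketch assumes `page` is an open Puppeteer Page; `getLastUtime` is a hypothetical helper name.

```javascript
// Assumption: `page` is an open Puppeteer Page; `getLastUtime` is a hypothetical helper name.
async function getLastUtime(page) {
  return page.$$eval('abbr[data-utime]', (elements) => {
    // Matches arrive in document order, so the final item is the last match on the page.
    const last = elements[elements.length - 1];
    // Guard against an empty match list instead of throwing a TypeError.
    return last ? last.dataset.utime : null;
  });
}

// Usage (inside an async function):
// const utime = await getLastUtime(page);
```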

Parsing Html from string into document

I'm trying to parse HTML from a string into a document and then append its nodes to the real DOM one at a time.
After some research I came across this API:
DOMImplementation.createHTMLDocument()
It works well, but if a node has descendants, I need to append the node first and start appending its descendants only once it is in the real DOM. I think I can use
document.importNode(externalNode, deep);
with deep set to false to copy only the parent node.
Is this approach sound for my case, and how should I keep track of the order of appended nodes so that I don't append the same node twice?
One more problem: in some cases I need to add more HTML at a specific location (for example, after a particular div node) and then continue appending. Any idea how to do that correctly?

You can use the DOMParser for that:
const parser = new DOMParser();
const doc = parser.parseFromString('<h1>Hello World</h1>', 'text/html');
// Move the parsed nodes themselves, not the parsed document's <html> element
document.body.append(...doc.body.childNodes);
But if you want to append the same thing multiple times, you will get better performance using a template:
const template = document.createElement('template');
template.innerHTML = '<h1>Hello World</h1>';
const instance = template.cloneNode(true);
document.body.appendChild(instance.content);
const instance2 = template.cloneNode(true);
document.body.appendChild(instance2.content);
Hope this helps

How to execute document.querySelectorAll on a text string without inserting it into DOM

var html = '<p>sup</p>'
I want to run document.querySelectorAll('p') on that text without inserting it into the dom.
In jQuery you can do $(html).find('p')
If it's not possible, what's the cleanest way to do a temporary insert that doesn't interfere with anything: insert the HTML, query only that element, then remove it?
(I'm doing ajax requests and trying to parse the returned html)

With IE 10 and above, you can use the DOMParser object to parse an HTML string directly into a document.
var parser = new DOMParser();
var doc = parser.parseFromString(html, "text/html");
var paragraphs = doc.querySelectorAll('p');

You can create a temporary element, append the HTML to it, and run querySelectorAll on it:
var element = document.createElement('div');
element.insertAdjacentHTML('beforeend', '<p>sup</p>');
element.querySelectorAll('p')
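
The DOMParser approach can be wrapped in a small reusable function, roughly like this. It is a browser-side sketch, and `queryHtml` is a hypothetical helper name.

```javascript
// Browser-only sketch: DOMParser is a browser API and is not a Node.js global.
// `queryHtml` is a hypothetical helper name.
function queryHtml(html, selector) {
  // The parsed document is separate from the live page, so nothing is inserted into it.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Array.from turns the static NodeList into a plain array.
  return Array.from(doc.querySelectorAll(selector));
}

// Usage:
// const paragraphs = queryHtml('<p>sup</p>', 'p');
// console.log(paragraphs.length);
```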
