Puppeteer: Get values within an element - javascript

I'm stuck here.
I have multiple rows with the class row-content.
I get them like this:
const rows = await page.$$('.row-content');
Almost every row in rows has many spans with the class cashspan.
I would like to get those values in an array called 'values'.
I've tried far too many things with no success:
for (let m = 0; m < rows.length; m++) {
const row = await rows[m];
const values = await row.evaluate(() => Array.from(row.getElementsByClassName('cashspan'), element => element.textContent));
console.log(values)
}
This was the latest thing I've tried.
With
const spancashs = await page.evaluate(() => Array.from(document.querySelectorAll('[class="cashspan"]'), element => element.textContent));
I get all the elements on the page, but I need them separately for every row. Hope that makes sense.
Update1
Example:
<div class="container">
<div class="row-content">
<div class="someclass1">
<div class="someclass2">
<span class="cashspan">1</span>
</div>
</div>
<div class="someclass3">
<div class="someclass4">
<span class="cashspan">2</span>
</div>
</div>
<div class="someclass5">
<div class="someclass6">
<span class="cashspan">3</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass7">
<div class="someclass8">
<span class="cashspan">4</span>
</div>
</div>
<div class="someclass9">
<div class="someclass10">
<span class="cashspan">5</span>
</div>
</div>
<div class="someclass11">
<div class="someclass12">
<span class="cashspan">6</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass13">
<div class="someclass14">
<span class="cashspan">7</span>
</div>
</div>
<div class="someclass15">
<div class="someclass16">
<span class="cashspan">8</span>
</div>
</div>
<div class="someclass17">
<div class="someclass18">
<span class="cashspan">9</span>
</div>
</div>
</div>
</div>
Code:
const rows = await page.$$('.row-content');
for (let i = 0; i < rows.length; i++) {
const row = await rows[i];
const values = await row.evaluate(() =>
Array.from(row.getElementsByClassName('cashspan'), element =>
element.textContent));
console.log(values)
}
I'm trying to get all cashspan values in every row-content container. The output for this example should be:
[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 7, 8, 9 ]

Following up on the comments, the row variable inside evaluate()'s callback was never defined in the browser context. The callback runs in the page, so it cannot see the Node-side row variable; the element is instead passed in as the callback's first argument. Adding that parameter to the evaluate() callback worked for me on the provided example. This is the only non-cosmetic change below:
const puppeteer = require("puppeteer"); // ^13.5.1
const html = `
<body>
<div class="container">
<div class="row-content">
<div class="someclass1">
<div class="someclass2">
<span class="cashspan">1</span>
</div>
</div>
<div class="someclass3">
<div class="someclass4">
<span class="cashspan">2</span>
</div>
</div>
<div class="someclass5">
<div class="someclass6">
<span class="cashspan">3</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass7">
<div class="someclass8">
<span class="cashspan">4</span>
</div>
</div>
<div class="someclass9">
<div class="someclass10">
<span class="cashspan">5</span>
</div>
</div>
<div class="someclass11">
<div class="someclass12">
<span class="cashspan">6</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass13">
<div class="someclass14">
<span class="cashspan">7</span>
</div>
</div>
<div class="someclass15">
<div class="someclass16">
<span class="cashspan">8</span>
</div>
</div>
<div class="someclass17">
<div class="someclass18">
<span class="cashspan">9</span>
</div>
</div>
</div>
</div>
</body>
`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
const rows = await page.$$('.row-content');
for (let i = 0; i < rows.length; i++) {
const row = await rows[i];
const values = await row.evaluate(row => Array.from(
row.getElementsByClassName('cashspan'),
element => element.textContent
));
console.log(values);
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
If this isn't working on the live site, the problem could be due to any number of JS behaviors, visibility or timing issues, so more detail would be necessary to accurately reproduce the problem.
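As an alternative sketch, the per-row collection can be done in a single browser-side call with page.$$eval(), which passes the array of matched elements to its callback. The helper name collectCashValues and the stand-in row objects below are assumptions made for illustration; the stand-ins only imitate querySelectorAll() so the function can be checked without launching a browser.

```javascript
// Page function: receives the array of matched row elements and returns
// one array of span texts per row. It only uses standard DOM calls.
const collectCashValues = rows =>
  rows.map(row =>
    Array.from(row.querySelectorAll(".cashspan"), el => el.textContent.trim())
  );

// Intended use with Puppeteer (not run here):
//   const valuesPerRow = await page.$$eval(".row-content", collectCashValues);
//   console.log(valuesPerRow);

// Hypothetical stand-in for a row element, for a quick check in Node.
const fakeRow = texts => ({
  querySelectorAll: () => texts.map(textContent => ({ textContent })),
});

console.log(
  collectCashValues([fakeRow(["1", "2", "3"]), fakeRow(["4", "5", "6"])])
);
```

The values come back as strings because textContent is text; map them through Number() if numeric values are needed.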

Related

changing Div order in a div main container, javascript DOM manipulation

I want to move a div from the start to the end of the same parent div: from 1-2-3 to 2-3-1.
My code:
const cards = document.querySelectorAll(".card");
const firstCard = document.querySelectorAll(".card")[0].innerHTML;
cards[0].remove();
document.getElementById("mainC").appendChild(firstCard);
<div id="mainC">
<div class="card"> 1 </div>
<div class="card"> 2 </div>
<div class="card"> 3 </div>
</div>
Based on your original code, we need to remove .innerHTML; then it will work:
const cards = document.querySelectorAll(".card");
const firstCard = document.querySelectorAll(".card")[0];// remove .innerHTML and it will work
cards[0].remove();
document.getElementById("mainC").appendChild(firstCard);
<div id="mainC">
<div class="card"> 1 </div>
<div class="card"> 2 </div>
<div class="card"> 3 </div>
</div>
Another solution is to store the cards' HTML in an array, change the order of the array elements, and write the result back:
let divs = []
document.querySelectorAll('#mainC .card').forEach(d =>{
divs.push(d.outerHTML)
})
divs.push(divs.shift())
document.querySelector('#mainC').innerHTML = divs.join('')
<div id="mainC">
<div class="card"> 1 </div>
<div class="card"> 2 </div>
<div class="card"> 3 </div>
</div>
You have used document.querySelectorAll(".card")[0].innerHTML, which returns the string ' 1 ' rather than a Node, so appendChild() throws an error when it receives it.
Remove .innerHTML and it will work.
Here is an example that takes the first child and appends it to the end:
const shuffle = () => {
const parent = document.querySelector("#mainContainer");
const children = [...parent.children];
// appendChild() moves a node that is already in the document
parent.appendChild(children[0]);
};
<button type="button" onclick="shuffle()">shuffle</button>
<div id="mainContainer">
<div class="card">1</div>
<div class="card">2</div>
<div class="card">3</div>
</div>
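Because appendChild() moves a node that is already in the document rather than copying it, the rotation can also be written without removing anything first. A minimal sketch, assuming the container id mainC from the question; the function name moveFirstToEnd is made up for illustration.

```javascript
// Move the first child element of `parent` to the end of `parent`.
// appendChild() relocates a node that is already attached, so no
// explicit remove() is needed and listeners on the node are kept.
function moveFirstToEnd(parent) {
  const first = parent.firstElementChild;
  if (first) {
    parent.appendChild(first);
  }
}

// In the browser, with the markup from the question:
//   moveFirstToEnd(document.getElementById("mainC")); // 1-2-3 becomes 2-3-1
```

The guard on first means an empty container is left alone instead of raising an error.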

Interchange the children of two divs using a button

My document structure is:
<div id="mainWindow">
<div id="subele1"></div>
</div>
<div id="subWindow">
<div id="subele2"></div>
</div>
I want to create a button so that the children subele1 and subele2 are interchanged every time the button is clicked.
UPD
function handleButtonClicked() {
const mainElement = document.getElementById('subele1')
const subElement = document.getElementById('subele2')
const mainElementValue = mainElement.innerHTML
mainElement.innerHTML = subElement.innerHTML
subElement.innerHTML = mainElementValue
}
<div id="mainWindow">
<div id="subele1">main Window!</div>
</div>
<div id="subWindow">
<div id="subele2">sub Window?</div>
</div>
<button id='main' onclick="handleButtonClicked()">switch</button>
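Swapping innerHTML re-parses the markup, so event listeners and other state attached to nested elements are lost. If that matters, the child nodes themselves can be moved instead. This is a sketch under the assumption that the two containers are #mainWindow and #subWindow from the question and that Element.replaceChildren() is available, as it is in current browsers; swapChildren is a hypothetical helper name.

```javascript
// Exchange the child nodes of two container elements by moving the
// nodes themselves, not by copying their HTML.
function swapChildren(first, second) {
  // Snapshot both child lists before anything is moved.
  const firstNodes = [...first.childNodes];
  const secondNodes = [...second.childNodes];
  first.replaceChildren(...secondNodes);
  second.replaceChildren(...firstNodes);
}

// In the browser, e.g. from the button's click handler:
//   swapChildren(
//     document.getElementById("mainWindow"),
//     document.getElementById("subWindow")
//   );
```

With this version subele1 and subele2 themselves change parents, so their ids travel with them on every click.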

Puppeteer: Need help extracting the text from h2 and span

Absolute beginner here with JS. I need help extracting the text from a DOM that looks like the one below.
Extraction can be done with querySelectorAll() or getElementsByTagName(), but what I'm looking for is to create an object with each h2 element's text as a key and the text of the spans that follow it as its value. I don't know how this can be achieved. Any suggestions would be very helpful.
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row">
<span>First Product</span>
</div>
<div class="row">
<span> Second Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row">
<span>Third Product</span>
</div>
<div class="row">
<span> Fourth Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
</div>
From this DOM I need to store the data as
{
"Products List 1": ["First Product", "Second Product", ..., "Nth Product"],
"Products List 2": ["Third Product", "Fourth Product", ..., "Nth Product"]
}
JS:
const products=await page.evaluate(()=>{
const productsArray=[];
var pdName1=document.querySelectorAll('div.column > h2.product-name');
var pdName2=document.querySelectorAll("div.row > span")
pdName2.forEach(query=>{
productsArray.push(query.innerText)
})
return productsArray
})
You can try something like this:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
const html = `
<!doctype html>
<html>
<head><meta charset='UTF-8'><title>Test</title></head>
<body>
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row"><span>First Product</span></div>
<div class="row"><span> Second Product</span></div>
<div class="row"><span>Nth Product</span></div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row"><span>Third Product</span></div>
<div class="row"><span> Fourth Product</span></div>
<div class="row"><span>Nth Product</span></div>
</div>
</body>
</html>`;
try {
const [page] = await browser.pages();
await page.goto(`data:text/html,${html}`);
const data = await page.evaluate(() => {
const elements = document.querySelectorAll('h2, div.row span');
const list = {};
let currentKey = null;
for (const element of elements) {
if (element.tagName === 'H2') {
currentKey = element.innerText;
list[currentKey] = [];
} else {
list[currentKey].push(element.innerText);
}
}
return list;
});
console.log(data);
} catch (err) { console.error(err); } finally { await browser.close(); }
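The grouping step in the loop above can also be isolated as a plain function, which makes it easy to check outside the browser. This is only a sketch: groupByHeading is a hypothetical name, it takes simple { tag, text } records (an assumed shape), trims whitespace, and skips any span that appears before the first heading.

```javascript
// Group item texts under the most recent heading text.
// `records` is an ordered list of { tag, text } objects.
function groupByHeading(records) {
  const groups = {};
  let currentKey = null;
  for (const { tag, text } of records) {
    const clean = text.trim();
    if (tag === "H2") {
      // A heading starts a new group.
      currentKey = clean;
      groups[currentKey] = [];
    } else if (currentKey !== null) {
      // Items before the first heading have no group and are skipped.
      groups[currentKey].push(clean);
    }
  }
  return groups;
}

// Made-up records mirroring the sample markup.
const sample = [
  { tag: "H2", text: "Products List 1" },
  { tag: "SPAN", text: "First Product" },
  { tag: "SPAN", text: " Second Product" },
  { tag: "H2", text: "Products List 2" },
  { tag: "SPAN", text: "Third Product" },
];
console.log(groupByHeading(sample));
```

Inside page.evaluate(), the records could be built with Array.from(document.querySelectorAll('h2, div.row span'), el => ({ tag: el.tagName, text: el.textContent })) and returned to Node for grouping.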

How to apply multiple .filter() methods along with a .map() method?

I need to filter hotels by multiple conditions: stars (multiple choice), type (multiple choice), price (a range with two ends), and country (one from a list). I tried to do it like this:
document.getElementById("app").innerHTML =`${hotels.filter(star => star.stars === 4).filter(type => type.type === "hostel").map(hotelTemplate)}`
But it seems that I can apply only one filter method along with the map. Does anybody know if there is a way to apply multiple filter methods and then map the result?
Attaching the full code in case you need more details:
async function filters(){
const requestUrl = "js/hotels.json";
const response = await fetch(requestUrl);
const data = await response.json();
const hotels = Object.values(data.hotels);
const hotelTemplate = (hotel) => {
return `
<div class="result">
<img class="result__img" src="images/hotel2.png" alt="${hotel.name}">
<div class="result__description">
<p class="result__address">${hotel.address}</p>
<h3 class="result__name">${hotel.name}</h3>
<div class="result__details">
<div class="rating">
<div class="rating__stars">${hotel.stars}</div>
<p class="rating__words"></p>
</div>
<div class="result__type">${hotel.type}</div>
</div>
<p class="result__info">${hotel.description}</p>
</div>
<div class="result__cta">
<div class="review">
<div class="review__head">
<div class="review__rating">
<img class="review__star" src="images/star.svg" alt="${hotel.name}">
<p class="review__rate">${hotel.rating}</p>
</div>
<p class="review__estimate">Good</p>
<p class="review__amount"><span class="review__number">${hotel.reviews_amount}</span> reviews</p>
</div>
<div class="review__body">
<p class="review__text">"${hotel.last_review}"</p>
</div>
</div>
<div class="order">
<div class="order__price">
<p class="order__offer">from <span class="order__amount">${eurFormatter.format(parseInt(hotel.min_price))}</span></p>
</div>
<button class="order__book">
Book now
</button>
</div>
</div>
</div>
`;
}
document.getElementById("app").innerHTML =`${hotels.filter(star => star.stars === 4).map(hotelTemplate)}`
}
Hey everyone who reads this message!
I have found the solution here:
https://gist.github.com/jherax/f11d669ba286f21b7a2dcff69621eb72
If the same question brought you here, you will very likely find your answer there too.
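For what it's worth, filter() returns a new array, so several filter() calls can be chained before map(), or the conditions can be combined into one predicate. The sketch below assumes the field names stars, type, min_price and name from the template in the question, plus a country field that the question mentions but does not show; the criteria object and the filterHotels name are hypothetical.

```javascript
// Keep hotels that satisfy every criterion that is set.
// stars/types are multiple choice (arrays); price is a closed range.
function filterHotels(
  hotels,
  { stars = [], types = [], minPrice = 0, maxPrice = Infinity, country = null } = {}
) {
  return hotels.filter(hotel => {
    const price = Number(hotel.min_price);
    return (
      (stars.length === 0 || stars.includes(hotel.stars)) &&
      (types.length === 0 || types.includes(hotel.type)) &&
      price >= minPrice &&
      price <= maxPrice &&
      (country === null || hotel.country === country)
    );
  });
}

// Made-up sample data for illustration.
const hotels = [
  { name: "A", stars: 4, type: "hostel", min_price: "40", country: "Spain" },
  { name: "B", stars: 4, type: "hotel", min_price: "120", country: "Spain" },
  { name: "C", stars: 3, type: "hostel", min_price: "30", country: "Italy" },
];

const names = filterHotels(hotels, { stars: [4, 5], types: ["hostel"], maxPrice: 100 })
  .map(hotel => hotel.name);
console.log(names); // [ 'A' ]

// Chaining separate filter() calls gives the same result here.
const chained = hotels
  .filter(hotel => hotel.stars === 4)
  .filter(hotel => hotel.type === "hostel")
  .map(hotel => hotel.name);
console.log(chained); // [ 'A' ]
```

When the mapped items are HTML strings, calling .join("") on the result before assigning it to innerHTML avoids the commas that an array produces when it is converted to a string.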

How to exclude a certain part of HTML in cheerio?

I have the following HTML and I'd like to get only the contents, excluding those in #F1. I have tried this, but it's not working:
"use strict";
let sample = `
<div id="main">
<div class="content1">
<h2>Status</h2>
</div>
<div id="F1" >
<div class="description">
<p>some info</p>
</div>
</div>
<div class="content2">
<h2>Status</h2>
</div>
</div>
`
var cheerio = require('cheerio'),
$ = cheerio.load(sample);
$('#main').not('#F1').map(function(i, el) {
console.log($(el).html())
})
There are at least a few ways to do this. Which way is best probably depends on context that hasn't been provided, but here goes:
import cheerio from "cheerio"; // 1.0.0-rc.12
const html = `
<div id="main">
<div class="content1">
<h2>Status</h2>
</div>
<div id="F1" >
<div class="description">
<p>some info</p>
</div>
</div>
<div class="content2">
<h2>Status</h2>
</div>
</div>
`;
const $ = cheerio.load(html);
// approach 1
console.log([...$("#main > div").not("#F1")]
.map(e => $(e).text().trim()));
// approach 2
console.log([...$("#main > div:not(#F1)")]
.map(e => $(e).text().trim()));
// approach 3
console.log([...$("#main [class^='content']")]
.map(e => $(e).text().trim()));
// approach 4 (destructive)
$("#F1").remove();
console.log([...$("#main > div")]
.map(e => $(e).text().trim()));
