How to exclude a certain part of HTML in cheerio? - javascript

I have the following HTML, and I'd like to get only the contents, excluding those in #F1. I have tried this, but it's not working:
"use strict";
let sample = `
<div id="main">
<div class="content1">
<h2>Status</h2>
</div>
<div id="F1" >
<div class="description">
<p>some info</p>
</div>
</div>
<div class="content2">
<h2>Status</h2>
</div>
</div>
`
var cheerio = require('cheerio'),
$ = cheerio.load(sample);
$('#main').not('#F1').map(function(i, el) {
console.log($(el).html())
})

There are at least a few ways to do this. Which way is best probably depends on context that hasn't been provided, but here goes:
import cheerio from "cheerio"; // 1.0.0-rc.12
const html = `
<div id="main">
<div class="content1">
<h2>Status</h2>
</div>
<div id="F1" >
<div class="description">
<p>some info</p>
</div>
</div>
<div class="content2">
<h2>Status</h2>
</div>
</div>
`;
const $ = cheerio.load(html);
// approach 1
console.log([...$("#main > div").not("#F1")]
.map(e => $(e).text().trim()));
// approach 2
console.log([...$("#main > div:not(#F1)")]
.map(e => $(e).text().trim()));
// approach 3
console.log([...$("#main [class^='content']")]
.map(e => $(e).text().trim()));
// approach 4 (destructive)
$("#F1").remove();
console.log([...$("#main > div")]
.map(e => $(e).text().trim()));

Related

Puppeteer; Get Values within an element

I'm stuck here.
I have multiple rows with the class row-content.
I get them like that:
const rows = await page.$$('.row-content');
Almost every row in rows has many spans with the class cashspan.
I would like to get those values in an array called 'values'.
I've tried far too many things with no success:
for (let m = 0; m < rows.length; m++) {
const row = await rows[m];
const values = await row.evaluate(() => Array.from(row.getElementsByClassName('cashspan'), element => element.textContent));
console.log(values)
}
This was the latest thing I've tried.
With
const spancashs = await page.evaluate(() => Array.from(document.querySelectorAll('[class="cashspan"]'), element => element.textContent));
I get all the elements on the page, but I need them for every row. Hope that makes sense.
Update1
Example:
<div class="container">
<div class="row-content">
<div class="someclass1">
<div class="someclass2">
<span class="cashspan">1</span>
</div>
</div>
<div class="someclass3">
<div class="someclass4">
<span class="cashspan">2</span>
</div>
</div>
<div class="someclass5">
<div class="someclass6">
<span class="cashspan">3</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass7">
<div class="someclass8">
<span class="cashspan">4</span>
</div>
</div>
<div class="someclass9">
<div class="someclass10">
<span class="cashspan">5</span>
</div>
</div>
<div class="someclass11">
<div class="someclass12">
<span class="cashspan">6</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass13">
<div class="someclass14">
<span class="cashspan">7</span>
</div>
</div>
<div class="someclass15">
<div class="someclass16">
<span class="cashspan">8</span>
</div>
</div>
<div class="someclass17">
<div class="someclass18">
<span class="cashspan">9</span>
</div>
</div>
</div>
</div>
Code:
const rows = await page.$$('.row-content');
for (let i = 0; i < rows.length; i++) {
const row = await rows[i];
const values = await row.evaluate(() =>
Array.from(row.getElementsByClassName('cashspan'), element =>
element.textContent));
console.log(values)
}
I'm trying to get all cashspan values in every row-content container. The output for this example should be:
[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 7, 8, 9 ]
Following up on the comments, the row variable inside of evaluate()'s callback was never defined in browser context. Adding that variable to the evaluate() callback parameter list worked for me on the provided example. This is the only non-cosmetic change below:
const puppeteer = require("puppeteer"); // ^13.5.1
const html = `
<body>
<div class="container">
<div class="row-content">
<div class="someclass1">
<div class="someclass2">
<span class="cashspan">1</span>
</div>
</div>
<div class="someclass3">
<div class="someclass4">
<span class="cashspan">2</span>
</div>
</div>
<div class="someclass5">
<div class="someclass6">
<span class="cashspan">3</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass7">
<div class="someclass8">
<span class="cashspan">4</span>
</div>
</div>
<div class="someclass9">
<div class="someclass10">
<span class="cashspan">5</span>
</div>
</div>
<div class="someclass11">
<div class="someclass12">
<span class="cashspan">6</span>
</div>
</div>
</div>
<div class="row-content">
<div class="someclass13">
<div class="someclass14">
<span class="cashspan">7</span>
</div>
</div>
<div class="someclass15">
<div class="someclass16">
<span class="cashspan">8</span>
</div>
</div>
<div class="someclass17">
<div class="someclass18">
<span class="cashspan">9</span>
</div>
</div>
</div>
</div>
</body>
`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
const rows = await page.$$('.row-content');
for (let i = 0; i < rows.length; i++) {
const row = await rows[i];
const values = await row.evaluate(row => Array.from(
row.getElementsByClassName('cashspan'),
element => element.textContent
));
console.log(values);
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
If this isn't working on the live site, the problem could be due to any number of JS behaviors, visibility or timing issues, so more detail would be necessary to accurately reproduce the problem.
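To see why row was undefined in the original attempt, it may help to picture roughly what happens to an evaluate() callback: its source text is sent to the browser and rebuilt there, so variables captured from the Node scope do not travel with it, while explicit arguments do. The sketch below imitates that in plain Node with a hypothetical helper, runSerialized (not a Puppeteer API). It is only a loose model of the real mechanism:

```javascript
// Hypothetical helper, not part of Puppeteer: rebuild a function from its
// source text, so it loses access to the scope where it was written.
function runSerialized(fn, ...args) {
  const rebuilt = new Function(`return (${fn.toString()})`)();
  return rebuilt(...args);
}

function demo() {
  const row = { textContent: "1" }; // stands in for a row element

  let closureError = null;
  try {
    // Like the original attempt: `row` is only reachable through a closure.
    runSerialized(() => row.textContent);
  } catch (err) {
    closureError = err.name;
  }

  // Like the fix: `row` is received as a parameter.
  const viaParameter = runSerialized(row => row.textContent, row);

  return { closureError, viaParameter };
}

console.log(demo());
```

The first call fails with a ReferenceError because the rebuilt function has no row in scope; the second works because row arrives as an argument, which mirrors the change from row.evaluate(() => ...) to row.evaluate(row => ...).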

Puppeteer- Need help to extract the text from h2 and span

Absolute beginner here with JS. I need help extracting the text from a DOM that looks like this.
Extracting can be done with querySelectorAll() or getElementsByTagName(). But what I'm looking for is to create an object with each h2 element as the key and the spans as its value. I don't know how this can be achieved. Any suggestions would be very helpful.
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row">
<span>First Product</span>
</div>
<div class="row">
<span> Second Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row">
<span>Third Product</span>
</div>
<div class="row">
<span> Fourth Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
</div>
From this DOM I need to store the data as
[
Products List 1 :[First Product,Second Product...Nth Product],
Products List 2 :[Third Product,Fourth Product...Nth Product]
]
JS:
const products=await page.evaluate(()=>{
const productArray=[];
var pdName1=document.querySelectorAll('div.column > h2.product-name');
var pdName2=document.querySelectorAll("div.row > span")
pdName2.forEach(query=>{
productArray.push(query.innerText)
})
return productArray
})
You can try something like this:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
const html = `
<!doctype html>
<html>
<head><meta charset='UTF-8'><title>Test</title></head>
<body>
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row"><span>First Product</span></div>
<div class="row"><span> Second Product</span></div>
<div class="row"><span>Nth Product</span></div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row"><span>Third Product</span></div>
<div class="row"><span> Fourth Product</span></div>
<div class="row"><span>Nth Product</span></div>
</div>
</body>
</html>`;
try {
const [page] = await browser.pages();
await page.goto(`data:text/html,${html}`);
const data = await page.evaluate(() => {
const elements = document.querySelectorAll('h2, div.row span');
const list = {};
let currentKey = null;
for (const element of elements) {
if (element.tagName === 'H2') {
currentKey = element.innerText;
list[currentKey] = [];
} else {
list[currentKey].push(element.innerText);
}
}
return list;
});
console.log(data);
} catch (err) { console.error(err); } finally { await browser.close(); }
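The grouping step in the callback above does not depend on the browser, so it can be tried on plain data. A minimal sketch, assuming the matched elements are reduced to {tag, text} records in document order (the groupUnderHeadings name and the record shape are made up for illustration). It also trims whitespace and skips any span that appears before the first heading:

```javascript
// Hypothetical helper: group item texts under the most recent heading.
function groupUnderHeadings(records) {
  const list = {};
  let currentKey = null;
  for (const { tag, text } of records) {
    const value = text.trim();
    if (tag === "H2") {
      // A heading starts a new group.
      currentKey = value;
      list[currentKey] = [];
    } else if (currentKey !== null) {
      // Any other record belongs to the current group, if there is one.
      list[currentKey].push(value);
    }
  }
  return list;
}

const records = [
  { tag: "H2", text: "Products List 1" },
  { tag: "SPAN", text: "First Product" },
  { tag: "SPAN", text: " Second Product" },
  { tag: "H2", text: "Products List 2" },
  { tag: "SPAN", text: "Third Product" },
  { tag: "SPAN", text: " Fourth Product" },
];

console.log(groupUnderHeadings(records));
```

The result is a plain object keyed by heading text, which serializes cleanly when returned from page.evaluate().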

How to apply multiple .filter() methods along with a .map() method?

I need to filter hotels by multiple conditions: stars (multiple choice), type (multiple choice), price (a range with two ends), and country (one from a list). I tried it like this:
document.getElementById("app").innerHTML =`${hotels.filter(star => star.stars === 4).filter(type => type.type === "hostel").map(hotelTemplate)}`
But it looks like I can apply only one filter method along with the map. Does anybody know if there is a way to apply multiple filter methods and then map the result?
Attaching the full code if you need more details:
async function filters(){
const requestUrl = "js/hotels.json";
const response = await fetch(requestUrl);
const data = await response.json();
const hotels = Object.values(data.hotels);
const hotelTemplate = (hotel) => {
return `
<div class="result">
<img class="result__img" src="images/hotel2.png" alt="${hotel.name}">
<div class="result__description">
<p class="result__address">${hotel.address}</p>
<h3 class="result__name">${hotel.name}</h3>
<div class="result__details">
<div class="rating">
<div class="rating__stars">${hotel.stars}</div>
<p class="rating__words"></p>
</div>
<div class="result__type">${hotel.type}</div>
</div>
<p class="result__info">${hotel.description}</p>
</div>
<div class="result__cta">
<div class="review">
<div class="review__head">
<div class="review__rating">
<img class="review__star" src="images/star.svg" alt="${hotel.name}">
<p class="review__rate">${hotel.rating}</p>
</div>
<p class="review__estimate">Good</p>
<p class="review__amount"><span class="review__number">${hotel.reviews_amount}</span> reviews</p>
</div>
<div class="review__body">
<p class="review__text">"${hotel.last_review}"</p>
</div>
</div>
<div class="order">
<div class="order__price">
<p class="order__offer">from <span class="order__amount">${eurFormatter.format(parseInt(hotel.min_price))}</span></p>
</div>
<button class="order__book">
Book now
</button>
</div>
</div>
</div>
`;
}
document.getElementById("app").innerHTML =`${hotels.filter(star => star.stars === 4).map(hotelTemplate)}`
}
Hey everyone who reads this message!
I have found the solution here:
https://gist.github.com/jherax/f11d669ba286f21b7a2dcff69621eb72
It's highly likely that you will find it there too if you are brought here by the same question as above.
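For readers who prefer not to follow the link: chaining several .filter() calls before .map() is valid JavaScript, since each call returns a new array. One common pattern is to fold all conditions into a single predicate. The sketch below is an illustration under assumptions, not the code from the gist: the criteria object, the matches helper, the sample data, and the country field are invented here, and min_price is parsed with parseInt as in the question.

```javascript
// Sample data shaped like the question's hotels (the `country` field is assumed).
const hotels = [
  { name: "A", stars: 4, type: "hostel", min_price: "40", country: "Spain" },
  { name: "B", stars: 5, type: "hotel", min_price: "180", country: "Spain" },
  { name: "C", stars: 4, type: "hotel", min_price: "90", country: "Italy" },
  { name: "D", stars: 3, type: "hostel", min_price: "25", country: "Spain" },
];

// Hypothetical criteria object: an empty array or null means "no restriction".
const criteria = {
  stars: [4, 5],
  types: ["hostel", "hotel"],
  price: { min: 30, max: 200 },
  country: "Spain",
};

// One predicate that checks every condition for a single hotel.
function matches(hotel, { stars, types, price, country }) {
  const cost = parseInt(hotel.min_price, 10);
  return (
    (stars.length === 0 || stars.includes(hotel.stars)) &&
    (types.length === 0 || types.includes(hotel.type)) &&
    cost >= price.min &&
    cost <= price.max &&
    (country === null || hotel.country === country)
  );
}

const names = hotels
  .filter(hotel => matches(hotel, criteria))
  .map(hotel => hotel.name);

console.log(names);
```

When rendering, .map(hotelTemplate).join("") avoids the commas that appear when an array is interpolated directly into a template literal.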

JQuery find nested element in cloned object returns undefined

I'm new to jQuery and can't figure out why it isn't able to find the nested span element.
My HTML is as follows:
<template id="repository_template">
<section>
<div>
<div class="column">
<img src="imgs/eyes.gif" alt="" width="20%">
</div>
<div class="column">
<span id="title" class="title"></span><br>
<span id="description" class="subtitle"></span>
</div>
</div>
</section>
</template>
<div id="main">
</div>
And, I'm trying to find title and description but it results in undefined.
What I've tried:
// approach one
let appOne = $('#repository_template').clone();
let appOneTitle = appOne.find('#title');
console.log(appOneTitle.html());
// approach two
let $appTwo = $('#repository_template').clone();
let $appTwoTitle = $($appTwo).find('#title');
console.log($($appTwoTitle).html());
// approach three
let appThree = $('#repository_template').clone();
let appThreeTitle = appThree.find('span');
console.log(appThreeTitle.html());
What I want to accomplish:
let repoTemplate = $('#repository_template').clone();
repoTemplate.find('#title').text('Hello');
repoTemplate.find('#description').text('World!');
$('#main').append(repoTemplate.html());

How to escape attribute for Javascript Template String?

The snippet is
function addTabItems(tab, items) {
const tabItems = items.map((item) => `
<div class="bvb-cmp-grid__item">
<div class="bvb-comp-grid__item-head">
<img alt="grid-item" src="${item['image']}" class="bvb-cmp-grid__item-img">
<div class="bvb-cmp-grid__item-title">${item['title']}</div>
</div>
<div class="bvb-comp-grid__item-blocks">
${item['items'].map((block) => `
<div class="bvb-comp-grid__item-block">
<div class="bvb-comp-grid__item-block-title">${block['title']}</div>
<div class="bvb-comp-grid__item-block-content">${block['content']}</div>
</div>
`).join('')}
</div>
</div>
`).join('');
const $tabContent = $(`[data-bvb-cmp-grid-tab-content="${tab}"]`);
$tabContent.append(tabItems);
}
What I need to achieve: building the template the correct way.
How do I escape the attribute value src="${item['image']}" to prevent an XSS attack?
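One common approach is to pass every interpolated value through an escaping function before it reaches the template. A minimal sketch, assuming a hand-written escapeHtml helper (not a built-in) and attribute values that are always wrapped in quotes, as they are in the snippet above:

```javascript
// Hypothetical helper: replace the characters that are significant in HTML
// text and in quoted attribute values with their entity equivalents.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, ch => HTML_ENTITIES[ch]);
}

// Example input that tries to break out of the src attribute.
const item = { image: 'x.png" onerror="alert(1)', title: "<b>Title</b>" };

const markup = `<img alt="grid-item" src="${escapeHtml(item.image)}">
<div>${escapeHtml(item.title)}</div>`;

console.log(markup);
```

Escaping keeps a value from breaking out of the attribute, but it does not judge the URL itself. If item['image'] comes from untrusted input, checking its scheme (for example, allowing only http: and https:) is a separate step. Building the elements with jQuery's .attr() and .text() instead of string concatenation is another option.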
