Web scraping Walmart's products with Cheerio - JavaScript

I am trying to web scrape Walmart's products. Here is the link I am trying to pull: https://www.walmart.com/search/?query=&cat_id=91083. I am able to successfully scrape only about 10 products from the page. Here is the code I am using:
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://www.walmart.com/search/?query=&cat_id=91083').then(res => {
  const combino1 = [];
  const $ = cheerio.load(res.data);
  $('a.product-title-link').each((index, element) => {
    const name = $(element).first().text();
    combino1[index] = { name };
  });
  console.log(combino1);
});
When I search the DOM with a.product-title-link, it shows 40 products. Why am I able to grab only 10 and not 40?

Your issue is that a call with axios will only get you the HTML provided by the server.
This means that any asynchronous calls that fetch products from other parts of their system will never be in that request.
A simple output of the received data to a new file will show this fact:
const fs = require('fs')
...
fs.writeFileSync('./data.html', res.data)
Opening the new data.html file will show only 10 occurrences of product-title-link.
For the rest you can't use axios alone; you need a browser-automation library such as Puppeteer, since with it you can wait for all products to be loaded before traversing the DOM.
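A minimal sketch of that approach with Puppeteer (it reuses the a.product-title-link selector from the question; note that Walmart may still detect and block headless browsers, so treat this as illustrative):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // 'networkidle2' waits until network activity settles,
  // giving the client-side product requests time to finish
  await page.goto('https://www.walmart.com/search/?query=&cat_id=91083', {
    waitUntil: 'networkidle2',
  });
  await page.waitForSelector('a.product-title-link');

  // Extract product names from the fully rendered DOM
  const products = await page.$$eval('a.product-title-link', links =>
    links.map(link => ({ name: link.textContent.trim() }))
  );

  console.log(products);
  await browser.close();
})();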

Related

How can I sort my data using Next API?

How do I filter a GET request by type (https://url/data?sort_by=type) using Next API?
I tried to make a selection:
export const getServerSideProps = async () => {
  const result = await fetch('http://localhost:3000/api/data?type=hobbies');
but nothing changed; it returns the regular array with all the data.
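For the filter to work, the API route itself has to read the query parameter; adding ?type=hobbies to the fetch does nothing unless the handler uses it. A minimal sketch of such a handler (the pages/api/data.js path and the items array are hypothetical, just to show the pattern):

// pages/api/data.js (hypothetical route and data, for illustration)
const items = [
  { name: 'chess', type: 'hobbies' },
  { name: 'react', type: 'skills' },
];

export default function handler(req, res) {
  // query string values arrive in req.query, e.g. /api/data?type=hobbies
  const { type } = req.query;

  // only filter when the parameter is present
  const result = type ? items.filter(item => item.type === type) : items;

  res.status(200).json(result);
}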

Should I be storing poke-api data locally?

I'm building a complete Pokédex app using react-native/expo with all 900+ Pokémon.
I've tried what seems like countless ways of fetching the data from the API, but it's really slow.
Not sure if it's my code or the sheer amount of data:
export const getAllPokemon = async (offset: number) => {
  const data = await fetch(
    `https://pokeapi.co/api/v2/pokemon?limit=10&offset=${offset}`
  );
  const json = await data.json();
  const pokemonURLS: PokemonURLS[] = json.results;
  const monData: PokemonType[] = await Promise.all(
    pokemonURLS.map(async (p) => {
      const response = await fetch(p.url);
      const data: PokemonDetails = await response.json();
      const speciesResponse = await fetch(data.species.url);
      const speciesData: SpeciesInfo = await speciesResponse.json();
      return {
        name: data.name,
        id: data.id,
        image: `https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/other/official-artwork/${data.id}.png`,
        description: speciesData.flavor_text_entries,
        varieties: speciesData.varieties,
        color: speciesData.color.name,
        types: data.types,
        abilities: data.abilities,
      };
    })
  );
  return monData;
};
Then I'm using it in a useEffect that increases offset by 10 each time and concatenates the arrays, until offset > 900.
However, like I said, it's really slow.
Should I be saving this data locally to speed things up a little?
And how would I go about it? Should I use local storage, or save an actual file somewhere in my project folder with the data?
The biggest performance issue I can see is the multiple fetches you perform as you loop through each Pokémon.
I'm guessing that the data returned by the two nested fetches (response and speciesResponse) is reference data and is potentially the same for multiple Pokémon. If this is the case, and you can't change the API, then two options come to mind:
Load the reference data only when needed, i.e. when a user clicks on a Pokémon to view details.
or
Get ALL the reference data before the Pokémon data and either combine it with your Pokémon fetch results or store it locally and reference it as needed. The first way can be achieved using local state; just keep it long enough to merge the relevant data with the Pokémon data. The second will need application state like Redux or persistent device storage (on the web, localStorage or IndexedDB; in React Native, AsyncStorage, as sketched below).
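Since the app is React Native/Expo, AsyncStorage is the usual stand-in for localStorage there. A minimal caching sketch wrapping the getAllPokemon function from the question (the @react-native-async-storage/async-storage package is one common choice; the pokemon-cache key name is arbitrary):

import AsyncStorage from '@react-native-async-storage/async-storage';

// hypothetical cache key prefix; any unique string works
const CACHE_KEY = 'pokemon-cache';

export const getAllPokemonCached = async (offset: number) => {
  const key = `${CACHE_KEY}-${offset}`;

  // return the cached page if this offset was fetched before
  const cached = await AsyncStorage.getItem(key);
  if (cached !== null) {
    return JSON.parse(cached) as PokemonType[];
  }

  // otherwise hit the API once and persist the result for the next launch
  const monData = await getAllPokemon(offset);
  await AsyncStorage.setItem(key, JSON.stringify(monData));
  return monData;
};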

Cheerio returning empty arrays

I tried scraping with cheerio and everything was working. Then I tried to turn the scraped links into working links by prepending the base URL to them, but suddenly this error showed up:
ReferenceError: Cannot access 'links' before initialization
I checked that I was getting the data from the site, and the axios side is working. But for some reason, cheerio stopped working. When I tried logging, it gave me an empty array of elements like this:
LoadedCheerio(136) [Element, Element, Element, Element, Element]
I don't understand why. I need help figuring out the problem.
Here's my code:
import axios from 'axios';
import * as cheerio from 'cheerio';

const baseUrl = 'https://gogoanime.tel/anime-list.html';

axios.get(baseUrl).then(res => {
  const html = res.data;
  const $ = cheerio.load(html);
  const list = $('.listing a');
  const links = list.each(function (i, el) {
    $(el).attr('href');
  });
});
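Note that .each() returns the Cheerio collection itself, not the href values, and the result of $(el).attr('href') inside the callback is simply discarded. A sketch of collecting working links instead, using .map().get() and resolving each href against the site origin (the origin is taken from the question's URL):

import axios from 'axios';
import * as cheerio from 'cheerio';

const baseUrl = 'https://gogoanime.tel/anime-list.html';

axios.get(baseUrl).then(res => {
  const $ = cheerio.load(res.data);

  // .map() keeps the callback's return values; .get() turns them into a plain array
  const links = $('.listing a')
    .map((i, el) => new URL($(el).attr('href'), baseUrl).href)
    .get();

  console.log(links);
});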

How to use incoming Binance WebSocket data?

My question is: how can I make use of WebSocket data in general? I'm using this package https://www.npmjs.com/package/binance to get Binance live tickers. What I want to do is take that live data and compare it to some other exchange's data, which I'm calling using axios. But all I get is a continuous stream of Binance data, and I don't know how to make use of it. I tried running a function inside the stream, but that does not seem to help, as the data stream is continuous and the axios request takes time to return. I can add more if my question is incomplete.
The binance package only supports WebSocket streams, but the REST API (either directly or using a different package) will help you achieve your goal more easily.
You can have two functions, each retrieving the price from one exchange, and then compare the prices from the different sources.
const axios = require('axios');

const getBinanceBtcPrice = async () => {
  // Binance allows retrieving a single symbol.
  // If you omit the param, it returns prices of all symbols
  // and you'd have to loop through them like in the WazirX example.
  const response = await axios.get('https://api.binance.com/api/v3/avgPrice?symbol=BTCUSDT');
  return response.data.price;
};

const getWazirxBtcPrice = async () => {
  // WazirX only returns all symbol data
  const response = await axios.get('https://api.wazirx.com/api/v2/market-status');
  // you need to loop through all of them until you find the desired pair
  for (let market of response.data.markets) {
    // in your case `btc` and `usdt` (mind the lowercase letters)
    if (market.baseMarket === 'btc' && market.quoteMarket === 'usdt') {
      // mind that WazirX only returns the price as a full integer,
      // it does not return floating-point numbers
      return market.last;
    }
  }
  // did not find the combination
  return null;
};

const run = async () => {
  const binancePrice = await getBinanceBtcPrice();
  const wazirxPrice = await getWazirxBtcPrice();
  console.log(binancePrice);
  console.log(wazirxPrice);
};

run();

How can I make a request to multiple URLs and parse the results from each page?

I'm using the popular npm package cheerio with request to retrieve some table data.
While I can retrieve and parse the table from a single page easily, I'd like to loop over / process multiple pages.
I have tried wrapping the call in loops and various utilities offered by the async package, but can't figure this one out. In most cases, node runs out of memory.
current code:
const cheerio = require('cheerio');
const axios = require('axios');

var url = someUrl;

const getData = async url => {
  try {
    const response = await axios.get(url);
    const data = response.data;
    const $ = cheerio.load(data);
    const announcement = $('#someId').each(function (i, elm) {
      console.log($(this).text());
    });
  } catch (error) {
    console.log(error);
  }
};
getData(url); //<--- Would like to give an array here to fetch from multiple urls / pages
My current approach, after trying loops, is to wrap this inside another function with a callback param. However no success yet and is getting quite messy.
What is the best way to feed an array to this function?
Assuming you want to do them one at a time:
(async () => {
  for (let url of urls) {
    await getData(url);
  }
})();
Have you tried using Promise.all (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all)?
For loops are usually a poor fit for asynchronous calls because they run the requests one at a time. It depends on how many calls you want to make, but I believe this could be enough. I would use an array of promises that fetch the data, then map over the results to do the parsing, as sketched below.
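A minimal sketch of that approach, reusing the getData function from the question (the urls array is a placeholder):

const urls = [/* ...your page URLs... */];

// start all requests concurrently and wait for every one to finish;
// getData already catches its own errors, so a rejection here is unlikely
Promise.all(urls.map(url => getData(url)))
  .then(() => console.log('all pages processed'));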
