How to web-scrape the required result using Nodejs?

Hi, I'm a beginner in scraping and jQuery. I'm doing the scraping with Node.js, learning from this site. The dependencies I'm using are cheerio, request-promise, and fs. I had no issue setting up the environment.
I tried to scrape the basic text data from this site, but I'm not able to do so. I tried changing the classes, but the result stays the same.
I want to get the data and print it in the format shown in the image above.
But I'm not able to add the City and Property Type sections (image above) to it by scraping.
What change should I make in the logic to get that result?
Here's my code:
const request = require("request-promise");
const cheerio = require("cheerio");
request(
  "https://www.makaan.com/price-trends/property-rates-for-buy-in-hyderabad?page=2",
  (error, response, html) => {
    if (!error && response.statusCode == 200) {
      const $ = cheerio.load(html);
      const datarow = $(".tbl td, .tbl th,");
      const output = datarow.find(".ta-1").text();
      $(".ta-l.link-td-txt, .ta-, ").each((i, data) => {
        const item = $(data).text();
        console.log(item);
      });
    }
  }
);
My current result is shown in the image above.
Issues with the current result:
1: It is not in the format shown above.
2: The column header names are missing from the first line.
3: Commas are missing between the fields.
Tried: I searched on Google and YouTube to try to solve it. I also learned basic jQuery, but I'm still not getting there.
Expecting: What should I do to get the result mentioned above? What is wrong in my logic?
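For reference, here is a minimal sketch of one way to get comma-separated rows with a header line. It walks the table row by row instead of dumping all cells at once; the .tbl selector comes from the code above, but the row/cell selectors are assumptions and may need adjusting to the page's actual markup:
const request = require("request-promise");
const cheerio = require("cheerio");
request("https://www.makaan.com/price-trends/property-rates-for-buy-in-hyderabad?page=2")
  .then(html => {
    const $ = cheerio.load(html);
    // Walk the table row by row so each row becomes one line of output.
    $(".tbl tr").each((i, row) => {
      const fields = $(row)
        .find("th, td")
        .map((j, cell) => $(cell).text().trim())
        .get();
      // The first row holds the <th> headers; every row prints comma-separated.
      if (fields.length) console.log(fields.join(","));
    });
  })
  .catch(err => console.error(err));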

Algolia + Laravel backend API + nuxtJS

I have a Laravel 8 backend API and a completely separate NuxtJS frontend. I have integrated Algolia into the backend API. I have a nice adapter and service and I can search my index.
Now, I am not interested in using Scout because I don't like what it does and how it works; that's not the problem here, so I would like to leave it out of the discussion.
So I've made search work on the frontend with Vuetify autocomplete but I decided to use VueInstant search as this is supposed to save me some work when it comes to integrating query suggestions.
Before I can even get query suggestion I need to get the basic search working with Vue Instant Search.
GOAL
I want to have a VueInstant Search with the backend search client.
WHAT I HAVE SO FAR
THAT IS, WITHOUT QUERY SUGGESTIONS, JUST THE BASIC SEARCH WITH VUE INSTANT SEARCH
I have backend code that searches my index. I have the frontend code that creates a new connection to my backend (don't worry about how it looks; I just need to get this to work first and then I will invest the time to refactor it):
customSearchClient () {
  const that = this
  return {
    search(requests) {
      return that.fetchContainers({ criteria: { query: 'super' }, updateStore: false }).then(response => {
        // console.log({ response }, typeof response)
        // return response.data.hits
        return { results: response.data }
        // return response
        // return response.data.hits
      })
    }
  }
}
And this is my code for the form:
<ais-instant-search index-name="containers-index" :search-client="customSearchClient()">
  <ais-search-box />
  <ais-hits>
    <template slot="item" slot-scope="{ item }">
      <h1><ais-highlight :hit="item" attribute="name" /></h1>
      <p><ais-highlight :hit="item" attribute="description" /></p>
    </template>
  </ais-hits>
</ais-instant-search>
PROBLEMS
I can get the search box to show and query if I remove the ais-hits tags. As soon as I add them, I get weird errors depending on how I format the response from the backend; I just try to pass it on as it is.
I went through some debugging and tried wrapping the response in the various wrappers that seemed to be missing, but eventually it always breaks, for example:
algoliasearch.helper.js?ea40:1334 Uncaught (in promise) TypeError: content.results.slice is not a function at AlgoliaSearchHelper._dispatchAlgoliaResponse (algoliasearch.helper.js?ea40:1334:1)
And that is the Algolia code that breaks.
this._currentNbQueries -= (queryId - this._lastQueryIdReceived);
this._lastQueryIdReceived = queryId;
if (this._currentNbQueries === 0) this.emit('searchQueueEmpty');

var results = content.results.slice();
states.forEach(function(s) {
  var state = s.state;
  var queriesCount = s.queriesCount;
  var helper = s.helper;
  var specificResults = results.splice(0, queriesCount);
  var formattedResponse = helper.lastResults = new SearchResults(state, specificResults);
SUMMARY
The ideal solution would be not to use this InstantSearch thing at all, but I have no clue how to manage more than one index on the server side.
Or am I completely wrong about all of that? Can anyone advise?
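For what it's worth, the content.results.slice error suggests the custom client must resolve with results as an array: one Algolia-style result object per incoming request. A minimal sketch of that shape, reusing fetchContainers from above; the response.data.hits field is an assumption based on the commented-out lines, so adjust it to your backend payload:
customSearchClient () {
  const that = this
  return {
    search (requests) {
      return that.fetchContainers({ criteria: { query: 'super' }, updateStore: false }).then(response => {
        // `results` must be an ARRAY with one entry per request,
        // each entry shaped like an Algolia search response.
        return {
          results: requests.map(request => ({
            hits: response.data.hits || [],
            nbHits: (response.data.hits || []).length,
            page: 0,
            nbPages: 1,
            hitsPerPage: 20,
            processingTimeMS: 1,
            exhaustiveNbHits: true,
            query: request.params.query,
            params: ''
          }))
        }
      })
    }
  }
}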

How can I simply iterate through JSON that has a multileveled encapsulation with (Google) Javascript?

I want to retrieve the current prices of specific crypto currencies and update some fields in my google sheet.
Here a short codesnippet:
var api_url = "API-URL";
var response = UrlFetchApp.fetch(api_url);
var dataAll = JSON.parse(response.getContentText());
(Screenshots omitted: the dataAll content when executing the URL in a browser, and the dataAll content shown in the Google Script editor's runtime environment.)
Here is the raw JSON response of the api-url call, with just one crypto entry in the data:
{"status":{"timestamp":"2021-04-01T21:11:59.721Z","error_code":0,"error_message":null,"elapsed":13,"credit_count":1,"notice":null,"total_count":4567},"data":[{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","num_market_pairs":9477,"date_added":"2013-04-28T00:00:00.000Z","tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio","binance-labs-portfolio","arrington-xrp-capital","blockchain-capital-portfolio","boostvc-portfolio","cms-holdings-portfolio","dcg-portfolio","dragonfly-capital-portfolio","electric-capital-portfolio","fabric-ventures-portfolio","framework-ventures","galaxy-digital-portfolio","huobi-capital","alameda-research-portfolio","a16z-portfolio","1confirmation-portfolio","winklevoss-capital","usv-portfolio","placeholder-ventures-portfolio","pantera-capital-portfolio","multicoin-capital-portfolio","paradigm-xzy-screener"],"max_supply":21000000,"circulating_supply":18670918,"total_supply":18670918,"platform":null,"cmc_rank":1,"last_updated":"2021-04-01T21:11:03.000Z","quote":{"CHF":{"price":55752.47839320199,"volume_24h":59267607529.77155,"percent_change_1h":0.02671823,"percent_change_24h":0.05924755,"percent_change_7d":11.47320017,"percent_change_30d":24.2882489,"percent_change_60d":81.38470939,"percent_change_90d":102.84247336,"market_cap":1040949952376.2463,"last_updated":"2021-04-01T21:11:15.000Z"}}}]}
For better readability I just pasted it into e.g. Notepad++ and went for Menu > JSON Viewer > Format JSON.
I know it's really basic, but how the heck can I now iterate through this nested object and dig down to the appropriate level so I can read the price? I only want to pick a specific cryptocurrency, e.g. Ethereum, which has id: 1027, and take its price for further purposes.
I want to be able to pick just the entries that fit my portfolio (e.g. distinguished by id:) and take their prices to update specific cells in my Google Sheet.
Thanks a lot for your help in advance!
Best regards
Doniberi
If you want to get the data by name, just filter for it:
const api_url = 'API-URL';
const response = UrlFetchApp.fetch(api_url);
const dataAll = JSON.parse(response.getContentText());
const etherealData = dataAll.data.find(item => item.name === 'Ethereum');
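Following the structure of the raw response pasted above, the price then sits under quote.CHF.price:
// Path follows the raw JSON response shown above.
const price = etherealData.quote.CHF.price;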
function postDataToSheet() {
  const ss = SpreadsheetApp.getActive();
  const selected = ['Symbol1', 'Symbol2'];
  const api_url = "API-URL";
  let r = UrlFetchApp.fetch(api_url);
  let rjson = r.getContentText();
  let robj = JSON.parse(rjson);
  let vs = [];
  const ts = robj.status.timestamp;
  const tc = robj.status.total_count;
  vs.push(['TimeStamp', ts, '']);
  vs.push(['Total Count', tc, '']);
  vs.push(['Id', 'Symbol', 'Price']);
  //***************************************************
  // Option 1: put them all on the sheet
  robj.data.forEach((item, i) => {
    vs.push([item.id, item.symbol, item.quote.CHF.price]);
  });
  //***************************************************
  // Option 2: put only the selected symbols on the sheet
  // (use this loop instead of Option 1)
  // robj.data.forEach((item, i) => {
  //   if (~selected.indexOf(item.symbol)) {
  //     vs.push([item.id, item.symbol, item.quote.CHF.price]);
  //   }
  // });
  const sh = ss.getSheetByName('Sheet1');
  sh.getRange(1, 1, vs.length, vs[0].length).setValues(vs);
}
If you only need the current price, I'm pretty sure you can remove all the code and use something like =IMPORTDATA("https://cryptoprices.cc/BTC/") to fetch crypto prices.
No complex parsing, no authentication, no limitations.

How to read and write to local JSON files from React.js?

I have looked at multiple resources for this; however, none seem to answer my question. I have a local JSON file in my React app called items.json. In that file is a list of objects, which I want to be able to update. I have tried using fs, however this apparently doesn't work in React, as I received this error:
Unhandled Rejection (TypeError): fs.readFileSync is not a function
What I am trying to do is that when the code gets a new item, it looks through the JSON file to see if there is an existing object with a matching value in its name property. If there is, it increments that object's count property by 1; otherwise it creates a new object and appends it to the list in the JSON file. This is the code that I have written to do that. The logic seems sound (although it's not tested), but I can't figure out how to read/write the data.
let raw = fs.readFileSync("../database/items.json");
let itemList = JSON.parse(raw);
let found = false;
for (let item of itemList.averages) {
  if (item.name === this.state.data.item_name) {
    found = true;
    item.count += 1;
  }
}
if (!found) {
  let newItem = {
    name: this.state.data.item_name,
    count: 1,
  }
  itemList.averages.push(newItem);
}
let newRaw = JSON.stringify(itemList);
fs.writeFileSync("../database/items.json", newRaw);
The JSON file:
{
  "averages": [
    {
      "name": "Example",
      "count": 1
    }
  ]
}
First of all, the browser itself doesn't have access to the filesystem, so you won't be able to achieve that from your React app. However, this can be achieved if you use Node.js (or any other framework) on the backend and create an API endpoint that writes to the filesystem for you.
Secondly, if you want to do things only on the frontend side, without creating an extra API just for saving the data in a JSON file (which I think is not necessary in your case), you can use localStorage to save the data and let the user download a text file using this:
TextFile = () => {
  const element = document.createElement("a");
  // Pass the data from the localStorage API to the blob.
  const textFile = new Blob([JSON.stringify('pass data from localStorage')], { type: 'text/plain' });
  element.href = URL.createObjectURL(textFile);
  element.download = "userFile.txt";
  document.body.appendChild(element);
  element.click();
}
Now, to use the localStorage API, you can check here: https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage
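A minimal sketch of that localStorage route, reusing the itemList shape from the question (the 'items' key is an arbitrary choice):
// Restore the list, falling back to an empty one (localStorage stores strings only).
const itemList = JSON.parse(localStorage.getItem('items') || '{"averages": []}');
// ...update itemList exactly as in the question's logic...
// Persist it again.
localStorage.setItem('items', JSON.stringify(itemList));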
Reading and writing a JSON file on local disk is quite simple with Node.js, which means a tiny backend API in Express would get this job done.
Here are a few pieces of code that might help you. Assuming your JSON structure is as below:
{
  "name": "arif",
  "surname": "shariati"
}
Read JSON file;
// import * as fs from 'fs';
const fs = require('fs')
fs.readFile('./myFile.json', 'utf8', (err, jsonString) => {
  if (err) {
    return;
  }
  try {
    const customer = JSON.parse(jsonString);
  } catch (err) {
    console.log('Error parsing JSON string:', err);
  }
})
customer contains your JSON, and values can be accessed by customer.name;
Write to JSON File
Let's say you have an update on your JSON object such as below;
const updatedJSON = {
  "name": "arif updated",
  "surname": "shariati updated"
}
Now you can write to your file. If file does not exist, it will create one. If already exists, it will overwrite.
fs.writeFile('./myFile.json', JSON.stringify(updatedJSON), (err) => {
  if (err) console.log('Error writing file:', err);
})
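Putting that together with the question's logic, a minimal sketch of such an Express endpoint; the route name, port, and file path are assumptions for illustration:
const express = require('express');
const fs = require('fs');
const app = express();
app.use(express.json());

// POST /items with a JSON body like { "name": "Example" }.
app.post('/items', (req, res) => {
  const itemList = JSON.parse(fs.readFileSync('./database/items.json', 'utf8'));
  const existing = itemList.averages.find(item => item.name === req.body.name);
  if (existing) {
    existing.count += 1;
  } else {
    itemList.averages.push({ name: req.body.name, count: 1 });
  }
  fs.writeFileSync('./database/items.json', JSON.stringify(itemList, null, 2));
  res.json(itemList);
});

app.listen(3001);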
Importing and reading from JSON can be like:
import data from './data/data.json';
then use .map() to iterate the data.
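For that read-only approach, a minimal sketch using the items.json shape from the question (component name and file path are illustrative):
import data from './data/data.json';

// One list item per entry; assumes each entry has `name` and `count`
// like the question's items.json.
const ItemList = () => (
  <ul>
    {data.averages.map(item => (
      <li key={item.name}>{item.name}: {item.count}</li>
    ))}
  </ul>
);

export default ItemList;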
For writing locally, you can use a library like https://www.npmjs.com/package/write-json-file

How to parse large json external file stored locally (~30 MB) to populate a select dropdown list avoiding long load time?

I am trying to parse a JSON file that has around 209,579 objects (~30 MB file size) and populate those (the name and value attributes) into a dropdown menu using select-option tags. I could parse the whole file using the jQuery getJSON method and populate the dropdown the traditional way of targeting the DOM element, but when I select the dropdown menu it takes too long to display the content and freezes the browser for that period.
I have tried a smaller dataset with ~100 objects (which is significantly less), and the page renders the dropdown fast and doesn't lag. This is why I think my large JSON dataset is the problem.
<!--html code-->
<h3>Select your Location:</h3>
<select id="locality-dropdown" name="locality">
</select>
//referenced from https://www.codebyamir.com/blog/populate-a-select-dropdown-list-with-json
//(I have used the same technique)
let dropdown = document.getElementById('locality-dropdown');
dropdown.length = 0;
let defaultOption = document.createElement('option');
defaultOption.text = 'Choose State/Province';
dropdown.add(defaultOption);
dropdown.selectedIndex = 0;

const url = 'js/city_list1.json';
fetch(url)
  .then(function(response) {
    if (response.status !== 200) {
      console.warn('Looks like there was a problem. Status Code: ' + response.status);
      return;
    }
    // Examine the text in the response
    response.json().then(function(data) {
      let option;
      for (let i = 0; i < data.length; i++) {
        option = document.createElement('option');
        option.text = data[i].name;
        option.value = data[i].id;
        dropdown.add(option);
      }
    });
  })
  .catch(function(err) {
    console.error('Fetch Error -', err);
  });
I expect to render the dropdown like in the normal websites, but in my case, the browser stops responding and takes a while to load the content in the dropdown menu when I click on the dropdown.
As you've rightfully noted yourself, the bottleneck in your case is that you are trying to load the whole dataset at once. You should consider loading the data in pages (chunks) instead, and maybe have a windowed component that loads/renders only the data that is currently displayed.
As others have noted, JSON is not the best format for storing huge datasets; a database is much better, and even something as simple and small as SQLite will do.
But if you would still prefer to proceed with JSON, I'd recommend trying one of the libraries that can parse partial JSON blocks and somewhat mimic what you would get with a database and paginated loading of data.
Take for example stream-json (a Node.js module, but I'd imagine one can easily find something similar for every backend technology out there).
stream-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors oriented on processing huge JSON files while requiring a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual primitive data items (keys, strings, and numbers) can be streamed piece-wise.
const { chain } = require('stream-chain')
const { parser } = require('stream-json')
const { pick } = require('stream-json/filters/Pick')
const { streamValues } = require('stream-json/streamers/StreamValues')
const fs = require('fs')

const pipeline = chain([
  fs.createReadStream('sample.json'),
  parser(),
  pick({ filter: 'data' }),
  streamValues(),
  data => {
    const value = data.value
    return value && value.department === 'accounting' ? data : null
  }
])

let counter = 0
pipeline.on('data', () => ++counter)
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`)
)
"Normal websites" don't have select field with 209,579 options. The speed here will not depend on your code but mostly on the client machine performance and connection speed. You must think alternatives here, like autocomplete feature, or maybe infinite scroll, things like that.
You cannot expect loading 200000 records into the browser dom to be performant in this scenario and will be a horrible experience for users - surely that must be obvious. You will need to store the data in a db or possibly array in the browser and then search that data and only return rows that match from autocomplete and then add them to a table/grid so you limiting the results. A drop down for this amount of data sounds like a problem.
As another old tip, rather build up your option data in a string and add that to the dom once instead of a 100 times using creatElement.
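A minimal sketch of that tip, reusing data and dropdown from the question's code (escape name/id first if they can contain HTML):
// Build all options in one string and touch the DOM a single time.
const options = data.map(d => `<option value="${d.id}">${d.name}</option>`).join('');
dropdown.insertAdjacentHTML('beforeend', options);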

Dynamically create Prometheus gauge names

I am using prom-client in my Node app to send stats to our Prometheus instance. I am firing off a set of requests to determine whether an app is up or not. With that, I want to dynamically set the name of the gauge depending on which app is being pinged.
apps.map(app => {
  request(app.url, (error, response, body) => {
    let name = app.name
    const gauge = new client.Gauge({
      name: `${app.name}`,
      help: `${app.name}`,
      register,
    })
    if (error) {
      console.log(`${app.name} ERROR!`, error)
      gauge.set(0)
    }
    if (response && response.statusCode == 200) {
      console.log(`${app.name} is up!`, response.statusCode)
      gauge.set(1)
    }
    gateway.pushAdd({ jobName: 'app_up_down', register })
  })
})
Setting the gauge name to ${app.name} throws an Invalid Name error. The only way I could get it to stop throwing that error was to hardcode the name, which doesn't work for me. Does anyone know if there is a way to set this dynamically, or is this an issue with the package?
This looks like blackbox monitoring, for which the blackbox exporter is probably a better tool.
The particular error you're getting is likely because the name contains invalid characters; a label would be a better choice. In addition, this is not a good use of the Pushgateway.
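For illustration, a minimal sketch of the label approach; the metric and label names are arbitrary choices, and apps, request, gateway, and register are reused from the question. Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*, which is why arbitrary app names fail:
const client = require('prom-client');

// One gauge, created once, with the app name as a label
// instead of baked into the metric name.
const upGauge = new client.Gauge({
  name: 'app_up',
  help: '1 if the app responded with HTTP 200, 0 otherwise',
  labelNames: ['app'],
  registers: [register],
});

apps.map(app => {
  request(app.url, (error, response, body) => {
    const up = !error && response && response.statusCode === 200 ? 1 : 0;
    upGauge.set({ app: app.name }, up);
    gateway.pushAdd({ jobName: 'app_up_down', register });
  });
});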
