Cheerio not working. What am I doing wrong? - javascript

I am trying to scrape a classified ad search result page.
I have tried console logging everything I can to make sure I am getting a response, which I am, but when I actually use Cheerio to query something I don't get anything back. For instance, if I query for the number of children using $('.listing-group').children('section').length, I get back 0 instead of the 24 I expect when I console log it.
Here is what I'm doing. Pardon the long URL.
const express = require("express"); // this require was missing; app = express() below needs it
const request = require("request");
const cheerio = require("cheerio");

const app = express();

app.get("/scrape", function(req, res) {
  const url =
    "http://classifieds.ksl.com/search/?keyword=code&category%5B%5D=Books+and+Media&zip=&miles=25&priceFrom=&priceTo=&city=&state=&sort=";
  request(url, function(error, response, html) {
    if (!error) {
      let $ = cheerio.load(html);
      let test = $("#search-results").find("section").length;
      console.log(test);
    } else {
      console.log("there has been an error");
    }
    res.send("Check the console.");
  });
});

app.listen(8081);
console.log("check localhost:8081/scrape");

exports = module.exports = app;
I'm new to cheerio so I'm assuming I'm probably making a simple error, but with all the tutorials I've checked, I can't seem to find an answer.
Ultimately I want to grab each result on the page (found in the 'section' tags) to display the data for a personal project I'm working on.

The search results are not in the server-rendered HTML, so Cheerio has nothing to select; the page embeds the listings as a JSON array inside an inline script (listings: [...]). You can pull them straight out of the raw HTML:
JSON.parse(html.match(/listings: (\[.*\])/)[1])

Related

How would I find the farthest matching file

The title may not make much sense on its own, but it should once I explain.
I'm trying to make a custom-code HTTP server. The code works, but I want to add a 404 page.
When you get the 404 page, I want to show more than plain text.
This is what I have without that addition:
http = require("http");
fs = require("fs");

server = {};
server.http = http.createServer((request, response) => {
  request.path = request.url.split("?")[0];
  if (request.url.split("?").length > 1) {
    request.query = request.url.split("?")[1].split("&");
    var result = {};
    for (var i = 0; i < request.query.length; i++) {
      var cur = request.query[i].split("=");
      result[decodeURIComponent(cur[0])] = decodeURIComponent(cur[1]);
    }
    request.query = result;
  }
  console.clear();
  console.log(request.headers, request.path, request.query);
  fs.readFile(`Public/HTTP/Scripts${request.path}.js`, "utf-8", (directError, script) => {
    if (directError) {
      // Check the error code instead of matching the full message string,
      // which is fragile (the original compared against a misspelled path).
      if (directError.code === "ENOENT") {
        // this is where I would like that code. I was thinking a for loop
        // would work, but then I got really confused, so here I am.
      }
    } else {
      fs.readFile(`Public/HTTP/Send${request.path}.file`, (A, sendFile) => {
        eval(script);
      });
    }
  });
});
server.http.listen();
console.clear();
The question, without the extras, is:
How do I walk backward through the folder hierarchy until I find a folder containing the file I need?
Broken down further with an example:
/a/path/to/a/file_that/doesn't_exist is request.url.
The folders a, to, and file_that each contain the file with the 404 response code.
I want it to use file_that's 404 script, because it is the deepest match.
I'm sorry if this is still unclear; I'm new here and don't know how else to explain it.
Express is the industry-standard Node.js framework for defining routes. Any request that falls through all of your routes can be caught by a final middleware, which is where a custom 404 page belongs.
If you use Express, you can write
const path = require("path");

app.use(function (req, res) {
  // res.send would transmit the path as literal text;
  // res.sendFile (with an absolute path) sends the file itself.
  res.status(404).sendFile(path.join(__dirname, "public/error-404.html"));
});
You can follow the official guide to implement Express in your app.

Node.js web scraping with Cheerio: problem fetching the HTML on some websites

I'm trying to learn web scraping, and the code I wrote works for some URLs, but some just don't fetch any HTML. When I run the script, no logs or errors are shown in the terminal; it just drops back to the next command line.
Here is my code which works for some websites, but for this for example not:
const request = require("request");
const cheerio = require("cheerio");

request("https://reverb.com/", function (error, response, html) {
  if (!error && response.statusCode == 200) {
    console.log(html);
  }
});
The terminal looks like this, so basically I don't fetch or scrape any HTML... any idea why?
PS C:\Users\XY\Documents\Javascript\grabbercheerio> node app.js
PS C:\Users\XY\Documents\Javascript\grabbercheerio>
If error has a value, something went wrong: log it and return from the callback. If error is null, the request itself succeeded and you can use the data. Your version silently discards both failures and non-200 responses, which is why you see no output at all.
This version surfaces the failure instead of skipping it:
const request = require('request');

request('https://reverb.com/', function(error, _response, data) {
  if (error) {
    return console.log('Something went wrong!', error);
  }
  // Everything went fine
  console.log(data);
});

Mocha - dynamic iteration in unit test

All I want is a dynamic unit test running on Linux/Debian.
I want to get an array of links from a website using "superagent" or whatever module can send a GET request, and then do some tests on each of these links (again sending GET request to each link) using "describe", "it".
I searched Stack Overflow for loops in Mocha and Chai, but the answers assume you already have an array and want to test each of its values, so no GET request is needed before the tests run.
My problem is that I don't even have that array yet. It should contain almost 100 links, and first I need to fetch them by sending a GET request to a website (example.com); then I need to test the content of each of those links.
test.js
var expect = require('chai').expect;
var request = require('request');
var superagent = require('superagent');
var cheerio = require('cheerio');

var uri = 'http://example.com';

function caller(schemaID) {
  // Declare url locally; the original assigned to an undeclared global,
  // which would be overwritten before the async test callbacks ran.
  var url = uri + schemaID;
  describe("schema " + schemaID, function schema() {
    it("available", function schemaLoader(done) {
      superagent.get(url)
        .end(function(err, res) {
          var $ = cheerio.load(res.text); // res.text is superagent's response body
          expect(true).to.be.true;
          done();
        });
    });
  });
}

superagent.get(uri).end(function(err, res) {
  var $ = cheerio.load(res.text);
  var array = [];
  $('#tab-content1').find('a').each(function() {
    array.push($(this).attr("href"));
  });
  array.forEach(val => {
    caller(val);
  });
});
When I run this in terminal:
mocha test.js
it does nothing!
How would Mocha test all 100 links after getting list of links from superagent?
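Mocha's documented mechanism for exactly this situation is the delayed root suite: run mocha with the --delay flag, register the suites once the async fetch completes, then call the global run() that --delay provides. A sketch under the question's own assumptions (example.com and the #tab-content1 selector are placeholders from the question); this file is meant to be executed by mocha, not node:

```javascript
// test.js — run with: mocha --delay test.js
var expect = require('chai').expect;
var superagent = require('superagent');
var cheerio = require('cheerio');

var uri = 'http://example.com';

superagent.get(uri).end(function (err, res) {
  var $ = cheerio.load(res.text);
  var links = [];
  $('#tab-content1').find('a').each(function () {
    links.push($(this).attr('href'));
  });

  // Register one suite per link *before* telling Mocha to start.
  links.forEach(function (link) {
    describe('schema ' + link, function () {
      it('available', function (done) {
        superagent.get(uri + link).end(function (err2, res2) {
          expect(err2).to.be.null;
          done();
        });
      });
    });
  });

  run(); // global provided by Mocha's --delay flag; starts the test run
});
```

Without --delay, Mocha collects suites synchronously and exits before the GET request returns, which is why the original file "does nothing".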

Node Request: Get certain class/id

I am using the Node module Request. I have also used node-fetch in the past but request seems better. I want to be able to log something with a certain ID/Class/Span, you name it. This is my current code which logs the whole body.
const request = require('request');

request("https://discord.js.org/#/docs/main/stable/class/GuildMember", (err, response, body) => {
  if (err) console.log(err);
  console.log(body);
});
I want to log the contents of any property the user gives. Let's say they give bannable; then I want to log the contents of div#doc-for-bannable.class-prop.class-item, but I'm not sure how to do this.
Edit:
I tried using Cheerio as suggested, but it is not giving me any sort of response, just a blank space in my console. The code I tried was:
const cheerio = require('cheerio');

request("https://discord.js.org/#/docs/main/stable/class/GuildMember", (err, response, body) => {
  if (err) console.log(err);
  const $ = cheerio.load(body);
  const g = $('docs-page'); // this doesn't work either, after trying '#doc-for-bannable'
  const gText = g.text();
  console.log(gText);
});
The page you are requesting builds its DOM in the browser, fetching its data from a backend with XHR/AJAX requests after the initial HTML loads.
So when you use request, you get that initial HTML from before any XHR has fired, and the elements you are querying don't exist in it yet.
However, if you look closely at the Network tab, you will come across
https://raw.githubusercontent.com/hydrabolt/discord.js/docs/stable.json
This has the data that you are looking for.
You can point request at this URL and consume the JSON directly.

what is wrong with this Instagram API call?

What is wrong with this code? I want to pull my hair out! I'm getting JSON from the Instagram API. Console logging body gives me the JSON, but when I try something like body.data or body.pagination, I get undefined! Help please and thank you.
var express = require("express"),
    app = express(),
    https = require("https"),
    fs = require("fs"),
    request = require("request");

request("https://api.instagram.com/v1/tags/nofilter/media/recent?access_token=xxxxx&scope=public_content&count=1", function(error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body); // returns all the relevant JSON
    console.log(body.data); // **returns undefined!!!!!**
  }
}).pipe(fs.createWriteStream("./test.txt"));
body is literally what it says on the tin - the body of the HTTP response. In other words, it's a string - you need to run it through JSON.parse to actually access it as an object.
console.log(JSON.parse(body).data);
Obviously, if you were going to use this for real, you'd assign the parsed object to a variable rather than running it every time you access it.
