Web scraping - Nightmare.js and Request - JavaScript

I am using a combination of Nightmare, Cheerio, and Request in Node.js to build a custom web scraping bot. I did the authentication and filter setup with Nightmare.js, and now I need to call a function like:
request(URL, function (err, response, body) {
    if (err) console.error(err);
    var scraping = cheerio.load(body);
    // ...
The problem is that I don't know how to forward the "body" that Nightmare has already loaded. I can't just pass a URL, because the content (tables) is generated dynamically, which means the URL is always the same. I tried to use this instead of a URL, but it won't work.
Any suggestions?
Thank you

You don't need to use request. In fact, you shouldn't: Nightmare itself can pass the HTML data to Cheerio.
Once you have logged in and navigated to your desired webpage in Nightmare, use evaluate to get the HTML. You can do something like this:
nightmare
    .viewport(1280, 800)
    .goto(url)
    .wait('#emailselectorId')
    .type('#emailselectorId', 'theEmail\u000d')
    .type('#ap_password', 'thePassword\u000d')
    .click('#signInSubmit')
    // do something in the chain to go to your desired page.
    .evaluate(() => document.querySelector('body').outerHTML)
    .then(function (html) {
        const $ = cheerio.load(html);
        // do something
    })
    .catch(function (error) {
        console.error('Error:', error);
    });
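To make the "do something" step concrete for the dynamically generated tables from the question, here is a minimal sketch of what that .then callback body could look like; the '#results-table' selector is only a placeholder and needs to be swapped for the real table's selector:

const $ = cheerio.load(html);
// '#results-table' is an assumed selector; adjust it to the actual table.
$('#results-table tr').each(function () {
    // Collect the trimmed text of every cell in this row.
    const cells = $(this).find('td').map(function () {
        return $(this).text().trim();
    }).get();
    console.log(cells);
});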

Related

How to extract a 'div' element from a different web page?

I have an Allure report webpage that provides a dashboard with a percentage on it. I have to create a completely separate webpage that should show that percentage.
How can I pull that div from that page?
Fetch the page, and then parse it until you have the contents of that div.
For the fetch, something simple like this works (polyfill fetch, or use axios instead, if you're on Node.js):
fetch('https://jsonplaceholder.typicode.com').then(function (response) {
    // The API call was successful!
    return response.text();
}).then(function (data) {
    // This is the HTML text from our response
    console.log(data);
}).catch(function (err) {
    // There was an error
    console.warn('Something went wrong.', err);
});
And then for parsing, answers like this will be helpful:
Parse an HTML string with JS
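To tie the two steps together, here is a minimal browser-side sketch, assuming the report page is static HTML and the div can be matched with a CSS selector; the URL, the '.dashboard-percentage' selector, and the 'target' element id are placeholders, not values from the question:

fetch('https://your-allure-report.example.com').then(function (response) {
    return response.text();
}).then(function (html) {
    // Parse the fetched HTML into a detached document.
    var doc = new DOMParser().parseFromString(html, 'text/html');
    // '.dashboard-percentage' is an assumed selector for the percentage div.
    var div = doc.querySelector('.dashboard-percentage');
    if (div) {
        // Show the extracted value on your own page.
        document.getElementById('target').textContent = div.textContent;
    }
}).catch(function (err) {
    console.warn('Something went wrong.', err);
});

Note that if the report builds the dashboard with client-side JavaScript, the fetched HTML may not contain that div at all, and you would need a headless browser instead.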

How do I search for data in the database with Node.js, Postgres, and dust.js

I'm making a webpage with Node.js, dust.js, and PostgreSQL. How do I make a search query from the HTML so I can pass the value to app.get?
Do I need to use jQuery?
app.get('/teachers', function (req, res) {
    pool.connect(function (err, client, done) {
        if (err) {
            return console.error("error", err);
        }
        client.query('SELECT * FROM teachers', function (err, result) {
            if (err) {
                return console.error('error running query', err);
            }
            res.render('teacherindex', {teachers: result.rows});
            done();
        });
    });
});

app.get('/teachers/:str', (req, res) => {
    pool.connect((err, client, done) => {
        if (err) throw err;
        client.query('SELECT * FROM teachers WHERE name = $1', [req.query.namesearch], (err, result) => {
            done();
            if (err) {
                console.log(err.stack);
            } else {
                res.render('teacherindex', {teachers: result.rows});
            }
        });
    });
});
This is my jQuery:
$("#myBtn").click(function(){
var str = $("#myInput").val();
var url = '/teachers/'+str;
if(confirm('Search Record?')){
$.ajax({
url: url,
type: 'put',
success: function(result){
console.log('Searching');
window.location.href='/teachers';
},
error: function(err){
console.log(err);
}
});
}
});
My HTML
<input type="text" id="myInput" data-id="namesearch">
<button type="button" id="myBtn">Show Value</button>
Thank you!
FINAL ANSWER:
Ok, so it turns out the issue you were having was something completely different: you are trying to use server-side rendering, and I was showing you how to render the retrieved data on the client side.
I have forked and updated your repo, which can be found at the link below.
Please review my changes and let me know if you have any questions.
Working repo: https://github.com/oze4/hanstanawi.github.io
Demo video: https://raw.githubusercontent.com/oze4/hanstanawi.github.io/master/fake_uni_demo.mp4
EDIT:
I went ahead and built a repository to try to help you grasp these concepts. You can find the repo here. I tried to keep things as simple and understandable as possible, but let me know if you have any questions.
I had to make some minor changes to the paths, which I have explained in comments in the repo's code.
I am using a "mock" database (just a JSON object in a separate file), but the logic remains the same.
index.js is the main entry point and contains all the route definitions.
index.html is the main HTML file that gets sent to the user and contains the jQuery code.
If you download/fork/test the code in that repo, open up your browser's developer tools, go to the Network tab, and compare the two approaches:
Using req.params
Using req.query
ORIGINAL ANSWER:
So there are a couple of things wrong with your code, which is why you are unable to see the value of the textbox server side:
You are sending a PUT request, but your server is expecting a GET request.
You are looking for the value in req.query when you should be looking for it in req.params.
You are looking for the incorrect variable name in your route (on top of using query when you should be using params): req.query.namesearch needs to be req.params.str.
See here for more on req.query vs req.params.
More detailed examples below.
In your route you are specifying app.get - in other words, you are expecting a GET request to be sent to your server - but you are sending a PUT request.
If you were sending your AJAX to your server using something like /teachers?str=someName, then you would use req.query.str; or, if you wanted to use namesearch, you would use /teachers?namesearch=someName and then read the value with req.query.namesearch.
If you send your AJAX to your server using something like /teachers/someName, then you should be using req.params.str.
//                        ||
//                        \/ Server is expecting a GET request
app.get('/teachers/:str', (req, res) => {
    // GET THE CORRECT VALUE
    let namesearch = req.params.str;
    pool.connect((err, client, done) => {
        // ... other code here
        client.query(
            'SELECT * FROM teachers WHERE name = $1',
            // SPECIFY THE CORRECT VALUE (pass it as the parameter array)
            [namesearch],
            (err, result) => {
                // ... other code here
            }
        );
    });
});
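For completeness, the query-string variant mentioned above might look roughly like this. It is only a sketch that reuses the pool from the question, and the fallback to all rows when no str is supplied is my own addition, not part of the original code:

// Query-string variant: the client requests /teachers?str=someName and the
// server reads the value from req.query.str.
app.get('/teachers', (req, res) => {
    const namesearch = req.query.str; // undefined if no ?str= was sent
    pool.connect((err, client, done) => {
        if (err) return console.error('error', err);
        const sql = namesearch
            ? 'SELECT * FROM teachers WHERE name = $1'
            : 'SELECT * FROM teachers';
        const params = namesearch ? [namesearch] : [];
        client.query(sql, params, (err, result) => {
            done();
            if (err) return console.error('error running query', err);
            res.render('teacherindex', { teachers: result.rows });
        });
    });
});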
But in your AJAX request, you are specifying PUT (it should be GET).
By default, $.ajax sends a GET request, so you don't actually have to specify a type here at all; spelling out GET in type is simply a matter of preference, and I like to include it for clarity.
$("#myBtn").click(function () {
// ... other code here
let textboxValue = $("#myTextbox").val();
let theURL = "/teachers/" + textboxValue;
// OR if you wanted to use `req.query.str` server side
// let theURL = "/teachers?str=" + textboxValue;
if (confirm('Search Record?')) {
$.ajax({
url: theURL,
// ||
// \/ You are sending a PUT request, not a GET request
type: 'put', // EITHER CHANGE THIS TO GET OR JUST REMOVE type
// ... other code here
});
}
});
It appears you are grabbing the value correctly from the textbox; you just need to make sure your server accepts the same request type that you are sending.

How to print contents crawled by JavaScript on HTML

I crawled some content with JavaScript and want to print it in HTML.
The JavaScript code below is named 'js.js' (it worked well from the command line):
var request = require('request');
var cheerio = require('cheerio');

request('...URL...', function (err, res, body) {
    if (err) console.log('Err: ' + err);
    var $ = cheerio.load(body);
    $('.class').each(function () {
        var content = $(this).find('.abc').text().trim();
        document.write(content);
    });
});
But "error:require is not defined" was printed, so I looking for solutions.
I found this page and follow advice which said that use webpack or browseify.
new code(2MB after bundled) give me 2 new error:"fail to fetch" and "access-control-allow-origin". What should I do?
The require() function does not exist in browser/client-side JavaScript; that is why you need to use webpack to bundle Node.js code into browser-compatible JavaScript.
As for "access-control-allow-origin": the URL you are trying to connect to does not allow its responses to be read from unknown origins.
If you own the API/URL, you could add the response header Access-Control-Allow-Origin: *.
For reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin
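If the URL is a server you control, here is a minimal sketch of adding that header, assuming an Express app; the route name and port are placeholders:

const express = require('express');
const app = express();

// Let any origin read responses from this server. In production you would
// restrict this to your front end's specific origin instead of '*'.
app.use(function (req, res, next) {
    res.setHeader('Access-Control-Allow-Origin', '*');
    next();
});

app.get('/scraped-data', function (req, res) {
    res.json({ ok: true }); // placeholder payload
});

app.listen(3000);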

Node Request: Get certain class/id

I am using the Node module Request. I have also used node-fetch in the past, but Request seems better. I want to be able to log something with a certain ID/class/span, you name it. This is my current code, which logs the whole body:
const request = require('request');

request("https://discord.js.org/#/docs/main/stable/class/GuildMember", (err, response, body) => {
    if (err) console.log(err);
    console.log(body);
});
I want to log the contents of any property the user gives. Let's say they give bannable; then I want to log the contents of div#doc-for-bannable.class-prop.class-item, but I'm not sure how to do this.
Edit:
I tried using Cheerio as suggested, but it is not giving me any sort of response, just a blank space in my console. The code I tried was:
request("https://discord.js.org/#/docs/main/stable/class/GuildMember", (err, response, body) => {
if (err) console.log(err);
const $ = cheerio.load(body);
const g = $('docs-page'); // this doesn't work either, after trying '#doc-for-bannable`
const gText = g.text();
console.log(gText)
});
The page you are requesting builds its DOM after fetching data from the backend with XHR/AJAX requests.
So when you use request, you get the page before any XHR has been fired.
However, if you look closely at the Network tab, you will come across
https://raw.githubusercontent.com/hydrabolt/discord.js/docs/stable.json
This has the data that you are looking for.
You can use the Node.js request module on this URL and consume the JSON.
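A hedged sketch of that approach follows; the exact shape of stable.json (docs.classes, each with a props array) is an assumption based on inspecting the file, so log Object.keys(docs) first to confirm it before relying on these paths:

const request = require('request');

const docsUrl = 'https://raw.githubusercontent.com/hydrabolt/discord.js/docs/stable.json';

request(docsUrl, (err, response, body) => {
    if (err) return console.error(err);
    const docs = JSON.parse(body);
    // Assumed structure: docs.classes is an array of { name, props, ... }.
    const guildMember = (docs.classes || []).find(c => c.name === 'GuildMember');
    const bannable = guildMember && (guildMember.props || []).find(p => p.name === 'bannable');
    console.log(bannable);
});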

Serving New Data With Node.js

There may already be an answer to this question, but I was unable to find it.
Let's say I have a Node.js webpage doing somewhat time-consuming API calls and computations:
var request = require('request'),
    Promise = require('es6-promise').Promise,
    is_open = require('./is_open');

// Fetch the name of every eatery
var api_url = 'url of some api';

request(api_url, function (error, response, body) {
    if (error) {
        console.log(error);
    } else if (!error && response.statusCode == 200) {
        // Good to go!
        var results = JSON.parse(body).events;
        results.forEach(function (result) {
            // This line makes its own set of API calls
            is_open(result)
                .then(function (output) {
                    console.log(output);
                    if (output == false) {
                        console.log('CLOSED\n');
                    } else {
                        console.log(output);
                        console.log();
                    }
                })
                .catch(console.error);
        });
    } else {
        console.log('Returned an unknown error.');
        console.log(error);
        console.log(response);
        console.log(body);
    }
});
(I haven't yet created an actual web server; I'm just running the app locally through the command line.)
I want the web server to serve a loading page first to every user. Then, once the API calls are complete and the data is ready, it should send that data in a new webpage to the same user.
The reason I think there's an issue is that, in order to serve a webpage, you must end with:
res.end();
thereby ending the connection to that specific user.
Thanks for the help!
You must conceptually separate static content from dynamic content (later you will serve the static content with nginx or Apache, leaving only the dynamic content to Node).
The best solution to your "problem" is to make the first webpage request the data via AJAX once it has loaded. Ideally, your Node app will return JSON in response to an AJAX query from the first page, and JavaScript on the page will format the result by creating DOM nodes.
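A minimal sketch of that pattern, assuming Express; fetchAndCheckEateries is a hypothetical helper standing in for the request + is_open logic from the question:

const express = require('express');
const app = express();

// 1. The loading page (static HTML with a spinner) is served immediately.
app.use(express.static('public')); // e.g. public/index.html shows "Loading..."

// 2. The slow API calls and computations sit behind a JSON endpoint.
app.get('/api/eateries', async (req, res) => {
    try {
        const results = await fetchAndCheckEateries(); // hypothetical helper
        res.json(results);
    } catch (err) {
        res.status(500).json({ error: 'Lookup failed' });
    }
});

app.listen(3000);

// In public/index.html, the page requests the data once it has loaded:
//   fetch('/api/eateries')
//     .then(function (r) { return r.json(); })
//     .then(function (data) { /* build DOM nodes from data */ });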

Categories

Resources