In my node.js webpage I'm making a page preview similar to the Facebook link preview. I'm making a call to get the html of the page, and use it to create the preview.
$.ajax({
type: 'GET',
data: { "html": url },
url: "/htmlTest",
success: function (data) {
imgArray = [];
$('img', data).each(function () {
imgArray.push(this.src);
});
...
This is the server-side code that handles the request.
app.get('/htmlTest', function (req, res) {
res.writeHead(200, { 'content-type': 'text/html' });
request(req.query.html, function (error, response, body) {
if (error) {
res.write(error.toString());
res.end('\n');
}
else if (response.statusCode == 200) {
res.write(body);
res.end('\n');
}
})
});
What I've noticed is that this inserts any CSS the other page uses into my page, which can really screw everything up. Why is this happening?
Also, while I'm at it, does anyone have any better ideas for a facebook-style page preview?
No. writeHead writes HTTP headers to the underlying TCP stream. It has absolutely nothing to do with HTML.
You're running into an issue because your server returns the wholesale HTML content of the requested URL. You then pass this string into jQuery, which is apparently adding contained CSS styles into your document.
Generally, it is a terrible idea to take random code from a user-supplied URL and run in the context of your page. It opens you to gaping security holes – the CSS artifacts you're seeing are one example.
To be blunt, your code has numerous problems, so bear with me as I point out some issues.
app.get('/htmlTest', function (req, res) {
res.writeHead(200, { 'content-type': 'text/html' });
Here, you respond to the browser with a success status (200) before your server actually does anything. This is incorrect: you should only respond with either a success or error code after you know if the request succeeded or failed.
request(req.query.html, function (error, response, body) {
if (error) {
res.write(error.toString());
res.end('\n');
}
Here would be a good place to respond with an error code, since we know that the request did actually fail. res.send(500, error) would do the trick (on Express 4 and later, write this as res.status(500).send(error.toString()) instead).
else if (response.statusCode == 200) {
res.write(body);
res.end('\n');
}
And here's where we could respond with a success code. Rather than use writeHead, use Express's set and send methods – things like Content-Length will be correctly set:
res.set('Content-Type', 'text/html');
res.send(body);
Now what happens if response.statusCode != 200? You don't handle that case. error is only set in the case of network errors (such as inability to connect to the target server). The target server can still respond with a non-200 status, and your node server would never respond to the browser. In fact, the connection would hang open until the user kills it. This could be fixed with a simple else res.end().
Even with these issues resolved, we still haven't addressed the fact that it's not a good idea to try to parse arbitrary HTML in the browser.
If I were you, I'd use something that parses HTML into a DOM on the server, and then I'd return only the necessary information back to the browser as JSON. cheerio is the module you probably want to use – it looks just like jQuery, only it runs on the server.
I'd do this:
var cheerio = require('cheerio'), url = require('url'), request = require('request');
app.get('/htmlTest', function(req, res) {
  request(req.query.url, function(err, response, body) {
    if (err) res.send(500, err); // network error, send a 500
    else if (response.statusCode != 200) res.send(500, { httpStatus: response.statusCode }); // server returned a non-200, send a 500
    else {
      // WARNING! We should probably check that the response content-type is html
      var $ = cheerio.load(body); // load the returned HTML into cheerio
      var images = [];
      $('img').each(function() {
        // cheerio elements are not browser DOM nodes, so read the attribute explicitly.
        var src = $(this).attr('src');
        if (!src) return; // skip <img> tags without a src
        // Image srcs can be relative.
        // You probably need the absolute URL of the image, so we should resolve the src.
        images.push(url.resolve(req.query.url, src));
      });
      res.send({ title: $('title').text(), images: images }); // send back JSON with the image URLs
    }
  });
});
Then from the browser:
$.ajax({
url: '/htmlTest',
data: { url: url },
dataType: 'json',
success: function(data) {
// data.images has your image URLs
},
error: function() {
// something went wrong
}
});
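On the "resolve the src" step in the server code above: a minimal sketch of the same idea using the built-in URL class instead of url.resolve. The helper name resolveImageUrl is hypothetical, and returning null for unparseable input is an assumption about what the preview should do with bad data.

```javascript
// Resolve an <img> src against the URL of the page it came from.
// Returns an absolute URL string, or null if either value cannot be parsed.
function resolveImageUrl(src, pageUrl) {
  try {
    // The two-argument URL constructor treats pageUrl as the base.
    return new URL(src, pageUrl).href;
  } catch (e) {
    return null; // assumption: silently drop images we cannot resolve
  }
}
```

Inside the cheerio loop this would stand in for the url.resolve(...) call, with null results skipped before pushing onto the images array.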
Related
I'm new to Node.js, and I have written two methods, post and get. First, I want the post to look like the get request (an asynchronous function) so I can export it from a different JS file, but I'm not sure whether it is asynchronous right now. Second, I want the get request to be able to send a cookie along with the URL.
here is the scenario:
1- send a post request and set the cookie. The server will direct me to another location.
2- get the location and send a get request and include the cookie.
here is my code for POST:
request.post({
headers: { 'content-type': 'application/x-www-form-urlencoded' },
url: 'http://example/login',
body: 'mes=heydude',
followRedirect: true,
form: {
username: 'usr',
password: 'pass'
}
}, function (error, response, body) {
if (error) {
console.log(error)
}
var setCookie = response.headers['set-cookie']
console.log('login success')
console.log(body)
})
This method works fine, but how do I make it an asynchronous function like the GET request below? Also, how do I make the GET request return both the cookie and the HTML?
function fetch (url, callback) {
return new Promise(function (resolve, reject) {
request(url, function (error, response, html) {
if (error) {
return reject(error)
}
if (response.statusCode !== 200) {
return reject(new Error('error code'))
}
return resolve(html)
})
})
}
If there is any module on NPM that can do that, let me know. Thanks.
Help a newbie and get a trophy!
If you are using Express, you should look at cookie-parser. This is a library that handles cookies (as the name suggests) so you don't have to reinvent the wheel.
If you don't know what Express is, I suggest you look at it.
Express is a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications.
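For the promise-wrapping part of the question, a minimal sketch of one way to give the POST the same shape as the fetch function. The names postAsync and toCookieHeader are hypothetical, and requestLib is assumed to be the request module (or anything exposing post(options, callback) with the same callback arguments); it is passed in rather than required so the sketch stays self-contained.

```javascript
// Turn the Set-Cookie response header lines into a single Cookie request header.
// Only the name=value part of each line is kept; attributes such as Path are dropped.
function toCookieHeader(setCookie) {
  return (setCookie || []).map(function (line) {
    return line.split(';')[0].trim();
  }).join('; ');
}

// Promise wrapper around requestLib.post (assumed to match the request module).
// Resolves with the pieces the second step needs: cookies, redirect target, body.
function postAsync(requestLib, options) {
  return new Promise(function (resolve, reject) {
    requestLib.post(options, function (error, response, body) {
      if (error) {
        return reject(error);
      }
      resolve({
        statusCode: response.statusCode,
        cookieHeader: toCookieHeader(response.headers['set-cookie']),
        location: response.headers.location,
        body: body
      });
    });
  });
}
```

The follow-up GET could then pass cookieHeader as a Cookie header in its own options. The request module also documents a jar option that remembers cookies across calls, which may remove the need for the manual header.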
I am using Firebase to host Cloud Functions. In almost every function I need to make an HTTP request and read data from the JSON body. However, the callback doesn't work well for me; I've searched existing answers but I'm still stuck. Here is the code snippet. options is declared earlier, and if I do not put the request inside get_request_handler it works fine:
function get_request_handler(assistant, input_url, callback) {
req(options, function (error, response, body) {
if (!error && response.statusCode == 200) {
var cookie = req.cookie('BPMSTS=' + body );
var headers = {
'Content-Type': 'application/json',
'Cookie': cookie
};
var option = {
url: input_url,
method: 'GET',
headers: headers
}
req(option, function(error, res, body) {
assistant.ask(input_url);
if (!error && res.statusCode == 200) {
callback(JSON.parse(body));
} else {
assistant.ask('inner request with error code: ' + (res.statusCode).toString());
}
});
} else {
assistant.ask('outer request with error code: ' + (response.statusCode).toString());
}
});
}
I call the function as follows:
get_request_handler(assistant, workflow_url, function(cur_json){assistant.ask(cur_json);});
The problem right now is that the first request is never made inside get_request_handler. In other words, execution enters get_request_handler but never reaches the request callback. If I do not create get_request_handler and leave req(options, function (error, response, body) { ... }) at the top level, it works without any problem. Any ideas on this?
Note: I just checked firebase log and it says for this line: req(options, function (error, response, body) it got TypeError: Assignment to constant variable.
at get_request_handler (/user_code/index.js:116:13)
You have a lot of problems here, but the basic one is that you're trying to call assistant.ask() more than once in your response to the user. Once you call assistant.ask(), no further responses can be sent, and none of the other calls to ask() will be handled.
It looks like you're using it for debugging. That seems a really poor choice when you should be using console.log().
You also indicated that you're using Firebase Functions. Note that calls out from a Firebase function are restricted if you're on the free plan. If you're on one of the paid plans there is no restriction, and there is a free tier which should be more than sufficient for testing.
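A rough sketch of the restructuring suggested above: the helper reports through one callback(err, json) and logs with console.log, so the caller can make a single assistant.ask() call at the end. The name getJson is hypothetical, req is passed in and assumed to behave like the request module, and sending the cookie as a plain 'BPMSTS=...' header string is an assumption rather than the question's req.cookie(...) call.

```javascript
// Chain the two requests and invoke callback exactly once, Node-style.
function getJson(req, loginOptions, inputUrl, callback) {
  req(loginOptions, function (error, response, body) {
    if (error || response.statusCode !== 200) {
      console.log('outer request failed:', error || response.statusCode);
      return callback(error || new Error('outer status ' + response.statusCode));
    }
    var option = {
      url: inputUrl,
      method: 'GET',
      // Assumption: the first response body is the cookie value.
      headers: { 'Content-Type': 'application/json', 'Cookie': 'BPMSTS=' + body }
    };
    req(option, function (innerError, res, innerBody) {
      if (innerError || res.statusCode !== 200) {
        console.log('inner request failed:', innerError || res.statusCode);
        return callback(innerError || new Error('inner status ' + res.statusCode));
      }
      var parsed;
      try {
        parsed = JSON.parse(innerBody);
      } catch (parseError) {
        return callback(parseError); // body was not valid JSON
      }
      callback(null, parsed);
    });
  });
}
```

The caller would then decide what to say based on err and the parsed JSON, and call assistant.ask() once with that result.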
var server = http.createServer(function (req, res) {
if (req.method.toLowerCase() == 'get') {
displayForm(res);
} else if (req.method.toLowerCase() == 'post') {
//processAllFieldsOfTheForm(req, res);
processFormFieldsIndividual(req, res);
}
});
function displayForm(res) {
fs.readFile('form.html', function (err, data) {
res.writeHead(200, {
'Content-Type': 'text/html',
'Content-Length': data.length
});
res.write(data);
res.end();
});
}
I am following this example, where form.html contains only HTML blocks. My question is: can't jQuery/JavaScript code be included with form.html in the server response? I have tried both external and inline JS code, but to no avail.
When the browser asks for the JavaScript file, it makes a GET request to your server.
Your server checks the request and, if it is a GET request, returns the content of form.html.
You need to check the request object to see what path is being requested and serve the content the browser is asking for instead of always serving the content of form.html.
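The path check described above could look something like this. It is only a sketch: the route table, the file name form.js, and the helper names lookupRoute and handleGet are assumptions for illustration, not part of the original example.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical route table: which file and content type to serve for each path.
var routes = {
  '/': { file: 'form.html', type: 'text/html' },
  '/form.js': { file: 'form.js', type: 'application/javascript' }
};

// Find the route for a request URL, ignoring any query string.
function lookupRoute(requestUrl) {
  var pathname = requestUrl.split('?')[0];
  return Object.prototype.hasOwnProperty.call(routes, pathname) ? routes[pathname] : null;
}

// Serve whatever the browser asked for instead of always sending form.html.
function handleGet(req, res) {
  var route = lookupRoute(req.url);
  if (!route) {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    return res.end('Not found');
  }
  fs.readFile(path.join(__dirname, route.file), function (err, data) {
    if (err) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      return res.end('Could not read ' + route.file);
    }
    res.writeHead(200, { 'Content-Type': route.type, 'Content-Length': data.length });
    res.end(data);
  });
}
```

In the question's server, handleGet(req, res) would take the place of displayForm(res) in the GET branch, so a script tag pointing at /form.js receives the script rather than another copy of the form.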
I am using the node module 'request' to send up some JSON to my REST API.
I have the following call
request({
uri: urlToUse, // Using the url constructed above.
method: "POST", // POSTing to the URI
body: JSON.stringify(jsonItems),
headers: // Allows us to authenticate.
{
'Content-Type' : 'application/json',
'Authorization' : auth
}},
//What to do after the request...
function(err, response, body)
{
// Was there an error?
if (err)
{
// If so, log it.
console.log(err);
}
// Did the server respond?
if (response)
{
// Log it.
console.log(response.statusCode);
// Did the response have a body?
if(body)
{
// Log it.
console.log(body);
}
}
});
I want to add to this - I would like to be able to act on a 429 status code - in order to make it retry the request until complete.
I know how to detect the 429 (using an if statement to check response.statusCode, etc.), but I don't know how to make it retry, or whether that is even the best way to do it.
It seems to me that what you want is to wrap all this code in a function and have it call itself if the response code is 429. You could also pass an attempt number as an optional last parameter, so you can keep count of how many times you've called it and act accordingly.
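A minimal sketch of that self-calling wrapper. The name requestWithRetry, the default of 5 attempts, the one-second base delay that grows with each attempt, and the injectable schedule function are all assumptions; doRequest stands in for the request module's request(options, callback).

```javascript
// Call doRequest and retry while the server answers 429, up to maxAttempts.
function requestWithRetry(doRequest, options, callback, attempt, settings) {
  attempt = attempt || 1;
  settings = settings || {};
  var maxAttempts = settings.maxAttempts || 5;
  var delayMs = settings.delayMs === undefined ? 1000 : settings.delayMs;
  var schedule = settings.schedule || setTimeout; // replaceable, e.g. in tests

  doRequest(options, function (err, response, body) {
    var rateLimited = !err && response && response.statusCode === 429;
    if (rateLimited && attempt < maxAttempts) {
      // Wait a little longer each time, then call ourselves with the next attempt number.
      schedule(function () {
        requestWithRetry(doRequest, options, callback, attempt + 1, settings);
      }, delayMs * attempt);
      return;
    }
    // Success, a non-429 failure, or out of attempts: hand the result back as-is.
    callback(err, response, body);
  });
}
```

With the real module this would be called as requestWithRetry(request, { uri: urlToUse, method: 'POST', ... }, function (err, response, body) { ... }). If the API sends a Retry-After header, using that value for the delay may be a better fit than a fixed schedule.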
I am using the request module on the client side to perform a REST GET request. The middleware is connect, which routes the request to the Node server that serves it. I tried the json: true option when making the request so that I would not need to parse and validate the response body myself. Unfortunately the request never reaches the server: it fails in the connect middleware with "Invalid JSON". connect appears to validate the body as JSON (even though there is no request body) because of the content-type header that the request module sets.
Here is a request that i make using request module.
request(
{
uri: myurl,
json: true, //issue area
headers: {
//some headers, but no content-type specified
}
}
, function (error, response, body) {
console.log(body);
//Here body arrives as an object if json:true (though not for GET, which fails validation in the connect middleware); otherwise I need to call JSON.parse(body).
});
Here is definition for json property in the settings of request module (from the documentation).
json - sets body but to JSON representation of value and adds Content-type: application/json header. Additionally, parses the response body as json.
But this is a GET request, so there is no content-type that I would set myself (with the json:true option, the request module seems to set it internally).
I could trace this down through connect's json.js snippet below
return function json(req, res, next) {
if (req._body) return next();
req.body = req.body || {};
// check Content-Type
//This guy fails because content-type is set as application/json by request module internally
if ('application/json' != utils.mime(req)) return next();
// flag as parsed
req._body = true;
// parse
limit(req, res, function(err){
if (err) return next(err);
var buf = '';
req.setEncoding('utf8');
req.on('data', function(chunk){ buf += chunk });
req.on('end', function(){
//Here the problem area obviously buf[0] is undefined
if (strict && '{' != buf[0] && '[' != buf[0]) return next(utils.error(400, 'invalid json'));
try {
......
Clearly this is not an issue with connect; it looks like incomplete functionality in the json:true property. I know I can set json:false and parse the JSON response into a JavaScript object with JSON.parse(). But with json:true I get that convenience for other request types: I do not need to validate or parse the JSON manually, and instead receive an object in the request module's completion callback.
I would like to know whether there is another option that gives me the response body as an object without triggering this failure in connect. Any information that justifies this behavior of json:true (I couldn't find any), any solution others have used, or any satisfactory explanation is also appreciated. Thanks.
Adding an answer in case somebody else runs into the same issue.
Looking at the request module source, this appears to be a bug that has already been fixed in the latest version of request. So if you are using an older version (mine was 2.0.5), consider upgrading to a newer one.
The older version had the following code: whenever json was true, it set the content-type header, even when no body had been set explicitly.
if (options.json) {
options.headers['content-type'] = 'application/json' //<-- this is being set always
if (typeof options.json === 'boolean') {
if (typeof options.body === 'object') options.body = JSON.stringify(options.body)
} else {
options.body = JSON.stringify(options.json)
}
......
With the latest version this changes:
if (options.json) {
self.json(options.json)
//...More code
//and in json function
this._json = true
if (typeof val === 'boolean') {
if (typeof this.body === 'object') {
this.body = safeStringify(this.body)
self.setHeader('content-type', 'application/json') //<-- sets it only if there is a body
}
} else {
this.body = safeStringify(val)
self.setHeader('content-type', 'application/json')
}
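If upgrading is not an option, the fallback mentioned in the question (json:false plus JSON.parse) can be wrapped in a small helper so each callback does not repeat the try/catch. This is only a sketch: the name parseJsonBody and the { error, value } return shape are assumptions.

```javascript
// Parse a response body that may or may not be JSON text.
// Returns { error, value }; an empty or missing body yields value null with no error.
function parseJsonBody(body) {
  if (body === undefined || body === null || body === '') {
    return { error: null, value: null };
  }
  if (typeof body !== 'string') {
    return { error: null, value: body }; // already parsed, pass it through
  }
  try {
    return { error: null, value: JSON.parse(body) };
  } catch (e) {
    return { error: e, value: null };
  }
}
```

In the request callback this would be used as var parsed = parseJsonBody(body); followed by a check on parsed.error before reading parsed.value.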