Synchronously HTTPS GET with node.js - javascript

So I'm trying to perform an HTTPS GET with node.js and I have the following code:
var https = require('https');

function get(url) {
    https.request(url, function (res) {
        var data = "";
        res.on('data', function (chunk) {
            data += chunk;
        })
        .on('end', function () {
            console.log(JSON.parse(data));
        });
    }).on('error', function (e) {
        console.log(e.message);
    }).end();
}
This code works fine and dandy, except I need this function to return the data it's logging.
I know the recommended way to do this is to use callbacks, passing a callback function into get and then calling that function in the 'end' listener. But the problem is that this process needs to run sequentially and NOT pipelined, because pipelining causes data hazards and uses too much memory. On top of that, it's called recursively and is just one big headache to try to manage.
Basically, I'm trying to return JSON.parse(data) from the get function when the 'end' listener fires. Is that possible?

You can't synchronously return data that an asynchronous function retrieves. Your get() function will return long before the https.request() has completed, so you just can't do what you're asking.
The usual design pattern for solving this involves passing a callback function into your get() that will be called when the data is available. This will involve restructuring the caller of your function to handle an asynchronous response via a callback function.
There are some different choices in how you structure the callback, but here's the general idea:
function get(url, callback) {
    https.request(url, function (res) {
        var data = "";
        res.on('data', function (chunk) {
            data += chunk;
        })
        .on('end', function () {
            callback("success", JSON.parse(data));
        });
    }).on('error', function (e) {
        callback("error", e);
    }).end();
}
Usage:
get("http://www.example.com/myurl", function(status, data) {
if (status === "success") {
console.log(data);
}
});
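As an aside, the more common Node.js convention is an error-first callback, callback(err, data), with err set to null on success, rather than a status string. A minimal sketch of the same pattern with that convention (only the callback invocations change):
function get(url, callback) {
    https.request(url, function (res) {
        var data = "";
        res.on('data', function (chunk) {
            data += chunk;
        })
        .on('end', function () {
            callback(null, JSON.parse(data)); // error-first: null signals success
        });
    }).on('error', function (e) {
        callback(e); // pass the Error itself as the first argument
    }).end();
}

get("https://www.example.com/myurl", function (err, data) {
    if (err) return console.log(err.message);
    console.log(data);
});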

May I recommend Q. It is specifically designed to help you fight the famous pyramid of callbacks in JavaScript. I understand that callbacks can lead to less readable code, but you should not try to make synchronous GET requests; it kind of defeats the advantages of node.js.
You can convert
step1(function (value1) {
step2(value1, function(value2) {
step3(value2, function(value3) {
step4(value3, function(value4) {
// Do something with value4
});
});
});
});
into this:
Q.fcall(promisedStep1)
    .then(promisedStep2)
    .then(promisedStep3)
    .then(promisedStep4)
    .then(function (value4) {
        // Do something with value4
    })
    .catch(function (error) {
        // Handle any error from all above steps
    })
    .done();
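Applied to the get() function from the question, a minimal sketch using Q's defer API might look like this (getAsync is an illustrative name, not part of the original code):
var Q = require('q');
var https = require('https');

function getAsync(url) {
    var deferred = Q.defer();
    https.request(url, function (res) {
        var data = "";
        res.on('data', function (chunk) {
            data += chunk;
        })
        .on('end', function () {
            deferred.resolve(JSON.parse(data)); // fulfil the promise with the parsed body
        });
    }).on('error', function (e) {
        deferred.reject(e); // surfaces in .catch() below
    }).end();
    return deferred.promise;
}

getAsync("https://www.example.com/myurl")
    .then(function (data) {
        console.log(data);
    })
    .catch(function (error) {
        console.log(error.message);
    })
    .done();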

Related

How to ensure asynchronous code is executed after a stream is finished processing?

I have a stream that I process by listening for the data, error, and end events, and I call a function to process each data event in the first stream. Naturally, the function processing the data calls other callbacks, making it asynchronous. So how do I start executing more code once the data in the stream has been processed? Listening for the end event in the stream does NOT mean the asynchronous data processing functions have finished.
How can I ensure that the stream data processing functions are finished when I execute my next statement?
Here is an example:
function updateAccountStream (accountStream, callThisOnlyAfterAllAccountsAreMigrated) {
    var self = this;
    var promises = [];
    accountStream
        .on('data', function (account) {
            migrateAccount.bind(self)(account, finishMigration);
        })
        .on('error', function (err) {
            return console.log(err);
        })
        .on('end', function () {
            console.log("Finished updating account stream (but finishMigration is still running!!!)");
            callThisOnlyAfterAllAccountsAreMigrated(); // finishMigration is still running!
        });
}
var migrateAccount = function (oldAccount, callback) {
    executeSomeAction(oldAccount, function (err, newAccount) {
        if (err) return console.log("error received:", err);
        return callback(newAccount);
    });
};

var finishMigration = function (newAccount) {
    // some code that is executed asynchronously...
};
How do I ensure that callThisOnlyAfterAllAccountsAreMigrated is called AFTER the stream has been processed?
Can this be done with promises? Can it be done with through streams? I am working with Nodejs, so referencing other npm modules could be helpful.
As you said, listening for the end event on the stream is useless on its own. The stream doesn't know or care what you're doing with the data in your data handler, so you would need to write some code to keep track of your own migrateAccount state.
If it were me, I would rewrite this whole section. If you use the readable event with .read() on your stream, you can read as many items at a time as you feel like dealing with. If that's one, no problem. If it's 30, great. The reason you do this is so that you won't overrun whatever is doing work with the data coming from the stream. As-is right now, if accountStream is fast, your application will undoubtedly crash at some point.
When you read an item from a stream and start work, take the promise you get back (use Bluebird or similar) and throw it into an array. When the promise is resolved, remove it from the array. When the stream ends, attach a .done() handler to .all() (basically making one big promise out of every promise still in the array).
You could also use a simple counter for jobs in progress.
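Here is a rough sketch of that counter idea, reusing the names from the question; pending and streamEnded are illustrative bookkeeping variables, and it assumes migrateAccount invokes its callback only once the whole migration (including finishMigration) is done:
function updateAccountStream(accountStream, callThisOnlyAfterAllAccountsAreMigrated) {
    var pending = 0;          // migrations still in flight
    var streamEnded = false;

    function maybeFinish() {
        // fires only once the stream has ended AND every migration has completed
        if (streamEnded && pending === 0) callThisOnlyAfterAllAccountsAreMigrated();
    }

    accountStream
        .on('data', function (account) {
            pending++;
            migrateAccount(account, function (newAccount) {
                pending--;
                maybeFinish();
            });
        })
        .on('error', function (err) {
            console.log(err);
        })
        .on('end', function () {
            streamEnded = true;
            maybeFinish();
        });
}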
Using a through stream (the npm through2 module), I solved this problem using the following code that controls the asynchronous behaviour:
var through = require('through2').obj;

function updateAccountStream (accountStream, callThisOnlyAfterAllAccountsAreMigrated) {
    var self = this;
    accountStream.pipe(through(function (account, _, next) {
        migrateAccount.bind(self)(account, finishMigration, next);
    }))
    .on('data', function (account) {
        // no-op handler; keeps the through stream flowing
    })
    .on('error', function (err) {
        return console.log(err);
    })
    .on('end', function () {
        console.log("Finished updating account stream");
        callThisOnlyAfterAllAccountsAreMigrated();
    });
}
var migrateAccount = function (oldAccount, callback, next) {
    executeSomeAction(oldAccount, function (err, newAccount) {
        if (err) return console.log("error received:", err);
        return callback(newAccount, next);
    });
};

var finishMigration = function (newAccount, next) {
    // some code that is executed asynchronously, but using 'next' callback when migration is finished...
};
It is a lot easier when you handle streams via promises. Below is an example, copied from here, that uses the spex library:
var spex = require('spex')(Promise);
var fs = require('fs');

var rs = fs.createReadStream('values.txt');

function receiver(index, data, delay) {
    return new Promise(function (resolve) {
        console.log("RECEIVED:", index, data, delay);
        resolve(); // ok to read the next data;
    });
}

spex.stream.read(rs, receiver)
    .then(function (data) {
        // streaming successfully finished;
        console.log("DATA:", data);
    }, function (reason) {
        // streaming has failed;
        console.log("REASON:", reason);
    });

Understanding control flow in Node.js applications

I am trying to understand control flow in Node.js applications. Specifically, does control return to the original function once the callback completes (like the call stack in recursive calls)? I wrote a simple program that makes a GET call and returns the data. Here is the program:
Code:
var async = require('async');
var http = require('http');

function getGoogleData(url, callback) {
    http.get(url, function (response) {
        if (response.statusCode == 200) {
            var googleInfo = '';
            response.on('data', function (chunk) {
                console.log("receiving data... ");
                googleInfo += chunk;
                return;
            });
            response.on('end', function () {
                console.log("End of data receive... ");
                response.setEncoding('utf8');
                return callback(null, googleInfo);
            });
        }
        console.log("I am here but why!");
        //callback(new Error("GET called failed status_code=" + response.statusCode));
    });
    console.log("Return from get google data");
}

async.waterfall([
    function (callback) {
        console.log("In func 1");
        getGoogleData("http://www.google.com", callback);
    },
    function (data, callback) {
        console.log("In func 2");
        callback(data);
    }],
    function (err, res) {
        console.log("In err fn");
    });
Here is output of the program:
Output:
In func 1
Return from get google data
I am here but why!
receiving data...
receiving data...
End of data receive...
In func 2
In err fn
Can someone help me understand why the 'I am here but why!' line gets printed before any data is received, even though the 'data' handler was registered first? What is the overall control flow here?
The reason you're seeing that message logged first is that all that the code inside the if block is doing is adding event handlers. Those events are emitted some time in the future, after your console.log has already executed.
It's for a similar reason that "Return from get google data" gets printed before the request finishes: the http request is asynchronous.
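A minimal illustration of that ordering with a plain EventEmitter: registering a handler only schedules it; it runs when the event is emitted later.
var EventEmitter = require('events').EventEmitter;
var emitter = new EventEmitter();

emitter.on('data', function () {
    console.log('handler runs later');
});
console.log('I am here but why!'); // runs first: .on() only registered the handler

setImmediate(function () {
    emitter.emit('data'); // now the registered handler actually runs
});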

How to use deferred with https.request in nodejs

I use GitHub to authenticate in my Node application. I have constructed the following code:
var req = request(postOptions, function (res) {
    res.on('data', function (d) {
        ...
        var getOptions = parseUrl('https://api.github.com/user?access_token=' + accessToken);
        ...
        var req = request(getOptions, function (resp) {
            ...
            resp.on('data', function (d) {
                ...
            })
            .on('end', function () {
                ...
            });
        });
        req.end();
    });
});
req.write(postData);
req.end();
I've removed some code, because the point here is that I have a request within a request. Now, nodejs has deferreds. The question is whether these can be used to simplify the above code.
Well, you have no error handling. Promises significantly clean up code that correctly propagates errors and doesn't leak resources, because those things become automatic. So it's impossible to make a fair comparison, because promise code that doesn't handle errors still propagates them.
var Promise = require("bluebird");
var request = Promise.promisifyAll(require("request"));

function githubAuthenticate() {
    return request.postAsync(postOptions, postData)
        .spread(function (response, body) {
            var accessToken = ...
            var getOptions = parseUrl('https://api.github.com/user?access_token=' + accessToken);
            return request.getAsync(getOptions);
        })
        .spread(function (response, body) {
        });
}
Now imagine something failing here: you would add a .catch only once, in one place, and handle the error there. Since errors propagate automatically, the code above doesn't need to do anything. The consumer code can just do:
gitHubAuthenticate().then(function () {
}).catch(function (err) {
    // Any error that happened with the post, get or your code gets here
    // automatically
});

JavaScript callback vs return

I have a JavaScript (Node) function to grab the content of a web page and handle it with a callback:
'use strict';
var http = require('http');

function download(url, callback) {
    http.get(url, function (res) {
        var content = '';
        res.on('data', function (chunk) {
            content += chunk;
        });
        res.on('end', function () {
            callback(content);
        });
    }).on('error', function () {
        callback(null);
    });
}
What I don't understand is why I can't simply return the result on 'end'. Clearly, when the 'end' event is emitted, the content variable contains a string with the content of a web page, otherwise it couldn't be passed to the callback function. So why can't I just return it like this:
function download2(url) {
    http.get(url, function (res) {
        var content = '';
        res.on('data', function (chunk) {
            content += chunk;
        });
        res.on('end', function () {
            return content; // returns from the 'end' handler, not from download2
        });
    }).on('error', function () {
        return null; // likewise, returns from the 'error' handler
    });
}
download2 always returns undefined. Why?
These are asynchronous functions. They have already long since completed before the callback functions are called. Thus the desired return result is not known when either of your download functions returns. For data passed to asynchronous callbacks, the ONLY place you can do something with that data is from the callback itself. You can put your code to handle that data in the callback or you can call some other function from within the callback and pass the data to it.
This is asynchronous programming and you really have to get used to it in node because it's used there a lot. It is significantly different than synchronous programming in that you can't call a function that starts an asynchronous operation and expect the parent function to get the result and return it. The result of the asynchronous operation won't be known until sometime later, long after the parent function has already returned.
Thus, the way you've structured it in your first download() function is the usual way of handling this.
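If you eventually want something closer to a direct return value, the same structure can be wrapped in a promise. Here is a sketch using the built-in Promise; downloadAsync is an illustrative name, not part of the question's code:
var http = require('http');

function downloadAsync(url) {
    return new Promise(function (resolve, reject) {
        http.get(url, function (res) {
            var content = '';
            res.on('data', function (chunk) {
                content += chunk;
            });
            res.on('end', function () {
                resolve(content); // delivered to .then(), the promise analogue of "returning"
            });
        }).on('error', reject);
    });
}

downloadAsync('http://www.example.com/').then(function (content) {
    console.log(content.length);
});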

Collect Multiple JSONP Results

I have a javascript application that gets data from multiple jsonp requests. Once all of the data is returned the page will be updated with the new data. Below is pseudocode, but it's structured for a synchronous environment.
function GetAllData() {
    var data1 = GetData1();
    var data2 = GetData2();
    var data3 = GetData3();
    UpdatePage(data1, data2, data3);
}
The issue I have is that I need to collect, and know, when all the data has been returned from the JSONP requests before I update the page. I was looking at jQuery Deferred, but I'm not sure if that's the correct solution.
Any suggestions would be appreciated.
Deferred is the correct solution when you are using jQuery.
function GetData1() {
    return $.ajax("/foo", ...);
}

function GetData2() {
    return $.ajax("/bar", ...);
}

function GetData3() {
    return $.ajax("/baz", ...);
}

function UpdatePage(data1, data2, data3) {
    ...
}

function Error() {
    alert("An error occurred while fetching data");
}

function GetAllData() {
    $.when(GetData1(), GetData2(), GetData3()).then(UpdatePage, Error);
}
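One caveat worth knowing: when $.when() receives multiple $.ajax() calls, each argument handed to the success handler is an array of [data, textStatus, jqXHR] rather than the data alone, so you may need to unwrap it first:
function GetAllData() {
    $.when(GetData1(), GetData2(), GetData3()).then(function (r1, r2, r3) {
        // each rN is [data, textStatus, jqXHR] when $.when gets multiple Deferreds
        UpdatePage(r1[0], r2[0], r3[0]);
    }, Error);
}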
