express.js request/response object life cycle when using callbacks

Please feel free to sort me out if this is a redundant question. (I have searched as much as I can bear before asking)
I am desperately trying to understand the life cycle of the request/response objects.
Consider the following skeleton code:
app.get('/', function(req, res) {
    var setCookie = function(err, idCookie) { // callback invoked after the cookie search in the DB
        // ... do something with the result of findCookieInDatabase() (handle error, etc.)
        res.sendfile('index.html');
    };

    var cookie = parseCookie(req.get('Cookie')); // parse and format the cookie
    findCookieInDatabase(cookie, setCookie);     // tries to find the cookie in the DB

    // ... do some content return stuff, etc.
});
(Please note that the original code does a whole lot more, such as checking if 'Cookie' even exists etc.)
I understand that the req and res objects are created and must at some point be garbage collected. (One would hope.)
When findCookieInDatabase() is called with setCookie as a parameter, I'm assuming that setCookie is just a string (containing the function) at that point, and does not get parsed or executed until the callback(setCookie) statement is encountered inside findCookieInDatabase().
I also understand that I might be utterly wrong with the above assumption, which would be due to my lack of understanding of the guts of JavaScript callbacks. (I have searched a lot on this too, but all I could find is endless tutorials on how to use callbacks; nothing on what's under the hood.)
So the question is this:
How does JavaScript (or Node.js) know how long to keep res alive, and when is it OK to garbage collect it?
Does the line res.sendfile in setCookie actually act as an active reference because it is invoked through findCookieInDatabase()?
Does JavaScript actually keep track of all the references, and keep req and/or res alive as long as any called function or callback that uses them is alive?
Any help greatly appreciated. Thanks for reading.

There's a lot going on with your code -- and your assumptions -- that indicates to me that you should study some JavaScript fundamentals. I would recommend the following books:
Speaking JavaScript, by Axel Rauschmayer
JavaScript: The Good Parts, by Douglas Crockford (please note: I don't think everything that Douglas Crockford says is golden, but I would consider this book required reading for aspiring JavaScript programmers)
Learning Node, by Shelley Powers
And of course my own book, Web Development with Node and Express. Okay, now that I've got all the reading material out of the way, let me try to get to the heart of your question.
When Node receives an HTTP request, it creates the req and res objects (which begin their life as instances of http.IncomingMessage and http.ServerResponse respectively). The intended purpose of those objects is that they live as long as the HTTP request does. That is, the client makes an HTTP request, the req and res objects are created, a bunch of stuff happens, and finally a method on res is invoked that sends an HTTP response back to the client, and at that point the objects are no longer needed.
Because of Node's asynchronous nature, there could be multiple req and res objects at any given time, distinguished only by the scope in which they live. This may sound confusing, but in practice, you never have to worry about it: when you write your code, you write it as if you were only ever dealing with one HTTP request, and your framework (Express, for example) manages multiple requests.
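To make that concrete, here's a minimal sketch using Node's raw http module, with no Express involved (this is just an illustration of where req and res come from, not part of the original question's code):
var http = require('http');

http.createServer(function(req, res) {
    // Node created this req (an http.IncomingMessage) and this res
    // (an http.ServerResponse) for this one request only
    console.log(req instanceof http.IncomingMessage); // true
    console.log(res instanceof http.ServerResponse);  // true
    res.end('hello'); // response sent; the pair has served its purpose
}).listen(3000);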
JavaScript does indeed have a garbage collector, which eventually deallocates objects once nothing holds a reference to them anymore. So for any given request, as long as there's a reference to the req object (for example), that object will not be deallocated. Here's a simple example of an Express program that always saves every request (this is a terrible idea, by the way):
var allRequests = [];
app.use(function(req, res, next) {
    allRequests.push(req);
    next();
});
The reason this is a terrible idea is that if you never remove objects from allRequests, your server will eventually run out of memory as it processes traffic.
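If you did want to keep recent requests around (say, for debugging), a sketch of a bounded variant would drop old references so the garbage collector can reclaim them; the cap of 100 here is an arbitrary number of my choosing:
var recentRequests = [];
app.use(function(req, res, next) {
    recentRequests.push(req);
    if (recentRequests.length > 100) {
        recentRequests.shift(); // drop the oldest reference so it can be collected
    }
    next();
});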
Typically, with Express, you'll rely on asynchronous functions that invoke a callback once they've done their work. If the callback function has a reference to the req or res objects, they will not be deallocated until the asynchronous function completes and the callback executes (and all other references are out of scope, naturally). Here's a simple example that just generates an artificial delay:
app.get('/fast', function(req, res) {
    res.send('fast!');
});

app.get('/slow', function(req, res) {
    setTimeout(function() {
        res.send('sloooooow');
    }, 3000);
});
If you navigate to /slow, your browser will spin for 3 seconds. In another browser, if you access /fast a bunch of times, you'll find that it still works. That's because Express is creating a req and res object for each request, none of which interfere with each other. But the res object associated with the /slow request is not deallocated because the callback (which is holding a reference to that instance) hasn't executed.
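If you want to watch this lifetime yourself, one way (a sketch, not the only approach) is to listen for the response's 'finish' event, which http.ServerResponse emits once the response has been handed off:
app.get('/slow', function(req, res) {
    res.on('finish', function() {
        // fires roughly 3 seconds after the request arrives, when res.send completes
        console.log('response finished; res is no longer needed');
    });
    setTimeout(function() {
        res.send('sloooooow');
    }, 3000);
});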
At the end of the day, I feel like you're overthinking things. It's good to understand the fundamentals, certainly, but for the most part, reference counting and garbage collection in JavaScript is not something you have to think about or manage.
I hope this helps.

Related

How to determine the necessary parameters in javascript callbacks

I've been dipping my feet into javascript and more specifically node.js but I'm having trouble identifying required parameters for callbacks.
For example, when creating a route through Express, I can have the following
app.get('/', function() {
    console.log('this is a route');
});
Which will execute without giving me any trouble. However, having seen multiple examples, I know that I probably want to have something more along the lines of
app.get('/', function(req, res) {
    res.render('index');
});
But without having seen examples or documentation (which is sometimes just a couple of unclear examples), is there a consistent way to determine what parameters a callback is expected to have?
I hope I've been clear.
Without documentation, or inspecting the source of the function executing the callback, you won't easily know.
However, you can intercept them with some exploratory code and see what you get:
app.get('/', function() {
    console.log(arguments);
});
The arguments keyword here is the list of arguments passed to the callback function, so this will let you see what you got. If it tells you something is an Express.Request or the like, that at least lets you know what to try to find in the docs.
But outside of standard JavaScript, using TypeScript or Flow helps with this, since they add static types to JavaScript. If the function is typed, your editor will then know which arguments the callback function expects and can help you fill them in.
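Short of switching languages, JSDoc annotations get you some of that editor help in plain JavaScript; here's a minimal sketch (the import('express') type syntax assumes an editor that understands TypeScript-style JSDoc, so treat the details as an assumption):
/**
 * @param {import('express').Request} req
 * @param {import('express').Response} res
 */
function homeRoute(req, res) {
    res.render('index'); // the editor can now autocomplete req/res members
}

app.get('/', homeRoute);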
Since you're using Express, the documentation is pretty clear: it depends on your route parameters and whether or not you're using middleware. There is no hard and fast rule; it genuinely depends on your route's function.
Your first example "works" because you're only printing to the console, but without the res response object you'll notice that the request returns no response.
Start with req and res for each and expand as needed.
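For reference, the signatures Express itself documents are (req, res) or (req, res, next) for route handlers and middleware, and (err, req, res, next) for error handlers, which Express recognizes by their four parameters. A quick sketch of both:
// ordinary middleware: three parameters
app.use(function(req, res, next) {
    console.log(req.method, req.url);
    next(); // hand off to the next handler
});

// error-handling middleware: recognized by its four parameters
app.use(function(err, req, res, next) {
    res.status(500).send('something broke');
});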

Basic scope issue (javascript and node)

Hi, I have a very simple (I think) JS question that I seem to be stuck on.
I am trying to create the route below.
What gets console.logged from within the bEtsy function is what I would like to have displayed on the page. However, body is not available outside of that scope.
app.get('/api/etsy/getListings', function(req, res) {
    bEtsy.getAllListings(req, res, function(err, body) {
        // console.log(body);
    });
    res.json(req.body); // neither this nor res.json(body) works
});
Move res.json(req.body); into the callback function.
Apart from the scoping problem: It is asynchronous, so in your code it will be called long after res.json(req.body) runs.
app.get('/api/etsy/getListings', function(req, res) {
    bEtsy.getAllListings(req, res, function(err, body) {
        res.json(body);
        //console.log(body);
    });
});
A more general piece of advice (or two or three pieces), aside from the problem at hand:
What helps me with such situations and "callback thinking" is to almost never use inline callback functions: Write code only one layer deep (plus one layer for the module pattern of course), avoid callback hell! Name all callbacks and write them all on the same (top) level.
// NOTE: this assumes getAllListings invokes its callback as callback(err, body, res);
// see the note about passing res along at the end of this answer
function allListingsReceived(err, body, res) {
    res.json(body);
    //console.log(body);
}

function getListings(req, res) {
    // ASYNC
    bEtsy.getAllListings(req, res, allListingsReceived);
}

//ASYNC
app.get('/api/etsy/getListings', getListings);
This allows me to get a much better overview of the actual call sequence. In this case, when getAllListings is called you know it is asynchronous; in my own code I add a clear comment (like I did above). So I know that anything I wrote after that async function call would not be able to access anything the async function is supposed to get me. IMHO such a comment is important: in JavaScript there is no way to tell from the call site whether a callback will be invoked asynchronously. Usually it is, but if it's synchronous and you expect asynchronism, you may get into trouble too! So I think it's better to write it as a comment (always the exact same short string throughout the whole project), a formalized code annotation.
Which, by the way, leads to another problem: when you write functions that accept a callback, make sure they always call it either synchronously or asynchronously, never both ways (some functions use cached values and are able to return a result right away instead of starting an asynchronous network request).
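A sketch of that last point, assuming a hypothetical lookup function with a cache (fetchFromDatabase is made up): even when the result is already available, defer the callback with process.nextTick so callers always observe the same, asynchronous timing:
var cache = {};

function lookup(key, callback) {
    if (cache[key] !== undefined) {
        // result is already known, but DON'T call back synchronously;
        // defer so the caller always gets asynchronous behavior
        process.nextTick(function() {
            callback(null, cache[key]);
        });
        return;
    }
    fetchFromDatabase(key, function(err, value) { // hypothetical async fetch
        if (!err) cache[key] = value;
        callback(err, value);
    });
}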
Basically, the written structure does not reflect the runtime situation with this style, but this is okay, since the runtime situation is completely flexible anyway. (If you want to change which callback function you use, or add another one in between, do you really want to shift around tons of lines of code instead of just exchanging a name? Not to mention the increased ease of reuse.) This is much easier to read in longer callback-style code files than several layers of deeply nested asynchronous functions, IMHO. Avoid functions inside functions, apart from the module pattern, as much as possible.
Having named functions also is much better for debugging, stack traces are much easier to read.
A note: my example code leaves one issue open. If this is inside a module (or class), those would be internal functions, and you may have to make sure you have the correct context/scope (where this points to) if you access object member variables from inside those functions. It works the same when those functions are on the prototype, though. So this is just a general concept example that disregards this side issue.
Another note: when writing in this style, variables that previously were available to an inner function via a closure (in this example, res) now have to be passed as function parameters when calling the callback. That adds some complexity, but on the other hand it forces you to create clean(er) APIs in your own code. Personally, I don't like excessive reliance on closures to pass arguments. I'm too stupid; I prefer to have a clean interface definition, with all the parameters a function uses in its header. Apparently I'm not alone; that is one of the advantages most often touted for functional programming :) An alternative to filling the header with arguments that is "clean" too is object properties under this. My small example looks a little "procedural", but it only served to illustrate one single point. Of course this belongs in a larger context of modular programming.
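If the called function does not pass res through for you, one way to keep the named-callback style anyway is Function.prototype.bind; a sketch (note that the bound argument comes first, so allListingsReceived's parameter order changes from the example above):
function allListingsReceived(res, err, body) {
    if (err) return res.status(500).end();
    res.json(body);
}

function getListings(req, res) {
    // ASYNC -- bind res in front of the (err, body) arguments the callback receives
    bEtsy.getAllListings(req, res, allListingsReceived.bind(null, res));
}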

Node.js - what happens if I call response.end while I/O and callbacks are still running?

In Node.js, what happens if I call response.end() while my I/O calls and/or callbacks are still executing? As in the following:
var http = require('http');
var fs = require('fs');

var app = http.createServer(function(request, response) {
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    fs.writeFile('baz', 'contents', function() {
        myOtherFunc();
        response.end('Second response.end');
    });
    response.end('First response.end');
});
Specifically:
Is the HTTP connection freed up immediately upon calling the first response.end? (Bonus points: how can I inspect this myself?)
Can I use this to perform arbitrarily complex/costly computation, even synchronous ones, within myOtherFunc? Since the connection has been freed the client is no longer waiting? (Or is there any reason why not?)
Can this be used as a paradigm to perform 'background' tasks upon invocation, with 'myOtherFunc' being an arbitrary background task -- since it is essentially now running "in the background"?
I haven't tested, but:
Given the asynchronous nature of node, it's possible the stream wouldn't be freed up before the second response.end is called, but I doubt you can rely on that... at some point the connection must close and trying to send new data will "at best" fail silently.
The connection is freed and the client won't be waiting... for that request, at least, but costly synchronous computations will hold up the rest of your application, full stop. Any subsequent requests will have to wait for your work to finish, and odds are response will be gone if you don't get to it by the next few ticks.
Learn more about node's single threadedness. node works not by doing a bunch of things at once, but by not blocking while it's waiting. There is no "background" unless you explicitly spawn your own thread to do something.
Edit: I'm working with the assumption that the response stream is closed explicitly, with the end call, rather than just sitting out there waiting to be garbage collected. My assumption is it's just done asynchronously rather than waiting on completion to continue on, and if you get there within the next couple ticks of the event loop, it may still be allocated.
Edit Again: Your intrepid answerer has searched tirelessly through the node source and confirmed that two calls to end should indeed not work; the second is short-circuited by the OutgoingMessage.finished property (see lines 499-501 and 541)
Calling response.end doesn't stop any asynchronous code that's still executing. You might see weird behavior, though, if you try to end or modify a response that's already been ended. So, basically yes to all. I'll admit though, I'm not an expert on how node handles its HTTP connections behind the scenes.
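To make the "background task" point concrete: responding first and then doing the slow work is fine as long as the work is itself asynchronous or handed off to another process; CPU-heavy synchronous work still blocks the event loop, as the first answer notes. A sketch using child_process.fork to get real parallelism (worker.js is a hypothetical script of your own):
var http = require('http');
var fork = require('child_process').fork;

http.createServer(function(request, response) {
    response.end('accepted'); // the client is free immediately

    // heavy work happens in a separate process, so this event loop stays responsive
    var worker = fork('./worker.js');
    worker.send({ task: 'crunch-numbers' });
    worker.on('message', function(result) {
        console.log('background task finished:', result);
    });
}).listen(3000);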

Method for managing callbacks for RPC

I am building (yet another) RPC client/server for calling methods between a browser and a Node.js server. (My reason for this is that I already have streams and serialization taken care of by BinaryJS, and there is no solid solution that I've seen that simply uses object streams for RPC. rpc-stream seemed like a good fit, but it only allowed for a single callback and was inflexible in parameter ordering and whatnot.)
I'm having a hard time figuring out how to implement callbacks. Suppose on the client, I call some code like this:
rpcClient.someRemoteMethod('parameter1', 'parameter2', function callback1() {
    // callback1
}, function callback2() {
    // callback2
});
I would expect the client to then create a message to the RPC server along these lines:
{
    method: 'someRemoteMethod',
    parameters: ['parameter1', 'parameter2', /* something to reference callback1 */, /* something to reference callback2 */]
}
On the server end, the RPC server could do something like:
wrappedObject[data.method].apply(wrappedObject, data.parameters);
How do I handle the callbacks though? Obviously I need to create some function on the fly that then sends a set of parameters back to the other end, as well as shuffles the return value back to the server. This isn't too hard... I can have a set of IDs or something to track functions in a collection. However, how long do I keep one of these dynamically created callbacks around on the server end?
If I create one of these callback functions, how do I know that all references to it have been removed, other than the reference in my collection of callbacks? I don't want to get into a situation where I have millions of callbacks in a collection sitting around just because other code might call them some time.
For what it's worth, Socket.IO had a similar problem a couple years ago. I don't know if it still applies.
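For the bookkeeping part, a minimal sketch of the usual approach: assign each callback an ID, stash it in a map, and (since JavaScript gives you no reference-count hook) evict entries either when they fire once or after a timeout of your choosing. All names here are made up for illustration:
var pendingCallbacks = {};
var nextCallbackId = 0;

// register a callback; it is removed as soon as it fires once,
// or after 30 seconds if the remote end never calls it
function registerCallback(fn) {
    var id = nextCallbackId++;
    var timer = setTimeout(function() {
        delete pendingCallbacks[id]; // give up; avoid unbounded growth
    }, 30000);
    pendingCallbacks[id] = function() {
        clearTimeout(timer);
        delete pendingCallbacks[id];
        fn.apply(null, arguments);
    };
    return id; // send this ID over the wire instead of the function
}

// invoked when the remote end sends { callbackId: id, args: [...] }
function invokeCallback(id, args) {
    var cb = pendingCallbacks[id];
    if (cb) cb.apply(null, args);
}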

How to call an asynchronous JavaScript function and block the original caller

I have an interesting situation that my usually clever mind hasn't been able to come up with a solution for :) Here's the situation...
I have a class that has a get() method... this method is called to get stored user preferences... what it does is call on some underlying provider to actually get the data... as written now, it's calling on a provider that talks cookies... so, get() calls providerGet(), let's say, providerGet() returns a value, and get() passes it along to the caller. The caller expects a response before it continues its work, obviously.
Here's the tricky part... I now am trying to implement a provider that is asynchronous in nature (using local storage in this case)... so, providerGet() would return right away, having dispatched a call to local storage that will, some time later, call a callback function that was passed to it... but, since providerGet() already returned, and so did get() by extension to the original caller, it obviously hasn't returned the actual retrieved data.
So, the question is simply: is there a way to essentially "block" the return from providerGet() until the asynchronous call returns? Note that for my purposes I'm not concerned with the performance implications this might have; I'm just trying to figure out how to make it work.
I don't think there's a way, certainly I know I haven't been able to come up with it... so I wanted to toss it out and see what other people can come up with :)
edit: I'm just learning now that the core of the problem, the fact that the Web SQL API is asynchronous, may have a solution... it turns out there's a synchronous version of the API as well, something I didn't realize... I'm reading through the docs now to see how to use it, but that would solve the problem nicely, since the only reason providerGet() was written asynchronously at all was to allow for that provider... the code that get() is a part of is my own abstraction layer above various storage providers (cookies, Web SQL, localStorage, etc.), so the lowest common denominator has to win, which means if one is asynchronous they ALL have to be asynchronous... the only one that was is Web SQL... so if there's a way to do that synchronously, my point becomes moot (still an interesting question generically, I think).
edit2: Ah well, no help there it seems... the synchronous version of the API isn't implemented in any browser, and even if it were, it's specified that it can only be used from worker threads, so it doesn't seem like it would help anyway. Although, reading some other things, it sounds like there's a way to pull off this trick using recursion... I'm throwing together some test code now; I'll post it if/when I get it working. It seems like a very interesting way to get around any such situation generically.
edit3: As per my comments below, there's really no way to do exactly what I wanted. The solution I'm going with to solve my immediate problem is to simply not allow usage of Web SQL for data storage. It's not the ideal solution, but as that spec is in flux and not widely implemented anyway, it's not the end of the world... hopefully when it's properly supported, the synchronous version will be available and I can plug in a new provider for it and be good to go. Generically, though, there doesn't appear to be any way to pull off this miracle... this confirms what I expected was the case, but I wish I was wrong this one time :)
Spawn a web worker thread to do the async operation for you.
Pass it the info it needs to do the task, plus a unique ID.
The trick is to have it send the result to a web server when it finishes.
Meanwhile... the function which spawned the web worker sends an AJAX request to the same web server, using the synchronous flag of the XMLHttpRequest object (yes, it has a synchronous option). Since it will block until the HTTP request is complete, you can just have your web server script poll the database for updates (or whatever) until the result has been sent to it.
Ugly, I know. But it would block without hogging the CPU :D
Basically:
function get(...) {
    spawnWebworker(...);
    var xhr = sendSynchronousXHR(...);
    return xhr.responseText;
}
No, you can't block until the async call finishes. It's that simple.
It sounds like you may already know this, but if you want to use asynchronous ajax calls, then you have to restructure the way your code is used. You cannot just have a .get() method that makes an asynchronous ajax call, blocks until it's complete and returns the result. The design pattern most commonly used in these cases (look at all of Google's javascript APIs that do networking, for example) is to have the caller pass you a completion function. The call to .get() will start the asynchronous operation and then return immediately. When the operation completes, the completion function will be called. The caller must structure their code accordingly.
You simply cannot write straight, sequential procedural javascript code when using asynchronous networking like:
var result = abc.get()
document.write(result);
The most common design pattern is like this:
abc.get(function(result) {
    document.write(result);
});
If your problem is several calling layers deep, then callbacks can be passed along to different levels and invoked when needed.
FYI, newer browsers support the concept of promises which can then be used with async and await to write code that might look like this:
async function someFunc() {
    let result = await abc.get();
    document.write(result);
}
This is still asynchronous. It is still non-blocking. abc.get() must return a promise that resolves to the value result. This code must be inside a function that is declared async and other code outside this function will continue to run (that's what makes this non-blocking). But, you get to write code that "looks" more like blocking code when local to the specific function it's contained within.
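For the original get()/providerGet() shape, that means wrapping the callback-style provider in a promise; a sketch, assuming providerGet(key, callback) calls back with (err, value):
function get(key) {
    return new Promise(function(resolve, reject) {
        providerGet(key, function(err, value) {
            if (err) reject(err);
            else resolve(value);
        });
    });
}

async function showPrefs() {
    let result = await get('theme'); // still non-blocking, just easier to read
    document.write(result);
}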
Why not just have the original caller pass in a callback of its own to get()? This callback would contain the code that relies on the response.
The get() method will forward the callback to providerGet(), which would then invoke it when it invokes its own callback.
The result of the fetch would be passed to the original caller's callback.
function get( arg1, arg2, fn ) {
    // whatever code

    // call providerGet, passing along the callback
    providerGet( fn );
}

function providerGet( fn ) {
    // do async activity
    // in the callback to the async activity, invoke the callback and pass it the data
    // ...
    fn( received_data );
    // ...
}
get( 'some_arg', 'another_arg', function( data ) {
    alert( data );
});
When your async method starts, I would open some sort of modal dialog (that the user cannot close) telling them that the request is in process. When the request finishes, close the modal in your callback.
One possible way to do this is with jqModal, but that would require you to load jQuery into your project. I'm not sure if that's an option for you or not.
This is ugly, but anyway I think the question is kindof implying an ugly solution is desired...
In your get function, serialize your query into a string.
Open an iframe, passing (A) this serialized query and (B) a random number in the query string to this iframe.
Your iframe has some JavaScript code that reads the SQL query and the number from its own query string.
Your iframe asynchronously begins running the query.
When your iframe's query asynchronously finishes, it sends the result, along with the random number, to a server of yours, say to /write.php?rand=###&result="blahblahblah".
write.php saves this info somewhere.
Back in your main script, after creating and loading the iframe, you create a synchronous AJAX request to your server, say to /read.php?rand=####.
/read.php blocks until the written info is available, then returns it to your main page.
Alternately, to avoid sending the data over the network, you could instead have your iframe encode the result into a canvas-generated image that the browser caches (similar to the approach that the Zombie cookie reportedly used). Then your blocking script would try to continually load this image over and over again (with a small delay on each request) until the cached version is available, which you could recognize via some flag that you've set to indicate it's done.
