garbage collection with node.js

garbage collection with node.js - javascript

I was curious about how the node.js pattern of nested functions works with the garbage collector of v8.
here's a simple example
readfile("blah", function(str) {
var val = getvaluefromstr(str);
function restofprogram(val2) { ... } (val)
})
if restofprogram is long-running, doesn't that mean that str will never get garbage collected? My understanding is that with node you end up with nested functions a lot. Does this get garbage collected if restofprogram was declared outside, so str could not be in scope? Is this a recommended practice?
EDIT I didn't intend to make the problem complicated. That was just carelessness, so I've modified it.

Simple answer: if value of the str is not referenced from anywhere else (and str itself is not referenced from restofprogram) it will become unreachable as soon as the function (str) { ... } returns.
Details: V8 compiler distinguishes real local variables from so called context variables captured by a closure, shadowed by a with-statement or an eval invocation.
Local variables live on the stack and disappear as soon as function execution completes.
Context variables live in a heap allocated context structure. They disappear when the context structure dies. Important thing to note here is that context variables from the same scope live in the same structure. Let me illustrate it with an example code:
function outer () {
var x; // real local variable
var y; // context variable, referenced by inner1
var z; // context variable, referenced by inner2
function inner1 () {
// references context
use(y);
}
function inner2 () {
// references context
use(z);
}
function inner3 () { /* I am empty but I still capture context implicitly */ }
return [inner1, inner2, inner3];
}
In this example variable x will disappear as soon as outer returns but variables y and z will disappear only when both inner1, inner2 and inner3 die. This happens because y and z are allocated in the same context structure and all three closures implicitly reference this context structure (even inner3 which does not use it explicitly).
Situation gets even more complicated when you start using with-statement, try/catch-statement which on V8 contains an implicit with-statement inside catch clause or global eval.
function complication () {
var x; // context variable
function inner () { /* I am empty but I still capture context implicitly */ }
try { } catch (e) { /* contains implicit with-statement */ }
return inner;
}
In this example x will disappear only when inner dies. Because:
try/catch-contains implicit with-statement in catch clause
V8 assumes that any with-statement shadows all the locals
This forces x to become a context variable and inner captures the context so x exists until inner dies.
In general if you want to be sure that given variable does not retain some object for longer than really needed you can easily destroy this link by assigning null to that variable.

Actually your example is somewhat tricky. Was it on purpose? You seem to be masking the outer val variable with an inner lexically scoped restofprogram()'s val argument, instead of actually using it. But anyway, you're asking about str so let me ignore the trickiness of val in your example just for the sake of simplicity.
My guess would be that the str variable won't get collected before the restofprogram() function finishes, even if it doesn't use it. If the restofprogram() doesn't use str and it doesn't use eval() and new Function() then it could be safely collected but I doubt it would. This would be a tricky optimization for V8 probably not worth the trouble. If there was no eval and new Function() in the language then it would be much easier.
Now, it doesn't have to mean that it would never get collected because any event handler in a single-threaded event loop should finish almost instantly. Otherwise your whole process would be blocked and you'd have bigger problems than one useless variable in memory.
Now I wonder if you didn't mean something else than what you actually wrote in your example. The whole program in Node is just like in the browser – it just registers event callbacks that are fired asynchronously later after the main program body has already finished. Also none of the handlers are blocking so no function is actually taking any noticeable time to finish. I'm not sure if I understood what you actually meant in your question but I hope that what I've written will be helpful to understand how it all works.
Update:
After reading more info in the comments on how your program looks like I can say more.
If your program is something like:
readfile("blah", function (str) {
var val = getvaluefromstr(str);
// do something with val
Server.start(function (request) {
// do something
});
});
Then you can also write it like this:
readfile("blah", function (str) {
var val = getvaluefromstr(str);
// do something with val
Server.start(serverCallback);
});
function serverCallback(request) {
// do something
});
It will make the str go out of scope after Server.start() is called and will eventually get collected. Also, it will make your indentation more manageable which is not to be underestimated for more complex programs.
As for the val you might make it a global variable in this case which would greatly simplify your code. Of course you don't have to, you can wrestle with closures, but in this case making val global or making it live in an outer scope common for both the readfile callback and for the serverCallback function seems like the most straightforward solution.
Remember that everywhere when you can use an anonymous function you can also use a named function, and with those you can choose in which scope do you want them to live.

My guess is that str will NOT be garbage collected because it can be used by restofprogram().
Yes, and str should get GCed if restofprogram was declared outside, except, if you do something like this:
function restofprogram(val) { ... }
readfile("blah", function(str) {
var val = getvaluefromstr(str);
restofprogram(val, str);
});
Or if getvaluefromstr is declared as something like this:
function getvaluefromstr(str) {
return {
orig: str,
some_funky_stuff: 23
};
}
Follow-up-question: Does v8 do just plain'ol GC or does it do a combination of GC and ref. counting (like python?)

Related

When to use an anonymous function in Nodejs?

Recently I was debugging a bit of code that one of my coworkers wrote and found something along the lines of this within a conditional, inside of a larger function.
(function(foo){
...
console.log(foo);
...
})();
So it looks like this very boiled down
function display(db, user, foo) {
if (isValid(db, user) {
(function(foo){
console.log(foo);
})();
}
...
}
Originally it didn't have (); at the end so to my understanding it wasn't even being called but to my broader question, what's the difference between that and something like below? I've seen this syntax a few times but I don't understand how above would be useful for this purpose. My understanding would reason that this just adds unnecessary complexity. Someone please enlighten me!
function display(db, user, foo) {
if (isValid(db, user) {
// without abstract function
console.log(foo);
}
...
}
Thanks :-)

With such a simple example, there is no reason to use an anonymous function like this in node.js or any other JavaScript environment when the inner function could be called directly instead.
When conducting reviews, I often will assume that any console.log (or choose-your-language logging) is accidental. If the surrounding source doesn't do anything of value itself, it is likely only intended to support a (possibly-out-of-date) console.log during development and debugging.
Because JavaScript is single-threaded, constructing an anonymous function of this sort is strictly for scoping. In the case of the original source that didn't have the execution (i.e. it was missing the ()), this is actually a way to "hide" debug code from the runtime while leaving it in-place if a step-through debug ever needed it.
When debugging, it's as easy as adding the ('isValid succeeded') to have it execute all the source inside.
(function(foo){
console.log(foo);
})( /* You're missing a parameter value here; will print 'undefined'. */ );

Sometimes you use an anonymous function to limit the scope of your code. It may come handy if you do not want to pollute the global scope with your variables. Also, it makes dependency injection for testing easier.
Consider the following examples:
var a = 2, b = 3;
function print(x)
{
console.log('x= ', x);
}
print(a+b);
In the above example, the a, b, and print now belongs to the window object. When your code grows it becomes difficult to keep track and avoid of name collisions. The problem becomes even more severe if you are using 3rd party libs etc.
In the example below, your code is wrapped in an anonymous function protecting the global scope. You can also pass parameters to the anonymous function
(function(localParam){
var a = 2, b = 3;
function print(x)
{
console.log('x= ', x);
}
print(a+b);
//also
console.log(localParam); // = safe way to access and global variables
})(globalParam);

JavaScript experts: Do block-scopes with `{}` and anonymous functions both help garbage-collection?

In the book "You don't know JS: scopes & closures", Kyle simpson states that a block-scoped variable helps with garbage collection, here is the specific example:
function process(data) {
// do something interesting
}
{
let someReallyBigData = {};
process(someReallyBigData);
}
var btn = document.getElementById("my_button");
btn.addEventListener("click", function click(evt) {
console.log("Clicked!");
}, false);
Now the above example is supposed to help with garbage-collection since the variable someReallyBigData will be dropped from memory as soon as the block ends, unlike this example, which doesn't help with garbage-collection:
function process(data) {
// do something interesting
}
var someReallyBigData = {};
process(someReallyBigData);
var btn = document.getElementById("my_button");
btn.addEventListener("click", function click(evt) {
console.log("Clicked!");
}, false);
Now I am sure this guy is correct about the examples he provided (the first one); however, i am wondering whether or not everything would be the same if we used an anonymous IIFE (Immediately Invoked Function Expression) along with a normal var instead of the {} curly braces and the let variable. Let me turn that into an example:
function process(data) {
// do something interesting
}
(function(){
var someReallyBigData = {};
process(someReallyBigData);
}());
var btn = document.getElementById("my_button");
btn.addEventListener("click", function click(evt) {
console.log("Clicked!");
}, false);
Looking at it from the surface, they both should do the same thing; since just as the block-scoped someReallyBigData variable could not be accessed anymore by anything after the block of code had executed, the code inside the anonymous function cannot be accessed by anything once it has executed, by anything, from anywhere.
So, do they really have the same effect on the garbage-collection mechanisms of the Javascript engine ? I was almost certain this was the case until I googled "anonymous function garbage-collection" and almost all the material that showed up said only negative things like "anonymous functions cause memory leaks" and etc.
I would be glad if someone could shed some light on this thing.
Please don't forget that my question is a bit specific to the examples I provided, thanks!

(V8 developer here.) Yes, there are several ways to make objects unreachable, including at least all of the following:
Put stuff in let-declared variables in a block scope
Put stuff into an IIFE
Clear the variable (var or let) when you're done with it: someReallyBigData = null;
The end result will be the same in all cases: objects that are no longer reachable are eligible for garbage collection.
Other notes based on the discussion here:
The advice quoted in the question makes sense for top-level code. Within a reasonably sized function, I wouldn't worry about it -- the function will probably return soon enough that there is no difference, so you don't need to burden yourself with such considerations.
There's a big difference between "an object can be freed now" and "an object will be freed now". Letting something go out of scope does not cause it to be freed immediately, and does not cause the garbage collector to run more often. It just means that whenever the garbage collector next decides to go looking for garbage, the object in question will be eligible.
IIFEs are IIFEs regardless of whether they're anonymous or not. Example:
(function I_have_a_name() {
var someReallyBigData = ...;
})();
// someReallyBigData can be collected now.
I_have_a_name(); // ReferenceError: I_have_a_name is not defined
Creating closures in and of itself does not keep things alive. However, if closures reference variables in their outer scope, then (of course!) those can't be collected as long as the closure is around. Example:
var closure = (function() {
var big_data_1 = ...;
var big_data_2 = ...;
return function() { return big_data_1.foo; }
})();
// big_data_2 can be collected at this point.
closure(); // This needs big_data_1.
// big_data_1 still cannot be collected, closure might need it again.
closure = null;
// big_data_1 can be collected now.
The optimizing compiler has little influence on all this. It usually operates on a per-function bases, and usually the top-level is not optimized (because most logic tends to be in functions). Within a function, the optimizing compiler is very well aware of the lifetimes of objects (that's part of what it means to be an optimizing compiler).

JavaScript only has block level scope when you use the let or const keywords in declarations. Just because you have {} alone does not create block-level scope (as is the case in most other languages).
Aside from that, garbage collection is implementation dependent and you would most likely not notice any difference in performance due to block scoping.
Anonymous functions can have an impact on garbage collection because the function can be set up in such a way that it doesn't have to be stored for potential calling later. A good example of this would be a function that needs to run only once (i.e. when the document is fully parsed):
window.addEventListener("DOMContentLoaded", function(){ . . . });
However, that doesn't mean that all anonymous functions provide this benefit because the function could wind up being stored (i.e. if it is returned from a function and then captured in a variable) or if the anonymous function sets up closures, then all bets are off.
Also, be aware that you can't unit test anonymous functions as simply as you can named functions.

I am wondering whether or not everything would be the same if we used an anonymous IIFE
Sure that's possible, it's what transpilers do to emulate block scopes as well. However IIFEs look a bit awkward, block scopes with let/const variables are easier to use. See also Will const and let make the IIFE pattern unnecessary?.
Now the first example is supposed to help with garbage-collection since the variable someReallyBigData will be dropped from memory as soon as the block ends, unlike the second example, which doesn't help with garbage-collection.
Notice that the word is helps, not enables. Today's engines can garbage-collect the variable just fine, as their optimiser sees that it's not used in the preserved closure. The block scope only makes this kind of static analysis easier.

Definition of 'closures'

Let me ask one question. It's about closures in JavaScript, but not about how they work.
David Flanagan in his "JavaScript The Definitive Guide 6th Edition" wrote:
...Technically, all JavaScript functions are closures: they are objects, and they have a scope chain associated with them....
Is this correct? Can I call every function (function object + it's scope) a "closure"?
And stacks' tag "closures" says:
A closure is a first-class function that refers to (closes over) variables from the scope in which it was defined. If the closure still exists after its defining scope ends, the variables it closes over will continue to exist as well.
In JavaScript every function refers to variables from the scope in which it was defined. So, It's still valid.
The question is: why do so many developers think otherwise? Is there something wrong with this theory? Can't it be used as general definition?

Technically, all functions are closures. But if the function doesn't reference any free variables, the environment of the closure is empty. The distinction between function and closure is only interesting if there are closed variables that need to be saved along with the function code. So it's common to refer to functions that don't access any free variables as functions, and those that do as closures, so that you know about this distinction.

It's a tricky term to pin down. A function that's simply declared is just a function. What makes a closure is calling the function. By calling a function, space is allocated for the parameters passed and for local variables declared.
If a function simply returns some value, and that value is just something simple (like, nothing at all, or just a number or a string), then the closure goes away and there's really nothing interesting about it. However, if some references to parameters or local variables (which, mostly, are the same) "escape" the function, then the closure — that space allocated for local variables, along with the chain of parent spaces — sticks around.
Here's a way that some references could "escape" from a function:
function escape(x, y) {
return {
x: x,
y: y,
sum: function() { return x + y; }
};
}
var foo = escape(10, 20);
alert(foo.sum()); // 30
That object returned from the function and saved in "foo" will maintain references to those two parameters. Here's a more interesting example:
function counter(start, increment) {
var current = start;
return function() {
var returnValue = current;
current += increment;
return returnValue;
};
}
var evenNumbers = counter(0, 2);
alert(evenNumbers()); // 0
alert(evenNumbers()); // 2
alert(evenNumbers()); // 4
In that one, the returned value is itself a function. That function involves code that makes reference to the parameter "increment" and a local variable, "current".
I would take some issue with conflating the concept of a closure and the concept of functions being first-class objects. Those two things really are separate, though they're synergistic.
As a caveat, I'm not a formalist by basic personality and I'm really awful with terminology so this should probably be showered with downvotes.

I would try to answer your question knowing you were asked about what closures are during the interview (read it from the comments above).
First, I think you should be more specific with "think otherwise". How exactly?
Probably we can say something about this noop function's closure:
function() {}
But it seems it has no sense since there are no variables would bound on it's scope.
I think even this example is also not very good to consider:
function closureDemo() {
var localVar = true;
}
closureDemo();
Since its variable would be freed as there is no possibility to access it after this function call, so there is no difference between JavaScript and let's say C language.
Once again, since you said you have asked about what closures are on the interview, I suppose it would be much better to show the example where you can access some local variables via an external function you get after closureDemo() call, first. Like
function closureDemo() {
var localVar = true;
window.externalFunc = function() {
localVar = !localVar; // this local variable is still alive
console.log(localVar); // despite function has been already run,
// that is it was closed over the scope
}
}
closureDemo();
externalFunc();
externalFunc();
Then to comment about other cases and then derive the most common definition as it more likely to get the interviewer to agree with you rather than to quote Flanagan and instantly try to find the page where you've read it as a better proof of your statement or something.
Probably your interviewer just thought you don't actually know about what closures are and just read the definition from the book. Anyhow I wish you good luck next time.

The definition is correct.
The closure keeps the scope where it was born
Consider this simple code:
getLabelPrinter = function( label) {
var labelPrinter = function( value) {
document.write(label+": "+value);
}
return labelPrinter; // this returns a function
}
priceLabelPrinter = getLabelPrinter('The price is');
quantityLabelPrinter = getLabelPrinter('The quantity is');
priceLabelPrinter(100);
quantityLabelPrinter(200);
//output:
//The price is: 100 The quantity is: 200
https://jsfiddle.net/twqgeyuq/

Javascript Function Declaration Options

I've seen experts using below to declare a function:
(function () {
function f(n) {
// Format integers to have at least two digits.
return n < 10 ? '0' + n : n;
}
//etc
}());
e.g.
https://github.com/douglascrockford/JSON-js/blob/master/json.js
Could someone help me understand when should we use above pattern and how do we make use of it?
Thanks.

Well, since ECMA6 hasn't arrived yet, functions are about the best way to create scopes in JS. If you wrap a variable declaration of sorts in an IIFE (Immediately Invoked Function Expression), that variable will not be created globally. Same goes for function declarations.
If you're given the seemingly daunting task of clearing a script of all global variables, all you need to do is wrap the entire script in a simple (function(){/*script here*/}());, and no globals are created, lest they are implied globals, but that's just a lazy fix. This pattern is sooo much more powerful.
I have explained the use of IIFE in more detail both here, here and here
The basic JS function call live-cycle sort of works like this:
f();//call function
||
====> inside function, some vars are created, along with the arguments object
These reside in an internal scope object
==> function returns, scope object (all vars and args) are GC'ed
Like all objects in JS, an object is flagged for GC (Garbage Collection) as soon as that object is not referenced anymore. But consider the following:
var foo = (function()
{
var localFoo = {bar:undefined};
return function(get, set)
{
if (set === undefined)
{
return localFoo[get];
}
return (localFoo[get] = set);
}
}());
When the IIFE returns, foo is assigned its return value, which is another function. Now localFoo was declared in the scope of the IIFE, and there is no way to get to that object directly. At first glance you might expect localFoo to be GC'ed.
But hold on, the function that is being returned (and assigned to foo still references that object, so it can't be gc'ed. In other words: the scope object outlives the function call, and a closure is created.
The localFoo object, then, will not be GC'ed until the variable foo either goes out of scope or is reassigned another value and all references to the returned function are lost.
Take a look at one of the linked answers (the one with the diagrams), In that answer there's a link to an article, from where I stole the images I used. That should clear things up for you, if this hasn't already.
An IIFE can return nothing, but expose its scope regardless:
var foo = {};
(function(obj)
{
//obj references foo here
var localFoo = {};
obj.property = 'I am set in a different scope';
obj.getLocal = function()
{
return localFoo;
};
}(foo));
This IIFE returns nothing (implied undefined), yet console.log(foo.getLocal()) will log the empty object literal. foo itself will also be assigned property. But wait, I can do you one better. Assume foo has been passed through the code above once over:
var bar = foo.getLocal();
bar.newProperty = 'I was added using the bar reference';
bar.getLocal = function()
{
return this;
};
console.log(foo.getLocal().newProperty === bar.newProperty);
console.log(bar ==== foo.getLocal());
console.log(bar.getLocal() === foo.getLocal().getLocal());
//and so on
What will this log? Indeed, it'll log true time and time again. Objects are never copied in JS, their references are copied, but the object is always the same. Change it once in some scope, and those changes will be shared across all references (logically).
This is just to show you that closures can be difficult to get your head round at first, but this also shows how powerful they can be: you can pass an object through various IIFE's, each time setting a new method that has access to its own, unique scope that other methdods can't get to.
Note
Closers aren't all that easy for the JS engines to Garbage Collect, but lately, that's not that big of an issue anymore.
Also take your time to google these terms:
the module pattern in JavaScript Some reasons WHY we use it
closures in JavaScript Second hit
JavaScript function scope First hit
JavaScript function context The dreaded this reference
IIFE's can be named functions, too, but then the only place where you can reference that function is inside that function's scope:
(function init (obj)
{
//obj references foo here
var localFoo = {};
obj.property = 'I am set in a different scope';
obj.getLocal = function()
{
return localFoo;
};
if (!this.wrap)
{//only assign wrap if wrap/init wasn't called from a wrapped object (IE foo)
obj.wrap = init;
}
}(foo));
var fooLocal = foo.getLocal();
//assign all but factory methods to fooLocal:
foo.wrap(fooLocal);
console.log(fooLocal.getLocal());//circular reference, though
console.log(init);//undefined, the function name is not global, because it's an expression
This is just a basic example of how you can usre closures to create wrapper objects...

Well the above pattern is called the immediate function. This function do 3 things:-
The result of this code is an expression that does all of the following in a single statement:
Creates a function instance
Executes the function
Discards the function (as there are no longer any references to it after the statement
has ended)
This is used by the JS developers for creating a variables and functions without polluting the global space as it creates it's own private scope for vars and functions.
In the above example the function f(){} is in the private scope of the immediate function, you can't invoke this function at global or window scope.

Browser-based JavaScript only has two scopes available: Global and Function. This means that any variables you create are in the global scope or confined to the scope of the function that you are currently in.
Sometimes, often during initialization, you need a bunch of variables that you only need once. Putting them in the global scope isn't appropriate bit you don't want a special function to do it.
Enter, the immediate function. This is a function that is defined and then immediately called. That's what you are seeing in Crockford's (and others') code. It can be anonymous or named, without defeating the purpose of avoiding polluting the global scope because the name of the function will be local to the function body.
It provides a scope for containing your variables without leaving a function lying around. Keeps things clean.

Should I encapsulate blocks of functionality in anonymous JavaScript functions?

My intuition is that it's a good idea to encapsulate blocks of code in anonymous functions like this:
(function() {
var aVar;
aVar.func = function() { alert('ronk'); };
aVar.mem = 5;
})();
Because I'm not going to need aVar again, so I assume that the garbage collector will then delete aVar when it goes out of scope. Is this right? Or are interpreters smart enough to see that I don't use the variable again and clean it up immediately? Are there any reasons such as style or readability that I should not use anonymous functions this way?
Also, if I name the function, like this:
var operations = function() {
var aVar;
aVar.func = function() { alert('ronk'); };
aVar.mem = 5;
};
operations();
does operations then necessarily stick around until it goes out of scope? Or can the interpreter immediately tell when it's no longer needed?
A Better Example
I'd also like to clarify that I'm not necessarily talking about global scope. Consider a block that looks like
(function() {
var date = new Date(); // I want to keep this around indefinitely
// And even thought date is private, it will be accessible via this HTML node
// to other scripts.
document.getElementById('someNode').date = date;
// This function is private
function someFunction() {
var someFuncMember;
}
// I can still call this because I named it. someFunction remains available.
// It has a someFuncMember that is instantiated whenever someFunction is
// called, but then goes out of scope and is deleted.
someFunction();
// This function is anonymous, and its members should go out of scope and be
// deleted
(function() {
var member;
})(); // member is immediately deleted
// ...and the function is also deleted, right? Because I never assigned it to a
// variable. So for performance, this is preferrable to the someFunction
// example as long as I don't need to call the code again.
})();
Are my assumptions and conclusions in there correct? Whenever I'm not going to reuse a block, I should not only encapsulate it in a function, but encapsulate it in an anonymous function so that the function has no references and is deleted after it's called, right?

You're right that sticking variables inside an anonymous function is a good practice to avoid cluttering up the global object.
To answer your latter two questions: It's completely impossible for the interpreter to know that an object won't be used again as long as there's a globally visible reference to it. For all the interpreter knows, you could eval some code that depends on window['aVar'] or window['operation'] at any moment.
Essentially, remember two things:
As long as an object is around, none of its slots will be magically freed without your say-so.
Variables declared in the global context are slots of the global object (window in client-side Javascript).
Combined, these mean that objects in global variables last for the lifetime of your script (unless the variable is reassigned). This is why we declare anonymous functions — the variables get a new context object that disappears as soon as the function finishes execution. In addition to the efficiency wins, it also reduces the chance of name collisions.
Your second example (with the inner anonymous function) might be a little overzealous, though. I wouldn't worry about "helping the garbage collector" there — GC probably isn't going to run in the middle that function anyway. Worry about things that will be kept around persistently, not just slightly longer than they otherwise would be. These self-executing anonymous functions are basically modules of code that naturally belong together, so a good guide is to think about whether that describes what you're doing.
There are reasons to use anonymous functions inside anonymous functions, though. For example, in this case:
(function () {
var bfa = new Array(24 * 1024*1024);
var calculation = calculationFor(bfa);
$('.resultShowButton').click( function () {
var text = "Result is " + eval(calculation);
alert(text);
} );
})();
This results in that gigantic array being captured by the click callback so that it never goes away. You could avoid this by quarantining the array inside its own function.

Anything that you add to the global scope will stay there until the page is unloaded (unless you specifically remove it).
It's generally a good idea to put variables and function that belong together either in a local scope or in an object, so that they add as little as possible to the global namespace. That way it's a lot easier to reuse code, as you can combine different scripts in a page with minimal risks for naming collisions.

Develop Reference

JavaScript is the programming language of the Web.

garbage collection with node.js - javascript

Related

When to use an anonymous function in Nodejs?

JavaScript experts: Do block-scopes with `{}` and anonymous functions both help garbage-collection?

Definition of 'closures'

Javascript Function Declaration Options

Should I encapsulate blocks of functionality in anonymous JavaScript functions?

Categories

Resources