Node.js garbage collection and synchronized objects


This is a little bit tricky to explain, but I'll give it a try:
In a node.js server application I would like to deal with data objects that can be used in more than one place at once. The main problem is, that these objects are only referred to by an object id and are loaded from the database.
However, as soon as an object is already loaded into one scope, it should not be loaded a second time when requested, but instead the same object should be returned.
This leads me to the question of garbage collection: As soon as an object is no longer needed in any scope, it should be released completely to prevent having the whole database in the server's memory all the time. But here starts the problem:
There are two ways I can think of to create such a scenario: either keep a global reference to every loaded object (which prevents any of them from being collected), or actually duplicate the objects and synchronize them, so that each time a property changes in one scope the other instances are informed about that change.
For that, however, each instance would have to register an event handler, which in turn points back to that instance and thus again prevents it from being collected.
Did anyone come up with a solution for such a scenario I just didn't realize? Or is there any misconception in my understanding of the garbage collector?
What I want to avoid is manual reference counting for every object in memory. Every time an object is removed from any collection, I would have to adjust the reference count manually (there isn't even a destructor or "reference decreased" event in JS).

Using the weak module, I implemented a WeakMapObj that works like we originally wanted WeakMap to work. It allows you to use a primitive for the key and an object for the data and the data is retained with a weak reference. And, it automatically removes items from the map when their data is GCed. It turned out to be fairly simple.
const weak = require('weak');
class WeakMapObj {
constructor(iterable) {
this._map = new Map();
if (iterable) {
for (let array of iterable) {
this.set(array[0], array[1]);
}
}
}
set(key, obj) {
if (typeof obj === "object") {
let ref = weak(obj, this.delete.bind(this, key));
this._map.set(key, ref);
} else {
// not an object, can just use regular method
this._map.set(key, obj);
}
}
// get the actual object reference, not just the proxy
get(key) {
let obj = this._map.get(key);
if (obj) {
return weak.get(obj);
} else {
return obj;
}
}
has(key) {
return this._map.has(key);
}
clear() {
return this._map.clear();
}
delete(key) {
return this._map.delete(key);
}
}
I was able to test it in a test app and confirm that it works as expected when the garbage collector runs. FYI, just making one or two objects eligible for garbage collection did not cause the garbage collector to run in my test app. I had to forcefully call the garbage collector to see the effect. I assume that would not be an issue in a real app. The GC will run when it needs to (which may only run when there's a reasonable amount of work to do).
You can use this more generic implementation as the core of your object cache where an item will stay in the WeakMapObj only until it is no longer referenced elsewhere.
Here's an implementation that keeps the map entirely private so it cannot be accessed from outside of the WeakMapObj methods.
const weak = require('weak');
function WeakMapObj(iterable) {
// private instance data
const map = new Map();
this.set = function(key, obj) {
if (typeof obj === "object") {
// replace obj with a weak reference
obj = weak(obj, this.delete.bind(this, key));
}
map.set(key, obj);
}
// add methods that have access to "private" map
this.get = function(key) {
let obj = map.get(key);
if (obj) {
obj = weak.get(obj);
}
return obj;
}
this.has = function(key) {
return map.has(key);
}
this.clear = function() {
return map.clear();
}
this.delete = function(key) {
return map.delete(key);
}
// constructor implementation
if (iterable) {
for (let array of iterable) {
this.set(array[0], array[1]);
}
}
}
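On Node.js versions that ship the built-in WeakRef and FinalizationRegistry (standardized in ES2021), the same idea can probably be expressed without the weak module. The following is only a minimal sketch under that assumption; the class name WeakValueMap and the sample data are made up for illustration, and only object values are supported because WeakRef rejects primitives.

```javascript
// Hypothetical WeakValueMap: primitive keys, weakly held object values.
class WeakValueMap {
  constructor() {
    this._map = new Map(); // key -> WeakRef(value)
    // After a value is collected, drop its key, but only if the key has
    // not been pointed at a newer, still-live object in the meantime.
    this._registry = new FinalizationRegistry((key) => {
      const ref = this._map.get(key);
      if (ref && ref.deref() === undefined) {
        this._map.delete(key);
      }
    });
  }
  set(key, obj) {
    this._map.set(key, new WeakRef(obj));
    this._registry.register(obj, key);
    return this;
  }
  get(key) {
    const ref = this._map.get(key);
    return ref ? ref.deref() : undefined;
  }
  has(key) {
    return this.get(key) !== undefined;
  }
  delete(key) {
    return this._map.delete(key);
  }
}

const cache = new WeakValueMap();
let user = { id: 42, name: 'Ada' };
cache.set(42, user);
console.log(cache.get(42) === user); // true while `user` is strongly referenced
```

As with the weak module, when (or whether) the cleanup callback runs is up to the engine, so callers should always be prepared for get() to return undefined.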

Sounds like a job for a Map object used as a cache storing the object as the value (along with a count) and the ID as the key. When you want an object, you first look up its ID in the Map. If it's found there, you use the returned object (which will be shared by all). If it's not found there, you fetch it from the database and insert it into the Map (for others to find).
Then, to make sure the Map doesn't grow forever, the code that fetches something from the Map would also need to release it again when it is done. When the refCnt goes to zero upon a release, you would remove the object from the Map.
This can be made entirely transparent to the caller by creating some sort of cache object that contains the Map and has methods for getting an object or releasing an object and it would be entirely responsible for maintaining the refCnt on each object in the Map.
Note: you will likely have to write the code that fetches it from the DB and inserts it into the Map carefully in order to not create a race condition, because fetching from the database is likely asynchronous and you could get multiple callers all not finding it in the Map and all in the process of getting it from the database. How to avoid that race condition depends upon the exact database you have and how you're using it. One possibility is for the first caller to insert a placeholder in the Map so subsequent callers will know to wait for some promise to resolve before the object is inserted in the Map and available for them to use.
Here's a general idea for how such an ObjCache could work. You call cache.get(id) when you want to retrieve an item. This always returns a promise that resolves to the object (or rejects if there's an error getting it from the DB). If the object is in the cache already, the promise it returns will be already resolved. If the object is not in the cache yet, the promise will resolve when it has been fetched from the DB. This works even when multiple parts of your code request an object that is "in the process" of being fetched from the DB. They all get the same promise that is resolved with the same object when the object has been retrieved from the DB. Every call to cache.get(id) increases the refCnt for that object in the cache.
You then call cache.release(id) when a given piece of code is done with an object. That will decrement the internal refCnt and remove the object from the cache if the refCnt hits zero.
class ObjCache {
constructor() {
this.cache = new Map();
}
get(id) {
let cacheItem = this.cache.get(id);
if (cacheItem) {
++cacheItem.refCnt;
if (cacheItem.obj) {
// already have the object
return Promise.resolve(cacheItem.obj);
}
else {
// object is pending, return the promise
return cacheItem.promise;
}
} else {
// not in the cache yet
let cacheItem = {refCnt: 1, promise: null, obj: null};
let p = myDB.get(id).then(function(obj) {
// replace placeholder promise with actual object
cacheItem.obj = obj;
cacheItem.promise = null;
return obj;
});
// set placeholder as promise for others to find
cacheItem.promise = p;
this.cache.set(id, cacheItem);
return p;
}
}
release(id) {
let cacheItem = this.cache.get(id);
if (cacheItem) {
if (--cacheItem.refCnt === 0) {
this.cache.delete(id);
}
}
}
}
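To see the shared-promise and release behaviour in isolation, here is a small self-contained sketch of the same idea. It is not the code above verbatim: the class name RefCountedCache, the injected fetchById loader and the fakeDb function are hypothetical stand-ins for a real database call, and the sketch additionally drops an entry when its fetch fails.

```javascript
// Hypothetical variant of the ref-counted cache with an injected loader.
class RefCountedCache {
  constructor(fetchById) {
    this.fetchById = fetchById; // assumed to return a promise
    this.items = new Map();     // id -> { refCnt, promise }
  }
  get(id) {
    let item = this.items.get(id);
    if (!item) {
      item = { refCnt: 0, promise: null };
      item.promise = this.fetchById(id).catch((err) => {
        // Do not keep a failed fetch in the cache.
        if (this.items.get(id) === item) this.items.delete(id);
        throw err;
      });
      this.items.set(id, item);
    }
    ++item.refCnt;
    return item.promise; // same promise for every caller
  }
  release(id) {
    const item = this.items.get(id);
    if (item && --item.refCnt === 0) this.items.delete(id);
  }
}

// Usage with a fake database that counts how often it is queried.
let fetchCount = 0;
const fakeDb = (id) => {
  fetchCount++;
  return Promise.resolve({ id, name: 'object ' + id });
};
const objCache = new RefCountedCache(fakeDb);
const first = objCache.get(1);
const second = objCache.get(1);
console.log(first === second, fetchCount); // true 1
objCache.release(1);
objCache.release(1);
console.log(objCache.items.has(1)); // false
```

Storing the promise itself (rather than the resolved object) keeps get() to a single code path, at the cost of every caller going through a promise even when the object is already loaded.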

Ok, for anyone who faces similar problems, I found a solution. jfriend00 pushed me towards this solution by mentioning WeakMaps which were not exactly the solution themselves, but pointed my focus on weak references.
There is an npm module simply called weak that will do the trick. It holds a weak reference to an object and safely returns an empty object once the object was garbage collected (thus, there is a way to identify a collected object).
So I created a class called WeakCache using a DataObject:
class DataObject{
constructor( objectID ){
this.objectID = objectID;
this.dataLoaded = new Promise(function(resolve, reject){
loadTheDataFromTheDatabase(function(data, error){ // some pseudo db call
if (error)
{
reject(error);
return;
}
resolve(data);
});
});
}
loadData(){
return this.dataLoaded;
}
}
class WeakCache{
constructor(){
this.cache = {};
}
getDataObjectAsync( objectID, onObjectReceived ){
if (this.cache[objectID] === undefined || this.cache[objectID].loadData === undefined){ // object was not cached yet or dereferenced, recreate it
this.cache[objectID] = weak(new DataObject( objectID ), function(){
// Remove the reference from the cache when it got collected anyway
delete this.cache[this.objectID];
}.bind({cache:this.cache, objectID:objectID}));
}
this.cache[objectID].loadData().then(onObjectReceived);
}
}
This class is still in progress but at least this is one way it could work. The only downside (which is true for all database-based data, pun alert!, and therefore not such a big deal) is that all data access has to be asynchronous.
What will happen here, is that the cache at some point may hold an empty reference to every possible object id.

Related

What are WeakRef and Finalizers in ES2021 (ES12)

I want to understand what WeakRef and finalizers in ES2021 are, with a really simple example, and where to use them.
I know WeakRef is a class that lets developers create weak references to objects, and that a finalizer (FinalizationRegistry) lets you register callback functions that are invoked when an object is garbage collected.
const myWeakRef = new WeakRef({
name: 'Cache',
size: 'unlimited'
})
// Log the value of "myWeakRef":
console.log(myWeakRef.deref())

As always, MDN's docs help.
A WeakRef object contains a weak reference to an object, which is called its target or referent. A weak reference to an object is a reference that does not prevent the object from being reclaimed by the garbage collector. In contrast, a normal (or strong) reference keeps an object in memory. When an object no longer has any strong references to it, the JavaScript engine's garbage collector may destroy the object and reclaim its memory. If that happens, you can't get the object from a weak reference anymore.
In almost every other part of JS, if some object (A) holds a reference to another object (B), B will not be garbage-collected until A can be fully garbage-collected as well. For example:
// top level
const theA = {};
(() => {
// private scope
const theB = { foo: 'foo' };
theA.obj = theB;
})();
In this situation, theB will never be garbage collected (unless theA.obj gets reassigned) because theA on the top level contains a property that holds a reference to theB; it's a strong reference, which prevents garbage collection.
A WeakRef, on the other hand, provides a wrapper with access to an object while not preventing garbage collection of that object. Calling deref() on the WeakRef will return you the object if it hasn't been garbage collected yet. If it has been GC'd, .deref() will return undefined.
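A short sketch of the usual deref() pattern may help here. The names session, sessionRef and describe are made up for illustration; the point is only that the result of deref() has to be checked before use.

```javascript
let session = { user: 'alice' };        // strong reference
const sessionRef = new WeakRef(session); // weak reference to the same object

function describe(ref) {
  const target = ref.deref();            // the object, or undefined if collected
  return target === undefined ? 'collected' : 'alive: ' + target.user;
}

console.log(describe(sessionRef)); // alive: alice

// Dropping the strong reference makes the object eligible for collection.
// It may be reclaimed at some later point, or never; that is up to the engine.
session = null;
```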
FinalizationRegistry deals with a similar issue:
A FinalizationRegistry object lets you request a callback when an object is garbage-collected.
You first define the registry with the callback you want to run, and then you call .register on the registry with the object you want to observe. The callback runs at some point after the object has been garbage collected (not necessarily right away, and in some situations not at all). For example, the following will log Just got GCd! once obj has been reclaimed:
console.log('script starting...');
const r = new FinalizationRegistry(() => {
console.log('Just got GCd!');
});
(() => {
// private closure
const obj = {};
r.register(obj);
})();
You can also pass a value when calling .register that gets passed to the callback when the object gets collected.
const r = new FinalizationRegistry((val) => {
console.log(val);
});
r.register(obj, 'the object named "obj"')
will log the object named "obj" once it gets GC'd.
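register also accepts an optional third argument, an unregister token, which lets you cancel the callback if you end up cleaning up explicitly. The sketch below uses made-up names (registry, connection, cleanupLog) and only shows the call shapes; it does not try to force a garbage collection.

```javascript
const cleanupLog = [];
const registry = new FinalizationRegistry((heldValue) => {
  // Runs some time after the target is collected, if it runs at all.
  cleanupLog.push(heldValue);
});

const connection = { socketId: 7 };
// Third argument: the unregister token (often the target itself).
registry.register(connection, 'connection 7', connection);

// If the resource is closed explicitly, the callback is no longer needed.
const removed = registry.unregister(connection);
console.log(removed); // true
```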
All this said, there is rarely a need for these tools. As MDN says:
Correct use of FinalizationRegistry takes careful thought, and it's best avoided if possible. It's also important to avoid relying on any specific behaviors not guaranteed by the specification. When, how, and whether garbage collection occurs is down to the implementation of any given JavaScript engine. Any behavior you observe in one engine may be different in another engine, in another version of the same engine, or even in a slightly different situation with the same version of the same engine. Garbage collection is a hard problem that JavaScript engine implementers are constantly refining and improving their solutions to.
Best to let the engine itself deal with garbage collection automatically whenever possible, unless you have a really good reason to care about it yourself.

The main use of weak references is to implement caches or mappings to large objects. In many scenarios, we don't want to keep a lot of memory for a long time saving this rarely used cache or mappings. We can allow the memory to be garbage collected soon and later if we need it again, we can generate a fresh cache. If the variable is no longer reachable, the JavaScript garbage collector automatically removes it.
const callback = () => {
const aBigObj = {
name: "Hello world"
};
console.log(aBigObj);
}
(async function(){
await new Promise((resolve) => {
setTimeout(() => {
callback();
resolve();
}, 2000);
});
})();
When executing the above code, it prints the object containing "Hello world" after 2 seconds. Depending on how callback() is used, aBigObj may stay in memory for as long as something still holds a strong reference to it.
Let us make aBigObj a weak reference.
// Create the weak reference once, outside the callback, so that both
// calls look at the same (possibly collected) target.
const aBigObj = new WeakRef({ name: "Hello world" });
const callback = () => {
// deref() returns undefined once the target has been collected
console.log(aBigObj.deref()?.name);
};
(async function(){
await new Promise((resolve) => {
setTimeout(() => {
callback(); // Very likely prints "Hello world"
resolve();
}, 2000);
});
await new Promise((resolve) => {
setTimeout(() => {
callback(); // No guarantee that "Hello world" is printed
resolve();
}, 5000);
});
})();
The first setTimeout() will very likely print the value of name, although strictly speaking the target is only guaranteed to stay alive until the end of the turn of the event loop in which the WeakRef was created (or last dereferenced).
There is no guarantee that the second setTimeout() prints "Hello world". The object might have been swept by the garbage collector in the meantime. Since garbage collection works differently in different engines, we cannot guarantee the output. That is also why WeakRef suits situations like managing a cache, where a missing entry can simply be recreated.

How to recursively trap all changes to an object?

I'm using a Proxy to detect when an object is modified (and then I save it to disk). This works great for simple properties of the proxied object, but fails on modification of object properties.
var obj = {
p1: "Hello",
a1: []
}
var dirtyHandler = {
set: function(obj, prop, value) {
markDirty(obj);
obj[prop] = value;
return true;
}
};
var proxied = new Proxy(obj, dirtyHandler);
proxied.p1 = "World"; // <-- proxy detects modification
proxied.a1.push({'foo': 3}); // <-- proxy does not detect modification
Does anyone know how to recursively detect any modification in my object (a1.push(...), a1[0].foo = 4, etc.)?

Here's how I ended up solving this for my use case.
First I add a proxy for all the known objects I care about (not shown). In the handler for the proxy on every set call I check if the value is already a proxy and if not, substitute one:
var DirtyHandler = function(root) {
this.root = root;
this.set = (obj, prop, value) => {
if (!dirtyIgnores[prop]) {
debug('Dirty: ' + prop + ' of ' + obj.commitId);
markDirty(this.root);
}
if (value && typeof value === 'object') {
value = new Proxy(value, this);
}
obj[prop] = value;
return true;
};
}
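An alternative that also covers objects which were already nested before the proxy was created is to wrap lazily in a get trap. The following is a rough sketch of that approach, not the code used above: watchDeep and onChange are hypothetical names, and the sketch assumes plain objects and arrays (built-ins such as Map or Date, and frozen objects, would need special handling).

```javascript
// Hypothetical helper: returns a proxy that reports writes at any depth.
function watchDeep(root, onChange) {
  const proxies = new WeakMap(); // raw object -> its proxy, to reuse proxies

  const handler = {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      // Wrap nested objects on the way out so later writes are trapped too.
      if (value !== null && typeof value === 'object') {
        return wrap(value);
      }
      return value;
    },
    set(target, prop, value, receiver) {
      const ok = Reflect.set(target, prop, value, receiver);
      onChange(root, prop);
      return ok;
    },
    deleteProperty(target, prop) {
      const ok = Reflect.deleteProperty(target, prop);
      onChange(root, prop);
      return ok;
    }
  };

  function wrap(obj) {
    let proxy = proxies.get(obj);
    if (!proxy) {
      proxy = new Proxy(obj, handler);
      proxies.set(obj, proxy);
    }
    return proxy;
  }

  return wrap(root);
}

const changes = [];
const state = watchDeep({ p1: 'Hello', a1: [] }, (root, prop) => {
  changes.push(String(prop)); // stand-in for markDirty(root)
});
state.p1 = 'World';
state.a1.push({ foo: 3 });   // array writes are trapped
state.a1[0].foo = 4;         // so are writes on nested objects
console.log(changes);        // names of the properties that were written
```

Because nested proxies are created on read rather than on write, the underlying data stays free of proxies, which can matter when the object is later serialized to disk.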

I published a library on GitHub (Observable Slim) that recursively iterates through a target object and applies a Proxy on all objects. It enables you to monitor all changes that occur under a single target object no matter how deeply nested they are. It also has a few extra features:
Reports back to a specified callback whenever changes occur.
Will prevent user from trying to Proxy a Proxy.
Keeps a store of which objects have been proxied and will re-use existing proxies instead of creating new ones (very significant performance implications).
Allows user to traverse up from a child object and retrieve the parent.
Written in ES5 and plays nice with the Proxy Polyfill so it can be deployed in older browsers fairly easily.
Please feel free to take a look and hopefully contribute as well!

angularjs $resource converting query Resource list to array of objects

I'm trying to create an array of objects from the return of a $resource query as shown in this SO question: link. However, I keep getting the same list of Resources and other elements. I have a plunk: here (You have to open the developer console to see the output.)
var app = angular.module('plunker', ['ngResource']);
app.factory('NameResource', function($resource) {
var url = 'data.json';
var res = $resource(url, null, {
query: {
method: 'GET',
isArray: true,
transformResponse: function(data, headersGetter) {
var items = angular.fromJson(data);
var models = [];
angular.forEach(items, function(item) {
models.push(item);
});
console.log("models: ", models);
return models;
}
}
});
return res;
});
app.controller('MainCtrl', function($scope, NameResource) {
$scope.names = NameResource.query();
console.log('Inside controller: ', $scope.names);
setTimeout(function(){console.log('after some time names is:', $scope.names)}, 3000);
});
What am I doing wrong? Or have I misunderstood something. Also what is the difference between the two? It seems to work very similar for me. When will it cause an issue?

Resource.query returns an array (because of the isArray flag you set) with two extra properties: $promise, a promise which, when resolved, "copies" all the response values into the array returned from Resource.query (magically updating the view), and $resolved, a flag telling whether $promise has already been resolved. To answer your question: there is an additional transformation happening. The data returned from your transformResponse goes through another transform (which can't be disabled), and this is where each of your objects is turned into a Resource instance.
So this is what you're expecting to happen:
promise
.then(function (rawData) {
// this is where your transformation is happening
// e.g. transformResponse is called with rawData
// you return your transformed data
})
.then(function (transformedData) {
// raw data has gone through 1 transformation
// you have to decide what to do with the data, like copying it to
// some variable for example $scope.names
})
But Resource is doing the following:
promise
.then(function (rawData) {
// this is where your transformation is happening
})
.then(function (transformedData) {
// Resource is adding this handler and doing the
// 'copy' to array operation here for you,
// but it will actually create a Resource instance
// in the process with each item of your array!
})
.then(function (transformedDataV2) {
// raw data has gone through 2 transformations here!
})
The additional transformation is where the magic happens and where the Resource instance is created. If we take a look at the source code, these are the lines that take care of this transformation; I'll copy them here:
if (action.isArray) {
value.length = 0;
forEach(data, function(item) {
if (typeof item === "object") {
value.push(new Resource(item));
} else {
// Valid JSON values may be string literals, and these should not be converted
// into objects. These items will not have access to the Resource prototype
// methods, but unfortunately there
value.push(item);
}
});
}
data is the data returned by your first transformation and, as seen above, each item passes the typeof item === "object" check, so value (the array returned by Resource.query) is updated with a new Resource item (not with item itself). You were worried about this strange Resource object, so let's analyze the Resource constructor:
function Resource(value) {
shallowClearAndCopy(value || {}, this);
}
It's just copying each of the properties of the object value to this (this is the new Resource instance), so now we're dealing with Resource objects and not plain objects.
will it cause an issue?
I'm sure it will, if the transform function you define is a little more complex. Suppose each object is actually an instance of something else whose prototype has some methods, e.g. a Person object instead of a plain object. Then the methods defined on Person.prototype won't be available on the result of the whole operation, since each object won't be a Person instance but a Resource instance! (see this error in this plunkr, make sure to read the comments and also look at the error raised in the console because of the undefined method)
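The effect can be reproduced without AngularJS. In the sketch below, FakeResource is a simplified stand-in written for this example (it is not ngResource code); it copies own properties the way the constructor above is described to, which is enough to show the prototype methods getting lost.

```javascript
function Person(name) {
  this.name = name;
}
Person.prototype.greet = function () {
  return 'Hi, ' + this.name;
};

// Simplified stand-in for the Resource constructor: shallow copy of own
// properties onto the new instance (assumption made for illustration).
function FakeResource(value) {
  Object.assign(this, value || {});
}

const person = new Person('Ann');
const wrapped = new FakeResource(person);

console.log(wrapped.name);              // Ann
console.log(wrapped instanceof Person); // false
console.log(typeof wrapped.greet);      // undefined
```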

Creating a backup of 'this'

In JavaScript, I have an object (think of it as a shape) that can be put in an editable mode and edited, or in a non-editable mode. When in editable mode, I want to have a cancel button that cancels all edits and returns the shape to its original form. I was hoping to use something like the following, but assigning things to 'this' doesn't work. What would be the best way to do this? I would prefer not to use external objects to store backups, because there could be many shapes, and sorting out which backup corresponds to which shape adds code that is not as nicely packaged.
Shape.prototype.edit = function() {
this.backup = this;
...
}
Shape.prototype.cancelEdit = function() {
this = this.backup;
...
}

I think Shape should contain a properties object, for example this.properties. In that object you should store all information about the shape (it acts as the shape's model: only data, without any methods or other internal class data). The backup function should then back up only the properties, not the whole Shape object.
(I'm a non native english speaker, feel free to correct my message if need)
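A minimal sketch of that suggestion could look as follows. The property names and the commitEdit method are invented for the example, and the JSON round trip is just one simple way to deep-copy plain data (it would drop functions, undefined values and Date objects).

```javascript
function Shape(properties) {
  this.properties = properties; // plain data only, the shape's "model"
  this.backup = null;           // the backup lives on the shape itself
}

Shape.prototype.edit = function () {
  // Deep copy so later edits to nested values do not touch the backup.
  this.backup = JSON.parse(JSON.stringify(this.properties));
};

Shape.prototype.cancelEdit = function () {
  if (this.backup !== null) {
    this.properties = this.backup; // swap the data, not `this`
    this.backup = null;
  }
};

Shape.prototype.commitEdit = function () {
  this.backup = null; // keep the edits, discard the backup
};

const box = new Shape({ x: 0, y: 0, size: { w: 10, h: 5 } });
box.edit();
box.properties.size.w = 99;
box.cancelEdit();
console.log(box.properties.size.w); // 10
```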

You could implement something like this, where you go through each key in the object and, if it's an own property and not a function, store it in a backup object. Both functions rely on this, so attach them to Shape.prototype (or invoke them with .call(shape)).
var saved; // holds the backed-up values (named differently from the function to avoid a clash)
function backup()
{
saved = {};
for(var key in this) {
if(this.hasOwnProperty(key) && typeof this[key] !== 'function') {
saved[key] = this[key];
}
}
}
function restore()
{
for(var key in saved) {
this[key] = saved[key];
}
}

Working immediately with instances of asynchronous (dynamically loaded) classes in Javascript

The situation was that I wanted to create an instance of a helper class, but that helper class required initialisation through external scripts, so it was inherently asynchronous. With
var obj = new myObj();
clearly a call to
obj.myMethod();
would yield undefined, as obj would either be empty or undefined until its methods and params were loaded by the external script.
Yes, one could restructure things to have a callback pattern and work with the new object within that, but it gets cumbersome and awkward when working with a large and varied API with many dynamic objects as I've been working with.
My question has been, is there any possible way to cleverly get around this?

I imagine the academically trained programmers out there have a name for this sort of approach, but I put it here in case it's not better written somewhere.
What I've done is modify my loader class to use a placeholder+queue system to instantly return workable objects.
Here are the components. Sorry that there are jQuery bits mixed in, you can easily make this a pure-JS script but I've got it loaded anyway and I'm lazy.
'Client' makes this request, where 'caller' is my handler class:
var obj = caller.use('myObj',args);
In Caller, we have
Caller.prototype.use = function(objname,args) {
var _this = this;
var methods = ['method1','method2'];
var id = someRandomString();
this.myASyncLoader(objname,function(){
var q = [];
if (_this.objs[id].loadqueue) {
q = _this.objs[id].loadqueue;
}
_this.objs[id] = new myRemotelyLoadedClass(args);
//realise all our placeholder stuff is now gone, we kept the queue in 'q'
_this.objs[id].isloaded = true;
//once again, the jquery is unnecessary, sorry
$.each(q,function(a,b){
// 'a' is the saved arguments object, so spread it back out with apply
_this.objs[id][b['f']].apply(_this.objs[id], b['a']);
});
});
_this.objs[id] = _this.createPlaceholderObj(methods,id);
return _this.objs[id];
}
This function basically initiates the loader function, and when that's done loads a new instance of the desired class. But in the meantime it immediately returns something, a placeholder object that we're going to load with all of our remotely loaded object's methods. In this example we have to explicitly declare them in an array which is a bit cumbersome but liveable, though I'm sure you can think of a better way to do it for your own purposes.
You see we're keeping both the temporary object and future object in a class-global array 'objs', associated with a random key.
Here's the createPlaceholderObj method:
Caller.prototype.createPlaceholderObj = function(methods,id) {
var _this = this;
var n = {};
n.tempid = id;
n.isloaded = false;
$.each(methods,function(a,methodCalled){
n[methodCalled] = function(){
_this.queueCall(id,methodCalled,arguments);
}
});
return n;
}
Here we're just loading up the new obj with the required methods, also storing the ID, which is important. We assign to the new methods a third function, queueCall, to which we pass the method called and any arguments it was sent with. Here's that method:
Caller.prototype.queueCall = function(id,methodName,args) {
if (this.objs[id].isloaded == true) {
// args is an arguments object, so pass it on with apply
this.objs[id][methodName].apply(this.objs[id], args);
} else {
if (this.objs[id].loadqueue) {
this.objs[id].loadqueue.push({'f':methodName,'a':args});
} else {
var arr = [{'f':methodName,'a':args}];
this.objs[id].loadqueue = arr;
}
}
}
This method will be called each time the client script is calling a method of our new object instance, whether its logic has actually been loaded or not. The IF statement here checks which is the case (isloaded is set to true in the caller method as soon as the async function is done). If the object is not loaded, the methodName and arguments are added to a queue array as a property of our placeholder. If it is loaded, then we can simply execute the method.
Back in the caller method, that last unexplained bit is where we check to see if there is a queue, and if there is, loop through it and execute the stored method names and arguments.
And that's it! Now I can do:
var obj = caller.use('myObj',args);
obj.someMethod('cool');
obj.anotherMethod('beans');
and while there might be a slight delay before those methods actually get executed, they'll run without complaint!
Not too short a solution, but if you're working on a big project you can just put this in one place and it will pay many dividends.
I'm hoping for some follow-ups to this question. I wonder, for example, how some of you would do this using a deferred-promise pattern? Or if there are any other ways? Or if anyone knows what this technique is called? Input from JS whizzes much appreciated.
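As one possible take on the promise-based variant asked about here, the placeholder can be a Proxy that forwards every method call to the real instance once a promise for it resolves. This is only a sketch under assumptions: lazyInstance and Greeter are hypothetical names, the timer stands in for the asynchronous script loader, and every method call returns a promise instead of a direct value.

```javascript
// Hypothetical helper: returns a usable placeholder right away.
function lazyInstance(instancePromise) {
  return new Proxy({}, {
    get(_target, name) {
      // Do not look like a thenable, otherwise `await placeholder` misbehaves.
      if (name === 'then') return undefined;
      // Any other property is treated as a method to be called later.
      return (...args) =>
        instancePromise.then((instance) => instance[name](...args));
    }
  });
}

// Stand-in for a class whose code arrives asynchronously.
class Greeter {
  constructor() {
    this.log = [];
  }
  say(word) {
    this.log.push(word);
    return word.toUpperCase();
  }
}

const loaded = new Promise((resolve) => {
  setTimeout(() => resolve(new Greeter()), 10); // simulated script load
});

const obj = lazyInstance(loaded);
obj.say('cool');                                        // queued until loaded
obj.say('beans').then((result) => console.log(result)); // BEANS
```

Calls are run in the order they were made because they are chained on the same promise, and no explicit list of method names is needed, unlike the placeholder approach described above.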
