JavaScript Performance and Memory Access

EDIT:
I set up a jsPerf test and ran it in Chrome, since Chrome uses V8:
http://jsperf.com/passing-large-objects
It looks like passing a large object doesn't matter; the difference is negligible. However, at some point property lookup on an object gets a lot slower.
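For reference, a rough local version of the same comparison can be run directly in Node (an illustrative sketch, not the actual jsPerf code; the property count, array sizes, and iteration counts are arbitrary):
var big = {};
for (var i = 1; i <= 9001; i++) {
    big['a' + i] = new Float64Array(1024);
}
var small = { a1: new Float64Array(1024) };

// The callee touches only one property, so any difference should come
// from the cost of the call and of passing the object itself.
function touchFirst(obj) { obj.a1[0]++; }

console.time('big object');
for (var j = 0; j < 1e6; j++) { touchFirst(big); }
console.timeEnd('big object');

console.time('small object');
for (var k = 0; k < 1e6; k++) { touchFirst(small); }
console.timeEnd('small object');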
INTRODUCTION:
I'm writing a 2D JavaScript game engine while following component-based and data-oriented (via typed arrays) design principles. It's designed for use with simulation-based multiplayer netcode.
My performance concerns are for the master simulation that will be running on the server; I believe that client browsers will be more than fast enough. As of now the server is Node.js, so it would involve the V8 engine. However, I'm not ruling out a switch to other technologies like Vert.x, which I believe uses the Rhino interpreter.
THE QUESTION:
When does JavaScript access objects in memory?
More specifically, let’s say I have an object like so.
var data = {
    a1 : new Float64Array(123456),
    a2 : new Float64Array(123456),
    …
    a9001 : new Float64Array(123456)
};
And now let’s say I pass it to this function like so.
var update = function(obj) {
    for (var property in obj) {
        if (obj.hasOwnProperty(property)) {
            obj[property][0]++;
        }
    }
};
update(data);
At what point are the Float64Arrays accessed? Does the engine access them the moment I pass data into update, attempting to load all 9001 arrays into the memory cache and page faulting like crazy? Does it wait to touch an array until the hasOwnProperty check? Or until obj[property]? Or obj[property][0]?
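One way to observe part of this empirically (a sketch using a logging getter; it says nothing about caches or page faults, only about when a value is fetched) is to make a property announce when it is actually read:
var backing = new Float64Array(4); // stand-in for one of the big arrays
var data = {};
Object.defineProperty(data, 'a1', {
    enumerable: true,
    get: function() {
        console.log('a1 was read');
        return backing;
    }
});

var update = function(obj) {
    console.log('entered update'); // logged first
    for (var property in obj) {
        if (obj.hasOwnProperty(property)) {
            obj[property][0]++;    // 'a1 was read' is logged here
        }
    }
};
update(data); // logs 'entered update' first, then 'a1 was read'
The getter does not fire when data is passed in; it only runs when obj[property] is evaluated inside the loop, which is consistent with the object being passed as a single reference rather than being eagerly walked.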
WHY I ASK:
I'm trying to follow the data-oriented design principle of keeping data in contiguous blocks of memory. Depending on how JavaScript works with memory, I will have to change the interface and structure of the engine.
For example, if all the arrays in data are accessed the moment I pass it into update, then I have to make special data objects with as few arrays as possible to reduce page faulting. If, however, the arrays are only accessed at, say, obj[property], then I can pass a large data object with more arrays without any performance penalty, which simplifies a lot of things.
A big reason why I'm not sure of the answer is that JavaScript objects aren't objects like in other languages. From some random reading here and there, I've heard things like JavaScript engines giving each object its own internal class. I've also heard of JavaScript objects being hash tables, so you incur a lookup cost with every property you access.
Then I've heard that engines treat objects differently based on how large the object is: smaller ones are handled one way and larger ones another. So jsPerf results may not be an accurate measure.
FURTHER:
Taking the example further, there’s the question of how JavaScript handles nested objects. For example:
var properties = {
    a1 : {
        a1 : {
            …
            a1 : {
            }
        }
    },
    a2 : {
        a2 : {
            …
            a2 : {
            }
        }
    },
    …
    a9001 : {
        a9001 : {
            …
            a9001 : {
            }
        }
    }
};
var doSomething = function(props) {
};
doSomething(properties);
If passing properties into doSomething causes every sub-object and its sub-objects to be accessed, then that's a performance hit. If, however, it just passes a reference to the properties object and only accesses the sub-objects when the code touches them, then it's not bad at all.
If I had access to Vectors, I'd make an entity system framework in a heartbeat and this wouldn't really be a problem. If I had access to pointers, which I believe only touch the object when the code dereferences them, then I could try other things. But having only typed arrays at my disposal limits my options, so I end up agonizing over questions like this.
Thanks for any insight you can provide. I really appreciate it.

Related

Data structure internal implementation

If we implement a data structure in JavaScript, will the browser's underlying implementation use the same data structure for handling the data?
Take a linked list, for example. We would have a Node structure something like this:
class Node {
    constructor(data) {
        this.data = data;
        this.next = null;
    }
}
Now let's say we set next to another node to create a linked list.
Does next really point to the next Node, linked-list style, in the underlying implementation? A linked list has O(1) insertion of a new node. Will this actually be O(1) in JavaScript too, once C++ (or whatever the JavaScript engine is written in) translates the implementation down to the system level?
In JavaScript, identifiers (variables and arguments) and properties of objects are all essentially pointers to structures in memory (or on the heap).
Say that you have two Nodes like in your code, one linked to the other. One way to visualize the resulting structure is:
<Node>: memory address 16325
    data: memory address 45642 (points to whatever argument was passed)
    next: memory address 62563
<Node>: memory address 62563 (this is the same as the `next` above)
    data: memory address 36425 (points to whatever argument was passed)
    next: memory address 1 (points to null)
That's not exactly what happens, but it's close enough for the "underlying implementation" you're concerned about. If, in your JavaScript, you have a reference to one Node, and you link it to another by assigning to its next property, what's involved under the hood is simply taking the location of the linked object and changing the original object's next property to point to that location. And yes, that operation takes next to no processing power - it's definitely an O(1) process in any reasonable implementation, such as in browsers and Node.
The JavaScript engine does not have to re-analyze the structure from the ground up when a new property is created somewhere - it's all just objects and properties linking to other objects.
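To make that concrete, here is a small stand-alone sketch using the Node class from the question; linking is a single reference assignment, regardless of how long the list already is:
class Node {
    constructor(data) {
        this.data = data;
        this.next = null;
    }
}

const head = new Node('a');
const tail = new Node('b');

head.next = tail;                // O(1): one pointer-sized assignment
console.log(head.next === tail); // true: same object, nothing was copied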

How to imitate garbage collection in javascript?

Let's say I have a library that keeps an array of objects; the purpose is not really relevant to the issue. It looks like this:
window.Tracker = {
    objects: [],
    track: function(obj) {
        this.objects.push(obj);
    }
};
In other parts of the app, Vue/React components constantly push objects to this library as they're loaded from a server:
this.movie = { id: 56456, name: "Avengers" }
Tracker.track(this.props.movie)
Over time, the Tracker.objects array gets bigger and bigger, mostly with objects that are no longer needed (their components no longer exist), and I really don't want to keep such objects in the array.
The problem is I don't have control over anything aside from this Tracker library (so I can't really add callbacks for when an object is no longer needed).
But I need a way to garbage collect / get rid of objects that are no longer used by anything other than the Tracker.objects array.
Is this possible?
The only way to store objects in a collection while still allowing them to be garbage collected is a WeakMap. However, you can't iterate one:
Because of references being weak, WeakMap keys are not enumerable (i.e. there is no method giving you a list of the keys). If they were, the list would depend on the state of garbage collection, introducing non-determinism.
~ MDN
So no, this is not possible in js for good reasons.
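For illustration only, here is roughly what a WeakMap-based Tracker would look like; tracked objects can be collected once nothing else references them, but the trade-off is that you can no longer enumerate what has been tracked (the has helper is a hypothetical addition, just to show that lookups still work):
window.Tracker = {
    objects: new WeakMap(),
    track: function(obj) {
        this.objects.set(obj, true); // held weakly: does not prevent garbage collection
    },
    has: function(obj) {
        return this.objects.has(obj); // only usable if you still hold a reference to obj
    }
};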

Avoiding duplication of key/data

I have a design annoyance with some existing code in JS. The code is working, so I have no desperate hurry to change it, but the duplication shown below does annoy me. What is the usual/recommended/official way of avoiding this situation?
The actual system is a large/complex financial system, so I have simplified it to the most basic example which demonstrates the problem:
var colours = {
    red:   { id: "red",   vals: [1, 0, 0] },
    green: { id: "green", vals: [0, 1, 0] },
    grey:  { id: "grey",  vals: [0.5, 0.5, 0.5] }
    // ...etc
};
// id needs to be known internally within the object - thus it is defined as a property.
// e.g:
colour.prototype.identify(console.log(this.id));
// id also needs to be used externally to find an object quickly.
// e.g:
function getcolour(s){return colours[s];}
// Although this works. It does mean duplicating data, with the theoretical possibility of a mismatch:
var colours={//...
blue:{id:"green", // oh dear...
How would this normally be handled by the experts?
This question is somewhat subjective.
When creating my applications I typically try to do the following:
- never define the same data in multiple places; the source should always be unambiguous
- if I need to create any indices for faster/easier access, I use utility methods to do it. Those methods should be properly unit-tested, so that I have little doubt about them doing the wrong thing
- use third-party libraries as much as possible (such as the already suggested lodash or underscore) to minimize the amount of code to be written/maintained
If your algorithms and utilities are properly unit-tested, you should not worry (too much) about getting the data into an inconsistent state. However, if these are critically important systems/interfaces, you may add some validation on output. And it is generally good practice to have data validation and marshalling on input.
Explanation on the utility methods:
if you have a data array, say
var data = [{"id":"i_1", ...}, {"id":"i_2", ...},{"id":"i_3",....}];
then, when you have to create an index out of it or build more data sets based on the original array, you create yourself a library of utility methods that modify the array, derive new data sets, or iterate over the array and build the resulting items on the fly. For example:
var createIndex = function(arr) {
    // Convert the data array (with the expected structure) into an object
    // keyed by id:
    // {
    //     i_1: {"id": "i_1", ...},
    //     i_2: {"id": "i_2", ...},
    //     i_3: {"id": "i_3", ...}
    // }
    var newObj = {};
    arr.forEach(function(item) {
        newObj[item.id] = item;
    });
    return newObj;
};
This method creates a hash map for accessing your data, which is faster than iterating over the original array all the time. And now this method is easy to unit-test, so you can be sure that when you use it on the source data to get your intended dataset, there will be no inconsistency.
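Usage would then look something like this (illustrative data):
var data = [{ id: 'i_1', value: 10 }, { id: 'i_2', value: 20 }, { id: 'i_3', value: 30 }];
var index = createIndex(data);

console.log(index.i_2.value); // 20: direct lookup, no iteration over the array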
I wouldn't replace the direct colours[key] access with another method just to avoid duplication.
Any other approach will add processing, and you have mentioned that you have a large amount of data.
I assume the duplication you are worried about is in the incoming data, which is a waste of traffic.
An example of trading processing for traffic would be to leave the id out of the transferred data and, once it arrives, go over the map object and set each id dynamically according to its key (processing vs. traffic):
colours[key].id = key
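As a sketch, the whole pass over the map would be something like this (assuming colours arrives without ids):
Object.keys(colours).forEach(function(key) {
    colours[key].id = key; // derive each id from its key once, on arrival
});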
You can filter your object by converting it to an array of objects and then filtering out duplicate values. Converting it to an array allows you to perform a lot of operations more quickly and easily.
So first map your object to an array:
var coloursArray = Object.keys(myObj).map(function(key) {
    return myObj[key];
});
Remove duplicates:
function removeDuplicates() {
    return coloursArray.filter((obj, pos, arr) => {
        return arr.map(mapObj => mapObj.id).indexOf(obj.id) === pos;
    });
}
You can also remove duplicates from an array using, for example, underscore.js via its .uniq method:
var uniqueColoursArray = _.uniq(coloursArray , function(c){ return c.id; });
Moreover, this function is pretty useless because you can access your element directly:
function getcolour(s){return colours[s];}
Calling colours[s] is also shorter than getcolour(s). Your function would only make sense if you also passed in the colours object, for cases where it is not accessible from the calling scope.
Also, I can't understand why you pass a console.log call as a parameter here:
colour.prototype.identify(console.log(this.id));
Perhaps you meant to pass just this.id.

What is the accepted convention for when to use an object containing objects vs an array of objects in JSON?

I am currently in the process of writing a GUI which fundamentally allows users to edit/populate/delete a number of settings files, where the settings are stored in JSON, using AJAX.
I have limited experience with JavaScript (I have little experience with anything beyond MATLAB, to be frank); however, I find myself restructuring my settings because of the semantics of working with an object containing other objects rather than an array of objects. In C# I would do this using a KeyValuePair, but the JSON structure prevents me from doing what I'd really like to do here, and I was wondering whether there is an accepted convention for doing this in JavaScript that I should adopt now, rather than making these changes and finding that I cause more issues than I solve.
The sample data structure, which has similar requirements to many of my structures, accepts any number of years, and within these any number of events, and within these a set number of values.
Here is the previous structure:
{"2013":
{
"someEventName":
{
"data1":"foo",
"data2":"bar",
...},
...},
...}
Here is my ideal structure, where the year/event name operates as a key of type string for a value of type Array:
["2013":
[
"someEventName":
{
"data1":"foo",
"data2":"bar",
...},
...],
...]
As far as I am aware, this would be invalid JSON notation, so here is my proposed structure:
[{"Key":"2013",
"Value":
[{"Key":"someEventName",
"Value":
{
"data1":"foo",
"data2":"bar",
...}
},
...]
},
...]
My proposed "test" for whether something should be an object containing objects or an array of objects is "does my sub-structure take a fixed, known number of objects?" If yes, design as object containing objects; if no, design as array of objects.
I am required to filter through this structure frequently to find data/values, and I don't envisage ever exploiting the index functionality that using an array brings. However, pushing data to and removing data from an array is much more flexible than doing so on an object, and it feels like an object containing objects deviates from the class model of OOP; on the other hand, the methods for finding my data by "Key" all seem simpler if it is an object containing objects, and I don't envisage using prototype methods on these objects anyway, so who cares about breaking OOP.
Response 1
In the previous structure, to add a year, for example, the code would be OBJ["2014"] = {}; in the new structure it would be OBJ.push({"Key": "2014", "Value": {}}). Both solutions are similarly simple.
Deleting is similarly trivial in both cases.
However, if I want to manipulate the value of an event, say, using a function, and I pass a reference to that object to the function and try to supersede the whole object through that reference, it won't work: I am forced to copy the original event (using jQuery or worse) and reinsert it at the parent level. With a "Value" attribute, I can overwrite the whole value element however I like, provided I pass the entire {"Key": "", "Value": ""} object to the function. It's an awful lot cleaner in this situation to use the array-of-objects approach.
I am also basing this change to arrays on the wealth of other responses on stackoverflow which encourage the use of them instead of objects.
If all you're going to do is iterate over your objects, then an array of objects makes more sense. If these are settings and people are going to need to look up a specific one, then the original object notation is better. The original allows people to write code like
var foo = settings['2013'][someEventName].data1
whereas getting that data out of the array of objects would require iterating through it to find the one with the key "2013", which, depending on the length of the list, can cause performance issues.
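As a rough sketch of the difference (settingsArray is a hypothetical name for the array-of-objects form described in the question):
// Object form: direct lookups all the way down
var foo = settings['2013']['someEventName'].data1;

// Array-of-objects form: a linear scan at each level
var yearEntry = settingsArray.find(function(item) {
    return item.Key === '2013';
});
var eventEntry = yearEntry && yearEntry.Value.find(function(item) {
    return item.Key === 'someEventName';
});
var bar = eventEntry && eventEntry.Value.data1;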
Pushing new data to the object is as simple as
settings['2014'] = {...}
and deleting data from an object is also simple
delete settings['2014']

Is there a performance impact to using virtual getters in Mongoose with Node.js?

I'm starting to make use of virtual getter methods in Mongoose in a real-world application and am wondering if there is a performance impact to using them that would be good to know about up-front.
For example:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var User = new Schema({
    name: {
        first: String,
        last: String
    }
});

User.virtual('name.full').get(function () {
    return this.name.first + ' ' + this.name.last;
});
Basically, I don't yet understand how the getters are attached to the objects Mongoose uses, and whether the values are populated on object initialisation or on demand.
__defineGetter__ can be used to map a property to a method in Javascript but this does not appear to be used by Mongoose for virtual getters (based on a quick search of the code).
An alternative would be to populate each virtual path on initialisation, which would mean that for 100 users in the example above, the method to join the first and last names is called 100 times.
(I'm using a simplified example, the getters can be much more complex)
Inspecting the raw objects themselves (e.g. using console.dir) is a bit misleading because internal methods are used by Mongoose to handle translating objects to 'plain' objects or to JSON, which by default don't include the getters.
If anyone can shed light how this works, and whether lots of getters may become an issue at scale, I'd appreciate it.
They're probably done the standard way:
Object.defineProperty(someInstance, propertyName, {get: yourGetter});
... meaning "not on initialization". Reading the virtual properties on initialization would defeat the point of virtual properties, I'd think.
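For what it's worth, here is a stand-alone sketch of that pattern using plain objects (not Mongoose internals); the getter body only runs when name.full is read, not when the object is created:
var user = { name: { first: 'Ada', last: 'Lovelace' } };

Object.defineProperty(user.name, 'full', {
    get: function() {
        console.log('computing full name'); // only runs on access
        return this.first + ' ' + this.last;
    }
});

console.log(user.name.full); // logs 'computing full name', then 'Ada Lovelace'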
