Efficient memoization of object arguments - javascript

Summary: Is there a faster way to hash objects than JSON.stringify?
Details: I have a Ruby and JavaScript library (NeatJSON) that provides pretty-printing of JavaScript values. I recently fixed a problem where deeply-nested objects caused O(n!) performance (n being the nesting level); the fix memoizes on the object being serialized and the indentation amount.
In Ruby, the fix was really easy, because you can index a hash by an array (here, the unique object/indent pair):
build = ->(object,indent) do
  memoizer[[object,indent]] ||= <all the rest of the code>
end
In JavaScript, however, I can't index an object by another object (in a unique way). Following the lead of several articles I found online, I decided to fix the problem generically, using JSON.stringify on the full set of arguments to the function to create a unique key for memoization:
function memoize(f){
  var memo = {};
  var slice = Array.prototype.slice;
  return function(){
    var args = slice.call(arguments);
    var mkey = JSON.stringify(args);
    if (!(mkey in memo)) memo[mkey] = f.apply(this,args);
    return memo[mkey];
  };
}
function rawBuild(o,indent){ .. }
var build = memoize(rawBuild);
This works, but (a) it's a little slower than I'd like, and (b) it seems wildly inefficient (and inelegant) to perform (naive) serialization of every object and value that I'm about to serialize smartly. The act of serializing a large object with many values is going to store a string and formatting result for EVERY unique value (not just leaf values) in the entire object.
Is there a modern JavaScript trick that would let me uniquely identify a value? For example, some way of accessing an internal ID, or otherwise associating complex objects with unique integers that takes O(1) time to find the identifier for a value?

If you are looking to memoise your objects by identity (not by content), then you'll want to use a WeakMap which is designed for exactly this purpose. They don't work for primitive values though, so you'll need a different solution for such arguments.
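For instance, a minimal sketch of identity-based memoization for a single object argument (the memoizeByIdentity name and the compute callback are illustrative, not from NeatJSON):

// Memoize a single-object-argument function by object identity.
// WeakMap keys don't prevent garbage collection of the objects.
function memoizeByIdentity(compute) {
  var cache = new WeakMap();
  return function (obj) {
    if (!cache.has(obj)) cache.set(obj, compute(obj));
    return cache.get(obj);
  };
}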

Using @Bergi's suggestion of a WeakMap I found out about Map, which allows using any value type as the key (not just objects). Because I needed a compound key (uniquely memoizing the combination of the value passed in and the indentation string), I created a hierarchical memoization structure:
function memoizedBuild(){
  var memo = new Map;
  return function(value,indent){
    var byIndent = memo.get(value);
    if (!byIndent) memo.set(value, byIndent={});
    if (!byIndent[indent]) byIndent[indent] = rawBuild(value,indent);
    return byIndent[indent];
  };
}
This proved to be about 4× faster than the memoization code I had been using when serializing a large 270kB JSON object.
Note that in the above code I'm able to use !byIndent[indent] only because I know that rawBuild will never return a falsey value (null, undefined, false, NaN, 0, ""). The safer code line would look something like:
if (!(indent in byIndent)) byIndent[indent] = rawBuild(value,indent);
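Since memoizedBuild above is a factory, it has to be called once to produce the memoized function; a usage sketch (someValue is a stand-in):

var build = memoizedBuild();
var out = build(someValue, "  "); // repeated calls with the same value and indent string hit the memo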

If you just need to memoise objects, then it makes sense to assign a unique ID to each object:
var gID = 0;
function createNode() {
  var obj = ...
  obj.id = (++gID).toString();
}
and use those obj.ids as keys in your memo collection.
That would be the fastest and least memory-hungry solution.
Update:
If you want that id property not to clash with existing properties, you can create a non-enumerable property (with some unique name) using the standard ES5.1 Object.defineProperty(), or use ES6 symbols:
var gID = 0;
var gUidSym = Symbol("uid");
function getUidOf(obj) {
  return obj[gUidSym]
      || (obj[gUidSym] = (++gID).toString());
}
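A hypothetical usage sketch tying this back to the question, combining the uid with the indent string to form a compound memo key (rawBuild is the question's function; the key format is an assumption):

var memo = {};
function build(value, indent) {
  // note: symbol properties can only be stored on objects,
  // so primitive values won't actually memoize this way
  var key = getUidOf(value) + "/" + indent;
  if (!(key in memo)) memo[key] = rawBuild(value, indent);
  return memo[key];
}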

Related

Node.js behaves strangely

I have a variable called uids
var uids = [];
Then I write some value to one of its properties:
uids[16778923] = "3fd6335d-b0e4-4d77-b304-d30c651ed509"
But before that, I check:
if (!uids[user.id]) {
  uids[user.id] = generateKey(user);
}
This behaves OK. If I try to get the value of that property:
uids[currentUser.id]
It gives me the value of that property. If I try to call some methods like
Object.keys(uids);
It gives me what I expected. And here comes the mystery...
uids;
RAM, rest in peace: Node eats all of my memory.
I am very confused now. What's wrong?
This is because you are creating a huge array, and Node will reserve memory for it (who knows what comes next). I'd say that's a scenario where you would use a Map (or a plain object, but a Map feels better here).
var uids = new Map();
var key = 456464564564654;
if (!uids.has(key)) {
  uids.set(key, generateKey(user));
}
You are creating an empty array (length is zero), then you assign some value to an arbitrary index. This will make the array grow as big as the index and assign the value to that index. Look at this example using node.js REPL:
> var a = []
undefined
> a[5] = "something"
'something'
> a
[ , , , , , 'something' ]
> a.length
6
Instead of creating an array, you could create a Map or a common JavaScript object. JavaScript objects behave like maps, but only strings can be used as keys. If you assign a number as a key, JavaScript will convert it to a string automatically.
Personally, I would go with objects because they perform better. Instantiating an object takes longer than instantiating a Map (and it doesn't seem like you need to create several groups of "uids"), but once that's done, adding new keys and retrieving values from any key is faster with ordinary objects. At least that's how things go in my Node.js v6.7.0 on Ubuntu 14.04, but you can try for yourself (a rough timing sketch follows the example below). It would also require the least alteration to your code.
var uids = {}; // common/ordinary empty javascript object instead of array.
if (!uids[user.id]) { // getting value from one key works the same.
  uids[user.id] = generateKey(user); // assignment works the same.
}
////
uids[16778923] = "3fd6335d-b0e4-4d77-b304-d30c651ed509" // key will be "16778923".
uids[16778923] // getting value for key "16778923" can be done using 16778923 instead of "16778923".
////
uids[currentUser.id] // still returning values like this.
Object.keys(uids) // still returning an array of keys like this. but they are all Strings.
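To check the object-vs-Map performance claim above for yourself, here is a rough timing sketch (numbers will vary by engine and version):

var N = 1000000;
var obj = {};
var t0 = Date.now();
for (var i = 0; i < N; i++) obj[i] = i; // keys are coerced to strings
console.log('object writes:', Date.now() - t0, 'ms');

var map = new Map();
var t1 = Date.now();
for (var j = 0; j < N; j++) map.set(j, j); // keys stay numbers
console.log('Map writes:', Date.now() - t1, 'ms');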

Array.prototype.map() and Array.prototype.forEach()

I've an array (example array below) -
a = [{"name":"age","value":31},
{"name":"height (inches)","value":62},
{"name":"location","value":"Boston, MA"},
{"name":"gender","value":"male"}];
I want to iterate through this array of objects and produce a new Object (not specifically reduce).
I've these two approaches -
a = [{"name":"age","value":31},
{"name":"height (inches)","value":62},
{"name":"location","value":"Boston, MA"},
{"name":"gender","value":"male"}];
// using Array.prototype.map()
b = a.map(function(item){
  var res = {};
  res[item.name] = item.value;
  return res;
});
console.log(JSON.stringify(b));

var newObj = [];
// using Array.prototype.forEach()
a.forEach(function(d){
  var obj = {};
  obj[d.name] = d.value;
  newObj.push(obj);
});
console.log(JSON.stringify(newObj));
Is it not right to just use either one for this sort of operation?
Also, I'd like to understand the use-case scenarios where one would be preferred over the other. Or should I just stick to a for loop?
As you've already discussed in the comments, there's no outright wrong answer here. Aside from some rather fine points of performance, this is a style question. The problem you are solving can be solved with a for loop, .forEach(), .reduce(), or .map().
I list them in that order deliberately, because each one of them could be re-implemented using anything earlier in the list. You can use .reduce() to duplicate .map(), for instance, but not the reverse.
In your particular case, unless micro-optimizations are vital to your domain, I'd make the decision on the basis of readability and code-maintenance. On that basis, .map() does specifically and precisely what you're after; someone reading your code will see it and know you're consuming an array to produce another array. You could accomplish that with .forEach() or .reduce(), but because those are capable of being used for more things, someone has to take that extra moment to understand what you ARE using them for. .map() is the call that's most expressive of your intent.
(Yes, that means in essence prioritizing efficiency-of-understanding over efficiency-of-execution. If the code isn't part of a performance bottleneck in a high-demand application, I think that's appropriate.)
You asked about scenarios where another might be preferred. In this case, .map() works because you're outputting an array, and your output array has the same length as your input array (again, that's what .map() does). If you wanted to output an array, but you might need to produce two (or zero) elements of output for a single element of input, .map() would be out and I'd probably use .reduce(). (Chaining .filter().map() would also be a possibility for the 'skip some input elements' case, and would be pretty legible.)
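As a hedged illustration of that last point, here is one way .reduce() can emit zero or more outputs per input (the keep-only-numeric-values rule is an invented example):

var numericPairs = a.reduce(function (acc, item) {
  if (typeof item.value === "number") { // skip non-numeric values entirely
    var obj = {};
    obj[item.name] = item.value;
    acc.push(obj);
  }
  return acc;
}, []);
// => [ { age: 31 }, { "height (inches)": 62 } ]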
If you wanted to split the contents of the input array into multiple output arrays, you could do that with .reduce() (by encapsulating all of them as properties of a single object), but .forEach() or the for loop would look more natural to me.
First, either of those will work, and with your example there's no reason not to use whichever is more comfortable for your development cycle. I would probably use map since that is what it is for: to create "a new array with the results of calling a provided function on every element in this array."
However, are you asking which is the absolute fastest? Then neither of those; the fastest by 2.5-3x will be a simple for-loop (see http://jsperf.com/loop-vs-map-vs-foreach for a simple comparison):
var newObj = [];
for (var i = 0, item; item = a[i]; i++) {
  var obj = {};
  obj[item.name] = item.value;
  newObj.push(obj);
}
console.log(JSON.stringify(newObj));

What is a good way to create a JavaScript array with big indices?

I'm making a web app where a user gets data from PHP, and the data consists of MySQL rows, so I want to save the used ones in a global variable, something like a buffer, to prevent extra AJAX requests.
I'm doing this right now :
window.ray = []; // global variable
$(function(){
  data = getDataWithAjax(idToSearch);
  window.ray[data.id] = data.text;
});
but when the id is big, say 10 for now, window.ray becomes this:
,,,,,,,,42
so it contains 9 unnecessary spots. Or does it? Is that just how console.log(window.ray) displays it?
If this is inefficient, I want a way, as in PHP, to assign only the indices I want, like:
$array['420'] = "abc";
$array['999'] = "xyz";
Is my current way as efficient as PHP, or does it actually contain unnecessary memory spots?
Thanks for any help!
Use an object instead of an array. The object will let you use the id as the key and be more efficient for non-sequential id values.
window.ray = {}; // global variable
$(function(){
  data = getDataWithAjax(idToSearch);
  window.ray[data.id] = data.text;
});
You can then access any element by the id:
var text = window.ray[myId];
If you are assigning values directly by property name, then it doesn't make any difference in terms of performance whether you use an Array or an Object. The property names of Arrays are strings, just like Objects.
In the following:
var a = [];
a[1000] = 'foo';
then a is (a reference to) an array with length 1,001 (always at least one greater than the highest index), but it has only one numeric member, the one named '1000'; there aren't 1,000 other empty members, e.g.:
a.hasOwnProperty('999'); // false
Arrays are just Objects with a special, self-adjusting length property and some mostly generic methods that can be applied to any suitable object.
One feature of sparse arrays (i.e. where the numeric properties from 0 to length aren't contiguous) is that a for loop will loop over every index, including the missing ones. That can be avoided, and significant performance gains realised, by using a for..in loop with a hasOwnProperty test, just like with an Object.
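To illustrate, a small sketch iterating only the assigned members of a sparse array:

var a = [];
a[1000] = 'foo';
for (var key in a) {
  if (a.hasOwnProperty(key)) {
    console.log(key, a[key]); // logs "1000 foo" once; the 1,000 holes are skipped
  }
}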
But if you aren't going to use any of the special features of an Array, you might as well just use an Object as suggested by jfriend00.

Immutable Hash and Array implementation in JavaScript?

Is there a simple immutable hash and array implementation in JavaScript? I don't need the best speed; reasonable speed better than cloning would be good.
Also, if there are simple implementations in Java or some other language that can be easily understood and ported to JavaScript, that would also be nice.
UPDATE:
The goal isn't just to freeze the hash (or array), but to make an efficient implementation of the update operation: updating an immutable hash should return a new immutable hash, and it should be more efficient than "clone the original and update it".
Native JS types have update complexity of about O(1); with cloning the complexity is O(n); with special immutable data structures (what I'm asking for) it is O(log n).
UPDATE2: "JavaScript already has Array / Hash":
Yes, but they are mutable; I need something similar but immutable. Basically it can be done very simply by cloning (hash2 = hash1.clone(); hash2[key] = value), but that's very inefficient; there are algorithms that make it very efficient, without cloning.
hash1 = {}
hash2 = hash1.set('key', 'value2')
hash3 = hash1.set('key', 'value3')
console.log(hash1) // => {}
console.log(hash2) // => {key: 'value2'}
console.log(hash3) // => {key: 'value3'}
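For reference, a minimal sketch of the naive copy-on-write version of that API (O(n) per update, which is exactly the inefficiency described above; real persistent structures bring this to O(log n)):

function set(hash, key, value) {
  var copy = Object.assign({}, hash); // shallow O(n) copy on every update
  copy[key] = value;
  return copy;
}
var hash1 = {};
var hash2 = set(hash1, 'key', 'value2');
console.log(hash1); // => {}
console.log(hash2); // => {key: 'value2'}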
SOLUTION:
It's not an implementation of an immutable hash, but more of a workaround for my current problem; maybe it helps someone else too.
A little more about why I need immutable data structures: I use Node.js and a sort of in-memory database. One request can read the database while another updates it. An update can take a lot of time (calling remote services), so I can't block all read processes and wait until the update finishes; an update may also fail, in which case the database should be rolled back. So I need to somehow isolate (as in ACID) read and write operations on the in-memory database.
That's why I need immutable arrays and hashes: to implement a sort of MVCC. But it seems there is a simpler way. Instead of updating the database directly, the update operation just records changes to the database (but doesn't perform them directly), in the form "add 42 to array db.someArray".
In the end, the product of the update operation is an array of such change commands, and because they can be applied very quickly, we can briefly block the database to apply them.
Still, it would be interesting to see implementations of immutable data structures in JavaScript, so I'll leave this question open.
I know this question is old, but I thought people who are searching like me should be pointed to Facebook's Immutable.js, which offers many different types of immutable data structures in a very efficient way.
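A minimal sketch of the question's example using Immutable.js's Map:

var Immutable = require('immutable');
var hash1 = Immutable.Map();
var hash2 = hash1.set('key', 'value2'); // returns a new map; hash1 is untouched
var hash3 = hash1.set('key', 'value3');
console.log(hash1.toJS()); // => {}
console.log(hash2.toJS()); // => {key: 'value2'}
console.log(hash3.toJS()); // => {key: 'value3'}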
I had the same requirement for persistent data structures in JS, so a while ago I made an implementation of a persistent map: https://github.com/josef-jelinek/cofy/blob/master/lang/feat.js
It contains an implementation of a balanced-tree (sorted) map, and a naive copy-on-write map (and an unfinished persistent vector/array).
var map = FEAT.map();
var map1 = map.assoc('key', 'value');
var value = map1.get('key');
var map2 = map1.dissoc('key');
...
It supports other methods like count(), contains(key), keys(into = []), values(into = []), toObject(into = {}), and toString().
The implementation is not too complicated, and it is in the public domain. I accept suggestions and contributors too :).
Update: you can find unit tests (examples of usage) at https://github.com/josef-jelinek/cofy/blob/master/tests/test-feat.html
Update 2: Persistent vector implementation is now there as well with the following operations: count(), get(i), set(i, value), push(value), pop(), toArray(into = []), toString()
The only way to make an object immutable is to hide it inside a function. You can then use the function to return either the default hash or an updated version, but you can't actually store an immutable hash in the global scope.
function my_hash(delta) {
  var defaults = {mykey: myvalue}; // note: 'default' is a reserved word, so 'defaults'
  if (delta) {
    for (var key in delta) { // for..in yields keys; look the value up explicitly
      if (defaults.hasOwnProperty(key)) defaults[key] = delta[key];
    }
  }
  return defaults;
}
I don't think this is a good idea though.
The best way to clone an object in JavaScript that I'm aware of is the one contained in underscore.js.
In short:
// (each and slice here are underscore's internal aliases for
// _.each and Array.prototype.slice)
_.clone = function(obj) {
  if (!_.isObject(obj)) return obj;
  return _.isArray(obj) ? obj.slice() : _.extend({}, obj);
};

_.extend = function(obj) {
  each(slice.call(arguments, 1), function(source) {
    for (var prop in source) {
      obj[prop] = source[prop];
    }
  });
  return obj;
};
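Note that this clone is shallow; a quick sketch of the consequence:

var orig = { a: 1, nested: { b: 2 } };
var copy = _.clone(orig);
copy.a = 99;        // does not affect orig.a
copy.nested.b = 99; // DOES affect orig.nested.b: nested objects are shared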

Test for value within array of objects

I am dynamically building an array of objects using a process that boils down to something like this:
//Objects Array
var objects = [];
//Object Structure
var object1 = {"id":"foobar_1", "metrics":90};
var object2 = {"id":"some other foobar", "metrics":50};
objects[0] = object1;
objects[1] = object2;
(Let it be said for the record that if you can think of a better way to dynamically nest data such that I can access it with objects[i].id, I am also all ears!)
There's ultimately going to be more logic at play than what's above, but it's just not written yet. Suffice it to say that the "object1" and "object2" parts will actually be in an iterator.
Inside that iterator, I want to check for the presence of an ID before adding another object to the array. If, for example, I already have an object with the ID "foobar_1", instead of pushing a new member to the array, I simply want to increment its "metrics" value.
If I weren't dealing with an array of objects, I could use jQuery's inArray utility to look for "foobar_1". But that won't look into the objects' values. The way I see it, I have two options:
Keep a separate simple array of just the IDs. So instead of only relying on the objects array, I simply check inArray (or plain JS equivalent) for a simple "objectIDs" array that is used only for this purpose.
Iterate through my existing data object and compare my "foobar_1" needle to each objects[i].id haystack
I feel that #1 is certainly more efficient, but I can't help wondering if I'm missing a function that would do the job for me: a #3, #4, or #5 option I've missed! CPU consumption is somewhat important, but I'm also interested in functions that make the code less verbose, whether or not they're more cycle-efficient.
I'd suggest switching to an object instead of an array:
var objects = {};
objects["foobar_1"] = {metrics: 90};
objects["some other foobar"] = {metrics: 50};
Then, to add a new object uniquely, you would do this:
function addObject(id, metricsNum) {
  if (!(id in objects)) {
    objects[id] = {metrics: metricsNum};
  }
}
To iterate all the objects, you would do this:
for (var id in objects) {
  // process objects[id]
}
This gives you very efficient lookup for whether a given id is already in your list or not. The only thing it doesn't give you that the array gave you before is a specific order of objects because the keys of an object don't have any specific order.
Hmm, I wonder why you don't use a dictionary, since that fits your case perfectly. Your code would then look like this:
// Objects dictionary (a plain object, since string keys are what we need)
var objects = {};
// Object structure
var object1 = {"metrics":90};
var object2 = {"metrics":50};
objects["foobar_1"] = object1;
objects["some other foobar"] = object2;
// An example showing the existence check.
if (!objects["new id"]){
  objects["new id"] = {"metrics": 100};
}
else {
  objects["new id"].metrics++;
}
