Checking for duplicate Javascript objects

TL;DR version: I want to avoid adding duplicate Javascript objects to an array of similar objects, some of which might be really big. What's the best approach?
I have an application where I'm loading large amounts of JSON data into a Javascript data structure. While it's a bit more complex than this, assume that I'm loading JSON into an array of Javascript objects from a server through a series of AJAX requests, something like:
var myObjects = [];
function processObject(o) {
    myObjects.push(o);
}
for (var x = 0; x < 1000; x++) {
    $.getJSON('/new_object.json', processObject);
}
To complicate matters, the JSON:
is in an unknown schema
is of arbitrary length (probably not enormous, but could be in the 100-200 kb range)
might contain duplicates across different requests
My initial thought is to have an additional object to store a hash of each object (via JSON.stringify?) and check against it on each load, like this:
var myHashMap = {};
function processObject(o) {
    var hash = JSON.stringify(o);
    // is it in the hashmap?
    if (!(myHashMap[hash])) {
        myObjects.push(o);
        // set the hashmap key for future checks
        myHashMap[hash] = true;
    }
    // else ignore this object
}
but I'm worried about having property names in myHashMap that might be 200 kb in length. So my questions are:
Is there a better approach for this problem than the hashmap idea?
If not, is there a better way to make a hash function for a JSON object of arbitrary length and schema than JSON.stringify?
What are the possible issues with super-long property names in an object?

I'd suggest you create an MD5 hash of JSON.stringify(o) and store that in your hashmap, with a reference to your stored object as the data for the hash. To make sure there are no key-order differences in the JSON.stringify() output, you have to create a copy of the object with its keys in a consistent (sorted) order.
Then, when each new object comes in, you check it against the hash map. If you find a match in the hash map, then you compare the incoming object with the actual object that you've stored to see if they are truly duplicates (since there can be MD5 hash collisions). That way, you have a manageable hash table (with only MD5 hashes in it).
Here's code to create a canonical string representation of an object (including nested objects or objects within arrays) that handles object keys that might be in a different order if you just called JSON.stringify().
// Code to do a canonical JSON.stringify() that puts object properties
// in a consistent order
// Does not allow circular references (child containing reference to parent)
JSON.stringifyCanonical = function(obj) {
    // compatible with either browser or node.js
    var Set = typeof window === "object" ? window.Set : global.Set;
    // poor man's Set polyfill
    if (typeof Set !== "function") {
        Set = function(s) {
            if (s) {
                this.data = s.data.slice();
            } else {
                this.data = [];
            }
        };
        Set.prototype = {
            add: function(item) {
                this.data.push(item);
            },
            has: function(item) {
                return this.data.indexOf(item) !== -1;
            }
        };
    }

    // true for objects and arrays, but not for null (typeof null === "object")
    function isObject(val) {
        return val !== null && typeof val === "object";
    }

    function orderKeys(obj, parents) {
        if (!isObject(obj)) {
            throw new Error("orderKeys() expects object type");
        }
        var set = new Set(parents);
        if (set.has(obj)) {
            throw new Error("circular object in stringifyCanonical()");
        }
        set.add(obj);
        var tempObj, item, i;
        if (Array.isArray(obj)) {
            // no need to re-order an array
            // but need to check it for embedded objects that need to be ordered
            tempObj = [];
            for (i = 0; i < obj.length; i++) {
                item = obj[i];
                if (isObject(item)) {
                    tempObj[i] = orderKeys(item, set);
                } else {
                    tempObj[i] = item;
                }
            }
        } else {
            tempObj = {};
            // get keys, sort them and build new object
            Object.keys(obj).sort().forEach(function(item) {
                if (isObject(obj[item])) {
                    tempObj[item] = orderKeys(obj[item], set);
                } else {
                    tempObj[item] = obj[item];
                }
            });
        }
        return tempObj;
    }

    // primitives and null have no keys to order
    return JSON.stringify(isObject(obj) ? orderKeys(obj) : obj);
};
And here's the algorithm:
var myHashMap = {};
// CreateMD5() stands for any MD5 implementation that returns the hash as a string
function processObject(o) {
    var stringifiedCandidate = JSON.stringifyCanonical(o);
    var hash = CreateMD5(stringifiedCandidate);
    var list = [], found = false;
    // is it in the hashmap?
    if (!myHashMap[hash]) {
        // not in the hash table, so it's a unique object
        myObjects.push(o);
        list.push(myObjects.length - 1); // put a reference to the object with this hash value in the list
        myHashMap[hash] = list; // store the list in the hash table for future comparisons
    } else {
        // the hash does exist in the hash table, check for an exact object match to see if it's really a duplicate
        list = myHashMap[hash]; // get the list of other object indexes with this hash value
        // loop through the list
        for (var i = 0; i < list.length; i++) {
            if (stringifiedCandidate === JSON.stringifyCanonical(myObjects[list[i]])) {
                found = true; // found an exact object match
                break;
            }
        }
        // if not found, it's not an exact duplicate, even though there was a hash match
        if (!found) {
            myObjects.push(o);
            myHashMap[hash].push(myObjects.length - 1);
        }
    }
}
A test case for JSON.stringifyCanonical() is here: https://jsfiddle.net/jfriend00/zfrtpqcL/
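If you are on Node.js rather than in the browser, the algorithm above can be sketched with the built-in crypto module standing in for CreateMD5 and a Map as the hash table. This is an illustrative sketch, not the answer's code: canonicalJSON is a hypothetical, shorter stand-in for JSON.stringifyCanonical that sorts object keys through a JSON.stringify replacer, and unlike the original it does no circular-reference check of its own (JSON.stringify throws on cycles anyway).

```javascript
const crypto = require("crypto");

// Stand-in for JSON.stringifyCanonical: the replacer rebuilds every plain
// object with its keys in sorted order before it is serialized.
function canonicalJSON(value) {
  return JSON.stringify(value, function (key, val) {
    if (val !== null && typeof val === "object" && !Array.isArray(val)) {
      const sorted = {};
      for (const k of Object.keys(val).sort()) {
        sorted[k] = val[k];
      }
      return sorted;
    }
    return val;
  });
}

const myObjects = [];
const myHashMap = new Map(); // MD5 hex digest -> indexes into myObjects

// Adds `o` unless an identical object was stored before; returns true if added.
function processObject(o) {
  const candidate = canonicalJSON(o);
  const hash = crypto.createHash("md5").update(candidate).digest("hex");
  const bucket = myHashMap.get(hash) || [];
  // A matching digest is only a hint (MD5 can collide), so confirm by
  // comparing the full canonical strings of the stored objects.
  const duplicate = bucket.some((i) => canonicalJSON(myObjects[i]) === candidate);
  if (!duplicate) {
    myObjects.push(o);
    bucket.push(myObjects.length - 1);
    myHashMap.set(hash, bucket);
  }
  return !duplicate;
}
```

The Map only ever holds 32-character hex digests, however large the incoming objects are, which addresses the worry about 200 kb property names.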

Maybe. For example, if you know what kind of objects are coming in, you could write a better indexing and searching system than plain JS object keys. But you could only write it in JavaScript, while object key lookup is implemented natively (in C)...
Does your hashing need to be lossless? If not, try a lossy hash such as MD5. I guess you will lose some speed and save some memory. By the way, does JSON.stringify(o) guarantee the same key ordering? {foo: 1, bar: 2} and {bar: 2, foo: 1} are equivalent as objects, but not as strings.
Cost: memory
One possible optimization:
Instead of using getJSON, use $.get and pass "text" as the dataType parameter. Then you can use the result as your hash and convert it to an object afterwards.
Actually, while writing that last sentence I thought of another solution:
Collect all results from $.get into an array
Sort it with the built-in (C speed) Array.sort
Now you can easily spot and remove duplicates with a single for loop
Again, different JSON strings can produce the same JavaScript object.
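The collect, sort, then scan idea can be sketched as follows. It assumes `responses` is an array of raw response texts you have already collected (for example with $.get and the "text" dataType); only byte-identical strings count as duplicates, and the result comes back in sorted order rather than arrival order.

```javascript
// After sorting, identical strings sit next to each other, so one pass that
// compares each string with its predecessor is enough to drop duplicates.
function uniqueResponses(responses) {
  const sorted = responses.slice().sort(); // copy, so the input is not reordered
  const unique = [];
  for (let i = 0; i < sorted.length; i++) {
    if (i === 0 || sorted[i] !== sorted[i - 1]) {
      unique.push(sorted[i]);
    }
  }
  // Convert to objects only after deduplicating the text.
  return unique.map((text) => JSON.parse(text));
}
```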

Related

Reduce javascript object to unique identifier

I have an object that I'm storing page settings in that looks something like this:
var filters={
"brands":["brand1","brand2","brand3"],
"family":"reds",
"palettes":["palette1","palette2","palette3"],
"color":"a1b2"
};
This object is constantly being changed as the user browses the page. I'm looking for a fast way in the code (maybe using a built-in jQuery or JavaScript function) to reduce the current settings object to a unique identifier I can reference without using a lot of loops. Maybe something like this:
"brandsbrand1brand2brand3familyredspalettespalette1palette2palette3colora1b2"
Doesn't have to necessarily convert the object to a long string like that, as long as it is something that will be unique to a particular group of settings. And I won't need to convert this identifier back into the object later.
EDITS:
I need to give some more information.
I'm looking to store the items of the results of the filters I'm doing inside a variable that's named the same as the unique ID. So, var uniqueID1 is from the settings object that has brand1 and brand2, and contains ["filteredObject1_1","filteredObject1_2"...,"filteredObject1_500"], and var uniqueID2 is from the settings object that has brand3 and brand4, and contains ["filteredObject2_1","filteredObject2_2"...,"filteredObject2_500"]. What I'm looking to do is avoid doing really really slow filtering code more than once on a bunch of items by storing results of the filtering in unique variables.
So:
Convert the settings to a unique id and see if that variable exists.
If variable exists, just get that variable that has the already filtered items.
If variable doesn't exist, do the really slow filtering on hundreds of items and store these items in unique id variable.
Hopefully I just didn't make this more confusing. I feel like I probably made it more confusing.
You can use JSON, which is a method of stringifying objects that was designed for JavaScript.
var filters={
"brands":["brand1","brand2","brand3"],
"family":"reds",
"palettes":["palette1","palette2","palette3"],
"color":"a1b2"
};
var uniqueId = JSON.stringify(filters);
uniqueId equals the following string:
{"brands":["brand1","brand2","brand3"],"family":"reds","palettes":["palette1","palette2","palette3"],"color":"a1b2"}
This has the added benefit of being able to be turned back into an object with JSON.parse(uniqueId).
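Note that with JSON.stringify, two objects that have exactly the same values, with their keys in the same order, will be converted into the same unique id. Objects with the same values but a different key order (for example {a: 1, b: 2} and {b: 2, a: 1}) produce different strings.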
EDIT:
Please let me know if I interpreted your edit correctly. However, I think this is what you want to do.
//object that will act as a cache
var cached_filters = {};
//this assumes the existence of a get_filter function that processes the filters object
function get_cached_filter(filters) {
    var uniqueId = JSON.stringify(filters);
    //use already cached filters (hasOwnProperty, so a cached falsy result still counts)
    if (Object.prototype.hasOwnProperty.call(cached_filters, uniqueId)) {
        return cached_filters[uniqueId];
    }
    //create filter and cache it
    cached_filters[uniqueId] = get_filter(filters);
    return cached_filters[uniqueId];
}
Each time you call get_cached_filter, this stores the result for that filter in cached_filters, keyed by its JSON string. If get_cached_filter has already been called with exactly the same filter, it returns the result from the cache instead of recreating it; otherwise, it creates the result and saves it in the cache.
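Because JSON.stringify follows key insertion order, the same settings built up in a different order would miss the cache. A minimal sketch of a normalized key for a flat filters object like the one in the question follows; filterKey is a hypothetical helper, and it assumes that the order of values inside each array (such as brands) does not affect the filter result.

```javascript
// Hypothetical helper: builds a cache key that ignores the order of keys and
// the order of values inside arrays. Handles one level of nesting only, which
// matches the flat filters object in the question.
function filterKey(filters) {
  const normalized = {};
  for (const name of Object.keys(filters).sort()) {
    const value = filters[name];
    // Copy before sorting so the caller's array is not reordered.
    normalized[name] = Array.isArray(value) ? value.slice().sort() : value;
  }
  return JSON.stringify(normalized);
}
```

You would then call filterKey(filters) in place of JSON.stringify(filters) inside get_cached_filter. Drop the array sort if value order does matter in your filters.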
You could iterate over the filter object and filter the data with Array#filter.
data.filter(function (o) {
    return Object.keys(filters).every(function (k) {
        return Array.isArray(filters[k])
            ? filters[k].some(function (f) { return o[k] === f; })
            : o[k] === filters[k];
    });
});
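Here is a self-contained run of the same every/some logic, written with arrow functions and Array#includes. The items in `data` and the helper name applyFilters are made up for illustration; like the answer, it assumes each item stores a single value under the same property names as the filters.

```javascript
// Settings in the shape of the question's filters object (shortened).
const filters = { brands: ["brand1", "brand3"], family: "reds" };

// Hypothetical items: one value per filter property.
const data = [
  { name: "item1", brands: "brand1", family: "reds" },
  { name: "item2", brands: "brand2", family: "reds" },
  { name: "item3", brands: "brand3", family: "blues" },
  { name: "item4", brands: "brand3", family: "reds" },
];

// An item passes if, for every filter property, its value equals the filter
// value or, for array filters, is one of the listed values.
function applyFilters(items, settings) {
  return items.filter((o) =>
    Object.keys(settings).every((k) =>
      Array.isArray(settings[k]) ? settings[k].includes(o[k]) : o[k] === settings[k]
    )
  );
}

const result = applyFilters(data, filters).map((o) => o.name); // ["item1", "item4"]
```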
If you won't need to convert this identifier back into the object later, you can use this simple hashing function:
function UniqueHashCode(obj){
    var str = JSON.stringify(obj);
    var hash = 0;
    if (str.length == 0) return hash;
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        hash = ((hash<<5)-hash)+code;
        hash = hash & hash; // Convert to 32bit integer
    }
    return hash;
}
var filters={
"brands":["brand1","brand2","brand3"],
"family":"reds",
"palettes":["palette1","palette2","palette3"],
"color":"a1b2"
};
alert(UniqueHashCode(filters));
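This function creates a simple and very short integer (for example 661801383) from the given object. Since it is a 32-bit hash, two different objects can produce the same number, so it is short but not guaranteed to be unique.
I hope this helps :)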
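Because the result is computed as hash * 31 + charCode, truncated to 32 bits, different settings can map to the same number. The sketch below repeats the function (with its loop variables declared) and shows two different objects that collide; the family values "Aa" and "BB" are just a convenient colliding pair, since 65 * 31 + 97 and 66 * 31 + 66 both equal 2112.

```javascript
// Same algorithm as UniqueHashCode above: hash = hash * 31 + charCode,
// kept to 32 bits, over the JSON string of the object.
function UniqueHashCode(obj) {
  const str = JSON.stringify(obj);
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    hash = ((hash << 5) - hash) + code;
    hash = hash & hash; // convert to a 32-bit integer
  }
  return hash;
}

const a = { family: "Aa" };
const b = { family: "BB" };
// Different JSON strings...
console.log(JSON.stringify(a) === JSON.stringify(b)); // false
// ...but the same hash.
console.log(UniqueHashCode(a) === UniqueHashCode(b)); // true
```

If you use such a number as a cache key, keep the full JSON string next to the cached result and compare it on a hit.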

Node.js: How to serialize a large object with circular references

I use Node.js and want to serialize a large JavaScript object to the HDD. The object is basically a "hashmap" and only contains data, not functions. The object contains elements with circular references.
This is an online application, so the process should not block the main loop. In my use case, non-blocking is much more important than speed (the data is live in-memory data and is only loaded at startup; saves are timed backups every X minutes and at shutdown/failure).
What is the best way to do this? Pointers to libraries that do what I want are more than welcome.
I have a nice solution I've been using. Its downside is that it has an O(n^2) runtime which makes me sad.
Here's the code:
// I defined these functions as part of a utility library called "U".
var U = {
    isObj: function(obj, cls) {
        try { return obj.constructor === cls; } catch(e) { return false; }
    },
    straighten: function(item) {
        /*
        Un-circularizes data. Works if `item` is a simple Object, an Array, or any inline value (string, int, null, etc).
        */
        var arr = [];
        U.straighten0(item, arr);
        return arr.map(function(item) { return item.calc; });
    },
    straighten0: function(item, items) {
        /*
        The "meat" of the un-circularization process. Returns the index of `item`
        within the array `items`. If `item` didn't initially exist within
        `items`, it will by the end of this function, therefore this function
        always produces a usable index.
        Also, `item` is guaranteed to have no more circular references (it will
        be in a new format) once its index is obtained.
        */
        /*
        STEP 1) If `item` is already in `items`, simply return its index.
        Note that an object's existence can only be confirmed by comparison to
        itself, not an un-circularized version of itself. For this reason an
        `orig` value is kept ahold of to make such comparisons possible. This
        entails that every entry in `items` has both an `orig` value (the
        original object, for comparison) and a `calc` value (the calculated,
        un-circularized value).
        */
        for (var i = 0, len = items.length; i < len; i++) // This is O(n^2) :(
            if (items[i].orig === item) return i;
        var ind = items.length;
        // STEP 2) Depending on the type of `item`, un-circularize it differently
        if (U.isObj(item, Object)) {
            /*
            STEP 2.1) `item` is an `Object`. Create an un-circularized version of
            that `Object` - keep all its keys, but replace each value with an index
            that points to that value.
            */
            var obj = {};
            items.push({ orig: item, calc: obj }); // Note both `orig` AND `calc`.
            for (var k in item)
                obj[k] = U.straighten0(item[k], items);
        } else if (U.isObj(item, Array)) {
            /*
            STEP 2.2) `item` is an `Array`. Create an un-circularized version of
            that `Array` - replace each of its values with an index that indexes
            the original value.
            */
            var arr = [];
            items.push({ orig: item, calc: arr }); // Note both `orig` AND `calc`.
            for (var i = 0; i < item.length; i++)
                arr.push(U.straighten0(item[i], items));
        } else {
            /*
            STEP 2.3) `item` is a simple inline value. We don't need to make any
            modifications to it, as inline values have no references (let alone
            circular references).
            */
            items.push({ orig: item, calc: item });
        }
        return ind;
    },
    unstraighten: function(items) {
        /*
        Re-circularizes un-circularized data! Used for undoing the effects of
        `U.straighten`. This process will use a particular marker (`unbuilt`) to
        show values that haven't yet been calculated. This is better than using
        `null`, because that would break in the case that the literal value is
        `null`.
        */
        var unbuilt = { UNBUILT: true };
        var initialArr = [];
        // Fill `initialArr` with `unbuilt` references
        for (var i = 0; i < items.length; i++) initialArr.push(unbuilt);
        return U.unstraighten0(items, 0, initialArr, unbuilt);
    },
    unstraighten0: function(items, ind, built, unbuilt) {
        /*
        The "meat" of the re-circularization process. Returns an Object, Array,
        or inline value. The return value may contain circular references.
        */
        if (built[ind] !== unbuilt) return built[ind];
        var item = items[ind];
        /*
        Similar to `straighten`, check the type. Handle Object, Array, and inline
        values separately.
        */
        if (U.isObj(item, Object)) {
            // value is an ordinary object
            var obj = built[ind] = {};
            for (var k in item)
                obj[k] = U.unstraighten0(items, item[k], built, unbuilt);
            return obj;
        } else if (U.isObj(item, Array)) {
            // value is an array
            var arr = built[ind] = [];
            for (var i = 0; i < item.length; i++)
                arr.push(U.unstraighten0(items, item[i], built, unbuilt));
            return arr;
        }
        built[ind] = item;
        return item;
    },
    thingToString: function(thing) {
        /*
        Elegant convenience function to convert any structure (circular or not)
        to a string! Now that this function is available, you can ignore
        `straighten` and `unstraighten`, and the headaches they may cause.
        */
        var st = U.straighten(thing);
        return JSON.stringify(st);
    },
    stringToThing: function(string) {
        /*
        Elegant convenience function to reverse the effect of `U.thingToString`.
        */
        return U.unstraighten(JSON.parse(string));
    }
};
var circular = {
    val: 'haha',
    val2: [ 'hey', 'ho', 'hee' ],
    doesNullWork: null
};
circular.circle1 = circular;
circular.confusing = {
    circular: circular,
    value: circular.val2
};
console.log('Does JSON.stringify work??');
try {
    var str = JSON.stringify(circular);
    console.log('JSON.stringify works!!');
} catch(err) {
    console.log('JSON.stringify doesn\'t work!');
}
console.log('');
console.log('Does U.thingToString work??');
try {
    var str = U.thingToString(circular);
    console.log('U.thingToString works!!');
    console.log('Its result looks like this:');
    console.log(str);
    console.log('And here\'s it converted back into an object:');
    var obj = U.stringToThing(str);
    for (var k in obj) {
        console.log('`obj` has key "' + k + '"');
    }
    console.log('Did `null` work?');
    if (obj.doesNullWork === null)
        console.log('yes!');
    else
        console.log('nope :(');
} catch(err) {
    console.error(err);
    console.log('U.thingToString doesn\'t work!');
}
The whole idea is to serialize a circular structure by placing every value it contains directly into a flat array, and replacing references with indexes into that array.
E.g. if you have an object like this:
{
    val: 'hello',
    anotherVal: 'hi',
    circular: << a reference to itself >>
}
Then U.straighten will produce this structure:
[
    0: {
        val: 1,
        anotherVal: 2,
        circular: 0 // Note that it's become possible to refer to "self" by index! :D
    },
    1: 'hello',
    2: 'hi'
]
Just a couple of extra notes:
I've been using these functions for quite some time in a wide variety of situations! It's very unlikely there are hidden bugs.
The O(n^2) runtime issue could be defeated with an ability to map every object to a unique hash value (which can be implemented). The reason for the O(n^2) nature is that a linear search must be used to find items that have already been un-circularized. Because this linear search occurs within an already linear process, the runtime becomes O(n^2).
These methods actually provide a small amount of compression! Inline values that are the same will not occur twice at different indexes. All identical instances of an inline value will be mapped to the same index. E.g.:
{
    hi: 'hihihihihihihihihihihi-very-long-pls-compress',
    ha: 'hihihihihihihihihihihi-very-long-pls-compress'
}
Becomes (after U.straighten):
[
    0: {
        hi: 1,
        ha: 1
    },
    1: 'hihihihihihihihihihihi-very-long-pls-compress'
]
And finally, in case it wasn't clear, using this code is very easy!! You only ever need to look at U.thingToString and U.stringToThing. The usage of these functions is precisely the same as the usage of JSON.stringify and JSON.parse.
// `circularObj` stands for some big circular object you have, e.g.:
var circularObj = { name: 'root' };
circularObj.self = circularObj;
var serialized = U.thingToString(circularObj);
var unserialized = U.stringToThing(serialized);
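As the notes above say, the O(n^2) cost comes from the linear search over `items`. A Map keyed by the original value (objects compared by identity, primitives by value) can replace that search. The sketch below, straightenFast, is a hypothetical variant rather than part of U: it aims to produce the same flat-array shape as U.straighten, but unlike U.isObj it treats any non-null, non-array object as a plain object and walks only own enumerable keys.

```javascript
// Flattens a possibly circular structure into an array; objects and arrays
// hold indexes into that array, inline values are stored as-is.
function straightenFast(root) {
  const items = [];          // flattened output, one entry per distinct value
  const indexOf = new Map(); // original value -> its index in `items`

  function visit(item) {
    if (indexOf.has(item)) return indexOf.get(item); // O(1) instead of a scan
    const ind = items.length;
    indexOf.set(item, ind); // register before recursing so cycles resolve
    if (Array.isArray(item)) {
      const arr = [];
      items.push(arr);
      for (const el of item) arr.push(visit(el));
    } else if (item !== null && typeof item === "object") {
      const obj = {};
      items.push(obj);
      for (const k of Object.keys(item)) obj[k] = visit(item[k]);
    } else {
      items.push(item); // inline value: string, number, boolean, null
    }
    return ind;
  }

  visit(root);
  return items;
}
```

With the `{ val: 'hello', anotherVal: 'hi', circular: <self> }` example from above, JSON.stringify(straightenFast(obj)) gives `[{"val":1,"anotherVal":2,"circular":0},"hello","hi"]`, and repeated inline values still share one index.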

Referencing index of an Object within an Object using something equivalent to Object.indexOf

I'm surprised that I can't find an answer to this question on StackOverflow (maybe I'm not searching right).
But basically I'm curious to know if there is something similar to the Array.indexOf() method, but for objects. That is, an efficient method of returning the index(es) of a value within an existing Object.
For example, say I have an object:
var obj = { prop1: "a", prop2: "b", prop3: "c", prop4: "a" };
Now I want to find the index(es) that contain "a"; it would be nice to do obj.indexOf("a") and have it return something like ["prop1", "prop4"]
But this doesn't seem to be an implemented method for objects.
Alternatively, I know I can create a function:
function indexOf(val, obj){
    var indexes = [];
    for (var index in obj){
        if(!obj.hasOwnProperty(index)) continue;
        if(obj[index] == val){
            indexes.push(index);
        }
    }
    if(!indexes.length) return false;
    else return indexes;
}
indexOf("a", obj); // returns ["prop1","prop4"]
But this kind of feels clunky to iterate over the whole object this way!! Some of the objects I'll be dealing with will be quite huge and the values might be quite large as well.
Is there a better, more efficient way?
If you have a really huge object, you could use a WeakMap, which has O(1) lookups, to store the keys for each value object. To do that you have to implement your own hash collection, so that when you set a key-value pair you also store the key in the WeakMap.
I also made a benchmark comparing this custom HashMap with a raw object search on jsperf.
function HashMap() {
    this.__map = new WeakMap;
    this.__hash = {};
}
HashMap.prototype = {
    set: function(key, value){
        this.unset(key);
        if (value == null)
            return;
        this.__hash[key] = value;
        var keys = this.__map.get(value);
        if (keys == null)
            this.__map.set(value, keys = []);
        keys.push(key);
    },
    unset: function(key){
        var value = this.__hash[key];
        if (value) {
            var keys = this.__map.get(value),
                index = keys.indexOf(key);
            keys.splice(index, 1);
        }
        this.__hash[key] = void 0;
    },
    get: function(key){
        return this.__hash[key];
    },
    getKeys: function(value){
        return this.__map.get(value);
    }
};
WeakMap polyfills use the Object.defineProperty method at their core, and WeakMap keys must be objects. For these reasons there are some limitations:
browsers: IE9+
only objects can be used as values in the HashMap example above, because they are used as keys in the WeakMap collection
But this approach gives a large performance boost, as there is no need to iterate over the object to look for a specific value.
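A WeakMap cannot take the string "a" from the question as a key. A variant of the same reverse-index idea that uses Map, which accepts primitive keys at the cost of holding strong references, can be sketched like this; ReverseIndexedMap and its method names are made up for this example.

```javascript
// Keeps a reverse index (value -> keys) next to the forward map, so finding
// every key for a value is a lookup instead of a scan over the whole object.
class ReverseIndexedMap {
  constructor() {
    this.byKey = new Map();   // key -> value
    this.byValue = new Map(); // value -> Set of keys (insertion order kept)
  }
  set(key, value) {
    this.delete(key); // drop the key from its old value's index first
    this.byKey.set(key, value);
    if (!this.byValue.has(value)) this.byValue.set(value, new Set());
    this.byValue.get(value).add(key);
  }
  delete(key) {
    if (!this.byKey.has(key)) return;
    const keys = this.byValue.get(this.byKey.get(key));
    keys.delete(key);
    if (keys.size === 0) this.byValue.delete(this.byKey.get(key));
    this.byKey.delete(key);
  }
  get(key) {
    return this.byKey.get(key);
  }
  getKeys(value) {
    const keys = this.byValue.get(value);
    return keys ? [...keys] : [];
  }
}

const props = new ReverseIndexedMap();
props.set("prop1", "a");
props.set("prop2", "b");
props.set("prop3", "c");
props.set("prop4", "a");
console.log(props.getKeys("a")); // ["prop1", "prop4"]
```

The trade-off is the usual one for an index: every write updates two maps, and the values are kept alive as long as the map is.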

How to identify anonymous types in JSON?

I am writing a function in JavaScript which needs to address all the anonymous types in a JSON object.
For example,
Typed= {
emails: [{email:'a#a.com'}, {email:'b#a.com'}, {email:'c#a.com'}, {email:'d#a.com'}]
};
is an example of a typed array in JSON, because each element inside the array is typed email
while,
Anon= {
emails: ['a#a.com', 'b#a.com', 'c#a.com', 'd#a.com']
};
is a JSON object where emails is collection of some anonymous objects.
Is there any way I can differentiate between the two in jQuery or JavaScript?
The simplest solution is to have the JSON source only return one of the two forms. Then you don't have to branch in your client.
If that's not an option, you could get the values out with JavaScript's handy lazy-evaluation of boolean expressions:
var em = json.emails[0].email || json.emails[0];
That statement will prefer the array-of-objects version, but use the array-of-strings version as a fallback.
(edited in response to clarifying comment below)
You can determine what properties a JS object has at runtime like this:
function enumerate(targetObject){
    var props = [];
    for (var propName in targetObject ){
        props.push(propName);
    }
    return props;
}
console.log(enumerate({foo:1, bar:'baz'}).join(',')); //"foo,bar"
you could then modulate your logic on the basis of the properties you get back. You'll want to make sure you understand prototypes (specifically what Object.hasOwnProperty does and means), too.
You can use Array iteration methods to quickly check if all (or some) elements of the array have the desired type:
Anon.emails.every(function(e) { return typeof e == "object" }) // false
Typed.emails.every(function(e) { return typeof e == "object" }) // true
or a more generic solution
var typeCheck = function(type) {
    return function() {
        return typeof arguments[0] == type
    }
}
Anon.emails.every(typeCheck("object")) // false
Typed.emails.every(typeCheck("object")) // true
(An obligatory warning about iteration methods not being supported in ancient browsers)
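Note that typeof null is also "object", so the checks above count a null element as typed. A sketch that classifies the whole array as "typed", "anonymous", or "mixed" while excluding null follows; classifyElements is a made-up name, and an empty array counts as "typed" because every() returns true for it.

```javascript
// True for element objects such as {email: 'a#a.com'}; false for strings,
// numbers, null and nested arrays.
function isTypedElement(e) {
  return e !== null && typeof e === "object" && !Array.isArray(e);
}

function classifyElements(arr) {
  if (arr.every(isTypedElement)) return "typed";
  if (!arr.some(isTypedElement)) return "anonymous";
  return "mixed";
}
```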
How about this:
var istyped = function (a) {
    if (typeof(a) !== 'object') {
        return false;
    }
    var count = 0;
    for (var key in a) {
        count = count + 1;
    }
    return (count === 1);
}
I'm assuming here you just want to distinguish between regular variables (this would be your anonymous variable) and objects with just one key/value pair inside (this would be your typed variable).
To check if array contains only typed variables you'd just have to loop through it with that function. For example (in newer versions of JavaScript):
Typed.emails.every(istyped) // true
Anon.emails.every(istyped) // false
Why not do a map first:
emails = emails.map(function (email) {
    if (typeof email.email === 'string')
        return email.email;
    return email; // already a plain string
});
That will make your emails array an array of just strings. Then you can process it as usual. If it already is an array of strings, nothing changes (email.email will be undefined, so each string is returned as-is).
I do stuff like this when I have to make one client deal with multiple versions of an API. Alternatively, you could do the map the other way:
emails = emails.map(function (email) {
    if (typeof email === 'string')
        return {email: email};
    return email; // already an object
});
This would work better if there could be other information in each object in your emails array.

In Javascript, given value, find name from Object literal

I'm new to JavaScript and trying to find an easier way to find the name given a value from an object literal.
e.g.
var cars ={ Toyata: ['Camry','Prius','Highlander'],
Honda: ['Accord', 'Civic', 'Pilot'],
Nissan: ['Altima', 'Sentra', 'Quest']};
Given 'Accord', I want to get Honda from the object cars.
You would need to loop through, like this:
function getManufacturer(carName) {
    for(var key in cars) {
        if(cars.hasOwnProperty(key)) {
            for(var i=0; i<cars[key].length; i++) {
                if(cars[key][i] == carName) return key;
            }
        }
    }
    return "Not found";
}
For the sake of working cross-browser, this ignores the existence of .indexOf(), since IE before version 9 doesn't have it on arrays. The .indexOf() version would look like this:
function getManufacturer(carName) {
    for(var key in cars) {
        if(cars.hasOwnProperty(key) && cars[key].indexOf(carName) != -1) {
            return key;
        }
    }
    return "Not found";
}
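In environments with ES2016 support (so not old IE), the same lookup can be written with Object.keys, Array#find and Array#includes. Object.keys returns only the object's own enumerable keys, so no hasOwnProperty check is needed. A sketch:

```javascript
const cars = {
  Toyata: ["Camry", "Prius", "Highlander"],
  Honda: ["Accord", "Civic", "Pilot"],
  Nissan: ["Altima", "Sentra", "Quest"],
};

// find stops at the first manufacturer whose model list contains carName;
// "Not found" matches the return value of the functions above.
function getManufacturer(carName) {
  const match = Object.keys(cars).find((key) => cars[key].includes(carName));
  return match !== undefined ? match : "Not found";
}

console.log(getManufacturer("Accord")); // Honda
```

This still scans every list on each call; for repeated lookups, the reverse mapping below is the better fit.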
If you're going to be doing this once, then use a function like the one given by Bobby. If you're going to be doing this multiple times then I'd suggest creating a reverse mapping of cars to manufacturers:
var manufacturers = {};
// create a map of car models to manufacturers:
for (var manf in cars) {
    /* see note below */
    for (var i=0; i<cars[manf].length; i++) {
        manufacturers[cars[manf][i]] = manf;
    }
}
// Now referencing the manufacturers is
// a very fast hash table lookup away:
var model = 'Accord';
alert(manufacturers[model]);
Note for those with itchy downvoting fingers: for objects that don't inherit anything, as given in the OP, a hasOwnProperty check here is unnecessary. For objects that do inherit, it depends on the programmer. If you want composability via inheritance, then a hasOwnProperty check is exactly what you DON'T want. If you don't care about inheritance, then use a hasOwnProperty check, but if so you would not be inheriting in the first place, which would make a hasOwnProperty check unnecessary. In the rare case where you are forced to create the object via inheritance but don't want to check the parent's attributes, you should do a hasOwnProperty check. Of course, if you use a library like Prototype.js that insists on modifying the Object object, then I feel sorry for you, because you are forced to do a hasOwnProperty check.
Maintain a separate mapping of models to manufacturers.
var cars ={ Toyata: ['Camry','Prius','Highlander'],
Honda: ['Accord', 'Civic', 'Pilot'],
Nissan: ['Altima', 'Sentra', 'Quest']};
var models = {};
var hasOwnProperty = Object.prototype.hasOwnProperty;
for (var key in cars) {
    if (hasOwnProperty.call(cars, key)) {
        var i = 0, l = cars[key].length, model;
        while (i < l) {
            model = cars[key][i]; // map each model, not the whole array
            if ( ! hasOwnProperty.call(models, model)) {
                models[model] = key;
            } else {
                // Throw an error, or change the value to an array of values
            }
            i++;
        }
    }
}
