Priority Queues have a priority value and data, for every entry.
Thus, when adding a new element to the queue, it bubbles up to the surface if it has a higher priority value than elements already in the collection.
When one calls pop, we get the data for the element with highest priority.
What is an efficient implementation of such a priority queue in Javascript?
Does it make sense to have a new object called PriorityQueue, create two methods (push and pop) that take two params (data, priority)? That much makes sense to me as a coder, but I'm uncertain of which data structure to use in the underbelly that will allow manipulation of the ordering of elements. Or can we just store it all in an array and walk through the array every time to grab the element with max priority?
What's a good way to do this?
Below is what I believe to be a truly efficient version of a PriorityQueue which uses an array-based binary heap (where the root is at index 0, and the children of a node at index i are at indices 2i + 1 and 2i + 2, respectively).
This implementation includes the classical priority queue methods like push, peek, pop, and size, as well as convenience methods isEmpty and replace (the latter being a more efficient substitute for a pop followed immediately by a push). Values are stored not as [value, priority] pairs, but simply as values; this allows for automatic prioritization of types that can be natively compared using the > operator. A custom comparator function passed to the PriorityQueue constructor can be used to emulate the behavior of pairwise semantics, however, as shown in the example below.
Heap-based Implementation:
const top = 0;
const parent = i => ((i + 1) >>> 1) - 1;
const left = i => (i << 1) + 1;
const right = i => (i + 1) << 1;
class PriorityQueue {
constructor(comparator = (a, b) => a > b) {
this._heap = [];
this._comparator = comparator;
}
size() {
return this._heap.length;
}
isEmpty() {
return this.size() == 0;
}
peek() {
return this._heap[top];
}
push(...values) {
values.forEach(value => {
this._heap.push(value);
this._siftUp();
});
return this.size();
}
pop() {
const poppedValue = this.peek();
const bottom = this.size() - 1;
if (bottom > top) {
this._swap(top, bottom);
}
this._heap.pop();
this._siftDown();
return poppedValue;
}
replace(value) {
const replacedValue = this.peek();
this._heap[top] = value;
this._siftDown();
return replacedValue;
}
_greater(i, j) {
return this._comparator(this._heap[i], this._heap[j]);
}
_swap(i, j) {
[this._heap[i], this._heap[j]] = [this._heap[j], this._heap[i]];
}
_siftUp() {
let node = this.size() - 1;
while (node > top && this._greater(node, parent(node))) {
this._swap(node, parent(node));
node = parent(node);
}
}
_siftDown() {
let node = top;
while (
(left(node) < this.size() && this._greater(left(node), node)) ||
(right(node) < this.size() && this._greater(right(node), node))
) {
let maxChild = (right(node) < this.size() && this._greater(right(node), left(node))) ? right(node) : left(node);
this._swap(node, maxChild);
node = maxChild;
}
}
}
Example:
{const top=0,parent=c=>(c+1>>>1)-1,left=c=>(c<<1)+1,right=c=>c+1<<1;class PriorityQueue{constructor(c=(d,e)=>d>e){this._heap=[],this._comparator=c}size(){return this._heap.length}isEmpty(){return 0==this.size()}peek(){return this._heap[top]}push(...c){return c.forEach(d=>{this._heap.push(d),this._siftUp()}),this.size()}pop(){const c=this.peek(),d=this.size()-1;return d>top&&this._swap(top,d),this._heap.pop(),this._siftDown(),c}replace(c){const d=this.peek();return this._heap[top]=c,this._siftDown(),d}_greater(c,d){return this._comparator(this._heap[c],this._heap[d])}_swap(c,d){[this._heap[c],this._heap[d]]=[this._heap[d],this._heap[c]]}_siftUp(){for(let c=this.size()-1;c>top&&this._greater(c,parent(c));)this._swap(c,parent(c)),c=parent(c)}_siftDown(){for(let d,c=top;left(c)<this.size()&&this._greater(left(c),c)||right(c)<this.size()&&this._greater(right(c),c);)d=right(c)<this.size()&&this._greater(right(c),left(c))?right(c):left(c),this._swap(c,d),c=d}}window.PriorityQueue=PriorityQueue}
// Default comparison semantics
const queue = new PriorityQueue();
queue.push(10, 20, 30, 40, 50);
console.log('Top:', queue.peek()); //=> 50
console.log('Size:', queue.size()); //=> 5
console.log('Contents:');
while (!queue.isEmpty()) {
console.log(queue.pop()); //=> 40, 30, 20, 10
}
// Pairwise comparison semantics
const pairwiseQueue = new PriorityQueue((a, b) => a[1] > b[1]);
pairwiseQueue.push(['low', 0], ['medium', 5], ['high', 10]);
console.log('\nContents:');
while (!pairwiseQueue.isEmpty()) {
console.log(pairwiseQueue.pop()[0]); //=> 'high', 'medium', 'low'
}
.as-console-wrapper{min-height:100%}
You should use standard libraries like e.g. the Closure Library (goog.structs.PriorityQueue):
https://google.github.io/closure-library/api/goog.structs.PriorityQueue.html
By clicking at the source code, you will know it is actually linking to goog.structs.Heap which you can follow:
https://github.com/google/closure-library/blob/master/closure/goog/structs/heap.js
I was not satisfied with the efficiency of existing priority queue implementations, so I decided to make my own:
https://github.com/luciopaiva/heapify
npm i heapify
This will run faster than any other publicly known implementation due to the use of typed arrays.
Works on both client and server ends, code base with 100% test coverage, tiny library (~100 LoC). Also, the interface is really simple. Here's some code:
import Heapify from "heapify";
const queue = new Heapify();
queue.push(1, 10); // insert item with key=1, priority=10
queue.push(2, 5); // insert item with key=2, priority=5
queue.pop(); // 2
queue.peek(); // 1
queue.peekPriority(); // 10
I provide here the implementation I use. I made the following decisions:
I often find that I need to store some payload together with the values by which the heap will be ordered. So I opted to have the heap consist of arrays, where the first element of the array must be the value to be used for the heap order. Any other elements in these arrays will just be payload that is not inspected.
True, a pure integer array, without room for payload, would make a faster implementation possible, but in practice I then find myself creating a Map to link those values with additional data (the payload). The administration of such a Map (also dealing with duplicate values!) destroys the benefits you get from such an integer-only array.
Using a user-defined comparator function comes with a performance cost, so I decided not to work with that. Instead the values are compared using comparison operators (<, >, ...). This works fine for numbers, bigints, strings, and Date instances. In case the values are objects that would not order well like that, their valueOf should be overridden to guarantee the desired ordering. Or, such objects should be provided as payload, and the object's property that really defines the order, should be given as the value (in first array position).
Extending the Array class also turned out to degrade the performance somewhat, so I opted to provide utility functions that take the heap (an Array instance) as first argument. This resembles how in Python the heapq module works and gives a "light" feeling to it: You work directly with your own array. No new, no inheritance, just plain functions acting on your array.
The usual sift-up and sift-down operations should not perform repeated swaps between parent and child, but only copy the tree values in one direction until the final insertion spot has been found, and only then the given value should be stored in that spot.
It should include a heapify function so an already populated array can be reordered into a heap. It should run in linear time so that it is more efficient than if you would start with an empty heap and then push each node unto it.
Here follows that collection of functions, with comments, and a simple demo at the end:
/* MinHeap:
* A collection of functions that operate on an array
* of [key,...data] elements (nodes).
*/
const MinHeap = {
/* siftDown:
* The node at the given index of the given heap is sifted down in
* its subtree until it does not have a child with a lesser value.
*/
siftDown(arr, i=0, value=arr[i]) {
if (i < arr.length) {
let key = value[0]; // Grab the value to compare with
while (true) {
// Choose the child with the least value
let j = i*2+1;
if (j+1 < arr.length && arr[j][0] > arr[j+1][0]) j++;
// If no child has lesser value, then we've found the spot!
if (j >= arr.length || key <= arr[j][0]) break;
// Copy the selected child node one level up...
arr[i] = arr[j];
// ...and consider the child slot for putting our sifted node
i = j;
}
arr[i] = value; // Place the sifted node at the found spot
}
},
/* heapify:
* The given array is reordered in-place so that it becomes a valid heap.
* Elements in the given array must have a [0] property (e.g. arrays).
* That [0] value serves as the key to establish the heap order. The rest
* of such an element is just payload. It also returns the heap.
*/
heapify(arr) {
// Establish heap with an incremental, bottom-up process
for (let i = arr.length>>1; i--; ) this.siftDown(arr, i);
return arr;
},
/* pop:
* Extracts the root of the given heap, and returns it (the subarray).
* Returns undefined if the heap is empty
*/
pop(arr) {
// Pop the last leaf from the given heap, and exchange it with its root
return this.exchange(arr, arr.pop()); // Returns the old root
},
/* exchange:
* Replaces the root node of the given heap with the given node, and
* returns the previous root. Returns the given node if the heap is empty.
* This is similar to a call of pop and push, but is more efficient.
*/
exchange(arr, value) {
if (!arr.length) return value;
// Get the root node, so to return it later
let oldValue = arr[0];
// Inject the replacing node using the sift-down process
this.siftDown(arr, 0, value);
return oldValue;
},
/* push:
* Inserts the given node into the given heap. It returns the heap.
*/
push(arr, value) {
let key = value[0],
// First assume the insertion spot is at the very end (as a leaf)
i = arr.length,
j;
// Then follow the path to the root, moving values down for as long
// as they are greater than the value to be inserted
while ((j = (i-1)>>1) >= 0 && key < arr[j][0]) {
arr[i] = arr[j];
i = j;
}
// Found the insertion spot
arr[i] = value;
return arr;
}
};
// Simple Demo:
let heap = [];
MinHeap.push(heap, [26, "Helen"]);
MinHeap.push(heap, [15, "Mike"]);
MinHeap.push(heap, [20, "Samantha"]);
MinHeap.push(heap, [21, "Timothy"]);
MinHeap.push(heap, [19, "Patricia"]);
let [age, name] = MinHeap.pop(heap);
console.log(`${name} is the youngest with ${age} years`);
([age, name] = MinHeap.pop(heap));
console.log(`Next is ${name} with ${age} years`);
For a more realistic example, see the implementation of Dijkstra's shortest path algorithm.
Here is the same MinHeap collection, but minified, together with its MaxHeap mirror:
const MinHeap={siftDown(h,i=0,v=h[i]){if(i<h.length){let k=v[0];while(1){let j=i*2+1;if(j+1<h.length&&h[j][0]>h[j+1][0])j++;if(j>=h.length||k<=h[j][0])break;h[i]=h[j];i=j;}h[i]=v}},heapify(h){for(let i=h.length>>1;i--;)this.siftDown(h,i);return h},pop(h){return this.exchange(h,h.pop())},exchange(h,v){if(!h.length)return v;let w=h[0];this.siftDown(h,0,v);return w},push(h,v){let k=v[0],i=h.length,j;while((j=(i-1)>>1)>=0&&k<h[j][0]){h[i]=h[j];i=j}h[i]=v;return h}};
const MaxHeap={siftDown(h,i=0,v=h[i]){if(i<h.length){let k=v[0];while(1){let j=i*2+1;if(j+1<h.length&&h[j][0]<h[j+1][0])j++;if(j>=h.length||k>=h[j][0])break;h[i]=h[j];i=j;}h[i]=v}},heapify(h){for(let i=h.length>>1;i--;)this.siftDown(h,i);return h},pop(h){return this.exchange(h,h.pop())},exchange(h,v){if(!h.length)return v;let w=h[0];this.siftDown(h,0,v);return w},push(h,v){let k=v[0],i=h.length,j;while((j=(i-1)>>1)>=0&&k>h[j][0]){h[i]=h[j];i=j}h[i]=v;return h}};
Took some inspiration from #gyre's answer and wrote a minimalistic version in TypeScript, that is about 550 bytes minified.
type Comparator<T> = (valueA: T, valueB: T) => number;
const swap = (arr: unknown[], i: number, j: number) => {
[arr[i], arr[j]] = [arr[j], arr[i]];
};
class PriorityQueue<T> {
#heap;
#isGreater;
constructor(comparator: Comparator<T>);
constructor(comparator: Comparator<T>, init: T[] = []) {
this.#heap = init;
this.#isGreater = (a: number, b: number) =>
comparator(init[a] as T, init[b] as T) > 0;
}
get size(): number {
return this.#heap.length;
}
peek(): T | undefined {
return this.#heap[0];
}
add(value: T): void {
this.#heap.push(value);
this.#siftUp();
}
poll(): T | undefined;
poll(
heap = this.#heap,
value = heap[0],
length = heap.length
): T | undefined {
if (length) {
swap(heap, 0, length - 1);
}
heap.pop();
this.#siftDown();
return value;
}
#siftUp(): void;
#siftUp(node = this.size - 1, parent = ((node + 1) >>> 1) - 1): void {
for (
;
node && this.#isGreater(node, parent);
node = parent, parent = ((node + 1) >>> 1) - 1
) {
swap(this.#heap, node, parent);
}
}
#siftDown(): void;
#siftDown(size = this.size, node = 0, isGreater = this.#isGreater): void {
while (true) {
const leftNode = (node << 1) + 1;
const rightNode = leftNode + 1;
if (
(leftNode >= size || isGreater(node, leftNode)) &&
(rightNode >= size || isGreater(node, rightNode))
) {
break;
}
const maxChild =
rightNode < size && isGreater(rightNode, leftNode)
? rightNode
: leftNode;
swap(this.#heap, node, maxChild);
node = maxChild;
}
}
}
Usage:
const numberComparator: Comparator<number> = (numberA, numberB) => {
return numberA - numberB;
};
const queue = new PriorityQueue(numberComparator);
queue.add(10);
queue.add(30);
queue.add(20);
while (queue.size) {
console.log(queue.poll());
}
Related
I'm trying to implement a Linked List with Array's methods. Currently trying to implement indexOf and includes. I was looking into the following corner case:
var arr = [1,2,,4,5];
console.log(arr.indexOf(undefined));
console.log(arr.includes(undefined));
console.log(arr[2] === undefined);
Output:
-1
true
true
It looks like, for arr=[1,2,,4,5] it will actually keep undefined at arr[2]. Reading ECMA-262 and based on this page, indexOf uses "Strict Equality" (operator === ) and includes uses SameValueZero (Same as Object.is, except -0 is considered equal to +0). My functions are:
isStrictlyEqual(x,y) {
return x === y;
}
sameValue(x, y) {
return Object.is(x,y);
}
sameValueZero(x, y) {
return x === y || (Number.isNaN(x) && Number.isNaN(y));
}
The implementation of sameValueZero was taken from this topic.
My indexOf uses this isStrictlyEqual and my includes uses this sameValueZero. But, sameValueZero(undefined,undefined) returns true so My indexOf return index 2 for indexOf(undefined) while arr.indexOf(undefined) returns -1. Note that [1,2,undefined,4,5].indexOf(undefined) returns 2. So my only guess is that Array does not actually store this empty item as undefined.
I have a method that allows to fill a linked list from an existing array, like so:
_extend(iterable) {
for (var i = 0; i < iterable.length; ++i) {
this.push(iterable[i]);
}
In this case iterable[2] will return undefined so my DS will save undefined at index 2. Also tried to use forEach, but it actually skips the empty items.
I want my linked list to be similar to implementation of Array and follow ECMA specification. How my linked list should treat this corner case? Keeping undefined at that place, didn't work because of indexOf. I guess I need to distinguish between an empty item and undefined. How can I do so? I came across with this topic - does it mean I need to do two loops on the array? One regular for and one for in? Would I need to create a dummy node in that case? Does it make sense to do it for Linked list?
My linked list class:
class DoublyLinkedList {
constructor(array=null) {
this.head = null;
this.tail = null;
this.length = 0;
if (array !== null) {
this._extend(array);
}
}
}
Yes, you can (have to) add a dummy node and check for it in includes (and I guess also find and findIndex). Here's a POC:
const HOLE = Symbol()
class MyArray {
constructor() {
this.items = []
}
extend(a) {
for (let i = 0; i < a.length; i++)
this.items.push(
a.hasOwnProperty(i) ? a[i] : HOLE
)
}
indexOf(val) {
for (let i = 0; i < this.items.length; i++)
if (this.items[i] === val)
return i
return -1
}
includes(val) {
for (let i = 0; i < this.items.length; i++)
if (Object.is(this.items[i], val) || (val === undefined && this.items[i] === HOLE))
return true
return false
}
}
m = new MyArray()
m.extend([1, 2, 3, , 5])
console.log(m.indexOf(undefined))
console.log(m.includes(undefined))
m = new MyArray()
m.extend([1, 2, 3, undefined, 5])
console.log(m.indexOf(undefined))
console.log(m.includes(undefined))
I am a noobie in JavaScript algorithm and cannot understand this optimal solution of the 2-sum
function twoNumberSum(array, target) {
const nums = {};
for (const num of array) {
const potentialMatch = target - num;
console.log('potential', potentialMatch);
if (potentialMatch in nums) {
return [potentialMatch, num]
} else {
nums[num] = true;
}
}
}
So the 2-sum problem basically says "find two numbers in an array that sum to the given target, and return their index". Let's walk through this code and talk about what's happening.
First, we start the function; I'm going to assume this makes sense (a function that's called twoNumberSum that takes in two arguments; namely, array and target) - note that in JS, we don't annotate types, so there is no return type
Now, first thing we do is create a new object called nums. In JS, objects are effectively hash maps (with some very important differences - see my note below); they store a key and a corresponding value. In JS, a key can be any string or number
Next, we start our iteration. If we do for (const a of b), and b is an array, this iterates over all the values of the array, with each iteration having that value stored in a.
Next, we subtract our current value from the target. Then comes the key line: if (potentialMatch in nums). The in keyword checks for the existence of a key: 'hello' in obj returns true if obj has the key 'hello'.
In this case, if we find this potential match, then that means we have found another number that is equal to target - num, which of course means we've found the other partner for our sum! So in this case, we simply return the two numbers. If, on the other hand, we do not find this potentialMatch, that means we need to keep looking. But we do want to remember we've seen this number - thus, we add it as a key by doing nums[num] = true (this creates a new key-value pair; namely the key is num and the value is true).
As one of the comments explained, this is just trying to keep track of a list of numbers; however, the author is trying to be clever by using a Hash Table instead of a normal array. This way, lookups are O(1) instead of O(n). For eyes not used to JS semantics, another way of explaining this code is that we build up a Map of the numbers, and then we check that map for our target value.
I mentioned earlier that using objects as hash tables isn't the best idea; this is because if you aren't careful, if you use user-provided keys, you can accidentally mess with what's called the Prototype Chain. This is beyond this discussion, but a better way forward would be to use a Set:
function twoNumberSum(array, target) {
// Create a new Hash Set. Sets take in an iterable, so we could
// Do it this way. But to remain as close to your original solution
// as possible, we won't for now, and instead populate it as we go
// const nums = new Set(array);
const nums = new Set();
for (const num of array) {
const potentialMatch = target - num;
if (nums.has(potentialMatch)) {
return [potentialMatch, num];
} else {
nums.add(num);
}
}
Sometimes, the problem instead asks for you to return the indices; using a Map instead makes this relatively trivial. Just store the index as the value and you're good to go!
function twoNumberSum(array, target) {
// Create the new map instead
const nums = new Map();
for (let n = 0; n < array.length; ++n) {
const potentialMatch = target - array[n];
if (nums.has(potentialMatch)) {
return [nums.get(potentialMatch), n];
} else {
nums.set(array[n], n);
}
}
Let me explain to you what it's all is working-.
function twoNumberSum(array, target) {
// This is and object in Javascript
const nums = {};
for (const num of array) { // This is for of loop which iterates the array.
//For of Doc - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of
// Here's its calculating the potential.
const potentialMatch = target - num;
console.log('potential - ' + potentialMatch);
/**
* Nowhere `in` is used which check if any property exists in an object or not.
* in Usage - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/in
*
* It checks whether potential exists in the `nums` object, If exist it returns the array
* with potentialMatch and num to which it is matched.
*
* If the number is not there in nums object. It's setting there in else block
* to match in net iteration.
*/
if (potentialMatch in nums) {
return [potentialMatch, num]
} else {
nums[num] = true;
/**
* It forms an object when the potential match doesn't exist in nums for checking in the next iteration
* {
* 1: true,
* 2: true
* }
*/
}
console.log(nums)
}
}
console.log(twoNumberSum([1, 2, 4, 5, 6, 7, 8], 3))
You can also Run it from JSBin
I have a general question which is about whether it is possible to make zero-allocation iterators in Javascript. Note that by "iterator" I am not married to the current definition of iterator in ECMAScript, but just a general pattern for iterating over user-defined ranges.
To make the problem concrete, say I have a list like [5, 5, 5, 2, 2, 1, 1, 1, 1] and I want to group adjacent repetitions together, and process it into a form which is more like [5, 3], [2, 2], [1, 4]. I then want to access each of these pairs inside a loop, something like "for each pair in grouped(array), do something with pair". Furthermore, I want to reuse this grouping algorithm in many places, and crucially, in some really hot inner loops (think millions of loops per second).
Question: Is there an iteration pattern to accomplish this which has zero overhead, as if I hand-wrote the loop myself?
Here are the things I've tried so far. Let's suppose for concreteness that I am trying to compute the sum of all pairs. (To be clear I am not looking for alternative ways of writing this code, I am looking for an abstraction pattern: the code is just here to provide a concrete example.)
Inlining the grouping code by hand. This method performs the best, but obscures the intent of the computation. Furthermore, inlining by hand is error-prone and annoying.
function sumPairs(array) {
let sum = 0
for (let i = 0; i != array.length; ) {
let elem = array[i++], count = 1
while (i < array.length && array[i] == elem) { i++; count++; }
// Here we can actually use the pair (elem, count)
sum += elem + count
}
return sum
}
Using a visitor pattern. We can write a reduceGroups function which will call a given visitor(acc, elem, count) for each pair (elem, count), similar to the usual Array.reduce method. With that our computation becomes somewhat clearer to read.
function sumPairsVisitor(array) {
return reduceGroups(array, (sofar, elem, count) => sofar + elem + count, 0)
}
Unfortunately, Firefox in particular still allocates when running this function, unless the closure definition is manually moved outside the function. Furthermore, we lose the ability to use control structures like break unless we complicate the interface a lot.
Writing a custom iterator. We can make a custom "iterator" (not an ES6 iterator) which exposes elem and count properties, an empty property indicating that there are no more pairs remaining, and a next() method which updates elem and count to the next pair. The consuming code looks like this:
function sumPairsIterator(array) {
let sum = 0
for (let iter = new GroupIter(array); !iter.empty; iter.next())
sum += iter.elem + iter.count
return sum
}
I find this code the easiest to read, and it seems to me that it should be the fastest method of abstraction. (In the best possible case, scalar replacement could completely collapse the iterator definition into the function. In the second best case, it should be clear that the iterator does not escape the for loop, so it can be stack-allocated). Unfortunately, both Chrome and Firefox seem to allocate here.
Of the approaches above, the custom-defined iterator performs quite well in most cases, except when you really need to put the pedal to the metal in a hot inner loop, at which point the GC pressure becomes apparent.
I would also be ok with a Javascript post-processor (the Google Closure Compiler perhaps?) which is able to accomplish this.
Check this out. I've not tested its performance but it should be good.
(+) (mostly) compatible to ES6 iterators.
(-) sacrificed ...GroupingIterator.from(arr) in order to not create a (imo. garbage) value-object. That's the mostly in the point above.
afaik, the primary use case for this is a for..of loop anyways.
(+) no objects created (GC)
(+) object pooling for the iterators; (again GC)
(+) compatible with controll-structures like break
class GroupingIterator {
/* object pooling */
static from(array) {
const instance = GroupingIterator._pool || new GroupingIterator();
GroupingIterator._pool = instance._pool;
instance._pool = null;
instance.array = array;
instance.done = false;
return instance;
}
static _pool = null;
_pool = null;
/* state and value / payload */
array = null;
element = null;
index = 0;
count = 0;
/* IteratorResult interface */
value = this;
done = true;
/* Iterator interface */
next() {
const array = this.array;
let index = this.index += this.count;
if (!array || index >= array.length) {
return this.return();
}
const element = this.element = array[index];
while (++index < array.length) {
if (array[index] !== element) break;
}
this.count = index - this.index;
return this;
}
return() {
this.done = true;
// cleanup
this.element = this.array = null;
this.count = this.index = 0;
// return iterator to pool
this._pool = GroupingIterator._pool;
return GroupingIterator._pool = this;
}
/* Iterable interface */
[Symbol.iterator]() {
return this;
}
}
var arr = [5, 5, 5, 2, 2, 1, 1, 1, 1];
for (const item of GroupingIterator.from(arr)) {
console.log("element", item.element, "index", item.index, "count", item.count);
}
I use Node.js and want to serialize a large javascript object to HDD. The object is basically a "hashmap" and only contains data, not functions. The object contains elements with circular references.
This is an online application so the process should not block the main loop. In my use-case Non-blocking is much more important than speed (data is live in-memory data and is only load at startup, saves are for timed backups every X minutes and at shutdown/failure)
What is the best way to do this? Pointers to libraries that do what I want are more than welcome.
I have a nice solution I've been using. Its downside is that it has an O(n^2) runtime which makes me sad.
Here's the code:
// I defined these functions as part of a utility library called "U".
var U = {
isObj: function(obj, cls) {
try { return obj.constructor === cls; } catch(e) { return false; };
},
straighten: function(item) {
/*
Un-circularizes data. Works if `item` is a simple Object, an Array, or any inline value (string, int, null, etc).
*/
var arr = [];
U.straighten0(item, arr);
return arr.map(function(item) { return item.calc; });
},
straighten0: function(item, items) {
/*
The "meat" of the un-circularization process. Returns the index of `item`
within the array `items`. If `item` didn't initially exist within
`items`, it will by the end of this function, therefore this function
always produces a usable index.
Also, `item` is guaranteed to have no more circular references (it will
be in a new format) once its index is obtained.
*/
/*
STEP 1) If `item` is already in `items`, simply return it.
Note that an object's existence can only be confirmed by comparison to
itself, not an un-circularized version of itself. For this reason an
`orig` value is kept ahold of to make such comparisons possible. This
entails that every entry in `items` has both an `orig` value (the
original object, for comparison) and a `calc` value (the calculated, un
circularized value).
*/
for (var i = 0, len = items.length; i < len; i++) // This is O(n^2) :(
if (items[i].orig === item) return i;
var ind = items.length;
// STEP 2) Depending on the type of `item`, un-circularize it differently
if (U.isObj(item, Object)) {
/*
STEP 2.1) `item` is an `Object`. Create an un-circularized version of
that `Object` - keep all its keys, but replace each value with an index
that points to that values.
*/
var obj = {};
items.push({ orig: item, calc: obj }); // Note both `orig` AND `calc`.
for (var k in item)
obj[k] = U.straighten0(item[k], items);
} else if (U.isObj(item, Array)) {
/*
STEP 2.2) `item` is an `Array`. Create an un-circularized version of
that `Array` - replace each of its values with an index that indexes
the original value.
*/
var arr = [];
items.push({ orig: item, calc: arr }); // Note both `orig` AND `calc`.
for (var i = 0; i < item.length; i++)
arr.push(U.straighten0(item[i], items));
} else {
/*
STEP 2.3) `item` is a simple inline value. We don't need to make any
modifications to it, as inline values have no references (let alone
circular references).
*/
items.push({ orig: item, calc: item });
}
return ind;
},
unstraighten: function(items) {
/*
Re-circularizes un-circularized data! Used for undoing the effects of
`U.straighten`. This process will use a particular marker (`unbuilt`) to
show values that haven't yet been calculated. This is better than using
`null`, because that would break in the case that the literal value is
`null`.
*/
var unbuilt = { UNBUILT: true };
var initialArr = [];
// Fill `initialArr` with `unbuilt` references
for (var i = 0; i < items.length; i++) initialArr.push(unbuilt);
return U.unstraighten0(items, 0, initialArr, unbuilt);
},
unstraighten0: function(items, ind, built, unbuilt) {
/*
The "meat" of the re-circularization process. Returns an Object, Array,
or inline value. The return value may contain circular references.
*/
if (built[ind] !== unbuilt) return built[ind];
var item = items[ind];
var value = null;
/*
Similar to `straighten`, check the type. Handle Object, Array, and inline
values separately.
*/
if (U.isObj(item, Object)) {
// value is an ordinary object
var obj = built[ind] = {};
for (var k in item)
obj[k] = U.unstraighten0(items, item[k], built, unbuilt);
return obj;
} else if (U.isObj(item, Array)) {
// value is an array
var arr = built[ind] = [];
for (var i = 0; i < item.length; i++)
arr.push(U.unstraighten0(items, item[i], built, unbuilt));
return arr;
}
built[ind] = item;
return item;
},
thingToString: function(thing) {
/*
Elegant convenience function to convert any structure (circular or not)
to a string! Now that this function is available, you can ignore
`straighten` and `unstraighten`, and the headaches they may cause.
*/
var st = U.straighten(thing);
return JSON.stringify(st);
},
stringToThing: function(string) {
/*
Elegant convenience function to reverse the effect of `U.thingToString`.
*/
return U.unstraighten(JSON.parse(string));
}
};
var circular = {
val: 'haha',
val2: [ 'hey', 'ho', 'hee' ],
doesNullWork: null
};
circular.circle1 = circular;
circular.confusing = {
circular: circular,
value: circular.val2
};
console.log('Does JSON.stringify work??');
try {
var str = JSON.stringify(circular);
console.log('JSON.stringify works!!');
} catch(err) {
console.log('JSON.stringify doesn\'t work!');
}
console.log('');
console.log('Does U.thingToString work??');
try {
var str = U.thingToString(circular);
console.log('U.thingToString works!!');
console.log('Its result looks like this:')
console.log(str);
console.log('And here\'s it converted back into an object:');
var obj = U.stringToThing(str);
for (var k in obj) {
console.log('`obj` has key "' + k + '"');
}
console.log('Did `null` work?');
if (obj.doesNullWork === null)
console.log('yes!');
else
console.log('nope :(');
} catch(err) {
console.error(err);
console.log('U.thingToString doesn\'t work!');
}
The whole idea is to serialize some circular structure by placing every object within directly into an array.
E.g. if you have an object like this:
{
val: 'hello',
anotherVal: 'hi',
circular: << a reference to itself >>
}
Then U.straighten will produce this structure:
[
0: {
val: 1,
anotherVal: 2,
circular: 0 // Note that it's become possible to refer to "self" by index! :D
},
1: 'hello',
2: 'hi'
]
Just a couple of extra notes:
I've been using these functions for quite some time in a wide variety of situations! It's very unlikely there are hidden bugs.
The O(n^2) runtime issue could be defeated with an ability to map every object to a unique hash value (which can be implemented). The reason for the O(n^2) nature is a linear search must be used to find items that have already been circularized. Because this linear search is occurring within an already linear process, the runtime becomes O(n^2)
These methods actual provide a small amount of compression! Inline values that are the same will not occur twice at different indexes. All same instances of an inline value will be mapped to the same index. E.g.:
{
hi: 'hihihihihihihihihihihi-very-long-pls-compress',
ha: 'hihihihihihihihihihihi-very-long-pls-compress'
}
Becomes (after U.straighten):
[
0: {
hi: 1,
ha: 1
},
1: 'hihihihihihihihihihihi-very-long-pls-compress'
]
And finally, in case it wasn't clear using this code is very easy!! You only need to ever look at U.thingToString and U.stringToThing. The usage of these functions is precisely the same as the usage of JSON.stringify and JSON.parse.
var circularObj = // Some big circular object you have
var serialized = U.thingToString(circularObj);
var unserialized = U.stringToThing(serialized);
I need to add an element to an array only if it is not already there in Javascript. Basically I'm treating the array as a set.
I need the data to be stored in an array, otherwise I'd just use an object which can be used as a set.
I wrote the following array prototype and wanted to hear if anyone knew of a better way. This is an O(n) insert. I was hoping to do O(ln(n)) insert, however, I didn't see an easy way to insert an element into a sorted array. For my applications, the array lengths will be very small, but I'd still prefer something that obeyed accepted rules for good algorithm efficiency:
Array.prototype.push_if_not_duplicate = function(new_element){
for( var i=0; i<this.length; i++ ){
// Don't add if element is already found
if( this[i] == new_element ){
return this.length;
}
}
// add new element
return this.push(new_element);
}
If I understand correctly, you already have a sorted array (if you do not have a sorted array then you can use Array.sort method to sort your data) and now you want to add an element to it if it is not already present in the array. I extracted the binary insert (which uses binary search) method in the google closure library. The relevant code itself would look something like this and it is O(log n) operation because binary search is O(log n).
function binaryInsert(array, value) {
var index = binarySearch(array, value);
if (index < 0) {
array.splice(-(index + 1), 0, value);
return true;
}
return false;
};
function binarySearch(arr, value) {
var left = 0; // inclusive
var right = arr.length; // exclusive
var found;
while (left < right) {
var middle = (left + right) >> 1;
var compareResult = value > arr[middle] ? 1 : value < arr[middle] ? -1 : 0;
if (compareResult > 0) {
left = middle + 1;
} else {
right = middle;
// We are looking for the lowest index so we can't return immediately.
found = !compareResult;
}
}
// left is the index if found, or the insertion point otherwise.
// ~left is a shorthand for -left - 1.
return found ? left : ~left;
};
Usage is binaryInsert(array, value). This also maintains the sort of the array.
Deleted my other answer because I missed the fact that the array is sorted.
The algorithm you wrote goes through every element in the array and if there are no matches appends the new element on the end. I assume this means you are running another sort after.
The whole algorithm could be improved by using a divide and conquer algorithm. Choose an element in the middle of the array, compare with new element and continue until you find the spot where to insert. It will be slightly faster than your above algorithm, and won't require a sort afterwards.
If you need help working out the algorithm, feel free to ask.
I've created a (simple and incomplete) Set type before like this:
var Set = function (hashCodeGenerator) {
this.hashCode = hashCodeGenerator;
this.set = {};
this.elements = [];
};
Set.prototype = {
add: function (element) {
var hashCode = this.hashCode(element);
if (this.set[hashCode]) return false;
this.set[hashCode] = true;
this.elements.push(element);
return true;
},
get: function (element) {
var hashCode = this.hashCode(element);
return this.set[hashCode];
},
getElements: function () { return this.elements; }
};
You just need to find out a good hashCodeGenerator function for your objects. If your objects are primitives, this function can return the object itself. You can then access the set elements in array form from the getElements accessor. Inserts are O(1). Space requirements are O(2n).
If your array is a binary tree, you can insert in O(log n) by putting the new element on the end and bubbling it up into place. Checks for duplicates would also take O(log n) to perform.
Wikipedia has a great explanation.