Allocation-free abstractions in Javascript - javascript

I have a general question which is about whether it is possible to make zero-allocation iterators in Javascript. Note that by "iterator" I am not married to the current definition of iterator in ECMAScript, but just a general pattern for iterating over user-defined ranges.
To make the problem concrete, say I have a list like [5, 5, 5, 2, 2, 1, 1, 1, 1] and I want to group adjacent repetitions together, and process it into a form which is more like [5, 3], [2, 2], [1, 4]. I then want to access each of these pairs inside a loop, something like "for each pair in grouped(array), do something with pair". Furthermore, I want to reuse this grouping algorithm in many places, and crucially, in some really hot inner loops (think millions of loops per second).
Question: Is there an iteration pattern to accomplish this which has zero overhead, as if I hand-wrote the loop myself?
Here are the things I've tried so far. Let's suppose for concreteness that I am trying to compute the sum of all pairs. (To be clear I am not looking for alternative ways of writing this code, I am looking for an abstraction pattern: the code is just here to provide a concrete example.)
Inlining the grouping code by hand. This method performs the best, but obscures the intent of the computation. Furthermore, inlining by hand is error-prone and annoying.
function sumPairs(array) {
let sum = 0
for (let i = 0; i != array.length; ) {
let elem = array[i++], count = 1
while (i < array.length && array[i] == elem) { i++; count++; }
// Here we can actually use the pair (elem, count)
sum += elem + count
}
return sum
}
Using a visitor pattern. We can write a reduceGroups function which will call a given visitor(acc, elem, count) for each pair (elem, count), similar to the usual Array.reduce method. With that our computation becomes somewhat clearer to read.
function sumPairsVisitor(array) {
return reduceGroups(array, (sofar, elem, count) => sofar + elem + count, 0)
}
Unfortunately, Firefox in particular still allocates when running this function, unless the closure definition is manually moved outside the function. Furthermore, we lose the ability to use control structures like break unless we complicate the interface a lot.
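The question names reduceGroups but does not show it. A minimal sketch of what it might look like follows, assuming the visitor signature (acc, elem, count) described above. The addPair name is hypothetical; it is hoisted to module scope to reflect the remark that moving the closure outside the function avoids the allocation.

```javascript
// Hypothetical helper: reduceGroups is named in the question but never shown.
function reduceGroups(array, visitor, initial) {
  let acc = initial;
  for (let i = 0; i < array.length; ) {
    const elem = array[i++];
    let count = 1;
    // Consume the run of elements equal to elem.
    while (i < array.length && array[i] === elem) { i++; count++; }
    acc = visitor(acc, elem, count);
  }
  return acc;
}

// Hoisted callback (hypothetical name): defined once, not once per call.
const addPair = (sofar, elem, count) => sofar + elem + count;

function sumPairsVisitor(array) {
  return reduceGroups(array, addPair, 0);
}

console.log(sumPairsVisitor([5, 5, 5, 2, 2, 1, 1, 1, 1])); // 17
```

Note that this sketch compares with === where the hand-inlined loop uses ==; for arrays of numbers the two behave the same.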
Writing a custom iterator. We can make a custom "iterator" (not an ES6 iterator) which exposes elem and count properties, an empty property indicating that there are no more pairs remaining, and a next() method which updates elem and count to the next pair. The consuming code looks like this:
function sumPairsIterator(array) {
let sum = 0
for (let iter = new GroupIter(array); !iter.empty; iter.next())
sum += iter.elem + iter.count
return sum
}
I find this code the easiest to read, and it seems to me that it should be the fastest method of abstraction. (In the best possible case, scalar replacement could completely collapse the iterator definition into the function. In the second best case, it should be clear that the iterator does not escape the for loop, so it can be stack-allocated). Unfortunately, both Chrome and Firefox seem to allocate here.
Of the approaches above, the custom-defined iterator performs quite well in most cases, except when you really need to put the pedal to the metal in a hot inner loop, at which point the GC pressure becomes apparent.
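For reference, the GroupIter class itself is not shown in the question. One way it could be written, assuming only the interface described above (elem, count, empty, next()), is sketched here; the pos field is a hypothetical internal detail.

```javascript
// Hypothetical GroupIter: the question describes the interface but omits the class.
class GroupIter {
  constructor(array) {
    this.array = array;
    this.pos = 0;          // index of the first element after the current group
    this.elem = undefined; // value of the current group
    this.count = 0;        // length of the current group
    this.empty = false;    // true once no groups remain
    this.next();           // position on the first group
  }
  next() {
    const array = this.array;
    let i = this.pos;
    if (i >= array.length) {
      this.empty = true;
      return;
    }
    const elem = array[i++];
    let count = 1;
    while (i < array.length && array[i] === elem) { i++; count++; }
    this.elem = elem;
    this.count = count;
    this.pos = i;
  }
}

function sumPairsIterator(array) {
  let sum = 0;
  for (let iter = new GroupIter(array); !iter.empty; iter.next())
    sum += iter.elem + iter.count;
  return sum;
}

console.log(sumPairsIterator([5, 5, 5, 2, 2, 1, 1, 1, 1])); // 17
```

The single `new GroupIter(array)` per call is the allocation the question is concerned about; whether an engine removes it through escape analysis is up to the engine.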
I would also be ok with a Javascript post-processor (the Google Closure Compiler perhaps?) which is able to accomplish this.

Check this out. I've not tested its performance but it should be good.
(+) (mostly) compatible with ES6 iterators.
(-) sacrifices spread syntax, ...GroupingIterator.from(arr), in order not to create a (in my opinion, garbage) value object. That's the "mostly" in the point above.
As far as I know, the primary use case for this is a for..of loop anyway.
(+) no objects created (GC)
(+) object pooling for the iterators; (again GC)
(+) compatible with control structures like break
class GroupingIterator {
/* object pooling */
static from(array) {
const instance = GroupingIterator._pool || new GroupingIterator();
GroupingIterator._pool = instance._pool;
instance._pool = null;
instance.array = array;
instance.done = false;
return instance;
}
static _pool = null;
_pool = null;
/* state and value / payload */
array = null;
element = null;
index = 0;
count = 0;
/* IteratorResult interface */
value = this;
done = true;
/* Iterator interface */
next() {
const array = this.array;
let index = this.index += this.count;
if (!array || index >= array.length) {
return this.return();
}
const element = this.element = array[index];
while (++index < array.length) {
if (array[index] !== element) break;
}
this.count = index - this.index;
return this;
}
return() {
this.done = true;
// cleanup
this.element = this.array = null;
this.count = this.index = 0;
// return iterator to pool
this._pool = GroupingIterator._pool;
return GroupingIterator._pool = this;
}
/* Iterable interface */
[Symbol.iterator]() {
return this;
}
}
var arr = [5, 5, 5, 2, 2, 1, 1, 1, 1];
for (const item of GroupingIterator.from(arr)) {
console.log("element", item.element, "index", item.index, "count", item.count);
}

Related

2 Sum algorithm explanation?

I am a newbie to JavaScript algorithms and cannot understand this optimal solution to the 2-sum problem:
function twoNumberSum(array, target) {
const nums = {};
for (const num of array) {
const potentialMatch = target - num;
console.log('potential', potentialMatch);
if (potentialMatch in nums) {
return [potentialMatch, num]
} else {
nums[num] = true;
}
}
}
So the 2-sum problem basically says "find two numbers in an array that sum to the given target, and return them". Let's walk through this code and talk about what's happening.
First, we start the function; I'm going to assume this makes sense (a function that's called twoNumberSum that takes in two arguments; namely, array and target) - note that in JS, we don't annotate types, so there is no return type
Now, the first thing we do is create a new object called nums. In JS, objects are effectively hash maps (with some very important differences - see my note below); they store a key and a corresponding value. In JS, a key can be any string; a number used as a key is converted to a string.
Next, we start our iteration. If we do for (const a of b), and b is an array, this iterates over all the values of the array, with each iteration having that value stored in a.
Next, we subtract our current value from the target. Then comes the key line: if (potentialMatch in nums). The in keyword checks for the existence of a key: 'hello' in obj returns true if obj has the key 'hello'.
In this case, if we find this potential match, then that means we have found another number that is equal to target - num, which of course means we've found the other partner for our sum! So in this case, we simply return the two numbers. If, on the other hand, we do not find this potentialMatch, that means we need to keep looking. But we do want to remember we've seen this number - thus, we add it as a key by doing nums[num] = true (this creates a new key-value pair; namely the key is num and the value is true).
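To make the key handling and the in operator concrete, here is a small illustration. The object name nums matches the solution; the specific values are arbitrary.

```javascript
const nums = {};
nums[5] = true;                  // the number 5 is stored under the string key "5"

console.log(5 in nums);          // true
console.log("5" in nums);        // true: same key after conversion
console.log(7 in nums);          // false: we have not seen 7 yet
console.log(Object.keys(nums));  // [ '5' ]
```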
As one of the comments explained, this is just trying to keep track of a list of numbers; however, the author is trying to be clever by using a Hash Table instead of a normal array. This way, lookups are O(1) instead of O(n). For eyes not used to JS semantics, another way of explaining this code is that we build up a Map of the numbers, and then we check that map for our target value.
I mentioned earlier that using objects as hash tables isn't the best idea; this is because if you aren't careful, if you use user-provided keys, you can accidentally mess with what's called the Prototype Chain. This is beyond this discussion, but a better way forward would be to use a Set:
function twoNumberSum(array, target) {
// Create a new Hash Set. Sets take in an iterable, so we could
// Do it this way. But to remain as close to your original solution
// as possible, we won't for now, and instead populate it as we go
// const nums = new Set(array);
const nums = new Set();
for (const num of array) {
const potentialMatch = target - num;
if (nums.has(potentialMatch)) {
return [potentialMatch, num];
} else {
nums.add(num);
}
}
}
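The prototype-chain pitfall mentioned above can be seen with a short experiment. This is only an illustration; the variable names are made up for the example.

```javascript
const seen = {};

// `in` also finds properties inherited from Object.prototype,
// so keys we never added appear to be present.
console.log("constructor" in seen); // true
console.log("toString" in seen);    // true

// An own-property check avoids the inherited keys.
console.log(Object.prototype.hasOwnProperty.call(seen, "constructor")); // false

// An object created without a prototype has nothing to inherit.
const bare = Object.create(null);
console.log("constructor" in bare); // false

// A Set only ever reports values that were explicitly added.
const set = new Set();
console.log(set.has("constructor")); // false
```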
Sometimes, the problem instead asks for you to return the indices; using a Map instead makes this relatively trivial. Just store the index as the value and you're good to go!
function twoNumberSum(array, target) {
// Create the new map instead
const nums = new Map();
for (let n = 0; n < array.length; ++n) {
const potentialMatch = target - array[n];
if (nums.has(potentialMatch)) {
return [nums.get(potentialMatch), n];
} else {
nums.set(array[n], n);
}
}
}
Let me explain how it all works.
function twoNumberSum(array, target) {
// This is an object in JavaScript
const nums = {};
for (const num of array) { // This is a for...of loop, which iterates over the array.
// for...of docs - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of
// Here it calculates the potential match.
const potentialMatch = target - num;
console.log('potential - ' + potentialMatch);
/**
* Now here `in` is used, which checks whether a property exists in an object.
* in Usage - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/in
*
* It checks whether potentialMatch exists in the `nums` object. If it does, it returns an array
* with potentialMatch and the num it is matched to.
*
* If the number is not in the nums object, the else block stores it there
* so it can be matched in the next iteration.
*/
if (potentialMatch in nums) {
return [potentialMatch, num]
} else {
nums[num] = true;
/**
* It forms an object when the potential match doesn't exist in nums for checking in the next iteration
* {
* 1: true,
* 2: true
* }
*/
}
console.log(nums)
}
}
console.log(twoNumberSum([1, 2, 4, 5, 6, 7, 8], 3))
You can also run it on JSBin.

Best way to loop through an array and parse each element into several data structures

I have an array of ~11,000 JavaScript dictionaries, each representing 1 row in an Excel file.
I want to loop through this array and parse each element into a new datastructure. For example, I might have a function that will count for {"foo": true} or something.
As I have multiple of these functions, my question is would it be better to loop through this array for each function, or have one single loop with functions that parse each element and store it in a global variable?
Ex. I'm currently doing one single loop, and parsing each element into a global variable
const arr = [...]; // array of ~11,000 dictionaries
// example parsing function
let count = 0;
function countFoos(el) {
if (el["foo"] === true) count++;
}
let count2 = 0;
function countBars(el) {
if (el["bar"] === false) count2++;
}
arr.forEach(el => {
countFoos(el);
countBars(el);
});
But would it be better to do it this way?
class Parse {
constructor(arr) {
this.arr = arr;
this.count = 0;
this.count2 = 0;
}
countFoos() {
this.arr.forEach((el) => {
if (el["foo"] === true) this.count++;
});
}
countBars() {
this.arr.forEach((el) => {
if (el["bar"] === false) this.count2++;
});
}
}
const arr = [...]; // array of ~11,000 dictionaries
let x = new Parse(arr);
x.countFoos();
x.countBars();
EDIT: I should've clarified earlier: the examples shown above are just very simplified examples of the production code. Approximately 20 'parsing functions' are run on each element, with each of their corresponding global variables being large dictionaries or arrays.
You should generally do just one iteration that calls both functions.
Iterating takes time, so doing two iterations will double the time taken to perform the iterations. How significant this is to the entire application depends on how much work is done in the body of the iteration. If the bodies are very expensive, the iteration time might fall into the noise. But if it's really simple, as in your examples of a simple test and variable increment, the iteration time will probably be significant.
If you are worried about performance, the first method is better as it only involves one iteration over the entire array while the second approach requires two.
If you think using classes is more readable, you could simply write that as one method in the class.
class Parse {
constructor(arr) {
this.arr = arr;
this.count1 = 0;
this.count2 = 0;
}
// Single pass: run every parsing function on each element.
count() {
this.arr.forEach((el) => {
this.countFoos(el);
this.countBars(el);
});
}
countFoos(el) {
if (el.foo === true) this.count1++;
}
countBars(el) {
if (el.bar === false) this.count2++;
}
}
I would approach this by using the Array.prototype.reduce function, which would only require a single pass over the given array. I also would not use a class here as it would not really make sense, but you can if you really want!
function count(arr) {
return arr.reduce(([fooCount, barCount], next) => {
if (next.foo === true) {
fooCount = fooCount + 1
}
if (next.bar === false) {
barCount = barCount + 1
}
return [fooCount, barCount]
}, [0, 0]);
}
const [fooCount, barCount] = count(...);
You can also use generators to accomplish this, which is even better because it doesn't require you to iterate the entire set of words in the dictionary, but it's a little more unwieldy to use.
This is actually easier to use than other examples that require if statements, because you could quite easily run a battery of functions over each result and add it to the accumulator.
Just remember though that you don't want to optimize before you prove something is a problem. Iterating 22000 objects is obviously more than iterating 11000, but it is still going to be quite fast!
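The idea of running a battery of functions over each element in a single reduce pass might look like the sketch below. The names checks and countAll, and the sample rows, are hypothetical and only for illustration.

```javascript
// Hypothetical battery of checks; each returns true when the row should be counted.
const checks = {
  foo: (row) => row.foo === true,
  bar: (row) => row.bar === false,
};

// One pass over rows; every check runs on every row.
function countAll(rows, checks) {
  const names = Object.keys(checks);
  const totals = {};
  for (const name of names) totals[name] = 0;
  return rows.reduce((acc, row) => {
    for (const name of names) {
      if (checks[name](row)) acc[name]++;
    }
    return acc;
  }, totals);
}

const rows = [
  { foo: true, bar: false },
  { foo: false, bar: true },
  { foo: true, bar: true },
];
console.log(countAll(rows, checks)); // { foo: 2, bar: 1 }
```

Adding another parsing function is then a matter of adding one more entry to checks, without touching the loop.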
Restricting the number of loops is your best option as it requires less overhead.
Also, here is an idea: use forEach to do the processing with if statements, and use a single counter object to hold all of the values so they mean something and can be easily referenced later on.
const arr = [
{"foo" : true,"bar" : false},{"bar" : true,"foo" : false}
];
let counters = {};
function inc(field) {
if (counters[field] == undefined) {
counters[field] = 0;
}
counters[field]++;
}
arr.forEach(el => {
if (el["foo"] === true) {
inc("foo");
}
if (el["bar"] === true) {
inc("bar");
}
});
console.log(counters);

Using array.splice inside Array prototype

Array.prototype.move = function(oldIndex, newIndex) {
var val = this.splice(oldIndex, 1);
this.splice(newIndex, 0, val[0]);
}
//Testing - Change array position
var testarray = [1, 2, 3, 4];
testarray.move(3, 0);
console.log(testarray);
This produces an error "this.splice is not a function" yet it returns the desired results. Why?
Array.prototype.move = function(oldIndex, newIndex) {
if(Object.prototype.toString.call(this) === '[object Array]') {
// Check types only: a truthiness test would reject index 0.
if(typeof oldIndex == 'number' && typeof newIndex == 'number') {
if(newIndex > this.length) newIndex = this.length;
this.splice(newIndex, 0, this.splice(oldIndex, 1)[0]);
}
}
};
For some reason, the function is being called by the document on load (I still haven't quite figured that one out). I added a few checks to verify that this is an array, and I also reset the new index to the array's length if the supplied value is greater than the length. This solved the error I was having, and to me it is the simplest way to move objects around in an array. As for why the function is being called on load, it must be something to do with my code.
You don't need the placeholder variable:
Array.prototype.move = function(oldIndex, newIndex) {
this.splice(newIndex, 0, this.splice(oldIndex, 1)[0]);
}
var a=[1,2,3,4,9,5,6,7,8];
a.move(4,8);
a[8]
/* returned value: (Number)
9
*/
Adding properties to built-in objects is not a good idea if your code must work in arbitrary environments. If you do extend such objects, you shouldn't use property names that are likely to be used by someone else doing the same or similar thing.
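One way to avoid touching Array.prototype at all is a plain function that takes the array as an argument. The sketch below is an illustration only: moveItem is a hypothetical name, and clamping an out-of-range target index is an assumption about the desired behavior.

```javascript
// Hypothetical standalone helper; does not modify Array.prototype.
function moveItem(array, oldIndex, newIndex) {
  if (!Array.isArray(array)) throw new TypeError("moveItem expects an array");
  if (oldIndex < 0 || oldIndex >= array.length) return array; // nothing to move
  // Assumption: clamp the target into the valid range instead of throwing.
  const clamped = Math.min(Math.max(newIndex, 0), array.length - 1);
  const [item] = array.splice(oldIndex, 1); // remove the element
  array.splice(clamped, 0, item);           // reinsert it at the new position
  return array;
}

console.log(moveItem([1, 2, 3, 4], 3, 0)); // [ 4, 1, 2, 3 ]
```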
There seems to be more than one way to "move" a member, what you seem to be doing can be better named as "swap", so:
if (!Array.prototype.swap) {
Array.prototype.swap = function(a, b) {
var t = this[a];
this[a] = this[b];
this[b] = t;
}
}
I expect that simple re-assignment of values is more efficient than calling methods that need to create new arrays and modify the old one a number of times. But that might be moot anyway. The above is certainly simpler to read and is fewer characters to type.
Note also that the above is stable, array.swap(4,8) gives the same result as array.swap(8,4).
If you want to make a robust function, you first need to work out what to do in cases where either index is greater than array.length, or if one doesn't exist, and so on. e.g.
var a = [,,2]; // a has length 3
a.swap(0,2);
In the above, there are no members at 0 or 1, only at 2. So should the result be:
a = [2]; // a has length 1
or should it be (which will be the result of the above):
a = [2,,undefined]; // a has length 3
or
a = [2,,,]; // a has length 3 (IE may think it's 4, but that's wrong)
Edit
Note that in the OP, the result of:
var b = [,,2];
b.move(0,2);
is
alert(b); // [,2,];
which may not be what is expected, and
b.move(2,0);
alert(b); // [2,,];
so it is not stable either.

Append array element only if it is not already there in Javascript

I need to add an element to an array only if it is not already there in Javascript. Basically I'm treating the array as a set.
I need the data to be stored in an array, otherwise I'd just use an object which can be used as a set.
I wrote the following array prototype and wanted to hear if anyone knew of a better way. This is an O(n) insert. I was hoping to do O(ln(n)) insert, however, I didn't see an easy way to insert an element into a sorted array. For my applications, the array lengths will be very small, but I'd still prefer something that obeyed accepted rules for good algorithm efficiency:
Array.prototype.push_if_not_duplicate = function(new_element){
for( var i=0; i<this.length; i++ ){
// Don't add if element is already found
if( this[i] == new_element ){
return this.length;
}
}
// add new element
return this.push(new_element);
}
If I understand correctly, you already have a sorted array (if not, you can use the Array.sort method to sort your data), and now you want to add an element to it if it is not already present. I extracted the binary insert method (which uses binary search) from the Google Closure Library. The relevant code would look something like this. Finding the insertion point is O(log n) because binary search is O(log n); the splice that follows may still need to shift elements.
function binaryInsert(array, value) {
var index = binarySearch(array, value);
if (index < 0) {
array.splice(-(index + 1), 0, value);
return true;
}
return false;
};
function binarySearch(arr, value) {
var left = 0; // inclusive
var right = arr.length; // exclusive
var found;
while (left < right) {
var middle = (left + right) >> 1;
var compareResult = value > arr[middle] ? 1 : value < arr[middle] ? -1 : 0;
if (compareResult > 0) {
left = middle + 1;
} else {
right = middle;
// We are looking for the lowest index so we can't return immediately.
found = !compareResult;
}
}
// left is the index if found, or the insertion point otherwise.
// ~left is a shorthand for -left - 1.
return found ? left : ~left;
};
Usage is binaryInsert(array, value). This also maintains the sort of the array.
Deleted my other answer because I missed the fact that the array is sorted.
The algorithm you wrote goes through every element in the array and if there are no matches appends the new element on the end. I assume this means you are running another sort after.
The whole algorithm could be improved by using a divide and conquer algorithm. Choose an element in the middle of the array, compare with new element and continue until you find the spot where to insert. It will be slightly faster than your above algorithm, and won't require a sort afterwards.
If you need help working out the algorithm, feel free to ask.
I've created a (simple and incomplete) Set type before like this:
var Set = function (hashCodeGenerator) {
this.hashCode = hashCodeGenerator;
this.set = {};
this.elements = [];
};
Set.prototype = {
add: function (element) {
var hashCode = this.hashCode(element);
if (this.set[hashCode]) return false;
this.set[hashCode] = true;
this.elements.push(element);
return true;
},
get: function (element) {
var hashCode = this.hashCode(element);
return this.set[hashCode];
},
getElements: function () { return this.elements; }
};
You just need to find out a good hashCodeGenerator function for your objects. If your objects are primitives, this function can return the object itself. You can then access the set elements in array form from the getElements accessor. Inserts are O(1). Space requirements are O(2n).
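In an environment that provides the built-in Set (ES2015 and later), the same idea can be sketched without a hash function. UniqueList is a hypothetical name; the built-in Set compares primitives by value and objects by identity.

```javascript
// Hypothetical UniqueList: an array for ordered storage plus a Set for membership tests.
class UniqueList {
  constructor() {
    this.items = [];
    this.seen = new Set();
  }
  // Returns true if the element was appended, false if it was already present.
  add(element) {
    if (this.seen.has(element)) return false;
    this.seen.add(element);
    this.items.push(element);
    return true;
  }
}

const list = new UniqueList();
list.add("a");
list.add("b");
list.add("a"); // ignored, already present
console.log(list.items); // [ 'a', 'b' ]
```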
If your array is a binary tree, you can insert in O(log n) by putting the new element on the end and bubbling it up into place. Checks for duplicates would also take O(log n) to perform.
Wikipedia has a great explanation.

How do I add syntactic sugar in my Javascript library?

Right now the library can translate this operation
Select * from List where name = k% order by desc
to
List.filter(function(x) { return x.first_char() == 'k' }).sort().reverse();
What's the best hack to remove the () so that the developer can write statements like:
List.filter(fn(x) { return x.first_char == 'k' }).sort.reverse;
Naive approach:
maxfn = function() {this[0]..}; Array.prototype.max = maxfn();
But with this approach I can't access 'this'.
I wanted to add a syntactic sugar for
new Array("1","2","3")
to something like :)(suggestions needed)
_("1","2" ,"3")
like we have in scheme where list -> '
I tried to clone the arguments but failed.
Thanks.
For lists you can use JSON notation:
["1", "2", "3"]
You can use JSON notation as suggested by RoBorg, if you control the list... However, there's no cross-browser way to treat a property as a method. Note: spidermonkey (firefox) does support using a getter (get method for a property).
What's the best hack to remove the ()
Property getters/setters in JavaScript. Unfortunately it's a relatively new JavaScript feature that won't work on IE6/7 (as well as various other older browsers), so it's not really ready for prime-time yet (despite the intro of the linked article).
You could do this particular example by making a JavaScript object that wrapped a String and shadowed all String's methods, then add a static ‘first_char’ property set to the String's first character on initialisation. But it's really not worth it.
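As a rough illustration of the getter approach, the sketch below wraps an array in a small class so that sorting and reversing read as properties. It assumes a modern engine with class syntax; Query, sorted and reversed are hypothetical names, and the sample data is made up.

```javascript
// Hypothetical wrapper; getters let callers omit the parentheses.
class Query {
  constructor(items) {
    this.items = items;
  }
  filter(fn) {
    return new Query(this.items.filter(fn));
  }
  // Each getter copies the array first so the source is left untouched.
  get sorted() {
    return new Query([...this.items].sort());
  }
  get reversed() {
    return new Query([...this.items].reverse());
  }
}

const result = new Query(["kim", "bob", "kai"])
  .filter((x) => x[0] === "k")
  .sorted
  .reversed;
console.log(result.items); // [ 'kim', 'kai' ]
```

Because every getter call builds a new wrapper and a new array, this trades allocations for syntax, which is part of why it may not be worth it.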
new Array("1","2","3")
to something like :)(suggestions needed)
_("1","2" ,"3")
Well that's simple enough:
function _(/* items */) {
var a= new Array();
for (var i= 0; i<arguments.length; i++)
a[i]= arguments[i];
return a;
}
There's no point in doing it nowadays, though, since the array literal syntax:
['1', '2', '3']
has been available since JavaScript 1.1-1.2 era and is available in every browser today. (It predates JSON by many, many years.)
I'll try to answer one by one:
1) Why would you want to remove parentheses from a function call?
2) If the "naive" approach is failing it's probably because you are calling the maxFn and assigning the results to Array.prototype.max. It should be like this:
maxfn = function() {this[0]..}; Array.prototype.max = maxfn;
3) RoBorg is correct, just use literal notation to construct arrays on the fly.
Edit:
Here's one way of implementing a max function on an array object. The optional evaluator argument is a function that takes two parameters, the current max value and current value in array. It should return the object that is "greater". Useful for non-primitives.
Array.prototype.max = function(evaluator) {
var max, i = 1, len = this.length;
if (len > 0) max = this[0];
for (; i < len; i++) {
if (evaluator) {
max = evaluator(max, this[i]);
}
else if(max < this[i]) {
max = this[i];
}
}
return max;
};
var a = [1, 3, 4, 5, 6];
alert(a.max());
var b = ["Arnold", "Billy", "Caesar"];
alert(b.max());
var c = ["Arnold", new Date(), 99, true];
alert(c.max());
var d = [1, 3, 4, 5, 6];
alert(d.max(function (max, val) { return max < val ? val : max }));
