I just discovered via this link that JavaScript has typed arrays. I was immediately curious what the benefit of such objects might be in the language.
I noticed that Uint8Arrays lose the .map()-type functions that I would have on normal Array objects, so if you want to loop over them you need a for loop.
I assumed I might see some performance boost when using Uint8Arrays, but this doesn't seem to be the case.
var a = d3.range(225);
var b = new Uint8Array(d3.range(225));

console.time("a");
var result = 0;
for (var j = 10000; j >= 0; j--) {
    for (var i = a.length - 1; i >= 0; i--) {
        result += a[i];
    }
}
console.timeEnd("a");

console.time("b");
result = 0;
for (var j = 10000; j >= 0; j--) {
    for (var i = b.length - 1; i >= 0; i--) {
        result += b[i];
    }
}
console.timeEnd("b");
I am using the d3 library to quickly generate a large array. This script gives the following output:
a: 2760.176ms
b: 2779.477ms
So the performance doesn't improve. The Uint8Array also doesn't throw an error when you insert a wrong value.
> new Uint8Array([1,2,3,4,'aasdf'])
[1,2,3,4,0]
With this in mind, what is the proper use case for Uint8Array in JavaScript? It seems like the normal array is a lot more flexible, equally robust, and equally fast.
"Performance" usually doesn't mean just how fast your script runs; there are many other important factors, like how fast it freezes your PC and/or crashes. The big one here is memory. The smallest amount of memory a JavaScript implementation usually allocates for a variable is 32 bits. This means
var a = true;
a boolean looks like this in your memory:
0000 0000 0000 0000 0000 0000 0000 0001
It's a huge waste, but usually not a problem, as hardly anyone uses enough of them for it to really matter. Typed arrays are for the cases where it does matter, where you can actually reduce your memory usage by a huge amount, such as when working with image data, sound data, or any sort of raw binary data.
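As a rough, hypothetical illustration (my own sketch, not part of the original answer; exact sizes depend on the engine), consider storing the channel data of a 1024×1024 RGBA image:

// Plain array: every channel value is stored as a regular JavaScript number,
// typically several bytes per element plus array overhead, depending on the engine.
var plain = new Array(1024 * 1024 * 4).fill(0);

// Typed array: exactly one byte per channel, roughly 4 MB of raw pixel data.
var packed = new Uint8Array(1024 * 1024 * 4);

packed[0] = 255; // red channel of the first pixel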
Another difference, which can save even more memory in some cases, is that you can operate by reference on data you would normally pass by value.
Consider this case:
var oneImage = new Uint8Array( 16 * 16 * 4 );
var onePixel = new Uint8Array( oneImage.buffer, 0, 4 );
You now have two independent views on the same ArrayBuffer, operating on the same data. Applying that concept not only lets you keep that one big buffer in memory, it lets you subdivide it into as many segments as you currently want to work on, with very little overhead, which is probably even more important.
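To make the shared-buffer behaviour concrete, here is a tiny follow-up sketch (my addition, continuing from the two views declared above):

onePixel[0] = 255;        // write through the small view
console.log(oneImage[0]); // 255, because both views read the same underlying byte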
Related
I want to create an array of type number with length n. All values inside the array should be 0 except the one whose index matches a condition.
That's how I currently do it:
const data: number[] = [];
for (let i = 0; i < n; i++) {
    if (i === someIndex) {
        data.push(someNumber);
    } else {
        data.push(0);
    }
}
So, let's say, n = 4, someIndex = 2, someNumber = 4 would result in the array [0, 0, 4, 0].
Is there a way to do it in O(1) instead of O(n)?
Creating an array of size n in O(1) time is theoretically possible depending on implementation details - in principle, if an array is implemented as a hashtable then its length property can be set without allocating or initialising space for all of its elements. The ECMAScript specification for the Array(n) constructor doesn't mandate that Array(n) should do anything which necessarily takes more than O(1) time, although it also doesn't mandate that the time complexity is O(1).
In practice, Array(n)'s time complexity depends on the browser, though verifying this is a bit tricky. The performance.now() function can be used to measure the time elapsed between the start and end of a computation, but the precision of this function is artificially reduced in many browsers to protect against CPU-timing attacks like Spectre. To get around this, we can call the constructor repetitions times, and then divide the time elapsed by repetitions to get a more precise measurement per constructor call.
My timing code is below:
function timeArray(n, repetitions=100000) {
    var startTime = performance.now();
    for (var i = 0; i < repetitions; ++i) {
        var arr = Array(n);
        arr[n-1] = 'foo';
    }
    var endTime = performance.now();
    return (endTime - startTime) / repetitions;
}

for (var n = 10000; n <= 1000000; n += 10000) {
    console.log(n, timeArray(n));
}
Here are my results from Google Chrome (version 74) and Firefox (version 72): on Chrome the performance is clearly O(n), while on Firefox it's clearly O(1), with a fairly consistent time of about 0.01ms on my machine.
I measured using repetitions = 1000 on Chrome, and repetitions = 100000 on Firefox, to get accurate enough results within a reasonable time.
Another option proposed by @M.Dietz in the comments is to declare the array like var arr = []; and then assign at some index (e.g. arr[n-1] = 'foo';). This turns out to take O(1) time on both Chrome and Firefox, both consistently under one nanosecond.
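For reference, here is a sketch of that variant of the timing function (my own variation on the code above, not the original benchmark):

function timeEmptyArray(n, repetitions = 100000) {
    var startTime = performance.now();
    for (var i = 0; i < repetitions; ++i) {
        var arr = [];       // start from an empty array...
        arr[n - 1] = 'foo'; // ...and assign at a high index, leaving the rest sparse
    }
    var endTime = performance.now();
    return (endTime - startTime) / repetitions;
}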
That suggests the version using [] is better to use than the version using Array(n), but still the specification doesn't mandate that this should take O(1) time, so there may be other browsers where this version takes O(n) time. If anybody gets different results on another browser (or another version of one of these browsers) then please do add a comment.
You need to assign n values, and so there is that amount of work to do. The work increases linearly with increasing n.
Having said that, you can hope to make your code a bit faster by making use of .fill:
const data: number[] = Array(n).fill(0);
data[someIndex] = someNumber;
But don't be mistaken; this is still O(n): .fill may be faster, but it still has to fill the whole array with zeroes, which means a corresponding amount of memory needs to be initialised, so the operation has linear time complexity.
If, however, you drop the requirement that the zeroes actually be assigned, then you can store only someNumber:
const data: number[] = Array(n);
data[someIndex] = someNumber;
This way you do not actually allocate memory for the whole array, so this code snippet runs in constant time. Any access to an index other than someIndex will give you undefined. You may trap that condition and translate it to a zero on the fly:
let value = i in data ? data[i] : 0;
Obviously, if you are going to access all indices of the array like that, you'll have again a linear time complexity.
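For example, a small hypothetical accessor (my own sketch, not part of the original answer) can wrap that check:

// Returns data[i] when the slot was actually assigned, and 0 otherwise.
function valueAt(data, i) {
    return i in data ? data[i] : 0;
}

const data = Array(4);
data[2] = 4;
console.log(valueAt(data, 2)); // 4
console.log(valueAt(data, 0)); // 0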
Wanted to share a simple experiment I ran, using node.js v6.11.0 under Win 10.
Goal. Compare arrays vs. objects in terms of memory occupied.
Code. Each of the functions reference, twoArrays, matrix and objects creates two arrays of the same size, containing random numbers; they just organize the data a bit differently.
reference creates two arrays of fixed size and fills them with numbers.
twoArrays fills two arrays via push (so the interpreter doesn't know the final size).
objects creates one array via push; each element is an object containing two numbers.
matrix creates a two-row matrix, also using push.
const SIZE = 5000000;

let s = [];
let q = [];

function rand() { return Math.floor(Math.random() * 10); }

function reference(size = SIZE) {
    s = new Array(size).fill(0).map(a => rand());
    q = new Array(size).fill(0).map(a => rand());
}

function twoArrays(size = SIZE) {
    s = [];
    q = [];
    let i = 0;
    while (i++ < size) {
        s.push(rand());
        q.push(rand());
    }
}

function matrix(size = SIZE) {
    s = [];
    let i = 0;
    while (i++ < size) s.push([rand(), rand()]);
}

function objects(size = SIZE) {
    s = [];
    let i = 0;
    while (i++ < size) s.push({s: rand(), q: rand()});
}
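The post doesn't show the measurement harness itself; a minimal sketch of how such a run might be driven (assuming Node is started with --expose-gc so that global.gc() is available, and using heapUsed as the metric):

// node --expose-gc memtest.js
reference();           // or twoArrays(), matrix(), objects() in separate runs
global.gc();
global.gc();
var used = process.memoryUsage().heapUsed / 1024 / 1024;
console.log('heap used: ' + Math.round(used) + ' MB');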
Result. After running each function separately in a fresh environment, and after calling global.gc() a few times, the Node.js environment was occupying the following memory sizes:
reference: 84 MB
twoArrays: 101 MB
objects: 249 MB
matrix: 365 MB
theoretical: assuming that each number takes 8 bytes, the size should be 5*10^6*2*8 ~ 80 MB
We see that reference resulted in the lightest memory structure, which is kind of obvious.
twoArrays takes a bit more memory. I think this is because the arrays there are dynamic and the interpreter allocates memory in chunks whenever the next push operation would exceed the preallocated space. Hence the final allocation covers more than 5·10^6 numbers.
objects is interesting. Although each object has a fixed shape, the interpreter doesn't seem to treat it that way and allocates much more space for each object than necessary.
matrix is also quite interesting: apparently, with an explicit array literal for each row, the interpreter allocates more memory than required.
Conclusion. If your aim is a high-performance application, try to use arrays. They are also fast, with O(1) time for random access. If the nature of your project requires objects, you can quite often simulate them with arrays as well (provided the number of properties in each object is fixed).
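A hypothetical example of that simulation (my own sketch, not from the original experiment): instead of an array of {x, y} objects, keep one flat array and use a fixed stride.

// One flat array with stride 2 instead of an array of {x, y} objects.
const STRIDE = 2;
const points = [];

function addPoint(x, y) {
    points.push(x, y);
}

function getX(i) { return points[i * STRIDE]; }
function getY(i) { return points[i * STRIDE + 1]; }

addPoint(3, 7);
console.log(getX(0), getY(0)); // 3 7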
Hope this is useful. I'd like to hear what people think, or maybe there are links to some more thorough experiments...
Out of curiosity I wrote some trivial benchmarks comparing the performance of Go maps to JavaScript (V8/Node.js) objects used as maps, and I am surprised at their relative performance. JavaScript objects appear to perform roughly twice as fast as Go maps (even accounting for some minor performance edges in Go's favour)!
Here is the Go implementation:
// map.go
package main

import "fmt"
import "time"

func elapsedMillis(t0, t1 time.Time) float64 {
    n0, n1 := float64(t0.UnixNano()), float64(t1.UnixNano())
    return (n1 - n0) / 1e6
}

func main() {
    m := make(map[int]int, 1000000)
    t0 := time.Now()
    for i := 0; i < 1000000; i++ {
        m[i] = i     // Put.
        _ = m[i] + 1 // Get, use, discard.
    }
    t1 := time.Now()
    fmt.Printf("go: %fms\n", elapsedMillis(t0, t1))
}
And here is the JavaScript:
#!/usr/bin/env node
// map.js

function elapsedMillis(hrtime0, hrtime1) {
    var n0 = hrtime0[0] * 1e9 + hrtime0[1];
    var n1 = hrtime1[0] * 1e9 + hrtime1[1];
    return (n1 - n0) / 1e6;
}

var m = {};
var t0 = process.hrtime();
for (var i = 0; i < 1000000; i++) {
    m[i] = i;         // Put.
    var _ = m[i] + 1; // Get, use, discard.
}
var t1 = process.hrtime();
console.log('js: ' + elapsedMillis(t0, t1) + 'ms');
Note that the Go implementation has a couple of minor potential performance edges, in that:
Go is mapping integers to integers directly, whereas JavaScript will convert the integer keys to string property names.
Go makes its map with an initial capacity equal to the benchmark size, whereas JavaScript grows from its default capacity.
However, despite the potential performance edges listed above, the Go map seems to perform at about half the rate of the JavaScript object map! For example (representative):
go: 128.318976ms
js: 48.18517ms
Am I doing something obviously wrong with Go maps, or somehow comparing apples to oranges?
I would have expected Go maps to perform at least as well as, if not better than, JavaScript objects used as maps. Is this just a sign of Go's immaturity (1.4 on darwin/amd64), or does it represent some fundamental difference between the two languages' data structures that I'm missing?
[Update]
Note that if you explicitly use string keys (e.g. via s := strconv.Itoa(i) and var s = ''+i in Go and JavaScript, respectively) then their performance is roughly equivalent.
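For concreteness, the JavaScript side of that string-key variant might look like this (a sketch of the change described above, not the original benchmark code):

var m = {};
var t0 = process.hrtime();
for (var i = 0; i < 1000000; i++) {
    var s = '' + i;   // force an explicit string key
    m[s] = i;         // Put.
    var _ = m[s] + 1; // Get, use, discard.
}
var t1 = process.hrtime();
console.log('js (string keys): ' + elapsedMillis(t0, t1) + 'ms');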
My guess is that the very high performance from v8 is related to a specific optimization in that runtime for objects whose keys are consecutive integers (e.g. by substituting an array implementation instead of a hashtable).
I'm voting to close since there is likely nothing to see here...
Your benchmark is a bit synthetic, just like any benchmark. Just out of curiosity, try
for i := 0; i < 1000000; i += 9 {
in the Go implementation. You may be surprised.
I am profiling my JavaScript code intended to be used in an embedded browser on Android (PhoneGap).
Basically I need a very large bitfield (200k+ bits) for my calculations.
I've tried putting them into an array of unsigned integers with each item storing 32 bits; this indeed reduced memory usage, but made execution drastically slower (over 30 seconds for simply iterating over and flipping all bits in the bitfield on a modern PC!).
Then I made a good old-fashioned array of booleans. This increased memory usage (but it was still less than 15 MB on Android for the entire PhoneGap framework around my code). Profiling showed me that the initial step in my algorithm, setting all elements of the bitfield to 1 (a simple for loop), takes half of the execution time (~1.5 seconds on PC, more than a few minutes on Android). I can rewrite my code so the default value is 0 instead of 1 (and reverse all conditions), but I still don't know how to set such a large array to zeroes fast.
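For reference, the packed representation described above usually looks something like this (a sketch of the general technique, not the OP's actual code):

// One bit per flag, 32 flags per element of a Uint32Array.
var BITS = 200000;
var words = new Uint32Array(Math.ceil(BITS / 32));

function setBit(i)   { words[i >>> 5] |= (1 << (i & 31)); }
function clearBit(i) { words[i >>> 5] &= ~(1 << (i & 31)); }
function getBit(i)   { return (words[i >>> 5] >>> (i & 31)) & 1; }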
Edit: adding my code, as requested:
var count = 200000;
var myArr = [];
myArr.length = count;
for (var i = 0; i < count; i++)
    myArr[i] = true;
Could someone show me how to clear a very large array quickly, or is there a faster way to store and operate on large bitfields in JavaScript?
See if this is a faster way to create the array:
var myArray = [true];
var desiredLength = 200000;

while (myArray.length < desiredLength) {
    myArray = myArray.concat(myArray);
}
if (myArray.length > desiredLength) {
    myArray.splice(desiredLength);
}
I've added a few more test cases to the jsperf page that Asad linked in his comment. By far the fastest in my browser (Chrome 23.0.1271.101 on Mac OS X 10.8.2) is this one:
var count = 200000;
var myArr = [];
for (var i = 0; i < count; i++) {
    myArr.push(true);
}
Why pre-fill the array in the first place? Use undefined to your advantage. Remember that undefined acts as a falsy value, so it will behave exactly like 0/false when you do a boolean check.
var myArray = new Array(200000);

if (myArray[1]) {
    // I am a truthy value
} else {
    // I am a falsy value
}
So when you initialize the array this way, there is no reason to pre-fill it! That means no extra processing, and you take advantage of the sparse array.
I really need a master of algorithms here! So the thing is, I've got, for example, an array like this:
[
    [870, 23],
    [970, 78],
    [110, 50]
]
and I want to split it up, so that it looks like this:
// first array
[
[970, 78]
]
// second array
[
[870, 23]
[110, 50]
]
So now, why do I want it to look like this?
Because I want to keep the sums of the sub-values as equal as possible. So 970 is about 870 + 110, and 78 is about 23 + 50.
In this case it's very easy, because if you just split them while looking only at the first sub-value it will already be correct, but I want to check both and keep them as equal as possible, so that it will also work with an array that has 100 sub-arrays! If anyone can tell me an algorithm with which I can program this, that would be really great!
Scales:
~1000 elements (sublists) in the array
Elements are integers up to 10^9
I am looking for a "close enough solution" - it does not have to be the exact optimal solution.
First, as already established, the problem is NP-Hard, via a reduction from the Partition Problem.
Reduction:
Given an instance of the Partition Problem, create lists of size 1 each. The result is exactly this problem.
Conclusion from the above:
This problem is NP-Hard, and there is no known polynomial solution.
Second, any exponential or pseudo-polynomial solution will simply take too long, given the scale of the problem.
Third, that leaves us with heuristics and approximation algorithms.
I suggest the following approach:
Normalize the scales of the sublists, so all the elements will be on the same scale (say, all normalized to the range [-1, 1], or all normalized to a standard normal distribution).
Create a new list in which each element is the sum of the matching sublist in the normalized list.
Use some approximation or heuristic solution developed for the subset-sum / partition problem (a sketch follows below).
The result will not be optimal, but an optimal result is really unattainable here.
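As an illustration of the last step, here is a simple greedy partition heuristic in JavaScript (my own sketch; the answer does not prescribe a specific algorithm, and the normalization step is omitted for brevity):

// Greedy partition heuristic: sort pairs by their combined sum, descending,
// then always add the next pair to the side whose running total is currently smaller.
function splitPairs(pairs) {
    var sorted = pairs.slice().sort(function (a, b) {
        return (b[0] + b[1]) - (a[0] + a[1]);
    });
    var left = [], right = [];
    var leftSum = 0, rightSum = 0;
    sorted.forEach(function (pair) {
        var s = pair[0] + pair[1];
        if (leftSum <= rightSum) {
            left.push(pair);
            leftSum += s;
        } else {
            right.push(pair);
            rightSum += s;
        }
    });
    return [left, right];
}

console.log(splitPairs([[870, 23], [970, 78], [110, 50]]));
// [ [ [ 970, 78 ] ], [ [ 870, 23 ], [ 110, 50 ] ] ]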
From what I gather from the discussion under the original post, you're not searching for a single splitting point, but rather you want to distribute all pairs among two sets, such that the sums in each of the two sets are approximately equal.
Since a close enough solution is acceptable, maybe you could try an approach based on simulated annealing?
(see http://en.wikipedia.org/wiki/Simulated_annealing)
In short, the idea is that you start out by randomly assigning each pair to either the Left or the Right set.
Next, you generate a new state by either
a) moving a randomly selected pair from the Left to the Right set,
b) moving a randomly selected pair from the Right to the Left set, or
c) doing both.
Next, determine if this new state is better or worse than the current state. If it is better, use it. If it is worse, take it only if it is accepted by the acceptance probability function, which is a function that initially allows worse states to be used, but favours them less and less as time moves on (or as the "temperature decreases", in SA terms).
After a large number of iterations (say 100,000), you should have a pretty good result.
Optionally, rerun this algorithm multiple times because it may get stuck in local optima (although the acceptance probability function attempts to counter this).
Advantages of this approach are that it's simple to implement, and that you can decide for yourself how long you want it to continue searching for a better solution.
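Here is a minimal sketch of that idea in JavaScript (my own illustration of the approach described above; it uses a single-flip neighbour move and scores a state by the combined difference of both column totals, which are assumptions about the details):

function anneal(pairs, iterations) {
    // Score a partition: sum of the absolute differences of both column totals.
    function score(assign) {
        var d0 = 0, d1 = 0;
        pairs.forEach(function (p, i) {
            var sign = assign[i] ? 1 : -1;
            d0 += sign * p[0];
            d1 += sign * p[1];
        });
        return Math.abs(d0) + Math.abs(d1);
    }

    // Start from a random assignment: true = Left set, false = Right set.
    var current = pairs.map(function () { return Math.random() < 0.5; });
    var currentScore = score(current);

    for (var k = 0; k < iterations; k++) {
        var temperature = 1 - k / iterations; // cools from 1 down towards 0
        var candidate = current.slice();
        var i = Math.floor(Math.random() * pairs.length);
        candidate[i] = !candidate[i];         // move one pair to the other set

        var candidateScore = score(candidate);
        var accept = candidateScore < currentScore ||
            Math.random() < Math.exp((currentScore - candidateScore) / (temperature + 1e-9));
        if (accept) {
            current = candidate;
            currentScore = candidateScore;
        }
    }
    return current; // one boolean per pair: true = Left, false = Right
}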
I'm assuming that we're just looking for a place in the middle of the array to split it into its first and second part.
It seems like a linear algorithm could do this. Something like this in JavaScript.
// The input from the question; each element is a pair of values.
var arrays = [
    [870, 23],
    [970, 78],
    [110, 50]
];

var arrayLength = 2;
var tolerance = 10;

// Initialize the two sums: firstSum starts at zero, secondSum holds the column totals.
var firstSum = [];
var secondSum = [];
for (var j = 0; j < arrayLength; j++) {
    firstSum[j] = 0;
    secondSum[j] = 0;
    for (var i = 0; i < arrays.length; i++) {
        secondSum[j] += arrays[i][j];
    }
}

// Try splitting at every place in "arrays".
// Try to get the sums as close as possible.
for (var i = 0; i < arrays.length; i++) {
    var goodEnough = true;
    for (var j = 0; j < arrayLength; j++) {
        if (Math.abs(firstSum[j] - secondSum[j]) > tolerance)
            goodEnough = false;
    }
    if (goodEnough) {
        alert("split before index " + i);
        break;
    }
    // Update the sums for the new position.
    for (var j = 0; j < arrayLength; j++) {
        firstSum[j] += arrays[i][j];
        secondSum[j] -= arrays[i][j];
    }
}
Thanks for all the answers. The brute-force approach was a good idea, and NP-Hardness is related to this too, but it turns out that this is a multiple knapsack problem and can be solved using this pdf document.