I have to load a good chunk of data from my API, and I can choose the format in which I receive it. My question is about performance: I want the format that is fastest to load from a query and also fast to read in JavaScript.
I can have a two-dimensional array:
[0][0] = true;
[0][1] = false;
[1][2] = true;
[...]
etc etc..
Or I can have an array of objects:
[
{ x: 0, y: 0, data: true},
{ x: 0, y: 1, data: false},
{ x: 1, y: 2, data: true},
[...]
etc etc..
]
I couldn't find any benchmark comparing the two for a GET request with a huge amount of data. If there is anything anywhere, I would love to read it!
The second part of the question is about reading the data. I will have a loop that needs to get the value for each coordinate.
I assume looking up a coordinate directly in a two-dimensional array would be faster than searching through the objects on every iteration. Or maybe I am wrong?
Which of the two formats would be the fastest to load and read?
Thanks.
For the first part of your question regarding the GET request, I imagine the array would be slightly quicker to load, but depending on your data it could very well be negligible. I'm basing that on the fact that, if you take out the whitespace, the example data you have for each member of the array is 12 bytes, while the example data for the corresponding object is 20 bytes. If that ratio held for your actual data, there would theoretically be only 3/5 as much data to transfer, but unless you're getting a lot of data it's probably not going to make a noticeable difference.
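If you want to sanity-check the transfer size for your own data, a quick (if rough) way is to serialize a sample of each format and compare the lengths. The sample data below is made up purely for illustration:

// Rough size comparison of the two JSON formats (ASCII only, so length ≈ bytes).
const asArray = [
    [true, false],
    [false, true],
];
const asObjects = [
    { x: 0, y: 0, data: true },
    { x: 0, y: 1, data: false },
    { x: 1, y: 0, data: false },
    { x: 1, y: 1, data: true },
];

console.log(JSON.stringify(asArray).length);   // size of the array format
console.log(JSON.stringify(asObjects).length); // size of the object format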
To answer the second part of your question: the performance of any code is going to depend significantly on the details of your specific use case. For most situations, I think the most important point is:
Objects are significantly more readable and user-friendly
That said, when performance/speed is an issue and/or a high priority, which it sounds like could be the case for you, there are definitely things to consider. While it relates to writing data instead of reading it, I found this good comparison of the performance of arrays vs objects that brought up some interesting points. Running the linked tests multiple times using Chrome 45.0.2454.101 32-bit on Windows 7 64-bit, I found these points to generally be true:
Arrays will always be close to the fastest, if not the fastest
If the length of the object is known / can be hard-coded, it's possible to make object performance close to, and sometimes better than, that of arrays
In the test linked above, this code using objects ran at 225 ops/sec in one of my tests:
var sum = 0;
for (var x in obj) {
    sum += obj[x].payload;
}
Compared to this code using arrays that ran at 13,620 ops/sec in the same test:
var sum = 0;
for (var x = 0; x < arr.length; ++x) {
    sum += arr[x].payload;
}
Important to note, however, is that this code using objects with a hard coded length ran at 14,698 ops/sec in the same test, beating each of the above:
var sum = 0;
for (var x = 0; x < 10000; ++x) {
    sum += obj[x].payload;
}
All of that said, what will have the best performance probably depends on your specific use case, but hopefully this gives you some things to consider.
Related
I was just thinking about the following situation:
Let's say we have a line definition like shown below. Here start and end are both points.
let line = {
    start: { x: 0, y: 0 },
    end: { x: 0, y: 0 },
    orientation: 'vertical'
}
Now imagine we have a very large array of lines; how do we save space? I know you can replace the orientation value 'vertical' with an enum. But can you save space on the key names without reducing readability? E.g., you can replace orientation with o, but then it is no longer clear what the key stands for.
Let me know!
If you mean memory usage, JavaScript engines are very clever these days; things like internal lookups for keys, string de-duplication, etc., mean that having short or long key names will have very little effect.
For example, using your data structure above I pushed 1 million records into an array, once with very long key names and once with one-character key names. In both cases the memory usage worked out to about 147 bytes per item. Even using a const for 'vertical' had little effect.
But as you can see, 147 bytes does seem high, so if you wanted to reduce this you would need to use TypedArrays. Unfortunately these can be a little trickier to reason with, but if memory is a concern it might be worth the effort.
If you did use a TypedArray, you could use getters and setters to make this much easier. Doing this, you could get down to maybe 33 bytes per record: four doubles and one byte.
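For illustration, here is a minimal sketch of what that could look like. The class name and layout below are my own assumptions, not something from this answer; each line is packed as four float64 coordinates plus one byte for the orientation:

// Hypothetical sketch: a flat store for lines backed by typed arrays.
// Each line uses 4 doubles (start.x, start.y, end.x, end.y) + 1 byte (orientation).
class LineStore {
    constructor(capacity) {
        this.coords = new Float64Array(capacity * 4);
        this.orientations = new Uint8Array(capacity); // 0 = vertical, 1 = horizontal
    }
    set(i, startX, startY, endX, endY, orientation) {
        this.coords.set([startX, startY, endX, endY], i * 4);
        this.orientations[i] = orientation;
    }
    get(i) {
        const o = i * 4;
        return {
            start: { x: this.coords[o], y: this.coords[o + 1] },
            end: { x: this.coords[o + 2], y: this.coords[o + 3] },
            orientation: this.orientations[i] === 0 ? 'vertical' : 'horizontal',
        };
    }
}

const store = new LineStore(1000000);
store.set(0, 0, 0, 0, 2, 0);
console.log(store.get(0));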
But before doing anything, I would first make sure this sort of optimisation is even necessary. Prematurely optimising JavaScript is a fool's errand.
P.S. I did the memory tests using Node.js, which uses the same JavaScript engine as Chrome (V8).
Also, if you want to play with how memory is affected and you have Node.js installed, here is some example code to get you started.
const oused = process.memoryUsage().heapUsed;
const values = [];
for (let l = 0; l < 1000000; l += 1) {
    values.push({
        start: { x: 0, y: 0 },
        end: { x: 0, y: 0 },
        orientation: "vertical",
    });
}
console.log(process.memoryUsage().heapUsed - oused);
Here's a representation of a line that strikes a balance between usability and memory efficiency. It uses no string identifiers because the positions of the data are encoded into the type. It also uses standard arrays because you didn't specify the size of the Cartesian coordinate plane (so I can't suggest a bounded container like a TypedArray):
TS Playground
type Point = [x: number, y: number];
type Line = [a: Point, b: Point];
const line: Line = [[0, 0], [0, 2]];
// More memory efficient: 2 fewer arrays per line:
type CompactLine = [...a: Point, ...b: Point];
const compactLine: CompactLine = [0, 0, 0, 2];
I suggest you create a map of all the shortened versions of the keys and use the map in parallel with your object. The map can have the shortened object keys as its keys and the full forms as its values.
const map = new Map()
const object = {
    o: 4
} // o means orientation

map.set("o", "orientation")

const mapKeys = Array.from(map.keys())
const readableObj = {}
mapKeys.forEach(key => {
    const readableKey = map.get(key)
    readableObj[readableKey] = object[key]
})

console.log(object)
console.log(readableObj)
In order to solve a coding challenge involving large primes, I'm first trying to create an Array of length 999999999 with every element having the value "true." However, when I try to create this Array, I keep getting the error "FATAL ERROR: invalid table size Allocation failed - JavaScript heap out of memory."
I tried two ways of creating the Array and it happened both times.
let numbers = Array(999999999);
for (let i = 0; i < numbers.length; i++) {
    numbers[i] = true;
}
let numbers = [];
for (let i = 0; i < 999999999; i++) {
    numbers.push(true);
}
I've tried using node --max-old-space-size to increase the memory limit up to 5GB but it still doesn't work.
I've read that the max length of an array in JavaScript is 2^32 - 1 (4,294,967,295), which is significantly higher than 999,999,999, so I am a little confused as to why it isn't working.
I seem to be able to create Array(999999999); the error happens when I try to assign the value true to each element.
Any advice would be much appreciated :)
5 GB (or GiB) is not enough.
A boolean in an array in Node.js needs an average of about 9.7 bytes of memory. So you'd need 999,999,999 * 9.7 ≈ 9.7 GB (~9.03 GiB) of memory, but other things will live in this space too, so to be on the safe side I'd allocate around 11 GiB for it.
Having an array of this size is probably never a great idea though. You should think about a different way to approach your problem. The point of the challenge is probably that you find a smart solution that does not need loops over one billion things.
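That said, if you really do need a flag per number (e.g. for a sieve), one option not covered in this answer is a typed array, which stores a single byte per flag instead of a full JavaScript value, bringing the memory down to roughly 1 GB. A minimal sketch, assuming a Uint8Array where 1 stands for "true":

// Rough sketch (my own suggestion): ~1 byte per flag, so ~1 GB for 999,999,999 entries.
const N = 999999999;
const flags = new Uint8Array(N); // all entries start at 0
flags.fill(1);                   // treat 1 as "true"

console.log(flags[12345] === 1); // true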
Mate, you can think of that like it's an infinite loop.
If you are doing something that takes that much memory, it's a bad idea.
You need to find another approach.
Kind regards
I am trying to build a large array (22,000 elements) of associative array elements in JavaScript. Do I need to worry about the length of the keys with regard to memory usage?
In other words, which of the following options saves memory? Or are they the same in memory consumption?
Option 1:
var student = new Array();
for (i = 0; i < 22000; i++)
    student[i] = {
        "studentName": token[0],
        "studentMarks": token[1],
        "studentDOB": token[2]
    };
Option 2:
var student = new Array();
for (i = 0; i < 22000; i++)
    student[i] = {
        "n": token[0],
        "m": token[1],
        "d": token[2]
    };
I tried to test this in Google Chrome DevTools, but the numbers are too inconsistent to make a decision. My best guess is that because the keys repeat across array elements, the browser can optimize memory usage by not repeating them for each student[i], but that is just a guess.
Edit:
To clarify, the problem is the following: a large array containing many small associative arrays. Does it matter whether I use long or short keys when it comes to memory requirements?
Edit 2:
The 3N array approach suggested in the comments, and which @Joseph Myers is referring to, is creating one array 'var student = []' of size 3*22000, and then using student[0] for the first student's name, student[1] for their marks, student[2] for their DOB, student[3] for the next student's name, and so on.
Thanks.
The difference is insignificant, so the answer is no. This sort of thing would barely even fall under micro-optimization. You should always opt for the most readable solution when facing such dilemmas. The cost of maintaining code from your second option outweighs any performance gain (if any) you could get from it.
What you should do though is use the literal for creating an array.
[] instead of new Array(). (just a side note)
A better approach to solve your problem would probably be to find a way to load the data in parts, implementing some kind of pagination (I assume you're not doing heavy computations on the client).
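As a rough illustration of what loading in parts could look like (the endpoint and its offset/limit parameters below are entirely hypothetical):

// Hypothetical paging sketch: fetch and process students in chunks
// instead of holding all 22,000 records at once.
async function loadStudentsInPages(onPage, pageSize = 1000) {
    for (let offset = 0; ; offset += pageSize) {
        const res = await fetch(`/api/students?offset=${offset}&limit=${pageSize}`);
        const page = await res.json();
        onPage(page);                      // handle one chunk at a time
        if (page.length < pageSize) break; // last page reached
    }
}

loadStudentsInPages(page => console.log("got", page.length, "records"));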
The main analysis of associative arrays' computational cost has to do with performance degradation as the number of elements stored increases, but there are some results available about performance loss as the key length increases.
In Algorithms in C by Sedgewick, it is noted that for some key-based storage systems the search cost does not grow with the key length, and for others it does. All of the comparison-based search methods depend on key length: if two keys differ only in their rightmost bit, then comparing them requires time proportional to their length. Hash-based methods always require time proportional to the key length (in order to compute the hash function).
Of course, the key takes up storage space within the original code and/or at least temporarily in the execution of the script.
The kind of storage used for JavaScript may vary between browsers, but in a resource-constrained environment using smaller keys would have an advantage, though likely still too small an advantage to notice. Surely, though, there are some cases where the advantage would be worthwhile.
P.S. My library just got in two new books that I ordered in December about the latest computational algorithms, and I can check them tomorrow to see if there are any new results about key length impacting the performance of associative arrays / JS objects.
Update: Keys like studentName take 2% longer on a Nexus 7 and 4% longer on an iPhone 5. This is negligible to me. I averaged 500 runs of creating a 30,000-element array with each element containing an object { a: i, b: 6, c: 'seven' } vs. 500 runs using an object { studentName: i, studentMarks: 6, studentDOB: 'seven' }. On a desktop computer, the program still runs so fast that the processor's frequency / number of interrupts, etc., produce varying results and the entire program finishes almost instantly. Once every few runs, the big key size actually goes faster (because other variations in the testing environment affect the result more than 2-4%, since the JavaScript timer is based on clock time rather than CPU time.) You can try it yourself here: http://dropoff.us/private/1372219707-1-test-small-objects-key-size.html
Your 3N array approach (using array[0], array[1], and array[2] for the contents of the first object; and array[3], array[4], and array[5] for the second object, etc.) works much faster than any object method. It's five times faster than the small object method and five times faster plus 2-4% than the big object method on a desktop, and it is 11 times faster on a Nexus 7.
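For reference, a minimal sketch of the 3N layout described above (the token variable is assumed to hold the parsed fields of the current record, as in the question's code):

// Flat 3N layout: three consecutive slots per student instead of one object each.
var student = [];
for (var i = 0; i < 22000; i++) {
    student[3 * i]     = token[0]; // name
    student[3 * i + 1] = token[1]; // marks
    student[3 * i + 2] = token[2]; // DOB
}

// Reading back record i:
// var name  = student[3 * i];
// var marks = student[3 * i + 1];
// var dob   = student[3 * i + 2];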
I really need a master of algorithms here! So the thing is, I have for example an array like this:
[
    [870, 23],
    [970, 78],
    [110, 50]
]
and I want to split it up, so that it looks like this:
// first array
[
    [970, 78]
]
// second array
[
    [870, 23],
    [110, 50]
]
So now, why do I want it to look like this?
Because I want to keep the sums of the sub-values as equal as possible. So 970 is about 870 + 110, and 78 is about 23 + 50.
In this case it's very easy, because if you just split them and only look at the first sub-value it will already be correct, but I want to check both and keep them as equal as possible, so that it also works with an array that has 100 sub-arrays! If anyone can tell me an algorithm with which I can program this, it would be really great!
Scales:
~1000 elements (sublists) in the array
Elements are integers up to 10^9
I am looking for a "close enough solution" - it does not have to be the exact optimal solution.
First, as already established, the problem is NP-hard, with a reduction from the Partition Problem.
Reduction:
Given an instance of the partition problem, create lists of size 1 each. The result is exactly this problem.
Conclusion from the above:
This problem is NP-hard, and there is no known polynomial solution.
Second, any exponential or pseudo-polynomial solution will take just too long to work, due to the scale of the problem.
Third, that leaves us with heuristics and approximation algorithms.
I suggest the following approach:
Normalize the scales of the sublists, so all the elements are on the same scale (say, all normalized to the range [-1,1], or all normalized to a standard normal distribution).
Create a new list in which each element is the sum of the matching sublist in the normalized list.
Use some approximation or heuristic solution that was developed for the subset-sum / partition problem (a sketch of one simple option follows below).
The result will not be optimal, but optimal is really unattainable here.
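For illustration, a minimal sketch of the last two steps using a simple greedy heuristic (my own choice; the answer above doesn't prescribe a specific one). It skips the normalization step, scores each pair by the sum of its values, sorts the pairs by that score in descending order, and always assigns the next pair to the group with the smaller running total:

// Greedy partition heuristic: not optimal, but usually "close enough".
function splitPairs(pairs) {
    // Score each pair by the sum of its values (a crude stand-in for normalization).
    const indexed = pairs
        .map((p) => ({ pair: p, score: p[0] + p[1] }))
        .sort((a, b) => b.score - a.score);

    const groupA = [], groupB = [];
    let sumA = 0, sumB = 0;

    for (const { pair, score } of indexed) {
        if (sumA <= sumB) {
            groupA.push(pair);
            sumA += score;
        } else {
            groupB.push(pair);
            sumB += score;
        }
    }
    return [groupA, groupB];
}

console.log(splitPairs([[870, 23], [970, 78], [110, 50]]));
// One possible result: [[[970, 78]], [[870, 23], [110, 50]]]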
From what I gather from the discussion under the original post, you're not searching for a single splitting point; rather, you want to distribute all pairs among two sets such that the sums in each of the two sets are approximately equal.
Since a close enough solution is acceptable, maybe you could try an approach based on simulated annealing?
(see http://en.wikipedia.org/wiki/Simulated_annealing)
In short, the idea is that you start out by randomly assigning each pair to either the Left or the Right set.
Next, you generate a new state by either
a) moving a randomly selected pair from the Left to the Right set,
b) moving a randomly selected pair from the Right to the Left set, or
c) doing both.
Next, determine if this new state is better or worse than the current state. If it is better, use it. If it is worse, take it only if it is accepted by the acceptance probability function, which is a function that initially allows worse states to be used, but favours them less and less as time moves on (or as the "temperature" decreases, in SA terms).
After a large number of iterations (say 100,000), you should have a pretty good result.
Optionally, rerun this algorithm multiple times because it may get stuck in local optima (although the acceptance probability function attempts to counter this).
Advantages of this approach are that it's simple to implement and you can decide for yourself how long you want it to continue searching for a better solution.
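For the curious, here is a rough sketch of that idea in JavaScript. The cost function, the cooling schedule, and the constants are all my own assumptions, not something prescribed above:

// Simulated annealing sketch: split pairs into two sets with similar column sums.
function annealSplit(pairs, iterations = 100000) {
    // assignment[i] === 0 -> Left set, 1 -> Right set
    let assignment = pairs.map(() => (Math.random() < 0.5 ? 0 : 1));

    // Cost: how far apart the two sets are, summed over both columns.
    function cost(assign) {
        const sums = [[0, 0], [0, 0]]; // sums[set][column]
        pairs.forEach((p, i) => {
            sums[assign[i]][0] += p[0];
            sums[assign[i]][1] += p[1];
        });
        return Math.abs(sums[0][0] - sums[1][0]) + Math.abs(sums[0][1] - sums[1][1]);
    }

    let currentCost = cost(assignment);

    for (let k = 0; k < iterations; k++) {
        const temperature = 1 - k / iterations; // simple linear cooling schedule
        const candidate = assignment.slice();
        const i = Math.floor(Math.random() * pairs.length);
        candidate[i] = 1 - candidate[i]; // move one randomly selected pair to the other set

        const candidateCost = cost(candidate);
        const delta = candidateCost - currentCost;

        // Always accept improvements; accept worse states with a probability
        // that shrinks as the temperature decreases.
        if (delta < 0 || Math.random() < Math.exp(-delta / (temperature * 1000 + 1e-9))) {
            assignment = candidate;
            currentCost = candidateCost;
        }
    }

    return {
        left: pairs.filter((_, i) => assignment[i] === 0),
        right: pairs.filter((_, i) => assignment[i] === 1),
    };
}

console.log(annealSplit([[870, 23], [970, 78], [110, 50]]));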
I'm assuming that we're just looking for a place in the middle of the array to split it into its first and second part.
It seems like a linear algorithm could do this. Something like this in JavaScript.
// "arrays" is the input list of pairs from the question.
var arrayLength = 2;
var tolerance = 10;

// Initialize the two sums: firstSum starts empty, secondSum starts with everything.
var firstSum = [];
var secondSum = [];
for (var j = 0; j < arrayLength; j++)
{
    firstSum[j] = 0;
    secondSum[j] = 0;
    for (var i = 0; i < arrays.length; i++)
    {
        secondSum[j] += arrays[i][j];
    }
}

// Try splitting at every place in "arrays".
// Try to get the sums as close as possible.
for (var i = 0; i < arrays.length; i++)
{
    var goodEnough = true;
    for (var j = 0; j < arrayLength; j++)
    {
        if (Math.abs(firstSum[j] - secondSum[j]) > tolerance)
            goodEnough = false;
    }
    if (goodEnough)
    {
        alert("split before index " + i);
        break;
    }
    // Update the sums for the new position.
    for (var j = 0; j < arrayLength; j++)
    {
        firstSum[j] += arrays[i][j];
        secondSum[j] -= arrays[i][j];
    }
}
Thanks for all the answers. The brute-force approach was a good idea, and the NP-hardness observation is related to this too, but it turns out that this is a multiple knapsack problem and can be solved using this pdf document.
Currently I am aggregating a big amount of data on a daily basis, and for each day I am calculating the median of the current values. Now I need to aggregate all these daily results on a monthly basis, and of course I need to calculate the median again.
Is there a way to calculate a median of medians and have it be statistically correct? I want to avoid using the raw data again, because there is a huge amount of it :)
As a small proof of concept I made this JavaScript; maybe it helps to find a way:
var aSortedNumberGroups = [];
var aSortedNumbers = [];
var aMedians = [];

Math.median = function(aData)
{
    // Assumes aData is already sorted.
    var fMedian = 0;
    var iIndex = Math.floor(aData.length / 2);

    if (!(aData.length % 2)) {
        fMedian = (aData[iIndex - 1] + aData[iIndex]) / 2;
    } else {
        fMedian = aData[iIndex];
    }

    return fMedian;
};

for (var iCurrGroupNum = 0; iCurrGroupNum < 5; ++iCurrGroupNum) {
    var aCurrNums = [];
    for (var iCurrNum = 0; iCurrNum < 1000; ++iCurrNum) {
        var iCurrRandomNumber = Math.floor(Math.random() * 10001);
        aCurrNums.push(iCurrRandomNumber);
        aSortedNumbers.push(iCurrRandomNumber);
    }
    aCurrNums.sort(function(iNumA, iNumB) {
        return iNumA - iNumB;
    });
    aSortedNumberGroups.push(aCurrNums);
    aMedians.push(Math.median(aCurrNums));
}

// Sort the combined list as well, so its median is computed on sorted data.
aSortedNumbers.sort(function(iNumA, iNumB) {
    return iNumA - iNumB;
});

console.log("Medians of each group: " + JSON.stringify(aMedians, null, 4));
console.log("Median of medians: " + Math.median(aMedians));
console.log("Median of all: " + Math.median(aSortedNumbers));
As you will see, there is often a huge gap between the median of all raw numbers and the median of medians, and I would like to have them pretty close to each other.
Thanks a lot!
You don't actually "calculate" a median, you "discover" it through redistribution into subsets. The only optimization for this is a reloadable "tick chart" or running tally: e.g. store each value together with the number of times it occurred. This way you can recreate the distribution without actually having to reparse the raw data. It is only a small optimization, but depending on how repetitive the data set in question is, you could save yourself tons of MB and at the very least a bunch of processor cycles.
Think of it in JSON: { '1': 3, '5': 12, '7': 4 }, meaning '1' has occurred 3 times, '5' has occurred 12 times, etc.
Then persist those counts, starting at the beginning of the time period for which you want a median.
Hope this helps -ck
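A minimal sketch of that idea (the function names are mine): build a count map for each day, merge the days' maps into one monthly map, and then walk the merged counts to find the median without touching the raw values again.

// Build a running tally: value -> number of occurrences.
function tally(values, counts = {}) {
    for (const v of values) {
        counts[v] = (counts[v] || 0) + 1;
    }
    return counts;
}

// Compute the median from a tally instead of from raw data.
function medianFromTally(counts) {
    const values = Object.keys(counts).map(Number).sort((a, b) => a - b);
    const total = values.reduce((sum, v) => sum + counts[v], 0);

    let seen = 0;
    for (let i = 0; i < values.length; i++) {
        seen += counts[values[i]];
        if (seen > total / 2) return values[i];
        if (seen === total / 2) {
            // Even count: average this value with the next distinct one.
            return (values[i] + values[i + 1]) / 2;
        }
    }
}

// Usage: merge each day's tally into one monthly tally, then take the median.
const monthly = {};
tally([1, 5, 5, 7], monthly); // day 1
tally([5, 5, 1, 7], monthly); // day 2
console.log(medianFromTally(monthly)); // 5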
No, unfortunately there is not a way to calculate the median based on medians of subsets of the whole and still be statistically accurate. If you wanted to calculate the mean, however, you could use the means of subsets, given that they are of equal size.
ck's optimization above could be of assistance to you.
Yet another approach is to take each day's data, parse it, and store it in sorted order. For a given day you can just look at the median piece of data and you've got your answer.
At the end of the month you can do a quick-select to find the median. You can take advantage of the sorted order of each day's data to do a binary search to split it. The result is that your end of month processing will be very, very quick.
The same kind of data, organized in the same kind of way, will also let you do various percentiles very cheaply. The only hard part is extracting each day's raw data and sorting it.
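For reference, a minimal quickselect sketch (the helper names are my own), which you could run over the concatenation of the month's already-sorted daily arrays to pull out the median without a full re-sort:

// Quickselect: find the k-th smallest element (0-based) in expected O(n) time.
function quickselect(arr, k) {
    const a = arr.slice();
    let lo = 0, hi = a.length - 1;

    while (lo < hi) {
        const pivot = a[lo + Math.floor(Math.random() * (hi - lo + 1))];
        let i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                [a[i], a[j]] = [a[j], a[i]];
                i++;
                j--;
            }
        }
        if (k <= j) hi = j;
        else if (k >= i) lo = i;
        else return a[k]; // k lies between j and i, so a[k] is already in place
    }
    return a[lo];
}

// Usage: days is an array of per-day sorted arrays.
const days = [[1, 3, 9], [2, 2, 8], [4, 5, 7]];
const all = [].concat(...days);
console.log(quickselect(all, Math.floor(all.length / 2))); // 4 (the median)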
I know this is a very dated thread, but future readers may find Tukey's Ninther method quite relevant ... analysis here: http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/
-kg