JavaScript array performance crash even with smaller chunks

I have a big (but not huge) array of strings (between 1,000 and 5,000 individual strings). I want to perform some calculations and other work on these strings. Because processing always stopped working when I dealt with the one big array, I rewrote my function to recursively fetch smaller chunks (currently 50 elements). I did this using splice because I thought it would be a nice idea to reduce the size of the big array step by step.
After implementing the "chunk" version, I'm now able to process up to about 2,000 string elements (above that, my laptop becomes extremely slow and crashes after a while).
The question: why does it still crash, even though I'm no longer processing the huge array, just small chunks in succession?
Thanks in advance.
var file = [ /* ... */ ]; // the array of lines
var portionSize = 50;     // the size of the chunks

// this function is called recursively
function convertStart(i) {
  var size = file.length;
  chunk = file.splice(0, portionSize);
  portionConvert(chunk, i);
}

// this function is used for calculating things with the strings
function portionConvert(chunk, istart) {
  for (var i = 0; i < portionSize; i++) {
    // doing some string calculation with the smaller chunk here
  }
  istart += 1;
  convertStart(istart); // recall the function with the next chunk
}

From my experience, the amount of recursion you're doing can "exceed the stack" unless you narrow down the input, which is why you were able to do more with less. Keep in mind that for every new function call, the state of the function at the call site is saved in RAM. If your computer has little RAM, it's going to get clogged up.
If you're having a processing problem, you should switch to a loop version. Loops don't progressively save the state of the function, just the values. Typically, I would leave recursion for smaller jobs like processing tree-like object structures or parsing expressions; situations where it is natural to "go deeper" into something. In the case where you just have one long array, I would process each of the elements with forEach, which is a for loop in a handy wrapper:
file.forEach(function(arrayElement) {
  // doing some string calculation with the element (arrayElement) here
});
Take a look at forEach here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach
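If you do want to keep the idea of working in chunks, you can also do it without recursion at all. A minimal iterative sketch, assuming the per-string work lives in a placeholder function convertLine:

// Process the lines chunk by chunk without recursion; `convertLine`
// is a placeholder for whatever per-string work you need to do.
function convertAll(file, portionSize) {
  for (var start = 0; start < file.length; start += portionSize) {
    var chunk = file.slice(start, start + portionSize); // slice leaves `file` intact
    for (var i = 0; i < chunk.length; i++) {
      convertLine(chunk[i]);
    }
  }
}

Using slice instead of splice avoids mutating the original array, and because this is a plain loop the call stack never grows, no matter how many chunks there are.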

Related

Are two for loops slower than a bigger one?

Which piece of code is faster?
Number 1:
for (var i = 0; i < 50; i++) {
  // run code in here
}
for (var i = 0; i < 50; i++) {
  // run more code in here
}
Or number 2:
for (var i = 0; i < 100; i++) {
  // run all the code in here
}
Thanks!
As already pointed out in another answer, both versions show the same O(N) scaling behaviour, both with respect to whatever happens in the loop body and with respect to scaling the loop lengths (50 and 100, respectively). The point is usually the proportionality constant that accompanies the scaling term (the c in c·N).
On many (most) real CPU systems used for performance-relevant computation, there are caches and pipelines for the data manipulated inside the loops. The answer to the question then depends on the details of your loop bodies: will all the data read and written in the loops fit into some level of cache, or will the second 50-iteration loop miss all existing cache entries and fetch its data from memory again? In addition, branch prediction (for the loop exit/repeat branch as well as for branches inside the loop) has a complicated influence on actual performance.
Taking all the relevant details into account exactly is a field of computational science in its own right. One should analyse the concrete example (what do the loops actually do?) and, before that, ask whether this loop is performance-relevant at all.
Some heuristics may nevertheless be helpful:
If i is an iterator over data (and not merely a repetition counter), the two 1..50 loops might be working on the same data.
If every element can be handled by both loop bodies in a single pass (which only works if the second loop body does not depend on the state of other elements after the first loop), it is usually more efficient to touch each index only once, as sketched below.
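As a rough illustration of that last point, here is a minimal fusion sketch; stepA and stepB are placeholders for two independent per-element operations:

// Two passes over the same indices...
for (var i = 0; i < 50; i++) { stepA(i); }
for (var i = 0; i < 50; i++) { stepB(i); }

// ...can often be fused into one pass, touching each index
// (and whatever data it refers to) only once:
for (var i = 0; i < 50; i++) {
  stepA(i);
  stepB(i);
}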
It would depend on the logic inside them (nested loops, etc.). Theoretically they'd run the same, as they're both linear (both perform 100 iterations in total), so the big-O time complexity is O(N), where N is the total number of iterations.

Do I get the number of operations per second this way?

Look at this code:
function wait(time) {
  let i = 0;
  let a = Date.now();
  let x = a + (time || 0);
  let b;
  while ((b = Date.now()) <= x) ++i;
  return i;
}
If I run it in a browser (specifically Google Chrome, but I don't think it matters) as wait(1000), the machine will obviously freeze for a second and then return the value that i has counted up to.
Let's say it is 10,000,000 (I'm getting values close to that). The value varies on every run, so let's take an average.
Did I just get the current number of operations per second of the processor in my machine?
Not at all.
What you get is the number of loop cycles completed by the JavaScript process in a certain amount of time. Each loop cycle consists of:
Creating a new Date object
Comparing two Date objects
Incrementing a Number
Incrementing the Number variable i is probably the least expensive of these, so the function mostly measures the cost of creating and comparing Date objects rather than the cost of the increment itself.
Aside from that, note that the machine is doing a lot more than running one JavaScript process; you will see interference from all sorts of other activity going on in the computer at the same time.
When running inside a JavaScript process, you're simply too far away from the processor (in terms of software layers) to make that measurement. Beneath JavaScript there are the browser and the operating system, each of which can (and will) make decisions that affect this result.
No. You can get a rough number of language-level operations per second, but the actual number of machine operations per second on a whole processor is more complicated.
Firstly, the processor is not dedicated to the browser, so it is constantly switching back and forth between prioritized processes. On top of that, memory access is hidden from you, and the processor spends extra operations managing memory (page flushing, etc.) in ways that are not transparent to you at any given time. And on top of that, physical constraints mean that the real clock rate of the processor is dynamic. You can see it's pretty complicated already ;)
To really calculate the number of machine operations per second you would need to measure the processor's clock rate and multiply it by the number of instructions per cycle the processor can execute. Both vary, but the manufacturer's specs will likely be a good enough estimate :P
If you wanted to measure this with a program, you'd need to somehow dedicate 100% of the processor to it and have it run a predictable set of instructions with no other hold-ups (like memory management). You would also have to account for the instructions needed to load the program itself into the code caches. This is not really feasible, however.
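As a rough back-of-the-envelope illustration of that multiplication (the clock rate and instructions-per-cycle figures below are made-up example numbers, not measurements of any real machine):

// Purely illustrative figures
var clockHz = 3.0e9;           // assume a 3 GHz clock
var instructionsPerCycle = 4;  // assume a superscalar core retiring ~4 instructions per cycle

var peakInstructionsPerSecond = clockHz * instructionsPerCycle;
console.log(peakInstructionsPerSecond); // 12000000000, i.e. ~1.2e10 per core at peak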
As others have pointed out, this will not tell you how many operations the processor performs per second, for the reasons given in the other answers. I do, however, think a similar experiment could estimate the number of operations executed by the JavaScript interpreter running in your browser. For example, given a function factorial(n) that runs in O(n), you could execute factorial(100) repeatedly over the course of a minute:
// A simple O(n) factorial, as described above.
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

function test() {
  let start = Date.now();
  let end = start + 60 * 1000;   // run for one minute
  let numberOfExecutions = 0;
  while (Date.now() < end) {
    factorial(100);
    numberOfExecutions++;
  }
  // roughly 100 operations per call, spread over 60 seconds
  return numberOfExecutions * 100 / 60;
}
The idea here is that factorial is by far the most time-consuming part of the code, and since factorial runs in O(n), we know factorial(100) is roughly 100 operations. Note that this will not be exact, and larger inputs will give better approximations. Also remember that this estimates the number of operations executed by your interpreter, not by your processor.
There is a lot of truth in all the previous answers, but I want to invert the reasoning a little, because I believe it is easier to understand it that way.
I believe the fairest way to measure this is with the most basic loop possible, not relying on any dates or extra function calls inside it, and doing the arithmetic afterwards.
You will see that the smaller the run, the larger the share taken by the initial overhead: it takes a small, fixed amount of time to start and finish each run, but beyond a certain size they all converge on a number that can reasonably be treated as how many loop iterations per second JavaScript can run.
My example:
const oneMillion = 1_000_000;
const tenMillion = 10_000_000;
const oneHundredMillion = 100_000_000;
const oneBillion = 1_000_000_000;
const tenBillion = 10_000_000_000;
const oneHundredBillion = 100_000_000_000;
const oneTrillion = 1_000_000_000_000;

function runABunchOfTimes(times) {
  console.time('timer');
  for (let i = 0; i < times; ++i) {}
  console.timeEnd('timer');
}
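For example, you would call it with one of the constants above and divide the iteration count by the time the console reports (the numbers you get will of course depend entirely on your machine):

runABunchOfTimes(oneBillion);
// the console prints something like "timer: <elapsed> ms";
// dividing 1_000_000_000 by (<elapsed> / 1000) gives iterations per second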
I tried this on a 2020 MacBook that already had a heavy load on it, with many processes running. At the very end I take the time the console reports and divide the number of runs by it. The oneTrillion and oneBillion runs come out virtually the same, but at oneMillion and 1,000 you can see they are not as performant, because the fixed cost of setting up the for loop dominates.
We usually try to stay away from O(n^2) and slower algorithms exactly because we do not want to hit that ceiling. If you were to perform a find inside a map over an array of all the cities in the world (around 10_000 according to Google; I haven't counted), you would already be at 100_000_000 iterations, and those iterations would certainly not be as cheap as iterating over an empty body as in my example. Your code would then take minutes to run, but I am sure you are aware of this, and that is why you posted the question in the first place.
Calculating how long it would take is tricky, not only because of the above but also because you cannot predict which device will run your function. Nowadays I can open a page on my TV, my watch, or a Raspberry Pi, and none of them would be nearly as fast as the computer I was using when writing these functions. But if I were to benchmark a device, I would use something like the function above, since it is the simplest loop operation I could think of.

Are there ideal array sizes in JavaScript?

I've seen little utility routines in various languages that, for a desired array capacity, will compute an "ideal size" for the array. These routines are typically used when it's okay for the allocated array to be larger than the capacity. They usually work by computing an array length such that the allocated block size (in bytes) plus a memory allocation overhead is the smallest exact power of 2 needed for a given capacity. Depending on the memory management scheme, this can significantly reduce memory fragmentation as memory blocks are allocated and then freed.
JavaScript allows one to construct arrays with predefined length. So does the concept of "ideal size" apply? I can think of four arguments against it (in no particular order):
JS memory management systems work in a way that would not benefit from such a strategy
JS engines already implement such a sizing strategy internally
JS engines don't really keep arrays as contiguous memory blocks, so the whole idea is moot (except for typed arrays)
The idea applies, but memory management is so engine-dependent that no single "ideal size" strategy would be workable
On the other hand, perhaps all of those arguments are wrong and a little utility routine would actually be effective (as in: make a measurable difference in script performance).
So: Can one write an effective "ideal size" routine for JavaScript arrays?
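For reference, a minimal sketch of the kind of routine meant above; the per-element byte size and the allocation overhead are illustrative assumptions, since JavaScript exposes neither:

// Hypothetical "ideal size" helper: round the requested capacity up so that
// (elements * bytesPerElement + overhead) exactly fills a power-of-two block.
// Both byte figures are assumptions for illustration only.
function idealCapacity(capacity, bytesPerElement, overheadBytes) {
  bytesPerElement = bytesPerElement || 8;
  overheadBytes = overheadBytes || 16;
  var needed = capacity * bytesPerElement + overheadBytes;
  var block = 1;
  while (block < needed) block *= 2;  // smallest power of two that fits
  return Math.floor((block - overheadBytes) / bytesPerElement);
}

// idealCapacity(1000) -> 1022, i.e. a capacity that exactly fills an 8 KiB block

With that picture in mind, the question is whether such a computation buys anything for ordinary JavaScript arrays.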
Arrays in JavaScript are at their core objects; they merely act like arrays through an API. Initializing an array with a length argument merely sets the length property to that value.
If the only argument passed to the Array constructor is an integer between 0 and 2^32 − 1 (inclusive), this returns a new JavaScript array with length set to that number. (Array, MDN)
Also, there is no separate array "type": an array is of type Object, specifically an Array object (ECMAScript 5.1).
As a result, there will be no difference in memory usage between using
var one = new Array();
var two = new Array(1000);
aside from the length property. When tested in a loop using Chrome's memory timeline, this checks out as well: creating 1,000 of each results in roughly 2.2 MB of allocation on my machine.
You'll have to measure performance, because there are too many moving parts: the VM, the engine, and the browser; then the virtual memory (the platform, Windows/Linux, the physically available memory, and the mass storage devices, HD/SSD); and, obviously, the current load (other open web pages or, on the server side, other applications).
I see little use in such an effort. Any size that is ideal for performance may no longer be ideal once another tab loads in the browser or the page is loaded on another machine.
The best thing to improve here is development time: write less and deploy your website quicker.
I know this question and its answers were about memory usage. But although there might be no difference in allocated memory between the two constructor calls (with and without the size parameter), there is a difference in performance when filling the array. The Chrome engine obviously performs some pre-allocation, as suggested by this code run in the Chrome profiler:
<html>
  <body>
    Array performance test
    <script>
      function preAlloc() {
        var a = new Array(100000);
        for (var i = 0; i < a.length; i++) {
          a[i] = i;
        }
      }

      function noAlloc() {
        var a = [];
        var length = 100000;
        for (var i = 0; i < length; i++) {
          a[i] = i;
        }
      }

      function repeat(func, count) {
        var i = 0;
        while (i++ < count) {
          func();
        }
      }
    </script>
    <script>
      // 2413 ms scripting
      repeat(noAlloc, 10000);
      repeat(preAlloc, 10000);
    </script>
  </body>
</html>
The profiler shows that the function without the size parameter took 28 s to allocate and fill a 100,000-item array 1,000 times, while the function with the size parameter in the Array constructor took under 7 seconds.

Can reducing index length in Javascript associative array save memory

I am trying to build a large Array (22,000 elements) of Associative Array elements in JavaScript. Do I need to worry about the length of the indices with regards to memory usage?
In other words, which of the following options saves memory? or are they the same in memory consumption?
Option 1:
var student = new Array();
for (i = 0; i < 22000; i++)
  student[i] = {
    "studentName": token[0],
    "studentMarks": token[1],
    "studentDOB": token[2]
  };
Option 2:
var student = new Array();
for (i = 0; i < 22000; i++)
  student[i] = {
    "n": token[0],
    "m": token[1],
    "d": token[2]
  };
I tried to test this in Google Chrome DevTools, but the numbers are too inconsistent to base a decision on. My best guess is that because the property names repeat, the browser can optimize memory usage by not storing them again for each student[i], but that is just a guess.
Edit:
To clarify, the problem is the following: a large array containing many small associative arrays. Does using long keys or short keys matter for the memory requirements?
Edit 2:
The 3N array approach suggested in the comments, and which #Joseph Myers refers to, is to create one array, var student = [], of size 3 * 22000, and then use student[0] for name, student[1] for marks, student[2] for DOB, and so on for the following students.
Thanks.
The difference is insignificant, so the answer is no. This sort of thing barely even qualifies as micro-optimization. You should always opt for the most readable solution when facing such dilemmas; the cost of maintaining code like your second option outweighs any performance gain (if any) you could get from it.
What you should do, though, is use the array literal when creating an array:
[] instead of new Array() (just a side note).
A better approach to your problem would probably be to find a way to load the data in parts, implementing some kind of pagination (I assume you're not doing heavy computations on the client); a minimal sketch follows.
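For instance, a client-side sketch along these lines; allTokens is assumed to hold the raw rows (one token array per student), and only one page of student objects is materialised at a time:

// Hypothetical paging helper: build objects only for the page being shown.
function getPage(allTokens, page, pageSize) {
  return allTokens
    .slice(page * pageSize, (page + 1) * pageSize)
    .map(function (token) {
      return {
        studentName: token[0],
        studentMarks: token[1],
        studentDOB: token[2]
      };
    });
}

// e.g. render getPage(allTokens, 0, 100) first, and build the next page on demand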
Most analysis of associative arrays' computational cost concerns how performance degrades as the number of stored elements increases, but there are some results about performance loss as key length increases.
In Algorithms in C, Sedgewick notes that for some key-based storage systems the search cost does not grow with key length, and for others it does. All comparison-based search methods depend on key length: if two keys differ only in their rightmost bit, then comparing them requires time proportional to their length. Hash-based methods always require time proportional to the key length (in order to compute the hash function).
Of course, the key takes up storage space within the original code and/or at least temporarily in the execution of the script.
The kind of storage used for JavaScript objects may vary between browsers, but in a resource-constrained environment using smaller keys would have an advantage, though likely one still too small to notice; surely, though, there are cases where the advantage would be worthwhile.
P.S. My library just got in two new books that I ordered in December about the latest computational algorithms, and I can check them tomorrow to see if there are any new results about key length impacting the performance of associative arrays / JS objects.
Update: Keys like studentName take 2% longer on a Nexus 7 and 4% longer on an iPhone 5. This is negligible to me. I averaged 500 runs of creating a 30,000-element array with each element containing an object { a: i, b: 6, c: 'seven' } vs. 500 runs using an object { studentName: i, studentMarks: 6, studentDOB: 'seven' }. On a desktop computer, the program still runs so fast that the processor's frequency / number of interrupts, etc., produce varying results and the entire program finishes almost instantly. Once every few runs, the big key size actually goes faster (because other variations in the testing environment affect the result more than 2-4%, since the JavaScript timer is based on clock time rather than CPU time.) You can try it yourself here: http://dropoff.us/private/1372219707-1-test-small-objects-key-size.html
Your 3N array approach (using array[0], array[1], and array[2] for the contents of the first object, array[3], array[4], and array[5] for the second object, and so on) works much faster than either object method. It's five times faster than the short-key object method, and five times plus 2-4% faster than the long-key object method, on a desktop, and it is 11 times faster on a Nexus 7.
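For concreteness, a minimal sketch of that 3N layout next to the object-per-student layout; as in the question, token stands for the current student's raw values:

// Object-per-student layout (as in the question):
var students = [];
for (var i = 0; i < 22000; i++) {
  students[i] = { n: token[0], m: token[1], d: token[2] };
}

// 3N flat layout: one array, three consecutive slots per student.
var flat = [];
for (var j = 0; j < 22000; j++) {
  flat[3 * j]     = token[0]; // name
  flat[3 * j + 1] = token[1]; // marks
  flat[3 * j + 2] = token[2]; // DOB
}
// Student i's marks are then read as flat[3 * i + 1].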

Performance of assigning values to array

It is often said here on SO that profiling is the first step in optimizing JavaScript, and the suggested tools are the profilers in Chrome and Firefox. The problem with those is that they report the time spent in each function in a way I find hard to interpret, and I have not gained a good understanding from them. The most helpful thing would be a profiler that tells how many times each row is executed and, if at all possible, the time spent on each row. That way it would be possible to pinpoint the bottlenecks exactly. But until such a tool is implemented or found, we have two options:
1) build our own counter which records both the time and the number of times a certain code block or row is executed
2) learn which methods are slow and which are not
For option 2, jsperf.com is a great help. I have tried to learn how to optimize arrays and made a speed test on JSPERF.COM. The results, measured in the 5 main browsers, revealed some bottlenecks I did not know about earlier.
The main findings were:
1) Assigning values to array elements is significantly slower than assigning to normal variables, regardless of which assignment method is used.
2) Preinitializing and/or prefilling an array before performance-critical loops can improve speed significantly.
3) Math trigonometric functions are not that slow compared to pushing values into arrays(!)
Here are the explanations of every test:
1. non_array (100%):
The variables were given a predefined value this way:
var non_array_0=0;
var non_array_1=0;
var non_array_2=0;
...
and in the timed region they were assigned this way:
non_array_0=0;
non_array_1=1;
non_array_2=2;
non_array_3=3;
non_array_4=4;
non_array_5=5;
non_array_6=6;
non_array_7=7;
non_array_8=8;
non_array_9=9;
The above is an array-like set of variables, but there seems to be no way to iterate over them or refer to them programmatically the way you can with an array. Or is there?
Nothing in this test is faster than assigning a number to a variable.
2. non_array_non_pre (83.78%)
Exactly the same as test 1, but the variables were neither pre-initialized nor prefilled. The speed is 83.78% of the speed of test 1. In every tested browser the prefilled variables were faster than the non-prefilled ones. So initialize (and possibly prefill) variables outside any speed-critical loops.
The test code is here:
var non_array_non_pre_0=0;
var non_array_non_pre_1=0;
var non_array_non_pre_2=0;
var non_array_non_pre_3=0;
var non_array_non_pre_4=0;
var non_array_non_pre_5=0;
var non_array_non_pre_6=0;
var non_array_non_pre_7=0;
var non_array_non_pre_8=0;
var non_array_non_pre_9=0;
3. pre_filled_array (19.96 %):
Arrays are evil! When we throw away normal variables (tests 1 and 2) and bring arrays into the picture, the speed decreases significantly. Even with all optimizations applied (preinitializing and prefilling the array) and values assigned directly without looping or pushing, the speed drops to 19.96 percent. This is very sad and I really don't understand why it happens. This was one of the main shocks of this test for me: arrays are so important, and I have not found a way to do many things without them.
The test data is here:
pre_filled_array[0]=0;
pre_filled_array[1]=1;
pre_filled_array[2]=2;
pre_filled_array[3]=3;
pre_filled_array[4]=4;
pre_filled_array[5]=5;
pre_filled_array[6]=6;
pre_filled_array[7]=7;
pre_filled_array[8]=8;
pre_filled_array[9]=9;
4. non_pre_filled_array (8.34%):
This is the same as test 3, but the array members were neither preinitialized nor prefilled; the only optimization was to create the array beforehand: var non_pre_filled_array = [];
The speed decreases by 58.23% compared to the preinitialized test 3. So preinitializing and/or prefilling the array more than doubles the speed.
The test code is here:
non_pre_filled_array[0]=0;
non_pre_filled_array[1]=1;
non_pre_filled_array[2]=2;
non_pre_filled_array[3]=3;
non_pre_filled_array[4]=4;
non_pre_filled_array[5]=5;
non_pre_filled_array[6]=6;
non_pre_filled_array[7]=7;
non_pre_filled_array[8]=8;
non_pre_filled_array[9]=9;
5. pre_filled_array[i] (7.10%):
Now to the loops. This was the fastest looping method in the test. The array was preinitialized and prefilled.
The speed drop compared to the inline version (test 3) is 64.44%. This is such a remarkable difference that I would say: do not loop if you do not need to. If the array size is small (how small has to be tested case by case), using inline assignments instead of a loop is wiser.
And because the speed drop is so huge and we really do need loops, it is wise to look for a better looping method (e.g. while (i--)).
The test code is here:
for (var i = 0; i < 10; i++) {
  pre_filled_array[i] = i;
}
6. non_pre_filled_array[i] (5.26%):
If we do not preinitialize and prefill the array, the speed decreases by a further 25.96%. Again, preinitializing and/or prefilling before speed-critical loops is wise.
The code is here:
for (var i = 0; i < 10; i++) {
  non_pre_filled_array[i] = i;
}
7. Math calculations (1.17%):
Every test needs a reference point, and mathematical functions are generally considered slow. The test consisted of ten "heavy" Math calls, and here comes the other thing that struck me: look at the speeds of tests 8 and 9, where ten integers are pushed into an array in a loop. Calculating these ten Math functions is more than 30% faster than pushing ten integers into an array in a loop. So it may be easier to convert some array pushes into preinitialized non-array variables and keep the trigonometry.
Of course, if there are hundreds or thousands of calculations per frame, it is wise to use e.g. sqrt instead of sin/cos/tan, taxicab distances for distance comparisons, and diamond angles (t-radians) for angle comparisons, but the main bottleneck can still be elsewhere: looping is slower than inlining, pushing is slower than direct assignment with preinitialization and/or prefilling, and code logic, drawing algorithms and DOM access can all be slow. Not everything can be optimized in JavaScript (we have to see something on the screen!), but everything easy and significant is worth doing. Someone here on SO has said that code is for humans and that readable code is more essential than fast code, because maintenance is the biggest cost. That is an economic viewpoint, but I have found that optimization can achieve both elegance/readability and performance. And if a 5% performance boost is achieved and the code is more straightforward, it is a good feeling!
The code is here:
non_array_0=Math.sqrt(10435.4557);
non_array_1=Math.atan2(12345,24869);
non_array_2=Math.sin(35.345262356547);
non_array_3=Math.cos(232.43575432);
non_array_4=Math.tan(325);
non_array_5=Math.asin(3459.35498534536);
non_array_6=Math.acos(3452.35);
non_array_7=Math.atan(34.346);
non_array_8=Math.pow(234,222);
non_array_9=9374.34524/342734.255;
8. pre_filled_array.push(i) (0.8%):
Push is evil! Push combined with a loop is baleful evil! For some reason this is a very slow way to assign values into an array. Test 5 (direct assignment in a loop) is nearly 9 times faster, even though both methods do exactly the same thing: assign the integers 0-9 into a preinitialized and prefilled array. I have not tested whether this push-in-a-loop evilness is due to the pushing, the looping, the combination of both, or the loop count. There are other examples on JSPERF.COM that give conflicting results. It is wiser to test with your actual data and decide from that; this test may not transfer to other data than what was used here.
And here is the code:
for (var i = 0; i < 10; i++) {
  pre_filled_array.push(i);
}
9. non_pre_filled_array.push(i) (0.74%):
The last and slowest method in the test is the same as test 8, but the array is not prefilled. It is a little slower than test 8, though the difference is not significant (7.23%). But let's compare this slowest method to the fastest: its speed is 0.74% of the speed of method 1, which means method 1 is 135 times faster. So think carefully whether arrays are needed at all in a particular use case. If there are only one or a few pushes, the total difference is not noticeable; and with only a few pushes, they are very simple and elegant to convert into non-array variables.
This is the code:
for (var i = 0; i < 10; i++) {
  non_pre_filled_array.push(i);
}
And finally the obligatory SO question:
Because the speed difference between non-array variable assignments and array assignments seems to be so huge according to this test, is there any way to get the speed of non-array variable assignments together with the dynamics of arrays?
I cannot use var variable_$i = 1 in a loop so that $i is replaced by some integer; I have to use variable[i] = 1, which is significantly slower than var variable1 = 1, as the test showed. This may only be critical when arrays are large, and in many cases they are.
EDIT:
I made a new test to confirm the slowness of array access and to try to find a faster way:
http://jsperf.com/read-write-array-vs-variable
Array reads and/or writes are significantly slower than using normal variables. If several operations are performed on an array member, it is wiser to store the member's value in a temporary variable, do the operations on the temp variable, and finally store the result back into the array member, as sketched below. And although the code becomes longer, it is significantly faster to do those operations inline than in a loop.
Conclusion: arrays vs. normal variables are analogous to disk vs. memory. Memory access is usually faster than disk access, and normal-variable access is faster than array access. Chaining operations may also be faster than using intermediate variables, but that makes the code a little less readable.
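A small sketch of that temp-variable pattern; arr, i, scale, offset and limit are placeholders for whatever data and operations you actually have:

// Instead of reading and writing arr[i] on every step...
arr[i] = arr[i] * scale;
arr[i] = arr[i] + offset;
arr[i] = Math.min(arr[i], limit);

// ...read once into a local, work on the local, and write back once:
var v = arr[i];
v = v * scale;
v = v + offset;
v = Math.min(v, limit);
arr[i] = v;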
Assigning values to arrays is significantly slower than assigning to normal variables. Arrays are evil! This is very sad and I really don't understand why this occurs. Arrays are so important!
That's because normal variables are statically scoped and can be (and are) easily optimised. The compiler/interpreter will learn their type, and might even avoid repeated assignments of the same value.
These kinds of optimisations are done for arrays as well, but they are not as easy and need longer to take effect. There is additional overhead when resolving the property reference, and since JavaScript arrays are auto-growing lists, the length needs to be checked as well.
Prepopulating the arrays will help to avoid reallocations for capacity changes, but for your little arrays (length=10) it shouldn't make much difference.
Is there any method to get the speed of non-array variable assignments and the dynamics of arrays?
No. Dynamics do cost, but they are worth it - as are loops.
You will hardly ever be in a situation where you need such a micro-optimisation; don't attempt it. The only thing I can think of are fixed-size loops (n <= 4) when dealing with ImageData, where inlining is applicable (sketched below).
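For what it's worth, that ImageData case looks roughly like this (a hedged sketch; imageData is assumed to come from a canvas getImageData call):

// Unrolled per-pixel work over the 4 channels (RGBA) of an ImageData buffer,
// instead of an inner loop of length 4.
var d = imageData.data;               // Uint8ClampedArray, 4 bytes per pixel
for (var p = 0; p < d.length; p += 4) {
  d[p]     = 255 - d[p];     // R
  d[p + 1] = 255 - d[p + 1]; // G
  d[p + 2] = 255 - d[p + 2]; // B
  // d[p + 3] (alpha) left untouched
}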
Push is evil!
Nope, your test was just flawed. The jsperf snippets are executed in a timed loop without per-iteration setup and teardown, and the setup was the only place where you reset the array's size. Your repeated pushes therefore produced arrays with lengths in the hundreds of thousands, with the corresponding need for memory (re)allocations. See the console at http://jsperf.com/pre-filled-array/11.
Actually push is about as fast as property assignment. Good measurements are rare, but those that are done properly show varying results across different browser engine versions, changing rapidly and unexpectedly. See How to append something to an array?, Why is array.push sometimes faster than array[n] = value? and Is there a reason JavaScript developers don't use Array.push()? The conclusion is that you should use whatever is most readable / appropriate for your use case, not what you think could be faster.
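If you want to re-run that comparison outside jsperf, a minimal sketch that resets the array on every run (so push never operates on a previously grown array) might look like this; note that such a hand-rolled micro-benchmark is still easily skewed by JIT warm-up and whatever else the machine is doing, which is exactly the point made above:

// Rough timing sketch: both methods start from a fresh, small array each run.
function timePush(runs) {
  var t0 = Date.now();
  for (var r = 0; r < runs; r++) {
    var a = [];
    for (var i = 0; i < 10; i++) a.push(i);
  }
  return Date.now() - t0;
}

function timeIndex(runs) {
  var t0 = Date.now();
  for (var r = 0; r < runs; r++) {
    var a = [];
    for (var i = 0; i < 10; i++) a[i] = i;
  }
  return Date.now() - t0;
}

console.log('push:', timePush(1e6), 'ms; index:', timeIndex(1e6), 'ms');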
