I am running a fairly complex loop in Node.js, and the time it takes to complete is around 200-300 ms, which is far too high. Would it be more efficient to convert this piece of code to C, or is there a better way? I have tried clustering, fork(), and reverse loops, but nothing seems to make much difference.
Some sample data
containers // 2D array; each index holds an array of 8 elements, something like [ [1,2,3,4,5,6,7,8], ... ]
deleteItems = [1,2,3]
for (let indexi = 0; indexi < containers.length; indexi++) {
    var shouldRemove = false;
    for (let indexj = 0; indexj < containers[indexi].length; indexj++) {
        for (let indexOfIPCPR = 0; indexOfIPCPR < deleteItems.length; indexOfIPCPR++) {
            if (containers[indexi][indexj] == deleteItems[indexOfIPCPR]) {
                shouldRemove = true;
                // indexOfNextRound comes from the surrounding outer loop
                shouldRemove && indexOfNextRound.splice(indexOfNextRound.indexOf(indexi), 1);
            }
        }
    }
}
The above code is itself nested inside another loop, which makes things even worse.
Any help would be appreciated.
Thanks in advance.
I think for very large arrays you can get better speed by caching the array length; right now you are reading the length on every iteration. This is how the loop could be updated to avoid recalculating the array lengths. Keep in mind that modern JS engines do this optimization on their own, so this may change nothing.
for (let indexi = 0, maxi = containers.length; indexi < maxi; indexi++) {
    var shouldRemove = false;
    for (let indexj = 0, maxj = containers[indexi].length; indexj < maxj; indexj++) {
        for (let indexOfIPCPR = 0, maxDelete = deleteItems.length; indexOfIPCPR < maxDelete; indexOfIPCPR++) {
            if (containers[indexi][indexj] == deleteItems[indexOfIPCPR]) {
                shouldRemove = true;
                shouldRemove && indexOfNextRound.splice(indexOfNextRound.indexOf(indexi), 1);
            }
        }
    }
}
As mentioned in the comments, this is not that simple and mainly depends on the content of your loop. But here are a few relevant points that may help you decide:
The event loop of Node.js is single-threaded. That doesn't mean everything runs in one thread - I/O operations (network operations, writing to files, etc.) have their own threads and run asynchronously. BUT, if your code doesn't do a lot of I/O, it will pretty much run in a single thread.
In C, you can create threads as you wish and run your code concurrently. But that will only be more efficient if your code can actually run concurrently, without a high overhead for communicating between the threads and keeping them in sync. So if you can split the resources and the input data into a few independent groups, pass each group to a thread, and run them all concurrently, that will probably be more efficient than running it all in a single thread.
Those are the main thread-related differences between running it in Node.js and in C. Of course, there are more aspects in which Node.js and C differ.
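For illustration only, here is a rough sketch of that "split into independent groups" idea expressed in Node.js terms rather than C. It assumes a Node.js version that ships the built-in worker_threads module (which postdates the clustering/fork() approaches mentioned in the question); the two-way chunking and the Array.prototype.some membership test are arbitrary choices for the sketch, not the original code.
// Sketch only: split `containers` into independent chunks and scan each chunk in
// its own worker thread, then report which rows contain an item to delete.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
    const containers = [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]; // sample data
    const deleteItems = [1, 2, 3];
    const half = Math.ceil(containers.length / 2);
    const chunks = [containers.slice(0, half), containers.slice(half)];

    chunks.forEach((chunk, chunkIndex) => {
        const worker = new Worker(__filename, { workerData: { chunk, deleteItems } });
        worker.on('message', (rowsToRemove) => {
            // Row indices are relative to the chunk; an offset maps them back to `containers`.
            console.log(`chunk ${chunkIndex}:`, rowsToRemove);
        });
    });
} else {
    const { chunk, deleteItems } = workerData;
    const rowsToRemove = [];
    chunk.forEach((row, rowIndex) => {
        // Same membership test as the original nested loops, just limited to this chunk.
        if (row.some((value) => deleteItems.includes(value))) {
            rowsToRemove.push(rowIndex);
        }
    });
    parentPort.postMessage(rowsToRemove);
}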
A Tale of Two Functions
I have one function that fills an array up to a specified value:
function getNumberArray(maxValue) {
    const a = [];
    for (let i = 0; i < maxValue; i++) {
        a.push(i);
    }
    return a;
}
And a similar generator function that instead yields each value:
function* getNumberGenerator(maxValue) {
    for (let i = 0; i < maxValue; i++) {
        yield i;
    }
}
Test Runner
I've written this test for both these scenarios:
function runTest(testName, numIterations, funcToTest) {
    console.log(`Running ${testName}...`);
    let dummyCalculation;
    const startTime = Date.now();
    const initialMemory = process.memoryUsage();
    const iterator = funcToTest(numIterations);
    for (let val of iterator) {
        dummyCalculation = numIterations - val;
    }
    const finalMemory = process.memoryUsage();
    // note: formatNumbers can be found here: https://jsfiddle.net/onz1ozjq/
    console.log(formatNumbers `Total time: ${Date.now() - startTime}ms`);
    console.log(formatNumbers `Rss: ${finalMemory.rss - initialMemory.rss}`);
    console.log(formatNumbers `Heap Total: ${finalMemory.heapTotal - initialMemory.heapTotal}`);
    console.log(formatNumbers `Heap Used: ${finalMemory.heapUsed - initialMemory.heapUsed}`);
}
Running the Tests
Then when running these two like so:
const numIterations = 999999; // 999,999
console.log(formatNumbers `Running tests with ${numIterations} iterations...\n`);
runTest("Array test", numIterations, getNumberArray);
console.log("");
runTest("Generator test", numIterations, getNumberGenerator);
I get results similar to this:
Running tests with 999,999 iterations...
Running Array test...
Total time: 105ms
Rss: 31,645,696
Heap Total: 31,386,624
Heap Used: 27,774,632
Running Function generator test...
Total time: 160ms
Rss: 2,818,048
Heap Total: 0
Heap Used: 1,836,616
Note: I am running these tests on node v4.1.1 on Windows 8.1. I am not using a transpiler and I'm running it by doing node --harmony generator-test.js.
Question
The increased memory usage with an array is obviously expected... but why am I consistently getting faster results for an array? What's causing the slowdown here? Is doing a yield just an expensive operation? Or maybe there's something up with the method I'm doing to check this?
The terribly unsatisfying answer is probably this: your ES5 function relies on features that (with the exception of let and const) have been in V8 since it was released in 2008 (and presumably for some time before, as I understand that what became V8 originated as part of Google's web crawler). Generators, on the other hand, have only been in V8 since 2013. So not only has the ES5 code had seven years to be optimized while the ES6 code has had only two, but almost nobody (compared to the many millions of sites using code just like your ES5 code) is using generators in V8 yet, which means there has been very little opportunity to discover, or incentive to implement, optimizations for them.
If you really want a technical answer as to why generators are comparatively slow in Node.js, you'll probably have to dive into the V8 source yourself, or ask the people who wrote it.
In the OP's example, the generator will always be slower
While JS engine authors will continue working to improve performance, there are some underlying structural realities that virtually guarantee that, for the OP's test case, building the array will always be faster than using an iterator.
Consider that a generator function returns a generator object.
A generator object will, by definition, have a next() function, and calling a function in Javascript means adding an entry to your call stack. While this is fast, it's likely never going to be as fast as direct property access. (At least, not until the singularity.)
So if you are going to iterate over every single element in a collection, then a for loop over a simple array, which accesses elements via direct property access, is always going to be faster than a for loop over an iterator, which accesses each element via a call to the next() function.
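To make that concrete, here is a hand-desugared sketch of what a for...of loop over the generator amounts to, compared with a plain indexed loop over the array. It reuses the getNumberArray and getNumberGenerator functions from the question, with an arbitrary maxValue of 1000, and is an approximation rather than the engine's actual implementation:
// Roughly what `for (let val of getNumberGenerator(n))` does: every element costs
// a next() call, and each call typically returns a fresh { value, done } object.
let dummyCalculation;
const gen = getNumberGenerator(1000);
let step = gen.next();
while (!step.done) {
    dummyCalculation = 1000 - step.value;
    step = gen.next();
}

// A plain array loop, by contrast, is direct indexed property access.
const arr = getNumberArray(1000);
for (let i = 0; i < arr.length; i++) {
    dummyCalculation = 1000 - arr[i];
}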
As I'm writing this in January of 2022, running Chrome 97, the generator function is 60% slower than the array function using the OP's example.
Performance is use-case-dependent
It's not difficult to imagine scenarios where the generator would be faster. The major downside to the array function is that it must build the entire collection before the code can start iterating over the elements, whether or not you need all the elements.
Consider a basic search operation which will only access, on average, half the elements of the collection. In this scenario, the array function exposes its "Achilles' heel": it must build an array with all the results, even though half will never be seen. This is where a generator has the potential to far outstrip the array function.
To demonstrate this, I slightly modified the OP's use-case. I made the elements of the array slightly more expensive to calculate (with a little division and square root logic) and modified the loop to terminate at about the halfway mark (to mimic a basic search).
Setup
function getNumberArray(maxValue) {
    const a = [];
    for (let i = 0; i < maxValue; i++) {
        const half = i / 2;
        const double = half * 2;
        const root = Math.sqrt(double);
        const square = Math.round(root * root);
        a.push(square);
    }
    return a;
}

function* getNumberGenerator(maxValue) {
    for (let i = 0; i < maxValue; i++) {
        const half = i / 2;
        const double = half * 2;
        const root = Math.sqrt(double);
        const square = Math.round(root * root);
        yield square;
    }
}
let dummyCalculation;
const numIterations = 99999;
const searchNumber = numIterations / 2;
Generator
const iterator = getNumberGenerator(numIterations);
for (let val of iterator) {
    dummyCalculation = numIterations - val;
    if (val > searchNumber) break;
}
Array
const iterator = getNumberArray(numIterations);
for (let val of iterator) {
    dummyCalculation = numIterations - val;
    if (val > searchNumber) break;
}
With this code, the two approaches are neck-and-neck. After repeated test runs, the generator and array functions trade first and second place. It's not difficult to imagine that if the elements of the array were even more expensive to calculate (for example, cloning a complex object, making a REST callout, etc), then the generator would win easily.
Considerations beyond performance
While recognizing that the OP's question is specifically about performance, I think it's worth calling out that generator functions were not primarily developed as a faster alternative to looping over arrays.
Memory efficiency
The OP has already acknowledged this, but memory efficiency is one of the main benefits that generators provide over building arrays. Generators can build objects on the fly and then discard them when they are no longer needed. In its most ideal implementation, a generator need only hold one object in memory at a time, while an array must hold all of them simultaneously.
For a very memory-intensive collection, a generator would allow the system to build objects as they are needed and then reclaim that memory when the calling code moves on to the next element.
Representation of non-static collections
Generators don't have to resolve the entire collection, which frees them up to represent collections that might not exist entirely in memory.
For example, a generator can represent collections where the logic to fetch the "next" item is time-consuming (such as paging over the results of a database query, where items are fetched in batches) or state-dependent (such as iterating over a collection where operations on the current item affect which item is "next") or even infinite series (such as a fractal function, random number generator or a generator returning all the digits of π). These are scenarios where building an array would be either impractical or impossible.
One could imagine a generator that returns procedurally generated level data for a game based on a seed number, or even to represent a theoretical AI's "stream of consciousness" (for example, playing a word association game). These are interesting scenarios that would not be possible to represent using a standard array or list, but where a loop structure might feel more natural in code.
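As a minimal sketch of that last point, here is an infinite series that a plain array could never hold (the Fibonacci sequence is just an arbitrary choice of infinite series):
// An infinite generator: values are produced lazily, so the caller decides when to stop.
function* fibonacci() {
    let a = 0, b = 1;
    while (true) {
        yield a;
        [a, b] = [b, a + b];
    }
}

// Consume only as much as is needed; the full (infinite) sequence is never built in memory.
for (const n of fibonacci()) {
    if (n > 100) break;
    console.log(n); // 0, 1, 1, 2, 3, 5, ... up to 89
}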
FYI, this question is ancient in internet terms and generators have caught up (at least when tested in Chrome): https://jsperf.com/generator-vs-loops1
Try replacing the 'let' in the generator function with a function scoped 'var'. It seems that the 'let' inside the loop incurs a lot of overhead. Only use let if you absolutely have to.
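Sketching that suggestion against the generator from the question (whether the difference is still measurable depends on the engine version):
// Same generator as above, but with a function-scoped var instead of a per-iteration let binding.
function* getNumberGenerator(maxValue) {
    var i;
    for (i = 0; i < maxValue; i++) {
        yield i;
    }
}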
In fact, running this benchmark now, generators are ~2x faster.
I've modified the code slightly (moved let i) and here is the full gist: https://gist.github.com/antonkatz/6b0911c85ddadae39c434bf8ac32b340
On my machine, these are the results:
Running Array...
Total time: 4,022ms
Rss: 2,728,939,520
Heap Total: 2,726,199,296
Heap Used: 2,704,236,368
Running Generator...
Total time: 2,541ms
Rss: 851,968
Heap Total: 0
Heap Used: -5,073,968
I was very curious myself and could not find a proper answer. Thanks @David for providing the test code.
I'm looping through a dataset with a couple of thousand items in it like this.
users.forEach(function(user){
    //List ALLTHETHINGS!
    listAllEverything(user)
    //Add groupings
    user.groupings = groupings.filter(function(grouping){
        return grouping.conditional(user)
    })
    //Add conversions to user, one per user.
    user.conversions = {}
    //for each conversionList
    conversionLists.forEach(function(conversionList){
        user.conversions[conversionList.name] = [];
        //for each conversion
        for (var i = conversionList.conversions.length - 1; i >= 0; i--) {
            var conversion = conversionList.conversions[i]
            //test against each user's apilog
            var converted = user.apilog.some(function(event){
                return conversion.conditional(event);
            })
            if (converted){
                //Add the conversion and all conversions that come before it.
                for (var i = i; i >= 0; i--){
                    user.conversions[conversionList.name].push(conversionList.conversions[i])
                }
                break;
            }
        }
    })
})
I know this is not the most optimized code and I have some ideas for how it can be improved, but I'm pretty new to these kinds of problems, so I'm not sure how I should prioritize. I know about console.time, which is useful, but I want something that lets me accumulate the time spent in each part of the forEach loop, either a tool (I usually use Chrome) or some debugging method. Preferably something that doesn't affect performance too much.
Since you are using Chrome you should check out the Timeline tab in your browser's DevTools - just hit the record button before running the loop and stop it once it's done. You will see a nice breakdown of everything that just happened, and you will mostly be interested in the yellow bars - they show JavaScript operations.
Please check out this video presentation by Paul Irish about Performance Tooling
As you know, in Chrome or Firefox you can just wrap a piece of code with console.time (and console.timeEnd) and it will measure the speed of a particular operation and print it to the console.
For example: to measure the time it takes for an entire loop to execute use:
console.time('For loop benchmark');
for (var i = 0; i < 1000; i++) {
    // do some operations here
}
console.timeEnd('For loop benchmark');
But, if you want to measure each iteration you can parameterize the name of the log inside the loop so that you can name each specific operation the way you want:
for (var i = 0; i < 1000; i++) {
    var consoleTimeName = 'Measuring iteration no ' + i + ' which is doing this and that...';
    console.time(consoleTimeName);
    // do some operations here
    console.timeEnd(consoleTimeName);
}
Using it you can see for yourself how much faster a simple for loop can be in comparison to jQuery's $.each loop.
You can find more about this on developer.mozilla.org and developer.chrome.com. Please note that this is not a standardized, cross-browser compatible feature and you should not use it on a production website, since some browsers, like IE, may throw an error when they see it.
I have some different Map/Reduce functions that I use in my project. But one is a lot different from the others, since it requires a loop in the map function, and for each count in the loop I send an emit.
What I have is this scenario (in the user collection):
"channels" : [
"Channel 1",
"Channel 2",
],
What I want to do is count how many users each channel has. For that I could use db.users.find({channels: "Channel 1"}).count(), but unfortunately the channels are dynamic, which means I don't know all the possible channel names, and they may well change in the future.
So I thought that a Map/Reduce job would fit perfectly. But the problem is that the first Reduce function I wrote calculated incorrectly, and the other attempt, where I used a query for each emit, took forever (more than 3 hours before the SSH session shut down).
So now I'm stuck and I need help. Preferably I would want a Map/Reduce job, since it's nicer than a bunch of queries, which are kind of slow to run in real time.
These are the latest Map and Reduce functions I wrote:
var map = function() {
    if (this.channels) {
        for (var i = 0, imax = this.channels.length; i < imax; i++) {
            emit(this.channels[i], 1);
        }
    }
}

var reduce = function (key, values) {
    var result = 0;
    values.forEach(function (value) {
        // had this before: result += 1;
        result = db.users.find({'channels' : key}).count();
    });
    return result;
}
I knew that the reduce function was horrific but I just tried the best I could think of. I think my logic may seem wrong but I can't find a good solution. Now I'm thinking of just doing a bunch of queries on every page load, but it will be slow as hell.
Please help! :)
In your scenario the reduce function should look like this:
var reduce = function (key, values) {
    var result = 0;
    values.forEach(function (value) {
        result += value;
    });
    return result;
}
Let me know if it is still not working and if it does please give an example of input and (incorrect) output.
MR is sometimes a bit slow, so you might want to check out the new aggregation framework coming with 2.2 (which I think is currently in the release phase).
See: http://docs.mongodb.org/manual/applications/aggregation/
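As a rough sketch (assuming MongoDB 2.2+ and the users collection from the question), the equivalent pipeline could look something like this:
// Unwind the channels array so each (user, channel) pair becomes its own document,
// then group by channel name and count.
db.users.aggregate([
    { $unwind: "$channels" },
    { $group: { _id: "$channels", count: { $sum: 1 } } }
]);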
Additionally, you might need to speed up the queries by using proper indices, or by adding a user count to the channels and increasing/decreasing it when a user joins/leaves a channel. It depends on your app's use case, of course.
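And a sketch of that counter idea; the channelCounts collection and userCount field are made-up names for illustration:
// Keep a per-channel counter and adjust it whenever a user joins or leaves a channel.
db.channelCounts.update(
    { _id: "Channel 1" },
    { $inc: { userCount: 1 } },   // use -1 when a user leaves
    { upsert: true }
);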
Alright, long story short: what I am attempting to do overall is test the level of randomness in a series of many thousands of previously generated, seemingly "random" numbers.
I've already written something that tests the probability of the numbers with great success; however, the next step is identifying repeating or recurring patterns.
I'd prefer to get this part done in javascript, so as to avoid having to teach myself another language for the time being.
Now, obviously, I could just use regex and punch in some random sequences myself, but that is not ideal, and would take an infinite amount of time to get the results I'm after.
Ah, I missed a number of your comments above. I believe this is what you're looking for:
function findLongestMatch(StringOfNumbers) {
    var matches = StringOfNumbers.match(/(.{2,})(?=.*?\1)/g);
    if (!matches) { return null; }
    var longestMatch = matches[0];
    var longestMatchLength = longestMatch.length;
    for (var matchIndex = 1; matchIndex < matches.length; matchIndex++) {
        if (matches[matchIndex].length > longestMatchLength) {
            longestMatch = matches[matchIndex];
            longestMatchLength = longestMatch.length;
        }
    }
    return longestMatch;
}
It'll be slow, but it'll get the job done.
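For a quick sanity check (the digit strings here are made up purely for illustration):
// "1234" appears twice, so it is the longest repeated run of length >= 2.
console.log(findLongestMatch("123451234")); // "1234"
// Returns null when no sequence of two or more characters repeats.
console.log(findLongestMatch("1234567890")); // null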