Avoid garbage collection in high performance JavaScript applications [duplicate] - javascript

I have a fairly complex Javascript app, which has a main loop that is called 60 times per second. There seems to be a lot of garbage collection going on (based on the 'sawtooth' output from the Memory timeline in the Chrome dev tools) - and this often impacts the performance of the application.
So, I'm trying to research best practices for reducing the amount of work that the garbage collector has to do. (Most of the information I've been able to find on the web regards avoiding memory leaks, which is a slightly different question - my memory is getting freed up, it's just that there's too much garbage collection going on.) I'm assuming that this mostly comes down to reusing objects as much as possible, but of course the devil is in the details.
The app is structured in 'classes' along the lines of John Resig's Simple JavaScript Inheritance.
I think one issue is that some functions can be called thousands of times per second (as they are used hundreds of times during each iteration of the main loop), and perhaps the local working variables in these functions (strings, arrays, etc.) might be the issue.
I'm aware of object pooling for larger/heavier objects (and we use this to a degree), but I'm looking for techniques that can be applied across the board, especially relating to functions that are called very many times in tight loops.
What techniques can I use to reduce the amount of work that the garbage collector must do?
And, perhaps also - what techniques can be employed to identify which objects are being garbage collected the most? (It's a farly large codebase, so comparing snapshots of the heap has not been very fruitful)

A lot of the things you need to do to minimize GC churn go against what is considered idiomatic JS in most other scenarios, so please keep in mind the context when judging the advice I give.
Allocation happens in modern interpreters in several places:
When you create an object via new or via literal syntax [...], or {}.
When you concatenate strings.
When you enter a scope that contains function declarations.
When you perform an action that triggers an exception.
When you evaluate a function expression: (function (...) { ... }).
When you perform an operation that coerces to Object like Object(myNumber) or Number.prototype.toString.call(42)
When you call a builtin that does any of these under the hood, like Array.prototype.slice.
When you use arguments to reflect over the parameter list.
When you split a string or match with a regular expression.
Avoid doing those, and pool and reuse objects where possible.
Specifically, look out for opportunities to:
Pull inner functions that have no or few dependencies on closed-over state out into a higher, longer-lived scope. (Some code minifiers like Closure compiler can inline inner functions and might improve your GC performance.)
Avoid using strings to represent structured data or for dynamic addressing. Especially avoid repeatedly parsing using split or regular expression matches since each requires multiple object allocations. This frequently happens with keys into lookup tables and dynamic DOM node IDs. For example, lookupTable['foo-' + x] and document.getElementById('foo-' + x) both involve an allocation since there is a string concatenation. Often you can attach keys to long-lived objects instead of re-concatenating. Depending on the browsers you need to support, you might be able to use Map to use objects as keys directly.
Avoid catching exceptions on normal code-paths. Instead of try { op(x) } catch (e) { ... }, do if (!opCouldFailOn(x)) { op(x); } else { ... }.
When you can't avoid creating strings, e.g. to pass a message to a server, use a builtin like JSON.stringify which uses an internal native buffer to accumulate content instead of allocating multiple objects.
Avoid using callbacks for high-frequency events, and where you can, pass as a callback a long-lived function (see 1) that recreates state from the message content.
Avoid using arguments since functions that use that have to create an array-like object when called.
I suggested using JSON.stringify to create outgoing network messages. Parsing input messages using JSON.parse obviously involves allocation, and lots of it for large messages. If you can represent your incoming messages as arrays of primitives, then you can save a lot of allocations. The only other builtin around which you can build a parser that does not allocate is String.prototype.charCodeAt. A parser for a complex format that only uses that is going to be hellish to read though.

The Chrome developer tools have a very nice feature for tracing memory allocation. It's called the Memory Timeline. This article describes some details. I suppose this is what you're talking about re the "sawtooth"? This is normal behavior for most GC'ed runtimes. Allocation proceeds until a usage threshold is reached triggering a collection. Normally there are different kinds of collections at different thresholds.
Garbage collections are included in the event list associated with the trace along with their duration. On my rather old notebook, ephemeral collections are occurring at about 4Mb and take 30ms. This is 2 of your 60Hz loop iterations. If this is an animation, 30ms collections are probably causing stutter. You should start here to see what's going on in your environment: where the collection threshold is and how long your collections are taking. This gives you a reference point to assess optimizations. But you probably won't do better than to decrease the frequency of the stutter by slowing the allocation rate, lengthening the interval between collections.
The next step is to use the Profiles | Record Heap Allocations feature to generate a catalog of allocations by record type. This will quickly show which object types are consuming the most memory during the trace period, which is equivalent to allocation rate. Focus on these in descending order of rate.
The techniques are not rocket science. Avoid boxed objects when you can do with an unboxed one. Use global variables to hold and reuse single boxed objects rather than allocating fresh ones in each iteration. Pool common object types in free lists rather than abandoning them. Cache string concatenation results that are likely reusable in future iterations. Avoid allocation just to return function results by setting variables in an enclosing scope instead. You will have to consider each object type in its own context to find the best strategy. If you need help with specifics, post an edit describing details of the challenge you're looking at.
I advise against perverting your normal coding style throughout an application in a shotgun attempt to produce less garbage. This is for the same reason you should not optimize for speed prematurely. Most of your effort plus much of the added complexity and obscurity of code will be meaningless.

As a general principle you'd want to cache as much as possible and do as little creating and destroying for each run of your loop.
The first thing that pops in my head is to reduce the use of anonymous functions (if you have any) inside your main loop. Also it'd be easy to fall into the trap of creating and destroying objects that are passed into other functions. I'm by no means a javascript expert, but I would imagine that this:
var options = {var1: value1, var2: value2, ChangingVariable: value3};
function loopfunc()
{
//do something
}
while(true)
{
$.each(listofthings, loopfunc);
options.ChangingVariable = newvalue;
someOtherFunction(options);
}
would run much faster than this:
while(true)
{
$.each(listofthings, function(){
//do something on the list
});
someOtherFunction({
var1: value1,
var2: value2,
ChangingVariable: newvalue
});
}
Is there ever any downtime for your program? Maybe you need it to run smoothly for a second or two (e.g. for an animation) and then it has more time to process? If this is the case I could see taking objects that would normally be garbage collected throughout the animation and keeping a reference to them in some global object. Then when the animation ends you can clear all the references and let the garbage collector do it's work.
Sorry if this is all a bit trivial compared to what you've already tried and thought of.

I'd make one or few objects in the global scope (where I'm sure garbage collector is not allowed to touch them), then I'd try to refactor my solution to use those objects to get the job done, instead of using local variables.
Of course it couldn't be done everywhere in the code, but generally that's my way to avoid garbage collector.
P.S. It might make that specific part of code a little bit less maintainable.

Related

'marking' an object for garbage collection in NodeJS

I'm working with some code in NodeJS, and some objects (i.e, 'events') will be medium-lived, and then discarded.
I don't want them becoming a memory burden when I stop using them, and I want to know if there is a way to mark an object to be garbage-collected by the V8 engine. (or better yet- completely destroy the object on command)
I understand that garbage collection is automatic, but since these objects will, 60% of the time, outlive the young generation, I would like to make sure there is a way they don't camp out in the old-generation for a while after they are discarded, while avoiding the inefficiency of searching the entire thing.
I've looked around, and so far can't find anything in the NodeJS docs. I have two main questions:
Would this even be that good? Would it be worth it to be able to 'mark' large amounts of unused objects to be gc'ed? (possibly 100+ at a time)
Is there even a way to do this?
Anything (speculation, hints, articles) would be appreciated. Thanks!
(V8 developer here.) There's no way to do this, and you don't need to worry about it. Marking works the other way round: the GC finds and marks live objects. Dead objects are never marked, and there's no explicit act of destroying them. The GC never even looks at dead objects. Which also means that dead objects are not a burden.
"Garbage collector" really is a misleading term: it doesn't actually find or collect garbage; instead it finds non-garbage and keeps it, and everything it hasn't found it just ignores by assuming that the respective memory regions are free.
In theory, there could be a way to manually add (the memory previously occupied by) objects to the "free list"; but there's a fundamental problem with that: part of the point of automatic memory management is that automating it provides better security and stability than relying on manual memory management (with programmers being humans, and humans making mistakes). That means that by design, a GC can't trust anyone else to declare objects as unreachable; it would always insist on verifying that claim -- which is equivalent to disregarding it, as the only way to verify it is to run a full regular GC cycle.

When does garbage collection actually get executed in Javascript? [duplicate]

Is there any way to control when Javascript performs garbage collection? I would like to enable it to perform garbage collection at certain times to ensure the smooth operation of my web site
Javascript doesn't have explicit memory management, it's the browser which decides when to clean it up. Sometimes it may happen that you experience un-smooth rendering of JavaScript due to a garbage collection pause.
There are many techniques that you can apply to overcome glitches caused by garbage collection (GC). More you apply more you explore. Suppose you have a game written in JavaScript , and every second you are creating a new object then its obvious that at after certain amount of time GC will occur to make further space for your application.
For real time application like games, which requires lot of space the simplest thing you can do is to reuse the same memory. It depends on you how you structure your code. If it generates lots of garbage then it might give choppy experience.
By using simple procedures: This is well know that new keyword indicates allocation. Wherever possible you can try to reuse the same object by each time by adding or modifying properties. This is also called recycling of object
In case of Arrays, assigning [] is often used to clear array, but you should keep in mind that it also creates a new array and garbages the old one. To reuse the same block you should use arr.length = 0 This has the same effect but it reuses the same array object instead of creating new one.
In case of functions: Sometimes our program needed to call a specific function more time or on certain intervals by using setInterval or setTimeout.
ex: setTimeout(function() { doSomething() }, 10);
You can optimize the above code by assigning the function to a permanent variable rather than spawning each time at regular intervals.
ex : var myfunc = function() { doSomething() }
setTimeout(myfunc, 10);
Other possible thing is that, the array slice() method returns a new array (based on a range in the original array,that can remain untouched), string's substr also returns a new string (based on a range of characters in the original string, that can remain untouched), and so on. Calling these functions creates garbage if not reutilized properly.
To avoid garbage completely in JavaScript is very difficult, you could say impossible. Its depends, how you reuse the objects and variables to avoid garbage. If your code is well structured and optimized you can minimize the overhead.
Unfortunately, there is no way to control WHEN the garbage collection takes place but with the proper formation of objects, you CAN control how quickly and cleanly it happens. Take a look at these documents on Mozilla Dev Net.
This algorithm assumes the knowledge of a set of objects called roots
(In JavaScript, the root is the global object). Periodically, the
garbage-collector will start from these roots, find all objects that
are referenced from these roots, then all objects referenced from
these, etc. Starting from the roots, the garbage collector will thus
find all reachable objects and collect all non-reachable objects.
This algorithm is better than the previous one since "an object has
zero reference" leads to this object being unreachable. The opposite
is not true as we have seen with cycles.
Why not keep references to all your objects until you want them to be GC'd?
var delayed_gc_objects = [];
function delayGC(obj) { // keeps reference alive
return delayed_gc_objects[delayed_gc_objects.length] = obj;
}
function resumeGC() { // kills references, letting them be GCd
delayed_gc_objects.length = 0;
}
you can perform some changes to improve your memory use, like:
don't set variables on loops
avoid using of global variables and functions. they will take a piece of memory until you get out
JavaScript is a garbage-collected language, meaning that the execution environment is responsible for managing the memory required during code execution.
The most popular form of garbage collection for JavaScript is called mark-and-sweep.
A second, less-popular type of garbage collection is reference counting. The idea is that every value keeps track of how many references are made to it
GC follows these algo, even if you manage to trigger the GC, it will not be guaranteed that it will run immediately, you are only marking it
garbage collection (GC) is a form of automatic memory management by removing the objects that no needed anymore.
any process deal with memory follow these steps:
1 - allocate your memory space you need
2 - do some processing
3 - free this memory space
there are two main algorithm used to detect which objects no needed anymore.
Reference-counting garbage collection: this algorithm reduces the definition of "an object is not needed anymore" to "an object has no other object referencing to it", the object will removed if no reference point to it
Mark-and-sweep algorithm: connect each objects to root source. any object doesn't connect to root or other object. this object will be removed.
currently most modern browsers using the second algorithm.

JavaScript: performance constraints of `delete` keyword

I'm trying to better learn how JS works under the hood and I've heard in the past that the delete keyword (specifically node.js or browsers using V8) results in poor performance, so I want to see if I can figure out what the benefits/detriments are for using that keyword.
I believe the reasoning for not using delete is that removing a property leads to a rebuilding of hidden class transitions and thus a recompiling of the inline cache. However, I believe it is also true that the object prototype will no longer enumerate that property, so if the object is used heavily the upfront cost may eventually pay off.
So:
Are my assumptions about the tradeoffs correct?
If they are correct, is one factor more important than the other (e.g. is rebuilding the IC much more expensive than many prototype enumerations)?
V8 developer here. Short answer: "it depends".
Having an unused property doesn't hurt; there is no general "enumeration cost" unless you actually perform explicit enumerations. In other words, an "enumeration cost" only exists if you find yourself doing something like this:
for (var p in object) {
if (p === old_property_that_I_could_have_deleted) continue;
/* process other properties... */
}
The key reason why it's hard to give a concrete answer (or to provide a canonical example where an effect would be measurable) is because the effects are non-local: they depend both on what exactly you're doing with the object in question, and on what the rest of your app is doing. Deleting a property from one object may well cause operations on other objects to become slower. Or faster. It depends.
To take a step back and look at the high-level situation: JavaScript as a language sort of assumes that objects are represented as dictionaries. Deleting an entry in a dictionary should be perfectly fine, which is why it makes sense that the delete operator exists. In practice, it turns out that an engine can achieve huge performance improvements for read-heavy apps, which is by far the most common case, if it does not store objects as dictionaries, but instead more like something that resembles C/C++ structs. However, such an object representation is (1) generally hard/inefficient to do when properties get deleted, and (2) the engine may well interpret even the first deletion of a property as a hint that the programmer wants this particular object to behave like a dictionary, so it might switch the internal representation over. If a fast-to-modify dictionary is what you wanted, then that's fine (it will provide a benefit even); however if you wanted the object to remain in slow-to-modify/fast-to-read mode, you would perceive the transition to fast-to-modify/slow-to-read dictionary mode as a performance problem.
Thankfully there is a great solution nowadays: when you want a dictionary, use a Map or Set. Engines can (and usually will) assume that you'll want to delete entries from these, so the implementations are optimized for making that possible without negative side effects; in particular no hidden classes are involved.
A few remarks on your assumptions: deleting a property makes an object (mostly) leave the system of hidden class transitions, no transitions will be rebuilt. There is no single global "inline cache", there are many inline caches sprinkled all over your functions. They don't get rebuilt, they just transition to slower and slower modes the more different cases they have to handle. (That's generally how caching works: caching a single case provides huge speedups; on the other end of the scale if you have as many different cases as executions, then a cache just wastes time and memory without providing any benefit.) Again the effect of dictionary-mode objects depends on the overall situation: an inline cache dealing with (mostly) dictionary-mode objects typically exhibits performance somewhere in between (1) an inline cache that only has to deal with objects sharing the single same hidden class, and (2) an inline cache that has to deal with hundreds or thousands of different hidden classes.

Is having a nested function inside a constructor memory inefficient? [duplicate]

In JavaScript, we have two ways of making a "class" and giving it public functions.
Method 1:
function MyClass() {
var privateInstanceVariable = 'foo';
this.myFunc = function() { alert(privateInstanceVariable ); }
}
Method 2:
function MyClass() { }
MyClass.prototype.myFunc = function() {
alert("I can't use private instance variables. :(");
}
I've read numerous times people saying that using Method 2 is more efficient as all instances share the same copy of the function rather than each getting their own. Defining functions via the prototype has a huge disadvantage though - it makes it impossible to have private instance variables.
Even though, in theory, using Method 1 gives each instance of an object its own copy of the function (and thus uses way more memory, not to mention the time required for allocations) - is that what actually happens in practice? It seems like an optimization web browsers could easily make is to recognize this extremely common pattern, and actually have all instances of the object reference the same copy of functions defined via these "constructor functions". Then it could only give an instance its own copy of the function if it is explicitly changed later on.
Any insight - or, even better, real world experience - about performance differences between the two, would be extremely helpful.
See http://jsperf.com/prototype-vs-this
Declaring your methods via the prototype is faster, but whether or not this is relevant is debatable.
If you have a performance bottleneck in your app it is unlikely to be this, unless you happen to be instantiating 10000+ objects on every step of some arbitrary animation, for example.
If performance is a serious concern, and you'd like to micro-optimise, then I would suggest declaring via prototype. Otherwise, just use the pattern that makes most sense to you.
I'll add that, in JavaScript, there is a convention of prefixing properties that are intended to be seen as private with an underscore (e.g. _process()). Most developers will understand and avoid these properties, unless they're willing to forgo the social contract, but in that case you might as well not cater to them. What I mean to say is that: you probably don't really need true private variables...
In the new version of Chrome, this.method is about 20% faster than prototype.method, but creating new object is still slower.
If you can reuse the object instead of always creating an new one, this can be 50% - 90% faster than creating new objects. Plus the benefit of no garbage collection, which is huge:
http://jsperf.com/prototype-vs-this/59
It only makes a difference when you're creating lots of instances. Otherwise, the performance of calling the member function is exactly the same in both cases.
I've created a test case on jsperf to demonstrate this:
http://jsperf.com/prototype-vs-this/10
You might not have considered this, but putting the method directly on the object is actually better in one way:
Method invocations are very slightly faster (jsperf) since the prototype chain does not have to be consulted to resolve the method.
However, the speed difference is almost negligible. On top of that, putting a method on a prototype is better in two more impactful ways:
Faster to create instances (jsperf)
Uses less memory
Like James said, this difference can be important if you are instantiating thousands of instances of a class.
That said, I can certainly imagine a JavaScript engine that recognizes that the function you are attaching to each object does not change across instances and thus only keeps one copy of the function in memory, with all instance methods pointing to the shared function. In fact, it seems that Firefox is doing some special optimization like this but Chrome is not.
ASIDE:
You are right that it is impossible to access private instance variables from inside methods on prototypes. So I guess the question you must ask yourself is do you value being able to make instance variables truly private over utilizing inheritance and prototyping? I personally think that making variables truly private is not that important and would just use the underscore prefix (e.g., "this._myVar") to signify that although the variable is public, it should be considered to be private. That said, in ES6, there is apparently a way to have the both of both worlds!
You may use this approach and it will allow you to use prototype and access instance variables.
var Person = (function () {
function Person(age, name) {
this.age = age;
this.name = name;
}
Person.prototype.showDetails = function () {
alert('Age: ' + this.age + ' Name: ' + this.name);
};
return Person; // This is not referencing `var Person` but the Person function
}()); // See Note1 below
Note1:
The parenthesis will call the function (self invoking function) and assign the result to the var Person.
Usage
var p1 = new Person(40, 'George');
var p2 = new Person(55, 'Jerry');
p1.showDetails();
p2.showDetails();
In short, use method 2 for creating properties/methods that all instances will share. Those will be "global" and any change to it will be reflected across all instances. Use method 1 for creating instance specific properties/methods.
I wish I had a better reference but for now take a look at this. You can see how I used both methods in the same project for different purposes.
Hope this helps. :)
This answer should be considered an expansion of the rest of the answers filling in missing points. Both personal experience and benchmarks are incorporated.
As far as my experience goes, I use constructors to literally construct my objects religiously, whether methods are private or not. The main reason being that when I started that was the easiest immediate approach to me so it's not a special preference. It might have been as simple as that I like visible encapsulation and prototypes are a bit disembodied. My private methods will be assigned as variables in the scope as well. Although this is my habit and keeps things nicely self contained, it's not always the best habit and I do sometimes hit walls. Apart from wacky scenarios with highly dynamic self assembling according to configuration objects and code layout it tends to be the weaker approach in my opinion particularly if performance is a concern. Knowing that the internals are private is useful but you can achieve that via other means with the right discipline. Unless performance is a serious consideration, use whatever works best otherwise for the task at hand.
Using prototype inheritance and a convention to mark items as private does make debugging easier as you can then traverse the object graph easily from the console or debugger. On the other hand, such a convention makes obfuscation somewhat harder and makes it easier for others to bolt on their own scripts onto your site. This is one of the reasons the private scope approach gained popularity. It's not true security but instead adds resistance. Unfortunately a lot of people do still think it's a genuinely way to program secure JavaScript. Since debuggers have gotten really good, code obfuscation takes its place. If you're looking for security flaws where too much is on the client, it's a design pattern your might want to look out for.
A convention allows you to have protected properties with little fuss. That can be a blessing and a curse. It does ease some inheritance issues as it is less restrictive. You still do have the risk of collision or increased cognitive load in considering where else a property might be accessed. Self assembling objects let you do some strange things where you can get around a number of inheritance problems but they can be unconventional. My modules tend to have a rich inner structure where things don't get pulled out until the functionality is needed elsewhere (shared) or exposed unless needed externally. The constructor pattern tends to lead to creating self contained sophisticated modules more so than simply piecemeal objects. If you want that then it's fine. Otherwise if you want a more traditional OOP structure and layout then I would probably suggest regulating access by convention. In my usage scenarios complex OOP isn't often justified and modules do the trick.
All of the tests here are minimal. In real world usage it is likely that modules will be more complex making the hit a lot greater than tests here will indicate. It's quite common to have a private variable with multiple methods working on it and each of those methods will add more overhead on initialisation that you wont get with prototype inheritance. In most cases is doesn't matter because only a few instances of such objects float around although cumulatively it might add up.
There is an assumption that prototype methods are slower to call because of prototype lookup. It's not an unfair assumption, I made the same myself until I tested it. In reality it's complex and some tests suggest that aspect is trivial. Between, prototype.m = f, this.m = f and this.m = function... the latter performs significantly better than the first two which perform around the same. If the prototype lookup alone were a significant issue then the last two functions instead would out perform the first significantly. Instead something else strange is going on at least where Canary is concerned. It's possible functions are optimised according to what they are members of. A multitude of performance considerations come into play. You also have differences for parameter access and variable access.
Memory Capacity. It's not well discussed here. An assumption you can make up front that's likely to be true is that prototype inheritance will usually be far more memory efficient and according to my tests it is in general. When you build up your object in your constructor you can assume that each object will probably have its own instance of each function rather than shared, a larger property map for its own personal properties and likely some overhead to keep the constructor scope open as well. Functions that operate on the private scope are extremely and disproportionately demanding of memory. I find that in a lot of scenarios the proportionate difference in memory will be much more significant than the proportionate difference in CPU cycles.
Memory Graph. You also can jam up the engine making GC more expensive. Profilers do tend to show time spent in GC these days. It's not only a problem when it comes to allocating and freeing more. You'll also create a larger object graph to traverse and things like that so the GC consumes more cycles. If you create a million objects and then hardly touch them, depending on the engine it might turn out to have more of an ambient performance impact than you expected. I have proven that this does at least make the gc run for longer when objects are disposed of. That is there tends to be a correlation with memory used and the time it takes to GC. However there are cases where the time is the same regardless of the memory. This indicates that the graph makeup (layers of indirection, item count, etc) has more impact. That's not something that is always easy to predict.
Not many people use chained prototypes extensively, myself included I have to admit. Prototype chains can be expensive in theory. Someone will but I've not measured the cost. If you instead build your objects entirely in the constructor and then have a chain of inheritance as each constructor calls a parent constructor upon itself, in theory method access should be much faster. On the other hand you can accomplish the equivalent if it matters (such as flatten the prototypes down the ancestor chain) and you don't mind breaking things like hasOwnProperty, perhaps instanceof, etc if you really need it. In either case things start to get complex once you down this road when it comes to performance hacks. You'll probably end up doing things you shouldn't be doing.
Many people don't directly use either approach you've presented. Instead they make their own things using anonymous objects allowing method sharing any which way (mixins for example). There are a number of frameworks as well that implement their own strategies for organising modules and objects. These are heavily convention based custom approaches. For most people and for you your first challenge should be organisation rather than performance. This is often complicated in that Javascript gives many ways of achieving things versus languages or platforms with more explicit OOP/namespace/module support. When it comes to performance I would say instead to avoid major pitfalls first and foremost.
There's a new Symbol type that's supposed to work for private variables and methods. There are a number of ways to use this and it raises a host of questions related to performance and access. In my tests the performance of Symbols wasn't great compared to everything else but I never tested them thoroughly.
Disclaimers:
There are lots of discussions about performance and there isn't always a permanently correct answer for this as usage scenarios and engines change. Always profile but also always measure in more than one way as profiles aren't always accurate or reliable. Avoid significant effort into optimisation unless there's definitely a demonstrable problem.
It's probably better instead to include performance checks for sensitive areas in automated testing and to run when browsers update.
Remember sometimes battery life matters as well as perceptible performance. The slowest solution might turn out faster after running an optimising compiler on it (IE, a compiler might have a better idea of when restricted scope variables are accessed than properties marked as private by convention). Consider backend such as node.js. This can require better latency and throughput than you would often find on the browser. Most people wont need to worry about these things with something like validation for a registration form but the number of diverse scenarios where such things might matter is growing.
You have to be careful with memory allocation tracking tools in to persist the result. In some cases where I didn't return and persist the data it was optimised out entirely or the sample rate was not sufficient between instantiated/unreferenced, leaving me scratching my head as to how an array initialised and filled to a million registered as 3.4KiB in the allocation profile.
In the real world in most cases the only way to really optimise an application is to write it in the first place so you can measure it. There are dozens to hundreds of factors that can come into play if not thousands in any given scenario. Engines also do things that can lead to asymmetric or non-linear performance characteristics. If you define functions in a constructor, they might be arrow functions or traditional, each behaves differently in certain situations and I have no idea about the other function types. Classes also don't behave the same in terms as performance for prototyped constructors that should be equivalent. You need to be really careful with benchmarks as well. Prototyped classes can have deferred initialisation in various ways, especially if your prototyped your properties as well (advice, don't). This means that you can understate initialisation cost and overstate access/property mutation cost. I have also seen indications of progressive optimisation. In these cases I have filled a large array with instances of objects that are identical and as the number of instances increase the objects appear to be incrementally optimised for memory up to a point where the remainder is the same. It is also possible that those optimisations can also impact CPU performance significantly. These things are heavily dependent not merely on the code you write but what happens in runtime such as number of objects, variance between objects, etc.

Best practices for reducing Garbage Collector activity in Javascript

I have a fairly complex Javascript app, which has a main loop that is called 60 times per second. There seems to be a lot of garbage collection going on (based on the 'sawtooth' output from the Memory timeline in the Chrome dev tools) - and this often impacts the performance of the application.
So, I'm trying to research best practices for reducing the amount of work that the garbage collector has to do. (Most of the information I've been able to find on the web regards avoiding memory leaks, which is a slightly different question - my memory is getting freed up, it's just that there's too much garbage collection going on.) I'm assuming that this mostly comes down to reusing objects as much as possible, but of course the devil is in the details.
The app is structured in 'classes' along the lines of John Resig's Simple JavaScript Inheritance.
I think one issue is that some functions can be called thousands of times per second (as they are used hundreds of times during each iteration of the main loop), and perhaps the local working variables in these functions (strings, arrays, etc.) might be the issue.
I'm aware of object pooling for larger/heavier objects (and we use this to a degree), but I'm looking for techniques that can be applied across the board, especially relating to functions that are called very many times in tight loops.
What techniques can I use to reduce the amount of work that the garbage collector must do?
And, perhaps also - what techniques can be employed to identify which objects are being garbage collected the most? (It's a farly large codebase, so comparing snapshots of the heap has not been very fruitful)
A lot of the things you need to do to minimize GC churn go against what is considered idiomatic JS in most other scenarios, so please keep in mind the context when judging the advice I give.
Allocation happens in modern interpreters in several places:
When you create an object via new or via literal syntax [...], or {}.
When you concatenate strings.
When you enter a scope that contains function declarations.
When you perform an action that triggers an exception.
When you evaluate a function expression: (function (...) { ... }).
When you perform an operation that coerces to Object like Object(myNumber) or Number.prototype.toString.call(42)
When you call a builtin that does any of these under the hood, like Array.prototype.slice.
When you use arguments to reflect over the parameter list.
When you split a string or match with a regular expression.
Avoid doing those, and pool and reuse objects where possible.
Specifically, look out for opportunities to:
Pull inner functions that have no or few dependencies on closed-over state out into a higher, longer-lived scope. (Some code minifiers like Closure compiler can inline inner functions and might improve your GC performance.)
Avoid using strings to represent structured data or for dynamic addressing. Especially avoid repeatedly parsing using split or regular expression matches since each requires multiple object allocations. This frequently happens with keys into lookup tables and dynamic DOM node IDs. For example, lookupTable['foo-' + x] and document.getElementById('foo-' + x) both involve an allocation since there is a string concatenation. Often you can attach keys to long-lived objects instead of re-concatenating. Depending on the browsers you need to support, you might be able to use Map to use objects as keys directly.
Avoid catching exceptions on normal code-paths. Instead of try { op(x) } catch (e) { ... }, do if (!opCouldFailOn(x)) { op(x); } else { ... }.
When you can't avoid creating strings, e.g. to pass a message to a server, use a builtin like JSON.stringify which uses an internal native buffer to accumulate content instead of allocating multiple objects.
Avoid using callbacks for high-frequency events, and where you can, pass as a callback a long-lived function (see 1) that recreates state from the message content.
Avoid using arguments since functions that use that have to create an array-like object when called.
I suggested using JSON.stringify to create outgoing network messages. Parsing input messages using JSON.parse obviously involves allocation, and lots of it for large messages. If you can represent your incoming messages as arrays of primitives, then you can save a lot of allocations. The only other builtin around which you can build a parser that does not allocate is String.prototype.charCodeAt. A parser for a complex format that only uses that is going to be hellish to read though.
The Chrome developer tools have a very nice feature for tracing memory allocation. It's called the Memory Timeline. This article describes some details. I suppose this is what you're talking about re the "sawtooth"? This is normal behavior for most GC'ed runtimes. Allocation proceeds until a usage threshold is reached triggering a collection. Normally there are different kinds of collections at different thresholds.
Garbage collections are included in the event list associated with the trace along with their duration. On my rather old notebook, ephemeral collections are occurring at about 4Mb and take 30ms. This is 2 of your 60Hz loop iterations. If this is an animation, 30ms collections are probably causing stutter. You should start here to see what's going on in your environment: where the collection threshold is and how long your collections are taking. This gives you a reference point to assess optimizations. But you probably won't do better than to decrease the frequency of the stutter by slowing the allocation rate, lengthening the interval between collections.
The next step is to use the Profiles | Record Heap Allocations feature to generate a catalog of allocations by record type. This will quickly show which object types are consuming the most memory during the trace period, which is equivalent to allocation rate. Focus on these in descending order of rate.
The techniques are not rocket science. Avoid boxed objects when you can do with an unboxed one. Use global variables to hold and reuse single boxed objects rather than allocating fresh ones in each iteration. Pool common object types in free lists rather than abandoning them. Cache string concatenation results that are likely reusable in future iterations. Avoid allocation just to return function results by setting variables in an enclosing scope instead. You will have to consider each object type in its own context to find the best strategy. If you need help with specifics, post an edit describing details of the challenge you're looking at.
I advise against perverting your normal coding style throughout an application in a shotgun attempt to produce less garbage. This is for the same reason you should not optimize for speed prematurely. Most of your effort plus much of the added complexity and obscurity of code will be meaningless.
As a general principle you'd want to cache as much as possible and do as little creating and destroying for each run of your loop.
The first thing that pops in my head is to reduce the use of anonymous functions (if you have any) inside your main loop. Also it'd be easy to fall into the trap of creating and destroying objects that are passed into other functions. I'm by no means a javascript expert, but I would imagine that this:
var options = {var1: value1, var2: value2, ChangingVariable: value3};
function loopfunc()
{
//do something
}
while(true)
{
$.each(listofthings, loopfunc);
options.ChangingVariable = newvalue;
someOtherFunction(options);
}
would run much faster than this:
while(true)
{
$.each(listofthings, function(){
//do something on the list
});
someOtherFunction({
var1: value1,
var2: value2,
ChangingVariable: newvalue
});
}
Is there ever any downtime for your program? Maybe you need it to run smoothly for a second or two (e.g. for an animation) and then it has more time to process? If this is the case I could see taking objects that would normally be garbage collected throughout the animation and keeping a reference to them in some global object. Then when the animation ends you can clear all the references and let the garbage collector do it's work.
Sorry if this is all a bit trivial compared to what you've already tried and thought of.
I'd make one or few objects in the global scope (where I'm sure garbage collector is not allowed to touch them), then I'd try to refactor my solution to use those objects to get the job done, instead of using local variables.
Of course it couldn't be done everywhere in the code, but generally that's my way to avoid garbage collector.
P.S. It might make that specific part of code a little bit less maintainable.

Categories

Resources