I am working on a little side project, a game of sorts that I built with D3, and I am currently trying to optimize performance. Over time it really becomes a problem. Looking at the dev tools, it seems like the biggest issue is that the garbage collector uses more and more CPU over time. It starts at around 20% and scales up to over 50% (at which point the game gets maybe a frame a second).
I have a couple of questions:
What are general best practices to minimize Garbage Collector usage?
Does the steadily increasing CPU usage indicate any particular issue with my code?
One tip I've already found is to save and reuse old objects rather than removing and recreating them, which I will work on next, but I want to make sure I'm not missing anything else.
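As a rough illustration of the "reuse rather than recreate" tip, here is a minimal sketch of a per-frame update written two ways. The `particles` array and both function names are hypothetical and not taken from the game in question; the point is only that the first version creates a new object per particle every frame, while the second mutates what already exists.

```javascript
// Hypothetical particle state for a game loop; all names are illustrative only.
const particles = Array.from({ length: 1000 }, () => ({ x: 0, y: 0, vx: 1, vy: 2 }));

// Allocating version: builds 1000 short-lived objects (plus a new array) per call.
function stepAllocating(dt) {
  return particles.map(p => ({ ...p, x: p.x + p.vx * dt, y: p.y + p.vy * dt }));
}

// In-place version: updates the existing objects, so a call allocates nothing new.
function stepInPlace(dt) {
  for (let i = 0; i < particles.length; i++) {
    const p = particles[i];
    p.x += p.vx * dt;
    p.y += p.vy * dt;
  }
}
```

If the allocating style runs on every frame, the collector has that much more short-lived garbage to clean up; whether it is the actual cause here would still need to be confirmed with a profile.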
Related
I've always been under the impression that as long as you have free memory, whether it's 100% or 10%, the speed of your processes should not be affected.
However, I recently ran into a situation where it seems that my processes get a lot slower when it uses up a greater percentage of the memory available.
It could be a problem with the code itself, but I'm hoping to get a quick sanity check that I haven't been living a lie before delving deeper into the code.
It all really depends upon how the app is coded and what it is doing. For some apps, it won't make any difference whether free memory is 10% or 100% as long as there's enough for it to do its job.
For other apps, they may encounter memory fragmentation, they may cause disk swapping, they may even adjust their own behavior because of less available memory (using smaller buffers, forcing data to disk, etc...). In a garbage collected system (like nodejs), a lower memory condition may cause more frequent garbage collection too.
The single biggest performance impact from running lower on memory will be if the app causes the OS to page memory to disk. This is where the virtual memory being used exceeds the actual physical memory and the OS has to substitute some disk space for memory that is allocated. The OS tries to swap memory to disk that hasn't been accessed recently in the hopes that it won't be needed again soon, but sometimes that just doesn't work very efficiently and you get a lot of hard disk thrashing, constantly reading/writing memory to/from disk. Since disks are thousands of times slower than physical memory, this can massively slow things down.
App design matters too: some operations in an app like Photoshop will simply run faster with more memory available, because the algorithms adapt to use the larger amount of memory when working on large objects. A nodejs app or library could be doing the same thing. For example, an image processing algorithm may be designed to work on images larger than will fit in memory, so it has to decide how much memory is "safe" to allocate and then work on the image in chunks. With a smaller amount of memory available, the work gets done less efficiently in smaller chunks.
A more common reason why things get slower over time is some sort of internal fragmentation or leak that makes regular housekeeping chores (like allocating memory) less efficient. This may occur either at the heap level or at the app level. This is why some admins schedule long running processes (like servers) to be automatically restarted every once in a while - to clear up any of this fragmentation or small leaks and regularly start afresh.
If it's a major problem, extensive debugging may be able to explain where any major impacts are coming from, but this is not trivial debugging as it involves lots of measuring, gathering data, adjusting what you're looking at based on what you find, etc... all while trying to not influence the very thing you're trying to find/measure.
Something wrong is happening with my pixi.js game. It allocates 1MB a second and 3 seconds later the GC releases it. And so on, infinitely.
Of course I read this, but it seems like Chrome Tools are unable to detect a problem - when I record the Allocation Timeline - it shows some rare spikes, which, when selected - show some functions, but also there are constant tiny spikes of memory allocation, which don't show anything. I select them, and in a list of functions I see nothing!
In my frame by frame code I optimized everything - when I turn off pixi - the memory doesn't move. Only when I have pixi render the scene on every frame does this constant allocation/release start, and it never ends. On PC it's ok, but on mobile every 10 seconds it freezes for 5 seconds - impossible to play.
Did anybody encounter frequent allocations/GC in their code? If yes - how did you debug it, how did you fix it?
In my experience Pixi.js has GC spikes even with an empty scene; you can test it yourself. Feel free to open a GitHub issue in their repo. I believe they had some discussions about leaks already. But I don't think pixi itself should impact that much, unless you have thousands and thousands of objects.
Are you sure you did everything? You should make heavy use of the object pool pattern and pre-allocation in your code. This is especially true when you need to constantly create/delete objects (something like bullets).
General information
https://www.html5rocks.com/en/tutorials/speed/static-mem-pools/
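The object pool pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not pixi.js API: `BulletPool` and its fields are hypothetical names, and the fixed pool size is an assumption you would tune for your game.

```javascript
// Minimal fixed-size object pool (hypothetical names, not part of pixi.js).
class BulletPool {
  constructor(size) {
    // Pre-allocate every bullet up front so gameplay never calls `new`.
    this.free = [];
    for (let i = 0; i < size; i++) {
      this.free.push({ x: 0, y: 0, vx: 0, vy: 0, active: false });
    }
  }

  // Take a bullet from the pool and reinitialize it; null if the pool is empty.
  acquire(x, y, vx, vy) {
    const bullet = this.free.pop();
    if (bullet === undefined) return null;
    bullet.x = x;
    bullet.y = y;
    bullet.vx = vx;
    bullet.vy = vy;
    bullet.active = true;
    return bullet;
  }

  // Return a bullet to the pool instead of dropping the reference.
  release(bullet) {
    if (!bullet.active) return; // ignore double release
    bullet.active = false;
    this.free.push(bullet);
  }
}
```

Returning `null` on exhaustion keeps memory bounded; growing the pool on demand is the other common choice, at the cost of occasional allocations.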
Upd:
For debugging chrome tools is pretty much okay.
https://developers.google.com/web/tools/chrome-devtools/memory-problems/
Background: I came from Microsoft world, in which I used to have websites stored on IIS. Experience taught me to recycle my application pool once a day in order to eliminate weird problems due to fragmentation. Recycling the app pool basically means to restart your application without restarting the entire IIS. I also watched a lecture that explained how Microsoft had reduced the fragmentation a lot in .Net 4.5.
Now, I'm deploying a Node.js application to production environment and I have to make sure that it works flawlessly all the time. I originally thought to make my app restarted once a day. Then I did some research in order to find some clues about fragmentation problems in Node.js. The only thing I've found is a scrap of paragraph from an article describing GC in V8:
To ensure fast object allocation, short garbage collection pauses, and
no memory fragmentation, V8 employs a stop-the-world,
generational, accurate garbage collector.
This statement is really not enough for me to give up building a restart mechanism for my app, but on the other hand I don't want to do some work if there is no problem.
So my question is:
Should or shouldn't I restart my app every now and then in order to prevent fragmentation?
Implementing a server restart before you know that memory consumption is indeed a problem is a premature optimization. As such, I don't think you should do it until you actually find that it is a problem. You will likely find more important issues to optimize for as opposed to memory consumption.
To figure out if you need a server restart, I recommend doing the following:
Set up some monitoring tools like https://newrelic.com/ that let you monitor your performance.
Monitor your memory continuously. Try to see if there is steady increase in the amount of memory consumed, or if it levels off.
Decide upon an acceptable threshold before you need to act. For example once your app consumes 60% of system memory you need to start thinking about a server restart and decide upon the restart interval.
Decide if you are ok with having "downtime" while restarting the server or not. If you don't want downtime, you may need to build a proxy layer to direct traffic.
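The monitoring and threshold steps above could be sketched along these lines. Both function names are hypothetical, the least-squares slope is just one simple way to tell "steadily increasing" from "levels off", and the 0.6 default mirrors the 60% example rather than any recommended value.

```javascript
// Least-squares slope of equally spaced memory samples (bytes per sample).
// A value near zero suggests memory has leveled off; a persistently positive
// value suggests steady growth. Hypothetical helper, not a library API.
function heapGrowthPerSample(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, value) => sum + value, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < n; i++) {
    numerator += (i - meanX) * (samples[i] - meanY);
    denominator += (i - meanX) ** 2;
  }
  return numerator / denominator;
}

// True once the process uses at least `threshold` of total system memory.
function shouldAct(usedBytes, totalBytes, threshold = 0.6) {
  return usedBytes / totalBytes >= threshold;
}
```

In a real service you would presumably feed these from periodic readings such as `process.memoryUsage().rss` and `os.totalmem()`, collected on a timer and kept in a bounded window.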
In general, I'd recommend server restarts for all dynamic, garbage collected languages. This is fairly common in those types of large applications. It is almost inevitable that a small mistake somewhere in your code base, or one of the libraries you depend on will leak memory. Even if you fix one leak, you'll get another one eventually. This may frustrate your team, which will basically lead to a server restart policy, and a definition of what is acceptable in regards to memory consumption for your application.
I agree with @Parris. You should probably figure out whether you actually need to have a restart policy first. I would suggest using pm2 (docs here). Even if you don't want to sign up for keymetrics, it's a pretty good little process manager and really quick to set up. You can get a report of memory usage from the command line. Looks something like this.
Also, if you start in cluster mode like above, you can call pm2 restart my_app and the first one will probably be up again before the last one is taken offline (this is an added benefit; the real reason for having 8 processes is to utilize all 8 cores). If you are adamant about avoiding downtime, you could restart them one by one according to id.
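For reference, a cluster-mode setup like the one described can be written as a pm2 ecosystem file. This is only a sketch: the script path `./app.js` and the `300M` memory limit are assumptions, while `my_app` and the 8 instances come from the description above.

```javascript
// ecosystem.config.js - a minimal sketch; script path and memory limit are assumed.
const config = {
  apps: [
    {
      name: "my_app",              // name used in `pm2 restart my_app`
      script: "./app.js",          // hypothetical entry point
      instances: 8,                // one process per core, as described above
      exec_mode: "cluster",        // run the instances behind pm2's cluster mode
      max_memory_restart: "300M",  // assumed limit: restart an instance that exceeds it
    },
  ],
};

module.exports = config;
```

You would start it with `pm2 start ecosystem.config.js`. In cluster mode, `pm2 reload my_app` restarts the instances one after another, which is the usual way to avoid downtime.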
I agree with @Parris: this seems like a premature optimization. Also, restarting is not a solution to the underlying problem; it's a treatment for the symptoms.
If memory errors are a prevalent issue for your node application then I think that some thought as to why this fragmentation occurs in your program in the first place could be a valuable effort. Understanding why memory errors occur after a program has been running for a long period of time, and refactoring the architecture of your program to solve the root of the problem, is a better solution in my eyes than just addressing the symptoms.
I believe two things will benefit you.
Immutable objects will help a lot: they are a lot more predictable than mutable objects, and will not be affected by the length of time the project has been live. Also, since immutable objects are read-only blocks of memory they are faster than mutable objects, for which the server has to spend resources deciding whether to read or write on the memory block which stores the object. I currently use the library called IMMUTABLE and it works well for me. There are other ones as well, like Deep Freeze; however, I have never used it.
Make sure to manage your application's processes correctly; memory leaks are the second big contributor to this problem that I have experienced. Again, this is solved by thinking about how your application is structured and how user events are handled, making sure that once a process is no longer being used by the client it is properly removed from the heap. If it is not, the heap keeps growing until all memory is consumed, causing the application to crash (refer to the below graphic to see V8's memory scheme, and where the heap is). Node is a C++ program, and it's controlled by Google's V8 and JavaScript.
You can use Node.js's process.memoryUsage() to monitor memory usage. To reclaim heap memory, V8 offers two collection strategies: one is Scavenge, which is very quick but incomplete. The other is Mark-Sweep, which is slow and frees up all non-referenced memory.
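A minimal sketch of reading `process.memoryUsage()` might look like this. The `toMB` and `memoryReport` helpers are hypothetical; the `rss`, `heapTotal`, `heapUsed` and `external` fields are the ones Node.js reports, all in bytes.

```javascript
// Convert bytes to megabytes, rounded to two decimals (hypothetical helper).
function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 100) / 100;
}

// Snapshot of the current process memory, in MB.
function memoryReport() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return {
    rssMB: toMB(rss),             // total memory held by the process
    heapTotalMB: toMB(heapTotal), // heap reserved by V8
    heapUsedMB: toMB(heapUsed),   // heap actually occupied by objects
    externalMB: toMB(external),   // memory of C++ objects bound to JS objects
  };
}

console.log(memoryReport());
```

Logging this at a fixed interval and watching `heapUsedMB` after collections is usually more informative than a single reading.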
Refer to this blog post for more info on how to manage your heap and manage your memory on V8 which runs Node.js
So the responsible approach to your implementation is to keep a close eye on open processes, and to develop a deep understanding of the heap and of how to free non-referenced memory blocks. Creating your project with this in mind also makes the project a lot more scalable.
I've been trying to optimize an angular site, and I'm getting a huge amount of delay in the responsiveness of my page when switching between certain routes. Each page displayed is not massive, but it has a fair number of elements in, and a reasonable number of bindings. I've already done what I can with bindonce, so I went and looked in the debugger with Chrome and I see most of my time appears to be spent doing GC.
What's strange is there seems to be huge gaps between each GC, and I'm trying to figure out what exactly those are.
I'm guessing it's when it's actually removing the items and the little bars are when it's doing the mark and sweep, but I'm not as familiar with this level of depth of analyzing JS. Most of my work has been in C++/C#/Java.
In half a second more than 20MB of garbage was collected. GC is pretty busy. This also means that your software is pretty busy as well, producing at least the same memory usage through certain objects.
In order to better understand where the garbage came from, profiling heap allocations might prove useful at this point.
Under Profiles, you might take snapshots of heap allocations and see what type of objects were created, which objects consumed the most memory etc.
My site is a pretty standard ecom site. It isn't a JS-backed standalone app or anything; it's just a site which uses JS for standard stuff, as well as some jQuery plugins to do a few things.
I'm trying to do some JS memory evaluation on my site. I've done this by looking at the Chrome Task Manager and through Heap Snapshots.
Initially my site on first load sits between 35MB (i.e. 35,000K) and 40MB on the task manager. This is the largest of any tab, if I have several tabs of other websites open at the same time.
If I refresh the page it jumps up to 55-60MB; another refresh sees it jump to 65-70MB.
On a normal page in a workflow, it fluctuates between 45-65 (sometimes 75 depending on what you're doing). Clicking around and doing the workflow from page to page sees the memory jump up to 85-100, and increases as you continue through the site.
I've tried to do a few things like check for:
detached nodes
heap snapshots & looking at the deltas
amix's MemoryLeakChecker checking size of objects
I'd need a deeper dive to look for circular references or closure problems.
Heap snapshots don't reveal much, most of the top lists are (array), (string), (system). The snapshots sit between 4.8MB, 5.1MB, 5.8MB, 6.8MB and increase.
I've got a few questions as result:
How do I understand the different metrics between snapshot memory and task manager memory
Are there any good tutorials (apart from the ones on the Google Developers site)?
How much memory is considered acceptable? Given in the task manager my site is always the highest?
Do I have a memory leak? Apart from the steps I've described above (which I haven't found anything concrete from) is there any other ways I can find leaks?
Can you suggest any tools apart from the Chrome Dev Tools (a lot of the tools mentioned on Google for Firefox are not compatible with the latest version, eg: Leak Monitor for FF)
As a side note, most of my functions are low key operations, and don't exceed 200ms (based on a CPU profile). What is a good benchmark I should be aiming for? Is 200ms high?
What you are describing is not a memory leak; it's garbage that Chrome knows of and that will be removed whenever Chrome decides it's time to do it. To explain this, let's have a closer look at the scenario you have described.
Making memory to 'leak'
First let's open up a new incognito window (just to be sure that browser extensions are not affecting our results) and navigate to google.com.
Then, let's open the Task Manager and enable the "JavaScript Memory" column (by right-clicking on the Task Manager window). We need this column to be sure that the memory we will be 'leaking' is, in fact, being allocated by JavaScript. We end up with something like this:
Now, as you suggested, we should reload the page a couple of times and observe the memory of our tab going up:
So far, so good - everything works exactly as you described it.
Wait a second...
However, leave your cursor inactive for half a minute, or go to another tab, and you will observe a huge memory usage drop on our 'Tab:google'. Why is that? What happened there? Who cleaned up our 'leaked' memory for us?
The Memory Usage Drop
To investigate that, let's repeat what we have done so far, so that 'Tab:google' uses a lot of memory again. Then, let's open Chrome Developer Tools and start recording on the 'Timeline' tab. After that, let's switch to another tab for a couple of seconds and, when the memory drops, stop recording on the 'Timeline'. You should end up with this:
In the last couple of seconds of our recording, mysterious 'GC Events' appeared, at exactly the same time as the memory was released. Coincidence? Nope.
GC Events
GC stands for the Garbage Collector. It's a mechanism that "attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program". So it turns out that the memory of our tab was polluted by garbage, and the GC was capable of getting rid of this garbage the whole time (you can even force garbage collection using the button at the bottom of the 'Timeline' tab). So why did it decide not to? Why did it wait for us to stop interacting with the page or change the tab?
Lazy Garbage Collector
The short answer is that garbage collection has to 'freeze' the execution of all scripts before any work can be done. Also, it can take a significant amount of CPU time to execute. This can result in lag, choppy animations, unresponsive controls etc. That's why Chrome waits for the right moment to run the garbage collection. And the best moment to do it is when the user is not looking.
In addition, please note that 'GC Events' come in series: there are always a couple of them with short breaks in between. These breaks are meant for 'normal' JavaScript to execute, making the garbage collection less noticeable.
Live Objects
Take a look at the "JavaScript Memory" column in the top two screenshots in this post again. You will notice that this column contains two numbers. The first one is memory "reserved for JavaScript VM heap", the other one is "how much memory live (reachable) objects comprise" (source). When benchmarking your applications you should worry only about the second value; all the rest will be handled by the GC.
An example of a leak
A real JavaScript leak can happen, e.g., in a web chat application. If, over time, it uses more and more 'live' memory while always displaying only the last 10 messages, then we can talk about a leak. Such a leak will eventually crash a tab (or the browser).
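To make the chat scenario concrete, here is a minimal sketch with two hypothetical classes. Both display only the last 10 messages, but the first keeps every message reachable forever while the second drops old ones so the GC can reclaim them.

```javascript
// Leaky version: shows 10 messages but keeps all of them reachable.
class LeakyChat {
  constructor() {
    this.messages = [];
  }
  add(message) {
    this.messages.push(message); // grows without bound
  }
  visible() {
    return this.messages.slice(-10);
  }
}

// Bounded version: old messages become unreachable and can be collected.
class BoundedChat {
  constructor(limit = 10) {
    this.limit = limit;
    this.messages = [];
  }
  add(message) {
    this.messages.push(message);
    if (this.messages.length > this.limit) {
      this.messages.shift(); // drop the oldest message
    }
  }
  visible() {
    return this.messages;
  }
}
```

In the leaky version the 'live' number in the Task Manager keeps climbing however often the GC runs, because every message is still referenced.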
Conclusion
For scripts running on the page, reloading the page (or going to another location) is equal to restarting your computer while your ANSI C app is running. After that, you should think about all the memory allocated by your scripts as wiped out. The only reason why, in practice, this may not happen immediately after reloading the page is that the browser is waiting for the right moment to clean up. And you, as a web developer, should not be concerned about it.
If you still think that your page is leaking, you can use the answer from this question to track down the leaked objects.