Node.js: memory usage of worker threads


Is there a way to obtain heap usage statistics for each worker thread in Node.js? I'm looking for something similar to getHeapStatistics() from the v8 module. Mostly I just need the used heap size and the max heap limit.
I need a solution that works, or an answer explaining why this is not possible.
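A minimal sketch, assuming a recent Node.js release: each worker thread runs its own V8 isolate, so calling v8.getHeapStatistics() inside a worker returns that worker's own used_heap_size and heap_size_limit, which the worker can post back to the main thread on request. The 'report-heap' message, the getWorkerHeapStats helper, and the 64 MB resourceLimits cap are illustrative choices, not part of any Node.js API.

```javascript
const { Worker } = require('worker_threads');
const v8 = require('v8');

// Source for the worker (run with eval: true). Every worker has its own V8
// isolate, so v8.getHeapStatistics() here describes the worker's heap only.
const workerSource = `
  const { parentPort } = require('worker_threads');
  const v8 = require('v8');
  parentPort.on('message', (msg) => {
    if (msg !== 'report-heap') return; // hypothetical request message
    const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
    parentPort.postMessage({ used_heap_size, heap_size_limit });
  });
`;

// Hypothetical helper: ask a worker for its heap statistics.
function getWorkerHeapStats(worker) {
  return new Promise((resolve, reject) => {
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.postMessage('report-heap');
  });
}

async function main() {
  const worker = new Worker(workerSource, {
    eval: true,
    // Optional cap on the worker's old generation; its heap_size_limit
    // should end up roughly this size plus young-generation space.
    resourceLimits: { maxOldGenerationSizeMb: 64 },
  });
  try {
    const stats = await getWorkerHeapStats(worker);
    const mainStats = v8.getHeapStatistics();
    console.log('worker used/limit:', stats.used_heap_size, stats.heap_size_limit);
    console.log('main   used/limit:', mainStats.used_heap_size, mainStats.heap_size_limit);
    return stats;
  } finally {
    await worker.terminate();
  }
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

For ongoing monitoring, the main thread could call getWorkerHeapStats on a timer. Note that a worker stuck in a long synchronous loop only replies once it yields back to its event loop.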

Related

Node.js and fragmentation

Background: I come from the Microsoft world, where I used to host websites on IIS. Experience taught me to recycle my application pool once a day to eliminate weird problems caused by fragmentation. Recycling the app pool basically means restarting your application without restarting all of IIS. I also watched a lecture explaining how Microsoft greatly reduced fragmentation in .NET 4.5.
Now I'm deploying a Node.js application to a production environment, and I have to make sure it works flawlessly all the time. My original plan was to restart the app once a day. Then I did some research to find clues about fragmentation problems in Node.js. The only thing I found is a single paragraph from an article describing GC in V8:
"To ensure fast object allocation, short garbage collection pauses, and no memory fragmentation V8 employs a stop-the-world, generational, accurate, garbage collector."
This statement really isn't enough for me to give up on building a restart mechanism for my app, but on the other hand I don't want to do the work if there is no problem.
So my question is:
Should I or shouldn't I restart my app every now and then to prevent fragmentation?

Implementing a server restart before you know that memory consumption is indeed a problem is a premature optimization. As such, I don't think you should do it until you actually find that it is a problem. You will likely find more important issues to optimize for as opposed to memory consumption.
To figure out if you need a server restart, I recommend doing the following:
Set up monitoring tools like https://newrelic.com/ that let you monitor your performance.
Monitor your memory continuously. Check whether there is a steady increase in the amount of memory consumed, or whether it levels off.
Decide on an acceptable threshold before you need to act. For example, once your app consumes 60% of system memory, you need to start thinking about a server restart and decide on the restart interval.
Decide whether you are OK with "downtime" while restarting the server. If you don't want downtime, you may need to build a proxy layer to direct traffic.
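If a hosted tool isn't an option, the monitoring and threshold steps can be approximated in plain Node.js. This sketch samples process.memoryUsage() once a minute and compares the resident set size (rss) against os.totalmem(). The 60% threshold comes from the example in this answer; the one-minute interval and the choice of rss rather than heapUsed are assumptions, and evaluateMemory/sampleMemory are hypothetical helper names.

```javascript
const os = require('os');

// Fraction of total system memory the process may use before acting.
// 0.6 mirrors the 60% example; it is an assumption to tune per deployment.
const THRESHOLD = 0.6;

// Pure decision helper, kept separate so it is easy to test.
function evaluateMemory(rssBytes, totalBytes, threshold = THRESHOLD) {
  const fraction = rssBytes / totalBytes;
  return { fraction, overThreshold: fraction >= threshold };
}

// Take one sample of this process's memory usage.
function sampleMemory() {
  const { rss, heapUsed } = process.memoryUsage();
  return { rss, heapUsed, ...evaluateMemory(rss, os.totalmem()) };
}

// Log a sample every minute; unref() lets the process exit normally.
const timer = setInterval(() => {
  const s = sampleMemory();
  console.log(`rss=${s.rss} heapUsed=${s.heapUsed} fraction=${s.fraction.toFixed(3)}`);
  if (s.overThreshold) {
    console.warn('Memory above threshold: time to plan a restart');
  }
}, 60_000);
timer.unref();
```

Logging the samples over days is what answers the "steady increase or levels off" question; a single reading says little.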
In general, I'd recommend server restarts for all dynamic, garbage-collected languages; this is fairly common in large applications written in them. It is almost inevitable that a small mistake somewhere in your code base, or in one of the libraries you depend on, will leak memory. Even if you fix one leak, you'll eventually get another. This can frustrate your team, and it usually ends in a server restart policy plus a definition of what memory consumption is acceptable for your application.

I agree with @Parris. You should probably figure out whether you actually need a restart policy first. I would suggest using pm2 (docs here). Even if you don't want to sign up for Keymetrics, it's a pretty good little process manager and very quick to set up. You can get a report of memory usage from the command line. It looks something like this.
Also, if you start in cluster mode like above, you can call pm2 restart my_app and the first process will probably be up again before the last one is taken offline. (This is an added benefit; the real reason for running 8 processes is to use all 8 cores.) If you are adamant about avoiding downtime, you could restart them one by one according to their id.

I agree with @Parris; this seems like a premature optimization. Also, restarting is not a solution to the underlying problem, it's a treatment for the symptoms.
If memory errors are a prevalent issue for your Node application, then I think it could be a valuable effort to think about why this fragmentation occurs in your program in the first place. Understanding why memory errors appear after a program has been running for a long time, and refactoring your program's architecture to solve the root of the problem, is in my eyes a better solution than just addressing the symptoms.
I believe two things will benefit you.
Immutable objects will help a lot. They are much more predictable than mutable objects and are not affected by how long the project has been live. Also, since immutable objects are read-only blocks of memory, they are faster than mutable objects, for which the server has to spend resources deciding whether to read or write the memory block that stores the object. I currently use the Immutable library (Immutable.js) and it works well for me. There are other ones as well, like Deep Freeze, though I have never used it.
Make sure to manage your application's processes correctly: in my experience, memory leaks are the second big contributor to this problem. Again, this is solved by thinking about how your application is structured and how user events are handled, and by making sure that once a process is no longer used by the client it is properly removed from the heap. If it isn't, the heap keeps growing until all memory is consumed and the application crashes (refer to the graphic below to see V8's memory scheme and where the heap is). Node is a C++ program built on Google's V8 JavaScript engine.
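For the first suggestion, immutable objects: Immutable.js and Deep Freeze are third-party packages, but the core idea of a read-only object can be sketched with the built-in Object.freeze. This hand-rolled deepFreeze is an illustration, not the Deep Freeze package's implementation, and the config object and tryMutate helper are hypothetical.

```javascript
// Recursively freeze an object and everything reachable from it.
// Freezing before recursing means reference cycles terminate.
function deepFreeze(obj) {
  Object.freeze(obj);
  for (const key of Reflect.ownKeys(obj)) {
    const value = obj[key];
    const isObjectLike =
      value !== null && (typeof value === 'object' || typeof value === 'function');
    if (isObjectLike && !Object.isFrozen(value)) {
      deepFreeze(value);
    }
  }
  return obj;
}

// Attempt a write in strict mode, where writes to frozen objects throw.
function tryMutate(obj) {
  'use strict';
  try {
    obj.server.port = 9090;
    return 'mutated';
  } catch (err) {
    return err.constructor.name;
  }
}

const config = deepFreeze({ server: { port: 8080 }, tags: ['a', 'b'] });

console.log(tryMutate(config)); // TypeError
console.log(config.server.port); // 8080
```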
You can use Node.js's process.memoryUsage() to monitor memory usage. For managing the heap, V8 offers two garbage-collection strategies: Scavenge, which is very quick but incomplete (it only processes the young generation), and Mark-Sweep, which is slower but frees all unreferenced memory.
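Both process.memoryUsage() and V8's two collectors can be observed from Node.js itself. This sketch converts process.memoryUsage() to megabytes and logs each GC event through perf_hooks, labeling minor collections as Scavenge and major ones as Mark-Sweep (in current V8 the major collector is Mark-Sweep-Compact). Reading entry.detail.kind assumes Node.js 16 or later, where older versions put kind directly on the entry; memoryUsageMB and gcKindName are hypothetical helper names.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Convert process.memoryUsage() byte counts into megabytes (1 decimal).
function memoryUsageMB() {
  const out = {};
  for (const [key, bytes] of Object.entries(process.memoryUsage())) {
    out[key] = Math.round((bytes / 1024 / 1024) * 10) / 10;
  }
  return out;
}

// Map perf_hooks GC kinds to V8's collectors: minor = Scavenge,
// major = Mark-Sweep(-Compact).
function gcKindName(kind) {
  switch (kind) {
    case constants.NODE_PERFORMANCE_GC_MINOR: return 'scavenge';
    case constants.NODE_PERFORMANCE_GC_MAJOR: return 'mark-sweep';
    case constants.NODE_PERFORMANCE_GC_INCREMENTAL: return 'incremental';
    case constants.NODE_PERFORMANCE_GC_WEAKCB: return 'weak-callbacks';
    default: return `unknown(${kind})`;
  }
}

// Log every GC event with its duration.
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Node 16+ exposes entry.detail.kind; older versions used entry.kind.
    const kind = entry.detail ? entry.detail.kind : entry.kind;
    console.log(`gc ${gcKindName(kind)} took ${entry.duration.toFixed(2)} ms`);
  }
});
obs.observe({ entryTypes: ['gc'] });

console.log(memoryUsageMB());
```

Lots of fast scavenges are normal; frequent, long mark-sweep pauses alongside a growing heapUsed are the pattern worth investigating.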
For more information on managing your heap and memory in V8, the engine that runs Node.js, refer to this blog post.
So the responsible approach is to keep a close eye on open processes, develop a deep understanding of the heap, and know how to free unreferenced memory blocks. Designing your project with this in mind also makes it much more scalable.

Node.js performance and memory leaks

I ran into some odd Node.js memory behavior in a React prerendering app. Here is the memory profile from New Relic:
As you can see, GC frees memory about once an hour, when usage reaches 1 GB. Is this okay for Node.js (v0.12.x), or is something going wrong?
P.S. I've read about New Relic's own memory leaks, but turning it off gives the same results.

It's not Node.js itself, it's the V8 JavaScript engine.
As far as I know from the #perfmatters talks, these memory and performance issues in JavaScript can be reduced by writing the application to take more care when allocating new objects.
Here are some useful resources:
YouTube: talk by Colt McAnlis
Node.js Performance Tip: Managing Garbage Collection
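One common way to take more care with allocation is an object pool: hot code reuses objects instead of creating new ones, so the GC has less short-lived garbage to collect. This is a minimal sketch; Vec2Pool is a hypothetical name, and pooling only pays off where profiling shows real allocation pressure.

```javascript
// A minimal pool of {x, y} objects that are reused instead of reallocated.
class Vec2Pool {
  constructor() {
    this.free = [];   // released objects waiting to be reused
    this.created = 0; // how many objects were ever allocated
  }

  acquire(x, y) {
    let v = this.free.pop();
    if (v === undefined) {
      v = { x: 0, y: 0 };
      this.created += 1;
    }
    v.x = x;
    v.y = y;
    return v;
  }

  // Callers must stop using v after releasing it.
  release(v) {
    this.free.push(v);
  }
}

const pool = new Vec2Pool();
for (let i = 0; i < 1000; i++) {
  const v = pool.acquire(i, i * 2);
  // ... use v ...
  pool.release(v);
}
console.log(pool.created); // 1: the same object was reused every iteration
```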

In my experience this looks normal. Without investigating further, I'd say that if it were a memory leak you'd expect rather short-lived, more extreme peaks. You may want to read up on this here.

Which library should I use for server-side image manipulation on Node.JS? [closed]

Closed 10 years ago.
I found quite a large list of available libraries on the Node.js wiki, but I'm not sure which of them are more mature and perform better. Basically, I want to do the following:
load some images onto the server from external sources
put them onto one big canvas
crop and mask them a bit
apply a filter or two
resize the final image and give a link to it
Big plus if the node package works on both Linux and Windows.

Answering my own question
I spent two days digging through Node.js graphics libraries.
node-canvas
I tried it first since I'm quite familiar with the <canvas> API. That's a huge plus for a library.
It requires Cairo, which doesn't have an easy Windows download. I found it in the GTK+ distribution, though.
Moreover, it needs native binding code to be compiled on module installation. It uses Node-Waf, which hasn't been ported to Windows yet.
gm
mature
runs on Windows smoothly
docs are OK but not thorough: I had to look into the source code to figure out what API is available
unfortunately there's no easy way to combine images with gm. Maybe there's some way to achieve that, but I haven't found one after spending two hours with it.
node-imagemagick
The official repo covers only a few basic ImageMagick commands, but I used this fork (good thing NPM can pull libraries directly from git repositories). It has bindings for montage, which does exactly what I need.
ImageMagick is quite slow, though it works on Windows.
Node-Vips
Huge plus: it uses an incredible VIPS library which I'm familiar with. VIPS is very fast and optimized for large images. It's very smart about utilizing hardware resources: if your machine has a lot of RAM it'll do all processing in memory but will switch to hard-drive caches if memory is scarce or required for other applications.
Same as node-canvas, it requires Node-Waf, so it's not available for Windows yet.
I also looked at other libraries from the list but most of them are either very immature or do not suit my use case. I would really like to try migrating to either Node-Canvas or Node-Vips when Node-Waf gets ported to Windows but until then I'll stick to node-imagemagick.

I'd strongly advise you to check out gm with GraphicsMagick.
Stable, feature rich, clean API, great docs, and fast.
And it works both on Windows and Linux / MacOS / BSD / ...

Here is the link to a canvas implementation based on GDI+.

Debugging "Maximum call stack size exceeded"

I have a server that I can cause to die with the following output:
events.js:38
EventEmitter.prototype.emit = function(type) {
^
RangeError: Maximum call stack size exceeded
However, without a stack dump or trace, I have no way of finding whether this is infinite recursion or just a slightly-too-large chain, let alone where the problem function is.
Running Node with the --trace option caused my tests to not only run slow (as one would expect), but to not reproduce the problem.
Does anybody have any solutions or tips for getting to the bottom of this?

It seems the answer is currently: sit tight and wait for Node.js to update to a newer V8 version, or build your own with the patch from this Chromium project bug report.
This archived thread from the v8-dev mailing list shows a discussion in which
Dave Smith brings up this very issue and proposes a patch
Yang Guo of the Chromium project discusses it, files a Chromium bug against the issue, and applies a different fix
Dave notes that Node (0.8 at the time) is using V8 3.11 and asks about backporting the patch. Yang replies that the patch will probably land in V8 3.15 and will not be backported.
Note that Node.js v0.8 used V8 3.11; Node.js 0.10 is currently using V8 3.14. So the patch accepted by Chromium for this issue is still "in the future" as far as Node is concerned.
(This answer owes thanks to @Coderoshi, since it's by following the thread from his answer that I learned all this.)
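To check which V8 a given Node.js build bundles, and so whether it is at or past the 3.15 release the fix was expected to land in, process.versions.v8 can be compared numerically. This is a sketch; compareVersions is a hypothetical helper, and it simply ignores non-numeric suffixes such as "-node".

```javascript
// Compare dotted version strings numerically: -1, 0, or 1.
// Non-numeric suffixes like "8-node" are parsed as their leading number.
function compareVersions(a, b) {
  const pa = a.split('.').map((p) => parseInt(p, 10) || 0);
  const pb = b.split('.').map((p) => parseInt(p, 10) || 0);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// process.versions.v8 reports the V8 build bundled with the running Node.js.
const v8Version = process.versions.v8;
const atOrPast = compareVersions(v8Version, '3.15') >= 0;
console.log(`V8 ${v8Version} is ${atOrPast ? 'at or past' : 'before'} 3.15`);
```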

It seems unlikely that this is a "slightly-too-large chain".
It's probably a function emitting the event that triggered it.
If slowing the code down makes the infinite recursion stop, my guess would be that you have a queue, and in the slower mode it isn't getting filled up as fast.
If this doesn't help, then I think I need more info.
Maybe someone has a catch-all for this, though.
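The self-triggering pattern described in this answer is easy to reproduce, and V8's Error.stackTraceLimit (10 frames by default) controls how many frames a caught RangeError keeps. This is a hypothetical reproduction, not the asker's code: a 'data' listener that re-emits 'data'.

```javascript
const EventEmitter = require('events');

// Keep more frames than V8's default of 10 so the repeating cycle is visible.
Error.stackTraceLimit = 50;

// Hypothetical reproduction: a listener that emits the event that triggered it.
function reproduce() {
  const bus = new EventEmitter();
  bus.on('data', (n) => bus.emit('data', n + 1));
  try {
    bus.emit('data', 0);
    return null;
  } catch (err) {
    return err;
  }
}

const err = reproduce();
console.log(err instanceof RangeError); // true
// Repeated emit/listener frames point at the recursive cycle.
console.log(String(err.stack).split('\n').slice(0, 8).join('\n'));
```

Once the cycle is located it needs a real termination condition; deferring the re-emit with setImmediate would only trade the stack overflow for an endless loop.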

This patch might help you find a solution. It expands the stack trace tremendously:
https://github.com/dizzyd/node/commit/40434019540ffc17e984ff0653500a3c5db87deb

How efficient is javascript? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 13 years ago.
Sometimes I hear arguments against doing things on the client side with JavaScript. People say things like "JavaScript is inefficient... or slow". I'm wondering: are there actual facts supporting this conclusion?

There are really two factors to Javascript performance:
Your code
The scripting engine running your code
Your code is the easiest factor to fix. As you develop, optimize your code as best you can. Easy.
The second isn't always as easy. I've had some applications where performance on one browser was excellent and another was slower than mud. Others run great across the board. Best you can do is test, test, test, and test again. If you want a good article, check out the link:
Coding Horror: The Great Browser JavaScript Showdown
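A rough way to "test, test, test" is a tiny timing harness built on performance.now(). This sketch compares two hypothetical ways of summing an array in Node.js; the numbers depend entirely on the engine and machine, and a dedicated benchmarking tool handles warm-up and statistics far better.

```javascript
const { performance } = require('perf_hooks');

// Time `fn` over `iterations` calls; returns mean microseconds per call.
// Rough sketch: short warm-up, no outlier handling or repeated runs.
function bench(label, fn, iterations = 10_000) {
  for (let i = 0; i < 1_000; i++) fn(); // warm-up so the JIT can optimize
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  const perCallUs = (elapsedMs * 1000) / iterations;
  console.log(`${label}: ${perCallUs.toFixed(3)} µs/call`);
  return perCallUs;
}

const data = Array.from({ length: 1_000 }, (_, i) => i);
let sink = 0; // accumulate results so the work can't be optimized away

bench('for loop sum', () => {
  let s = 0;
  for (let i = 0; i < data.length; i++) s += data[i];
  sink += s;
});

bench('reduce sum', () => {
  sink += data.reduce((a, b) => a + b, 0);
});

console.log('sink:', sink);
```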

That depends a lot on the browser's JavaScript engine.
Overall, it's a scripting language, so it won't do as well as C++ or another compiled language. However, it's good at what it was intended for, and that's driving web pages.

JavaScript is FAST if you use it properly. Otherwise it performs badly.
For example, an infinite loop can hang your browser (though the browser will ask you whether to stop the script).

The choice of what tasks to perform on the client versus on the server is an important one, and the efficiency of JavaScript as a language is not the only factor which needs to be considered.
Data which will be manipulated on the client must be transmitted to the client. If the script does not need all of the information which will be pushed down to the client, then page load time will suffer, and the filtering operation will be done on the less-efficient end of the link (i.e. you will pay for the network transmission time before the user gets their information).
Business rules which run on the client will be exposed to curious end users.
Validation business rules which are run on the client must be run again on the server, because you cannot trust code running in an environment you don't control.
The differences between browsers, and even between the ECMAScript implementations available within a given browser family, make this question nastily subjective and subject to a lot of variation.
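The point about running validation rules again on the server can be illustrated with one shared, pure validation function: the same code can be bundled for the browser (instant feedback) and run on the server (enforcement). This is a sketch; validateSignup, handleSignup, and the specific rules are hypothetical.

```javascript
// Hypothetical shared rule set, usable both in the browser and on the server.
function validateSignup({ email, age }) {
  const errors = [];
  if (typeof email !== 'string' || !/^[^@\s]+@[^@\s]+$/.test(email)) {
    errors.push('email');
  }
  if (!Number.isInteger(age) || age < 13) {
    errors.push('age');
  }
  return errors;
}

// Server side: never assume the client already ran validateSignup.
function handleSignup(body) {
  const errors = validateSignup(body);
  if (errors.length > 0) return { status: 400, errors };
  return { status: 201 };
}

console.log(handleSignup({ email: 'user@example.com', age: 30 }).status); // 201
console.log(handleSignup({ email: 'not-an-email', age: 30 }).status); // 400
```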

Well, it depends. What are you comparing it to? It differs a lot between different browsers.
It can perform really well, or the opposite, depending on the code written.
You HAVE to use JavaScript to do certain things, like manipulating the DOM.

I would imagine that in most cases it is much quicker than a postback!

I would say that's an incorrect answer. How would you measure JavaScript performance, and what would you compare it against? As long as JavaScript is the only option for client-side web programming (I'm not talking about VBScript), you can't really say anything about its efficiency.

It also depends on how you write your code. If you follow best practices it's fine and, as said before, it's better than postbacks!

You can only really answer that question in the context of a specific problem you're trying to solve. Post an example and then we can debate the merits of various technologies...

JavaScript is not inefficient; efficiency does not depend on the language, but interpreters can be inefficient. For instance, the Firefox interpreter runs very slowly in Firefox for Linux and much better in Firefox for Windows. Chrome has implemented an interpreter that is much faster. There are JavaScript interpreters that do not run in a browser, and they are usually faster.

I guess what people are trying to tell you is: do what you can on the server, instead of putting all of the code on the client side.
JavaScript performance differs from one browser to another (or from one interpreter to another), but JavaScript shouldn't serve the same purposes as server-side languages.

I'm a 'numbers guy' so when anybody says things like "well X is slow" or "of course, because Y is fast" that really gets my goat. So for starters, you need to use real data if you're going to make any kind of assessment:
JavaScript Performance Rundown
I also think watching Dromaeo in action is kinda cool

Modern browsers are adding more and more just-in-time compilation to their interpreters.
My rule of thumb is that if you can't rely on JavaScript being turned on, do as much on the server as you can. If you absolutely know that JavaScript is on, do as much in the client as you can and you'll save on bandwidth and server load.
