V8 Snapshot feature for Node.js server failure debugging

I've googled for 'V8 mksnapshot' and found out that there is a 'snapshot' feature in the V8 engine; there is also this question here.
This seems like an outstanding feature for reproducing bugs when a Node.js server fails.
The scenario
You call 'process.dumpAll' in some error handler of the Node.js server, perhaps attaching it process-wide, perhaps filtering error events somehow.
If a problem occurs, all of the V8 state is saved into a dump file.
Later, when you want to reproduce the problem, you can re-run Node.js from this dump.
You then attach a debugger to Node.js. The process would be in a suspended state, just as when a 'debugger' statement is triggered, and the current statement would be the 'process.dumpAll' call.
Now you can inspect every object's state in V8.
I'd like to ask
Where can I find better-documented information about V8 snapshots (better than some forum chatter)?
Do you see any pitfalls for this scenario?
What first steps should I take to implement the functions 'process.dumpAll' and 'process.loadAll' for Node.js? (excluding knowledge about writing Node.js extensions)
Has someone already built, or is someone building, a solution for this?

V8's mksnapshot feature is designed for startup acceleration, not for postmortem debugging, and I doubt that it could be useful for that purpose.
Core dumps (for crashing processes) and DevTools / heap snapshots (for exceptions) are in all likelihood more useful for debugging purposes.
There are some existing efforts for Node.js postmortem debugging. Maybe just knowing the right search-engine query ("Node.js postmortem debugging") can help you get an overview of existing solutions, their abilities and limitations?
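For the exception case, here is a minimal sketch of capturing a heap snapshot from an error handler. It assumes Node.js 11.13 or later, where the built-in `v8.writeHeapSnapshot()` is available. The resulting `.heapsnapshot` file can be opened in the Memory tab of Chrome DevTools. Unlike the imagined `process.loadAll`, a snapshot lets you inspect objects but cannot resume execution. `captureHeapSnapshot` and `installCrashSnapshotHandler` are hypothetical helper names.

```javascript
const v8 = require("v8");
const os = require("os");
const path = require("path");

// Write a heap snapshot into `dir` and return the path of the written file.
function captureHeapSnapshot(dir = os.tmpdir()) {
  const file = path.join(dir, `crash-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// Hypothetical wiring: snapshot the heap once when an uncaught exception
// reaches the process, then exit, because state after such an error is unreliable.
function installCrashSnapshotHandler() {
  process.once("uncaughtException", (err) => {
    console.error("Uncaught exception:", err);
    try {
      console.error("Heap snapshot written to", captureHeapSnapshot());
    } finally {
      process.exit(1);
    }
  });
}

installCrashSnapshotHandler();
```

Writing a snapshot is synchronous, blocks the event loop, and can need a lot of extra memory on large heaps. For hard crashes, starting Node with `--abort-on-uncaught-exception` makes the process abort and leave a core dump instead, provided the OS is configured to write one.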

Related

How to prevent websites from tracking/reporting console.log calls of my browser extension?

Having triggered some Track&Report messages in the console recently, I just became aware of the client-side logging capabilities of suitably equipped websites. I always thought the console was my very own real estate, but that was naive, quite obviously.
I am working on a browser extension to keep an eye on my own browsing behaviour etc., logging into storage.local for the time being and switching to indexedDB soon. But of course I utilise the console for debugging and convenience while developing it. So could the website owner fetch my logged objects, my stats, all of my debugging output in general? Even the whole storage? Or would obfuscating the object names help here?
Having looked into it, I've already read about some "workarounds", like:
no console logging at all (meh),
making console.log work only when a global DEBUG flag I set is on (same here, if for many other reasons),
constantly clearing the console (does that even help?),
and lots more ideas, none of which sounds elegant or even very helpful in the first place.
So my question is: do you have suggestions on how I can keep at least minimal console-like feedback from JavaScript while hiding my own stuff from all the web servers and third parties?
Sadly, I found no resources that explain the topic further, including the privacy aspects.
The usual term is "client-side logging", by the way, if you want to google it. There is plenty of information and tooling for the website side, but not much for controlling it on the local side. The topic isn't well known enough yet to avoid misunderstandings.

The scenario you appear to be concerned about is when a server-side application deploys a client-side agent to monitor its own app's behaviour from the client side, and that agent also sucks up your extension's debug info.
This is possible in principle, and there's no general solution at present. Although it's unlikely, you could review whether any of the data potentially being exposed is actually sensitive. If so, secure that data specifically by encrypting it with public-key encryption. More likely, though, you'll be able to carry on without logging that specific data at all.
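If you take the route of not logging sensitive data, a minimal sketch is to pass everything through a redaction step before it reaches the console. The key names in `SENSITIVE_KEYS` and the helpers `redact` and `debugLog` are hypothetical; adapt them to whatever your extension actually stores.

```javascript
// Keys whose values should never reach the console (illustrative assumption).
const SENSITIVE_KEYS = new Set(["url", "history", "stats"]);

// Return a deep copy of `value` with sensitive keys replaced by a placeholder.
// Assumes acyclic plain data (objects, arrays, primitives), as typically stored in extension storage.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key) ? "[redacted]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Drop-in replacement for console.log during development.
function debugLog(...args) {
  console.log(...args.map(redact));
}

debugLog("visit", { url: "https://example.com", durationMs: 1200 });
// prints: visit { url: '[redacted]', durationMs: 1200 }
```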

Node.js and fragmentation

Background: I came from the Microsoft world, in which I used to host websites on IIS. Experience taught me to recycle my application pool once a day in order to eliminate weird problems due to fragmentation. Recycling the app pool basically means restarting your application without restarting the entire IIS. I also watched a lecture that explained how Microsoft had reduced fragmentation a lot in .NET 4.5.
Now I'm deploying a Node.js application to a production environment, and I have to make sure that it works flawlessly all the time. My original plan was to restart my app once a day. Then I did some research to find clues about fragmentation problems in Node.js. The only thing I've found is a paragraph fragment from an article describing GC in V8:
"To ensure fast object allocation, short garbage collection pauses, and “no memory fragmentation”, V8 employs a stop-the-world, generational, accurate, garbage collector."
This statement alone is not enough for me to give up on building a restart mechanism for my app, but on the other hand I don't want to do unnecessary work if there is no problem.
So my question is:
Should or shouldn't I restart my app every now and then in order to prevent fragmentation?

Implementing a server restart before you know that memory consumption is indeed a problem is a premature optimization. As such, I don't think you should do it until you actually find that it is a problem. You will likely find more important issues to optimize for than memory consumption.
To figure out if you need a server restart, I recommend doing the following:
Set up some monitoring tools like https://newrelic.com/ that let you monitor your performance.
Monitor your memory continuously. Try to see if there is a steady increase in the amount of memory consumed, or if it levels off.
Decide upon an acceptable threshold before you need to act. For example, once your app consumes 60% of system memory, you need to start thinking about a server restart and decide upon the restart interval.
Decide whether you are OK with having "downtime" while restarting the server. If you don't want downtime, you may need to build a proxy layer to direct traffic.
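The "monitor memory and decide on a threshold" steps can be sketched with only Node's built-in `process.memoryUsage()` and `os.totalmem()`. The 60% threshold mirrors the example above. `sampleMemory` and `startMemoryMonitor` are hypothetical names. A real deployment would likely send these numbers to a monitoring tool rather than the console.

```javascript
const os = require("os");

// Fraction of total system memory at which we start considering a restart.
const RESTART_THRESHOLD = 0.6;

// Take one memory sample and compare resident set size against the threshold.
function sampleMemory(threshold = RESTART_THRESHOLD) {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const ratio = rss / os.totalmem();
  return { rss, heapUsed, heapTotal, ratio, overThreshold: ratio >= threshold };
}

// Log a sample periodically; unref() keeps the timer from holding the process open.
function startMemoryMonitor(intervalMs = 60000) {
  const timer = setInterval(() => {
    const s = sampleMemory();
    const mb = (n) => (n / 1048576).toFixed(1);
    console.log(`rss=${mb(s.rss)}MB heapUsed=${mb(s.heapUsed)}MB ratio=${s.ratio.toFixed(3)}`);
    if (s.overThreshold) console.warn("Memory above threshold; consider a restart policy");
  }, intervalMs);
  timer.unref();
  return timer;
}

startMemoryMonitor();
```

Watching whether `heapUsed` keeps climbing across many samples, rather than reacting to any single value, is what distinguishes a leak from normal GC sawtooth behaviour.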
In general, I'd recommend server restarts for all dynamic, garbage collected languages. This is fairly common in those types of large applications. It is almost inevitable that a small mistake somewhere in your code base, or one of the libraries you depend on will leak memory. Even if you fix one leak, you'll get another one eventually. This may frustrate your team, which will basically lead to a server restart policy, and a definition of what is acceptable in regards to memory consumption for your application.

I agree with @Parris. You should probably figure out whether you actually need a restart policy first. I would suggest using pm2 (docs here). Even if you don't want to sign up for Keymetrics, it's a pretty good little process manager and really quick to set up. You can get a report of memory usage from the command line. It looks something like this.
Also, if you start in cluster mode like above, you can call pm2 restart my_app, and the first process will probably be up again before the last one is taken offline (this is an added benefit; the real reason for having 8 processes is to utilize all 8 cores). If you are adamant about avoiding downtime, you could restart them one by one according to id.

I agree with @Parris; this seems like a premature optimization. Also, restarting is not a solution to the underlying problem; it's a treatment for the symptoms.
If memory errors are a prevalent issue for your node application then I think that some thought as to why this fragmentation occurs in your program in the first place could be a valuable effort. Understanding why memory errors occur after a program has been running for a long period of time, and refactoring the architecture of your program to solve the root of the problem, is a better solution in my eyes than just addressing the symptoms.
I believe two things will benefit you.
Immutable objects will help a lot: they are a lot more predictable than mutable objects and will not be affected by how long the project has been live. Also, since immutable objects are read-only blocks of memory, they are faster than mutable objects, for which the server has to spend resources deciding whether to read or write the memory block that stores the object. I currently use the library called Immutable (Immutable.js) and it works well for me. There are other ones as well, like Deep Freeze; however, I have never used it.
Make sure to manage your application's processes correctly; memory leaks are the second big contributor to this problem that I have experienced. Again, this is solved by thinking about how your application is structured and how user events are handled, making sure that once a process is no longer being used by the client, it is properly removed from the heap. If it is not, the heap keeps growing until all memory is consumed, causing the application to crash (refer to the graphic below to see V8's memory scheme and where the heap is). Node is a C++ program built on Google's V8 JavaScript engine.
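For the immutability point, here is a dependency-free sketch using only the built-in `Object.freeze`. This is not the Immutable.js or deep-freeze API, and `deepFreeze` is a hypothetical helper.

```javascript
// Recursively freeze an object graph so that no nested property can be reassigned.
function deepFreeze(obj) {
  // Freeze first, so that cyclic references hit an already-frozen object and stop.
  Object.freeze(obj);
  for (const key of Reflect.ownKeys(obj)) {
    const value = obj[key];
    const isFreezable = value !== null && (typeof value === "object" || typeof value === "function");
    if (isFreezable && !Object.isFrozen(value)) deepFreeze(value);
  }
  return obj;
}

const config = deepFreeze({ server: { port: 8080 }, tags: ["a", "b"] });
console.log(Object.isFrozen(config.server)); // true
```

In strict mode, an assignment such as `config.server.port = 9090` throws a `TypeError`; in sloppy mode the write is silently ignored.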
You can use Node.js's process.memoryUsage() to monitor memory usage. When it comes to managing your heap, V8 uses two garbage-collection strategies: Scavenge, which is very quick but incomplete (it only collects the young generation), and Mark-Sweep, which is slower and frees up all unreferenced memory.
Refer to this blog post for more info on managing your heap and memory in V8, which runs Node.js.
So the responsible approach is to keep a close eye on open processes, develop a deep understanding of the heap, and learn how to free unreferenced memory blocks. Designing your project with this in mind also makes it a lot more scalable.
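To see the split between the young generation (handled by Scavenge) and the old generation (handled by Mark-Sweep) in a running process, a small sketch using Node's built-in `v8.getHeapSpaceStatistics()` might look like this. The exact list of space names varies between V8 versions, and `heapSpaceSummary` is a hypothetical helper.

```javascript
const v8 = require("v8");

// Summarize V8's heap spaces in megabytes. "new_space" holds the young
// generation collected by Scavenge; "old_space" holds long-lived objects
// reclaimed by the slower mark-based collector.
function heapSpaceSummary() {
  return v8.getHeapSpaceStatistics().map((s) => ({
    space: s.space_name,
    usedMB: Number((s.space_used_size / 1048576).toFixed(2)),
    sizeMB: Number((s.space_size / 1048576).toFixed(2)),
  }));
}

console.table(heapSpaceSummary());
```

Sampling this periodically shows whether growth is concentrated in `old_space`, which is where objects kept alive by a leak end up.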

How can I debug/step-through/watch my Node.js in Windows?

Sorry, maybe this belongs on Programmers Stack Exchange, but I'm trying to get into Node.js web development, and I really need the ability to step through my code in order to gain a deeper understanding of what is happening in all the tutorials I'm using.
I've done some googling, but it looks like everything is written assuming you're in a *nix or OSX environment.
I've tried node-inspector, but I'm being greeted with errors whenever I try to run process._debugProcess() with the PID.

JetBrains WebStorm is a relatively inexpensive IDE you can use with Node.js, and it is quite feature-rich considering the price.
Watch the demonstration video; it should give you an idea of whether it's the kind of thing that could help.
http://www.jetbrains.com/webstorm/
Alternatively, you could use Eclipse and get this up and running.
https://github.com/joyent/node/wiki/Using-Eclipse-as-Node-Applications-Debugger

How much does it cost in terms of performance to use console.log in nodejs and in browsers?

Let's say you log certain things on your nodejs app or on a browser.
How much does this affect performance / CPU usage vs removing all these logs in production?
I'm not asking out of mere curiosity: I want to know how much "faster" things would run without it, so I can take that into account when developing.

It can cost a lot, especially if your application is heavily based on a loop, like a game or a GUI app that gets updated in real time.
I once developed an educational physics app using <canvas>, and with logs activated within the main application loop, the frame rate easily dropped from 60fps to 28fps! That was quite catastrophic for the user experience.
The overall tip for browser applications is: do not use console.log() in production for loop-based applications, especially the ones that need to update a graphical interface within the loop.
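If some feedback from inside a hot loop is still needed during development, one common mitigation (a suggestion, not something the answer above measured) is to throttle logging. The sketch assumes a loop running at roughly 60 iterations per second; `makeThrottledLogger` and its `everyN` parameter are hypothetical names.

```javascript
// Wrap a log function so it only fires on every `everyN`-th call.
// At ~60 iterations per second, everyN = 60 gives about one line per second.
function makeThrottledLogger(everyN, sink = console.log) {
  let calls = 0;
  return (...args) => {
    if (calls % everyN === 0) sink(...args);
    calls += 1;
  };
}

const logFrame = makeThrottledLogger(60);
for (let frame = 0; frame < 180; frame++) {
  // ... update and render the frame here ...
  logFrame("frame", frame); // logs frames 0, 60 and 120 only
}
```

The arguments are still evaluated on every call, so expensive formatting should stay out of the hot path.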

For Node: is node.js' console.log asynchronous?
I imagine it's implemented similarly in some browsers.

I'm not familiar with Node.js; however, it's typically not a good idea to log anything except critical errors in a production environment. Unless Node.js offers a logging utility like log4j, you should look at something like log4js (I haven't used it; it was just the first Google result).
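The "only critical errors in production" idea can be sketched as a tiny level-based logger. The numeric levels and the `NODE_ENV` convention are assumptions of this sketch, and `createLogger` is a hypothetical helper; libraries such as log4js provide far richer configuration.

```javascript
// Numeric severities; higher means more important.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// Build a logger whose methods below `minLevel` are no-ops.
// `sink` must provide debug/info/warn/error methods, as Node's console does.
function createLogger(minLevel, sink = console) {
  const threshold = LEVELS[minLevel];
  if (threshold === undefined) throw new Error(`Unknown level: ${minLevel}`);
  const logger = {};
  for (const [name, severity] of Object.entries(LEVELS)) {
    logger[name] = severity >= threshold ? (...args) => sink[name](...args) : () => {};
  }
  return logger;
}

// Only errors in production, everything during development.
const log = createLogger(process.env.NODE_ENV === "production" ? "error" : "debug");
log.debug("cache warmed"); // suppressed when NODE_ENV=production
log.error("database unreachable");
```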

Executing JavaScript with Python without X

I want to parse an HTML page that unfortunately requires JavaScript to show any content. To do so, I use a small Python script that pulls the HTML code of the page, but after that I have to execute the JavaScript in a DOM context, which seems pretty hard.
To make it even harder, I want to use it in a server environment that has no X11 server.
Note: I already read about http://code.google.com/p/pywebkitgtk/ but it seems to need an X server.

You can simulate a browser environment using EnvJS. However, in order to make use of it, you will have to embed some kind of JavaScript runtime (e.g. Rhino) in your program (or spawn one as an external process).

You could try using Xvfb to have a fake frame buffer, so you won't need to run X11 (though it may be a dependency of Xvfb on your system). Most rendering engines don't have a headless mode, so something like Xvfb is necessary to run them. I used this technique successfully using XULRunner to navigate web pages, though not from python.

I'm still trying to figure this out myself, so take my answer with a grain of salt.
So far, I found http://blog.motane.lu/2009/06/18/pywebkitgtk-execute-javascript-from-python/, which describes the use and the quirks of Pywebkitgtk by someone who has similar needs to what we do.
Later, however, the writer of that blog post discovered that he couldn't get it to work with Xvfb, so he hunted some more and found a Qt WebKit approach (possibly in Qt itself, if I understand correctly): http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/. Apparently it's a much better solution than PywebkitGTK.
Naturally, I'll be looking into the other solutions offered here--but I wanted to bring up the Qt solution, because to me, it seems the most likely candidate for what I want to do...and if not, then perhaps it will be for someone else, looking for an answer to this question! :-)

I use VNC or Xvfb for this purpose, combined with Firefox. After experimenting with the two, I settled on XTightVNC. We use it to create screenshots on demand for various test purposes. It's nice to use one of these because you're executing it in an actual browser, same as a user would be (though most users probably won't be using the same OS as your server).
The handy thing about using VNC is that you can connect remotely to set up and test the browser when needed.

This might help: http://code.google.com/p/pyv8/
