node.js - Dealing with uncaught exceptions » Debuggable

node.js - Dealing with uncaught exceptions

I just ran into a really nasty bug related to node's process.on('uncaughtException') handler and thought I should share my experience.

Basically you need to be very careful with handling uncaught exceptions in node. Ideally you should just send out an error email and perform a graceful restart of your service whenever you hit an uncaught exception.

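Why? Well, if you think you can use "uncaughtException" to just resume your service as if nothing had happened, you may be in for a nice surprise. Consider a script that logs every "uncaughtException", reads a file through fs.ReadStream, and throws an Error('oh no') from its 'data' callback, while a second listener waits for the 'end' event.

Unless you are very familiar with the inner workings of fs.ReadStream, you would probably expect the following output: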

data event fired
{ message: 'oh no', stack: [Getter/Setter] }  
end event fired

However, the actual output is:

data event fired
{ message: 'oh no', stack: [Getter/Setter] }

You might already be able to guess what happens. The exception thrown in the callback for the 'data' event makes node trigger the "uncaughtException" event and drop the whole call stack. Unfortunately, "fs.ReadStream" still had more code to execute after firing this "data" callback. That code never runs, so the "end" event never fires.
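The mechanism can be reproduced without touching the file system. The following is a minimal sketch, not the actual fs.ReadStream implementation: emitChunkThenEnd() and runDemo() are hypothetical names, and a plain try/catch stands in for the "uncaughtException" handler so that the result is deterministic.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a stream's internals, NOT the real fs.ReadStream code.
function emitChunkThenEnd(source) {
  source.emit('data', 'chunk'); // a throwing listener unwinds the stack from here
  source.emit('end');           // skipped whenever the line above throws
}

function runDemo() {
  const log = [];
  const source = new EventEmitter();

  source.on('data', () => {
    log.push('data event fired');
    throw new Error('oh no');
  });
  source.on('end', () => log.push('end event fired'));

  try {
    emitChunkThenEnd(source);
  } catch (err) {
    // Plays the role of the "uncaughtException" handler: the error is
    // observed, but execution does not resume inside emitChunkThenEnd().
    log.push(err.message);
  }
  return log;
}

console.log(runDemo()); // [ 'data event fired', 'oh no' ]
```

The 'end' listener is registered but never called, because the second emit() sits on the part of the call stack that the exception discarded.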

This is a fairly harmless example. The bug I actually ran into caused my database queries to get out of order, so I was looking at the wrong end of my code base for quite a while.

So, unless you really know what you are doing, you should perform a graceful restart of your service after receiving an "uncaughtException" event. Otherwise you risk leaving the state of your application, or that of 3rd party libraries, inconsistent, which leads to all kinds of crazy bugs.

Let me know if you have any questions, or other suggestions for dealing with uncaught exceptions!

--fg

PS: This bug was especially painful for me, since I'm also to blame for "uncaughtException" being in the node core to begin with : ).


Arnout said on Sep 17, 2010:

Even if your application generates uncaught exceptions, there isn't always a need to restart the server. Your application might still run fine even though the error occurred.

Felix Geisendörfer said on Sep 17, 2010:

Arnout: Yes, there are scenarios where nothing will happen if you just continue. But how do you detect which scenario it is? I think there is no general way to determine this in the uncaughtException handler.

Soenke Ruempler said on Sep 17, 2010:

Hi Felix,

yes, the application should stop immediately in case of an uncaught exception. I could also imagine the following error handling "flow":

* uncaught exception
* exception handler logs the message/backtrace/env (we're using graylog/graylog2 for this stuff -> http://www.graylog2.org/ )
* exception handler does exit(1) to make sure no more stuff is executed while your program is in a bad state
* an angel process/script restarts the (node) process after some seconds (if the process returned exit code != 0)
* you could also think of some throttling in the angel process if node dies too often within a specified time frame

I've used this pattern successfully (not with node, but it's kinda generic)

soenke
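The restart and throttling steps of that flow could look roughly like this in node. This is only a sketch under assumptions: shouldRestart() and supervise() are hypothetical names, and the defaults (5 crashes, 60 second window, 2 second delay) are arbitrary illustration values, not recommendations from the thread.

```javascript
const { spawn } = require('child_process');

// Hypothetical throttle check for the supervising ("angel") process.
// crashTimes: timestamps (ms) of crashes so far, including the latest one.
// Returns false once maxCrashes crashes fall inside the last windowMs.
function shouldRestart(crashTimes, now, maxCrashes, windowMs) {
  const recent = crashTimes.filter((t) => now - t <= windowMs);
  return recent.length < maxCrashes;
}

// Hypothetical supervisor: runs `script` with the current node binary and
// restarts it after a delay whenever it exits with a non-zero code.
function supervise(script, options) {
  const { maxCrashes = 5, windowMs = 60000, delayMs = 2000 } = options || {};
  const crashTimes = [];

  function start() {
    const child = spawn(process.execPath, [script], { stdio: 'inherit' });
    child.on('exit', (code) => {
      if (code === 0) return; // clean exit, nothing to restart
      const now = Date.now();
      crashTimes.push(now);
      if (shouldRestart(crashTimes, now, maxCrashes, windowMs)) {
        setTimeout(start, delayMs);
      } else {
        console.error('Too many crashes in a short time, giving up.');
      }
    });
  }

  start();
}
```

Calling supervise('server.js') (the file name is a placeholder) would keep the child running until it either exits cleanly or crashes too often. A dedicated tool such as monit covers the same ground outside of node.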

NOSLOW said on Sep 17, 2010:

Way to regain control, Felix! I knew you couldn't just let it go without explanation. I'd expect nothing less just because of your company's namesake :)

Benjamin said on Sep 18, 2010:

Is this a bug in fs.ReadStream? Maybe it should wrap the emit call in a try/catch, and then in the catch make sure it leaves itself in a sound state and then rethrow...
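As a rough illustration of that idea (a sketch on a made-up emitter, not a patch for the real fs.ReadStream), the pattern could look like this. CarefulSource, deliverLastChunk() and finish() are hypothetical names.

```javascript
const { EventEmitter } = require('events');

// Hypothetical emitter that tidies up its own state before letting a
// listener's exception propagate.
class CarefulSource extends EventEmitter {
  constructor() {
    super();
    this.finished = false;
  }

  // Emits 'end' at most once.
  finish() {
    if (this.finished) return;
    this.finished = true;
    this.emit('end');
  }

  deliverLastChunk(chunk) {
    try {
      this.emit('data', chunk);
    } catch (err) {
      this.finish(); // leave the object in a sound state first
      throw err;     // then rethrow so the error is not swallowed
    }
    this.finish();
  }
}
```

With this shape, a listener that throws from 'data' still gets its 'end' event, and the exception still travels up to whoever handles it. The cost is that every emitting call site has to be written this way.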

Marc Harter said on Sep 19, 2010:

You mention a "graceful" restart when you use uncaughtException, could you provide an example of what that would look like codewise?

Felix Geisendörfer said on Sep 20, 2010:

Benjamin: No, I don't think it is a bug. ReadStream is simply not "callback safe" (does that term make sense?). You generally shouldn't expect anything to be "callback safe" unless you have written it that way.

Now we could argue whether everything in the node core should be "callback safe". It would probably be fairly easy to do in the emit() function. Regular callbacks are a little trickier, as *every* single one of them needs to be handled individually.

Marc Harter: That depends on your service. If you already run a watch process like monit, it may simply mean process.exit(0). If you have long-lived requests (like uploads), it means that you need to close your http server, and wait for everything to finish up before you process.exit(). At the same time you'd probably want to bring up a 2nd process listening on the same port again so new requests are handled by the new process while old ones are still in the old process. Either way, it gets a little tricky at that point : ).
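For the "close the server and wait" case, a minimal sketch might look like the following. gracefulShutdown() is a hypothetical helper, and the 30 second timeout and the exit code of 1 are assumptions for illustration. It relies only on the server having a close(callback) method, which http.Server provides.

```javascript
// Hypothetical helper. `server` is anything with close(callback), such as an
// http.Server. `exit` defaults to process.exit and can be swapped out in tests.
function gracefulShutdown(server, options) {
  const {
    timeoutMs = 30000,
    exitCode = 1,
    exit = (code) => process.exit(code),
  } = options || {};
  let done = false;

  const exitOnce = () => {
    if (done) return;
    done = true;
    clearTimeout(timer);
    exit(exitCode);
  };

  // Safety net: give up waiting if open connections never finish.
  const timer = setTimeout(exitOnce, timeoutMs);

  // Stop accepting new connections; the callback runs once existing ones end.
  server.close(exitOnce);
}

// Possible wiring (assumes `server` is your http.Server):
// process.on('uncaughtException', (err) => {
//   console.error(err.stack || err.message);
//   gracefulShutdown(server);
// });
```

Starting the replacement process on the same port is left to whatever supervises the service, since that part depends heavily on the deployment.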

Marco Rogers said on Sep 22, 2010:

Felix, I'm not sure I understand exactly what is happening after the uncaughtException callback fires. What's the state of the process afterwards?

- Is it clearing everything off the event loop?
- Is it clearing all event listeners?
- Is it stopping servers? If not, what state are they in?

Also, instead of restarting the process, what if you just restarted all of your services with a fresh state? Most node programs start everything up in the main script file. But what if you put your services into modules? When there is an exception, you trash the old stuff and just call the service initializer again, which you've set up to start with a completely fresh state. Is that an option?

Felix Geisendörfer said on Sep 22, 2010:

Marco Rogers: uncaughtException does not involve any magic. It really just stops executing where the exception occurred, and then fires the event. The problem is that it doesn't continue where it left off afterwards.

About restarting the process: Yeah, you might do your cleanup internally. But you basically won't know what needs cleaning up, so restarting the process may still be safer.

Isaac Z. Schlueter said on Sep 22, 2010:

I disagree that this is a bug, or even undesirable behavior.

The process "uncaughtException" event is designed to resemble the window.onerror callback. It gets notified, but it's not a "catch". Code that throws jumps all the way out of its stack frame. That's what a "throw" is: A way to say "I have hit a point where I cannot continue. Someone fix me, please." You're done. Warp out of there. Do not pass Go, do not collect $200.

If it were to log the error and then keep on chugging with the "end" event and such in that context, then in my opinion, that would be a fundamental shift in the way that the language works.

jonathan chetwynd said on Jan 03, 2011:

This example includes 'throw'.
Please could you include a means to find an error, i.e. when the reason for the uncaught exception is not known?

message: [Getter/Setter],
stack: [Getter/Setter],
type: 'non_object_property_load',
arguments: [ 0, null ]

This error log is difficult for me to interpret.

Isaac Z. Schlueter said on Jan 03, 2011:

@Jonathan

It seems like that log is doing a sys.inspect or console.log of an Error object.

In that situation, you may be better off with something like this:

console.log("Error: %s", er.stack || er.message)

Jonathan Payne said on Feb 02, 2011:

This article freaked me out until I saw Isaac Z. Schlueter's comment. If you have a server that exits just because one little "thread" has an exception, what kind of server is that? A bad one.

But I have to admit I am not entirely sure what this all means. node.js is single threaded, there's only one thread, so when an exception occurs it should be caught by the event loop, printed out (or sent itself as a notification), and then the event loop should carry on and run the next thing.

If there's something fs.ReadStream needs to do in order to prevent things from getting out of whack, it needs to catch exceptions itself when it issues a callback and clean up after itself. Maybe it does and emits an error event for all I know. If you get the 'error' event you know you're not going to get the 'end' event, or something like that.
