Node.js and The Case of the Blocked Event Loop

In [Pick Your Battles](http://zef.me/4235/pick-your-battles) I listed a few problems that we had in our production deployment of a big node.js codebase. Some people asked me to elaborate on one in particular:

> "Oh, our node.js server processes seem to freeze up for a long time (seconds) from time to time, why does that happen?"

So, why _did_ this happen? The short answer is that our code blocked the node.js event loop from time to time. As you may be aware, node.js -- like Javascript in the browser -- is a single-threaded, [event loop driven](http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/) environment: only one thing can happen at a time. Parallelism does not exist from the programmer's perspective. This works because applications built using node.js are supposed to be I/O bound rather than CPU bound: to handle a request, most time is spent waiting for I/O (a disk has to spin, data has to travel over a network) rather than doing CPU-intensive computations.

In a sense, node.js brings us back to the [cooperative scheduling](http://en.wikipedia.org/wiki/Cooperative_Scheduler#Cooperative_multitasking.2Ftime-sharing) days of Windows 3.x/Mac OS 9 and earlier -- but with the advantage of avoiding all the overhead that threads bring to the table.

Cooperative scheduling works well, as long as you -- well -- cooperate. What does cooperating mean in this context? You have to chop up the work that needs to be done into small nuggets of computation, and never do too much of it at once. For instance:

```javascript
function requestHandler(req, res) {
   db.getUser(req.params.uid, function(err, user) {
      res.end(user.username);
   });
}
```

This super poor piece of Javascript code is wrong in many ways, but it does one thing well: whenever `requestHandler` is called (presumably when an HTTP request comes in) it performs an asynchronous call and returns immediately. Assuming `db.getUser` is as asynchronous as it looks, you're good to go: very little computation, immediately invoking another I/O-bound operation.

A year ago, Ted Dziuba made an important point about node.js in [a fairly inflammatory post (since removed from his blog, but I found a copy)](http://pages.citebite.com/b2x0j8q1megb):

> A function call is said to block when the current thread of execution's flow waits until that function is finished before continuing. Typically, we think of I/O as "blocking", for example, if you are calling socket.read(), the program will wait for that call to finish before continuing, as you need to do something with the return value.

> Here's a fun fact: every function call that does CPU work also blocks. This function, which calculates the n'th Fibonacci number, will block the current thread of execution because it's using the CPU.

```javascript
function fibonacci(n) {
  if (n < 2)
    return 1;
  else
    return fibonacci(n-2) + fibonacci(n-1);
}
```

He goes on to demonstrate how his Fibonacci server written in node.js has abysmal performance. That's great, but we don't usually build Fibonacci servers in node.js. However, there are cases where node.js _does_ become CPU bound and blocking, albeit unintentionally:

```javascript
function requestHandler(req, res) {
   var body = req.rawBody; // Contains the POST body
   try {
      var json = JSON.parse(body);
      res.end(json.user.username);
   }
   catch(e) {
      res.end("FAIL");
   }
}
```

Looks fine, right? It just takes the request's body and parses it. This works great until somebody POSTs a 15 MB JSON file, which your server will now have to process. I just tested this on my laptop: executing the `JSON.parse()` call on a 15 MB JSON file took about 1.5 seconds. Similarly, stringifying a JSON data structure of that size with `JSON.stringify(json, null, 2)` took about 3 seconds.
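If you want to see this for yourself, a rough sketch like the following will do: it builds up a multi-megabyte JSON string in memory and times a single `JSON.parse()` call. The data shape is arbitrary and the exact numbers will differ per machine.

```javascript
// Rough sketch: build a large JSON string in memory, then time how long a
// single JSON.parse() call keeps the event loop busy. The data shape is
// arbitrary; the point is that the parse happens synchronously, on the
// main thread.
var records = [];
for (var i = 0; i < 200000; i++) {
  records.push({ id: i, username: "user" + i, email: "user" + i + "@example.com" });
}
var body = JSON.stringify(records); // several megabytes of JSON

var start = Date.now();
JSON.parse(body);
console.log("JSON.parse blocked the event loop for " + (Date.now() - start) + "ms");
```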

You may think: "oh, 1.5 seconds, 3 seconds, that's still pretty fast!" Do realize that during this time the event loop is completely blocked: your node.js server process will do _nothing_ else. It will not accept new connections, it will not keep processing ongoing requests -- the entire process freezes. While a 15 MB request is a bit of a stretch, a 200 kB JSON document may seem more reasonable. Yet, if you get 20 of those, your server clogs up just the same.
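The cooperative answer is the advice from before: chop the work up into small nuggets and hand control back to the event loop in between. There is no built-in incremental `JSON.parse`, but for CPU-heavy loops the pattern looks roughly like the sketch below (`processInChunks` is a made-up helper; `setImmediate` is available from node 0.10, and `setTimeout(fn, 0)` works similarly on older versions):

```javascript
// Sketch of "cooperating": process a large array in small slices, yielding
// back to the event loop between slices so other callbacks can run.
function processInChunks(items, chunkSize, eachItem, done) {
  var index = 0;
  function doChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      eachItem(items[index]);
    }
    if (index < items.length) {
      setImmediate(doChunk); // yield, continue with the next slice later
    } else {
      done();
    }
  }
  doChunk();
}

// e.g. processInChunks(hugeArray, 1000, handleRecord, sendResponse);
```

Each slice still blocks, but only briefly, so other requests get served in between.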

So, how much of an impact does this blocking have on performance? That's pretty easy to calculate. Let's say one request takes 1ms to process; that means you can handle at most 1/0.001 = 1,000 requests/s (assuming you don't do any I/O). That's pretty good. Alright, so how about some longer event loop blocking?

* 5ms/req = max 200 reqs/s
* 50ms/req = max 20 reqs/s
* 500ms/req = max 2 reqs/s
* 2s/req = max 0.5 reqs/s

Of course this correlation holds for any other technology just the same: the more processing each request takes, the fewer requests a single server can handle. However, on other platforms requests will just get _slower_ as the load on a server increases. With node.js, if you do a blocking computation the entire process hangs completely for the duration of that computation. You can limit the impact on your users by using a module like [cluster](http://nodejs.org/api/cluster.html), but the message is clear: blocking the event loop is bad, mkay?
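For completeness, here is roughly what using cluster looks like -- a sketch that forks one worker per CPU core, so a blocked worker doesn't freeze every connection (the HTTP handler is just a placeholder):

```javascript
// Sketch: fork one worker per CPU core with the cluster module. A blocking
// computation still freezes the worker it runs in, but the other workers
// keep accepting connections in the meantime.
var cluster = require("cluster");
var http = require("http");
var numCPUs = require("os").cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  http.createServer(function(req, res) {
    res.end("hello from worker " + process.pid + "\n");
  }).listen(8000);
}
```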

Node.js's core strength is message passing and gluing components together. It's great at receiving a request, processing it slightly and passing it on to another (database) server, waiting for a result, tweaking that result and passing it back to the client.

The main problem is that you may be completely unaware that your program does serious computation until you start to notice hiccups. Tooling for this aspect of node.js development is still severely lacking.
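In the meantime you can get surprisingly far with something hand-rolled: schedule a timer at a fixed interval and check how late it actually fires. A sketch (the interval and threshold are arbitrary):

```javascript
// Poor man's event loop monitor: a timer that should fire every 500ms.
// If it fires much later than that, something blocked the event loop
// in between.
var interval = 500;
var last = Date.now();
setInterval(function() {
  var now = Date.now();
  var lag = now - last - interval;
  if (lag > 100) { // arbitrary threshold
    console.warn("Event loop was blocked for ~" + lag + "ms");
  }
  last = now;
}, interval);
```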

A while back I was in contact with [NodeFly](http://www.nodefly.com), who seem very serious about building tools to detect these types of issues. Definitely a company to watch in this space.