Node.js and The Case of the Blocked Event Loop
By Zef Hemel
In Pick Your Battles I listed a few problems that we had in our production deployment of a big node.js codebase. Some people asked me to elaborate on one in particular:
So, why did this happen? The short answer is that our code blocked the node.js event loop from time to time. As you may be aware, node.js - like Javascript in the browser - is a single-threaded, event-loop-driven environment: only one thing can happen at a time. Parallelism does not exist from the programmer's perspective. This works because applications built using node.js should be I/O bound instead of CPU bound, meaning that to handle a request, most time is spent waiting for I/O (a disk has to spin, data has to travel over a network) rather than doing CPU-intensive computations.
In a sense, node.js brings us back to the cooperative scheduling days of Windows 3.x/Mac OS 9 and earlier - but with the advantage of avoiding all the overhead that threads bring to the table.
Cooperative scheduling works well, as long as you - well - cooperate. What does cooperating mean in this context? You have to make sure that you chop up the work that needs to be done into small nuggets of computation, and don't do too much computation at once. For instance:

```javascript
function requestHandler(req, res) {
  db.getUser(req.params.uid, function(err, user) {
    res.end(user.username);
  });
}
```

This super poor piece of Javascript code is wrong in many ways, but it does one thing well: whenever requestHandler is called (presumably when an HTTP request comes in) it performs an asynchronous call and returns immediately. Assuming db.getUser is as asynchronous as it looks, you're good to go: very little computation, immediately invoking another I/O-bound operation.
A year ago, Ted Dziuba made an important point about node.js in a fairly inflammatory post (since removed from his blog, but I found a copy):
Here's a fun fact: every function call that does CPU work also blocks. This function, which calculates the n-th Fibonacci number, will block the current thread of execution because it's using the CPU.

```javascript
function fibonacci(n) {
  if (n < 2) return 1;
  else return fibonacci(n - 2) + fibonacci(n - 1);
}
```

He goes on to demonstrate how his Fibonacci server written in node.js has abysmal performance. That's great, but we don't usually build Fibonacci servers in node.js. However, there are cases where node.js _does_ become CPU bound and blocking, albeit unintentionally:
```javascript
function requestHandler(req, res) {
  var body = req.rawBody; // Contains the POST body
  try {
    var json = JSON.parse(body);
    res.end(json.user.username);
  } catch (e) {
    res.end("FAIL");
  }
}
```

Looks fine, right? It just takes the request's body and parses it. This works great until somebody POSTs a 15mb JSON file, which your server will now have to process. I just tested this on my laptop. Executing the JSON.parse() call on a 15mb JSON file took about 1.5 seconds. Similarly, if I stringify a JSON data structure of this size with JSON.stringify(json, null, 2), it takes about 3 seconds.
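If you want to reproduce this measurement yourself, here is a minimal sketch that builds a synthetic multi-megabyte JSON payload and times the parse; makeBigJson and its field names are made up for illustration, and the exact numbers will vary with your machine and payload shape:

```javascript
// Build a synthetic JSON payload of a few megabytes.
function makeBigJson(entries) {
  var users = [];
  for (var i = 0; i < entries; i++) {
    users.push({
      id: i,
      username: "user" + i,
      bio: Array(101).join("x") // 100 characters of filler
    });
  }
  return JSON.stringify({ users: users });
}

var body = makeBigJson(50000);
var start = Date.now();
var json = JSON.parse(body); // the event loop is blocked for this entire call
var elapsed = Date.now() - start;

console.log("Parsed " + (body.length / 1024 / 1024).toFixed(1) +
            "mb of JSON in " + elapsed + "ms");
```

While that JSON.parse() call runs, no timers fire, no connections are accepted, no callbacks execute - the measurement itself is the point, not the exact milliseconds.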
You may think: "oh, 1.5 seconds, 3 seconds, that's still pretty fast!" Realize that during this time the event loop is completely blocked: your node.js server process will do nothing else. It will not accept new connections, it will not keep processing ongoing requests - the entire process freezes. While a 15mb request is a bit of a stretch, a 200kb JSON document may seem more reasonable. Yet, if you get 20 of those, your server clogs up just the same.
So, how much of an impact does this blocking have on performance? That's pretty easy to calculate. Let's say 1 request takes 1ms to process; that means you can handle at most 1/0.001 = 1000 requests/s (assuming you don't do any I/O). That's pretty good. Alright, so how about some longer event loop blocking?
* 5ms/req = max 200 reqs/s
* 50ms/req = max 20 reqs/s
* 500ms/req = max 2 reqs/s
* 2s/req = max 0.5 reqs/s
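The numbers above all come from the same division - one second divided by the per-request blocking time - which a one-liner makes explicit:

```javascript
// Max throughput of a single node.js process when each request
// blocks the event loop for `blockMs` milliseconds (ignoring I/O).
function maxRequestsPerSecond(blockMs) {
  return 1000 / blockMs;
}

console.log(maxRequestsPerSecond(1));   // 1000
console.log(maxRequestsPerSecond(5));   // 200
console.log(maxRequestsPerSecond(500)); // 2
```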
Of course this correlation holds for any other technology just the same: the more processing each request takes, the fewer requests a single server can handle. However, on other platforms requests just get slower as the load on a server increases. With node.js, if you do a blocking computation, the entire process hangs completely for the duration of that computation. You can limit the impact on your users by using a module like cluster, but the message is clear: blocking the event loop is bad, mkay?
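Besides cluster, the classic way to cooperate is to chop long-running loops into small chunks and yield back to the event loop in between. Here is a sketch of that pattern using setImmediate; processInChunks, items, and processItem are hypothetical names, and the chunk size is something you would tune for your own workload:

```javascript
// Sketch: process a large array in small synchronous chunks,
// yielding to the event loop between chunks so other callbacks
// (new connections, timers) can run in the meantime.
function processInChunks(items, processItem, done) {
  var i = 0;
  var CHUNK = 1000; // items handled per slice of synchronous work

  function step() {
    var end = Math.min(i + CHUNK, items.length);
    for (; i < end; i++) {
      processItem(items[i]);
    }
    if (i < items.length) {
      setImmediate(step); // yield, then continue with the next chunk
    } else {
      done();
    }
  }
  step();
}
```

Each individual slice still blocks, but only for a short, bounded time, which is exactly the kind of cooperation a single-threaded runtime asks of you.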
Node.js's core strength is message passing and gluing components together. It's great at receiving a request, processing it slightly, passing it on to another (database) server, waiting for the result, tweaking that result, and passing it back to the client.
The main problem is that you may be completely unaware that your program does serious computation until you start to notice hiccups. Tooling for this aspect of node.js development is still severely lacking.
A while back I was in contact with NodeFly, who seem very serious about building tools to detect these types of issues. Definitely a company to watch in this space.