CPUs barely get faster anymore; making them faster is physically difficult. What we can do, however, is build many CPUs and strap them onto a board. That’s much cheaper and easier. Whereas before we simply waited a year for CPUs to catch up with our software’s performance requirements, today we have to approach the problem differently, by thinking concurrently: what parts of my program can be executed simultaneously?
In computer science, this is the big nut to crack. As it turns out, our brain is not wired to think about concurrency problems. It is just too damn difficult.
When I started web development, I did not have to worry about this problem at all. Applications would easily scale to multiple processors and servers without a problem. Back then, the only real language used to develop web applications was Perl, which ran through the Common Gateway Interface. The way it worked was that for every HTTP request a new instance of the Perl interpreter would be fired up, which handled the request and then died. Later came PHP, which optimised this process by embedding the interpreter in the web server itself. Still, every request would give you a completely clean PHP environment. Requests were 100% isolated. FastCGI is a similar optimisation technique used to reduce overhead by reusing interpreter processes. Optimisations aside, conceptually the CGI model remains the same:
- HTTP request comes in
- Interpreter is launched
- Interpreter handles request
- Interpreter is destroyed
- Goto 1
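The loop above can be sketched in a few lines of Python. This is only an illustration, not how any particular web server implements CGI: `HANDLER_SOURCE` and `handle_request` are names I made up, and the handler script is passed inline instead of living in a file. The request is handed over through the standard CGI environment variables `REQUEST_METHOD` and `PATH_INFO`.

```python
import os
import subprocess
import sys

# Hypothetical CGI-style script: reads the request from environment
# variables, writes the response to stdout, then exits.
HANDLER_SOURCE = """
import os
hits = 0   # starts at zero on every launch: nothing survives a request
hits += 1
print("Content-Type: text/plain")
print()
print(f"path={os.environ['PATH_INFO']} hits={hits}")
"""


def handle_request(path: str) -> str:
    """Launch a fresh interpreter for one request and return its raw output."""
    env = dict(os.environ, REQUEST_METHOD="GET", PATH_INFO=path)
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The interpreter has already exited here, taking all of its state with it.
    return result.stdout
```

However many times `handle_request` is called, the counter in the response stays at 1, because each request gets its own interpreter.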
The generation of web languages and frameworks that came next, however, no longer ran through CGI or equivalents, a shift that had consequences I was more or less oblivious to for a long time:
Request isolation as enforced by CGI is what forced you into the share-nothing architecture. With this restriction gone, it has become easy (attractive?) to no longer do the right thing™.
Why this change?
- Simplicity: you no longer need to set up a web server to run it.
- Performance: no need to fire up sub-processes to handle requests.
However, maintaining state in the web server between requests has now become simple and often slips in, for convenience or performance reasons. Why store session data in a database and perform two extra queries on every request, when it can be implemented in a single in-memory map? I’ve also seen cases where people would accidentally store something in a global variable or a static field of a class — all of a sudden you share state with all other ongoing requests without knowing it.
As a result, here’s what I’ve seen happen in the context of a big node.js application, but I’m sure the same problems could have happened with Ruby or Python just the same:
- Memory leaks occur and it’s unclear where they come from.
- One request may interfere with another request, resulting in unpredictable behaviour. In node.js specifically (0.6 and earlier), an uncaught exception will kill the entire process, and with that all ongoing HTTP requests.
- Horizontal scaling becomes difficult because of in-process state.
- Server performance inexplicably seems to degrade over time.
Consequently, developers have to do what they dread: monitor the application to figure out the problem, take a heap dump to see what kind of objects are leaked or, when desperate, kill and restart the server from time to time. Yikes!
Once in-process state is introduced, horizontal scaling may not be a trivial exercise. For instance, say that for convenience you decided to maintain a map of user sessions in memory. Now we have to scale up. Of course, if we launch two instances of the app, they will no longer share that map. Now you need a way to keep the map in sync between servers, or to move the map into a database or something like memcached. Not an impossible refactor, but one that would have been avoided if you weren’t able to persist state in-between requests at all, like in the CGI model.
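A small sketch of that session problem, under loud assumptions: `InProcessApp` and `SharedStoreApp` are hypothetical classes, and a plain dictionary stands in for the database or memcached, so this only shows the shape of the refactor and not a real client.

```python
from typing import Dict, Optional


class InProcessApp:
    """Keeps sessions in a map owned by this one server instance."""

    def __init__(self) -> None:
        self.sessions: Dict[str, str] = {}

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


class SharedStoreApp:
    """Keeps no session state itself; everything goes to an external store."""

    def __init__(self, store: Dict[str, str]) -> None:
        # 'store' is a stand-in for a database or memcached connection.
        self.store = store

    def login(self, session_id: str, user: str) -> None:
        self.store[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.store.get(session_id)
```

With two `InProcessApp` instances, a user who logs in on the first is unknown to the second. With two `SharedStoreApp` instances built around the same store, either instance can serve either request, which is what a load balancer needs.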
CGI strictly enforces that you do not keep state in the web server. Any state is maintained during a request only and is completely volatile. During a request you can be as stateful as you like — enjoy! — but after the request, all of that is gone. Any state that needs to be maintained is stored in a system that’s specialised in exactly that: storing data safely — usually a database.
From this strict enforcement you get a lot of benefit:
- No danger of memory leakage over time.
- Virtually infinite horizontal scalability with no extra effort, only limited by external systems such as databases.
- No concurrency problems. Requests cannot affect each other, cannot share state and the crashing of one request handler does not affect others at all.
- Upgrading of the software is much easier, because any new incoming request will automatically use the new version.
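The crash-isolation point is easy to demonstrate. The sketch below is an assumption-laden toy (`HANDLER_SOURCE` and `serve` are invented names, and the "request" is just a path passed on the command line): one handler blows up with an uncaught exception, and the only casualty is that single request.

```python
import subprocess
import sys

# Hypothetical per-request handler with a bug on one particular path.
HANDLER_SOURCE = """
import sys
path = sys.argv[1]
if path == "/crash":
    raise RuntimeError("handler bug")
print(f"ok {path}")
"""


def serve(path: str) -> str:
    """Run one request in its own process and turn a crash into a 500."""
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE, path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Only this request failed; the parent process is untouched.
        return "500 Internal Server Error"
    return result.stdout.strip()
```

Calling `serve("/a")`, then `serve("/crash")`, then `serve("/b")` gives a normal response, an error, and another normal response. Compare that with a single long-lived process, where the same uncaught exception can take every ongoing request down with it.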
But, like any solution, this is not a silver bullet. There’s a clear disadvantage: more per-request overhead (depending on the sophistication of the implementation), since conceptually you have to start up a new process for every request. Therefore you can expect:
- Higher latency (for setting up the request)
- Higher memory usage (less sharing of resources)
- More queries to a database or memcached, because in-process caching is no longer an option.
The question is whether that’s a deal breaker in practice. In addition, I do believe that the impact of many of these can be minimised, both by the environment (e.g. through clever copy-on-write sharing of resources, or by embedding the runtime with direct support for environment isolation) and by the program itself (minimise the amount of setup that needs to be done for every request).
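One way the environment can cut that overhead is to do the expensive setup once and then fork a child per request, so the child inherits the loaded data through the operating system’s copy-on-write pages. The following is a rough, Unix-only sketch using `os.fork`; `TEMPLATES` and `handle_in_child` are made-up names, and in CPython reference counting touches memory pages, so the actual amount of sharing is smaller than the idea suggests.

```python
import os

# Loaded once in the parent, before any request arrives. A stand-in for
# parsed configuration, compiled templates and similar setup work.
TEMPLATES = {"home": "Welcome, {name}!"}


def handle_in_child(name: str) -> str:
    """Fork a child for one request and read its response through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with the parent's memory, no reloading needed.
        try:
            os.close(read_fd)
            TEMPLATES["scratch"] = name  # changes only the child's copy
            body = TEMPLATES["home"].format(name=name)
            os.write(write_fd, body.encode())
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup
    # Parent: close the unused write end, then read until the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        body = reader.read()
    os.waitpid(pid, 0)
    return body
```

The request still runs in isolation: whatever the child writes into `TEMPLATES` is gone when it exits, and the parent’s copy stays as it was loaded.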
You may have terrible memories of Perl or PHP, but the key idea behind CGI, process isolation, was surprisingly brilliant: an idea that seems to have been forgotten, but is just as relevant today.
Worth revisiting, perhaps?