Serverless First

Ceci n’est pas une serveur

There are two common “w” questions around serverless. The first one is “WHAT IS IT?” because, if you’re anything like me, your first response to this term was that it sounded more like link-bait than actual technology. I’ll get to my definition of serverless in a bit (because I’m sure what you’re really waiting for is definition #2323 of serverless), but what I really want to get into in this article is the second “w” question: “Why!?” Or in fact, rather than asking “why?” I want to flip the question and have people ask themselves “why not?”

I think even today, for many companies, it’s very valid to make serverless the default choice, and to fall back to this ancient relic “serverful” only when there’s good reason. Ok, let’s not get too crazy: perhaps not actual physical servers, or even virtual machines. I think “container” is about as far as we need to fall back.

But, before I get there, let me give my long-awaited definition of what I think of when I refer to “serverless.” Some use the term as an architectural pattern, but I care less about that aspect. For now I primarily see serverless as a deployment technology for your own code, combined with relying on software-as-a-service for core dependencies, such as databases.

In my mind, a service can be considered serverless when the following things are abstracted away:

  • Physical or VM instances, or clusters thereof
  • Availability zones (and ideally regions, but we’re not there yet)
  • Scaling based on data size and traffic (e.g. operations/second)

Let me clarify through some examples: EC2 instances — definitely not serverless. No further explanation required, I hope.

RDS clusters (except the new “Aurora serverless”) — managed, but not serverless, because I have to choose instance types and cluster sizes, that is: I have to do some planning of the capacity I need.

DynamoDB: serverless. I don’t pick instance types or cluster sizes; my only knobs are read and write capacity, and even those I can auto scale today. Whether there’s 1 or 10,000 instances powering my tables, I don’t care. I don’t have to worry if one of them fails. Instances powering DynamoDB probably fail all the time, but there’s no way I can tell.

And then, the example that created the serverless category: AWS Lambda. First there was me manually FTPing into a server, and uploading my Perl files to cgi-bin. Then there were configuration management tools like Puppet, Chef and Ansible that automated this. Then came containers, and now we zip up our code, upload it to an S3 bucket, and let AWS figure out how it should run, where, and at what scale. Does it run on a giant-ass server with 64 CPUs, or on an iPhone? 1 running instance, even 0, or 1000? I have no idea, and I’m ok with that.

There are many opinions on this concept of Function-as-a-Service, which is the generalization of AWS Lambda. How much logic should one function contain? How should it deal with state, or not at all? Is a function the same as a microservice? Honestly, at this point — I don’t care. In projects we build at OLX, some functions contain a full API server, with a large scope, and some functions simply push a message into SQS, or a CloudWatch log message into logz.io. Whatever works.
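To make the small end of that spectrum concrete, here is a minimal sketch of a function that does nothing but push its incoming event into SQS. It is an illustration, not code from the OLX projects. The `make_handler` and `build_default_handler` helpers and the `QUEUE_URL` environment variable are names made up for this example; the only AWS call assumed is the boto3 SQS client's `send_message` with its `QueueUrl` and `MessageBody` arguments.

```python
import json
import os


def make_handler(send_message, queue_url):
    """Build a Lambda-style handler that forwards each event to a queue.

    send_message is any callable that accepts QueueUrl and MessageBody
    keyword arguments, which is the shape of boto3's SQS send_message.
    """
    def handler(event, context):
        # Serialize the whole event; sort_keys keeps the output stable.
        body = json.dumps(event, sort_keys=True)
        send_message(QueueUrl=queue_url, MessageBody=body)
        return {"statusCode": 202, "body": "queued"}

    return handler


def build_default_handler():
    """Wire the handler to real SQS (needs boto3 and AWS credentials)."""
    # Imported lazily so the sketch can be read and tested without boto3.
    import boto3

    sqs = boto3.client("sqs")
    # QUEUE_URL is a hypothetical environment variable for this sketch.
    return make_handler(sqs.send_message, os.environ["QUEUE_URL"])
```

Passing the sender in as a plain callable keeps the function trivial to test locally with a fake, which matters when there is no server to log into and poke around on.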

Evolution

Back to my “why not?” question. The way I see it, serverless is the next natural step. First we manually managed physical servers in a rack, ensuring they had power, wiring up the switches etc. Then we went to virtual machines, followed by the concept of “the cloud”, which, in its first iteration, was primarily Infrastructure as a Service: you can instantly boot up VMs, without having to worry about capacity or the hardware behind the scenes.

In this transition, did everybody progress from managing physical servers to cloud computing? No. There are use cases and a certain scale where it makes sense to manage servers yourself. Sometimes companies start out in the cloud and then move back to physical servers because the economics work out. Dropbox did this a while ago, for instance. But if you’re a startup today and you tell your potential investors that you want to spend the first 6 months and first $200k on buying and racking up a server infrastructure, you’ll be laughed out of the room. The cloud is the default choice today.

I see serverless as the next transition. Although it is technically still cloud computing, it raises the level of abstraction even further. No longer do you launch some VMs and run MySQL or Cassandra on them; now you just buy a pay-for-what-you-use database service. As a result, when building services and apps using this technology, the amount of time spent on operations plummets. Scaling from 1 to 1 million users with a small team becomes possible, but even apps that will never reach that scale will not have to spend much money on their infrastructure, because costs have become much more granular.

For example, if I wanted to run a small web service with a database and some level of high availability before serverless, I basically would have to deploy at least 5 VMs: 3 to run a high-availability database cluster, and 2 for the web app itself. Even with no traffic this would cost me a hundred dollars per month or more. With serverless I’d probably pay a single digit dollar amount and it’d be purely for the provisioned throughput on DynamoDB. If nobody calls my service, I don’t pay anything on top of that (except perhaps for the storage of my function in S3, which may well set me back a tenth of a cent).
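The arithmetic behind that comparison can be sketched in a few lines. Every price below is a made-up placeholder chosen only to land in the ballpark described above (roughly a hundred dollars versus single digits); none of them are actual AWS rates, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def serverful_monthly_cost(vm_count, hourly_rate):
    """VMs are billed for every hour they run, traffic or not."""
    return vm_count * hourly_rate * HOURS_PER_MONTH


def serverless_monthly_cost(requests, price_per_million, fixed_monthly):
    """A small fixed part (e.g. provisioned throughput) plus per-request cost."""
    return fixed_monthly + requests / 1_000_000 * price_per_million


# Hypothetical prices, for illustration only.
ASSUMED_VM_HOURLY_RATE = 0.03      # dollars per VM per hour
ASSUMED_PRICE_PER_MILLION = 1.00   # dollars per million requests
ASSUMED_FIXED_MONTHLY = 5.00       # dollars for provisioned throughput

idle_vms = serverful_monthly_cost(5, ASSUMED_VM_HOURLY_RATE)
idle_serverless = serverless_monthly_cost(
    0, ASSUMED_PRICE_PER_MILLION, ASSUMED_FIXED_MONTHLY
)

print(f"5 idle VMs:      ${idle_vms:.2f} per month")
print(f"idle serverless: ${idle_serverless:.2f} per month")
```

The point is the shape of the two formulas rather than the numbers: the serverful cost has no term that depends on traffic, so it never drops when nobody shows up.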

And cost is just one aspect. Do you know how much expertise it requires to build a high availability database cluster and debug production issues?

Tooling and productivity

Tooling and other building blocks in the serverless ecosystem are getting better rapidly. At OLX we’re all in on AWS, so any team can mix and match anything that AWS has to offer. In the project I’m working on now, we’re 100% in on serverless and are combining Lambda, API Gateway, SQS, SNS, S3, X-Ray, CloudWatch, and DynamoDB. We tie this all together using the Serverless Framework and CloudFormation. As a result, with just 1–2 people spending time on the backend and infrastructure (none are “qualified” DevOps people), we have a stack that 5 years ago I wouldn’t have dreamed of. And because it’s essentially free to run as we develop it, we spin up a separate instance of our entire infrastructure for every git branch in about 5 minutes, for running our end-to-end tests, for example, and then destroy it again. Because… why not?
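One way a stack-per-branch setup could be wired up is to derive a stage name from the git branch and hand it to the Serverless Framework's `deploy` and `remove` commands through their `--stage` option. The sketch below is an assumption about how such a setup might look, not a description of the actual OLX pipeline; `stage_for_branch`, the 30-character cap, and the `dev` fallback are all invented for the example.

```python
import re

MAX_STAGE_LENGTH = 30  # hypothetical cap to keep generated resource names short


def stage_for_branch(branch):
    """Turn a git branch name into a lowercase, hyphen-separated stage name."""
    # Replace every run of characters outside a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    slug = slug[:MAX_STAGE_LENGTH].rstrip("-")
    # Fall back to a fixed name if nothing usable is left.
    return slug or "dev"


def deploy_command(branch):
    """Command line that would create or update the stack for a branch."""
    return ["serverless", "deploy", "--stage", stage_for_branch(branch)]


def remove_command(branch):
    """Command line that would tear the stack for a branch down again."""
    return ["serverless", "remove", "--stage", stage_for_branch(branch)]
```

A CI job could run the first command before the end-to-end tests and the second one afterwards, for instance via `subprocess.run(deploy_command(branch), check=True)`.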

Yeah, yeah, but seriously, why not?

Yes. We’re lucky in that we’re building a largely greenfield product. What about the rest of the world that needs to interact with or extend existing legacy systems?

We have cases like this too, but until now it has been fairly straightforward to build solutions that require just a small whiff of server to resolve the impediment. For instance, to be able to listen to events on RabbitMQ, we’re building a bridge which pushes events from Rabbit to SNS, for further processing with Lambda. This bridge cannot reasonably be built using Lambda (which isn’t built for long running processes), so we will write something in Go, and deploy it in a container, perhaps running on EC2, perhaps in AWS Fargate.

Anyway, believe me, there are legacy system reasons that make serverless a slightly less friction-free choice.

There are some other weaknesses as well.

If you have a very predictable amount of traffic or compute, serverless will be more expensive than serverful for sure. Generally, if you operate at massive scale, and have the engineering capacity, building and managing an infrastructure like what Facebook, Google, Netflix or Microsoft are running may make economic sense for you.

If you need to develop real-time interactivity via web sockets, for instance, that’s pretty hard to do with Lambda. I’m sure we’ll see solutions to this limitation soon, though; it’s not a fundamental one. And I’m sure there are more use cases like this that are not well supported yet.

If you’re afraid of vendor lock-in, serverless is not a good place to be right now. Almost every single serverless service in e.g. AWS is pretty damn proprietary. There are cross-vendor solutions, but the number of features you can get this way will be significantly smaller than when buying into just one provider.

So absolutely, there are reasons to still build serverful services in 2018, but I predict we will see the same thing happen that happened with managing physical servers in a data center before: there are reasons to still do this, but it’s no longer the default choice. And as the serverless space gets more and more mature, the number of serverful use cases will steadily decrease.

When building the new, or replacing the old, the intuition will be serverless first.