Distributed Systems Week part II: The Client-Server Model

The most common model for distributing a system is the client-server model. The model is fairly simple to explain and use, and its name is quite descriptive. In your distributed system you have one or more servers. These servers provide services to other parts of the system, called clients. When a server is started it first opens a particular port through which clients can access it. It then waits until somebody (a client) attempts to connect to it. When that happens, the server and client exchange some messages and ultimately one of the two closes the connection. This connection takes place using so-called sockets.

*The Server* The simplest version of such a server is non-threaded. That means that multiple connections are handled sequentially; in other words, clients have to queue up and are handled one by one. That's fine if connections are short-lived and not too many clients connect at the same time. However, if you have many clients connecting at the same time, or long-lasting connections, you have to handle connections in parallel. You can do that using threads. Each time a connection is established a new thread is created and the connection is handled by that new thread. The server thread then continues accepting new connections. Because creating threads is an expensive process (in terms of CPU cycles), threads are usually kept in a "pool". When a thread finishes its job, it is kept alive until a new request arrives that it can handle.
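To make the thread pool idea a bit more concrete, here is a minimal sketch in Python. It is an illustration under some assumptions, not a production server: the names `handle`, `serve` and `demo` are made up for this example, the "service" simply upper-cases whatever the client sends, and the server stops after a fixed number of connections so the example terminates.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Serve one client: read a short message, reply in upper case, close."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def serve(listener, pool, max_connections):
    """Accept connections and hand each one to a worker thread from the pool."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        # The accepting thread returns to accept() right away;
        # a pooled worker thread deals with this client.
        pool.submit(handle, conn)


def demo():
    """Start the server on a free local port and talk to it twice."""
    replies = []
    # Port 0 asks the operating system for any free port (an assumption
    # made here so the sketch does not clash with a port already in use).
    with socket.create_server(("127.0.0.1", 0)) as listener, \
            ThreadPoolExecutor(max_workers=4) as pool:
        port = listener.getsockname()[1]
        acceptor = threading.Thread(target=serve, args=(listener, pool, 2))
        acceptor.start()
        for text in (b"hello", b"world"):
            with socket.create_connection(("127.0.0.1", port)) as client:
                client.sendall(text)
                replies.append(client.recv(1024))
        acceptor.join()
    return replies


if __name__ == "__main__":
    print(demo())
```

The worker threads live as long as the pool does, so the cost of creating a thread is paid at most `max_workers` times rather than once per connection.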

*The Client* To connect, the client has to know the server's IP address or hostname and the port to connect to. Once the connection is established the client and server can exchange messages. Depending on the distributed system, a client may connect to multiple servers: one to access the database, one for file services and another for e-mail, for example.
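A client along these lines could look roughly like the sketch below. The function names (`request`, `one_shot_server`, `demo`) and the "echo" reply are assumptions made for the sake of the example; the tiny server is only there so the client has something to connect to.

```python
import socket
import threading


def request(host, port, message):
    """Connect to host:port, send one message and return the full reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message)
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)


def one_shot_server(listener):
    """Hypothetical helper: accept one client and echo its message back."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(b"echo: " + data)


def demo():
    # Port 0 lets the operating system pick a free port for the example.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=one_shot_server, args=(listener,))
        server.start()
        reply = request("127.0.0.1", port, b"ping")
        server.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

A client that talks to several servers would simply call `request` with a different host and port for each service.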

*Stateful and Stateless* Until now we assumed that there's a lasting connection between the client and server, but that's not always the case. When you use a client-server model you can choose between two variants: stateful and stateless. So far we've assumed you use stateful connections: when a client connects, the connection lasts, messages are exchanged and the server keeps the connection in a connection pool. With a stateless model you have only very short connections and no state is remembered between them. It's as if the server has only a short-term memory and no long-term memory.

The most well-known example of this is HTTP (HyperText Transfer Protocol). When your browser requests a webpage it connects to the HTTP server, requests the file, gets it and closes the connection. After the connection is closed the server simply forgets that the browser ever connected to it. With the stateful model the server would have to keep track of the state of every connection that is kept alive. If there are thousands or even millions of connections, the server has to keep track of millions of connections and their states. With the stateless model there's no state stored. Every time the client makes a request it sends all the data the server needs to answer it (not only the path of the requested file, but also authentication data and cookies). That's overhead in the amount of data that has to be sent, but it saves a lot of resources on the server side.
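To see what "sends all the data" means in practice, here is a small sketch that builds the text of such a request. The helper `build_request` and the example host, path and cookie value are made up for illustration; the header names (`Host`, `Authorization`, `Cookie`, `Connection`) are standard HTTP headers.

```python
def build_request(host, path, cookies=None, authorization=None):
    """Build a self-contained HTTP/1.1 GET request as bytes.

    Everything the server needs travels with the request itself,
    because the server remembers nothing from earlier requests.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after this request
    ]
    if authorization:
        lines.append(f"Authorization: {authorization}")
    if cookies:
        cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
        lines.append(f"Cookie: {cookie_header}")
    # A blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_request("example.com", "/index.html", {"session_id": "abc123"}).decode())
```

Two requests for two different pages repeat the same `Host`, `Authorization` and `Cookie` lines; that repetition is the overhead mentioned above.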

*Building State On Stateless* Right now you might be thinking, "Hey, but what about sessions? When I write web applications I can use sessions." That's true. Sessions build state on top of the otherwise stateless HTTP protocol. Most of the time this works by storing a session ID in a cookie. On every HTTP request your browser sends all the cookies for that particular path/domain along with the rest of the request. The session data is then retrieved from the filesystem or database (using the session ID the server received), your web application script executes and changes the session state, and at the end the session data is stored again. This process is repeated on every request.
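The load, change, store cycle described above can be sketched in a few lines of Python. This is a toy model, not how any particular web framework does it: `SESSION_STORE` is a plain dictionary standing in for the filesystem or database, `handle_request` is a hypothetical request handler, and the visit counter is just an example of session state.

```python
import json
import uuid

# Stand-in for the filesystem or database: session ID -> serialized data.
SESSION_STORE = {}


def handle_request(cookies):
    """Handle one request; return (cookies to send back, response body)."""
    session_id = cookies.get("session_id")
    if session_id not in SESSION_STORE:
        # Unknown or missing ID: start a new session.
        session_id = uuid.uuid4().hex
        SESSION_STORE[session_id] = json.dumps({"visits": 0})

    session = json.loads(SESSION_STORE[session_id])   # 1. load the session
    session["visits"] += 1                            # 2. the script changes it
    SESSION_STORE[session_id] = json.dumps(session)   # 3. store it again

    return {"session_id": session_id}, f"Visit number {session['visits']}"


if __name__ == "__main__":
    cookies, body = handle_request({})        # first visit, no cookie yet
    print(body)
    cookies, body = handle_request(cookies)   # browser sends the cookie back
    print(body)
```

Nothing is kept in the handler between calls; the only link between two requests is the session ID that the client sends back.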

Building state on an otherwise stateless protocol is common practice. When we dive down a layer of abstraction we end up at the TCP/IP level. Do you know how TCP/IP works? IP (Internet Protocol) is a very simple protocol. It just deals with packets that are addressed to some IP address and contain data. In what order these packets arrive, or even whether they arrive at all, is not guaranteed. IP packets are stateless: there's no notion of connections or ordering, it's just data addressed to some IP address. What TCP does is add certainty on top of IP. It makes sure that packets arrive, and that they are delivered in the same order as they were sent. It also introduces the concepts of ports and connections. It adds tags (port numbers and sequence numbers) saying "this packet belongs to that particular connection", and both ends of the line keep track of those connections. In other words, TCP builds a stateful protocol on top of the stateless IP.
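The ordering part of this can be illustrated with a toy receiver. This is a heavily simplified sketch and not the real TCP algorithm: there are no retransmissions, windows or checksums, and the names `split_into_packets` and `Receiver` are invented for the example. It only shows how a sequence number plus a little state on the receiving end turns unordered packets back into an ordered stream.

```python
import random


def split_into_packets(data, size):
    """Cut data into (sequence number, payload) pairs; the sequence
    number is the byte offset of the payload within the stream."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


class Receiver:
    """Keeps per-connection state: what arrived and what is expected next."""

    def __init__(self):
        self.next_seq = 0     # offset of the next byte we can deliver
        self.pending = {}     # packets that arrived too early
        self.delivered = b""  # in-order data handed to the application

    def receive(self, seq, payload):
        """Accept one packet and return the next offset we are waiting for."""
        if seq >= self.next_seq:  # ignore duplicates of delivered data
            self.pending[seq] = payload
        # Deliver as much as possible without leaving a gap.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq


if __name__ == "__main__":
    packets = split_into_packets(b"hello world!", 4)
    random.shuffle(packets)  # the network may reorder packets
    receiver = Receiver()
    for seq, payload in packets:
        receiver.receive(seq, payload)
    print(receiver.delivered)
```

The `next_seq` and `pending` fields are exactly the kind of state that plain IP does not have and that both ends of a TCP connection have to maintain.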

This might be confusing. HTTP uses TCP/IP, right? How can HTTP be stateless and TCP/IP stateful? It's because HTTP connections exist only briefly. A browser connects, retrieves data and disconnects. To get the data from the browser to the server and back, the stateful TCP protocol is used. The stateful part of the process takes place at a low level and lasts only very briefly (in the simplest case, a single request). Between two HTTP requests no state is kept, and because you, as a developer, think in terms of requests, the protocol is called stateless.

In the next part of this series I'll talk about a more interesting model for distributed systems: peer-to-peer networks.