When you’re developing applications you sometimes want to do multiple things at the same time. For example, while you do some kind of batch processing in the background, your GUI should still respond. You can do this using threads. Threads are lightweight processes that run concurrently. Threads can be very useful, yet they can complicate things a lot. What follows is a very introductory entry about concurrency.

As an example I’ll use a problem I (could have) had with KeyTopic. KeyTopic is a Java servlet application. Servlets run in a so-called servlet container, such as Resin or Tomcat. Unlike many web development environments such as PHP, servlet programming makes it possible to keep data persistent in memory. This means that you can share objects between requests. In KeyTopic, boards and categories are kept in memory and shared among requests, for performance reasons. The category and board objects are mutable (fancy word meaning that they can be changed) by any request (thread). But what happens when two requests modify a certain board object at the very same time (for example to increase the number of posts in a board)? Well, this can very well go wrong. This is called a synchronization problem.

I’ll give a simple example. Let’s assume a counter, x, whose value is increased by one by each of two processes concurrently. The initial value of x is 0. At the end the value should, obviously, be 2. Now let’s look, in semi machine code instructions, at how this is executed and what could happen. (Remember that a CPU has a couple of registers, which are very fast pieces of memory that can be used for storing intermediate results. I’ll name them r0, r1, r2…):

Initially: x = 0

|*Process 1*|*Process 2*|
|load x into r0| |
| |load x into r1|
|r0 = r0 + 1| |
| |r1 = r1 + 1|
|store r0 into x | |
| | store r1 into x|

We end up with x = 1. Hum? Why did that go wrong? Because two processes manipulated the same variable at the same time: both loaded the old value 0 before either had stored its result, so the second store simply overwrote the first. Java (like many platforms) offers solutions for this. One of them is to mark a method “synchronized”, which means that only one thread at a time can execute it on a given object; a second thread will just have to wait until the first is done. So when you have a method to increase the post count of a board and mark it synchronized, everything goes OK, provided that it’s the only method that can modify the post count value.
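To make that a bit more concrete, here is a minimal sketch of what such a synchronized method could look like. The class and method names (Board, incrementPostCount, getPostCount) are made up for this example and are not taken from KeyTopic’s actual code.

```java
// Illustrative sketch only: Board, incrementPostCount and getPostCount
// are hypothetical names, not KeyTopic's real classes or methods.
class SynchronizedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        Board board = new Board();

        // Each "request" bumps the post count 100,000 times.
        Runnable poster = () -> {
            for (int i = 0; i < 100_000; i++) {
                board.incrementPostCount();
            }
        };

        Thread first = new Thread(poster);
        Thread second = new Thread(poster);
        first.start();
        second.start();

        // Wait for both threads to finish before reading the result.
        first.join();
        second.join();

        System.out.println(board.getPostCount()); // prints 200000
    }
}

class Board {
    private int postCount = 0;

    // Only one thread at a time can run this method on the same Board,
    // so the load / add / store sequence can no longer interleave.
    public synchronized void incrementPostCount() {
        postCount = postCount + 1;
    }

    // Reads are synchronized as well, so a reader sees the latest value.
    public synchronized int getPostCount() {
        return postCount;
    }
}
```

If you remove the synchronized keyword from incrementPostCount, the printed total will often come out lower than 200000, which is the lost update from the table happening over and over.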

When I developed YaBB I had similar problems. We used text files to store our data in. However, sometimes two Perl processes (different requests) wrote to a certain file at the same time, leaving the file messed up. On Unix systems this problem can be solved using flock. Flock locks a file (File LOCK) for either writing or reading. If you’re going to write to a file, you have to lock it for writing. Other processes that want to read or write to the file will have to wait until the file is unlocked. If you’re going to read a file you lock it for reading; other processes that want to read will be able to, but processes that want to write will have to wait until the reading lock is removed. Keep in mind that these locks are advisory: they only work if every process that touches the file actually asks for the lock. It’s a simple, yet effective method. Note that this also shows that flat file databases don’t scale so well. If you get a lot of requests coming in at once, processes end up lining up to access certain files. This is particularly bad when you write to those files a lot.
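YaBB did this in Perl with flock, but to stay with Java here is a rough analogue using java.nio’s FileChannel locks. This is not flock itself, and the helper names (LockedFile, appendLine, readAll) are invented for this sketch. An exclusive lock is taken for writing and a shared lock for reading, which is the same idea as described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: LockedFile, appendLine and readAll are hypothetical helpers.
class FileLockDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("posts", ".txt");
        LockedFile.appendLine(file, "first post");
        LockedFile.appendLine(file, "second post");
        // Prints "first post" and "second post" on separate lines.
        System.out.print(LockedFile.readAll(file));
    }
}

class LockedFile {
    // Appends one line while holding an exclusive (write) lock.
    // Java file locks are held per JVM, so they coordinate separate
    // processes, not threads inside the same JVM.
    static void appendLine(Path file, String line) {
        try (FileChannel channel = FileChannel.open(file,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE,
                 StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) { // blocks until the lock is free
            ByteBuffer data = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
            while (data.hasRemaining()) {
                channel.write(data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file while holding a shared (read) lock, so other
    // readers can proceed but writers have to wait.
    static String readAll(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
             FileLock lock = channel.lock(0, Long.MAX_VALUE, true)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether such a lock actually stops a process that doesn’t ask for it depends on the operating system, so, as with flock, every program that touches the file should go through the same locking code.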

Those are only the simplest of examples in the world of concurrency. The most important thing is that you’re aware of it: if you’re accessing shared variables from multiple threads or processes, beware of concurrency issues. If you’re interested, there are many books about the subject. Personally I don’t find the concurrency field that interesting, partly because we had the most annoying teacher teaching it. It’s also the only class that I quit upfront (luckily it was optional). Therefore I don’t have particular book recommendations, but I’m sure a search for “concurrent programming” on Amazon will come up with good results.