Distributed Systems Week part V: Distributed Computing

If you're a big animation film producer, like Pixar, you have a problem when it comes to rendering a movie. As you probably know a film consists of about 20-25 frames per second. In the case of a animation film, each of these have to be rendered, one by one. Rendering such a high-quality frame can take an hour to multiple hours. A quick calculation shows that there are about 20 frames * 60 seconds * 90 minutes = 108,000 frames in such movie (probably more). Now, let's say that it takes one hour to render such a frame, that would mean that it would take at the very least 12 years to render a movie. Obviously, that's a little too long. That's where distributed computing comes in. Companies like Pixar have huge clusters with high-performance machines, rendering frames 24/7. Such a rendering farm usually consist of hunderds or even thousands of machines. The luck a company like Pixar has is that all the computations that has to be done can be distributed very well. Each machine renders a frame at a time. The frames don't depend on each other so the process can be parallelized easily.

Parallelizing means that parts of your big calculation can be executed simultaneously by multiple processors. For example, if you want to calculate 28 * 13 + 812 * 2, how would you calculate that normally?

# 28 * 13 = 364 # 812 * 2 = 1624 # 364 + 1624 = 1988

That requires three steps. Now, let's say we have two processors at our disposal. How can we parallelize this to be calculated faster?

|*Step*|*Processor 1*|*Processor 2*| |1|28 * 13 = 364|812 * 2 = 1624| |2|364 + 1624 = 1988| |

Now we can do it in two steps. This is a simple example of course, but it is how this works in practice. As a programmer of software that has to take advantage of multiple processors or even multiple machines you have to think about how to distribute the calculations over the different processors.

There are extensions for the Fortran programming language available that aid in distributed computing. It makes it possible to do things like this (note that this will be calculated by multiple processors with one shared memory):

REAL, DIMENSION(100,100) :: A, B,C ! Declare three 100x100 matrices, called A, B and C
A = 5      ! Assign 5 to each of the 100x100 elements of A
B = 10     ! Assign 10 to each of the 100x100 elements of B
C = A * B  ! Multiply A and B and put the result in C

The assignment to A on the second line is done in parallel. The values can be written to A using different processors at the same time. Same goes for B. The matrix multiplication is also done in parallel.

There are other ways to do distributed computing aswell. For example by using the "MPI interface":http://www-unix.mcs.anl.gov/mpi/, the Message Passing Interface. The idea behind this is to run the same program on each of your processors (without shared memory) and let them communicate. Each process can act differently depending of the rank they get assigned (a rank is a number each process running on each different processor gets assigned). I won't include an example here, as it requires too much explaining.

There are different scales of distributed computers. It can be multiple processors in one machine, a cluster of machines or even thousands and thousands of computers all over the world, like projects such as "SETI@home":http://setiathome.ssl.berkeley.edu/. In case you want to know more, I recommend reading the "syllabus":http://www.cs.rug.nl/~petkov/teaching/apphps2004.pdf we used when I took my distributed computing class.

Tomorrow's going to be the last day of my Distributed Systems week and it will be about Agents.