The Share-Nothing Architecture

Although I used to think otherwise, PHP scales very well, and probably more easily than other platforms like Java, ASP.NET and Perl (using mod_perl). I’ll tell you why, but first I’ll explain what scalability is and how it differs from speed, as the two are often mixed up.

People tend to say that because their product is really fast, it scales well. That’s wrong. Scalability is “the capability of a system to increase performance under an increased load when resources (typically hardware) are added.” (source: Wikipedia) In other words, will you be able to handle a Slashdot-sized load when you add another processor and double the memory? Will you be able to handle roughly twice as many concurrent requests when you set up a second server? Will your application even work when you duplicate your servers? So even if your application’s pages take 2 seconds to load under no load, and your competitor’s pages load in half a second, that does not imply that your competitor’s application scales better.

When I chose Java to develop KeyTopic in, I did so because I could finally do some interesting things with caching data. The cool thing about servlet containers, which Java web applications live in, is that it’s very possible to keep objects persistent in memory across requests. So it’s possible to have, say, an in-memory counter that is increased by 1 on every request. How fast and efficient is that? ASP.NET and mod_perl (and probably others) have similar features. Using these techniques I could just cache the data of all categories and boards in memory, so I could lower the number of queries sent to the database. There’s one problem with this though, and you may guess what it is… Yes, indeed, it’s scalability.
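To make the in-memory counter concrete, here is a minimal sketch in Python rather than Java. The class and method names (CounterApp, handle_request) are made up for illustration and are not a servlet API; the only point is that the object outlives any single request, so its state carries over from one request to the next.

```python
import threading


class CounterApp:
    """Stands in for one long-lived application process (hypothetical name)."""

    def __init__(self):
        self._hits = 0
        # Requests may be served by several threads, so guard the counter.
        self._lock = threading.Lock()

    def handle_request(self) -> int:
        """Increase the in-memory counter by 1 and return the new value."""
        with self._lock:
            self._hits += 1
            return self._hits


app = CounterApp()             # created once, when the process starts
first = app.handle_request()   # first request
second = app.handle_request()  # second request sees the state of the first
print(first, second)           # 1 2
```

No database is touched at all, which is exactly why this is so fast, and also why the counter lives in this one process only.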

A big scalability problem with caching data is called the cache-coherence problem. What if one server can’t handle the traffic anymore and a second one is added? This can simply be done by putting a so-called load balancer in front as the main entry point. The load balancer spreads the requests over all servers that are available at that time. This way the load is balanced.
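As a rough sketch of what “spreading the requests” can mean, here is a toy round-robin balancer in Python. Round-robin is my assumption here; real load balancers offer other strategies too, and the names (RoundRobinBalancer, route, web1…web3) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in turn."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def route(self, request):
        # The request content is ignored: plain round-robin only looks
        # at whose turn it is.
        return next(self._servers)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.route(f"request-{i}") for i in range(6)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2', 'web3']
```

Note that two consecutive requests from the same visitor end up on different servers, which is what makes per-server state a problem.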

Let’s look at the ordinary request flow in a one-server situation:

Now add another two servers and put a load balancer in front:

Each server has its own cache. That’s fine if you only have one server, but with two or more, things can get nasty. When somebody posts via the first load-balanced server and the number of posts on a board increases, only the first server’s cache knows about it. All people whose requests are handled by the second and third server still get sent the old post count: the caches are not equal, they’re not coherent. In this scenario this is not disastrous and could be solved by a simple cache flush every, say, minute (or more, depending on the traffic), yet it can cause serious problems on many other occasions. For example, imagine a networked file system. It would be very efficient to locally cache files that are accessed a lot, but what if somebody else on the network changes those files? You would still be using or editing the old ones. This can only work well with systems where files are only read, not written, or at least not often. What seemed to be a performance gain turns into a major scalability issue when used inappropriately.
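The post-count scenario can be simulated in a few lines of Python. This is only a sketch under simplifying assumptions: the Database and CachingServer classes, the board name "general" and the starting count of 10 are all invented for the example.

```python
class Database:
    """Stands in for the single shared database (hypothetical)."""

    def __init__(self):
        self.post_counts = {"general": 10}


class CachingServer:
    """A web server that keeps its own private in-memory cache."""

    def __init__(self, db):
        self._db = db
        self._cache = {}

    def post_count(self, board):
        # Only hit the database on a cache miss.
        if board not in self._cache:
            self._cache[board] = self._db.post_counts[board]
        return self._cache[board]

    def add_post(self, board):
        # The write reaches the database and this server's cache only.
        self._db.post_counts[board] += 1
        self._cache[board] = self._db.post_counts[board]

    def flush(self):
        self._cache.clear()


db = Database()
servers = [CachingServer(db) for _ in range(3)]

before = [s.post_count("general") for s in servers]  # warms all caches
servers[0].add_post("general")                       # post lands on server 1
stale = [s.post_count("general") for s in servers]

for s in servers:
    s.flush()                                        # the periodic flush
fresh = [s.post_count("general") for s in servers]

print(before, stale, fresh)  # [10, 10, 10] [11, 10, 10] [11, 11, 11]
```

Between the post and the flush, two out of three visitors see the old count, even though the database itself is correct the whole time.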

The solution is the shared-nothing architecture, which I first heard of in an interview with Rasmus Lerdorf (the initial creator of PHP). What does this architecture involve? Simply not sharing anything. No shared data, at least within the webserver. Sessions are shared, but they are shared through the filesystem or a database, which can easily be scaled by using a networked filesystem (NFS, SMB) or a database server. As no cache whatsoever exists in a server, you can duplicate it virtually as often as you like. At a certain point the database server might become a bottleneck of course, but databases usually scale very well. MySQL might not, but Oracle scales to unimaginable proportions, so as long as you have enough money, you can scale as far as you want (virtually).
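Here is a small Python sketch of the idea, assuming a page-view counter kept in the session. SessionStore stands in for the shared filesystem or database, and all names are hypothetical. The servers keep nothing between requests: every request loads the session from the store and writes it back.

```python
class SessionStore:
    """Stands in for the shared filesystem or database (hypothetical)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so servers never hold a reference to shared state.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class StatelessServer:
    """A web server that keeps no data of its own between requests."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle(self, session_id):
        session = self._store.load(session_id)          # fetch on every request
        session["views"] = session.get("views", 0) + 1
        self._store.save(session_id, session)           # write back right away
        return session["views"]


store = SessionStore()
servers = [StatelessServer(f"web{i}", store) for i in (1, 2, 3)]

# The same visitor is routed to a different server on each request.
views = [server.handle("alice") for server in servers]
print(views)  # [1, 2, 3]
```

Whichever server the load balancer picks, the visitor sees a consistent count, so servers can be added or removed freely. The sketch ignores concurrent requests for the same session; a real store would need locking or atomic updates for that.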

So, PHP scales better than Java, ASP.NET and mod_perl? No, not necessarily, but PHP scales by default. PHP allows no sharing of data, no caching within the process (unless you explicitly enable those features). Java, ASP.NET and mod_perl do allow sharing of data and so you have to make sure that you know when you use it and how to handle it in your situation. In other words: It’s easy to write non-scalable software on those platforms.

Got something to say?
  1. Perhaps you’re not being really fair to ASP.NET (don’t know much about Java or mod_perl and the like). Simply setting e.g. the sessionState mode to SQLServer in web.config would put ASP.NET sessions in SQL Server. Other options are InProc (on the webserver, in memory) and StateServer, which lets you have the in-memory cache on another server running the ASP.NET State Service.

    http://msdn.microsoft.com/library/en-us/dnaspp/html/aspnetscal.asp is a good read for ASP.NET scalability.

  3. Zef says:

    I vaguely remember there’s also some other stuff, a Hashmap that you could use to store data in. Can’t really remember the name, it could even be Cache. But if there’s not, then ok, ASP.NET’s share-nothing as well.

  5. Zef says:

    Yes it’s a System.Web.UI.Page member variable called Cache.

  7. bws says:

    Having a cache does not mean that it isn’t scalable. Just create a new database table which contains a version number for every cache a server could have. When something changes, like a new category is added, that version number on the database server of the category cache is incremented, so all http servers know that they should recache stuff. Keeping a cache on the http servers is more efficient, and querying just ~20 values from a table (possibly with a join) is peanuts. I don’t see the scalability problem at all, just be creative :p

  9. Zef says:

    That’s what I’m saying. You can cache and scale, but you have to be creative. With share-nothing you need no creativity, which is very good because its users… nevermind ;)

  11. bws says:

    Well.. If you’ve got a non-caching slow system, you are forced to scale anyway :p Btw, I’m quite disappointed in PHP that it doesn’t support caching :(. Though I can understand it, for a lot of webservers still run the CGI version (which can appear to be a normal apache handler php install). Just hope AspX will be widely available soon (mod_mono). .Net just is 10 times faster than PHP, and the complicated component model doesn’t decrease the speed that much :p

  13. Ted says:

    Shared-nothing is nice, but if you’re not caching anything your database is going to become a bottleneck faster:

    No cache: hit the database for all reads and all writes
    Cache: hit the database for few reads and all writes

    “When somebody posts at the first load-balanced server and the number of posts on a board is increased only the first server’s cache knows about this. All people whose requests are handled by the second and third server still get sent the old post count: The caches are not equal, they’re not coherent.”

    Memcached! Favorite of LiveJournal, Slashdot, Wikipedia. “The memcached server and clients work together to implement one global cache across as many machines as you have.” Client APIs in PHP, Python, Java, Ruby, Perl, and C. http://www.danga.com/memcached/

    Here’s an interesting/informative/vaguely relevant look at what happened to LiveJournal when they started scaling way up using MySQL: http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf

    I found the LJ slides and subsequently Memcached on Jeremy Zawodny’s blog a while back. He works at Yahoo, and sometimes has interesting stuff to say: http://jeremy.zawodny.com/blog/

  15. Alex says:

    Author is comparing heavy-thing to yellow-thing. There is no such thing as “Java scalability” BUT there are a few Java-based solutions that are much more scalable than using a database for that purpose. Also, I wouldn’t call what is proposed “share-nothing” as, in fact, a database is what is shared. And that is not the fastest way to share information.
