I'm researching ESI and Varnish right now, and came across this useful explanation of the entire process. ESI looks like a pretty powerful weapon in the web-scaling arsenal.
Assembling Pages Last: Edge Caching, ESI and Rails
Tag scaling
Why are we partitioning our dataset and how does it help us to achieve scalability of our application?
It is difficult, if not nearly impossible, to massively scale your data layer when the data is confined to a single server. Whether the limiting factor is hardware cost, or you’ve simply equipped your server with the highest-performing hardware available, you ultimately run into a wall: there are inherent limits to what vertical scaling can achieve. If we instead take our dataset schema, duplicate it onto multiple servers (shards), and split (or partition) the data on the original single server into equal portions distributed among those shards, we can parallelize our query load across them. Adding more shards to the existing set then gives near-limitless scalability potential. The theory behind how this works is simple, but the execution is a fair bit more complex thanks to a series of scale-specific issues one encounters along the way.
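To make the partitioning idea above concrete, here is a minimal sketch of hash-based shard routing. It is not the article's implementation: the names `shard_for` and `ShardedStore` are hypothetical, and in-memory dicts stand in for what would really be separate database servers sharing one schema.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    md5 is used only because it is deterministic across processes,
    unlike Python's built-in hash(), which is salted per run.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical stand-in: each dict plays the role of one shard server."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        # Every write goes to exactly one shard, chosen by the key's hash.
        self.shards[shard_for(key, self.num_shards)][key] = value

    def get(self, key: str):
        # Reads use the same routing, so only one shard is queried.
        return self.shards[shard_for(key, self.num_shards)].get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self.shards]


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})
    print(store.get("user:42"))
    print(store.shard_sizes())  # roughly even split across the 4 shards
```

One of the execution-level complications shows up right away: with plain modulo routing like this, changing the number of shards remaps most keys, so adding a server means moving data around. Real deployments usually need a lookup directory or consistent hashing to soften that.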
Great article, hat tip to Mike Montano from Backtype. We are both going through massive scale growing pains. Articles like this are like a balm for burn victims. ;-)
Brilliant must-read article on DB sharding with MySQL at popular social network Netlog.
Here is a list of projects that could potentially replace a group of relational database shards. Some of these are much more than key-value stores, and aren’t suitable for low-latency data serving, but are interesting nonetheless.
| Name | Language | Fault-tolerance | Persistence | Client Protocol | Data model |
|---|---|---|---|---|---|
| Project Voldemort | Java | partitioned, replicated, read-repair | Pluggable: BerkleyDB, Mysql | Java API | Structured / blob / text |
| Ringo | Erlang | partitioned, replicated, immutable | Custom on-disk (append only log) | HTTP | blob |
| Scalaris | Erlang | partitioned, replicated, paxos | In-memory only | Erlang, Java, HTTP | blob |
| Kai | Erlang | partitioned, replicated? | On-disk Dets file | Memcached | blob |
| Dynomite | Erlang | partitioned, replicated | Pluggable: couch, dets | Custom ascii, Thrift | blob |
| MemcacheDB | C | replication | BerkleyDB | Memcached | blob |
| ThruDB | C++ | Replication | Pluggable: BerkleyDB, Custom, Mysql, S3 | Thrift | Document oriented |
| CouchDB | Erlang | Replication, partitioning? | Custom on-disk | HTTP, json | Document oriented (json) |
| Cassandra | Java | Replication, partitioning | Custom on-disk | Thrift | Bigtable meets Dynamo |
| HBase | Java | Replication, partitioning | Custom on-disk | Custom API, Thrift, Rest | Bigtable |
| Hypertable | C++ | Replication, partitioning | Custom on-disk | Thrift, other | Bigtable |
Awesome review of a bunch of possible key-value stores. Tokyo Tyrant is conspicuously missing.