Tag persistent storage

11 distributed key-value stores, as reviewed by last.fm engineers

Here is a list of projects that could potentially replace a group of relational database shards. Some of these are much more than key-value stores, and aren’t suitable for low-latency data serving, but are interesting nonetheless.

Name | Language | Fault-tolerance | Persistence | Client Protocol | Data model
--- | --- | --- | --- | --- | ---
Project Voldemort | Java | partitioned, replicated, read-repair | Pluggable: BerkeleyDB, MySQL | Java API | Structured / blob / text
Ringo | Erlang | partitioned, replicated, immutable | Custom on-disk (append only log) | HTTP | blob
Scalaris | Erlang | partitioned, replicated, paxos | In-memory only | Erlang, Java, HTTP | blob
Kai | Erlang | partitioned, replicated? | On-disk Dets file | Memcached | blob
Dynomite | Erlang | partitioned, replicated | Pluggable: couch, dets | Custom ascii, Thrift | blob
MemcacheDB | C | replication | BerkeleyDB | Memcached | blob
ThruDB | C++ | Replication | Pluggable: BerkeleyDB, Custom, MySQL, S3 | Thrift | Document oriented
CouchDB | Erlang | Replication, partitioning? | Custom on-disk | HTTP, json | Document oriented (json)
Cassandra | Java | Replication, partitioning | Custom on-disk | Thrift | Bigtable meets Dynamo
HBase | Java | Replication, partitioning | Custom on-disk | Custom API, Thrift, Rest | Bigtable
Hypertable | C++ | Replication, partitioning | Custom on-disk | Thrift, other | Bigtable

Awesome review of a bunch of possible key-value stores. Tokyo Tyrant is conspicuously missing.

Currently contemplating Tokyo Tyrant as a distributed key/value hash DB... or MemcacheDB?

It's all the rage to use non-SQL storage software these days. Memcache is great for caching, but what happens when your data falls out of cache? Enter Tokyo Tyrant / MemcacheDB.
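To make the "falls out of cache" problem concrete, here is a minimal sketch in Python. It does not talk to memcached, Tokyo Tyrant, or MemcacheDB: `BoundedCache` is a hypothetical stand-in for a cache with limited memory, a plain dict stands in for the persistent store, and `read_through` is an illustrative helper name.

```python
from collections import OrderedDict


class BoundedCache:
    """Tiny LRU cache standing in for a cache node with limited memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key


def read_through(cache, store, key):
    """Return the value for key, refilling the cache from the store on a miss."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.set(key, value)
    return value


store = {}                        # assumption: stand-in for a persistent store
cache = BoundedCache(capacity=2)  # assumption: room for only two keys

for key, value in [("a", "1"), ("b", "2"), ("c", "3")]:
    store[key] = value            # durable write
    cache.set(key, value)         # cache fill; "a" is evicted when "c" arrives

evicted_from_cache = cache.get("a") is None
recovered = read_through(cache, store, "a")
```

With only a cache, the value for "a" would simply be gone after eviction. With a persistent store behind it, the miss is recoverable.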

Sometimes all you really need is key-value pairs. Is Tokyo Tyrant the answer? It looks like it's being used under heavy load at a few production sites, though MemcacheDB is purportedly being used by Digg.

Interestingly, I've found very little online comparing MemcacheDB and Tokyo Tyrant. Does anyone have some info to share?

MemcacheDB looks like a great place to store denormalized subscriptions.

Imagine Kevin Rose, the founder of Digg, who at the time of this presentation had 40,000 followers. If Kevin diggs just once a day that's 40,000 writes. As the most active diggers are the most followed it becomes a huge performance bottleneck. Two problems appear.

You can't update 40,000 follower accounts at once. Fortunately the queuing system we talked about earlier takes care of that.

The second problem is the huge number of writes that happen. Digg has a write problem. If the average user has 100 followers that’s 300 million diggs a day. That's 3,000 writes per second, 7GB of storage per day, and 5TB of data spread across 50 to 60 servers.
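The figures above can be sanity-checked with a little arithmetic. This is a back-of-the-envelope sketch in Python, assuming writes are spread evenly over the day and that 7GB means decimal gigabytes. The variable names are illustrative, and the derived values are averages, not numbers from the presentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

fanned_out_writes_per_day = 300_000_000  # figure quoted in the text
average_followers = 100                  # figure quoted in the text

# Implied number of original diggs per day, before fan-out to followers.
diggs_per_day = fanned_out_writes_per_day // average_followers

# Average sustained write rate if the load were perfectly even.
writes_per_second = fanned_out_writes_per_day / SECONDS_PER_DAY

# Average size of one stored write, derived from 7 GB per day.
bytes_per_write = 7 * 10**9 / fanned_out_writes_per_day

print(f"{diggs_per_day:,} diggs/day before fan-out")
print(f"{writes_per_second:,.0f} writes/second on average")
print(f"{bytes_per_write:.1f} bytes per write on average")
```

The average comes out a little under 3,500 writes per second, in line with the rounded figure of 3,000. Peak load would be higher than the average.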

With such a heavy write load MySQL wasn’t going to work for Digg. That’s where MemcacheDB comes in. In initial tests on a laptop MemcacheDB was able to handle 15,000 writes a second. MemcacheDB's own benchmark shows it capable of 23,000 writes/second and 64,000 reads/second. At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge.

Posterous has this exact problem when handling subscriptions. We currently denormalize new posts for subscription lists using after_save hooks that translate new posts into new notification records on the backend. But if we can plug this into MemcacheDB instead, needless DB writes can be avoided.
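As a rough illustration of the idea, here is a sketch in Python of fan-out on write into a key-value store. It is not Posterous code and does not use MemcacheDB's API: the `SubscriptionFeeds` class, the `feed:<id>` key scheme, and the dict standing in for the store are all assumptions made for the example.

```python
from collections import defaultdict


class SubscriptionFeeds:
    """Fan-out-on-write sketch: one key per subscriber holding a list of post IDs."""

    def __init__(self, store=None):
        # Assumption: a plain dict stands in for the key-value store.
        self.store = store if store is not None else {}
        self.subscribers = defaultdict(set)  # author_id -> subscriber IDs

    def subscribe(self, subscriber_id, author_id):
        self.subscribers[author_id].add(subscriber_id)

    def publish(self, author_id, post_id):
        """Roughly what a save hook would do: append the post to each subscriber's feed."""
        for subscriber_id in self.subscribers[author_id]:
            key = f"feed:{subscriber_id}"  # hypothetical key scheme
            feed = self.store.get(key, [])
            self.store[key] = feed + [post_id]

    def feed(self, subscriber_id):
        return self.store.get(f"feed:{subscriber_id}", [])


feeds = SubscriptionFeeds()
feeds.subscribe("alice", "kevin")
feeds.subscribe("bob", "kevin")
feeds.publish("kevin", 101)
feeds.publish("kevin", 102)

alice_feed = feeds.feed("alice")
bob_feed = feeds.feed("bob")
```

Each publish still costs one write per subscriber, so this moves the write load from the relational database to the key-value store. It does not reduce the number of writes.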