The young “nosql” community met recently in San Francisco. A solid introduction was given about how distributed, non relational databases work. Moreover, they give an overview of the various projects out there.
Presentation slides and videos
Intro session – Todd Lipcon, Cloudera (slides, video1, video2)
Voldemort – Jay Kreps, Linkedin (slides, video1, video2)
Cassandra – Avinash Lakshman, Facebook (slides, video)
Dynomite – Cliff Moon, Powerset (slides, video)
HBase – Ryan Rawson, Stumbleupon (slides, video)
Hypertable – Doug Judd, Zvents (slides, video1, video2)
CouchDB – Chris Anderson, couch.io (slides, video1, video2)
Lightcloud is a distributed and persistent key-value database which scales out horizontally by adding new nodes.
Explanation from their website:
- Built on Tokyo Tyrant. One of the fastest key-value databases [benchmark]. Tokyo Tyrant has been in development for many years and is used in production by Plurk.com, mixi.jp and scribd.com (to name a few)…
- Great performance (comparable to memcached!)
- Can store millions of keys on very few servers – tested in production
- Scale out by just adding nodes
- Nodes are replicated via master-master replication. Automatic failover and load balancing is supported from the start
- Ability to script and extend using Lua. Included extensions are incr and a fixed list
- Hot backups and restore: Take backups and restore servers without shutting them down
- LightCloud manager can control nodes, take backups and give you a status on how your nodes are doing
- Very small foot print (lightcloud client is around ~500 lines and manager about ~400)
- Python only, but LightCloud should be easy to port to other languages
According to the blog entry of Richard Jones, the most mature distributed stores are:
Despite being the winner of Richard Jones’ comparison, Voldemort presents several drawbacks:
- JVM-only API
- purely key-value, no structured storage
- no dynamic group membership
Dynomite is currently used in production at powerset.com to serve images and supports thrift clients, dynamic adding of nodes. Unlike Voldemort it keeps all of the read repair, replication, and concurrency in the server, keeping the client code as simple as possible. Although this project seems very interesting, I would like to see operational reports from Microsoft about its scalabiliy and performance.
Currently, Scalaris has no disk persistence: the system works only in-memory. They reported the following performance: 2500 transactions per second with 16 servers. Voldemort reports much better performance.
An issue with Scalaris as mentionned by Werner Vogels is that under realistic failures scenarios, even with 3 phase paxos, commit consitency can only be guaranteed by not taking writes. Hence, Scalaris could not be used in a sales-oriented environment like Amazon where write availability is more important than consistency.
Cassandra is not part of Richard Jones’ winner list due to the fact that the community and documentation about it is not much present.
Cassandra was accepted into Apache Incubator on 02.01.2009. I think that the community around Cassandra will quickly grow during the next months. Moreover, the mailing lists are already more active that those of Voldemort. Here is a summary of its design choices and architecture.
Cassandra has some advantages over Voldemort:
- provides structured storage (not only key-value as Voldemort)
- Voldemort does not support dynamic group membership: one may not
be able to add nodes to the cluster without bringing the cluster down. Cassandra uses dynamic group membership algorithm based on Gossip.
- Thrift as a client protocol, and not only JVM API. Thrift was also accepted into Apache Incubator.
- Hadoop integration