Review of distributed key-value stores

According to the blog entry of Richard Jones, the most mature distributed stores are:

Project-Voldemort

Despite being the winner of Richard Jones’ comparison, Voldemort presents several drawbacks:

  • JVM-only API
  • purely key-value, no structured storage
  • no dynamic group membership

Dynomite

Dynomite is currently used in production at powerset.com to serve images and supports thrift clients, dynamic adding of nodes. Unlike Voldemort it keeps all of the read repair, replication, and concurrency in the server, keeping the client code as simple as possible.  Although this project seems very interesting, I would like to see operational reports from Microsoft about its scalabiliy and performance.

Scalaris

Currently, Scalaris has no disk persistence: the system works only in-memory. They reported the following performance: 2500 transactions per second with 16 servers. Voldemort reports much better performance.

An issue with Scalaris as mentionned by Werner Vogels is that under realistic failures scenarios, even with 3 phase paxos, commit consitency can only be guaranteed by not taking writes. Hence, Scalaris could not be used in a sales-oriented environment like Amazon where write availability is more important than consistency.

Cassandra

Cassandra is not part of Richard Jones’ winner list due to the fact that the community and documentation about it is not much present.

Cassandra was accepted into Apache Incubator on 02.01.2009.  I think that the community around Cassandra will quickly grow during the next months. Moreover, the mailing lists are already more active that those of Voldemort. Here is a summary of its design choices and architecture.

Cassandra has some advantages over Voldemort:

  • provides structured storage (not only key-value as Voldemort)
  • Voldemort does not support dynamic group membership: one may not
    be able to add nodes to the cluster without bringing the cluster down. Cassandra uses dynamic group membership algorithm based on Gossip.
  • Thrift as a client protocol, and not only JVM API. Thrift was also accepted into Apache Incubator.
  • Hadoop integration

P2P NAT and firewall traversal investigations

Introduction

Introduction paper: NAT Traversal Techniques and Peer-to-Peer Applications from Zhou Hu

Some excerpts:

The main reason of NAT’s birth is the short-term solution of IPv4 address depletion. In Client-Server network, NATed environment is not a big problem for private addressed clients since they almost always get service by initiating connections with dedicated servers on public addressed Internet.
Peer-to-Peer network is different with Client-Server network.In Peer-to-Peer network peers have equal positions without classification of client and server and peers are directly connected by other peers. They act both as client and server simultaneously. In NATed environment, general firewall/NAT role does not allow incoming connection to private addressed hosts unless private hosts initiate the connection at first or NAT is specifically configured. The built-in privacy and security benefits of NAT are private addressed hosts hiding. On the other hand, this is a trouble because it is hard to locate and communicate with the private hosts behind a NAT gateway. How two private addressed hosts behind NATs could get to know each other in the very beginning of the Peer-to-Peer communication?

nat.png

Diverse NAT Traversal Techniques offer a variety of NAT devices with transparent traversal abilities to keep the end-to-end connection virtually. Some of them are NAT gateway optimized and plugged techniques such as Universal Plug and Play (UPnP), Application Lever Gateway (ALG). Some of them are fall-back (make use of Client-Server model) approaches, by which they use a relay server or introducer server on either side of NAT gateway or both sides, such as STUN and TCP/UDP hole punching. TU hole punching is the most robust and practical NAT traversal technique. It makes use of a rendezvous server as an introducer for clients behind NAT to get know each other’s host endpoints (IP address and TU port).

Common techniques:

  • Universal Plug and Play (UPnP)
  • Simple Traversal UDP Through Network Address Translators (STUN), STUNT
  • Application Level Gateway (ALG)
  • UDP/TCP hole punching

Simple Traversal UDP Through Network Address Translators (STUN)

The advantage of STUN is it does not require any changes on NAT devices. Clients could learn NAT devices automatically. On the other side, STUN is just a short term solution. It does not work with symmetric NAT which is commonly used by large corporate users. IETF proposes TURN to solve this problem. STUN requires client application upgraded to support STUN and an additional STUN server residing in public Internet. Those reasons make STUN deployment slow and unpopular.

Application Level Gateway (ALG)

Application Level Gateway always resides in NAT/Firewall devices to modify payload transparently thus work together with NAT to offer transparent routing for packets.
ALG commonly requires replacement or modification of NAT/Firewall device and configuration. This restricts the deployment of ALG technology

UDP and TCP hole punching

UDP and TCP hold punching is general purpose, robust technique to establish peer-to-peer communication in Peer-to-Peer network. This technique does not require the application to know the topology of network and presence of NAT devices. It does not modify NAT/Firewall configuration. The main idea of hole-punching is to have a relaying server which could be reached by clients both or either behind NAT.

Relaying

Relaying always works
as long as both clients can connect to the server.
Its disadvantages are that
it consumes the server’s processing power and network bandwidth,
and communication latency between the peering clients
is likely increased even if the server is well-connected.
Nevertheless,
since there is no more efficient technique
that works reliably on all existing NATs,
relaying is a useful fall-back strategy
if maximum robustness is desired.
The TURN protocol
defines a method of implementing relaying
in a relatively secure fashion.


Existing framework


There is several projects aiming at solving the NAT traversal issue:

  • NATTrav: no Java source code available, TCP only, hole punching
  • STUNT: most reliable hole-punching solution (reliablility > 85%), hole punching
  • JXTA: too complex just for NAT problem, TCP only, relay
  • Skype: has almost solved the problem, but no source code is available, hole punching + relay


Resources

Some interesting links: