Distributed Hash Table Part Two

Simulating a DHT using consistent hashing and a ring of lightweight Erlang processes.

In part one we introduced the concept of a distributed hash table and built a simple process ring. Here in part two we'll simulate a DHT with processes running in the same node. Finally, in part three we'll extend the process DHT into a full-blown, multi-node DHT.

Complete code for this tutorial is available in the GitHub repository.

Why Simulate a DHT?

Creating a full-blown distributed hash table that works across physical machines is a bit daunting and requires careful attention to lots of nit-picky details. For this tutorial series, it makes sense to start off by focusing on the theory that powers the DHT, then extend a working system in small steps until it becomes a full-fledged DHT. The previous tutorial covered the theory we'll need, so here we'll get that theory working as a running system in a single node. Once we reach that point, essentially a simulation of a network of DHT nodes, it will only require some grunt work to convert it into a full-scale distributed system that runs on a cluster.

Simulating a DHT

As with a normal key/value store, this DHT will have a simple client interface: store(Key, Value) and fetch(Key). Unlike a normal key/value store, however, notice that consistent hashing allows (requires!) the client to compute the hash. Here are the implementations of those functions:
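
Something along these lines, as a minimal sketch; the hash/1 helper, the 2^32 identifier space, and the exact message shapes are assumptions for illustration, not the repository's exact code:

    %% Client API sketch: hash the key locally, ask the ring which server
    %% owns that hash (key_lookup/1 is shown later), then talk to that
    %% server directly.
    store(Key, Value) ->
        HashedKey = hash(Key),
        {ok, Server} = key_lookup(HashedKey),
        gen_server:call(Server, {store, HashedKey, Value}).

    fetch(Key) ->
        HashedKey = hash(Key),
        {ok, Server} = key_lookup(HashedKey),
        gen_server:call(Server, {fetch, HashedKey}).

    %% Map an arbitrary term onto the ring's identifier space (assumed
    %% here to be 0..2^32 - 1, using erlang:phash2/2).
    hash(Key) ->
        erlang:phash2(Key, 1 bsl 32).

Note that gen_server:call/2 is only used once the owning server is known; finding that server is the interesting, asynchronous part covered next.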

As I alluded to in the previous part of this tutorial, the critical operation in both of those functions is locating the server responsible for storing a given key; this occurs in key_lookup/1. Since key lookup is the most crucial part of this DHT, let's take a close look at that feature. Once you understand it, building the rest of the DHT is just straightforward business logic.

Locating Keys on a Consistent Hash Ring

To find the node responsible for storing a given key, we "walk" clockwise from the key's location around the hash ring, and the first server encountered is the node we're looking for. More precisely, we need to find the first server whose ID is greater than or equal to the key in question (wrapping around past zero to the lowest-ID server if no such server exists).

In practice a client can send a key lookup request to any node in the DHT, but in this simulation we'll register the first node in the ring with the name dht_root_node so that all requests can be sent to a known process. Also note that there is no longer a need for the rpc function, since we're now using an OTP-compliant behaviour, gen_server:
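
As a sketch of what that might look like: the module name dht_node, the start functions, and the shape of the #state{} record are assumptions; here the state holds the node's ID, its slice of the key/value data, and its clockwise neighbour as a {NextId, NextPid} pair.

    -module(dht_node).
    -behaviour(gen_server).

    %% Assumed node state: our position on the ring, our slice of the
    %% key/value data, and our clockwise neighbour.
    -record(state, {id, data = #{}, next}).

    %% The first node registers under a well-known name so that clients
    %% always have an entry point into the ring.
    start_root(Id) ->
        gen_server:start_link({local, dht_root_node}, ?MODULE, Id, []).

    %% Subsequent nodes are started anonymously and join via the root.
    start_node(Id) ->
        gen_server:start_link(?MODULE, Id, []).

    init(Id) ->
        %% A lone node is its own clockwise neighbour until others join.
        {ok, #state{id = Id, next = {Id, self()}}}.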

When a server receives a key_lookup request, there are three possibilities:

  1. The key matches the server ID, or there is only one server in the DHT.
  2. The key is between this server's ID and the next server's ID, so it belongs with the next server.
  3. This server cannot answer the question "where does this key belong?", so the request must be forwarded to the next server in the ring.

In cases 1 and 2 the 'current' server can respond directly to the client with an authoritative answer about where the key belongs. In case 3, the message will have to be passed clockwise around the ring to the next server (known as a neighbour). The neighbouring server will then repeat this decision process until the request arrives at the node that has the answer. When the answer is known, a message will be sent directly back to the requesting client, rather than backtracking the response through all of the nodes that handled the request along the way.

If that process sounds confusing, don't worry: it can be. You might find it helpful to take another look at the diagram in the first part of this tutorial series. Below is the server implementation of the process described above. Study it until you understand it, because it is the most important aspect of a distributed hash table.
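
Here is a sketch of that handler, using the assumed #state{} record from above, with one handle_cast/2 clause per case; the guards also account for wrapping past zero on the ring. Treat the names and guards as illustrative assumptions rather than the repository's exact code.

    %% Case 1: the key matches our ID, or we are the only node in the ring.
    handle_cast({key_lookup, HashedKey, RequestingPid},
                State = #state{id = Id, next = {_NextId, NextPid}})
      when HashedKey =:= Id; NextPid =:= self() ->
        gen_server:cast(RequestingPid, {key_lookup_result, self()}),
        {noreply, State};

    %% Case 2: the key falls between our ID and our neighbour's ID
    %% (allowing for wraparound), so the neighbour is responsible for it.
    handle_cast({key_lookup, HashedKey, RequestingPid},
                State = #state{id = Id, next = {NextId, NextPid}})
      when (Id < NextId andalso HashedKey > Id andalso HashedKey =< NextId);
           (Id > NextId andalso (HashedKey > Id orelse HashedKey =< NextId)) ->
        gen_server:cast(RequestingPid, {key_lookup_result, NextPid}),
        {noreply, State};

    %% Case 3: we cannot answer, so forward the request clockwise around
    %% the ring. No reply is sent from this node.
    handle_cast({key_lookup, HashedKey, RequestingPid},
                State = #state{next = {_NextId, NextPid}}) ->
        gen_server:cast(NextPid, {key_lookup, HashedKey, RequestingPid}),
        {noreply, State}.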

Finding Your Place in the World

Closely related to the idea of locating a key on the hash ring is the idea of determining where on the hash ring a server belongs. When a node joins the ring, this is an important question to answer so that it can locate its rightful neighbours (its 'previous' and 'next' nodes) and instruct them to update their neighbour references to point to it (the new node). By treating its hashed ID as a regular key, the new node inserts itself in between two existing nodes in the ring, and takes over responsibility for some of the keys that fall in the affected range of the ring. I call this task find_neighbours and it works very much like the key_lookup operation, except that we return both of the neighbouring nodes for the given location. The code should look quite familiar:
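
Again, a sketch under the same assumptions as above: the joining node sends its own hashed ID around the ring, and whichever node owns that gap replies with both neighbours, itself as the 'previous' node and its current 'next' node.

    %% The gap is ours if we are alone in the ring, or if the new ID falls
    %% between our ID and our neighbour's (allowing for wraparound).
    handle_cast({find_neighbours, HashedId, RequestingPid},
                State = #state{id = Id, next = {NextId, NextPid}})
      when NextPid =:= self();
           (Id < NextId andalso HashedId > Id andalso HashedId =< NextId);
           (Id > NextId andalso (HashedId > Id orelse HashedId =< NextId)) ->
        gen_server:cast(RequestingPid,
                        {find_neighbours_result, {Id, self()}, {NextId, NextPid}}),
        {noreply, State};

    %% Not our gap: pass the question clockwise, just as with key_lookup.
    handle_cast({find_neighbours, HashedId, RequestingPid},
                State = #state{next = {_NextId, NextPid}}) ->
        gen_server:cast(NextPid, {find_neighbours, HashedId, RequestingPid}),
        {noreply, State}.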

Unconventional Messaging

There is only one interesting item remaining to discuss: the somewhat unconventional messaging that occurs for the key_lookup and find_neighbours operations (and, generally, any task that requires a search around the hash ring to locate a key's position). When a request to locate a node is sent to dht_root_node, we have to assume that in many cases the node we send the request to will not know the answer, and will thus need to forward the request on around the ring until it arrives at a node that does know the answer. This is somewhat unusual, since a typical gen_server request/response cycle consists of a client sending a request to a server process and receiving a direct reply back from that server.

The following diagram illustrates such a key search, supposing a client is looking for Key 37, and initiates the request with the node whose ID is 10:

Fortunately, the gen_server behaviour can handle this situation. As you can see in the key_lookup operation below, we send the client's process ID in the cast. The server that handles the cast returns a {noreply, State} tuple and forwards the request on (if necessary). When a server can finally answer the request, it unpacks the RequestingPid from the request and fires a message directly back at the client.

The request from the client, including the client's Pid, is familiar:
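
Sticking with the sketch from earlier, it might look like this; await_result/0 is a hypothetical helper shown at the end of this section:

    %% The request carries the client's own pid so that whichever node
    %% resolves the lookup can answer the client directly.
    key_lookup(HashedKey) ->
        gen_server:cast(dht_root_node, {key_lookup, HashedKey, self()}),
        await_result().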

And the server's non-response:
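
In the sketch above, that is the third key_lookup clause: forward the request to the neighbour and hand control back to gen_server with {noreply, State}, so nothing is sent back from this node.

    %% Excerpt (case 3): not our key, so forward it clockwise and return
    %% {noreply, State} -- gen_server sends nothing back to the caller.
    handle_cast({key_lookup, HashedKey, RequestingPid},
                State = #state{next = {_NextId, NextPid}}) ->
        gen_server:cast(NextPid, {key_lookup, HashedKey, RequestingPid}),
        {noreply, State}.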

Finally, the gen_server:cast call sends a message directly back to the client, regardless of which server is actually sending that response:
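
In the sketch, every clause that knows the answer ends the search the same way; here is case 1 as an excerpt:

    %% Excerpt (case 1): this node owns the key, so it casts the result
    %% straight to the requesting client's pid instead of routing the
    %% reply back through the ring.
    handle_cast({key_lookup, HashedKey, RequestingPid},
                State = #state{id = Id, next = {_NextId, NextPid}})
      when HashedKey =:= Id; NextPid =:= self() ->
        gen_server:cast(RequestingPid, {key_lookup_result, self()}),
        {noreply, State}.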

Back at the client, intercepting the eventual reply (regardless of which server it comes from) is easy, if unorthodox:
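
A sketch of that receive, using the hypothetical await_result/0 helper from the key_lookup sketch: because the reply is a gen_server cast aimed at a plain process, it arrives wrapped in gen_server's internal {'$gen_cast', ...} tuple, which the client pattern-matches itself. The five-second timeout is an extra safety net added in this sketch.

    %% Block until whichever node resolved the lookup casts the result
    %% back. The reply arrives wrapped in gen_server's '$gen_cast' tuple,
    %% which we match directly -- easy, if unorthodox.
    await_result() ->
        receive
            {'$gen_cast', {key_lookup_result, Pid}} ->
                {ok, Pid}
        after 5000 ->
            {error, timeout}
        end.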

And that's really the extent of the interesting stuff in making a simulated DHT using Erlang processes. If you want to see the other few dozen lines of code, have a browse through the GitHub repository. In the third part of this series, we'll convert this simulation into a full-blown distributed Erlang/OTP program that allows us to run a DHT across a cluster of physical machines.