FIG. 1 shows an example of a distributed hash table (DHT) 50. A distributed hash table is a distributed table of key-to-value mappings stored piecewise in a cloud or network of participating nodes. A node may be any type of computing device. A key (such as example key 60) is an identifier, usually numeric, in a large space and is intended to be associated with a node or a piece of data such as a value. Multiple key-to-value mappings are possible. For instance, in some DHT implementations there can be 8 different pieces of data with the key “10”. A value is unconstrained data intended to be referenced by a key. A value (such as example value 62) can be anything, such as an arbitrary string, a network address, a large blob of data, etc. A DHT has functionality much like that of a simple unitary hash table. For instance, a DHT will usually have functionality for inserting key-value pairings and for looking up keys to find their values. However, in a DHT the nodes cooperate to maintain the key-to-value mappings and to provide the hash table functionality. Although FIG. 1 shows a hash table 51, the hash table 51 is actually distributed among nodes 54; portions 52 of the key-to-value mappings are maintained in nodes 54. For instance, the top node 54 in FIG. 1 stores a portion 52 comprising keys key1 through key4 and respective values value1 through value4.
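By way of illustration only, the piecewise storage described above might be sketched as follows. The class name NodeStore, the three-node list, and the modulo placement rule in owner() are assumptions made for the sketch and are not taken from FIG. 1; a real DHT assigns keys to nodes by its own rules. The sketch shows only that each node holds a portion of the mappings and that one key may carry several values.

```python
from collections import defaultdict


class NodeStore:
    """Hypothetical holder for one node's portion of the key-to-value mappings."""

    def __init__(self):
        # A key may map to several values, so each key holds a list.
        self.portion = defaultdict(list)

    def insert(self, key, value):
        self.portion[key].append(value)

    def lookup(self, key):
        # .get() avoids creating an empty entry for an unknown key.
        return list(self.portion.get(key, []))


# Three participating nodes, each storing its own portion.
nodes = [NodeStore() for _ in range(3)]


def owner(key):
    # Placeholder placement rule (assumption): key modulo the node count.
    return nodes[key % len(nodes)]


def insert(key, value):
    owner(key).insert(key, value)


def lookup(key):
    return owner(key).lookup(key)


insert(10, "alpha")
insert(10, "beta")        # a second value under the same key
insert(11, "192.0.2.7")   # a value may be a network address
```

In this sketch, lookup(10) returns both values stored under key 10, while a key that was never inserted yields an empty list.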
FIG. 2 shows a data network 70 with DHT nodes 54. Each DHT node 54 of a DHT may be capable of network-level communication with the other nodes. However, a node 54 may or may not know the network address of the node that is storing a particular key and its value, depending on the contents of its local routing cache. Each node maintains routing information (e.g., a routing cache, discussed later) that it uses to route or forward messages, such as key lookup messages, to other nodes that may either know the key's value or use their own routing information to forward the messages to still other nodes (usually closer to the target node), and so on. A routing cache may have anywhere from one to several hundred entries, while the DHT may have any number of nodes. A popular Peer Name Resolution Protocol (PNRP) network could have hundreds of millions of nodes, theoretically limited by its arbitrary key size to 2^128. Other DHTs such as Pastry have keys of arbitrary length with an arbitrary number of nodes. Nonetheless, if a DHT node needs the value of a key, it will send a message with the key to another DHT node selected from its routing cache. If the receiving DHT node has the key's value, then it will return the value to the requesting DHT node. If the receiving DHT node does not have the key's value, then it will select from its routing cache yet another DHT node (usually closer to the target DHT node) and will forward the message to that node. This process may be repeated until the message reaches the node that has the requested key.
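The hop-by-hop forwarding just described can be sketched, under stated assumptions, as a small simulation. The names Node and route_lookup, the use of absolute numerical difference as the closeness measure, the max_hops limit, and the four example nodeIDs are all hypothetical; network addresses are omitted and the network is modeled as an in-memory dictionary.

```python
class Node:
    """Hypothetical DHT node: a local store plus a list of known peer nodeIDs."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}          # local portion of the key-to-value mappings
        self.routing_cache = []  # nodeIDs of known peers (addresses omitted)


def route_lookup(network, start_id, key, max_hops=16):
    """Forward a lookup hop by hop; return (value, path), or (None, path) on failure."""
    current = network[start_id]
    path = [current.node_id]
    for _ in range(max_hops):
        if key in current.store:
            return current.store[key], path
        # Assumption: only forward to a peer strictly closer to the key.
        closer = [n for n in current.routing_cache
                  if abs(n - key) < abs(current.node_id - key)]
        if not closer:
            return None, path
        next_id = min(closer, key=lambda n: abs(n - key))
        current = network[next_id]
        path.append(next_id)
    return None, path


# Four nodes; no node knows every other node.
network = {i: Node(i) for i in (10, 40, 70, 90)}
network[10].routing_cache = [40]
network[40].routing_cache = [10, 70]
network[70].routing_cache = [40, 90]
network[90].routing_cache = [70]
network[90].store[88] = "value-88"

value, path = route_lookup(network, start_id=10, key=88)
```

Here node 10 does not know node 90 directly, yet the request reaches it through nodes 40 and 70, each of which consults only its own routing cache.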
As mentioned above, each DHT node has a routing cache. FIG. 3 shows a configuration of a node 54 with a routing cache 90. Each node in a DHT is usually assigned a unique node identifier (nodeID) which can be mapped into the key space. NodeIDs may be assigned randomly or by other means. The routing cache 90 stores pairings of nodeIDs and corresponding network addresses. A network address is defined herein to mean a node's address on a data network and may be in the form of a numerical address (e.g., an IP address), or a hostname that can be resolved to a numerical address, or other information that one node can use to direct communications via a data network to another node.
The node 54 in FIG. 3 also has an application 94 that interfaces with the DHT. The application 94 may provide or invoke a lookup function, for example lookup(key), which returns a value. The application 94 may also provide or invoke an insertion function, for example insert(key, value). Because lookups, insertions, deletions, etc. are similarly routed in a DHT, routing will be discussed with reference to a generic DHT message. The application 94 may request or look up a key. The looked-up key may either be a target nodeID or may be mapped to a target nodeID. A logic or routing module 96 may respond to a lookup (or insert) request by first checking to see if the key or target nodeID is stored in node 54's local portion 52 of the DHT. If the target nodeID or key is not stored locally in DHT portion 52, then the routing module 96 usually obtains from the routing cache 90 the nodeID (and its network address) that is numerically closest to the target nodeID. There are occasional exceptions to this numerically closest routing approach. For security reasons, some DHTs can forward messages to a sub-optimal next hop. And some DHTs, such as those implementing the PNRP, can temporarily forward messages away from the target or destination rather than towards it.
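A minimal sketch of the routing module's decision, under stated assumptions, is given below. The class DhtNode, the method handle_lookup, the tuple-shaped return values, and the example addresses are hypothetical. The sketch treats the key as the target nodeID and measures closeness by absolute numerical difference, which is only one possible measure.

```python
from dataclasses import dataclass, field


@dataclass
class DhtNode:
    node_id: int
    # Local portion of the DHT: key -> value.
    portion: dict = field(default_factory=dict)
    # Routing cache: nodeID -> network address (IP address or hostname).
    routing_cache: dict = field(default_factory=dict)

    def handle_lookup(self, key):
        """Answer from the local portion if possible; otherwise name the next hop."""
        if key in self.portion:
            return ("value", self.portion[key])
        if not self.routing_cache:
            return ("unroutable", None)
        # Assumption: the key is itself the target nodeID.
        next_id = min(self.routing_cache, key=lambda n: abs(n - key))
        return ("forward", (next_id, self.routing_cache[next_id]))


node = DhtNode(
    node_id=500,
    portion={510: "local value"},
    routing_cache={120: "192.0.2.10", 730: "192.0.2.20", 905: "host-905.example"},
)

local_result = node.handle_lookup(510)   # served from the local portion
remote_result = node.handle_lookup(880)  # forwarded toward nodeID 905
```

The exceptions mentioned above (sub-optimal next hops for security, or temporarily routing away from the target) are not modeled in this sketch.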
After selecting or obtaining the next-hop nodeID, the node 54 then forwards the lookup request via the network 70 to the node of the selected nodeID, using that node's network address in node 54's routing cache 90. Most DHT implementations measure numerical closeness by prefix matching. That is to say, a node will select from its routing cache the nodeID with the longest prefix that matches a prefix of the target nodeID. The result is that a request message is routed to nodes with nodeIDs increasingly closer to (and eventually equal to) the target nodeID. For example, a message targeted to nodeID 1243 may follow a route such as: 1876→1259→1247→1243.
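Prefix-matching selection can be sketched as follows, assuming nodeIDs are equal-length digit strings. The function names and the contents of each routing cache are hypothetical; only the route 1876→1259→1247→1243 comes from the example above.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two nodeIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(routing_cache, target):
    """Pick the cached nodeID sharing the longest prefix with the target."""
    return max(routing_cache, key=lambda node_id: shared_prefix_len(node_id, target))


def route(caches, start, target, max_hops=10):
    """Follow next_hop() choices from start until the target is reached."""
    path = [start]
    while path[-1] != target and len(path) <= max_hops:
        cache = caches[path[-1]]
        if not cache:
            break
        path.append(next_hop(cache, target))
    return path


# Hypothetical routing caches, one per node (network addresses omitted).
caches = {
    "1876": ["3310", "1259", "9002"],
    "1259": ["1876", "1247", "5520"],
    "1247": ["1259", "1243"],
    "1243": [],
}

path = route(caches, start="1876", target="1243")
```

At each hop the matched prefix grows by at least one digit ("1", then "12", then "124", then "1243"), which is why the request converges on the target.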
FIG. 4 shows a simple cloud or overlay network 100 formed by nodes 54 linked by their respective routing tables 90. The topology of overlay network 100 is determined by the contents of the routing tables in its nodes 54. In practice, the overlay network 100 may have any number of nodes 54.
The discussion above assumes the existence of routing caches or tables in DHT nodes. In reality, nodes build up and manage their routing tables in view of a number of objectives and constraints. Routing tables have been constrained to a small size relative to the number of nodes in a DHT. Routing tables have been populated and maintained according to structural or organizational rules designed to create highly structured overlay networks that facilitate efficient routing of request messages; the topology of an overlay network reflects the nodeIDs selected for inclusion in the routing tables of the participating nodes. Some routing maintenance techniques have favored popular keys, efficient replication, etc. Some DHT implementations have structured their routing tables with the goal of guaranteeing that requests are routed in on the order of log(N) hops, where N is the number of nodes. Satisfying these aims and constraints can be cumbersome and unreliable. Consider FIGS. 5 and 6.
FIG. 5 shows a structured routing table 110 with entries 112. The network addresses accompanying the entries 112 are not shown. The routing table 110 is populated and maintained such that there are entries 112 for nodeIDs with each sub-prefix of the hosting node's nodeID. Assuming that nodeIDs have a radix of 10, and assuming that the nodeID 114 of routing table 110's host node is 37124, then the routing table 110 is provided with levels of nodeIDs with increasingly longer prefixes that match 37124 (an “x” in an entry 112 indicates that the rest of the nodeID is unconstrained and can be any digit). Maintaining this kind of structure can be difficult, particularly when nodes frequently join and depart a DHT.
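The level structure described above can be illustrated by generating the nodeID pattern that each slot of such a table would accept. This is a sketch only: the function table_patterns is hypothetical, the layout of FIG. 5 is not reproduced, and the zero-based indexing used here (level 0 varies the first digit, row d holds digit d) is an assumption.

```python
def table_patterns(host_id: str, radix: int = 10):
    """For each level, list the nodeID patterns that level's rows would accept.

    Level k keeps the first k digits of the host's nodeID, varies the next
    digit over the radix, and leaves the remaining digits unconstrained ("x").
    Assumes radix <= 10 so that each digit is a single character.
    """
    levels = []
    for level in range(len(host_id)):
        prefix = host_id[:level]
        rest = "x" * (len(host_id) - level - 1)
        levels.append([prefix + str(d) + rest for d in range(radix)])
    return levels


patterns = table_patterns("37124")
first_level = patterns[0]    # "0xxxx" through "9xxxx"
second_level = patterns[1]   # "30xxx" through "39xxx"
last_level = patterns[4]     # "37120" through "37129"
```

With five-digit nodeIDs and a radix of 10, the table has 5 levels of 10 slots each. Keeping every slot filled with a live node is the maintenance burden noted above.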
FIG. 6 shows a message route. In overlay network 100, node 42134 requests the value for key 35027. Node 42134 sends a request to node 37124. Assuming that node 37124 has the routing table 110 shown in FIG. 5, node 37124 refers to routing table 110 and determines that the entry at Level2, row 6 has the nodeID (35236) that is numerically closest to key 35027. Node 37124 accordingly forwards the request to node 35236 (which matches prefix “35”). The request is similarly forwarded by node 35236 and others until the request reaches node 35027, which is responsible for storing the key-value pair for key 35027. Incidentally, the filled request may traverse back along the same route to the requesting node 42134.
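The table consultation at node 37124 can be sketched as a slot computation. The function slot_for and the dictionary layout are hypothetical, and the sketch assumes one-based numbering in which Level k varies the k-th digit and row r holds digit r-1; that numbering is inferred from the statement that Level2, row 6 holds nodeID 35236 and is not confirmed by the text. The entry at (1, 5) is invented for illustration.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two nodeIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def slot_for(host_id: str, key: str):
    """Return the (level, row) slot to consult for a key, both one-based."""
    level = shared_prefix_len(host_id, key)
    if level == len(host_id):
        return None  # the key equals the host's own nodeID; nothing to forward
    # Assumed numbering: Level k varies the k-th digit; row r holds digit r-1.
    return level + 1, int(key[level]) + 1


# Partial, hypothetical routing table for node 37124 (addresses omitted).
# Only the (2, 6) entry, nodeID 35236, is taken from the example above.
table = {
    (1, 5): "42134",
    (2, 6): "35236",
}

slot = slot_for("37124", "35027")
next_hop = table.get(slot)
```

Node 37124 and key 35027 share only the leading digit "3", so the slot falls in Level2, and the key's second digit "5" selects row 6 under the assumed numbering.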
In sum, implementations of DHT routing tables have resulted in complex code with high overhead and low reliability.