A hash table is a data structure that associates keys with values. A key may be, for instance, a person's name. The corresponding value may then be that person's contact address (e.g. email address or Session Initiation Protocol (SIP) Uniform Resource Identifier (URI)). The primary operation a hash table supports is lookup: given a key, the hash table finds the corresponding value.
Distributed Hash Tables (DHTs) provide a lookup service similar to a hash table. However, unlike regular hash tables, DHTs are decentralized distributed systems. In a DHT, the responsibility for maintaining mappings from names to values is distributed among the nodes (or peers) participating in the system. This is achieved by partitioning the key space among the participating nodes. The nodes are connected together by an overlay network, which allows the nodes to find the owner of any given key in the key space.
In the overlay network, each node maintains a set of links to other nodes. This set of links is the node's routing table. In some DHTs such as Chord (Stoica et al.: “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications”; in Proceedings of the ACM SIGCOMM'01 Conference, August 2001, San Diego, Calif., USA) the entries in the routing table are called fingers. In addition to the routing table, each node also maintains another data structure called the neighbour list. The neighbour list contains pointers to the immediate successors and predecessors (i.e., neighbours) of the node in the overlay network. A node picks its neighbours according to a certain structure, called the network's topology. One such topology, known as a ring topology in which the nodes are peers, is illustrated in FIG. 1. A peer 101 has immediate predecessors 102, 103 and immediate successors 104, 105. The peer 101 also maintains links in its routing table to three fingers 111, 112, 113 elsewhere in the network.
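The per-node state described above can be sketched as follows. This is a minimal illustration, not the layout of any particular DHT implementation. The names `PeerState`, `build_state` and `successor`, the 6-bit identifier space, and the example peer identifiers are all assumptions made for the sketch. The finger rule used here (finger i points at the peer succeeding the node's identifier plus 2^i) follows the Chord design. Duplicate fingers are collapsed for brevity.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

ID_BITS = 6           # hypothetical tiny identifier space (64 ring positions)
RING = 2 ** ID_BITS


def successor(sorted_ids, point):
    """Return the first peer ID at or after `point`, wrapping around the ring."""
    i = bisect_left(sorted_ids, point % RING)
    return sorted_ids[i % len(sorted_ids)]


@dataclass
class PeerState:
    peer_id: int
    successors: list = field(default_factory=list)    # neighbour list, clockwise
    predecessors: list = field(default_factory=list)  # neighbour list, counter-clockwise
    fingers: list = field(default_factory=list)       # routing table entries


def build_state(peer_id, all_ids, n_neighbours=2):
    """Derive one peer's neighbour list and routing table from the full ring."""
    ids = sorted(all_ids)
    pos = ids.index(peer_id)
    n = len(ids)
    state = PeerState(peer_id)
    state.successors = [ids[(pos + k) % n] for k in range(1, n_neighbours + 1)]
    state.predecessors = [ids[(pos - k) % n] for k in range(1, n_neighbours + 1)]
    # Chord-style fingers: the peer that succeeds peer_id + 2**i, for each i.
    for i in range(ID_BITS):
        finger = successor(ids, peer_id + 2 ** i)
        if finger != peer_id and finger not in state.fingers:
            state.fingers.append(finger)
    return state


# Hypothetical ring of ten peers.
ring_ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
state = build_state(8, ring_ids)
print(state.successors, state.predecessors, state.fingers)
```

In a running overlay no peer sees the full list of identifiers. The sketch builds the state from a global view only to show what each structure contains.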
Each peer participating in the DHT is identified by its peer identifier (peer-ID), which is created by calculating a hash over a certain piece of unique information such as the peer's IP address and port number. Many DHTs use Secure Hash Algorithm One (SHA-1) as the base hash algorithm. In DHTs, peer identifiers and resource identifiers (i.e. keys) are stored in the same namespace. As an example, in the Chord DHT, each peer is responsible for storing resources whose identifiers fall between its predecessor's peer identifier and its own peer identifier. Resource identifiers (resource-IDs) are constructed in an analogous manner to peer-IDs, by calculating a hash over a piece of information that uniquely identifies the resource. As an example, if the resource record is the contact address of a user's SIP terminal, the resource-ID can be formed by calculating a hash over the user's SIP URI. In a DHT, a user is responsible for storing his own contact information only if the resource-ID of his contact information falls in the portion of the identifier space for which his endpoint is responsible.
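The identifier scheme and the Chord-style responsibility rule can be illustrated with a short sketch. The function names `make_id` and `responsible_peer`, the example addresses and the example SIP URI are hypothetical. The sketch assumes that a peer owns every identifier greater than its predecessor's peer-ID and up to and including its own, wrapping around at the top of the identifier space.

```python
import hashlib
from bisect import bisect_left


def make_id(unique_info: str) -> int:
    """Hash a unique string with SHA-1 and return it as a 160-bit integer."""
    digest = hashlib.sha1(unique_info.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def responsible_peer(peer_ids, resource_id):
    """Return the peer-ID that owns `resource_id`: the first peer-ID at or
    after the resource-ID, wrapping around to the smallest peer-ID."""
    ids = sorted(peer_ids)
    i = bisect_left(ids, resource_id)
    return ids[i % len(ids)]


# Peer-IDs derived from hypothetical "IP address:port" strings.
peers = {make_id(addr): addr
         for addr in ("192.0.2.1:5060", "192.0.2.2:5060", "192.0.2.3:5060")}

# Resource-ID derived from a hypothetical SIP URI.
resource_id = make_id("sip:alice@example.com")
owner = responsible_peer(peers.keys(), resource_id)
print("contact record stored at", peers[owner])
```

Because peer-IDs and resource-IDs come from the same hash function, they share one namespace, and the peer that stores a user's contact record is usually not the user's own endpoint.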
Although DHTs have many benefits such as low capital cost, low operational cost, scalability, and robustness, they face certain challenges when used to distribute interpersonal communication systems such as Voice over IP (VoIP), instant messaging and presence systems. In traditional client-server communication systems, all of the system's intelligence is located in centralized servers, whereas in peer-to-peer (P2P) systems, all of the system's intelligence is distributed to the endpoints. Client-server VoIP relies on centralized servers when providing features such as call control, presence, user registration, telephony supplementary services, and Network Address Translator (NAT) traversal, among other things. In P2P VoIP, these features need to be provided in a distributed fashion.
One challenge is the provision of a distributed presence service. In a pure P2P presence system, there is no central presence server to which users can report their presence status and which can inform users about changes in their buddies' presence status. Instead, users are responsible for tracking the presence status of their buddies themselves. To detect when his buddies join the P2P overlay, a user must poll these offline buddies periodically. Each periodic poll operation requires a P2P lookup that attempts to fetch the contact record of the buddy from the DHT. If the contact record cannot be found, the buddy is still offline. Ideally, the interval at which offline buddies are polled would be rather short so that there would not be a long gap between the buddy joining the overlay and the user detecting this. The problem with this approach is that the periodic lookups for offline buddies inject a great deal of extra traffic into the overlay. If the polling interval is short, it is not uncommon that the majority of lookups in the system are lookups for offline buddies. This has a negative impact on performance since the system is flooded with such lookups.
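A rough estimate shows how quickly this polling load grows. The sketch below is a back-of-the-envelope calculation, not a measurement. The function name `offline_poll_load` and all of the input figures are hypothetical. It assumes that every user polls every offline buddy once per interval and that a lookup takes about half of log2(N) overlay hops on average, which is the figure reported for Chord.

```python
import math


def offline_poll_load(users, offline_buddies_per_user, poll_interval_s):
    """Estimate overlay load caused by polling offline buddies.

    Returns (lookups per second, overlay messages per second).
    """
    lookups_per_s = users * offline_buddies_per_user / poll_interval_s
    # Assumed average path length of a Chord-style lookup: (1/2) * log2(N).
    hops_per_lookup = 0.5 * math.log2(users)
    return lookups_per_s, lookups_per_s * hops_per_lookup


# Hypothetical system: 1024 users, 20 offline buddies each.
for interval in (30, 300):
    lookups, messages = offline_poll_load(1024, 20, interval)
    print(f"interval {interval:>3} s: {lookups:7.1f} lookups/s, "
          f"{messages:7.1f} messages/s")
```

Under these assumptions the load is inversely proportional to the polling interval, so cutting the detection delay by a factor of ten multiplies the polling traffic by ten.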
To reduce the amount of polling traffic, the polling interval can be kept rather long. This is the approach that real-world P2P presence systems have been forced to take. In these systems, it is not uncommon that it takes over five minutes from the moment that a user joins the system before his buddies detect this. This naturally results in a poor user experience. Thus, P2P presence systems would greatly benefit from a mechanism that reduces or eliminates the polling traffic and yet makes the detection of peers coming online almost immediate.
Telephony supplementary services are also difficult to provide in a distributed environment. In a traditional client-server system, supplementary services such as call forwarding, call barring, and completion of calls to busy subscribers are handled by centralized network components. There is no straightforward way to implement such services in a P2P environment.
One further problem is service discovery. In a P2P system, peers providing a particular service register as service providers into the overlay using the service identifier of the service. Other peers discover service providers by sending lookups for the service identifiers. If no service provider is available, a peer interested in the service must perform periodic lookups to detect when the service becomes available. Such periodic queries again cause extra traffic and degrade the performance of the system.
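The polling pattern for service discovery can be sketched with an in-memory stand-in for the overlay. `ToyOverlay`, `poll_for_service`, the `on_poll` hook, and the service and peer names are all hypothetical and serve only to count how many lookups are spent before a provider appears. A real peer would wait for the polling interval between attempts, and each lookup would be routed through the overlay.

```python
class ToyOverlay:
    """In-memory stand-in for a DHT that counts every lookup it serves."""

    def __init__(self):
        self._records = {}
        self.lookup_count = 0

    def register(self, service_id, provider):
        """Register `provider` under the service identifier."""
        self._records.setdefault(service_id, []).append(provider)

    def lookup(self, service_id):
        """Return the providers registered for `service_id` (possibly none)."""
        self.lookup_count += 1
        return list(self._records.get(service_id, []))


def poll_for_service(overlay, service_id, max_polls, on_poll=None):
    """Repeat the lookup until a provider is found or max_polls is reached.

    `on_poll` is a test hook called before each attempt; it lets the example
    simulate a provider joining partway through. Returns (providers, polls).
    """
    for attempt in range(1, max_polls + 1):
        if on_poll is not None:
            on_poll(attempt)
        providers = overlay.lookup(service_id)
        if providers:
            return providers, attempt
    return [], max_polls


overlay = ToyOverlay()


def provider_joins(attempt):
    # Simulate a provider registering just before the fourth poll.
    if attempt == 4:
        overlay.register("example-service", "peer-42")


providers, polls = poll_for_service(overlay, "example-service", 10, provider_joins)
print(providers, polls, overlay.lookup_count)
```

Every attempt before the provider registers is a lookup that returns nothing, which is the extra traffic the paragraph above describes.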