Peer-to-peer or P2P networks make use of the pooled resources of participating nodes including processing capabilities and communication bandwidth to facilitate a wide variety of services including file sharing and VoIP telephony. In the absence of central servers, particular P2P services may make use of “overlay networks” to optimise resource location. An overlay network comprises nodes connected by virtual links representing paths extending across possibly many physical links in the underlying network (e.g. the Internet). Each node in the overlay network maintains a routing table containing a set of links to certain other nodes within the overlay network. Resource requests are passed between nodes until they arrive at a node which is responsible for that resource.
Distributed Hash Tables (DHT) provide an efficient means for mapping resource names (“keys”) to locations within an overlay network. DHT makes use of a hashing algorithm to map keys, e.g. song titles, SIP URIs, etc, to a finite value space, e.g. 128 bits. The hashing algorithm is chosen to ensure a relatively uniform spread of hash values across the value space. Thus, for example, the hashing of 100 song titles will likely result in 100 hash values that are relatively evenly spaced across the value space. Nodes within an overlay network are identified by usernames, which are themselves hashed into respective hash values. Each node then becomes responsible for a set of hash values within the value space which neighbour its own value. In practice, a node will store locations (e.g. IP addresses) from which resources, matching resource names which it “owns”, can be obtained. When a node in the overlay network receives a request for a resource, the node determines whether or not it owns the corresponding hash value. If so, it returns the location of the resource to the requester (via the overlay network). If it does not own the hash value, it inspects its routing table to identify that node within the table which has a hash value closest to the hash value of the request, and forwards the request to that node. The receiving node repeats the procedure, and so on until the request arrives at the node which does own the hash value corresponding to the request and which therefore knows the resource location.
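The lookup procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of any particular DHT: the names `hash_id`, `ring_distance`, `Node` and `build_ring` are hypothetical, SHA-256 truncated to 128 bits stands in for the unspecified hashing algorithm, a key is assumed to be owned by the node whose hash value is closest on the ring, and each routing table is assumed to hold one successor, one predecessor and one distant node.

```python
import hashlib

BITS = 128                 # size of the value space, as in the 128-bit example
SPACE = 1 << BITS


def hash_id(name: str) -> int:
    """Map a key or username to the value space (assumed: SHA-256 cut to 128 bits)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: BITS // 8], "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two hash values on a ring-shaped value space."""
    d = abs(a - b)
    return min(d, SPACE - d)


class Node:
    def __init__(self, username: str):
        self.username = username
        self.node_id = hash_id(username)
        self.routing_table = []   # links to certain other Node objects
        self.locations = {}       # key hash -> resource location, for owned keys

    def lookup(self, key: str, hops: int = 0):
        """Return (owning node, stored location or None, number of hops)."""
        key_hash = hash_id(key)
        # Consider this node first so that ties never cause a forward.
        candidates = [self] + self.routing_table
        best = min(candidates, key=lambda n: ring_distance(n.node_id, key_hash))
        if best is self:
            # No known node is closer, so this node treats the hash as its own.
            return self, self.locations.get(key_hash), hops
        # Otherwise forward the request to the closest node in the routing table.
        return best.lookup(key, hops + 1)


def build_ring(usernames):
    """Create nodes and give each a successor, a predecessor and one distant link."""
    nodes = sorted((Node(u) for u in usernames), key=lambda n: n.node_id)
    count = len(nodes)
    for i, node in enumerate(nodes):
        offsets = (1, -1, count // 2)
        node.routing_table = [
            nodes[(i + off) % count]
            for off in offsets
            if nodes[(i + off) % count] is not node
        ]
    return nodes
```

Because every hop moves strictly closer to the key's hash value, and each node knows its immediate successor and predecessor, a request started at any node ends at the same owning node in this sketch.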
FIG. 1 illustrates an overlay network organised as a ring (only a small number of the nodes within the ring are illustrated). In this example, each node maintains a routing table containing the locations and hash values of a small number of succeeding and preceding nodes in the ring, as well as of a small number of more distant nodes. In the illustrated network, a Node X maintains within its routing table locations for two successor nodes and two predecessor nodes, as well as for three remote nodes. Whilst a larger number of entries within the routing tables can make the network more efficient in terms of routing and more robust against node withdrawal, large tables are difficult to maintain and can therefore reduce the reliability of the network. If a node of the overlay network is behind a Network Address Translation node (or NAT), “pinholes” are opened in the NAT for those peer nodes contained within the routing table.
A node within the overlay network ensures that the information in its routing table is up to date by attempting to contact its neighbours periodically. A number of different mechanisms may be used for this purpose:
1) A node can periodically send keep-alive messages to check that the other nodes listed in its routing table have not left the overlay network. This mechanism is used by DHT approaches such as Pastry [A. Rowstron and P. Druschel: Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. Middleware, 2001], Chord [I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan: Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In Proceedings of the ACM SIGCOMM'01 Conference, August 2001, San Diego, Calif., USA.] and Content Addressable Network (CAN) [S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker: A scalable content-addressable network. In Proceedings of ACM SIGCOMM 2001, August 2001].
2) A node can periodically send queries to learn about new nodes that could be inserted into the routing table, replacing old entries (e.g. Chord).
3) A node can periodically send queries to its direct neighbours requesting information about the entries in its neighbours' routing tables. This information is used to update the node's own routing table (e.g. Chord).
4) A node can periodically send its own routing table to its neighbours (e.g. CAN).
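As a rough illustration of one maintenance round, the sketch below combines keep-alive pruning (mechanism 1) with the merging of entries learned from other nodes (mechanisms 2 to 4). It is not taken from Pastry, Chord or CAN: the function names, the `probe` callback standing in for a real keep-alive exchange, and the simple "add while there is room" policy are all assumptions made for the example.

```python
from typing import Callable, Dict

# Hypothetical representation: node hash value -> location (e.g. an IP address).
RoutingTable = Dict[int, str]


def prune_dead_entries(table: RoutingTable,
                       probe: Callable[[str], bool]) -> RoutingTable:
    """Keep only the entries whose node answers a keep-alive probe.

    `probe` is an assumed callback that returns True when the node at the
    given location responded within some timeout.
    """
    return {node_id: addr for node_id, addr in table.items() if probe(addr)}


def merge_learned_entries(table: RoutingTable,
                          learned: RoutingTable,
                          max_entries: int) -> RoutingTable:
    """Add newly learned nodes until the table holds `max_entries` entries.

    Existing entries are never overwritten; this replacement policy is an
    assumption, since real DHTs choose entries according to their topology.
    """
    merged = dict(table)
    for node_id, addr in learned.items():
        if len(merged) >= max_entries:
            break
        merged.setdefault(node_id, addr)
    return merged
```

A periodic task could call `prune_dead_entries` first and then refill the freed slots with `merge_learned_entries`, using whatever entries a neighbour reported in its own routing table.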
Another (additional) approach to maintaining the routing tables involves a node checking whether the originator of a resource request could be inserted into its routing table (e.g. Kademlia [P. Maymounkov and D. Mazieres: Kademlia: A peer-to-peer information system based on the xor metric. In Proceedings of IPTPS02, Cambridge, USA, March 2002]).
Consider FIG. 2, which shows an example of neighbourhood relations in a DHT. In the Figure, a ring topology is assumed. Node X maintains three successor and three predecessor pointers in its routing table. The reason for maintaining multiple successor and predecessor pointers is to increase robustness. If the probability that a single successor will fail is p, then the probability that all three successors will fail simultaneously is p³ (assuming that failures are independent). However, in extremely large real-world DHT-based overlay networks, this is not sufficient to maintain connectivity in the network; if all three successors (or alternatively, all three predecessors) of a given node leave the network within a sufficiently short period of time, the network fragments.
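The effect of network size on this argument can be made concrete with a small calculation. The sketch below assumes independent failures with the same probability p for every node, which is a simplification, and the function names and the numbers used afterwards are purely illustrative.

```python
def all_fail_probability(p: float, k: int) -> float:
    """Probability that all k successors of one node fail, assuming independence."""
    return p ** k


def expected_isolated_nodes(n_nodes: int, p: float, k: int) -> float:
    """Expected number of nodes that lose all k successors at once.

    Follows from linearity of expectation: each of the n_nodes nodes loses
    all of its successors with probability p ** k.
    """
    return n_nodes * all_fail_probability(p, k)
```

With a hypothetical p of 0.1 and three successors, a single node loses all of its successors with probability 0.001. In a network of one million nodes, however, the expected number of such nodes is about 1000, which illustrates why a small per-node probability does not by itself protect a very large network against fragmentation.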
DHT-based overlay networks are fully distributed systems that can work without any centralized components. Such decentralized systems are very challenging from the viewpoint of network performance monitoring operations. In particular, the following issues arise:
Determining the total size of the network—there is no central server which can keep track of the number of users in the system. Many DHT algorithms need information about the size of the network to optimize their performance. As an example, in the Chord DHT algorithm [Stoica 2001], information concerning network size is needed, e.g. when selecting the size of the neighbor set, the frequency of DHT maintenance operations and the size of the routing (i.e. finger) table. If the size of the network is not known, a pre-configured and thus sub-optimal value needs to be used. This can have a very negative impact on network performance.
Correctness of network topology—for example how to detect the presence of loops in the network when the network is completely decentralized. If a loop is formed within the network, this will result in failed or incorrect lookups, degraded performance or even complete disruption of the operation of the network. A DHT based overlay network would benefit from the capability to automatically detect and fix loops.
Existence of network partitions (is there a path from any peer in the network to any other peer in the network?): how to connect disconnected network partitions that can be created due to churn. Previous studies [Rhea 2004, Li 2004] have shown that existing DHTs cannot handle high churn rates. Due to high churn, nodes can lose all of their predecessors or successors. If even a single node loses track of its neighbors in even one direction, the network becomes partitioned. As a result, lookups may return inconsistent values or fail altogether. A DHT-based overlay network would benefit from the capability to recover from network partitions.
The collection of statistical and performance-related information from the network. As a network may consist of hundreds or even thousands of nodes, collecting real-time or near real-time information about the status of the network becomes problematic. Separately collecting periodic status reports from all of the nodes to obtain near real-time information about network state will clearly not scale (since to collect real-time information, frequent periodic message exchanges are needed between the monitoring node and each of the peers in the overlay network).