Peer-to-peer (P2P) applications, such as those for sharing music, movies and software, have become so popular that the majority of IP traffic on the Internet is now generated by P2P applications. The share of Internet traffic taken by P2P applications is expected to continue to grow, particularly as the media industry has identified P2P as an efficient system for content distribution.
In P2P networking, the peers are distributed throughout the physical network and an overlay network can be constructed on top of this physical network in which the nodes are connected by virtual or logical links (that may represent many physical links in the underlying physical network). This overlay network is then used for routing between the nodes in the P2P network. FIG. 1 illustrates schematically an overlay network and the underlying physical network.
In non-structured P2P networks, the content is distributed randomly, whereas in structured P2P networks the location of content is determined by the utilised P2P protocol and, as such, requests for content can be routed by the P2P protocol in order to determine a source for the content.
Typically, a node in a structured P2P network maintains a list of nodes of which it is aware and with which it collaborates for a specific task. This list of nodes is the P2P routing table and contains the node's neighbours in the collaborative network. For example, in FIG. 1 Peer A's list of neighbouring nodes might include Peer B and Peer D. Structured P2P networks are mostly based on Distributed Hash Tables (DHTs) in order to manage the storage and retrieval of data within the network. A DHT defines a logical keyspace and splits ownership of this keyspace between the nodes participating in the P2P network. To store a file, a hashing algorithm (e.g. SHA-1) is applied to the file contents or to its filename, generating a hash which is used as a key for the file. The location at which this file can be found, or the file itself, is then sent to and stored at the node responsible for the area of the keyspace in which this key lies. Generally, when a node wants to obtain this file, it will generate the file key and select a node from within its list of neighbouring nodes whose keyspace is closer to that in which the key lies. The data requesting node will then send a message containing the file key to the selected neighbouring node. This node then forwards the message to a node within its own list of neighbouring nodes whose keyspace is closer still to that in which the key lies. The message is forwarded from node to node until it reaches the node responsible for the key. For this reason, the nodes that feature in the list of neighbouring nodes are usually selected so as to minimise the number of hops that are required to reach any other node in the P2P network.
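The key-based forwarding described above can be sketched as follows. This is a minimal illustration only, not the routing scheme of any particular DHT: the 16-bit ring keyspace, the symmetric ring distance, the rule that the responsible node is the one whose identifier is nearest the key, and the helper names file_key, ring_distance, build_routing_tables and route are all assumptions introduced for the example.

```python
import hashlib

KEYSPACE = 2 ** 16  # assumed toy keyspace size; real DHTs use far larger ones


def file_key(filename):
    """Hash a filename with SHA-1 and fold the digest into the toy keyspace."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE


def ring_distance(a, b):
    """Shortest distance between two points on the circular keyspace."""
    d = (a - b) % KEYSPACE
    return min(d, KEYSPACE - d)


def build_routing_tables(node_ids):
    """Give each node its two ring-adjacent nodes plus one long-range link.

    This neighbour choice is an assumption made for the sketch.
    """
    ring = sorted(node_ids)
    tables = {}
    for i, node in enumerate(ring):
        neighbours = {ring[i - 1], ring[(i + 1) % len(ring)]}
        neighbours.add(ring[(i + len(ring) // 2) % len(ring)])
        neighbours.discard(node)
        tables[node] = sorted(neighbours)
    return tables


def route(tables, start, key):
    """Forward hop by hop to the neighbour nearest the key.

    Stops when no neighbour is strictly nearer than the current node,
    which is then treated as the node responsible for the key.
    Returns the list of nodes visited, starting with `start`.
    """
    path = [start]
    current = start
    while True:
        best = min(tables[current],
                   key=lambda n: ring_distance(n, key),
                   default=current)
        if ring_distance(best, key) >= ring_distance(current, key):
            return path
        current = best
        path.append(current)
```

Calling route(tables, start, file_key("some-file")) returns the sequence of hops taken. Because every hop strictly reduces the distance to the key, the loop always terminates, and the long-range link illustrates how the choice of neighbours affects the hop count.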
Some structured P2P applications also take account of the topology of the underlying physical network when selecting the nodes that will feature in a node's list of neighbours and when selecting a node from within that list, with the aim of alleviating unnecessary load in the communication links between the nodes. For example, proximity neighbour selection attempts to construct the list of neighbouring nodes such that it contains the topologically closest nodes among all nodes within the desired portion of the keyspace. Furthermore, when routing a message there may be several nodes in the list of neighbouring nodes that are closer to the message's key in the keyspace. Proximity message routing attempts to select the node that is closest in the physical network or that represents a good compromise between progress in the keyspace and proximity in the physical network. When attempting to account for node proximity, these structured P2P applications often rely on simple Round Trip Time (RTT) measurements, e.g. made using ‘ping’ packets.
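One way to picture proximity message routing is the sketch below. It is illustrative only: the weighted score, the weight parameter, the toy ring keyspace and the function names are assumptions made for the example, and the RTT values are assumed to have been measured beforehand (e.g. with ‘ping’ packets).

```python
KEYSPACE = 2 ** 16  # assumed toy keyspace size


def ring_distance(a, b, keyspace=KEYSPACE):
    """Shortest distance between two points on the circular keyspace."""
    d = (a - b) % keyspace
    return min(d, keyspace - d)


def next_hop(current, neighbours, key, rtt_ms, weight=0.5):
    """Pick the next hop among neighbours that are nearer to the key.

    weight=1.0 considers only progress in the keyspace, weight=0.0 only
    the measured RTT; values in between trade one off against the other.
    Returns None when no neighbour is strictly nearer to the key.
    """
    here = ring_distance(current, key)
    closer = [n for n in neighbours if ring_distance(n, key) < here]
    if not closer:
        return None
    worst_rtt = max(rtt_ms[n] for n in closer) or 1.0

    def score(n):
        remaining = ring_distance(n, key) / here  # fraction of distance left
        delay = rtt_ms[n] / worst_rtt             # RTT relative to the worst
        return weight * remaining + (1.0 - weight) * delay

    return min(closer, key=score)
```

With a current node at 1000, a key of 5000 and candidate neighbours 2000, 4000 and 4900 having RTTs of 5 ms, 80 ms and 200 ms respectively, a weight of 1.0 selects 4900, a weight of 0.0 selects 2000, and the default weight of 0.5 selects 4000 as the compromise.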
Non-structured P2P networks can be further categorised as pure P2P networks, centralised P2P networks or hybrid P2P networks. In a pure P2P network, a node that wants to obtain some data floods the network with a query or request. This data requesting node then receives a response or a number of responses from those nodes that can provide the requested data and must then select one of these possible data sources from which to retrieve the data. Once again, the selection often relies on simple RTT measurements when attempting to take account of the topology of the underlying network.
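The flooding and source selection steps in a pure P2P network might be sketched as follows. The hop limit (max_hops), the dictionary-based overlay and the function names are assumptions introduced for illustration; the RTT values are again assumed to have been measured separately.

```python
from collections import deque


def flood_query(overlay, holdings, origin, item, max_hops=3):
    """Flood a query through the overlay and collect nodes holding the item.

    overlay  : dict mapping each node to a list of its overlay neighbours
    holdings : dict mapping each node to the set of items it can provide
    max_hops : assumed hop limit that bounds how far the query spreads
    """
    seen = {origin}
    queue = deque([(origin, 0)])
    responders = []
    while queue:
        node, hops = queue.popleft()
        if node != origin and item in holdings.get(node, ()):
            responders.append(node)
        if hops == max_hops:
            continue  # do not forward the query any further
        for neighbour in overlay.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return responders


def select_source(responders, rtt_ms):
    """Choose the responder with the lowest measured RTT, or None if empty."""
    if not responders:
        return None
    return min(responders, key=lambda n: rtt_ms[n])
```

The sketch makes the limitation discussed in this section visible: the only network knowledge used in select_source is a single RTT figure per candidate.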
In a centralised P2P network, a data requesting node sends the request for data to some centralised look-up server or hub that stores a directory or database of the data that each node can provide. The centralised server then returns a list identifying possible source nodes from which the requesting node must select the source(s) it will contact to retrieve the data. In hybrid P2P networks, a routing hierarchy is established within the nodes of the network in which some nodes are defined as ‘superpeers’. The other (non-super) nodes of the P2P network send their data requests to their superpeer node, which then returns a list identifying possible source nodes. In other words, each superpeer node acts as a centralised server to a subset of the other nodes in the P2P network. In these P2P networks, the data requesting peer may either make RTT measurements to each of the possible data sources, or merely attempt to retrieve the data from the sources in the order in which they are listed. In some cases, the server or superpeer may determine the order of the list depending upon the uplink speed of each possible data source node.
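The look-up behaviour of a centralised server (or of a superpeer serving its subset of nodes) can be sketched as below. The Directory class, its method names, the uplink unit and the fetch callback are hypothetical and exist only for this illustration; they do not correspond to any specific P2P application.

```python
class Directory:
    """Toy look-up server: records which node offers which item."""

    def __init__(self):
        self._sources = {}  # item -> set of node identifiers
        self._uplink = {}   # node identifier -> uplink speed (assumed kbit/s)

    def register(self, node, items, uplink_kbps):
        """Record the items a node can provide and its uplink speed."""
        self._uplink[node] = uplink_kbps
        for item in items:
            self._sources.setdefault(item, set()).add(node)

    def lookup(self, item):
        """Return possible sources, fastest uplink first (ties by node id)."""
        nodes = self._sources.get(item, set())
        return sorted(nodes, key=lambda n: (-self._uplink[n], n))


def fetch_in_order(sources, fetch):
    """Try each listed source in turn.

    `fetch` is a caller-supplied function returning the data, or None on
    failure. Returns (node, data) for the first success, else (None, None).
    """
    for node in sources:
        data = fetch(node)
        if data is not None:
            return node, data
    return None, None
```

Here the requesting peer simply follows the order supplied by the server, which is the second of the two behaviours described above; the first would replace fetch_in_order with a selection based on RTT measurements to each listed source.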
In order to optimise the cost of the transport and the performance of collaborative applications (such as the P2P applications described above), relying on simple RTT measurements or on the ordering provided by some server or superpeer is inefficient, and more detailed knowledge of the network architecture and of the dynamic conditions within the network is required. However, details of the transport network are often hidden from the end-users.
Proactive network Provider Participation for P2P (P4P) has been proposed as an approach to optimise peer-to-peer connections using more detailed knowledge of the network. However, this approach is based on a centralised node that offers a global view of the network, and with which the P2P client must interact when selecting which neighbouring nodes to collaborate with. This centralised approach has a number of disadvantages. Firstly, it requires that the user's P2P application be configured to cooperate with the centralised node. Secondly, the P2P application must contact the centralised P4P node at least once every time it initiates collaborative communication with other peers, placing a significant load on this centralised node. Furthermore, whenever there is a change in the network topology the centralised node must be updated to reflect the new topology.