Data networks are based on “client-server” interactions. Servers provide, or serve, access to various network resources. Clients request access to the resources from the servers. An Internet Service Provider (ISP) is one example of a server. Clients request access to the Internet from the ISP and the ISP provides access. Once a client has access to the Internet through the ISP, the client may request access to a virtually endless variety of additional resources provided by millions of different servers on the Internet.
In a traditional client-server environment, resources are often concentrated in a relatively small number of centralized servers that service requests from a much larger number of clients. Concentrating resources on a few servers in this traditional approach has its drawbacks. For instance, the servers need to run on very fast machines that can process large volumes of client requests. A server machine and a network connection that can handle simultaneous transactions with thousands, or even millions, of clients are likely to be very expensive to purchase and maintain.
Furthermore, during times of exceptionally high network traffic, even the fastest centralized servers can become bottlenecks in the network. For instance, as a nation-wide news story breaks, the amount of time it takes to access the latest updates at most news websites tends to increase dramatically due to the increased network traffic. Moreover, if one of these large-scale centralized machines fails for any reason, millions of clients will be denied access to the desired resource. This vulnerability makes centralized servers prime targets for hackers and cyber-terrorists.
A less traditional approach to networking is based on peer-to-peer interactions. In a peer-to-peer environment, transactions take place between end-user, or peer, machines. Each peer can be both a consumer and a provider of resources. That is, a peer can act like a traditional client by requesting access to a resource stored elsewhere, and the same peer can also act like a traditional server by responding to a request for a resource stored at the peer.
Peer-to-peer networking theoretically has a number of advantages over the traditional approach. Rather than funneling huge volumes of client transactions to centralized servers, multiple peers can work together to replace entire servers. For instance, if a resource is copied and cached on many peer machines, a request for the resource can be serviced by any one of the peers. Several peers can even service the same request at the same time, each serving only a portion of the resource, which maximizes the aggregate download bandwidth for the resource. By spreading resources out over multiple peers, bottlenecks at centralized servers can be reduced. Also, end-user peer machines tend to be comparatively inexpensive and are already present in most networking environments. Using the cumulative processing power and bandwidth already available on peer machines can substantially decrease the overall cost of a network by reducing the number of expensive, centralized servers.
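As an illustration of several peers each serving only a portion of a resource, the following minimal sketch divides a resource of a given size into contiguous byte ranges, one per peer. The function name `split_ranges` and the peer identifiers are hypothetical and chosen only for this example; no particular transfer protocol is implied.

```python
def split_ranges(size: int, peers: list[str]) -> dict[str, tuple[int, int]]:
    """Assign each peer a contiguous half-open byte range [start, end).

    Hypothetical helper: ranges are as equal as possible, and any
    remainder bytes go to the first peers in the list.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if not peers:
        raise ValueError("at least one peer is required")

    base, extra = divmod(size, len(peers))
    ranges: dict[str, tuple[int, int]] = {}
    start = 0
    for index, peer in enumerate(peers):
        length = base + (1 if index < extra else 0)
        ranges[peer] = (start, start + length)
        start += length
    return ranges


# A 10-byte resource shared by three peers.
plan = split_ranges(10, ["peer-a", "peer-b", "peer-c"])
```

A requester could fetch all three ranges in parallel and concatenate them in order, so that the download is limited by the combined bandwidth of the peers rather than by that of any single one.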
An additional benefit of a peer-to-peer networking approach is the frictionless publishing of content. Given that in a peer-to-peer system, every peer machine can be both a consumer and a publisher of information, publishing information in such a system can be as easy as creating a new file. This completely eliminates the friction involved with publishing content in a client-server environment (such as, for instance, the steps involved in publishing on a corporate Intranet portal), and thus encourages users to share more content with each other. This type of sharing is especially relevant in an Intranet context that is made up of “islands of information” (such as user desktops and laptops) that contain up-to-date, enterprise-critical information that is normally never shared by users.
Unfortunately, most traditional approaches to managing network traffic and resources do not conveniently accommodate peer-to-peer networking. For instance, locating servers on the World Wide Web relies on the Domain Name System (DNS). DNS is a well-established hierarchical scheme used for naming Internet hosts, or servers. DNS was designed to resolve host names to Internet Protocol (IP) addresses. That is, DNS resolves the host name part of URLs that are meaningful to humans, such as www.USnews.com, to IP addresses that are meaningful to Internet routers and switches, such as 205.245.172.72. The IP address and related host name of a given host machine are propagated to various DNS servers throughout the Internet so that client requests made using the host's name (such as requesting a URL containing that particular host name) are appropriately routed to the correct IP address. Propagating or updating this information can take a comparatively long period of time, on the order of several hours or even days.
DNS works well for locating network resources that are all available at one particular host, and for hosts that have relatively static IP addresses.
Peers, however, often have dynamically allocated IP addresses. That is, each time a peer logs onto the Internet, the peer is likely to have a different IP address. For instance, a corporate user may alternately connect their laptop to the Internet from the office or from home, using two different IP addresses, neither of which may be statically assigned (assuming both the enterprise and the ISP use Dynamic Host Configuration Protocol (DHCP) for assigning host addresses). It is not uncommon for the duration between peer log-ons to be shorter than the time period needed to propagate a new IP address through the DNS system. In that case, most of the time, it would be impossible, or at least unreliable, to find a peer by using its DNS host name.
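The timing problem can be made concrete with a deliberately simplified toy model. It is not an implementation of DNS: the class `SlowDirectory`, its fixed propagation delay, and the host name and addresses used are all assumptions made for illustration (the addresses come from ranges reserved for documentation). Each registration becomes visible to lookups only after the delay has elapsed.

```python
from typing import Optional


class SlowDirectory:
    """Toy name directory whose updates become visible only after a delay."""

    def __init__(self, propagation_delay_hours: float) -> None:
        self.delay = propagation_delay_hours
        # name -> list of (time at which the update becomes visible, address),
        # assumed to be registered in chronological order
        self._updates: dict[str, list[tuple[float, str]]] = {}

    def register(self, name: str, address: str, now: float) -> None:
        """Record a new address; it is not visible until now + delay."""
        self._updates.setdefault(name, []).append((now + self.delay, address))

    def resolve(self, name: str, now: float) -> Optional[str]:
        """Return the most recent address that has finished propagating."""
        visible = [addr for ready, addr in self._updates.get(name, []) if ready <= now]
        return visible[-1] if visible else None


directory = SlowDirectory(propagation_delay_hours=24)
directory.register("laptop", "192.0.2.10", now=0)    # connected at the office
directory.register("laptop", "198.51.100.7", now=9)  # reconnected from home

# At hour 25 only the office address has propagated, although the
# laptop has been using the home address since hour 9.
stale = directory.resolve("laptop", now=25)
```

In this model a lookup at hour 25 still yields the office address, and any lookup before hour 24 yields nothing at all, which mirrors the situation described above where the peer changes address faster than updates propagate.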
Furthermore, a peer-to-peer resource that has been copied and cached may be simultaneously available at many locations. In a corporate Intranet environment, this could be the case for a marketing report that is accessed by several users. Caching resources is advantageous for a number of reasons. Firstly, it promotes availability of resources. For instance, if resources published from a laptop are systematically cached on an enterprise server, these resources will continue to be available after the publisher closes their laptop and leaves for the day. Secondly, caching allows resources to be kept “closer” (in terms of network topology) to where they are used. For instance, in a geographically distributed enterprise it may be advantageous to cache “hot” documents in each office location, to provide better response times when accessing those documents.
As new versions of the resource are created, however, old copies of the resource stored on other peers become outdated and should be invalidated (i.e., no longer used) as the new version propagates among the peers. This illustrates another shortcoming of DNS: because its granularity is not fine enough, it cannot keep track of individual resources, nor of resources that can be available at multiple locations, at locations having dynamically changing IP addresses, and at locations that potentially store different versions of the same resource.
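The kind of per-resource bookkeeping that DNS lacks can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the class `ResourceRegistry`, its method names, and the use of integer version numbers are all hypothetical. The registry records which version each peer holds and treats any copy older than the newest known version as invalid.

```python
class ResourceRegistry:
    """Hypothetical registry tracking which peers hold which version of a resource."""

    def __init__(self) -> None:
        self._latest: dict[str, int] = {}             # resource id -> newest version seen
        self._copies: dict[str, dict[str, int]] = {}  # resource id -> {peer id: version held}

    def record_copy(self, resource_id: str, peer_id: str, version: int) -> None:
        """Note that a peer now holds the given version of a resource."""
        self._copies.setdefault(resource_id, {})[peer_id] = version
        known = self._latest.get(resource_id, version)
        self._latest[resource_id] = max(known, version)

    def current_peers(self, resource_id: str) -> list[str]:
        """Return peers holding the newest version; older copies are ignored."""
        latest = self._latest.get(resource_id)
        copies = self._copies.get(resource_id, {})
        return sorted(peer for peer, held in copies.items() if held == latest)


registry = ResourceRegistry()
registry.record_copy("marketing-report", "alice-laptop", 1)
registry.record_copy("marketing-report", "office-cache", 1)
before_update = registry.current_peers("marketing-report")

# Publishing version 2 implicitly invalidates the cached version 1 copy.
registry.record_copy("marketing-report", "alice-laptop", 2)
after_update = registry.current_peers("marketing-report")
```

Once the cache fetches version 2 and records it, the cache reappears in the list of valid locations. Tracking is keyed by resource rather than by host, which is the finer granularity referred to above.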
Furthermore, when multiple peers hold copies of the same resource, it is important to be able to select a small set of “best”, or “closest”, copies for a given request. This ability requires tracking all equivalent peer locations that hold an up-to-date copy of the cached resource and can serve it.
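One possible way to select such a set is sketched below. It assumes that every known copy is described by a peer identifier, a version number, and some numeric measure of network distance such as a hop count; the `Copy` type, the `best_copies` function, and the distance values are hypothetical and serve only as an example.

```python
import heapq
from typing import NamedTuple


class Copy(NamedTuple):
    peer: str
    version: int
    distance: int  # assumed network distance to the requester, e.g. hop count


def best_copies(copies: list[Copy], k: int = 2) -> list[Copy]:
    """Return up to k up-to-date copies, nearest first."""
    if not copies:
        return []
    latest = max(c.version for c in copies)
    fresh = [c for c in copies if c.version == latest]  # drop outdated copies
    return heapq.nsmallest(k, fresh, key=lambda c: c.distance)


known = [
    Copy("peer-a", version=2, distance=5),
    Copy("peer-b", version=1, distance=1),  # nearest, but outdated
    Copy("peer-c", version=2, distance=2),
    Copy("peer-d", version=2, distance=9),
]
chosen = [c.peer for c in best_copies(known, k=2)]
```

Here the nearest peer is excluded because its copy is outdated, so the selection falls on the two closest peers that hold the current version.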