A peer-to-peer (also termed P2P) computer network is a network that relies primarily on the computing power and bandwidth of the participants in the computer network rather than concentrating computing power and bandwidth in a relatively low number of servers. P2P computer networks are typically used for connecting nodes of the computer network via largely ad hoc connections. The P2P computer network is useful for many purposes. Sharing content files containing, for example, audio, video and data is very common. Real time data, such as telephony traffic, is also passed using the P2P network.
A pure P2P network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both “clients” and “servers” to the other nodes on the network. This model of network arrangement differs from the client-server model in which communication is usually to and from a central server. A typical example for a non P2P file transfer is an FTP server where the client and server programs are quite distinct. In the FTP server clients initiate the download/uploads and the servers react to and satisfy these requests from the clients.
Some networks and channels, such as Napster, OpenNAP, or IRC @find, use a client-server structure for some tasks (e.g., searching) and a P2P structure for other tasks. Networks such as Gnutella or Freenet use the P2P structure for all purposes, and are sometimes referred to as true P2P networks, although Gnutella is greatly facilitated by directory servers that inform peers of the network addresses of other peers.
One of the most popular file distribution programs used in P2P networks is currently BitTorrent which was created by Bram Cohen. BitTorrent is designed to distribute large amounts of data widely without incurring the corresponding consumption in costly server and bandwidth resources. To share a file or group of files through BitTorrent, clients first create a “torrent file”. This is a small file which contains meta-information about the files to be shared and about the host computer (the “tracker”) that coordinates the file distribution. Torrent files contain an “announce” section, which specifies the URL of a tracker, and an “info” section which contains (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, which clients should use to verify the integrity of the data they receive.
The tracker is a server that keeps track of which seeds (i.e. a node with the complete file or group of files) and peers (i.e. nodes that do not yet have the complete file or group of files) are in a swarm (the expression for all of the seeds and peers involved in the distribution of a single file or group of files). Nodes report information to the tracker periodically and from time-to-time request and receive information about other nodes to which they can connect. The tracker is not directly involved in the data transfer and is not required to have a copy of the file. Nodes that have finished downloading the file may also choose to act as seeds, i.e. the node provides a complete copy of the file. After the torrent file is created, a link to the torrent file is placed on a website or elsewhere, and it is normally registered with the tracker. BitTorrent trackers maintain lists of the nodes currently participating in each torrent. The computer with the initial copy of the file is referred to as the initial seeder.
Using a web browser, users navigate to a site listing the torrent, download the torrent, and open the torrent in a BitTorrent client stored on their local machines. After opening the torrent, the BitTorrent client connects to the tracker, which provides the BitTorrent client with a list of clients currently downloading the file or files.
Initially, there may be no other peers in the swarm, in which case the client connects directly to the initial seeder and begins to request pieces. The BitTorrent protocol breaks down files into a number of much smaller pieces, typically a quarter of a megabyte (256 KB) in size. Larger file sizes typically have larger pieces. For example, a 4.37 GB file may have a piece size of 4 MB (4096 KB). The pieces are checked as they are received by the BitTorrent client using a hash algorithm to ensure that they are error free.
As further peers enter the swarm, all of the peers begin sharing pieces with one another, instead of downloading directly from the initial seeder. Clients incorporate mechanisms to optimize their download and upload rates. Peers may download pieces in a random order and may prefer to download the pieces that are rarest amongst it peers, to increase the opportunity to exchange data. Exchange of data is only possible if two peers have a different subset of the file. It is known, for example, in the BitTorrent protocol that a peer initially joining the swarm will send to other members of the swarm a BitField message which indicates an initial set of pieces of the digital object which the peer has available for download by other ones of the peers. On receipt of further ones of the pieces, the peer will send a Have message to the other peers to indicate that the further ones of the pieces are available for download.
Caches for the intermediate storage of data transferred about the Internet are known in the art. The most common type of cache used in the Internet is a proxy cache. The proxy cache operates at the application level, passing some messages unaltered between a client and a server, changing other ones of the messages and sometimes responding to the messages itself rather than relaying the messages. A web proxy cache sits between servers in the Internet and one or more clients and watches requests for HTML pages, images and files (collectively known as objects) pass through. The web proxy cache saves a copy of the HTML pages, images and files for itself. Subsequently if there is another request for the same object, the web proxy cache will use the copy that was saved instead of asking an origin server to resend the request.
There are three main reasons why such proxy caches are used:
i) In order to reduce latency—in this case, the request is satisfied from the cache (which is closer to the client) instead of the origin server. It therefore takes less time for the client to get the object and display the object. This makes web sites seem more responsive to the client.
ii) To reduce traffic—Each object is only retrieved once from the server, and thus the cache reduces the amount of bandwidth used by a client. This saves money if the client is paying for the traffic and keeps the client's bandwidth requirements lower and more manageable.
iii) To increase delivery speed.
It would be advantageous if the cache could participate in the peer-to-peer distribution network and become a member of a swarm. By becoming a member, the cache would reduce traffic, increase delivery speed and reduce latency in the peer-to-peer distribution network. In order to become a member, the cache must know about the distribution of the file.