Caches for the intermediate storage of data transferred over the Internet are known in the art. The most common type of cache used in the Internet is a proxy cache. The proxy cache operates at the application level, passing some messages unaltered between a client and a server, altering other messages and sometimes responding to the messages itself rather than relaying them. A web proxy cache sits between web servers and one or more clients and watches requests for HTML pages, music or audio files, video files, image files and data files (collectively known as digital objects) pass through. The web proxy cache saves a copy of the HTML pages, images and files for itself. Subsequently, if there is another request for the same object, the web proxy cache will use the copy that was saved instead of asking an origin server to resend the object.
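The behaviour described above can be illustrated with a minimal sketch. The class name ProxyCache, the fetch_from_origin callable and the hit/miss counters are hypothetical names introduced for illustration only; a real proxy cache would also deal with expiry, validation and storage limits, none of which is modelled here.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy proxy cache: serve a saved copy if one exists, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a hypothetical stand-in for a request
        # sent to the origin server.
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # A copy was saved earlier, so the origin server is not contacted.
            self.hits += 1
            return self._store[url]
        # First request for this object: fetch it and keep a copy.
        self.misses += 1
        obj = self._fetch(url)
        self._store[url] = obj
        return obj
```

In this sketch a second call to get() with the same URL is answered from the saved copy, which is the source of the latency and traffic savings discussed in the text.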
There are three main reasons why proxy caches are used:
   i) To reduce latency: the request is satisfied from the proxy cache (which is closer to the client) instead of the origin server. It therefore takes less time for the client to get the object and display the object. This makes web sites seem more responsive to the client.
   ii) To reduce traffic: each object is retrieved from the server only once, so the proxy cache reduces the amount of bandwidth used by an Internet Service Provider to the outside world and by the client. This saves money if the client is paying for the traffic and keeps the client's bandwidth requirements lower and more manageable.
   iii) To increase delivery speed.
The proxy caches may be provided by an Internet Service Provider at an access point and can continually store digital objects accessed by the ISP customers. For example, CacheLogic, Cambridge, UK, provides solutions which can be used by ISPs and others to reduce their traffic.
These solutions are documented briefly in the document “The Impact of P2P and the CacheLogic P2P Management Solution” (available 1 Aug. 2006 at http://www.cachelogic.com/products/resource/Intro_CacheLogic_P2P_Mgmt_Solution_v3.0.pdf).
Caches generally have both a fast access solid state memory and disk memory. It is known that the access time to the disk memory is substantially slower than the access time to the solid state memory. This is because access to data on the disk memory requires the mechanical movement of a reading head. Alternatively, a cache may have some local memory (solid state and/or disk) but may also have access to remote memory (solid state and/or disk). Accessing remote memory on a remote machine is also more expensive than accessing memory on the immediate machine.
One solution to speed up the access time would be to have solely solid state memory. However, this is extremely expensive. Given the large sizes of the caches used by ISPs, the cost is likely to be prohibitive for many ISPs except for very special applications. It would therefore be advantageous to provide a management system to improve the access times.
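The two tiers of memory described above can be pictured with the following sketch. It is an illustration only: the class TwoTierStore, the fast_capacity parameter and the least-recently-used promotion policy are assumptions made for this example and are not taken from the text, which does not specify any particular management policy.

```python
from collections import OrderedDict
from typing import Dict, Tuple


class TwoTierStore:
    """Toy model of a cache with a small fast tier in front of a large slow tier."""

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        # Stands in for solid state memory (small, quick to access).
        self.fast: "OrderedDict[str, bytes]" = OrderedDict()
        # Stands in for disk memory (large, slow to access).
        self.slow: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # New data always lands in the slow tier in this sketch.
        self.slow[key] = value

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return the value and the name of the tier that served it."""
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key], "fast"
        value = self.slow[key]  # raises KeyError if the key is unknown
        # Assumed policy: promote on access, evict the least recently used.
        self.fast[key] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)
        return value, "slow"
```

Under this assumed policy, repeated requests for the same data are served from the fast tier, while data that has not been requested recently falls back to the slow tier.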
A peer-to-peer (also termed P2P) computer network is a network that relies primarily on the computing power and bandwidth of the participants in the computer network rather than concentrating computing power and bandwidth in a relatively low number of servers. P2P computer networks are typically used for connecting nodes of the computer network via largely ad hoc connections. The P2P computer network is useful for many purposes. Sharing content files containing, for example, audio, video and data is very common. Real time data, such as telephony traffic, is also passed using the P2P network.
A pure P2P network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both “clients” and “servers” to the other nodes on the network.
This model of network arrangement differs from the client-server model in which communication is usually to and from a central server. A typical example of a non-P2P file transfer is an FTP server, where the client and server programs are quite distinct. In FTP, the clients initiate the downloads/uploads and the servers react to and satisfy these requests from the clients.
Some networks and channels, such as Napster, OpenNAP, or IRC@find, use a client-server structure for some tasks (e.g., searching) and a P2P structure for other tasks. Networks such as Gnutella or Freenet use the P2P structure for all purposes, and are sometimes referred to as true P2P networks, although Gnutella is greatly facilitated by directory servers that inform peers of the network addresses of other peers.
One of the most popular file distribution programmes used in P2P networks is currently BitTorrent which was created by Bram Cohen. BitTorrent is designed to distribute large amounts of data widely without incurring the corresponding consumption in costly server and bandwidth resources. To share a file or group of files through BitTorrent, clients first create a “torrent file”. This is a small file which contains meta-information about the files to be shared and about the host computer (the “tracker”) that coordinates the file distribution. Torrent files contain an “announce” section, which specifies the URL of a tracker, and an “info” section which contains (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, which clients should use to verify the integrity of the data they receive.
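The contents of a torrent file as described above can be sketched as follows. The function build_metainfo and the tracker URL used in the usage note are hypothetical. The sketch builds the meta-information as a plain Python dictionary only; an actual torrent file stores this structure in a serialized (bencoded) form, which is omitted here. The per-piece SHA-1 digests are concatenated into a single byte string, as is done in BitTorrent metainfo.

```python
import hashlib
from typing import Any, Dict


def build_metainfo(name: str, data: bytes, tracker_url: str,
                   piece_length: int = 256 * 1024) -> Dict[str, Any]:
    """Build a dictionary with an 'announce' section and an 'info' section."""
    if piece_length < 1:
        raise ValueError("piece_length must be positive")
    # One 20-byte SHA-1 digest per piece, in piece order.
    digests = [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]
    return {
        "announce": tracker_url,           # URL of the tracker
        "info": {
            "name": name,                  # suggested file name
            "length": len(data),           # file length in bytes
            "piece length": piece_length,  # size of each piece in bytes
            "pieces": b"".join(digests),   # concatenated SHA-1 digests
        },
    }
```

For example, build_metainfo("example.bin", payload, "http://tracker.example/announce") would describe a single file held in the bytes object payload; both the file name and the URL are placeholders.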
The tracker is a server that keeps track of which seeds (i.e. nodes with the complete file or group of files) and peers (i.e. nodes that do not yet have the complete file or group of files) are in a swarm (the expression for all of the seeds and peers involved in the distribution of a single file or group of files). Nodes report information to the tracker periodically and from time to time request and receive information about other nodes to which they can connect. The tracker is not directly involved in the data transfer and is not required to have a copy of the file. Nodes that have finished downloading the file may also choose to act as seeds, i.e. the node provides a complete copy of the file. After the torrent file is created, a link to the torrent file is placed on a website or elsewhere, and it is normally registered with the tracker. BitTorrent trackers maintain lists of the nodes currently participating in each torrent. The computer with the initial copy of the file is referred to as the initial seeder.
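The bookkeeping role of the tracker can be sketched as below. The class Tracker, its method names and the use of "host:port" strings as node addresses are assumptions made for illustration; the sketch does not reproduce the actual tracker protocol and, consistent with the text, holds no file data.

```python
from typing import Dict, List


class Tracker:
    """Toy tracker: records which nodes are in each swarm and who is a seed."""

    def __init__(self) -> None:
        # torrent identifier -> {node address: True if the node is a seed}
        self._swarms: Dict[str, Dict[str, bool]] = {}

    def announce(self, torrent_id: str, node: str,
                 has_complete_file: bool) -> List[str]:
        """Record a node's report and return the other nodes in the swarm."""
        swarm = self._swarms.setdefault(torrent_id, {})
        swarm[node] = has_complete_file
        return sorted(other for other in swarm if other != node)

    def seeds(self, torrent_id: str) -> List[str]:
        swarm = self._swarms.get(torrent_id, {})
        return sorted(n for n, complete in swarm.items() if complete)

    def peers(self, torrent_id: str) -> List[str]:
        swarm = self._swarms.get(torrent_id, {})
        return sorted(n for n, complete in swarm.items() if not complete)
```

A node that finishes downloading and chooses to act as a seed would, in this sketch, simply announce again with has_complete_file set to True.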
Using a web browser, users navigate to a site listing the torrent, download the torrent, and open the torrent in a BitTorrent client stored on their local machines. After opening the torrent, the BitTorrent client connects to the tracker, which provides the BitTorrent client with a list of clients currently downloading the file or files.
Initially, there may be no other peers in the swarm, in which case the client connects directly to the initial seeder and begins to request pieces. The BitTorrent protocol breaks down files into a number of much smaller pieces, typically a quarter of a megabyte (256 KB) in size. Larger file sizes typically have larger pieces. For example, a 4.37 GB file may have a piece size of 4 MB (4096 KB). The pieces are checked as they are received by the BitTorrent client using a hash algorithm to ensure that they are error free.
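The division of a file into pieces and the checking of received pieces can be sketched as follows. The function names piece_count and verify_piece are hypothetical, and SHA-1 is used because it is the hash named for torrent files in the description above.

```python
import hashlib
from typing import Sequence


def piece_count(file_size: int, piece_length: int) -> int:
    """Number of pieces needed to cover the file; the last one may be shorter."""
    if piece_length < 1:
        raise ValueError("piece_length must be positive")
    return -(-file_size // piece_length)  # ceiling division on integers


def verify_piece(index: int, payload: bytes,
                 expected_digests: Sequence[bytes]) -> bool:
    """Check a received piece against the digest listed for that piece index."""
    return hashlib.sha1(payload).digest() == expected_digests[index]
```

Assuming binary units (1 GB taken as 2^30 bytes and 1 MB as 2^20 bytes), piece_count(int(4.37 * 2**30), 4 * 2**20) gives 1119 pieces for the 4.37 GB example above. A piece for which verify_piece returns False would be discarded and requested again.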
As further peers enter the swarm, all of the peers begin sharing pieces with one another, instead of downloading directly from the initial seeder. Clients incorporate mechanisms to optimize their download and upload rates. Peers may download pieces in a random order and may prefer to download the pieces that are rarest amongst their peers, to increase the opportunity to exchange data. Exchange of data is only possible if two peers have different subsets of the file. It is known, for example, in the BitTorrent protocol that a peer initially joining the swarm will send to other members of the swarm a content availability message in the form of a BitField message which indicates an initial set of pieces of the digital object which the peer has available for download by other ones of the peers. On receipt of further ones of the pieces, the peer will send further content availability messages in the form of Have messages to the other peers to indicate that the further ones of the pieces are available for download.
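The preference for rare pieces, together with the BitField and Have messages, can be illustrated with the sketch below. It is a simplification under stated assumptions: the availability announced in a BitField message is modelled as a Python set of piece indices rather than a packed bit array, the function names rarest_first and apply_have are hypothetical, and ties are broken by piece index rather than at random.

```python
from collections import Counter
from typing import Dict, List, Set


def apply_have(peer_pieces: Dict[str, Set[int]], peer: str,
               piece_index: int) -> None:
    """Record a Have message: the peer now offers one more piece."""
    peer_pieces.setdefault(peer, set()).add(piece_index)


def rarest_first(my_pieces: Set[int],
                 peer_pieces: Dict[str, Set[int]]) -> List[int]:
    """Order the pieces still needed so that the rarest among peers come first."""
    availability: Counter = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)  # count how many peers offer each piece
    wanted = [p for p in availability if p not in my_pieces]
    # Fewest holders first; ties broken by index (an assumption of this sketch).
    return sorted(wanted, key=lambda p: (availability[p], p))
```

Here peer_pieces would be initialised from the BitField message received from each peer and then updated by apply_have each time a Have message arrives, so that the ordering reflects the current availability in the swarm.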
The substantial increase in traffic over P2P networks in the past few years has increased the demand for P2P caches and also for alternative P2P management techniques. In particular, there is a need to ensure that the required pieces of the digital object are preferably available with short access times.