1. Field of the Disclosure
The disclosure relates generally to data transfer, and in particular, to optimizing transfer times in a peer-to-peer network.
2. The Prior Art
One common use of the Internet since its inception has been transferring and downloading files. The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP operate. The clients only speak to the server, and never to each other.
The main advantage of the client-server model is its simplicity: a user logs into a server and initiates the download process. Additionally, files are usually available for long periods of time, as the servers tend to be dedicated to the task of serving and are always on and connected to the Internet.
However, the client-server model has significant problems with files that are large or very popular, or both, such as newly released content. In particular, a great deal of bandwidth and server resources must be dedicated to distributing each file, since the server must transmit the entire file to each client. The concept of server mirrors partially addresses this shortcoming by distributing the load across multiple servers; however, setting up an efficient network of mirrors requires coordination between sites and much effort. Hence, mirroring is typically feasible only for the busiest of sites.
Another method of transferring files has become popular recently: the peer-to-peer network (P2P), including systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In a typical peer-to-peer network, Internet users trade files by directly connecting to each other, i.e., on a one-to-one basis. Files may then be shared without having to access a central server. Because of the anonymity of this process, there is little accountability regarding the copyright protection of the files, and hence these networks tend to be very popular for the transfer of illicit files such as music, movies, pirated software, etc.
Typically, a downloader receives a file from a single peer source; however, newer versions of some clients allow downloading a single file from multiple sources to achieve higher speeds. The problem of popular downloads discussed above is somewhat mitigated, because there is a greater chance that a popular file will be offered by a number of peers. The breadth of files available tends to be fairly wide, though download speeds for obscure files tend to be lower.
Another common problem associated with peer-to-peer systems is the significant protocol overhead for passing search queries amongst the peers, and the number of peers that one can reach is often limited as a result. Partially downloaded files are usually not available to other peers, although some newer clients may offer this functionality. Availability is generally dependent on the goodwill of the users, to the extent that some of these networks have tried to enforce rules or restrictions regarding send/receive ratios.
Usenet binary newsgroups represent yet another method of file distribution that is substantially different from the other methods. Files transferred over Usenet are often subject to minuscule windows of opportunity. Typical retention times of binary news servers are often as low as 24 hours, and having a posted file available for a week is considered a long time. However, the Usenet model is relatively efficient, in that the messages are passed around a large web of peers from one news server to another, and finally fanned out to the end user from there. Often the end user connects to a server provided by his or her ISP, resulting in further bandwidth savings.
Usenet is also one of the more anonymous forms of file sharing, and thus too often is used for illicit files of almost any nature. Due to the nature of NNTP (Network News Transfer Protocol), a file's popularity has little to do with its availability, and hence downloads from Usenet tend to be quite fast regardless of content. The downsides of this method include an elaborate set of rules and procedures, so efficient downloading requires a certain amount of effort and understanding from the user. Patience is often required to get a complete file, because large files are split into a number of smaller segments. Finally, access to Usenet often must be purchased due to the extremely high volume of messages in the binary groups.
BitTorrent is a newer protocol designed for transferring files in a peer-to-peer fashion. In BitTorrent, users connect directly to each other to send and receive portions of the file. However, there is a central server (called a tracker) which coordinates the action of all such peers. The tracker only manages connections and does not have any knowledge of the contents of the files being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth. The key philosophy of BitTorrent is that users should upload (transmit outbound) at the same time they are downloading (receiving inbound). In this manner, network bandwidth is utilized as efficiently as possible. BitTorrent is designed to work better as the number of people interested in a particular file increases, in contrast to other file transfer protocols where more users tend to bog the system down.
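The coordinating role of the tracker described above can be illustrated with a minimal sketch. The class name `Tracker`, the method `announce`, and the identifiers used for files and peers are hypothetical and chosen only for illustration; this is not the actual BitTorrent tracker protocol. The sketch is intended only to show that a tracker of this kind may store peer addresses per file without ever handling file contents.

```python
from collections import defaultdict


class Tracker:
    """Hypothetical in-memory tracker: maps a file identifier to known peers."""

    def __init__(self):
        # file_id -> set of peer addresses; no file data is ever stored here.
        self._swarms = defaultdict(set)

    def announce(self, file_id, peer_address):
        """Register a peer for a file and return the other peers already known."""
        others = sorted(self._swarms[file_id] - {peer_address})
        self._swarms[file_id].add(peer_address)
        return others


tracker = Tracker()
tracker.announce("example-file", "10.0.0.1:6881")
print(tracker.announce("example-file", "10.0.0.2:6881"))  # ['10.0.0.1:6881']
```

Because the tracker in this sketch exchanges only short peer lists, its bandwidth cost per peer is small compared with the cost of transmitting the file itself.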
One type of file that is becoming more common is the progressive resolution file, in which lower resolution versions are embedded within the higher resolution file. Such files are also referred to as coded image files. JPEG2000 is an example of such a format, in which a lower resolution version of the same file provides a complete image, just at a lower resolution when compared to the corresponding full-version image file.
Thus, in a P2P network having a progressive image file, some peers will have lower resolution pieces and some will have higher resolution pieces that correspond to the same image file. A challenge therefore exists to determine the optimal transfer pattern when performing a parallel file transfer from a given set of peers that have differing pieces of a desired file.
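The situation may be illustrated with a short sketch. It assumes, for illustration only, that each peer holds a prefix of the file, i.e., the first k pieces, with a larger k corresponding to a higher resolution version. The function name `piece_availability` and the peer names are hypothetical.

```python
def piece_availability(peer_prefix_lengths, total_pieces):
    """Count how many peers hold each piece.

    peer_prefix_lengths maps a peer name to the number of leading pieces
    that peer holds (an assumed model of a progressive file).
    """
    counts = [0] * total_pieces
    for length in peer_prefix_lengths.values():
        for piece in range(min(length, total_pieces)):
            counts[piece] += 1
    return counts


# Three peers holding 2, 3, and 4 leading pieces of a 4-piece file.
peers = {"A": 2, "B": 3, "C": 4}
print(piece_availability(peers, 4))  # [3, 3, 2, 1]
```

Under this assumption, the pieces near the beginning of the file are held by every peer, while the pieces near the end are held only by the peers having the highest resolution versions.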
The BitTorrent protocol breaks files into blocks and attempts to find peers that together contain all of the blocks of a file desired by a peer. In BitTorrent, a ‘seed’ is a peer that contains a full version of a particular file. Peers known as ‘leeches’ request a file and begin to download pieces of the file. As more leeches request the file, the leeches begin to ‘swarm’ and share various pieces of the file amongst themselves. BitTorrent demands that leeches share the pieces they have downloaded with other peers, rather than having the seed provide pieces that already exist in the swarm. BitTorrent thereby forces a swarm of peers to share amongst themselves whenever possible, balancing the bandwidth across the swarm. As long as there is one seed with a complete version of the file, all leeches will eventually acquire a full version of the file.
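The requirement that the peers together contain all of the blocks of a file may be expressed as a simple coverage check. The following is a minimal sketch under the assumption that blocks are numbered from zero; the function name `missing_blocks` and the peer names are hypothetical and are not part of the BitTorrent protocol.

```python
def missing_blocks(peer_blocks, total_blocks):
    """Return the block indices that no peer in the swarm holds.

    peer_blocks maps a peer name to the set of block indices it holds.
    """
    held = set().union(*peer_blocks.values())
    return sorted(set(range(total_blocks)) - held)


# A swarm containing a seed always covers the whole file.
print(missing_blocks({"seed": {0, 1, 2, 3}, "leech": {1}}, 4))  # []
# A swarm of leeches alone may not.
print(missing_blocks({"leech1": {0, 1}, "leech2": {1, 3}}, 4))  # [2]
```

An empty result indicates that every block is available from at least one peer, which is the condition under which all leeches can eventually complete the file.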
However, as the emphasis of BitTorrent is on bandwidth sharing, there is little emphasis on optimizing the transfer time of files. Rather, BitTorrent aims to saturate a given link through a series of heuristics and rotating transfer attempts.
Some effort has been made in the prior art to examine the consequences of transferring progressive image files in a P2P scenario. One such example, which examines the consequences of a shrinking pool of peers as those peers with smaller versions of a coded file drop out of the peer supply pool, is found in X. Su and R. Fatoohi, “Scalable Coded Image Transmissions over Peer-to-Peer Networks,” Proc. IEEE International Conference on Multimedia and Expo, pp. 493-496, July, 2003.
However, such algorithms tend to download available pieces either from the fastest source first, or from an optimized list of sources ordered from peers having the beginning of the file to peers having the end of the file. As a result, such algorithms will tend to provide the end piece of the file last, as this is the piece that generally is the least available.
Hence, there is a need for a parallel file transfer algorithm that achieves optimal transfer time in a given domain.
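The effect of leaving the end piece for last may be illustrated with a simplified sketch. It is not the algorithm of the cited reference. It assumes, for illustration only, that each peer holds a prefix of the file, transfers one piece at a time at a fixed rate, and that each piece is assigned greedily to the holder that would finish it soonest. The function name `greedy_schedule` and the peer parameters are hypothetical.

```python
def greedy_schedule(peers, piece_order):
    """Assign pieces to peers greedily in the given order.

    peers maps a peer name to (seconds_per_piece, prefix_length).
    Returns (assignment, finish_time), where finish_time is the time at
    which the last peer finishes its assigned pieces.
    """
    busy_until = {name: 0.0 for name in peers}
    assignment = {}
    for piece in piece_order:
        # Only peers whose prefix includes this piece can supply it.
        holders = [n for n, (_, length) in peers.items() if piece < length]
        if not holders:
            raise ValueError(f"no peer holds piece {piece}")
        # Pick the holder that would complete this piece earliest.
        best = min(holders, key=lambda n: busy_until[n] + peers[n][0])
        busy_until[best] += peers[best][0]
        assignment[piece] = best
    return assignment, max(busy_until.values())


# "fast" holds pieces 0-1 at 2 s/piece; "slow" holds pieces 0-2 at 3 s/piece.
peers = {"fast": (2.0, 2), "slow": (3.0, 3)}
print(greedy_schedule(peers, [0, 1, 2])[1])  # 6.0 (end piece scheduled last)
print(greedy_schedule(peers, [2, 1, 0])[1])  # 4.0 (end piece scheduled first)
```

In this example, processing the pieces from the beginning of the file commits the only peer holding the end piece to an earlier piece, so the scarce end piece is delayed and the overall transfer takes longer than when that piece is scheduled first.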