1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to mapping file fragments to file information and tagging in a segmented file sharing network.
2. Description of Related Art
Peer-to-peer file sharing programs are designed to widely distribute large amounts of data, while minimizing costly server and bandwidth resources. Peer-to-peer (P2P) systems, such as the BITTORRENT P2P file sharing system, have gained a wide following. P2P systems have recently been put to commercial use through partnerships with content providers, such as media and cable companies. P2P networks are gaining credibility as a means for legal, revenue-generating activity, which creates a need for methods to rapidly optimize content delivery.
In one implementation, a file is made available for P2P download by providing a link to file information, often stored on a hypertext transfer protocol (HTTP), or Web, server. In the BITTORRENT file sharing system, this file information is referred to as a “torrent.” The file information may include, for example, file name, file length, and hashing information.
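For illustration only, the file information described above can be sketched as a small record holding the file name, the file length, and one hash per fixed-size piece. The names `FileInfo` and `build_file_info` are hypothetical, and the choice of SHA-1 and a 256 KB piece size are assumptions made for this sketch rather than requirements of any particular system.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class FileInfo:
    """Hypothetical container for the file information served to peers."""
    name: str
    length: int                # total file length in bytes
    piece_length: int          # size of each piece in bytes (last may be shorter)
    piece_hashes: List[bytes]  # one digest per piece, in file order


def build_file_info(name: str, data: bytes,
                    piece_length: int = 256 * 1024) -> FileInfo:
    """Split data into fixed-size pieces and record a SHA-1 digest for each."""
    hashes = [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]
    return FileInfo(name=name, length=len(data),
                    piece_length=piece_length, piece_hashes=hashes)
```

A 10-byte file with a 4-byte piece length, for example, would yield three hashes, the last covering only the final 2 bytes.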
The file information may also include the address of a tracker, which is a device in the P2P network that helps downloaders (peers) to find each other. Peers communicate with the host of the file information and the tracker using a simple protocol layered on top of HTTP. Each peer sends information about which file it is downloading, on what port it is listening, and other information. The tracker responds with a list of contact information for peers that are downloading the same file. This communication between a peer and a tracker requires much less bandwidth than a direct server-to-client file download.
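The tracker exchange described above can be sketched, under simplifying assumptions, as an in-memory registry keyed by file. The class `Tracker`, its `announce` method, and the `file_id` parameter are hypothetical names for this illustration; a real tracker would receive these values in an HTTP request and would also handle peer expiry, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Peer = Tuple[str, int]  # (host address, listening port)


class Tracker:
    """Hypothetical in-memory tracker: maps each file to the peers sharing it."""

    def __init__(self) -> None:
        self._swarms: Dict[str, Set[Peer]] = defaultdict(set)

    def announce(self, file_id: str, host: str, port: int) -> List[Peer]:
        """Register the caller and return the other peers for the same file."""
        peer = (host, port)
        others = sorted(self._swarms[file_id] - {peer})
        self._swarms[file_id].add(peer)
        return others
```

Only short contact records cross the wire in this exchange, which is why the tracker's bandwidth cost stays small relative to transferring the file itself.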
In a typical P2P implementation, a file is divided into pieces of fixed size, e.g., 256 KB. Each downloader reports to its peers which pieces it has. Also, each downloader, at some point, uploads file pieces, also referred to as segments or fragments, to its peers. Whenever a downloader finishes downloading a file fragment, the P2P client software performs a hash of the file fragment and compares the hash to an expected hash value, received in the file information, to determine whether the file fragment downloaded correctly and has not been corrupted. If the file fragment downloaded correctly, then the client reports to its peers that it has the file fragment available for upload.
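The verification step described above might look like the following minimal sketch. The class `Downloader` and its method name are hypothetical, and SHA-1 is assumed as the hash function purely for illustration. A fragment is marked as available for upload only when its computed hash matches the expected value from the file information.

```python
import hashlib
from typing import List, Set


class Downloader:
    """Hypothetical client-side bookkeeping for downloaded fragments."""

    def __init__(self, expected_hashes: List[bytes]) -> None:
        self.expected_hashes = expected_hashes  # taken from the file information
        self.available: Set[int] = set()        # fragment indices offered to peers

    def on_fragment_downloaded(self, index: int, fragment: bytes) -> bool:
        """Hash the fragment; advertise it only if it matches the expected hash."""
        ok = hashlib.sha1(fragment).digest() == self.expected_hashes[index]
        if ok:
            self.available.add(index)
        return ok
```

A fragment that fails the comparison is simply not advertised; in practice the client would discard it and request it again from another peer.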
At least one peer must start with the whole file. This peer is referred to as a “seed.” Eventually other peers will possess the whole file, or at least every file fragment will be found on at least one client. Some peers may leave the network before possessing the whole file, while others may remain in the network well after completing retrieval of the file. The goal is to balance downloading clients with uploading clients.
Several techniques or policies may be used to ensure that it is possible to download the entire file. For example, the tracker may return a random list of peers to each new participant in the download. As another example, P2P clients may attempt to request the rarest file fragment first. As more peers request the rarest file fragment, another file fragment becomes the rarest, and so forth. This technique helps to distribute the demand for particular file fragments evenly. Other techniques, such as “random first” and “endgame mode,” may be used; however, they are not a focus of this disclosure and will not be discussed in detail.
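As a rough sketch of the rarest-first policy mentioned above, a client can count how many known peers hold each fragment it still needs and request the one with the lowest count. The function `pick_rarest` and its parameters are hypothetical. Ties are broken here by lowest fragment index so the result is deterministic, which is an assumption of this sketch; an actual client might break ties randomly.

```python
from collections import Counter
from typing import Dict, Optional, Set


def pick_rarest(have: Set[int],
                peer_fragments: Dict[str, Set[int]]) -> Optional[int]:
    """Return the needed fragment held by the fewest peers, or None if none."""
    counts: Counter = Counter()
    for fragments in peer_fragments.values():
        # Count only fragments this client does not already have.
        counts.update(fragments - have)
    if not counts:
        return None
    # Lowest availability first; lowest index breaks ties.
    return min(counts, key=lambda index: (counts[index], index))
```

For instance, if three peers hold fragments {0, 1, 2}, {0, 1}, and {0} respectively, a client holding nothing would request fragment 2 first, since only one peer has it.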
Content distribution among peers increases in efficiency with the number of peers who are sharing that content on a network. A network of peers participating in distributing a particular file is referred to as a “swarm.” Swarms are formed around the retrieval of a particular file and are composed of peers retrieving (downloading) and sharing (uploading) file fragments simultaneously. The larger the swarm, the faster the per-peer retrieval of that file, and the more distributed the bandwidth cost becomes for each participant client device.