Field of the Invention
The present invention relates to application of information associated with and derived from peer-to-peer networks. More particularly, the present invention relates to data mining on peer-to-peer networks, including the surveying and collection of target content for downstream application in various business environments and legal enforcement efforts.
Description of the Related Art
The broad accessibility of broadband internet service has allowed users to quickly and often illegally download media files such as music, movies, and games.
Data assets can be shared on and across networks using a variety of devices and protocols. Moreover, the activity and identity of users can constitute valuable information when properly surveyed and cataloged. The sharing and distribution of information via electronic communication networks has traditionally followed the client-server model, in which a central server sends the entire file to each client that requests it. The clients communicate only with the server, and never with each other. The main advantage of this method is that it is simple to set up. However, this method can be problematic with files that are large or very popular, because the server must transmit the entire file to each client, which requires a great deal of bandwidth and server resources. Mirrors partially address this shortcoming by distributing the load across multiple servers, but at significant expense.
Another popular method of transferring media uses a peer-to-peer network. BitTorrent is the most popular protocol for transferring large files over peer-to-peer networks and has accounted for a large percentage of total internet traffic. BitTorrent works by separating a file to be transferred into many small pieces that are distributed among multiple computers. A computer may receive one piece of the file from one computer while simultaneously receiving another piece from a different computer. Any computer can upload pieces it has already downloaded to any other computer that lacks those pieces.
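By way of non-limiting illustration, the piece-based transfer described above can be sketched as follows. The sketch assumes a fixed piece length and a SHA-1 digest per piece, as in the original BitTorrent protocol; the function names (split_into_pieces, piece_hashes, reassemble) are hypothetical and chosen for this example only.

```python
import hashlib


def split_into_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut data into consecutive pieces; the final piece may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """Return one SHA-1 hex digest per piece, used to verify received pieces."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]


def reassemble(received: dict[int, bytes], total: int) -> bytes:
    """Join pieces by index; pieces may have arrived in any order, from any peer."""
    missing = [i for i in range(total) if i not in received]
    if missing:
        raise ValueError(f"missing pieces: {missing}")
    return b"".join(received[i] for i in range(total))
```

Because each piece is identified by its index and verified by its digest, a receiving computer can accept piece 2 from one peer and piece 0 from another and still reconstruct the original file.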
The peer-to-peer model has superseded the client-server model in many areas of use, particularly in that of file sharing, in regard to both legitimate and illegal uses (e.g., in violation of copyrights that pertain to the data content in the files that are shared on the network). In contrast to the client-server model, the P2P model conflates clients and servers such that each is a node (also called a peer) that can be both a client and a server at the same time; nodes are generally assigned the same properties and privileges such that any node can access information stored in other nodes and provide information to other nodes. Thus, a network comprising nodes/peers is called a peer-to-peer (P2P) network. P2P networks often comprise overlay networks on top of an existing IP network such as the Internet. A well-known example of a P2P network is the set of nodes (such as personal computers) connected to each other using the P2P protocol BitTorrent; note therefore that a node may be regarded as both a data structure and a computer device, simultaneously or alternatively, as is understood by persons of ordinary skill in the art.
Peer-to-peer protocols are used to distribute a wide range of content to millions of people. The content typically comprises large data files or a collection of related files, such as multimedia containing movies and music, but more abstractly content can be any type of data elements (e.g., any object residing in computer memory). Because data often has commercial value, said content may be termed “data assets.” The fundamental design of P2P protocols can be broadly described as comprising two distinguishable methods for the coordinating of file sharing: a Centralized Method and a Decentralized Method. These broad types continue to evolve and spawn new variants of methods, and these two category names have been assigned here for the purposes of imposing a conceptual model only, and they are intended to be construed broadly to encompass the full range of P2P methods applied in the relevant technological arts.
Briefly, the Centralized Method uses one or more servers designated as a “tracker” to coordinate communication and data exchanges with peers.
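As a minimal sketch only, the coordinating role of a tracker can be modeled as a registry that maps a torrent identifier to the set of peers that have announced themselves. The class ToyTracker and its announce method are hypothetical names for this illustration; a real tracker communicates over HTTP or UDP and carries additional fields that are omitted here.

```python
from collections import defaultdict


class ToyTracker:
    """In-memory stand-in for a tracker: it stores swarm membership only."""

    def __init__(self) -> None:
        # Maps a torrent identifier to the set of (ip, port) peers in its swarm.
        self._swarms: dict[str, set[tuple[str, int]]] = defaultdict(set)

    def announce(self, info_hash: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        """Register the peer and return the other peers already in the swarm."""
        others = sorted(self._swarms[info_hash] - {peer})
        self._swarms[info_hash].add(peer)
        return others
```

Note that in this model the tracker never handles the shared content itself; it only tells peers about one another.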
In the BitTorrent scheme, sets of files (“torrents”) are pointed to by a small file called a “torrent file,” and the contents of a torrent may include multimedia data files, URL identifiers, executable files and data objects. For example, a network for sharing motion picture files would comprise torrent files comprising pointers to the movie file desired by the requesting user, plus a tracking file and associated images and text that provide additional entertainment content related to the main multimedia file. The term “content” can refer to any and all of such contents of a torrent. Users of BitTorrent systems and services are often permitted to discover content on a particular P2P network via a web-based torrent search engine, which may be privately maintained by the network owner (e.g., a portal or website) or publicly presented through third parties, such as commercial search engines (e.g., Google). In the latter case, torrent files may be specifically identified by using search parameters that limit the search results to torrent files, which carry the “.torrent” extension.
When a user obtains a torrent file, they are acquiring a small file that contains information about the larger files to be downloaded to their local machine. The torrent file tells the torrent client (a local application on the user's machine) the names of the files being shared, the URL for the tracker, and more. Popular torrent clients known in the art include, for example, uTorrent, Vuze, Transmission, and Deluge. The local torrent client then calculates a hash code specific to that torrent, analogous to an ISBN, a catalog number, or a fingerprint. This hash code is then used to identify the desired content distributed among the other nodes on the network, so that it can be downloaded by the client. P2P file sharing is generally faster than the client-server model, and therefore more suitable for sharing large files, because it acquires the target content of a torrent in subdivided pieces that are downloaded from many nodes on the network in parallel, rather than downloading an entire file from just one memory location.
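By way of illustration, the hash code calculation described above can be sketched under the assumption that it follows the original BitTorrent convention, in which the code is the SHA-1 digest of the bencoded "info" dictionary of the torrent file. The minimal encoder below and the sample dictionary fields in the usage note are illustrative only and do not represent any particular client's implementation.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder for ints, bytes, str, lists, and dicts."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 hex digest of the bencoded info dictionary (assumed convention)."""
    return hashlib.sha1(bencode(info)).hexdigest()
```

For example, info_hash({"name": "example.bin", "piece length": 16384, "length": 2560, "pieces": b""}) yields a 40-character hexadecimal string. Because dictionary keys are sorted before encoding, the same metadata always produces the same code, which is what allows the code to serve as a fingerprint for the torrent.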
One example of an emerging adaptation in P2P systems is the magnet link. A magnet link is essentially a hyperlink containing the hash code for a torrent, which is passed to the local torrent client, immediately enabling the identification of peers and the download of torrents from nodes. Magnet links can avoid the requirement of using a tracker because of their use of distributed hash tables (DHT). Many P2P service providers, on their web-based portals or browser-based search engines, now offer magnet links in conjunction with each instance of a downloadable torrent file. The present invention is adapted to accommodate network activity using magnets and other such variants, which are essentially modifications of the same P2P networks. Other adaptations known in the art include “Peer Exchange” (PEX) and “trackerless” torrents. From the perspective of individual users, these adaptations are often effectively invisible, since the local client often handles the execution of the appropriate instructions necessary for accommodating each variant on a network. Functionally, these network protocols can resemble hybridized forms of the Centralized and Decentralized Methods.
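As a hedged sketch of how a local client might extract the hash code from a magnet link, the following example uses the Python standard library to parse the link's query parameters. It assumes the commonly used parameter names xt (carrying a urn:btih: identifier), dn (display name), and tr (tracker address); the function name parse_magnet is hypothetical.

```python
from urllib.parse import parse_qs, urlparse

BTIH_PREFIX = "urn:btih:"


def parse_magnet(uri: str) -> dict:
    """Extract the torrent hash code, display name, and trackers from a magnet link."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)
    info_hash = None
    for xt in params.get("xt", []):
        if xt.lower().startswith(BTIH_PREFIX):
            info_hash = xt[len(BTIH_PREFIX):]
            if len(info_hash) == 40:
                # 40 characters indicates hex encoding, which is case-insensitive.
                info_hash = info_hash.lower()
            break
    if info_hash is None:
        raise ValueError("magnet link carries no urn:btih: identifier")
    return {
        "info_hash": info_hash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }
```

A link with no tr parameter yields an empty tracker list, which corresponds to the case described above in which peers are located through the DHT rather than through a tracker.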
For example, DHT is used to find the IP addresses of peers, typically in addition to a tracker. It is enabled by default in clients such as uTorrent and Vuze, and millions of people already use it without knowing. DHT's function is to find peers who are downloading the same files, but without communicating with a central BitTorrent tracker (e.g., a server, a network owner or service provider). Similarly, PEX is another means of finding IP addresses; rather than mimicking a tracker, the local client identifies the peers directly connected to the local node, queries these peers for the addresses of their peers, and so on.
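The peer-of-peer querying described for PEX can be sketched, by way of example only, as a breadth-first traversal starting from the local node. The callable neighbors_of stands in for the network query that asks a peer for the addresses of its own peers; it and the function name discover_peers are assumptions made for this illustration and are not part of any actual client.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional


def discover_peers(
    start: Hashable,
    neighbors_of: Callable[[Hashable], Iterable[Hashable]],
    max_hops: Optional[int] = None,
) -> set:
    """Collect every peer reachable from start by repeatedly asking peers for their peers."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not query peers beyond the hop limit
        for peer in neighbors_of(node):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, hops + 1))
    seen.discard(start)  # the local node is not one of its own peers
    return seen
```

With max_hops=1 the traversal returns only the directly connected peers; leaving it unset continues the "and so on" step until no new addresses are found.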
In an attempt to increase anonymity, fault tolerance, and scalability, a decentralized method has been adopted to augment, and oftentimes replace, the aforementioned Centralized Method. The Decentralized Method is based on distributed hash tables (DHT) and provides a lookup service in which (key, value) pairs are stored in the DHT, and any participating node can retrieve the value associated with a given key.
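The (key, value) lookup service can be illustrated with the following toy model, offered as a sketch under stated assumptions rather than a description of any deployed DHT. It assumes that node and key identifiers are SHA-1 digests and that a key is stored on the node whose identifier is closest to the key under an XOR distance, in the manner of Kademlia-style tables; the names ToyDHT and key_id are hypothetical, and all nodes live in a single process rather than on a network.

```python
import hashlib


def key_id(key: bytes) -> int:
    """Map arbitrary bytes to a 160-bit integer identifier."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


class ToyDHT:
    """Single-process model of a DHT: each node holds part of the key space."""

    def __init__(self, node_names: list[str]) -> None:
        if not node_names:
            raise ValueError("at least one node is required")
        # Each node is identified by the hash of its name and owns its own store.
        self.nodes: dict[int, dict[bytes, object]] = {
            key_id(name.encode("utf-8")): {} for name in node_names
        }

    def _responsible(self, key: bytes) -> int:
        """Return the id of the node closest to the key by XOR distance."""
        k = key_id(key)
        return min(self.nodes, key=lambda node: node ^ k)

    def put(self, key: bytes, value: object) -> None:
        self.nodes[self._responsible(key)][key] = value

    def get(self, key: bytes):
        return self.nodes[self._responsible(key)].get(key)
```

Any participant that knows the key can compute which node is responsible for it, so no central server is needed to answer the lookup; in the file-sharing context the key would be a torrent's hash code and the value a list of peer addresses.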
In order to access these dynamic and ever-evolving networks, and to accurately survey the information in and passing through them, particular systems and methods are required and they must be not only tailored for distinct tasks but flexible enough to accommodate slight differences between individual P2P networks. Additionally, greater power is needed in order to extract, analyze, compile, and utilize the full scope of data assets to be found in P2P networks.
There is a current need for an efficient means of tracking and cataloging the information present within, and being passed among, the nodes in peer-to-peer networks.