Peer-to-peer (P2P) technology supports the sharing of large amounts of content among a potentially huge number of users. Historically, networks with P2P technology have distributed information without a central server or a central router. Individual nodes in a P2P network, known as peers, have equal status and can function as both clients and servers. That is, peers can both upload files to the P2P network and download files from the P2P network. Various types of decentralized P2P networks have developed during the past decade.
While some P2P networks, such as Napster™, were centralized, others did not have a main server that directed traffic between users on the P2P network. Freenet, an open source version of a P2P system described by Ian Clarke in 1999, requires only that a new node connecting to the P2P network know the location of at least one existing node. The Gnutella file sharing network, developed by Justin Frankel and Tom Pepper of Nullsoft in 2000, links a large number of user nodes by having an initial node perform an operation known as “bootstrapping,” thereby making a connection to at least one other node.
The FastTrack file sharing protocol, introduced by the Dutch company Consumer Empowerment in 2001, features the use of supernodes to improve scalability. Many P2P programs, such as Kazaa, Grokster, and Morpheus, became FastTrack clients. However, extensive use of FastTrack soon revealed that it had some drawbacks, such as allowing massive corruption of a file to go unnoticed. Thus, other protocols became more popular as the use of FastTrack declined.
The BitTorrent file sharing protocol, developed by Bram Cohen in 2001, operates by creating “torrent” files when a user first uploads a large file onto a P2P network that uses the BitTorrent protocol. Because the original file may be split into many pieces and scattered across the network, each torrent file contains metadata that may allow a peer to contact a tracker computer regarding desired content. The tracker computer, performing a bootstrap operation, uses the torrent file to locate the specified content from the various locations where it may be stored and permits the peer to reassemble the scattered pieces into the original file. Some networks using the BitTorrent protocol may be trackerless, using means other than a tracker computer to bootstrap connections between users.
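The splitting and reassembly described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical metadata record (a plain dictionary, not the actual torrent file format), a tiny piece size, and SHA-1 piece hashes; the function names are illustrative only. The sketch shows how per-piece hashes carried in metadata might let a peer check and reorder scattered pieces.

```python
import hashlib

# Tiny piece size chosen only so the example is easy to follow (assumption).
PIECE_LENGTH = 4


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut the original file into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def build_metadata(name: str, data: bytes, piece_length: int = PIECE_LENGTH) -> dict:
    """Build a simplified, hypothetical metadata record with one hash per piece."""
    pieces = split_into_pieces(data, piece_length)
    return {
        "name": name,
        "length": len(data),
        "piece_length": piece_length,
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
    }


def reassemble(metadata: dict, received: dict[int, bytes]) -> bytes:
    """Verify each received piece against the metadata and rebuild the file."""
    ordered = []
    for index, expected_hash in enumerate(metadata["piece_hashes"]):
        piece = received.get(index)
        if piece is None:
            raise ValueError(f"missing piece {index}")
        if hashlib.sha1(piece).hexdigest() != expected_hash:
            raise ValueError(f"piece {index} failed its hash check")
        ordered.append(piece)
    data = b"".join(ordered)
    if len(data) != metadata["length"]:
        raise ValueError("reassembled length does not match metadata")
    return data


if __name__ == "__main__":
    original = b"hello p2p world"
    meta = build_metadata("example.bin", original)
    # Simulate pieces arriving out of order from different peers.
    arrived = dict(reversed(list(enumerate(split_into_pieces(original)))))
    print(reassemble(meta, arrived) == original)
```

Note that the hashes only confirm that the pieces match what the metadata describes; they do not establish that the metadata itself points to the content the user actually wants.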
There are currently a number of problems with P2P protocols. First, P2P protocols for decentralized systems may provide no way to verify that the user is downloading the desired content. For BitTorrent networks, torrent files may not be indexed. As a result, the peer cannot be certain that the torrent file will actually result in the tracker retrieving the desired content. Instead of obtaining the requested file, the peer could receive malware, such as a computer virus.
Second, P2P protocols may be used to exchange copyrighted material without authorization. Such exchanges may also consume all available bandwidth on a network, harming the online Quality of Experience of other users. Because of the large amount of bandwidth consumed by P2P traffic that violates copyright laws, some Internet Service Providers (ISPs) have treated all P2P traffic as suspect and have sought to block it. However, such treatment is unfair to legitimate P2P transfers.
The problems described above could be reduced by providing a database that distinguishes between legitimate and illegitimate P2P traffic. Such a database would allow an ISP to perform appropriate management actions on illegitimate P2P traffic, while allowing legitimate P2P traffic to proceed without such management actions. Current implementations fail to enable such an approach, as they provide no way to efficiently distinguish between legitimate and illegitimate content.
For the foregoing reasons and for further reasons that will be apparent to those of skill in the art upon reading and understanding this specification, there is a need for a system that comprehensively collects and classifies P2P content in a database. Furthermore, there is a need to map unique keys within metadata to respective P2P files. Additionally, there is a need to more efficiently identify malware and online piracy by using a database to survey P2P content.
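As a rough illustration of the kind of database contemplated above, the following sketch maps a unique key taken from P2P metadata to a classification and then to a management action. Everything here is an assumption made for illustration: the table layout, the function names, the classification labels, the example keys, and the policy of allowing unknown content are hypothetical and are not drawn from any existing implementation.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog keyed by a unique key from the metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        " unique_key TEXT PRIMARY KEY,"
        " file_name TEXT NOT NULL,"
        " classification TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, unique_key: str,
           file_name: str, classification: str) -> None:
    """Store or update the classification for one piece of P2P content."""
    conn.execute(
        "INSERT OR REPLACE INTO content VALUES (?, ?, ?)",
        (unique_key, file_name, classification),
    )
    conn.commit()


def classify(conn: sqlite3.Connection, unique_key: str) -> str:
    """Look up a key; content that has never been surveyed is 'unknown'."""
    row = conn.execute(
        "SELECT classification FROM content WHERE unique_key = ?",
        (unique_key,),
    ).fetchone()
    return row[0] if row else "unknown"


def management_action(classification: str) -> str:
    """Hypothetical policy: manage only content known to be illegitimate."""
    return "manage" if classification == "illegitimate" else "allow"


if __name__ == "__main__":
    catalog = open_catalog()
    record(catalog, "key-aaa", "open_source_distro.iso", "legitimate")
    record(catalog, "key-bbb", "pirated_movie.avi", "illegitimate")
    for key in ("key-aaa", "key-bbb", "key-zzz"):
        print(key, management_action(classify(catalog, key)))
```

Keying the table on a value carried in the metadata means a lookup can be made without downloading the content itself; how such keys are derived and how entries are classified is left open here.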