Peer-to-peer (P2P) file sharing is a major P2P application, with millions of users sharing millions of files and consuming a large proportion of Internet bandwidth. In such a large-scale system, it is important to supply good search capabilities, lest the user be overwhelmed with search results. However, the search capabilities of these systems are weak, particularly in ranking query results.
In a pure peer-to-peer system, true clients and servers do not really exist because each node functions simultaneously as both a server and a client. However, as an aid to understanding the present invention, and not by way of limitation, the following terminology as may be used herein is explained. A client is a machine running a software routine seeking and receiving information. A server is a machine in the P2P file sharing system acting as a data repository and provider. A content file is a data object that is a unique set of data, e.g., a song, a picture, or any other thing in digital format. A replica is a copy of a content file. A node is one or more machines acting as one location in the network. A node will simply be referred to as a computer herein, and is meant to encompass all automated data handling apparatuses.
Standard file sharing models include the common P2P file sharing systems Gnutella and Kazaa. These systems make very few assumptions about the behavior of users and about the data they share. Peers of a P2P file sharing system collectively share a set of content files by maintaining local replicas of them. Each replica of a content file (e.g., a music file) is identified by a descriptor. A descriptor is a metadata set, which comprises terms (i.e., a “bag of words”). Depending on the implementation, a term may be a single word or a phrase. P2P searching consists of identifying content files through a search of the descriptors of the individual content files.
A peer acts as a client by initiating a particular query for a content file. A query is also a metadata set, composed of terms that a user thinks best describe the desired content file. A query is routed to all reachable peers, which act as servers. Query results are metadata references to content files that fulfill the matching criterion. The matching criterion in known P2P systems requires that the content file's descriptor contain all the query terms.
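By way of illustration only, the matching criterion described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular P2P system: terms are taken to be single whitespace-separated words compared without regard to case, and the function names (`tokenize`, `matches`, `search`), the peer identifiers, and the sample descriptors are all hypothetical.

```python
def tokenize(text):
    """Split a descriptor or query into lowercase terms (single words, by assumption)."""
    return text.lower().split()


def matches(query, descriptor):
    """Return True when the descriptor contains every term of the query."""
    return set(tokenize(query)) <= set(tokenize(descriptor))


def search(query, peers):
    """Send the query to every reachable peer and collect the matching results.

    `peers` maps a hypothetical peer identifier to the descriptors of the
    replicas that peer holds. Each result pairs the server identity with
    the matching descriptor.
    """
    results = []
    for peer_id, descriptors in peers.items():
        for descriptor in descriptors:
            if matches(query, descriptor):
                results.append((peer_id, descriptor))
    return results


# Hypothetical sample data: two peers, each sharing two replicas.
peers = {
    "peer-a": ["beatles yesterday live", "mozart requiem"],
    "peer-b": ["yesterday beatles", "beatles help"],
}
results = search("beatles yesterday", peers)
```

In this sketch the query "beatles yesterday" matches one replica on each peer, because only those two descriptors contain both query terms. The descriptor "beatles help" is not returned, since it lacks the term "yesterday".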
A query result contains the content file's descriptor as well as the identity of the server that holds the replica. The descriptor helps the user judge the relevance of the content file to the query, and the server identity is required to initiate the content file's download.
Once the user selects a content file from the query results, a local replica of the content file is made by downloading it from the server. In addition, the user has the option of manipulating the local replica's descriptor on his own computer. He may manipulate it for personal identification or to better share it in the P2P file sharing system.
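By way of illustration only, a query result and the creation of a local replica may be sketched as follows. This is a minimal sketch under stated assumptions: the class names (`QueryResult`, `Peer`), the content key `song-1`, and the sample descriptors are hypothetical, and the actual transfer of file data over the network is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryResult:
    descriptor: str  # terms describing the content file
    server_id: str   # identity of the peer holding the replica


@dataclass
class Peer:
    peer_id: str
    # Maps a hypothetical content-file key to the local replica's descriptor.
    replicas: dict = field(default_factory=dict)

    def download(self, content_key, result):
        """Create a local replica that initially keeps the server's descriptor."""
        self.replicas[content_key] = result.descriptor

    def edit_descriptor(self, content_key, new_descriptor):
        """Change only the local replica's descriptor; the server's copy is untouched."""
        self.replicas[content_key] = new_descriptor


# Hypothetical sample data: peer-b serves a replica that peer-a downloads.
server = Peer("peer-b", {"song-1": "yesterday beatles"})
client = Peer("peer-a")

result = QueryResult(descriptor=server.replicas["song-1"], server_id=server.peer_id)
client.download("song-1", result)
client.edit_descriptor("song-1", "beatles yesterday 1965 acoustic")
```

After these steps, both peers hold a replica of the same content file, but under different descriptors. Replicas of one content file may therefore be described differently from peer to peer.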
Much of the known work on improving P2P file sharing focuses on system architecture, seeking to improve searching by identifying highly reliable peers and giving them specialized roles in statistics maintenance, indexing, and routing. The performance of such systems can be impressive; however, the application domain differs from the one presently considered. The present invention makes no assumptions about the relative capabilities of the peers, and so is more applicable to ad hoc environments, where functionality is fully distributed among all participants.