In recent years, communications environments referred to as “peer to peer” networks have become increasingly common. Generally, a peer to peer network enables multiple computer systems to share files that they store. Peer to peer networks may be used in any environment in which it is inconvenient or impractical to share files using a dedicated file server. Client versions of many contemporary operating systems allow files to be shared between client systems over a network. In addition, several well known examples of peer to peer network applications operate over the Internet, allowing users to share files stored on their local hard disks, and essentially creating global peer to peer networks. Often used for sharing music files, this widely distributed approach to file sharing was popularized by the Napster service, as well as Gnutella, Grokster, KaZaA, and others.
While specific file sharing systems have been architected in different ways, they all allow users to search for a desired file or files. If a desired file is present within the peer to peer network, the search results indicate user names or links associated with one or more computer systems from which the file can be downloaded. For example, a user name or “handle” associated with a computer system may be returned if a copy of the desired file is currently hosted on that system. The user that issued the search can then request that a copy of the desired file be downloaded onto his or her local hard drive from one of the remote hosting computer systems indicated in the search results.
As is generally known, in order to improve search operation performance, it is often useful to create and maintain a data structure referred to as a “search index”. A search index enables efficient matching between tokens in a search query and files associated with or containing those tokens. For a file to be represented in a search index, it must go through an “indexing” step, in which information describing the file is added to the index.
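By way of illustration only, the relationship between tokens and files in a search index can be sketched as a simple inverted index. The sketch below is a minimal example under stated assumptions, not a description of any particular system: the class name `SearchIndex`, the whitespace tokenizer, and the file names are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class SearchIndex:
    """Hypothetical inverted index: maps each token to the files containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of file identifiers

    def index_file(self, file_id, text):
        """The "indexing" step: record every token of the file in the index."""
        for token in tokenize(text):
            self.postings[token].add(file_id)

    def search(self, query):
        """Return the files that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


# Illustrative file names and contents (assumed for this example only).
index = SearchIndex()
index.index_file("song_a.txt", "blue moon rising")
index.index_file("song_b.txt", "blue sky")
matches = index.search("blue moon")
```

In this sketch a query is answered by intersecting the sets of files recorded for each query token, so no file content needs to be scanned at search time.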
Unfortunately, indexing large numbers of files is expensive both in terms of CPU utilization and in the size of the search index. For each file indexed, multiple processing steps may be required, such as format conversion, language detection, tokenization, and insertion into the index. These actions often consume significant processor and storage resources.
In a peer to peer network, physically distributed computer systems belonging to the network operate independently, but may share centrally provided resources. One such shared resource is often a network wide search service, which may include a search index to improve search performance. Accordingly, files stored on the systems belonging to the network are passed to an indexing process that maintains the search index. However, multiple copies of a single file are often hosted on different systems in the peer to peer network. Such duplicate copies may cause a single file to be re-indexed for each location at which a copy is stored. This is disadvantageous, because identical content is processed repeatedly, consuming processor and storage resources each time. It would be desirable to eliminate such unnecessary processing and resource consumption to improve the performance of a shared indexing service in a peer to peer network.
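The redundant work described above can be illustrated with a short sketch of a shared index that processes every hosted copy independently. This is an assumed, simplified model for illustration only: the class `NaiveSharedIndex`, the host names, and the file contents are hypothetical, and the tokenizer is a plain whitespace split.

```python
from collections import defaultdict


class NaiveSharedIndex:
    """Hypothetical shared index that re-processes every copy it is given."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of (host, file_name)
        self.files_processed = 0          # counts full indexing passes

    def index_copy(self, host, file_name, content):
        """Run the full indexing step for one hosted copy of a file."""
        self.files_processed += 1
        for token in content.lower().split():
            self.postings[token].add((host, file_name))


# Three hypothetical hosts each store an identical copy of the same file.
shared = NaiveSharedIndex()
content = "blue moon rising"
for host in ("alice", "bob", "carol"):
    shared.index_copy(host, "song.txt", content)
```

In this model the indexing step runs once per hosted copy, so `files_processed` grows with the number of copies even though the content is identical in each case.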