Search engines on the Internet are used by people all over the world to find and download text, video, images, audio, and other information, collectively referred to as data. Typically, a search engine periodically examines the data objects stored on many servers connected to the Internet. It then constructs an index of each server's contents and creates links to the server locations that hold those contents.
Most commercial Internet search engines are built around a search engine application residing in a central server complex. This central server complex receives search requests throughout the day from Internet users around the world. However, when the number of search requests per day becomes very large, this approach can prove disadvantageous.
For example, handling close to 100 million search requests per day requires a heavy central server infrastructure, both in the size of the central server complex and in the amount of incoming bandwidth. Furthermore, the volume of search requests peaks at extremely high values during certain times of the day, which makes load balancing an important consideration.
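To put the 100 million figure in scale, spreading those requests evenly over the 86,400 seconds in a day gives the average sustained rate below. Because demand peaks at certain hours, the capacity actually required is higher than this average.

```latex
\frac{10^{8}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 1\,157\ \text{requests/s}
```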
Another common approach to searching content uses a distributed search system, such as those employed in certain distributed computing networks like peer-to-peer networks (“P2P networks”). In such a system, there is no central server, or central bank of servers, that receives all search requests, conducts the search, and responds with search results. Rather, the database of searchable content, available as shared files in the network, is indexed, and the index is distributed to the clients in the network or to nodes regionally distributed throughout the network. Updates or changes to the available file information at each client are periodically uploaded to the clients, or the regional nodes, in the network via a peer-to-peer client application. As a result, search requests can be widely distributed, with each search engine responding to a subset of search queries. To retrieve search results, a client receives a search input from a user, locates a copy of the content database or content index either locally or on a regional node, finds the entry associated with the search input, and generates a search response listing the available file information across the entire network.
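The lookup flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular P2P application: the index is assumed to be a dictionary keyed by a lowercase keyword, and the names `FileEntry`, `IndexReplica`, `apply_update`, and `search` are hypothetical. The point it shows is that a query is answered from a copy of the index (local or regional) without contacting any central search server.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileEntry:
    """Identifying information for one shared file (fields are illustrative)."""
    name: str
    size: int
    owner: str  # hypothetical field: the peer that shares the file


@dataclass
class IndexReplica:
    """One copy of the content index, held by a client or a regional node."""
    entries: dict[str, list[FileEntry]] = field(default_factory=dict)

    def apply_update(self, keyword: str, entry: FileEntry) -> None:
        # Periodic updates from peers add newly shared files to this copy.
        self.entries.setdefault(keyword.lower(), []).append(entry)

    def lookup(self, keyword: str) -> list[FileEntry]:
        # Find the entry associated with the search input (empty if none).
        return self.entries.get(keyword.lower(), [])


def search(query: str, local: IndexReplica | None, regional: IndexReplica) -> list[FileEntry]:
    """Answer a query without contacting any central search server."""
    # Use the client's own copy of the index if it holds one,
    # otherwise consult the copy held by a regional node.
    replica = local if local is not None else regional
    return list(replica.lookup(query))
```

Keying the index by a single lowercase keyword is only one possible design; a real client might instead tokenize file names or match on hash values.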
This distributed search system requires a simpler server infrastructure than the central server approach and, thus, is capable of sustaining a larger number of search requests per day for a lower operational cost. Conventionally, however, distributed searching has been applied only to files that are specifically designated as being part of a distributed search network. For example, in a typical distributed network, files that are indexed, and therefore capable of being distributed to other client devices, are located in a specially designated folder, i.e., a “shared folder”. Once a file is placed in this folder, its identifying information, such as name, file size, hash value, author, owner, and other data, is extracted and incorporated into a central index that is stored either locally or on a regional node. These conventional systems operate only on files that are specifically designated as being derived from, or part of, the distributed network.
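The shared-folder restriction can be illustrated with a short sketch that extracts identifying information only from files placed directly in the designated folder. The function name `index_shared_folder` is hypothetical, and the choice of SHA-1 as the hash, the omission of author and owner fields, and the decision not to descend into subfolders are all assumptions made for brevity.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def index_shared_folder(shared_dir: Path) -> list[dict]:
    """Extract identifying information for files in the shared folder only.

    Files outside shared_dir are never examined, mirroring the conventional
    restriction that only designated files become searchable.
    """
    records = []
    for path in sorted(shared_dir.iterdir()):
        if not path.is_file():
            continue  # this sketch does not descend into subfolders
        data = path.read_bytes()
        records.append({
            "name": path.name,
            "size": len(data),
            # Hash algorithm is an assumption; P2P systems vary.
            "hash": hashlib.sha1(data).hexdigest(),
        })
    return records
```

The resulting records would then be merged into the index held locally or on a regional node; a file saved anywhere else on the client never appears in search results.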
Additionally, users often want to search and obtain content from various sources. Conventionally, users must go to multiple search indices, enter the search request at each one, and, if the file is available, access the content file from that source. Most conventional search engines use a central-server-based search system, i.e., all search requests are sent to a server complex, which performs a search on a central database. The search results in this case do not include content from specific P2P networks. Search in a P2P network uses a completely different approach: the entire search database is distributed to every client participating in the P2P network, and each client performs searches on its local copy of the database. However, these search results do not include content outside the P2P network.
The art fails to disclose distributed search systems that take advantage of searches conducted by users on conventional centralized server search systems to build a distributed search index. What is needed, therefore, is an improved distributed network search system that uses the millions of searches conducted by users on central server search systems to construct an index of data. What is also needed is a method and system for capturing web pages and other data accessed by an individual, indexing that data, and making the indexed data available to other users of the distributed network.
Further, the art fails to disclose methods and systems for searching, via a single site or application, both content on the Internet and content on specific distributed computing (P2P) networks. What is needed, therefore, is a search system capable of integrating search results from distributed computing networks with those from central server search systems and presenting them to the user through a single interface.