This invention relates generally to search engine technology. More specifically, this invention relates to reducing the computational overhead associated with merging results from multiple search engines.
Contemporary computers often operate in a network environment that allows them to communicate with each other. Accordingly, they can exchange data and search and retrieve the contents of another computer in the same network. As individual computers can store information, large networks of computers can act as vast storehouses of information with each computer able to access this storehouse through the network.
Searches for information in the networked computer environment may be cumbersome due to the sheer amount of information stored, or due to the complexity of finding information in large file structures. Indeed, with the advent of the World Wide Web (WWW) as well as other forms of computer networking, and the corresponding explosion in the amount of information available, it is now simply impractical for users to search for information manually. The ability of search engines to analyze enormous amounts of data and isolate useful information thus becomes of paramount importance.
The use of search engines can speed information retrieval by automating the task of collecting information over a network of computers. In essence, users direct a computer to search for information much faster than a human ever could. Search engines are computer programs designed to seek out information based on instructions from the user. Typically, the user enters a set of instructions, often called a query, which instructs the search engine to search for specific types of information. Most contemporary search engines are designed to take a query, search a group of networked computers for information that satisfies the query, and return any results to the user. Often termed a result list, the data returned to the user normally contains, at a minimum, a number of entries or results that describe the locations of relevant information. Many times, this result list also includes an excerpt of the relevant information for the user's inspection, as well as a ranking. This ranking serves as a rough indicator of how well the returned information satisfies the query, and is usually based on a numerical scoring value or metric.
Almost all search engines work in this general manner. However, their architectures vary according to the context in which they operate. Search engines are currently constructed in at least three architectures: federated, peer-to-peer, and meta-search. Each is used to conduct different types of searches.
Federated search engines are used in the client-server environment. A client or server may initiate a search for data located in various networked servers. Federated search engines are most commonly used in the WWW context, but need not be limited in this manner. Typical federated engines search the WWW by utilizing programs called bots or spiders to examine the content of information available on other computers and build an index consisting of the words or other data stored in these computers, as well as where they are located. Once users enter a query consisting of words or data desired, the search engine searches its index for any locations that contain these words/data and returns a list of such locations. The result list returned is normally a list of each such returned location and any associated information, and may include Uniform Resource Locators (URLs) for finding WWW-based data, or other expressions of data location. The results or entries in this list are often ranked according to any of a number of criteria currently available, with the goal of presenting the most relevant results to the user first.
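The index-and-lookup behavior described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection in place of content gathered by a spider; the names build_index and search, the example URLs, and the rule that a location must contain every query word are illustrative assumptions, not details taken from this description.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of locations (e.g. URLs) whose text contains it."""
    index = defaultdict(set)
    for location, text in documents.items():
        for word in text.lower().split():
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations that contain every word of the query (assumed rule)."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)  # sorted only to make the output deterministic


# Hypothetical documents standing in for content a spider has examined.
documents = {
    "http://example.com/a": "merging search results",
    "http://example.com/b": "search engine index",
    "http://example.com/c": "merging result lists from a search engine",
}
index = build_index(documents)
print(search(index, "search engine"))
```

A production index would also store ranking data and be refreshed as spiders revisit locations; the sketch omits both.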
One flaw in this type of search engine is the potential for inaccurate information. Because the WWW is so large, indices are updated only sporadically, meaning searches may not uncover the most recent information. Other types of search engines avoid the need for spiders and indices, and thus present users with up-to-date information more often. One example is the peer-to-peer search engine, which can also be used for other networks besides the WWW. These search engines operate in the peer-to-peer environment, where computers are simply linked together with no centralized servers and no distinct clients. They typically work by distributing a search to various peer computers, each of which can in turn farm out the search to other computers in the same network. In this way, individual computers search only the current contents of a few peers and not the entire WWW or other network. This eliminates the need to build a large index, and delivers to the user a real-time snapshot of the content of the network or the WWW.
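The forwarding of a search from peer to peer can be sketched as a recursive traversal. This is only an illustration under stated assumptions: the network is modeled as an in-memory dictionary of neighbors, a shared visited set stands in for whatever loop prevention a real peer network uses, and matching is a plain substring test. The names peer_search, network, and contents are hypothetical.

```python
def peer_search(network, contents, peer, query, visited=None):
    """Search one peer's current contents, then forward the query to its neighbours."""
    if visited is None:
        visited = set()
    visited.add(peer)
    # Each peer inspects only its own current contents; no central index is built.
    hits = [(peer, item) for item in contents.get(peer, []) if query in item]
    for neighbour in network.get(peer, []):
        if neighbour not in visited:  # assumed loop prevention
            hits.extend(peer_search(network, contents, neighbour, query, visited))
    return hits


# Hypothetical four-peer network: A knows B and C, which both know D.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
contents = {
    "A": ["notes on cooking"],
    "B": ["search engine survey"],
    "C": ["holiday photos"],
    "D": ["peer search protocol", "search tips"],
}
print(peer_search(network, contents, "A", "search"))
```

Each call returns its own hits combined with those of the peers it forwarded to, so results are collected as they pass back up the chain.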
Finally, web meta-search engines can operate in either the client-server or peer-to-peer environment. These search engines typically act as aggregators that farm a WWW search out to other public web search engines, then process the results.
A common thread amongst all types of search engines, including the three listed above, is that all usually involve the merging of result lists. Federated search engines typically farm out a search to different search engines, each of which has access to certain server databases. The federated search engine must then merge the result lists returned by each search engine. Peer-to-peer search engines, as mentioned above, distribute a search to other engines in the same peer network. These engines can then distribute the search to other computers, and so on. At each stage, the results returned may need to be collected and merged before being passed back up the chain. Finally, meta-search engines must merge the result lists sent back by each public web search engine.
This merging tactic has its drawbacks. Currently, the merging of multiple result lists into a single list is usually accomplished by examining and ranking every single entry of every list. As one can imagine, this ranking process can become quite computationally intensive if the number of lists or the number of entries per list is large. Thus, for large lists or large numbers of lists, the computation time required by the merging process can nullify any advantage gained by operating multiple search engines at the same time.
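The cost of this conventional approach can be made concrete with a short sketch that scores every entry of every list before sorting. The scoring function shown, which counts query words present in an entry's excerpt, is a hypothetical stand-in; the description does not specify one. The entry fields location and excerpt are likewise assumed.

```python
def score(entry, query):
    """Hypothetical scoring function: number of query words found in the excerpt."""
    words = set(entry["excerpt"].lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def merge_exhaustive(result_lists, query):
    """Score every entry of every list, then sort; returns (merged, scoring_calls)."""
    scored = []
    for results in result_lists:
        for entry in results:
            scored.append((score(entry, query), entry))
    # Stable sort: entries with equal scores keep their original relative order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored], len(scored)


# Hypothetical result lists from two search engines.
list_a = [
    {"location": "a1", "excerpt": "cooking recipes"},
    {"location": "a2", "excerpt": "merge sort"},
]
list_b = [
    {"location": "b1", "excerpt": "merge result lists quickly"},
    {"location": "b2", "excerpt": "result lists"},
]
merged, calls = merge_exhaustive([list_a, list_b], "merge result lists")
print([entry["location"] for entry in merged], calls)
```

The number of scoring calls equals the total number of entries, so the work grows with both the number of lists and the length of each list.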
In view of this shortcoming, it would be highly desirable to merge entries from multiple result lists into a single list in a manner that avoids some of the computational overhead associated with current methods. Accomplishing this goal would improve the speed and efficiency with which useful information could be brought to people, thus reducing the tedium associated with many different tasks.
This invention includes a method for merging multiple result lists from search engines.
The invention includes the step of transmitting a query to a set of search engines. Any result lists returned from these search engines are received, and a subset of entries from each result list is selected. Each entry in this subset is assigned a scoring value according to a scoring function, and each result list is then assigned a representative value according to the scoring values assigned to its entries. A merged list of entries is produced based upon the representative value assigned to each result list.
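One possible reading of these steps is sketched below. Several choices are assumptions made only for illustration: the subset is taken to be the first few entries of each list, the scoring function counts query words in an excerpt, the representative value is the mean of the subset's scores, and the merged list is formed by concatenating whole lists in descending order of representative value. The summary above does not fix any of these choices, and the names used are hypothetical.

```python
from statistics import mean


def score(entry, query):
    """Hypothetical scoring function: number of query words found in the excerpt."""
    words = set(entry["excerpt"].lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def merge_by_representative(result_lists, query, sample_size=3):
    """Rank whole result lists by a value computed from a subset of their entries."""
    ranked = []
    for position, results in enumerate(result_lists):
        subset = results[:sample_size]  # assumed subset: the leading entries
        if not subset:
            continue  # an empty list contributes nothing to the merged list
        representative = mean(score(entry, query) for entry in subset)  # assumed: mean
        ranked.append((representative, position, results))
    # Highest representative value first; ties keep the original list order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    merged = []
    for _, _, results in ranked:
        merged.extend(results)  # entries keep the order their own engine gave them
    return merged


# Hypothetical result lists from two search engines.
list_a = [
    {"location": "a1", "excerpt": "cooking recipes"},
    {"location": "a2", "excerpt": "merge sort"},
]
list_b = [
    {"location": "b1", "excerpt": "merge result lists quickly"},
    {"location": "b2", "excerpt": "result lists"},
]
merged = merge_by_representative([list_a, list_b], "merge result lists", sample_size=1)
print([entry["location"] for entry in merged])
```

With sample_size=1 only two entries are scored, one per list, yet every entry still appears in the merged list.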
The invention further includes a computer-readable memory to instruct a computer to merge multiple result lists from search engines. Executable instructions stored in the memory include instructions for selecting a subset of entries from each result list. Each entry in the subset is assigned a scoring value according to a scoring function. Each result list is assigned a representative value based on a function of scoring values assigned to its entries. The entries are then ranked based on the representative value assigned to their result list.
This invention allows for a reduction in computational overhead when merging and re-ranking multiple result lists. Ranking of results is accomplished by evaluating a subset of entries instead of every single one, thus reducing the number of calculations required.
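The size of the saving can be estimated by counting scoring-function evaluations. The helper below is a hypothetical illustration, and the figures of ten lists, one hundred entries per list, and a five-entry subset are assumed values chosen for the arithmetic, not measurements.

```python
def scoring_calls(list_lengths, sample_size=None):
    """Count scoring-function evaluations; sample_size=None means score every entry."""
    if sample_size is None:
        return sum(list_lengths)
    # A list shorter than the subset size can only contribute the entries it has.
    return sum(min(length, sample_size) for length in list_lengths)


lengths = [100] * 10                  # assumed: ten lists of one hundred entries
full = scoring_calls(lengths)         # every entry scored
sampled = scoring_calls(lengths, 5)   # only a five-entry subset scored per list
print(full, sampled)
```

Under these assumptions the subset approach performs 50 evaluations instead of 1000, and the final ordering step handles ten representative values rather than a thousand individual scores.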