1. Field of the Invention
Aspects of the present invention relate generally to selecting which posting lists should be stored in a static search engine cache.
2. Description of Related Art
As is known in the art, search engines enable users to find content available through a wide variety of media and protocols, such as the Internet and the Hypertext Transfer Protocol (HTTP). On a regular basis, the majority of Internet users submit search queries to search engines when looking for specific content on the Internet. Given the large number of users routinely searching for content, the large volume of data required to enable useful results, and the high processing requirements of such a search engine system, efficient mechanisms are needed to enable search engines to respond to queries as quickly and as efficiently as possible.
One mechanism for increasing the efficiency of processing search engine queries is a cache, which can maintain, in some medium, the results of frequently or recently submitted queries. Generally, a cache allocates a fixed, pre-determined amount of memory space for storing the results of previously processed search queries. If a search engine then receives a query that it has already processed and stored in the cache, the results may be returned without having to re-process the query; such an action is known as a “hit.” The performance of a cache is generally measured in terms of “hit ratio,” which is the number of hits divided by the total number of queries. If a search engine receives a query that it has not already processed, or, more specifically, that it has not already processed and saved to the cache (a “miss”), then it processes the query and may save the results to the cache. If the cache is full when new results need to be saved to it, an eviction policy may be employed to determine what, if anything, will be removed from the cache so that room can be made for the new results.
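By way of illustration only, the hit, miss, eviction, and hit-ratio behavior described above may be sketched as follows. The sketch assumes a least-recently-used (LRU) eviction policy, which is only one of many possible policies, and measures capacity as a number of cached queries rather than as bytes of memory. The names ResultCache, lookup, hit_ratio, and process_query are hypothetical and are introduced solely for this example.

```python
from collections import OrderedDict


class ResultCache:
    """Fixed-capacity query-result cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity, process_query):
        self.capacity = capacity            # maximum number of cached queries
        self.process_query = process_query  # hypothetical back-end query processor
        self.entries = OrderedDict()        # query -> results, least recent first
        self.hits = 0
        self.total = 0

    def lookup(self, query):
        self.total += 1
        if query in self.entries:
            # Hit: return stored results without re-processing the query.
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        # Miss: process the query, then (potentially) save the results.
        results = self.process_query(query)
        if self.capacity <= 0:
            return results                   # nothing can be cached
        if len(self.entries) >= self.capacity:
            # Cache is full: evict the least recently used entry.
            self.entries.popitem(last=False)
        self.entries[query] = results
        return results

    def hit_ratio(self):
        # Number of hits divided by the total number of queries.
        return self.hits / self.total if self.total else 0.0
```

In this sketch every miss is saved to the cache; an actual system may decline to cache some results, and may use a different eviction policy or a static (non-evicting) cache.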
Generally, a cache holds either the results of a particular query or the posting lists associated with the terms of a given query. Returning a result for a query that already exists in the cache is generally more efficient than computing the result using cached posting lists. However, previously unseen queries generally occur more often than previously unseen terms, resulting in a lower hit rate for cached results than for cached posting lists.
Thus, it would be desirable to divide the cache into a first part for caching query results and a second part for caching posting lists. It also would be desirable to determine which terms to cache in the posting lists section of the cache using a method that takes into account both the frequency of the query terms and the size of the associated posting lists.
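By way of illustration only, one way a selection method might weigh term frequency against posting list size is sketched below. The text above does not specify a scoring function; the sketch assumes, purely for the sake of example, that terms are ranked by the ratio of query frequency to posting list size and added greedily until the posting lists section is full. The function name select_terms_for_static_cache and the layout of term_stats are hypothetical.

```python
def select_terms_for_static_cache(term_stats, capacity):
    """Pick terms whose posting lists fit in a fixed-size cache section.

    term_stats: dict mapping term -> (query_frequency, posting_list_size)
    capacity:   space available for posting lists, in the same units as the sizes
    """
    # Ignore terms with empty posting lists to avoid dividing by zero.
    candidates = [
        (term, freq, size)
        for term, (freq, size) in term_stats.items()
        if size > 0
    ]
    # Assumed heuristic: prefer terms that are queried often relative to
    # the space their posting lists occupy.
    candidates.sort(key=lambda c: c[1] / c[2], reverse=True)

    selected = []
    used = 0
    for term, freq, size in candidates:
        if used + size <= capacity:
            selected.append(term)
            used += size
    return selected
```

Under this assumed heuristic, a very frequent term with a very long posting list may be passed over in favor of several moderately frequent terms with short lists, since the latter can serve more queries per unit of cache space.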