As the use of computing devices, and of networks such as the Internet that connect them, has increased, the number of pages on the World Wide Web has grown rapidly. Because of the large and growing number of available web pages, it is currently difficult for a search engine to place all available web pages into a single, high-speed index. Instead, search engines often use multiple indexes, including one or more indexes with smaller capacity but higher speed and one or more indexes with larger capacity but lower speed. Alternatively, some existing search engines utilize a single index and provide varying levels of priority and/or optimization based on various factors relating to received queries.
Conventionally, smaller, faster indexes used by a search engine contain head Uniform Resource Locators (URLs), e.g., URLs searched by many users, while larger, slower indexes contain tail URLs, e.g., URLs searched by fewer users. In order to maintain a trade-off between quality and speed, search engines traditionally skip the larger indexes when a sufficient number of URLs that are responsive to a user's query can be found in the smaller indexes. To perform well, a search engine can utilize mechanisms to ensure that the quality/speed trade-off incurred in processing a query results in an optimal user experience. For example, a user can experience dissatisfaction with a search engine if the most desirable URL for the user's query is located in a larger index but the larger index is skipped by the search engine. Conversely, if a search engine utilizes a larger index for every user query, the search engine will perform only as fast as the larger index allows.
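By way of illustration only, the conventional skip behavior described above can be sketched as follows. The index contents, the example URLs, the function name search, and the threshold MIN_RESULTS are hypothetical and chosen solely for the example; an actual search engine would use distributed index structures rather than in-memory dictionaries.

```python
from typing import Dict, List

# Hypothetical stand-in for a smaller, faster index holding head URLs.
SMALL_INDEX: Dict[str, List[str]] = {
    "weather": ["http://example.com/weather", "http://example.org/forecast"],
}

# Hypothetical stand-in for a larger, slower index holding tail URLs.
LARGE_INDEX: Dict[str, List[str]] = {
    "weather": ["http://example.net/local-weather-archive"],
    "rare term": ["http://example.net/rare"],
}

# Assumed threshold for a "sufficient" number of responsive URLs.
MIN_RESULTS = 2


def search(query: str, min_results: int = MIN_RESULTS) -> List[str]:
    """Consult the small index first; skip the large index if enough URLs are found."""
    results = list(SMALL_INDEX.get(query, []))
    if len(results) >= min_results:
        # Enough responsive URLs: the larger index is skipped for speed.
        return results
    # Otherwise fall back to the larger, slower index and merge without duplicates.
    for url in LARGE_INDEX.get(query, []):
        if url not in results:
            results.append(url)
    return results
```

In this sketch, the query "weather" is answered from the smaller index alone, so a more desirable URL held only in the larger index would never be returned, whereas the query "rare term" incurs the cost of the larger index. This is the quality/speed trade-off discussed above.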
Conventional techniques for determining whether a larger, slower index is to be processed or skipped for a given user query generally rely on sets of rules that are manually written and applied to the search engine. However, rigid application of such rules can result in search engine performance that is not optimal in all cases and/or that adapts poorly to changing network and/or hardware conditions. Accordingly, there is a need in the art for techniques for query classification and processing that mitigate at least the above shortcomings.