A common goal of many search systems, such as search engines, is to provide quick and meaningful responses to queries. This typically requires that the searches be conducted efficiently. In an attempt to achieve efficient searches, many search engines utilize indexes. An index maps content (typically in the form of tokens) to the entities being searched (database records, web pages, or the like). For example, a computer system could be used to store text documents, and full-text indexes could be used to help search the documents. The indexes could map words to lists of document identifiers and could be used to respond to queries containing one or more words. A query response would then contain a list of all documents containing the words of the query.
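The full-text index described above can be illustrated with a minimal sketch. The function names `build_index` and `search`, the whitespace tokenization, and the sample documents are assumptions made for illustration only; they are not taken from any particular search system.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to a sorted list of identifiers of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: tokens are whitespace-separated words.
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in postings.items()}


def search(index, query):
    """Return identifiers of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        # Keep only documents that also contain this word.
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical sample documents keyed by document identifier.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dog",
}
index = build_index(documents)
matches = search(index, "quick brown")
```

Here `matches` holds the identifiers of the documents containing both "quick" and "brown". A word that appears in no document simply yields an empty response.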
Typically, as the number of entities to be searched increases, the size of the index increases. In many cases, however, it can be prohibitively inefficient to maintain only one index. For example, the amount of data in an index can become too large to maintain in a processor's internal memory. Many current search systems are continuously queried, and documents are continuously being added thereto. In such systems, as an index becomes too large, it is stored on slower, secondary storage, e.g., disk memory or the like. As further documents are added and indexed, this results in multiple indexes. Typically, the search system consults each index in response to a query.
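The practice of consulting each index in response to a query can be sketched as follows. This is only an illustration under stated assumptions: each index is modeled as an in-memory dictionary, each document is assumed to appear in exactly one index, and the names `search_one` and `search_all` and the sample data are hypothetical.

```python
def search_one(index, query):
    """Return identifiers of documents in one index that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return result


def search_all(indexes, query):
    """Consult every index in turn and combine the per-index results.

    Returns the combined document identifiers and the number of indexes consulted.
    """
    hits = set()
    consulted = 0
    for index in indexes:
        hits |= search_one(index, query)
        consulted += 1
    return sorted(hits), consulted


# Hypothetical data: an older index and a newer index covering later documents.
older_index = {"quick": [1, 3], "brown": [1, 3], "dog": [2, 3]}
newer_index = {"quick": [4], "dog": [4, 5], "cat": [5]}

results, consulted = search_all([older_index, newer_index], "quick dog")
```

Because every index is visited for every query, the work per query in this sketch grows with the number of indexes.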
Accordingly, as the number of indexes increases, efficiency suffers, because consulting more indexes takes more time. The system can improve efficiency by merging some or all of the indexes into a single index. The operation of merging, however, also takes time. Hence, there is a tension in the system. How can one know, before merging indexes, whether it will be more efficient to merge multiple indexes and consult the resultant single index, or to consult the multiple indexes individually?
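The merging operation itself can be sketched as below. This is a minimal illustration, assuming dictionary-based indexes as in a simple full-text index; `merge_indexes` and `count_postings` are hypothetical names, and the sketch does not answer the question of when merging is worthwhile. It only shows that merging visits every entry of every index, which is the source of its cost.

```python
def merge_indexes(indexes):
    """Combine several word-to-document-identifier indexes into a single index."""
    merged = {}
    for index in indexes:
        for word, doc_ids in index.items():
            # Every posting list of every index is visited once.
            merged.setdefault(word, set()).update(doc_ids)
    return {word: sorted(doc_ids) for word, doc_ids in merged.items()}


def count_postings(index):
    """Count the (word, document identifier) pairs stored in an index."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Hypothetical data: two indexes covering disjoint sets of documents.
older_index = {"quick": [1, 3], "brown": [1, 3], "dog": [2, 3]}
newer_index = {"quick": [4], "dog": [4, 5], "cat": [5]}

merged_index = merge_indexes([older_index, newer_index])
work_done = count_postings(older_index) + count_postings(newer_index)
```

After merging, a query needs to consult only `merged_index`, but the merge had to process `work_done` postings first, which is the time cost weighed against the savings on later queries.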