Typical document indexing systems arrange word occurrence data in an inverted content index partitioned by document. The data is distributed over multiple computer systems dedicated to index storage, with each computer system handling a subset of the total set of indexed documents. This allows a word search query to be presented to a number of computer systems at once, with each computer system processing the query against the documents that it handles.
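By way of illustration only, the arrangement described above can be sketched in Python. The names `IndexPartition`, `build_partitions`, and `search_all` are hypothetical, as are the modulo assignment of documents to partitions and the occurrence-count scoring; the sketch shows the general pattern of presenting one query to every partition at once and merging the per-partition results, not any particular system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class IndexPartition:
    """Stands in for one computer system that indexes a subset of documents."""

    def __init__(self):
        # word -> doc_id -> list of word positions within that document
        self.postings = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, text):
        for position, word in enumerate(text.lower().split()):
            self.postings[word][doc_id].append(position)

    def search(self, words):
        """Return (doc_id, score) for local documents containing every word."""
        if not words:
            return []
        matching = set(self.postings.get(words[0], {}))
        for word in words[1:]:
            matching &= set(self.postings.get(word, {}))
        # Assumed scoring: total number of occurrences of the query words.
        return [
            (doc_id, sum(len(self.postings[w][doc_id]) for w in words))
            for doc_id in matching
        ]


def build_partitions(documents, num_partitions):
    """Assign each document to exactly one partition (assumed: doc_id modulo)."""
    partitions = [IndexPartition() for _ in range(num_partitions)]
    for doc_id, text in documents.items():
        partitions[doc_id % num_partitions].add_document(doc_id, text)
    return partitions


def search_all(partitions, query):
    """Present the query to all partitions at once and merge their hits."""
    words = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = pool.map(lambda p: p.search(words), partitions)
        merged = [hit for hits in per_partition for hit in hits]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


documents = {
    0: "the quick brown fox",
    1: "the lazy dog",
    2: "quick quick fox",
    3: "brown dog",
}
partitions = build_partitions(documents, num_partitions=2)
print(search_all(partitions, "quick fox"))  # [(2, 3), (0, 2)]
```

Because every word of a given document resides in a single partition, each partition can evaluate a multi-word query locally and only the resulting hits need to be merged.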
An inverted word location index partitioned by document is generally more efficient than an index partitioned by word. Partitioning by word becomes expensive when hits must be ranked over multiple words, because large amounts of information must be exchanged between computer systems for words with many occurrences. Typical document index systems are therefore partitioned by document, and queries on the indexed documents are processed against the contents of the indexes until a sufficient result set is obtained. Although the number of documents indexed by search engines is growing, in many cases the results for most queries come from a small portion of the entire set of documents. It may therefore be inefficient to search indexes containing documents that are less likely to return results in response to a query.
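The cost comparison between the two partitioning schemes can be illustrated with a rough, hypothetical cost model. The functions `word_partition_transfer` and `document_partition_transfer` and the sample posting-list sizes below are assumptions made for illustration; they count only the entries sent between computer systems and are not measurements of any real system.

```python
def word_partition_transfer(postings, words, owner_of):
    """Posting entries shipped between machines under partitioning by word.

    Assumption: the machine that owns the first query word coordinates the
    ranking, and the full posting list of every other query word is sent to
    it unless that word is already stored on the same machine.
    """
    coordinator = owner_of(words[0])
    return sum(
        len(postings.get(word, []))
        for word in words[1:]
        if owner_of(word) != coordinator
    )


def document_partition_transfer(num_partitions, top_k):
    """Upper bound on hits shipped under partitioning by document.

    Assumption: each partition ranks its own documents locally and returns
    at most top_k hits to be merged.
    """
    return num_partitions * top_k


# Hypothetical data: two frequent words stored on different machines.
postings = {
    "new": list(range(100_000)),         # occurs in 100,000 documents
    "york": list(range(0, 100_000, 2)),  # occurs in 50,000 documents
}
machine_for_word = {"new": 0, "york": 1}

print(word_partition_transfer(postings, ["new", "york"], machine_for_word.get))  # 50000
print(document_partition_transfer(num_partitions=10, top_k=10))  # 100
```

Under these assumptions the transfer for a word-partitioned index grows with the number of occurrences of the query words, whereas for a document-partitioned index it is bounded by the number of partitions and the number of hits requested from each.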