The following description relates to information management systems and to executing a query on a subset of data, for example, to facilitate a fast search when the result set is very large.
An information management system may include a computer system and a data repository. A data repository includes data, such as documents, and may reside on a storage device. In a traditional database system, the data in the data repository typically are referred to as records. Information about the records may be available through an index of the data repository that includes properties, also known as attributes, of the records. In order to retrieve data from the data repository, a user may submit a search query through a computer system. The query may include criteria for searching, such as terms and operators. The information management system may execute the query by reviewing an index of the data repository to find entries in the index that match criteria in the query.
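By way of illustration only, the index lookup described above might be sketched as follows. The record layout, the attribute names, and the helper names `build_index` and `lookup` are assumptions made for this example and are not taken from any particular system.

```python
from collections import defaultdict


def build_index(records):
    """Map each (attribute, value) pair to the ids of the records holding it."""
    index = defaultdict(set)
    for record_id, attributes in records.items():
        for name, value in attributes.items():
            index[(name, value)].add(record_id)
    return index


def lookup(index, name, value):
    """Return the ids of records whose attribute `name` equals `value`."""
    # .get() avoids creating an empty entry for values that are not indexed.
    return index.get((name, value), set())


# Hypothetical records keyed by record id.
records = {
    1: {"first_name": "John", "last_name": "Smith"},
    2: {"first_name": "John", "last_name": "Doe"},
    3: {"first_name": "Jane", "last_name": "Smith"},
}

index = build_index(records)
johns = lookup(index, "first_name", "John")
```

In this sketch the query is answered from the index alone, without the records themselves being scanned.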
Depending on the criteria specified in a search query and the processes used to execute the search query, the search may require the calculation of “intermediate results.” Intermediate results are results that, when properly linked together, can be used to generate a set of final results matching the search criteria. For example, a search using the terms “John” and “Smith” with the Boolean operator “AND” placed between the terms may require a first search for “John,” which returns a first intermediate result, and a second search for “Smith,” which returns a second intermediate result. The intermediate results may be linked together to generate a final result set.
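As a minimal sketch of the “John” AND “Smith” example, each term may be searched separately and the two intermediate results linked by set intersection. The sample documents and the function names `search_term` and `search_and` are hypothetical, and intersection is only one possible way of linking intermediate results.

```python
def search_term(documents, term):
    """Return the ids of documents containing `term` (one intermediate result)."""
    return {doc_id for doc_id, text in documents.items() if term in text.split()}


def search_and(documents, first_term, second_term):
    """Link two intermediate results with a Boolean AND (set intersection)."""
    first_intermediate = search_term(documents, first_term)
    second_intermediate = search_term(documents, second_term)
    return first_intermediate & second_intermediate


# Hypothetical documents keyed by document id.
documents = {
    1: "John Smith",
    2: "John Doe",
    3: "Jane Smith",
}

final_results = search_and(documents, "John", "Smith")
```

Both intermediate results are computed in full before they are linked, so the cost of the search grows with their size even when the final result set is small.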
In some situations, only a certain number of results may be desired. Such situations may include a query in which only a certain number of results are requested and/or the calculation of intermediate results in which one or more of the intermediate results require only a certain number of results. For example, a query may specify that only fifty results meeting the search criteria are requested.
In the case of calculating intermediate results, the execution time of a query typically correlates to the size of the intermediate results involved, because generating the intermediate results and calculating the required links typically are very time-consuming. The execution of a query is also time-consuming if the results are to be sorted by an attribute and only a certain number of results are desired. For example, if only fifty sorted results are desired, a query may be executed on all of the data, all of the results from that query may be sorted, and then fifty results may be selected.
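The sort-then-select approach described above may be sketched as follows, purely for illustration. The function name `naive_top_n`, the record fields, and the synthetic data are assumptions made for this example.

```python
def naive_top_n(records, predicate, sort_key, n):
    """Execute the query on all data, sort every match, then keep only n."""
    matches = [record for record in records if predicate(record)]  # query all data
    matches.sort(key=sort_key)                                     # sort all results
    return matches[:n]                                             # select n results


# Synthetic data: 10,000 hypothetical records with a numeric "price" attribute.
records = [{"id": i, "price": (i * 37) % 1000} for i in range(10_000)]

top_fifty = naive_top_n(
    records,
    predicate=lambda record: record["price"] < 500,
    sort_key=lambda record: record["price"],
    n=50,
)
```

In this sketch every matching record is collected and sorted even though only fifty are returned, which is the source of the execution time noted above.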