A user submits a search query in order to identify, among a set of data items, data items having certain characteristics. For example, it is common for users to query a relational database by submitting a query that specifies values of one or more fields present in the database, and to receive in return a query result listing the records in the database that contain the specified values in the specified fields. Queries may be applied either directly against the authoritative data source containing information about the set of data items, or against a separate index that is optimized for handling certain kinds of queries.
In the case of some sets of data items, the data items have attributes of different types, any of which may be the subject of a query. For example, in addition to relational fields, some conventional database engines support the storage of geographic locations for data items. In such a case, two separate indices are constructed: a relational index whose structure is tailored to identifying data items based upon their relational field contents, and a geographic index, such as an R-tree, whose structure is tailored to identifying data items based upon their geographic locations. A query specifying relational attributes alone is typically processed solely against the relational index, while a query specifying geographic attributes alone is typically processed solely against the geographic index.
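The two kinds of index can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the sample records, the function names, and in particular the uniform grid, which serves here as a much simpler stand-in for an R-tree and is not the structure used by any particular database engine.

```python
from collections import defaultdict

# Hypothetical records: (item_id, category, x, y)
ITEMS = [
    (1, "cafe", 2.0, 3.0),
    (2, "bank", 2.5, 3.5),
    (3, "cafe", 9.0, 9.0),
    (4, "cafe", 2.2, 3.1),
]

def build_relational_index(items):
    """Map each category value to the ids of the items that hold it."""
    index = defaultdict(list)
    for item_id, category, _x, _y in items:
        index[category].append(item_id)
    return index

def build_grid_index(items, cell_size=5.0):
    """Bucket items by grid cell (a simplified stand-in for an R-tree)."""
    index = defaultdict(list)
    for item_id, _category, x, y in items:
        cell = (int(x // cell_size), int(y // cell_size))
        index[cell].append((item_id, x, y))
    return index

def query_box(grid, min_x, min_y, max_x, max_y, cell_size=5.0):
    """Return ids of items inside the box, in grid-cell traversal order."""
    hits = []
    for cx in range(int(min_x // cell_size), int(max_x // cell_size) + 1):
        for cy in range(int(min_y // cell_size), int(max_y // cell_size) + 1):
            for item_id, x, y in grid.get((cx, cy), []):
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append(item_id)
    return hits

relational_index = build_relational_index(ITEMS)
grid_index = build_grid_index(ITEMS)

cafes = relational_index["cafe"]                     # relational-only query
nearby = query_box(grid_index, 0.0, 0.0, 4.0, 4.0)   # geographic-only query
```

In this sketch each query touches only the index suited to its attribute type, and each index returns item identifiers in whatever order its own structure happens to produce.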
In conventional database systems, a query that specifies attributes of multiple types, sometimes called a “hybrid query,” is first processed against the index appropriate to each attribute type. In the above example, a hybrid query specifying both relational and geographic attributes would be processed independently against both the relational and geographic indices. Each of the indices produces an intermediate query result, sometimes called a “constituent query result,” identifying all of the data items having the specified attributes of the attribute type represented in the index, irrespective of whether they have the attributes of attribute types not represented in the index. In order to obtain a final query result from the constituent query results, the constituent query results must be joined, or “intersected,” so that the final query result contains only data items present in each of the constituent query results. Joining groups of data items such as those contained in the constituent query results is much more efficient if the data items in each group occur in the same order as in the other groups. Because the different indices used to represent the different types of attributes usually have different structures to more effectively identify data items based upon their different attribute types, however, the constituent query results they produce tend to list items in different orders. Accordingly, in the conventional approach, the constituent query results must all be sorted into a common order before joining.
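The reason a common order matters can be seen in a small sketch of the sort-then-join step. The identifier lists and the function name `intersect_sorted` are hypothetical; the sketch assumes that each data item is represented by an integer identifier and that the common order is ascending identifier order.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of item ids in a single linear pass."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1   # a[i] cannot appear later in b, so skip it
        else:
            j += 1   # b[j] cannot appear later in a, so skip it
    return out

# Hypothetical constituent query results, each in its index's own order.
relational_hits = [7, 2, 9, 4]
geographic_hits = [4, 9, 1, 7]

# Conventional approach: sort both results into a common order, then join.
final_result = intersect_sorted(sorted(relational_hits), sorted(geographic_hits))
```

The linear pass is only correct when both inputs share the same order. Applied directly to the unsorted lists above, it would skip past items 4 and 7 and report only item 9, which is why the conventional approach pays for the two sorts first.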
This process is illustrated in FIG. 1. FIG. 1 is a data flow diagram showing a conventional process for processing a hybrid query. First, indices 111-113, each representing a different attribute type, are built and then maintained to reflect changes in the data source. Second, a query 120 received from the user is applied simultaneously against all of the indices to obtain a constituent query result for each of the indices, here constituent query results 131-133. Third, each constituent query result is normalized, such as by sorting it, to obtain a normalized constituent query result, here normalized constituent query results 141-143. Finally, the normalized constituent query results are intersected, such as by joining them, to obtain a final query result 150.
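The data flow of FIG. 1 might be sketched as follows. This is an illustrative sketch only: the three dictionaries standing in for indices 111-113, their contents, and the function names are assumptions, and the sketch further assumes that no index lists the same item identifier twice in one constituent query result.

```python
import heapq
from itertools import groupby

# Hypothetical stand-ins for indices 111-113: attribute value -> item ids,
# each listed in that index's own internal order.
index_111 = {"cafe": [8, 3, 5, 1]}
index_112 = {"downtown": [5, 1, 9, 8]}
index_113 = {"open": [1, 8, 2, 5, 3]}

def intersect_normalized(results):
    """Keep ids present in every sorted result, using a k-way merge."""
    merged = heapq.merge(*results)  # requires each input to be sorted
    return [
        item_id
        for item_id, group in groupby(merged)
        if sum(1 for _ in group) == len(results)
    ]

def process_hybrid_query(query, indices):
    # Apply the query against every index (constituent results 131-133).
    constituent = [index.get(value, []) for index, value in zip(indices, query)]
    # Normalize each constituent result by sorting it (results 141-143).
    normalized = [sorted(result) for result in constituent]
    # Intersect the normalized results (final query result 150).
    return intersect_normalized(normalized)

query_120 = ("cafe", "downtown", "open")
final_result_150 = process_hybrid_query(
    query_120, [index_111, index_112, index_113]
)
```

In this sketch the sorting step runs once per index for every query, regardless of how few items survive the final intersection.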
Unfortunately, sorting the constituent query results before joining them is often an expensive operation, consuming significant computing resources. Accordingly, an approach to processing a hybrid query without sorting constituent query results would have significant utility.