1. Field of the Invention
The embodiments of the invention generally relate to list processing, and, more particularly, to intersecting lists using lazy segment merging and an adaptive n-ary intersecting process.
2. Description of the Related Art
Row identification (ID) list (RID-list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query processor performs index lookups to form sorted RID-lists (or bitmaps) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates.
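By way of illustration only, the three steps described above (index lookup, RID-list intersection, and row fetch with residual predicates) can be sketched as follows. The table contents, the column names, and the helper names `build_index`, `intersect_sorted`, and `conjunctive_query` are hypothetical and are not taken from any particular query processor; the sketch assumes that at least one indexed predicate is supplied.

```python
from functools import reduce


def intersect_sorted(a, b):
    """Intersect two ascending RID-lists with a linear two-pointer scan."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def build_index(rows, column):
    """Map each value of `column` to the ascending list of RIDs holding it."""
    index = {}
    for rid, row in enumerate(rows):
        index.setdefault(row[column], []).append(rid)
    return index


def conjunctive_query(rows, indexes, indexed_preds, residual):
    """Apply a conjunction of equality predicates plus one residual predicate."""
    # Step 1: one index lookup per predicate yields one sorted RID-list each.
    rid_lists = [indexes[col].get(val, []) for col, val in indexed_preds]
    # Step 2: intersect the RID-lists pairwise (a left-deep AND-tree).
    matching = reduce(intersect_sorted, rid_lists)
    # Step 3: fetch the surviving rows and apply the residual predicate.
    return [rows[rid] for rid in matching if residual(rows[rid])]


# Hypothetical table; the RID of a row is its position in the list.
table = [
    {"region": "EU", "year": 2007, "amount": 40},
    {"region": "US", "year": 2007, "amount": 75},
    {"region": "EU", "year": 2008, "amount": 90},
    {"region": "EU", "year": 2007, "amount": 120},
]
indexes = {col: build_index(table, col) for col in ("region", "year")}

result = conjunctive_query(
    table,
    indexes,
    indexed_preds=[("region", "EU"), ("year", 2007)],
    residual=lambda row: row["amount"] > 100,
)
```

In this sketch only the rows whose RIDs survive the intersection are fetched, so the residual predicate on `amount` is evaluated on two rows rather than on the whole table.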
Currently, the most popular way of performing RID-list intersection is index ANDing: using a suitable index (often a bitmap index), one or several lists of matching RIDs are constructed for each predicate, the RID-lists for each predicate are merged, and the merged lists are then intersected with one another. This process is shown in FIG. 1, where lists of RIDs 104 are merged into segments 102, which are then intersected according to a predicate (AND) 100. The usual implementation of index ANDing has two problems: (a) the RID-list merge 102 is often expensive, and (b) the intersection 100 is done via a binary AND-tree, whose performance is highly sensitive to the ordering of the lists in the tree.
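The conventional process of FIG. 1 can be sketched, under simplifying assumptions, as an eager merge per predicate followed by a left-deep binary AND-tree. The function names `merge_rid_lists`, `intersect_sorted`, and `index_anding` and the sample RID values are hypothetical; the sketch assumes every input list is sorted in ascending order and is not intended to reproduce any specific product implementation.

```python
import heapq


def merge_rid_lists(lists):
    """Merge step (102): combine the sorted RID lists of one predicate
    into a single sorted, duplicate-free list."""
    merged = []
    for rid in heapq.merge(*lists):
        if not merged or merged[-1] != rid:
            merged.append(rid)
    return merged


def intersect_sorted(a, b):
    """One binary AND node: linear two-pointer intersection of sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def index_anding(per_predicate_lists):
    """Eagerly merge every predicate's lists, then intersect the merged
    lists pairwise in the order given (a left-deep binary AND-tree)."""
    merged = [merge_rid_lists(lists) for lists in per_predicate_lists]
    result = merged[0]
    for rid_list in merged[1:]:
        result = intersect_sorted(result, rid_list)
    return result


# Three hypothetical predicates, each matched by several RID lists (104).
rid_lists = [
    [[1, 5, 9], [2, 5, 7]],   # predicate A
    [[2, 3], [7, 9, 11]],     # predicate B
    [[2, 9], [4]],            # predicate C
]
matching_rids = index_anding(rid_lists)
```

Note that every RID list is merged in full before any intersection begins, even if most of its entries cannot appear in the final result; this corresponds to problem (a) above.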
This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which the RID-lists are intersected and to the choice of which predicates are treated as residuals. If the optimizer chooses the wrong order or the wrong residual because of a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected.
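The sensitivity to intersection order can be seen in a small experiment that counts the loop iterations of a linear two-pointer intersection under two orderings of the same lists. The lists, their sizes, and the helper names `intersect_counting` and `left_deep` are hypothetical, and the iteration count is only a rough proxy for cost; the sketch shows the direction of the effect, not its magnitude on real workloads.

```python
def intersect_counting(a, b):
    """Two-pointer intersection that also reports its loop iterations."""
    out, i, j, steps = [], 0, 0, 0
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out, steps


def left_deep(lists):
    """Intersect the lists in the given order; return (result, total steps)."""
    result, total = lists[0], 0
    for nxt in lists[1:]:
        result, steps = intersect_counting(result, nxt)
        total += steps
    return result, total


big_a = list(range(0, 1000))      # matches every row
big_b = list(range(0, 1000, 2))   # matches every second row
small_c = [4, 10]                 # highly selective predicate

# Poor order: the two large lists are intersected first.
bad_result, bad_steps = left_deep([big_a, big_b, small_c])
# Better order: the selective list is applied first.
good_result, good_steps = left_deep([small_c, big_a, big_b])
```

Both orderings return the same RIDs, but the ordering that starts with the two large lists performs far more iterations, which is the situation an optimizer falls into when a cardinality estimate is poor.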