1. Field
Embodiments of the present invention relate generally to databases and, more particularly, to accelerating database queries.
2. Background Art
Databases commonly organize data in the form of tables, each table having a number of rows and columns. Each row in a table generally has a data value associated with each of the columns; the intersection of a row and a column is commonly called a cell. A system needing access to data in the database typically issues a request in the form of a query. A query usually involves a request for the data contained in one or more cells of any rows which satisfy a particular condition. This condition often involves the comparison of the values of cells in a column to some other value to determine whether the row associated with the compared cell satisfies the condition.
A direct comparison of each cell of interest in a table to a value is often computationally expensive, and database developers have accordingly introduced means by which rows satisfying a comparison operation can be more readily determined without the need to traverse every row of a table. A common optimization involves the use of a tree-based index structure to determine which rows contain a desired value. Each node of the tree represents a distinct value appearing within a particular column in any row of the table. Each node of the tree connects to a data structure representing the set of all rows in the table where the indexed column contains the specified distinct value. One such data structure that can be used to represent a set of rows is a bitmap, where each bit with a ‘1’ value within the bitmap corresponds to a row containing the specified distinct value.
This approach is reasonably efficient when an exact value is desired, such as with, for example, a query for all rows in which a particular column has the string value “Sybase”. In this approach, database software would traverse the tree structure to locate the node corresponding to the distinct string value “Sybase” and retrieve an associated bitmap. The rows for which the value of the particular column is “Sybase” would be represented by “set” bits in the bitmap (i.e., bits set to a ‘1’ value to indicate that the corresponding row satisfies the condition, while a ‘0’ value indicates that it does not). As a result, the database software is able to simply retrieve those rows and produce a result set from them.
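The exact-value lookup described above can be illustrated with a minimal sketch. It is not taken from any particular database product: a Python dictionary stands in for the tree-based index, a Python integer stands in for each bitmap (bit i is ‘1’ when row i holds the value), and the names build_bitmap_index, rows_from_bitmap, and vendors are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column_values):
    """Map each distinct value to an int whose bit i is 1 when row i holds it.

    Assumption: a dict stands in for the tree-based index structure.
    """
    index = defaultdict(int)
    for row_id, value in enumerate(column_values):
        index[value] |= 1 << row_id
    return dict(index)


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set to 1, in ascending order."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical column contents, one entry per row.
vendors = ["Sybase", "Oracle", "Sybase", "IBM", "Sybase"]
vendor_index = build_bitmap_index(vendors)

# Locate the entry for the distinct value and read the rows off its bitmap.
sybase_rows = rows_from_bitmap(vendor_index.get("Sybase", 0))
```

Under these assumptions, sybase_rows holds rows 0, 2, and 4, and a value that never appears in the column yields an empty bitmap without any row of the table being visited.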
In recent years, among both database vendors and database scholars, there has been significant interest in the use of bitmap-based methods for the evaluation of a set of conditions. These sets of conditions can be combined using typical Boolean combining operators (e.g., AND, OR, and NOT), and these combinations can be arbitrarily nested within each other.
One example of such a combination of conditions, using SQL, is a WHERE clause expression that includes both a set of conjunctive conditions (i.e., a set of conditions connected by one or more AND operators) and a set of disjunctive conditions (i.e., a set of conditions connected by one or more OR operators). The goal of a database optimizer when faced with a query containing such a WHERE clause is to determine the fastest way to identify the set of rows in a table that satisfy the conditions in the WHERE clause.
To process such expressions and index-based operations, conventional approaches may take the original or transformed complex condition expression, and render it into a tree of simple conditions (i.e., conditions which do not contain other conditions) and complex conditions (i.e., conditions containing one or more other conditions and one or more Boolean combining operators). Each simple condition may then be evaluated in isolation, and then the simple binary combining operators may be applied to each pair of results from lower level conditions in the tree.
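The conventional tree-based evaluation can be sketched as follows, under stated assumptions: each simple condition is evaluated in isolation by scanning its whole column, and binary AND and OR operators then combine the resulting bitmaps pairwise. The classes Simple, Binary, and Not, the function evaluate, and the sample table are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Simple:
    """A simple condition: a predicate applied to one column."""
    column: str
    predicate: Callable[[object], bool]


@dataclass
class Binary:
    """A complex condition joining two child conditions with AND or OR."""
    op: str
    left: "Node"
    right: "Node"


@dataclass
class Not:
    """A complex condition negating one child condition."""
    child: "Node"


Node = Union[Simple, Binary, Not]


def evaluate(node, table, row_count):
    """Return an int bitmap in which bit i is 1 when row i satisfies node."""
    if isinstance(node, Simple):
        # Evaluated in isolation: every row of the column is tested.
        bitmap = 0
        for row_id, value in enumerate(table[node.column]):
            if node.predicate(value):
                bitmap |= 1 << row_id
        return bitmap
    if isinstance(node, Not):
        all_rows = (1 << row_count) - 1
        return all_rows & ~evaluate(node.child, table, row_count)
    left = evaluate(node.left, table, row_count)
    right = evaluate(node.right, table, row_count)
    if node.op == "AND":
        return left & right
    if node.op == "OR":
        return left | right
    raise ValueError(f"unknown operator: {node.op}")


# Hypothetical table, stored column by column.
table = {
    "vendor": ["Sybase", "Oracle", "Sybase", "IBM"],
    "year": [1995, 2001, 2010, 2010],
}

# vendor = 'Sybase' AND (year > 2000 OR year < 1990)
condition = Binary(
    "AND",
    Simple("vendor", lambda v: v == "Sybase"),
    Binary(
        "OR",
        Simple("year", lambda y: y > 2000),
        Simple("year", lambda y: y < 1990),
    ),
)
result = evaluate(condition, table, row_count=4)
```

In this sketch only row 2 satisfies the combined condition, yet all three simple conditions scan every row of their column, which is the cost that the extensions discussed next attempt to reduce.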
Some existing databases extend the above outlined conventional Boolean combination method by enabling a simple condition, within a sequence of conjunctive conditions, to accept as an input a bitmap which represents the rows that have satisfied all the preceding conjunctive conditions. Other existing databases have extended their naïve Boolean combination method in an orthogonal direction by supporting n-ary OR and n-ary AND operations instead of restricting themselves to a tree of binary AND and OR operators. However, such conventional methods preclude many mechanisms that could potentially be used to determine and reduce the cost of evaluating simple conditions.
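The first extension, in which a simple condition accepts a bitmap of the rows that satisfied all preceding conjunctive conditions, can be sketched as below. This is an illustrative assumption about how such an input bitmap might be used, not a description of any specific database; eval_simple, eval_conjunction, and the sample table are hypothetical names, and the loop over a list of conditions also stands in for an n-ary AND.

```python
def eval_simple(values, predicate, candidates):
    """Test predicate only on rows whose bit is 1 in candidates.

    Returns the surviving bitmap and the number of rows actually tested.
    """
    result = 0
    tested = 0
    row_id = 0
    remaining = candidates
    while remaining:
        if remaining & 1:
            tested += 1
            if predicate(values[row_id]):
                result |= 1 << row_id
        remaining >>= 1
        row_id += 1
    return result, tested


def eval_conjunction(table, conditions, row_count):
    """Evaluate an n-ary AND, feeding each condition the survivors so far."""
    candidates = (1 << row_count) - 1  # initially every row is a candidate
    tested_counts = []
    for column, predicate in conditions:
        candidates, tested = eval_simple(table[column], predicate, candidates)
        tested_counts.append(tested)
        if candidates == 0:
            break  # no row can satisfy the remaining conditions
    return candidates, tested_counts


# Hypothetical table, stored column by column.
table = {
    "vendor": ["Sybase", "Oracle", "Sybase", "IBM"],
    "year": [1995, 2001, 2010, 2010],
}

# vendor = 'Sybase' AND year > 2000
conditions = [
    ("vendor", lambda v: v == "Sybase"),
    ("year", lambda y: y > 2000),
]
survivors, tested_counts = eval_conjunction(table, conditions, row_count=4)
```

Here the second condition tests only the two rows that survived the first, rather than all four. The order in which the conditions are listed affects how much work is saved, and this sketch makes no attempt to choose that order.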
Accordingly, system, method, and computer program product embodiments are desired that overcome the limitations of conventional approaches and accelerate database queries.