1. Field of the Invention
This invention generally relates to optimization of range queries in relational databases, and, more particularly, to a method for accelerating range queries using periodic monotonic properties of non-monotonic functions.
2. Description of the Related Art
Indexes are used in most database management systems to speed up lookups of point queries as well as lookups of range queries, which are limited by a lower and upper bound. They are defined for one or multiple columns. The multi-dimensional clustering (MDC) works, for example, as in DB2® manufactured by IBM Corporation.
The ability to cluster on column expression is useful for rolling up dimensions to a coarser granularity. Examples include rolling up an address to a geographic location or region, rolling up a date to a week, month, or year. Generated columns are used to implement the rolling up of dimensions.
For example, to create a table clustered on one base column and two column expressions in DB2® the following commands may be used:
CREATE TABLE T1(c1 DATE, c2 INT, c3 INT, c4 DOUBLE,
c5 DOUBLE GENERATED ALWAYS AS (c3+c4), // c5 is an expression based on columns c3 and c4,
C6 INT GENERATED ALWAYS AS (MONTH(C1))) //c6 rolls up column c1 to a coarser granularity in time
ORGANIZE BY DIMENSIONS (c2, c5, c6)
The above statement will cluster the table based on the values in columns c2, c5, and c6.
The concept of multi-dimensional clustering has been used, for example, in DB2® as MDC. An example of MDC clustering is given in FIG. 1.
When a query selects values from a column x, then the index pre-defined on x is used for optimization. If the query selects values from a column x that does not have an index, the index on a column y that is derived from x can be used if, and only if, there is the guarantee that columns x and y are monotonic with respect to one another. Currently, if the two columns are not monotonic with respect to each other, then the conventional approach cannot be used.
Monotonicity is currently exploited to reduce the number of columns that need to be indexed. Since there is a considerable amount of space that needs to be allocated for each index and for each new dimension, this reduction has obvious advantages. Consider a base column x and a derived column y=F(x) where F(x) is monotonic. Then a range query over x (the range is xi . . . xj) can be answered by performing a range query over y (over the range F(xi) . . . F(xj)) which will retrieve the same blocks of data as the user query. Thus, if the values of a derived column y behave monotonically with respect to the values in a column x, then the index on x can be used to answer range queries on column y, and vice versa.
There are many instances where two columns are piecewise monotonic, meaning that if split into intervals, their values are monotonic over these intervals, although the type of monotonicity, increasing or decreasing, may change from one interval to another. Such a condition is referred to as periodic piecewise monotonic (PPM). However, some functions are observed to be PPM without the explicit knowledge of the user, in which case the manual approach would not even be feasible.