The present invention relates to databases, and more particularly to data structures used in relational database management servers (RDBMS).
Online Analytical Processing (OLAP) applications are used in many business contexts, such as business reporting for sales, marketing, management reporting, budgeting and forecasting, etc. The OLAP applications typically obtain their data from Relational Database Management Systems (RDBMS). Such RDBMS systems can contain tables with billions of tuples (i.e., objects and information about the objects), which can require many terabytes of persistent storage.
A predicate can be described in general terms as a truth-valued function, i.e., a function that, when it is invoked, returns a truth value, such as “yes”, “no” or “not defined”. In RDBMS, predicates are used in Structured Query Language (SQL) expressions for conditional computation, such as evaluation of filter expression, performing join, etc. Predicate expression used in SQL could be of different types i.e. comparison (equality, less than, greater than, or not equal), range (between, combination of different comparison predicates using AND/OR), IN, Like, Exists, Equivalent, or any other Boolean expression implemented in SQL either proprietary or as per defined by standards.
One example of a predicate is a range predicate. A range predicate can be used, for example, to determine if a certain variable falls within a particular range specified by the range predicate (e.g. “Is variable A in the range of 100-200?”). When a range predicate is executed on a RDBMS table (without any index on the column), a full table scan is performed, i.e., all the column values need to be examined. Full table scans, however, slow down the execution of SQL queries and also consume excess resources, such as CPU resources, disk space, buffer space, cache, etc., and are therefore not desirable. One way full table scans can be avoided is to create suitable indexes on the required columns, i.e., those columns that are used for filtering or join expressions, which may vary from query to query.
An example of such an index is a B+ index. The B+ index is a type of binary tree that represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. In the B+ tree, all records are stored at the leaf level of the tree, and only keys are stored in interior nodes. The B+ index facilitates faster execution of SQL queries since it provides random access to all of the tuples in a given column. However, in order to provide such access, the B+ index must store all the unique tuple-values along with their identifiers. B+ (and other similar) indexes are therefore often considered costly due to their space and maintenance requirements.
An alternative to using the B+ index is to use a so-called Sparse index. The Sparse index stores only a limited set of tuple-values and a tuple-identifier for each tuple-value. As a result, the Sparse index is more storage space efficient than a conventional B+ index, but is still useful when executing range predicates. However, a Sparse index can currently be implemented only on a column having sorted data, based on the data values in the column. Thus, when executing a range predicate on an unsorted column, it is still necessary to perform a full table scan or use a bulky conventional index, such as the B+ index.