1. Field
Embodiments of the invention relate to efficient evaluation of index screening predicates.
2. Description of the Related Art
Relational DataBase Management System (RDBMS) software may use a Structured Query Language (SQL) interface. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
A RDBMS uses relational techniques for storing and retrieving data in a relational database. Relational databases are computerized information storage and retrieval systems. Relational databases are organized into tables that consist of rows and columns of data. The rows may be called tuples or records or rows. A database typically has many tables, and each table typically has multiple records and multiple columns.
A table in a database can be accessed using an index. An index is an ordered set of references (e.g., pointers) to the records in the table. The index is used to access each record in the table using a key (i.e., one of the fields or attributes of the record, which corresponds to a column). The term “key” may also be referred to as “index key”. Without an index, finding a record requires a scan (e.g., linearly) of an entire table. Indexes provide an alternate technique to accessing data in a table. Users can create indexes on a table after the table is built. An index is based on one or more columns of the table.
A query may be described as a request for information from a database based on specific conditions. A query typically includes one or more predicates. A predicate may be described as an element of a search condition that expresses or implies a comparison operation (e.g., A=3).
Indexable predicates may be described as predicates that are applied to index columns (i.e., columns used to build an index). An indexable predicate may be described as either a matching predicate or a screening predicate. Matching predicates may be described as having a constraint on leading index columns and, hence, may be used to define the search range of index scans. Screening predicates, on the other hand, are used to provide additional filtering once a key is looked up in an index scan. For example, let IX1 be an index on columns C1 and C2. In Query (1), the predicate ‘C1 BETWEEN 1 AND 95’ is a matching predicate, and predicate ‘C2 BETWEEN 50 AND 52’ is a screening predicate.
Query (1)
SELECT * FROM T
WHERE C1 BETWEEN 1 AND 95                AND C2 BETWEEN 50 AND 52;        
An example is provided below to explain the nature of the problem. In the example, table T is a table of transactions in year 2003, and index IX1 is a 2-column index over date and amount for table T. The Query (2) finds all the ‘high volume’ transactions in the first half of year 2003. Suppose an optimizer of the RDBMS exploits index IX1 for the scan of table T. In this example, the 1-column matching index scan has poor matching selectivity (roughly 50%) and high screening selectivity (assuming 2%).
Query (2)
SELECT * FROM T
WHERE date BETWEEN ‘2003-01-01’AND ‘2003-06-30’<-50% selectivity
AND amount>=300000; <-2% selectivity
In this example, table T contains one million records. The index scan for the matching predicate results in a scan of roughly 500,000 keys. For each key found to meet the matching predicate, the screening predicate is applied, which, results in 500,000 evaluations of the screening predicate. However, less than 20,000 keys actually survive the screening predicate in this example.
A B-tree index is a binary tree that may be used to store the references to the records in a table. FIG. 1 illustrates a sample B-tree index 100 that represents the results of Query (2). In the B-tree index 100, the bottom level contains “leaves” or “leaf pages”. Each leaf page contains a list of keys. Each leaf page includes a pointer to a next leaf page. For the matching range, there are a large number of false-positives. A false-positive may be described as a key that results in a match for the matching predicate but not for the screening predicate.
Thus, there is a need in the art for efficient evaluation of index screening predicates.