The present invention relates to technique for indexing data in a database system.
In typical database systems, users store, update, and retrieve information by interacting with the database system by submitting commands to a database system. The database system responds to the commands by performing the specified actions on the database managed by the database system. To be correctly processed, the commands must comply with the database language that is supported by the database server. One popular database language is known as Structured Query Language (SQL).
To retrieve data from the database server, users typically submit queries that conform to the database language. To retrieve data that satisfies criteria, users submit queries that specify the retrieval criteria. Retrieval criteria is criteria that the data to be retrieved must satisfy. In response to receiving a query that specifies retrieval criteria, a database system may use an index to retrieve the requested data more efficiently.
A typical index associates values from a field with records that contain those values for the field. The field is referred to as the key field. Values in the key field are referred to as xe2x80x9ckey valuesxe2x80x9d, or simply as xe2x80x9ckeysxe2x80x9d. An entry in a typical index is in the form  less than key, record id greater than .
A record id is data that identifies a record, such as a row in a relational database table. For a particular entry in the index, the record identified by the record id within the entry has a key value that matches the key value specified in the index entry. An index that contains entries that each associate a key with a single record is herein referred to as a one-to-one index.
An index is referred to as an xe2x80x9cequality indexxe2x80x9d if the entries of the index associate keys to one or more records that contain key values that match the key values specified in the index entries. An equality index is useful for more efficiently executing queries that specify criteria based on equality, that is, criteria which require that all retrieved records have a field set to a value equal to a specified value. The database system examines the index, scanning the entries to locate those that have keys equal to the specified value to determine what records to retrieve. The database system then retrieves those records identified by the entries that have keys equal to the specified value.
Another type of an index is a bitmap index. An index entry in a bitmap index may have the form:  less than key, bitmap greater than .
An entry in a bitmap index may associate a key with multiple records. The records to which the key of an entry is associated are identified by the bitmap in the entry. Specifically, a bitmap is a sequence of bits, where each bit in the sequence has a bit position, and where each bit position corresponds to a record. The bitmap indicates that a record is associated with the index key when the bit at the bit position in a bitmap that corresponds to the record is set to particular value, e.g. 0 or 1, or ON or OFF.
Bitmap indexes may be advantageously used for indexing fields that contain low cardinality data. Low cardinality data is data that includes a relatively low number of distinct values. On other hand, high cardinality data is data that includes a relatively high number of distinct values. For example, in a database system for a payroll system used to manage thousands of employees, a field may contain data representing a state (e.g. California). Such a field may have up to 50 distinct values, while a field that represents the salary of an employee may have many more unique values.
Bitmap entries typically contain one entry for each distinct value in the key field. For a particular set of records, a bitmap index of a field that contains low cardinality data has far fewer entries than a one-to-one index. For a field that contains high cardinality data on the other hand, the difference in the number of entries in a bitmap index and a one-to-one index is much smaller. Furthermore, using bitmap indexes to index high cardinality data creates many entries that have just a few bits set to identify records.
Not all queries specify equality-based criteria. Rather, a query may specify that the data to be retrieved satisfy criteria based on a range of values (xe2x80x9crange criteriaxe2x80x9d). For example, a query may request records with a value in the salary field that falls between one thousand and two thousand dollars.
An equality based index of the salary field can be used to retrieve records for a query that specifies a salary range. Under these circumstances, the database system scans the entries that correspond to all keys that fall within the range. Thus, when a query specifies range criteria, scanning an equality index to determine what records need to be retrieved generally involves scanning more entries than are scanned when the query specifies equality based retrieval criteria.
Furthermore, queries that specify range criteria often specify range criteria that limits values in fields that contain high cardinality data. For example, a query may specify range criteria that limits values in a field that contains a numeric value used to represent a salary. Thus, fields used for range criteria are not amenable to bitmap indexes.
Based on the foregoing, it clearly desirable to provide an index that may be scanned to determine what records satisfy range-based retrieval criteria, and that allows queries that specify such criteria to be processed without scanning as many entries as would have to be scanned using equality indexes.
A method and mechanism is described for indexing a body of records. An index associates ranges with records that hold key field values that fall within those ranges. Such an index may be implemented as a bitmap index. The bitmap index may contain entries that associate a range with a bitmap. The bitmap of an index entry identifies which records have a key field value that falls within the range associated with the entry. The index may be a native index maintained by a database system. The database system uses the index to efficiently process queries that specify range criteria.