1. Field of the Invention
The invention relates to a technique, specifically a method, apparatus, and article of manufacture that implements the method, to organize and access rows in a range-clustered table in a database management system. This technique is particularly, though not exclusively, suited for use within a database management system.
2. Description of the Related Art
Database management systems allow large volumes of data to be stored and accessed efficiently and conveniently in a computer system. In a database management system, data is stored in database tables which effectively organize the data into rows and columns. A database engine responds to user commands to store and access the data.
In FIG. 1, a database table 30 having rows 32 and columns 34 is shown. A row 32 has one or more columns 34. A row 32 is also referred to as a record.
In the database table 30, the columns 34 are associated with a datatype that defines the type of data stored in that column. The datatype may be selected from a set of predefined datatypes such as numeric, integer, decimal, string, character, date, time and timestamp, among others. In a typical database management system, some datatypes assign a predefined length to a column. Such exemplary datatypes include, and are not limited to, “INTEGER,” “DECIMAL,” “DATE,” “TIME,” “TIMESTAMP,” and “CHAR(N)” which allocates space for N bytes in a character column. Other datatypes store data having a varying length such as VARCHAR, Binary Large Object (BLOB), and VARGRAPHIC.
To more quickly access the data in a database table, an index is generated based on one or more specified columns of the database table. This set of specified columns is referred to as a key or index key. Each row of the database table is associated with a key. The value of the key is determined by the values stored in the columns making up the key. In the index, the values of the keys are ordered. Each key value is associated with a least one row identifier, or pointer, to its associated row of the database table. Therefore, the keys provide access to all the rows in a database table. The database engine accesses the index to quickly locate a row of data in a database table based on the value of the key. Because the columns of a key are a subset of the columns of the database table, the columns of the key are associated with a datatype, and this datatype information is stored as column description information for that index.
Traditionally, two types of methods are used to access rows—balanced tree (B-tree) indexes and hash indexes. In FIG. 2, a query uses a B-tree 40 to access a particular row 36 in the database table 30. A B-tree 40 is a data structure, stored on disk, that stores indexes. As shown in FIG. 3, the B-tree 40 has a primary data structure 42 that is used to access a secondary data structure 44 storing indexes 46. The indexes 46 of the secondary data structure point to the locations on the disk where the rows are stored. The B-tree 40 allows rows to be read in any order. When searching through a range of key values, the rows can be found efficiently even if those rows are distributed across the disk. The B-tree is typically structured to minimize the number of disk accesses to retrieve a row.
In FIG. 3, the primary data structure 42 has values A1 . . . An. Index values less than A1 are stored in secondary structure 44-1. Index values greater than or equal to A1 are stored in secondary structure 44-2. Values greater than or equal to An are stored in secondary structure 44-3. The index value v(2,2) points to row 48, and the index value v(n+1,o) points to row 50.
The hash index transforms a key into an index value, which is used to store and retrieve a row of data. The hash index performs well when accessing unique keys, such as social security numbers. However, the hash index does not perform range searches efficiently because each row must be read. For example, the hash index cannot efficiently be used to search for all employee names that start with “T”.
Although B-tree indexes are efficient at retrieving rows and searching ranges of rows, B-tree indexes require many accesses to the hard disk and performance can be slow. Although hash indexes are efficient at retrieving rows with unique key values, hash indexes cannot perform range searches efficiently because each row must be scanned. Therefore, there is a need for a technique to access a database table that increases the speed of accessing data and allows range searches to be performed efficiently.