1. Technical Field
The invention relates to the field of databases, and in particular to creating indexes for databases.
2. Discussion of the Related Art
Along with the development of computer technology, the application of a database is more and more prevalent, and the capacity of databases is increasingly larger. In order to quickly query a database, existing database management systems create indexes for databases. Indexes are created on some columns in a database table. For example, an index can be created on columns such as: 1) columns that need to be searched frequently (to speed up searching); 2) columns that are used as a primary key (to enforce uniqueness of the column and organize arrangement structure of data in the table); 3) columns that are frequently used for a ‘join’ operation between tables (such columns can be foreign keys and may speed up ‘join’); 4) columns that frequently need to be searched according to scope (the data in the designated scope is continuous since the index has been ordered); 5) columns that frequently need to be sorted (a query that requires sorting can utilize the order of indexes to speed up the query, since indexes have been sorted); and 6) columns that are frequently used in a WHERE clause to speed up computation of conditions in the where clause.
Creation of an index can greatly improve performance of a database system. Uniqueness of each row of data in a database table can be assured by creating a unique index, such that searching of data can be accelerated and a ‘join’ operation between tables can also be accelerated, specifically, it is particularly meaningful in realizing reference integration of data. When data is searched by using a group and order clause, the time of grouping and sorting in a query can also be significantly reduced. By utilizing an index, optimization may be used during the query, thereby improving performance of system.
In existing database management systems, a B+Tree index is the most common index structure, and an index created by default is such type of index. A B+Tree index is a type of balance tree and is based on a binary tree; it is composed of a root node, an intermediate node and a leaf node. Root nodes, intermediate nodes and leaf nodes are all located in an index page and are called as an index node. The node that does not have parent node is a root node, the node that does not have child node is a leaf node, and the node located between root node and leaf node is an intermediate node. A root node stores a pointer pointing to an intermediate node; an intermediate node is located between a root node and a leaf node and stores a pointer pointing to next level intermediate node or leaf node; and a leaf node stores a pointer pointing to a data page (i.e., a physical storage location of data).
A database query is often a multiple-stage query. Assume there are two tables, T1 and T2, in which T1 has two indexes: index T1_i1 on a first column col1 and an index T1_i2 on a second column col2, taking the following query for example:
select*from T1, T2 where T1.col1=T2.col1 and T1.col2>=60 and T1.col2<90
The database management system first scans index T1_i2 by using the condition T1.col2>=60 and T1.col2<90 to obtain an intermediate result, and the intermediate result is copied into memory or other temporary storage location. At this time, the intermediate result corresponds to content between 60 and 90 indexed by the col2 in T1. Then, the column col1 of the intermediate result is matched with the first column of table T2 to obtain a query result. At this time, since what is stored in leaf node of index T1_i1 on the first column col1 of T1 is a pointer pointing to a physical storage location of data in original table T1, and the intermediate result is copied to memory or other temporary storage location, the pointer in index T1_i1 will become invalid for the intermediate result. That is, the intermediate result cannot utilize index T1_i1, thus, a full table traverse has to be performed on the intermediate result. In a large enterprise application, the capacity of the intermediate result is often very large, such that a full table traverse on intermediate result will consume a large amount of computing resources.