The present invention relates to a method for reorganizing a source index tree of a database table resulting in a target index tree.
A database index is a data structure used in database systems. One of many purposes of a database index is an acceleration of operations on a database table. The database index is a sorted list of the contents of one or more table columns of the database table, where each element of the list is associated with a pointer to a respective table row of the database table. While a database index saves time in the reading of data matching some criterion, it costs additional time to update the database index when table data are inserted, modified, or deleted.
Database indices can be stored in various forms of data structures. The most common data structure for a database index is a B+ tree. This index tree keeps data sorted in a way that allows searches, insertions, and deletions of index tree records at a short time. The B+ tree stores index records, each of which is identified by an index key (K) and a reference to one of the table rows of the database table. The index key is constructed from the contents of the one or more table columns the database index is referring to. Each of the table rows is identified by a respective row identifier (RID) or pointer (P).
In many database systems, database indices can be stored at storage locations, which are, for example, called indexspaces, that are separate from storage locations for database tables, which are, for example, called tablespaces.
The tree structure of the B+ tree comprises internal nodes, which are non-leaf nodes of the tree structure, and external nodes, which are leaf nodes of the tree structure. In contrast to the non-leaf nodes, the leaf nodes do not have any child nodes. The non-leaf nodes are parent nodes of the leaf nodes.
The internal nodes of the B+ tree have a variable number of child nodes within a pre-defined child number range, that is, from a minimum child number to a maximum child number. The order b of the B+ tree measures the capacity of child nodes for each parent node, that is, defines the maximum possible number of child nodes. The minimum number of child nodes is typically defined to be half of the order b, that is, b/2, rounded up to the nearest integer.
The external nodes of the B+ tree store sets of the index records, where the sets have pre-defined record number ranges, that is, from a minimum record number to a maximum record number. The internal nodes of the B+ tree, however, do not store index records. The index records have a logical sort order that is defined by a sequence of the identifying index keys. The logical sequence of index records also defines a logical order of the leaf pages of the index tree.
All the internal and external nodes of the B+ tree have respective parent nodes except for the root node, which is at the top level of the index tree.
The internal nodes store respective ordered sets of pointer-key-pairs (pi, ki). The pointer pi refers to a subtree of a child node, which has index records with key values that are less or equal than the key ki and greater than the key ki-1 of a preceding pointer-key-pair (pi-1, ki-1).
The number of nodes along an index tree branch descending from the root node to a leaf node defines a height h of the index tree. A B+ tree is kept balanced by requiring that all leaf nodes have the same index tree height n.
If a storage system has a block or page size of B bytes, and each of the pointer-key-pairs to be stored in the non-leaf nodes has a size of k, the most efficient B+ tree has a maximum child number of b<(B/k) for the non-leaf nodes. In this case, the physical storage size of one of the non-leaf nodes does not exceed the block or page size of the storage system. In the same way, a maximum number of index records can be calculated for the leaf nodes that is most efficient for a given block or page size. In the remainder of the description, the term “page” is used to describe a segment of storage space that holds information represented by one node of the index tree. The sequence of physical storage locations for respective leaf pages defines a physical order of the leaf pages.
When a table row is added or removed from a database table that is associated with a database index, a corresponding index record must be respectively added or removed from the database index. In the case of the B+ tree, all insertions and deletions of index records happen in the leaf nodes.
When a specific index record is to be inserted into a specific leaf page and the number of index records of the specific leaf page exceeds the pre-defined maximum record number, the specific leaf page can be subject to a split operation. In this case, a database management component of a database management system (DBMS) determines an unused storage location for a leaf page based on a space map page and allocates the free storage location to a new leaf page. In the database storage, the allocated storage location should be as close to the storage location of the specific leaf page as possible. Typically, half of the index tree records of the specific leaf page are moved to the new leaf page. A new pointer is added to the parent non-leaf page of the specific leaf page, where the new pointer refers to the new leaf page.
When an index record is deleted from a specific leaf page and the number of index records of the specific leaf page falls below a pre-defined minimum record number, the specific leaf page can be subject to a join operation. A join operation joins index records of two leaf pages that are adjacent in the logical sequence of leaf pages. The two leaf pages do not need to have adjacent physical storage locations. The join operation of the two leaf pages is only possible if their total number of index records falls below the maximum record number. After the join operation, one of the pointers referring to the two leaf pages is deleted from the respective parent non-leaf page and one or more key values of one or more parent non-leaf pages are adapted to key values of the joined leaf page.
The insertion and deletion of child pointers the non-leaf pages can cause split and join operations of the non-leaf pages in the same way as described for index records of the leaf pages. The root node of the index tree, however, plays a special role: When more child nodes are to be added to the root node than a pre-defined maximum child number, the root node is split into two non-leaf nodes and a new root node is created for the index tree, where the two split non-leaf nodes become child nodes of the new root node. In this case, the height of the index tree is increased by one. When child nodes are removed from the root node and the root node only has one child node left, the root node is deleted and the child node becomes a new root node of the index tree. In this case, the height of the index tree is decreased by one.
B+ trees can waste some storage space since leaf and non-leaf nodes are not always entirely full, that is, respectively have less than the pre-defined maximum number of records and child nodes.
Join and split operations can cause a fragmentation of the leaf pages. Due to the split operations, the logical order of the leaf pages will become different from the physical order of the leaf pages. The join operations will leave unused storage locations after deleting leaf pages. The fragmentation of the leaf pages can significantly reduce the performance of the database index. When a database index has been perfectly reorganized and only a few split and join operations have fragmented the database index, subsequent leaf pages in the logical order are mostly neighbours in physical storage or at least not too far away from one another. To keep a database index defragmented as far as possible, the database management component tries to avoid split and join operations.
The database management component accesses leaf pages of the index tree by reading chunks of leaf pages and writing them to a cache. The leaf pages of one chunk are subsequent in the physical order, but not necessarily subsequent in the logical order. The more the leaf pages are fragmented, the more chunks of leaf pages must be read by the database management component to get a logical sequence of leaf pages. Thus, the increasing fragmentation of the leaf pages reduces the performance of the database system.
To improve the performance of the database system, a so-called reorganization can eliminate a difference between the physical order and the logical order of the leaf pages of the database index. Prior art database systems are simply rebuilding the index tree based on current information found in one or more columns of the database table. The database management component scans the database table in a physical order of the table rows. This is also called a full table scan. For each of the table rows, a respective index record is created. If the total set of created index records is small, the index records can be sorted in the memory of the database system. If the total set of index records is too large to perform the sort operation in the memory, smaller subsets of index records can be separated and sorted in the memory. The sorted subsets can be temporarily stored in permanent storage and merged into a sorted total set of index records in the memory. The reorganization is called online, when the new database index is rebuilt in a shadow object, while concurrent transactions are accessing the old database index. Once the rebuild of the new database index has been completed, the database management component will re-read database logs to update the new database index incrementally. The updated new database index will replace the original version. Storage space used by the original database index will be released after the online reorganization.
The leaf pages of the B+ tree are linked to one another in a linked list according to the logical order of the leaf pages. Each of the leaf pages has a pointer to a preceding leaf page and a pointer to a succeeding leaf page of the linked list of leaf nodes. The first and last leaf page of the linked list only have one pointer to the respective adjacent leaf page. These pointers make range queries simpler and more efficient. The links to adjacent leaf pages allow quickly traversing the list of leaf pages in the logical order without reading pointer information from the parent non-leaf pages.
A database backup and restore creates a backup image, in the literature also named as backup copy, of a source database system, which is used to rebuild a target database system. The backup and restore of a database system may be also denoted as a database copy. The backup image of one database system can be represented by one or more files or datasets. Database tables and indices may be subdivided to different, files or datasets. Reasons for a database backup and restore are data protection against loss and a setup of a system environment with multiple almost identical database systems that may be used as development, quality assurance and production systems. A different, for example, more powerful, hardware can also account for a database backup and restore.
Operating system components are used for transferring the backup image from the source to the target database system. In prior art, a reorganization of the database indices is performed after the restore of the target database system because the physical order of the database indices is maintained. As described above, a reorganization involves a rebuild of the database index trees, which can take a long time.