This disclosure relates generally to the field of computer databases. More particularly, but not by way of limitation, it relates to a technique for avoiding or reducing index page splits when inserting large numbers of rows into a table of a relational database.
Computer databases have been an important part of enterprises for decades. Two major types of computer databases are hierarchical databases and relational databases. Hierarchical databases, which were developed prior to relational databases, are arranged into tree-like hierarchical structures representing logical relationships. Relational databases organize data into tables of rows and columns. One common relational database management system (DBMS) is the DB2® database system marketed by International Business Machines Corporation (IBM). (DB2 is a registered trademark of IBM.)
Relational databases typically store indices in addition to the actual data stored in the database. These indices allow efficient access to desired data in the database. Indices are typically implemented as B-trees, with the actual index data stored in leaf nodes in the B-tree.
In many relational databases, the leaf nodes are not individual key data, but groups of keys and associated data, typically kept as pages of a fixed size convenient for the underlying computer system. For example, in a DB2 database, index leaf pages can be 4 KB, 8 KB, 16 KB, or 32 KB in size, and are typically stored as 4 KB pages on disc. Each page contains a group of index keys, sequentially ordered on the page.
Also typically, non-leaf nodes store key data representing the highest key value on corresponding leaf nodes. They also contain groups of keys, but without the associated data stored in the leaf pages. The non-leaf page structure and sizes are typically the same or similar to leaf pages.
In addition to the links that are found in a traditional B-tree to allow traversal of the B-tree, some DBMSs, such as DB2, link leaf pages together in an index with a leaf-page chain pointer. This allows efficient searching for a key or group of keys in situations where traversal of the B-tree would be less efficient. For example, when a truncated search (a search where only a first portion of the desired key is identified in the search) is allowed, traversing the leaf-page chain to get to the next leaf page may be more efficient than traversing the B-tree multiple times.
In such a leaf-page chain, each leaf page in the leaf-page chain has keys that are higher in value (according to some predetermined collating sequence) than the keys in the previous page in the leaf-page chain. Therefore, a search that traverses the B-tree to find the first of a desired set of index leaf pages may start at the first page, read all of the keys on that page, then follow the leaf-page chain to the next page, repeating the procedure to get the desired index keys in sequential order.
Insertion of data into the database adds to the index leaf pages keys that correspond to the inserted data. The inserted keys generally must preserve the sequential ordering of key data in the leaf pages. When a key is added that cannot be placed on an existing leaf page in correct sequential order because there is no room to insert the key on that page, a page split is typically performed to make room for the new key on one of the two resulting pages.
Insertion into leaf pages that cause page splits, or insertions that do not cause a split but modify the highest key value on the leaf page may cause an update of one or more non-leaf pages. Non-leaf pages may also be split when there is no space available to accommodate the high key on new pages caused by leaf page splits. The process is similar to leaf page splits.
Page splits tend to be expensive in terms of time, processing resources, and input/output (I/O) operations, and a way of reducing the number of page splits has long been desired. In addition, page splits have tended to result in less than optimal leaf-page chains. As a result, relatively expensive and time-consuming database reorganizations are frequently performed by database administrators (DBAs) because of inefficiencies resulting from page splits and complex leaf-page chains related to index key insertions.