Computers are used today to store large amounts of data. Such information is often stored in information storage and retrieval systems referred to as databases. This information is stored and retrieved from a database using an interface known as a database management system (DBMS).
One type of DBMS is called a Relational Database Management System (RDBMS). An RDBMS employs relational techniques to store and retrieve data. Relational databases are organized into tables, wherein tables include both rows and columns, as is known in the art. Each row of a table may be referred to as a record.
One type of data structure used to implement the tables of a database is a B-tree, and its variant, a B+-tree. These tree structures are used to implement primary keys, secondary indexes, clustering indexes, non-clustering indexes, unique constraints, and the like.
A B-tree can be viewed as a hierarchical index. The root node is at the highest level of the tree, and stores one or more pointers, each pointing to a child of the root node. Each of these children may, in turn, store one or more pointers to children, and so on. At the lowest level of the tree are the leaf nodes, which have no children. The leaf nodes may contain data records or pointers or other indicia pointing to data records.
In addition to the pointers to child nodes, each of the non-leaf nodes of the B-tree also stores at least one key value used to search the tree for a particular data record. For instance, assume a node stores a first key value, and first and second pointers that each point to a child node. According to one exemplary organizational structure, the first pointer may be used to locate the child node storing one or more key values that are less than the first key value, whereas the second pointer is used to locate the child storing one or more key values greater than, or equal to, the first key. Using the key values and the pointers to search the tree in this manner, a node may be located that stores a record associated with a particular key value that is used as the search key.
Rows, or records, of a table, are generally stored within a logical container referred to as a file. Files reside on a persistent storage media and may contain all of the records of one or more tables. In the art, a file may be divided into equal sized portions called pages. When a database table is represented by a tree structure in the manner discussed above, each page of the file that stores the database corresponds to a node of the tree. Thus, in the following discussion, the terms “node” and “page” are used interchangeably.
DBMS applications typically build B-trees according to the following process. The DBMS application obtains a first record having a first key value that is to be added to new B-tree. A root node is created that points to a leaf node, and the record is stored within the leaf node. When a second record is received, the key value stored within the root node and the second record will be used to determine whether the second record will be stored within the existing leaf node or within a newly created leaf node. The point of insertion will be selected so that all records are stored in a sort order based on the key values. Similarly, as additional records are received, the records are added to the tree by traversing the tree structure using the key values to locate the appropriate location of insertion, then adding leaf nodes as necessary. Whenever it is determined that the root or an intermediate node has too many children, that node is divided into two nodes, each having some of the children of the original node. Similarly, whenever it is determined that a record must be added to a leaf node that is too full to receive the record, the leaf node must be split to accommodate the new record.
When a leaf node must be divided to make room for an additional data record, prior art systems generally split the leaf node at the logical or physical halfway point. If the logical halfway point is used to make the division, two leaf nodes are created, with each storing half of the records of the original leaf node. If the physical halfway point is employed, each of the two leaf nodes stores approximately half of the data stored within the original leaf nodes. For example, assume an original leaf node includes four records, with the first record being roughly three times larger than any of the other three records. If the original leaf node is to be divided into two leaf nodes at the physical halfway point, a first leaf node will store the first record, and a second leaf node will store the remaining three records.
The foregoing prior art method of dividing a leaf node at a halfway point to insert a data record generally provides results in satisfactory memory usage when unsorted records are received in a random manner. However, this approach does not work well when a series of sequential records are received. For example, assume a sequence of sequential records in an ascending sort order is received. A leaf node is located that contains the insertion point for this sequence of records. Assuming that leaf node is full, the prior art method will divide that node at a halfway point into a first and a second node. Assuming the insertion point for the sequence is located within the second node, the first node remains only half full. Assume, further, that after a portion of the sequence is stored within the second node, that node becomes full, and is therefore divided in half to produce a third node. Because the sequence continues where it left off, the insertion of additional records occurs within the third node, leaving the second node only half full. The process of splitting nodes in this manner continues as the sequence is processed, resulting in nodes that are only half full. That is, the “load factor” of the nodes is only 50%, resulting in inefficient use of storage space and causing subsequent search operations to be more time-consuming.
U.S. Pat. No. 5,644,763 to Roy provides an alternative approach to the above-described prior art mechanism. According to this alternative, in some instances wherein a sequence of records are received in an ascending order, a leaf node that includes the insertion point will be divided at that insertion point instead of at a halfway point. This allows the newly-created to nodes to be filled such that the load factor increases above 50%. However, the method disclosed in Roy is only effective if records sorted within an ascending order are added to the end of an existing sequence of records. When records are being added in descending sort order, or are added to a beginning of a sequence of records, the load factor can drop significantly below 50%. This results in even poorer performance than was provided by the alternative prior art method. This can be appreciated by the following example.
Assume that a descending sequence of records is being inserted into a tree structure according to the method described in the Roy patent. A node is located that includes the insertion point for this sequence. The insertion point is not located at the edge of a node. After several records in the descending sequence are stored to this node, the node becomes full. According to Roy, this node will be divided at the insertion point into two nodes, with a first node storing the records in the sequence, and the other node storing other records. These two nodes are linked to their parent node via the index values of the first records in each of the nodes.
Next, another record in the descending sequence is received. This record will, by definition, have an index value less than the index values of the other records in the sequence that have been stored within the tree. Because of the way the nodes are linked to the parent node, the insertion point for this record will not be located on the first node with the other records in the sequence, but will instead be located on the second node. After one or more additional records in the sequence are received, the second node becomes full and another division is performed. One of the resulting nodes contains the records in the sequence, and the other node stores those records not in the sequence. The process is then repeated again. This causes the load factor to trend downward, since the nodes storing records included in the sequence are only partially full, and can never store any additional records.
Although the foregoing example discusses a situation involving an insertion point that is not at the edge of a node, a similar situation occurs when the insertion point is at the edge of a node. Moreover, although the above example relates to a case wherein index values from first records in the nodes are used to associate parent and child nodes, a similar problem exists when index values from the last records in the nodes are used to make this association when an ascending sequence of records is being received.
Thus, what is needed is a system and method for inserting sorted data into a hierarchical data structure in a manner that utilizes storage space more efficiently. This system and method is optimally suited for storing data sorted both in an ascending and a descending order, and for storing data both to the beginning and the end of an existing sequence of data.