Computers are used today to store large amounts of data. Such information is often stored in information storage and retrieval systems referred to as databases. This information is stored and retrieved from a database using an interface known as a database management system (DBMS).
One type of DBMS is called a Relational Database Management System (RDBMS). An RDBMS employs relational techniques to store and retrieve data. Relational databases are organized into tables, wherein tables include both rows and columns, as is known in the art. A row of the horizontal table may be referred to as a record.
One type of data structure used to implement the tables of a database is a B-tree. A B-tree can be viewed as a hierarchical index. The root node is at the highest level of the tree, and may store one or more pointers, each pointing to a child of the root node. Each of these children may, in turn, store one or more pointers to children, and so on. At the lowest level of the tree are the leaf nodes, which typically store records containing data.
In addition to the pointers, the nodes of the B-tree also store at least one key value used to search the tree for a particular data record. For instance, assume a node stores a first key value, and first and second pointers that each point to a child node. According to one exemplary organizational structure, the first pointer may be used to locate the child node storing one or more key values that are less than the first key value, whereas the second pointer is used to locate the child storing one or more key values greater than, or equal to, the first key. Using the key values and the pointers to search the tree in this manner, a node may be located that stores a record associated with a particular key value that is used as the search key.
DBMS applications typically build B-trees according to the following process. The DBMS application obtains a first record having a first key value that is to be added to new B-tree. A root node is created that points to a leaf node, and the record is stored within the leaf node. When a second record is received, the key value stored within the root node and the second record will be used to determine whether the second record will be stored within the existing leaf node or within a newly created leaf node. The point of insertion will be selected so that all records are stored in a sort order based on the key values. Similarly, as additional records are received, the records are added to the tree by traversing the tree structure using the key values to locate the appropriate location of insertion, then adding leaf nodes as necessary. Whenever it is determined that the root or an intermediate node has too many children, that node is divided into two nodes, each having some of the children of the original node. Similarly, it is determined that a record must be added to a leaf node that is too full to receive the record, the leaf node must be split to accommodate the new addition.
The foregoing approach of building trees in a top down manner is generally employed when a tree is created from a stream of unsorted records. In this situation, the relationship existing between the key values of two sequentially received records is unknown. As a result, the tree must always be searched to find the appropriate location for insertion of the record.
Another consequence of the foregoing mechanism involves the allocation of storage space. Generally, each node of the tree is stored in an addressable portion of storage space that can be retrieved using a single I/O operation. This portion of storage space may be referred to as a “page”. When adding records to a node, it may be desirable to store this information in a way that will allow additional information to be readily inserted into the page that stores the node. For example, it may be desirable to reserve space between two records having non-sequential key values so that a record having an intermediate key value can be inserted onto the page without having to move any of the existing records. This objective complicates the building of B-trees from a stream of unsorted records.
In some situations, a DBMS application builds a tree structure from a stream of sorted records. That is, a know relationship exists between the key and/or index values of sequentially received records. For example, it may be known that the key value of a first record will be less than that of the record that is encountered next, which, in turn, will be less than that of the record received thereafter. In this type of situation, it may be advantageous to build the B-tree from the bottom up. According to this scenario, a page is allocated to a leaf node. Records are stored in the leaf node according to the order of receipt. Since the records are sorted, no space needs to be reserved for later insertion of additional records. When the page is full, a non-leaf, or “index”, node is created to point to this leaf node. A second pointer is stored within this “index” node to point to a second newly-created leaf node which will receive the next sequence of records. When the first index node is full, a second index node is added that points to the first index node. This second index node will also point to another leaf node, and the process will be repeated.
Several disadvantages exist with the foregoing approach to building a B-tree. First, as records are added to the tree, the tree becomes unbalanced. In other words, the number of levels of hierarchy existing between the root node and a given leaf node may vary widely. For instance, in the foregoing example, two levels of hierarchy exist between the second index node and the records that were received first. Only one level of hierarchy exists between the second index node and latter received records. This disparity will continue to grow as hierarchical levels are added to the tree. As a result, search times will not be uniform, but will depend on which leaf node stores a particular record.
One way to address the foregoing problem is to re-balance the tree after it is constructed. This involves “shuffling” nodes so that all paths to the leaf nodes approximately traverse the same number of hierarchical levels. To do this, some existing links between hierarchical levels are severed and new links are created. Additionally, some pages may be divided in half to allow room to add the additional links. This re-balancing activity is time-consuming.
Another problem with the foregoing mechanism involves the latency associated with data availability. In general, data is not available to a user as a table is being constructed. If a large number of records are received for entry into a table, none of the data will be available until the entire table is constructed. This time could be prohibitively large. Thus, a system and method is needed to address the foregoing limitations.