Computers are used today to store large amounts of data. Such information is often stored in information storage and retrieval systems referred to as databases. This information is stored and retrieved from a database using an interface known as a database management system (DBMS).
One type of DBMS is called a Relational Database Management System (RDBMS). An RDBMS employs relational techniques to store and retrieve data. Relational databases are organized into tables, wherein tables include both rows and columns, as is known in the art. Each row of a table may be referred to as a record. Each column defines a respective data item that may be saved within each of the records. For example, a column may store data that identifies a business name that is associated with the record. The remaining columns may store additional data for the business identified within the business name column.
One type of data structure used to implement the tables of a database is a B-tree, and its variant, a B+-tree. As is known in the art, these tree structures are used to implement primary keys, secondary indexes, clustering indexes, non-clustering indexes, unique constraints, and the like.
A B-tree can be viewed as a hierarchical index. The root node is at the highest level of the tree, and stores one or more pointers, each pointing to a child of the root node. Each of these children may, in turn, store one or more pointers to children, and so on. At the lowest level of the tree are the leaf nodes, which have no children. The leaf nodes may contain data records or pointers or other indicia pointing to data records.
In addition to the pointers to child nodes, each of the non-leaf nodes of the B-tree also stores at least one index, or key, value that is used to search the tree for a particular data record. For instance, assume a node stores a first index value, and first and second pointers that each point to a child node. According to one exemplary organizational structure, the first pointer may be used to locate the child node storing one or more index values that are less than the first index value, whereas the second pointer is used to locate the child storing one or more index values greater than, or equal to, the first index. Using the index values and the pointers to search the tree in this manner, a node may be located that stores a record associated with a particular index value that is used as the search index.
As an example of the foregoing, consider a database that tracks residential phone numbers. A single B-tree may be created to allow for searching of the database. Each non-leaf node in the B-tree may store a first index value consisting of a last name. For each non-leaf node, a first pointer may be used to locate information on people having names less than the index value (e.g., preceding the index value name in an alphabetical listing). A second pointer is used to locate information on the remaining people. A variant of this B-tree may utilize two-part index values that include both first and last names so that a group of people having the same last name may be secondarily sorted alphabetically according to their first names.
As may be appreciated, a database that includes information for all people located within a given country, state, county, or other sizeable geographic area may become unwieldy. As a result, the time required to search the database may become prohibitive. In such cases, it may be beneficial to create multiple “partitions” for the database. Each of the partitions tracks information for a group of records, wherein the group may be defined based on the values in any one or more of the columns. As one example, unrelated records may be grouped into a given partition using a hash function.
Returning to the current example, a partition may be created for each city represented within the database. The partition includes a respective B-tree that tracks all of the people that live in that city. Then, when a search is conducted, a user not only specifies the index value (i.e., a person's name), but also the partition that is to be searched (the city in which the person lives.) Because the search is initiated on a much smaller sub-set of all of the database entries, it can be completed much more quickly.
Another advantage of maintaining separate B-trees for each partition involves increased availability. It is possible for the nodes of a B-tee to become corrupted, or to be unavailable because of ongoing updates to those nodes. If a database is managed using multiple B-trees that are each associated with a different partition, searches may be continue within most partitions even if some nodes of another partition are unavailable or corrupted. This increases the overall efficiency of the database.
On the other hand, when multiple partitions are maintained, efficiency may be decreased if a query does not, or cannot, provide the name of the partition to be searched. In this case, it is necessary to initiate separate searches for multiple ones, or even for all, of the partitions in the database. This is generally more time-consuming than if a single search could be initiated on a single B-tree that represents all of the names in the directory.
Another related problem involves the situation wherein a partition that is identified by a query is searched for an index value that is not located within the partition. Therefore, a search must be initiated on other associated partitions. For instance, if a search is initiated to locate a telephone number for a “John Doe” within the partition for “City A”, and if the search does not locate this index value, it may be desirable to search the directories for an area “surrounding” City A. Again, if separate searches must be initiated from the root nodes of the B-trees for each of the cities in this “surrounding area”, the search may be prohibitively time-consuming.
What is needed, therefore, is a system and method for maintaining a database that includes multiple partitions so that searching and management of these partitions can be performed in a more efficient manner.