1. Technical Field
The present invention relates in general to data processing and in particular to organizing information that is stored in a data processing system so that the information may be accessed efficiently. Still more particularly, the present invention relates to an improved system and method for balancing binary search trees.
2. Description of the Related Art
Advances in data processing technology have led to the commonplace utilization of information repositories (e.g., database files or tables) that contain millions of records. Furthermore, the records are often organized using data structures that provide a sequential record order, based on values contained within the records. For example, a telephone company might maintain a file (or data set) that contains the name of each customer in a particular service area, and that file might be maintained in alphabetical order, according to customer name.
Primitive database systems provide ordering for such files by simply keeping records in sequential physical memory locations (such as on tape or in a disk drive) according to the desired order. Such an organization is essentially an array, and it is inefficient, and therefore rarely used, when processing large files because records usually cannot be added and deleted without altering a large percentage of the utilized storage. Other database systems employ more sophisticated data structures that utilize pointers to order records. Linked lists and binary search trees (BSTs) are two such data structures, and both allow records to be added and deleted more efficiently, relative to arrays. However, when data sets are searched for particular values (a step that is prerequisite to adding and deleting records in any ordered data set), binary search trees are much more efficient than linked lists. While search techniques for linked lists are linear searches, and thus have worst-case run times on the order of xe2x80x98nxe2x80x99 (i.e., the run time is directly proportional to the total number of records), techniques for searching BSTs are typically binary searches, and thus have worst-case run times on the order of xe2x80x98log2(n)xe2x80x99 (i.e., the run time is directly proportional to the log to the base 2 of the total number of records), provided the BST is balanced. For example, if a telephone company""s file of customers for a particular service area includes 6,000,000 customer names, a worst-case linear search will process 6,000,000 records and an average linear search will process 3,000,000 records, whereas a worst-case binary search will process no more than log2(6,000,000)=23 records.
However, as insert and delete operations are performed on a BST, the tree may lose its balanced state, and the worst-case run time can deteriorate to the order of xe2x80x98n.xe2x80x99 Consequently, the importance of keeping a BST in a balanced state, or nearly so, is commonly recognized. A BST is considered balanced when, for each node, the number of nodes in the left subtree of that node and the number of nodes in the right subtree of that node differ, at most, by one.
Numerous methods for balancing binary search trees have been developed, however those methods present a number of disadvantages. For example, xe2x80x9cNew Algorithms For Balancing Binary Search Treesxe2x80x9d by E. Haq, Y. Cheng, and S. S. Iyengar (1988 IEEE SOUTHEASTCON Conference Proceedings, pp. 378-82) (hereinafter New Algorithms) discloses an algorithm that, given an unbalanced binary search tree and a count of the nodes in that tree, computes how many levels a corresponding balanced binary search tree should have and how many nodes should reside on particular levels of that balanced tree and then balances the tree based on those computations. As suggested by New Algorithms, an advantage provided by such a technique is its adaptability for parallel processing. Among the disadvantages associated with the New Algorithms technique, however, are the number and complexity of instructions used to perform the balancing.
Furthermore, as the number of records to be balanced increases, so does the time associated with executing a lengthy and complex tree balancing utility. In fact, tests have shown that, when balancing a BST containing millions of records, the difference between inefficient and efficient balancing facilities can mean the difference between spending over an hour balancing the tree and spending less than a minute. The present invention recognizes that the need for an efficient technique for balancing BSTs becomes more pressing as databases containing millions of records become more commonplace.
To address the shortcomings of conventional tree balancing facilities, the present invention provides a system, method, and program product that balances BSTs by copying pointers to the nodes of a BST into a pointer list in accordance with a sequential order of respective data values of the nodes. The balacing facility then builds a balanced BST based on a first index to a first pointer of the pointer list and a second index to a last pointer of the pointer list. In an illustrative embodiment, the balanced BST is built by identifying a central pointer at a midpoint of the pointer list, a left range of pointers before the midpoint, and a right range of pointers after the midpoint. The central pointer is then interpreted as a pointer to a root node, and a balanced left subtree of the root node and a balanced right subtree of the root node are built based on the left range of pointers and the right range of pointers, respectively.
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.