1. Field of the Invention
The subject disclosure relates to data structures, and more particularly, to a system and method for implementing dynamic set operations on data stored in a sorted array using a balanced search tree.
2. Background of the Related Art
Binary search trees are data structures that store collections of items that can be ordered, such as integers. In a binary search tree, a data item is stored at a root. Smaller items are organized recursively in a subtree to the left of the root, while larger items are stored recursively in a subtree to the right of the root, as illustrated for example in FIG. 1. The right and left subtrees contain nodes storing the items in the collection. Those skilled in the art will readily appreciate that there are many possible search trees for a given collection of items.
Binary search trees support standard dynamic set operations such as search, insert and delete. A search is implemented to determine whether an item is contained in a tree, and if so, it returns the item. Insert is implemented to add an item to a collection stored in a tree, if it is not already present, and delete is implemented to remove an item from a collection stored in a tree. A binary search tree of height h can implement any standard dynamic set operation in O(h) time. Thus, set operations are fast if the height of the tree is small, but if the height of the tree is large, their performance may be no better than operations implemented with a linked list of data items.
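As a minimal sketch of why these operations cost O(h), the following Java class implements search and insert on an unbalanced binary search tree of integer keys. The class name `SimpleBst` and its methods are hypothetical names chosen for illustration, and delete is omitted for brevity. Each operation follows a single path from the root downward, so its cost is bounded by the height of the tree.

```java
// Illustrative sketch only: an unbalanced binary search tree over int keys.
// SimpleBst and its method names are assumptions, not taken from the disclosure.
class SimpleBst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Search: walks one root-to-leaf path, so the cost is O(h).
    boolean contains(int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }

    // Insert: adds the key if absent; returns false if it was already present.
    boolean insert(int key) {
        if (root == null) { root = new Node(key); return true; }
        Node cur = root;
        while (true) {
            if (key == cur.key) return false;
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return true; }
                cur = cur.right;
            }
        }
    }

    // Height counted in nodes on the longest root-to-leaf path (empty tree = 0).
    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Inserting keys in ascending order into such a tree produces a chain whose height equals the number of keys, which is the linked-list-like worst case described above.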
A red-black tree is a binary search tree with one extra bit of storage per node: its color, which can be either red or black. By constraining the way nodes can be colored on any path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, so that the tree is approximately balanced. Thus, basic dynamic set operations on a red-black tree with n nodes take O(lg n) time in the worst case.
Each node of a red-black tree contains the fields color, key, left child pointer, right child pointer and parent pointer. If a child or the parent of the node does not exist, the corresponding pointer field of the node contains the NIL value. NILs are regarded as pointers to external nodes (leaves) of the binary search tree, and the normal, key-bearing nodes are regarded as internal nodes of the tree.
A binary search tree is a red-black tree if it satisfies the following red-black properties:
1) Every node is either red or black.
2) The root is black.
3) Every leaf (NIL) is black.
4) If a node is red, then both its children are black.
5) For each node, all paths from the node to descendant leaves contain the same number of black nodes.
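The properties listed above can be checked mechanically. The following Java sketch is one possible checker, assuming a hypothetical `Node` type in which a `null` child stands for a NIL leaf and a boolean field records the color. Properties 1 and 3 then hold by construction, and the code tests properties 2, 4 and 5. It does not verify the search-tree ordering of the keys, and the parent pointer is left out because the check does not need it.

```java
// Illustrative sketch only: verifies red-black properties 2, 4 and 5.
// RedBlackCheck, Node and the method names are assumptions for this example.
class RedBlackCheck {
    static final class Node {
        final int key;
        final boolean red;   // true = red, false = black (property 1 by construction)
        final Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRedBlack(Node root) {
        if (isRed(root)) return false;      // property 2: the root is black
        return blackHeight(root) >= 0;
    }

    // Returns the number of black nodes on every path from n down to a NIL leaf
    // (counting the NIL itself), or -1 if property 4 or 5 is violated below n.
    private static int blackHeight(Node n) {
        if (n == null) return 1;            // property 3: NIL leaves are black
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;   // property 4
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;                     // property 5
        return l + (n.red ? 0 : 1);
    }

    private static boolean isRed(Node n) {
        return n != null && n.red;
    }
}
```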
Often, large sets of data items are displayed in tabular form, where the data items are arranged in rows. In many applications, for instance, in investment portfolios containing a multiplicity of financial instruments, the rows of the table are maintained in sorted order, and there are frequent single row insertions and deletions which must be displayed in real-time.
There are various computer applications that are currently available for displaying tabular data. Two that are commonly used are XRT tables for Motif based applications and JTables for Java based applications. These applications present a “view” into the table consisting of a fixed number of rows and columns which will fit onto a screen. Behind this view is application code which supplies data to the view. In Java, this code is called a TableModel. The JTable and other similar applications retrieve data by calling the data model and requesting the value at a cell specified by row and column index. As the user scrolls around the table, there is an enormous number of calls to the data model as different cells move into view.
The obvious way for the TableModel to store data is in an array of rows, where each row contains an array of columns. In the majority of table applications, this method is quite appropriate, as it provides fast access to any individual cell by indexing into the row array, then indexing into the column array. In tables where the rows are either unsorted or there are infrequent row changes, this method is ideal. If the array is sorted but unchanging, one can use any fast sorting algorithm to create the sorted array once, then access data quickly. If the array changes frequently, but the order is unimportant, new rows can be added at the end of the array very quickly.
A problem arises, however, when the rows of the table are maintained in sorted order and there are frequent additions and deletions, as is common with investment portfolios. If an array representation is used, then to insert a new row in the table, all of the rows below the new row must be moved down, preserving their sorted order, to make room for the new row. This takes time proportional to the number of rows in the table. If the table is large and insertions are frequent, there will be a performance problem.
This problem can be overcome by storing the data in a binary search tree rather than in an array of rows and columns, since set operations such as insertions and deletions take place in a standard binary search tree in O(h) time, or in a red-black tree in O(lg n) time in the worst case. However, binary search trees, including red-black trees, do not provide access to data items by their position in an array. Rather, they provide access to data items by key value.
For example, if an array of data consists of a set of trades involving fixed income securities which are displayed in entry date order, a binary search tree can be generated which allows a user to perform standard dynamic set operations such as search, delete and insert in O(h) time. But when the TableModel is asked to search for and return a trade at a particular position in the table, rather than one with a particular entry date, it is unable to do so. In conventional binary search trees, including red-black trees, there is no direct access to a data item by its position, unlike in an array wherein data may be retrieved by row and column index. Therefore, it would be beneficial to construct a data structure, and preferably a modified balanced tree, that can facilitate access to data items at given positions in a table of sorted rows as quickly as rows can be found by key value.
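The array-of-rows arrangement described above might look like the following sketch, which extends the standard Swing class `AbstractTableModel`. The class name `ArrayTableModel` is a hypothetical name for illustration, and the sketch assumes all rows have the same number of columns. Each cell lookup is two array index operations.

```java
import javax.swing.table.AbstractTableModel;

// Illustrative sketch only: a table model backed by an array of rows,
// each row being an array of column values. ArrayTableModel is an assumed name.
class ArrayTableModel extends AbstractTableModel {
    private final Object[][] rows;

    ArrayTableModel(Object[][] rows) {
        this.rows = rows;
    }

    @Override
    public int getRowCount() {
        return rows.length;
    }

    @Override
    public int getColumnCount() {
        // Assumes every row has the same width as the first row.
        return rows.length == 0 ? 0 : rows[0].length;
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        // Index into the row array, then into the column array.
        return rows[rowIndex][columnIndex];
    }
}
```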
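The cost of a sorted insertion into an array can be made concrete with a short sketch. The class below, with the hypothetical name `SortedRowArray`, stores integer keys in place of full table rows and reports how many existing entries each insertion had to shift. Finding the insertion point by binary search is fast; the shift is what takes time proportional to the number of rows.

```java
import java.util.Arrays;

// Illustrative sketch only: a sorted array of int keys standing in for table rows.
// SortedRowArray and its method names are assumptions for this example.
class SortedRowArray {
    private int[] keys = new int[4];
    private int size;

    // Inserts key at its sorted position and returns the number of
    // existing entries that had to be moved down to make room.
    int insert(int key) {
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos < 0) pos = -(pos + 1);          // convert to insertion point
        if (size == keys.length) keys = Arrays.copyOf(keys, size * 2);
        int moved = size - pos;
        System.arraycopy(keys, pos, keys, pos + 1, moved);  // the O(n) shift
        keys[pos] = key;
        size++;
        return moved;
    }

    // Positional access is a single array index operation.
    int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
        return keys[index];
    }

    int size() {
        return size;
    }
}
```

Inserting a key smaller than every existing key moves all of the existing entries, which is the worst case described above.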
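The limitation can be seen with `java.util.TreeMap`, which the Java documentation describes as a red-black tree based map. It offers lookup by key but no method that returns the entry at a given position, so a caller is left to walk the entries in order. The following sketch illustrates that workaround; the class name `PositionalLookup` and the method `valueAt` are hypothetical, and the walk takes time proportional to the requested position rather than O(lg n).

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: positional access on a key-ordered tree map
// by walking its entries. PositionalLookup and valueAt are assumed names.
class PositionalLookup {
    // Returns the value at the given 0-based position in key order.
    // Cost grows linearly with position, unlike keyed lookup via map.get(key).
    static <K, V> V valueAt(TreeMap<K, V> map, int position) {
        if (position < 0 || position >= map.size()) {
            throw new IndexOutOfBoundsException("position: " + position);
        }
        int i = 0;
        for (Map.Entry<K, V> e : map.entrySet()) {
            if (i++ == position) return e.getValue();
        }
        throw new AssertionError("unreachable: position was range-checked");
    }
}
```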