The invention relates to computer data and file storage systems, and more particularly to a method and system for inserting search keys into, and deleting search keys from, a structure implementing a compact representation of a 0-complete binary tree.
Data and file storage systems such as a database, in particular those implemented in computer systems, provide for the storage and retrieval of specific items of information stored in the database. The information stored in the database is generally indexed such that any specific item of information in the database may be located using search keys. Searches are generally accomplished by using search keys to search through an index to find pointers to the most likely locations of the information in the database, whether that location is within the memory of the computer system or in a storage medium of the computer system.
An index to database records within a computer is sometimes structured as a “trie” comprised of one or more nodes, connected hierarchically, stored within a storage means of the computer. A trie is a tree structure designed for storing strings in which there is one node for every common prefix. The actual strings are stored at the “bottom” of this hierarchical structure in leaf nodes. Each node generally includes one or more branch fields containing information for directing a search, and each such branch field usually contains a pointer, or branch, to another node, and an associated branch key indicating ranges or types of information that may be located along that branch from the node. The trie, and any search of the trie, begins at a single node referred to as the root node and progresses downwards through the various branch nodes until the nodes containing either the items of information or, more usually, pointers to items of information are reached. The information related nodes are often referred to as leaf nodes or, since this is the level at which the search either succeeds or fails, failure nodes. Within a tree storage structure of a computer, any node within a trie is a parent node with respect to all nodes dependent from that node, and sub-structures within a trie which are dependent from that parent node are often referred to as subtries with respect to that node.
The decision as to which direction, or branch, to take through a tree storage structure in a search is determined by comparing the search key with the branch keys stored in each node encountered in the search. The results of the comparisons determine which of the branches descending from a given node is to be followed in the next step of the search. In this regard, search keys are most generally comprised of strings of characters or numbers which relate to the item or items of information to be searched for within the computer system.
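By way of illustration only, the node structure and branch-following search described above can be sketched as follows. The names TrieNode, insert, and search are hypothetical, and the sketch assumes one character per branch key; it is not the structure of the invention.

```python
class TrieNode:
    """One node of a simple character trie (illustrative only)."""

    def __init__(self):
        self.branches = {}   # branch key (one character) -> child node
        self.record = None   # pointer to the stored item, set where a key ends


def insert(root, key, record):
    """Walk down from the root, creating nodes for any missing prefix."""
    node = root
    for ch in key:
        node = node.branches.setdefault(ch, TrieNode())
    node.record = record


def search(root, key):
    """Follow the branch matching each character; None means the search failed."""
    node = root
    for ch in key:
        node = node.branches.get(ch)
        if node is None:
            return None
    return node.record


root = TrieNode()
insert(root, "car", 101)
insert(root, "cart", 102)
insert(root, "dog", 103)
```

In this sketch the keys "car" and "cart" share the nodes for their common prefix, and a search for a key that was never inserted, such as "cat", ends at a missing branch and fails.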
The prior art contains a variety of search tree data storage structures for computer database systems, among which is the apparent ancestor from which all later tree structures have been developed and the most general form of search tree well known in the art, the “B-tree.” See, for example, Knuth, The Art of Computer Programming, Vol. 3, pp. 473-479. A B-tree provides both primary access and then secondary access to a data set. Therefore, these trees have often been used in data storage structures utilized by database and file systems. Nevertheless, there are problems that exist with the utilization of B-tree storage structures within database systems. Every indexed attribute value must be replicated in the index itself. The cumulative effect of replicating many secondary index values is to create indices which often exceed the size of the database itself. This overhead can force database designers to reject potentially useful access paths. Moreover, inclusion of search key values within blocks of the B-tree significantly decreases the block fan out and increases tree depth and retrieval time.
Another tree structure that can be implemented in computer database systems, the compact 0-complete binary tree (i.e., C0-tree), eliminates search values from indices by replacing them with small surrogates whose typical 8-bit length is adequate for most practical key lengths (i.e., less than 32 bytes). Thus, actual values can be stored anywhere in arbitrary order, leaving the indices of the tree structure to be just hierarchical collections of (surrogate, pointer) pairs stored in index blocks. This organization can reduce the size of the indices by about 50% to 80% and increases the branching factor of the trees, which reduces the number of disk accesses per exact match query within computer database systems. See Orlandic and Pfaltz, Compact 0-Complete Trees, Proceedings of the 14th VLDB Conference, pp. 372-381.
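The size reduction described above can be illustrated with rough per-entry arithmetic. The sketch below assumes a 4-byte pointer and compares an index entry holding the full key with an entry holding a 1-byte surrogate. The pointer size, the key sizes, and the name entry_reduction are assumptions chosen for illustration and are not taken from the cited reference; block headers and dummy entries are ignored.

```python
POINTER_BYTES = 4     # assumed pointer size
SURROGATE_BYTES = 1   # 8-bit surrogate, as stated in the text


def entry_reduction(key_bytes):
    """Fraction of per-entry space saved by storing a surrogate instead of the key."""
    full_entry = key_bytes + POINTER_BYTES
    compact_entry = SURROGATE_BYTES + POINTER_BYTES
    return 1 - compact_entry / full_entry


# An 8-bit surrogate has 2**8 distinct values, the same as the number of
# bits in a 32-byte key, which is consistent with the stated key-length limit.
surrogate_values = 2 ** 8
bits_in_32_bytes = 32 * 8

savings = {k: entry_reduction(k) for k in (6, 16, 21)}
```

Under these assumptions, a 6-byte key gives a saving of one half and a 16-byte key gives a saving of three quarters, which falls within the 50% to 80% range reported above.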
While the known method of creating C0-trees increases storage utilization 50% to 80% over B-trees, storage space is still wasted because of the presence of dummy entries (surrogate, pointer==NIL), which cause the number of index entries at the lowest level of the tree to exceed the actual number of records stored. As a result, the expected storage utilization of index entries at the lowest tree level is 0.567 for C0-trees versus 0.693 for B-trees. See Orlandic and Pfaltz, Compact 0-Complete Trees, Proceedings of the 14th VLDB Conference, pp. 372-381.
Moreover, although B-tree and C0-tree storage structures represent efficient methods of searching for values, both methods require initial generation and subsequent maintenance of the tree data storage structure itself. Neither of these computer storage structures inherently stores information in sorted order.
A trie can be built more efficiently if the key records are initially sorted in the order of their key field than if the records are in random order. Therefore, an efficient computer database system should sort sets of keys first, and then build a trie based on keys extracted at intervals from the sorted keys. Searches of the tree data storage structure will also be performed more efficiently if the trie does not contain excess keys, namely keys that are associated with data no longer in the database or keys that are no longer necessary to maintain the structure of the trie. In some implementations of C0-tree storage structures, the method of storing and indexing the search keys may be complex, and the method of inserting and deleting groups of keys may be inefficient. Therefore, a need exists to simplify the trie structure, and the ability to easily insert and delete groups of keys in batches is desirable, especially when large groups of keys are involved.
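The approach of sorting the keys first and then indexing keys extracted at intervals can be sketched as follows. For brevity the sketch uses a flat list of separator keys as a stand-in for the trie; the names build_sparse_index and lookup and the interval of three keys are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def build_sparse_index(keys, interval):
    """Sort the keys, cut them into blocks, and index the first key of each block."""
    ordered = sorted(keys)
    blocks = [ordered[i:i + interval] for i in range(0, len(ordered), interval)]
    separators = [block[0] for block in blocks]
    return separators, blocks


def lookup(separators, blocks, key):
    """Use the separators to pick the one block that could hold the key."""
    i = bisect_right(separators, key) - 1
    return i >= 0 and key in blocks[i]


keys = ["pear", "apple", "fig", "kiwi", "plum", "lime", "date"]
separators, blocks = build_sparse_index(keys, interval=3)
```

Because the keys are sorted before the index is built, only every third key in this sketch needs to appear in the index, and a lookup examines a single block.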