The present invention is in the general field of databases and database management systems.
Using trees as a database structure for accessing data records is very common, and tree schemes that serve this end are known in the literature. When considering a large amount of data, it is of particular importance to maintain a so-called balanced structure of the tree, in order to avoid long paths for accessing a given data record from the root node to the leaf node that is associated with the sought data record. In order to cope with this shortcoming, various tree structures, such as the known B-tree or 2-3 tree, provide an inherently balanced tree structure, even after the tree has undergone modification, such as the insertion of a new data record, the deletion of an existing data record and/or the updating of the value of a given data record in the tree. The inherently balanced (or essentially balanced) structure is accomplished, however, at the penalty of inflating the contents of the nodes in the tree and, consequently, unduly increasing the size of the file that holds the tree, particularly insofar as large trees which hold a multitude of data records are concerned. The large volume of the files adversely affects the performance of the data management system in terms of access time to a sought data record, which is obviously undesired.
There are trees available in the art which are more efficient in terms of the volume of data that is held in each node, e.g. the tri-S tree. Consequently, the file size of a tri-S tree is significantly smaller than that of an inherently balanced tree, e.g. a 2-3 tree or B-tree, which holds the same number of data records. However, the tri-S tree is inherently unbalanced which, as explained above, adversely affects the performance in terms of access time to data records, and whilst there are proposed techniques which render this tree balanced, the application thereof in real life scenarios is practically infeasible.
There is accordingly a need in the art to provide a generic technique which makes it possible to essentially balance trees which are inherently susceptible to an unbalanced structure, and which will not interfere with the intrinsic search scheme that is associated with the new balanced tree.
Realization of data dictionaries which provide information as to the type of stored data, the definition of data fields etc. is well known in the literature, and there is a multitude of techniques that serve this end. There is, however, a need in the art to provide a data dictionary structure that is incorporated in the digital tree structure. Reflection of the data model (such as Hierarchical, Relational, Object Oriented, Object Relational), reflection of several data models simultaneously from within the data elements, and the embodiment of the data relations would allow higher efficiency in the DBMS mechanism.
Detailed information on Tri-S (tries) can be found in Donald Knuth, The Art of Computer Programming, Vol. III, Sorting and Searching, Third Edition, pages 481-490, 493-494, 499-502, 505. A specific, compressed form of tries is the Patricia trie, described in Donald Knuth, The Art of Computer Programming, Vol. III, Sorting and Searching, Third Edition, pages 490-493, 497-499, 501-504. A Patricia trie is an example of a sparse trie that differs from a standard trie in that nodes with one child are compressed into their parent node, so that all nodes have at least two children. An example of a Patricia trie is shown in FIG. 3A, where the nodes are labeled with their depth, i.e. the position in the key represented by the node (in the example of FIG. 3A, the nodes represent nibble positions in the key). Because not every character of the key is examined during the search, the record that is ultimately found must be checked against the search key. For example, if we search for record g (A333444) in FIG. 3A, we follow the nodes with the values 3 and 7 in block 60 and the node with the value 9 in block 61 to reach record g by the link labeled 4. We then need to compare the search key with the key of record g, because a search for (A333445) would lead to record g as well. The size of the Patricia trie does not depend on the length of inserted keys. Rather, each new key adds at most a single link and node to the index regardless of the actual key length. Furthermore, unlike B-trees, Patricia tries grow slowly even as large numbers of strings are inserted, because of the aggressive (lossy) compression inherent in the structure.
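The need for the final key comparison can be illustrated with a minimal Python sketch. It does not reproduce FIG. 3A: the three-record trie, the character (rather than nibble) positions, and the names Leaf, Node, descend and search are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    key: str  # full key of the stored record


@dataclass
class Node:
    pos: int  # index of the key character tested at this node
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def descend(root: Union[Node, Leaf], key: str) -> Optional[Leaf]:
    """Follow only the tested positions; skipped characters are never examined."""
    cur: Union[Node, Leaf, None] = root
    while isinstance(cur, Node):
        if cur.pos >= len(key):
            return None
        cur = cur.children.get(key[cur.pos])
    return cur


def search(root: Union[Node, Leaf], key: str) -> Optional[str]:
    """Return the record key only if the final comparison confirms the match."""
    leaf = descend(root, key)
    return leaf.key if leaf is not None and leaf.key == key else None


# Hypothetical trie: position 1 is tested first, then position 4.
root = Node(pos=1, children={
    "1": Leaf("A111222"),
    "3": Node(pos=4, children={"4": Leaf("A333444"), "5": Leaf("A333555")}),
})
```

In this sketch a descent for A333445 ends at the leaf holding A333444, because the last character is never tested on the way down; only the comparison in search rejects the mismatch.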
Although researchers have long known about Patricia tries, such structures have rarely been used to manage large amounts of data, especially disk-based data, because they are unbalanced and best suited for usage in main memory. There is a need in the art for a structure that has the graceful scaling properties of Patricia tries, but that is balanced and optimized for disk-based access like B-trees.
The technique of the invention allows for a structure of the kind specified (and is applicable to tries and sparse tries in general, not only to Patricia tries). It adds extra index layers to allow an update or search to proceed directly to the needed portion of the index. Every update and query accesses about the same number of layers, providing balanced access to the index. The extra layers constitute a horizontal index (referred to as a horizontal oriented digital tree structure) that includes the vertical structure of the original index (in the example of FIG. 3A, a Patricia trie), referred to as a vertical oriented digital tree structure.
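The idea that every query crosses about the same number of layers can be sketched in Python as follows. This is a simplified illustration, not the structure of the invention: each block is reduced to a common key prefix plus links to blocks one layer down, and the names Block, lookup and index, as well as the sample keys, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    common_key: str                                          # prefix shared by everything below
    children: List["Block"] = field(default_factory=list)   # blocks one layer down
    records: Dict[str, str] = field(default_factory=dict)   # filled only in the lowest layer


def lookup(top: Block, key: str) -> Tuple[Optional[str], int]:
    """Return (record or None, number of layers visited)."""
    block, layers = top, 1
    while block.children:
        # Move horizontally to the lower-layer block whose common key matches.
        matches = [c for c in block.children if key.startswith(c.common_key)]
        if not matches:
            return None, layers
        block = max(matches, key=lambda c: len(c.common_key))
        layers += 1
    return block.records.get(key), layers


# Hypothetical three-layer index.
index = Block("", [
    Block("A1", [Block("A11", records={"A111222": "a"})]),
    Block("A3", [Block("A3334", records={"A333444": "g"}),
                 Block("A3335", records={"A333555": "h"})]),
])
```

Every successful lookup in this sketch visits three layers regardless of which record is sought, which is the balanced-access property described above.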
The present invention provides for a method for obtaining a balanced digital tree structure; the digital tree structure including a first vertical oriented digital tree structure that is susceptible to an unbalanced structure of blocks due to modify transactions; the first digital tree including blocks, each accommodating a plurality of nodes and links originating from said nodes; the method including the steps of:
constructing i (i ≥ 1) vertical oriented digital tree structure levels which, along with said first digital tree structure, constitute i+1 vertical oriented digital tree structure levels; said first digital tree constituting the lowest vertical oriented tree; the i trees are arranged such that from blocks of the jth tree from among said i trees, it is possible to access horizontally all the blocks of the (j+1)th, lower level, digital tree structure, according to a common key value of the accessed block, whereby an essentially balanced horizontal oriented digital tree structure is obtained.
Still further the invention provides for a method for obtaining a balanced digital tree structure; the tree including blocks each accommodating a plurality of nodes and links originated from said nodes; leaf nodes from among said nodes are associated with data records; the method comprising executing the following steps as many times as required:
(I) replacing a block, constituting a replaced block, with at least two split blocks, constituted by a splitting block and at least one split block, such that few from among the nodes of said split block are accommodated within said splitting block and the remaining nodes from among the nodes of said split block are accommodated within the at least one split block; the said few nodes including a splitting node associated with at least one split link and the remaining nodes including at least one split node associated with said at least one split link;
(II) in the case that said splitting block is not a child block,
(a) constructing a father block;
(b) copying at least the splitting node to the father block, thereby constituting at least one duplicate splitting node;
(c) linking at least one duplicate splitting node to the splitting block by means of a direct pointer;
(d) linking, by far link, at least one duplicate splitting node to the at least one split block; the far link(s) having the value of said split link(s);
(e)
(III) in the case that said splitting block is a child block of a father block,
(a) copying at least the splitting node to the father block in the case that it is not accommodated within the father block, the splitting block in the father block constituting a duplicate splitting node(s);
(b) linking the duplicate splitting node or children node thereof in the father block, to the splitting block by means of a direct link;
(c) linking, by far link, the duplicate splitting node or children node thereof in the father block to the split block; the far link(s) having the value of said split link(s);
(d) establishing intra-block connections between the nodes in the father block in such a way that all the blocks connected with far links from the said nodes in the father block can be accessed by their common key, applying the search scheme that is relevant in the vertical tree of the father block.
The invention further provides for a method for obtaining a balanced digital tree structure; the tree including blocks each accommodating a plurality of nodes and links originated from said nodes; leaf nodes from among said nodes are associated with data records; the method comprising executing the following steps as many times as required:
(i) replacing a block, constituting a replaced block, with at least two split blocks such that few from among the nodes of said split block are accommodated within one of said split blocks and the remaining nodes from among the nodes of said split block are accommodated within other split blocks;
(ii) copying at least one node from among the nodes of said replaced block into a block such that said at least two split blocks are children blocks thereof.
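Steps (i) and (ii) above can be sketched in Python under strong simplifying assumptions: nodes are represented by plain labels, the replaced block is cut in half, only the first node of the first half is copied upward, and direct links and far links are not distinguished. The names Block and split are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # blocks are compared by identity, not by content
class Block:
    nodes: List[str] = field(default_factory=list)
    children: List["Block"] = field(default_factory=list)


def split(block: Block, father: Optional[Block] = None) -> Block:
    """Replace `block` with two split blocks and return their father block."""
    if len(block.nodes) < 2:
        raise ValueError("a block needs at least two nodes to be split")
    half = len(block.nodes) // 2
    first = Block(nodes=block.nodes[:half])    # step (i): some of the nodes
    second = Block(nodes=block.nodes[half:])   # step (i): the remaining nodes
    if father is None:
        father = Block()                       # construct a new father block
    else:
        father.children = [c for c in father.children if c is not block]
    copied = first.nodes[0]                    # step (ii): copy a node upward
    if copied not in father.nodes:
        father.nodes.append(copied)
    father.children.extend([first, second])    # split blocks become children
    return father
```

Calling split on a four-node block yields a father holding one copied node and two children; splitting one of those children again with the same father adds a second copied node without adding a level.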
Still further, the invention provides for a method for obtaining a balanced digital tree structure; the digital tree structure including a first vertical oriented digital tree structure that is susceptible to an unbalanced structure of blocks due to modify transactions; the first digital tree including blocks, each accommodating a plurality of nodes and links originating from said nodes; the method including the step of:
constructing an essentially balanced horizontal tree structure having probabilistic search characteristics.
The present invention still further provides for, in a digital tree structure having probabilistic access characteristics, a method for recovering from a faulty search or modify transaction that is associated with a search path, comprising:
(i) returning in the search path to a node or block from which another search path can be commenced;
(ii) repeating step (i) until a correct search or modify transaction is accomplished, or a failure criterion is met.
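The recovery procedure of steps (i) and (ii) can be sketched in Python as a search that backs up to the most recent node offering an untried alternative. The node layout, the use of a limit on the number of search paths as the failure criterion, and the names Node and search_with_recovery are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple


class Node:
    def __init__(self, children: Optional[List["Node"]] = None,
                 key: Optional[str] = None) -> None:
        self.children = children or []   # ordered by preference of the search
        self.key = key                   # set only on leaves


def search_with_recovery(root: Node, key: str,
                         max_paths: int = 8) -> Tuple[bool, int]:
    """Return (found, number of search paths tried)."""
    stack: List[Iterator[Node]] = [iter([root])]
    paths = 0
    while stack:
        node = next(stack[-1], None)
        if node is None:
            stack.pop()                   # step (i): return to an earlier node
        elif node.children:
            stack.append(iter(node.children))
        else:
            paths += 1                    # a complete search path was followed
            if node.key == key:
                return True, paths        # correct search accomplished
            if paths >= max_paths:
                return False, paths       # failure criterion met
    return False, paths


# Hypothetical tree in which the preferred path leads to the wrong record.
tree = Node([Node([Node(key="A333445")]), Node([Node(key="A333444")])])
```

In this sketch the first path ends at A333445; a search for A333444 then returns to the root and succeeds on the second path.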
The invention further provides for a memory containing a digital tree structure that was generated by the specified methods.
The present invention further provides an apparatus which operates, mutatis mutandis, similarly to the specified method aspects of the invention.
As will be explained in greater detail below, the procedure of constructing i digital tree structures preferably, although not necessarily, terminates when the uppermost level is constituted by a single block tree. In accordance with the invention, the balancing technique may be accomplished on the fly, in order to maintain a balanced tree of blocks, or alternatively post factum, in order to render an unbalanced structure essentially balanced.
In the context of the invention, the following terms should be construed as follows:
Replacing: encompasses, preferably although not necessarily, using the replaced block as one of the splitting or split blocks.
Obtaining balanced tree structure: encompasses applying the techniques of the invention, post factum, on an unbalanced structure, bringing about a balanced or essentially balanced structure, or, if desired, applying the technique of the invention on the fly, so as to maintain a balanced or an essentially balanced structure whenever there is a necessity to split a block.
Digital tree structure: encompasses any known and new search tree. The search tree encompasses trees that are susceptible to an unbalanced structure, including but not limited to triS (pronounced try-S), the one defined in U.S. Pat. No. 5,495,609, and others. Digital tree structure also encompasses trees which maintain an essentially balanced structure, including but not limited to the 2-3 tree, B-tree, etc. As is well known, a search tree is a data structure arranged as a tree which makes it possible to access a data record(s) according to key(s) of the data record(s). Blocks and/or nodes of a digital tree structure may be associated with part or all of the key relating to a node or to a block. In a specific embodiment, each block is associated with the common key or a portion thereof. Other information relating or not relating to the search scheme may also be included in the digital tree structure.
Search scheme: the search path characteristics (i.e. the algorithm) that are used for accessing a given data record; intra-block search scheme means the search path characteristics (i.e. the algorithm) that are used, inside the block, for accessing a given data record. The data record is not necessarily accommodated within said block.
Leaf nodes are associated with data records: the term "associated with" encompasses any realization which makes it possible to access data records from leaf nodes. Thus, by way of example, a data record may be accessed directly (i.e. through a pointer) from the leaf node. By another non-limiting example, the leaf node points to a data structure (e.g. a table) which, in turn, makes it possible to access data records. Other variants are, of course, also feasible.
Modify transactions: transactions applied to a digital tree structure, consisting of inserting a new data record, deleting an existing data record or modifying the value of an existing data record.
Vertical oriented digital tree structure: the conventional orientation of a digital tree structure from root to leaves. As will be exemplified below, it is not always obligatory to maintain all the links between nodes and/or blocks in the vertical tree, and this is due to the construction of the horizontal oriented digital tree structure of the invention. This definition also encompasses deviations from the conventional definition of a tree, e.g. a level of the B-tree digital tree structure referred to, for example, in FIG. 4 below.
Horizontal oriented digital tree structure: a structure having n levels of vertical oriented digital tree structures, with the first level standing for the uppermost level and the nth level standing for the lowermost level, which is normally associated with data records, and which allows moving from a block in the ith level to a block in the (i+1)th level according to a common key value of the block.
Common key value of a block: a key portion that is associated with all nodes in a block. The common key value of a block is the key portion common to all the data records that can be accessed from the block by the relevant search scheme. The common key is therefore a characteristic of all the nodes in the block. If desired, part or all of the common key may be held explicitly in the block.
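As a minimal Python sketch of this definition, and assuming that the common key portion is a leading prefix of the keys (the definition itself only requires a common key portion), the common key of a block can be computed from the keys of the records reachable from it. The function name common_key is hypothetical.

```python
from typing import Sequence


def common_key(keys: Sequence[str]) -> str:
    """Longest leading key portion shared by all the given record keys."""
    if not keys:
        return ""
    # The smallest and largest keys in sort order bound every other key,
    # so their shared prefix is shared by the whole set.
    low, high = min(keys), max(keys)
    n = 0
    while n < len(low) and n < len(high) and low[n] == high[n]:
        n += 1
    return low[:n]
```

For a block from which the records A333444 and A333555 can be reached, this sketch gives A333 as the common key; adding A111222 shortens it to A.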
Direct link: a link from a duplicated splitting node within a block in the ith vertical oriented digital tree to a block in the (i+1)th vertical oriented digital tree that includes the splitting node.
Far link: a link from a node in a block in the ith vertical oriented digital tree to a split block in the (i+1)th vertical oriented digital tree or to data records.
By another aspect, the invention provides for a memory containing at least one computer file that includes data representing a digital tree structure incorporating a data dictionary; the data dictionary including a feature that data records are grouped in at least two sub-trees each being indicative of a distinct type of data, and wherein all data records belonging to a given sub-tree, from among said at least two sub-trees, are associated with the respective type.
Still in accordance with the other aspect of the invention there is provided a memory containing at least one computer file that includes data representing a digital tree structure incorporating a data dictionary; the data dictionary including a feature that data records are grouped in at least two sub-trees each being indicative of a distinct type of data, and wherein all data records belonging to a given sub-tree, from among said at least two sub-trees, are associated with the respective type; the data dictionary further represents an ordered structure.
Still further, the invention provides for a memory containing at least one computer file that includes data representing a digital tree structure incorporating a data dictionary; the data dictionary including a feature that data records are grouped in at least two sub-trees each being indicative of a distinct type of data, and wherein all data records belonging to a given sub-tree, from among said at least two sub-trees, are associated with the respective type; said data dictionary further includes a data relationship feature.
The invention further provides for a memory containing at least one computer file that includes data representing a digital tree structure incorporating a data dictionary; the data dictionary including a feature that data records are grouped in at least two sub-trees each being indicative of a distinct type of data, and wherein all the data records belonging to a given sub-tree, from among said at least two sub-trees, are associated with the respective type; said data dictionary further represents levels of data records.
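The grouping of data records into typed sub-trees can be sketched in Python with a plain character trie in which a type designator is placed in front of each record key, so that the first link on the path selects the sub-tree and thereby the type. The one-character designators, the "$" terminator (assumed not to occur in keys) and the names insert and records_of_type are assumptions made for illustration, not the data dictionary format of the invention.

```python
from typing import Any, Dict, List, Tuple

END = "$"  # hypothetical terminator marking the record stored at a node


def insert(root: Dict[str, Any], type_tag: str, key: str, record: Any) -> None:
    """Store `record` under the sub-tree selected by `type_tag`."""
    node = root
    for ch in type_tag + key:
        node = node.setdefault(ch, {})
    node[END] = record


def records_of_type(root: Dict[str, Any], type_tag: str) -> List[Tuple[str, Any]]:
    """Return (key, record) pairs of one type, in key order."""
    node = root
    for ch in type_tag:
        if ch not in node:
            return []
        node = node[ch]
    found: List[Tuple[str, Any]] = []

    def walk(n: Dict[str, Any], prefix: str) -> None:
        for ch in sorted(n):
            if ch == END:
                found.append((prefix, n[ch]))
            else:
                walk(n[ch], prefix + ch)

    walk(node, "")
    return found


# Hypothetical data: "C" designates customer records, "O" order records.
db: Dict[str, Any] = {}
insert(db, "C", "002", {"name": "Bo"})
insert(db, "C", "001", {"name": "Al"})
insert(db, "O", "001", {"total": 5})
```

All records reached through the "C" link are customer records in this sketch, and walking a sub-tree in link order returns them sorted by key.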