1. Field of the Invention
This invention relates to a technique for efficient storage of recursive data structures in computer memory. More particularly this invention relates to improvements in the arrangement of binary tree elements in a cache computer memory and improved performance of binary tree operations.
2. Description of the Related Art
Binary trees are widely employed data structures in practical computer applications, as they enable the rapid access of keyed data sets. Search binary trees, an example of which is seen in FIG. 1, are particularly useful for rapid localization of data. A search binary tree 10 satisfies the following conditions:
1. Every node, for example node 12, has a unique key 14.
2. All keys in the left subtree, indicated by the dotted line 16 are smaller than the key 18 in the root 20.
3. All keys in the right subtree, indicated by the dotted line 22, are larger than the key 18 in the root 20.
4. The left and right subtrees are also binary search trees.
Known common operations on binary trees include the insert operation, which inserts a new key into the set, the delete operation, which deletes a key from the set, and the traversal of the tree. The traversal operation outputs the keys in a specific order. Forms of traversal are the inorder traversal, preorder traversal, and postorder traversal. Another important and relevant operation is the membership operation, which checks if a given key belongs to the set.
Frequently a binary tree is a dynamic data structure, in which nodes are allocated during runtime using the heap-allocation mechanism found in languages like C and C++. In its classical implementation, each node of the binary tree contains a key, and two pointers to descendant nodes. Often, two additional pointers are kept in each node: a pointer to the data associated with the key, particularly if the data is large, and a pointer to the ancestor node.
There are two main drawbacks to such an implementation. First, the memory space occupied by the binary tree can considerably exceed the original memory space needed for the data set itself. For example, if the size of each key and pointer is eight bytes, then the size of each node is 24 bytesxe2x80x94three times as large as the original key. It will be evident that eliminating the pointers saves more than 50% of the memory store for the binary tree.
Secondly, as a result of using the heap to allocate and deallocate the nodes, the nodes can become scattered across memory. This is especially true when the user performs a large number of insertions and deletions, because each operation usually allocates or deallocates memory. The result is a functionally inefficient layout, and an increased number of cache misses and page faults during traversals of the binary tree. Analysis of pointer-based algorithms has indicated that as hardware performance has improved over time, cache performance increasingly outweighs instruction count as a determinant of overall performance. This is partly due to an increasing cache-miss penalty in modern machines, compared to older computers, and partly due to a trend of decreasing processor cycle time relative to memory access time. An example of such analysis is found in the paper The Influence of Caches on the Performance of Heaps, LaMarca Anthony, and Ladner, Richard E., The ACM Journal of Experimental Algorithmics, Jan. 6, 1997.
In dynamic binary trees, gaps between nodes in memory layout can be exploited during insertions and deletions. Improved performance in these dynamic operations tends to offset slow tree traversal due to increased memory latency. However in search trees in which insertion and deletions are uncommon, there is no offsetting benefit, and poor spatial locality of data in memory is especially undesirable.
Prior art approaches to improving global performance of algorithms having poorly localized data layout have involved reorganizing the computation while leaving the data layout intact. This is difficult, and in practice not too effective in the case of recursive data structures such as binary trees.
Hardware optimizations such as prefetching have resulted in better performance, however they have not afforded a completely satisfactory solution.
Another alternative to reducing cache misses is to maintain the entire binary tree in the cache during the program lifetime. Cache misses would only occur the first time the tree is accessed. Generally, however, this is not practical. If the tree is large, it will not fit into the cache. Furthermore the availability of the cache to other processes would be reduced, which might decrease the performance of the program itself, and generally degrade the performance of other processes executing on the computer.
It is therefore a primary object of some aspects of the present invention to improve the performance of computer algorithms involving pointer-based data structures.
It is another object of some aspects of the present invention to speed up binary tree traversals.
It is a further object of some aspects of the present invention to improve the performance of cache memory by increasing spatial locality in memory of recursive data structures.
It is still another object of some aspects of the present invention to decrease the memory space required by recursive data structures.
These and other objects of the present invention are attained by arranging a tree structure in a memory by the steps of: defining a tree data structure, wherein a parent node in the tree has a predetermined number of child nodes; defining an indexed array of data elements for storage thereof in a memory, wherein each element of the array holds a node of the tree; associating a parent node of the tree with a first index of the array, and associating each child node of the parent node with second indices of the array, wherein predefined individual functional relationships exist between the first index and each of the second indices; and mapping the nodes indexed by the first index and by the second indices to a predefined area of a memory which can be efficiently accessed. The arrangement is such that each node of the tree is assigned a unique position or index inside the array. From the functional arrangements existing therebetween, and given the position of any node it is easy to calculate the position of a parent or a child node. In some preferred embodiments of the invention, triplet-tiles formed by a parent and its children are positioned consecutively in the array.
According to an aspect of the invention the memory is a cache memory, and the predefined area of the memory is a cache line.
According to another aspect of the invention the predefined area of the memory is a page.
According to yet another aspect of the invention the tree is traversed by accessing one of the nodes in the predefined area of the memory according to the index associated therewith, calculating an index of a parent node or a child node of the accessed node according to the functional relationship therebetween, and then accessing the node that is associated with the calculated index.
The invention provides a method of arranging a binary tree structure in a memory, which is performed by: defining a binary tree data structure which has a plurality of nodes, including at least a parent node, a first child node and a second child node; and defining an indexed array of data elements for storage thereof in a memory, wherein each data element holds a node of the tree; associating a parent node of the tree with a first index of the array; associating a first child node of the parent node with a second index of the array; and associating a second child node of the parent node with a third index of the array. Predefined functional relationships exist between the first index and the second index, and between the first index and the third index. The nodes associated with the first index, the second index and the third index are mapped to a predefined area of a memory which can be efficiently accessed. The predefined area may be a cache line or a memory page.
Preferably the first index has a value j and the predefined functional relationships are: in a first event that the value j modulo three has a value zero, then the second index has a value j+1 and the third index has a value j+2; in a second event that the value j modulo 3 has a value one, then the second index has a value 4jxe2x88x921 and the third index has a value 4j+2; and in a third event that the value j modulo 3 has a value two, then the second index has a value 4j+1 and the third index has a value 4j+4.
The invention provides a computer software product, which is a computer-readable medium in which computer 7 program instructions are stored, which instructions, when read by a computer, cause the computer to arrange a tree structure in a memory, by executing the steps of: defining a tree data structure, wherein a parent node in the tree has a predetermined number of child nodes; defining an indexed array of data elements for storage thereof in a memory, wherein each element of the array holds a node of the tree; associating a parent node of the tree with a first index of the array; and associating each child node of the parent node with second indices of the array, wherein predefined individual functional relationships exist between the first index and each of the second indices; and mapping the nodes indexed by the first index and by the second indices to a predefined area of a memory which can be efficiently accessed.
According to an aspect of the invention the memory is a cache memory, and the predefined area memory is a cache line.
According to another aspect of the invention the predefined area of the memory is a page.
According to yet another aspect of the invention the computer program instructions further cause the computer to traverse the tree by executing the steps of: accessing one of the nodes in the predefined area of the memory according to the index associated therewith; calculating an index of a parent node or a child node of the accessed node according to the functional relationship therebetween; and then accessing the node that is associated with the calculated index.
The invention provides a computer software product, which is a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to arrange a binary tree structure in a memory by executing the steps of: defining a binary tree data structure having a plurality of data elements, wherein the tree data structure has a parent node, a first child node and a second child node; defining an indexed array of data elements for storage thereof in a memory, wherein each the element represents a node of the tree; associating the parent node of the tree with a first index of the array; associating the first child node of the parent node with a second index of the array; and associating the second child node of the parent node with a third index of the array, wherein predefined individual functional relationships exist between the first index and each of the second indices and third indices. The nodes associated with the first index, the second index and the third index are mapped to a predefined area of a memory which can be efficiently accessed.
According to an aspect of the invention, the memory is a cache memory, and the predefined area memory is a cache line.
According to a further aspect of the invention the predefined area of the memory is a page.
Preferably the first index has a value j and the predefined functional relationships are: in a first event that the value j modulo three has a value zero, then the second index is a value j+1 and the third index has a value j+2; in a second event that the value j modulo 3 has a value one, then the second index has a value 4jxe2x88x921 and the third index has a value 4j+2; and in a third event that the value j modulo 3 has a value two, then the second index has a value 4j+1 and the third index has a value 4j+4.
The invention provides a data retrieval system, including a computer in which computer program instructions are stored, which instructions cause the computer to arrange a binary tree structure in a memory by executing the steps of: defining a binary tree data structure having a plurality of data elements, wherein the tree data structure has a parent node, a first child node and a second child node; defining an indexed array of data elements for storage thereof in a memory, wherein each the element represents a node of the tree; associating the parent node of the tree with a first index of the array; associating the first child node of the parent node with a second index of the array; and associating the second child node of the parent node with a third index of the array, wherein predefined individual functional relationships exist between the first index and each of the second indices and third indices. The nodes associated with the first index, the second index and the third index are mapped to a predefined area of a memory which can be efficiently accessed.
According to an aspect of the invention the memory is a cache memory, and the predefined area memory is a cache line.
According to another aspect of the invention the predefined area of the memory is a page.
According to yet another aspect of the invention the first index has a value j and the predefined functional relationships are: in a first event that the value j modulo three has a value zero, then the second index is a value j+1 and the third index has a value j+2; in a second event that the value j modulo 3 has a value one, then the second index has a value 4jxe2x88x921 and the third index has a value 4j+2; and in a third event that the value j modulo 3 has a value two, then the second index has a value 4j+1and the third index has a value 4j+4.