I. Technical Field
Various embodiments of the invention relate generally to database systems. More particularly, various embodiments of the invention relate to a compression scheme for improving index cache behavior in main-memory database systems.
II. Description of the Related Art
With server DRAM modules priced at less than $2,000/GB, many of the database tables and indexes can now fit in the main memory of modern computer systems. It is predicted that it will be common to have terabytes of main memory for a database within ten years or so.
With such a large amount of memory, the traditional bottleneck of disk access almost disappears, especially for search transactions. Instead, memory access becomes a new bottleneck. A recent study with commercial DBMSs shows that half of the execution time is spent on memory access when the whole database resides in memory.
Because DRAM chips have traded speed for capacity, the gap between CPU speed and DRAM speed has grown significantly during the past decade. In today's computer systems, each memory access costs tens of processor cycles. To overcome this gap, modern processors adopt up to several megabytes of SRAM as the cache, which can be accessed in just one or two processor cycles.
As the gap between CPU speed and DRAM speed has widened, the importance of cache behavior in the design of main-memory indexes has been emphasized. It was shown that cache-conscious search trees (“CSS-trees”) perform lookups much faster than binary search trees and T-trees in a read-only environment. B+-trees and their variants were shown to exhibit reasonably good cache behavior.
For example, CSB+-trees (“Cache Sensitive B+-trees”) store child nodes contiguously in memory to eliminate all child pointers in a node except the first one. The location of the i-th child node is computed from that of the first child. By providing more room for keys in the node, this pointer elimination approach effectively doubles the fanout of a B+-tree. Given a node size on the order of the cache block size, the fanout doubling reduces the height of the B+-tree, which in turn leads to a smaller number of cache misses during tree traversal.
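The address computation and the fanout doubling described above can be illustrated with a minimal sketch. The 64-byte node, 4-byte key, and 4-byte pointer sizes are assumptions chosen for illustration, node headers are ignored, and the function names are hypothetical rather than taken from any CSB+-tree implementation.

```python
NODE_SIZE = 64  # assumed node size in bytes (one cache block)
KEY_SIZE = 4    # assumed key size in bytes
PTR_SIZE = 4    # assumed child-pointer size in bytes


def bplus_fanout(node_size=NODE_SIZE, key_size=KEY_SIZE, ptr_size=PTR_SIZE):
    """Children per node when each child has its own pointer (n keys, n + 1 pointers)."""
    keys = (node_size - ptr_size) // (key_size + ptr_size)
    return keys + 1


def csb_fanout(node_size=NODE_SIZE, key_size=KEY_SIZE, ptr_size=PTR_SIZE):
    """Children per node when only the first-child pointer is stored (n keys, 1 pointer)."""
    keys = (node_size - ptr_size) // key_size
    return keys + 1


def child_address(first_child_address, i, node_size=NODE_SIZE):
    """Address of the i-th child (0-based) in a contiguous group of child nodes."""
    return first_child_address + i * node_size


def tree_height(num_entries, fanout):
    """Smallest number of levels such that fanout ** levels >= num_entries."""
    levels, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels
```

Under these assumed sizes the fanout grows from 8 to 16, and an index over one million entries shrinks from 7 levels to 5, which means fewer nodes touched (and potentially fewer cache misses) per lookup.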
Note that such a pointer elimination technique does not provide much benefit in disk-based indexes, where the fanout is typically on the order of a few hundred and doubling the fanout does not lead to an immediate reduction in the tree height.
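A short calculation shows why doubling an already large fanout may leave the height unchanged. The fanouts of 200 and 400 and the one-million-entry index are assumed values for illustration only; other index sizes can give a different result.

```python
def tree_height(num_entries, fanout):
    """Smallest number of levels such that fanout ** levels >= num_entries."""
    levels, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels


# Assumed disk-based fanouts: doubling 200 to 400 keeps the height at 3 levels.
disk_heights = (tree_height(1_000_000, 200), tree_height(1_000_000, 400))

# Assumed cache-block-sized fanouts: doubling 8 to 16 cuts the height from 7 to 5.
memory_heights = (tree_height(1_000_000, 8), tree_height(1_000_000, 16))
```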
However, the pointer elimination technique cannot be directly applied to multidimensional index structures such as the R-tree, which have numerous application domains such as spatio-temporal databases, data warehouses, and directory servers. The data objects stored in an R-tree are approximated by so-called minimum bounding rectangles (“MBRs”) in the multidimensional index space, where each MBR is the minimal hyper-rectangle (i.e., a 2-dimensional or higher-dimensional rectangle or box) enclosing the corresponding data object. Those skilled in the art would appreciate that the MBR may be extended to a multi-dimensional shape including boxes or pyramids.
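As a minimal sketch of the MBR concept, the following computes the minimal axis-aligned hyper-rectangle enclosing a data object represented as a set of points. The point-set representation and the function names are assumptions made for illustration.

```python
def minimum_bounding_rectangle(points):
    """Return (lows, highs): the per-dimension minimum and maximum coordinates."""
    if not points:
        raise ValueError("at least one point is required")
    lows = tuple(min(axis) for axis in zip(*points))
    highs = tuple(max(axis) for axis in zip(*points))
    return lows, highs


def encloses(mbr, point):
    """True if the point lies inside or on the boundary of the MBR."""
    lows, highs = mbr
    return all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs))
```

For a two-dimensional object, the MBR is fully described by four coordinates, which is why it occupies considerably more space in a node than a single scalar key.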
Typically, MBRs are much larger than pointers. Thus, pointer elimination alone cannot widen the index tree enough to reduce the tree height significantly. For example, when a 16-byte MBR is used for the two-dimensional key, the simple elimination of a 4-byte pointer provides at most 25% more room for the keys, and this increase is not big enough to make any significant difference in the tree height for improved cache behavior. Therefore, there is a need for a scheme that improves cache behavior in accessing multidimensional indexes in main-memory databases.
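The 25% figure follows from the ratio of pointer size to key size. The sketch below makes that arithmetic explicit; the 4-byte scalar key used for the B+-tree comparison is an assumption, and the function name is hypothetical.

```python
def room_gained(key_size, ptr_size):
    """Extra key space freed per entry by dropping its pointer, as a fraction of the key size."""
    return ptr_size / key_size


bplus_gain = room_gained(key_size=4, ptr_size=4)    # assumed 4-byte scalar key: 1.0 (100%)
rtree_gain = room_gained(key_size=16, ptr_size=4)   # 16-byte MBR from the text: 0.25 (25%)
```

With a key as small as the pointer, eliminating pointers roughly doubles the number of keys per node, while with a 16-byte MBR the same step yields only a quarter more.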