The present invention is directed, in general, to database management and, more specifically, to a system and method for improving index performance through prefetching.
As the gap between processor speed and both DRAM and disk speeds continues to grow exponentially, it is becoming increasingly important to make effective use of caches to achieve high performance on database management systems. Caching exists at multiple levels within modern memory hierarchies. Typically, two or more levels of SRAM serve as cache memories (or xe2x80x9ccaches,xe2x80x9d for short) for the contents of main memory in DRAM, which in turn may serve as a cache memory for the contents of a disk. Database researchers have historically focused on the importance of this latter form of caching (also known as a xe2x80x9cbuffer poolxe2x80x9d). However, recent studies have demonstrated that even with traditional disk-oriented databases, roughly 50% or more of execution time is often wasted due to SRAM cache misses. For main-memory databases, it is even clearer that SRAM cache performance is crucial. Hence, attention has been directed in revisiting core database algorithms in an effort to make them more cache friendly.
Index structures are used extensively throughout database systems, and they are often implemented as B+-Trees. While database management systems perform several different operations that involve B+-Tree indices (e.g., selections, joins, etc.), these higher-level operations can be decomposed into two key lower-level access patterns. One of these is searching for a particular key, which involves descending from the root node to a leaf node using binary search within a given node to determine which child pointer to follow. The other is scanning some portion of the index, which involves traversing the leaves (leaf nodes) through a linked-list structure for a non-clustered index. For clustered indices, one can directly scan the database table after searching for the starting key. While search time is the key factor in single value selections and nested loop index joins, scan time is the dominant effect in range selections.
An example of cache performance of both search and scan on B+-Tree indices may be considered by simulating their performance using a memory subsystem comparable to that associated with a Compaq ES40. A search experiment may look up 100,000 random keys in a main-memory B+-Tree index after it has been bulkloaded with 10 million keys. A scan experiment performs 100 range scan operations starting at random keys, each of which scans through one million (key, tupleID) pairs retrieving the tupleID values. The results for shorter range scans (e.g., 1000 tuple scans) are similar. The B+-Tree node size is equal to the cache line size, which is 64 bytes in this example. The results may be broken down into the three categories of busy time, data cache stalls, and other stalls. Results of the experiment indicate that both search and scan accesses on B+-Tree indices spend a significant fraction of their time (i.e., 65% and 84%, respectively) stalled on data cache misses. Hence there appears to be considerable room for improvement.
In an effort to improve the cache performance of index searches for main-memory databases, the two other types of index structures cache-sensitive search treesxe2x80x9d (CSS-Trees) and cache-sensitive B+- Trees (CSB+-Trees) have been studied. The premise of these studies is the conventional wisdom that the optimal tree node size is equal to the natural data transfer size. This corresponds to the disk page size for disk-resident databases and the cache line size for main-memory databases. Because cache lines are roughly two orders of magnitude smaller than disk pages (e.g., 64 bytes vs. 4 Kbytes), the resulting index trees for main-memory databases are considerably deeper. Since the number of expensive cache misses is roughly proportional to the height of the tree, it would be desirable to somehow increase the effective fanout (also called the branching factor) of the tree, without paying the price of additional cache misses that this would normally imply.
This may be accomplished by restricting the data layout such that the location of each child node can be directly computed from the parent node""s address (or a single pointer) thereby eleminating all or nearly all of the child pointers. Assuming that keys and pointers are the same size, this effectively doubles the fanout of cache-line-sized tree nodes, thus reducing the height of the tree and the number of cache misses. CSS-Trees eliminate all child pointers, but do not support incremental updates and therefore are only suitable for read-only environments. CSB+-Trees do support updates by retaining a single pointer per non-leaf node that points to a contiguous block of its children. Although CSB+-Trees outperform B+-Trees on searches, they still perform significantly worse on updates due to the overheads of keeping all children for a given node in sequential order within contiguous memory, especially during node splits.
The execution time of CSB+-Trees (normalized to that of B+-Trees) for the same index search experiment eliminate 20% of the data cache stall time, thus resulting in an overall speedup of 1.15 for searches. While this is a significant improvement, over half of the remaining execution time is still being lost due to data cache misses. In addition, these search-oriented optimizations provide no benefit to scan accesses, which suffer even more from data cache misses.
Accordingly, what is needed in the art is a way to enhance the effectiveness and efficiency of database searches and scans.
To address the above-discussed deficiencies of the prior art, the present invention provides a prefetch system for use with a cache memory associated with a database employing indices. In one embodiment, the prefetch system includes a search subsystem configured to prefetch cache lines containing an index of a node of a tree structure associated with the database. Additionally, the prefetch system also includes a scan subsystem configured to prefetch cache lines based on an index prefetch distance between first and second leaf nodes of the tree structure.
In another aspect, the present invention provides a method of prefetching for use with a cache memory associated with a database employing indices. The method includes prefetching cache lines containing an index of a node of a tree structure associated with the database. The method also includes prefetching cache lines based on an index prefetch distance between first and second leaf nodes of the tree structure.
The present invention also provides, in yet another aspect, a database management system including a computer employing a central processing unit, a main memory containing a database employing indices and a cache memory associated with the central processing unit and the main memory. The database management system also includes a prefetch system for use with the cache memory that is coupled to the database employing indices. The prefetch system has a search subsystem that prefetches cache lines containing an index of a node of a tree structure associated with the database, and a scan subsystem that prefetches cache lines based on an index prefetch distance between first and second leaf nodes of the tree structure.
The foregoing has outlined, rather broadly, preferred and alternative features of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.