1. Field of the Invention
The present invention relates generally to the field of computer memory, and, more particularly, to mechanisms and methods of using data access patterns to optimize performance of memory access operations in computer systems.
2. Description of the Related Art
In a modern computer system, data is typically propagated through the memory and cache hierarchy. In a multi-processor system, data can be replicated in multiple caches, and a cache coherence mechanism can be employed to keep those caches coherent. Ideally, an effective memory system should place data in the right place at the right time. This requires that requested data be moved to a cache close to the accessing processor in advance to reduce potential cache miss latency, and that the corresponding cache line be brought to an appropriate cache coherence state to reduce potential coherence overhead.
From a software perspective, different applications usually have different data access patterns. Within a given application, different memory regions can exhibit different data access patterns. Furthermore, the same memory region may exhibit different data access patterns at different points during program execution. To improve overall performance, it is generally desirable to have a memory system that can adapt to various data access patterns.
For data access patterns that are dynamically predictable, hardware can incorporate appropriate prediction mechanisms. For example, the IBM® POWER4 system comprises a hardware pre-fetch engine that allows hardware to detect streaming data accesses on the fly and to retrieve streaming data from memory automatically. When cache misses occur on sequential cache lines, the pre-fetch engine can initiate memory accesses to subsequent cache lines before they are referenced. This allows data to be pre-fetched from memory to an L3 cache, from the L3 cache to an L2 cache, and from the L2 cache to an L1 cache.
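The stream detection described above can be illustrated with a minimal sketch in C. It is a simplified software model written for illustration only and is not the POWER4 pre-fetch engine's actual algorithm: it tracks a single stream, treats two misses on consecutive cache lines as a confirmed stream, and reports which subsequent lines would be pre-fetched. The names `stream_detector`, `on_cache_miss`, and `PREFETCH_DEPTH` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DEPTH 2  /* hypothetical: number of lines fetched ahead */

/* Hypothetical state for a single-stream detector. */
typedef struct {
    uint64_t last_miss_line;  /* cache line number of the most recent miss */
    bool     have_last;       /* false until the first miss has been seen */
} stream_detector;

/*
 * Record a cache miss on cache line `line`. If this miss directly follows
 * the previous miss (line == previous + 1), write the next PREFETCH_DEPTH
 * line numbers into `out` and return PREFETCH_DEPTH. Otherwise return 0.
 */
size_t on_cache_miss(stream_detector *d, uint64_t line,
                     uint64_t out[PREFETCH_DEPTH])
{
    assert(d != NULL && out != NULL);

    size_t n = 0;
    if (d->have_last && line == d->last_miss_line + 1) {
        for (n = 0; n < PREFETCH_DEPTH; n++)
            out[n] = line + 1 + n;  /* lines not yet referenced */
    }
    d->last_miss_line = line;
    d->have_last = true;
    return n;
}
```

In this model, a miss on line 100 followed by a miss on line 101 would yield pre-fetch candidates 102 and 103, while an unrelated miss (for example on line 200) yields none. A real engine would typically track several streams at once and stage data through each cache level.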
Many data access patterns can be statically detected or predicted by the programmer or the compiler. For data access patterns that are statically predictable, software can specify appropriate heuristic information that is passed to the underlying system. For example, the IBM® PowerPC® architecture comprises DCBT (data cache block touch) and DCBTST (data cache block touch for store) instructions, which behave as hints to hardware that data of a memory block should be pre-fetched to avoid potential memory access latency.
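As an illustration of such a software hint, the following sketch uses the `__builtin_prefetch` built-in provided by GCC and Clang, whose second argument distinguishes a read hint (0) from a write hint (1). On PowerPC targets a compiler may lower these hints to touch instructions such as DCBT or DCBTST, but that mapping is compiler-dependent and is stated here as an assumption. The function name `sum_with_prefetch` and the look-ahead constant `PREFETCH_DISTANCE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* hypothetical: elements to look ahead */

/*
 * Sum an array while hinting that elements PREFETCH_DISTANCE ahead
 * will be read soon. The hint never changes the result; hardware is
 * free to ignore it.
 */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: read hint; locality = 3: keep in all cache levels */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

Note that a hint of this kind names only an individual address; it carries no information about the broader access pattern of the region being traversed.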
For many applications, the programmer or the compiler can determine possible data access patterns for some, if not all, commonly-used variables. The data access patterns may be more sophisticated than simple pre-fetch operations that retrieve individual cache lines. However, modern computer systems lack an effective means for software to pass such data access pattern information to the underlying memory system. For example, in a multi-threaded program, the programmer or the compiler may have good knowledge of the memory addresses that are associated with a semaphore. This knowledge, if made available to the underlying memory system, could be used to reduce memory access latency. For example, when a processor acquires the semaphore, the acquisition could serve as an indication that data at the memory addresses associated with the semaphore should be pre-fetched to a cache close to the processor. However, software has no effective architecture interface through which to inform hardware of such data access pattern information.
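The semaphore example can be made concrete with a small sketch of the knowledge that software holds. The sketch below is a software-only approximation, assuming a GCC or Clang compiler: it records which memory region a semaphore guards and, when called immediately after the semaphore is acquired, issues one per-line hint for that region. The type `guarded_region`, the function `prefetch_guarded_data`, and the 128-byte value of `CACHE_LINE_BYTES` are all hypothetical, and the actual cache line size is system-specific.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 128  /* assumption: line size varies by system */

/* Hypothetical record of the data that a semaphore guards. */
typedef struct {
    const void *data;    /* start of the guarded region (assumed line-aligned) */
    size_t      length;  /* size of the guarded region in bytes */
} guarded_region;

/*
 * Intended to be called right after the guarding semaphore is acquired.
 * Issues one read hint per cache line in the region and returns the
 * number of hints issued.
 */
size_t prefetch_guarded_data(const guarded_region *r)
{
    assert(r != NULL);

    const char *p = (const char *)r->data;
    size_t hints = 0;
    for (size_t off = 0; off < r->length; off += CACHE_LINE_BYTES) {
        __builtin_prefetch(p + off, 0, 3);
        hints++;
    }
    return hints;
}
```

For a 300-byte region this issues three hints. The association between the semaphore and the region exists only in software here, and every hint costs an instruction, which illustrates why per-line hints are a poor substitute for an interface that conveys the pattern itself.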
Therefore, it is generally desirable to have an effective mechanism with appropriate architecture support that enables software to specify data access patterns that are to be passed to underlying hardware.