Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from a memory can often become a significant bottleneck on performance. To reduce this bottleneck, it is desirable to use the fastest memory devices available. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity with cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory architecture to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main storage memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high-speed, limited-capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. Often multiple levels of cache memories are used, with each level closer to the microprocessor using progressively faster and smaller memory devices. Also, depending upon the memory architecture used, cache memories may be shared by multiple microprocessors or dedicated to individual microprocessors, and may be either integrated onto the same integrated circuit as a microprocessor or provided on a separate integrated circuit.
Moreover, some cache memories may be used to store both instructions, which comprise the actual programs that are being executed, and the data being processed by those programs. Other cache memories, often those closest to the microprocessors, may be dedicated to storing only instructions or only data.
When multiple levels of memory are provided in a computer architecture, one or more memory controllers are typically relied upon to swap needed data from segments of memory addresses, often known as “cache lines,” between the various memory levels to attempt to maximize the frequency with which requested data is stored in the fastest cache memory accessible by the microprocessor. Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs. As a result of a cache miss, the cache line for that memory address typically must be retrieved from a relatively slower, lower level memory, often with a significant performance penalty.
Caching depends upon both temporal and spatial locality to improve system performance. Put another way, when a particular cache line is retrieved into a cache memory, there is a good likelihood that data from that cache line will be needed again soon, whether the same data is referenced again (temporal locality) or nearby data in the same line is referenced (spatial locality). As long as the line remains cached, the next access to data in that cache line will result in a “cache hit” and thus not incur the performance penalty of a cache miss.
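To make the cache-line, miss, and hit terminology above concrete, the following Python sketch models a single small cache of lines. It is a simplified illustration under stated assumptions, not a description of any particular hardware: the 64-byte line size, the `LineCache` class, and its fully associative organization with least-recently-used (LRU) replacement are all hypothetical choices made for the example.

```python
from collections import OrderedDict

LINE_SIZE = 64  # hypothetical cache line size in bytes


class LineCache:
    """Fully associative cache of whole cache lines with LRU replacement (a sketch)."""

    def __init__(self, num_lines: int, line_size: int = LINE_SIZE) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line number -> None, least to most recently used
        self.hits = 0
        self.misses = 0

    def line_of(self, address: int) -> int:
        # Every address in the same aligned block maps to the same cache line.
        return address // self.line_size

    def access(self, address: int) -> bool:
        """Return True on a cache hit, False on a cache miss."""
        line = self.line_of(address)
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        # Cache miss: bring the whole line in from the slower level,
        # evicting the least recently used line if the cache is full.
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)
        self.lines[line] = None
        return False


cache = LineCache(num_lines=8)
for address in range(256):  # first walk over 256 consecutive bytes (4 lines)
    cache.access(address)
print(cache.hits, cache.misses)  # 252 4
for address in range(256):  # second walk over the same bytes
    cache.access(address)
print(cache.hits, cache.misses)  # 508 4
```

The first walk shows spatial locality: only the first byte referenced in each 64-byte line misses. The second walk shows temporal locality: every line is still resident, so every access hits.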
One manner of increasing the performance benefits of caching involves the use of prefetching, which generally attempts to predict future program references, directed to program instructions and/or data accessed by program instructions, and fetch such information into a cache before the information is actually needed. As such, when the information is later requested, the likelihood increases that the information will already be present in the cache, thus averting the potential occurrence of a cache miss.
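The effect of prefetching can be sketched in the same way. The example below is a hedged illustration that swaps in the simplest possible predictor, a next-line (sequential) prefetcher, rather than the history-based scheme discussed next. The `count_misses` function and 64-byte line size are hypothetical, the cache is unbounded, and a prefetch is assumed to complete instantly, which real hardware does not guarantee.

```python
LINE_SIZE = 64  # hypothetical cache line size in bytes


def count_misses(addresses, prefetch_next_line: bool) -> int:
    """Count demand misses in an unbounded cache, optionally prefetching
    line N+1 whenever line N is referenced."""
    cached = set()
    misses = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line not in cached:
            misses += 1  # demand miss: the line is fetched only when needed
            cached.add(line)
        if prefetch_next_line:
            # Predict the next reference and fetch it before it is requested.
            cached.add(line + 1)
    return misses


stream = range(0, 64 * 16, 8)  # sequential walk over 16 lines, 8 bytes at a time
print(count_misses(stream, prefetch_next_line=False))  # 16
print(count_misses(stream, prefetch_next_line=True))   # 1
```

For a purely sequential stream, fetching line N+1 whenever line N is referenced turns every demand miss after the first into a hit.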
Prefetching can rely on a number of different algorithms. For example, prefetching may be history-based (also referred to as context-based), whereby access patterns in a program are monitored to attempt to detect repeating sequences of accesses, or even repeating sequences of cache misses. For instance, history-based prefetching may be used to detect that whenever a reference to cache line X occurs, it is usually followed by references to cache lines Y and Z. As such, whenever a reference to cache line X occurs, a prefetch engine may automatically initiate a prefetch of cache lines Y and Z, so that when the references to those cache lines do occur, the fetches of those lines will already be complete, or at least underway.
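The X, Y, Z scenario above can be modeled in software as a table keyed by a trigger cache line. The sketch below is loosely in the spirit of correlation-based prefetching and is only one possible arrangement, not a specific prior-art design. The `HistoryPrefetcher` class, its `depth` (how many successor lines are remembered per trigger), and its `table_capacity` bound with LRU eviction are hypothetical, and cache lines are represented by arbitrary integers (here 10, 20, and 30 standing in for X, Y, and Z).

```python
from collections import OrderedDict, deque


class HistoryPrefetcher:
    """Sketch of a history table: remembers which lines followed each trigger line."""

    def __init__(self, depth: int = 2, table_capacity: int = 1024) -> None:
        self.depth = depth                     # successors remembered per trigger (hypothetical)
        self.table_capacity = table_capacity   # bound on table entries (hypothetical)
        self.table = OrderedDict()             # trigger line -> list of successor lines
        self.recent = deque(maxlen=depth + 1)  # sliding window of the latest references

    def access(self, line: int) -> list:
        """Record a reference to `line` and return the lines to prefetch."""
        # Predict: if this line was seen before as a trigger, prefetch its successors.
        prefetches = list(self.table.get(line, []))
        if line in self.table:
            self.table.move_to_end(line)  # keep recently useful entries (LRU)

        # Learn: once the window is full, its oldest entry is a trigger and the
        # remaining entries are the references that followed it.
        self.recent.append(line)
        if len(self.recent) == self.recent.maxlen:
            trigger, *successors = self.recent
            self.table[trigger] = successors
            self.table.move_to_end(trigger)
            if len(self.table) > self.table_capacity:
                self.table.popitem(last=False)  # evict the least recently used entry
        return prefetches


prefetcher = HistoryPrefetcher(depth=2)
for line in [10, 20, 30, 40, 50]:  # training: X=10 is followed by Y=20 and Z=30
    prefetcher.access(line)
hint = prefetcher.access(10)  # X recurs
print(hint)  # [20, 30]
```

In this model, `table_capacity` is where table size matters: each entry stores `depth` successor lines, so a larger bound remembers more patterns at the cost of more storage, while a smaller bound evicts patterns before they can be reused.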
One drawback to conventional history-based prefetching algorithms, however, is the requirement for relatively large hardware tables to store detected sequences of accesses. Large hardware tables tend to occupy too much area on an integrated circuit, and thus add to the cost and complexity of the hardware. Smaller hardware tables, on the other hand, are often incapable of storing enough prefetch history data to appreciably improve performance.
Therefore, a significant need has arisen in the art for a manner of improving the performance of and reducing the storage requirements for a history-based prefetching algorithm.