Cache memories have been in existence for many years. Modern cache memories are designed to exploit both temporal and spatial locality of reference: once an operand has been referenced from memory, that same operand, or other operands stored near it in memory, are often referenced a short time later. Recognizing the value of cache memories, practitioners have developed ever more complex, hierarchical memory systems. For example, today it is quite common to find commercially available computer systems having two or more cache memories.
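To make the locality argument concrete, here is a minimal Python sketch of a direct-mapped cache. The class name `DirectMappedCache` and its parameters are hypothetical, chosen only for illustration; real caches add associativity, write policies, and replacement logic that this model omits. Walking sequentially through memory shows spatial locality (one miss per cache line, then hits on the neighboring bytes), and walking the same range a second time shows temporal locality.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache model: num_lines lines of line_size bytes each."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        # Each slot holds the tag of the memory block it currently caches, or None.
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        block = address // self.line_size   # memory block containing this byte
        index = block % self.num_lines      # cache line the block maps to
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # miss: fill the line with this block
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, line_size=16)
for addr in range(64):   # first pass: consecutive bytes (spatial locality)
    cache.access(addr)
print(cache.hits, cache.misses)  # 60 4
for addr in range(64):   # second pass: the same bytes again (temporal locality)
    cache.access(addr)
print(cache.hits, cache.misses)  # 124 4
```

The 64-byte range fits exactly in the four 16-byte lines, so after the four compulsory misses every later access hits.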
It is important to understand that caches exist in the context of a memory hierarchy within a computer system. Usually, a small but very fast local cache memory is coupled to a microprocessor or central processing unit (CPU). Often the local cache memory is incorporated on the same integrated circuit as the microprocessor itself. If the processor issues a memory access that misses in the local cache, the access is passed on to a bigger, but slower, secondary memory.
Very often, computer systems employ a secondary cache in the memory hierarchy, interposed between the local cache memory and a third-level memory. The third-level memory usually consists of another cache memory or the system's main memory (e.g., DRAM or a rigid disk drive). When an access misses both the local and secondary caches, it ends up at the third-level memory (another cache or main memory).
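The miss path through such a hierarchy can be sketched as a lookup that tries each level from fastest to slowest. This is a simplified model under stated assumptions: the `MemoryLevel` and `lookup` names are hypothetical, each level is treated as an unbounded set of block addresses, the latencies are made-up cycle counts, and the last level is assumed to hold every block.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLevel:
    name: str
    latency: int                                   # hypothetical access cost in cycles
    contents: set[int] = field(default_factory=set)  # block addresses currently held


def lookup(levels: list[MemoryLevel], block: int) -> tuple[str, int]:
    """Search levels fastest-first; the last level is assumed to hold every block.
    On a hit at level i, the block is copied into every faster level.
    Returns the servicing level's name and the accumulated latency."""
    total = 0
    for i, level in enumerate(levels):
        total += level.latency
        if block in level.contents or i == len(levels) - 1:
            for faster in levels[:i]:
                faster.contents.add(block)
            return level.name, total
    raise ValueError("empty hierarchy")


hierarchy = [
    MemoryLevel("local cache", latency=1),
    MemoryLevel("secondary cache", latency=10),
    MemoryLevel("main memory", latency=100),
]
print(lookup(hierarchy, 42))  # ('main memory', 111): missed both caches
print(lookup(hierarchy, 42))  # ('local cache', 1): filled on the way back
```

An instruction-cache miss follows the same pattern, with the instruction cache playing the role of the first level.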
Some computer systems also employ a separate instruction cache, which is a fast, local memory that holds instructions to be executed. When a program tries to access instruction data that is not yet (or no longer) in the instruction cache, the processor unit must wait until the desired instruction or instructions are fetched from a higher level of memory within the memory hierarchy. By way of example, if an access to the instruction cache misses, a memory access is then generated to the secondary cache level, or to main memory.
Fetching refers to the act of extracting instructions to be executed from the memory hierarchy, including cache memories. In many cases, a computer system employs an instruction fetch unit that is responsible for deciding which instruction cache entry should be accessed next in order to maximize program performance.
Besides cache memories, another microarchitectural technique for improving performance is pipelining. Pipelining divides the execution of an instruction into sequential steps, using different microarchitectural resources at each step. One of the defining characteristics of pipelined machines is that several instructions are in flight at the same time, each usually at a different stage of the machine. These individual stages of the pipeline are often referred to as pipestages.
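The overlap of instructions across pipestages can be illustrated with an idealized schedule. This sketch assumes a classic five-stage split (`IF`, `ID`, `EX`, `MEM`, `WB`) and no stalls or hazards; neither the stage names nor the function `pipeline_schedule` come from the text above.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed five-stage split, for illustration


def pipeline_schedule(num_instructions: int) -> list[dict[str, int]]:
    """Ideal pipeline with no stalls: instruction i occupies stage s in cycle i + s.
    Returns, for each cycle, a mapping from stage name to the instruction in it."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            instr = cycle - s
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


# From cycle 2 on, three instructions are in flight in three different stages.
for cycle, occupancy in enumerate(pipeline_schedule(3)):
    print(cycle, occupancy)
```

Three instructions finish in 7 cycles instead of the 15 a non-pipelined machine with the same five steps would need.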
Pipelined machines must fetch the next instructions in a program flow before they have completed the previous instructions. This means that if the previous instruction was a branch, then the next instruction fetch could be to the wrong place. Branch prediction is a computer system technique that attempts to infer the proper next instruction address, knowing only the current address. Branch prediction typically utilizes an associative memory called a branch target buffer.
An associative memory is a table that is accessed not via an explicit index, but by the data it contains. If no entry of the associative memory matches the input data, a "miss" signal is asserted. If any entry matches the input data, the associative memory indicates the match and produces any related data that was stored with that entry. Branch target buffers typically comprise a small associative memory that monitors the instruction cache index and tries to predict which instruction cache index should be accessed next, based on branch history.
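A small model of an associative memory used as a branch target buffer is sketched below. The names `AssociativeMemory`, `lookup`, and `insert` are hypothetical, and the round-robin replacement policy is an assumption. Hardware compares the input against every entry in parallel; this model scans the entries in turn, which gives the same result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: int   # what the memory is searched by (here: current instruction cache index)
    data: int  # data stored with the entry (here: predicted next index)


class AssociativeMemory:
    """Fully associative table: the input is compared against every entry's tag."""

    def __init__(self, capacity: int) -> None:
        self.entries: list[Optional[Entry]] = [None] * capacity
        self.next_victim = 0  # round-robin replacement pointer (an assumption)

    def lookup(self, tag: int) -> Optional[int]:
        for entry in self.entries:
            if entry is not None and entry.tag == tag:
                return entry.data  # match: produce the related data
        return None                # no match: the "miss" signal

    def insert(self, tag: int, data: int) -> None:
        for entry in self.entries:
            if entry is not None and entry.tag == tag:
                entry.data = data  # update an existing entry in place
                return
        self.entries[self.next_victim] = Entry(tag, data)
        self.next_victim = (self.next_victim + 1) % len(self.entries)


btb = AssociativeMemory(capacity=2)
btb.insert(0x100, 0x400)          # index 0x100 was last followed by index 0x400
print(btb.lookup(0x100) == 0x400)  # True: hit, so predict index 0x400 next
print(btb.lookup(0x104))           # None: miss, no prediction available
```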
Optimizing the algorithm used to retain the branch history of each entry is an area of ongoing research. When a branch is incorrectly predicted, the speculative state of the machine must be flushed and fetching restarted from the correct place. This process is referred to as branch recovery. Speculation is the technique of guessing which way a program will proceed, and executing down that path. Speculation implies a method of correction when a guess is determined to be wrong.
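As one concrete example of retaining per-entry history, the sketch below uses a 2-bit saturating counter, a widely used textbook scheme that is named here only as an illustration and is not claimed to be the scheme any particular machine uses. The hypothetical `resolve_branch` helper shows branch recovery: when the guess turns out wrong, the speculatively fetched work is discarded and a restart address is returned.

```python
from typing import Optional


class TwoBitCounter:
    """Branch history as a 2-bit saturating counter:
    states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self) -> None:
        self.state = 1  # start in "weakly not taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def resolve_branch(counter: TwoBitCounter, speculative_queue: list,
                   actual_taken: bool, correct_target: int) -> Optional[int]:
    """Compare the prediction with the real outcome and train the counter.
    correct_target is where fetching should continue given the real outcome.
    On a mispredict, flush the speculative work and return the restart address."""
    predicted = counter.predict()
    counter.update(actual_taken)
    if predicted == actual_taken:
        return None                # guess was right: keep the speculative work
    speculative_queue.clear()      # branch recovery: discard wrong-path state
    return correct_target


counter = TwoBitCounter()
queue = ["i1", "i2"]  # instructions fetched down the predicted (not-taken) path
print(resolve_branch(counter, queue, actual_taken=True, correct_target=0x400) == 0x400)
print(queue, counter.state)  # [] 2
```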
Difficulties arise in computer systems that employ a memory hierarchy and that attempt to take advantage of the parallelism present in a program by executing instructions based on data dependencies and resource availability. These types of machines are referred to as "out-of-order" computing machines. The term "out-of-order" means that instructions are not necessarily executed in the sequence implied by the source program. When an access misses the instruction cache, the CPU issues a fetch to the external memory of the system. The problem in out-of-order machines, however, is that accesses can be returned in arbitrary order. Moreover, it is difficult to keep track of pending requests to the external memory system in the face of mispredicted branches. In most cases, the external memory system has no way of knowing that a memory access was canceled due to a mispredicted branch.
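The bookkeeping problem can be made concrete with a small table of outstanding fetches. This is a hypothetical sketch, not a description of any particular design: each request gets a tag so replies can be matched in whatever order they arrive, and because external memory cannot be told about cancellations, a canceled request stays in the table until its now-useless reply arrives and is discarded. The names `PendingFetchTable`, `issue`, `cancel_all`, and `on_reply` are invented for illustration.

```python
from typing import Optional


class PendingFetchTable:
    """Tracks outstanding instruction fetches to external memory by tag."""

    def __init__(self) -> None:
        self.next_tag = 0
        self.addresses: dict[int, int] = {}  # tag -> fetch address still outstanding
        self.canceled: set[int] = set()      # tags whose replies must be discarded

    def issue(self, address: int) -> int:
        tag = self.next_tag
        self.next_tag += 1
        self.addresses[tag] = address
        return tag

    def cancel_all(self) -> None:
        """On a branch mispredict, mark every outstanding fetch as wrong-path.
        The requests cannot be recalled, so their tags stay allocated."""
        self.canceled.update(self.addresses)

    def on_reply(self, tag: int, data: bytes) -> Optional[tuple[int, bytes]]:
        """Handle a reply in arrival order. Returns (address, data) for a live
        request, or None if the request was canceled while outstanding."""
        address = self.addresses.pop(tag)
        if tag in self.canceled:
            self.canceled.discard(tag)
            return None
        return address, data


table = PendingFetchTable()
a, b = table.issue(0x100), table.issue(0x140)
print(table.on_reply(b, b"B") == (0x140, b"B"))  # True: replies may arrive out of order
table.cancel_all()                               # mispredict detected
print(table.on_reply(a, b"A"))                   # None: stale reply is dropped
```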
As will be seen, the present invention provides a method of prefetching using M physical streaming buffers having N logical identifiers, where N is greater than M. These physical buffers are renamed when an instruction stream is redirected. The invention provides numerous advantages in computer performance, especially in computer systems employing out-of-order execution engines.
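The passage above states only that M physical streaming buffers carry N > M logical identifiers and are renamed on a redirect; it does not say how. The following is a hypothetical illustration of one way such renaming could be modeled, and it should not be read as the invention's mechanism. On each redirect, every physical buffer is bound to a fresh logical identifier, so a late reply tagged with an old identifier can be recognized as stale and ignored. The class `StreamBufferRenamer` and its methods are invented for this sketch.

```python
from typing import Optional


class StreamBufferRenamer:
    """Hypothetical model: m physical buffers, n > m logical identifiers."""

    def __init__(self, m: int, n: int) -> None:
        if n <= m:
            raise ValueError("need more logical identifiers than physical buffers")
        self.n = n
        self.next_id = 0
        # logical_of[p] is the logical identifier currently bound to physical buffer p.
        self.logical_of = [self._fresh_id() for _ in range(m)]

    def _fresh_id(self) -> int:
        lid = self.next_id
        self.next_id = (self.next_id + 1) % self.n  # identifiers recycled round-robin
        return lid

    def tag_for(self, physical: int) -> int:
        """Identifier to attach to a request issued on behalf of a physical buffer."""
        return self.logical_of[physical]

    def redirect(self) -> None:
        """Instruction stream redirected: rename every physical buffer."""
        self.logical_of = [self._fresh_id() for _ in self.logical_of]

    def accept(self, logical_id: int) -> Optional[int]:
        """Physical buffer a reply belongs to, or None if its identifier is stale."""
        if logical_id in self.logical_of:
            return self.logical_of.index(logical_id)
        return None


renamer = StreamBufferRenamer(m=2, n=4)
old_tag = renamer.tag_for(0)   # request issued before the redirect
renamer.redirect()
print(renamer.accept(old_tag))  # None: the old request's reply is ignored
```

Because identifiers are recycled round-robin here, a real design would have to ensure an identifier is not reused while a reply carrying it is still outstanding; this sketch does not track that.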