1. Field of the Invention
The present invention relates generally to the storage of data in a microprocessor cache memory, and more particularly to a system, method, and microprocessor architecture for avoiding cache pollution caused by speculative memory operations.
2. Description of Related Art
Early computer processors, also called microprocessors, included a central processing unit or instruction execution unit that executed only one instruction at a time. Early microprocessors also executed instructions in an order determined by the compiled machine-language program running on the microprocessor and so are referred to as “sequential” microprocessors. In response to the need for improved performance, several techniques have been used to extend the capabilities of these early microprocessors including pipelining, superpipelining, superscaling, and speculative instruction execution.
Pipelined architectures break the execution of instructions into a number of stages where each stage corresponds to one step in the execution of the instruction. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction is finished executing. Pipelined architectures have been extended to “superpipelined” or “extended pipeline” architectures where each execution pipeline is broken down into even smaller stages (i.e., microinstruction granularity is increased). Superpipelining increases the number of instructions that can be executed in the pipeline at any given time.
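The throughput benefit of pipelining described above can be illustrated with a short calculation. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the 5-stage, 100-instruction figures are illustrative assumptions and are not taken from any particular microprocessor.

```python
def sequential_cycles(num_instructions, num_stages):
    """Cycles needed when each instruction finishes before the next begins."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Cycles needed in an ideal pipeline: fill the pipeline once, then
    complete one instruction per cycle (assumes no stalls of any kind)."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


# Hypothetical workload: 100 instructions on a 5-stage pipeline.
seq = sequential_cycles(100, 5)
pipe = pipelined_cycles(100, 5)
print(seq, pipe)  # 500 104
```

Under these assumptions the pipelined design approaches, but does not reach, a speedup equal to the number of stages, because the pipeline must first fill.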
“Superscalar” microprocessors generally refer to a class of microprocessor architectures that include multiple pipelines that process instructions in parallel. Superscalar microprocessors typically execute more than one instruction per clock cycle, on average. Superscalar microprocessors allow parallel instruction execution in two or more instruction execution pipelines. The number of instructions that may be processed is increased due to parallel execution.
The goal of superscalar and superpipelined microprocessors is to execute multiple instructions per microprocessor clock cycle. Instruction-level parallelism available in programs, such as that exposed by Single Instruction Multiple Data instructions, can be exploited to realize this goal. Full exploitation of this potential parallelism requires that instructions be dispatched for execution at a sufficient rate.
However, the ability of modern high performance microprocessors to execute instructions has typically outpaced the ability of memory subsystems to supply instructions and data to the microprocessors. Thus, most high-performance modern microprocessors use cache memory systems with at least one level of high speed, on-chip cache memory to speed memory access.
Cache memory, sometimes simply called “cache”, comprises one or more levels of dedicated high-speed memory for holding recently accessed instructions and data, designed to speed up subsequent access to the same instructions or data. Cache technology is based on the premise that programs frequently re-execute the same instructions or execute different instructions on recently accessed data.
When data is read from main memory, a cache system saves a copy of the data in the cache memory, along with an index to the associated main memory. The cache system then monitors subsequent requests for data to see if the information needed has already been stored in the cache.
If the cache system indicates that the data had indeed been stored in the cache, sometimes called a cache “hit”, the data is delivered immediately to the microprocessor and the attempt to fetch the information from main memory is aborted (or not started). If, on the other hand, the data had not been previously stored in cache, sometimes called a cache “miss”, then the requested data is fetched directly from main memory and also saved in the cache for future access.
Each cache entry is typically accessed by a unique address “tag” stored separately in a tag random access memory (RAM). A cache “hit” occurs whenever a memory access to the cache occurs and the cache system finds, through inspecting its tag memory, that the requested data is present and valid in the cache.
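The tag-based hit and miss determination described above may be sketched as follows. This is a minimal illustration that assumes a direct-mapped organization with four 16-byte lines; the class name, the field names, and the sizes are hypothetical, and an actual cache may be set-associative and would also hold the cached data itself.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags and valid bits."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines    # stands in for the tag RAM
        self.valid = [False] * num_lines  # one valid bit per line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // self.line_size  # which memory block is addressed
        index = block % self.num_lines     # which cache line it maps to
        tag = block // self.num_lines      # identifies the block within that line
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the block would be fetched from main memory and saved here,
        # replacing whatever the line held before.
        self.misses += 1
        self.tags[index] = tag
        self.valid[index] = True
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)]
print(results)  # [False, True, False, False]
```

In this trace the second access hits because it falls in the line just filled, while the access to 0x40 maps to the same line with a different tag and replaces it, so the final access to 0x00 misses again.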
In superscalar microprocessors, multiple pipelines could simultaneously process instructions only when there were no data dependencies between the instructions in each pipeline. For example, when an instruction that operates on data fetched from memory is dependent upon one or more preceding instructions to load the required data into working operand registers, the dependent instruction cannot execute until all of the required stored data has been retrieved from cache or main memory.
Microprocessor instruction execution units in an execution pipeline cannot predict how long it may take to load data into the working operand registers specified by a particular load operation. Microprocessors typically handled this uncertainty by “stalling” the execution pipeline, that is, by delaying execution until the fetched data was returned. Consequently, data dependencies caused one or more pipelines to wait for the dependent data to become available. This stalling was inconsistent with high speed, multiple instruction per cycle processing.
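The effect of such stalls can be illustrated with a simplified in-order issue model. The sketch below assumes that at most one instruction issues per cycle and that an instruction cannot issue until every result it depends on is ready; the function name, the instruction names, and the latencies of 1 cycle for a cache hit and 10 cycles for a cache miss are hypothetical values chosen only for illustration.

```python
def schedule(instructions):
    """Issue instructions in program order, stalling on unready operands.

    instructions: list of (name, latency, deps) tuples, where deps names
    earlier instructions whose results are required.
    Returns (issue_cycle_by_name, total_stall_cycles).
    """
    ready = {}   # cycle at which each instruction's result becomes available
    issue = {}   # cycle at which each instruction issues
    cycle = 0    # earliest cycle at which the next instruction could issue
    stalls = 0
    for name, latency, deps in instructions:
        operands_ready = max((ready[d] for d in deps), default=cycle)
        start = max(cycle, operands_ready)
        stalls += start - cycle  # cycles spent waiting for operands
        issue[name] = start
        ready[name] = start + latency
        cycle = start + 1
    return issue, stalls


# A load followed by an instruction that consumes the loaded value.
_, stalls_on_hit = schedule([("load", 1, []), ("add", 1, ["load"])])
_, stalls_on_miss = schedule([("load", 10, []), ("add", 1, ["load"])])
print(stalls_on_hit, stalls_on_miss)  # 0 9
```

With these assumed latencies the dependent instruction issues without delay when the load hits in the cache, but waits nine cycles when the load misses.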
Speculative instruction execution was an attempt to address pipeline “stalls” in deeply pipelined microprocessors caused by data load operation uncertainty. Modern microprocessors pipeline memory operations, allowing a second load operation to enter a load/store stage in an execution pipeline before a previous load instruction, on whose results the second load operation depends, has passed completely through the execution pipeline.
In such deeply pipelined, speculative instruction execution microprocessors, a dependent load instruction, sometimes called a consumer load instruction, dependent on a sequentially earlier load instruction, sometimes called a producer load instruction, may be scheduled for issue before confirmation that the load data required by the consumer load instruction is available in cache memory.
In such microprocessors there is a delay between the decision to issue an instruction and the actual execution of the instruction. Thus, in the case of load instructions, there may exist a significant delay between the issue of a load instruction and data fetch from cache memory. The load instruction is often said to be “in-flight” during this delay. Accordingly, a consumer load instruction, dependent on an “in-flight” producer load instruction, may be issued before the confirmation by the cache system that the load data required by the consumer load instruction is available in the cache. When the required data is not found in the cache, such dependent consumer load instructions could be executing with incorrect data operands. Consumer load instructions, directly or indirectly dependent on “in-flight” producer load instructions, executing with incorrect data operands, send erroneous data to the main memory and cache memory. Such consumer load instructions may cause useful data to be replaced in cache with erroneous data, i.e., data that will likely not be referenced by sequentially later program instructions. This replacement of useful data with erroneous data in the cache, sometimes referred to as cache pollution, often leads to poor cache system performance and thereby adversely affects processor performance and system throughput.
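The cost of cache pollution can be illustrated with a small simulation. The sketch below assumes a fully associative cache of four lines with least-recently-used replacement and a program that repeatedly reads the same four blocks; the block numbers 100 and 101 stand in for lines fetched by mis-speculated consumer loads. All names, sizes, and traces are hypothetical, and the cyclic access pattern is close to a worst case for this replacement policy.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered from least to most recently used

    def access(self, block):
        """Return True on a hit; on a miss, fill the cache and return False."""
        if block in self.lines:
            self.lines.move_to_end(block)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[block] = True
        return False


def useful_misses(trace, bogus=frozenset(), capacity=4):
    """Count misses on accesses the program actually needs (not in bogus)."""
    cache = LRUCache(capacity)
    misses = 0
    for block in trace:
        hit = cache.access(block)
        if not hit and block not in bogus:
            misses += 1
    return misses


working_set = [0, 1, 2, 3]
clean = working_set * 3
# Same program, with one erroneous speculative fill after each of the
# first two passes over the working set.
polluted = working_set + [100] + working_set + [101] + working_set

print(useful_misses(clean))                    # 4
print(useful_misses(polluted, {100, 101}))     # 12
```

In the clean trace only the four initial fills miss. In the polluted trace each erroneous fill evicts a useful line, and under this access pattern every subsequent useful access misses as well.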
What is needed is a method and apparatus for minimizing or eliminating cache pollution caused by speculative memory load operations.