1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to a method and an apparatus for avoiding hazards involving data dependencies in a processor that supports speculative program execution.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that the microprocessor systems spend a large fraction of time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can help reduce the number of memory accesses that are performed. However, when a memory reference, such as a load operation, generates a cache miss, the subsequent access to level-two (L2) cache or memory can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
A number of techniques are presently used (or have been proposed) to hide this cache-miss latency. Some processors support out-of-order execution, in which instructions are kept in an issue queue, and are issued “out-of-order” when operands become available. However, allowing instructions to issue out-of-order greatly increases the complexity of a processor, because the processor must provide mechanisms to avoid problems caused by data dependencies between instructions that execute out-of-order. These inter-instruction data dependencies give rise to a number of problems, such as read-after-write (RAW) hazards, write-after-write (WAW) hazards, and write-after-read (WAR) hazards.
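For illustration only, the three hazard types named above can be expressed as a comparison of the registers that two instructions read and write. The following is a minimal sketch and does not describe any particular processor. The `Instr` type and the `hazards` function are hypothetical names introduced here, and each instruction is assumed to have at most one destination register.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one optional destination, any number of sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the hazards that arise if `later` is reordered ahead of `earlier`.

    `earlier` precedes `later` in program order.
    """
    found: Set[str] = set()
    # RAW: the later instruction reads a register the earlier one writes.
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")
    # WAW: both instructions write the same register.
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")
    # WAR: the later instruction writes a register the earlier one reads.
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")
    return found


add = Instr(dest="r1", srcs=("r2", "r3"))   # r1 = r2 + r3
use = Instr(dest="r4", srcs=("r1", "r5"))   # r4 = r1 + r5
clobber = Instr(dest="r2", srcs=("r6",))    # r2 = r6
```

In this sketch, `hazards(add, use)` reports a RAW hazard because `use` reads `r1`, and `hazards(add, clobber)` reports a WAR hazard because `clobber` overwrites `r2`, which `add` reads.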
Conventional out-of-order processors deal with RAW hazards by structuring an issue queue as a content-addressable-memory (CAM). Unfortunately, this type of CAM structure has a complexity that grows quadratically with the number of entries in the issue queue and the issue width of the processor. Moreover, performance considerations make it highly desirable to pick ready consumer instructions in the same cycle in which producers make the data available. This factor, along with the timing constraints introduced by higher clock frequencies, limits the size of the issue queue to 128 or fewer entries, which is not sufficient to hide memory latencies as processors continue to get faster.
Conventional out-of-order machines deal with WAW and WAR hazards through register renaming. In a system that supports register renaming, producer instructions specify architectural registers as their destinations, and these architectural registers are mapped by hardware onto unique physical registers. This eliminates WAW and WAR hazards, because the unique physical register cannot be overwritten by another producer instruction. Unfortunately, the register renaming circuitry is also structured as a CAM, which similarly has a complexity that grows quadratically with the number of entries in the issue queue and with the issue width of the processor. Furthermore, constraints on the number of physical registers that are available for register renaming purposes also limit the size of the issue queue.
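The effect of register renaming described above can be sketched in software as follows. This is an illustrative model only, not a description of actual renaming hardware. The `rename` function and the `p0`, `p1`, ... physical register names are hypothetical, and the sketch assumes an unlimited supply of physical registers, whereas real designs have a finite pool that must be reclaimed.

```python
from itertools import count
from typing import Dict, List, Tuple

# Hypothetical instruction form: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def rename(program: List[Instr]) -> List[Instr]:
    """Map architectural registers onto unique physical registers.

    Every destination receives a fresh physical register, so no two
    producer instructions ever write the same physical register.
    """
    fresh = (f"p{i}" for i in count())  # assumption: unlimited physical registers
    mapping: Dict[str, str] = {}        # architectural -> current physical register
    renamed: List[Instr] = []
    for dest, srcs in program:
        new_srcs = []
        for src in srcs:
            # A source not yet written in this program holds an initial value;
            # give it a physical register of its own.
            if src not in mapping:
                mapping[src] = next(fresh)
            new_srcs.append(mapping[src])
        # Allocate the destination only after the sources have been read,
        # so an instruction such as r1 = r1 + 1 reads the old mapping.
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], tuple(new_srcs)))
    return renamed


program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1",)),       # r4 = r1      (RAW on r1)
    ("r1", ("r5",)),       # r1 = r5      (WAW with the first, WAR with the second)
]
renamed = rename(program)
```

In the renamed program, the first and third instructions write different physical registers, so the WAW and WAR hazards on `r1` no longer exist, while the second instruction still reads the physical register written by the first, preserving the true RAW dependency.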
Hence, what is needed is a method and an apparatus for hiding memory latency and dealing with data dependencies without the above-described drawbacks of existing processor designs.