1. Field
The disclosed embodiments relate to techniques for improving the performance of computer systems. More specifically, the disclosed embodiments relate to a method and apparatus for reducing hardware costs involved in supporting a lookahead mode, which occurs when a strand encounters a long-latency event and continues executing to generate prefetches without waiting for the long-latency event to complete.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is creating significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that the microprocessor systems spend a large fraction of time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can reduce the number of memory accesses that are performed. However, when a memory reference, such as a load operation, generates a cache miss, the subsequent access to level-two cache, level-three cache or main memory can require hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
The “miss-lookahead” technique has been developed to improve the performance of microprocessors when running applications that encounter such long-latency events (e.g., outermost-level cache misses). In the miss-lookahead technique, a processor transitions a strand (e.g., a hardware thread) from a normal-operating mode to a lookahead mode when that strand encounters a long-latency event, such as a cache miss. As part of the transition, the system takes a checkpoint of the processor state of the strand. In lookahead mode, the processor executes the same code as in normal-operating mode, but converts outermost-level cache misses into prefetches and converts instructions that depend on the data from these cache misses into no-ops.
When the long-latency event that triggered the entry into lookahead mode completes, the strand exits lookahead mode and resumes execution in normal-operating mode from the instruction that triggered the long-latency event. Note that the strand's architectural register state is modified as instructions are retired during lookahead mode. However, the strand's architectural state is restored to the checkpointed state before execution resumes in normal-operating mode.
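The conventional miss-lookahead flow described above can be illustrated with a small software model. The following Python sketch is not part of the disclosed hardware; it is a simplified illustration under stated assumptions. The names Strand, run, run_lookahead and the window parameter are hypothetical, the cache is modeled as a set of addresses, and prefetches are assumed to complete at the same time as the triggering miss.

```python
from dataclasses import dataclass, field


@dataclass
class Strand:
    regs: dict                     # architectural registers (hypothetical model)
    cache: set                     # addresses currently cached
    memory: dict                   # address -> value
    prefetches: list = field(default_factory=list)
    stalls: int = 0                # long-latency events seen in normal mode


def run_lookahead(strand, program, miss_pc, window):
    """Execute up to `window` instructions after the missing load at miss_pc."""
    checkpoint = dict(strand.regs)             # checkpoint taken on entry
    invalid = {program[miss_pc][1]}            # registers whose data is unavailable
    for op, dst, a, b in program[miss_pc + 1: miss_pc + 1 + window]:
        if op == "load":                       # a = base register, b = offset
            if a in invalid:
                invalid.add(dst)               # address unknown: becomes a no-op
                continue
            addr = strand.regs[a] + b
            if addr in strand.cache:
                strand.regs[dst] = strand.memory[addr]
                invalid.discard(dst)
            else:
                strand.prefetches.append(addr)  # miss is converted into a prefetch
                invalid.add(dst)
        elif op == "add":
            if a in invalid or b in invalid:
                invalid.add(dst)               # dependent instruction becomes a no-op
            else:
                strand.regs[dst] = strand.regs[a] + strand.regs[b]
                invalid.discard(dst)
    strand.regs = checkpoint                   # restore state before resuming


def run(strand, program, window=8):
    """Run in normal-operating mode, entering lookahead mode on each cache miss."""
    pc = 0
    while pc < len(program):
        op, dst, a, b = program[pc]
        if op == "load":
            addr = strand.regs[a] + b
            if addr not in strand.cache:
                strand.stalls += 1
                run_lookahead(strand, program, pc, window)
                # Assumption: the miss and all prefetches complete together.
                strand.cache.add(addr)
                strand.cache.update(strand.prefetches)
                strand.prefetches.clear()
                continue                       # resume from the triggering instruction
            strand.regs[dst] = strand.memory[addr]
        elif op == "add":
            strand.regs[dst] = strand.regs[a] + strand.regs[b]
        pc += 1
    return strand.regs


program = [
    ("load", "r1", "r0", 100),
    ("load", "r2", "r0", 200),
    ("add", "r3", "r1", "r2"),
]
s = Strand(regs={"r0": 0, "r1": 0, "r2": 0, "r3": 0},
           cache=set(), memory={100: 5, 200: 7})
run(s, program)
print(s.regs["r3"], s.stalls)   # prints "12 1"
```

In this model the second load misses during lookahead mode and is issued as a prefetch, so the strand stalls only once in normal-operating mode. Setting window=0 disables lookahead, and the same program then stalls twice. The model also makes the cost discussed next visible: a full copy of the register state is taken on every entry into lookahead mode.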
The miss-lookahead technique can significantly improve processor performance because it can effectively perform instruction and data prefetching for the lookahead strand. However, the miss-lookahead technique has a number of drawbacks. (1) As described above, a conventional miss-lookahead technique involves taking a checkpoint of the processor state prior to entering lookahead mode. However, supporting this checkpointing operation can be expensive in terms of hardware costs, especially for processor architectures with a large number of architectural registers (e.g., register windows). Moreover, highly multi-threaded processors potentially require many concurrent checkpoints. (2) Also, the miss-lookahead technique consumes additional power because the lookahead instructions must eventually be re-executed non-speculatively. (3) Moreover, the miss-lookahead technique can take hardware resources away from other strands that are sharing the same hardware resources as the lookahead strand, thereby slowing those other strands and ultimately affecting processor performance.
Hence, it is desirable to be able to reduce or eliminate the negative effects of the above-described drawbacks in a system that supports miss-lookahead mode.