1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to techniques for supporting different modes of multi-threaded speculative execution.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that the microprocessor systems spend a large fraction of time waiting for memory references to complete instead of performing computational operations.
When a memory reference, such as a load operation, generates a cache miss, the subsequent access to level-two (L2) cache (or main memory) can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
Some processor designers have suggested using “speculative-execution” to avoid pipeline stalls associated with cache misses. Two such proposed speculative-execution modes are: (1) execute-ahead mode and (2) scout mode. Execute-ahead mode operates as follows. During normal execution, the system issues instructions for execution in program order. Upon encountering a data-dependent stall condition during execution of an instruction, the system generates a checkpoint that can be used to return execution of the program to the point of the instruction. Next, the system executes subsequent instructions in the execute-ahead mode, wherein instructions that cannot be executed because of an unresolved data dependency are deferred, and wherein other non-deferred instructions are executed in program order.
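The deferral rule of execute-ahead mode can be illustrated with a minimal sketch. The instruction format, the register names, and the `is_load_miss` flag below are hypothetical and chosen only for illustration. The sketch also assumes that the load that misses is itself placed among the deferred instructions, and that a register becomes usable again once a non-deferred instruction overwrites it. It is not intended to represent any particular hardware implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination and zero or more sources."""
    dst: str
    srcs: tuple = ()
    is_load_miss: bool = False  # assumption: marks a load that misses the cache


def execute_ahead(program):
    """Split a program into executed and deferred instruction indices.

    An instruction is deferred if it is a load that missed, or if any of
    its source registers was produced by a deferred instruction. All other
    instructions execute in program order.
    """
    not_ready = set()  # registers whose values are not yet available
    executed, deferred = [], []
    for index, instr in enumerate(program):
        blocked = instr.is_load_miss or any(s in not_ready for s in instr.srcs)
        if blocked:
            # The result cannot be produced yet, so consumers must wait too.
            not_ready.add(instr.dst)
            deferred.append(index)
        else:
            # A successful write makes the destination register usable again.
            not_ready.discard(instr.dst)
            executed.append(index)
    return executed, deferred


program = [
    Instr("r1", (), is_load_miss=True),  # 0: load misses (launch point)
    Instr("r2", ("r1", "r3")),           # 1: depends on the missing load
    Instr("r4", ("r5", "r6")),           # 2: independent
    Instr("r7", ("r2", "r4")),           # 3: depends on deferred instruction 1
    Instr("r1", ("r5",)),                # 4: overwrites r1 with available data
    Instr("r8", ("r1", "r4")),           # 5: reads the new r1
]

executed, deferred = execute_ahead(program)
print("executed:", executed)
print("deferred:", deferred)
```

In this sketch, instructions 2, 4, and 5 execute during execute-ahead mode, while instructions 0, 1, and 3 are deferred until the missing data returns.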
When the unresolved data dependency is resolved during execute-ahead mode, the system enters a deferred-execution mode, wherein the system executes deferred instructions. If all of the deferred instructions are executed during this deferred-execution mode, the system returns to normal-execution mode to resume normal program execution from the point where the execute-ahead mode left off. Alternatively, if some deferred instructions were not executed during deferred-execution mode, the system returns to execute-ahead mode until the remaining unresolved data dependencies are resolved and the deferred instructions can be executed.
If the system encounters a non-data-dependent stall condition while executing in normal-execution mode or execute-ahead mode, the system moves into scout mode. In scout mode, instructions are speculatively executed to prefetch future loads and stores, but results are not committed to the architectural state of the processor. When the launch point stall condition (the unresolved data dependency or the non-data-dependent stall condition that originally caused the system to move out of normal-execution mode) is finally resolved, the system uses the checkpoint to resume execution in normal-execution mode from the launch point instruction (the instruction that originally encountered the launch point stall condition).
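The mode transitions described above can be summarized as a small state machine. The following is a sketch only: the `Mode` and `Event` names are hypothetical labels for the conditions named in the text, and the sketch assumes that any event not listed for a given mode leaves the mode unchanged.

```python
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()
    EXECUTE_AHEAD = auto()
    DEFERRED = auto()
    SCOUT = auto()


class Event(Enum):
    DATA_DEPENDENT_STALL = auto()      # e.g. a load miss
    NON_DATA_DEPENDENT_STALL = auto()
    DATA_RETURNED = auto()             # an unresolved dependency is resolved
    ALL_DEFERRED_DONE = auto()
    SOME_DEFERRED_REMAIN = auto()
    LAUNCH_STALL_RESOLVED = auto()


# (current mode, event) -> next mode, following the description in the text.
TRANSITIONS = {
    # A checkpoint is generated on this transition.
    (Mode.NORMAL, Event.DATA_DEPENDENT_STALL): Mode.EXECUTE_AHEAD,
    (Mode.NORMAL, Event.NON_DATA_DEPENDENT_STALL): Mode.SCOUT,
    (Mode.EXECUTE_AHEAD, Event.DATA_RETURNED): Mode.DEFERRED,
    (Mode.EXECUTE_AHEAD, Event.NON_DATA_DEPENDENT_STALL): Mode.SCOUT,
    (Mode.DEFERRED, Event.ALL_DEFERRED_DONE): Mode.NORMAL,
    (Mode.DEFERRED, Event.SOME_DEFERRED_REMAIN): Mode.EXECUTE_AHEAD,
    # Execution resumes from the checkpoint at the launch point instruction.
    (Mode.SCOUT, Event.LAUNCH_STALL_RESOLVED): Mode.NORMAL,
}


def next_mode(mode, event):
    """Return the mode after an event; unlisted pairs keep the current mode."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.NORMAL):
    """Apply a sequence of events and return every mode visited."""
    modes = [start]
    for event in events:
        modes.append(next_mode(modes[-1], event))
    return modes


trace = run([
    Event.DATA_DEPENDENT_STALL,
    Event.DATA_RETURNED,
    Event.SOME_DEFERRED_REMAIN,
    Event.NON_DATA_DEPENDENT_STALL,
    Event.LAUNCH_STALL_RESOLVED,
])
print(" -> ".join(mode.name for mode in trace))
```

A table-driven form is used here so that each entry corresponds to one sentence of the description, which makes the sketch easy to compare against the text.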
By allowing the processor to perform work during stall conditions, speculative-execution can significantly increase the amount of computational work the processor completes.
In an effort to increase the effectiveness of speculative-execution, processor designers have suggested using processors that support multithreading to perform speculative-execution. One such design is a speculative multi-threaded (SMT) processor. On such processors, two or more speculative execution threads can operate independently of one another on the processor. SMT processors are most effective for applications where the threads are performing independent tasks. For example, because database queries are typically individual tasks, SMT processors tend to work well for databases.
Processor designers have also suggested using processors that support simultaneous speculative threading (SST). In such processors, a primary thread operates in normal-execution mode and execute-ahead mode, while another thread trails the primary thread and executes instructions deferred by the primary thread in deferred-execution mode. SST is described in more detail in a pending U.S. patent application entitled, “Method and Apparatus for Simultaneous Speculative Threading,” by inventors Shailender Chaudhry, Marc Tremblay, and Paul Caprioli having Ser. No. 11/361,257, and filing date 24 Apr. 2006. SST designs are most effective where the processor is required to perform a single task at the highest possible speed.
Therefore, SMT processors are more effective in situations where multiple threads are performing independent tasks, and SST processors are more effective in situations where multiple threads are working on the same task.
Hence, what is needed is a method and an apparatus that provides the benefits of both an SMT processor and an SST processor.