1. Field of the Invention
The present invention relates generally to speculative instruction execution in a processor, and more particularly to achieving a very large lookahead instruction window via non-sequential instruction fetch and issue.
2. Description of Related Art
A rather common computer program feature is a loop in which each iteration has a large number of dependent floating-point operations but there are few or no data dependencies between iterations. A traditional issue mechanism in a prior art processor was unable to extract much instruction-level parallelism (ILP) from such a loop because the processor had a limited lookahead instruction window.
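As an illustration only, the loop shape described above can be sketched as follows. The function names, the coefficients `a` and `b`, and the chain length are hypothetical and are chosen merely to show a serial chain of multiply-add operations within an iteration and the absence of dependencies between iterations.

```python
def dependent_chain(x, a, b, chain_length):
    """One loop iteration: a serial chain of multiply-add operations."""
    acc = x
    for _ in range(chain_length):
        # Each step consumes the result of the previous step, so the
        # operations of a single iteration cannot overlap one another.
        acc = acc * a + b
    return acc


def run_loop(xs, a=0.5, b=1.0, chain_length=8):
    """The outer loop: no iteration reads a value produced by another."""
    return [dependent_chain(x, a, b, chain_length) for x in xs]
```

Because the iterations are independent, the parallelism in such a loop lies across iterations rather than within any one iteration.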
Typically, the number of entries in an issue queue of the processor limited the size of the lookahead instruction window. Consequently, the processor picked instructions for issue from only one iteration of the loop. Unfortunately, these instructions were dependent on one another. Thus, the processor was forced to execute the instructions sequentially.
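The effect of the issue-queue size can be illustrated with a toy model. The sketch below is not a description of any actual processor; it assumes a uniform operation latency, unlimited issue width, in-order allocation into the issue queue, and release of a queue entry at issue. All names and parameter values are hypothetical.

```python
def cycles_to_finish(iterations, chain_length, window, latency=4):
    """Toy model: cycles until every result of the loop is available."""
    total = iterations * chain_length
    ready_at = [None] * total  # cycle at which each result is available
    next_fetch = 0             # next instruction, in program order
    in_window = []             # unissued instructions held in the queue
    issued = 0
    cycle = 0
    while issued < total:
        # Allocate queue entries in program order until the queue is full.
        while len(in_window) < window and next_fetch < total:
            in_window.append(next_fetch)
            next_fetch += 1
        still_waiting = []
        for k in in_window:
            if k % chain_length == 0:
                ready = True  # first operation of an iteration
            else:
                prev = ready_at[k - 1]  # depends on the previous operation
                ready = prev is not None and prev <= cycle
            if ready:
                ready_at[k] = cycle + latency
                issued += 1
            else:
                still_waiting.append(k)
        in_window = still_waiting
        cycle += 1
    return max(ready_at)
```

Under these assumptions, a queue large enough to hold every iteration lets the chains overlap completely, so the loop takes about as long as a single chain, whereas a small queue holds operations from few iterations at a time and execution is largely serialized.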
To achieve a very large lookahead instruction window, a traditional out-of-order processor required a very large issue queue. However, such an issue queue was impractical since the issue queue was usually implemented as a content-addressable memory (CAM) structure to support associative searches. A large CAM typically had an unacceptable cycle time and/or power impact.
On the other hand, some processors had a capability to selectively defer execution of instructions. See, for example, commonly assigned U.S. Pat. No. 7,114,060 B2, entitled “Selectively Deferring Instructions Issued in Program Order Utilizing a Checkpoint and Multiple Deferral Scheme,” of Shailender Chaudhry and Marc Tremblay, issued on Sep. 26, 2006, which is incorporated herein by reference in its entirety. Such processors could potentially support a very large lookahead instruction window because long-latency operations, such as L2 cache misses, did not block the pipeline. However, in the above loop example, the current implementation of such processors, even in a Simultaneous Speculative Threading (SST) mode, does not achieve a very large lookahead instruction window because the processor processes the dynamic instruction stream sequentially. Since each loop iteration is a long chain of dependent floating-point operations (e.g., fmuladd instructions), the issue mechanism advances through the dynamic instruction stream slowly and is unable to achieve a large lookahead instruction window. Such a processor supports two threads and, to implement the SST mode, provides mechanisms that allow close interaction between the two threads, e.g., each thread has a copy of the register file. See commonly assigned U.S. Patent Application Publication No. 2006/0212689 A1, entitled “Method and Apparatus for Simultaneous Speculative Threading,” of Shailender Chaudhry et al., published Sep. 21, 2006, and filed Apr. 24, 2006, which is incorporated herein by reference in its entirety.