A conventional processor in an information handling system may include several pipeline stages to increase the effective throughput of the processor. For example, the processor may include a fetch stage that fetches instructions from memory, a decoder stage that decodes instructions into opcodes and operands, and an execution stage with various execution units that execute decoded instructions. Pipelining enables the processor to obtain greater efficiency by performing these processor operations in parallel. For example, the decoder stage may decode a fetched instruction while the fetch stage fetches the next instruction. Similarly, an execution unit in the execution stage may execute a decoded instruction while the decoder stage decodes another instruction.
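The overlap described above can be sketched with a simple cycle count. The sketch below is illustrative only and assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function names and the three stage names are hypothetical and are not taken from any particular processor.

```python
from typing import Dict, List, Optional, Sequence


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the first stage every cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def stage_occupancy(
    num_instructions: int,
    stages: Sequence[str] = ("fetch", "decode", "execute"),
) -> List[Dict[str, Optional[int]]]:
    """For each cycle, map each stage to the index of the instruction it holds (or None)."""
    table = []
    for cycle in range(pipelined_cycles(num_instructions, len(stages))):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction i reaches stage `depth` at cycle i + depth
            row[stage] = index if 0 <= index < num_instructions else None
        table.append(row)
    return table
```

Under these assumptions, four instructions in a three-stage pipeline occupy six cycles rather than twelve, and in the second cycle the decoder stage holds instruction 0 while the fetch stage holds instruction 1.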
The simplest processors processed instructions in program order, namely the order in which the processor encounters instructions in a program. Processor designers increased processor efficiency by designing processors that execute instructions out-of-order (OOO). Designers found that a processor can process an instruction out of program order provided that the instruction does not depend on a result that is not yet available, such as a result from an earlier instruction. In other words, a processor can execute an instruction out-of-order provided that the instruction does not exhibit an unresolved dependency.
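The dependency condition above may be illustrated as follows. This is a minimal sketch that assumes only read-after-write dependencies through registers matter; the `Instr` type, the `eligible_now` function and the register names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]          # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def eligible_now(program: List[Instr], completed: Set[str]) -> List[str]:
    """Names of uncompleted instructions whose sources are all available.

    An instruction is held back only if it reads a register that an earlier,
    still-uncompleted instruction will write (assumption: read-after-write only).
    """
    eligible = []
    pending_dests: Set[str] = set()
    for instr in program:  # walk in program order
        if instr.name in completed:
            continue
        if not any(src in pending_dests for src in instr.srcs):
            eligible.append(instr.name)
        if instr.dest is not None:
            pending_dests.add(instr.dest)
    return eligible


program = [
    Instr("load", "r1"),
    Instr("add", "r2", ("r1", "r3")),  # needs the result of "load"
    Instr("mul", "r4", ("r5", "r6")),  # independent of both
]
```

In this example, before "load" completes, "mul" may execute ahead of "add" because "mul" reads no pending result, whereas "add" must wait for register r1.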
To enable a processor to execute instructions out-of-order (OOO), the processor may include an issue queue between the decoder stage and the execution stage. The issue queue acts as a buffer that effectively decouples the decoder stage from the execution units that form the execution stage of the processor. The issue queue includes logic that determines which instructions to send to the various execution units and the order in which those instructions are sent to the execution units.
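The buffering and selection roles of an issue queue can be sketched as follows. The sketch assumes an oldest-ready-first selection policy, a fixed capacity and a simple set of ready registers; these choices, along with the names `Decoded`, `IssueQueue`, `dispatch` and `issue`, are hypothetical and do not describe any specific processor.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Decoded:
    name: str
    dest: str
    srcs: Tuple[str, ...]


class IssueQueue:
    """Buffer between the decoder stage and the execution units (oldest entry first)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Decoded] = []

    def dispatch(self, instr: Decoded) -> bool:
        """Accept a decoded instruction; return False if the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr)
        return True

    def issue(self, ready_regs: Set[str], width: int) -> List[Decoded]:
        """Remove and return up to `width` of the oldest entries whose sources are ready."""
        selected: List[Decoded] = []
        remaining: List[Decoded] = []
        for entry in self.entries:
            ready = all(src in ready_regs for src in entry.srcs)
            if ready and len(selected) < width:
                selected.append(entry)
            else:
                remaining.append(entry)
        self.entries = remaining
        return selected
```

With such a queue, the decoder stage can continue to dispatch instructions until the queue fills, while the selection logic sends a younger ready instruction to an execution unit ahead of an older instruction that is still waiting on a source.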
The issue queue of a processor may stall when the queue encounters one or more instructions that exhibit a dependency on other instructions. In other words, the issue queue waits for the processor to resolve these dependencies. Once the processor resolves the dependencies, the issue queue resumes issuing instructions to the execution units and execution continues. Unfortunately, the processor loses valuable time while the issue queue remains stalled waiting for the processor to resolve the dependencies that cause the stall. Some modern processors may allow multiple instructions to stall; however, these designs generally do not scale to high-frequency operation or to large issue queues.
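The time lost to such a stall can be illustrated with a small cycle-count model. The sketch below is a simplification under stated assumptions: at most one instruction issues per cycle, each register is written by at most one instruction, and every reader follows its writer in program order. The function name `cycles_to_issue_all`, the `allow_bypass` flag and the latencies are hypothetical. The bypass policy appears only for comparison and is not presented as the approach of any particular processor.

```python
import math
from typing import List, Tuple

# (name, destination register, source registers, result latency in cycles)
Instr = Tuple[str, str, Tuple[str, ...], int]


def cycles_to_issue_all(program: List[Instr], allow_bypass: bool) -> Tuple[int, List[str]]:
    """Return (cycles until every instruction has issued, issue order).

    allow_bypass=False: only the oldest waiting instruction may issue, so the
    queue stalls whenever that instruction has an unresolved dependency.
    allow_bypass=True: the oldest *ready* waiting instruction issues instead.
    """
    # Registers produced inside the program are unavailable until their producer issues.
    ready_at = {dest: math.inf for _, dest, _, _ in program}
    waiting = list(program)
    issue_order: List[str] = []
    cycle = 0
    while waiting:
        candidates = waiting if allow_bypass else waiting[:1]
        for instr in candidates:
            name, dest, srcs, latency = instr
            # Registers not produced by the program are assumed ready from cycle 0.
            if all(ready_at.get(src, 0) <= cycle for src in srcs):
                ready_at[dest] = cycle + latency
                waiting.remove(instr)
                issue_order.append(name)
                break  # at most one issue per cycle
        cycle += 1
    return cycle, issue_order


program: List[Instr] = [
    ("load", "r1", (), 4),
    ("add", "r2", ("r1",), 1),       # depends on the 4-cycle load
    ("mul", "r3", ("r4", "r5"), 1),  # independent
    ("sub", "r6", ("r7",), 1),       # independent
]
```

For this example program, the stalling policy needs seven cycles to issue all four instructions because "mul" and "sub" wait behind "add", whereas the comparison policy needs five.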
What is needed is a method and apparatus that addresses the processor inefficiency problem described above in a scalable manner.