1. Field of the Invention
The present invention generally relates to microprocessors, and particularly relates to dynamically delaying instruction execution.
2. Relevant Background
Modern processing systems conventionally include a main or central processor and one or more coprocessors. The main processor offloads certain tasks to the coprocessors. For example, floating-point, arithmetic, graphics, signal-processing, string-processing, encryption, and/or vector-processing tasks can be offloaded to corresponding coprocessors. Offloading processor-intensive tasks lightens the computational load placed on the main processor, thus improving system performance.
An instruction offloaded to a coprocessor may be speculative; that is, when the instruction is provided to the coprocessor for execution, the main processor has not yet finally resolved whether it should be executed. This often occurs in the context of conditional branch instructions. When a conditional branch instruction is encountered, the processor predicts which path the branch will take: whether execution will jump to a different stream of code (i.e., the branch is taken) or continue with the instruction after the branch (i.e., the branch is not taken). The processor or coprocessor then speculatively executes one or more instructions associated with the predicted path. When the branch instruction is later resolved, it may be determined that the path was incorrectly predicted, which is often referred to as a branch misprediction. When a branch misprediction occurs, all instructions in the mispredicted path are discarded, or flushed. Power consumption suffers because the work performed on instructions in the mispredicted path, which may be substantial, was unnecessary.
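The wasted work caused by a misprediction can be illustrated with a toy model. The sketch below is a hypothetical illustration and is not taken from this disclosure: each `Branch` records the predicted and actual direction of one conditional branch, and `spec_len` is an assumed count of instructions on the predicted path that were executed speculatively before the branch resolved. Every such instruction on a mispredicted path is flushed, so the work spent on it is wasted.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """Toy record of one conditional branch (hypothetical model)."""
    predicted_taken: bool  # direction chosen by the branch predictor
    actual_taken: bool     # direction determined when the branch resolves
    spec_len: int          # instructions executed speculatively on the predicted path


def count_flushed(branches: list[Branch]) -> tuple[int, int]:
    """Return (useful, flushed) counts of speculatively executed instructions.

    Instructions on a correctly predicted path are kept; instructions on a
    mispredicted path are discarded, so the work spent on them is wasted.
    """
    useful = flushed = 0
    for b in branches:
        if b.predicted_taken == b.actual_taken:
            useful += b.spec_len
        else:
            flushed += b.spec_len
    return useful, flushed


if __name__ == "__main__":
    # Invented trace for illustration only.
    trace = [
        Branch(predicted_taken=True, actual_taken=True, spec_len=8),
        Branch(predicted_taken=True, actual_taken=False, spec_len=8),  # misprediction
        Branch(predicted_taken=False, actual_taken=False, spec_len=4),
    ]
    print(count_flushed(trace))  # (12, 8)
```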
To reduce the adverse effects associated with mispredicted branches, conventional coprocessors operate with a fixed instruction execution delay. Each instruction received by a conventional coprocessor incurs this fixed delay, so its execution does not begin until the delay lapses. Once the processor's design is ‘frozen’, the instruction execution delay cannot be modified without revising the design. Thus, each instruction executed by a conventional coprocessor incurs the same fixed delay regardless of the particular software application being executed or the environment in which it is run.
However, different application types exhibit different instruction execution predictability. That is, speculative instructions associated with some application types are almost always correctly predicted as taken, so speculative instruction execution is fairly reliable because the likelihood of branch misprediction is relatively low. For example, the inverse transform function employed by a video decoding application is executed for each macroblock of every video frame. Conversely, execution of speculative instructions associated with other application types may be less reliable, making execution pipeline flushes more likely. For example, the de-blocking and de-ringing filter functions executed by a video decoding application depend on the pixel values of a particular video frame and are therefore executed less predictably. As a result, a fixed instruction execution delay may degrade processor performance or power efficiency when executing speculative instructions associated with applications that have different instruction execution predictabilities.
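The effect of one fixed delay on workloads with different predictability can be sketched with a simple cost model. This is a hypothetical model, not the disclosed design: each offloaded instruction is described by `resolve` (assumed number of cycles after it reaches the coprocessor until the main processor resolves it) and `mispredicted`. A coprocessor with fixed delay `delay` starts execution `delay` cycles after arrival. A mispredicted instruction that is resolved by then is dropped before executing; otherwise its execution is wasted. Each correctly predicted instruction pays the full delay as added startup latency. The two traces are invented: one stands in for a highly predictable function such as the inverse transform, the other for a data-dependent one such as a de-blocking filter.

```python
from typing import NamedTuple


class Offloaded(NamedTuple):
    """One instruction offloaded to the coprocessor (hypothetical model)."""
    resolve: int        # cycles after arrival until the main processor resolves it (assumed)
    mispredicted: bool  # True if the instruction lies on a mispredicted path


def fixed_delay_cost(trace: list[Offloaded], delay: int) -> tuple[int, int]:
    """Return (wasted_executions, added_latency) for a single fixed delay.

    wasted_executions: mispredicted instructions still unresolved when
    execution starts, so their work is later flushed.
    added_latency: total delay cycles paid by correctly predicted instructions.
    """
    wasted = latency = 0
    for ins in trace:
        if ins.mispredicted:
            if ins.resolve > delay:  # unresolved when execution begins
                wasted += 1
        else:
            latency += delay  # useful instruction waits the full fixed delay
    return wasted, latency


if __name__ == "__main__":
    predictable = [Offloaded(6, False)] * 9 + [Offloaded(6, True)]
    data_dependent = [Offloaded(6, False)] * 5 + [Offloaded(6, True)] * 5
    print(fixed_delay_cost(predictable, 4))     # (1, 36)
    print(fixed_delay_cost(data_dependent, 4))  # (5, 20)
```

With the same delay, the predictable trace pays substantial latency to avoid very little wasted work, while the data-dependent trace still wastes half of its executions; no single fixed value suits both.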
Instruction execution delay is conventionally fixed at one of two extremes: one that enables full speculative instruction execution, or one that enables no speculative instruction execution. In full speculative instruction execution mode, coprocessors speculatively execute each instruction provided to them without delay, before the instruction has been committed by the main processor; the fixed instruction execution delay is essentially zero. Although this technique benefits performance, it hampers power efficiency when speculative instruction execution becomes unpredictable. For example, highly speculative instructions may be mispredicted often, causing a zero-delay coprocessor to frequently execute code unnecessarily. Power efficiency is degraded each time computations associated with a mispredicted instruction are flushed from the execution pipeline. Performance, on the other hand, is enhanced because new instructions issued to the coprocessor incur no additional startup latency.
Conversely, in non-speculative execution configurations, a coprocessor defers instruction execution until the main processor commits the corresponding instruction. Although this technique saves power for highly speculative instructions by preventing the execution of instructions that will subsequently be flushed, it hampers system performance because of its additional startup latency. When branches are usually predicted correctly and instructions can be fed continuously to the coprocessor, the long startup latency is hidden, and performance is not significantly degraded. On the other hand, the non-speculative approach greatly increases an instruction's effective latency when, for example, the processor is waiting for the results from the coprocessor.
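The two conventional extremes can be compared with a similar toy model (hypothetical; the trace values are invented). Under full speculative execution, every instruction starts on arrival, so each mispredicted instruction is executed and then flushed, but no startup latency is added. Under non-speculative execution, each instruction starts only after the main processor commits it at the assumed cycle `commit`, so nothing is wasted, but every useful instruction waits for its commit. This sketch charges every wait as exposed latency; it does not model the overlap that hides startup latency when instructions stream continuously to the coprocessor.

```python
from enum import Enum
from typing import NamedTuple


class Policy(Enum):
    FULL_SPECULATIVE = "full"   # zero delay: execute on arrival
    NON_SPECULATIVE = "commit"  # wait until the main processor commits


class Offloaded(NamedTuple):
    """One instruction offloaded to the coprocessor (hypothetical model)."""
    commit: int         # cycles after arrival until the main processor resolves it (assumed)
    mispredicted: bool  # True if the instruction lies on a mispredicted path


def policy_cost(trace: list[Offloaded], policy: Policy) -> tuple[int, int]:
    """Return (wasted_executions, added_latency) under one execution policy."""
    wasted = latency = 0
    for ins in trace:
        if policy is Policy.FULL_SPECULATIVE:
            if ins.mispredicted:
                wasted += 1  # executed immediately, then flushed
            # correctly predicted instructions start at once: no added latency
        else:
            if not ins.mispredicted:
                latency += ins.commit  # useful instruction waits for its commit
            # mispredicted instructions are discarded before they ever start
    return wasted, latency


if __name__ == "__main__":
    trace = [Offloaded(5, False), Offloaded(5, False), Offloaded(5, True), Offloaded(3, False)]
    print(policy_cost(trace, Policy.FULL_SPECULATIVE))  # (1, 0)
    print(policy_cost(trace, Policy.NON_SPECULATIVE))   # (0, 13)
```

The first policy spends power on the flushed instruction; the second spends latency on every useful one, which is the trade-off described above.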