Many portable products, such as cell phones, laptop computers, personal digital assistants (PDAs), or the like, require a processor that executes programs supporting communication and multimedia applications. The processing system for such products includes a processor, a source of instructions, a source of input operands, and storage space for storing results of execution. For example, the instructions and input operands may be stored in a hierarchical memory configuration consisting of general purpose registers and multiple levels of caches, including, for example, an instruction cache, a data cache, and system memory.
In order to provide high performance in the execution of programs, a processor typically executes instructions in a pipeline optimized for the application and the process technology used to manufacture the processor. In high performance processors, the rate of accessing operands from storage tends to be slower than the processor instruction execution rate. Consequently, obtaining instruction-specified operands from storage may stall the processor for one or more cycles to account for the difference between the storage access time and the processor clock cycle time. Further, an instruction often specifies a source operand that is the result of executing a previous instruction. In multiple-stage execution pipelines, an instruction requiring a previous execution result must be stalled until execution of the previous instruction completes. These stalls limit the performance of the processor.
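By way of illustration only, the stall behavior described above may be sketched with a simplified in-order issue model. The sketch assumes one instruction issues per cycle, no result forwarding, and a per-instruction latency that stands in for either a slow storage access or a multiple-stage execution. The names `Instr` and `count_stalls`, the register names, and the latency values are hypothetical and are not taken from any particular processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction
    latency: int   # assumed cycles from issue until the result can be read


def count_stalls(program):
    """Count stall cycles for an in-order, one-issue-per-cycle model."""
    ready_at = {}    # register -> first cycle at which its value can be read
    next_issue = 0   # earliest cycle at which the next instruction could issue
    stalls = 0
    for ins in program:
        # An instruction cannot issue before all of its source operands exist.
        operands_ready = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        issue = max(next_issue, operands_ready)
        stalls += issue - next_issue
        ready_at[ins.dest] = issue + ins.latency
        next_issue = issue + 1
    return stalls


# The second instruction reads r1, which the first produces after 3 cycles.
dependent = [
    Instr("r1", ("r2", "r3"), 3),
    Instr("r4", ("r1", "r5"), 1),
]

# Same latencies, but the second instruction does not read r1.
independent = [
    Instr("r1", ("r2", "r3"), 3),
    Instr("r4", ("r6", "r5"), 1),
]

print(count_stalls(dependent))    # 2
print(count_stalls(independent))  # 0
```

Under these assumptions, the dependent sequence loses two cycles waiting for `r1`, while the independent sequence issues back to back. A real pipeline may reduce such stalls, for example by forwarding results between stages, which this sketch deliberately omits.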