Processor systems perform various tasks by processing task instructions within pipelines contained in the processor systems. Pipelines generally are responsible for fetching instructions from a storage unit such as a memory or cache, decoding the instructions, executing the instructions, and then writing the results into another storage unit, such as a register. Pipelines generally process multiple instructions at a time. For example, a pipeline may simultaneously execute a first instruction, decode a second instruction and fetch a third instruction from a cache.
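The overlap described above can be illustrated with a minimal sketch. It assumes an idealized four-stage pipeline with no stalls in which every stage takes one clock cycle; the stage names and the function `pipeline_schedule` are hypothetical and serve only as illustration.

```python
# Idealized four-stage pipeline (assumption: one cycle per stage, no stalls).
STAGES = ["fetch", "decode", "execute", "writeback"]


def pipeline_schedule(instructions):
    """Return, for each clock cycle, a dict mapping stage name to the
    instruction occupying that stage during the cycle."""
    schedule = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for stage_index, stage in enumerate(STAGES):
            # The instruction that entered the pipeline stage_index cycles ago
            # has advanced to this stage.
            instr_index = cycle - stage_index
            if 0 <= instr_index < len(instructions):
                occupancy[stage] = instructions[instr_index]
        schedule.append(occupancy)
    return schedule


schedule = pipeline_schedule(["I1", "I2", "I3"])
for cycle, occupancy in enumerate(schedule):
    print(cycle, occupancy)
```

In the third cycle of this model (index 2), the first instruction is in the execute stage, the second is being decoded, and the third is being fetched, which matches the example given above.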
General-purpose microprocessors are presently being extended to include Single Instruction, Multiple Data (SIMD) and digital signal processing (DSP) functions, while digital signal processors are being extended to handle controller code. A SIMD instruction allows a single instruction to operate on multiple data items at the same time.
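As a rough illustration of the SIMD concept, the sketch below models a single add instruction applied to four data lanes. The function `simd_add` and the four-lane width are assumptions made for illustration; real hardware processes the lanes in parallel, whereas this Python model only mimics the result.

```python
def simd_add(a, b):
    """Model one SIMD add: a single operation applied to every lane.

    Hypothetical helper; a and b stand in for packed vector registers.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


# One modeled SIMD instruction replaces four scalar add instructions.
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
print(result)
```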
As a means of power conservation, instructions comprising a loop may be fetched and transferred to an instruction queue, rather than the instruction cache, as described in U.S. application Ser. No. 11/273,691, filed Nov. 14, 2005, entitled “Loop Detection and Capture in the Instruction Queue,” and incorporated herein by reference. If a loop is detected and the number of iterations through the loop is known, or if the starting and ending points of the loop are known, the instruction cache and branch prediction module may be shut down while the instructions for the loop are executed from the instruction queue. When the end of the loop is reached (i.e., the branch instruction does not point back to the beginning of the loop), the instruction cache and branch prediction module may be powered up again, and fetching from the instruction cache may resume. When a loop buffer is implemented in the instruction queue in this manner, power is conserved by not fetching instructions from the instruction cache. For a microprocessor with a SIMD engine implemented in the back end of the integer execution pipeline, however, the instructions are still pipelined from the instruction queue through all of the pipeline stages, and the integer execution unit is tied up during the execution of SIMD instructions.
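The power saving from a loop buffer can be sketched with a simple counting model. This is not the mechanism of the referenced application; it assumes, for illustration only, that the loop body is fetched from the instruction cache during the first iteration and replayed from the instruction queue afterward. The function `run_loop` and its parameters are hypothetical.

```python
def run_loop(loop_body, iterations, use_loop_buffer):
    """Count instruction-cache fetches and instruction-queue reads for a loop.

    Assumed model: with the loop buffer enabled, the first iteration fetches
    the body from the instruction cache (capturing it in the queue), and all
    later iterations are served from the queue while the cache is idle.
    """
    icache_fetches = 0
    queue_reads = 0
    executed = 0
    for iteration in range(iterations):
        for _instr in loop_body:
            if use_loop_buffer and iteration > 0:
                queue_reads += 1      # cache and branch predictor could be powered down
            else:
                icache_fetches += 1   # normal fetch path through the instruction cache
            executed += 1
    return icache_fetches, queue_reads, executed


body = ["load", "simd_mul", "branch_back"]
without_buffer = run_loop(body, 100, use_loop_buffer=False)
with_buffer = run_loop(body, 100, use_loop_buffer=True)
print(without_buffer, with_buffer)
```

Under these assumptions, both runs execute the same 300 instructions, but the run with the loop buffer touches the instruction cache only for the three instructions of the first iteration.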
Architectures with resources dedicated to executing SIMD instructions are presently emerging, but there are no known solutions that increase power efficiency and throughput in such architectures through the handling of loops and SIMD instructions.