1. Field of the Invention
This invention relates to processors and to circuits and methods for scheduling and controlling execution of instructions.
2. Description of Related Art
A single instruction multiple data (SIMD) processor uses a single sequential program thread for parallel manipulations of multiple data elements. The multiple data elements, when grouped together, form a "vector" that can be stored in a single "vector register." Vector registers have wide data widths to accommodate vectors that are operands for SIMD instructions. Processing vectors increases processing power because each vector operation performs multiple parallel calculations, e.g., one calculation per data element.
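As an illustration only, the following minimal Python sketch models the effect of one SIMD addition on two vector operands. The function name simd_add and the four-element vectors are hypothetical and are not taken from any particular processor; the sketch models the result of the operation, not its parallel timing.

```python
from typing import List


def simd_add(va: List[int], vb: List[int]) -> List[int]:
    """Model one SIMD add: a single instruction applied to every data element."""
    if len(va) != len(vb):
        raise ValueError("vector operands must have the same number of elements")
    # In hardware the per-element additions occur in parallel; this
    # comprehension only models the resulting vector, one sum per element.
    return [a + b for a, b in zip(va, vb)]


# One "instruction" produces four results, one per data element.
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
```

A scalar processor would need one add instruction per element to produce the same four sums.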
Superscalar architectures provide another way to increase processor power. A superscalar processor executes a sequential program thread but permits parallel and out-of-order execution of independent instructions. Parallel and out-of-order execution provides higher performance by reducing the time that processing circuitry is idle. Typically, a superscalar processor contains a scheduler and multiple execution units that can operate in parallel. Each clock cycle, the scheduler attempts to select from the program thread multiple instructions for issue to the execution units. Typically, the scheduler checks execution unit availability and operand dependencies and only issues an instruction if the necessary execution unit and operands are available.
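The issue check described above can be sketched as follows. This is a simplified model under stated assumptions, not a description of any actual scheduler: the Instr record, the unit labels ("mul", "alu"), the register names, and the issue_cycle function are all hypothetical, and hazards other than unavailable source operands are ignored.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Instr:
    name: str
    unit: str                  # execution unit required (hypothetical label)
    sources: Tuple[str, ...]   # registers the instruction reads
    dest: str                  # register the instruction writes


def issue_cycle(window: List[Instr], free_units: Set[str],
                ready_regs: Set[str]) -> List[Instr]:
    """Select instructions to issue in one clock cycle.

    An instruction issues only if its execution unit is free and all of its
    source operands are available. Later instructions may issue ahead of an
    earlier blocked one, which models out-of-order issue.
    """
    issued = []
    units = set(free_units)  # copy so the caller's set is not modified
    for instr in window:
        if instr.unit in units and all(s in ready_regs for s in instr.sources):
            issued.append(instr)
            units.remove(instr.unit)  # each unit accepts one instruction per cycle
    return issued


window = [
    Instr("mul", "mul", ("r1", "r2"), "r3"),
    Instr("add", "alu", ("r3", "r4"), "r5"),  # depends on the mul result r3
    Instr("sub", "alu", ("r6", "r7"), "r8"),  # independent of the others
]
issued = issue_cycle(window, {"mul", "alu"}, {"r1", "r2", "r4", "r6", "r7"})
issued_names = [i.name for i in issued]
```

In this example the add is held back because r3 is not yet available, while the independent sub issues to the free unit in the same cycle as the mul.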
An execution unit often requires multiple clock cycles to execute an instruction, with the number of clock cycles depending on the instruction. Typically, an execution unit operates as a pipeline having multiple stages, and the scheduler cannot issue an instruction to an initial stage of the pipeline if a blockage in the pipeline keeps a previously issued instruction in the first stage. In the pipeline, different data and resources may be required at different stages, and future availability of such data and resources can be difficult to determine when the scheduler issues an instruction to an execution unit. Accordingly, schedulers often issue instructions without completely evaluating whether necessary resources and data will be available.
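The effect of a pipeline blockage on issue can be illustrated with a small model. The three-stage pipeline, the instruction labels, and the functions advance and can_issue below are assumptions made for illustration; real execution pipelines track considerably more state.

```python
from typing import List, Optional


def advance(stages: List[Optional[str]],
            stalled_stage: Optional[int] = None) -> List[Optional[str]]:
    """Advance the pipeline by one clock cycle and return the new contents.

    stages[0] is the initial stage; None marks an empty stage. An instruction
    in stalled_stage does not move, so instructions behind it cannot move
    into its stage either.
    """
    nxt = list(stages)
    last = len(nxt) - 1
    # Process from the final stage backward so that a stage is vacated
    # before the instruction behind it tries to move in.
    for i in range(last, -1, -1):
        if nxt[i] is None:
            continue
        if stalled_stage is not None and i == stalled_stage:
            continue              # blocked: the instruction stays in place
        if i == last:
            nxt[i] = None         # instruction completes and leaves the pipeline
        elif nxt[i + 1] is None:
            nxt[i + 1] = nxt[i]   # move into the free next stage
            nxt[i] = None
    return nxt


def can_issue(stages: List[Optional[str]]) -> bool:
    """The scheduler can issue only if the initial stage is empty."""
    return stages[0] is None


pipeline = ["I1", "I2", None]
blocked = advance(pipeline, stalled_stage=1)  # I2 is blocked, so I1 stays in stage 0
flowing = advance(pipeline)                   # no blockage, so both instructions move on
```

With the blockage, I1 remains in the initial stage and a new instruction cannot be issued; without it, the initial stage empties and issue can proceed.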
Execution pipelines may require complex circuitry to monitor execution of several types of instructions. When the scheduler issues an instruction to the initial stage of an execution pipeline, the execution unit decodes parameters at each stage to determine the proper action at that stage. Additionally, the latencies (or numbers of stages) for instructions vary, which further increases execution unit complexity. Use of a uniform number of execution cycles per instruction is typically not feasible because instruction sets, even reduced instruction set computing (RISC) instruction sets, include instructions that cannot be executed in the same time as the simplest instructions. A simpler processor architecture is desired.