1. Technical Field
The technical field of the present specification relates in general to a method and system for data processing and in particular to a processor and method for executing instructions. Still more particularly, the technical field relates to a processor and method for selective out-of-order execution of instructions based upon an instruction parameter.
2. Description of the Related Art
Within state-of-the-art processors, out-of-order execution of instructions is often employed to maximize the utilization of execution unit resources within the processor, thereby enhancing overall processor efficiency. Several factors influence the execution of instructions out of linear program order, including the availability of execution unit resources, data dependencies between instructions, and the availability of rename and completion buffers. The performance enhancement resulting from out-of-order execution is maximized when implemented within a superscalar processor having multiple execution units capable of executing multiple instructions concurrently and independently. In state-of-the-art processors that support out-of-order execution of instructions, instructions are typically dispatched according to program order, executed opportunistically within the execution units of the processor, and completed in program order.
Although, in general, this practice leads to efficient processor performance, in some processors that support out-of-order execution, in-order dispatch of particular sequences of instructions results in inefficiency when such adherence to program order causes a dispatch stall for subsequent instructions. For example, referring now to FIG. 4A, there is depicted an instruction execution timing diagram for an exemplary instruction sequence in which the in-order dispatch of instructions results in a dispatch stall within a superscalar processor having one floating-point, one fixed-point, and one load/store execution unit. For the purpose of this discussion, assume that the floating-point execution unit of the processor is a pipelined execution unit having 3 stages: multiply, add, and normalize/round. In addition, assume that the processor can dispatch up to 2 instructions per cycle in program order, but that only one instruction can be dispatched to each execution unit during a single cycle. Finally, assume that some floating-point instructions, for example, double-precision floating-point multiplication instructions, have a longer execution latency than other floating-point instructions, such as floating-point addition and subtraction instructions.
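The dispatch assumptions stated above can be sketched in a few lines of Python. This is an illustrative model only, not the dispatch logic of any particular processor: the function name `dispatch_group`, the unit labels, and the instruction names are hypothetical, and the model considers only the unit conflict rule (it ignores execution latency and buffer availability).

```python
def dispatch_group(queue, width=2):
    """Return the names of the instructions dispatched in one cycle.

    queue: list of (name, unit) tuples in program order (hypothetical format).
    width: maximum instructions dispatched per cycle (2, per the assumption above).
    """
    dispatched = []
    units_used = set()
    for name, unit in queue[:width]:
        if unit in units_used:
            # Only one instruction per execution unit per cycle. Because
            # dispatch is in program order, a blocked instruction also
            # blocks every instruction behind it.
            break
        dispatched.append(name)
        units_used.add(unit)
    return dispatched


# Two floating-point instructions at the head of the queue: only the first goes.
print(dispatch_group([("instr1", "FPU"), ("instr2", "FPU")]))  # ['instr1']
# A floating-point and a fixed-point instruction can be dispatched together.
print(dispatch_group([("instrA", "FPU"), ("instrB", "FXU")]))  # ['instrA', 'instrB']
```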
As illustrated in FIG. 4A, instructions 1 and 2, a double-precision floating-point multiply and a single-precision floating-point subtraction, respectively, are fetched from memory during cycle 1. Next, during cycle 2, instructions 3 and 4, two fixed-point instructions, are fetched while instruction 1 is dispatched. Only instruction 1 is dispatched during cycle 2 because instructions 1 and 2 are both floating-point instructions, which require execution by the floating-point execution unit. During cycle 3, the dispatch of instructions 2-4 is inhibited since the execution within the described processor of double-precision floating-point multiply instructions (e.g., instruction 1) requires 2 cycles at the multiply and add stages, resulting in a total execution latency of 4 cycles for instruction 1. Thereafter, instructions 2-4 are dispatched during cycles 4-6, respectively. The execution of instructions 1 and 2 continues until cycles 6 and 7, respectively, when instructions 1 and 2 finish execution and the results of instructions 1 and 2 are stored within floating-point registers.
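The cycle arithmetic for the two floating-point instructions in this walkthrough can be checked with a short Python sketch. The dispatch cycles and the 4-cycle latency of instruction 1 are taken from the description above. The 3-cycle latency of instruction 2 (one cycle in each of the three pipeline stages) and the convention that execution begins in the cycle after dispatch are assumptions made here for illustration, and the names `FpInstr` and `finish_cycle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpInstr:
    name: str
    dispatch_cycle: int  # cycle in which the instruction is dispatched
    latency: int         # execution cycles in the floating-point unit


def finish_cycle(instr):
    # Assumption: execution starts in the cycle after dispatch, so the last
    # execution cycle is dispatch_cycle + latency.
    return instr.dispatch_cycle + instr.latency


INSTR_1 = FpInstr("instruction 1 (double-precision multiply)", 2, 4)
INSTR_2 = FpInstr("instruction 2 (single-precision subtract)", 4, 3)  # latency assumed

for instr in (INSTR_1, INSTR_2):
    print(f"{instr.name}: finishes in cycle {finish_cycle(instr)}")

# Without the stall, instruction 2 could have been dispatched one cycle after
# instruction 1; the difference is the number of stalled dispatch cycles.
stall_cycles = INSTR_2.dispatch_cycle - (INSTR_1.dispatch_cycle + 1)
print(f"dispatch stall: {stall_cycles} cycle(s)")
```

Under these assumptions the sketch gives finish cycles 6 and 7 and a one-cycle dispatch stall, matching the cycle 3 stall described above.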
As should thus be apparent from the exemplary instruction execution timing diagram pictured in FIG. 4A, the in-order dispatch and execution of instructions that are directed to the same execution unit and have differing execution latencies can stall the dispatcher and consequently diminish overall processor performance. Accordingly, there is a need for a processor which supports the selective out-of-order execution of instructions based upon a parameter of one or more instructions.
It should also be noted that it is well-known in the art to utilize various compiler optimizations to take advantage of particular architectural efficiencies by compiling a sequence of instructions in alternative program orders. However, such compilers are distinguishable from the present disclosure in that the present disclosure concerns the real-time intelligent ordering of the execution of instructions to maximize processor efficiency rather than the manipulation of the sequence of instructions within the program order.