Many high performance computing processors are able to accept Complex Instruction Set Computer (CISC) instructions or opcodes. Certain high performance computing processors perform the processing associated with some CISC instructions by executing multiple Reduced Instruction Set Computer (RISC) instructions or opcodes that together perform the required processing. Such processors “crack” the CISC instruction into multiple RISC instructions, referred to as “micro-ops” or μOps, that are processed by the RISC processing core of that processor. Cracking normally improves performance because it makes better use of the execution unit resources and allows the micro-ops to execute out of order, which makes their results available earlier to other dependent operations.
Cracking CISC instructions into multiple RISC instructions increases the complexity of executing the cracked CISC instruction in processor architectures that pipeline RISC micro-op execution. When a single CISC instruction is cracked into multiple RISC micro-ops, those micro-ops often have dependencies that must be tracked so that they execute in the required order. One example of a CISC instruction that is cracked into RISC micro-ops is the RX instruction of the zGryphon processor produced by International Business Machines Corporation (IBM) of Armonk, N.Y. RX instructions are arithmetic instructions with two operands: one sourced from storage and the other sourced from a register. The zGryphon processor, for example, cracks an RX instruction into a RISC load operation and a RISC arithmetic operation. The arithmetic operation is executed, for example, by a Fixed Point Unit (FXU) or a Floating Point Unit (FPU) using the data produced by the load operation. The arithmetic operation therefore depends on the load operation, and the two micro-ops must execute in order. In an architecture with parallel pipelined processing paths for executing RISC instructions, simply cracking RX instructions at decode and issuing the two resulting micro-ops can degrade processor performance, because two target (physical) registers and two issue queue entries are assigned to each RX instruction. Both the issue queue and the physical registers are frequency-limiting, and thus performance-limiting, structures. Other CISC instructions with various addressing modes can be processed similarly.
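The following Python sketch models the conventional cracking described above: an RX instruction is split at decode into a load micro-op and a dependent arithmetic micro-op, and the resources that the pair consumes are counted. It is illustrative only and is not the zGryphon implementation. The names `RXInstruction`, `MicroOp`, `crack_rx`, `resource_cost`, and `ready`, the temporary register `T0`, and the `LSU` unit label are all hypothetical. The sketch also assumes, as a simplification, that every micro-op occupies one issue queue entry and that every micro-op writing a result is assigned one physical register.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class MicroOp:
    name: str                      # hypothetical micro-op mnemonic, e.g. "LOAD" or "ADD"
    unit: str                      # execution unit assumed to run it (e.g. LSU, FXU, FPU)
    dest: str                      # destination register name
    srcs: List[str] = field(default_factory=list)
    imm: int = 0                   # immediate field (used here for the displacement)
    depends_on: Optional[int] = None  # index of the producing micro-op within the crack


@dataclass
class RXInstruction:
    op: str      # arithmetic operation, e.g. "ADD"
    r1: str      # register operand, also the destination
    base: str    # base register of the storage operand address
    index: str   # index register of the storage operand address
    disp: int    # displacement of the storage operand address


def crack_rx(insn: RXInstruction, unit: str = "FXU") -> List[MicroOp]:
    """Crack an RX instruction into a load micro-op and a dependent arithmetic micro-op."""
    tmp = "T0"  # hypothetical temporary register that receives the loaded value
    load = MicroOp("LOAD", "LSU", tmp, [insn.base, insn.index], imm=insn.disp)
    # The arithmetic micro-op reads the loaded value, so it depends on micro-op 0.
    arith = MicroOp(insn.op, unit, insn.r1, [insn.r1, tmp], depends_on=0)
    return [load, arith]


def resource_cost(uops: List[MicroOp]) -> Tuple[int, int]:
    """Return (issue queue entries, physical registers) under the simple model above."""
    issue_entries = len(uops)
    phys_regs = sum(1 for u in uops if u.dest)
    return issue_entries, phys_regs


def ready(uops: List[MicroOp], completed: Set[int]) -> List[int]:
    """Indices of not-yet-completed micro-ops whose producer (if any) has completed."""
    return [
        i
        for i, u in enumerate(uops)
        if i not in completed and (u.depends_on is None or u.depends_on in completed)
    ]


if __name__ == "__main__":
    rx = RXInstruction(op="ADD", r1="R1", base="R2", index="R3", disp=8)
    uops = crack_rx(rx)
    print(resource_cost(uops))  # (2, 2): two issue queue entries, two physical registers
    print(ready(uops, set()))   # [0]: only the load can issue at first
    print(ready(uops, {0}))     # [1]: the add becomes ready once the load completes
```

The counts returned by `resource_cost` reflect the cost that the text identifies: one architected RX instruction occupies two issue queue entries and two physical registers, and `ready` shows that the arithmetic micro-op cannot issue until the load has produced its result.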
Therefore, a more efficient cracking architecture for instructions in out-of-order computer processors is required to improve the performance of such processors.