The present invention relates generally to processors, and in particular to a system and method for utilizing existing register renaming resources to execute expanded instructions that pass partial results.
Processor instructions operate on data obtained from, and write their results to, memory. Modern processors utilize a hierarchical memory structure comprising a few fast, expensive memory elements, such as registers, at the top level. The memory hierarchy then comprises successively slower but more cost-effective memory technologies at lower levels, such as cache memories (SRAM), solid-state main memory (DRAM), and disks (magnetic or optical media). For applications such as portable electronic devices, DRAM is often the lowest level of the memory hierarchy.
Most processor instruction set architectures (ISAs) include a set of General Purpose Registers (GPRs), which are architected registers used to pass data between instructions, and to and from memory. Instructions that perform logical and arithmetic operations on data read their operands from, and write their results to, specified GPRs. Similarly, memory access instructions read data to be stored to memory from GPRs, and write data loaded from memory to GPRs. A compiler assigns source and target GPR identifiers to each instruction, and orders the instructions, such that the proper results are calculated. That is, instructions are arranged in a “program order” that guarantees correct results by directing earlier instructions to store results in specified GPRs, and directing later instructions to read those GPRs to obtain operands for further processing. The GPR identifiers are logical labels (e.g., r0-r15).
Some modern processors support “expanded” instructions, that is, instructions that perform more than a single arithmetic or logical operation. For example, the instruction
ADD r1, r2, r3 LSL r4
implements the equation r1=r2+(r3<<[r4]), that is, left-shift the value in register r3 by the amount stored in r4, add this result to the value in r2, and store the sum in register r1. In a processor whose adder requires the full cycle time, this expanded instruction may be implemented as two separate, constituent instructions: a shift instruction that left-shifts the value in r3, generating an intermediate result, and an add instruction that adds the intermediate result to the value in r2 and stores the sum in r1. In some processors (i.e., processors that support operand forwarding and only execute expanded instructions in program order), passing the intermediate result from the shift instruction to the add instruction is straightforward. In general, however, and particularly in superscalar processors that support out-of-order instruction execution, additional resources, such as non-architected “scratch” registers and complex control logic, must be added to the processor to reliably implement the forwarding of intermediate results between constituent instructions of an expanded instruction.
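By way of illustration only, the decomposition of the expanded instruction described above may be sketched in software as follows. The sketch assumes 32-bit registers and a shift amount smaller than the register width; the function names and the `scratch` variable are hypothetical and merely stand in for the non-architected storage that carries the intermediate result. It is not a description of any particular processor's implementation.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def execute_expanded(regs):
    """Model ADD r1, r2, r3 LSL r4 as a single expanded instruction."""
    out = dict(regs)  # leave the caller's register state untouched
    shifted = (out["r3"] << out["r4"]) & MASK32
    out["r1"] = (out["r2"] + shifted) & MASK32
    return out


def execute_constituents(regs):
    """Model the same operation as two constituent instructions."""
    out = dict(regs)
    # Constituent 1 (shift): produces the intermediate result. It is held
    # in a hypothetical scratch location, not in an architected GPR.
    scratch = (out["r3"] << out["r4"]) & MASK32
    # Constituent 2 (add): consumes the intermediate result and writes
    # the sum to the architected target register r1.
    out["r1"] = (out["r2"] + scratch) & MASK32
    return out


initial = {"r1": 0, "r2": 5, "r3": 3, "r4": 2}
single = execute_expanded(initial)
split = execute_constituents(initial)
```

With the initial values shown, both forms leave 17 in r1 (5 + (3 << 2)), and no architected register other than r1 is modified. The intermediate value exists only between the two constituent instructions, which is why its storage and forwarding require dedicated handling when the constituents may execute out of order.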