In the field of microprocessors, the way in which registers are addressed and accessed in retrieving or storing operands during the execution of an instruction is a distinguishing feature of the microprocessor architecture. For example, instructions executed by microprocessors of the accumulator architecture implicitly refer to an operand stored in the accumulator. In general-purpose register architecture microprocessors, both memory and register operands are explicitly addressed in the instructions. In microprocessors of the load/store register architecture, which has been widely adopted in recent years, register operands are directly addressed in the instructions, and memory operands are accessed only by execution of load and store instructions.
Another well-known microprocessor architecture is referred to as the stack architecture. In the stack architecture, operands are implicitly stored in an ordered sequence within a group of registers referred to as a stack. A separate register, referred to as a stack pointer, stores an address corresponding to the stack register from or to which the next instruction is to fetch or store an operand. Operations referred to as PUSH and POP are respectively used to store operands into, and retrieve operands from, the stack; the stack pointer is decremented or incremented accordingly. In modern microprocessors, stack architectures are used primarily for internal execution units, rather than for the main instruction pipelines. For example, modern microprocessors of the x86-architecture type commonly utilize stack architecture in their on-chip floating-point units (FPUs), while the main, or integer, pipelines are of load/store architecture.
In stack architecture processors, actual (raw) register addresses must be calculated for each instruction that involves an operand store to, or fetch from, the stack. For example, a typical floating-point instruction in an x86-architecture microprocessor involves the calculation of a register address based upon the current stack pointer value (referred to in this architecture as the top-of-stack value TOP) and upon an explicit register address indicated in the instruction; the register address in the instruction is typically an offset address relative to the top-of-stack value TOP. The execution of certain instructions may also involve a PUSH operation, in which an operand is stored in the next register above the top of the stack. As is known in the art, a PUSH operation involves pre-decrementing the stack pointer, followed by storing the operand in the newly pointed-to location; conversely, a POP operation involves reading an operand from the top of the stack, followed by incrementing the stack pointer.
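By way of illustration only, the addressing behavior described above may be sketched in software as follows. The sketch assumes a stack of eight registers addressed modulo eight, as in the x87 floating-point unit; the names RegisterStack, raw_address, push and pop are hypothetical and are introduced here solely for explanation.

```python
NUM_REGS = 8  # assumption: eight physical stack registers, as in the x87 FPU


class RegisterStack:
    """Illustrative model of a register stack addressed relative to TOP."""

    def __init__(self):
        self.regs = [None] * NUM_REGS
        self.top = 0  # stack pointer (top-of-stack value TOP)

    def raw_address(self, offset):
        # Raw register address = TOP plus the relative address in the instruction,
        # wrapping around the register file.
        return (self.top + offset) % NUM_REGS

    def push(self, value):
        # PUSH: pre-decrement the stack pointer, then store at the new location.
        self.top = (self.top - 1) % NUM_REGS
        self.regs[self.top] = value

    def pop(self):
        # POP: read from the top of the stack, then increment the stack pointer.
        value = self.regs[self.top]
        self.top = (self.top + 1) % NUM_REGS
        return value
```

In this sketch, an operand pushed earlier is reached by a larger relative address: after two PUSH operations, relative address 1 resolves to the register written by the first PUSH.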
Because the floating-point units in typical x86-architecture microprocessors of the so-called P5 or P6 class are pipelined, register address calculation must be performed prior to execution, for example in an instruction schedule stage of the pipeline. This early register mapping is necessary in order to detect register conflicts and dependencies that may arise from the multiple instructions in the pipeline. Especially in the case of floating-point instruction execution, therefore, the calculation of register addresses for instructions involving PUSH operations is performed for a high percentage of the executed instructions. The time required for calculation of register addresses affects the overall performance of the unit.
It is therefore desirable to calculate raw register addresses as rapidly as possible. As is well known in the art, the addition of two binary numbers, either with or without a carry-in bit, may be done by a single carry-propagate adder. Accordingly, in a stack architecture, the calculation of a register address by the addition of the stack pointer contents and a relative register address may be performed with a single add. The calculation of a register address for an instruction involving a PUSH operation, however, requires both adding the stack pointer contents to the relative register address indicated in the instruction, and also subtracting one to account for the PUSH. As is well known in the art, the subtraction of one cannot be performed in combination with a sum operation in a conventional single carry-propagate adder, because the carry-in bit can only add one to the sum; the subtraction instead requires a second carry-propagate addition of minus one (i.e., the 2's complement of binary 1) to the previous sum. The necessity for two carry-propagate adds to calculate register addresses has been observed to affect a large number of operations, particularly in modern pipelined floating-point sequences, and, since this calculation is in the critical performance path, limits the overall microprocessor performance.
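The conventional two-pass calculation described above may be modeled, for purposes of illustration only, as follows. The sketch assumes three-bit register addresses (eight stack registers) and a bit-serial ripple-carry model of the carry-propagate adder; the function names carry_propagate_add and push_register_address are hypothetical and do not correspond to any particular implementation.

```python
WIDTH = 3                  # assumption: 3-bit register addresses (eight registers)
MASK = (1 << WIDTH) - 1
MINUS_ONE = MASK           # 2's complement of 1 in WIDTH bits (binary 111)


def carry_propagate_add(a, b, carry_in=0):
    """Ripple-carry addition of two WIDTH-bit values; the carry-out is discarded."""
    result = 0
    carry = carry_in
    for bit in range(WIDTH):
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        sum_bit = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))  # carry ripples to the next bit
        result |= sum_bit << bit
    return result


def push_register_address(top, offset):
    """Register address for a PUSH instruction, using two sequential adds."""
    partial = carry_propagate_add(top, offset)      # first pass: TOP + offset
    return carry_propagate_add(partial, MINUS_ONE)  # second pass: add minus one
```

The carry_in argument of the model can contribute only plus one to a sum, which is why the decrement for the PUSH appears here as a second, dependent pass through the adder rather than as part of the first.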