(1) Field of the Invention
The present invention relates to the field of microprocessor architecture. Specifically, the present invention relates to the field of microprocessor architecture for increasing processing efficiency within microprocessors having limited numbers of registers by providing register renaming ability.
(2) Prior Art
Microprocessors execute instructions and micro-operations ("uops") by reading source operands from registers and storing destination or result operands into registers. A register is a temporary storage area within a microprocessor for holding arithmetic and other results used by the microprocessor. Different registers may be used for different functions. For example, some registers may be used primarily for storage of arithmetic results, while other registers may be used primarily for conveying status information via various flag bits (such as system status or floating point status). Registers are individually composed of bits. A bit is a binary digit and may adopt either a "0" value or a "1" value. A given register may be subdivided into fields of various bit widths. For example, a 32-bit register may also contain separate 8-bit fields or a separate 16-bit field. Each of these different widths of a given 32-bit register may be separately addressable.
The register set of the well known Intel microprocessor architecture ("Intel architecture") has specially defined registers. For background information regarding the register set of the Intel architecture, reference is made to Chapter 2 of the i486 Microprocessor Programmer's Reference Manual, published by Osborne-McGraw-Hill, 1990, which is also available directly from Intel Corporation of Santa Clara, Calif. In terms of the Intel register set, the 32-bit arithmetic registers are called eax, ebx, ecx, and edx. The eax register is composed of other registers of varying width: the low 16 bits (the low word) of the eax register form the ax register, the low byte of the ax register is the al register, and the high byte of the ax register is the ah register. In similar fashion, each of the other 32-bit registers, ebx, ecx, and edx, contains separately addressable registers of varying widths. The basic arithmetic registers of the Intel register set include eax, ebx, ecx, edx, edi, esi, ebp, and esp (as well as the partial bit widths thereof).
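The way ax, al, and ah alias onto eax can be illustrated with a short Python sketch. It models each partial register name as a hypothetical (parent, low bit, width) triple; the table and the helper names `read_reg` and `write_reg` are assumptions made for illustration and do not come from the Intel documentation.

```python
# Hypothetical table: register name -> (32-bit parent, low bit, width in bits).
SUBREGS = {}
for r in "abcd":
    SUBREGS[f"e{r}x"] = (f"e{r}x", 0, 32)  # full 32-bit register
    SUBREGS[f"{r}x"] = (f"e{r}x", 0, 16)   # low word
    SUBREGS[f"{r}l"] = (f"e{r}x", 0, 8)    # low byte of the low word
    SUBREGS[f"{r}h"] = (f"e{r}x", 8, 8)    # high byte of the low word


def read_reg(regs, name):
    """Extract the bits that `name` occupies within its 32-bit parent."""
    parent, lo, width = SUBREGS[name]
    return (regs[parent] >> lo) & ((1 << width) - 1)


def write_reg(regs, name, value):
    """Overwrite only the bits of `name`; the rest of the parent is preserved."""
    parent, lo, width = SUBREGS[name]
    mask = ((1 << width) - 1) << lo
    regs[parent] = (regs[parent] & ~mask) | ((value << lo) & mask)


regs = {"eax": 0x12345678, "ebx": 0, "ecx": 0, "edx": 0}
print(hex(read_reg(regs, "ax")), hex(read_reg(regs, "al")), hex(read_reg(regs, "ah")))
# prints 0x5678 0x78 0x56
write_reg(regs, "ah", 0xAB)
print(hex(regs["eax"]))  # prints 0x1234ab78
```

Because a write to al or ah changes only part of eax, a later read of eax depends on both the partial write and the earlier full-width value.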
The number of registers available within the Intel architecture register set is adequate for microprocessor architectures that are not superscalar, or that are superscalar but execute at most two instructions per instruction cycle. However, the register set of the Intel architecture is somewhat limited, and it would be advantageous to be able to expand it in some way. Superscalar microprocessors, like any other microprocessor, can take advantage of an increased register set to increase performance. Superscalar microprocessors simultaneously execute uops that do not have data dependencies between them. For instance, consider the pseudo code below.
uop0: mov eax, 0x8A
uop1: add eax, ebx
uop2: add ecx, eax
uop1 may not execute simultaneously with uop0 because uop1 adds the value of ebx to eax and stores the result into eax; uop1 therefore requires the result of uop0 to perform its operation. Likewise, uop2 requires the result (i.e., eax) of uop1 and therefore may not execute simultaneously with uop1. When one uop uses as a source a register that is the destination register of a prior uop, this condition is referred to as a data dependency between the two uops. For instance, uop2 and uop1 are data dependent. Some data dependencies, like those above, are unavoidable and therefore impact the performance of a superscalar microprocessor simply because some uops demand a particular execution order. These data dependencies are called true data dependencies.
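The read-after-write test described above can be sketched in Python for the first listing. The tuple encoding of uops and the function name `true_dependencies` are illustrative assumptions; the sketch treats registers as whole names and ignores partial-register overlap (for example, ax within eax).

```python
# Each uop is (name, destination register, source registers).
# "add eax, ebx" both reads and writes eax, so eax also appears as a source.
UOPS = [
    ("uop0", "eax", []),              # mov eax, 0x8A (immediate source)
    ("uop1", "eax", ["eax", "ebx"]),  # add eax, ebx
    ("uop2", "ecx", ["ecx", "eax"]),  # add ecx, eax
]


def true_dependencies(uops):
    """Return (producer, consumer) pairs: the consumer reads a register whose
    most recent writer, in program order, is the producer (read after write)."""
    last_writer = {}
    deps = []
    for name, dst, srcs in uops:
        for src in srcs:
            if src in last_writer:
                deps.append((last_writer[src], name))
        last_writer[dst] = name
    return deps


print(true_dependencies(UOPS))  # prints [('uop0', 'uop1'), ('uop1', 'uop2')]
```

uop2 is not reported as depending on uop0 because uop1 overwrites eax in between; the chain uop0, uop1, uop2 must still execute in order.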
However, other data dependencies of uops are not true data dependencies; they result instead from the limited size of a particular microprocessor's register set. Because a register set may be constrained in size, uops tend to reuse the same registers for temporary storage rather than moving data to and from memory, since memory moves take a large amount of processing time and are very costly to overall processor performance. Therefore, a small register set may create a "bottleneck" in the performance of a superscalar microprocessor, as multiple uops target the same register for temporary storage of data but do not actually depend on the data of that register for their own execution. For instance, consider the code below:
uop0: mov bx, 0x8A
uop1: add ax, bx
uop2: mov bx, cx
uop3: inc bx
While uop1 is data dependent on the result of uop0 for the bx register, there is no true data dependency between uop2 and uop1. Although uop1 and uop2 both utilize the bx register, the source value of uop2 (cx) does not in any way depend on the outcome of the execution of uop0 or uop1. This is called a false dependency between uop1 and uop2. The same is true for uop3: while data dependent on uop2, uop3 does not depend on the results of either uop0 or uop1. Therefore, a superscalar microprocessor should be able to execute at least uop1 and uop2 simultaneously. Because they both utilize the bx register, however, the prior art would treat uop1 and uop2 as truly data dependent. The present invention provides a mechanism and method for allowing simultaneous execution of uops that do not have true data dependencies but may share common logical registers.
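A common general technique for removing such false dependencies is register renaming through an alias table that maps each logical register to a physical one. The Python sketch below assumes an unlimited pool of hypothetical physical registers named p0, p1, and so on; it illustrates the general idea on the second listing and is not presented as the specific mechanism of the present invention.

```python
from itertools import count

# Second listing, as (name, destination, sources).
UOPS = [
    ("uop0", "bx", []),            # mov bx, 0x8A
    ("uop1", "ax", ["ax", "bx"]),  # add ax, bx
    ("uop2", "bx", ["cx"]),        # mov bx, cx
    ("uop3", "bx", ["bx"]),        # inc bx
]


def rename(uops):
    """Give every destination a fresh physical register (p0, p1, ...).
    Sources are looked up in the alias table before the destination is
    remapped; registers never written keep their logical name."""
    fresh = (f"p{i}" for i in count())
    alias = {}  # logical register -> physical register
    renamed = []
    for name, dst, srcs in uops:
        phys_srcs = [alias.get(s, s) for s in srcs]
        alias[dst] = next(fresh)
        renamed.append((name, alias[dst], phys_srcs))
    return renamed


def producers(renamed):
    """For each uop, list the uops whose physical results it reads."""
    writer = {dst: name for name, dst, _ in renamed}
    return {name: [writer[s] for s in srcs if s in writer]
            for name, _, srcs in renamed}


renamed = rename(UOPS)
print(producers(renamed))
# prints {'uop0': [], 'uop1': ['uop0'], 'uop2': [], 'uop3': ['uop2']}
```

After renaming, uop2 writes p2 while uop1 still reads p0, so uop2 neither reads nor overwrites anything uop1 needs and the two could issue together; only the true dependencies (uop0 to uop1, uop2 to uop3) remain.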
Floating point registers within the Intel macroarchitecture are each 86 bits wide, are referenced relative to a top of stack (TOS) pointer, and are therefore stack relative. Individual floating point operations may alter the TOS pointer. Many floating point operations use the TOS register as a primary source or as the destination storage location, so floating point data placed into the TOS register must be removed periodically as new operations are executed. For this reason, among others, the Intel instruction set provides an FXCH operation, which exchanges data between the TOS register and any other FP register, and FXCH is used frequently as floating point operations are executed. To exchange data between a first floating point register and a second, the prior art FXCH operation moves data from the first register into a temporary memory area, then moves the data from the second register into the first register, and then moves the data from the temporary area into the second register. In all, at least three moves of 86-bit floating point data are required for each FXCH operation. It would be advantageous to reduce the processing time required to perform the FXCH operation; the present invention provides such capability. It would further be advantageous to provide a floating point register renaming mechanism that incorporates such an efficient FXCH operation.
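The difference between exchanging data and exchanging register names can be sketched in Python. The model assumes eight stack slots addressed relative to a TOS index and a hypothetical alias table mapping each slot to a physical register; it contrasts the three-move prior art FXCH with a swap of two alias entries and is an illustration only, not the invention's actual circuitry.

```python
class FPStack:
    """Toy x87-style stack: ST(i) lives in slot (tos + i) mod 8.
    `alias` maps each slot to a physical register and is a hypothetical
    renaming structure, not the patented design."""

    def __init__(self, values):
        assert len(values) == 8
        self.phys = list(values)      # physical register contents
        self.alias = list(range(8))   # slot -> physical register index
        self.tos = 0
        self.data_moves = 0           # full-width FP values copied

    def _slot(self, i):
        return (self.tos + i) % 8

    def st(self, i):
        """Read ST(i)."""
        return self.phys[self.alias[self._slot(i)]]

    def fxch_copy(self, i):
        """Prior-art style: three data moves through a temporary."""
        a = self.alias[self._slot(0)]
        b = self.alias[self._slot(i)]
        temp = self.phys[a]           # move 1: first register -> temporary
        self.phys[a] = self.phys[b]   # move 2: second register -> first
        self.phys[b] = temp           # move 3: temporary -> second register
        self.data_moves += 3

    def fxch_rename(self, i):
        """Renaming style: swap two alias entries; no FP data is copied."""
        s0, si = self._slot(0), self._slot(i)
        self.alias[s0], self.alias[si] = self.alias[si], self.alias[s0]


values = [float(v) for v in range(1, 9)]
a, b = FPStack(values), FPStack(values)
a.fxch_copy(3)
b.fxch_rename(3)
print(a.st(0), a.st(3), a.data_moves)  # prints 4.0 1.0 3
print(b.st(0), b.st(3), b.data_moves)  # prints 4.0 1.0 0
```

Both versions leave ST(0) and ST(3) exchanged from the program's point of view, but the renaming version only updates two small table entries.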
Speculative execution by prior art microprocessors utilizes a branch target buffer to anticipate the future program flow at a branch instruction based on the path last taken by the program code for that branch instruction. Until the microprocessor actually determines that it took the proper pathway subsequent to the branch instruction, the code it processes is "speculative." Once the speculative instructions are determined to be on the proper pathway, they may be retired. If they are part of the incorrect pathway, they are called "mispredicted"; the microprocessor discards them and then processes the correct pathway subsequent to the branch instruction. It is therefore advantageous to provide a floating point register renaming mechanism that accounts for FXCH operations in an environment that allows speculative execution of instructions. The present invention provides such capability.
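The last-path prediction and the retire-or-discard decision described above can be sketched in Python. `LastPathPredictor` is a simplified, hypothetical stand-in for a branch target buffer keyed only by branch address, and `resolve` is likewise a hypothetical helper; the model ignores register renaming state, which a real design must also recover after a misprediction.

```python
class LastPathPredictor:
    """Predicts that a branch will go wherever it went the last time it
    executed, keyed by branch address. Unseen branches are predicted
    to fall through; that default is an assumption of this sketch."""

    def __init__(self):
        self.last_next = {}  # branch address -> address executed next last time

    def predict(self, branch_addr, fallthrough):
        return self.last_next.get(branch_addr, fallthrough)

    def update(self, branch_addr, actual_next):
        self.last_next[branch_addr] = actual_next


def resolve(predictor, branch_addr, predicted, actual, speculative_uops):
    """Once the branch outcome is known, retire the speculative uops if the
    prediction was correct, otherwise discard them. Returns (retired, discarded)."""
    predictor.update(branch_addr, actual)
    if predicted == actual:
        return list(speculative_uops), []
    return [], list(speculative_uops)


bp = LastPathPredictor()
guess = bp.predict(0x100, fallthrough=0x104)  # first encounter: fall through
print(resolve(bp, 0x100, guess, 0x200, ["u1", "u2"]))  # prints ([], ['u1', 'u2'])
guess = bp.predict(0x100, fallthrough=0x104)  # now follows the last path taken
print(resolve(bp, 0x100, guess, 0x200, ["u3"]))  # prints (['u3'], [])
```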
Accordingly, it is an object of the present invention to allow more efficient processing performance within a superscalar microprocessor. It is an object of the present invention specifically to increase the execution performance of a superscalar microprocessor by allowing more uops to execute simultaneously within a given execution cycle. It is yet another object of the present invention to allow simultaneous execution of multiple uops that use the same registers as operands but are not truly data dependent. It is yet another object of the present invention to provide the above execution-efficiency features for stack-based floating point registers. It is another object of the present invention to provide an efficient FXCH operation. It is an object of the present invention to provide a floating point register renaming mechanism that accounts for speculative FXCH operations at operation retirement.
It is another object of the present invention to provide the above functionality within a high performance superscalar microprocessor, resulting in increased execution efficiency. It is another object of the present invention to provide a general purpose computer system having such a high performance superscalar microprocessor as an integral component. These and other objects of the present invention not specifically stated above will become evident from the discussion of the present invention that follows.