(1) Field of the Invention
The present invention relates to the field of microprocessor architecture. Specifically, the present invention relates to increasing processing efficiency within microprocessors having limited numbers of registers by providing register renaming ability and selective right-shifting of data from non right-adjusted registers, in particular high byte registers, before executing operations.
(2) Related Art
Microprocessors execute instructions and micro-operations ("uops") by reading source operands from registers and storing destination or result operands into registers. A register is a temporary storage area within a microprocessor for holding arithmetic and other results used by the microprocessor device. Different registers may be used for different functions. For example, some registers may be used primarily for storage of arithmetic results, while other registers may be used primarily for conveying status information via various flag bits (such as system status or floating point status). Registers are individually composed of bits. A bit is a binary digit and may adopt either a "0" value or a "1" value. A given register may contain various bit widths. For example, a full width 32 bit register may also contain separate 8 bit widths or a separate 16 bit width. Each of the above different register widths for a given 32 bit register (i.e., partial widths of the 32 bit register) may be separately addressable.
The register set of the well known Intel architecture has specially defined registers. For background information regarding the register set of the well known Intel macroarchitecture, reference is made to Chapter 2 of the i486 Microprocessor Programmer's Reference Manual, published by Osborne-McGraw-Hill, 1990, which is also available directly from Intel Corporation of Santa Clara, California. In terms of the Intel macroarchitecture register set, 32-bit arithmetic registers are called eax, ebx, ecx, and edx. With reference to the full width register eax, this register is composed of other partial width registers of varying width; the low word 16 bits, bits 15-0, of the eax register are called the ax register. The low byte, bits 7-0, of the ax register is the al register. The high byte, bits 15-8, of the ax register is the ah register. In similar fashion, the other full width 32-bit registers, ebx, ecx, and edx, individually contain separate registers of varying widths. The basic arithmetic registers for use within the Intel macroarchitecture register set include: eax, ebx, ecx, and edx (as well as the partial bit widths thereof). These are the logical registers.
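The partial width layout described above may be illustrated by a minimal sketch. The function names below are hypothetical and merely model how the ax, al, and ah registers overlay a 32-bit eax value; they do not represent any hardware implementation.

```python
# Illustrative model only: partial-width views of a 32-bit eax value.

def ax(eax: int) -> int:
    """Low word: bits 15-0 of eax."""
    return eax & 0xFFFF

def al(eax: int) -> int:
    """Low byte: bits 7-0 of eax (right-adjusted within eax)."""
    return eax & 0xFF

def ah(eax: int) -> int:
    """High byte: bits 15-8 of eax (not right-adjusted within eax)."""
    return (eax >> 8) & 0xFF

eax = 0x12345678
print(hex(ax(eax)), hex(al(eax)), hex(ah(eax)))  # 0x5678 0x78 0x56
```

Note that reading ah requires a shift by eight bit positions, whereas ax and al are obtained by masking alone.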
The number of registers available within the Intel architecture register set is adequate within some microprocessor architectures that are not superscalar or that are superscalar but at most execute two instructions per instruction cycle. However, the register set of the Intel architecture is somewhat limited and it would be advantageous to be able to expand the register set in some way. Superscalar microprocessors, as any other microprocessor, could take advantage of the expanded register set to increase performance. A superscalar microprocessor simultaneously executes uops that do not have data dependencies between them. For instance, consider the pseudo code below.
______________________________________
uop0: mov eax, 0x8A
uop1: add eax, ebx
uop2: add ecx, eax
______________________________________
The uop1 may not execute simultaneously with uop0 because uop1 adds the value of eax with ebx and stores the result into eax. Therefore, uop1 requires the result of uop0 to perform its operation. Likewise, uop2 requires the result (i.e., eax) of uop1 and therefore may not execute simultaneously with uop1. When one uop requires as a source of information a register from a prior uop that is a destination register, this condition is referred to as a data dependency between the two uops. For instance, uop2 and uop1 are data dependent. Some data dependencies, like the above, are unavoidable and therefore impact on the performance of a superscalar microprocessor simply because some uops demand a particular execution order. These data dependencies are called true data dependencies.
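The true data dependencies in the three-uop sequence discussed above may be found mechanically. The following is a minimal sketch under stated assumptions: the `Uop` record and the `true_dependencies` function are hypothetical names introduced for illustration, registers are compared by exact name only, and overlap between partial width registers is not modeled.

```python
from typing import Dict, List, NamedTuple, Tuple

class Uop(NamedTuple):
    name: str                 # label such as "uop0"
    dest: str                 # destination register
    sources: Tuple[str, ...]  # source registers

def true_dependencies(uops: List[Uop]) -> List[Tuple[str, str]]:
    """Return (consumer, producer) pairs where the consumer reads a
    register whose most recent writer is the producer."""
    last_writer: Dict[str, str] = {}
    deps = []
    for uop in uops:
        for src in uop.sources:
            if src in last_writer:
                deps.append((uop.name, last_writer[src]))
        # Record this uop as the latest writer of its destination.
        last_writer[uop.dest] = uop.name
    return deps

program = [
    Uop("uop0", "eax", ()),              # mov eax, 0x8A
    Uop("uop1", "eax", ("eax", "ebx")),  # add eax, ebx
    Uop("uop2", "ecx", ("ecx", "eax")),  # add ecx, eax
]
print(true_dependencies(program))
# [('uop1', 'uop0'), ('uop2', 'uop1')]
```

Each reported pair corresponds to an execution order that cannot be relaxed: uop1 must follow uop0, and uop2 must follow uop1.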
However, other data dependencies of uops are not true data dependencies and are more the result of the limited size of a particular microprocessor's register set. Because a register set may be constrained in size, uops may tend to utilize the same registers as temporary storage locations (registers) rather than moving data to and from memory. This is the case because memory moves take quite a large amount of processing time and are very costly to processor overall performance. Therefore, a small register set may create a form of "bottleneck" in the performance stream of a superscalar microprocessor as multiple uops target the same register for temporary storage of data but really do not depend on the data of these registers for their own execution. For instance, consider the instruction code below:
______________________________________
uop0: mov bx, 0x8A
uop1: add ax, bx
uop2: mov bx, cx
uop3: inc bx
______________________________________
While uop1 is data dependent on the result of uop0 for the bx register, there is no true data dependency between uop2 and uop1. Although uop2 and uop1 both utilize the bx register, the source value of uop2 does not in any way depend on the outcome of the execution of uop0 or uop1. This is called a false dependency (or resource dependency) between uop1 and uop2. The same is true for uop3 in that uop3, while data dependent on uop2, does not depend on the results of either uop0 or uop1. Therefore, a superscalar microprocessor should (in principle) be able to at least execute uop1 and uop2 simultaneously. However, because both uops utilize the bx register, the shared register stands in the way of doing so, and it would be advantageous to provide a microprocessor architecture that allows uop1 and uop2 to execute simultaneously.
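One generic way to remove a false dependency of this kind is to give every register write a fresh storage location. The following is a minimal sketch of that general idea, not of the mechanism of the present invention: the `rename` function and the physical register names p0, p1, and so on are hypothetical, and partial width overlap is not modeled.

```python
from itertools import count

def rename(uops):
    """Map each written logical register to a fresh physical name."""
    fresh = count()
    mapping = {}  # logical register -> physical register holding its latest value
    renamed = []
    for name, dest, sources in uops:
        # A source reads the physical register currently mapped to it,
        # or the logical name itself if no earlier uop has written it.
        phys_sources = tuple(mapping.get(s, s) for s in sources)
        # Every write is given a new physical register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((name, mapping[dest], phys_sources))
    return renamed

program = [
    ("uop0", "bx", ()),            # mov bx, 0x8A
    ("uop1", "ax", ("ax", "bx")),  # add ax, bx
    ("uop2", "bx", ("cx",)),       # mov bx, cx
    ("uop3", "bx", ("bx",)),       # inc bx
]
for row in rename(program):
    print(row)
# ('uop0', 'p0', ())
# ('uop1', 'p1', ('ax', 'p0'))
# ('uop2', 'p2', ('cx',))
# ('uop3', 'p3', ('p2',))
```

After renaming, uop1 reads p0 while uop2 writes p2, so the two uops no longer share a register. The true dependencies of uop1 on uop0 (through p0) and of uop3 on uop2 (through p2) are preserved.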
Superscalar microprocessors execute several operations simultaneously and therefore, any process that removes false data dependencies as discussed above must operate on a given set of operations simultaneously, the set being those operations that are simultaneously issued by the microprocessor within a given "cycle." Data dependencies may occur between operations of different sets or may occur within operations of the same set (i.e., intracycle). It would be advantageous to be able to eliminate false data dependencies between operands of different sets of operations and also between operands of the same set of operations. The present invention provides such advantageous result. Furthermore, it would be advantageous to preserve the use of partial widths of a larger register such as al, ah, and ax partial widths of the full width eax register while eliminating false data dependencies. The present invention provides such advantageous result.
Data stored in the high byte registers, such as ah, bh, ch, and dh of the Intel macroarchitecture, must be right-shifted before being processed by units, such as a 32-bit arithmetic unit, that assume right-adjusted input data. Data is right-adjusted when the least significant bit of the data is also bit 0 of the data. A partial width register is right-adjusted when the least significant bit of the register is also bit 0 of the larger width register. For example, consider the following uop:
______________________________________
uop0: inc ah ; increment register ah
______________________________________
This uop retrieves the binary data stored in the ah register, provides the binary data and a constant `1` to the arithmetic unit for arithmetical addition, and stores the result back in the ah register. In order to produce a correct arithmetic result, the arithmetic unit requires that both inputs be right-adjusted. Since register ah is bits 15-8 of the 32-bit eax register, the binary data retrieved from the ah register must be right-shifted to bits 7-0 before being input to the arithmetic unit. For a byte operation, the result produced by the arithmetic unit is in bits 7-0 of the 32-bit result. (The 32-bit arithmetic unit always produces a 32-bit result that is right-adjusted.) Therefore, after arithmetic addition, bits 7-0 of the result must be left-shifted to bits 15-8 before the result is stored back in the ah register. Previously, the right-shifting of high byte input data was performed every time a high byte register was sourced, and left-shifting was required every time a result's destination was a high byte register. These right-shifting and left-shifting operations consume precious time in a microprocessor's operation cycle and can affect speed paths within a given microarchitecture. It would be advantageous to be able to remove the need for right-shifting every time a high byte register, or other non right-adjusted register, is sourced. It would also be advantageous to remove the need for left-shifting every time a right-adjusted result's destination is a high byte register or other non right-adjusted register. The present invention provides such advantageous results.
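The right-shift, add, and left-shift sequence described above for `inc ah` may be illustrated by a minimal sketch. The names `alu_add` and `inc_ah` are hypothetical; the sketch models only the data movement on a 32-bit eax value and does not model flag bits or any hardware timing.

```python
MASK32 = 0xFFFFFFFF

def alu_add(a: int, b: int) -> int:
    """Stand-in for a 32-bit arithmetic unit that assumes right-adjusted inputs."""
    return (a + b) & MASK32

def inc_ah(eax: int) -> int:
    """Model `inc ah` on a 32-bit eax value."""
    # Right-shift ah (bits 15-8) down to bits 7-0 before the addition.
    operand = (eax >> 8) & 0xFF
    # Byte operation: only bits 7-0 of the 32-bit result are kept.
    result = alu_add(operand, 1) & 0xFF
    # Left-shift the result back to bits 15-8 and merge it with the
    # untouched bits 31-16 and 7-0 of eax.
    return (eax & ~0xFF00 & MASK32) | (result << 8)

print(hex(inc_ah(0x12345678)))  # 0x12345778
print(hex(inc_ah(0x1234FF78)))  # 0x12340078
```

The second call shows that a carry out of the byte does not propagate into bits 31-16 of eax, since only bits 7-0 of the arithmetic result are written back.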
Accordingly, it is an object of the present invention to allow increased processing performance within a superscalar microprocessor. It is an object of the present invention to specifically increase the execution performance of a superscalar microprocessor by removing the need for right-shifting every time a high byte register, or other non right-adjusted register, is sourced. It is yet another object of the present invention to specifically increase the execution performance of a superscalar microprocessor by removing the need for left-shifting every time a right-adjusted result's destination is a high byte register, or other non right-adjusted register.
It is an object of the present invention to reduce the need for left-shifting and right-shifting in conjunction with a register renaming capability. It is yet another object of the present invention to reduce the need for left-shifting and right-shifting in conjunction with a register renaming capability that renames larger width and partial width registers.
It is another object of the present invention to provide the above functionality within a high performance superscalar microprocessor resulting in increased execution efficiency. It is another object of the present invention to provide a general purpose computer system having such high performance superscalar microprocessor as an integral component. These and other objects of the present invention not specifically stated above will become evident according to discussions of the present invention to follow.