1. Field of the Invention
The present invention relates to the field of microprocessor architecture. Specifically, the present invention relates to the field of microprocessor architecture for increasing processing efficiency within microprocessors having limited numbers of registers by providing a register renaming ability and, in addition, an idiom recognition ability for preventing certain partial width renaming stalls.
2. Related Art
Microprocessors execute instructions and micro-operations ("uops") by reading source operands from registers and storing destination or result operands into registers. A register is a temporary storage area within a microprocessor for holding arithmetic and other results used by the microprocessor device. Different registers may be used for different functions. For example, some registers may be used primarily for storage of arithmetic results, while other registers may be used primarily for conveying status information via various flag bits (such as system status or floating point status). Registers are individually composed of bits. A bit is a binary digit and may adopt either a "0" value or a "1" value. A given register may contain various bit widths. For example, a full width 32 bit register may also contain separate 8 bit widths or a separate 16 bit width. Each of the above different register widths for a given 32 bit register (i.e., partial widths of the 32 bit register) may be separately addressable.
The register set of the well known Intel architecture has specially defined registers. For background information regarding the register set of the well known Intel macroarchitecture, reference is made to Chapter 2 of the i486 Microprocessor Programmer's Reference Manual, published by Osborne-McGraw-Hill, 1990, which is also available directly from Intel Corporation of Santa Clara, Calif. In terms of the Intel macroarchitecture register set, the 32-bit arithmetic registers are called eax, ebx, ecx, and edx. The full width register eax is composed of partial width registers of varying width: the low word (bits 15-0) of the eax register is called the ax register, the low byte (bits 7-0) of the ax register is the al register, and the high byte (bits 15-8) of the ax register is the ah register. Likewise, the other full width 32-bit registers, ebx, ecx, and edx, individually contain separate registers of varying widths. The basic arithmetic registers for use within the Intel macroarchitecture register set include: eax, ebx, ecx, and edx (as well as the partial bit widths thereof). These are the logical registers.
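The relationship between eax and its partial widths can be illustrated with a short sketch. The helper names `partial_widths` and `write_al` are hypothetical and are used here only for illustration; the bit ranges are those stated above.

```python
def partial_widths(eax: int) -> dict:
    """Split a 32-bit eax value into its ax, al, and ah partial widths."""
    eax &= 0xFFFFFFFF        # keep only bits 31-0
    ax = eax & 0xFFFF        # low word, bits 15-0
    al = ax & 0xFF           # low byte of ax, bits 7-0
    ah = (ax >> 8) & 0xFF    # high byte of ax, bits 15-8
    return {"eax": eax, "ax": ax, "al": al, "ah": ah}


def write_al(eax: int, value: int) -> int:
    """Write only the al partial width, leaving bits 31-8 of eax unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


regs = partial_widths(0x12345678)
updated = write_al(0x12345678, 0x8A)
```

Because a write to al changes only bits 7-0, the full width value afterward is a combination of the old upper bits and the new low byte.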
The number of registers available within the Intel architecture register set is adequate within some microprocessor architectures. However, the register set of the Intel architecture is somewhat limited and it would be advantageous to be able to expand the register set in some way. Superscalar microprocessors, like any other microprocessor, could take advantage of an expanded register set to increase performance. A superscalar microprocessor simultaneously executes uops that do not have data dependencies between them. For instance, consider the pseudo code below.
______________________________________
uop0: mov eax, 0x8A
uop1: add eax, ebx
uop2: add ecx, eax
______________________________________
uop1 may not execute simultaneously with uop0 because uop1 adds the value of eax to ebx and stores the result into eax. Therefore, uop1 requires the result of uop0 to perform its operation. Likewise, uop2 requires the result (i.e., eax) of uop1 and therefore may not execute simultaneously with uop1. When one uop requires as a source a register that is the destination register of a prior uop, this condition is referred to as a data dependency between the two uops. For instance, uop2 and uop1 are data dependent. Some data dependencies, like the above, are unavoidable and therefore limit the performance of a superscalar microprocessor simply because some uops demand a particular execution order. These data dependencies are called true data dependencies.
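The true data dependencies in the three-uop sequence above can be found mechanically. The following is a minimal sketch, assuming each uop is described by one destination register and a tuple of source registers; the `Uop` type and the `true_dependencies` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Uop(NamedTuple):
    name: str
    dest: Optional[str]       # destination register, if any
    srcs: Tuple[str, ...]     # source registers


def true_dependencies(uops):
    """Return (consumer, producer) pairs: a uop reads a register whose
    most recent earlier writer is the producer."""
    last_writer = {}          # register name -> name of latest uop writing it
    deps = []
    for uop in uops:
        for src in uop.srcs:
            if src in last_writer:
                deps.append((uop.name, last_writer[src]))
        if uop.dest is not None:
            last_writer[uop.dest] = uop.name
    return deps


program = [
    Uop("uop0", "eax", ()),               # mov eax, 0x8A
    Uop("uop1", "eax", ("eax", "ebx")),   # add eax, ebx
    Uop("uop2", "ecx", ("ecx", "eax")),   # add ecx, eax
]
deps = true_dependencies(program)
```

Under these assumptions the sketch reports that uop1 depends on uop0 and uop2 depends on uop1, matching the execution order described above.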
However, other data dependencies of uops are not true data dependencies and are more the result of the limited size of a particular microprocessor's register set. Because a register set may be constrained in size, uops may tend to utilize the same registers as temporary storage locations (registers) rather than moving data to and from memory. This is the case because memory moves take quite a large amount of processing time and are very costly to processor overall performance. Therefore, a small register set may create a form of "bottleneck" in the performance stream of a superscalar microprocessor as multiple uops target the same register for temporary storage of data but really do not depend on the data of these registers for their own execution. For instance, consider the instruction code below:
______________________________________
uop0: mov bx, 0x8A
uop1: add ax, bx
uop2: mov bx, cx
uop3: inc bx
______________________________________
While uop1 is data dependent on the result of uop0 for the bx register, there are no true data dependencies between uop2 and uop1. Although uop2 and uop1 both utilize the bx register, the source value of uop2 does not in any way depend on the outcome of the execution of uop0 or uop1. This is called a false dependency (or resource dependency) between uop1 and uop2. The same is true for uop3 in that uop3, while data dependent on uop2, does not depend on the results of either uop0 or uop1. Therefore, a superscalar microprocessor should (in principle) be able to execute at least uop1 and uop2 simultaneously. However, because both uops utilize the bx register, they cannot do so unless the false dependency is removed. It would therefore be advantageous to provide a microprocessor architecture that allows uop1 and uop2 to execute simultaneously.
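One generic way to remove such a false dependency is to give every register write a fresh physical register. The sketch below is a simplified illustration of that general idea and is not a description of the mechanism of the present invention; the physical register names (p0, p1, ...) and the `rename` function are assumptions made for illustration.

```python
def rename(uops):
    """Map each logical destination to a fresh physical register and
    rewrite sources to the latest physical name for that register."""
    mapping = {}      # logical register -> current physical register
    next_phys = 0
    renamed = []
    for name, dest, srcs in uops:
        # Sources not yet renamed keep their logical (architectural) name.
        phys_srcs = tuple(mapping.get(s, s) for s in srcs)
        phys_dest = f"p{next_phys}"
        next_phys += 1
        mapping[dest] = phys_dest
        renamed.append((name, phys_dest, phys_srcs))
    return renamed


program = [
    ("uop0", "bx", ()),             # mov bx, 0x8A
    ("uop1", "ax", ("ax", "bx")),   # add ax, bx
    ("uop2", "bx", ("cx",)),        # mov bx, cx
    ("uop3", "bx", ("bx",)),        # inc bx
]
renamed = rename(program)
```

After renaming, uop1 reads p0 while uop2 writes p2, so the two uops no longer share a register and could execute simultaneously. uop3 still reads p2, which preserves its true dependency on uop2.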
Superscalar microprocessors execute several operations simultaneously and therefore, any process that removes false data dependencies as discussed above must operate on a given set of operations simultaneously, the set being those operations that are simultaneously issued by the microprocessor within a given "cycle." Data dependencies may occur between operations of different sets or may occur within operations of the same set (i.e., intracycle). It would be advantageous to be able to eliminate false data dependencies between operands of different sets of operations and also between operands of the same set of operations. The present invention provides such advantageous result. Furthermore, it would be advantageous to preserve the use of partial widths of a larger register such as al, ah, and ax partial widths of the full width eax register while eliminating false data dependencies. The present invention provides such advantageous result.
When renaming of partial width registers is accommodated there are certain cases that cause renaming to stall. For instance, if a write to a partial width of a register is followed by a read of a larger width of the register, then the data required by the larger width read must be an assimilation of multiple previous writes to different pieces of the larger register. This is called a partial width stall. It would be advantageous to eliminate partial width stalls where possible. The present invention provides such advantageous result.
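The partial width stall condition can be illustrated with a simplified model. The sketch below assumes that only the most recent renamed write to a register is tracked and that a read stalls whenever that write does not cover every bit being read; this is slightly broader than the write-then-wider-read case described above. The `REGS` table and the `partial_width_stalls` function are hypothetical and do not describe the detection logic of the present invention.

```python
# Each register name maps to (parent register, low bit, high bit).
REGS = {
    "eax": ("eax", 0, 31),
    "ax": ("eax", 0, 15),
    "al": ("eax", 0, 7),
    "ah": ("eax", 8, 15),
}


def partial_width_stalls(accesses):
    """Given a list of ("write" | "read", register) pairs, return the
    indices of reads that the most recent write does not fully cover."""
    last_write = {}   # parent register -> (low bit, high bit) of latest write
    stalls = []
    for i, (kind, reg) in enumerate(accesses):
        parent, lo, hi = REGS[reg]
        if kind == "write":
            last_write[parent] = (lo, hi)
        elif parent in last_write:
            wlo, whi = last_write[parent]
            if lo < wlo or hi > whi:
                stalls.append(i)   # read needs bits from an older write
    return stalls


stalls = partial_width_stalls([
    ("write", "eax"),
    ("write", "al"),
    ("read", "eax"),    # wider than the preceding al write
    ("write", "eax"),
    ("read", "ax"),     # fully covered by the preceding eax write
])
```

In this model the read of eax at index 2 is flagged because bits 31-8 come from the earlier eax write while bits 7-0 come from the al write. The later read of ax is not flagged because the preceding full width write supplies all of its bits.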
Accordingly, it is an object of the present invention to provide increased processor performance within a microprocessor, and particularly within a superscalar microprocessor. It is an object of the present invention to specifically increase the execution performance of a superscalar microprocessor by allowing more uops to execute simultaneously within a given execution cycle. It is yet another object of the present invention to allow simultaneous execution of multiple uops that utilize the same registers as operands but are not truly data dependent uops. It is yet another object of the present invention to provide a mechanism and method for eliminating false data dependencies between operands of operations that are issued simultaneously by a superscalar microprocessor.
It is an object of the present invention to provide a register renaming capability that accommodates partial width registers. It is an object of the present invention to provide a register renaming capability that detects a partial width stall condition upon a larger width read of a register having partial widths that was previously renamed as a smaller width register. It is yet another object of the present invention to override partial width stalls in certain cases.
It is another object of the present invention to provide the above functionality within a high performance superscalar microprocessor resulting in increased execution performance. It is another object of the present invention to provide a general purpose computer system having such high performance superscalar microprocessor as an integral component. These and other objects of the present invention not specifically stated above will become evident according to discussions of the present invention to follow.