(1) Field of the Invention
The present invention relates to the field of microprocessor architecture. Specifically, the present invention relates to the field of microprocessor architecture for increasing processing efficiency within microprocessors by removing false data dependencies between instructions by renaming flags and by using flag masks.
(2) Prior Art
Microprocessors execute instructions ("uops") by reading source operands from registers and storing destination or result operands into registers. A register is a temporary storage area within a microprocessor for holding arithmetic and other results used by the microprocessor. Different registers may be used for different functions. For example, some registers may be used primarily for storage of arithmetic results, while other registers may be used primarily for conveying status information via various flag bits (such as system status or floating point status). Registers are individually composed of bits, and a given register may be addressable at various bit widths. For example, a 32 bit register may also contain separate 8 bit registers or a separate 16 bit register. Each of these different register widths within a given 32 bit register may be separately addressable.
The register set of the well known Intel architecture has specially defined registers. For background information regarding the register set of the well known Intel macroarchitecture, reference is made to Chapter 2 of the i486 Microprocessor Programmer's Reference Manual, published by Osborne-McGraw-Hill, 1990, which is also available directly from Intel Corporation of Santa Clara, Calif. In terms of the Intel macroarchitecture register set, 32-bit arithmetic registers are called eax, ebx, ecx, and edx. With reference to eax, this register is composed of other registers of varying width; the low 16 bits (the low word) of the eax register are called the ax register. The low byte of the ax register is the al register. The high byte of the ax register is the ah register. Likewise, the other 32-bit registers, ebx, ecx, and edx, individually contain separate registers of varying widths. The basic arithmetic registers for use within the Intel macroarchitecture register set include: eax, ebx, ecx, edx, edi, and esi (as well as the partial bit widths thereof). These are the logical registers.
The number of registers available within the Intel architecture register set is adequate and advantageous within some microprocessor architectures that are not superscalar or that are superscalar but execute at most two instructions per instruction cycle. However, the register set of the Intel architecture is somewhat limited and it would be advantageous to be able to expand the register set in some way. A superscalar microprocessor, like any other microprocessor, could take advantage of an increased register set to increase performance. Superscalar microprocessors simultaneously execute uops that do not have data dependencies between them. For instance, consider the pseudo code below.
______________________________________
uop0: mov eax, 0x8A
uop1: add eax, ebx
uop2: add ecx, eax
______________________________________
The uop1 may not execute simultaneously with uop0 because uop1 adds the value of ebx to eax and stores the result into eax. Therefore, uop1 requires the result of uop0 to perform its operation. Likewise, uop2 requires the result (i.e., eax) of uop1 and therefore may not execute simultaneously with uop1. When one uop requires as a source a register that is the destination register of a prior uop, this condition is referred to as a data dependency between the two uops. For instance, uop2 and uop1 are data dependent. Some data dependencies, like the above, are unavoidable and therefore impact the performance of a superscalar microprocessor simply because some uops demand a particular execution order. These data dependencies are called true data dependencies.
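The true data dependencies described above can be illustrated with a minimal sketch. It is not part of the invention; the `Uop` tuple and the `true_dependencies` function are hypothetical names introduced only for illustration, and each uop is assumed to have exactly one destination register.

```python
from typing import NamedTuple

class Uop(NamedTuple):
    # Hypothetical model of a uop: one destination, zero or more sources.
    name: str
    dest: str
    srcs: tuple

def true_dependencies(uops):
    """Return (consumer, producer) pairs where the consumer reads a
    register whose most recent writer is the producer."""
    last_writer = {}   # register name -> name of the uop that last wrote it
    deps = []
    for uop in uops:
        for src in uop.srcs:
            if src in last_writer:
                deps.append((uop.name, last_writer[src]))
        # Record the write only after the sources have been looked up.
        last_writer[uop.dest] = uop.name
    return deps

program = [
    Uop("uop0", "eax", ()),               # mov eax, 0x8A
    Uop("uop1", "eax", ("eax", "ebx")),   # add eax, ebx
    Uop("uop2", "ecx", ("ecx", "eax")),   # add ecx, eax
]
deps = true_dependencies(program)
print(deps)  # [('uop1', 'uop0'), ('uop2', 'uop1')]
```

Under these assumptions the sketch reports that uop1 depends on uop0 and uop2 depends on uop1, matching the execution order the text describes.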
However, other data dependencies of uops are not true data dependencies and are more the result of the limited size of a particular microprocessor's register set. Because a register set may be constrained in size, uops may tend to utilize the same registers as temporary storage locations (registers) rather than moving data to and from memory. This is the case because memory moves take quite a large amount of processing time and are very costly to processor overall performance. Therefore, a small register set may create a form of "bottleneck" in the performance stream of a superscalar microprocessor as multiple uops target the same register for temporary storage of data but really do not depend on the data of these registers for their own execution. For instance, consider the instruction code below:
______________________________________
uop0: mov bx, 0x8A
uop1: add ax, bx
uop2: mov bx, cx
uop3: inc bx
______________________________________
While uop1 is data dependent on the result of uop0 for the bx register, there are no true data dependencies between uop2 and uop1. Although uop2 and uop1 both utilize the bx register, the source value of uop2 does not in any way depend on the outcome of the execution of uop0 or uop1. This is called a false dependency between uop1 and uop2. The same is true for uop3 in that uop3, while data dependent on uop2, does not depend on the results of either uop0 or uop1. Therefore, a superscalar microprocessor should (in principle) be able to at least execute uop1 and uop2 simultaneously. However, because both uops utilize the bx register, uop2 may not overwrite bx until uop1 has read it. It would therefore be advantageous to provide a microprocessor architecture that allows the above uops (uop1 and uop2) to execute simultaneously.
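One general way to remove such a false dependency is to give every destination write its own physical register. The following is a minimal sketch of that idea, not the mechanism of the present invention: the `rename` function, the `p0`, `p1`, ... physical register names, and the treatment of bx as a whole register (ignoring partial widths) are all assumptions made for illustration.

```python
from itertools import count

def rename(uops):
    """Map each logical destination to a fresh physical register.

    uops: list of (name, logical_dest, logical_srcs).
    Returns a list of (name, physical_dest, physical_srcs).
    """
    table = {}        # logical register -> current physical register
    fresh = count()   # source of unused physical register numbers
    renamed = []
    for name, dest, srcs in uops:
        # Sources are looked up before the destination is remapped, so a
        # uop that reads and writes the same register sees the old value.
        phys_srcs = tuple(table.get(s, s) for s in srcs)
        phys_dest = f"p{next(fresh)}"
        table[dest] = phys_dest
        renamed.append((name, phys_dest, phys_srcs))
    return renamed

program = [
    ("uop0", "bx", ()),             # mov bx, 0x8A
    ("uop1", "ax", ("ax", "bx")),   # add ax, bx
    ("uop2", "bx", ("cx",)),        # mov bx, cx
    ("uop3", "bx", ("bx",)),        # inc bx
]
renamed = rename(program)
for entry in renamed:
    print(entry)
```

In this sketch uop1 reads `p0` while uop2 writes `p2`, so the two no longer share a register and could execute simultaneously. The true dependency of uop3 on uop2 is preserved, since uop3 reads `p2`. Registers never written within the sequence (ax and cx as sources here) keep their logical names.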
The same data dependencies may occur between flag bits of flag registers that indicate the value of system, arithmetic or control flags within the microprocessor. Different instructions may read and update different flag bits. Instructions that write to a flag register may create data dependencies with later issued instructions that need to read the bits of the flag register as input. Therefore, like the arithmetic registers as described above, it would be advantageous to eliminate false flag data dependencies between instructions caused by usage of the flag registers in order to allow more parallel processing of instructions. The present invention offers such a solution.
Superscalar microprocessors execute several operations simultaneously and therefore, any process that removes data dependencies as discussed above must operate on a given set of operations simultaneously, the set including those operations that are issued by the issuing units of a microprocessor within a common clock "cycle." Data dependencies may occur between operations of different sets (intercycle) or may occur within operations of the same set (intracycle). It would be advantageous to be able to eliminate false data dependencies between operands of different sets of operations and also between operands of the same set of operations that arise as a result of the flag registers. The present invention provides such advantageous result. Furthermore, when updating a register renaming table simultaneously for a set of operations, it would be advantageous to account for data dependencies between logical destination operands of a given set of operations as a result of the flag registers. The present invention provides such advantageous result.
Further, since instruction pipelines are arranged such that register renaming (for a given operation) is typically performed before the operation is executed, it would be advantageous to provide a mechanism for indicating which flags an operation actually updates after execution. The present invention provides such functionality.
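The observation that different instructions read and update different flag bits can be sketched with bit masks. This is an illustrative model only, not the flag mask mechanism of the present invention. The bit positions are those of the Intel arithmetic flags; the `flag_producer` function and the (name, write mask, read mask) tuple layout are hypothetical, and each reader is assumed to read a single flag.

```python
# Bit positions of the Intel arithmetic flags within the flags register.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11
ARITH = CF | PF | AF | ZF | SF | OF

def flag_producer(uops, reader_index):
    """Return the name of the most recent earlier uop whose write mask
    overlaps the read mask of the uop at reader_index, or None."""
    _, _, read_mask = uops[reader_index]
    for name, write_mask, _ in reversed(uops[:reader_index]):
        if write_mask & read_mask:
            return name
    return None

# Each entry is (name, flags written, flags read).
program = [
    ("add", ARITH, 0),         # add updates all six arithmetic flags
    ("inc", ARITH & ~CF, 0),   # inc updates all of them except CF
    ("jc", 0, CF),             # jc reads only CF
    ("jz", 0, ZF),             # jz reads only ZF
]
print(flag_producer(program, 2))  # add
print(flag_producer(program, 3))  # inc
```

Without masks, jc would appear to depend on inc, the latest writer of the flag register. Comparing masks shows that jc actually needs the CF value produced by add, so any ordering constraint between inc and jc is a false flag dependency.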
Accordingly, it is an object of the present invention to allow more efficient processing performance within a superscalar microprocessor. It is an object of the present invention to specifically increase the execution performance of a superscalar microprocessor by allowing more parallel processing of instructions. It is yet another object of the present invention to allow simultaneous execution of multiple instructions that utilize the same registers as input/output operands but are not truly data dependent. It is yet another object of the present invention to provide a mechanism and method for eliminating false data dependencies between operands of instructions that read or write flag data or flag bits. It is an object of the present invention to provide a capability for renaming the flag registers of a microprocessor to a larger set of physical registers to eliminate false data dependencies between instructions to increase processor parallelism. It is yet another object of the present invention to provide the above capability for intracycle flag data dependencies between the inputs of one operation and outputs of preceding operations within the same set of operations of a given clock cycle. It is yet another object of the present invention to provide a mechanism and method for updating a register renaming table to rename flag registers, taking into consideration intracycle data dependencies.
It is an object of the present invention to provide a mechanism and method for retiring flag register information efficiently by indicating those flags that are updated by an operation as a result of the operation's execution.
It is another object of the present invention to provide the above functionality within a high performance superscalar microprocessor resulting in increased execution efficiency and increased parallelism. It is another object of the present invention to provide a general purpose computer system having such high performance superscalar microprocessor as an integral component. These and other objects of the present invention not specifically stated above will become evident according to discussions of the present invention to follow.
(3) Related U.S. Application
The present invention is a continuation-in-part of application Ser. No. 08/129,867 filed on Sep. 30, 1993 and entitled "N-Wide Bypass for Data Dependencies Within Register Alias Table," and assigned to the assignee of the present invention.