1. Field of the Invention
This invention relates in general to microprocessors, and more particularly, to microprocessor architectures using aliased registers in an out-of-order machine.
2. Relevant Background
Modern computer processors (also called microprocessors) conventionally provide a programmer with a choice between different levels of numeric precision for the execution and calculation of arithmetic floating-point (i.e., non-integer) operations such as add, subtract, multiply, or divide. For instance, a microprocessor could support single-precision floating-point operations and double-precision floating-point operations, wherein the double-precision floating-point operations utilize generally twice as many bits as the single-precision operations (i.e., 64-bit double-precision operations vs. 32-bit single-precision operations).
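The difference in width between the two precisions can be illustrated with a short sketch. It assumes the IEEE 754 binary32 and binary64 encodings exposed by Python's standard `struct` module; the text above does not name an encoding, and the sample value 0.1 is chosen here only for illustration.

```python
import struct

value = 0.1  # illustrative value, not taken from the text

# Pack the same value as a 32-bit single and as a 64-bit double (big-endian).
single_bytes = struct.pack(">f", value)
double_bytes = struct.pack(">d", value)

single_bits = len(single_bytes) * 8
double_bits = len(double_bytes) * 8

# Unpack again to see how much of the original value each width preserves.
single_roundtrip = struct.unpack(">f", single_bytes)[0]
double_roundtrip = struct.unpack(">d", double_bytes)[0]

print(single_bits, double_bits)
print(single_roundtrip == value, double_roundtrip == value)
```

The double-precision round trip returns the original value exactly, whereas the single-precision round trip returns only a nearby approximation.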
As microprocessor architectures are developed and designed for greater computer throughput and computational accuracy, double-precision or multiple-precision floating-point operations have essentially become the programmer's standard arithmetic operation. Conventional processors are now generally designed to support double-precision operations as the baseline arithmetic operation. However, because older software programs have been written for older processors using single-precision floating-point operations, it is beneficial that a processor design provide support both for double-precision floating-point operations and single-precision floating-point operations. In this way, software written using single-precision floating-point operations for an earlier generation of a processor should operate without modification on a newer design of a processor.
In order to provide efficient use of the processor's register resources, single-precision floating-point registers and double-precision floating-point registers can be arranged utilizing an "aliasing" or overlapping technique. When two or more data addresses refer to the same datum, the addresses are said to be "aliased". FIGS. 1A and 1B illustrate such an arrangement or register file of floating-point registers utilized in SPARC, a scalable processor architecture. FIG. 1A shows thirty-two single-precision (32 bit) registers f0, f1, f2, . . . f31. FIG. 1B shows a set of sixteen double-precision (64 bit) registers f0, f2, f4, . . . f30 which utilize an aliasing or overlapping arrangement to support both double-precision and single-precision floating-point operations. In the example of FIG. 1A and FIG. 1B, the double-precision register f4, single-precision register f4, and single-precision register f5 all refer to the same datum.
Referring to FIG. 1B, each double-precision register is 64 bits wide and comprises two single-precision registers. For instance, double-precision register f4 is a 64-bit register formed from the single-precision register f4 (32 bits) occupying the most significant or higher 32 bits, and the single-precision register f5 (32 bits) occupying the least significant or lower 32 bits.
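The overlapping arrangement of FIG. 1B can be modeled in a few lines. The following is a minimal sketch, not a description of any actual hardware: the class `AliasedRegisterFile` and its method names are hypothetical, and register contents are treated as raw bit patterns rather than floating-point values.

```python
class AliasedRegisterFile:
    """Hypothetical model: 32 single-precision slots, each holding 32 raw bits."""

    def __init__(self):
        self.slots = [0] * 32  # f0 .. f31

    def write_single(self, n, bits32):
        self.slots[n] = bits32 & 0xFFFFFFFF

    def read_single(self, n):
        return self.slots[n]

    def write_double(self, n, bits64):
        if n % 2:
            raise ValueError("double-precision registers are even-numbered")
        self.slots[n] = (bits64 >> 32) & 0xFFFFFFFF  # upper 32 bits -> fN
        self.slots[n + 1] = bits64 & 0xFFFFFFFF      # lower 32 bits -> fN+1

    def read_double(self, n):
        if n % 2:
            raise ValueError("double-precision registers are even-numbered")
        return (self.slots[n] << 32) | self.slots[n + 1]


rf = AliasedRegisterFile()
rf.write_double(4, 0x1122334455667788)
high = rf.read_single(4)       # upper half of double-precision f4
low = rf.read_single(5)        # lower half of double-precision f4
rf.write_single(5, 0xDEADBEEF)
updated = rf.read_double(4)    # reflects the write to single-precision f5
```

In this model, a write to single-precision f5 is visible through double-precision f4, which is the property that gives rise to the aliasing described above.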
When coding an arithmetic floating-point instruction, depending on the desired accuracy, a programmer could refer to a single-precision 32-bit register such as single-precision register f4 or single-precision register f5, or a double-precision 64-bit register such as double-precision register f4. In SPARC, each single-precision register is aliased to a corresponding double-precision register (e.g., single-precision f5 is aliased to double-precision f4).
For example, the following floating-point operation references double-precision registers:
    fadd.d f2, f4, f6
This instruction adds the contents of double-precision registers f2 and f4 (referred to as the operands or source registers), and stores the result in double-precision register f6 (known as the destination register).
The following floating-point operation references single-precision registers:
    fmul.s f3, f4, f7
This instruction multiplies the contents of single-precision registers f3 and f4, and stores the result in single-precision register f7.
Traditionally, processors have been designed using various techniques for improving their performance and increasing the number of instructions per clock cycle which the processor can execute. These techniques have included pipelining, superpipelining, superscalar execution, speculative instruction execution, and "out-of-order" instruction execution. While early processors executed instructions in a sequential order determined by the compiled machine language program, modern processors use multiple pipelines which can simultaneously process instructions when there are no data dependencies between the instructions in each of the pipelines. If a data dependency exists between the instructions in one or more pipelines, the pipelines "stall" and wait for the dependent data to become available.
As an example of a dependency, the following two double-precision operations share a double-precision data register: f6 is the destination of the first instruction and a source of the second. The second instruction is therefore dependent on the completion of the first operation:
    fadd.d f2, f4, f6
    fadd.d f6, f8, f10
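The dependency between the two instructions above can be checked mechanically. The following sketch assumes the operand order used in this document (two source registers followed by the destination register); the names `Instr`, `parse`, and `depends_on` are hypothetical helpers introduced only for illustration.

```python
from collections import namedtuple

# Hypothetical record for a three-operand instruction: two sources, one destination.
Instr = namedtuple("Instr", "op src1 src2 dest")


def parse(text):
    """Split 'fadd.d f2, f4, f6' into opcode, two sources, and a destination."""
    op, operands = text.split(None, 1)
    src1, src2, dest = [r.strip() for r in operands.split(",")]
    return Instr(op, src1, src2, dest)


def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest in (later.src1, later.src2)


first = parse("fadd.d f2, f4, f6")
second = parse("fadd.d f6, f8, f10")

print(depends_on(second, first))  # second reads f6, which first writes
print(depends_on(first, second))  # first reads neither source from f10
```

Only the first check reports a dependency, since f6 is written by the first instruction and read by the second.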
With double-precision registers, there are two possible dependencies per instruction since each source register can be dependent on one prior instruction.
Sequential or "in-order" processors can generally utilize aliased double-precision and single-precision registers without concern for the register dependencies between aliased registers. This is because in-order processors guarantee that each single-precision instruction would have no data register dependency due to aliasing.
However, with out-of-order operations, the number of possible dependencies that a single-precision register can have effectively doubles because a given single-precision source register could be dependent on prior operations which utilized either the same single-precision register or the aliased double-precision register. For example, a single-precision operation utilizing single-precision register f5 as a source register could be dependent upon a prior operation storing a value to a destination register utilizing either single-precision register f4 or single-precision register f5. In the following sequence, the first two instructions write f4 and f5, and the third instruction reads f5:
    fadd.s f6, f9, f4
    fadd.s f0, f1, f5
    fadd.s f5, f7, f9
This is because the single-precision register f5 and single-precision register f4 are both aliased into the double-precision register f4. Hence, for single-precision operations using aliased registers, there are at least four possible dependencies per instruction since each source register can have two possible dependencies.
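The counting argument above can be sketched as follows. This is a simplified model that assumes dependencies are tracked per aliased double-precision register, as the passage describes, so that a prior write to either half is treated as a possible producer. The function names `double_alias` and `possible_producers` are hypothetical, and instructions are represented as (source, source, destination) tuples of single-precision register numbers.

```python
def double_alias(reg):
    """Even-numbered double-precision register that single-precision `reg` aliases into."""
    return reg & ~1


def possible_producers(program, index):
    """For each source of program[index], list earlier instructions whose
    destination aliases into the same double-precision register."""
    src1, src2, _dest = program[index]
    result = {}
    for src in (src1, src2):
        result[src] = [
            i for i in range(index)
            if double_alias(program[i][2]) == double_alias(src)
        ]
    return result


# The three-instruction sequence from the text: (source, source, destination).
program = [
    (6, 9, 4),  # fadd.s f6, f9, f4
    (0, 1, 5),  # fadd.s f0, f1, f5
    (5, 7, 9),  # fadd.s f5, f7, f9
]

producers = possible_producers(program, 2)
print(producers)
```

Under this model, source register f5 of the third instruction has two candidate producers (the writes to f4 and f5), while source register f7 has none. With two source registers per instruction, up to four candidates would need to be tracked.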
While microprocessor architectures can be designed to track multiple dependencies between different instructions, it is desirable to reduce the number of dependencies which an out-of-order processor must track so that the processor's performance is improved.
What is needed is a system, method, and processor for handling aliased registers in an out-of-order processor so that the number of register dependencies which need to be tracked within the processor can be reduced.