1. Field of the Invention
The present invention generally relates to computer processors. More particularly, the present invention relates to a system and method to reduce the power consumed by the floating point unit of a processor by inhibiting redundant reads of the floating point register during iterations of a loop, such as in scientific computing, where one or more source operands have not changed in value.
2. Description of the Prior Art
Power conservation is increasingly becoming a concern in both computer systems and processor design. The components of the processor, such as the logic gate transistors, buses and registers, generate heat as they conduct current during computer operations. The dramatic increase in the number of components on a processor chip has exacerbated the problems associated with heat generation on the processor, as more components yield more heat during operation.
There have been several attempts in the prior art to alleviate processor power consumption problems. One method is to simply have the processor operate at lower power levels and clock frequency. Another solution has been to create modes within the processor that deactivate system power to components in a computer system when not in use. Such processors include power-down circuitry that controls the power delivered to functional units of the processor, and power to an individual unit is cut when it is determined that the unit is not necessary during the current operational cycle. However, this approach adds to the manufacturing cost of the processor, and the overhead of activating and deactivating the units can degrade the overall performance of the processor.
One feature provided in state of the art processors is the availability of floating point operations. In early designs, because of processor design complexity, such features were provided via a separate co-processor. In modern processors, such floating point functionality has been provided in the main processor in a floating point unit, and most modern processors clock the floating point circuitry even when no floating point operations are currently being executed and no floating point registers are in use. The floating point unit and processor are actuated by micro-code instructions that direct the loading and storing of floating point calculations.
Furthermore, in specific computer programs, a large iterative sequence can reuse the same series of components such that the components can become overheated and damaged from execution of the iterative program. In the example of a DAXPY/Dot Product Loop with an Execution Group of LU: MADD: STU: BC, the instruction cycle from BC→LFDU iterates at each execution of the loop. In numeric intensive computing (NIC), the utilization of the Floating Point Multiply Adder (FPMAD) approaches 100% since the entire FPMAD unit is used each cycle. The modern FPU is a very large unit (64-bit multiply/adder) that at high frequency can dissipate more power than the rest of the fixed point portion of the core. A significant portion of this FPU power is dissipated in the floating point register file, and this power is increasing in current designs for several reasons. First, the number of registers in the FPU has grown to as many as 128 or 256 registers for handling software loop unrolling, hardware renaming, multithreading (two sets of registers), VMX (128 128-bit registers), and other hardware-intensive items, and the register size is likewise increasing to 128 bits. Further, because of high-frequency cycle time pressures caused by the increasing number of physical registers, dynamic logic is usually required for the register file read ports. The power of such a large register file can become one third of the FPU power, which is more than one third of the entire FX unit power if used at 100% utilization, which is a common case in scientific computing. Thus, the power and power density in the FPU are excessive at clock frequencies above 5 GHz.
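By way of illustration only, the kind of loop discussed above can be sketched as y[i] = a * x[i] + y[i], in which the scalar a is the same on every iteration while x[i] and y[i] are freshly loaded. The following sketch counts source-operand reads under two assumed models: one in which every multiply-add reads all three sources, and one in which the unchanged scalar is read only once. The function name, the read-counting model, and the flag a_already_read are hypothetical and are not taken from any particular processor design.

```python
def daxpy_read_counts(a, x, y):
    """Compute a*x[i] + y[i] per element and count modeled source-operand reads.

    Assumed model (illustrative only): each multiply-add has three sources.
    The scalar `a` never changes inside the loop, so a design that inhibits
    redundant reads would need to read it only on the first iteration.
    """
    out = []
    naive_reads = 0         # every iteration reads a, x[i] and y[i]
    reduced_reads = 0       # repeated reads of the unchanged scalar are skipped
    a_already_read = False  # hypothetical flag: has `a` been read once?
    for xi, yi in zip(x, y):
        naive_reads += 3
        reduced_reads += 2  # x[i] and y[i] are new values each iteration
        if not a_already_read:
            reduced_reads += 1  # first and only read of the scalar
            a_already_read = True
        out.append(a * xi + yi)
    return out, naive_reads, reduced_reads


result, naive, reduced = daxpy_read_counts(
    2.0, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
)
print(result, naive, reduced)
```

Under this assumed model, a loop of n iterations performs 3n source reads in the first case and 2n + 1 in the second, so the fraction of reads avoided approaches one third as n grows.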
It would therefore be advantageous to provide a system and method that can reduce the power consumed in a tight loop of floating point calculations by minimizing unnecessary floating point register reads. Such a system and method should be robust and should not require significant overhead in processor manufacture or operation. Nor should the system and method unnecessarily operate the circuitry of the processor or a co-processor in assisting the floating point unit in the iterative calculations. It is thus to the provision of such a system and method that the present invention is primarily directed.