The present invention relates to floating-point units and, more particularly, to a floating-point unit which is capable of performing load bypasses with data conversion in order to reduce the amount of time required for a load operation to be completed.
Floating-point units have been developed that implement a technique commonly referred to as “bypassing” in order to improve processor performance. When the results of a first operation performed by a floating-point unit are needed by a subsequent operation performed by the same unit and bypassing is not performed, the second operation cannot be launched until the results of the first operation are available and have been stored in the register file block of the floating-point unit. However, the results of the first operation typically are available at some point in time before they are stored in the register file block. Bypassing forwards the result of the first operation directly into the second operation, which enables the second operation to begin as soon as the results of the first operation are available and before those results have been stored in the register file block. Bypassing therefore improves the processing throughput of the floating-point unit by eliminating dead states that would be incurred if the second operation could not begin until the results of the first operation had been written back to the register file block.
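The timing benefit described above can be illustrated with a minimal sketch. The function name and the cycle numbers are hypothetical and chosen only for illustration; they are not taken from any particular floating-point unit.

```python
def earliest_start(result_ready_cycle, writeback_done_cycle, bypass):
    """Earliest cycle at which a dependent operation can launch.

    Toy model (assumption): without bypassing, the dependent operation
    waits for the register file write-back; with bypassing, it launches
    as soon as the first result is available.
    """
    if writeback_done_cycle < result_ready_cycle:
        raise ValueError("write-back cannot finish before the result exists")
    return result_ready_cycle if bypass else writeback_done_cycle


# Hypothetical timing: result available in cycle 4, written back in cycle 6.
READY, WRITTEN = 4, 6
without_bypass = earliest_start(READY, WRITTEN, bypass=False)
with_bypass = earliest_start(READY, WRITTEN, bypass=True)
dead_states_removed = without_bypass - with_bypass
```

Under these assumed numbers, the dependent operation launches two cycles earlier when the result is bypassed.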
Load bypassing has been performed in relation to floating-point units. Generally, a load operation involves retrieving instructions and data for a particular operation from an on-chip or off-chip cache memory element, decoding the instructions, converting the data into a form suitable for use by the floating-point unit, and storing the converted data in the register file block. These load operations typically require a large number of cycles. It has been determined that if the number of cycles required for a load operation could be reduced by as little as one cycle, the overall throughput of the floating-point unit could be significantly increased. However, performing a load bypass is a difficult task.
The primary difficulty associated with performing a load bypass is caused by the aforementioned data conversion process, which occurs after the instructions have been decoded and before the data is loaded into the register file block. This data conversion depends on the type of data retrieved from the cache memory element. For example, various types of data are typically processed by floating-point units and are defined by the Institute of Electrical and Electronics Engineers (IEEE) floating-point standards. These data types are typically referred to as “non-normal” data types and include de-normalized numbers, infinities, and not-a-number (NaN) values. Other data types that are not defined by the IEEE standards also exist; these are architecture-specific and therefore require some additional conversion, such as changing the exponent to a particular constant value.
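As an illustration of why the conversion depends on the data type, the following sketch classifies an IEEE 754 single-precision bit pattern by inspecting its exponent and fraction fields. The function names are hypothetical, and zero is reported as its own category here for clarity; the text above does not discuss zero.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_single(bits):
    """Classify a 32-bit single-precision pattern by its fields."""
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    if exponent == 0xFF:             # exponent field is all ones
        return "infinity" if fraction == 0 else "nan"
    if exponent == 0:                # exponent field is all zeros
        return "zero" if fraction == 0 else "denormal"
    return "normal"
```

For example, `classify_single(float_to_bits(1.0))` returns `"normal"`, while the pattern `0x00000001` is classified as `"denormal"`. In each non-normal case the exponent field is all zeros or all ones, which is the kind of condition a special case detector would look for.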
The amount of time required to do this additional exponent conversion makes it particularly difficult to perform load bypasses from a current load instruction A to a floating-point arithmetic instruction B which uses the results of A. Accordingly, a need exists for a floating-point unit which is capable of performing load bypasses with conversion.
The present invention provides a method and apparatus for performing load bypasses with data conversion in a floating-point unit. The process of reading instructions and data out of a cache memory component, decoding the instructions, performing a memory-format-to-register-format conversion, and writing the converted data to the register file block of a floating-point unit is known as a load operation. A load operation occurs over many cycles. In accordance with the present invention, the floating-point unit performs a load bypass with conversion, which significantly shortens the number of cycles normally required to perform a load operation and thereby increases the overall throughput of the floating-point unit.
The apparatus of the present invention comprises a floating-point unit which comprises a register file, at least one bypass component and at least one multiply accumulate unit. The register file comprises a plurality of registers for storing operand data to be operated on and for storing results of operations that have been performed by the floating-point unit. The bypass component is configured to perform a memory-format-to-register-format conversion. This conversion includes a partial conversion process and a final conversion process. The partial conversion process includes the steps of formatting data into a format which is suitable for storage in the registers of the register file, detecting whether the operand data to be operated on includes a special case exponent, and generating at least one special case flag which indicates whether or not a special case exponent has been detected.
The multiply accumulate unit is configured to perform an arithmetic operation on the operand data. The multiply accumulate unit is also configured to receive the results of the partial conversion process, including the special case flags, and to perform the final conversion process. Once the bypass component has performed the partial conversion process, the bypass component bypasses the results of the partial conversion process, including the special case flags, to the multiply accumulate unit. While those results are being bypassed to the multiply accumulate unit, the bypass component performs the final conversion process and writes the results of the final conversion process to the register file. The final conversion process completes the memory-format-to-register-format conversion.
While the bypass component is writing the results of the final conversion process to the register file, the multiply accumulate unit begins operating on the results of the partial conversion process and simultaneously performs the final conversion process. During the final conversion process, the multiply accumulate unit utilizes the special case flags in order to obtain a converted exponent value. While the multiply accumulate unit is performing the final conversion process, it begins performing the arithmetic operation on the partially converted results. The first phase of this arithmetic operation does not require the fully converted results. However, the second and third phases of this operation do require the fully converted results because they need the exponent value, which is obtained during the final conversion process. The final conversion process is completed by the time the multiply accumulate unit needs the fully converted results.
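The split between the partial and final conversion processes can be sketched in software as follows. This is only an illustrative model under stated assumptions: the memory format is taken to be IEEE 754 single precision, and the register-format bias and the special-case exponent constants (REG_BIAS, REG_EXP_ALL_ONES, REG_EXP_DENORMAL, REG_EXP_ZERO) are hypothetical values that are not specified in the text above.

```python
from dataclasses import dataclass

MEM_BIAS = 127                      # IEEE 754 single-precision exponent bias
REG_BIAS = 0xFFFF                   # hypothetical register-format bias
REG_EXP_ALL_ONES = 0x1FFFF          # hypothetical exponent for infinity/NaN
REG_EXP_DENORMAL = REG_BIAS - 126   # hypothetical constant for denormals
REG_EXP_ZERO = 0                    # hypothetical exponent for zero


@dataclass
class PartialResult:
    sign: int
    raw_exponent: int     # exponent field, not yet converted
    significand: int      # 24 bits with an explicit integer bit
    exp_all_zeros: bool   # special case flag
    exp_all_ones: bool    # special case flag


def partial_convert(bits):
    """Format the fields and raise the special case flags."""
    raw_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    exp_all_zeros = raw_exponent == 0
    integer_bit = 0 if exp_all_zeros else 1
    return PartialResult(
        sign=(bits >> 31) & 1,
        raw_exponent=raw_exponent,
        significand=(integer_bit << 23) | fraction,
        exp_all_zeros=exp_all_zeros,
        exp_all_ones=raw_exponent == 0xFF,
    )


def final_convert(p):
    """Use the flags to select the converted exponent value."""
    if p.exp_all_ones:
        return REG_EXP_ALL_ONES
    if p.exp_all_zeros:
        return REG_EXP_ZERO if p.significand == 0 else REG_EXP_DENORMAL
    return p.raw_exponent - MEM_BIAS + REG_BIAS  # ordinary re-biasing
```

In this model, the significand returned by `partial_convert` is usable immediately, which mirrors how the first phase of the arithmetic operation can start on the partially converted results, while `final_convert` supplies the exponent that the later phases need.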
These and other features and advantages of the present invention will become apparent from the following description, drawings and claims.