The present invention relates to a floating-point unit and, more particularly, to a floating-point unit that is capable of utilizing standard MAC units for performing operations on traditional data type formats and on SIMD data type formats.
As processor speeds and data sizes increase, a critical bottleneck in computation performance for floating-point operations exists with respect to the amount of data that can be brought into the floating-point unit at any one time. With the evolution of processor architectures to 64-bit architectures and greater, the impact of this bottleneck can only be reduced by either utilizing more data load ports, and thus more load bandwidth, or by dividing the 64-bit data into smaller pieces and performing multiple operations on these smaller pieces. This latter technique is particularly useful for performing many small operations that do not require precision as great as one 64-bit floating-point number, which is referred to in the Institute of Electrical and Electronics Engineers (IEEE) floating-point form standard as a double word. For example, in typical graphics display operations, floating-point operations are computationally intensive, but do not require the range that a 64-bit number is capable of representing. Therefore, this latter method of dividing the data into smaller pieces and operating on these smaller pieces can be used advantageously in this type of environment.
Some known architectures that are designed to implement this technique utilize what is commonly referred to as single instruction, multiple data (SIMD) operations. A SIMD instruction causes identical operations to be performed on multiple pieces of data at the same time, i.e., in parallel. Storing smaller pieces of data in one larger register is a more efficient use of die area than storing the smaller pieces of data in a plurality of smaller registers. Therefore, SIMD operations are normally performed on the smaller data pieces in a single, larger register simultaneously. Also, it is necessary to perform the SIMD operations on the smaller data pieces at the same time in order to meet the requirements of SIMD operations.
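As an illustration only, the idea of holding two smaller values in one larger register and applying the same operation to both can be sketched in Python as follows. The function names `pack2`, `unpack2` and `simd_add`, and the choice of placing the first value in bits 63 to 32, are assumptions made for this sketch and are not taken from any particular architecture. The arithmetic is performed in Python floats and rounded to single precision when the result is packed, so the sketch models the data layout rather than the timing or exact behavior of hardware.

```python
import struct

def pack2(hi, lo):
    """Pack two single-precision values into one 64-bit word (hi in bits 63..32)."""
    hi_bits, lo_bits = struct.unpack(">II", struct.pack(">ff", hi, lo))
    return (hi_bits << 32) | lo_bits

def unpack2(word):
    """Split a 64-bit word into its two single-precision values (hi, lo)."""
    raw = struct.pack(">II", (word >> 32) & 0xFFFFFFFF, word & 0xFFFFFFFF)
    return struct.unpack(">ff", raw)

def simd_add(a, b):
    """Apply the same add to both 32-bit lanes of two 64-bit words."""
    a_hi, a_lo = unpack2(a)
    b_hi, b_lo = unpack2(b)
    # One instruction, two data elements: each lane is added independently.
    return pack2(a_hi + b_hi, a_lo + b_lo)
```

For example, `unpack2(simd_add(pack2(1.5, 2.0), pack2(0.25, 4.0)))` yields the two lane sums as a pair.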
Processor architectures are currently being designed to support both traditional and SIMD type data formats. Traditional data type formats typically have wider bit sizes than SIMD data type formats. In order to support both of these types of operations, SIMD and standard functional units have been implemented in these architectures for processing traditional and SIMD data type formats. These functional units, one type of which is commonly referred to as multiply accumulate (MAC) blocks, perform various types of arithmetic functions, such as, for example, adds, subtracts and multiplies on the data presented to them. The primary reason for utilizing dedicated MACs for handling SIMD operations is that these dedicated MACs are capable of simultaneously performing two SIMD operations. However, implementing these dedicated SIMD MACs in a floating-point unit is costly in terms of the amount of additional die area consumed by the SIMD MACs. Furthermore, since SIMD operations typically represent less than approximately five percent of all operations performed by the floating-point unit, the tradeoff of die area for processing throughput is expensive.
Accordingly, a need exists for a floating-point unit which is capable of operating on multiple data type formats and which does not require dedicated hardware for each of the different data type formats.
The present invention provides a method and apparatus for performing floating-point operations. The apparatus of the present invention comprises a floating-point unit having two standard multiply accumulate units (MACs) which are capable of performing multiply accumulate operations on a plurality of data type formats. The standard MACs are configured to operate on traditional data type formats and on single instruction multiple data (SIMD) type formats. Therefore, dedicated SIMD MAC units are not needed, thus allowing a significant savings in die area to be realized.
In accordance with the present invention, when a SIMD instruction is to be operated on by the MAC units, the data is presented to both the upper and lower MAC units as 64-bit words. Each MAC unit also receives one or more bits which cause that MAC unit to select either the upper halves or the lower halves of the 64-bit words, depending on the MAC unit. For example, the lower 32-bit words may be processed by the upper MAC unit and the upper 32-bit words may be processed by the lower MAC unit.
Each MAC unit operates on its respective 32-bit words. The results of the operations performed by the MAC units are then coalesced by the bypass blocks of the floating-point unit into a 64-bit word. The results are coalesced in such a manner that the results appear identical to the results obtained in floating-point units which utilize dedicated SIMD hardware.
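The select-and-coalesce behavior described above can be sketched, purely for illustration, in Python. All names here (`select_half`, `mac_unit`, `simd_mac`, the `select_upper` flag) are hypothetical and stand in for the select bits and bypass blocks; the sketch assumes a multiply accumulate of the form a*b+c and follows the example mapping in which the upper MAC unit processes the lower halves and the lower MAC unit processes the upper halves. The arithmetic is done in Python floats and rounded once to single precision, which approximates but is not guaranteed to match bit-for-bit the rounding of a hardware MAC.

```python
import struct

MASK32 = 0xFFFFFFFF

def bits_to_f32(bits):
    """Interpret the low 32 bits of an integer as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & MASK32))[0]

def f32_to_bits(x):
    """Round a Python float to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def select_half(word, select_upper):
    """Model of the select bit: pick the upper or lower 32 bits of a 64-bit word."""
    return (word >> 32) & MASK32 if select_upper else word & MASK32

def mac_unit(a, b, c, select_upper):
    """One standard MAC: sees full 64-bit operands, operates on its selected halves."""
    fa, fb, fc = (bits_to_f32(select_half(w, select_upper)) for w in (a, b, c))
    return f32_to_bits(fa * fb + fc)

def simd_mac(a, b, c):
    """Both MACs receive the same 64-bit words; the results are then coalesced."""
    # Example mapping from the text (an assumption, other mappings are possible):
    from_upper_mac = mac_unit(a, b, c, select_upper=False)  # lower 32-bit words
    from_lower_mac = mac_unit(a, b, c, select_upper=True)   # upper 32-bit words
    # Bypass step: each 32-bit result returns to the half its operands came from,
    # so the 64-bit word looks as if dedicated SIMD hardware had produced it.
    return (from_lower_mac << 32) | from_upper_mac
```

A caller would build each 64-bit operand from two single-precision bit patterns, for example `(f32_to_bits(1.5) << 32) | f32_to_bits(2.0)`, and read the two results back with `bits_to_f32(result >> 32)` and `bits_to_f32(result)`.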
These and other features and advantages of the present invention will become apparent from the following description, drawings and claims.