A floating point apparatus reduces the amount of time a data processing unit takes to perform arithmetic calculations. An American national standard has been developed in order to provide a uniform system of rules governing the implementation of floating point systems. This standard, ANSI/IEEE Standard No. 754-1985, is hereby incorporated by reference as background material. The standard specifies basic and extended floating point number formats, arithmetic operations, conversions between integer and floating point formats, conversions between different floating point formats, conversions between basic format floating point numbers and decimal strings, and the handling of certain floating point exceptions.
The typical floating point arithmetic operation may be accomplished in either single precision or double precision format. Each of these formats utilizes a sign, exponent and fraction/mantissa field, where the respective fields occupy predefined portions of the floating point number. In the case of a 32-bit single precision number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 8-bit field occupying the next-most significant bit positions; the mantissa field is a 23-bit field occupying the least significant bit positions. In the case of a 64-bit double precision floating point number, the sign field is a single bit occupying the most significant bit position; the exponent field is an 11-bit field occupying the next-most significant bit positions; the mantissa field is a 52-bit field occupying the least significant bit positions.
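The field layout described above can be illustrated with a minimal sketch. The function name `split_fields` is hypothetical and is introduced only for illustration; the sketch assumes the host encodes `float` values in the IEEE 754 basic formats, and the exponent it returns is the stored (biased) value.

```python
import struct

def split_fields(value, double=False):
    """Return (sign, stored_exponent, fraction) for a floating point value.

    Hypothetical helper: packs the value into the 32-bit or 64-bit format
    and slices out the three fields by bit position.
    """
    if double:
        bits = struct.unpack(">Q", struct.pack(">d", value))[0]
        exp_bits, frac_bits = 11, 52   # double precision field widths
    else:
        bits = struct.unpack(">I", struct.pack(">f", value))[0]
        exp_bits, frac_bits = 8, 23    # single precision field widths
    sign = bits >> (exp_bits + frac_bits)                  # most significant bit
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # next-most significant bits
    fraction = bits & ((1 << frac_bits) - 1)               # least significant bits
    return sign, exponent, fraction
```

For instance, `split_fields(1.0)` separates the 32-bit pattern of 1.0 into its single sign bit, 8-bit exponent, and 23-bit mantissa, while `split_fields(1.0, double=True)` does the same for the 1/11/52 bit double precision layout.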
Conventional floating point operations are performed in three steps: (i) pre-alignment, (ii) arithmetic operation, and (iii) normalization (these techniques are used for `addition-type` instructions, such as Add, Subtract, Compare, Multiply-Add, etc.; they are not generally used for such operations as Multiply or Divide). The pre-alignment step is used to align the mantissa portions of the floating point numbers to be operated upon, such that the respective exponents are equal in value. Normally, the number having the larger exponent remains unshifted, whereas the number having the smaller exponent has its mantissa portion shifted to the right a number of bit positions corresponding to the difference between the exponent values. For example, assume that a floating point number A having an exponent of 5 (meaning 2 to the 5th power) needs to be added to a floating point number B having an exponent of 3 (meaning 2 to the 3rd power). The number B will have its mantissa portion shifted to the right two bit locations, and the exponent of B will be increased by two, such that both exponents are now five. Note that the numbers still maintain the same value as they originally had, but are merely represented in a different internal floating point representation.
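The pre-alignment step may be sketched as follows. The function name `pre_align` and the use of plain integers for the mantissas (with the leading one bit held explicitly) are assumptions made for illustration only. The sketch simply discards any bits shifted out on the right, so a value is preserved only when no one bits are lost.

```python
def pre_align(mant_a, exp_a, mant_b, exp_b):
    """Return (mant_a, mant_b, common_exp) with both exponents made equal.

    Hypothetical helper: the operand with the smaller exponent has its
    mantissa shifted right by the exponent difference; the operand with
    the larger exponent is left unshifted.
    """
    if exp_a >= exp_b:
        shift = exp_a - exp_b
        return mant_a, mant_b >> shift, exp_a
    shift = exp_b - exp_a
    return mant_a >> shift, mant_b, exp_b

# Example from the text: A has an exponent of 5 and B has an exponent of 3,
# so the mantissa of B is shifted right two bit positions.
a_mant, b_mant, common_exp = pre_align(0b10100000, 5, 0b11000000, 3)
```

In this example the mantissa of B becomes `0b00110000` and the common exponent is 5, matching the two-position shift described above.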
After pre-alignment, the second step of conventional floating point arithmetic is performed. This step performs the arithmetic operation specified, such as addition, subtraction, multiplication, division, etc. Both the exponent and mantissa fields are operated upon.
The third traditional step is to normalize the data after the arithmetic operation. This step allows for maintaining the highest degree of precision of floating point numbers, as is commonly known to those of ordinary skill in the art. In standard systems, this normalization is accomplished by counting the number of leading zeroes contained in the resultant mantissa field. This count is then subtracted from the exponent field, and the mantissa field is shifted left by the same count, resulting in a one bit being in the most significant bit position of the mantissa field.
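The normalization step may be sketched as follows. The function name `normalize`, the `width` parameter, and the treatment of a zero mantissa (returned unchanged) are assumptions made for illustration and are not taken from any particular implementation.

```python
def normalize(mantissa, exponent, width=24):
    """Return (mantissa, exponent) with a one bit in the top mantissa position.

    Hypothetical helper: counts the leading zeroes of a width-bit mantissa,
    shifts the mantissa left by that count, and subtracts the count from
    the exponent.
    """
    if mantissa.bit_length() > width:
        raise ValueError("mantissa does not fit in the given width")
    if mantissa == 0:
        return 0, exponent  # no leading one exists; left as-is in this sketch
    leading_zeros = width - mantissa.bit_length()
    return mantissa << leading_zeros, exponent - leading_zeros

# An 8-bit mantissa with three leading zeroes and an exponent of 5.
norm_mant, norm_exp = normalize(0b00010110, 5, width=8)
```

Here the mantissa is shifted left three positions to `0b10110000` and the exponent is reduced from 5 to 2, so the represented value is unchanged.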
Numerous attempts have been made to reduce the amount of time required to perform floating point operations. Conventional methods perform addition of the addends, and then shift the resultant amount to remove leading zero bits. Improved techniques determine an approximate shift amount by analyzing the addends during addition. This results in time savings. Another similar technique analyzes the fractional result to predict when post-normalization and rounding can be eliminated.
Other techniques perform two operations in parallel, and choose one of the two resultants of these parallel operations. In a first path, the steps of pre-alignment and addition are performed. In the second parallel path, the steps of addition and post-normalization are performed. This is a performance improvement over the conventional method which does pre-alignment, addition, and post-normalization, as only two steps are performed in any given path. The two resultants are compared, at the end of the two parallel operations, to determine which resultant conforms to being normalized.
Other techniques anticipate leading zeros in parallel with the arithmetic unit, as disclosed in Hokenek, E. et al., "Leading-zero anticipator (LZA) in the IBM RISC System/6000 Floating-Point Execution Unit", IBM Journal of Research and Development, Volume 34, No. 1, January 1990, and Montoye, R. et al., "Design of the RISC System/6000 Floating Point Execution Unit", IBM Journal of Research and Development, Volume 34, No. 1, January 1990, both hereby incorporated by reference as background material. However, these techniques fail to accommodate large data paths in an efficient manner.
As data processing systems have grown in complexity, it has become necessary to increase the bus width, or data path, used to transfer information from one portion of the system to another. Original central processor units, or CPUs, had 4 and 8 bit bus widths. To increase system throughput, the bus widths have been increased in order to transfer more data in the same amount of time. This is desirable as, given a bus that has a maximum transfer rate, only a given number of data exchanges can occur in a given time period. By increasing the width of the bus, more information can be transferred at the same transfer rate. As such, 16 bit buses, as in the Intel 80286 microprocessor, and 32 bit buses, as in the Intel 80386 microprocessor, have become increasingly popular. Even larger bus widths are easily envisioned to be forthcoming in the near future.
These increases in bus data width cause increases in the die size of the microprocessors, however. As the die size is proportional to the overall manufacturing cost of a given integrated circuit component such as a microprocessor, these higher bus width components result in a proportionally higher cost. Further, the amount of functionality that can be placed on the integrated circuitry is reduced for devices that support these larger bus widths. This is because a 32 bit register takes approximately twice the die surface area of a 16 bit register. As less functionality can be maintained in a given device having a large data path, more integrated circuit components are required to maintain a given functionality. Thus, for data processing systems having large bus widths, numerous integrated circuit components are required. This results in higher costs to the end user.
Another driving force in the data processing system is the desire to continually integrate more and more functionality into a given integrated circuit component. As an example, computers today are of similar size to that of calculators manufactured ten years ago. Further, as previously discussed, as the transfer rates across buses approach their upper limits, larger data paths are needed to increase system throughput and performance. These larger data path requirements run counter to the quest for greater integration, as larger data paths require more devices. There is a real need to provide increases in data path widths supported by an integrated circuit component without a corresponding increase in integrated circuit component size.