Floating point units perform arithmetic operations such as addition, subtraction, multiplication, division, and square root on numerical operands represented in floating point notation. Floating point notation represents a number with a sign, a mantissa, and an exponent. A floating point unit ascertains the sign, mantissa, and exponent of an input floating point number by decoding its bit positions and then determining the sign, the numeric value of the mantissa, and the magnitude of the exponent from the decoded bits.
The IEEE promulgates standards (specifically ANSI/IEEE Std 754-1985) that govern the representation of numbers in floating point notation to ensure uniformity among floating point notation users. The standard includes single, double, and extended precision formats as well as a special case called denormalized format. These formats determine the number of significant bits and the size of the bit field for any number represented in floating point notation. For example, the double precision format defines 64 bits for operands, with one bit representing the sign, 11 bits representing the magnitude of the exponent, and 52 bits representing the numeric value of the mantissa. Similarly, the single precision format defines 32 bits for operands, with one bit representing the sign, 8 bits representing the magnitude of the exponent, and 23 bits representing the numeric value of the mantissa.
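The field layouts above can be decoded with a few shifts and masks. The following is a minimal sketch using Python's standard `struct` module to reinterpret a value's encoding as an unsigned integer; the function name `decode` and the `FIELD_WIDTHS` table are hypothetical, chosen for illustration.

```python
import struct

# (exponent bits, fraction bits) for the two formats described above.
FIELD_WIDTHS = {"single": (8, 23), "double": (11, 52)}


def decode(x: float, fmt: str = "double") -> tuple[int, int, int]:
    """Split x into its raw (sign, biased exponent, fraction) bit fields."""
    exp_bits, frac_bits = FIELD_WIDTHS[fmt]
    # Reinterpret the IEEE encoding as an unsigned integer of the same width.
    if fmt == "double":
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    else:
        (raw,) = struct.unpack(">I", struct.pack(">f", x))
    fraction = raw & ((1 << frac_bits) - 1)
    exponent = (raw >> frac_bits) & ((1 << exp_bits) - 1)
    sign = raw >> (frac_bits + exp_bits)
    return sign, exponent, fraction
```

For a normalized number the mantissa is 1.fraction (the leading one is implicit) and the unbiased exponent is the exponent field minus 127 (single) or 1023 (double). An exponent field of 0 marks the denormalized format: `decode(5e-324)` returns `(0, 0, 1)`.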
Under IEEE formats having a normalized mantissa, the exponents of two operands are seldom equal. Accordingly, when a floating point unit adds a second operand to, or subtracts it from, a first operand, the mantissa of the second operand typically must be shifted, because the addition or subtraction cannot be performed until the exponent of the second operand equals the exponent of the first operand. The floating point unit equalizes the first and second exponents by shifting the mantissa of the second operand relative to the mantissa of the first operand. Each right shift of the second mantissa increases its exponent by one, while each left shift decreases its exponent by one.
Floating point units typically include a comparator and alignment shifter for shifting the second mantissa such that the second exponent equals the first exponent. The comparator compares the values of the first and second exponents to determine the number of shifts the second mantissa requires to equalize the first and second exponents.
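A minimal sketch of the comparator and alignment shifter follows. It assumes integer mantissas whose represented value is m * 2**e, a convention chosen here for illustration; `compare_exponents` and `align` are hypothetical names. Bits shifted out on the right are simply dropped in this sketch.

```python
def compare_exponents(e1: int, e2: int) -> int:
    """Comparator: signed shift count that makes the second exponent equal the first.

    Positive means shift the second mantissa left; negative means shift it right.
    """
    return e2 - e1


def align(m2: int, e2: int, e1: int) -> tuple[int, int]:
    """Alignment shifter for a mantissa whose value is m2 * 2**e2.

    Returns the shifted mantissa and its new exponent, which equals e1.
    Bits shifted out on the right are discarded.
    """
    shift = compare_exponents(e1, e2)
    if shift >= 0:
        # Each left shift decreases the exponent by one.
        return m2 << shift, e2 - shift
    # Each right shift increases the exponent by one.
    return m2 >> -shift, e2 - shift
```

For example, `align(0b1010, -2, 0)` returns `(0b10, 0)`: the second mantissa moves two places right and its exponent rises from -2 to 0, losing the two low bits.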
The alignment shifter has a bit field wider than the bit field of the first mantissa in order to accommodate any bits of the second mantissa not aligned with the bits of the first mantissa. The first and second mantissas do not align whenever the first and second exponents differ in magnitude. Specifically, if the second exponent exceeds the first exponent, at least one bit of the second mantissa resides within the alignment shifter in bit positions to the left of the most significant bit of the first mantissa (hereinafter referred to as second path bits). Conversely, if the second exponent is less than the first exponent, at least one bit of the second mantissa resides in bit positions to the right of the least significant bit of the first mantissa (hereinafter referred to as sticky bits).
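The three regions of the wide alignment field can be modeled by splitting the shifted second mantissa at the first mantissa's boundaries. This is a sketch under the assumption that the first mantissa occupies bit positions 0 through `width - 1`; the name `split_aligned` is hypothetical.

```python
def split_aligned(m2: int, shift: int, width: int) -> tuple[int, int, int]:
    """Partition the second mantissa after aligning it by `shift` positions.

    The first mantissa is assumed to occupy bit positions 0..width-1.
    shift > 0: second exponent exceeds the first, so shift left.
    shift < 0: second exponent is smaller, so shift right.

    Returns (second_path_bits, aligned_bits, sticky_bits), where sticky_bits
    holds the -shift bits that fall to the right of bit position 0.
    """
    if shift >= 0:
        shifted = m2 << shift
        sticky = 0
    else:
        sticky = m2 & ((1 << -shift) - 1)   # bits below the first mantissa's LSB
        shifted = m2 >> -shift
    aligned = shifted & ((1 << width) - 1)  # bits positioned over the first mantissa
    second_path = shifted >> width          # bits left of the first mantissa's MSB
    return second_path, aligned, sticky
```

With a 4-bit first mantissa, shifting `0b1011` left by 2 yields second path bits `0b10` and aligned bits `0b1100`; shifting it right by 2 yields aligned bits `0b10` and sticky bits `0b11`.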
Floating point units include an adder that performs the desired operation (i.e., addition or subtraction) on the first mantissa and on any bits of the second mantissa residing in bit positions aligned with the bits of the first mantissa, producing an intermediate result. The adder output is only an intermediate result because it must be modified if shifting the second mantissa created second path bits. If second path bits exist, the intermediate result is modified by placing the least significant bit of the second path bits in the bit position to the left of the most significant bit of the intermediate result, so that the final result reflects all the bits of the second mantissa.
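For the addition case, the adder and the merge of the second path bits can be sketched as below. The sketch adds the second path bits at bit position `width` rather than concatenating them; this reduces to the placement described above when the adder produces no carry-out, and also absorbs a carry when it does. Subtraction is not covered, and the name `add_and_merge` is hypothetical.

```python
def add_and_merge(m1: int, second_path: int, aligned: int, width: int) -> int:
    """Adder plus merge step for an addition of two aligned mantissas.

    The adder sums only the bits positioned over the first mantissa
    (bit positions 0..width-1). The second path bits are then placed
    starting at bit `width`, just left of the intermediate result's field.
    """
    intermediate = m1 + aligned                 # may carry into bit `width`
    return (second_path << width) + intermediate
```

Using the values from the alignment example with `m1 = 0b1001`, `add_and_merge(0b1001, 0b10, 0b1100, 4)` equals `0b1001 + (0b1011 << 2)`, i.e. the sum of the first mantissa and the full shifted second mantissa.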
Floating point units include a normalize shifter that normalizes the final result by shifting it until its leading one resides to the left of the most significant bit of the normalize shifter. Before the normalize shifter can do so, however, a normalize shift value must be calculated from the position of the leading one within the second path bits. Consequently, floating point units include a leading ones detector (LOD) that receives the second path bits, determines the position of the leading one, and calculates the normalize shift value that controls the normalize shifter.
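A leading ones detector and the shift value it produces can be sketched with Python's `int.bit_length`. The convention here, placing the leading one in the top bit position `width - 1`, is an assumption; a design that shifts it one position further, out of the field as an implicit bit, would add one to each shift value. The names `leading_one_position` and `normalize` are hypothetical.

```python
def leading_one_position(value: int) -> int:
    """Leading ones detector: index of the most significant 1 bit (-1 if none)."""
    return value.bit_length() - 1


def normalize(value: int, width: int) -> tuple[int, int]:
    """Shift `value` so its leading one sits in bit position width-1.

    Returns (normalized_value, shift). A positive shift is a left shift,
    a negative shift is a right shift; the exponent must be adjusted by
    -shift so the represented number is unchanged.
    """
    lead = leading_one_position(value)
    if lead < 0:
        return 0, 0                      # zero has no leading one
    shift = (width - 1) - lead           # normalize shift value
    if shift >= 0:
        return value << shift, shift
    return value >> -shift, shift        # right shift drops low-order bits
```

For the merged result `0b110101` in an 8-bit field, the detector finds the leading one at position 5 and `normalize` returns `(0b11010100, 2)`.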
Floating point units also include an adjust circuit that supplies the final sign, exponent, and mantissa required for IEEE-correct results.
If the first exponent exceeds the second exponent, sticky bits rather than second path bits exist. Consequently, the intermediate result output from the adder requires no modification and forms the final result. A leading ones detector similar to the LOD described above determines the position of the leading one within the intermediate result and calculates a normalize shift value accordingly. The normalize shifter normalizes the unmodified final result and outputs the normalized final result to the adjust circuit, which rounds the final result using the sticky bits.
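The text does not say which rounding mode the adjust circuit applies. Assuming IEEE round-to-nearest-even (the standard's default mode), rounding with the bits that fell below the mantissa's least significant bit can be sketched as follows; the name `round_nearest_even` is hypothetical.

```python
def round_nearest_even(mantissa: int, low_bits: int, n_low: int) -> int:
    """Round `mantissa` using the `n_low` bits that fell below its LSB.

    low_bits is assumed to satisfy 0 <= low_bits < 2**n_low.
    Ties (low bits exactly one half) round to the even mantissa.
    """
    if n_low == 0:
        return mantissa
    half = 1 << (n_low - 1)
    if low_bits > half or (low_bits == half and mantissa & 1):
        return mantissa + 1   # may carry into a new leading bit; renormalize if so
    return mantissa
```

With two dropped bits, `0b11` rounds `0b1010` up to `0b1011`, while the tie `0b10` leaves the even `0b1010` unchanged and rounds the odd `0b1011` up to `0b1100`.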
One existing floating point processing system includes a multiple-stage processing pipeline for executing instructions and circuitry within the pipeline for detecting the magnitude of the result. When the result can be represented in the normalized fields described above, it is made available for storage or further processing. When the result is so small that the exponent field of the normalized result would be less than hex 001, order of magnitude information is lost when the result is stored into a memory organized according to those fields. The exponent must therefore be adjusted in some way to avoid losing this information.
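The magnitude check described above amounts to comparing the biased exponent against hex 001. A minimal sketch, assuming the double precision format (bias 1023) and a result already expressed as 1.f * 2**true_exponent; `underflows` and the constant names are hypothetical.

```python
DOUBLE_BIAS = 1023
MIN_NORMAL_BIASED_EXP = 0x001   # smallest exponent field of a normalized double


def underflows(true_exponent: int) -> bool:
    """Return True if 1.f * 2**true_exponent cannot be stored as a normalized
    double, i.e. its biased exponent would be below hex 001."""
    return true_exponent + DOUBLE_BIAS < MIN_NORMAL_BIASED_EXP
```

`underflows(-1022)` is False, since 2**-1022 is the smallest normalized double, while `underflows(-1023)` is True.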
In one known floating point processing system, the registers surrounding the floating point unit have extended exponent field bit storage as well as a marker bit indicating that the stored result is not in the standard denormalized format but is too small to be stored in the normalized format. The marked result remains available as an operand for subsequent instructions, but before it can be stored in memory it must be sent back through hardware that converts it to denormalized format. The extra bits in the registers require that the random test pattern generator used for error detection and correction also allow for these extra bit positions. Further, the surrounding elements of the processor must be designed to account for the marker bits, so the complexity introduced by the marker bit approach multiplies.
In another floating point processing system conforming to the IEEE standard, a result this small is detected as an underflow exception and trapped for handling by software. The trap handler adjusts the exponent bias by a fixed constant, as described in section 7.4 of ANSI/IEEE Std 754-1985, so that order of magnitude information is not lost while the result is held in a register whose exponent field is too small to represent the exponent accurately in normalized format. A program routine, usually more than twenty instructions long, must then split the exponent and mantissa into two integers and operate on them to convert the biased trapped result into the standard denormalized format.
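A software conversion of this kind can be sketched as follows. The sketch assumes the double precision format, where section 7.4 specifies an underflow bias adjust of 1536 (the trapped value is the result multiplied by 2**1536), and it assumes round-to-nearest-even when bits are shifted out. It is an illustrative stand-in, not the routine the text refers to; `denormalize_trapped` and the constants are hypothetical names.

```python
import struct

ALPHA = 1536     # underflow bias adjust for double (ANSI/IEEE 754-1985, 7.4)
FRAC_BITS = 52   # fraction field width of a double


def denormalize_trapped(trapped: float) -> float:
    """Convert a bias-adjusted underflow result (true value * 2**ALPHA) to a denormal.

    Splits the trapped double into sign, exponent and significand integers,
    undoes the bias adjust, shifts the significand right into the denormal
    position, and rounds to nearest even. A round-up that carries into bit
    52 correctly produces the smallest normalized double.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", trapped))
    sign = bits >> 63
    biased_exp = ((bits >> FRAC_BITS) & 0x7FF) - ALPHA                # undo bias adjust
    significand = (bits & ((1 << FRAC_BITS) - 1)) | (1 << FRAC_BITS)  # restore hidden 1
    shift = 1 - biased_exp        # denormals use exponent 1 - 1023 with no hidden 1
    if shift <= 0:
        raise ValueError("result does not underflow")
    fraction = significand >> shift
    remainder = significand & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and fraction & 1):
        fraction += 1
    (result,) = struct.unpack(">d", struct.pack(">Q", (sign << 63) | fraction))
    return result
```

For example, the trapped value `3 * 2.0**462` converts to `3 * 5e-324`, three units of the smallest denormal.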