The present invention relates to residue calculation and correction within a floating-point unit (FPU) of a microprocessor, and more specifically, to residue calculation with built-in correction eliminating a need for additional circuitry or logic delay.
A conventional FPU of a microprocessor typically includes a residue checking apparatus which performs residue checking for detecting errors in arithmetic floating-point operations such as addition, subtraction, multiplication, division, square root or convert operations. The residue checking is performed within a checking flow by performing the same operations on the residue as those performed on the operands of the FPU. That is, a checking flow is performed in parallel to a data flow within the FPU. In FIG. 1, a data flow 1 and a checking flow 2 of a conventional residue checking apparatus for an FPU are shown. Operands A, B and C are provided by an input register 3 in the data flow 1. The operands A, B and C are processed by different functional elements 4, e.g., an aligner 21 and a normalizer 22, and a result is provided by a result register 5.
Residues are generated at specified positions within the data flow 1 by residue generators 6. When performing residue checking of the FPU, several residue calculations are performed via the checking flow 2 in parallel with the data flow 1 performing the operations on the data. To this end, modulo decoders 7 are connected to the residue generators 6 and provide residue modulos to different functional elements 8 such as a modulo multiplier 16, modulo adder 17, modulo subtract 18, modulo add/sub 19, and modulo subtract 20 within the checking flow 2. In the first stage 10 of the checking flow 2, the residue modulos of operands A and B are multiplied by the modulo multiplier 16. In the second stage 11, the residue modulo from operand C is added to the product-residue modulo from stage 10 via the modulo adder 17. In the third stage 12, the residue modulo of bits lost at the aligner 21 is subtracted by the modulo subtract 18 from the sum of the second stage 11. During the residue checking operation, residue corrections to the actual residue value corresponding to the manipulated data in the data flow 1 may be necessary. For example, a small correction amount such as +/−1 may be necessary. Therefore, in the fourth stage 13, residue correction of +/−1 is performed by the modulo add/sub 19. Then, in the fifth stage 14, a residue-subtract of bits lost at the normalizer 22 is performed by the modulo subtract 20. In the sixth stage 15, a single check operation is performed by a compare element 9. The compare element 9 compares the result provided by the modulo subtract 20 with the residue modulo of the result provided by the result register 5 of the data flow 1.
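The six stages of the checking flow 2 can be illustrated with a minimal software sketch. It is not the circuitry of FIG. 1: it assumes a modulo base of 15, assumes the addend of the second stage is the third operand C, and models the data flow as plain integer arithmetic in which the "lost bits" and the correction are simply amounts removed from or added to the exact value. The names M, residue, predicted_residue and residue_check are hypothetical and chosen for illustration only.

```python
M = 15  # assumed modulo base (illustrative; FIG. 2 also uses modulo 15)


def residue(x: int) -> int:
    """Residue of an integer value modulo M."""
    return x % M


def predicted_residue(a, b, c, lost_align, correction, lost_norm):
    """Mirror the stages 10 to 14 of the checking flow on residues only."""
    r = (residue(a) * residue(b)) % M      # stage 10: modulo multiply
    r = (r + residue(c)) % M               # stage 11: modulo add (operand C assumed)
    r = (r - residue(lost_align)) % M      # stage 12: subtract bits lost at the aligner
    r = (r + correction) % M               # stage 13: correction of +1 or -1
    r = (r - residue(lost_norm)) % M       # stage 14: subtract bits lost at the normalizer
    return r


def residue_check(result, a, b, c, lost_align, correction, lost_norm):
    """Stage 15: compare the predicted residue with the residue of the result."""
    predicted = predicted_residue(a, b, c, lost_align, correction, lost_norm)
    return predicted == residue(result)


# Toy data flow: exact integer arithmetic standing in for the FPU data path.
a, b, c = 1234, 567, 89
lost_align, correction, lost_norm = 3, -1, 5
result = a * b + c - lost_align + correction - lost_norm

print(residue_check(result, a, b, c, lost_align, correction, lost_norm))      # fault-free result
print(residue_check(result ^ 1, a, b, c, lost_align, correction, lost_norm))  # single-bit error
```

In this sketch a fault-free result passes the compare, while a result with its least significant bit flipped differs by 1 and therefore fails it. Errors that change the result by a multiple of the modulo base would go undetected by such a check.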
Each residue generator 6 includes a residue-generation tree 23 as shown in FIG. 2, which illustrates a conventional modulo 15 residue-generation tree 23 by way of example. Modulo bases other than 15 are also utilized. As shown in FIG. 2, register-bits of an operand register 24 carry 32 bits of an operand, starting with the most significant bit (MSB) in the register-bit indicated with “0” on the left, and ending with the least significant bit (LSB) in the register-bit indicated with “31” on the right. The residue-generation tree 23 includes a plurality of modulo 15 decoders 26 and a plurality of residue condensers 28. Each modulo 15 decoder 26 is connected with four adjacent register-bits of the operand register 24 for receiving in parallel four bits of numerical data. Every adjacent pair of modulo 15 decoders 26 is connected to a residue condenser 28 of a first stage. Further, each residue condenser 28 of a subsequent stage is connected to two residue condensers 28 from the previous stage. According to m = 2^b − 1, a number of segment bits b = 4 is required to obtain a modulo base m = 15. According to w = p*b, an operand width of w = 32 yields a number of segments p = 8. In the conventional residue-generation tree 23, the number input into the residue generator 6 typically does not use all of the input bits because floating-point data include a mantissa and an exponent, and the exponent is extracted and handled separately. Therefore, the register-bits at the entrance that contain the exponent bits (as indicated by the arrows 29), located at the MSB or LSB end of the number, are filled with logical zeros and are not used to generate a residue value.
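The structure of such a modulo 15 residue-generation tree can be sketched in software as follows. This is a behavioral illustration under stated assumptions, not the hardware of FIG. 2: the function names mod15_decode, condense and residue_tree_mod15 are hypothetical, and the operand is treated as an unsigned 32-bit integer. The sketch relies on the fact that 16 leaves a remainder of 1 when divided by 15, so the residue of the operand equals the residue of the sum of its 4-bit segments.

```python
def mod15_decode(segment: int) -> int:
    """Modulo 15 decoder: residue of one 4-bit segment (the value 15 maps to 0)."""
    return segment % 15


def condense(left: int, right: int) -> int:
    """Residue condenser: combine two partial residues into one."""
    return (left + right) % 15


def residue_tree_mod15(operand: int, width: int = 32) -> int:
    """Reduce a width-bit operand to its modulo 15 residue, stage by stage."""
    b = 4                       # segment bits, since m = 2^b - 1 = 15
    p = width // b              # number of segments, w = p * b
    assert p * b == width and p & (p - 1) == 0, "sketch assumes a power-of-two segment count"

    # First level: one decoder per group of four adjacent bits.
    level = [mod15_decode((operand >> (b * i)) & 0xF) for i in range(p)]

    # Following levels: each condenser merges an adjacent pair (8 -> 4 -> 2 -> 1).
    while len(level) > 1:
        level = [condense(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


value = 0xDEADBEEF
print(residue_tree_mod15(value) == value % 15)

# Input bits tied to logical zero contribute nothing to the residue.
masked = value & 0x00FFFFFF    # upper eight bits forced to zero (illustrative mask)
print(residue_tree_mod15(masked) == masked % 15)
```

For a 32-bit operand the sketch uses eight decoders followed by three levels of condensers, and segments that are forced to zero pass through the tree without affecting the final residue.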
Also, to save design and circuitry work in the floating-point unit, the same residue-generating macro is typically used multiple times for the different residue-generation points within the unit. Since these residue generators do not all need the full width of the data flow, some of their input bits are typically unused and are tied to zero.