1. Field of the Invention
The invention relates to methods and systems for floating-point operations and has been developed with particular attention to its use in VLSI (Very Large-Scale Integration) circuit implementation of signal processing applications.
2. Description of the Related Art
Floating-point arithmetic support is a major asset in enabling easy and effective implementation of modern multimedia and signal processing applications in VLSI circuits. A floating-point signed adder, able to perform both addition and subtraction, is the basic arithmetic operator in many signal processing applications.
VLSI implementation of such an adder involves a very large number of logic modules, which perform the basic steps of operand (significand) alignment, integer addition, re-normalization, and rounding.
Floating-point operators are preferably implemented using a pipeline architecture, which drastically increases the maximum throughput. A general-purpose microprocessor (such as, e.g., the Intel Pentium® microprocessor) employs a deep pipeline, which allows floating-point operations to be executed over several clock cycles. Such a processor typically includes more than ten pipeline stages. Its operation does not rely exclusively on floating-point (FP) adders and multipliers, in that the processor also includes complex circuits such as dividers and square root extraction circuits, e.g., for use in image processing. The standard literature on FP adders regards an adder with 4-5 stages as a “good” adder, while high-speed adders typically include 3 stages.
Embedded cores (such as the ST230, also called LX-Mobile) have a shallower pipeline, which forces them to perform floating-point operations in a few clock cycles. Specifically, in the case of the LX-Mobile, the three-stage structure is purely notional, in that the first and third stages are partly occupied by external logic circuitry. The equivalent latency, expressed as the ratio of the total delay to the clock period, is around 2.25 clock cycles.
High-speed floating-point addition procedures typically employ a Leading Zero Anticipatory (LZA) logic circuit to perform part of the re-normalization process in parallel with the integer addition. LZA circuits are currently included in commercial solutions such as the Super H (ST-Hitachi) and the IBM RISC System/6000. Unfortunately, this approach introduces a small precision error in the results.
High-speed floating-point signed addition is a demanding task for a processor core that allots only a few clock cycles to it in the execution pipeline. Unfortunately, this important operation comprises several sub-tasks that account for the largest latencies in a VLSI circuit implementation.
In particular, the problem of counting the leading zeroes produced by the integer adder arises whenever a subtraction (an addition of operands with opposite signs) is performed on two floating-point numbers. In that case the “absolute” result may have several leading zeroes, i.e., it is un-normalized. A re-normalization unit is therefore used to count the leading zeroes and shift the un-normalized result according to the leading-zero count.
Generally, this operation costs one clock cycle of latency. Leading zero anticipatory logic allows the leading zero count to be computed in parallel with the integer sum.
In FIG. 1A an integer sum operation is schematically shown as performed in an adder module 30, starting from operands (significands) stored in two registers 10 and 20. The result of the sum operation is fed to a leading zero counter 40 and after a clock cycle the output of the leading zero counter 40 is used to perform the normalization of the sum in a shifter module 50. Finally, the result of the normalization is stored in a register 60.
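The FIG. 1A sequence can be modelled behaviourally as below. This is a minimal sketch, assuming an 8-bit adder and plain unsigned bit strings; the function names are hypothetical, and the code mirrors only the data flow (add, then count, then shift), not the timing or circuitry of modules 30, 40 and 50.

```python
WIDTH = 8  # assumed adder width, chosen only for illustration


def count_leading_zeros(value, width=WIDTH):
    """Model of the leading zero counter 40: zeroes above the most significant 1."""
    value &= (1 << width) - 1
    return width - value.bit_length()


def add_then_normalize(a, b, width=WIDTH):
    """FIG. 1A order: adder 30, then counter 40, then shifter 50.

    Returns (normalized_sum, leading_zero_count).
    """
    mask = (1 << width) - 1
    total = (a + b) & mask                  # adder 30 (result kept to `width` bits)
    lz = count_leading_zeros(total, width)  # counter 40 needs the complete sum
    if lz == width:                         # an all-zero sum has no leading 1 to align
        return 0, lz
    return (total << lz) & mask, lz         # shifter 50 moves the leading 1 to the MSB
```

For instance, add_then_normalize(0b00001111, 0b00000001) yields (0b10000000, 3). Because count_leading_zeros needs the complete sum, the count cannot start before the addition ends; this serial dependency is the clock cycle that the anticipatory module of FIG. 1B aims to save.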
In the arrangement shown in FIG. 1B, a leading zero anticipatory module 45 operates in parallel with the adder module 30. In this case, the shifter module 50 can perform the normalization of the sum in the subsequent clock cycle, saving one clock cycle.
Background literature concerning the Leading Zero Anticipatory (LZA) approach includes T. Chang, J. Huang and S. Yang, “Leading-zero anticipatory logics for fast floating additions with carry propagation signal”, IEEE 1997, and H. Suzuki, H. Morinaka, H. Makino, Y. Nakase, K. Mashiko, and T. Sumi, “Leading Zero anticipatory logic for high-speed floating-point additions”, IEEE Journal of Solid State Circuits, Vol. 31 199. These articles explain, among other things, how the circuitry produces a wrong estimate when subtracting two nearby operands (significands).
As indicated, an integer adder for floating-point operands (significands) is an important element of a floating-point signed adder. This circuit is dedicated to executing the operand (significand) addition, which is preceded by operand (significand) alignment so that the two floating-point numbers are brought to the same exponent value.
In general terms, integer addition admits different solutions depending on the design criterion: high-speed, low-area, or low-power adders. Basically, a number of different solutions are known and currently used, namely:
Carry Ripple Adders,
Carry Look Ahead and Brent-Kung approach, and
Carry Skip Adders.
More to the point, one may distinguish a first category of solutions (Carry Ripple Adders) where the carry signal is propagated from one full adder (FA) to the next. This is the simplest way of performing integer addition, but it suffers from large latencies.
The Carry Look Ahead (CLA) and Brent-Kung approaches directly compute the carry input of each full adder, without propagating this signal from one full adder to the next. This solution entails notable area consumption for “deep” integer adders, but drastically reduces latency, allowing operation at higher frequencies.
FIG. 2 shows a typical Carry Ripple Adder layout. This circuit is the slowest solution for integer addition, being based on carry propagation from each full adder to the adjacent one. The total latency is therefore M times the latency of a single full adder, where M is the number of full adders involved.
In particular, each sum and carry bit follows the Boolean expressions:

$$S_i = A_i \oplus B_i \oplus C_i \tag{1}$$

$$C_i = A_{i-1} \cdot B_{i-1} + (A_{i-1} \oplus B_{i-1}) \cdot C_{i-1} \tag{2}$$
The symbol ⊕ represents the Exclusive OR (XOR) operator, Si represents the sum produced by adding the operands Ai, Bi at the i-th level of the adder and Ci represents the carry (possibly) associated thereto.
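Equations (1) and (2) translate directly into a bit-serial loop. The sketch below is a behavioural model only (bits as Python ints, index 0 as the least significant bit, helper names hypothetical); it makes visible that each stage must wait for the carry produced by the previous one.

```python
def to_bits(value, width):
    """Integer -> list of bits, index 0 = least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (index 0 = LSB) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Carry Ripple Adder following equations (1) and (2).

    Returns (sum_bits, carry_out). Each iteration needs the carry produced
    by the previous one, so the delay grows linearly with the word length M.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    s_bits = []
    c = c0                               # C_0, carry into the first stage
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)         # (1): S_i = A_i XOR B_i XOR C_i
        c = (a & b) | ((a ^ b) & c)      # (2): carry into the next stage
    return s_bits, c
```

For example, from_bits(ripple_carry_add(to_bits(0b01000111, 8), to_bits(0b00001111, 8))[0]) gives 0b01010110.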
FIG. 3 illustrates an example of Carry Look Ahead implementation. This circuit does not propagate the carry from one full adder to the next. Instead, it computes the carries from two signals, called the “generate” wire (Gi) and the “propagate” wire (Pi), which depend only on the current inputs Ai and Bi:

$$G_i = A_i \cdot B_i \tag{3}$$

$$P_i = A_i \oplus B_i \tag{4}$$

$$C_{i+1} = G_i + P_i \cdot C_i \tag{5}$$

$$S_i = P_i \oplus C_i \tag{6}$$
Here again, ⊕ denotes the Exclusive OR (XOR) operator, while + and · denote the OR and AND operators, respectively; Si and Ci have the same meaning as above.
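A sketch of the generate/propagate formulation follows, assuming the carry recurrence is indexed as Ci+1 = Gi + Pi·Ci, which is the indexing consistent with expressions (6) and (7). The function name and its integer-in, integer-out interface are illustrative assumptions. In software the carry loop is still sequential; the point is that G and P depend only on the input bits.

```python
def gp_add(a, b, width, c0=0):
    """Integer addition through generate/propagate signals, expressions (3)-(6).

    Carries use Ci+1 = Gi + Pi*Ci; bit 0 is the least significant bit.
    Returns (sum, carry_out).
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # (3): depends on inputs only
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # (4): depends on inputs only
    c = [c0]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))            # carry into bit i + 1
    s = sum((p[i] ^ c[i]) << i for i in range(width))  # (6): Si = Pi XOR Ci
    return s, c[width]
```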
If one assumes that the delay through an AND gate is equal to one “gate delay” and the delay through an XOR gate is equal to two gate delays, then the Propagate (P) and Generate (G) signals (which only depend on the input bits) will be valid after two and one gate delay, respectively.
Using the above expressions to calculate the carry signals, it is not necessary to wait for the carry to ripple through all the previous stages to find its proper value.
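This point may be made clear by making reference, e.g., to a 4-bit adder:

$$C_1 = G_0 + P_0 \cdot C_0 \tag{7}$$

$$C_2 = G_1 + P_1 \cdot C_1 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0 \tag{8}$$

$$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0 \tag{9}$$

$$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0 \tag{10}$$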
The carry-out bit of the last stage (C4 in the 4-bit example, i.e., Ci+1 for i = 3) will be available after four gate delays: two gate delays to compute the Propagate signals and two more through the AND and OR gates. The sum signal can then be calculated according to expression (6).
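The flattened expressions (7) to (10) can be checked exhaustively against ordinary integer addition. In the minimal 4-bit sketch below (names hypothetical), each carry is written as a single AND-OR expression of the G and P signals and C0, with no reference to the previous carry, which is what allows the carries to be evaluated in parallel in hardware.

```python
def lookahead_carries_4bit(g, p, c0):
    """Carries C1..C4 of a 4-bit adder from expressions (7)-(10).

    g, p: lists G0..G3 and P0..P3 (index 0 = LSB). No carry is computed
    from a previously computed carry.
    """
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla_add_4bit(a, b, c0=0):
    """4-bit Carry Look Ahead addition; returns (sum, carry_out) for 0 <= a, b < 16."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]    # (3)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]    # (4)
    c = [c0] + lookahead_carries_4bit(g, p, c0)    # C0..C4
    s = sum((p[i] ^ c[i]) << i for i in range(4))  # (6)
    return s, c[4]
```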
Carry Ripple, Carry Look Ahead, and Carry Skip adders thus essentially represent different implementations of the same operation. They differ, however, in performance, namely (in order of importance) speed, area, and power requirements. This disclosure essentially concerns the manipulation of carries, and thus applies identically to all the implementations considered in the foregoing. For instance, in the Carry Look Ahead case, relationship (7) is identical to formula (2) evaluated for i = 1, and this is not by chance.
The problem thus arises of estimating, as accurately as possible, the number of leading zeroes (i.e., the zeroes to the left) in the sum of the floating-point mantissas in the case of a pure subtraction. This estimation must be fast, because the floating-point sum involves other “slow” elements, such as the shifters used to align the mantissas to the same exponent (the first stage) and to re-normalize the result (the last stage).

The IEEE-754 standard for Binary Floating-Point Arithmetic (IEEE-754) defines the format of floating-point numbers. Several formats are defined: single (32-bit), double (64-bit), and extended precision (80-bit). Each format admits two representations:
normalized, where the mantissa is preceded by a hidden “1”. For instance, the (decimal) floating-point number 1.5 is represented by the binary mantissa 1.10000, where the first “1” is hidden. The (decimal) floating-point number 1.75 is represented by 1.11000 (the first “1” to the right of the point has a weight 2^−1 = 0.5, the second a weight 2^−2 = 0.25, and so on);
de-normalized, which is used to represent floating point numbers very close to zero. A de-normalized number has a mantissa of the type 0.XYZW where the hidden bit is equal to “0”.
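The normalized/de-normalized distinction can be observed on actual single-precision (32-bit) encodings using Python's standard struct module. The sketch below assumes the standard binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits); decode_binary32 is a hypothetical helper name. A biased exponent of 0 marks a de-normalized number (or zero), for which the hidden bit is 0.

```python
import struct


def decode_binary32(x):
    """Split a value, rounded to IEEE-754 single precision, into its fields.

    Returns (sign, biased_exponent, fraction, hidden_bit). The hidden bit is 1
    for normalized numbers and 0 when the biased exponent is 0 (zero or
    de-normalized). Exponent 255 (infinities, NaNs) is not treated specially.
    """
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF           # 23 stored bits to the right of the point
    hidden_bit = 0 if exponent == 0 else 1
    return sign, exponent, fraction, hidden_bit
```

For example, 1.5 decodes to a fraction field of binary 100...0 and 1.75 to 110...0, matching the mantissas 1.10000 and 1.11000 with the leading “1” hidden, while 2^−149 (the smallest single-precision de-normalized value) has exponent 0 and hidden bit 0.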
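Suppose one needs to compute the sum of F1 and F2, where:

$$F_1 = 1.5 \times 2^{10}, \qquad F_2 = 1.8 \times 2^{11}$$

The addition is carried out with the larger exponent, 2^11, whereby the mantissa 1.5 must first be aligned to it: F1 = 0.75 × 2^11. The aligned sum, 2.55 × 2^11, is then re-normalized to 1.275 × 2^12.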
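The alignment and re-normalization steps of this example can be reproduced exactly with rational arithmetic. This sketch uses Python's fractions.Fraction instead of fixed-width binary significands, so that 1.8 stays exact; a hardware adder would instead right-shift a binary significand and round. align_and_add is a hypothetical name.

```python
from fractions import Fraction


def align_and_add(m1, e1, m2, e2):
    """Compute m1*2**e1 + m2*2**e2 following the adder's steps.

    1. Alignment: the mantissa with the smaller exponent is divided by
       2**(exponent difference), i.e. shifted right in hardware.
    2. Addition of the aligned mantissas (exact rationals here).
    3. Re-normalization of a nonzero result into the range [1, 2).
    Returns (mantissa, exponent).
    """
    if e1 < e2:                           # make (m1, e1) the larger-exponent operand
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = Fraction(m2) / 2 ** (e1 - e2)    # align to the larger exponent e1
    m, e = Fraction(m1) + m2, e1
    while m != 0 and abs(m) >= 2:         # overflow: shift right
        m, e = m / 2, e + 1
    while m != 0 and abs(m) < 1:          # leading zeroes: shift left
        m, e = m * 2, e - 1
    return m, e
```

align_and_add(Fraction(3, 2), 10, Fraction(9, 5), 11) returns (Fraction(51, 40), 12), i.e. 1.275 × 2^12. With opposite signs, e.g. 1.5 − 1.25, the second loop shifts left instead, which is the leading-zero re-normalization discussed above.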
The “ideal” LZA quickly produces a string, as long as the adder, having the same number of leading zeroes as the actual sum.
The following formula represents a fast method of computing such a string:

$$S_i = A_i \oplus B_i$$
It will be appreciated that this formula corresponds to formula (1) above with the carry Ci, which is typically a slow signal, neglected. Such an LZA arrangement operates correctly when:
no carries exist in the chain (which is a rare event), or
carries exist, but they are extinguished (travelling from right to left) before reaching the leading “1” of the result (which is a more frequent event).
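The condition stated above can be checked exhaustively for 8-bit operands. The sketch below (helper names hypothetical, carries computed with equation (2) and C0 = 0) flags a pair of operands as safe when no carry reaches the leading “1” of A⊕B; for every safe pair, the leading-zero count of A⊕B must then equal that of the real sum. It tests only the sufficient condition given in the text, not every case in which the estimate happens to be right.

```python
def clz(value, width):
    """Leading zeroes of value viewed as a width-bit field."""
    return width - (value & ((1 << width) - 1)).bit_length()


def carry_ins(a, b, width):
    """C_0..C_(width-1): carry into each bit (index 0 = LSB), equation (2), C_0 = 0."""
    c = [0]
    for i in range(width - 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c.append((ai & bi) | ((ai ^ bi) & c[-1]))
    return c


def xor_estimate_is_safe(a, b, width):
    """True if no carry reaches the leading 1 of A XOR B (or there are no carries)."""
    c = carry_ins(a, b, width)
    if not any(c):
        return True
    lead = (a ^ b).bit_length() - 1   # position of the leading 1 of A XOR B, -1 if none
    highest_carry = max(i for i, ci in enumerate(c) if ci)
    return highest_carry < lead
```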
For instance, in the case:
A=01000111
B=00001111
A+B=01010110
the carry is “extinguished” in bit number 4 starting from the left.
Conversely, consider the following case:
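The carry chain of this example can be traced with a short sketch (names hypothetical; bit 0 is the least significant bit, carries per equation (2) with C0 = 0). For 8-bit operands, bit position 4 counted from the least significant bit is also the fourth bit from the left, which is where the carry dies out here, two positions below the leading “1” of the sum (bit 6).

```python
def carries_into_bits(a, b, width):
    """C_0..C_width for a ripple addition with C_0 = 0 (index 0 = LSB), per equation (2)."""
    c = [0]
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c.append((ai & bi) | ((ai ^ bi) & c[-1]))
    return c


def extinguish_position(a, b, width):
    """Bit position (0 = LSB) where the most significant carry chain dies out.

    A position extinguishes a carry when it receives one (C_i = 1) but does
    not pass one on (C_(i+1) = 0). Returns None if no carry chain ends
    inside the word.
    """
    c = carries_into_bits(a, b, width)
    ends = [i for i in range(width) if c[i] == 1 and c[i + 1] == 0]
    return max(ends) if ends else None
```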
A=00001111 (a positive number with an exponent lower than the maximum one)
B=00000001 (negative number)
A represents a positive number with an exponent lower than that of B. The last zero represents the sign of the mantissa, the following “1” is the normalization “1” (which is indicated explicitly), and the rest is the actual mantissa (111). The three leading zeroes indicate that the exponents differ by 3, the amount by which the mantissa of A has been shifted for alignment.
Conversely, B represents, in two's complement, the mantissa of a negative number with an exponent higher than that of A. Returning to the original number:
B (orig)=11111111
Such an LZA arrangement would produce 4 leading zeroes (A⊕B=00001110), whereas the actual sum A+B=00010000 has only three. Consequently, the need arises for a fast (or “early”) carry with a high degree of accuracy.
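The two examples can be compared directly. The sketch below (names hypothetical) treats A and B as plain 8-bit strings, as the A⊕B formula does, and reports the XOR-based estimate next to the exact leading-zero count of the true sum.

```python
WIDTH = 8  # word length of the examples above


def clz(value, width=WIDTH):
    """Leading zeroes of value viewed as a width-bit field."""
    return width - (value & ((1 << width) - 1)).bit_length()


def compare_lza(a, b, width=WIDTH):
    """Return (xor_estimate, exact_count, xor_string, sum_string)."""
    mask = (1 << width) - 1
    xor_string = (a ^ b) & mask          # fast string, carries ignored
    real_sum = (a + b) & mask            # what the integer adder produces
    return (clz(xor_string, width), clz(real_sum, width),
            format(xor_string, f"0{width}b"), format(real_sum, f"0{width}b"))
```

compare_lza(0b01000111, 0b00001111) gives (1, 1, "01001000", "01010110"): the carry dies out at bit 4, below the leading “1” of A⊕B at bit 6, and the estimate is exact. compare_lza(0b00001111, 0b00000001) gives (4, 3, "00001110", "00010000"): the carry travels past bit 3, the leading “1” of A⊕B, and the estimate overshoots by one.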