As software applications become more complex, hardware designers try different approaches to improve the performance of the system. Hardware circuits are limited by physical constraints that affect practical logic designs. One of the important limitations is the fact that a real logic element cannot respond instantaneously to a change of signals on its inputs. There is a finite period of time after an input signal changes before the output signal changes. This time delay depends upon the type of circuit used, the number of inputs, and the specific geometry and physical composition of each component in the circuit.
One of the areas of the hardware that experiences this delay is the computation processing functions in the arithmetic logic unit (ALU) of the processor. FIG. 1A illustrates an exemplary full adder 100. The full adder 100 adds the three bits A0, B0 and Carry-in (Cin) together and produces a sum bit S0 and a Carry-out (C0). FIG. 1B illustrates the most basic type of carry propagate n-bit adder (CPA), the carry-ripple adder, implemented using n cascaded full adders. The result is all the sum bits (S0, S1, . . . , Sn−1) with one sum bit per result position (0 to n−1), where n is the number of bits per add operand. The carry-out bit C0 from the first full adder is propagated to the second full adder, etc. The carry-out bit C1 from the second full adder must wait for the carry-out bit C0 from the first full adder, etc. The wait time at each adder is a propagated time delay (tpd). Thus, for the n-bit adder, the total time delay is n·tpd. This propagated time delay tpd before the output signals are available creates a bottleneck for the speed performance of the adder, especially when the numbers to be added are long. There are various carry propagate adders to speed up the execution of the add and subtract operations. Some of these include carry skip, carry select, carry look ahead, and complex hybrid implementations. All CPAs suffer from increased delay as the precision of the operation increases.
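The carry-ripple structure described above can be sketched in software. The following is a minimal illustrative model, not the circuit of FIG. 1B itself; the function names and the little-endian bit-list representation are assumptions made for clarity. Note how each stage consumes the carry produced by the previous stage, which is the source of the n·tpd delay in hardware.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out), mirroring FIG. 1A."""
    s = a ^ b ^ cin                              # sum bit
    cout = (a & b) | (a & cin) | (b & cin)       # carry-out bit
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """n-bit carry-ripple adder over little-endian bit lists.

    Each iteration models one full adder stage; stage i cannot
    produce its result until stage i-1 supplies its carry, so the
    hardware delay grows linearly with n.
    """
    carry = 0
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# Example: 5 (101) + 3 (011) = 8 (1000), bits listed LSB first.
result, carry_out = ripple_carry_add([1, 0, 1], [1, 1, 0])
```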
One scalable approach to reduce the execution time is to implement redundant arithmetic. With a redundant arithmetic implementation, each position's adder in FIG. 1B is not chained to the previous adder. Each adder performs the addition without having to wait for the carry out (Cout) bit from the previous adder. Each result position is represented by two bits, the sum bit and the Cout bit from the previous full adder. This is referred to as a redundant form, as compared to the conventional form where there is one bit per result position. Only selected instructions can operate with redundant operands. Thus, an operand in the redundant form needs to be converted back to the conventional form, with one bit per position, for the instructions that require conventional operands. Generally, an optimal carry propagate adder is used to convert the redundant form into the conventional form. Using redundant arithmetic does not necessarily produce faster execution. The execution delay may increase when the redundant processing is always performed before the resulting operand is converted back to the conventional form. For example, the redundant addition of two conventional operands producing a redundant result can increase the delay when a next scheduled instruction requires an input operand in the conventional form instead of the redundant form.
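The redundant form described above can be modeled as a carry-save addition, where each position is computed independently and the result is kept as a (sum bit, carry bit) pair per position. The sketch below is illustrative only; the function names and the little-endian bit-list representation are assumptions, and the conversion step stands in for the carry propagate adder mentioned above.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def carry_save_add(a_bits, b_bits, c_bits):
    """Redundant (carry-save) addition of three little-endian operands.

    Every position is computed independently -- there is no carry
    chain between positions, so no stage waits on another. The result
    is two bits per position: a sum vector and a carry vector.
    """
    sums, carries = [], []
    for a, b, c in zip(a_bits, b_bits, c_bits):
        s, cout = full_adder(a, b, c)
        sums.append(s)
        carries.append(cout)
    return sums, carries

def to_conventional(sums, carries):
    """Convert the redundant form back to conventional form.

    The carry vector is weighted one position higher than the sum
    vector; in hardware this final add is the carry propagate step.
    """
    s_val = sum(bit << i for i, bit in enumerate(sums))
    c_val = sum(bit << (i + 1) for i, bit in enumerate(carries))
    return s_val + c_val

# Example: 5 + 3 + 6 = 14, operands given LSB first.
sums, carries = carry_save_add([1, 0, 1], [1, 1, 0], [0, 1, 1])
value = to_conventional(sums, carries)
```

The conversion step is where the carry-propagate delay reappears, which is why always converting back to conventional form can negate the benefit of the redundant addition.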