Modern data processors process large amounts of data in a relatively short amount of time. In the drive to produce ever faster data processors, one of the critical speed limitations to overcome is the speed of the arithmetic logic unit. Therefore, any speed improvement in digital logic, and in the arithmetic logic unit in particular, can directly affect the speed of modern data processors.
Digital data processor arithmetic involves the development of complex logic circuitry and of efficient algorithms utilizing the available hardware. Given that numbers in a digital data processor are represented as binary-strings of zeros and ones, and that hardware can perform only a relatively simple and primitive set of Boolean operations, all the arithmetic operations performed are based on a hierarchy of operations, the building blocks of which are the basic arithmetic operations.
One essential basic arithmetic operation repeatedly performed by a digital data processor is the addition operation, which is carried out by electronic circuits known as adders. An introductory description of adders can be found in a book by K. Hwang, entitled “Computer Arithmetic”, published by John Wiley & Sons, New York, 1979, the contents of which are hereby incorporated by reference. Nevertheless, for the purpose of providing a complete and self-contained description, an introductory explanation of the principles of binary arithmetic and the operations of binary components is given hereinbelow.
The most basic addition is an addition of two binary bits. Since each bit has only two possible values, 0 or 1, there are only four possible combinations of inputs. These four possibilities, and the resulting sums, are 0+0=0, 1+0=1, 0+1=1 and 1+1=10. The output (10) of the latter example is referred to as a 2-bit binary-string, or a 2-bit binary number, where the position of each bit represents the weight of the respective bit. Thus, in the binary-string 10, the weight of the “1” bit is double that of the weight of the “0” bit. A situation in which the number of output bits exceeds the number of input bits is known as overflow.
An adder which performs the addition of two binary bits has two output channels. One output is referred to as the sum and the other output is referred to as the carry. In the example of 1+1=10, the sum is the bit “0” and the carry is the bit “1”, while in the other three examples the carry is the bit “0” and the sum represents the correct addition result. The addition of two binary bits may be implemented by two primitive logic gates: the carry output can be obtained by an AND gate, and the sum bit, which represents the rightmost bit, can be obtained by an exclusive-OR gate, also known as an XOR gate. Such a simple adder is called a half-adder (HA) and is illustrated in FIG. 1.
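By way of illustration only, the behavior of the half-adder can be sketched in software as follows. The function name half_adder is hypothetical and is not taken from the figures; the sketch merely models the AND gate and the XOR gate on single bits.

```python
def half_adder(a, b):
    """Model of a half-adder: returns (carry, sum) for two single bits."""
    carry = a & b  # AND gate produces the carry
    total = a ^ b  # XOR gate produces the sum (rightmost) bit
    return carry, total


# Enumerate the four possible input combinations.
for a in (0, 1):
    for b in (0, 1):
        carry, total = half_adder(a, b)
        print(f"{a}+{b} -> carry={carry}, sum={total}")
```

Only the input combination 1+1 yields a carry of 1, in agreement with the four cases listed above.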
When adding multiple-bit numbers, each pair of bits can produce an output carry, and an adder must be able to recognize and include a carry from a lower weight. This can be done by using two HA circuits. The first HA adds the two bits to produce a partial sum, while the second HA adds the incoming carry from the lower weight to the partial sum to produce the final sum. The carry outputs of the two HAs are combined by an OR gate to produce the output carry. Such an adder is called a full-adder (FA). The logical gates of an FA are illustrated in FIG. 2a.
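A full-adder built from two half-adders can be sketched as follows. The sketch assumes the customary arrangement, in which the second half-adder receives the incoming carry and an OR gate merges the carries of the two half-adders; the exact gate arrangement of FIG. 2a is not reproduced here, and the function names are hypothetical.

```python
def half_adder(a, b):
    """Returns (carry, sum) for two single bits."""
    return a & b, a ^ b


def full_adder(a, b, carry_in):
    """Model of a full-adder: returns (carry_out, sum) for three single bits."""
    carry1, partial = half_adder(a, b)            # first HA: partial sum
    carry2, total = half_adder(partial, carry_in)  # second HA: add incoming carry
    carry_out = carry1 | carry2                   # OR gate merges the two carries
    return carry_out, total
```

For every one of the eight input combinations, the two output bits, read as a 2-bit binary number, equal the arithmetic sum of the three input bits.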
A common practice in logic diagrams is to represent any complex function as a “black box” having designated input and output signals, thereby defining the complex function as a primitive which can be used in more complex diagrams. As the full adder is a basic building block of almost any logic diagram, it is commonly designated by a separate symbol, shown in FIG. 2b. The inputs to the full adder are three binary bits (the two binary bits of the present weight and one carry from the lower weight). The outputs of the full adder are two binary bits: a sum and a carry.
Full adders are typically concatenated to each other, forming adder circuitry for addition of multiple-bit numbers. In a modern processor, the adder circuitry includes a way of negating one of the input numbers directly, so that the circuit is operable to perform either addition or subtraction on demand. Other functions are commonly included in modern implementations of the adder circuit, especially in modern microprocessors. The two most commonly encountered adder circuitry types are ripple-carry adders and carry-save adders.
FIG. 3a is a simplified diagram of a ripple-carry adder designed for adding two 4-bit numbers. Each FA is devoted to summing the two bits of one particular weight. In ripple-carry adders, addend bits of the same weight are added together, and a carry bit is transferred to the adjacent higher FA when required. The final sum is directly derived from a bit-by-bit addition, with an appropriate carry to an adjacent higher order bit position and a single bit carry out from the highest order bit position. Addition using the ripple-carry adder cannot be executed simultaneously for all the bits, since each FA needs the output carry from the preceding lower weight FA as an input before adding the bits. In other words, the propagation of the carry from one bit to the next results in slow, non-parallel operation of the ripple-carry adder, because high order bit computations depend on the results from low order bits.
The above principles may be employed also for a ripple-carry adder for adding two n-bit numbers, where the inputs are two n-bit numbers and a binary bit input carry, and the output is one n-bit number and a binary bit output carry. A ripple-carry adder for adding two n-bit numbers is referred to as a 2:1 n-bit ripple-carry adder and it is commonly designated by a separate symbol, shown in FIG. 3b. 
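The serial dependence of a 2:1 n-bit ripple-carry adder can be illustrated by the following sketch. The names ripple_carry_add, to_bits and from_bits are hypothetical, and the representation of each number as a list of bits with the least significant bit first is an assumption made for the illustration only.

```python
def full_adder(a, b, carry_in):
    """Returns (carry_out, sum) for three single bits."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return carry_out, total


def ripple_carry_add(x_bits, y_bits, carry_in=0):
    """Adds two equal-length bit lists (least significant bit first)."""
    assert len(x_bits) == len(y_bits)
    carry = carry_in
    sum_bits = []
    for a, b in zip(x_bits, y_bits):
        # Each stage must wait for the carry of the preceding lower weight.
        carry, s = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry  # n sum bits and a single output carry


def to_bits(value, n):
    """Hypothetical helper: n-bit list of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: integer value of a bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(6, 4), to_bits(3, 4))
print(from_bits(sum_bits), carry_out)
```

The loop makes the limitation explicit: the carry variable is threaded through every stage, so stage i cannot complete before stage i−1.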
A more efficient adder with respect to computation delay is a carry-save adder. In carry-save adders, carry bits are accumulated separately from the sum bits of any given weight, so that the addition of all the weights is executed simultaneously. Consequently, the outputs of a carry-save adder are two binary-strings: a sum binary-string and a carry binary-string, which when added together yield the final result. The benefit of a carry-save adder is that high order bits do not depend on any low order bit because all bit positions are calculated independently, thereby avoiding the propagation latency associated with carry bits in ripple-carry adders. Because of their speed and simplicity, carry-save adders are pervasively found in digital logic designs.
Reference is now made to FIG. 4a, which is a simplified diagram of a carry-save adder for adding three 4-bit numbers. The carry-save adder shown includes four FAs, each designed to add three equal-weight bits (one from each of the three 4-bit numbers) and to output a carry, C, and a sum, S. A carry-save adder which is designed for adding three n-bit numbers is referred to as a 3:2 n-bit carry-save adder and is commonly designated by a separate symbol, shown in FIG. 4b. One would appreciate that in the case of n-bit numbers, there are 2n output bits which may be referred to in more than one S/C combination, e.g., S[n−1:0] and C[n−1:0] (FIG. 4a) or S[n−1:0] and C[n:1] (FIG. 4b).
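A 3:2 n-bit carry-save adder can be modeled, as a minimal sketch, by bitwise operations on integers, each bit position standing for one FA. The function name carry_save_3to2 and the operand names x, y and z are hypothetical. The sketch uses the C[n−1:0] indexing, so that carry bit i carries the weight of bit position i+1 and the carry string must be shifted by one position before the final addition.

```python
def carry_save_3to2(x, y, z, n):
    """Model of a 3:2 n-bit carry-save adder: returns (S, C)."""
    mask = (1 << n) - 1
    # All bit positions are computed independently; no carry propagates.
    s = (x ^ y ^ z) & mask                    # sum bit of each FA
    c = ((x & y) | (x & z) | (y & z)) & mask  # carry bit of each FA
    return s, c


s, c = carry_save_3to2(0b1011, 0b0110, 0b1101, 4)
# The final result is recovered by one ordinary addition of S and shifted C.
print(s + (c << 1) == 0b1011 + 0b0110 + 0b1101)
```

The shift by one position corresponds to the alternative C[n:1] indexing mentioned above.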
A somewhat more complicated carry-save adder is a 4:2 carry-save adder, which is designed to add four operands and to output two strings (a sum, S, and a carry, C). A typical case is a 4:2 carry-save adder for 4-bit operands, illustrated in FIG. 4c. In FIG. 4c, three of the four operands, A, B and D, are fed into a first 3:2 carry-save adder, while the fourth operand, E, is fed directly into a second 3:2 carry-save adder together with the intermediate sum and carry output from the first 3:2 carry-save adder. An additional carry-in, Cin, may also be used by feeding it into the second 3:2 carry-save adder (see FIG. 4c), where no carry-in is equivalent to Cin=0. A 4:2 n-bit carry-save adder is commonly designated by a separate symbol, shown in FIG. 4d.
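The cascade of two 3:2 carry-save adders can be sketched as follows. The function names are hypothetical. Two assumptions are made for the illustration: Cin is placed in the least significant position vacated when the intermediate carry is shifted to its proper weight (the actual wiring of FIG. 4c is not reproduced here), and unbounded integers are used so that no output bit is truncated.

```python
def csa_3to2(x, y, z):
    """3:2 carry-save stage on integers: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def csa_4to2(a, b, d, e, cin=0):
    """4:2 carry-save adder built from two 3:2 stages: returns (S, C)."""
    s1, c1 = csa_3to2(a, b, d)  # first stage: operands A, B and D
    # Align the intermediate carry to its weight; the vacated least
    # significant position receives Cin (assumption, see lead-in).
    aligned_carry = (c1 << 1) | cin
    s, c = csa_3to2(s1, aligned_carry, e)  # second stage: adds operand E
    return s, c


s, c = csa_4to2(0b1001, 0b0111, 0b1100, 0b0011, cin=1)
print(s + (c << 1) == 0b1001 + 0b0111 + 0b1100 + 0b0011 + 1)
```

As with the 3:2 case, a single conventional addition of S and the shifted C yields the final result.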
Irrespective of the type of adder circuitry which is used for adding multiple-bit numbers, the number of weights of the inputs, that is to say the number of bit positions, dictates the number of elementary full adders which are needed to construct the adder circuitry. When the digital codes of the numbers are added and the output obtained by the addition exceeds the range that can be expressed by the number of bits of the output signal, overflow occurs and an overflow signal is generated by the adder. A detailed description of overflow detection can be found, e.g., in an article by Fayez Elguibaly, entitled “Overflow Handling in Inner-Product Processors”, published in IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 47, No. 10 (2000). The overflow signal is important both as a reference signal for many control applications and for judging whether or not the result of the addition is correct. For example, for a 4-bit adder, if the result of the addition is 16, the 4-bit data output would be 0000. In this example, the signal at the output of the adder does not indicate a correct value because the adder is in overflow status. The overflow signal can be used to indicate the error, and/or to correct the result of the addition using the carry which indicates the overflow status of the adder.
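The 4-bit example above can be reproduced by the following sketch, which assumes unsigned operands. The function name add_n_bit and the particular operands 9 and 7 are chosen for illustration only.

```python
def add_n_bit(x, y, n=4):
    """Unsigned n-bit addition: returns (n-bit result, output carry)."""
    full = x + y
    result = full & ((1 << n) - 1)  # only n bits fit in the data output
    carry_out = full >> n           # the carry signals the overflow
    return result, carry_out


result, carry_out = add_n_bit(9, 7)  # true sum is 16
print(f"{result:04b}", carry_out)    # prints "0000 1"

# The carry can be used to correct the result.
corrected = (carry_out << 4) | result
print(corrected)                     # prints "16"
```

The data output alone reads 0000, while the output carry both flags the overflow and supplies the missing higher-weight bit.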
It is recognized that for adding more than two multiple-bit numbers, more carries may be generated in the summing processes and the number of elementary full adders required may exceed the number of weights of the inputs. In such calculations, which require many additions in series, carry-save adder circuits are cascaded together. These additions often lead to an overflow condition, both in intermediate addition results and in the final sum, which needs to be detected in order to avoid large oscillations in the sampled outputs. This condition is called overflow oscillation. The cascading of these additions requires optimization of each addition operation so as both to decrease the occupied silicon area and to increase the speed with which the adder cascades its output to the next carry-save adder stage. Prior art methods of detecting an overflow when adding up a plurality of n-bit operands require expanding the width of the data-path to more than n bits. Specifically, the minimal number of bits needed to represent a sum of k n-bit operands is n+⌈log2(k)⌉ bits, where ⌈ ⌉ denotes the ceiling operation. However, in most applications, not all the bits need to be stored in a register for further data processing. Thus, prior art methods lead to unnecessary degradation in terms of both speed and area.
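The width expression n+⌈log2(k)⌉ can be checked numerically by the following sketch. The sketch assumes signed operands in two's complement representation, for which the expression is exactly the minimal width; the function names are hypothetical.

```python
def sum_width(n, k):
    """n + ceil(log2(k)), computed with integer arithmetic only."""
    return n + (k - 1).bit_length()  # (k - 1).bit_length() == ceil(log2(k))


def min_width_brute_force(n, k):
    """Smallest two's complement width holding any sum of k n-bit operands."""
    lowest = -k * (1 << (n - 1))          # k times the most negative operand
    highest = k * ((1 << (n - 1)) - 1)    # k times the most positive operand
    width = 1
    while not (-(1 << (width - 1)) <= lowest
               and highest <= (1 << (width - 1)) - 1):
        width += 1
    return width


print(sum_width(4, 3), min_width_brute_force(4, 3))
```

For example, three 4-bit operands require a 6-bit data-path, i.e., two bits more than the operands themselves.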
As stated, an adder is also used for performing a subtraction operation, by negating one or more of the inputs. The method of negating a binary-string depends on the sign representation of the binary-string. For example, a two's complement binary-string is negated by determining its one's complement value, i.e., inverting every bit, and adding 1 at the least significant bit. Besides being used to perform addition or subtraction, the adder is also an integral part of the multiplier, and therefore plays an important role in the multiplication operation. Thus, the adder speed is a significant limiting factor on the overall speed of a data processor.
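Negation in two's complement representation, i.e., inverting every bit and adding 1, and its use for subtraction can be sketched as follows. The function names negate and subtract are hypothetical, and all values are treated as n-bit strings.

```python
def negate(value, n):
    """Two's complement negation of an n-bit string."""
    mask = (1 << n) - 1
    inverted = value ^ mask        # one's complement: invert every bit
    return (inverted + 1) & mask   # add 1 at the least significant bit


def subtract(x, y, n):
    """Computes x - y with an adder by negating y; result kept to n bits."""
    mask = (1 << n) - 1
    return (x + negate(y, n)) & mask


print(f"{negate(0b0011, 4):04b}")       # prints "1101"
print(f"{subtract(0b1001, 0b0011, 4):04b}")  # prints "0110"
```

In hardware, the added 1 is commonly supplied through the carry input of the adder, so that subtraction requires only the inversion of one operand.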
There is thus a widely recognized need for, and it would be highly advantageous to have, a method and apparatus for adding, subtracting and detecting the presence and direction of overflow of n-bit data inputs, devoid of the above limitations.