1. Field of the Invention
The present invention relates generally to a carry save adder. More specifically, the present invention relates to a carry save adder for use in a multiplier circuit.
2. Description of the Related Art
The earliest processors did not include hardware to perform multiplication. Instead, multiply operations were accomplished by performing sequences of shifts and adds. As technology evolved to provide higher levels of device integration, it became practical to include hardware dedicated to multiplication within the processor. Initially, the multiply hardware improved the performance of multiplication by directly supporting multiply instructions, although not at the processor's best possible speed (i.e., multiplication typically took much longer to perform than addition). Further evolution of processors has led to fully pipelined multipliers, allowing processors to initiate multiply instructions at the same rate as they initiate addition instructions, although in most cases at a greater latency.
While increasing integration levels allow the implementation of fully pipelined multipliers, the need for multiply performance is being driven by the changing nature of compute intensive programs. Today, these programs are typically dominated by algorithms that model aspects of the physical world. For example, audio and video compression involve transforming information into a different domain, such as audio into the frequency domain, and then removing unimportant information from the desired data. This is one example of a multiply intensive operation. Future software will likely require even greater multiply performance. For example, the Newton-Raphson technique for implementing the divide operation is growing in popularity and uses a sequence of multiply operations.
Nearly all processors implement multiply using a combination of two techniques: Booth encoding and Wallace trees. The Booth encoding process, which produces a number of partial products, uses one of the two operands to select a multiple of the other operand for each pair of bit positions of the first operand. A Wallace tree then sums the partial products to produce an output. Booth encoding and Wallace trees are well known in the art, and are not discussed in detail in this disclosure.
A variety of Booth encoders and Wallace trees are known in the art. One of the most common Booth encoders generates one of five different multiples of the second operand: 2x, 1x, 0x, −1x, or −2x. It is possible to design other types of Booth encoders, but these other designs could not be implemented by simple multiplexers. (For example, the 1x and 2x multiples are accomplished by simple shifts, while a 3x multiple requires an addition.)
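As an illustration of the five-multiple encoder described above, the following is a minimal software sketch of conventional radix-4 Booth recoding. It is not taken from this disclosure; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the multiplier is assumed to be a two's-complement value of even bit width.

```python
def booth_radix4_digits(multiplier, width):
    """Return radix-4 Booth digits, least significant first.

    Each digit is one of -2, -1, 0, 1, 2. `multiplier` is read as a
    `width`-bit two's-complement value; `width` must be even.
    """
    if width % 2 != 0:
        raise ValueError("width must be even")
    bits = [(multiplier >> i) & 1 for i in range(width)]
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, width, 2):
        low, high = bits[i], bits[i + 1]
        # Standard recoding: digit = low + prev - 2 * high
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Sum the Booth-selected partial products (behavioral model only)."""
    digits = booth_radix4_digits(multiplier, width)
    # Digit k selects a multiple of the multiplicand with weight 4**k.
    partial_products = [(d * multiplicand) * (4 ** k) for k, d in enumerate(digits)]
    return sum(partial_products)
```

In hardware the final `sum` would be performed by a tree of carry-save adders rather than by ordinary addition; the sketch only shows that every selected multiple is a shift and/or negation of the multiplicand.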
A Wallace tree is typically built from 3:2 counter gates, also known as carry-save-adders (CSAs). These gates add three bits of identical significance (or weight), and produce a carry of one bit greater significance, and a sum of identical significance to the inputs. The result is that three bits of the input are reduced to two in one CSA level. It is possible to design other forms of CSAs, such as a 7:3 counter described in Metha et al, High-Speed Multiplier Design Using Multi-Input Counter and Compressor Circuits, IEEE 10th Symposium on Computer Architecture (June 1991). These types of counters are typically avoided due to increased complexity, gate count, or an inability to achieve similar performance.
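The 3:2 reduction described above can be sketched in software as follows. This is a behavioral model only, assuming non-negative Python integers stand in for rows of bits; the names `csa_3to2` and `reduce_rows` are hypothetical.

```python
def csa_3to2(a, b, c):
    """Bitwise 3:2 carry-save adder on non-negative integers.

    Returns (sum_bits, carry_bits) with a + b + c == sum_bits + carry_bits.
    """
    sum_bits = a ^ b ^ c                      # same significance as the inputs
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # one bit greater significance
    return sum_bits, carry_bits


def reduce_rows(rows):
    """Apply levels of 3:2 CSAs until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete groups of three
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(csa_3to2(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])  # leftover rows pass to the next level
        rows = next_rows
    return rows
```

Note the one-bit left shift on the carry output: each 3:2 CSA moves its carry by a single bit position.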
Nearly all of today's CMOS processors are constructed from flip-flop or latch synchronized static logic. New designs are starting to emerge that use combinations of static and dynamic logic, and more aggressive synchronization schemes. For example, Intrinsity, Inc. (formerly known as EVSX, Inc.) has invented a new logic family called N-NARY logic, which can be characterized as a fully-dynamic and self-synchronized logic family. N-NARY logic is more fully described in a copending patent application, U.S. patent application Ser. No. 09/019355, filed Feb. 05, 1998, now U.S. Pat. No. 6,066,965, and titled “Method and Apparatus for a N-NARY logic Circuit Using 1-of-4 Signals”, which is incorporated by reference for all purposes into this disclosure and is referred to as “The N-NARY Patent.” Additionally, the present invention is related to a multiplier built using N-NARY logic that is fully described in a copending patent application, U.S. patent application Ser. No. 09/186843, filed Nov. 05, 1998, now U.S. Pat. No. 6,275,841 and titled “1-of-4 Multiplier”, which is incorporated by reference for all purposes into this disclosure. The present invention incorporates and/or modifies N-NARY adders that are described in several copending patent applications, U.S. patent application Ser. No. 09/150720, filed Sep. 10, 1998, now U.S. Pat. No. 6,219,687, and titled “Method and Apparatus for an N-NARY Sum/HPG Gate”, U.S. patent application Ser. No. 09/150829, filed Sep. 10, 1998, now U.S. Pat. No. 6,216,146, and titled “Method and Apparatus for an N-NARY Adder Gate”, and U.S. patent application Ser. No. 09/150575, filed Sep. 10, 1998, now U.S. Pat. No. 6,223,199, and titled “Method and Apparatus for an N-NARY HPG Gate”, all of which are incorporated by reference into this disclosure for all purposes.
A greater discussion of capacitance isolation using N-NARY logic can be found in a copending patent application, U.S. patent application Ser. No. 09/209967, filed Dec. 10, 1998, now U.S. Pat. No. 6,124,735, and titled “Method and Apparatus for a N-Nary Logic Circuit Using Capacitance Isolation,” which is incorporated by reference for all purposes into this disclosure. Additionally, a greater discussion of the wire capacitance can be found in a copending patent application, U.S. patent application Ser. No. 09/019278, filed Feb. 05, 1998, titled “Method and Apparatus for a 1-of-N Signal,” which is incorporated by reference for all purposes into this disclosure. Finally, a discussion of the reduced power consumption benefits of N-NARY logic can be found in a copending patent application, U.S. patent application Ser. No. 09/209207, filed Dec. 10, 1998, now U.S. Pat. No. 6,107,835, and titled “Operation-Independent Power Consumption,” which is incorporated by reference for all purposes into this disclosure.
The N-NARY logic family supports a variety of 1-of-N signal encodings, including 1-of-4. In 1-of-4 encoding, four wires are used to indicate one of four possible values. In contrast, traditional static logic design uses two wires to indicate four values, as is demonstrated in Table 1. In Table 1, the A0 and A1 wires are used to indicate the four possible values for operand A: 00, 01, 10, and 11. Table 1 also shows the decimal value of an encoded 1-of-4 signal corresponding to the two-bit operand value, and the methodology by which the value is encoded using four wires.
“Traditional” dual-rail dynamic logic also uses four wires to represent two bits, but the dual-rail scheme always requires two wires to be asserted. In contrast, as shown in Table 1, N-NARY logic only requires assertion of one wire. The benefits of N-NARY logic over dual-rail dynamic logic, such as reduced power and reduced noise, should be apparent from a reading of the N-NARY Patent. All signals in N-NARY logic, including 1-of-4, are of the 1-of-N form where N is any integer greater than one. A 1-of-4 signal requires four wires to encode four values (0-3 inclusive), or the equivalent of two bits of information. More than one wire will never be asserted for a valid 1-of-N signal. Similarly, N-NARY logic requires that a high voltage be asserted on only one wire for all values, even the value for zero (0). A null value (or no wires asserted) means that no valid data is present.
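The 1-of-4 rules just stated (exactly one wire asserted for a valid value, no wires asserted for null) can be sketched as a small software model. The wire ordering is an assumption made for illustration: wire index k is taken to represent the value k, and the function names `encode_1of4` and `decode_1of4` are hypothetical.

```python
def encode_1of4(value):
    """Return four wire levels with exactly one wire asserted.

    Assumption: wire index k (0..3) represents the value k.
    """
    if value not in range(4):
        raise ValueError("a 1-of-4 signal encodes values 0 through 3")
    return tuple(1 if wire == value else 0 for wire in range(4))


NULL_1OF4 = (0, 0, 0, 0)  # no wires asserted: no valid data present


def decode_1of4(wires):
    """Return the encoded value, or None for the null (no data) state."""
    if len(wires) != 4:
        raise ValueError("a 1-of-4 signal has exactly four wires")
    asserted = [index for index, level in enumerate(wires) if level]
    if not asserted:
        return None
    if len(asserted) > 1:
        raise ValueError("more than one wire asserted: not a valid 1-of-4 signal")
    return asserted[0]
```

Under this model even the value zero asserts a wire, which is what distinguishes a valid zero from the null state.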
Any one N-NARY logic gate may comprise multiple inputs and/or outputs. In such a case, a variety of different N-NARY encodings may be employed. For instance, consider a gate that comprises two inputs and two outputs, where the inputs are a 1-of-4 signal and a 1-of-2 signal and the outputs comprise a 1-of-4 signal and a 1-of-3 signal. Variables such as P, Q, R, and S may be used to describe the encoding for these inputs and outputs. One may say that one input comprises 1-of-P encoding and the other comprises 1-of-Q encoding, wherein P equals two and Q equals four. Similarly, the variables R and S may be used to describe the outputs. One might say that one output comprises 1-of-R encoding and the other output comprises 1-of-S encoding, wherein R equals four and S equals three. Through the use of these, and other, additional variables, it is possible to describe multiple N-NARY signals that comprise a variety of different encodings.
The N-NARY logic family achieves significant advantages when it is able to operate on a “DIT,” or dual-bit, which comprises a pair of bits. Constructing execution units that operate on DITs allows for a significant reduction in power consumption, often by a factor of two, and for a reduction in electrical signal noise levels due to a reduction in the number of wires that are actively switching at any given time, and by the carefully controlled timing of the switching signals. While implementations are possible for virtually all functions required by a typical processor, some functions present problems for N-NARY logic. The two most difficult functions are odd-bit shifts and multiplication. Odd-bit shifts are a problem in N-NARY logic because shifting by one bit requires information to be taken from two DITs and combined into one DIT. Multiplication is difficult for N-NARY logic because the nature of all previously designed Wallace trees has required odd-bit shifts at each carry output of each CSA used in the tree. The present invention overcomes the odd-bit shift problem entirely by a novel structure, a 5:2 carry-save-adder.
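The difference between an even shift and an odd-bit shift on DITs can be illustrated with a short software model. This is a sketch only: each DIT is modeled as an integer 0 through 3, stored least significant first, and the helper names are hypothetical.

```python
def to_dits(value, count):
    """Split a non-negative integer into `count` dits, least significant first."""
    return [(value >> (2 * k)) & 0b11 for k in range(count)]


def from_dits(dits):
    """Recombine a list of dits into an integer."""
    return sum(d << (2 * k) for k, d in enumerate(dits))


def shift_left_two_bits(dits):
    """Even shift: whole dits move one position; no dit is modified."""
    return [0] + dits[:-1]


def shift_left_one_bit(dits):
    """Odd shift: every output dit is assembled from two input dits."""
    out = []
    lower = 0  # the dit one position below the current one
    for d in dits:
        # High bit comes from this dit's low bit; low bit comes from
        # the high bit of the next-lower dit.
        out.append(((d & 0b01) << 1) | (lower >> 1))
        lower = d
    return out
```

In the even case each output dit is simply a copy of one input dit, whereas in the odd case each output dit depends on two input dits, which is the combination step the text identifies as the problem.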
The present invention comprises an apparatus and method for a 5:2 carry save adder (CSA). The 5:2 CSA receives the five input signals I0, I1, I2, I3, and I4 and computes the two output signals SUM and CARRY. The 5:2 CSA comprises a first level of logic circuitry and a second level of logic circuitry. The first level of logic circuitry receives the input signals and generates three intermediate terms T0, T1, and T2. The second level of logic circuitry couples to the first level of logic circuitry and uses the intermediate terms to compute the two output signals SUM and CARRY. The 5:2 CSA of the present invention operates using either binary signals or N-NARY signals.
The first level of logic circuitry of the present invention further comprises a plurality of adder gates. A first adder adds the input signals I0 and I1 to generate a first intermediate addend term T0. A second adder adds the input signals I2, I3, and I4 to generate a second intermediate addend term T1. A third adder adds the input signals I2, I3, and I4 to generate a third intermediate addend term T2.
The second level of logic circuitry of the present invention further comprises a carry logic circuit and a sum adder circuit. The carry logic circuit receives intermediate terms (T0, T1, and T2) from the plurality of adders in the first level of logic circuitry and computes an output carry signal CARRY. The sum adder receives intermediate terms from the first and third adders (T0 and T2) of the first level of logic circuitry and computes an output sum signal SUM.
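The two-level structure summarized above can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the circuit of the invention: the inputs are assumed to be DIT values 0 through 3, and because this summary does not say how T1 and T2 differ, the sketch assumes T1 is the upper part and T2 the lower DIT of I2 + I3 + I4. The function name `csa_5to2` is hypothetical.

```python
def csa_5to2(i0, i1, i2, i3, i4):
    """Behavioral model of a 5:2 carry-save adder on dit values (0..3).

    Returns (SUM, CARRY) with i0+i1+i2+i3+i4 == SUM + 4 * CARRY.
    """
    for value in (i0, i1, i2, i3, i4):
        if value not in range(4):
            raise ValueError("each input must be a dit value 0 through 3")

    # First level: three intermediate terms.
    t0 = i0 + i1              # 0..6
    upper_sum = i2 + i3 + i4  # 0..9
    t1 = upper_sum // 4       # ASSUMPTION: upper part of I2+I3+I4
    t2 = upper_sum % 4        # ASSUMPTION: lower dit of I2+I3+I4

    # Second level: SUM uses T0 and T2; CARRY uses T0, T1, and T2.
    sum_out = (t0 + t2) % 4
    carry_out = t1 + (t0 + t2) // 4
    return sum_out, carry_out
```

Under these assumptions CARRY has a weight of four, that is, exactly one DIT position, so routing it to the next column involves no one-bit shift. Five DITs sum to at most 15, which fits in a SUM of 3 plus a CARRY of 3.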