1. Technical Field
The present invention relates in general to a digital multiplier, and in particular to a Carry-Save Adder circuit utilized in a digital multiplier.
2. Description of the Related Art
Digital computer arithmetic involves the development of complex logic circuitry and of efficient algorithms to utilize the available hardware. Given that numbers in a digital computer are represented as strings of zeros and ones, and that hardware can perform only a relatively simple and primitive set of Boolean operations, all the arithmetic operations performed are based on a hierarchy of operations which are built upon the very simple ones. What distinguishes computer arithmetic is its intrinsic relation to technology and the way things are designed and implemented in a digital computer. This comes from the fact that the value of a particular way to compute, or a particular algorithm, is directly evaluated from the actual speed with which this computation is performed. Therefore, there is a very direct and strong relationship between the technology in which digital logic is implemented and the way the computation is structured. The greatest utility of the modern computer is to process large amounts of data in a relatively short amount of time, and the basic arithmetic operations are the building blocks of those computations. In the drive to produce ever faster computers, one of the critical speed limitations to overcome is the arithmetic logic unit speed. Therefore, any speed improvement in digital logic and the arithmetic logic unit, or in how the computation is structured, can directly affect modern computer speed.
Almost all multiplication operations in modern computer systems use the basic Wallace tree algorithm, with some adaptations and modifications of the implementation and the number system used. For example, consider a basic multiplication algorithm that operates on positive n-bit integers X and Y and produces the product P, which is 2n bits long:

$$P = XY = X \sum_{i=0}^{n-1} y_i r^i = \sum_{i=0}^{n-1} X y_i r^i$$
This expression indicates that the multiplication process is performed by summing n partial-product terms of the form $X y_i r^i$. The ith term is obtained by a simple arithmetic left shift of X by i positions and multiplication by the single digit $y_i$. For the binary radix r=2, $y_i$ is 0 or 1, and multiplication by the digit $y_i$ is very simple to perform. The addition of the n terms can be performed at once, by passing the partial products through an array of adders, or sequentially, by passing the partial product through an adder n times. The algorithm to perform multiplication of X and Y can be described as:

$$P^{(0)} = 0$$

$$P^{(j+1)} = \frac{1}{r}\left(P^{(j)} + r^{n} X y_j\right) \quad \text{for } j = 0, \ldots, n-1$$
It can be easily proved that this recurrence results in $P^{(n)} = XY$.
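As an illustration only, the recurrence can be sketched in software. The function name `shift_add_multiply` and its parameters are hypothetical and do not appear in the original description; the sketch assumes non-negative operands of n radix-r digits and is not a model of any particular hardware.

```python
def shift_add_multiply(x, y, n, r=2):
    """Multiply non-negative n-digit integers x and y using the recurrence
    P(0) = 0, P(j+1) = (P(j) + r**n * x * y_j) / r, for j = 0 .. n-1."""
    p = 0
    for j in range(n):
        y_j = (y // r**j) % r            # j-th radix-r digit of the multiplier
        # Add the partial product aligned to the top of P, then shift P
        # right by one digit. The division is always exact here.
        p = (p + r**n * x * y_j) // r
    return p


print(shift_add_multiply(13, 11, 4))  # 13 * 11
```

Each pass adds one partial product and shifts the accumulated result right by one digit, so after n passes the digit $y_j$ carries the weight $r^j$, as in the summation form of the product.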
Various modifications of the above basic multiplication algorithm exist. One of the most famous is the modified Booth recoding algorithm described by Booth. This algorithm reduces the number of partial products, thus speeding up the multiplication process. Generally speaking, the Booth algorithm is a case of using a redundant number system with a radix higher than 2.
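The text does not give the recoding details. The sketch below assumes the commonly used radix-4 form of modified Booth recoding applied to an unsigned n-bit multiplier; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and are used here only for illustration.

```python
def booth_radix4_digits(y, n):
    """Recode an unsigned n-bit integer y into radix-4 digits drawn from
    the redundant set {-2, -1, 0, 1, 2}, least significant digit first."""
    digits = []
    prev = 0  # bit y[-1], defined as 0
    for i in range(n // 2 + 1):  # one extra digit covers the zero extension
        b0 = (y >> (2 * i)) & 1
        b1 = (y >> (2 * i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Sum one partial product per recoded digit of the multiplier y."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(7, 4))  # digits of 7, least significant first
print(booth_multiply(13, 11, 4))  # 13 * 11
```

Under these assumptions an n-bit multiplier yields about n/2 recoded digits, so roughly half as many partial products must be summed as in the basic radix-2 algorithm.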
The basic multiplication algorithm, including the Booth algorithm, and the hardware implementations of a multiplier using these algorithms are well known by those skilled in the art. A detailed description of these and other algorithms and digital multipliers can be found in many textbooks on digital design--for example, "Computer Architecture, A Quantitative Approach", David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers Inc., 1989, incorporated herein by reference.
Arithmetic logic units (ALUs) are combinational logic circuits that can perform basic arithmetic (addition or subtraction) or logical (AND, OR, NOT, etc.) operations on two m-bit operands. ALUs may be constructed from standard integrated circuits or programmable logic devices, and are available as single-chip medium-scale integrated circuits as well as incorporated into single-chip microcomputers. Integrated ALUs may be cascaded to operate on longer word lengths than are available in a single device.
The basic building block for most arithmetic circuits, including ALUs, is the full adder, also known as the Carry-Save adder in one configuration. A Carry-Save adder is a logic circuit that produces the 2-bit sum (S and C) of three 1-bit binary numbers (X, Y and Z). Table 1 shows the truth table and logic equations of a full adder. Here, S is the sum signal, and C is the carry signal produced by the full adder. A logic symbol and a gate-level realization of a full adder are shown in FIG. 1A and FIG. 1B, respectively.
TABLE 1
______________________________________
X    Y    Z    S    C
______________________________________
0    0    0    0    0
0    0    1    1    0
0    1    0    1    0
0    1    1    0    1
1    0    0    1    0
1    0    1    0    1
1    1    0    0    1
1    1    1    1    1
______________________________________
S = XYZ + XY'Z' + X'YZ' + X'Y'Z
C = XY + XZ + YZ
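The logic equations of Table 1 can be checked with a short software sketch. The function name `full_adder` is hypothetical; the code simply evaluates the two equations and confirms that the output pair (C, S) encodes the arithmetic sum of the three input bits.

```python
from itertools import product


def full_adder(x, y, z):
    """Evaluate the Table 1 equations for single-bit inputs x, y, z."""
    nx, ny, nz = 1 - x, 1 - y, 1 - z           # complements X', Y', Z'
    s = (x & y & z) | (x & ny & nz) | (nx & y & nz) | (nx & ny & z)
    c = (x & y) | (x & z) | (y & z)
    return s, c


# Check every row: the 2-bit value (C, S) must equal X + Y + Z.
for x, y, z in product((0, 1), repeat=3):
    s, c = full_adder(x, y, z)
    assert 2 * c + s == x + y + z
```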
FIG. 2 shows a Sum Cell and a Carry Cell for implementing the logic of FIG. 1B using CMOS technology. Together, the Sum Cell and Carry Cell comprise a Carry-Save adder, as is well known by those skilled in the art. The Sum Cell and the Carry Cell each receive inputs X, Y, and Z, and the complement signals X', Y', and Z', to produce the sum signal S and the carry signal C, respectively. The operation of these circuits is well known and has been thoroughly explained in the prior art--for example, see Patterson-Hennessy, incorporated herein by reference. The traditional Carry-Save adder shown in FIG. 2 is a CMOS design whose outputs swing to full voltage levels.
In a multiplier array, many Carry-Save adder circuits are cascaded together to perform the partial product summation of the multiplication. Accordingly, each Carry-Save adder must reach its high or low output levels before propagating its signals to the next adder in the array. The speed of these circuits is directly related to the time it takes for their outputs to reach either the upper rail or the lower rail voltage. Because several Carry-Save adders can be cascaded together to add a column of many partial products, it would be desirable to provide a Carry-Save adder which significantly increases the speed with which the adder cascades its output to the next Carry-Save adder stage.
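The cascading described above can be sketched at the word level in software. The function names `carry_save_add` and `csa_multiply` are hypothetical, and the linear cascade shown is only one possible arrangement of the adders, assumed here for simplicity; the sketch models the arithmetic, not circuit timing or voltage levels.

```python
def carry_save_add(a, b, c):
    """Reduce three integers to a (sum, carry) pair. Each bit position acts
    as an independent full adder, so no carry ripples between positions."""
    s = a ^ b ^ c                                # per-bit sum outputs S
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carry outputs C, weighted by 2
    return s, carry


def csa_multiply(x, y, n):
    """Multiply non-negative n-bit integers by cascading carry-save stages."""
    partials = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    s, c = 0, 0
    for pp in partials:
        # Each stage consumes the previous stage's sum and carry outputs.
        s, c = carry_save_add(s, c, pp)
    return s + c  # a single carry-propagate addition at the end


print(csa_multiply(13, 11, 4))  # 13 * 11
```

Because every stage waits on the outputs of the stage before it, the delay of a single adder is multiplied by the depth of the cascade, which is why the output settling time of each Carry-Save adder matters for overall multiplier speed.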