1. Field of the Present Invention
The present invention generally relates to the field of digital circuits and more particularly to a 4:2 compressor circuit that facilitates computations in an arithmetic unit of a microprocessor.
2. History of Related Art
Data processing devices typically perform numeric multiplication in three general steps: (1) partial product generation; (2) partial product reduction; and (3) final addition. Multiplication of an n-bit number and an m-bit number generally produces a result up to n+m bits in length. For example, multiplication of a multiplicand of “11” and a multiplier of “11” yields a first partial product “11” and a second partial product “11.” See, e.g., Eisig et al., Method and Apparatus for Re-Configuring a Partial Product Reduction Tree, U.S. Pat. No. 5,343,416. The second partial product is shifted left by one bit position. The sum of the two partial products is the 4-bit result “1001.”
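The “11” times “11” example can be reproduced with a minimal sketch. The function name `partial_products` and its `width` parameter are illustrative assumptions rather than terms from the text, and the operands are treated as unsigned.

```python
def partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """Return one partial product per multiplier bit (unsigned operands).

    The partial product for multiplier bit i is the multiplicand shifted
    left by i positions when that bit is 1, and zero otherwise.
    """
    products = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        products.append((multiplicand << i) if bit else 0)
    return products


# Step 1: partial product generation for binary 11 x 11.
pps = partial_products(0b11, 0b11, width=2)
# Steps 2 and 3 (reduction and final addition) collapse to a plain sum here.
result = sum(pps)
print([format(p, "b") for p in pps], format(result, "04b"))
```

The second partial product appears as 110, that is, “11” shifted left by one bit position, and the sum is the 4-bit result 1001.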
As the number of bits in the operands increases, so does the number of partial products. Since speed is among the major factors in multiplier design, summing the partial products becomes problematic. When multiplying two sixty-four bit operands, for example, sixty-four partial products must be summed. Several methods exist for reducing the number of partial products.
A Booth decoding technique has been used to reduce the number of partial products by a factor of two or more. Even with a minimization scheme such as Booth, however, the problem of quickly adding the remaining partial products using a minimum amount of circuitry remains.
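The text does not say which Booth variant is meant. As an illustration only, the sketch below uses radix-4 Booth recoding, one common form, which halves the number of partial products by recoding a signed two's-complement multiplier into digits in the range -2 to 2. The function name and parameters are hypothetical.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Assumes `width` is even and `multiplier` fits in `width` bits.
    Returns width // 2 digits, least significant first, each in {-2, ..., 2}.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


multiplicand, multiplier = 23, -45
digits = booth_radix4_digits(multiplier, width=8)
# Each digit selects 0, +/-1x, or +/-2x the multiplicand, weighted by 4**k.
product = sum(d * multiplicand * 4**k for k, d in enumerate(digits))
print(len(digits), product == multiplicand * multiplier)
```

An 8-bit multiplier yields four digits, so four partial products remain to be summed instead of eight.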
A second approach, which may be used in conjunction with the first approach, is the implementation of Carry-Save-Adders (CSAs). A CSA is similar to a full adder in that it inputs three numbers and outputs two numbers. For this reason, a CSA is referred to herein as a 3:2 compressor. A tree of CSAs can be used to reduce a number of partial products to two numbers, which can then be summed by a standard Carry-Propagate Adder. For wide operands, however, the number of stages of 3:2 compressors required may result in excessive propagation delay. To address this problem, so-called 4:2 compressors have been used to reduce the propagation delay by reducing the number of stages.
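The reduction of many operands to two can be sketched as follows. The names `csa` and `reduce_to_two` and the simple left-to-right grouping are assumptions chosen for illustration; a real tree would be laid out for timing rather than built greedily.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor on whole words: returns (sum_word, carry_word).

    Each bit position acts as an independent full adder; the carries are
    shifted left one position instead of being propagated.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def reduce_to_two(operands: list[int]) -> tuple[int, int, int]:
    """Reduce operands to two numbers using layers of CSAs.

    Returns (x, y, stages) where x + y equals the sum of the operands and
    stages is the number of CSA layers used by this greedy grouping.
    """
    layer = list(operands)
    stages = 0
    while len(layer) > 2:
        full = len(layer) - len(layer) % 3  # operands that fill whole groups
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*layer[i:i + 3]))
        nxt.extend(layer[full:])  # leftovers pass through to the next layer
        layer = nxt
        stages += 1
    while len(layer) < 2:
        layer.append(0)
    return layer[0], layer[1], stages


operands = [3, 5, 7, 9, 11, 13, 15, 17]
x, y, stages = reduce_to_two(operands)
# The final x + y stands in for the Carry-Propagate Adder.
print(x + y == sum(operands), stages)
```

With this grouping, eight operands need four layers of 3:2 compressors and sixty-four operands need ten, which is the stage count that 4:2 compressors are intended to reduce.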
In a conventional implementation, 4:2 compressors employ complementary pass-gate logic (CPL). In CPL design, logic gates are implemented with transistors of a single polarity (typically n-channel) while transistors of the opposite polarity may be used to reduce the circuit's static current.
Referring to FIG. 16, an exclusive-or (EXOR) circuit 10 is depicted as implemented with a conventional CPL design. Circuit 10 receives input signals “a” and “b” and their corresponding complements (indicated by the prime mark). The “a” signal is connected to the gate electrodes of n-channel transistors 12 and 14 while the a′ signal is connected to the gate electrodes of n-channel transistors 16 and 18. The “b” signal is connected to the source electrodes of transistors 14 and 16 while the b′ signal is connected to the source electrodes of transistors 12 and 18. The drain terminals of transistors 12 and 16 are tied together at node 20 while the drain terminals of transistors 14 and 18 are tied together at node 22. It can be easily verified that node 20 is the exclusive-or (EXOR) of signals “a” and “b” while node 22 is the negated EXOR (XNOR). CPL circuit 10 further includes cross-coupled p-channel transistors connected to nodes 20 and 22 to reduce static current by imposing a high impedance channel between the power supply and the logically low input signal.
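The claim about nodes 20 and 22 can be checked with a purely logical model of the pass network. The sketch assumes ideal switches and ignores the threshold drop and the cross-coupled p-channel devices; the function name `cpl_exor` is illustrative.

```python
def cpl_exor(a: int, b: int) -> tuple[int, int]:
    """Logic-level model of CPL circuit 10: returns (node20, node22).

    Each n-channel transistor is modeled as an ideal switch that passes
    its source signal to the shared drain node when its gate is 1.
    """
    a_n, b_n = 1 - a, 1 - b  # complements a' and b'
    # Node 20: transistor 12 (gate a) passes b'; transistor 16 (gate a') passes b.
    node20 = b_n if a else b
    # Node 22: transistor 14 (gate a) passes b; transistor 18 (gate a') passes b'.
    node22 = b if a else b_n
    return node20, node22


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cpl_exor(a, b))
```

For every input pair exactly one transistor drives each node, so node 20 follows the EXOR of “a” and “b” and node 22 follows the XNOR.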
When a logical “1” is passed through the source/drain of the n-channel device in a CPL circuit, a voltage of Vdd−Vtn is produced, where Vdd is the supply voltage and Vtn is the n-channel threshold voltage. This passed voltage is typically restored through an inverter having a relatively weak p-channel device and a relatively strong n-channel device. The speed of a CPL circuit is strongly dependent on the “high” voltage that is applied to the gate of the n-channel device to turn it on. The higher the voltage applied at the gate, the harder the n-channel device is turned on and the lower the channel resistance. Reduced channel resistance translates into reduced RC delay. Moreover, a higher voltage applied at the gate translates into a higher output voltage produced at the output end of the circuit. The higher output voltage beneficially improves the ability of the inverter to generate a logical “0” because the Vgs of the inverter's n-channel device is larger. In summary, a higher “1” voltage results in a faster CPL circuit and, conversely, a lower “1” voltage results in a slower CPL circuit. Unfortunately, CPL circuits are typically affected by a number of factors that can decrease the “1” voltage, including coupling noise, delta-I noise, and DC voltage drop. Moreover, in silicon on insulator (SOI) devices, the voltage drop across the transistor tends to vary. This phenomenon is commonly referred to as the floating body effect or history effect, and it can have a negative effect on the switching times of SOI devices. For these reasons, it is hard to model and predict the circuit speed. Scaling means applying successive generations of lower supply voltage process technology to the same circuit design. Unfortunately, scaling also means lower supply voltages that reduce the speed of CPL circuits, thereby making them less scalable.
It would be desirable to implement a multiplier that optimizes speed without undue expense in the form of a very complex or very large circuit. It would be further desirable if the implemented design were scalable and less dependent upon gate voltage than traditional CPL circuits.
The problem described above is addressed in the present invention by a compressor circuit, suitable for use in an arithmetic unit of a microprocessor, that includes a first stage, a second stage, a carry circuit, and a sum circuit. The first stage is configured to receive a set of four input signals. The first stage generates a first intermediate signal indicative of the XNOR of a first pair of the input signals and a second intermediate signal indicative of the XNOR of a second pair of the input signals. The second stage is configured to receive at least a portion of the signals generated by the first stage. The second stage generates first and second control signals, where the first control signal is indicative of the XNOR of the four input signals and the second control signal is the logical complement of the first control signal. The carry circuit is configured to receive at least one of the control signals and further configured to generate a carry bit based at least in part on the state of the received control signal. The sum circuit is configured to receive at least one of the control signals and further configured to generate a sum bit based at least in part on the state of the received control signal. At least one of the first stage, second stage, sum circuit, and carry circuit includes at least one CMOS transmission gate comprised of an n-channel transistor and a p-channel transistor having their source/drain terminals connected in parallel, wherein the p-channel transistor gate is driven by the logical complement of the n-channel transistor gate. In one embodiment, the first stage, second stage, carry circuit, and sum circuit are comprised primarily of such transmission gates to the exclusion of conventional CMOS complementary pass-gate logic.
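The staged structure described above can be illustrated with a behavioral sketch. This is not the claimed circuit: it follows the conventional multiplexer-based 4:2 compressor, and the carry-in `cin`, the carry-out `cout`, the choice of input pairs, and the multiplexer selections are assumptions drawn from that conventional design rather than from the text. The first control signal is taken to be the complement of the four-input exclusive-or.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    return 1 - (a ^ b)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int = 0):
    """Behavioral model of a conventional 4:2 compressor bit slice.

    Returns (sum_bit, carry, cout) such that
    x1 + x2 + x3 + x4 + cin == sum_bit + 2 * (carry + cout).
    """
    # First stage: XNOR of each input pair (pairing is an assumption).
    n12 = xnor(x1, x2)
    n34 = xnor(x3, x4)
    # Second stage: control signal and its complement.
    ctl = xnor(n12, n34)   # 1 when an even number of the four inputs are 1
    ctl_n = 1 - ctl        # 1 when an odd number of the four inputs are 1
    # Sum circuit: a multiplexer selected by the control signals.
    sum_bit = cin if ctl else 1 - cin
    # Carry circuit: a multiplexer selected by the control signals.
    carry = cin if ctl_n else x4
    # Carry-out to the neighboring slice; it does not depend on cin.
    cout = x1 if n12 else x3
    return sum_bit, carry, cout


# Exhaustive check of the arithmetic identity over all 32 input combinations.
ok = all(
    sum(bits) == s + 2 * (c + co)
    for bits in product((0, 1), repeat=5)
    for s, c, co in [compressor_4_2(*bits)]
)
print(ok)
```

Each multiplexer in the model corresponds to a point where a pair of transmission gates, driven by a control signal and its complement, could select between two inputs.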