1. Field of the Invention
The present invention relates to computer architecture, and more specifically to arithmetic operations with a digital computer.
2. Description of the Related Art
The speed of computer systems has increased dramatically in recent years. Processor designs have become more efficient, and smaller substrate feature sizes and improved designs have allowed the achievement of speeds that had been thought impossible only a few years previously. However, the computer industry continues to drive toward even greater speeds in the future.
Early generations of logic circuit families, based on bipolar transistors, P-channel field effect transistors (PFETs or p-channel devices), and N-channel field effect transistors (NFETs or n-channel devices), have given way to processor designs using a logic circuit family known as CMOS (Complementary Metal Oxide Semiconductor). A traditional CMOS logic gate consists of a pair of complementary transistors where one transistor is a P-channel field effect transistor and the other transistor is an N-channel field effect transistor.
CMOS gained rapid favor for its ease of construction and simple design rules as well as its tolerance for noise and low power consumption. Power consumption in CMOS occurs only during the switching of the FETs. As a result of its wide popularity, most manufacturing capacity and design research investment in the last several years went into CMOS, which eventually overtook the other types of logic circuit families in nearly every category. Today, most people regard CMOS as the clear winner and preferred choice for virtually every semiconductor logic design task.
The advantage of the CMOS logic family, that it consumes power only when the FETs are switching, was limited to the older circuits that were slow by today's standards, and has become its primary disadvantage as clock rates have increased. Because every switching event consumes power, the drive for faster clock rates means that the same CMOS circuit that used so little power in the past now requires ever increasing power. Typical CMOS processor designs have been known to consume power in the neighborhood of 50 watts or more. Such power demands (and their related heat dissipation problems) make designing computer systems very difficult.
Another logic family, non-inverting dynamic logic (also called domino logic, or asymmetrical CMOS), has lent itself to very high clock rates. Circuits within the non-inverting dynamic logic family have typically implemented each signal as a pair of wires or datapaths, providing all information in both true and complemented form. Twice as many wires or datapaths have been required as in a similar traditional CMOS design, because dynamic logic generally has not allowed inverted signals. Boolean AND, NAND, OR, NOR, and other well-known functions have been implemented in non-inverting dynamic logic using typical CMOS gates with non-inverted signals. For example, U.S. Pat. No. 5,208,490 to Yetter et al. and U.S. Pat. No. 5,640,108 to Miller describe methods for improving the speed and/or accuracy (de-glitching) of dynamic logic circuits. However, the power consumption of the logic family remains problematic.
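By way of illustration only, the true/complement signal pairing described above can be modeled behaviorally as follows. This is a minimal sketch and not a circuit description; the names DualRail, encode, dr_and, dr_or, dr_not, and dr_nand are hypothetical and do not appear in the cited patents. It assumes the common dual-rail convention in which the complement wire of an AND is the OR of the input complement wires (and vice versa), so that no inverter is needed.

```python
from typing import NamedTuple


class DualRail(NamedTuple):
    """One logical signal carried on two wires (hypothetical model)."""
    t: int  # wire carrying the true form
    c: int  # wire carrying the complemented form


def encode(bit: int) -> DualRail:
    """Build a dual-rail signal from a single bit at the circuit input."""
    bit &= 1
    return DualRail(bit, bit ^ 1)


def dr_and(a: DualRail, b: DualRail) -> DualRail:
    # True wire: AND of true wires. Complement wire: OR of complement wires.
    return DualRail(a.t & b.t, a.c | b.c)


def dr_or(a: DualRail, b: DualRail) -> DualRail:
    # True wire: OR of true wires. Complement wire: AND of complement wires.
    return DualRail(a.t | b.t, a.c & b.c)


def dr_not(a: DualRail) -> DualRail:
    # Inversion is only a swap of the two wires; no inverting gate is used.
    return DualRail(a.c, a.t)


def dr_nand(a: DualRail, b: DualRail) -> DualRail:
    return dr_not(dr_and(a, b))
```

In this model every gate uses only non-inverting operations on the individual wires, which is why each signal requires two wires rather than one.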
Synthesized multiplication implemented through repeated addition is extremely slow; for a 32-bit multiplication, 32 adds and 32 shifts would be required. Multicycle partial multipliers, which implement hardware to perform a portion of the multiplication (for example, 32 times 8 bits) in a single cycle, have improved multiplication latency dramatically, but typically have not been able to be pipelined, since all the hardware must be used in four successive cycles to produce a full product. Full multipliers, containing hardware sufficient to compute a full product (64 bits, following the 32-bit example), have been implemented to avoid recycling results, and consequently have improved multiplication throughput (that is, number of results produced per cycle), albeit at the expense of additional hardware cost. The superior performance of full multipliers has made these devices a common implementation choice for contemporary microprocessors. The additional hardware cost of a full multiplier has been mitigated somewhat by shrinking device sizes and larger transistor budgets.
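The difference between the first two approaches may be sketched in software as follows. This is an illustrative model under stated assumptions, not a description of any particular hardware: the function names shift_add_multiply and partial_multiply_32x8 are hypothetical, operands are assumed to be unsigned 32-bit values, and the 32 times 8 bit partial multiplier is modeled simply with an integer multiply.

```python
MASK32 = 0xFFFFFFFF  # assumed unsigned 32-bit operands


def shift_add_multiply(a: int, b: int) -> int:
    """Synthesized multiply: 32 iterations, each with a shift and an
    add that is performed only when the current multiplier bit is 1."""
    a &= MASK32
    b &= MASK32
    product = 0
    for _ in range(32):
        if b & 1:
            product += a  # add the shifted multiplicand
        a <<= 1           # shift multiplicand left by one bit
        b >>= 1           # move to the next multiplier bit
    return product


def partial_multiply_32x8(a: int, b: int) -> int:
    """Multicycle multiply: the same 32x8 unit is reused for four
    successive passes, one per 8-bit slice of the multiplier."""
    a &= MASK32
    b &= MASK32
    product = 0
    for cycle in range(4):
        chunk = (b >> (8 * cycle)) & 0xFF        # 8 multiplier bits
        product += (a * chunk) << (8 * cycle)    # accumulate partial product
    return product
```

Both functions return the full 64-bit product. The second loop reuses one unit for all four passes, which models why such a multiplier cannot accept a new operand pair every cycle.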