Embodiments of the invention relate to the field of semiconductor devices and, more particularly, to an adder circuit and method for use in an integrated circuit component, such as a microprocessor, and in a digital computer system.
Users of digital computers have a virtually insatiable demand for computers that operate at ever faster clock speeds, that are increasingly light and portable, and that therefore must run on relatively low power. Manufacturers of digital computers, and of the microprocessor “brains” that go into them, are thus constantly looking for ways to increase processing speed without requiring more power.
Digital addition constitutes a fundamental operation of virtually all microprocessors and digital computer systems, not only to provide basic addition functions but also to provide many other logical operations. Addition, and other arithmetic operations, are generally performed by an arithmetic logic unit (ALU) contained within the computer's processor unit.
Digital addition is also one of the performance-limiting operations in a microprocessor's internal circuitry, and it therefore has been a significant focus of high-performance ALU research over recent years.
FIG. 1 illustrates a simplified block diagram of a prior art Kogge-Stone adder, shown generally as 1. In this example, two 4-bit signals Ai and Bi are added together to form a SUM Si. The Ai and Bi addends are fed into a Propagate/Generate circuit 2 along with a carry-in signal Cin, which also is fed unchanged into SUM circuit 8 as carry signal C0.
The propagate and generate signals are generated within circuit 2, and they are subsequently output to a binary tree structure for calculating the carries. In this example, the tree structure comprises circuit 4, which calculates a first level (gx, px) of generate and propagate terms along with carry signals C1 and C2, and it further comprises circuit 6, which calculates a second level (gy, py) of generate and propagate terms along with carry signals C3 and C4. Carry signals C1-C3 are referred to as bit-carry signals, and carry signal C4 is referred to as a sum-carry signal.
The multi-level tree structure is characteristic of Kogge-Stone adder architecture, and it is used to perform what is referred to as “carry-merging”, “propagate/generate merging”, or simply “P/G merging”. As the number of bits in the addends increases, so does the number of levels in the carry propagation tree. In general, if N represents the addend bit-width, the number of P/G merging levels required is log₂N (e.g., a bit-width of 16 requires 4 levels).
Addends Ai and Bi along with Carry signals C0 through C3 are summed in SUM circuit 8 to form the SUM Si. The C4 output of circuit 6 represents the carry signal for Si.
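The data flow of FIG. 1 can be sketched in software. The following Python model is illustrative only: the function name `kogge_stone_add`, its parameters, and the choice to fold the carry-in into the bit-0 generate term are assumptions made for this sketch, not details taken from the figure. It forms bitwise propagate and generate terms, merges them over log₂N levels, and then XORs the resulting carries with the propagate terms to form the sum.

```python
def kogge_stone_add(a, b, cin=0, width=4):
    """Behavioral sketch of a Kogge-Stone adder (hypothetical helper).

    Returns (sum_bits, carry_out, levels) for two unsigned width-bit addends.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    cin &= 1

    # Bitwise propagate and generate terms (the role of circuit 2).
    p = a ^ b
    g = a & b

    # Assumption of this sketch: the carry-in is folded into bit 0's
    # generate term, so the tree depth stays at log2(width).
    G = g | (p & cin)
    P = p

    # P/G merging: each level combines a bit's group with the group
    # ending `shift` positions below it, doubling the span covered.
    levels = 0
    shift = 1
    while shift < width:
        G = (G | (P & (G << shift))) & mask
        P = (P & (P << shift)) & mask
        shift <<= 1
        levels += 1

    # After merging, bit i of G is the carry out of bit i, so the carry
    # into bit i is bit i-1 of G, with cin entering bit 0.
    carries = ((G << 1) | cin) & mask
    sum_bits = p ^ carries
    carry_out = (G >> (width - 1)) & 1
    return sum_bits, carry_out, levels
```

With `width=4` the loop runs two merging levels, and with `width=16` it runs four, consistent with the log₂N relationship stated above.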
The Kogge-Stone adder is widely used in microprocessor ALUs, due in part to the predictable log₂N depth of the carry propagation tree, and in part to the limitation of fan-out at every stage to two, which helps keep device sizes significantly smaller (and the design more energy efficient) than in other comparable architectures.
A known prior art circuit for implementing the Kogge-Stone adder in microprocessor ALUs is the fully-differential (also referred to as “dual rail”) domino circuit. Here, both true and complementary inputs are required. The dual-rail domino circuit consumes these differential inputs and delivers differential SUM and SUM′ outputs. (In the description the complement of a term or expression will be indicated either by a “prime” ′ following or by a bar over the term or expression.)
A significant reason for generating both true and complementary sum outputs is that a microprocessor ALU has to perform both addition and subtraction operations using the same adder in a single cycle. Since the subtraction operation (A−B) in two's complement arithmetic is performed as (A+B′+1), differential outputs are necessary.
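The two's complement identity behind this can be checked with a short Python sketch. The helper name `subtract_via_adder` and the 4-bit default width are assumptions for illustration; the sketch shows only the arithmetic identity A−B = A+B′+1 on a fixed-width word, not any circuit detail.

```python
def subtract_via_adder(a, b, width=4):
    """Compute (a - b) modulo 2**width using only complement and add.

    Hypothetical helper illustrating A - B = A + B' + 1.
    """
    mask = (1 << width) - 1
    b_complement = ~b & mask                 # B': bitwise complement of B
    raw = (a & mask) + b_complement + 1      # A + B' + 1
    return raw & mask                        # discard the carry out


# Usage: the result matches ordinary subtraction modulo 2**width.
difference = subtract_via_adder(9, 3)
```

The "+1" term corresponds to a carry-in of one, which is why a single adder can serve both operations when the complemented operand is available.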
FIG. 2 illustrates a schematic diagram of a prior art fully-differential domino circuit 20. Circuit 20 is implemented in Complementary Metal Oxide Semiconductor (CMOS), and it includes a pair of P-type Metal Oxide Semiconductor (PMOS) transistors 22 and 24 coupled to the power supply voltage Vcc; a pair of inverter circuits 26 and 28; an N-type Metal Oxide Semiconductor (NMOS) Combinational Network 30; and an NMOS transistor 32 coupled to ground Vss. By way of example, three sets of complementary inputs A and A′, B and B′, and C and C′, are shown input into Combinational Network 30. Complementary outputs Q and Q′ are output from inverters 28 and 26, respectively. A clock signal is applied to the gates of P-type transistors 22 and 24, and it is also applied to the gate of N-type transistor 32.
The operation of fully-differential domino circuit 20 is well known to those of ordinary skill in the art. It is also well known how to implement a Kogge-Stone adder using fully-differential domino circuits as building blocks.
The use of fully-differential domino circuits requires a significant amount of circuit wiring layout, circuit area, and circuit complexity for performing complementary logic functions.
There is a substantial need in the semiconductor art for a fast, low-power domino circuit which is less complex and more efficient in terms of the amount of circuit wiring and area consumed.
In addition, there is a substantial need in the computer art for a microprocessor, and for a digital computer incorporating a microprocessor, which operate at very high speed and consume relatively little power.
Accordingly, in one embodiment of the invention there is provided an adder circuit comprising at least one single-ended domino circuit (also referred to herein as a “single-rail” domino circuit), and at least one dual-function generator circuit coupled to the at least one single-ended domino circuit and which generates differential sum and sum-complement output signals.
In another embodiment of the invention there is provided a processor comprising an arithmetic logic unit. The arithmetic logic unit includes an adder circuit comprising at least one single-ended domino circuit, and at least one dual-function generator circuit coupled to the at least one single-ended domino circuit and which generates differential sum and sum-complement output signals.
In yet another embodiment of the invention there is provided an integrated circuit comprising a processor having an arithmetic logic unit. The arithmetic logic unit includes an adder circuit comprising at least one single-ended domino circuit, and at least one dual-function generator circuit coupled to the at least one single-ended domino circuit and which generates differential sum and sum-complement output signals.
In a further embodiment of the invention there is provided a data processing system comprising a bus coupling components in the data processing system. A display and an external memory are coupled to the bus. Also coupled to the bus is a microprocessor comprising an arithmetic logic unit. The arithmetic logic unit includes an adder circuit comprising at least one single-ended domino circuit, and at least one dual-function generator circuit coupled to the at least one single-ended domino circuit and which generates differential sum and sum-complement output signals.
Yet a further embodiment of the invention includes a method of adding numbers, A and B, each having a plurality of bits. The method includes generating propagate and generate signals from single-ended expressions of A and B, generating differential carry signals from the propagate and generate signals, and producing differential sum and sum-complement output signals from the differential carry signals and from single-ended expressions of A and B.
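At the logic level, the final step of this method can be illustrated for a single bit position. The Python sketch below is an assumption-laden illustration, not a description of the dual-function generator circuit itself: the function name `differential_sum_bit` and its parameters are hypothetical, and it relies only on the Boolean fact that complementing the carry input of an XOR complements its output.

```python
def differential_sum_bit(a, b, carry, carry_bar):
    """Form a sum bit and its complement for one bit position.

    Hypothetical helper: a and b are single-ended addend bits, while
    carry and carry_bar are the true and complementary carry-in bits.
    """
    p = a ^ b                  # single-ended propagate term
    sum_bit = p ^ carry        # true sum output
    sum_bar = p ^ carry_bar    # complementary sum output
    return sum_bit, sum_bar


# Usage: exhaustively confirm the two outputs are always complementary.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, s_bar = differential_sum_bit(a, b, c, c ^ 1)
            assert s == (a + b + c) & 1
            assert s_bar == s ^ 1
```

Only the carry needs to be available in both polarities here; the addend bits enter in true form alone.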
Other embodiments are described and claimed.