In the design of microprocessors, it has generally been desirable to provide fast, low-power operation. One way of providing a fast ALU is to use a tree structure for carry generation, which results in a carry propagation delay proportional to $\log(N)$, where $N$ is the number of bits in the ALU. The base of the logarithm is the number of bits combined at each node in the tree. For example, if two bits are combined at each node in the tree and the processor is 16 bits wide, then the propagation delay through the ALU is proportional to $\log_2(16) = 4$.
An example of a tree structure for carry generation is provided in "Digital CMOS Circuit Design" by Marco Annaratone, pages 204-209, where FIG. 6-34 at page 207 illustrates an internal cell having a tree structure for carry generation. The equation at page 207 of that reference shows that noninverting logic is used. Also, the tree structure illustrated has a fanout of at least five for a 16-bit ALU.
Another known ALU scheme is disclosed in U.S. Pat. No. 4,559,608. This patent relates to a CMOS ALU and discloses a look ahead carry circuit using inverting logic.
The present invention provides a full-function ALU capable of performing the logical functions of one or more input variables on a bit-by-bit basis and of providing the sum and difference of the inputs, with or without a carry-in or borrow-in. Table 1 lists the logical and arithmetic functions defined for an implementation of the ALU with two input variables. Other combinations of the K and L terms and carry-in are possible, depending on the needs of the user.
TABLE 1 (~X denotes the complement of X)
K3 K2 K1 K0  L2 L1 L0  CIN  P         G    S            FUNCTION
0  0  0  0   0  0  0   0    0         0    0            Logical (0)
0  0  0  1   0  0  0   0    AB        0    AB           Logical (A AND B)
0  0  1  0   0  0  0   0    A~B       0    A~B          Logical (A AND ~B)
0  0  1  1   0  0  0   0    A         0    A            Logical (A)
0  1  0  0   0  0  0   0    ~AB       0    ~AB          Logical (~A AND B)
0  1  0  1   0  0  0   0    B         0    B            Logical (B)
0  1  1  0   0  0  0   0    A XOR B   0    A XOR B      Logical (A EXCLUSIVE OR B)
0  1  1  1   0  0  0   0    A OR B    0    A OR B       Logical (A OR B)
1  0  0  0   0  0  0   0    ~A~B      0    ~A~B         Logical (~A AND ~B) or (A NOR B)
1  0  0  1   0  0  0   0    A XNOR B  0    A XNOR B     Logical (A EXCLUSIVE NOR B)
1  0  1  0   0  0  0   0    ~B        0    ~B           Logical (~B)
1  0  1  1   0  0  0   0    A OR ~B   0    A OR ~B      Logical (A OR ~B)
1  1  0  0   0  0  0   0    ~A        0    ~A           Logical (~A)
1  1  0  1   0  0  0   0    ~A OR B   0    ~A OR B      Logical (~A OR B)
1  1  1  0   0  0  0   0    ~A OR ~B  0    ~A OR ~B     Logical (~A OR ~B) or (A NAND B)
1  1  1  1   0  0  0   0    1         0    1            Logical (1)
0  1  1  0   0  0  1   0    A XOR B   AB   A + B        Sum (A plus B)
0  1  1  0   0  0  1   1    A XOR B   AB   A + B + CIN  Sum (A plus B plus CARRY IN)
1  0  0  1   0  1  0   1    A XNOR B  A~B  A - B        Difference (A minus B)
1  0  0  1   0  1  0   0    A XNOR B  A~B  A - B - CIN  Difference (A minus B minus BORROW IN)
1  0  0  1   1  0  0   1    A XNOR B  ~AB  B - A        Difference (B minus A)
1  0  0  1   1  0  0   0    A XNOR B  ~AB  B - A - CIN  Difference (B minus A minus BORROW IN)
In the difference rows the subtracted operand is complemented (P = A XNOR B) and CIN supplies the +1 of its two's-complement negation, so CIN = 1 corresponds to no borrow in and CIN = 0 to a borrow in.
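To make the encoding of Table 1 concrete, the following Python sketch evaluates a row from its K, L and CIN bits. It assumes, by inference from the table rows rather than from any statement in the text, that K0 through K3 give P for the input pairs (A, B) = (1, 1), (1, 0), (0, 1) and (0, 0), and that L0 through L2 give G for the first three of those pairs. The function names are hypothetical, and a plain ripple carry stands in for the tree-structured carry circuitry described later.

```python
# Behavioral sketch of Table 1 (not the patented circuit).
# Assumed bit mapping, inferred from the table rows: the (A, B) pair selects
# which bit of the K code (for P) or the L code (for G) is used.
TRUTH_INDEX = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}


def select(code: int, a: int, b: int) -> int:
    """Return the bit of `code` selected by the input pair (a, b)."""
    return (code >> TRUTH_INDEX[(a, b)]) & 1


def alu(k: int, l: int, cin: int, a: int, b: int, width: int = 16) -> int:
    """Evaluate one Table 1 row on width-bit operands a and b.

    k packs K3..K0 and l packs L2..L0; P comes from K and G from L.
    A simple ripple carry is used here purely for clarity.
    """
    carry = cin
    result = 0
    for n in range(width):
        a_n = (a >> n) & 1
        b_n = (b >> n) & 1
        p_n = select(k, a_n, b_n)        # propagate term P_N
        g_n = select(l, a_n, b_n)        # generate term G_N
        result |= (p_n ^ carry) << n     # S_N = P_N XOR C_{N-1}
        carry = g_n | (p_n & carry)      # C_N = G_N OR P_N C_{N-1}
    return result


if __name__ == "__main__":
    a, b = 0x5A, 0x3C
    print(hex(alu(0b0001, 0b000, 0, a, b, 8)))   # row: A AND B
    print(hex(alu(0b0110, 0b001, 0, a, b, 8)))   # row: A plus B
    print(hex(alu(0b1001, 0b010, 1, a, b, 8)))   # row: A minus B
```

With L = 000 and CIN = 0 no carry is ever generated, so S reduces to P, which is why the logical rows list the same expression in the P and S columns.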
In the description of the invention it will be convenient to refer to various variables that are commonly used in ALU design. These terms are defined as follows:
Arithmetic Operations:
$S_N$ = sum from the $N$th bit
$C_{N-1}$ = carry into the $N$th bit
$C_N$ = carry from the $N$th bit
$A_N$ = A input to the $N$th bit
$P_N$ = propagate term of the $N$th bit
$B_N$ = B input to the $N$th bit
$G_N$ = generate term of the $N$th bit
The benefits of the invention may be readily illustrated with respect to an operation where:
$$S_N = A_N \text{ XOR } B_N \text{ XOR } C_{N-1}$$
$$C_N = G_N \text{ OR } P_N C_{N-1}$$
$$G_N = A_N B_N$$
To understand the operation of the ALU of the present invention, note that $G_N$ is the term under which a carry is generated independently of the carry-in, and $P_N$ is the term that causes the carry-in to be propagated to the next bit position. Because $G_N$ already produces a carry when $A_N = B_N = 1$, the value of $P_N$ for that input combination does not affect $C_N$. Therefore, there are two possible implementations for $P_N$:
(1) $P_N = A_N \text{ OR } B_N$ (OR represents the INCLUSIVE OR function)
(2) $P_N = A_N \text{ XOR } B_N$ (XOR represents the EXCLUSIVE OR function)
The first implementation generally allows $P_N$ to be generated faster than the second, but it cannot be used to generate the sum $S_N$ directly. The second implementation generates a $P_N$ that can be used to generate both $C_N$ and $S_N$ directly, since:
$$S_N = A_N \text{ XOR } B_N \text{ XOR } C_{N-1} = P_N \text{ XOR } C_{N-1}$$
Therefore:
$$P_N = A_N \text{ XOR } B_N$$
$$G_N = A_N B_N$$
$$C_N = G_N \text{ OR } P_N C_{N-1}$$
$$S_N = P_N \text{ XOR } C_{N-1}$$
Once these terms have been derived, the carry propagation is performed. There are three generally recognized methods of propagating the carry:
(1) Ripple Carry
(2) Look Ahead Carry
(3) Tree Structured Carry
In ripple carry, a carry generated in the least significant bit is serially propagated to each higher-order bit. The total delay $t_P$ is generally:
$$t_P = K_1 + n K_2$$
where $K_1$ and $K_2$ are constants and $n$ is the number of bits. Thus, the delay of the carry through the carry propagation circuitry is proportional to the number of bits $n$.
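A minimal ripple-carry sketch (hypothetical function names) confirms two points made above: both definitions of $P_N$ yield identical carries, because they differ only when $A_N = B_N = 1$, where $G_N$ already forces the carry; and only the XOR form gives $S_N$ directly. The loop makes one serial carry step per bit, which is the source of the $n K_2$ term.

```python
def ripple_carries(a: int, b: int, cin: int, width: int, or_propagate: bool) -> list:
    """Return [C_0, ..., C_{width-1}], computed one bit after another."""
    carries = []
    carry = cin  # the carry into bit 0 is the carry-in
    for n in range(width):
        a_n = (a >> n) & 1
        b_n = (b >> n) & 1
        g_n = a_n & b_n  # G_N = A_N B_N
        # Implementation (1): P_N = A_N OR B_N; implementation (2): P_N = A_N XOR B_N.
        p_n = (a_n | b_n) if or_propagate else (a_n ^ b_n)
        carry = g_n | (p_n & carry)  # one serial stage per bit
        carries.append(carry)
    return carries


def ripple_add(a: int, b: int, cin: int, width: int) -> int:
    """Sum using the XOR propagate term: S_N = P_N XOR C_{N-1}."""
    carries_in = [cin] + ripple_carries(a, b, cin, width, or_propagate=False)[:-1]
    total = 0
    for n in range(width):
        p_n = ((a >> n) ^ (b >> n)) & 1
        total |= (p_n ^ carries_in[n]) << n
    return total


if __name__ == "__main__":
    print(ripple_add(0b1011, 0b0110, 0, 4))  # 4-bit sum with the carry out dropped
```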
In look ahead carry, a carry generated from an $m$-bit group of bits is serially propagated to each higher-order group, skipping over the bits $m$ at a time. The total propagation delay is generally of the form:
$$t_P = K_1 + \frac{n}{m} K_2$$
where $m$ is the number of bits per group. Here too, the delay through the carry propagation circuitry is proportional to the number of bits $n$.
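A group look-ahead scheme can be modeled in the same style. This sketch (hypothetical names, XOR propagate terms) forms each $m$-bit group's combined propagate and generate terms and then steps the carry from group to group; the returned step count, $\lceil n/m \rceil$, corresponds to the $\frac{n}{m} K_2$ term. The inner loops only evaluate logic values; in hardware the group terms, and the carries inside a group, would be formed in parallel.

```python
def group_lookahead_carries(a: int, b: int, cin: int, width: int, m: int):
    """Return ([C_0, ..., C_{width-1}], number of serial group-to-group steps)."""
    p = [((a >> n) ^ (b >> n)) & 1 for n in range(width)]
    g = [((a >> n) & (b >> n)) & 1 for n in range(width)]
    carries = [0] * width
    carry_in = cin
    steps = 0
    for start in range(0, width, m):
        end = min(start + m, width)
        # Group terms: the group propagates only if every bit propagates,
        # and generates if some bit generates and all bits above it propagate.
        group_p, group_g = 1, 0
        for n in range(start, end):
            group_g = g[n] | (p[n] & group_g)
            group_p &= p[n]
        # Carries inside the group follow from the group's carry-in.
        c = carry_in
        for n in range(start, end):
            c = g[n] | (p[n] & c)
            carries[n] = c
        # The carry skips the whole group in one step.
        carry_in = group_g | (group_p & carry_in)
        steps += 1
    return carries, steps
```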
The tree structure computes carries by combining carries in groups of $m$ bits; the groups are in turn combined, from the least to the most significant bit, until the carry for a particular bit position is computed. All carries are computed in parallel. The general form of the propagation delay is:
$$t_P = K_1 + K_2 \log_m(n)$$
This implementation generally requires the most hardware but gives the fastest results, because the delay grows as $\log_m(n)$ rather than in proportion to $n$ as in ripple carry and look ahead carry. It should be noted that the constants $K_1$ and $K_2$ for ripple carry, look ahead carry and tree structured carry are not necessarily the same.
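The three delay expressions can be compared numerically. In this sketch $K_1 = K_2 = 1$ are placeholder values only (the text notes that the real constants differ between schemes), $n/m$ is rounded up when $m$ does not divide $n$, and $\log_m(n)$ is computed as an integer number of tree levels to avoid floating-point rounding. For $n = 16$ and $m = 2$ the tree depth is $\log_2(16) = 4$, matching the example given at the start of this section.

```python
import math


def tree_levels(n: int, m: int) -> int:
    """Smallest L with m**L >= n, i.e. ceil(log_m(n)), using integers only."""
    if m < 2:
        raise ValueError("each tree node must combine at least two inputs")
    levels, span = 0, 1
    while span < n:
        span *= m
        levels += 1
    return levels


def delays(n: int, m: int, k1: float = 1.0, k2: float = 1.0) -> dict:
    """Illustrative t_P for each scheme; k1 and k2 are placeholder constants."""
    return {
        "ripple": k1 + n * k2,
        "look_ahead": k1 + math.ceil(n / m) * k2,
        "tree": k1 + tree_levels(n, m) * k2,
    }


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, delays(n, m=2))
```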
A tree structure ALU generates all propagate and generate terms in parallel and then combines the propagate and generate terms of bit position $N$ with those of the lower-order bits to form the complete carry term $C_N$.
An analysis of the logic functions needed to derive $C_N$ is shown below:
$$C_N = G_N \text{ OR } P_N C_{N-1}$$
However, $C_{N-1}$ is itself a function of the lower-order terms, so expanding it recursively gives:
$$C_N = G_N \text{ OR } P_N \bigl(G_{N-1} \text{ OR } P_{N-1} \bigl(G_{N-2} \text{ OR } P_{N-2} \bigl(G_{N-3} \text{ OR } \cdots \bigl(G_0 \text{ OR } P_0 C_{IN}\bigr) \cdots \bigr)\bigr)\bigr)$$
Expanding again reveals:
$$C_N = G_N \text{ OR } P_N G_{N-1} \text{ OR } P_N P_{N-1} G_{N-2} \text{ OR } P_N P_{N-1} P_{N-2} G_{N-3} \text{ OR } \cdots \text{ OR } P_N \cdots P_1 G_0 \text{ OR } P_N \cdots P_0 C_{IN}$$
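The expanded form can be checked against the recursive definition by brute force. The sketch below (hypothetical helper names; P and G given as lists indexed from bit 0) evaluates both expressions over every combination of 4-bit P and G vectors and carry-in. Note that the last product term, $P_N \cdots P_0 C_{IN}$, has $N + 2$ inputs, so the width of the product terms in the flat form grows with $N$.

```python
def carry_recursive(p: list, g: list, cin: int, n: int) -> int:
    """C_N from C_N = G_N OR P_N C_{N-1}, starting from C_IN."""
    c = cin
    for i in range(n + 1):
        c = g[i] | (p[i] & c)
    return c


def carry_expanded(p: list, g: list, cin: int, n: int) -> int:
    """C_N from the sum-of-products form.

    OR over i of (P_N ... P_{i+1}) G_i, plus the term P_N ... P_0 C_IN.
    """
    result = 0
    for i in range(n, -1, -1):
        prefix = 1
        for j in range(i + 1, n + 1):  # empty product when i == n
            prefix &= p[j]
        result |= prefix & g[i]
    all_p = 1
    for j in range(n + 1):
        all_p &= p[j]
    return result | (all_p & cin)
```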
Two bit positions' propagate and generate terms can be combined as follows:
$$P_N' = P_N P_{N-1}$$
$$G_N' = G_N \text{ OR } P_N G_{N-1}$$
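The pairwise combination above is associative, so it can be applied in a tree. The following sketch uses a Kogge-Stone-style parallel prefix network, chosen here only as one well-known tree topology; the text does not specify this arrangement, and the function names are hypothetical. $C_{IN}$ is treated as an extra position 0 with P = 0 and G = $C_{IN}$, so after $\lceil \log_2(n + 1) \rceil$ combining levels the G term at each position is the carry out of that bit.

```python
def combine(hi: tuple, lo: tuple) -> tuple:
    """Join span hi = (P, G) with the adjacent lower span lo = (P, G)."""
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    # P' = P_N P_{N-1};  G' = G_N OR P_N G_{N-1}
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def tree_carries(a: int, b: int, cin: int, width: int):
    """Return ([C_0, ..., C_{width-1}], number of combining levels)."""
    # Position 0 is a pseudo-bit carrying C_IN; position n + 1 holds bit n.
    nodes = [(0, cin)] + [
        (((a >> n) ^ (b >> n)) & 1, ((a >> n) & (b >> n)) & 1) for n in range(width)
    ]
    levels, distance = 0, 1
    while distance < len(nodes):
        # Every node combines with the node `distance` positions below it,
        # using the previous level's values; all combinations are independent.
        nodes = [
            combine(nodes[i], nodes[i - distance]) if i >= distance else nodes[i]
            for i in range(len(nodes))
        ]
        distance *= 2
        levels += 1
    carries = [g for _, g in nodes[1:]]  # G over bits 0..n, including C_IN, is C_n
    return carries, levels
```

Each level doubles the span covered by every node, which is the source of the $\log_m(n)$ growth with $m = 2$.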
The $G_N'$ terms are central to the carry portion of the ALU and are generated in complex gates. Since the complex gates used in the carry circuitry, as well as in the P and G generate circuit and the output circuit, each execute multiple logical combinations in a single gate, shorthand notations are used to identify their functions. For instance, the gates shown in FIGS. 1C (AND/NOR) and 1D (OR/NAND) are used in the carry circuitry as the logical elements that make up the tree structure. Since it will be necessary to identify the various inputs to these complex gates, reference will be made to the AND inputs xxx and the NOR input xxx of the AND/NOR gates and to the OR inputs xxx and the NAND input xxx of the OR/NAND gates, as shown in FIGS. 1C and 1D. Alternatively, the signals provided to these inputs will simply be referred to as the input P and G terms or P and G signals. FIGS. 1A-1D illustrate various logic symbols for gates, and FIGS. 2A-2F illustrate various implementations of these gates.
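The complex gates can be modeled as AND-OR-INVERT and OR-AND-INVERT functions. The sketch below assumes AND/NOR computes NOT((x AND y) OR z) and OR/NAND computes NOT((x OR y) AND z); which P or G signal drives which input is an assumption, since FIGS. 1C and 1D are not reproduced here. It checks that an AND/NOR gate fed true-polarity terms produces the complement of $G_N'$, and that an OR/NAND gate fed complemented terms restores $G_N'$ itself, with a NAND and a NOR playing the same roles for $P_N'$. Alternating the two gate types is thus one way to build successive tree levels in inverting logic.

```python
def and_nor(x: int, y: int, z: int) -> int:
    """AND/NOR (AND-OR-INVERT): NOT((x AND y) OR z)."""
    return 1 - ((x & y) | z)


def or_nand(x: int, y: int, z: int) -> int:
    """OR/NAND (OR-AND-INVERT): NOT((x OR y) AND z)."""
    return 1 - ((x | y) & z)


def nand2(x: int, y: int) -> int:
    return 1 - (x & y)


def nor2(x: int, y: int) -> int:
    return 1 - (x | y)


def combine_true_inputs(p_n: int, g_n: int, p_m: int, g_m: int) -> tuple:
    """Level fed true terms of bits N and M = N-1; returns (NOT P_N', NOT G_N')."""
    return nand2(p_n, p_m), and_nor(p_n, g_m, g_n)


def combine_inverted_inputs(pb_n: int, gb_n: int, pb_m: int, gb_m: int) -> tuple:
    """Level fed complemented (XBAR) terms; returns (P_N', G_N') in true form."""
    return nor2(pb_n, pb_m), or_nand(pb_n, gb_m, gb_n)
```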
According to the present invention, an ALU can be designed from a repeatable cell which contains the necessary components for a given number of bits of the ALU. Thus, if the cell contains the necessary circuitry for two bits of the ALU, a 32-bit ALU can be built by providing 16 repeats of the cell and providing the appropriate interconnections.
In order to simplify the cell yet ensure that all necessary components are included, it is necessary to provide not only the logic gates needed for the specific task of manipulating a single bit (or two bits in a two-bit arrangement) but also the circuitry needed to interconnect adjacent cells.
It is an object of the present invention to provide an Arithmetic Logic Unit with reduced delay.
It is another object of the invention to provide a cell based ALU design which includes minimum excess circuitry.
It is still another object of the invention to provide a cell design for an ALU which contains all the circuitry necessary for fabrication of an ALU without additional circuitry.
It is yet another object of the invention to provide a cell layout which provides a minimum number of levels of devices in the physical structure of the ALU formed by use of the cell.
It is a still further object of the invention to provide a cell design which is compatible with an ALU design technique having minimum propagation delay as a feature of the ALU.
Another object of the invention is the provision of a cell based ALU having a tree-based carry portion with inverting logic and fanout limited to a specific design factor for optimizing throughput.
These and other objects of the invention are attained by providing a cell structure having a propagate and generate portion, a carry portion and an output portion.
The propagate and generate portion of the invention provides P and G terms to the carry and output portions and the carry portion provides a carry term to the output portion. In one implementation, the carry and output portions are combined while in a more generic approach, the carry portion is separate from the output portion.
The carry portion of the cell includes a number of gates whose inputs and outputs are individually accessible during the design stages, so that these inputs and outputs may be selected to perform needed functions in the ALU or left unused, depending on the location of the cell in the ALU design. Providing individually accessible gates in the cell yields a smaller cell than could previously be achieved with ALU building blocks that did not contain such gates.
One aspect of the invention involves the provision of a carry portion which has a plurality of inverters which are totally individually accessible, as well as the provision of at least one totally individually accessible complex logic gate. A cell having these individually accessible components can be utilized in a flexible configuration such that the identical cell can be used to implement the circuitry for the bits at any location in an ALU.
The use of a complex AND/NOR gate and a complex OR/NAND gate in the carry portion of the cell, in combination with a plurality of inverters and at least one NOR gate and one NAND gate, provides the necessary circuitry for a 1-bit cell that can be repeated for each bit of an ALU of virtually unlimited bit length. In this version of the cell, it may be desirable to leave each input and each output of the components unconnected so that the designer can provide the proper interconnection of the cell's components for the particular location of the cell in the ALU. Note that the interconnection of the components will vary as the cell is used at different locations in the ALU.
The present invention is suitable for use with either inverting or noninverting logic but is ideally suited to inverting logic, such as is used in the design of CMOS microprocessors. Thus, another feature of the invention is that it provides a means for designing an ALU in which each level of logic is inverted with respect to the preceding level, even for those signals that have not been inverted by a logical operation. This is the reason for providing the inverters in the carry portion of the circuitry. Another aspect of the invention is that it provides a means for keeping the maximum fanout from any gate within a manageable limit. The inverters used to obtain the proper logic inversions are conveniently used for the separate and unrelated purpose of driving multiple gates where the fanout from the preceding gate would otherwise be excessive. This provides a substantial speed advantage over ALU designs with excessive fanout. According to the present invention, ALUs of any arbitrary bit length may be fabricated without exceeding a fanout of three in the carry circuitry.
It is intended that the present invention be implemented in many alternative ways, all based on the general principle that a simple cell containing the components necessary for fabricating a multicell ALU can be designed with certain dedicated interconnections of the components in the cell and with various components of the cell left unconnected until the ALU design is assembled, at which time the cell's individually accessible inputs and outputs can be interconnected as necessary to optimize the ALU design. For purposes of this description, individually accessible means that an input or output of a gate is not connected to any other component in the cell design until a multicell ALU is designed. Providing the cell with the necessary building blocks, together with the flexibility to use those building blocks in a wide variety of ways without the constraints imposed by preconnecting the individually accessible components, results in substantial savings of time and effort in ALU design and fabrication. Referring to FIG. 8, it can be seen that OR/NAND gate $G_{33}$ has each of its inputs 846, 847 and 848, as well as its output 849, individually accessible. This isolated, individually accessible gate provides great flexibility in the design of ALUs having any number of cell repeats. For purposes of this description, a gate such as this, which has none of its inputs or outputs preconnected, is referred to as totally individually accessible. An inverted signal is equivalently referred to as $\overline{X}$ or XBAR in this description.
Other objects, advantages and novel features of the invention are described herein with respect to the various specific implementations of the invention.