Exemplary embodiments of the present invention relate to an asynchronous full adder, and more particularly to an enhanced technique that employs a dual-rail scheme for the datapath, including the arithmetic-logic unit (ALU), of an asynchronous microprocessor.
An ALU of a microprocessor, which performs arithmetic and logic operations such as addition, subtraction, logical OR and AND, includes a full adder, an accumulator register, a flag register, etc. ALUs are designed using schemes chosen with regard to the application, performance, power consumption, chip area and other factors. For example, the related art includes a ripple-carry adder and a carry-lookahead adder. The ripple-carry adder is slightly inferior to the carry-lookahead adder in operation speed. However, the ripple-carry adder occupies a smaller chip area, and is therefore most typically used in microprocessor design. In synchronous design, the datapath circuit including an adder is driven by a global clock. Therefore, where the delay of the ripple-carry chain is sufficiently small relative to the clock period, the operation speed of the microprocessor depends solely on the delay of the critical path. Hence, the ripple-carry adder, which occupies a smaller chip area, has an advantage over the carry-lookahead adder.
Related art adders are designed as part of a datapath that is driven by a global clock, so that addition under the worst-case condition must be completed within one or more cycles of the global clock, satisfying the setup time and hold time relative to, for example, the rising edge of the global clock signal. For example, in an 8-bit ripple-carry adder, which is constituted by connecting eight 1-bit full adders, the longest delay, which follows the 8-stage carry chain, poses a problem in timing design. The delay of the 8-bit ripple-carry adder may roughly be regarded as eight times the delay of the 1-bit full adder. Preferably, the transistors are sized to reduce the carry-chain delay, and the nominal delay is then determined by timing simulation, such as SPICE simulation. Usually, iterating the sizing across circuit and physical design for speed/area/power trade-offs is unavoidable and time-consuming.
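By way of illustration only, the structure of an 8-bit ripple-carry adder built from eight 1-bit full adders, together with the rough eight-times delay estimate, may be modeled behaviorally as in the following sketch. The function names (full_adder, ripple_carry_add, worst_case_delay) and the unit-delay parameter t_fa are assumptions introduced for this example; the sketch is not a circuit-level description and does not replace timing simulation.

```python
def full_adder(a, b, cin):
    """Behavioral model of a 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=8, cin=0):
    """Add two unsigned width-bit integers by chaining 1-bit full adders.

    The carry out of stage i feeds the carry in of stage i + 1,
    which is what forms the carry chain.
    """
    carry = cin
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


def worst_case_delay(width, t_fa):
    """Rough estimate: the carry passes through every stage once.

    t_fa is a hypothetical per-stage delay in arbitrary units.
    """
    return width * t_fa
```

For example, ripple_carry_add(0xFF, 0x01) exercises the full 8-stage carry chain, and worst_case_delay(8, t_fa) corresponds to the rough eight-times estimate given above.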
In synchronous design, or worst-case design, the longest delay of the 8-stage carry chain is assumed to be constant, in the sense that it must always be accommodated within the clock period no matter how the transistors are sized or what type of scheme is adopted. Consequently, the delay of computation is independent of the addends and the result. As mentioned above, synchronous design, which is used in the related art, involves the foregoing problem across the circuit and physical design process, as well as the clock-skew problem as described in U.S. Pat. No. 3,290,511.
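As a hedged illustration of the worst-case assumption, the following sketch counts how many consecutive stages a carry actually ripples through for given addends, and compares that with the fixed budget of all stages that a worst-case design reserves. The function names longest_carry_chain and synchronous_budget are hypothetical, the carry input to the first stage is assumed to be 0, and the stage count is only a behavioral proxy for delay.

```python
def longest_carry_chain(x, y, width=8):
    """Longest run of consecutive stages that a single carry travels through.

    A stage with a = b = 1 generates a carry (a new run starts),
    a stage with a != b propagates an incoming carry (the run grows),
    and a stage with a = b = 0 kills it (the run ends).
    Assumes the carry into stage 0 is 0.
    """
    longest = 0
    run = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        if a & b:
            run = 1            # generate: carry out does not depend on carry in
        elif (a ^ b) and run > 0:
            run += 1           # propagate: incoming carry ripples onward
        else:
            run = 0            # kill, or nothing to propagate
        longest = max(longest, run)
    return longest


def synchronous_budget(width=8):
    """Worst-case design reserves time for every stage, whatever the addends."""
    return width


if __name__ == "__main__":
    for x, y in [(0xFF, 0x01), (0x0F, 0x01), (0x01, 0x01), (0x55, 0x2A)]:
        print(f"{x:#04x} + {y:#04x}: chain = {longest_carry_chain(x, y)}, "
              f"budget = {synchronous_budget()}")
```

Under these assumptions, only addends such as 0xFF and 0x01 exercise the full chain, whereas the budget reserved by the worst-case design is the same for every pair of addends.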