Binary addition is the single most important operation that a computer processor performs, and it has been thoroughly investigated since the beginning of computing. The performance of a processor is significantly influenced by the speed of its adders. M. A. Franklin and T. Pan (Performance Comparison of Asynchronous Adders, in Proc. of Int'l Symp. Advanced Research in Asynchronous Circuits and Systems, pp. 117-125, November 1994) show that in a prototypical RISC machine (DLX; see J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1990), 72 percent of the instructions perform additions (or subtractions) in the datapath. J. D. Garside (A CMOS VLSI Implementation of an Asynchronous ALU, Asynchronous Design Methodologies, S. Furber and M. Edwards, eds., vol. A-28 of IFIP Trans., pp. 181-207, 1993) reports that this figure reaches 80 percent in ARM processors.
Adders can be sequential or combinatorial. Because sequential adders are inherently slow owing to their incremental operation, they are not considered for parallel, fast addition. The basic building block of a combinatorial digital adder is a single-bit adder. The half-adder (HA) is the simplest single-bit adder. The full-adder (FA) is a single-bit adder with a carry input and a carry output. A full-adder is typically composed of two HAs and is hence more expensive than a half-adder in terms of area, time and interconnection complexity.
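The relationship between the two single-bit adders can be sketched in software. The following is a minimal bit-level model, not a circuit description; the function names `half_adder` and `full_adder` are illustrative, and the OR gate that merges the two half-adder carries is the usual textbook construction, assumed here.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for two input bits plus a carry input.

    Built from two half-adders; an OR gate merges their carries
    (at most one of the two carries can be 1 at a time).
    """
    partial_sum, carry_1 = half_adder(a, b)
    total_sum, carry_2 = half_adder(partial_sum, carry_in)
    return total_sum, carry_1 | carry_2
```

For example, `full_adder(1, 1, 1)` yields a sum bit of 1 and a carry-out of 1, which encodes the value 3.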
The most common approach to designing a multi-bit adder is to form a chain of FA blocks by connecting the carry-out bit of each FA to the carry-in bit of the next FA block. This arrangement is known as a Ripple Carry Adder (RCA). The delay of an RCA increases linearly with the number of bits. However, for a small number of bits (≤4) it remains the most efficient design and is therefore the designers' choice, as explained by N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd Edition, Addison-Wesley Pub., 1994. Many different combinatorial adders have been designed to improve on the efficiency of the basic RCA, and some of them exploit the possible parallelism of the addition operation.
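The ripple-carry structure described above can be modelled as a simple loop in which each stage waits for the carry of the previous one. This is a hedged sketch: the helper names `to_bits`, `from_bits` and `ripple_carry_add` are hypothetical, and bit lists are assumed to be little-endian (least significant bit first).

```python
def to_bits(value, width):
    """Convert a non-negative integer to a little-endian list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert a little-endian list of bits back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists; return (sum_bits, carry_out).

    The carry is passed serially from stage to stage, so the number of
    loop iterations (the model's stand-in for delay) is linear in the width.
    """
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)           # full-adder sum
        carry = (a & b) | (carry & (a ^ b))      # full-adder carry-out
    return sum_bits, carry
```

As a usage example, `ripple_carry_add(to_bits(5, 4), to_bits(6, 4))` returns the bits of 11 with a carry-out of 0.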
As described by R. E. Ladner and M. J. Fischer (Parallel Prefix Computation, Journal of the ACM, 27(4), pp. 831-838, October 1980), addition is a special prefix problem, which means that each sum bit depends on all input bits of equal or lower significance. This dependency makes it difficult to implement a parallel algorithm for addition. However, the flow of bits can be carefully arranged for a tree-structured implementation of the adder, which can reduce the addition overhead significantly. Carry Look Ahead, Carry Select and Carry Skip adders belong to this category of adders. Carry Save adders, on the other hand, avoid carry propagation altogether by employing a redundant number representation. Eventually, however, the redundant number needs to be converted to a non-redundant representation by a carry-propagate adder, which eliminates much of the earlier gain.
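The prefix view of addition mentioned above can be illustrated with generate/propagate signals combined over doubling distances. The sketch below is a generic software model of one well-known prefix arrangement (often associated with Kogge-Stone adders), chosen for brevity; it is not taken from the cited references, and the name `prefix_add` is hypothetical. Bit lists are assumed to be non-empty and little-endian, with a carry-in of 0.

```python
def prefix_add(a_bits, b_bits):
    """Add two equal-length bit lists using a parallel-prefix carry network.

    Returns (sum_bits, carry_out, levels), where `levels` counts the
    combining stages; it grows logarithmically with the width.
    """
    n = len(a_bits)
    generate = [a & b for a, b in zip(a_bits, b_bits)]
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]

    group_g, group_p = generate[:], propagate[:]
    distance, levels = 1, 0
    while distance < n:
        next_g, next_p = group_g[:], group_p[:]
        # Every position i could be updated at the same time in hardware.
        for i in range(distance, n):
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - distance])
            next_p[i] = group_p[i] & group_p[i - distance]
        group_g, group_p = next_g, next_p
        distance *= 2
        levels += 1

    # group_g[i] is now the carry out of bit i; shift it to get carries in.
    carries_in = [0] + group_g[:-1]
    sum_bits = [p ^ c for p, c in zip(propagate, carries_in)]
    return sum_bits, group_g[-1], levels
```

For 8-bit operands the model uses 3 combining levels, compared with 8 serial stages in a ripple-carry chain, at the cost of many more combining cells and wires.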
Apart from the theoretically best possible adder design, implementation issues such as circuit complexity and fabrication limitations also play a crucial role in circuit design. High circuit complexity or an irregular layout can render a design infeasible for VLSI fabrication. Moreover, the number of gate inputs that a single signal can drive is limited, which is known as the fan-out limitation. Fan-out also incurs extra delay, because the load capacitance increases with fan-out. Power dissipation is another important practical issue that limits the number of interconnections in a VLSI implementation.
As reported by Fu-C. Cheng, S. H. Unger and Michael Theobald (Self-Timed Carry-Lookahead Adders, IEEE Transactions on Computers, 49(7), pp. 659-672, July 2000), the best parallel adder can perform addition in time proportional to the logarithm of the logarithm of the number of bits. Typically, area and interconnection efficiency are traded off to achieve logarithmic or sub-logarithmic performance. Thus, it remains a challenge for researchers to achieve a fast adder with low area and interconnection requirements.
The present invention discloses a recursive formulation for a PArallel Self-Timed Adder (PASTA). The design of PASTA is regular and uses HAs along with multiplexers, with minimal interconnection requirements. The interconnection and area requirements are thus linear, which makes the design practical to fabricate in a VLSI chip. The design works in a truly parallel manner for those bits that do not require carry propagation. For operands with a large number of bits, the carry chains are logarithmic in length and hence significantly shorter than the operand width (B. Gilchrist, J. H. Pomerene, and S. Y. Wong, Fast Carry Logic for Digital Computers, IRE Trans. Electronic Computers, 4(4): 133-136, December 1955). Hence, theoretically, the adder can perform in logarithmic time. It is self-timed, meaning that it signals the completion of the addition as soon as it is done, thereby overcoming clocking limitations.
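The idea of a half-adder-based recursion with completion signalling can be illustrated in software. The text above does not spell out the recursion, so the sketch below is an assumption: it uses the generic scheme of repeatedly applying half-adders to the current sum and carry words until no carry remains, and it should not be read as the disclosed circuit. The name `iterative_half_adder_add` and its parameters are hypothetical.

```python
def iterative_half_adder_add(a, b, width):
    """Add two `width`-bit integers by repeated half-adder steps.

    Returns (sum, carry_out, iterations). The loop stops as soon as the
    carry word is all zeros, which models self-timed completion detection.
    """
    mask = (1 << width) - 1

    # Initial step: one half-adder per bit position, all in parallel.
    sum_word = (a ^ b) & mask
    carry_word = (a & b) << 1                 # each carry feeds the next bit
    carry_out = (carry_word >> width) & 1
    carry_word &= mask

    iterations = 0
    while carry_word != 0:                    # completion: no carries pending
        sum_word, carry_word = sum_word ^ carry_word, (sum_word & carry_word) << 1
        carry_out |= (carry_word >> width) & 1
        carry_word &= mask
        iterations += 1
    return sum_word, carry_out, iterations
```

In this model, operands with no carry interaction, such as 5 and 2, finish with zero extra iterations, whereas 15 plus 1 on 4 bits needs 3 iterations because the carry travels through a longer chain. The iteration count therefore tracks the longest carry chain of the particular operands rather than the operand width.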