1. Field of the Invention
This invention relates to binary adding apparatus for adding together a plurality of multi-bit input words to produce a multi-bit output word.
2. Description of the Prior Art
FIG. 1 of the accompanying drawings shows how such an apparatus may be constructed. In the case of FIG. 1 it is assumed that, for example, two 8-bit input words are to be added, one of the words comprising bits A0 to A7 (in ascending order of significance) and the other of the words comprising bits B0 to B7 (in ascending order of significance). The apparatus comprises eight single-bit full adders FA each having inputs A and B, a carry-in input CI, a sum output Q and a carry-out output CO. The bits A0 to A7 and B0 to B7 of the input words are applied to the inputs A and B of the full adders FA as shown. The carry-in input CI of the full adder FA of least significance is connected to ground (GND). The full adders FA are connected in cascade. That is to say, the carry-out output CO of each full adder FA (except, of course, for that of most significance) is connected to the carry-in input CI of the full adder of next higher significance. As will be evident, the result is that the full adders FA add the input words to produce an output word comprising eight bits E0 to E7, in ascending order of significance, together with a carry bit. As shown, the bits E0 to E7 are produced at the sum output terminals Q of the full adders FA, in ascending order of significance, with the carry bit being produced at the carry-out output CO of the full adder of most significance.
Due to the cascaded manner of connection of the full adders FA of the apparatus of FIG. 1, the time taken for the output word to be generated is the sum of the propagation delays of all eight full adders. This is because each full adder FA can only begin to operate when it receives a carry-out bit from the carry-out output CO of the full adder of immediately lower significance. That is, the output word is not generated until after the carry-out bits have propagated through from the least significant bit (LSB) to the most significant bit (MSB).
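By way of illustration only, the cascaded arrangement of FIG. 1 may be modelled in software as in the following minimal sketch. The function names (full_adder, ripple_carry_add, to_bits, from_bits) are hypothetical and do not appear in the drawings, and the delay figure returned is simply the number of full adders on the carry chain, on the assumption that all full adders have the same propagation delay.

```python
def full_adder(a, b, ci):
    """Single-bit full adder FA: returns (sum output Q, carry-out CO)."""
    q = a ^ b ^ ci
    co = (a & b) | (ci & (a ^ b))
    return q, co


def to_bits(value, width=8):
    """Split an integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from a list of bits, least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length words given as bit lists (A0.., B0..), LSB first.

    Returns (E bits LSB first, carry bit, carry-chain length in
    full-adder delays).
    """
    carry = 0  # CI of the least significant full adder is tied to GND
    e_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each adder needs the carry-out of the adder below it.
        q, carry = full_adder(a, b, carry)
        e_bits.append(q)
    return e_bits, carry, len(a_bits)


e_bits, carry_bit, delays = ripple_carry_add(to_bits(200), to_bits(100))
print(from_bits(e_bits), carry_bit, delays)
```

For the 8-bit words 200 and 100 the sketch yields bits E0 to E7 representing 44 together with a carry bit of 1 (that is, 300), and a carry chain eight full adders long, which corresponds to the delay discussed above.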
The above-described speed limitation may be overcome, at the cost of introducing considerable extra hardware into the apparatus, by using so-called pipelining architecture. According to this technique, the full adders are interconnected in cascaded groups, the number of full adders in each group being such that the total propagation delay through each group is less than a predetermined period. The groups are separated by sets of latches which are controlled by a clock generator, which produces a clock signal whose period is the above-mentioned predetermined period, whereby all the latches output simultaneously, once per clock period (i.e. on receipt of each clock pulse of the clock signal), the bits supplied thereto. FIG. 2 of the accompanying drawings shows how such a pipelined adding apparatus might be constructed. The apparatus of FIG. 2, like that of FIG. 1, adds two 8-bit input words comprising bits A0 to A7 and B0 to B7, respectively, to produce an output word comprising bits E0 to E7 (and a carry bit). The apparatus of FIG. 2 comprises eight single-bit full adders FA like those of FIG. 1. Additionally, the apparatus of FIG. 2 comprises 48 latches LA which effect the pipelining and afford temporal pre-equalizing and post-equalizing of the bits to insure that the bits E0 to E7 (and the carry bit) of the output word have the same timing. The latches LA are arranged in first to fourth sets (each set comprising those latches shown mutually vertically aligned in FIG. 2) with an addition stage comprising a respective one of first to fourth groups of two cascaded full adders FA preceding each set. A clock generator CK is connected to all of the latches LA to supply thereto a clock signal comprising clock pulses spaced by a clock period which is greater than the total propagation delay through any one of the groups of cascaded full adders FA.
The apparatus of FIG. 2 functions as follows. Upon generation of a first clock pulse by the clock generator CK, the bits A0 to A7 and B0 to B7 of the input words are supplied to the apparatus as shown. The bits A2 to A7 and B2 to B7 go to pre-equalizing latches of the first set of latches LA. The bits A0, A1, B0 and B1 go to the inputs A and B of the first cascaded group of full adders FA. The total propagation delay through the first cascaded group of full adders FA is twice the propagation delay through one full adder, the propagation delay through all of the full adders being substantially the same. This total propagation delay is less than the clock period. Accordingly, prior to generation of the next clock pulse, the sum output bit and carry-out bit provided by the outputs Q and CO, respectively, of the full adder receiving the input bits A1 and B1, and the sum output bit of the full adder receiving the input bits A0 and B0, have been latched into associated latches of the first set of the latches LA.
When the next clock pulse is generated, the bits stored in the latches LA of the first set are outputted and, during the following clock period, the input bits A2, B2, A3 and B3 (and the carry-out bit from the preceding stage) are added in the second group of two cascaded full adders. This process is then repeated twice more whereby, after a total of four clock periods or cycles, an output word is developed at the output of the fourth set of latches LA. The pre-equalizing latches (those preceding the full adders) and the post-equalizing latches (those following the full adders) insure that the bits of the output words as outputted from the fourth set of latches have the same timing, in which regard it should be appreciated that bits of successive pairs of input words may be inputted during each successive clock cycle whereby an output word may be produced during each clock cycle.
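The clock-by-clock behaviour described above may be illustrated by the following minimal software sketch, offered as a model under stated assumptions rather than as a description of the circuit of FIG. 2 itself. The names GROUP, STAGES, stage and run_pipeline are hypothetical. Each set of latches is represented by a single tuple holding the input words, the sum bits formed so far and the carry-out bit of the preceding stage, and all sets are updated together on each clock pulse.

```python
GROUP = 2   # full adders per addition stage, as in FIG. 2
STAGES = 4  # addition stages, each followed by a set of latches


def full_adder(a, b, ci):
    """Single-bit full adder FA: returns (sum output Q, carry-out CO)."""
    return a ^ b ^ ci, (a & b) | (ci & (a ^ b))


def stage(state, k):
    """Addition stage k (0-based): adds bits 2k and 2k+1 of A and B."""
    a, b, e, carry = state
    for i in range(k * GROUP, (k + 1) * GROUP):
        q, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        e |= q << i
    return a, b, e, carry


def run_pipeline(pairs):
    """Feed one (A, B) pair per clock pulse.

    Returns a list of (clock, E, carry) tuples, one for each output word
    appearing at the output of the last set of latches.
    """
    latches = [None] * STAGES
    results = []
    stream = list(pairs) + [None] * STAGES  # extra pulses to flush
    for clock, item in enumerate(stream):
        # The last latch set presents its contents on this clock pulse.
        if latches[-1] is not None:
            _, _, e, carry = latches[-1]
            results.append((clock, e, carry))
        # All latch sets load simultaneously from the stage preceding them.
        first = None if item is None else (item[0], item[1], 0, 0)
        inputs = [first] + latches[:-1]
        latches = [None if s is None else stage(s, k)
                   for k, s in enumerate(inputs)]
    return results


print(run_pipeline([(200, 100), (15, 1)]))
```

In this model a pair of words entered on clock pulse 0 emerges on clock pulse 4, and the pair entered on clock pulse 1 emerges on clock pulse 5, consistent with a delay of four clock periods and a throughput of one output word per clock cycle.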
The apparatus of FIG. 2 is subject to the following disadvantages. Firstly, it involves the use of a large number of latches. In this regard, the pre-equalizing latches are used very inefficiently because two latches are needed for each of the bits at each pipelining stage. For example, the bits A7 and B7 require the use of six equalizing latches before they are added together. Secondly, it involves a long delay (four clock periods or cycles) between input and output. While the number of latches and the delay might be reducible if it were possible to cascade more than two full adders between the sets of latches, the disadvantages will still remain. For any given number of cascaded adders in the successive stages, the above disadvantages tend to get worse if the number of bits of the input words is increased beyond eight. Further, the employment of a large number of latches involves a further disadvantage when the apparatus is embodied using one or more programmable gate arrays (PGAs), which are also known as programmable logic arrays. A PGA comprises a chip having an array of configurable logic blocks (CLBs), one of which is shown schematically in FIG. 3 of the accompanying drawings. Each CLB comprises a programmable logic or gate section (PLS) and a latch LA, and the CLBs (and the PLSs and latches thereof) can be connected together in any desired manner. In each CLB, it is possible to use only the PLS or only the latch LA. Alternatively, it is possible to use both the PLS and latch LA, the former feeding an output signal into the latter. The PLS can be programmed to perform any one of a variety of desired logical operations on up to (say) four input signals supplied thereto. (Alternatively, the PLS may be programmed to perform any two separate ones of a variety of desired logical operations on up to three input signals, provided that the input signals for the two operations are the same.) 
A PGA of the above-described type (referred to as a logic cell array) is available from Monolithic Memories of Santa Clara, Calif., under the type numbers M2064/M2018.
It is possible to embody the apparatus of FIG. 2 in a PGA of the type mentioned above. Thus, the full adders FA can be implemented by appropriate programming of an appropriate number of PLSs. To maximize hardware use, at least some of the latches LA immediately following the full adders FA can be the latches of the same CLBs whose PLSs form the full adders. However, the bulk of the latches LA of FIG. 2 would comprise the latches of individual CLBs whose PLSs would not be used. There would therefore be a very great wastage of available hardware.
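The proportion of latches that serve only for equalizing can be estimated with the following minimal sketch. It assumes the arrangement described for FIG. 2, namely that each set of latches holds the A and B bits not yet added, the sum bits formed by earlier stages, and the sum bits and carry-out bit formed by the stage immediately preceding it. The function name count_latches and its parameters are hypothetical, and the word width is assumed to be a multiple of the group size.

```python
def count_latches(width=8, group=2):
    """Count latches per set for a FIG. 2 style pipelined adder.

    Returns a list with one dict per latch set:
      pre   - pre-equalizing latches (A and B bits not yet added)
      post  - post-equalizing latches (sum bits from earlier stages)
      fresh - latches fed directly by this stage's full adders
              (its sum bits plus its carry-out bit)
    """
    stages = width // group
    sets = []
    for s in range(1, stages + 1):
        sets.append({
            "pre": 2 * (width - s * group),
            "post": (s - 1) * group,
            "fresh": group + 1,
        })
    return sets


sets = count_latches()
total = sum(sum(entry.values()) for entry in sets)
equalizing = sum(entry["pre"] + entry["post"] for entry in sets)
print(total, equalizing)
```

With the default values the sketch gives 48 latches in total, in agreement with FIG. 2, of which 36 are pre-equalizing or post-equalizing latches and only 12 are fed directly by a full adder output.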
An object of the invention is to provide a pipelined binary adding apparatus in which the number of latches is reduced.
Another object of the invention is to provide a pipelined binary adding apparatus which is well adapted to implementation in the form of a programmable gate array in a manner which minimizes hardware wastage.
A further object of the invention is to provide a pipelined binary adding apparatus having a reduced delay between input and output.