1. Field of the Invention
The present invention relates to a squaring circuit for squaring an n-bit binary number X, hereinafter denoted [x.sub.n-1, x.sub.n-2, . . . x.sub.0 ].
2. Discussion of the Related Art
To square a binary number X, a table of squares stored in a non-volatile memory (ROM) is commonly used, the number X being applied to the address lines of the ROM. To store the squares of all n-bit numbers X, the ROM must hold 2.sup.n words of 2n bits each. Such a ROM table occupies a particularly large silicon area.
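The growth of such a table can be illustrated with a short Python sketch. The function name rom_table_bits is hypothetical and chosen only for this illustration; the sketch counts bits and says nothing about the actual silicon area of a given technology.

```python
def rom_table_bits(n):
    """Total number of bits in a ROM holding the square of every n-bit number."""
    words = 2 ** n          # one word per possible value of X
    bits_per_word = 2 * n   # the square of an n-bit number fits in 2n bits
    return words * bits_per_word


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, rom_table_bits(n))
```

For n = 16, for example, the table holds 65,536 words of 32 bits, that is, 2,097,152 bits.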
Furthermore, a ROM has a long access time. With current common technologies, successive words in a ROM cannot be accessed at frequencies as high as approximately 50 MHz.
Another method for squaring a number X is to use a binary multiplier that receives the number X on both of its inputs. A specific conventional binary multiplier, referred to as a Booth's multiplier, whose operation is described below, is advantageous in that it is fast and occupies a relatively small area.
When a binary number X whose value is x.sub.0 2.sup.0 +x.sub.1 2.sup.1 + . . . +x.sub.i 2.sup.i + . . . +x.sub.n-1 2.sup.n-1 is multiplied by a binary number Y whose value is y.sub.0 2.sup.0 +y.sub.1 2.sup.1 + . . . +y.sub.i 2.sup.i + . . . +y.sub.n-1 2.sup.n-1, the product XY is equal to the sum of all the products x.sub.i y.sub.j 2.sup.i+j, where x.sub.i =0 or 1, y.sub.j =0 or 1, and i and j each vary between 0 and n-1.
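The expansion of the product into terms x.sub.i y.sub.j 2.sup.i+j can be checked numerically. The following Python sketch is only an illustration; the helper name product_from_bit_products is hypothetical.

```python
def product_from_bit_products(x, y, n):
    """Rebuild x*y as the sum of all x_i * y_j * 2**(i+j), i and j in 0..n-1."""
    total = 0
    for i in range(n):
        x_i = (x >> i) & 1                 # bit of weight i of X
        for j in range(n):
            y_j = (y >> j) & 1             # bit of weight j of Y
            total += (x_i & y_j) << (i + j)
    return total


if __name__ == "__main__":
    n = 4
    ok = all(product_from_bit_products(x, y, n) == x * y
             for x in range(2 ** n) for y in range(2 ** n))
    print("expansion matches x*y for all 4-bit operands:", ok)
```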
The products x.sub.i y.sub.j (equal to 0 or 1) are arranged in a table having 2n-1 columns numbered from 0 to 2n-2, from right to left, corresponding to successive weights of the binary product XY, and n lines numbered from L1 to Ln.
FIG. 1 represents such a table for exemplary four-bit numbers X and Y. In a column of weight k, all the products x.sub.i y.sub.j such that i+j=k are arranged one after the other. The table is filled triangularly, as represented. The value (0 or 1) of each product x.sub.i y.sub.j is obtained by pre-processing circuits that perform a logic AND between the bit x.sub.i of number X and the bit y.sub.j of number Y. Each line Lp (p=1, 2, . . . n) corresponds to a word Lp whose bits of successive weights are the contents of the corresponding cells of line Lp. The desired binary product XY is obtained by summing the words L1 to Ln.
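The construction of the table can be sketched in Python as follows. The function name partial_product_table is hypothetical, and the order in which the products of one column are stacked on lines L1, L2, . . . is an assumption (increasing i), chosen so that the bit of weight 2 of word L3 is x.sub.2 y.sub.0 as stated later in the text; FIG. 1 itself is not reproduced here, and the sum of the words does not depend on this order.

```python
def partial_product_table(x, y, n):
    """Return the n words L1..Ln of the triangular table as integers."""
    words = [0] * n
    for k in range(2 * n - 1):                      # column weights 0 .. 2n-2
        # All index pairs (i, j) with i + j == k and 0 <= i, j <= n-1.
        pairs = [(i, k - i) for i in range(n) if 0 <= k - i < n]
        # ASSUMPTION: cells of a column are stacked from L1 downward by increasing i.
        for line, (i, j) in enumerate(pairs):
            bit = ((x >> i) & 1) & ((y >> j) & 1)   # logic AND of x_i and y_j
            words[line] |= bit << k
    return words


if __name__ == "__main__":
    words = partial_product_table(11, 13, 4)
    print([format(w, "07b") for w in words], sum(words) == 11 * 13)
```

With X = Y = 15 every cell is 1, so the words show the triangular shape directly: L1 fills weights 0 to 6, L2 weights 1 to 5, L3 weights 2 to 4, and L4 weight 3 only.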
Hereinafter, terms such as "line" of the table and "word" are used interchangeably as are "cell", "term", and "bit" of the table.
To sum words L1 to Ln, it is common to use a first series of adders to sum the words in pairs, a second series of adders to sum the outputs of the first series in pairs, and so on. In the example of FIG. 1, a first adder would sum words L1 and L2, a second adder would sum words L3 and L4, and a third adder would sum the results provided by the first and second adders. Generally, n-1 adders are required to sum the n words L.
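The pairwise summing scheme and the count of n-1 adders can be sketched as follows. The function name sum_by_pairs is hypothetical, and passing an unpaired word unchanged to the next series is an assumption for the case where a series holds an odd number of words.

```python
def sum_by_pairs(words):
    """Sum a non-empty list of words series by series.

    Returns (total, number of two-input adders used).
    """
    adders = 0
    level = list(words)
    while len(level) > 1:
        next_level = []
        for a in range(0, len(level) - 1, 2):
            next_level.append(level[a] + level[a + 1])   # one two-input adder
            adders += 1
        if len(level) % 2:            # ASSUMPTION: an unpaired word passes through
            next_level.append(level[-1])
        level = next_level
    return level[0], adders


if __name__ == "__main__":
    # The four words of the table for X = Y = 15 (all cells at 1).
    print(sum_by_pairs([127, 62, 28, 8]))
```

For the four words of the example, two adders form the first series and one adder forms the second, three adders in total.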
When a Booth's multiplier is used to square a number X, a table such as the one of FIG. 1 is obtained by replacing y's with x's.
FIG. 2A represents an elementary summing cell of a conventional adder. Two bits Ai and Bi of weight i, belonging to the two words to be summed, are provided to an Exclusive OR gate 10. Gate 10 also receives a carry bit Ci-1 generated by the preceding cell, which sums the bits of weight i-1 of the two words. The output of gate 10 provides the sum bit Si of bits Ai, Bi and Ci-1. The cell provides a carry bit Ci to the next cell through an OR gate 12 receiving the outputs of three AND gates 14. A first of gates 14 receives bits Ai and Bi, a second gate receives bits Ai and Ci-1, and a third gate receives bits Bi and Ci-1.
A p-bit adder includes p-1 cells, referred to as full-adders, such as the one of FIG. 2A, and includes, for weight 0, a simpler, so-called half-adder, described hereinafter.
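The logic of the cell of FIG. 2A can be written as a small Python function. This is a behavioral sketch only; the function name full_adder_cell is hypothetical, and the gate numbers in the comments refer to the description above.

```python
def full_adder_cell(a, b, c_in):
    """Behavioral model of the summing cell: returns (Si, Ci)."""
    s = a ^ b ^ c_in                               # Exclusive OR gate 10
    c_out = (a & b) | (a & c_in) | (b & c_in)      # three AND gates 14 into OR gate 12
    return s, c_out


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, full_adder_cell(a, b, c))
```

In every case Si + 2 Ci equals Ai + Bi + Ci-1, which is the property that makes the cell an adder.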
Gates 12 and 14 have a specific switching time. Thus, a carry bit Ci is steady shortly after the inputs Ai, Bi and Ci-1 are steady. The same is true for the carry bit Ci-1 which has a steady value shortly after the inputs of the preceding cell are steady, and so on. It is said that the carry bit propagates from the cell of weight 0 to the cell of highest weight. The stabilization time of the output of a p-bit adder is substantially equal to the propagation time, that is, proportional to p.
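The carry propagation described above can be modeled by chaining the cells in a loop, each iteration depending on the carry of the previous one. The following Python sketch is a behavioral illustration; the function name ripple_add is hypothetical, and the loop models only the order of dependence, not actual gate delays.

```python
def ripple_add(a, b, p):
    """Add two p-bit integers cell by cell; returns the (p+1)-bit result."""
    result = 0
    carry = 0
    for i in range(p):                 # cell i must wait for the carry of cell i-1
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if i == 0:
            # Weight 0: half-adder, no incoming carry.
            s, carry = a_i ^ b_i, a_i & b_i
        else:
            # Other weights: full-adder cell.
            s = a_i ^ b_i ^ carry
            carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
        result |= s << i
    return result | (carry << p)       # last carry becomes the bit of weight p


if __name__ == "__main__":
    print(ripple_add(0b10111, 0b01001, 5) == 0b10111 + 0b01001)
```

The p iterations run strictly one after the other, which mirrors the statement that the stabilization time is proportional to p.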
If, additionally, the output of a first adder is provided to a second adder, the output of the second adder can stabilize only when the output of the first adder is steady. The stabilization time of the last adder of a set of cascaded adders increases with the number of adders.
Accordingly, to increase the speed of a circuit such as a Booth's multiplier, using cascaded adders, it is advantageous to decrease the size of the adders and the number of the cascaded adders.
The table of FIG. 1 has properties that allow the adders to be simplified and their stabilization time to be decreased. Indeed, a large number of cells of the table are empty (at 0). To sum the words L3 and L4, for example, a 5-bit adder is normally needed. However, the bits of weights 0 and 1 of word L3 are zero and the bits of weights 0 to 2 and 4 of word L4 are zero. In an adder summing the words L3 and L4, the outputs S0 and S1 are therefore forced to 0 and the output S2 is directly connected to the bit of weight 2 (x.sub.2 y.sub.0) of word L3. To sum the bits of weight 3, a half-adder is used, and to sum the bits of weight 4, since one of the bits is zero, a cell similar to the half-adder is used. Thus, instead of a 5-bit adder, an adder with only two half-adders is used.
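The simplified adder for words L3 and L4 can be sketched in Python as follows. The function name add_l3_l4 is hypothetical, and placing the carry of the last cell at weight 5 of the result is an assumption; the text only details outputs S0 to S4.

```python
def add_l3_l4(l3, l4):
    """Sum L3 (non-zero only at weights 2, 3, 4) and L4 (non-zero only at weight 3)."""
    a2 = (l3 >> 2) & 1
    a3 = (l3 >> 3) & 1
    a4 = (l3 >> 4) & 1
    b3 = (l4 >> 3) & 1

    s2 = a2                        # S2 wired directly to the bit of weight 2 of L3
    s3, c3 = a3 ^ b3, a3 & b3      # half-adder for the bits of weight 3
    s4, c4 = a4 ^ c3, a4 & c3      # half-adder-like cell: bit of L3 plus carry
    # S0 and S1 are forced to 0. ASSUMPTION: carry c4 is output at weight 5.
    return (s2 << 2) | (s3 << 3) | (s4 << 4) | (c4 << 5)


if __name__ == "__main__":
    print(add_l3_l4(0b11100, 0b01000) == 0b11100 + 0b01000)
```

Only two XOR gates and two AND gates are involved, and the carry crosses a single cell boundary.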
FIG. 2B represents a half-adder. When one of the bits of the numbers to be summed is zero, or when the cell is the first one (receiving no carry bit), the cell structure is simplified. When one of the bits, for example Bi, is zero, the cell includes an Exclusive OR gate 16 receiving the non-zero bit Ai and the carry bit Ci-1 provided by the preceding cell. An AND gate 18 receiving the bits Ai and Ci-1 provides the carry bit Ci to the next cell. The first cell of an adder has the same structure except that i=0 and bit B0 is provided to gates 16 and 18 instead of carry bit Ci-1. The structure of FIG. 2B is simpler and faster than that of FIG. 2A. It is therefore advisable to use this structure whenever possible.
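That the cell of FIG. 2B is the cell of FIG. 2A with one input tied to zero can be checked with a short Python sketch. Both function names are hypothetical and the models are behavioral only.

```python
def full_adder_cell(a, b, c_in):
    """Cell of FIG. 2A: returns (Si, Ci)."""
    return a ^ b ^ c_in, (a & b) | (a & c_in) | (b & c_in)


def half_adder_cell(a, c_in):
    """Cell of FIG. 2B: returns (Si, Ci)."""
    s = a ^ c_in          # Exclusive OR gate 16
    c_out = a & c_in      # AND gate 18
    return s, c_out


if __name__ == "__main__":
    same = all(half_adder_cell(a, c) == full_adder_cell(a, 0, c)
               for a in (0, 1) for c in (0, 1))
    print("half-adder equals full-adder with Bi = 0:", same)
```

With Bi = 0, two of the three AND gates 14 always output 0 and OR gate 12 becomes redundant, which is why the simplified cell needs only one XOR gate and one AND gate.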