1. Field of the Invention
The present invention relates to a processor, such as a semiconductor device, for executing addition and multiplication operations, to an operation method used in the processor, and to a data processor in which the processor is used.
2. Related Background Art
In recent years, the operation speeds of adders and multipliers have increased significantly, owing to remarkable progress in semiconductor manufacturing technology (miniaturization) and in semiconductor circuit technology, including algorithms. Such arithmetic processing is used in every kind of semiconductor device, including central processing units (CPUs) and digital signal processors (DSPs). However, the further these technologies develop, the higher the performance, that is, the higher the speed, demanded of this arithmetic processing.
High-speed processing is required particularly in fields such as image processing in the multimedia age and applications involving a tremendous amount of calculation, such as matrix operations. In these fields, the processing performed by an adder or a multiplier is among the most important factors determining overall performance, and it must be performed at ever higher speed.
As an example of an adder in a current operation method, an adder described in "Design of CMOS VLSI" (supervised by Takuo Sugano, Baifukan) is explained below.
For an addition of two binary numbers, assuming that X and Y indicate the binary numbers, S indicates a sum of X and Y, and C indicates a carry, there are the following four types of calculations if X and Y each have a single place:
When X=0 and Y=0, S=0 and C=0.
When X=0 and Y=1, S=1 and C=0.
When X=1 and Y=0, S=1 and C=0.
When X=1 and Y=1, S=0 and C=1.
Treating the above as a truth table and expressing the sum S and the carry C as logical expressions gives $S = X \oplus Y$ and $C = X \cdot Y$. These can be realized by a two-input, two-output circuit consisting of a single exclusive OR gate and a single AND gate, as shown in FIG. 41A. A circuit having this function is called a half adder.
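As an illustration only, the half adder's two expressions can be checked against the four-case truth table with a few lines of Python. The function name `half_adder` is hypothetical and models the logic, not the circuit of FIG. 41A.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (S, C) for single bits x and y (each 0 or 1)."""
    s = x ^ y  # exclusive OR produces the sum bit S
    c = x & y  # AND produces the carry bit C
    return s, c


# Print the four rows of the truth table: X, Y, (S, C)
for x in (0, 1):
    for y in (0, 1):
        print(x, y, half_adder(x, y))
```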
If the binary numbers each have multiple places, in other words, a bit width of two or more bits, the carry signal from the lower place must also be processed. This requires a circuit that adds three binary digits, $X_i$, $Y_i$, and $C_{i-1}$, for each place. This three-input, two-output circuit is called a full adder; FIG. 41C shows its truth table and the logical expressions representing its operation. A circuit that adds numbers of any number of places can be obtained by arranging the required number of full adders and connecting them so that the carry signal of each lower adder is input to the next upper adder. This is called a ripple carry adder; a four-bit example is illustrated in FIG. 41B. Although many single-bit full adder circuits correctly realize the truth table in FIG. 41C, the key point in designing for high-speed operation is not generating the sum signal but transmitting the carry signal received from the lower place to the upper place as quickly as possible. FIG. 41D shows an example of a full adder designed from this viewpoint.
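A minimal behavioral sketch of a full adder and a four-bit ripple carry adder follows. It assumes the standard full-adder equations $S_i = X_i \oplus Y_i \oplus C_{i-1}$ and $C_i = X_i Y_i + (X_i \oplus Y_i) C_{i-1}$; the function names are hypothetical, and the loop models how each carry must wait for the one below it.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """Add single bits X_i, Y_i and C_{i-1}; return (S_i, C_i)."""
    p = x ^ y                      # propagate term X_i XOR Y_i
    s = p ^ c_in                   # sum bit S_i
    c_out = (x & y) | (p & c_in)   # carry C_i passed to the upper place
    return s, c_out


def ripple_carry_add(x: int, y: int, n: int = 4, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers with a chain of n full adders.

    The carry out of place i feeds place i + 1, so the top sum bit is not
    final until the carry has rippled through every place.
    Returns (n-bit sum, final carry out).
    """
    carry = c_in
    total = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b1011, 0b0110))  # 11 + 6 = 17, printed as (1, 1)
```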
If the number of places is increased to, for example, 16 bits, the speed-up obtainable by improving each individual full adder is limited, so the 16-bit adder must be sped up as a whole. Since the operation speed of the adder is governed by the carry transmission speed, as mentioned above, the adder can be sped up if the carry signal of each place can be determined without waiting for the carry signal from the lower adder.
The carry signal of every place can be created solely from the input values of that place and the lower places together with the carry signal into the lowest place. This method is called carry look ahead (CLA). An example of a circuit to which this method is applied (a CLA circuit) is shown in FIG. 42A. In FIG. 42A, HA indicates the half adder of FIG. 42B, and the part enclosed by a dotted line is realized by the CMOS circuit of FIG. 42C.
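The lookahead idea can be sketched in Python using the common generate/propagate formulation ($g_i = X_i Y_i$, $p_i = X_i \oplus Y_i$), which is assumed here rather than taken from FIG. 42A. Each carry is computed from the flattened sum of products, so no carry waits for another; the names `cla_carries` and `cla_add` are hypothetical.

```python
def cla_carries(x: int, y: int, n: int, c0: int = 0) -> list[int]:
    """Return [c_0, ..., c_n], where c_i is the carry into place i.

    Every c_{i+1} is evaluated from the flattened lookahead expression
        g_i | p_i g_{i-1} | ... | p_i ... p_1 g_0 | p_i ... p_0 c_0
    so it depends only on input bits and c_0, never on another computed carry.
    """
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]    # generate: X_i AND Y_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # propagate: X_i XOR Y_i
    carries = [c0]
    for i in range(n):
        c = 0
        # carry generated at place k and propagated through places k+1..i
        for k in range(i + 1):
            term = g[k]
            for m in range(k + 1, i + 1):
                term &= p[m]
            c |= term
        # incoming carry c_0 propagated through places 0..i
        term = c0
        for m in range(i + 1):
            term &= p[m]
        carries.append(c | term)
    return carries


def cla_add(x: int, y: int, n: int = 16, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers; return (n-bit sum, carry out of the top place)."""
    c = cla_carries(x, y, n, c0)
    s = 0
    for i in range(n):
        s |= ((((x >> i) ^ (y >> i)) & 1) ^ c[i]) << i  # S_i = p_i XOR c_i
    return s, c[n]
```

In hardware, each term of the flattened expression becomes a wide AND gate feeding a wide OR gate, which is why full CLA grows quickly in gate count as the width increases.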
In actual circuit implementations, considering the amount of hardware and efficiency, CLA is usually not used to create the carry signal for every place. Instead, the places are divided into blocks of, for example, 4 bits; the carry signal is transmitted between blocks by CLA and is rippled within each block (block CLA). An example of a 16-bit adder using this method is illustrated in FIG. 43.
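A behavioral sketch of block CLA, assuming 4-bit blocks in a 16-bit adder and the usual block generate/propagate signals ($G_b$, $P_b$); it is not a model of the specific circuit in FIG. 43. The function name `block_cla_add` and its parameters are hypothetical.

```python
def block_cla_add(x: int, y: int, n: int = 16, block: int = 4,
                  c0: int = 0) -> tuple[int, int]:
    """Block CLA: lookahead between blocks, ripple carry inside each block.

    Returns (n-bit sum, carry out). n must be a multiple of block.
    """
    assert n % block == 0, "n must be a multiple of the block width"
    nblocks = n // block

    def bit(v: int, pos: int) -> int:
        return (v >> pos) & 1

    # 1. Block generate G_b (block emits a carry by itself) and
    #    block propagate P_b (block passes an incoming carry through).
    G, P = [], []
    for b in range(nblocks):
        g_blk, p_blk = 0, 1
        for i in range(b * block, (b + 1) * block):
            xi, yi = bit(x, i), bit(y, i)
            g_blk = (xi & yi) | ((xi ^ yi) & g_blk)
            p_blk &= xi ^ yi
        G.append(g_blk)
        P.append(p_blk)

    # 2. Lookahead across blocks: each block carry-in uses only G, P and c0.
    block_cin = [c0]
    for b in range(nblocks):
        c = 0
        for k in range(b + 1):
            term = G[k]
            for m in range(k + 1, b + 1):
                term &= P[m]
            c |= term
        term = c0
        for m in range(b + 1):
            term &= P[m]
        block_cin.append(c | term)

    # 3. Ripple carry inside each block, starting from its lookahead carry-in.
    total = 0
    for b in range(nblocks):
        carry = block_cin[b]
        for i in range(b * block, (b + 1) * block):
            xi, yi = bit(x, i), bit(y, i)
            total |= (xi ^ yi ^ carry) << i
            carry = (xi & yi) | ((xi ^ yi) & carry)
    return total, block_cin[nblocks]
```

The lookahead logic now spans only four block signals instead of sixteen place signals, which is the hardware saving the block arrangement trades against the short ripple inside each block.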
Subtraction is achieved by using the adder to add the 2's complement of the subtrahend to the minuend.
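A short sketch of subtraction by 2's-complement addition, assuming both operands fit in n bits. Python's integer `+` stands in for the n-bit adder, and the "+1" plays the role of the adder's carry-in; `subtract` and `to_signed` are hypothetical names.

```python
def subtract(minuend: int, subtrahend: int, n: int = 16) -> tuple[int, int]:
    """Compute minuend - subtrahend on n bits by adding a 2's complement.

    Returns (n-bit difference, carry out). For unsigned operands a carry out
    of 1 means no borrow occurred (minuend >= subtrahend).
    """
    mask = (1 << n) - 1
    inverted = ~subtrahend & mask   # 1's complement: invert every bit
    total = minuend + inverted + 1  # +1 as carry-in completes the 2's complement
    return total & mask, total >> n


def to_signed(v: int, n: int = 16) -> int:
    """Interpret an n-bit pattern as a 2's-complement signed value."""
    return v - (1 << n) if v >> (n - 1) else v


print(subtract(5, 7, 4))                # (14, 0): the 4-bit pattern 1110
print(to_signed(subtract(5, 7, 4)[0], 4))  # -2
```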
Even with the above methods, however, it is not easy to speed up operations further as the number of operands increases, because both the number of elements and the operation time increase significantly with the number of operands.
For example, when 63 pieces of data are added together, the additions can be performed in six parallel stages for high speed, as shown in FIG. 44, but 62 full adders are then required. Conversely, if the number of elements is reduced, the operation can be performed with only a single full adder, as shown in FIG. 45, but the addition must then be performed 62 times sequentially.
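The trade-off between FIG. 44 and FIG. 45 can be counted with a small sketch, assuming each `+` stands for one adder unit and that an odd leftover value simply waits for the next stage. The function names and the sample data are hypothetical.

```python
def tree_sum(values: list[int]) -> tuple[int, int, int]:
    """Add values pairwise, level by level (the parallel arrangement).

    Returns (total, number of additions, number of stages).
    """
    level = list(values)
    additions = stages = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        additions += len(nxt)
        if len(level) % 2:       # an odd leftover waits for the next stage
            nxt.append(level[-1])
        level = nxt
        stages += 1
    return level[0], additions, stages


def sequential_sum(values: list[int]) -> tuple[int, int]:
    """Reuse one adder: add one value per step. Returns (total, additions)."""
    total, additions = values[0], 0
    for v in values[1:]:
        total += v
        additions += 1
    return total, additions


data = list(range(1, 64))    # 63 illustrative data values
print(tree_sum(data))        # (2016, 62, 6)
print(sequential_sum(data))  # (2016, 62)
```

Both arrangements perform 62 additions; the tree finishes in 6 steps only because it has 62 adders available at once, while the sequential version needs 62 steps with one adder.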
Next, a parallel multiplier is briefly described as an example of a multiplier in a current operation method.
In a multiplication of (n × n) bits, let the multiplicand be $X = \sum_{i=0}^{n-1} 2^i X_i$. A partial product is the result of multiplying this multiplicand by a single bit $2^j Y_j$ ($j = 0, 1, \ldots, n-1$) of a multiplicator $Y$:
$P_j = 2^j Y_j X = \sum_{i=0}^{n-1} 2^{i+j} P_{ij}$, where $P_{ij} = X_i Y_j$.
Since binary numbers contain only 0s and 1s, $P_{ij}$ is always 0 when $Y_j$ is 0, and $P_{ij}$ equals $X_i$ when $Y_j$ is 1. Accordingly, each partial product can be obtained by taking the AND of each bit of the multiplicand with one bit of the multiplicator. By aligning the places of the created partial products according to the weights of the multiplicator bits and adding them together, the following multiplication result is obtained:
$P = X \cdot Y = \sum_{j=0}^{n-1} \sum_{i=0}^{n-1} 2^{i+j} X_i Y_j$
The most fundamental parallel multiplier is obtained by arranging hardware (AND gates) for creating the partial products and circuits for adding the partial products in an array and connecting them. An (8 × 8)-bit parallel multiplier is shown as an example in FIG. 46. As shown in this drawing, the parallel multiplier includes a full adder 301, a half adder 302, and an AND gate 303.
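The partial-product formulation can be checked with a behavioral sketch for unsigned 8-bit operands. Each bit `P_ij` is one AND gate; Python's integer `+=` stands in for the array of full and half adders, so this shows the arithmetic, not the wiring of FIG. 46. The function names are hypothetical.

```python
def partial_products(x: int, y: int, n: int = 8) -> list[list[int]]:
    """Row j lists P_ij = X_i AND Y_j for i = 0..n-1 (n*n AND gates in all)."""
    return [[(x >> i) & (y >> j) & 1 for i in range(n)] for j in range(n)]


def array_multiply(x: int, y: int, n: int = 8) -> int:
    """Multiply two unsigned n-bit numbers by summing weighted partial products."""
    product = 0
    for j, row in enumerate(partial_products(x, y, n)):
        for i, bit in enumerate(row):
            product += bit << (i + j)  # weight of P_ij is 2^(i+j)
    return product


print(array_multiply(0b1101, 0b1011))  # 13 * 11 = 143
```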
As this example shows, in an (n × n)-bit multiplication the partial products are calculated easily and quickly by $n^2$ AND gates, and the addition step for the partial products governs the operation speed. Increasing the speed of this addition step is therefore the key to speeding up a multiplier.
Methods proposed to increase the operation speed include: a carry save adder method, in which the carry signal of each partial-product addition stage is passed to the adder of the next addition stage, removing the need to transmit the carry within the stage itself; a Wallace-tree method (Wallace, C., IEEE Trans. on Electronic Computers, EC-13, 1, 1964, pp. 14-17), in which additions in the same place are performed in parallel; and a method using a Booth algorithm (Rubinfield, L., IEEE Trans. on Computers, C-24, 10, 1975, pp. 1014-1015) to decrease the number of partial products created.
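Two of these ideas can be sketched behaviorally in Python, using textbook formulations rather than the exact circuits of the cited papers. `carry_save_add` is a word-wide 3-to-2 reduction whose carries are shifted to the next place instead of rippling; `wallace_reduce` applies it repeatedly, Wallace-style, until two operands remain for one final carry-propagate addition. `booth_radix4_digits` is the modified (radix-4) Booth recoding, which halves the number of partial products; in hardware each digit selects 0, ±X, or ±2X, whereas here Python multiplies directly. All names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3-to-2 carry-save step: a + b + c == s + carry, with no carry chain."""
    s = a ^ b ^ c                               # per-place sum bits
    carry = ((a & b) | (a & c) | (b & c)) << 1  # per-place carries, shifted up
    return s, carry


def wallace_reduce(operands: list[int]) -> tuple[int, int]:
    """Reduce operands three at a time until two remain, then add once.

    Returns (total, number of carry-save stages).
    """
    ops, stages = list(operands), 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(carry_save_add(*ops[k:k + 3]))
        nxt.extend(ops[full:])                  # leftovers pass to the next stage
        ops, stages = nxt, stages + 1
    return sum(ops), stages                     # one final carry-propagate addition


def booth_radix4_digits(y: int, n: int = 8) -> list[int]:
    """Recode an n-bit 2's-complement y (n even) into n/2 digits in {-2..2}."""
    digits, prev = [], 0                        # prev holds Y_{2k-1}; Y_{-1} = 0
    for k in range(0, n, 2):
        b0, b1 = (y >> k) & 1, (y >> (k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Sum of n/2 partial products d_k * x * 4^k (y read as signed)."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))
```

For eight partial products, the carry-save reduction reaches two operands after four stages, and the Booth recoding of an 8-bit multiplicator yields four digits instead of eight partial-product rows.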
With the above methods, however, both the number of elements and the operation time increase significantly as the number of bits increases, and it is not easy to raise the speed further as bit widths continue to grow. Accordingly, a multiplier applying multivalued logic has recently been reported (T. Hanyu et al., Proc. IEEE Int. Symp. on MVL, pp. 19-26, May (1994), November 1993). Under present conditions, however, it has not yet been put to practical use.