1. Field of the Invention
The present invention relates to a multiplier, and more particularly to a multiplier used in a parallel multiplier.
2. Description of the Related Art
FIGS. 1A and 1B show a conventional parallel multiplier using Booth's algorithm. In FIGS. 1A and 1B, reference numeral 1 denotes a Booth's decoder, 2 a Booth's selector, and 3 a parallel adding circuit for adding partial products.
FIG. 2 shows an example of a circuit forming the Booth's decoder 1 of FIG. 1A. In FIG. 2, reference numeral 4 denotes a NAND gate, 5 a NOR gate, 6 an OR gate, 7 an AND gate, and 8 an inverter.
FIG. 3 shows an example of a circuit forming the Booth's selector 2 of FIG. 1A. In FIG. 3, reference numeral 9 denotes an exclusive NOR gate.
As the parallel adding circuit 3, the parallel adding circuit disclosed in Japanese Patent Application KOKAI Publication No. 63-55627 can be used.
Booth's algorithm is an algorithm for multiplying, at high speed, a multiplier X and a multiplicand Y that are expressed in two's complement as shown in equations (1) and (2). In this algorithm, as shown in equation (3), the value of the multiplicand Y is decoded three bits at a time by the decoder 1, and the bits of the multiplier X are selected by the selector 2 in accordance with the decoding result, whereby a partial product Pm is generated.
A product Z can be obtained by adding the partial products Pm from m=0 to m=(n/2)-1 as shown in equation (4). According to this algorithm, the number of partial products is reduced to half of that in an array multiplier, which uses AND gates to generate the partial products, so the calculating speed can be increased. Normally, the value of $y_{2m-1}$ is set to 0 when m=0.

$$\text{Multiplier } X = -2^{n-1}x_{n-1} + 2^{n-2}x_{n-2} + \cdots + 2x_1 + x_0 \qquad (1)$$

$$\text{Multiplicand } Y = -2^{n-1}y_{n-1} + 2^{n-2}y_{n-2} + \cdots + 2y_1 + y_0 \qquad (2)$$

$$\text{Partial products } P_m = X\,(-2y_{2m+1} + y_{2m} + y_{2m-1})\,2^{2m} \qquad (3)$$

wherein $y_{-1} = 0$.

$$Z = \sum_{m=0}^{(n/2)-1} P_m \qquad (4)$$
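As an illustration only, equations (3) and (4) can be checked in software with a minimal Python sketch. The function names `to_bits`, `booth_partial_products`, and `booth_multiply` are hypothetical and do not come from the specification. The sketch assumes an even bit width n and operands within the n-bit two's complement range. It models the arithmetic only, not the gate-level decoder 1 and selector 2 of FIGS. 2 and 3.

```python
def to_bits(value, n):
    """Return the n two's complement bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def booth_partial_products(x, y, n):
    """Compute the n/2 partial products P_m of equation (3).

    x is the multiplier X and y is the multiplicand Y, following the
    naming used in the text. Both must fit in n-bit two's complement.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operands must fit in n-bit two's complement")

    y_bits = to_bits(y, n)
    products = []
    for m in range(n // 2):
        # y_{2m-1} is taken as 0 when m = 0 (that is, y_{-1} = 0).
        y_prev = y_bits[2 * m - 1] if m > 0 else 0
        # Decoded value in {-2, -1, 0, 1, 2} from three bits of Y.
        digit = -2 * y_bits[2 * m + 1] + y_bits[2 * m] + y_prev
        # Select 0, +/-X, or +/-2X and weight it by 2^(2m).
        products.append(x * digit * (1 << (2 * m)))
    return products


def booth_multiply(x, y, n):
    """Sum the partial products as in equation (4)."""
    return sum(booth_partial_products(x, y, n))
```

For n-bit operands the loop produces n/2 partial products, compared with n partial products when one AND-gated row is generated per bit.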
In recent years, portable data communication apparatuses have come into wide use. An LSI mounted on such an apparatus must consume little electrical power so as to prolong battery life. Moreover, digital processing is essential for dealing with noise and insufficient communication channel capacity, so an LSI for digital signal processing, that is, a digital signal processor (DSP), is mounted on the apparatus.
By mounting the digital signal processor on the data communication apparatus, highly sophisticated processing can be realized. However, as the processing becomes more sophisticated, the following problem arises.
More specifically, the operating principle of the digital signal processor is substantially the same as that of a general-purpose microprocessor. Consequently, the more complicated the processing becomes, the longer the processing time becomes. However, since the processing time has an upper limit in a portable data communication apparatus, which requires real-time operation, the frequency of the operation clock must be increased. If the clock frequency is increased, however, the consumption of electrical power also increases.
This increase in power consumption runs counter to the requirement placed on an LSI mounted on the portable data communication apparatus.
Parallel processing is one means of avoiding this problem. In parallel processing, a plurality of processes are executed in parallel, so that the amount of processing per unit time can be increased without increasing the clock frequency.
However, the basic calculation of digital signal processing is the sum of products, that is, an operation in which multiplied results are cumulatively added. Therefore, both a multiplier and an adder are built into the digital signal processor.
Moreover, the above-mentioned conventional multiplier can multiply only one pair of data at a time. Therefore, to execute the sum-of-products calculation at double speed by parallel processing, two multipliers must be built into the digital signal processor. Also, to cumulatively add the multiplied results, at least two adders must be built in.
Moreover, to obtain the final result of the cumulative addition, the separately accumulated results must be added together. For this purpose, one more adder must be built into the digital signal processor, or two sets of registers (accumulators) must be provided to hold the separately accumulated results.
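The resource cost described above can be pictured with a small Python sketch. It is a software analogy only, and the function names `mac_serial` and `mac_two_way` are hypothetical. Each `*` stands for one multiplier, each `+=` for one accumulating adder, and the final `+` for the extra addition that merges the two separately accumulated results.

```python
def mac_serial(a, b):
    """Sum of products with one multiplier and one accumulator."""
    if len(a) != len(b):
        raise ValueError("input sequences must have equal length")
    acc = 0
    for ai, bi in zip(a, b):
        acc += ai * bi  # one multiply and one add per step
    return acc


def mac_two_way(a, b):
    """Sum of products split over two multiplier/adder pairs.

    Each loop iteration models one clock step in which both pairs work
    at once, so about half as many steps are needed as in mac_serial.
    """
    if len(a) != len(b):
        raise ValueError("input sequences must have equal length")
    acc0 = 0  # accumulator fed by the first multiplier and adder
    acc1 = 0  # accumulator fed by the second multiplier and adder
    for i in range(0, len(a) - 1, 2):
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
    if len(a) % 2:
        # An odd-length input leaves one product for the first pair.
        acc0 += a[-1] * b[-1]
    # Extra addition needed to merge the two partial sums.
    return acc0 + acc1
```

Both functions return the same value; the two-way version simply trades two multipliers, two accumulators, and one merging addition for the shorter step count.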
Furthermore, a compiler having an optimization function is indispensable for using such a parallel architecture. However, since the object code conversion efficiency of compilers is insufficient, the digital signal processor is commonly programmed in assembler. As a result, the parallel architecture places the optimization load on the programmer, and the efficiency of software development drops.
Moreover, digital signal processing always involves a problem of calculation precision. This problem arises particularly when a DSP for fixed-point calculation is used. An error included in the result of each multiplication is accumulated by the cumulative addition, and the operation of the entire system becomes unstable.
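How a small per-product error grows under cumulative addition can be sketched in Python as follows. The Q15 fixed-point format (15 fractional bits) and rounding by truncation are assumptions chosen for illustration; the text does not name a number format or rounding mode. The function names are hypothetical.

```python
Q = 15  # assumed number of fractional bits (Q15 format)


def accumulate_truncated(a, b):
    """Truncate every product to Q15 before adding it to the accumulator."""
    acc = 0
    for ai, bi in zip(a, b):
        acc += (ai * bi) >> Q  # low-order bits are discarded at each step
    return acc


def accumulate_wide(a, b):
    """Accumulate full-width products and truncate only once at the end."""
    acc = 0
    for ai, bi in zip(a, b):
        acc += ai * bi
    return acc >> Q


# 1000 identical products; each truncation discards a fraction of one LSB.
a = [1000] * 1000
b = [1000] * 1000
error_in_lsb = accumulate_wide(a, b) - accumulate_truncated(a, b)
```

In this sketch each truncation loses less than one least significant bit, but over 1000 terms the loss adds up to several hundred least significant bits.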
If double-precision calculation is used, the precision problem is alleviated. However, since the circuit scale of a double-precision multiplier is four times that of a single-precision multiplier, the ratio of the LSI area occupied by the double-precision multiplier increases. In addition, if the conventional multiplier is used, a plurality of multipliers are needed for parallel processing for the above-mentioned reason. Therefore, such multipliers are not suitable for practical use.
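The text does not derive the factor of four. One common way to see it is that a product of two 2n-bit operands decomposes into four n-bit by n-bit products. The Python sketch below illustrates this under the assumption of unsigned operands, which keeps the decomposition short. The function name `double_precision_multiply` is hypothetical.

```python
def double_precision_multiply(a, b, n):
    """Multiply two unsigned 2n-bit values using four n-bit products."""
    limit = 1 << (2 * n)
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must be unsigned 2n-bit values")

    mask = (1 << n) - 1
    a_lo, a_hi = a & mask, a >> n
    b_lo, b_hi = b & mask, b >> n

    # Four single-precision (n-bit by n-bit) multiplications.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Each product is weighted by the position of its operand halves.
    return lo_lo + ((lo_hi + hi_lo) << n) + (hi_hi << (2 * n))
```

Each of the four products corresponds to roughly one single-precision multiplier's worth of partial-product hardware, which is consistent with the fourfold circuit scale stated above.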
As mentioned above, the conventional multiplier has the following disadvantages.
The more complicated the processing becomes, the more the clock frequency must be increased, and the consumption of electrical power increases accordingly. The provision of parallel processing enlarges the circuit scale of the digital signal processor. The parallel architecture places the optimization load on the programmer.