1. Field of the Invention
The present invention relates to a multiplier used in arithmetic circuits such as data processors and microprocessors, and to a multiplying method executed in the multiplier.
2. Description of the Prior Art
In recent years, multimedia processing of audio data, image data, etc. has created a demand for executing plural sets of multiplications at a bit width (e.g., 16 bits) smaller than the data bit width of the processor (e.g., 32 bits). Because there is a limit to how fast a single multiplication can be made, such plural sets of multiplications are executed in parallel at the smaller bit width (e.g., four parallel multiplications). Naturally, multiplication at the normal bit width (e.g., 32 bits) must also be supported.
FIG. 1 shows an example of a conventional circuit that provides both a 32-bit × 32-bit multiplication function and a function of four parallel 16-bit × 16-bit multiplications. The 32-bit × 32-bit multiplier 51 executes the 32-bit × 32-bit multiplication, whereas the four 16-bit × 16-bit multipliers 52a, 52b, 52c, 52d operate simultaneously to execute the four parallel 16-bit × 16-bit multiplications.
However, when the area occupied by the multiplier circuits is a concern, it is not possible to incorporate all of the multiplier circuits 51, 52a, 52b, 52c, 52d as above. In that case, all multiplications must be handled by the 32-bit × 32-bit multiplier 51 alone.
FIG. 2 shows an example of a conventional circuit that executes both the 16-bit × 16-bit multiplication and the 32-bit × 32-bit multiplication using only a 32-bit × 32-bit multiplier. To execute the 16-bit × 16-bit multiplication, the 32-bit × 32-bit multiplier is divided into four multiplication blocks 61a, 61b, 61c, 61d, and two 16-bit × 16-bit multiplications are executed at once using the multiplication blocks 61a and 61d, whose data propagation paths do not overlap. To execute the 32-bit × 32-bit multiplication, the multiplier is used as an ordinary 32-bit × 32-bit multiplier. In both the 32-bit × 32-bit and the 16-bit × 16-bit cases, the multiplication result is taken from the outputs of the multiplication blocks 61c and 61d. When the 16-bit × 16-bit multiplication is executed, a function for cutting off carry propagation is required between the multiplication block 61a and the multiplication block 61d so that their data do not interfere.
In the following explanation, for the product a (multiplicand) × b (multiplicator), m-bit data are written as <(m-1):0>, where bit 0 is the least significant bit and bit (m-1) is the most significant bit. In the 32-bit × 32-bit multiplication, the multiplicand and the multiplicator of the input data, both represented in two's complement, are written x<31:0> and y<31:0>, respectively. In the four sets of 16-bit × 16-bit multiplications, the multiplicands of the input data, represented in two's complement, are written a1<15:0>, a2<15:0>, a3<15:0>, and a4<15:0>, and the corresponding multiplicators are written b1<15:0>, b2<15:0>, b3<15:0>, and b4<15:0>.
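The bit-field notation and the two's complement convention above can be made concrete with a small sketch. The helper names `field` and `to_signed` are hypothetical and introduced only for illustration: `field(v, hi, lo)` returns v<hi:lo> as an unsigned value, and `to_signed` reinterprets a bit pattern as a two's complement number.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Return the bit field value<hi:lo> as an unsigned integer (hypothetical helper)."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def to_signed(pattern: int, width: int) -> int:
    """Interpret the low `width` bits of `pattern` as a two's complement number."""
    pattern &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return pattern - (1 << width) if pattern & sign_bit else pattern


# A 32-bit multiplicand x<31:0> split into its upper and lower 16-bit halves.
x = 0x1234ABCD
print(hex(field(x, 31, 16)))   # x<31:16> -> 0x1234
print(hex(field(x, 15, 0)))    # x<15:0>  -> 0xabcd
print(to_signed(0xFFFF, 16))   # a 16-bit operand with pattern 0xFFFF means -1
```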
To support the four sets of 16-bit × 16-bit parallel multiplications, selectors Sel1, Sel3, Sel5, Sel7 for selecting multiplicand data and selectors Sel2, Sel4, Sel6, Sel8 for selecting multiplicator data are placed in front of the data input ports of the multiplication blocks: Sel1 and Sel2 feed block 61a, Sel3 and Sel4 feed block 61b, Sel5 and Sel6 feed block 61c, and Sel7 and Sel8 feed block 61d.
The selector Sel1 receives the lower 16-bit multiplicand data x<15:0> of the 32-bit × 32-bit multiplication and the multiplicand data a1<15:0> and a3<15:0> of the four sets of 16-bit × 16-bit parallel multiplications. The selector Sel2 receives the lower 16-bit multiplicator data y<15:0> of the 32-bit × 32-bit multiplication and the multiplicator data b1<15:0> and b3<15:0> of the four sets of 16-bit × 16-bit parallel multiplications. The selector Sel3 receives the upper 16-bit multiplicand data x<31:16> of the 32-bit × 32-bit multiplication and 0. The selector Sel4 receives the lower 16-bit multiplicator data y<15:0> of the 32-bit × 32-bit multiplication and 0. The selector Sel5 receives the lower 16-bit multiplicand data x<15:0> of the 32-bit × 32-bit multiplication and 0. The selector Sel6 receives the upper 16-bit multiplicator data y<31:16> of the 32-bit × 32-bit multiplication and 0. The selector Sel7 receives the upper 16-bit multiplicand data x<31:16> of the 32-bit × 32-bit multiplication and the multiplicand data a2<15:0> and a4<15:0> of the four sets of 16-bit × 16-bit parallel multiplications. The selector Sel8 receives the upper 16-bit multiplicator data y<31:16> of the 32-bit × 32-bit multiplication and the multiplicator data b2<15:0> and b4<15:0> of the four sets of 16-bit × 16-bit parallel multiplications.
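A behavioral sketch of this selector wiring may help. It models each selector as a choice among the inputs listed above and returns the (multiplicand, multiplicator) pair seen by each multiplication block. The function name `select_inputs`, the mode strings, and the `cycle` parameter are assumptions for illustration only; the choices made in each mode follow the 32-bit and 16-bit operation of the FIG. 2 circuit.

```python
from typing import Dict, Sequence, Tuple

Operands = Tuple[int, int]  # (multiplicand, multiplicator) fed to one block


def lo16(v: int) -> int:
    return v & 0xFFFF           # v<15:0>


def hi16(v: int) -> int:
    return (v >> 16) & 0xFFFF   # v<31:16>


def select_inputs(mode: str, x: int = 0, y: int = 0,
                  a: Sequence[int] = (0, 0, 0, 0),
                  b: Sequence[int] = (0, 0, 0, 0),
                  cycle: int = 0) -> Dict[str, Operands]:
    """Hypothetical model of selectors Sel1..Sel8 in front of blocks 61a..61d."""
    if mode == "32x32":
        return {
            "61a": (lo16(x), lo16(y)),  # Sel1, Sel2
            "61b": (hi16(x), lo16(y)),  # Sel3, Sel4
            "61c": (lo16(x), hi16(y)),  # Sel5, Sel6
            "61d": (hi16(x), hi16(y)),  # Sel7, Sel8
        }
    if mode == "16x16":
        i = 2 * cycle  # cycle 0 -> sets 1 and 2, cycle 1 -> sets 3 and 4
        return {
            "61a": (a[i], b[i]),          # Sel1, Sel2 pick a1/b1 or a3/b3
            "61b": (0, 0),                # Sel3, Sel4 pick 0
            "61c": (0, 0),                # Sel5, Sel6 pick 0
            "61d": (a[i + 1], b[i + 1]),  # Sel7, Sel8 pick a2/b2 or a4/b4
        }
    raise ValueError(f"unknown mode: {mode}")
```

For example, `select_inputs("16x16", a=(1, 2, 3, 4), b=(5, 6, 7, 8), cycle=1)` routes the third set (3, 7) to block 61a and the fourth set (4, 8) to block 61d, while blocks 61b and 61c receive zeros.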
Next, the operation of the multiplier constructed as above will be explained. First, for the 32-bit × 32-bit multiplication, the selectors Sel1 and Sel2 select the multiplicand data x<15:0> and the multiplicator data y<15:0> for the multiplication block 61a. The selectors Sel3 and Sel4 select the multiplicand data x<31:16> and the multiplicator data y<15:0> for the multiplication block 61b. The selectors Sel5 and Sel6 select the multiplicand data x<15:0> and the multiplicator data y<31:16> for the multiplication block 61c. The selectors Sel7 and Sel8 select the multiplicand data x<31:16> and the multiplicator data y<31:16> for the multiplication block 61d. As in an ordinary multiplier, partial products are then generated from the input data and accumulated, yielding the result of the 32-bit × 32-bit multiplication.
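The following sketch checks the arithmetic behind dividing the 32-bit × 32-bit multiplier into four 16-bit × 16-bit blocks. Because the text does not say how signs are handled inside each block, it assumes that the lower halves are unsigned and the upper halves signed, so that x = x<31:16>·2^16 + x<15:0>. The product is then the sum of the four block products weighted by 2^0, 2^16, 2^16, and 2^32. The function name `mul32_from_blocks` is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def to_signed(pattern: int, width: int) -> int:
    """Interpret the low `width` bits of `pattern` as two's complement."""
    pattern &= (1 << width) - 1
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


def mul32_from_blocks(x: int, y: int) -> int:
    """Signed 32x32 product assembled from four 16x16 block products (sketch)."""
    x &= MASK32
    y &= MASK32
    x_lo, y_lo = x & MASK16, y & MASK16                          # unsigned lower halves
    x_hi, y_hi = to_signed(x >> 16, 16), to_signed(y >> 16, 16)  # signed upper halves
    p_a = x_lo * y_lo   # block 61a: x<15:0>  * y<15:0>
    p_b = x_hi * y_lo   # block 61b: x<31:16> * y<15:0>
    p_c = x_lo * y_hi   # block 61c: x<15:0>  * y<31:16>
    p_d = x_hi * y_hi   # block 61d: x<31:16> * y<31:16>
    # Cumulative addition of the block products at their bit weights.
    return p_a + ((p_b + p_c) << 16) + (p_d << 32)


print(mul32_from_blocks(-123456789, 987654321) == -123456789 * 987654321)  # True
```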
Next, the 16-bit × 16-bit multiplication is executed in the following order. The four sets of 16-bit × 16-bit parallel multiplications are given as (multiplicand, multiplicator) pairs: first set (a1<15:0>, b1<15:0>), second set (a2<15:0>, b2<15:0>), third set (a3<15:0>, b3<15:0>), and fourth set (a4<15:0>, b4<15:0>). The first and second sets are executed in the first arithmetic cycle, and the third and fourth sets are executed in the next arithmetic cycle.
In the first arithmetic cycle, the selectors Sel1 and Sel2 select the multiplicand a1<15:0> and the multiplicator b1<15:0> as inputs to the multiplication block 61a, and the selectors Sel7 and Sel8 select the multiplicand a2<15:0> and the multiplicator b2<15:0> as inputs to the multiplication block 61d. To avoid unnecessary data propagation in the cumulative addition, the multiplication blocks 61b and 61c receive 0: the selectors Sel3 and Sel5 select 0 as the multiplicand, and the selectors Sel4 and Sel6 select 0 as the multiplicator. When the 32-bit × 32-bit multiplication is carried out in this state, the result of the second set, a2 × b2, appears in the upper 32 bits of the 64-bit output, and the result of the first set, a1 × b1, appears in the lower 32 bits.
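In this cycle the two products share one 64-bit output word, which is why the carry cut-off between blocks 61a and 61d matters. The sketch below is an abstract arithmetic model, not a gate-level description of the circuit. `dual_mul16` places a2 × b2 in the upper 32 bits and a1 × b1 in the lower 32 bits as independent fields (the cut-off case), while `accumulate_without_cutoff` adds both products into a single 64-bit accumulator, so a negative lower product borrows from the upper field. All function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def to_signed(pattern: int, width: int) -> int:
    """Interpret the low `width` bits of `pattern` as two's complement."""
    pattern &= (1 << width) - 1
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


def dual_mul16(a1: int, b1: int, a2: int, b2: int) -> int:
    """Pack a2*b2 (upper 32 bits) and a1*b1 (lower 32 bits); the fields do not interact."""
    low = (a1 * b1) & MASK32    # result of block 61a, confined to its own field
    high = (a2 * b2) & MASK32   # result of block 61d
    return (high << 32) | low   # concatenation: no carry or borrow crosses bit 32


def accumulate_without_cutoff(a1: int, b1: int, a2: int, b2: int) -> int:
    """Both products summed in one 64-bit accumulator (no carry cut-off)."""
    return ((a2 * b2) * (1 << 32) + a1 * b1) & MASK64


def unpack(word: int) -> tuple:
    """Return (lower result, upper result) as signed 32-bit values."""
    return to_signed(word & MASK32, 32), to_signed(word >> 32, 32)


print(unpack(dual_mul16(-3, 5, 7, 9)))                 # (-15, 63)
print(unpack(accumulate_without_cutoff(-3, 5, 7, 9)))  # (-15, 62): upper field corrupted
```

Each signed 16-bit × 16-bit product fits in 32 bits, so the two fields never overflow into each other once carry propagation across bit 32 is cut.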
In the following arithmetic cycle, the results of the third set, a3 × b3, and the fourth set, a4 × b4, are obtained in the same way. Thus, the four multiplication results are obtained in a total of two arithmetic cycles.
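The cycle count follows from how many independent products the hardware completes per cycle. A minimal sketch, assuming each arithmetic cycle completes `lanes` 16-bit × 16-bit products: two for the FIG. 2 circuit (blocks 61a and 61d) and four for the parallel multipliers 52a to 52d of FIG. 1. The function name `run_schedule` and the operand values are placeholders for illustration.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (multiplicand, multiplicator)


def run_schedule(pairs: List[Pair], lanes: int) -> Tuple[List[int], int]:
    """Issue `lanes` products per arithmetic cycle; return (results, cycle count)."""
    results: List[int] = []
    cycles = 0
    for start in range(0, len(pairs), lanes):
        batch = pairs[start:start + lanes]   # e.g. sets 1 and 2, then sets 3 and 4
        results.extend(a * b for a, b in batch)
        cycles += 1
    return results, cycles


sets = [(1, 5), (2, 6), (3, 7), (4, 8)]   # placeholder (a1, b1) .. (a4, b4)
print(run_schedule(sets, lanes=2))  # ([5, 12, 21, 32], 2): FIG. 2 style
print(run_schedule(sets, lanes=4))  # ([5, 12, 21, 32], 1): FIG. 1 style
```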
However, the example shown in FIG. 1 is structurally wasteful: it contains several multipliers, each occupying a large area, and some of these circuits sit idle depending on the instruction being executed. Where circuit area is a primary concern, this large area is a very serious drawback.
In the example shown in FIG. 2, hardware equivalent to four 16-bit × 16-bit multiplication blocks is incorporated, yet two arithmetic cycles are needed to obtain four 16-bit × 16-bit multiplication results. In other words, relative to its 16-bit × 16-bit multiplication performance, this prior-art circuit occupies twice the area that performance would require.
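The "twice the area" statement can be checked with a first-order model. The sketch assumes, as an approximation not stated in the text, that area is proportional to the number of 16-bit × 16-bit block equivalents, so a 32-bit × 32-bit multiplier counts as four blocks (consistent with its division into blocks 61a to 61d). The metric is blocks per 16-bit × 16-bit product completed per cycle; an ideal design needs one. The `Design` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    blocks: int              # area in 16x16 block equivalents (first-order assumption)
    products_per_cycle: int  # 16x16 products completed per arithmetic cycle

    def blocks_per_throughput(self) -> float:
        return self.blocks / self.products_per_cycle


fig1 = Design("FIG. 1", blocks=4 + 4, products_per_cycle=4)  # multiplier 51 + 52a..52d
fig2 = Design("FIG. 2", blocks=4, products_per_cycle=2)      # blocks 61a..61d, two active
for d in (fig1, fig2):
    print(d.name, d.blocks, d.blocks_per_throughput())
# prints "FIG. 1 8 2.0" then "FIG. 2 4 2.0"
```

Under this model both circuits spend two blocks per unit of 16-bit × 16-bit throughput, twice the ideal: in FIG. 1 because multiplier 51 idles during 16-bit work (and 52a to 52d during 32-bit work), and in FIG. 2 because blocks 61b and 61c are held at zero in every 16-bit cycle.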