This invention relates to an improvement of an arithmetic processor for executing multiplication and product-sum operation, and particularly relates to an arithmetic processor and an arithmetic method for high-speed execution of multiplication instructions and product-sum operation instructions.
Recently, high-speed DSPs (digital signal processors) have been developed as hardware for executing high-speed digital signal processing on enormous amounts of data in fields such as communication and image processing. In particular, high-speed multipliers and accumulators are sought for executing at high speed the product-sum operation instructions, which are the most frequently used instructions in a DSP. In a product-sum arithmetic processor built into a conventional DSP, a multiplier and an adder for accumulation are arranged in a pipeline construction (refer to, for example, ISSCC Technical paper, 1993, pp. 28-29).
One example of the above-mentioned conventional product-sum arithmetic processor is explained with reference to the drawings. In FIG.12, which is a block diagram showing the conventional product-sum arithmetic processor, reference numeral 101 denotes a multiplier for outputting one pair of multiplied results generated by adding partial products in a carry save adder. 102 denotes a carry lookahead adder (hereinafter referred to as CLA) for adding the pair of output results of the multiplier 101 and converting them into a binary numeral. 103 denotes an accumulation part for accumulating outputs of the CLA 102. 104 denotes a storage for supplying source operands to the multiplier 101.
Also, 110-113 are latches, wherein 110 is a first input latch for storing a multiplier factor Y, 111 is a second input latch for storing a multiplicand X, 112 is an intermediate latch for storing the output result of the CLA 102, and 113 is an accumulation result latch for storing a result of the accumulation part 103 and for feeding its own output back to the accumulation part 103. 114 is a selector for selecting and outputting one of an output of the intermediate latch 112 and an output of the accumulation result latch 113.
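The relation between the multiplier 101 and the CLA 102 can be illustrated by the following minimal sketch. It is an assumption-laden software model, not the circuit of FIG.12: the function names are hypothetical, the operands are treated as unsigned, and the partial products are simple shifted copies of the multiplicand. It only shows that carry save addition leaves a pair of words whose total is the product, and that a final carry-propagating addition converts the pair into one binary numeral.

```python
def carry_save_add(a, b, c):
    """3:2 compression on whole words: a + b + c == s + cy (non-negative ints)."""
    s = a ^ b ^ c                                 # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1       # carries, moved up one digit
    return s, cy


def multiply_carry_save(x, y, width=16):
    """Return a (sum, carry) pair whose total equals x * y.

    Hypothetical model: x and y are unsigned, y has at most `width` bits.
    """
    s, cy = 0, 0
    for i in range(width):
        if (y >> i) & 1:
            # Fold the next partial product into the running pair.
            s, cy = carry_save_add(s, cy, x << i)
    return s, cy


def carry_propagate_add(s, cy):
    """Stand-in for the CLA: resolves the pair into a single binary number."""
    return s + cy
```

In this model no carry is propagated across the word until the final addition, which is why the multiplier can hand a pair of results, rather than a single number, to the next stage.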
The operation of the thus constructed product-sum arithmetic processor is explained next.
In the case of executing a multiplication instruction, the selector 114 is operated, according to a decoded result of the instruction, to output the output of the intermediate latch 112 as an output of the product-sum arithmetic processor. One pair of source operands (X, Y) is selected and outputted from the storage 104.
Next, the pair of source operands (X, Y) is supplied through the first and second input latches 110, 111 to the multiplier 101 to generate a pair of multiplied results. The pair of output results of the multiplier 101 is added in the CLA 102.
Then, the output result of the CLA 102 is latched to the intermediate latch 112 to be outputted through the selector 114.
In the case of executing a product-sum operation instruction, the selector 114 is operated, according to a decoded result of the instruction, to output the output of the accumulation result latch 113 as an output of the product-sum arithmetic processor. One pair of source operands (X0, Y0) is selected and outputted from the storage 104.
Next, the source operands (X0, Y0) are supplied through the first and second input latches 110, 111 to the multiplier 101 to generate a pair of multiplied results. The pair of output results of the multiplier 101 is added in the CLA 102.
Then, the output result of the CLA 102 is latched to the intermediate latch 112 and supplied to the accumulation part 103 for addition, together with the output of the accumulation result latch 113.
The output of the accumulation part 103 is latched to the accumulation result latch 113 and outputted through the selector 114.
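The two instruction flows described above can be summarized by the following behavioral sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, the multiplier 101 and the CLA 102 are collapsed into a single integer multiplication, and pipeline timing is not modeled. The comments map each step to the reference numerals of FIG.12.

```python
class ProductSumUnit:
    """Behavioral model of the datapath of FIG.12 (hypothetical names)."""

    def __init__(self):
        self.intermediate_latch = 0    # intermediate latch 112
        self.accumulation_latch = 0    # accumulation result latch 113

    def step(self, x, y, accumulate):
        # Multiplier 101 followed by CLA 102, modeled as one multiplication.
        self.intermediate_latch = x * y
        if accumulate:
            # Accumulation part 103 adds the product to the fed-back
            # output of the accumulation result latch 113.
            self.accumulation_latch += self.intermediate_latch
            return self.accumulation_latch    # selector 114 picks latch 113
        return self.intermediate_latch        # selector 114 picks latch 112


def product_sum(pairs):
    """Run a sequence of product-sum operation instructions on (X, Y) pairs."""
    unit = ProductSumUnit()
    result = 0
    for x, y in pairs:
        result = unit.step(x, y, accumulate=True)
    return result
```

For example, `product_sum([(1, 2), (3, 4), (5, 6)])` accumulates 2, 12 and 30 in turn, while a single `step(..., accumulate=False)` corresponds to a multiplication instruction.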
As a multiplier for constituting the product-sum arithmetic processor, a redundant binary multiplier is known as a conventional high-speed multiplier.
One example of the conventional redundant binary multiplier is explained with reference to the drawings.
FIG.13 is a block diagram of a conventional 16×16 redundant binary multiplier. In FIG.13, reference numerals 130-133 denote respectively first to fourth partial product generation circuit arrays (hereinafter referred to as first to fourth PPG arrays). 140 denotes a Booth's record circuit.
Reference numerals 120-123 denote redundant binary adder arrays (hereinafter referred to as RBA arrays) for adding redundant binary numerals and for generating a redundant binary output (a combination of sign and absolute value, or of positive output and negative output). Of these, 120 is a first RBA array for generating a result on the upper digit side, 121 is a second RBA array for generating a result on the lower digit side, 122 is a third RBA array for adding a result of the first RBA array 120 and a result of the second RBA array 121, and 123 is a fourth RBA array for adding a result of the third RBA array 122 and a supplementary term generated by the first PPG array 130. The partial product generation circuit array for generating a partial product and a supplementary term, the Booth's record circuit, and the redundant binary adder array are disclosed in U.S. Pat. No. 4,864,528 to Nishiyama et al.
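The kind of addition performed by a redundant binary adder can be illustrated by the following sketch. It uses a textbook signed-digit addition rule and is not taken from the cited patent; the function names and the list-of-digits representation (least significant digit first, each digit -1, 0 or 1) are assumptions made for illustration. The point it shows is that each result digit depends only on the digits at its own position and the position immediately below, so a carry never travels more than one digit.

```python
def rb_value(digits):
    """Value of a redundant binary numeral given LSB-first digits in {-1, 0, 1}."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def rb_add(x, y):
    """Add two redundant binary numerals with carries limited to one digit."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    w = [0] * n            # intermediate sum digit at each position
    t = [0] * (n + 1)      # transfer digit; t[i] enters position i
    for i in range(n):
        p = x[i] + y[i]
        # Only the digits one position below are inspected.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if p == 2:
            t[i + 1], w[i] = 1, 0
        elif p == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p == 0:
            t[i + 1], w[i] = 0, 0
        elif p == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        else:  # p == -2
            t[i + 1], w[i] = -1, 0
    # w[i] and t[i] are chosen so that their sum stays within {-1, 0, 1}.
    return [w[i] + t[i] for i in range(n)] + [t[n]]
```

Because every output digit is computed from a fixed, small neighborhood of input digits, the delay of such an adder does not grow with the word length, which is the property that makes a tree of RBA arrays attractive for summing partial products.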
The operation of the thus constructed redundant binary multiplier is explained below.
A 16-digit multiplier factor supplied from the first input latch 110 is converted into eight pairs of record values in the Booth's record circuit 140. The first to fourth PPG arrays 130-133 generate four partial products and four supplementary terms, using the eight pairs of record values and a multiplicand supplied from the second input latch 111. In the addition in the redundant binary expression, each supplementary term absorbs, without exception, a carry at the digit one position above a given digit so that the carry is not propagated to further upper digits, namely so that the carry is propagated by only one digit. In the second RBA array 121, the partial product and the supplementary term which are generated by the fourth PPG array 133 and the partial product generated by the third PPG array 132 are added. Likewise, in the first RBA array 120, the supplementary term generated by the third PPG array 132, the partial product and the supplementary term which are generated by the second PPG array 131, and the partial product generated by the first PPG array 130 are added. Next, in the third RBA array 122, the outputs of the first and second RBA arrays 120, 121 are added. Finally, in the fourth RBA array 123, the output of the third RBA array 122 and the supplementary term generated by the first PPG array 130 are added, whereby an output in redundant binary numeral is obtained.
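The conversion of a 16-digit multiplier factor into eight record values can be illustrated with the standard radix-4 Booth recoding shown below. This is a sketch under assumptions: the function names are hypothetical, the multiplier factor is interpreted as a 16-bit two's-complement number, and each record value is represented as a single signed digit from -2 to 2, whereas the text above speaks of pairs of record values whose exact encoding is not given here. The formation of redundant binary partial products and supplementary terms in the PPG arrays is not modeled.

```python
def booth_radix4_digits(y, width=16):
    """Recode a width-bit two's-complement value into width/2 digits in -2..2.

    Digits are returned least significant first; digit k has weight 4**k.
    """
    y &= (1 << width) - 1          # keep the low `width` bits
    digits = []
    prev = 0                       # implicit bit below the least significant bit
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, width=16):
    """Multiply x by the signed width-bit value y by summing recoded multiples."""
    return sum(d * x * 4 ** k
               for k, d in enumerate(booth_radix4_digits(y, width)))
```

With `width=16` the recoding yields eight digits, so only eight multiples of the multiplicand need to be summed instead of sixteen, which is consistent with the eight record values feeding the four PPG arrays described above.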
In the conventional product-sum arithmetic processor in FIG.12, however, the processing time of the multiplier 101 is the longest among those of the respective circuits, such as the CLA 102. Therefore, the operation speed of the multiplier 101 limits the speed of the arithmetic operation as a whole.
Further, in the above conventional redundant binary multiplier with the binary tree adder construction shown in FIG.13, in the case where a multiplier factor is a multiple of 4, the fourth RBA array 123 is required for adding the supplementary term at the most significant digit of the output of the third RBA array 122, which increases the number of addition stages by one.