A general-purpose processing system which performs multiply and add operations may allow these arithmetic operations to be performed at varying precision. High-precision operations generally consume greater circuit resources than low-precision operations. For example, in order to double the precision of a multiply operation, about four times as many circuits are required if the same performance is to be achieved.
A multiplier array capable of multiplying two 64-bit operands, without reusing the array in sequential fashion, must generate the equivalent of 64², or 4096, bits of binary product (a 1-bit multiply is the same as a boolean or binary “and” operation), and reduce the product bits in an array of binary adders which produces 128 bits of result. Because a single binary adder (a full adder) takes in three inputs and produces two outputs, each adder reduces the number of bits by one, so the number of binary adders required for such an array can be computed as 64² - 128, or 3968.
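The counting argument above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a model of any particular circuit; the function name `multiplier_array_cost` is hypothetical, and the sketch assumes a plain array with no Booth encoding or other reduction of the product bits.

```python
def multiplier_array_cost(n):
    """Estimate the size of an n x n-bit multiplier array.

    Assumes one AND gate per pair of operand bits and a reduction
    built only from full adders, each of which takes three input
    bits and produces two, removing one bit from the array.
    """
    product_bits = n * n                      # 1-bit products to generate
    result_bits = 2 * n                       # width of the final product
    full_adders = product_bits - result_bits  # one adder per bit removed
    return product_bits, full_adders


print(multiplier_array_cost(64))  # (4096, 3968)
```

Calling the same function with other operand widths shows how both counts grow roughly with the square of the operand size.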
There are well-known techniques for reducing the number of product bits, such as Booth encoding. There are also well-known techniques for performing the required add operations so as to minimize delay, such as the use of arrays of carry-save adders. These techniques can reduce the size of multiplier arrays and the delay of addition arrays; however, they do not appreciably change the relation between the size of the operand and the size of the multiplier and adder arrays.
Using the same arithmetic as before, a multiply of 32-bit operands generates the equivalent of 32², or 1024, bits of binary product, and uses 32² - 64, or 960, full adders to generate a 64-bit product. This is clearly approximately one-fourth the resources required for a multiply of 64-bit operands.
Because the product of 32-bit operands is 64 bits, while the product of 64-bit operands is 128 bits, one can perform two 32-bit multiplies which produce two 64-bit products, giving a 128-bit result. Because each 32-bit product uses one-fourth the resources of the 64-bit product, these two 32-bit products together use one-half the resources of the 64-bit product. Continuing this computation, four 16-bit products use one-quarter of the 64-bit multiplier resources, eight 8-bit products use one-eighth of the resources, and so forth.
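The utilization figures in this paragraph can be reproduced with a short sketch. It uses the count of 1-bit products as a stand-in for "resources", which is an approximation (it ignores the small difference in adder counts), and the function name `packed_utilization` is hypothetical.

```python
from fractions import Fraction


def packed_utilization(array_width, symbol_width):
    """Return (number of products, fraction of the array used).

    Assumes the array is sized for one array_width x array_width
    multiply and is instead filled with as many symbol_width-bit
    multiplies as fit in the same 2 * array_width-bit result.
    """
    count = array_width // symbol_width   # products packed into the result
    used = count * symbol_width ** 2      # 1-bit products actually needed
    available = array_width ** 2          # 1-bit products the array provides
    return count, Fraction(used, available)


for width in (64, 32, 16, 8):
    count, share = packed_utilization(64, width)
    print(f"{width}-bit symbols: {count} product(s), {share} of the array")
```

Under these assumptions the share of the array in use halves each time the symbol size halves, even though the result width stays at 128 bits.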
Thus, while this technique produces results with the same number of bits as the 64-bit product, decreasing the symbol size results in a proportionately decreasing utilization of the multiplier and adder array resources. Clearly, a design that has sufficient resources for a 64-bit multiply will be under-utilized for multiplies on smaller symbols.
Accordingly, there exists a need for a method, instruction set, and system in which a set of multiplier and adder circuit resources may be employed in a manner that increases the utilization of these resources for performing several multiply and add operations at once as a result of executing an instruction, and which also permits the expansion of the multiplier and adder circuit resources to an even higher level so as to further increase overall performance.