1. Field of the Invention
This invention relates to the field of data processing. More particularly, this invention relates to data processing utilising a special purpose multiplier.
2. Description of the Prior Art
There is an established body of designs for digital circuits that produce the product of two input integer operands. It is also known to provide circuits that perform a multiplication and an addition in response to a single "multiply-accumulate" instruction (e.g. (A*B)+C). Circuits that perform a multiplication and an addition can be considered an included subset of general purpose multiplier circuits.
In response to circuit area and cost constraints, it is known to provide such functions using a circuit that performs only a portion of the required multiplication at a time, and to use that circuit iteratively to complete the full multiplication. For example, if the product of two 32-bit numbers is required, a circuit may be provided that multiplies a 32-bit number by an 8-bit number to produce a partial result; using this circuit four times, once for each 8-bit portion of the second operand, yields the full 32-bit by 32-bit product. The circuit which is repeatedly used within such systems is termed the multiplier core.
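The four-pass scheme can be sketched as a behavioural model. This is not a circuit description: it assumes unsigned operands and a core that consumes the second operand one byte at a time, least significant byte first. The names `core_32x8` and `iterative_multiply` are hypothetical.

```python
MASK32 = (1 << 32) - 1


def core_32x8(a: int, b_byte: int) -> int:
    """Hypothetical multiplier core: a 32-bit operand times an 8-bit operand."""
    assert 0 <= a <= MASK32 and 0 <= b_byte <= 0xFF
    return a * b_byte


def iterative_multiply(a: int, b: int) -> int:
    """Full 32x32 product built from four passes through the 32x8 core."""
    assert 0 <= b <= MASK32
    result = 0
    for i in range(4):
        b_byte = (b >> (8 * i)) & 0xFF             # i-th byte of b, LSB first
        result += core_32x8(a, b_byte) << (8 * i)  # weight each pass by 2**(8*i)
    return result


print(hex(iterative_multiply(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

Each pass contributes one 40-bit partial product at a different weight, which is why the core can be a quarter of the width of a full 32x32 array.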
A sophisticated multiplier core aimed at high-performance operation may use techniques such as Booth recoding and carry-save addition, whereas a less demanding system may use a one-bit-per-cycle "AND-gate and adder" multiplier core.
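The two ends of this range can be contrasted in a small behavioural model. The first function follows the one-bit-per-cycle "AND-gate and adder" scheme: each cycle ANDs the multiplicand with one multiplier bit and adds the result at the appropriate weight. The second shows radix-4 Booth recoding, which rewrites the multiplier as digits in {-2, -1, 0, 1, 2} so that roughly half as many partial products are needed. Both functions assume unsigned operands of even width, and the names are illustrative only.

```python
from typing import List


def and_gate_adder_multiply(a: int, b: int, width: int = 32) -> int:
    """One multiplier bit per cycle: width cycles for a width-bit b."""
    acc = 0
    for cycle in range(width):
        bit = (b >> cycle) & 1
        partial = a if bit else 0        # row of AND gates: a AND bit
        acc += partial << cycle          # adder accumulates at weight 2**cycle
    return acc


def booth_radix4_digits(b: int, width: int = 32) -> List[int]:
    """Radix-4 Booth digits of an unsigned width-bit b, least significant first.

    b is implicitly zero-extended by two bits, giving width // 2 + 1 digits
    d_i = -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0), so that
    sum(d_i * 4**i) == b.
    """
    digits = []
    prev = 0  # b[2i-1]; zero for the first digit
    for i in range(width // 2 + 1):
        b0 = (b >> (2 * i)) & 1
        b1 = (b >> (2 * i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, width: int = 32) -> int:
    """Each Booth digit selects 0, +/-a or +/-2a: about width/2 partial products."""
    digits = booth_radix4_digits(b, width)
    return sum((d * a) << (2 * i) for i, d in enumerate(digits))
```

For a 32-bit multiplier the AND-gate core needs 32 additions, while the Booth form needs 17 digit selections, each of which is a shift and optional negation of the multiplicand.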
FIG. 1 of the accompanying drawings illustrates an iterative multiplier. Two input operands A,B are supplied to an initialisation and latch circuit 2 where they are stored. Once it has latched the two operands, the initialisation and latch circuit 2 initialises the rest of the circuit. It then outputs D and R (latched versions of A and B) directly to a multiplier core 4, and outputs an initialisation value I to the multiplier core 4 via a multiplexer 6. The initialisation value I may be zero, or computed from A and B, depending on the type of multiplier core used. On each iterative cycle, the multiplier core 4 generates a partial result that is stored within a result latch 8. At least some of the bits of the partial result are fed back to the multiplier core 4 via the multiplexer 6 for the next iterative cycle. When the full number of cycles has been completed, the multiplication result is output from the result latch 8.
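A behavioural sketch of the FIG. 1 loop follows. It assumes the core consumes k bits of R per cycle (k = 8, matching the 32x8 example) and that the low k bits of each partial result are already final, so only the upper bits are fed back through the multiplexer. The parameter `init` stands in for the initialisation value I; the function name and this particular split of the partial result are assumptions for illustration, not details taken from FIG. 1.

```python
def fig1_multiply(a: int, b: int, init: int = 0, k: int = 8, width: int = 32) -> int:
    """Model of an iterative multiplier with partial-result feedback.

    Returns init + a * b for a >= 0, unsigned b < 2**width and init >= 0.
    """
    assert width % k == 0 and a >= 0 and 0 <= b < (1 << width) and init >= 0
    d, r = a, b                 # initialisation and latch circuit 2
    mask = (1 << k) - 1
    low_bits = 0                # bits of the result that are already final
    feedback = 0
    for cycle in range(width // k):
        # multiplexer 6: I on the first cycle, fed-back bits thereafter
        acc_in = init if cycle == 0 else feedback
        chunk = (r >> (k * cycle)) & mask
        partial = acc_in + d * chunk                # multiplier core 4
        # result latch 8: the low k bits are final at weight 2**(k*cycle)
        low_bits |= (partial & mask) << (k * cycle)
        feedback = partial >> k                     # only upper bits fed back
    return (feedback << width) | low_bits           # output of result latch 8
```

With the default `init=0` this returns the plain product, so `fig1_multiply(7, 6)` gives 42; a non-zero `init` in this model simply adds a constant to the product.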
The circuit of FIG. 1 utilises a standard binary representation of the operands and partial results throughout. FIG. 2 of the accompanying drawings illustrates a circuit using a redundant data representation within the multiplier core such that the result produced by the multiplier core on each iteration and after the final iteration is represented by two numbers that must be added to yield the standard binary representation (e.g. the carry result and the save result in a carry-save system). Circuits such as that illustrated in FIG. 2 yield a faster and/or less expensive multiplier core at the cost of having to perform a final addition to complete the multiplication.
FIG. 2 illustrates how two partial results C,S are output from a multiplier core 10 utilising a redundant data representation and stored within a result latch 12. At least some of the bits of these two partial results C,S are then fed back to the multiplier core 10 for the next iteration via respective multiplexers 14, 16. When the last iteration has been completed, the two partial results C,S are supplied to an adder 18 where they are subject to an addition operation to generate the final multiplication result.
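The redundant representation of FIG. 2 can be illustrated with a carry-save model. Each iteration passes the current pair (C, S) and one new partial product through a row of full adders (a 3:2 compressor), so no carry propagates along the word; only the final adder performs a carry-propagating addition. This sketch assumes one multiplier bit per iteration and unsigned operands; `csa` and `carry_save_multiply` are hypothetical names.

```python
from typing import Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder: returns (carry, save) with carry + save == x + y + z."""
    save = x ^ y ^ z                                # per-bit sum, no carries
    carry = ((x & y) | (x & z) | (y & z)) << 1      # per-bit majority, one place up
    return carry, save


def carry_save_multiply(a: int, b: int, width: int = 32) -> int:
    c, s = 0, 0                          # result latch 12 holds the redundant pair
    for i in range(width):
        row = (a << i) if (b >> i) & 1 else 0
        c, s = csa(c, s, row)            # multiplier core 10: no carry propagation
    return c + s                         # adder 18: the single full-width addition
```

For example, `csa(5, 3, 6)` returns `(14, 0)`: neither number alone is the sum, but together they represent 14, which is why the final adder is needed.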
A known refinement to multipliers as illustrated in FIGS. 1 and 2 is to provide a mechanism for early termination. Early termination is a technique intended to reduce the average number of iterations through the multiplier core (and hence the average time to perform a multiplication operation) by detecting situations in which further iterations through the multiplier core will not change the value of the multiplication result (typically because these iterations would each add only zero to the result). In some multiplier designs it is possible to determine the number of iterations that will be required by inspecting the value of the input operands. Typically, if one or both of the operands is small, fewer than the maximum number of iterations will be required, whereas if the operands are both large then all the iterations will be required.
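As an illustration of deriving the iteration count from an operand, the sketch below assumes an unsigned multiplier operand consumed k bits per iteration, least significant bits first: once all remaining upper bits are zero, every further iteration would add only zero, so the loop can stop. Signed operands would need a different test. The function names are hypothetical.

```python
def iterations_needed(r: int, k: int = 8, width: int = 32) -> int:
    """Iterations required for an unsigned width-bit multiplier operand r when
    the core consumes k bits of r per iteration (at least one iteration)."""
    n = 1
    while n < width // k and (r >> (k * n)) != 0:
        n += 1
    return n


def early_terminate_multiply(a: int, r: int, k: int = 8, width: int = 32):
    """Return (product, iterations used); skipped iterations would add only zero."""
    n = iterations_needed(r, k, width)
    mask = (1 << k) - 1
    result = 0
    for i in range(n):
        result += (a * ((r >> (k * i)) & mask)) << (k * i)
    return result, n
```

Here `early_terminate_multiply(1000, 3)` completes in a single iteration, whereas a multiplier operand such as `0x12345678` still needs all four.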
Whilst omitting the iterations that follow such early termination does not change the value of the result, it typically changes the position or "alignment" of the result on the output buses, as each iteration through the multiplier core gives a different weighting to the multiplier outputs. To deal with this, multipliers implementing early termination provide circuits that perform variable-length shifts of the multiplier core outputs, countering the effect of any early termination on the result alignment.
FIG. 3 of the accompanying drawings illustrates the circuit of FIG. 2 modified to provide an early termination capability. The circuit of FIG. 3 differs from that of FIG. 2 by the addition of respective early terminate shift mechanisms 20, 22 to act upon the partial results C,S. In operation, if the multiplication is to be early terminated, then the contents of the result latch 12 are passed through the early terminate shift mechanisms 20, 22 to undergo a degree of realignment dependent upon the stage at which the multiplication has been terminated. The realigned partial results are then combined by the adder 18.
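The realignment can be seen in a model with an assumed organisation: each new partial product enters the carry-save pair at the top of the register, and the pair is shifted right by k bits per iteration, so the result is correctly aligned only after the full number of iterations N. After n early-terminated iterations both C and S sit k(N - n) bits too high, and the final shifts, standing in for mechanisms 20 and 22, correct this before the adder. The right-shifting organisation, the byte-per-iteration core and all names are assumptions for illustration; the shift amount in a real design depends on its own datapath.

```python
from typing import Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder: returns (carry, save) with carry + save == x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def fig3_multiply(d: int, r: int, k: int = 8, width: int = 32) -> Tuple[int, int]:
    """Return (product, shift applied by the early terminate shift mechanisms)."""
    assert width % k == 0 and d >= 0 and 0 <= r < (1 << width)
    n_max = width // k
    # Early termination: stop once the remaining bits of r are all zero.
    n = 1
    while n < n_max and (r >> (k * n)) != 0:
        n += 1
    mask = (1 << k) - 1
    c, s = 0, 0                                        # result latch 12
    for i in range(n):
        row = (d * ((r >> (k * i)) & mask)) << width   # enters at the top
        c, s = csa(c, s, row)                          # multiplier core 10
        c, s = c >> k, s >> k          # exact: low k bits of c and s are zero
    shift = k * (n_max - n)
    c, s = c >> shift, s >> shift      # early terminate shift mechanisms 20, 22
    return c + s, shift                # adder 18
```

Without the final shifts, C + S in this model would equal the product multiplied by `2**shift`, which is exactly the misalignment the early terminate shift mechanisms remove; with a full-length operand the shift is zero.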