Multiplication is one of the most time-consuming arithmetic operations for a processor to perform. As a result, much effort has been expended on making the multiplication operation more efficient. In many instances, the success of a particular effort has been measured by whether it achieves an acceptable tradeoff between the number of clock cycles required to execute a multiply operation and the amount of hardware required to implement it. For example, a 16-bit by 16-bit multiply instruction can be executed in one clock cycle (or a small number of clock cycles, accounting for instruction execution overhead) if a 16×16 hardware multiplier is used, but the same instruction will take more clock cycles if a smaller multiplier is used.
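The tradeoff can be illustrated with a minimal software sketch. It assumes, purely for illustration, an 8×8 multiplier that produces one partial product per step, with the step count standing in for clock cycles. The function name `mul16_with_8x8` and the one-product-per-step model are hypothetical and do not describe any particular hardware.

```python
def mul16_with_8x8(a, b):
    """Multiply two unsigned 16-bit values using only 8x8 partial products.

    Returns (product, steps), where steps counts the 8x8 products formed.
    Assumption: each 8x8 product costs one step (a stand-in for one cycle).
    """
    assert 0 <= a < (1 << 16) and 0 <= b < (1 << 16)

    # Split each operand into a low byte and a high byte.
    a_parts = ((a & 0xFF, 0), (a >> 8, 8))
    b_parts = ((b & 0xFF, 0), (b >> 8, 8))

    product = 0
    steps = 0
    for a_byte, a_shift in a_parts:
        for b_byte, b_shift in b_parts:
            # One pass through the (hypothetical) 8x8 multiplier.
            product += (a_byte * b_byte) << (a_shift + b_shift)
            steps += 1
    return product, steps
```

Under these assumptions, every 16-bit by 16-bit multiply takes four steps on the smaller multiplier, whereas a full 16×16 multiplier would form the product in a single step.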
Other approaches have been taken as described, for example, in U.S. Pat. Nos. 5,557,563 and 4,276,607. These patents are briefly summarized here; for full details, the reader is referred directly to their disclosures.
U.S. Pat. No. 5,557,563 to Larri et al. describes a processor circuit that terminates a multiply instruction early based on one of the input operands being small, which limits the number of bits of the result. The circuit described by Larri et al. can terminate the multiply operation after one, two, three or four iterations of the multiplier core. See, e.g., col. 5, lines 57-58.
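The general idea of early termination can be sketched in software as follows. This is not the circuit of Larri et al.; it is a simplified model that assumes an unsigned multiplier operand consumed 8 bits per iteration over at most four iterations. The function name, the `bits_per_iter` and `max_iters` parameters, and the termination test (no non-zero bits remaining) are all assumptions made for illustration.

```python
def multiply_early_terminate(multiplicand, multiplier, bits_per_iter=8, max_iters=4):
    """Iterative unsigned multiply that stops once the multiplier is exhausted.

    Returns (product, iterations). All parameters are illustrative assumptions.
    """
    assert 0 <= multiplier < (1 << (bits_per_iter * max_iters))
    mask = (1 << bits_per_iter) - 1

    product = 0
    iterations = 0
    remaining = multiplier
    while True:
        # Consume the next low-order chunk of the multiplier.
        chunk = remaining & mask
        product += (multiplicand * chunk) << (bits_per_iter * iterations)
        remaining >>= bits_per_iter
        iterations += 1
        # Terminate early when no non-zero multiplier bits remain.
        if remaining == 0 or iterations == max_iters:
            break
    return product, iterations
```

In this model, a multiplier that fits in 8 bits finishes in one iteration, while a multiplier using all 32 bits needs the full four.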
U.S. Pat. No. 4,276,607 to Wong describes a processor circuit that detects trailing zeros in a multiplier operand and begins the multiplication operation at the lowest-order word of that operand having non-zero content.
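A rough software analogue of skipping trailing zero words is sketched below. It is not Wong's circuit; the 8-bit word size, the four-word operand, and the function name are hypothetical choices made only to show the effect on the number of steps.

```python
def multiply_skip_trailing_zero_words(multiplicand, multiplier, word_bits=8, num_words=4):
    """Unsigned multiply that starts at the lowest-order non-zero multiplier word.

    Returns (product, steps), where steps counts the words actually processed.
    Word size and word count are illustrative assumptions.
    """
    assert 0 <= multiplier < (1 << (word_bits * num_words))
    mask = (1 << word_bits) - 1

    # Split the multiplier into words, lowest order first.
    words = [(multiplier >> (word_bits * i)) & mask for i in range(num_words)]

    # Detect trailing zero words and find where the real work begins.
    start = 0
    while start < num_words and words[start] == 0:
        start += 1

    product = 0
    steps = 0
    for i in range(start, num_words):
        product += (multiplicand * words[i]) << (word_bits * i)
        steps += 1
    return product, steps
```

Note that this model still processes every word from the starting word upward, so a small multiplier such as 1 takes all four steps here, whereas a multiplier such as 0x01000000 takes only one.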
What is desired is a circuit and method that can further reduce the number of clock cycles (or, at least, the average number of clock cycles) that a processor requires to perform a multiply instruction.