Hardware-based multipliers are often added to digital signal processors (DSPs) and central processing units (CPUs) to speed up multiplication, and thus to speed up application execution. Typically, in DSPs and CPUs, the operands are loaded into respective registers and are then supplied to the execution unit, which performs the appropriate operation. The execution unit is often made up of one or more function units, each of which has access to the operands stored in the registers and performs a particular dedicated function, e.g., addition, Boolean arithmetic, or multiplication. The longest delay through any of the function units defines the critical path of the execution unit and determines its maximum speed. Since all the function units have access to the operands in parallel, every function unit performs its respective function on each cycle of the execution unit. However, only the result from the function unit specified by the instruction currently being executed is supplied as the output of the execution unit. As a result, much work is performed, and power is consumed in doing so, although only a small fraction of that power contributes to a useful result. In particular, the multiplier function unit consumes the lion's share of the power, although it is often used only a small percentage of the time.
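The behavior described above can be illustrated with a minimal behavioral sketch. It is not a model of any particular processor: the 32-bit datapath width, the three function units, and the names `FUNCTION_UNITS` and `execute` are assumptions chosen only for illustration. Every function unit evaluates the operands on each cycle, and an output selector keeps a single result.

```python
from typing import Callable, Dict, Tuple

MASK = 0xFFFFFFFF  # assumed 32-bit datapath (illustrative only)

# Hypothetical function units; each sees the same two operands.
FUNCTION_UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,
    "and": lambda a, b: a & b,
    "mul": lambda a, b: (a * b) & MASK,
}


def execute(opcode: str, a: int, b: int) -> Tuple[int, int]:
    """Model one cycle: all units compute, one result is selected.

    Returns the selected result and the number of results that were
    computed but discarded in that cycle.
    """
    # All function units operate in parallel on the same operands.
    results = {name: unit(a, b) for name, unit in FUNCTION_UNITS.items()}
    # Only the unit named by the current instruction drives the output.
    discarded = len(results) - 1
    return results[opcode], discarded


if __name__ == "__main__":
    print(execute("add", 3, 4))
    print(execute("mul", 3, 4))
```

In this sketch an add instruction still causes the multiplier to compute a product that is then thrown away, which corresponds to the wasted power discussed above.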
A known approach to reducing the power used by the multiplier is to logically gate one or more of its inputs with a control signal that is responsive to the instruction type, so that inputs are supplied to the multiplier only when multiplication is actually being performed. The most commonly used type of multiplier applies Booth encoding to one of the operands. It is known that gating the Booth-encoded input yields the lowest power consumption in the multiplier. However, it is also well known that the critical path of the multiplier runs through the Booth encoder. As a result, disadvantageously, gating this input adds delay to the multiplier, and hence reduces the speed of the entire execution unit.
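The following sketch illustrates input gating ahead of a Booth encoder. It assumes radix-4 (modified) Booth recoding and a 16-bit signed multiplier operand; the text above says only that Booth encoding is used, so the radix, the width, and the names `booth_radix4_digits` and `gated_booth_multiply` are hypothetical. The model is purely behavioral and does not capture timing, so it shows the power-saving effect of gating (no nonzero digits, hence no partial products) but not the added delay on the critical path.

```python
from typing import List, Tuple


def booth_radix4_digits(y: int, bits: int = 16) -> List[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    """
    assert bits % 2 == 0, "radix-4 recoding consumes two bits per digit"
    y &= (1 << bits) - 1          # two's-complement bit pattern
    digits = []
    prev = 0                      # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def gated_booth_multiply(x: int, y: int, is_multiply: bool,
                         bits: int = 16) -> Tuple[int, int]:
    """Multiply x by y with the Booth-encoded operand gated.

    Returns the product and the number of nonzero partial products,
    used here as a rough stand-in for switching activity.
    """
    # Gate: the multiplier operand reaches the encoder only when the
    # current instruction is a multiplication; otherwise it sees zero.
    gated_y = y if is_multiply else 0
    digits = booth_radix4_digits(gated_y, bits)
    active = sum(1 for d in digits if d != 0)
    product = sum(d * x * (4 ** i) for i, d in enumerate(digits))
    return product, active


if __name__ == "__main__":
    print(gated_booth_multiply(7, -3, True))
    print(gated_booth_multiply(7, -3, False))
```

When `is_multiply` is false, every Booth digit is zero and no partial product is generated. In hardware, the gate that produces `gated_y` sits in series with the Booth encoder, which is why this form of gating lengthens the path that already limits the speed of the multiplier.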