1. Field of the Invention
The present invention relates generally to the field of multiply-accumulate modules and parallel multipliers. More specifically, the present invention is directed towards low power and high performance multiply-accumulate modules and parallel multipliers, and methods for designing such multiply-accumulate modules and parallel multipliers.
2. Description of Related Art
Some known multiply-accumulate modules may comprise a multiplier register, a multiplicand register, an accumulator or result register, and a multiply-accumulate core. The multiplier register may store a first binary number, and the multiplicand register may store a second binary number. The multiply-accumulate core may multiply the first binary number by the second binary number, and may add the product of the first binary number and the second binary number to a third binary number initially or previously stored in the result register. The multiply-accumulate core may comprise a Booth encoder, a plurality of data processing cells, a Booth decoder, a Wallace tree, an adder circuit, and a saturation detection circuit.
The multiplier register may be connected to the Booth encoder, which may be connected to the Booth decoder. The multiplicand register may be connected to each data processing cell, and each data processing cell may be connected to the Booth decoder. The Booth decoder may be connected to the Wallace tree, which may be connected to the adder circuit and to the result register. The adder circuit may be connected to the saturation detection circuit, which may be connected to the result register, such that the product of the first binary number and the second binary number may be added to the third binary number initially stored in the result register. This new value then may replace the initial value stored in the result register. Because the result register also is connected to the Wallace tree, the product of a subsequent first binary number and a subsequent second binary number may be added to the previous output stored in the result register, i.e., the sum of the value initially stored in the result register and the product of the first binary number and the second binary number.
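The datapath described above can be modeled behaviorally. The following Python code is a minimal sketch under stated assumptions, not the design of any particular module: it assumes 16-bit signed operands, a 32-bit signed result register, radix-4 Booth recoding for the Booth encoder, and rows of 3:2 carry-save compressors for the Wallace tree. The data processing cells are not modeled separately, and all names and widths (`WIDTH`, `RESULT_BITS`, `mac_step`, and so on) are hypothetical. It checks arithmetic behavior only and says nothing about timing.

```python
# Hypothetical parameters: the source does not specify any widths.
WIDTH = 16           # signed multiplier/multiplicand width (must be even for radix-4 Booth)
RESULT_BITS = 32     # signed width of the result register
INTERNAL_BITS = 34   # carry-save datapath width; wide enough that acc + x*y cannot wrap


def booth_radix4_digits(y: int, width: int = WIDTH) -> list[int]:
    """Booth encoder: recode a signed `width`-bit multiplier into digits in {-2, -1, 0, 1, 2}."""
    bits = y & ((1 << width) - 1)  # two's-complement bit pattern of y
    digits, prev = [], 0           # prev holds the implicit bit y[-1] = 0
    for i in range(0, width, 2):
        b0, b1 = (bits >> i) & 1, (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # d_j = -2*y[2j+1] + y[2j] + y[2j-1]
        prev = b1
    return digits


def csa(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 carry-save adder (a row of full adders): a + b + c == sum + carry (mod 2**bits)."""
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry


def wallace_reduce(rows: list[int], mask: int) -> list[int]:
    """Wallace-style reduction: apply 3:2 compressors layer by layer until two rows remain."""
    rows = rows + [0] * max(0, 2 - len(rows))
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2], mask))
        rows = nxt + rows[full:]  # leftover rows pass through to the next layer
    return rows


def to_signed(v: int, bits: int) -> int:
    """Interpret a masked `bits`-wide value as two's complement."""
    return v - (1 << bits) if v >> (bits - 1) else v


def saturate(v: int, bits: int) -> int:
    """Saturation detection: clamp to the signed range of the result register."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def mac_step(acc: int, x: int, y: int) -> int:
    """One multiply-accumulate operation: saturate(acc + x * y)."""
    mask = (1 << INTERNAL_BITS) - 1
    # Booth decoder: each digit selects 0, +/-x or +/-2x, shifted 2 bits per digit.
    rows = [((d * x) << (2 * j)) & mask for j, d in enumerate(booth_radix4_digits(y))]
    rows.append(acc & mask)  # the result register feeds back into the Wallace tree
    s, c = wallace_reduce(rows, mask)
    total = to_signed((s + c) & mask, INTERNAL_BITS)  # final carry-propagate adder
    return saturate(total, RESULT_BITS)
```

For example, `mac_step(10, -3, 7)` yields `-11`, and `mac_step(2**31 - 1, 1, 1)` stays at `2**31 - 1` because the saturation step clamps the overflow.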
As such, the previous output stored in the result register may be replaced by a new output from the multiply-accumulate core. Moreover, the new output from the multiply-accumulate core stored in the result register may be expressed as $A_n = A_{n-1} + X_i \cdot Y_i$, where $A_{n-1}$ is the output from the multiply-accumulate core previously stored in the result register, $X_i \cdot Y_i$ is the product of the current first binary number and the current second binary number being multiplied by the multiply-accumulate core, and $A_n$ is the new value stored in the result register, which replaces $A_{n-1}$.
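The recurrence can be traced over a stream of operand pairs. The sketch below is illustrative only: the function name `accumulate` and the 32-bit signed result register are assumptions, and the clamp stands in for the saturation detector, so the values follow $A_n = A_{n-1} + X_i \cdot Y_i$ exactly whenever no overflow occurs.

```python
def accumulate(pairs, initial=0, result_bits=32):
    """Apply A_n = A_{n-1} + X_i * Y_i over a stream of (X_i, Y_i) operand pairs.

    Each new value replaces the previous one in the result register; the clamp
    models the saturation detector for an assumed signed `result_bits` register.
    Returns the sequence of values held by the result register.
    """
    lo, hi = -(1 << (result_bits - 1)), (1 << (result_bits - 1)) - 1
    a = initial  # value initially stored in the result register
    history = []
    for x, y in pairs:
        a = max(lo, min(hi, a + x * y))  # A_n replaces A_{n-1}
        history.append(a)
    return history


print(accumulate([(1, 2), (3, 4), (5, 6)]))  # [2, 14, 44]
```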
In any known multiply-accumulate module, the multiply-accumulate module may have a plurality of paths. A path may be defined as an electrical route along which an electrical signal travels from an input of the multiply-accumulate module, e.g., the multiplier register or the multiplicand register, to an output of the multiply-accumulate module, e.g., the output of the saturation detector. Some of these paths also may be critical paths. A critical path may be defined as a path along which the time required for the electrical signal to travel from an input of the multiply-accumulate module to an output of the multiply-accumulate module is greater than or equal to a predetermined amount of time, where the predetermined amount of time is less than the longest such travel time over all paths of the multiply-accumulate module. For example, more than ten thousand of the paths in a known multiply-accumulate module may be critical paths. Moreover, in any known multiply-accumulate module, the Wallace tree may comprise a plurality of Wallace tree cells, and each Wallace tree cell may comprise a Wallace tree circuit, which may comprise a plurality of components, e.g., a plurality of transistors. Some of the Wallace tree cells may be involved in at least one critical path of the multiply-accumulate module. For example, some Wallace tree cells may be involved in one critical path, and other Wallace tree cells may be involved in more than four thousand, more than six thousand, or more than eight thousand critical paths. Nevertheless, some Wallace tree cells may not be involved in any critical paths.
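The critical-path definition can be illustrated on a toy netlist. In the sketch below, the cell names, connections, and per-cell delays are hypothetical, and paths are enumerated exhaustively, which grows exponentially with circuit size and is only practical for a small example. A path is critical if its delay is at least a threshold chosen below the longest path delay, and each cell is then counted by the number of critical paths it lies on.

```python
from collections import Counter

# Hypothetical toy netlist: cell -> downstream cells, with per-cell delays in
# arbitrary units. A real multiply-accumulate module has far more cells and paths.
GRAPH = {
    "multiplier_reg": ["booth_enc"],
    "multiplicand_reg": ["dec0", "dec1"],
    "booth_enc": ["dec0", "dec1"],
    "dec0": ["wt0"],
    "dec1": ["wt0", "wt1"],
    "wt0": ["adder"],
    "wt1": ["adder"],
    "adder": ["sat"],
}
DELAY = {"multiplier_reg": 0, "multiplicand_reg": 0, "booth_enc": 2, "dec0": 1,
         "dec1": 3, "wt0": 4, "wt1": 1, "adder": 5, "sat": 1}
SOURCES, SINKS = ["multiplier_reg", "multiplicand_reg"], {"sat"}


def enumerate_paths(graph, delay, sources, sinks):
    """Depth-first enumeration of every input-to-output path and its total delay."""
    paths = []

    def dfs(node, path, total):
        path, total = path + [node], total + delay[node]
        if node in sinks:
            paths.append((path, total))
        for nxt in graph.get(node, []):
            dfs(nxt, path, total)

    for s in sources:
        dfs(s, [], 0)
    return paths


def critical_paths(graph, delay, sources, sinks, threshold):
    """Paths whose delay is >= threshold, plus how many critical paths each cell lies on."""
    paths = enumerate_paths(graph, delay, sources, sinks)
    longest = max(t for _, t in paths)
    if not threshold < longest:
        raise ValueError("threshold must be below the longest path delay")
    critical = [p for p, t in paths if t >= threshold]
    involvement = Counter(cell for p in critical for cell in p)
    return critical, involvement


critical, involvement = critical_paths(GRAPH, DELAY, SOURCES, SINKS, threshold=13)
print(len(critical))       # 3
print(involvement["wt0"])  # 3
print(involvement["wt1"])  # 0: this Wallace tree cell lies on no critical path
```

Here the longest path delay is 15, so a threshold of 13 marks three of the six paths as critical; cell `wt0` lies on all three, while `wt1` lies on none.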
Similarly, the Booth decoder may comprise a plurality of Booth decoder cells, and each of the Booth decoder cells may comprise a Booth decoder circuit, which may comprise a plurality of components. In addition, some of the Booth decoder cells may be involved in at least one critical path of the multiply-accumulate module, and other Booth decoder cells may not be involved in any critical paths.
In one known multiply-accumulate module, when a first Wallace tree cell is involved in at least one critical path and a second Wallace tree cell is not involved in any critical paths, the Wallace tree circuit for the first Wallace tree cell may be structurally the same as the Wallace tree circuit for the second Wallace tree cell, i.e., the circuit design employed in the first Wallace tree cell may be the same as the circuit design employed in the second Wallace tree cell. Moreover, the components used to implement the Wallace tree circuit design for the first Wallace tree cell may have the same performance capabilities as the corresponding components used in the Wallace tree circuit for the second Wallace tree cell, i.e., each component used in the first Wallace tree cell may have the same speed capability and the same size as the corresponding component used in the Wallace tree circuit for the second Wallace tree cell. When a first component is larger, e.g., wider, than a corresponding second component, the first component may operate faster than the second component, but the first component also may consume more power than the second component. Similarly, in such a known multiply-accumulate module, when a first Booth decoder cell is involved in at least one critical path and a second Booth decoder cell is not involved in any critical paths, the Booth decoder circuit for the first Booth decoder cell may be structurally the same as the Booth decoder circuit for the second Booth decoder cell. Moreover, each component used in the Booth decoder circuit for the first Booth decoder cell may have the same performance capabilities as the corresponding component used in the Booth decoder circuit for the second Booth decoder cell.
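The size, speed, and power trade-off can be made concrete with a deliberately simple model. The sketch below assumes a toy first-order relationship in which delay falls as width grows and power rises linearly with width; the constants, the `Component` class, and the cell names are hypothetical and do not describe any real device. It shows that when every cell uses the same sizing, as in the module just described, non-critical cells draw the same power as critical ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Toy first-order model of a sized component (hypothetical constants)."""
    width: float  # relative transistor width

    def delay(self, load: float = 4.0, intrinsic: float = 1.0) -> float:
        # A wider device supplies more drive current, so it charges a fixed load faster.
        return intrinsic + load / self.width

    def power(self, per_unit_width: float = 1.0) -> float:
        # A wider device switches more capacitance, so it consumes more power.
        return per_unit_width * self.width


def power_by_criticality(cells: dict[str, bool], component: Component) -> tuple[float, float]:
    """Split total power between critical and non-critical cells under uniform sizing."""
    critical = sum(component.power() for is_critical in cells.values() if is_critical)
    non_critical = sum(component.power() for is_critical in cells.values() if not is_critical)
    return critical, non_critical


narrow, wide = Component(width=1.0), Component(width=2.0)
print(narrow.delay(), wide.delay())  # 5.0 3.0
print(narrow.power(), wide.power())  # 1.0 2.0

# Hypothetical Wallace tree cells: True means involved in at least one critical path.
cells = {"wt0": True, "wt1": False, "wt2": False}
print(power_by_criticality(cells, wide))  # (2.0, 4.0)
```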
Another known multiply-accumulate module may be substantially similar to the above-described known multiply-accumulate module, except that two power supplies operating at two different voltages may be employed to power the cells. Specifically, each first cell that is involved in at least one critical path may be connected to the first power supply, and each second cell that is not involved in any critical paths may be connected to the second power supply, which may operate at a lower voltage than the first power supply. Using two separate power supplies may decrease the power consumed by the cells not involved in any critical paths, and thus may decrease the power consumed by the multiply-accumulate module. Nevertheless, using two power supplies may require an extra power supply line, which may increase the size of the multiply-accumulate module.
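The saving from a second, lower supply voltage can be estimated with the standard first-order CMOS switching-power expression $P = \alpha C V_{DD}^2 f$. The sketch below applies it to a hypothetical module of one critical cell and three non-critical cells; the supply voltages, capacitances, and activity factors are illustrative assumptions, and the model ignores the area cost of the extra power supply line and any other overhead.

```python
def dynamic_power(activity: float, capacitance: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power: P = alpha * C * Vdd**2 * f."""
    return activity * capacitance * vdd ** 2 * freq


def module_power(cells, v_high, v_low=None, freq=1.0):
    """Sum switching power over cells given as (capacitance, activity, is_critical).

    With v_low=None every cell runs from the single supply v_high; otherwise
    cells on no critical path are powered from the lower supply v_low.
    """
    total = 0.0
    for cap, act, critical in cells:
        vdd = v_high if critical or v_low is None else v_low
        total += dynamic_power(act, cap, vdd, freq)
    return total


# Hypothetical module: one critical cell and three non-critical cells.
cells = [(1.0, 0.5, True)] + [(1.0, 0.5, False)] * 3
single = module_power(cells, v_high=1.2)
dual = module_power(cells, v_high=1.2, v_low=0.9)
print(round(single, 3), round(dual, 3))  # 2.88 1.935
```

Because power scales with the square of the supply voltage, lowering only the non-critical cells from 1.2 V to 0.9 V in this example cuts their switching power to about 56% of its single-supply value.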
Yet another known multiply-accumulate module also may be substantially similar to the above-described known multiply-accumulate module, including its use of a single power supply, except that the threshold voltage of the transistors employed in those cells that are not involved in any critical paths may be altered. Nevertheless, employing transistors having different threshold voltages may increase the cost of manufacturing the multiply-accumulate module. Moreover, because the power consumed by a cell may not depend substantially on the threshold voltage of the transistors employed in the cell, the power consumed by the cell may not be substantially reduced.