Many advanced electronic devices employ a digital signal processor to perform complex signal processing. Most digital signal processing functions require multiplication, which is realized by a Multiplier; addition, which is realized by an Adder; subtraction, which is realized by an Adder with simple modifications to the inputs; and, in some applications, division, which is realized by a Divider. All of these mathematical operations involve addition, which is realized by an Adder. In typical mathematically intensive computations, the Multiplier dissipates substantial power.
It is well established that the primary power dissipation mechanism of the Multiplier is dynamic switching power, which is due to the large number of transistors switching at high speed within the Multiplier. This switching power problem is well recognized within the electronics industry. Generally, it is desirable that the power dissipation be as low as possible, as this leads to longer battery life for portable devices, fewer heat dissipation problems, etc. A well-established method of reducing the power dissipation of Multipliers is to reduce spurious switching (unnecessary transistor switching) where possible.
Multipliers may be categorized into two general classes: Serial and Parallel Multipliers. Parallel Multipliers are more popular because they are faster than Serial Multipliers and often dissipate less power. Their integrated circuit (IC) area requirement is usually larger, but this is not usually a problem.
A Parallel Multiplier generally comprises three functional blocks as shown in FIG. 1. A Multiplicand input (X) 1 and a Multiplier input (Y) 2 are entered into a Partial Product Generator functional block 3 and the Partial Product Generator 3 generates Partial Products. The Partial Product Generator 3 can be realized by different methodologies including Booth algorithm, modified Booth algorithm, Baugh-Wooley algorithm, etc. The Partial Products are subsequently added in the First Stage Adder Circuit 4 whose internal structure is usually either array-based or tree-based. A Multiplier whose First Stage Adder Circuit 4 is based on an array structure is henceforth termed a Parallel Array Multiplier, and a Multiplier whose First Stage Adder Circuit 4 is based on a tree structure is henceforth termed a Parallel Tree Multiplier. The design of Parallel Multipliers is well established and is described in a book authored by Wakerly entitled Digital Design—Principles & Practices, 3rd Edition, Prentice Hall, 2000 (Wakerly). The most popular tree-based structure is the Wallace-Tree, first proposed by Wallace in, “A Suggestion for a Fast Multiplier,” IEEE Transactions on Electronic Computers, vol. 13, pp. 14–17, February 1964. The adders in the First Stage Adder Circuit 4 can be realized by a number of different adders including the Carry-Ripple Adder, Carry-Save Adder, Carry-Look-Ahead Adder, etc, and the Carry-Save Adder is probably the most popular design. The Adder 7 in the First Stage Adder Circuit 4 can be a Full Adder, a Half Adder or a combination of these basic adders. Wakerly gives a good description of these prior-art adders.
In FIG. 1, a first subscript 8 and a second subscript 9 of the Adder 7 respectively denote the row and the column of the Adder 7. As illustrated in FIG. 1, there are M rows of Adders 7 in the First Stage Adder Circuit 4. Depending on the specific parallel structure, the number of columns in each row may be different. In the first row 10 there are a columns 13, in the second row 11 there are b columns 14, . . . , and in the Mth row 12 there are z columns 15. It is well known that the number of Adders 7 in the First Stage Adder Circuit 4 of a Parallel Array Multiplier is essentially the same for all rows. It is also well known that the number of rows of Adders 7 in the First Stage Adder Circuit 4 of a Parallel Tree Multiplier is smaller than that in the Parallel Array Multiplier, and that the number of adders decreases from the first row to the last row. Consequently, the Parallel Tree Multiplier is usually faster than the Parallel Array Multiplier and potentially dissipates less power. However, its layout structure is less regular.
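By way of illustration only, the difference in row count between the two structures can be estimated with the following sketch. It assumes that every row is built from 3:2 carry-save adders and that the First Stage Adder Circuit stops once two operand vectors remain; the function names array_rows and tree_rows are hypothetical and do not appear in the cited references.

```python
def array_rows(num_partial_products: int) -> int:
    """Rows of carry-save adders in an array structure.

    Assumption: the first row combines three partial products and every
    later row absorbs exactly one more, until two vectors remain.
    """
    return max(num_partial_products - 2, 0)


def tree_rows(num_partial_products: int) -> int:
    """Levels of 3:2 reduction in a Wallace-style tree structure.

    Assumption: at each level every full group of three vectors is
    compressed to two, and leftover vectors pass through unchanged.
    """
    rows = 0
    remaining = num_partial_products
    while remaining > 2:
        remaining = 2 * (remaining // 3) + remaining % 3
        rows += 1
    return rows


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, "partial products:", array_rows(n), "array rows,",
              tree_rows(n), "tree rows")
```

Under these assumptions, 16 partial products require 14 rows in the array structure but only 6 levels in the tree structure, which is consistent with the shorter signal path described above.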
The output of the First Stage Adder Circuit 4 comprising the Partial Product additions goes to the Final Stage Adder Circuit 5. The Final Stage Adder Circuit 5 is typically one row of adders. Usually, the output of the Final Stage Adder Circuit 5 and some products from the First Stage Adder Circuit 4 collectively form the Multiplication Product 6.
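The three functional blocks described above may be sketched behaviorally as follows. This is a minimal model under stated assumptions, not a description of any particular prior-art circuit: the operands are unsigned, the Partial Products are generated by simple AND gating rather than a Booth or Baugh-Wooley algorithm, the first stage is array-style carry-save addition, and all function names are hypothetical.

```python
def partial_products(x: int, y: int, bits: int) -> list[int]:
    """Partial Product Generator: one shifted copy of x per set bit of y.

    Assumption: unsigned AND-based generation, not a Booth encoder.
    """
    return [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """One row of full adders: three vectors in, sum and carry vectors out."""
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def first_stage(pps: list[int]) -> tuple[int, int]:
    """First Stage Adder Circuit (array-style): fold in one vector per row."""
    sum_vec, carry_vec = 0, 0
    for pp in pps:
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, pp)
    return sum_vec, carry_vec


def final_stage(sum_vec: int, carry_vec: int) -> int:
    """Final Stage Adder Circuit: carry-propagate add of the two vectors."""
    return sum_vec + carry_vec


def multiply(x: int, y: int, bits: int = 8) -> int:
    """Compose the three blocks into a complete multiplication."""
    sum_vec, carry_vec = first_stage(partial_products(x, y, bits))
    return final_stage(sum_vec, carry_vec)


if __name__ == "__main__":
    print(multiply(13, 11))  # 143
```

The sketch sends every product bit through the final addition for simplicity; it does not model the low-order product bits that, as noted above, may be taken directly from the First Stage Adder Circuit.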
The abovementioned spurious switching primarily occurs in the internal nodes of the First Stage Adder Circuit 4 and in the internal nodes of the Final Stage Adder Circuit 5. The origin of the spurious switching in the First Stage Adder Circuit 4 and in the Final Stage Adder Circuit 5 may be largely attributed to the different arrival times of input signals to the different adders. In the First Stage Adder Circuit 4, the spurious switching propagates from the nodes in the input stages of the first row to the latter rows where the amount of spurious switching usually increases substantially. Typically, the total amount of spurious switching in the Final Stage Adder Circuit 5 is substantially less than that in the First Stage Adder Circuit 4 because the Final Stage Adder Circuit 5 usually comprises a significantly smaller number of adders. However, the amount of spurious switching per adder in the Final Stage Adder Circuit 5 may be higher.
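The effect of unequal arrival times can be illustrated with a simplified model of the sum output of a single Full Adder. The sketch below assumes zero gate delay, discrete time steps, and inputs that each rise once from logic low to logic high; the function name and the chosen arrival times are hypothetical and serve only to illustrate the mechanism.

```python
def sum_output_transitions(arrival_times: dict[str, int], horizon: int = 10) -> int:
    """Count toggles on the sum output (a XOR b XOR cin) of a full adder.

    Each input is 0 before its arrival time and 1 from then on.
    Assumptions: zero gate delay, discrete time, single rising edge per input.
    """
    previous = 0  # all inputs start low, so the sum output starts low
    transitions = 0
    for t in range(horizon):
        a = int(t >= arrival_times["a"])
        b = int(t >= arrival_times["b"])
        cin = int(t >= arrival_times["cin"])
        current = a ^ b ^ cin
        if current != previous:
            transitions += 1
        previous = current
    return transitions


if __name__ == "__main__":
    skewed = {"a": 1, "b": 3, "cin": 5}   # inputs arrive at different times
    aligned = {"a": 5, "b": 5, "cin": 5}  # inputs arrive together
    print(sum_output_transitions(skewed))   # 3
    print(sum_output_transitions(aligned))  # 1
```

In both cases the output settles at the same final value, but the skewed inputs cause two additional transitions. In this simplified model, those additional transitions correspond to the spurious switching discussed above, and each one would also be presented as an input event to the adders of the following row.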
Several methods have been proposed to reduce the undesired spurious switching. For example, U.S. Pat. No. 5,333,119 to Raatz et al. employs a delayed-evaluation technique in which delay lines are used to appropriately time the dynamic Complementary Metal-Oxide-Semiconductor (CMOS) adders in the First Stage Adder Circuit 4. However, because the dynamic CMOS adders require constant pre-charging and evaluation of their output logic levels according to the applied inputs, the adders potentially feature higher spurious switching activity than the more conventional static CMOS adders. U.S. Pat. No. 5,787,029 to de Angel applies a logic family called Enable/Disable CMOS Differential Logic (ECDL). The ECDL is used to construct the computation units, including the basic adder cells, which operate in an iterative manner in the First Stage Adder Circuit 4. This operation potentially reduces the intermediate spurious switching in Multipliers. However, because reset and enable operations (similar to those of the dynamic CMOS adders) are required to appropriately time the ECDL adders, the advantages gained are mitigated by the potentially higher spurious switching activity. Furthermore, the ECDL adders require complementary signals, which double the switching activity and require a larger IC area.
Another method to reduce the undesired spurious switching is described in U.S. Pat. No. 5,818,743 to Lee et al., which places a plurality of delay elements and registers in selected signal lines to delay the arrival of signals at a Booth Encoder in the Partial Product Generator 3 and at the adders in the First Stage Adder Circuit 4. This improves the synchronicity of input timings. These delay elements and registers are separate circuit entities that are independent of the adders and the Booth Encoder. Because they are separate circuit entities, the added hardware costs are high. Furthermore, the additional power dissipated by the delay elements and registers may offset, or even exceed, the power saved by the reduced spurious switching, so that the overall power dissipation of the Multiplier may increase rather than decrease.
Lemonds et al., in a technical paper entitled “A Low Power 16 by 16 Multiplier Using Transition Reduction Circuitry,” International Workshop on Low Power Design, pp. 139–142, April 1994 (Lemonds 1994), proposes placing Latches at the input of the adders in the First Stage Adder Circuit 4 and clocking the Latches in a precise sequence so that inputs to the adders are synchronized. Although the spurious switching is reduced, the overhead cost of the Latches remains high. This is because the Latches are circuit entities independent of (and separate from) the adders, as in U.S. Pat. No. 5,818,743 Lee et al. Consequently, the power savings from reduced spurious switching may be offset by the power dissipation of the Latches that are external to the adders. Furthermore, the circuit costs of these Latches in terms of IC area may also be high.
Lu et al., in a technical paper entitled “A 200-MHz CMOS Pipelined Multiplier-Accumulator Using a Quasi-Domino Dynamic Full-Adder Cell Design,” IEEE Journal of Solid-State Circuits, vol. 28, No. 2, pp. 123–132, February 1993 (Lu 1993), propose including an internal C2MOS dynamic Latch at the output of all adders to perform logic inversion, buffering and pipelining functions. The objective of this proposal is to increase the throughput rate of the Multiplier. Although this design indirectly reduces the spurious switching in some of the subsequent stages of the adders in the First Stage Adder Circuit 4, the reduction of the spurious switching may not be significant for three reasons. First, with the Latch placed at the output instead of the input, some spurious switching still occurs within the internal nodes of the adder. Second, not all the outputs of the adders in one row are connected to the inputs of the following row of adders (particularly in tree-based First Stage Adder Circuits); some are instead connected to the inputs of adders in other rows. Because the input signals to the adders of these other rows are poorly synchronized, substantial spurious switching may still occur in those adders. Third, some further spurious switching may occur because of the way the Quasi-Domino Dynamic Full Adder operates. During the de-assert phase, the output of the Quasi-Domino Dynamic Full Adder is floating. Consequently, where the clock rate is slow, the output may change state (from logic high to logic low) if the charge at the output node leaks away. This change in state may inadvertently initiate some spurious switching. A pertinent observation about this Quasi-Domino Dynamic Full Adder design is that its internal Latch is placed at the output of the Adder and latches the output signal.
In summary, the abovementioned methods attempt to reduce spurious switching in one of two ways: either by appropriately timing the input signals (that is, synchronizing the inputs by latching them simultaneously) using Latches or similar logic circuits that are circuit entities separate from the Adders, or by using a Latch internal to the Adders to latch the output signal. In the former method, the overheads for realizing these timing adjustments are high, thereby negating the advantages of reduced spurious switching. In the latter method, the reduction in spurious switching is small.
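The distinction between latching the inputs and latching the output can be illustrated with a simplified behavioral sketch. It is not a model of any of the cited circuits: it assumes a single Full Adder sum node, zero gate delay, inputs that each rise once from logic low to logic high, and an idealized Latch that holds its previous value until a hypothetical enable_time and is transparent thereafter. The function name simulate and its parameters are likewise hypothetical.

```python
def simulate(arrivals: tuple[int, int, int], latch: str,
             enable_time: int, horizon: int = 10) -> tuple[int, int]:
    """Return (internal_toggles, output_toggles) for one full-adder sum node.

    latch == "input":  the adder sees all-zero inputs until enable_time.
    latch == "output": the adder sees raw inputs; only its output is held.
    Assumptions: zero gate delay, ideal latch, single rising edge per input.
    """
    internal_prev = 0
    output_prev = 0
    internal_toggles = 0
    output_toggles = 0
    for t in range(horizon):
        raw = [int(t >= arrival) for arrival in arrivals]
        if latch == "input" and t < enable_time:
            seen = [0, 0, 0]  # input latch still holds the old values
        else:
            seen = raw
        internal = seen[0] ^ seen[1] ^ seen[2]
        if latch == "output" and t < enable_time:
            output = output_prev  # output latch still holds the old value
        else:
            output = internal
        internal_toggles += int(internal != internal_prev)
        output_toggles += int(output != output_prev)
        internal_prev, output_prev = internal, output
    return internal_toggles, output_toggles


if __name__ == "__main__":
    skewed = (1, 3, 5)
    print(simulate(skewed, "input", enable_time=6))   # (1, 1)
    print(simulate(skewed, "output", enable_time=6))  # (3, 1)
```

In this idealized model both placements present a single clean transition to the next row, but only the input placement suppresses the transitions inside the adder itself. The sketch does not account for the power or area of the Latch, which is the overhead at issue in the former method.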
Furthermore, all the abovementioned methods or proposals are intended for synchronous digital logic circuits only and their application to asynchronous logic circuits is uncertain. A good description of synchronous logic circuits and asynchronous logic circuits can be found in a book authored by Dally and Poulton and entitled Digital Systems Engineering, Cambridge University Press, 1998.
It is of interest to note that the computation process of a Parallel Divider is similar to that of a Parallel Multiplier. Instead of the series of addition processes in a multiplication, a division performed by a Parallel Divider involves a series of subtraction processes. As mentioned earlier, a subtraction is simply realized by an addition (by means of adders) with simple modifications to the inputs. If the arrival times of the input signals to these adders (with modifications to the inputs) are poorly synchronized, a significant amount of spurious switching results, and the power dissipation in the Parallel Divider increases. Put simply, the spurious switching that occurs in prior-art Parallel Multipliers similarly occurs in prior-art Parallel Dividers.
Hence, it would be highly desirable to have synchronous logic-based and/or asynchronous logic-based Parallel Multipliers (and/or Parallel Dividers) with reduced spurious switching in the First Stage Adder Circuit 4 and/or in the Final Stage Adder Circuit 5, wherein the reduction is obtained with little overhead cost, that is, with a small amount of added hardware that dissipates less power than that of existing or prior-art methods or proposals.
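The relationship between division, subtraction and addition may be sketched as follows. The sketch assumes unsigned operands, two's-complement subtraction (inverting the subtrahend and setting the carry-in of the adder to one) and a restoring division scheme evaluated one quotient bit at a time; a Parallel Divider would instead unroll these trial subtractions into rows of adders. All function names are hypothetical.

```python
def adder(a: int, b: int, carry_in: int, bits: int) -> tuple[int, int]:
    """Carry-ripple adder built from full adders; returns (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry


def subtract(a: int, b: int, bits: int) -> tuple[int, int]:
    """Compute a - b with the adder: invert b and set the carry-in to 1.

    Returns (difference modulo 2**bits, no_borrow); no_borrow is 1 when a >= b.
    """
    mask = (1 << bits) - 1
    return adder(a, ~b & mask, 1, bits)


def divide(dividend: int, divisor: int, bits: int = 8) -> tuple[int, int]:
    """Restoring division: one trial subtraction per quotient bit."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in reversed(range(bits)):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # One extra bit of width so the shifted remainder always fits.
        difference, no_borrow = subtract(remainder, divisor, bits + 1)
        if no_borrow:
            remainder = difference
            quotient |= 1 << i
    return quotient, remainder


if __name__ == "__main__":
    print(divide(100, 7))  # (14, 2)
```

Every trial subtraction passes through the same full-adder logic as an addition, so the same sensitivity to input arrival times applies.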