1. Field of the Invention
The present invention relates to computer processors including dynamic hardware logic. In particular, it relates to a method and respective system for operating a digital adder circuit comprising a plurality of logical stages in the carry logic of said adder circuit, for generating and propagating predetermined groups of operand bits, each stage implementing a predetermined logic function and processing input variables from a preceding stage and outputting result values to a succeeding stage.
2. Description and Disadvantages of Prior Art
It is a general goal of microprocessor development to make computing faster from one microprocessor generation to the next. Additionally, there is a large class of computing devices for which a second requirement, low power consumption, is rated basically as important as computing performance. This is specifically true for all portable devices, for example notebooks, mobile phones, PDA devices, etc.
Adder circuits, on which the present invention is focused, lie on a critical path in many areas of microprocessor operation. Their importance is due to the fact that adders are used in the ADD/SUB units of arithmetic logic units, for memory address generation and for floating point calculations. Thus, reaching a minimum delay for those adder units is key to the cycle time. In particular in CMOS hardware logic, the microprocessor implementing such adder units can be clocked at a very high frequency, and further architectural efforts can be undertaken, in order to reach said minimum time delay and thus to increase processing speed. But by virtue of the before-mentioned second requirement, a reduced power consumption, it is worthwhile to look for a useful compromise between performance and power consumption. This is specifically true when developing adder architectures, as they play an important role, as stated above, and because the add operation per se is a very complicated and time-consuming operation compared to other operations, due to the enormous carry network of an adder device. The key role of adders grows further as larger address spaces are needed and as operands become longer, for example when two 64-bit operands are to be added instead of two 16-bit operands. The computing time needed for the 64-bit operands is basically 30% higher.
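The role of the carry network can be illustrated by a small behavioural sketch. It assumes the textbook per-bit signals generate (g = a AND b) and propagate (p = a XOR b) and combines them over 4-bit groups. The function names, the group size default and the ripple between groups are illustrative assumptions, not circuitry taken from this document.

```python
def group_signals(g, p, base, size):
    """Return (G, P) for the bit group base..base+size-1.

    G is 1 if the group itself produces a carry out,
    P is 1 if the group passes an incoming carry through.
    """
    G, P = 0, 1
    for i in range(base, base + size):
        gi = (g >> i) & 1
        pi = (p >> i) & 1
        G = gi | (pi & G)
        P = P & pi
    return G, P


def add_with_groups(a, b, width=64, group=4):
    """Behavioural model of an adder built from group generate/propagate.

    Assumption: group carries ripple from one group to the next; a real
    carry network would combine them in further logic stages.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b          # bit i generates a carry
    p = a ^ b          # bit i propagates an incoming carry
    carries = 0        # bit i holds the carry INTO bit position i
    cin = 0            # carry into the current group
    for base in range(0, width, group):
        size = min(group, width - base)
        c = cin
        for i in range(base, base + size):
            carries |= c << i
            c = ((g >> i) & 1) | (((p >> i) & 1) & c)
        G, P = group_signals(g, p, base, size)
        cin = G | (P & cin)   # carry out of the group
    return (p ^ carries) & mask
```

The sum bit at each position depends on the carry into that position, which is why the carry network, rather than the per-bit logic, dominates the delay of long adders.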
With reference back to the task of finding a good compromise between performance and power consumption, so-called static CMOS logic in 64-bit ADD/SUB units can reach a delay (latency) of about 10 fanout-of-4 (FO4) inverter delays at moderate power consumption. With dynamic CMOS logic the same adder can achieve a delay of about 6 FO4, but at about 4 times the power consumption of the above-mentioned static solution. This is specifically true for the so-called DOMINO-TYPE dynamic logic.
In prior art adder architecture, the developers of adder units decide whether the adder is to be implemented in static logic or in dynamic logic. A static adder is slower but needs less power, whereas a dynamic adder unit is quicker but has significantly higher power consumption. Thus, disadvantageously, prior art offers no good compromise between power consumption and adder speed other than reducing speed in order to obtain a moderate power consumption.
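The trade-off quoted above can be put into numbers with a short sketch. It uses only the approximate figures given in the text (about 10 FO4 versus about 6 FO4, and about 4 times the power), normalises the static power to 1, and takes power multiplied by delay as a rough measure of energy per add operation. The normalisation and the use of the power-delay product are assumptions made for illustration.

```python
# Approximate figures from the text; static power normalised to 1 (assumption).
STATIC = {"delay_fo4": 10.0, "power": 1.0}
DYNAMIC = {"delay_fo4": 6.0, "power": 4.0}


def energy_per_add(style):
    """Power x delay in normalised units, a rough proxy for energy per operation."""
    return style["delay_fo4"] * style["power"]


# How much faster the dynamic adder is, and how much more energy it spends.
speedup = STATIC["delay_fo4"] / DYNAMIC["delay_fo4"]
energy_ratio = energy_per_add(DYNAMIC) / energy_per_add(STATIC)

print(f"speedup of dynamic over static: {speedup:.2f}x")
print(f"energy per add, dynamic vs static: {energy_ratio:.1f}x")
```

Under these assumptions the dynamic adder is roughly 1.7 times faster but spends about 2.4 times the energy per addition.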
A promising approach to combine static logic with dynamic logic was offered by R. Montoye et al., “A Double precision Floating Point Multiply”, ISSCC 2003, Vol. 46, pp. 336, Digest of technical papers, Visuals Supplement, pp. 270.
In this publication a first attempt is made to place a latch at particular locations, in order to avoid the regular switching in every cycle that is to be expected in dynamic logic, and thus to save some of the power otherwise consumed by precharging the precharge nodes in each cycle.
With reference to FIG. 1 (prior art), the precharge problem of prior art is briefly described below, as it is closely related to the inventive approach disclosed herein.
In prior art it is known to apply so-called “keeper devices” and/or “bleeder devices”, which supply charge to a precharge node temporarily or continuously, respectively. This reduces the voltage drop caused by charge sharing, but also slows down the switching of the circuit. Keeper and bleeder devices charge the precharge node, which slows down the discharge of this node in case the logical function forces a discharge of said node.
In particular, in FIG. 1 the node 40 is the above-mentioned precharge node. During the so-called reset phase it is precharged to a certain voltage level, e.g. the supply voltage Vdd. This is done by the control of the reset transistor 12, which when switched to “pass”, connects the precharge node to the voltage source Vdd.
During the evaluation phase of the circuit, when an input setting is applied to the control inputs of the NFETs controlled by the input lines Ai and Bi, these transistors remove this charge to ground if the logic condition defined by the values of the logic input variables A, B turns “ON”, i.e., into pass mode, all transistors on a path between the precharged node 40 and the ground terminal. If only a part of said transistors is turned “ON” without opening up a connection between the precharged node and ground, then the node has to keep its charge but must share it with those active transistors.
Thus, basically the bleeder device 46 and a foot transistor device, which is not depicted in FIG. 1 but which resides at the “foot” of each transistor stack (the vertical paths in FIG. 1), cooperate in order to provide proper precharging independent of the actual input setting of the evaluation transistor stacks.
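The precharge and evaluation behaviour described above can be sketched as a simple behavioural model. The stack topology, the input names and the capacitance values below are hypothetical, since FIG. 1 is not reproduced here, and keeper, bleeder and foot devices are deliberately left out. Charge sharing is modelled by charge conservation, assuming the internal stack nodes start discharged.

```python
def leading_on(stack, inputs):
    """Number of conducting NFETs counted from the precharge node downwards."""
    count = 0
    for name in stack:
        if not inputs[name]:
            break
        count += 1
    return count


def evaluate_node(stacks, inputs, c_node=10.0, c_int=1.0, vdd=1.0):
    """Voltage of a precharged node after the evaluation phase.

    stacks : list of series NFET stacks, each a list of input names,
             ordered from the precharge node towards ground (hypothetical).
    c_node : capacitance of the precharge node (arbitrary units, assumption).
    c_int  : capacitance of one internal stack node (assumption).
    """
    exposed = 0
    for stack in stacks:
        k = leading_on(stack, inputs)
        if k == len(stack):
            return 0.0          # complete path to ground: node is discharged
        exposed += k            # k internal nodes now share the node's charge
    # No path to ground: charge is conserved but spread over more capacitance.
    return vdd * c_node / (c_node + exposed * c_int)


# Hypothetical two-stack example with inputs A0, B0 and A1, B1.
stacks = [["A0", "B0"], ["A1", "B1"]]
full = evaluate_node(stacks, {"A0": 1, "B0": 1, "A1": 0, "B1": 0})
partial = evaluate_node(stacks, {"A0": 1, "B0": 0, "A1": 0, "B1": 0})
idle = evaluate_node(stacks, {"A0": 0, "B0": 0, "A1": 0, "B1": 0})
```

In this model a partially conducting stack lowers the node voltage without discharging it, which is the voltage drop that keeper and bleeder devices are meant to counteract.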
The promising approach according to the above-mentioned “Montoye et al.”, however, cannot be transferred to carry groups of 4 bits (or more) in adder units, because of the general architectural constraint that limits the evaluation transistor stacks of N-FET devices to a maximum of 4 transistors, including said above-mentioned foot transistor device: the stacks would have at least 5 transistors in at least some paths of the carry network of the adder.
Thus, this hopeful approach could perhaps be used for 2-bit carry groups of adders, but not for 4-bit groups, which leads to a very limited applicability of this prior art static/dynamic logic combination.
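The stack-height argument can be checked with a short sketch. It assumes the textbook group generate function G = g[k-1] OR p[k-1]g[k-2] OR ... OR p[k-1]...p[1]g[0], and further assumes that each product term is built as one series N-FET stack with one transistor per literal plus one foot transistor. Both assumptions are one plausible reading of the constraint above, not a description of the cited circuit.

```python
def group_generate_terms(k):
    """Product terms of the k-bit group generate function, longest term last."""
    terms = []
    for j in range(k - 1, -1, -1):
        # Term for bit j: bit j generates, all higher bits in the group propagate.
        terms.append([f"p{i}" for i in range(k - 1, j, -1)] + [f"g{j}"])
    return terms


def max_stack_height(k, foot=True):
    """Longest series stack for a k-bit group, optionally counting the foot device."""
    longest = max(len(term) for term in group_generate_terms(k))
    return longest + (1 if foot else 0)


MAX_ALLOWED = 4  # limit stated in the text, foot transistor included

for k in (2, 4):
    height = max_stack_height(k)
    verdict = "within" if height <= MAX_ALLOWED else "exceeds"
    print(f"{k}-bit group: longest stack {height} transistors, {verdict} the limit")
```

Under these assumptions a 2-bit group needs at most 3 series transistors, whereas a 4-bit group needs 5 in its longest path, which matches the limitation stated above.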