Modern data processing systems may perform Boolean operations on a set of signals using dynamic logic circuits. Dynamic logic circuits are clocked. During the precharge phase of the clock, the circuit is preconditioned, typically by precharging an internal node of the circuit (the dynamic node) through a coupling to a power supply rail. During the evaluate phase of the clock, the circuit evaluates the Boolean function it implements, based on the input signal values present on its inputs during that phase. (For the purposes herein, it suffices to assume that the input signals have settled to their “steady-state” values for the current clock cycle, recognizing that the input values may change from one clock cycle to the next.) Such dynamic logic may have advantages over static logic in both speed and chip area. However, because the output node switches each cycle as the clock toggles between phases, dynamic logic may consume power even when the logical value of the output is unchanged.
This may be appreciated by referring to FIG. 1A, which illustrates an exemplary three-input OR dynamic logic gate, and to the accompanying timing diagram of FIG. 1B. This type of logic gate is referred to in the literature as a Domino logic gate, because state changes ripple through cascaded gates during the evaluate phase like falling dominoes.
Dynamic logic 100, FIG. 1A, includes three inputs a, b and c, each coupled to the gate of a corresponding one of NFETs 102a-102c. During evaluate phase N1 of clock 104, NFET 106 is active, and if any of inputs a, b or c is active, dynamic node 108 is pulled low and the output OUT goes “high” via inverter 110. During precharge phase N2 of clock 104, dynamic node 108 is precharged via PFET 112, and half-latch PFET 114 maintains the charge on dynamic node 108 through the following evaluate phase unless one or more of inputs a, b or c is asserted. Referring now to the illustrative timing diagram of FIG. 1B, input a goes high at t1, during a precharge phase N2 of clock 104, and remains high until t2. This interval spans approximately 2½ cycles of clock 104 and includes evaluate phases 116 and 118. Consequently, dynamic node 108 undergoes two discharge-precharge cycles, 124 and 126. The output node similarly undergoes two discharge-precharge cycles, albeit with opposite phase, 124 and 126. Because the output is discharged each time dynamic node 108 is precharged, even though the Boolean value of the logical function remains “true” (that is, “high” for OR gate 100), the dynamic logic dissipates power even when the input signal states are unchanged.
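The behavior of FIG. 1B can be reproduced with a small phase-level model. This is a sketch under stated assumptions, not a circuit simulation: time is discretized into clock phases (even-numbered phases precharge, odd-numbered phases evaluate), devices are ideal, the keeper holds the dynamic node perfectly, and the class and function names are hypothetical. It counts discharges of the dynamic node and toggles of OUT while input a stays high for about 2½ cycles.

```python
class DominoOr3:
    """Phase-level model of the three-input domino OR gate of FIG. 1A.

    Assumptions: one update per clock phase, ideal devices, and an ideal
    half-latch keeper (PFET 114) that holds the dynamic node during evaluate.
    """

    def __init__(self) -> None:
        self.dynamic_node = True   # dynamic node 108, starts precharged (high)
        self.out = False           # OUT = NOT(dynamic node), via inverter 110
        self.node_discharges = 0   # high-to-low events on the dynamic node
        self.out_transitions = 0   # toggles of OUT in either direction

    def step(self, evaluate: bool, a: bool, b: bool, c: bool) -> bool:
        if not evaluate:
            new_node = True               # precharge phase N2: PFET 112 charges the node
        elif a or b or c:
            new_node = False              # evaluate phase N1: NFET 106 on, node discharges
        else:
            new_node = self.dynamic_node  # evaluate, no active input: keeper holds the node
        if self.dynamic_node and not new_node:
            self.node_discharges += 1
        self.dynamic_node = new_node
        new_out = not new_node
        if new_out != self.out:
            self.out_transitions += 1
        self.out = new_out
        return self.out


def simulate(a_wave: list[bool]) -> tuple[DominoOr3, list[bool]]:
    """Drive input a with one value per phase while b and c stay low.
    Even-numbered phases are precharge, odd-numbered phases are evaluate."""
    gate = DominoOr3()
    outs = [gate.step(phase % 2 == 1, a, False, False) for phase, a in enumerate(a_wave)]
    return gate, outs


# Input a rises during a precharge phase and stays high for five phases
# (about 2.5 clock cycles), spanning two evaluate phases.
A_WAVE = [False, False, True, True, True, True, True, False, False, False]

if __name__ == "__main__":
    gate, outs = simulate(A_WAVE)
    print(gate.node_discharges, gate.out_transitions)  # 2 4
```

Although input a never changes while it is high, the model's dynamic node discharges twice and OUT toggles four times, which is the wasted switching the paragraph describes.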
Additionally, dynamic logic may be implemented in a dual-rail embodiment in which all of the logic is duplicated, one gate for each sense of the data. That is, each logic element includes one gate to produce the output signal and an additional gate to produce its complement. Such implementations may exacerbate the power dissipation of dynamic logic elements and may also negate the area advantage of dynamic logic.
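One way to see why duplicating the logic increases power is to count discharges per evaluate phase. The sketch below assumes the complement rail of a three-input OR is a domino NOR built from series NFETs driven by the complemented inputs, and that every discharge is followed by a precharge; the function and constant names are hypothetical.

```python
from itertools import product


def discharges(cycles: list[tuple[bool, bool, bool]]) -> tuple[int, int]:
    """Count dynamic-node discharges (each followed by a precharge) for a
    single-rail three-input OR and for a dual-rail OR/NOR pair, with one
    evaluate phase per cycle."""
    single_rail = dual_rail = 0
    for a, b, c in cycles:
        f = a or b or c                          # true rail: parallel NFETs on a, b, c
        f_bar = (not a) and (not b) and (not c)  # complement rail: series NFETs on a', b', c'
        single_rail += int(f)
        dual_rail += int(f) + int(f_bar)         # exactly one rail discharges every cycle
    return single_rail, dual_rail


ALL_INPUTS = list(product([False, True], repeat=3))  # all eight input combinations
IDLE = [(False, False, False)] * 4                   # four cycles with every input low

if __name__ == "__main__":
    print(discharges(ALL_INPUTS))  # (7, 8)
    print(discharges(IDLE))        # (0, 4)
```

Because exactly one of the two rails evaluates to true, the dual-rail pair discharges a node in every cycle, even when all inputs are idle and the single-rail gate would not switch at all.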
Selection circuits, including shifting circuits and multiplexors, are used extensively within computer systems. Some of these selection circuits require multiple levels of selection; for example, a first input is selected from a plurality of first inputs, each of which is itself selected from a plurality of second inputs. In computer systems employing dynamic logic, selection circuits for single-level and multilevel selection from many inputs may be difficult to implement, both because of the required precharge and evaluate times and because the outputs are not held during the precharge phase.
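Multilevel selection can be illustrated behaviorally with AND-OR multiplexers driven by one-hot select lines. This sketch assumes a hypothetical 16-input selector split into four groups of four: at the second level every group selects one of its second inputs in parallel, producing the first inputs, and the first level selects one of those. The names are illustrative, and the model ignores clocking entirely.

```python
from typing import Sequence


def onehot(index: int, width: int) -> list[bool]:
    """One-hot select vector with only line `index` active."""
    return [i == index for i in range(width)]


def onehot_mux(inputs: Sequence[bool], select: Sequence[bool]) -> bool:
    """AND-OR multiplexer: OR over i of (inputs[i] AND select[i])."""
    if len(inputs) != len(select) or sum(select) != 1:
        raise ValueError("select must be one-hot and match the number of inputs")
    return any(x and s for x, s in zip(inputs, select))


def two_level_mux(data: Sequence[bool], index: int, group_size: int = 4) -> bool:
    """Select data[index] using two levels of selection."""
    if len(data) % group_size != 0 or not 0 <= index < len(data):
        raise ValueError("data must split into whole groups and index must be in range")
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    group, offset = divmod(index, group_size)
    # Second level: every group selects its element at `offset` in parallel,
    # producing the first inputs.
    first_inputs = [onehot_mux(g, onehot(offset, group_size)) for g in groups]
    # First level: one of the first inputs is selected.
    return onehot_mux(first_inputs, onehot(group, len(groups)))


DATA = [bool((0xB38D >> i) & 1) for i in range(16)]  # arbitrary 16-bit test pattern

if __name__ == "__main__":
    print(all(two_level_mux(DATA, i) == DATA[i] for i in range(16)))  # True
```

In a dynamic implementation each level would need its own precharge and evaluate time, and the first-level result would be lost during precharge, which is the difficulty the paragraph points to.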
Limited switching dynamic logic (LSDL) circuits mitigate the switching factor of dynamic logic gates by adding static logic devices that isolate the dynamic node from the output node. Additionally, LSDL circuits and systems maintain the area advantage of dynamic logic over static circuits, and further provide both logic senses, that is, the output value and its complement.
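The effect of isolating the output from the dynamic node can be sketched by comparing output toggles with and without a hold during precharge. This is a behavioral abstraction only: it assumes the added static devices act as a clocked stage that updates the output during evaluate and holds it during precharge, and it does not model the actual LSDL transistor topology. The input waveform and names are hypothetical; in this model the complement output would simply be the negation of each traced value.

```python
def output_transitions(a_wave: list[bool], latched: bool) -> tuple[int, list[bool]]:
    """Count output toggles of a one-input dynamic OR stage.

    latched=False: domino-style, OUT follows NOT(dynamic node) in every phase.
    latched=True:  LSDL-style sketch, OUT is updated only during evaluate
                   and held during precharge.
    Even-numbered phases are precharge, odd-numbered phases are evaluate.
    """
    node, out, toggles, trace = True, False, 0, []
    for phase, a in enumerate(a_wave):
        evaluate = phase % 2 == 1
        if not evaluate:
            node = True   # precharge the dynamic node
        elif a:
            node = False  # active input discharges the node
        if evaluate or not latched:
            new_out = not node
            toggles += int(new_out != out)
            out = new_out
        trace.append(out)
    return toggles, trace


# Input a high for five phases (about 2.5 cycles), starting in a precharge phase.
A_WAVE = [False, False, True, True, True, True, True, False, False, False]

if __name__ == "__main__":
    print(output_transitions(A_WAVE, latched=False)[0])  # 4
    print(output_transitions(A_WAVE, latched=True)[0])   # 2
```

With the output held through precharge, the modeled output switches only when the evaluated value changes (once up, once down), instead of toggling on every clock phase while the input is high.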
Rapid performance increases and added functionality in microprocessors require register files with more entries that operate at higher speeds. Address decoding consumes more than 50% of the overall access time of these register files; improving address decoding speed is therefore a priority for continued improvement in microprocessor performance.
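For readers less familiar with register-file decoders, the sketch below shows one common organization, assumed here rather than taken from the text: generate true/complement pairs for the captured address bits, partially decode them in 2-bit groups, and select each word-line by ANDing one partial-decoder line from every group. It models a hypothetical 64-entry register file (6 address bits); all function names are illustrative.

```python
def true_complement(address: int, bits: int) -> list[tuple[bool, bool]]:
    """Address capture: (true, complement) pair for each address bit, LSB first."""
    pairs = []
    for i in range(bits):
        bit = bool((address >> i) & 1)
        pairs.append((bit, not bit))
    return pairs


def predecode(hi: tuple[bool, bool], lo: tuple[bool, bool]) -> list[bool]:
    """Partial decode of two address bits into four one-hot lines (values 0..3)."""
    (hi_t, hi_c), (lo_t, lo_c) = hi, lo
    return [hi_c and lo_c, hi_c and lo_t, hi_t and lo_c, hi_t and lo_t]


def decode(address: int, bits: int = 6) -> list[bool]:
    """Full decode to 2**bits word-lines via 2-bit partial decoding."""
    if bits % 2 != 0 or not 0 <= address < 2 ** bits:
        raise ValueError("bits must be even and address must fit in bits")
    tc = true_complement(address, bits)
    groups = [predecode(tc[i + 1], tc[i]) for i in range(0, bits, 2)]
    # Final decode: each word-line ANDs one partial-decoder line from every group.
    return [
        all(group[(wl >> (2 * g)) & 3] for g, group in enumerate(groups))
        for wl in range(2 ** bits)
    ]


if __name__ == "__main__":
    lines = decode(37)
    print(sum(lines), lines.index(True))  # 1 37
```

Each of these stages adds gate and wire delay on the critical path, which is why decoding takes such a large share of the access time.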
Using Partially-Depleted Silicon-On-Insulator (PDp-SOI) technology with pipeline operation and an exemplary operating frequency of about 8 GHz (a cycle time of roughly 120 ps), only one 120 ps clock cycle is allocated for capturing addresses, partial decoding, and selecting/de-selecting a word-line (WL). Two clock cycles are needed to read/write and to perform primary/secondary sensing, and driving the output data consumes an additional clock cycle. Scaling CMOS technology below 100 nm has continued to improve transistor performance, and PDp-SOI technology has achieved further improvement owing to its low junction capacitance and the absence of the “body effect.” However, interconnect performance has been degrading since feature sizes dropped below 0.5 µm. Smaller transistors have enabled more compact layouts and hence shorter interconnect wires, but wire pitch has also been reduced. This has resulted in higher RC time constants, even with the use of copper instead of aluminum and of low-k inter-layer dielectric materials. Even with a compact layout and line lengths optimized to reduce wire delay, at least 20 ps of timing margin is consumed by propagation delay in the word-lines and at least another 20 ps in the partial decoder lines. As a result, only about two thirds of the cycle (e.g., 80 ps) may be allotted for capturing addresses (true/complement generation), partial decoding, and selecting/de-selecting a word-line. Therefore, there is a need for circuitry that improves address decoder performance and thereby allows improvements in microprocessor performance.
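The cycle budget in the preceding paragraph can be checked with a few lines of arithmetic. The sketch assumes the “at least 20 ps” wire delay applies once to the word-lines and once to the partial decoder lines, uses those minimum values, and uses the 120 ps cycle time from the text; the dictionary keys are illustrative labels only.

```python
from fractions import Fraction

CYCLE_PS = 120  # cycle time used in the text; strictly, 1 / 8 GHz = 125 ps

# Minimum wire delays per cycle (assumed split: one word-line, one partial decoder line).
WIRE_DELAYS_PS = {"word_line": 20, "partial_decoder_lines": 20}

# Pipeline stages and the clock cycles each consumes, as described in the text.
STAGE_CYCLES = {"capture_decode_select": 1, "read_write_and_sensing": 2, "output_drive": 1}


def logic_budget_ps(cycle_ps: int, wire_delays: dict[str, int]) -> int:
    """Time left for address capture, partial decoding, and word-line selection."""
    return cycle_ps - sum(wire_delays.values())


budget = logic_budget_ps(CYCLE_PS, WIRE_DELAYS_PS)
fraction = Fraction(budget, CYCLE_PS)
latency_ps = CYCLE_PS * sum(STAGE_CYCLES.values())

if __name__ == "__main__":
    print(budget, fraction, latency_ps)  # 80 2/3 480
```

Under these assumptions the logic in the first stage has 80 ps, two thirds of the cycle, and a full access spans four cycles, or 480 ps.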