Increasing the performance of Integrated Circuits (ICs) is of paramount concern to support advances in technology. Many techniques are employed for increasing performance, ranging from new and improved fabrication technology to advances in design techniques. One design approach is that of partitioning a globally clocked synchronous logic circuit into disjoint subcircuits referred to as ‘phases.’ Each phase then utilizes a private clock distribution network with a substantially reduced clock frequency. However, overall performance does not degrade since the multiple subcircuits may all operate concurrently. This approach, referred to here as ‘multi-phase clocking,’ is a form of pipelining since it incorporates temporal concurrency. Pipelining is a well-known digital design performance enhancement technique whereby concurrency is exploited in a temporal sense as opposed to physical duplication of functional units [16].
While multi-phase clocking has the advantage of retaining high performance while relaxing timing constraints within each subcircuit phase domain, its significant disadvantage is that a separate clock distribution network must be provided for each phase clock instead of a single global clock distribution network. Such clock distribution networks are a well-known source of power dissipation and area overhead.
The chief concept utilized is that of a Multiple-Valued Logic (MVL) global clock signal. MVL concepts have been devised and used by circuit designers [1, 2, 3, 5] and also developers of Electronic Design Automation (EDA) tools for digital circuit simulation and synthesis. MVL circuitry is attractive for high-speed IC designs as reduction in chip area, increase in performance, and reduced power dissipation characteristics are major requirements for future ICs.
Clocking is an essential concept in the design of synchronous digital systems [10]. A synchronous system is comprised of storage elements and combinational logic that together make up a Finite State Machine (FSM) controller and a datapath. A typical clock signal has to be distributed to a large number of storage elements and hence it has the highest fan-out of any node in a typical digital design. As a result, a clock distribution system alone can consume 25-70% of the power budget of the IC chip [4, 17, 18, 19, 20]. Clocking in digital systems continues to gain importance since the clock frequency is increasing rapidly, approximately doubling every three years. The increase in clock uncertainties due to higher clock frequencies has made designing clock distributions in high-performance microprocessors and other ICs increasingly difficult. Hence distributed multi-phase clock systems can play a vital role in high-performance circuit designs where independent clock networks with lower frequency non-overlapping clock signals are physically distributed to disjoint subsets of clocked storage elements.
Many high-performance digital integrated circuits being produced today use multi-phase clock distribution systems [7,8,9]. The clock distribution networks used in multi-phase clock distribution systems require a significant amount of resources in terms of area since each clock phase requires an independent distribution network. Difficulties also arise in maintaining synchronization among the independent clock phases. Such high-performance digital integrated circuits typically use multi-phase clock distribution systems with level-sensitive latches as clocked storage elements. A set of N periodic non-overlapping binary clock signals propagate over each of the clock phase distribution networks and drive disjoint subsets of level-sensitive latches, providing enhanced throughput and performance. This performance enhancement comes at the cost of increased area since an individual distribution network is required for each clock phase.
Other current approaches in overcoming challenges in clocking include the use of reference-based distribution architectures [22] involving multi-tap distribution lines. Another approach modifies the binary clock signal to have a swing of one-half Vdd in an attempt to decrease power consumption [23]. The half-swing approach has some similarity to the method described here in that overall clock signal voltage amplitudes are modified.
Level-sensitive transparent latches [6] as state-holding elements provide high performance and low power consumption [13] as compared to flip-flops. Level-sensitive latches are attractive since they require fewer transistors to implement as compared to edge-sensitive storage devices. However, the transparent nature of latches increases the difficulty in meeting timing criteria as compared to the use of edge-sensitive circuits. Because timing constraints are considerably relaxed within each subcircuit of a multi-phased clocked logic design, latches are more easily used and are often the state-holding element of choice for these types of designs.
For example, FIG. 1 is a logic symbol for a level-sensitive latch or D-latch 100 in accordance with the prior art. A level-sensitive latch or D-latch 100 is a logic circuit which acts as a data storage element. A D-latch 100 has a data input signal (D), a gate/enable signal (EN), an output (Q) and an inverted output (Q′ or Q̅). The characteristic table for a binary D-latch 100 is shown in Table 1.
TABLE 1
D-Latch Characteristic Table
EN/CLK    D    Q        Q′
0         X    Qprev    Q′prev
1         0    0        1
1         1    1        0
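The behavior summarized in Table 1 can be illustrated with a minimal behavioral sketch. The class name `DLatch` and the method `step` are hypothetical names chosen for illustration only; the model ignores all circuit-level detail (transmission gate, keeper inverter, propagation delay) and simply captures the hold and transparent modes.

```python
from typing import List, Tuple


class DLatch:
    """Behavioral model of a level-sensitive (transparent) D-latch.

    Hypothetical illustration only: it models the characteristic table,
    not the CMOS circuit or its timing.
    """

    def __init__(self, initial_q: int = 0) -> None:
        self.q = initial_q  # stored state, corresponds to Qprev in Table 1

    def step(self, en: int, d: int) -> Tuple[int, int]:
        """Apply one (EN, D) input pair and return (Q, Q')."""
        if en == 1:
            # Transparent mode: the output follows the data input.
            self.q = d
        # When EN is 0 the latch holds its previous state (D is a don't-care).
        return self.q, 1 - self.q


latch = DLatch(initial_q=0)
stimulus: List[Tuple[int, int]] = [(1, 1), (0, 0), (1, 0), (0, 1)]
trace = [latch.step(en, d) for en, d in stimulus]
print(trace)
```

With this stimulus the latch stores 1, holds it while EN is 0, stores 0, and holds again, so each row of the characteristic table is exercised once.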
A typical CMOS voltage-mode D-latch circuit 100 can be implemented in a fashion as shown in FIG. 2 [11]. The Data input signal (D) is input to a transmission gate 200 controlled by the enable signal (EN). The EN input serves as the latch's gate input and is connected to the output of the modified literal selection gate. The output of the transmission gate 200 is connected to a latch comprised of two inverters 202a and 202b where the topmost inverter 202b serves as a keeper logic circuit.
A typical high-performance IC design 300 with multiple phase clock signal distribution networks 302_1, 302_2 and 302_N in accordance with the prior art is depicted in FIG. 3. An on-chip Phase Locked Loop (PLL) 304 receives the binary external clock input 306 to generate a stable high frequency global clock signal that is then input to a multi-phase generation circuit 308. The phase generation circuit 308 then produces each of the N individual clock phase signals Φ0, Φ1, . . . , ΦN that are in turn distributed to disjoint sets of sub-circuits 310_0, 310_1 and 310_N over corresponding Clock Distribution Tree (CDT) networks 312_0, 312_1 and 312_N.
The N multiple phase shifted clock signals are represented by Φ0, Φ1, . . . , ΦN. For example, a quaternary logic network will have four phase shifted clock signals, represented by Φ0, Φ1, Φ2 and Φ3. An example of the clock signal waveforms for the external global clock input 306 and the resulting four phase shifted clock signals Φ0, Φ1, Φ2 and Φ3 is shown in FIG. 4. The four multiple non-overlapping phase shifted clock signals, Φ0, Φ1, Φ2 and Φ3, propagate to disjoint sets of sub-circuits over corresponding Clock Distribution Tree (CDT) networks 312_0, 312_1 and 312_N. Each sub-circuit 310_0, 310_1 and 310_N is comprised of combinational logic along with sequential logic elements that are typically level-sensitive transparent latches.
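The relationship between the global clock and the phase clocks can be sketched as follows. This is an illustrative model under stated assumptions, not a description of the phase generation circuit 308 or of the waveforms in FIG. 4: it assumes each phase is high only during the high half of every N-th global clock cycle, and the function names `phase_clocks` and `non_overlapping` are hypothetical.

```python
from typing import List, Tuple


def phase_clocks(n_phases: int, n_cycles: int) -> Tuple[List[int], List[List[int]]]:
    """Sample a global clock and n_phases derived phase clocks.

    Assumption (for illustration only): two samples per global clock cycle
    (high half, then low half), and phase k is high during the high half
    of every global cycle whose index is congruent to k modulo n_phases.
    """
    global_clk: List[int] = []
    phases: List[List[int]] = [[] for _ in range(n_phases)]
    for sample in range(2 * n_cycles):
        g = 1 if sample % 2 == 0 else 0  # global clock level at this sample
        cycle = sample // 2              # index of the current global cycle
        global_clk.append(g)
        for k in range(n_phases):
            active = g == 1 and cycle % n_phases == k
            phases[k].append(1 if active else 0)
    return global_clk, phases


def non_overlapping(phases: List[List[int]]) -> bool:
    """Return True if at most one phase is high at every sample."""
    return all(sum(levels) <= 1 for levels in zip(*phases))


# Four-phase example: Phi0..Phi3 over eight global clock cycles.
global_clk, phases = phase_clocks(n_phases=4, n_cycles=8)
for k, waveform in enumerate(phases):
    print(f"Phi{k}: {''.join(str(v) for v in waveform)}")
print("non-overlapping:", non_overlapping(phases))
```

Under this assumption each phase pulses once every four global cycles, so every phase clock runs at one quarter of the global frequency while the four phases together still present one active pulse per global cycle.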
As is apparent from the foregoing discussion, multi-phase clock distribution systems require a significant amount of resources in terms of area and power dissipation. Accordingly, there is a need for a single clock distribution network for an MVL clock signal.