1. Field of the Invention
The present invention relates to clocking in synchronous integrated circuits in general and, more specifically, to clock network balancing to minimize clock skew.
2. Description of the Related Art
Modern integrated circuit design usually requires the use of sequential elements, which capture a function of their input and hold it at their output at a given change of state of a clock. Typical sequential elements include flip-flops and latches. Such storage is necessary for controlled timing of signals within an integrated circuit.
Clock networks are constructed to ensure correct timing between sequential elements. Clock networks deliver the clock signal from the source of the clock to the sequential elements through various types of logic. This logic can comprise clock gating, clock inverting, clock selecting, or clock delivery logic. Clock skew is defined as the difference between the arrival times of the clock signal at different sequential elements. Clock skew is problematic because it can cause aberrant behavior instead of the desired synchronous behavior. For example, consider two D flip-flops with the output of the first flip-flop tied to the input of the second flip-flop. At the clock edge, the value held at the output of the first flip-flop immediately prior to the clock edge should be transferred to the output of the second flip-flop. However, if the clock to the second flip-flop is delayed significantly relative to the clock to the first flip-flop, the input to the first flip-flop can propagate through the first flip-flop and appear on the input to the second flip-flop before the clock edge arrives at the second flip-flop. When this clock edge does arrive, the input of the first flip-flop is transferred to the output of the second flip-flop. After the clock edge, because of the clock skew, the output of the second flip-flop contains the input of the first flip-flop instead of the previous output of the first flip-flop. Clock networks that require low skew relative to each other are referred to as correlated clock networks.
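The two flip-flop example above can be illustrated with a minimal sketch. The function name, the clock-to-output delay parameter, and the numeric values are hypothetical and chosen only for illustration; setup and hold times are ignored.

```python
def second_ff_capture(d1, q1_old, skew, clk_to_q):
    """Return the value captured by the second flip-flop at its clock edge.

    d1       -- value at the input of the first flip-flop before the edge
    q1_old   -- value at the output of the first flip-flop before the edge
    skew     -- arrival time of the clock at the second flip-flop, measured
                from its arrival at the first flip-flop (hypothetical units)
    clk_to_q -- assumed clock-to-output delay of the first flip-flop
    """
    # The first flip-flop's output changes to d1 at time clk_to_q.
    # The second flip-flop samples its input at time skew.
    if skew > clk_to_q:
        return d1      # new data raced ahead of the late clock edge
    return q1_old      # desired synchronous behavior


# No skew: the second flip-flop captures the old output of the first.
correct = second_ff_capture(d1=1, q1_old=0, skew=0.0, clk_to_q=0.2)
# Large skew: the second flip-flop captures the input of the first.
aberrant = second_ff_capture(d1=1, q1_old=0, skew=0.5, clk_to_q=0.2)
```

In this simplified model the failure appears as soon as the skew exceeds the clock-to-output delay of the first flip-flop.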
Clock skew is caused predominantly by three mechanisms. The first is the variation in propagation delay due to interconnect differences between clock paths. Because the paths from the clock source to the various sequential elements differ, the interconnect along these paths differs, and the time for the clock signal to propagate to these elements differs due to the parasitic impedance per unit length. This interconnect variation is a function of the physical topology of the clock network. The second is the difference in load placed on the clock network by the input impedance of the sequential elements. Typically, each clock network is driven by an active element that is capable of sourcing only a fixed amount of current. Due to the finite current drive capability of the active element, the impedance seen by the active element at the clocked inputs of the sequential elements causes the clock to be delayed. This loading is referred to as fan out. A fan out of N indicates that the buffer is driving the equivalent of N simple gates. The third mechanism is the propagation delay through the active elements in the clock delivery paths. These include buffers, inverters, multiplexers, and clock delivery gates.
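The three mechanisms can be made concrete with a first-order delay estimate. The sketch below assumes a lumped driver resistance and an Elmore-style approximation for a uniform wire; this model, the function names, and all numeric values are assumptions for illustration and are not taken from the described method.

```python
def path_delay(gate_delay, drive_res, wire_len, r_per_len, c_per_len, load_cap):
    """First-order delay (seconds) of one driver, one wire, and one load."""
    wire_res = r_per_len * wire_len            # total wire resistance (ohms)
    wire_cap = c_per_len * wire_len            # total wire capacitance (farads)
    # Mechanism 3: intrinsic delay of the active element (gate_delay).
    # Mechanism 2: finite drive charging the wire and the sink load.
    driver_term = drive_res * (wire_cap + load_cap)
    # Mechanism 1: distributed wire resistance charging wire and load.
    wire_term = wire_res * (wire_cap / 2 + load_cap)
    return gate_delay + driver_term + wire_term


def skew(delay_a, delay_b):
    """Clock skew between two paths as the difference in arrival time."""
    return abs(delay_a - delay_b)


# Hypothetical values: 50 ps buffer, 1 kilohm drive, 0.1 ohm/um, 0.2 fF/um.
short_path = path_delay(50e-12, 1e3, 100.0, 0.1, 0.2e-15, 10e-15)
long_path = path_delay(50e-12, 1e3, 1000.0, 0.1, 0.2e-15, 10e-15)
heavy_load = path_delay(50e-12, 1e3, 100.0, 0.1, 0.2e-15, 80e-15)
```

Under these assumptions, lengthening the wire or increasing the load each delays the clock, so two paths that differ in either respect exhibit nonzero skew.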
The loading of all sequential elements in an integrated circuit is far too large to drive with a single buffer element. Because of this, the clock is divided into several clock networks. Typically, each network is driven by a separate buffer. The networks may be further divided into sub-networks until all sub-networks have manageable loading. In modern integrated circuit design, this clock network partitioning is done with automated tools. The goal of this partitioning is to meet the timing specifications between critical sequential networks. This process is referred to as clock network synthesis.
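The repeated division into sub-networks can be sketched as follows. The function name and the fixed per-buffer fan out limit are assumptions made for illustration; actual synthesis tools also take timing and placement into account.

```python
import math


def buffers_per_level(n_sinks, max_fanout):
    """Count the buffers at each level, from the sinks back to the source.

    n_sinks    -- number of sequential elements to be clocked
    max_fanout -- assumed largest load a single buffer can drive
    """
    if n_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fan out limit of 2 or more")
    levels = []
    loads = n_sinks
    # Keep dividing until a single buffer can drive what remains.
    while loads > max_fanout:
        loads = math.ceil(loads / max_fanout)
        levels.append(loads)
    levels.append(1)  # the buffer driven by the clock source
    return levels


tree = buffers_per_level(1000, 10)
```

For 1000 sinks and a limit of 10 loads per buffer, this yields 100 buffers driving the sinks, 10 buffers driving those, and one buffer at the clock source.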
Correlated clock networks are clock networks that require low clock skew relative to each other. This is usually because the outputs of the sequential elements on one clock network drive the inputs of the sequential elements on a second clock network. Additional constraints are imposed on the clock network synthesis process by the desire for low power operation, the use of inverted clocks, and testability. Low power operation often requires that clocks be gated off when not needed, adding logic gates into the clock network. The use of inverted clocks requires that inverters be inserted into the clock network. Finally, the requirement for testability of an integrated circuit often requires that a clock network can be driven by an alternative test clock, thus requiring a multiplexer to be inserted into the clock network. All of these logic elements add delays to the clock network. These delays must be compensated for in the other networks of a set of correlated clock networks. The present state of the art consists of inserting a single buffer in the other networks for each logic gate in the network under consideration, in an attempt to match the delay of that gate. This is done by specifying minimum delay parameters in the clock network synthesis tool, forcing the tool to insert buffers in almost all clock networks to equalize the delays in all networks. This process is illustrated in FIGS. 1 and 2. In FIG. 1, clock network 101 is directly connected to system clock 105. Clock network 102 is a selected clock, the selection being performed by multiplexer 106 and thus incurring some delay. Clock network 103 is a gated clock, the gating being performed by logic gate 107. Clock network 104 is an inverted clock, being inverted by inverter 108.
Thus, clock network 102 is delayed by multiplexer 106 relative to clock network 101, clock network 103 is delayed by multiplexer 106 and logic gate 107 relative to clock network 101, and clock network 104 is delayed by multiplexer 106, logic gate 107, and inverter 108 relative to clock network 101.
The present state of the art would correct for these delays as shown in FIG. 2. The object of this correction is to place the same number of active elements in all clock networks. The longest path occurs in clock network 204 and consists of three active elements. This requires that three buffers 208, 209, and 210 be added to clock network 101 to form clock network 201, that buffers 211 and 212 be added to clock network 102 to form clock network 202, and that buffer 213 be added to clock network 103 to form clock network 203. Clock network 204 is unchanged and remains the same as clock network 104.
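The buffer counts above follow from equalizing the number of active elements per path. A minimal sketch of that bookkeeping is given below; the dictionary layout and function name are hypothetical, while the element lists follow the description of FIG. 1.

```python
# Active elements on each clock path of FIG. 1, listed from the clock source.
PATHS = {
    101: [],
    102: ["multiplexer 106"],
    103: ["multiplexer 106", "logic gate 107"],
    104: ["multiplexer 106", "logic gate 107", "inverter 108"],
}


def buffers_to_add(paths):
    """Number of buffers each network needs to match the longest path."""
    longest = max(len(stages) for stages in paths.values())
    return {net: longest - len(stages) for net, stages in paths.items()}


padding = buffers_to_add(PATHS)
```

The result is three buffers for clock network 101, two for clock network 102, one for clock network 103, and none for clock network 104, which corresponds to FIG. 2.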
A significant problem with the method shown in FIG. 2 is that the delays in a clock network are due to both the propagation delay of the logic gates and the interconnect delay of the clock path. The buffers do not correct for the wire loading in any way and therefore do not exactly compensate for the delays in the other networks. Further, there is no attempt to match path lengths, which further increases clock skew. Finally, the buffers present different input loading and output drive than the active elements the buffers are supposed to emulate.
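A short numeric sketch shows why equal element counts do not imply equal delay. The per-element delays below are hypothetical values in picoseconds, not figures from any described circuit, and interconnect delay is omitted entirely.

```python
# Hypothetical propagation delays in picoseconds (assumed for illustration).
DELAY_PS = {"buffer": 50, "multiplexer": 90, "logic gate": 70, "inverter": 40}


def gate_delay_ps(stages):
    """Sum the assumed propagation delays of the active elements on a path."""
    return sum(DELAY_PS[stage] for stage in stages)


# Clock network 201: three inserted buffers.
path_201 = ["buffer", "buffer", "buffer"]
# Clock network 204: multiplexer, logic gate, and inverter.
path_204 = ["multiplexer", "logic gate", "inverter"]

# Both paths contain three active elements, yet their delays differ.
residual_skew_ps = gate_delay_ps(path_204) - gate_delay_ps(path_201)
```

With these assumed values the two three-element paths still differ by 50 ps before any wire delay or loading difference is considered.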
Typical clock synthesis tools produce a clock network that delivers the proper clock signal to all sequential elements. FIG. 3 shows two correlated clock networks before clock network synthesis. Clock network 301 has a fanout of eight while clock network 302 has a fanout of three, so clock network 301 has significantly more fanout than clock network 302. In addition, the path lengths and the topologies of the two networks are very different. A clock network synthesis tool might break clock network 301 into two smaller clock networks. The result of this clock network synthesis is shown schematically in FIG. 4. This will nearly correct for differences in clock skew due to loading, but will have no effect on delays due to wire lengths. In addition, buffers 404 and 405 see very different fanouts.
As described above, there are a number of different dilemmas and considerations to take into account for correlated clock networks. What is needed is a clock network synthesis method that matches the active elements in correlated clock networks, matches the loading of active nodes in correlated clock networks, and matches the physical topology of correlated clock networks.