The present invention pertains generally to integrated circuits, and more particularly, to a method and apparatus for minimizing clock skew in a balanced tree when interfacing to an unbalanced load.
Clock networks on CMOS integrated circuits have long been a source of difficulty for integrated circuit designers because of the importance of minimizing skew between clock inputs. A typical integrated circuit includes a clock tree which distributes one or more clock signals throughout the chip to clocked elements. A primary goal of a clock tree is to minimize clock skew between clocked elements. When all clocked elements are driven from a single net, as with a clock spine, skew is caused by differing interconnect lengths and loads. If the clock driver delay is much larger than the interconnect delays, a clock spine achieves minimum skew, but with long latency. Clock skew represents a fraction of the clock period that cannot be used for computation. A clock skew of 500 ps with a 200 MHz clock means that 500 ps of every 5 ns clock cycle, or 10 percent of the performance, is wasted. That is, clock skew may reduce the time allowed for certain logic paths within the design, and thus may reduce the performance of the design. Thus, for high performance designs that have strict timing requirements, it is often critical to minimize clock skew.
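The skew arithmetic above can be checked with a small helper. This is a minimal sketch; the function name `skew_fraction` and its units (picoseconds and megahertz) are illustrative choices, not part of any tool.

```python
def skew_fraction(skew_ps: float, clock_mhz: float) -> float:
    """Fraction of each clock cycle lost to skew."""
    # Period in ps = 1e12 / (clock_mhz * 1e6) = 1e6 / clock_mhz.
    period_ps = 1e6 / clock_mhz
    return skew_ps / period_ps

# 500 ps of skew on a 200 MHz clock (5 ns period).
print(f"{skew_fraction(500, 200):.0%}")  # 10%
```

The same 500 ps of skew costs only 5 percent of a 100 MHz cycle, which is why skew matters most in high-frequency designs.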
To minimize clock skew, typical clock trees include a number of clock drivers that are symmetrically and evenly placed on the integrated circuit die, and the delays through the tree must be carefully balanced. There may be a number of first level drivers, which may receive a clock signal from an input buffer and may be placed near the center of the integrated circuit. Each of the first level drivers may drive a number of second level drivers. Typically, each of the first level drivers will drive the same number of second level drivers, which is intended to keep the loads on the first level drivers matched. The second level drivers may be symmetrically and evenly placed on the integrated circuit die.
A typical clock tree may include a number of levels of clock drivers. The number of clock drivers in the last level is typically sufficient to drive all of the clock loads within the design. Like all other levels, the last level of clock drivers is typically placed symmetrically and evenly throughout the integrated circuit die.
In many cases, all of the clock drivers are pre-placed on the integrated circuit die. This allows the clock drivers to be placed at any desired location on the die, so that the clock tree can be evenly distributed and balanced. The routing between clock drivers may also be pre-placed and balanced.
Designing and constructing a balanced clock tree is often a time-consuming task requiring significant design resources. Therefore, it is common for only one “worst case” clock tree to be designed. The “worst case” clock tree may then be used in each integrated circuit within a system while still maintaining an acceptable clock skew.
After the “worst case” clock tree is designed and pre-placed, the circuit designer may use a placement tool to manually place selected regions or cells of the circuit design. Thereafter, an automatic place and route tool may be used to place the remaining cells and route the design according to the overall design specifications.
The above clock tree generation scheme has a number of limitations, some of which are described below. First, each of the clock drivers in the last level of the clock tree may have a limited drive capability, and thus may only drive a limited number of clock loads (e.g. registers, flip-flops, etc.). To use the same clock tree for multiple integrated circuit designs, as described above, the clock tree may have to be designed to accommodate the number of clock loads in the “worst case” integrated circuit design. Because the same “worst case” clock tree may be used for all integrated circuits within the system, many of the integrated circuits may be populated with more clock drivers than are actually required. This is especially limiting when the number of required clock drivers varies dramatically between circuit designs. These extra clock drivers may consume die area and power that could otherwise be used to implement the logical design.
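To make the cost of a shared “worst case” tree concrete, the following sketch counts the last-level drivers each design actually needs and the excess left over when every design inherits the largest requirement. It assumes, hypothetically, that each last-level driver can drive a fixed number of clock loads; the design names and load counts are invented for illustration.

```python
import math

def drivers_required(clock_loads, loads_per_driver):
    # Each last-level driver can drive at most loads_per_driver clock loads.
    return math.ceil(clock_loads / loads_per_driver)

def excess_drivers(loads_by_design, loads_per_driver):
    """Drivers left unused in each design when all share the worst case tree."""
    required = {name: drivers_required(n, loads_per_driver)
                for name, n in loads_by_design.items()}
    worst = max(required.values())  # the shared tree is sized for this design
    return {name: worst - need for name, need in required.items()}

# Hypothetical designs and load counts; 40 loads per last-level driver.
designs = {"asic_a": 12_000, "asic_b": 3_000, "asic_c": 500}
print(excess_drivers(designs, 40))  # {'asic_a': 0, 'asic_b': 225, 'asic_c': 287}
```

The wider the spread in load counts between designs, the more of the shared tree goes unused in the smaller designs.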
FIG. 1 is a diagram illustrating a reduction in the effective clock period between registers caused by clock skew. An illustrative timing path is shown at 10, and a timing diagram therefor is shown at 30. The timing path includes a first rising edge triggered register 22a receiving data Da from a first input/output pad 20a, and a second rising edge triggered register 22b receiving data Db from a second input/output pad. The first register 22a is clocked by a first clock signal CLKa and the second register 22b is clocked by a second clock signal CLKb.
With reference to the timing diagram 30, the input clock CLK is shown at 24. The first clock signal CLKa and the second clock signal CLKb are generated from the input clock signal CLK 24 via a clock tree or the like. The timing diagram 30 shows that the first clock signal CLKa is skewed relative to the second clock signal CLKb, as shown by tskew. This clock skew tskew may be caused by an improperly designed clock tree.
On the rising edge of the first clock signal CLKa, the first register 22a may release data Qa, captured from the logic-in signal Da. On the rising edge of the second clock signal CLKb, the second register 22b may release data Qb, captured from the logic-in signal Db. When the subsequent logic (not shown) is designed to receive and use the latched data Qa and Qb simultaneously, the clock skew tskew is clearly problematic.
Because of the clock skew tskew between the first and second clock signals CLKa and CLKb, the effective clock period Teff between the rising edge of the first clock signal CLKa and the subsequent rising edge of the second clock signal CLKb is less than the clock period Tperiod. This reduces the time allowed for the data to pass through subsequent logic before the next incoming data arrives, and thus may reduce the performance of the logic path.
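The reduction can be written as Teff = Tperiod − tskew when the capturing clock edge arrives tskew early. A minimal setup-check sketch under that assumption follows; the parameter names (clock-to-Q delay, maximum logic delay, setup time) and the picosecond values in the usage lines are hypothetical.

```python
def effective_period(t_period, t_skew):
    # Teff = Tperiod - tskew: the capturing edge arrives t_skew early.
    return t_period - t_skew

def meets_setup(t_period, t_skew, t_clk_to_q, t_logic_max, t_setup):
    """Setup check (all values in ps): data launched on a CLKa edge must
    arrive t_setup before the next CLKb edge."""
    return t_clk_to_q + t_logic_max + t_setup <= effective_period(t_period, t_skew)

# Hypothetical path: 5 ns period (200 MHz), 500 ps of skew.
print(effective_period(5000, 500))              # 4500
print(meets_setup(5000, 500, 300, 4100, 200))   # False: 4600 > 4500
print(meets_setup(5000, 0, 300, 4100, 200))     # True: 4600 <= 5000
```

The same logic path passes with an ideal clock and fails once skew eats into the cycle.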
For the above reasons, a primary goal of a clock tree is to minimize clock skew between clocked elements. As shown above, clock skew may reduce the effective clock period for certain logic paths within the design, and thus may reduce the performance of the design. For high performance designs that have strict timing requirements, clock skew may consume a substantial portion of the total clock period.
Clock skew may have a number of other detrimental effects on the performance of a circuit design, only some of which are described below. For example, clock skew may cause hold time violations when only a small amount of logic is provided between registers. Further, clock skew may cause communication problems between integrated circuits. It should be recognized that these are only illustrative examples of effects that clock skew may have on a system.
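The hold time case can be sketched the same way. Here, as an assumption, skew is measured as how much later the clock edge reaches the capturing register than the launching register; the parameter names and picosecond values are hypothetical.

```python
def meets_hold(t_skew, t_clk_to_q, t_logic_min, t_hold):
    """Hold check for a launch register feeding a capture register (all in ps).

    t_skew is how much later the clock edge reaches the capturing register
    than the launching register. Data launched on an edge must not reach the
    capturing register until t_hold after that same edge arrives there.
    """
    return t_clk_to_q + t_logic_min >= t_skew + t_hold

print(meets_hold(0, 300, 100, 150))    # True: 400 >= 150
print(meets_hold(500, 300, 100, 150))  # False: 400 < 650
print(meets_hold(500, 300, 600, 150))  # True: more logic absorbs the skew
```

Unlike a setup failure, a hold failure cannot be fixed by slowing the clock, which is why short register-to-register paths are sensitive to skew.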
FIG. 2 is a schematic diagram illustrating a typical prior art clock tree. As indicated above, each integrated circuit typically includes a clock tree, which may distribute one or more clock signals throughout the design with the primary goal of minimizing clock skew between clocked elements.
Referring to FIG. 2, balanced clock trees include a number of clock drivers that are symmetrically and evenly placed on the integrated circuit die. An integrated circuit die is generally shown at 50. There may be a number of first level drivers 55, which may receive a clock signal from an input buffer (not shown) and may be placed near the center of the integrated circuit. Each of the first level drivers may drive a number of second level drivers 56. Typically, each of the first level drivers 55 drives the same number of second level drivers 56 as every other first level driver, which may keep the loads on the first level drivers matched. The second level drivers may be symmetrically and evenly placed on the integrated circuit die, as shown. Symmetrical placement is typically used to distribute the clock signal evenly throughout the design to minimize clock skew between clocked elements.
Although the illustrative embodiment shown in FIG. 2 only shows two levels of clock drivers, it is recognized that a typical clock tree may include a number of levels of clock drivers, such that the number of clock drivers in the last level is sufficient to drive all of the clock loads within the design. Like the other levels, the clock drivers in the last level are typically placed symmetrically and evenly throughout the integrated circuit die.
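As a rough sketch of how the number of levels follows from the load count, the helper below grows a uniform-fanout tree until its last level has enough drivers. It assumes, hypothetically, that every driver at one level drives the same number (`fanout`) of drivers at the next level and that each last-level driver can handle `loads_per_driver` clock loads; real trees are also constrained by placement and routing.

```python
import math

def plan_tree(clock_loads, loads_per_driver, first_level_drivers, fanout):
    """Return (levels, last_level_drivers) for a hypothetical uniform-fanout tree."""
    if fanout < 2 or first_level_drivers < 1:
        raise ValueError("need fanout >= 2 and at least one first level driver")
    needed = math.ceil(clock_loads / loads_per_driver)  # last-level drivers required
    levels, drivers = 1, first_level_drivers
    while drivers < needed:
        drivers *= fanout  # each driver feeds `fanout` drivers at the next level
        levels += 1
    return levels, drivers

# 10,000 clock loads, 50 loads per driver, 4 first level drivers, fanout of 4.
print(plan_tree(10_000, 50, 4, 4))  # (4, 256)
```

Because the tree grows geometrically, the last level usually overshoots the requirement (256 drivers where 200 would do), another source of the unused drivers described above.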
In many cases, all of the clock drivers are pre-placed on the integrated circuit die prior to placement of functional logic blocks. For example, the first level clock drivers 55 and the second level clock drivers (e.g. clock driver 56) may be pre-placed on the integrated circuit die. This may allow the clock drivers to be placed at any desired location on the die without concern for avoiding the placement locations of other cells, so that the clock tree can be evenly distributed and balanced. The routing between clock drivers may also be pre-placed and balanced.
For the integrated circuit shown at 50, a functional logic block 60 may be placed in the lower-right quadrant, as shown. The functional logic block 60 may not provide any clock loads to the clock tree, as it may be an asynchronous device. Thus, for the clock tree scheme shown in FIG. 2, all of the second level clock drivers that are pre-placed in the lower-right quadrant 54 of the integrated circuit die 50 may not be required, and the die area and power consumed by those clock drivers may be wasted.
Second, because the clock tree shown is symmetrically and evenly distributed throughout the integrated circuit die 50, the circuit designer typically must consider the number of clock loads placed in a given region.
As process generations have advanced, the resistance and capacitance of clock tree routing have become a significant contributor to overall clock skew. Without a well designed balanced clock tree, designers must allocate a larger percentage of the chip real estate budgeted for the clock tree to accommodate uncertainty and mismatch in clock edge arrival.
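The growing weight of routing RC can be illustrated with the Elmore delay of a wire modeled as a ladder of RC sections. For a long uniform line this approaches RC/2, and because both R and C scale with length, the delay grows with the square of wire length. This is a first-order sketch; the per-micron resistance and capacitance values are hypothetical and not tied to any process.

```python
def wire_elmore_delay(r_per_um, c_per_um, length_um, segments=1000):
    """Elmore delay (s) at the far end of a wire modeled as an RC ladder.

    Each section has resistance r_seg followed by capacitance c_seg to ground.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    # Elmore: each section resistance times all capacitance downstream of it.
    # Section i (0-based) has segments - i capacitors at or beyond its far end.
    return sum(r_seg * (segments - i) * c_seg for i in range(segments))

# Hypothetical wire parameters: 0.1 ohm/um and 0.2 fF/um.
for length in (2500, 5000, 10000):
    ps = wire_elmore_delay(0.1, 0.2e-15, length) * 1e12
    print(f"{length:>6} um: {ps:7.1f} ps")
```

Each doubling of length roughly quadruples the delay (about 250 ps at 5000 um with these values), so long clock routes quickly dominate the skew budget.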
Designing a balanced clock tree is problematic when faced with the physical constraints of the integrated circuit. The actual physical layout often presents an unbalanced set of clock inputs (an unbalanced load) to the balanced clock tree. This can result in an undesirable increase in clock skew between branches of the clock tree.
The problem of unbalanced loading of clock inputs has previously been solved by physically duplicating the metal routing of a used clock branch in a branch without clock inputs. This technique reduces clock skew by presenting a balanced load network to the clock tree. Unfortunately, it consumes additional routing resources, adds complexity to the physical artwork layout, and is limited in application to areas of the integrated circuit where space is available.
The present invention is a method and apparatus for minimizing clock skew in a balanced tree when interfacing to an unbalanced load. The present invention overcomes the physical limitations and drawbacks of the prior art by creating a physically balanced load, preferably with a loading equivalent circuit comprising an RC circuit that has been modeled to match the performance of the actual metal route. The loading equivalent circuit enables the designer to minimize clock skew by tuning the circuit after the clock tree has been designed.
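The effect of balancing the load can be illustrated with a first-order Elmore model of a small RC tree. In this sketch a driver feeds two balanced branches A and B with equal clock-input loads, but a metal route (modeled as a four-section ladder) hangs off A only. It compares the unbalanced tree, the prior-art duplicated route, and a lumped equivalent. All component values are hypothetical, and the single series-R-to-grounded-C topology of the equivalent is an assumption; the source does not specify the circuit at this level of detail.

```python
# Each node: (name, resistance from parent in ohms, capacitance in fF, children).
# Elmore delays therefore come out in femtoseconds (ohm x fF).

def down_cap(node):
    """Total capacitance at and below a node."""
    _, _, c, kids = node
    return c + sum(down_cap(k) for k in kids)

def elmore(node, upstream=0.0, out=None):
    """Map node name -> Elmore delay (fs) from the root driver."""
    out = {} if out is None else out
    name, r, _, kids = node
    delay = upstream + r * down_cap(node)  # this resistor charges everything below it
    out[name] = delay
    for kid in kids:
        elmore(kid, delay, out)
    return out

def ladder(prefix, segments, r_total, c_total):
    """Distributed route as a chain of equal RC sections; returns the first section."""
    node = None
    for i in reversed(range(segments)):
        kids = [] if node is None else [node]
        node = (f"{prefix}{i}", r_total / segments, c_total / segments, kids)
    return node

def tree(extra_on_b):
    # Hypothetical: 50 fF of clock inputs at A and at B, plus a 300-ohm /
    # 400-fF route (including its own clock inputs) hanging off A only.
    a = ("A", 200.0, 50.0, [ladder("routeA", 4, 300.0, 400.0)])
    b = ("B", 200.0, 50.0, extra_on_b)
    return ("driver", 100.0, 0.0, [a, b])

cases = {
    "unbalanced": [],
    "duplicated route": [ladder("routeB", 4, 300.0, 400.0)],
    "lumped RC equivalent": [("equiv", 300.0, 400.0, [])],
}
for label, extra in cases.items():
    d = elmore(tree(extra))
    print(f"{label:>22}: skew A-B = {(d['A'] - d['B']) / 1000:.1f} ps")
```

With these values the unbalanced tree shows 80 ps of skew between A and B, and both the duplicated route and the lumped equivalent bring it to zero. Under the Elmore model the delay at B depends only on the total capacitance below B, so a compact RC matches the full duplicated route at that node; the equivalent's resistor matters for higher-order effects such as resistive shielding and slew, which this first-order sketch does not capture.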
The loading equivalent circuit of the invention can effectively reduce clock skew even when it matches the actual clock branch imprecisely. Simulations reveal that even 20 percent variations in component values do not compromise the significant improvement in clock skew obtained when the loading equivalent circuit is used in a clock tree.
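A first-order reason for this tolerance can be sketched with the Elmore model: for two sibling branches with equal resistance from a common node, the shared upstream delay cancels, and the skew reduces to the branch resistance times the difference in downstream capacitance. The sketch below sweeps the equivalent circuit's capacitance by ±20 percent using hypothetical values; it illustrates the trend only and does not reproduce the simulations referred to above.

```python
def sibling_skew_ps(r_branch_ohm, c_down_a_ff, c_down_b_ff):
    # Elmore: the shared upstream term cancels, leaving R_branch * (C_A - C_B).
    return r_branch_ohm * (c_down_a_ff - c_down_b_ff) / 1000.0  # ohm*fF = fs -> ps

R_BRANCH = 200.0  # hypothetical resistance of each balanced branch, ohms
C_LEAF = 50.0     # hypothetical clock-input load at each leaf, fF
C_ROUTE = 400.0   # hypothetical route plus clock inputs hanging off leaf A, fF

c_a = C_LEAF + C_ROUTE
print(f"no equivalent: {sibling_skew_ps(R_BRANCH, c_a, C_LEAF):+.1f} ps")
for scale in (0.8, 0.9, 1.0, 1.1, 1.2):
    c_b = C_LEAF + scale * C_ROUTE  # equivalent circuit's capacitance off by up to 20%
    print(f"equivalent at {scale:.0%}: {sibling_skew_ps(R_BRANCH, c_a, c_b):+.1f} ps")
```

In this toy case the residual skew scales linearly with the capacitance error: a ±20 percent mismatch leaves ±16 ps of the original 80 ps, still an 80 percent reduction.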
The loading equivalent circuit of the invention can be used in space limited clock routing without consuming significant routing resources or adding unnecessary complexity. The loading equivalent circuit can be built on the integrated circuit from a resistance and capacitance network. This network can be formed from a combination of metal traces and parasitic capacitances, poly resistors and poly capacitors, or FETs.
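For the poly resistor and poly capacitor option, a first-order sizing sketch converts target R and C values into layout dimensions using R = R_sheet · L / W and C = C_density · area. The sheet resistance, capacitance density, and target values below are hypothetical, and contact resistance, end effects, and fringing capacitance are ignored.

```python
import math

def size_poly_resistor(r_target_ohm, sheet_ohm_per_sq, width_um):
    """Length in um so that R = R_sheet * L / W hits the target."""
    squares = r_target_ohm / sheet_ohm_per_sq  # number of squares needed
    return squares * width_um

def size_square_capacitor(c_target_ff, density_ff_per_um2):
    """Side length in um of a square plate with C = density * area."""
    return math.sqrt(c_target_ff / density_ff_per_um2)

# Hypothetical targets from an extracted route: 300 ohm and 400 fF.
# Hypothetical process values: 200 ohm/sq poly, 1 fF/um^2 poly capacitor.
length = size_poly_resistor(300.0, 200.0, 1.0)
side = size_square_capacitor(400.0, 1.0)
print(f"resistor: 1.0 um wide x {length:.2f} um long")
print(f"capacitor: {side:.1f} um x {side:.1f} um")
```

A real implementation would take the target values from extraction of the actual route and the process constants from the foundry's design rules.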