Advances in semiconductor technology presently make it possible to integrate large-scale systems, including hundreds of millions of transistors, onto a single semiconductor chip. Integrating such large-scale systems onto a single semiconductor chip increases the speed at which such systems can operate, because signals between system components do not have to cross chip boundaries, and are not subject to lengthy chip-to-chip propagation delays.
The speed of a system on an integrated circuit (IC) chip is largely determined by the system clock frequency. In a typical synchronous IC chip, a clock distribution network is used to distribute a clock signal from a common source to various circuit components. This clock signal is used to coordinate data transfers between circuit components. The arrival of the clock signal at the sequential components (typically registers or latches) triggers the launching and the capturing of the digital signal before and after logical computation. As increasing clock frequencies reduce clock periods to fractions of a nanosecond, designing clock distribution networks is becoming increasingly challenging. A direct result of the decreasing clock period is a shrinking “timing budget” between logically coupled clock sinks. This shrinking timing budget requires clock networks to be implemented with precision so that the skew of the clock arrival times at sequential components is minimized.
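The relationship between clock arrival times, skew, and the timing budget can be illustrated with a minimal sketch. The function names, the picosecond units, and the simplified setup check (which ignores clock-to-output delay and hold constraints) are assumptions made for illustration only and are not taken from any particular tool.

```python
def clock_skew(arrival_times_ps):
    """Global skew: spread between the latest and earliest clock arrival.

    `arrival_times_ps` maps a (hypothetical) sink name to the time, in
    picoseconds, at which the clock edge reaches that sink.
    """
    times = list(arrival_times_ps.values())
    return max(times) - min(times)


def setup_slack(period_ps, launch_arrival_ps, capture_arrival_ps,
                logic_delay_ps, setup_ps=0):
    """Simplified timing budget left on one launch-to-capture path.

    Data launched at `launch_arrival_ps` must settle before the next
    clock edge reaches the capturing sink. A capture clock that arrives
    earlier than the launch clock consumes part of the clock period.
    """
    return (period_ps
            + (capture_arrival_ps - launch_arrival_ps)
            - logic_delay_ps
            - setup_ps)
```

Under these assumptions, with a 500 ps period and 450 ps of logic delay, a capture clock that arrives 15 ps before the launch clock leaves 35 ps of slack, whereas zero skew would leave 50 ps. The shorter the clock period, the larger the share of the budget that a fixed amount of skew consumes.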
The primary goal of clock network synthesis is to minimize skew. In addition, total insertion delay, clock buffer delay, and power consumption are important metrics of quality. Traditionally, clock network synthesis tools employ local optimization methods based on greedy algorithms to balance the skew across the clock sinks. These conventional tools primarily address clock networks with a tree-like topology; clock network synthesis is therefore commonly referred to as clock tree synthesis. With these conventional tools, complex clock systems are divided into tree-like partitions based on heuristics that simplify the conditions under which the clock network operates. These tools typically involve a top-down or bottom-up traversal of the clock network. During the traversal, as each edge is visited, the tool decides whether a buffer or a pair of inverters can be inserted on that edge. These traditional methods tend to insert as many buffers as early as possible based entirely on local information. They are not aware of potential conflicts between different clocks in different timing modes. They are not able to handle the delay differences among a large number of corners. They are not aware of the physical context of the clock network, such as floorplan limitations, congestion, and routability.
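The traversal-based approach described above can be sketched as follows. This is a toy model rather than the algorithm of any actual tool: the `Node` and `Edge` classes, the quadratic wire-delay coefficient, the fixed buffer delay, and the midpoint buffering rule are all hypothetical and chosen only to show a decision made from per-edge information.

```python
from dataclasses import dataclass, field

WIRE_K_PS = 1.0        # assumed: wire delay = WIRE_K_PS * length**2
BUFFER_DELAY_PS = 5.0  # assumed: fixed delay added by one buffer


@dataclass
class Edge:
    length: int              # wire length in arbitrary units
    child: "Node"
    buffered: bool = False


@dataclass
class Node:
    name: str
    edges: list = field(default_factory=list)  # outgoing edges; empty for a sink


def edge_delay(edge):
    wire = WIRE_K_PS * edge.length ** 2
    if edge.buffered:
        # A midpoint buffer splits the wire into two half-length
        # segments, halving the quadratic term, and adds its own delay.
        return wire / 2 + BUFFER_DELAY_PS
    return wire


def greedy_insert_buffers(node):
    """Top-down traversal: buffer an edge whenever that edge alone gets faster.

    The decision never looks at other branches or at sink arrival times.
    """
    for edge in node.edges:
        unbuffered = WIRE_K_PS * edge.length ** 2
        if unbuffered / 2 + BUFFER_DELAY_PS < unbuffered:
            edge.buffered = True
        greedy_insert_buffers(edge.child)


def sink_arrivals(node, t=0.0, out=None):
    """Accumulate edge delays from the root down to every sink."""
    if out is None:
        out = {}
    if not node.edges:
        out[node.name] = t
    for edge in node.edges:
        sink_arrivals(edge.child, t + edge_delay(edge), out)
    return out


def build_example():
    """Two sinks that are balanced before buffering (16 ps each)."""
    s1, s2 = Node("s1"), Node("s2")
    n3 = Node("n3", [Edge(2, s2)])
    n2 = Node("n2", [Edge(2, n3)])
    n1 = Node("n1", [Edge(2, n2)])
    return Node("root", [Edge(4, s1), Edge(2, n1)])
```

In this toy case both sinks start at 16 ps. The greedy pass buffers only the long edge to `s1`, which is faster for that edge in isolation, and the arrivals become 13 ps and 16 ps. A locally beneficial decision has introduced 3 ps of skew into a tree that was balanced, which is the kind of outcome a purely local method cannot anticipate.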
The existing clock tree synthesis methods require substantial human intervention. They require users to define and partition the clock system so that the tools can handle it properly. The existing clock tree synthesis methods cannot determine whether a target skew is achievable until the clock network is fully synthesized, placed, and routed. When the target skew is not achieved, these tools cannot determine whether the failure is due to a design mistake or a tool limitation. Even if the skew target is achieved at the block level, when the IC blocks are assembled at the top level, they often cannot be balanced, and the only remedy is to open up some of the blocks and redesign their clock systems. Furthermore, known clock skew balancing tools can produce unroutable or infeasible skew balancing solutions, especially for designs with complex floorplans and narrow channels.