In synchronous circuits, a clock tree is generally used to distribute a common clock signal to many sequential elements, such as flip flops, latches, and memories, so that the sequential elements have uniform timing. Manually designed structures, such as H trees and meshes, and tool-generated balanced buffer trees are widely used in clock trees to distribute the clock signal.
Differences in the arrival time of the clock signal at two or more clocked elements can result in errors in the synchronous system. Clock skew is the difference between the times at which the clock signal arrives at different clock-receiving units such as flip flops. The clock skew can cause errors in the synchronous system because the clocked elements are triggered at different points in time. One such error is called a hold violation, which occurs when the clock skew between two sequentially connected flip flops is greater than the data propagation delay from the first flip flop to the second flip flop. In that case, data at the first flip flop output races through the second flip flop, bypassing a full clock cycle.
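The hold condition described above can be illustrated with a minimal sketch. The function names and the picosecond values are hypothetical, and the check uses only the simplified condition stated in the text (skew greater than data propagation delay), ignoring any hold-time requirement of the capturing flip flop.

```python
def clock_skew_ps(launch_arrival_ps: int, capture_arrival_ps: int) -> int:
    """Skew seen by a launch/capture pair: how much later the clock
    reaches the second (capturing) flip flop than the first (launching) one."""
    return capture_arrival_ps - launch_arrival_ps


def has_hold_violation(launch_arrival_ps: int,
                       capture_arrival_ps: int,
                       data_delay_ps: int) -> bool:
    """Simplified check from the text: a hold violation occurs when the
    skew exceeds the data propagation delay between the two flip flops."""
    return clock_skew_ps(launch_arrival_ps, capture_arrival_ps) > data_delay_ps


# Hypothetical numbers: the clock reaches the second flip flop 500 ps late,
# but new data needs only 300 ps to get there, so it races through.
racing = has_hold_violation(1000, 1500, 300)

# With only 100 ps of skew, the same 300 ps data path is safe.
safe = has_hold_violation(1000, 1100, 300)
```

Here `racing` evaluates to true and `safe` to false under the assumed values.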
Clock skew is a major obstacle in high-speed circuit design. Conventionally, to reduce the clock skew in ASIC implementations, clock-tree synthesis (CTS) tools, such as Synopsys' CTS tool, were used to balance the clock tree. The CTS tools insert and adjust buffers along the paths to the different leaf points of the clock tree, so that the clock signal arrives at the leaves of the clock tree at substantially the same time. The CTS tools balance the clock tree under the constraint that the clock insertion delay, defined as the time elapsed from the clock arrival at the root of the tree to the arrival at the leaf points, does not exceed a certain budget. This is because a long insertion delay leads to undesirably high clock tree power consumption. Also, clock trees with long insertion delays are more susceptible to on-chip variation (OCV) from manufacturing, which injects uncertainty into the clock insertion delay and exacerbates the clock skew problem. The clock insertion delay budget, in general, limits the amount of clock skew that CTS tools are able to remove.
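As a rough illustration of balancing under an insertion delay budget, the sketch below pads every leaf path up to the delay of the slowest leaf and rejects the result if that delay exceeds the budget. This is a simplified model under stated assumptions, not how any particular CTS tool works: the function names, leaf names, and delay values are all hypothetical, and real tools insert discrete buffers rather than arbitrary amounts of delay.

```python
def skew_ps(leaf_delay_ps: dict[str, int]) -> int:
    """Skew of a tree: spread between the latest and earliest leaf arrival."""
    delays = leaf_delay_ps.values()
    return max(delays) - min(delays)


def balance_clock_tree(leaf_delay_ps: dict[str, int],
                       budget_ps: int) -> dict[str, int]:
    """Return the extra delay to add on each root-to-leaf path so that all
    leaves match the slowest one. Raises if the resulting insertion delay
    would exceed the budget."""
    insertion_delay = max(leaf_delay_ps.values())
    if insertion_delay > budget_ps:
        raise ValueError("insertion delay exceeds the budget")
    return {leaf: insertion_delay - d for leaf, d in leaf_delay_ps.items()}


# Hypothetical root-to-leaf delays before balancing.
unbalanced = {"ff_a": 400, "ff_b": 460, "ff_c": 430}

padding = balance_clock_tree(unbalanced, budget_ps=500)
balanced = {leaf: unbalanced[leaf] + padding[leaf] for leaf in unbalanced}
```

With these assumed values, the padding is 60 ps, 0 ps, and 30 ps for `ff_a`, `ff_b`, and `ff_c`, and the skew drops from 60 ps to 0 ps. Calling `balance_clock_tree` with a budget below 460 ps raises an error, which mirrors how a tight budget restricts what balancing can achieve.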
With the advancement of integrated circuits, faster circuits are being manufactured. For example, embedded ASIC cores have reached speeds so high that the solution provided by the CTS tools can no longer meet their clock skew requirements. To overcome this problem, many designers resort to manual design methods to form H trees and mesh structures, which are common in traditional high-end CPU designs. This may generate clock trees with small clock skews. However, due to the huge size of some circuits, the manual design may take a very long time, and the time-to-market is significantly affected.
Another difficulty in adopting H tree-/mesh-based clock trees in ASIC implementations is caused by the widespread use of integrated clock gates (ICGs) for power reduction. The ICGs, which may be inserted into the clock tree manually by designers or automatically by synthesis tools, save power by shutting off sections of the clock tree that are not required for an operation, preventing the controlled logic from toggling and consuming power. Because ICG inputs and outputs are logically distinct, inserting ICGs into the clock tree renders the tree fragmented and unsuited for the H tree-/mesh methodology, which can only be applied to an un-fragmented net.