Clock distribution is a critical task in modern chip design. In recent years, advances in CMOS technology have led to an exponential increase in chip complexity, and the number of transistors on a large chip can reach into the billions. A modern System-on-a-Chip (SoC) can be regarded as many on-chip micro-networks communicating with each other all the time. The clock is the key signal that makes this possible. From a clocking perspective, chip architectures can be classified as Globally Asynchronous Locally Synchronous (GALS) or Globally Synchronous Locally Synchronous (GSLS). In the GSLS approach, the clock signals driving all the on-chip modules run at the same frequency and have a fixed phase relationship with one another. This requires the distribution of a global clock signal. There are several design considerations when distributing a clock signal globally: minimizing the skew caused by different distribution paths, minimizing the jitter accumulated along the distribution path, minimizing the silicon and metal resources required for routing the clock signal, and minimizing the power used by the distribution network.
Referring to FIG. 1A, in conventional practice, tree structures are used to distribute a global clock signal generated from source 111. The circuit elements receiving the clock signal are called clock sinks. A group of clock sinks is marked as 112. The global clock signal is distributed using a distribution network, which can be constructed using branch tree 116, H-tree 115 or X-tree 114. In the distribution network, buffer cells of various driving strengths are used to compensate for the energy loss and keep the global clock signal at an appropriate voltage level. An exemplary buffer cell is labeled 113.
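The appeal of the H-tree for skew control can be illustrated with a minimal sketch. It assumes an idealized geometry in which branches alternate between horizontal and vertical and the segment length halves after each vertical step; the function name `h_tree_sinks` and all dimensions are hypothetical and are not taken from FIG. 1A. Under these assumptions every sink sits at the same wire distance from the source, so the nominal skew from wire length alone is zero.

```python
def h_tree_sinks(x, y, half_len, levels, horizontal=True, path_len=0.0):
    """Return (x, y, wire_length_from_source) for each sink of an ideal H-tree.

    Assumption: each level splits into two branches of length half_len,
    alternating horizontal and vertical, and half_len halves after every
    vertical step. Buffers and wire parasitics are ignored.
    """
    if levels == 0:
        return [(x, y, path_len)]
    sinks = []
    for sign in (-1.0, 1.0):
        nx = x + sign * half_len if horizontal else x
        ny = y if horizontal else y + sign * half_len
        next_half = half_len if horizontal else half_len / 2.0
        sinks += h_tree_sinks(nx, ny, next_half, levels - 1,
                              not horizontal, path_len + half_len)
    return sinks


sinks = h_tree_sinks(0.0, 0.0, half_len=4.0, levels=4)
wire_lengths = {length for _, _, length in sinks}
print(len(sinks), "sinks, distinct wire lengths:", wire_lengths)
```

In a real layout the branches are not perfectly matched, which is why the skew and variation issues discussed in this section arise.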
In FIG. 1B, an H-tree with active skew compensation is depicted. The global clock signal generated from source 151 is required to be delivered to all the clock sinks. An exemplary group of clock sinks is marked as 156. The distribution network contains multiple buffer cells, one of which is labeled 152. To alleviate the skew problem, the delays at the ends of different branches are compared using phase detectors. In the exemplary case shown, the delays of branches 157 and 158 are compared by a phase detector 155. The result is used to drive delay lines 153 and 154 so that the delays of the paths can be adjusted. Consequently, skew can be minimized.
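The compensation loop described above can be sketched behaviorally as follows. This is an illustration only: it assumes a bang-bang phase detector that reports only which branch arrives later, and delay lines that can only add delay in fixed steps. The function name `compensate_skew`, the step size and the delay values are hypothetical and do not come from FIG. 1B.

```python
def compensate_skew(delay_a_ps, delay_b_ps, step_ps=1.0, max_iters=1000):
    """Trim two branch delays toward each other with a bang-bang loop.

    Assumptions: the phase detector only reports the sign of the arrival
    time difference, and each delay line can only add delay in increments
    of step_ps. Returns (trim_a, trim_b, residual_skew), all in ps.
    """
    trim_a = 0.0
    trim_b = 0.0
    for _ in range(max_iters):
        error = (delay_a_ps + trim_a) - (delay_b_ps + trim_b)
        if abs(error) < step_ps:
            break  # residual skew is below the delay line resolution
        if error > 0:
            trim_b += step_ps  # branch A arrives later, so slow down branch B
        else:
            trim_a += step_ps  # branch B arrives later, so slow down branch A
    residual = (delay_a_ps + trim_a) - (delay_b_ps + trim_b)
    return trim_a, trim_b, residual


trim_a, trim_b, residual = compensate_skew(120.0, 100.0)
print("trim A:", trim_a, "trim B:", trim_b, "residual skew:", residual)
```

In this sketch the residual skew is bounded by the delay line step size, so finer steps give tighter matching at the cost of a longer delay line or a longer settling time.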
Referring now to FIG. 2A, a clock mesh (clock grid) is also used in some designs, especially in high-end microprocessors, for distributing a global clock signal. In this method, a solid grid 213 made of metal is constructed on-chip as shown in the figure. Its purpose is to deliver the global clock signal generated from source 211 to all the clock sinks in the chip. An exemplary group of clock sinks is labeled as 214. The grid also requires multiple buffer cells attached at various locations on the grid to compensate for the energy loss and keep the signal strength at an appropriate level. An exemplary buffer cell is labeled as 212. In practice, the tree and grid methods can be used together to distribute a clock signal from a source to all the sinks across a large chip.
Referring now to FIG. 2B, a distributed PLL array can also be used to minimize skew actively. The global clock signal generated from source 221 is required to be delivered to all the clock sinks. An exemplary group of clock sinks is marked as 222. The entire chip area 227 is split into multiple small areas called tiles. One such tile is labeled 228. Inside each tile, there is a local frequency generator 226 (represented by the VCO [Voltage Controlled Oscillator] symbol). Along the four boundaries of each tile, there are phase detectors 223, 224, 225 and 226 that compare the delay differences between the local clock and its neighboring clocks. The result is used to drive the frequency generator and thereby minimize the skew. In this approach, the array of distributed PLLs actively compensates for the skew.
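The neighbor-to-neighbor correction can be illustrated with a highly simplified sketch. It assumes each tile holds a single phase value and, on every update, moves that phase toward its neighbors by a fixed gain times the summed phase-detector errors. The names `relax_phases` and `spread`, the gain, the grid size and the initial phases are all hypothetical. Oscillator dynamics, loop filters and the common reference are deliberately left out, so the sketch says nothing about the stability of a real PLL array.

```python
def relax_phases(phases, gain=0.2, iterations=200):
    """Repeatedly pull each tile's phase toward its four edge neighbors.

    phases is a 2D list of per-tile clock phases in ps. Assumption: the
    phase detector on each boundary reports (neighbor - local), and the
    local generator applies gain times the sum of those errors per update.
    """
    rows, cols = len(phases), len(phases[0])
    current = [row[:] for row in phases]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                correction = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        correction += current[nr][nc] - current[r][c]
                updated[r][c] = current[r][c] + gain * correction
        current = updated
    return current


def spread(phases):
    """Worst-case phase difference between any two tiles."""
    flat = [p for row in phases for p in row]
    return max(flat) - min(flat)


# Hypothetical initial phase offsets (ps) on a 4 x 4 array of tiles.
initial = [[float((7 * r + 13 * c) % 10) for c in range(4)] for r in range(4)]
settled = relax_phases(initial)
print("spread before:", spread(initial), "spread after:", spread(settled))
```

The gain is kept small here so that the synchronous update converges; a larger gain in this toy model can overshoot and oscillate.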
As semiconductor process technology advances, the tree and grid structures face difficult challenges. Circuit operating frequencies become higher due to the reduction in transistor gate delay. Chip sizes become larger since more transistors can be packed onto a die. As a result, the global clock signal has to travel farther. Moreover, both the gate and interconnect delay variations induced by PVT (process, voltage, temperature) changes become larger. Furthermore, interconnect delay does not scale well as the process advances. All these factors have made skew take up a larger percentage of the clock period. They also make the variation of skew hard to control. To make matters worse, distributing a clock signal across a large chip at high frequency requires a large amount of metal resources (for shielding) and high energy consumption (possibly as high as 50% of the total power used by the chip). The distributed PLL array approach, besides its high resource and power consumption, also has an additional stability problem because many PLLs are required to lock to the same common reference.
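The point that skew consumes a growing share of the clock period can be made concrete with simple arithmetic. The sketch below uses hypothetical numbers (a fixed 50 ps skew at several clock frequencies) chosen only for illustration; the function name `skew_fraction` is likewise an assumption.

```python
def skew_fraction(skew_ps, freq_ghz):
    """Return skew as a fraction of the clock period.

    The period in ps is 1000 / freq_ghz, since a 1 GHz clock has a
    1000 ps period.
    """
    period_ps = 1000.0 / freq_ghz
    return skew_ps / period_ps


# Hypothetical case: the same 50 ps of skew at rising clock frequencies.
for freq_ghz in (1.0, 2.0, 4.0):
    share = skew_fraction(50.0, freq_ghz)
    print(f"{freq_ghz} GHz: skew is {share:.0%} of the period")
```

If the absolute skew stays constant while the frequency rises, its share of the period grows in direct proportion to frequency, which leaves less of each cycle for useful logic.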
This “Discussion of the Background” section is provided for background information only. The statements in this “Discussion of the Background” are not an admission that the subject matter disclosed in this “Discussion of the Background” section constitutes prior art to the present disclosure, and no part of this “Discussion of the Background” section may be used as an admission that any part of this application, including this “Discussion of the Background” section, constitutes prior art to the present disclosure.