A clock mesh network has been the preferred clock network structure for high-end microprocessor design because of its tolerance to variations. This variation tolerance is achieved through redundant mesh grid wires near the sink registers, at the cost of increased power dissipation. Although other clock structures with redundancy, such as clock spines and cross links, exist, these structures improve only the tolerance to local skew variation. A clock mesh network, by design, has a very low global clock skew (variation). As such, the clock mesh network is popular in high-end microprocessors and, consequently, many design automation methods have been developed in the area of clock mesh synthesis and optimization. In such prior art systems, the proposed methods aim to reduce the power dissipation given a practical skew requirement.
For instance, the methods described by Venkataraman, Feng, Hu, and Li in “Combinatorial algorithms for fast clock mesh optimization,” IEEE Transactions on Very Large Scale Integration Systems (TVLSI), Vol. 18, No. 1, pp. 131-141, January 2010, and by Abdelhadi, Ginosar, Kolodny, and Friedman in “Timing-driven variation-aware nonuniform clock mesh synthesis,” Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), May 2010, pp. 15-20, aim to reduce the mesh grid wires, and the methods described by Shelar in “An algorithm for routing with capacitance/distance constraints for clock distribution in microprocessors,” Proceedings of the International Symposium on Physical Design (ISPD), March 2009, pp. 141-148, and by Guthaus, Wilke, and Reis in “Non-uniform clock mesh optimization with linear programming buffer insertion,” Proceedings of the ACM/IEEE Design Automation Conference (DAC), June 2010, pp. 74-79, aim to reduce the stub wires, whereas the methods described by Rajaram and Pan in “Meshworks: An efficient framework for planning, synthesis and optimization of clock mesh networks,” Asia and South Pacific Design Automation Conference (ASPDAC), January 2008, pp. 250-257; Cho, Pan, and Puri in “Novel binary linear programming for high performance clock mesh synthesis,” Proceedings of the IEEE/ACM International Conference on Computer-aided Design (ICCAD), 2010, pp. 438-443; and Lu, Mao, and Taskin in “Timing slack aware incremental register placement with non-uniform grid generation for clock mesh synthesis,” Proceedings of the International Symposium on Physical Design (ISPD), March 2011, pp. 131-138, aim to reduce the sum of the mesh grid wires and stub wires.
Although these disclosures optimize for power dissipation, none of them considers applying to meshes the power saving techniques commonly used for clock tree networks, such as clock gating and register clustering. In a clock mesh network, clock gating is potentially applicable only on the local connections between the mesh grid wires and the sink registers. In the prior art, the stub wires that connect the grid wires to the sink registers are considered buffer-less, so clock gating is inapplicable. A significant percentage of the switching capacitance (30-70%) is at the sinks of the clock network; therefore, clock gating on the local trees of a clock mesh is beneficial. As will be explained herein, it is desired in accordance with the method of the invention to connect the sink registers using local Steiner trees and to insert integrated clock gating (ICG) cells for power saving purposes.
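By way of illustration only, the potential benefit of gating the sink capacitance may be estimated with the standard first-order dynamic power relation P = C·V²·f for a net that toggles every cycle. The following sketch is not part of the disclosed method; the function name `gated_clock_power`, its parameters, and the numeric values are hypothetical, and the overhead of the ICG cells themselves is ignored.

```python
def gated_clock_power(c_total, sink_fraction, gated_activity, vdd, freq):
    """Return (ungated, gated) dynamic clock power in watts.

    Assumptions (illustrative only, not taken from the disclosure):
      c_total        -- total switched clock capacitance in farads
      sink_fraction  -- share of c_total located at the sinks (0..1)
      gated_activity -- fraction of cycles in which the gated sinks
                        still receive the clock (0..1)
      vdd, freq      -- supply voltage in volts, clock frequency in hertz
    """
    if not 0.0 <= sink_fraction <= 1.0:
        raise ValueError("sink_fraction must be between 0 and 1")
    if not 0.0 <= gated_activity <= 1.0:
        raise ValueError("gated_activity must be between 0 and 1")

    c_sink = sink_fraction * c_total   # capacitance that can be gated
    c_rest = c_total - c_sink          # mesh grid and drivers, always switching

    p_ungated = c_total * vdd ** 2 * freq
    p_gated = (c_rest + gated_activity * c_sink) * vdd ** 2 * freq
    return p_ungated, p_gated


if __name__ == "__main__":
    # Hypothetical values: 1 nF total, half at the sinks, sinks enabled 40% of cycles.
    before, after = gated_clock_power(1e-9, 0.5, 0.4, 1.0, 1e9)
    print(f"ungated: {before:.3f} W, gated: {after:.3f} W")
```

With these hypothetical values, half of the capacitance is at the sinks and is clocked in only 40% of cycles, so the estimate falls from about 1.0 W to about 0.7 W. A larger sink share within the 30-70% range noted above would increase the estimated saving proportionally.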