1. Field of the Invention
The present invention relates generally to methods and apparatus for the design, partition, and placement of microelectronic integrated circuits. More specifically, the present invention relates to methods and apparatus for the design and construction of a hierarchical clock distribution system within microelectronic integrated circuits. Even more particularly, the present invention relates to methods and apparatus for compensating for clock skew within the clock distribution system between functional circuit blocks of the integrated circuits.
2. Description of Related Art
Electronic Design Automation (EDA) tools and methods facilitate the design, partition, and placement of microelectronic integrated circuits on a semiconductor substrate. Generally transistors are formed into primitive circuits that perform digital logic functions such as AND, OR, NAND, NOR, etc. The primitive circuits are then organized into macro circuits such as multiplexers, adders, multipliers, decoders, etc., which in turn are organized as functional blocks. In a hierarchical design, the functions of the integrated circuit design are allocated space on the semiconductor substrate. Each of the individual functions is then partitioned into the various macro circuits which are often predesigned and placed in a library of the EDA system. When the individual functional designs are completed, the global design of the whole integrated circuit is then completed to interconnect the individual functional blocks.
In a synchronous logic design, a common timing signal or clock is employed to ensure that the circuitry functions correctly. The clock is distributed to each of the registers or latches within the functions and ideally arrives at each of the latches simultaneously during operation. In reality, this is not the case. Differences in the distribution of the clock cause variation in the arrival time of the clock at each of the registers or latches. This variation is commonly referred to as “clock skew”.
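As a minimal illustration of the definition above, clock skew can be expressed as the spread between the latest and earliest clock arrival times. The register names and arrival times in this sketch are hypothetical and are not taken from any figure.

```python
def clock_skew(arrival_times_ns):
    """Return the spread between the latest and earliest clock arrival (ns)."""
    times = list(arrival_times_ns.values())
    return max(times) - min(times)


# Hypothetical arrival times, in nanoseconds, at three registers.
arrivals = {"reg_a": 1.20, "reg_b": 1.35, "reg_c": 1.10}
skew = clock_skew(arrivals)
print(f"clock skew: {skew:.2f} ns")
```

In an ideal distribution all arrival times would be equal and the computed skew would be zero.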
Refer now to FIG. 1 for a discussion of the structure of a clock distribution system for an integrated circuit of the prior art and the contributing factors to the clock skew. The primitive logic circuits are configured to form a combinational logic function 115. The registers 110 and 120 provide the memory elements for groups of the sequential logic functions 100. The sequential logic functions 100 are partitioned and organized to form the individual macro-function logic blocks. The macro-function logic blocks are arranged and placed physically on the semiconductor substrate.
A clock generator 125 provides the clock timing signal that synchronizes the data transferred to and from the registers. The clock timing signal is transferred through a clock distribution system or clock tree from the clock generator 125 to the registers 110. The clock distribution system or clock tree is a series of buffer circuits placed in an ever-widening network of subtrees 135, 140, and 145. Each buffer is generally a driver circuit constructed to provide an increment of delay to the clock timing signal and sufficient drive for the number of buffers in the next layer of buffers.
In the example of the clock distribution system, as shown, the clock timing signal is received from the clock signal generator 125 by the buffer 130. The buffer 130 forms a first layer of the clock distribution system or clock tree. The output of the buffer 130 is connected to the second layer of buffers 132. Each of the buffers of the second layer of buffers 132 is in turn connected to a group of buffers of the third layer of buffers 134a, . . . , 134z. The first, second, and third layers of buffers form the global or top level of the clock distribution system and provide the interconnections to distribute the clock timing signal to the macro-function logic blocks. The global or top layer clock distribution system may provide a balanced common subtree 135.
The clock distribution system is further distributed through the clock subtrees 140 and 145 to the sequential logic functions 100. In the example as shown the output of the buffer 134b is connected to the buffers 142 and 146. The outputs of the buffers 142 and 146 are connected respectively to a group of buffers 144a, . . . , 144n and 148a, . . . , 148n within each of the macro-function logic blocks. The output of the subtree 140 provides the clock timing signal to the register 105 and the output of the subtree 145 provides the clock timing signal to the register 110.
The clock skew for the clock distribution network is determined by the load that results from the number of buffers of a following layer being driven by an output of a buffer and by the physical wiring segments required to connect the output of the buffer to the inputs of the buffers of the following layer. It is not possible to totally balance either the number of buffers or the amount of wiring used in creating the clock distribution system. Additionally, the structure of the two subtrees 140 and 145 may differ in the number of layers of buffers. Thus, the arrival times of the clock timing signal at the registers 105 and 110 may differ, and this difference in arrival times is the clock skew. In the present hierarchical design methods, the portion of the clock distribution system within the macro-function logic blocks is designed initially, when the macro-function logic blocks are designed. Normally, the physical sizes of the macro-function logic blocks permit the structure of the clock distribution system to be well balanced to minimize the clock skew within the macro-function logic blocks. When the global interconnections of the clock distribution system are implemented, the distances between the macro-function logic blocks vary significantly. The clock skew at the global or top level can thus be significantly greater than the clock skew within the macro-function logic blocks.
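The dependence of skew on buffer loading, wiring length, and the number of buffer layers can be sketched with a simple model. The linear delay model, its coefficients, and the two example paths below are assumptions chosen only for illustration; a real design flow would use extracted parasitics and characterized cell delays.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    fanout: int            # buffers of the following layer driven by this buffer
    wire_length_um: float  # wiring from this buffer's output to those inputs


# Assumed coefficients for a toy linear delay model (not from the source).
INTRINSIC_NS = 0.05
PER_LOAD_NS = 0.01
PER_UM_NS = 0.0002


def stage_delay(stage):
    """Delay of one buffer stage: intrinsic + load-dependent + wire-dependent."""
    return (INTRINSIC_NS
            + PER_LOAD_NS * stage.fanout
            + PER_UM_NS * stage.wire_length_um)


def arrival_time(path):
    """Clock arrival time at the end of a chain of buffer stages."""
    return sum(stage_delay(s) for s in path)


# Two hypothetical subtree paths with different depth, fanout, and wiring.
path_a = [Stage(4, 500.0), Stage(8, 200.0)]
path_b = [Stage(4, 900.0), Stage(6, 200.0), Stage(2, 100.0)]

skew = abs(arrival_time(path_a) - arrival_time(path_b))
print(f"arrival A: {arrival_time(path_a):.2f} ns, "
      f"arrival B: {arrival_time(path_b):.2f} ns, skew: {skew:.2f} ns")
```

Even in this toy model, the extra buffer layer and the longer first wiring segment of the second path make its clock arrive later than the first path's.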
“Clock Generation and Distribution for the First IA-64 Microprocessor,” Tam et al. IEEE Journal of Solid-State Circuits, pp. 1545–1552, November 2000, Volume: 35 Issue: 11, ISSN: 0018-9200 describes clock distribution with an active distributed deskewing technique. The technique is capable of compensating skews caused by within-die process variations that are becoming a significant factor of the clock design. A multilevel skew budget and local clock timing methodology are used to enable a high-performance design by providing support for intentional clock skew injection and time borrowing. A test access port interface is provided to the deskew architecture with the incorporation of the on-die-clock-shrink for post-silicon timing debug.
“Performance Optimization of VLSI Interconnect Layout,” Cong et al. The Journal of VLSI Integration, Vol. 21, Nos. 1&2, November 1996, pp. 1–94 presents a comprehensive survey of existing techniques for interconnect optimization during the VLSI physical design process, with emphasis on recent studies on interconnect design and optimization for high-performance VLSI circuit design under the deep submicron fabrication technologies.
“An Algorithm for Zero-Skew Clock Tree Routing with Buffer Insertion,” Chen et al. Proceeding—European Design and Test Conf., pages 652–657, 1996 presents multi-stage zero skew clock tree construction for minimizing clock phase delay and wire-length. Chen et al. simultaneously perform clock tree routing and buffer insertion. A clustering-based algorithm, which uses shortest delay as the cost function, is described.
“Physical Design CAD in Deep Sub-micron Era,” Mitsuhashi et al., Proceedings of the European Design Automation Conference with EURO-VHDL'96, 1996, Geneva, Switzerland, IEEE Computer Society Press, Los Alamitos, Calif., pp. 350–355, ISBN:0-8186-7573-X discusses timing optimization and power minimization methods in detail.
“Wire segmenting for improved buffer insertion,” Alpert et al., Proceedings of the 34th Annual ACM/IEEE Design Automation Conference, 1997, ACM Press, New York, N.Y., USA, pp. 588–593 ISBN:0-89791-920-3 presents buffer insertion, which seeks to place buffers on the wires of a signal net to minimize delay. Alpert et al. study the problem of finding the correct number of segments for each wire in the routing tree. Too few segments yield sub-par solutions, but too many segments can lead to excessive run times and memory loads.
“Repeater Block Planning under Simultaneous Delay and Transition Time Constraints,” Sarkar et al. Proceedings 2001 European Design, Automation and Test Conference, March 2001, pp. 540–544 describes a solution to the problem of repeater block planning under both delay and signal transition time constraints for a given floor plan.
U.S. Pat. No. 6,311,314 (McBride) describes a system and method for evaluating the loading of a clock driver. The method evaluates each node within a net list file to determine: (1) whether that node is an output node for a clock driver; and (2) for clock driver nodes, whether that node is within loading specification for the particular clock driver circuit.
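The two-step evaluation summarized above can be sketched as a loop over net list nodes. This is only an illustrative sketch: the node record layout, the driver names, and the loading limits are hypothetical and do not reproduce the data structures of the cited patent.

```python
def check_clock_driver_loading(nodes, max_load_pf_by_driver):
    """Return names of clock driver output nodes whose load exceeds the limit."""
    violations = []
    for node in nodes:
        driver = node.get("driven_by")
        # Step 1: is this node the output of a clock driver?
        if driver not in max_load_pf_by_driver:
            continue
        # Step 2: is the node within the loading specification of that driver?
        if node["load_pf"] > max_load_pf_by_driver[driver]:
            violations.append(node["name"])
    return violations


# Hypothetical loading specifications (pF) for two clock driver cell types.
limits = {"CLKBUF_X4": 0.50, "CLKBUF_X8": 1.00}

# Hypothetical net list nodes; "n3" is driven by an ordinary logic gate.
netlist = [
    {"name": "n1", "driven_by": "CLKBUF_X4", "load_pf": 0.42},
    {"name": "n2", "driven_by": "CLKBUF_X8", "load_pf": 1.30},
    {"name": "n3", "driven_by": "NAND2_X1", "load_pf": 2.00},
]

overloaded = check_clock_driver_loading(netlist, limits)
print(overloaded)
```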
U.S. Pat. No. 6,053,950 (Shinagawa) teaches a layout method for a clock tree in a clock signal distribution circuit. In the layout of the clock tree, a standard clock tree is prepared having a root buffer, a plurality of intermediate stage buffer cells and a plurality of last stage buffer cells connected in a hierarchical configuration. All of the clock lines have an equal length. If there is no set of flip-flops in a target integrated circuit corresponding to a set of last stage buffer cells, the set of last stage buffer cells is removed as a whole, provided there are no other last stage buffer cells connected to a flip-flop.
U.S. Pat. No. 6,020,774 (Chiu, et al.) demonstrates a gated clock tree synthesis (CTS) method for the purpose of synthesizing a gate array logic circuit to allow optimal topological arrangement of the gate array on the logic circuit.
U.S. Pat. No. 5,864,487 (Merryman, et al.) illustrates a method and apparatus for identifying gated clocks within a circuit design using a standard optimization tool. The gated clock signals may be identified by identifying which of the number of raw clock signals is coupled, through combinational logic, to a selected one of the number of state devices. This results in an identified raw clock signal. A number of enable signals coupled through combinational logic to the selected one of the number of state devices is identified and results in an identified enable signal. The gated clock signals are then uniquely determined by the particular combination of the identified raw clock signal and the identified enable signal.
U.S. Pat. No. 5,686,845 (Erdal, et al.) describes a hierarchical clock distribution system and method. The method of producing a hierarchical clock distribution system for the circuit includes determining clock skews between the clock driver and the sub-blocks respectively. Delay buffers are selected from a predetermined set of delay buffers having the same physical size and different delays, with the delay buffers being selected to provide equal clock skews between the clock driver and the distribution systems respectively. Each delay buffer includes a delay line, and a number of loading elements that are connected to the delay line, with the number of loading elements being selected to provide the required clock delay for the respective sub-block.
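The idea of choosing delay buffers from a predetermined set so that the sub-blocks see matching clock delays can be sketched as follows. The block names, clock delays, available buffer delays, and the nearest-match selection rule are all assumptions made for illustration and are not taken from the cited patent.

```python
# Hypothetical predetermined set of delay buffer values (ns).
AVAILABLE_DELAYS_NS = [0.0, 0.1, 0.2, 0.3, 0.4]


def select_delay_buffers(block_delays_ns, available=AVAILABLE_DELAYS_NS):
    """Pick, for each sub-block, the buffer delay that best pads it to the slowest block."""
    target = max(block_delays_ns.values())
    selection = {}
    for block, delay in block_delays_ns.items():
        needed = target - delay
        # Choose the available buffer whose delay is closest to the needed padding.
        selection[block] = min(available, key=lambda d: abs(d - needed))
    return selection


# Hypothetical clock delays (ns) from the clock driver to each sub-block.
blocks = {"alu": 0.50, "fpu": 0.80, "cache": 0.62}
chosen = select_delay_buffers(blocks)
print(chosen)
```

Because the set of buffer delays is discrete, a small residual mismatch can remain for blocks whose required padding falls between two available values.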