Sequential logic circuits or networks are typically controlled by one or more periodic clock signals which synchronize the storage of new data values in the various memory elements of the circuit, including flip-flops, latches, memory arrays, and the like. A clock distribution network is designed so that clock signals arrive simultaneously at all memory elements of the circuit, in which case, every signal propagation path between memory elements must have a delay less than the clock period.
Alternatively, the arrival time (AT) of the clock at each memory element may be adjusted using various Clock Skew Scheduling (CSS) algorithms to accommodate differences in the delays of different memory-element-to-memory-element paths, thereby maximizing the frequency (minimizing the clock period) at which the network operates. CSS algorithms work by introducing AT adjusts at the endpoints of the clock distribution network (the clock inputs of the memory elements of the sequential circuits), specifying that the clock AT at an endpoint (or equivalently, the clock distribution delay to the endpoint) should be a certain amount earlier or later than the “nominal” clock distribution delay. The AT adjusts are then passed to a clock distribution generation process that builds a clock distribution network (e.g., a clock tree) that, as closely as possible, implements these AT adjusts.
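The two clocking regimes described above can be written as simple inequalities. The following is a simplified sketch only: the symbols T (clock period), d_ij (longest signal propagation delay from memory element i to memory element j), and a_i (clock AT at memory element i) are introduced here for illustration and do not appear in the figures, and setup time, hold constraints, and clock uncertainty are ignored.

```latex
% Simultaneous clock arrival: every path delay must be less than the period.
d_{ij} < T \quad \text{for every path } i \to j

% With clock skew scheduling: data launched at a_i must arrive before
% the next capturing clock edge at a_j + T.
a_i + d_{ij} < a_j + T \quad \text{for every path } i \to j
```

Under these assumptions, making a_j later than a_i relaxes the constraint on a long path from i to j, at the cost of tightening the constraint on every path that starts at j.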
Referring to FIG. 9, there is shown an instance within a real design that illustrates the complexity involved in the gating circuitry, and the inherent problems of not being able to take the gating elements into account when adjusting the skew at the endpoints of the clock distribution network. FIG. 9 shows three stages of memory elements, including clock gated memory elements whose data outputs in turn control other clock gating elements. In such cases, timing constraints on the control inputs to gating elements 30 are not considered by a conventional CSS algorithm that focuses only on the endpoints 20 of the clock distribution network. In this case, the clock signal passes through the clock gating elements 30 to reach the clock endpoints 20, i.e., the clock gates and clock distribution network endpoint elements are serially connected within the clock distribution network. Although the serially connected clock gate and endpoint have no intervening circuits in this example, in general they may be separated by other circuitry, such as clock buffers or inverters. More complex clock networks may occur in which multiple clock gate elements lie along the path from the clock source to one or more clock endpoints. Other clock signal processing elements, such as pulse shapers or clock multiplexers, may also occur within the clock distribution network. These elements will be collectively referred to as clock gates, while non-clock control signals feeding these elements will be referred to as clock gate enable signals.
Meeting the timing requirements of complex circuitry such as that shown in FIG. 9 requires that the CSS algorithm account for both the timing requirements of the memory elements at the endpoints of the clock distribution network and the timing requirements of the gating signals that feed the clock gates and other clock control elements within the clock distribution network.
Referring now to FIG. 1, there is shown a sample situation that occurs in an illustrative circuit consisting of a gating element that provides the function of gating a clock signal driving two downstream sequential elements. The sequential elements represent the endpoints of the clock distribution network. This figure further represents a single stage of memory element and its associated clock gating element in the multi-stage example shown in FIG. 9. The clock signal arrives at pin CK and is enabled (i.e., propagated to output ECK) when the data signal at input E is active. The output pin ECK of the gate is connected to the CK clock inputs that trigger (i.e., cause new data to be stored in) the downstream sequential elements. The gating element is labeled in the figures as 110, and the two downstream sequential elements are labeled 120 and 130.
Late mode timing slack (or simply slack) of a signal is defined as the difference between the earliest time that a signal may be required to arrive (or reach a stable state) in order to satisfy timing requirements and the latest time it may actually arrive. In the case of a clock gate element, the data signal is required to arrive before the clock signal so that no clipping (shortening) or glitching (partial pulse propagation) of the clock signal occurs. The local slack at gating element 110, defined as the difference between the earliest time the clock signal can switch and the latest time the data signal may actually switch, is −100 ps. This clock signal at gating element 110 then drives the CK pins of the sequential elements through the ECK pin, resulting in local slacks at the sequential elements of −50 ps at 120 and −110 ps at 130. The slack visible at the output end of these sequential elements is −80 ps at sequential element 120, and −70 ps at sequential element 130. The goal of any CSS algorithm is to balance the skew at the CK pins by balancing the slack between the input and output sides of the sequential elements.
Referring now to FIG. 2, there is shown an instance after a single iteration of the iterative CSS algorithm that attempts to balance the slacks. The slack is balanced by applying an adjust to the clock AT at the CK pin given by ½(output slack−input slack). In the present example, the result is a −15 ps adjust on the sequential element 120, and +20 ps on the sequential element 130. For simplicity, it is assumed that the adjusts at elements 120 and 130 may be implemented by varying the delay of the connections between them and element 110, and thus the clock arrival time at pin CK of 110, and hence its slack, is unchanged (this assumption will be removed in a later description of the invention). As illustrated in the example, a conventional CSS algorithm typically improves the worst slack in the design, but the critical slack is now located at the gating element 110 and cannot be improved further with CSS adjusts at the clock distribution network endpoints.
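The arithmetic of this single iteration can be reproduced with a short sketch. The slack values are those given for FIG. 1; the function names and the update rule (a later clock AT adds the adjust to the input slack and subtracts it from the output slack) are assumptions made for illustration, not a description of any particular CSS implementation.

```python
# Slacks in picoseconds, taken from the FIG. 1 example.
GATE_SLACK_PS = -100.0  # local slack at gating element 110 (unchanged here)
ENDPOINTS = {
    "120": (-50.0, -80.0),   # (input slack, output slack)
    "130": (-110.0, -70.0),
}


def balance_adjust(input_slack, output_slack):
    """Clock AT adjust given by 1/2 * (output slack - input slack)."""
    return 0.5 * (output_slack - input_slack)


def apply_adjust(input_slack, output_slack, adjust):
    """Assumed update rule: a later clock helps the input side and
    hurts the output side by the same amount."""
    return input_slack + adjust, output_slack - adjust


def run_iteration(endpoints):
    """Compute one balancing adjust per endpoint and the resulting slacks."""
    adjusts = {}
    balanced = {}
    for name, (s_in, s_out) in endpoints.items():
        adj = balance_adjust(s_in, s_out)
        adjusts[name] = adj
        balanced[name] = apply_adjust(s_in, s_out, adj)
    return adjusts, balanced


adjusts, balanced = run_iteration(ENDPOINTS)

# Worst slack over the gate and both sides of each endpoint.
worst_before = min([GATE_SLACK_PS] + [s for pair in ENDPOINTS.values() for s in pair])
worst_after = min([GATE_SLACK_PS] + [s for pair in balanced.values() for s in pair])

print(adjusts)       # adjusts for elements 120 and 130
print(balanced)      # balanced (input, output) slacks per element
print(worst_before, worst_after)
```

Under these assumptions the endpoints balance at −65 ps (element 120) and −90 ps (element 130), so the worst slack improves from −110 ps to −100 ps and is then set by gating element 110, which endpoint-only adjusts cannot reach.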
Introducing CSS AT adjusts only at the endpoints of the clock distribution network is sufficient when all the critical paths in the circuit involve just these clock distribution network endpoints. However, with recent increases in the use of gating circuits within a clock distribution network to reduce power and/or improve the performance of the sequential network of which it is a part, it is often the case that the critical paths involve the clock gating elements, rather than the clock distribution network endpoints. Such paths are not taken into account by conventional CSS algorithms. The situation is complicated further when outputs of clock gated sequential elements themselves generate signals that control other clock gating elements, as illustrated in FIG. 9.
The solution to this problem is not as simple as taking the gating element into account as just another endpoint of the clock distribution network. This may result in the slack being balanced at the gating point, but cause the ATs at the clock endpoints fed by the gated clock to be pushed out further than necessary. The fundamental problem is that existing CSS algorithms do not properly consider the necessary relationship between AT adjusts at different points that are serially connected along a single path in a clock distribution network (e.g., at a clock gating element and at the clock distribution network endpoints fed by it).
Generally, when balancing skew on the clock distribution network, the enable inputs of the clock gating elements are not taken into account. There are algorithms for skew scheduling, but no specific algorithms that take the enable input into account.
Therefore, there is a need in industry for a method and a system that generates the Clock Skew Schedule (CSS) for a clock distribution network that includes clock gating elements.