The invention relates to clock circuits in electronic devices, and more particularly to methods and apparatuses for automatically determining a preferred technique for resolving short path violations or adding robustness in clock distribution systems.
The precise timing of events in synchronous digital circuits is supervised and synchronized by so-called clock signals. The task of the clock signals is to reduce the uncertainty in delay between sending and receiving storage elements. Storage elements, such as latches and flip-flops, respond to a predefined characteristic of a clock signal (e.g., a leading or trailing edge of the clock signal) by sampling output signals supplied by combinational logic or other storage elements. The sampled value is internally preserved by the storage element as the state of the circuit. The state of the storage element is made available for new computations after a certain delay.
As the overall design of a digital circuit becomes more complex, it is increasingly important to design a clock distribution system with care. As a clock signal traverses different branches of a tree-like distribution network, its critical component (e.g., a leading clock edge) may arrive at different storage elements at different times. This timing difference between clock arrival times at different points in the digital circuit is called clock skew.
Clock skew can be quantified in any of a number of different ways. When only two registers are considered, a local skew can be defined as the time difference between the clocking of the sending register and the clocking of the receiving register. This skew takes on a negative value if the receiving register is clocked later than the sending register, and takes on a positive value if the receiving register is clocked earlier than the sending register. Alternatively, clock skew can be defined as the magnitude of the difference between the clocking of the sending register and the clocking of the receiving register. In this case, clock skew is always stated as a positive value.
When more than two registers are considered, the term clock skew can be defined as the difference between the longest propagation path through the clock network and the shortest, and is thus always a positive value.
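By way of illustration only, the three skew definitions given above can be sketched as follows. The function names and the example arrival times (in nanoseconds) are hypothetical and are not taken from this disclosure; the sign convention follows the text, in which local skew is negative when the receiving register is clocked later than the sending register.

```python
def local_skew(t_send, t_recv):
    """Signed local skew between two registers.

    Negative when the receiving register is clocked later than the
    sending register, positive when it is clocked earlier.
    """
    return t_send - t_recv


def skew_magnitude(t_send, t_recv):
    """Alternative definition: magnitude only, never negative."""
    return abs(t_send - t_recv)


def global_skew(arrival_times):
    """Skew over many registers: longest minus shortest clock path."""
    return max(arrival_times) - min(arrival_times)


# Hypothetical clock arrival times in nanoseconds.
arrivals = {"r1": 1.0, "r2": 1.5, "r3": 0.75}

signed = local_skew(arrivals["r1"], arrivals["r2"])         # -0.5
magnitude = skew_magnitude(arrivals["r1"], arrivals["r2"])  # 0.5
overall = global_skew(arrivals.values())                    # 0.75
```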
Clock skew may cause data-processing errors if it becomes excessive. This is because a storage element, responding to an early-arriving clock signal (excessive positive clock skew), may attempt to sample the output of one portion of the digital circuit before that output has settled into a valid state. This so-called “long path violation” limits the maximum clock frequency that may be applied to the circuit.
Alternatively, a storage element receiving an excessively delayed clock signal (excessive negative clock skew) may attempt to sample the output of one portion of the digital circuit at a point in time when what was a valid signal has already begun to transition into a different state. The term “short path violation” is used throughout this disclosure to refer to this type of timing violation problem, in recognition of the fact that the problem can be viewed as arising because one circuit path is effectively too short (i.e., it does not introduce a sufficient amount of delay to compensate for the late-arriving clock at the destination register).
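The two kinds of violation can be illustrated with a conventional register-to-register timing model. The following is a minimal sketch under stated assumptions: the parameter names (clk_to_q, d_max, d_min, setup, hold) and all numeric values are hypothetical and do not appear in this disclosure, and skew is signed so that a late-arriving destination clock gives a negative value.

```python
def long_path_ok(skew, period, clk_to_q, d_max, setup):
    """True if the slowest data path settles before the next capture.

    Positive skew (destination clocked early) eats into the clock
    period, so it makes this check harder to pass.
    """
    return clk_to_q + d_max + setup + skew <= period


def short_path_ok(skew, clk_to_q, d_min, hold):
    """True if the fastest data path cannot overwrite data too soon.

    Negative skew (destination clocked late) makes this check harder
    to pass; adding delay to the data path (larger d_min) helps.
    """
    return clk_to_q + d_min + skew >= hold


# Hypothetical values in nanoseconds.
violated = short_path_ok(skew=-1.0, clk_to_q=0.25, d_min=0.25, hold=0.25)
repaired = short_path_ok(skew=-1.0, clk_to_q=0.25, d_min=1.25, hold=0.25)
```

In this sketch the first call reports a short path violation, and the second call passes because one additional nanosecond of delay has been placed in the data path.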
To avoid these problems, current schemes for distributing clock signals to storage elements concentrate on ensuring a high degree of synchronism between all clock signals, in what are termed balanced clocking or zero-skew clocking strategies. Clocks are typically distributed in a tree-like structure, which permits delays in different branches to be balanced to a high degree. A benefit of this approach is that the clock rate can be high, because it is not limited by the variation in clock arrival times. Even when lower clock rates are involved, the uniformity provided by these strategies brings predictability and therefore simplifies the overall design problem.
Even with highly balanced clock-trees, the clock skew between groups of storage elements with separate clock-trees, and within blocks having very large clock-trees, may still be large enough to violate setup and hold constraints on storage elements or even cause race conditions. Short path violations are traditionally resolved by lengthening the data paths that cause the problems: one or more cascaded inverters or buffers are inserted to add delay in the data path.
Another approach has been to add a storage element that is responsive to an opposite clock edge between problematic or potentially problematic groups of storage elements. In general, this added storage element need not be an edge triggered element; for example, it may be a level-sensitive latch. The basic function of the added element is to send and receive an opposite clock edge between groups with large or potentially large skew between them. This is used, for instance, in tools that insert scan-chains in synchronous designs, for handling skew in the chains. In this context, the devices are called lockup-latches (or lockup-flip-flops). Where it has been desired to add these elements only where they are needed, this operation has so far had to be performed manually. With respect to test mode paths, some automated design tools insert these added elements based merely on the possibility of clock skew, without taking the actual timing into consideration. As a result, many latches are inserted into test mode paths without actually being needed. Such an approach needlessly wastes design resources.
In contrast to highly balanced clock-trees, a different methodology exists that combines clock timing with data path timing. Timing analysis produces permissible ranges, which impose a set of constraints on clock delays to individual registers or other storage devices. Then, the permissible ranges are explored to increase safety margins on timing, improve clock frequencies, and optimize the clock design. The method, called useful-skew clock skew scheduling, needs sufficient slack on local skew to be efficient.
The dominant method of resolving short path violations or adding robustness is to add cascaded buffers. This becomes more expensive than adding a latch at about the point when three buffers are added (approximately 1 ns in 0.25 micrometer CMOS). However, no technique has so far been used for automatically choosing between adding cascaded buffers and adding a latch, except in connection with scan chains, and there only on scan paths.
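The trade-off between cascaded buffers and a latch can be sketched as a simple comparison. This is an illustration only: the per-buffer delay of 350 ps and the relative costs of 1.0 per buffer and 2.5 per latch are hypothetical values chosen so that the crossover falls at about three buffers, as stated above; they are not figures from this disclosure.

```python
def choose_fix(required_delay_ps, buffer_delay_ps=350,
               buffer_cost=1.0, latch_cost=2.5):
    """Pick the cheaper of a buffer chain or a single latch.

    Returns a tuple (kind, element_count, cost). All default values
    are hypothetical placeholders.
    """
    # Integer ceiling division: buffers needed to reach the delay.
    n_buffers = -(-required_delay_ps // buffer_delay_ps)
    buffers_total = n_buffers * buffer_cost
    if buffers_total <= latch_cost:
        return ("buffers", n_buffers, buffers_total)
    return ("latch", 1, latch_cost)


small = choose_fix(300)    # ("buffers", 1, 1.0)
medium = choose_fix(700)   # ("buffers", 2, 2.0)
large = choose_fix(1000)   # ("latch", 1, 2.5)
```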
It should be emphasized that the terms “comprises” and “comprising”, when used in this specification, are taken to specify the presence of stated features, integers, steps or components; but the use of these terms does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.
In accordance with one aspect of the present invention, the foregoing and other objects are achieved in methods and apparatuses that automatically select a preferred timing change from a set of candidate timing changes that can be made to address a timing problem in a circuit design. This involves determining an associated cost metric for each of the candidate timing changes. In some embodiments, a lowest cost metric is identified from the associated cost metrics. The candidate timing change associated with the lowest cost metric is then selected from the set of candidate timing changes for use as the preferred timing change.
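The selection step described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name, function name, candidate names, and cost values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateChange:
    """One candidate timing change and its associated cost metric."""
    name: str
    cost: float


def select_preferred(candidates):
    """Return the candidate with the lowest associated cost metric."""
    if not candidates:
        raise ValueError("at least one candidate timing change is required")
    return min(candidates, key=lambda c: c.cost)


# Hypothetical candidates for a single timing problem.
candidates = [
    CandidateChange("add_three_buffers", 3.0),
    CandidateChange("add_latch", 2.5),
]
preferred = select_preferred(candidates)  # the "add_latch" candidate
```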
In some embodiments, the associated cost metric represents a cost per resolved source or destination register.
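A cost per resolved register might be computed as in the following sketch. The function name, register names, and cost figures are hypothetical, and the assumption that a single latch can resolve violations at several registers at once is made purely for illustration.

```python
def cost_per_resolved_register(total_cost, resolved_registers):
    """Cost metric normalized by the number of registers resolved."""
    if not resolved_registers:
        raise ValueError("a candidate must resolve at least one register")
    return total_cost / len(resolved_registers)


# Hypothetical candidates: name -> (total cost, registers resolved).
candidates = {
    "buffer_chain": (3.0, ["r7"]),
    "latch": (4.0, ["r4", "r5", "r6", "r7"]),
}

metrics = {
    name: cost_per_resolved_register(cost, regs)
    for name, (cost, regs) in candidates.items()
}
# metrics == {"buffer_chain": 3.0, "latch": 1.0}
preferred = min(metrics, key=metrics.get)  # "latch"
```

Under these assumed figures the latch has the higher total cost but the lower cost per resolved register, so it would be selected.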
In alternative embodiments, a total cost is determined for each of the candidate timing changes. Then, after performing the cost analysis for each timing violation to be resolved, a set of candidate timing changes is selected that together resolve all of the timing violations at the lowest combined cost.
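One way to picture the combined selection is an exhaustive search over subsets of candidate changes. The sketch below uses brute force purely for illustration; this disclosure does not state which search strategy is used, and the candidate names, violation labels, and costs are hypothetical.

```python
from itertools import combinations


def cheapest_cover(candidates, violations):
    """Find the subset of candidates that resolves every violation
    at the lowest combined cost.

    candidates maps a name to (total_cost, set_of_violations_resolved).
    Returns (combined_cost, names) or None if no subset covers all.
    Exhaustive search: only practical for small candidate sets.
    """
    best = None
    names = list(candidates)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(candidates[n][1] for n in combo))
            if violations <= covered:
                cost = sum(candidates[n][0] for n in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best


# Hypothetical violations and candidate timing changes.
violations = {"v1", "v2", "v3"}
candidates = {
    "buf_v1": (1.0, {"v1"}),
    "buf_v2": (2.0, {"v2"}),
    "buf_v3": (3.0, {"v3"}),
    "latch_A": (2.5, {"v2", "v3"}),
}
result = cheapest_cover(candidates, violations)
# result == (3.5, ("buf_v1", "latch_A"))
```

With these assumed numbers, buffering every violation separately would cost 6.0, whereas one buffer chain plus one shared latch resolves all three violations for 3.5.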