Achieving timing closure of a high-performance digital integrated circuit (or of a functional unit of such a circuit) means obtaining sufficient timing performance from the design. For example, the clock must be able to run fast enough to deliver the required performance while functional correctness is guaranteed. Achieving timing closure is an important, iterative and time-consuming step in the design of any digital integrated circuit. Particularly in microprocessor designs, timing requirements, logic requirements and technology parameters are often changed late in the design cycle, making automated design closure techniques extremely valuable.
Prior-art methods are illustrated in FIG. 1 (flow 100). Because the overall design is too large and complex to optimize at once, prior-art methods typically divide the design into partitions called macros, and assign to each individual macro a timing and area budget by a process of apportionment (box 110). Each macro is then designed, or its design is refined, with the goal of meeting its budget, either by a process of automated synthesis or by means of custom design techniques (box 120). The optimization at this stage takes many forms, such as logic re-structuring, buffer insertion, transistor sizing and use of low threshold voltage devices. The resulting design is timed, typically by means of static timing analysis (box 130). If every macro meets its budget, timing closure is achieved and the design is complete (box 150). More typically, the apportionment process is imperfect and involves some conjecture and guesswork. Hence, several macros will not meet their budgets, and overall timing closure is not achieved, as detected by box 140. In this case, the apportionment process is repeated (box 110), individual macros are redesigned and/or re-optimized (box 120), and the resulting overall design is timed again (box 130). This process is repeated iteratively until timing closure is obtained (box 150), as depicted in FIG. 1.
The main difficulty in prior-art techniques is that the application of automatic optimization techniques on individual macros interferes with the achievement of overall timing closure. This problem is illustrated in FIG. 2. Consider the simple case of macro A (box 200) feeding macro B (box 210). A short path of delay 200 time units of macro A feeds a long path of delay 600 time units of macro B. A different long path of delay 600 time units of macro A feeds a different short path of delay 200 time units of macro B, as shown in FIG. 2. Assume that all input signals arrive at time zero and that all output signals are required to be available by time 700. Slack is defined as the algebraic difference between required arrival time (RAT) and actual arrival time (AT), that is, RAT minus AT, so that a negative slack indicates a timing violation. Each global path has a delay of 800 time units, so the initial design is missing timing closure by 100 time units, or, in other words, the initial design has a slack of −100 time units. One particular prior-art apportionment technique assigns the full negative slack of 100 time units to each of the two macros, giving the optimization procedures applied to each macro the opportunity to see and correct the entire negative slack of the global path. Using this apportionment method, the required arrival times will be 100 and 500 at the upper and lower outputs of macro A, respectively, and 700 at both the upper and lower outputs of macro B. The arrival times will be zero at both the upper and lower inputs of macro A, and 200 and 600 at the upper and lower inputs of macro B, respectively, as shown in FIG. 2.
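The arithmetic of this first apportionment method can be reproduced in a few lines. The following is a minimal sketch, assuming the delays and the required time of 700 from the FIG. 2 example and assuming that inputs of macro A arrive at time zero; the function and variable names are illustrative only and do not correspond to any particular timing tool.

```python
# Illustrative sketch of the "full slack to each macro" apportionment of FIG. 2.
# All names here are hypothetical; only the numbers come from the example.

DEADLINE = 700  # required time at the outputs of macro B

# Delay of each global path through macro A and then macro B.
PATHS = {
    "upper": {"A": 200, "B": 600},  # short path in A feeds long path in B
    "lower": {"A": 600, "B": 200},  # long path in A feeds short path in B
}


def full_slack_budget(delay_a, delay_b, deadline=DEADLINE):
    """Return AT/RAT at the A-to-B boundary and the slack seen there."""
    at_boundary = delay_a              # inputs of A are assumed to arrive at 0
    rat_boundary = deadline - delay_b  # RAT propagated backward through B
    return {
        "at": at_boundary,
        "rat": rat_boundary,
        "slack": rat_boundary - at_boundary,  # slack = RAT - AT
    }


budgets = {
    name: full_slack_budget(d["A"], d["B"]) for name, d in PATHS.items()
}

for name, budget in budgets.items():
    print(name, budget)
```

Under these assumptions both macros see the whole −100 on each boundary pin, which is exactly what lets each macro's optimizer attempt to correct the entire violation on its own.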
Suppose the short paths cannot be improved, but there is room for improvement in the long paths. It is clear from this example that improving the two long paths from 600 to 500 time units will achieve overall timing closure, since each global path would then have a delay of 700 time units. Unfortunately, prior-art methods will never achieve timing closure in this case. The redesign and re-optimization of individual macros typically target the worst slack, and within each macro the short path, which cannot be improved, holds the worst slack at −100 time units no matter how much the long path is improved. The redesign and re-optimization techniques therefore have no incentive to improve the delay of the long paths.
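The lack of incentive can be made concrete by evaluating a worst-slack objective before and after the long paths are improved. The sketch below is illustrative only: it assumes that each macro optimizer scores a candidate design by the minimum slack over the macro's outputs, using the arrival and required times of the first apportionment method described for FIG. 2. The helper name `worst_slack` is hypothetical.

```python
# Illustrative sketch: a worst-slack objective does not register improvement
# of the long paths while the short paths stay at a slack of -100.

def worst_slack(arrival_in, required_out, delays):
    """Minimum over paths of RAT(output) - (AT(input) + path delay)."""
    return min(
        rat - (at + delay)
        for at, rat, delay in zip(arrival_in, required_out, delays)
    )


# Budgets from the first apportionment method (upper path, lower path).
AT_A, RAT_A = (0, 0), (100, 500)
AT_B, RAT_B = (200, 600), (700, 700)

# Macro A: upper path is short (200), lower path is long (600 -> 500).
a_before = worst_slack(AT_A, RAT_A, (200, 600))
a_after = worst_slack(AT_A, RAT_A, (200, 500))

# Macro B: upper path is long (600 -> 500), lower path is short (200).
b_before = worst_slack(AT_B, RAT_B, (600, 200))
b_after = worst_slack(AT_B, RAT_B, (500, 200))

# Global view after the same improvement: both paths meet the deadline.
global_after = min(700 - (200 + 500), 700 - (500 + 200))

print(a_before, a_after, b_before, b_after, global_after)
```

In this model the objective of each macro is unchanged by the improvement, even though the global slack rises from −100 to zero, so an optimizer driven only by that objective gains nothing from shortening the long path.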
Another prior-art apportionment method, one iteration of which is illustrated in FIG. 3, divides the negative slack according to the fraction of the global path delay incurred in each macro. In the example of FIG. 2, it would assign −25 time units of the upper path slack to macro A, −75 of the upper path slack to macro B, −75 of the lower path slack to macro A, and −25 of the lower path slack to macro B. Using this apportionment method, the required arrival times will be 175 and 525 at the upper and lower outputs of macro A, respectively, and 700 at both the upper and lower outputs of macro B. The arrival times will be zero at both the upper and lower inputs of macro A, and 175 and 525 at the upper and lower inputs of macro B, respectively. The situation after one iteration is depicted in FIG. 3.
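The proportional split described above follows directly from the path delays. The following is a minimal sketch, assuming the FIG. 2 delays and a required time of 700; the function name `proportional_budget` and its return layout are illustrative assumptions rather than part of any prior-art tool.

```python
# Illustrative sketch of delay-proportional slack apportionment.

DEADLINE = 700


def proportional_budget(delay_a, delay_b, deadline=DEADLINE):
    """Split a global path's slack in proportion to the delay in each macro."""
    total = delay_a + delay_b
    slack = deadline - total               # -100 for both paths of FIG. 2
    share_a = slack * delay_a / total      # portion of the slack given to A
    share_b = slack * delay_b / total      # portion of the slack given to B
    boundary = delay_a + share_a           # RAT at A's output = AT at B's input
    return share_a, share_b, boundary


upper = proportional_budget(200, 600)  # short path in A, long path in B
lower = proportional_budget(600, 200)  # long path in A, short path in B

print("upper:", upper)
print("lower:", lower)
```

The boundary value serves both as the required arrival time at the output of macro A and as the asserted arrival time at the input of macro B, which is why the two sets of numbers in the text coincide.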
Suppose now that each of the delays through each of the macros can be decreased by 50 time units by optimization, so that closure is attainable with short paths of 150 and long paths of 550 time units. Again, prior-art methods will never achieve timing closure under this apportionment scheme. The redesign and re-optimization of individual macros typically target the worst slack, and because the long paths cannot be reduced below 550 time units, the redesign and re-optimization techniques have no incentive to improve the delay of the short paths. Upon successive iterations through the loop of FIG. 1, the delays and targets will be adjusted by decreasing amounts, and will asymptotically approach but not reach timing closure.
With this second prior-art apportionment method, if the long paths in each macro can be improved by 100 units each, and the short paths cannot be improved at all, it is clear that although an easy solution exists for global timing closure, the iteration of FIG. 1 will not converge to the solution in reasonable time. The reason is that the short path's stubborn negative slack at each iteration of FIG. 1 will limit the improvement that is targeted for the long path of each macro.
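The slow convergence in this scenario can be illustrated with a small simulation of the loop of FIG. 1. The sketch below rests on several assumptions that are not stated in the text: the apportionment is the delay-proportional one, each macro optimizer raises the macro's worst slack as far as it can and then stops, and a path is never sped up more than that worst slack requires. The floors of 200 and 500 time units encode the premise that short paths cannot be improved while long paths can be improved by 100. All names are hypothetical.

```python
# Illustrative model only: proportional apportionment plus a worst-slack
# optimizer, iterated as in the loop of FIG. 1.

DEADLINE = 700.0


def apportion(delay_a, delay_b):
    """Split one global path's slack in proportion to delay in A and B."""
    total = delay_a + delay_b
    slack = DEADLINE - total
    return slack * delay_a / total, slack * delay_b / total


def optimize_macro(delays, shares, floors):
    """Raise the macro's worst slack as far as possible, and no further."""
    targets = [d + s for d, s in zip(delays, shares)]  # budgeted delays
    # Best worst slack the macro can reach, limited by its stubborn path.
    best_worst = min(0.0, min(t - f for t, f in zip(targets, floors)))
    # Each path is sped up only enough to match that worst slack.
    return [
        min(d, max(f, t - best_worst))
        for d, t, f in zip(delays, targets, floors)
    ]


def iterate(n_iterations):
    a = [200.0, 600.0]        # macro A: upper (short), lower (long)
    b = [600.0, 200.0]        # macro B: upper (long), lower (short)
    floors_a = [200.0, 500.0]  # smallest delay each path can reach
    floors_b = [500.0, 200.0]
    history = []
    for _ in range(n_iterations):
        shares = [apportion(da, db) for da, db in zip(a, b)]
        new_a = optimize_macro(a, [s[0] for s in shares], floors_a)
        new_b = optimize_macro(b, [s[1] for s in shares], floors_b)
        a, b = new_a, new_b
        history.append(min(DEADLINE - (da + db) for da, db in zip(a, b)))
    return history


slacks = iterate(10)
for i, s in enumerate(slacks, start=1):
    print(f"iteration {i}: worst global slack = {s:.3f}")
```

Under this model the worst global slack shrinks at every pass but remains negative, because the improvement granted to each long path is capped by the negative slack of the short path in the same macro. A different optimizer model would give different numbers; the sketch is meant only to show the mechanism.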
Irrespective of the apportionment method employed, the crux of the problem is that prior-art optimization techniques target only paths with the worst slack and therefore do not improve sub-critical slacks even though such actions would help achieve timing closure from a global vantage point. Improving sub-critical paths also makes it easier downstream in the methodology to focus design efforts in limited areas of the circuit to obtain timing convergence. Thus the formulation of the objective function during individual macro optimization has the unwanted consequence of preventing or impeding overall timing convergence.
It is to be appreciated that this simple example merely illustrates the problem. With a large number of macros and a large number of interconnections between them, the problem is exacerbated and achievement of timing closure becomes an extremely hard problem, leading to costly redesign efforts and increased time-to-market of the product.