1. Technical Field
The invention relates to a method for generating optimized timing constraint systems for retimable digital designs.
2. Description of the Prior Art
The following is a description of a generic logic synthesis system, with descriptive emphasis being given to those aspects of the system that are most relevant to the invention described herein. FIG. 1 shows the flow through a generic logic synthesis system 11 that features retiming, and that produces a netlist 19 expressing an optimized design. The input design 10 is expressed in a hardware description language (HDL), such as Verilog or VHDL. This text is analyzed, i.e. parsed and translated into an initial circuit representation 12. In the next step 14, a clock signal is declared, and a clock period is associated with the clock signal. A clock is present in both combinational and sequential circuits. In a combinational circuit the clock signal is a dummy, but the period must still be declared. In both the combinational and sequential cases, the clock period is used to constrain the timing of the circuit.
In the case of a combinational circuit, the paths being constrained begin at inputs and end at outputs. The clock period determines the allowable difference between the arrival time at the inputs and the required time at the outputs. Where additional timing offsets are needed, e.g. an additional delay on a particular input or output pin, these can be expressed relative to the arrival and required times implicit in the clock period. Thus, for example, if one specifies a clock period of 100 nanoseconds, the default delay allowed between an input I and an output O is 100 nanoseconds. If a specific input pin X had an arrival time 10 nanoseconds later than the others, one would express this as being 10 nanoseconds late with respect to the declared clock.
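The budget arithmetic in the 100-nanosecond example can be sketched as follows. This is a minimal illustration, not a method taken from this description; the function name `allowed_path_delay` is hypothetical, and modeling an output offset as "required earlier than the clock reference" is an assumption.

```python
def allowed_path_delay(clock_period_ns, input_delay_ns=0.0, output_delay_ns=0.0):
    """Maximum input-to-output combinational delay in a combinational design.

    clock_period_ns: period declared for the (dummy) clock.
    input_delay_ns: how much later than the clock reference an input arrives.
    output_delay_ns: hypothetical; how much earlier than the clock reference
    an output is required (an assumed way of modeling output offsets).
    """
    return clock_period_ns - input_delay_ns - output_delay_ns


print(allowed_path_delay(100.0))        # 100.0: default budget from I to O
print(allowed_path_delay(100.0, 10.0))  # 90.0: pin X arrives 10 ns late
```

Expressing every offset relative to the one declared period keeps a single reference point for all pins, which is why the late pin X is described as "10 nanoseconds late" rather than given an absolute arrival time.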
In the case of a sequential circuit, a constrained path may begin either at a circuit input pin or at the Q pin of a flip-flop, and may end either at a circuit output pin or at the D pin of a flip-flop. For a path that begins at a flip-flop and ends at a flip-flop, the path delay constraint is the period of the clock that drives the flip-flops. In the other cases, the timing relationship between the I/O port and the flip-flop is computed in a manner known to those skilled in the art.
It is sometimes helpful to visualize this kind of timing constraint system as implemented by collections of variously colored tokens. Each token bears a color corresponding to a particular clock. A token is launched either at an input pin or at the Q pin of a flip-flop clocked by the clock whose color the token bears. It propagates through combinational logic, accumulating delay as it goes, and finally arrives at, and is absorbed by, either an output pin or the D pin of a flip-flop. The timing relationship (if there is one) between the token and its final destination can then be determined by comparing the color of the token with the clock that constrains the flip-flop or output. Thus, for example, if a green token arrives at the D pin of a blue-clocked flip-flop, then the delay accumulated on the token must be less than or equal to the worst-case time between the valid edges of the green and blue clocks, if such a worst-case time is defined. If a worst-case time is not defined, the token is ignored and no constraint is adduced.
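The colored-token picture maps naturally onto a forward pass over a topologically ordered netlist. The sketch below is illustrative only: the netlist encoding, the names `propagate_tokens` and `check_capture`, and the `max_separation` table of worst-case edge separations are all assumptions, and the delays are made-up numbers.

```python
def propagate_tokens(order, fanins, delay, launch_color):
    """Forward pass: worst-case arrival time of each token color at each node.

    order: nodes in topological order.
    fanins: node -> list of driving nodes.
    delay: node -> delay added as a token passes through the node.
    launch_color: launch points (inputs or flip-flop Q pins) -> clock color.
    """
    arrival = {}
    for node in order:
        if node in launch_color:
            # A token is launched here with zero accumulated delay.
            arrival[node] = {launch_color[node]: 0.0}
            continue
        tokens = {}
        for src in fanins[node]:
            for color, t in arrival[src].items():
                # Keep only the latest (worst-case) token of each color.
                tokens[color] = max(tokens.get(color, float("-inf")), t + delay[node])
        arrival[node] = tokens
    return arrival


def check_capture(tokens, capture_color, max_separation):
    """Compare tokens absorbed at a D pin with the launch-to-capture limit.

    max_separation: (launch_color, capture_color) -> worst-case time between
    valid edges. Pairs without an entry are unconstrained and ignored.
    """
    violations = []
    for color, t in tokens.items():
        limit = max_separation.get((color, capture_color))
        if limit is not None and t > limit:
            violations.append((color, t, limit))
    return violations


# Hypothetical example: a green-launched path and a blue-launched path
# converge on the D pin of a blue-clocked flip-flop driven by g2.
order = ["q_green", "q_blue", "g1", "g2"]
fanins = {"g1": ["q_green"], "g2": ["g1", "q_blue"]}
delay = {"g1": 3.0, "g2": 4.0}
launch_color = {"q_green": "green", "q_blue": "blue"}
max_separation = {("green", "blue"): 5.0, ("blue", "blue"): 10.0}

arrival = propagate_tokens(order, fanins, delay, launch_color)
print(arrival["g2"])  # {'green': 7.0, 'blue': 4.0}
print(check_capture(arrival["g2"], "blue", max_separation))  # [('green', 7.0, 5.0)]
```

Keeping one worst-case token per color at each node, rather than one token per path, is what keeps the pass linear in the size of the netlist.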
For purposes of logic synthesis, it is also convenient to imagine a second set of tokens being propagated backwards through the same circuit, in a symmetrical manner, with delays being subtracted instead of added, thus computing required times.
The term ‘slack’ denotes the difference between the required time and the arrival time at a particular net or pin of the circuit. A positive slack indicates that the circuit satisfies the constraint in question; a negative slack indicates that it does not.
Hence, the worst-case slack of a pin P of the circuit can be computed as the minimum, taken across all colors and valid combinations of colors, of the difference between the required (backward traversal) and arrival (forward traversal) token delays recorded at P as the tokens pass through it. It is also common practice to speak simply of the ‘slack’ of a gate, meaning the worst-case slack of the gate's output pin (usually there is only one).
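For the simplest case of a single clock (one token color), the forward and backward traversals and the resulting slacks can be sketched as below. This is an assumed, simplified model rather than the method of any particular tool: every node without fanout is treated as an endpoint required at one clock period, and the netlist, delays, and the name `compute_slacks` are hypothetical. Multiple colors would add a minimum over valid color combinations at each pin.

```python
def compute_slacks(order, fanins, delay, period):
    """Single-clock slack: required time minus arrival time at each node output."""
    # Forward traversal: arrival times (tokens accumulate delay).
    arrival = {}
    for n in order:
        arrival[n] = delay[n] + max((arrival[p] for p in fanins.get(n, [])), default=0.0)

    # Fanout lists are needed for the backward traversal.
    fanouts = {n: [] for n in order}
    for n in order:
        for p in fanins.get(n, []):
            fanouts[p].append(n)

    # Backward traversal: required times (delays subtracted instead of added).
    required = {}
    for n in reversed(order):
        if not fanouts[n]:
            required[n] = period  # endpoint: must settle within one clock period
        else:
            required[n] = min(required[s] - delay[s] for s in fanouts[n])

    return {n: required[n] - arrival[n] for n in order}


# Hypothetical netlist: inputs a and b, gate g1 = f(a, b), g2 = f(g1), g3 = f(b).
order = ["a", "b", "g1", "g2", "g3"]
fanins = {"g1": ["a", "b"], "g2": ["g1"], "g3": ["b"]}
delay = {"a": 0.0, "b": 0.0, "g1": 2.0, "g2": 3.0, "g3": 1.0}
print(compute_slacks(order, fanins, delay, period=6.0))
# {'a': 1.0, 'b': 1.0, 'g1': 1.0, 'g2': 1.0, 'g3': 5.0}
```

Every node on the critical path a-g1-g2 shares the same slack of 1.0, while g3, off that path, has more room. Slack differences of this kind are what logic synthesis reacts to.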
The next step of the generic flow pictured in FIG. 1 is logic synthesis 16. Here the circuit is optimized by restructuring its logic. The primary objective is usually to meet the timing constraints as expressed in the previous step, and the secondary objective is usually to minimize circuit area, gate count, or some other cost function such as power consumption. The logic synthesis software uses the difference between token arrival times and token required times to drive its decision making process.
Consider, for example, the two circuits shown in FIGS. 2a and 2b. The two circuits shown both implement the logical AND of five literals A, B, C, D, and E. In FIG. 2a, the function is implemented by a single five-input gate 20, and in FIG. 2b, by a degenerate tree 22 of four two-input gates, 23-26. Neither of these circuits is intrinsically better than the other. The one a synthesis tool ought to choose depends on the arrival times and slacks of the signals A-E. If, for example, all five inputs arrive at the same time and have the same slack, the circuit shown in FIG. 2a is probably better; whereas if E is a relatively late-arriving, low-slack signal, then the circuit shown in FIG. 2b is probably better.
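The tradeoff between the two structures can be made concrete by computing the output arrival time of each. The gate delays below are hypothetical placeholders (a real flow would take them from a cell library), and the sketch assumes that in the degenerate tree A and B enter the deepest gate while E feeds the final gate directly.

```python
# Hypothetical delays; real values would come from the cell library.
AND5_DELAY = 3.0
AND2_DELAY = 1.0


def single_gate_arrival(arrivals):
    """Output arrival of one five-input AND gate (FIG. 2a style)."""
    return max(arrivals) + AND5_DELAY


def degenerate_tree_arrival(arrivals):
    """Output arrival of a chain of two-input ANDs (FIG. 2b style).

    arrivals[0] and arrivals[1] enter the deepest gate; the last input
    feeds the final gate directly.
    """
    t = max(arrivals[0], arrivals[1]) + AND2_DELAY
    for a in arrivals[2:]:
        t = max(t, a) + AND2_DELAY
    return t


balanced = [0.0, 0.0, 0.0, 0.0, 0.0]  # A..E all arrive together
late_e = [0.0, 0.0, 0.0, 0.0, 3.0]    # E arrives late
print(single_gate_arrival(balanced), degenerate_tree_arrival(balanced))  # 3.0 4.0
print(single_gate_arrival(late_e), degenerate_tree_arrival(late_e))      # 6.0 4.0
```

With these assumed delays the single gate wins when all inputs arrive together, and the tree wins when E is late, matching the qualitative argument above.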
The example discussed above represents only one kind of optimizing decision. There are many other kinds of optimizing decisions that a logic synthesis system can make. Each of these can be characterized by two or more alternative designs or classes of designs, and by a tradeoff between optimizing some properties and degrading others. For example, in the simple AND-gate example, the design shown in FIG. 2b degrades the path A-F while optimizing the path E-F. Area, power, and other cost functions are affected by these tradeoffs as well.
Some of these optimizing decision classes are:
- Sharing of high-level functional units such as adders, multipliers, etc. (multiplexing inputs);
- Speculation of high-level functional units (multiplexing outputs);
- Implementation styles of high-level functional units, e.g. the choice of carry-lookahead as opposed to ripple-carry adders;
- The use of complex gates as opposed to collections of simple gates;
- The choice of drive strength within a family of functionally similar gates;
- Input swapping; and
- Functional decomposition.
These decisions have a profound effect on the overall performance and cost of the design.
Following logic synthesis is a retiming step 18. This step is not always present, because retiming is most useful in the context of a pipelined design. In retiming, registers are repositioned in the design in such a way as to preserve overall functionality while optimizing the achievable clock frequency and the register count.
Consider, for example, FIG. 3. In this circuit 30, the maximum achievable clock frequency is determined by the delay on the path A-F1. However, if the flip-flop F1 32 is retimed backward through the gate G 34, in effect creating two flip-flops on the inputs of G, the clock period is shortened by the delay of G and the maximum clock frequency rises accordingly. If those two flip-flops are then retimed further to the left (see the three flip-flops 32′ in the circuit shown in FIG. 4), the clock frequency improves still more.
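The effect of moving a register backward along a path can be sketched with a deliberately simplified model: a single linear chain of gates with one register, where the minimum clock period is the largest delay between consecutive timing boundaries (the circuit input, the register, and the circuit output). The delays and the name `min_clock_period` are hypothetical, and the model ignores the growth in register count that occurs when a register crosses a multi-input gate such as G.

```python
def min_clock_period(gate_delays, reg_after):
    """Largest combinational delay between consecutive timing boundaries.

    gate_delays: delays of gates along a single linear path, input to output.
    reg_after: indices i such that a register sits on the output of gate i.
    The circuit input and output are treated as boundaries as well.
    """
    bounds = [-1] + sorted(reg_after) + [len(gate_delays) - 1]
    return max(sum(gate_delays[lo + 1:hi + 1]) for lo, hi in zip(bounds, bounds[1:]))


delays = [5.0, 1.0, 1.0]  # hypothetical; the last gate plays the role of G
print(min_clock_period(delays, [2]))  # 7.0: register on G's output
print(min_clock_period(delays, [1]))  # 6.0: moved back through G
print(min_clock_period(delays, [0]))  # 5.0: moved one gate further left
```

The first move shortens the period by exactly the delay of G; further moves keep helping only while the logic left behind the register remains the longer segment.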
At this point it is useful to make two observations about the generic synthesis flow being described herein.
First, logic synthesis never changes the population of flip-flops in the design, except where a flip-flop can be deleted entirely, either because there is no path from its output to any output of the circuit as a whole or as the result of a constant propagation process.
In this context, the population of flip-flops of a design can be thought of as the set of Boolean functions that drive flip-flop D pins. Thus, the number of functions in the population determines the number of flip-flops in the design, and the sequential (Mealy machine) behavior of the design is completely described by the union of the flip-flop population and the functions that drive the output pins of the design.
Second, retiming only repositions flip-flops, and never changes the circuit other than by deleting and inserting flip-flops. In other words, once logic synthesis has come up with a certain topology for the combinational logic, retiming cannot change that topology. Likewise, once HDL analysis or retiming has come up with a flip-flop population, logic synthesis cannot change that population. Thus, it is easy to construct examples of circuits where a poor choice of the initial flip-flop population can be remedied only with difficulty, if at all.
Consider, for example, the circuits shown in FIGS. 5a and 5b. If retiming moves F 52 one gate to the left in each case, the circuit 50 of FIG. 5a ends up with five flip-flops, none of which can be removed by logic synthesis, whereas the circuit 51 of FIG. 5b ends up with only two.
The salient implication of these properties is that a well-chosen set of initial timing constraints and a well-chosen flip-flop population lead to a better design after retiming. It is seldom easy, however, to construct such a constraint set and flip-flop population, because the exact positioning of the flip-flops, the delays of the gates, and the clocking constraints collectively determine the slack of the gates and hence the retiming solution, yet the delays of the gates cannot be known until after logic synthesis has chosen the gates and the topology by which they are interconnected.
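Why the flip-flop count depends on the surrounding logic can be sketched by counting the nets a register lands on after a backward move. The exact structure of FIGS. 5a and 5b is not given in this text, so the gates below (a five-input gate and a two-input gate) are hypothetical stand-ins chosen only to reproduce the counts of five and two. The sketch assumes each moved gate's output drives only the register being moved, and that registers landing on the same net are shared.

```python
def flops_after_backward_move(fanins, moved_gates):
    """Count flip-flops after moving registers backward across gates.

    Assumes each gate in moved_gates drives only the register on its output,
    so that register can be replaced by one register per input net.
    Registers landing on the same net are shared.
    """
    input_nets = set()
    for gate in moved_gates:
        input_nets.update(fanins[gate])
    return len(input_nets)


# Hypothetical gates and nets, for illustration only.
fanins = {
    "and5": ["a", "b", "c", "d", "e"],  # one wide gate
    "and2": ["x", "y"],                  # one narrow gate
    "p": ["x", "y"],
    "q": ["y", "z"],
}
print(flops_after_backward_move(fanins, ["and5"]))    # 5
print(flops_after_backward_move(fanins, ["and2"]))    # 2
print(flops_after_backward_move(fanins, ["p", "q"]))  # 3: net y is shared
```

Because logic synthesis cannot later merge or remove such registers, the count produced by a move like this is locked in, which is the point of the FIG. 5 comparison.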
Another observation is that optimizing decisions are driven primarily by the timing slack available at the various gates of the circuit. In other words, two copies of a small region of the circuit tend toward the same general topology if the slacks are the same, almost independently of the surrounding logic.
The constraint generation system should therefore have a property referred to henceforth as slack equivalence. Slack equivalence means that, for any gate G, the timing slack at G is the same under the initial constraint system as in an optimally retimed version of the circuit. Thus, if the entire circuit is slack-equivalent, logic synthesis sees the same local constraints as if the circuit had already been retimed in a near-optimal way.
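Because retiming never changes the gates themselves, the definition of slack equivalence can be checked gate by gate, given two per-gate slack maps however they were computed. The sketch below is a hypothetical check, not part of the described system; the name `slack_mismatches`, the tolerance, and the slack values are all assumptions.

```python
import math


def slack_mismatches(slack_initial, slack_retimed, tol=1e-9):
    """Return the gates whose slack differs between the two constraint systems.

    An empty result means the two slack maps are slack-equivalent
    (within the floating-point tolerance tol).
    """
    gates = set(slack_initial) | set(slack_retimed)
    return sorted(
        g for g in gates
        if g not in slack_initial or g not in slack_retimed
        or not math.isclose(slack_initial[g], slack_retimed[g], abs_tol=tol)
    )


# Hypothetical slack values (ns) under the initial constraints and after retiming.
initial = {"g1": 1.0, "g2": 0.5, "g3": 2.0}
retimed = {"g1": 1.0, "g2": 0.5, "g3": 1.5}
print(slack_mismatches(initial, retimed))  # ['g3']
```

A tolerance is used instead of exact equality because slacks are sums of floating-point delays, and the same sums accumulated in a different order can differ in the last bits.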
Slack equivalence is useful because it allows logic to be optimized independently of the distribution of registers in the design, and hence it alleviates the problem that logic synthesis is unable to change the population of registers.