For the design of digital circuits (e.g., on the scale of Very Large Scale Integration (VLSI) technology), designers often employ computer aided techniques. Standard languages such as Hardware Description Languages (HDLs) have been developed to describe digital circuits to aid in the design and simulation of complex digital circuits. Several hardware description languages, such as VHDL and Verilog, have evolved as industry standards. VHDL and Verilog are general purpose hardware description languages that allow definition of a hardware model at the gate level, the register transfer level (RTL) or the behavioral level using abstract data types. As device technology continues to advance, various product design tools have been developed to adapt HDLs for use with newer devices and design styles.
In designing an integrated circuit with an HDL code, the code is first written and then compiled by an HDL compiler. The HDL source code describes at some level the circuit elements, and the compiler produces an RTL netlist from this compilation. The RTL netlist is typically a technology independent netlist in that it is independent of the technology/architecture of a specific vendor's integrated circuit, such as field programmable gate arrays (FPGA) or an application-specific integrated circuit (ASIC). The RTL netlist corresponds to a schematic representation of circuit elements (as opposed to a behavioral representation). A mapping operation is then performed to convert from the technology independent RTL netlist to a technology specific netlist which can be used to create circuits in the vendor's technology/architecture. It is well known that FPGA vendors utilize different technology/architecture to implement logic circuits within their integrated circuits. Thus, the technology independent RTL netlist is mapped to create a netlist which is specific to a particular vendor's technology/architecture.
One operation that is often desirable in this process is to plan the layout of a particular integrated circuit in order to control timing problems and to manage interconnections between regions of the integrated circuit. This is sometimes referred to as “floor planning.” A typical floor planning operation divides the circuit area of an integrated circuit into regions, sometimes called “blocks,” and then assigns logic to reside in a block. These regions may be rectangular or non-rectangular. This operation has two effects: the estimation error for the location of the logic is reduced from the size of the integrated circuit to the size of the block (which tends to reduce errors in timing estimates), and the placement and routing typically run faster because the task has been reduced from one very large problem to a series of simpler problems.
Retiming algorithms have been used to optimize the design of a circuit. Typically, a synchronous circuit works properly only when a signal propagates from one register to another along a combinational path (i.e., a path that does not include a register, such as a memory cell, a flip-flop, a delay element, etc.) within a specified number of clock cycles (e.g., in one clock period). Thus, the maximum signal delay on the paths between the registers (e.g., due to the computation time of the combinational computing elements on a path and the wire delays) determines the minimum clock period in which the circuit can work properly. Registers may be placed or repositioned on a path of the circuit to reduce the maximum signal delay on the path and to reduce the clock period of the circuit. A general retiming algorithm may be used to redistribute some of the registers in the circuit, based on a timing model of the circuit, to minimize the clock period.
Typically, the timing model of a circuit is obtained by putting together the timing models of the combinational computation units, delays (e.g., due to the registers), and interconnections that make up the circuit. Interconnect delays are hard to model and thus are often ignored. A typical timing model for a circuit system that includes one or more circuit modules is generated by aggregating the timing models of the combinational computation units of the modules.
Typical retiming algorithms (e.g., described in “VLSI Digital Signal Processing Systems: Design and Implementation” by Keshab K. Parhi, pp. 91-118, Wiley-Interscience, 1999) are formulated based on data flow graphs. Data flow graphs are composed of nodes that represent the combinational computation units and edges interconnecting them. Delays (e.g. registers) are represented as weights on the edges. Each node has an execution time associated with it.
For example, FIGS. 2-3 illustrate a prior art method to construct a data flow graph for retiming. The combinational computation units (e.g., adder 205, multipliers 207 and 209) in FIG. 2 are represented as computation nodes (e.g., nodes 225, 227 and 229 in FIG. 3). Execution time at the combinational computation units is represented by the computation time of the nodes. For example, node 225 has a computation time of 2 ns, which is required by adder 205; and each of nodes 227 and 229 has a computation time of 4 ns, which is required by a multiplier (e.g., 209 or 207). Edge 231 represents the connection between multiplier 207 and adder 205. Edge 231 has a weight of 1, representing register 217 (or the one clock cycle latency due to register 217). Similarly, edge 233 has one delay representing register 215. Edge 235 represents the connection between multipliers 209 and 207; and, there is no delay associated with edge 235.
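As an illustration only, the data flow graph of FIG. 3 might be captured in a small Python structure such as the one below. The node times and edge delays are taken from the description above. The class name `Edge`, the variable names, and in particular the edge directions for edges 231 and 233 are assumptions made for this sketch; the text states only which elements each edge connects, and gives a direction only for edge 235 (from node 229 to node 227).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int    # node that drives the connection (direction assumed)
    dst: int    # node that receives the connection (direction assumed)
    delay: int  # number of registers on the connection (the edge weight)


# Computation time of each node in ns, as stated for FIG. 3.
node_time_ns = {
    225: 2,  # adder 205
    227: 4,  # multiplier 207
    229: 4,  # multiplier 209
}

# Edges keyed by their reference numerals in FIG. 3.
edges = {
    231: Edge(src=227, dst=225, delay=1),  # register 217
    233: Edge(src=225, dst=229, delay=1),  # register 215
    235: Edge(src=229, dst=227, delay=0),  # no register on this connection
}

# Total number of registers represented in the graph.
total_registers = sum(e.delay for e in edges.values())
print(total_registers)
```

Keeping delays as integer edge weights, rather than as separate nodes, means that repositioning a register changes only the weights and leaves the set of nodes and edges untouched.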
A critical path in a data flow graph is the path with the longest computation time among all paths that contain only zero-delay edges (combinational paths). For example, in FIG. 3, the path from node 229 to node 227 contains edge 235, which has zero delay, and this path has the longest computation time (8 ns, of which 4 ns are for node 229 and 4 ns are for node 227). Thus, the minimum clock period for the circuit in FIG. 2 is 8 ns. In FIG. 3, the delay on edge 233 can be moved to edge 235 so that the critical path becomes the path between nodes 225 and 229, which takes only 6 ns of computation time (2 ns for node 225 and 4 ns for node 229). Thus, moving the delay from edge 233 to edge 235, which can be implemented by moving register 215 from between adder 205 and multiplier 209 to between multipliers 209 and 207, allows the modified (retimed) circuit to be operated at a reduced clock period of 6 ns.
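The 8 ns and 6 ns figures can be checked with a short Python sketch that searches for the longest path made up only of zero-delay edges. The function name `critical_path_ns` and the tuple layout `(src, dst, delay)` are hypothetical, and the directions of the edges between nodes 227 and 225 and between nodes 225 and 229 are assumed for illustration.

```python
def critical_path_ns(node_time, edges):
    """Longest total computation time over paths that use only zero-delay edges."""
    # Successors reachable through zero-delay (combinational) edges only.
    succ = {n: [] for n in node_time}
    for src, dst, delay in edges:
        if delay == 0:
            succ[src].append(dst)

    memo = {}

    def longest_from(n, on_stack=()):
        if n in on_stack:
            raise ValueError("zero-delay cycle: no finite critical path")
        if n not in memo:
            tail = max(
                (longest_from(m, on_stack + (n,)) for m in succ[n]),
                default=0,
            )
            memo[n] = node_time[n] + tail
        return memo[n]

    return max(longest_from(n) for n in node_time)


node_time_ns = {225: 2, 227: 4, 229: 4}

# (src, dst, delay); edge directions are an assumption of this sketch.
original_edges = [(227, 225, 1), (225, 229, 1), (229, 227, 0)]
# Same graph after moving the delay from edge 233 to edge 235.
retimed_edges = [(227, 225, 1), (225, 229, 0), (229, 227, 1)]

print(critical_path_ns(node_time_ns, original_edges))  # 8
print(critical_path_ns(node_time_ns, retimed_edges))   # 6
```

In the original graph the only zero-delay edge runs from node 229 to node 227, giving 4 + 4 = 8 ns. After the move, the only zero-delay edge runs from node 225 to node 229, giving 2 + 4 = 6 ns.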
The conventional approach for obtaining the timing model for a circuit module is breaking down the module into the actual registers and combinational computing elements that make up the module and assigning one node to each combinational computing element. Typically, circuit modules in a design are translated into a set of nodes and edges that correspond to the combinational units in the modules and the nets connecting them. In other words, the timing model of each hardware module is typically constructed by putting together the timing models of the combinational computation units, delays, and interconnections that make up the hardware module. The aggregation of the set of nodes and edges used in the translation of a particular hardware module is effectively the timing model (data flow graph) of that hardware module.
Retiming algorithms include cutset retiming and pipelining. Further, there exist retiming algorithms for clock period minimization using the data flow graph. More details about the cutset retiming, pipelining and retiming for clock period minimization can be found in the literature (e.g., “VLSI Digital Signal Processing Systems: Design and Implementation” by Keshab K. Parhi, pp. 97-106, Wiley-Interscience, 1999).
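In the retiming literature, a retiming is commonly described by assigning an integer value r to each node and recomputing each edge weight as w_r(e) = w(e) + r(V) - r(U) for an edge from node U to node V; a retiming is feasible only if no weight becomes negative. The Python sketch below applies that rule to the FIG. 3 example. The function name `retime`, the tuple layout, and the assumed edge directions are illustrative only, and the sketch does not search for an optimal retiming.

```python
def retime(edges, r):
    """Apply w_r(e) = w(e) + r(dst) - r(src) to each (src, dst, w) edge.

    Nodes missing from r are treated as having a retiming value of 0.
    """
    retimed = [(u, v, w + r.get(v, 0) - r.get(u, 0)) for u, v, w in edges]
    if any(w < 0 for _, _, w in retimed):
        raise ValueError("infeasible retiming: an edge has a negative delay count")
    return retimed


# (src, dst, delay) for the FIG. 3 example; directions are assumed.
edges = [(227, 225, 1), (225, 229, 1), (229, 227, 0)]

# A retiming value of -1 at node 229 removes one delay from the edge
# entering node 229 and adds one delay to the edge leaving it.
retimed = retime(edges, {229: -1})
print(retimed)  # [(227, 225, 1), (225, 229, 0), (229, 227, 1)]
```

Under these assumptions, assigning -1 to node 229 alone corresponds to a cutset drawn around that node, and it reproduces the move of register 215 from the connection between adder 205 and multiplier 209 to the connection between multipliers 209 and 207. The total number of delays around the loop stays at two.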