Programmable logic devices are available for implementing any of a wide variety of logic designs. The user designs a logic circuit, and uses software to direct a computer to program a logic device to implement the design. Timing of signals passed between parts of the logic device depends in part upon placement of the user's circuit elements into transistor groups in the physical structure and the resultant arrangement of the metal wires. In general, longer wires mean longer delays, the actual timing depending upon capacitances, resistances, and the number of logic elements a signal must traverse. As transistors have become faster, a larger percentage of the delay in passing a signal through the device is contributed by the wiring. Therefore it has become a worthwhile effort to select wiring segments carefully to minimize the delay caused by these wiring segments.
FPGA devices include a plurality of configurable logic blocks or modules connected together by programmable interconnect lines. The configurable logic blocks can be programmed to implement multiple functions, and the programmable interconnect can be connected together in a wide variety of ways to produce an overall structure which implements the user's design.
In a typical island-type FPGA, a plurality of configurable logic blocks (CLBs) implement functions selected by a user, and must be interconnected to perform an overall function desired by the user. Wire segments are provided in the chip for interconnecting these CLBs. The CLB leads (wires extending from CLBs) may be programmably connected to the wire segments, for example by pass transistors at programmable interconnect points (PIPs).
Also present in the FPGA are switch boxes that selectively interconnect the wire segments to each other at PIPs. These PIPs may offer a variety of connections, for example simple pass transistors, transmission gates, unidirectional buffers, and bidirectional buffers.
Wire segments of different lengths extend between pairs of switch boxes. Longer segments are useful for high fanout signals, or to route signals whose components are placed relatively far apart, while shorter segments are best for local routing. The time delay experienced by a signal routed from one CLB to another will depend upon the kinds of wire segments used and the kind and number of PIPs traversed.
Definitions
In a logic design, a conductive route from one source logic block or register to one destination logic block or register is called a connection. Frequently a signal generated at one source must be sent to several destinations (also referred to as “loads” or “sinks”). The collection of routes from one source to all destinations is called a net. In the hardware which implements the logic design, a single piece of wire which makes up one part of a connection or net is called a segment. One segment can be interconnected to another at a PIP. Typically, a wire segment may be interconnected to any of several other wire segments at a PIP. A path is the set of connections and combinational logic blocks that extend from one package pin or clocked register to another package pin or clocked register.
Motivation to Include Interconnect Wiring in Timing Analysis
If segments are connected at PIPs by turning on transistors, the connection will comprise both wire segments and transistors. Since transistors still have finite resistance in their ON states, the connection will impose a finite RC delay on the signal path. Further, as feature size decreases, the fraction of delay due to wiring increases. Connection delays have become a significant part of the total delay for transistor-connected devices. Even for hard-wired ASIC devices, as feature size has become smaller, wiring has begun to contribute a major part to total delay of signals. In devices such as FPGAs, a variety of routing resources have a variety of time delay characteristics. Therefore routing can significantly affect timing of the configured logic device.
The task of FPGA routing is to select interconnect lines to realize connections in the user's design that join partitioned pieces of the user's logic. In general, a route to connect a source logic block to a destination logic block is selected by adding segments one step at a time to form a continuous connection between source and destination.
Search Algorithms to Establish Connections
The generic routing problem is well studied. The most common search algorithms are based on breadth-first expansion searches, biased so that segments more likely to be useful are explored earlier. Such algorithms will find a path if one exists. The search process is typically repeated for each connection in turn. As later connections are routed, there are fewer free routing resources. It becomes difficult to complete later connections legally, e.g., without crossing other signals. When legal routes cannot be found, the router can either leave affected connections unrouted (“opens”) or temporarily allow them to overlap other signals (“shorts”). Later connections tend to become longer and less efficient, or produce more shorts that must be removed by rerouting. The collective result of finding connections will in general depend on which connections are routed first as well as on the algorithm used to route each connection.
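The sequential search described above can be sketched as follows. This is a minimal illustration, not any particular router: it uses a plain, unbiased breadth-first expansion, and the routing graph, the node names, and the `occupied` parameter are all hypothetical.

```python
from collections import deque

def bfs_route(graph, source, sink, occupied=frozenset()):
    """Breadth-first expansion from source to sink.

    graph maps each routing node to the nodes reachable through one PIP.
    occupied holds nodes already used by earlier connections (assumed
    unavailable, so the result is a legal route or None, i.e. an "open").
    """
    parent = {source: None}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == sink:
            # Walk the parent links back to the source to recover the route.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent and nxt not in occupied:
                parent[nxt] = node
                frontier.append(nxt)
    return None

# Hypothetical routing graph: two disjoint ways to get from S to T.
graph = {
    "S": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["T"],
    "d": ["T"],
    "T": [],
}

first = bfs_route(graph, "S", "T")
later = bfs_route(graph, "S", "T", occupied={"a", "c"})
blocked = bfs_route(graph, "S", "T", occupied={"a", "b"})
```

Here `later` has to take the other route because an earlier connection consumed `a` and `c`, and `blocked` is `None`, which illustrates how the outcome depends on the order in which connections are routed.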
A connection can be routed using different goals. In resource-based routing, the focus is on completing the connection with minimum total cost of the resources consumed, where each segment and PIP is assigned a cost depending, e.g., on its reach and the scarcity of that type of resource.
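Resource-based routing can be illustrated with a shortest-path search over resource costs. The sketch below uses Dijkstra's algorithm as one common way to minimize total cost; the cost values and node names are invented for illustration and do not come from any real device.

```python
import heapq

def min_cost_route(edges, source, sink):
    """Return (total_cost, route) for the cheapest route, or None.

    edges maps a node to a list of (neighbor, cost) pairs, where cost is
    the assumed price of using that segment and the PIP leading to it.
    """
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == sink:
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return cost, route[::-1]
        if cost > best[node]:
            continue  # stale heap entry; a cheaper way to node was found
        for nxt, step in edges.get(node, ()):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None

# Hypothetical costs: the long line is scarce, so it is priced high.
edges = {
    "S": [("long", 4.0), ("short1", 1.0)],
    "long": [("T", 0.5)],
    "short1": [("short2", 1.0)],
    "short2": [("T", 1.0)],
}

result = min_cost_route(edges, "S", "T")
```

With these assumed costs the search prefers the three short segments (total cost 3.0) over the single scarce long line (total cost 4.5), even though the long line traverses fewer PIPs.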
In delay-based routing, the goal is to find a route for which the connection delay is less than a target determined to support the overall (path) timing constraints of the design.
Register-to-Register Transfers
A typical logic design uses both clocked registers (flip-flops) and un-clocked combinational and arithmetic functions. Timing of the overall device (how fast the device can be clocked) is determined by how quickly a signal can propagate from the output of one clocked register to the input of the next clocked register. This in turn depends upon how quickly any intermediate logic can be processed, and on delays in wire segments and PIPs that connect the combinational logic blocks along the path.
Improving the Device Performance—Critical Paths
The time delay for each path used can be computed once routes have been selected. Users are generally interested in minimizing the longest delay in propagating a signal between one clocked register and another because this delay determines the maximum clock speed which can be used with the device when it is implementing the user's function. The slowest or “critical” path limits the speed of the device. Critical path delay must be reduced if the overall speed of the device is to be improved. The task of transforming the results of critical path analysis into guidance for routing is not trivial.
Slack
When a timing requirement on one connection of a logic design is missed while the requirement on another connection is met with time to spare, it may be possible to adjust the routes: the connection with time to spare is made slower so that the failing connection can be made faster, until all paths meet their requirements. The time to spare is called slack. The slack of a path is defined as R(p)−A(p), where R(p) is the required propagation time along path p and A(p) is the actual total propagation time along path p. Positive slack indicates a path with time to spare. Negative slack indicates that a timing requirement has been missed. Near-zero (positive) slack means a timing requirement is barely met. Paths routed with positive slack can be rerouted to have less positive slack if that action allows other paths with negative slack to be rerouted so that they meet a timing requirement.
Slack Calculation for Connections of a Path
In a typical circuit, there will be both fan-in (multiple signals entering an element) and fanout (output signals applied to more than one element). The slack of a connection, slack(c), is defined as the minimum slack of any path that includes c. Equivalently,

$$\mathrm{slack}(c) = R(c) - A(c) \qquad (1)$$

where:
    R(c) is the earliest required arrival time of a signal at the output end of connection c, and
    A(c) is the latest actual arrival time of a signal at the output end of connection c.
R. B. Hitchcock, Sr., G. L. Smith, and D. D. Cheng, in “Timing Analysis of Computer Hardware,” IBM J. Res. Develop., Vol. 26, No. 1, pp. 100-108, 1982, described how to compute slacks of all connections in a design. Two linear-time computations suffice: forward propagation of actual times, and backward propagation of required times.
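The two propagation passes can be sketched on a small timing graph. This is an illustrative sketch under stated assumptions, not the cited authors' code: block delays are folded into the connection delays, every source launches at time 0, A(c) is taken as the arrival through connection c itself, and all names and numbers are hypothetical. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

def connection_slacks(delays, required_at_sinks):
    """Compute slack(c) = R(c) - A(c) for every connection c = (u, v).

    delays maps (u, v) to the delay of that connection.
    required_at_sinks maps each endpoint with no fanout to its required time.
    """
    preds, succs = {}, {}
    for u, v in delays:
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])
        succs.setdefault(u, []).append(v)
        succs.setdefault(v, [])
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: latest actual arrival time at each node.
    arrival = {}
    for n in order:
        arrival[n] = max(
            (arrival[p] + delays[(p, n)] for p in preds[n]), default=0.0
        )

    # Backward pass: earliest required time at each node.
    required = {}
    for n in reversed(order):
        if succs[n]:
            required[n] = min(required[s] - delays[(n, s)] for s in succs[n])
        else:
            required[n] = required_at_sinks[n]

    return {
        (u, v): required[v] - (arrival[u] + d) for (u, v), d in delays.items()
    }

# Hypothetical design: two registers feed block G, which feeds two registers.
delays = {
    ("FF1", "G"): 2.0,
    ("FF2", "G"): 5.0,
    ("G", "FF3"): 3.0,
    ("G", "FF4"): 1.0,
}
slacks = connection_slacks(delays, {"FF3": 10.0, "FF4": 5.0})
```

In this example the connections FF2→G and G→FF4 come out with slack −1.0 because they lie on the one path that misses its requirement, while the other two connections have slack 2.0, matching the minimum path slack through each.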
Routing delay from a net's source S to a load L is a function of the topology of the routing to all loads of the net, not just the segments along the route from S to L. This effect can be approximated by expressions described by Penfield and Rubinstein (Penfield, Paul Jr. and Rubinstein, Jorge, “Signal Delay in RC Tree Networks”, Proceedings 18th Design Automation Conference, 613-617, June 1981), extending the early analysis presented by W. C. Elmore, “The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers”, Journal of Applied Physics, vol. 19, no. 1, pp. 55-63, January 1948. The approximate delay from source S to load Lj is

$$T(S, L_j) = \sum_k R_k \cdot \mathrm{Cap}(k)$$

where the summation is over all wire segments k on the route from S to Lj, Rk is the resistance of wire segment k, and Cap(k) is the total capacitance in the downstream sub-tree rooted at k, i.e., on the other side of k from the source S.
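The delay expression can be evaluated directly on a small RC tree. The sketch below assumes each segment's own capacitance is lumped at its far end and therefore counts toward its downstream sub-tree; the tree layout, segment names, and values (arbitrary units) are hypothetical.

```python
def downstream_cap(tree, seg):
    """Total capacitance of the sub-tree rooted at seg (seg included)."""
    _r, c, children = tree[seg]
    return c + sum(downstream_cap(tree, child) for child in children)

def rc_tree_delay(tree, route):
    """Sum of R_k * Cap(k) over the segments k on the route from S to a load."""
    return sum(tree[seg][0] * downstream_cap(tree, seg) for seg in route)

# Hypothetical net: segment k1 leaves the source, then forks into k2 and k3.
# Each entry is (resistance, capacitance, child segments).
tree = {
    "k1": (2.0, 1.0, ["k2", "k3"]),
    "k2": (1.0, 0.5, []),
    "k3": (4.0, 2.0, []),
}

delay_to_k2_load = rc_tree_delay(tree, ["k1", "k2"])
delay_to_k3_load = rc_tree_delay(tree, ["k1", "k3"])
```

Note that the capacitance of `k3` appears in the delay to the load on `k2` through Cap(k1), even though `k3` is not on that route. This is the topology dependence described above.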
The relevance of this formula to the routing of multi-load nets can be seen by studying the change in delay at a given load (Lj) that results from completing an additional route from the same source (S) to a different load (Ln) that branches off from the original route at a node B.
FIG. 1a shows an RC network 100 of the background art representing a routed connection S to Lj. The routed delay from S to Lj can be defined as Dj.
FIG. 1b shows the same RC network from S to Lj with a new routed branch 110 from node B to Ln. In the example shown in FIG. 1b, the routing to Ln adds delay to the existing route to Lj. The added delay is caused by the additional parallel capacitance (between B and Ln) loading the upstream resistance between S and B. The new delay at Lj is given by the following equation:

$$D_j(\mathrm{new}) = D_j(\mathrm{old}) + \sum_k R_s \, C_k$$

where the summation is over segments k on the new branch from B to Ln, and Rs is the upstream resistance common to the paths from S to Lj and from S to Ln. In this case, Rs=R1+R2+R3 and the Ck include C4, C5, and C6.
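The delay penalty from a new branch can be worked through numerically. The values below are invented for illustration (arbitrary units) and are not taken from the figures; the function name and parameters are hypothetical.

```python
def delay_after_branch(d_old, upstream_resistances, branch_capacitances):
    """Delay at the existing load after a new branch is attached at node B.

    upstream_resistances: resistances of the shared segments between S and B.
    branch_capacitances: capacitances of the segments on the new branch.
    """
    r_s = sum(upstream_resistances)  # shared upstream resistance
    return d_old + sum(r_s * c_k for c_k in branch_capacitances)

# Assumed values: three shared segments of 1.0 each, three branch
# segments of capacitance 0.5 each, and an original delay of 10.0.
d_new = delay_after_branch(10.0, [1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
penalty = d_new - 10.0
```

Under this approximation the penalty grows with both the shared upstream resistance and the capacitance of the new branch, while the resistance of the branch segments themselves does not enter the delay at the existing load.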
Previous approaches to routing used two main techniques to reduce connection delays: early branching and preservation of critical load routing. First, in “early branching”, after a first load has been routed, delay-based routes are encouraged to pick branch points close to the signal driver. Although this reduces the upstream resistance that contributes to the delay of later loads (and hence helps keep delay small), it does not eliminate the secondary effect on critical loads of capacitance on side routes past the branch point, termed “parallel capacitance”.
Second, routes to critical loads are preserved during rip-up and retry phases. This optimization, however, does not exploit the control over parallel capacitance that could be gained by also preserving well-chosen routes to non-critical loads.
What is needed is a new delay-driven routing method that is less vulnerable to increases in delay due to uncontrolled increases in capacitance on routing branches parallel to critical routes.