A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing. The FPGA configuration is generally specified using a hardware description language (HDL). Contemporary FPGAs contain large numbers of logic gates and random access memory (RAM) blocks for implementing complex digital computations. FPGAs typically contain programmable logic components called “configurable logic blocks” (CLBs) or “logic array blocks” (LABs), together with a hierarchy of reconfigurable interconnects that allows the blocks to communicate with each other. Logic blocks can be configured to perform complex combinational functions or simple logic gates such as AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
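To make the idea of a configurable logic block concrete, the following Python sketch models a k-input lookup table (LUT), the structure most FPGA logic blocks use to implement combinational functions: the configuration is a truth table of 2^k bits, and the inputs select one of them. The class and helper names (`LUT`, `configure`) are hypothetical and are not part of any vendor tool flow.

```python
from typing import Callable, Sequence


class LUT:
    """Hypothetical k-input lookup table: one stored output bit per input combination."""

    def __init__(self, k: int, truth_table: Sequence[int]) -> None:
        if len(truth_table) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} configuration bits")
        self.k = k
        self.bits = [b & 1 for b in truth_table]

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # The inputs form the binary address of the configuration bit to read.
        address = 0
        for position, bit in enumerate(inputs):
            address |= (bit & 1) << position
        return self.bits[address]


def configure(k: int, func: Callable[..., int]) -> LUT:
    """Build the truth table by evaluating func on every input combination."""
    table = [func(*[(a >> i) & 1 for i in range(k)]) for a in range(2 ** k)]
    return LUT(k, table)


and4 = configure(4, lambda a, b, c, d: a & b & c & d)
xor2 = configure(2, lambda a, b: a ^ b)
print(and4(1, 1, 1, 1), and4(1, 0, 1, 1))  # 1 0
print(xor2(0, 1), xor2(1, 1))  # 1 0
```

Reconfiguring the block to a different function changes only the stored bits, not the hardware, which is what lets the same logic block act as an AND gate, an XOR gate, or a more complex function.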
An application circuit can be mapped into an FPGA provided that adequate resources are available. While the number of CLBs/LABs and I/Os required can easily be determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic. For example, a crossbar switch typically requires much more routing than a systolic array with the same gate count. Since unused routing tracks (i.e., wires) increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that fit in terms of lookup tables (LUTs) and I/Os can be routed. The number of tracks is determined by estimates, such as those derived from Rent's rule, or by experiments with existing designs.
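Rent's rule relates the number of external terminals T of a block of logic to its gate count g as T = t * g^p, where t is the average number of terminals per gate and p (the Rent exponent, between 0 and 1) captures how local the design's wiring is. The sketch below uses it to compare two hypothetical designs with the same gate counts; the values of t and p are illustrative assumptions, not measurements of crossbars, systolic arrays, or any particular FPGA.

```python
def rent_terminals(gates: int, t: float, p: float) -> float:
    """Rent's rule: expected external terminals T = t * g**p for a block of g gates."""
    if gates <= 0:
        raise ValueError("gates must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("the Rent exponent p is expected to lie between 0 and 1")
    return t * gates ** p


# Illustrative parameters only: same gate counts, different wiring locality.
T_PER_GATE = 4.0
designs = {"design A (local wiring)": 0.5, "design B (global wiring)": 0.75}

for name, p in designs.items():
    estimates = [round(rent_terminals(g, T_PER_GATE, p)) for g in (16, 64, 256, 1024)]
    print(f"{name}: terminals for 16/64/256/1024 gates = {estimates}")
```

A design with a larger exponent needs more terminals, and therefore more routing tracks, for the same gate count, which is why track counts are provisioned from such estimates rather than from logic capacity alone.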
Generally, in hierarchical networks, information is transmitted and/or received between elements (e.g., computing elements (CEs), switches, etc.) that are directly connected, passing iteratively from one element to the next. Typically, modern hierarchical networks are based on a Benes network in which Y CEs communicate with each other through 2*log2(Y) - 1 stages of 2×2 switches. Benes networks are rearrangeably non-blocking, providing congestion-free communication between CEs. A fat-tree network can reduce the number of stages from 2*log2(Y) - 1 to log2(Y) by allowing communications to flow forward and backward at each stage.
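The stage counts and the rearrangeably non-blocking property can be checked with a small model. The Python sketch below builds an n-terminal Benes network recursively (an input stage, two half-size Benes subnetworks, and an output stage), which gives 2*log2(n) - 1 stages, e.g. 5 stages for 8 terminals as in FIG. 1A. It then sets the switches for an arbitrary permutation with the classic looping algorithm and simulates the result. Treating each CE as a single network terminal, and the names `route`, `simulate`, and `depth`, are simplifying assumptions made for illustration; the fat-tree count assumes one stage per tree level.

```python
import random
from typing import List, Tuple, Union

# A configured Benes network: a single 2x2 switch setting (True = cross), or
# (input-stage settings, upper subnetwork, lower subnetwork, output-stage settings).
Tree = Union[bool, Tuple[List[bool], "Tree", "Tree", List[bool]]]


def benes_stage_count(n: int) -> int:
    """Stages of 2x2 switches in an n-terminal Benes network (n a power of two >= 2)."""
    return 2 * (n.bit_length() - 1) - 1


def fat_tree_stage_count(n: int) -> int:
    """Stages in a radix-2 fat tree over n CEs (one stage per tree level)."""
    return n.bit_length() - 1


def route(perm: List[int]) -> Tree:
    """Looping algorithm: configure the switches so input x reaches output perm[x]."""
    n = len(perm)
    if n == 2:
        return perm[0] == 1
    inverse = [0] * n
    for x, out in enumerate(perm):
        inverse[out] = x
    side = [-1] * n  # 0 = upper subnetwork, 1 = lower subnetwork
    for start in range(n):
        x = start
        while side[x] == -1:
            side[x], side[x ^ 1] = 0, 1  # inputs sharing a switch must split
            # x ^ 1 uses the lower half, so the input bound for the other output
            # of the same output switch must use the upper half.
            x = inverse[perm[x ^ 1] ^ 1]
    first = [side[2 * i] == 1 for i in range(n // 2)]
    upper, lower = [0] * (n // 2), [0] * (n // 2)
    last = [False] * (n // 2)
    for x in range(n):
        if side[x] == 0:
            upper[x // 2] = perm[x] // 2
            last[perm[x] // 2] = perm[x] % 2 == 1
        else:
            lower[x // 2] = perm[x] // 2
    return first, route(upper), route(lower), last


def simulate(tree: Tree, values: list) -> list:
    """Push values through the configured network; return what reaches each output."""
    if isinstance(tree, bool):
        return values[::-1] if tree else list(values)
    first, upper, lower, last = tree
    up_in, low_in = [], []
    for i, cross in enumerate(first):
        a, b = values[2 * i], values[2 * i + 1]
        if cross:
            a, b = b, a
        up_in.append(a)
        low_in.append(b)
    up_out, low_out = simulate(upper, up_in), simulate(lower, low_in)
    out = []
    for j, cross in enumerate(last):
        a, b = up_out[j], low_out[j]
        if cross:
            a, b = b, a
        out += [a, b]
    return out


def depth(tree: Tree) -> int:
    """Number of switch stages on any input-to-output path."""
    return 1 if isinstance(tree, bool) else 2 + depth(tree[1])


perm = list(range(8))
random.shuffle(perm)
tree = route(perm)
arrived = simulate(tree, list(range(8)))
assert all(arrived[perm[x]] == x for x in range(8))
print(depth(tree), benes_stage_count(8), fat_tree_stage_count(8))  # 5 5 3
```

Every permutation can be routed this way, but a new permutation may require resetting switches used by the previous one, which is the sense in which a Benes network is rearrangeably (rather than strictly) non-blocking.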
A Benes network including 8 CEs with 5 stages of 2×2 switches is illustrated in FIG. 1A. The network 100 includes eight CEs 102, each having 2 inputs 110 and 2 outputs 104. Multiple CEs can be abstracted into a single CE by adding more input and output wires. For example, CE 7 and CE 8 can be combined to form a block 112 with 4 outputs. Similarly, CE 7 and CE 8 can be combined to form a block 114 with 4 inputs. Following this abstraction, any single-route Benes network can be transformed into a double-route Benes network, as shown in FIG. 1B. In the network 150, the number of switches per stage is halved compared to network 100 by using double-route switches instead of single-route switches. For example, the switches 116 and 118 illustrated in FIG. 1A are both single-route switches having 2 inputs and 2 outputs, whereas the abstracted switch 152 in FIG. 1B is a double-route switch having 4 inputs (2 pairs) and 4 outputs (2 pairs). Although constructed differently, network 100 and network 150 are functionally identical. Since 4-input CEs are more commonly used in FPGAs, the rest of this description uses 4-input CEs with double-route switches. However, it should be clear to one of ordinary skill in the art that, by using abstraction techniques similar to those discussed above, this description applies to CEs with an arbitrary number of inputs and outputs, and to networks with an arbitrary number of stages built from switches with an arbitrary number of input and output routes. As an example, a Benes network that includes 8 CEs with 5 stages is illustrated in FIG. 1C. The network 180 includes 8 CEs, each having 4 inputs and 4 outputs, connected through 5 stages of double-route switches.
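The statement that a double-route switch can stand in for two single-route switches can be illustrated with a small model. The sketch below assumes (this is not stated above) that each of the two routes through a double-route switch carries one wire from each input pair and has its own straight/cross setting; under that assumption the double-route switch produces exactly the same outputs as two independent 2×2 switches, while halving the number of switches per stage. All function names are hypothetical.

```python
from itertools import product
from typing import Tuple

Pair = Tuple[str, str]  # (wire used by route 0, wire used by route 1)


def single_route_switch(cross: bool, a: str, b: str) -> Tuple[str, str]:
    """A 2x2 switch: pass both inputs straight through, or exchange them."""
    return (b, a) if cross else (a, b)


def double_route_switch(settings: Tuple[bool, bool], top: Pair, bottom: Pair) -> Tuple[Pair, Pair]:
    """Hypothetical double-route switch with 2 input pairs and 2 output pairs.

    Route r switches wire r of the top pair against wire r of the bottom pair
    using its own setting, so the unit acts like two 2x2 switches side by side.
    """
    top0, bottom0 = single_route_switch(settings[0], top[0], bottom[0])
    top1, bottom1 = single_route_switch(settings[1], top[1], bottom[1])
    return (top0, top1), (bottom0, bottom1)


def switches_per_stage(wires: int, routes_per_switch: int) -> int:
    """Switches needed to terminate `wires` wires at one stage (2 ports per route)."""
    return wires // (2 * routes_per_switch)


# Exhaustive check: every setting of the double-route switch matches the
# corresponding pair of single-route switches.
for settings in product((False, True), repeat=2):
    (t0, t1), (b0, b1) = double_route_switch(settings, ("a0", "a1"), ("b0", "b1"))
    assert (t0, b0) == single_route_switch(settings[0], "a0", "b0")
    assert (t1, b1) == single_route_switch(settings[1], "a1", "b1")

print(switches_per_stage(16, 1), switches_per_stage(16, 2))  # 8 4
```

In this model, grouping wires into pairs changes how the network is drawn and counted, not what it can route, which is consistent with network 100 and network 150 being functionally identical.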
A fat-tree network comprising 8 CEs with 3 stages of 2×2 switches is illustrated in FIG. 2. The network 200 includes 2×2 switches 202, each having 4 bi-directional wires on each side of the switch. Although drawn with 4 bi-directional wires 204 on each side, each switch can equivalently be implemented with 8 uni-directional wires 206 (4 inputs and 4 outputs) on each side.
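Because a fat tree lets traffic turn back at any stage, a message only needs to climb as far as the smallest subtree containing both endpoints. The sketch below computes that turnaround stage for a radix-2 fat tree, assuming (for illustration) that CEs are numbered 0 to N-1 so that CEs 2k and 2k+1 share a stage-1 switch, CEs 4k to 4k+3 share a stage-2 subtree, and so on. The function names are hypothetical.

```python
def turnaround_stage(src: int, dst: int) -> int:
    """Lowest stage at which traffic from CE src can turn back toward CE dst.

    The highest bit in which the two CE numbers differ identifies the smallest
    aligned subtree that contains both of them.
    """
    if src == dst:
        return 0  # no switch needed
    return (src ^ dst).bit_length()


def links_traversed(src: int, dst: int) -> int:
    """Links used going up to the turnaround stage and back down again."""
    return 2 * turnaround_stage(src, dst)


N = 8  # CEs in the network of FIG. 2
worst = max(turnaround_stage(a, b) for a in range(N) for b in range(N))
print(turnaround_stage(0, 1), turnaround_stage(0, 5), worst)  # 1 3 3
```

The worst case equals the number of stages (log2 of the CE count, 3 for 8 CEs), while neighboring CEs communicate through a single switch; in a Benes network, by contrast, every message traverses all stages.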
A radix-2 fat-tree network of 16 CEs with 4 stages of 2×2 switches is illustrated in FIG. 3. The network 300 includes cross-routes, and the distance each cross-route spans can be computed from its stage. For example, the cross-routes in the first stage (such as cross-route 302) have a distance of 1, which is 2^(Z-1) where Z=1 (Z being the stage number). The cross-routes in the second stage (such as cross-route 304) have a distance of 2, which is 2^(Z-1) where Z=2. The cross-routes in the third stage (such as cross-route 306) have a distance of 4, which is 2^(Z-1) where Z=3. The cross-routes in the fourth stage (such as cross-route 308) have a distance of 8, which is 2^(Z-1) where Z=4. Further, a radix boundary (illustrated by the dotted lines) can be defined as the boundary that each cross-route crosses in realizing a fat-tree network. For example, in routing between a CE and stage 1, the cross-route 302 crosses the radix boundary 310. Similarly, in transmitting data between stage 1 and stage 2, the cross-route 304 crosses boundary 312. Likewise, between stage 2 and stage 3, the cross-route 306 crosses boundary 314, and between stage 3 and stage 4, the cross-route 308 crosses boundary 316.
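The per-stage cross-route distance 2^(Z-1) and the radix boundaries can be expressed compactly. Since FIG. 3 is not reproduced here, the sketch below assumes that the cross-route leaving column i at stage Z lands in column i XOR 2^(Z-1), and that the boundary it crosses sits at the midpoint of the aligned block of 2^Z columns containing i. These wiring rules and the function names are illustrative assumptions, chosen because they reproduce the distances 1, 2, 4, and 8 given for the 16-CE network.

```python
def cross_route_distance(stage: int) -> int:
    """Columns spanned by a cross-route at stage Z (Z counted from 1): 2**(Z - 1)."""
    if stage < 1:
        raise ValueError("stages are numbered from 1")
    return 1 << (stage - 1)


def cross_route_partner(column: int, stage: int) -> int:
    """Assumed wiring: flip bit Z-1 of the column index."""
    return column ^ cross_route_distance(stage)


def radix_boundary(column: int, stage: int) -> int:
    """Assumed boundary position: it lies between columns b-1 and b, at the
    midpoint of the aligned block of 2**Z columns that contains `column`."""
    block = 1 << stage
    return (column // block) * block + block // 2


COLUMNS, STAGES = 16, 4  # the radix-2 fat tree of FIG. 3
print([cross_route_distance(z) for z in range(1, STAGES + 1)])  # [1, 2, 4, 8]

# Every cross-route spans the boundary assigned to its stage.
for z in range(1, STAGES + 1):
    for c in range(COLUMNS):
        p, b = cross_route_partner(c, z), radix_boundary(c, z)
        assert min(c, p) < b <= max(c, p)
```

Under these assumptions, each stage doubles both the cross-route distance and the spacing between its radix boundaries, which is the pattern the distances 1, 2, 4, and 8 describe.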