As individual reconfigurable computational elements are adapted to perform more complex logic, arithmetic, and other processor-oriented functions, at least two vital structural components have emerged as impediments to efficient use of computational resources and to enhanced performance of reconfigurable processors. These two structural components are: (1) the programmable routing resources disposed within reconfigurable logic arrays, and (2) the bit permutation resources that are programmatically formed from reconfigurable computational elements.
As to the latter, specialized permutation circuits, such as shifting circuits, have traditionally been formed by programming reconfigurable computational elements to perform such functions. In conventional designs, a significant amount of computational resources is dedicated to supporting these shifter circuits and bit-manipulative circuits, which consume a relatively large amount of circuit area. Regarding the former, programmable routing resources (or routing networks) are crucial to connect an output of any one reconfigurable element to an input of any other reconfigurable element. To do so, reconfigurable computational elements as well as switches generally require configuration data, including routing data, to program their respective functionalities. But as those functionalities become increasingly complex, the number of interconnections also increases, which further burdens the routing resources. As a result, the increased number of interconnections in routing resources causes circuit area and timing delays to increase correspondingly.
FIG. 1 illustrates a conventional routing network typically formed within a reconfigurable logic array 100. Reconfigurable logic array 100 includes a number of Arithmetic Logic Elements (“ALEs”) 101, 105, 106, 107 and 109 as reconfigurable computational elements for performing logic and arithmetic functions and requires a number of routing resources. First, a typical ALE 101 requires interconnections for receiving configuration data bits to implement logic functions and also requires interconnections between one to two logic outputs and three or more logic inputs to a routing network. Horizontal routing blocks 102 and vertical routing blocks 103 represent the routing network, and each typically consists of a large number of switches for routing data among ALEs 101. Also, a large number of interconnections reside in routing blocks 102, 103 to support transport of configuration data bits to control the data path routings. For example, switches and interconnects at cross-bar routing block 108 must be sufficient to configure the routing of the output of ALE 105 to the input of ALE 109 by first routing data over one of horizontal routing blocks 102 and then routing data over one of vertical routing blocks 103.
FIG. 2A depicts a switch circuit 300 commonly used to facilitate data path routing in routing blocks 102, 103 of FIG. 1. Switch circuit 300 is a 16 input-to-1 output routing block that selects any one input 302 (e.g., any input P.0 to input P.15) and then routes data from a selected input 302 via an enabled cross-point gate 303 to output (“Y”) 301. In operation, at most one static configuration signal 305 (i.e., any signal from “cf0” to “cf15”) controls the activation of a specific cross-point gate 303. Static configuration signal 305 is a “static” signal. That is, it is generated by a specific configuration register, the contents of which are loaded when reconfigurable logic array 100 is initialized. Thereafter, the state of the static signals remains unchanged. FIG. 2B illustrates a typical implementation of cross-point gate 303 of FIG. 2A including an inverter 292 and a complementary metal-oxide semiconductor (“CMOS”) transmission gate 294 to operate as a three-state switch.
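The selection behavior described for switch circuit 300 can be sketched in software as follows. This is only an illustrative behavioral model, not a description of the circuit itself: the function name `route_switch` and the use of `None` to stand for an undriven output are assumptions introduced here, and the configuration bits are treated as fixed values loaded once, in keeping with the static signals “cf0” to “cf15”.

```python
from typing import Optional, Sequence


def route_switch(inputs: Sequence[int], config: Sequence[int]) -> Optional[int]:
    """Behavioral model of an N-input-to-1-output switch (hypothetical helper).

    inputs -- data values on the switch inputs (P.0 .. P.15 for N = 16)
    config -- static configuration bits (cf0 .. cf15); at most one may be set
    Returns the value routed to output Y, or None if no gate is enabled.
    """
    if len(inputs) != len(config):
        raise ValueError("one configuration bit is needed per input")

    # Indices of the cross-point gates whose configuration bit is set.
    enabled = [i for i, bit in enumerate(config) if bit]

    if len(enabled) > 1:
        # Two enabled gates would drive output Y from two inputs at once.
        raise ValueError("at most one cross-point gate may be enabled")
    if not enabled:
        return None  # assumption: models an undriven (three-stated) output

    return inputs[enabled[0]]


# Example: enable the gate for input P.5 of a 16-to-1 switch.
data = list(range(100, 116))
cfg = [0] * 16
cfg[5] = 1
selected = route_switch(data, cfg)
```

In this sketch fifteen of the sixteen inputs are ignored for any given configuration, which is the per-switch underuse that a cross-bar built from such switches inherits.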
FIG. 3 depicts a cross-bar routing block 108 for routing data via switch circuits 300 between the outputs and the inputs of the reconfigurable computational elements. In particular, FIG. 3 shows a specific data path for routing data from ALE (1) 106 to ALE (2) 107 as determined by the configuration bits routed into cross-bar routing block 108. Each of sixteen outputs from “O/P 0” to “O/P 16” of ALE (1) 106 is connected to a separate vertical switch in vertical routing blocks 103. Each of sixteen inputs from “I/P 0” to “I/P 16” of ALE (2) 107 is connected to a separate horizontal switch in horizontal routing blocks 102. As shown, output O/P 0 is connected to vertical switch (“SW-1V”) 320a, which is separate from switch (“SW-16V”) 320b. Also, input I/P 16 is connected to horizontal switch (“SW-16H”) 322b, which is separate from switch (“SW-1H”) 322a for input I/P 0. Although omitted for purposes of discussion, other inputs and outputs each require a similar number of separate horizontal and vertical switches. Consequently, at any one time cross-bar routing block 108 leaves fifteen inputs of each switch, and the corresponding interconnections, unused, so that much of its circuitry sits idle. Moreover, the relatively large number of switch circuits 300 consumes circuit area that otherwise could be used for other purposes.
As an example, consider that cross-bar routing blocks 108 are used to route data in reconfigurable logic array 100 of N rows by N columns of reconfigurable computational elements, where “N” is 64. Further consider a simple case where there are only two inputs per reconfigurable computational element. As such, there would be a total of N² or 4,096 reconfigurable computational elements requiring a total of 8,192 cross-bar routing blocks 108. Because each of those 8,192 cross-bar routing blocks 108 would then have 4,096 inputs, a total of 8,192 × 4,096 = 33,554,432 switch circuits 300 would be needed (as well as an equal number of configuration registers and corresponding bit paths).
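The arithmetic of this example can be reproduced with a short calculation. The sketch below is only a restatement of the figures given above; the function name `single_stage_cost` and its parameter names are introduced here for illustration.

```python
def single_stage_cost(n: int, inputs_per_element: int):
    """Count resources for a single-stage cross-bar network (illustrative).

    n                  -- rows and columns of the array (N in the text)
    inputs_per_element -- inputs on each reconfigurable computational element
    """
    elements = n * n                               # N^2 computational elements
    routing_blocks = elements * inputs_per_element # one block per element input
    # Each routing block must be able to select any element output,
    # so it needs one switch per element.
    switches = routing_blocks * elements
    return elements, routing_blocks, switches


elements, routing_blocks, switches = single_stage_cost(n=64, inputs_per_element=2)
print(elements, routing_blocks, switches)
```

Because the switch count grows as the square of the number of elements, doubling N in this model multiplies the number of switches by sixteen.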
A common approach to reduce routing resource requirements for a reconfigurable logic array includes adding one or more stages to form a multi-stage routing network, but retaining the use of cross-bar routing block 108. In this approach, either a set of horizontal routing blocks is configured to drive a set of vertical routing blocks, or vice versa. Or, these two approaches can be mixed as demonstrated in U.S. Pat. No. 6,633,181 B1, entitled “Multi-Scale Programmable Array,” which is commonly owned by the owner of this application and is incorporated by reference in its entirety for all purposes. For example, for each row and column, there is a first stage of K switches with N inputs per switch and then a second stage of switches for each of the inputs with K inputs per switch. To compare the reduction of switches and interconnections in view of the previous example, consider K=64, such that there will be 64 first stage routing circuits per row and 2*N second stage routing circuits per column. Thus, the total number of switches required for horizontal routing is 64*64*64=262,144, and the number of switches for vertical routing is 2*64*64*64=524,288. The total number of switches for this approach is 786,432. Although this approach reduces circuit area, there might be an increase in complexity in the placement of functions as well as a possible increase in delay time between the source and the destination for each path. A further reduction in cost might be achieved by adding more routing stages, but the increased number of routing stages makes computing the optimal routing paths for a given placement of functions very difficult.
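The two-stage figures above can be checked with a similar calculation. The sketch below simply restates the products given in the text; the function name `two_stage_cost` is introduced here for illustration, and reading the factor of 2 in the vertical count as the two inputs per element of the previous example is an assumption.

```python
def two_stage_cost(n: int, k: int, inputs_per_element: int = 2):
    """Count switches in the two-stage example (illustrative).

    n -- rows and columns of the array (N in the text)
    k -- first-stage switches per row (K in the text)
    inputs_per_element -- assumed source of the factor 2 in the vertical count
    """
    # First stage: K switches per row, N inputs per switch, N rows.
    horizontal = k * n * n
    # Second stage: 2*N switches per column, K inputs per switch, N columns.
    vertical = inputs_per_element * n * k * n
    return horizontal, vertical, horizontal + vertical


horizontal, vertical, total = two_stage_cost(n=64, k=64)

# Single-stage total quoted in the previous example, for comparison.
single_stage_total = 33_554_432
reduction = single_stage_total / total
print(horizontal, vertical, total, round(reduction, 1))
```

Under these assumptions the two-stage arrangement needs roughly one forty-third of the switches of the single-stage cross-bar, which is the area saving that is traded against harder placement and potentially longer path delay.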
Thus, there is a need for a system, an apparatus and a method for routing data over fewer switches and interconnections among reconfigurable logic elements to conserve reconfigurable computational elements without substantially increasing the difficulty of computing routing paths, and for adapting routing resources to dynamically perform complex bit-level operations.