The invention relates to reconfigurable computing devices. More particularly, the invention relates to heterogeneous arrays with array element types capable of implementing multiple aspects of an application.
Reconfigurable devices, such as field programmable gate arrays (“FPGAs”), processor arrays and reconfigurable arithmetic arrays (“RAAs”), normally include a number of processing elements together with an interconnect scheme to connect them together. This interconnect commonly takes the form of a general-purpose routing network, but sometimes other more restrictive forms of interconnect are used. The interconnect typically includes one or more types of routing elements.
A routing element is a device used to route signals across an interconnect from one processing element to another. A routing element is controllable solely by configuration signals, which are signals directly or indirectly derived from the configuration process, and not dependent on run-time data. Examples of routing elements include pass transistors, tristate buffers, and statically configured multiplexers (i.e. multiplexers whose select input is controlled by the configuration of the array). Regardless of how the network is constructed, its function remains the same: to propagate data from network inputs to network outputs.
A processing element has one or more data inputs and computes one or more data outputs, each of which is a function that may depend on one or more input values. Processing elements are controllable by data signals received from other processing elements, or by configuration signals, or by both. Examples of processing elements include adders, multipliers, FPGA-like Look-up tables (LUTs), and multiplexers with the select signal capable of being connected to a data input. Processing elements may include registers, so that the output is a function of the values of some or all of the inputs at earlier times.
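By way of illustration only, the distinction between a routing element and a processing element can be sketched with two multiplexers that differ only in where the select signal comes from. The names StaticMux and dynamic_mux are hypothetical and are introduced here purely for the example.

```python
class StaticMux:
    """Routing element: the select value is fixed by configuration.

    Hypothetical model; the select is stored once and never depends on
    run-time data.
    """

    def __init__(self, configured_select: bool):
        self._select = configured_select

    def output(self, in0: int, in1: int) -> int:
        # The same input is always forwarded until the device is reconfigured.
        return in1 if self._select else in0


def dynamic_mux(select: bool, in0: int, in1: int) -> int:
    """Processing element: the select is itself a run-time data input."""
    return in1 if select else in0


route = StaticMux(configured_select=True)
print(route.output(5, 9))          # always forwards the second input
print(dynamic_mux(False, 5, 9))    # choice made by run-time data
```

In this sketch the static multiplexer merely propagates data, whereas the output of the dynamic multiplexer is a function of all three of its data inputs.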
A general purpose routing network has multiple input terminals and multiple output terminals (and possibly also some bi-directional terminals configurable as either input terminals or output terminals), and can be configured to create a connection between any input terminal and any output terminal. All values carried by a given general purpose routing network have the same bit width. When configured, a general purpose routing network makes multiple independent connections, each one connecting a network input to one or more network outputs, while each network output is connected to at most one network input. A general purpose routing network can simultaneously make any two arbitrary connections (A→B) and (C→D) between any two network inputs A, C and any two network outputs B, D, where B≠D. These connections may pass through registers (so that there may be some time offset between network input and network output) and through switches used to route the data. The bit width of a general purpose routing network is determined by the number of 1-bit data lines that are controlled by each bit of configuration memory in the switches of the network. Thus, in a 4-bit general purpose routing network, each bit of configuration memory controls four 1-bit data lines, and data is therefore sent across the network as 4-bit wide words.
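The connection rules described above may be modelled, as a minimal sketch only, by a table that records the single input driving each output. The class RoutingNetwork and its methods are hypothetical names; registers and the internal switch structure are deliberately omitted.

```python
from typing import Dict, List


class RoutingNetwork:
    """Toy model of a configured general purpose routing network."""

    def __init__(self, inputs: List[str], outputs: List[str], bit_width: int = 4):
        self.inputs = set(inputs)
        self.outputs = set(outputs)
        self.bit_width = bit_width
        # Static configuration: maps each driven output to its one input.
        self.source_of: Dict[str, str] = {}

    def connect(self, net_input: str, net_output: str) -> None:
        if net_input not in self.inputs or net_output not in self.outputs:
            raise ValueError("unknown terminal")
        if net_output in self.source_of:
            # Each output may be connected to at most one input.
            raise ValueError(f"output {net_output} is already driven")
        self.source_of[net_output] = net_input

    def propagate(self, values: Dict[str, int]) -> Dict[str, int]:
        # Every word on the network is truncated to the network bit width.
        mask = (1 << self.bit_width) - 1
        return {out: values[src] & mask for out, src in self.source_of.items()}


net = RoutingNetwork(inputs=["A", "C"], outputs=["B", "D", "E"])
net.connect("A", "B")   # A -> B
net.connect("C", "D")   # C -> D, made independently of A -> B
net.connect("A", "E")   # one input may fan out to several outputs
print(net.propagate({"A": 0x3, "C": 0xF}))  # {'B': 3, 'D': 15, 'E': 3}
```

An attempt to drive an already connected output, such as net.connect("C", "B"), raises an error in this model, which reflects the rule that each network output has at most one source.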
The design of a reconfigurable device is a process of specifying the properties of the processing elements and the interconnect. For both of these elements this involves a series of compromises, discussed below.
The choice of processing element is a compromise between functionality and various parameters such as physical size, operating speed or power dissipation. For example, adding functionality increases the size of each element, but may reduce the total number of elements needed to implement an application. Functionality is only worth adding if the reduction in number of elements outweighs the increase in size of each individual element, so that there is no net increase in application area. Increasing functionality impacts other parameters similarly.
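The area criterion in the preceding paragraph can be expressed as a simple comparison. The sketch below assumes that application area is the product of element count and per-element area; the function names and the figures used are hypothetical and are not taken from any particular device.

```python
def application_area(num_elements: int, element_area: float) -> float:
    """Total area of an application, ignoring interconnect (an assumption)."""
    return num_elements * element_area


def feature_is_worthwhile(n_before: int, area_before: float,
                          n_after: int, area_after: float) -> bool:
    """True if the added functionality causes no net increase in area."""
    return (application_area(n_after, area_after)
            <= application_area(n_before, area_before))


# Hypothetical figures: the feature makes each element 30% larger.
print(feature_is_worthwhile(100, 1.0, 70, 1.3))  # True: 30% fewer elements are needed
print(feature_is_worthwhile(100, 1.0, 90, 1.3))  # False: only 10% fewer elements
```

The same comparison could be applied to operating speed or power dissipation by substituting the relevant per-element figure.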
There are various different types of reconfigurable devices, as noted above. There are also various different types of applications for reconfigurable devices. Each of the different types of reconfigurable devices typically performs some types of applications better than others. The assessment of the suitability of a particular processing element used in a reconfigurable device is therefore dependent on the type of applications for which the device is intended.
There are several “sweet spots” in the size/functionality space, partly due to partitioning of the application space (e.g. processor arrays are typically used for different types of applications than FPGAs), and partly because a combination of features together may be better than any one of them on its own (e.g. adding a multiplier or a divider to a processor may not be worthwhile, but adding both, with some sharing of hardware between them, is a net benefit).
The interconnect is also a compromise between functionality and various parameters such as physical size, operating speed or power dissipation. The ideal interconnect has zero propagation delay, no risk of one route interfering with another, and a negligible physical area. This ideal does not exist in practice. In reaching a suitable compromise, the properties of various elements can be considered, such as:
The processing elements:
    High-speed processing elements can only be fully exploited with a high-speed interconnect;
    It is beneficial to route data in the same bit width as the data is processed by the processing elements.
The array:
    The number of possible connections grows as the square of the number of processing elements. The “cost per element” of an interconnect that guarantees no interference between connections therefore increases with the number of processing elements. This may be affordable for small arrays, but is not for large ones.
    Propagation delay will tend to increase with the size of the array.
The applications:
    If the applications written for use on the reconfigurable device are written such that the application can be implemented on a device having only nearest-neighbor connectivity, then the interconnect can be greatly simplified. If such simplification is not possible then a general-purpose routing network (as described above) is normally used as the basis of the interconnect, the terminals of the network being the terminals of the processing elements.
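The quadratic growth noted above for the array can be illustrated numerically. The sketch assumes, for simplicity, one terminal per processing element and counts ordered source-to-destination pairs; both the counting model and the function names are assumptions made for the example.

```python
def possible_connections(num_elements: int) -> int:
    """Ordered source -> destination pairs between distinct elements."""
    return num_elements * (num_elements - 1)


def connections_per_element(num_elements: int) -> float:
    """Share of the possible connections attributable to each element."""
    return possible_connections(num_elements) / num_elements


for n in (4, 16, 64, 256):
    print(n, possible_connections(n), connections_per_element(n))
```

Under this model the per-element figure equals the number of elements minus one, so it grows linearly with array size, which is why an interference-free interconnect becomes unaffordable for large arrays.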
To improve performance, a reconfigurable device may also include additional elements such as heterogeneous processing elements, a hierarchical routing network, and/or a heterogeneous interconnect. Heterogeneous processing elements are a combination of two or more different types of processing elements on one device, for example:
    FPGAs with both lookup table based elements and dedicated multiplier blocks;
    FPGAs with both lookup table based elements and product-term based logic; or
    Processor arrays containing both integer and floating-point processors.
Combining processing elements may be done for a variety of reasons, for example to ease the “functionality vs. cost” tradeoff: if a feature is added as an alternative type of block on a device, it does not add to the cost of all processing elements, only to the cost of those processing elements that contain the added feature. While superficially attractive, this approach has one significant problem: determining what the ratio of different types of processing elements should be and how they should be arranged relative to each other. For example, there may be fine-grain mixing of element types (ABABAB . . .) or coarser-grain mixing (AAABBBAAABBB), such as in a row or column of an array. The mixing analysis becomes more significant as more different types of processing elements are incorporated into a reconfigurable device.
A hierarchical routing network scheme typically allocates processing elements into groups, with heavy connections within groups, and additional connections between groups (and between groups of groups, etc.). In extensions to this model the groups may overlap—the boundaries are not opaque walls with no connections other than inter-group connections. For instance, processing elements at group boundaries may be members of both groups.
With a heterogeneous interconnect scheme there are two or more types of connections available, for example an additional fast but limited interconnect added to complement a slower but more capable general-purpose routing network:
    Dedicated wiring may be added to support common connection patterns, e.g. the “Carry wires” in many FPGAs.
    There may be dedicated nearest-neighbor connections in addition to a general purpose routing network.
There is a significant difference between “heterogeneous” and “hierarchical” interconnects—hierarchical routing networks use the same type of connections for all levels of the hierarchy, but vary the reach of the connections from level to level, while heterogeneous interconnects use different types of connections for different networks. Note that an array may contain both heterogeneous and hierarchical interconnects.
Processors typically manage the flow of control within an application with a mixture of conditional and unconditional branches and jumps, and/or predicated execution of instructions. “Reconfigurable computing,” defined herein as computing by constructing an application-specific datapath to perform a computation on a reconfigurable device, is normally less effective at managing the control flow.
In processor arrays, while the individual processors are good at managing their own instruction flow, they have little or no influence on the other processors in the array.
In FPGA-based reconfigurable computing, every path through the program has to be implemented in the hardware, even those that are not used very often. Given that up to 90% of run-time operations for a processor may be specified in just 10% of the code, this can result in most of the FPGA silicon area being dedicated to infrequently used operations. In the above example, 90% of the area is only used 10% of the time, whereas the remaining 10% of the area is used 90% of the time.
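The consequence of the 90%/10% split can be made concrete with a small calculation. The sketch below assumes that utilisation is measured as the area-weighted fraction of time for which each region is active; this measure and the function name are assumptions introduced for illustration.

```python
from typing import List, Tuple


def average_utilisation(regions: List[Tuple[float, float]]) -> float:
    """Area-weighted activity.

    Each region is (fraction_of_area, fraction_of_time_in_use).
    """
    total_area = sum(area for area, _ in regions)
    return sum(area * active for area, active in regions) / total_area


# The example from the text: 90% of the area is used 10% of the time,
# and the remaining 10% of the area is used 90% of the time.
print(round(average_utilisation([(0.9, 0.1), (0.1, 0.9)]), 2))
```

Under this assumed measure the silicon is busy for only about 18% of the time on average, against 100% for a hypothetical device in which every region is always in use.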
In other devices designed for reconfigurable computing (such as RAA) an attempt is made to improve on the FPGA situation. RAA has arithmetic logic units (“ALUs”) with instruction inputs so it is possible to dynamically change the functionality of the datapath by varying the instructions provided to the ALUs. However, this is not a perfect solution.
RAA ALUs process multi-bit words (e.g. 4-bit nibbles) rather than bits, and have a compact instruction encoding (again into 4 bits) to select the operation to perform on the input words. Control conditions, however, tend to be single bits expressing the true/false nature of the decision, for example:
    Are the A and B inputs equal?
    Is input A greater than input B?
    Is bit 3 of an input set to 1?
Processing such single-bit conditions (in statements like “if condition1 or condition2 then . . .”) with n-bit ALUs makes inefficient use of the ALU datapath: (n−1) of the bits are unused.
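The inefficiency can be seen in a minimal sketch of an n-bit ALU performing an OR of two single-bit conditions. The functions alu_or and unused_fraction are hypothetical names, and placing the condition in bit 0 of the word is an assumption made for the example.

```python
def alu_or(a: int, b: int, width: int = 4) -> int:
    """Bitwise OR as performed by a width-bit ALU."""
    return (a | b) & ((1 << width) - 1)


def unused_fraction(width: int) -> float:
    """Fraction of the datapath carrying no information for a 1-bit condition."""
    return (width - 1) / width


condition1, condition2 = 1, 0              # each condition occupies bit 0 only
result = alu_or(condition1, condition2)    # bits 1 to 3 of the nibble are idle
print(result, unused_fraction(4))          # 1 0.75
```

For a 4-bit ALU, three quarters of the datapath therefore does no useful work while the condition is evaluated.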
This results in a situation where the 1-bit nature of FPGAs makes them good for processing conditions, but poor at branching based on the result of the condition, while multi-bit RAA-like devices are better at branching, but inefficient at processing the conditions.
A useful implementation technique for reconfigurable computing applications is to process data in a bit (or nibble, or some other fraction of the word or other full-width data item) serial form—a single processing element is used in consecutive clock cycles to process consecutive parts of a word. This technique allows area and throughput to be traded off against each other—serialized processing takes longer but uses a smaller number of processing elements.
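As a minimal sketch of the serialized technique, the following function adds two words one nibble per simulated clock cycle, reusing a single nibble-wide adder and carrying between cycles. The 16-bit word length, the 4-bit nibble and the function name are assumptions chosen for the example.

```python
def nibble_serial_add(a: int, b: int, word_bits: int = 16, nibble_bits: int = 4) -> int:
    """Add two words using one nibble-wide adder over several cycles."""
    mask = (1 << nibble_bits) - 1
    carry = 0          # carry register held between cycles
    result = 0
    for cycle in range(word_bits // nibble_bits):
        shift = cycle * nibble_bits
        a_nibble = (a >> shift) & mask
        b_nibble = (b >> shift) & mask
        total = a_nibble + b_nibble + carry     # the single shared adder
        result |= (total & mask) << shift
        carry = total >> nibble_bits
    return result      # the carry out of the top nibble is discarded


print(hex(nibble_serial_add(0x1234, 0x0FFF)))  # 0x2233
```

Here a 16-bit addition takes four cycles on one 4-bit element rather than one cycle on four elements, which is the area for throughput exchange described above.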
The ability to transform data between serial and parallel formats is useful in serialized processing. One way of performing this transformation is by using circuits constructed from multiplexers and registers.
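One possible arrangement of multiplexers and registers for parallel-to-serial conversion is sketched below: each register is fed by a multiplexer that selects either the parallel input (load) or the neighbouring register (shift). The class ParallelToSerial and its interface are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Optional


def mux(select: bool, when_true: int, when_false: int) -> int:
    return when_true if select else when_false


class ParallelToSerial:
    """Chain of nibble registers, each preceded by a 2-input multiplexer."""

    def __init__(self, num_nibbles: int = 4):
        self.regs: List[int] = [0] * num_nibbles

    @property
    def serial_out(self) -> int:
        # The first register drives the serial output.
        return self.regs[0]

    def clock(self, load: bool, parallel_in: Optional[List[int]] = None) -> None:
        n = len(self.regs)
        if parallel_in is None:
            parallel_in = [0] * n
        # Shift path: each register takes its neighbour; the last takes zero.
        shifted = self.regs[1:] + [0]
        self.regs = [mux(load, parallel_in[i], shifted[i]) for i in range(n)]


p2s = ParallelToSerial()
p2s.clock(load=True, parallel_in=[0x4, 0x3, 0x2, 0x1])  # least significant nibble first
stream = []
for _ in range(4):
    stream.append(p2s.serial_out)
    p2s.clock(load=False)
print(stream)  # [4, 3, 2, 1]
```

A serial-to-parallel converter could be built from the same two components by shifting in the opposite sense and reading all registers at once.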
Multiplexers are also useful in a reconfigurable device to implement a number of common 1- and 2-input logic functions. The following examples are written in terms of the C/Java “conditional choice” operator, “a=(b?c:d);” being shorthand for “if (b) then {a=c;} else {a=d;}”:
    A & B = A?B:0
    A | B = A?1:B
    NOT A = A?0:1
    A ^ B = A?(NOT B):B
As discussed above, a heterogeneous array provides a mix of processing elements optimized to handle different wordlengths. However, conventional heterogeneous arrays suffer from the ratio-determining problem discussed above. A useful solution to this problem is to design the first type of processing elements such that they are biased towards multi-bit processing but capable of 1-bit processing, and to design the second type of processing elements such that they are biased towards 1-bit processing but capable of multi-bit processing.
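The multiplexer identities above can be checked exhaustively over all 1-bit input combinations. The following sketch does so; the helper names (mux, mux_and and so on) are hypothetical and introduced only for the check.

```python
from itertools import product


def mux(b: int, c: int, d: int) -> int:
    """Conditional choice: returns c if b is non-zero, otherwise d."""
    return c if b else d


def mux_and(a: int, b: int) -> int:
    return mux(a, b, 0)            # A & B = A?B:0


def mux_or(a: int, b: int) -> int:
    return mux(a, 1, b)            # A | B = A?1:B


def mux_not(a: int) -> int:
    return mux(a, 0, 1)            # NOT A = A?0:1


def mux_xor(a: int, b: int) -> int:
    return mux(a, mux_not(b), b)   # A ^ B = A?(NOT B):B


# Compare each multiplexer form with the built-in operator on 1-bit values.
for a, b in product((0, 1), repeat=2):
    assert mux_and(a, b) == (a & b)
    assert mux_or(a, b) == (a | b)
    assert mux_xor(a, b) == (a ^ b)
    assert mux_not(a) == (1 - a)
print("all identities hold for 1-bit inputs")
```

The exclusive-OR form uses two multiplexers, since the inverted B input is itself produced by a multiplexer.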