1. Field of the Invention
The present invention is related to reconfigurable architectures and, more particularly, to reconfigurable architectures used to process information in a pipelined fashion.
2. Description of the Background
Traditional approaches to reconfigurable computing statically configure programmable hardware to perform a user-defined application. The static nature of such a configuration causes two significant problems: a computation may require more hardware than is available, and a single hardware design cannot exploit the additional resources that will inevitably become available in future process generations. A technique called pipelined reconfiguration implements a large logical configuration on a small piece of hardware through rapid reconfiguration of that hardware. With this technique, the compiler is no longer responsible for satisfying fixed hardware constraints. In addition, a design's performance improves in proportion to the amount of hardware allocated to that design.
Pipelined reconfiguration involves virtualizing pipelined computations by breaking a single static configuration into pieces that correspond to pipeline stages in the application. The pipeline stages are loaded into the fabric one per cycle. This makes it possible to perform the computation even if the entire configuration is never present in the fabric at one time.
FIG. 1 illustrates the virtualization process, showing a five-stage pipeline virtualized on a three-stage fabric. FIG. 1A shows the five-stage application and each logical (or virtual) pipeline stage's state in six consecutive cycles. FIG. 1B shows the state of the physical stages in the fabric as it executes this application. In this example, virtual pipe stage 1 is configured in cycle 1 and ready to execute in the next cycle; it executes for two cycles. There is no physical pipe stage 4; therefore, in cycle 4, the fourth virtual pipe stage is configured in physical pipe stage 1, replacing the first virtual stage. Once the pipeline is full, each five-cycle period yields results in two consecutive cycles. For example, cycles 2, 3, 7, 8, . . . consume inputs and cycles 6, 7, 11, 12, . . . generate outputs.
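The timing of the FIG. 1 example can be reproduced with a short sketch. The Python below is illustrative only and is not taken from the figures. It assumes that one virtual stage is configured per cycle in round-robin order over the physical stages, and that each virtual stage executes for one cycle fewer than the number of physical stages before it is overwritten. The function names configure_slot and executing_cycles are hypothetical.

```python
def configure_slot(cycle, num_virtual, num_physical):
    """Return (virtual stage, physical stage) configured in a 1-based cycle.

    Assumption: stages are configured one per cycle, round-robin.
    """
    virtual = (cycle - 1) % num_virtual + 1
    physical = (cycle - 1) % num_physical + 1
    return virtual, physical


def executing_cycles(virtual, num_virtual, num_physical, last_cycle):
    """List the cycles (up to last_cycle) in which a virtual stage executes.

    Assumption: a stage configured in cycle c executes in cycles
    c+1 .. c+num_physical-1 and is then replaced.
    """
    cycles = []
    config_cycle = virtual  # stage v is first configured in cycle v
    while config_cycle < last_cycle:
        for c in range(config_cycle + 1, config_cycle + num_physical):
            if c <= last_cycle:
                cycles.append(c)
        config_cycle += num_virtual  # reconfigured once per pass
    return cycles


if __name__ == "__main__":
    # Five virtual stages on a three-stage fabric, as in FIG. 1.
    print(configure_slot(4, 5, 3))           # virtual 4 lands in physical 1
    print(executing_cycles(1, 5, 3, 10))     # cycles that consume inputs
    print(executing_cycles(5, 5, 3, 12))     # cycles that generate outputs
```

Under these assumptions the first virtual stage executes in cycles 2, 3, 7 and 8 and the last virtual stage executes in cycles 6, 7, 11 and 12, which matches the input and output cycles stated above.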
FIG. 2 is an abstract view of the architectural class of a pipelined fabric. Each row of processing elements (PEs) together with its associated interconnections is referred to as a stripe. Each PE typically contains an arithmetic logic unit (ALU) and a pass register file. Each ALU contains lookup tables (LUTs) and extra circuitry for carry chains, zero detection, and so on. Designers implement combinational logic using a set of N B-bit-wide ALUs. The ALU operation is static while a particular virtual stripe resides in a physical stripe. Designers can cascade, chain or otherwise connect the carry lines of the ALUs to construct wider ALUs, and chain PEs together via an interconnection network to build complex combinational functions.
Because reconfigurable fabrics provide an opportunity to carry out a process in a fabric having fewer physical stripes than the process requires, it is necessary to associate the virtual stripes with the physical stripes. FIG. 3 illustrates a global association option in which any virtual stripe can be loaded into any physical stripe. Global association provides an advantage in that storage is consolidated, which saves on memory overhead. However, a substantial disadvantage is that the design is not scalable. As the number of physical stripes increases, the global bus lines become long and highly loaded. Thus, although global association may work well in fabrics having small numbers of physical stages, as hardware improves and the number of physical stages increases, associating each physical stripe with any virtual stripe becomes less and less desirable.
Turning to FIG. 4, a purely local association option is illustrated. As seen in FIG. 4, physical stripe 1 can be configured with virtual stripes 0 or 4. Physical stripe 2 can be configured with virtual stripes 1 or 5. Physical stripe 3 can be configured with virtual stripes 2 or 6, while physical stripe 4 can be configured with virtual stripes 3 or 7. As in FIG. 3, there are still four physical and eight virtual stripes. The local association illustrated in FIG. 4 overcomes the disadvantage of the global association of FIG. 3 in that the association option of FIG. 4 is scalable due to short and lightly loaded configuration buses. The local association option illustrated in FIG. 4 is also faster than the global association option due to smaller memories and the ability to interleave the access to those memories, thus allowing the memory to cycle more slowly than the fabric. The local association of FIG. 4, however, has the disadvantage that the storage is highly distributed and therefore inefficient because of the overhead necessary to operate the distributed storage.
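The local association of FIG. 4 can be summarized as a fixed mapping. The sketch below is illustrative and assumes the pattern in the figure generalizes to a modulo rule, which the text does not state explicitly; virtual stripes are numbered from 0 and physical stripes from 1, as above. The function names are hypothetical.

```python
def local_physical_stripe(virtual, num_physical):
    """Physical stripe (1-based) that may hold a virtual stripe (0-based).

    Assumption: the FIG. 4 pattern follows a modulo rule.
    """
    return virtual % num_physical + 1


def local_candidates(physical, num_virtual, num_physical):
    """Virtual stripes that a given physical stripe can be configured with."""
    return [v for v in range(num_virtual)
            if local_physical_stripe(v, num_physical) == physical]


if __name__ == "__main__":
    # Four physical and eight virtual stripes, as in FIG. 4.
    for p in range(1, 5):
        print(p, local_candidates(p, 8, 4))
```

With four physical and eight virtual stripes, each physical stripe has exactly two candidate virtual stripes, so each local configuration memory needs to hold only two entries instead of all eight.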
Additional buses must be provided to have an operational device. For example, input and output buses must be provided. Typically, such input and output buses are global in that they service each of the physical stripes. However, if the input and output buses are less than global, it is necessary to ensure during the design phase that a physical stripe that is not serviced by the input bus will not be required to be the first physical stage and that a physical stripe not serviced by the output bus will not be required to be the last physical stage. Finally, it may be necessary for some value produced by a physical stripe to be used in the next instance of that physical stripe. In that case, the value must be taken from the physical stripe, stored in memory, and input (restored) to that or another physical stripe when the next instance of that stripe occurs. If the bus providing that function is less than global, it is necessary during the design phase to ensure that a physical stripe that is not serviced by the restore bus will not be required to provide or receive such a value.
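The design-phase constraint on less-than-global input and output buses can be expressed as a simple check. The sketch below is a hypothetical illustration, not a described implementation. It assumes a placement is given as pairs of (virtual stripe, physical stripe) and that the sets of physical stripes serviced by each bus are known. It covers only the input and output buses; a restore-bus check would follow the same pattern.

```python
def check_bus_coverage(placements, first_virtual, last_virtual,
                       input_stripes, output_stripes):
    """Return placements that violate input/output bus coverage.

    placements: iterable of (virtual, physical) pairs (hypothetical format).
    input_stripes / output_stripes: sets of physical stripes each bus serves.
    """
    problems = []
    for virtual, physical in placements:
        # The first stage must sit where the input bus can reach it.
        if virtual == first_virtual and physical not in input_stripes:
            problems.append((virtual, physical, "no input bus"))
        # The last stage must sit where the output bus can reach it.
        if virtual == last_virtual and physical not in output_stripes:
            problems.append((virtual, physical, "no output bus"))
    return problems


if __name__ == "__main__":
    # Eight virtual stripes placed on four physical stripes by a modulo rule.
    placements = [(v, v % 4 + 1) for v in range(8)]
    print(check_bus_coverage(placements, 0, 7, {1}, {4}))
    print(check_bus_coverage(placements, 0, 7, {1}, {1, 2}))
```

An empty result means every first and last stage lands on a stripe serviced by the corresponding bus; any returned entry identifies a placement that would have to be changed during design.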
Thus, the need exists for an association option which maintains the advantages of global association while at the same time being scalable, is capable of providing state information to stripes as needed, and is capable of outputting information even when the output stripe is not serviced by an output bus.