1. Field of the Invention
The present invention is related to reconfigurable architectures and, more particularly, to reconfigurable architectures used to process information in a pipelined fashion.
2. Description of the Background
Traditional approaches to reconfigurable computing statically configure programmable hardware to perform a user-defined application. The static nature of such a configuration causes two significant problems: a computation may require more hardware than is available, and a single hardware design cannot exploit the additional resources that will inevitably become available in future process generations. A technique called pipelined reconfiguration implements a large logical configuration on a small piece of hardware through rapid reconfiguration of that hardware. With this technique, the compiler is no longer responsible for satisfying fixed hardware constraints. In addition, a design's performance improves in proportion to the amount of hardware allocated to that design.
Pipelined reconfiguration virtualizes pipelined computations by breaking a single static configuration into pieces that correspond to the pipeline stages of the application. The pipeline stages are loaded into the fabric one per cycle. This makes it possible to perform the computation even if the entire configuration is never present in the fabric at one time.
FIG. 1 illustrates the virtualization process, showing a five-stage pipeline virtualized on a three-stage fabric. FIG. 1A shows the five-stage application and the state of each logical (or virtual) pipeline stage in six consecutive cycles. FIG. 1B shows the state of the physical stages in the fabric as it executes this application. In this example, virtual pipe stage 1 is configured in cycle 1, is ready to execute in the next cycle, and executes for two cycles. Because there is no physical pipe stage 4, in cycle 4 the fourth virtual pipe stage is configured in physical pipe stage 1, replacing the first virtual stage. Once the pipeline is full, the fabric generates two results, in two consecutive cycles, out of every five cycles. For example, cycles 2, 3, 7, 8, . . . consume inputs and cycles 6, 7, 11, 12, . . . generate outputs.
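The schedule of FIG. 1 can be reproduced with a small simulation. The sketch below is a hypothetical behavioral model, not the fabric's actual control logic. It assumes that exactly one physical stripe is reconfigured per cycle in round-robin order, that virtual stages are loaded in the order 1, 2, ..., V and then repeat, that V is larger than the number of physical stripes, and that every resident stripe not currently being configured is executing. The names fabric_states and io_cycles are illustrative only.

```python
from typing import List, Optional, Tuple

# (virtual stage, "config" | "exec"), or None for an empty physical stripe
Slot = Optional[Tuple[int, str]]


def fabric_states(num_virtual: int, num_physical: int, num_cycles: int) -> List[List[Slot]]:
    """Contents of each physical stripe for cycles 1..num_cycles (assumes num_virtual > num_physical)."""
    resident: List[Optional[int]] = [None] * num_physical
    states: List[List[Slot]] = []
    for cycle in range(1, num_cycles + 1):
        # One stripe is reconfigured per cycle, cycling through the physical
        # stripes, while virtual stages are loaded in order and then repeat.
        stripe = (cycle - 1) % num_physical
        resident[stripe] = (cycle - 1) % num_virtual + 1
        row: List[Slot] = []
        for p, vstage in enumerate(resident):
            if vstage is None:
                row.append(None)
            elif p == stripe:
                row.append((vstage, "config"))
            else:
                row.append((vstage, "exec"))
        states.append(row)
    return states


def io_cycles(num_virtual: int, num_physical: int, num_cycles: int):
    """Cycles in which the first virtual stage consumes inputs and the last one produces outputs."""
    states = fabric_states(num_virtual, num_physical, num_cycles)
    inputs = [c for c, row in enumerate(states, 1) if (1, "exec") in row]
    outputs = [c for c, row in enumerate(states, 1) if (num_virtual, "exec") in row]
    return inputs, outputs


inputs, outputs = io_cycles(num_virtual=5, num_physical=3, num_cycles=12)
print(inputs)   # [2, 3, 7, 8, 12]
print(outputs)  # [6, 7, 11, 12]
```

Under the same round-robin assumption, each virtual stage stays resident for P cycles (one to configure and P - 1 to execute) and is reloaded every V cycles, so the model produces P - 1 results every V cycles. For FIG. 1, with P = 3 and V = 5, that is the two results every five cycles described above.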
FIG. 2 is an abstract view of the architectural class of a pipelined fabric. Each row of processing elements (PEs) together with its associated interconnections is referred to as a stripe. Each PE typically contains an arithmetic logic unit (ALU) and a pass register file. Each ALU contains lookup tables (LUTs) and extra circuitry for carry chains, zero detection, and so on. Designers implement combinational logic using a set of N B-bit-wide ALUs. The ALU operation is static while a particular virtual stripe resides in a physical stripe. Designers can cascade, chain or otherwise connect the carry lines of the ALUs to construct wider ALUs, and chain PEs together via an interconnection network to build complex combinational functions.
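Cascading ALUs through their carry lines can be illustrated with an adder. The sketch below is a hypothetical model that treats each B-bit ALU purely as an adder and ignores its lookup tables and zero-detection circuitry; the names alu_add and chained_add are illustrative. Each slice adds its own B-bit chunk of the operands plus the carry out of the slice below, so N slices behave as one N*B-bit adder.

```python
from typing import Tuple


def alu_add(a: int, b: int, carry_in: int, width: int) -> Tuple[int, int]:
    """One B-bit ALU slice configured as an adder; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width


def chained_add(a: int, b: int, num_alus: int, width: int) -> Tuple[int, int]:
    """Chain num_alus slices through their carry lines to form one wide adder."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for i in range(num_alus):
        shift = i * width
        # The carry out of slice i feeds the carry in of slice i + 1.
        s, carry = alu_add((a >> shift) & mask, (b >> shift) & mask, carry, width)
        result |= s << shift
    return result, carry


# Two 4-bit ALUs chained into an 8-bit adder: 0xAB + 0x7F = 0x12A
print(chained_add(0xAB, 0x7F, num_alus=2, width=4))  # (42, 1), i.e. 0x2A with carry out 1
```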
One of the key enabling structures for pipelined reconfiguration is the pass register file. An example pass register file 10 is shown in FIG. 3. Pass register file 10 comprises four registers 12, 14, 16, 18 (which may have an arbitrary bit width); a write port consisting, in this figure, of four multiplexers 20, 22, 24, 26 and a write address decoder 28; and a read port consisting, in this figure, of a 4-to-1 multiplexer 30 responsive to a read address. The structure of FIG. 3 allows a functional unit connected to register file 10 to read one value from register file 10, and allows a functional unit to write one value into a specific one of the registers 12, 14, 16, 18. If a register 12, 14, 16, 18 is not written by the write port, it instead receives, via line 32, 34, 36, 38, respectively, the value held in the corresponding register of the pass register file in the previous stripe.
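The read and write behavior of FIG. 3 can be summarized in a few lines. The sketch below is a hypothetical cycle-level model, not a circuit description: the class name PassRegisterFile and its methods are illustrative, register contents are plain integers, and None marks an empty register. On each clock, the write address decoder selects at most one register to take the write data, and every other register's multiplexer selects the corresponding register of the previous stripe's pass register file.

```python
from typing import List, Optional


class PassRegisterFile:
    """Behavioral model of a pass register file such as pass register file 10 of FIG. 3."""

    def __init__(self, num_registers: int = 4) -> None:
        self.registers: List[Optional[int]] = [None] * num_registers

    def read(self, read_addr: int) -> Optional[int]:
        # Read port: an N-to-1 multiplexer selects one register by address.
        return self.registers[read_addr]

    def clock(self, previous: "PassRegisterFile",
              write_addr: Optional[int] = None,
              write_value: Optional[int] = None) -> None:
        # Write port: the decoded address steers write_value into one register;
        # all other registers load the matching register of the previous stripe.
        self.registers = [
            write_value if i == write_addr else previous.registers[i]
            for i in range(len(self.registers))
        ]


prev = PassRegisterFile()
prev.registers = [10, 11, 12, 13]
cur = PassRegisterFile()
cur.clock(prev, write_addr=2, write_value=99)
print(cur.registers)  # [10, 11, 99, 13]
print(cur.read(2))    # 99
```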
FIG. 4 illustrates how four pass register files 42, 44, 46, 48 might be used in an application. In this figure, the pass register files 42, 44, 46, 48 are connected in a ring, but they need not be so connected. In FIG. 4, only one register is shown in each of the register files 42, 44, 46, 48, although each of the register files could be arbitrarily large. In FIG. 4, data generated by Functional Unit 1 proceeds to Functional Unit 2 through one pass register file 44.
A chief problem with the structure of FIG. 4 is that the value, which is meant only for Functional Unit 2, continues through the other pass register files 46, 48, 42 in subsequent stripes. If no other stripe overwrites this register, the value continues to propagate all the way back to Functional Unit 1. This activity contributes nothing to the computation, yet it dissipates significant power.
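The problem can be made concrete by tracing a single value around the ring. The sketch below is a hypothetical model that assumes one register per file (as drawn in FIG. 4), ring order 42, 44, 46, 48 and back to 42, a value advancing one file per cycle, and no stripe ever overwriting the register; the names step_ring and trace_value are illustrative. Functional Unit 1 writes the value into file 44, where Functional Unit 2 consumes it, yet the value keeps moving through files 46, 48 and 42 and then around the ring again.

```python
from typing import List, Optional


def step_ring(files: List[Optional[int]],
              write_file: Optional[int] = None,
              write_value: Optional[int] = None) -> List[Optional[int]]:
    """Advance a ring of single-register pass register files by one cycle."""
    n = len(files)
    # Each file loads its predecessor's register unless it is written this
    # cycle; index 0 wraps around and loads from index n - 1.
    return [write_value if i == write_file else files[(i - 1) % n]
            for i in range(n)]


def trace_value(num_files: int, write_file: int, value: int, cycles: int) -> List[int]:
    """Ring index holding `value` in each cycle after it is written."""
    ring = step_ring([None] * num_files, write_file, value)
    path = []
    for _ in range(cycles):
        ring = step_ring(ring)
        # Nothing overwrites the register, so the value is always still present.
        path.append(ring.index(value))
    return path


names = [42, 44, 46, 48]  # pass register files of FIG. 4, in ring order
print([names[i] for i in trace_value(4, write_file=1, value=7, cycles=5)])
# [46, 48, 42, 44, 46]
```

Every load after the value leaves file 44 is useless work, and in this model it never stops.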
A related power consumption problem in the pass register files of pipeline reconfigurable devices is that values left in the chip by previous applications continue to propagate through the chip, consuming power even though they are irrelevant to the current computation. Thus, the need exists for a mechanism in the pipeline fabric for terminating signals that are no longer needed for the computation.
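The cost of stale values can be estimated by counting register bit flips, a common proxy for dynamic power. The sketch below is a hypothetical model that assumes a ring of single-register pass register files filled with leftover values from a previous application, no writes by the current computation, and Hamming distance between a register's old and new contents as the switching-activity measure; the function name toggles_per_cycle is illustrative.

```python
from typing import List


def toggles_per_cycle(ring: List[int], cycles: int) -> List[int]:
    """Register bit flips per cycle as stale values rotate around a ring."""
    counts = []
    for _ in range(cycles):
        # No stripe writes these registers, so each file loads its predecessor.
        nxt = [ring[(i - 1) % len(ring)] for i in range(len(ring))]
        # Hamming distance between old and new contents approximates switching activity.
        counts.append(sum(bin(old ^ new).count("1") for old, new in zip(ring, nxt)))
        ring = nxt
    return counts


stale = [0b1010, 0b0101, 0b1111, 0b0000]  # leftovers from a previous application
print(toggles_per_cycle(stale, 4))  # [12, 12, 12, 12]
```

Because the ring only rotates, the sum of Hamming distances between neighboring registers never changes, so in this model the wasted switching activity never decays on its own.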