Although microprocessor computing power has been progressively increased, the need for additional increases remains unabated. For example, smart phones now burden their processors with a bewildering variety of tasks. But a single core processor can only accommodate so many instructions at a given time. Thus, it is now common to provide multi-core or multi-threaded processors that can process sets of instructions in parallel. But such instruction-based architectures must always battle the limits imposed by die space, power consumption, and complexity with regard to decreasing the instruction processing time.
Many algorithms can be processed more efficiently in dedicated hardware than in a programmable processing core. For example, image processing involves substantial parallelism and processing of pixels in groups through a pipeline of processing steps. If the algorithm is mapped to hardware, the implementation takes advantage of this symmetry and parallelism. But designing dedicated hardware is expensive and also cumbersome in that if the algorithm is modified, the dedicated hardware must be redesigned.
To provide an efficient compromise between instruction-based architectures and dedicated hardware approaches, a reconfigurable instruction cell array (RICA) architecture has been developed. FIG. 1A illustrates an example RICA system 50. In this RICA system 50, a plurality of instruction cells 2 are interconnected through a programmable switching fabric 4. The configuration of the instruction cells (with regard to what sort of logical function or instruction they implement) as well as the switching fabric can be reprogrammed every clock cycle as necessary to implement a given algorithm or function. The instruction cells process data as retrieved by MEM cells 12 (which in turn are loaded from a data RAM 8). This processing by the instruction cells 2 occurs according to configuration instructions 10 obtained from a configuration RAM 6. A decode module 11 decodes instructions 10 to obtain the programming not only for the instruction cells 2 but also for the switching fabric 4. Additional features shown in FIG. 1A are described in U.S. Patent Publication No. 2010/0122105, filed Apr. 28, 2006, the contents of which are hereby incorporated by reference in their entirety.
Note the advantages of RICA: an algorithm such as image processing that involves processing multiple pixels through a pipelined processing scheme can be mapped to instruction cells in a manner that emulates a dedicated hardware approach. But there is no need to design dedicated hardware; instead, one can merely program the cells and switching fabric as necessary. Thus, if an algorithm must be redesigned, there is no need for hardware redesign; a user may merely change the programming as necessary. This is quite advantageous over traditional instruction-based computing approaches.
Although RICA thus offers robust advantages, challenges remain in its implementation. For example, it is conventional to arrange the instruction cells in a reconfigurable array by rows and columns. Each instruction cell, any associated register, and the input and output switching fabric may be considered to reside within a switch box. FIG. 1B shows an example array of switch boxes arranged in rows and columns. The switching fabric in each switch box must then accommodate a data path that might begin at a given switch box 100 at some row and column location and then end at some other switch box 105 at a different row and column location. In this data path, two instruction cells are configured as arithmetic logic units (ALUs) 110. The instruction cells for the remaining switch boxes are not shown for illustration clarity. Note that each switch box must then accommodate two switching matrices or fabrics: an input switching fabric to select for the inputs to its instruction cell and also an output switching fabric to select for the outputs from the switch box.
In contrast to an instruction cell, the logic block in a field programmable gate array (FPGA) uses lookup tables (LUTs). For example, suppose one needs an AND gate in the logic operations carried out in a configured FPGA. A LUT would then be programmed with the truth table for the AND gate logical function. But an instruction cell is much “coarser-grained” in that it contains dedicated logic gates. For example, an ALU instruction cell would include assorted dedicated logic gates. It is the function of the ALU instruction cell that is configurable; its primitive logic gates are dedicated gates and thus are non-configurable. For example, a conventional CMOS inverter is one type of dedicated logic gate. There is nothing configurable about such an inverter, and it needs no configuration bits. But the instantiation of an inverter function in an FPGA programmable logic block is instead performed by a corresponding programming of a LUT's truth table. Thus, as used herein, the term “instruction cell” refers to a configurable logic element that comprises dedicated logic gates.
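The distinction between a LUT and a dedicated gate can be illustrated with a minimal software sketch. It is only a behavioral model under stated assumptions: the names `make_lut` and `dedicated_inverter` are hypothetical and do not correspond to any particular FPGA or RICA implementation.

```python
def make_lut(truth_table):
    """Model a LUT: its behavior is fixed only by the programmed truth table.

    The table length must be a power of two (2**n entries for n inputs).
    """
    size = len(truth_table)
    n_inputs = size.bit_length() - 1
    if size < 2 or size != 1 << n_inputs:
        raise ValueError("truth table length must be a power of two >= 2")
    table = tuple(bit & 1 for bit in truth_table)

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        index = 0
        for bit in inputs:  # the first input is the most significant index bit
            index = (index << 1) | (bit & 1)
        return table[index]

    return lut


def dedicated_inverter(a):
    """Model a dedicated gate: fixed function, no configuration bits."""
    return (a & 1) ^ 1


# The same LUT structure becomes an AND gate or an inverter purely through
# the truth table it is programmed with.
lut_and = make_lut((0, 0, 0, 1))
lut_not = make_lut((1, 0))
```

In this model, `lut_and` and `lut_not` share one structure and differ only in their configuration data, whereas `dedicated_inverter` has no configuration at all.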
An instruction cell performs its logical functions on one or more operands to form an instruction cell output. An operand in this context is a received input channel. Depending upon its configuration bits, an instruction cell is configured to perform corresponding logical operations. For example, a first switch box may include an ALU instruction cell configured to add two operands corresponding to two channel inputs. But the same ALU instruction cell may later be reconfigured to subtract the two operands. The instruction cell output that results from the logical operation within the instruction cell may be required in another instruction cell. Thus, the output switch fabric in the first switch box would be configured to drive the instruction cell output out of the first switch box through corresponding channel outputs. In contrast, an FPGA's LUTs each produce a bit; they do not generate words. So the switch fabric in an FPGA is fundamentally different from the switch fabrics in a RICA in that an FPGA's switch fabric is configured to route the bits from the FPGA's LUTs. In contrast, the routing between switch boxes in a RICA is configured to route words as both input channels and output channels. For example, a switch box array may be configured to route twenty channels. Switch boxes in such an embodiment may thus receive twenty input channels from all four directions (as defined by the row and column dimensions) and drive twenty output channels in the four directions. The column dimension may be considered to correspond to the north and south directions for any given switch box. Similarly, the row dimension may be considered to correspond to the east and west directions.
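The reconfiguration of an ALU instruction cell from addition to subtraction may be sketched as follows. This is a simplified behavioral model: the class `AluCell`, the opcode values, and the 32-bit word width are illustrative assumptions rather than details of any actual RICA.

```python
WORD_BITS = 32                      # assumed channel word width
WORD_MASK = (1 << WORD_BITS) - 1

OP_ADD = 0                          # hypothetical configuration encodings
OP_SUB = 1


class AluCell:
    """Word-level instruction cell whose function is set by configuration."""

    def __init__(self):
        self.op = OP_ADD

    def configure(self, op):
        """Load new configuration bits; the cell's gates are unchanged."""
        if op not in (OP_ADD, OP_SUB):
            raise ValueError("unsupported operation")
        self.op = op

    def evaluate(self, a, b):
        """Apply the configured operation to two operand words."""
        if self.op == OP_ADD:
            return (a + b) & WORD_MASK
        return (a - b) & WORD_MASK


cell = AluCell()
cell.configure(OP_ADD)
total = cell.evaluate(7, 5)         # cell acts as an adder
cell.configure(OP_SUB)
difference = cell.evaluate(7, 5)    # same cell now acts as a subtractor
```

The operands and the result are whole words, which reflects the point that a RICA routes words on its channels rather than the single bits produced by LUTs.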
Each output channel from a switch box may be selected for by a corresponding channel output multiplexer within the switch box. Such a channel output multiplexer comprises a collection of output multiplexers, each output multiplexer corresponding to just one bit of the channel word width. The following discussion refers to the channel output multiplexer that selects for the entire channel, but it will be understood that such a channel output multiplexer actually comprises a plurality of output multiplexers each having a single bit output. With regard to any given output direction (e.g., north, south, east, or west), there are three remaining input directions. For example, a north output channel may be selected from the east, west, and south input channels. Each channel output multiplexer for a given output direction could thus comprise a 3:1 multiplexer. But an output channel may also be driven by a switch box's instruction cell output. Thus, each channel output multiplexer may comprise a 4:1 multiplexer in a RICA switch box. If the column channels are assumed to travel in north and south directions, a switch box would thus require twenty 4:1 channel output multiplexers to drive the north output channels and another twenty 4:1 channel output multiplexers to drive the south output channels in a twenty channel embodiment. Similarly, row channels may be assumed to travel in the east and west directions. Thus, a switch box in a twenty channel embodiment would include twenty 4:1 channel output multiplexers to drive the east output channels and twenty 4:1 channel output multiplexers to drive the west output channels. The resulting set of 4:1 channel output multiplexers for all four directions forms the output switch fabric for each switch box.
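A 4:1 channel output multiplexer built from single-bit multiplexers may be modeled as in the following sketch. The 16-bit word width, the ordering of the four sources, and the function names are assumptions made for illustration only.

```python
WORD_BITS = 16       # assumed channel word width
NUM_CHANNELS = 20    # twenty channel embodiment
DIRECTIONS = ("north", "south", "east", "west")


def bit_mux4(bits, select):
    """Single-bit 4:1 output multiplexer."""
    return bits[select]


def channel_output_mux(words, select):
    """4:1 channel output multiplexer formed from WORD_BITS bit multiplexers.

    `words` holds the four candidate sources. For a north output channel,
    an assumed ordering is (east in, west in, south in, instruction cell out).
    All bit multiplexers share the same select value.
    """
    if len(words) != 4 or not 0 <= select < 4:
        raise ValueError("expected four sources and a select in 0..3")
    out = 0
    for bit in range(WORD_BITS):
        bits = [(word >> bit) & 1 for word in words]
        out |= bit_mux4(bits, select) << bit
    return out


# Number of 4:1 channel output multiplexers in one switch box.
muxes_per_switch_box = NUM_CHANNELS * len(DIRECTIONS)

# Drive a north output channel from the instruction cell output (select 3).
north_out = channel_output_mux((0x1111, 0x2222, 0x3333, 0xBEEF), 3)
```

With twenty channels in each of four directions, the model yields eighty channel output multiplexers per switch box, matching the count given above.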
Each 4:1 channel output multiplexer requires two configuration bits to control which one of its four inputs is selected to drive its output channel. In a conventional RICA, these configuration bits are static: they are part of the configuration stream that also configures the logical operation of the instruction cells and the input switch fabric for each switch box. But certain applications such as multi-media applications require conditional moves that a static output switching fabric cannot accommodate.
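The static nature of these configuration bits may be sketched as a packed field of two-bit select values, one per channel output multiplexer. The packing layout and the helper names below are hypothetical; the source does not specify how the configuration stream is organized.

```python
SELECT_BITS = 2                  # two configuration bits per 4:1 multiplexer
NUM_CHANNELS = 20                # twenty channel embodiment
NUM_DIRECTIONS = 4               # north, south, east, west
NUM_MUXES = NUM_CHANNELS * NUM_DIRECTIONS

# Output switch fabric configuration bits per switch box in this model.
output_fabric_bits = NUM_MUXES * SELECT_BITS


def pack_selects(selects):
    """Pack per-multiplexer select values into one configuration integer.

    Assumed layout: multiplexer i occupies bits [2*i, 2*i + 1].
    """
    word = 0
    for i, select in enumerate(selects):
        if not 0 <= select < (1 << SELECT_BITS):
            raise ValueError("select must fit in two bits")
        word |= select << (i * SELECT_BITS)
    return word


def unpack_selects(word, count):
    """Recover the select value for each multiplexer from the packed word."""
    mask = (1 << SELECT_BITS) - 1
    return [(word >> (i * SELECT_BITS)) & mask for i in range(count)]


# A static configuration: every multiplexer select is fixed when the
# configuration is loaded and cannot depend on run-time data.
selects = [i % 4 for i in range(NUM_MUXES)]
config_word = pack_selects(selects)
```

Because every select value in this model is fixed by the loaded configuration, no data-dependent choice of source is possible, which is the limitation noted above for conditional moves.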
Accordingly, there is a need in the art for reconfigurable instruction cell arrays having output switch fabrics with conditional move capabilities.