Although processor speeds have been progressively increased, the need for increased computing power remains unabated. For example, smart phones now burden their processors with a bewildering variety of tasks. But a single-core processor can only accommodate so many instructions at a given time. Thus, it is now common to provide multi-core or multi-threaded processors that can process sets of instructions in parallel. But such instruction-based architectures must always battle the limits imposed by die space, power consumption, and complexity with regard to decreasing the instruction processing time.
As compared to the use of a programmable processing core, there are many algorithms that can be more efficiently processed in dedicated hardware. For example, image processing involves substantial parallelism and processing of pixels in groups through a pipeline of processing steps. If the algorithm is then mapped to hardware, the implementation takes advantage of this symmetry and parallelism. But designing dedicated hardware is expensive and also cumbersome in that if the algorithm is modified, the dedicated hardware must be redesigned.
To provide an efficient compromise between instruction-based architectures and dedicated hardware approaches, a reconfigurable instruction cell array (RICA) architecture has been developed. FIG. 1A illustrates an example RICA system 50 having a reconfigurable core 1. In RICA 50, a plurality of instruction cells 2 such as adders (ADD), multipliers (MUL), registers (REG), logic operation shifters (SHIFT), dividers (DIV), data comparators (COMP), logic gates (LOGIC), and logic jump cells (JUMP) are interconnected through a programmable switching fabric 4. The configuration of instruction cells 2 with regard to their logical function or instruction they implement can be reprogrammed every clock cycle as necessary to implement a given algorithm or function. Switching fabric 4 would be reprogrammed accordingly as well. Instruction cells 2 include memory interface cells 12 that interface data for instruction cells 2 as retrieved or loaded into a data memory 8. The resulting processing by instruction cells 2 occurs according to configuration instructions 10 obtained from a configuration RAM 6. A decode module 11 decodes instructions 10 to obtain the configuration data not only for instruction cells 2 but also for switching fabric 4. RICA 50 interfaces with external systems through I/O ports 16 and specialized instruction cell registers 14. Additional features shown in FIG. 1A are described in U.S. Patent Publication No. 2010/0122105, filed Apr. 28, 2006, the contents of which are hereby incorporated by reference in their entirety.
Note the advantages of a RICA: an algorithm such as image processing that involves processing multiple pixels through a pipelined processing scheme can be mapped to instruction cells in a manner that emulates a dedicated hardware approach. But there is no need to design dedicated hardware; instead, one can merely program the cells and switching fabric as necessary. Thus, if an algorithm must be redesigned, there is no need for hardware redesign but instead a user may merely change the programming as necessary. This is quite advantageous over traditional instruction-based computing approaches.
Although a RICA thus offers robust advantages, challenges remain in its implementation. For example, it is conventional to arrange the instruction cells in a reconfigurable array by rows and columns. Each instruction cell, any associated register, and the associated input and output switching fabric for the instruction cell may be considered to reside within a switch box. FIG. 1B shows an example array of switch boxes arranged in rows and columns. A datapath formed between selected switch boxes is carried on selected channels from a plurality of channels. The channels are also arranged in rows and columns matching the rows and columns for the switch boxes. Each channel has a certain width in bits. The row directions may be considered to run east and west whereas the column directions run north and south. A datapath beginning in an instruction cell in an initial switch box 100 routes from initial switch box 100 on a channel 101 in an east row direction. The routing for the datapath from subsequent switch boxes is in the appropriate east/west row direction or north/south column direction such that a final switch box 105 at some selected row and column position is reached. In this example datapath, two instruction cells are configured as arithmetic logic units (ALUs) 110. The instruction cells for the remaining switch boxes are not shown for illustration clarity. Note that each switch box must then accommodate two switching matrices or fabrics: an input switching fabric to select for channel inputs to its instruction cell and also an output switching fabric to select for the channel outputs from the switch box. This disclosure focuses on the output switching fabric.
The number of channels for a RICA is arbitrary. For example, suppose there are 20 channels, each 8 bits wide. The output switch fabric for any given direction for a switch box could then use 20*8=160 multiplexers to drive the 160 bits in the 20 channels. For example, initial switch box 100 would include 160 multiplexers to drive the 20 channels in east row direction 101 in such an embodiment. An example output switch fabric 150 is shown in FIG. 1C. Switch fabric 150 is configured to switch the channels with regard to north, south, east, and west directions. With regard to each direction, switch fabric 150 receives the channels on input conductors. Similarly, switch fabric 150 drives the channels in each direction on corresponding output conductors. As known in the integrated circuit layout arts, the routing of the channels occurs in tracks in corresponding metal layers. For example, the south input conductors for the channels are arranged in a track 171 that becomes the track for the north output conductors for the channels. Similar tracks cross switch fabric 150 for the north-to-south, east-to-west, and west-to-east routing. The channels are driven out of each side of switch fabric 150 on the output conductors by corresponding multiplexers.
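The multiplexer count in the example above follows directly from the channel count and the channel width. The following is a minimal sketch of that arithmetic only. The function name and the four-sided total are illustrative assumptions (the text states only the per-direction figure of 160), and one multiplexer per output bit is assumed.

```python
def output_mux_count(num_channels: int, width_bits: int, num_directions: int = 1) -> int:
    """Count output multiplexers, assuming one multiplexer per output bit.

    Hypothetical helper for illustration; not taken from the disclosure.
    """
    if num_channels < 0 or width_bits < 0 or num_directions < 0:
        raise ValueError("counts must be non-negative")
    return num_channels * width_bits * num_directions


# Figures from the text: 20 channels, each 8 bits wide, one output direction.
per_direction = output_mux_count(20, 8)

# Assumption: all four sides of the switch fabric are populated the same way.
all_four_sides = output_mux_count(20, 8, num_directions=4)
```

Under these assumptions, per_direction evaluates to 160, matching the text, and all_four_sides evaluates to 640.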
Although a “channel” is a signal that is distinct from the conductors on which it is carried, it is convenient to simply refer to a channel carried on corresponding input conductors as an “input channel.” Similarly, a channel carried on corresponding output conductors is an “output channel.” For example, a south switching circuit 155 includes the multiplexers to drive the south output channels. Similarly, an east switching circuit 160 includes the multiplexers to drive the east output channels, a west switching circuit 165 includes the multiplexers to drive the west output channels, and a north switching circuit 170 includes the multiplexers to drive the north output channels.
Referring again to FIG. 1B, the output channels for a given switch box's output switch fabric become the input channels for a neighboring switch box's output switch fabric. For example, channel 101 in FIG. 1B is the east output channel for initial switch box 100 whereas channel 101 is the west input channel for neighboring switch box 115.
By grouping all the output multiplexers in corresponding switching circuits, output switching fabric 150 of FIG. 1C suffers from a large degree of bus turning. In that regard, as known in the routing arts, the row and column routing is typically organized in corresponding tracks. With regard to a switching fabric, the track for input conductors in a given direction becomes the track for the output conductors in the opposing direction. Such tracking greatly simplifies the row and column routing. For example, a track 172 for the west input channels spans across the die space for north switching circuit 170 and east switching circuit 160. Track 172 does not run across the die space dedicated to south switching circuit 155. Because channel routing for the north and south directions cannot short to the channel routing for the east and west directions, the row and column routing occurs in dedicated metal layers. For example, a first metal layer (or layers) may be dedicated to the east/west row routing whereas a second metal layer (or layers) would carry the north/south column routing.
The west input channels must thus be “bus turned” in a different metal layer to be received at the multiplexers in south switching circuit 155. The west input channels could not route directly through the first metal layer to couple to south switching circuit 155 since they would then short to the east input channels in their track to south switching circuit 155. Analogous bus turning must occur for the other switching circuits. For example, the south input channels require bus turning to be received at east switching circuit 160. Such bus turning wastes die space, demands excessive power consumption, and leads to timing delays.
The channel switching for switch fabric 150 is conducted with regard to its north, south, west, and east sides of its footprint on its semiconductor substrate surface. With regard to any given footprint side, the corresponding switching circuit can select from the three remaining sides with regard to the input channel selection. For example, the multiplexers in south switching circuit 155 may select from the north input channels, the east input channels, and the west input channels. But south switching circuit 155 cannot select from the south input channels. Similarly, east switching circuit 160 may select from the input channels for the north, south, and west footprint sides. Such a restriction to the three remaining sides for the outputs from any given switch fabric footprint side is conventional in that it leads to considerable routing complexity reduction.
Much study has thus been expended on various switch fabric architectures that follow such a channel selection from the three remaining sides for any given switch fabric side. FIG. 2A shows one type of switch fabric architecture known as a disjoint matrix. In this example, there are five rows and five columns, each numbered from 0 to 4. Each one of the rows (or each one of the columns) may be thought of as representing a channel for a given data word. Thus, there are five data channels in this system. For illustration clarity, the input and output channels are not shown separately. Instead, a given channel such as west channel 4 represents both the west input channel 4 and the west output channel 4. In a disjoint matrix, a given channel is restricted to be routed into the same channel. For example, the data word for channel 0 carried on its west input can be switched to propagate in the north output for channel 0 but cannot be switched to propagate in the north output for the remaining channels 1 through 4. Each channel output for a switch fabric side facing a given cardinal direction (north, south, east, or west) can thus be selected by a 3:1 multiplexer (not illustrated) that selects from the remaining sides facing the remaining cardinal directions.
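The disjoint selection rule described above can be modeled in a few lines. The following is a minimal sketch rather than an implementation from the disclosure: the side labels and the names disjoint_mux_inputs and can_route_disjoint are hypothetical and are chosen only to mirror the prose.

```python
from typing import List, Tuple

SIDES = ("N", "S", "E", "W")  # the four cardinal sides of the switch fabric


def disjoint_mux_inputs(out_side: str, channel: int) -> List[Tuple[str, int]]:
    """Return the (input side, input channel) candidates of one 3:1 multiplexer.

    In a disjoint matrix the output for a channel can be fed only by the
    same-numbered channel arriving on each of the three remaining sides.
    """
    if out_side not in SIDES:
        raise ValueError(f"unknown side: {out_side!r}")
    return [(side, channel) for side in SIDES if side != out_side]


def can_route_disjoint(in_side: str, in_ch: int, out_side: str, out_ch: int) -> bool:
    """True if the disjoint matrix can switch the given input to the given output."""
    return (in_side, in_ch) in disjoint_mux_inputs(out_side, out_ch)


north0 = disjoint_mux_inputs("N", 0)
west0_to_north0 = can_route_disjoint("W", 0, "N", 0)
west0_to_north3 = can_route_disjoint("W", 0, "N", 3)
```

Under this model, the west input for channel 0 can reach the north output for channel 0 but not the north output for channel 3, which matches the restriction stated in the text.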
Note the advantage of the disjoint matrix: the 3:1 multiplexer can be located at the intersection of the row and column for a given channel. The inputs to the 3:1 multiplexer are available at that intersection, so no bus turning or spanning across other channels is needed to reach them. Such a disjoint switching fabric thus greatly simplifies the layout design. But this disjoint simplification comes at a considerable restriction in routing flexibility: a disjoint matrix provides no means for selecting from other channels with regard to any given channel output.
To provide a more flexible routing ability, a universal switch matrix and a Wilton switch matrix have been developed as shown in FIG. 2B and FIG. 2C, respectively. In these switch matrices or fabrics, the selection of the output signals for a channel in a given cardinal direction is not restricted to the same channel. For example, in the universal switch matrix, the output in channel 4 in the north direction can be selected from channel 0 west input, channel 4 south input, and channel 4 east input. Similarly, in the Wilton switch matrix, the output in channel 4 north can be selected from the inputs for channel 1 west, channel 0 east, and channel 4 south. But just like the disjoint matrix, each output in a given direction for a universal or Wilton switch matrix may be provided by a 3:1 multiplexer that selects from channel inputs from the remaining directions.
Regardless of the type of matrix, a given channel output in the column dimension is either headed in the north (N) direction or the south direction (S). Similarly, a given channel output in the row dimension is either headed in the west (W) direction or the east (E) direction. The input and output channels follow the same track regardless of the type of switching matrix. For example, the track for input channel 4 becomes the track for the output channel 4 in all the directions. In that regard, it is always the case (regardless of whether the matrix is disjoint, universal, or Wilton) that for a given channel in a given output direction, the same channel can be routed as an input with regard to the opposing cardinal direction. This same-channel-routing occurs for both the columns and the rows. Thus, a north input for a given channel can always be routed in that channel's south output. Conversely, a south input for a given channel can always be routed into that channel's north output. The analogous routing is true for the east and west outputs with regard to the west and east inputs. The possibility of selecting for another channel thus only exists when switching from the row dimension to the column dimension or vice versa. One of the inputs to the 3:1 multiplexing is thus always determined by the channel number and the opposite cardinal direction to the output.
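The invariant described above, in which the straight-through input always uses the same channel and only a row-to-column or column-to-row turn may change channel, can be sketched with a pluggable turn mapping. All names below are hypothetical. In particular, example_reversing_turn is not a definition of the universal matrix; it is an assumption constructed only to reproduce the single channel 4 north example given earlier for five channels.

```python
from typing import Callable, List, Tuple

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

# turn_map(in_side, out_side, out_channel, num_channels) -> input channel
TurnMap = Callable[[str, str, int, int], int]


def disjoint_turn(in_side: str, out_side: str, out_ch: int, n: int) -> int:
    """Disjoint matrix: a turn never changes the channel number."""
    return out_ch


def example_reversing_turn(in_side: str, out_side: str, out_ch: int, n: int) -> int:
    """Assumed mapping: only the west-to-north turn reverses the channel index."""
    if (in_side, out_side) == ("W", "N"):
        return n - 1 - out_ch
    return out_ch


def mux_inputs(out_side: str, out_ch: int, n: int, turn_map: TurnMap) -> List[Tuple[str, int]]:
    """List the three (side, channel) inputs of the 3:1 multiplexer for one output."""
    inputs = []
    for in_side in OPPOSITE:
        if in_side == out_side:
            continue  # an output never selects from its own side
        if in_side == OPPOSITE[out_side]:
            inputs.append((in_side, out_ch))  # straight through: same channel
        else:
            inputs.append((in_side, turn_map(in_side, out_side, out_ch, n)))
    return inputs


north4_example = mux_inputs("N", 4, 5, example_reversing_turn)
north4_disjoint = mux_inputs("N", 4, 5, disjoint_turn)
```

With five channels, north4_example lists channel 4 south, channel 4 east, and channel 0 west, whereas north4_disjoint lists channel 4 on all three remaining sides. In both cases the south entry is fixed by the output channel and its opposite direction.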
Although universal and Wilton switch matrices have routing flexibility as compared to a disjoint approach, that flexibility comes at the cost of routing complication. For example, the ability to select for channel 0 west input with regard to channel 4 north output in the universal switch matrix example discussed above means that the channel 0 west input to the switching means (such as a 3:1 multiplexer) must span at least the intervening row channels 1, 2, and 3. The wire or lead for such a span must be electrically isolated from the remaining row channel routing as discussed above with regard to bus turning. Thus, the spanning wire such as from channel 0 west input to the multiplexer for the channel 4 north output in the universal matrix must then be routed on a different metal layer from the normal row tracking as coupled to by vias. This bus turning complicates the layout and design considerably.
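The span in this example can be made concrete by listing the channels that lie between the source channel and the channel of the destination multiplexer. The helper below is a hypothetical sketch. It assumes that channels are laid out in index order, so that the crossed channels are simply the indices strictly between the two endpoints.

```python
from typing import List


def intervening_channels(src_channel: int, dst_channel: int) -> List[int]:
    """Channels strictly between two channel indices, assuming index-ordered tracks."""
    low, high = sorted((src_channel, dst_channel))
    return list(range(low + 1, high))


# Channel 0 west input feeding the multiplexer for channel 4 north output.
crossed = intervening_channels(0, 4)

# A disjoint matrix keeps the same channel, so nothing is crossed.
crossed_disjoint = intervening_channels(4, 4)
```

Here crossed is [1, 2, 3], the three row channels named in the text, while crossed_disjoint is empty.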
Accordingly, there is a need in the art for a switching fabric architecture that can provide routing flexibility yet simplify the associated routing complexities.