Logic configurability is impractical, and in many instances not possible, with traditional synchronous shift register circuits, especially those that incorporate feed-back paths. The impracticality in providing configurability to synchronous shift register based circuits is a result of the fact that the number of data storage elements in synchronous shift register based circuits strictly governs the number and distribution of logical data bits in the circuit.
In many applications, a system, such as a processor chip or the like, may involve many different permutations of a particular logic algorithm. For example, an encoder may apply different formulae to implement different encoding schemes; a cyclic redundancy checker may apply different formulae to compute different checksums; and spread-spectrum communication systems and cryptographic engines may generate many different random code words.
It is possible to use software to implement variations in the configuration of logic based on a conventional shift register circuit. However, due to performance issues, such as real-time requirements, software-based solutions are often insufficient or incompatible with other system requirements. Another possibility is to implement different permutations of a logic algorithm by simply designing and building a distinct logic circuit for each permutation. Of course, this solution, while likely meeting performance requirements, greatly increases design effort, chip complexity, and the amount of valuable chip real estate consumed.
A conventional synchronous shift register 10 involves a series of data storage elements connected such that data bits propagate serially through the register. FIG. 1 shows one example of a conventional circuit implementation of a synchronous shift register 10, where each data storage element is a latch labeled “L”. On every positive clock edge at its clock input, each latch copies the data bit at its input to its output.
A fundamental property of such a synchronous shift register circuit is that the data bit at the input of each storage element is assumed to be valid on every clock edge. Therefore, every register stage stores a valid data item, and the number of register stages (e.g., the number of latches) strictly governs the number of data items in the register. An N-stage register stores N bits, and every bit shifts down the register (from one latch to the adjacent latch) by one position on every clock cycle.
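The shifting behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular circuit: the function name `clock_edge`, the list representation of the latches, and the sample input bits are all assumptions introduced here.

```python
def clock_edge(stages, data_in):
    """Model one positive clock edge of a synchronous shift register.

    `stages` lists the bit held by each latch, first latch first.
    Every latch takes the value of its predecessor, the first latch
    takes `data_in`, and the bit in the last latch is shifted out.
    """
    return [data_in] + stages[:-1]


# Hypothetical 5-stage register: it always holds exactly 5 bits.
stages = [0, 0, 0, 0, 0]
for bit in [1, 0, 1, 1, 0]:
    stages = clock_edge(stages, bit)

print(stages)       # the five most recent input bits, newest first
print(len(stages))  # the stage count fixes the number of stored bits
```

The list length never changes, which mirrors the point in the text: the number of stored data items is bound to the number of register stages.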
Many computational circuits apply combinational logic on bits stored in a shift register. Two conventional shift register circuit families, feed-forward and feed-back, are shown in FIGS. 2 and 3. FIG. 2 shows an example of a feed-forward shift register circuit 12, where some combinational logic block 14 may operate on bits M(3) and N(5) of the shift register. In general, the logic can operate on any number of bits in the register. The logic, however, is limited to operating on bits where there is a connection to the combinational logic, e.g., M(3) (after the third latch L3) and N(5) (after the fifth latch L5). FIG. 3 shows a circuit 16 in a feed-back configuration, where a combinational logic block 18 may operate on an input, e.g., bits M(3) and N(5) of the shift register; the result is fed back to the beginning of the register. In both cases, the combinational logic block may produce one or more outputs. Notably, each circuit is limited by the number of register elements (latches) in the circuit and the position of the connections or taps between the shift register circuit and the combinational logic. A tap copies the value of a data bit and feeds it elsewhere. In the circuits of FIGS. 2 and 3, bits M(3) and N(5) have taps to provide the data to the combinational logic 14 and 18, respectively.
FIGS. 4 and 5 show examples of actual circuits that share the same basic structures as the ones in FIGS. 2 and 3, respectively. The feed-forward circuit 20 of FIG. 4 generates the outputs
out0[t] = b_k[t] ∧ b_{k+2}[t]
out1[t] = b_k[t] ∨ b_{k+2}[t]  (1)
where t denotes the clock cycle, b_k and b_{k+2} represent two data bits two positions apart, ∧ represents the logical AND operation performed by the AND gate 22, and ∨ represents the logical OR operation performed by the OR gate 24.
Because the number and order of bits in a shift register are strictly bound to the structure of the circuit, each circuit can only implement a set combinational formula. For instance, the circuit of FIG. 4 always produces outputs according to (1). The circuit cannot implement another formula without additional circuitry.
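As a rough illustration of why such a feed-forward circuit is bound to a single formula, the following Python sketch models two fixed taps feeding an AND gate and an OR gate. The function name, the 1-based tap index `k`, the sample register contents, and the assignment of the AND result to the first output and the OR result to the second are assumptions made for this sketch.

```python
def feed_forward_outputs(stages, k):
    """Compute the two outputs of a feed-forward circuit with fixed taps.

    `stages[0]` holds bit b_1, so b_k is stages[k - 1] and b_{k+2} is
    stages[k + 1]. The taps are two positions apart, as in the text.
    """
    b_k = stages[k - 1]
    b_k2 = stages[k + 1]
    out0 = b_k & b_k2   # AND gate (assumed to drive the first output)
    out1 = b_k | b_k2   # OR gate (assumed to drive the second output)
    return out0, out1


# Hypothetical register contents b_1..b_5 during one clock cycle.
stages = [1, 0, 1, 0, 0]
print(feed_forward_outputs(stages, k=1))  # taps b_1 and b_3
print(feed_forward_outputs(stages, k=2))  # taps b_2 and b_4
```

In software, changing `k` is trivial. In the hardware described here, the taps are wires, so choosing a different pair of bits or a different formula means adding circuitry.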
It is possible to attach a tap to every register stage and include a multiplexer to select a subset of the tapped bits and feed them into different combinational logic blocks. However, this involves a large overhead in circuit complexity and latency, especially if the shift register is very long, i.e., includes a relatively large number of stages. As a result, it is usually more efficient to design a separate register and combinational logic circuit for each permutation of logic to be implemented. While separate circuits address the complexity and some of the latency issues, both approaches, the multiplexer and the separate circuits, can consume substantial chip real estate.
The feed-back circuit 26 of FIG. 5 illustrates another aspect of configurability that is restricted by the use of a simple shift register. The circuit generates the next input bit b_1 to the register according to the formula:
b_1[t+1] = (b_2[t] ⊕ b_5[t]) ⊕ in[t].  (2)
The symbol ⊕ represents the XOR function performed by the XOR gates 28 and 30. Because the input to the register is a function of its current state, the number of data bits in the shift register is pertinent to the correctness of the computation. For example, the circuit of FIG. 5 computes the remainder of a binary division of the input polynomial by the polynomial represented by the shift register structure, in this case x^5 + x^2 + 1. It is not possible to use the same circuit to compute the remainder of a division by a polynomial of a different order, for example, x^7 + x^3 + 1. This is because there are always five data bits in the shift register, and therefore the circuit can only compute the remainder by a fifth order polynomial. Thus, conventionally, a different circuit must be designed and built for each and every computation. As discussed in greater detail below, polynomial computations of different orders occur in a wide variety of applications.
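The feed-back update of formula (2) can be stepped through in software as a minimal sketch. The function names `step` and `run`, the all-zero initial state, and the sample input sequence are assumptions introduced here for illustration; the taps at the second and fifth stages and the five-stage length follow the formula as stated.

```python
def step(state, data_in):
    """Apply formula (2) once: b_1[t+1] = (b_2[t] XOR b_5[t]) XOR in[t].

    `state[0]` is b_1 and `state[4]` is b_5. All other bits shift down
    by one position, exactly as in a plain shift register.
    """
    b_2, b_5 = state[1], state[4]
    new_b_1 = (b_2 ^ b_5) ^ data_in
    return [new_b_1] + state[:-1]


def run(bits):
    """Clock a sequence of input bits through the 5-stage register."""
    state = [0, 0, 0, 0, 0]  # assumed all-zero starting state
    for bit in bits:
        state = step(state, bit)
    return state


print(run([1, 0, 0, 0, 0, 0]))  # register contents after six clock cycles
```

The tap positions and the state length are hard-coded, which is the software analogue of the restriction in the text: handling a polynomial of a different order would need a different register length and different taps, i.e., a different circuit.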
It may also be possible to build a synchronous pipeline with multiple parallel bypass paths, each with a different number of pipeline stages, and select between these paths to allow configurability. However, as with the other conventional solutions set out above, this is complex and may involve a prohibitive amount of overhead and real estate, driving designers to implement separate circuits for different computations.
In light of these and other problems in the art, the following disclosure describes various reconfigurable circuit implementations and discusses how to build on the reconfigurable circuit concepts set out herein to realize other implementations. Reconfigurable circuits as set out below alleviate the need to build a separate circuit for each computation or to deploy other impractical circuit implementations. Further, some of the solutions set out below illustrate how it is possible to decouple the logical data distribution from the physical circuit structure using asynchronous control protocols. By loading different distributions of control tokens into a circuit, it is possible to use one circuit to implement many different desired logical permutations of an algorithm.