The present invention relates to application-specific processor architectures and, more particularly, to an inner product processor architecture that is reconfigurable based upon the square recursive decomposition of the partial product matrix and the dynamic reconfiguration of basic multipliers and adders.
The design of an inner product processor is inevitably limited by the amount of VLSI area allotted to the processor. Using an excessive amount of VLSI area is a concern for both cost and performance. Under a restricted VLSI area, the design of such a processor often faces a conflict between versatility and computation speed. If the processor is designed to compute the inner product of two input arrays (vectors) whose item precision ranges from integer (say, an 8-bit number) to double precision (say, a 64-bit number), its multipliers must be large (such as 64×64 bits). The design then sacrifices the number of items per array that the processor can handle; in other words, the array size must be small. This is very inefficient for input arrays containing a large number of lower-precision items. On the other hand, if the multipliers of the processor are restricted to a small size (such as 8×8 or 16×16 bits), computation on input arrays with higher-precision items becomes impossible. This problem exists in all known inner product processor designs in the literature, including those that do not adopt a multiply/add approach, in which the number of items (N) and the number of bits (b) in each item are always fixed for an individual design. [See S. P. Smith and H. C. Torng, "Design of a Fast Inner Product Processor", Proc. IEEE Symp. on Computer Arithmetic, 1985.]
One of the objects of the present invention is to introduce novel reconfigurable inner product processor schemes that would resolve the design conflict between versatility and computation speed. Thus, it would be possible to build feasible and efficient arithmetic processors useful in most scientific and engineering applications, as also described in the applicant's copending patent application for PARALLEL VLSI SHIFT SWITCH LOGIC DEVICES, Ser. No. 09/022,248, filed on Feb. 11, 1998.
The processor of the current invention possesses the following features:
1) it can be easily reconfigured for computing inner products of input arrays with four or more types of structure. Typically, each input array may contain sixty-four 8-bit items, sixteen 16-bit items, four 32-bit items, or one 64-bit item, with items in 2's complement or unsigned form (i.e., with two number representation options);
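To make the four array structures concrete, the following minimal Python sketch models only the arithmetic result, not the hardware. It assumes each item arrives as a raw b-bit pattern that is read as either an unsigned or a 2's complement integer. The helper names `MODES`, `interpret`, `to_bits`, and `inner_product` are hypothetical and do not come from the invention. One observation from the listed numbers is that N × b × b = 4096 in every mode, so the total partial product area stays the same even though the operand width does not.

```python
# Hypothetical software model of the four array structures: item width b -> item count N.
# In every mode N * b * b == 4096 (same total partial product area).
MODES = {8: 64, 16: 16, 32: 4, 64: 1}

def interpret(bits: int, b: int, twos_complement: bool) -> int:
    """Read a b-bit pattern as an unsigned or a 2's complement integer."""
    if not 0 <= bits < (1 << b):
        raise ValueError(f"pattern does not fit in {b} bits")
    if twos_complement and bits >> (b - 1):
        return bits - (1 << b)  # top bit set: negative in 2's complement
    return bits

def to_bits(value: int, b: int) -> int:
    """Encode a (possibly negative) integer as its b-bit 2's complement pattern."""
    return value & ((1 << b) - 1)

def inner_product(xs, ys, b, twos_complement=True):
    """Inner product of two arrays of b-bit patterns in one of the four modes."""
    n = MODES[b]
    if len(xs) != n or len(ys) != n:
        raise ValueError(f"{b}-bit mode expects {n} items per array")
    return sum(interpret(x, b, twos_complement) * interpret(y, b, twos_complement)
               for x, y in zip(xs, ys))

# Usage: four 32-bit items per array, 2's complement representation.
xs = [to_bits(v, 32) for v in (1, -2, 3, -4)]
ys = [to_bits(v, 32) for v in (5, 6, -7, 8)]
print(inner_product(xs, ys, 32))  # -60
```

The same bit patterns give a different result when `twos_complement=False`. This is why the number representation is a configuration option rather than a property of the data.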
2) it can be pipelined to output an inner product every machine cycle (for example, 3 to 5 ns per cycle), and to complete an inner product evaluation in two to four cycles, which is particularly attractive for high-speed and efficient matrix multiplication applications;
3) it requires a compact VLSI area with very simple reconfigurable components. The processor consists mainly of an array of 8×8 or 4×4 simple multipliers, plus a few adder arrays of the same structure. The total amount of hardware is comparable to two 64×64 array multipliers;
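The reason a single array of small multipliers can serve every mode is easiest to see by counting. The text does not state the size of the array, so the sketch below derives a count under one assumption: each b×b item product is tiled into (b / b0)² square blocks of b0×b0 bits, with no sharing between items. The function name `basic_multipliers` is hypothetical.

```python
def basic_multipliers(n_items: int, b: int, b0: int) -> int:
    """Count b0 x b0 basic multipliers needed to form n_items products of b x b bits,
    assuming each b x b partial product matrix is tiled into (b // b0)**2 square blocks."""
    return n_items * (b // b0) ** 2

MODE_LIST = ((8, 64), (16, 16), (32, 4), (64, 1))  # (item width b, item count N)

for b0 in (8, 4):
    counts = [basic_multipliers(n, b, b0) for b, n in MODE_LIST]
    print(f"{b0}x{b0} basic multipliers per mode: {counts}")
# 8x8 basic multipliers per mode: [64, 64, 64, 64]
# 4x4 basic multipliers per mode: [256, 256, 256, 256]
```

Under this assumption, every mode uses the whole array: 64 multipliers of 8×8 or 256 multipliers of 4×4. Reconfiguration then only changes how the block products are shifted and summed.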
4) the whole network is reconfigured using a few control bits for the desired computations, and the reconfiguration can be done dynamically, in one machine cycle. Also, each reconfiguration switch is controlled by a single bit; and
5) the design is highly regular and modular; most parts of the network are symmetric and repeatable.
The processor architecture of the current invention is based on a novel linear recursive decomposition (called square recursive decomposition) of the partial product matrix, and on dynamically reconfigurable computation methods applied in particular to the basic multipliers and adder arrays.
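The following Python sketch shows one way to read "square recursive decomposition". It assumes that the partial product matrix of a 2h×2h-bit product is split into four h×h square quadrants using the standard divide-and-conquer identity a·b = a_hi·b_hi·2^(2h) + (a_hi·b_lo + a_lo·b_hi)·2^h + a_lo·b_lo, and that this splitting is repeated until the quadrants match the basic multiplier size. The invention's exact scheme may differ. The sketch handles unsigned operands only, because 2's complement operands would need sign corrections that are not modeled here. `square_decomp_mul` is a hypothetical name.

```python
def square_decomp_mul(a: int, b: int, width: int, base: int = 8) -> tuple[int, int]:
    """Return (a * b, number of base x base leaf products) for unsigned width-bit
    operands, where width is base times a power of two."""
    if width <= base:
        return a * b, 1  # leaf: one basic base x base multiplier
    h = width // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask
    # The 2h x 2h partial product matrix splits into four h x h square quadrants.
    hh, n1 = square_decomp_mul(a_hi, b_hi, h, base)
    hl, n2 = square_decomp_mul(a_hi, b_lo, h, base)
    lh, n3 = square_decomp_mul(a_lo, b_hi, h, base)
    ll, n4 = square_decomp_mul(a_lo, b_lo, h, base)
    # Recombine with shifts: a*b = hh*2^(2h) + (hl + lh)*2^h + ll
    return (hh << (2 * h)) + ((hl + lh) << h) + ll, n1 + n2 + n3 + n4

# Usage: a 64x64-bit product built from 8x8 leaf products.
x, y = 2**64 - 1, 0x0123456789ABCDEF
product, leaves = square_decomp_mul(x, y, 64)
assert product == x * y
print(leaves)  # 64
```

Each level of recursion turns one square into four smaller squares. A 64-bit product therefore bottoms out in 4³ = 64 leaf products of 8×8 bits, or 4⁴ = 256 leaf products with `base=4`. Stopping the recursion at a higher level (for example, at 16 bits) corresponds to treating each group of leaves as an independent smaller product.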
In accordance with the present invention, there is provided a processor architecture that is reconfigurable. The reconfigurable architecture is based upon square recursive decomposition of the partial product matrix and dynamic reconfiguration of basic multipliers and adders. The processor consists mainly of an array of 8×8 or 4×4 simple multipliers, plus a few adder arrays of the same structure. The total amount of hardware is comparable to two 64×64 array multipliers. The whole architecture is reconfigured using a few control bits for the desired computations. The reconfiguration can be done in one machine cycle. Also, each reconfiguration switch is controlled by a single bit. The design is highly regular and modular. Most parts of the architecture are symmetric and repeatable.
It is an object of the present invention to provide an improved processor architecture.
It is another object of this invention to provide a processor architecture that is based on a novel linear recursive decomposition (called square recursive decomposition) of the partial product matrix, and on dynamically reconfigurable computation methods.