Systolic arrays are now widely used in signal and image processing applications. An important reason for this is that signal processing algorithms that perform similar operations repeatedly on large amounts of data can often be partitioned so that many operations are performed in parallel and these parallel operations are repeated many times. Two properties are key to systolic array design. First, all communication is local (i.e., the length of each communication path is constant, independent of the array size). Second, the system is fully pipelined, or systolized. That is, all combinational paths are confined within individual processors, and all data transfers between processors pass through delay registers.
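The two systolic requirements can be illustrated with a small cycle-level model. The sketch below is a hypothetical linear systolic FIR filter, chosen for illustration and not taken from this application. Weights stay in the cells, input samples move right through two registers per cell, and partial sums move right through one register per cell. Each cell reads only its left neighbor's registers, and every value crossing a cell boundary passes through a register, so no combinational path spans more than one cell. The function name `systolic_fir` and the register arrangement are assumptions.

```python
def systolic_fir(weights, xs):
    """Cycle-level model of a hypothetical linear systolic FIR filter.

    Computes y[m] = sum_k weights[k] * xs[m - k], treating xs[n] as 0 for n < 0.
    Cell k holds weights[k]. Samples advance through two registers per cell and
    partial sums through one, so x[m - k] meets the partial sum for y[m] at cell k.
    """
    K = len(weights)
    xr1 = [0] * K  # first x delay register in each cell
    xr2 = [0] * K  # second x delay register in each cell
    yr = [0] * K   # partial-sum output register in each cell
    out = []
    for t in range(len(xs) + K - 1):
        x_src = xs[t] if t < len(xs) else 0
        # Combinational phase: each cell sees only its left neighbor's registers.
        x_in = [x_src] + xr2[:-1]
        y_in = [0] + yr[:-1]
        y_new = [y_in[k] + weights[k] * x_in[k] for k in range(K)]
        # Clock edge: all registers load simultaneously.
        xr2 = xr1
        xr1 = x_in
        yr = y_new
        # Once the pipeline has filled, the last cell holds y[t - (K - 1)].
        if t >= K - 1:
            out.append(yr[-1])
    return out


print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0], the impulse response
```

Because every transfer is registered, the clock period is set by one cell's multiply-add and does not grow with the number of cells K. Only the latency grows, by one cycle per cell.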
Digit-serial computation is carried out in a synchronous architecture in which data is strictly clocked from one operator to the next, and arithmetic units often include internal pipeline stages. Combinational paths are usually kept short, and the outputs of individual operators are usually latched. Digit-serial computation is thus compatible with systolic array design. Indeed, digit-serial circuits are sometimes described as systolic, differing from systolic arrays only in their lack of a regular connection topology. It is natural, then, to try to use digit-serial arithmetic to implement the individual processors in a systolic array. Allowing a choice of digit size provides the flexibility to trade off hardware cost against throughput.
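The digit-size trade-off can be illustrated with a simple digit-serial adder. In this minimal sketch, a W-bit word is split into W/d digits of d bits each. The digits are processed least significant first, one per clock cycle, and the carry is held in a register between cycles. Setting d = W gives a bit-parallel adder (one cycle, W-bit datapath), and d = 1 gives a bit-serial adder (W cycles, 1-bit datapath). The function names and the unsigned, modulo-2^W arithmetic are assumptions for illustration, not the circuits of the cited work.

```python
def to_digits(value, width, d):
    """Split an unsigned width-bit value into width // d digits, least significant first."""
    mask = (1 << d) - 1
    return [(value >> (i * d)) & mask for i in range(width // d)]


def from_digits(digits, d):
    """Reassemble digits (least significant first) into an integer."""
    return sum(digit << (i * d) for i, digit in enumerate(digits))


def digit_serial_add(a, b, width, d):
    """Add two unsigned width-bit words, d bits per clock cycle.

    Returns (sum modulo 2**width, number of clock cycles used).
    """
    if width % d != 0:
        raise ValueError("digit size must divide the word width")
    carry = 0  # carry register, latched between clock cycles
    sum_digits = []
    for da, db in zip(to_digits(a, width, d), to_digits(b, width, d)):
        s = da + db + carry                  # d-bit adder with carry-in
        sum_digits.append(s & ((1 << d) - 1))
        carry = s >> d                       # carry-out is registered for the next digit
    return from_digits(sum_digits, d), len(sum_digits)


for d in (1, 4, 16):
    print(d, digit_serial_add(0x1234, 0x0FFF, 16, d))  # 8755 (0x2233) in 16, 4, 1 cycles
```

The adder hardware shrinks roughly in proportion to d, while the number of cycles per word grows as W/d. This is the cost/throughput trade-off the digit size controls.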
In translating a systolic array that uses bit-parallel arithmetic into one that uses digit-serial arithmetic, one must ensure that the functionality of the array is unchanged. Digit-serial operations have different timing characteristics from bit-parallel operations, so the timing of the array may change, and with it the functionality. P. F. Corbett and R. I. Hartley outline a simple method for translating an array of bit-parallel processors into a corresponding array of digit-serial processors with a chosen digit size. Their paper, "Use of Digit-serial Computation in Systolic Arrays," appears on pages 1-4 of the IEEE International Symposium on Circuits & Systems, published in May 1990, and is incorporated in its entirety into this application. As long as a simple criterion is satisfied, the original bit-parallel array and its digit-serial version are guaranteed to have identical functionality.
Despite their high computational efficiency, systolic arrays have some potential disadvantages. One of these is low processor utilization. In many systolic designs, processors cannot be supplied with data on every clock cycle: data arrives on only one of every α cycles, where α is some integer greater than one. The utilization of each processor is therefore reduced to 1/α. However, the processor must still meet all the performance requirements of a fully utilized processor. Whenever it is required to process data, it must do so in the same time as a fully utilized processor would. In such arrays, it is impossible to maintain a data rate of one sample per systolic cycle in the data pipelines. Valid data must instead be separated by dummy or zero-valued data to meet the systolic requirements.
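A minimal sketch of the data format described above follows. It assumes an arbitrary α and uses zero as the dummy value. Each valid sample is followed by α - 1 dummy slots, so a processor fed by this stream does useful work on only one cycle in α. The helper names are hypothetical.

```python
def separate_with_dummies(samples, alpha, dummy=0):
    """Return (stream, valid): each sample is followed by alpha - 1 dummy slots."""
    if alpha < 1:
        raise ValueError("alpha must be a positive integer")
    stream, valid = [], []
    for s in samples:
        stream.append(s)
        valid.append(True)
        stream.extend([dummy] * (alpha - 1))
        valid.extend([False] * (alpha - 1))
    return stream, valid


def utilization(valid):
    """Fraction of clock cycles on which the processor receives valid data."""
    return sum(valid) / len(valid)


stream, valid = separate_with_dummies([5, 6, 7], alpha=3)
print(stream)              # [5, 0, 0, 6, 0, 0, 7, 0, 0]
print(utilization(valid))  # one third, i.e. 1/alpha
```

The processor is idle on the dummy cycles. It must still complete each valid operation within a single cycle, because the next valid sample could otherwise arrive before the previous one is finished.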
The low utilization of processors arises because of cycles in the data flow graph of the array. Typically, projecting the data dependency graph onto a reasonable number of processors yields an array containing loops whose duration is greater than one systolic cycle. A result that must travel around such a loop before the next data item can use it becomes available only after the full loop duration, so new data cannot enter the loop on every cycle. To implement the systolic algorithm correctly, it is therefore necessary to separate the data by dummy or zero values.
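The effect of such a loop can be seen in a cycle-level model of the first-order recurrence y[n] = a·y[n-1] + x[n]. Here the recurrence is mapped onto a processor whose feedback path contains α registers, so the value fed back on any cycle is the one computed α cycles earlier. The recurrence and the function names are hypothetical stand-ins chosen for illustration, not an array from this application. With α - 1 dummy zeros between valid inputs, the valid output slots reproduce the recurrence exactly. Feeding a new input on every cycle does not.

```python
def pipelined_loop(a, stream, alpha):
    """Model y_t = a * y_{t-alpha} + x_t: the feedback path holds alpha registers."""
    loop_regs = [0] * alpha  # loop_regs[0] is newest; loop_regs[-1] was computed alpha cycles ago
    outputs = []
    for x in stream:
        y = a * loop_regs[-1] + x
        loop_regs = [y] + loop_regs[:-1]  # clock edge: shift the loop registers
        outputs.append(y)
    return outputs


def reference(a, xs):
    """Direct evaluation of y[n] = a * y[n-1] + x[n] with y[-1] = 0."""
    ys, y = [], 0
    for x in xs:
        y = a * y + x
        ys.append(y)
    return ys


a, alpha, xs = 2, 3, [1, 2, 3]
separated = [v for x in xs for v in [x] + [0] * (alpha - 1)]
print(pipelined_loop(a, separated, alpha)[::alpha])  # [1, 4, 11]
print(reference(a, xs))                              # [1, 4, 11]
print(pipelined_loop(a, xs, alpha))                  # [1, 2, 3], wrong without dummies
```

Without the dummies, each input combines with a result from α samples back rather than from the previous sample, so the outputs are incorrect.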
Several techniques have been proposed to make use of the idle cycles in these arrays. In one technique, the array hardware remains much the same but is utilized on every cycle. This is achieved by interleaving problem instances on a single array. The interleaved instances may be identical copies of the same problem, which provides some tolerance to transient faults in the network. Two or more outputs that should be identical in the fault-free case are compared. The case α = 2 allows fault detection, and cases where α > 2 allow fault correction through voting. An alternative technique is to increase the overall throughput of the systolic array by interleaving different problem instances onto the array. Each of these techniques can require additions to the processor hardware to separate data belonging to different problem instances. They also require more complex external interfaces to control the interleaving. Another technique is to multiplex data streams through the processors. This requires many registers and multiplexers, as well as control logic, and therefore adds considerable complexity.
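The interleaving and voting schemes can be sketched with a cycle-level model of the first-order recurrence y[n] = a·y[n-1] + x[n]. It runs on a processor whose feedback path holds α registers, so each result is fed back α cycles after it is computed. In this model, α independent instances are merged round-robin, and each fills slots that would otherwise carry dummies. Each instance still sees its own previous result on the feedback path. When the α instances are identical copies, the de-interleaved results can be compared (α = 2) or majority-voted (α > 2). The recurrence, the function names, and the injected fault are all hypothetical.

```python
from collections import Counter


def pipelined_loop(a, stream, alpha):
    """Model y_t = a * y_{t-alpha} + x_t: feedback path with alpha registers."""
    regs = [0] * alpha
    out = []
    for x in stream:
        y = a * regs[-1] + x
        regs = [y] + regs[:-1]  # clock edge
        out.append(y)
    return out


def interleave(instances):
    """Round-robin merge: slot n * alpha + i carries sample n of instance i."""
    return [inst[n] for n in range(len(instances[0])) for inst in instances]


def deinterleave(stream, alpha):
    """Split a round-robin stream back into alpha separate result streams."""
    return [stream[i::alpha] for i in range(alpha)]


def detect(copy0, copy1):
    """alpha = 2: flag positions where the two copies disagree."""
    return [u != v for u, v in zip(copy0, copy1)]


def vote(copies):
    """alpha > 2: per-position majority value, or None if there is no majority."""
    result = []
    for values in zip(*copies):
        value, count = Counter(values).most_common(1)[0]
        result.append(value if count > len(values) // 2 else None)
    return result


# Three different instances share one loop at full utilization.
outs = deinterleave(pipelined_loop(2, interleave([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 3), 3)
print(outs)  # [[1, 4, 11], [4, 13, 32], [7, 22, 53]]

# Three identical copies, with a transient fault injected into one result.
copies = deinterleave(pipelined_loop(2, interleave([[1, 2, 3]] * 3), 3), 3)
copies[1][1] += 100
print(vote(copies))                  # [1, 4, 11]
print(detect(copies[0], copies[1]))  # [False, True, False]
```

The model covers only the data arrangement. It omits the extra hardware for tagging data by instance and the external interleaving control that these techniques can require.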
The inventors propose an alternative approach which increases processor utilization to unity. Rather than increasing the utilization of the standard processor hardware, the technique the inventors propose allows throughput to be maintained while reducing the hardware by a factor approaching α. The technique is to divide the data words into α sections, or digits, and to process these digits serially whenever it would otherwise be necessary to separate the data by dummy or zero values. Digit-serial data inherently has a data rate that is an integer submultiple of the parallel data rate, and the digit-size may be chosen so as to meet the data rates required by the systolic algorithm. The advantages of this technique are two-fold. First, the number of wires required for the transmission of data is decreased. Second, the fully-utilized digit-serial computational elements that replace the under-utilized bit-parallel computational elements generally have fewer components, even taking into account the data format conversion circuitry needed at their inputs.
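The data-format change behind this approach can be sketched as follows, assuming unsigned W-bit words whose width is divisible by α. In the bit-parallel schedule, each word occupies one cycle on a W-bit bus, followed by α - 1 dummy cycles. In the digit-serial schedule, the same word is sent as α digits of W/α bits each, least significant digit first, one digit per cycle. Both schedules take the same number of cycles, so throughput is unchanged. The digit-serial bus, however, is α times narrower and carries valid data on every cycle. The function names are hypothetical, and the sketch models only the data format, not the digit-serial arithmetic or the conversion circuitry.

```python
def parallel_schedule(words, alpha):
    """Bit-parallel stream: one full word per valid cycle; None marks a dummy cycle."""
    return [slot for w in words for slot in [w] + [None] * (alpha - 1)]


def digit_serial_schedule(words, width, alpha):
    """Digit-serial stream: each word split into alpha digits, LSD first, one per cycle."""
    if width % alpha != 0:
        raise ValueError("alpha must divide the word width")
    d = width // alpha  # digit size in bits, i.e. the digit-serial bus width
    mask = (1 << d) - 1
    return [(w >> (i * d)) & mask for w in words for i in range(alpha)]


def reassemble(digits, width, alpha):
    """Inverse of digit_serial_schedule: rebuild full words from their digits."""
    d = width // alpha
    return [sum(digits[j * alpha + i] << (i * d) for i in range(alpha))
            for j in range(len(digits) // alpha)]


def busy_fraction(schedule):
    """Fraction of cycles carrying valid data (not a dummy slot)."""
    return sum(slot is not None for slot in schedule) / len(schedule)


W, ALPHA, words = 16, 4, [0x1234, 0xABCD]
par = parallel_schedule(words, ALPHA)
ser = digit_serial_schedule(words, W, ALPHA)
print(len(par), busy_fraction(par), W)           # 8 cycles, 0.25 busy, 16-bit bus
print(len(ser), busy_fraction(ser), W // ALPHA)  # 8 cycles, 1.0 busy, 4-bit bus
print([hex(v) for v in ser])  # ['0x4', '0x3', '0x2', '0x1', '0xd', '0xc', '0xb', '0xa']
assert reassemble(ser, W, ALPHA) == words
```

The two schedules deliver the same words in the same number of cycles. Only the width of the path and the fraction of cycles doing useful work differ, which is where the reduction in wiring and hardware comes from.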