Many problems, particularly those in the area of signal processing, require the calculation of the correlation of two functions. Given two functions, X and Y, the correlation may be approximated through digital processes. In order to do this the functions must be sequentially sampled to produce the series (x.sub.1, x.sub.2, x.sub.3, ..., x.sub.s) and (y.sub.1, y.sub.2, y.sub.3, ..., y.sub.t). Using these series the correlation may be approximated as a series of summations of products of the elements of the X and Y series. Such a series of summations can be represented as ##EQU1## where the range of values of j over which the summation is performed is representative of the time period over which the correlation is to be calculated and the value of N is a measure of the bandwidth desired for the particular calculation.
The correlation series described above may be produced in various manners. On a general purpose computer each summation may be calculated directly from the values of the X and Y series. If the time-bandwidth product is large, however, or if a large number of x and y values are to be used, the time required to perform these calculations may be so great that real-time processing of rapidly accumulating data is precluded. Such is often the case in the area of signal processing.
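The direct calculation on a general purpose computer may be sketched as follows. Because the summation itself is not reproduced in the text above, this sketch assumes one common form, in which the sum for lag k is the sum over j of x.sub.j times y.sub.j+k, with k running from -(N-1) to N-1. The function name direct_correlation and the treatment of out-of-range samples as absent are illustrative assumptions, not taken from the source.

```python
def direct_correlation(x, y, n):
    """Return {k: sum over j of x[j] * y[j + k]} for lags k = -(n-1) .. n-1.

    Assumed form of the correlation series; products whose y index falls
    outside the sampled series are simply skipped.
    """
    result = {}
    for k in range(-(n - 1), n):
        total = 0
        for j in range(len(x)):
            if 0 <= j + k < len(y):
                total += x[j] * y[j + k]
        result[k] = total
    return result
```

Each of the 2N-1 sums requires a full pass over the sampled series, so the work grows with the product of the number of lags and the number of samples, which is the cost that motivates the parallel schemes discussed next.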
Parallel processing schemes provide methods of calculating a correlation series such as that shown above more rapidly. One such approach of the prior art is the use of a systolic array such as the one shown in prior art FIG. 1. The systolic array of FIG. 1 has a plurality of multiply-accumulate devices, 10, 11, 12, 13, and 14. A multiply-accumulate device typically has two data inputs. The device is adapted to accept pairs of numbers as input data, one member of each pair at each of the inputs, and to provide as output the sums of the products of consecutive pairs of numbers used.
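The behavior of a single multiply-accumulate device, as described above, may be modeled in a few lines. This is a software sketch only; the class name MultiplyAccumulate and the method name step are hypothetical and do not correspond to any particular hardware part.

```python
class MultiplyAccumulate:
    """Software model of one multiply-accumulate device (illustrative only)."""

    def __init__(self):
        self.total = 0  # running sum of products

    def step(self, a, b):
        """Accept one pair of inputs and return the updated sum of products."""
        self.total += a * b
        return self.total


mac = MultiplyAccumulate()
mac.step(2, 3)         # running sum is 6
print(mac.step(4, 5))  # 26
```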
In FIG. 1 five multiply-accumulate devices are shown. Typically more than this would be provided. In order to provide the most efficient processing, 2N-1 multiply-accumulate devices are required, where N is the same N as in the correlation series described above. Additionally, FIG. 1 shows two shift registers, each having 2N-1 stages.
In operation the values x.sub.j are inserted into shift register stage 15 and advanced through stages 16, 17, 18, and 19, while the values y.sub.j are inserted into shift register stage 20 and advanced through stages 21, 22, 23, and 24. During a cycle of the apparatus each multiply-accumulate device which is in use during that cycle receives an x value at one input and a y value at the other input. For example, when multiply-accumulate device 10 is in use it receives an x value from shift register stage 15 at input 25 and a y value from shift register stage 24 at input 26.
One of the disadvantages of the prior art is that not all multiply-accumulate devices are used during any one cycle. The reason for this may be more clearly seen by reference to FIGS. 2A and 2B. Those figures show systolic arrays of the prior art where N is equal to 3, i.e. five stage shift registers are used.
FIG. 2A shows the state of such a system during a first clock cycle. Shift register stages 15, 17, and 19 contain the values x.sub.j, x.sub.j-1, and x.sub.j-2 respectively. Likewise shift register stages 20, 22, and 24 contain the values y.sub.j, y.sub.j-1, and y.sub.j-2. Each stage of a shift register transmits the value which it contains to the multiply-accumulate device input associated therewith. As illustrated, multiply-accumulate device 10 calculates ##EQU2## over the course of the processing, while multiply-accumulate device 12 calculates the sum of the products x.sub.j y.sub.j and multiply-accumulate device 14 calculates ##EQU3##
FIG. 2B illustrates the state of the system during the clock cycle following that of FIG. 2A. The values in the shift registers have advanced so that stages 16 and 18 now contain the values x.sub.j and x.sub.j-1 respectively, while stages 21 and 23 contain the values y.sub.j and y.sub.j-1. As shown, multiply-accumulate device 11 calculates ##EQU4## and multiply-accumulate device 13 calculates ##EQU5##
As shown in FIGS. 2A and 2B only a portion of the stages of the shift registers contain x or y values during any given clock cycle and only those multiply-accumulate devices associated with stages containing values are operative. If x and y values were to be loaded into the shift registers during every time period, so that all stages contained an x or y value at all times, every other x and y value would shift past one another and half of the desired summations would not be calculated.
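The limitation just described may be checked with a small simulation of the two counter-flowing shift registers. The sketch below is illustrative only: the names sample_at and simulate_systolic and the stride parameter are hypothetical, and each sample is tagged with its index purely so that the simulation can report which lag each product belongs to. A stride of 2 loads a new sample on every other cycle, as in FIGS. 2A and 2B; a stride of 1 loads a sample on every cycle.

```python
def sample_at(series, t, stride):
    """Return (index, value) if a sample enters the register at cycle t, else None."""
    idx, rem = divmod(t, stride)
    if rem == 0 and idx < len(series):
        return (idx, series[idx])
    return None


def simulate_systolic(x, y, n, stride):
    """Simulate 2n-1 multiply-accumulate devices fed by two shift registers.

    Device i reads stage i of the x register and stage (2n-2-i) of the
    y register, so the two series move past one another in opposite directions.
    Returns {lag: accumulated sum of x[j] * y[j + lag]} for the lags formed.
    """
    length = 2 * n - 1
    x_reg = [None] * length
    y_reg = [None] * length
    sums = {}
    for t in range(stride * max(len(x), len(y)) + length):
        # Advance both registers by one stage and load the first stage.
        x_reg = [sample_at(x, t, stride)] + x_reg[:-1]
        y_reg = [sample_at(y, t, stride)] + y_reg[:-1]
        for i in range(length):
            a, b = x_reg[i], y_reg[length - 1 - i]
            if a is not None and b is not None:  # device i is active this cycle
                lag = b[0] - a[0]
                sums[lag] = sums.get(lag, 0) + a[1] * b[1]
    return sums


x, y = [1, 2, 3, 4], [5, 6, 7, 8]
print(sorted(simulate_systolic(x, y, 2, 2)))  # [-1, 0, 1]
print(sorted(simulate_systolic(x, y, 2, 1)))  # [-2, 0, 2]
```

With a stride of 2 every lag from -(N-1) to N-1 is formed, but only alternate devices are active on any given cycle. With a stride of 1 every device is active, yet only even lags appear, because samples whose indices differ by an odd amount pass between devices and never meet at one.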