Known digital processors for convolution and correlation implemented as bit-level systolic arrays are described in British Patent Application No. 2106287A published Apr. 7, 1983 (Reference 1), in which FIGS. 15 to 20 refer to a convolver. The equivalent U.S. Pat. Nos. are 4,533,993 and 4,639,857. This device consists of a rectangular array of gated full adder logic cells arranged in rows and columns. Each cell is connected to its immediate row and column neighbours only, i.e. each cell is connected to a maximum of four other cells. Cell operation is controlled by clocked latches which effect movement of data, coefficient, carry and sum bits through the array. Each cell evaluates the product of an input data bit and an input coefficient bit received from left and right hand neighbours respectively, and adds the product to input carry and cumulative sum bits received from the right and above respectively. New carry and cumulative sum bits are generated for output to the left and below, and the input data and coefficient bits pass to the right and left respectively. Each coefficient word is circulated bit serially through a respective array row. Each data word passes through each row in succession and in effect spirals (strictly speaking zig-zags) up the array. Successive carries move with coefficient bits, and successive cumulative sums move down array columns. Data moves in counterflow with respect to both cumulative sum generation and coefficient and carry propagation. Cumulative sum generation is cascaded down array columns to produce partial sums output from the array. Partial sums of like bit significance emerge from the same array column in succession, and are accumulated to form convolution results by full adders arranged for output sum feedback.
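The logic function of a single gated full adder cell, as described above, can be sketched as follows. This is a minimal illustration in Python of the combinational arithmetic only; the function name and argument names are hypothetical, and the clocked latches and the directions of bit movement between neighbouring cells are not modelled.

```python
def gated_full_adder(x, a, s_in, c_in):
    """One cell evaluation (hypothetical helper, combinational logic only).

    x    : input data bit
    a    : input coefficient bit
    s_in : cumulative sum bit received from the cell above
    c_in : carry bit received from the neighbouring cell
    """
    p = x & a                                        # single-bit product (the "gate")
    s_out = s_in ^ p ^ c_in                          # new cumulative sum bit
    c_out = (s_in & p) | (s_in & c_in) | (p & c_in)  # new carry bit
    return s_out, c_out


# Exhaustive check: the two output bits always encode s_in + x*a + c_in.
for bits in range(16):
    x, a, s_in, c_in = (bits >> 3) & 1, (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
    s_out, c_out = gated_full_adder(x, a, s_in, c_in)
    assert s_out + 2 * c_out == s_in + x * a + c_in
```

The check at the end confirms that the pair of output bits is the two-bit binary value of the incoming sum bit plus the bit product plus the incoming carry, which is what a full adder with a gated input computes.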
A disadvantage of the processors described in Reference 1 is that data and coefficient words must be interspersed with zero bits to avoid generation of unwanted partial products. At any time, at least half, and in one case three quarters, of the array cells compute zero partial products. The array is therefore inefficient, and much larger than would be required if interspersed zero bits could be avoided.
A further bit-level systolic array is described in British Patent Application No. 2144245A published Feb. 27, 1985 (Reference 2). The equivalent U.S. Pat. No. is 4,686,645. This relates to an array similar to that of Reference 1 for multiplying two matrices having multi-bit coefficients. It provides for row elements of one matrix to propagate along array rows in counterflow with column elements of the other, carry bits being recirculated on each cell rather than moving along rows. The use of so-called "guard bands" is described, this being the extension of coefficient words with zero bits to provide for word growth of accumulating results.
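The word growth that guard bands accommodate can be estimated with a simple worst-case bound. The sketch below is an illustration only: it assumes unsigned words, and the helper names are hypothetical. The number of guard bits actually used in Reference 2 is not stated here and may be chosen differently.

```python
def guard_bits(num_terms):
    """Extra bits needed so that a sum of num_terms equal-width
    unsigned products cannot overflow (assumed worst-case rule)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (num_terms - 1).bit_length()


def result_width(data_bits, coeff_bits, num_terms):
    """Width of an accumulator holding a sum of num_terms products."""
    return data_bits + coeff_bits + guard_bits(num_terms)


# Four products of 4-bit data words and 4-bit coefficient words.
width = result_width(data_bits=4, coeff_bits=4, num_terms=4)
worst_case = 4 * (2**4 - 1) * (2**4 - 1)
assert worst_case < 2**width
```

Under these assumptions, accumulating four 4-bit by 4-bit products needs two bits beyond the eight bits of a single product.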
British Patent Application No. 2147721A published May 15, 1985 (Reference 3) relates to a further bit-level systolic array for matrix-vector multiplication. The equivalent U.S. Pat. No. is 4,701,876. Improved array efficiency is obtained in two ways. Firstly, array output accumulation is arranged such that parts of the array corresponding to inactive regions in Reference 1 contribute to convolution results. Secondly, the need for zeros between data and coefficient bits is avoided by complex clocking arrangements effecting bit movement in adjacent rows on alternate clock cycles. As in References 1 and 2, multiplicand bits move in counterflow in array rows. As in Reference 2, carry bits are recirculated on each cell and word extension with guard bands is employed.
In the GEC Journal of Research, Vol. 2, No. 1, 1984, R. B. Urquhart and D. Wood introduce the concept of using static coefficients in bit-level systolic arrays. Each cell of an array is associated with a respective single bit of a coefficient, and a coefficient word is associated with a corresponding array row. The cells are arranged for carry bit recirculation. Data is input to each array row and moves along it. Cumulative sum generation is cascaded down array columns and guard bands provide for word growth. Partial sums of like bit significance emerge from different array columns either with relative delays or synchronously according to whether input data meets coefficient bits in ascending or reverse order of bit significance. This arrangement provides 100% cell utilisation or array efficiency without requiring complex clocking arrangements.
Each cell computes products on every clock cycle, and all latches are clocked in the same way. Unfortunately, however, array accumulation as described cannot provide correct convolution or correlation results, since the proposed scheme would wrongly accumulate together partial sums and carry bits corresponding to different results.
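For reference, the arithmetic that any correct bit-level convolver must reproduce can be sketched as follows: each convolution result is the sum of single-bit products, grouped by bit significance and then weighted accordingly. The sketch is purely arithmetic and assumes unsigned words. It models no array timing, data movement or carry handling, and the function and parameter names are hypothetical.

```python
def convolve_bit_level(coeffs, data, coeff_bits, data_bits):
    """Convolution computed from single-bit products only.

    Returns the fully overlapped results
    y[n] = sum_i coeffs[i] * data[n + len(coeffs) - 1 - i].
    """
    taps = len(coeffs)
    results = []
    for n in range(len(data) - taps + 1):
        # partial[k] collects bit products of bit significance k for THIS result.
        partial = [0] * (coeff_bits + data_bits - 1)
        for i, a in enumerate(coeffs):
            x = data[n + taps - 1 - i]
            for p in range(coeff_bits):
                for q in range(data_bits):
                    partial[p + q] += ((a >> p) & 1) & ((x >> q) & 1)
        # Weight each group of like significance by 2**k and accumulate.
        results.append(sum(count << k for k, count in enumerate(partial)))
    return results


coeffs, data = [1, 2, 3], [4, 5, 6, 7]
direct = [sum(coeffs[i] * data[n + 2 - i] for i in range(3)) for n in range(2)]
assert convolve_bit_level(coeffs, data, coeff_bits=2, data_bits=3) == direct
```

Note that the list of partial sums is kept separately for each result. Mixing contributions that belong to different results is the kind of error identified above.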
In the art of digital arithmetic circuits, it is important to provide for standardisation of components if at all possible. This is greatly facilitated if integrated circuits designed for small calculations can be linked together or cascaded in an array to perform a much larger calculation. It is also important, although very rarely achievable, to provide for some degree of fault tolerance in such an array of integrated circuits, in order that a comparatively small fault might not render the array entirely useless. This is of particular importance in the developing field of wafer scale integration, in which wafer yields can be virtually zero without some degree of fault tolerance. It is an object of the present invention to provide a digital processor for correlation or convolution capable of being cascaded to form a fault tolerant assembly.