This invention relates to a systolic array, and more particularly to a systolic array for solving least squares problems.
Systolic arrays are known, the concept being set out by Kung and Leiserson in "Systolic Arrays (for VLSI)" in the text of "Introduction to VLSI Systems" by Mead and Conway", Addison-Wesley (1980). Such an array comprises individual electronic signal processing cells which are interconnected. The operation of the array as a whole depends on the function of individual cells and the interconnection scheme, the only external control required being a clock. The term "systolic" arises from the clock "pumping" the operation of the array. The basic advantage of systolic arrays is that complex operations may be performed by arrays of comparatively simple processing cells having defined functions and appropriate interconnections, preferably nearest-neighbour interconnections only. This approach is highly applicable to the construction of very large scale integrated (VLSI) circuits.
Systolic arrays are particularly suitable for performing pipelined operations. A sequence of operations is said to be pipelined if an element of a data stream can enter the sequence before the preceding element has left it. Pipelining is highly beneficial in VLSI, since it affords the possibility of reducing the number of idle devices awaiting data.
The nomenclature employed in the art of systolic array technology for matrix computations express mathematical relationships rather then physical ones. Arrays implemented as electronic circuits are geometrically arranged on the basis of engineering convenience, since the important factors are processing cell functions and cell interconnections, not the physical positions of electronic components. Accordingly, for the purposes of this specification, geometrical and positional expressions such as triangular, column, nearest neighbour, diagonal, hypotenuse, boundary, internal etc describing array features shall be construed as terms of art expressing mathematical relationships and extending to or including corresponding features of topologically equivalent arrays.
In "Matrix Triangularization by Systolic Arrays", Proc. SPIE., Vol 28, Real-Time Signal Processing IV (1981), Kung and Gentleman showed that systolic arrays might be employed to solve linear least squares problems which arise in a wide range of signal and data processing applications. The particular problem is to determine a p-vector of statistical weights w(N) for which .vertline..vertline.Xw(N)-y.vertline..vertline. is minimized, where y is a given N-vector of data elements and X is a given Nxp design matrix with p.ltoreq.N, the usual Euclidean norm being assumed.
Kung and Gentleman solve this problem by a two stage process employing two coupled systolic arrays. The first systolic array is triangular, and is used to implement a pipelined sequence of Givens rotations. The mathematics of Givens rotations is described by Gentleman, J. Inst. Maths. Applics (1973), 12, pp 329-336. The approach is to carry out a QR decomposition of the matrix X; ie the sequence of Givens rotations operates on the elements of X to build up a unitary matrix Q such that: ##EQU1## where R is a pxp upper triangular matrix (a matrix in which all subdiagonal elements are zero). Each element of R is computed by and stored in a corresponding processing cell of the systolic array as elements x of the matrix X are clocked into it. The approach is to (Givens) rotate each successive row of X with each row of R in turn. The major diagonal of the triangular systolic array is occupied by boundary cells having processing functions appropriate to evaluate sine and cosine Givens rotation parameters. All other (ie above-diagonal) cells are referred to as internal cells, and have processing functions appropriate to apply the rotation parameters to incoming data comprising elements of X. The array may be schematically illustrated as a right isosceles triangle with one shorter side horizontally uppermost and the other vertical. Cell interconnections are between nearest horizontal and vertical neighbours only.
Information or rows of X enters the triangular array via its uppermost row in a temporally skewed order as required to synchronize array operation. This will be described in more detail later. Each boundary or internal cell stores a respective current value r or element of the upper triangular matrix R. Each boundary cell receives input data from above, updates the respective stored value of r, evaluates the rotation parameters and transfers them to the respective lateral nearest neighbour internal cell. Each internal cell receives rotation parameters from one side and input data from above. It applies the rotation parameters to the input, passes on the parameters laterally, provides an output below and updates its stored value of r. When all the elements x of the nxp matrix X have flowed through the triangular systolic array in a pipelined manner, the values of r stored in the cells give the elements of the upper triangular matrix R. An exact QR decomposition or triangularisation of the matrix X has been performed. It should be emphasised that the stored cell values only represent the R matrix when all data has flowed completely through the array. During processing, the stored cell values correspond to data input at different times, in view of the temporal skew applied to input data and the fact that horizontally or vertically successive cells are at any time processing progressively earlier data.
The n-vector of data elements y is fed into a further column of internal cells alongside the triangular array and connected to it in a nearest neighbour fashion. The rotation parameters from the array are passed to this further column for application to y after operation on X. In effect, the vector y is processed as an extra column of the matrix X.
The evaluation of Givens rotation parameters by the boundary cells normally requires calculation of square roots. However, Kung and Gentleman also describe an array for square root free parameter evaluation based on the earlier work of Gentleman, J. Inst. Maths Applics, Vol 2, pp 329-336, 1973. In effect, the Givens rotation is mapped into a different mathematical domain for the purposes of avoiding square root calculation. Different boundary and internal processing cell functions are required, and the boundary cells are connected together along the array diagonal. The values stored by the cells are not equal to the elements of the matrix R, but have a simple relationship thereto. The square root free approach is accordingly mathematically equivalent to the previous technique. It is also possible to employ other forms of processing cells having different but equivalent functions.
The second stage of the Kung and Gentleman procedure to obtain the weight vector w(N) comprises extracting the values stored by each cell of the triangular array and feeding them into a linear systolic array. The linear array performs a back-substitution process which solves the triangular linear system associated with Equation (1) and given by: EQU Rw(N)=Q.sub.1 y (2)
where Q.sub.1 is a matrix comprising the first p rows of the matrix Q previously defined. Accordingly, Q.sub.1 y denotes the first p elements of the vector obtained by applying the same series of Givens rotations to the vector y as were employed to generate R from X.
The linear systolic array generates the required weight vector w(N) directly, providing an exact least squares solution. The vector w(N) is then available inter alia for calculating the least squares residual e.sub.N defined by: EQU e.sub.N =x.sub.N.sup.T w(N)-y.sub.N ( 3)
where y.sub.N is the Nth element of y, and x.sub.N.sup.T is the Nth or final row of the matrix X. However, the back-substitution process of Kung and Gentleman has a number of disadvantages. The triangular linear system may be ill-conditioned; eg if the Nxp martix X does not have full rank (either N&lt;p or N includes less than p independent rows), the back-substitution process involves division by zero which is undefined. The back-substitution process may also be numerically unstable, ie involve division by small inaccurate quantities. This could be improved by interchanging columns of X, but such a procedure would be inconsistent with the design of a hard-wired systolic array representing a matrix having fixed rows and columns. Furthermore, Kung and Gentleman require both a triangular and a linear systolic array to solve the Equation (2) triangular linear system, and need to compute the vector product x.sub.N.sup.T w(N) in order to obtain the least squares residual e.sub.N.