As is known, systolic arrays have been used for VLSI systems. A systolic array is a matrix of individual signal processing cells, where overall operation of the systolic array depends upon functions of the individual signal processing cells and the interconnection scheme of such signal processing cells. A clock signal is conventionally applied to a systolic array to control data flow therethrough.
While an interconnection scheme may include only interconnects between nearest neighbor signal processing cells within a systolic array, interconnection schemes are not limited to having only nearest neighbor interconnects. Moreover, a systolic array may be pipelined, namely where data stream bits are clocked into and out of signal processing cells for sequential processing. In some pipelined implementations, back-to-back data words of a data stream may be processed by a signal processing cell, where a leading word of a sequence is clocked out of such signal processing cell at generally the same time an immediately following word of the sequence is clocked into such signal processing cell.
Matrix triangularization using systolic arrays was described by Gentleman and Kung in “Matrix triangularization by systolic arrays” in SPIE, Vol. 298, Real-Time Signal Processing IV (1981) at pages 19-26 (hereinafter “Gentleman and Kung”). Gentleman and Kung proposed solving linear least squares problems using systolic arrays. As is known, least squares problems arise in both data and signal processing applications. Gentleman and Kung proposed coupling two systolic arrays, where a first of the two systolic arrays, which is triangular, is used for implementing a pipelined sequence of Givens rotations for QR decomposition (“QRD”). However, the matrix triangularization and least squares solution of Gentleman and Kung are predicated on processing matrices with only real numbers.
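For purposes of illustration only, the following is a minimal software sketch of triangularizing a real matrix by a sequence of Givens rotations. It is not the systolic implementation of Gentleman and Kung; the function names `givens` and `triangularize` and the example matrix are assumptions introduced here, and the rotations are applied sequentially rather than in a pipelined array.

```python
import math

def givens(a, b):
    """Return (c, s) so that rotating the pair (a, b) yields (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to rotate
    return a / r, b / r

def triangularize(rows):
    """Reduce a real matrix (list of rows) to upper triangular form."""
    a = [[float(v) for v in row] for row in rows]
    m, n = len(a), len(a[0])
    for col in range(min(m, n)):
        for row in range(col + 1, m):
            # Choose the rotation that zeroes a[row][col] against the diagonal entry.
            c, s = givens(a[col][col], a[row][col])
            for k in range(col, n):
                upper, lower = a[col][k], a[row][k]
                a[col][k] = c * upper + s * lower
                a[row][k] = -s * upper + c * lower
    return a

# Hypothetical 2x2 example: the entry below the diagonal is rotated to zero.
R = triangularize([[3.0, 1.0], [4.0, 2.0]])
```

In this sketch each rotation touches only two rows, which is the property that allows the rotations to be distributed over the cells of a triangular array.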
For a matrix with one or more complex numbers, namely a “complex matrix,” a complex-to-real data reduction has heretofore been performed to convert each individual element of the complex matrix to a real number before performing a Givens rotation. Such reduction factors out the phase before the Givens rotations and applies such phase at a later stage. Complex-to-real data reduction and application of a conventional complex Givens rotation matrix are known, such as described in a book entitled “Adaptive Filter Theory, 4th Edition” by Simon Haykin, published by Pearson Education (Asia), Indian Edition in 2002, at pages 838-841 (hereinafter “Simon Haykin”), and a book entitled “Fundamentals of Adaptive Filtering, 1st Edition” by Ali Sayed, published by John Wiley in 2003, at pages 804-807. Additionally, others have proposed various forms of phase factoring for complex-to-real data reduction. However, complex-to-real data reduction adds complexity, and thus overhead, whether implemented in hardware or software, or a combination thereof.
A boundary cell and an internal cell of a systolic array for processing real numbers, for use in a systolic array configured for Givens rotation as described by Gentleman and Kung, are described in detail in U.S. Pat. No. 4,727,503 (“McWhirter”), which description is incorporated by reference herein in its entirety. FIG. 7A is a block diagram of a prior art boundary cell 700 of a systolic array for processing complex numbers, and FIG. 7B is a block diagram of a prior art internal cell 710 of a systolic array for processing complex numbers. Boundary cell 700, in contrast to the boundary cell described in McWhirter, has a φ output associated with the phase of a complex number. Likewise, internal cell 710, in contrast to the internal cell described in McWhirter, has a φ input and a φ output, each of which is associated with the phase of a complex number. Another paper that disclosed factoring out the phase and applying it at a later stage is “Efficient Implementation of Rotation Operations for High Performance QRD-RLS Filtering” by B. Haller, J. Gotze and J. R. Cavallaro, published in the Proceedings of the IEEE conference on Application-Specific Systems, Architectures and Processors, 14-16 Jul. 1997, at pages 162-174.
In McWhirter, equations for implementing the boundary cell and the internal cell thereof are listed. Likewise, FIG. 7C lists equations 701 for implementing boundary cell 700, and FIG. 7D lists equations 711 for implementing internal cell 710. In contrast to such equations in McWhirter, equations 701 and 711 have to deal with phase φ. More particularly, boundary cell 700 has to reduce a complex input Xin to a real number and output the phase φ of such input for tracking by an internal cell 710. An internal cell 710 uses phase φ, such as output from a boundary cell 700 or output from another internal cell 710 in a systolic array, to apply rotation to an input Xin thereto. Accordingly, it should be appreciated that performing a complex-to-real data reduction and having to account for phase φ in terms of rotation of an input add overhead to a systolic array for processing complex numbers.
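For purposes of illustration only, the following sketch models one generic textbook-style form of phase factoring in a boundary cell and an internal cell. Equations 701 and 711 appear only in the figures and are not reproduced here, so this sketch should not be read as those equations; the class names `BoundaryCell` and `InternalCell`, the `step` methods, and the example inputs are all assumptions introduced for illustration.

```python
import math

class BoundaryCell:
    """Keeps a real diagonal value r; emits rotation terms (c, s) and phase phi."""
    def __init__(self):
        self.r = 0.0

    def step(self, x_in):
        mag = abs(x_in)                  # complex-to-real reduction of the input
        if mag == 0.0:
            return 1.0, 0.0, 1.0 + 0.0j  # identity rotation, unit phase
        phi = x_in / mag                 # unit-magnitude phase factor of x_in
        r_new = math.hypot(self.r, mag)
        c, s = self.r / r_new, mag / r_new
        self.r = r_new
        return c, s, phi

class InternalCell:
    """Removes the phase phi from its input, then applies the real rotation."""
    def __init__(self):
        self.r = 0.0 + 0.0j

    def step(self, x_in, c, s, phi):
        y = phi.conjugate() * x_in       # extra complex multiply caused by phi
        x_out = -s * self.r + c * y
        self.r = c * self.r + s * y
        return x_out

# Hypothetical usage: two rows of a two-column complex matrix.
boundary, internal = BoundaryCell(), InternalCell()
for first, second in [(3 + 4j, 1 + 0j), (5j, 2j)]:
    c, s, phi = boundary.step(first)
    x_out = internal.step(second, c, s, phi)
```

In this sketch, the phase multiply in `InternalCell.step` is an operation that a real-valued internal cell would not need, which illustrates the kind of overhead attributed above to tracking phase φ.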
Right multiplication of an upper triangular matrix with another matrix using a triangular systolic array is known, and thus not described in unnecessary detail. However, there are instances when left matrix multiplication would be useful.
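For purposes of illustration only, the following sketch computes both products of an upper triangular matrix R with another matrix A in software, skipping the terms known to be zero. Which of the two products corresponds to “right” and which to “left” multiplication is not stated in this passage, so the function names `upper_times_matrix` and `matrix_times_upper` are neutral assumptions, as are the example matrices.

```python
def upper_times_matrix(R, A):
    """Compute R times A for upper triangular R, skipping zeros below the diagonal."""
    n, p = len(R), len(A[0])
    return [[sum(R[i][k] * A[k][j] for k in range(i, n)) for j in range(p)]
            for i in range(n)]

def matrix_times_upper(A, R):
    """Compute A times R for upper triangular R, skipping zeros below the diagonal."""
    m, n = len(A), len(R)
    return [[sum(A[i][k] * R[k][j] for k in range(j + 1)) for j in range(n)]
            for i in range(m)]

# Hypothetical 2x2 operands; the two products generally differ.
R = [[1, 2], [0, 3]]
A = [[4, 5], [6, 7]]
RA = upper_times_matrix(R, A)
AR = matrix_times_upper(A, R)
```

Because matrix multiplication is not commutative, the two products differ in general, so support for one form of multiplication does not by itself provide the other.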
Accordingly, it would be desirable and useful to provide a systolic array capable of both right and left matrix multiplication.