1. Field of the Invention
This invention relates generally to a circuit for matrix multiplication. More particularly, this invention relates to a compact pipelined matrix multiplier implemented as an integrated circuit (IC), which uses an encoding and shifting circuit configuration to achieve faster multiplication and to reduce the area occupied on the IC chip.
2. Description of the Prior Art
Matrix multiplications are commonly performed under various circumstances for different applications. In order to carry out the multiplications expeditiously, the multiplication algorithms are generally implemented in hardware by use of special IC designs. The recent trend toward miniaturization in manufacturing ever smaller electronic devices has placed a greater demand on circuit design techniques that minimize the area occupied by the circuits on an IC chip. A circuit structure that enables an IC design to reduce the circuit area on a chip can be applied to a broad spectrum of applications to further miniaturize electronic devices.
One specific application of matrix multiplication is in the area of video image data compression, where a two-dimensional discrete cosine transform (DCT) is performed. For an N×N data matrix, a two-dimensional discrete cosine transform is defined as: ##EQU1## An inverse transform can be similarly carried out by: ##EQU2## where X(n_1, n_2) is the input data matrix and Z(K_1, K_2) is the matrix of transform coefficients.
Equation (1a) can be written in matrix form as
Z = CXC^t    (2)
where C stands for the cosine coefficient matrix and C^t stands for the transpose of C.
In a more detailed version, equation (2) can be rewritten as ##EQU3## where C^t_ij = C_ji.
There are three N×N matrices multiplied together. We can thus define a matrix Y = XC^t, or ##EQU4## so that the matrix Z can be written as ##EQU5##
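The typeset equations are not reproduced in this text. As a sketch only, the element-wise forms implied by Z = CXC^t and Y = XC^t can be written as below. The index names i, j, k, l and the summation range 1 to N are assumptions of this sketch, not the original notation.

```latex
\begin{align*}
Z_{ij} &= \sum_{k=1}^{N}\sum_{l=1}^{N} C_{ik}\,X_{kl}\,C^{t}_{lj}
        && \text{(element-wise form of } Z = CXC^{t}\text{)}\\
Y_{ij} &= \sum_{k=1}^{N} X_{ik}\,C^{t}_{kj} = \sum_{k=1}^{N} X_{ik}\,C_{jk}
        && \text{(first stage, } Y = XC^{t}\text{)}\\
Z_{ij} &= \sum_{k=1}^{N} C_{ik}\,Y_{kj}
        && \text{(second stage, } Z = CY\text{)}
\end{align*}
```

Splitting the triple product this way replaces a double sum per output element with two single sums, which is what allows the transform to be computed as two successive matrix multiplications.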
In a two-by-two example, equation (4) reduces to ##EQU6## and equation (5) reduces to ##EQU7##
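The two-stage product described above (first Y = XC^t, then Z = CY) can be illustrated in software. The following is a minimal sketch, not the circuit of the invention. It assumes the orthonormal DCT-II scaling for C, since the defining equations are not reproduced in this text, and the function names `dct_matrix`, `matmul`, `dct2` and `idct2` are hypothetical.

```python
import math


def dct_matrix(n):
    """Build an n x n cosine coefficient matrix C.

    Assumption: orthonormal DCT-II scaling, so that C * C^t is the identity.
    """
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain row-by-column matrix product."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def dct2(x):
    """Two-dimensional transform Z = C X C^t, computed in two stages."""
    c = dct_matrix(len(x))
    y = matmul(x, transpose(c))   # first stage:  Y = X C^t
    return matmul(c, y)           # second stage: Z = C Y


def idct2(z):
    """Inverse transform X = C^t Z C (valid because C is orthonormal here)."""
    c = dct_matrix(len(z))
    return matmul(transpose(c), matmul(z, c))


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]      # two-by-two example input
    z = dct2(x)
    print([[round(v, 6) for v in row] for row in z])
```

Under the assumed scaling, the two-by-two coefficient matrix is (1/sqrt(2)) [[1, 1], [1, -1]], and applying `idct2` to the output of `dct2` recovers the input up to floating-point rounding.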
The prior art techniques for DCT matrix multiplication have several limitations. When a row-column decomposition method is used, a transpose memory is required, and a solution with heavily pipelined data flow becomes very difficult to implement in actual circuit designs. When a direct two-dimensional DCT is implemented, the costs of hardware design and manufacturing are too high relative to the benefits that could be gained from the intended hardware implementation. Furthermore, the proposed techniques generally lack modularity and require highly convoluted and complex circuit structures, thus making the actual implementation very time consuming and inefficient.
A modularized circuit design implementing a matrix multiplication algorithm is provided in a prior patent application by the inventor of the present invention (patent application Ser. No. 07/836,075, entitled "Fast Pipelined Matrix Multiplier", which issued on Apr. 20, 1993 as U.S. Pat. No. 5,204,830). A pipeline structure is used in U.S. Pat. No. 5,204,830, wherein a plurality of registers and multiplexers are configured in a top-down architecture forming a plurality of bit-multiplication elements, i.e., processing elements (PEs). The multiplication is performed in a time-progressive manner, one bit at a time, in each processing element from top to bottom. In addition to the top-down pipeline configuration, the multiplier has the further advantage that the multiplications are performed in parallel, i.e., each element of a row and the corresponding column of the multiplied matrix are processed simultaneously in one of many top-down processing columns. Because of the simplicity of the design and the pipelined, parallel processing architecture, the multiplier can be designed and manufactured very economically. The cost of hardware implementation is greatly reduced from that of the earlier techniques. The modular structure further enables the circuits to be conveniently applied to multi-stage matrix multiplications without requiring major circuit design efforts.
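The idea of multiplying one bit at a time can be illustrated with a generic shift-and-add sketch. This is not the circuit of U.S. Pat. No. 5,204,830. It only assumes unsigned integer operands, models each processing element as one loop iteration, and runs sequentially what the hardware would do in parallel columns. The function names `bit_serial_multiply` and `bit_serial_dot` are hypothetical.

```python
def bit_serial_multiply(a, b, bits):
    """Multiply unsigned a by unsigned b, consuming one bit of b per step.

    Each loop iteration stands in for one processing element (one clock
    cycle): it inspects a single bit of b and conditionally adds the
    correspondingly shifted copy of a to the running partial sum.
    """
    partial_sum = 0
    for step in range(bits):
        bit = (b >> step) & 1            # the bit handled at this stage
        if bit:
            partial_sum += a << step     # add the shifted multiplicand
    return partial_sum


def bit_serial_dot(row, col, bits):
    """Inner product of a row and a column using bit-serial multiplies.

    In hardware each product would occupy its own processing column;
    here the columns are simply evaluated one after another.
    """
    return sum(bit_serial_multiply(x, c, bits) for x, c in zip(row, col))


if __name__ == "__main__":
    print(bit_serial_multiply(13, 11, 4))
    print(bit_serial_dot([1, 2, 3], [4, 5, 6], 3))
```

The number of steps equals the operand width `bits`, which hints at why the register count of such a design grows with both the word length and the matrix dimensions.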
However, this pipelined multiplier may encounter a different hardware limitation under some circumstances. Since each of the plurality of bit-multiplication elements, i.e., PEs, processes one bit per clock cycle, the number of registers and multiplexers may grow very large as the number of bits of the individual matrix elements and the numbers of columns and rows of the matrices increase. Specifically, for an 8×8 DCT matrix multiplication, the S-box and matrix of accumulator arrays shown in FIGS. 6 and 6A of Wang et al. require an array of 64 accumulators and very complicated time-control switches. These accumulator arrays and time-control switches occupy large circuit areas on an IC chip, which reduces the usefulness of this pipelined matrix multiplier. The requirement to miniaturize electronic devices strongly demands that the number of electronic circuit components on an IC chip be reduced. The large number of registers, accumulators and associated interconnecting lines for time-control switches disclosed in Wang et al. occupies too much area on an IC chip, which could make the hardware implementation incompatible with some manufacturing specifications.
Therefore, a need still exists in the art to reduce the number of circuit components in a pipelined multiplier, whereby the pipelined, parallel multiplication configuration can be further improved for implementation in a broader range of applications.