The discrete cosine transform (DCT) in two dimensions is important for compressing video signals for storage and for transmission to a remote location.
A two-dimensional discrete cosine transform of an N×N input data matrix can be defined as ##EQU1##
Its inverse can be similarly defined as ##EQU2##
where $X(n_1, n_2)$ is the input data matrix and $Z(k_1, k_2)$ is the matrix of transform coefficients.
Equation (1a) can be written in matrix form as

$$Z = C X C^{t} \qquad (2)$$
where $C$ stands for the matrix of cosine coefficients and $C^{t}$ stands for the transpose of $C$.
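As an illustration of equation (2), the following Python sketch builds a coefficient matrix C and evaluates Z = C X C^t by plain matrix multiplication. The orthonormal DCT-II scaling chosen for C is an assumption, since the normalization of equations (1a) and (1b) is not reproduced in this text; the helper names `dct_matrix`, `transpose`, `matmul`, `dct2` and `idct2` are illustrative only.

```python
import math

def dct_matrix(n):
    """Return an n x n cosine coefficient matrix C.

    Assumption: orthonormal DCT-II scaling, so that the inverse of C is C^t.
    """
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def dct2(x):
    """Forward transform in the matrix form of equation (2): Z = C X C^t."""
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))

def idct2(z):
    """Inverse transform X = C^t Z C, valid because the assumed C is orthonormal."""
    c = dct_matrix(len(z))
    return matmul(matmul(transpose(c), z), c)
```

Under this assumed scaling, applying `idct2` to the output of `dct2` recovers the input matrix up to floating-point rounding. A different normalization of C would change the scale of the coefficients but not the three-matrix structure of equation (2).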
In a more detailed version, equation (2) can be rewritten as ##EQU3##
Equation (2) is thus a product of three N×N matrices. We can define an intermediate matrix $Y = XC^{t}$, or ##EQU4## so that the matrix $Z$ can be written as

$$Z = C Y \qquad (5)$$
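In a two-by-two example, equation (4) reduces to

$$\begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \begin{bmatrix} c_{11} & c_{21} \\ c_{12} & c_{22} \end{bmatrix}$$

and equation (5) reduces to

$$\begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix}$$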
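The two-stage evaluation described by equations (4) and (5) can be sketched in Python as follows. This is a minimal illustration, not the circuit of the invention: the function name `two_stage_product` and the unnormalized 2×2 coefficient matrix used in the example are assumptions chosen to keep the arithmetic easy to follow.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def two_stage_product(c, x):
    """Compute Z = C X C^t in two stages.

    Stage 1 forms the intermediate matrix Y = X C^t (equation (4)).
    Stage 2 forms Z = C Y (equation (5)).
    """
    y = matmul(x, transpose(c))
    z = matmul(c, y)
    return y, z

# Hypothetical 2x2 example with an unnormalized coefficient matrix.
c_example = [[1, 1], [1, -1]]
x_example = [[1, 2], [3, 4]]
y_example, z_example = two_stage_product(c_example, x_example)
# y_example is [[3, -1], [7, -1]] and z_example is [[10, -2], [-4, 0]]
```

Because matrix multiplication is associative, the result is the same as evaluating (C X) C^t; the split into Y and Z only fixes the order in which the two products are formed.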
In the prior art there have been numerous proposals for implementing the DCT (see, e.g., N. Ahmed, T. Natarajan and K. Rao, Discrete Cosine Transform, IEEE Trans. on Comp., 1974, 90; M. Sun, T. Chen and A. Gottlieb, VLSI Implementation of 16×16 Discrete Cosine Transform, IEEE Trans. on Cir. and Sys., Apr. 1989, 610; N. Chou and S. Lee, Fast Algorithm and Implementation of a 2-D Discrete Cosine Transform, IEEE Trans. on Cir. and Sys., Mar. 1991, 297; H. Hou, A Fast Recursive Algorithm for Computing the Discrete Cosine Transform, IEEE Trans. on Cir. and Sys., Oct. 1987, 1455).
These prior art techniques have the following weaknesses.
1. If row-column decomposition is used, a transposition memory is required, which makes a heavily pipelined data flow impossible.
2. If a direct implementation of a two-dimensional DCT is used, the cost of the hardware necessary for such an implementation exceeds its benefit.
3. Most solutions lack modularization and hence yield implementations which are both time-consuming and inefficient.
It is one object of the present invention to provide a circuit and algorithm for obtaining a DCT which overcome the weaknesses of the prior art.
As can be seen above, to carry out a DCT the multiplication of three matrices is necessary. Therefore it is also an object of the invention to provide a circuit and algorithm which efficiently multiply three matrices, wherein one application of the circuit is to multiply the three matrices required for a DCT.