The present invention relates to an orthogonal transform processor suitable for use in image processing.
A small-size circuit for performing an orthogonal transform has recently come to be required as an important part of systems that compress and code two-dimensional image data with a high degree of efficiency. An encoder uses a forward orthogonal transform such as a discrete cosine transform (DCT) or a discrete sine transform (DST). A decoder uses an inverse orthogonal transform such as an inverse discrete cosine transform (IDCT) or an inverse discrete sine transform (IDST).
U.S. Pat. No. 4,791,598 discloses a two-dimensional DCT processor comprising two one-dimensional DCT processors and a transposition memory interposed therebetween. Each of the two one-dimensional DCT processors incorporates a distributed arithmetic (DA) circuit which obtains vector inner products using ROMs (read-only memories) instead of multipliers. The DA circuit comprises a plurality of ROM and accumulator units, which are referred to as RACs. Each of the RACs comprises (i) a ROM which contains, in the form of a look-up table, the partial sums of vector inner products based on a discrete cosine matrix, and (ii) an accumulator which adds, with the digits aligned, the partial sums successively retrieved from the ROM with the bit slice words serving as addresses, thereby obtaining the vector inner product corresponding to an input vector. Such an arrangement of the two-dimensional DCT processor can also be applied to a two-dimensional IDCT processor.
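The operation of a single RAC as described above may be sketched in software as follows. This is only an illustrative model, not the circuit of the cited patent: the function names `build_rom` and `rac_inner_product` are hypothetical, the input elements are assumed to be signed two's complement integers of `bits` bits, and the bit slice words are assumed to be processed from the least significant bit upward.

```python
def build_rom(coeffs):
    """Look-up table: entry `addr` holds the sum of coeffs[k] for every bit k set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def rac_inner_product(rom, xs, bits):
    """Inner product of the ROM's coefficient vector with the signed integers xs.

    Each x must fit in `bits` bits of two's complement (assumption of this sketch).
    """
    acc = 0.0
    for b in range(bits):
        # Bit slice word: bit b of every input element, packed into one ROM address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        partial = rom[addr] * (1 << b)  # digit alignment by the weight of bit b
        if b == bits - 1:
            acc -= partial  # the sign bit carries negative weight in two's complement
        else:
            acc += partial
    return acc


coeffs = [0.5, -0.25, 0.125, 1.0]
rom = build_rom(coeffs)
result = rac_inner_product(rom, [3, -2, 7, -8], bits=4)
direct = sum(c * x for c, x in zip(coeffs, [3, -2, 7, -8]))
```

No multiplication of a coefficient by an input element takes place in the loop; the only products are by powers of two, which correspond to shifts in hardware. The ROM, however, needs one word for every possible bit slice word, that is, 2 to the power of the vector length.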
Suppose now that a two-dimensional IDCT processing is to be executed on input data comprising $8 \times 8$ elements. The input data is expressed by a matrix $Y$ of 8 rows and 8 columns having elements $y_{ij}$ ($i = 0$ to $7$, $j = 0$ to $7$). Also consider an inverse discrete cosine matrix $D$ of 8 rows and 8 columns. Each of the elements $d_{ij}$ of the matrix $D$ is expressed as follows:

$$d_{i0} = \frac{1}{2\sqrt{2}}, \quad i = 0 \text{ to } 7$$

$$d_{ij} = \frac{1}{2}\cos\left\{\frac{(2i+1)j\pi}{16}\right\}, \quad i = 0 \text{ to } 7,\ j = 1 \text{ to } 7 \qquad (1)$$
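By way of illustration, the matrix D of equations (1) may be generated numerically as sketched below. The function names `idct_matrix` and `max_orthogonality_error` are hypothetical and floating-point arithmetic is assumed. The second function checks that the product of D and its transpose is the identity matrix, which is the property that makes the transform orthogonal.

```python
import math


def idct_matrix():
    """8x8 inverse discrete cosine matrix D according to equations (1)."""
    d = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        d[i][0] = 1.0 / (2.0 * math.sqrt(2.0))
        for j in range(1, 8):
            d[i][j] = 0.5 * math.cos((2 * i + 1) * j * math.pi / 16.0)
    return d


def max_orthogonality_error(d):
    """Largest deviation of D * D^T from the identity matrix."""
    err = 0.0
    for r in range(8):
        for c in range(8):
            dot = sum(d[r][k] * d[c][k] for k in range(8))
            err = max(err, abs(dot - (1.0 if r == c else 0.0)))
    return err


D = idct_matrix()
error = max_orthogonality_error(D)
```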
The two-dimensional IDCT of the matrix $Y$ is $DYD^T$, wherein $D^T$ refers to the transpose of the matrix $D$. When a transposing means is used together with a one-dimensional IDCT processor for calculating the one-dimensional IDCT of the matrix $Y$, that is, the matrix product $DY$, an intermediate matrix $X = (DY)^T$ can readily be obtained. The final result $DYD^T$ can also be obtained in a similar manner, because $DYD^T$ is equal to $(D(DY)^T)^T = (DX)^T$. Accordingly, the one-dimensional IDCT processor for calculating the matrix product $DY$ plays an important role in achieving a two-dimensional IDCT.
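The decomposition described above, in which the same one-dimensional operation is applied twice with a transposition after each pass, may be sketched as follows. The helper names `matmul`, `transpose` and `idct_2d` are hypothetical, and floating-point arithmetic is assumed in place of the fixed-point arithmetic of an actual processor.

```python
import math


def idct_matrix():
    """8x8 inverse discrete cosine matrix D as defined in the text."""
    return [[1.0 / (2.0 * math.sqrt(2.0)) if j == 0
             else 0.5 * math.cos((2 * i + 1) * j * math.pi / 16.0)
             for j in range(8)] for i in range(8)]


def matmul(a, b):
    """Product of two 8x8 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(8)) for j in range(8)]
            for i in range(8)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def idct_2d(y):
    """Two-dimensional IDCT computed as two one-dimensional passes."""
    d = idct_matrix()
    x = transpose(matmul(d, y))     # first pass and transposition: X = (DY)^T
    return transpose(matmul(d, x))  # second pass and transposition: (DX)^T = D Y D^T


# Input with only the element y_00 non-zero; every output element equals y_00 / 8.
y = [[0.0] * 8 for _ in range(8)]
y[0][0] = 8.0
out = idct_2d(y)
```

Both passes use the same `matmul(d, ...)` step, which reflects why a single one-dimensional IDCT processor design, together with a transposing means, suffices for the two-dimensional transform.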
The result of a one-dimensional IDCT on the jth column of the matrix $Y$ is expressed by the jth column of a matrix $W$ of 8 rows and 8 columns. Here, each element $w_{ij}$ of the matrix $W$ is expressed as follows:

$$w_{ij} = \sum_{k=0}^{7} d_{ik}\, y_{kj}, \quad i = 0 \text{ to } 7,\ j = 0 \text{ to } 7 \qquad (2)$$
The element $w_{ij}$ is the inner product of the ith row of the matrix $D$ and the jth column of the matrix $Y$, and is equal to the sum of eight products. The processing for obtaining the element $w_{ij}$ is called an 8-point IDCT processing.
With a one-dimensional IDCT processor having eight multipliers and eight accumulators, the eight inner products $w_{0j}, w_{1j}, w_{2j}, w_{3j}, w_{4j}, w_{5j}, w_{6j}, w_{7j}$ which form the jth column of the matrix $W$ can be calculated in parallel, wherein

$$\begin{aligned}
w_{0j} &= \sum_{k=0}^{7} d_{0k}\, y_{kj} \\
w_{1j} &= \sum_{k=0}^{7} d_{1k}\, y_{kj} \\
w_{2j} &= \sum_{k=0}^{7} d_{2k}\, y_{kj} \\
w_{3j} &= \sum_{k=0}^{7} d_{3k}\, y_{kj} \\
w_{4j} &= \sum_{k=0}^{7} d_{4k}\, y_{kj} \\
w_{5j} &= \sum_{k=0}^{7} d_{5k}\, y_{kj} \\
w_{6j} &= \sum_{k=0}^{7} d_{6k}\, y_{kj} \\
w_{7j} &= \sum_{k=0}^{7} d_{7k}\, y_{kj}
\end{aligned} \qquad (3)$$
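The parallel calculation of equations (3) may be modelled, purely for illustration, as eight multiply-accumulate units which each receive the same input element at each step. The function name `idct_8point_parallel` is hypothetical, and the assumption that one input element is supplied per step is made only for this sketch; the text does not specify the timing.

```python
import math


def idct_matrix():
    """8x8 inverse discrete cosine matrix D as defined in the text."""
    return [[1.0 / (2.0 * math.sqrt(2.0)) if j == 0
             else 0.5 * math.cos((2 * i + 1) * j * math.pi / 16.0)
             for j in range(8)] for i in range(8)]


def idct_8point_parallel(y_col):
    """Eight inner products w_0j to w_7j for one column of Y."""
    d = idct_matrix()
    acc = [0.0] * 8          # eight accumulators, one per output element
    for k in range(8):       # one input element y_kj per step (assumed)
        for i in range(8):   # in hardware these eight multiplications run side by side
            acc[i] += d[i][k] * y_col[k]
    return acc


w = idct_8point_parallel([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

The inner loop performs eight multiplications for every input element, which corresponds to the eight multipliers of the processor.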
The above-mentioned one-dimensional IDCT processor having eight multipliers is disadvantageous in that the multipliers occupy a large area on the chip when the processor is implemented as a VLSI (very large scale integration) circuit.
Further, achieving a parallel calculation of the eight inner products represented by the equations (3) with the prior art DA circuit disadvantageously requires large-size ROMs.
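The order of magnitude of the ROM requirement may be estimated as sketched below. The estimate assumes one ROM per inner product and assumes that each ROM is addressed by a bit slice word containing one bit from each of the eight input elements; the function name `da_rom_words` is hypothetical, and the word width of the ROMs, which the text does not state, is left out of the count.

```python
def da_rom_words(num_inputs, num_outputs):
    """Return (words per ROM, total words) for a DA circuit under the stated assumptions."""
    words_per_rom = 2 ** num_inputs  # one look-up entry per possible bit slice word
    return words_per_rom, words_per_rom * num_outputs


per_rom, total = da_rom_words(num_inputs=8, num_outputs=8)
```

Under these assumptions each ROM holds 256 words and the eight ROMs together hold 2048 words, and the count per ROM doubles with every additional input element.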