The present invention relates to a discrete cosine transform processor that is particularly suitable for compressing and expanding image data.
Recently, since semiconductor technology has advanced markedly and International Standards (e.g., CCITT H.261, ISO, MPEG, etc.) for image data coding have been established, LSIs for compressing and expanding image data are now being developed. The image data compression and expansion techniques of these standards are based on the two-dimensional discrete cosine transformation. Here, the two-dimensional discrete cosine transformation at $N \times N$ points in the forward direction can be expressed by the following formula: ##EQU1## where $x(i, j)$ ($i, j = 0, 1, \ldots, N-1$) denote the original signals (image data) and $X(u, v)$ ($u, v = 0, 1, \ldots, N-1$) denote the transformed coefficients. Further, $C(u) = 2^{-1/2}$ when $u = 0$ and $C(u) = 1$ when $u \neq 0$; $C(v)$ is defined in the same way.
Further, the discrete cosine transformation in the inverse direction can be expressed by the following formula: ##EQU2##
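For orientation only, the standard textbook forms of the forward and inverse two-dimensional discrete cosine transformation are sketched below. The typeset formulae (1) and (2) are not reproduced in this text, so the following is an assumption about their content based on the definitions of $x(i, j)$, $X(u, v)$, $C(u)$ and $C(v)$ given above, not a copy of the original equations.

```latex
X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
         x(i,j)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}

x(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, X(u,v)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}
```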
Now, let $X$ denote an $N \times N$ matrix having components $X(u, v)$ (where $u$ denotes the $u$-th row and $v$ denotes the $v$-th column), let $x$ denote an $N \times N$ matrix having components $x(i, j)$ (where $i$ denotes the $i$-th row and $j$ denotes the $j$-th column), and let $C$ denote an $N \times N$ transform matrix whose component in the $u$-th row and the $i$-th column is
$$C_{u,i} = (2/N)^{1/2}\, C(u) \cos\left(\frac{(2i+1)u\pi}{2N}\right)$$
Then the above-mentioned formulae (1) and (2) can be expressed as follows:
$$X = C \cdot x \cdot C^{t} \qquad (3)$$
$$x = C^{t} \cdot X \cdot C \qquad (4)$$
where $C^{t}$ denotes the transpose of the matrix $C$. The above formulae (3) and (4) indicate that the two-dimensional discrete cosine transformation can be obtained by executing the one-dimensional discrete cosine transformation twice.
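The equivalence stated by formulae (3) and (4) can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard transform matrix with the cosine argument $(2i+1)u\pi/(2N)$; the helper names (`dct_matrix`, `dct_1d_rows`, `dct_2d_separable`, `idct_2d`) are illustrative and do not come from the original text.

```python
import math

N = 8

def dct_matrix(n):
    """Transform matrix C with C[u][i] = sqrt(2/n) * c(u) * cos((2i+1) u pi / (2n))."""
    rows = []
    for u in range(n):
        cu = 1 / math.sqrt(2) if u == 0 else 1.0
        rows.append([math.sqrt(2 / n) * cu *
                     math.cos((2 * i + 1) * u * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product a . b for lists of lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_1d_rows(block, c):
    """Apply the one-dimensional transform to every row of block (block . C^t)."""
    return [[sum(c[u][i] * row[i] for i in range(len(row)))
             for u in range(len(c))] for row in block]

def dct_2d_separable(block, c):
    """Row pass, transpose, second row pass, transpose back: yields C . x . C^t."""
    first = dct_1d_rows(block, c)               # x . C^t
    second = dct_1d_rows(transpose(first), c)   # C . x^t . C^t = (C . x . C^t)^t
    return transpose(second)

def idct_2d(coeffs, c):
    """Inverse transform x = C^t . X . C, as in formula (4)."""
    return matmul(matmul(transpose(c), coeffs), c)
```

In this sketch the second pass operates on the transposed intermediate result, so the same one-dimensional routine serves both directions; only the data ordering changes between the passes.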
In the general two-dimensional discrete cosine transformation used for image data compression and expansion, the transformation is usually executed in units of blocks of 8 (vertical) $\times$ 8 (horizontal) pixels. Therefore, the case of $N = 8$ will be explained hereinbelow by way of example. In this case, since the transform matrix is an $8 \times 8$ matrix, a straightforward evaluation requires 4096 multiplication and accumulation calculations (64 for each of the 64 transform coefficients). Therefore, in order to realize a discrete cosine transform LSI, the important problem is to execute a great number of multiplication and accumulation calculations at high speed by use of a small-scale circuit.
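The figure of 4096 can be reproduced by simple counting. The sketch below assumes that, in the direct evaluation, the two cosine weights of each term are pre-multiplied into a single coefficient, so that each term costs one multiplication and accumulation; for comparison it also counts the two $8 \times 8$ matrix products of formula (3). The function names are illustrative only.

```python
N = 8

def count_direct(n):
    """Multiply-accumulates when every X(u, v) sums over all x(i, j)."""
    macs = 0
    for u in range(n):
        for v in range(n):
            for i in range(n):
                for j in range(n):
                    macs += 1  # one product of x(i, j) with a combined weight
    return macs

def count_row_column(n):
    """Multiply-accumulates for two n-by-n matrix products (C . x, then . C^t)."""
    per_product = n * n * n  # n*n outputs, n terms each
    return 2 * per_product

print(count_direct(N))      # 4096
print(count_row_column(N))  # 1024
```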
A technique for effectively executing a great number of multiplication and accumulation calculations is already known. In this technique, the two-dimensional discrete cosine transformation is executed by separating it into two one-dimensional discrete cosine transformation calculations, one in the row direction and one in the column direction. In more detail, the one-dimensional discrete cosine transformation is executed on the input data in the row direction, and the calculated results are stored in an inversion RAM, by which they are inverted (i.e., their rows and columns are interchanged). These inverted data are then subjected to the one-dimensional discrete cosine transformation in the column direction to obtain the two-dimensional discrete cosine transform coefficients. Further, in the above-mentioned technique, a high speed algorithm for the one-dimensional discrete cosine transformation is used. The discrete cosine transformation in the forward direction executed in accordance with this high speed algorithm can be expressed as follows: ##EQU3## where $A = \cos(\pi/4)$, $B = \cos(\pi/8)$, $C = \sin(\pi/8)$, $D = \cos(\pi/16)$, $E = \cos(3\pi/16)$, $F = \sin(3\pi/16)$, and $G = \sin(\pi/16)$, which are all transform matrix components; $x_i$ ($i = 0, 1, \ldots, 7$) denote the original signals and $X_j$ ($j = 0, 1, \ldots, 7$) denote the transform coefficients.
Further, the discrete cosine transformation in the inverse direction executed in accordance with this high speed algorithm can be expressed as follows: ##EQU4##
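The typeset formulae of the high speed algorithm are not reproduced in this text. As a hedged illustration, the sketch below uses the well-known even/odd decomposition of the 8-point transform, which is built from exactly the seven constants $A$ to $G$ listed above: the even-order coefficients are computed from the sums $x_k + x_{7-k}$ and the odd-order coefficients from the differences $x_k - x_{7-k}$. That this matches formulae (5) to (8) term by term is an assumption; the matrices `EVEN` and `ODD` and the function names are illustrative.

```python
import math

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

# Rows produce X0, X2, X4, X6 from the sums x[k] + x[7-k].
EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
# Rows produce X1, X3, X5, X7 from the differences x[k] - x[7-k].
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]

def fast_dct8(x):
    """Forward 8-point transform using two 4x4 products (32 multiplications)."""
    sums = [x[k] + x[7 - k] for k in range(4)]
    diffs = [x[k] - x[7 - k] for k in range(4)]
    out = [0.0] * 8
    for r in range(4):
        out[2 * r] = 0.5 * sum(EVEN[r][k] * sums[k] for k in range(4))
        out[2 * r + 1] = 0.5 * sum(ODD[r][k] * diffs[k] for k in range(4))
    return out

def fast_idct8(coeffs):
    """Inverse 8-point transform using the transposed 4x4 matrices."""
    x = [0.0] * 8
    for k in range(4):
        even = 0.5 * sum(EVEN[r][k] * coeffs[2 * r] for r in range(4))
        odd = 0.5 * sum(ODD[r][k] * coeffs[2 * r + 1] for r in range(4))
        x[k] = even + odd
        x[7 - k] = even - odd
    return x

def direct_dct8(x):
    """Reference: direct evaluation with 64 multiplications."""
    out = []
    for u in range(8):
        cu = A if u == 0 else 1.0  # C(0) = 2**-0.5 = cos(pi/4)
        out.append(0.5 * cu * sum(x[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                                  for i in range(8)))
    return out
```

The decomposition halves the number of multiplications per one-dimensional transform, at the cost of four additions and four subtractions performed beforehand (forward direction) or afterwards (inverse direction).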
In the prior art discrete cosine transform processor for executing the discrete calculations in both the forward and inverse directions in accordance with the above-mentioned formulae (5), (6), (7) and (8), the discrete calculation results have so far been obtained as follows: external data are input through a data input device; the calculations are executed through a combination of adders, subtracters, registers, etc.; and the discrete calculation results are obtained on the basis of multiplication and accumulation calculation results stored in a read only memory.
The data input device to which external data are input holds image data $x_0, \ldots, x_7$ in the case of the forward direction transformation and transform coefficients $X_0, \ldots, X_7$ in the case of the inverse direction transformation. The data input device is constructed by two banks of orthogonal memory (corner turn memory) (two bank ROM) for executing parallel-serial transformation, as shown in FIG. 1. In more detail, as shown in FIG. 2, the orthogonal memory includes 8 words WORD0, . . . , WORD7, and WORDi ($i = 0, \ldots, 7$) holds the input data $x_i$ or $X_i$. Therefore, each word has the same number of bits as the input data. For instance, if the input data $x_i$ or $X_i$ is 16-bit data, each word of the orthogonal memory has 16 bits. In the orthogonal memory, data are generally written in the word direction and read in the bit column direction, so that parallel-serial transformation can be executed.
In the data input device as shown in FIG. 2, 8 input data $x_0, \ldots, x_7$ or $X_0, \ldots, X_7$ are written in the A-bank of the orthogonal memory. While these written data are being read, the succeeding 8 input data are written in the B-bank of the orthogonal memory, so that the data can be input continuously. In this data read, the input data are read two bits per cycle, beginning from the least significant bits. Thus, if the input data are of 16 bits, 8 cycles are required to read the input data completely. Further, the data input device has 8 output terminals $2_1$ to $2_8$, and two digits (2 bits) of the input data $x_i$ or $X_i$ are output in each cycle from the output terminal $2_i$ ($i = 1, \ldots, 8$). For instance, if the bits of the input data ($x_i$ or $X_i$) are $d_{15}, d_{14}, \ldots, d_1, d_0$, the two bits $d_{2j-1}$ and $d_{2j-2}$ are output from the output terminal $2_i$ in the $j$-th cycle ($j = 1, \ldots, 8$).
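The parallel-serial read described above can be sketched as follows. This is a minimal software model of one bank only, assuming 16-bit two's-complement words; the double buffering between the A-bank and the B-bank is not modelled, and the names `write_bank` and `read_cycles` are illustrative.

```python
WORD_BITS = 16
NUM_WORDS = 8

def write_bank(values):
    """Write 8 signed inputs in the word direction as 16-bit two's-complement words."""
    if len(values) != NUM_WORDS:
        raise ValueError("a bank holds exactly 8 words")
    return [v & 0xFFFF for v in values]

def read_cycles(bank):
    """For each cycle j = 1..8, yield the bit pair (d[2j-1], d[2j-2]) of every word."""
    for j in range(1, WORD_BITS // 2 + 1):
        hi, lo = 2 * j - 1, 2 * j - 2
        # One entry per output terminal: two bit columns read across all words.
        yield [((w >> hi) & 1, (w >> lo) & 1) for w in bank]
```

For example, a word holding the value 1 delivers the pair (0, 1) in the first cycle and (0, 0) in every later cycle, while a word holding -32768 delivers (1, 0) only in the eighth cycle.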
Further, two read only memories (ROMs) are used for the discrete calculations to store the multiplication and accumulation calculation results. The multiplication and accumulation data of the even-order coefficients are stored in one of the ROMs and those of the odd-order coefficients are stored in the other. Further, each ROM includes 4 (first to fourth) memory sections.
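How a ROM holding precomputed multiplication and accumulation results can replace multipliers may be illustrated with a generic bit-serial look-up scheme. The sketch below is an assumption about the general principle, not a description of the ROM contents of the prior art processor: it forms one inner product of 4 fixed coefficients with 4 two's-complement inputs, processing one bit per step for simplicity (the device described above reads two bits per cycle). The names `build_table` and `lookup_dot` are illustrative.

```python
def build_table(coeffs):
    """Precompute, for every bit pattern, the sum of the selected coefficients."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def lookup_dot(coeffs, values, width=16):
    """Compute sum(coeffs[k] * values[k]) using only table look-ups, shifts and adds.

    values must be integers representable in `width`-bit two's complement.
    """
    table = build_table(coeffs)
    acc = 0.0
    for b in range(width):
        # Address formed from bit b of every input (least significant bit first).
        addr = 0
        for k, v in enumerate(values):
            addr |= ((v >> b) & 1) << k
        term = table[addr] * (1 << b)
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == width - 1 else term
    return acc
```

With 4 inputs the table has 16 entries per output coefficient, which suggests why the even-order and odd-order coefficients, each depending on 4 sums or 4 differences, lend themselves to separate tables.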
The data stored in the first to fourth memory sections differ between the forward direction transformation and the inverse direction transformation, so that the ROM is composed of two banks as shown in FIG. 1. The ROM shown in FIG. 1 is of the contact program ROM type, in which NMOS transistors whose gates are connected to the respective word lines 42 are used as memory cells, and the ROM is programmed depending upon whether or not the drains (on the side opposite to the bit line 43) of these transistors are connected to any of the 4 program lines 46, 47, 48 and 49.
In the prior art discrete cosine transform processor based upon the above-mentioned discrete calculation system, the contact program system of two-bank construction is adopted so that the read only memory can be used in common for both the forward and inverse directions; the data stored therein can thus be switched between the forward and inverse direction transformations. However, since the prior art two-bank construction requires 4 control lines, there exists a problem in that a large area is required, so that the chip size of the discrete cosine transform processor inevitably increases.
On the other hand, another prior art discrete cosine transform processor is disclosed in Japanese Laid-open Patent Application 5-153402 (1993), in which the discrete cosine transform (DCT) calculations are executed by sampling the input data of a plurality of dimensions two bits at a time for each dimension and by using look-up tables prepared for each of the two bits and for each of the forward and inverse DCT matrices. In this prior art DCT processor, however, two ROMs storing the same data are required for each of the forward and inverse transformation directions in order to execute 2-digit processing in one cycle, so that 4 ROM tables are necessary for each coefficient. As a result, as many as 64 ROM tables (4 tables $\times$ 8 coefficients $\times$ 2 dimensions) are required in the case of 2 dimensions of 8 coefficients. In other words, this prior art processor has another problem in that the number of necessary ROM tables is large.