a) Field of the Invention
The present invention relates to a technique of discrete cosine transformation (DCT), and more particularly to a discrete cosine transformation unit for performing two-dimensional discrete cosine transformations.
DCT is known as a transformation method suitable for image data compression. DCTs include a DCT in the forward direction for transforming image data into frequency components and a DCT in the backward direction for recovering the original image data by inversely transforming the frequency components. In this specification, both types are referred to by the term DCT, and are distinguished by calling one a forward direction (forward) DCT and the other a backward (inverse) DCT.
b) Description of the Related Art
DCT, which is one of the orthogonal transformation methods, is now widely used as a data compression method.
FIGS. 3A to 3C are exemplary diagrams explaining an image data compressing technique.
As shown in FIG. 3A, a frame 50 to be processed is divided into small sub-frames 51 each having a size of 8*8 pixels for example. Each sub-frame 51 constitutes a square matrix of 8 rows and 8 columns having 64 matrix elements. Image information of the frame 50 is processed in units of the sub-frame 51.
As shown in FIG. 3B, a sub-frame 51 of image data 52 is processed by a forward direction DCT processing unit 53 to obtain a DCT factor (F) 54. The DCT factor is obtained by frequency-analyzing the image information in the row direction and column direction. The DCT factor 54 is processed by a threshold processing unit 55 to discard data having a value equal to or smaller than a predetermined value. Next, in order to shorten the length of non-zero data, the data is divided by a predetermined value by a normalization processing unit 56 to obtain data with a shortened length.
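The threshold and normalization steps can be sketched as follows. This is only an illustration: the function name, the comparison on magnitude, and truncation toward zero after the division are assumptions, since the text specifies only "a predetermined value" for each step.

```python
def threshold_and_normalize(factors, threshold, divisor):
    """Zero out small DCT factors, then divide the rest to shorten them.

    `threshold` and `divisor` stand in for the two predetermined values;
    comparing magnitudes and truncating toward zero are assumptions.
    """
    result = []
    for row in factors:
        new_row = []
        for value in row:
            if abs(value) <= threshold:
                new_row.append(0)                     # discarded by thresholding
            else:
                new_row.append(int(value / divisor))  # shortened by normalization
        result.append(new_row)
    return result
```

With a threshold of 5 and a divisor of 8, for example, a factor of 100 becomes 12 and a factor of -3 becomes zero data.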
The image data 57 obtained in the manner described above includes non-zero data and zero data, most of the high frequency components being zero data. The non-zero data is encoded by the Huffman encoding method to further compress the image data. The zero data is encoded by the run-length encoding method to handle a string of zero data as one data, and is further encoded by the Huffman encoding method.
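A minimal sketch of the run-length stage is given below, assuming the data has already been read out as a one-dimensional sequence. The token format ("V" for a value, "Z" for a run of zeros) is hypothetical, and the subsequent Huffman stage is omitted.

```python
def run_length_zeros(values):
    """Collapse each string of zero data into a single ("Z", count) token.

    Non-zero data passes through as ("V", value). Both token labels are
    illustrative choices, not taken from the source.
    """
    tokens = []
    run = 0
    for value in values:
        if value == 0:
            run += 1
            continue
        if run:
            tokens.append(("Z", run))
            run = 0
        tokens.append(("V", value))
    if run:
        tokens.append(("Z", run))   # trailing zeros form one final run
    return tokens
```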
In reproducing the original image from the compressed image data, the data is first decoded by the Huffman decoding method or the like to obtain the image data 57. The image data 57 is then subjected to an inverse processing opposite to the normalization processing and to an inverse DCT processing opposite to the forward direction DCT processing, to thereby reproduce the original image information.
FIG. 3C shows the contents of the forward direction DCT processing shown in FIG. 3B. The image data f is sandwiched between a transposed cosine factor matrix $D^{t}$ and a cosine factor matrix $D$ to obtain a DCT factor F through matrix calculation. The forward direction DCT processing can be expanded and expressed by: $$F = D^{t} f D = \{(fD)^{t} D\}^{t}$$
Namely, the image data f is multiplied by the cosine factor matrix D on the right side of the image data f to frequency-analyze the row direction, the obtained matrix is transposed by interchanging rows and columns, the transposed matrix is multiplied again by the cosine factor matrix D to frequency-analyze the column direction, and the obtained matrix is transposed to obtain the original orientation of rows and columns. In this manner, the DCT factor F is obtained which represents the results of the frequency analysis of the image information in the row and column directions. Matrix multiplication is required to be executed two times.
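The multiply, transpose, multiply, transpose sequence can be sketched as below. FIG. 4A is not reproduced in this text, so the matrix built by `cosine_matrix` is an assumption: it is the orthonormal DCT-II matrix with one basis function per column, which has the row symmetry described for D. All function names are illustrative.

```python
import math

def cosine_matrix(n=8):
    """Assumed cosine factor matrix D (orthonormal DCT-II, basis in columns)."""
    return [[math.sqrt((1 if col == 0 else 2) / n)
             * math.cos((2 * row + 1) * col * math.pi / (2 * n))
             for col in range(n)]
            for row in range(n)]

def matmul(a, b):
    """Plain products-summing matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]

def forward_dct_2d(f, d):
    rows_done = matmul(f, d)         # frequency-analyze the row direction
    flipped = transpose(rows_done)   # interchange rows and columns
    cols_done = matmul(flipped, d)   # frequency-analyze the column direction
    return transpose(cols_done)      # restore the original orientation
```

Under this assumed D, a block in which every pixel is 1 yields a single non-zero factor of 8 at the first row and first column, with all other factors zero.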
FIGS. 4A to 4C show transformation factor matrices to be used by the forward and inverse DCT transformation operations for a sub-frame or block size of 8*8. FIG. 4A shows a cosine factor matrix $D$ and a transposed cosine factor matrix $D^{t}$.
In performing the forward direction DCT by the above-described equation, the cosine factor matrix D is stored in a memory, and an input signal is multiplied by the cosine factor matrix D (products-summing calculation). In the inverse DCT, the image information f can be reproduced from the DCT factor F by the following equation: $$f = D F D^{t} = (D F^{t} D^{t})^{t} = \{(F D^{t})^{t} D^{t}\}^{t}$$
Namely, in the inverse DCT, the DCT factor F is multiplied by the transposed cosine factor matrix $D^{t}$ on the right side of the DCT factor F, and the obtained result is transposed by interchanging rows and columns. The transposed matrix is again multiplied by the transposed cosine factor matrix $D^{t}$, and the obtained result is transposed to obtain the original orientation of rows and columns.
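The inverse sequence can be sketched in the same way and checked with a round trip. As before, the cosine factor matrix is an assumed orthonormal DCT-II matrix standing in for FIG. 4A, and the function names are illustrative.

```python
import math

def cosine_matrix(n=8):
    """Assumed cosine factor matrix D (orthonormal DCT-II, basis in columns)."""
    return [[math.sqrt((1 if col == 0 else 2) / n)
             * math.cos((2 * row + 1) * col * math.pi / (2 * n))
             for col in range(n)]
            for row in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_dct_2d(factors, d):
    dt = transpose(d)
    first = transpose(matmul(factors, dt))   # multiply by D^t, then transpose
    return transpose(matmul(first, dt))      # multiply by D^t again, transpose back

def round_trip(f):
    """Forward transform D^t f D followed by the inverse sequence."""
    d = cosine_matrix(len(f))
    factors = matmul(matmul(transpose(d), f), d)
    return inverse_dct_2d(factors, d)
```

Because the assumed D is orthogonal, `round_trip` returns the input block up to floating-point error.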
Assuming that the image data f and cosine factor matrix D are both an 8*8 matrix, an 8*8 matrix multiplication is performed. In the forward and inverse DCT processing of such an 8*8 matrix multiplication, it is necessary to use eight multipliers.
When carefully observing the cosine factor matrix D, it can be seen that in each column the first to fourth rows are symmetrical with the fifth to eighth rows, neglecting the signs of each element. More particularly, there is a relationship of $D_0 = \pm D_7$, $D_1 = \pm D_6$, $D_2 = \pm D_5$, and $D_3 = \pm D_4$, where $D_0$ to $D_7$ are the elements of a column of the cosine factor matrix D. The sign is determined by the column, taking a plus sign for an odd column and a minus sign for an even column.
Multiplications for the same element factor can be used in common, allowing four multiplication calculations to be executed by one products-summing calculation. DCT with a high speed algorithm positively using such matrix element symmetry has been proposed.
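The saving from this symmetry can be sketched as follows for one row of input data. The matrix is again an assumed orthonormal DCT-II matrix in place of FIG. 4A, and the function names are illustrative. Note that the code counts columns from 0, so an even index k corresponds to an odd column in the numbering used above.

```python
import math

def cosine_matrix(n=8):
    """Assumed cosine factor matrix D (orthonormal DCT-II, basis in columns)."""
    return [[math.sqrt((1 if col == 0 else 2) / n)
             * math.cos((2 * row + 1) * col * math.pi / (2 * n))
             for col in range(n)]
            for row in range(n)]

def dct_1d_direct(x, d):
    """Eight multiplications per output value."""
    return [sum(x[n] * d[n][k] for n in range(8)) for k in range(8)]

def dct_1d_folded(x, d):
    """Four multiplications per output value, using D_n = +/- D_(7-n)."""
    sums = [x[n] + x[7 - n] for n in range(4)]    # used where the sign is plus
    diffs = [x[n] - x[7 - n] for n in range(4)]   # used where the sign is minus
    out = []
    for k in range(8):
        folded = sums if k % 2 == 0 else diffs
        out.append(sum(folded[n] * d[n][k] for n in range(4)))
    return out
```

Both functions give the same result; the folded form only needs the first four rows of each column.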
The number of element factors used in the products-summing calculation of DCT is determined by the block or sub-frame size, and is a fixed number.
As one method of matrix multiplication, there is known a distributed arithmetic (DA) algorithm. Consider a matrix multiplication Y=A*X where X has m bits. A matrix multiplication for one column can be expressed by the following equation (i): ##EQU1## where $X = -x^{(m-1)} 2^{m-1} + \sum_{M=0}^{m-2} x^{(M)} 2^{M}$. Therefore, the equation (i) can be expressed by the following equation (ii): ##EQU2## where $x^{(M)}$ is the M-th bit of X and takes a value "0" or "1". The equation (ii) can be expressed by: $$Y_i = -\Bigl(\sum_{j} A_{ij}\, x_j^{(m-1)}\Bigr) 2^{m-1} + \sum_{M=0}^{m-2} \Bigl(\sum_{j} A_{ij}\, x_j^{(M)}\Bigr) 2^{M}$$
where the first term on the right side represents the sign bit, and the second term represents products-summing for the remaining bits of X. $x_j^{(M)}$ takes a value "0" or "1". If A is an n*n matrix, the products are summed over j, where j=0 to (n-1).
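The equation images for (i) and (ii) are not reproduced in this text. One reading that is consistent with the surrounding definitions, offered as a reconstruction rather than as the original equations, is the following chain, in which $X_j$ denotes the j-th input word in m-bit two's complement form:

```latex
\begin{aligned}
Y_i &= \sum_{j=0}^{n-1} A_{ij} X_j
      && \text{(i)} \\
    &= \sum_{j=0}^{n-1} A_{ij}
       \Bigl( -x_j^{(m-1)} 2^{m-1} + \sum_{M=0}^{m-2} x_j^{(M)} 2^{M} \Bigr)
      && \text{(ii)} \\
    &= -\Bigl( \sum_{j=0}^{n-1} A_{ij}\, x_j^{(m-1)} \Bigr) 2^{m-1}
       + \sum_{M=0}^{m-2} \Bigl( \sum_{j=0}^{n-1} A_{ij}\, x_j^{(M)} \Bigr) 2^{M}
\end{aligned}
```

As a check of the two's complement expansion with m = 4, the value -3 has bits 1101, which gives -1*8 + 1*4 + 0*2 + 1*1 = -3.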
By using look-up tables which store the contents in the parentheses () in the equation (ii) with $x_j$ as a parameter, the matrix multiplication $Y_i$ can be calculated by a shift operation depending upon the bit position and by addition and subtraction operations.
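A minimal sketch of this table-based calculation is shown below for one output value. The integer factors in the usage note are hypothetical stand-ins for one row of A, and the function names are illustrative. The inner products are computed only when the table is built; the calculation itself uses table reads, shifts, additions, and one subtraction.

```python
def build_lut(factors):
    """Entry `addr` holds the sum of the factors whose address bit is 1."""
    n = len(factors)
    return [sum(c for j, c in enumerate(factors) if (addr >> j) & 1)
            for addr in range(1 << n)]

def da_products_sum(lut, xs, m):
    """Sum of factors[j] * xs[j] for m-bit two's complement inputs xs."""
    acc = 0
    for bit in range(m):
        # Gather bit number `bit` of every input into one table address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> bit) & 1) << j
        term = lut[addr] << bit      # shift according to the bit position
        if bit == m - 1:
            acc -= term              # the sign bit carries negative weight
        else:
            acc += term
    return acc
```

For the hypothetical factors 3, -5, 7, 2 and the 4-bit inputs 1, -3, 4, -8, the result is 30, the same as the directly computed products-sum.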
The scale of the hardware configuration for DCT calculation becomes large if multipliers are used to increase the operation speed. It is therefore desirable to use as few multipliers as possible. In this connection, the DA algorithm is suitable as a multiplication method that does not use multipliers.
FIGS. 5A to 5C show an example of a two-dimensional DCT calculation circuit using a DA algorithm. FIG. 5A is a schematic diagram showing the outline structure of the DCT calculation circuit, FIG. 5B shows the structure of a one-dimensional (1-D) processing unit, and FIG. 5C shows the structure of a DA products-summing calculation block of the one-dimensional processing unit.
In FIG. 5A, input data is supplied to a 1-D DCT processing unit 61 to DCT-transform the input data by using look-up tables. An output of the 1-D processing unit 61 is supplied to a shift/round circuit 62. The shift/round circuit 62 aligns the digit position and rounds off a signal having an increased number of bits caused by the DCT process to thereby have a predetermined number of bits.
An output of the shift/round circuit 62 is supplied to a transpose RAM 63 whereat rows and columns are interchanged. The transposed signal is supplied to a 1-D DCT processing unit 64 to frequency-analyze in the other direction, the result being supplied to a shift/round circuit 65. The shift/round circuit 65 again aligns the digit position and rounds off a signal outputted from the 1-D DCT processing unit 64 to thereby have the predetermined number of bits, the result being supplied as output data.
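One common way to bring a widened result back to a predetermined number of bits is to add half of the discarded range and then shift right. The sketch below assumes this round-half-up behaviour; the text does not state which rounding rule the shift/round circuits use, and the function name is illustrative.

```python
def shift_round(value, shift):
    """Drop `shift` low-order bits of an integer, rounding half up.

    The rounding rule is an assumption; only the shift-and-round idea
    comes from the description of the shift/round circuits.
    """
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift
```

For instance, dropping four bits from 1000 (which is 62.5 in units of 16) gives 63.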
FIG. 5B is a schematic diagram showing the structure of the 1-D DCT processing unit 61, 64 shown in FIG. 5A. In the 1-D DCT processing unit, input data is supplied to a pre-processing circuit 66 to form a combination of proper input signals. Combinations of input signals are supplied to two DA products-summing blocks 67 and 68.
For example, if an image signal is an 8*8 sub-frame or block, eight input data signals are supplied to the pre-processing circuit 66 which supplies four signals to the DA products-summing block 67 and the other four signals to the DA products-summing block 68.
Use of the two separate DA products-summing blocks is desirable in that the size of a look-up table is not made too large and the symmetry of a DCT matrix can be positively used. The output signals of the DA products-summing blocks 67 and 68 are supplied to a post-processing circuit 69 to rearrange the signals to form a new set of signals. A set of output signals of the post-processing circuit 69 is supplied to the shift/round circuit 62, 65.
FIG. 6 shows the fundamental structure of such a DCT processing circuit using look-up tables. N input signals are supplied as addresses to an element factor ROM 81 which stores DCT look-up tables, and a products-summing calculation is performed using the look-up tables. If the input $x_i$ is a sign bit, the sign of the output signal of the look-up table 81 is inverted by a signal Ts. The sign-inverted signal is supplied to an adder 83 to be added to an output of a coefficient circuit 84. The adder 83 delivers an output signal $Y_i$.
The output signal $Y_i$ is halved by the coefficient circuit 84 and fed back to the adder 83.
Next, the input of the next higher bit is supplied to the look-up table 81 and processed in a similar manner as above. The output of the look-up table 81 is supplied to the adder 83 to be added to the previous calculation result with its digits being aligned by the coefficient circuit 84, to thereby generate the output signal $Y_i$. The coefficient circuit 84 is used for aligning bit positions.
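The feedback loop of FIG. 6 can be sketched as a recurrence in which each new table output is added to half of the previous result. The sketch assumes the bits are supplied from the lowest bit upward and uses exact fractions so that the halving loses nothing. The factors are hypothetical, and `lut_entry` computes the table output on the fly in place of a ROM read.

```python
from fractions import Fraction

def lut_entry(factors, xs, bit):
    """Stand-in for the ROM read: sum of factors whose input bit is 1."""
    return sum(c for c, x in zip(factors, xs) if (x >> bit) & 1)

def bit_serial_accumulate(factors, xs, m):
    """One cycle per bit: y = (+/-)table output + previous y / 2."""
    y = Fraction(0)
    for bit in range(m):
        term = lut_entry(factors, xs, bit)
        if bit == m - 1:
            term = -term       # sign bit: invert the table output (signal Ts)
        y = term + y / 2       # halved previous result is fed back to the adder
    return y
```

After m cycles the result equals the products-sum divided by 2 to the power (m-1), so the scaling is a fixed, known shift. This is also why an m-bit input takes m cycles in this arrangement.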
If an input signal has 15 bits, it generally takes 15 cycles to process the input signal. In order to process an 8*8 block in real time, it is necessary for the calculation to be completed within 8 cycles, even if a pipelining technique is used. It is possible to complete the calculation within 8 cycles by using 2 bits of the input signal at a time and doubling the capacity of look-up tables.
As seen from the DCT matrices shown in FIG. 4A, the first to fourth rows of each column of the cosine factor matrix D are symmetrical with the fifth to eighth rows of the same column. The same look-up table can therefore be used both for the first to fourth rows and for the fifth to eighth rows. It is efficient to divide eight input signals into two signal groups each having four signals, and to use the same look-up table for each group.
FIG. 7 shows the structure of a DCT processing circuit for processing an 8*8 block wherein input signals are divided into two groups each having four signals, and two bits of each signal are supplied at a time to the DCT processing circuit.
An element factor ROM 81a and element factor ROM 81b each have look-up tables of the same contents. A set of upper bits and a set of lower bits, respectively of the four input signals, are supplied to look-up tables. A lower bit is supplied to a look-up table 81b, and the output of the table 81b is halved by a coefficient circuit 86 and added to an output signal of an upper bit look-up table 81a by an adder 83.
If the input is a sign bit, the sign of the input signal is inverted in response to a signal Ts and added to an output of the coefficient circuit 86 to generate an output signal $Y_i$. This output signal $Y_i$ is fed back to the adder 83 via a coefficient circuit 87, which divides the output signal $Y_i$ by 4. This division by 4 is necessary because two bits are calculated at a time, so that the preceding calculation result will not be multiplied by 4 at the current calculation.
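The two-bits-at-a-time arrangement can be sketched as a recurrence in which each cycle adds the upper-bit table output, half of the lower-bit table output, and a quarter of the previous result. The sketch assumes an even word length with the sign bit as the upper bit of the last pair, and processes bit pairs from the lowest upward. Factors and function names are hypothetical, and `lut_entry` again stands in for a ROM read.

```python
from fractions import Fraction

def lut_entry(factors, xs, bit):
    """Stand-in for the ROM read: sum of factors whose input bit is 1."""
    return sum(c for c, x in zip(factors, xs) if (x >> bit) & 1)

def two_bit_accumulate(factors, xs, m):
    """One cycle per bit pair: y = upper + lower / 2 + previous y / 4."""
    if m % 2:
        raise ValueError("this sketch assumes an even word length")
    y = Fraction(0)
    for low in range(0, m, 2):
        lower = lut_entry(factors, xs, low)       # lower bit table output
        upper = lut_entry(factors, xs, low + 1)   # upper bit table output
        if low + 1 == m - 1:
            upper = -upper                        # sign bit: invert (signal Ts)
        y = upper + Fraction(lower, 2) + y / 4    # halve lower, quarter feedback
    return y
```

The result matches the one-bit-per-cycle recurrence (the products-sum divided by 2 to the power (m-1)) in half the number of cycles, at the cost of a second table.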
FIG. 5C shows the structure of the DA products-summing calculation block to be used by the one-dimensional DCT processing unit shown in FIG. 5B. Each DA products-summing calculation block is inputted with four groups of input signals, each of two bits. Two bits of each of four input signals are divided into an upper bit and a lower bit. The lower bit is supplied to a lower bit look-up table 71a or 72a, and the upper bit is supplied to an upper bit look-up table 71b or 72b.
Namely, the lower bit look-up tables 71a and 72a and the upper bit look-up tables 71b and 72b are supplied with lower and upper bits of the same combination of input signals, and perform the same transformation.
An output signal from the lower bit look-up table 71a, 72a is halved by a coefficient circuit 73, and supplied to an adder 74. An output signal from the upper bit look-up table 71b, 72b is directly supplied to the adder 74.
The coefficient circuit 73 operates to align the digit position of the data read from the tables by using the upper and lower bits, and the adder 74 adds the data. An output signal from the adder 74 is supplied to an accumulator 75 to calculate a products-sum. The accumulator 75 includes an adder 77, a register 78, and a shifter 79. The preceding output signal is bit-shifted by the shifter 79 and fed back to the adder 77.
The adder 77 adds the preceding output signal to the current output signal, and the result is stored in the register 78. For example, in the calculation starting from the lowest bit, the shifter 79 divides the output signal by 4 to align the number of bits with that of the preceding calculation. In the calculation starting from the highest bit, the shifter 79 multiplies the output signal by 4 to align the number of bits with that of the preceding calculation.
In the above-described manner, a DCT calculation is performed by using the DA products-summing blocks shown in FIG. 5C.
In an inverse DCT processing, the transposed matrix $D^{t}$ shown in FIG. 4A is used. $D^{t}$ is not symmetric in the same way as the transformation matrix D. However, it is symmetric with respect to odd and even columns: the 1st and 8th columns, the 2nd and 7th columns, the 3rd and 6th columns, and the 4th and 5th columns.
In the inverse DCT processing, therefore, the scale of a look-up table can be reduced as in the forward DCT processing, by separating the transformation matrix into matrices for odd and even numbers.
Tables 1 to 8 show the contents of look-up tables shown in FIG. 5C.
The forward DCT look-up tables shown in Tables 1 to 8 correspond to the first to eighth columns of the transformation matrix, and the input signals x1 to x4 correspond to the first to fourth rows of the transformation matrix.
In the inverse DCT look-up tables shown in Tables 1 to 8, No. 0 of Table 1 and No. 1 of Table 2 correspond to the first and eighth columns, No. 0 corresponds to the odd row and No. 1 corresponds to the even row. Similarly, No. 2 of Table 3 and No. 3 of Table 4 correspond to the second and seventh columns. No. 4 of Table 5 and No. 5 of Table 6 correspond to the third and sixth columns. No. 6 of Table 7 and No. 7 of Table 8 correspond to the fourth and fifth columns.
Numbers of the forward and inverse DCT look-up tables shown in Tables 1 to 8 are the same for Nos. 1, 3, 5, and 7. Therefore, a single look-up table can be shared.
Each of the eight DA products-summing blocks shown in FIG. 5B uses four look-up tables shown in FIG. 5C. Therefore, the one-dimensional DCT processing is carried out by using 4*8=32 look-up tables. It is necessary to use 32*2=64 look-up tables for the two-dimensional DCT processing.
However, some look-up tables can be shared for the forward and inverse DCT processing as seen from Tables 1 to 8, so that the number of necessary look-up tables becomes 48.
Furthermore, the look-up table shown in FIG. 1 has high symmetry which allows a simpler circuit configuration.
FIGS. 8A and 8B show an example of a simplified DCT processing circuit using the symmetry of a look-up table. As shown in FIG. 8A, in the forward DCT look-up table No. 0 for example, if 8192 is subtracted from each number, the upper half and lower half of the table become symmetrical.
Namely, the contents of the look-up table can be halved if bit x4 is used to exclusive-OR other bits and the signs thereof are inverted.
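The halving can be sketched as follows. Tables 1 to 8 are not reproduced in this text, so the four factors are hypothetical, and the offset used here (half the sum of the factors) is an assumed counterpart of the value 8192 mentioned above. Once that offset is subtracted, the entry for an address is the negative of the entry for the complemented address, so only the eight entries with x4 = 0 need to be stored.

```python
FACTORS = [6, -4, 10, 2]          # hypothetical factors for x1..x4
OFFSET = sum(FACTORS) // 2        # assumed offset that centers the table on zero

def full_entry(addr):
    """Entry of the full 16-entry table for a 4-bit address."""
    return sum(c for j, c in enumerate(FACTORS) if (addr >> j) & 1)

# Half-size table: offset entries for the eight addresses with x4 = 0.
HALF_TABLE = [full_entry(addr) - OFFSET for addr in range(8)]

def folded_lookup(x1, x2, x3, x4):
    """Recover the full-table entry from the half-size table."""
    # Exclusive-OR the other bits with x4 to form a 3-bit address.
    addr = (x1 ^ x4) | ((x2 ^ x4) << 1) | ((x3 ^ x4) << 2)
    value = HALF_TABLE[addr]
    if x4:
        value = -value            # invert the sign when x4 is 1
    return value + OFFSET         # add the offset back
```

In this sketch the three exclusive-OR operations and the conditional sign inversion replace the upper half of the table.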
FIG. 8B shows the structure of a DA products-summing calculation block using such symmetry. Look-up tables 88a and 88b have half the contents of the look-up tables 81a and 81b shown in FIG. 7, and are inputted with three input signals generated by exclusive OR gates.
The signal x4 is supplied via an exclusive OR gate to a sign inverter 82. The other structure is the same as that shown in FIG. 7.
As described above, for simplifying the structure of the DCT processing unit, it is necessary to simplify the calculation by using the symmetry of the cosine factor matrix and transposed cosine factor matrix.
It is therefore necessary to divide input signals into groups each using the same transformation factor. In order to divide a plurality of input signals into a plurality of desired combinations of input signals, butterfly circuits in combinations of adders and subtracters are used.
FIG. 2 shows an example of a two-dimensional DCT processing unit having ROM tables which store matrix element factors for the DCT processing. Input signals are supplied to a shift register 121 which supplies the input signals in parallel to a butterfly circuit 122 and to a selector 123. The butterfly circuit 122 divides a plurality of input signals into a plurality of desired input signal groups and outputs the latter to the selector 123.
The selector 123 selects one of the input signals in response to a forward/inverse select signal indicating whether the DCT processing unit performs a forward DCT processing or inverse DCT processing. The input signal selected by the selector 123 is supplied to a calculation ROM unit 124 to perform a matrix calculation.
An output signal from the calculation ROM unit 124 is supplied in parallel to a selector 126 and to a butterfly circuit 125. The butterfly circuit 125 divides a plurality of input signals into a plurality of desired input signal groups, and outputs the latter to the selector 126.
The selector 126 selects one of the input signals in response to a forward/inverse select signal indicating whether the DCT processing unit performs a forward DCT processing or inverse DCT processing. The input signal selected by the selector 126 is supplied to an accumulator 127 which sums up the sequentially-supplied calculation results for respective bits.
An output signal from the accumulator 127 is supplied via a shift register 128 to a transposing circuit 130 to interchange rows and columns. Signals transposed by the transposing circuit 130 are supplied to another one-dimensional processing unit having the same structure described above.
Specifically, the other one-dimensional processing unit includes a shift register 131, butterfly circuit 132, selector 133, calculation ROM 134, butterfly circuit 135, selector 136, accumulator 137, and shift register 138. These elements perform similar operations to those of the corresponding elements 121 to 128, and the calculation result is outputted.
As described above, four butterfly circuits are used for the two-dimensional DCT processing by positively using the symmetry of the cosine factor matrix and transposed cosine factor matrix.
A DCT processing unit with look-up tables can execute the DCT processing without using multipliers as described above.
However, if the capacity of a look-up table is large, the chip area of a ROM occupied by the look-up table becomes large, increasing the chip size and power consumption.