1. Field of the Invention
The present invention relates generally to data processors and data processing methods and, more particularly, to an apparatus and method for carrying out a discrete cosine transform or an inverse discrete cosine transform of data.
2. Description of the Background Art
In order to process video data at high speed, high-efficiency coding is carried out. In high-efficiency coding, the data amount of a digital video signal is compressed while picture quality is kept as high as possible. A redundant component of the signal is first removed for efficient coding. For this purpose, orthogonal transform techniques are often employed. One such orthogonal transform technique is the discrete cosine transform (DCT). The DCT is implemented by a simple product sum operation using a cosine function as a coefficient. The DCT is defined by the following expression (1):
$$Y = AX \qquad (1)$$
where X is an N-term column vector indicating input data, Y is an N-term column vector indicating output data, and A is an N-by-N coefficient matrix represented by the following expression. ##EQU1##
Expression (1) represents the case where input data X has N terms. In general, N = 2^m points are employed, where m is a natural number. A description will now be made of 8-point DCT, where N=8 (m=3). As can be seen from expression (1), the DCT is a matrix operation, and in practice this processing is realized by a product sum operation.
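By way of illustration only, the matrix operation of expression (1) can be sketched in Python as follows. The coefficient matrix is not reproduced in this text, so the sketch assumes the commonly used orthonormal DCT-II coefficients; the names `dct_matrix` and `matvec` are hypothetical and do not come from the source.

```python
import math

N = 8  # 8-point DCT, as in the text (N = 2**3)

def dct_matrix(n=N):
    """Coefficient matrix A. ASSUMPTION: orthonormal DCT-II scaling.

    Entry [k][i] weights input term i in output term k.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matvec(a, x):
    """Y = AX written out as one product sum per output term."""
    return [sum(a_ki * x_i for a_ki, x_i in zip(row, x)) for row in a]

A = dct_matrix()
X = [1.0] * N      # constant input: only the lowest-frequency term is non-zero
Y = matvec(A, X)
```

With a constant input, all of the energy lands in the first output term, which is the redundancy-removal property that makes the transform useful for coding.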
FIG. 1 shows the configuration of a conventional DCT processor. This DCT processor is described in, for example, IEEE, Proceedings of Custom Integrated Circuits Conference 89, 1989, pp. 24.4.1 to 24.4.4.
Referring to FIG. 1, the conventional DCT processor includes eight product sum operation units 100a to 100h arranged in parallel for calculating respective terms y0 to y7 of output data Y.
Each of product sum operation units 100a to 100h is of the same configuration and includes a parallel multiplier 101 for taking a product of input data xi (i=0 to 7) and a predetermined weighting coefficient, and an accumulator 102 for accumulating an output of parallel multiplier 101 to generate output data yj (j=0 to 7). Here, reference characters 101 and 102 generically denote respective components 101a to 101h and 102a to 102h. In the following description also, reference numerals having no suffixes generically denote corresponding elements.
Accumulator 102 includes a 2-input adder 103 for receiving an output of parallel multiplier 101 at its one input, and an accumulating register 104 for latching an output of adder 103. An output of register 104 is applied to an output terminal 106 and also to the other input of adder 103. Data yj of the respective terms of output data Y are sequentially output through a selector not shown from output terminal 106. An operation will now be described.
Identical data are applied through an input terminal 105 to product sum operation units 100a to 100h. The following arithmetic operation is carried out in each of product sum operation units 100a-100h:
$$y_j = \sum_{i=0}^{7} A(i, j) \cdot x_i \qquad (j = 0, 1, \ldots, 7)$$
For example, data y0 of a zeroth term in an output data vector Y is calculated as follows in product sum operation unit 100a.
When receiving zeroth-term data x0 (hereinafter referred to simply as input data) in an input data vector, parallel multiplier 101a outputs a product A (0, 0)·x0 of data x0 and a coefficient A (0, 0) to adder 103a. Register 104a has been reset, so its content is 0. Accordingly, product A (0, 0)·x0 is output from adder 103a and then stored in register 104a.
When input data x1 is applied, a product A (1, 0)·x1 is output from multiplier 101a. The output of adder 103a is then A (0, 0)·x0+A (1, 0)·x1, which is stored in register 104a.
By repetition of such an operation, the output of accumulator 102a after application of x7 is
$$y_0 = \sum_{i=0}^{7} A(i, 0) \cdot x_i$$
so that output data y0 is obtained.
Similar calculation (which differs merely in values of a weighting coefficient A (i, j)) is carried out also in the remaining product sum operation units 100b-100h, and output data y1-y7 are obtained. These output data y0-y7 are sequentially output through output terminal 106.
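The accumulation just described can be illustrated with a short behavioral sketch in Python. It is not a model of the actual circuit: the orthonormal DCT-II weights are an assumption (the coefficient matrix is not reproduced in this text), and `coeff` and `dct_by_accumulation` are hypothetical names. The indexing follows the text, with A (i, j) weighting input term xi in output term yj.

```python
import math

N = 8

def coeff(i, j, n=N):
    """A(i, j): weight of input term x_i in output term y_j.

    ASSUMPTION: orthonormal DCT-II coefficients.
    """
    scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * i + 1) * j * math.pi / (2 * n))

def dct_by_accumulation(x):
    """Feed x0..x7 one term at a time to eight product sum units."""
    registers = [0.0] * N                  # registers 104a-104h, reset to 0
    for i, xi in enumerate(x):             # the same x_i goes to every unit
        for j in range(N):                 # unit j: multiplier 101, adder 103
            registers[j] += coeff(i, j) * xi
    return registers                       # y0..y7 after x7 has been applied

y = dct_by_accumulation([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

In hardware the inner loop runs in parallel across the eight units, so one output vector is ready after eight input terms.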
Corresponding to the DCT operation, there is an inverse DCT (IDCT) operation, which carries out the inverse of the DCT operation. The IDCT is expressed as follows:
$$X = A'Y$$
where an input data vector X is obtained from an output data vector Y. That is, the only difference between the DCT operation and the IDCT operation is the difference between coefficients A and A'. Thus, in the configuration of FIG. 1, the IDCT operation can be carried out by changing the coefficients in parallel multipliers 101a-101h.
In other words, the DCT and the IDCT can be carried out on the same hardware. The only additional hardware is a control circuit (not shown) for selecting between the coefficients for DCT and those for IDCT.
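A minimal sketch of this coefficient selection, in Python, is given below. It assumes orthonormal DCT-II coefficients, for which the inverse coefficients A' are simply the transpose of A; the text does not state the normalization, so this is an assumption. The `inverse` flag is a hypothetical stand-in for the control circuit, and the function names are illustrative only.

```python
import math

N = 8

def dct_coeff(i, j, n=N):
    """A(i, j) for the forward transform. ASSUMPTION: orthonormal DCT-II."""
    scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * i + 1) * j * math.pi / (2 * n))

def idct_coeff(i, j, n=N):
    """A'(i, j): under the orthonormal assumption, the transpose of A."""
    return dct_coeff(j, i, n)

def product_sum(data, inverse=False):
    """Same multiply-accumulate datapath; only the coefficient set changes."""
    coeff = idct_coeff if inverse else dct_coeff   # role of the control circuit
    out = [0.0] * N
    for i, d in enumerate(data):
        for j in range(N):
            out[j] += coeff(i, j) * d
    return out

x = [float(v) for v in range(N)]
x_back = product_sum(product_sum(x), inverse=True)   # DCT followed by IDCT
```

Running the forward pass and then the inverse pass recovers the input to within floating-point rounding.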
The above-described one-dimensional DCT operation can be expanded to a two-dimensional DCT operation. The two-dimensional DCT operation is obtained by making both input data vector X and output data vector Y be two-dimensional vectors.
FIG. 2 shows the configuration of a conventional two-dimensional DCT (or IDCT) processor. Referring to FIG. 2, the processor includes a first one-dimensional DCT processing section 111a for subjecting input data from input terminal 105 to one-dimensional DCT processing, a transposition circuit 112 for rearranging rows and columns of an output of first one-dimensional DCT processing section 111a, and a second one-dimensional DCT processing section 111b for subjecting an output of transposition circuit 112 to one-dimensional DCT processing. First one-dimensional DCT processing section 111a performs a DCT (or IDCT) operation in a row direction, and second one-dimensional DCT processing section 111b performs a DCT (or IDCT) operation in a column direction.
FIG. 3 is a diagram showing the configuration of the transposition circuit of FIG. 2. Referring to FIG. 3, transposition circuit 112 includes a buffer memory 121 and an address generation circuit 122 for generating write/read addresses of buffer memory 121. Buffer memory 121 receives output data of first one-dimensional DCT processing section 111a through an input terminal 125 and sequentially stores the same therein in accordance with an address signal from address generation circuit 122. Also, buffer memory 121 applies corresponding data from an output terminal 126 to second one-dimensional DCT processing section 111b in accordance with an address signal from address generation circuit 122. An operation will now be described. Input data X and output data Y are two-dimensional, and their elements are represented by x (i, j) and y (i, j), respectively, where i, j=0, 1 . . . 7.
Input data are applied in the order of rows to first one-dimensional DCT processing section 111a. More specifically, input data are applied to input terminal 105 in the order of 8-term row vectors x (0, j), x (1, j), . . . x (7, j).
First one-dimensional DCT processing section 111a performs the DCT operation for each row vector to output intermediate data Z. At that time, first DCT processing section 111a outputs intermediate data of row vectors in the order of rows, i.e., z (0, j), z (1, j) . . . . Accordingly, a DCT operation in the row direction of input data X is carried out.
As shown in FIG. 3, transposition circuit 112 first stores the intermediate data from first DCT processing section 111a into buffer memory 121 in the order in which the intermediate data are received (the order of rows).
Then, intermediate data Z are read in the order of columns, i.e., the order of column vectors z (i, 0), z (i, 1) . . . from buffer memory 121.
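The write-by-rows, read-by-columns behavior of the buffer memory can be sketched in Python as follows. This is an illustrative model only: the linear address layout and the names `write_address`, `read_address`, and `transpose_stream` are assumptions, since the text does not specify how address generation circuit 122 forms its addresses.

```python
N = 8

def write_address(t):
    """t-th write: data arrive in row order, so store them sequentially."""
    return t

def read_address(t):
    """t-th read: step down a column before moving to the next column."""
    col, row = divmod(t, N)
    return row * N + col

def transpose_stream(stream):
    """Model of buffer memory 121 driven by the two address sequences."""
    buffer = [None] * (N * N)
    for t, value in enumerate(stream):
        buffer[write_address(t)] = value
    return [buffer[read_address(t)] for t in range(N * N)]

# Label each element with its (row, column) position to make the order visible.
row_order = [(r, c) for r in range(N) for c in range(N)]
col_order = transpose_stream(row_order)
```

The first eight values read out are the elements of column 0, followed by column 1, and so on.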
Intermediate data Z read in the order of columns are applied to second DCT processing section 111b, which carries out one-dimensional DCT processing on the intermediate data. Accordingly, data subjected to one-dimensional DCT processing in the column direction are output from second one-dimensional DCT processing section 111b. Output data Y from second one-dimensional DCT processing section 111b are output in the order of columns from output terminal 106. As a result, the two-dimensional DCT shown by the following equation (3) is performed. ##EQU4##
First and second DCT processing sections 111a and 111b carry out the same processing except for coefficients in the parallel multiplying circuits. If multiplication coefficients of first and second DCT processing sections 111a and 111b are changed, two-dimensional IDCT shown by the following equation (4) is carried out. ##EQU5##
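The row pass, transposition, and column pass described above can be sketched in Python as follows. Equations (3) and (4) are not reproduced in this text, so the sketch assumes the usual separable form with orthonormal DCT-II coefficients, in which the inverse coefficients are the transpose of the forward ones. All function names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """ASSUMPTION: orthonormal DCT-II; [k][i] weights input i in output k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def transform_rows(coeffs, block):
    """One-dimensional pass (sections 111a and 111b): transform each row."""
    return [[sum(c * v for c, v in zip(coeff_row, row)) for coeff_row in coeffs]
            for row in block]

def transpose(block):
    """Stand-in for transposition circuit 112: swap rows and columns."""
    return [list(col) for col in zip(*block)]

def transform_2d(block, inverse=False):
    a = dct_matrix()
    coeffs = transpose(a) if inverse else a       # assumed: A' is A transposed
    z = transform_rows(coeffs, block)             # first pass, row direction
    return transform_rows(coeffs, transpose(z))   # second pass, column direction

ones = [[1.0] * N for _ in range(N)]
y = transform_2d(ones)   # result is delivered in column order, as in the text
```

Because the second pass operates on transposed data, the result comes out in column order. Feeding that result back through `transform_2d` with `inverse=True` restores the original block in row order.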
The DCT processing and IDCT processing described above include a product sum operation. The product operation of this product sum operation is carried out by the parallel multipliers shown in FIG. 1. A multiplier in general requires a large number of adders and the like and therefore has a large circuit scale. Thus, a conventional DCT processor, which requires a plurality of parallel multipliers, has the disadvantage that it cannot be reduced in size.
In a semiconductor integrated circuit that operates synchronously, the upper limit of operation speed is determined by the worst delay path (the path which provides the maximum delay). In the conventional configuration, the worst delay path is established by a parallel multiplier, and the operation speed depends on the processing speed of the parallel multiplier. It is thus difficult to implement fast DCT processing and fast IDCT processing.