1. Field of the Invention
The present invention relates to a discrete cosine transformer for performing discrete cosine transform in a device for compressing image data, and to an inverse discrete cosine transformer for performing inverse discrete cosine transform in a device for decompressing image data, for example. More specifically, the present invention relates to a discrete cosine transformer and an inverse discrete cosine transformer having small circuit scale and small power consumption.
2. Description of the Background Art
Image signals contain a formidable amount of data. Therefore, it is general practice to compress the data for transmission or recording, and to decompress it to the original data amount at the time of reception or reproduction. Generally, image data has a large low frequency component and a small high frequency component. Therefore, the low frequency component is subjected to fine quantization while the high frequency component is subjected to coarse quantization. For this purpose, it is necessary to transform the image data into frequency components. Discrete cosine transformation (hereinafter referred to as DCT) is used as a method for transforming image data into frequency components. Inverse discrete cosine transformation (hereinafter referred to as IDCT) is used as a method for the reverse transformation.
Generally, image data is processed on a unit of 8.times.8 pixels, which requires two-dimensional DCT of 8.times.8 as well as a two-dimensional IDCT of 8.times.8. However, direct implementation of two-dimensional DCT or IDCT results in a considerably large circuit scale. Accordingly, the two-dimensional DCT or two-dimensional IDCT is realized by performing one-dimensional DCT or one-dimensional IDCT once for the longitudinal direction and once for the lateral direction of the image.
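The row and column decomposition described above can be illustrated with a short sketch. It is not taken from the cited circuits: the function names, the sample block and the orthonormal scaling of the one-dimensional transform are assumptions made for illustration. The sketch checks numerically that a one-dimensional DCT applied to every row and then to every column gives the same result as direct evaluation of the two-dimensional double sum.

```python
import math

N = 8  # block size used in the text

def dct_1d(x):
    """N-point 1-D DCT with orthonormal scaling (the scaling is an assumption)."""
    out = []
    for i in range(N):
        scale = math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)
        acc = sum(x[j] * math.cos((2 * j + 1) * i * math.pi / (2 * N))
                  for j in range(N))
        out.append(scale * acc)
    return out

def dct_2d_separable(block):
    """Apply the 1-D DCT to every row, then to every column of the result."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to result[u][v]

def dct_2d_direct(block):
    """Direct evaluation of the 2-D double sum, used only as a reference."""
    def a(i):
        return math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)

    def basis(i, j):
        return math.cos((2 * j + 1) * i * math.pi / (2 * N))

    return [[a(u) * a(v) * sum(block[y][x] * basis(u, y) * basis(v, x)
                               for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

# Hypothetical 8x8 sample block with small signed values.
block = [[(3 * r + 5 * c) % 17 - 8 for c in range(N)] for r in range(N)]
```

Evaluated as plain matrix products, the direct form needs 64 multiply-accumulate terms for each of the 64 outputs (4096 in total), whereas the separable form needs sixteen 8-point transforms of 64 terms each (1024 in total), which is the saving the decomposition is after.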
The one-dimensional DCT and one-dimensional IDCT will be described in the following. Let us represent 8 input signals as X.sub.j (j=0 to 7). The DCT can be represented by the following expression (1). ##EQU1## The 8 values Y.sub.i (i=0 to 7) obtained by this expression are referred to as DCT coefficients.
The expression (1) can be represented by the following matrix representation (2). ##EQU2## where C.sub.k (k=1 to 7) represents cos (k .pi./16).
The following matrix representations (3) and (4) are derived from matrix representation (2). ##EQU3##
Meanwhile, one-dimensional IDCT for inverse transformation to one-dimensional DCT is represented by the following expression (5). ##EQU4##
The expression (5) can be represented by the following matrix representation (6). ##EQU5##
The following matrix representations (7) and (8) are derived from the matrix representation (6). ##EQU6##
When DCT and IDCT are actually performed, the 4.times.4 matrix representations (3), (4), (7) and (8) are used instead of the 8.times.8 matrix representations (2) and (6), in order to reduce the number of multiplications.
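The saving obtained from the 4.times.4 form can be checked with a short sketch. The equation images are not reproduced in this text, so the coefficient matrices `EVEN` and `ODD` below are an assumption: they are the commonly used even/odd factorization of the 8-point DCT, written with C.sub.k = cos(k .pi./16) and the factor 1/2 that appears in the expression for Y.sub.0 later in this description. All names are hypothetical.

```python
import math

C = [math.cos(k * math.pi / 16) for k in range(8)]  # C[k] = cos(k*pi/16)

# Assumed 4x4 coefficient matrices (standard even/odd split of the 8-point DCT).
EVEN = [[C[4],  C[4],  C[4],  C[4]],   # Y0
        [C[2],  C[6], -C[6], -C[2]],   # Y2
        [C[4], -C[4], -C[4],  C[4]],   # Y4
        [C[6], -C[2],  C[2], -C[6]]]   # Y6
ODD = [[C[1],  C[3],  C[5],  C[7]],    # Y1
       [C[3], -C[7], -C[1], -C[5]],    # Y3
       [C[5], -C[1],  C[7],  C[3]],    # Y5
       [C[7], -C[5],  C[3], -C[1]]]    # Y7

def dct8_even_odd(x):
    """8-point DCT computed as two 4x4 products on sums and differences."""
    s = [x[j] + x[7 - j] for j in range(4)]  # X0+X7, X1+X6, X2+X5, X3+X4
    d = [x[j] - x[7 - j] for j in range(4)]  # X0-X7, X1-X6, X2-X5, X3-X4
    y = [0.0] * 8
    for r in range(4):
        y[2 * r] = 0.5 * sum(EVEN[r][j] * s[j] for j in range(4))
        y[2 * r + 1] = 0.5 * sum(ODD[r][j] * d[j] for j in range(4))
    return y

def dct8_direct(x):
    """Reference: 8-point DCT evaluated as a full 8x8 product."""
    y = []
    for i in range(8):
        scale = C[4] if i == 0 else 1.0
        y.append(0.5 * scale * sum(x[j] * math.cos((2 * j + 1) * i * math.pi / 16)
                                   for j in range(8)))
    return y
```

The 8.times.8 product uses 64 multiplications, while the two 4.times.4 products use 32 in total, at the cost of the eight additions and subtractions that form the sums and differences.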
For the multiplication and accumulation implementing the matrix operation, a multiplier accumulator including a multiplier and an adder is often used. However, multiplication and accumulation using the multiplier accumulator requires a large scale multiplier. Accordingly, a method for performing multiplication and accumulation using a memory instead of the multiplier has been proposed. One such method of multiplication and accumulation using a memory and an adder, the distributed arithmetic method, is disclosed, for example, in A. Peled and B. Liu, "A New Hardware Realization of Digital Filters", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 456-462, December 1974.
The distributed arithmetic method will be described in the following. Multiplication and accumulation of the variable X.sub.j (j=0 to M-1) and a fixed coefficient Cj(j=0 to M-1) of the following equation (9) will be considered. ##EQU7##
When Xj is represented by a 2's complement of N bits, Xj will be given by the following expression (10). ##EQU8## where X.sub.j (k) represents a bit at the kth bit position (lower by k than the most significant bit), which is either 0 or 1.
From the expression (10), the expression (9) can be modified to the expression (11). ##EQU9##
When the partial sum Z.sub.k is defined by the following expression (12), the expression (11) will be given by the following expression (13). ##EQU10##
In this manner, the result Y of multiplication and accumulation of variable X.sub.j (j=0 to M-1) and the fixed coefficient C.sub.j (j=0 to M-1) is given by the expression (13). Note the partial sum Z.sub.k defined by the expression (12). Components of this expression are C.sub.j (j=0 to M-1) and X.sub.j (k)(j=0 to M-1), that is, a bit train including M bits (X.sub.0 (k) X.sub.1 (k) X.sub.2 (k) . . . X.sub.M-1 (k)) which are the bits at the kth bit position of the variables X.sub.j (j=0 to M-1) (hereinafter, the bit train will be referred to as M bit train X.sub.j (k)(j=0 to M-1)). Here, C.sub.j is a fixed coefficient. Accordingly, the partial sum Z.sub.k is a function of the M bit train X.sub.j (k)(j=0 to M-1). Accordingly, values which the partial sum Z.sub.k can assume are stored in advance in a memory so that when the memory is accessed using (inputting) a bit train X.sub.j (k)(j=0 to M-1) as an address, a partial sum Z.sub.k corresponding to the bit train can be read (output). Multiplication and accumulation is possible when each of the partial sums Z.sub.k read from the memory is shifted as represented by the expression (13) in accordance with the bit position k of the bits of the bit train constituting the partial sum and the shifted sums are added.
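The table lookup described above can be sketched in a few lines. The sketch is an illustration under stated assumptions, not the circuit of the cited reference: inputs are treated as N-bit two's complement integers (so the bit at position k carries weight 2^(N-1-k), negative for k=0), coefficients are integers so that the result can be compared exactly, and bit j of the memory address is taken from input X.sub.j. The names `build_rom`, `bit_train` and `da_mac` are hypothetical.

```python
def build_rom(coeffs):
    """Precompute every value the partial sum Z_k can take.

    Entry 'addr' holds the sum of coeffs[j] over all j whose bit is set in addr.
    """
    m = len(coeffs)
    return [sum(coeffs[j] for j in range(m) if (addr >> j) & 1)
            for addr in range(1 << m)]

def bit_train(xs, k, nbits):
    """Address formed from bit position k of every input (k = 0 is the MSB)."""
    shift = nbits - 1 - k
    addr = 0
    for j, x in enumerate(xs):
        addr |= ((x >> shift) & 1) << j  # Python's >> keeps two's complement bits
    return addr

def da_mac(xs, coeffs, nbits):
    """Sum of coeffs[j] * xs[j] using only table reads, shifts and additions."""
    rom = build_rom(coeffs)
    total = 0
    for k in range(nbits):
        z_k = rom[bit_train(xs, k, nbits)]
        weight = 1 << (nbits - 1 - k)
        # The most significant bit of a two's complement number is negative.
        total += -weight * z_k if k == 0 else weight * z_k
    return total
```

For M inputs the memory holds 2^M entries (16 for the four-input sums used in the DCT), and one multiplication and accumulation takes N table reads regardless of the coefficient values.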
Referring to FIG. 1, a one-dimensional DCT circuit (discrete cosine transformer) employing the distributed arithmetic is adapted such that 8 multiplication and accumulation operations implementing the matrix representations of (3) and (4) are performed by distributed arithmetic on input X.sub.j (j=0 to 7) and to output DCT coefficients Y.sub.i (i=0 to 7). Here, it is assumed that both X.sub.j (j=0 to 7) and Y.sub.i (i=0 to 7) are data of 9 bits, respectively.
The operation of the circuit will be described in the following. 8 data X.sub.0 to X.sub.7 input from an input terminal 401 are successively transferred to registers 402 to 409, and held in registers 409 to 402, respectively. Outputs X.sub.0 to X.sub.7 from registers 409 to 402 are added or subtracted in adder/subtractor 410. Results of addition X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are held by registers 411 to 414, respectively. Results of subtraction X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are held in registers 415 to 418, respectively. Since input X.sub.j (j=0 to 7) is data containing 9 bits, the results of addition X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 as well as the results of subtraction X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are data of 10 bits, respectively.
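The growth from 9 to 10 bits in the adder/subtractor stage can be checked with a minimal sketch. It assumes that the 9 bit inputs are signed two's complement values in the range -256 to 255; the function names are hypothetical.

```python
def butterfly(x):
    """Form the four sums and four differences held in the two register groups."""
    sums = [x[j] + x[7 - j] for j in range(4)]   # X0+X7, X1+X6, X2+X5, X3+X4
    diffs = [x[j] - x[7 - j] for j in range(4)]  # X0-X7, X1-X6, X2-X5, X3-X4
    return sums, diffs

def fits_signed(value, nbits):
    """True if value is representable as an nbits-wide two's complement number."""
    return -(1 << (nbits - 1)) <= value <= (1 << (nbits - 1)) - 1
```

With the largest and smallest 9 bit inputs, a difference can reach 511 and a sum can reach -512, both of which need 10 bits but never 11.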
Referring to FIGS. 2A and 2B, operations of a bit distributor 419 receiving as inputs the outputs X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 from registers 411 to 414, and a bit distributor 420 receiving as inputs the outputs X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 from registers 415 to 418, respectively, will be described in the following. In bit distributor 419, outputs from registers 411 to 414 are divided into ten signal lines corresponding to the most significant to the least significant bits (most significant bit [9] to least significant bit [0]). Ten sets of four signal lines, each set representing the 4 bits of the same bit position, are provided and connected to a selector 479. In response to a select signal output from a select signal generating circuit 479b, the values of the ten sets of 4 bit signal lines are successively selected by selector 479 starting from the set of the most significant bit, and output to multiplication and accumulation blocks 421 to 424. More specifically, the values of the 4 bit signal lines are output to multiplication and accumulation blocks 421 to 424 in the order of signal lines [9], [8], . . . , [0]. Similarly, in bit distributor 420, ten sets of 4 bit signal lines are provided from the outputs of registers 415 to 418, and the values of the 4 bit signal lines are output to multiplication and accumulation blocks 425 to 428 in the order of signal lines [9], [8], . . . , [0].
In summary, ten sets of 4 bit trains including bits of the same bit position of the outputs X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4, which are outputs from registers 411 to 414, respectively, are output successively from bit distributor 419 to multiplication and accumulation blocks 421 to 424, starting from the set of the most significant bit. Similarly, from bit distributor 420, ten sets of 4 bit trains which include bits of the same bit position of X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4, which are outputs from registers 415 to 418, respectively, are output to multiplication and accumulation blocks 425 to 428 successively, starting from the set of the most significant bit.
Thereafter, the multiplication and accumulation blocks 421 to 424 successively receive ten sets of 4 bit trains from bit distributor 419, and carry out four multiplication and accumulation operations implementing the matrix representation of (3), to calculate Y.sub.0, Y.sub.2, Y.sub.4 and Y.sub.6, respectively. Multiplication and accumulation blocks 425 to 428 successively receive ten sets of 4 bit trains from bit distributor 420, perform four multiplications and accumulations implementing the matrix representation of (4), to calculate Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7, respectively. These operations will be described taking calculation of Y.sub.0 by multiplication and accumulation block 421 as an example.
Referring to FIGS. 3A and 4A, ROM 429 provided in multiplication and accumulation block 421 is adapted to output 0 when the input 4 bit train is (0000), to output C.sub.4 when the 4 bit train is (0001), . . . and 4C.sub.4 when (1111), respectively. The respective bit values of the 4 bit trains are the values of bits of the same position of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4. An output from ROM 429 is, when a 4 bit train is an input, a partial sum of Y.sub.0 ={C.sub.4 (X.sub.0 +X.sub.7)+C.sub.4 (X.sub.1 +X.sub.6)+C.sub.4 (X.sub.2 +X.sub.5)+C.sub.4 (X.sub.3 +X.sub.4)}/2. Though not shown, in multiplication and accumulation block 421, output polarity of ROM 429 for every ten inputs is inverted. This corresponds to the fact that a partial sum having 4 bit trains of the most significant bit is negative. In the following, it is assumed that the ROM 429 is structured as described above.
First, 4 bit trains consisting of the most significant bits of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 from bit distributor 419 are input to ROM 429. From ROM 429, a partial sum of Y.sub.0 including as component 4 bit trains of the most significant bit will be output. The output from ROM 429 is input to adder 437 where it is added to an initial output value 0 of shifter 453, and the result is held in register 445. Namely, in this stage, a partial sum of Y.sub.0 having 4 bit trains of the most significant bit as a component is held in register 445.
Thereafter, 4 bit trains consisting of bits at a bit position 1 of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 from bit distributor 419 are input to ROM 429. Consequently, a partial sum of Y.sub.0 having 4 bit trains of the bits at bit position 1 is output from ROM 429. The output from ROM 429 is input to adder 437. Meanwhile, the value held in register 445 is shifted upward by 1 bit by shifter 453 and input to adder 437. The result of addition at adder 437 is held in register 445. Namely, in this stage, register 445 holds the result of addition of the partial sum (shifted upward by 1 bit) of Y.sub.0 having 4 bit trains of the most significant bit as a component and a partial sum of Y.sub.0 having 4 bit trains of the bits at bit position 1 as a component.
Thereafter, 4 bit trains of the bits at bit position 2, 4 bit trains of the bits at bit position 3, . . . 4 bit trains of the least significant bit (bits at bit position 9) of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 from bit distributor 419 are input in this order to ROM 429. Every time, the output from ROM 429 is added to the data (shifted by 1 bit upward by shifter 453) held in register 445 in adder 437, and the result of addition is held in register 445.
Ten sets of 4 bit trains consisting of bits of the same bit position of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are input from bit distributor 419 to multiplication and accumulation block 421 starting from the ones of the most significant bit. After the end of above described operation, the data held in register 445 is rounded to 9 bits by a rounding circuit 461, held in register 469 and then output from multiplication and accumulation block 421.
Now, assume that ten sets of 4 bit trains consisting of the bits of the same bit position of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are all input from bit distributor 419 to multiplication and accumulation block 421 and the above described operation is completed. At that time, determining from the content of operation and expression (13), the data held in register 445 is Y.sub.0. Namely, Y.sub.0 is output from multiplication and accumulation block 421.
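The sequence of operations in multiplication and accumulation block 421 can be mimicked in software as follows. This is a behavioural sketch only, under assumptions that are not in the description: the coefficients are quantized to integers (the value 181, roughly 256 times C.sub.4, is a hypothetical choice), the factor 1/2 and the rounding circuit are omitted, and bit j of the ROM address is taken from the j-th input. The variable `acc` plays the role of register 445, the left shift that of shifter 453, and the addition that of adder 437.

```python
def build_rom(coeffs):
    """All 16 partial sums for a 4-input block, indexed by the 4 bit train."""
    m = len(coeffs)
    return [sum(coeffs[j] for j in range(m) if (addr >> j) & 1)
            for addr in range(1 << m)]

def bit_train(xs, k, nbits):
    """4 bit train at bit position k (k = 0 is the most significant bit)."""
    shift = nbits - 1 - k
    addr = 0
    for j, x in enumerate(xs):
        addr |= ((x >> shift) & 1) << j
    return addr

def bit_serial_mac(xs, coeffs, nbits=10):
    """One bit position per step, most significant bit first."""
    rom = build_rom(coeffs)
    acc = 0                                # register, initially 0
    for k in range(nbits):                 # ten steps for 10 bit inputs
        z = rom[bit_train(xs, k, nbits)]   # ROM read
        if k == 0:
            z = -z                         # polarity inverted for the sign bit
        acc = (acc << 1) + z               # shift up by 1 bit, then add
    return acc

# Hypothetical 10 bit sums X0+X7, X1+X6, X2+X5, X3+X4 and quantized C4.
sums = [300, -511, 17, -2]
c4_row = [181, 181, 181, 181]
result = bit_serial_mac(sums, c4_row)
```

Because each step shifts the running total up by one bit before adding the next partial sum, the partial sum read first ends up weighted by 2^9 and the last by 2^0, which is the weighting required by the shift-and-add form of expression (13).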
Referring to FIGS. 3B to 3D, in multiplication and accumulation blocks 422 to 424, the relation between input/output of ROM 430 to 432 are as shown in FIG. 4A, respectively, and these blocks perform similar operation as in multiplication and accumulation block 421. Therefore, ten sets of 4 bit trains consisting of the bits at the same bit position of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are input starting from the ones consisting of the most significant bit, from bit distributor 419, and Y.sub.2, Y.sub.4 and Y.sub.6, are output, respectively.
Referring to FIGS. 3E to 3H, the relations between input/output of ROMs 433 to 436 of multiplication and accumulation blocks 425 to 428 are as shown in FIG. 4B, respectively, and these blocks perform similar operation as in multiplication and accumulation block 421. Accordingly, ten sets of 4 bit trains consisting of bits at the same bit positions of X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are successively input starting from the ones consisting of the most significant bits from bit distributor 420, and Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7 are output, respectively.
Here, each coefficient of the matrices (3) and (4) is assumed to have 10 bits. Accordingly, the number of bits of the outputs of ROMs 429 to 436 is 12, and the number of bits of the outputs of adders 437 to 444, registers 445 to 452 and shifters 453 to 460 is 21.
Finally, Y.sub.0, Y.sub.2, Y.sub.4, Y.sub.6, Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7 output from multiplication and accumulation blocks 421 to 428, respectively, are input to an output selector 477. The respective input values are selected in a prescribed order and output successively from an output terminal 478, and one-dimensional DCT is completed.
Here, in the one-dimensional DCT circuit, X.sub.j (j=0 to 7) are input successively in 8 steps from input terminal 401. Meanwhile, calculation of Y.sub.0 to Y.sub.7 in multiplication and accumulation blocks 421 to 428 requires ten steps, that is, the same as the number of bits of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 as well as X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4. Namely, the time of calculation is longer than the time of input. This means that the DCT circuit is not capable of real time operation. Therefore, when input is continuously given through input terminal 401 to the DCT circuit, calculation cannot keep up. Continuous input from input terminal 401 is therefore not possible, which means that efficient image compression is impossible.
In order to solve this problem, a method has been proposed, for example, in S. Uramoto et al., "A 100-MHz 2-D Discrete Cosine Transform Core Processor", IEEE Journal of Solid-State Circuits, vol. 27, No. 4, pp. 492-499, April 1992, in which a plurality of 4 bit trains are processed two sets by two sets simultaneously, each train consisting of bits of the same bit position of the input data.
Referring to FIG. 5, a one-dimensional DCT circuit (discrete cosine transformer) is adapted such that matrix representations (3) and (4) are implemented by distributed arithmetic on input X.sub.j (j=0 to 7), and a DCT coefficient Y.sub.i (i=0 to 7) is output. Here, it is assumed that X.sub.j (j=0 to 7) and Y.sub.i (i=0 to 7) are data of 9 bits, respectively.
The operation of the circuit will be described in the following. 8 data X.sub.0 to X.sub.7 input from an input terminal 501 are successively transferred to registers 502 to 509 in the order of input, and held in registers 509 to 502, respectively. Outputs X.sub.0 to X.sub.7 of registers 509 to 502 are added or subtracted in an adder/subtractor 510, respectively. The result of addition X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are held in registers 511 to 514, respectively. The result of subtraction X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are held in registers 515 to 518, respectively. Since input X.sub.j (j=0 to 7) is data of 9 bits, the results of addition X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 as well as the results of subtraction X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are each data of 10 bits.
Referring to FIGS. 6A and 6B, operations of bit distributor 519 receiving outputs X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 of registers 511 to 514, respectively, and bit distributor 520 receiving outputs X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 of registers 515 to 518, respectively, will be described in the following. In bit distributor 519, outputs from registers 511 to 514 are distributed to ten signal lines (most significant [9] to least significant [0]) from the most to the least significant bits, and ten sets of 4 bit signal lines, each set combining the four signal lines of the same bit position, are provided. Of these, the five sets of 4 bit signal lines [9], [7], [5], [3] and [1] are connected to selector 603, while the five sets of 4 bit signal lines [8], [6], [4], [2] and [0] are connected to selector 604. In response to a select signal output from a select signal generating circuit 603b, the five sets of 4 bit signal lines are successively selected starting from the set of the higher bit, by selectors 603 and 604. More specifically, two sets of 4 bit signal lines, that is, the values of signal lines [9] and [8], [7] and [6], [5] and [4], [3] and [2], and [1] and [0], are simultaneously output to multiplication and accumulation blocks 521 to 524. Similarly, in bit distributor 520, ten sets of 4 bit signal lines are provided based on the outputs from registers 515 to 518, and two sets of 4 bit signal lines, that is, the values of signal lines [9] and [8], [7] and [6], [5] and [4], [3] and [2], and [1] and [0], are output to multiplication and accumulation blocks 525 to 528.
In summary, ten sets of 4 bit trains consisting of the bits of the same bit positions of four data X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 which are the outputs from registers 511 to 514 are output two sets by two sets starting from the most significant bit, to multiplication and accumulation blocks 521 to 524. Similarly, from bit distributor 520, ten sets of 4 bit trains consisting of bits of the same bit positions of four data X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 which are the outputs of registers 515 to 518 are output successively two sets by two sets starting from the most significant bit, to multiplication and accumulation blocks 525 to 528.
Then, multiplication and accumulation blocks 521 to 524 successively receive the ten sets of 4 bit trains two sets by two sets from bit distributor 519, perform four multiplications and accumulations implementing the matrix representation (3), and calculate Y.sub.0, Y.sub.2, Y.sub.4 and Y.sub.6, respectively. Multiplication and accumulation blocks 525 to 528 successively receive the ten sets of 4 bit trains two sets by two sets from bit distributor 520, perform multiplications and accumulations implementing the matrix representation (4) and calculate Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7, respectively. These operations will be described, taking multiplication and accumulation block 521 calculating Y.sub.0 as an example.
Referring to FIG. 7A, ROMs 529 and 530 provided in multiplication and accumulation block 521 are adapted to output 0 when the input 4 bit train is (0000), C.sub.4 when it is (0001), . . . , and 4C.sub.4 when it is (1111), as shown in FIG. 4A. The bit values of the respective 4 bit trains are the values of the bits of the same bit position of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4. The outputs from ROMs 529 and 530 are, when a 4 bit train is input, partial sums of Y.sub.0 ={C.sub.4 (X.sub.0 +X.sub.7)+C.sub.4 (X.sub.1 +X.sub.6)+C.sub.4 (X.sub.2 +X.sub.5)+C.sub.4 (X.sub.3 +X.sub.4)}/2. In multiplication and accumulation block 521, though not shown, the output polarity of the upper ROM (ROM 529) is inverted for every five inputs. This corresponds to the fact that a partial sum having the 4 bit train consisting of the most significant bit as a component is negative. The following description is on the premise that ROMs 529 and 530 are structured as described above.
First, 4 bit trains consisting of the most significant bit of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4, and 4 bit trains consisting of the bits at bit position 1 are input from bit distributor 519 to ROMs 529 and 530. Consequently, a partial sum of Y.sub.0 having 4 bit trains consisting of the most significant bit as a component is output from ROM 529. A partial sum of Y.sub.0 having 4 bit trains consisting of the bit at bit position 1 as a component is output from ROM 530. Two outputs from ROM 529 and 530 are added in adder 553 after the output of ROM 529 is shifted by 1 bit upward by shifter 545. The result of addition is added to initial output 0 of shifter 577 in adder 561, and the result of addition is held in register 569. More specifically, in this stage, register 569 holds the result of addition of a partial sum (shifted by 1 bit upward) having 4 bit trains of the most significant bit as a component and partial sum of Y.sub.0 having 4 bit trains of the bits at bit position 1 as a component.
Thereafter, 4 bit trains consisting of bits at bit position 2 of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 and 4 bit trains consisting of bits at bit position 3 of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are respectively input to ROMs 529 and 530 from bit distributor 519. Consequently, a partial sum of Y.sub.0 having 4 bit trains consisting of bits at bit position 2 as a component is output from ROM 529. A partial sum of Y.sub.0 having 4 bit trains consisting of bits at bit position 3 as a component is output from ROM 530. The two outputs from ROMs 529 and 530 are added in adder 553 after the output from ROM 529 is shifted by 1 bit upward by shifter 545. The result of addition is added to data held in register 569, which has been shifted 2 bits upward by shifter 577, and the result of addition is held in register 569. More specifically, in this stage, register 569 holds the result of addition of partial sum (shifted by 3 bits upward) of Y.sub.0 having 4 bit trains consisting of most significant bits as a component, a partial sum (shifted by 2 bits upward) of Y.sub.0 having 4 bit trains consisting of bits at bit position 1 as a component, a partial sum (shifted by 1 bit upward) of Y.sub.0 having 4 bit trains consisting of bits at bit position 2 as a component, and a partial sum of Y.sub.0 having 4 bit trains consisting of bits at bit position 3 as a component.
Thereafter, 4 bit trains consisting of bits at bit position 4, 4 bit trains consisting of bits at bit position 5, 4 bit trains consisting of bits at bit position 6, 4 bit trains consisting of bits at bit position 7, 4 bit trains consisting of bits at bit position 8 and thereafter 4 bit trains of least significant bits (bits at bit position 9) of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are input to ROMs 529 and 530, successively, in this order, from bit distributor 519, and at every input, the output from ROM 529 (shifted by 1 bit upward by shifter 545) and output from ROM 530 are added in adder 553. The result of addition is added to data (shifted by 2 bits by shifter 577) held in register 569 in adder 561. The result of addition is held in register 569.
Thereafter, ten sets of 4 bit trains consisting of bits at the same bit positions of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are all input two sets by two sets starting from those consisting of the most significant bits from bit distributor 519 to multiplication and accumulation block 521. After the end of the above described operation, the data held in register 569 is rounded to 9 bits in rounding circuit 585, held in register 593 and output from multiplication and accumulation block 521.
Now, assume that ten sets of 4 bit trains consisting of bits of respective same bit positions of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are all input two sets by two sets starting from the ones consisting of the most significant bits from bit distributor 519 to multiplication and accumulation block 521 and the above described operation is completed. At this time, determining from the content of operation and expression (13), the data held in register 569 is Y.sub.0, and hence Y.sub.0 is output from multiplication and accumulation block 521.
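The two-sets-at-a-time operation of multiplication and accumulation block 521 can be sketched in the same behavioural style. As before, this is an illustration under assumptions not stated in the description: integer coefficients (181 is a hypothetical quantization of C.sub.4), no factor 1/2, no rounding, and an arbitrary address bit order. The comments map each line to the elements named above.

```python
def build_rom(coeffs):
    """All 16 partial sums for a 4-input block, indexed by the 4 bit train."""
    m = len(coeffs)
    return [sum(coeffs[j] for j in range(m) if (addr >> j) & 1)
            for addr in range(1 << m)]

def bit_train(xs, k, nbits):
    """4 bit train at bit position k (k = 0 is the most significant bit)."""
    shift = nbits - 1 - k
    addr = 0
    for j, x in enumerate(xs):
        addr |= ((x >> shift) & 1) << j
    return addr

def two_bit_mac(xs, coeffs, nbits=10):
    """Process two bit positions per step; returns (result, number of steps)."""
    assert nbits % 2 == 0, "sketch assumes an even number of bits"
    rom = build_rom(coeffs)                        # same contents in both ROMs
    acc = 0                                        # register 569, initially 0
    steps = nbits // 2
    for step in range(steps):
        z_hi = rom[bit_train(xs, 2 * step, nbits)]      # upper ROM 529
        z_lo = rom[bit_train(xs, 2 * step + 1, nbits)]  # lower ROM 530
        if step == 0:
            z_hi = -z_hi                           # sign bit partial sum is negative
        pair = (z_hi << 1) + z_lo                  # shifter 545 and adder 553
        acc = (acc << 2) + pair                    # shifter 577 and adder 561
    return acc, steps

# Hypothetical 10 bit sums X0+X7, X1+X6, X2+X5, X3+X4 and quantized C4.
sums = [300, -511, 17, -2]
c4_row = [181, 181, 181, 181]
result, steps = two_bit_mac(sums, c4_row)
```

For 10 bit inputs the loop runs five times instead of ten, at the cost of a second ROM, a second shifter and a second adder per block.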
Further, in multiplication and accumulation blocks 522 to 524, the relation between input/output of ROMs 531 and 532, 533 and 534 and 535 and 536 are as shown in FIG. 4A, and these blocks operate in the similar manner as multiplication and accumulation block 521. Therefore, ten sets of 4 bit trains consisting of bits of the same bit positions of X.sub.0 +X.sub.7, X.sub.1 +X.sub.6, X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 are input two sets by two sets starting from those consisting of the most significant bit from bit distributor 519, and Y.sub.2, Y.sub.4 and Y.sub.6 are output, respectively.
Referring to FIG. 7B, in multiplication and accumulation blocks 525 to 528, the relations between input and output of ROMs 537 and 538, 539 and 540, 541 and 542 and 543 and 544 are as shown in FIG. 4B, and these blocks operate in the similar manner as multiplication and accumulation block 521. Therefore, ten sets of 4 bit trains consisting of bits of the same bit positions of X.sub.0 -X.sub.7, X.sub.1 -X.sub.6, X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4 are input two sets by two sets successively starting from the ones consisting of most significant bits from bit distributor 520, and Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7 are output.
Now, each coefficient of the matrices shown in (3) and (4) is assumed to have 10 bits. Accordingly, the number of bits of the outputs from ROMs 529 to 544 is 12, the number of bits of the outputs from adders 553 to 560 is 13, and that of adders 561 to 568, registers 569 to 576 and shifters 577 to 584 is 21.
Finally, Y.sub.0, Y.sub.2, Y.sub.4, Y.sub.6, Y.sub.1, Y.sub.3, Y.sub.5 and Y.sub.7 output from multiplication and accumulation blocks 521 to 528, respectively, are input to an output selector 601. The respective input values are selected in a prescribed order and output successively from an output terminal 602. Thus one-dimensional DCT is completed.
Now, in the one-dimensional DCT circuit, X.sub.j (j=0 to 7) are successively input in 8 steps from input terminal 501. Meanwhile, calculation of Y.sub.0 to Y.sub.7 in multiplication and accumulation blocks 521 to 528 requires 5 steps, that is, half the number of bits of X.sub.0 +X.sub.7,X.sub.1 +X.sub.6,X.sub.2 +X.sub.5 and X.sub.3 +X.sub.4 and X.sub.0 -X.sub.7,X.sub.1 -X.sub.6,X.sub.2 -X.sub.5 and X.sub.3 -X.sub.4. Namely, the time for calculation is shorter than the time for input. It means that the DCT circuit is capable of real time operation. Therefore, when input is continuously provided from input terminal 501 to the DCT circuit, calculation can be done in time without fail. This allows continuous input from input terminal 501, and image compression can be done efficiently.
In this manner, by using a memory and an adder, one-dimensional DCT is possible in real time without using a multiplier. However, in the one-dimensional DCT circuit, 8 multiplications and accumulations are performed in parallel, and for each of them two partial sums, having as components two sets of 4 bit trains consisting of bits of the same bit positions of the input data, are successively generated, added and stored. Therefore, the number of adders is as large as 16, and the number of registers is also as large as 16. Further, each register must hold the final result of multiplication and accumulation, and therefore has a wide input/output bit width of 21. Accordingly, the output bit width of the adders related to the registers and the input/output bit width of the shifters are also 21, which means that the circuit scale is considerably large. Further, the wide output of each register is fed back to the adder through the shifter. This means that many signals change at every step, and power consumption is high.