1. Field of the Invention
The present invention relates in general to devices for computing the discrete cosine transform (hereafter called DCT) and the inverse discrete cosine transform (hereafter called IDCT). More specifically, the present invention relates to DCT/IDCT processors based on adders, wherein these processors require fewer transistors, occupy less estimated area, and operate at higher speed than those of the prior art.
2. Description of the Related Art
The discrete cosine transform (DCT) has been adopted by many international standards for image processing and digital communication, such as MPEG1, MPEG2, and ISO9660. Since these standards are frequently applied to ISDN (Integrated Services Digital Network), video telephones, interactive television and high-definition television systems, reducing the fabrication cost of DCT converters and speeding up their conversion process will be a critical factor determining the success of a product.
DCT and IDCT are inverses of each other. In addition, the transformation schemes of both are very similar. One-dimensional DCT or IDCT can be conceptually regarded as a matrix multiplication. For example, 8-point DCT and 8-point IDCT can be expressed as: ##EQU1## where .phi..sub.mn = ##EQU2## 0.ltoreq.m, n.ltoreq.7 ##EQU3## and EQU V=.PHI..sup.T U (2)
where .PHI..sup.T is the transpose matrix of .PHI.,
respectively. In equations (1) and (2), U and V represent 8.times.1 vectors, and .PHI. and .PHI..sup.T represent 8.times.8 matrices.
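The matrix view of equations (1) and (2) can be illustrated with a short sketch. It assumes the common orthonormal DCT-II normalization for .phi..sub.mn, which may differ from the scaling used in equation (1); the helper names `dct_matrix`, `matvec`, and `transpose` and the sample vector are hypothetical.

```python
import math

N = 8  # 8-point transform, as in equations (1) and (2)

def dct_matrix(n=N):
    """Return an n x n DCT matrix (orthonormal DCT-II scaling is an assumption)."""
    rows = []
    for m in range(n):
        scale = math.sqrt((1.0 if m == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * k + 1) * m * math.pi / (2 * n))
                     for k in range(n)])
    return rows

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def transpose(a):
    """Return the transpose of matrix a."""
    return [list(col) for col in zip(*a)]

PHI = dct_matrix()
V = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # illustrative input
U = matvec(PHI, V)                   # DCT:  U = PHI V
V_back = matvec(transpose(PHI), U)   # IDCT: V = PHI^T U
```

Under this normalization the transpose of the matrix is also its inverse, so `V_back` reproduces `V` up to floating-point rounding.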
In addition, the computation of DCT and IDCT respectively formulated in equations (1) and (2) can be simplified by the "decimation in frequency" technique. The reduced DCT derived from equation (1) can be expressed as: ##EQU4## where U.sub.i represents a term of vector U, i=0 to 7; V.sub.j represents a term of vector V, j=0 to 7; and .theta.=.pi./16. In the same way, the reduced IDCT derived from equation (2) can be expressed as: EQU V'=(.PHI.').sup.-1 U' (5) EQU V"=(.PHI.").sup.-1 U" (6)
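The "decimation in frequency" reduction can be sketched as follows. The sketch assumes the orthonormal DCT-II entries (an assumption about equation (1), not a quotation of it) and shows that the even outputs depend only on the sums V.sub.n +V.sub.7-n while the odd outputs depend only on the differences V.sub.n -V.sub.7-n, so that two 4.times.4 products replace one 8.times.8 product. The function names are hypothetical.

```python
import math

THETA = math.pi / 16  # theta as defined in the text

def phi(m, n):
    """Assumed orthonormal 8-point DCT-II entry (the scaling is an assumption)."""
    c = math.sqrt(0.5) if m == 0 else 1.0
    return 0.5 * c * math.cos((2 * n + 1) * m * THETA)

def dct_full(v):
    """Direct 8x8 matrix-vector product."""
    return [sum(phi(m, n) * v[n] for n in range(8)) for m in range(8)]

def dct_reduced(v):
    """Two 4x4 products on the sums and differences of opposing terms."""
    sums = [v[n] + v[7 - n] for n in range(4)]
    diffs = [v[n] - v[7 - n] for n in range(4)]
    u = [0.0] * 8
    for m in range(0, 8, 2):   # even outputs use the sums
        u[m] = sum(phi(m, n) * sums[n] for n in range(4))
    for m in range(1, 8, 2):   # odd outputs use the differences
        u[m] = sum(phi(m, n) * diffs[n] for n in range(4))
    return u

V = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # illustrative input
U_FULL = dct_full(V)
U_REDUCED = dct_reduced(V)
```

The two results agree because the cosine entries are symmetric about the midpoint for even m and antisymmetric for odd m.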
Two-dimensional DCT or IDCT can be obtained directly by cascading the one-dimensional DCT or IDCT, applied first along one dimension and then along the other. Accordingly, equation (1) or equations (3) and (4) can be used for calculating DCT, while equation (2) or equations (5) and (6) can be used for calculating IDCT.
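A minimal sketch of the cascaded approach is given below: the one-dimensional transform is applied to every row, and then to every column of the intermediate result. The orthonormal scaling and the names `dct_1d` and `dct_2d` are assumptions made for illustration.

```python
import math

def dct_1d(v):
    """One-dimensional DCT of a list (orthonormal DCT-II scaling assumed)."""
    n = len(v)
    out = []
    for m in range(n):
        scale = math.sqrt((1.0 if m == 0 else 2.0) / n)
        out.append(scale * sum(v[k] * math.cos((2 * k + 1) * m * math.pi / (2 * n))
                               for k in range(n)))
    return out

def dct_2d(block):
    """Two-dimensional DCT as a row pass followed by a column pass."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]
    # cols_done is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*cols_done)]

FLAT = [[10.0] * 8 for _ in range(8)]   # illustrative constant 8x8 block
COEFFS = dct_2d(FLAT)
```

For a constant block, all of the energy falls into the single coefficient `COEFFS[0][0]`, which is a convenient sanity check on the row-column ordering.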
Many techniques exist for implementing DCT and IDCT processors. For example, U.S. Pat. Nos. 4,791,598 and 4,831,574, and R.O.C. Patent No. 190220 disclose ROM-based and distributed architectures for the DCT/IDCT processor. In addition, U.S. Pat. Nos. 4,837,724, 5,117,381, 5,197,021, 5,249,146 and 5,257,213, and R.O.C. Patent Nos. 211610 and 219420 disclose architectures that combine adders and multipliers for the DCT/IDCT processor. Furthermore, U.S. Pat. No. 5,053,985 discloses an architecture based on central logic operational units, and U.S. Pat. No. 5,181,183 discloses a combinational logic circuit to simplify the operation. In the following, ROM-based DCT and IDCT processors are described in detail to illustrate the prior art.
Distributed Arithmetic (DA) is often used to calculate the inner product of two vectors when one vector is known and fixed. Matrix multiplication in equations (1) to (6) can be regarded as several inner products of vectors. For a generalized N-point system, the i-th term U.sub.i of vector U in equation (1) can be expressed as: ##EQU5## where .phi..sub.ij is the term in the i-th row and the j-th column of transform matrix .PHI.; V.sub.j is the j-th term of vector V; and 0.ltoreq.i,j.ltoreq.N-1. The value of transform matrix term .phi..sub.ij depends on the definition of the N-point system and is known. In addition, vector term V.sub.j is in the form of a series of binary bits and can be expressed as: ##EQU6## where P and M are nonzero integers and P.ltoreq.M. V.sub.j(k) is the coefficient of the 2.sup.k term of vector term V.sub.j and is either ONE or ZERO.
Therefore, according to equations (7) and (8), vector term U.sub.i can be calculated by: ##EQU7##
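The bit-level rearrangement behind distributed arithmetic can be sketched as follows. For simplicity the sketch assumes non-negative integer inputs whose bits run from position 0 to `num_bits - 1`, which is narrower than the general range from P to M in the text, and it ignores sign handling. The function names and the sample values are hypothetical.

```python
def inner_product_direct(coeffs, values):
    """Ordinary inner product: one multiplication per term."""
    return sum(c * v for c, v in zip(coeffs, values))

def inner_product_bit_serial(coeffs, values, num_bits):
    """Same inner product, regrouped by bit position k.

    For each k, the coefficients whose input has bit k set are summed,
    and that partial sum is weighted by 2**k. No multiplication by an
    input value is needed.
    """
    total = 0.0
    for k in range(num_bits):
        bit_plane = [(v >> k) & 1 for v in values]       # V_j(k) for all j
        partial = sum(c for c, b in zip(coeffs, bit_plane) if b)
        total += partial * (1 << k)
    return total

COEFFS = [0.5, -0.25, 1.0, 0.125]   # fixed, known vector (illustrative)
VALUES = [3, 5, 0, 12]              # unsigned 4-bit inputs (illustrative)
DIRECT = inner_product_direct(COEFFS, VALUES)
SERIAL = inner_product_bit_serial(COEFFS, VALUES, 4)
```

Because the coefficient vector is fixed, each per-bit partial sum takes only a finite set of values, which is what makes precomputing them attractive.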
In the ROM-based processor using a DA algorithm, all row vectors of the transform matrix can be individually multiplied by all binary vectors of the same length. These multiplication results are stored in advance in individual ROMs, addressed by the corresponding binary vectors. Therefore, an input vector ready for DCT or IDCT can be divided into a plurality of binary component vectors with incremental powers. These binary component vectors sequentially address the ROMs to acquire the corresponding data. According to the powers of these binary component vectors, the acquired data are shifted and added to each other to obtain an output vector, as in equation (9).
FIG. 1 (Prior Art) is a block diagram of the conventional ROM-based DCT processor. Input vector V, which includes eight terms V.sub.0, V.sub.1, . . . , V.sub.7, is supplied to parallel-to-serial converter 2. All terms of input vector V are individually supplied but all bits of each term are supplied together. After parallel-to-serial conversion, the same bits of all terms, V.sub.j(k) (j=0 to 7) in equation (9), are simultaneously fed into ROMs 10 and serve as addressing data to fetch the corresponding data stored in advance. In this case, each of ROMs 10 has at least 256 memory cells corresponding to the 2.sup.8 possible address combinations. Fetched data from each of ROMs 10 are sequentially fed into the corresponding shift-adder 12. Shift-adders 12 can recursively compute the vector terms U.sub.0 to U.sub.7 from these fetched data, according to equation (9). Finally, output buffer 4 collects all bits of output vector terms U.sub.0, U.sub.1, . . . , U.sub.7, and then individually outputs these terms as output vector U. The structure of the DCT processor in FIG. 1 is also applicable to an IDCT processor, except that the data stored in ROMs 10 must match the transform matrix of IDCT.
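The data flow of FIG. 1 can be modeled in software as a sketch. Each ROM is modeled as a 256-entry table for one row of the transform matrix, and each shift-adder as an accumulator that weights the fetched word by its bit position. The orthonormal matrix scaling, the restriction to unsigned 8-bit inputs, the least-significant-bit-first order, and all names are assumptions made for illustration; they are not taken from the figure.

```python
import math

N = 8  # eight input terms, so each ROM holds 2**8 = 256 words

def dct_row(i):
    """Row i of an assumed orthonormal 8-point DCT-II matrix."""
    scale = math.sqrt((1.0 if i == 0 else 2.0) / N)
    return [scale * math.cos((2 * j + 1) * i * math.pi / (2 * N)) for j in range(N)]

def build_rom(row):
    """For every N-bit address, store the sum of row[j] over the set bits j."""
    return [sum(row[j] for j in range(N) if (addr >> j) & 1)
            for addr in range(1 << N)]

def rom_shift_add(rom, values, num_bits):
    """Address the ROM with one bit plane per step and accumulate with shifts."""
    acc = 0.0
    for k in range(num_bits):
        addr = 0
        for j, v in enumerate(values):
            addr |= ((v >> k) & 1) << j   # bit k of term j drives address bit j
        acc += rom[addr] * (1 << k)       # weight the fetched word by 2**k
    return acc

V = [52, 55, 61, 66, 70, 61, 64, 73]      # unsigned 8-bit samples (illustrative)
ROMS = [build_rom(dct_row(i)) for i in range(N)]
U = [rom_shift_add(ROMS[i], V, 8) for i in range(N)]
```

The table size doubles with every additional input term, which is the area cost that the text attributes to the ROM-based approach.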
FIG. 2 (Prior Art) is a block diagram of the conventional ROM-based reduced DCT processor, which is similar to that in FIG. 1. Pre-processing device 6 adds the opposing two terms (V.sub.0 and V.sub.7, V.sub.1 and V.sub.6, V.sub.2 and V.sub.5, V.sub.3 and V.sub.4) to generate four addition quantities in equation (3). In the same way, pre-processing device 6 subtracts the opposing two terms to generate four subtraction quantities in equation (4). According to matrix .PHI.' in equation (3) and matrix .PHI." in equation (4), required data are respectively stored in ROMs 20a and ROMs 20b in advance. Addition quantities V.sub.0 +V.sub.7, V.sub.1 +V.sub.6, V.sub.2 +V.sub.5, and V.sub.3 +V.sub.4 are fed into ROMs 20a and the following shift-adders 22a to produce the even terms U.sub.0, U.sub.2, U.sub.4, and U.sub.6 of output vector U, as in the FIG. 1 processor. At the same time, subtraction quantities V.sub.0 -V.sub.7, V.sub.1 -V.sub.6, V.sub.2 -V.sub.5, and V.sub.3 -V.sub.4 are fed into ROMs 20b and the following shift-adders 22b to produce the odd terms U.sub.1, U.sub.3, U.sub.5, and U.sub.7 of output vector U. Finally, output buffer 4 collects all bits of output vector terms U.sub.0, U.sub.1, . . . , U.sub.7, and then individually outputs these terms as output vector U.
FIG. 3 (Prior Art) is a block diagram of the conventional ROM-based reduced IDCT processor. All terms of input vector U are divided into even terms (U.sub.0, U.sub.2, U.sub.4, and U.sub.6) and odd terms (U.sub.1, U.sub.3, U.sub.5, and U.sub.7). According to equation (5), the even terms of input vector U are fed into ROMs 30a and the following shift-adders 32a to produce the corresponding addition quantities V.sub.0 +V.sub.7, V.sub.1 +V.sub.6, V.sub.2 +V.sub.5, and V.sub.3 +V.sub.4. At the same time, the odd terms of input vector U are fed into ROMs 30b and the following shift-adders 32b to produce the corresponding subtraction quantities V.sub.0 -V.sub.7, V.sub.1 -V.sub.6, V.sub.2 -V.sub.5, and V.sub.3 -V.sub.4, according to equation (6). Output buffer 4 is used to collect all bits of these addition quantities and subtraction quantities and output them together. Finally, operator 8 calculates all terms of output vector V by adding each addition quantity to the corresponding subtraction quantity, or subtracting the latter from the former (for example, V.sub.0 +V.sub.7 and V.sub.0 -V.sub.7), and dividing the result by 2.
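The reduced IDCT flow of FIG. 3 can be sketched in the same spirit. The sketch assumes the orthonormal DCT-II scaling, under which each inverted 4.times.4 matrix in equations (5) and (6) equals twice the transpose of the corresponding forward matrix; this factor depends on that assumed scaling. The function names are hypothetical.

```python
import math

THETA = math.pi / 16

def phi(m, n):
    """Assumed orthonormal 8-point DCT-II entry (the scaling is an assumption)."""
    c = math.sqrt(0.5) if m == 0 else 1.0
    return 0.5 * c * math.cos((2 * n + 1) * m * THETA)

def dct_full(v):
    """Forward transform, used here only to generate test coefficients."""
    return [sum(phi(m, n) * v[n] for n in range(8)) for m in range(8)]

def idct_reduced(u):
    """Even terms give sums, odd terms give differences, then recombine."""
    even = [u[m] for m in range(0, 8, 2)]
    odd = [u[m] for m in range(1, 8, 2)]
    # Addition quantities V_n + V_(7-n), from the even coefficients.
    sums = [2 * sum(phi(2 * i, n) * even[i] for i in range(4)) for n in range(4)]
    # Subtraction quantities V_n - V_(7-n), from the odd coefficients.
    diffs = [2 * sum(phi(2 * i + 1, n) * odd[i] for i in range(4)) for n in range(4)]
    v = [0.0] * 8
    for n in range(4):
        v[n] = (sums[n] + diffs[n]) / 2       # final add and divide by 2
        v[7 - n] = (sums[n] - diffs[n]) / 2   # final subtract and divide by 2
    return v

V = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # illustrative input
V_BACK = idct_reduced(dct_full(V))
```

The last loop plays the role described for operator 8: each pair of output terms is recovered from one addition quantity and one subtraction quantity.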
The transformation scheme used in the ROM-based DCT/IDCT processor can easily be implemented. However, its main drawback is that it requires a large amount of chip area to implement the ROMs. In addition, addressing these ROMs creates a timing bottleneck, so the computation speed is inevitably degraded.