In recent years, many proposed image compression standards, such as MPEG-1, MPEG-2, H.261, H.263 and JPEG, have been based on the discrete cosine transform (DCT). Consequently, the DCT has become a primitive function in image compression chips. Because the DCT requires a large number of multiplications, real-time applications incur large hardware costs.
The Discrete Cosine Transform (DCT) is an orthogonal transform whose basis functions are sampled cosine functions. A general two-dimensional DCT is defined as follows: ##EQU1##
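For orientation, one commonly used orthonormal form of the N×N forward and inverse two-dimensional DCT is sketched below. The symbols x(i,j), X(u,v) and C(k), and the 2/N scaling, are assumptions made for illustration; the normalization used in the equation referenced above may differ.

```latex
X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
         x(i,j)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}

x(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, X(u,v)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}

C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases}
```

In this form the kernel is a product of a cosine in i and a cosine in j, which is what makes the transform separable.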
These two equations are standard separable two-dimensional even cosine transforms. Traditional DCT/IDCT circuits implement them by row-column decomposition, in which the N×N 2-D DCT is computed by applying an N-point 1-D DCT first along one dimension and then along the other. The forward and inverse 1-D DCT are given by: ##EQU2##
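The row-column decomposition can be illustrated with a minimal Python sketch. It assumes the orthonormal DCT-II scaling (sqrt(1/N) for the zero-frequency row, sqrt(2/N) otherwise), which may differ from the scaling in the equations above, and every function name in it is hypothetical. It is a software illustration of the decomposition only, not a model of any particular circuit.

```python
import math


def dct_matrix(n):
    """Assumed orthonormal N-point DCT-II matrix: c[u][i] = s(u) * cos((2i+1) * u * pi / (2N))."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def dct_1d(vec, c):
    """Forward N-point 1-D DCT of vec using coefficient matrix c."""
    n = len(vec)
    return [sum(c[u][i] * vec[i] for i in range(n)) for u in range(n)]


def idct_1d(coef, c):
    """Inverse N-point 1-D DCT (c is orthogonal, so the inverse uses its transpose)."""
    n = len(coef)
    return [sum(c[u][i] * coef[u] for u in range(n)) for i in range(n)]


def transpose(block):
    """Software stand-in for the transpose memory between the two 1-D passes."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2-D DCT by row-column decomposition: 1-D DCT on rows, transpose, 1-D DCT again."""
    c = dct_matrix(len(block))
    row_pass = [dct_1d(row, c) for row in block]
    col_pass = [dct_1d(col, c) for col in transpose(row_pass)]
    return transpose(col_pass)


def idct_2d(coef):
    """2-D inverse DCT by the same row-column scheme."""
    c = dct_matrix(len(coef))
    row_pass = [idct_1d(row, c) for row in coef]
    col_pass = [idct_1d(col, c) for col in transpose(row_pass)]
    return transpose(col_pass)


def dct_2d_direct(block):
    """Direct evaluation of the 2-D double sum, for comparison only."""
    n = len(block)
    c = dct_matrix(n)
    return [[sum(c[u][i] * c[v][j] * block[i][j]
                 for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]
```

With this straightforward matrix form, each 1-D pass over an N×N block takes N·N·N multiplications, so the row-column version needs about 2N³ multiplications (1024 for N = 8), whereas evaluating the double sum for every output with precombined coefficients needs about N⁴ (4096 for N = 8). Fast algorithms reduce the count further.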
Various fast algorithms have therefore been introduced to reduce the number of multiplications involved in this transform. These algorithms usually form a butterfly structure in a flow diagram. The butterfly structure has several drawbacks in IC implementation, such as an irregular structure and complex data routing, which may require a large silicon area and a longer design time. Moreover, since each of the multiple multiplier stages is accompanied by rounding or truncation in finite-precision arithmetic, a fixed internal precision can seriously reduce the accuracy of the result.
As a result of much effort, several DCT designs have been developed in past decades. Peter A. et al., in "A High Performance Full-Motion Video Compression Chipset", published in "IEEE Transactions on Circuits and Systems for Video Technology", Vol. 2, No. 2, June 1992, pp. 111-122, presented a DCT circuit implemented with a 4-point inner product and a Wallace tree technique, and disclosed that 20% of the hardware could be saved. Min-Ting Sun et al., in "VLSI Implementation of 16×16 Discrete Cosine Transform", published in "IEEE Transactions on Circuits and Systems", Vol. 36, No. 4, April 1989, pp. 610-617, showed a concurrent DCT architecture with 32 PEs and a RAM performing 16×16 transposition by exploiting distributed arithmetic. A SIMD-systolic architecture for the DCT realized by a butterfly algorithm is presented by Chen-Mie Wu and Auchy Chiou in "A SIMD-Systolic Architecture and VLSI Chips for the two dimensional DCT and IDCT", published in "IEEE Transactions on Consumer Electronics", Vol. 39, No. 4, Nov. 1993, pp. 859-869. Yi-Feng Tang et al. designed a DCT circuit by exploiting a fast DCT algorithm and a multiplier-accumulator based on distributed arithmetic. The transpose memory inserted between the two dimensions of the DCT is partitioned in order to further reduce hardware overhead. This design is presented in "A 0.8u 100 MHz 2-D DCT Core Processor", published in "IEEE Transactions on Consumer Electronics", Vol. 40, No. 3, August 1994.
In fact, many issues must still be taken into consideration in IC implementation, such as the need to perform both the forward and inverse transforms, the complexity of the control logic, and the number of data storage and shuffling elements.
Therefore, an object of the present invention is to provide a DCT/IDCT circuit, which is suitable for VLSI implementation and has the advantages of low hardware costs, high efficiency and a regular structure.