Digital transform coding applied to blocks of one-, two-, and three-dimensional digital signal samples is widely used in applications, such as video signal processing, that require spectral analysis, data compression, and reduction of the original signal bandwidth.
The various types of transform coding are well known. They include the Hadamard and HCT (High Correlation Transform) types, which are based on extremely simple coefficients, and the Fourier transform, which requires complicated floating-point calculations. Other types include the Slant transform, which concerns optimal frequency spectrum energy distribution.
At the present time, however, the discrete cosine transform, referred to hereunder as the DCT transform, provides the best compromise between effective representation in the transform signal frequency spectrum and simplicity of construction in many applications, including video signal processing.
In the case of the one-dimensional DCT transform with an N×N base, the major advantage consists in the recurrence of its N real coefficients.
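One possible reading of this recurrence is that the N×N kernel contains only N distinct coefficient magnitudes. The sketch below checks that reading for the unnormalized kernel cos((2n+1)kπ/2N), which is an assumed form since the text does not state the kernel explicitly. The function name is illustrative only.

```python
def distinct_dct_magnitudes(n_points):
    """Count distinct values of |cos((2n+1)*k*pi/(2N))| in the N x N kernel.

    Assumption: unnormalized DCT kernel. The count is done on integer
    angle indices, so no floating-point comparison is involved.
    """
    period = 2 * n_points  # |cos(m*pi/(2N))| repeats with period 2N in m
    folded = set()
    for k in range(n_points):
        for n in range(n_points):
            m = ((2 * n + 1) * k) % period
            # |cos| is symmetric about m = N, so fold m into the range 0..N,
            # where each index corresponds to a different magnitude
            folded.add(min(m, period - m))
    return len(folded)


if __name__ == "__main__":
    for size in (4, 8, 16):
        print(size, distinct_dct_magnitudes(size))
```

Under this assumption, an 8×8 kernel with 64 entries draws on only 8 magnitudes, which is what fast algorithms and coefficient storage can exploit.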
A number of DCT transform computational algorithms are used; some are based on its direct derivation from the Fourier transform, while others exploit coefficient recurrence. These algorithms all serve to reduce the number of multiplications relative to the total number of operations to be carried out (addition, accumulation, addressing, normalization, rounding and truncation); this makes them particularly suitable for software applications whose major objective is to reduce the number of microinstruction cycles.
Of the several well-known computational algorithms for the one-dimensional DCT transform with an N×N base, that which provides the greatest reduction in the number of multiplications is the Fralick-Chen algorithm. This algorithm is described in the paper, "A fast computational algorithm for the discrete cosine transform", W. Chen, C. H. Smith, S. C. Fralick, IEEE Transactions on Communications, Vol. COM-25, No. 11, September 1977, and requires $\frac{3N}{2}(\log_2 N - 1) + 2$ additions and $N \log_2 N - \frac{3N}{2} + 4$ multiplications.
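To give a sense of scale, the two expressions can be evaluated for typical block sizes. The sketch below is illustrative only: it assumes N is a power of two, takes the logarithm in base 2, and compares against a direct matrix-vector product (N² multiplications and N(N-1) additions). The function names are not from the cited paper.

```python
def chen_op_counts(n_points):
    """Return (additions, multiplications) from the formulas quoted above.

    Assumption: n_points is a power of two and at least 4.
    """
    log_n = n_points.bit_length() - 1  # exact log2 for powers of two
    if n_points < 4 or (1 << log_n) != n_points:
        raise ValueError("n_points must be a power of two >= 4")
    additions = (3 * n_points // 2) * (log_n - 1) + 2
    multiplications = n_points * log_n - 3 * n_points // 2 + 4
    return additions, multiplications


def direct_op_counts(n_points):
    """Return (additions, multiplications) for a plain N x N matrix-vector product."""
    return n_points * (n_points - 1), n_points * n_points


if __name__ == "__main__":
    for size in (4, 8, 16, 32):
        print(size, chen_op_counts(size), direct_op_counts(size))
```

For N = 8 the formulas give 26 additions and 16 multiplications, against 56 additions and 64 multiplications for the direct product.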
For the two-dimensional DCT transform with an N×N base, on the other hand, it is possible to exploit the separability of the transform and apply the algorithms for the one-dimensional case, such as the Fralick-Chen algorithm, in the two orthogonal directions (N one-dimensional transforms along the rows followed by N along the columns); in this way, the number of operations carried out would be 2N times the number required for the one-dimensional case.
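The row-column approach can be sketched as follows. This is a minimal illustration, not a fast algorithm: the one-dimensional step uses the direct unnormalized DCT sum (an assumed form of the kernel), where a practical design would substitute a fast one-dimensional algorithm. The function names are illustrative.

```python
import math


def dct_1d(samples):
    """Direct, unnormalized one-dimensional DCT of a sequence (O(N^2))."""
    n = len(samples)
    return [
        sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i, x in enumerate(samples))
        for k in range(n)
    ]


def dct_2d_row_column(block):
    """Two-dimensional DCT of an N x N block using 2N one-dimensional DCTs."""
    n = len(block)
    # First pass: N transforms, one per row
    rows = [dct_1d(row) for row in block]
    # Second pass: N transforms, one per column of the intermediate result
    cols = [dct_1d([rows[r][c] for r in range(n)]) for c in range(n)]
    # cols[c][k] holds vertical frequency k at horizontal frequency c,
    # so transpose back to [vertical][horizontal] order
    return [[cols[c][k] for c in range(n)] for k in range(n)]


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]
    print(dct_2d_row_column(flat)[0][0])  # only the DC term is non-zero
```

The transposition between the two passes is an example of the intermediate data re-ordering that any row-column implementation has to handle.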
There is, however, a 2-D transform computational algorithm which produces a further reduction in the number of operations. This algorithm is described in a paper by M. Vetterli, "Fast 2-D discrete cosine transform", IEEE ICASSP-1985, and requires $\frac{N^2}{2}\log_2 N + \frac{N^2}{3} - 2N + \frac{8}{3}$ additions and $\frac{5N^2}{2}\log_2 N + \frac{N^2}{3} - 6N + \frac{62}{3}$ multiplications.
Normally, however, the drastic reduction in the number of operations provided by these algorithms is accompanied by a corresponding complication in handling and re-ordering intermediate product data, which produces serious memory addressing problems when computation circuits are designed for these algorithms. Moreover, the non-uniform distribution in these circuits of computational elements such as adders and multipliers, which have different propagation times, makes the circuits inefficient both in reducing overall computation time and in utilizing processing resources.
Irrespective of whether circuits for these algorithms are designed to use discrete or integrated components, the main problem is still the part of the circuit dedicated to multiplication operations.
This is because of its circuit complexity, long computation time, area occupation and power dissipation.
The best-known implementation of an N×N-bit multiplication converts the operation into a sequence of N elementary N-bit add-and-shift operations. This solution has been used in parallel-type multipliers with various circuit optimizations.
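The add-and-shift decomposition can be modelled in software as below. This is a behavioural sketch for unsigned operands only; it does not represent any particular multiplier circuit, and the function name is illustrative.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits):
    """Multiply two unsigned n_bits-wide integers with n_bits add-and-shift steps."""
    limit = 1 << n_bits
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must fit in n_bits unsigned bits")
    product = 0
    for bit in range(n_bits):
        # One elementary step per multiplier bit: add the shifted
        # multiplicand when the bit is set, otherwise add nothing
        if (multiplier >> bit) & 1:
            product += multiplicand << bit
    return product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11, 4))  # 13 * 11
```

The loop always runs n_bits times, which is why the cost of this scheme grows with operand width regardless of how few coefficients are involved.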
This solution would not appear to be the most efficient for the DCT transform, even if only a limited number of coefficients must be used, given that the number of elementary operations to be performed is still high.
An attempt could be made to simplify the multiplier structure by using conversion tables employing ROM or PROM memories or programmed logic arrays (PLA) which contain the results of multiplications directly addressed by the operands.
In the present case, however, such structures cannot be used because the large number of multiplication coefficients and of operand representation bits would require an excessively large memory capacity.
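A rough estimate shows how quickly such a table grows. The sketch below is illustrative: the operand widths and coefficient count in the usage lines are hypothetical values chosen for the example, not figures from this text, and the full product width is assumed to be stored.

```python
def rom_bits_full_table(operand_bits):
    """Bits needed for a table addressed by both operands of a multiplication.

    Assumption: both operands are operand_bits wide and the full
    2 * operand_bits product is stored at every address.
    """
    entries = 1 << (2 * operand_bits)
    return entries * 2 * operand_bits


def rom_bits_fixed_coefficients(data_bits, coeff_bits, n_coefficients):
    """Bits needed when one table per fixed coefficient is addressed by the data operand."""
    entries_per_table = 1 << data_bits
    return n_coefficients * entries_per_table * (data_bits + coeff_bits)


if __name__ == "__main__":
    # Hypothetical widths, for illustration only
    print(rom_bits_full_table(8))
    print(rom_bits_full_table(12))
    print(rom_bits_fixed_coefficients(12, 12, 7))
```

Each extra operand bit in the fully addressed table multiplies the number of entries by four, which is the growth that makes this approach impractical for wide operands.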