In current audio coding standards such as MPEG-I Layers 1-3, MPEG-II Layers 1-4, MPEG-IV, and AC-3, cosine modulated filter banks (CMFBs) have been widely adopted to transform an audio sequence from the time domain to the transform domain or subband domain for compression. However, the CMFB formulae vary not only among the standards but also with the layer, the block length, and whether the filter bank is in the encoder or the decoder. For real-time applications, these various forms need to be individually designed and tuned for precision, complexity, and memory movements.
FIG. 1 illustrates the structure of CMFBs in an audio encoder and decoder. As shown in FIG. 1, the CMFB process comprises two steps, i.e., the window-and-overlapping addition (WOA) and the modified cosine transform (MCT). The WOA performs a windowing multiplication and addition with overlapping audio blocks.
The complexity of this step is O(k) per audio sample, where k is the overlapping factor of the given form. For example, the factor k is 16 for MPEG-I Layer 2 and 2 for AC-3. The second step, the MCT, has a complexity of O(W) per audio sample, where W is the windowing length, which differs considerably among CMFBs. W ranges from 36 for MPEG-I Layer 3 to 4096 for MPEG-IV. For the WOA, direct implementation has generally been adopted and the design is straightforward. In contrast, the complexity of the MCT is high, and fast approaches have been developed based on concepts similar to those used for the fast Fourier transform.
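The two steps can be sketched in direct (non-fast) form as follows. This is a minimal illustration under stated assumptions, not the formula of any particular standard: the function names, the `folded_len` parameter, and the `offset` phase term are hypothetical, and each standard defines its own window coefficients and phase.

```python
import math

def window_overlap_add(buffer, window, folded_len):
    """Step 1 (WOA): multiply by the window, then add overlapping segments.

    Costs len(window) multiply-adds per block, so the cost per new sample
    grows with the overlapping factor.
    """
    if len(buffer) != len(window) or len(window) % folded_len != 0:
        raise ValueError("window length must be a multiple of folded_len")
    folded = [0.0] * folded_len
    for i, (x, w) in enumerate(zip(buffer, window)):
        folded[i % folded_len] += x * w
    return folded

def cosine_modulate(folded, num_bands, offset):
    """Step 2 (MCT), direct form: one cosine-weighted sum per subband.

    `offset` is an assumed, standard-specific phase term.
    """
    n = len(folded)
    return [
        sum(folded[i] * math.cos(math.pi * (2 * k + 1) * (i + offset) / n)
            for i in range(n))
        for k in range(num_bands)
    ]

# Toy sizes only: a 16-tap window folded to 8 values, then 4 subbands.
samples = [math.sin(0.3 * i) for i in range(16)]
window = [math.sin(math.pi * (i + 0.5) / 16) for i in range(16)]
subbands = cosine_modulate(window_overlap_add(samples, window, 8), 4, 0.5)
```

In this direct form, the second step evaluates a full cosine sum for every subband, which is the cost that the fast approaches aim to reduce.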
It is widely known that developing fast approaches such as the fast Fourier transform and the fast cosine transform requires considering the tradeoff among arithmetic complexity, regularity, modularity, and numerical precision. Hence, this tradeoff is always a critical issue in designing hardware or software for fast MCTs.
Many fast computing mechanisms have been developed for the discrete cosine transform (DCT). These mechanisms have been developed for different transform lengths and different DCT types. In audio coding, radix-2 lengths are the main lengths considered. The development of radix-2 fast DCT mechanisms can be classified into two approaches: indirect computation of the DCT through the fast Fourier transform or the fast Hartley transform, and direct computation of the DCT through matrix factorization or recursive decomposition.
However, both approaches have disadvantages. The first approach incurs additional complexity in mapping DCTs into other transforms, while the second approach in general lacks modularity and data regularity.
As mentioned by Yun et al., "On the fixed-point-error analysis of several fast DCT algorithms," IEEE Trans. Circuits Syst. Video Technol., Vol. 3, February 1993, pp. 27-41, modularity and regularity are essential for designing hardware and for generalizing to higher-order transforms.
Recently, Kok developed a fast algorithm for the type-II DCT that recursively decomposes one type-II DCT of length N into two type-II DCTs of length N/2 (see "Fast algorithm for computing discrete cosine transform," IEEE Trans. on Signal Process., Vol. 45, No. 3, March 1997, pp. 757-760). Decomposing one DCT into two DCTs yields the merit of modularity and regularity.
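The idea of splitting one length-N type-II DCT into two length-N/2 type-II DCTs can be illustrated with the following sketch. It uses a well-known sum/difference recursion on the unscaled DCT-II and is not necessarily the exact formulation of the cited paper; the function names are hypothetical, and the length is assumed to be a power of two.

```python
import math

def dct2_direct(x):
    """Unscaled DCT-II by definition: X[k] = sum_n x[n] cos(pi (n + 1/2) k / N)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct2_recursive(x):
    """Unscaled DCT-II via two half-length DCT-IIs (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return [float(x[0])]
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sums of mirrored samples feed the even-indexed outputs.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    # Scaled differences of mirrored samples feed the odd-indexed outputs.
    diffs = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos(math.pi * (i + 0.5) / n))
             for i in range(half)]
    even = dct2_recursive(sums)    # first half-length DCT-II
    odd = dct2_recursive(diffs)    # second half-length DCT-II
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = even[k]
        # Odd outputs are sums of adjacent terms; the term past the end is zero.
        out[2 * k + 1] = odd[k] + (odd[k + 1] if k + 1 < half else 0.0)
    return out

# Compare against the direct definition on a length-8 input.
data = [1.0, 4.0, -2.0, 0.5, 3.0, -1.0, 2.0, 7.0]
max_error = max(abs(a - b) for a, b in zip(dct2_recursive(data), dct2_direct(data)))
```

Because both halves are the same kind of transform as the original, the same routine (or the same hardware module) can be reused at every level of the recursion, which is the modularity and regularity referred to above.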