Transforms, which take data from one domain (e.g., sampled data) to another (e.g., frequency space), are used in many signal and/or image processing applications, including, but not limited to, data analysis, feature identification and/or extraction, signal correlation, data compression, and data embedding. Many of these transforms require efficient implementation for real-time and/or fast execution, whether or not compression is used as part of the data processing.
Signal and image processing frequently require converting input data into transform coefficients for the purposes of analysis. Often only a quantized version of the coefficients is needed (e.g., for JPEG/MPEG data compression or audio/voice compression). Many such applications, such as the generation of JPEG data for high-speed printers, must be processed in real time.
The discrete cosine transform (DCT) is a widely used transform for image processing. With DCT coding, images are decomposed using a forward DCT (FDCT) and reconstructed using an inverse DCT (IDCT). The 16×16 DCT can be especially effective at decorrelating high-definition image and video signals, and is currently being considered for the High Efficiency Video Coding project being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T Study Group 16 and ISO/IEC JTC 1/SC 29/WG 11.
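The decompose-and-reconstruct relationship between the FDCT and the IDCT can be illustrated with a minimal sketch. It assumes the orthonormal DCT-II definition and a direct matrix evaluation, which is chosen for clarity rather than speed and is not a fast factorization. The function names `dct_matrix`, `fdct_1d`, and `idct_1d` and the sample values are hypothetical and are used only for illustration.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        # The first row uses a different normalization so the matrix is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def fdct_1d(x):
    """Forward DCT: sampled data -> transform coefficients."""
    n = len(x)
    c = dct_matrix(n)
    return [sum(c[k][i] * x[i] for i in range(n)) for k in range(n)]

def idct_1d(coeffs):
    """Inverse DCT: applies the transpose of the orthonormal forward matrix."""
    n = len(coeffs)
    c = dct_matrix(n)
    return [sum(c[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

# Illustrative 16-sample input (hypothetical data).
samples = [float(i % 5) for i in range(16)]
coefficients = fdct_1d(samples)
restored = idct_1d(coefficients)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
```

Under these assumptions, `restored` matches `samples` up to floating-point rounding. A 16×16 two-dimensional DCT can be obtained by applying such a 1-D transform to the rows and then to the columns of a 16×16 block.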
Scaled architectures have previously been shown to be an effective mechanism for reducing the complexity of transform implementations. However, scaled architectures are more easily realized for 4×4 and 8×8 DCTs because of the inherent difficulty of finding scaling terms for the larger sets of simultaneous constants (e.g., the set of constants required to compute a set of concurrent rotations) that are more common in larger transforms (e.g., 16×16, 32×32, etc.).
For example, in "Practical fast 1-D DCT algorithms with 11 multiplications," by C. Loeffler, A. Ligtenberg, and G. S. Moschytz (Proc. IEEE Int. Conf. Acoust., Speech, and Sig. Proc. (ICASSP'89), vol. 2, pp. 988-991, February 1989), a factorization for a 1-D 16×16 DCT is presented that may be implemented with 31 multiplications and 81 additions. In this example, four rotations by four unique angles must be performed in the second stage of the transform, which requires eight unique irrational constants.
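The count of eight constants follows from each plane rotation needing both the cosine and the sine of its angle. The following minimal sketch illustrates this with a direct-form rotation. The four angles are hypothetical values chosen only for illustration and are not taken from the cited factorization.

```python
import math

def rotate(x, y, angle):
    """Direct-form plane rotation using two constants: cos(angle) and sin(angle)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)

# Hypothetical rotation angles (assumed for illustration only).
angles = [k * math.pi / 32 for k in (1, 3, 5, 7)]

# Each rotation contributes one cosine and one sine constant.
constants = []
for a in angles:
    constants.extend([math.cos(a), math.sin(a)])

# Rounding guards against floating-point noise when checking uniqueness.
unique_constants = {round(c, 12) for c in constants}
```

With four distinct angles of this kind, `unique_constants` holds eight distinct values, all of which would have to be approximated together in a scaled implementation.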
These computations are difficult to implement in a scaled architecture because a single common factor must be found for the entire set of constants without overly compromising the precision of the approximations. A common solution to this problem is to forgo the scaled approach entirely for this set of constants, given the difficulty of implementing the 16×16 DCT in a scaled architecture.
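The difficulty of sharing one common factor across a larger set of constants can be illustrated with a minimal sketch. It assumes that each constant is divided by the common factor and then rounded to a fixed number of fractional bits, and that the factor is chosen by a simple grid search. The function names, the search range, the bit width, and the angles are hypothetical; this is an illustration of the problem and not the method of any particular implementation.

```python
import math

def approximation_error(constants, factor, bits):
    """Worst-case relative error when each constant/factor is rounded
    to a dyadic rational with `bits` fractional bits."""
    step = 2 ** bits
    worst = 0.0
    for c in constants:
        scaled = c / factor
        approx = round(scaled * step) / step
        worst = max(worst, abs(approx - scaled) / abs(scaled))
    return worst

def search_common_factor(constants, bits, lo=0.5, hi=2.0, steps=3001):
    """Grid search for the common factor with the smallest worst-case error."""
    best_factor = 1.0
    best_error = approximation_error(constants, 1.0, bits)
    for i in range(steps):
        factor = lo + (hi - lo) * i / (steps - 1)
        err = approximation_error(constants, factor, bits)
        if err < best_error:
            best_factor, best_error = factor, err
    return best_factor, best_error

# Hypothetical constants: cosines and sines of four assumed rotation angles.
angles = [k * math.pi / 32 for k in (1, 3, 5, 7)]
eight_constants = [f(a) for a in angles for f in (math.cos, math.sin)]
two_constants = eight_constants[:2]  # constants of a single rotation

factor_two, error_two = search_common_factor(two_constants, bits=5)
factor_eight, error_eight = search_common_factor(eight_constants, bits=5)
```

Because the worst-case error is a maximum over the set, the best error achievable for all eight constants can never be smaller than the best error for the two constants of a single rotation under the same search. This is one way to see why larger sets of simultaneous constants are harder to scale.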
Therefore, a scaled architecture that provides common factors for some or all of the irrational factors of the 16×16 DCT is desired.