Because network bandwidth is limited, most audio, image, and video media are compressed before being broadcast over television networks or transmitted through the Internet. Compression is also needed for the practical storage of large amounts of information, such as high-quality motion video. Current lossless compression techniques cannot achieve a compression ratio high enough for the effective transmission of multimedia content in television broadcasting or over the Internet. In contrast, lossy compression yields a much higher compression ratio because some information in the source content is discarded during compression; the decompressed content is therefore not identical to the source content but a close approximation of it. With a conservative compression ratio, however, the decompressed content and the source content can be perceptually indistinguishable. Many known implementations of lossy compression are included in existing industry standards such as JPEG, JPEG 2000, MPEG-1, MPEG-2, H.264/MPEG-4, and AVS.
Most lossy compression implementations apply some form of transform coding. Transform coding converts input signal data, such as spatial image pixel values, into transform coefficient values. The process can be viewed as transforming the raw media content data from one domain to another. For example, an audio bit stream expressed as amplitude levels over time can instead be expressed as a frequency spectrum over time. Lossy compression of the data in this time-frequency domain then becomes a selective removal of the least significant data rather than a uniform loss across all of the data. The selection is made such that the audio bit stream reconstructed from the reduced data is perceived without detectable differences from the source audio bit stream.
Another way to remove the less significant data is through a quantization process. A quantizer maps an input data value to a quantized value within a reduced value range, usually reducing the precision of the data. Because the quantized data has fewer possible values, it can be represented using fewer bits than the input data. Decompression then applies the inverse transform to the quantized data to reconstruct an approximation of the original content. In typical commercial digital audio/video playback systems, such as MPEG video software programs and MP3 music players, the compression and decompression processes are executed by an encoder and decoder pair, together known as a codec.
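The quantization step described above can be illustrated with a minimal sketch of a uniform scalar quantizer. The function names, the step size of 8.0, and the sample coefficient values are assumptions chosen for illustration only; practical codecs use more elaborate, standard-specific quantizers.

```python
def quantize(value: float, step: float) -> int:
    """Map a value to the index of the nearest multiple of `step`."""
    return round(value / step)


def dequantize(index: int, step: float) -> float:
    """Recover the approximate value represented by a quantization index."""
    return index * step


# Hypothetical transform coefficients and step size (illustrative values).
coefficients = [103.7, -41.2, 6.4, -2.9, 0.8]
step = 8.0

indices = [quantize(c, step) for c in coefficients]
reconstructed = [dequantize(i, step) for i in indices]

print("indices:      ", indices)
print("reconstructed:", reconstructed)
```

The indices span a much smaller range than the inputs, so they need fewer bits, and small coefficients collapse to zero. In this sketch the reconstruction error of each value is bounded by half the step size.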
The general goal of a transform coding scheme is to convert the input content data into transform coefficients of which as few as possible are significant, so that the less significant coefficients can be discarded while still allowing a close approximate reconstruction of the original content from the reduced data. This concept can be described as packing the input signal energy, or information, into as few transform coefficients as possible. In addition, the transform should be reversible and computationally tractable.
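The energy-packing goal can be made concrete with a small sketch. It assumes an orthonormal one-dimensional DCT-II as the transform and a hypothetical row of eight smoothly varying pixel values; neither the sample values nor the choice to keep two coefficients comes from any standard.

```python
import math


def dct(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct(coeffs):
    """Inverse of dct(): reconstruct samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * coeffs[k] * math.cos(
                math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


# Hypothetical smooth row of pixel values (illustrative only).
samples = [52, 55, 61, 66, 70, 73, 76, 78]
coeffs = dct(samples)

# Keep only the `keep` largest-magnitude coefficients; zero the rest.
keep = 2
largest = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]),
                 reverse=True)[:keep]
truncated = [c if k in largest else 0.0 for k, c in enumerate(coeffs)]
approx = idct(truncated)

total_energy = sum(c * c for c in coeffs)
kept_energy = sum(c * c for c in truncated)
print(f"fraction of energy kept: {kept_energy / total_energy:.4f}")
print("reconstruction:", [round(v, 1) for v in approx])
```

For this smooth input, two of the eight coefficients carry more than 99% of the signal energy, and the reconstruction from those two alone stays within a few grey levels of the original. Less smooth inputs would compact less well.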
Block-based transforms are particularly well suited to the compression of motion video. A block-based transform operates on blocks of N×N image data; a motion video is thus processed frame by frame, and each frame is processed block by block. Block-based transforms include the Karhunen-Loeve Transform (KLT), the Singular Value Decomposition (SVD), and the Discrete Cosine Transform (DCT).
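Block-by-block processing of a frame can be sketched as follows. The sketch assumes a floating-point orthonormal DCT applied separably to each non-overlapping N×N block and a frame whose dimensions are multiples of N; the helper names and the synthetic 8×8 frame are hypothetical.

```python
import math


def dct_matrix(n):
    """Rows are the orthonormal DCT-II basis vectors of order n."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def transform_frame(frame, n):
    """Apply the 2-D transform C * X * C^T to every n x n block of a frame.

    Assumes the frame height and width are multiples of n.
    """
    height, width = len(frame), len(frame[0])
    c = dct_matrix(n)
    c_t = transpose(c)
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, n):
        for left in range(0, width, n):
            block = [row[left:left + n] for row in frame[top:top + n]]
            block_coeffs = matmul(matmul(c, block), c_t)
            for r in range(n):
                out[top + r][left:left + n] = block_coeffs[r]
    return out


# Synthetic 8 x 8 frame with a smooth gradient (illustrative only).
frame = [[3 * r + 2 * col for col in range(8)] for r in range(8)]
coeffs = transform_frame(frame, 4)

# The top-left entry of each block is its DC coefficient.
print("DC of top-left block:    ", round(coeffs[0][0], 3))
print("DC of bottom-right block:", round(coeffs[4][4], 3))
```

Each block is transformed independently of its neighbours, which is what lets an encoder process a frame one block at a time.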
The DCT is featured in the H.264/MPEG-4 standard. The standard has adopted DCTs with block sizes of 4×4 and 8×8 that use scaled integer transform matrices, known as the Integer Cosine Transform (ICT). The ICT is described in Siwei Ma, Xiaopeng Fan, and Wen Gao, “Low Complexity Integer Transform and High Definition Coding”, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, the content of which is incorporated herein by reference in its entirety. With high-definition video becoming more prevalent, a larger block size is desired because it can more efficiently process the large number of pixels in each frame. Although order-16 DCTs have been found, fast algorithms for them are not known. Other transform coding schemes using 16×16 and 32×32 blocks have been proposed to industry standards bodies such as the Audio Video Standard (AVS) and the Video Coding Experts Group (VCEG). The key issue is to determine which transform coding scheme is to be adopted in the standards.
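As an illustration of an integer transform matrix, the sketch below uses the widely published 4×4 forward core transform matrix of H.264. The matrix is supplied here as background knowledge rather than taken from the reference cited above, and the helper names are hypothetical. The sketch shows that the rows are mutually orthogonal but do not have equal norms, which is why a scaling step (commonly folded into quantization) accompanies such a transform.

```python
# Widely published H.264 4 x 4 forward core transform matrix
# (an assumption of this sketch, not quoted from the cited reference).
CORE_4X4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_ict(block):
    """Unscaled 2-D integer transform C * X * C^T (integer arithmetic only)."""
    return matmul(matmul(CORE_4X4, block), transpose(CORE_4X4))


# C * C^T is diagonal: rows are orthogonal, with squared norms 4 and 10.
gram = matmul(CORE_4X4, transpose(CORE_4X4))
print("C * C^T =", gram)

# A flat block puts all of its energy in the single DC coefficient.
flat_block = [[10] * 4 for _ in range(4)]
flat_coeffs = forward_ict(flat_block)
print("coefficients of a flat block:", flat_coeffs)
```

Because every entry is a small integer, the transform needs only additions, subtractions, and shifts, and the encoder and decoder produce bit-identical results without floating-point drift.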
One candidate proposed to the AVS is the SICT Transform Coding disclosed in U.S. patent application Ser. No. 11/950,182, filed Dec. 4, 2007. Another is the LKT Transform Coding, which was proposed to the VCEG for the new High Efficiency Video Coding (HEVC) standard and is documented in Bumshik Lee, Munchurl Kim, Changseob Park, Sangjin Hahm, and Injoon Cho, “A 16×16 Transform Kernel with Quantization for (Ultra) High Definition Video Coding”, VCEG, April 2009. Yet another HEVC candidate is disclosed in R. Joshi, Y. Reznik, and M. Karczewicz, “Simplified Transforms for Extended Block Sizes”, VCEG, July 2009.