The block-based discrete transform is a fundamental component of many image and video compression standards including, for example, the Joint Photographic Experts Group (JPEG), the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 Recommendation (hereinafter the “H.263 Recommendation”), the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) Standard, the ISO/IEC MPEG-2 Standard, the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) Standard/ITU-T H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), as well as others, and is used in a wide range of applications. The transform converts a signal into the transform domain and represents the signal as a linear combination of a set of transform basis functions. The quantization stage then follows. A good transform for video coding should: (1) de-correlate the signal to be quantized, so that scalar quantization over individual values can be effectively used without losing too much coding efficiency in comparison with vector quantization; and (2) compact the energy of the video signal into as few coefficients as possible, which allows the encoder to represent the image by a few coefficients with large magnitudes. A transform that performs well under the preceding two criteria is the Karhunen-Loeve transform (KLT). The discrete cosine transform (DCT) provides a good approximation for KLT for common image signals and is used in almost all modern video coding standards.
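The energy-compaction claim above can be checked numerically. The sketch below (illustrative only, not part of any standard) models an image row as a first-order autoregressive source with high inter-pixel correlation, derives the KLT as the eigenvectors of its covariance, builds an orthonormal DCT-II basis, and compares how much signal energy the two transforms pack into their largest coefficients; the correlation value 0.95 is an assumed, typical figure.

```python
import numpy as np

N = 8
rho = 0.95  # assumed inter-pixel correlation, typical of natural images

# Covariance of a first-order autoregressive (AR(1)) source, a common image model.
cov = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT basis: eigenvectors of the covariance, rows ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
klt = eigvecs[:, ::-1].T

# Orthonormal DCT-II basis; row k is the k-th cosine basis function.
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
dct[0, :] = 1.0 / np.sqrt(N)

# Variance of each transform coefficient, sorted largest first.
klt_var = np.sort(np.diag(klt @ cov @ klt.T))[::-1]
dct_var = np.sort(np.diag(dct @ cov @ dct.T))[::-1]

# Fraction of total energy captured by the two largest coefficients of each transform.
print(np.sum(klt_var[:2]) / N, np.sum(dct_var[:2]) / N)
```

For highly correlated sources the two fractions are nearly identical, which is why the DCT serves as a practical stand-in for the signal-dependent KLT.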
The DCT scheme takes advantage of the local spatial correlation property of the image/frame by dividing it into blocks of pixels (usually 4×4, 8×8, and 16×16), transforming each block from the spatial domain to the frequency domain using the discrete cosine transform, and quantizing the DCT coefficients. Most image and video compression standards use a fixed two-dimensional (2-D) separable DCT block transform. If several block sizes are allowed (typically from 4×4 to 16×16), the standard applies the DCT of the size corresponding to the block.
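A separable 2-D block transform of this kind can be written as two matrix products, Y = C·X·Cᵀ, applying the 1-D DCT first to the rows and then to the columns of the block. The following sketch (illustrative, not a standard-conformant implementation) shows the forward and inverse 2-D DCT for a square block; the inverse is simply the transpose because the basis matrix is orthonormal.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis matrix; row k is the k-th basis function."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

def forward_2d(block):
    """Separable 2-D DCT: transform rows, then columns (Y = C X C^T)."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

def inverse_2d(coeffs):
    """Inverse 2-D DCT (X = C^T Y C), valid because C is orthonormal."""
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C

# Round-trip an 8x8 block of pixel values.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
assert np.allclose(inverse_2d(forward_2d(block)), block)
```

A constant block transforms to a single nonzero (DC) coefficient, illustrating the energy-compaction behavior described above.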
In image and video coding standards such as, for example, the MPEG-4 AVC Standard, the transform to use depends on the block size. For example, a 4×4 integer DCT is used for 4×4 blocks, an 8×8 integer DCT for 8×8 blocks, and a cascaded 4×4 integer DCT for INTRA16×16 blocks. The DCT basis functions are pre-determined and do not adapt to the video content or coding parameters.
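As an illustrative sketch (not a conformant implementation), the 4×4 integer core transform of the MPEG-4 AVC Standard can be computed entirely in integer arithmetic with the well-known forward matrix below; the normalization factors are folded into the quantization step and omitted here.

```python
import numpy as np

# Forward core matrix of the 4x4 integer transform in the MPEG-4 AVC Standard;
# the per-coefficient scaling is folded into quantization and omitted here.
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def fwd4x4(X):
    """Integer-only forward transform: Y = Cf X Cf^T."""
    return Cf @ X @ Cf.T

rng = np.random.default_rng(1)
X = rng.integers(-128, 128, size=(4, 4))
Y = fwd4x4(X)

# Exact recovery (in floating point) shows the transform is invertible:
Cinv = np.linalg.inv(Cf.astype(float))
assert np.allclose(Cinv @ Y @ Cinv.T, X)
```

Because Cf·Cfᵀ is diagonal, the rows are orthogonal but not unit-norm, which is precisely why the residual scaling is absorbed into the quantizer rather than the transform.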
Since the KLT is an optimal linear transform, in a first prior art approach it is utilized to derive the best transform for each of the nine intra prediction modes in the MPEG-4 AVC Standard. The statistics for each mode are extracted and the corresponding KLTs are derived. Residual data for each intra prediction mode is encoded with the corresponding KLT. The nine intra modes partition the data space effectively, such that the DCT is no longer close to the best transform, so a distinct best transform can be derived and successfully applied for each mode. In sum, the first prior art approach uses several transforms, but each of them is fixed to the selected intra prediction mode, regardless of the video content.
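The per-mode derivation described above can be sketched as follows. This is an illustrative reconstruction, not the prior art's actual code: the function name, data layout (a dictionary mapping each intra mode to its 4×4 residual blocks), and use of a sample covariance are all assumptions; the KLT for each mode is obtained as the eigenvectors of that mode's residual covariance.

```python
import numpy as np

def train_mode_klts(residuals_by_mode):
    """For each intra prediction mode, derive a KLT from that mode's residual
    statistics: eigenvectors of the sample covariance of vectorized 4x4
    residual blocks. (Illustrative sketch; names and layout are assumptions.)"""
    klts = {}
    for mode, blocks in residuals_by_mode.items():
        data = np.stack([b.reshape(-1) for b in blocks])  # one row per block
        cov = np.cov(data, rowvar=False)                  # 16x16 sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        klts[mode] = eigvecs[:, ::-1].T                   # rows: basis, strongest first
    return klts

# Synthetic residuals for two of the nine modes, just to exercise the code.
rng = np.random.default_rng(2)
residuals = {m: [rng.normal(size=(4, 4)) for _ in range(200)] for m in (0, 1)}
klts = train_mode_klts(residuals)
assert all(T.shape == (16, 16) for T in klts.values())
```

Once trained, the residual of a block coded in mode m is transformed with `klts[m]`; no per-block signaling is needed because the transform is implied by the mode.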
In a second prior art approach, it is proposed to train one or more transforms offline; the encoder then selects, for each block, a transform from among the trained transforms so as to optimize the compression performance. The selection is signaled for each block. However, image and video content has data with varying statistics and properties. The encoder also operates using different coding parameters, such as different target bit rates. The variations in the original images and residual images cannot always be captured by the DCT.
Turning to FIG. 1, a typical transform selection method at an encoder is indicated generally by the reference numeral 100. The method 100 includes a start block 110 that passes control to a function block 120. The function block 120 initializes a transform set, and passes control to a loop limit block 130. The loop limit block 130 begins a loop (hereinafter “loop (1)”) using a variable j having a range from 1 through the number (#) of pictures in a current video sequence (being processed), and passes control to a loop limit block 140. The loop limit block 140 begins a loop (hereinafter “loop (2)”) using a variable i having a range from 1 through the number (#) of blocks in a current picture being processed, and passes control to a function block 150. The function block 150 selects the best transform for a block (e.g., based on one or more criteria), and passes control to a function block 160. The function block 160 encodes block i in picture j, and passes control to a loop limit block 170. The loop limit block 170 ends the loop (2), and passes control to a loop limit block 180. The loop limit block 180 ends the loop (1), and passes control to an end block 199.
In the prior art, the transform set is trained offline with a large training data set. The training techniques can be based on the common KLT, a sparsity objective function, and so forth. During encoding, the encoder selects the best transform from the training set for each block to improve the compression performance. The selection is signaled in the bitstream, so that a corresponding decoder can parse the bitstream and decode the video signal using the same (but inverse) transform as that used by the encoder.
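The per-block selection of function block 150 can be sketched as below. This is an illustrative model only: the function name, the use of nonzero quantized coefficients as a stand-in for rate cost, and the quantization step value are all assumptions, not part of any cited prior art. The returned index is what would be signaled in the bitstream.

```python
import numpy as np

def select_best_transform(block, transforms, qstep=8.0):
    """Sketch of the per-block selection in method 100: try each transform in
    the (offline-trained) set and keep the one whose quantized coefficients
    appear cheapest to code, approximated here by the count of nonzero levels.
    Returns the index to be signaled in the bitstream. (Illustrative only.)"""
    x = block.reshape(-1)
    costs = []
    for T in transforms:                      # each T: orthonormal, rows = basis
        levels = np.round((T @ x) / qstep)    # transform, then scalar-quantize
        costs.append(np.count_nonzero(levels))
    return int(np.argmin(costs))

# Two candidate transforms for length-4 "blocks": identity and an orthonormal DCT-II.
N = 4
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
dct[0, :] = 1.0 / np.sqrt(N)
transforms = [np.eye(N), dct]

smooth = np.array([100.0, 101.0, 99.0, 100.0])   # smooth content compacts under the DCT
print(select_best_transform(smooth, transforms))  # prints 1 (the DCT wins)
```

For smooth content nearly all the energy falls into the DC coefficient, so the DCT yields the fewest nonzero levels; for impulsive content the identity transform can win instead, which is exactly the adaptivity the per-block signaling exploits.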
Turning to FIG. 2, a typical transform selection method at a decoder is indicated generally by the reference numeral 200. The method 200 includes a start block 210 that passes control to a function block 220. The function block 220 initializes a transform set, and passes control to a loop limit block 230. The loop limit block 230 begins a loop (hereinafter “loop (1)”) using a variable j having a range from 1 through the number (#) of pictures in a current video sequence (being processed), and passes control to a loop limit block 240. The loop limit block 240 begins a loop (hereinafter “loop (2)”) using a variable i having a range from 1 through the number (#) of blocks in a current picture being processed, and passes control to a function block 250. The function block 250 decodes the transform for the (current) block, and passes control to a function block 260. The function block 260 decodes block i in picture j, and passes control to a loop limit block 270. The loop limit block 270 ends the loop (2), and passes control to a loop limit block 280. The loop limit block 280 ends the loop (1), and passes control to an end block 299.
Thus, in method 200, for each block the decoder obtains from the bitstream the transform used by the encoder and then reconstructs the video signal using the signaled transform (inverse transform). However, the set of transforms is derived offline and cannot adapt to the input video sequence and coding parameters.
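The decoder-side reconstruction of method 200 can be sketched as follows. The framing is an assumption made for illustration: the bitstream is modeled as a list of (transform index, quantized levels) pairs, and the inverse transform is the transpose because the transforms in the shared set are taken to be orthonormal.

```python
import numpy as np

def decode_blocks(bitstream, transforms, qstep=8.0):
    """Sketch of method 200's inner loop: for each block, read the signaled
    transform index, dequantize the levels, and reconstruct with the inverse
    transform (the transpose, since each transform is orthonormal).
    `bitstream` is modeled as (index, levels) pairs; this is an assumption."""
    out = []
    for idx, levels in bitstream:
        T = transforms[idx]
        out.append(T.T @ (np.asarray(levels) * qstep))  # dequantize, inverse-transform
    return out

# Encoder/decoder round trip with a two-transform set shared by both sides.
N = 4
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
dct[0, :] = 1.0 / np.sqrt(N)
transforms = [np.eye(N), dct]

x = np.array([100.0, 101.0, 99.0, 100.0])
levels = np.round((dct @ x) / 8.0)                # encoder side, signaling index 1
rec = decode_blocks([(1, levels)], transforms)[0]
assert np.max(np.abs(rec - x)) <= 8.0             # error bounded by the step size
```

Note that both sides must hold the identical transform set; since that set is fixed offline, nothing in this loop adapts to the sequence being decoded, which is the limitation the passage above identifies.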