The block-based discrete transform is a fundamental component of many image and video compression standards, including the Joint Photographic Experts Group (JPEG) standard, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) Standard, the ISO/IEC MPEG-2 Standard, the ISO/IEC MPEG-4 Standard, the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 Recommendation, and the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) Standard/ITU-T H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), and is used in a wide range of applications.
The discrete cosine transform (DCT) is the most extensively used block transform. The DCT scheme takes advantage of the local spatial correlation property of an image/frame by dividing the image/frame into blocks of pixels (usually 4×4 or 8×8), transforming each block from the spatial domain to the frequency domain using the discrete cosine transform, and quantizing the DCT coefficients. Most image and video compression standards use a fixed two-dimensional (2-D) separable DCT block transform. If several block sizes are allowed (typically, from 4×4 to 16×16 blocks), then the DCT with the size corresponding to the block is used. Nonetheless, there is only one possible transform for each block size.
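The block transform and quantization steps described above can be illustrated with a minimal sketch. It assumes an orthonormal floating-point DCT-II and a single uniform quantizer step; the helper names (dct_matrix, dct2, quantize) and the step size are hypothetical and chosen only for illustration. Actual standards such as the MPEG-4 AVC Standard use integer approximations of the DCT and more elaborate quantization.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """Separable 2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def quantize(coeffs, step):
    """Uniform quantization with one step size (an illustrative assumption)."""
    return [[int(round(v / step)) for v in row] for row in coeffs]

# A smooth 4x4 ramp: most of its energy lands in a few low-frequency coefficients.
block = [[10 * r + c for c in range(4)] for r in range(4)]
levels = quantize(dct2(block), step=8.0)
```

Because the transform is separable, the same one-dimensional basis matrix is applied first along one dimension of the block and then along the other, which is why a single matrix per block size suffices.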
However, image and video content has varying statistics and properties. Therefore, there are potential compression gains if several transforms could be used for each block, selecting for each situation the most favorable transform within a range of options. In image and video coding standards such as, for example, the MPEG-4 AVC Standard, there is only one choice of block transform for each block size, and thus no selection of the transform.
There have been some prior proposals for the use of multiple transforms in a single coding scheme. The Karhunen-Loève Transform (KLT) is an optimal linear transform, in the sense of decorrelation and energy compaction for given signal statistics, and is described in a first prior art approach. The KLT is employed in the first prior art approach to derive the best transform for each of the 9 intra prediction modes in the MPEG-4 AVC Standard. The statistics for each mode are extracted and the corresponding KLTs are derived. Each intra prediction residual is encoded with its KLT. The 9 intra modes partition the data space effectively, in such a way that the DCT is no longer close to the best transform, so a distinct best transform can be derived and successfully applied for each mode. To sum up, the first prior art approach uses several transforms, but each of them is fixed to the intra prediction mode selected.
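For reference, the textbook construction of a KLT from per-mode statistics can be written as below. The notation is introduced here for illustration only and is not taken from the first prior art approach: x_n denotes a vectorized residual block observed under intra mode m, N_m the number of such blocks, and U_m the resulting transform. For prediction residuals the mean is often taken to be approximately zero, which is an assumption rather than something stated in the source.

```latex
\begin{aligned}
\mu_m &= \frac{1}{N_m} \sum_{n=1}^{N_m} x_n, \\
C_m   &= \frac{1}{N_m} \sum_{n=1}^{N_m} (x_n - \mu_m)(x_n - \mu_m)^{T}, \\
C_m   &= U_m \Lambda_m U_m^{T}, \\
y     &= U_m^{T} (x - \mu_m).
\end{aligned}
```

Here the columns of U_m are the eigenvectors of the covariance matrix C_m, ordered by decreasing eigenvalue in the diagonal matrix Λ_m, and y is the vector of transform coefficients for a residual x coded under mode m.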
A second prior art approach proposes to modify the DCT by frequency warping, that is, by changing the basis functions with different all-pass filters to attain a variety of warped frequency responses. The resulting transforms are called warped DCTs (WDCT). An exhaustive rate-distortion (R-D) search is performed for each block and the selected transform is indicated with side information. The idea is applied to image compression. A third prior art approach describes using the WDCT and embedding the transform selection within the transformed coefficients themselves. The third prior art approach shows good performance for low-bit-rate image compression. Also, the third prior art approach adds a post-filtering step that minimizes the mean square error (MSE). The filter is determined at the encoder and multiplexed into the bit-stream.
A fourth prior art approach proposes an algebraic optimization of a set of transforms for a large database. The set is partitioned iteratively until the set reaches a stable point in which each transform is sparse-optimal for its particular subset of data. The encoder indicates through a quad-tree which transform is used in each block. Thus, the transform choice is not obtained independently for each block.
A fifth prior art approach proposes an integer sine transform (IST) for inter frame residues. An inter frame residue has low correlation, and the DCT is adequate only for highly correlated data. Therefore, the fifth prior art approach proposes a sine transform, which is efficient for data with correlation from −0.5 to 0.5. The KLT coincides with the sine transform in part of this range. The IST is derived from the sine transform in exactly the same way as the integer cosine transform in the MPEG-4 AVC Standard. The fifth prior art approach implements 4×4 and 8×8 versions of the IST. The same transform is applied to the whole macroblock and is signaled with one flag, unless the macroblock is divided into 4 sub-macroblocks, in which case 4 flags are sent, each indicating the transform employed in one sub-macroblock.
A sixth prior art approach proposes a scheme similar to that of the fifth prior art approach. The sixth prior art approach proposes an adaptive prediction error coding (APEC) scheme that codes the prediction error in either the spatial domain or the frequency domain. For each block of the prediction error, either transform coding or spatial domain coding is applied. The algorithm with the lower rate-distortion (R-D) cost is chosen.
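A rate-distortion decision of this kind is commonly expressed as minimizing a Lagrangian cost J = D + λR over the candidate coding options. The sketch below shows only that general pattern; the function names, the candidate labels, and the distortion and rate figures are hypothetical and are not taken from any of the prior art approaches.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def select_mode(candidates, lam):
    """Return the candidate name with the lowest R-D cost.

    candidates maps a name to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))

# Hypothetical measurements for one block of prediction error.
candidates = {
    "transform": (120.0, 40),  # higher distortion, fewer bits
    "spatial": (90.0, 70),     # lower distortion, more bits
}

low_lambda_choice = select_mode(candidates, lam=0.5)   # "spatial"
high_lambda_choice = select_mode(candidates, lam=2.0)  # "transform"
```

With these illustrative numbers, a small λ favors the lower-distortion option and a large λ favors the cheaper one. Any bits spent signaling the choice as side information would be included in the rate term.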
These approaches offer only a limited range of choices for the best transform and do not fully exploit the generality of the concept.
We have previously disclosed and described a more general and broader approach that includes alternatives not considered in the aforementioned prior art. These concepts are disclosed with respect to a seventh prior art approach and an eighth prior art approach. The seventh and eighth prior art approaches describe the use of a set of transforms (two or more transforms) to encode an image or video, choosing the best transform of the set for each region, slice, block or macroblock. The set of transforms may be optimized or designed for a range of statistics or image/video patterns. In practice, one of the transforms is the DCT. The question then arises of which alternative transforms should be in the set so as to work well alongside the DCT. Different methods to obtain the alternative transforms are outlined in the seventh and eighth prior art approaches and include, for example, training a set and obtaining the corresponding KLT by sparsity-based methods, and so forth. However, these methods either try to optimize an objective metric, like the peak signal-to-noise ratio (PSNR) or BD-rate (Bjøntegaard bit rate savings), or use basic alternatives like the Discrete Sine Transform (DST), but they do not consider the subjective quality of the encoded sequences.
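To make the distinction between objective and subjective quality concrete, the following sketch computes PSNR in its usual form for 8-bit samples. The function names and the default peak value of 255 are assumptions made for illustration. Because the measure averages squared errors over all samples, it says nothing about how the errors are arranged spatially.

```python
import math

def mse(orig, recon):
    """Mean squared error between two equal-length sample sequences."""
    diffs = [(a - b) ** 2 for a, b in zip(orig, recon)]
    return sum(diffs) / len(diffs)

def psnr(orig, recon, peak=255.0):
    """PSNR in dB; peak=255 assumes 8-bit samples (an assumption here)."""
    err = mse(orig, recon)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / err)
```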
We have observed that sequences encoded with a transform selection method, even when the PSNR improves, might suffer from a new artifact that we call a “windowed pattern”. The pattern is mainly found at low bit-rates and can be visually annoying.