The block-based discrete transform is a fundamental component of many image and video compression standards and recommendations including the Joint Photographic Experts Group (JPEG) Standard, the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 Recommendation (hereinafter the “H.263 Recommendation”), the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) Standard, the MPEG-2 Standard, the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) Standard/ITU-T H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), and others, and it is used in a wide range of applications.
The discrete cosine transform (DCT) is the most extensively used block transform. The DCT scheme takes advantage of the local spatial correlation property of the image/frame by dividing the image/frame into blocks of pixels (usually 4×4 or 8×8), transforming each block from the spatial domain to the frequency domain using the discrete cosine transform, and quantizing the DCT coefficients. Most image and video compression standards use a fixed two-dimensional (2-D) separable DCT block transform. If several block sizes are allowed (typically, from 4×4 to 16×16 blocks), then these standards use a DCT whose size corresponds to the block size. Nonetheless, there is only one possible transform for each block size.
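By way of a hedged illustration of the block transform pipeline just described, the following minimal Python sketch applies a separable 2-D DCT to one 4×4 block and quantizes the coefficients with a uniform step. It uses a floating-point orthonormal DCT-II; the standards named above specify their own transforms and quantizers (in some cases integer approximations), so the function names, the sample block, and the step size of 8 are assumptions made only for this example.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """Separable 2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse of dct2 (the matrix is orthonormal): C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def quantize(coeffs, step):
    """Uniform scalar quantization to integer levels."""
    return [[round(v / step) for v in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


# Illustrative 4x4 block of pixel values (assumed data, not from any standard).
block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
step = 8  # hypothetical quantization step

levels = quantize(dct2(block), step)
recon = idct2(dequantize(levels, step))
```

Because the transform is separable, the same 1-D matrix is applied along the columns and then along the rows, which is why a single matrix per block size suffices in this sketch.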
However, image and video content has varying statistics and properties. Thus, the availability of, and hence forced use of, a single transform per block size forgoes any compression gains that a different transform could provide for a given block.
In the image and video coding standards such as, for example, the MPEG-4 AVC Standard, there is only one choice for the block transform to use for each block size. There is no selection of the transform.
Turning to FIG. 1, a video encoder capable of performing video encoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 100. The video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185. An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125. An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150. An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190. An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, a first input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
An output of the SEI inserter 130 is connected in signal communication with a second non-inverting input of the combiner 190.
A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110. A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120.
An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119. An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165. An output of the deblocking filter 165 is connected in signal communication with a first input of the reference picture buffer 180. An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175 and a third input of the motion compensator 170. A first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170. A second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145.
An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197. An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197. An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197. The third input of the switch 197 is a control input; it determines whether the “data” input of the switch is provided by the motion compensator 170 or by the intra prediction module 160. The output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and an inverting input of the combiner 185.
A first input of the frame ordering buffer 110 and an input of the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture. Moreover, a second input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata. An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
Turning to FIG. 2, a video decoder capable of performing video decoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 200. The video decoder 200 includes an input buffer 210 having an output connected in signal communication with a first input of an entropy decoder 245. A first output of the entropy decoder 245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 250. An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a second non-inverting input of a combiner 225. An output of the combiner 225 is connected in signal communication with a second input of a deblocking filter 265 and a first input of an intra prediction module 260. A second output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280. An output of the reference picture buffer 280 is connected in signal communication with a second input of a motion compensator 270.
A second output of the entropy decoder 245 is connected in signal communication with a third input of the motion compensator 270 and a first input of the deblocking filter 265. A third output of the entropy decoder 245 is connected in signal communication with an input of a decoder controller 205. A first output of the decoder controller 205 is connected in signal communication with a second input of the entropy decoder 245. A second output of the decoder controller 205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 250. A third output of the decoder controller 205 is connected in signal communication with a third input of the deblocking filter 265. A fourth output of the decoder controller 205 is connected in signal communication with a second input of the intra prediction module 260, a first input of the motion compensator 270, and a second input of the reference picture buffer 280.
An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297. An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297. An output of the switch 297 is connected in signal communication with a first non-inverting input of the combiner 225.
An input of the input buffer 210 is available as an input of the decoder 200, for receiving an input bitstream. A first output of the deblocking filter 265 is available as an output of the decoder 200, for outputting an output picture.
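The signal path through the combiners and the quantization stages of FIG. 1 and FIG. 2 can be loosely sketched as follows. This is a simplified illustration only: the transform stage is omitted (treated as an identity), blocks are flat lists of integer samples, and the function names, sample values, and step size are hypothetical. The reference numerals in the comments indicate which element of the figures each line roughly corresponds to.

```python
def encode_block(source, prediction, step):
    """Encoder-side path for one block (transform stage omitted in this sketch)."""
    # Combiner 185: the prediction enters the inverting input.
    residual = [s - p for s, p in zip(source, prediction)]
    # Transformer and quantizer 125 (quantization only here).
    levels = [round(r / step) for r in residual]
    # Inverse transformer and inverse quantizer 150.
    recon_residual = [level * step for level in levels]
    # Combiner 119: add the same prediction back to form the reference data.
    reconstructed = [r + p for r, p in zip(recon_residual, prediction)]
    return levels, reconstructed


def decode_block(levels, prediction, step):
    """Decoder-side path: inverse quantizer 250 followed by combiner 225."""
    return [level * step + p for level, p in zip(levels, prediction)]


# Hypothetical samples; the prediction would come from switch 197 / switch 297.
source = [104, 110, 97, 120]
prediction = [100, 100, 100, 100]
step = 6

levels, encoder_recon = encode_block(source, prediction, step)
decoder_recon = decode_block(levels, prediction, step)
```

In this sketch the encoder reconstructs the block from the quantized levels exactly as the decoder does, so both sides hold the same reference data for later prediction.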
There have been some prior proposals for the use of multiple transforms in a single coding scheme. In a first prior art approach, an optimal linear transform is disclosed, which is referred to as the Karhunen-Loève Transform (KLT). The KLT is employed to derive the best transform for each of the 9 intra prediction modes in the MPEG-4 AVC Standard. The statistics for each mode are extracted and the corresponding KLTs are derived. Each intra prediction residual is encoded with the KLT of its mode. The 9 intra modes partition the data space effectively, in such a way that the DCT is no longer close to the best transform, so a distinctive best transform can be derived and successfully applied. In sum, the proposal uses several transforms, but each of them is fixed to the intra prediction mode selected.
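As a hedged sketch of how one KLT per intra prediction mode might be derived from residual statistics, the following Python example estimates a covariance matrix per mode and takes its eigenvectors as the transform basis. It assumes the third-party NumPy library is available, and it uses synthetic random data in place of real intra prediction residuals; the function names, the 16-sample (flattened 4×4) block length, and the training set size are illustrative assumptions and are not taken from the first prior art approach.

```python
import numpy as np


def derive_klt(residuals):
    """Return a KLT basis (rows are basis vectors) for flattened residual blocks."""
    x = np.asarray(residuals, dtype=float)
    x = x - x.mean(axis=0)                  # remove the mean of each sample position
    cov = x.T @ x / len(x)                  # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvecs[:, order].T


def train_klts(residuals_by_mode):
    """Derive one fixed KLT per prediction mode from that mode's residuals."""
    return {mode: derive_klt(blocks) for mode, blocks in residuals_by_mode.items()}


# Synthetic stand-in for training residuals: 9 modes, 500 flattened 4x4 blocks each.
rng = np.random.default_rng(0)
residuals_by_mode = {}
for mode in range(9):
    mixing = rng.normal(size=(16, 16))      # gives each mode its own statistics
    residuals_by_mode[mode] = rng.normal(size=(500, 16)) @ mixing

klts = train_klts(residuals_by_mode)

# Encoding a residual with the transform tied to its mode (no selection is signaled).
mode = 3
coefficients = klts[mode] @ residuals_by_mode[mode][0]
```

Since the transform is looked up from the prediction mode, the decoder can infer it without side information, which matches the observation above that each transform is fixed to the selected mode.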
A second prior art approach proposes modifying the DCT with several frequency warpings, that is, altering the basis functions with different all-pass filters to attain a variety of warped frequency responses. The resulting transforms are called warped DCT (WDCT). An exhaustive rate distortion (R-D) search is performed for each block and the selected transform is indicated with side information. The idea is applied to image compression.
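The per-block exhaustive R-D search with side information described above can be sketched as follows. This is a minimal illustration under stated assumptions: an ordinary DCT and an identity transform stand in for the warped DCT candidates (which are not reproduced here), blocks are 1-D vectors of 4 samples, the rate is a hypothetical proxy based on level magnitudes, and the quantization step and the Lagrange multiplier lam are arbitrary example values.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def identity_matrix(n):
    return [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]


def apply(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def rate_bits(levels):
    """Hypothetical rate proxy: longer codes for larger magnitudes."""
    return sum(1 + 2 * abs(level).bit_length() for level in levels)


def rd_select(block, candidates, step, lam):
    """Try every candidate transform and keep the lowest cost J = D + lam * R."""
    side_bits = math.ceil(math.log2(len(candidates)))  # signals the chosen index
    best = None
    for index, (name, matrix) in enumerate(candidates):
        coeffs = apply(matrix, block)
        levels = [round(c / step) for c in coeffs]
        # The candidates are orthonormal, so the squared error measured on the
        # coefficients equals the squared error in the sample domain.
        distortion = sum((c - level * step) ** 2 for c, level in zip(coeffs, levels))
        cost = distortion + lam * (rate_bits(levels) + side_bits)
        if best is None or cost < best[0]:
            best = (cost, index, name, levels)
    return best


candidates = [("dct", dct_matrix(4)), ("identity", identity_matrix(4))]
smooth = [100, 102, 104, 106]  # slowly varying samples
spiky = [0, 0, 40, 0]          # a single isolated sample

smooth_choice = rd_select(smooth, candidates, step=4, lam=10.0)
spiky_choice = rd_select(spiky, candidates, step=4, lam=10.0)
```

The side information cost is included in the rate term, so a candidate is only selected when its coding benefit outweighs the bits needed to signal it.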
A third prior art approach describes using the WDCT and embedding the transform selection within the transformed coefficients themselves. The method shows good performance for low bit-rate image compression. Also, the method adds a post-filtering step that minimizes the mean square error (MSE). The filter is determined at the encoder and multiplexed into the bit-stream.
A fourth prior art approach proposes an algebraic optimization of a set of transforms for a large database. The set is partitioned iteratively until it reaches a stable point in which each transform is sparse-optimal for its particular subset of data. The coder indicates through a quad-tree which transform is used in each block. Thus, the transform choice is not made independently for each block.
A fifth prior art approach proposes an integer sine transform (IST) for inter frame mode. An inter frame residue has a low correlation, and the DCT is adequate only for highly correlated data. Therefore, the approach proposes a sine transform, which is efficient for data with a correlation from −0.5 to 0.5. The KLT coincides with the sine transform in part of this range. The IST is derived from the sine transform in exactly the same way as the integer cosine transform in the MPEG-4 AVC Standard is derived from the cosine transform. The fifth prior art approach has implemented the 4×4 and 8×8 IST versions. A single transform is applied to the whole macroblock and signaled with one flag, unless the macroblock is divided into 4 sub-macroblocks, in which case 4 flags are sent, each specifying the transform employed in the corresponding sub-macroblock.
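The dependence of transform efficiency on correlation can be illustrated with a small numerical sketch. The following Python example assumes a unit-variance first-order autoregressive source with correlation coefficient rho, uses the type-I discrete sine transform as the sine transform (the type is an assumption, since it is not specified above), and compares it with the DCT-II by the transform coding gain, that is, the ratio of the arithmetic mean to the geometric mean of the coefficient variances. The function names and the chosen values of rho are illustrative only.

```python
import math


def dct2_basis(n):
    """Orthonormal DCT-II basis vectors (one per row)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def dst1_basis(n):
    """Orthonormal DST-I basis vectors (one per row)."""
    return [[math.sqrt(2.0 / (n + 1))
             * math.sin(math.pi * (k + 1) * (i + 1) / (n + 1))
             for i in range(n)] for k in range(n)]


def ar1_covariance(n, rho):
    """Covariance of a unit-variance first-order autoregressive source."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def coding_gain(basis, cov):
    """Arithmetic mean over geometric mean of the transform coefficient variances."""
    n = len(basis)
    variances = [sum(b[i] * cov[i][j] * b[j] for i in range(n) for j in range(n))
                 for b in basis]
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric


for rho in (0.2, 0.95):
    cov = ar1_covariance(4, rho)
    print(rho, coding_gain(dct2_basis(4), cov), coding_gain(dst1_basis(4), cov))
```

Under these assumptions, the sine basis yields the higher coding gain at rho = 0.2 and the cosine basis yields the higher coding gain at rho = 0.95, which is consistent with the motivation given for the sine transform on weakly correlated inter frame residue.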
A sixth prior art approach proposes a scheme similar to that proposed in the fifth prior art approach. The sixth prior art approach proposes an adaptive prediction error coding (APEC) scheme that enables adaptive prediction error coding in the spatial and frequency domains. For each block of the prediction error, either transform coding or spatial domain coding is applied. The option with the lower rate-distortion cost is chosen.
The preceding approaches propose a limited range of choice of the best transform and do not fully exploit the available possibilities.