The block-based discrete transform is a fundamental component of many image and video compression standards, including, for example, the Joint Photographic Experts Group (JPEG) Standard, the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 Recommendation (hereinafter the “H.263 Recommendation”), the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) Standard, the ISO/IEC MPEG-2 Standard, and the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) Standard/ITU-T H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), as well as others, and is used in a wide range of applications.
The discrete cosine transform (DCT) is the most extensively used block transform. The DCT scheme takes advantage of the local spatial correlation property of a picture by dividing it into blocks of pixels (usually 4×4 or 8×8), transforming each block from the spatial domain to the frequency domain using the DCT, and quantizing the transform coefficients. Most image and video compression standards use a fixed two-dimensional (2-D) separable DCT block transform. If several block sizes are allowed (typically, from 4×4 to 16×16 blocks), then a DCT with a size corresponding to the size of the block is used. However, there is only one transform for each block size, and all the pixels in the block are processed with that transform.
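The transform-and-quantize step described above can be sketched as follows. This is a minimal illustration only, assuming an orthonormal floating-point DCT-II and a single uniform quantization step; the standards named above use their own (often integer) transform approximations and quantization rules, and the function names `dct_matrix`, `dct2`, `idct2`, and `quantize` are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """Separable 2-D DCT: 1-D transform on columns, then on rows (C * X * C^T)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse of dct2 for the orthonormal basis (C^T * Y * C)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def quantize(coeffs, step):
    """Uniform quantization with an assumed single step size for all coefficients."""
    return [[round(v / step) for v in row] for row in coeffs]

# A flat 4x4 block concentrates all of its energy in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
levels = quantize(dct2(flat), step=8.0)
```

Because the transform is separable, the same one-dimensional basis is applied along both dimensions, which is why a single `dct_matrix(n)` suffices for an n×n block.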
In image and video coding standards such as, for example, the MPEG-4 AVC Standard, there is one choice for the block transform to use for each block size. If the residue (i.e., the prediction error) is coded, then such coding is performed via the transform coefficients. All the pixels are transformed. Turning to FIG. 1, some transform sizes in the MPEG-4 AVC Standard are indicated generally by the reference numeral 100. With respect to the depicted transform sizes 100, an 8×8 block 110 to be coded is divided into four 4×4 blocks 121 through 124, each of which is transformed with a 4×4 transform. In some cases, sending the transform coefficients may not be necessary for some of the 4×4 blocks. For example, with respect to the depicted transform sizes 100, the residue (as represented by the corresponding coefficients) is not sent for the three 4×4 blocks 121, 122, and 123 (depicted without any hatch patterns), while the residue is sent for the remaining 4×4 block 124 (depicted using a diagonal hatch pattern). The main disadvantage of this scheme is that the spatial support of the transforms is fixed, so the flexibility to encode the residue is significantly reduced.
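The partitioning of FIG. 1 can be illustrated with a short sketch that splits an 8×8 residue block into four 4×4 sub-blocks and flags which of them carry residue worth sending. The energy threshold used here is a hypothetical stand-in for an encoder's actual decision (which would normally depend on the quantized coefficients), the raster ordering of the sub-blocks is an assumption, and the names `split_into_subblocks` and `coded_flags` are illustrative only.

```python
def split_into_subblocks(block, size=4):
    """Split a square block into size x size sub-blocks in raster order."""
    n = len(block)
    subs = []
    for top in range(0, n, size):
        for left in range(0, n, size):
            subs.append([row[left:left + size] for row in block[top:top + size]])
    return subs

def coded_flags(block, size=4, threshold=0):
    """Return one flag per sub-block: True if its residue would be sent.

    Assumption: a sub-block is skipped when its residue energy does not
    exceed the threshold; a real encoder would apply its own criterion.
    """
    flags = []
    for sub in split_into_subblocks(block, size):
        energy = sum(v * v for row in sub for v in row)
        flags.append(energy > threshold)
    return flags

# Residue that is non-zero only in the bottom-right 4x4 quadrant.
residue = [[0] * 8 for _ in range(8)]
for r in range(4, 8):
    for c in range(4, 8):
        residue[r][c] = 3
flags = coded_flags(residue)
```

Whatever the decision rule, the positions and sizes of the sub-blocks are fixed in advance, which is the limitation noted above.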
One prior art approach introduces more flexibility in the residue coding step by proposing a spatially varying transform. Turning to FIG. 2, a spatially varying transform is indicated generally by the reference numeral 200. In such a case, the residue may be coded in accordance with the MPEG-4 AVC Standard, but the spatially varying transform is also allowed. The spatially varying transform is applied only to a sub-block 210 (depicted using a diagonal hatch pattern), leaving the rest of the residue uncoded. Therefore, only the sub-block 210 of M×M pixels from an N×N block 220 is transformed. The encoder has to signal the position of the M×M sub-block 210 (that is, the locations x and y). However, this approach still lacks flexibility: there is only one transform within the block 220, part of the residue data is not coded, and there is no pre-filtering to improve visual quality.
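The position signaling described above implies an encoder-side search over candidate locations (x, y) of the M×M sub-block. The following is a minimal sketch of such a search, assuming for illustration that the encoder picks the position capturing the most residue energy; the selection criterion actually used by the prior art approach is not stated here, and the function name `best_subblock_position` is hypothetical.

```python
def best_subblock_position(residue, m):
    """Search every M x M window inside an N x N residue block.

    Returns (x, y, energy) for the window with the largest sum of squared
    residue samples. Maximum energy is an assumed criterion chosen for
    illustration, not necessarily the one used by any particular encoder.
    """
    n = len(residue)
    best = None
    for y in range(n - m + 1):
        for x in range(n - m + 1):
            energy = sum(residue[y + j][x + i] ** 2
                         for j in range(m) for i in range(m))
            if best is None or energy > best[0]:
                best = (energy, x, y)
    return best[1], best[2], best[0]

# 8x8 residue with a 4x4 patch of non-zero samples at x=3, y=2.
residue = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(3, 7):
        residue[r][c] = 5
x, y, energy = best_subblock_position(residue, m=4)
```

Only the samples inside the selected window would be transformed; everything outside it is left uncoded, which is one of the limitations noted above.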
A second prior art approach proposes the so-called Adaptive Prediction Error Coding (APEC) technique. An inter-frame residue has low correlation, and the DCT is adequate only for highly correlated data. Therefore, the second prior art approach proposes to enable adaptive prediction error coding in the spatial and frequency domains. For each block of the prediction error, either transform coding or spatial domain coding is applied. The algorithm with the lower rate-distortion cost is chosen for the block. In sum, the second prior art approach proposes a selection of whether or not to use a transform to code the residue of a block, but ultimately only one of the following two options is applied for each block: all of the pixels are transformed prior to entropy coding; or all of the pixels are entropy coded directly in the spatial domain.
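The per-block choice between the two options can be sketched as a rate-distortion comparison. This sketch assumes the common Lagrangian form J = D + λR for the rate-distortion cost, which is not spelled out above; the distortion and rate figures are made-up inputs, and the names `rd_cost` and `choose_mode` are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits

def choose_mode(candidates, lam):
    """Pick the coding mode with the lowest rate-distortion cost.

    candidates maps a mode name to a (distortion, rate_bits) pair that the
    encoder would have measured by trial-coding the block in that mode.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))

# Illustrative numbers only: one (distortion, rate) pair per option.
block_candidates = {
    "transform": (100.0, 40),  # all pixels transformed, then entropy coded
    "spatial": (90.0, 60),     # all pixels entropy coded in the spatial domain
}
mode = choose_mode(block_candidates, lam=1.0)
```

Note that the decision is made once for the whole block, so every pixel in the block shares the same mode.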