In video encoding, a frame of a video sequence may be partitioned into rectangular regions or blocks. A video block may be encoded in Intra-mode (I-mode) or Inter-mode (P-mode).
FIG. 1 shows a diagram of a prior art video encoder for the I-mode. In FIG. 1, a spatial predictor 102 forms a predicted block 103 for video block 100 by using pixels from neighboring blocks in the same frame. The neighboring blocks used for prediction may be specified by a spatial mode 101. A summer 104 computes the prediction error 106, i.e., the difference between the video block 100 and the predicted block 103. Transform module 108 projects the prediction error 106 onto a set of basis or transform functions. In typical implementations, the transform functions can be derived from the discrete cosine transform (DCT), the Karhunen-Loeve Transform (KLT), or another transform. A set of transform functions can be expressed as {f0, f1, f2, . . . , fN}, where each fn denotes an individual transform function.
The transform module 108 outputs a set of transform coefficients 110 corresponding to the weights assigned to each of the transform functions. For example, a set of coefficients {c0, c1, c2, . . . , cN} may be computed, corresponding to the set of transform functions {f0, f1, f2, . . . , fN}. The transform coefficients 110 are subsequently quantized by quantizer 112 to produce quantized transform coefficients 114. The quantized coefficients 114 and spatial mode 101 may be transmitted to the decoder.
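The projection and quantization steps described above can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II basis applied to a one-dimensional, 4-sample prediction error and a uniform quantizer with a single step size; the function names `dct_basis`, `transform`, and `quantize` are hypothetical and are not taken from the encoder of FIG. 1.

```python
import math

def dct_basis(n_points):
    """Return orthonormal DCT-II basis functions f_0..f_{N-1} (assumed basis)."""
    basis = []
    for k in range(n_points):
        scale = math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)
        basis.append([
            scale * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for n in range(n_points)
        ])
    return basis

def transform(error, basis):
    """Project the prediction error onto each basis function: c_k = <error, f_k>."""
    return [sum(e * f for e, f in zip(error, f_k)) for f_k in basis]

def quantize(coeffs, step):
    """Uniform quantizer (an assumption): divide by the step size and round."""
    return [round(c / step) for c in coeffs]

basis = dct_basis(4)
prediction_error = [4, 4, 4, 4]          # a flat error has only a DC component
coeffs = transform(prediction_error, basis)
quantized = quantize(coeffs, step=2)     # [4, 0, 0, 0]
```

A flat prediction error projects entirely onto f0, so only the first quantized coefficient is nonzero. A two-dimensional block transform would apply the same projection along rows and columns.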
FIG. 1A depicts a video decoder for the I-mode. In FIG. 1A, quantized coefficients 1000 are provided by the encoder to the decoder, and supplied to the inverse transform module 1004. The inverse transform module 1004 reconstructs the prediction error 1003 based on the coefficients 1000 and the fixed set of transform functions, e.g., {f0, f1, f2, . . . , fN}. The spatial mode 1002 is supplied to the inverse spatial prediction module 1006, which generates a predicted block 1007 based on pixel values of already decoded neighboring blocks. The predicted block 1007 is combined with the prediction error 1003 to generate the reconstructed block 1010. The difference between the reconstructed block 1010 and the original block 100 in FIG. 1 is known as the reconstruction error.
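The decoder-side reconstruction can be sketched as follows. This is an illustrative example rather than the decoder of FIG. 1A: it assumes the quantized coefficients are rescaled by a uniform step size before the inverse transform, it works on one-dimensional sample lists, and the names `inverse_transform`, `reconstruct`, and `reconstruction_error` are hypothetical.

```python
def inverse_transform(coeffs, basis):
    """Rebuild the prediction error as the weighted sum of basis functions."""
    n_samples = len(basis[0])
    return [
        sum(c * f_k[i] for c, f_k in zip(coeffs, basis))
        for i in range(n_samples)
    ]

def reconstruct(quantized, step, basis, predicted):
    """Combine the decoded prediction error with the predicted block."""
    coeffs = [q * step for q in quantized]   # assumed uniform rescaling
    error = inverse_transform(coeffs, basis)
    return [p + e for p, e in zip(predicted, error)]

def reconstruction_error(original, reconstructed):
    """Difference between the original block and the reconstructed block."""
    return [o - r for o, r in zip(original, reconstructed)]

# Trivial identity basis, chosen only to keep the arithmetic easy to follow.
identity_basis = [[1, 0], [0, 1]]
rebuilt = reconstruct([2, -1], step=2, basis=identity_basis, predicted=[10, 10])
residual = reconstruction_error([15, 8], rebuilt)
```

With these inputs the rebuilt samples are `[14, 8]` and the reconstruction error is `[1, 0]`; the nonzero entry is the information lost to quantization.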
An example of a spatial predictor 102 in FIG. 1 is herein described with reference to document VCEG-N54, published by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector in September 2001. In the embodiment, a coder offers 9 spatial modes of prediction for 4×4 blocks, labeled 0 through 8, consisting of DC prediction (Mode 2) and 8 directional modes, as shown in FIG. 2. Each spatial mode specifies a set of already encoded pixels to use to predict the pixels of the current block, as illustrated in FIG. 3. In FIG. 3, the pixels from a to p are to be encoded, and already encoded pixels A to L are used for predicting the pixels a to p. If, for example, Mode 0 is selected, then pixels a, e, i and m are predicted by setting them equal to pixel A, and pixels b, f, j and n are predicted by setting them equal to pixel B, etc. Similarly, if Mode 1 is selected, pixels a, b, c and d are predicted by setting them equal to pixel I, and pixels e, f, g and h are predicted by setting them equal to pixel J, etc. Thus, Mode 0 is a predictor in the vertical direction, and Mode 1 is a predictor in the horizontal direction. The encoder is further described in the aforementioned document, and in document JVT-B118r4, published by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG in February 2002.
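The vertical and horizontal modes described above can be sketched in a few lines. The sketch assumes that pixels a to p are stored in raster order as a 4×4 list of rows, that A to D are the pixels directly above the block, and that I to L are the pixels directly to its left; this layout is inferred from the Mode 0 and Mode 1 examples in the text. The function names are hypothetical, and the remaining modes are omitted.

```python
def predict_4x4(mode, above, left):
    """Return a 4x4 predicted block for Mode 0 (vertical) or Mode 1 (horizontal).

    above: pixels A, B, C, D (assumed to lie directly above the block)
    left:  pixels I, J, K, L (assumed to lie directly left of the block)
    """
    if mode == 0:
        # Each column copies the pixel above it: a, e, i, m = A; b, f, j, n = B; ...
        return [list(above) for _ in range(4)]
    if mode == 1:
        # Each row copies the pixel to its left: a, b, c, d = I; e, f, g, h = J; ...
        return [[left[row]] * 4 for row in range(4)]
    raise NotImplementedError("only Modes 0 and 1 are sketched here")

def prediction_error(block, predicted):
    """Pixel-wise difference between the block and its prediction."""
    return [
        [b - p for b, p in zip(block_row, pred_row)]
        for block_row, pred_row in zip(block, predicted)
    ]

above = [10, 20, 30, 40]   # A, B, C, D
left = [1, 2, 3, 4]        # I, J, K, L
vertical = predict_4x4(0, above, left)
horizontal = predict_4x4(1, above, left)
```

Here every row of `vertical` equals `[10, 20, 30, 40]`, and row r of `horizontal` repeats `left[r]` four times. Passing either result to `prediction_error` together with the actual block yields the residual that would be handed to the transform stage.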
It has been noted that when performing the mode-based spatial prediction described above, the reconstruction error may exhibit regular spatial patterns. For example, the reconstruction error may have strong correlation in the direction corresponding to the mode used for prediction. It would be desirable to reduce the reconstruction error by reducing direction-dependent spatial patterns in the reconstruction error.