In video encoding, a frame of a video sequence may be partitioned into rectangular regions or blocks. A video block may be encoded in Intra-mode (I-mode) or Inter-mode (P-mode).
FIG. 1 shows a diagram of a prior art video encoder for the I-mode. In FIG. 1, a spatial predictor 102 forms a predicted block 103 from video block 100 using pixels from neighboring blocks in the same frame. The neighboring blocks used for prediction may be specified by a prediction mode 101. A summer 104 computes the prediction error 106, i.e., the difference between the video block 100 and the predicted block 103. Transform module 108 projects the prediction error 106 onto a set of basis, or transform, functions. In typical implementations, the transform functions may be derived from the discrete cosine transform (DCT), the Karhunen-Loève transform (KLT), or any other functions.
The transform module 108 outputs a set of transform coefficients 110 corresponding to the weights assigned to each of the transform functions. For example, a set of coefficients {c0, c1, c2, . . . , cN} may be computed, corresponding to the set of transform functions {f0, f1, f2, . . . , fN}. The transform coefficients 110 are subsequently quantized by quantizer 112 to produce quantized transform coefficients 114. The quantized coefficients 114 and prediction mode 101 may be transmitted to the decoder.
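As a rough sketch, the residual-transform-quantize path described above can be written in Python. The orthonormal 4×4 DCT and the uniform quantizer step size below are illustrative assumptions only; H.264 itself uses an integer approximation of the DCT and a more elaborate quantizer design.

```python
import math

def dct_matrix(n):
    # n x n orthonormal DCT-II matrix; row k is basis function f_k.
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def encode_block(block, predicted, qstep=8.0):
    # Prediction error 106: difference between block 100 and predicted block 103.
    error = [[block[r][c] - predicted[r][c] for c in range(4)] for r in range(4)]
    d = dct_matrix(4)
    # Project the error onto the transform basis: coefficients 110 = D E D^T.
    coeffs = matmul(matmul(d, error), transpose(d))
    # Uniform quantization yields the quantized coefficients 114.
    return [[round(c / qstep) for c in row] for row in coeffs]
```

A constant prediction error concentrates all the energy in the DC coefficient c0, leaving the remaining coefficients at zero.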
FIG. 1A depicts a video decoder for the I-mode. In FIG. 1A, quantized coefficients 1000 are provided by the encoder to the decoder, and supplied to the inverse transform module 1004. The inverse transform module 1004 reconstructs the prediction error 1003 based on the coefficients 1000 and the fixed set of transform functions, e.g., {f0, f1, f2, . . . , fN}. The prediction mode 1002 is supplied to the inverse spatial prediction module 1006, which generates a predicted block 1007 based on pixel values of already decoded neighboring blocks. The predicted block 1007 is combined with the prediction error 1003 to generate the reconstructed block 1010. The difference between the reconstructed block 1010 and the original block 100 in FIG. 1 is known as the reconstruction error.
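The decoder path mirrors the encoder: dequantize, invert the transform, and add the spatial prediction. The following Python sketch assumes the same illustrative orthonormal 4×4 DCT and uniform quantizer step size as above, not the actual integer transform of H.264.

```python
import math

def dct_matrix(n):
    # n x n orthonormal DCT-II matrix; row k is basis function f_k.
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def decode_block(quantized, predicted, qstep=8.0):
    # Dequantize the received coefficients 1000 (inverse of the encoder's
    # uniform quantizer; the step size is an illustrative assumption).
    coeffs = [[q * qstep for q in row] for row in quantized]
    d = dct_matrix(4)
    # Inverse transform 1004: reconstruct prediction error 1003, E = D^T C D.
    error = matmul(matmul(transpose(d), coeffs), d)
    # Combine the predicted block 1007 with the prediction error to form
    # the reconstructed block 1010.
    return [[round(predicted[r][c] + error[r][c]) for c in range(4)]
            for r in range(4)]
```

With this pairing, the quantization step is the only source of reconstruction error: a block whose coefficients survive quantization exactly is reconstructed exactly.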
An example of a spatial predictor 102 in FIG. 1 is herein described with reference to section 8.3.1 of ITU-T Recommendation H.264, published by the ITU Telecommunication Standardization Sector in March 2005, hereinafter referred to as H.264-2005. In H.264-2005, a coder offers nine prediction modes for the prediction of 4×4 blocks, labeled 0 through 8: DC prediction (Mode 2) and eight directional modes, as shown in FIG. 2. Each prediction mode specifies a set of neighboring pixels used for predicting each pixel, as illustrated in FIG. 3. In FIG. 3, the pixels a to p are to be encoded, and the neighboring pixels A to L and X are used for predicting the pixels a to p.
To describe the spatial prediction, a nomenclature may be specified as follows. Let s denote a column vector containing pixel values from neighboring blocks (e.g., the values of the pixels A to L and X in FIG. 3 form a 13×1 vector s), and let sA denote the element of vector s corresponding to pixel A, etc. Let p denote a column vector containing the pixel values for the block to be predicted (e.g., the values of the pixels a to p in FIG. 3 form a 16×1 vector p), and let pa denote the element of vector p corresponding to pixel a, etc. Further, let wd denote a matrix of weights by which the vector s is multiplied to obtain the vector p when a prediction mode d is specified. wd may be expressed as follows (Equation 1):
$$
w^d = \begin{bmatrix}
w^d_{a,A} & w^d_{a,B} & \cdots & w^d_{a,X} \\
w^d_{b,A} &           &        & \vdots    \\
\vdots    &           & \ddots &           \\
w^d_{p,A} &           & \cdots & w^d_{p,X}
\end{bmatrix}
$$

The vector of predicted pixels p may then be expressed as follows (Equation 2):

$$
p = w^d \cdot s:\qquad
\begin{bmatrix} p_a \\ p_b \\ \vdots \\ p_p \end{bmatrix}
=
\begin{bmatrix}
w^d_{a,A} & w^d_{a,B} & \cdots & w^d_{a,X} \\
w^d_{b,A} &           &        & \vdots    \\
\vdots    &           & \ddots &           \\
w^d_{p,A} &           & \cdots & w^d_{p,X}
\end{bmatrix}
\begin{bmatrix} s_A \\ s_B \\ \vdots \\ s_X \end{bmatrix}
$$
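Equation 2 is simply a matrix-vector product, with one weighted sum of neighbor values per predicted pixel. The following sketch (the helper names `predict` and `weight_matrix` are hypothetical, not from H.264-2005) builds a sparse weight matrix wd and applies it in plain Python:

```python
# Neighbor ordering for s: A..L, then X (13 pixels), per FIG. 3.
NEIGHBORS = list("ABCDEFGHIJKL") + ["X"]
# Predicted-pixel ordering for p: a..p (16 pixels).
PIXELS = list("abcdefghijklmnop")

def predict(weights, s):
    # Equation 2: p = w^d * s, one weighted sum of neighbors per predicted pixel.
    assert len(s) == len(NEIGHBORS)
    return [sum(w_row[j] * s[j] for j in range(len(s))) for w_row in weights]

def weight_matrix(entries):
    # Build a 16x13 matrix w^d from a sparse dict {(pixel, neighbor): weight};
    # all unspecified weights are zero.
    w = [[0.0] * len(NEIGHBORS) for _ in PIXELS]
    for (px, nb), value in entries.items():
        w[PIXELS.index(px)][NEIGHBORS.index(nb)] = value
    return w
```

Any prediction mode of this family is then fully described by its nonzero entries; only the contents of the dict change from mode to mode.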
According to H.264-2005, if, for example, Mode 0 is selected, then pixels a, e, i and m are predicted by setting them equal to pixel A, pixels b, f, j and n are predicted by setting them equal to pixel B, etc. Each set of pixels in Mode 0 corresponds to pixels lying along a single vertical direction, as shown in FIGS. 2 and 3. The relationships of the predicted pixels to the neighboring pixels for Mode 0 may be represented as follows (Equations 3):

$$w^0_{a,A} = w^0_{e,A} = w^0_{i,A} = w^0_{m,A} = 1;\qquad w^0_{b,B} = w^0_{f,B} = w^0_{j,B} = w^0_{n,B} = 1;$$
$$w^0_{c,C} = w^0_{g,C} = w^0_{k,C} = w^0_{o,C} = 1;\qquad w^0_{d,D} = w^0_{h,D} = w^0_{l,D} = w^0_{p,D} = 1;$$

and all other $w^0 = 0$.
On the other hand, if Mode 1 is selected, pixels a, b, c and d are predicted by setting them equal to pixel I, and pixels e, f, g and h are predicted by setting them equal to pixel J, etc. In this case, each set of pixels corresponds to pixels lying along a single horizontal direction, also as shown in FIGS. 2 and 3. The relationships for Mode 1 may be represented as follows (Equations 4):

$$w^1_{a,I} = w^1_{b,I} = w^1_{c,I} = w^1_{d,I} = 1;\qquad w^1_{e,J} = w^1_{f,J} = w^1_{g,J} = w^1_{h,J} = 1;$$
$$w^1_{i,K} = w^1_{j,K} = w^1_{k,K} = w^1_{l,K} = 1;\qquad w^1_{m,L} = w^1_{n,L} = w^1_{o,L} = w^1_{p,L} = 1;$$

and all other $w^1 = 0$.
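Because Modes 0 and 1 reduce to copying a single neighbor down a column or across a row, the weight relationships above can be realized directly without forming the full matrix. This sketch uses hypothetical helper names; blocks are indexed `[row][col]`, `above` holds pixels A to D, and `left` holds pixels I to L.

```python
def predict_mode0_vertical(above):
    # Mode 0 (Equations 3): each column is set equal to the neighbor above it,
    # e.g. a = e = i = m = A.
    return [list(above) for _ in range(4)]

def predict_mode1_horizontal(left):
    # Mode 1 (Equations 4): each row is set equal to the neighbor to its left,
    # e.g. a = b = c = d = I.
    return [[left[r]] * 4 for r in range(4)]
```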
Note that the modes given in H.264-2005 all specify setting the pixels along a single direction (e.g., the vertical direction in Mode 0, and the horizontal direction in Mode 1) equal to each other, and to a single neighboring pixel. While this is straightforward to implement and specify, in some cases it may be advantageous to set pixels along a single direction to values that are different from each other, and/or a combination of more than one neighboring pixel.
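As a hypothetical illustration of this point (not a mode of H.264-2005), a row of the weight matrix could combine two neighbors with fractional weights, e.g. setting w_{a,A} = w_{a,I} = 0.5 in the notation of Equation 1 so that pixel a is predicted as the average of A (above) and I (left):

```python
def predict_pixel(neighbors, weights):
    # One row of Equation 2: a weighted sum over the neighbor values in s.
    return sum(weights.get(name, 0.0) * value
               for name, value in neighbors.items())

# Hypothetical mode: pixel a predicted from a combination of A and I,
# rather than from a copy of a single neighbor as in Modes 0 and 1.
neighbors = {"A": 100, "I": 60}   # s_A and s_I
weights = {"A": 0.5, "I": 0.5}    # fractional weights
```

With A = 100 and I = 60, such a rule predicts a = 80, a value different from either neighbor, which no single-neighbor copy mode can produce.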