Spatial prediction has been employed in many applications. In particular, spatial prediction forms an essential part of many image and video coding and processing applications. In hybrid image or video coding algorithms, spatial prediction is typically employed for determining a prediction for an image block based on the pixels of already encoded/decoded blocks. On the other hand, spatial prediction may also be used as a part of post processing the decoded image or video signal, in particular for error concealment.
The majority of standardized video coding algorithms are based on hybrid video coding. Hybrid video coding methods typically combine several different lossless and lossy compression schemes in order to achieve the desired compression gain. Hybrid video coding is also the basis for ITU-T standards (H.26x standards such as H.261 and H.263) as well as ISO/IEC standards (MPEG-X standards such as MPEG-1, MPEG-2, and MPEG-4). The most recent and advanced video coding standard is currently H.264/MPEG-4 Advanced Video Coding (AVC), which is the result of standardization efforts by the Joint Video Team (JVT), a joint team of the ITU-T and ISO/IEC MPEG groups. A new video coding standard is currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC) under the name High-Efficiency Video Coding (HEVC), aiming in particular at improved efficiency for high-resolution video coding.
A video signal input to an encoder is a sequence of images called frames, each frame being a two-dimensional matrix of pixels. All the above-mentioned standards based on hybrid video coding include subdividing each individual video frame into smaller blocks consisting of a plurality of pixels. Typically, a macroblock (usually denoting a block of 16×16 pixels) is the basic image element for which the encoding is performed. However, various particular encoding steps may be performed for smaller image elements, denoted subblocks or simply blocks, having a size of, for instance, 8×8, 4×4, or 16×8 pixels. The largest possible size for such a block, for instance in HEVC, is 64×64 pixels. It is then called the largest coding unit (LCU). In HEVC, an LCU can be subdivided into smaller blocks. One such block is called a coding unit (CU). A CU is the basic image element for which the coding is performed.
FIG. 1 is an example of a typical H.264/MPEG-4 AVC standard compliant video encoder 100. A subtractor 105 first determines differences between a current block to be encoded of an input video image (input signal) and a corresponding prediction block, which is used for the prediction of the current block to be encoded. In H.264/MPEG-4 AVC, the prediction signal is obtained either by a temporal or by a spatial prediction. The type of prediction can be varied on a per frame basis, per slice basis or on a per macroblock basis.
Macroblocks or CUs predicted using temporal prediction are called inter-encoded, and macroblocks or CUs predicted using spatial prediction are called intra-encoded. The type of prediction for a video frame can be set by the user or selected by the video encoder so as to achieve as high a compression gain as possible. In accordance with the selected type of prediction, an intra/inter switch 175 provides the corresponding prediction signal to the subtractor 105. The prediction signal using temporal prediction is derived from the previously encoded images, which are stored in a memory 140. The prediction signal using spatial prediction is derived from the values of boundary pixels in the neighboring blocks of the same frame, which have been previously encoded, decoded, and stored in the memory 140. The memory 140 thus operates as a delay unit that allows a comparison between current signal values to be encoded and the prediction signal values generated from previous signal values. The memory 140 can store a plurality of previously encoded video frames. The difference between the input signal and the prediction signal, denoted the prediction error signal or residual signal, is transformed, resulting in coefficients, which are quantized 110. An entropy encoder 190 is then applied to the quantized coefficients in order to further reduce the amount of data in a lossless way. This is mainly achieved by applying a code with code words of variable length, wherein the length of a code word is chosen based on the probability of its occurrence.
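As an illustration of a variable-length code, the following minimal sketch implements the unsigned Exp-Golomb code, which H.264/MPEG-4 AVC uses for many of its syntax elements. It is shown only as an example of the principle: it assigns shorter code words to smaller values, on the assumption that small values occur more often. It is not the scheme applied to the quantized coefficients themselves, and the function names are hypothetical.

```python
def exp_golomb_encode(value):
    """Return the unsigned Exp-Golomb code word for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]  # binary digits of value + 1
    # Prefix with (length - 1) zeros so the decoder knows the word length.
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits):
    """Decode one Exp-Golomb code word from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    # The code word is `zeros` leading zeros followed by zeros + 1 info bits.
    return int(bits[zeros:2 * zeros + 1], 2) - 1


for v in range(5):
    print(v, exp_golomb_encode(v))
```

The value 0 is coded with a single bit, while the values 3 and 4 each take five bits, so the average length is short whenever small values dominate.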
Intra-encoded images (also called I-type images or I frames) consist solely of macroblocks or CUs that are intra-encoded, i.e. intra-encoded images can be decoded without reference to any other previously decoded image. The intra-encoded images provide error resilience for the encoded video sequence since they refresh the video sequence from errors possibly propagated from frame to frame due to temporal prediction. Moreover, I frames enable random access within the sequence of encoded video images. Intra-frame prediction uses a predefined set of intra-prediction modes. Some of the intra-prediction modes predict the current block using the boundary pixels of the neighboring blocks already encoded. Other intra-prediction modes, such as template matching, use a search area made of already encoded pixels belonging to the same frame. The predefined set of intra-prediction modes includes some directional spatial intra-prediction modes. The different modes of directional spatial intra-prediction refer to different directions of the applied two-dimensional prediction. This allows efficient spatial intra-prediction in the case of various edge directions. The prediction signal obtained by such an intra-prediction is then subtracted from the input signal by the subtractor 105 as described above. In addition, spatial intra-prediction mode information indicating the prediction mode is provided to the entropy encoder 190 (not shown in FIG. 1), where it is entropy encoded and provided together with the encoded video signal.
In the H.264/MPEG-4 AVC intra coding scheme, the spatial prediction is performed for subblocks of sizes 4×4, 8×8 or 16×16 pixels in order to reduce spatial redundancy. Intra-frame prediction uses a predefined set of intra-prediction modes, which basically predict the current block using the boundary pixels of the neighboring blocks already coded. The different types of directional spatial prediction refer to different edge directions, i.e. the direction of the applied two-dimensional extrapolation. There are eight different directional prediction modes and one DC prediction mode for subblocks of size 4×4 and 8×8, and three different directional prediction modes and one DC prediction mode for the macroblocks of 16×16 pixels. In HEVC, spatial prediction can be performed for CUs of size 4×4, 8×8, 16×16 or 32×32. There are 34 different directional prediction modes for all CU sizes.
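The idea of predicting a block from the boundary pixels of its neighbors can be sketched as follows for a 4×4 block. This is a simplified illustration that covers only the vertical, horizontal, and DC modes and assumes that both the upper and the left neighbors are available; the function names and the mode-selection criterion (sum of absolute differences) are assumptions made for the example.

```python
def predict_intra_4x4(top, left, mode):
    """Predict a 4x4 block from reconstructed boundary pixels.

    top  -- the 4 pixels directly above the block (left to right)
    left -- the 4 pixels directly left of the block (top to bottom)
    mode -- "vertical", "horizontal", or "dc" (illustrative names)
    """
    if len(top) != 4 or len(left) != 4:
        raise ValueError("expected 4 boundary pixels on each side")
    if mode == "vertical":
        # Each column repeats the pixel above it.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the pixel to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Flat block at the rounded mean of the 8 boundary pixels.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unknown mode: " + mode)


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_mode(block, top, left):
    """Pick the mode whose prediction has the smallest SAD to the block."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_intra_4x4(top, left, m)))
```

A block containing vertical stripes that continue the row of pixels above it is predicted exactly by the vertical mode, so its residual is zero and only the mode information needs to be transmitted.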
Within the video encoder 100, a decoding unit is incorporated for obtaining a decoded video signal. In compliance with the encoding steps, the decoding steps include inverse quantization and inverse transformation 120. The decoded prediction error signal differs from the original prediction error signal due to the quantization error, also called quantization noise. A reconstructed signal is then obtained by adding 125 the decoded prediction error signal to the prediction signal. In order to maintain compatibility between the encoder side and the decoder side, the prediction signal is obtained based on the encoded and subsequently decoded video signal, which is known at both the encoder and the decoder. Due to the quantization, quantization noise is superposed on the reconstructed video signal. Due to the block-wise coding, the superposed noise often has blocking characteristics, which result, in particular for strong quantization, in visible block boundaries in the decoded image. In order to reduce these artifacts, a deblocking filter 130 is applied to every reconstructed image block.
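The reconstruction loop described above can be sketched as follows. This is a deliberately simplified illustration: the transform is omitted, a single integer step size stands in for the actual quantizer, and all function names are hypothetical. It shows why the reconstructed signal, rather than the original, must serve as the basis for prediction: only the reconstruction is available at the decoder.

```python
def quantize(residual, step):
    """Uniform scalar quantization with rounding to the nearest level."""
    levels = []
    for r in residual:
        magnitude = (abs(r) + step // 2) // step
        levels.append(magnitude if r >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [level * step for level in levels]


def encode_and_reconstruct(current, prediction, step):
    """Return the transmitted levels and the locally decoded samples."""
    residual = [c - p for c, p in zip(current, prediction)]
    levels = quantize(residual, step)
    decoded_residual = dequantize(levels, step)
    # The decoder performs exactly this addition, so both sides agree.
    reconstructed = [p + d for p, d in zip(prediction, decoded_residual)]
    return levels, reconstructed


current = [100, 104, 97, 90]
prediction = [98, 98, 98, 98]
levels, reconstructed = encode_and_reconstruct(current, prediction, step=4)
noise = [c - r for c, r in zip(current, reconstructed)]
print(levels, reconstructed, noise)
```

In this sketch, the difference between `current` and `reconstructed` is the quantization noise; it grows with the step size and vanishes for a step size of 1.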
In order to be decoded, inter-encoded images require previously encoded and subsequently decoded (reconstructed) image(s). Temporal prediction may be performed uni-directionally, i.e., using only video frames ordered in time before the current frame to be encoded, or bi-directionally, i.e., also using video frames following the current frame. Uni-directional temporal prediction results in inter-encoded images called P frames; bi-directional temporal prediction results in inter-encoded images called B frames. In general, an inter-encoded image may comprise any of P-, B-, or even I-type macroblocks. An inter-encoded macroblock (P- or B-macroblock) or an inter-encoded CU is predicted by employing motion compensated prediction 160. First, a best-matching block is found for the current block within the previously encoded and decoded video frames by a motion estimator 165. The best-matching block then becomes a prediction signal, and the relative displacement between the current block and its best match is signaled as motion data in the form of three-dimensional (one temporal, two spatial) motion vectors within the bitstream, which also comprises the encoded video data. In order to optimize the prediction accuracy, motion vectors may be determined with a spatial sub-pixel resolution, e.g. half-pixel or quarter-pixel resolution. This is enabled by an interpolation filter 150.
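The search for a best-matching block can be sketched as an exhaustive search over a small window in a single reference frame. This is a minimal illustration at integer-pixel resolution only; the matching criterion (sum of absolute differences), the search strategy, and all names are assumptions made for the example, since the motion estimator 165 is not tied to any particular search method.

```python
def block_sad(cur, ref, cx, cy, rx, ry, size):
    """SAD between the block at (cx, cy) in cur and at (rx, ry) in ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Return (sad, mv_x, mv_y) of the best match within the search window."""
    height, width = len(ref), len(ref[0])
    best = None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            rx, ry = cx + mv_x, cy + mv_y
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = block_sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mv_x, mv_y)
    return best
```

The returned pair `(mv_x, mv_y)` corresponds to the two spatial components of the motion vector; the temporal component would identify which reference frame was searched.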
For both the intra- and the inter-encoding modes, the differences between the current input signal and the prediction signal are transformed and quantized by the unit 110, resulting in the quantized coefficients. Generally, an orthogonal transformation such as a two-dimensional discrete cosine transformation (DCT) or an integer version thereof is employed since it reduces the correlation of natural video images efficiently. After the transformation, low-frequency components are usually more important for image quality than high-frequency components, so that more bits can be spent on coding the low-frequency components than the high-frequency components. In the entropy coder, the two-dimensional matrix of quantized coefficients is converted into a one-dimensional array. Typically, this conversion is performed by so-called zig-zag scanning, which starts with the DC coefficient in the upper left corner of the two-dimensional array and scans the two-dimensional array in a predetermined sequence ending with an AC coefficient in the lower right corner. As the energy is typically concentrated in the upper left part of the two-dimensional matrix of coefficients, corresponding to the lower frequencies, the zig-zag scanning results in an array in which the last values are usually zero. This allows for efficient encoding using run-length codes as a part of, or before, the actual entropy coding. H.264/MPEG-4 AVC employs scalar quantization 110, which can be controlled by a quantization parameter (QP) and a customizable quantization matrix (QM). One of 52 quantizers is selected for each macroblock by the quantization parameter. In addition, the quantization matrix is specifically designed to preserve certain frequencies in the source so as to avoid losing image quality.
The quantization matrix in H.264/MPEG-4 AVC can be adapted to the video sequence and signaled together with the video data.
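The zig-zag scanning and the subsequent run-length representation described above can be sketched as follows. The scan order is generated for a square block of any size; the `(zero_run, value)` pair format and the `"EOB"` end-of-block marker are illustrative simplifications and do not reproduce the actual entropy coding syntax.

```python
def zigzag_order(n):
    """Return the (row, col) visiting order of an n x n zig-zag scan."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(coeffs):
    """Encode as (zero_run, value) pairs; trailing zeros collapse to 'EOB'."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs


block = [[12, 5, 2, 0],
         [-3, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(run_length(zigzag_scan(block)))
```

Because the non-zero coefficients cluster in the upper left corner, the ten trailing zeros of this 16-coefficient block are represented by the single end-of-block marker.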
H.264/MPEG-4 AVC includes two functional layers, a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL provides the encoding functionality as briefly described above. The NAL encapsulates information elements into standardized units called NAL units according to their further application, such as transmission over a channel or storage in a storage medium. The information elements are, for instance, the encoded prediction error signal or other information necessary for the decoding of the video signal, such as the type of prediction, quantization parameter, motion vectors, etc. There are VCL NAL units containing the compressed video data and the related information, as well as non-VCL units encapsulating additional data such as a parameter set relating to an entire video sequence, or Supplemental Enhancement Information (SEI) providing additional information that can be used to improve the decoding performance.
In order to improve the image quality, a so-called post filter 280 may be applied at the decoder side 200. The H.264/MPEG-4 AVC standard enables sending of post filter information for such a post filter via the SEI message. The post filter information is determined at the encoder side by means of a post filter design unit 180, which compares the locally decoded signal and the original input signal. In general, the post filter information is information that allows the decoder to set up an appropriate filter. It may directly include the filter coefficients or other information enabling the filter to be set up. The filter information, which is output by the post filter design unit 180, is also fed to the entropy coding unit 190 in order to be encoded and inserted into the encoded signal. Such an adaptive filter may also be used as a second post filter, as for example in the HEVC standard.
FIG. 2 illustrates an example decoder 200 compliant with the H.264/MPEG-4 AVC video coding standard. The bitstream of the encoded video signal (the input signal to the decoder) first passes to an entropy decoder 290, which decodes the quantized coefficients, the information elements necessary for decoding, such as motion data, mode of prediction, etc., and the post filter information. In the entropy decoder 290, spatial intra-prediction mode information is extracted from the bitstream, indicating the type/mode of the spatial prediction applied to the block to be decoded. The extracted information is provided to the spatial prediction unit 270 (not shown in FIG. 2). The quantized coefficients are inversely scanned in order to obtain a two-dimensional matrix, which is then fed to inverse quantization and inverse transformation 220. After inverse quantization and inverse transformation, a decoded (quantized) prediction error signal is obtained, which corresponds to the differences obtained by subtracting the prediction signal from the signal input to the encoder in the case that no quantization noise is introduced.
The prediction signal is obtained from either a temporal or a spatial prediction 260 and 270, respectively, which are switched 275 in accordance with a received information element signaling the prediction applied at the encoder. The decoded information elements further include the information necessary for the prediction, such as the prediction type in the case of intra-prediction (spatial intra-prediction mode information) and motion data in the case of motion compensated prediction. Depending on the current value of the motion vector, interpolation of pixel values may be needed in order to perform the motion compensated prediction. This interpolation is performed by an interpolation filter 250. The quantized prediction error signal in the spatial domain is then added by means of an adder 225 to the prediction signal obtained either from the motion compensated prediction 260 or the intra-frame prediction 270. The reconstructed image may be passed through a deblocking filter 230, and the resulting decoded signal is stored in the memory 240 to be applied for temporal or spatial prediction of the following blocks. The post filter information is fed to a post filter 280, which sets up a post filter accordingly. The post filter is then applied to the decoded signal in order to further improve the image quality.
Directional intra-prediction modes are very efficient at predicting sharp edges, but are not well suited to predicting smooth or out-of-focus regions. For such regions, smoothing the reference pixels with a low-pass filter is particularly appropriate and provides gains in terms of coding efficiency. Thus, applying a low-pass filter to the reference pixels for intra-prediction is a known technique for removing the quantization noise added to the reconstructed pixels and for improving intra-prediction, especially when the region of the image to be predicted is blurred or out of focus.
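Low-pass filtering of the reference pixels can be sketched as follows. The text above does not specify the filter, so the three-tap kernel with weights 1, 2, 1 and the choice to leave the two end pixels unfiltered are assumptions made for this example, as is the function name.

```python
def smooth_reference(ref):
    """Apply a [1, 2, 1] / 4 low-pass filter to a row of reference pixels.

    The first and last pixels are copied unchanged because they lack a
    neighbor on one side.
    """
    if len(ref) < 3:
        return list(ref)
    out = [ref[0]]
    for i in range(1, len(ref) - 1):
        # The added 2 rounds to the nearest integer before the shift by 2.
        out.append((ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2)
    out.append(ref[-1])
    return out


noisy = [100, 100, 120, 100, 100]
print(smooth_reference(noisy))
```

An isolated outlier, such as a pixel disturbed by quantization noise, is attenuated and spread over its neighbors, while a constant row of reference pixels passes through unchanged.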