1. Field of the Invention
The present invention relates to prediction of a block of an image. In particular, the present invention relates to spatial (intra-image) prediction of an image block.
2. Description of the Related Art
Spatial prediction has been employed in many applications. In particular, spatial prediction forms an essential part of many image and video coding and processing applications. In hybrid image or video coding algorithms, spatial prediction is typically employed for determining a prediction for an image block based on the pixels of already encoded/decoded blocks. On the other hand, spatial prediction may also be used as a part of post processing the decoded image or video signal, in particular for error concealment.
The majority of standardized video coding algorithms are based on hybrid video coding. Hybrid video coding methods typically combine several different lossless and lossy compression schemes in order to achieve the desired compression gain. Hybrid video coding is also the basis for ITU-T standards (H.26x standards such as H.261 and H.263) as well as ISO/IEC standards (MPEG-x standards such as MPEG-1, MPEG-2, and MPEG-4). The most recent and advanced video coding standard is currently the standard denoted H.264/MPEG-4 Advanced Video Coding (AVC), which is the result of standardization efforts by the Joint Video Team (JVT), a joint team of the ITU-T and ISO/IEC MPEG groups.
A video signal input to an encoder is a sequence of images called frames, each frame being a two-dimensional matrix of pixels. All the above-mentioned standards based on hybrid video coding include subdividing each individual video frame into smaller blocks consisting of a plurality of pixels. Typically, a macroblock (usually denoting a block of 16×16 pixels) is the basic image element, for which the encoding is performed. However, various particular encoding steps may be performed for smaller image elements, denoted subblocks or simply blocks and having the size of, for instance, 8×8, 4×4, 16×8, etc.
FIG. 1 is an example of a typical H.264/MPEG-4 AVC standard compliant video encoder 100. A subtractor 105 first determines the differences between a current block to be encoded of an input video image (input signal) and a corresponding prediction block, which is used for the prediction of the current block to be encoded. In H.264/MPEG-4 AVC, the prediction signal is obtained either by a temporal or by a spatial prediction. The type of prediction can be varied on a per-frame, per-slice, or per-macroblock basis.
Macroblocks predicted using temporal prediction are called inter-encoded, and macroblocks predicted using spatial prediction are called intra-encoded. The type of prediction for a video frame can be set by the user or selected by the video encoder so as to achieve a possibly high compression gain. In accordance with the selected type of prediction, an intra/inter switch 175 provides the corresponding prediction signal to the subtractor 105. The prediction signal using temporal prediction is derived from previously encoded images, which are stored in a memory 140. The prediction signal using spatial prediction is derived from the values of boundary pixels in the neighboring blocks of the same frame, which have been previously encoded, decoded, and stored in the memory 140. The memory unit 140 thus operates as a delay unit that allows a comparison between the current signal values to be encoded and the prediction signal values generated from previous signal values. The memory 140 can store a plurality of previously encoded video frames. The difference between the input signal and the prediction signal, denoted the prediction error signal or residual signal, is transformed into coefficients, which are quantized by a transformation/quantization unit 110. An entropy encoder 190 is then applied to the quantized coefficients in order to further reduce the amount of data in a lossless way. This is mainly achieved by applying a code with code words of variable length, wherein the length of a code word is chosen based on the probability of its occurrence.
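The variable-length principle mentioned above can be illustrated with a toy Huffman code (a simplified sketch for illustration only; the actual entropy coders of H.264/MPEG-4 AVC are CAVLC and CABAC, and the function name and representation below are assumptions, not the standard's process):

```python
import heapq
from collections import Counter

def huffman_lengths(symbols):
    """Toy variable-length (Huffman) code: frequently occurring symbols
    receive short code words, illustrating how the entropy encoder reduces
    the amount of data in a lossless way."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap of (weight, tiebreak id, {symbol: code length so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every contained symbol's
        # code word grows by one bit
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For example, for the symbol sequence "aaaabbc" the most frequent symbol "a" receives a one-bit code word, while "b" and "c" receive two-bit code words.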
Intra-encoded images (also called I-type images or I-frames) consist solely of macroblocks that are intra-encoded, i.e., intra-encoded images can be decoded without reference to any other previously decoded image. The intra-encoded images provide error resilience for the encoded video sequence since they refresh the video sequence from errors possibly propagated from frame to frame due to temporal prediction. Moreover, I-frames enable random access within the sequence of encoded video images. Intra-frame prediction uses a predefined set of intra-prediction modes. Some of the intra-prediction modes predict the current block using the boundary pixels of the neighboring blocks already encoded. Other intra-prediction modes, such as template matching, use a search area made of already encoded pixels belonging to the same frame. The predefined set of intra-prediction modes includes some directional spatial intra-prediction modes. The different modes of directional spatial intra-prediction refer to different directions of the applied two-dimensional prediction. This allows efficient spatial intra-prediction in the case of various edge directions. The prediction signal obtained by such an intra-prediction is then subtracted from the input signal by the subtractor 105 as described above. In addition, spatial intra-prediction mode information indicating the prediction mode is provided to the entropy encoder 190 (not shown in FIG. 1), where it is entropy encoded and provided together with the encoded video signal.
In the H.264/MPEG-4 AVC intra coding scheme, the spatial prediction is performed for subblocks of sizes 4×4, 8×8 or 16×16 pixels in order to reduce spatial redundancy. Intra-frame prediction uses a predefined set of intra-prediction modes, which basically predict the current block using the boundary pixels of the neighboring blocks already coded. FIG. 3A illustrates an image block 300 with pixels p(0, 0) to p(N, M), M+1 being the number of lines of the block and N+1 being the number of columns of the block. Reference pixels p′ may be used for the prediction since they belong to already encoded blocks. In particular, reference pixels 310 p′(0, −1) to p′(N, −1) and pixels 330 p′(−1, 0) to p′(−1, M) are the pixels on the top and left boundaries of the block, respectively. Pixels 320 p′(N+1, −1) to p′(N+1+K, −1) may also be used for the prediction, especially if an edge crossing the block passes through them, K being the number of such pixels. Similarly, pixels p′(−1, M+1) to p′(−1, M+1+L) below the left reference pixels 330 may be employed (not shown), L being the number of such pixels. Pixel p′(−1, −1) 340 may also be used.
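The reference pixel layout of FIG. 3A can be sketched as follows (a minimal illustration; the function name, frame representation, and restriction to the top row, left column, and corner pixel are assumptions made for clarity):

```python
def gather_reference_pixels(frame, bx, by, n, m):
    """Collect already-decoded reference pixels around an n-wide, m-high
    block whose top-left pixel sits at column bx, row by of the frame:
    the row above, the column to the left, and the top-left corner pixel,
    corresponding to pixels 310, 330 and 340 of FIG. 3A."""
    top = [frame[by - 1][bx + i] for i in range(n)]    # p'(0,-1)..p'(N,-1)
    left = [frame[by + j][bx - 1] for j in range(m)]   # p'(-1,0)..p'(-1,M)
    corner = frame[by - 1][bx - 1]                     # p'(-1,-1)
    return top, left, corner
```

For a 2×2 block at position (1, 1) of a small frame, the function returns the two pixels above, the two pixels to the left, and the corner pixel.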
The different types of directional spatial prediction refer to different directions, i.e., the direction of the applied two-dimensional extrapolation, as illustrated in FIG. 3B. In the H.264/MPEG-4 AVC standard, there are eight different directional prediction modes and one DC prediction mode for subblocks of size 4×4 and 8×8, and three different directional prediction modes and one DC prediction mode for macroblocks of 16×16 pixels. The future HEVC standard that is currently being developed defines up to 34 different prediction modes, including a DC mode.
FIG. 3B schematically illustrates the eight directional prediction modes used for the subblocks of 4×4 pixels in the H.264/MPEG-4 AVC standard. The eight prediction modes of FIG. 3B are labeled by a value 302 from the set {0, 1, 3, 4, 5, 6, 7, 8} and associated with predictions in eight different directions 301. The remaining prediction mode is labeled by the value 2 and called the "DC mode". In the DC mode, all pixels in a block are predicted by a single value, which is the mean value of the surrounding reference pixels. In the eight directional modes, the reference pixels are repeated along the corresponding directions 301. For instance, the vertical mode labeled "0" consists of repeating vertically the reference pixels of the row immediately above the current block. The horizontal mode labeled "1" consists of repeating horizontally the reference pixels of the column immediately to the left of the current block. The remaining modes, labeled with values from 3 to 8, are diagonal prediction modes, according to which the reference pixels are repeated along the respective diagonal direction.
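The vertical, horizontal and DC modes described above can be sketched for a 4×4 block as follows (a simplified illustration, not the normative H.264 prediction process; the function name and the rounding offset in the DC mean are assumptions):

```python
def intra_predict_4x4(top, left, mode):
    """Sketch of three H.264-style intra modes for a 4x4 block.
    `top` holds the four reference pixels above the block (pixels 310 of
    FIG. 3A); `left` holds the four reference pixels to its left (330)."""
    if mode == 0:   # vertical: repeat the row immediately above the block
        return [list(top) for _ in range(4)]
    if mode == 1:   # horizontal: repeat the column immediately to the left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:   # DC: mean of the surrounding reference pixels
        dc = (sum(top) + sum(left) + 4) // 8   # +4 for rounding
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")
```

For instance, with mode 0 every row of the predicted block equals the reference row above it, and with mode 2 every pixel equals the rounded mean of the eight reference pixels.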
Within the video encoder 100, a decoding unit is incorporated for obtaining a decoded video signal. In compliance with the encoding steps, the decoding steps include inverse quantization and inverse transformation 120. The decoded prediction error signal differs from the original prediction error signal due to the quantization error, also called quantization noise. A reconstructed signal is then obtained by adding 125 the decoded prediction error signal to the prediction signal. In order to maintain compatibility between the encoder side and the decoder side, the prediction signal is obtained based on the encoded and subsequently decoded video signal, which is known at both the encoder and the decoder side. Due to the quantization, quantization noise is superposed on the reconstructed video signal. Due to the block-wise coding, the superposed noise often has blocking characteristics, which result, in particular for strong quantization, in visible block boundaries in the decoded image. In order to reduce these artifacts, a deblocking filter 130 is applied to every reconstructed image block.
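The origin of the quantization noise in the reconstruction can be sketched numerically (a minimal illustration using a uniform scalar quantizer on sample values; the function names and the step size are assumptions, not the standard's quantization design):

```python
def quantize(value, q_step):
    # Uniform scalar quantization: index of the nearest step multiple
    return int(round(value / q_step))

def dequantize(level, q_step):
    # Inverse quantization: reconstruct the step multiple
    return level * q_step

def reconstruct(prediction, residual, q_step):
    """Decoder-side reconstruction: the decoded residual differs from the
    original residual by the quantization error, so the reconstructed
    signal carries the same superposed quantization noise."""
    levels = [quantize(r, q_step) for r in residual]
    decoded_residual = [dequantize(l, q_step) for l in levels]
    return [p + d for p, d in zip(prediction, decoded_residual)]
```

With a prediction of [100, 100], an original residual of [7, -3] and a step size of 4, the reconstruction is [108, 96] instead of the original [107, 97]; the difference is the quantization noise.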
In order to be decoded, inter-encoded images require previously encoded and subsequently decoded (reconstructed) image(s). Temporal prediction may be performed uni-directionally, i.e., using only video frames ordered in display order before the current frame to be encoded, or bi-directionally, i.e., using also video frames following the current frame in display order. Inter-encoded images called P frames can contain only blocks predicted with spatial intra prediction or unidirectional temporal prediction. Inter-encoded images called B frames can contain blocks predicted with spatial intra prediction, unidirectional temporal prediction, or bidirectional temporal prediction. An inter-encoded macroblock (unidirectionally or bidirectionally predicted macroblock) is predicted by employing motion compensated prediction 160. First, a best-matching block is found for the current block within the previously encoded and decoded video frames by a motion estimator 165. The best-matching block then becomes a prediction signal, and the relative displacement between the current block and its best match is signalized as motion data in the form of three-dimensional (one temporal, two spatial) motion within the bitstream, which also comprises the encoded prediction error or residual data. In order to optimize the prediction accuracy, motion vectors may be determined with a spatial sub-pixel resolution, e.g., half-pixel or quarter-pixel resolution. This is enabled by an interpolation filter 150.
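The search for the best-matching block can be sketched with an exhaustive full search using the sum of absolute differences (a simplified, integer-pixel illustration; real motion estimators use fast search strategies and the sub-pixel refinement mentioned above, and the function names are assumptions):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def full_search(current, reference):
    """Exhaustive block matching: slide the current block over every
    position of the reference frame and keep the position with the
    smallest SAD; the offset of that position is the motion data."""
    h, w = len(current), len(current[0])
    H, W = len(reference), len(reference[0])
    best_pos, best_cost = None, float("inf")
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            cand = [row[x:x + w] for row in reference[y:y + h]]
            cost = sad(current, cand)
            if cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos, best_cost
```

When the reference frame contains an exact copy of the current block, the search returns that block's position with a cost of zero.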
For both the intra- and inter-encoding modes, the differences between the current input signal and the prediction signal are transformed and quantized by the unit 110, resulting in quantized coefficients. Generally, an orthogonal transformation such as a two-dimensional discrete cosine transformation (DCT) or an integer version thereof is employed since it reduces the correlation of natural video images efficiently. After the transformation, low frequency components are usually more important for image quality than high frequency components, so that more bits can be spent for coding the low frequency components than the high frequency components. In the entropy coder, the two-dimensional matrix of quantized coefficients is converted into a one-dimensional array. Typically, this conversion is performed by a so-called zig-zag scanning, which starts with the DC coefficient in the upper left corner of the two-dimensional array and scans the two-dimensional array in a predetermined sequence ending with an AC coefficient in the lower right corner. As the energy is typically concentrated in the upper left part of the two-dimensional matrix of coefficients, corresponding to the lower frequencies, the zig-zag scanning results in an array where usually the last values are zero. This allows for efficient encoding using run-length codes as a part of/before the actual entropy coding. H.264/MPEG-4 AVC employs scalar quantization 110, which can be controlled by a quantization parameter (QP) and a customizable quantization matrix (QM). One of 52 quantizers is selected for each macroblock by the quantization parameter. In addition, the quantization matrix is specifically designed to preserve certain frequencies in the source in order to avoid losing image quality. The quantization matrix in H.264/MPEG-4 AVC can be adapted to the video sequence and signalized together with the video data.
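The zig-zag conversion and the run-length friendliness it produces can be sketched for a 4×4 coefficient block as follows (the scan order table matches the common 4×4 zig-zag pattern; the function names and the run-length summary are illustrative assumptions):

```python
# Zig-zag scan order for a 4x4 coefficient block: starts at the DC
# coefficient (0,0) in the upper left and ends in the lower right corner.
ZIGZAG_4x4 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
              (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def zigzag_scan(coeffs):
    """Convert a 4x4 matrix of quantized coefficients into a 1-D array."""
    return [coeffs[r][c] for r, c in ZIGZAG_4x4]

def run_length(values):
    """Because the energy sits in the low frequencies, the scanned array
    usually ends in zeros: split it into significant values and the
    trailing zero run, which a run-length code can represent compactly."""
    n = len(values)
    while n > 0 and values[n - 1] == 0:
        n -= 1
    return values[:n], len(values) - n
```

For a coefficient matrix whose energy is concentrated in the upper left corner, the scan yields all significant values first, followed by a long run of trailing zeros.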
The H.264/MPEG-4 AVC standard includes two functional layers, a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL provides the coding functionality as briefly described above. The NAL encapsulates information elements into standardized units called NAL units according to their further application, such as transmission over a channel or storage. The information elements are, for instance, the encoded prediction error signal or other information necessary for the decoding of the video signal, such as the type of prediction, quantization parameter, motion vectors, etc. There are VCL NAL units containing the compressed video data and the related information, as well as non-VCL units encapsulating additional data such as a parameter set relating to an entire video sequence, or Supplemental Enhancement Information (SEI) providing additional information that can be used to improve the decoding performance.
In order to improve the image quality, a so-called post filter 280 may be applied at the decoder side 200. The H.264/MPEG-4 AVC standard enables the sending of post filter information for such a post filter via the SEI message. The post filter information is determined at the encoder side by means of a post filter design unit 180, which compares the locally decoded signal and the original input signal. In general, the post filter information is information allowing the decoder to set up an appropriate filter. It may directly include the filter coefficients or other information enabling the filter to be set up. The filter information, which is output by the post filter design unit 180, is also fed to the entropy coding unit 190 in order to be encoded and inserted into the encoded signal.
FIG. 2 illustrates an example decoder 200 compliant with the H.264/MPEG-4 AVC video coding standard. The encoded video signal (input signal to the decoder) bitstream first passes to an entropy decoder 290, which decodes the quantized coefficients, the information elements necessary for decoding such as motion data, mode of prediction etc., and the post filter information. In the entropy decoder 290, spatial intra-prediction mode information or motion vector information is extracted from the bitstream, indicating the type/mode of the spatial prediction or the motion data applied to the block to be decoded. The extracted information is provided to the spatial prediction unit 270 or the motion compensated prediction unit 260 (not shown in FIG. 2). The quantized coefficients are inversely scanned in order to obtain a two-dimensional matrix, which is then fed to inverse quantization and inverse transformation 220. After inverse quantization and inverse transformation, a decoded (quantized) prediction error signal is obtained, which corresponds to the differences obtained by subtracting the prediction signal from the signal input to the encoder in the case that no quantization noise is introduced.
The prediction signal is obtained from either a temporal or a spatial prediction 260 and 270, respectively, which are switched 275 in accordance with a received information element signalizing the prediction applied at the encoder. The decoded information elements further include the information necessary for the prediction, such as the prediction type in the case of intra-prediction (spatial intra-prediction mode information) and motion data in the case of motion compensated prediction. Depending on the current value of the motion vector, interpolation of pixel values may be needed in order to perform the motion compensated prediction. This interpolation is performed by an interpolation filter 250. The quantized prediction error signal in the spatial domain is then added by means of an adder 225 to the prediction signal obtained either from the motion compensated prediction 260 or the intra-frame prediction 270. The reconstructed image may be passed through a deblocking filter 230, and the resulting decoded signal is stored in the memory 240 to be applied for temporal or spatial prediction of the following blocks. The post filter information may be fed to a post filter 280, which sets up the post filter accordingly. The post filter is then applied to the decoded signal in order to further improve the image quality.
FIG. 4 illustrates intra prediction of a block 410 of an original image with an edge. The edge enters the block 410 at the top and curves smoothly to the right. A prediction signal for the block 410 is obtained by extrapolating the available reference pixels. In FIG. 4, the block to be predicted 420 can be predicted by using a vertical prediction mode (mode number 0 of FIG. 3B) since the edge is vertical in the upper part of the original block. The entering edge is thus prolonged straight through the block. This prediction signal differs from the original image block since the predicted edge is not curved to the right. The prediction error 440 is given by the difference between the original image block and the prediction block. As can be seen from the figure, the prediction error 440 itself represents an edge which may be rather sharp, meaning that the prediction error block 440 will contain high frequency coefficients after a transformation into the domain of spatial frequencies. Thus, the transform coding and the subsequent quantization will have reduced efficiency, and the resulting coded signal will require a higher rate for transmission.
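The effect illustrated in FIG. 4 can be sketched numerically with a small block whose edge bends away from the vertical (the pixel values, block size, and function name are illustrative assumptions; they merely mimic the situation of blocks 410 to 440):

```python
def vertical_predict(top, h):
    """Mode 0 of FIG. 3B: repeat the reference row above the block."""
    return [list(top) for _ in range(h)]

# A 4x4 block whose edge (value 9) is vertical at the top but bends to
# the right in the lower rows, as in block 410 of FIG. 4.
original = [[0, 9, 0, 0],
            [0, 9, 0, 0],
            [0, 0, 9, 0],
            [0, 0, 0, 9]]
top_reference = [0, 9, 0, 0]   # row above the block: edge still vertical

prediction = vertical_predict(top_reference, 4)
residual = [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]
# In the rows where the edge bends away from the vertical, the residual
# itself contains a sharp edge, which produces high frequency transform
# coefficients and thus a higher coding rate.
```

The residual is zero in the upper rows, where the vertical extrapolation matches the original, and contains a sharp two-sided edge in the lower rows, where it does not.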