At present, the majority of standardized video coding algorithms are based on hybrid video coding. Hybrid video coding methods typically combine several different lossless and lossy compression schemes in order to achieve the desired compression gain. Hybrid video coding is also the basis for ITU-T standards (H.26x standards such as H.261 and H.263) as well as ISO/IEC standards (MPEG-X standards such as MPEG-1, MPEG-2, and MPEG-4). The most recent and advanced video coding standard is currently the standard denoted as H.264/MPEG-4 Advanced Video Coding (AVC), which is the result of standardization efforts by the Joint Video Team (JVT), a joint team of the ITU-T and ISO/IEC MPEG groups.
A video signal input to an encoder is a sequence of images called frames, each frame being a two-dimensional matrix of pixels. All the above-mentioned standards based on hybrid video coding include subdividing each individual video frame into smaller blocks consisting of a plurality of pixels. Typically, a macroblock (usually denoting a block of 16×16 pixels) is the basic image element, for which the coding is performed. However, various particular coding steps may be performed for smaller image elements, denoted sub-macroblocks or simply blocks and having the size of, for instance, 8×8, 4×4, 16×8, etc.
Typically, the coding steps of hybrid video coding include a spatial and/or a temporal prediction. Accordingly, each block to be coded is first predicted using either the blocks in its spatial neighborhood or blocks from its temporal neighborhood, i.e., from previously coded video frames. A block of differences between the block to be coded and the result of the prediction, also called a block of prediction residuals, is then calculated. Another coding step is a transformation of the block of residuals from the spatial (pixel) domain into a frequency domain. The transformation aims at reducing the redundancy of the input block. A further coding step is quantization of the transform coefficients. In this step, the actual lossy (irreversible) compression takes place. Usually, the quantized transform coefficient values are further compacted (losslessly compressed) by means of entropy coding. In addition, side information necessary for reconstruction of the coded video signal is coded and provided together with the coded video signal. This is, for example, information about the spatial and/or temporal prediction, the amount of quantization, etc.
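The order of these steps can be illustrated with a minimal sketch. It assumes a uniform quantizer with a hypothetical step size `step`, and it omits the transform and the entropy coding for brevity; the function names `encode_block` and `decode_block` are illustrative and do not come from any standard.

```python
import math


def encode_block(block, prediction, step):
    """Form the prediction residual and quantize it (the lossy step)."""
    levels = []
    for block_row, pred_row in zip(block, prediction):
        residual_row = [b - p for b, p in zip(block_row, pred_row)]
        # Round to the nearest multiple of the step size.
        levels.append([math.floor(r / step + 0.5) for r in residual_row])
    return levels


def decode_block(levels, prediction, step):
    """Rescale the quantized residual and add it back to the prediction."""
    return [
        [p + lv * step for p, lv in zip(pred_row, level_row)]
        for pred_row, level_row in zip(prediction, levels)
    ]
```

With such a uniform quantizer, the reconstruction error of each pixel is bounded by half the step size, which is the sense in which the quantization step is irreversible.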
FIG. 1 is a block diagram illustrating an example of a typical H.264/MPEG-4 AVC standard compliant image coding apparatus (encoder) 100. The H.264/MPEG-4 AVC standard combines all above-mentioned coding steps. A subtractor 105 first determines differences between a current block to be coded of an input video image (input signal) and a corresponding prediction block (prediction signal), which is used for the prediction of the current block to be coded. In H.264/MPEG-4 AVC, the prediction signal is obtained either by a temporal or by a spatial prediction. The type of prediction can be varied on a per frame basis or on a per macroblock basis. Macroblocks predicted using temporal prediction (inter prediction) are called inter-coded macroblocks, and macroblocks predicted using spatial prediction (intra prediction) are called intra-coded macroblocks. The type of prediction for a video frame can be set by the user or selected by the video encoder so as to achieve the highest possible compression gain. In accordance with the selected type of prediction, an intra/inter switch 175 provides the corresponding prediction signal to the subtractor 105.
The prediction signal using temporal prediction is derived from the previously coded images which are stored in a memory 140. The prediction signal obtained by an intra prediction unit 170 using spatial prediction is derived from the values of boundary pixels in the neighboring blocks, which have been previously coded, decoded, and stored in the memory 140. The memory 140 thus operates as a delay unit that allows a comparison between current signal values to be coded and the prediction signal values generated from previous signal values. The memory 140 can store a plurality of previously coded video frames. A transform quantization unit 110 transforms and quantizes the difference between the input signal and the prediction signal, denoted prediction error signal or residual, resulting in quantized coefficients of frequency components. An entropy coding unit 190 entropy-codes the quantized coefficients in order to further reduce the amount of data in a lossless way. This is mainly achieved by applying a code with code words of variable length, wherein the length of a code word is chosen based on the probability of occurrence thereof.
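As an illustration of a code with code words of variable length, the following sketch implements an unsigned Exp-Golomb code, a code family that H.264/MPEG-4 AVC uses for many syntax elements. Small values, which are assumed here to be the more probable ones, receive shorter code words. The text above does not state which code the entropy coding unit 190 applies, so this is only one possible example, and the function names are illustrative.

```python
def exp_golomb_encode(value):
    """Return the unsigned Exp-Golomb code word of a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]  # binary representation of value + 1
    # Prefix of (length - 1) zeros tells the decoder how many bits follow.
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits):
    """Decode one code word; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]
```

Because no code word is a prefix of another, a concatenated bit string can be decoded one code word at a time without separators.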
Intra-coded images (called also I-pictures, I-type images or I-frames) consist solely of macroblocks that are intra-coded. In other words, intra-coded images can be decoded without reference to any other previously decoded image. The intra-coded images provide error resilience for the coded video sequence since they refresh the video sequence from errors possibly propagated from frame to frame due to temporal prediction. Moreover, I-frames enable a random access within the sequence of coded video images. Intra-frame prediction uses a predefined set of intra-prediction modes, which basically predict the current block using the boundary pixels of the neighboring blocks already coded. The different modes of spatial intra-prediction refer to different directions of the applied two-dimensional prediction. This allows efficient spatial intra-prediction in the case of various edge directions. The prediction signal obtained by such an intra-prediction is then subtracted from the input signal by the subtractor 105 as described above. In addition, spatial intra-prediction mode information is entropy coded and provided together with the coded video signal.
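The idea of direction-dependent intra-prediction modes can be sketched for a 4×4 block as follows. Only three modes (vertical, horizontal, and DC) are shown, and the mode is selected by the sum of absolute differences (SAD); both the reduced mode set and the SAD criterion are assumptions made for this illustration, and the function names are hypothetical.

```python
def predict_4x4(top, left, mode):
    """Predict a 4x4 block from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column to the right
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of all 8 boundary pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unknown mode: " + mode)


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, top, left):
    """Return the (mode, prediction) pair with the smallest SAD."""
    candidates = [(m, predict_4x4(top, left, m))
                  for m in ("vertical", "horizontal", "dc")]
    return min(candidates, key=lambda mp: sad(block, mp[1]))
```

A block with vertical stripes is predicted exactly by the vertical mode, so only the mode information and an all-zero residual remain to be coded.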
Within the image coding apparatus 100, a decoding unit is incorporated for obtaining a decoded video signal. In compliance with the coding steps, the image coding apparatus 100 includes an inverse quantization/inverse transformation unit 120 that performs the decoding steps. The inverse quantization/inverse transformation unit 120 inverse-quantizes and inverse-transforms the quantized coefficients to generate a quantized prediction error signal. The quantized prediction error signal differs from the original prediction error signal due to the quantization error, also called quantization noise. An adder 125 adds the quantized prediction error signal to the prediction signal to generate a reconstructed signal. In order to maintain compatibility between the encoder (image coding apparatus 100) side and the decoder (image decoding apparatus) side, the prediction signal is obtained based on the coded and subsequently decoded video signal, which is known at both the encoder and the decoder. Due to the quantization, quantization noise is superimposed on the reconstructed video signal. Due to the block-wise coding, the superimposed noise often has blocking characteristics, which result, in particular for strong quantization, in visible block boundaries in the decoded image indicated by the reconstructed signal. Such blocking artifacts have a negative effect upon human visual perception.
In order to reduce these artifacts, a deblocking filter unit 130 applies a deblocking filter to every decoded image block indicated by the reconstructed signal. The deblocking filter is applied to the reconstructed signal, which is the sum of the prediction signal and the quantized prediction error signal. The reconstructed signal after deblocking is the decoded signal, which is generally displayed at the decoder side (if no post filtering is applied). The deblocking filter of H.264/MPEG-4 AVC has the capability of local adaptation. In the case of a high degree of blocking noise, a strong (narrow-band) low pass filter is applied, whereas for a low degree of blocking noise, a weaker (broad-band) low pass filter is applied. The strength of the low pass filter is determined by the prediction signal and by the quantized prediction error signal. The deblocking filter generally smooths the block edges, leading to an improved subjective quality of the decoded images. Moreover, since the filtered part of an image is used for the motion compensated prediction of further images, the filtering also reduces the prediction errors and thus enables an improvement in coding efficiency.
Intra-coded macroblocks are filtered before displaying, but intra prediction is carried out using the unfiltered macroblocks indicated by the reconstructed signal.
In order to be decoded, inter-coded images require also the previously coded and subsequently decoded image(s). Temporal prediction may be performed uni-directionally, i.e., using only video frames ordered in time before the current frame to be coded, or bi-directionally, i.e., using also video frames preceding and following the current frame. Uni-directional temporal prediction results in inter-coded images called P-frames (P-pictures); bi-directional temporal prediction results in inter-coded images called B-frames (B-pictures). In general, an inter-coded image may comprise any of P-, B-, or even I-type macroblocks.
A motion compensated prediction unit 160 predicts an inter-coded macroblock (P- or B-macroblock). First, a motion estimation unit 165 estimates a best-matching block for the current block within the previously coded and decoded video frames. The best-matching block then becomes the prediction signal, and the relative displacement (motion) between the current block and the best-matching block is signaled as motion data in the form of three-dimensional motion vectors within the side information provided together with the coded video signal. The three dimensions consist of two spatial dimensions and one temporal dimension. In order to optimize the prediction accuracy, motion vectors may be determined with a spatial sub-pixel resolution, e.g., half pixel or quarter pixel resolution. A motion vector with spatial sub-pixel resolution may point to a spatial position within an already decoded frame where no real pixel value is available, i.e., a sub-pixel position. Hence, spatial interpolation of such pixel values is needed in order to perform motion compensated prediction. The spatial interpolation is achieved by an interpolation filter unit 150. According to the H.264/MPEG-4 AVC standard, a six-tap Wiener interpolation filter with fixed filter coefficients and a bilinear filter are applied in order to obtain pixel values for sub-pixel positions in the vertical and horizontal directions separately.
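Both operations described above can be sketched briefly. The first function performs an exhaustive block-matching search in a single reference frame using the sum of absolute differences; the search strategy, the cost measure, and the restriction to one reference frame (so that the temporal component of the motion vector is omitted) are assumptions of this illustration. The second and third functions interpolate along one row: the half-pixel value uses the six-tap kernel (1, −5, 20, 20, −5, 1)/32 known from H.264/MPEG-4 AVC luma interpolation, and the quarter-pixel value is a rounded bilinear average. Edge handling by clamping the index is a simplification, and all function names are hypothetical.

```python
def block_sad(ref, x, y, block):
    """SAD between `block` and the same-sized area of `ref` at column x, row y."""
    return sum(abs(ref[y + r][x + c] - block[r][c])
               for r in range(len(block)) for c in range(len(block[0])))


def full_search(ref, block, x0, y0, search_range):
    """Return (cost, dx, dy) of the best integer-pixel match around (x0, y0)."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x <= len(ref[0]) - w and 0 <= y <= len(ref) - h:
                cost = block_sad(ref, x, y, block)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def half_pel(row, i):
    """Half-pixel value between row[i] and row[i + 1] using the six-tap kernel."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[min(max(i - 2 + k, 0), len(row) - 1)]
              for k, t in enumerate(taps))
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip to 8 bits


def quarter_pel(a, b):
    """Quarter-pixel value as the rounded average of two neighboring values."""
    return (a + b + 1) >> 1
```

The displacement returned by `full_search` corresponds to the two spatial components of the motion vector; a fractional refinement would evaluate candidate positions whose pixel values are produced by `half_pel` and `quarter_pel`.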
For both the intra- and the inter-coding modes, the transform quantization unit 110 transforms and quantizes the differences between the current input signal and the prediction signal, resulting in the quantized coefficients. Generally, an orthogonal transformation such as a two-dimensional discrete cosine transformation (DCT) or an integer version thereof is employed since it reduces the correlation of natural video images efficiently. After the transformation, lower frequency components are usually more important for image quality than high frequency components, so that more bits can be spent for coding the low frequency components than the high frequency components. The entropy coding unit 190 converts the two-dimensional matrix of quantized coefficients into a one-dimensional array. Typically, this conversion is performed by a so-called zig-zag scanning, which starts with the DC-coefficient in the upper left corner of the two-dimensional array and scans the two-dimensional array in a predetermined sequence ending with an AC coefficient in the lower right corner. As the energy is typically concentrated in the upper left part of the two-dimensional matrix of coefficients, corresponding to the lower frequencies, the zig-zag scanning results in an array where usually the last values are zero. This allows for efficient coding using run-length codes as a part of, or before, the actual entropy coding.
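The zig-zag scanning and the subsequent run-length representation can be sketched as follows. The scan order is generated for a square N×N matrix by walking the anti-diagonals in alternating directions; the (run, level) pairs are a generic run-length representation chosen for illustration and are not the exact entropy coding syntax of the standard. The function names are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, column) positions of an n x n matrix in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked upwards (bottom-left to top-right),
        # odd diagonals downwards (top-right to bottom-left).
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag_scan(matrix):
    """Convert a square coefficient matrix into a one-dimensional list."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]


def run_level(coefficients):
    """Encode as (number of preceding zeros, nonzero value); trailing zeros are dropped."""
    pairs, run = [], 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

For a 4×4 matrix whose nonzero coefficients are clustered in the upper left corner, the scanned array ends in a long run of zeros that the run-length representation omits entirely.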
H.264/MPEG-4 AVC employs scalar quantization, which can be controlled by a quantization parameter (QP) and a customizable quantization matrix (QM). One of 52 quantizers is selected for each macroblock by the quantization parameter. In addition, the quantization matrix is specifically designed to preserve certain frequencies in the source in order to avoid losing image quality. The quantization matrix in H.264/MPEG-4 AVC can be adapted to the video sequence and signaled together with the coded video signal.
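The relation between the quantization parameter and the quantizer can be illustrated with the nominal step sizes commonly cited for H.264/MPEG-4 AVC, in which the step size doubles for every increase of QP by 6. The floating-point arithmetic below is a simplification, since the standard itself specifies integer scaling; likewise, treating a quantization matrix entry `weight` as a factor relative to a neutral value of 16 is an assumption of this sketch, and the function names are hypothetical.

```python
# Nominal step sizes for QP = 0..5; they repeat, doubled, every 6 QP values.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Nominal quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(coefficient, qp, weight=16):
    """Scalar quantization; a weight above 16 quantizes this frequency more coarsely."""
    step = qstep(qp) * weight / 16
    sign = -1 if coefficient < 0 else 1
    return sign * int(abs(coefficient) / step + 0.5)


def dequantize(level, qp, weight=16):
    """Inverse quantization: scale the level back by the effective step size."""
    return level * qstep(qp) * weight / 16
```

Under these assumptions, a coefficient of 100 at QP 28 (step size 16) is coded as level 6 and reconstructed as 96, whereas a matrix weight of 32 for the same coefficient halves the precision and yields level 3.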
H.264/MPEG-4 AVC includes two functional layers, a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL provides the coding functionality as briefly described above. The NAL encapsulates the coded prediction error signal together with the side information necessary for the decoding of video into standardized units called NAL units according to their further application, such as transmission over a channel or storing in storage. There are VCL NAL units containing the compressed video data and the related information, as well as non-VCL units encapsulating additional data such as a parameter set relating to an entire video sequence, or the recently added Supplemental Enhancement Information (SEI) providing additional information that can be used to improve the decoding performance.
In order to improve the image quality, a so-called post filter unit 280 (see FIG. 2) may be applied at the decoder (image decoding apparatus 200). The H.264/MPEG-4 AVC standard allows the sending of post filter information for such a post filter via the Supplemental Enhancement Information (SEI) message. The post filter information (a so-called post filter hint) is determined at the encoder side by means of a post filter design unit 180, which compares the locally decoded signal and the original input signal. In general, the post filter information is information allowing a decoder to set up an appropriate filter. The post filter information may include, for instance, the filter coefficients directly. However, it may also include other information enabling the filter to be set up, such as cross-correlation information related to the uncompressed signal, for instance cross-correlation information between the original input image and the decoded image or between the decoded image and the quantization noise. This cross-correlation information can be used to calculate the filter coefficients. The post filter information, which is output by the post filter design unit 180, is also fed to the entropy coding unit 190 in order to be coded and inserted into the coded video signal. At the decoder, the post filter unit 280 may use the post filter information to set up a post filter, which is applied to the decoded signal before displaying.
FIG. 2 is a block diagram illustrating the example image decoding apparatus (decoder) 200 compliant with the H.264/MPEG-4 AVC video coding standard. The coded video signal (input signal to the decoder) first passes to an entropy decoding unit 290, which decodes the quantized coefficients, the information elements necessary for decoding such as motion data, mode of prediction, etc., and the post filter information. The quantized coefficients are inversely scanned in order to obtain a two-dimensional matrix, which is then fed to an inverse quantization and inverse transformation unit 220. After inverse quantization and inverse transformation, a decoded (quantized) prediction error signal is obtained, which corresponds to the differences obtained by subtracting the prediction signal from the signal input to the encoder in the case where no quantization noise is introduced.
The prediction signal is obtained from either a motion compensated prediction unit 260 or an intra prediction unit 270. An intra/inter switch 275 switches between the prediction signals output to the adder 225 in accordance with a received information element indicating the type of prediction applied at the encoder. The decoded information elements further include the information necessary for the prediction, such as the prediction type in the case of intra-prediction and motion data in the case of motion compensated prediction. Depending on the current value of the motion vector, interpolation of pixel values may be needed in order to perform the motion compensated prediction. This interpolation is performed by an interpolation filter unit 250.
The quantized prediction error signal in the spatial domain is then added by means of the adder 225 to the prediction signal obtained either from the motion compensated prediction unit 260 or the intra prediction unit 270. The resulting reconstructed signal may be passed through a deblocking filter 230, and the resulting decoded signal obtained by the deblocking filter 230 is stored in the memory 240 to be applied for temporal or spatial prediction of the following blocks.
The post filter information is fed to the post filter unit 280 which sets up a post filter accordingly. The post filter unit 280 applies a post filter to the decoded signal in order to further improve the image quality. Thus, the post filter unit 280 is capable of adapting to the properties of the video signal entering the encoder.
In summary, there are three types of filters used in the latest standard H.264/MPEG-4 AVC: an interpolation filter, a deblocking filter, and a post filter. In general, the suitability of a filter depends on the image to be filtered. Therefore, a filter design capable of adapting to the image characteristics is advantageous. The coefficients of such a filter may be designed as Wiener filter coefficients.
FIG. 3 illustrates a signal flow using a Wiener filter 300 for noise reduction. Noise n is added to an input signal s, resulting in a noisy signal s′ to be filtered. With the goal of reducing the noise n, a Wiener filter is applied to the signal s′, resulting in the filtered signal s″. The Wiener filter 300 is designed to minimize the mean squared error between the input signal s, which is the desired signal, and the filtered signal s″. This means that the Wiener filter coefficients w correspond to the solution of the optimization problem $w = \arg\min_{w} E[(s - s'')^2]$, which can be formulated as a system of linear equations called the Wiener-Hopf equations, the operator E[x] indicating the expected value of x. The solution is given by $w = R^{-1} \cdot p$.
Here, w is an M×1 vector containing the optimal coefficients of a Wiener filter having order M, M being a positive integer. $R^{-1}$ denotes the inverse of the M×M autocorrelation matrix R of the noisy signal s′ to be filtered, and p denotes the M×1 cross correlation vector between the noisy signal s′ to be filtered and the original signal s. See NPL 1 for further details on adaptive filter design.
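The step from the optimization problem to the linear system can be written out as follows. The vector x, which collects the M most recent samples of the noisy signal s′, and the sample index k are notation introduced only for this derivation; the filter is assumed to be a linear filter with M taps.

```latex
\begin{align}
s'' &= w^{T} x, \qquad x = [\, s'(k),\; s'(k-1),\; \dots,\; s'(k-M+1) \,]^{T} \\
J(w) &= E[(s - w^{T} x)^2] = E[s^2] - 2\, w^{T} p + w^{T} R\, w \\
R &= E[x\, x^{T}], \qquad p = E[x\, s] \\
\nabla_{w} J &= -2\, p + 2\, R\, w = 0
  \;\Longrightarrow\; R\, w = p
  \;\Longrightarrow\; w = R^{-1} \cdot p
\end{align}
```

Setting the gradient of the mean squared error J(w) to zero thus yields the Wiener-Hopf equations R·w = p, whose solution is the expression given above, provided that R is invertible.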
Thus, one of the advantages of the Wiener filter 300 is that the filter coefficients can be determined from the autocorrelation of the corrupted (noisy) signal and the cross correlation of the corrupted signal and the desired signal. In video coding, quantization noise is superimposed on the original (input) video signal in the quantization step. Wiener filtering in the context of video coding aims at reducing the superimposed quantization noise in order to minimize the mean squared error between the filtered decoded signal and the original video signal.
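A minimal one-dimensional sketch of this computation is given below. It assumes that the expected values are replaced by sample averages over the available signal, and it solves the resulting linear system by Gaussian elimination; the function names and the test signal in the usage note are illustrative only.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wiener_coefficients(noisy, desired, order):
    """Estimate w = R^-1 * p from the noisy signal and the desired signal."""
    # Each window holds noisy[k], noisy[k-1], ..., noisy[k-order+1].
    windows = [noisy[k - order + 1:k + 1][::-1]
               for k in range(order - 1, len(noisy))]
    targets = desired[order - 1:]
    count = len(windows)
    r_matrix = [[sum(x[i] * x[j] for x in windows) / count
                 for j in range(order)] for i in range(order)]
    p_vector = [sum(x[i] * d for x, d in zip(windows, targets)) / count
                for i in range(order)]
    return solve_linear(r_matrix, p_vector)


def apply_filter(noisy, w):
    """Filter the noisy signal; the output starts at sample index len(w) - 1."""
    order = len(w)
    return [sum(w[j] * noisy[k - j] for j in range(order))
            for k in range(order - 1, len(noisy))]
```

For example, with a sinusoid as the desired signal and additive Gaussian noise, `apply_filter(noisy, wiener_coefficients(noisy, clean, 5))` yields a mean squared error with respect to `clean[4:]` that is no larger than that of `noisy[4:]`, because the unfiltered signal corresponds to one particular choice of coefficients. Note that `wiener_coefficients` requires the desired signal, which in video coding is available only at the encoder.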
In general, a two dimensional filter may be separable or non-separable. A filter is said to be separable if it can be separated into two one-dimensional component filters: a vertical component filter and a horizontal component filter. A significant reduction in computational complexity can be obtained for filtering if the filter can be applied as the convolution of one one-dimensional filter in the horizontal direction and one one-dimensional filter in the vertical direction instead of performing the two-dimensional convolution with the corresponding two-dimensional filter.
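The equivalence behind this complexity reduction can be sketched as follows. The example applies the filters as sliding-window sums over the valid image region without flipping the kernel, which is a simplification that coincides with convolution for symmetric kernels; the function names are hypothetical.

```python
def filter_2d(image, kernel):
    """Apply a 2-D kernel as a sliding-window sum over the valid region."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(kernel[i][j] * image[r + i][c + j]
             for i in range(kh) for j in range(kw))
         for c in range(len(image[0]) - kw + 1)]
        for r in range(len(image) - kh + 1)
    ]


def filter_separable(image, vertical, horizontal):
    """Apply a horizontal 1-D filter, then a vertical 1-D filter."""
    rows = filter_2d(image, [list(horizontal)])       # 1 x kw kernel
    return filter_2d(rows, [[v] for v in vertical])   # kh x 1 kernel


def outer_product(vertical, horizontal):
    """Build the 2-D kernel that the two 1-D component filters represent."""
    return [[v * h for h in horizontal] for v in vertical]
```

For a 3×3 kernel, the two-dimensional filtering needs nine multiplications per output pixel, whereas the two one-dimensional passes need three each; both produce the same result whenever the two-dimensional kernel equals the outer product of the component filters.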
Filter data (filter information) that is transmitted from the encoder to the decoder can either be the calculated filter coefficients directly or the cross correlation p, which is necessary for calculating the Wiener filter and which cannot be determined at the decoder. Transmitting such side information may improve the quality of filtering but, on the other hand, requires additional bandwidth. Similarly, further improving the filter (for instance, by increasing its order, or by determining and/or applying it separately to parts of the video signal) may improve the video quality even further.