1. Field of the Invention
The present invention relates to the filtering of images. In particular, the present invention relates to deblocking filtering and to the derivation of decision criteria for deblocking filtering.
2. Description of the Related Art
At present, the majority of standardized video coding algorithms are based on hybrid video coding. Hybrid video coding methods typically combine several different lossless and lossy compression schemes in order to achieve the desired compression gain. Hybrid video coding is also the basis for ITU-T standards (H.26x standards such as H.261, H.263) as well as ISO/IEC standards (MPEG-X standards such as MPEG-1, MPEG-2, and MPEG-4). The most recent and advanced video coding standard is currently the standard denoted as H.264/MPEG-4 advanced video coding (AVC), which is the result of standardization efforts by the Joint Video Team (JVT), a joint team of the ITU-T and ISO/IEC MPEG groups. This codec is being further developed by the Joint Collaborative Team on Video Coding (JCT-VC) under the name High-Efficiency Video Coding (HEVC), aiming in particular at improving efficiency for high-resolution video coding.
A video signal input to an encoder is a sequence of images called frames, each frame being a two-dimensional matrix of pixels. All the above-mentioned standards based on hybrid video coding include subdividing each individual video frame into smaller blocks consisting of a plurality of pixels. The size of the blocks may vary, for instance, in accordance with the content of the image. The manner of coding may typically be varied on a per-block basis. The largest possible size for such a block, for instance in HEVC, is 64×64 pixels; such a block is called the largest coding unit (LCU). In H.264/MPEG-4 AVC, a macroblock (usually denoting a block of 16×16 pixels) was the basic image element for which the encoding is performed, with the possibility to further divide it into smaller subblocks to which some of the coding/decoding steps were applied.
Typically, the encoding steps of hybrid video coding include a spatial and/or a temporal prediction. Accordingly, each block to be encoded is first predicted using either the blocks in its spatial neighborhood or blocks from its temporal neighborhood, i.e. from previously encoded video frames. A block of differences between the block to be encoded and its prediction, also called a block of prediction residuals, is then calculated. Another encoding step is a transformation of a block of residuals from the spatial (pixel) domain into a frequency domain. The transformation aims at reducing the correlation of the input block. A further encoding step is quantization of the transform coefficients. In this step the actual lossy (irreversible) compression takes place. Usually, the compressed transform coefficient values are further compacted (losslessly compressed) by means of entropy coding. In addition, side information necessary for reconstruction of the encoded video signal is encoded and provided together with the encoded video signal. This is, for example, information about the spatial and/or temporal prediction, the amount of quantization, etc.
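As a minimal illustration of the residual and quantization steps, the following Python sketch computes a block of prediction residuals and applies a uniform scalar quantizer that rounds half away from zero. It is a simplification under stated assumptions: the transform step is skipped (the residuals are quantized directly), and the step size `qstep` and all function names are hypothetical rather than taken from any standard.

```python
def residual_block(block, prediction):
    """Block of prediction residuals: element-wise block minus prediction."""
    return [[s - p for s, p in zip(row_s, row_p)]
            for row_s, row_p in zip(block, prediction)]


def quantize(values, qstep):
    """Uniform scalar quantizer, rounding half away from zero (the lossy step)."""
    return [[(1 if v >= 0 else -1) * ((abs(v) + qstep // 2) // qstep) for v in row]
            for row in values]


def dequantize(levels, qstep):
    """Approximate reconstruction of the values from the quantization levels."""
    return [[level * qstep for level in row] for row in levels]


block = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
e = residual_block(block, prediction)   # [[2, 5], [1, -1]]
levels = quantize(e, qstep=4)           # [[1, 1], [0, 0]]
e_rec = dequantize(levels, qstep=4)     # [[4, 4], [0, 0]]
```

The difference between `e` and `e_rec` is the quantization error. It cannot be recovered at the decoder, which is why this step is where the irreversible compression happens.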
FIG. 1 is an example of a typical H.264/MPEG-4 AVC and/or HEVC video encoder 100. A subtractor 105 first determines differences e between a current block to be encoded of an input video image (input signal s) and a corresponding prediction block ŝ, which is used as a prediction of the current block to be encoded. The prediction signal may be obtained by a temporal or by a spatial prediction 180. The type of prediction can be varied on a per-frame basis or on a per-block basis. Blocks and/or frames predicted using temporal prediction are called “inter”-encoded and blocks and/or frames predicted using spatial prediction are called “intra”-encoded. The prediction signal using temporal prediction is derived from the previously encoded images, which are stored in a memory. The prediction signal using spatial prediction is derived from the values of boundary pixels in the neighboring blocks, which have been previously encoded, decoded, and stored in the memory. The difference e between the input signal and the prediction signal, denoted prediction error or residual, is transformed 110, resulting in coefficients, which are quantized 120. An entropy encoder 190 is then applied to the quantized coefficients in order to further reduce, in a lossless way, the amount of data to be stored and/or transmitted. This is mainly achieved by applying a code with code words of variable length, wherein the length of a code word is chosen based on the probability of its occurrence.
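The variable-length principle can be illustrated with unsigned Exp-Golomb codes (ue(v)), which H.264/MPEG-4 AVC and HEVC use for many header syntax elements; the coefficient data itself is coded with CAVLC or CABAC, which this sketch does not attempt to model. Small values, assumed here to be the more probable ones, receive shorter codewords. The function names are illustrative.

```python
def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(info) - 1) + info    # leading-zero prefix, then the info bits


def ue_decode(bits: str, pos: int = 0):
    """Decode one ue(v) codeword starting at bits[pos]; return (value, next_pos)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    start = pos + leading_zeros
    end = start + leading_zeros + 1
    return int(bits[start:end], 2) - 1, end


stream = "".join(ue_encode(v) for v in [0, 3, 1])   # "1" + "00100" + "010"
```

Because no codeword is a prefix of another, the decoder can split the concatenated stream without separators.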
Within the video encoder 100, a decoding unit is incorporated for obtaining a decoded (reconstructed) video signal s′. In compliance with the encoding steps, the decoding steps include dequantization and inverse transformation 130. The prediction error signal e′ so obtained differs from the original prediction error signal due to the quantization error, also called quantization noise. A reconstructed image signal s′ is then obtained by adding 140 the decoded prediction error signal e′ to the prediction signal ŝ. In order to maintain compatibility between the encoder side and the decoder side, the prediction signal ŝ is obtained based on the encoded and subsequently decoded video signal, which is known at both the encoder and the decoder.
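The reconstruction step s′ = ŝ + e′ can be sketched as follows. Clipping the sum to the valid sample range of a given bit depth (8 bits by default here) is an assumption of this sketch that mirrors common codec practice; the function name is illustrative.

```python
def reconstruct(prediction, decoded_residual, bit_depth=8):
    """s' = s_hat + e', element-wise, clipped to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + e, 0), max_val) for p, e in zip(row_p, row_e)]
            for row_p, row_e in zip(prediction, decoded_residual)]


s_hat = [[50, 50], [60, 60]]
e_prime = [[4, 4], [0, 0]]          # decoded (quantized) prediction error
s_prime = reconstruct(s_hat, e_prime)
```

Since the encoder computes s′ from the same e′ that the decoder receives, both sides hold identical reference data for the prediction of subsequent blocks.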
Due to the quantization, quantization noise is superimposed on the reconstructed video signal. Due to the block-wise coding, the superimposed noise often has blocking characteristics, which result, in particular for strong quantization, in visible block boundaries in the decoded image. Such blocking artifacts have a negative effect upon human visual perception. In order to reduce these artifacts, a deblocking filter 150 is applied to every reconstructed image block, i.e. to the reconstructed signal s′. For instance, the deblocking filter of H.264/MPEG-4 AVC has the capability of local adaptation. In the case of a high degree of blocking noise, a strong (narrow-band) low pass filter is applied, whereas for a low degree of blocking noise, a weaker (broad-band) low pass filter is applied. The strength of the low pass filter is determined by the prediction signal ŝ and by the quantized prediction error signal e′. The deblocking filter generally smooths the block edges, leading to an improved subjective quality of the decoded images. Moreover, since the filtered part of an image is used for the motion compensated prediction of further images, the filtering also reduces the prediction errors and thus improves coding efficiency.
After the deblocking filter, a sample adaptive offset 155 and/or an adaptive loop filter 160 may be applied to the image including the already deblocked signal s″. Whereas the deblocking filter improves the subjective quality, sample adaptive offset (SAO) and ALF aim at improving the pixel-wise fidelity (“objective” quality). In particular, SAO adds an offset in accordance with the immediate neighborhood of a pixel. The adaptive loop filter (ALF) is used to compensate for image distortion caused by the compression. Typically, the adaptive loop filter is a Wiener filter with filter coefficients determined such that the mean square error (MSE) between the reconstructed image s′ and the source image s is minimized. The coefficients of the ALF may be calculated and transmitted on a frame basis. ALF can be applied to the entire frame (image of the video sequence) or to local areas (blocks). Additional side information indicating which areas are to be filtered may be transmitted (block-based, frame-based or quadtree-based).
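To illustrate how SAO adds an offset according to the immediate neighborhood of a pixel, the sketch below follows the edge-offset classification of HEVC along one direction: each sample is compared with its two neighbors and assigned to a local minimum, one of two corner categories, a local maximum, or no category. The offset values are hypothetical (in practice the encoder chooses and signals them), and the final clipping to the sample range is omitted.

```python
def sao_edge_category(left, c, right):
    """Edge-offset category of sample c given its two neighbors along one direction."""
    if c < left and c < right:
        return 1  # local minimum
    if (c < left and c == right) or (c == left and c < right):
        return 2  # concave corner
    if (c > left and c == right) or (c == left and c > right):
        return 3  # convex corner
    if c > left and c > right:
        return 4  # local maximum
    return 0      # flat or monotonic: no offset


def apply_edge_offset(row, offsets):
    """Add the offset of each sample's category; the two end samples stay unchanged."""
    out = list(row)
    for i in range(1, len(row) - 1):
        category = sao_edge_category(row[i - 1], row[i], row[i + 1])
        out[i] = row[i] + offsets.get(category, 0)
    return out


offsets = {1: 2, 2: 1, 3: -1, 4: -2}   # hypothetical encoder-chosen offsets
filtered = apply_edge_offset([10, 5, 10, 10, 15, 10], offsets)
```

Positive offsets for minima and negative offsets for maxima pull isolated peaks and dips toward their neighbors, which is the fidelity-oriented smoothing SAO aims at.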
In order to be decoded, inter-encoded blocks also require the previously encoded and subsequently decoded portions of image(s) to be stored in the reference frame buffer 170. An inter-encoded block is predicted 180 by employing motion compensated prediction. First, a best-matching block is found for the current block within the previously encoded and decoded video frames by a motion estimator. The best-matching block then becomes a prediction signal, and the relative displacement (motion) between the current block and its best match is signaled as motion data in the form of three-dimensional motion vectors within the side information provided together with the encoded video data. The three dimensions consist of two spatial dimensions and one temporal dimension. In order to optimize the prediction accuracy, motion vectors may be determined with a spatial sub-pixel resolution, e.g. half-pixel or quarter-pixel resolution. A motion vector with spatial sub-pixel resolution may point to a spatial position within an already decoded frame where no real pixel value is available, i.e. a sub-pixel position. Hence, spatial interpolation of such pixel values is needed in order to perform motion compensated prediction. This may be achieved by an interpolation filter (in FIG. 1 integrated within prediction block 180).
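As an example of such an interpolation filter, H.264/MPEG-4 AVC derives luma half-sample values with the 6-tap filter (1, -5, 20, 20, -5, 1), followed by adding 16, a right shift by 5 and clipping to the 8-bit range. The sketch below applies that filter along one row of integer samples; border handling (padding) is omitted, so the caller must keep the index away from the row ends. The function name is illustrative.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # H.264/AVC luma half-sample filter taps


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1], for 8-bit samples."""
    if i < 2 or i + 3 >= len(row):
        raise IndexError("need two samples left of row[i] and three to its right")
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return min(max((acc + 16) >> 5, 0), 255)   # round, divide by 32, clip
```

For a linear ramp such as `[0, 10, 20, 30, 40, 50, 60, 70]`, `half_pel(row, 3)` lies midway between 30 and 40, while near sharp transitions the negative taps can overshoot, which is why the final clipping is needed.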
For both the intra- and the inter-encoding modes, the differences e between the current input signal and the prediction signal are transformed 110 and quantized 120, resulting in the quantized coefficients. Generally, an orthogonal transformation such as a two-dimensional discrete cosine transformation (DCT) or an integer version thereof is employed, since it reduces the correlation of natural video images efficiently. After the transformation, lower frequency components are usually more important for image quality than high frequency components, so that more bits can be spent for coding the low frequency components than the high frequency components. In the entropy coder, the two-dimensional matrix of quantized coefficients is converted into a one-dimensional array. Typically, this conversion is performed by a so-called zig-zag scanning, which starts with the DC coefficient in the upper left corner of the two-dimensional array and scans the two-dimensional array in a predetermined sequence ending with an AC coefficient in the lower right corner. As the energy is typically concentrated in the upper left part of the two-dimensional matrix of coefficients, corresponding to the lower frequencies, the zig-zag scanning results in an array where usually the last values are zero. This allows for efficient encoding using run-length codes as part of, or before, the actual entropy coding.
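The zig-zag scan and the subsequent run-length representation can be sketched as follows. The scan order generated here for a 4×4 block coincides with the H.264/MPEG-4 AVC frame zig-zag scan; the (run, level) pairing is a generic illustration, not the exact syntax of any standard's entropy coder, and the function names are hypothetical.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order, starting at DC."""
    order = []
    for d in range(2 * n - 1):                      # d = row + col (anti-diagonal)
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)                   # even diagonals: bottom-left to top-right
        order.extend((r, d - r) for r in rows)
    return order


def run_level_pairs(scanned):
    """(zero run, non-zero level) pairs; trailing zeros are dropped (end of block)."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


coeffs = [[9, 3, 0, 0],
          [2, 0, 1, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
scanned = [coeffs[r][c] for r, c in zigzag_order(4)]
pairs = run_level_pairs(scanned)   # [(0, 9), (0, 3), (0, 2), (4, 1)]
```

The eight trailing zeros of the scanned array cost nothing in the pair list, which is the gain the zig-zag ordering is designed to produce. The decoder applies the same order in reverse (inverse scanning) to rebuild the two-dimensional matrix.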
H.264/MPEG-4 AVC as well as HEVC includes two functional layers, a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL provides the encoding functionality as briefly described above. The NAL encapsulates information elements into standardized units called NAL units according to their further application, such as transmission over a channel or storing in storage. The information elements are, for instance, the encoded prediction error signal or other information necessary for the decoding of the video signal, such as type of prediction, quantization parameter, motion vectors, etc. There are VCL NAL units containing the compressed video data and the related information, as well as non-VCL units encapsulating additional data such as parameter sets relating to an entire video sequence, or Supplemental Enhancement Information (SEI) providing additional information that can be used to improve the decoding performance.
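As an example of how a NAL unit identifies its content, the two-byte HEVC NAL unit header can be parsed as below: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits). In HEVC, types 0 to 31 are VCL NAL units and types 32 to 63 are non-VCL NAL units. H.264/MPEG-4 AVC uses a different, shorter header that this sketch does not cover; the function names are illustrative.

```python
def parse_hevc_nal_header(byte0: int, byte1: int) -> dict:
    """Split the two HEVC NAL unit header bytes into their fields."""
    return {
        "forbidden_zero_bit": byte0 >> 7,
        "nal_unit_type": (byte0 >> 1) & 0x3F,
        "nuh_layer_id": ((byte0 & 0x01) << 5) | (byte1 >> 3),
        "nuh_temporal_id_plus1": byte1 & 0x07,
    }


def is_vcl(header: dict) -> bool:
    """VCL NAL units carry coded slice data; the others carry parameter sets, SEI, etc."""
    return header["nal_unit_type"] < 32


vps = parse_hevc_nal_header(0x40, 0x01)   # video parameter set (type 32)
idr = parse_hevc_nal_header(0x26, 0x01)   # IDR slice segment (type 19)
```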
FIG. 2 illustrates an example decoder 200 according to the H.264/MPEG-4 AVC or HEVC video coding standard. The encoded video signal (input signal to the decoder) first passes to entropy decoder 290, which decodes the quantized coefficients and the information elements necessary for decoding, such as motion data, prediction mode, etc. The quantized coefficients are inversely scanned in order to obtain a two-dimensional matrix, which is then fed to inverse quantization and inverse transformation 230. After inverse quantization and inverse transformation 230, a decoded (quantized) prediction error signal e′ is obtained, which corresponds to the differences obtained by subtracting the prediction signal from the signal input to the encoder, provided that no quantization noise is introduced and no error occurs.
The prediction signal is obtained from either a temporal or a spatial prediction 280. The decoded information elements usually further include the information necessary for the prediction, such as the prediction type in the case of intra-prediction and motion data in the case of motion compensated prediction. The quantized prediction error signal in the spatial domain is then added by an adder 240 to the prediction signal obtained either from the motion compensated prediction or from the intra-frame prediction 280. The reconstructed image s′ may be passed through a deblocking filter 250, sample adaptive offset processing 255, and an adaptive loop filter 260, and the resulting decoded signal is stored in the memory 270 to be applied for temporal or spatial prediction of the following blocks/images.
A further illustration of an exemplary hybrid video encoder is shown in FIG. 3. The encoder of FIG. 3 differs from the encoder of FIG. 1 in that deblocking filter 150 of FIG. 1 has been subdivided into a filter 350a for horizontal deblocking of vertical edges and a filter 350b for vertical deblocking of horizontal edges. Filter 350a is applied to the reconstructed signal S′, which is the output of adder 140. The output of filter 350a, i.e. an image with deblocked vertical edges, is denoted S″ and is input into filter 350b. The output signal of filter 350b, i.e. a vertically and horizontally deblocked image, is denoted S‴. Moreover, FIG. 3 explicitly shows the quantization parameter QP being input into entropy encoder 190, horizontal deblocking filter 350a and vertical deblocking filter 350b.
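The order of operations in FIG. 3 (horizontal filtering across vertical edges first, producing S″, then vertical filtering across horizontal edges, producing S‴) can be sketched as follows. The 8-sample edge grid and the two-sample smoothing rule `smooth_pair` are hypothetical placeholders chosen only to make the data flow visible; they are not the filters of any standard.

```python
def smooth_pair(p0, q0):
    """Hypothetical placeholder filter: move both edge samples a quarter of the step together."""
    delta = int((q0 - p0) / 4)          # truncate toward zero so the samples never cross
    return p0 + delta, q0 - delta


def filter_vertical_edges(img, grid=8):
    """Horizontal filtering across vertical edges at x = grid, 2*grid, ... (cf. filter 350a)."""
    out = [row[:] for row in img]
    for y in range(len(out)):
        for x in range(grid, len(out[0]), grid):
            out[y][x - 1], out[y][x] = smooth_pair(out[y][x - 1], out[y][x])
    return out


def filter_horizontal_edges(img, grid=8):
    """Vertical filtering across horizontal edges at y = grid, 2*grid, ... (cf. filter 350b)."""
    out = [row[:] for row in img]
    for y in range(grid, len(out), grid):
        for x in range(len(out[0])):
            out[y - 1][x], out[y][x] = smooth_pair(out[y - 1][x], out[y][x])
    return out


def deblock(s_prime, grid=8):
    s_double_prime = filter_vertical_edges(s_prime, grid)    # S″
    return filter_horizontal_edges(s_double_prime, grid)     # S‴
```

Because the second pass reads S″ rather than S′, samples near block corners are filtered by both passes in a fixed order, which is why the two filters are drawn as consecutive stages.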
The remaining blocks of FIG. 3 correspond to respective blocks of FIG. 1, and like features have been denoted by the same reference numerals in FIG. 3 and FIG. 1. In FIG. 3, the adaptive loop filter 160 has been explicitly shown as a Wiener filter, and blocks 155 (SAO) and 160 (ALF) have been interchanged. The sequence of these steps is, however, not essential for the present invention. Moreover, reference frame buffer 170 has not been explicitly shown in FIG. 3.
In view of the close analogy of the respective features of the encoder of FIG. 1 and the decoder of FIG. 2, a person skilled in the art is aware of how to modify FIG. 2 in order to illustrate a decoder wherein horizontal and vertical deblocking in two subsequent steps is made explicit. A respective figure has therefore been omitted herein.
When compressing and decompressing an image, blocking artifacts are typically the most annoying artifacts for the user. Deblocking filtering helps to improve the perceptual experience of the user by smoothing the edges between the blocks in the reconstructed image. One of the difficulties in deblocking filtering is to correctly distinguish between edges caused by blocking due to the application of a quantizer and edges which are part of the coded signal. Application of the deblocking filter is only desirable if the edge on the block boundary is due to compression artifacts. In other cases, applying the deblocking filter may impair or distort the reconstructed signal. Another difficulty is the selection of an appropriate filter for deblocking filtering. Typically, the decision is made between several low pass filters with different frequency responses, resulting in strong or weak low pass filtering. In order to decide whether deblocking filtering is to be applied and to select an appropriate filter, image data in the proximity of the boundary of two blocks are considered.
For instance, quantization parameters of the neighboring blocks may be considered. Alternatively or in addition, prediction modes such as intra or inter may be considered. Another possibility is to evaluate quantized prediction error coefficients, for instance, how many of them are quantized to zero. Reference frames used for the motion compensated prediction may also be indicative for the selection of the filter, for instance, whether the same reference frames are used for prediction of the current block and the neighboring blocks. The decision may also be based on the motion vectors used for the motion compensated prediction and on whether the motion vectors for the current block and for the neighboring blocks are the same or differ. The decision may involve spatial positions of the samples, such as their distance to the block boundary.
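A simplified boundary-strength derivation that combines several of these criteria is sketched below, loosely following the HEVC rules: intra coding on either side gives the highest strength, non-zero coefficients or differing reference pictures or motion give an intermediate strength, and otherwise no filtering is indicated. The `BlockInfo` structure, the comparison of reference pictures and motion vectors in list order, and applying the coefficient check to every edge are simplifications of this sketch, not the normative process.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical per-block side information relevant to the deblocking decision."""
    intra: bool
    has_nonzero_coeffs: bool = False
    ref_pics: tuple = ()    # reference picture identifiers
    mvs: tuple = ()         # motion vectors (x, y) in quarter-sample units


def boundary_strength(p: BlockInfo, q: BlockInfo) -> int:
    """0 = no filtering, 1 = inter-related discontinuity, 2 = intra on either side."""
    if p.intra or q.intra:
        return 2
    if p.has_nonzero_coeffs or q.has_nonzero_coeffs:
        return 1
    if p.ref_pics != q.ref_pics or len(p.mvs) != len(q.mvs):
        return 1
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:   # one integer sample or more
            return 1
    return 0
```

Each `if` corresponds to one step of the decision flow, so the number of comparisons evaluated before a result is reached directly reflects the processing cost of deriving BS.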
For instance, H.264/MPEG-4 AVC evaluates the absolute values of the first derivative in each of the two neighboring blocks, the boundary of which is to be deblocked. In addition, absolute values of the first derivative across the edge between the two blocks are evaluated, as described, for instance, in the H.264/MPEG-4 AVC standard, Section 8.7.2.2. A similar approach is also described in US 2008/0025632 A. The decision is taken for all pixels to be filtered based on the same criterion, and the selection is performed for the entire block. HEVC employs a similar mechanism but additionally uses a second derivative.
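The two kinds of decision can be sketched as follows. The first function is the H.264/MPEG-4 AVC style sample-level test on first differences, |p0 - q0| < α, |p1 - p0| < β and |q1 - q0| < β; the second is the HEVC style edge-level test that sums the second derivatives of lines 0 and 3 of a four-line edge segment and compares the sum with β. In both standards the thresholds are derived from the quantization parameter via tables; the threshold values in the usage below are arbitrary, and the function names are illustrative.

```python
def avc_filter_samples(p, q, alpha, beta):
    """p, q: samples on either side of the edge, index 0 adjacent to the edge."""
    return (abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


def second_derivative(s):
    """|s2 - 2*s1 + s0| for samples ordered from the edge outwards."""
    return abs(s[2] - 2 * s[1] + s[0])


def hevc_filter_edge(lines, beta):
    """lines: four (p, q) pairs across one 4-line edge segment; lines 0 and 3 decide."""
    d = sum(second_derivative(p) + second_derivative(q)
            for p, q in (lines[0], lines[3]))
    return d < beta


step = ([50, 50, 50], [80, 80, 80])       # flat on both sides, step at the edge
textured = ([50, 70, 40], [80, 80, 80])   # strong curvature inside block P
```

A flat step (typical of blocking) passes both tests as long as the step itself is within α, whereas strong curvature inside a block, interpreted as texture of the original signal, suppresses filtering in the HEVC style test.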
According to these approaches, for a particular edge (boundary) between two blocks, it has to be decided whether to apply deblocking at all and, if so, which filter out of a plurality of different deblocking filters having different filter strengths is to be applied. Generally speaking, a deblocking filter having a higher filter strength (“strong filter”) performs more substantial amendments to the pixel values adjacent to the boundary than a filter having less filter strength (“weak filter”). The aim of the decision whether to filter or not is to filter only those samples for which the large signal change detected at the block boundary results from the quantization applied in the block-wise processing. The result of this filtering is a smooth signal at the block boundary. The smooth signal is less annoying to the viewer than the blocking artefact. Those samples for which the large signal change at the block boundary belongs to the original signal to be coded should not be filtered, in order to keep high frequencies and thus the visual sharpness. In the case of wrong decisions, the image is either unnecessarily smoothed or remains blocky.
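How a weak filter modifies only the samples next to the boundary, and by a bounded amount, can be seen from the H.264/MPEG-4 AVC normal-mode filter for p0 and q0 (used when the boundary strength is below 4): Δ = Clip3(-tC, tC, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3), p0′ = p0 + Δ, q0′ = q0 - Δ, each clipped to the sample range. The sketch assumes 8-bit samples and omits the optional modification of p1 and q1; tC, which the standard derives from the boundary strength and QP-dependent tables, is simply passed in.

```python
def clip3(lo, hi, v):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def avc_normal_filter(p1, p0, q0, q1, tc, bit_depth=8):
    """Return the filtered (p0', q0'); the optional p1/q1 update is not modeled."""
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)
```

For a step from 60 to 70, the unclipped correction is 4, so `tc=10` yields (64, 66), while `tc=2` limits the change to (62, 68). The clipping bound is what keeps a wrong "filter" decision from smoothing a genuine edge too strongly.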
A plurality of decision criteria have been derived in the art in order to make the decisions described above. The decision criteria operate on the basis of parameters specifying particulars of the pixel value distribution on both sides of the block boundary. Generally speaking, a parameter (boundary strength, BS) is first derived to indicate how pronounced the block artefacts at a block boundary appear. Based thereon, parameters defining decision thresholds are derived. Each step in this decision flow, in particular in the derivation of the boundary strength (BS), consumes one or several CPU cycles. Moreover, each of the parameters involved in the decision flow requires its own memory space. For reasons of processing efficiency, it is therefore desirable to perform the necessary calculations and decisions with as few intermediate steps and parameters as possible.