1. Field of the Invention
The present invention relates to a method for reducing visual artefacts in a digital image, which is encoded and decoded by blocks, in which filtering is performed to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention also relates to a device for reducing visual artefacts in a digital image, which is encoded and decoded by blocks, the device comprising means for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention also relates to an encoder comprising means for coding and means for locally decoding a digital image by blocks, which encoder comprises means for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention also relates to a decoder comprising means for decoding a digital image by blocks, which decoder comprises means for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention also relates to a terminal comprising an encoder, which comprises means for coding and means for locally decoding a digital image by blocks, means for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention further relates to a terminal comprising means for decoding a digital image by blocks, means for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. The present invention further relates to a storage medium for storing a software program comprising machine executable steps for coding and locally decoding a digital video signal by blocks, and for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block. 
The present invention further relates to a storage medium for storing a software program comprising machine executable steps for decoding a digital video signal by blocks, and for performing filtering to reduce visual artefacts due to a boundary between a current block and an adjacent block.
2. Description of Related Art including information disclosed under 37 CFR 1.97.
An arrangement like that shown in FIG. 1 is generally used for transferring a digital video sequence in compressed form. The digital video sequence is formed of sequential images, often referred to as frames. In some prior art digital video transmission systems, for example those according to the ITU-T H.261/H.263 recommendations, at least three frame types are defined: an I-frame (intra), a P-frame (predicted or inter), and a B-frame (bi-directional). The I-frame is generated solely on the basis of information contained in the image itself, wherein at the receiving end, this I-frame can be used to form the entire image. P-frames are formed on the basis of a preceding I-frame or P-frame, wherein at the receiving stage a preceding I-frame or P-frame is correspondingly used together with the received P-frame in order to reconstruct the image. In the composition of P-frames, motion compensation, for instance, is used to compress the quantity of information. B-frames are formed on the basis of one or more preceding P-frames or I-frames and/or one or more following P- or I-frames.
The frames are further divided into blocks. One frame can comprise different types of blocks. A predicted frame (e.g. inter frame) may also contain blocks that are not predicted. In other words, some blocks of a P-frame may in fact be intra coded. Furthermore, some video coders may use the concept of independent segment decoding in which case several blocks are grouped together to form segments that are then coded independently from each other. All the blocks within a certain segment are of the same type. For example, if a P-frame is composed mainly of predicted blocks and some intra-coded blocks, the frame can be considered to comprise at least one segment of intra blocks and at least one segment of predicted blocks.
As is well known, a digital image comprises an array of image pixels. In the case of a monochrome image, each pixel has a pixel value within a certain range (e.g. 0-255), which denotes the pixel's luminance. In a colour image, pixel values may be represented in a number of different ways. In a commonly used representation, referred to as the RGB colour model, each pixel is described by three values, one corresponding to the value of a Red colour component, another corresponding to the value of a Green colour component and the third corresponding to the value of a Blue colour component. Numerous other colour models exist, in which alternative representations are used. In one such alternative, known as the YUV colour model, image pixels are represented by a luminance component (Y) and two chrominance or colour difference components (U, V), each of which has an associated pixel value.
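By way of illustration only, the relationship between the RGB and YUV representations can be sketched as follows. The weights used are one commonly cited definition of the YUV components and are an assumption of this sketch; the function name rgb_to_yuv is hypothetical, and other luminance/chrominance models use different constants.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to luminance (Y) and colour difference (U, V).

    The constants are an assumption of this sketch (a commonly used
    definition); they are not taken from the description above.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # weighted sum of R, G and B
    u = 0.492 * (b - y)                     # scaled blue colour difference
    v = 0.877 * (r - y)                     # scaled red colour difference
    return y, u, v


# A neutral grey pixel carries (almost) no chrominance.
y, u, v = rgb_to_yuv(128, 128, 128)
```

For a pixel with equal R, G and B values the two colour difference components are zero up to floating-point error, which reflects the statement that the luminance component carries the structural information of the image.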
Generally, colour models that employ luminance and chrominance components provide a more efficient representation of a colour image than the RGB model. It is also known that the luminance component of such colour models generally provides the most information about the perceived structure of an image. Among other things, this allows the chrominance components of an image to be spatially sub-sampled without a significant loss in perceived image quality. For these reasons colour models that employ a luminance/chrominance representation are favoured in many applications, particularly those in which data storage space, processing power or transmission bandwidth is limited.
As stated above, in the YUV colour model, an image is represented by a luminance component and two chrominance components. Typically, the luminance information in the image is transformed with full spatial resolution. Both chrominance signals are spatially subsampled, for example a field of 16×16 pixels is subsampled into a field of 8×8 pixels. The differences in the block sizes are primarily due to the fact that the eye does not discern changes in chrominance as well as changes in luminance; a field of 2×2 pixels is therefore encoded with the same chrominance value.
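The subsampling of a 16×16 chrominance field into an 8×8 field can be sketched as follows. This is a minimal illustration that assumes each 2×2 group is replaced by its rounded mean; the description above only states that a 2×2 field shares one chrominance value, so the averaging rule and the name subsample_chroma are assumptions.

```python
def subsample_chroma(plane):
    """Replace every 2x2 group of chrominance samples by one value.

    Assumes the plane has an even number of rows and columns. Averaging
    is an assumption of this sketch; a codec may pick the value otherwise.
    """
    out = []
    for y in range(0, len(plane), 2):
        out_row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            out_row.append((total + 2) // 4)  # rounded integer mean
        out.append(out_row)
    return out


# A synthetic 16x16 chrominance field becomes an 8x8 field.
u_plane = [[(x + y) % 256 for x in range(16)] for y in range(16)]
u_small = subsample_chroma(u_plane)
```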
Typically, image blocks are grouped together to form macroblocks. The macroblock usually contains 16 pixels by 16 rows of luminance samples, mode information, and possible motion vectors. The macroblock is divided into four 8×8 luminance blocks and into two 8×8 chrominance blocks. Scanning (and encoding/decoding) proceeds macroblock by macroblock, conventionally from the top-left to the bottom-right corner of the frame. Inside one macroblock the scanning (and encoding/decoding) order is from the top-left to the bottom-right corner of the macroblock.
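The scanning order described above can be illustrated with the following sketch. It assumes frame dimensions that are multiples of 16 and a row-by-row (raster) order both between macroblocks and between the four luminance blocks inside a macroblock; the function names are hypothetical.

```python
def macroblock_origins(width, height, mb_size=16):
    """Yield the top-left (x, y) of each macroblock, top-left to bottom-right."""
    for y in range(0, height, mb_size):
        for x in range(0, width, mb_size):
            yield (x, y)


def luma_block_origins(mb_x, mb_y, block_size=8):
    """Top-left corners of the four 8x8 luminance blocks of one macroblock."""
    return [(mb_x + dx, mb_y + dy)
            for dy in (0, block_size)      # upper pair first, then lower pair
            for dx in (0, block_size)]


# Visit every luminance block of a 32x32 frame in scanning order.
order = [blk
         for mb in macroblock_origins(32, 32)
         for blk in luma_block_origins(*mb)]
```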
Referring to FIG. 1, which illustrates a typical encoding and decoding system (codec) used, for example, in the transmission of digital video, a current video frame to be coded comes to the transmission system 10 as input data In(x,y). The input data In(x,y) typically takes the form of pixel value information. In the differential summer 11 it is transformed into a prediction error frame En(x,y) by subtracting from it a prediction frame Pn(x,y) formed on the basis of a previous image. The prediction error frame is coded in block 12 in the manner described hereinafter, and the coded prediction error frame is directed to a multiplexer 13. To form a new reconstructed frame, the coded prediction error frame is also directed to a decoder 14, which produces a decoded prediction error frame Ên(x,y) which is summed in a summer 15 with the prediction frame Pn(x,y), resulting in a reconstructed frame În(x,y). The reconstructed frame is saved in a frame memory 16. To code the next frame, the reconstructed frame saved in the frame memory is read as a reference frame Rn(x,y) and is transformed into a new prediction frame Pn(x,y) in a motion compensation and prediction block 17 according to the formula:
Pn(x,y)=Rn[x+Dx(x,y), y+Dy(x,y)]  (1)
The pair of numbers [Dx(x,y), Dy(x,y)] is called the motion vector of the pixel at location (x,y) and the numbers Dx(x,y) and Dy(x,y) are the horizontal and vertical shifts of the pixel. They are calculated in a motion estimation block 18. The set of motion vectors [Dx(·), Dy(·)] consisting of all motion vectors related to the pixels of the frame to be compressed is also coded using a motion model comprising basis functions and coefficients. The basis functions are known to both the encoder and the decoder. The coefficient values are coded and directed to the multiplexer 13, which multiplexes them into the same data stream with the coded prediction error frame for sending to a receiver. In this way the amount of information to be transmitted is dramatically reduced.
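Formula (1) and the formation of the prediction error frame can be sketched as follows. This is an illustration only: it assumes integer per-pixel motion vectors stored in two arrays dx and dy, and it clamps references that fall outside the frame to the nearest edge pixel, which is a choice of this sketch and not something the description specifies.

```python
def predict_frame(reference, dx, dy):
    """Apply P(x, y) = R[x + Dx(x, y), y + Dy(x, y)] to every pixel."""
    height, width = len(reference), len(reference[0])
    prediction = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamping to the frame edge is an assumption of this sketch.
            sx = min(max(x + dx[y][x], 0), width - 1)
            sy = min(max(y + dy[y][x], 0), height - 1)
            row.append(reference[sy][sx])
        prediction.append(row)
    return prediction


def prediction_error(current, prediction):
    """E(x, y) = I(x, y) - P(x, y), as formed in the differential summer."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


# Every pixel is predicted from its right-hand neighbour in the reference.
ref = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shift_x = [[1] * 3 for _ in range(3)]
shift_y = [[0] * 3 for _ in range(3)]
pred = predict_frame(ref, shift_x, shift_y)
err = prediction_error(ref, pred)
```

Only the prediction error and the motion information need to be coded, which is where the reduction in transmitted information comes from.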
Some frames can be partly, or entirely, so difficult to predict using only the reference frame Rn(x,y) that it is not practical to use motion compensated prediction when coding them. These frames or parts of frames are coded using intra-coding without any prediction from the reference frame Rn(x,y), and accordingly motion vector information relating to them is not sent to the receiver. In prior art another kind of prediction may be employed for I-frames or those parts of P-frames which are intra-coded, namely intra prediction. In this case, the reference is formed by the previously decoded and reconstructed blocks which are part of the same frame (or slice if independent segment decoding is used).
In the receiver 20, a demultiplexer 21 separates the coded prediction error frames and the motion information transmitted by the motion vectors and directs the coded prediction error frames to a decoder 22, which produces a decoded prediction error frame Ên(x,y), which is summed in a summer 23 with the prediction frame Pn(x,y) formed on the basis of a previous frame, resulting in a decoded and reconstructed frame În(x,y). The decoded frame is directed to an output 24 of the decoder and at the same time saved in a frame memory 25. When decoding the next frame, the frame saved in the frame memory is read as a reference frame Rn(x,y) and transformed into a new prediction frame in the motion compensation and prediction block 26, according to formula (1) presented above.
The coding method used in the coding of prediction error frames and in the intra-coding of a frame or part of a P-frame to be sent without using motion prediction, is generally based on a transformation, the most common of which is the Discrete Cosine Transformation, DCT. The frame is divided into adjacent blocks having a size of e.g. 8×8 pixels. The transformation is calculated for the block to be coded, resulting in a series of terms. The coefficients of these terms are quantized on a discrete scale in order that they can be processed digitally. Quantization causes rounding errors, which can become visible in an image reconstructed from blocks, so that there is a discontinuity of pixel values at the boundary between adjacent blocks. Because a certain decoded frame is used to calculate the prediction frame for subsequent predicted (P) frames, these errors can be propagated in sequential frames, thus causing visible edges in the image reproduced by the receiver. Image errors of this type are called blocking artefacts. Furthermore, if intra-prediction is used, blocking artefacts may also propagate from block to block within a given frame. In this case blocking artefacts typically lead to visual effects which are specific to the type of intra prediction used. It should therefore be appreciated that there exists a significant technical problem relating to the spatial and temporal propagation of blocking artefacts in digital images that are coded for transmission and subsequently decoded.
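The way in which quantization of transform coefficients produces a discontinuity at a block boundary can be illustrated with the following sketch. For brevity it uses a one-dimensional 8-point orthonormal DCT and a single uniform quantization step for all coefficients; both simplifications, and the step size of 40, are assumptions of this sketch rather than features of any particular codec.

```python
import math

N = 8  # block length (one row of an 8x8 block)


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Orthonormal 8-point DCT-II of one row of pixel values."""
    return [_scale(k) * sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, v in enumerate(block))
            for k in range(N)]


def idct(coeffs):
    """Inverse of dct()."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]


def code_block(block, step):
    """Transform, quantize with a uniform step, dequantize and reconstruct."""
    levels = [round(c / step) for c in dct(block)]
    return [round(v) for v in idct([q * step for q in levels])]


# A smooth ramp spanning two adjacent blocks, each coded independently.
row = [100 + 2 * i for i in range(16)]
decoded = code_block(row[:8], 40) + code_block(row[8:], 40)
jump_before = row[8] - row[7]          # step across the boundary, original
jump_after = decoded[8] - decoded[7]   # step across the boundary, decoded
```

With a coarse step each block loses its gradient independently of its neighbour, so the step across the boundary in the decoded row is larger than in the original row, whereas a very fine step reproduces the original values.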
The principles presented above are also applicable to a situation where segmented frames are used. In that case the coding and decoding is performed in segments of the frame, according to the type of blocks in each segment.
It should also be noted that although the preceding discussion and much of the following description concentrates on application of the invention to image sequences, such as digital video, the method according to the invention may also be applied to individual digital images (i.e. still images). Essentially, the method according to the invention may be applied to any digital image that is encoded and/or decoded on a block-by-block basis using any encoding/decoding method.
Furthermore, the method according to the invention may be applied to any luminance or colour component of a digital image. Taking the example of an image represented using a YUV colour model, as introduced above, the method according to the invention may be applied to the luminance (Y) component, to either chrominance component (U or V), to both chrominance components (U and V), or to all three components (Y, U and V). In this case, where it is known that the luminance component provides more perceptually important information relating to image structure and content, it may be sufficient to apply the method according to the invention only to the luminance component, but there is no limitation on the number or combination of luminance/colour/colour difference components to which the method according to the invention may be applied.
Some prior art methods are known for removing blocking artefacts. These methods are characterized by the following features:
    determining which pixels require value correction in order to remove a blocking artefact,
    determining a suitable low-pass filtering for each pixel to be corrected, based on the values of other pixels contained by a filtering window placed around the pixel,
    calculating a new value for the pixel to be corrected, and
    rounding the new value to the closest digitized pixel value.
Factors that influence the selection of a filter and the decision whether to use filtering can be, for example, the difference between the values of pixels across the block boundary, the size of the quantization step of the coefficients received as the transformation result, and the difference between the pixel values on different sides of the pixel being processed.
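The general shape of such a prior art filter can be sketched as follows. This is a hypothetical illustration and not the filter of any particular method: it corrects only the two pixels adjacent to the boundary, uses a 3-tap low-pass window with weights 1, 2, 1, and treats a jump larger than the quantization step as a real image edge that is left unfiltered. All of these choices, and the name deblock_boundary, are assumptions.

```python
def deblock_boundary(left, right, qstep):
    """Smooth the two pixels nearest a vertical block boundary.

    left and right are rows of pixel values on either side of the
    boundary, so left[-1] and right[0] are adjacent pixels.
    """
    a, b = left[-2], left[-1]
    c, d = right[0], right[1]
    jump = abs(b - c)
    # Step 1: decide whether the pixels require correction at all.
    if jump == 0 or jump > qstep:
        return list(left), list(right)
    # Steps 2 to 4: low-pass over a 3-pixel window, rounded to an integer.
    new_b = (a + 2 * b + c + 2) // 4
    new_c = (b + 2 * c + d + 2) // 4
    return list(left[:-1]) + [new_b], [new_c] + list(right[1:])


# Two flat blocks with a 14-level step between them.
left_out, right_out = deblock_boundary([113] * 8, [127] * 8, qstep=40)
```

Because the decision in the first step rests on a simple threshold, a sketch of this kind shares the weaknesses of threshold-based filtering: a genuine edge smaller than the threshold is smoothed, and an artefact larger than the threshold is left in place.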
In prior art methods, the filtering of blocking and other types of visual artefacts is performed frame by frame, i.e. the whole frame is first decoded and then filtered. As a result, the effects of blocking artefacts easily propagate within a frame or from one frame to the next. This is especially true when predictive intra-coding is used.
It has been found that prior art methods also tend to remove lines that belong to real features of the image. On the other hand, prior art methods are not always capable of removing all blocking or blocking-related artefacts.
A primary objective of the method according to the invention is to limit the propagation of blocking artefacts within frames and from one frame to another. Another objective of the present invention is to present a new kind of filtering arrangement for removing blocking and other blocking-related artefacts which are especially visible when predictive intra-coding is used. The invention also has the objective that the method and associated device operate more reliably and efficiently than prior art solutions.