Motion pictures are being adopted in increasing numbers of applications, ranging from video-telephoning and video-conferencing to digital television and Digital Versatile Disc (DVD). When a motion picture is being transmitted or recorded as digital data, a substantial amount of data has to be sent through transmission channels of limited available frequency bandwidth or has to be stored on storage media of limited data capacity. Thus, in order to transmit digital data representing a motion picture over such channels and to store it on such media, the volume of the digital data must be reduced by compression.
For the compression of video data, a plurality of video coding standards have been developed. Such video standards are, for instance, ITU-T standards denoted with H.26x and ISO/IEC standards denoted with MPEG-x. The most up-to-date and advanced video coding standards are currently the standards denoted as H.264/AVC or MPEG-4/AVC.
These standards have the following main stages. First, each individual frame (picture) of a motion picture is divided into blocks (macroblocks) in order to subject each video frame to data compression at a block level. Then, spatial redundancies within a frame are reduced by applying, to each block, a transform from the spatial domain into the frequency domain. Further, the resulting transform coefficients are quantized. As a result of such coding, the data volume of the video data is reduced. Then, the quantized transform coefficients are entropy coded.
Here, the original transform coefficient values cannot be recovered from the quantized transform coefficients due to a data loss introduced by the above described quantizing operation. In other words, coding video data causes the image quality to be impaired by a corresponding quantizing noise.
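The loss introduced by quantization can be illustrated with a minimal sketch. It assumes a one-dimensional orthonormal DCT applied to a single row of a block (the two-dimensional transform is obtained by applying it to rows and then to columns) and a uniform quantizer. The sample values, the step size, and all function names are hypothetical and are not taken from any standard.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to an integer level."""
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficients from integer levels."""
    return [q * step for q in levels]

# Hypothetical row of pixel values and a hypothetical quantizer step size.
samples = [52, 55, 61, 66, 70, 61, 64, 73]
step = 10.0

coeffs = dct_1d(samples)
levels = quantize(coeffs, step)              # what would be entropy coded
recon = idct_1d(dequantize(levels, step))    # what a decoder can recover
errors = [abs(a - b) for a, b in zip(samples, recon)]

print("levels:", levels)
print("largest reconstruction error:", max(errors))
```

Without the quantization step, the inverse transform returns the original samples up to floating-point rounding. With it, each coefficient is off by at most half a step, and this error appears in the reconstructed samples as quantization noise.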
Further, in the above described standards, in order to further minimize the data volume of the coded video data, temporal dependencies between blocks of subsequent frames are exploited to only transmit changes between subsequent frames. This is accomplished by employing a motion estimation and compensation technique.
The above described video compression technique (an image coding method) is called a hybrid coding technique, and is known to be the most effective among the various video compression techniques. The hybrid coding technique combines temporal and spatial compression techniques together with statistical coding techniques. Further, the hybrid coding technique employs motion-compensated Differential Pulse Code Modulation (DPCM), two-dimensional Discrete Cosine Transform (DCT), quantization of DCT coefficients, and Variable Length Coding (VLC).
The motion-compensated DPCM is a process of estimating the movement of an image object between a current frame to be processed and a processed frame, and predicting the current frame to be processed according to the estimated motion to produce differences between the current frame and its prediction result.
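As a rough illustration of motion-compensated DPCM, the following sketch estimates the motion of one block by an exhaustive block-matching search and then forms the difference between the block and its prediction. The sum of absolute differences as the matching cost, the search range, the synthetic frames, and all function names are assumptions made for this example. The motion vector is taken here to be the displacement from the block position in the current frame to the matching area in the processed frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def estimate_motion(cur, ref, bx, by, size, search):
    """Full search over displacements within +/-search pixels; returns (dx, dy)."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or by + dy + size > h or bx + dx < 0 or bx + dx + size > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best

def residual(cur, ref, bx, by, mv, size):
    """Difference between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]

# Synthetic frames: the current frame is the processed frame shifted right by one pixel.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

mv = estimate_motion(cur, ref, bx=2, by=2, size=4, search=2)
res_mc = residual(cur, ref, 2, 2, mv, 4)          # with motion compensation
res_plain = residual(cur, ref, 2, 2, (0, 0), 4)   # without motion compensation

print("motion vector:", mv)
print("residual with compensation:", res_mc)
```

In this synthetic case the compensated differences are all zero while the uncompensated ones are not, which is why only the motion vector and a small residual need to be transmitted.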
During the coding and decoding on the image data, several disturbances are added to the image. For example, a quantization noise is added when the DCT coefficients are quantized. Further, block distortions will occur when the image is coded on a block-by-block basis.
Hereinafter, with reference to the drawings, a conventional image coding apparatus and image decoding apparatus employing the hybrid coding technique shall be described.
FIG. 1 is a block diagram showing the configuration of a conventional image coding apparatus.
An image coding apparatus 1000 includes a subtractor 1100, an orthogonal transform and quantization unit 1200, an inverse quantization and inverse orthogonal transform unit 1300, an adder 1350, a de-blocking filter 1400, a memory 1500, an intra-frame prediction unit 1600, a motion compensation unit 1650, a motion estimation unit 1700, a switching unit 1800, and an entropy coding unit 1900.
The subtractor 1100 calculates, as a prediction error Res, differences between an input image represented by an input image signal In and a predictive image Pre outputted from either the intra-frame prediction unit 1600 or the motion compensation unit 1650.
The orthogonal transform and quantization unit 1200 transforms the prediction error Res calculated by the subtractor 1100 to frequency components (by Discrete Cosine Transform, for example), and quantizes each of the frequency components to compress-code them into quantized coefficients Qc.
The inverse quantization and inverse orthogonal transform unit 1300 de-quantizes the quantized coefficients Qc outputted from the orthogonal transform and quantization unit 1200 so as to transform the quantized coefficients Qc to frequency components. Furthermore, by applying an inverse orthogonal transformation to the frequency components (Inverse Discrete Cosine Transform, for example), the inverse quantization and inverse orthogonal transform unit 1300 transforms the frequency components to a prediction error Dr.
The adder 1350 adds the above mentioned predictive image Pre and prediction error Dr to generate a locally decoded image Rc, and outputs the locally decoded image Rc to the de-blocking filter 1400.
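The signal flow from the subtractor 1100 to the adder 1350 can be sketched as follows. For brevity, this sketch omits the orthogonal transform and quantizes the prediction error directly, which is a simplification and not the arrangement of FIG. 1. The variable names mirror the signals In, Pre, Res, Qc, Dr and Rc; the sample values and the step size are hypothetical.

```python
def subtract(inp, pre):
    """Subtractor: prediction error Res = In - Pre."""
    return [a - b for a, b in zip(inp, pre)]

def quantize(res, step):
    """Quantization to integer levels Qc (transform omitted in this sketch)."""
    return [round(r / step) for r in res]

def dequantize(qc, step):
    """Inverse quantization: decoded prediction error Dr."""
    return [q * step for q in qc]

def add(pre, dr):
    """Adder: locally decoded samples Rc = Pre + Dr."""
    return [p + d for p, d in zip(pre, dr)]

inp = [100, 104, 109, 115]   # hypothetical input samples (In)
pre = [98, 105, 105, 118]    # hypothetical predictive samples (Pre)
step = 4                     # hypothetical quantizer step size

res = subtract(inp, pre)
qc = quantize(res, step)
dr = dequantize(qc, step)
rc = add(pre, dr)

print("Res:", res)
print("Qc: ", qc)
print("Dr: ", dr)
print("Rc: ", rc)
```

Because Rc is computed only from Qc and Pre, a decoder that receives Qc and forms the same Pre arrives at the same Rc. The difference between Rc and the input is bounded by half the step size in this simplified setting.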
The de-blocking filter 1400 filters the locally decoded image Rc outputted from the adder 1350 to remove block distortions therefrom. That is to say, the above described processing, up to the generation of the locally decoded image Rc, is performed per block of a picture, and thus the locally decoded image Rc contains block distortions. Therefore, the de-blocking filter 1400 removes the block distortions from the locally decoded image Rc.
For example, the de-blocking filter 1400 smooths the edge of each block by a linear filtering of the borders of each block of the locally decoded image Rc. Then, the de-blocking filter 1400 stores the filtered locally decoded image Rc in the memory 1500 as a locally decoded image Rdf.
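A linear filtering of block borders with predetermined coefficients can be sketched on a single row of samples as follows. The three-tap kernel (1, 2, 1)/4 and the choice to modify only the two samples adjacent to each border are assumptions for illustration; this is not the de-blocking filter defined in any particular standard.

```python
def deblock_row(row, block_size):
    """Smooth the two samples on either side of each block border.

    Assumes block_size >= 2 and len(row) is a multiple of block_size.
    """
    out = list(row)
    for b in range(block_size, len(row), block_size):
        p1, p0 = row[b - 2], row[b - 1]   # last two samples of the left block
        q0, q1 = row[b], row[b + 1]       # first two samples of the right block
        out[b - 1] = (p1 + 2 * p0 + q0) / 4
        out[b] = (p0 + 2 * q0 + q1) / 4
    return out

# Two flat blocks of four samples with a visible step at the border.
row = [10, 10, 10, 10, 20, 20, 20, 20]
filtered = deblock_row(row, block_size=4)
print(filtered)
```

The step of 10 across the border is reduced to 5 between the two border samples, while samples away from the border are left unchanged. The same coefficients are applied regardless of whether the step is a coding artifact or a genuine edge in the picture.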
When macroblocks in the input image are to be coded in the intra mode, the intra-frame prediction unit 1600 extracts one or more locally decoded images Rdf corresponding to the input image to be coded from the memory 1500 as reference image(s) Ref, and generates, using the extracted reference image(s) Ref, a predictive image Pre corresponding to the input image to be coded.
The motion estimation unit 1700 refers to, as a reference image Ref, a picture coded prior to the to-be-coded picture in the input image, that is, the locally decoded image Rdf stored in the memory 1500, to estimate a motion vector MV, for example per macroblock of the to-be-coded picture.
When macroblocks in the input image are to be coded in the inter mode, the motion compensation unit 1650 extracts, from the reference image Ref stored in the memory 1500, an image of an area indicated by the motion vector MV estimated by the motion estimation unit 1700, to output the image as a predictive image Pre.
When the macroblocks are to be coded in the intra mode, the switching unit 1800 connects the subtractor 1100 to the intra-frame prediction unit 1600 so that the subtractor 1100 uses, for its processing, the predictive image Pre outputted from the intra-frame prediction unit 1600. Alternatively, when the macroblocks are to be coded in the inter mode, the switching unit 1800 connects the subtractor 1100 to the motion compensation unit 1650 so that the subtractor 1100 uses, for its processing, the predictive image Pre outputted from the motion compensation unit 1650.
The entropy coding unit 1900 generates a coded stream Str by performing entropy coding (variable-length coding) on the quantized coefficients Qc generated by the orthogonal transform and quantization unit 1200 and the motion vector MV estimated by the motion estimation unit 1700.
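As one concrete instance of variable-length coding, the sketch below implements Exp-Golomb codes, which H.264/AVC uses for a number of syntax elements. The description above does not state which code the entropy coding unit 1900 uses, so the choice of Exp-Golomb codes and the function names are assumptions made for illustration.

```python
def ue(v):
    """Unsigned Exp-Golomb code word for v >= 0, as a string of bits."""
    bits = bin(v + 1)[2:]                  # binary representation of v + 1
    return "0" * (len(bits) - 1) + bits    # prefix of leading zeros, then the value

def se(v):
    """Signed Exp-Golomb code word: v > 0 maps to 2v - 1, v <= 0 maps to -2v."""
    return ue(2 * v - 1 if v > 0 else -2 * v)

def decode_ue(bitstring, pos=0):
    """Decode one unsigned code word starting at pos; returns (value, next_pos)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end

def decode_se(bitstring, pos=0):
    """Decode one signed code word starting at pos; returns (value, next_pos)."""
    k, end = decode_ue(bitstring, pos)
    return ((k + 1) // 2 if k % 2 else -(k // 2)), end

# Hypothetical motion vector components coded back to back into one bit string.
components = [0, 1, -1, 3]
stream = "".join(se(v) for v in components)

decoded, pos = [], 0
while pos < len(stream):
    value, pos = decode_se(stream, pos)
    decoded.append(value)

print(stream, decoded)
```

Small magnitudes, which are the most frequent for motion vectors and quantized coefficients, receive the shortest code words, and the code words can be concatenated without separators because no code word is a prefix of another.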
Such an image coding apparatus 1000 as described above codes an input image through generation of a predictive image Pre; subtraction of the predictive image Pre from an input image; orthogonal transformation; quantization; and so on. The image coding apparatus 1000 further decodes the coded input image through inverse quantization of quantized coefficients Qc; inverse orthogonal transformation; addition of a prediction error Dr and the predictive image Pre; and so on.
FIG. 2 is a block diagram showing the configuration of a conventional image decoding apparatus.
An image decoding apparatus 2000 includes an entropy decoding unit 2100, an inverse quantization and inverse orthogonal transform unit 2200, an adder 2300, a de-blocking filter 2400, a memory 2500, an intra-frame prediction unit 2600, a motion compensation unit 2650, and a switching unit 2700.
The entropy decoding unit 2100 obtains a coded stream Str and performs entropy decoding (variable-length decoding) thereon. Then, the entropy decoding unit 2100 extracts quantized coefficients Qc and a motion vector MV from the entropy decoded coded stream Str.
The inverse quantization and inverse orthogonal transform unit 2200 obtains the quantized coefficients Qc extracted by the entropy decoding unit 2100, and de-quantizes the quantized coefficients Qc to transform them to frequency components. Furthermore, by applying an inverse orthogonal transformation to the frequency components (Inverse Discrete Cosine Transform, for example), the inverse quantization and inverse orthogonal transform unit 2200 transforms the frequency components to a prediction error Dr.
The adder 2300 adds, to the prediction error Dr outputted from the inverse quantization and inverse orthogonal transform unit 2200, the predictive image Pre outputted from the intra-frame prediction unit 2600 or from the motion compensation unit 2650, to generate a decoded image Rc. Further, the adder 2300 outputs the generated decoded image Rc to the de-blocking filter 2400.
The de-blocking filter 2400 filters the decoded image Rc outputted from the adder 2300 to remove block distortions therefrom. That is to say, the above described processing, up to the generation of the decoded image Rc, is performed per block of a picture, and thus the decoded image Rc includes block distortions.
For example, the de-blocking filter 2400 smooths the edge of each block by a linear filtering of the borders of each block of the decoded image Rc. Then, the de-blocking filter 2400 outputs the filtered decoded image Rc as an output image Ds, and stores the filtered decoded image Rc in the memory 2500 as a reference image Ref.
When macroblocks included in the coded stream Str are to be decoded in the intra mode, the intra-frame prediction unit 2600 extracts one or more reference images Ref corresponding to the prediction error Dr from the memory 2500, and generates a predictive image Pre using the extracted reference image(s) Ref.
When macroblocks included in the coded stream Str are to be decoded in the inter mode, the motion compensation unit 2650 extracts, from a reference image Ref stored in the memory 2500, an image of an area indicated by the motion vector MV extracted by the entropy decoding unit 2100, to output the extracted image as a predictive image Pre.
When macroblocks to be decoded have been coded in the intra mode, the switching unit 2700 connects the adder 2300 to the intra-frame prediction unit 2600 so that the adder 2300 uses, for its processing, the predictive image Pre outputted from the intra-frame prediction unit 2600. Alternatively, when the macroblocks are to be decoded in the inter mode, the switching unit 2700 connects the adder 2300 to the motion compensation unit 2650 so that the adder 2300 uses, for its processing, the predictive image Pre outputted from the motion compensation unit 2650.
Such an image decoding apparatus 2000 as described above performs decoding through inverse quantization of quantized coefficients Qc; inverse orthogonal transformation; addition of a prediction error Dr and a predictive image Pre; and so on.
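The decoding of one block, including the selection between intra and inter prediction made by the switching unit 2700, can be sketched as follows. The sketch works on one-dimensional lists of samples, omits the inverse orthogonal transform and the de-blocking filter, and represents the motion vector as a single integer offset; these simplifications and all names are assumptions for illustration.

```python
def motion_compensate(ref, pos, mv, size):
    """Extract from the reference samples the area indicated by the motion vector."""
    start = pos + mv
    return ref[start:start + size]

def decode_block(qc, mode, step, intra_pre=None, ref=None, pos=0, mv=0):
    """Reconstruct one block from quantized values Qc and a predictive block Pre."""
    # Switching: choose the source of the predictive block according to the mode.
    if mode == "intra":
        pre = intra_pre
    elif mode == "inter":
        pre = motion_compensate(ref, pos, mv, len(qc))
    else:
        raise ValueError("mode must be 'intra' or 'inter'")
    dr = [q * step for q in qc]                # inverse quantization (transform omitted)
    return [p + d for p, d in zip(pre, dr)]    # adder: Rc = Pre + Dr

ref = [10, 20, 30, 40, 50, 60]   # hypothetical reference samples held in memory
qc = [1, 0, -1]                  # hypothetical quantized values for one block
step = 4                         # hypothetical quantizer step size

inter_block = decode_block(qc, "inter", step, ref=ref, pos=2, mv=-1)
intra_block = decode_block(qc, "intra", step, intra_pre=[7, 7, 7])
print(inter_block, intra_block)
```

The same quantized values yield different reconstructed blocks depending on which predictive block is selected, which is why the prediction mode and the motion vector have to be available to the decoder.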
As described above, with the conventional image coding apparatus 1000 and image decoding apparatus 2000, coding and decoding of images causes the locally decoded images Rc in the image coding apparatus 1000 and the decoded images Rc in the image decoding apparatus 2000 to be inferior in image quality to input images corresponding to such images. In other words, the locally decoded images Rc and the decoded images Rc are distorted compared to the input images, and contain noise as a result of the coding and decoding. Noise includes quantization noise, block distortions, and so on.
Thus, the de-blocking filters 1400 and 2400 are provided to improve the image quality of the locally decoded images Rc and the decoded images Rc.
However, such de-blocking filters 1400 and 2400 cannot adequately improve the image quality of the locally decoded images Rc and the decoded images Rc. To be more specific, the de-blocking filters 1400 and 2400 only attempt to remove block distortions by smoothing the borders of blocks, and do not remove other types of noise or improve the image quality of areas other than the borders. In addition, the de-blocking filters 1400 and 2400 are not capable of applying a suitable filtering to the locally decoded images Rc and the decoded images Rc because regardless of the content (sharpness, smoothness, for example) of the locally decoded images Rc and the decoded images Rc, a linear filtering is applied with predetermined filter coefficients.
In light of the above, in order to improve the image quality, a filtering adapted to each image has been proposed, that is, an adaptive filtering applied by analyzing the content of the image. For example, an image which has been coded and then decoded is analyzed, and a filter parameter (filter coefficients), which is adapted depending on the analysis result, is used for the filtering.
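The general idea of selecting filter coefficients from an analysis of the image content can be sketched as follows. The analysis used here (local variance compared against a threshold), the two candidate coefficient sets, and all names are hypothetical choices for illustration. They are not the method of any cited reference.

```python
def local_variance(signal, i, radius=1):
    """Variance of the samples in a small window around position i."""
    lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
    window = signal[lo:hi]
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window) / len(window)

def choose_coefficients(variance, threshold):
    """Smooth flat areas; leave detailed areas (edges) untouched."""
    return (0.25, 0.5, 0.25) if variance < threshold else (0.0, 1.0, 0.0)

def adaptive_filter(signal, threshold):
    """Apply a three-tap filter whose coefficients depend on the local content."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]        # repeat the border sample at the ends
        right = signal[min(i + 1, n - 1)]
        a, b, c = choose_coefficients(local_variance(signal, i), threshold)
        out.append(a * left + b * signal[i] + c * right)
    return out

# A slightly noisy flat area followed by a sharp edge (hypothetical values).
decoded = [10, 12, 10, 12, 10, 100, 100, 100]
result = adaptive_filter(decoded, threshold=50)
print(result)
```

In contrast to a filter with predetermined coefficients, the small fluctuations in the flat area are smoothed while the samples on either side of the edge keep their values.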
Patent Reference 1 describes an adaptive filtering of a video sequence. With the adaptive filtering according to Patent Reference 1, motion and noise of a decoded image are estimated to compute a filter parameter adaptive to the estimation result, and the decoded image is filtered according to the computed filter parameter. Based on the above mentioned estimation, an iterative calculation of filter parameters is carried out.
Patent Reference 1: United States Patent Application Publication No. 2005/0105627