Through advances in digital technologies combining multiple audio, video, and other kinds of pixel streams into a single transmission stream, conventional information media, that is, means of communicating information to people such as newspapers, magazines, television, radio, and the telephone, can now be used for multimedia communication. “Multimedia” generally refers to text, graphics, audio, and video linked together in a single transmission stream, but conventional information media must first be digitized before the information can be handled in a multimedia format.
The estimated storage capacity needed to store the information carried by conventional information media when converted to digital data is only 1 or 2 bytes per character for text, but 64 kbits for one second of telephone quality audio, and 100 Mbits for one second of video at current television receiver quality. It is therefore not practical to handle these massive amounts of information in digital form on the above information media. For example, video telephony service is available over ISDN (Integrated Services Digital Network) lines with a transmission speed of 64 Kbps to 1.5 Mbps, but television camera grade video cannot be sent as is over ISDN lines.
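The gap between these figures can be checked with simple arithmetic. The following is a minimal sketch that uses only the rates quoted above; the constant and function names are illustrative, and the prefixes k and M are assumed to mean 1,000 and 1,000,000.

```python
# Rates quoted in the text, in bits per second (k = 1,000 and M = 1,000,000 assumed).
AUDIO_BPS = 64_000          # telephone quality audio
VIDEO_BPS = 100_000_000     # video at television receiver quality
ISDN_MIN_BPS = 64_000       # slowest ISDN line mentioned
ISDN_MAX_BPS = 1_500_000    # fastest ISDN line mentioned


def required_compression_ratio(source_bps, channel_bps):
    """Return how many times the source must be compressed to fit the channel."""
    return source_bps / channel_bps


# Uncompressed video exceeds even the fastest ISDN line many times over.
ratio_fast_line = required_compression_ratio(VIDEO_BPS, ISDN_MAX_BPS)
ratio_slow_line = required_compression_ratio(VIDEO_BPS, ISDN_MIN_BPS)

# Telephone quality audio, by contrast, fits the slowest line without compression.
audio_ratio = required_compression_ratio(AUDIO_BPS, ISDN_MIN_BPS)

print(f"video over 1.5 Mbps line: about {ratio_fast_line:.1f}:1")
print(f"video over 64 kbps line:  about {ratio_slow_line:.1f}:1")
print(f"audio over 64 kbps line:  {audio_ratio:.1f}:1")
```

Under these assumptions the video signal must be reduced by well over an order of magnitude before it fits any of the ISDN rates mentioned, which is the motivation for the compression techniques described next.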
Data compression therefore becomes essential. Video telephony service, for example, is implemented by using video compression techniques internationally standardized in ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) Recommendations H.261 and H.263. Using the data compression methods defined in MPEG-1, video information can be recorded with audio on a conventional audio CD (Compact Disc).
MPEG (Moving Picture Experts Group) is an international standard for digitally compressing moving picture signals (video). MPEG-1 enables compressing a video signal to 1.5 Mbps, that is, compressing the information in a television signal approximately 100:1. Because the transmission speed for MPEG-1 video is limited to approximately 1.5 Mbps, MPEG-2 was standardized to meet the demand for even higher picture quality, and enables compressing a moving picture signal to 2 Mbps to 15 Mbps.
MPEG-4 with an even higher compression rate has also been standardized by the working group (ISO/IEC JTC1/SC29/WG11) that has advanced the standardization of MPEG-1 and MPEG-2. MPEG-4 not only enables low bit rate, high efficiency coding, it also introduces a powerful error resistance technology capable of reducing subjective image degradation even when transmission path errors occur. The ITU-T is also working on standardizing Recommendation H.26L as a next-generation picture coding method.
Unlike conventional video coding techniques, H.26L uses a coding distortion removal method accompanied by complex processing to remove coding distortion. Block unit coding methods using orthogonal transforms such as the DCT techniques widely used in video coding are known to be subject to a grid-like distortion known as block distortion at the coding block boundaries. Because image quality loss in low frequency components is more conspicuous than image quality loss in high frequency components, the low frequency components are coded more faithfully than the high frequency components in block unit coding. Furthermore, because natural images captured with a camera, for example, contain more low frequency components than high frequency components, the coding blocks contain more low frequency components than high frequency components. The coding blocks therefore tend to have substantially no high frequency components and adjacent pixels in a block tend to have substantially the same pixel value.
Furthermore, because coding is by block unit, there is no assurance that the pixel values will be substantially the same at the boundary between adjacent blocks, that is, that the pixel values will change continuously across the block boundary, even if the pixel values are substantially identical within each block. This is illustrated in FIG. 31, which describes the concept of coding distortion removal. In the source image, the change in pixel values is smooth and continuous across the block boundary indicated by the dotted line, as shown in FIG. 31 (a). After the source image is coded by block unit, the pixel values change continuously within each block, but block distortion, that is, a discontinuity in pixel values only at the block boundary, occurs as shown in FIG. 31 (b). Block distortion is thus a significant image quality problem resulting from image coding, but can be reduced by correcting the pixel values to be continuous across the block boundary as shown in FIG. 31 (c). This process of reducing block distortion is called coding distortion removal (also referred to as “deblocking”).
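The effect described with reference to FIG. 31 can be reproduced on a one-dimensional row of pixels. The following is a minimal sketch under strong simplifying assumptions: block coding is imitated by keeping only the mean (the lowest frequency term) of each block, and the correction is a plain linear ramp across the boundary. Neither function is the method of any standard, and all names are hypothetical.

```python
def code_blocks_dc_only(signal, block_size):
    """Crude stand-in for block coding: keep only each block's mean value."""
    coded = []
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        mean = sum(block) / len(block)
        coded.extend([mean] * len(block))
    return coded


def smooth_boundary(signal, boundary, width=2):
    """Replace `width` pixels on each side of `boundary` with a linear ramp.

    `boundary` is the index of the first pixel of the right-hand block.
    The ramp runs between the nearest pixels left untouched on each side.
    """
    out = list(signal)
    lo = boundary - width - 1   # last untouched pixel on the left
    hi = boundary + width       # first untouched pixel on the right
    for i in range(lo + 1, hi):
        t = (i - lo) / (hi - lo)
        out[i] = signal[lo] + t * (signal[hi] - signal[lo])
    return out


def max_step(signal):
    """Largest difference between horizontally adjacent pixels."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


source = list(range(16))                     # smooth ramp, as in FIG. 31 (a)
coded = code_blocks_dc_only(source, 8)       # flat blocks with a jump, as in FIG. 31 (b)
deblocked = smooth_boundary(coded, 8)        # jump spread out, as in FIG. 31 (c)

print(max_step(source), max_step(coded), max_step(deblocked))
```

In this toy case the coded row is flat inside each block and jumps only at the boundary; spreading the jump over a few pixels on either side restores a gradual transition.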
When deblocking is applied at the video decoding stage, the deblocking filter can be used as a post filter as shown in the block diagram of a video decoder using a conventional decoding method in FIG. 32, or it can be used as an in-loop filter as shown in the block diagram of a video decoder using a conventional decoding method in FIG. 33. The configurations shown in these block diagrams are described below.
In the block diagram of a video decoder using a conventional decoding method shown in FIG. 32, a variable length decoder 52 variable length decodes encoded signal Str and outputs frequency code component DCoef. A de-zigzag scanning unit 54 rearranges the frequency components of the frequency code component DCoef in two-dimensional blocks, and outputs frequency component FCoef, the block unit frequency components. The reverse cosine transform unit 56 applies dequantization and reverse DCT operations to frequency component FCoef, and outputs difference image DifCoef.
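The rearrangement performed by the de-zigzag scanning unit 54 can be sketched as follows. The block size of 4×4 and the particular zigzag order (the diagonal order commonly used in MPEG-style codecs) are assumptions for illustration; the text above does not specify either, and the function names are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, column) positions of an n x n block in zigzag scan order.

    Positions are grouped by anti-diagonal (row + column). Odd diagonals are
    walked with the row increasing, even diagonals with the row decreasing.
    """
    cells = [(r, c) for r in range(n) for c in range(n)]

    def key(cell):
        r, c = cell
        diagonal = r + c
        return (diagonal, r if diagonal % 2 else -r)

    return sorted(cells, key=key)


def de_zigzag(coefficients, n):
    """Place a one-dimensional coefficient list back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coefficients, zigzag_order(n)):
        block[r][c] = value
    return block


# Feeding in the scan positions 0..15 shows where each one lands in the block.
for row in de_zigzag(list(range(16)), 4):
    print(row)
```

Because the scan visits low frequency positions first, the leading entries of the one-dimensional sequence end up in the top-left corner of the block.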
Motion compensator 60 outputs the pixel at the position indicated by externally input motion vector MV from the reference image Ref accumulated in memory 64 as motion compensated image MCpel. Adder 58 adds difference image DifCoef and motion compensated image MCpel to output reconstructed image Coef. Deblocking filter 62 applies coding distortion removal to reconstructed image Coef, and outputs decoded image signal Vout. Reconstructed image Coef is stored in memory 64, and used as reference image Ref for the next image decoding.
The block diagram in FIG. 33 of a video decoder using a conventional decoding method is substantially identical to the block diagram of a video decoder shown in FIG. 32, but differs in the location of the deblocking filter 62. As shown in FIG. 33, the decoded image signal Vout output from deblocking filter 62 is stored to memory 64.
The block diagram in FIG. 32 of a video decoder using a conventional decoding method shows the configuration and method used in MPEG-1, MPEG-2, MPEG-4, and H.263. The block diagram in FIG. 33 of a video decoder using a conventional decoding method shows the configuration and method used in H.261 and H.26L TML8.
With the video decoder shown in the block diagram in FIG. 32, the reconstructed image Coef stored to memory 64 is not dependent upon the method applied by the deblocking filter 62. This allows developing and implementing various kinds of deblocking filters 62, including complex yet high performance filters as well as simple filters with relatively little effect, according to the performance of the available hardware and the specific application. The advantage is that a deblocking filter 62 appropriate to the device can be used.
With the video decoder shown in the block diagram in FIG. 33, the decoded image signal Vout stored to memory 64 is dependent upon the method employed by the deblocking filter 62. The problem here is that the filter cannot be changed to one appropriate to the hardware or application, but the advantage is that the same level of coding distortion removal can be assured in every device.
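The difference between the two configurations comes down to which image is written to memory 64. The following is a minimal sketch of both decoding loops on one-dimensional images, assuming a zero motion vector so that the motion compensated image is simply the stored reference. The two filters and all function names are hypothetical and stand in for any deblocking filter 62.

```python
def identity(image):
    """A deblocking filter that does nothing."""
    return list(image)


def smooth(image):
    """Hypothetical deblocking filter: [1, 2, 1] / 4 smoothing with edge clamping."""
    last = len(image) - 1
    return [
        (image[max(i - 1, 0)] + 2 * image[i] + image[min(i + 1, last)]) / 4
        for i in range(len(image))
    ]


def decode(difference_images, deblock, in_loop):
    """Decode a sequence and return (output images, images stored as reference).

    in_loop=False models FIG. 32 (post filter): Coef is stored as the reference.
    in_loop=True  models FIG. 33 (in-loop filter): Vout is stored as the reference.
    """
    reference = [0] * len(difference_images[0])
    outputs, references = [], []
    for dif in difference_images:
        mcpel = reference                             # zero motion vector assumed
        coef = [d + m for d, m in zip(dif, mcpel)]    # adder 58
        vout = deblock(coef)                          # deblocking filter 62
        reference = vout if in_loop else coef         # contents of memory 64
        outputs.append(vout)
        references.append(reference)
    return outputs, references


difference_images = [[0, 0, 8, 8], [1, 1, 1, 1]]

_, post_refs_identity = decode(difference_images, identity, in_loop=False)
_, post_refs_smooth = decode(difference_images, smooth, in_loop=False)
_, loop_refs_identity = decode(difference_images, identity, in_loop=True)
_, loop_refs_smooth = decode(difference_images, smooth, in_loop=True)

print(post_refs_identity == post_refs_smooth)   # same references with either filter
print(loop_refs_identity == loop_refs_smooth)   # references depend on the filter
```

With the post filter arrangement the stored references are identical whichever filter is chosen, so the filter can be swapped freely. With the in-loop arrangement the references change with the filter, so an encoder and decoder that use different filters would no longer predict from the same images.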
FIG. 34 is a block diagram of a coding distortion removal unit using the conventional coding distortion removal method. FIG. 34 shows the configuration of the deblocking filter 62 in FIG. 32 and FIG. 33 in detail. To efficiently remove only coding distortion from an image signal containing coding distortion, it is important to determine the amount and tendency of coding distortion in the image signal and then apply appropriate filtering so as to not degrade the actual image signal.
Because high frequency components account for much of the coding distortion, the general concept behind coding distortion removal is to survey the image signal to determine the ratio of high frequency components in the image signal, identify high frequency components in image signal pixels normally thought to not contain a high frequency component as coding distortion, and apply a high frequency component suppression filter to the coding distortion. This is possible because the correlation between adjacent pixels in an image signal is high, pixels containing a high frequency component are concentrated in edge areas, and dispersed high frequency components can be considered to be coding distortion.
This deblocking filter 62 was created by the inventors of the present invention based on content found in ITU-T Recommendation H.26L TML8.
Filtered pixel count controller 84 uses reconstructed image Coef to determine the pixel positions containing coding distortion, and outputs filtered pixel count FtrPel. Filter coefficient controller 86 uses filtered pixel count FtrPel and reconstructed image Coef to determine the filter coefficient (including the number of filter taps) appropriate to removing coding distortion from the indicated pixels, and outputs filter coefficient FtrTap. The filter processor 88 applies filtering to remove coding distortion from reconstructed image Coef using the filter coefficient indicated by filter coefficient FtrTap, and outputs decoded image signal Vout.
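The three-stage structure of FIG. 34 can be sketched on a single row of pixels crossing one block boundary. The decision rules, thresholds, and tap values below are hypothetical placeholders chosen only to make the data flow concrete; they are not taken from H.26L TML8. As a further simplification, the coefficient stage here looks only at the pixel count, whereas the filter coefficient controller 86 described above also uses reconstructed image Coef.

```python
def filtered_pixel_count(row, b, edge_thresh=20, flat_thresh=2, max_pixels=3):
    """Stage 1 (FtrPel): how many pixels on each side of boundary b to filter.

    b is the index of the first pixel of the right-hand block (1 <= b < len(row)).
    Hypothetical rule: a large step is treated as a real edge and left alone;
    otherwise the span grows while the pixels next to the boundary stay flat.
    """
    step = abs(row[b] - row[b - 1])
    if step == 0 or step >= edge_thresh:
        return 0
    n = 1
    while (n < max_pixels
           and b - 1 - n >= 0 and b + n < len(row)
           and abs(row[b - n] - row[b - 1 - n]) <= flat_thresh
           and abs(row[b + n] - row[b + n - 1]) <= flat_thresh):
        n += 1
    return n


def filter_coefficients(count):
    """Stage 2 (FtrTap): pick filter taps, including their number, from the count."""
    if count == 0:
        return [1]
    if count == 1:
        return [1, 2, 1]
    return [1, 2, 2, 2, 1]


def filter_processor(row, b, count, taps):
    """Stage 3: apply the taps to `count` pixels on each side of boundary b."""
    out = list(row)
    half = len(taps) // 2
    last = len(row) - 1
    total = sum(taps)
    for i in range(b - count, b + count):
        acc = 0
        for k, weight in enumerate(taps):
            j = min(max(i + k - half, 0), last)   # clamp at the row ends
            acc += weight * row[j]                # always read unfiltered pixels
        out[i] = acc / total
    return out


def deblock_row(row, b):
    """Run the three stages in sequence for one boundary."""
    count = filtered_pixel_count(row, b)
    taps = filter_coefficients(count)
    return filter_processor(row, b, count, taps)


blocky = [10, 10, 10, 10, 18, 18, 18, 18]        # small step between flat blocks
real_edge = [10, 10, 10, 10, 200, 200, 200, 200]  # large step, likely image content

print(deblock_row(blocky, 4))
print(deblock_row(real_edge, 4))
```

In this sketch a small step between two flat blocks is spread over several pixels, while a large step is left untouched on the assumption that it is a genuine edge in the image rather than coding distortion.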