1. Field of the Invention
The present invention relates to a method for encoding and decoding video signals using scalable Motion Compensated Temporal Filtering (MCTF).
2. Description of the Related Art
While television broadcast signals require high bandwidth, it is difficult to allocate such high bandwidth for the type of wireless transmissions/receptions performed by mobile phones and notebook computers, for example. Thus, video compression standards for such devices must have high video signal compression efficiencies.
Such mobile devices have a variety of processing and presentation capabilities, so a variety of compressed video data forms must be prepared. This means that the same video source must be provided in a variety of forms corresponding to the various combinations of variables such as the number of frames transmitted per second, the resolution, and the number of bits per pixel. This imposes a great burden on content providers.
In view of the above, content providers prepare high-bitrate compressed video signals for each video source and, when receiving a request from a mobile device, decode the compressed video signals and encode them back into video data suited to the video processing capabilities of the mobile device before providing the requested video signals to the mobile device. However, this method entails a transcoding procedure including decoding, scaling and encoding processes, which causes some time delay in providing the requested signals to the mobile device. The transcoding procedure also requires complex hardware and algorithms to cope with the wide variety of target encoding formats.
A Scalable Video Codec (SVC) has been developed in an attempt to overcome these problems. This scheme encodes video signals into a sequence of pictures with the highest image quality while ensuring a certain level of image quality of the video when using part of the encoded picture sequence (specifically, a partial sequence of pictures intermittently selected from the total sequence of frames).
Motion Compensated Temporal Filtering (MCTF) is an encoding and decoding scheme that has been suggested for use in the scalable video codec.
Although it is possible to represent video signals at low image quality by receiving and processing a part of the sequence of pictures encoded in the scalable MCTF coding scheme as described above, the image quality is significantly degraded as the bitrate is lowered. One solution to this problem is to provide an auxiliary picture sequence for low bitrates, for example, a sequence of pictures that have a small screen size and/or a small number of frames per second. The auxiliary picture sequence is referred to as a base layer, and the main picture sequence is referred to as an enhanced layer. Video signals of the base and enhanced layers have redundancy since the same video signal source is encoded into the two layers.
To improve the coding efficiency of the enhanced layer according to the MCTF scheme, one method generates a predicted image for each video frame of the enhanced layer based on a video frame of the base layer temporally coincident with the enhanced layer video frame. FIG. 1 illustrates how the predicted image for each video frame of the enhanced layer is generated based on a temporally coincident video frame of the base layer.
In this method, a small-screen auxiliary picture composed of a specific number of macroblocks of the base layer is upsampled so that the auxiliary picture is enlarged to have the same screen size as a video frame of the enhanced layer (S10). To produce a predicted image for a current macroblock EM10 in an enhanced layer frame E100, which is temporally coincident with the enlarged base layer picture B100, prediction is performed for the current macroblock EM10 based on a macroblock BM10 at the same position as the macroblock EM10 (S11). The difference (i.e., residual) of the macroblock EM10 of the enhanced layer from the macroblock BM10 of the base layer is encoded into the macroblock EM10 of the enhanced layer. The base layer macroblock BM10 used herein must be a block encoded in an intra mode. This is because a block predicted from a different block in the same frame can be restored to a block having its original pixel values using that different block.
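The two steps above (S10 and S11) can be sketched as follows. This is a minimal illustration only: the function names, the 2x enlargement, the 2x2 block size, the pixel values, and the use of simple pixel replication for enlargement are all assumptions made for the example and are not specified in the description above.

```python
def upsample(picture, factor):
    """Enlarge a 2-D list of pixels by an integer factor (S10).
    Pixel replication is an assumed, simplified interpolation."""
    out = []
    for row in picture:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def residual(enh_block, base_block):
    """Difference of the enhanced layer block from the base layer block (S11)."""
    return [[e - b for e, b in zip(er, br)]
            for er, br in zip(enh_block, base_block)]


def reconstruct(res, base_block):
    """Decoder side: restore the enhanced layer block from the residual."""
    return [[r + b for r, b in zip(rr, br)]
            for rr, br in zip(res, base_block)]


# Hypothetical 2x2 base layer picture and 4x4 enhanced layer frame.
base = [[10, 20],
        [30, 40]]
enhanced = [[11, 10, 21, 19],
            [10, 12, 20, 22],
            [31, 30, 41, 40],
            [29, 30, 40, 42]]

enlarged = upsample(base, 2)       # same screen size as the enhanced frame
SIZE = 2                           # assumed block size for the example
em = block(enhanced, 0, 2, SIZE)   # current block (plays the role of EM10)
bm = block(enlarged, 0, 2, SIZE)   # block at the same position (BM10)
res = residual(em, bm)             # values encoded in place of the pixels
```

In this sketch the residual holds small values because the two blocks are similar, and adding it back to the base layer block with `reconstruct(res, bm)` restores the original enhanced layer block.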
A base layer frame that is temporally coincident with an enhanced layer frame may not be present, since the base and enhanced layers are encoded at different frame rates. If a base layer frame and an enhanced layer frame are present within a certain time gap, the two frames can be regarded as temporally coincident and a predicted image for the enhanced layer frame can be produced based on the base layer frame. However, even when the two frames are within the certain time gap and are thus regarded as temporally coincident, the degree of identity between the image of a target macroblock in the enhanced layer frame, which is to be converted into a predicted image, and the image of the macroblock at the same position in the enlarged base layer frame may be reduced. Further, since the small-screen frame of the base layer is enlarged for use, the positions of macroblocks having the same image in the two layer frames may differ by several pixel lines depending on the degree of enlargement.
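The time-gap test described above can be sketched as follows. The function name, the timestamps in milliseconds, the frame rates, and the 20 ms gap are hypothetical values chosen for illustration; the description does not specify how the gap is chosen or how ties are resolved.

```python
def find_coincident_base(enh_time, base_times, max_gap):
    """Return the timestamp of the nearest base layer frame lying within
    max_gap of the enhanced layer frame, or None if there is none."""
    best = None
    for t in base_times:
        gap = abs(t - enh_time)
        if gap <= max_gap and (best is None or gap < abs(best - enh_time)):
            best = t
    return best


# Assumed timestamps in milliseconds: base layer at 10 frames per second,
# enhanced layer at 25 frames per second.
base_times = [0, 100, 200]
enh_times = [0, 40, 80, 120, 160, 200]
MAX_GAP = 20  # assumed "certain time gap"

matches = {t: find_coincident_base(t, base_times, MAX_GAP) for t in enh_times}
```

With these assumed values, the enhanced layer frames at 80 ms and 120 ms are both regarded as coincident with the base layer frame at 100 ms although neither is captured at exactly that instant, while the frames at 40 ms and 160 ms have no coincident base layer frame.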
In this case, the highest coding efficiency cannot be achieved if a predicted image for a target macroblock in the enhanced layer frame is produced using a macroblock in the enlarged base layer frame at the same position as the target macroblock as described above.
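The loss in efficiency can be illustrated with a small numerical sketch. The frames below are synthetic, the one-line shift is assumed, and the sum of absolute differences (SAD) is used only as an assumed stand-in for the size of the residual that would have to be encoded; none of these details come from the description above.

```python
def sad(frame_a, top_a, left_a, frame_b, top_b, left_b, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            a = frame_a[top_a + dy][left_a + dx]
            b = frame_b[top_b + dy][left_b + dx]
            total += abs(a - b)
    return total


# Synthetic 6x6 enlarged base layer frame: brightness grows by 10 per line.
enlarged_base = [[10 * r] * 6 for r in range(6)]
# Synthetic enhanced layer frame: the same image displaced down by one line.
enhanced = [[10 * max(r - 1, 0)] * 6 for r in range(6)]

SIZE = 2            # assumed block size for the example
TOP, LEFT = 2, 2    # position of the target block in the enhanced frame

# Residual size when the block at the same position is used.
colocated = sad(enhanced, TOP, LEFT, enlarged_base, TOP, LEFT, SIZE)
# Residual size for the base layer block one pixel line higher.
displaced = sad(enhanced, TOP, LEFT, enlarged_base, TOP - 1, LEFT, SIZE)
```

In this synthetic case the block at the same position leaves a nonzero residual, whereas the block displaced by one pixel line matches the target exactly, which is the situation in which same-position prediction is not the most efficient choice.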