1. Field of the Invention
The present invention relates to a method for encoding and decoding a video signal, and more particularly to a method for encoding a video signal by employing a residual prediction mode and decoding the encoded video data.
2. Description of the Prior Art
It is difficult to allocate a bandwidth as wide as that available for TV signals to digital video signals transmitted and received wirelessly by portable phones and notebook computers, which are already in extensive use, and by mobile TVs and handheld PCs, which are expected to be extensively used in the future. Accordingly, a standard for a video compression scheme for such portable devices must enable a video signal to be compressed with relatively high efficiency.
In addition, such portable mobile devices have various processing and presentation capabilities. Accordingly, compressed video must be prepared in various forms corresponding to the capabilities of the portable devices. Therefore, for one video source, video data of various qualities must be provided, obtained by combining various parameters including the number of transmitted frames per second, the resolution, and the number of bits per pixel, which burdens content providers.
For this reason, the content provider prepares compressed video data having a high bit rate for each video source and, when a portable device requests the video, decodes the compressed video and then encodes the decoded video into video data suited to the video processing capability of the requesting device. However, since this procedure necessarily requires trans-coding (decoding+scaling+encoding), it causes a time delay in providing the requested video. In addition, the trans-coding requires complex hardware devices and algorithms because of the variety of target encodings.
In order to overcome these disadvantages, a Scalable Video Codec (SVC) scheme has been suggested. According to the SVC scheme, a video signal is encoded at the best video quality in such a manner that video quality can still be ensured even when only a part of the picture sequence derived from the encoding (a frame sequence intermittently selected from the overall picture sequence) is decoded.
A motion compensated temporal filter (or filtering) (MCTF) is an encoding scheme suggested for the SVC scheme. The MCTF scheme requires high compression efficiency, that is, high coding efficiency in order to lower the number of transmitted bits per second because the MCTF scheme is mainly employed under a transmission environment such as mobile communication having a restricted bandwidth.
As described above, although it is possible to ensure video quality even if only a part of a picture sequence encoded through the MCTF, which is a kind of SVC scheme, is received and processed, video quality may be remarkably degraded if the bit rate is lowered. In order to overcome this problem, an additional assistant picture sequence for a low transmission rate, for example, a picture sequence having a small picture size and/or a smaller number of frames per second, may be provided.
The assistant picture sequence is called a base layer, and the main picture sequence is called an enhanced (or enhancement) layer. Since the base layer and the enhanced layer are obtained by encoding the same video content with different spatial resolutions and frame rates, redundant information exists in the video signals of both layers. Accordingly, in order to improve the coding efficiency of the enhanced layer, a variety of schemes for predicting a frame of the enhanced layer based on a frame of the base layer have been suggested.
For example, there is a scheme for coding a motion vector of an enhanced layer picture by using a motion vector of the base layer picture temporally coincident with the enhanced layer picture. In addition, it is possible to make a prediction video for a video frame of the enhanced layer on the basis of the video frame of the base layer temporally coincident with the video frame of the enhanced layer.
In addition, a further prediction operation may be performed on the prediction video of the enhanced layer created in relation to the main picture sequence by using a prediction video of the base layer created in relation to the assistant picture sequence. This is called a “residual prediction” mode. Herein, the prediction video denotes the image difference values found by performing a prediction operation for a macro block. In other words, the prediction video denotes a video having residual data. Hereinafter, a macro block having residual data is called a “residual block”, and a frame having residual data is called a “residual frame”.
In more detail, a residual block of the enhanced layer is found through a prediction operation for a macro block in a predetermined frame of the main picture sequence. The prediction operation is also performed for the assistant picture sequence, thereby creating the residual blocks and residual frames of the base layer. Thereafter, the residual block of the base layer corresponding to the macro block is found. The residual block of the base layer undergoes up-sampling, so that it is enlarged to the size of the macro block. The pixel values of the enlarged residual block of the base layer are subtracted from the pixel values of the residual block of the enhanced layer, and the resultant difference values are encoded for the macro block.
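The up-sampling and subtraction steps described above can be illustrated by the following minimal sketch. It is given only as an example under stated assumptions and is not the procedure of any particular codec: the 2x nearest-neighbour up-sampling filter, the plain list-of-lists block layout, and the function names upsample_2x, residual_prediction, and reconstruct_residual are all hypothetical choices made for illustration.

```python
def upsample_2x(block):
    """Enlarge a 2-D block by 2x in each direction.

    Nearest-neighbour repetition is an assumed, simplified filter;
    the text does not specify which up-sampling filter is used.
    """
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def residual_prediction(enh_residual, base_residual):
    """Subtract the up-sampled base-layer residual from the enhanced-layer residual.

    The returned difference values are what would be encoded for the macro block.
    """
    up = upsample_2x(base_residual)
    return [[e - b for e, b in zip(e_row, b_row)]
            for e_row, b_row in zip(enh_residual, up)]


def reconstruct_residual(coded, base_residual):
    """Decoder-side inverse: add the up-sampled base-layer residual back."""
    up = upsample_2x(base_residual)
    return [[c + b for c, b in zip(c_row, b_row)]
            for c_row, b_row in zip(coded, up)]


# Hypothetical 2x2 base-layer residual block and 4x4 enhanced-layer residual block.
base = [[1, 2],
        [3, 4]]
enh = [[2, 1, 2, 3],
       [1, 1, 2, 2],
       [3, 4, 4, 4],
       [3, 3, 5, 4]]

coded = residual_prediction(enh, base)
restored = reconstruct_residual(coded, base)
```

In this sketch, the difference values in coded are mostly zero because the two residual blocks are similar, which is the redundancy the residual prediction mode is intended to exploit.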
FIG. 1 illustrates the conventional residual prediction mode based on a macro block. Herein, an enhanced layer has a frame rate of 30 Hz and frame resolution of CIF. In contrast, a base layer has a frame rate of 15 Hz and frame resolution of QCIF.
A fourth residual block R_MB_4 and a fifth residual block R_MB_5 of the base layer, which correspond to a first residual block R_MB_1 and a third residual block R_MB_3 encoded using residual data in the enhanced layer, undergo up-sampling such that the resolution of the residual blocks of the base layer is raised to CIF. Based on the residual blocks of the base layer having the raised resolution, the prediction operation for the residual blocks of the enhanced layer is performed.
However, since a residual block of the base layer corresponding to the second residual block R_MB_2 of the enhanced layer may not exist, it is difficult to find a predicted video for that block based on a residual block of the base layer.
In other words, in order to apply the residual prediction mode, a residual block of the base layer corresponding to a macro block of the enhanced layer must exist. Likewise, the residual prediction mode cannot be applied if a residual frame temporally coincident with the frame including the macro block of the enhanced layer does not exist in the base layer.
Accordingly, when the enhanced layer has a frame rate higher than that of the base layer, a frame temporally coincident with a frame including a macro block of the enhanced layer to be encoded using residual data or difference values of residual data may not exist in the base layer. Such a frame, which does not exist in the base layer, is called a ‘missing picture’. The residual prediction mode cannot be applied to a macro block of the enhanced layer if the corresponding picture is missing from the base layer. Accordingly, an improvement in coding efficiency is difficult to expect for such macro blocks.
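As a simple illustration of the missing picture condition, the following sketch lists the enhanced layer frames that have no temporally coincident base layer frame. It assumes that both layers start at the same time instant and have constant integer frame rates; the function name missing_picture_indices and the frame indexing are hypothetical and are introduced only for this example.

```python
def missing_picture_indices(enh_rate_hz, base_rate_hz, num_enh_frames):
    """Return enhanced-layer frame indices with no coincident base-layer frame.

    Enhanced frame i is displayed at time i / enh_rate_hz. A base-layer frame
    exists at that time only if i * base_rate_hz / enh_rate_hz is an integer
    (assumption: both layers start at time zero with constant frame rates).
    """
    return [i for i in range(num_enh_frames)
            if (i * base_rate_hz) % enh_rate_hz != 0]


# Frame rates of the FIG. 1 example: enhanced layer 30 Hz, base layer 15 Hz.
missing = missing_picture_indices(30, 15, 6)
```

With the frame rates of FIG. 1, every second frame of the enhanced layer has no counterpart in the base layer, so the residual prediction mode as described above is unavailable for the macro blocks of those frames.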