It is difficult to assign a wide bandwidth, such as the bandwidth required for television (TV) signals, to digital video signals that are transmitted in a wireless manner through mobile phones or notebook computers, which are currently widely used, or through mobile TVs or hand-held Personal Computers (PCs), which will be widely used in the future. Therefore, a standard to be used in a video compression scheme for such mobile devices needs to have higher video signal compression efficiency.
Moreover, such mobile devices inevitably have varying inherent capabilities of processing or presenting video signals. Therefore, a compressed image must be variously prepared in advance to correspond to such capabilities, which means that video data having various image qualities, with respect to various combined parameters, such as the number of frames per second, the resolution, and the number of bits per pixel, must be provided for a single image source, thus inevitably placing a great burden on content providers.
For this reason, a content provider prepares compressed video data having a high bit rate for each individual image source, and, when the mobile device requests the video data, performs a procedure of decoding a compressed image and encoding the decoded image into video data suitable for the video processing capability of the mobile device that requested the image, and then provides the encoded video data. However, such a scheme must be accompanied by a transcoding (decoding+encoding) procedure, so that a slight time delay occurs at the time of providing the image requested by the mobile device. Further, the transcoding procedure also requires complicated hardware devices and algorithms depending on the variety of encoding targets.
A Scalable Video Codec (SVC) has been proposed to overcome these obstacles. SVC is a scheme of encoding video signals at the highest image quality while enabling image quality to be secured to some degree even when only a part of the picture sequence generated by the encoding (a sequence of frames intermittently selected from the entire sequence) is decoded and used. A Motion Compensated Temporal Filter (MCTF) scheme is an example of an encoding scheme proposed for use in a scalable video codec.
As described above, even if only a partial sequence of a picture sequence encoded by the MCTF, which is a scalable scheme, is received and processed, image quality can be secured to some degree. However, if the bit rate is decreased, the deterioration of image quality becomes serious. In order to solve this problem, a separate sub-picture sequence for a low bit rate, for example, a picture sequence having a small screen size and/or a small number of frames per second, can be provided.
Such a sub-picture sequence is called a base layer, and the main picture sequence is called an enhanced (or enhancement) layer. However, since the base layer and the enhanced layer are obtained by encoding the same image content, redundant information (redundancy) exists in the video signals of the two layers. Therefore, in order to improve the coding efficiency of the enhanced layer, a video frame of the enhanced layer may be converted into a predictive image on the basis of an arbitrary video frame of the base layer temporally coincident with the video frame of the enhanced layer, or the motion vector of an enhanced layer picture may be coded using the motion vector of a base layer picture temporally coincident with the enhanced layer picture. FIG. 1 illustrates a coding procedure using the motion vector of the base layer picture.
The motion vector coding procedure of FIG. 1 is described. If the frame of a base layer has a smaller screen size than that of the frame of an enhanced layer, a frame F1 of the base layer, temporally coincident with a frame F10 of the enhanced layer to be generated as a current predictive image, is extended to have the same size as the enhanced layer frame F10. In this case, the motion vectors of respective macroblocks in the base layer frame are scaled at the same extension rate as that of the base layer frame F1.
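The scaling of a base layer motion vector at the same extension rate as the frame can be illustrated with a minimal sketch. The names `MotionVector` and `scale_motion_vector`, the frame sizes, and the rounding to the nearest integer are assumptions made for illustration only; the text above does not specify vector units or a rounding rule.

```python
from fractions import Fraction
from typing import NamedTuple, Tuple


class MotionVector(NamedTuple):
    x: int  # horizontal displacement (units are not specified in the text)
    y: int  # vertical displacement


def scale_motion_vector(mv: MotionVector,
                        base_size: Tuple[int, int],
                        enh_size: Tuple[int, int]) -> MotionVector:
    """Scale a base layer motion vector by the frame extension rate.

    base_size and enh_size are (width, height) pairs. Rounding to the
    nearest integer is an assumption of this sketch.
    """
    rate_x = Fraction(enh_size[0], base_size[0])
    rate_y = Fraction(enh_size[1], base_size[1])
    return MotionVector(round(mv.x * rate_x), round(mv.y * rate_y))


# Assumed example: a 176x144 base layer frame extended to 352x288.
mv_bl1 = MotionVector(3, -5)
mv_scaled_bl1 = scale_motion_vector(mv_bl1, (176, 144), (352, 288))
print(mv_scaled_bl1)
```

With these assumed sizes the extension rate is 2 in both directions, so each component of the base layer vector is doubled.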
Further, a motion vector mv1 is detected through a motion estimation operation on an arbitrary macroblock MB10 within the enhanced layer frame F10. The motion vector mv1 is compared to a scaled motion vector mvScaledBL1 of a motion vector mvBL1 (this motion vector is obtained by a base layer encoder prior to the encoding of the enhanced layer) of a macroblock MB1 in the base layer frame F1 covering an area corresponding to the macroblock MB10 (if the enhanced layer and the base layer use macroblocks having the same size, for example, 16×16 macroblocks, the macroblock of the base layer covers a wider area in a frame than does the macroblock of the enhanced layer).
If the two vectors mv1 and mvScaledBL1 are equal to each other, a value indicating that the motion vector mv1 of the macroblock MB10 in the enhanced layer is equal to the scaled motion vector of the corresponding block MB1 of the base layer is recorded in a block mode. In contrast, if the two vectors differ, the difference vector ‘mv1−mvScaledBL1’ is coded instead of the vector mv1 whenever coding the difference vector is more advantageous than coding the vector mv1 itself, thus reducing the amount of vector-coded data at the time of coding the enhanced layer.
However, since the base layer and the enhanced layer have different encoded frame rates, there exists a plurality of frames of the enhanced layer that do not have temporally coincident frames in the base layer. A frame B of FIG. 1 is an example of such a frame. Since the frame B does not have a temporally coincident base layer frame, the above method cannot be applied to the frame B.
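For frames that do have a temporally coincident base layer frame, the comparison of mv1 with mvScaledBL1 described above might be sketched as follows. The mode labels and the function names are hypothetical, and the cost measure (the bit length of a signed Exp-Golomb code for each component) is an assumption standing in for whatever criterion an actual encoder would use to judge which vector is cheaper to code.

```python
from typing import NamedTuple, Optional, Tuple


class MotionVector(NamedTuple):
    x: int
    y: int


def signed_exp_golomb_bits(value: int) -> int:
    """Bit length of a signed Exp-Golomb code (assumed cost measure)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def vector_cost(mv: MotionVector) -> int:
    """Approximate number of bits needed to code both components."""
    return signed_exp_golomb_bits(mv.x) + signed_exp_golomb_bits(mv.y)


def choose_mv_coding(mv1: MotionVector,
                     mv_scaled_bl1: MotionVector
                     ) -> Tuple[str, Optional[MotionVector]]:
    """Return a (mode, payload) pair; the mode labels are hypothetical."""
    if mv1 == mv_scaled_bl1:
        # Only a flag in the block mode is recorded; no vector data.
        return ("SAME_AS_BASE", None)
    diff = MotionVector(mv1.x - mv_scaled_bl1.x, mv1.y - mv_scaled_bl1.y)
    if vector_cost(diff) < vector_cost(mv1):
        # Coding the difference vector is cheaper than coding mv1.
        return ("DIFF_FROM_BASE", diff)
    return ("DIRECT", mv1)


print(choose_mv_coding(MotionVector(6, -10), MotionVector(6, -10)))
print(choose_mv_coding(MotionVector(7, -10), MotionVector(6, -10)))
print(choose_mv_coding(MotionVector(1, 0), MotionVector(-20, 15)))
```

The three calls exercise the three outcomes: identical vectors, a small difference that is cheaper to code than mv1, and a base layer vector so dissimilar that coding mv1 directly is preferable.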
However, even if the frames do not temporally coincide with each other, an enhanced layer frame and a base layer frame having a small temporal gap therebetween are adjacent images, so that there is a high probability that the frames are correlated with respect to motion estimation. In other words, there is a high probability that the directions of their motion vectors are similar to each other, so, even in this case, coding efficiency can be improved using the motion vector of the base layer.
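As a minimal illustration of temporal coincidence, the sketch below determines, for a given enhanced layer frame, which base layer frames lie immediately before and after it in time. The function names and the frame rates of 30 and 15 frames per second are assumptions chosen for illustration, and both sequences are assumed to start at the same instant.

```python
import math
from fractions import Fraction
from typing import Tuple


def base_layer_neighbours(enh_index: int,
                          enh_fps: int,
                          base_fps: int) -> Tuple[int, int]:
    """Indices of the base layer frames at or around an enhanced frame.

    Both indices are equal when a temporally coincident base layer
    frame exists.
    """
    # Position of the enhanced layer frame on the base layer time axis.
    position = Fraction(enh_index * base_fps, enh_fps)
    return math.floor(position), math.ceil(position)


def has_coincident_base_frame(enh_index: int,
                              enh_fps: int,
                              base_fps: int) -> bool:
    prev_idx, next_idx = base_layer_neighbours(enh_index, enh_fps, base_fps)
    return prev_idx == next_idx


# Assumed rates: enhanced layer at 30 fps, base layer at 15 fps.
for i in range(4):
    print(i, base_layer_neighbours(i, 30, 15),
          has_coincident_base_frame(i, 30, 15))
```

Under these assumed rates, every second enhanced layer frame has no coincident base layer frame, yet it sits only half a base layer frame period away from each of its two neighbours.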