Digital video signals are transmitted wirelessly through mobile phones and notebook computers, which are currently widely used, and through mobile TVs and hand-held Personal Computers (PCs), which will be widely used in the future. It is difficult to assign such signals a wide bandwidth like that required for television (TV) signals. Therefore, a standard to be used in a video compression scheme for such mobile devices needs to have higher video signal compression efficiency.
Moreover, such mobile devices inevitably differ in their inherent capabilities of processing or presenting video signals. Therefore, compressed images must be prepared in advance in several versions to correspond to these capabilities. This means that, for a single image source, video data must be provided at various image qualities covering various combinations of parameters, such as the number of frames per second, the resolution, and the number of bits per pixel, which inevitably places a great burden on content providers.
For this reason, a content provider prepares compressed video data having a high bit rate for each individual image source. When a mobile device requests the video data, the provider decodes the compressed image, encodes the decoded image into video data suitable for the video processing capability of the requesting device, and then provides the encoded video data. However, such a scheme must be accompanied by a transcoding (decoding + scaling + encoding) procedure, so that a slight time delay occurs when providing the image requested by the mobile device. Further, the transcoding procedure requires complicated hardware devices and algorithms, depending on the variety of encoding targets.
A Scalable Video Codec (SVC) has been proposed to overcome these obstacles. SVC is a scheme that encodes video signals at the highest image quality while ensuring that image quality can be secured to some degree even if only part of the entire picture (frame) sequence generated by the encoding (a sequence of frames intermittently selected from the entire sequence) is decoded.
A Motion Compensated Temporal Filter (MCTF) scheme is an example of an encoding scheme proposed for use in a scalable video codec. There is a high probability that the MCTF scheme will be applied to transmission environments with limited bandwidth, such as a mobile communication environment. Therefore, the MCTF scheme requires high compression efficiency, that is, high coding efficiency, in order to decrease the number of bits transmitted per second.
As described above, even if only a partial sequence of a picture sequence encoded by MCTF, which is a scalable scheme, is received and processed, image quality can be secured to some degree. However, if the bit rate is decreased, the deterioration in image quality becomes serious. To solve this problem, a separate sub-picture sequence for a low bit rate can be provided, for example, a picture sequence for small screens and/or a picture sequence having a small number of frames per second.
The sub-picture sequence is called a base layer, and the main picture sequence is called an enhanced (or enhancement) layer. However, since the base layer and the enhanced layer are obtained by encoding the same image content with different spatial resolutions and/or different frame rates, redundant information (redundancy) exists in the video signals of the two layers. Therefore, in order to improve the coding efficiency of the enhanced layer, the video signal of the enhanced layer is predicted and encoded using the motion information and/or texture information of the base layer. Such an encoding method is designated as an inter-layer prediction method.
Motion information of a base layer used in the inter-layer prediction method includes reference index information that indicates a picture (frame) including a reference block, motion vector information that indicates a displacement to the reference block, partitioning information of a corresponding block, etc. Here, the corresponding block is a block that is placed in the frame of the base layer temporally coincident with the frame of the enhanced layer that includes the macroblock to be encoded, and that has a region covering the macroblock when the block is magnified according to the ratio of the screen size of the enhanced layer to the screen size of the base layer.
FIG. 1 is a diagram showing an embodiment of a conventional method of deriving motion information of the macroblock of an enhanced layer, for example, partitioning information, reference index information, motion vector information, etc., from a base layer. In FIG. 1(a), an embodiment in which a reference index and a motion vector for a 4×4 sub-block b are derived from the base layer is shown.
First, a reference index and a motion vector for each of four corner pixels c1 to c4 of a block to be encoded can be set to a reference index and a motion vector, respectively, for the block of the base layer corresponding to each pixel.
However, when a block corresponding to each pixel does not exist in the base layer, as in the case where a temporally coincident frame does not exist in the base layer, or when a block corresponding to each corner pixel is encoded in an intra mode, the block b can be set to an intra block.
If a block corresponding to the corner pixel does not use a frame existing in a reference picture list List_0, the frame existing in List_0 and a motion vector directed toward the frame in List_0 are not set in the block b. This is equally applied to List_1.
A reference index rb(List_x) for the block b is set to the minimum value of the reference indices rci(List_x) determined for the respective corner pixels, and a motion vector mvb(List_x) for the block b is set to the mean of the motion vectors of the corner pixels having the set reference index rb(List_x).
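The rule for the 4×4 block b can be sketched for a single reference picture list as follows. This is a minimal illustration, not the reference implementation: the function name `derive_block_motion` and the data layout are hypothetical, the mean is taken in floating point (a real codec would define integer rounding), and the sketch assumes that the list is left unset only when none of the corner blocks uses it.

```python
from typing import Optional, Sequence, Tuple

MotionVector = Tuple[float, float]
# (reference index, motion vector) for one list, or None when the
# base-layer block under that corner pixel does not use the list.
CornerMotion = Optional[Tuple[int, MotionVector]]


def derive_block_motion(corners: Sequence[CornerMotion]) -> CornerMotion:
    """Derive (rb, mvb) for block b from its corner pixels c1..c4 (hypothetical helper)."""
    used = [c for c in corners if c is not None]
    if not used:
        # Assumption: no corner uses this list, so nothing is set for b.
        return None
    # rb(List_x): minimum of the corner reference indices rci(List_x).
    ref_idx = min(r for r, _ in used)
    # mvb(List_x): mean of the motion vectors of corners that have index rb.
    mvs = [mv for r, mv in used if r == ref_idx]
    mean_mv = (
        sum(x for x, _ in mvs) / len(mvs),
        sum(y for _, y in mvs) / len(mvs),
    )
    return ref_idx, mean_mv
```

For example, with corner values `(1, (4.0, 2.0))`, `(0, (2.0, 2.0))`, `(0, (4.0, 6.0))`, and `None`, the sketch selects reference index 0 and averages only the two motion vectors that carry that index.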
In FIG. 1(b), an embodiment in which motion information of an 8×8 block B is derived from 4×4 sub-blocks is shown.
In the case where all four 4×4 sub-blocks are intra blocks, the 8×8 block B is set to an intra block. In other cases, the reference index information and partitioning information of the 8×8 block B are determined through the following procedure.
For respective 4×4 sub-blocks, reference indices for reference picture lists List_0 and List_1 are set to the same values. Description is made using List_0 as an example, and the same operation is performed on List_1.
In the case where no 4×4 sub-block uses a frame in List_0, the reference index and the motion vector for List_0 are not set for the 8×8 block B.
In other cases, a reference index rB(List_0) for the 8×8 block B is calculated as the minimum value of the reference indices for the four 4×4 sub-blocks. The mean motion vector mvmean(List_0) of the 4×4 sub-blocks having the calculated reference index rB(List_0) is calculated. Further, among the 4×4 sub-blocks, the reference index and the motion vector for each of i) an intra block, ii) a block not using List_0, and iii) a block having a reference index rb(List_0) other than the calculated reference index rB(List_0) are forcibly set to the calculated reference index rB(List_0) and the calculated motion vector mvmean(List_0), respectively.
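The derivation for the 8×8 block B described above can be sketched for List_0 as follows. The `SubBlock` class and the function `merge_sub_blocks` are hypothetical names introduced for illustration only, and the floating-point mean stands in for whatever rounding an actual codec would specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubBlock:
    """One 4x4 sub-block, List_0 only (hypothetical data layout)."""
    intra: bool = False
    ref_idx: Optional[int] = None            # None: List_0 is not used
    mv: Optional[Tuple[float, float]] = None


def merge_sub_blocks(subs: List[SubBlock]) -> Optional[Tuple[int, Tuple[float, float]]]:
    """Return (rB, mvmean) for the 8x8 block B, or None if List_0 is unused.

    Sub-blocks that are intra, do not use List_0, or have a different
    reference index are overwritten in place with (rB, mvmean).
    """
    users = [s for s in subs if not s.intra and s.ref_idx is not None]
    if not users:
        return None
    r_big = min(s.ref_idx for s in users)                 # rB(List_0)
    chosen = [s for s in users if s.ref_idx == r_big]
    mv_mean = (
        sum(s.mv[0] for s in chosen) / len(chosen),
        sum(s.mv[1] for s in chosen) / len(chosen),
    )
    for s in subs:
        if s.intra or s.ref_idx is None or s.ref_idx != r_big:
            # The text only says index and vector are forced; the intra
            # flag is left untouched here, which is an assumption.
            s.ref_idx = r_big
            s.mv = mv_mean
    return r_big, mv_mean
```

After this step every sub-block carries the same reference index, so the later partitioning decision only needs to compare motion vectors.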
Thereafter, the partitioning mode for the 8×8 block B is determined as follows. If the motion vectors of two neighboring 4×4 sub-blocks are equal to each other, the sub-blocks are considered to be equal to each other, and are then combined with each other. In FIG. 1(b), if sub-blocks b1 and b2 are equal to each other, and b3 and b4 are equal to each other, the partitioning mode is determined to be a BLK_8×4 mode. At this time, if sub-blocks b1 and b3 are also equal to each other, the partitioning mode is determined to be a BLK_8×8 mode. Similarly, if sub-blocks b1 and b3 are equal to each other, and b2 and b4 are equal to each other, the partitioning mode is determined to be a BLK_4×8 mode. In other cases, the partitioning mode is determined to be a BLK_4×4 mode.
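The partitioning decision can be written as a short sketch. The function name `partition_mode` and the string labels are illustrative, and the sketch assumes that b1 and b2 form the top row and b3 and b4 the bottom row, which is consistent with the pairings in the text but is read from the description rather than from FIG. 1(b) itself.

```python
from typing import Tuple

MotionVector = Tuple[float, float]


def partition_mode(mv1: MotionVector, mv2: MotionVector,
                   mv3: MotionVector, mv4: MotionVector) -> str:
    """Pick the partitioning mode of 8x8 block B from sub-blocks b1..b4.

    Two sub-blocks count as equal when their motion vectors are equal
    (their reference indices were already unified in the previous step).
    """
    rows_merge = mv1 == mv2 and mv3 == mv4   # b1+b2 and b3+b4 combine
    cols_merge = mv1 == mv3 and mv2 == mv4   # b1+b3 and b2+b4 combine
    if rows_merge and mv1 == mv3:
        return "BLK_8x8"                     # all four sub-blocks are equal
    if rows_merge:
        return "BLK_8x4"
    if cols_merge:
        return "BLK_4x8"
    return "BLK_4x4"
```

Checking the all-equal case first keeps the branches mutually exclusive: whenever both pairings hold, all four motion vectors are identical and the 8×8 mode applies.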
However, when the ratio of the screen size (or resolution) of the enhanced layer to the screen size of the base layer is not a multiple of 2 (the non-dyadic case), for example, when the screen size of the base layer is ⅓, ⅔, etc. of that of the enhanced layer, it is not easy to derive motion information, such as reference index information, motion vector information, or partitioning information, from the base layer. As a result, the inter-layer prediction method cannot be sufficiently applied to the scalable encoding of the enhanced layer.
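One way to picture the non-dyadic difficulty is to map the horizontal extent of an enhanced-layer macroblock back onto the base layer and check whether it lands on block boundaries. The sketch below is only an illustration under stated assumptions: a 16-pixel macroblock, a 4-pixel base-layer block grid, and the hypothetical helpers `base_layer_span` and `is_aligned`, none of which come from the description above.

```python
from fractions import Fraction
from typing import Tuple


def base_layer_span(mb_x: int, ratio: Fraction, mb_size: int = 16) -> Tuple[Fraction, Fraction]:
    """Horizontal span [start, end) in base-layer pixels covered by the
    enhanced-layer macroblock that starts at pixel mb_x.

    ratio is enhanced-layer size divided by base-layer size.
    """
    return Fraction(mb_x) / ratio, Fraction(mb_x + mb_size) / ratio


def is_aligned(span: Tuple[Fraction, Fraction], grid: int = 4) -> bool:
    """True when both edges fall on whole pixels that are multiples of grid."""
    return all(edge.denominator == 1 and edge % grid == 0 for edge in span)


dyadic = base_layer_span(16, Fraction(2))          # base layer is 1/2 size
non_dyadic = base_layer_span(16, Fraction(3, 2))   # base layer is 2/3 size
```

With a ratio of 2, the macroblock starting at pixel 16 covers base-layer pixels 8 to 16, which coincides with block boundaries. With a ratio of 3/2, the same macroblock covers 32/3 to 64/3, so its edges fall between pixels and the corner-pixel and sub-block rules no longer map onto whole base-layer blocks.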