A Scalable Video Codec (SVC) scheme is a video signal encoding scheme that encodes video signals at the highest image quality and allows images to be represented at a certain level of image quality even when only part of the picture sequence produced by the encoding (a sequence of frames that are intermittently selected from among the entire picture sequence) is decoded and used.
Although a picture sequence encoded using the scalable method allows images to be represented at some level of image quality even if only a partial sequence thereof is received and processed, image quality is significantly degraded in the case where the bit rate is low. In order to overcome this problem, a separate auxiliary picture sequence for the low bit rate, for example, a picture sequence having a small screen size and/or a low frame rate, may be provided.
An auxiliary picture sequence is referred to as a base layer, and a main picture sequence is referred to as an enhanced layer or enhancement layer. Since a base layer and its enhanced layer result from the encoding of the same source video signals, redundancy exists between the video signals of the two layers. Accordingly, in the case where a base layer is provided, an interlayer prediction method may be employed so as to increase coding efficiency, in which the video signals of an enhanced layer are predicted using the motion information and/or texture information (corresponding to image data) of the base layer, and encoding is performed based on the prediction.
Prediction methods using the texture information of a base layer include an intra base prediction mode and a residual prediction mode.
An intra base prediction mode (simply referred to as an intra base mode) is a method of predicting and encoding a macro block of an enhanced layer based on a corresponding block of a base layer that has been encoded in an intra mode. Here, the corresponding block is a block that is located in the frame of the base layer temporally coincident with the frame including the macro block, and that has a region covering the macro block when enlarged at the ratio of the screen sizes of the enhanced layer and base layer. In this case, the corresponding block of the base layer is decoded into image data, and is then enlarged through up-sampling at the ratio of the screen sizes of the enhanced layer and base layer before being used.
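The location of the corresponding block described above can be sketched as follows. This is only an illustrative sketch: the function name `corresponding_base_region`, its parameters, the 16×16 macro block size, and the rounding of the region outward to whole pixels are assumptions made for illustration and are not taken from the description.

```python
def corresponding_base_region(mb_x, mb_y, enh_size, base_size, mb_size=16):
    """Return (x0, y0, x1, y1), the base-layer pixel rectangle (x1, y1 exclusive)
    that covers the enhanced-layer macro block at (mb_x, mb_y) when the base
    layer is enlarged at the ratio of the screen sizes. Hypothetical helper."""
    enh_w, enh_h = enh_size
    base_w, base_h = base_size
    # Scale the top-left corner down and round toward the origin.
    x0 = (mb_x * base_w) // enh_w
    y0 = (mb_y * base_h) // enh_h
    # Scale the bottom-right corner down and round up (integer ceiling).
    x1 = -(-((mb_x + mb_size) * base_w) // enh_w)
    y1 = -(-((mb_y + mb_size) * base_h) // enh_h)
    return x0, y0, x1, y1
```

For example, with a 352×288 enhanced layer and a 176×144 base layer, the macro block at (16, 32) maps to the 8×8 base-layer region from (8, 16) to (16, 24).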
A residual prediction mode is similar to the intra base mode, except that it uses a corresponding block of a base layer having residual data, that is, image difference values, rather than a corresponding block of a base layer encoded to have image data. Based on a corresponding block of a base layer that has been encoded in an inter mode and has residual data, predicted data is created for a macro block of an enhanced layer that has been encoded in an inter mode and has residual data. At this time, the corresponding block of the base layer having residual data is enlarged through up-sampling before being used, as in the intra base mode.
FIG. 1 illustrates an embodiment in which an image block of an enhanced layer that has been encoded in an inter mode and has residual data is decoded using the residual data of a base layer.
When a residual prediction flag indicating that an image block of an enhanced layer has been encoded in a residual prediction mode is set to ‘1’, the corresponding residual data of the base layer is added to the residual data of the enhanced layer.
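The addition described above can be sketched as follows. The function name `reconstruct_residual` and the representation of a block as a list of rows are assumptions made for illustration; the base-layer residual is assumed to have already been brought to the resolution of the enhanced layer.

```python
def reconstruct_residual(enh_residual, base_residual, residual_prediction_flag):
    """Return the residual of an enhanced-layer block after residual prediction.

    enh_residual and base_residual are equally sized lists of rows.
    Hypothetical helper, not an actual decoder interface."""
    if residual_prediction_flag != 1:
        # Residual prediction was not used: keep the enhanced-layer residual.
        return [row[:] for row in enh_residual]
    # Add the corresponding base-layer residual sample by sample.
    return [
        [e + b for e, b in zip(enh_row, base_row)]
        for enh_row, base_row in zip(enh_residual, base_residual)
    ]
```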
In the case where the spatial resolutions of the base layer and enhanced layer do not coincide with each other, the residual data of the base layer is up-sampled first. The up-sampling for the residual data (hereinafter simply referred to as residual up-sampling) is carried out in the following way, unlike up-sampling in an intra base mode, in which up-sampling is carried out after decoding into image data.
1. In the case where the resolution of an enhanced layer is two times the resolution of the base layer (in a dyadic case), bi-linear interpolation is employed.
2. In a non-dyadic case, a 6 tap interpolation filter is used.
3. Up-sampling is carried out using only pixels within the same transform block. Up-sampling filtering beyond the boundary of the transform block is not allowed.
FIG. 2 illustrates an example of the up-sampling of a 4×4 residual block in a dyadic case.
Simple bi-linear interpolation is used for residual up-sampling, but bi-linear interpolation is not applied to the boundary of a transform block so as to avoid the use of pixels within another transform block. Accordingly, as illustrated in FIG. 2, only the pixels of a corresponding block are used for the up-sampling of pixels existing at the boundary of a transform block. Furthermore, different operations are performed on pixels at the boundary of a transform block depending on the locations of pixels relative to the boundary.
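A minimal sketch of dyadic residual up-sampling confined to a single transform block is given below. The exact operations of FIG. 2 are not reproduced here; the 3:1 interpolation weights, the rounding, and the replication of the edge pixel at the block boundary are assumptions chosen for illustration, and the function names are hypothetical.

```python
def upsample_line_dyadic(samples):
    """Double the length of one row or column of a transform block using
    bi-linear weights, reading only samples of the same block."""
    n = len(samples)
    out = []
    for i in range(n):
        # At the block boundary the neighbour index is clamped, so no
        # pixel of another transform block is ever read (assumed handling).
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        out.append((3 * samples[i] + left + 2) >> 2)
        out.append((3 * samples[i] + right + 2) >> 2)
    return out


def upsample_block_dyadic(block):
    """Up-sample one residual transform block by two in each direction."""
    rows = [upsample_line_dyadic(row) for row in block]
    cols = [upsample_line_dyadic(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Because each function receives only the samples of one transform block, a 4×4 block yields an 8×8 block without any dependence on neighbouring blocks.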
Since a transform operation can be carried out for different block sizes, the boundary of a transform block must be determined in consideration of the size of the transform block of a base layer (for example, 4×4, 8×8, or . . . ).
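The dependence of the boundary on the transform size can be sketched as follows. The helper names are hypothetical, and the sketch assumes, for simplicity, that transform blocks of a single size are aligned to a grid starting at the picture origin.

```python
def transform_block_bounds(x, y, block_size):
    """Return (x0, y0, x1, y1), the inclusive bounds of the transform block
    of the given size that contains base-layer pixel (x, y)."""
    x0 = (x // block_size) * block_size
    y0 = (y // block_size) * block_size
    return x0, y0, x0 + block_size - 1, y0 + block_size - 1


def same_transform_block(p, q, block_size):
    """True if pixels p and q, given as (x, y), lie in the same transform block."""
    return transform_block_bounds(*p, block_size) == transform_block_bounds(*q, block_size)
```

Under these assumptions, the horizontally adjacent pixels (3, 0) and (4, 0) lie in different blocks for a 4×4 transform but in the same block for an 8×8 transform, so whether one may be used to interpolate the other depends on the transform size.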
In the case where the ratio of the resolutions of the base layer and enhanced layer is not dyadic, the up-sampling process is basically the same, except that a 6 tap interpolation filter is used instead of bi-linear interpolation. In this case as well, pixels within another transform block are not used for residual up-sampling.
Furthermore, the same up-sampling is applied to the signals of luminance and chrominance components.
FIG. 3 illustrates an embodiment in which an image block of an enhanced layer encoded in an intra base mode is decoded using the decoded image data of a base layer.
In up-sampling in the intra base mode, the boundary of a transform block is not taken into consideration, and a 6 tap interpolation filter is applied to both luminance and chrominance signals.
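For comparison, a one-dimensional sketch of 6 tap up-sampling that ignores transform block boundaries is given below. The description does not state the filter coefficients or the resolution ratio; the taps (1, -5, 20, 20, -5, 1)/32 of the H.264 half-sample luma filter, the factor-of-two ratio, the 8-bit sample range, and the clamping at the picture edge are assumptions used only for illustration.

```python
# Assumed coefficients: the H.264 half-sample luma interpolation taps.
HALF_SAMPLE_TAPS = (1, -5, 20, 20, -5, 1)


def upsample_row_6tap(samples, max_value=255):
    """Double the length of one row of decoded image data.

    Original samples are copied; each in-between sample is interpolated
    with the 6 tap filter. Indices are clamped only at the picture edge,
    so the filter freely crosses transform block boundaries."""
    n = len(samples)
    out = []
    for i in range(n):
        out.append(samples[i])
        acc = 0
        for k, tap in enumerate(HALF_SAMPLE_TAPS):
            j = min(max(i - 2 + k, 0), n - 1)
            acc += tap * samples[j]
        # Round, divide by 32, and clip to the valid sample range.
        out.append(min(max((acc + 16) >> 5, 0), max_value))
    return out
```

In contrast to residual up-sampling, the filter support here spans six neighbouring samples regardless of which transform block they belong to.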