1. Field of the Invention
The present invention relates to scalable encoding and decoding of a video signal, and more particularly to a method and apparatus for encoding a video signal, wherein a base layer in the video signal is additionally used to code an enhanced layer in the video signal, and a method and apparatus for decoding such encoded video data.
2. Description of the Related Art
Scalable Video Codec (SVC) is a scheme that encodes video into a sequence of pictures at the highest image quality while ensuring that part of the encoded picture sequence (specifically, a partial sequence of frames intermittently selected from the total sequence of frames) can also be decoded and used to represent the video at a lower image quality. Motion Compensated Temporal Filtering (MCTF) is an encoding scheme that has been suggested for use in the scalable video codec.
Although low image-quality video can be represented by receiving and processing only part of a picture sequence encoded in a scalable fashion as described above, the image quality is still significantly reduced when the bitrate is lowered. One solution to this problem is to hierarchically provide an auxiliary picture sequence for low bitrates, for example, a sequence of pictures that have a small screen size and/or a low frame rate, so that each decoder can select and decode a sequence suitable for its capabilities and characteristics. One example is to encode and transmit to decoders not only a main picture sequence of 4CIF (4 times Common Intermediate Format) but also an auxiliary picture sequence of CIF and an auxiliary picture sequence of QCIF (Quarter CIF). Each sequence is referred to as a layer; of any two given layers, the higher is referred to as an enhanced layer and the lower is referred to as a base layer.
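The layer selection described above can be illustrated with a minimal sketch. The `Layer` type, the `select_layer` function, and the per-layer frame rates are hypothetical and chosen only for illustration; the picture sizes are the standard luma resolutions of QCIF, CIF, and 4CIF. The sketch assumes that a decoder simply picks the largest layer that does not exceed its display size and frame-rate limits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    width: int         # luma width in pixels
    height: int        # luma height in pixels
    frame_rate: float  # frames per second (illustrative assumption)


# Standard luma resolutions; the frame rates are made-up example values.
LAYERS = (
    Layer("QCIF", 176, 144, 15.0),
    Layer("CIF", 352, 288, 30.0),
    Layer("4CIF", 704, 576, 60.0),
)


def select_layer(layers: Sequence[Layer], max_width: int, max_height: int,
                 max_frame_rate: float) -> Optional[Layer]:
    """Return the largest layer the decoder can handle, or None if none fits."""
    candidates = [
        layer for layer in layers
        if layer.width <= max_width
        and layer.height <= max_height
        and layer.frame_rate <= max_frame_rate
    ]
    # Rank the layers that fit by picture area and keep the largest one.
    return max(candidates, key=lambda l: l.width * l.height, default=None)
```

Under these assumptions, a decoder limited to a 400x300 display at 30 frames per second would select the CIF layer, while a decoder that cannot handle even QCIF would receive no layer.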
Such picture sequences are redundant because the same video signal source is encoded into each of them. To increase the coding efficiency of each sequence, there is a need to reduce the amount of coded information of the higher sequence by performing inter-sequence picture prediction, in which video frames in the higher sequence are predicted from the temporally coincident video frames in the lower sequence.
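The idea of inter-sequence prediction can be sketched as follows. This is only an illustration under stated assumptions, not the prediction method of any particular codec: frames are plain lists of pixel rows, the higher layer is assumed to be exactly twice the size of the lower layer in each dimension, and nearest-neighbour upsampling stands in for whatever interpolation a real encoder would use. The function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values


def upsample_2x(base: Frame) -> Frame:
    """Nearest-neighbour 2x upsampling (an assumed, simplistic filter)."""
    out: Frame = []
    for row in base:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residual(enhanced: Frame, base: Frame) -> Frame:
    """Difference between an enhanced-layer frame and its base-layer prediction."""
    prediction = upsample_2x(base)
    if len(prediction) != len(enhanced) or len(prediction[0]) != len(enhanced[0]):
        raise ValueError("enhanced frame must be exactly 2x the base frame size")
    return [[e - p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(enhanced, prediction)]


def reconstruct(res: Frame, base: Frame) -> Frame:
    """Decoder side: add the residual back onto the same prediction."""
    prediction = upsample_2x(base)
    return [[r + p for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(res, prediction)]
```

Because the two layers originate from the same source, the residual values tend to be small, which is why coding the residual instead of the full higher-layer frame reduces the amount of coded information.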
However, video frames in sequences of different layers may have different aspect ratios. For example, video frames of the higher sequence (i.e., the enhanced layer) may have a wide aspect ratio of 16:9, whereas video frames of the lower sequence (i.e., the base layer) may have a narrow aspect ratio of 4:3. In this case, when performing prediction of an enhanced layer picture, there is a need to determine which part of the base layer picture is to be used for the enhanced layer picture, or to which part of the enhanced layer picture the base layer picture is to be applied.
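To make the aspect-ratio mismatch concrete, the following sketch computes the largest window of a given aspect ratio that fits inside a picture. It assumes square pixels and a centered window; both are illustrative assumptions, since the text above only states that the corresponding region has to be determined, not how. The function name `centered_window` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def centered_window(width: int, height: int,
                    target_aspect: Fraction) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of the largest centered window with the given
    aspect ratio inside a width x height picture (square pixels assumed)."""
    picture_aspect = Fraction(width, height)
    if target_aspect <= picture_aspect:
        # Picture is wider than the target: keep full height, trim the sides.
        w = int(height * target_aspect)
        h = height
    else:
        # Picture is narrower than the target: keep full width, trim top/bottom.
        w = width
        h = int(width / target_aspect)
    return ((width - w) // 2, (height - h) // 2, w, h)
```

For example, under these assumptions a 4:3 base layer picture would correspond to a centered 1440x1080 region of a 1920x1080 enhanced layer picture, whereas a 16:9 enhanced layer picture would correspond to a centered 640x360 region of a 640x480 base layer picture.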