1. Field of the Invention
Apparatuses and methods consistent with the present invention relate to video coding, and more particularly, to effectively coding multiple layers using interlayer information in a multi-layered video codec.
2. Description of the Related Art
With the development of information and communication technology, including the Internet, an increasing number of multimedia services carry various kinds of information such as text, video, and audio. Because the amount of multimedia data is usually large, it requires large-capacity storage media and a wide bandwidth for transmission. For example, a 24-bit true color image with a resolution of 640×480 needs 640×480×24 bits, i.e., about 7.37 Mbits, per frame. When such images are transmitted at 30 frames per second, a bandwidth of about 221 Mbits/sec is required. Storing a 90-minute movie made of such images requires about 1200 Gbits. Accordingly, compression coding is essential for transmitting multimedia data including text, video, and audio.
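The storage and bandwidth figures above follow from simple arithmetic, reproduced here as a minimal Python sketch. It assumes decimal prefixes (1 Mbit = 10^6 bits, 1 Gbit = 10^9 bits), which is the convention the quoted figures appear to use; the constant names are illustrative only.

```python
# Raw (uncompressed) data-rate arithmetic for the 640x480 example.
WIDTH, HEIGHT = 640, 480   # resolution in pixels
BITS_PER_PIXEL = 24        # 24-bit true color
FPS = 30                   # frames per second
MOVIE_SECONDS = 90 * 60    # 90-minute movie

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL    # 7,372,800 bits
bits_per_second = bits_per_frame * FPS              # 221,184,000 bits/s
bits_per_movie = bits_per_second * MOVIE_SECONDS    # 1,194,393,600,000 bits

print(f"per frame      : {bits_per_frame / 1e6:.2f} Mbits")     # 7.37 Mbits
print(f"bandwidth      : {bits_per_second / 1e6:.0f} Mbits/sec")  # 221 Mbits/sec
print(f"90-minute movie: {bits_per_movie / 1e9:.0f} Gbits")     # 1194 Gbits
```

The exact movie size, about 1194 Gbits, is what the text rounds to "about 1200 Gbits".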
A basic principle of data compression is the removal of data redundancy. Data can be compressed by removing spatial redundancy, in which the same color or object is repeated within an image; temporal redundancy, in which there is little change between adjacent frames of a moving image or the same sound is repeated in audio; or psychovisual redundancy, which takes into account the limited perception of high frequencies by human vision. Data compression can be classified as lossy or lossless according to whether source data is lost, as intraframe or interframe according to whether individual frames are compressed independently, and as symmetric or asymmetric according to whether compression takes the same time as recovery. Data compression is called real-time compression if the compression/recovery delay does not exceed 50 ms, and scalable compression if frames have different resolutions. Lossless compression is usually used for text or medical data, whereas lossy compression is usually used for multimedia data. Further, intraframe compression is usually used to remove spatial redundancy, and interframe compression is usually used to remove temporal redundancy.
Transmission media for multimedia information differ in performance depending on the type of medium. The transmission media currently in use have a variety of transfer rates, ranging, for example, from very-high-speed communication networks capable of transmitting data at tens of Mbits per second to mobile communication networks with a transfer rate of 384 Kbps. Conventional video coding techniques such as MPEG-1, MPEG-2, H.263, and H.264 remove redundancy using motion-compensated prediction coding. Specifically, temporal redundancy is removed by motion compensation, and spatial redundancy is removed by transform coding. These techniques achieve good compression rates but do not provide the flexibility of a truly scalable bitstream, because their main algorithms take a recursive approach. Thus, wavelet-based scalable video coding has recently been actively researched. Scalability is the ability to partially decode a single compressed bitstream, that is, the ability to reproduce video in a variety of forms from that bitstream. Scalability includes spatial scalability, which relates to video resolution; signal-to-noise ratio (SNR) scalability, which relates to video quality level; temporal scalability, which relates to frame rate; and combinations thereof.
Standardization of the H.264 Scalable Extension (hereinafter referred to as “H.264 SE”) is currently being carried out by the Joint Video Team (JVT) of MPEG (Moving Picture Experts Group) and ITU (International Telecommunication Union). An advantageous feature of H.264 SE is that it exploits the correlation among layers to code a plurality of layers while employing the H.264 coding technique. Although the layers differ from one another in resolution, frame rate, SNR, or the like, they are substantially similar because they are generated from the same video source. In this regard, a variety of efficient techniques that use information about lower layers to code upper-layer data have been proposed.
FIG. 1 is a diagram for explaining weighted prediction as proposed in conventional H.264. Weighted prediction allows a motion-compensated reference picture to be appropriately scaled, instead of simply averaged, in order to improve prediction efficiency.
A motion block 11 (a “macroblock” or “subblock”, the basic unit for which a motion vector is calculated) in a current picture 10 corresponds to a predetermined image 21 in a left reference picture 20, pointed to by a forward motion vector 22, and to a predetermined image 31 in a right reference picture 30, pointed to by a backward motion vector 32.
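The correspondence described above amounts to a block fetch: given the position of the motion block 11 and a motion vector, the matching image is read out of the reference picture. The following is a minimal Python sketch under stated assumptions: integer-pel motion vectors only (H.264 itself allows quarter-pel vectors with interpolation), pictures stored as lists of rows, and positions outside the picture clamped to the nearest edge sample. `fetch_block` and the toy pictures are hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]  # luma samples indexed as picture[row][col]


def fetch_block(ref: Picture, x: int, y: int, mv: Tuple[int, int], size: int) -> Picture:
    """Return the size x size block of `ref` that motion vector mv = (dx, dy)
    points to, for a motion block whose top-left corner is at (x, y).

    Hypothetical helper: integer-pel only; positions outside the picture
    are clamped to the nearest edge sample."""
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for r in range(size):
        yy = min(max(y + dy + r, 0), height - 1)
        row = []
        for c in range(size):
            xx = min(max(x + dx + c, 0), width - 1)
            row.append(ref[yy][xx])
        block.append(row)
    return block


# Toy 8x8 reference pictures standing in for pictures 20 and 30.
left_ref = [[10 * r + c for c in range(8)] for r in range(8)]
right_ref = [[100 + 10 * r + c for c in range(8)] for r in range(8)]

# Motion block 11 is taken to be the 2x2 block at (x=2, y=2) of picture 10.
image_21 = fetch_block(left_ref, x=2, y=2, mv=(1, -1), size=2)   # forward vector 22
image_31 = fetch_block(right_ref, x=2, y=2, mv=(-2, 0), size=2)  # backward vector 32
print(image_21)  # [[13, 14], [23, 24]]
print(image_31)  # [[120, 121], [130, 131]]
```

Clamping keeps the fetch defined when a vector points partly outside the picture, which simplifies the sketch; a real codec would also interpolate fractional positions.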
An encoder reduces the number of bits required to represent the motion block 11 by subtracting a predicted image obtained from the images 21 and 31 from the motion block 11. A conventional encoder that does not use weighted prediction calculates the predicted image by simply averaging the images 21 and 31. However, since the motion block 11 is usually not identical to the average of the left and right images 21 and 31, an accurate predicted image is difficult to obtain in this way.
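The conventional averaging and the residual it leaves behind can be illustrated with a minimal Python sketch. It assumes integer sample blocks and uses the rounded average (a + b + 1) >> 1, the form of H.264's default (unweighted) bi-prediction. The function names, the sum-of-absolute-differences cost, and the sample values (a toy fade in which the motion block 11 lies closer to the image 31) are hypothetical illustrations.

```python
from typing import List

Block = List[List[int]]


def average_prediction(img_a: Block, img_b: Block) -> Block:
    """Bi-prediction without weighting: rounded average of the two
    reference blocks, (a + b + 1) >> 1."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def residual(block: Block, pred: Block) -> Block:
    """Residual that is actually coded: current block minus predicted image."""
    return [[s - p for s, p in zip(row_s, row_p)]
            for row_s, row_p in zip(block, pred)]


def sad(res: Block) -> int:
    """Sum of absolute differences, a rough proxy for residual coding cost."""
    return sum(abs(v) for row in res for v in row)


# Toy fade: the current block is much closer to image 31 than to image 21.
image_21 = [[100, 100], [100, 100]]
image_31 = [[60, 60], [60, 60]]
block_11 = [[70, 70], [70, 70]]

pred = average_prediction(image_21, image_31)
print(pred)                           # [[80, 80], [80, 80]]
print(sad(residual(block_11, pred)))  # 40
```

Because the average lands at 80 while the block is at 70, every sample carries a residual of -10, which is the inaccuracy the paragraph describes.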
To overcome this limitation, H.264 proposes a method for determining the predicted image using a weighted sum. According to this method, weighting factors α and β are determined for each slice, and the sum of the image 21 multiplied by α and the image 31 multiplied by β is used as the predicted image. A slice consists of a plurality of macroblocks and may be identical to a picture, or a plurality of slices may make up a picture. By adjusting the weighting factors, the method can obtain a predicted image that differs very little from the motion block 11, so subtracting this predicted image from the motion block 11 improves coding efficiency.
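The weighted sum α·(image 21) + β·(image 31) can be sketched in Python as follows. This is an illustration under stated assumptions, not the H.264 procedure itself: H.264 signals integer weights with a log2 weight denominator plus additive offsets, whereas this sketch uses floating-point α and β without offsets. The standard also does not prescribe how an encoder chooses the weights; the least-squares fit over all samples of a slice in `estimate_weights` is a hypothetical encoder strategy, and the sample data are toy values.

```python
from typing import List, Sequence, Tuple


def weighted_prediction(ref_a: Sequence[float], ref_b: Sequence[float],
                        alpha: float, beta: float) -> List[float]:
    """Predicted image = alpha * (image 21) + beta * (image 31), sample by sample."""
    return [alpha * a + beta * b for a, b in zip(ref_a, ref_b)]


def estimate_weights(cur: Sequence[float], ref_a: Sequence[float],
                     ref_b: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (alpha, beta) over all samples of a slice.

    Hypothetical encoder strategy: minimizes sum((cur - alpha*a - beta*b)**2)
    by solving the 2x2 normal equations with Cramer's rule."""
    saa = sum(a * a for a in ref_a)
    sbb = sum(b * b for b in ref_b)
    sab = sum(a * b for a, b in zip(ref_a, ref_b))
    sas = sum(a * s for a, s in zip(ref_a, cur))
    sbs = sum(b * s for b, s in zip(ref_b, cur))
    det = saa * sbb - sab * sab  # >= 0 by the Cauchy-Schwarz inequality
    if det <= 1e-12 * max(saa * sbb, 1.0):
        return 0.5, 0.5  # references (nearly) collinear: fall back to averaging
    alpha = (sas * sbb - sab * sbs) / det
    beta = (saa * sbs - sab * sas) / det
    return alpha, beta


def sad(cur: Sequence[float], pred: Sequence[float]) -> float:
    """Sum of absolute differences between current samples and prediction."""
    return sum(abs(s - p) for s, p in zip(cur, pred))


# Flattened samples of one slice (toy data): current = 0.25*A + 0.75*B.
image_21 = [100.0, 120.0, 80.0, 90.0]
image_31 = [60.0, 50.0, 70.0, 40.0]
current = [0.25 * a + 0.75 * b for a, b in zip(image_21, image_31)]

alpha, beta = estimate_weights(current, image_21, image_31)
print(round(alpha, 6), round(beta, 6))                                     # 0.25 0.75
print(sad(current, weighted_prediction(image_21, image_31, 0.5, 0.5)))     # 42.5
print(sad(current, weighted_prediction(image_21, image_31, alpha, beta)))  # 0.0
```

With equal weights (plain averaging) the residual cost is 42.5, while the fitted weights reproduce the slice exactly in this contrived case. Real content will not fit perfectly, but the same comparison shows why adjusting α and β per slice can shrink the residual.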
Although the weighted prediction defined in H.264 is very effective, it has so far been applied only to single-layer video coding, and its application to multi-layered scalable video coding has not yet been researched. Accordingly, there is a need to apply weighted prediction to multi-layered scalable video coding.