1. Field of the Invention
Apparatuses and methods consistent with the present invention relate to a multi-layer video coding algorithm, and more particularly, to a multi-layer video coding algorithm designed to encode a layer of a predetermined resolution using a plurality of coding algorithms.
2. Description of the Related Art
With the development of information communication technology, including the Internet, video communication has increased alongside text and voice communication. Conventional text communication cannot satisfy the various demands of users, and thus multimedia services that can provide various types of information such as text, pictures, and music have increased. Because the amount of multimedia data is usually large, multimedia data requires a large-capacity storage medium and a wide bandwidth for transmission. For example, a 24-bit true color image having a resolution of 640×480 needs a capacity of 640×480×24 bits, i.e., about 7.37 Mbits, per frame. When such images are transmitted at 30 frames per second, a bandwidth of about 221 Mbits/sec is required. When a 90-minute movie based on such images is stored, a storage space of about 1200 Gbits is required. Accordingly, a compression coding method is requisite for transmitting multimedia data including text, video, and audio.
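By way of illustration only, the figures in the preceding example can be reproduced with the short calculation below. The variable names are chosen for this sketch and do not appear elsewhere in this description; 1 Mbit is taken as 10^6 bits and 1 Gbit as 10^9 bits.

```python
# Uncompressed size of the example video (illustrative names only).
WIDTH, HEIGHT = 640, 480      # resolution in pixels
BITS_PER_PIXEL = 24           # 24-bit true color
FRAMES_PER_SECOND = 30
MOVIE_MINUTES = 90

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAMES_PER_SECOND
movie_bits = bits_per_second * MOVIE_MINUTES * 60

print(f"{bits_per_frame / 1e6:.2f} Mbits per frame")    # 7.37 Mbits per frame
print(f"{bits_per_second / 1e6:.0f} Mbits/sec")         # 221 Mbits/sec
print(f"{movie_bits / 1e9:.0f} Gbits for the movie")    # 1194 Gbits for the movie
```

The exact storage figure is about 1194 Gbits, which the example above rounds to about 1200 Gbits.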
A basic principle of multimedia data compression is the removal of data redundancy. In other words, data can be compressed by removing spatial redundancy, in which the same color or object is repeated in an image; temporal redundancy, in which there is little change between adjacent frames of a moving image or the same sound is repeated in audio; or psychovisual redundancy, which takes into account the limited sensitivity of human vision to high frequencies.
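Temporal redundancy may be illustrated, purely as a sketch, by subtracting one frame from the next: where adjacent frames are nearly identical, almost every value of the difference is zero and is therefore cheap to code. The 4×4 frames and the helper names `frame_difference` and `count_nonzero` below are hypothetical and are not taken from any particular codec.

```python
def frame_difference(previous, current):
    """Return the per-pixel difference current - previous."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]

def count_nonzero(frame):
    """Count the values in a frame that are not zero."""
    return sum(1 for row in frame for value in row if value != 0)

# Two adjacent frames that differ in a single pixel (hypothetical data).
previous = [[10, 10, 10, 10],
            [10, 50, 50, 10],
            [10, 50, 50, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],
           [10, 50, 50, 10],
           [10, 50, 52, 10],
           [10, 10, 10, 10]]

residual = frame_difference(previous, current)
print(count_nonzero(current))   # 16 values to code without prediction
print(count_nonzero(residual))  # 1 value to code after prediction
```

Practical encoders additionally compensate for motion before taking the difference; that step is omitted here.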
FIG. 1 shows an environment in which video compression is applied.
Video data is compressed by a video encoder 110. Currently known Discrete Cosine Transform (DCT)-based video compression algorithms include MPEG-2, MPEG-4, H.263, and H.264. In recent years, research into wavelet-based scalable video coding has been actively conducted. The compressed video data is sent to a video decoder 130 via a network 120. The video decoder 130 decodes the compressed video data to reconstruct the original video data.
The video encoder 110 compresses the original video data so that the compressed data does not exceed the available bandwidth of the network 120 and can therefore be received and decoded by the video decoder 130. However, the communication bandwidth may vary depending on the type of the network 120. For example, the available communication bandwidth of an Ethernet is different from that of a wireless local area network (WLAN). A cellular communication network may have a very narrow bandwidth. Thus, research is being actively conducted into methods for generating video data at various bit-rates from the same compressed video data, in particular, scalable video coding.
Scalable video coding is a video compression technique that provides scalability. Scalability is the ability to generate video sequences at different resolutions, frame rates, and qualities from the same compressed bitstream. Temporal scalability can be provided using a Motion Compensated Temporal Filtering (MCTF), Unconstrained MCTF (UMCTF), or Successive Temporal Approximation and Referencing (STAR) algorithm. Spatial scalability can be achieved by a wavelet transform algorithm or by multi-layer coding, which has been actively studied in recent years. Signal-to-Noise Ratio (SNR) scalability can be obtained using Embedded Zerotree Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), Embedded ZeroBlock Coding (EZBC), or Embedded Block Coding with Optimized Truncation (EBCOT).
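The idea underlying SNR scalability, namely that a truncated bitstream still decodes to a coarser version of the same data, may be sketched with plain bit-plane coding. This is a deliberately simplified stand-in and is not an implementation of EZW, SPIHT, EZBC, or EBCOT; the function names and coefficient values are hypothetical.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]

def from_bit_planes(planes, num_planes):
    """Rebuild values from the leading planes; missing planes count as zero."""
    values = [0] * len(planes[0])
    for index, plane in enumerate(planes):
        shift = num_planes - 1 - index
        for i, bit in enumerate(plane):
            values[i] |= bit << shift
    return values

coefficients = [200, 37, 12, 5]          # hypothetical 8-bit coefficients
planes = to_bit_planes(coefficients, 8)

full = from_bit_planes(planes, 8)        # all 8 planes received
coarse = from_bit_planes(planes[:4], 8)  # stream cut after 4 planes

print(full)    # [200, 37, 12, 5]
print(coarse)  # [192, 32, 0, 0]
```

Because the most significant planes are sent first, cutting the stream at any plane boundary yields a lower-quality but still decodable approximation.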
FIGS. 2 and 3 illustrate examples of multi-layer bitstream structures.
Referring to FIG. 2, a multi-layer video encoder encodes each layer using an MPEG-4 Advanced Video Coding (AVC) algorithm, which offers the highest coding efficiency currently available. The MPEG-4 AVC algorithm removes temporal redundancies between frames, transforms the resulting frames using a DCT, and quantizes the transform coefficients.
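The transform and quantization steps may be sketched as follows. This is a textbook floating-point one-dimensional DCT-II followed by uniform quantization, given for illustration only; it is not the integer transform that AVC itself specifies, and the names `dct_1d` and `quantize`, the block length of 8, and the step size of 10 are assumptions of this sketch.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    output = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        output.append(scale * total)
    return output

def quantize(coefficients, step):
    """Uniform quantization: divide by the step and round to an integer."""
    return [round(c / step) for c in coefficients]

# A flat block of 8 samples: all energy lands in the first (DC) coefficient.
flat_block = [100] * 8
levels = quantize(dct_1d(flat_block), step=10)
print(levels)  # [28, 0, 0, 0, 0, 0, 0, 0]
```

A smooth block thus reduces to a few non-zero levels, which is what makes the subsequent entropy coding effective.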
Referring to FIG. 2, each layer differs from the other layers in at least one of resolution, frame rate, and bit-rate. In an AVC scheme, a base layer frame having the lowest resolution, lowest frame rate, and lowest bit-rate is encoded, and then an enhancement layer is encoded using the encoded base layer frame. The AVC-based multi-layer video coding scheme uses an AVC-based technique for encoding each layer, providing high coding efficiency. In particular, the intra prediction and deblocking techniques used in an AVC algorithm effectively remove most artifacts caused by block-based coding. Furthermore, each layer is optimized with respect to rate-distortion. However, the generated bitstream does not have flexible scalability. That is, it is difficult to provide fine grain scalability (FGS) and combined scalability using a bitstream generated by multi-layer AVC video coding because the scalabilities are dependent on each other. When video data is encoded into many layers, the multi-layer coding scheme shown in FIG. 2 performs AVC encoding on all of the layers.
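The relationship between a base layer and an enhancement layer may be sketched, under simplifying assumptions, on a one-dimensional signal: the base layer is a half-resolution version, and the enhancement layer carries only the difference between the original and a prediction upsampled from the base layer. The helper names are hypothetical, no quantization or AVC tool is modeled, and the signal length is assumed to be even.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (integer division)."""
    return [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [value for value in signal for _ in range(2)]

def encode_layers(signal):
    """Return (base layer, enhancement residual) for an even-length signal."""
    base = downsample(signal)
    prediction = upsample(base)
    enhancement = [s - p for s, p in zip(signal, prediction)]
    return base, enhancement

def decode_full(base, enhancement):
    """Reconstruct the full-resolution signal from both layers."""
    return [p + e for p, e in zip(upsample(base), enhancement)]

signal = [10, 12, 20, 22, 30, 30, 40, 44]
base, enhancement = encode_layers(signal)
print(base)         # [11, 21, 30, 42]
print(enhancement)  # [-1, 1, -1, 1, 0, 0, -2, 2]
print(decode_full(base, enhancement) == signal)  # True
```

A decoder that receives only `base` obtains the low-resolution signal, while a decoder that also receives `enhancement` recovers the full resolution.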
Referring to FIG. 3, a base layer having the lowest resolution, lowest frame rate, and lowest bit-rate is first encoded using AVC, and a layer having the highest resolution, highest frame rate, and highest quality is then encoded by wavelet coding using the encoded base layer.
Since the layer having the highest resolution, highest frame rate, and highest quality is encoded using wavelet coding, the coding scheme shown in FIG. 3 can provide a bitstream with full scalability. Furthermore, since the lowest resolution layer is encoded using AVC, a video decoder can reconstruct a video frame of satisfactory quality at the lowest resolution.
The bitstream shown in FIG. 2 is optimized for each layer with respect to rate-distortion but has weak scalability. Conversely, the bitstream shown in FIG. 3 has excellent scalability but low video quality, since all layers other than the lowest resolution AVC coded layer are reconstructed from a single wavelet coded layer.