1. Field of the Invention
Apparatuses and methods consistent with the present invention relate to predecoding and decoding a bitstream including a base layer, and more particularly, to extracting a higher quality video stream for a given bit-rate by replacing a specific frame with a base layer frame at a predecoder.
2. Description of the Related Art
With the development of information communication technology including the Internet, video communication as well as text and voice communication has increased explosively. Conventional text communication cannot satisfy various user demands, and thus multimedia services that can provide various types of information such as text, pictures, and music have increased. Multimedia data requires storage media of large capacity and a wide bandwidth for transmission, since the amount of multimedia data is usually large relative to other types of data. For example, a 24-bit true color image having a resolution of 640*480 needs a capacity of 640*480*24 bits, i.e., data of about 7.37 Mbits, per frame. When such images are transmitted at a rate of 30 frames per second, a bandwidth of about 221 Mbits/sec is required. When a 90-minute movie based on such images is stored, a storage space of about 1200 Gbits is required. Accordingly, a compression coding method is a requisite for transmitting multimedia data including text, video, and audio.
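The figures in the example above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the constant and variable names are chosen here for readability and do not come from the specification.

```python
# Raw (uncompressed) data volume for the 640*480, 24-bit, 30 frames/sec example.
WIDTH, HEIGHT = 640, 480      # resolution in pixels
BITS_PER_PIXEL = 24           # 24-bit true color
FRAME_RATE = 30               # frames per second
MOVIE_MINUTES = 90            # running time of the movie

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL      # bits in one frame
bandwidth_bps = bits_per_frame * FRAME_RATE           # bits per second
storage_bits = bandwidth_bps * MOVIE_MINUTES * 60     # bits for the whole movie

print(f"{bits_per_frame / 1e6:.2f} Mbits per frame")  # 7.37 Mbits per frame
print(f"{bandwidth_bps / 1e6:.0f} Mbits/sec")         # 221 Mbits/sec
print(f"{storage_bits / 1e9:.0f} Gbits")              # 1194 Gbits
```

The storage figure works out to roughly 1194 Gbits, which the text rounds to about 1200 Gbits.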
The basic principle of data compression lies in removing data redundancy. Data redundancy is typically classified as spatial redundancy, in which the same color or object is repeated in an image; temporal redundancy, in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio; or psychovisual redundancy, which takes into account the limited sensitivity of human vision and perception to high frequencies. Data can be compressed by removing such redundancy. Data compression can largely be classified into lossy/lossless compression, according to whether source data is lost; intraframe/interframe compression, according to whether individual frames are compressed independently; and symmetric/asymmetric compression, according to whether the time required for compression is the same as the time required for recovery. In addition, data compression is defined as real-time compression when the compression/recovery time delay does not exceed 50 ms, and as scalable compression when frames have different resolutions. For example, lossless compression is usually used for text or medical data, and lossy compression is usually used for multimedia data. Meanwhile, intraframe compression is usually used to remove spatial redundancy, and interframe compression is usually used to remove temporal redundancy.
Transmission performance differs depending on the transmission medium, and currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit several tens of megabits of data per second, while a mobile communication network has a transmission rate of 384 kilobits per second. In related art video coding methods such as Moving Picture Experts Group (MPEG)-1, MPEG-2, H.263, and H.264, temporal redundancy is removed by motion compensation based on motion estimation, and spatial redundancy is removed by transform coding. These methods achieve satisfactory compression rates, but they do not have the flexibility of a truly scalable bitstream since their main algorithm uses a recursive approach. Accordingly, in recent years, wavelet video coding has been actively researched. Scalability indicates the ability to partially decode a single compressed bitstream, that is, the ability to perform a variety of types of video reproduction.
Scalability includes spatial scalability indicating a video resolution, Signal to Noise Ratio (SNR) scalability indicating a video quality level, temporal scalability indicating a frame rate, and a combination thereof.
The spatial scalability and SNR scalability can be implemented using wavelet transform and quantization, respectively. The temporal scalability is realized using motion compensated temporal filtering (MCTF) or unconstrained MCTF (UMCTF).
FIG. 1 shows the overall configuration of a conventional video coding system supporting the above-mentioned scalabilities. Referring to FIG. 1, an encoder 40 encodes an input video 10 into a bitstream 20 by performing temporal filtering, spatial transform, and quantization. A predecoder 50 truncates a portion of the bitstream 20 received from the encoder 40, or extracts a bitstream 25, according to extraction conditions such as quality, resolution, or frame rate. These conditions are determined in view of the communication environment and the performance of a decoder 60. Scalability for texture data is thereby implemented in a simple manner.
The decoder 60 performs the inverse operation of the encoder 40 on the extracted bitstream 25 and generates an output video 30. When the processing power of the decoder 60 is insufficient to support real time decoding of the entire bitstream 20 generated by the encoder 40, the decoder 60 may extract the bitstream 25. Of course, the extraction may be performed by both the predecoder 50 and the decoder 60.
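The extraction performed by the predecoder 50 can be sketched as follows. This is a hypothetical illustration rather than the method of any particular codec: the `CodedFrame` record, the `predecode` function, and the assumption that texture data is embedded (so that it can be cut at any byte) are all introduced here for the example. Frames above a chosen temporal level are dropped to lower the frame rate, and the remaining texture data is truncated proportionally to fit a byte budget.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class CodedFrame:
    """Hypothetical record for one coded frame in the bitstream."""
    temporal_level: int  # 0 = coarsest temporal level (lowest frame rate)
    header: bytes        # data assumed to be kept intact (e.g. motion data)
    texture: bytes       # embedded texture data, assumed truncatable anywhere


def predecode(frames: List[CodedFrame],
              max_temporal_level: int,
              byte_budget: int) -> List[CodedFrame]:
    """Extract a smaller bitstream that fits the given extraction conditions."""
    # Temporal scalability: drop frames above the requested temporal level.
    kept = [f for f in frames if f.temporal_level <= max_temporal_level]

    header_bytes = sum(len(f.header) for f in kept)
    texture_bytes = sum(len(f.texture) for f in kept)
    # Headers are never cut, so only the remainder is available for texture.
    texture_budget = max(0, byte_budget - header_bytes)
    if texture_bytes <= texture_budget:
        return kept  # already within budget; nothing to truncate

    # SNR scalability: truncate every texture payload by the same ratio.
    out = []
    for f in kept:
        keep = len(f.texture) * texture_budget // texture_bytes
        out.append(replace(f, texture=f.texture[:keep]))
    return out


stream = [
    CodedFrame(0, b"hh", bytes(100)),
    CodedFrame(1, b"hh", bytes(100)),
    CodedFrame(2, b"hh", bytes(200)),
]
extracted = predecode(stream, max_temporal_level=1, byte_budget=104)
print([len(f.texture) for f in extracted])  # [50, 50]
```

In this sketch the headers are always preserved, so the output can exceed the budget if the headers alone are larger than it. Uniform truncation is only one possible policy and is chosen here for simplicity.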
This scalable video coding allows the bit-rate, the resolution, and the frame rate all to be changed by the predecoder 50, and it provides significantly high compression ratios at a high bit-rate. However, when the bit-rate is insufficient, scalable video coding exhibits significantly lower performance than conventional coding schemes such as MPEG-4 and H.264, for several reasons.
The degraded performance fundamentally results from a characteristic of the wavelet transform, which exhibits lower degradation at a low resolution than the discrete cosine transform (DCT). Another important reason is that, in scalable video coding supporting various bit-rates, encoding is optimized for one specific bit-rate, so the encoding performance is degraded at other bit-rates.
Accordingly, there is a need to develop an efficient predecoding method to reduce degradation in quality, resolution or frame-rate.