1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to coding and decoding of a video signal, and more particularly, to a method of adaptively selecting a context model for entropy coding, and to a video decoder using the same.
2. Description of the Related Art
With the development of information communication technology, including the Internet, multimedia services containing various kinds of information such as text, video, and audio have been increasing. Multimedia data requires large-capacity storage media and a wide transmission bandwidth because the amount of multimedia data is usually large. Accordingly, a compression coding method is a prerequisite for transmitting multimedia data including text, video, and audio.
A basic principle of data compression is removing data redundancy. Data can be compressed by removing spatial redundancy, in which the same color or object is repeated in an image; temporal redundancy, in which there is little change between adjacent frames of a moving image or the same sound is repeated in audio; or perceptual-visual redundancy, which takes into account human eyesight and its limited perception of high frequencies. In general video coding, temporal redundancy is removed by temporal filtering based on motion estimation and compensation, and spatial redundancy is removed by transform coding.
To transmit the multimedia data generated after removing data redundancy, transmission media are necessary. Transmission performance differs depending on the transmission medium, and currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data at several tens of megabits per second, while a mobile communication network has a transmission rate of 384 kilobits per second. Accordingly, to support transmission media having various speeds, or to transmit multimedia at a data rate suitable to a given transmission environment, data coding methods having scalability, such as wavelet video coding and sub-band video coding, may be suitable for a multimedia environment.
Scalable video coding is a technique that allows a compressed bitstream to be decoded at different resolutions, frame rates, and signal-to-noise ratio (SNR) levels by truncating a portion of the bitstream according to ambient conditions such as transmission bit-rates, error rates, and system resources. Moving Picture Experts Group 4 (MPEG-4) Part 10 standardization for scalable video coding is under way. In particular, much effort is being made to implement scalability based on a multi-layered structure. For example, a bitstream may consist of multiple layers, i.e., a base layer and first and second enhancement layers with different resolutions (QCIF, CIF, and 2CIF) or frame rates.
As when a video is coded into a single layer, when a video is coded into multiple layers, a motion vector (MV) is obtained for each of the multiple layers to remove temporal redundancy. The motion vector may be searched separately for each layer, or a motion vector obtained by a motion vector search for one layer may be used for another layer (without or after being upsampled/downsampled). In the former case, however, in spite of the benefit obtained from accurate motion vectors, there still exists overhead due to the motion vectors generated for each layer. Thus, it is a very challenging task to efficiently remove redundancy between the motion vectors of the respective layers.
FIG. 1 shows an example of a scalable video codec using a multi-layer structure. Referring to FIG. 1, a base layer has a Quarter Common Intermediate Format (QCIF) resolution and a frame rate of 15 Hz, a first enhancement layer has a Common Intermediate Format (CIF) resolution and a frame rate of 30 Hz, and a second enhancement layer has a Standard Definition (SD) resolution and a frame rate of 60 Hz. For example, in order to obtain a CIF 0.5 Mbps stream, the first enhancement layer bitstream (CIF, 30 Hz, 0.7 Mbps) is truncated to match a target bit-rate of 0.5 Mbps. In this way, it is possible to provide spatial, temporal, and signal-to-noise ratio (SNR) scalabilities.
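The truncation described above can be illustrated with a simple sketch. The packet structure below is hypothetical and only illustrative; an actual scalable bitstream orders its units according to the standard's syntax. The idea is that quality-ordered data is kept up to the bit budget implied by the target bit-rate, and the remainder is discarded:

```python
# Sketch of rate-driven bitstream truncation (illustrative only; the
# packet representation below is an assumption, not actual SVC syntax).

def truncate(packets, target_bps, duration_s):
    """Keep a prefix of quality-ordered packets that fits the bit budget."""
    budget = target_bps * duration_s
    kept, used = [], 0
    for layer_id, size_bits in packets:
        if used + size_bits > budget:
            break  # dropping the tail lowers SNR, frame rate, or resolution
        kept.append((layer_id, size_bits))
        used += size_bits
    return kept, used

# One second of a stream, cut down toward a 0.5 Mbps target.
packets = [(0, 300_000), (1, 150_000), (2, 250_000)]  # base layer first
kept, used = truncate(packets, 500_000, 1)
```

Because the base layer comes first in the ordering, truncation always preserves a decodable core and sacrifices only enhancement data.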
As shown in FIG. 1, frames (e.g., 10, 20, and 30) at the same temporal position in each layer can be considered to be similar images. One known coding technique includes predicting the texture of a current layer from the texture of a lower layer (directly or after upsampling) and coding the difference between the predicted value and the actual texture of the current layer. This technique is defined as Intra_BL prediction in Scalable Video Model 3.0 of ISO/IEC 21000-13 Scalable Video Coding (“SVM 3.0”).
The SVM 3.0 employs a technique for predicting a current block using the correlation between the current block and a corresponding block in a lower layer, in addition to the directional intra prediction and inter prediction used in conventional H.264 to predict blocks or macroblocks in a current frame. This prediction method is called “Intra_BL prediction,” and a coding mode using the Intra_BL prediction is called the “Intra_BL mode.”
FIG. 2 is a schematic diagram for explaining the above three prediction methods: (1) an intra prediction for a macroblock 14 in a current frame 11; (2) an inter prediction using a frame 12, which includes a corresponding macroblock 15, at a different temporal position than the current frame 11; and (3) an Intra_BL prediction using texture data from a region 16 in a base layer frame 13 corresponding to the macroblock 14.
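The Intra_BL prediction described above can be sketched as follows. Nearest-neighbor 2x upsampling is used here purely for simplicity; SVM 3.0 specifies its own interpolation filter, and the sample values are made up for illustration:

```python
# Sketch of Intra_BL prediction: upsample the corresponding base-layer
# region and code the difference from the current-layer macroblock.
# (Nearest-neighbor upsampling is a simplification of the actual filter.)

def upsample_2x(block):
    """Nearest-neighbor 2x upsampling of a 2-D list of samples."""
    out = []
    for row in block:
        wide = [v for v in row for _ in (0, 1)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                   # repeat each row
    return out

def intra_bl_residual(current, base_block):
    """Residual = current-layer texture minus upsampled base-layer texture."""
    pred = upsample_2x(base_block)
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, pred)]

base = [[100, 102],            # 2x2 base-layer region (made-up samples)
        [98, 101]]
cur = [[101, 100, 103, 102],   # 4x4 current-layer block (made-up samples)
       [99, 100, 102, 103],
       [97, 99, 100, 102],
       [98, 97, 101, 100]]
res = intra_bl_residual(cur, base)  # small residual values, cheap to code
```

Because the layers carry similar images, the residual values cluster near zero and cost far fewer bits to code than the raw texture.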
The scalable video coding standard selects, for each macroblock, the most advantageous of the three prediction methods.
In order to provide a decoder with information about the selected prediction method, or with data used by the selected prediction method, a variety of flags can be used. Depending on whether coding is performed on a macroblock-by-macroblock, slice-by-slice, or frame-by-frame basis, a flag may occupy one bit, several bits, or several tens of bits. When such flags are set for every macroblock, slice, or frame throughout an entire moving picture, the amount of flag data becomes considerable.
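The scale of this overhead can be seen with simple arithmetic. The macroblock count follows from the standard 16x16 macroblock size at CIF resolution; the per-flag bit cost used here is an assumed example, not a figure from the standard:

```python
# Back-of-envelope flag overhead for per-macroblock signaling.
# (BITS_PER_FLAG is an illustrative assumption, not a normative value.)

MB_PER_FRAME_CIF = (352 // 16) * (288 // 16)  # 22 x 18 = 396 macroblocks
FPS = 30                                      # CIF layer frame rate (FIG. 1)
BITS_PER_FLAG = 2                             # assumed uncompressed flag cost

bits_per_second = MB_PER_FRAME_CIF * FPS * BITS_PER_FLAG
```

Even at only two bits per macroblock this amounts to roughly 24 kbps of pure signaling, which is why efficiently compressing the flags, for example by entropy coding them with a well-chosen context model, matters.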
Accordingly, a need exists for a method and an apparatus for efficiently compressing the flags.