1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to adaptive entropy encoding and adaptive entropy decoding for scalable video encoding, and more particularly, to adaptive entropy encoding and adaptive entropy decoding using various context models.
2. Description of the Related Art
Scalable video encoding is a video encoding technique that adjusts the amount of data to be transmitted according to heterogeneous network and terminal environments, and is essential for processing videos adaptively according to various transmission environments. With advances in mobile communication technologies, transmission systems, high-performance semiconductors, and video compression technologies, the demand for video services that can adapt to various transmission environments is growing rapidly.
However, conventional video encoding techniques, which have been developed for specific communication environments, cannot adapt to changes in environmental factors such as channel bandwidth, terminal processing capability, packet loss rate, and a user's mobile environment. Scalable video encoding is an intelligent technique for adapting to such varied transmission environments. Examples of scalable video encoding include spatial scalable encoding, frame-rate-based temporal scalable encoding, and signal-to-noise ratio (SNR) scalable encoding based on video quality.
Conventional video standards include such scalable video encoding techniques. Examples include MPEG-2 scalable encoding, intended mainly for video data transmission over asynchronous transfer mode (ATM) networks; the SNR, temporal, and spatial scalable encoding of H.263 Annex O; and MPEG-4 fine granularity scalable (FGS) encoding. In addition, MPEG-4 AVC compliant scalable video encoding, which aims to provide SNR, temporal, and spatial scalabilities, is being standardized.
FIG. 1 is a view for explaining an example of video encoding using scalable video encoding.
Referring to FIG. 1, it can be seen that scalable video encoding can be performed in terms of SNR, temporal, and spatial scalabilities. Scalable video encoding involves encoding a video into multiple layers according to a network state, in which enhancement layers are encoded using data of their immediately lower layers.
In the example of video encoding shown in FIG. 1, when the transmission bit rate of the video data is below 41 kbps, only a base layer 110 is encoded. When the transmission bit rate is between 41 kbps and 80 kbps, SNR scalable encoding that improves video quality using data of the base layer 110 is performed to create and encode a first enhancement layer 120. The picture size of each frame of the base layer 110 and the first enhancement layer 120 is the quarter common intermediate format (QCIF), and both layers are encoded at a rate of 15 frames per second.
The picture size of each frame of a second enhancement layer 130 and a third enhancement layer 140 is the common intermediate format (CIF), and these two layers are encoded at a rate of 30 frames per second. Thus, when the transmission bit rate of the video data is 115 kbps or more, the second enhancement layer 130 is created by up-sampling the QCIF frames of the first enhancement layer 120 into CIF frames and performing predictive encoding on the up-sampled frames to further create intermediate frames, i.e., high-pass (H) frames. When the transmission bit rate is 256 kbps or more, the third enhancement layer 140 is created by performing SNR scalable encoding that improves video quality using data of the second enhancement layer 130, which is immediately below the third enhancement layer 140.
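As a rough illustration (not part of the related art itself), the bit-rate thresholds of the FIG. 1 example can be expressed as a simple layer-selection rule. The threshold values and layer names below are taken from the example above and are illustrative only:

```python
# Sketch: select which scalable layers can be transmitted at a given
# channel bit rate, using the illustrative thresholds of the FIG. 1
# example (41 kbps, 115 kbps, 256 kbps). Values are not normative.

LAYERS = [
    ("base layer 110", 0),                  # QCIF, 15 fps
    ("first enhancement layer 120", 41),    # QCIF, 15 fps, SNR scalability
    ("second enhancement layer 130", 115),  # CIF, 30 fps, spatial scalability
    ("third enhancement layer 140", 256),   # CIF, 30 fps, SNR scalability
]

def layers_for_bit_rate(kbps):
    """Return the layers transmittable at the given bit rate (kbps)."""
    return [name for name, threshold in LAYERS if kbps >= threshold]
```

For example, at 50 kbps only the base layer 110 and first enhancement layer 120 are transmitted, while at 256 kbps or more all four layers are transmitted.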
Since bi-predictive (B) frames or H frames of each layer are used as reference frames for motion compensation of frames that follow them in transmission order, temporal scalable encoding is possible. Referring to FIG. 1, I frames and P frames, or low-pass (L) frames, precede B frames or H frames in transmission order. The transmission order among the B frames and H frames depends on the indices (indicated by superscripts in the frame names) assigned to them, as shown in FIG. 1. Among B frames, a frame with a lower index is transmitted earlier; among H frames, a frame with a higher index is transmitted earlier.
For example, in the base layer 110 or the first enhancement layer 120, B1 frames are motion-compensated by referring to I frames and P frames, and B2 frames are motion-compensated by referring to the B1 frames. In the second enhancement layer 130 and the third enhancement layer 140, H3 frames are motion-compensated by referring to L3 frames, and H2 frames are motion-compensated by referring to the H3 frames. Thus, the frame transmission order is I->P->B1->B2->B3 in the base layer 110 and the first enhancement layer 120, and is L3->H3->H2->H1->H0 in the second enhancement layer 130 and the third enhancement layer 140. The transmission order between frames having the same index is determined by the time order of the frames. Through such temporal scalable encoding, spatial scalable encoding, and SNR scalable encoding, a decoder can decode layers at scalable bit rates corresponding to the layers.
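The transmission-order rule described above can be sketched as a sorting key. This is a minimal illustration of the ordering only; the frame representation (kind, index, time) is an assumption made for the sketch:

```python
# Sketch of the transmission-order rule: I/P (or L) frames are sent first;
# B frames follow in ascending index order (B1 before B2 before B3);
# H frames follow in descending index order (H3 before H2 ... before H0).
# Frames with the same index keep their time order.

def transmission_order(frames):
    """frames: list of (kind, index, time) tuples, kind in {'I','P','L','B','H'}."""
    def key(frame):
        kind, idx, time = frame
        if kind in ('I', 'P', 'L'):
            rank = (0, 0)       # reference frames are transmitted first
        elif kind == 'B':
            rank = (1, idx)     # lower B index -> transmitted earlier
        else:                   # 'H'
            rank = (1, -idx)    # higher H index -> transmitted earlier
        return (rank[0], rank[1], time)
    return sorted(frames, key=key)
```

For the base layer, `transmission_order([('B', 2, 1), ('I', 0, 0), ('B', 1, 2), ('P', 0, 4)])` yields the order I, P, B1, B2, matching the I->P->B1->B2 order stated above.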
Although scalable video encoding was standardized as early as MPEG-2 and has been studied in depth, it has not yet come into common use. The main reason for this is low coding efficiency. In other words, compared to a non-scalable video encoder, a scalable video encoder performs encoding so as to gradually improve the quality of a low-quality base layer. As a result, even at the same bit rate, the quality of a scalably encoded video may be seriously degraded compared to that of a non-scalably encoded video. Without addressing this coding efficiency problem, scalable encoding is difficult to deploy in the market.
To solve this problem, research is being actively conducted on overcoming the encoding efficiency degradation of scalable encoding. For example, in spatial scalable encoding, encoding efficiency can be greatly improved over independent encoding of each layer by using up-sampled frames of a lower layer in motion compensation. In other words, since there is a high correlation between layers, high encoding efficiency can be obtained by exploiting this correlation in predictive encoding.
However, in conventional scalable video encoding, entropy encoding does not use the correlation between layers and is instead performed in the same manner as in non-scalable video encoding. As a result, the encoding efficiency degradation cannot be fully overcome.