1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to video compression technology, and more particularly, to encoding/decoding using an extended macro-block skip mode, which performs inter-layer prediction by selectively using information of a base layer according to the type of the extended macro-block skip mode, wherein the type indicates if the information of the base layer is used as it is while skipping a frame of an enhanced layer in a multi-layer video signal.
2. Description of the Related Art
In related art information communication technologies including the Internet, multimedia services capable of supporting various types of information such as text, image and music are increasing. Multimedia data usually have a large volume, which requires a large capacity medium for storage of data and a wide bandwidth for transmission of data. Therefore, a compression coding scheme is required to transmit multimedia data including text, image, and audio data.
Data compression includes a process of removing redundancy in data. Data compression can be achieved by removing spatial redundancy, such as repetition of the same color or object in an image; temporal redundancy, such as repetition of the same sound in audio data or little change between temporally adjacent pictures in a moving image stream; or perceptual redundancy, which exploits the insensitivity of human vision and perception to high frequencies. Data compression can be classified into lossy/lossless compression according to whether source data is lost, intra-frame/inter-frame compression according to whether each frame is compressed independently, and symmetric/asymmetric compression according to whether the time necessary for compression and restoration is the same. In related art video coding schemes, temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by spatial transform.
Transmission media, which are necessary to transmit multimedia data generated after redundancies in the data are removed, show various levels of performance. Related art transmission media include media having various transmission speeds, from an ultra high-speed communication network capable of transmitting several tens of megabits of data per second, to a mobile communication network having a transmission speed of 384 kilobits per second. In such an environment, the scalable video coding scheme, that is, a scheme for transmitting the multimedia data at a data rate according to the transmission environment or to support transmission media of various speeds, is more suitable for the multimedia environment.
In a broad sense, the scalable video coding includes a spatial scalability for controlling a resolution of a video, a Signal-to-Noise Ratio (SNR) scalability for controlling a screen quality of a video, a temporal scalability for controlling a frame rate, and combinations thereof.
Standardization of the scalable video coding as described above is disclosed in Moving Picture Experts Group-21 (MPEG-21) part 10. In the course of this standardization, related art efforts have been made to implement scalability on a multi-layer basis. For example, the scalability may be based on multiple layers including a base layer, a first enhanced layer (enhanced layer 1), a second enhanced layer (enhanced layer 2), etc., which have different resolutions (QCIF, CIF, 2CIF, etc.) or different frame rates.
As is the case in related art coding with a single layer, it is necessary in multi-layer coding to obtain a motion vector (MV) for each layer in order to remove the temporal redundancy. Two cases can be distinguished: in the former case, a motion vector is individually obtained and used for each layer; in the latter case, a motion vector is obtained for one layer and is then also used for the other layers (either as it is or after up/down sampling). It is possible to obtain a more exact motion vector in the former case than in the latter case. However, the motion vectors obtained separately for each layer add overhead to the bit stream. Therefore, in the former case, the redundancy between the motion vectors of the respective layers should be substantially eliminated.
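The elimination of redundancy between the motion vectors of two layers can be illustrated with a minimal sketch. It assumes a resolution ratio of 2 between the layers (for example, QCIF to CIF) and uses hypothetical function names; it is not the method of any particular standard.

```python
def upsample_mv(base_mv, scale=2):
    """Scale a base-layer motion vector to the enhanced-layer resolution."""
    return (base_mv[0] * scale, base_mv[1] * scale)


def mv_residual(enh_mv, base_mv, scale=2):
    """Difference between the enhanced-layer MV and its base-layer prediction."""
    pred = upsample_mv(base_mv, scale)
    return (enh_mv[0] - pred[0], enh_mv[1] - pred[1])


def reconstruct_mv(residual, base_mv, scale=2):
    """Decoder side: add the residual back to the up-sampled base-layer MV."""
    pred = upsample_mv(base_mv, scale)
    return (residual[0] + pred[0], residual[1] + pred[1])


# Illustrative values only (assumed, not taken from any measurement).
base_mv = (3, -2)
enh_mv = (7, -4)
residual = mv_residual(enh_mv, base_mv)
restored = reconstruct_mv(residual, base_mv)
```

Only the small residual needs to be encoded for the enhanced layer, since the decoder can derive the prediction from the base-layer motion vector it already holds.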
FIG. 1 is a view illustrating a scalable video codec using a multi-layer structure. First, a base layer is defined to have a Quarter Common Intermediate Format (QCIF) resolution and a frame rate of 15 Hz, a first enhanced layer is defined to have a Common Intermediate Format (CIF) resolution and a frame rate of 30 Hz, and a second enhanced layer is defined to have a Standard Definition (SD) resolution and a frame rate of 60 Hz. If a CIF 0.5 Mbps stream is required, the CIF, 30 Hz, 0.7 Mbps bit stream of the first enhanced layer can be truncated and transmitted so that its bit rate becomes 0.5 Mbps. In this way, the spatial, temporal, and SNR scalability can be implemented.
As noted from FIG. 1, frames 10, 20, and 30 of respective layers having the same temporal position may have similar images. Therefore, there is a related art scheme in which a texture of a current layer is predicted from a texture of a lower layer either directly or through up-sampling, and a difference between the predicted value and the texture of the current layer is encoded. In “Scalable Video Model 3.0 of ISO/IEC 21000-13 Scalable Video Coding (hereinafter, referred to as SVM 3.0),” the scheme as described above is defined as an “Intra_BL prediction.”
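The idea of predicting the current-layer texture from the lower layer and encoding only the difference can be sketched as follows. The sketch assumes a 2x resolution ratio and nearest-neighbour up-sampling for brevity; an actual codec would use an interpolation filter, and all names and sample values here are hypothetical.

```python
def upsample_2x(block):
    """Nearest-neighbour 2x up-sampling of a 2-D list of samples."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def intra_bl_residual(current, base):
    """Residual between the current-layer block and the up-sampled base block."""
    pred = upsample_2x(base)
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, pred)]


def intra_bl_reconstruct(residual, base):
    """Decoder side: up-sampled base block plus the decoded residual."""
    pred = upsample_2x(base)
    return [[r + p for r, p in zip(rr, pr)] for rr, pr in zip(residual, pred)]


# Illustrative sample values (assumed).
base = [[10, 20],
        [30, 40]]
current = [[11, 10, 21, 20],
           [10, 10, 20, 19],
           [30, 31, 40, 40],
           [29, 30, 41, 40]]
residual = intra_bl_residual(current, base)
restored = intra_bl_reconstruct(residual, base)
```

Because frames at the same temporal position tend to be similar across layers, the residual values are typically small and cheaper to encode than the original texture.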
As described above, the SVM 3.0 employs not only the “inter-prediction” and the “directional intra-prediction,” which are used for prediction of blocks or macro-blocks constituting a current frame in the related art H.264, but also the scheme of predicting a current block by using a correlation between a current block and a lower layer block corresponding to the current block. This related art prediction scheme is called “Intra_BL prediction,” and an encoding mode using this prediction is called “Intra_BL mode.”
FIG. 2 is a schematic view for illustrating the three related art prediction schemes described above, which include an intra-prediction (①) for a certain macro-block 14 of a current frame 11, an inter-prediction (②) using a macro-block 15 of a frame 12 located at a position temporally different from that of the current frame 11, and an intra_BL prediction (③) using texture data for an area 16 of a base layer frame 13 corresponding to the macro-block 14. In the scalable video coding standard as described above, one advantageous scheme is selected and used from among the three prediction schemes for each macro-block.
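The per-macro-block selection among the three prediction schemes can be sketched as picking the candidate with the lowest cost. The sketch assumes a sum of absolute differences (SAD) as the cost measure; an actual encoder would typically use a rate-distortion cost, and the candidate predictions below are made-up values for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def select_mode(current, candidates):
    """Return the name of the candidate prediction with the smallest SAD."""
    return min(candidates, key=lambda name: sad(current, candidates[name]))


# Hypothetical 2x2 block and candidate predictions.
current = [[10, 12], [14, 16]]
candidates = {
    "intra": [[10, 10], [10, 10]],
    "inter": [[9, 12], [14, 18]],
    "intra_bl": [[10, 12], [15, 16]],
}
best = select_mode(current, candidates)
```

With these values the Intra_BL candidate differs from the current block by a single sample, so it is the one selected.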
A macro-block skip mode has been employed in the related art in the enhanced layer of the scalable video coding as described above and is used in the coding of a static motion sequence. The macro-block skip mode was not originally designed for the scalable video coding; it has been used in the related art H.264 and can be considered as having been borrowed for the scalable video coding. To apply the macro-block skip mode of the H.264, the Coded Block Pattern (CBP) must have a value of 0, which means there is no data to be predicted. Therefore, in order to apply the macro-block skip mode of the H.264, the value of the residual prediction flag (residual_pred_flag) must be set to 0.
When the macro-block skip mode is applied in this way, a motion vector and pattern information of a macro-block to be encoded are skipped, and instead a motion vector and pattern information of another macro-block adjacent to the macro-block are taken and encoded. Then, they are expressed by a flag of one bit.
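The signalling described above can be sketched in simplified form. The sketch assumes that a macro-block is skipped when its CBP is 0 and its motion vector equals the one taken from an adjacent macro-block; the dictionary layout and the names `skip_flag`, `encode_macroblock`, and `decode_macroblock` are hypothetical and do not reproduce the H.264 syntax.

```python
def encode_macroblock(mv, cbp, neighbor_mv):
    """Return the syntax elements for one macro-block (hypothetical layout)."""
    if cbp == 0 and mv == neighbor_mv:
        return {"skip_flag": 1}                  # one-bit flag, nothing else sent
    return {"skip_flag": 0, "mv": mv, "cbp": cbp}


def decode_macroblock(syntax, neighbor_mv):
    """Recover (mv, cbp); a skipped block inherits the neighbor's motion vector."""
    if syntax["skip_flag"] == 1:
        return neighbor_mv, 0
    return syntax["mv"], syntax["cbp"]


skipped = encode_macroblock(mv=(2, 1), cbp=0, neighbor_mv=(2, 1))
coded = encode_macroblock(mv=(2, 1), cbp=5, neighbor_mv=(2, 1))
```

A skipped macro-block therefore costs a single flag, while a non-skipped macro-block carries its own motion vector and pattern information.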
However, according to the related art macro-block skip mode employed in the scalable video coding, the skip mode can be applied only within the base layer and within each of the enhanced layers individually, and it is impossible to use an extended macro-block skip mode, that is, a mode that extends the macro-block skip mode across layers by using a flag of one bit.