1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to video compression, and more particularly, to reducing inter-layer redundancy of a difference signal obtained from an intra-prediction when coding a video using a multi-layer structure supporting intra-prediction.
2. Description of the Related Art
Development of communication technologies such as the Internet has led to an increase in video communication in addition to text and voice communication. However, consumers have not been satisfied with existing text-based communication schemes. To satisfy various consumer needs, services for multimedia data containing text, images, music and the like have been increasingly provided. Multimedia data is usually voluminous and requires a large capacity storage medium. Also, a wide bandwidth is required for transmitting the multimedia data. Accordingly, a compression coding scheme is required when transmitting multimedia data.
A basic principle of data compression is to eliminate redundancy in the data. Data can be compressed by removing spatial redundancy, which is the duplication of colors or objects in an image; temporal redundancy, which is little or no variation between adjacent frames in a moving picture or successive repetition of the same sounds in audio; or perceptual-visual redundancy, which refers to the limitations of human vision and the inability to hear high frequencies.
So far, a variety of standards, such as MPEG-2, MPEG-4, and H.264, have been suggested as video compression methods. In addition, the Joint Video Team (JVT), which is a joint working group of the Moving Picture Experts Group (MPEG) and the International Telecommunication Union (ITU), is working on a standardization effort, hereinafter referred to as “H.264 SE” (Scalable Extension), in order to implement scalability in multi-layer-based H.264.
FIG. 1 illustrates the concept of a scalable video coding method according to the Scalable Video Coding (SVC) standard. As an example of scalable video coding, a base layer can be set to quarter common intermediate format (QCIF) with a frame rate of 15 Hz, a first enhancement layer can be set to common intermediate format (CIF) with a frame rate of 30 Hz, and a second enhancement layer can be set to standard definition (SD) with a frame rate of 60 Hz.
Inter-layer correlation can be used to encode a multi-layer video frame. For example, a certain area 12 among video frames of the first enhancement layer can be effectively encoded through a prediction performed from the corresponding area 13 included in a video frame of a base layer. Similarly, an area 11 among the second enhancement layer video frames can be effectively encoded through a prediction performed from the area 12 of the first enhancement layer. If resolution for each layer in the multi-layer video is different, an image of the base layer needs to be up-sampled prior to the prediction.
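As a rough illustration of the up-sampling step described above, the following sketch doubles the resolution of a base-layer block by pixel replication and uses the result as the prediction for the co-located enhancement-layer block. Pixel replication is an assumption chosen for brevity (practical codecs use interpolation filters), and the names `upsample_2x` and `inter_layer_residual` are hypothetical.

```python
def upsample_2x(block):
    """Double the width and height of a block by pixel replication.

    Assumption: nearest-neighbour replication stands in for the
    interpolation filter a real codec would use.
    """
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def inter_layer_residual(enh_block, base_block):
    """Subtract the up-sampled base-layer block from the enhancement-layer block."""
    pred = upsample_2x(base_block)
    if len(pred) != len(enh_block) or any(
        len(p) != len(e) for p, e in zip(pred, enh_block)
    ):
        raise ValueError("enhancement block must be exactly 2x the base block size")
    return [[e - p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(enh_block, pred)]


# Hypothetical sample data: a 2x2 base-layer area and the 4x4 area it corresponds to.
base = [[10, 20],
        [30, 40]]
enh = [[11, 10, 21, 19],
       [10, 12, 20, 22],
       [29, 30, 41, 40],
       [31, 30, 39, 42]]

residual = inter_layer_residual(enh, base)
```

Because the two layers are correlated, the residual values are small compared with the original samples, which is why the enhancement-layer area can be encoded with fewer bits.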
In the SVC standard, both inter-prediction, which is performed with reference to a frame having a different temporal position as in conventional single-layer coding, and intra-prediction, which is performed with reference to a frame having a similar temporal position, are supported. Intra-prediction includes directional intra-prediction, which is performed with reference to other areas in the frame to be predicted, and intra-prediction performed with reference to the lower layer frame having a temporal position identical to that of the frame to be predicted (also referred to as “intra-base-layer prediction”).
FIG. 2 is a conceptual diagram illustrating the three prediction methods. With reference to FIG. 2, directional intra-prediction can be performed on a certain block 4 of the current frame 1 by using information on neighboring blocks, or inter-prediction can be performed on the block 4 by using a frame 2 (a previous or future frame) having a temporal position different from that of the current frame 1. Intra-base prediction can be performed on the block 4 by using information on the corresponding area 6 included in the lower layer frame 3 having a temporal position identical to that of the current frame 1.
Generally, prediction refers to a process of reducing the complexity of a signal to be coded by subtracting a prediction block, which is available in both the video encoder and the video decoder, from the coding object (a frame or block to be coded). The result of subtracting a prediction block from the coding object is referred to as a difference signal. Ultimately, the difference between the three prediction methods in FIG. 2 is the way in which the prediction block is obtained.
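The relationship between the coding object, the prediction block, and the difference signal can be sketched as follows. This is a minimal, lossless illustration that ignores transform and quantization; the function names and the sample values are hypothetical, and the three prediction blocks merely stand in for the outputs of the three prediction methods of FIG. 2.

```python
def difference_signal(coding_object, prediction_block):
    """Difference signal = coding object minus prediction block, per sample."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(coding_object, prediction_block)]


def reconstruct(prediction_block, diff):
    """Decoder side: add the difference signal back to the same prediction block."""
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(prediction_block, diff)]


# Hypothetical 2x2 block to be coded.
current = [[52, 55],
           [54, 56]]

# The subtraction is the same for every method; only the source of the
# prediction block differs (values are made up for illustration).
predictions = {
    "directional_intra": [[50, 50], [50, 50]],  # from neighboring blocks
    "inter":             [[52, 54], [55, 56]],  # from a frame at another time
    "intra_base":        [[51, 55], [54, 57]],  # from the lower layer frame
}

diffs = {name: difference_signal(current, pred)
         for name, pred in predictions.items()}

# Since the decoder has the same prediction block, it can recover the block.
for name, pred in predictions.items():
    assert reconstruct(pred, diffs[name]) == current
```

In this sketch the encoder would transmit only the difference signal together with enough information for the decoder to form the same prediction block.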