1. Field of the Invention
Apparatuses and methods consistent with the present invention relate generally to video coding, and more particularly, to effectively predicting a video frame in a manner that uses the advantages of both an intra mode and an intra BL mode in multi-layer structure-based video coding.
2. Description of the Related Art
As information and communication technology, including the Internet, develops, image-based communication is increasing alongside text-based and voice-based communication. The existing text-based communication is insufficient to satisfy the various demands of consumers. Therefore, the provision of multimedia services capable of accommodating various types of information, such as text, images and music, is increasing. Since the amount of multimedia data is large, multimedia data require high-capacity storage media and broad bandwidth at the time of transmission. Therefore, to transmit multimedia data, including text, images and audio, it is essential to use a compression coding technique.
The fundamental principle of data compression is to eliminate redundancy in data. Data can be compressed by eliminating spatial redundancy such as the case where an identical color or object is repeated in an image, temporal redundancy such as the case where there is little change between neighboring frames or identical audio sound is repeated, or psychovisual redundancy in which the fact that humans' visual and perceptual abilities are insensitive to high frequencies is taken into account.
For such a moving image compression method, H.264/Advanced Video Coding (AVC), which has higher compression efficiency than Moving Picture Experts Group (MPEG)-4, has attracted attention recently. H.264 uses directional intra-prediction, which eliminates spatial similarity in each frame, as one of the schemes for improving compression efficiency.
Directional intra-prediction is a method of predicting the values of a current sub-block by copying, in predetermined directions, neighboring pixels located on the upper and left sides of the sub-block, and then encoding only the difference between the sub-block and its prediction.
In H.264, a predicted block with respect to a current block is generated based on other blocks having preceding sequential positions. The difference between the current block and the predicted block is encoded. For a luminance component, each predicted block is generated on a 4×4 block or 16×16 macroblock basis. There are a total of nine optional prediction modes for each 4×4 block, whereas there are a total of four optional prediction modes for each 16×16 block. An H.264-based video encoder selects, for each block, the one prediction mode that minimizes the difference between the current block and the predicted block from among the prediction modes.
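By way of a non-limiting sketch, the mode selection described above may be illustrated as follows. The sum of absolute differences (SAD) is assumed here as the measure of the difference between the current block and each predicted block; the text does not fix a particular measure, and practical encoders may instead use a cost that also accounts for bit rate. The function names `sad` and `select_mode` are hypothetical.

```python
def sad(block, predicted):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(b - p)
        for block_row, pred_row in zip(block, predicted)
        for b, p in zip(block_row, pred_row)
    )


def select_mode(current, candidates):
    """Return the mode whose predicted block is closest to the current block.

    `candidates` maps a mode number to the predicted block for that mode.
    SAD is an assumed cost; the source only says "minimizes the difference".
    """
    return min(candidates, key=lambda mode: sad(current, candidates[mode]))


current = [[1, 2], [3, 4]]
candidates = {
    0: [[1, 2], [3, 5]],   # differs from the current block in one sample
    1: [[0, 0], [0, 0]],   # differs in every sample
}
best = select_mode(current, candidates)
```

Only the residual for the selected mode, together with the mode number, would then need to be encoded.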
For the 4×4 block, H.264, as shown in FIG. 1, employs a total of nine prediction modes, including a total of eight directional modes (modes 0, 1, and 3 to 8), and a DC mode (mode 2) that uses an average of the values of eight neighboring pixels.
FIG. 2 illustrates an example of labeling to illustrate the nine prediction modes. In this example, a predicted block (including regions “a” to “p”) with respect to a current block is generated using previously decoded samples A to M. If regions E, F, G and H cannot be previously decoded, regions E, F, G and H can be virtually created by copying region D to the locations of the regions E, F, G and H.
With reference to FIG. 3, the nine prediction modes are respectively described in detail below. In the case of mode 0, the pixels of a predicted block are extrapolated using upper samples A, B, C and D in a vertical direction, and in the case of mode 1, the pixels are extrapolated using left samples I, J, K and L in a horizontal direction. Furthermore, in the case of mode 2, the pixels of the predicted block are uniformly replaced by the average of upper samples A, B, C and D and left samples I, J, K and L.
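Modes 0 to 2 for a 4×4 block may be sketched as follows, purely as an illustration. The function name `predict_4x4` and its argument layout are hypothetical; `top` holds samples A to D and `left` holds samples I to L. The integer rounding used for the DC average (adding 4 before dividing by 8) is an assumption, since the text only states that an average is used.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 predicted block for modes 0 (vertical), 1 (horizontal), 2 (DC).

    top  = [A, B, C, D], the decoded samples above the block
    left = [I, J, K, L], the decoded samples to the left of the block
    """
    if mode == 0:
        # Vertical: every row repeats the upper samples.
        return [list(top) for _ in range(4)]
    if mode == 1:
        # Horizontal: every row is filled with its left sample.
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:
        # DC: one average of the eight neighbors (rounding is an assumption).
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


top = [10, 20, 30, 40]
left = [50, 60, 70, 80]
vertical = predict_4x4(0, top, left)
horizontal = predict_4x4(1, top, left)
dc_block = predict_4x4(2, top, left)
```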
In the case of mode 3, the pixels of the predicted block are interpolated between a lower-left position and an upper-right position in a direction that is inclined at an angle of 45°, and in the case of mode 4, the pixels are extrapolated in a direction that is inclined toward a lower-right position at an angle of 45°. Furthermore, in the case of mode 5, the pixels of the predicted block are extrapolated in a direction that is inclined rightward from a vertical direction at an angle of about 26.6° (width/height=1/2).
In the case of mode 6, the pixels of the predicted block are extrapolated in a direction that is inclined downward from a horizontal direction at an angle of about 26.6°, and in the case of mode 7, the pixels are extrapolated in a direction that is inclined leftward from a vertical direction at an angle of about 26.6°. Finally, in the case of mode 8, the pixels of the predicted block are interpolated in a direction that is inclined upward from a horizontal direction at an angle of about 26.6°.
The arrows of FIG. 3 indicate prediction directions in respective modes. In modes 3 to 8, the samples of the predicted block can be generated from the weighted averages of previously decoded reference samples A to M. For example, in the case of mode 4, sample d, which is located in the upper right, can be predicted as expressed by the following Equation 1. In this equation, round(·) is a function that rounds an input value to the nearest integer.
d = round(B/4 + C/2 + D/4)   (1)
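As an illustrative sketch only, Equation 1 may be evaluated in integer arithmetic as shown below. The helper name `three_tap` is hypothetical, and the rounding of exact halves upward is an assumption; the text only states that the value is rounded to an integer. Python's built-in `round` is avoided because it rounds exact halves to the nearest even integer.

```python
def three_tap(first, center, last):
    """Weighted average first/4 + center/2 + last/4, rounded to an integer.

    Adding 2 before the shift by 2 rounds halves upward (assumed convention).
    """
    return (first + 2 * center + last + 2) >> 2


def predict_sample_d(b, c, d):
    """Equation 1: predict sample d from reference samples B, C and D."""
    return three_tap(b, c, d)


d_flat = predict_sample_d(20, 20, 20)   # flat neighborhood stays flat
d_ramp = predict_sample_d(10, 20, 30)   # linear ramp yields the center value
d_half = predict_sample_d(0, 1, 0)      # exact value 0.5 is rounded upward
```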
Meanwhile, 16×16 prediction for luminance components includes four modes, that is, mode 0, mode 1, mode 2 and mode 3. In the case of mode 0, the pixels of a predicted block are extrapolated from upper samples H, and in the case of mode 1, the pixels of a predicted block are extrapolated from left samples V. Furthermore, in the case of mode 2, the pixels of a predicted block are calculated using the average of upper samples H and left samples V. Finally, in the case of mode 3, a “plane” function fitted to upper samples H and left samples V is used. This mode is more suitable for a region in which luminance changes smoothly.
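The “plane” mode may be sketched, under stated assumptions, as follows. The integer constants below follow the commonly cited H.264 formulation of 16×16 plane prediction as recalled here and should be checked against the standard before being relied upon. The names `predict_16x16_plane`, `top`, `left` and `corner` are hypothetical: `top` corresponds to upper samples H, `left` to left samples V, and `corner` to the sample above and to the left of the block.

```python
def clip8(value):
    """Clamp a value to the 8-bit sample range (8-bit video is assumed)."""
    return max(0, min(255, value))


def predict_16x16_plane(top, left, corner):
    """Fit a plane to the 16 upper and 16 left samples and evaluate it.

    top[x]  is the sample above column x, left[y] the sample left of row y.
    """
    def gradient(edge):
        # Weighted differences of samples mirrored about the edge's midpoint.
        total = 0
        for i in range(8):
            low = edge[6 - i] if i < 7 else corner
            total += (i + 1) * (edge[8 + i] - low)
        return total

    a = 16 * (left[15] + top[15])
    b = (5 * gradient(top) + 32) >> 6    # horizontal slope, scaled by 32
    c = (5 * gradient(left) + 32) >> 6   # vertical slope, scaled by 32
    return [
        [clip8((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
        for y in range(16)
    ]


# Neighbors taken from the smooth ramp 50 + 2x + 3y, one sample outside the block.
top = [47 + 2 * x for x in range(16)]
left = [48 + 3 * y for y in range(16)]
plane = predict_16x16_plane(top, left, corner=45)
```

For neighbors lying on such a ramp, the predicted block reproduces the ramp itself, which illustrates why this mode suits regions in which luminance changes smoothly.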
Meanwhile, in addition to efforts to improve the efficiency of video coding, research is actively being carried out into video coding that supports scalability, that is, video coding that allows the resolution, frame rate, and Signal-to-Noise Ratio (SNR) of transmitted video data to be adjusted.
With regard to this scalable video coding technique, standardization work is in progress in the Moving Picture Experts Group (MPEG)-21 PART-13. Among the methods for supporting scalability, a multi-layered video coding method is considered a prominent method. For example, multiple layers, including a base layer, a first enhanced layer and a second enhanced layer, are provided, and the respective layers have different resolutions (QCIF, CIF and 2CIF) or different frame rates.
In the scalable video coding standard currently in progress, besides inter prediction and directional intra prediction (hereinafter simply referred to as intra prediction) used in existing H.264 to predict a current block or macroblock, a method of predicting a block of the current layer using the correlation between the current block and a corresponding lower layer block is additionally introduced. This prediction method is referred to as “intra BL (intra_BL) prediction” in the standard, and the case of performing encoding using such prediction is referred to as “intra BL mode.”
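Intra BL prediction may be illustrated by the following minimal sketch, which assumes that the lower layer has half the resolution of the current layer in each dimension and that the lower layer block is enlarged by simple sample repetition. Both assumptions are made for illustration only; an actual codec would typically use an interpolation filter, and no upsampling is needed when the layers share a resolution. The function names are hypothetical.

```python
def upsample_2x(block):
    """Double a 2-D block in both dimensions by repeating each sample."""
    enlarged = []
    for row in block:
        wide = [value for value in row for _ in range(2)]
        enlarged.append(wide)
        enlarged.append(list(wide))
    return enlarged


def intra_bl_residual(current, base_block):
    """Difference between the current block and the enlarged lower layer block."""
    predicted = upsample_2x(base_block)
    return [
        [cur - pred for cur, pred in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, predicted)
    ]


base_block = [[10, 20],
              [30, 40]]
current = [[11, 10, 20, 22],
           [10, 10, 20, 20],
           [30, 31, 40, 40],
           [29, 30, 41, 40]]
residual = intra_bl_residual(current, base_block)
```

Only the residual would be encoded, so the closer the lower layer block is to the current block, the fewer bits are needed.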
FIG. 4 is a schematic diagram showing the three prediction methods, which illustrates the case of performing intra prediction on a macroblock of a current frame 1 (①), the case of performing inter prediction using a frame 2 placed at a temporal location different from that of the current frame 1 (②), and the case of performing intra BL prediction using texture data about the region 6 of the frame of a base layer corresponding to a macroblock (③).
As described above, in the scalable video coding standard, the advantageous one of the three prediction methods is selected for each macroblock, and a corresponding macroblock is encoded using the selected method. That is, for one macroblock, inter prediction, intra prediction and intra BL prediction are selectively used. However, a differential block created using intra BL prediction still has considerable correlation with neighboring differences. Accordingly, it is necessary to develop a prediction technique that takes the advantages of both intra BL prediction and intra prediction into account. Although a prediction method that takes the advantages of intra prediction, intra BL prediction and inter prediction into account may be considered, the characteristics of intra BL prediction and intra prediction are considerably different from those of inter prediction, so this method is not desirable.