The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Following the development of information and communication technology including the Internet, communication is on the rise in the form of video as well as text and voice. Users unsatisfied with existing text-oriented communication services are being offered an increasing number of multimedia services encompassing texts, images, music, and various types of information. The enormous quantity inherent to multimedia data calls for larger and larger storage capacities and broader bandwidths. Therefore, compressive coding technologies have become a requisite in transmitting multimedia data including text, video, and audio.
A basic principle of data compression is the removal of redundancy in the data. Data can be compressed by removing spatial redundancy, such as the repetition of the same color or object in an image; temporal redundancy, such as the repetition of the same note in audio or the small change between adjacent frames in a video; or psychovisual redundancy, which exploits the fact that human visual perception is insensitive to high frequencies.
As a video compression method, H.264/AVC has recently drawn interest for its improved compression efficiency in comparison with MPEG-4 (Moving Picture Experts Group-4).
H.264 is a digital video codec standard with a very high data compression rate, and is also referred to as MPEG-4 Part 10 or AVC (Advanced Video Coding). The standard was developed by the Joint Video Team, formed jointly by VCEG (Video Coding Experts Group) of ITU-T (International Telecommunication Union Telecommunication Standardization Sector) and MPEG of ISO/IEC (International Organization for Standardization/International Electrotechnical Commission).
Various methods have been proposed to improve compression efficiency in compression encoding; representative among them are temporal prediction and spatial prediction.
As shown in FIG. 1, to predict a current block 112 of a current frame 110, the temporal prediction scheme refers to a reference block 122 of another, temporally adjacent frame 120. That is, in inter-predicting the current block 112 of the current frame 110, the temporally adjacent reference frame 120 is searched for the reference block 122 that is most similar to the current block 112. Here, the reference block 122 is the block that best predicts the current block 112; for example, the block having the smallest SAD (Sum of Absolute Differences) with respect to the current block 112 can be the reference block 122. The reference block 122 becomes the predicted block of the current block 112, and a residual block is generated by subtracting the reference block 122 from the current block 112. The generated residual block is encoded and inserted in a bitstream. The relative difference between the position of the current block 112 in the current frame 110 and the position of the reference block 122 in the reference frame 120 corresponds to a motion vector 130, and the motion vector 130 is encoded along with the residual block. The temporal prediction is also referred to as an inter prediction or an inter-frame prediction.
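The block search described above can be illustrated with a minimal sketch. It assumes an exhaustive (full) search over a small window and frames stored as lists of rows of integer samples; the function names `sad` and `full_search` and the `search_range` parameter are hypothetical and are not taken from H.264 or from the present disclosure.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def full_search(cur, ref, x, y, size, search_range):
    """Find the reference block with the smallest SAD for the block at (x, y).

    Returns (motion_vector, best_sad, residual_block).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, x, y, ref, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    (dx, dy), cost = best
    # Residual block = current block minus the selected reference block.
    residual = [
        [cur[y + r][x + c] - ref[y + dy + r][x + dx + c] for c in range(size)]
        for r in range(size)
    ]
    return (dx, dy), cost, residual
```

In this sketch the returned displacement plays the role of the motion vector 130, and the returned residual is what would be passed on for encoding. Practical encoders typically replace the exhaustive loop with a faster search strategy.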
The spatial prediction obtains a predicted pixel value of a target block by using reconstructed pixel values of a reference block adjacent to the target block within the same frame, and is also referred to as a directional intra-prediction (hereinafter, simply referred to as an “intra-prediction”) or an intra-frame prediction. H.264 defines encoding and decoding that use the intra-prediction.
The intra-prediction scheme predicts the values of a current subblock by copying, in a determined direction, adjacent pixels located above and to the left of the subblock, and encodes only the difference. According to the intra-prediction scheme based on the H.264 standard, a predicted block for a current block is generated based on another block that precedes it in coding order. The value generated by subtracting the predicted block from the current block is then coded. For each block, a video encoder based on the H.264 standard selects, from among the prediction modes, the prediction mode that yields the smallest difference between the current block and the predicted block.
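The mode selection described above can be sketched as follows. The sketch assumes that the "difference" is measured as a sum of absolute differences and that ties go to the lower mode number; both are illustrative assumptions, since the text does not fix the cost measure. The names `sad_block` and `select_mode` are hypothetical.

```python
def sad_block(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def select_mode(current, candidates):
    """Pick the prediction mode whose predicted block is closest to `current`.

    `candidates` maps a mode number to the predicted block for that mode.
    Returns (best_mode, residual_block).
    """
    # Assumption: ties are broken in favour of the lower mode number.
    best_mode = min(
        candidates, key=lambda m: (sad_block(current, candidates[m]), m)
    )
    predicted = candidates[best_mode]
    # Only this residual (current minus predicted) is passed on for coding.
    residual = [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, predicted)
    ]
    return best_mode, residual
```

An encoder may instead use a cost that also accounts for the bits needed to signal the mode; the simple distortion-only cost is used here to mirror the description.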
The intra-prediction based on the H.264 standard provides 9 prediction modes shown in FIG. 2 in consideration of the prediction directivity and positions of adjacent pixels used for generating predicted pixel values of a 4×4 luma block and an 8×8 luma block. The 9 prediction modes are divided into a vertical prediction mode (prediction mode 0), a horizontal prediction mode (prediction mode 1), a DC prediction mode (prediction mode 2), a diagonal_down_left prediction mode (prediction mode 3), a diagonal_down_right prediction mode (prediction mode 4), a vertical_right prediction mode (prediction mode 5), a horizontal_down prediction mode (prediction mode 6), a vertical_left prediction mode (prediction mode 7), and a horizontal_up prediction mode (prediction mode 8) according to their prediction directions. Here, the DC prediction mode uses an average value of eight adjacent pixels.
Further, 4 intra-prediction modes are used for an intra-prediction processing for a 16×16 luma block, wherein the 4 intra-prediction modes are the vertical prediction mode (prediction mode 0), the horizontal prediction mode (prediction mode 1), the DC prediction mode (prediction mode 2), and the plane prediction mode (prediction mode 3). In addition, the same 4 intra-prediction modes are used for an intra-prediction processing for an 8×8 chroma block.
FIG. 3 illustrates a labeling example for the 9 intra-prediction modes shown in FIG. 2. In this event, a predicted block (an area including a to p) for the current block is generated using samples (A to M) decoded in advance. When E, F, G, and H cannot be decoded in advance, E, F, G, and H can be virtually generated by copying D into their positions.
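The labeling of FIG. 3 and the substitution of D for unavailable samples E to H can be sketched as below. The function name `build_reference_samples` and its argument layout are hypothetical; only the labels A to M and the copy-D rule come from the description above.

```python
def build_reference_samples(top, left, corner, top_right=None):
    """Map reconstructed neighbours to the labels A to M of FIG. 3.

    top       -> A, B, C, D   (row above the block)
    top_right -> E, F, G, H   (row above and to the right), or None
    left      -> I, J, K, L   (column to the left)
    corner    -> M            (upper-left neighbour)
    """
    if top_right is None:
        # E to H are not decoded yet: substitute copies of D.
        top_right = [top[3]] * 4
    labels = "ABCDEFGHIJKLM"
    values = list(top) + list(top_right) + list(left) + [corner]
    return dict(zip(labels, values))
```

For example, calling the function without `top_right` yields a table in which E, F, G, and H all carry the value of D.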
FIG. 4 is a diagram for illustrating the 9 prediction modes shown in FIG. 2 by using FIG. 3. Referring to FIG. 4, in the case of the prediction mode 0, all pixels on the same vertical line of the predicted block are given the same predicted pixel value. That is, each pixel of the predicted block is predicted from the nearest pixel of the reference block located above the predicted block. The reconstructed pixel value of an adjacent pixel A is set as the predicted pixel value of a pixel a, pixel e, pixel i, and pixel m in a first column of the predicted block. In the same way, pixel values of a pixel b, pixel f, pixel j, and pixel n in a second column are predicted from the reconstructed pixel value of an adjacent pixel B, pixel values of a pixel c, pixel g, pixel k, and pixel o in a third column are predicted from the reconstructed pixel value of an adjacent pixel C, and pixel values of a pixel d, pixel h, pixel l, and pixel p in a fourth column are predicted from the reconstructed pixel value of an adjacent pixel D. As a result, a predicted block in which the predicted pixel values of each column correspond to the pixel values of the pixel A, pixel B, pixel C, and pixel D is generated as shown in FIG. 5A.
Further, in the case of the prediction mode 1, all pixels on the same horizontal line of the predicted block are given the same predicted pixel value. That is, each pixel of the predicted block is predicted from the nearest pixel of the reference block located to the left of the predicted block. The reconstructed pixel value of an adjacent pixel I is set as the predicted pixel value of a pixel a, pixel b, pixel c, and pixel d in a first row of the predicted block. In the same way, pixel values of a pixel e, pixel f, pixel g, and pixel h in a second row are predicted from the reconstructed pixel value of an adjacent pixel J, pixel values of a pixel i, pixel j, pixel k, and pixel l in a third row are predicted from the reconstructed pixel value of an adjacent pixel K, and pixel values of a pixel m, pixel n, pixel o, and pixel p in a fourth row are predicted from the reconstructed pixel value of an adjacent pixel L. As a result, a predicted block in which the predicted pixel values of each row correspond to the pixel values of the pixel I, pixel J, pixel K, and pixel L is generated as shown in FIG. 5B.
Furthermore, in the case of the prediction mode 2, all pixels of the predicted block are set to the average of the pixel values of the upper pixels A, B, C, and D and the left pixels I, J, K, and L.
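The three modes described so far (vertical, horizontal, and DC) for a 4×4 block can be sketched as follows. Here `top` is assumed to hold the samples A to D and `left` the samples I to L; the function names are hypothetical. The DC average is computed in integer arithmetic with rounding to the nearest integer, which is an assumption about how the "average" is rounded.

```python
def predict_vertical(top):
    """Mode 0: every row repeats the samples A, B, C, D from above."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: row r is filled with the left neighbour of that row (I to L)."""
    return [[value] * 4 for value in left]


def predict_dc(top, left):
    """Mode 2: every pixel is the rounded mean of the eight neighbours."""
    # Adding 4 before shifting right by 3 rounds the division by 8.
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]
```

With `top = [10, 20, 30, 40]`, the vertical predictor produces four identical rows, which corresponds to the column-wise copy shown in FIG. 5A; the horizontal predictor gives the row-wise copy of FIG. 5B.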
Meanwhile, pixels of a predicted block for prediction mode 3 are interpolated in a lower-left direction at an angle of 45°, along the diagonal between the lower-left side and the upper-right side of the predicted block, and pixels of a predicted block for prediction mode 4 are extrapolated in a lower-right direction at an angle of 45°, along the diagonal between the upper-left side and the lower-right side of the predicted block. Further, pixels of a predicted block for prediction mode 5 are extrapolated in a lower-right direction at an angle of about 26.6° (width/height=½) with respect to a vertical line. In addition, pixels of a predicted block for prediction mode 6 are extrapolated in a lower-right direction at an angle of about 26.6° with respect to a horizontal line, pixels of a predicted block for prediction mode 7 are extrapolated in a lower-left direction at an angle of about 26.6° with respect to a vertical line, and pixels of a predicted block for prediction mode 8 are interpolated in an upper direction at an angle of about 26.6° with respect to a horizontal line.
In the prediction modes 3 to 8, the pixels of the predicted block can be generated from a weighted average of the pixels A to M of the reference block decoded in advance. For example, in the case of prediction mode 4, the pixel d located in the upper right corner of the predicted block can be estimated as shown in Formula 1. Here, round( ) is a function that rounds to the nearest whole number.
d = round(B/4 + C/2 + D/4)    Formula 1
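Formula 1 can be extended to a whole 4×4 block for prediction mode 4 as in the sketch below. It assumes that the same three-tap weighting (1/4, 1/2, 1/4) is applied along every down-right diagonal and that rounding is done in integer arithmetic as (a + 2b + c + 2) >> 2, so that halves round upward; Python's built-in `round` rounds halves to even and is therefore not used. The function name and argument layout (`top` = A to D, `left` = I to L, `corner` = M) are hypothetical.

```python
def predict_diagonal_down_right(top, left, corner):
    """Mode 4 sketch: extrapolate neighbours along 45-degree down-right diagonals."""

    def p(x, y):
        # Neighbour lookup: y == -1 is the row above, x == -1 the left column.
        if x == -1 and y == -1:
            return corner
        return top[x] if y == -1 else left[y]

    def filt(a, b, c):
        # Integer form of round(a/4 + b/2 + c/4), halves rounded upward.
        return (a + 2 * b + c + 2) >> 2

    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x > y:    # above the main diagonal: fed by the top row
                pred[y][x] = filt(p(x - y - 2, -1), p(x - y - 1, -1), p(x - y, -1))
            elif x < y:  # below the main diagonal: fed by the left column
                pred[y][x] = filt(p(-1, y - x - 2), p(-1, y - x - 1), p(-1, y - x))
            else:        # on the main diagonal: fed by A, M, and I
                pred[y][x] = filt(p(0, -1), corner, p(-1, 0))
    return pred
```

In this sketch the pixel d is `pred[0][3]`, computed from B, C, and D exactly as in Formula 1, and all pixels on the same down-right diagonal share one predicted value.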
Meanwhile, the 16×16 prediction for luma components has 4 modes, namely the prediction mode 0, prediction mode 1, prediction mode 2, and prediction mode 3, as described above.
In the case of the prediction mode 0, pixels of the predicted block are extrapolated from the upper pixels, and, in the case of the prediction mode 1, the pixels of the predicted block are extrapolated from the left pixels. Further, in the case of the prediction mode 2, the pixels of the predicted block are calculated as an average of the upper pixels and the left pixels. Lastly, in the case of the prediction mode 3, a linear “plane” function fitted to the upper pixels and the left pixels is used. The prediction mode 3 is more suitable for an area in which the luminance changes smoothly.
As described above, in the H.264 standard, each prediction mode except the DC mode generates the pixel values of the predicted block in the direction corresponding to that mode, based on pixels adjacent to the block currently being encoded.
However, depending on the characteristics of an image, there are cases where accurate prediction is difficult using only the 9 modes, and in such cases the encoding efficiency can deteriorate. For example, an image may include patterns having a specific directivity that does not exactly correspond to any of the aforementioned 8 directions. Increasing the number of directivity modes is not a preferable remedy, because the amount of information to be encoded may increase significantly relative to the gain in encoding efficiency.
That is, in the majority of cases, the current directivity modes are sufficient for accurate prediction. However, because the directions of the prediction modes are limited, the pixel values of a predicted block may not be accurately predicted for some images, which deteriorates the encoding efficiency. In such cases, a sufficient entropy-encoding gain cannot be obtained because of the inaccurate intra-prediction, and thus the bit rate is unnecessarily increased.
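The linear "plane" function of prediction mode 3 can be illustrated with the sketch below, which estimates a horizontal and a vertical gradient from the neighbouring pixels and evaluates the resulting plane at every position of the 16×16 block. The integer constants are modeled on the H.264 plane predictor as recalled here and should be checked against the standard before being relied on; 8-bit samples are assumed, and the function name and argument layout (`top` and `left` with 16 samples each, plus the upper-left `corner`) are hypothetical.

```python
def predict_plane_16x16(top, left, corner):
    """Mode 3 sketch: fit a plane to the upper and left neighbours."""

    def p_top(x):   # row above the block; x == -1 is the corner sample
        return corner if x == -1 else top[x]

    def p_left(y):  # column left of the block; y == -1 is the corner sample
        return corner if y == -1 else left[y]

    # Weighted differences of samples placed symmetrically about the centre.
    h = sum((i + 1) * (p_top(8 + i) - p_top(6 - i)) for i in range(8))
    v = sum((i + 1) * (p_left(8 + i) - p_left(6 - i)) for i in range(8))

    a = 16 * (left[15] + top[15])   # plane offset, scaled by 32
    b = (5 * h + 32) >> 6           # horizontal slope, scaled by 32
    c = (5 * v + 32) >> 6           # vertical slope, scaled by 32

    def clip(sample):               # keep results in the 8-bit range
        return max(0, min(255, sample))

    return [
        [clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
        for y in range(16)
    ]
```

When the neighbours lie on a horizontal ramp, for example `top[x] = 12 + 2 * x` with a constant left column of 10, the sketch reproduces that ramp inside the block, which is the smoothly changing luminance case mentioned above.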