1. Field of the Invention
The present invention relates to an image encoding method and an image decoding method for use with moving images or still images, and to an image encoding apparatus and an image decoding apparatus.
2. Description of the Related Art
In recent years, a video encoding method that greatly improves coding efficiency over the conventional art has been recommended as ITU-T Rec. H.264 and ISO/IEC 14496-10 (referred to as “H.264”), developed jointly by ITU-T and ISO/IEC. Conventional encoding systems such as ISO/IEC MPEG-1, 2, 4 and ITU-T H.261, H.263 perform intra prediction in the orthogonally transformed frequency domain (DCT coefficients) to reduce the number of encoded bits of the transform coefficients. H.264 adopts directional prediction in the spatial domain (pixel domain) (non-patent literature 1) and thereby achieves higher prediction efficiency than the intra prediction of the conventional video encoding systems (ISO/IEC MPEG-1, 2, 4).
In the H.264 high profile and the like, three kinds of intra prediction systems are defined for the luminance signal, and one of them can be selected for each macroblock (16×16-pixel block). The prediction systems are referred to as 4×4-pixel prediction, 8×8-pixel prediction, and 16×16-pixel prediction, respectively.
Four encoding modes are defined for the 16×16-pixel prediction: vertical prediction, horizontal prediction, DC prediction, and plane prediction. The pixel values of the surrounding decoded macroblocks, taken before the deblocking filter is applied, are used as reference pixel values for the prediction process.
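The three simpler modes can be illustrated with a minimal sketch. It assumes the reference pixels are supplied as two lists (the row above the block and the column to its left) and omits plane prediction; the function name predict_block and its arguments are hypothetical and are not taken from the standard.

```python
def predict_block(top, left, mode):
    """Return an N x N predicted block (list of rows) from reference pixels.

    top  : N reconstructed pixels directly above the block (assumed available)
    left : N reconstructed pixels directly to the left of the block
    mode : "vertical", "horizontal" or "dc" (plane prediction is omitted)
    All names here are illustrative only.
    """
    n = len(top)
    if mode == "vertical":
        # Every row is a copy of the reference row above the block.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Every pixel in row y is a copy of the reference pixel left of row y.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of all 2N reference pixels fills the whole block.
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


top = [10, 20, 30, 40]
left = [1, 2, 3, 4]
vertical = predict_block(top, left, "vertical")
horizontal = predict_block(top, left, "horizontal")
dc = predict_block(top, left, "dc")
```

A 4-pixel block is used above only to keep the example short; the same function applies to 16 reference pixels per side.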
The 4×4-pixel prediction divides the luminance signal in the macroblock into sixteen 4×4-pixel blocks, and selects one of nine modes for each 4×4-pixel block. Except for DC prediction (mode 2), which predicts with the average of the available reference pixels, each of the nine modes has a prediction direction in steps of 22.5 degrees and generates the predicted values by extrapolating from the reference pixels in that direction. The 4×4-pixel prediction requires 16 items of mode information per macroblock. Because the unit of the prediction process is small, the 4×4-pixel prediction can predict with comparatively high efficiency even for an image having a complicated texture. However, this prediction simply copies an interpolated value along the prediction direction, so there is a problem that the prediction error increases as the distance from the reference pixel increases.
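The growth of the prediction error with distance can be illustrated with a small numerical sketch. It assumes vertical prediction and a synthetic block whose brightness drifts by a fixed amount per row away from the reference row; the data and the function name are hypothetical and serve only as an illustration.

```python
def vertical_error_by_row(top, block):
    """Mean absolute error of vertical prediction for each row of `block`.

    Vertical prediction copies `top` into every row, so the error of a row
    is simply its mean absolute difference from `top`.
    """
    return [
        sum(abs(p - t) for p, t in zip(row, top)) / len(top)
        for row in block
    ]


top = [100, 100, 100, 100]
# Synthetic 4x4 block: row y is brighter than the reference row by 2 * (y + 1).
block = [[100 + 2 * (y + 1)] * 4 for y in range(4)]
errors = vertical_error_by_row(top, block)
```

For this synthetic block the per-row error rises steadily from the row nearest the reference pixels to the row farthest from them.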
The 8×8-pixel prediction divides the luminance signal in the macroblock into four 8×8-pixel blocks, and selects one of the nine modes for each 8×8-pixel block. The 8×8-pixel prediction modes are designed in the same framework as the 4×4-pixel prediction, but additionally apply a three-tap filter to the already encoded reference pixels, which averages out distortion by smoothing the reference pixels used for prediction. However, as with the 4×4-pixel prediction, there is a problem that the prediction becomes less accurate as the distance from the reference pixel increases. Moreover, because the distance from the reference pixel is longer than in the 4×4-pixel prediction, high prediction precision cannot be expected for an image having a complicated texture.
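The smoothing of the reference pixels can be sketched as follows. The sketch assumes a (1, 2, 1)/4 kernel with rounding and handles the two end samples by replicating them, which is a simplification of the availability-dependent edge rules in the standard; the function name smooth_reference is hypothetical.

```python
def smooth_reference(ref):
    """Apply a three-tap (1, 2, 1)/4 low-pass filter to a list of reference pixels.

    End samples are replicated at the borders (a simplifying assumption).
    """
    n = len(ref)
    out = []
    for i in range(n):
        prev_px = ref[max(i - 1, 0)]      # replicate first sample at the left edge
        next_px = ref[min(i + 1, n - 1)]  # replicate last sample at the right edge
        out.append((prev_px + 2 * ref[i] + next_px + 2) >> 2)  # +2 rounds to nearest
    return out


# An isolated spike in the reference row is spread out and reduced in height.
smoothed = smooth_reference([10, 10, 50, 10, 10])
```

The isolated spike of 50 is reduced to 30 and partly spread to its neighbours, which is the flattening effect described above.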
The 8×8-pixel prediction is a prediction unit prescribed only in the H.264 high profile, and was introduced in particular to improve coding efficiency for high-resolution images. A 4×4-pixel transform/quantization block size is used in the 4×4-pixel prediction, and an 8×8-pixel transform/quantization block size is used in the 8×8-pixel prediction. In other words, the transform/quantization block size is determined by the prediction block shape. Because compatibility between the main profile and the high profile is taken into account for the prediction block shape, the standard does not allow the 8×8-pixel prediction and the 4×4-pixel prediction to coexist within a macroblock.
To reduce the number of encoded bits of mode information, the 4×4-pixel prediction and the 8×8-pixel prediction of H.264 predict the mode information of a block from the mode information of adjacent blocks, exploiting the high correlation between them. When the prediction of the mode information is correct, only a 1-bit flag is encoded; when it is not, a further 3 bits of data are encoded. However, if the 4×4-pixel prediction is selected when almost no error signal is generated in the macroblock, a minimum of 16 bits (and a maximum of 64 bits) must still be encoded for the mode information, which greatly degrades coding efficiency.
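The 16-bit and 64-bit bounds follow directly from the 1-bit flag and the additional 3 bits, as the following sketch shows. It assumes the predicted mode of each block is already given (how it is derived from adjacent blocks is not modeled here) and counts only the bits described above; the function name mode_info_bits is hypothetical.

```python
def mode_info_bits(actual_modes, predicted_modes):
    """Count mode-information bits for one macroblock.

    Each block costs 1 bit (the flag) when its actual mode equals the
    predicted mode, and 1 + 3 bits otherwise.
    """
    bits = 0
    for actual, predicted in zip(actual_modes, predicted_modes):
        bits += 1 if actual == predicted else 1 + 3
    return bits


# Sixteen 4x4 blocks per macroblock.
best_case = mode_info_bits([2] * 16, [2] * 16)   # every prediction is correct
worst_case = mode_info_bits([0] * 16, [1] * 16)  # every prediction is wrong
```

With sixteen 4×4-pixel blocks, the count is 16 bits when every mode is predicted correctly and 64 bits when none is.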
JP-A 2003-323736 (KOKAI) proposes a system that performs prediction by block matching within a frame and fills in the prediction values of the block to be predicted from the encoded reference image. This system presupposes that the image of some already encoded block in the frame is similar to that of the block to be predicted, and therefore has a problem that the prediction precision is poor when the correlation between blocks in the frame is low. In addition, the displacement indicating the position of the reference image used for the prediction must be encoded, which increases the number of encoded bits of mode information.
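Block matching within a frame can be sketched generically as follows. This is not the procedure of the cited publication: it assumes a sum of absolute differences (SAD) cost and, for simplicity, searches only candidate blocks lying entirely above the block to be predicted; all function names are hypothetical.

```python
def sad(frame, x0, y0, x1, y1, n):
    """Sum of absolute differences between two n x n blocks of `frame`."""
    return sum(
        abs(frame[y0 + j][x0 + i] - frame[y1 + j][x1 + i])
        for j in range(n)
        for i in range(n)
    )


def best_match(frame, bx, by, n):
    """Find the best-matching n x n block above row `by` for the block at (bx, by).

    Returns (cost, dx, dy), where (dx, dy) is the displacement that would
    have to be encoded, or None if no candidate exists.
    """
    best = None
    width = len(frame[0])
    for y in range(0, by - n + 1):         # candidates lying entirely above the target
        for x in range(0, width - n + 1):
            cost = sad(frame, bx, by, x, y, n)
            if best is None or cost < best[0]:
                best = (cost, x - bx, y - by)
    return best


frame = [
    [1, 2, 9, 9],
    [3, 4, 9, 9],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]
match = best_match(frame, 2, 2, 2)  # target is the 2x2 block at column 2, row 2
```

The returned displacement is the side information referred to above; when no earlier block resembles the target, the minimum cost stays high and the prediction is poor.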
As discussed above, when interpolated pixels according to a prediction mode are generated from an encoded reference image by the method prescribed in the H.264 high profile, and a predictive image signal is generated by copying the interpolated pixels in the direction prescribed by the prediction mode, there are problems that the prediction error increases as the distance between the prediction pixel and the reference pixel increases in the prediction direction, that prediction block shapes cannot coexist in a macroblock, and that the number of encoded bits of mode information cannot be decreased.