1. Field of the Invention
The present invention relates to a video encoding method and apparatus and a video decoding method and apparatus that use motion compensation predictive inter-frame encoding.
2. Description of the Related Art
Widely used video compression encoding standards include MPEG-1 (ISO/IEC 11172-2), MPEG-2 (ISO/IEC 13818-2), and MPEG-4 (ISO/IEC 14496-2). In these schemes, encoding is performed by combining intra-frame encoding, forward predictive inter-frame encoding, and bi-directional predictive inter-frame encoding. Frames encoded in these modes are called I, P, and B pictures, respectively. A P picture is encoded by using the immediately preceding P or I picture as a reference frame. A B picture is encoded by using the immediately preceding and immediately succeeding P or I pictures as reference frames. Forward predictive inter-frame encoding and bi-directional predictive inter-frame encoding are collectively called motion compensation predictive inter-frame encoding.
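The reference rule above can be illustrated with a short sketch. It assumes a hypothetical representation of a group of pictures as a string of picture types in display order (for example "IBBPBBP") and ignores details such as open/closed GOPs and field pictures. It also derives the coding order, since a B picture can only be encoded after the succeeding I or P picture it refers to.

```python
def assign_references(gop: str) -> list[tuple[str, list[int]]]:
    """Return (picture type, reference frame indices) for each frame in display order.

    Hypothetical helper: I pictures use no reference, P pictures use the nearest
    preceding I/P picture, B pictures use the nearest preceding and succeeding I/P pictures.
    """
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    result = []
    for i, t in enumerate(gop):
        prev = [a for a in anchors if a < i]
        nxt = [a for a in anchors if a > i]
        if t == "I":
            refs = []
        elif t == "P":
            refs = [prev[-1]] if prev else []
        elif t == "B":
            refs = ([prev[-1]] if prev else []) + ([nxt[0]] if nxt else [])
        else:
            raise ValueError(f"unknown picture type {t!r}")
        result.append((t, refs))
    return result


def coding_order(gop: str) -> list[int]:
    """Reorder display-order indices so each I/P anchor is coded before the B pictures that precede it."""
    order, pending_b = [], []
    for i, t in enumerate(gop):
        if t == "B":
            pending_b.append(i)  # must wait for the next anchor
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B pictures with no succeeding anchor
    return order


print(assign_references("IBBPBBP"))
print(coding_order("IBBPBBP"))
```

For "IBBPBBP", the two B pictures at positions 1 and 2 both refer to frames 0 and 3, and the coding order becomes 0, 3, 1, 2, 6, 4, 5.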
In video encoding based on an MPEG scheme, a prediction picture can be generated selectively for each macroblock from one or more video frames. For P pictures, a prediction picture is generally generated on a macroblock basis from a single reference frame. For B pictures, the prediction picture for each macroblock is generated either from one of the forward and backward reference frames, or from the average of the reference macroblocks extracted from both the forward and backward reference frames. The prediction mode selected for each macroblock is embedded in the encoded data.
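The per-macroblock mode decision for B pictures can be sketched as follows. This is a simplified illustration, not a normative procedure: it assumes motion search has already produced the forward and backward reference macroblocks, represents each macroblock as a flat list of integer samples, uses the sum of absolute differences (SAD) as the selection criterion, and assumes a rounded average of the form (f + b + 1) // 2 for non-negative samples.

```python
def sad(a: list[int], b: list[int]) -> int:
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_b_macroblock(cur: list[int], fwd: list[int], bwd: list[int]) -> tuple[str, list[int]]:
    """Pick forward, backward, or averaged bidirectional prediction by minimum SAD.

    cur, fwd, bwd are flattened macroblocks; fwd and bwd are assumed to be the
    motion-compensated reference blocks already found by motion search.
    """
    # Rounded average of the two reference macroblocks (assumed rounding convention).
    avg = [(f + b + 1) // 2 for f, b in zip(fwd, bwd)]
    candidates = {"forward": fwd, "backward": bwd, "bidirectional": avg}
    # On ties, min() keeps the first mode in insertion order.
    mode = min(candidates, key=lambda m: sad(cur, candidates[m]))
    return mode, candidates[mode]


mode, pred = predict_b_macroblock([100] * 4, [90] * 4, [110] * 4)
print(mode, pred)
```

Here neither reference alone matches the current block, but their average does, so the bidirectional mode is chosen. The returned mode name stands in for the per-macroblock mode information that is carried in the encoded data.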
Such motion compensation predictive inter-frame encoding yields good predictions when picture content simply translates between frames over an area equal to or larger than a macroblock. For temporal enlargement/reduction (zooming) and rotation of pictures, or for temporal variations in signal amplitude such as fade-in and fade-out, however, high prediction efficiency cannot always be obtained. In constant bit rate encoding, input pictures that cannot be predicted efficiently may cause severe degradation in picture quality. In variable bit rate encoding, a large amount of code is spent on pictures with poor prediction efficiency to suppress the degradation in picture quality, which increases the total code amount.
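The contrast between translation and fading can be shown with a toy one-dimensional example. It assumes a single row of samples, integer shifts with wrap-around at the borders (a simplification chosen so every shift compares the same number of samples), and SAD as the residual measure. A translated row is predicted perfectly by some shift, while a row faded to half amplitude leaves a residual that no shift can remove.

```python
def sad(a: list[int], b: list[int]) -> int:
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_translation_sad(cur: list[int], ref: list[int], max_shift: int) -> tuple[int, int]:
    """Full search over integer shifts d in [-max_shift, max_shift].

    Prediction sample i is ref[(i + d) % n] (cyclic shift, a toy simplification).
    Returns (minimum SAD, shift that achieves it).
    """
    n = len(ref)
    best = None
    for d in range(-max_shift, max_shift + 1):
        pred = [ref[(i + d) % n] for i in range(n)]
        cost = sad(cur, pred)
        if best is None or cost < best[0]:
            best = (cost, d)
    return best


ref = [10, 50, 90, 130, 170, 130, 90, 50]
translated = [ref[(i - 2) % len(ref)] for i in range(len(ref))]  # content moved by 2 samples
faded = [v // 2 for v in ref]                                   # same content at half amplitude

print(best_translation_sad(translated, ref, 3))  # residual is zero at shift -2
print(best_translation_sad(faded, ref, 3))       # residual stays large for every shift
```

Because the faded row sums to 360 less than the reference, every shift leaves a SAD of at least 360, and shift 0 attains that bound. Scaling the reference by one half before prediction would remove the residual entirely, which is why amplitude-aware prediction is attractive for fades.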
Temporal enlargement/reduction, rotation, and fade-in/fade-out of pictures can be approximated by affine transformation of video signals, so prediction based on affine transformation can greatly improve prediction efficiency. Estimating the affine transformation parameters at encoding time, however, requires an enormous amount of computation: the reference picture must be transformed with each of a plurality of candidate parameter sets, and the set yielding the minimum prediction residual error must be selected. This leads to a very large encoding computation load or a large increase in hardware cost. Moreover, the transformation parameters must be encoded in addition to the prediction residual error, which significantly increases the amount of encoded data. Decoding also requires an inverse affine transformation, resulting in a large decoding computation load or a very high hardware cost.
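The exhaustive parameter search described above can be sketched for the simplest case, an amplitude-only affine model pred = a * ref + b with a hypothetical grid of candidate gains a and offsets b. This is an illustration of the cost argument, not the method of the invention: every candidate pair requires transforming the whole reference and computing a residual, so the work grows with the product of the candidate counts. A full spatial affine model has six parameters, so its search grid grows far faster than this two-parameter example.

```python
def sad(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search_amplitude_affine(cur, ref, gains, offsets):
    """Exhaustively try every (gain, offset) pair and keep the one with minimum SAD.

    Returns ((min SAD, gain, offset), number of transformed references evaluated).
    """
    best = None
    evaluations = 0
    for a in gains:
        for b in offsets:
            pred = [a * r + b for r in ref]  # transform the whole reference
            cost = sad(cur, pred)
            evaluations += 1
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best, evaluations


ref = [10, 50, 90, 130, 170, 130, 90, 50]
cur = [0.5 * r + 4 for r in ref]           # toy fade: half amplitude plus offset 4
gains = [i / 8 for i in range(17)]         # hypothetical grid: 0.0 to 2.0 in steps of 1/8
offsets = list(range(-16, 17))             # hypothetical grid: -16 to 16

best, evaluations = search_amplitude_affine(cur, ref, gains, offsets)
print(best, evaluations)
```

Even this small grid requires 17 x 33 = 561 full transformations of the reference to find the gain of 0.5 and offset of 4, and the selected pair would still have to be transmitted as side information for the decoder to reproduce the prediction.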
As described above, conventional video encoding methods such as MPEG cannot obtain sufficient prediction efficiency for temporal changes in video other than translation. Video encoding and decoding methods using affine transformation can improve prediction efficiency itself, but at the cost of increased overhead in the encoded data and greatly increased encoding and decoding costs.
It is an object of the present invention to greatly improve prediction efficiency in video encoding and decoding, particularly for fading pictures, which have been a weak point of conventional video encoding methods such as MPEG, while suppressing increases in both the computation amount and the overhead of the encoded data.