The interframe predictive coding method for coding motion pictures (i.e., video data) is known, in which an already-encoded frame is used as a prediction signal so as to reduce temporal redundancy. In order to improve the efficiency of the time-based prediction, a motion-compensating interframe prediction method is used in which a motion-compensated picture signal is used as a prediction signal. The number and the kinds of components of the motion vector used for the motion compensation are determined depending on the assumed motion model used as a basis. For example, in a motion model in which only translational movement is considered, the motion vector consists of components corresponding to horizontal and vertical motions. In another motion model in which extension and contraction are also considered in addition to the translational movement, the motion vector consists of components corresponding to horizontal and vertical motions, and a component corresponding to the extending or contracting motion.
Generally, the motion compensation is executed for each small area obtained by dividing a picture into a plurality of areas such as small blocks, and each divided area has an individual motion vector. It is known that the motion vectors belonging to neighboring areas, including adjacent small areas, are highly correlated. Therefore, in practice, the motion vector of an area to be encoded is predicted based on the motion vector of an area which neighbors the area to be encoded, and a prediction error generated at the prediction is variable-length-encoded so as to reduce the redundancy of the motion vector.
In the moving-picture coding method ISO/IEC 11172-2 (MPEG-1), the picture to be encoded is divided into small blocks so as to motion-compensate each small block, and the motion vector of a small block to be encoded (hereinbelow, called the “target small block”) is predicted based on the motion vector of a small block which has already been encoded.
In the above MPEG-1, only translational motions can be compensated. It may be impossible to compensate more complicated motions with a simpler model, such as MPEG-1, which has few components of the motion vector. Accordingly, the efficiency of the interframe prediction can be improved by using a motion-compensating method which corresponds to a more complicated model having a greater number of components of the motion vector. However, when each small block is motion-compensated in such a method for a complicated motion model, the amount of codes generated when encoding the relevant motion vector is increased.
An encoding method for avoiding such an increase of the amount of generated codes is known, in which the motion-vector encoding is performed using a method, selected from a plurality of motion-compensating methods, which minimizes the prediction error with respect to the target block. The following is an example of such an encoding method in which two motion-compensating methods are provided, one method corresponding to a translational motion model, the other corresponding to a translational motion and extending/contracting motion model, and one of the two motion-compensating methods is chosen.
FIG. 9 shows a translational motion model (see part (a)) and a translational motion and extending/contracting motion model (see part (b)). In the translational motion model of part (a), the motion of a target object is represented using a translational motion component (x, y). In the translational motion and extending/contracting motion model of part (b), the motion of a target object is represented using a component (x, y, z) in which parameter z for indicating the amount of extension or contraction of the target object is added to the translational motion component (x, y). In the example shown in FIG. 9, parameter z has a value corresponding to the contraction (see part (b)).
Accordingly, motion vector $\vec{v_1}$ of the translational motion model is represented by:
$\vec{v_1} = (x, y)$
while motion vector $\vec{v_2}$ of the translational motion and extending/contracting motion model is represented by:
$\vec{v_2} = (x, y, z)$
In the above formulas, x, y, and z respectively indicate horizontal, vertical, and extending/contracting direction components. Here, the unit for motion compensation is a small block, the active motion-compensating method may be switched for each small block in accordance with the present prediction efficiency, and the motion vector is predicted based on the motion vector of an already-encoded small block.
If the motion-compensating method chosen for the target small block is the same as that adopted for the already-encoded small block, the prediction error of the motion vector is calculated by the following equations.
For the translational motion model: $d1_{x,y} = v1_{x,y}(i) - v1_{x,y}(i-1)$  (1)
For the translational motion and extending/contracting motion model: $d2_{x,y,z} = v2_{x,y,z}(i) - v2_{x,y,z}(i-1)$  (2)
Here, $v1_{x,y}(i)$ and $v2_{x,y,z}(i)$ mean components of the motion vector of the target small block, while $v1_{x,y}(i-1)$ and $v2_{x,y,z}(i-1)$ mean components of the motion vector of a small block of the previous frame.
As explained above, prediction errors $d1_{x,y}$ and $d2_{x,y,z}$ are calculated and encoded so as to transmit the encoded data to the decoding side. Even if the size of each small block is not the same in the motion-compensating method, the motion vector predictive encoding is similarly performed if the motion model is the same.
If the motion-compensating method chosen for the target small block differs from that adopted for the already-encoded small block, or if intraframe coding is performed, then the predicted value for each component is set to 0 and the original values of each component of the target small block are transmitted to the decoding side.
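The encoding rule described above may be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `mv_prediction_error`, the tuple representation of a motion vector, and the use of the number of components to identify the motion model are hypothetical and are not taken from the described method.

```python
from typing import Optional, Tuple

# (x, y) for the translational model, (x, y, z) for the translational and
# extending/contracting model. This representation is an assumption.
MotionVector = Tuple[float, ...]


def mv_prediction_error(target: MotionVector,
                        previous: Optional[MotionVector]) -> MotionVector:
    """Return the values transmitted for the target small block.

    `previous` is the motion vector of the already-encoded small block,
    or None if that block was intraframe-coded.
    """
    # Same motion model (same number of components): equations (1) and (2).
    if previous is not None and len(previous) == len(target):
        return tuple(t - p for t, p in zip(target, previous))
    # Different model or intraframe coding: the predicted value is 0,
    # so the original components are transmitted unchanged.
    return tuple(target)
```

For example, under these assumptions a translational target vector (5, 3) predicted from (4, 1) yields the error (1, 2), whereas the same target following a three-component vector is transmitted as (5, 3).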
By using such an encoding method, the redundancy of the motion vector with respect to the motion-compensating interframe predictive encoding can be reduced and the amount of generated codes of the motion vector can be reduced.
On the other hand, the motion vector which has been encoded using the above-described encoding method is decoded in a manner such that the prediction error is extracted from the encoded data sequence, and the motion vector of the small block to be decoded (i.e., the target small block) is decoded by adding the prediction error to the motion vector which has already been decoded. See the following equations.
For the translational motion model: $v1_{x,y}(i) = v1_{x,y}(i-1) + d1_{x,y}$  (3)
For the translational motion and extending/contracting motion model: $v2_{x,y,z}(i) = v2_{x,y,z}(i-1) + d2_{x,y,z}$  (4)
Here, $v1_{x,y}(i)$ and $v2_{x,y,z}(i)$ mean components of the motion vector of the target small block, while $v1_{x,y}(i-1)$ and $v2_{x,y,z}(i-1)$ mean components of the already-decoded motion vector.
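The decoding of equations (3) and (4) may be sketched in Python as follows. The function name `decode_motion_vector` and the tuple representation are hypothetical, and the handling of a differing motion model (a predicted value of 0) is an assumption that mirrors the encoding side.

```python
from typing import Optional, Tuple

# (x, y) or (x, y, z); this representation is an assumption.
MotionVector = Tuple[float, ...]


def decode_motion_vector(error: MotionVector,
                         previous: Optional[MotionVector]) -> MotionVector:
    """Reconstruct the motion vector of the target small block.

    `previous` is the already-decoded motion vector, or None if no
    motion vector is available (e.g., an intraframe-coded block).
    """
    # Same motion model: add the prediction error, equations (3) and (4).
    if previous is not None and len(previous) == len(error):
        return tuple(p + d for p, d in zip(previous, error))
    # Otherwise the predicted value is 0 and the transmitted values are
    # the motion vector components themselves.
    return tuple(error)
```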
In ISO/IEC 14496-2 (MPEG-4), which was under examination for international standardization as of January 1999, a similar motion-compensating method is adopted. MPEG-4 adopts a global motion-compensating method for predicting the general change or movement of a picture caused by panning, tilting, and zooming operations of the camera (refer to “MPEG-4 Video Verification Model Version 7.0”, ISO/IEC JTC1/SC29/WG11N1682, MPEG Video Group, April, 1997). Hereinafter, the structure and the operational flow of the encoder using the global motion compensation will be explained with reference to FIG. 11.
First, a picture to be encoded (i.e., target picture) 31 is input into global motion detector 34 so as to determine global motion parameters 35 with respect to the entire picture. In MPEG-4, the projective transformation and the affine transformation may be used in the motion model.
With a target point (x, y) and a corresponding point (x′, y′) relating to the transformation, the projective transformation can be represented using the following equations (5) and (6).
$x' = (ax + by + t_x)/(px + qy + s)$  (5)
$y' = (cx + dy + t_y)/(px + qy + s)$  (6)
Generally, the case of “s=1” belongs to the projective transformation. The projective transformation is a general representation of the two-dimensional transformation, and the affine transformation is represented by the following equations (7) and (8), which can be obtained under conditions of “p=q=0” and “s=1”.
$x' = ax + by + t_x$  (7)
$y' = cx + dy + t_y$  (8)
In the above equations, “t_x” and “t_y” respectively represent the amounts of translational motions in the horizontal and vertical directions. Parameter “a” represents extension/contraction or inversion in the horizontal direction, while parameter “d” represents extension/contraction or inversion in the vertical direction. Parameter “b” represents shear in the horizontal direction, while parameter “c” represents shear in the vertical direction. In addition, the conditions of “a=cos θ, b=sin θ, c=−sin θ, and d=cos θ” correspond to rotation of angle θ. The conditions of “a=d=1” and “b=c=0” correspond to the conventional translational motion model.
As explained above, the affine transformation used as the motion model enables the representation of various motions such as translational movement, extension/contraction, inversion, shear, and rotation, and any combination of these motions. A projective transformation having eight or nine parameters can represent more complicated motions or deformations.
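The projective and affine transformations described above may be illustrated with the following Python sketch. The function names `projective_transform` and `affine_transform` and the argument order are hypothetical; the arithmetic follows equations (5) to (8), with the affine case obtained by setting p=q=0 and s=1.

```python
import math
from typing import Tuple


def projective_transform(x: float, y: float,
                         a: float, b: float, c: float, d: float,
                         tx: float, ty: float,
                         p: float = 0.0, q: float = 0.0,
                         s: float = 1.0) -> Tuple[float, float]:
    """Map a target point (x, y) to (x', y') by equations (5) and (6)."""
    w = p * x + q * y + s  # common denominator of equations (5) and (6)
    if w == 0:
        raise ZeroDivisionError("the point maps to infinity (px + qy + s = 0)")
    return ((a * x + b * y + tx) / w, (c * x + d * y + ty) / w)


def affine_transform(x: float, y: float,
                     a: float, b: float, c: float, d: float,
                     tx: float, ty: float) -> Tuple[float, float]:
    """Equations (7) and (8): the projective case with p = q = 0, s = 1."""
    return projective_transform(x, y, a, b, c, d, tx, ty)


# Pure translation: a = d = 1, b = c = 0.
translated = affine_transform(2.0, 3.0, 1.0, 0.0, 0.0, 1.0, 5.0, -1.0)

# Rotation by theta: a = cos, b = sin, c = -sin, d = cos.
theta = math.pi / 2
rotated = affine_transform(1.0, 0.0,
                           math.cos(theta), math.sin(theta),
                           -math.sin(theta), math.cos(theta),
                           0.0, 0.0)
```

Here `translated` is the point (2, 3) shifted by (5, −1), and `rotated` is the point (1, 0) rotated according to the rotation conditions given above.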
The global motion parameters 35, determined by the global motion detector 34, and reference picture 33 stored in the frame memory 32 are input into global motion compensator 36. The global motion compensator 36 generates a global motion-compensating predicted picture 37 by making the motion vector of each pixel, determined based on the global motion parameters 35, act on the reference picture 33.
The reference picture 33 stored in the frame memory 32, and the input picture 31 are input into local motion detector 38. The local motion detector 38 detects, for each macro block (16 pixels×16 lines), motion vector 39 between input picture 31 and reference picture 33. The local motion compensator 40 generates a local motion-compensating predicted picture 41 based on the motion vector 39 of each macro block and the reference picture 33. This method equals the conventional motion-compensating method used in the conventional MPEG or the like.
Next, for each macro block, one of the global motion-compensating predicted picture 37 and the local motion-compensating predicted picture 41, whichever has the smaller error with respect to the input picture 31, is chosen in the encoding mode selector 42. If the global motion compensation is chosen, the local motion compensation is not performed in the relevant macro block; thus, motion vector 39 is not encoded. The predicted picture 43 chosen via the encoding mode selector 42 is input into subtracter 44, and picture 45 corresponding to the difference between the input picture 31 and the predicted picture 43 is converted into DCT (discrete cosine transform) coefficient 47 by DCT section 46. The DCT coefficient 47 is then converted into quantized index 49 in quantizer 48. The quantized index 49 is encoded by quantized-index encoder 57, encoded-mode choice information 56 is encoded by encoded-mode encoder 58, motion vector 39 is encoded by motion-vector encoder 59, and the global motion parameters 35 are encoded by global-motion-parameter encoder 60. These encoded data are multiplexed and output as an encoder output.
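The per-macro-block choice made by the encoding mode selector 42 may be sketched in Python as follows. The error measure is not specified in the above description; the sum of absolute differences used here, the function names `sad` and `select_mode`, and the tie-breaking toward the global mode are all assumptions made for illustration.

```python
from typing import Sequence

# A block is given as rows of pixel values; this representation is an assumption.
Block = Sequence[Sequence[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def select_mode(input_block: Block, global_pred: Block, local_pred: Block) -> str:
    """Return "global" or "local", whichever prediction has the smaller error."""
    # Tie-breaking toward "global" is an assumption: in that mode no
    # motion vector has to be encoded for the macro block.
    if sad(input_block, global_pred) <= sad(input_block, local_pred):
        return "global"
    return "local"
```

A macro block for which `select_mode` returns "global" would, in the encoder of FIG. 11, carry no encoded motion vector 39.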
In order for the encoder to also acquire the same decoded picture as acquired in the decoder, the quantized index 49 is inverse-converted into a quantization representative value 51 by inverse quantizer 50, and is further inverse-converted into difference picture 53 by inverse-DCT section 52. The difference picture 53 and predicted picture 43 are added to each other by adder 54 so that local decoded picture 55 is generated. This local decoded picture 55 is stored in the frame memory 32 and is used as a reference picture at the encoding of the next frame.
Next, relevant decoding operations of the MPEG-4 decoder will be explained with reference to FIG. 12. The multiplexed and encoded bit stream is divided into each element, and the elements are respectively decoded. The quantized-index decoder 61 decodes quantized index 49, encoded-mode decoder 62 decodes encoded-mode choice information 56, motion-vector decoder 63 decodes motion vector 39, and global-motion-parameter decoder 64 decodes global motion parameters 35.
The reference picture 33 stored in the frame memory 68 and global motion parameters 35 are input into global motion compensator 69 so that global motion-compensating predicted picture 37 is generated. In addition, the reference picture 33 and motion vector 39 are input into local motion compensator 70 so that local motion-compensating predicted picture 41 is generated. The encoded-mode choice information 56 activates switch 71 so that one of the global motion-compensating predicted picture 37 and the local motion-compensating predicted picture 41 is output as predicted picture 43.
The quantized index 49 is inverse-converted into quantization representative value 51 by inverse-quantizer 65, and is further inverse-converted into difference picture 53 by inverse-DCT section 66. The difference picture 53 and predicted picture 43 are added to each other by adder 67 so that local decoded picture 55 is generated. This local decoded picture 55 is stored in the frame memory 68 and is used as a reference picture when decoding the next frame.
In the above-explained global motion-compensating predictive method adopted in MPEG-4, one of the predicted pictures of the global motion compensation and the local motion compensation, whichever has the smaller error, is chosen for each macro block so that the prediction efficiency of the entire frame is improved. In addition, the motion vector is not encoded in the macro block to which the global motion compensation is adopted; thus, the generated codes can be reduced by the amount necessary for conventional encoding of the motion vector.
On the other hand, in the conventional method in which the active motion-compensating method is switched between a plurality of motion-compensating methods corresponding to different motion models, no prediction relating to a shift between motion vectors belonging to different motion models is performed. For example, in the encoding method in which the motion-compensating method corresponding to a translational motion model and the motion-compensating method corresponding to a translational motion and extending/contracting motion model are switched, a shift from the motion vector of the translational motion and extending/contracting motion model to the motion vector of the translational motion model cannot be simply predicted using a difference, because the number of used parameters with respect to the motion vector is different between the two methods.
However, redundancy of the motion vector may also occur between different motion models. Therefore, correlation between the motion vector of the translational motion model and the motion vector of the translational motion and extending/contracting motion model will be examined with reference to motion vectors shown in FIG. 10. In FIG. 10, it is assumed that in the motion compensation of target small blocks Boa and Bob, the target small block Boa is motion-compensated using the method corresponding to the translational motion model and referring to small block Bra included in the reference frame, while the target small block Bob is motion-compensated using the method corresponding to the translational motion and extending/contracting motion model and referring to small block Brb included in the reference frame.
In this case, motion vector $\vec{v_a} = (x_a, y_a)$ in FIG. 10 indicates the translational motion model, while motion vector $\vec{v_b} = (x_b, y_b, z_b)$ in FIG. 10 indicates the translational motion and extending/contracting motion model. Here, in the motion compensation of the small block Bob, small block Brb in the reference frame to be referred to is extended. Therefore, the translational motion components of the motion vector $\vec{v_a}$ and $\vec{v_b}$ in FIG. 10 have almost the same values and redundancy exists.
However, in the conventional method, such redundancy between motion vectors of different motion models cannot be reduced because no motion vector of a motion model which differs from the present motion model is predicted based on the motion vector of the present model.
In the above MPEG-4, predictive encoding is adopted so as to efficiently encode the motion vector. For example, the operations of motion-vector encoder 59 in FIG. 11 are as follows. As shown in FIG. 13, three motion vectors such as motion vector MV1 of the left block, motion vector MV2 of the block immediately above, and motion vector MV3 of the block diagonally above to the right are referred to so as to obtain a median thereof as a predicted value of the motion vector MV of the present block. The predicted value PMV of the vector MV of the present block is defined using the following equation (9).PMV=median (MV1, MV2, MV3)  (9)
If the reference block corresponds to the intraframe-coding mode, no motion vector exists. Therefore, the median is calculated with vector value 0 at the relevant position. If the reference block has been predicted using the global motion compensation, no motion vector exists. Therefore, the median is calculated with vector value 0 at the relevant position also in this case. For example, if the left block was predicted using the local motion compensation, the block immediately above was predicted using the global motion compensation, and the block diagonally above to the right was encoded using the intraframe coding method, then MV2=MV3=0. In addition, if the three reference blocks were all predicted using the global motion compensation, then MV1=MV2=MV3=0. In this case, the median is also 0 and thus the predicted value is 0. Therefore, this case is equal to the case that the motion vector of the target block is not subjected to predictive encoding, and the encoding efficiency is degraded.
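The median prediction of equation (9), including the substitution of vector value 0 for reference blocks without a motion vector, may be sketched in Python as follows. The function name `predict_mv` and the use of None to mark an intraframe-coded or globally compensated block are hypothetical, and taking the median separately for each component is an assumption of this sketch.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, float]


def predict_mv(neighbours: Sequence[Optional[Vector]]) -> Vector:
    """Predicted value PMV from MV1 (left), MV2 (above), MV3 (above right).

    A neighbour given as None has no motion vector (intraframe coding or
    global motion compensation) and is replaced by vector value 0.
    """
    vectors = [mv if mv is not None else (0.0, 0.0) for mv in neighbours]
    # Component-wise median (an assumption of this sketch).
    return (median(v[0] for v in vectors), median(v[1] for v in vectors))


# Left block locally compensated, the other two without a motion vector:
# MV2 = MV3 = 0, so the median, and hence the predicted value, is 0.
pmv_mixed = predict_mv([(4.0, 2.0), None, None])

# All three reference blocks globally compensated: the predicted value is 0.
pmv_all_global = predict_mv([None, None, None])
```

In both examples the predicted value is 0, which illustrates the degradation described above: the motion vector of the target block is in effect not predictively encoded.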
In the MPEG-4, the following seven kinds of ranges (see List 1) are defined with respect to the size of the local motion vector, and the used range is communicated to the decoder by using a codeword “fcode” included in the bit stream.
List 1
fcode   Range of motion vector
1       −16 to +15.5 pixels
2       −32 to +31.5 pixels
3       −64 to +63.5 pixels
4       −128 to +127.5 pixels
5       −256 to +255.5 pixels
6       −512 to +511.5 pixels
7       −1024 to +1023.5 pixels
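The ranges in List 1 double with each increment of fcode, which may be expressed compactly as in the following Python sketch. The function name `fcode_range` is hypothetical, and the closed-form expression is merely inferred from the seven listed ranges.

```python
from typing import Tuple


def fcode_range(fcode: int) -> Tuple[float, float]:
    """Return (minimum, maximum) of the local motion vector in pixels."""
    if not 1 <= fcode <= 7:
        raise ValueError("List 1 defines fcode values 1 to 7 only")
    limit = 16 * 2 ** (fcode - 1)  # 16, 32, 64, ..., 1024
    return (-float(limit), limit - 0.5)
```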
The global motion parameters used in MPEG-4 may have a wide range of −2048 to +2047.5; thus, the motion vector determined based on the global motion vector may have a value from −2048 to +2047.5. However, the range of the local motion vector is smaller than the above range, and the prediction may have a large error. For example, if fcode=3, the motion vector of the target block is (Vx, Vy)=(+48, +36.5), and the predicted vector determined based on the global motion vector is (PVx, PVy)=(+102, +75), then the prediction error is (MVDx, MVDy)=(−54, −38.5). The absolute values of this error are thus larger than those of the motion vector (Vx, Vy). The smaller the absolute values of the prediction error (MVDx, MVDy), the shorter the length of the codeword assigned to the prediction error. Therefore, there is a disadvantage in that the amount of code is increased due to the prediction of the motion vector.
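The numerical example above may be checked with the following short Python sketch. The helper name `prediction_error` and the variable names are hypothetical; the values are those given in the example.

```python
from typing import Tuple


def prediction_error(v: Tuple[float, float],
                     pv: Tuple[float, float]) -> Tuple[float, float]:
    """Prediction error MVD = V - PV, taken for each component."""
    return (v[0] - pv[0], v[1] - pv[1])


v = (48.0, 36.5)    # motion vector of the target block (Vx, Vy)
pv = (102.0, 75.0)  # predicted vector based on the global motion vector
mvd = prediction_error(v, pv)

# True when every component of the error is larger in magnitude than the
# corresponding component of the motion vector itself.
prediction_is_worse = all(abs(d) > abs(c) for d, c in zip(mvd, v))
```

With these values `mvd` equals (−54, −38.5) and `prediction_is_worse` is true, so encoding the prediction error would require longer codewords than encoding the motion vector directly.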
Therefore, the objective of the present invention is to provide a motion vector predictive encoding method, a motion vector decoding method, a predictive encoding apparatus, a decoding apparatus, and computer-readable storage media storing motion vector predictive encoding and decoding programs, which reduce the amount of generated code with respect to the motion vector, and improve the efficiency of the motion-vector prediction.