The present invention relates to methods for encoding and decoding signals of video data (i.e., moving pictures).
In existing video data coding standards such as ITU-T H.261, H.263, ISO/IEC 11172-2 (MPEG-1), and ISO/IEC 13818-2 (MPEG-2), a motion-compensated interframe prediction method is adopted for reducing temporal redundancy with respect to video data. Also in an example model based on the ISO/IEC 14496-2 (MPEG-4) standard which is currently being studied, a similar motion compensating method is adopted.
Generally in motion-compensated predictive coding methods, (i) a frame to be encoded (i.e., the current frame) is divided into rectangular blocks, called “macroblocks”, having 16 pixels×16 lines, (ii) a relative amount of the motion (i.e., a motion vector having horizontal component tx and vertical component ty of displacement) with respect to a reference frame is detected for each macroblock, and (iii) an interframe difference between a predicted frame and the current frame is encoded, where the predicted frame is obtained in a manner such that the block of the reference frame corresponding to the relevant macroblock of the current frame is shifted by the motion vector.
More specifically, predicted image data (in the reference frame) which most matches the image data at point (x, y) of the current frame is represented by using coordinates (x′, y′) and the above motion vector (tx, ty) as follows.
x′=x+tx
y′=y+ty
That is, the pixel value at the same point (x, y) of the reference frame is not directly used, but the pixel value at a point obtained by shifting the point (x, y) by the motion vector (tx, ty) is determined as the predicted value, thereby remarkably improving the efficiency of the interframe prediction.
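The block prediction described above can be sketched as follows. This is a minimal illustration rather than the procedure of any particular standard: the function name `predict_block`, the list-of-rows frame layout, the restriction to integer motion vectors, and the clamping of coordinates at the frame border are all assumptions made for the example.

```python
def predict_block(reference, x0, y0, tx, ty, size=16):
    """Predict the size x size macroblock whose top-left corner is (x0, y0).

    reference is a list of rows of pixel values (hypothetical layout).
    Each predicted pixel is read from the reference frame at
    x' = x + tx, y' = y + ty.
    """
    height, width = len(reference), len(reference[0])
    block = []
    for y in range(y0, y0 + size):
        row = []
        for x in range(x0, x0 + size):
            # Clamping to the frame border is an assumption of this sketch.
            xp = min(max(x + tx, 0), width - 1)
            yp = min(max(y + ty, 0), height - 1)
            row.append(reference[yp][xp])
        block.append(row)
    return block
```

For a small 8×8 test frame whose pixel at (x, y) holds 10y+x, a 2×2 block at (2, 2) with motion vector (1, −1) is read from rows 1 to 2 and columns 3 to 4 of the reference frame.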
On the other hand, a global motion compensation method has been proposed, in which motions of the whole picture caused by a camera motion such as panning, tilting, or zooming are predicted (refer to H. Jozawa, et al., “Core Experiment on Global Motion Compensation (P1) Version 5.0”, Description of Core Experiments on Efficient Coding in MPEG-4 Video, pp. 1-17, December, 1996). Below, the general structure and operation flow of the encoder and decoder used for the global motion compensation will be explained with reference to FIGS. 3 and 4.
First, frame (data) 1 to be encoded (i.e., input frame 1) and reference frame (data) 3 are input into global motion estimator 4, where global motion parameters 5 relating to the whole frame are determined. Projective transformations, bilinear transformations, or affine transformations can be used as a motion model in this system. The method disclosed by Jozawa et al. can be applied to any motion model so that the kind of motion model is unlimited; however, the general functions of the representative motion models as described above will be explained below.
With any point (x, y) of the current frame and corresponding predicted point (x′, y′) of the reference frame, the projective transformation is represented by the following formula.
x′=(ax+by+tx)/(px+qy+s)
y′=(cx+dy+ty)/(px+qy+s)  (1)
where a, b, c, d, p, q, and s are constants. The projective transformation is a basic form of the two-dimensional transformation, and generally, the case s=1 in formula (1) is called the projective transformation. If p=q=0 and s=1, then the formula represents the affine transformation.
The following is the formula representing the bilinear transformation.
x′=gxy+ax+by+tx
y′=hxy+cx+dy+ty  (2)
where a, b, c, d, g, and h are constants. If g=h=0 in this formula, then the affine transformation can also be obtained as the following formula (3).
x′=ax+by+tx
y′=cx+dy+ty  (3)
In the above formulas, tx and ty respectively represent the amounts of parallel shifting motions in the horizontal and vertical directions. Parameter “a” represents an extension/contraction or inversion effect in the horizontal direction, while parameter “d” represents an extension/contraction or inversion effect in the vertical direction. Parameter “b” represents a shearing effect in the horizontal direction, while parameter “c” represents a shearing effect in the vertical direction. In addition, the condition that a=cos θ, b=sin θ, c=−sin θ, and d=cos θ represents rotation by angle θ. The condition that a=d=1 and b=c=0 represents a model equal to a conventional parallel motion model.
As explained above, the motion model employing the affine transformation can represent various motions such as parallel shift, extension/contraction, inversion, shear and rotation and any composite motions consisting of a few kinds of the above motions. Projective or bilinear transformations having many more parameters can represent more complicated motions.
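The relationship among the three motion models can be checked numerically with a short sketch. The function names below are chosen for illustration only; the bodies transcribe formulas (1), (2), and (3) directly, and the default arguments encode the special cases stated in the text (p=q=0 and s=1, or g=h=0, giving the affine transformation).

```python
import math


def projective(x, y, a, b, c, d, tx, ty, p=0.0, q=0.0, s=1.0):
    """Formula (1): projective transformation of point (x, y)."""
    w = p * x + q * y + s
    return ((a * x + b * y + tx) / w, (c * x + d * y + ty) / w)


def bilinear(x, y, a, b, c, d, tx, ty, g=0.0, h=0.0):
    """Formula (2): bilinear transformation of point (x, y)."""
    return (g * x * y + a * x + b * y + tx,
            h * x * y + c * x + d * y + ty)


def affine(x, y, a, b, c, d, tx, ty):
    """Formula (3): affine transformation of point (x, y)."""
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation_parameters(theta):
    """Parameters (a, b, c, d) for rotation by angle theta, as stated in the text."""
    return (math.cos(theta), math.sin(theta), -math.sin(theta), math.cos(theta))
```

With a=d=1 and b=c=0, `affine` reduces to the parallel shift x′=x+tx, y′=y+ty, and with the default arguments both `projective` and `bilinear` return the same point as `affine`.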
The global motion parameters 5 determined in the global motion estimator 4 are input into global motion compensated predictor 6 together with reference frame 3 stored in frame memory 2. The global motion compensated predictor 6 makes the motion vector (for each pixel) calculated using the global motion parameters 5 act on the reference frame 3, so as to generate global motion-compensating predicted frame (data) 7.
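As a rough sketch of what the global motion compensated predictor does, the following applies an affine parameter set to every pixel position and samples the reference frame there. The function name `global_predict`, the nearest-pixel rounding, and the border clamping are assumptions of this example; the source does not specify the interpolation used.

```python
import math


def global_predict(reference, a, b, c, d, tx, ty):
    """Build a predicted frame by warping the reference with affine parameters.

    For each pixel (x, y), the reference is sampled at
    x' = a*x + b*y + tx, y' = c*x + d*y + ty.
    """
    height, width = len(reference), len(reference[0])
    predicted = []
    for y in range(height):
        row = []
        for x in range(width):
            xp = a * x + b * y + tx
            yp = c * x + d * y + ty
            # Nearest-pixel sampling and clamping are assumptions of this sketch.
            xi = min(max(math.floor(xp + 0.5), 0), width - 1)
            yi = min(max(math.floor(yp + 0.5), 0), height - 1)
            row.append(reference[yi][xi])
        predicted.append(row)
    return predicted
```

The per-pixel motion vector mentioned in the text corresponds to (x′−x, y′−y); with a=d=1 and b=c=0 it is the same for every pixel, which is the conventional parallel shift.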
On the other hand, the reference frame 3 stored in the frame memory 2 is input into local motion estimator 8 together with input frame 1. In the local motion estimator 8, motion vector 9 between the input frame 1 and the reference frame 3 is detected for each macroblock of 16 pixels×16 lines. In the local motion compensated predictor 10, local motion-compensating predicted frame (data) 11 is generated using the motion vector 9 of each macroblock and the reference frame 3. The above operation corresponds to the conventional motion compensation method used in MPEG or the like.
Next, the prediction mode determining section 12 chooses one of the global motion-compensating predicted frame 7 and the local motion-compensating predicted frame 11 for each macroblock, the chosen one having a smaller error with respect to the input frame 1. The predicted frame 13 chosen by the prediction mode determining section 12 is input into subtracter 14, and a difference frame 15 between the input frame 1 and the predicted frame 13 is converted into DCT coefficients 17 in DCT (discrete cosine transform) section 16. Each DCT coefficient 17 obtained by the DCT section 16 is further converted into quantized index 19 in quantizer 18. The quantized index 19, global motion parameters 5, motion vector 9, and prediction mode information 26 showing the determined prediction mode output from the prediction mode determining section 12 are respectively encoded in encoding sections 101 to 104, and then multiplexed in the multiplexer 27′ so as to generate encoder output (i.e., encoded bit sequence) 28′.
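The per-macroblock choice made by the prediction mode determining section can be illustrated as below. The text only says that the prediction with the smaller error is chosen; the use of the sum of absolute differences (SAD) as the error measure, the tie-breaking in favor of the global prediction, and the names `sad` and `choose_mode` are assumptions of this sketch.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def choose_mode(current, global_pred, local_pred):
    """Return ("global" or "local", chosen predicted block) for one macroblock."""
    # Ties go to the global prediction; this is an arbitrary choice here.
    if sad(current, global_pred) <= sad(current, local_pred):
        return "global", global_pred
    return "local", local_pred
```

The returned mode label plays the role of the prediction mode information 26, and the returned block is the part of predicted frame 13 covering that macroblock.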
In order to make the reference frames in both the encoder and decoder agree with each other, the quantized index 19 is restored to quantization representative value 21 by inverse quantizer 20, and then inversely converted into difference frame 23 by inverse DCT section 22. The difference frame 23 and the predicted frame 13 are added in adder 24, so that locally decoded frame 25 is obtained. This locally decoded frame 25 is stored in frame memory 2, and is used as a reference frame when the next frame is encoded.
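The reason for local decoding can be seen in a simplified sketch. The example below leaves out the DCT and inverse DCT entirely and uses a uniform quantizer with a hypothetical step size, so it is only an illustration of the principle: the encoder rebuilds the frame from the quantized indices, exactly as the decoder will, rather than keeping the original input as its reference.

```python
import math


def encode_and_reconstruct(current, predicted, step):
    """Quantize the prediction difference and rebuild the locally decoded pixels.

    current and predicted are flat lists of pixel values. The DCT stage is
    omitted in this sketch, so the difference is quantized directly.
    """
    # Quantized indices (what would be transmitted).
    indices = [math.floor((c - p) / step + 0.5)
               for c, p in zip(current, predicted)]
    # Inverse quantization followed by addition of the prediction.
    decoded = [p + i * step for p, i in zip(predicted, indices)]
    return indices, decoded
```

Because `decoded` depends only on the prediction and the transmitted indices, a decoder holding the same prediction obtains identical values, so the reference frames on both sides stay in agreement even though `decoded` differs from `current`.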
In the decoder (see FIG. 4), the encoded bit sequence 28′ which was received is separated using demultiplexer 29′ into four encoded components, that is, quantized index 19, prediction mode information 26, motion vector 9, and global motion parameters 5. These four components are respectively decoded by decoding sections 201 to 204. The reference frame 3 (equal to the reference frame 3 as shown in FIG. 3) stored in frame memory 33 is input into global motion compensated predictor 34 together with the decoded global motion parameters 5. The global motion compensated predictor 34 makes the global motion parameters 5 act on the reference frame 3 so as to generate global motion-compensating predicted frame 7 which is the same as frame 7 in FIG. 3. The reference frame 3 is also input into local motion compensated predictor 35. In the local motion compensated predictor 35, the motion vector 9 acts on the reference frame 3 so as to generate local motion-compensating predicted frame 11 which is also the same as frame 11 in FIG. 3.
In the following step, the global and local motion-compensating predicted frames 7 and 11 are input into prediction mode determining section 36. In the prediction mode determining section 36, one of the global and local motion-compensating predicted frames 7 and 11 is chosen based on the decoded prediction mode information 26. The chosen frame is determined as predicted frame 13.
The decoded quantized index 19 is restored to quantization representative value 21 in inverse quantizer 30, and then inversely converted into difference frame 23 in the inverse DCT section 31. The difference frame 23 and the predicted frame 13 are added in adder 32 so that locally decoded frame 25 is obtained. This locally decoded frame 25 is stored in frame memory 33 and is used as a reference frame when the next frame is decoded.
In the global motion-compensated prediction method in the above-explained conventional technique, whichever of the predicted images obtained by the global and local motion compensation methods has the smaller prediction error is chosen for each macroblock so as to improve the prediction efficiency over the whole frame. To implement such a system, it is necessary to insert a code word in the encoded data sequence, which represents which prediction method (global motion compensation or local motion compensation) was used. This is because the decoder must be informed of which motion compensating method was used for the prediction of each macroblock. Therefore, in a proposal (by the present inventors) for MPEG-4, which is currently being examined for standardization, the encoded data structure (i.e., syntax) of the macroblock is as shown in the following List 1. In List 1, the encoded data sequence is described using pseudo-C codes, and operations of the encoder and decoder are also described. FIG. 5 is a model diagram showing the data structure (i.e., bit stream structure) represented by List 1, in which data are constructed using code words D1 to D8, the motion vector, and DCT coefficient information (corresponding to the quantized index) in turn.
In MPEG-4, a conventional frame is called VOP (video object plane). The VOP has four types as shown in the following List 2.
The I-, P-, and B-VOPs are the same as I-, P-, and B-pictures defined in MPEG-1 or MPEG-2. The SPRITE-VOP is a newly introduced concept in MPEG-4, in which prediction is performed based on the background picture over the whole part of a video clip in a video data sequence (such a background image being called the “static sprite”) or on the “dynamic sprite” obtained by the global motion compensation. In the syntax shown in List 1, descriptions relating to the I-VOP and B-VOP are omitted for simplifying the explanations. Additionally, in MPEG-4, a video object of any form can be encoded and thus, shape information is also described in the relevant syntax; however, such shape information is also omitted for simplifying the explanations.
In a global motion-compensated predictive encoder suitable for the syntax of List 1, if the VOP type is SPRITE, then a 1-bit code word “MCSEL” (see reference symbol D1 in FIG. 5) is output as the prediction mode information 26. MCSEL is a flag indicating which of the global motion compensation and the local motion compensation was used for the prediction of the current macroblock. If the global motion compensation was used, then MCSEL=1, while if the local motion compensation was used, then MCSEL=0.
If the VOP type is P or SPRITE, then a 1-bit code word “COD” (see reference symbol D2) is output. COD is a flag indicating whether the current macroblock was skipped. If the macroblock was encoded without being skipped, then COD=0, while if the macroblock was skipped, then COD=1. The skipping of the macroblock occurs when the type of the macroblock is INTER, the motion vector is (0,0), and all DCT coefficient values are zero. In this case, it is unnecessary to encode the macroblock type, the motion vector information, and the DCT coefficients; thus, a large compression is possible. If COD=0, then the operation proceeds to the next step, while if COD=1, then all the following steps (relating to the current macroblock) are skipped and the operation necessary for processing the next macroblock is started.
In the next step, the encoder outputs a variable-length code word “MCBPC” (see reference symbol D3). MCBPC indicates the macroblock type and the absence/presence of the DCT coefficient of each of two blocks which are selected for sending color-difference signals.
The macroblock has five types (or modes), as shown in the following List 3.
If the macroblock type belongs to the intraframe coding mode, that is, is INTRA or INTRA+Q, then code word “Acpred_flag” (see reference symbol D4) is output. “Acpred_flag” is a flag indicating whether the AC (alternating current) coefficient prediction of the DCT was performed with respect to the current macroblock. If the AC coefficient prediction was performed, then Acpred_flag=1, while if no AC coefficient prediction was performed, then Acpred_flag=0.
The encoder then outputs code word “CBPY” (see reference symbol D5).
CBPY indicates whether the DCT coefficients were determined with respect to four blocks for sending brightness signals. If the macroblock type is INTER+Q or INTRA+Q, then quantization step information DQUANT (variable-length code word: D6) is output.
Next, if the macroblock type does not belong to the intraframe coding mode, that is, is neither INTRA nor INTRA+Q, then motion vector information (see reference symbol D7) is output. Here, if the VOP type is SPRITE, then the motion vector information (D7) is output only when MCSEL=0, that is, when the local motion compensation was employed, and thus no motion vector information is output when the global motion compensation is employed.
In the last step, the DCT coefficient information of each 8×8 block included in the 16×16 macroblock is output as quantized index 19 (see reference symbol D8).
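The order of code words D1 to D8 described in the preceding paragraphs can be summarized in a short sketch. This is reconstructed from the prose only and is not a copy of List 1; the function name `conventional_elements`, the string labels, and the simplification that DCT coefficient information is always listed for a non-skipped macroblock are assumptions of the example.

```python
INTRA_TYPES = {"INTRA", "INTRA+Q"}
Q_TYPES = {"INTER+Q", "INTRA+Q"}


def conventional_elements(vop_type, mb_type, mcsel=0, skipped=False):
    """List the code words output for one macroblock in the conventional syntax."""
    out = []
    if vop_type == "SPRITE":
        out.append("MCSEL")            # D1: sent before the macroblock type is known
    if vop_type in ("P", "SPRITE"):
        out.append("COD")              # D2
        if skipped:
            return out                 # COD=1: nothing further for this macroblock
    out.append("MCBPC")                # D3
    if mb_type in INTRA_TYPES:
        out.append("Acpred_flag")      # D4
    out.append("CBPY")                 # D5
    if mb_type in Q_TYPES:
        out.append("DQUANT")           # D6
    if mb_type not in INTRA_TYPES:
        # For SPRITE-VOPs the motion vector is sent only for local compensation.
        if vop_type != "SPRITE" or mcsel == 0:
            out.append("MV")           # D7
    out.append("DCT")                  # D8
    return out
```

Calling `conventional_elements("SPRITE", "INTRA")` or `conventional_elements("SPRITE", "INTER", mcsel=1, skipped=True)` shows that MCSEL is emitted in both cases, since it precedes COD and MCBPC.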
In the above-explained syntax, MCSEL is output even if the macroblock type belongs to the intraframe coding mode (such as INTRA and INTRA+Q). In the intraframe coding mode, neither global nor local motion compensation is performed; thus, a decision of MCSEL is useless. Therefore, in this case, there occurs the problem that 1 bit of unnecessary data is added for each macroblock.
In addition, if the global motion compensation is effective (for a frame to be encoded), the macroblock skipping is generally performed in the global motion compensation mode, and the macroblock skipping is rarely performed in the local motion compensation mode. Therefore, also in the case of the macroblock skipping, MCSEL is practically useless and there also occurs the problem that 1 bit of unnecessary data is added for each macroblock.
If the transmission rate is high, such overhead data occupies a very small portion of the whole data; thus, no serious problem occurs. However, as the Internet has become widespread very rapidly, video data transmission with a low transmission rate has been required recently. In the encoding of video data having a low-transmission rate, the rate of overhead data to the whole data is inevitably increased. Therefore, the necessity of reducing such overhead data has also increased.
More specifically, the code word MCSEL takes only one bit per macroblock. However, in a CIF (common intermediate format) picture of 352 pixels×288 lines, MCSEL occupies 396 bits per frame, while in a QCIF (quarter common intermediate format) picture of 176 pixels×144 lines, MCSEL occupies 99 bits per frame. The amount of MCSEL is fixed regardless of the encoding rate; thus, in low-rate encoding, the proportion of the data occupied by MCSEL increases and may become a great burden on the system. For example, if QCIF pictures at 10 frames/sec are encoded at 20 kbit/sec, then MCSEL occupies a data amount of 99×10≈1 kbit/sec, which is almost 5% of the whole data rate.
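The figures quoted above follow from counting 16×16 macroblocks per frame, as the short calculation below shows. The function names are chosen for illustration, and the calculation assumes one MCSEL bit for every macroblock of every frame.

```python
def mcsel_bits_per_frame(width, height, mb_size=16):
    """One MCSEL bit per macroblock: count the macroblocks in a frame."""
    return (width // mb_size) * (height // mb_size)


def mcsel_share(width, height, frames_per_sec, bitrate_bps):
    """Fraction of the total bit rate spent on MCSEL."""
    return mcsel_bits_per_frame(width, height) * frames_per_sec / bitrate_bps
```

A CIF frame has 22×18=396 macroblocks and a QCIF frame has 11×9=99. At 10 frames/sec, QCIF therefore spends 990 bit/sec on MCSEL, which is 4.95% of a 20 kbit/sec stream.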
The inventors of the present invention noted the above-described requirement at the start, and tried to solve the above-explained problems. That is, the present invention relates to a video data (i.e., moving pictures) predictive coding method using two kinds of prediction modes, the global and local motion compensation modes, and the objective thereof is to provide a video data predictive encoding method and a corresponding decoding method for reducing unnecessary MCSEL as much as possible, and improving the data compression efficiency.
To achieve the above objective, the present invention provides a predictive encoding method of video data, in which one of a global motion-compensating process for predicting a global motion of the whole frame and a local motion-compensating process for predicting a local motion of each block in a frame is selectively performed, wherein:
if a current block to be processed was interframe-encoded, then a code word for indicating the prediction mode is inserted in an encoded data sequence of the current block, the code word indicating which of the global and local motion-compensating processes was used for predicting the current block, and the code word inserted after another code word indicating the encoding mode of the current block;
otherwise, the code word for indicating the prediction mode is not inserted in the data sequence.
The above is the first method.
In the above method, it is possible that when the current block is block-skipped, the global motion-compensating process is always chosen and in the skipped block, the code word for indicating the prediction mode is omitted. This is the second method of the present invention.
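A sketch of the macroblock header under the second method (which includes the first) might look like the following. It is an illustration under stated assumptions, not the claimed syntax itself: the name `proposed_header` and the tuple representation are hypothetical, the position of COD before MCBPC is carried over from the conventional syntax, and the handling of skipped macroblocks under the first method alone is not covered because this passage does not specify it.

```python
INTRA_TYPES = {"INTRA", "INTRA+Q"}


def proposed_header(vop_type, mb_type, mcsel=0, skipped=False):
    """Header code words for one macroblock, as (name, value) pairs."""
    out = []
    if vop_type in ("P", "SPRITE"):
        out.append(("COD", 1 if skipped else 0))
        if skipped:
            # Second method: a skipped block always uses global motion
            # compensation, so no prediction mode code word is sent.
            return out
    out.append(("MCBPC", mb_type))     # the encoding mode comes first
    if vop_type == "SPRITE" and mb_type not in INTRA_TYPES:
        # First method: the prediction mode code word follows the encoding
        # mode and appears only for interframe-encoded blocks.
        out.append(("MCSEL", mcsel))
    return out
```

With this ordering, an intraframe-encoded macroblock and a skipped macroblock each carry no prediction mode code word, while an interframe-encoded macroblock in a SPRITE-VOP carries exactly one.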
The present invention also provides a decoding method for decoding a data sequence encoded by the above first method, wherein:
if the current block was interframe-encoded, then the code word for indicating the prediction mode is extracted from the data sequence and decoding is performed using the indicated prediction method;
otherwise the code word for indicating the prediction mode is not extracted.
The present invention also provides a decoding method for decoding a data sequence encoded by the above second method, wherein when the current block has been block-skipped, the code word for indicating the prediction mode is not extracted and a decoding process corresponding to the global motion-compensating process is performed.
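The corresponding decoder logic can be sketched as follows. The example assumes that the code words have already been entropy-decoded into a simple list of symbols, that COD precedes the encoding mode as in the conventional syntax, and that a skipped macroblock in a P-VOP falls back to local compensation; the name `parse_header` and the dictionary layout are hypothetical.

```python
INTRA_TYPES = {"INTRA", "INTRA+Q"}


def parse_header(symbols, vop_type):
    """Decode the header of one macroblock from a list of decoded symbols."""
    it = iter(symbols)
    if vop_type in ("P", "SPRITE"):
        cod = next(it)
        if cod == 1:
            # Skipped block: no prediction mode code word is extracted.
            # In a SPRITE-VOP, global motion compensation is implied.
            mode = "global" if vop_type == "SPRITE" else "local"
            return {"skipped": True, "mb_type": "INTER", "mode": mode}
    mb_type = next(it)
    if mb_type in INTRA_TYPES:
        # Intraframe-encoded: no prediction mode code word is present.
        return {"skipped": False, "mb_type": mb_type, "mode": None}
    if vop_type == "SPRITE":
        mode = "global" if next(it) == 1 else "local"
    else:
        mode = "local"
    return {"skipped": False, "mb_type": mb_type, "mode": mode}
```

The decoder reads the prediction mode code word only after it knows the block is interframe-encoded and not skipped, which is what allows the encoder to leave it out in the other cases.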
As described above, if the macroblock type belongs to the intraframe coding mode, that is, the type is INTRA or INTRA+Q, then neither the global motion compensation method nor the local motion compensation method is used; thus, a flag (MCSEL) for indicating which method was adopted is unnecessary. However, in the conventional methods, MCSEL is positioned before the code word (MCBPC) for indicating the macroblock type; therefore, the decoder cannot determine whether MCSEL is necessary until MCBPC is extracted in the decoder. In this case, regardless of whether the macroblock type is the intraframe coding mode, MCSEL must be added to every macroblock.
In comparison, according to the above first method of the present invention, MCSEL is inserted after MCBPC; thus, after the decoder reads out the macroblock type, the decoder can determine whether MCSEL appears. Therefore, in the intraframe coding mode, it is unnecessary to add MCSEL, thereby reducing overhead data.
Also as explained above, if the global motion compensation is effective (for a frame to be encoded), the macroblock skipping is generally performed in the global motion compensation mode, and the macroblock skipping is rarely performed in the local motion compensation mode. Therefore, also in the case of the macroblock skipping, MCSEL is practically useless.
According to the above second method, the macroblock skipping can be limitedly performed in the global motion compensation, thereby omitting MCSEL at the macroblock skipping and further reducing unnecessary overhead data.
That is, according to the predictive encoding and decoding methods of video data of the present invention, unnecessary MCSEL data can be reduced as much as possible, so that overhead data can be reduced and the data-compression efficiency can be improved. The lower the encoding rate, the clearer the effect of the present invention.
The present invention also provides a storage medium storing a program for making a computer execute any method as described above, and a storage medium storing data encoded by any encoding method as described above.