1. Field of the Invention
The present invention relates to high-efficiency encoding of video information in the form of a stream of pictures expressed by a video signal, for the purpose of transmitting or storing the video information, whereby the pictures can be conveyed as a smaller amount of code than has been possible in the prior art.
2. Description of the Related Art
Typical methods of high-efficiency encoding of video information which are widely used at present are the MPEG-1 and MPEG-2 standards, i.e., Moving Picture Experts Group international encoding standards which have been set by the ISO/IEC. With an MPEG encoding system, certain periodically selected pictures (transmitted as respective frames of a digital video signal, each picture being conveyed as an array of pixels each expressed by a digitized sample) are encoded by motion compensation, with each picture being encoded as a set of blocks of pixels. With motion compensation, a part of a preceding picture is shifted spatially, so as to derive predicted pixel values for a block which is being encoded. The requisite amount and direction of that shift is expressed as a motion vector, which is derived by a process referred to as motion estimation. The respective differences between the actual pixel values and the prediction values, referred to in the following as the prediction error values, are obtained and encoded using DCT conversion and variable-length encoding of the resultant coefficients. The motion vectors are generally derived for blocks of 8.times.8 or 16.times.16 element size, and the DCT is generally applied to blocks of 8.times.8 values; however, for simplicity of description it will be assumed in the following that the same block size (e.g., 8.times.8 elements) is utilized in all encoding/decoding operations.
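The DCT conversion referred to above can be illustrated by the following sketch, which computes a 2-dimensional DCT-II directly from the defining formula. This is an O(n^4) form given for clarity only; practical encoders use fast separable transforms, and the function name `dct_2d` is merely illustrative.

```python
import math

def dct_2d(block):
    """2-dimensional DCT-II of an n-by-n block of values, computed
    directly from the defining formula (for illustration only)."""
    n = len(block)

    def alpha(k):
        # Normalization factor of the DCT-II basis functions.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # horizontal spatial frequency
        for v in range(n):      # vertical spatial frequency
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            coeffs[v][u] = alpha(u) * alpha(v) * s
    return coeffs
```

For a constant-valued block, all of the resulting coefficients are zero except the DC coefficient, which is why run-length coding of the coefficient array (described below) is effective.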
Since it is necessary to use the prediction error amounts for the blocks in conjunction with the motion vectors for the blocks, at the time of subsequent decoding, the motion vectors are also encoded. This is done by variable-length encoding, such as Huffman encoding, since the motion vectors do not generally change substantially between successive blocks of a picture.
FIG. 11 is a general system block diagram of an example of a prior art type of motion compensation encoding apparatus which utilizes the above principles. In the following, when describing both this prior art example and embodiments of the present invention, only the encoding processing applied to those pictures which are encoded by motion compensation will be described.
In FIG. 11, a digital video signal from a video input terminal 1 is supplied to one input of a subtractor 2. In the following description of this prior art example and also in the subsequent description of embodiments of the invention, it is to be understood that such a digital video signal consists of successive digital samples expressing respective pixels, supplied to the input terminal 1 in the appropriate sequence for use by the encoding apparatus, i.e. so that successive blocks of pixels of a picture are operated on. A motion estimation section 15 operates on the input video signal to derive respective motion vectors for the blocks which are to be encoded, and supplies the motion vectors to a motion compensation section 8. As each pixel value of such a block which is being encoded is supplied to the subtractor 2, a corresponding prediction value for that pixel is derived by the motion compensation section 8 and supplied to the other input of the subtractor 2, to thereby derive the corresponding prediction error value. The prediction values are generated through motion compensation by the motion compensation section 8, i.e., by shifting a reconstructed picture (supplied from the picture memory 51) by an amount and direction specified by the corresponding motion vector, with that motion compensation typically using linear interpolation to achieve an accuracy of 1/2 pixel. The prediction error values thereby successively derived for the block are supplied to a DCT section 3. The DCT section 3 executes DCT conversion processing on that set of prediction error values, treated as a 2-dimensional (8.times.8) array, and supplies the resultant set of DCT coefficients for the block to a quantizer 4. The quantizer 4 performs quantization of the DCT coefficients using a predetermined quantization step size, and supplies the resultant values to a variable-length encoder 5 and to a dequantizer 12, as a 2-dimensional array of quantized coefficients.
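The prediction and subtraction steps just described can be sketched as follows. The function names `predict_block` and `prediction_errors` are illustrative; the sketch operates at integer-pixel accuracy only, whereas the apparatus described above additionally interpolates to 1/2-pixel accuracy.

```python
def predict_block(reference, bx, by, mv, n=8):
    """Fetch an n-by-n prediction block from the reconstructed
    reference picture (as held in the picture memory 51), displaced
    from block position (bx, by) by the motion vector mv = (dx, dy).
    Integer-pixel accuracy only, for illustration."""
    dx, dy = mv
    return [[reference[by + dy + y][bx + dx + x] for x in range(n)]
            for y in range(n)]

def prediction_errors(current, predicted):
    """Per-pixel differences formed by the subtractor 2."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]
```

When the motion vector exactly captures the displacement of the block between pictures, the prediction error values are all zero and can be encoded with very little code.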
The variable-length encoder 5 performs conversion of each of these 2-dimensional arrays to a 1-dimensional set of values by array conversion, using zigzag scanning, with Huffman encoding then being applied to express the resultant sequence of values as runs of consecutive 0 values and runs of values other than 0. In that way, respective bit streams are derived for the prediction error values of each of the blocks of a picture which is being encoded, and are successively supplied to a multiplexer 6, to be multiplexed with bit streams which are derived by encoding the motion vectors which are derived for that picture. The resultant code stream is supplied to a code output terminal 7.
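The array conversion and run formation performed by the variable-length encoder 5 can be sketched as follows (the Huffman coding of the resulting pairs is omitted; the function names are illustrative):

```python
def zigzag(block):
    """Convert a 2-dimensional coefficient array to a 1-dimensional
    sequence by zigzag scanning, low-frequency coefficients first."""
    n = len(block)
    out = []
    for d in range(2 * n - 1):
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            diag.reverse()  # even diagonals are traversed upward
        out.extend(block[r][c] for r, c in diag)
    return out

def zero_runs(seq):
    """Express the scanned sequence as (zero_run, value) pairs, i.e.,
    runs of consecutive 0 values terminated by a non-zero value,
    which are then Huffman-encoded."""
    pairs, zeros = [], 0
    for v in seq:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    return pairs
```

Since quantization typically leaves only a few non-zero coefficients, clustered at low frequencies, the zigzag scan concentrates the zeros into long runs and so shortens the encoded bit stream.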
The dequantizer 12 and inverse DCT section 11 perform inverse processing to that executed by the quantizer 4 and the DCT section 3 on a block which is being encoded, to reconstruct the respective prediction error values for each of the pixels of the block. As each such pixel prediction error value is thereby reconstructed, it is added in an adder 10 to a prediction value which has been derived by the motion compensation section 8 to thereby obtain respective reconstructed values for each of the pixels of a picture, with resultant reconstructed pictures being stored in the picture memory 51, to be supplied to the motion compensation section 8.
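The local reconstruction loop exists so that the encoder forms its predictions from the same (quantized) picture data that the decoder will possess. A minimal sketch of the quantizer, dequantizer, and adder stages follows; for brevity the DCT/inverse DCT pair is omitted (treated as an identity), and all names are illustrative.

```python
def quantize(values, step):
    """Quantizer 4: map each value to its nearest multiple of the
    quantization step size, expressed as an integer level."""
    return [[int(round(v / step)) for v in row] for row in values]

def dequantize(levels, step):
    """Dequantizer 12: restore approximate values from the levels."""
    return [[lv * step for lv in row] for row in levels]

def reconstruct(predicted, decoded_errors):
    """Adder 10: prediction value plus reconstructed prediction
    error value, giving the reconstructed pixel value."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(predicted, decoded_errors)]
```

The reconstructed pixel values differ from the originals only by the quantization error, and it is these reconstructed values, not the originals, that are stored in the picture memory 51 as the reference picture.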
The motion vectors derived by the motion estimation section 15 are also supplied to a motion vector encoder 13. Typically, the motion vectors are derived to an accuracy of 1/2 pixel for each block.
The motion compensation section 8 receives from the picture memory 51 the pixel values of a previously encoded picture, i.e., a picture which has been reconstructed, and which is to be used as a reference picture, selects a region of the reference picture that is determined by the motion vector for a block which is being encoded, and successively outputs to the subtractor 2 successive pixel values of that reference picture region, as a motion compensation prediction signal, to be subtracted from the actual pixel values of the input video signal, and thereby derive the aforementioned prediction error values.
Typically, the motion compensation section 8 will utilize linear interpolation in deriving the motion compensation prediction signal, enabling an accuracy of 1/2 pixel to be attained for motion prediction.
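Such 1/2-pixel accuracy can be sketched as follows, with reference picture positions expressed in half-pixel units and averaging with rounding used as the linear interpolation (the function name `half_pel` and the rounding convention are assumptions for illustration; picture boundaries are not handled):

```python
def half_pel(pic, y2, x2):
    """Sample the reference picture at position (y2/2, x2/2), where
    y2 and x2 are in half-pixel units, by linear interpolation of
    the surrounding integer-position pixels, with rounding."""
    y, ry = divmod(y2, 2)
    x, rx = divmod(x2, 2)
    a = pic[y][x]
    b = pic[y][x + rx]          # right neighbor if x2 is odd
    c = pic[y + ry][x]          # lower neighbor if y2 is odd
    d = pic[y + ry][x + rx]     # diagonal neighbor
    return (a + b + c + d + 2) // 4
```

At integer positions the function returns the pixel value unchanged; at half-pixel positions it returns the rounded average of the two or four nearest pixels.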
The motion vector encoder 13 compares the x and y-direction components of the motion vector of a block which is to be encoded with those of the motion vector of the immediately preceding encoded block and performs Huffman encoding of the resultant difference values, with the resultant bit streams being supplied to the multiplexer 6 to be multiplexed with the bit streams which are obtained for the prediction error values as described above.
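The difference formation performed by the motion vector encoder 13 can be sketched as follows (the Huffman coding of the differences is omitted; the predictor for the first block is taken as (0, 0) here, an assumption for illustration):

```python
def mv_differences(mvs):
    """Differences between each block's motion vector and that of
    the immediately preceding encoded block, as formed by the
    motion vector encoder 13."""
    prev = (0, 0)  # assumed predictor for the first block
    diffs = []
    for (mx, my) in mvs:
        diffs.append((mx - prev[0], my - prev[1]))
        prev = (mx, my)
    return diffs
```

A run of blocks sharing the same motion vector yields (0, 0) differences, which receive the shortest variable-length codes; this is why unchanging motion vectors produce little motion vector code.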
A decoding apparatus corresponding to the motion compensation encoding apparatus of FIG. 11 will be described in the following, referring to the general system block diagram of such a decoding apparatus which is shown in FIG. 12. Here, an input code stream which has been generated by the motion compensation encoding apparatus of FIG. 11 is supplied to an input terminal 61 and is separated by the demultiplexer 62 into the bit streams for the aforementioned prediction error values of respective (8.times.8) blocks of a picture which has been encoded by motion compensation and the bit streams for the motion vectors which were derived for that picture. The prediction error value bit streams are restored to fixed-length code form by the variable-length decoder 63, and reconstructed prediction error values for respective blocks of a picture which is being decoded are then obtained by the dequantizer 72 and the inverse DCT section 71. The respective prediction error values for pixels of a block which is being decoded are successively supplied to an adder 70. Predicted pixel values, derived by motion compensation, are supplied from a motion compensation section 78 to the other input of the adder 70, to thereby obtain reconstructed pixel values for the picture which is being decoded.
These reconstructed pixel values are supplied to an output video terminal 64, and also to a picture memory 52, to be temporarily stored for use as a reconstructed reference picture.
The bit streams for the motion vectors are supplied to a motion vector decoder 65, which derives decoded motion vectors, and supplies that information to the motion compensation section 78. The motion compensation section 78 derives respective motion-compensated predicted pixel values, using the motion vector information supplied from the motion vector decoder 65 in conjunction with reconstructed reference picture values supplied from the picture memory 52, and supplies these predicted values to the adder 70. It can be understood that the dequantizer 72, the inverse DCT section 71, the adder 70, the picture memory 52 and the motion compensation section 78 of this decoding apparatus respectively operate in the same manner as the dequantizer 12, the inverse DCT section 11, the adder 10, the picture memory 51 and the motion compensation section 8 of the encoding apparatus of FIG. 11.
The motion vector bit streams which are separated from the input code stream by the demultiplexer 62 are converted from variable-length encoding form to fixed-length code by the motion vector decoder 65, and each of the motion vectors which are thereby obtained is added to the motion vector obtained for the preceding block, to thereby obtain information specifying a motion vector for the block which is currently being decoded, which is supplied to the motion compensation section 78.
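The accumulation performed by the motion vector decoder 65 is the inverse of the difference formation at the encoder, and can be sketched as follows (the function name is illustrative; the predictor for the first block is assumed to be (0, 0), matching the corresponding assumption at the encoder side):

```python
def mv_accumulate(diffs):
    """Motion vector decoder 65: each decoded difference is added to
    the motion vector obtained for the preceding block, restoring
    the motion vector of the block currently being decoded."""
    prev = (0, 0)  # assumed predictor for the first block
    mvs = []
    for (dx, dy) in diffs:
        prev = (prev[0] + dx, prev[1] + dy)
        mvs.append(prev)
    return mvs
```

Applied to the differences produced at the encoder, this accumulation restores the original sequence of motion vectors exactly, since the difference coding itself is lossless.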
With a prior art motion compensation encoding apparatus, the motion vectors which are obtained by motion estimation are directly encoded, without change. The operation of a motion estimation section such as the motion estimation section 15 of FIG. 11 serves to derive, for each block, the motion vector which will result in the smallest amount of prediction error. In that way, the amount of code which is generated as encoded prediction error values will be minimized. However, there are many cases in which the derivation of such optimum motion vectors by motion estimation will result in an excessive amount of code being derived for the motion vectors. This is due to the fact that with variable-length encoding of the motion vectors, the amount of motion vector code which is generated (e.g. by the motion vector encoder 13 of FIG. 11) is determined by the lengths of continuous runs of identical motion vectors. However, if the optimum motion vectors are invariably utilized, then even when the reduction in prediction error value code that would result from using a different motion vector for the current block (i.e., different from the motion vector derived for the preceding block) is very small, that different motion vector will nevertheless be utilized. This can result in unnecessary amounts of motion vector code being generated, leading to a lowering of overall encoding efficiency.
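The inefficiency described above can be made concrete with a hypothetical bit-cost comparison. All of the figures below are invented for illustration; they merely show that the prior art criterion (minimum prediction error alone) can select the costlier of the two alternatives in terms of total code.

```python
def total_bits(error_bits, mv_diff_bits):
    """Total code generated for one block: prediction error code
    plus motion vector difference code (hypothetical figures)."""
    return error_bits + mv_diff_bits

# The optimum vector saves only a few bits of prediction error code,
# but, differing from the previous block's vector, it incurs a long
# motion vector difference code.
optimum = total_bits(error_bits=100, mv_diff_bits=12)   # new vector
reused = total_bits(error_bits=103, mv_diff_bits=2)     # (0, 0) difference
```

Here reusing the previous block's vector yields 105 bits against 112 for the optimum vector, yet a prior art encoder, considering only the prediction error, would choose the latter.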
Since there is no control of the amount of code which is generated by encoding the motion vectors, if that amount becomes excessively large, the amount of code which can be generated from the prediction error values will be suppressed. This problem is especially severe when the block size that is utilized for motion compensation is small, and when highly accurate motion compensation is executed.