Compression of digital video to a very low bit rate (VLBR) is an important problem in the field of communications. In general, a VLBR is considered not to exceed 64 kilobits per second (kbps) and is associated with existing personal communication apparatus, such as the public switched telephone network and cellular apparatus. Providing services such as video on demand and video conferencing on these apparatus would require the information contained in a digital video sequence to be compressed by a factor of 300 to 1. Achieving such large compression ratios requires that all redundancy present in a video sequence be removed.
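As a rough check on the scale of compression involved, the following sketch computes the uncompressed bit rate of a video source and the ratio needed to reach 64 kbps. The source format (QCIF, 176 x 144 pixels, 24 bits per pixel, 30 frames per second) is an assumption chosen for illustration and is not stated in the text; other formats give different ratios.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed bit rate of a video source, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed source format (hypothetical): QCIF, 24-bit color, 30 frames/s.
raw_bps = raw_bit_rate(176, 144, 24, 30)

# VLBR ceiling taken from the text: 64 kilobits per second.
target_bps = 64_000

ratio = raw_bps / target_bps
print(f"raw rate: {raw_bps} bit/s, required compression: about {ratio:.0f} to 1")
```

Under these assumptions the required ratio is of the same order as the 300-to-1 figure cited above.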
Current standards, such as H.261, MPEG-1, and MPEG-2, provide compression of a digital video sequence by utilizing a block motion-compensated Discrete Cosine Transform (DCT) approach. This video encoding technique removes the redundancy present in a video sequence by utilizing a two-step process. In the first step, a block-matching (BM) motion estimation and compensation algorithm estimates the motion that occurs between two temporally adjacent frames. The frames are then compensated for the estimated motion and compared to form a difference image. Taking the difference between the two temporally adjacent frames after compensation removes the existing temporal redundancy. The only information that remains is new information that could not be compensated for by the motion estimation and compensation algorithm.
In the second step, this new information is transformed into the spatial frequency domain using the DCT. The DCT has the property of compacting the energy of this new information into a few low-frequency components. Further compression of the video sequence is obtained by limiting the amount of high-frequency information encoded.
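The energy-compaction property can be illustrated with a minimal sketch of an orthonormal two-dimensional DCT applied to a smooth 8 x 8 block. The block contents, the 8 x 8 size, and the function names are assumptions made for illustration; the code is a direct, unoptimized form of the transform and not the implementation used by any particular standard.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so that coeffs[v][u] is vertical frequency v,
    # horizontal frequency u.
    return [list(r) for r in zip(*cols)]


# Hypothetical smooth block: a gentle intensity ramp.
block = [[8 * x + 4 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)

pixel_energy = sum(v * v for row in block for v in row)
coeff_energy = sum(c * c for row in coeffs for c in row)

# Energy held by the four lowest-frequency coefficients (top-left 2 x 2).
low_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
low_fraction = low_energy / coeff_energy
print(f"fraction of energy in 4 of 64 coefficients: {low_fraction:.4f}")
```

Because the transform is orthonormal, the total energy is unchanged; for smooth content nearly all of it lands in a handful of low-frequency coefficients, which is why the high-frequency ones can be coarsely coded or dropped.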
The majority of the compression provided by this approach to video encoding is obtained by the motion estimation and compensation algorithm. That is, it is much more efficient to transmit information regarding the motion that exists in a video sequence than information about the intensity and color. The motion information is represented using vectors which point from a particular location in the current intensity frame to where that same location originated in the previous intensity frame. For BM, the locations are predetermined non-overlapping blocks of equal size. All pixels contained in a given block are assumed to have the same motion. The motion vector associated with a particular block in the present frame of a video sequence is found by searching over a predetermined search area in the previous temporally adjacent frame for a best match. This best match is generally determined using the mean-squared-error (MSE) or mean-absolute-difference (MAD) between the two blocks. The motion vector points from the center of the block in the current frame to the center of the block which provides the best match in the previous frame.
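The block search described above can be sketched as an exhaustive (full) search that minimizes the MAD. The frame contents, block size, search range, and function names are assumptions for illustration; practical encoders typically use faster search strategies.

```python
def mad(cur, prev, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block whose top-left corner
    is (bx, by) in cur and the block displaced by (dx, dy) in prev."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total / (n * n)


def block_match(cur, prev, bx, by, n, search):
    """Full search over displacements in [-search, search] in both axes.
    Returns the (dx, dy) with the lowest MAD. The displacement between block
    corners equals the displacement between block centers."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = mad(cur, prev, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# Hypothetical 16 x 16 previous frame with a non-repeating pattern.
prev = [[(37 * x + 101 * y + 17 * x * y) % 256 for x in range(16)]
        for y in range(16)]
# Current frame: the same content moved 2 pixels right and 1 pixel down.
cur = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]

vector = block_match(cur, prev, bx=6, by=6, n=4, search=3)
print(vector)
```

The returned vector points back to where the block originated in the previous frame, so content that moved right and down yields negative components.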
Utilizing the estimated motion vectors, a copy of the previous frame is displaced, block by block, according to each vector to produce a prediction of the current frame. This operation is referred to as motion compensation. As described above, the predicted frame is subtracted from the current frame to produce a difference frame, which is transformed into the spatial frequency domain by the DCT. These spatial frequency coefficients are quantized and entropy encoded, providing further compression of the original video sequence. Both the motion vectors and the DCT coefficients are transmitted to the decoder, where the inverse operations are performed to produce the decoded video sequence.
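Motion compensation and the formation of the difference frame can be sketched as follows. The frame contents, the block size, the clamping of source coordinates at the frame border, and the function names are all assumptions made for illustration.

```python
def motion_compensate(prev, vectors, n):
    """Predict the current frame: for each n x n block with top-left corner
    (bx, by), copy the block of prev displaced by its vector (dx, dy)."""
    h, w = len(prev), len(prev[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(n):
            for x in range(n):
                # Assumption: clamp source coordinates at the frame border.
                sy = min(max(by + y + dy, 0), h - 1)
                sx = min(max(bx + x + dx, 0), w - 1)
                pred[by + y][bx + x] = prev[sy][sx]
    return pred


def difference_frame(cur, pred):
    """Pixel-wise difference between the current frame and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


# Hypothetical 8 x 8 frames: the scene shifts one pixel to the right.
prev = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[prev[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

corners = [(bx, by) for by in (0, 4) for bx in (0, 4)]
good_vectors = {c: (-1, 0) for c in corners}   # point back to the origin
zero_vectors = {c: (0, 0) for c in corners}    # no compensation

residual = difference_frame(cur, motion_compensate(prev, good_vectors, 4))
uncompensated = difference_frame(cur, motion_compensate(prev, zero_vectors, 4))

print(sum(abs(v) for row in residual for v in row))
print(sum(abs(v) for row in uncompensated for v in row))
```

With accurate vectors the difference frame is empty in this toy case, whereas without compensation every shifted pixel contributes to the residual that must then be transformed and coded.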
It is well known in video compression that a dense motion vector field provides a much higher quality prediction of the current frame than a block-based field. However, since each picture element (pixel) in a dense motion vector field has a motion vector associated with it, such a representation of the motion in the video sequence is prohibitively large to transmit. Therefore, video encoders are forced to utilize a BM approach to motion estimation and compensation. A method and apparatus that would allow a dense motion vector field to be used within the video encoder would be extremely beneficial.
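The difference in side information can be made concrete with a short count of motion vectors per frame. The frame size (QCIF, 176 x 144) and the 16 x 16 block size are assumptions chosen for illustration and are not taken from the text.

```python
def vector_counts(width, height, block):
    """Number of motion vectors per frame for a dense field (one per pixel)
    and for a block-based field (one per block x block region)."""
    dense = width * height
    blocks = (width // block) * (height // block)
    return dense, blocks


# Assumed format (hypothetical): QCIF frame with 16 x 16 blocks.
dense, blocks = vector_counts(176, 144, 16)
print(f"dense: {dense} vectors, block-based: {blocks} vectors, "
      f"ratio: {dense // blocks} to 1")
```

Under these assumptions a dense field carries one vector for every pixel that a single block vector would otherwise cover, which is why transmitting it directly is impractical at very low bit rates.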