Compression of digital video to a very low bit rate (VLBR) is an important problem in the field of communications. In general, a VLBR is considered not to exceed 64 kilobits per second (kbps) and is associated with existing personal communication systems, such as the public switched telephone network and cellular systems. Providing services like video on demand and video conferencing on these systems would require the information contained in a digital video sequence to be compressed by a factor of 300 to 1. Achieving such large compression ratios requires that all redundancy present in a video sequence be removed.
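The scale of the required compression can be checked with a short calculation. The sketch below assumes a source format that the text does not specify (CIF resolution of 352 by 288 pixels, 4:2:0 sampling at 12 bits per pixel, 15 frames per second); the function names are illustrative only. Under those assumptions the required ratio comes out on the same order as the 300 to 1 figure.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed bit rate of a video source, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def required_compression_ratio(raw_bps, channel_bps):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


# Assumed source format (not given in the text): CIF, 4:2:0, 15 frames/s.
raw = raw_bit_rate(352, 288, 15, 12)
ratio = required_compression_ratio(raw, 64_000)  # 64 kbps VLBR ceiling
print(f"raw rate: {raw / 1e6:.2f} Mbps, required ratio: {ratio:.0f} to 1")
# prints "raw rate: 18.25 Mbps, required ratio: 285 to 1"
```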
Current standards, such as H.261, MPEG1, and MPEG2, provide compression of a digital video sequence by utilizing a block motion-compensated Discrete Cosine Transform (DCT) approach. This video encoding technique removes the redundancy present in a video sequence by utilizing a two-step process. In the first step, a block-matching (BM) motion estimation and compensation algorithm estimates the motion that occurs between two temporally adjacent frames. The frames are then compensated for the estimated motion and compared to form a difference image. Taking the difference between the two temporally adjacent frames removes all existing temporal redundancy. The only information that remains is new information that could not be compensated for by the motion estimation and compensation algorithm.
In the second step, this new information is transformed into the frequency domain using the DCT. The DCT has the property of compacting the energy of this new information into a few low frequency components. Further compression of the video sequence is obtained by limiting the amount of high frequency information encoded.
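The energy-compaction property can be illustrated with a small experiment. The following is a minimal sketch, not the transform implementation of any particular standard: it computes an orthonormal 2-D DCT-II of a hypothetical smooth 8 by 8 block and measures how much of the total energy falls in the four lowest-frequency coefficients. The block contents and the helper names are assumptions made for illustration.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [row][column]


def low_frequency_energy_fraction(coeffs, keep):
    """Fraction of total energy held by the top-left keep x keep coefficients."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2 for u in range(keep) for v in range(keep))
    return low / total if total else 1.0


# Hypothetical smooth block: a gentle diagonal ramp with zero mean.
block = [[4 * (r + c) - 28 for c in range(8)] for r in range(8)]
coeffs = dct_2d(block)
fraction = low_frequency_energy_fraction(coeffs, 2)
print(f"energy in the 2x2 low-frequency corner: {fraction:.1%}")
```

For a smooth block like this one, nearly all of the energy lands in a handful of low-frequency coefficients, which is why discarding or coarsely coding the high-frequency coefficients costs little. A block with sharp edges or noise would spread its energy more widely.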
The majority of the compression provided by this approach to video encoding is obtained by the motion estimation and compensation algorithm. That is, it is much more efficient to transmit information regarding the motion that exists in a video sequence than information about the intensity and color. The motion information is represented using vectors that point from a particular location in the current intensity frame to where that same location originated in the previous intensity frame. For BM, the locations are predetermined non-overlapping blocks of equal size. All pixels contained in a block are assumed to have the same motion. The motion vector associated with a particular block in the present frame of a video sequence is found by searching over a predetermined search area in the previous temporally adjacent frame for a best match. This best match is generally determined using the mean-squared-error (MSE) or mean-absolute-difference (MAD) between the two blocks. The motion vector points from the center of the block in the current frame to the center of the block that provides the best match in the previous frame.
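The block-matching search described above can be sketched as an exhaustive (full) search using MAD as the matching criterion. This is a minimal illustration under stated assumptions: frames are lists of rows of intensity values, the block size and search range are arbitrary small values, and all function names are hypothetical. Blocks are indexed by their top-left corner, which yields the same displacement as measuring between block centers.

```python
def mad(current, previous, bx, by, dx, dy, size):
    """Mean absolute difference between a current block and a displaced previous block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[by + y][bx + x] - previous[by + y + dy][bx + x + dx])
    return total / (size * size)


def best_motion_vector(current, previous, bx, by, size, search_range):
    """Full search: return the (dx, dy) with the lowest MAD inside the search area."""
    height, width = len(previous), len(previous[0])
    best, best_cost = (0, 0), mad(current, previous, bx, by, 0, 0, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = mad(current, previous, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def estimate_motion(current, previous, size=4, search_range=2):
    """One vector per non-overlapping block, keyed by the block's top-left corner."""
    height, width = len(current), len(current[0])
    return {
        (bx, by): best_motion_vector(current, previous, bx, by, size, search_range)[0]
        for by in range(0, height - size + 1, size)
        for bx in range(0, width - size + 1, size)
    }


# Hypothetical test frames: the current frame is the previous frame shifted right by one pixel.
previous = [[x * x + 3 * y for x in range(8)] for y in range(8)]
current = [[row[max(x - 1, 0)] for x in range(8)] for row in previous]
vectors = estimate_motion(current, previous, size=4, search_range=2)
print(vectors[(4, 4)])  # (-1, 0): the block content originated one pixel to the left
```

Replacing the absolute difference with a squared difference would give the MSE criterion instead. The full search is the simplest strategy to state; its cost grows with the square of the search range.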
Utilizing the estimated motion vectors, a copy of the previous frame is altered by each vector to produce a prediction of the current frame. This operation is referred to as motion compensation. As described above, the predicted frame is subtracted from the current frame to produce a difference frame, which is transformed into the spatial frequency domain by the DCT. The resulting spatial frequency coefficients are quantized and entropy encoded, providing further compression of the original video sequence. Both the motion vectors and the DCT coefficients are transmitted to the decoder, where the inverse operations are performed to produce the decoded video sequence.
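Motion compensation and the formation of the difference frame can be sketched as follows. This is an illustrative outline only: it omits the DCT, quantization, and entropy coding stages, so the decoder-side reconstruction here is exact, whereas a real codec that quantizes the residual reconstructs only an approximation. The frame contents, the vector table, and the function names are assumptions.

```python
def motion_compensate(previous, vectors, size):
    """Predict the current frame by copying displaced blocks from the previous frame."""
    height, width = len(previous), len(previous[0])
    predicted = [row[:] for row in previous]  # start from a copy of the previous frame
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(size):
            for x in range(size):
                # Clamp so an out-of-range vector cannot index outside the frame.
                sy = min(max(by + y + dy, 0), height - 1)
                sx = min(max(bx + x + dx, 0), width - 1)
                predicted[by + y][bx + x] = previous[sy][sx]
    return predicted


def difference_frame(current, predicted):
    """Prediction residual: current frame minus motion-compensated prediction."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(current, predicted)]


def reconstruct(predicted, residual):
    """Decoder side: add the residual back onto the same prediction."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(predicted, residual)]


# Hypothetical frames: the current frame is the previous frame shifted right by one pixel.
previous = [[x * x + 3 * y for x in range(8)] for y in range(8)]
current = [[row[max(x - 1, 0)] for x in range(8)] for row in previous]
# Assumed vectors, keyed by block top-left corner; the left-hand blocks are left uncompensated.
vectors = {(0, 0): (0, 0), (4, 0): (-1, 0), (0, 4): (0, 0), (4, 4): (-1, 0)}

predicted = motion_compensate(previous, vectors, size=4)
residual = difference_frame(current, predicted)
decoded = reconstruct(predicted, residual)
```

In this example the residual is zero wherever the vector matches the true motion (the right-hand blocks) and nonzero where it does not (the left-hand blocks), which shows why accurate vectors leave little for the transform stage to encode.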
As mentioned above, motion compensation is very effective at removing temporal redundancy, or temporal correlation, from a video sequence. However, there exist areas in a video sequence where there is no temporal correlation. These areas result from new objects entering or leaving the video scene. They can also result from moving objects covering and uncovering other objects within the video sequence. If motion compensation is used in these areas to remove temporal redundancies, a substantial decrease in the video encoder's compression efficiency will generally result. This decrease arises during the generation of the difference image: in the particular areas where motion compensation fails, the energy of the displaced frame difference (DFD) signal increases sharply. The DFD energy in these areas is generally larger than the energy contained in the corresponding areas of the current frame. This problem prohibits the encoding of video at the targeted VLBRs.
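The effect can be made concrete with a toy example. The sketch below uses hypothetical sample values for a small uncovered region, where a dark background appears in the current frame but the prediction still carries the bright object that used to occupy that position. Energy is taken here as the sum of squared sample values, which is an assumption about the measure intended.

```python
def energy(block):
    """Sum of squared sample values in a block."""
    return sum(v * v for row in block for v in row)


def displaced_frame_difference(current_block, predicted_block):
    """DFD: current block minus its motion-compensated prediction."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current_block, predicted_block)]


# Hypothetical uncovered area: dark background now, bright object in the prediction.
current_block = [[20, 22], [21, 19]]
predicted_block = [[200, 198], [201, 199]]

dfd = displaced_frame_difference(current_block, predicted_block)
print(energy(dfd) > energy(current_block))  # True
```

In such a region, encoding the difference signal costs more than encoding the current-frame samples directly, which is the loss of efficiency described above.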