1. Field of the Invention
This invention relates to the field of video encoding and decoding. In particular, the invention relates to constrained motion estimation and compensation for packet loss resiliency.
2. Description of Related Art
The techniques of video motion estimation, compensation and prediction are used in the encoding and decoding of video data involving motion to increase the compression rate of the video data. There are a number of techniques and standards for video compression. Examples of these techniques and standards include Motion JPEG (Joint Photographic Experts Group), wavelet-based coding, the International Telecommunication Union (ITU) H.261/H.263 video codecs, and the Moving Picture Experts Group (MPEG) standards. Because of their high compression rates, the ITU H.261/H.263 and the MPEG-1 and MPEG-2 standards are popular in applications involving real-time video telecommunication and video conferencing.
In the H.261/H.263 and MPEG-1, MPEG-2 standards, compression is performed on both intraframes and interframes of data. Interframe compression is the major compression technique used for compressing video data involving moving objects. Interframe compression removes temporal redundancy from successive frames rather than from single frames by encoding the difference between frames. For example, in MPEG compression, block-based motion compensation is employed to reduce the temporal redundancy inherent in video motion sequences. Motion compensation is used both for the causal prediction of a current picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Two-dimensional motion vectors provide offsets into previous or upcoming frames whose pixel values have already been stored. These two-dimensional motion vectors and the associated error difference signals are then applied to the previous or following frames in the decoding process to recover the original data.
Essentially, motion compensation involves the use of motion vectors to improve the efficiency of the prediction of pixel values. Both the encoder and decoder in a video compression system are responsible for handling motion compensation. The process of determining the values of the motion vectors for each frame is called motion estimation. In motion compensation, the image is arbitrarily divided into macroblock ("MB") regions. To determine motion vectors (from the previous frame to current frame), a MB in the current frame is compared with the MBs in the search area in the previous frame (which serves as a reference frame). The motion vector is obtained at the location of the MB in the search area that provides the best match, based on some predefined criteria.
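The block-matching search described above can be sketched as follows. This is a minimal illustration, not the method of any particular standard: it assumes an exhaustive search, the sum of absolute differences (SAD) as the "predefined criterion", frames stored as lists of pixel rows, and a motion vector expressed as the offset from the block position in the current frame to the best-matching position in the reference frame. The function names and the block and search sizes are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[by + dy][bx + dx] - ref[ry + dy][rx + dx])
    return total


def estimate_motion(cur, ref, bx, by, n=4, search=2):
    """Exhaustively search a (2*search+1)^2 window of `ref` around (bx, by)
    and return ((mx, my), cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, bx, by, n)  # zero-motion candidate
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = bx + mx, by + my
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


# Synthetic 8x8 frames: `cur` is `ref` displaced by one pixel in x and y.
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x + 1] if y >= 1 and x + 1 < 8 else 0 for x in range(8)]
       for y in range(8)]
mv, cost = estimate_motion(cur, ref, bx=2, by=2)
```

Practical encoders usually replace the exhaustive search with a faster strategy and may use a different matching criterion; the sketch only shows the shape of the comparison.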
In addition to motion compensation, most video compression systems employ additional processing steps such as transformation, quantization and statistical encoding. A popular transform technique is the Discrete Cosine Transform (DCT). A popular quantization method is scalar quantization. An example of statistical codes used for encoding is the variable length Huffman code.
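As a rough illustration of the transform and quantization steps, the sketch below applies an orthonormal one-dimensional DCT-II to a row of eight samples and then a uniform scalar quantizer. The standards named in this section use two-dimensional 8x8 transforms and their own quantizer rules; the one-dimensional form, the step size, and all function names here are assumptions chosen for brevity, and the statistical (Huffman) coding stage is omitted.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out


def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from integer levels."""
    return [q * step for q in levels]


# A flat row of pixels concentrates all of its energy in the DC coefficient.
row = [10, 10, 10, 10, 10, 10, 10, 10]
levels = quantize(dct_1d(row), step=4)
approx = idct_1d(dequantize(levels, step=4))
```

Quantization is the lossy step: each coefficient is reconstructed to within half a step of its original value, and the many zero-valued levels are what the subsequent variable-length code compresses well.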
As an example, the H.261/H.263 video encoder combines intraframe and interframe coding to provide fast processing for on-the-fly video. The algorithm creates two types of frames:
(1) DCT-based intraframes without motion compensation, and
(2) predictive interframes using motion estimation and compensation.
The H.261/H.263 encoder is shown in FIG. 1. The H.261/H.263 video decoder is shown in FIG. 2. It consists of a receiver buffer 52, a VLC decoder 55, an inverse quantizer 60, an inverse DCT 62, an adder 65, and a motion compensator 80, which includes frame memory and an optional loop filter 70. The H.261/H.263 coding process begins by coding an intraframe block, which is then sent to a video multiplex coder. Upon receipt, the same frame is decompressed using the inverse quantizer 20 and inverse DCT 22, and then stored in the frame memory 30 for interframe coding.
During interframe coding, prediction is used to compare every macroblock of the actual frame with the available macroblocks of the previous frame. This is done by a motion estimator 33 and motion compensator 34. To reduce the encoding delay, only the closest previous frame is used for prediction. The prediction is applied by subtracting a predictive frame from the image using a subtractor 12. The difference, created as error terms, is then DCT-coded and quantized, and sent to the video multiplex coder with or without the motion vector. At the final step, variable-length coding (VLC), such as Huffman encoder 40, is used to produce more compact code. An optional loop filter 35 can be used to minimize the prediction error by smoothing the pixels in the previous frame. At least one in every 132 frames is intraframe coded.
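The interframe loop described above can be sketched in a heavily simplified form. The sketch assumes zero-motion prediction from the closest previous frame, frames stored as flat lists of pixel values, and a lossless residual (no DCT, quantization, or VLC), so that only the subtract-then-add structure and the periodic intraframe refresh remain. The packet format and function names are hypothetical; the default refresh period of 132 is taken from the text.

```python
def encode_sequence(frames, intra_period=132):
    """Encode frames as ('I', pixels) or ('P', residual) packets.

    Every `intra_period`-th frame is coded as an intraframe; all other
    frames are coded as the difference from the previous decoded frame.
    """
    packets = []
    reference = None
    for index, frame in enumerate(frames):
        if reference is None or index % intra_period == 0:
            packets.append(("I", list(frame)))
            reference = list(frame)
        else:
            # Subtractor: error terms between the frame and its prediction.
            residual = [c - r for c, r in zip(frame, reference)]
            packets.append(("P", residual))
            # The encoder tracks what the decoder will reconstruct.
            reference = [r + d for r, d in zip(reference, residual)]
    return packets


def decode_sequence(packets):
    """Rebuild frames by adding each residual to the previous decoded frame."""
    frames = []
    reference = None
    for kind, payload in packets:
        if kind == "I":
            reference = list(payload)
        else:
            reference = [r + d for r, d in zip(reference, payload)]
        frames.append(list(reference))
    return frames


frames = [[1, 2, 3], [2, 2, 4], [5, 0, 4]]
packets = encode_sequence(frames, intra_period=2)
decoded = decode_sequence(packets)
```

Because every interframe depends on the frame reconstructed before it, the decoder output is correct only if all preceding packets back to the last intraframe arrived intact.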
One major problem in such typical video compression systems is the presence of channel errors or packet losses. Packet losses result in the creation of a corrupted bitstream that is not decodable. In addition, errors may propagate, resulting in incorrect decoding of variable or fixed-length codes. If the system is not carefully designed, the errors will result in unacceptable image quality.
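The way a single loss propagates through predictively coded frames can be illustrated with a toy decoder. The sketch assumes a hypothetical packet format in which an 'I' packet carries a complete frame and a 'P' packet carries only the difference from the previous frame, and it assumes the decoder conceals a lost packet by repeating the last decoded frame. Neither assumption is taken from a particular standard.

```python
def decode_with_loss(packets, lost):
    """Decode ('I', pixels) / ('P', residual) packets, skipping the packet
    indices in `lost` and repeating the previous frame in their place."""
    frames = []
    reference = None
    for index, (kind, payload) in enumerate(packets):
        if index in lost:
            pass  # concealment: keep the previous reference unchanged
        elif kind == "I":
            reference = list(payload)
        elif reference is not None:
            reference = [r + d for r, d in zip(reference, payload)]
        frames.append(None if reference is None else list(reference))
    return frames


packets = [
    ("I", [10, 10]),
    ("P", [1, 0]),
    ("P", [0, 2]),
    ("P", [1, 1]),
    ("I", [20, 20]),
    ("P", [1, 1]),
]
clean = decode_with_loss(packets, lost=set())
damaged = decode_with_loss(packets, lost={1})
# Indices of frames that differ from the loss-free decode.
corrupted = [i for i, (a, b) in enumerate(zip(clean, damaged)) if a != b]
```

In this example the loss of one interframe packet corrupts that frame and every following frame until the next intraframe arrives, which is the propagation effect described above.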
Conventional solutions to the packet loss problem involve one of two approaches: (1) avoiding motion compensation and (2) packet loss recovery. These conventional solutions have a number of drawbacks.
The first approach eliminates the use of motion compensation altogether, so other forms of compression that do not rely on motion are required. An example is Motion JPEG, where compression is performed only intraframe, not interframe. The major drawback of this solution is a reduced compression rate. The typical compression rate for Motion JPEG is 10:1, as opposed to compression rates as high as 200:1 for H.261/H.263 and MPEG.
Packet loss recovery techniques reduce the effect of packet loss by introducing additional information. Some examples of packet loss recovery techniques include (i) increasing the frequency of sending the Interpolation frames or Intra MBs, and (ii) providing feedback of DCT coefficients at the encoder. However, these techniques in packet loss recovery do not provide consistent results. Such inconsistent performance results in the provision of random motion vectors so that compression from and/or prediction of the lost motion vectors are not reliable.
Accordingly, there is a need in the technology to provide an apparatus and method for motion estimation and compensation for high compression rates, which maintains packet loss resiliency.
The present invention discloses a method and apparatus for transmitting a plurality of images of at least one moving object over a packet network. A plurality of motion vectors based on at least one moving object is generated. The plurality of images is encoded by constraining the plurality of motion vectors so as to produce a bitstream. The bitstream is then transmitted over the packet network.