In a typical video encoder, such as one conforming to the International Telecommunication Union, Telecommunication Sector (ITU-T) H.261 recommendation, the ITU-T H.263 recommendation, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) standard, the ISO/IEC MPEG-2 standard, or the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) standard/ITU-T H.264 recommendation (hereinafter the “MPEG-4 AVC standard”), a video frame is divided into macroblocks, and each macroblock (MB) may be coded in one of several coding modes. In an inter mode, a motion vector (MV) is first found that points to the best matching block in a previously coded frame, and then the difference between the current macroblock and its best matching block is coded. Instead of allowing only one motion vector for a macroblock, other inter modes allow the macroblock to be divided into several sub-blocks, and estimate a separate motion vector for each sub-block. Instead of using only the frame immediately preceding the current frame, the encoder may also choose among several possible previously encoded frames. For a given inter mode (corresponding to a particular sub-block structure), the motion estimation process determines the best reference frame index and the corresponding motion vector for each macroblock or sub-block by minimizing a motion estimation criterion. In SKIP mode, the encoder either presumes that the motion vector is zero or predicts the motion vector for the macroblock from the motion vectors of selected neighboring macroblocks that have already been coded, and does not code the prediction error. The encoded block is simply the motion-compensated block based on the predicted motion vector (which could be zero). In an intra mode, the macroblock is either coded directly or predicted from previously coded pixels in the same frame (called intra prediction).
There are several possible neighborhood patterns for intra prediction, each corresponding to a different intra mode. Among all possible modes, the encoder finally chooses an optimal one, based on a preset mode decision criterion.
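The inter-mode motion search described above can be illustrated with a minimal sketch. It assumes integer-pel motion vectors, a single reference frame, frames stored as lists of rows, and the sum of absolute differences (SAD) as the motion estimation criterion; the function names `sad` and `full_search` are hypothetical and are not taken from any of the cited standards.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
        for r in range(n)
        for c in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every integer displacement within +/- search_range
    and return (best_mv, best_cost). Candidates that would fall outside the
    reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate block is not fully inside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

A practical encoder would typically replace the exhaustive loop with a fast search and would repeat the search for each sub-block structure and each candidate reference frame; the sketch only shows the basic criterion minimization.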
In rate-distortion optimized motion estimation and mode selection, both the motion estimation and mode decision criteria are a weighted sum of the distortion of the decoded macroblock and the number of bits used. When the underlying transmission network is not reliable, part of the transmitted video bit stream may be lost. A challenging problem is how to determine the expected distortion at the decoder.
The aforementioned rate-distortion optimized motion estimation and mode decision process requires quantizing and coding the prediction error for each candidate option, so as to determine the quantization distortion and the number of bits required for coding the prediction error. In rate-constrained motion estimation, the criterion used for motion search does not involve the bits required for coding the prediction error, instead involving the prediction error itself.
In most of the prior work on motion estimation and mode decision, the search criterion considers only the quantizer-induced distortion at the encoder (also referred to herein as “quantization distortion”). Specifically, the search criterion is a weighted sum of the quantization distortion and the number of bits needed to code the macroblock using the candidate mode, including the mode information, the motion vectors (if an inter mode is chosen), and the prediction error signal (or the original pixels if the intra mode does not make use of intra prediction). Such methods are commonly referred to as rate-distortion optimized mode selection. The weight is referred to as the Lagrangian multiplier. For motion estimation, simplified search criteria have also been proposed, which use a weighted sum of the inter-prediction error and the number of bits needed to code the motion vectors. Such methods are commonly referred to as rate-constrained motion estimation.
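The two criteria described above can be written as J = D + λ·R, where D is a distortion (or prediction error) term, R is a bit count, and λ is the Lagrangian multiplier. The following is a minimal sketch of how such a criterion might be evaluated; the `Candidate` class, the function names, and the numeric values are illustrative assumptions, not figures from any actual encoder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: str          # hypothetical mode label
    distortion: float  # quantization distortion of the reconstructed macroblock
    bits: int          # mode information + motion vectors + prediction error


def rd_cost(distortion, bits, lam):
    """Lagrangian cost: distortion plus lambda times the number of bits."""
    return distortion + lam * bits


def choose_mode(candidates, lam):
    """Rate-distortion optimized mode selection: pick the lowest-cost candidate."""
    return min(candidates, key=lambda c: rd_cost(c.distortion, c.bits, lam))


def me_cost(prediction_error, mv_bits, lam_motion):
    """Rate-constrained motion estimation criterion: the inter-prediction
    error itself (for example a SAD) plus the weighted motion vector bits.
    No quantization or residual coding is needed to evaluate it."""
    return prediction_error + lam_motion * mv_bits


# Illustrative numbers only.
candidates = [
    Candidate("SKIP", 900.0, 1),
    Candidate("INTER_16x16", 400.0, 60),
    Candidate("INTRA_4x4", 250.0, 180),
]
```

With these illustrative candidates, a small multiplier (λ = 1) selects the low-distortion intra mode, a moderate one (λ = 5) selects the inter mode, and a large one (λ = 20) selects SKIP, which shows how the multiplier trades distortion against rate.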
When a compressed video stream is delivered over a network that may experience bit errors and/or packet losses, the distortion seen at the decoder differs from that at the encoder. The main challenge in rate-distortion optimized motion estimation and mode selection is how to determine the expected decoder distortion for a macroblock given a candidate coding mode and motion vector. A prior art method known as the ROPE method recursively computes and records, in the encoder, the first order and second order moments for each pixel in the past decoded frames. Based on these recorded first order and second order moments in the previous frames, the encoder can then compute the expected distortion for each macroblock for each candidate coding mode. A problem with the ROPE method is that it is not applicable when motion estimation accuracy is sub-pel, when multiple reference frames are allowed for motion compensation, or when the encoder applies deblocking filtering. In addition, the ROPE method is applicable only to a type of error concealment known as frame copy. Also, the ROPE method requires intense computation, since it involves tracking channel distortion at every pixel. An extension of ROPE has been proposed to consider sub-pel motion compensation; however, this extension requires substantially more computation than the original ROPE method. Another extension of the ROPE method to motion estimation has been considered; however, this extension still assumes integer-pel motion vectors and the use of the frame copy error concealment method.
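To make the per-pixel bookkeeping concrete, the following is a simplified sketch of a ROPE-style recursion. It is not the published formulation verbatim: it assumes integer-pel motion vectors, a single reference frame, frame copy concealment, no deblocking filtering, and a loss probability `p` that is independent from macroblock to macroblock. All function and parameter names are hypothetical.

```python
def rope_inter_update(residual, m1_ref, m2_ref, m1_conceal, m2_conceal, p):
    """Moments of the decoder reconstruction of one inter-coded pixel.

    residual               quantized prediction residual for the pixel
    m1_ref, m2_ref         first/second moments of the reference pixel that the
                           motion vector points to in the previous frame
    m1_conceal, m2_conceal moments of the co-located pixel in the previous
                           frame, which frame copy substitutes on a loss
    p                      assumed probability that the macroblock is lost
    """
    # Received: reconstruction = residual + (possibly corrupted) reference.
    # Lost: reconstruction = co-located pixel of the previous decoded frame.
    m1 = (1 - p) * (residual + m1_ref) + p * m1_conceal
    m2 = (1 - p) * (residual ** 2 + 2 * residual * m1_ref + m2_ref) + p * m2_conceal
    return m1, m2


def rope_intra_update(recon, m1_conceal, m2_conceal, p):
    """Moments for an intra-coded pixel: if received, the decoder value equals
    the encoder reconstruction `recon` exactly."""
    m1 = (1 - p) * recon + p * m1_conceal
    m2 = (1 - p) * recon ** 2 + p * m2_conceal
    return m1, m2


def expected_distortion(original, m1, m2):
    """Expected squared error E[(original - decoded)^2] from the two moments."""
    return original ** 2 - 2 * original * m1 + m2
```

Because two moments must be stored and updated for every pixel of every frame, and `expected_distortion` must be evaluated for every pixel of every candidate mode, the sketch also illustrates why the method is computationally demanding.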
A prior art approach (hereinafter referred to as the “first prior art approach”) considers the channel-induced distortion in mode decision, and computes the expected distortion for each macroblock for a candidate mode. The first prior art approach uses a method for determining the expected distortion that requires the storage of the concealment distortions of all macroblocks in all previously coded frames after an I-frame. As with the ROPE method, the first prior art approach does not take into account sub-pel motion compensation, multiple reference frames for motion compensation, or deblocking filtering.
A second prior art approach involving a block-level decoder distortion model has been proposed that also recursively calculates the expected decoder distortion at the macroblock level. The second prior art approach determines the distortion of the current macroblock by separately considering the case in which the corresponding matching block in the previous frame is received and the case in which it is lost. However, the second prior art approach is only applicable when the encoder motion vectors are available at the decoder for error concealment. Moreover, for each macroblock in each past frame, the second prior art approach needs to track separately the distortion when a block is received and the distortion when a block is lost, hence requiring a significant amount of computation and memory space. Also, the second prior art approach is used to estimate the expected distortion for each macroblock based on encoder-chosen motion vectors and coding modes, rather than for motion estimation and mode decision.