Video coding often includes making a sequence of loosely dependent selection decisions, Mi, of some length N. Each decision Mi involves selecting a “mode” from a mode set m={m1, . . . , mK} of cardinality K. Loosely dependent means that the cost, Ci, associated with each selection Mi depends directly only on Mi and a small subset of dependent decisions hi={Mj, Mp, . . . }. To optimize the decisions, a determination is made of the optimal sequence of selections M=[M1, . . . , MN] that minimizes the sum of the costs Ci,
$$\sum_{i=1}^{N} C_i .$$
A general framework that solves optimization problems of this class is referred to as the corresponding optimization framework.
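The loosely dependent cost structure described above can be illustrated with a minimal sketch. The names `total_cost`, `deps`, and `toy_cost` are hypothetical, and the toy cost function is invented purely for illustration; the only point shown is that each Ci is evaluated from Mi and the few decisions in hi, not from the whole sequence.

```python
from typing import Callable, Sequence, Tuple


def total_cost(
    modes: Sequence[str],
    deps: Sequence[Tuple[int, ...]],
    cost: Callable[[int, str, Tuple[str, ...]], float],
) -> float:
    """Sum C_i over all decisions; C_i sees only M_i and the modes listed in h_i."""
    total = 0.0
    for i, m_i in enumerate(modes):
        # h_i: the modes of the (few) decisions that decision i depends on.
        h_i = tuple(modes[j] for j in deps[i])
        total += cost(i, m_i, h_i)
    return total


def toy_cost(i: int, m_i: str, h_i: Tuple[str, ...]) -> float:
    """Hypothetical cost: a base cost per mode plus 1 for each dependency that differs."""
    base = {"A": 1.0, "B": 2.0}[m_i]
    return base + sum(1.0 for m in h_i if m != m_i)


# Hypothetical dependency sets: decision 1 depends on 0, decision 2 on 0 and 1.
deps = [(), (0,), (0, 1)]
print(total_cost(["A", "B", "A"], deps, toy_cost))
print(total_cost(["A", "A", "A"], deps, toy_cost))
```

Because each term looks at only a small neighbourhood, an optimizer can in principle avoid re-evaluating the whole sum for every candidate sequence.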
Two actual applications of the corresponding optimization framework are now considered.
One application of this optimization framework is referred to as the macroblock mode selection problem. Many popular video coding standards, such as MPEG 1/2/4 and H.263, employ a segmentation/multi-mode approach, in which a video frame is first divided into multiple sections called macroblocks. The macroblocks are coded sequentially in some predetermined order. The coding method for each macroblock depends on the mode of the macroblock. Each macroblock mode is selected from a mode set, and each mode models a particular type of scene activity.
For example, in the case of the H.263 standard, a motion picture is expressed by a series of I-frames (I-pictures) and P-frames (P-pictures). An I-frame is a frame that is coded by so-called intra-frame coding, while a P-frame is a frame that is coded by so-called inter-frame coding. I-frames are coded independently of the other frames, and all the macroblocks in an I-frame are coded in a predetermined sequence, each with a coding mode referred to as Intra. In the Intra mode, the content of each macroblock is coded independently of any other macroblock, and hence a particular macroblock is not dependent on any other macroblock. Each macroblock of a P-frame is coded by one of four modes: Intra, Inter, Skip, and Inter-4. Intra mode coding for a P-frame is the same as Intra mode coding for an I-frame. In the Inter mode, the content of a macroblock Xt(i,j) of frame t and location (i,j) is first estimated using the content of a macroblock Xt−1(s,t) from the immediately preceding frame t−1. The location of Xt−1(s,t) is specified relative to Xt(i,j) by a motion vector (MV) (s−i, t−j). The pixel difference between the estimate Xt−1(s,t) and the actual content of the macroblock Xt(i,j) is then coded using a technique similar to the Intra coding mode. During actual encoding, macroblock Xt(i,j) is represented in compact binary form, and its motion vector is differentially encoded to save bits. An estimated MV, called the predicted motion vector (PMV), is calculated by using the MVs of three surrounding macroblocks Xt(i−1,j), Xt(i−1,j+1), Xt(i,j−1), if the three surrounding macroblocks exist. Only the difference between the PMV and the MV is actually encoded. As a consequence, there exists a dependency between the current macroblock Xt(i,j) and its surrounding macroblocks Xt(i−1,j), Xt(i−1,j+1), Xt(i,j−1).
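The differential motion vector coding described above can be sketched as follows. The text only states that the PMV is calculated "by using" the MVs of three surrounding macroblocks; the component-wise median used here is the prediction rule commonly associated with H.263 and should be read as an assumption of this sketch. The function names are hypothetical, and the handling of missing neighbours at frame borders is omitted.

```python
from typing import Tuple

MotionVector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    """Middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1: MotionVector, mv2: MotionVector, mv3: MotionVector) -> MotionVector:
    """PMV as the component-wise median of three neighbouring MVs (assumed rule)."""
    return (median3(mv1[0], mv2[0], mv3[0]), median3(mv1[1], mv2[1], mv3[1]))


def mv_difference(mv: MotionVector, pmv: MotionVector) -> MotionVector:
    """The only quantity that is actually encoded: MV minus PMV."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])


def reconstruct_mv(pmv: MotionVector, diff: MotionVector) -> MotionVector:
    """Decoder side: the MV is the sum of the PMV and the decoded difference."""
    return (pmv[0] + diff[0], pmv[1] + diff[1])


pmv = predict_mv((1, 4), (3, -2), (2, 0))
diff = mv_difference((3, 1), pmv)
print(pmv, diff, reconstruct_mv(pmv, diff))
```

The sketch makes the dependency visible: changing the mode (and therefore the MV) of any of the three neighbours changes the PMV, and with it the number of bits needed for the current macroblock.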
In the Skip mode, the macroblock Xt−1(i,j) at the same location in the previous frame t−1 represents the current macroblock Xt(i,j) of the current frame t, and no information about the current macroblock is encoded. In the Inter-4 mode, the current macroblock Xt(i,j) is further divided into four blocks, each of which is Inter-coded in a manner similar to the Inter mode described above.
Thus, the coding method for each macroblock of a P-frame depends on the mode of the macroblock, which is selected from a mode set comprising Intra, Inter, Skip and Inter-4. Selecting the right set of modes for a P-frame involves more than just individually selecting the mode that results in the highest visual quality for each macroblock. In particular, the best quality mode may require too many bits to encode, and hence a lower quality mode that requires fewer bits may be necessary. Further, the coding efficiency of a macroblock depends on the mode selection of the macroblock's neighboring macroblocks. The macroblock mode selection problem thus involves selecting coding modes for a set of macroblocks in a P-frame, taking into consideration the scene activities of the P-frame and the inter-dependencies of neighboring macroblocks, to minimize visual distortion while satisfying a bit rate constraint.
Another prior art application of the optimization framework is the mapping of video packets to different network services of varying qualities, as illustrated in FIG. 1. Video packets are often dependent on each other due to differential coding of video frames. FIG. 1 includes illustrations of two differentially coded video formats, F1 and F2. At time t2, for example, video packet P of format F1 depends on video packet I at time t1, while video packet P2 of format F2 depends on video packet P1 at time t2, as well as video packets I1 and I2 at time t1. Given this type of dependency, sending a video packet, x, via a network having a particular quality of service affects the performance of the video packets that depend on x. The problem is how to map video packets in different modes to different networks having different service qualities so as to minimize end-to-end visual distortion.
To provide a concrete example, consider the first application of the optimization framework, i.e. the macroblock mode selection problem. Initial consideration is given to the details of the macroblock mode selection problem to outline related work for this problem.
The goal of the macroblock mode selection problem, like other rate-distortion optimization problems, is to minimize the amount of distortion subject to a bit rate constraint. By assuming that the distortion metric is additive, the mode selection problem (that is, selecting the best sequence of modes M=[M1, . . . , MN] for the N macroblocks, where each mode Mi is selected from a mode set m={m1, . . . , mK}), can be expressed as follows:
$$\min_{M} \sum_{i=1}^{N} D_i(M) \quad \text{s.t.} \quad \sum_{i=1}^{N} R_i(M) \le R_s \qquad (1)$$
where Di(M) is the resulting distortion of the ith macroblock MBi having a mode sequence M, Ri(M) is the resulting rate of MBi having a mode sequence M, and Rs is the bit rate constraint of the frame.
As the constrained problem (1) is difficult to solve, the conventional approach is to solve, instead, the corresponding Lagrangian, expressed as follows:
$$\min_{M} \sum_{i=1}^{N} D_i(M) + \lambda R_i(M) \qquad (2)$$
It can be shown that, for a given multiplier λ, an optimal solution M^O to (2) is also an optimal solution to (1) if the following equation (2)′ is satisfied:
$$\sum_{i=1}^{N} R_i(M^{O}) = R_s \qquad (2)'$$
If equation (2)′ is not satisfied, an appropriate value for λ must be derived to drive the sum to Rs while satisfying the inequality of equation (1), since the approximation bound is related to the numeric difference between the sum and Rs. A wealth of literature has proposed techniques for finding the appropriate λ. Such literature includes T. Wiegand et al., “Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard,” IEEE Trans. on CSVT, April 1996, vol. 6, no. 2, and G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE SP Magazine, November 1998.
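The role of λ can be illustrated with a minimal sketch. It is not taken from the cited papers: it assumes, as a hypothetical simplification, that each macroblock's distortion and rate depend only on its own mode, and it uses plain bisection to find a small λ whose Lagrangian solution meets the rate budget. The table `options` and all function names are invented for illustration.

```python
from typing import List, Tuple

# options[i][k] = (distortion, rate) of macroblock i when coded with mode index k.
Options = List[List[Tuple[float, float]]]


def best_modes(options: Options, lam: float) -> List[int]:
    """For a fixed lambda, minimise D + lambda * R independently per macroblock."""
    return [
        min(range(len(opts)), key=lambda k: opts[k][0] + lam * opts[k][1])
        for opts in options
    ]


def total_rate(options: Options, modes: List[int]) -> float:
    return sum(options[i][k][1] for i, k in enumerate(modes))


def search_lambda(options: Options, rate_budget: float,
                  lam_hi: float = 1e6, iters: int = 50) -> Tuple[float, List[int]]:
    """Bisection on lambda; the returned solution always satisfies the rate budget."""
    lam_lo = 0.0
    if total_rate(options, best_modes(options, lam_lo)) <= rate_budget:
        return lam_lo, best_modes(options, lam_lo)
    if total_rate(options, best_modes(options, lam_hi)) > rate_budget:
        raise ValueError("rate budget cannot be met even with a very large lambda")
    for _ in range(iters):
        mid = (lam_lo + lam_hi) / 2
        if total_rate(options, best_modes(options, mid)) <= rate_budget:
            lam_hi = mid  # feasible: try a smaller penalty on rate
        else:
            lam_lo = mid  # infeasible: rate must be penalised more
    return lam_hi, best_modes(options, lam_hi)


options = [[(10.0, 1.0), (4.0, 3.0), (1.0, 6.0)],
           [(8.0, 1.0), (3.0, 4.0), (2.0, 5.0)]]
lam, modes = search_lambda(options, rate_budget=7.0)
print(lam, modes, total_rate(options, modes))
```

In this toy table the unconstrained choice (λ = 0) spends 11 rate units, and raising λ toward 1 moves both macroblocks to the middle mode, which meets the budget of 7 exactly. Only operating points on the lower convex hull of the rate-distortion curve are reachable this way, which is why the sum may fall short of Rs in general.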
As discussed earlier, the distortion and rate of MBi do not directly depend on the modes of all the other macroblocks for most video coding standards. In particular, recall that for a P-frame of H.263, a particular predicted motion vector PMV is first calculated for each macroblock using the three neighboring macroblocks of the particular macroblock, if the neighboring macroblocks exist. The actual motion vector for the macroblock is the sum of the particular PMV and the differentially encoded vector for the particular macroblock. This means that MBi depends directly only on the mode selection of its three neighboring macroblocks. This is discussed in ITU-T Recommendation H.263, Video Coding for Low Bit Rate Communication, February 1998.
The above-mentioned paper by T. Wiegand et al., as well as D. Mukherjee and S. Mitra, “Combined mode selection and macroblock quantization step adaptation for the H.263 video encoder,” ICIP, 1997, have assumed that this dependency is too complex and made the following simplifying assumption to ease the optimization: the rate and distortion of MBi depend only on the mode of at most one other macroblock, typically the left neighboring macroblock. With this assumption, and assuming that the macroblocks are numbered from left to right and top to bottom, the expression (2) simplifies to:
$$\min_{M} \sum_{i=1}^{N} D_i(M_{i-1}, M_i) + \lambda R_i(M_{i-1}, M_i) \qquad (3)$$
The main benefit of using this assumption is that a single dependency relationship leads simply to a special state transition diagram (SD) or trellis, and the Viterbi algorithm can be used to find the optimal solution to (3) by finding the least cost path having a length N through the trellis.
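The trellis search under the single-dependency assumption can be sketched as a small dynamic program. This is a generic Viterbi-style sketch, not the implementation of the cited papers; `viterbi_modes`, `toy_cost`, and the cost values are hypothetical, and `cost(i, prev_mode, mode)` stands for the per-macroblock Lagrangian term of (3).

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

CostFn = Callable[[int, Optional[str], str], float]


def viterbi_modes(num_blocks: int, modes: Sequence[str],
                  cost: CostFn) -> Tuple[List[str], float]:
    """Least-cost path of length num_blocks through the mode trellis."""
    # best[m]: cheapest cost of any path that ends at the current block in mode m.
    best: Dict[str, float] = {m: cost(0, None, m) for m in modes}
    back: List[Dict[str, str]] = []
    for i in range(1, num_blocks):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for m in modes:
            prev = min(modes, key=lambda p: best[p] + cost(i, p, m))
            new_best[m] = best[prev] + cost(i, prev, m)
            pointers[m] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final state.
    last = min(modes, key=lambda m: best[m])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


MODES = ["intra", "inter"]
LOCAL = [{"intra": 1.0, "inter": 4.0},
         {"intra": 3.0, "inter": 2.0},
         {"intra": 1.0, "inter": 4.0}]


def toy_cost(i: int, prev_mode: Optional[str], mode: str) -> float:
    """Hypothetical cost: a local term plus 1.5 when the mode differs from the left neighbour."""
    switch = 0.0 if prev_mode is None or prev_mode == mode else 1.5
    return LOCAL[i][mode] + switch


print(viterbi_modes(3, MODES, toy_cost))
```

In the toy example, choosing the locally cheapest mode for each macroblock would give intra, inter, intra at a total cost of 7.0, whereas the trellis search finds the all-intra path at 5.0. The work is proportional to N·K² cost evaluations rather than K^N.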
While this approach is efficient, it is optimal only for the specific case of the dependency being singular. In video coding, multi-dependency exists between macroblocks, as discussed earlier.
Another existing approach is more general, but is computationally complex because it performs an exhaustive search through all possible combinations of modes. This approach fails to capitalize on the loosely coupled nature of the dependency, making the computation more complex than necessary. For example, the order of complexity of the exhaustive search for a doubly dependent decision, where each mode selection depends on two other selections, is an exponential function of the size of the input: O(K^N).
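The exponential growth of the exhaustive approach is easy to see in a short sketch. The doubly dependent cost function below is hypothetical (each macroblock's cost depends on its own mode and on the modes of its two predecessors); the point is only that every one of the K^N mode sequences is visited.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def exhaustive_search(num_blocks: int, modes: Sequence[str],
                      cost: Callable[[int, Tuple[str, ...]], float]
                      ) -> Tuple[List[str], float, int]:
    """Try every mode sequence; returns (best sequence, its cost, sequences evaluated)."""
    best_seq: Tuple[str, ...] = ()
    best_cost = float("inf")
    evaluated = 0
    for seq in product(modes, repeat=num_blocks):
        evaluated += 1
        c = sum(cost(i, seq) for i in range(num_blocks))
        if c < best_cost:
            best_seq, best_cost = seq, c
    return list(best_seq), best_cost, evaluated


BASE = {"intra": 3.0, "inter": 2.0, "skip": 1.0}


def doubly_dependent_cost(i: int, seq: Tuple[str, ...]) -> float:
    """Hypothetical cost: base cost plus 1 for each of the two predecessors that differs."""
    penalty = sum(1.0 for j in (i - 1, i - 2) if j >= 0 and seq[j] != seq[i])
    return BASE[seq[i]] + penalty


seq, c, evaluated = exhaustive_search(4, list(BASE), doubly_dependent_cost)
print(seq, c, evaluated)
```

With K = 3 modes and N = 4 macroblocks the search already evaluates 3^4 = 81 sequences; a frame with hundreds of macroblocks is far out of reach for this approach.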
Thus, there is a need for an efficient mode optimizer that can handle multiple dependencies. The mode optimizer needs to have significantly less computational complexity than the exhaustive search approach.