The invention relates to a video coding system. In particular, it relates to two standard systems for the compression of video sequences using motion-compensated prediction: ITU-T H.263 and MPEG-4 Very Low Bitrate Video (VLBV).
Two standard systems for the compression of video sequences using motion-compensated prediction are: ITU-T H.263, described in ITU-T SG 15 Experts Group for Very Low Bitrate Visual Telephony, Draft Recommendation H.263, February 1995; and MPEG-4 Very Low Bitrate Video (VLBV), described in MPEG-4 Video Verification Model, Version 5.0, by the MPEG Video Group, Doc. ISO/IEC/JTC1/SC29/WG11, N1469 Maceio, November 1996. There have been some extensions to H.263 since the definition of Version 1. The extended version is often referred to as H.263+. The term H.263 is used here to refer to the un-extended version, i.e. Version 1. Because of the similarity between the algorithms employed in MPEG-4 VLBV and in H.263, the discussion here will focus on algorithms for H.263.
H.263 is an ITU-T recommendation for the compression of video sequences at low bit rates (less than 64 kbits/sec), and is based on an earlier ITU-T recommendation, H.261. A block diagram of the control components of an H.263 encoder is depicted in FIG. 1. (The video coder is not shown.) The main elements of such an encoder are a prediction module 11, a pair of block transformation modules 12 (the transform modules T and T⁻¹), and a pair of quantization modules 13 (the quantizer modules Q and Q⁻¹). In addition, there is a coding control module 14.
The coding control module 14 determines all coding parameters; it acts as the brain of the system. The INTRA/INTER decision data flow from the coding control module signals whether knowledge of previous frames will be exploited in encoding the current frame.
The quantization indication data flow provided by the coding control module determines what are called quantizer parameters to be used for each macroblock. The determination can be at the frame level or at the macroblock level. The invention focuses on the generation of this signal.
The 'video multiplex coder' referred to in FIG. 1 is simply a multiplexer, and does not use 'video in' as one of its inputs. The signals indicated as being provided "to video multiplex coder" comprise the compressed representation of a video signal.
To exploit the temporal correlation between successive frames, the system first performs a motion-compensated prediction if previously reconstructed frames are available and useful, the term 'useful' indicating that using the previous frames (INTER coding) would yield better compression performance than not using them (INTRA coding). If consecutive frames are not sufficiently correlated, INTRA coding may yield better compression performance. The prediction is performed using motion information and the previously reconstructed frames. The motion information (v in FIG. 1) is transmitted from encoder to decoder so that the decoder can perform the same prediction as the encoder. Next, a block transform, the Discrete Cosine Transform (DCT), is applied in the block T to the prediction error (or to the frame itself in the case of no prediction) to exploit the spatial correlation. Finally, the DCT coefficients of the prediction error are quantized and entropy coded in the block Q. The quantizer is the main mechanism for introducing loss to the video sequence to achieve higher compression. The amount of loss (and thus the bit rate) is controlled by the quantizer step size, which in turn is parameterized by a quantizer parameter (QP), having integer values between 1 and 31 and provided by the coding control module 14 to the block Q. QP must be known to the decoder, so it is transmitted as side (ancillary) information (and designated as qz in FIG. 1).
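As a rough illustration of how QP controls the amount of loss, the following sketch applies a plain uniform quantizer to a single coefficient. It is a simplification made for illustration only: it assumes a step size of 2·QP and truncation toward zero, ignores the special handling of INTRA DC coefficients and the exact reconstruction rule of the recommendation, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(coeff: float, qp: int) -> int:
    """Map a transform coefficient to an integer level (illustrative only)."""
    if not 1 <= qp <= 31:
        raise ValueError("QP must be an integer between 1 and 31")
    step = 2 * qp                 # assumed step size, proportional to QP
    return int(coeff / step)      # truncation toward zero


def dequantize(level: int, qp: int) -> float:
    """Reconstruct an approximate coefficient from its level (illustrative only)."""
    return float(level * 2 * qp)


def reconstruction_error(coeff: float, qp: int) -> float:
    """Absolute loss introduced by quantizing and reconstructing one coefficient."""
    return abs(coeff - dequantize(quantize(coeff, qp), qp))
```

Under these assumptions a coefficient of 100 is reconstructed as 96 with QP = 4 but as 62 with QP = 31, which is the sense in which a larger QP trades quality for bits.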
A fundamental layer of H.263 is the macroblock layer. A macroblock (MB) is the basic building block of H.263 in the sense that the main elements of encoding (prediction, block transformation and quantization) can be performed by processing one macroblock at a time. A macroblock consists of the representations of a 16×16 luminance block, two 8×8 chrominance blocks, and macroblock-level coding parameters (such as macroblock type, etc.); some macroblock-level coding parameters are optional. Macroblocks are transmitted in raster scan order. The macroblock-level coding parameters are encoded at the macroblock level only when there is a need to do so, since they are costly in terms of bits. (There is a frame-level encoding of the coding parameters such as the quantization parameter QP. When macroblock-level encoding is not performed, these frame level values are used.)
H.263 provides limited macroblock-level control of QP; there is an optional 2-bit DQUANT field that encodes the difference between the QP of the current macroblock and the QP of the previously encoded macroblock. (See Section 5.3.6 of H.263+ (February 1998).) Due to the bit-field restriction described in section 5.3.6 of H.263, the macroblock QP can be varied by at most ±2 each time it is changed. In a scenario where QP variation is used for rate control, such a restriction on the range of variation of QP is quite sufficient. However, there may be other reasons to vary QP besides rate control, such as region-of-interest coding, which is a technique of allocating more bits (and thus introducing less loss) to an automatically-detected or user-defined region of interest, such as a human face.
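The effect of the ±2 restriction can be sketched as follows. The code checks whether a per-macroblock QP sequence respects the restriction and shows a naive greedy tracker that moves toward each suggested value as far as the restriction allows. Both function names are hypothetical, the first QP is assumed to be set freely at the frame level, and the greedy tracker is only a baseline for comparison, not the method of the invention.

```python
def is_realizable_h263(qps, max_step=2):
    """True if all QPs lie in 1..31 and consecutive QPs differ by at most max_step."""
    if any(not 1 <= q <= 31 for q in qps):
        return False
    return all(abs(b - a) <= max_step for a, b in zip(qps, qps[1:]))


def greedy_follow(suggested, max_step=2):
    """Naive baseline: step toward each suggested QP, clipped to +/- max_step."""
    out = []
    prev = suggested[0]  # assumption: the first QP is chosen freely
    for s in suggested:
        delta = max(-max_step, min(max_step, s - prev))
        prev += delta
        out.append(prev)
    return out
```

For a suggested sequence such as [10, 10, 4, 4, 10], which jumps by 6 between neighbouring macroblocks, `is_realizable_h263` returns False, while the greedy tracker yields the realizable but lagging sequence [10, 10, 8, 6, 8].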
The Problem Solved by the Invention
Consider a scenario for which macroblock-level QP variation is needed for a purpose other than rate control, such as region-of-interest (ROI) coding. In such a scenario, limited macroblock-level control of QP poses a significant restriction. An arbitrary QP distribution either suggested by a region-of-interest analyzer or input by a user cannot be fully realized by an H.263 (or MPEG-4 VLBV) encoder. Thus, such an encoder needs to choose an approximate realization that is as close to the originally suggested distribution as possible, in some defined sense. Given a definition of optimality (or equivalently a measure of cost), the invention provides a method to optimally choose the realization, i.e. it provides a method to minimize the total cost incurred by constrained realization, the total cost being defined so as to be lower the closer the QP distribution is to the suggested distribution, but higher the more bits that must be used.
As a related problem, in H.263+, defined in ITU-T SG 15 Experts Group for Very Low Bitrate Visual Telephony, Draft Recommendation H.263 Version 2, January 1998, an exact representation of an arbitrary distribution may be very costly in terms of a bit budget, and hence may not be the most desirable solution. Quantization can be varied on a macroblock basis, and no finer variation is possible. By an exact representation of an arbitrary distribution is meant an arbitrary selection of macroblock QPs. Consider a vector QP, formed by assigning a separate QP for each macroblock. Representing some such vectors costs much less (in bits) than representing others. If a QP is chosen for each macroblock without considering the QP values of the neighboring macroblocks, it is likely that a prohibitively high number of bits will be spent in representing the variation of QP from block to block.
An encoder needs to find the best trade-off between following the original suggestion (for a QP distribution) by a region of interest analyzer on the one hand, and minimizing the total bits spent for QP variation on the other. For this related problem, the invention provides a method for finding the best trade-off when the cost function is defined to represent the trade-off.
How the Problem was Solved Earlier
According to the prior art, for MPEG-4 video encoders employing ROI coding, QP control is achieved through the separation of each frame into two video-object planes (VOPs), a foreground VOP and a background VOP, as described in Low Bit-Rate Coding of Image Sequences Using Adaptive Regions of Interest, by N. Doulamis, A. Doulamis, D. Kalogeras, and S. Kollias, IEEE Trans. on CAS for Video Technology, pp. 928-934, vol. 8, no. 8, December 1998, and as also described in Video Segmentation for Content-Based Coding, by T. Meier, and N. Ngan, IEEE Trans. on CAS for Video Technology, pp. 1190-1203, vol. 9, no. 8, December 1999. Such a separation overcomes the ±2 variation restriction at foreground/background separation boundaries.
The prior art solution has two shortcomings. First, there is a bit overhead for encoding VOP segmentation, which might become prohibitive if the foreground is not compact. It is difficult to take these bits into account when deciding on the foreground/background separation. Thus, optimization is difficult, if not impossible. Second, the prior art solution does not directly provide graded QP control within each VOP (i.e. variable within each VOP). The value of QP must be varied further within the background and/or within the foreground to achieve graded control within each VOP. There is still a restriction on how QP can be varied, and optimization becomes even more impractical than the original problem of delicate QP control within a single VOP comprising the whole frame.
In H.263, VOP structure is not supported, so even this solution is not available. Hence, the ±2 variation restriction applies to all macroblocks in H.263. There is thus no technique in H.263 for frame-level optimization of macroblock QP selection for ROI coding (or any other scenario requiring delicate QP control).
H.263+ provides a bit-costly mechanism for precise QP control; it is possible to represent an arbitrary QP for a macroblock by spending (using) 6 bits. For very low bit rate applications (which are the main focus of H.263 and H.263+), representing an arbitrary QP distribution can easily become prohibitive in terms of a bit budget, so a frame-level optimization is especially advantageous. However, as in H.263, there is no known technique in H.263+ for frame-level optimization of macroblock QP selection for ROI coding (or any other scenario requiring delicate QP control).
What is needed is a mechanism for providing precise QP control, i.e. frame-level control, in a way that is not bit-costly so as to be of use in low bit-rate applications.
Accordingly, a first aspect of the invention provides a method for selecting a sequence of quantization parameter values in a video encoder, the video encoder being arranged to encode a video frame as a sequence of n macroblocks and to assign a quantization parameter value for each macroblock of the video frame, the method characterized in that quantization parameter values assigned to at least a sub-set of said sequence of n macroblocks are optimized in such a way as to minimize a cost associated with their encoding.
In accord with the first aspect of the invention, said sub-set of said sequence of n macroblocks may comprise all of said n macroblocks.
Also in accord with the first aspect of the invention, optimization of said quantization parameter values assigned to at least a sub-set of said sequence of n macroblocks may be performed using a Viterbi search algorithm.
Also in accord with the first aspect of the invention, optimization of said quantization parameter values assigned to at least a sub-set of said sequence of n macroblocks may be performed by comparing a cost associated with encoding a suggested sequence of quantization parameter values with a cost of encoding a candidate sequence of quantization parameter values.
Still also in accord with the first aspect of the invention, in applications where the method is for providing frame-level control of a quantizer parameter QP of a video encoder, the video encoder having one quantizer parameter for each macroblock of a frame, a frame consisting of a number n of macroblocks, the method providing an optimizing quantizer parameter sequence Q* from among a set of all possible quantizer parameter sequences having a quantizer parameter sequence Q as an arbitrary element serving as a candidate quantizer parameter sequence, the optimizing quantizer parameter sequence Q* minimizing a cost function C(Q,S) indicating a cost of using the candidate quantizer parameter sequence Q in place of a suggested quantizer parameter sequence S, the method may include: a step of receiving a suggested quantizer parameter sequence S; a step of defining a cost function C(Q,S) having a component D(Q,S) representing a discrepancy between the candidate sequence Q and the suggested quantizer parameter sequence S as measured according to a predetermined criterion, and having a component R(Q) proportional to the number of bits spent in representing the candidate sequence Q; and a step of determining the optimizing quantizer parameter sequence Q* as that which minimizes the cost function C(Q,S); thereby providing an optimizing quantizer parameter sequence Q* that approximates the suggested quantizer parameter sequence S in a bit-efficient way.
Further, the suggested quantizer parameter sequence S may be provided by a region-of-interest analyzer.
Also further, the step of determining the optimizing quantizer parameter sequence Q* as that which minimizes the cost function C(Q,S) may comprise the step of computing, for a value k falling in the range of quantizer parameter values spanned by the quantizer parameters of the suggested quantizer parameter sequence S, an optimum constrained cost function $C^*_k(S_t)$, the optimum constrained cost function being a function of a partial suggested quantizer parameter sequence $S_t$ of a number t of quantizer parameters and indicating the lowest cost achievable by using any possible partial candidate quantizer parameter sequence $Q_{t,k}$ having as a last, t-th element a quantizer parameter with the value k.
Still also further, the step of determining the optimizing quantizer parameter sequence Q* as that which minimizes the cost function C(Q,S) may comprise the substeps of: determining from the suggested quantizer parameter sequence S a range of quantizer parameter values from a predetermined minimum to a predetermined maximum; setting a sequence length t equal to one; setting a value k equal to the predetermined minimum; computing an optimum constrained cost function $C^*_k(S_t)$, the optimum constrained cost function being a function of a partial suggested quantizer parameter sequence $S_t$ of the number t of quantizer parameters, the computing of the optimum constrained cost function for t greater than one being based on a recursion relation giving $C^*_k(S_t)$ in terms of $C^*_k(S_{t-1})$ and involving a term $r(k,j)+C^*_j(S_{t-1})$, where r(k,j) is an element of the cost function component R(Q) proportional to the number of bits spent in representing Q with the variable j having a value in the range determined from the suggested quantizer parameter sequence S; storing for the current t and k the value of j that minimizes the term $r(k,j)+C^*_j(S_{t-1})$; computing the optimum constrained cost function $C^*_k(S_t)$ and storing for the current t and k the value of j that minimizes the term $r(k,j)+C^*_j(S_{t-1})$ for successive values of k each greater by one than the previous until k is equal to the predetermined maximum and for successive values of t each greater by one than the previous value of t until t is equal to the number of macroblocks in a frame; determining the optimum cost function $C^*(S_{t=n})$ based on a comparison of $C^*_k(S_{t=n})$ for all k values in the range of quantizer parameter values in the suggested sequence S; and constructing the optimizing sequence Q* by a process of first setting the quantizer parameter for the last macroblock equal to the value of k that minimizes $C^*_k(S_{t=n})$, and then tracing backward, assigning to each previous quantizer parameter in the optimizing sequence being constructed the value of j stored for the next macroblock.
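The substeps above can be sketched as a small dynamic-programming (Viterbi-style) search. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `optimize_qp`, `squared_error` and `dquant_limit` are hypothetical, the first macroblock is assumed to incur no transition cost, and the two example cost elements are passed in as plain functions.

```python
import math


def squared_error(q, s):
    """Example discrepancy element d(q, s): squared difference."""
    return (q - s) ** 2


def dquant_limit(q, q_prev):
    """Example rate element r(q, q_prev): free within +/-2, forbidden otherwise."""
    return 0 if abs(q - q_prev) <= 2 else math.inf


def optimize_qp(suggested, d=squared_error, r=dquant_limit):
    """Return (Q*, cost) minimizing the sum of d(q_i, s_i) plus r(q_i, q_{i-1})."""
    if not suggested:
        return [], 0
    # Candidate values k span the range of the suggested sequence.
    ks = range(min(suggested), max(suggested) + 1)
    # cost[k] holds the best cost of any partial sequence ending in value k.
    cost = {k: d(k, suggested[0]) for k in ks}  # assumption: no cost for t = 1
    back = []  # back[t - 1][k] is the best predecessor j for macroblock t
    for s in suggested[1:]:
        new_cost, choice = {}, {}
        for k in ks:
            best_j = min(ks, key=lambda j: r(k, j) + cost[j])
            new_cost[k] = d(k, s) + r(k, best_j) + cost[best_j]
            choice[k] = best_j
        cost = new_cost
        back.append(choice)
    # Pick the best final value, then trace the stored predecessors backward.
    k = min(ks, key=lambda v: cost[v])
    total = cost[k]
    path = [k]
    for choice in reversed(back):
        k = choice[k]
        path.append(k)
    path.reverse()
    return path, total
```

With these example cost elements, `optimize_qp([10, 10, 4, 4, 10])` returns the sequence [10, 8, 6, 6, 8] with a cost of 16. Unlike a tracker that only reacts to the current macroblock, the search starts lowering QP one macroblock early because doing so reduces the total discrepancy over the frame.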
Still even also further, the cost function component D(Q,S) may be of the form given by the equation,

$$D(Q,S) = \sum_{i=1}^{n} d(q_i, s_i),$$

where $d(q_i, s_i)$ is a memoryless macroblock-level nonnegative cost component having as arguments an element $q_i$ of the candidate quantizer parameter sequence and a corresponding element $s_i$ of the suggested quantizer parameter sequence; the cost function component R(Q) may then be of the form given by the equation,

$$R(Q) = \sum_{i=1}^{n} r(q_i, q_{i-1}),$$

where $r(q_i, q_{i-1})$ is a memoryless macroblock-level nonnegative cost component expressing a relation between the number of bits used to encode the quantizer parameter for macroblock i and the number of bits used to encode the quantizer parameter for macroblock i−1; an element $r(q_i, q_{i-1})$ of the cost function component R(Q) may then be given by the equation,

$$r(q_i, q_{i-1}) = \begin{cases} 0, & \text{for } |q_i - q_{i-1}| \le 2 \\ +\infty, & \text{otherwise} \end{cases},$$

and an element $d(q_i, s_i)$ of the cost function component D(Q,S) may be given by the equation

$$d(q_i, s_i) = (q_i - s_i)^2;$$

and in addition, an element $r(q_i, q_{i-1})$ of the cost function component R(Q) may then be given by the equation,

$$r(q_i, q_{i-1}) = \begin{cases} \lambda N_{q_i, q_{i-1}}, & \text{if } q_i \text{ is representable} \\ +\infty, & \text{otherwise} \end{cases},$$

in which λ is a predetermined nonnegative value useful as a rate distortion trade-off parameter and in which $N_{q_i, q_{i-1}}$ is the number of bits for representing $q_i$ given $q_{i-1}$; and still also, an element $d(q_i, s_i)$ of the cost function component D(Q,S) may be given by the equation

$$d(q_i, s_i) = \lVert \mathrm{mse}(q_i) - \mathrm{mse}(s_i) \rVert,$$

where mse(·) denotes the mean square error between the original macroblock content and macroblock content reconstructed using the indicated quantizer parameter sequence element $q_i$ or $s_i$.
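The cost elements above can be written down directly. The sketch below is illustrative only and rests on assumptions that are not taken from the recommendations: the function names are hypothetical, and `bits_toy_model` is a made-up bit-count table (0 bits for no change, 2 bits for a change of at most 2, 6 bits for any other change) standing in for the number of bits needed to represent one QP given the previous one. The mse-based discrepancy is omitted because it requires the macroblock pixel data.

```python
import math


def r_limited(q, q_prev):
    """Rate element with a hard limit: 0 within +/-2, infinite otherwise."""
    return 0.0 if abs(q - q_prev) <= 2 else math.inf


def d_squared(q, s):
    """Discrepancy element: squared difference between candidate and suggestion."""
    return float((q - s) ** 2)


def bits_toy_model(q, q_prev):
    """Hypothetical bit count for coding q given q_prev (not a normative table)."""
    diff = abs(q - q_prev)
    if diff == 0:
        return 0
    return 2 if diff <= 2 else 6


def make_r_weighted(lam):
    """Build a rate element lam * N(q, q_prev); every q is representable here."""
    def r(q, q_prev):
        return lam * bits_toy_model(q, q_prev)
    return r


def total_cost(candidate, suggested, d, r):
    """C(Q, S) = D(Q, S) + R(Q); the first macroblock is assumed to cost no bits."""
    discrepancy = sum(d(q, s) for q, s in zip(candidate, suggested))
    rate = sum(r(q, q_prev) for q_prev, q in zip(candidate, candidate[1:]))
    return discrepancy + rate
```

With the hard limit, following the suggestion [10, 10, 4, 4, 10] exactly has infinite cost because it violates the ±2 restriction, whereas under the weighted model with λ = 1 the same sequence costs 12 (two changes of 6 bits each and no discrepancy). Raising λ makes QP changes more expensive and pushes the optimum toward smoother sequences.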
In accord with a second aspect of the invention, an apparatus is provided for performing the method according to the first aspect of the invention.
Thus, the invention is based on a frame-level algorithm for optimal selection of macroblock QPs. The optimizing criterion can be externally defined, which allows customizing the algorithm to different modes of operation. Although the algorithm employs a dynamic programming approach, the computational complexity of the core algorithm is reasonably low.
The invention is, to the inventor's knowledge, the first method for frame-level optimization of macroblock QP selection in either the H.263(+) or MPEG-4 framework. In a delicate QP control scenario, the optimization of the invention yields better utilization of a constrained bit budget than the prior art, which provides suboptimal delicate QP control. For example, an H.263+ encoder provides the capability of fully controlling QP variation, but at a high cost (in bits). Furthermore, an H.263 encoder can only suboptimally approximate a QP variation suggested by an ROI coding engine. Good utilization of a bit budget is especially advantageous in very low bit rate video compression applications.
The computational complexity of the core algorithm is reasonably low. If the computational cost of calculating the values of the elements of the cost function, e.g. the elements $r(q_i, q_{i-1})$ and $d(q_i, s_i)$ in some embodiments, is not taken into account (because the elements of the cost function are usually calculated outside the core algorithm), then the core algorithm typically executes approximately one addition and one comparison operation (or less) per pixel. (There are 256 pixels in a macroblock.) Such a number of operations per pixel is very small overhead for a high performance video encoder.
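As a back-of-the-envelope check of this operation count, consider the H.263 case. The figures below are an illustration under two assumptions: the search covers the full range of 31 QP values, and for each value only the 5 predecessor values within ±2 need to be examined, since all other transitions have infinite cost.

$$31 \times 5 = 155 \ \text{add-and-compare steps per macroblock}, \qquad \frac{155}{256} \approx 0.61 \ \text{steps per pixel}.$$

When the suggested sequence spans a narrower range of QP values, fewer candidate values are searched and the count is lower still.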