In video encoding, it is desirable to determine how best to accurately estimate the rate-distortion (RD) curve of a video frame. When the rate-distortion characteristics of a frame are known, one can optimally allocate the limited coding resources, usually the coding bit rate, to different frames such that an optimized overall coding performance is achieved. Most often, the problem is formulated as rate-distortion optimized frame-level bit rate allocation, where the objective is to minimize either the average or the maximum mean squared error (MSE) source coding distortion, subject to a specific total bit rate and buffer constraint. Hence, whether or not the rate-distortion characteristics of a frame can be accurately estimated will critically affect the resultant overall rate control performance.
In practice, existing video coding standards specify a finite number of quantization scales for encoding. Effective rate control can be carried out knowing the resultant rate-distortion data of a frame after applying each legitimate quantization scale. For convenience, in our discussion, it is presumed that the prediction residue data for transform coding is already available. The problem now is to calculate the R-Q and D-Q data for all the valid Q's, where “R-Q” denotes the resultant coding bits with a certain Q, “D-Q” denotes the resultant coding distortion with a certain Q, and “Q” denotes the quantization scale, i.e., quantization step size. Note that there is a one-to-one mapping between Q and the quantization parameter (denoted by QP) defined in video coding standards and recommendations. For example, in the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”), QP ranges from 0 to 51, and each QP corresponds to a certain quantization step size or scale Q. To exactly calculate the rate-distortion data, one has to exhaustively encode the frame with all the Q's, using brute force. Although exhaustive calculation gives the highest accuracy, it also incurs prohibitive computational complexity and, thus, in practice, various rate-distortion models have been proposed, targeting accurate rate-distortion data estimation with low or reduced complexity.
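The QP-to-Q mapping mentioned above can be illustrated with a minimal sketch. It assumes the commonly tabulated MPEG-4 AVC step sizes, in which the step size for QP 0 to 5 takes six base values and doubles for every increase of 6 in QP. The names `qp_to_qstep` and `ALL_QSTEPS` are hypothetical and are introduced only for this illustration.

```python
# Base quantization step sizes for QP 0..5 (assumed from the commonly
# tabulated MPEG-4 AVC values); the step size doubles every 6 QP.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qp_to_qstep(qp: int) -> float:
    """Map a quantization parameter QP (0..51) to a step size Q."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return _BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# The full set of valid Q's over which R-Q and D-Q data would be needed.
ALL_QSTEPS = [qp_to_qstep(qp) for qp in range(52)]
```

Because the mapping is strictly increasing, each QP identifies exactly one Q, so a brute-force evaluation would have to encode the frame once for each of the 52 entries of `ALL_QSTEPS`.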
Most existing rate-distortion models are analytical models. In these analytical models, R or D is represented as an explicit function of the quantization scale Q and the variance of the residue signal, σ².
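To make the form of such a model concrete, the following sketch uses a textbook approximation rather than any specific prior art model: distortion is taken as Q²/12 (uniform quantization noise) capped at the variance, and rate as the Gaussian rate-distortion bound in bits per sample. The function names are hypothetical, and the closed forms are assumptions chosen only to show that R and D follow directly from Q and σ².

```python
import math


def residue_variance(residue):
    """Variance of a spatial-domain residue signal (a flat list of samples)."""
    if not residue:
        raise ValueError("residue must not be empty")
    mean = sum(residue) / len(residue)
    return sum((x - mean) ** 2 for x in residue) / len(residue)


def analytic_distortion(q, variance):
    """Assumed model: D = Q^2 / 12, saturating at the signal variance."""
    if q <= 0:
        raise ValueError("q must be positive")
    return min(q * q / 12.0, variance)


def analytic_rate(q, variance):
    """Assumed model: R = 0.5 * log2(variance / D) bits per sample."""
    d = analytic_distortion(q, variance)
    if d <= 0.0:
        return 0.0  # a zero-variance residue needs no bits in this model
    return max(0.0, 0.5 * math.log2(variance / d))
```

Only the variance has to be measured from the frame; every R and D value then comes from evaluating a closed-form expression.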
We know that, in principle, the resultant rate and distortion of coding a frame are related not only to the quantization scale but also to the characteristics of the source video signal itself. However, the characteristics of source video signals are non-stationary. Hence, in analytic models, the variance of the prediction residue signal is commonly adopted to account for non-stationary video signals. Regarding distortion modeling, while the distortion estimate may bear the simple form of a unified function with respect to Q and σ² in one prior art distortion estimating approach, in another approach D can be more accurately estimated via a piecewise function which gives a different D-Q or D-σ² relationship according to the relative magnitude of Q with respect to σ. The most notable advantage of analytical rate-distortion modeling is its low computation complexity. One only needs to first calculate σ², and then can directly estimate R or D according to the prescribed function. The variance calculation can be conducted directly on the spatial domain residue signal, requiring no transformation or quantization operations, and thus incurs very low computation complexity. However, the disadvantage of D-Q analytic modeling is its compromised estimation accuracy, which is mostly because the variance alone is inadequate to fully account for the impact of video signal non-stationarity in rate-distortion estimation. This shortcoming is ameliorated in the more recent ρ-domain analytic RD models, where, instead of the traditional R-Q and D-Q models, the new model is based on the percentage of zero quantized coefficients, denoted by ρ, which bears a one-to-one mapping with Q. Note that ρ is an outcome of applying Q to the transformed residue signal, and thus reflects not only the information of Q but also the information of the non-stationary source video signal.
The ρ-domain models yield better modeling performance than the existing Q-based models, at the price of slightly increased computation complexity due to the additional Discrete Cosine Transform (DCT).
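As an illustration of how ρ depends on both Q and the signal, the sketch below counts the fraction of transform coefficients that quantize to zero for a given step size. It assumes the coefficients have already been produced by the transform, and it assumes a plain uniform quantizer with a rounding offset of 0.5; the function name `zero_fraction` and the `rounding_offset` parameter are hypothetical.

```python
def zero_fraction(coeffs, q, rounding_offset=0.5):
    """Fraction of coefficients whose quantized level is zero (rho).

    Assumed quantizer: level = floor(|c| / q + rounding_offset).
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if not coeffs:
        raise ValueError("coeffs must not be empty")
    zeros = sum(1 for c in coeffs if int(abs(c) / q + rounding_offset) == 0)
    return zeros / len(coeffs)
```

For a fixed set of coefficients, the returned value never decreases as `q` grows, and two frames with the same variance but different coefficient distributions can yield different values at the same `q`, which is the extra source information that a variance-only model cannot capture.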
The analytic models assume a fixed explicit relationship between R or D and Q (or ρ). However, in practice, the actual rate-distortion data of a frame renders an operational rate-distortion curve which, more often than not, is not smooth or piecewise smooth at all. This mismatch may greatly compromise the estimation accuracy of analytic models. To ensure high accuracy while still reducing the complexity, an empirical approach was proposed, where exhaustive encoding is only conducted for a small set of selected Q's, and the rate-distortion data of the rest of the Q's are interpolated from the available ones. Although the modeling accuracy of the empirical model is better than that of analytic models, it requires multiple additional encoding operations, which still impose a significant additional computation load and may not always be acceptable in real-time video streaming systems.
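The empirical approach can be sketched as follows, assuming linear interpolation between the encoded points; the source does not specify the interpolation method, so this choice is an assumption. The function name `interpolate_rd` and the `samples` layout are hypothetical, and the sample values in the usage note are made-up numbers for illustration only.

```python
from bisect import bisect_right


def interpolate_rd(samples, q):
    """Estimate (rate, distortion) at step size q.

    samples maps each actually encoded Q to its measured (rate, distortion).
    Values outside the encoded range are clamped to the nearest sample.
    """
    if not samples:
        raise ValueError("at least one encoded sample is required")
    qs = sorted(samples)
    if q <= qs[0]:
        return samples[qs[0]]
    if q >= qs[-1]:
        return samples[qs[-1]]
    i = bisect_right(qs, q)          # qs[i - 1] <= q < qs[i]
    q0, q1 = qs[i - 1], qs[i]
    t = (q - q0) / (q1 - q0)         # position of q between the two samples
    r0, d0 = samples[q0]
    r1, d1 = samples[q1]
    return (r0 + t * (r1 - r0), d0 + t * (d1 - d0))
```

With made-up samples `{2.0: (1000.0, 1.0), 4.0: (600.0, 3.0), 8.0: (200.0, 11.0)}`, a query at Q = 3.0 is answered from the two neighboring encodes without a further encoding pass. The cost of the approach is the number of entries in `samples`, each of which requires a full encode of the frame.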
It is also worth noting that, in terms of R modeling, the ρ-domain model already achieves high estimation accuracy, and the scope for further improvement is very limited. However, in terms of D modeling, neither the ρ-domain model nor the existing Q-based models can render as good an estimation performance as that of the ρ-domain R model.