Rate control is a critical component of a modern video encoder. It ensures that the generated compressed bitstream (a) achieves the bit rate target, (b) satisfies maximum average bit rate constraints, and (c) satisfies hypothetical reference decoder (buffering delay) constraints, among others. An optional, but highly desirable, objective is to optimize the video quality of the compressed video bitstream.
Satisfying constraint (a) ensures that the compressed video bitstream will fit the communication pipe or storage space. For example, a wireless network may allow only up to 768 kbps for video transmission, whereas a Blu-ray Disc may allow up to 40 Mbps of video bandwidth for 2D applications and 60 Mbps for 3D applications. In addition, for archival applications or applications where bandwidth can be extremely high (such as reading from a hard drive), one may specify only the total size of the final bitstream file.
Constraint (b) also deserves attention, since playback devices can store and decode only a limited number of bits per second. During encoding, the average bit rate over the entire compressed bitstream may meet the bit rate target while the local average bit rate, for example over a window of a few seconds, exceeds it. This often happens because difficult-to-code segments require more bits to maintain consistent or better video quality. However, if these bitstream “spikes” are large enough, they can create problems for resource-constrained decoders, such as overflow of internal buffers or the inability to decode the bitstream in time to display the frames in the correct order and with proper timing.
Last, constraint (c) is closely related to constraint (b) and can be thought of as a more rigorous set of requirements that a bitstream has to satisfy. In short, the compressed bitstream has to be coded such that, if transmitted at the target bit rate, it never causes a decoder buffer overflow or underflow; as a result, the decoded video never stalls or stops during playback.
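Constraints (b) and (c) can be illustrated with a short Python sketch that checks a list of per-frame sizes (in bits). The function names, the sliding-window formulation of the maximum average bit rate, and the constant-bit-rate buffer model below are illustrative assumptions. The buffer model is deliberately simplified (frames are removed instantaneously at fixed intervals) and does not reproduce the hypothetical reference decoder equations of any standard.

```python
def check_max_average_rate(frame_bits, fps, max_rate_bps, window_s):
    """Constraint (b), sliding-window formulation (an assumption).

    Returns the index of the last frame of the first window whose average
    bit rate exceeds max_rate_bps, or None if every full window complies.
    """
    window = max(1, round(window_s * fps))  # window length in frames
    total = 0
    for i, bits in enumerate(frame_bits):
        total += bits
        if i >= window:
            total -= frame_bits[i - window]  # drop the frame leaving the window
        # Only judge windows that are completely filled.
        if i >= window - 1 and total * fps / window > max_rate_bps:
            return i
    return None


def check_leaky_bucket(frame_bits, fps, rate_bps, buffer_bits, initial_delay_s):
    """Constraint (c), simplified constant-bit-rate decoder buffer model.

    Returns (frame_index, "underflow" | "overflow") for the first violation,
    or None if the buffer never underflows or overflows.
    """
    # Bits received before the first frame is removed for decoding.
    fullness = rate_bps * initial_delay_s
    if fullness > buffer_bits:
        return (0, "overflow")
    per_frame = rate_bps / fps  # bits delivered between two decode instants
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            # Frame not fully received when it is due: playback would stall.
            return (i, "underflow")
        fullness -= bits       # decoder removes the whole frame at once
        fullness += per_frame  # channel keeps delivering at rate_bps
        if fullness > buffer_bits:
            return (i, "overflow")
    return None
```

For example, at 30 frames/s and 768 kbps, a single 400,000-bit frame following a run of 25,600-bit frames underflows a buffer primed with 0.5 s of data, because only 384,000 bits have arrived when that frame is due for decoding. Conversely, a long run of very small frames lets delivered bits accumulate until the buffer overflows.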
Rate control is also tasked with ensuring the best possible video quality given the above bit rate and buffering constraints.
A video sequence may be coded in a single coding pass. This may be due to computational or time constraints, or to the nature of the application: for example, when content is streamed live, the fixed delay from capture to delivery to the consumer may allow only a small lookahead into the future. If these constraints are relaxed, one may wish to use more than one coding pass to compress the video sequence. In such a case, rate control benefits from information drawn from previous coding passes. This information may include, for example, a measure of complexity, such as the number of header and texture bits generated for a given frame type and quantization parameter (QP), or the temporal correlation of frames in the image sequence, among others. Such information can both improve bit rate accuracy and help satisfy the bit rate and buffering constraints. Header bits include bits used to code motion information, coding modes, block types, parameter sets, and also information that is not essential to the decoding process, such as video usability information. Texture bits include bits used to code the transform coefficients of the inter or intra prediction residuals. Texture bits usually form the bulk of the coded bitstream, especially at high bit rates.
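One way first-pass statistics could be reused in a second pass is sketched below. The sketch assumes, as a simplification, that texture bits are inversely proportional to the quantizer step size, which in H.264/AVC doubles for every increase of 6 in QP, and that header bits do not change with QP. Both function names are hypothetical, and real encoders typically use richer, adaptively updated rate models.

```python
import math


def predict_bits(header_bits, texture_bits, qp_first, qp_new):
    """Predict a frame's size at qp_new from statistics gathered at qp_first.

    Assumed model: texture bits scale inversely with the quantizer step size
    (which doubles every 6 QP), and header bits are independent of QP.
    """
    return header_bits + texture_bits * 2.0 ** (-(qp_new - qp_first) / 6.0)


def qp_for_target(header_bits, texture_bits, qp_first, target_bits,
                  qp_min=0, qp_max=51):
    """Invert the assumed model and round the resulting QP to an integer."""
    if target_bits <= header_bits:
        # No budget left for texture: quantize as coarsely as allowed.
        return qp_max
    qp = qp_first + 6.0 * math.log2(texture_bits / (target_bits - header_bits))
    return int(min(qp_max, max(qp_min, round(qp))))
```

For example, a frame that produced 2,000 header bits and 48,000 texture bits at QP 30 in the first pass would be assigned QP 36 to meet a 26,000-bit target, since halving the texture bits under this model requires doubling the step size.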
Furthermore, the information drawn from previous coding passes can greatly improve the quality of the compressed video bitstream, because coding statistics are available for the entire video sequence. Such knowledge enables one to spend bits efficiently in the segments of the video sequence where they will do the most good (usually measured in terms of rate-distortion performance). It is well known that spending more bits in difficult-to-code areas (high-motion scenes, scenes with lots of texture, fades, scene changes, flashes, etc.) than in, say, static scenes improves overall quality, both subjectively and objectively. In general, the more coding passes, the better the video quality that can be achieved for a fixed bit rate budget. However, there is always a point of diminishing returns, beyond which the additional coding gain is trivial compared to the added computational expense.
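A minimal sketch of complexity-driven bit allocation across a sequence follows, assuming per-frame complexity values are available from a previous pass (for example, the bits each frame consumed at a fixed QP). The function name and the exponent `alpha` are hypothetical: `alpha = 1` allocates bits in direct proportion to complexity, while values below 1 temper the allocation so that static scenes are not starved of bits.

```python
def allocate_bits(complexities, total_bits, alpha=0.6):
    """Split total_bits across frames in proportion to complexity ** alpha.

    alpha is a hypothetical tuning exponent; the returned budgets are floats
    that always sum to total_bits (up to floating-point rounding).
    """
    if not complexities:
        return []
    # Guard against zero complexity so that every frame gets a positive weight.
    weights = [max(c, 1e-9) ** alpha for c in complexities]
    scale = total_bits / sum(weights)
    return [w * scale for w in weights]
```

With `alpha = 1.0`, complexities `[1, 1, 4, 4]` and a 1,000-bit budget yield budgets of 100, 100, 400, and 400 bits. In practice, each per-frame budget would then be rounded and passed to a QP selection step, and the allocation would still have to be checked against the maximum average bit rate and buffering constraints.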
The traditional rate control paradigm has been applied to the coding of single-layer video bitstreams. A single-layer bitstream corresponding to a single frame must be parsed and decoded in its entirety in order to reconstruct the frame. Such bitstreams are created when conforming to Annex H of the H.264/MPEG-4 Part 10 AVC video coding standard. See reference 1, incorporated herein by reference in its entirety.