Prior art video encoders are designed to operate in one of two distinct modes: variable bit rate (VBR) mode or constant bit rate (CBR) mode. In VBR mode, video quality stays constant while bit rate varies. Conversely, in CBR mode, video quality varies while bit rate remains constant. VBR mode may thus be thought of as a constant quality mode, and CBR mode may thus be thought of as a variable quality mode.
In at least some implementations, VBR operation has been achieved in prior art video encoders by setting quantization parameters to a constant level that results in effectively constant perceived video quality. Prior art VBR encoding also incorporates a buffering stage that temporarily stores bits before transmitting them over a communications channel. The incorporation of a video buffer in VBR mode enables the encoder to schedule the delivery time of video bits to decoders/receivers in a specified manner. In some implementations, VBR operation can be thought of as VBR with a cap (capped VBR). In capped VBR mode, bit rate is allowed to vary but is not allowed to exceed a predetermined maximum. Capped VBR behavior can be achieved by temporarily buffering bits and then later transmitting them at the maximum bit rate, thereby limiting the maximum bit rate without impacting video quality, though at the price of increased end-to-end delay.
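The capped VBR buffering described above can be illustrated with a minimal sketch. It is not taken from any particular prior art encoder: the function name `capped_vbr_drain`, the notion of a fixed "tick", and the units are all assumptions made for illustration. Bits produced in each tick are deposited in a buffer, and at most `cap_per_tick` bits are removed per tick.

```python
def capped_vbr_drain(bits_per_tick, cap_per_tick):
    """Toy model of capped VBR (hypothetical names and units).

    bits_per_tick: bits the encoder produces in each tick.
    cap_per_tick:  maximum bits that may leave the buffer per tick.
    Returns (bits sent per tick, buffer occupancy after each tick).
    """
    if cap_per_tick <= 0:
        raise ValueError("cap_per_tick must be positive")

    backlog = 0          # bits waiting in the video buffer
    sent = []            # bits transmitted in each tick
    occupancy = []       # backlog remaining after each tick
    tick = 0

    # Keep running after the input ends until the buffer is empty.
    while tick < len(bits_per_tick) or backlog > 0:
        if tick < len(bits_per_tick):
            backlog += bits_per_tick[tick]
        out = min(backlog, cap_per_tick)   # never exceed the cap
        backlog -= out
        sent.append(out)
        occupancy.append(backlog)
        tick += 1

    return sent, occupancy
```

With an input of `[2, 2, 10, 2, 2]` and a cap of 4, every produced bit is eventually transmitted and the output never exceeds 4 units per tick, but the last bits leave the buffer one tick after the input ends. That extra tick is the added delay the paragraph refers to.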
CBR operation has typically been achieved in prior art video encoders by varying quantization parameters in response to a rate-control feedback signal. Prior art CBR encoding incorporates a buffer that stores bits for transmission in order to fulfill certain timing requirements between the encoder and decoder/receiver. In CBR mode, bits are removed from the video buffer at a constant rate and then transmitted. The rate-control feedback signal, which may be some measure of the fullness of the video buffer, is generally designed to vary the rate at which bits are produced and deposited in the video buffer. The feedback signal is such that as buffer fullness decreases, quantization parameters, for example, are adjusted to increase encoding precision and thus increase the bit rate into the video buffer. Conversely, the feedback signal is such that an increase in buffer fullness results in an adjustment of quantization parameters, for example, that is designed to reduce encoding precision and consequent bit production. The feedback is typically designed to counteract the constant outflow from the video buffer. For example, the feedback signal may be designed so as to maintain the output buffer at some predetermined fractional level of fullness. In at least some embodiments, the feedback signal may be designed to maintain the output buffer level so that the end-to-end delay between encoder and decoder satisfies some predetermined timing constraints. In some situations, the encoder may not be able to produce enough bits to maintain target buffer fullness because particular video sequences require very few bits to encode at best quality. In such situations, the outflow of bits from the video buffer may fall below the target bit rate and the resulting bit stream may appear similar to capped VBR, though video quality will correspond to the maximum achievable by the encoder.
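The feedback loop described above can be sketched as follows. This is a toy model under stated assumptions, not the rate control of any actual encoder: bit production is assumed to be `complexity / q`, the update rule is a simple proportional adjustment, and the names `next_quantizer` and `simulate_cbr`, the gain, and the quantizer limits are all hypothetical.

```python
def next_quantizer(q, fullness, target=0.5, gain=1.0, q_min=1.0, q_max=51.0):
    """Adjust the quantizer from buffer fullness (a fraction in 0..1).

    Fullness above target raises q (coarser encoding, fewer bits);
    fullness below target lowers q (finer encoding, more bits).
    """
    error = fullness - target
    return min(q_max, max(q_min, q * (1.0 + gain * error)))


def simulate_cbr(complexities, drain_per_frame, buffer_size,
                 target=0.5, q_start=10.0):
    """Run the toy loop; returns a list of (q, buffer_bits, bits_sent)."""
    q = q_start
    buffered = target * buffer_size
    history = []
    for complexity in complexities:
        produced = complexity / q          # assumed bit-production model
        available = buffered + produced
        sent = min(available, drain_per_frame)   # constant outflow if possible
        # This toy model simply clamps the buffer at its capacity.
        buffered = min(available - sent, buffer_size)
        q = next_quantizer(q, buffered / buffer_size, target)
        history.append((q, buffered, sent))
    return history
```

Feeding this loop a long run of very low complexity values drives the quantizer to its minimum and the buffer to empty, after which the bits sent per frame fall below `drain_per_frame`. This mirrors the situation described in the paragraph, in which the output resembles capped VBR at the encoder's best quality.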
VBR mode is typically preferred in applications that benefit from constant quality and which have no practical imposed bit rate limits, such as DVD authoring. CBR mode is typically preferred in applications that have bandwidth limitations, such as video over DSL. In an ideal world, an encoder would be able to produce constant quality in applications that have hard bandwidth limitations.
A central principle of video encoding is that, at any given level of video quality, video sequences with significant random detail and/or motion require more bits to encode than do sequences with more predictable spatial and temporal detail. Video sequences that contain a significant amount of unpredictable details may be considered to be difficult to encode. In VBR mode, it is not uncommon that the most difficult sections of a video sequence would result in peak bit rates that are several times greater than average bit rate.
Another principle of video encoding is that encoded data must be buffered to satisfy certain timing requirements between encoder and decoder. Video encoding and decoding systems are typically designed so that the end-to-end timing requirements between encoder and decoder satisfy some predetermined interval. It will be appreciated that, for a fixed end-to-end transmission delay, a larger buffer would be required for difficult-to-encode video than would be required for easy-to-encode video. Conversely, for a fixed-size buffer, end-to-end transmission delay would be longer for difficult-to-encode video than it would be for easy-to-encode video.
A disadvantage of VBR in video communications is that channel capacity must be sized to accommodate peak bit rates. In such situations, a large fraction of allocated bandwidth is routinely wasted. The disadvantage is made worse in cases in which several video streams share a common communications channel because the channel must be large enough to accommodate simultaneous occurrence of peak bit rate for each video stream. Thus, the maximum number of channels that could be supported is a function of the peak VBR bit rate and the total channel capacity. Capped VBR could address some of the disadvantages of VBR, but at the expense of larger video buffers and consequently increased end-to-end transmission delay.
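The capacity argument above reduces to simple arithmetic, sketched below. The function names and every figure used with them are hypothetical and chosen only to illustrate the relationship between peak bit rate, channel capacity, and wasted bandwidth.

```python
def max_streams(channel_bps, per_stream_bps):
    """Number of streams that fit if each is provisioned at per_stream_bps."""
    if per_stream_bps <= 0:
        raise ValueError("per_stream_bps must be positive")
    return channel_bps // per_stream_bps


def average_utilization(channel_bps, stream_count, average_bps):
    """Fraction of the channel actually used at the average bit rate."""
    return stream_count * average_bps / channel_bps
```

For example, assuming a 38 Mbit/s channel and streams with a 9 Mbit/s peak and a 3 Mbit/s average, `max_streams(38_000_000, 9_000_000)` gives 4 streams when provisioning for the peak, and `average_utilization(38_000_000, 4, 3_000_000)` shows that those 4 streams use roughly 32 percent of the channel on average.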
A disadvantage of CBR mode in video communications is that perceived video quality can fluctuate noticeably. In at least one sense, CBR mode preferentially boosts the quality of easy-to-encode video sequences and systematically deteriorates the quality of hard-to-encode sequences such as sports video and action scenes.
In some applications, such as satellite and cable, where a predetermined number of particular video streams share a common communications channel, a method known as statistical multiplexing has been used to overcome some of the disadvantages of VBR and CBR. In at least one sense, statistical multiplexing can be thought of as a hybrid of VBR and CBR. Statistical multiplexing can be thought of as combining multiple VBR video streams in such a way as to achieve a constant bit rate overall. Prior art implementations of statistical multiplexing typically incorporate a pool of independent encoders, a controller, and a multiplexer in which the various video streams are combined and retransmitted. The controller typically receives signals from each encoder in the pool, processes the signals to determine the fraction of total bandwidth that will be allocated to each encoder, and then sends some signal to each encoder by which each encoder can determine its particular bandwidth allocation. The downstream signal from each encoder typically provides a measure of how many bits would be required to encode its video at a particular quality, e.g., a proxy for the bit rate that would be required in VBR mode at a particular quality setting. The upstream signal from the controller to each encoder can be thought of as an adjustment to the CBR bit rate setting of the target encoder. As a result, statistical multiplexing balances the bit rates and, indirectly, the quality settings of a pool of encoders with the objective of making the most efficient use of a common communications channel. In at least some implementations, statistical multiplexing enables more video streams to share a common channel than would be the case for VBR. In some implementations, statistical multiplexing can also reduce quality fluctuations because bits can be borrowed from simple video streams and allocated to difficult video streams, thereby smoothing quality variations.
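The controller's role described above can be sketched with a simple proportional allocation. This is only one possible policy, assumed here for illustration. Prior art controllers may use other rules, and the function name `allocate_bandwidth` and its parameters are hypothetical.

```python
def allocate_bandwidth(demands_bps, total_bps):
    """Split a fixed channel among encoders in proportion to demand.

    demands_bps: per-encoder estimate of the bit rate needed to reach a
                 given quality (the signal each encoder reports).
    total_bps:   constant total bandwidth of the shared channel.
    Returns the bit rate assigned to each encoder; the assignments
    always sum to total_bps.
    """
    if not demands_bps:
        return []
    total_demand = sum(demands_bps)
    if total_demand <= 0:
        # No encoder reports any demand: fall back to an equal split.
        return [total_bps / len(demands_bps)] * len(demands_bps)
    return [total_bps * demand / total_demand for demand in demands_bps]
```

Under this assumed policy, an encoder reporting difficult content receives a larger share at the expense of encoders reporting simple content, while the sum of all allocations stays constant. Re-running the allocation each time the encoders report new demands gives the borrowing behavior described in the paragraph.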
Statistical multiplexing and CBR mode share a common disadvantage: each makes sharing of communications channels with non-video services less efficient. In some applications, such as IPTV and some cable applications, it is desirable to bundle video, voice telephony, and Internet access together and have all services share bandwidth simultaneously. In CBR implementations, voice telephony and Internet data services would be allocated a fixed portion of the total channel bandwidth, i.e., the portion of total bandwidth not allocated to one or more CBR video streams. However, voice telephony and Internet data services are naturally bursty. The corresponding non-video communications channels would need to be sized to accommodate peak bit rates. As a result, a portion of non-video bandwidth would be routinely wasted because voice and data services would most often require less than peak bit rates. Statistical multiplexing applications suffer the same disadvantages because the overall bandwidth allocated to the multiplexed video streams is constant, i.e., equivalent to CBR mode.
Video communications would be improved if bit rate and video quality were regulated together. One advantage would be lower average bit rates than could be achieved by VBR or capped VBR. Another advantage would be a reduction in the visual quality fluctuations that are observed in CBR mode. Another advantage would be the ability to increase the number of video streams that could be accommodated in a common communications channel, compared to VBR implementations. Still another advantage would be the ability to share video and non-video services, such as voice telephony and Internet access, in a common communications channel more efficiently.
Video communications would be further improved if a hybrid VBR/CBR encoding mode were implemented in individual encoders without the need for a separate controller such as, for example, the separate controller in statistical multiplexing applications. An advantage would be the ability to deploy video encoders either individually or as members of a pool of encoders and achieve at least some of the advantages of traditional statistical multiplexing without the cost of a separate controller.
Video communications would be further improved if an encoder produced metadata that could be processed to produce indications of video bit rate and quality. An advantage, particularly in packet-based networks, would be the ability to manage quality-of-service at various points within a network, including network endpoints, such as subscriber premises.